When creating a multiclass neural network, when do we need to include an extra class for "no object of interest here" inputs?
In most tutorials and articles I have read, there was no explanation or additional thought given to cases where the network receives an input that shouldn't be classified as anything of interest. To me this is most apparent in CNNs, where frames from a video feed are fed to the network. Obviously, not every frame will contain an object that needs to be classified. Even in the basic cats-vs-dogs examples, I haven't read about an extra class for handling empty frames.
In the case where no extra class is used, will the softmax output give a relatively even probability distribution? For example, if we have 3 classes [class1, class2, class3], will an empty frame be predicted as [0.33, 0.33, 0.33]?
Also, if a network is very confident in its predictions, what's to stop it from outputting a prediction that is confidently wrong, like [0.9, 0.1, 0.0]? This may be above a threshold of, say, 0.85, and thus we get an incorrectly classified frame.
What should be done to avoid such situations?
Related
My original task was to classify various cell types (the classes) based on gene expression patterns, and this problem simply involves predicting one label from multiple classes. This was done easily, since I could assign a one-hot encoded vector and train a neural network.
Now the new problem is within a sample there could be a mixture of various cells (hence a multi-label problem). The new challenge is to not only detect multiple labels but the proportions of each label. For example, if there are a total of 3 cell_types and a sample contained 2 cell_type_1, 1 cell_type_2, and 1 cell_type_3 then the output of the classifier should be [0.50, 0.25, 0.25] as opposed to [1, 1, 1].
From the brief research I have done, there are various methods for the binary-style classification but not a whole lot for the proportions one. I have read about different accuracy functions, like exact match ratio and Hamming loss, that seem promising for this type of problem. I also learned that the activation function for the last layer should be a sigmoid as opposed to a softmax, because a softmax assigns a probability distribution and this property deteriorates its ability to recognize multiple labels. I wonder if in my case this would play to my advantage, since proportions matter?
I want to first get a sense of whether this problem is even possible (I am used to doing categorical), the kinds of loss/accuracy function recommended for this problem, various architecture (if this has been done well before), and any other recommendation/resources. Also I am using Keras in R if that may aid in providing more context.
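Since the targets are proportions that sum to 1, one option (my own suggestion, not something established in this thread) is to keep a softmax output layer and train against the "soft" proportion vector with cross-entropy; the loss is well defined for soft targets, not just one-hot ones. A minimal numpy sketch of that loss:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_cross_entropy(target, pred, eps=1e-12):
    # Cross-entropy that accepts "soft" targets like [0.50, 0.25, 0.25],
    # not only one-hot vectors.
    return -np.sum(target * np.log(pred + eps), axis=-1)

target = np.array([0.50, 0.25, 0.25])        # desired cell-type proportions
good = softmax(np.array([2.0, 1.3, 1.3]))    # logits close to the target mix
bad = softmax(np.array([0.0, 3.0, 0.0]))     # mass on the wrong class
```

The loss is lower for predictions whose proportions match the target, which is exactly the behavior wanted here; in Keras this corresponds to using `categorical_crossentropy` with non-one-hot target vectors.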
If I am doing a multi-class classification problem, is there a way to essentially make a class an "unsure" class? For example, if my model doesn't have a very strong prediction, it should default to this class. It's like taking a test: some tests penalize you for wrong answers, some don't. I want to write a custom loss function that doesn't penalize my model for guessing the neutral class, but does penalize it if it makes a prediction that is wrong. Is there a way to do what I am trying to do?
For classifiers using a one-hot encoded softmax output layer, the outputs can be interpreted as a probability that the input falls into each of the categories. e.g. if your model has outputs (cat, dog, frog), then an output of (0.6, 0.2, 0.2) means the input has (according to the classifier) a 60% chance of being a cat and a 20% chance for each of being a dog or frog.
In this case, when the model is uncertain it can (and will) have an output where no one class is particularly likely, e.g. (0.33, 0.33, 0.33). There's no need to add a separate 'Other' category.
Separately, it might be difficult to train an "unsure" category unless you have specific input examples that you want the model to classify as "unsure".
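One pragmatic alternative to a trained "unsure" class is a post-hoc rejection rule on the softmax output: only accept the argmax class when its probability clears a threshold. A sketch, with a hypothetical threshold of 0.85:

```python
import numpy as np

def predict_with_reject(probs, threshold=0.85):
    # Hypothetical post-hoc rejection rule: return the argmax class index
    # only if the classifier is confident enough, otherwise "unsure".
    i = int(np.argmax(probs))
    return i if probs[i] >= threshold else "unsure"

confident = predict_with_reject(np.array([0.90, 0.07, 0.03]))  # accepted
uncertain = predict_with_reject(np.array([0.40, 0.35, 0.25]))  # rejected
```

This requires no change to the loss function or the training data; the threshold is a knob you tune on validation data.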
I encountered the very same problem.
I tried using a neutral class, but the neural net would either put nothing in it or everything in it, whichever reduced the loss more.
After some searching, it looks like what we are trying to achieve is "neural network uncertainty estimation". One way to achieve that is to run your image through your neural net 100 times with random dropout active and see how often it lands on the same class.
This blog post explains it well : https://www.inovex.de/blog/uncertainty-quantification-deep-learning/
This video also : https://medium.com/deeplearningmadeeasy/how-to-add-uncertainty-to-your-neural-network-afb5f855e66a
I will let you know and publish here if I have some results with that.
I am experimenting with classification using neural networks (I am using tensorflow).
And unfortunately the training of my neural network gets stuck at 42% accuracy.
I have 4 classes, into which I try to classify the data.
And unfortunately, my data set is not well balanced, meaning that:
43% of the data belongs to class 1 (and yes, my network gets stuck predicting only this)
37% to class 2
13% to class 3
7% to class 4
The optimizer I am using is AdamOptimizer and the cost function is tf.nn.softmax_cross_entropy_with_logits.
I was wondering if the reason for my training getting stuck at 42% is really the fact that my data set is not well balanced, or because the nature of the data is really random, and there are really no patterns to be found.
Currently my NN consists of:
input layer
2 convolution layers
7 fully connected layers
output layer
I tried changing this structure of the network, but the result is always the same.
I also tried Support Vector Classification, and the result is pretty much the same, with small variations.
Did somebody else encounter similar problems?
Could anybody please provide me some hints how to get out of this issue?
Thanks,
Gerald
I will assume that you have already double, triple and quadruple checked that the data going in is matching what you expect.
The question is quite open-ended, and even a topic for research. But there are some things that can help.
In terms of better training, there are two common ways people train neural networks on an unbalanced dataset.
Oversample the lower-frequency classes, so that the proportion of examples the network sees from each class is equal. e.g. in every batch, enforce that 1/4 of the examples are from class 1, 1/4 from class 2, etc.
Weight the error for misclassifying each class by the inverse of its proportion. e.g. incorrectly classifying an example of class 1 costs 100/43, while incorrectly classifying an example of class 4 costs 100/7.
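The second option can be sketched in numpy using the class proportions from the question (43/37/13/7, here as a hypothetical 100-example label array):

```python
import numpy as np

# Hypothetical 100-example label array with the 43/37/13/7 imbalance above.
labels = np.concatenate([np.full(43, 0), np.full(37, 1),
                         np.full(13, 2), np.full(7, 3)])
counts = np.bincount(labels, minlength=4)

# Inverse-frequency weights: misclassifying class 1 costs 100/43,
# misclassifying the rare class 4 costs 100/7.
class_weights = counts.sum() / counts
per_example_weight = class_weights[labels]  # what a weighted loss would use
```

In TensorFlow/Keras these values can be passed as `class_weight` to `fit`, or multiplied into the per-example cross-entropy before reducing.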
That being said, if your learning rate is good, neural networks will often eventually (after many hours of just sitting there) jump out of predicting only one class, but they still rarely end up doing well on a badly skewed dataset.
If you want to know whether or not there are patterns in your data which can be determined, there is a simple way to do that.
Create a new dataset by randomly selecting elements from all of your classes such that you have an equal number of each (i.e. if there are 700 examples of class 4, construct a dataset by randomly selecting 700 examples from every class).
Then you can use all of your techniques on this new dataset.
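That balanced-subset construction can be sketched in numpy, with hypothetical class sizes mirroring the question's proportions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical labels with the question's imbalance; class 4 (index 3) is rarest.
labels = np.concatenate([np.full(430, 0), np.full(370, 1),
                         np.full(130, 2), np.full(70, 3)])

n_min = np.bincount(labels).min()  # size of the rarest class
# Randomly pick n_min examples from every class -> perfectly balanced subset.
balanced_idx = np.concatenate([
    rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
    for c in range(4)
])
```

If accuracy on this balanced subset stays near chance (25% for 4 classes), that is evidence the features carry little class signal, separating "imbalance problem" from "no pattern to find".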
Although, this paper suggests that a network can fit some pattern even when the labels are random.
First, you should check whether your model is overfitting or underfitting; both can cause low accuracy. Check the accuracy on both the training set and the dev set: if training accuracy is much higher than dev/test accuracy, the model may be overfitting, and if training accuracy is as low as dev/test accuracy, it could be underfitting.
For overfitting, more data or a simpler architecture may help, while a more complex architecture and longer training time may solve an underfitting problem.
When I train a model on a set of classes (let's say the number of classes is N) in Caffe (or any CNN framework) and then query the caffemodel, I get a probability that the image belongs to each class.
So, if I take a picture similar to class 1, I get a result like:
Class 1: 90%
Class 2: 10%
rest: 0%
The problem is: when I take a random picture (for example, of my environment), I keep getting the same kind of result, where one class is predominant (>90% probability) even though the image doesn't belong to any class.
So what I'd like to hear are opinions/answers from people who have experienced this and have worked out how to deal with nonsense inputs to the neural network.
My ideas are:
Train one extra class with negative images (as with train_cascade).
Train one extra class with all the positive images in the TRAIN set and the negatives in the VAL set.
But my ideas don't have any scientific basis behind them, which is why I'm asking this question.
What would you do?
Thank you very much in advance.
Rafael.
I wrote a simple recurrent neural network (7 neurons, each initially connected to all the others) and trained it using a genetic algorithm to learn "complicated", non-linear functions like 1/(1+x^2). As the training set, I used 20 values in the range [-5,5] (I tried using more than 20, but the results didn't change dramatically).
The network can learn this range pretty well, and when given other points within this range, it can predict the value of the function. However, it cannot extrapolate correctly and predict the values of the function outside the range [-5,5]. What are the reasons for that, and what can I do to improve its extrapolation ability?
Thanks!
Neural networks are not extrapolation methods (recurrent or not); this is completely outside their capabilities. They are used to fit a function to the provided data, and they are completely free to build any model outside the subspace populated by the training points. So, in a loose sense, one should think of them as an interpolation method.
To make things clear: a neural network should be capable of generalizing the function inside the subspace spanned by the training samples, but not outside of it.
A neural network is trained only in the sense of consistency with the training samples, while extrapolation is something completely different. A simple example from "H. Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8" shows how NNs behave in this context.
All of these networks are consistent with training data, but can do anything outside of this subspace.
You should rather reconsider your problem's formulation; if it can be expressed as a regression or classification problem, then you can use a NN, otherwise you should think about some completely different approach.
The only things that can be done to somehow "correct" what happens outside the training set are to:
add artificial training points in the desired subspace (but this simply grows the training set, and again, outside of this new set the network's behaviour is "random");
add strong regularization, which forces the network to build a very simple model; but a model's complexity does not guarantee any extrapolation strength, as two models of exactly the same complexity can have, for example, completely different limits at -/+ infinity.
Combining the two steps above can help build a model that "extrapolates" to some extent, but as stated before, this is not the purpose of a neural network.
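The point is easy to demonstrate with any flexible fitted model, not just a neural network. Here a degree-8 polynomial (a stand-in for the trained net) is fit to 1/(1+x^2) on 20 points in [-5, 5], then evaluated inside and outside that range:

```python
import numpy as np

# Fit a flexible model (a degree-8 polynomial standing in for the network)
# to f(x) = 1/(1+x^2) using 20 training points in [-5, 5].
x_train = np.linspace(-5, 5, 20)
y_train = 1.0 / (1.0 + x_train**2)
coefs = np.polyfit(x_train, y_train, deg=8)

def f(x):
    return 1.0 / (1.0 + x**2)

err_inside = abs(np.polyval(coefs, 2.5) - f(2.5))     # interpolation
err_outside = abs(np.polyval(coefs, 10.0) - f(10.0))  # extrapolation
```

Inside the training range the fit tracks the function; outside it the polynomial is governed by its leading terms and wanders far from the truth, which is exactly how an unconstrained network behaves beyond its training subspace.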
As far as I know, this is only possible with networks that have the echo state property. See Echo State Networks on scholarpedia.org.
These networks are designed for arbitrary signal learning and are capable of remembering their behavior.
You can also take a look at this tutorial.
The nature of your post(s) suggests that what you're referring to as "extrapolation" would be more accurately described as "sequence recognition and reproduction." Training networks to recognize a data sequence, with or without a time series (dt), is pretty much the purpose of a recurrent neural network (RNN).
The training function shown in your post has outputs bounded between 0 and 1, and it is symmetric in x (x effectively behaves as abs(x) in that function). So, first things first, be certain your input layer can easily distinguish between negative and positive inputs (if it must).
Next, the number of neurons is not nearly as important as how they're layered and interconnected. How many of the 7 were used for the sequence inputs? What type of network was used, and how was it configured? Network feedback will reveal the ratios, proportions, relationships, etc., and aid in adjusting the network weights to match the sequence. Feedback can also take the form of a feed-forward connection, depending on the type of network used to build the RNN.
Producing an 'observable' network for the function 1/(1+x^2) should be a decent exercise to cut your teeth on RNNs. 'Observable' here means the network is capable of producing results for any input value(s), even though its training data is (far) smaller than the set of all possible inputs. I can only assume this was your actual objective, as opposed to "extrapolation."