How to determine an unknown class in an image classification model such as MobileNet? - image-processing

I have n classes and one unknown class.
The unknown class is not included in the training set, as I have not yet explored how to handle it.
I trained MobileNet (or Inception-v3) on the n classes, and the confusion matrix is very good.
Now, if an image from the unknown class comes in for prediction, the model predicts it as one of the n classes, which is clearly a misclassification.
The confidence on such images also comes out near 0.998, which makes them difficult to filter out, since objects from the n trained classes are classified with the same confidence.
I tried adding a class that does not include any of the features of the n classes, i.e. a sort of negatively sampled class, as the unknown class. As a result, the confusion matrix became very bad; bad enough not to pursue that approach further, although I am still looking into it.
How can a neural network determine that an image belongs to an unknown class, i.e. that it does not fall into any of the known classes?

I have dealt with a similar problem in the past. You have to keep the following points in mind:
a. When you introduce the "unknown" class as your (n+1)th class, keep in mind that this class should represent the same variance as you would expect during your live/production run. To elaborate: if you expect images from m different categories that are not part of your training labels, then all such images should be represented in this "unknown" label. This will help bring the confidence score down for these "out-of-scope" images.
b. Additionally, once you bring the confidence score down with the help of the above method, you can set a threshold on the confidence score.
The above two steps combined can help you filter out the "out-of-scope" images; a rough sketch of the thresholding step is shown below. I hope this helps. In case further clarification is needed, let me know.
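As an illustration of step b, here is a minimal sketch, assuming a Keras/TensorFlow model trained with the extra "unknown" label; the class names and threshold are placeholders to be tuned on a validation set that contains out-of-scope images:

```python
import numpy as np

# Hypothetical class list; the last entry is the negatively sampled "unknown" label.
CLASS_NAMES = ["class_a", "class_b", "class_c", "unknown"]
CONFIDENCE_THRESHOLD = 0.7  # placeholder; tune on validation data with out-of-scope images

def classify_with_rejection(model, batch):
    """Return one label per image, falling back to 'unknown' when confidence is low."""
    probs = model.predict(batch)                      # shape: (batch_size, num_classes)
    top_idx = np.argmax(probs, axis=1)
    top_conf = probs[np.arange(len(probs)), top_idx]
    return ["unknown" if conf < CONFIDENCE_THRESHOLD else CLASS_NAMES[idx]
            for idx, conf in zip(top_idx, top_conf)]
```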

Related

Is there a way to do a "neutral" class in a multi-classification problem?

If I am doing a multi-class classification problem, is there a way to essentially make one class an "unsure" class? For example, if my model doesn't have a very strong prediction, it should default to this class. It's like when you take a test: some tests penalize you for wrong answers, some don't. I want to write a custom loss function that doesn't penalize my model for guessing the neutral class, but does penalize it if it makes a prediction that is wrong. Is there a way to do what I am trying to do?
For classifiers using a one-hot encoded softmax output layer, the outputs can be interpreted as a probability that the input falls into each of the categories. e.g. if your model has outputs (cat, dog, frog), then an output of (0.6, 0.2, 0.2) means the input has (according to the classifier) a 60% chance of being a cat and a 20% chance for each of being a dog or frog.
In this case, when the model is uncertain it can (and will) have an output where no one class is particularly likely, e.g. (0.33, 0.33, 0.33). There's no need to add a separate 'Other' category.
Separately from this, it might be difficult to train an "unsure" category, unless you have specific input examples that you want the model to learn to classify as "unsure".
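To make the "no one class is particularly likely" idea concrete, here is a small sketch (the threshold value is purely illustrative) that flags a prediction as "unsure" when the softmax output is flat, using either the maximum probability or the entropy:

```python
import numpy as np

def is_unsure(probs, max_prob_threshold=0.5):
    """Flag a prediction as 'unsure' when no single class dominates the softmax output."""
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12))  # high entropy means a flat distribution
    return probs.max() < max_prob_threshold, entropy

print(is_unsure([0.6, 0.2, 0.2]))     # (False, ~0.95): reasonably confident
print(is_unsure([0.33, 0.33, 0.34]))  # (True, ~1.10): close to uniform, treat as 'unsure'
```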
I encountered the very same problem.
I tried using a neutral class, but the neural net will either put nothing in it or everything in it, depending on which reduces the loss.
After some searching, it looks like what we are trying to achieve is "neural network uncertainty estimation". One way to achieve it is to run your image through your neural net 100 times with random dropout enabled and see how many times it lands in the same class.
This blog post explains it well: https://www.inovex.de/blog/uncertainty-quantification-deep-learning/
This one also: https://medium.com/deeplearningmadeeasy/how-to-add-uncertainty-to-your-neural-network-afb5f855e66a
I will let you know and publish here if I have some results with that.
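For reference, a minimal sketch of that Monte Carlo dropout idea, assuming a TensorFlow/Keras model that contains dropout layers (the number of passes and the agreement cutoff are placeholders):

```python
import numpy as np

def mc_dropout_predict(model, image, n_passes=100):
    """Run the same image many times with dropout active and summarize the spread."""
    # training=True keeps dropout enabled at inference time, so each pass differs.
    preds = np.stack([model(image[np.newaxis, ...], training=True).numpy()[0]
                      for _ in range(n_passes)])
    mean_probs = preds.mean(axis=0)                     # average predicted distribution
    votes = np.bincount(preds.argmax(axis=1), minlength=preds.shape[1])
    agreement = votes.max() / n_passes                  # fraction of passes agreeing on one class
    return mean_probs, agreement

# A low agreement (e.g. below ~0.8) suggests the input may not belong to any trained class.
```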

How to interpret scored probabilities in machine learning classification algorithm?

I am using two neural networks for two-class text classification. I'm getting 90% accuracy on test data. I am also using different performance metrics like precision, recall, F-score and the confusion matrix to make sure that the model is performing as expected.
In the predictive experiment using the trained model, I'm fetching probabilities for each prediction. The output looks as follows (I couldn't provide code, as it's implemented in Azure ML Studio):
ex:
class 1 (probability), class 2 (probability) -> predicted class
class 1 (0.99), class 2 (0.01) -> class 1
class 1 (0.53), class 2 (0.47) -> class 1
class 1 (0.2), class 2 (0.8) -> class 2
As per my understanding so far, by looking at the probability we can tell how confident the model is about its prediction. And 90% accuracy means that out of 100 records, 10 predictions could go wrong.
Now my question is: by looking at the probability (confidence), can we tell which bucket the current record falls into, the 90% (correct predictions) or the 10% (wrong predictions)?
What I'm trying to achieve is to give the end user some metric to tell him/her that this prediction is probably wrong, so they might want to change it to some other class before using these results.
90% accuracy means that out of 100 records, 10 predictions could go wrong.
It is not exactly like that; accuracy is always (although implicitly) linked to the specific test set we have used to measure it: so, 90% means that out of 100 records our classifier indeed misclassified 10 (i.e. there is no "could").
What we hope for in machine learning is that the performance of our models on new, unseen data will be comparable to that on our test set (which, with respect to the training of our model, is also unseen). Roughly speaking, provided that our new data come from the same statistical distribution as our training & test sets, this is not an unreasonable expectation.
What I'm trying to achieve is to give the end user some metric to tell him/her that this prediction is probably wrong, so they might want to change it to some other class before using these results.
Intuitively, you should already know the answer to this: interpreting the returned probabilities as confidence (which, at least in principle, is not an invalid interpretation), their values tell you something about how "certain" your model is about its answers. So what you could do is provide the end users with these probability values; in your example, a prediction made with probability 0.99 is qualitatively not the same as one made with probability ~0.53...
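As a rough sketch of how that could be surfaced to the end user (the 0.7 cutoff is an arbitrary placeholder):

```python
def flag_for_review(predictions, confidence_cutoff=0.7):
    """Attach a 'needs review' flag to two-class predictions with low confidence."""
    flagged = []
    for p_class1, p_class2 in predictions:
        confidence = max(p_class1, p_class2)
        label = "class 1" if p_class1 >= p_class2 else "class 2"
        flagged.append((label, confidence, confidence < confidence_cutoff))
    return flagged

# Using the rows from the question: only the 0.53 / 0.47 prediction gets flagged.
print(flag_for_review([(0.99, 0.01), (0.53, 0.47), (0.2, 0.8)]))
```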

Should a deep-learning based image classifier include a negative class

I am building an image classifier similar to AlexNet (https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks).
I have 6 categories [people, cars, bikes, animals, hydroplanes, boats]. So right now, if I give it an object that does not belong to any of the above-mentioned classes, it gets classified into one of them with some probability.
To increase the accuracy, is it wise to add more classes or to add a negative class?
And if I had to add a negative class, what kind of data would I train it on?
Thank You
Think about what you really want to produce at the end.
You need an algorithm that tells you whether the image you passed is a car, bike, animal, person, hydroplane, or boat.
Is the user supposed to pass an image that represents something else? If so, you can add an "other" class.
Well, it depends on what kind of classifier you want to build and on the available training data.
If you have enough training data for a new class, e.g. 'train', you can easily add a new class; it is quite straightforward. But the problem remains: if some new object appears at the input, then what do you do?
I think your question is how to handle a situation where an object that is not in the training set is presented to the network. In such cases, adding a negative class is quite complex, as the network needs enough, clean training data for the negative class as well. So one way to deal with this is to put a check on the output probabilities: if no training class gets, say, 70% of the output probability, then classify the input as ambiguous or as the negative class.

Does prior distribution matter in classification?

Currently I have a classification problem with two classes. What I want to do is, given a bunch of candidates, find out who is most likely to be in class 1. The problem is that class 1 is very rare (around 1%), which I guess makes my predictions quite inaccurate.
For the training dataset, can I sample half class 1 and half class 0? This will change the prior distribution, but I don't know whether the prior distribution affects the classification results.
Indeed, a very imbalanced dataset can cause problems in classification, because by defaulting to the majority class 0 you can already get a very low error rate.
There are some workarounds that may or may not work for your particular problem, such as giving equal weight to the two classes (thus weighting instances from the rare class more strongly), oversampling the rare class (i.e. learning each instance multiple times), producing slight variations of the rare objects to restore balance (SMOTE), and so on.
You really should grab a classification or machine learning book and check the index for "imbalanced classification" or "unbalanced classification". If the book is any good, it will discuss this problem. (I just assume you did not know the term that is used for it.)
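As a small illustration of the class-weighting and oversampling workarounds mentioned above (scikit-learn and synthetic data are used here purely as an example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, heavily imbalanced data: roughly 1% positives.
X = rng.normal(size=(5000, 5))
y = (rng.random(5000) < 0.01).astype(int)

# Workaround 1: weight classes inversely to their frequency.
clf_weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# Workaround 2: naive oversampling of the rare class (SMOTE would interpolate new points instead).
pos = np.flatnonzero(y == 1)
idx = np.concatenate([np.arange(len(y)), np.repeat(pos, 99)])
clf_oversampled = LogisticRegression().fit(X[idx], y[idx])
```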
If you're forced to pick exactly one candidate from a group, then the prior distribution over classes won't matter, because it will be constant for all members of that group. If you must look at each candidate in turn and make an independent decision about whether it belongs to class 1 or class 0, the prior will potentially change the decision, depending on which method you choose to do the classification. I would suggest you get hold of as many examples of the rare class as possible, but beware that blindly feeding a 50-50 split to a classifier as training data may make it implicitly fit a model that assumes this is the distribution at test time.
Sampling your two classes evenly doesn't change the assumed priors unless your classification algorithm computes (and uses) priors based on the training data. You stated that your problem is "given a bunch of candidates, find out who is most likely to be in class 1". I read this to mean that you want to determine which observation is most likely to belong to class 1. To do this, you want to pick the observation $x_i$ that maximizes $p(c_1|x_i)$. Using Bayes' theorem, this becomes:
$$
p(c_1|x_i)=\frac{p(x_i|c_1)p(c_1)}{p(x_i)}
$$
You can ignore $p(c_1)$ in the equation above since it is a constant. However, computing the denominator will still involve using prior probabilities. Since your problem is really more of a target detection problem than a classification problem, an alternate approach for detecting low probability targets is to take the likelihood ratio of the two classes:
$$
\Lambda=\frac{p(x_i|c_1)}{p(x_i|c_0)}
$$
To pick which of your candidates is most likely to belong to class 1, pick the one with the highest value of $\Lambda$. If your two classes are described by multivariate Gaussian distributions, you can replace $\Lambda$ with its natural logarithm, resulting in a simpler quadratic detector. If you further assume that the target and background have the same covariance matrices, this results in a linear discriminant (http://en.wikipedia.org/wiki/Linear_discriminant_analysis).
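A small sketch of that likelihood-ratio detector under the multivariate-Gaussian assumption (the means and covariances are placeholders that would normally be estimated from training data):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Placeholder class-conditional densities, fitted from training data in practice.
c1 = multivariate_normal(mean=[2.0, 2.0], cov=[[1.0, 0.2], [0.2, 1.0]])  # rare class 1
c0 = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])  # background class 0

def log_likelihood_ratio(x):
    """log Lambda = log p(x|c1) - log p(x|c0); larger values are more class-1-like."""
    return c1.logpdf(x) - c0.logpdf(x)

candidates = np.array([[0.1, -0.3], [1.8, 2.2], [1.0, 1.0]])
scores = log_likelihood_ratio(candidates)
most_likely_class1 = candidates[np.argmax(scores)]  # candidate ranked most class-1-like
```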
You may want to consider Bayesian utility theory to re-weight the costs of different kinds of errors, to get away from the problem of the priors dominating the decision.
Let A be the class with 99% prior probability and B the 1% class.
If we just say that all errors incur the same cost (negative utility), then it is possible that the optimal decision approach is to always declare "A". Many classification algorithms (implicitly) assume this.
If instead we declare that the cost of declaring "A" when, in fact, the instance was "B" is much bigger than the cost of the opposite error, then the decision logic becomes, in a sense, more sensitive to slight differences in the features.
This kind of situation frequently comes up in fault detection: faults in the monitored system will be rare, but you want to be sure that if we see any data that points to an error condition, action is taken (even if it is just reviewing the data).
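A tiny sketch of such a cost-sensitive decision rule (the cost values are made-up placeholders):

```python
def decide(p_b, cost_miss_b=50.0, cost_false_alarm=1.0):
    """Declare 'A' or 'B' by minimizing expected cost instead of taking the argmax.

    p_b              : posterior probability that the instance is the rare class B.
    cost_miss_b      : cost of declaring 'A' when the instance is really 'B'.
    cost_false_alarm : cost of declaring 'B' when the instance is really 'A'.
    """
    expected_cost_a = p_b * cost_miss_b               # expected cost of declaring 'A'
    expected_cost_b = (1.0 - p_b) * cost_false_alarm  # expected cost of declaring 'B'
    return "B" if expected_cost_b < expected_cost_a else "A"

# With these costs, even a 5% posterior for B triggers a 'B' declaration (e.g. a fault alert).
print(decide(0.05))  # -> B
```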

A few implementation details for a Support-Vector Machine (SVM)

In a particular application I was in need of machine learning (I know the things I studied in my undergraduate course). I used Support Vector Machines and got the problem solved. It is working fine.
Now I need to improve the system. The problems here are:
1. I get additional training examples every week. Right now the system starts training from scratch with the updated examples (old examples + new examples). I want to make this incremental learning: using the previous knowledge (instead of the previous examples) together with the new examples to get the new model (knowledge).
2. Right now my training examples have 3 classes, so every training example is fitted into one of these 3 classes. I want the functionality of an "Unknown" class: anything that doesn't fit these 3 classes must be marked as "unknown". But I can't treat "Unknown" as a new class and provide examples for it too.
3. Assuming the "unknown" class is implemented: when the class is "unknown", the user of the application inputs what he thinks the class might be. Now I need to incorporate the user input into the learning. I have no idea how to do this either. Would it make any difference if the user inputs a new class (i.e. a class that is not already in the training set)?
Do I need to choose a new algorithm, or can Support Vector Machines do this?
PS: I'm using libsvm implementation for SVM.
I just wrote my answer using the same organization as your question (1., 2., 3.).
1. Can SVMs do this, i.e., incremental learning? Multi-Layer Perceptrons of course can, because the subsequent training instances don't affect the basic network architecture; they just cause adjustments in the values of the weight matrices. But SVMs? It seems to me that (in theory) one additional training instance could change the selection of the support vectors. But again, I don't know.
2. I think you can solve this problem quite easily by configuring LIBSVM in one-against-many mode, i.e. as a one-class classifier. SVMs are one-class classifiers; applying an SVM to a multi-class problem means it has been coded to perform multiple, step-wise one-against-many classifications, but again the algorithm is trained (and tested) one class at a time. If you do this, then what is left after step-wise execution against the test set is "unknown"; in other words, whatever data is not classified after performing multiple, sequential one-class classifications is, by definition, in that 'unknown' class (see the sketch after this list).
3. Why not make the user's guess a feature (i.e. just another input variable)? The only other option is to make it the class label itself, and you don't want that. So you would, for instance, add a column to your data matrix, "user class guess", and populate it with some value most likely to have no effect for those data points that are not in the 'unknown' category and for which the user will therefore not offer a guess (this value could be '0' or '1', but really it depends on how your data is scaled and normalized).
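For point 2, a rough sketch of the "whatever no one-class model accepts is unknown" idea, using scikit-learn's OneClassSVM as a stand-in for LIBSVM's one-class mode (the nu/gamma values are placeholders):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_one_class_models(X, y):
    """Fit one OneClassSVM per known class label."""
    return {label: OneClassSVM(nu=0.05, gamma="scale").fit(X[y == label])
            for label in np.unique(y)}

def predict_with_unknown(models, X):
    """Label a sample 'unknown' when every one-class model rejects it."""
    labels = []
    for x in X:
        accepted = [label for label, m in models.items() if m.predict([x])[0] == 1]
        labels.append(accepted[0] if accepted else "unknown")  # naive tie-break: first acceptor
    return labels
```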
Your first item will likely be the most difficult, since there are essentially no good incremental SVM implementations in existence.
A few months ago, I also researched online and incremental SVM algorithms. Unfortunately, the current state of implementations is quite sparse. All I found was a Matlab example, OnlineSVR (a thesis project implementing only regression support), and SVMHeavy (only binary class support).
I haven't used any of them personally. They all appear to be at the "research toy" stage. I couldn't even get SVMHeavy to compile.
For now, you can probably get away with doing periodic batch training to incorporate updates. I also use LibSVM, and it's quite fast, so it should be a good substitute until a proper incremental version is implemented.
I also don't think SVMs can model the concept of an "unknown" sample by default. They typically work as a series of boolean classifiers, so a sample always ends up being positively classified as something, even if that sample is drastically different from anything seen previously. A possible workaround would be to model the ranges of your features, randomly generate samples that lie outside of these ranges, and then add them to your training set.
For example, if you have an attribute called "color", which has a minimum value of 4 and a maximum value of 123, then you could add these to your training set
[({'color':3},'unknown'),({'color':125},'unknown')]
to give your SVM an idea of what an "unknown" color means.
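A small sketch of generating such out-of-range "unknown" samples automatically from the per-feature ranges (the margin and sample count are arbitrary placeholders):

```python
import numpy as np

def make_out_of_range_samples(X, n_samples=100, margin=0.1, seed=0):
    """Create 'unknown' samples by pushing one random feature outside its observed range."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.maximum(hi - lo, 1e-9)
    samples = rng.uniform(lo, hi, size=(n_samples, X.shape[1]))   # start inside the observed box
    feat = rng.integers(0, X.shape[1], size=n_samples)            # feature to push out of range
    direction = rng.choice([-1.0, 1.0], size=n_samples)
    offset = (1.0 + rng.random(n_samples)) * margin * span[feat]
    samples[np.arange(n_samples), feat] = np.where(
        direction < 0, lo[feat] - offset, hi[feat] + offset)
    return samples

# Usage sketch: X_unknown = make_out_of_range_samples(X_train); label all of these rows 'unknown'.
```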
There are algorithms to train an SVM incrementally, but I don't think libSVM implements this. I think you should consider whether you really need this feature. I see no problem with your current approach, unless the training process is really too slow. If it is, could you retrain in batches (i.e. after every 100 new examples)?
You can get libSVM to produce probabilities of class membership. I think this can be done for multiclass classification, but I'm not entirely sure about that. You will need to decide on some threshold at which the classification is not certain enough and then output 'Unknown'. I suppose something like setting a threshold on the difference between the most likely and the second most likely class would achieve this.
I think libSVM scales to any number of new classes. The accuracy of your model may well suffer by adding new classes, however.
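A small sketch of the "difference between the most likely and second most likely class" idea, using scikit-learn's SVC (which wraps LIBSVM and can emit class probabilities); the gap threshold is a placeholder:

```python
import numpy as np
from sklearn.svm import SVC

def predict_with_unknown(clf, X, min_gap=0.2):
    """Output 'Unknown' when the top two class probabilities are too close together."""
    probs = clf.predict_proba(X)             # shape: (n_samples, n_classes)
    top2 = np.sort(probs, axis=1)[:, -2:]    # second-best and best probability per sample
    gap = top2[:, 1] - top2[:, 0]
    best = clf.classes_[np.argmax(probs, axis=1)].astype(str)
    return np.where(gap < min_gap, "Unknown", best)

# clf = SVC(probability=True).fit(X_train, y_train)  # probability=True enables predict_proba
```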
Even though this question is probably out of date, I feel obliged to give some additional thoughts.
Since your first question has been answered by others (there is no production-ready SVM which implements incremental learning, even though it is possible), I will skip it. ;)
Adding 'Unknown' as a class is not a good idea. Depending on its use, the reasons differ.
If you are using the 'Unknown' class as a tag for "this instance has not been classified, but belongs to one of the known classes", then your SVM is in deep trouble. The reason is that libsvm builds several binary classifiers and combines them. So if you have three classes, say A, B and C, the SVM builds the first binary classifier by splitting the training examples into "classified as A" and "any other class". The latter will obviously contain all examples from the 'Unknown' class. When trying to build a hyperplane, examples in 'Unknown' (which really belong to class 'A') will probably cause the SVM to build a hyperplane with a very small margin, and it will poorly recognize future instances of A, i.e. its generalization performance will diminish. That is because the SVM will try to build a hyperplane which separates most instances of A (those officially labeled as 'A') onto one side of the hyperplane and some instances (those officially labeled as 'Unknown') onto the other side.
Another problem occurs if you are using the 'Unknown' class to store all examples whose class is not yet known to the SVM. For example, the SVM knows the classes A, B and C, but you recently got example data for two new classes D and E. Since these examples are not classified and the new classes are not known to the SVM, you may want to temporarily store them in 'Unknown'. In that case the 'Unknown' class may cause trouble, since it possibly contains examples with enormous variation in the values of its features. That will make it very hard to create good separating hyperplanes, and therefore the resulting classifier will poorly recognize new instances of D or E as 'Unknown'. Probably the classification of new instances belonging to A, B or C will be hindered as well.
To sum up: Introducing an 'Unknown' class which contains examples of known classes or examples of several new classes will result in a poor classifier. I think it's best to ignore all unclassified instances when training the classifier.
I would recommend that you solve this issue outside the classification algorithm. I was asked for this feature myself and implemented a single webpage which shows an image of the object in question and a button for each known class. If the object in question belongs to a class which is not known yet, the user can fill out another form to add a new class. If he goes back to the classification page, another button for that class will magically appear. After the instances have been classified, they can be used for training the classifier. (I used a database to store the known classes and reference which example belongs to which class. I implemented an export function to make the data SVM-ready.)
