Classes never seen before on Deep Learning Models - machine-learning

I have a basic question. Suppose I am training an image classifier for cats and dogs, but I need an extra functionality: if an image does not belong to either category, how do I find that out? Some of the options I was thinking of were:
Instead of 2 neurons I add a 3rd neuron to the last layer, and make my training labels y a one-hot encoding of 3 labels, the 3rd for being in neither the cat nor the dog class. I would use some random examples for my 3rd class.
I use only 2 neurons and apply some probability threshold to decide which class my image should belong to.
However, I do not think either of these methods is viable.
Can anyone suggest a good technique to classify images which do not belong to my training categories?

Before going into the solution, I would first comment on the solutions proposed in the question. The first would work better than the second, because it is very hard to interpret the (probability) values of a neural network's output: closeness of the values might be caused by similarity of the classes involved (in this case a dog might look like a cat), and sometimes you may end up with unseen classes being assigned to one of the known classes with high probability.
Most supervised classification machine learning algorithms are designed to map an input to one of a fixed number of classes. This type of classification is called closed-world classification.
E.g.
MNIST - handwritten digit classification
Cat - Dog classification
When classification involves some unlabeled/unknown classes, the approach is called open-world classification. Various papers have been published on it [1, 2, 3].
I will explain my answer using the approach proposed by [3].
There are two options for applying open-world classification (from here on I will refer to it as OWC) to the problem in question:
Classifying all new classes as a single class
Classifying all new classes as a single class, then further grouping similar samples into a single class and different samples into different classes.
1. Classifying all new classes as a single class
Although many types of model could fit this type of classification (one could be the first solution proposed in the question), I will discuss the model of [3]. Here the network first decides whether to classify or to reject the input. Ideally, if the sample is from the seen classes, the network classifies it into one of the seen classes; otherwise it rejects the input. The authors of [3] call this network the Open Classification Network (OCN). A Keras implementation of the OCN could be (I've simplified the network to focus just on the output of the model):
from tensorflow import keras

num_of_classes = 2  # number of seen (known) classes

inputs = keras.layers.Input(shape=(28, 28, 1))
x = keras.layers.Conv2D(64, 3, activation="relu")(inputs)
x = keras.layers.Flatten()(x)
embedding = keras.layers.Dense(256, activation="linear", name="embedding_layer")(x)
# Binary head: decide whether to reject the input as an unseen class
reject_output = keras.layers.Dense(1, activation="sigmoid", name="reject_layer")(embedding)
# Softmax head over the seen classes
classification_output = keras.layers.Dense(num_of_classes, activation="softmax", name="classification_layer")(embedding)
ocn_model = keras.models.Model(inputs=inputs, outputs=[reject_output, classification_output])
The model is trained in a way that jointly optimizes both reject_output and classification_output losses.
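For example, a minimal compile-and-fit sketch of that joint optimization could look like the following (the loss choices and the y_reject/y_class label arrays are my assumptions, not taken from the paper):

ocn_model.compile(
    optimizer="adam",
    # binary loss for the accept/reject head, categorical loss for the seen-class head
    loss={"reject_layer": "binary_crossentropy",
          "classification_layer": "sparse_categorical_crossentropy"},
    loss_weights={"reject_layer": 1.0, "classification_layer": 1.0},
)
# y_reject: 1 if the sample should be rejected, 0 otherwise; y_class: integer label of the seen class
# ocn_model.fit(x_train, {"reject_layer": y_reject, "classification_layer": y_class}, epochs=5)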
2. Classifying all new classes as a single class, then further grouping similar samples
The authors of [3] used another network to find the similarity between samples. They call this network the Pairwise Classification Network (PCN). The PCN classifies whether two inputs are from the same class or from different classes. We can reuse the embedding from the first solution together with a pairwise similarity metric to create the PCN. In the PCN the weights are shared between both inputs. This could be implemented in Keras as:
embedding_model = keras.models.Sequential([
    keras.layers.Conv2D(64, 3, activation="relu", input_shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="linear", name="embedding_layer"),
])
input1 = keras.layers.Input(shape=(28, 28, 1))
input2 = keras.layers.Input(shape=(28, 28, 1))
embedding1 = embedding_model(input1)
embedding2 = embedding_model(input2)
merged = keras.layers.Concatenate()([embedding1, embedding2])
output = keras.layers.Dense(1, activation="sigmoid")(merged)
pcn_model = keras.models.Model(inputs=[input1, input2], outputs=output)
The PCN model is trained to reduce the distance between samples of the same class and increase the distance between samples of different classes.
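As a rough sketch (assuming pairs of samples have already been built; the array names below are illustrative, not from the paper), the PCN can be trained with a binary loss on same/different labels:

# pair_left, pair_right: arrays of images; pair_labels: 1 if both images come from the same class, else 0
pcn_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# pcn_model.fit([pair_left, pair_right], pair_labels, epochs=5, batch_size=64)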
After the PCN is trained, an auto-encoder is trained to learn useful representations of the unseen classes. Then a clustering algorithm is used to group (cluster) the unseen classes, using the PCN model as the distance function.
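A minimal clustering sketch, using the PCN's "same class" probability to build a precomputed distance matrix (the rejected samples, the number of clusters, and the toy data below are assumptions for illustration only):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in for the samples the OCN rejected as unseen; in practice collect them from your data
rejected = np.random.rand(20, 28, 28, 1).astype("float32")

n = len(rejected)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        same_prob = pcn_model.predict([rejected[i:i + 1], rejected[j:j + 1]], verbose=0)[0, 0]
        dist[i, j] = dist[j, i] = 1.0 - same_prob  # low "same class" probability => large distance

# linkage must not be "ward" with a precomputed matrix; use affinity="precomputed" on older scikit-learn
clusters = AgglomerativeClustering(n_clusters=3, metric="precomputed", linkage="average").fit_predict(dist)
print(clusters)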

Related

Evaluation of generative models like variational autoencoder

I hope everyone is doing well.
I need some help with generative models.
I am working on a project where the main task is to build a binary classification model. The dataset contains 300,000 samples and 100 features, and there is an imbalance between the 2 classes: the majority class is much larger than the minority class.
To handle this problem, I am using a VAE (variational autoencoder).
I started by training the VAE on the minority class, then used the decoder part of the VAE to generate new (fake) samples similar to the minority class, and concatenated this new data with the training set in order to obtain a new, balanced training set.
My question is: is there any way to evaluate generative models like VAEs, i.e. a way to know whether the generated data is similar to the real data?
I have read that there are some metrics to evaluate generated data, like the Inception Score and the Fréchet Inception Distance, but I saw that they have only been used on image data.
Can I use them on my dataset too?
Thanks in advance
I believe your data is not image data, as you say there are 100 features. What I believe you can do is check the similarity between the synthesised features and the original features (the ones belonging to the minority class), and keep only the samples with a certain similarity. The cosine similarity index would be useful for this problem.
It would also be very useful to check a scatter plot of the synthesised features against the original ones to see if they are close to each other. t-SNE would be useful at this point.
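A minimal sketch of both checks (the array names, shapes, and the 0.8 threshold are only illustrative; replace the random stand-ins with your real minority rows and the VAE output):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import TSNE

real_minority = np.random.rand(200, 100)   # stand-in for the real minority-class samples
synthetic = np.random.rand(200, 100)       # stand-in for the VAE-generated samples

# Keep only synthetic samples whose best cosine similarity to some real minority sample is high enough
sim = cosine_similarity(synthetic, real_minority)        # shape (n_synthetic, n_real)
filtered = synthetic[sim.max(axis=1) >= 0.8]

# Visual check: embed real and kept synthetic samples together with t-SNE
combined = np.vstack([real_minority, filtered])
coords = TSNE(n_components=2, random_state=0).fit_transform(combined)
n_real = len(real_minority)
plt.scatter(coords[:n_real, 0], coords[:n_real, 1], label="real minority", alpha=0.5)
plt.scatter(coords[n_real:, 0], coords[n_real:, 1], label="synthetic", alpha=0.5)
plt.legend()
plt.show()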

Is it possible to have a class feature with several values?

I have a dataset in which the class has several values. For example, a face recognition dataset where the class could be a tuple (man, old, Chinese).
Is it possible to have such data, if yes what ML classifier should I use?
I believe this question should be moved to another platform, like https://datascience.stackexchange.com/
What you are asking for is called multi-label classification.
In multi-label classification tasks, the model is trained to provide the probabilities or likelihoods of more than one label for a given sample.
You can either use multi-label classification, or you can use multiple binary classifiers to predict each attribute separately: one binary classifier for predicting man vs. woman, another for old vs. young, and so on (a sketch follows below). But you must be careful that your labels are semantically mutually exclusive. For instance, if you have labels like "sky" and "outdoor", the binary classifiers might be noisy if your labels are not carefully made, i.e. if a sample has the "sky" label but not the "outdoor" label, that will introduce noise during training.
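A minimal scikit-learn sketch of the "one classifier per attribute" route (the attribute names and the toy data are illustrative; MultiOutputClassifier simply fits one classifier per label column):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

X = np.random.rand(100, 64)                       # toy feature vectors (e.g. face embeddings)
Y = np.column_stack([
    np.random.choice(["man", "woman"], 100),      # attribute 1
    np.random.choice(["old", "young"], 100),      # attribute 2
    np.random.choice(["Chinese", "other"], 100),  # attribute 3
])

clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=50)).fit(X, Y)
print(clf.predict(X[:2]))   # one predicted value per attribute for each sample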

Caffe's way of representing negative examples on benchmark dataset for binary classification

I would like to know how to define or represent a negative training set if I want to train a binary classifier from a pre-trained model, say AlexNet trained on the ILSVRC12 (ImageNet) dataset. What I am currently thinking of is to take one of the unrelated classes as the negative training set and the related one as the positive set. Is there a better, more elegant way?
The CNNs trained on the ILSVRC data set are already discriminating among 1000 classes of images. Yes, you can use one of those topologies to train a binary classifier, but I suggest that you start with an untrained model and run it through your two chosen classes. If you start with a trained model, you have to unlearn a lot, and your result is still trying to discriminate among 1000 classes: that last FC layer is going to give you trouble.
There are ways to work around the 1000-class problem. If your application already overlaps one or more of the trained classes, then simply add a layer that maps those classes to label "1" and all the others to label "0".
If you're insistent on retaining the trained kernels, then try replacing the final FC layer (1000) with a 2-class FC layer. Then choose your two classes (applicable images vs everything else) and run your training.
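The question is about Caffe, but the idea of swapping the final 1000-way layer for a 2-class head is framework-agnostic; here is a rough Keras sketch (the ResNet50 backbone, the frozen layers, and the label convention are my assumptions, not part of the original setup):

from tensorflow import keras

base = keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False                      # keep the pretrained kernels untouched
output = keras.layers.Dense(1, activation="sigmoid")(base.output)   # 2-class head replaces the 1000-way FC
binary_model = keras.models.Model(inputs=base.input, outputs=output)
binary_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# binary_model.fit(train_images, train_labels, ...)   # labels: 1 = applicable images, 0 = everything else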

Can anyone give me some pointers for using SVM for user recognition using keystroke timing?

I am trying to perform user identification using keystroke dynamics. The data consists of the timing of individual keystrokes. I am using an SVM for binary classification. How can I train this for multiple users?
I have the keystroke timings for many users; for example, for "hello": h->16, e->10, l->30, o->20. Therefore I do not have class labels (+1 positive, -1 negative).
SVMs are binary classifiers. However, SVMs do give you a confidence score (a function of the distance from the separating hyperplane), so you can use this information in one of two popular ways to convert a binary classifier into a multiclass classifier: One-vs-All and One-vs-One.
See this article on how to use SVMs in a multiclass setting.
For example, in the One vs. All setting, for each class you separate the training data into samples that belong to that class and samples that belong to any other class. Then you fit an SVM on that data. At the end of the day you have k classifiers if you have k classes. Then you run your test data through all k classifiers and return the class with the highest probability (confidence score).
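A minimal scikit-learn sketch of the One-vs-All scheme (the timing features and user ids below are synthetic stand-ins; in practice each row would hold the keystroke timings for one typed sample):

import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(300, 5)           # toy timing vectors, one per typed word
y = np.random.randint(0, 3, 300)     # three users

clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)     # fits one SVM per user
scores = clf.decision_function(X[:2])   # per-user confidence scores
pred = clf.predict(X[:2])               # user with the highest score wins
print(scores, pred)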

Different weights for different classes in neural networks and how to use them after learning

I trained a neural network using the Backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classifications. Since I ran the network 30 times, with 10 inputs for each class, I ended up with 3 distinct sets of weights, but runs for the same classification produced very similar weights with a very small error. The network has therefore proven itself to have learned successfully.
My question is, now that the learning is complete and I have 3 distinct sets of weights (1 for each classification), how could I use these in a regular feed-forward network so it can classify the input automatically? I searched around to check whether you can somehow average out the weights, but it looks like this is not possible. Some people mentioned bootstrapping the data.
Have I done something wrong during the backpropagation learning process? Or is there an extra step which needs to be done after the learning process with these different weights for different classes?
One way I am imagining this is by implementing a regular feed-forward network which holds all 3 sets of weights. There would be 3 outputs, and for any given input one of the output neurons would fire, indicating that the given input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense to train your neural network on only one class at a time, since the hidden layer can form weight combinations to 'learn' which class the input data may belong to. Learning each class separately makes the weights independent, and the network won't know which set of learned weights to use when a new test input is given.
Use a vector as the output to represent the three different classes, and train on all the data together.
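A minimal Keras sketch of that idea, assuming the 3-input, 2-hidden-neuron network from the question but with a 3-neuron softmax output and toy data (the activations and optimizer are my choices, not prescribed):

from tensorflow import keras
import numpy as np

model = keras.models.Sequential([
    keras.layers.Dense(2, activation="sigmoid", input_shape=(3,)),   # 2 hidden neurons
    keras.layers.Dense(3, activation="softmax"),                     # one output neuron per class
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

X = np.random.rand(30, 3)                                                 # 30 samples, 3 inputs each (toy data)
y = keras.utils.to_categorical(np.repeat([0, 1, 2], 10), num_classes=3)   # one-hot vector per sample
model.fit(X, y, epochs=10, verbose=0)                                     # all classes trained together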
EDIT
P.S. I don't think the post you link to is relevant to your case. The question in that post arises from different (random) weight initializations in neural network training. Sometimes people set a random seed to make the weight learning reproducible and avoid such a problem.
In addition to the response by nikie, another possibility is to represent the output as one (single) output unit with continuous values. For example, the ANN classifies the input as the first class if the output is in the [0, 1) interval, the second if it is in [1, 2), and the third if it is in [2, 3). This architecture is reported in the literature (and verified in my experience) to be less efficient than the discrete representation with 3 neurons.

Resources