I understand that in zero-shot learning the classes are divided into seen and unseen categories: we might train the network on 50 classes and test on the other 50 that the network has never seen. I also understand that the network uses attributes of the unseen classes (though I am not sure how). My question is: how does the network classify the unseen classes? Does it actually label each class by its name? For example, if I am doing zero-shot action recognition and the unseen classes are biking, swimming, and football, does the network actually name these classes? How does it know their labels?
The network uses the seen classes to learn the relation between images and attributes, or other side information such as human gaze, word embeddings, or anything else that can relate classes to images. What the network learns can then be mapped to new objects via their attributes.
Say your classifier sees images of pigs, dogs, horses, and cats, along with their attributes, during training, and has to classify a zebra at test time. During training it learns the relation between image pixels and attributes such as 'stripes', 'tail', 'black', 'white', and so on.
So at test time, given an image and the attributes of a zebra, you use the classifier to figure out whether they are related. Of course, you could also be given an image of a horse, which looks a lot like a zebra, so your classifier must learn to generalize well.
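To make the labelling question concrete: the network never invents names like 'zebra' on its own. You supply an attribute vector (or word embedding) for every unseen class, and the predicted label is simply the class name attached to the best-matching vector. Here is a minimal, hypothetical sketch with made-up attribute values:

    import numpy as np

    # Hypothetical per-class attribute vectors (stripes, tail, black, white),
    # taken from an attribute table or word embeddings you provide.
    class_attributes = {
        "horse": np.array([0.0, 1.0, 0.2, 0.1]),
        "zebra": np.array([1.0, 1.0, 0.9, 0.9]),
        "pig":   np.array([0.0, 1.0, 0.0, 0.3]),
    }

    def classify_zero_shot(predicted_attributes):
        """Pick the unseen class whose attribute vector is closest (by
        cosine similarity) to the attributes predicted from the image."""
        def cosine(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        scores = {name: cosine(predicted_attributes, attrs)
                  for name, attrs in class_attributes.items()}
        return max(scores, key=scores.get)

    # The image-to-attribute regressor is trained on seen classes only;
    # here we fake its output for an image of an unseen zebra.
    print(classify_zero_shot(np.array([0.9, 1.0, 0.8, 0.95])))  # -> zebra

So the "labels" of the unseen classes come from the side information you provide, not from the network itself.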
I am working on a text classification problem for which I can't think of or find a solution. Essentially, I am classifying a private complaint database that has custom categories per municipality, because some municipalities have different issues than others.
Example:
Municipality   Issue                   Class
London         Street lights are off   Street-lighting
New York       Street lights are off   lighting
As you can see, I want to classify the issue based on the municipality: use the first column to select only that municipality's categories, and then pick the one predicted from the issue text. Currently I have created superclasses that contain similar classes, but now I want to be more specific. I have a big dataset, and every municipality has around 10 classes.
You can use a normal classification algorithm with a neural net. The steps would be:
1. Convert the corpus into one-hot vectors
2. Train the neural network as a multiclass classifier
I think any normal neural network with a sufficient number of neurons can produce good results.
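As a rough illustration (one of many possible setups), here is a sketch using scikit-learn with toy data: the municipality is one-hot encoded and concatenated with a bag-of-words vector of the issue text, then fed to a small neural network:

    from scipy.sparse import hstack
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import OneHotEncoder

    # Toy rows; real training would use the full complaint database.
    municipalities = [["London"], ["New York"]]
    issues = ["Street lights are off", "Street lights are off"]
    labels = ["Street-lighting", "lighting"]

    # One-hot encode the municipality, bag-of-words encode the issue text.
    muni_enc = OneHotEncoder(handle_unknown="ignore")
    text_vec = CountVectorizer()
    X = hstack([muni_enc.fit_transform(municipalities),
                text_vec.fit_transform(issues)])

    # Train a small neural net as a multiclass classifier.
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf.fit(X, labels)

    X_new = hstack([muni_enc.transform([["London"]]),
                    text_vec.transform(["Street lights are off"])])
    print(clf.predict(X_new))  # -> ['Street-lighting']

Because the municipality is part of the input, the network can learn that the same issue text maps to different classes in different municipalities.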
I am currently working on a model that has to predict materials such as ladders, nuts, bolts, mice, bottles, etc. I have written an algorithm for this which is working okay so far. The set of images is available on my local computer, and I have enough data for both training and testing. As of now, I have a total of 26 image classes to predict from, all of them material types.
This is fine, but I also want to handle the case where an image doesn't belong to any of these classes: the model should then indicate that this is not a material but a different kind of picture altogether.
To do this, I am thinking of training my model a second time on a different set of images (e.g., ImageNet), so that just by looking at any non-material image it would return something like "this is not a material!"
So basically, the same model would be trained on two different datasets: one is my material dataset, and the other is anything other than materials, such as images from ImageNet.
My question is: how do I approach this? Do I even need to do this? Or could I just write a simple if-else and treat anything that is not recognized as a material as the non-material type?
You can just merge the two datasets and label the ones that do not belong to said 26 classes as a special 27th class. Whenever your model predicts that class you know it's not part of your dataset. For example:
    pred = [0.1, 0.1, 0.8]  # assume label 2 is the not-this-dataset label
Then you can take images from the other dataset, give them label 2, and train as usual. Make sure to balance the dataset: if there are proportionally too many of the special not-this-dataset examples, your model will overfit and just predict that everything is outside your original dataset.
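As a hypothetical sketch of the merge-and-balance step (the dataset layout and the roughly 20% cap are illustrative choices, not requirements):

    import random

    NOT_MATERIAL = 26  # the special 27th class

    def build_training_set(material_images, other_images):
        """Merge the two datasets, capping the 'not material' portion so
        it does not dominate training (here: at most ~20% of the total).

        material_images: list of (image, label) pairs with labels 0..25
        other_images:    list of images from e.g. ImageNet
        """
        cap = len(material_images) // 4
        sampled = random.sample(other_images, min(cap, len(other_images)))
        data = list(material_images)
        data += [(img, NOT_MATERIAL) for img in sampled]
        random.shuffle(data)
        return data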
I would like to know how to define or represent a negative training set when training a binary classifier from a pre-trained model, say AlexNet on the ILSVRC12 (ImageNet) dataset. What I am currently thinking of is to take one of the unrelated classes as the negative training set and the related one as the positive set. Is there a better, more elegant way?
The CNNs trained on the ILSVRC dataset already discriminate among 1000 classes of images. Yes, you can use one of those topologies to train a binary classifier, but I suggest that you start with an untrained model and train it on your two chosen classes. If you start with a trained model, it has to unlearn a lot, and the result is still set up to discriminate among 1000 classes: that last FC layer is going to give you trouble.
There are ways to work around the 1000-class problem. If your application already overlaps one or more of the trained classes, then simply add a layer that maps those classes to label "1" and all the others to label "0".
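For example, if a couple of trained classes already cover your positive concept, the mapping can be done entirely in post-processing (the indices below are purely illustrative):

    # Hypothetical ImageNet class indices that overlap your positive class.
    POSITIVE_IMAGENET_IDS = {151, 152, 153}

    def to_binary_label(imagenet_class_id):
        """Map the 1000-way prediction to a binary label."""
        return 1 if imagenet_class_id in POSITIVE_IMAGENET_IDS else 0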
If you insist on retaining the trained kernels, then try replacing the final 1000-way FC layer with a 2-class FC layer. Then choose your two classes (applicable images vs. everything else) and run your training.
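A minimal sketch of that head replacement, assuming PyTorch/torchvision (the layer index and sizes follow torchvision's AlexNet definition; newer torchvision versions prefer a weights= argument over pretrained=True):

    import torch.nn as nn
    import torchvision.models as models

    # Load an AlexNet pre-trained on ImageNet, keeping its learned kernels.
    model = models.alexnet(pretrained=True)

    # Optionally freeze the feature extractor so only the new head trains.
    for param in model.features.parameters():
        param.requires_grad = False

    # Replace the final 1000-way FC layer with a 2-class one
    # (AlexNet's classifier ends at index 6 with in_features=4096).
    model.classifier[6] = nn.Linear(4096, 2)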
I was trying to create a convolutional neural network to recognize animals, vehicles, buildings, trees, and plants from a large dataset containing a combination of these objects.
While training, a doubt came up about how the network should be trained: should I train it on the whole set of animals as a single class, or train on each animal separately?
That is, one group for lions, one for tigers, one for elephants, etc., so that at test time I can code it to output 'animal' if any of these subcategories is predicted.
I have this doubt because I have read that the dataset should contain a consistent pattern for efficient detection, and that such a pattern exists only when training on subcategories of objects rather than on one vast dataset.
I have attached a figure showing a sample dataset (only logically correct). I want to know whether there should be separate datasets or a single dataset.
Whether to train on separate datasets or a single dataset depends on a variety of factors. If you want the convolutional neural network to classify the test images only as animals, without further subdivision, then training on a single dataset is fine. However, if you plan to further subclassify the images into tigers and lions, then the training needs to be done on separate datasets of tigers and lions.
The type of dataset you use for training depends heavily on the classification you require on the test dataset.
Moreover, make sure that you normalize the images before using them for training.
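Normalization can be done in several ways; one common scheme, sketched here with NumPy and assuming an (N, H, W, C) array of images, is to scale pixels to [0, 1] and subtract the per-channel mean:

    import numpy as np

    def normalize(images):
        """Scale pixel values to [0, 1] and subtract the per-channel mean
        (one common normalization scheme among several)."""
        images = images.astype(np.float32) / 255.0
        mean = images.mean(axis=(0, 1, 2), keepdims=True)
        return images - mean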
We are planning to build image classifiers using Google TensorFlow.
I wonder what the minimum and the optimum requirements are to train a custom image classifier using a convolutional deep neural network.
The questions are specifically:
how many images per class should be provided at a minimum?
do we need to provide approximately the same number of training images per class, or can the amount per class be disparate?
what is the impact of wrong image data in the training data? E.g., 500 images of a tennis shoe and 50 of other shoes.
is it possible to train a classifier with many more classes than the recently published Inception-v3 model? Let's say 30,000.
"how many images per class should be provided at a minimum?"
It depends on how you train.
If training a new model from scratch, purely supervised: for a rule of thumb on the number of images, you can look at the MNIST and CIFAR tasks, which seem to work fine with about 5,000 images per class.
You can probably bootstrap your network by beginning with a model trained on ImageNet. Such a model already has good features, so it should be able to learn new categories from far fewer labeled examples. I don't think this is well studied enough to give you a specific number.
If training with additional unlabeled data (semi-supervised learning), you may need only about 100 labeled images per class. There is a lot of recent research on this topic, though not yet scaling to tasks as large as ImageNet.
Simple to implement:
http://arxiv.org/abs/1507.00677
Complicated to implement:
http://arxiv.org/abs/1507.02672
http://arxiv.org/abs/1511.06390
http://arxiv.org/abs/1511.06440
"do we need to appx. provide the same amount of training images per class or can the amount per class be disparate?"
It should work with different numbers of examples per class.
"what is the impact of wrong image data in the training data? E.g. 500 images of a tennis shoe and 50 of other shoes."
You should use the label smoothing technique described in this paper:
http://arxiv.org/abs/1512.00567
Smooth the labels based on your estimate of the label error rate.
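Concretely, with smoothing parameter eps set to your estimated label error rate, each hard one-hot target is mixed with a uniform distribution over the K classes. A small NumPy sketch:

    import numpy as np

    def smooth_labels(one_hot, eps):
        """Turn hard 0/1 targets into (eps/K, ..., 1 - eps + eps/K)."""
        k = one_hot.shape[-1]
        return one_hot * (1.0 - eps) + eps / k

    y = np.array([[0.0, 1.0, 0.0]])
    print(smooth_labels(y, eps=0.1))  # -> [[0.0333 0.9333 0.0333]]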
"is it possible to train a classifier with much more classes than the recently published inception-v3 model? Let's say: 30.000."
Yes.
How many images per class should be provided at a minimum?
Do we need to provide approximately the same number of training images per class, or can the amount per class be disparate?
What is the impact of wrong image data in the training data? E.g., 500 images of a tennis shoe and 50 of other shoes.
These three questions are not really TensorFlow-specific. The short answer is: it depends on how resilient your model is to unbalanced datasets and noisy labels.
Is it possible to train a classifier with many more classes than the recently published Inception-v3 model? Let's say 30,000.
Yes, definitely. This would mean a much larger classifier layer, so your training time might be longer, but other than that there are no limitations in TensorFlow.
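For a sense of scale, the only structural change is the width of the final layer. A minimal Keras sketch (the 2048-dimensional feature input is an assumption standing in for your CNN backbone):

    import tensorflow as tf

    NUM_CLASSES = 30000

    # Only the last layer grows with the number of classes; its weight
    # matrix here is 2048 x 30000, which mainly costs memory and time.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax",
                              input_shape=(2048,)),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")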