Does an ML model classify between desired image classes or by datasets?

If I had Dataset 1 with 90% cat images and 10% dog images, and I combined it with Dataset 2, which contains only dogs, to equalize the class imbalance, will my model classify which images are cats and which are dogs, or which are Dataset 1 images and which are Dataset 2 images?
If it's the latter, how do I get the model to classify between cats and dogs?

Your model will only do what it is trained to do, regardless of what your dataset(s) are named.
The name of a dataset is purely an organizational matter: it does not enter training and has no effect on the loss produced during a training step. What does affect your model's responses, however, is the properties of the data.
Data from different datasets often have different properties even when the datasets serve the same purpose, such as images with different illumination, backgrounds, or resolutions. That certainly has an effect on model performance, which is why mixing datasets should be done with caution. You might find it useful to have a look at this paper.
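To make the point concrete, here is a minimal sketch (using PyTorch, with random tensors standing in for the two hypothetical datasets): once the datasets are merged and shuffled, only (image, label) pairs reach the model, and dataset membership is invisible to the loss.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Hypothetical stand-ins: Dataset 1 (90 cats, 10 dogs), Dataset 2 (dogs only).
# Labels: 0 = cat, 1 = dog. The dataset of origin is not part of a sample.
dataset1 = TensorDataset(torch.randn(100, 3, 64, 64),
                         torch.tensor([0] * 90 + [1] * 10))
dataset2 = TensorDataset(torch.randn(80, 3, 64, 64),
                         torch.tensor([1] * 80))

# After concatenation and shuffling, each batch mixes both sources freely.
merged = ConcatDataset([dataset1, dataset2])
loader = DataLoader(merged, batch_size=32, shuffle=True)

for images, labels in loader:
    ...  # forward pass; the loss only ever sees the cat/dog labels
```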

Related

How can I train a single model when I have two different sets of data to train with, simultaneously?

I am currently working on a model that has to predict materials such as ladders, nuts, bolts, mice, bottles, etc. I have written an algorithm for this which is working okay as of now. The set of images is available on my local computer, and I have enough data for both training and testing. As of now, I have a total of 26 image classes to predict from, all of them material types.
Now, this is fine, but I also want the model to handle the case where an image doesn't belong to these classes: it should specify that this is not a material but a different picture altogether.
To do this I am thinking of additionally training my model on a different set of images (e.g. ImageNet), so that just by looking at any non-material image it would return something like "this is not a material!"
So basically, the same model would be trained on two different datasets: one is my material dataset, the other is anything other than materials, such as images from ImageNet.
My question is: how do I approach this? Do I even need to do this, or can I just write a simple if-else and treat anything the model does not recognize as a material as the non-material type?
You can simply merge the two datasets and label the images that do not belong to the said 26 classes as a special 27th class. Whenever your model predicts that class, you know the image is not part of your domain. For example:
pred = [0.1, 0.1, 0.8]  # assume label 2 is the not-this-dataset label
Then you can use images from the other dataset with that special label and train as usual. Make sure to balance the merged dataset: if there are proportionally too many special not-this-dataset samples, your model will overfit and predict that everything is outside your original dataset.
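A minimal sketch of this merge, again with random tensors standing in for the two datasets; the weighted sampler is one hypothetical way of keeping the special class from dominating:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

NUM_MATERIAL_CLASSES = 26
OTHER = NUM_MATERIAL_CLASSES  # index 26: the special "not a material" class

# Hypothetical stand-ins: material images labelled 0..25, plus
# ImageNet-style images all given the special 27th label.
materials = TensorDataset(torch.randn(2600, 3, 64, 64),
                          torch.randint(0, NUM_MATERIAL_CLASSES, (2600,)))
others = TensorDataset(torch.randn(400, 3, 64, 64),
                       torch.full((400,), OTHER))

images = torch.cat([materials.tensors[0], others.tensors[0]])
labels = torch.cat([materials.tensors[1], others.tensors[1]])

# Sample inversely to class frequency so the not-this-dataset class
# neither dominates training nor disappears from it.
counts = torch.bincount(labels, minlength=NUM_MATERIAL_CLASSES + 1).float()
weights = (1.0 / counts)[labels]
sampler = WeightedRandomSampler(weights, num_samples=len(labels))

loader = DataLoader(TensorDataset(images, labels), batch_size=32, sampler=sampler)
```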

Tackling class imbalance in a single-shot object detector

I am training an object detection model for multiple object classes. The dataset is custom-collected and labelled, with bounding boxes and class labels in the ground truth data.
I trained MobileNet+SSD, SqueezeDet and YOLOv3 networks on this custom data but get poor results. The rationale for choosing these models is their fast performance and light weight (low memory footprint), and their single-shot detector approach is shown to perform well in the literature as well.
The class instance distribution in the dataset is as below
Class 1 -- 2469
Class 2 -- 5660
Class 3 -- 7614
Class 4 -- 13253
Class 5 -- 35262
Each image can contain objects from any of the five classes. Classes 4 and 5 have a very high incidence.
The performance is very skewed, with high recall and Average Precision for classes 4 and 5, and an order of magnitude lower for the other three classes.
I have tried fine-tuning different filtering parameters, the NMS threshold, and model training parameters, to no avail.
Question:
How do I tackle such class imbalance to boost the detection Average Precision and object detection accuracy for all classes in object detection models?
Low precision means your model is suffering from false positives, so you can try hard negative mining: run your model, find the false positives, and include them in your training data as negative examples. You can even try using only these false positives as negatives.
Another option, as you might expect, is collecting more data if possible.
If that is not possible, you may consider adding synthetic data, e.g. changing the brightness of an image or its viewpoint (multiplying by a matrix so it looks stretched); see the sketch after this answer.
One last option is to have an equal amount of data for each class, e.g. 5k per class.
PS: Keep in mind that the flexibility of your model has a great impact, so be aware of overfitting and underfitting.
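As a concrete illustration of such synthetic data, here is a sketch using torchvision transforms (an assumption; any augmentation library would do) that applies brightness and viewpoint perturbations randomly rather than to every image:

```python
from torchvision import transforms

# Each transform fires randomly, so no single variation is applied uniformly.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.3),       # illumination
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),  # viewpoint "stretch"
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Note: for object detection, the ground-truth bounding boxes must be
# transformed consistently with the image; this sketch shows images only.
```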
When generating synthetic data, as mentioned by the previous author, do not apply illumination or viewpoint variations etc. to your whole dataset, but rather randomly. The per-class instance counts are also badly imbalanced, so it is best to either limit the larger classes or gather more data for the smaller ones. You could also try applying class weights to down-weight the over-represented classes (one hypothetical weighting is sketched below). You are making a lot of assumptions; simple experimentation will yield results that could surprise you. Remember, deep learning is part science and a lot of art.
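One hypothetical way to derive such class weights from the instance counts in the question is an inverse-frequency heuristic:

```python
import torch

# Instance counts for classes 1-5, taken from the question.
counts = torch.tensor([2469, 5660, 7614, 13253, 35262], dtype=torch.float)

# Inverse-frequency weights, normalized so they average to 1:
# rare classes get weights > 1, frequent classes < 1.
weights = counts.sum() / (len(counts) * counts)

# Hypothetical use in the classification part of a detection loss:
criterion = torch.nn.CrossEntropyLoss(weight=weights)
```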

Zero-shot learning

I understand that in zero-shot learning the classes are divided into seen/unseen categories. We then train the network on, for example, 50 classes and test on the other 50 classes that the network has not seen. I also understand that the network uses attributes of the unseen classes (I am not sure how they are used). My question is: how does the network classify the unseen classes? Does it actually label each class by its name? For example, if I am doing zero-shot action recognition and the unseen classes are biking, swimming and football, does the network actually name these classes? How does it know their labels?
The network uses the seen classes to learn a relation between images and attributes, or other side information such as human gaze, word embeddings, or whatever else can relate classes to images. What it learns can then be mapped to new objects through their attributes.
Say your classifier sees images of pigs, dogs, horses and cats, together with their attributes, during training, and has to classify a zebra at test time. During training it learns the relation between image pixels and attributes such as 'stripes, tail, black, white, ...'.
At test time, given an image and the attributes of a zebra, you use the classifier to figure out whether they are related. Of course, you could also be given an image of a horse, which looks like a zebra, so your classifier must learn to generalize well.
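To make the mechanism explicit, here is a minimal sketch (all class names, attribute values and the attr_model are hypothetical): the network never invents class names; it predicts attributes, and the names of the unseen classes come from a human-provided table of attribute vectors:

```python
import torch

# Attribute vectors for the *unseen* classes (columns: stripes, tail,
# black, white). The values are made up for illustration.
class_attrs = torch.tensor([[1.0, 1.0, 1.0, 1.0],   # zebra
                            [0.0, 1.0, 0.0, 0.0]])  # horse
class_names = ["zebra", "horse"]

def zero_shot_predict(attr_model, image):
    """attr_model: a network trained on *seen* classes to predict attributes."""
    pred_attrs = attr_model(image)  # image -> predicted attribute vector
    # Pick the unseen class whose attribute vector is closest; the "label"
    # is simply that class's known name, not something the network invents.
    sims = torch.nn.functional.cosine_similarity(
        pred_attrs.unsqueeze(0), class_attrs, dim=1)
    return class_names[sims.argmax().item()]
```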

Finding a suitable CNN architecture for classification

I want to use a convolutional neural network (CNN) to classify between two classes of images. I built several CNN architectures, but I always get the same result: the network classifies every case as belonging to the second class, so I always get 50% accuracy with leave-one-out evaluation. The data is balanced in the number of samples per class (16 from the 1st and 16 from the 2nd). Could you please clarify what this means?
With such a small number of training samples, your CNN model is very likely to overfit the data, giving good training accuracy and poor test accuracy.
Alternatively, your model may be skewed towards predicting the same class at all times.
Below are some solutions you can try:
1) As you have commented, if you cannot get any more images, try creating new images by modifying the ones already available. For example, say you have 16 images of a cat (cat being the class): you can crop the cat and paste it onto different backgrounds, vary the brightness and intensity, or apply rotations and translations.
This will help you create a good training set.
2) Try creating a smaller model (with one or two layers) and check whether it improves your accuracy.
3) Do transfer learning with a good pre-trained model, as it can learn quite well compared to building a model from scratch (a sketch follows below).
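A minimal transfer-learning sketch along these lines, assuming a recent torchvision and an ImageNet-pretrained ResNet-18 as the base (any pretrained backbone would do):

```python
import torch
import torchvision

# Reuse a pretrained backbone and train only a new 2-class head,
# which is far less prone to overfitting 32 images than a full model.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False  # freeze the pretrained backbone

model.fc = torch.nn.Linear(model.fc.in_features, 2)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
# ...train as usual; only the head's weights are updated.
```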

Type of recognition of a convolutional neural network

I was trying to create a convolutional neural network for the recognition of animals, vehicles, buildings, trees and plants from a large dataset containing a combination of these objects.
While training, a doubt arose about the way the network should be trained: should I train the network on the whole set of animals as a single class, or train each animal separately?
That is, one group for lions, one for tigers, one for elephants, etc., and at test time output 'animal' if any of its subcategories matches.
I have this doubt because I have read that there should be a consistent pattern in the dataset for efficient detection, and there is such a pattern only if we train with the subcategories of objects rather than the whole vast dataset.
I have attached a figure showing a sample dataset (only logically correct). I want to know whether there should be separate datasets or a single dataset.
Whether to train on separate datasets or a single dataset depends on a variety of factors. If you want the convolutional neural network to classify the images in your test dataset as just 'animals', without subdividing them further, then training on a single dataset is enough. However, if you plan to further sub-classify the images into tigers and lions, then training needs to be done on separate classes for tigers and lions.
The type of dataset you use for training will depend heavily on your classification requirements on the test dataset.
Moreover, make sure that you normalize the images before using them for training; a sketch of both points follows below.
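A small sketch of both points, assuming torchvision for preprocessing; the fine-to-coarse mapping shows one hypothetical way to train on subcategories (lions, tigers, ...) while still reporting 'animal' at test time:

```python
from torchvision import transforms

# Per-channel normalization; the mean/std here are the common ImageNet
# statistics -- recompute them from your own training data if possible.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical mapping from fine-grained classes to coarse categories:
# train on the subcategories, report the coarse label at test time.
fine_to_coarse = {"lion": "animal", "tiger": "animal", "elephant": "animal",
                  "car": "vehicle", "oak": "tree"}

def coarse_label(fine_prediction: str) -> str:
    return fine_to_coarse.get(fine_prediction, "unknown")
```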
