I am building a system that classifies cars by damage severity. As part of it, I need a module that tells me whether an uploaded image is a car or not. I am using TensorFlow for this. The only idea I have is to put car images in one folder and random images of other things in another folder, but that is not feasible because I cannot collect images of every possible thing.
Is there any other solution for this ?
Thanks in advance.
First solution
You can find images of "every possible thing" in a dataset such as CIFAR-100 (you can drop the car/vehicle classes before training), then train your network to distinguish car images from everything else.
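A minimal sketch of this first solution, assuming you load CIFAR-100 through tf.keras and filter out the vehicle classes yourself. The label indices below are placeholders to verify against the fine-label list, and load_my_car_images() is a hypothetical helper that returns your own car photos resized to 32x32:

```python
import numpy as np
import tensorflow as tf

# Placeholder indices of vehicle-like fine labels -- check them against the
# CIFAR-100 fine-label list before using this for real.
VEHICLE_LABELS = [8, 13, 48, 58, 81, 85, 89, 90]

(x_train, y_train), _ = tf.keras.datasets.cifar100.load_data(label_mode="fine")
mask = ~np.isin(y_train.flatten(), VEHICLE_LABELS)
negatives = x_train[mask]              # "everything else" images
positives = load_my_car_images()       # hypothetical helper: your car photos as (N, 32, 32, 3)

x = np.concatenate([positives, negatives[:len(positives)]]) / 255.0
y = np.concatenate([np.ones(len(positives)), np.zeros(len(positives))])

# Small binary car / not-car classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=10, validation_split=0.1)
```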
Second solution
Use a pretrained model. Many models available for TensorFlow have already been trained to recognize cars; you just have to pick one.
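As a rough illustration of this second solution, an ImageNet-pretrained classifier such as MobileNetV2 can be reused as a car / not-car gate. The keyword list below is an assumption you would want to tune, since ImageNet has several car-like labels (sports_car, convertible, cab, jeep, ...):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions)

# Assumed keyword list: any top prediction containing one of these counts as "car".
CAR_KEYWORDS = ("car", "cab", "convertible", "jeep", "limousine",
                "minivan", "pickup", "racer")

model = MobileNetV2(weights="imagenet")

def looks_like_a_car(image_path, top=5):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(
        tf.keras.preprocessing.image.img_to_array(img), 0))
    preds = decode_predictions(model.predict(x), top=top)[0]
    # preds is a list of (wordnet_id, label, score) tuples
    return any(any(k in label for k in CAR_KEYWORDS) for _, label, _ in preds)

print(looks_like_a_car("upload.jpg"))  # path is a placeholder
```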
Third solution
If you have a folder of car images, you can train a Generative Adversarial Network (GAN) to generate pictures of cars from a random vector; after training, the discriminator should be able to recognise cars!
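A very rough sketch of the discriminator half of such a setup (DCGAN-style, with arbitrary layer sizes); after adversarial training on car photos, its real/fake score can be probed as a crude "car-ness" signal:

```python
import tensorflow as tf

# Illustrative discriminator only -- the generator and the adversarial
# training loop are omitted, and the layer sizes are arbitrary.
def build_discriminator(img_shape=(64, 64, 3)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same",
                               input_shape=img_shape),
        tf.keras.layers.LeakyReLU(0.2),
        tf.keras.layers.Conv2D(128, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(real car photo)
    ])
```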
Related
If I have Dataset 1 with 90% cat images and 10% dog images, and I combine it with Dataset 2, which contains only dogs, to equalize the class imbalance, will my model learn to classify cats vs. dogs, or Dataset 1 images vs. Dataset 2 images?
If it's the latter, how do I get the model to classify between cats and dogs?
Your model will only do what it is trained for, regardless of what name your dataset(s) have.
The name of a dataset is just an organizational detail; it does not enter training and has no effect on the loss produced during a training step. What does affect your model's responses are the properties of the data.
Sometimes data from different datasets have different properties even though the datasets serve the same purpose, such as images with different illumination, background, or resolution. That certainly has an effect on model performance, which is why mixing datasets should be done with caution. You might find it useful to have a look at this paper.
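As a small illustration (not tied to the paper above), one way to mix two image sources so the classes end up balanced is tf.data's sample_from_datasets. Directory names here are placeholders, and older TF versions expose the function under tf.data.experimental:

```python
import tensorflow as tf

def folder_ds(path, label):
    # Stream (image, label) pairs from a folder of JPEGs; repeat so the
    # smaller source never runs out while sampling.
    files = tf.data.Dataset.list_files(path + "/*.jpg", shuffle=True)
    def load(fname):
        img = tf.io.decode_jpeg(tf.io.read_file(fname), channels=3)
        img = tf.image.resize(img, (224, 224)) / 255.0
        return img, label
    return files.map(load).repeat()

cats = folder_ds("data/cats", 0)        # Dataset 1 cats
dogs = folder_ds("data/all_dogs", 1)    # Dataset 1 dogs + Dataset 2 dogs merged on disk

# Draw from each source with equal probability so the batches are balanced.
balanced = tf.data.Dataset.sample_from_datasets([cats, dogs], weights=[0.5, 0.5])
train_ds = balanced.batch(32).prefetch(tf.data.AUTOTUNE)
```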
I want to build a face detector/classifier to generate a network that detects whether a face is present in an image/video.
I understand the basic concept, but what I have problems with is the choice of the number of classes.
Initially, I thought that two classes (with face / without face) would be sufficient. However, I was unsure which data to use for the 'without face' class, so I threw together datasets of equipment, plants, and animals, whereupon the classes became very unbalanced, which is apparently not good.
Then I thought it would be better to use as many classes as possible.
But again, I am unsure what would be the best/common approach to the problem?
You can experiment with any number of samples and different images for the negative class. If the equipment/plant/animal datasets you have are imbalanced, you can subsample them, e.g. pick 100 images from each.
Just don't make the negative class too huge relative to the number of images with human faces you have. The rest is up to experimentation.
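A tiny sketch of that subsampling step, with placeholder folder names:

```python
import random
from pathlib import Path

# Take at most N images from each "negative" folder so the no-face class
# stays comparable in size to the face class.
N_PER_FOLDER = 100
negative_dirs = ["data/equipment", "data/plants", "data/animals"]

negative_files = []
for d in negative_dirs:
    files = list(Path(d).glob("*.jpg"))
    random.shuffle(files)
    negative_files.extend(files[:N_PER_FOLDER])

face_files = list(Path("data/faces").glob("*.jpg"))
print(len(face_files), "positives vs", len(negative_files), "negatives")
```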
I'm trying to figure out a question. I am working with a big dataset of pictures in which almost every picture contains just one person, and every class is supposed to represent a different person. For some reason, roughly 1 out of 1000 pictures in every class contains a face that does not belong to that class (it is not the same person as in the other pictures of that class); in fact, the mislabeled person does not belong to any class. My question is: what happens during the learning process? Does the convnet learn that this face is not useful for the task, or does it introduce some kind of error? I ask because I need to know whether I should remove these "noisy" pictures for better performance, or whether the error would be negligible. Thank you all in advance.
Misleading targets will definitely add noise to your data, and training becomes much more unstable if a significant amount of your data is incorrectly labeled. However, with a 1/1000 ratio of incorrectly labeled data, it won't affect training much unless you are using weighted classes.
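To illustrate the class-weight caveat with a hedged, generic example (model, x_train and y_train are assumed to come from your own pipeline): with plain training, the rare wrong labels barely move the loss, but a large per-class weight amplifies whatever label noise that class contains along with the correct samples.

```python
# Hypothetical example: `model`, `x_train`, `y_train` come from your own code.
# Unweighted training dilutes the 1-in-1000 mislabeled samples; a heavily
# weighted class multiplies their contribution to the loss.
class_weight = {0: 1.0, 1: 50.0}   # illustrative weights only
model.fit(x_train, y_train, epochs=5, class_weight=class_weight)
```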
By the way, if you are trying to build a model that classifies a person by their face image, you might want to add other features, such as eye positions, skin color, etc.
I understand that in zero-shot learning the classes are divided into seen and unseen categories. We then train the network on, for example, 50 classes and test on the other 50 classes that the network has never seen. I also understand that the network uses attributes for the unseen classes (I am not sure how they are used). My question is: how does the network classify the unseen classes? Does it actually label each class by its name? For example, if I am doing zero-shot action recognition and the unseen classes are biking, swimming, and football, does the network actually name these classes? How does it know their labels?
The network uses the seen classes to learn a relation between images and attributes, or other side information such as human gaze, word embeddings, or anything else that connects classes to images. Based on what the network learns, this relation can then be mapped to new objects and their attributes.
Say your classifier sees pig, dog, horse, and cat images and their attributes during training time and has to classify a zebra at test time. During training it learns the relation between image pixels and attributes such as 'stripes', 'tail', 'black', 'white', and so on.
So during test time, given an image and the attributes of a zebra, the classifier has to figure out whether they are related or not. Of course, you could also be given an image of a horse, which looks like a zebra, so your classifier must learn to generalize well.
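A minimal sketch of one common attribute-based recipe (not the only one) may make the "how does it know the label" part concrete. The class names and attribute vectors below are made-up placeholders; in practice they come from an attribute table such as the ones shipped with AwA or CUB, and the predicted label is simply the name attached to the nearest attribute vector:

```python
import numpy as np
import tensorflow as tf

# Placeholder attribute table: class name -> (stripes, tail, four_legs) scores.
ATTRS = {
    "zebra":  np.array([1.0, 1.0, 1.0]),
    "biking": np.array([0.0, 0.0, 0.0]),
}

# 1) On the *seen* classes, train a network that maps an image to its
#    attribute vector (regression instead of a softmax over class names).
attr_net = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(next(iter(ATTRS.values())))),  # one output per attribute
])
attr_net.compile(optimizer="adam", loss="mse")
# attr_net.fit(seen_images, seen_attribute_vectors, ...)  # training data assumed available

# 2) At test time, predict the attributes of an image from an *unseen* class and
#    return the name of the unseen class whose attribute vector is closest.
def zero_shot_label(image):
    pred = attr_net.predict(image[None, ...])[0]
    return min(ATTRS, key=lambda name: np.linalg.norm(ATTRS[name] - pred))
```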
I am using TensorFlow (object detection) on my own dataset (drone recognition) with only one class, named 'drone'. After about 30,000 training steps, my model can detect drones with very high accuracy. However, I have a problem: I used the ssd_inception_v2_coco model and its fine_tune_checkpoint from the model zoo, and now, during real-time detection, it sometimes detects a human face as a drone (two very different objects). I think this is because of the old checkpoint.
How can I prevent the detection of objects that are very different from my drone object, such as humans, dogs, or cats? Or can someone explain what the problem is here?
Sorry for my bad English.
Even if you train an SSD for one class, it automatically creates another class called background. The background class is trained using the regions of the training images that are not labeled as any of the desired classes (in your case, drone).
An easy way out is to add training samples that include both drones and the things you don't want recognized as drones in the same scene. Doing this and then increasing the number of epochs should improve precision.
If your application involves frequent occurrences of certain objects alongside drones, another possibility is to train the network on those objects too. This increases your training workload but improves accuracy.
Some implementations of SSD have an option for hard negative mining of data, so that mistakes made during validation are specifically reused during training (a generic sketch of the idea follows below). If you are familiar with the code, you might want to check whether this option is available.
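For reference, a generic sketch of the hard-negative-mining idea, independent of any particular SSD codebase, with illustrative shapes and a 3:1 negative-to-positive ratio:

```python
import tensorflow as tf

def hard_negative_mask(cls_loss, is_positive, neg_pos_ratio=3):
    """cls_loss: [num_anchors] per-anchor loss; is_positive: [num_anchors] bool.

    Instead of training on every background anchor, keep only the background
    anchors with the highest classification loss, at a fixed ratio to the
    number of positive anchors.
    """
    num_pos = tf.reduce_sum(tf.cast(is_positive, tf.int32))
    num_neg = tf.minimum(neg_pos_ratio * num_pos,
                         tf.size(cls_loss) - num_pos)
    # Rank only the background anchors by their loss.
    neg_loss = tf.where(is_positive, tf.zeros_like(cls_loss), cls_loss)
    _, hard_idx = tf.math.top_k(neg_loss, k=num_neg)
    neg_mask = tf.scatter_nd(hard_idx[:, None],
                             tf.ones_like(hard_idx, dtype=tf.float32),
                             tf.shape(cls_loss)) > 0
    # Train on all positives plus only the hardest negatives.
    return is_positive | neg_mask
```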