Train model to detect only 1 class label - machine-learning

I want to train a model that only tells me whether an image is a dog or not a dog, by using a sigmoid activation function in the output layer.
Normally, though, we train on two classes (dog vs cat, dog vs cow); now I want to detect dog vs everything else, so is there any way to do that? Because if we only train dog vs cat and I then test with a human face, it could well end up classified as a dog...
I tried with Keras, but it seems impossible so far.
I also do not understand why, in object detection, we can train a model with only one class label and it doesn't classify unrelated objects as the one we trained on (e.g. we only detect dogs in an image, yet books and humans are not also detected as dogs).

If your issue is that your target feature has multiple classes such as Dog, Cow, Cat, etc., and you want to classify the images as Dog vs Not Dog, then you can simply change the labels in your dataframe:
1 for images of dogs.
0 for all other images.
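
As a rough sketch of that relabeling plus a sigmoid output (assuming the Keras setup mentioned in the question; the hypothetical dataframe `df` with "filename" and "class" columns, the input size and the tiny architecture are all placeholders):

```python
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical dataframe with image paths and the original multi-class labels
df = pd.read_csv("labels.csv")                                  # columns: "filename", "class"
df["label"] = (df["class"] == "dog").astype(int).astype(str)    # "1" = dog, "0" = everything else

# Binary classifier: a single sigmoid unit trained with binary cross-entropy
model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # P(dog); "not dog" is 1 - P(dog)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# flow_from_dataframe with class_mode="binary" expects string labels
gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
train = gen.flow_from_dataframe(df, x_col="filename", y_col="label",
                                target_size=(128, 128), class_mode="binary")
model.fit(train, epochs=10)
```

With this setup the "not dog" class simply contains every non-dog image you have (cats, cows, humans, etc.), so the model learns dog vs everything else rather than dog vs one other animal.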

Related

Labeling images for yolo v4

I'm trying to label some images to train a YOLO model and I have two questions:
1 - I want to locate standing persons in the images (the images contain standing persons and/or persons lying on the ground, or no person at all), so do I only have to label the standing persons? Won't the persons lying on the ground (basically rotated 90°) be a problem for the model training?
2 - Should I pass some images with no person at all to the model training, or only images with one or more people?
Thanks in advance!
In this case a deep learning based object detection model should, in theory, be able to learn standing persons from labelling only the standing persons, but I would recommend labelling the persons lying on the ground as well, with another class label. This will help the model learn features that discriminate between standing and lying persons.
Any object detection model suffers from class imbalance between foreground and background: in most real world scenes the image region/pixels occupied by the foreground objects (in this case persons) are very small compared to the background region, and this imbalance biases the detector towards the background class. Go through this article for more details.
So adding images with zero persons would make the foreground/background class imbalance problem even worse.

How do masks and images work with each other in UNET?

Let's say we have 1000 images with their corresponding masks. Correct me if I am wrong: if we use UNET, the image passes through a number of convolutional layers, ReLU activations, pooling, etc., and the network learns the features of the images according to their corresponding masks. It assigns labels to objects and learns the features of the images we pass during training. It matches the object in the image with its corresponding mask, so it learns the features of that object only and not the features of irrelevant objects. For example, if we pass the image of a cat whose background is filled with irrelevant clutter (bins, tables, chairs, etc.), then according to the mask of the cat it will learn the features of cats only. Please elaborate in your answer if I am wrong.
Yes, you are right.
However, this is not specific to UNET: every segmentation algorithm works the same way, in that it learns to detect the features that are masked while ignoring irrelevant objects (as you mentioned).
By the way, people typically choose Fast R-CNN or YOLO over UNET for multi-class segmentation of real world objects (chairs, tables, cats, cars, etc.).
So here is a short explanation (though not an exhaustive one).
1 - Every segmentation network, or let's say segmentation task (in more general terms), uses the actual image and the ground truth (your masks) to learn a classification task.
Is it really a classification task like logistic regression or a decision tree? (Then why such a complex name?)
Ans: Intrinsically, YES, your network is learning to classify. But it is a bit different from your decision tree or logistic regression.
A network like UNET tries to learn how to classify each pixel in the image. This learning is completely supervised, because you have a ground truth (the masks) which tells the network which class each pixel in the image belongs to. Hence, during training the network weights (the weights of all your conv layers and so on) are adjusted so that it learns to classify each pixel in the image into its corresponding class.
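
As an illustration of that per-pixel classification view, here is a minimal sketch (not the full UNET architecture; the input size, the two-class setup and the toy encoder/decoder are assumptions) of how the output and loss look in Keras:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 2          # assumed: "cat" vs "background"
H, W = 128, 128          # assumed input size

# A toy encoder-decoder standing in for UNET: the key point is the
# per-pixel softmax at the end, giving one class distribution per pixel.
model = models.Sequential([
    layers.Input(shape=(H, W, 3)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),                                # downsample (encoder)
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.UpSampling2D(),                                # upsample back (decoder)
    layers.Conv2D(NUM_CLASSES, 1, activation="softmax"),  # output: (H, W, NUM_CLASSES)
])

# The mask is an (H, W) map of integer class ids, so the loss is just
# ordinary cross-entropy applied independently at every pixel.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(images, masks, ...)   # images: (N, H, W, 3), masks: (N, H, W)
```

Everything outside the mask is simply another class (background), which is why the clutter around the cat does not get learned as "cat".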

Can a generally trained deep learning classifier be used to classify within subclasses?

Suppose a deep learning classifier is trained to distinguish between images of cars, ships, trucks, birds, horses and dogs, but all the training data for the birds were yellow birds.
Can the trained classifier then be used to detect only yellow birds within a bird image dataset? Image data is just an example here; the data could be other things, like DNA sequences. Please bear with me if the question is nonsensical or too basic.
In the example you mention, you are not training your classifier to discriminate between cars, ships, trucks, birds, horses and dogs, but between the first five things you mentioned and yellow birds. This means that, assuming your model performs well and your dataset was sufficiently large, you can expect the score from the bird output unit to discriminate between yellow birds and other objects, even when those other objects are birds of a different colour. Of course, there is some small probability that it learned to discriminate between birds and other objects using only shapes, but in my opinion it is too small to be taken into account. In any case, you can check this by simply generating an appropriate test dataset.
In general it depends on many factors. One of them is the architecture and design of your network: discriminating yellow birds from differently coloured ones should be easy, because convolutions over colour images pick up colour directly; in other cases it might not be so obvious. Another factor is how conceptually distant the classes you want to discriminate are from each other. If, for example, the other class can be built out of the same concepts as the learnt one, you might have a problem, because the network might simply learn those concepts as indicators of yellow birds.
So the best thing to do is to design an appropriate test dataset and compare the scores across classes. If you show that this score performs well, then you are done. If not, you need to retrain your network.
It depends mainly on which features the classifier captured to detect birds. If the main criteria were, for example, wing-like shapes and beaks, then yellow birds will be almost indistinguishable from other birds.
On the other hand, if the yellow colour was indeed important for classification, then yellow birds will be labeled as birds with higher confidence than birds of any other colour. For instance, a yellow parrot might be "80% bird, 10% cat" while a white swan is "60% bird, 30% fish". However, you can't rely on this in advance.
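
As a rough sketch of the suggested check (the saved model, the index of the bird output unit, the preprocessing and the two evaluation folders are assumptions), you could compare the bird score on yellow vs other-coloured birds like this:

```python
import numpy as np
import tensorflow as tf

# Assumed: a trained 6-class classifier and the index of its "bird" output unit
model = tf.keras.models.load_model("six_class_classifier.h5")
BIRD_IDX = 3

def mean_bird_score(folder):
    """Average softmax score of the 'bird' unit over all images in a folder."""
    ds = tf.keras.utils.image_dataset_from_directory(
        folder, labels=None, image_size=(224, 224), batch_size=32)
    probs = model.predict(ds.map(lambda x: x / 255.0))
    return float(np.mean(probs[:, BIRD_IDX]))

# Hypothetical evaluation folders
yellow = mean_bird_score("eval/yellow_birds")
other = mean_bird_score("eval/other_birds")
print(f"bird score on yellow birds: {yellow:.3f}, on other-coloured birds: {other:.3f}")
# A large gap suggests the classifier relied on colour and can separate the subclass;
# similar scores suggest it keyed on shape (wings, beaks) and cannot.
```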

Extracting Image Attributes

I am doing a project in computer vision and I need some help.
The objective of my project is to extract the attributes of any object - for example if I have a Nike running shoe, I should be able to figure out that it is a shoe in the first place, then figure out that it is a Nike shoe and not an Adidas shoe (possibly because of the Nike tick) and then figure out that it is a running shoe and not football studs.
I have started off by treating this as an image classification problem and I am using the following steps:
I have taken training samples (around 60 each) of say shoes, heels, watches and extracted their features using Dense SIFT.
Creating a vocabulary using k-means clustering (arbitrarily chosen the vocabulary size to be 600).
Creating a Bag-Of-Words representation for the images.
Training an SVM classifier to obtain a bag-of-words (feature vector) for every class (shoe, heel, watch).
For testing, I extracted the feature vector for the test image and found its bag-of-words representation from the already created vocabulary.
I compared the bag-of-words of the test image with that of each class and returned the class which matched closest.
I would like to know how I should proceed from here. Will feature extraction using Dense SIFT help me identify the attributes, given that it only represents the gradients around certain points?
Also, sometimes my classification goes wrong: for example, if I have trained the classifier with images of a left shoe and a watch, a right shoe gets classified as a watch. I understand that I have to include right shoes in my training set to solve this problem, but is there any other approach I should follow?
Also, is there any way to capture the shape? For example, if I have trained the classifier on watches and the training set contains watches with both circular and rectangular dials, can I identify the dial shape of any new test image? Or do I simply have to train it separately for watches with circular and rectangular dials?
Thanks
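
For reference, here is a minimal sketch of the dense-SIFT + bag-of-words + SVM pipeline described in the question (OpenCV's SIFT and scikit-learn's KMeans/LinearSVC are assumptions about the tooling; the grid step, image paths and labels are placeholders):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

VOCAB_SIZE = 600                 # vocabulary size chosen in the question
sift = cv2.SIFT_create()

def dense_sift(img, step=8):
    """Dense SIFT: compute SIFT descriptors on a regular grid of keypoints."""
    kps = [cv2.KeyPoint(float(x), float(y), float(step))
           for y in range(0, img.shape[0], step)
           for x in range(0, img.shape[1], step)]
    _, desc = sift.compute(img, kps)
    return desc

def bow_histogram(desc, kmeans):
    """Bag-of-words: histogram of visual-word assignments, L1-normalised."""
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=VOCAB_SIZE).astype(float)
    return hist / (hist.sum() + 1e-8)

# Placeholder training data: image paths and their class labels
train_paths = ["shoe1.jpg", "shoe2.jpg", "heel1.jpg", "watch1.jpg"]
train_labels = ["shoe", "shoe", "heel", "watch"]

# Build the vocabulary from all training descriptors, then the BoW vectors
train_descs = [dense_sift(cv2.imread(p, cv2.IMREAD_GRAYSCALE)) for p in train_paths]
kmeans = KMeans(n_clusters=VOCAB_SIZE).fit(np.vstack(train_descs))
X = np.array([bow_histogram(d, kmeans) for d in train_descs])
svm = LinearSVC().fit(X, train_labels)

# Classify a new image
test_desc = dense_sift(cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE))
print(svm.predict([bow_histogram(test_desc, kmeans)]))
```

Note that a bag-of-words histogram throws away the spatial layout of the visual words, which is part of why gradient-based visual words alone may struggle with attributes such as dial shape.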

What kind of feature vector is best to detect whether there is a car in a car park slot?

My aim is to detect whether a parking slot is empty or occupied by a car. In the end, the number of cars in the car park will be counted.
The camera monitors the car park as seen in the sample pictures. Each parking slot is represented by very few pixels. I select four pixel points to define the ROI and apply a perspective transformation to the image, please see Image 1.
An SVM would be a nice approach to train on the samples and classify them. Unfortunately, I am not sure about the feature vector.
The challenges:
- Shadows of the cars in adjacent slots
- A car in one slot is partially visible in another slot
- Shadows of the big buildings
- Weather changes (sunny, cloudy, etc.)
- After rain the slot color changes (dry vs wet)
- Different slots and perspective changes
What kind of features or feature vectors would be the best for the classification?
Thank you in advance,
A color histogram could already be enough if you have enough training data. You can train with shadowed, partly shadowed, and non-shadowed empty spots, as well as with different cars. If it is difficult to get enough training data, you could also use synthetic data (render cars and shadows onto the images).
So it is not only a question about features, but also about training samples.
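
As a minimal sketch of that suggestion (the perspective-corrected slot crops, the HSV histogram parameters and the labels are assumptions/placeholders), a color-histogram feature plus an SVM could look like this:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def color_histogram(roi_bgr, bins=8):
    """3D HSV color histogram of a rectified parking-slot crop, L1-normalised."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins, bins, bins],
                        [0, 180, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-8)

# Placeholder training data: perspective-corrected slot crops and their labels
slot_paths = ["slot_001.png", "slot_002.png", "slot_003.png", "slot_004.png"]
labels = [1, 0, 1, 0]            # 1 = occupied, 0 = empty

X = np.array([color_histogram(cv2.imread(p)) for p in slot_paths])
clf = SVC(kernel="rbf").fit(X, labels)

# Classify a new slot crop; counting occupied slots across the lot is then a loop over crops
new_slot = cv2.imread("slot_new.png")
print("occupied" if clf.predict([color_histogram(new_slot)])[0] == 1 else "empty")
```

Using the HSV colour space rather than raw BGR makes the histogram somewhat less sensitive to brightness changes, though covering shadows, weather and wet asphalt is mainly a matter of having those conditions in the training samples, as noted above.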
