Labeling images for YOLOv4 - machine-learning

I'm trying to label some images to train a YOLO model and I have two questions:
1 - I want to locate standing persons in the images (the images consist of standing persons and/or persons lying on the ground, or no person at all), so do I basically have to label only the standing persons? Will the persons lying on the ground (basically rotated 90°) be a problem for the model training?
2 - Should I pass some images with no person at all to the model training, or only images with one or more people?
thanks in advance!

In this case, a deep-learning-based object detection model should theoretically be able to learn standing persons with only the standing persons labelled, but I would recommend labelling the persons lying on the ground as well, with another class label. This will help the model learn discriminating features between standing and lying persons.
Any object detection model suffers from class imbalance between foreground and background: most of the time, in any real-world scene, the image region/pixels occupied by the foreground objects (in this case persons) are very small compared to the background region, which creates class imbalance between foreground and background. This kind of class imbalance makes the object detection model biased towards the background class. Go through this article for more details.
So, adding images with zero persons will increase the foreground/background class imbalance problem even more.
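For reference, a minimal sketch of what the Darknet/YOLO label files could look like with the two classes suggested above (the class ids 0 = standing, 1 = lying are an assumption, as are the file names and box coordinates): one `.txt` file per image, one line per box, with coordinates normalised by the image size.

```python
# Minimal sketch: convert pixel boxes to YOLO-format label lines.
# Assumed class ids: 0 = standing person, 1 = lying person.

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Return one YOLO label line: '<class> <x_center> <y_center> <width> <height>' (normalised)."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: one standing and one lying person in a 1920x1080 image (hypothetical boxes).
lines = [
    to_yolo_line(0, 850, 200, 1010, 700, 1920, 1080),   # standing
    to_yolo_line(1, 300, 820, 900, 1000, 1920, 1080),   # lying
]
with open("image_0001.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```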

Related

Training an object detection model on custom data where images are missing labels for other objects in the dataset. Is it possible?

I am working on creating an object detection model that should be able to look at an image (and later watch a video) and label particular objects inside the image. However, in one dataset (the "gun" dataset), "officer" and "gun" are the two objects labelled, and if things like batons or riot shields happen to be inside an image they aren't labelled. There are, however, separate datasets for "riot shield" and "baton", because these are objects I want to detect. Equally, these two datasets sometimes happen to have guns inside them that aren't labelled, because they were collected only to recognize those individual objects.
Here is my question:
If I train the model on these datasets, and while it is training on the "gun" dataset, for example, it sees unlabelled riot shields, will those unlabelled objects conflict with the labelled images when it trains on the "riot shield" dataset and ruin the detection? If so, is there a way to isolate its training so it doesn't make assumptions about other objects that are unlabelled in images?
It's a problem if an object shows up unlabelled in a picture: the network is being given conflicting messages about what is and isn't a certain object. AFAIK there's no real way to isolate training. However, you can train on the gun dataset first and then have the resulting model run through and label all the guns in the other datasets. The same idea should work for each object you want to detect in the other datasets.
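A rough sketch of that pseudo-labelling idea: run the detector trained on the gun dataset over the other datasets and append YOLO-style label lines for confident detections. `run_gun_detector`, the class id and the confidence threshold are hypothetical placeholders for whatever your framework and label map actually provide.

```python
import os

CONFIDENCE_THRESHOLD = 0.7  # assumed: only keep confident detections as pseudo-labels
GUN_CLASS_ID = 2            # assumed id of "gun" in the merged label map

def run_gun_detector(image_path):
    """Hypothetical inference call; should return [(x_center, y_center, w, h, confidence), ...]
    with coordinates already normalised to the image size."""
    raise NotImplementedError

def pseudo_label_dataset(image_dir):
    for name in os.listdir(image_dir):
        if not name.lower().endswith((".jpg", ".png")):
            continue
        detections = run_gun_detector(os.path.join(image_dir, name))
        label_path = os.path.join(image_dir, os.path.splitext(name)[0] + ".txt")
        # Append, so the existing riot-shield / baton labels are kept.
        with open(label_path, "a") as f:
            for x, y, w, h, conf in detections:
                if conf >= CONFIDENCE_THRESHOLD:
                    f.write(f"{GUN_CLASS_ID} {x:.6f} {y:.6f} {w:.6f} {h:.6f}\n")
```

It is worth spot-checking a sample of the generated labels by hand before retraining, since low-quality pseudo-labels reintroduce exactly the conflicting signal described above.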

Does the presence of a particular object in all the images of a data set affect a CNN's performance

Context: I have partial views of different types of vehicles in my data set (partial images because of the limited field of view of my camera lens). These partial images cover more than half the vehicle and can be considered good representative images of the vehicle. The vehicle categories are car, bus and truck. I always get a wheel of the vehicle in these images, and because I am capturing these images during different parts of the day, the colour intensity of the wheels varies throughout the day. However, a wheel is definitely present in all the images.
Question: I wanted to know whether the presence of an object that appears in all the images of a data set, but is not logically useful for classification, will affect the CNN in any way. Basically, I wanted to know whether, before training the CNN, I should mask the object (i.e. black it out) in all the images, or just let it be there.
A CNN creates a hierarchical decomposition of the image into combinations of various discriminatory patterns. These patterns are learnt during training to find those that separate the classes well.
If an object is present in every image, it is likely that it is not needed to separate the classes and won't be learnt. If there is some variation in the object that is class-dependent, then maybe it will be used. It is really difficult to know beforehand which features are important. Maybe buses have shinier wheels than other vehicles, and this is something you have not noticed, and thus having the wheel in the image is beneficial.
If you have inadvertently introduced some class-specific variation, this can cause a problem for later classification. For example, if you only took photos of buses at night, the network might learn night = bus, and when you show it a photo of a bus during the day it won't classify it correctly.
However, using dropout in the network forces it to learn multiple features for classification, and not just rely on one. So if there is variation, this might not have as big an impact.
I would use the images without blanking anything out. Unless it is something simple such as background removal of particles etc., finding and blacking out the object adds another layer of complexity. You can test if the wheels make a big difference by training the network on the normal images, then classifying a few training examples with the object blacked out and seeing if the class probabilities change.
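A minimal sketch of that check, assuming a trained Keras-style classifier and a known wheel region per image (both are assumptions; adapt the preprocessing and region coordinates to your own setup):

```python
import numpy as np
from PIL import Image

def class_probs(model, image_path, mask_box=None, input_size=(224, 224)):
    """Return class probabilities, optionally blacking out a (left, top, right, bottom) box."""
    img = Image.open(image_path).convert("RGB")
    if mask_box is not None:
        pixels = np.array(img)
        left, top, right, bottom = mask_box
        pixels[top:bottom, left:right] = 0   # black out the wheel region
        img = Image.fromarray(pixels)
    img = img.resize(input_size)
    batch = np.expand_dims(np.array(img) / 255.0, axis=0)
    return model.predict(batch)[0]           # assumes a Keras-style predict()

# Compare predictions with and without the wheel visible (paths and box are hypothetical):
# p_full   = class_probs(model, "bus_001.jpg")
# p_masked = class_probs(model, "bus_001.jpg", mask_box=(400, 600, 700, 900))
# print(p_full, p_masked)
```

If the two probability vectors barely change, the wheels are probably not carrying much class information.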
Focus your energy on doing good data augmentation; that is where you will get the most gains.
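As one possible starting point (not a prescription), a small torchvision augmentation pipeline could look like this; the ranges are assumptions and should be tuned to the variation that actually occurs in your captures, e.g. brightness jitter to mimic the changing daylight:

```python
from torchvision import transforms

# One possible augmentation pipeline; tune the ranges to your own data.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])
```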
You can see an example of which features are learnt on MNIST in this paper.

Good approach for training a neural network

I am training a neural network model to differentiate between oranges and pomegranates.
In the training dataset, the background of the object (for both the orange and the pomegranate) is the same and constant. But while testing, the background of the object is different from what I trained with.
So my first doubt is,
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
Second, I trained with the object at different positions and the same background. The theory says that position doesn't matter for convolution: the network should be able to recognise the object placed anywhere, because after convolution the spatial dimensions of the activation maps decrease and the depth increases.
So my second doubt is,
Is it necessary or a good approach to keep the object at different positions while training the model?
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
When training a neural network, it is important to shuffle the dataset you are using and to split it into training and testing sets. The reason you need to shuffle the data is so that your model sees all types of samples during training; the moment it is exposed to new unseen data, it can relate it to previously seen data. In the example you mentioned above, it is important to shuffle the data because the different background colors can affect the prediction of the model. Therefore both the training and the testing set need to contain both background colors in order for your model to give good predictions.
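A minimal sketch of that shuffle-and-split step with scikit-learn; `load_dataset()` is a hypothetical stand-in for however you read your own image paths and labels:

```python
from sklearn.model_selection import train_test_split

# image_paths: list of file paths, labels: "orange" / "pomegranate" per path.
# load_dataset() is a hypothetical loader for your own data.
image_paths, labels = load_dataset()

# shuffle=True mixes backgrounds and positions before the split;
# stratify keeps the orange/pomegranate ratio the same in both sets.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, shuffle=True, stratify=labels, random_state=42
)
```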
Is it necessary or a good approach to keep the object at different positions while training the model?
It is indeed better to train your model with objects in different positions, because it helps your model handle more varied images of oranges or pomegranates. Note that if you are using different positions for the object you are trying to predict, it is important to have a sufficient amount of data in order for the model to give you good predictions over the testing set.
I hope this short explanation helped, if something isn't clear please let me know and I'll edit the post.
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
Background is a property of an image that is not required for distinguishing the object. You want your network to learn this behavior. Consider two cases now:
You give your network images with one background. Let's see what can possibly go wrong here.
Assume that your background is completely black. This means that a feature map (kernel) will produce zero output wherever it is applied to the background. Your network will learn that it can put arbitrarily high weights on these features and still do a good job during training, as long as those weights can successfully extract features of the classes.
Now during testing, the background color is white. The same feature maps with high weights will now have very high outputs. These high outputs can saturate the non-linear unit, and all categories may be classified as one category.
The second case is where during training you show images with different backgrounds.
In this case, the neural network has to learn which feature maps correspond to the background and to subtract the bias introduced by the background.
In short, there is an extra piece of information that the network needs to learn: the background is not important for deciding the category. When you provide only one background color, your neural network cannot learn this behavior and can give garbage results on the test dataset.
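If photographing the fruit against many real backgrounds is impractical, one possible workaround (my suggestion, not something from the question) is to composite object crops onto random backgrounds as an augmentation step. A rough sketch with Pillow, where all file names are placeholders:

```python
import random
from PIL import Image

def paste_on_random_background(object_img_path, background_paths, out_size=(224, 224)):
    """Paste an object crop (with an alpha channel) onto a randomly chosen background."""
    background = Image.open(random.choice(background_paths)).convert("RGB").resize(out_size)
    obj = Image.open(object_img_path).convert("RGBA")
    obj.thumbnail((out_size[0] // 2, out_size[1] // 2))   # keep the object smaller than the canvas
    position = (random.randint(0, out_size[0] - obj.width),
                random.randint(0, out_size[1] - obj.height))
    background.paste(obj, position, mask=obj)             # alpha channel used as paste mask
    return background

# composite = paste_on_random_background("orange_crop.png", ["grey.jpg", "wood.jpg", "grass.jpg"])
# composite.save("orange_on_random_bg.jpg")
```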
Is it necessary or a good approach to keep the object at different positions while training the model?
You are right, convolutional neural networks are translation-equivariant. But for building a classifier, you pass the output of the convolutional layers through a fully connected layer. If you put the object at different positions, different inputs will go to the fully connected layer, but the output for all these images is the same category. So you are forcing your neural network to learn that the position of the object is not required for classifying its category.
Regarding your first doubt, it is not much of an issue as long as the target object is present in the images. Shuffle the data before feeding it to the network.
For the second doubt, yes, it is always a good idea to have the target object at different positions. One more thing to take care of is that the source of your data should be the same and of mostly the same quality; otherwise performance issues will arise.

Can a generally trained deep learning classifier be used to classify within subclasses?

Suppose a deep learning classifier is trained to distinguish between images of cars, ships, trucks, birds, horses and dogs, but all the training data for the birds were yellow birds.
Can the trained classifier then be used to detect only yellow birds within a bird image data set? Image data is just an example here; the data can be other things, like DNA sequences, too. Please bear with me if the question is nonsensical or too basic.
In the example you mentioned, you are not training your classifier to discriminate between cars, ships, trucks, birds, horses and dogs, but between the first five things you mentioned and yellow birds. This means that, assuming your model is performing well and your dataset was sufficiently large, you can expect the score coming out of the birds unit to discriminate between other objects and yellow birds, even when these other objects are other birds. Of course, there is some small probability that it will learn to discriminate between birds and other objects using only shapes, but in my opinion it is too small to be taken into account. You might check that by simply generating an appropriate testing dataset.
In general, it depends on many factors. One of them is the architecture and design of your network. Discriminating yellow birds from differently coloured ones should be easy because of the way convolutions operate on colour images. In other cases, it might not be so obvious. Another factor is how far apart conceptually the classes you want to discriminate are. If, for example, this other class can be built out of the same concepts as the learnt one, you might have a problem, because the network might simply learn those concepts as indicators of yellow birds.
So the best thing to do is to design an appropriate testing dataset and compare the scores of the different classes. If you show that this score performs well, then you are done. If not, you need to retrain your network.
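A small sketch of such a check: run the trained classifier over a held-out set of yellow birds and a set of differently coloured birds, and compare the mean score of the bird output unit for each group. The model call, the class index and the batch variables are assumptions:

```python
import numpy as np

BIRD_CLASS_INDEX = 3  # assumed position of the "bird" output unit

def mean_bird_score(model, image_batches):
    """Average softmax score of the bird unit over a set of preprocessed image batches."""
    scores = []
    for batch in image_batches:
        probs = model.predict(batch)              # assumes a Keras-style predict()
        scores.extend(probs[:, BIRD_CLASS_INDEX])
    return float(np.mean(scores))

# yellow = mean_bird_score(model, yellow_bird_batches)
# other  = mean_bird_score(model, other_colour_bird_batches)
# A large gap between the two suggests colour was indeed used as a discriminating feature.
```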
It depends mainly upon which features were captured by the classifier to detect birds. If the main criteria were, for example, wing-looking shapes and beaks, then yellow birds will be almost indistinguishable from other birds.
On the other hand, if the yellow color did become important for classification, then yellow birds will be labeled as birds with higher confidence than birds of any other color. For instance, a yellow parrot might come out as "80% bird, 10% cat" while a white swan comes out as "60% bird, 30% fish". However, you can't rely on this in advance.

Using Haar Cascade Classifier in OpenCV to count cars in an aerial image of a parking lot

I want to count the number of cars in aerial images of parking lots. After some research I believe that Haar Cascade Classifiers might be an option for this. An example of an image I will be using would be something similar to a zoomed-in image of a parking lot from Google Maps.
My current plan to accomplish this is to train a custom Haar Classifier using cars that I crop out of images in only one orientation (up and down), and then attempt recognition multiple times while rotating the image in 15 degree increments. My specific questions are:
Is using a Haar Classifier a good approach here or is there something better?
Assuming this is a good approach, when cropping cars from larger images for training data would it be better to crop a larger area that could possibly contain small portions of cars in adjacent parking spaces (although some training images would obviously include solo cars, cars with only one car next to them, etc.) or would it be best to crop the cars as close to their outline as possible?
Again assuming I am taking this approach, how could I avoid double counting cars? If a car was recognized in one orientation, I don't want it to be counted again. Is there some way that I could mark a car as counted and have it ignored?
I think in your case I would not go for Haar features; you should look for something that is rotation-invariant.
I would recommend to approach this task in the following order:
Create a solid training / testing data set and have a good look at papers about getting good negative samples. In my experience, good negative samples have a great deal of influence on the resulting quality of your classifier. It makes your life a lot easier if all your samples are of the same image size. Add different types of negative samples: half cars, just pavement, grass, trees, people, etc.
Before starting your search for a classifier, make sure that you have your evaluation pipeline in order: do a 10-fold cross-validation with the simplest Haar classifier possible. Now you have a baseline. Try to keep the software for all the features you tested working, in case you find out that your data set needs adjustment. Ideally you can just execute a script and rerun your whole evaluation on the new data set automatically.
The problem of counting cars multiple times will not be as important once you find a feature that is rotation-invariant. Still, non-maximum suppression will be in order, because you might not get a good recognition with simple thresholding.
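For reference, a plain non-maximum suppression sketch over (x, y, w, h, score) detections, which would also merge the overlapping hits produced by detecting the same car at several rotations; the IoU threshold is an assumption to tune:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.4):
    """detections: list of (x, y, w, h, score); keep the highest-scoring box per cluster."""
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[:4], d[:4]) < iou_threshold]
    return kept
```

The final car count is then simply `len(non_max_suppression(all_detections))`.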
As a tip, you might consider HOG features; I have had some good results on cars with them.
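A minimal sketch of extracting HOG features with scikit-image; the resulting vectors could then feed a linear SVM trained on your car / not-car crops (crop size, cell and block parameters are assumptions):

```python
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize

def hog_features(image_path, size=(64, 64)):
    """Load a crop as grayscale, resize it, and return its HOG descriptor."""
    image = imread(image_path, as_gray=True)
    image = resize(image, size)
    return hog(
        image,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        block_norm="L2-Hys",
    )

# features = hog_features("car_crop_0001.png")   # hypothetical file name
# Stack these vectors into X and pair them with 1/0 labels to train e.g. sklearn.svm.LinearSVC.
```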
