OpenCV and SVM training for luggage detection

In my project, I am trying to differentiate a piece of luggage from anything else, usually a human.
For the moment, I use OpenCV and SVM training with two classes, one for luggage and one for humans. Before feeding in the frames, I convert them to grayscale, but I don't apply any additional filters. The resulting predictions are not very accurate.
I am wondering if applying additional filters to the frames before training might give better results, for example contour detection: if the contour is close to a 'rectangle', it is luggage; otherwise it is 'something else'. I am also thinking about switching to the ONE_CLASS method.
What do you think? Or do you have better ideas?
Regards,
Julien.

After giving the question much thought, I think anomaly detection is the best way to go. I got that idea because you mentioned the ONE_CLASS method.
Assuming that luggage appears roughly rectangular in an image, your suggestion that 'anything close to a rectangle is luggage' is also a viable approach. You then have only one class, 'luggage'.
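To make that concrete, a minimal sketch of the rectangle check with OpenCV (the file name, area cutoff, and Canny thresholds are assumptions to tune per setup):

```python
import cv2

img = cv2.imread("frame.png")                       # hypothetical input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

# OpenCV 4.x returns (contours, hierarchy)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    if cv2.contourArea(cnt) < 1000:                 # skip small blobs
        continue
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    # four vertices and convexity suggest a roughly rectangular object
    if len(approx) == 4 and cv2.isContourConvex(approx):
        print("rectangle-like contour: possible luggage")
```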
As the term implies, 'anomaly detection' is used to detect objects that do not conform to a particular pattern. In other words, it is used to detect outliers (objects other than those present in the dataset).
Since you are focusing on luggage alone, I presume this approach is the best fit.
You could try other approaches as well, in case you come across any.
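If you do switch to ONE_CLASS, OpenCV's SVM supports it directly. A rough sketch, assuming `features` is a float32 matrix with one row of features per luggage training image (the file name, nu, and gamma values are placeholders to tune):

```python
import cv2
import numpy as np

# Hypothetical feature matrix: one float32 row per luggage training image
features = np.load("luggage_features.npy").astype(np.float32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_ONE_CLASS)       # learn the 'luggage' class only
svm.setKernel(cv2.ml.SVM_RBF)
svm.setNu(0.1)                          # fraction of training outliers tolerated
svm.setGamma(0.5)

# The train() API requires responses even though ONE_CLASS does not use them
svm.train(features, cv2.ml.ROW_SAMPLE, np.ones(len(features), dtype=np.int32))

_, result = svm.predict(features[:1])   # 1 = inside the class, 0 = anomaly
```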

So the rectangle approximation method seems to fit my requirements. I haven't tested with a lot of images yet, so I am not 100% sure I'll go with it. As always, there are exceptions: when the color of the luggage is close to the color of the background, the result is not accurate. Is there a way to amplify the difference between two close colors?
Regards,
Julien.
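One common way to amplify small color differences is local contrast enhancement in a perceptual color space. A minimal sketch using OpenCV's CLAHE on the lightness channel of Lab (the clip limit and tile size are assumptions to tune):

```python
import cv2

img = cv2.imread("frame.png")                       # hypothetical frame
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
l_eq = clahe.apply(l)                               # boost local contrast on lightness only

enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```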

Related

How to choose negative samples? (OpenCV, Object Detection)

I want to make a face detection application that detects only my face using Python and OpenCV. I wonder if there are any rules about choosing negative samples. Should I select any image that does not contain my face (for example: roads, scenes, animals, etc.), or should I select other people's faces as negative images?
I also wonder whether the environment the target object usually occupies affects the efficiency. (For example: is it good practice to select empty roads as negative images when detecting cars?)
I would really like to hear your thoughts. Could you also share articles or documents about this, if there are any? I would really appreciate it. Thanks for your help!
This is quite a difficult question, and it depends heavily on your chosen algorithm. If you use, e.g., SIFT, then you only use positive samples. If you are using cascades, then negative samples are necessary.
What samples to use depends on the application, and no single answer exists. You should generally provide positive and negative samples covering the situations you expect to appear. As algorithms become more complex (e.g., deep learning), this becomes even more relevant.
So if you think other faces can appear in the image and you want the detector to treat them as negatives, you need to provide such samples. If your face can appear against different backgrounds, it is also important to include those situations.
To summarize, include samples (positive/negative) of situations expected to appear.
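If you go the cascade route specifically, OpenCV's trainer reads negatives from a plain text file with one image path per line. A small sketch to generate it (the `negatives` directory name is an assumption):

```python
import os

# Hypothetical layout: negatives/ holds images that do not contain the target
with open("bg.txt", "w") as f:
    for name in sorted(os.listdir("negatives")):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            f.write(os.path.join("negatives", name) + "\n")
```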

How do I segment the connected characters in this case?

It seems that I need some advice on segmenting connected characters (see the image below).
As you can see, C and U, as well as 4, 9 and 9, are connected, and therefore when I try to draw contours they are joined into one block. Unfortunately, there are plenty of such problematic images, so I think I need to find a solution.
I have tried using different morphological transforms (erosion, dilation, opening), but that doesn't solve the problem.
Thanks in advance for any recommendations.
It seems to me that the best solution is to work on the preprocessing, if that is possible.
Otherwise, you can try machine learning techniques. You may get inspiration from Viola-Jones or Histograms of Oriented Gradients + SVM (even though those algorithms solve a different problem than optical character recognition, I got plenty of insights from them). In other words, try 'sliding' a window of a predefined aspect ratio along the horizontal axis and recognizing characters, as sketched below. The catch is that you will need to train a model, which may require a lot of data.
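A bare-bones version of that sliding-window loop (the window aspect ratio, step size, and classifier hook are all assumptions; the classifier itself still has to be trained):

```python
import cv2

line = cv2.imread("line.png", cv2.IMREAD_GRAYSCALE)  # hypothetical text line
h, w = line.shape
win_w = max(1, int(h * 0.6))                          # assumed character aspect ratio
step = max(1, win_w // 4)

# Collect overlapping fixed-aspect crops along the horizontal axis
crops = [line[:, x:x + win_w] for x in range(0, w - win_w + 1, step)]
# for crop in crops: label, score = classifier.predict(features(crop))  # placeholder
```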
As I said earlier, it may be a good idea to reconsider the image preprocessing step. By the way, it seems that in the case of "C" and "U", erosion may help.
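For reference, a minimal erosion experiment with OpenCV (the kernel size and iteration count are guesses to adjust per image):

```python
import cv2
import numpy as np

# Assumes characters are white on black; use THRESH_BINARY_INV otherwise
gray = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
eroded = cv2.erode(binary, kernel, iterations=1)      # thins strokes, may break thin bridges

contours, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours), "components after erosion")
```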
Good luck! :)

Recognize objects while falling - viewpoint variation

I have a problem statement: recognize 10 classes of variations (in color and size) of the same object (a bottle cap) while it is falling, taking into account that the camera sees different viewpoints of the object. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat face of the object; this attempt was successful.
[Image: flat faces of two sample classes]
2) Instead of taking the fall into account, trained a model on possible perspective changes; not successful.
[Image: perspective changes of two sample classes]
What approaches can recognize the object even under perspective changes? I am not constrained to a single-camera solution, and I am open to ideas for approaching this problem of variable perspectives.
Any help would be really appreciated. Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, where you will be introduced to some shortcomings of CNNs and how the authors tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem successfully when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the riffled edges, which can be seen from all perspectives.
If you're not limited to one camera, you could train a "normal" classifier, feed it multiple images in production, and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You have to try for yourself what works best.
Also, never underestimate the power of old-school image preprocessing. If you have three different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, i.e. the one where img.sum() is highest.
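A sketch of that selection (the file names are hypothetical; it assumes the cap is bright against a darker background, so the raw pixel sum tracks visible object area):

```python
import cv2

# Hypothetical file names for three simultaneous views of the falling cap
views = [cv2.imread(p) for p in ("cam0.png", "cam1.png", "cam2.png")]

# The most frontal ("flat") view shows the largest object area; on a dark
# background the raw pixel sum is a cheap proxy for that area
flattest = max(views, key=lambda img: int(img.sum()))
```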
Another idea is to determine the color through explicit programming, which should be fairly easy, and then feed the network a grayscale image. Maybe your network is confused by the strong correlation with color and ignores the shape altogether.
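A rough sketch of that split: estimate the color with simple channel statistics, then hand the network only the shape information (the file name is hypothetical):

```python
import cv2

img = cv2.imread("cap.png")                     # hypothetical cap image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mean_hue, mean_sat = cv2.mean(hsv)[:2]          # crude explicit color estimate
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # shape-only input for the network
print(f"dominant hue ~ {mean_hue:.0f}")
```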

How to enhance the colors and contrast of a noisy image

I asked a question previously, "How to extract numbers from an image" LINK, and I finally got that step working, but there are some test cases that lead to awful outputs when I try to recognize digits. Consider this image as an example:
This image is low contrast (from my point of view). I tried to adjust its contrast, and the results were still unacceptable. I also tried to sharpen it and then applied gamma correction, but the results were still not good, so the extracted numbers are not recognized well by the classifier.
This is the image after sharpening + gamma correction:
Number 4 after separation:
Could anybody tell me the best way to solve such a problem?
Sharpening is not always the best tool to approach a problem like this. Contrary to what the name implies, sharpening does not "recover" information to add detail and edges back into an image. Instead, sharpening is a class of operations that increase local contrast along edges.
Because your original image is highly degraded, this sharpening operation looks to be adding a lot of noise in, and generally not making anything better.
There is another class of algorithms called "deblurring" algorithms that attempt to actually reconstruct image detail through (much more complex) mathematical models. Some versions of this are blind deconvolution, regularized deconvolution, and Wiener deconvolution.
However, it is important to note that all of these methods are approximations: once image content is lost through an operation such as blurring, it can (almost) never be fully recovered. Also, these methods are generally much more complex.
The best way to handle these situations is to make sure they never happen: ensure good focus during image capture, use a system with a resolution well suited to your task, and control the lighting environment. However, when these methods do not or cannot work, image reconstruction techniques are needed.
Your image is blurred, and I suggest you try Wiener deconvolution. You can assume the point spread function is a Gaussian and observe how the deconvolution behaves. Since you do not know the blur kernel in advance, blind deconvolution is an alternative.
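A sketch of that suggestion using scikit-image, with an assumed Gaussian PSF (the size, sigma, and balance values are placeholders; `digits.png` is hypothetical):

```python
import numpy as np
from skimage import io, restoration
from skimage.util import img_as_float

# Hypothetical degraded input; Wiener deconvolution needs a PSF guess
img = img_as_float(io.imread("digits.png", as_gray=True))

def gaussian_psf(size=9, sigma=2.0):
    # Normalized 2-D Gaussian kernel used as the assumed blur kernel
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

deblurred = restoration.wiener(img, gaussian_psf(), balance=0.1)
# With no usable PSF guess, unsupervised (blind-style) deconvolution estimates more:
# deblurred, _ = restoration.unsupervised_wiener(img, gaussian_psf())
```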

How to recognize the same image at different sizes?

We as humans can recognize these two images as the same image:
For a computer, it is easy to recognize these two images if they are the same size, so we have to add a preprocessing step, such as scaling, before recognition; but if we look closely at the scaling process, we see that it is not an efficient approach.
Now, could you help me find a way to convert images into representations that do not depend on size or pixel location, to serve as input for a recognition method?
Thanks in advance.
I have several ideas:
1) Give the image several color thresholds. This way you get large areas of the same color. The shapes of those areas can be traced with curves, which are mathematical objects. Do this for both the larger and the smaller image and see if the curves match (a sketch follows this list).
2) Try to define key spots in the image. I don't know exactly how this works, but you can look up face detection algorithms. In such an algorithm there is a mathematical model of how a face should look. If you define enough objects this way, you can check whether the objects match at the same spots in both images.
3) You could also check whether the Predator algorithm accepts images of multiple sizes. If so, your problem is solved.
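A sketch of idea 1 with OpenCV, using `cv2.matchShapes` (Hu-moment based, hence scale invariant) to stand in for the curve comparison; the Otsu thresholding and file names are assumptions:

```python
import cv2

def largest_contour(path):
    # Threshold the image and return the outline of the biggest region
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

# "big.png" and "small.png" are hypothetical: the same image at two sizes
score = cv2.matchShapes(largest_contour("big.png"), largest_contour("small.png"),
                        cv2.CONTOURS_MATCH_I1, 0)
print("shape distance:", score)   # near 0 means the shapes match
```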
It looks like you assume that the human brain recognizes images in a computationally efficient way, which is not really true: the algorithm is so complicated that we have not discovered it, and a large part of your brain is devoted to processing visual data.
When it comes to software, there are some scale- (or affine-) invariant algorithms. One such algorithm is the LeNet-5 neural network.
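As a concrete illustration of scale-invariant matching (using SIFT, mentioned in an earlier answer here, rather than LeNet-5), a minimal sketch with hypothetical file names:

```python
import cv2

img1 = cv2.imread("big.png", cv2.IMREAD_GRAYSCALE)    # hypothetical pair: same
img2 = cv2.imread("small.png", cv2.IMREAD_GRAYSCALE)  # content, different size

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test keeps only distinctive matches
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "scale-invariant matches")
```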
