Need Haar Cascades for coins (especially euro ones) - opencv

I need a Haar cascade classifier to detect coins, in particular euros, if one exists. I have been trying to generate my own for days, but the results are always bad. Alternatively, do you know a good tutorial?
Thank you

You're probably not going to find many cascades pre-made for coins, or even specifically for euros. I'd recommend training your own.
As for tutorials, I used the opencv 3.0 traincascade tutorial when I was creating my LBP cascade, but it also covers Haar cascades. I also used mergevec to inflate my positive count.
Basically what I did when I was making mine was this:
I generated positive vectors using opencv_createsamples (which is in the opencv install) and mergevec. I created each vector from a small batch of individual positive images combined with all the negative images, which gave me some positive samples to work with. Then I used mergevec to merge those vectors into a single vector file that opencv_traincascade could use.
Then, I ran opencv_traincascade with that new positive vector from the mergevec, and the negatives that I had. I think I ended up with about 7000 negatives and about 13000 positives, which is probably a bit overkill but I got a really nice cascade out of it. Try to keep the width and height below about 100x100, otherwise it will take all week to train.
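The two command-line steps above can be sketched roughly as follows. This is only a sketch under assumptions: the file names, sample counts and the 48x48 window are hypothetical, and the merge step is left to mergevec, whose options are not shown here. The functions only build the argument lists; they could be passed to subprocess.run(cmd, check=True) once the paths exist.

```python
def createsamples_cmd(image, negatives, vec_out, num, width, height):
    """Argument list for one opencv_createsamples run on a single positive image."""
    return [
        "opencv_createsamples",
        "-img", str(image),      # one positive image to distort
        "-bg", str(negatives),   # text file listing the negative images
        "-vec", str(vec_out),    # output .vec file for this image
        "-num", str(num),        # how many samples to generate
        "-w", str(width),
        "-h", str(height),
    ]


def traincascade_cmd(out_dir, merged_vec, negatives, num_pos, num_neg,
                     width, height, num_stages=20, feature_type="LBP"):
    """Argument list for opencv_traincascade on the merged .vec file."""
    return [
        "opencv_traincascade",
        "-data", str(out_dir),        # directory that receives cascade.xml
        "-vec", str(merged_vec),      # single .vec produced by the merge step
        "-bg", str(negatives),
        "-numPos", str(num_pos),      # keep below the sample count in the .vec
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-featureType", feature_type,  # "LBP" or "HAAR"
        "-w", str(width),              # must match the createsamples size
        "-h", str(height),
    ]


# Hypothetical values, only to show the shape of the calls.
one_image = createsamples_cmd("coin_01.png", "negatives.txt", "coin_01.vec", 200, 48, 48)
training = traincascade_cmd("cascade_out", "merged.vec", "negatives.txt",
                            num_pos=12000, num_neg=7000, width=48, height=48)
```

The -w and -h values have to be the same in both tools, which is one reason to build both commands from the same variables.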

Related

Are there any ways to build an ML model using CBIR and SIFT for image comparison in my case?

I have this project I'm working on. A part of the project involves multiple test runs during which screenshots of an application window are taken. Now, we have to ensure that screenshots taken between consecutive runs match (barring some allowable changes). These changes could be things like filenames, dates, different logos, etc. within the application window that we're taking a screenshot of.
I had the bright idea to automate the process of doing this checking. Essentially my idea was this. If I could somehow mathematically quantify the difference between a screenshot from the N-1th run and the Nth run, I could create a binary labelled dataset that mapped feature vectors of some sort to a label (0 for pass or 1 for fail if the images do not adequately match up). The reason for all of this was so that my labelled data would help make the model understand what scale of changes are acceptable, because there are so many kinds that are acceptable.
Now let's say I have access to lots of data that I have meticulously labelled, in the thousands. So far I have tried using SIFT in opencv with keypoint matching to determine a similarity score between images. But this isn't an intelligent, learning process. Is there some way I could take some information from SIFT and use it as my x-value in my dataset?
Here are my questions:
What information would I need as my x-value? It needs to be something that represents the difference between two images. So maybe the difference between feature vectors from SIFT? What do I do when those vectors are of slightly different dimensions?
Am I on the right track with thinking about using SIFT? Should I look elsewhere and if so where?
Thanks for your time!
The approach that is being suggested in the question goes like this -
Find SIFT features of two consecutive images.
Use those to somehow quantify the similarity between two images (sounds reasonable)
Use this metric to first classify the images into similar and non-similar.
Use this dataset to train a NN to do the same job.
I am not completely convinced if this is a good approach. Let's say that you created the initial classifier with SIFT features. You are then using this data to train a NN. But this data will definitely have a lot of wrong labels. Because if it didn't have a lot of wrong labels, what's stopping you from using your original SIFT based classifier as your final solution?
So if your SIFT based classification is good, why even train a NN? On the other hand, if it's bad, you are giving a lot of wrongly labeled data to the NN for training. I think the latter is probably a bad idea. I say probably because there is a possibility that the wrong labels just encourage the NN to generalize better, but that would require a lot of data, I imagine.
Another way to look at this is, let's say that your initial classifier is 90% accurate. That's probably the upper limit of the performance for the NN that you are looking at when talking about training it with this data.
You said that the issue you have with your first approach is that it 'isn't an intelligent, learning process'. I think it's wrong to assume that a non-learning approach is always inferior to a learning one. SIFT is a powerful tool that can solve a lot of problems without all the 'black-boxness' of an NN. If this problem can be solved with sufficient accuracy using SIFT, I think going after a learning based approach is not the way to go, because again, a learning based approach isn't necessarily superior.
However, if the SIFT approach isn't giving you good enough results, definitely start thinking of NN stuff, but at that point, using the "bad" method to label the data is probably a bad idea.
Relatedly, I think you may be underestimating the amount of data that is needed for this. You mentioned data in the thousands, but that's honestly not a lot. You would need a lot more, I think.
Here is how I would approach it instead -
Do SIFT keypoint detection for a sample reference image.
Manually filter out keypoints that do not belong to the invariant parts of the image. That is, keep only keypoints at locations in the image that are guaranteed (or very likely) to always be present.
When you get a new image, compute the keypoints and do matching with the reference image.
Set a threshold on the ratio of good matches to the total number of matches.
Depending on your application, this might give you good enough results.
If not, and if you really want your solution to be NN based, I would say you need to manually label the dataset as opposed to using SIFT.
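The matching-and-threshold idea above could look roughly like this. It is a minimal sketch in plain Python that assumes the descriptors have already been extracted (in OpenCV they would typically come from SIFT's detectAndCompute) and that the reference descriptors were already filtered by hand. The "good match" criterion used here is Lowe's ratio test, which is my choice and not something the answer prescribes, and the 0.75 and 0.8 thresholds are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def good_match_ratio(ref_descriptors, new_descriptors, ratio=0.75):
    """Fraction of reference descriptors with an unambiguous match in the new image.

    A match counts as good when the nearest new descriptor is clearly
    closer than the second nearest (Lowe's ratio test).
    """
    if not ref_descriptors or len(new_descriptors) < 2:
        return 0.0
    good = 0
    for ref in ref_descriptors:
        distances = sorted(euclidean(ref, cand) for cand in new_descriptors)
        best, second = distances[0], distances[1]
        if best < ratio * second:
            good += 1
    return good / len(ref_descriptors)


def screenshots_match(ref_descriptors, new_descriptors, min_ratio=0.8):
    """Pass/fail decision: enough of the invariant keypoints were found again."""
    return good_match_ratio(ref_descriptors, new_descriptors) >= min_ratio


# Toy 2-D descriptors; real SIFT descriptors have 128 dimensions.
reference = [[0, 0], [10, 10]]
same_screen = [[0, 0.1], [10, 10.1], [50, 50]]
other_screen = [[50, 50], [51, 51]]
print(screenshots_match(reference, same_screen))
print(screenshots_match(reference, other_screen))
```

Because only the reference descriptors are iterated, allowed changes elsewhere in the screenshot (dates, filenames, logos) add new keypoints without lowering the score.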

OpenCV poor performance of Haar classifier trained by me

I would like to use the Haar classifier to detect the presence of vehicles in a scene (trying with only cars so far). Since I have not found many trained XML files online, I decided to generate my own.
I found some image sets of vehicles that have been used for similar purposes (training computer vision algorithms) and used these to create my own XML files. It has been almost a week and some of them have finished, so I tried using them but the results were terrible. The classifiers I found online worked decently, at least it appears they are trying to detect vehicles and work fast enough for real-time application (maybe 5-10 FPS or so).
Mine, in contrast, can take several minutes to analyze a frame with detectMultiScale() using the same parameters. If I pass different parameters (e.g. increase the min size, decrease the max size, increase the scaling factor) it runs faster (maybe 1 FPS) but detects nothing of note: it never detects a vehicle and randomly flags patches of asphalt as vehicles.
Where did I go wrong in generating my files? I have limited time to complete this task and these classifiers can take a whole week to train so I have very few attempts remaining. For reference, my methodology is (following this tutorial):
-Take all positive and negative images; if no negative images supplied, take negative images from another data set, at least as many negatives as positives
-Generate as many samples as the number of positives
-Use same parameters as suggested, except image size (set to the size of images in a given data set), and nstages (set to 10 because 20 takes far too long)
-For the npos parameter, I use 1/10th the number of samples. Using the full number of samples resulted in "assertion failed" after a few hours; apparently npos cannot be the same as the number of samples according to this, so I gave myself a safety margin.
TL;DR Haar classifier I trained myself performs much worse than one found online (in terms of time and accuracy), need advice on how to improve it and not waste another week training it.
There are two problems here. One, the accuracy of the classifier is low. The other, the classifier runs too slow.
There seems to be no problem with the reference that you used. The steps seem accurate, and I have personally tried them in that order and managed to get good results.
As @Micka mentions, nPos around 90% of the original sample count is good enough. minHitRate is a parameter that you can change. Did you observe the numbers that are displayed while training? How was the accuracy improving, and did your classifier finish training (or are you using the trained parameters before learning ends)?
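To see where a figure like 90% can come from, here is a small sketch based on a rule of thumb that is often quoted on the OpenCV forums (an assumption on my part, it is not stated in this thread): the .vec file should hold at least numPos + (numStages - 1) * (1 - minHitRate) * numPos + S samples, where S is the number of samples rejected outright as background. The function name and the `skipped` parameter (standing in for S) are hypothetical.

```python
import math


def max_num_pos(vec_count, num_stages, min_hit_rate, skipped=0):
    """Largest numPos that satisfies the rule of thumb for a given .vec size.

    Each stage may discard up to (1 - min_hit_rate) of the positives, and
    those have to be replaced from the samples left over in the .vec file.
    """
    samples_needed_per_pos = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_count - skipped) / samples_needed_per_pos)


# Hypothetical run: 1000 samples in the .vec file, minHitRate 0.995.
print(max_num_pos(1000, num_stages=10, min_hit_rate=0.995))
print(max_num_pos(1000, num_stages=20, min_hit_rate=0.995))
```

With 20 stages this lands at roughly 91% of the sample count, which is in line with the 90% suggestion and far above the 1/10th used in the question.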
For the low speed in detection, the most likely reason is that your training data did not have simple features that could be learned quickly. Did you try detection on the data that you used for training? How were the results in that case? Compiler settings or high image resolution can be a problem too, but if you tried the same inputs and settings with other classifiers, this is unlikely.
If you'd like to try a different approach (and have a GPU), YOLO V2 should be much faster and more accurate for this task.

Use Azure Machine learning to detect symbol within an image

4 years ago I posted this question and got a few answers that were unfortunately outside my skill level. I just attended a Build tour conference where they spoke about machine learning, and this got me thinking about the possibility of using ML as a solution to my problem. I found this on the Azure site, but I don't think it will help me because its scope is pretty narrow.
Here is what I am trying to achieve:
I have a source image:
and I want to know which one of the following symbols (if any) is contained in the image above:
The comparison needs to tolerate minor distortion, scaling, color differences, rotation, and brightness differences.
The number of symbols to match will ultimately be greater than 100.
Is ML a good tool to solve this problem? If so, any starting tips?
As far as I know, Project Oxford (MS Azure CV API) wouldn't be suitable for your task. Their APIs are very focused on face-related tasks (detection, verification, etc.), OCR and image description. And apparently you can't extend their models or train new ones from the existing ones.
However, even though I don't know of an out-of-the-box solution for your object detection problem, there are easy enough approaches that you could try and that would give you some starting-point results.
For instance, here is a naive method you could use:
1) Create your dataset:
This is probably the most tedious step and, paradoxically, a crucial one. I will assume you have a good number of images to work with. What you would need to do is pick a fixed window size and extract positive and negative examples.
If some of the images in your dataset have different sizes, you would need to rescale them to a common size. You don't need to get too crazy about the size; 30x30 images would probably be more than enough. To make things easier I would convert the images to grayscale too.
2) Pick a classification algorithm and train it:
There are an awful lot of classification algorithms out there. If you are new to machine learning, I would pick the one you understand best. With that in mind, I would check out logistic regression, which gives decent results, is easy enough for starters and has a lot of libraries and tutorials. For instance, this one or this one. At first I would focus on a binary classification problem (like whether there is a UD logo in the picture or not), and when you master that you can jump to the multi-class case. There are resources for that too, or you can always have several models, one per logo, and run this recipe for each one separately.
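To give a feel for how little is involved, here is a minimal from-scratch sketch of binary logistic regression trained with batch gradient descent. It assumes every sample is already a list of numbers scaled to [0, 1]; the learning rate, epoch count and toy data are hypothetical, and in practice a library implementation would be the better choice.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample: p(logo | x) - true label.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """1 if the model thinks the logo is present, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy data: two "pixels" per sample, bright samples are positives.
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

For the multi-class case mentioned above, the simplest extension is the "one model per logo" route: train one such model for each logo and keep the one with the highest score.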
To train your model, you just need to read the images generated in step 1, turn each one into a vector and label it accordingly. That is the dataset that will feed your model. If you are using grayscale images, then each position in the vector corresponds to a pixel value in the range 0-255. Depending on the algorithm, you might need to rescale those values to the range [0, 1] (some algorithms perform better with values in that range). Notice that rescaling the range in this case is fairly easy (new_value = value/255).
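As a small illustration of that step, the sketch below flattens a grayscale image into a vector and applies new_value = value/255. It assumes the image is given as rows of integers in 0-255; the function names and the tiny 2x2 "images" are hypothetical.

```python
def image_to_vector(gray_image):
    """Flatten a 2-D grayscale image (rows of 0-255 ints) into floats in [0, 1]."""
    return [pixel / 255 for row in gray_image for pixel in row]


def build_dataset(positive_images, negative_images):
    """Turn labelled windows into (vectors, labels): 1 = logo, 0 = no logo."""
    vectors = [image_to_vector(img) for img in positive_images]
    labels = [1] * len(positive_images)
    vectors += [image_to_vector(img) for img in negative_images]
    labels += [0] * len(negative_images)
    return vectors, labels


# Tiny 2x2 windows instead of 30x30, just to show the shapes.
positives = [[[255, 255], [204, 255]]]
negatives = [[[0, 51], [0, 0]]]
X, y = build_dataset(positives, negatives)
print(X, y)
```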
You also need to split your dataset, reserving some examples for training, a subset for validation and another one for testing. Again, there are different ways to do this, but I'm keeping this answer as naive as possible.
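One simple way to do the split is a single shuffle followed by slicing. This is a sketch under assumptions: the 70/15/15 proportions and the fixed seed are hypothetical choices, not a recommendation from the answer.

```python
import random


def split_dataset(samples, labels, train=0.7, validation=0.15, seed=0):
    """Shuffle once, then cut into (train, validation, test) parts.

    Each part is returned as a (samples, labels) pair; whatever is left
    after the train and validation slices becomes the test set.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_train = round(len(indices) * train)
    n_val = round(len(indices) * validation)
    parts = (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )
    return [([samples[i] for i in part], [labels[i] for i in part]) for part in parts]


# 20 dummy samples whose label is simply whether the value is even.
samples = list(range(20))
labels = [s % 2 == 0 for s in samples]
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_dataset(samples, labels)
print(len(train_x), len(val_x), len(test_x))
```

The test part should only be used once, at the very end; tuning decisions belong on the validation part.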
3) Perform the detection:
So now let's start the fun part. Given any image, you want to run your model and produce the coordinates in the picture where there is a logo. There are different ways to do this, and I will describe one that is probably neither the best nor the most efficient, but is easier to develop in my opinion.
You are going to scan the picture, extracting the pixels in a "window", rescaling those pixels to the size you selected in step 1 and then feed them to your model.
If the model gives you a positive answer, then you mark that window in the original image. Since the logo might appear at different scales, you need to repeat this process with different window sizes. You would also need to tweak the amount of space between windows.
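The scanning loop described above can be sketched like this. The sketch only generates window coordinates; `classify_window` is a hypothetical callback that would crop the window, rescale it to the size from step 1 and run the trained model. The stride of half a window and the window sizes are assumptions to be tuned.

```python
def sliding_windows(image_width, image_height, window_sizes, stride_fraction=0.5):
    """Yield (left, top, size) for square windows at every requested scale."""
    for size in window_sizes:
        # Space between windows, expressed as a fraction of the window size.
        step = max(1, int(size * stride_fraction))
        for top in range(0, image_height - size + 1, step):
            for left in range(0, image_width - size + 1, step):
                yield left, top, size


def detect(image_width, image_height, window_sizes, classify_window):
    """Return every window for which the classifier callback answers True."""
    return [
        window
        for window in sliding_windows(image_width, image_height, window_sizes)
        if classify_window(*window)
    ]


# Dummy callback standing in for "crop, rescale, run the model":
# it pretends the logo sits in the window whose left edge is at x = 15.
hits = detect(60, 30, [30], lambda left, top, size: left == 15)
print(hits)
```

A smaller stride finds logos that fall between windows but multiplies the number of model calls, which is the main cost of this naive approach.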
4) Rinse and repeat:
On the first iteration it's very likely that you will get a lot of false positives. You then need to take those as negative examples and retrain your model. This is an iterative process, and hopefully on each iteration you will have fewer and fewer false positives and fewer false negatives.
Once you are reasonably happy with your solution, you might want to improve it. You could try other classification algorithms like SVMs or deep neural networks, or better object detection frameworks like Viola-Jones. Also, you will probably need cross-validation to compare all your solutions (you can actually use cross-validation from the beginning). By then I bet you will be confident enough to use OpenCV or another ready-to-use framework, in which case you will have a fair understanding of what is going on under the hood.
Also, you could just disregard this whole answer and go for an OpenCV object detection tutorial like this one. Or take another answer from another question like this one. Good luck!

opencv cascade classifier detects background

I have been training a cascade classifier to detect a certain kind of plant. Here is a sample image of what I want to detect
I sampled the little green plants for positives, and made negatives out of images with a similar background and no green plants (as suggested by many sources). I used many images similar to this one for sampling.
I did not have a lot of training data, so of course I did not expect ideal classification results.
I have set the usual parameters min_hit_rate 0.95, max_false_alarm 0.5, etc. I have tried training with 5, 6, 7, 8, 9 and 10 stages. The strange thing is that during the training process I get a hit rate of 1 in all stages, and after 5 stages I get a good acceptance ratio of 0.004 (similar for the later stages 6, 7, 8...).
I tried testing my classifier on the same image which I used for the training samples and there is very illogical behavior:
the classifier detects almost everything BUT the positive samples I took from it (the same samples used in the training with HIT RATE EQUAL TO 1).
the classifier is really, really slow: it took over an hour for a single input image (down-sampled, scale factor 1.1).
I do not get how the same samples could be classified as positives during training (through all the stages) and then NONE of them as positive on the image (there are a lot of false positives around them).
I checked everything a million times (I thought that I somehow mixed positives and negatives but I did not).
Can someone help me with this issue?
I can try and help but of course I can't train this thing for you unless you send me your images.
In my experience if you aren't getting the desired results, you are simply giving traincascade the wrong or not enough images (either or both positives or negatives).
I did not get great results until I created an annotation file using the built-in opencv_annotation tool. Have you done that? How many positives?
Did your negatives contain the background that you are attempting to detect your object in? This is key and can't be overlooked.
Also, I would use LBP, it's much faster.
If you or anyone is still stuck and have some positives created, send them to me and I'll see if I can train this thing.
And also, I have written hopefully a one-stop tutorial about this stuff after my experiences with it:
http://johnallen.github.io/opencv-object-detection-tutorial/

OpenCV: Training a soft cascade classifier

I've built an algorithm for pedestrian detection using openCV tools. To perform classification I use a boosted classifier trained with the CvBoost class.
The problem of this implementation is that I need to feed my classifier the whole set of features I used for training. This makes the algorithm extremely slow, so much that each image takes around 20 seconds to be fully analysed.
I need a different detection structure, and openCV has this Soft Cascade class that seems like exactly what I need. Its basic principle is that there is no need to examine all the features of a testing sample, since a detector can reject most negative samples using a small number of features. The problem is that I have no idea how to train one given a fully labeled set of negative and positive examples.
I can find no information about this online, so I am looking for any tips you can give me on how to use this soft cascade to perform classification.
Best regards
