Foreground extraction using Haar cascade classifier - opencv

I am working on a dataset (training + testing) that contains different shopping cart items (e.g. biscuits, soaps, etc.) against different backgrounds. I need to predict the product ID for all testing images (product IDs are unique for each product; say a Good-day 10 rs pack has product ID 1, and so on for the other products).
My approach was to:
Extract the foreground from the image.
Apply the SIFT/SURF algorithm to find matching keypoints.
However, the results are not satisfactory.
This is the input image:
This is the output image:
As you can see, the bounding box generated by the Haar cascade doesn't cover the whole biscuit packet correctly.
Can you please tell me how to obtain correct bounding boxes using a Haar cascade classifier (the positive images are the product dataset; the negative images folder consists of persons and different climate conditions)?
I know that in my dataset each biscuit packet is a distinct product and there is only one image per product; is this the reason why my Haar cascade is not performing well?
If yes, please specify the data preprocessing steps required.
Please also suggest other foreground extraction algorithms that would solve my problem (one such alternative, GrabCut, is sketched below).
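For reference, one commonly used alternative to a detector-driven crop is OpenCV's GrabCut, which segments the foreground starting from a rough bounding rectangle. The sketch below is a minimal illustration only; the file name and the initial rectangle are placeholder assumptions, not part of the original question.

```python
import cv2
import numpy as np

img = cv2.imread('biscuit.jpg')                      # placeholder file name
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

# Rough initial rectangle around the product (x, y, w, h); assumed, not detected.
rect = (50, 50, img.shape[1] - 100, img.shape[0] - 100)

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels marked as definite or probable foreground.
fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype('uint8')
foreground = img * fg_mask[:, :, None]
cv2.imwrite('foreground.png', foreground)
```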

Related

Does a ML model classify between desired image classes or by datasets?

If I had a Dataset 1 with 90% cat images and 10% dog images, and I combined it with Dataset 2, which contains only dogs, to equalize the class imbalance, will my model classify which are cats and dogs, or which are Dataset 1 images and Dataset 2 images?
If it's the latter, how do I get the model to classify between cats and dogs?
Your model will only do what it is trained for, regardless of what names your datasets have.
The name of the dataset is just an organizational matter; it does not go into training and does not really affect the amount of loss produced during a training step. What does affect your model's responses, however, is the properties of the data.
Sometimes data from different datasets have different properties even though the datasets serve the same purpose, such as images with different illumination, background, resolution, etc. That certainly has an effect on model performance, which is why mixing datasets should be done with caution. You might find it useful to have a look at this paper.

Why does object detection result in multiple found objects?

I trained an object detector with CreateML and when I test the model in CreateML, I get a high number of identified objects:
Notes:
The model was trained on a small data set of ~30 images, with that particular label face-gendermale occurring ~20 times.
Each training image has 1-3 labelled objects.
There are 5 labels in total.
Questions:
Is that expected or is there something wrong with the model?
If this is expected, how should I evaluate these multiple results or even count the number of objects found in the model?
Cross-posted in Apple Developer Forums. Photo of man © Jason Stitt | Dreamstime.com
A typical object detection model will make about 1000 predictions for every image (although it can be much more depending on the model architecture). Most of these predictions have very low confidence, so they are filtered out. Then the ones that are left over are sent through non-maximum suppression (NMS), which removes bounding boxes that overlap too much.
In your case, it seems that the threshold for NMS is too low (or too high), because many overlapping boxes survive.
However, it also seems that the model hasn't been trained very well yet, probably because you used very few images.
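For illustration, here is a minimal NumPy sketch of the two filtering stages described above (confidence filtering followed by greedy IoU-based NMS). The threshold values are illustrative assumptions, not the ones CreateML actually uses.

```python
import numpy as np

def filter_and_nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the boxes that survive confidence filtering and NMS."""
    # Stage 1: drop low-confidence predictions.
    keep = scores >= conf_thresh
    boxes, scores = boxes[keep], scores[keep]

    order = scores.argsort()[::-1]            # process highest-confidence boxes first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # IoU of the current box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Stage 2: suppress boxes that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return boxes[kept]
```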

How to select features when you have images (pixels) with extra information (categories)?

Suppose you need to train your classifier on a dataset that has images as well as more descriptor features available (along with the labels, of course).
For example, if you have to classify cats vs. dogs, you are provided with the image, weight, and age of each animal. If I just had the image, I could use each pixel as a feature and train my model. I don't want the age and weight to be lost among the pixels like needles in a haystack (of features). Do I make sense?
Any help would be much appreciated :)

Using SURF detector to find similarity between same bank denominations

Backstory: in my country there is a picture of its founding father on every banknote denomination:
I want to find the similarity between these two images via SURF detectors. The system will be trained on both images. The user will present the bottom picture or the top picture via a webcam, and the similarity score between them will be used to find its denomination value.
My pseudocode:
1. Detect the keypoints and corresponding descriptors of both images via the SURF detector and descriptor.
2a. Calculate the matching vector between the query and each trained example; find the number of good matches / total number of matches for each image.
2b. OR apply the RANSAC algorithm and find the highest number of closest pairs between the query and the training images.
3. The one having the higher value will have the higher score and better similarity.
Is my method sound enough, or is there any other method to find the similarity between two images where the query image will undergo various transformations? I have looked for solutions such as Manhattan distance or correlation, but none of them are adequate for this problem.
Yes, you are doing it the right way:
1) Create a training set and store all its feature points.
2) Perform the ratio test on matches between the query and training feature points.
3) Apply the RANSAC test and draw the matches (apply a homography if you want to highlight the detected note).
This paper might be helpful; they are doing a similar thing using SIFT.
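As a minimal sketch of steps 1-3 above, the snippet below uses OpenCV's SIFT in place of SURF (SURF requires the non-free opencv-contrib build); the image file names are placeholders, and the ratio and RANSAC thresholds are the usual illustrative defaults rather than tuned values.

```python
import cv2
import numpy as np

train_img = cv2.imread('note_train.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder
query_img = cv2.imread('note_query.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder

# 1) Keypoints and descriptors for both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(train_img, None)
kp2, des2 = sift.detectAndCompute(query_img, None)

# 2) Lowe's ratio test on the two nearest neighbours of each descriptor.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# 3) RANSAC: count how many good matches are geometrically consistent.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    score = inlier_mask.sum() / len(good)   # similarity score: inlier ratio
    print('similarity score:', score)
```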
Your algorithm looks fine, but you have much more information available that you can make use of. Here is a list of information you can use to further improve your results:
1. The location on the note where the denomination value is written.
2. Information about how the denominations are written (script knowledge).
3. Homography information, since you know both the original image and the observed image.
Make use of all the above information to improve the result.

Bag of words - image classification

I have some doubts about bag-of-words based image classification. I will first tell you what I have done:
1. I have extracted the features from the training images of two different categories using the SURF method.
2. I have then clustered the features for the two categories.
To classify my test image (i.e. decide which of the two categories the test image belongs to), I am using an SVM classifier. Here is my doubt: how do we input the test image? Do we have to do the same steps 1 to 2 again and then use that as the test set, or is there another method?
It would also be great to know the efficiency of the BoW approach.
Could someone kindly provide me with a clarification?
The classifier needs the representation for the test data to have the same meaning as the training data. So, when you're evaluating a test image, you extract the features and then make the histogram of which words from your original vocabulary they're closest to.
That is:
Extract features from your entire training set.
Cluster those features into a vocabulary V; you get K distinct cluster centers.
Encode each training image as a histogram of the number of times each vocabulary element shows up in the image. Each image is then represented by a length-K vector.
Train the classifier.
When given a test image, extract the features. Now represent the test image as a histogram of the number of times each cluster center from V was closest to a feature in the test image. This is a length K vector again.
It's also often helpful to discount the histograms by taking the square root of the entries. This approximates a more realistic model for image features.
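For concreteness, here is a minimal sketch of that pipeline in Python with OpenCV and scikit-learn. SIFT stands in for SURF (which needs the non-free opencv-contrib build), and `train_paths`, `train_labels`, `test_path`, and the vocabulary size K are hypothetical placeholders you would supply.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

K = 100                      # vocabulary size (number of cluster centers), assumed
sift = cv2.SIFT_create()

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = sift.detectAndCompute(img, None)
    return des

# Steps 1-2: extract features from the whole training set and cluster them
# into a vocabulary of K visual words.
all_des = np.vstack([descriptors(p) for p in train_paths])
kmeans = KMeans(n_clusters=K, random_state=0).fit(all_des)

def bow_histogram(des):
    # Steps 3 and 5: histogram of how often each visual word is the closest
    # cluster center, with the square-root discounting mentioned above.
    words = kmeans.predict(des)
    hist = np.bincount(words, minlength=K).astype(float)
    return np.sqrt(hist / hist.sum())

# Step 4: train the classifier on the training histograms.
X_train = np.array([bow_histogram(descriptors(p)) for p in train_paths])
clf = SVC(kernel='linear').fit(X_train, train_labels)

# Step 5: encode the test image the same way and predict its category.
print(clf.predict([bow_histogram(descriptors(test_path))]))
```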
