Image Descriptors with SIFT/VLFEAT - image-processing

I want to perform a classification task in which I map a given image of an object to one of a list of predefined constellations that object can be in (i.e. find the most probable match).
In order to get descriptors of the image (on which I will run machine learning algorithms), it was suggested that I use SIFT with the VLFeat implementation.
First of all, my main question: I would like to skip the key-point detection part of SIFT and only use it for its descriptors. In the tutorial I saw that there is an option to do exactly that by calling
[f,d] = vl_sift(I,'frames',fc) ;
where fc specifies the key-points. My problem is that I want to explicitly specify the bounding box in which the descriptor is computed around each key-point, but it seems I can only specify a scale parameter, which is still a bit cryptic to me and doesn't let me state the bounding box explicitly. Is there a way to achieve this?
The second question is: does setting the scale manually and getting the descriptors this way make sense, i.e. does it result in a good descriptor? Any other suggestions regarding better ways of getting descriptors (using SIFT with other implementations, or other non-SIFT descriptors)? I should mention that my object is always the only object in the image, is centered, has constant illumination, and changes only by certain rotations of its internal parts. That is why I thought SIFT would work out: as I understand it, it focuses on orientation gradients, which would change accordingly with the rotations of the object.
Thanks

Agreed that the descriptor scale looks a bit cryptic.
See the third image in the VLFeat SIFT tutorial, where they overlay the extracted descriptors on the image with the following commands:
h3 = vl_plotsiftdescriptor(d(:,sel),f(:,sel)) ;
set(h3,'color','g') ;
You can thus play with the scale and see whether the region from which the histogram is extracted matches what you expected.
SIFT sounds like it might be overkill for your application if you have that much control over the imaging environment, but it should work.
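If it helps, here is a rough equivalent of the "custom frames" idea using OpenCV's Python bindings rather than VLFeat/MATLAB (purely illustrative; the file name and coordinates are made up). The keypoint size plays the role of the scale: it sets the diameter of the support region the descriptor is computed over, so you can derive it from the bounding box you have in mind. In VLFeat, if I remember the docs right, the descriptor window spans roughly 4 * Magnif * scale pixels, with Magnif defaulting to 3.

import cv2

img = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# one hand-picked key-point at (x, y); size is the diameter (in pixels) of the
# support region the descriptor will be computed over
kp = [cv2.KeyPoint(120.0, 80.0, 64.0)]

sift = cv2.SIFT_create()
kp, desc = sift.compute(img, kp)   # desc: one row of 128 floats per key-point
print(desc.shape)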

Hey.
It might help to look through the background chapter of this thesis:
http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2001260
It would take me a while to explain the scale here, so try reading it and the relevant citations. By the way, in that work the descriptors are used at the base resolution, i.e. scale ~ 1.
Hope this helps.

Maybe I did not understand the problem, but if the query image must be matched against a database of training images, and both training and test images are constant in illumination, scale, etc., then maybe SIFT is not necessary here. You could have a look at correlation (template matching). Are you using MATLAB?
Here you can see an example using correlation with OpenCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
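For what it's worth, the same idea takes only a few lines of Python with OpenCV (a minimal sketch; the file names and the choice of TM_CCORR_NORMED are just for illustration):

import cv2

scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
templ = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)

# normalized cross-correlation; scores close to 1 are good matches
res = cv2.matchTemplate(scene, templ, cv2.TM_CCORR_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(res)
print('best match at', max_loc, 'with score', max_val)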

Related

Object detection: LEDs in a simple scene

I am new to OpenCV, and I am guessing that this problem could be somewhat simple: I am trying to detect an object which is roughly 25 by 15 pixels in an image which is 470 by 590 pixels.
I am attaching a zoomed image of this object. I have several options to go with:
1 - Detection of two close circles using the Hough transform,
2 - Histogram matching,
3 - SURF feature detection.
Any advice on which direction I should take? Please consider speed and real-time application. Thanks
I think it should go without explicitly saying so, but there are probably hundreds of things that could be tried, and with only one example image it is quite difficult to advise. For instance, are the LEDs always green? We don't know.
That aside, IMHO, two good places to start would be the ol' faithful template matching, or blob detection.
Then, if that is not robust enough, you will need to look at some alternative representations of the template/blob, like the classic HoG (good for shape, maybe a bit heavy for this application), or even your own bespoke one that encodes your domain-specific knowledge of this problem.
Then, if that is not robust enough, build a dataset of representative positive and negative examples, as big as you can, and then train a classifier such as an SVM or a boosted classifier.
Template Matching:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Blob detection:
https://code.google.com/p/cvblob/
Machine Learning:
http://docs.opencv.org/modules/ml/doc/ml.html
TIPS:
Add as much domain knowledge as possible, i.e. if they are always green, use color in the representation, like HoG on the green channel, for instance. If they are always circular, try to encode that, e.g. use a log-polar grid in the template rather than a regular grid... and so on.
Machine learning is not magic; a linear classifier will essentially weight different points in the feature space, so you still require a good representation. If the template matching was a total failure, then it is unlikely that simple linear ML will help, but if the template matching was okay, then ML may well boost the performance to a good level.
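To make the "HoG on the green channel + linear classifier" tip concrete, here is a bare-bones Python/OpenCV sketch (the patch size, HOG parameters and the random placeholder patches are all made up; you would substitute your own labelled crops):

import cv2
import numpy as np

# HOG over a 32x16 window (width x height), small enough for ~25x15 px targets
hog = cv2.HOGDescriptor((32, 16), (16, 16), (8, 8), (8, 8), 9)

def features(patch_bgr):
    g = patch_bgr[:, :, 1]                    # green channel only
    g = cv2.resize(g, (32, 16))               # match the HOG window size
    return hog.compute(g).ravel()

# placeholder data: replace with real LED / background crops
pos_patches = [np.random.randint(0, 255, (15, 25, 3), np.uint8) for _ in range(20)]
neg_patches = [np.random.randint(0, 255, (15, 25, 3), np.uint8) for _ in range(20)]

X = np.float32([features(p) for p in pos_patches + neg_patches])
y = np.int32([1] * len(pos_patches) + [0] * len(neg_patches))

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(X, cv2.ml.ROW_SAMPLE, y)            # then slide the window over the image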
Step 1: Remove the black background.
Step 2: A snake (active contour) algorithm can be used to find the boundaries of your object.

How to remove similar images based on hog features?

I have 5000 images, and each image can generate a vector with about 1000 dimensions (HOG features), but some of the images are very similar, so I want to remove the similar ones. Is there a way to achieve this?
===============================================================
As #thedarkside ofthemoon suggested, let me explain a little bit more about what I am trying to do. I am using SVM + HOG features to do image classification. I have prepared some training data, but some of the training images are very similar, so I want to remove those to reduce the computation cost. I don't know whether the removal of similar images has a side effect on the final classification rate, so a good criterion of 'similarity' must be found. That's what I am trying to do.
Another way (not using HOG features): you can compute a color histogram for each image and compare it against the others.
For example:
Get the first image and compute its histogram.
Now, for each of the other images, calculate the histogram and compare it with the first one.
If you find a close match between the histograms, you can discard that image. By using CV_COMP_CORREL you will get a match score in the range 0-1.
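A small Python sketch of that comparison (HISTCMP_CORREL is the newer name for CV_COMP_CORREL in the cv2 API; the file names and the 0.95 threshold are placeholders you would tune):

import cv2

def hsv_hist(path):
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
    return cv2.normalize(h, h).flatten()

ref = hsv_hist('image_0001.jpg')
other = hsv_hist('image_0002.jpg')
score = cv2.compareHist(ref, other, cv2.HISTCMP_CORREL)   # close to 1.0 = very similar
if score > 0.95:
    print('near duplicate, can be discarded')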
Well, it depends on what you mean by similar; currently my favorite image similarity descriptor is the GIST descriptor.
http://people.csail.mit.edu/torralba/code/spatialenvelope/
but it is not in OpenCV. However, it is coded in C here, so it can be added to a C++ project (extern "C") if you are using the C++ OpenCV; not sure about Python, sorry.
http://people.rennes.inria.fr/Herve.Jegou/software.html
I have found this to be pretty good, and quite efficient.
(Sorry this is not a direct OpenCV solution, but I feel it is a reasonable answer, as the GIST C code can be added to a C++ project and works nicely.)
EDIT:
If you just want to remove images whose HOG descriptors are similar, you can use:
http://docs.opencv.org/modules/ml/doc/k_nearest_neighbors.html
or
http://docs.opencv.org/trunk/modules/flann/doc/flann_fast_approximate_nearest_neighbor_search.html
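For 5000 vectors a brute-force nearest-neighbour pass is also feasible; here is a plain NumPy sketch of greedy de-duplication (the distance threshold is data-dependent, and the random matrix only stands in for your real HOG features; for much larger sets the kNN/FLANN links above are the way to go):

import numpy as np

def deduplicate(vecs, thresh):
    # keep an image only if it is not within `thresh` of one we already kept
    kept = []
    for i, v in enumerate(vecs):
        if all(np.linalg.norm(v - vecs[j]) > thresh for j in kept):
            kept.append(i)
    return kept               # indices of the images to keep

# placeholder matrix; your real one would be roughly 5000 x 1000
hog_vecs = np.random.rand(500, 1000).astype(np.float32)
print(len(deduplicate(hog_vecs, thresh=5.0)))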

What is the best method to template match an image with noise?

I have a large image (5400x3600) that has multiple CCTVs that I need to detect.
The detection takes a lot of time (4-7 minutes) with rotation, and it still fails to resolve certain CCTVs.
What is the best method to match a template like this?
I am using skimage - OpenCV is not an option for me, but I am open to suggestions on that too.
For example: in the images below, the template is correctly matched with the second image, but the first image is not matched - I guess due to the noise created by the text "BLDG...".
Template:
Source image:
Match result:
The fastest method is probably a cascade of boosted classifiers trained with several variations of your logo, possibly a few rotations, and some negative examples too (non-logos). You have to roughly scale your overall image so the test and training examples are approximately matched in scale. Unlike SIFT or SURF, which spend a lot of time searching for interest points and creating descriptors for both learning and searching, binary classifiers shift most of the burden to the training stage, so your testing or search will be much faster.
In short, the cascade runs in such a way that the very first test discards a large portion of the image. If the first test passes, the others follow and refine. They are super fast, consisting of just a few intensity comparisons on average around each point. Only a few locations pass the whole cascade, and those can be verified with additional tests such as your rotation-correlation routine.
Thus, the classifiers are effective not only because they quickly detect your object but also because they quickly discard non-object areas. To read more about boosted classifiers, see the corresponding OpenCV documentation section.
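The detection side is only a few lines once the cascade has been trained offline (with opencv_traincascade on your positive/negative crops). This is only a sketch: 'cctv_cascade.xml' is a hypothetical file produced by that training step, and the parameters would need tuning.

import cv2

cascade = cv2.CascadeClassifier('cctv_cascade.xml')    # output of opencv_traincascade
img = cv2.imread('plan.png', cv2.IMREAD_GRAYSCALE)

hits = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=3, minSize=(24, 24))
for (x, y, w, h) in hits:
    # survivors of the cascade; verify them with your rotation-correlation routine
    print('candidate at', x, y, w, h)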
This problem in general is addressed by logo detection; see this for a similar discussion.
There are many robust methods for template matching; see this, or Google, for a very detailed discussion.
But from your example I can guess that the following approach would work.
Create a feature for your search image. It essentially has a rectangle enclosing the word "CCTV", so the width, height, angle, and individual character features for matching the textual information could be a suitable choice. (Or you may also use the image containing "CCTV"; in that case the method will not be scale invariant.)
Now, when searching, first detect rectangles. Then use the angle to prune your search space, and apply an image transformation to align the rectangles with the axes (this should take care of the need for rotation). Then, according to the feature chosen in step 1, match the text content. If you use individual character features, your template matching step is essentially a classification step; otherwise, if you use an image for matching, you may use cv::matchTemplate.
Hope it helps.
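A rough Python/OpenCV sketch of the rectangle-then-align-then-match part of that pipeline (assumes the OpenCV 4.x API; the file names, the area cut-off and the 0.7 score threshold are only illustrative):

import cv2

img = cv2.imread('plan.png', cv2.IMREAD_GRAYSCALE)
templ = cv2.imread('cctv_template.png', cv2.IMREAD_GRAYSCALE)

_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    (cx, cy), (w, h), angle = cv2.minAreaRect(c)
    if w * h < 100:                                   # prune tiny blobs
        continue
    # rotate the image so this rectangle becomes axis-aligned, then crop it
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    rot = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    crop = cv2.getRectSubPix(rot, (int(w), int(h)), (cx, cy))
    if crop.shape[0] >= templ.shape[0] and crop.shape[1] >= templ.shape[1]:
        score = cv2.matchTemplate(crop, templ, cv2.TM_CCOEFF_NORMED).max()
        if score > 0.7:
            print('CCTV candidate near', (int(cx), int(cy)), 'score', round(float(score), 2))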
Symbol spotting is more complicated than logo spotting, because interest points hardly work on document images such as architectural plans. Many conferences deal with pattern recognition, and each year there are many new algorithms for symbol spotting, so pointing you to the single best method is not possible. You could check the IAPR conferences: ICPR, ICDAR, DAS, GREC (Workshop on Graphics Recognition), etc. These researchers focus on this topic: M. Rusiñol, J. Lladós, S. Tabbone, J.-Y. Ramel, M. Liwicki, etc. They work on several techniques for improving symbol spotting, such as vectorial signatures, graph-based signatures, and so on (check Google Scholar for more papers).
An easy way to start a new approach is to work with simple shapes such as lines, rectangles and triangles instead of matching everything at once.
Your example can be recognized by shape matching (contour matching), which is much faster than 4 minutes.
For a good match, you need decent preprocessing and denoising.
Examples can be found at http://www.halcon.com/applications/application.pl?name=shapematch
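To illustrate what contour matching looks like in code (this uses OpenCV's Python bindings purely as an example; skimage has equivalent building blocks). File names and the 0.1 distance threshold are made up; cv2.matchShapes is Hu-moment based, so it tolerates rotation and scale:

import cv2

def main_contour(gray):
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    cnts, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(cnts, key=cv2.contourArea)

templ_cnt = main_contour(cv2.imread('cctv_template.png', cv2.IMREAD_GRAYSCALE))
img = cv2.imread('plan.png', cv2.IMREAD_GRAYSCALE)

_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cnts, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in cnts:
    if cv2.contourArea(c) < 50:
        continue
    dist = cv2.matchShapes(templ_cnt, c, cv2.CONTOURS_MATCH_I1, 0.0)   # lower = better
    if dist < 0.1:
        print('shape match, distance', dist)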

SIFT features for "similar" objects

I have found that SIFT features are only good for finding the same object in the scene; they seem not suitable for "similar" objects.
Maybe I am doing something wrong?
Maybe I should use some other descriptors?
Images, and how the SIFT/ASIFT algorithms perform on them:
link
Same problem - no matches:
link
"I have found that SIFT features are only good for finding the same object in the scene; they seem not suitable for 'similar' objects."
That is exactly what they are doing (and not only them; the task is called "wide baseline matching"):
1) for each feature, find the most similar one - called a "tentative" or "putative" correspondence;
2) use RANSAC or another similar method to find the geometric transformation between the sets of correspondences.
So, if you need to find "similar" objects, you have to use another method, such as Viola-Jones: http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework
Or (but it will give you a lot of false positives) you can compare the big image to the small one and skip step 2.
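For reference, the two steps above look roughly like this in Python with OpenCV (a hedged sketch: the file names are placeholders, and the 0.75 ratio and 5.0 px RANSAC threshold are the usual textbook defaults, not tuned values):

import cv2
import numpy as np

img1 = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(img1, None)
k2, d2 = sift.detectAndCompute(img2, None)

# step 1: tentative correspondences via nearest neighbours + ratio test
pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

# step 2: geometric verification with RANSAC
if len(good) >= 4:
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print('inliers:', 0 if mask is None else int(mask.sum()))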
The basic SIFT algorithm using VLFeat gives me this as a result, which, given the small and not-so-unique target image, is a pretty good result, I would say.

OpenCV / SURF: How to generate an image hash / fingerprint / signature out of the descriptors?

There are some topics here that are very helpful on how to find similar pictures.
What I want to do is get a fingerprint of a picture and find the same picture in different photos taken by a digital camera. The SURF algorithm seems to be the best way to be independent of scaling, angle, and other distortions.
I'm using OpenCV with the SURF algorithm to extract features from the sample image. Now I'm wondering how to convert all this feature data (position, laplacian, size, orientation, hessian) into a fingerprint or hash.
This fingerprint will be stored in a database, and a search query must be able to compare that fingerprint with the fingerprint of a photo with almost the same features.
Update:
It seems that there is no way to convert all the descriptor vectors into a simple hash. So what would be the best way to store the image descriptors in the database for fast querying?
Would Vocabulary Trees be an option?
I would be very thankful for any help.
The feature data you mention (position, laplacian, size, orientation, hessian) is insufficient for your purpose (these are actually the less relevant parts of the descriptor if you want to do matching). The data you want to look at are the "descriptors" (the 4th argument):
void cvExtractSURF(const CvArr* image, const CvArr* mask, CvSeq** keypoints, CvSeq** descriptors, CvMemStorage* storage, CvSURFParams params)
These are vectors of 128 or 64 floats (depending on params) which contain the "fingerprints" of the specific features (each image will yield a variable number of such vectors).
If you get the latest version of OpenCV, there is a sample named find_obj.cpp which shows how they are used for matching.
update:
You might find this discussion helpful.
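In the newer Python API the equivalent of that 4th argument is simply the array returned by detectAndCompute, e.g. (SURF lives in opencv-contrib and is non-free, so treat this as a sketch; the file name is made up):

import cv2

img = cv2.imread('sample.png', cv2.IMREAD_GRAYSCALE)
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # or cv2.SIFT_create()
kps, desc = surf.detectAndCompute(img, None)
print(desc.shape)   # (number_of_keypoints, 64) by default, 128 with extended=True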
A trivial way to compute a hash would be the following. Get all the descriptors from the image (say, N of them). Each descriptor is a vector of 128 numbers (you can convert them to integers between 0 and 255). So you have a set of N*128 integers. Just write them one after another into a string and use that as a hash value. If you want the hash values to be small, I believe there are ways to compute hash functions of strings, so convert the descriptors to a string and then use the hash value of that string.
That might work if you want to find exact duplicates. But it seems (since you talk about scale, rotation, etc.) that you just want to find "similar" images. In that case, using a hash is probably not a good way to go. You probably use some interest point detector to find the points at which to compute SURF descriptors. Imagine that it returns the same set of points, but in a different order: suddenly your hash value will be very different, even though the images and descriptors are the same.
So, if I had to find similar images reliably, I'd use a different approach. For example, I could vector-quantize the SURF descriptors, build histograms of the vector-quantized values, and use histogram intersection for matching. Do you really have to use hash functions (maybe for efficiency), or do you just want to use whatever works to find similar images?
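The vector-quantize + histogram-intersection idea is essentially a small bag of visual words. A toy Python sketch, where the vocabulary size and the random arrays standing in for real SURF/SIFT descriptors are arbitrary placeholders:

import cv2
import numpy as np

K = 200   # vocabulary size (number of visual words)

def bow_hist(desc, centers):
    # assign each descriptor to its nearest cluster centre, then histogram the labels
    d = np.linalg.norm(desc[:, None, :] - centers[None, :, :], axis=2)
    words = d.argmin(axis=1)
    h = np.bincount(words, minlength=len(centers)).astype(np.float32)
    return h / max(h.sum(), 1.0)

# all_desc: descriptors stacked from many training images (placeholder here)
all_desc = np.random.rand(5000, 64).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, _, centers = cv2.kmeans(all_desc, K, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

h1 = bow_hist(np.random.rand(300, 64).astype(np.float32), centers)   # image 1
h2 = bow_hist(np.random.rand(250, 64).astype(np.float32), centers)   # image 2
print('histogram intersection:', float(np.minimum(h1, h2).sum()))    # in [0, 1]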
It seems like GIST might be a more appropriate thing to use.
http://people.csail.mit.edu/torralba/code/spatialenvelope/ has MATLAB code.
Min-Hash, or min-Hashing, is a technique that might help you. It encodes the whole image in a representation of adjustable size that is then stored in hash tables. Several variants exist, such as Geometric min-Hashing, Partition min-Hash, and Bundle min-Hashing. The resulting memory footprint is not among the smallest, but these techniques work for a variety of scenarios, such as near-duplicate retrieval and even small-object retrieval - a scenario where other short signatures often do not perform very well.
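To give a feel for the core mechanism, here is a toy sketch of plain min-Hashing over sets of visual-word ids (not of the geometric/partition/bundle variants): the fraction of agreeing min-hashes estimates the Jaccard overlap of the two word sets. The word ids below are made up.

import random

def minhash_signature(word_set, n_hashes=64, prime=2**31 - 1, seed=0):
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(n_hashes)]
    return [min((a * w + b) % prime for w in word_set) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# visual-word ids of two images (placeholder sets)
img_a = {3, 17, 42, 99, 256, 1000}
img_b = {3, 17, 42, 120, 256, 2048}
print(estimated_jaccard(minhash_signature(img_a), minhash_signature(img_b)))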
There are several papers on this topic. Entry literature would be:
Near Duplicate Image Detection: min-Hash and tf-idf Weighting
Ondrej Chum, James Philbin, Andrew Zisserman, BMVC 2008 PDF
