I have found that SIFT features are only good for finding the same object in a scene; they do not seem suitable for "similar" objects.
Maybe I am doing something wrong?
Maybe I should use some other descriptors?
Here are the images and how the SIFT/ASIFT algorithms perform:
link
Same problem - no matches:
link
That is exactly what they are doing (and not only them; the task is called "wide baseline matching"): 1) for each feature, find the most similar one in the other image - this gives a "tentative" or "putative" correspondence;
2) use RANSAC or another similar method to find a geometric transformation between the two sets of correspondences.
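For concreteness, here is a minimal sketch of those two steps with OpenCV's Python bindings (not your code; the file names are placeholders and OpenCV >= 4.4 is assumed so that SIFT lives in the main module):

    import cv2
    import numpy as np

    img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)   # the object you look for
    img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)    # the scene to search in

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Step 1: tentative (putative) correspondences - nearest neighbour + ratio test
    matcher = cv2.BFMatcher()
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    # Step 2: RANSAC fits a geometric transformation and rejects the outliers
    if len(good) >= 4:
        src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        print("geometrically consistent matches:", int(mask.sum()))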
So if you need to find "similar" objects, you have to use a different method, such as Viola-Jones: http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework
Or (though it will give you a lot of false positives) you can compare the big image to the small one and skip step 2.
The basic SIFT algorithm using VLFeat gives me this as a result. Given the small and not-so-unique target image, that is a pretty good result, I would say.
I have extracted DenseSIFT from the query and database images and quantized them with k-means using VLFeat. The challenge is to find those SIFT features that are quantized to the same visual words and are spatially consistent (have a similar position relative to the object centers). I have tried a few techniques:
Using FLANN on the (normal) SIFT coordinates of both the query and database image to find the nearest neighbors, and then comparing the visual words (note: this gave only a few points, and it did not work).
Using Coherent Point Drift (CPD) on the SIFT coordinates to find the matched points (I am not sure whether this is the right solution or not).
I have been struggling with this for many days, and I hope experts can guide me. What are possible solutions or algorithms I can use to solve this?
Neither of the two methods you mentioned achieves what you want to do. The answer depends on the object in your pictures. If it has mostly flat faces, then you can rely on estimating a homography; see this tutorial.
If that's not the case, then you can use the epipolar constraint to remove outliers and keep only geometrically consistent matches; see this tutorial. There are other ways to achieve this if speed is important in your application.
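If it helps, here is a minimal sketch of the epipolar-constraint filtering with OpenCV's Python bindings; it assumes you have already turned your visual-word matches into two arrays of point coordinates (the function name is mine, not a library one):

    import cv2
    import numpy as np

    def epipolar_filter(pts1, pts2):
        """Keep only the correspondences consistent with one epipolar geometry.
        pts1, pts2: Nx2 float32 arrays of tentatively matched point coordinates
        (query image and database image, respectively)."""
        F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
        keep = mask.ravel() == 1
        return pts1[keep], pts2[keep]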
I want to find images similar to another image. After researching, I found two methods. The first was to represent the image by its attributes, like
length = full
pattern = check
color = blue
but the limitation of this method is that I will not be able to get an exhaustive dataset with all the features marked.
The second approach I found was to extract features and do feature mapping.
So I decided to use deep convolutional neural networks with Caffe, so that by using any of the existing models I could learn the features and then perform feature matching or some other operation. I just wanted general advice on what other methods are good and worth a try. And since I am just starting out with Caffe, can anyone give a general guideline on how to approach the problem with Caffe?
Thanks in advance
I looked at pHash. I was just curious whether it will only find images that are the same up to minor intensity variations and similar changes, or whether it will also return images of the same type (semantically). For example, for a t-shirt with blue and red stripes, will it consider a black-and-white striped one similar, and would it take into account things like the length of the shirt, collar style, etc.?
It's true that it has been empirically shown that the Euclidean distance between features extracted using ConvNets is smaller for images of the same class and larger for images of different classes - but it's important to understand what kind of similarity you're looking for.
One can define many types of similarity measures, and the type of features you use (in the case of ConvNets, the type of data the network was trained on) affects the kind of similar images you'll get. For instance, maybe given an image of a dog, you want to find other pictures of dogs but not specifically that exact dog; alternatively, maybe you have a picture of a church and you want to find another image of the exact same church but from a different angle. These are two very different problems, with different methods you can use to solve them.
One particular kind of convolutional neural network you can look at is the Siamese network, which is built to learn similarities between two images, given a dataset of image pairs labelled same/not_same. You can find a Caffe implementation of this method here.
A different method is to take a ConvNet trained on ImageNet data (see here for options), use the Python/MATLAB interface to classify images, and then extract the second-to-last layer and use that as the representation of each image. Now you can just take the Euclidean distance between those representations, and this is your similarity measure.
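As a rough illustration (not a definitive recipe), this is roughly what that pipeline looks like with the Caffe Python interface; the file names, the 227x227 input size, and the 'fc7' layer name are assumptions based on the BVLC reference CaffeNet and would change with another model:

    import caffe
    import numpy as np

    caffe.set_mode_cpu()
    net = caffe.Net('deploy.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)
    net.blobs['data'].reshape(1, 3, 227, 227)          # one image per forward pass

    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))       # HxWxC -> CxHxW
    transformer.set_raw_scale('data', 255)             # [0,1] floats -> [0,255]
    transformer.set_channel_swap('data', (2, 1, 0))    # RGB -> BGR (mean subtraction omitted)

    def embed(path):
        """Second-to-last layer ('fc7' in CaffeNet) used as the image representation."""
        net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(path))
        net.forward()
        return net.blobs['fc7'].data[0].copy()

    # smaller distance = more similar under this representation
    d = np.linalg.norm(embed('shirt_a.jpg') - embed('shirt_b.jpg'))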
Unrelated to Caffe, you can also use "old school" feature matching methods, included in open-source libraries like OpenCV (here is an example tutorial for such a method).
I have 5000 images, and each image can generate a vector with about 1000 dimensions (HOG features), but some of the images are very similar, so I want to remove the similar ones. Is there a way to achieve this?
EDIT:
As @thedarkside ofthemoon suggested, let me explain a little more about what I am trying to do. I am using SVM + HOG features for image classification. I have prepared some training data, but some of the training images are very similar, so I want to remove them to reduce the computation cost. I don't know whether removing similar images has a side effect on the final classification rate, so a good criterion for 'similarity' must be found. That's what I am trying to do.
Another way (not using HOG features) is to compute a color histogram for each image and compare it against the others.
Like this:
Get the first image and compute its histogram.
Then, for each of the other images, calculate the histogram and compare it with the first one.
If you find a close match between the histograms, you can discard that image. Using CV_COMP_CORREL you will get a match score in the range 0-1.
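A minimal sketch of that loop with OpenCV's Python bindings (HISTCMP_CORREL is the newer name for CV_COMP_CORREL; the 0.9 threshold and file names are just placeholders):

    import cv2

    def hsv_hist(path):
        """Normalised 2D hue/saturation histogram of an image."""
        hsv = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        return cv2.normalize(hist, hist)

    reference = hsv_hist('image_000.jpg')
    for path in ['image_001.jpg', 'image_002.jpg']:
        score = cv2.compareHist(reference, hsv_hist(path), cv2.HISTCMP_CORREL)  # 1.0 = identical
        if score > 0.9:
            print(path, 'looks like a near-duplicate (score %.2f)' % score)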
Well, it depends on what you mean by similar; currently my favorite image similarity descriptor is the GIST descriptor.
http://people.csail.mit.edu/torralba/code/spatialenvelope/
It is not in OpenCV, but it is coded in C here, so it can be added to a C++ project (extern "C") if you are using the C++ OpenCV; not sure about Python, sorry.
http://people.rennes.inria.fr/Herve.Jegou/software.html
I have found it to be pretty good and quite efficient.
(Sorry this is not a direct OpenCV solution, but I feel it is a reasonable answer, as the GIST C code can be added to a C++ project and works nicely.)
EDIT:
If you just want to remove images with similar HOG descriptors, you can use:
http://docs.opencv.org/modules/ml/doc/k_nearest_neighbors.html
or
http://docs.opencv.org/trunk/modules/flann/doc/flann_fast_approximate_nearest_neighbor_search.html
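As a rough sketch of that idea (using scikit-learn's NearestNeighbors as a stand-in for the OpenCV kNN/FLANN modules linked above; the distance threshold is something you have to tune on your own data):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def prune_near_duplicates(hog_vectors, threshold):
        """Return the indices of images to keep, dropping the nearest neighbour of an
        image whenever it lies closer than `threshold` in HOG space."""
        nn = NearestNeighbors(n_neighbors=2).fit(hog_vectors)
        dist, idx = nn.kneighbors(hog_vectors)   # column 0 is each point itself
        keep, removed = [], set()
        for i in range(len(hog_vectors)):
            if i in removed:
                continue
            keep.append(i)
            if dist[i, 1] < threshold:
                removed.add(int(idx[i, 1]))      # drop the closest near-duplicate
        return keep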
I have a large image (5400x3600) that has multiple CCTVs that I need to detect.
The detection takes a lot of time (4-7 minutes) with rotation, and it still fails to resolve certain CCTVs.
What is the best method to match a template like this?
I am using skimage (scikit-image) - OpenCV is not an option for me, but I am open to suggestions on that too.
For example: in the images below, the template is correctly matched with the second image, but the first image is not matched - I guess due to the noise created by the text "BLDG...".
Template:
Source image:
Match result:
The fastest method is probably a cascade of boosted classifiers trained with several variations of your logo, possibly a few rotations, and some negative examples too (non-logos). You have to roughly scale your overall image so the test and training examples are approximately matched in scale. Unlike SIFT or SURF, which spend a lot of time searching for interest points and creating descriptors for both learning and searching, binary classifiers shift most of the burden to the training stage, so your testing or search will be much faster.
In short, the cascade runs in such a way that the very first test discards a large portion of the image. If the first test passes, the others follow and refine. They are super fast, consisting of just a few intensity comparisons on average around each point. Only a few locations will pass the whole cascade, and those can be verified with additional tests such as your rotation-correlation routine.
Thus, the classifiers are effective not only because they quickly detect your object, but also because they can quickly discard non-object areas. To read more about boosted classifiers, see the corresponding OpenCV section.
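A minimal sketch of the detection side, assuming you have already trained a cascade (e.g. with opencv_traincascade) from positive logo crops and negative background patches; the XML and image file names are placeholders:

    import cv2

    cascade = cv2.CascadeClassifier('cctv_cascade.xml')       # your trained cascade
    plan = cv2.imread('site_plan.png', cv2.IMREAD_GRAYSCALE)

    # Most windows are rejected by the first cheap stages, so even a 5400x3600 image
    # scans quickly; scaleFactor / minNeighbors / minSize need tuning for your logo.
    hits = cascade.detectMultiScale(plan, scaleFactor=1.1, minNeighbors=4, minSize=(24, 24))
    for (x, y, w, h) in hits:
        # survivors of the cascade can be verified with a slower rotation-correlation check
        print('candidate logo at', x, y, w, h)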
This problem is generally addressed as logo detection. See this for a similar discussion.
There are many robust methods for template matching. See this, or Google for a very detailed discussion.
But from your example I can guess that the following approach would work.
Create a feature for your search image. It essentially has a rectangle enclosing the word "CCTV", so the width, height, angle, and individual character features for matching the textual information could be a suitable choice. (Or you may use the image containing "CCTV" directly; in that case the method will not be scale invariant.)
When searching, first detect rectangles (see the sketch below). Then use the angle to prune your search space, and apply an image transformation to align the rectangles parallel to the axes (this should take care of the rotation). Then, according to the feature chosen in step 1, match the text content. If you use individual character features, your template matching step is essentially a classification step; otherwise, if you use an image for matching, you may use cv::matchTemplate.
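A minimal sketch of the rectangle-detection step with OpenCV contours (OpenCV 4 is assumed for the findContours return values; the thresholds are guesses that will need tuning on plan scans):

    import cv2

    plan = cv2.imread('site_plan.png', cv2.IMREAD_GRAYSCALE)
    _, bw = cv2.threshold(plan, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    candidates = []
    for c in contours:
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) > 100:   # four corners ~ a rectangle
            rect = cv2.minAreaRect(c)        # ((cx, cy), (w, h), angle)
            candidates.append(rect)          # the angle prunes the search and deskews the crop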
Hope it helps.
Symbol spotting is more complicated than logo spotting because interest points hardly work on document images such as architectural plans. Many conferences deal with pattern recognition, and each year there are many new algorithms for symbol spotting, so it is not possible to give you the single best method. You could check the IAPR conferences: ICPR, ICDAR, DAS, GREC (Workshop on Graphics Recognition), etc. These researchers focus on this topic: M. Rusiñol, J. Lladós, S. Tabbone, J.-Y. Ramel, M. Liwicki, etc. They work on several techniques for improving symbol spotting, such as vectorial signatures, graph-based signatures, and so on (check Google Scholar for more papers).
An easy way to start a new approach is to work with simple shapes such as lines, rectangles, and triangles instead of matching everything at once.
Your example can be recognized by shape matching (contour matching), which is much faster than 4 minutes.
For a good match, you need decent preprocessing and denoising.
Examples can be found at http://www.halcon.com/applications/application.pl?name=shapematch
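As a rough illustration of contour matching, here is a sketch with OpenCV's matchShapes (Hu-moment based); the file names and the assumption that the largest contour is the logo are mine:

    import cv2

    def largest_contour(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        return max(contours, key=cv2.contourArea)

    template = largest_contour('cctv_template.png')
    candidate = largest_contour('candidate_crop.png')

    # 0.0 means identical shapes; larger values mean less similar
    print('shape distance:', cv2.matchShapes(template, candidate, cv2.CONTOURS_MATCH_I1, 0.0))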
I want to perform a classification task in which I map a given image of an object to one of a list of predefined constellations that the object can be in (i.e., find the most probable match).
In order to get descriptors of the image (on which I will run machine learning algorithms), I was advised to use SIFT with the VLFeat implementation.
First of all, my main question: I would like to ignore the key-point finding part of SIFT and only use it for its descriptors. In the tutorial I saw that there is an option to do exactly that by calling
[f,d] = vl_sift(I,'frames',fc) ;
where fc specifies the key-points. My problem is that I want to explicitly specify the bounding box in which to calculate the descriptor around the key-point, but it seems I can only specify a scale parameter, which right now is a bit cryptic to me and doesn't let me specify the bounding box explicitly. Is there a way to achieve this?
The second question: does setting the scale manually and getting the descriptors this way make sense (i.e., does it result in a good descriptor)? Any other suggestions for better ways of getting descriptors (using SIFT with other implementations, or other non-SIFT descriptors)? I should mention that my object is always the only object in the image, is centered, has constant illumination, and changes by certain rotations of its internal parts - this is why I thought SIFT would work, as I understand it focuses on orientation gradients, which would change accordingly with the rotations of the object.
Thanks
Agreed that the descriptor scale looks a bit cryptic.
See the third image in the VLFeat SIFT tutorial, where they overlay the extracted descriptors on the image with the following commands:
h3 = vl_plotsiftdescriptor(d(:,sel),f(:,sel)) ;
set(h3,'color','g') ;
You can thus play with the scale and see whether the region where the histogram is extracted lines up with what you expected.
SIFT sounds like it might be overkill for your application, given how much control you have over the imaging environment, but it should work.
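For comparison (not VLFeat, and only a sketch), OpenCV's Python bindings expose the same idea: you hand SIFT your own keypoints so it skips detection and only computes descriptors, with the keypoint size playing the role of the scale and roughly setting the support region of the histogram:

    import cv2

    img = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()

    # one descriptor centred on the object; 'size' is (roughly) the diameter of the
    # neighbourhood the descriptor is computed over - the closest thing to a bounding-box width
    kp = [cv2.KeyPoint(img.shape[1] / 2.0, img.shape[0] / 2.0, 64)]
    kp, desc = sift.compute(img, kp)   # desc has shape (1, 128)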
Hey.
It might help to look through the background chapter of this thesis:
http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2001260
It would take me a while to explain the scale, so try reading it and see the relevant citation. By the way, in that work the descriptors are used at the base resolution, i.e., scale ~ 1.
Hope this helps.
Maybe I did not understand the problem, but if the query image must be matched against a database of training images, and both the training and test images are constant in illumination, scale, etc., maybe SIFT is not necessary here. You could have a look at correlation. Are you using MATLAB?
"Here" you can see an example using correlation with OpenCV: http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html#template-matching
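A minimal sketch of what that tutorial does, using OpenCV's Python bindings (file names are placeholders); if you are in MATLAB, normxcorr2 gives you the same normalised cross-correlation:

    import cv2

    scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
    templ = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)

    result = cv2.matchTemplate(scene, templ, cv2.TM_CCOEFF_NORMED)  # normalised correlation
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    print('best match at', max_loc, 'with score %.2f' % max_val)   # 1.0 = perfect match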