My question is - can I recognize different templates in a source image using feature detection in OpenCV? Let's say my templates are road signs.
I am using ORB, but this is not a tracker-specific question.
My basic approach without feature detection is (see the sketch after this list):
Image preparation (filtering, etc.);
Detecting the ROI where my object may be located;
Resizing the ROI to the templates' size and comparing it with each template I have (i.e. template matching);
The template with the maximum correlation is the object I'm looking for.
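A minimal Python sketch of that pipeline, assuming the ROI has already been extracted; the file names ("roi.png", a "templates/" folder) are placeholders, not anything from the original question:

```python
import cv2
import glob

# Hypothetical inputs: the ROI cut out of the source image and a folder
# with one image per road-sign template.
roi = cv2.imread("roi.png", cv2.IMREAD_GRAYSCALE)

best_score, best_path = -1.0, None
for path in glob.glob("templates/*.png"):
    template = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Resize the ROI to the template's size so the comparison yields one value.
    roi_resized = cv2.resize(roi, (template.shape[1], template.shape[0]))
    score = cv2.matchTemplate(roi_resized, template, cv2.TM_CCOEFF_NORMED)[0][0]
    if score > best_score:
        best_score, best_path = score, path

print("best match:", best_path, "correlation:", best_score)
```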
But with feature detection, I detect keypoints and compute descriptors for each image in my template set and for the ROI where the object might be located; the matcher, however, just returns distances for all the descriptors in my ROI.
I can't turn this into any correlation measure between the ROI and the templates; in other words, I can't decide whether the ROI image and a template image show the same object based on the information the matcher provides.
So, to be more specific: is my approach wrong, and are feature detectors meant for detecting a single template object in a source image (which is not what I need), or am I just not grasping the basic concepts of feature detection and in need of help?
You may be missing two aspects. One is to remove outliers in your feature matching using a method like RANSAC + homography. The second is to project the corners of your template into the scene, which gives you a "rectangle" outlining the detected object. You should also define a threshold on the minimum number of inliers you require before accepting a detection.
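A minimal Python sketch of what that answer describes (ORB matching, RANSAC homography, corner projection, inlier threshold); the file names and the MIN_INLIERS value are assumptions, and older OpenCV releases spell the constructor cv2.ORB() instead of cv2.ORB_create():

```python
import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # one road sign
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)        # source image / ROI

orb = cv2.ORB_create()
kp_t, des_t = orb.detectAndCompute(template, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Hamming distance for binary ORB descriptors, with cross-checking.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_t, des_s)

MIN_INLIERS = 10  # assumed threshold; tune on your data
if len(matches) >= 4:
    src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    if H is not None and inliers >= MIN_INLIERS:
        # Project the template corners into the scene to outline the detection.
        h, w = template.shape
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        outline = cv2.perspectiveTransform(corners, H)
        print("template found, inliers:", inliers)
```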
Check this tutorial on finding objects with feature detection.
I will refer you to the book 'OpenCV 2 Computer Vision Application Programming Cookbook'.
Just browse the relevant chapters.
Related
I am working on a project where I have to recognize objects on grocery shelves. You can see a sample image below:
I need to find which products exist in an image. An example result image is shown below:
OpenCV tools like SURF, SIFT, and ORB detect only one occurrence of an object in an image. Can you suggest some papers or tools to solve this problem?
There are several techniques for detecting multiple instances of the same object in an image.
The most primitive is template matching: you create a database of training images at multiple scales and rotations so that such objects can be detected under varying conditions. There are, however, many techniques that outperform this legacy approach.
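For illustration, a rough multi-scale template-matching sketch in Python; the file names, scale range, and correlation threshold are assumptions, and overlapping hits would still need non-maximum suppression:

```python
import cv2
import numpy as np

# Hypothetical inputs: a shelf photo and one product template.
scene = cv2.imread("shelf.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("product.png", cv2.IMREAD_GRAYSCALE)

THRESH = 0.8  # assumed correlation threshold; tune for your images
detections = []
for scale in np.linspace(0.5, 1.5, 11):
    resized = cv2.resize(template, None, fx=scale, fy=scale)
    if resized.shape[0] > scene.shape[0] or resized.shape[1] > scene.shape[1]:
        continue
    result = cv2.matchTemplate(scene, resized, cv2.TM_CCOEFF_NORMED)
    # Every location above the threshold is a candidate instance.
    ys, xs = np.where(result >= THRESH)
    for x, y in zip(xs, ys):
        detections.append((x, y, resized.shape[1], resized.shape[0]))

print(len(detections), "raw detections (before non-maximum suppression)")
```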
Other techniques use texture features that are invariant to scale, rotation, or both; for example GLCM, LBP, HOG, SIFT, ORB, and others.
Your statement "OpenCV tools like SURF, SIFT, and ORB detect only one occurrence of an object in an image" needs refinement.
The listed tools are not intended to detect objects; they are means of extracting texture features.
You are the one who has to adapt them to detect multiple objects.
There is a finer way to solve your problem. It seems that every object you need to detect contains the text TASSAY.
You can extract that text using a group of morphological operations and then locate it with a blob detector.
Once the text is found, its location is easy to measure.
The object's bounding box can then be inferred from the text location.
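A hedged sketch of that idea in Python: a morphological gradient plus closing to merge characters into word-sized blobs, then simple size and aspect-ratio filtering to keep text-like regions. The kernel sizes, thresholds, and file name are assumptions, not values from the original answer:

```python
import cv2

img = cv2.imread("shelf.png", cv2.IMREAD_GRAYSCALE)  # hypothetical shelf image

# Morphological gradient highlights character strokes.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
grad = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
_, bw = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Closing with a wide kernel merges characters into word-level blobs.
close_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
closed = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, close_kernel)

# [-2] keeps the contour list across OpenCV 3.x/4.x return conventions.
contours = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # Keep wide, short blobs that look like a printed word (assumed ratios).
    if w > 3 * h and w > 40:
        cv2.rectangle(img, (x, y), (x + w, y + h), 255, 2)
```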
Hope it helps.
I was reading about image segmentation, and I understood that it is the first step in image analysis. But I also read that if I am using SURF or SIFT to detect and extract features there is no need for segmentation. Is that true? Is there a need for segmentation if I am using SURF?
The dependency between segmentation and recognition is a bit more complex. Clearly, knowing which pixels of the image belong to your object makes recognition easier. However, the relationship also works in the other direction: knowing what is in the image makes it easier to do segmentation. For simplicity, though, I will only speak about a simple pipeline where segmentation is performed first (for instance, based on some simple color model) and each of the segments is then processed.
Your question specifically asks about the SURF features. However, in this context, what is important is that SURF is a local descriptor, i.e. it describes small image patches around detected keypoints. Keypoints should be points in the image where information relevant to your recognition problem can be found (interesting parts of the image), but also points that can reliably be detected in a repeatable fashion on all images of objects belonging to the class of interest. As a result, a local descriptor only cares about the pixels around points selected by the keypoint detector and for each such keypoint extracts a small feature vector. On the other hand a global descriptor will consider all pixels within some area, typically a segment, or the whole image.
Therefore, to perform recognition in an image using a global descriptor, you need to first select the area (segment) from which you want your features to be extracted. These features would then be used to recognize the content of the segment. The situation is a bit different with a local descriptor, since it describes local patches that the keypoint detector determines to be relevant. As a result, you get multiple feature vectors for multiple points in the image, even if you do not perform segmentation. Each of these feature vectors tells you something about the content of the image, and you can try to assign each such local feature vector to a "class" and gather their statistics to understand the content of the image. Such a simple model is called the bag-of-words model.
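As a rough illustration of the bag-of-words idea, here is a Python sketch that clusters local SIFT descriptors into a visual vocabulary and then describes an image as a histogram over those visual words. The training file names and the vocabulary size are assumptions, and SIFT availability depends on your OpenCV build (cv2.SIFT_create in recent versions):

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

# 1) Collect descriptors from a set of training images (hypothetical paths).
train_paths = ["train1.png", "train2.png"]
all_desc = []
for p in train_paths:
    img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is not None:
        all_desc.append(desc)
all_desc = np.vstack(all_desc).astype(np.float32)

# 2) Build the vocabulary with k-means (K visual words; K is assumed).
K = 50
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, _, vocabulary = cv2.kmeans(all_desc, K, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

# 3) Describe any image as a normalized histogram over the vocabulary.
def bow_histogram(img):
    _, desc = sift.detectAndCompute(img, None)
    hist = np.zeros(K, dtype=np.float32)
    if desc is None:
        return hist
    # Assign each local descriptor to its nearest visual word.
    dists = np.linalg.norm(desc[:, None, :] - vocabulary[None, :, :], axis=2)
    for w in dists.argmin(axis=1):
        hist[w] += 1
    return hist / max(hist.sum(), 1)
```

Such histograms can then be fed to any classifier to decide what the image (or segment) contains.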
Context:
I have the RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection on the RGB image, preprocessing by downscaling to 320x240, converting to grayscale, stretching the contrast, and equalizing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video image. Those keypoints are then used to build object descriptors, which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube: SURF detects a few bits of grain on one face, then fails on the other faces. I need to detect (something like) the fact that there are four corners at equal distances and right angles. None of my objects has much of a pattern, but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin. My thought was that I could build object descriptors for each significantly different-looking orientation of an object, e.g. two descriptors for a bowling pin: one standing up and one lying down. For a cellphone, one lying on the front and one on the back. My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. The ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would be preferable, and SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project and ended up using a very simple method for object recognition: OpenCV blob detection, recognizing objects based on their areas. Obviously, there needs to be enough variance in area between objects for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
I know there are other methods out there; one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
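A hedged Python sketch combining both suggestions: segment the blob from a plain background, then use the contour area together with the number of approxPolyDP vertices as simple shape cues. The thresholds and the input file name are assumptions:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical camera frame
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# [-2] keeps the contour list across OpenCV 3.x/4.x return conventions.
contours = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
for c in contours:
    area = cv2.contourArea(c)
    if area < 500:  # assumed noise threshold
        continue
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # e.g. 4 vertices and a near-square bounding box could indicate a cube face.
    print("area:", area, "vertices:", len(approx))
```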
Would love to hear about your progress on this!
I am trying to do object recognition using algorithms such as SURF, FERN, FREAK in opencv 2.4.2.
I am using the programs from opencv samples without modifications - find_obj.cpp, find_obj_ferns.cpp, freak_demo.cpp
I tried changing the parameters of the algorithms, which didn't help.
I have my training images, test images and the result of FREAK recognition here
As you can see the result is pretty bad.
No feature descriptors are detected for one of the training images - image here
Feature descriptors are detected outside the object boundary for the other - image here
I have a few questions:
Why do these algorithms work with grayscale images? It is apparent that for my training images above, the object could be detected easily if RGB were taken into account. Is there any technique that does this?
Is there any other way to improve performance? I tried fiddling with the feature parameters, which didn't work well.
The first thing I observed in your image is that the object is flat and shows no texture differences. All the feature detectors you used are designed to find corners, which are view-invariant: keypoints whose neighborhood is unique and whose x and y derivatives have a large magnitude. I have uploaded my analysis; see the figures.
How can you check that what I am saying is correct?
Just look at the descriptor values of a keypoint found on your object: you will see that most of them are zeros, because a descriptor describes the variation of the edges around a corner point in a specific direction (see the SURF documentation for more details).
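For example, a small Python snippet to inspect those descriptor values (SURF needs an opencv-contrib build with the non-free modules enabled; the file name and Hessian threshold are assumptions):

```python
import cv2

img = cv2.imread("train.png", cv2.IMREAD_GRAYSCALE)  # hypothetical training image
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(img, None)

if descriptors is not None:
    # On a flat, textureless surface most entries stay close to zero.
    print(descriptors[0])
```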
The object you are trying to detect looks like a mobile phone, so just flip the object over and repeat the experiment; you will very likely get better results, because the front side of such objects generally has more texture, like switches, logos, etc.
Here is a result I uploaded:
Does OpenCV have an implementation of shape context matching? I've found only the matchShapes() function, which does not work for me. From shape context matching I want to get a set of corresponding features. Is it a good idea to use it to compare contours and find the rotation and displacement of a detected contour between two different images?
Some example code would also be very helpful for me.
I want to detect, for example, a pink square, and in the second case a pen. Other examples could be squares with holes, stars, etc.
The basic steps of image processing are:
Image Acquisition > Preprocessing > Segmentation > Representation > Recognition
What you are asking for seems to lie within the representation part of this general pipeline. You want features that describe the objects you are interested in, right? Before sharing what I've done for simple hand-gesture recognition, I would like you to consider what you actually need; a lot of the time, simplicity will make things much easier. Consider a fixed color on your objects, and consider background subtraction (these two mainly tie into preprocessing and segmentation). As for representation: what features are you interested in, and can you drop the need for some of them?
My project group and I took a simple approach to preprocessing and segmentation, choosing a green glove for our hand. Here's an example of the glove, camera, and detection on the screen:
We used a threshold on convexity defects, tuned to find the defects created by the fingers, and we calculated the side ratio of a rotated rectangular bounding box to see how quadratic our blob is. With only four different hand gestures chosen, we are able to distinguish them with just these two features.
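A hedged Python sketch of those two features (counting convexity defects deeper than a threshold and measuring the side ratio of a rotated bounding rectangle); the HSV range for the green glove and both thresholds are assumptions, not the values we actually used:

```python
import cv2
import numpy as np

frame = cv2.imread("frame.png")  # hypothetical camera frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, np.array((40, 60, 60)), np.array((80, 255, 255)))

# [-2] keeps the contour list across OpenCV 3.x/4.x return conventions.
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
hand = max(contours, key=cv2.contourArea)

# Convexity defects deeper than a threshold roughly correspond to finger gaps.
hull = cv2.convexHull(hand, returnPoints=False)
defects = cv2.convexityDefects(hand, hull)
finger_gaps = 0
if defects is not None:
    for s, e, f, depth in defects[:, 0]:
        if depth > 10000:  # depth is in 1/256-pixel units; assumed threshold
            finger_gaps += 1

# Side ratio of the rotated bounding box: close to 1 means a "quadratic" blob.
(_, _), (w, h), _ = cv2.minAreaRect(hand)
ratio = min(w, h) / max(w, h)
print("finger gaps:", finger_gaps, "box ratio:", ratio)
```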
The functions we used and the measurements are all covered in the OpenCV documentation on structural analysis, and accessing values in vectors (which we've used a lot) is covered in the C++ documentation for std::vector.
I hope you can use the train of thought put into this; if you want more specific info, I'll be happy to comment. Enjoy.