Confusion regarding Object recognition and features using SURF - opencv

I have some conceptual issues in understanding the SURF and SIFT algorithms (All about SURF). As far as I understand, SURF finds the Laplacian of Gaussians and SIFT operates on the difference of Gaussians. It then constructs a 64-variable vector around each point to extract the features. I have applied this CODE.
(Q1) So, what forms the features?
(Q2) We initialize the algorithm using SurfFeatureDetector detector(500). So, does this mean that the size of the feature space is 500?
(Q3) The Good_Matches output of SURF gives matches between Keypoint1 and Keypoint2, and by tuning the number of matches we can conclude whether the object has been found/detected or not. What is meant by KeyPoints? Do these store the features?
(Q4) I need to build an object recognition application. In the code, it appears that the algorithm can recognize the book, so it can be applied to object recognition. I was under the impression that SURF can be used to differentiate objects based on color and shape. But SURF and SIFT detect corners and edges, so there is no point in using color images as training samples since they will be converted to grayscale. There is no option of using colors or HSV in these algorithms, unless I compute the keypoints for each channel separately, which is a different area of research (Evaluating Color Descriptors for Object and Scene Recognition).
So, how can I detect and recognize objects based on their color and shape? I think I can use SURF for differentiating objects based on their shape. Say, for instance, I have 2 books and a bottle. I need to recognize only a single book among all the objects. But as soon as there are other similarly shaped objects in the scene, SURF gives lots of false positives. I would appreciate suggestions on what methods to apply for my application.

1) The local extrema (responses of the DoG that are greater (or smaller) than the responses of the neighbouring pixels around the point and in the upper and lower images of the pyramid, i.e. a 3x3x3 neighbourhood) form the coordinates of the feature (circle) center. The radius of the circle is determined by the level of the pyramid.
2) It is the Hessian threshold. It means that you keep only the maxima (see 1) with values bigger than the threshold. A bigger threshold leads to fewer features, but the features are more stable, and vice versa.
3) Keypoint == feature. In OpenCV, KeyPoint is the structure that stores features.
4) No, SURF is good for comparing textured objects but not for shape and color. For shape I recommend using MSER (but not the OpenCV one) or the Canny edge detector, not local features. This presentation might be useful.
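To make the 3x3x3 neighbourhood test and the effect of the threshold more concrete, here is a minimal pure-Python sketch. It is not the OpenCV implementation; the function name local_maxima_3d, the nested-list layout of the response stack and the toy values are assumptions made only for illustration.

```python
def local_maxima_3d(stack, threshold):
    """Return (level, row, col) for every response that exceeds `threshold`
    and is strictly greater than its 26 neighbours in the 3x3x3 block.

    `stack` is a nested list indexed as stack[level][row][col]
    (hypothetical layout, one response map per pyramid level)."""
    found = []
    levels, rows, cols = len(stack), len(stack[0]), len(stack[0][0])
    for s in range(1, levels - 1):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                value = stack[s][r][c]
                if value <= threshold:
                    continue  # below the detector threshold
                neighbours = [
                    stack[s + ds][r + dr][c + dc]
                    for ds in (-1, 0, 1)
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                    if (ds, dr, dc) != (0, 0, 0)
                ]
                if value > max(neighbours):
                    found.append((s, r, c))
    return found


# Toy response stack: 3 levels of 5x7 maps with one strong and one weak peak.
stack = [[[0.0] * 7 for _ in range(5)] for _ in range(3)]
stack[1][2][1] = 900.0  # strong response
stack[1][2][5] = 120.0  # weak response

low_threshold = local_maxima_3d(stack, 100.0)
high_threshold = local_maxima_3d(stack, 500.0)
```

With the lower threshold both peaks are kept; with the higher one only the strong peak survives, which is the trade-off described in 2).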

Related

Gaussian Filters with ORB

I have started my first project in the field of image recognition using feature point detectors and descriptors. I had no prior knowledge of image recognition techniques before starting this project, so I researched the available detectors and descriptors and learned about the differences between them. Finally, I have opted to work with the ORB detector and descriptor for image recognition (if it does not work according to my requirements, I would like to try BRISK later).
As of now I am at the stage of getting results for image recognition using ORB. At this point, I was thinking of using Gaussian filters in my code so that I can get better results even when the input image is a bit blurred.
My questions:
1) Is it possible to use Gaussian filters with ORB to get much better results for Image recognition?
2) When I read the paper on ORB I came across the lines below:
FAST does not produce a measure of cornerness, and we have found that it has large
responses along edges. We employ a Harris corner measure [11] to order the FAST keypoints.
For a target number N of keypoints, we first set the threshold low enough to get more than
N keypoints, then order them according to the Harris measure, and pick the top N points.
FAST does not produce multi-scale features. We employ a scale pyramid of the image, and
produce FAST Features (filtered by Harris) at each level in the pyramid.
ORB already uses the Harris corner measure in order to detect corners in an image, so is it worth using Gaussian filters along with ORB?
3) Does ORB use only the Harris corner measure to detect corners, or something else as well?
Please let me know, and enlighten me on the questions above.
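The keypoint selection described in the quoted ORB passage (over-detect, order by the Harris measure, keep the top N) can be sketched roughly as follows. This is only an illustration, not ORB's actual code: the function names, the structure-tensor sums passed in as plain numbers and the constant k = 0.04 are assumptions.

```python
def harris_score(sxx, syy, sxy, k=0.04):
    """Harris measure R = det(M) - k * trace(M)^2, where M is the structure
    tensor [[sxx, sxy], [sxy, syy]] summed over a window around the point.
    k = 0.04 is a commonly used value, assumed here."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def top_n_by_harris(candidates, n):
    """candidates: list of (x, y, sxx, syy, sxy) for points that passed a
    deliberately low FAST threshold. Returns the n best (x, y) positions."""
    scored = [(harris_score(sxx, syy, sxy), x, y)
              for x, y, sxx, syy, sxy in candidates]
    scored.sort(reverse=True)  # highest Harris response first
    return [(x, y) for _, x, y in scored[:n]]


candidates = [
    (10, 10, 100.0, 100.0, 0.0),  # gradients in both directions: corner-like
    (20, 5, 100.0, 0.0, 0.0),     # gradient in one direction only: edge-like
    (7, 30, 50.0, 60.0, 10.0),    # weaker corner
]
best_two = top_n_by_harris(candidates, 2)
edge_score = harris_score(100.0, 0.0, 0.0)
```

The edge-like candidate gets a negative score and is dropped, which is why the paper uses the Harris measure to filter FAST responses along edges.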

Which feature descriptors to use and why?

I would like to compute the position and orientation of a camera in a civil aircraft cockpit.
I use LEDs as fixed points. My plan is to save the X, Y, Z position associated with each LED.
How can I detect and identify my LEDs in my images? Which feature descriptor and feature point extractor should I use?
How should I modify my image prior to feature detection?
I would like to stay efficient.
----Please stop voting this question down----
Now that I have found the solution to my problem, I realize the question might have been too generic.
Anyway, to support other people googling, I am going to describe my answer.
With combinations of OpenCV's functions I create masks in which the areas where the LEDs could be are white. The rest of the image is black. These functions are, for example, Core.inRange, Imgproc.dilate, and Imgproc.erode. With Imgproc.findContours I also filter out contours that are too large or too small. Core.bitwise_and or Core.bitwise_not is also used to combine masks.
The masks are computed from an image in the HSV color space as input.
Having these masks with potential LED areas, I compute color histograms of the intensity-normalized RGB colors (hue did not work well enough for me). These histograms are trained and normalized using a set of annotated input images and represent my descriptor.
I match the trained descriptor against the ones computed in the application using histogram intersection.
This gives me distance measures. Using a threshold for these measures, the measures themselves, and the knowledge of the geometric positions of the real-life LEDs, I translate the patches into a graph, which helps me find the longest chain of potential LEDs.
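For readers who want to see the descriptor step in code, below is a minimal pure-Python sketch of an intensity-normalized RGB histogram and histogram intersection. It is not my original application code; the function names, the 4x4 binning and the sample pixel values are assumptions chosen only to keep the example small.

```python
def normalized_rg_histogram(pixels, bins=4):
    """Histogram over chromaticity (r, g) = (R, G) / (R + G + B), so overall
    intensity cancels out. Returned as a flat list that sums to 1.
    `pixels` is an iterable of (R, G, B) tuples from one candidate patch."""
    hist = [0.0] * (bins * bins)
    count = 0
    for r, g, b in pixels:
        total = r + g + b
        if total == 0:
            continue  # pure black has no defined chromaticity
        i = min(int(r / total * bins), bins - 1)
        j = min(int(g / total * bins), bins - 1)
        hist[i * bins + j] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 - value can serve
    as a distance."""
    return sum(min(a, b) for a, b in zip(h1, h2))


bright_red = normalized_rg_histogram([(200, 20, 20), (250, 30, 25)])
dim_red = normalized_rg_histogram([(100, 10, 10), (125, 15, 12)])
green = normalized_rg_histogram([(20, 200, 20)])

same_color = histogram_intersection(bright_red, dim_red)
other_color = histogram_intersection(bright_red, green)
```

The bright and dim red patches fall into the same bins because the intensity is divided out, while the green patch shares no bins with them.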

Detecting object class using shape descriptors in computer vision

I want to differentiate between two classes of objects through differences in the shape of a blob (the blob is a binary image) using shape descriptors and machine learning. Is there any good shape feature I can use to describe the irregular contour or blob obtained?
There is a large body of work on shape descriptors. These methods work on either the outer edge pixels (the boundary) or the full filled-in binary shape. Both approaches rely on making the shape descriptors invariant to translation, rotation and scaling, and some to skew. The classical boundary method is Fourier Descriptors and the classical filled-in method is Moment Invariants; both are covered in most good image processing textbooks and are easy to implement with OpenCV.
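As a rough illustration of the filled-in approach, here is a pure-Python sketch of the first two Hu moment invariants computed from the pixel coordinates of a binary blob. The function name and the toy shapes are assumptions for the example; in practice a library routine would normally be used.

```python
def hu_first_two(points):
    """First two Hu moment invariants of a binary shape given as a list of
    (x, y) pixel coordinates."""
    n = len(points)  # zeroth moment of a binary shape is its area
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    def eta(p, q):
        # Central moment (translation invariant), then scale normalization.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)
        return mu / n ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


l_shape = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
shifted = [(x + 10, y + 7) for x, y in l_shape]   # translation
rotated = [(-y, x) for x, y in l_shape]           # 90 degree rotation

square = [(x, y) for x in range(3) for y in range(3)]
bar = [(x, 0) for x in range(9)]

phi_l = hu_first_two(l_shape)
phi_shifted = hu_first_two(shifted)
phi_rotated = hu_first_two(rotated)
```

The shifted and rotated copies give the same values up to rounding, while shapes such as the square and the bar give clearly different ones, so the values can be fed to a classifier as features.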
The answer depends heavily on the kinds of shapes you are looking for. If the contours of the shapes are discriminative enough, you can try shape context. To classify shapes, feed these features into any classifier, for instance an SVM or random forests.
If the shapes have consistently occurring corners, then you can extract the corners using FAST or SURF, and describe the regions around the corners using SIFT or SURF. In this case, shapes are best recognised by feature matching or bags of words.

Color SURF detector

SURF by default works on a grayscale image. I am thinking of running SURF on an HSV image. My method is to separate the image into the H, S and V channels and use S and V for keypoint detection. I compared the number of keypoints in SV vs RGB, and channel-wise, HSV gives more features.
I am not sure whether what I am doing is correct. I need some explanation of whether SURF can be applied to an HSV image. I have read a paper on applying SIFT to different color spaces, but not SURF.
Is there a better way to achieve this?
Can we apply SURF to color, HSV space?
Thank you for your time.
Can we apply SURF to color, HSV space?
I didn't test it, but as far as I know, SIFT and SURF use (in principle) quite similar detection techniques:
The SIFT detector uses the Difference-of-Gaussian (DoG) technique to efficiently approximate the Laplacian-of-Gaussian (LoG); both are blob detection techniques.
The SURF detector uses box filters (box blurs) of arbitrary size to compute (or approximate?) the determinant of the Hessian, which is a blob detection technique.
Both methods use some strategy to compute those blobs at multiple scales (SIFT: DoG pyramid; SURF: integral images to scale the filter sizes). In the end, both methods detect blobs in the given 2D array.
So if SIFT can detect good features in your (H)SV channels, SURF should be able to do the same because in principle they both detect blobs. What you will do is detect blobs in the hue/saturation/value channels:
hue-blobs: regions of similar color-tone which are surrounded by different (all higher or all lower) color-tones;
saturation-blobs: regions of... yea of what? no idea how to interpret that;
value-blobs: should give very similar results to the blobs of the RGB image converted to grayscale.
One thing to add: I'm just handling the detector! No idea how SIFT/SURF description is influenced by color data.
I didn't test it, but what you could do is use the HSV values of the interest points as additional matching criteria. What I used in the original implementation, and what sped up the matching of image pairs, was the sign of the Laplacian (the trace of the Hessian matrix). The sign tells us whether it is a light blob on a dark background or a dark blob on a light background. Obviously, one would not attempt to match a dark blob with a bright blob.
In a similar way, you could use the HSV values and a distance between them. Why match blue blobs with yellow blobs? It makes no sense, unless the white balance or lighting is completely messed up. Maybe my paper about matching line segments can help here; I used HSV there.
As for extracting SURF interest points on the different channels H, S, and V, I agree with the answer of Micka.
What you could try is to make a descriptor using the Hue channel.
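The extra matching criteria suggested in this answer (blob polarity sign and a hue distance) could be sketched as a simple gate in front of a nearest-neighbour descriptor match. This is an untested-in-practice illustration, not the original SURF implementation: the dictionary layout of a keypoint, the function names and the 30 degree hue tolerance are all assumptions.

```python
def hue_distance(h1, h2):
    """Circular distance between two hues given in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def gated_matches(query, train, max_hue_diff=30.0):
    """Nearest-neighbour matching that skips candidate pairs with different
    blob polarity or very different hue. Each keypoint is a dict with
    'sign' (+1 or -1), 'hue' (degrees) and 'desc' (list of floats);
    this layout is hypothetical."""
    matches = []
    for qi, q in enumerate(query):
        best, best_dist = None, float("inf")
        for ti, t in enumerate(train):
            if q["sign"] != t["sign"]:
                continue  # never match a dark blob with a bright blob
            if hue_distance(q["hue"], t["hue"]) > max_hue_diff:
                continue  # e.g. blue vs yellow
            dist = sum((a - b) ** 2 for a, b in zip(q["desc"], t["desc"])) ** 0.5
            if dist < best_dist:
                best, best_dist = ti, dist
        if best is not None:
            matches.append((qi, best))
    return matches


query = [{"sign": 1, "hue": 350.0, "desc": [0.1, 0.9]}]
train = [
    {"sign": -1, "hue": 350.0, "desc": [0.1, 0.9]},  # same descriptor, opposite polarity
    {"sign": 1, "hue": 200.0, "desc": [0.1, 0.9]},   # same descriptor, very different hue
    {"sign": 1, "hue": 10.0, "desc": [0.2, 0.8]},    # close hue (wraps around 0)
]
result = gated_matches(query, train)
```

Only the third candidate passes both gates, even though the first two have identical descriptors. Note that hue is circular, so 350 and 10 degrees are only 20 degrees apart.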

How to crop the roi of the image

In my project I want to crop the ROI of an image. For this I create a map of the regions of interest. Now I want to crop the area that contains the most important pixels (black is not important, white is important).
Does someone have an idea how to realize this? I think this is a maximization problem.
The red border in the image below is an example how I want to crop this image
If I understood your question correctly, you have computed a value at every point in the image. These values suggest the "importance"/"interestingness"/"saliency" of each point. The matrix/image containing these values is the "map" you are referring to. Your goal is to get the bounding box for regions of interest (ROI) with a high "importance" score.
The way I think you can go about segmenting the ROIs is to apply Graph Cut based segmentation, computing a "score" at each pixel using your importance map. The result of the segmentation is a binary mask that masks the "important" pixels. Next, run OpenCV's findContours function on this binary mask to get the individual connected components. Then use OpenCV's boundingRect function on the contours returned by findContours(...) to get the bounding boxes.
The good thing about using a Graph Cut based segmentation algorithm in this way is that it will join up fragmented components i.e. the resulting binary mask will tend not to have pockets of small holes even if your "importance" map is noisy.
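The post-processing described above (binary mask, connected components, bounding boxes) can be sketched in pure Python as follows. Note the substitutions: a plain threshold stands in for the graph cut segmentation, and a breadth-first flood fill stands in for findContours and boundingRect. The function name, the 0.5 threshold and the min_area filter are assumptions for illustration.

```python
from collections import deque


def bounding_boxes(importance, threshold=0.5, min_area=2):
    """Threshold the importance map, group 4-connected pixels and return one
    (x, y, width, height) box per component with at least min_area pixels."""
    rows, cols = len(importance), len(importance[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or importance[r][c] < threshold:
                continue
            # Flood fill one connected component, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and importance[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1))
    return boxes


importance = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
rois = bounding_boxes(importance)
all_components = bounding_boxes(importance, min_area=1)
```

With the default min_area the isolated pixel on the right is filtered out as noise; a graph cut would instead handle such fragments through its smoothness term.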
One Graph Cut based segmentation algorithm already implemented in OpenCV is the GrabCut algorithm. A quick hack would be to apply it on your "importance" map to get the binary mask I mentioned above. A more sophisticated approach would be to build the foreground and background (color perhaps?) model using your "importance" map and passing it as input to the function. More details on GrabCut in OpenCV can be found here: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html?highlight=grabcut#void grabCut(InputArray img, InputOutputArray mask, Rect rect, InputOutputArray bgdModel, InputOutputArray fgdModel, int iterCount, int mode)
If you would like greater flexibility, you can hack your own graphcut based segmentation algorithm using the following MRF library. This library allows you to specify your custom objective function in computing the graph cut: http://vision.middlebury.edu/MRF/code/
To use the MRF library, you will need to specify the "cost" at each point in your image indicating whether that point is "foreground" or "background". You can also think of this dichotomy as "important" or "not important" instead of "foreground" vs "background".
The MRF library's goal is to return you a label at each point such that total cost of assigning those labels is as small as possible. Hence, the game is to come up with a function to compute a small cost for points you consider important and large otherwise.
Specifically, the cost at each point is composed of 2 parts: 1) The data term/function and 2) The smoothness term/function. As mentioned earlier, the smaller the data term at each point, the more likely that point will be selected. If your "importance" score s_ij is in the range [0, 1], then a common way to compute your data term would be -log(s_ij).
The smoothness term is a way to suggest whether 2 neighboring pixels p, q should have the same label, i.e. both "foreground", both "background", or one "foreground" and the other "background". Similar to the data cost, you have to construct it such that the cost is small for neighboring pixels with similar "importance" scores when they are assigned the same label. This term is responsible for "smoothing" the resulting mask so that you will not have pixels of low "importance" sprinkled within regions of high "importance" and vice versa. If there are such regions, OpenCV's findContours(...) function mentioned above will return contours for these regions, which can be filtered out, perhaps by checking their size.
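To illustrate how the two terms interact, here is a small pure-Python sketch that evaluates the total cost of a given labeling. It only computes the energy and does not perform the graph cut minimization itself. The foreground data term -log(s) comes from the description above; the background term -log(1 - s), the exponential form of the smoothness penalty and the parameter names smooth_weight and beta are assumptions.

```python
import math


def data_cost(score, label, eps=1e-6):
    """-log(s) for foreground (label 1); -log(1 - s) for background
    (label 0, assumed counterpart). Scores are clamped to avoid log(0)."""
    s = min(max(score, eps), 1.0 - eps)
    return -math.log(s) if label == 1 else -math.log(1.0 - s)


def pair_cost(s_p, s_q, l_p, l_q, smooth_weight, beta):
    """No cost if labels agree; otherwise a penalty that is largest when the
    two importance scores are similar (assumed contrast-sensitive form)."""
    if l_p == l_q:
        return 0.0
    return smooth_weight * math.exp(-beta * (s_p - s_q) ** 2)


def energy(scores, labels, smooth_weight=2.0, beta=1.0):
    """Sum of data terms plus smoothness terms over 4-connected neighbours."""
    rows, cols = len(scores), len(scores[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += data_cost(scores[r][c], labels[r][c])
            if c + 1 < cols:
                total += pair_cost(scores[r][c], scores[r][c + 1],
                                   labels[r][c], labels[r][c + 1],
                                   smooth_weight, beta)
            if r + 1 < rows:
                total += pair_cost(scores[r][c], scores[r + 1][c],
                                   labels[r][c], labels[r + 1][c],
                                   smooth_weight, beta)
    return total


scores = [[0.9, 0.2, 0.9]]   # a noisy low score between two important pixels
with_hole = [[1, 0, 1]]      # labeling that follows the data term only
filled = [[1, 1, 1]]         # labeling that fills the hole

e_hole = energy(scores, with_hole)
e_filled = energy(scores, filled)
e_hole_no_smooth = energy(scores, with_hole, smooth_weight=0.0)
e_filled_no_smooth = energy(scores, filled, smooth_weight=0.0)
```

With the smoothness term switched on, the filled labeling is cheaper, so a minimizer would close the hole; with smooth_weight set to 0 the labeling that simply follows the scores wins.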
Details on functions to compute the cost can be found in the GrabCut paper: GrabCut
This blog post provides a bit more detail (and code) on creating your own graphcut segmentation algorithm in OpenCV: http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/
Another paper showing how to perform graph cut segmentation on grayscale images (your case), with better notations, and without the complicated image matting part (not implemented in OpenCV's version) in the GrabCut paper is this: Graph Cuts and Efficient N-D Image Segmentation
Hope this helps.

Resources