What is the difference between features and keypoints in computer vision?

I am studying what OpenCV offers for object detection, and this is confusing to me. I just don't see the difference between these two.

Image features are small patches that are useful to compute similarities between images. An image feature is usually composed of a feature keypoint and a feature descriptor.
The keypoint usually contains the 2D position of the patch and, when available, further attributes such as the scale and orientation of the image feature.
The descriptor contains the visual description of the patch and is used to compare the similarity between image features.
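As a rough illustration of that split, here is a minimal Python sketch. The class and function names (Keypoint, Feature, descriptor_distance) are made up for this example and are not OpenCV's API, and Euclidean distance is only one possible way to compare descriptors.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence


@dataclass(frozen=True)
class Keypoint:
    """Where the patch is: position plus optional scale and orientation."""
    x: float
    y: float
    scale: Optional[float] = None      # patch size, if the detector provides it
    angle_deg: Optional[float] = None  # dominant orientation, if available


@dataclass(frozen=True)
class Feature:
    """A keypoint together with the descriptor of its patch (hypothetical type)."""
    keypoint: Keypoint
    descriptor: Sequence[float]


def descriptor_distance(a: Feature, b: Feature) -> float:
    """Euclidean distance between descriptors; smaller means more similar."""
    return dist(a.descriptor, b.descriptor)


# Toy 3-D descriptors, chosen by hand for illustration only.
f1 = Feature(Keypoint(10.0, 20.0, scale=2.0), (0.1, 0.9, 0.0))
f2 = Feature(Keypoint(300.0, 45.0, scale=4.0), (0.1, 0.8, 0.0))
f3 = Feature(Keypoint(12.0, 21.0), (0.9, 0.0, 0.4))

# f2 looks like f1 although it sits elsewhere; f3 is close by but looks different.
print(descriptor_distance(f1, f2) < descriptor_distance(f1, f3))
```

Note that similarity is decided by the descriptors alone; the keypoint only records where (and at what scale) the patch was found.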

Related

How does multiscale feature matching work? ORB, SIFT, etc

When reading about classic computer vision, I am confused about how multiscale feature matching works.
Suppose we use an image pyramid:
1. How do you deal with the same feature being detected at multiple scales? How do you decide which one to compute a descriptor for?
2. How do you connect features between scales? For example, say a feature is detected and matched to a descriptor at scale 0.5. Is its location then translated to the corresponding location at the initial scale?
I can share something about SIFT that might answer question (1) for you.
I'm not really sure what you mean in your question (2) though, so please clarify?
SIFT (Scale-Invariant Feature Transform) was designed specifically to find features that remain identifiable across different image scales, rotations, and transformations.
When you run SIFT on an image of some object (e.g. a car), SIFT will try to create the same descriptor for the same feature (e.g. the license plate), no matter what image transformation you apply.
Ideally, SIFT will only produce a single descriptor for each feature in an image.
However, this obviously doesn't always happen in practice, as you can see in an OpenCV example here:
OpenCV draws each SIFT keypoint as a circle whose size reflects the keypoint's scale. You can see many cases where the circles overlap. I assume this is what you meant in question (1) by "the same feature being detected at multiple scales".
And to my knowledge, SIFT doesn't really care about this issue. If by scaling the image enough you end up creating multiple descriptors from "the same feature", then those are distinct descriptors to SIFT.
During descriptor matching, you simply brute-force compare your lists of descriptors, regardless of what scale each one was generated from, and try to find the closest match.
The whole point of SIFT as a function is to take in some image feature under different transformations and produce a similar numerical output at the end.
So if you do end up with multiple descriptors of the same feature, you'll just have to do more computational work, but you will still essentially match the same pair of features across two images regardless.
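To make the brute-force step concrete, here is a small Python sketch. It assumes plain Euclidean nearest-neighbour matching on toy 4-D vectors (real SIFT descriptors have 128 dimensions), and the function name brute_force_match is made up for this example.

```python
from math import dist


def brute_force_match(query, train):
    """For each query descriptor, return (query_idx, train_idx, distance)
    of its nearest train descriptor by Euclidean distance."""
    matches = []
    for qi, q in enumerate(query):
        best_idx, best_dist = min(
            ((ti, dist(q, t)) for ti, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        matches.append((qi, best_idx, best_dist))
    return matches


# Image A: the same physical feature P was picked up at two scales (rows 0, 1).
desc_a = [
    (0.9, 0.1, 0.0, 0.2),   # feature P, found at a fine scale
    (0.8, 0.2, 0.1, 0.2),   # feature P again, found at a coarser scale
    (0.0, 0.7, 0.6, 0.1),   # feature Q
]
# Image B: one descriptor per feature.
desc_b = [
    (0.1, 0.7, 0.5, 0.1),   # feature Q
    (0.9, 0.1, 0.1, 0.2),   # feature P
]

for qi, ti, d in brute_force_match(desc_a, desc_b):
    print(f"A[{qi}] -> B[{ti}]  distance={d:.3f}")
```

Both copies of P in image A end up matched to P in image B, so the duplicate costs one extra comparison pass but does not change which features are paired. The matcher never looks at the scale a descriptor came from.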
Edit:
If you are asking about how to convert coordinates from the scaled images in the image pyramid back into original image coordinates, then David Lowe's SIFT paper dedicates section 4 to that topic.
The naive approach would be to simply calculate the ratios of the scaled coordinates vs the scaled image dimensions, then extrapolate back to the original image coordinates and dimensions. However, this is inaccurate, and becomes increasingly so as you scale down an image.
Example: You start with a 1000x1000 pixel image, where a feature is located at coordinates (123,456). If you had scaled down the image to 100x100 pixel, then the scaled keypoint coordinate would be something like (12,46). Extrapolating back to the original coordinates naively would give the coordinates (120,460).
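The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the helper names are invented, and it assumes the scaled coordinate is rounded to the nearest pixel, which is what reproduces (12,46).

```python
DOWNSCALE = 10   # 1000x1000 image scaled down to 100x100


def to_pyramid(coord, downscale=DOWNSCALE):
    """Map a full-resolution coordinate to the nearest pixel of the small image."""
    return round(coord / downscale)


def to_original(coord, downscale=DOWNSCALE):
    """Naively map a small-image pixel coordinate back to full resolution."""
    return coord * downscale


x, y = 123, 456
xs, ys = to_pyramid(x), to_pyramid(y)
xb, yb = to_original(xs), to_original(ys)

print((xs, ys))            # scaled keypoint: (12, 46)
print((xb, yb))            # naive back-projection: (120, 460)
print((xb - x, yb - y))    # round-trip error in original pixels: (-3, 4)
```

Rounding can be off by up to half a pixel in the small image, which is up to 5 pixels in the original at this scale factor, and the error grows with the amount of downscaling.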
So SIFT fits a Taylor expansion of the Difference-of-Gaussian function to locate the keypoint with sub-pixel accuracy, which you can then use to extrapolate back to the original image coordinates.
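A one-dimensional toy version of that idea is sketched below. It is a simplification, not SIFT's actual code: Lowe's paper fits the expansion jointly in x, y and scale using the gradient and Hessian of the Difference-of-Gaussian function, whereas this sketch fits a parabola through three samples along one axis. The response function and the factor of 10 are assumptions chosen to continue the 1000x1000 example.

```python
def refine_extremum_1d(left, centre, right):
    """Offset (in pixels, relative to the centre sample) of the extremum of the
    second-order Taylor expansion fitted through three neighbouring samples."""
    first = (right - left) / 2.0            # central-difference estimate of D'
    second = right - 2.0 * centre + left    # finite-difference estimate of D''
    if second == 0:
        return 0.0                          # flat neighbourhood: nothing to refine
    return -first / second


def response(x, peak=12.3):
    """Toy detector response that truly peaks at x = 12.3 in the small image."""
    return -(x - peak) ** 2


# The detector only sees integer pixels and reports the peak at x = 12.
offset = refine_extremum_1d(response(11), response(12), response(13))
x_sub = 12 + offset        # sub-pixel location in the small image
x_orig = x_sub * 10        # mapped back to the 10x larger original image
print(x_sub, x_orig)
```

With the sub-pixel offset included, the back-projected coordinate lands at approximately 123 instead of 120.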
Unfortunately, the math for this part is quite beyond me. But if you are fluent in math and C programming and want to know specifically how SIFT is implemented, I suggest you dive into Rob Hess' SIFT implementation; lines 467 through 648 are probably the most detailed you can get.

Feature Detection and Feature Descriptor in Image Processing

Well, I am clear about feature detection and feature descriptors. Feature detection finds interesting points in an image, and we can describe them with a descriptor such as SIFT, HoG, etc. My doubt is very specific. Suppose I have an image (I), to which I applied the Harris detector and found the x,y positions of the corners. Now I want to compute SIFT features, so how should I do it? Should I make a new image containing only the detected corners and then apply SIFT to it? Or should SIFT be applied to image I (but that serves no purpose, I guess)?
Please help me to have some clarity on practical grounds.
The SIFT descriptor, as you say, describes the feature point. However, SIFT also tries to be scale invariant. This means that the SIFT detector also examines potential key-points with respect to their response to various scales. The detector then records not only the x,y, but also the scale information.
This means that you're probably better off using the detector that comes with SIFT along with the descriptor. Both Matlab and OpenCV implementations easily allow you to detect and describe points.

What is the difference between image segmentation and feature extraction in image processing?

I have read an article about brain tumor segmentation. It describes some methods for segmenting brain tumor cells from normal brain cells: pre-processing, segmentation and feature extraction. But I couldn't understand what the difference between segmentation and feature extraction is. I googled it too, but I still didn't understand. Can anyone please explain the basic concept of these methods?
Segmentation is usually understood as the decomposition of a whole into parts. In particular, decomposing or partitioning an image into homogeneous regions.
Feature extraction is a broader concept, which can be described as finding areas with specific properties, such as corners, but it can also be any set of measurements, be they scalar, vector or other. Those features are commonly used for pattern recognition and classification.
A typical processing scheme could be to segment out cells from the image, then characterize their shape by means of, say, edge smoothness features, and finally tell normal cells from ill ones.
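The scheme above can be sketched in plain Python on a tiny hand-made image. Everything here is an assumption for illustration: segmentation is done by a fixed intensity threshold plus 4-connected flood fill, and "compactness" stands in for the edge smoothness features mentioned above.

```python
def segment(image, threshold):
    """Label 4-connected regions of pixels brighter than `threshold`.
    Returns (label map, region count); label 0 is background."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] <= threshold or labels[sy][sx]:
                continue
            count += 1
            labels[sy][sx] = count
            stack = [(sy, sx)]
            while stack:                       # flood fill one region
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        stack.append((ny, nx))
    return labels, count


def region_features(labels, n_regions):
    """Per region: area (pixel count) and perimeter (exposed pixel edges)."""
    h, w = len(labels), len(labels[0])
    feats = {k: {"area": 0, "perimeter": 0} for k in range(1, n_regions + 1)}
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            if not k:
                continue
            feats[k]["area"] += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] != k:
                    feats[k]["perimeter"] += 1
    return feats


image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 8, 0],
    [0, 9, 9, 0, 8, 8, 8],
    [0, 0, 0, 0, 0, 8, 0],
]
labels, n = segment(image, threshold=5)          # step 1: segmentation
for k, f in region_features(labels, n).items():  # step 2: feature extraction
    # 16 * area / perimeter**2 is 1.0 for a pixel square, lower for ragged shapes.
    compactness = 16 * f["area"] / f["perimeter"] ** 2
    print(k, f, round(compactness, 3))
```

Segmentation produces the regions; feature extraction turns each region into a few numbers (here area, perimeter and compactness) that a classifier could then use to tell one kind of cell from another.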
Image Segmentation vs. Feature Localization
• Image Segmentation: if $R$ is a segmented region,
1. $R$ is usually connected; all pixels in $R$ are connected (8-connected or 4-connected).
2. $R_i \cap R_j = \emptyset$ for $i \neq j$; regions are disjoint.
3. $\bigcup_{i=1}^{n} R_i = I$, where $I$ is the entire image; the segmentation is complete.
• Feature Localization: a coarse localization of image features based on proximity and compactness; more effective than Image Segmentation.
Feature extraction is a prerequisite for image segmentation.
When you face a project that requires segmenting a particular shape or structure in an image, one of the procedures to apply is to extract the features relevant to that region so that you can differentiate it from other regions.
A simple and basic feature commonly used in image segmentation is intensity: you can form different groups of structures based on the intensity they show in the image.
Feature extraction is also used for classification: relevant and significant features are used to label the different classes inside an image.

What is a feature descriptor in image processing (algorithm or description)?

I often get confused by the meaning of the term descriptor in the context of image features. Is a descriptor the description of the local neighborhood of a point (e.g. a float vector), or is a descriptor the algorithm that outputs the description? Also, what exactly is then the output of a feature-extractor?
I have been asking myself this question for a long time, and the only explanation I came up with is that a descriptor is both the algorithm and the description. A feature detector is used to detect distinctive points. The term feature-extractor, however, then does not seem to make any sense.
So, is a feature descriptor the description or the algorithm that produces the description?
A feature detector is an algorithm which takes an image and outputs locations (i.e. pixel coordinates) of significant areas in your image. An example of this is a corner detector, which outputs the locations of corners in your image but does not tell you any other information about the features detected.
A feature descriptor is an algorithm which takes an image and outputs feature descriptors/feature vectors. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical "fingerprint" that can be used to differentiate one feature from another. Ideally this information would be invariant under image transformation, so we can find the feature again even if the image is transformed in some way. An example would be SIFT, which encodes information about the image gradients in the local neighbourhood into the numbers of the feature vector. Other examples you can read about are HOG and SURF.
EDIT: When it comes to feature detectors, the "location" might also include a number describing the size or scale of the feature. This is because things that look like corners when "zoomed in" may not look like corners when "zoomed out", and so specifying scale information is important. So instead of just using an (x,y) pair as a location in "image space", you might have a triple (x,y,scale) as location in "scale space".
As for the descriptor, I understand it as the description of the neighborhood of a point in the image. In other words, it is a vector that describes the visual features of the image content around that point.
For example, HOG (Histogram of Oriented Gradients) involves steps called image gradient computation and spatial/orientation binning. extractHOGFeatures in Matlab and Classification using HOG have visual examples for better understanding.
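To show what "image gradients" and "orientation binning" can mean in practice, here is a stripped-down Python sketch that builds one orientation histogram for one patch. It is not the HOG or SIFT algorithm: real HOG splits the window into cells and normalises over blocks, and SIFT concatenates 4x4 such histograms of 8 bins each into a 128-element vector. The function name and the 8-bin choice are assumptions for this example.

```python
from math import atan2, hypot, pi


def orientation_histogram(patch, n_bins=8):
    """Histogram of gradient orientations over a patch, weighted by gradient
    magnitude and L2-normalised. `patch` is a list of rows of intensities."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # horizontal derivative
            gy = patch[y + 1][x] - patch[y - 1][x]   # vertical derivative
            magnitude = hypot(gx, gy)
            if magnitude == 0:
                continue                             # flat pixel: no vote
            angle = atan2(gy, gx) % (2 * pi)         # orientation in [0, 2*pi)
            hist[int(angle / (2 * pi) * n_bins) % n_bins] += magnitude
    norm = hypot(*hist)
    return [v / norm for v in hist] if norm else hist


# A vertical edge: dark on the left, bright on the right.
patch = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
print(orientation_histogram(patch))
```

For this patch all gradient energy falls into the first bin (gradients pointing along +x). The returned list of numbers is the "description"; the function that computes it is the "descriptor" in the algorithmic sense.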

object recognition performance not good

I am trying to do object recognition using algorithms such as SURF, FERN and FREAK in OpenCV 2.4.2.
I am using the programs from opencv samples without modifications - find_obj.cpp, find_obj_ferns.cpp, freak_demo.cpp
I tried changing the parameters of the algorithms, which didn't help.
I have my training images, test images and the result of FREAK recognition here
As you can see the result is pretty bad.
No feature descriptors are detected for one of the training images - image here
Feature descriptors are detected outside the object boundary for the other - image here
I have a few questions:
Why do these algorithms work with grayscale images? It is apparent that, for my training images above, the object could be detected easily if RGB were included. Is there any technique that takes this into account?
Is there any other way to improve performance? I tried fiddling with the feature parameters, which didn't work well.
The first thing I observed in your image is that the object is planar and has no texture differences. All the feature detectors you used look for corners that are view invariant, that is, keypoints in an image that have a unique neighborhood and a good magnitude of x and y derivatives. I have uploaded my analysis; see the figures.
How to know what I am saying is correct?
Just look at the descriptor values of a keypoint found on your object: you will see that most of them are zeros, because a descriptor describes the variation of the edges around a corner point in a specific direction (see the SURF documentation for more details).
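One simple way to run that check is to measure the share of near-zero entries in a descriptor. The sketch below uses made-up descriptor values and an arbitrary tolerance; it applies to float-valued descriptors such as SURF's, not to bit-packed binary descriptors such as FREAK's.

```python
def zero_fraction(descriptor, eps=1e-6):
    """Fraction of descriptor entries whose magnitude is at most `eps`."""
    if not descriptor:
        raise ValueError("descriptor is empty")
    return sum(abs(v) <= eps for v in descriptor) / len(descriptor)


# Invented values: a keypoint on a flat, textureless surface versus one on texture.
flat_patch_descriptor = [0.0] * 60 + [0.02, 0.0, 0.01, 0.0]
textured_patch_descriptor = [0.1, -0.3, 0.25, 0.0, 0.4, -0.2, 0.15, 0.05]

print(zero_fraction(flat_patch_descriptor))
print(zero_fraction(textured_patch_descriptor))
```

A value close to 1 suggests the patch around the keypoint carries almost no gradient information, which makes it hard to match reliably.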
The object you are trying to detect looks like a mobile phone, so just turn the object over and repeat the experiment, and you will surely get good results, because the front side of an object generally has more texture, such as switches, logos, etc.
Here is a result I uploaded,
