I am not able to under stand the formula ,
What is W (window) and intensity in the formula mean,
I found this formula in opencv doc
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
For a grayscale image, intensity levels (0-255) tells you how bright is the pixel..hope that you already know about it.
So, now the explanation of your formula is below:
Aim: We want to find those points which have maximum variation in terms of intensity level in all direction i.e. the points which are very unique in a given image.
I(x,y): This is the intensity value of the current pixel which you are processing at the moment.
I(x+u,y+v): This is the intensity of another pixel which lies at a distance of (u,v) from the current pixel (mentioned above) which is located at (x,y) with intensity I(x,y).
I(x+u,y+v) - I(x,y): This equation gives you the difference between the intensity levels of two pixels.
W(u,v): You don't compare the current pixel with any other pixel located at any random position. You prefer to compare the current pixel with its neighbors so you chose some value for "u" and "v" as you do in case of applying Gaussian mask/mean filter etc. So, basically w(u,v) represents the window in which you would like to compare the intensity of current pixel with its neighbors.
This link explains all your doubts.
For visualizing the algorithm, consider the window function as a BoxFilter, Ix as a Sobel derivative along x-axis and Iy as a Sobel derivative along y-axis.
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives.html will be useful to understand the final equations in the above pdf.
Related
Using OpenCV's findContours() I have a list of contours in an image. I'm interested only in the straight lines, so if they are too 'squiggly' they should be rejected. The question is how to evaluate how straight each contour is?
I looked at fitLine(), but there doesn't appear to be a goodness-of-fit measure returned. I could evaluate this myself using the returned line.
I looked at arcLength() with the aim to compare this to the bounding rectangle dimensions, but even for somewhat straight lines, the arc length can be relatively long if the contour points are dense.
I could find the convex hull and compare to the bounding rectangle dimensions, but I'd have to analyze the convexity defects.
Is there a moment that would be useful here?
Find the contours as you are doing now
Find the straight lines in the image using HoughLines()
Compute the overlap between the contours and the straight lines
Take two points (with for instance cv::approxPoly) on your contour and compute their absolute distance. Then go through the contour points between the two points and add up all the distances. If the difference between distance over the contour and the absolute distance is bigger than a certain threshold you can reject it.
The function, findContours() already approximated contours with line segments somehow. Each contour is represented by a list of points around it. For your purpose, simply computing the distances of each pair of consecutive points in the contour would give you all line segment lengths.
Here is an example:
c = cnts[0]
#d is the points in contour c shifted by one with wraparound (numpy.roll)
d = np.roll(c, 1, axis=0)
np.linalg.norm(c - d, axis = -1)
Can somebody explain me what exactly does a disparity map return. Because there is not much given in the documentation and I have a few questions related to it.
Does it return difference values of pixels with respect to both images?
How to use disparity values in the formula for depth estimation i.e.
Depth = focalLength*Baseline/Disparity
I have read somewhere that disparity map gives a function of depth f(z)
Please explain what it means. If depth is purely an absolute value how can it be generated as a function or is it a function with respect to the pixels?
The difference d = pl − pr of two corresponding image points is called disparity.
Here, pl is the position of the point in the left stereo image and pr is the position of the point in the right stereo image.
For parallel optical axes, the disparity is d = xl − xr
⇒ search for depth information is equivalent to search for disparity, i.e. corresponding pixel
the distance is inversely proportional to disparity
The disparity values are visualized in a so-called disparity map, each disparity value for each pixel in the reference image (here: left) is coded as a grayscale value. Also for pixel that do not have any correspondences, a grayscale value (here: black) is defined. The so-called groundtruth-map is a disparity map that contains the ideal solution of the correspondence problem.
Relation between Disparity and Depth information:
The following image represent two cameras (left and right) and then tries to find the depth of a point p(x_w, z_x).
The result of depth is given my:
so, it can be seen that the depth is inversely proportional to disparity.
UPDATE:
To calculate the disparity, you need two image (1) Left image and (2) Right image. Lets say that there is a pixel at position (60,30) in left image and that same pixel is present at position (40,30) in right image then your disparity will be: 60 - 40 = 20. So, disparity map gives you the difference between the position of pixels between left image and right image. If a pixel is present in left image but absent in right image then then value at that position in disparity map will be zero. Once you get the disparity value for each pixel of left image then we can easily calculate the depth using the formula given at the end of my answer.
Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The
first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The
second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels.The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These
octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are:What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2 anddet(HessianApprox) = DxxDyy − (0.81Dxy)^2Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brains.
The drawback to difference of Gaussians (DoG) is the computation time that is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, apply a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image has thus the sum of all the original pixels for value.
Then, you just need to remark that the box blur is equal to the sum of all the pixels inside a given rectangle (not originating in the top-lefmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D is a shortcut for IntegralImagePixelAt(A) (B, C, D respectively).
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Finally, the determinant is given by: Det = DxxDyy - 0.81*Dxy^2.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you made a code for normal Gaussian 2D convolution, just use the box filter as a Gaussian kernel and the input image will be the same original image not integral image. The results from this method will be same with the one you asked.
I am developing an application where I am using SIFT + RANSAC and Homography to find an object (OpenCV C++,Java). The problem I am facing is that where there are many outliers RANSAC performs poorly.
For this reasons I would like to try what the author of SIFT said to be pretty good: voting.
I have read that we should vote in a 4 dimension feature space, where the 4 dimensions are:
Location [x, y] (someone says Traslation)
Scale
Orientation
While with opencv is easy to get the match scale and orientation with:
cv::Keypoints.octave
cv::Keypoints.angle
I am having hard time to understand how I can calculate the location.
I have found an interesting slide where with only one match we are able to draw a bounding box:
But I don't get how I could draw that bounding box with just one match. Any help?
You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.
Let's say you have matched to features: f1 from image 1 and f2 from image 2. Let (x1,y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be it's orientation. Similarly you have (x2,y2), s2, and theta2 for f2.
The translation between two features is (dx,dy) = (x2-x1, y2-y1).
The scale change between two features is ds = s2 / s1.
The rotation between two features is d_theta = theta2 - theta1.
So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
When using the Hough transform, you create a signature storing the displacement vectors of every feature from the template centroid (either (w/2,h/2) or with the help of central moments).
E.g. for 10 SIFT features found on the template, their relative positions according to template's centroid is a vector<{a,b}>. Now, let's search for this object in a query image: every SIFT feature found in the query image, matched with one of template's 10, casts a vote to its corresponding centroid.
votemap(feature.x - a*, feature.y - b*)+=1 where a,b corresponds to this particular feature vector.
If some of those features cast successfully at the same point (clustering is essential), you have found an object instance.
Signature and voting are reverse procedures. Let's assume V=(-20,-10). So during searching in the novel image, when the two matches are found, we detect their orientation and size and cast a respective vote. E.g. for the right box centroid will be V'=(+20*0.5*cos(-10),+10*0.5*sin(-10)) away from the SIFT feature because it is in half size and rotated by -10 degrees.
To complete Dima's , one needs to add that the 4D Hough space is quantized into a (possibly small) number of 4D boxes, where each box corresponds to the simiéarity given by its center.
Then, for each possible similarity obtained via a tentative matching of features, add 1 into the corresponding box (or cell) in the 4D space. The output similarity is given by the cell with the more votes.
In order to computethe transform from 1 match, just use Dima's formulas in his answer. For several pairs of matches, you may need to use some least squares fit.
Finally, the transform can be applied with the function cv::warpPerspective(), where the third line of the perspective matrix is set to [0,0,1].
I have written algorithm to extract the points shown in the image. They form convex shape and I know order of them. How do I extract corners (top 3 and bottom 3) from such points?
I'm using opencv.
if you already have the convex hull of the object, and that hull includes the corner points, then all you need to to do is simplify the hull until it only has 6 points.
There are many ways to simplify polygons, for example you could just use this simple algorithm used in this answer: How to find corner coordinates of a rectangle in an image
do
for each point P on the convex hull:
measure its distance to the line AB _
between the point A before P and the point B after P,
remove the point with the smallest distance
repeat until 6 points left
If you do not know the exact number of points, then you could remove points until the minimum distance rises above a certain threshold
you could also do Ramer-Douglas-Peucker to simplify the polygon, openCV already has that implemented in cv::approxPolyDP.
Just modify the openCV squares sample to use 6 points instead of 4
Instead of trying to directly determine which of your feature points correspond to corners, how about applying an corner detection algorithm on the entire image then looking for which of your feature points appear close to peaks in the corner detector?
I'd suggest starting with a Harris corner detector. The OpenCV implementation is cv::cornerHarris.
Essentially, the Harris algorithm applies both a horizontal and a vertical Sobel filter to the image (or some other approximation of the partial derivatives of the image in the x and y directions).
It then constructs a 2 by 2 structure matrix at each image pixel, looks at the eigenvalues of that matrix, and calls points corners if both eigenvalues are above some threshold.