Fitting a line - MATLAB disagrees with OpenCV - iOS

Take sample points (10,10), (20,0), (20,40), (20,20).
In MATLAB, polyfit returns a slope of 1, but for the same data OpenCV's fitLine returns a slope of 10.7. From hand calculations, the near-vertical line (slope 10.7) is a much better least-squares fit.
How come we’re getting different lines from the two libraries?
OpenCV code - (on iOS)
vector<cv::Point> vTestPoints;
vTestPoints.push_back(cv::Point( 10, 10 ));
vTestPoints.push_back(cv::Point( 20, 0 ));
vTestPoints.push_back(cv::Point( 20, 40 ));
vTestPoints.push_back(cv::Point( 20, 20 ));
Mat cvTest = Mat(vTestPoints);
cv::Vec4f testWeight;
fitLine( cvTest, testWeight, CV_DIST_L2, 0, 0.01, 0.01);
NSLog(@"Slope: %.2f", testWeight[1]/testWeight[0]);
Xcode log shows
2014-02-12 16:14:28.109 Application[3801:70b] Slope: 10.76
Matlab code
>> px
px =  10 20 20 20
>> py
py = 10 0 20 40
>> polyfit(px,py,1)
ans = 1.0000e+000 -2.7733e-014

MATLAB is trying to minimise the error in y for a given input x (i.e. as if x is your independent and y your dependent variable).
In this case, the line that goes through the points (10,10) and (20,20) is probably the best bet. A near vertical line that goes close to all three points with x=20 would have a very large error if you tried to calculate a value for y given x=10.
Although I don't recognise the OpenCV syntax, I'd guess that CV_DIST_L2 is a distance metric which means you're trying to minimise the overall distance between the line and each point in the x-y plane. In that case, a more vertical line passing through the middle of the point set would be the closest.
Which is "correct" depends on what your points represent.

Related

How to find width of a line from its skeleton?

This is similar to a question here, but I couldn't use its solution. I'm working in Julia and I'm not sure how it works in this case. Here is an image I'm working on.
Due to the curve in the line and its unsmooth edges, it's not straightforward to get the width 'd' of the line. I don't know how to find the orthogonal line at each point and get the distances d1, d2, ... I plan to take their average later to get an estimate of the line width. Any tips? Thanks
Using the distance transform gives the distance from neighbouring pixels, dist = 1 .- distance_transform(feature_transform(Gray.(line_image) .> 0.5)); looking up the closest pixel with a greater distance gives the pixel of interest. Taking the Euclidean distance from an index on the centreline (skeleton) to the nearest edge and multiplying by 2 gives the width of the line.
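For comparison, here is a hedged OpenCV/C++ sketch of the same idea (not the Julia code above): sample the distance transform of the line mask at the skeleton pixels, double it to estimate the local width, and average. The skeleton mask is assumed to be precomputed elsewhere (e.g. by a thinning algorithm) and is not shown here.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Estimate the average width of a line-like blob.
// binaryLine: 8-bit mask, line pixels = 255, background = 0.
// skeleton:   8-bit mask of the line's skeleton (assumed precomputed; not shown here).
double estimateLineWidth(const cv::Mat& binaryLine, const cv::Mat& skeleton)
{
    // Distance from each line pixel to the nearest background pixel.
    cv::Mat dist;
    cv::distanceTransform(binaryLine, dist, cv::DIST_L2, 3);

    double sum = 0.0;
    int count = 0;
    for (int y = 0; y < skeleton.rows; ++y) {
        for (int x = 0; x < skeleton.cols; ++x) {
            if (skeleton.at<uchar>(y, x)) {
                // On the centreline, the distance to the edge is roughly half the width.
                sum += 2.0 * dist.at<float>(y, x);
                ++count;
            }
        }
    }
    return count > 0 ? sum / count : 0.0;
}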

Hough Transform Accumulator to Cartesian

I'm studying a course on vision systems and one of the questions posed was:
For the accumulator shown:
Determine the most likely r,θ combination representing the straight line of the greatest strength in the original image.
From my understanding of the accumulator this would be r = 60, θ = 150, as 41 votes is the highest count in this cluster of large votes. Am I correct with this combination?
And hence calculate the equation of this line in the form y = mx + c
I'm not sure of the conversion steps required to convert r = 60, θ = 150 to y = mx + c with the information given, since r = 60, θ = 150 denotes one point on the line.
State the resolution of your answer and give your reasoning
I assume the resolution has to do with some of the steps in the calculation and not the actual resolution of the original image, since that's irrelevant to the edges detected in the image.
Any guidance on the above 3 points would be greatly appreciated!
Yes, this is correct.
This is asking you what the slope and intercept of the line are, given r and theta. r and theta are not one point on the line; they are one point of the accumulator. r and theta describe a line using the line equation in polar (normal) form: r = x·cos(θ) + y·sin(θ). This is the cool thing about the Hough transform: every line in one space (i.e. image space) can be described by a point in another space (r, θ). This could be done with m and b from the line equation y = mx + b, but as we all know, m is undefined for vertical lines. This is the reason the polar line equation is used. It is important to note that the r and θ found by the HT describe a segment from the origin out to the actual line in the image, meeting it at a right angle. This means your image line y = mx + b will be orthogonal to that segment. The wiki article on the HT describes this well and shows examples. I would recommend drawing a diagram of your r and θ extending to a line like this:
Then use trig to get two points on the red line. Two points are enough to give you m and b from the line equation.
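As an illustration (assuming the usual normal parameterisation r = x·cos(θ) + y·sin(θ), with θ measured from the x-axis in degrees), you can also rearrange directly: y = (−cosθ/sinθ)·x + r/sinθ, valid whenever sinθ ≠ 0. A small C++ sketch for r = 60, θ = 150°:

#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;

    // Accumulator peak from the question (assumed parameterisation r = x*cos(t) + y*sin(t)).
    const double r = 60.0;
    const double thetaDeg = 150.0;
    const double t = thetaDeg * PI / 180.0;

    // Rearranging r = x*cos(t) + y*sin(t) for y; only valid when sin(t) != 0,
    // i.e. when the line is not vertical.
    const double m = -std::cos(t) / std::sin(t); // slope
    const double c = r / std::sin(t);            // intercept

    std::printf("y = %.3f x + %.1f\n", m, c); // roughly y = 1.732 x + 120.0
    return 0;
}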
I'm not entirely sure what "resolution" refers to in this context, but it does seem like your line estimate will have some precision loss, since r is binned every 20 mm and theta every 15 degrees. Perhaps it is asking what degree of error you could get given an accumulator of this resolution.

Homography estimation from consecutive video frame lines gives bad results

I am trying to build a program which detects offside situations in a football video sequence. In order to better track the players and the ball I need to estimate the homography between consecutive frames. I am doing this project in MATLAB.
I am able to find enough corresponding lines between frames but it seems to me that the resulting homography isn't correct.
I start from the following situation, where I have these two processed images (1280x720 px) with corresponding lines:
image 1 and image 2.
Lines derive from the Hough transform and are of the form cross(P1, P2), where P(i) is [x y 1]', with 0 < x,y < 1 (divided by the image width and height). Lines are normalized too, divided by the third component.
Before lines normalization (just after cross product) I have:
Lines from frame 1 (one line per row).
[ -0.9986 -0.2992 0.6792
-0.9986 -0.4305 0.5686
-0.8000 -0.4500 0.3613
-0.9986 -0.1609 0.7890
-0.9986 -0.0344 0.9074
-0.2500 -0.2164 0.0546]
These are lines from frame 2:
[-0.9986 -0.2984 0.6760
-0.9986 -0.4313 0.5678
-0.7903 -0.4523 0.3587
-0.9986 -0.1609 0.7890
-0.9986 -0.0391 0.9066
-0.2486 -0.2148 0.0539]
After normalization, for each matching line (in this case all rows correspond) I create the matrix A(j)
[-u 0 u*x -v 0 v*x -1 0 x];
[0 -u u*y 0 -v v*y 0 -1 y];
where line(j)_1 is [x y 1]' and line(j)_2 is [u v 1]'. Then I form the entire matrix A and calculate SVD
[~,~,V] = svd(A); Rearranging the last column of V as a 3x3 matrix gives H as:
[0.4234 0.0024 -0.3962
-0.3750 -0.0030 0.3503
0.4622 0.0029 -0.4322]
This homography matrix works quite well for the parallel lines above and the vanishing point (the intersection of those lines), but it does a terrible job elsewhere. For example, one vanishing point, in unscaled coordinates (1194.2, -607.4), is supposed to stay put and is in fact mapped only a few pixels away (5~10 px), but a random point such as (300, 300) is mapped to (1174.1, -582.7)!
I can't tell whether I made some big mistake or whether it is due to noise in the measurements. Can you help me?
Well, you computed a homography mapping lines to lines. If you want the corresponding pointwise homography you need to invert and transpose it. See, for example, Chapter 1.3.1 of Hartley and Zisserman's "Multiple View Geometry".
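As a sketch of that step (OpenCV/C++; the values in the usage comment are placeholders, not the matrix from the question): if lines map as l2 ~ H_L * l1, then points map as x2 ~ H * x1 with H = (H_L)^(-T), i.e. invert and transpose.

#include <opencv2/core.hpp>

// Convert a line-to-line homography H_L (l2 ~ H_L * l1) into the corresponding
// point-to-point homography H (x2 ~ H * x1). Under a point homography H, lines
// transform as l2 ~ H^(-T) * l1, so H = (H_L)^(-T); inversion and transposition commute.
cv::Mat lineHomographyToPointHomography(const cv::Mat& Hl)
{
    return Hl.inv().t();
}

// Example use (hypothetical values):
//   cv::Mat Hl = ...;                    // 3x3 CV_64F, estimated from line correspondences
//   cv::Mat H  = lineHomographyToPointHomography(Hl);
//   cv::Mat p  = (cv::Mat_<double>(3, 1) << 300, 300, 1);
//   cv::Mat q  = H * p;                  // divide by q.at<double>(2) for pixel coordinates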
From the images you posted, it looks like the lines you are considering are all parallel to each other in the scene. Then the problem is ill-posed, because there are an infinite number of homographies which explain the resulting correspondences. Try adding line correspondences with other directions.

Where to center the kernel when using FFTW for image convolution?

I am trying to use FFTW for image convolution.
At first, just to test whether the system was working properly, I performed the FFT and then the inverse FFT, and got exactly the same image back.
Then, as a small step forward, I used the identity kernel (i.e. kernel[0][0] = 1 and all other components equal 0). I took the component-wise product of the image and the kernel (both in the frequency domain), then did the inverse FFT. Theoretically I should get the identical image back, but the result is not even close to the original image. I suspect this has something to do with where I centered my kernel before transforming it into the frequency domain (since I put the "1" at kernel[0][0], the positive part is centered at the top left). Could anyone enlighten me about what is going wrong here?
For each dimension, the indexes of samples should be from -n/2 ... 0 ... n/2 -1, so if the dimension is odd, center around the middle. If the dimension is even, center so that before the new 0 you have one sample more than after the new 0.
E.g. -4, -3, -2, -1, 0, 1, 2, 3 for a width/height of 8 or -3, -2, -1, 0, 1, 2, 3 for a width/height of 7.
The FFT is relative to the middle; on its scale there are negative points.
In memory the points are 0...n-1, but the FFT treats them as -floor(n/2)...ceil(n/2)-1, where memory index 0 is -floor(n/2) and memory index n-1 is ceil(n/2)-1.
The identity kernel is a matrix of zeros with a 1 at the (0,0) location (the center, according to the above numbering), in the spatial domain.
In the frequency domain the identity kernel should be a constant (all real values 1 or 1/(N*M), and all imaginary values 0).
If you do not receive this result, then the identity kernel might need padding differently (to the left and down instead of around all sides); this may depend on the FFT implementation.
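As a quick sanity check of that statement, here is a hedged FFTW sketch (complex-to-complex transform, arbitrary small sizes): a unit impulse stored at memory index (0,0) transforms to a constant all-ones spectrum, since FFTW's forward transform is unnormalised. If your centred and padded identity kernel does not give a constant spectrum, the centering or padding is off.

#include <cstdio>
#include <fftw3.h>

int main() {
    const int N = 8, M = 8;  // arbitrary small sizes for the check

    fftw_complex* in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N * M);
    fftw_complex* out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N * M);

    // Impulse ("identity kernel") with the 1 at memory location (0,0).
    for (int i = 0; i < N * M; ++i) { in[i][0] = 0.0; in[i][1] = 0.0; }
    in[0][0] = 1.0;

    fftw_plan plan = fftw_plan_dft_2d(N, M, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);

    // Every output sample should be 1 + 0i (FFTW does not normalise the forward transform).
    for (int i = 0; i < N * M; ++i)
        std::printf("%g %+gi\n", out[i][0], out[i][1]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}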
Center each dimension separately (this is an index centering, no change in actual memory).
You will probably need to pad the image (after centering) to a whole power of 2 in each dimension (2^n * 2^m where n doesn't have to equal m).
Pad relative to FFT's 0,0 location (to center, not corner) by copying existing pixels into a new larger image, using center-based-indexes in both source and destination images (e.g. (0,0) to (0,0), (0,1) to (0,1), (1,-2) to (1,-2))
Assuming your FFT uses regular floating-point cells and not complex cells, the complex image has to be of size 2*ceil(n/2) * 2*ceil(m/2), even if you don't need a whole power of 2 (since it has half the samples, but the samples are complex).
If your image has more than one color channel, you will first have to reshape it so that the channels are the most significant part of the sub-pixel ordering, instead of the least significant. You can reshape and pad in one go to save time and space.
Don't forget the FFTSHIFT after the IFFT. (To swap the quadrants.)
The result of the IFFT is 0...n-1. You have to take pixels floor(n/2)+1..n-1 and move them before 0...floor(n/2).
This is done by copying pixels to a new image: copy floor(n/2)+1 to memory location 0, floor(n/2)+2 to memory location 1, ..., n-1 to memory location ceil(n/2)-2, then 0 to memory location ceil(n/2)-1, 1 to memory location ceil(n/2), ..., floor(n/2) to memory location n-1.
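A minimal sketch of that move in one dimension (plain C++); the 2-D quadrant swap applies the same cyclic shift along the rows and then along the columns:

#include <vector>

// Cyclically shift a 1-D array so that input element `offset` becomes output element 0.
// For the move described above, offset = n/2 + 1 (integer division); a textbook
// fftshift uses offset = (n + 1) / 2 instead. Apply along rows, then along columns,
// to swap the quadrants of a 2-D image.
std::vector<double> cyclicShift(const std::vector<double>& in, int offset)
{
    const int n = static_cast<int>(in.size());
    std::vector<double> out(n);
    for (int i = 0; i < n; ++i)
        out[i] = in[(i + offset) % n];
    return out;
}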
When you multiply in the frequency domain, remember that the samples are complex (one cell real then one cell imaginary) so you have to use a complex multiplication.
The result might need dividing by N^2*M^2, where N is the padded size of dimension n (and likewise M for m). You can tell by (a) looking at the frequency-domain values of the identity kernel, or (b) comparing the result to the input.
I think that your understanding of the identity kernel may be off. An identity kernel should have the 1 at the center of the 2D kernel, not at the 0,0 position.
example for a 3 x 3, you have yours setup as follows:
1, 0, 0
0, 0, 0
0, 0, 0
It should be
0, 0, 0
0, 1, 0
0, 0, 0
Check this out also
What is the "do-nothing" convolution kernel
also look here, at the bottom of page 3.
http://www.fmwconcepts.com/imagemagick/digital_image_filtering.pdf
I took the component-wise product between the image and kernel in frequency domain, then did the inverse fft. Theoretically I should be able to get the identical image back.
I don't think that doing a forward transform with a non-fft kernel, and then an inverse fft transform should lead to any expectation of getting the original image back, but perhaps I'm just misunderstanding what you were trying to say there...

How to test proximity of lines (Hough transform) in OpenCV

(This is a follow-up from this previous question).
I was able to successfully use OpenCV / Hough transforms to detect lines in pictures (scanned text); at first it would detect many many lines (at least one line per line of text), but by adjusting the 'threshold' parameter via trial-and-error, it now only detects "real" lines.
(The 'threshold' parameter is dependent on image size, which is a bit of a problem if one has to deal with images of different resolutions, but that's another story).
My problem is that the Hough transform sometimes detects two lines where there is only one; those two lines are very near one another and (apparently) parallel.
=> How can I identify that two lines are almost parallel and very near one another? (so that I can keep only one).
If you use the standard or multi-scale Hough transform, you will end up with the rho and theta coordinates of the lines in polar form. Rho is the distance to the origin, and theta is normally the angle between the detected line and the Y axis. Without looking into the details of the Hough transform in OpenCV, this is a general rule in those coordinates: two lines will be almost parallel and very near one another when:
- their thetas are nearly identical AND their rhos are nearly identical
OR
- their thetas are near 180 degrees apart AND their rhos are near each other's negative
I hope that makes sense.
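A minimal C++ sketch of that test (the tolerance values are arbitrary placeholders; adjust them to your image scale and angle resolution):

#include <cmath>

// Decide whether two Hough lines (rho, theta) are nearly parallel and close together.
// theta is assumed to be in radians, as returned by cv::HoughLines.
// rhoTol and thetaTol are application-specific tolerances (placeholders here).
bool linesAreClose(float rho1, float theta1, float rho2, float theta2,
                   float rhoTol = 10.0f, float thetaTol = 0.05f)
{
    // Case 1: thetas nearly identical and rhos nearly identical.
    if (std::fabs(theta1 - theta2) < thetaTol &&
        std::fabs(rho1 - rho2) < rhoTol)
        return true;

    // Case 2: thetas about 180 degrees (pi radians) apart and rhos near each other's negative.
    const float PI = 3.14159265358979323846f;
    if (std::fabs(std::fabs(theta1 - theta2) - PI) < thetaTol &&
        std::fabs(rho1 + rho2) < rhoTol)
        return true;

    return false;
}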
That's interesting about the theta being the angle between the line and the y-axis.
Generally, theta is visualized as the angle from the x-axis to the line perpendicular to the line in question, and rho is the length of that perpendicular. Thus, theta = 90 and rho = 20 would mean a horizontal line 20 pixels up from the origin.
A nice image is shown in the Hough Transform question.
