Perspective Compensation when Measuring Distances on an Image with a known reference distance - opencv

I am trying to calculate the real-world length of an arbitrary line drawn across the field of view of a one-point-perspective, single-camera setup.
I will have a known distance running parallel. How can I find the compensation factor I need to apply to the pixel length of the measuring line?
Do I have to take into account the distance from the vanishing point, as the length per pixel increases the nearer you get to the vanishing point? Do I need to use the gradient of the known line to give me a rate of change?

A good study on this and similar problems can be found in Antonio Criminisi's papers and Ph.D. thesis on single-view metrology. This is a good link to start, and the whole paperdump is here
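In the simplest case the cross-ratio does the compensation for you, because the vanishing point is the image of the point at infinity of that direction. Below is a minimal sketch, assuming the reference segment lies on (or has been transferred onto) the same image line as the segment you want to measure, and that the vanishing point of that direction is known; the function and point names are mine, purely for illustration:

// Hypothetical sketch: world length of segment a->c on an image line with
// vanishing point v, given that a->b on the same line has known world
// length refLength. Uses the projective invariance of the cross-ratio.
#include <opencv2/core.hpp>
#include <cmath>

double lengthFromCrossRatio(cv::Point2d a, cv::Point2d b, cv::Point2d c,
                            cv::Point2d v, double refLength)
{
    // Signed 1D coordinates of the four collinear points along the line.
    double len = std::sqrt((b.x - a.x) * (b.x - a.x) + (b.y - a.y) * (b.y - a.y));
    cv::Point2d dir((b.x - a.x) / len, (b.y - a.y) / len);
    double xa = 0.0;
    double xb = (b - a).dot(dir);
    double xc = (c - a).dot(dir);
    double xv = (v - a).dot(dir);

    // The image cross-ratio (a, b; c, v) equals the world cross-ratio
    // (A, B; C, infinity), which with A = 0 and B = refLength simplifies
    // to C / (C - refLength). Solve for C.
    double r = ((xc - xa) * (xv - xb)) / ((xc - xb) * (xv - xa));
    return r * refLength / (r - 1.0);
}

Note how the vanishing point enters: as c approaches v in the image, r approaches 1 and the recovered length blows up, which is exactly the "length per pixel increases near the vanishing point" effect you describe, so yes, the distance to the vanishing point has to be taken into account.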

Related

How to find hinge point or axis of rotation point from top view using image processing?

I have a problem at hand where I need to detect/predict the coordinates of the hinge point or axis of rotation point using image processing. The image is as shown below:
I've used a method where I start by tracking the circular movement (in an arc) of a few feature points in an RoI around the default hinge coordinates (entered manually in a configuration file). This circular motion of the tracked points happens around the vertical axis that passes through the hinge point. I track these points from their initial position until the connecting bar makes a particular angle (15°/20°) with the y-axis, draw a secant between the start and end positions of each point, and construct its perpendicular bisector, which should ideally pass through the centre of the (concentric) circles, i.e. the ideal hinge point.
Eg:
y_intercepts calculated for each point
H0 (322, 42)
H1 (322, 64) (within tolerance, closest to GT)
H2 (322, 48)
H_avg (322,52)
H_groundtruth (x,y): (322, 61)
We need an accuracy or tolerance of +/- 3 pixels.
Now, the issues we faced in going from this ideal scenario to a practical working version are:
Different tracked points give different potential hinge points (the different dots on the vertical yellow line), a few of which are very close to the ground truth (yellow circle), but their weighted average (big green circle) goes off the mark. Frankly, this is a problem of too many candidates: one of them is probably the closest to the ground truth, but we can't tell which, since we're not allowed to use the default hinge coordinates (entered manually) from the config file.
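For concreteness, the perpendicular-bisector construction described above boils down to something like this (a rough sketch with hypothetical names, not the asker's actual code): intersect the bisector of each tracked point's chord with the vertical axis to get one candidate hinge y-coordinate.

#include <opencv2/core.hpp>

// start, end: positions of one tracked point before/after the rotation.
// axisX: x-coordinate of the (vertical) hinge axis in the image.
double hingeYFromChord(cv::Point2d start, cv::Point2d end, double axisX)
{
    cv::Point2d mid((start.x + end.x) * 0.5, (start.y + end.y) * 0.5); // midpoint of the secant
    cv::Point2d d = end - start;                                       // chord direction
    cv::Point2d n(-d.y, d.x);                                          // perpendicular direction
    // Parametric point on the bisector: mid + t * n; solve for x == axisX.
    double t = (axisX - mid.x) / n.x;                                  // assumes n.x != 0
    return mid.y + t * n.y;
}

Noise in either endpoint of the chord shifts the bisector, which is where the spread of candidate points comes from.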
One solution could be to use frameworks already implemented for image registration such as elastix. If you configure it for a rigid registration, you can get the transformation matrix and therefore the center of the rotation.
The problem here is that only one part of your image is moving. Before doing the registration, I would simply mask the region of interest by calculating a mask from the subtraction of the two images, to keep only the part where something actually moved.
Such an approach could achieve sub-pixel accuracy. You could also repeat it for multiple angles and average the result. As an alternative to averaging, you could use the RANSAC algorithm to determine which hinge points are off (outliers) and exclude them.
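A minimal sketch of the masking step in OpenCV (not elastix; the frame names are hypothetical): subtract the two frames and threshold to keep only the region that actually moved, then pass that as a mask to the registration.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// frame0, frame1: grayscale (CV_8UC1) images before and after the motion.
cv::Mat motionMask(const cv::Mat& frame0, const cv::Mat& frame1)
{
    cv::Mat diff, mask;
    cv::absdiff(frame0, frame1, diff);                       // per-pixel |difference|
    cv::threshold(diff, mask, 20, 255, cv::THRESH_BINARY);   // 20 is an arbitrary guess
    // Close small holes so the moving part becomes one solid blob.
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE,
                     cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15)));
    return mask;
}

Once the registration gives you the 2x3 rigid transform [R | t], the centre of rotation is its fixed point, i.e. the solution c of (I - R)c = t.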
Here is an example of how to do a simple rigid transformation with elastix.
I hope this helps!
I intended this as only a comment, but it ended up significantly over the character limit:
The problem from an accuracy perspective (sorry, couldn't resist) seems to be that you're trying to use a planar euclidean geometry technique to solve a projective geometry problem.
Those feature tracks are only circular arcs in 3D world space. They're actually (noisy) elliptical arcs in 2D image pixel space due to the projection.
Your hinge rotation axis isn't a single pixel either, unless your camera's optical axis is directly aligned with the hinge axis. If that's not the case (as the perspective in the photo you added suggests), then your hinge axis is actually a line in pixel space, not a point, and different heights for the different tracks in model space will be 'centered' around different pixels on that line. So asking for +/- 3 pixel hinge 'point' accuracy is unclear, and so is measuring angles in pixel space in general in a way that doesn't account for perspective.
I only mention these details because you seem focused on measuring accurately. Often, those kinds of 2D approximations are fine for many applications, but high accuracy and precision from a single camera (if that's really what you need) requires better 3D scene understanding. (Or you could train a deep network with a bunch of labeled ground truth images and let it figure out the mappings.)
Now maybe you don't need such high accuracy for your application after all. In that case, simple affine geometry techniques like that mentioned in the other answer might work well enough.

How to 3d reconstruct robustly from multiple images with known poses in OpenCV

The traditional solution, for example for high-resolution images, is:
extract features (dense) for all images
match features to find tracks through images
triangulate features to 3d points.
There are two problems with this in my case (many 640*480 images with small movements between each other). First: matching is very slow, especially if the number of images is big, so a better solution could be optical flow tracking... but that gets sparse with big moves (a mix of the two could solve the problem!).
Second: triangulating the tracks. Although it is an over-determined problem, I find it hard to code a solution (here I am asking for a simplification of what I read in the references).
I searched quite a bit for libraries in that direction, with no useful result.
Again, I have ground-truth camera matrices and need only the 3D positions as a first estimate (without BA).
A coded software solution would be of great help, as I don't need to reinvent the wheel, though detailed instructions would also be helpful.
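For the triangulation part specifically, the over-determined problem reduces to a small homogeneous least-squares solve (the linear DLT). A rough sketch, assuming 3x4 projection matrices K*[R|t] built from your ground-truth poses and one track of pixel observations (names are illustrative, and this is only a first estimate, no BA):

#include <opencv2/core.hpp>
#include <vector>

// Ps: one 3x4 CV_64F projection matrix per view; obs: the pixel in each view.
cv::Point3d triangulateTrack(const std::vector<cv::Mat>& Ps,
                             const std::vector<cv::Point2d>& obs)
{
    const int n = static_cast<int>(Ps.size());
    cv::Mat A(2 * n, 4, CV_64F);
    for (int i = 0; i < n; ++i)
    {
        // Each observation gives two linear equations in the homogeneous point X:
        // u * P.row(2) - P.row(0) = 0 and v * P.row(2) - P.row(1) = 0.
        cv::Mat r0 = obs[i].x * Ps[i].row(2) - Ps[i].row(0);
        cv::Mat r1 = obs[i].y * Ps[i].row(2) - Ps[i].row(1);
        r0.copyTo(A.row(2 * i));
        r1.copyTo(A.row(2 * i + 1));
    }
    cv::SVD svd(A, cv::SVD::MODIFY_A);
    cv::Mat X = svd.vt.row(3).t();     // null vector = last row of V^T
    double w = X.at<double>(3);        // dehomogenize
    return cv::Point3d(X.at<double>(0) / w, X.at<double>(1) / w, X.at<double>(2) / w);
}

For the two-view case you could also call cv::triangulatePoints directly; the hand-rolled version above just makes the "stack more rows per extra view" part of the over-determined problem explicit.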
This basically shows the underlying geometry for estimating the depth.
As you said, we have the camera poses Q, and we pick a point X in the world; X_L is its projection on the left image. Now, with Q_L, Q_R and X_L, we can construct the green-coloured epipolar plane. The rest of the job is easy: we search through points on the line (Q_L, X), since this line exactly describes the depth of X_L. With different depth assumptions X1, X2, ..., we get different projections on the right image.
Now we compare the pixel intensity difference between X_L and each reprojected point on the right image, pick the smallest one, and the corresponding depth is exactly what we want.
Pretty easy, hey? The truth is it's way harder: the image intensity along the line is never strictly convex:
This makes our matching extremely hard, since the non-convex intensity profile means any distance function will have multiple critical points (candidate matches). How do you decide which one is correct?
To handle this, people proposed patch-based matching. Methods like SAD, SSD and NCC were introduced to make the distance function as convex as possible; still, they are unable to handle large-scale repeated texture and low-texture regions.
To solve this, people started to search over a longer range along the epipolar line, and found that the whole set of matching scores can be described as a distribution along the depth.
The horizontal axis is depth and the vertical axis is the matching score. This view leads us to the depth filter: we usually model the distribution as a Gaussian (a.k.a. the Gaussian depth filter) and use it to describe the uncertainty of the depth. Combined with the patch-matching method, we can roughly get a depth proposal.
Then we can use some optimization tool, like Gauss-Newton or gradient descent, to finally refine the depth estimation.
To sum up, the total process of depth estimation goes like the following steps:
assume the depth at every pixel follows an initial Gaussian distribution
search along the epipolar line and reproject points into the target frame
triangulate the depth and calculate its uncertainty using the depth filter
run steps 2 and 3 again to get a new depth distribution and merge it with the previous one (as sketched below); if it has converged, stop, otherwise start again from step 2.
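The "merge with the previous one" step is just the product of two Gaussians. A minimal sketch of that fusion (the struct and names are illustrative, not from a particular library):

// Fuse the current per-pixel depth estimate N(mu, sigma^2) with a new
// measurement: the product of two Gaussians is again Gaussian, with a
// precision-weighted mean and a smaller variance.
struct DepthHypothesis
{
    double mu;      // mean depth (or inverse depth)
    double sigma2;  // variance
};

DepthHypothesis fuse(const DepthHypothesis& prior, const DepthHypothesis& meas)
{
    DepthHypothesis post;
    post.mu = (meas.sigma2 * prior.mu + prior.sigma2 * meas.mu)
              / (prior.sigma2 + meas.sigma2);
    post.sigma2 = (prior.sigma2 * meas.sigma2) / (prior.sigma2 + meas.sigma2);
    return post;
}

Convergence in the last step then simply means sigma2 dropping below some threshold.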

Camera Calibration: How to do it right

I am trying to calibrate a camera using a checkerboard by the well known Zhang's method followed by bundle adjustment, which is available in both Matlab and OpenCV. There are a lot of empirical guidelines but from my personal experience the accuracy is pretty random. It could sometimes be really good but also sometimes really bad. The result actually can vary quite a bit just by simply placing the checkerboard at different locations. Suppose the target camera is rectilinear with 110 degree horizontal FOV.
Does the number of squares in the checkerboard affect the accuracy? Zhang uses 8x8 in his original paper without really explaining why.
Does the length of the square affect the accuracy? Zhang uses 17cm x 17cm without really explaining why.
What is the optimal number of snapshots of different checkerboard positions/orientations? Zhang uses only 5 images. I've seen people suggest 20~30 images with checkerboards at various angles, filling the entire field of view, tilted to the left, right, top and bottom; they also suggest that no two checkerboard placements should have a similar position/orientation, otherwise the result will be biased towards that position/orientation. Is this correct?
The goal is to figure out a workflow to get consistent calibration result.
If the accuracy you get is "pretty random" then you are likely not doing it right: with stable optics and a well conducted procedure you should consistently be getting RMS projection errors within a few tenths of a pixel. Whether this corresponds to variances of millimeters or meters in 3D space depends, of course, on your optics and sensor resolution (calibration is not a way around physics).
I wrote a few suggestions some time ago in this answer, and I recommend you follow them. In particular, pay attention to locking the focus distance (I have seen and heard of countless people trying to calibrate a camera on autofocus, and being sorely disappointed). As for the size of the target, again it depends on your optics and camera resolution, but generally speaking the goals are (1) to fill with measurements both the field of view and the volume of space you'll be working with, and (2) to observe significant perspective foreshortening, because that is what constrains the solution for the FOV. Good luck!
[Edit, to address a comment]
Concerning variations in the parameter values across successive calibrations, the first thing I'd do is calculate the cross RMS errors, i.e. the RMS error on dataset 1 with the camera calibrated on dataset 2, and vice versa. If either is significantly higher than the calibration errors, it's an indication that the camera has changed between the two calibrations, and then all bets are off. Do you have auto-{focus,iris,zoom,stabilization} on? Turn them all off: auto-anything is the bane of calibration, with the only exception of exposure time. Otherwise, you need to see whether the variations you observe in the parameters are actually meaningful (hint: they often are not). A variation of the focal length in pixels of a few parts per thousand is probably irrelevant with today's sensor resolutions; you can verify that by expressing it in mm and comparing it to the dot pitch of the sensor. Also, variations of the principal point position on the order of tens of pixels are common, since it is poorly observed unless your calibration procedure is very carefully designed to estimate it.
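One way to compute that cross RMS error (a sketch with hypothetical names): fix the intrinsics from one calibration, re-estimate only the board poses on the other dataset with solvePnP, and report the RMS reprojection error.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>
#include <cmath>

// objPts/imgPts: per-view board corners from dataset 1; K/dist: intrinsics
// calibrated on dataset 2.
double crossRms(const std::vector<std::vector<cv::Point3f>>& objPts,
                const std::vector<std::vector<cv::Point2f>>& imgPts,
                const cv::Mat& K, const cv::Mat& dist)
{
    double sumSq = 0.0;
    size_t total = 0;
    for (size_t v = 0; v < objPts.size(); ++v)
    {
        cv::Mat rvec, tvec;
        cv::solvePnP(objPts[v], imgPts[v], K, dist, rvec, tvec);   // board pose only
        std::vector<cv::Point2f> proj;
        cv::projectPoints(objPts[v], rvec, tvec, K, dist, proj);
        for (size_t i = 0; i < proj.size(); ++i)
        {
            cv::Point2f d = proj[i] - imgPts[v][i];
            sumSq += d.x * d.x + d.y * d.y;
        }
        total += proj.size();
    }
    return std::sqrt(sumSq / static_cast<double>(total));
}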
Ideally you want to place your checkerboard at roughly the same distance from the camera as the distance at which you want to do your measurements. So your checkerboard squares must be large enough to be resolvable from that distance. You also want to cover the entire field of view with points, especially close to the edges and corners of the frame. Also, the smaller the board is, the more images you should take to cover the entire field of view. So 20-30 images is usually a good rule of thumb.
Another thing is that the checkerboard should be asymmetric. Ideally, you want to have an even number of squares along one side, and an odd number of squares along the other. This way the board's in-plane orientation is unambiguous.
Also, I would suggest that you try the Camera Calibrator app in MATLAB. At the very least, look at the documentation, which has a lot of useful suggestions for calibrating cameras.
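For reference, a bare-bones OpenCV workflow along these lines (board size, square size and image handling are placeholders; the MATLAB app does the equivalent with a GUI):

#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/core.hpp>
#include <vector>

// images: 20-30 grayscale views of the board, covering the whole FOV with
// strong tilts. boardSize counts *inner* corners, e.g. 9x6; squareSize e.g. in mm.
double calibrateFromViews(const std::vector<cv::Mat>& images,
                          cv::Size boardSize, float squareSize,
                          cv::Mat& K, cv::Mat& dist)
{
    std::vector<cv::Point3f> board;                       // board model on the Z = 0 plane
    for (int r = 0; r < boardSize.height; ++r)
        for (int c = 0; c < boardSize.width; ++c)
            board.emplace_back(c * squareSize, r * squareSize, 0.f);

    std::vector<std::vector<cv::Point3f>> objPts;
    std::vector<std::vector<cv::Point2f>> imgPts;
    for (const cv::Mat& img : images)
    {
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners))
            continue;                                     // skip views where detection fails
        cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
        imgPts.push_back(corners);
        objPts.push_back(board);
    }

    std::vector<cv::Mat> rvecs, tvecs;
    // The return value is the overall RMS reprojection error in pixels.
    return cv::calibrateCamera(objPts, imgPts, images[0].size(), K, dist, rvecs, tvecs);
}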

Unstable homography estimation using ORB

I am developing a feature tracking application and so far, after trying almost all the feature detectors/descriptors, I've got the most satisfactory overall results with ORB.
Both my feature detector and descriptor are ORB.
I am selecting a specific area for detecting features on my source image (by masking), and then matching them with features detected on subsequent frames.
Then I filter my matches by performing the ratio test on 'matches1' obtained from the following code:
std::vector<std::vector<DMatch>> matches1;
m_matcher.knnMatch(m_descriptorsSrcScene, m_descriptorsCurScene, matches1, 2);
I also tried the two-way ratio test (filtering matches from the source to the current scene and vice versa, then keeping only the common matches), but it didn't help much, so I went ahead with the one-way ratio test.
I also add a minimum-distance check to my ratio test, which, it appears, gives better results:
if (distanceRatio < m_fThreshRatio && bestMatch.distance < 5 * min_dist)
{
    refinedMatches.push_back(bestMatch);
}
And in the end, I estimate the homography:
Mat H = findHomography(points1,points2);
I've tried using the RANSAC method for estimating inliers and then using those to recalculate my homography, but that gives more instability and consumes more time.
Then in the end I draw a rectangle around the specific region that is to be tracked. I get the plane coordinates with:
perspectiveTransform( obj_corners, scene_corners, H);
where 'obj_corners' are the coordinates of my masked (or unmasked) region.
The rectangle I draw using 'scene_corners' seems to be vibrating. Increasing the number of features has reduced it quite a bit, but I can't increase them too much because of the time constraint.
How can I improve the stability?
Any suggestions would be appreciated.
Thanks.
If it is the vibrations that are really bothersome to you then you could try taking the moving average of the homography matrices over time:
cv::Mat homoG = cv::findHomography(obj, scene, CV_RANSAC);  // cv::RANSAC in newer OpenCV versions
if (homography.empty()) {
    // First frame: initialize the running average with the first estimate.
    homoG.copyTo(homography);
}
cv::accumulateWeighted(homoG, homography, 0.1);
Make the 'homography' variable global, and keep calling this every time you get a new frame.
The alpha parameter of accumulateWeighted is roughly the reciprocal of the period of the moving average (it is an exponentially weighted running average rather than a strict windowed one).
So 0.1 corresponds to averaging over roughly the last 10 frames, 0.2 over roughly the last 5, and so on...
A suggestion that comes to mind from experience with feature detection/matching is that sometimes you just have to accept the matched feature points will not work perfectly. Even subtle changes in the scene you are looking at can cause somewhat annoying problems, for example changes in light or unwanted objects coming into view.
It appears to me that you have decently working feature matching in place from what you say; you may want to work on a way of keeping the region of interest constant. If you know the typical speed or any other movement patterns unique to the object you are trying to track between frames, or any constraints relating to the position of your camera, that may help you avoid recalculating the region of interest unnecessarily, which is what causes the vibrations. It may also help you build a more efficient search, allowing you to increase the number of feature points you can detect and use.
Another (small) hack you can use is to avoid redrawing the region window if the previous window was of similar size and position.
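That hack can be as simple as comparing the new corners against the last ones drawn and skipping the redraw when the change is below a couple of pixels (a sketch with a hypothetical threshold, not part of the original code):

#include <opencv2/core.hpp>
#include <vector>
#include <cmath>

// scene_corners: the 4 corners just computed; drawn_corners: the last ones drawn.
bool shouldRedraw(const std::vector<cv::Point2f>& scene_corners,
                  const std::vector<cv::Point2f>& drawn_corners,
                  float maxJitterPx = 2.f)
{
    if (drawn_corners.size() != scene_corners.size())
        return true;                               // nothing drawn yet
    for (size_t i = 0; i < scene_corners.size(); ++i)
    {
        cv::Point2f d = scene_corners[i] - drawn_corners[i];
        if (std::hypot(d.x, d.y) > maxJitterPx)
            return true;                           // real motion: update the rectangle
    }
    return false;                                  // only jitter: keep the old rectangle
}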

Statistical analysis on Bell shaped (Gaussian) curve

In my application I am getting images (captured by a high speed camera) containing projections of some light sources on the screen.
1- My first task is to plot a PDF or intensity distribution for the light intensity, which should come out as a bell shape, or Gaussian, since at the center the light intensity will be at its maximum and towards the edges it will diminish. Like this (just an example, not my exact case):
In the worst case I will have a series of light sources illuminated simultaneously. In such cases I should theoretically get overlapping bell or Gaussian curves, somewhat like this:
How do I plot such a curve given the images of the light projection (like the one in the figure)?
2- After the Gaussian curve is drawn, the next job is to analyze it, e.g. find the width and height of the curve. How do I go about this?
I want an executable for this application, so a solution in MATLAB or a similar tool is not acceptable to my client. Also, I want the solution to work in real time or near real time.
I guess OpenCV can be used here, but before I start I would like to know the opinions of the image processing gurus on this forum. Especially for step 1 above, I need some input.
Any pointers here?
Rgrds,
Heshsham
Note: Image is taken from http://pentileblog.com.
To get the 1D Gaussian out of the 2D one, you can do a couple of things depending on what you want exactly.
- You could sum over every column of the image;
- You could find the local maximum in intensity and copy the intensity profile of that row of the image only;
- You could threshold the image (in case your maximum is saturated and therefore a plateau), determine the center of gravity of the remaining blob, and copy that row's intensity profile;
- You could threshold, find contours, determine multiple local maxima, and grab multiple intensity profiles if the application calls for it (e.g. if the blobs are not horizontally aligned).
To get the height and width, it's pretty easy, just find the maximum and the points left and right of it where the curve drops to half of the maximum. The standard deviation is the distance between the two points divided by 2.35 (wikipedia link).
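A sketch of the first and last steps (summing columns with cv::reduce to get the 1D profile, then reading off the height and the full width at half maximum; the function and variable names are illustrative):

#include <opencv2/core.hpp>

// img: single-channel image of the light spot. Returns sigma estimated from
// the FWHM of the column-sum profile; peakHeight gets the curve's height.
double sigmaFromProfile(const cv::Mat& img, double& peakHeight)
{
    cv::Mat profile;
    cv::reduce(img, profile, 0, cv::REDUCE_SUM, CV_64F);    // sum every column -> 1xW row

    double maxVal;
    cv::Point maxLoc;
    cv::minMaxLoc(profile, nullptr, &maxVal, nullptr, &maxLoc);
    peakHeight = maxVal;

    // Walk left and right from the peak until the profile drops to half maximum.
    double half = 0.5 * maxVal;
    int left = maxLoc.x, right = maxLoc.x;
    while (left > 0 && profile.at<double>(0, left) > half) --left;
    while (right < profile.cols - 1 && profile.at<double>(0, right) > half) ++right;

    double fwhm = right - left;                              // full width at half maximum, in pixels
    return fwhm / 2.355;                                     // sigma = FWHM / (2*sqrt(2*ln 2))
}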
Well I solved it:
The algorithm is as follows:
1- use cvSampleLine to read a particular line (row) of the image
2- use cvMinMaxLoc to find the maximum pixel value in that line
3- note which of these lines has the highest pixel value; let's say line no. 150
4- plot the pixel values for line 150 (a modern C++ equivalent is sketched below).
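The same idea with the C++ API (a sketch; cvSampleLine/cvMinMaxLoc belong to the legacy C interface, and row access plus cv::minMaxLoc is the rough modern equivalent):

#include <opencv2/core.hpp>

// Returns the index of the row with the brightest pixel; that row is the
// 1D profile to plot (the "line 150" of the steps above).
int brightestRow(const cv::Mat& gray)
{
    int bestRow = 0;
    double bestVal = 0.0;
    for (int r = 0; r < gray.rows; ++r)
    {
        double maxVal;
        cv::minMaxLoc(gray.row(r), nullptr, &maxVal);      // max pixel value in this row
        if (maxVal > bestVal)
        {
            bestVal = maxVal;
            bestRow = r;
        }
    }
    return bestRow;
}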
I used MATLAB for verifying my results and graphs, and the OpenCV result is exactly the same.
Thanks for your suggestions guys.
