A computed-tomography device has an X-ray (roentgen) detector matrix of 20x500 pixels with a resolution of 2 mm in each direction. This matrix rotates around a belt, which transports the items to be analysed. A special reconstruction algorithm produces a 3D model of the items from many matrices captured from all 360 perspectives (one image per 1° of rotation).
The problem is that the reconstruction algorithm is very sensitive to the belt speed/position. Measuring the belt position requires quite complicated and expensive positioning sensors and very fine mechanics.
I wonder if it is possible to calculate the belt velocity from the roentgen image itself. It has a width of 40 mm, which should be sufficient for capturing the movement. The difficulty is that the movement always has two components: rotation and X (belt). For those working in the CT area, are you aware of any applications or publications about such direct measurement of the belt/table velocity?
P.S.: It is not for a medical application.
Hmm, interesting idea.
Are you doing a full 180 degrees for the reconstruction? I'd go with the 0 and 180 degree cone beam images. They should be approximately the same, minus some non-linear stuff, artifacts, Poisson noise and differences in 'shadows' and scattering due to perspective.
You could translate the 180 image along the negative x-axis, i.e. in the direction opposite to the movement, and subtract the images at suitable intervals along this axis. When the sum of the absolute differences hits a minimum, the translation should be approximately the distance the object has moved between 0 and 180, as the mirror images partially cancel each other out.
This could obviously be ruined by artifacts and wonkily shaped heavy objects. Still, worth a try. I'm assuming your voltage is pretty high if you are doing industrial stuff.
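A minimal sketch of the shift search described above, reduced to 1D for clarity. It assumes each projection has already been collapsed to an intensity profile along the belt axis (for example by summing detector rows) and that the 180° profile has been mirrored so both views have the same orientation. The function names and the toy profiles are hypothetical; real data would add noise and perspective differences.

```python
def mean_abs_diff(ref, moved, shift):
    """Mean |ref[i] - moved[i + shift]| over the samples where both exist."""
    total, count = 0.0, 0
    for i, value in enumerate(ref):
        j = i + shift
        if 0 <= j < len(moved):
            total += abs(value - moved[j])
            count += 1
    return total / count if count else float("inf")


def estimate_shift(ref, moved, max_shift):
    """Return the integer shift (in pixels) that minimises the mean absolute difference."""
    candidates = range(-max_shift, max_shift + 1)
    return min(candidates, key=lambda s: mean_abs_diff(ref, moved, s))


# Toy profiles: the same object, displaced by 3 samples between the two views.
profile_0 = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
profile_180 = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0]

shift_px = estimate_shift(profile_0, profile_180, max_shift=5)
shift_mm = shift_px * 2.0  # 2 mm per pixel, as stated in the question
print(shift_px, shift_mm)
```

Dividing the recovered distance by the time between the two views would give a velocity estimate. The mean over the overlap is used instead of a plain sum so that large shifts with little overlap are not favoured.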
EDIT: "A special reconstruction algorithm produces 3D model of the items from many-many matrices captured from all 360 perspectives (one image per 1° angle)."
So yes, you are using 180+ degrees. You could then perhaps use multiple opposite images for a more robust routine. How do you get the full circle? Are you shooting through the belt?
I have a problem at hand where I need to detect/predict the coordinates of the hinge point or axis of rotation point using image processing. The image is as shown below:
I've used a method where I started by tracking the circular movement (in an arc) of a few feature points in an RoI around the default hinge coordinates (entered manually in a configuration file). This circular motion of the tracked points happens around the vertical axis which passes through the hinge point. I tracked these points from their initial position until the connecting bar made a particular angle (15°/20°) with the y-axis. I then drew secants between the start and end positions of each point and drew the perpendicular bisector of each secant, which will ideally pass through the centre of the (concentric) circles, i.e. the ideal hinge point.
Eg:
y_intercepts calculated for each point
H0 (322, 42)
H1 (322, 64) (within tolerance, closest to GT)
H2 (322, 48)
H_avg (322,52)
H_groundtruth (x,y): (322, 61)
We need an accuracy or tolerance of +/- 3 pixels.
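The secant/perpendicular-bisector construction from the question can be sketched as follows. This is only an illustration under the question's planar assumption: every point on the perpendicular bisector is equidistant from the start and end positions, so intersecting it with the vertical axis `x = axis_x` gives the hinge candidate's y coordinate. The function name and the synthetic arc are hypothetical.

```python
import math


def hinge_y_on_axis(p_start, p_end, axis_x):
    """y where the perpendicular bisector of the secant crosses x = axis_x."""
    (x1, y1), (x2, y2) = p_start, p_end
    if y1 == y2:
        # Horizontal secant: its bisector is vertical and never crosses the axis
        # at a single point.
        raise ValueError("secant is horizontal; bisector is parallel to the axis")
    # Equidistance condition (axis_x - x1)^2 + (y - y1)^2 == (axis_x - x2)^2 + (y - y2)^2
    numerator = (axis_x - x2) ** 2 - (axis_x - x1) ** 2 + y2 ** 2 - y1 ** 2
    return numerator / (2.0 * (y2 - y1))


# Synthetic check: a point 100 px from a hinge at (322, 61), swung by 20 degrees.
center = (322.0, 61.0)
radius = 100.0


def on_arc(angle_deg):
    a = math.radians(angle_deg)
    return (center[0] + radius * math.sin(a), center[1] + radius * math.cos(a))


estimate = hinge_y_on_axis(on_arc(0.0), on_arc(20.0), axis_x=322.0)
print(round(estimate, 3))
```

With noise-free points the estimate lands on the true centre. With short arcs such as 15° to 20°, small tracking errors in the end points shift the intercept noticeably, which may be one reason the individual candidates disagree.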
Now, the issues we faced when moving from this ideal scenario to a practical implementation are:
Different tracked points give different potential hinge points (different dots on the vertical yellow line), a few of which are very close to the ground truth (yellow circle), but their weighted/plain average (big green circle) goes off the mark. Quite frankly, this is a problem of too many candidates: one of them is usually very close to the ground truth, but we can't tell which one, as we're not supposed to use the default hitch coordinates (entered manually) from the config file.
One solution could be to use frameworks already implemented for image registration such as elastix. If you configure it for a rigid registration, you can get the transformation matrix and therefore the center of the rotation.
The problem here is that only one part of your image is moving. Before doing the registration, I would simply mask the region of interest by calculating a mask from the subtraction of the two images, to keep only the part where something actually moved.
Such an approach could reach subpixel accuracy. You could also repeat it for multiple angles and average the results. As an alternative to averaging, you could use the RANSAC algorithm to find out which hinge points are off (outliers) and exclude them.
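To illustrate the outlier-rejection idea, here is a simplified consensus search over 1D hinge candidates. It is not full RANSAC: with only a handful of candidates, every one of them is tried as the hypothesis instead of sampling randomly. The candidate values and the function name are hypothetical; the 3-pixel tolerance is taken from the question.

```python
def consensus_estimate(candidates, tolerance):
    """Pick the largest group of candidates within `tolerance` of one hypothesis."""
    best_inliers = []
    for hypothesis in candidates:
        inliers = [c for c in candidates if abs(c - hypothesis) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate using only the agreeing candidates.
    estimate = sum(best_inliers) / len(best_inliers)
    return estimate, best_inliers


# Hypothetical y-intercepts (in pixels) from six tracked points.
y_candidates = [42.0, 60.5, 61.8, 48.0, 62.1, 59.9]
y_hinge, used = consensus_estimate(y_candidates, tolerance=3.0)
print(y_hinge, sorted(used))
```

This only helps when the good candidates actually cluster. If most candidates are biased in the same direction, the consensus will be biased too.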
Here is an example of how to do a simple rigid transformation with elastix.
I hope this helps!
I intended this as only a comment, but it ended up significantly over the character limit:
The problem from an accuracy perspective (sorry, couldn't resist) seems to be that you're trying to use a planar euclidean geometry technique to solve a projective geometry problem.
Those feature tracks are only circular arcs in 3D world space. They're actually (noisy) elliptical arcs in 2D image pixel space due to the projection.
Your hinge rotation axis isn't a single pixel either, unless your camera's optical axis is directly aligned with the hinge axis. If that's not the case (as the perspective in the photo you added suggests), then your hinge axis is actually a line in pixel space, not a point, and different heights for the different tracks in model space will be 'centered' around different pixels on that line. So asking for +/- 3 pixel hinge 'point' accuracy is unclear, and so is measuring angles in pixel space in general in a way that doesn't account for perspective.
I only mention these details because you seem focused on measuring accurately. Often, those kinds of 2D approximations are fine for many applications, but high accuracy and precision from a single camera (if that's really what you need) requires better 3D scene understanding. (Or you could train a deep network with a bunch of labeled ground truth images and let it figure out the mappings.)
Now maybe you don't need such high accuracy for your application after all. In that case, simple affine geometry techniques like that mentioned in the other answer might work well enough.
I did a lot of experiments using the accelerometer to detect the size (magnitude) of movement as just one value derived from the x, y, z acceleration. I am using an iPhone 4 with an accelerometer update interval of 1.0 / 50.0 (50 Hz), but I've also tried 100 Hz, 150 Hz and 200 Hz.
Examples:
Acceleration on X axis
Acceleration on Y axis
Acceleration on Z axis
I assume (I hope I am correct) that the accelerations are the small peaks on the graph, not the big steps. From my experiments, I think the big steps show the device position: if the position changes, the step changes too.
If my previous assumption is correct, I need to cut the peaks out of the graph and sum them up. Here comes my question: how can I cut out those peaks without losing the information, i.e. the peak sizes?
I know that a high-pass filter does this kind of thing (it passes the high peaks and blocks the noise, the small ones); I've read some papers about filters. But for me the filter cuts a lot of information from my "signal" (accelerometer data).
I think that there should be a better way for getting the information out from the data.
I've tried a simple one which looks nice, but it isn't correct.
I generated this data using my function magnitude:
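converted = zeros(1, length(x) - 1);    % preallocate the result
for i = 2 : length(x)
    converted(i-1) = x(i-1) - x(i);     % difference between consecutive samples
end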
Where x is my data and converted array is the result.
The following line generated the image below, which looks nice.
xyz = magnitude(datay) + magnitude(dataz) + magnitude(datax)
However, the problem with that solution is that if I have continuous acceleration, the graph only shows the first point and then goes down. I know that I need a better filter somehow, but I am a bit confused. Could you give me some advice on how to do this properly?
Thanks for your time,
I really appreciate your help
Edit(answers for Zaph question):
What are you trying to accomplish?
I want to measure the movement when the iPhone is placed on a desk, chair or bed. The accelerometer is so sensitive that it even registers when I put a pencil down on the desk. I want to measure all movement that happens in a specific time.
What are the scale units?
I'm not scaling the data.
When you say "device position" what do you mean, an accelerometer provides movement (in iPhones with gyros)
I am using only the accelerometer. When I put the device like the picture below I got values around -1 on x coordinate, 0.0 on z and y coordinate. This is what I mean as device position.
The measurements that are returned from the accelerometer are acceleration, not position.
I'm not sure what you mean with "big steps" but the peaks show a change of acceleration. The fact that the values are not 0 when holding the device still is from the fact that the gravitation accelerates the device with 9.81 m/s^2 (the magnitude of the acceleration vector).
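One common way to separate the slowly changing gravity component (the "big steps") from the short peaks is to track gravity with a low-pass filter and subtract it. The sketch below is an assumption about what would fit here, not something taken from the question. It works on one axis, and the smoothing factor `alpha` is a hypothetical value that would need tuning for the sample rate.

```python
def remove_gravity(samples, alpha=0.9):
    """Subtract an exponentially smoothed gravity estimate from one axis."""
    gravity = samples[0]  # start from the first reading
    dynamic = []
    for s in samples:
        # Low-pass filter: follows slow changes such as a new device orientation.
        gravity = alpha * gravity + (1.0 - alpha) * s
        # What remains is the fast part of the signal.
        dynamic.append(s - gravity)
    return dynamic


# Device resting with about -1 g on the x axis, with one short bump.
x_axis = [-1.0] * 5 + [-0.5] + [-1.0] * 5
peaks = remove_gravity(x_axis)
print([round(p, 3) for p in peaks])
```

The trade-off mentioned in the question remains: a sustained acceleration is gradually absorbed into the gravity estimate, so a larger `alpha` preserves long movements better but reacts more slowly to orientation changes.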
You are potentially trying to do something quite difficult, especially with the low-quality sensors that are embedded in phones: getting the actual coordinate acceleration of the phone.
What you can do is detect the time periods when the phone was moved or touched. You can first calculate the magnitude (norm) of the acceleration signal and then, with a moving window, find the areas where the sample standard deviation is smaller than some threshold; those are the stationary periods, and the remaining ones are where the phone was moved or touched. Determining how the phone moved is a more complicated issue. Of course, you can check the orientation for the stationary areas between movements.
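A minimal sketch of the moving-window idea, assuming the samples are (x, y, z) tuples in units of g. The window length and threshold are hypothetical values that would have to be tuned on real recordings.

```python
import math
import statistics


def movement_flags(samples, window, threshold):
    """True for each window whose magnitude standard deviation exceeds the threshold."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    flags = []
    for start in range(len(magnitudes) - window + 1):
        chunk = magnitudes[start:start + window]
        flags.append(statistics.stdev(chunk) > threshold)
    return flags


still = [(-1.0, 0.0, 0.0)] * 10
bump = [(-1.2, 0.0, 0.0), (-0.8, 0.0, 0.0), (-1.2, 0.0, 0.0), (-0.8, 0.0, 0.0)]
recording = still + bump + still

flags = movement_flags(recording, window=5, threshold=0.05)
print(flags.count(True), "of", len(flags), "windows contain movement")
```

Because only the magnitude is used, a slow change of orientation at rest leaves the signal near 1 g and is not flagged, which matches the goal of detecting touches and bumps.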
I am using standard OpenCV functions to calibrate a camera for its intrinsic parameters. In order to obtain good results, I know we have to use images of the chessboard from different angles (considering different planes in 3D). This is stated in all the documentation and papers, but I really don't understand why it is so important to consider different planes, and whether there is an optimal number of planes to consider for the best calibration results.
I would be glad if you could point me to a paper or documentation that explains this. (I think Zhang's paper talks about it, but it's mathematically intensive and was hard to digest.)
Thanks
Mathematically, a unique solution for the intrinsic parameters (up to scale) is defined only if you have 3 or more distinct images of the planar target. See page 6 of Zhang's paper: "If n images of the model plane are observed, by stacking n such equations as (8) we have Vb = 0 ; (9) where V is a 2n×6 matrix. If n ≥ 3, we will have in general a unique solution b defined up to a scale factor..."
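The counting argument behind "n ≥ 3" can be written out as follows. This is my reading of the quoted passage, using the paper's notation as I recall it: A is the intrinsic matrix and the v_ij are vectors built from the homography of each image.

```latex
% B = A^{-T} A^{-1} is symmetric, so it has 6 distinct entries:
\[
  b = [B_{11},\; B_{12},\; B_{22},\; B_{13},\; B_{23},\; B_{33}]^{T}
\]
% Each image (one homography) contributes two homogeneous constraints on b:
\[
  \begin{bmatrix} v_{12}^{T} \\ (v_{11} - v_{22})^{T} \end{bmatrix} b = 0
\]
% b is only defined up to scale, so it has 5 degrees of freedom.
% Stacking n images gives 2n equations, hence:
\[
  2n \ge 5 \;\Longrightarrow\; n \ge 3
\]
```

With fewer images, extra assumptions (for example zero skew) are needed to remove unknowns.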
There isn't an "optimal" number of planes: where data are concerned, the more you have, the merrier you are. But as the solution starts to converge, the marginal gain in calibration accuracy from adding an extra image becomes negligible. Of course, this assumes that the images show planes well separated in both pose and location.
See also this other answer of mine for practical tips.
If you're looking for a little intuition, here's an example of why one plane isn't enough. Imagine your calibration chessboard is tilting away from you at a 45° angle:
You can see that when you move up the chessboard by 1 meter in the +y direction, you also move away from the camera by 1 meter in the +z direction. This means there's no way to separate the effect of moving in the y direction vs the z direction. The y and z movement directions are effectively tied to each other, for all our training points. So, if we just look at points on this one plane, there's no way to tease apart the effects of y movement vs z movement.
For example, from this 1 plane, we can't tell the difference between these scenarios:
1. The camera has perspective distortion such that things appear smaller in the image as they move in the world's +y direction.
2. The camera focal length is such that things appear smaller in the image as they move in the world's +z direction.
3. Any mixture of the effects in #1 and #2.
Mathematically, this ambiguity means that there are many equally possible solutions when OpenCV tries to fit a camera matrix to match the data. (Note that the 45° angle was not important. Any plane you choose will have the same problem: training examples' (x,y,z) dimensions are coupled together, so you can't separate their effects.)
One last note: if you make enough assumptions about the camera matrix (e.g. no perspective distortion, x and y scale identically, etc) then you can end up with a situation with fewer unknowns (in an extreme case, maybe you're just calculating the focal length) and in that case you could calibrate with just 1 plane.
I am totally new to camera calibration techniques... I am using OpenCV chessboard technique... I am using a webcam from Quantum...
Here are my observations and steps..
I have kept each chess square side = 3.5 cm. It is a 7 x 5 chessboard with 6 x 4 internal corners. I am taking total of 10 images in different views/poses at a distance of 1 to 1.5 m from the webcam.
I am following the C code in Learning OpenCV by Bradski for the calibration.
my code for calibration is
cvCalibrateCamera2(object_points,image_points,point_counts,cvSize(640,480),intrinsic_matrix,distortion_coeffs,NULL,NULL,CV_CALIB_FIX_ASPECT_RATIO);
Before calling this function I am making the first and 2nd element along the diagonal of the intrinsic matrix as one to keep the ratio of focal lengths constant and using CV_CALIB_FIX_ASPECT_RATIO
With the change in distance of the chessboard, fx and fy change, with fx:fy almost equal to 1. The cx and cy values are in the order of 200 to 400, and fx and fy are in the order of 300 to 700 when I change the distance.
Presently I have set all the distortion coefficients to zero because I did not get good results when including them. My original image looked more handsome than the undistorted one!!
Am I doing the calibration correctly? Should I use any option other than CV_CALIB_FIX_ASPECT_RATIO? If yes, which one?
Hmm, are you looking for "handsome" or "accurate"?
Camera calibration is one of the very few subjects in computer vision where accuracy can be directly quantified in physical terms, and verified by a physical experiment. And the usual lesson is that (a) your numbers are just as good as the effort (and money) you put into them, and (b) real accuracy (as opposed to imagined) is expensive, so you should figure out in advance what your application really requires in the way of precision.
If you look up the geometrical specs of even very cheap lens/sensor combinations (in the megapixel range and above), it becomes readily apparent that sub-sub-mm calibration accuracy is theoretically achievable within a table-top volume of space. Just work out (from the spec sheet of your camera's sensor) the solid angle spanned by one pixel - you'll be dazzled by the spatial resolution you have within reach of your wallet. However, actually achieving REPEATABLY something near that theoretical accuracy takes work.
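As a back-of-the-envelope version of that calculation: under a pinhole model, one pixel spans roughly distance / focal length (with the focal length expressed in pixels) at a given working distance. The numbers below are hypothetical examples, not specifications of any particular camera.

```python
def pixel_footprint_mm(distance_mm, focal_px):
    """Approximate width in the scene covered by one pixel at a given distance."""
    return distance_mm / focal_px


# A low-resolution webcam (focal length about 700 px) looking at a target 1 m away.
webcam = pixel_footprint_mm(distance_mm=1000.0, focal_px=700.0)

# A hypothetical megapixel camera (focal length about 3000 px) at 0.5 m.
megapixel = pixel_footprint_mm(distance_mm=500.0, focal_px=3000.0)

# Corner detectors typically localise to a fraction of a pixel; 0.1 px is assumed here.
print(round(webcam, 3), round(megapixel, 3), round(0.1 * megapixel, 4))
```

The last figure is the kind of theoretical limit the paragraph refers to; reaching it repeatably depends on everything that follows.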
Here are some recommendations (from personal experience) for getting a good calibration experience with home-grown equipment.
If your method uses a flat target ("checkerboard" or similar), manufacture a good one. Choose a very flat backing (for the size you mention window glass 5 mm thick or more is excellent, though obviously fragile). Verify its flatness against another edge (or, better, a laser beam). Print the pattern on thick-stock paper that won't stretch too easily. Lay it after printing on the backing before gluing and verify that the square sides are indeed very nearly orthogonal. Cheap ink-jet or laser printers are not designed for rigorous geometrical accuracy, do not trust them blindly. Best practice is to use a professional print shop (even a Kinko's will do a much better job than most home printers). Then attach the pattern very carefully to the backing, using spray-on glue and slowly wiping with soft cloth to avoid bubbles and stretching. Wait for a day or longer for the glue to cure and the glue-paper stress to reach its long-term steady state. Finally measure the corner positions with a good caliper and a magnifier. You may get away with one single number for the "average" square size, but it must be an average of actual measurements, not of hopes-n-prayers. Best practice is to actually use a table of measured positions.
Watch your temperature and humidity changes: paper adsorbs water from the air, the backing dilates and contracts. It is amazing how many articles you can find that report sub-millimeter calibration accuracies without quoting the environment conditions (or the target response to them). Needless to say, they are mostly crap. The lower temperature dilation coefficient of glass compared to common sheet metal is another reason for preferring the former as a backing.
Needless to say, you must disable the auto-focus feature of your camera, if it has one: focusing physically moves one or more pieces of glass inside your lens, thus changing (slightly) the field of view and (usually by a lot) the lens distortion and the principal point.
Place the camera on a stable mount that won't vibrate easily. Focus (and f-stop the lens, if it has an iris) as is needed for the application (not the calibration - the calibration procedure and target must be designed for the app's needs, not the other way around). Do not even think of touching camera or lens afterwards. If at all possible, avoid "complex" lenses - e.g. zoom lenses or very wide angle ones. For example, anamorphic lenses require models much more complex than stock OpenCV makes available.
Take lots of measurements and pictures. You want hundreds of measurements (corners) per image, and tens of images. Where data is concerned, the more the merrier. A 10x10 checkerboard is the absolute minimum I would consider. I normally worked at 20x20.
Span the calibration volume when taking pictures. Ideally you want your measurements to be uniformly distributed in the volume of space you will be working with. Most importantly, make sure to angle the target significantly with respect to the focal axis in some of the pictures - to calibrate the focal length you need to "see" some real perspective foreshortening. For best results use a repeatable mechanical jig to move the target. A good one is a one-axis turntable, which will give you an excellent prior model for the motion of the target.
Minimize vibrations and associated motion blur when taking photos.
Use good lighting. Really. It's amazing how often I see people realize late in the game that you need a generous supply of photons to calibrate a camera :-) Use diffuse ambient lighting, and bounce it off white cards on both sides of the field of view.
Watch what your corner extraction code is doing. Draw the detected corner positions on top of the images (in Matlab or Octave, for example), and judge their quality. Removing outliers early using tight thresholds is better than trusting the robustifier in your bundle adjustment code.
Constrain your model if you can. For example, don't try to estimate the principal point if you don't have a good reason to believe that your lens is significantly off-center w.r.t the image, just fix it at the image center on your first attempt. The principal point location is usually poorly observed, because it is inherently confused with the center of the nonlinear distortion and by the component parallel to the image plane of the target-to-camera's translation. Getting it right requires a carefully designed procedure that yields three or more independent vanishing points of the scene and a very good bracketing of the nonlinear distortion. Similarly, unless you have reason to suspect that the lens focal axis is really tilted w.r.t. the sensor plane, fix at zero the (1,2) component of the camera matrix. Generally speaking, use the simplest model that satisfies your measurements and your application needs (that's Occam's razor for you).
When you have a calibration solution from your optimizer with low enough RMS error (a few tenths of a pixel, typically, see also Josh's answer below), plot the XY pattern of the residual errors (predicted_xy - measured_xy for each corner in all images) and see if it's a round-ish cloud centered at (0, 0). "Clumps" of outliers or non-roundness of the cloud of residuals are screaming alarm bells that something is very wrong - likely outliers due to bad corner detection or matching, or an inappropriate lens distortion model.
Take extra images to verify the accuracy of the solution - use them to verify that the lens distortion is actually removed, and that the planar homography predicted by the calibrated model actually matches the one recovered from the measured corners.
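The residual check recommended above (a round-ish cloud centered at (0, 0)) can also be done numerically as a complement to plotting. The sketch below computes the mean, the RMS, and an elongation ratio from the 2x2 covariance of the residuals. The function name and the sample residuals are hypothetical, and what counts as "too elongated" is left to judgement.

```python
import math


def residual_summary(residuals):
    """Mean, RMS and elongation (ratio of principal std devs) of (dx, dy) residuals."""
    n = len(residuals)
    mean_x = sum(dx for dx, _ in residuals) / n
    mean_y = sum(dy for _, dy in residuals) / n
    sxx = sum((dx - mean_x) ** 2 for dx, _ in residuals) / n
    syy = sum((dy - mean_y) ** 2 for _, dy in residuals) / n
    sxy = sum((dx - mean_x) * (dy - mean_y) for dx, dy in residuals) / n
    # Eigenvalues of the 2x2 covariance matrix give the principal variances.
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    large, small = half_trace + disc, half_trace - disc
    elongation = math.sqrt(large / small) if small > 0 else float("inf")
    rms = math.sqrt(sum(dx * dx + dy * dy for dx, dy in residuals) / n)
    return (mean_x, mean_y), rms, elongation


round_cloud = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
stretched_cloud = [(0.4, 0.0), (-0.4, 0.0), (0.0, 0.1), (0.0, -0.1)]

mean_r, rms_r, elong_r = residual_summary(round_cloud)
mean_s, rms_s, elong_s = residual_summary(stretched_cloud)
print(mean_r, round(rms_r, 3), round(elong_r, 3), round(elong_s, 3))
```

An elongation near 1 with a mean near zero is consistent with a healthy fit. A clearly larger ratio or an offset mean suggests looking at the plot for the clumps and non-roundness described above.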
This is a rather late answer, but for people coming to this from Google:
The correct way to check calibration accuracy is to use the reprojection error provided by OpenCV. I'm not sure why this wasn't mentioned anywhere in the answer or comments. You don't need to calculate this by hand: it's the return value of calibrateCamera. In Python it's the first return value (followed by the camera matrix, etc.).
The reprojection error is the RMS error between where the points would be projected using the intrinsic coefficients and where they are in the real image. Typically you should expect an RMS error of less than 0.5px - I can routinely get around 0.1px with machine vision cameras. The reprojection error is used in many computer vision papers, there isn't a significantly easier or more accurate way to determine how good your calibration is.
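To make the definition concrete, here is a small stand-alone sketch of an RMS reprojection error for a distortion-free pinhole camera. It is not OpenCV's implementation, and the exact normalisation OpenCV uses may differ in detail. The intrinsics, points and function names are hypothetical.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point given in camera coordinates (no distortion)."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


def rms_reprojection_error(points_cam, measured_px, fx, fy, cx, cy):
    """RMS distance between projected points and the measured image points."""
    squared = 0.0
    for point, (u_meas, v_meas) in zip(points_cam, measured_px):
        u, v = project(point, fx, fy, cx, cy)
        squared += (u - u_meas) ** 2 + (v - v_meas) ** 2
    return math.sqrt(squared / len(points_cam))


# Two corners one unit in front of the camera; each detection is off by (0.3, 0.4) px.
corners_cam = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
detections = [(320.3, 240.4), (370.3, 240.4)]

rms = rms_reprojection_error(corners_cam, detections, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(round(rms, 3))
```

In a real calibration the 3D points come from the estimated pose of each chessboard image and the projection also applies the distortion model.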
Unless you have a stereo system, you can only work out where something is in 3D space up to a ray, rather than a point. However, as one can work out the pose of each planar calibration image, it's possible to work out where each chessboard corner should fall on the image sensor. The calibration process (more or less) attempts to work out where these rays fall and minimises the error over all the different calibration images. In Zhang's original paper, and subsequent evaluations, around 10-15 images seems to be sufficient; at this point the error doesn't decrease significantly with the addition of more images.
Other software packages like Matlab will give you error estimates for each individual intrinsic, e.g. focal length, centre of projection. I've been unable to make OpenCV spit out that information, but maybe it's in there somewhere. Camera calibration is now native in Matlab 2014a, but you can still get hold of the camera calibration toolbox which is extremely popular with computer vision users.
http://www.vision.caltech.edu/bouguetj/calib_doc/
Visual inspection is necessary, but not sufficient when dealing with your results. The simplest thing to look for is that straight lines in the world become straight in your undistorted images. Beyond that, it's impossible to really be sure if your cameras are calibrated well just by looking at the output images.
The routine provided by Francesco is good, follow that. I use a shelf board as my plane, with the pattern printed on poster paper. Make sure the images are well exposed - avoid specular reflection! I use a standard 8x6 pattern, I've tried denser patterns but I haven't seen such an improvement in accuracy that it makes a difference.
I think this answer should be sufficient for most people wanting to calibrate a camera. Realistically, unless you're trying to calibrate something exotic like a fisheye lens or you're doing it for educational reasons, OpenCV/Matlab is all you need. Zhang's method is considered good enough that virtually everyone in computer vision research uses it, and most of them either use Bouguet's toolbox or OpenCV.
I am working with a set of calibrated images that form a ring around a foreground object (1). I used Fusiello's method (1) to rectify adjacent pairs of images, and then I performed disparity estimation.
When I take the matched points from a stereo pair and triangulate them, it forms an accurate point cloud. Unfortunately, when I triangulate the points from another stereo image pair, this point cloud never aligns correctly with the original cloud.
Should calibrated, rectified images' point clouds merge together automatically?
Thanks in advance for any help you can offer.
This might be due to the accuracy of calibration - both intrinsic (i.e. the same camera model - and how it handles distortion) and extrinsic (i.e. the camera pose in real space). Together, of course, these dictate the ultimate accuracy of your re-projection.
Do you have a measure of error for camera calibration - in terms of MSE re-projection?
Cumulative error is often noticeable in my experience if simply iterating over subsequent images. Some form of global optimisation often needs to be performed to first correct positions for all the camera poses.
The accuracy of your disparity estimation is also a factor, not only in terms of the algorithm you are using, but also in relation to the stereo baseline and how it relates to the size/nature of the object in question (how concave/convex), and how many images you are sampling (and the quality of those images: exposure, depth of field, etc.).
Fundamentally, just how "off" are your point clouds? Are they close to being aligned? (You could do a bit of ICP before triangulation...) Are they closer in the "centre" of the re-projection? Are they worse for projections taken from opposing images on opposite sides of the object?
Remember as well that (due to the discrete sampling) you shouldn't expect points to ever be re-projected exactly "on-top" on one another. Some form of binning operation during the triangulation pipeline usually occurs for handling this (hence most of the research work in visual hull -> voxels -> marching cubes -> triangulated surface around this...)
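As an illustration of the binning idea, the sketch below merges points from several clouds by snapping them to a voxel grid and averaging the points that fall into the same cell. It assumes the clouds are already expressed in a common world frame; the voxel size, the sample points and the function name are hypothetical.

```python
import math
from collections import defaultdict


def voxel_merge(points, voxel_size):
    """Average all points that fall into the same cubic voxel."""
    bins = defaultdict(list)
    for p in points:
        # Integer voxel index per axis; floor keeps negative coordinates consistent.
        key = tuple(math.floor(c / voxel_size) for c in p)
        bins[key].append(p)
    merged = []
    for key in sorted(bins):
        members = bins[key]
        merged.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return merged


cloud_a = [(0.01, 0.01, 0.01), (1.00, 1.00, 1.00)]  # from one stereo pair
cloud_b = [(0.02, 0.00, 0.03)]                      # from another stereo pair

merged = voxel_merge(cloud_a + cloud_b, voxel_size=0.1)
print(len(merged), merged[0])
```

Binning only hides small discrepancies. If the clouds are visibly misaligned, the calibration and pose issues discussed above need to be addressed first.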
Have you checked out MeshLab BTW?