Hi, I am using an Asus Xtion Pro Live camera for my object detection, and I am also new to OpenCV. I'm trying to get the distance of an object from the camera. The detected object is in a 2D image. I'm not sure what information I should use, or how to follow up with the calculations, to get the distance between the camera and the detected object. Could someone advise me, please?
In short: You can't.
You're losing the depth information and any visible pixel in your camera image essentially transforms into a ray originating from your camera.
So once you've got an object at pixel X, all you know is that the object somewhere intersects the vector cast based on this pixel and the camera's intrinsic/extrinsic parameters.
You'll essentially need more information. One of the following should suffice:
Know at least one coordinate of the 3D point (e.g. everything detected is on the ground or in some known plane).
Know the relation between two projected points:
Either the same point from different positions (known camera movement/offset)
or two points with significant distance between them (like the two ends of some staff or bar).
Once you've got either, you're able to use simple trigonometry (rule of three) to calculate the missing values.
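For illustration, here is a minimal sketch of the "known plane" case: back-projecting a detected pixel into a ray and intersecting it with a ground plane at a known height below a level camera. The intrinsics, the pixel, and the camera height are placeholder values, not something taken from your setup.

```cpp
#include <opencv2/core.hpp>
#include <cmath>
#include <iostream>

int main()
{
    // Assumed intrinsics (use your own, e.g. from cv::calibrateCamera)
    double fx = 525.0, fy = 525.0, cx = 319.5, cy = 239.5;
    cv::Point2d pixel(400.0, 300.0);   // pixel of the detected object (placeholder)

    // Back-project the pixel into a ray in the camera frame (x right, y down, z forward)
    cv::Vec3d ray((pixel.x - cx) / fx, (pixel.y - cy) / fy, 1.0);

    // Assume a level camera mounted 1.5 m above the ground, so the ground
    // plane is y = 1.5 in camera coordinates.
    double cameraHeight = 1.5;
    if (ray[1] <= 0) { std::cout << "Ray does not hit the ground\n"; return 1; }

    double t = cameraHeight / ray[1];  // scale factor so the ray reaches the plane
    cv::Vec3d point = ray * t;         // 3D point of the object in the camera frame
    std::cout << "Distance to object: " << cv::norm(point) << " m\n";
    return 0;
}
```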
Since I initially missed this being a camera with an OpenNI-compatible depth sensor: it's possible to build OpenCV with support for it by enabling the WITH_OPENNI option when configuring the build (e.g. via CMake).
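With such a build, a minimal sketch of reading the object's depth directly from the sensor could look like this; the capture backend constant and the pixel coordinates are assumptions you would adapt to your OpenCV version and detection result.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <iostream>

int main()
{
    cv::VideoCapture capture(cv::CAP_OPENNI2_ASUS);  // or cv::CAP_OPENNI_ASUS on older builds
    if (!capture.isOpened()) { std::cerr << "Sensor not opened\n"; return 1; }

    cv::Mat depthMap;                          // CV_16UC1, values in millimetres
    capture.grab();
    capture.retrieve(depthMap, cv::CAP_OPENNI_DEPTH_MAP);

    cv::Point objectPixel(320, 240);           // pixel of the detected object (placeholder)
    unsigned short depthMm = depthMap.at<unsigned short>(objectPixel);
    std::cout << "Depth at object pixel: " << depthMm << " mm\n";
    return 0;
}
```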
I don't like to be the one breaking this to you, but what you are trying to do is either impossible or extremely difficult with a single camera.
You need to have the camera moving, record a video with it, and use a complex technique such as this. Usually 3D information is created from at least two 2D images taken from two different positions. You also need to know quite precisely the distance and rotation between the two images. The common technique is to use two cameras with a precisely measured distance between them.
The Xtion is not a basic webcam. It's a depth-sensing camera (structured light, like the Kinect and other PrimeSense devices). The main API for this is OpenNI - see http://structure.io/openni.
Right now I'm exploring the features of the iOS Depth camera, and I want to obtain the distance in real-world metrics between two points (for example, between two eyes).
I have successfully hooked up the iOS Depth camera functionality and I have AVDepthData in my hands, but I'm not quite sure how I can get a real-world distance between two specific points.
I believe I could calculate it if I had the depth and the viewing angle, but I don't see the latter provided as a parameter. I also know that this task could be handled with ARKit, but I'm really curious how I can implement it myself. I mean, ARKit uses the Depth camera as well, so there must be an algorithm where depth maps are all I need to calculate the real distance.
Could you please give me an advice how to tackle this task? Thanks in advance!
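Not an iOS-specific answer, but as a sketch of the underlying geometry: if you know the per-pixel depth and the pinhole intrinsics (on iOS these should be available via AVCameraCalibrationData's intrinsicMatrix), you can unproject both pixels to 3D and take the Euclidean distance. Everything below uses placeholder values.

```cpp
#include <cmath>
#include <iostream>

struct Point3 { double x, y, z; };

// Pinhole back-projection: depth is the distance along the optical axis
Point3 unproject(double u, double v, double depth,
                 double fx, double fy, double cx, double cy)
{
    return { (u - cx) * depth / fx, (v - cy) * depth / fy, depth };
}

int main()
{
    double fx = 2800.0, fy = 2800.0, cx = 960.0, cy = 540.0;   // placeholder intrinsics
    Point3 a = unproject(900.0, 520.0, 0.45, fx, fy, cx, cy);  // first point, 0.45 m deep
    Point3 b = unproject(1030.0, 525.0, 0.46, fx, fy, cx, cy); // second point

    double d = std::sqrt((a.x - b.x) * (a.x - b.x) +
                         (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
    std::cout << "Distance between the points: " << d << " m\n";
    return 0;
}
```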
I recently managed to get my augmented reality application up and running close to what is expected. However, I'm having an issue where, even though the values are correct, the augmentation is still off by some translation! It would be wonderful to get this solved as I'm so close to having this done.
The system utilizes an external tracking system (Polaris Spectra stereo optical tracker) with IR-reflective markers to establish global and reference frames. I have a LEGO structure with a marker attached which is the target of the augmentation, a 3D model of the LEGO structure created using CAD with the exact specs of its real-world counterpart, a tracked pointer tool, and a camera with a world reference marker attached to it. The virtual space was registered to the real world using a toolset in 3D Slicer, a medical imaging software which is the environment I'm developing in. Below are a couple of photos just to clarify exactly the system I'm dealing with (May or may not be relevant to the issue).
So a brief overview of exactly what each marker/component does (Markers are the black crosses with four silver balls):
The world marker (1st image on right) is the reference frame for all other markers' transformations. It is fixed to the LEGO model so that a single registration can be done for the LEGO's virtual equivalent.
The camera marker (1st image, attached to camera) tracks the camera. The camera is registered to this marker by an extrinsic calibration performed using cv::solvePnP().
The checkerboard is used to acquire data for extrinsic calibration using a tracked pointer (not shown) and cv::findChessboardCorners().
Up until now I've been smashing my face against the mathematics behind the system until everything finally lined up. When I transform my estimate of the camera origin into the reference frame, the translation between it and the reference origin is about [0; 0; 0]. So all of the registration appears to work correctly. However, when I run my application, I get the following results:
As you can see, there's a strange offset in the augmentation. I've tried removing distortion correction on the image (currently done with cv::undistort()), but it just makes the issue worse. The rotations are all correct and, as I said before, the translations all seem fine. I'm at a loss for what could be causing this. Of course, there's so much that can go wrong during implementation of the rendering pipeline, so I'm mostly posting this here under the hope that someone has experienced a similar issue. I already performed this project using a webcam-based tracking method and experienced no issues like this even though I used the same rendering process.
I've been purposefully a little ambiguous in this post to avoid bogging down readers with the minutia of the situation as there are so many different details I could include. If any more information is needed I can provide it. Any advice or insight would be massively appreciated. Thanks!
Here are a few tests that you could do to validate that each module works well.
First verify your extrinsic and intrinsic calibrations:
Check that the position of the virtual scene-marker with respect to the virtual lego scene accurately corresponds to the position of the real scene-marker with respect to the real lego scene (e.g. the real scene-marker may have moved since you last measured its position).
Same for the camera-marker, which may have moved since you last calibrated its position with respect to the camera optical center.
Check that the calibration of the camera is still accurate. For such a camera, prefer a camera matrix of the form [fx,0,cx;0,fy,cy;0,0,1] (i.e. with the skew fixed to zero) and estimate the camera distortion coefficients (NB: OpenCV's undistort functions do not support camera matrices with non-zero skew; using such a matrix may not raise any exception but will result in erroneous undistortions). A minimal sketch of such a zero-skew matrix and undistortion call follows after this list.
Check that the marker tracker does not need to be recalibrated.
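As promised above, here is a minimal sketch, with placeholder values, of what a zero-skew camera matrix and a cv::undistort() call look like; substitute the intrinsics, distortion coefficients, and image from your own calibration and pipeline.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // [fx, 0, cx; 0, fy, cy; 0, 0, 1] -- no skew term
    cv::Mat K = (cv::Mat_<double>(3, 3) << 800.0,   0.0, 320.0,
                                             0.0, 800.0, 240.0,
                                             0.0,   0.0,   1.0);
    // k1, k2, p1, p2, k3 as estimated by cv::calibrateCamera (placeholders)
    cv::Mat dist = (cv::Mat_<double>(1, 5) << -0.2, 0.05, 0.0, 0.0, 0.0);

    cv::Mat frame = cv::imread("frame.png");   // placeholder input image
    if (frame.empty()) return 1;

    cv::Mat undistorted;
    cv::undistort(frame, undistorted, K, dist);
    return 0;
}
```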
Then verify the rendering pipeline, e.g. by checking that the scene-marker reprojects correctly into the camera image when moving the camera around.
If it does not reproject correctly, there is probably an error in the way you map the OpenCV camera matrix into the OpenGL projection matrix, or in the way you map the OpenCV camera pose into the OpenGL modelview matrix. Try to determine which one is wrong using toy examples with simple 3D points and simple projection and modelview matrices; a sketch of the intrinsics-to-projection mapping follows below.
If it reprojects correctly, then there probably is a calibration problem (see above).
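For the first case, below is a minimal sketch of one common way to build an OpenGL projection matrix from an OpenCV-style camera matrix [fx,0,cx;0,fy,cy;0,0,1]. The function name and the sign conventions (especially around the flipped y axis and the principal-point offset) are my own assumptions, so treat it as a starting point for your toy examples rather than the definitive mapping.

```cpp
#include <array>

// Column-major 4x4 projection matrix, as OpenGL expects it.
std::array<float, 16> projectionFromIntrinsics(float fx, float fy, float cx, float cy,
                                               float width, float height,
                                               float zNear, float zFar)
{
    std::array<float, 16> p{};
    p[0]  = 2.0f * fx / width;          // focal length scaled to NDC (x)
    p[5]  = 2.0f * fy / height;         // focal length scaled to NDC (y)
    p[8]  = 1.0f - 2.0f * cx / width;   // principal point offset (x)
    p[9]  = 2.0f * cy / height - 1.0f;  // principal point offset (y, flipped)
    p[10] = -(zFar + zNear) / (zFar - zNear);
    p[11] = -1.0f;                      // OpenGL looks down the negative z axis
    p[14] = -2.0f * zFar * zNear / (zFar - zNear);
    return p;
}
```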
Beyond that, it is hard to guess what could be wrong without directly interacting with the system. If I were you and I still had no idea where the problem could be after doing the tests above, I would try to start back from scratch and validate each intermediate step using toy examples.
I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.
Currently, everything is configured properly so that all that I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately this has been giving me some difficulty as I am not acquiring the correct results.
My tracking system is configured as follows:
The optical tracker is mounted in such a way that the entire scene can be viewed.
Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
The pointer and camera (Referred to as C from hereon) markers each return a homogeneous transform that describes their position relative to the marker attached to the model (Referred to as M from hereon). The model's marker, being the origin, does not return any transformation.
I obtained two sets of points, one 2D and one 3D. The 2D points are the coordinates of a chessboard's corners in pixel coordinates, while the 3D points are the corresponding world coordinates of those same corners relative to M. These were recorded using OpenCV's findChessboardCorners() function for the 2D points and the pointer for the 3D points. I then transformed the 3D points from M space to C space by multiplying them by C inverse. This was done because the solvePnP() function requires that the 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
Once all of this was done, I passed the point sets into solvePnP(). The transformation I got was completely incorrect, though. I am honestly at a loss for what I did wrong. Adding to my confusion is the fact that OpenCV uses a different coordinate format from OpenGL, which is what 3DSlicer is based on. If anyone can provide some assistance in this matter I would be exceptionally grateful.
Also if anything is unclear, please don't hesitate to ask. This is a pretty big project so it was hard for me to distill everything to just the issue at hand. I'm wholly expecting that things might get a little confusing for anyone reading this.
Thank you!
UPDATE #1: It turns out I'm a giant idiot. I recorded collinear points only because I was too impatient to record the entire checkerboard. Of course this meant that there were nearly infinite solutions to the least-squares regression, as I had only locked the solution to 2 dimensions! My values are much closer to my ground truth now, and in fact the rotation columns seem correct, except that they're completely out of order. I'm not sure what could cause that, but it seems that my rotation matrix was mirrored across the center column. In addition, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
Mirror/rotational ambiguity.
You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform by an appropriate ("hand-built") rotation and/or mirroring.
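For illustration, here is a minimal sketch of such a hand-built correction using OpenCV types; the 180-degree rotation about the board's z axis and all numeric values are placeholders, and the right correction depends on your own axis conventions.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

int main()
{
    // Pose as returned by cv::solvePnP (placeholder values here)
    cv::Mat rvec = (cv::Mat_<double>(3, 1) << 0.1, -0.2, 0.05);
    cv::Mat tvec = (cv::Mat_<double>(3, 1) << 0.02, 0.01, 0.50);

    cv::Mat R;
    cv::Rodrigues(rvec, R);

    // Hand-built 180-degree rotation about the object's z axis
    cv::Mat Rfix = (cv::Mat_<double>(3, 3) << -1,  0, 0,
                                               0, -1, 0,
                                               0,  0, 1);

    cv::Mat Rcorrected = R * Rfix;   // correction applied in the object frame
    cv::Rodrigues(Rcorrected, rvec); // back to a rotation vector if needed
    return 0;
}
```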
The basic problem is that the calibration target you are using, even when all the corners are seen, has at least a 180-degree rotational ambiguity unless color information is used. If some corners are missed, things can get even weirder.
You can often use prior info about the camera orientation w.r.t. the scene to resolve this kind of ambiguity, as I was suggesting above. However, in more dynamic situations, or if a further degree of automation is needed in cases where the target may be only partially visible, you'd be much better off using a target in which each small chunk of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" one, which uses sequences of square lengths with unique cross-ratios. See the paper here.
I have a set of 3-d points and some images with the projections of these points. I also have the focal length of the camera and the principal point of the images with the projections (resulting from previously done camera calibration).
Is there any way to, given these parameters, find the automatic correspondence between the 3-d points and the image projections? I've looked through some OpenCV documentation, but so far I haven't found anything suitable. I'm looking for a method that does the automatic labelling of the projections, and thus the correspondence between them and the 3-d points.
The question is not very clear, but I think you mean to say that you have the intrinsic calibration of the camera, but not its location and attitude with respect to the scene (the "extrinsic" part of the calibration).
This problem does not have a unique solution for a general 3d point cloud if all you have is one image: just notice that the image does not change if you move the 3d points anywhere along the rays projecting them into the camera.
If you have one or more images, know everything about the 3D cloud of points (e.g. the points belong to an object of known shape and size, and are at known locations upon it), and have matched them to their images, then it is a standard "camera resectioning" problem: you just solve for the camera extrinsic parameters that make the 3D points project onto their images.
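For the resectioning case, a minimal sketch with OpenCV might look like the following; all point coordinates and intrinsics are placeholders, and the assumption is that the 2D-3D matches are already known.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

int main()
{
    std::vector<cv::Point3f> objectPoints = {      // known 3D points (world frame)
        {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0}, {0.5f, 0.5f, 1}, {1, 0, 1}};
    std::vector<cv::Point2f> imagePoints = {       // their matched projections (pixels)
        {320, 240}, {400, 238}, {405, 320}, {322, 318}, {360, 200}, {410, 180}};

    cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320,   // intrinsics from calibration
                                             0, 800, 240,
                                             0,   0,   1);
    cv::Mat dist = cv::Mat::zeros(1, 5, CV_64F);

    cv::Mat rvec, tvec;                             // the extrinsics being solved for
    cv::solvePnP(objectPoints, imagePoints, K, dist, rvec, tvec);
    // rvec/tvec now map world coordinates into the camera frame.
    return 0;
}
```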
If you have multiple images and you know that the scene is static while the camera is moving, and you can match "enough" 3d points to their images in each camera position, you can solve for the camera poses up to scale. You may want to start from David Nister's and/or Henrik Stewenius's papers on solvers for calibrated cameras, and then look into "bundle adjustment".
If you really want to learn about this (vast) subject, Zisserman and Hartley's book is as good as any. For code, look into libmv, vxl, and the ceres bundle adjuster.
I have a stereo pair:
photo 1: http://savepic.org/1671682.jpg
photo 2: http://savepic.org/1667586.jpg
There is a coordinate system in each image. How can I find the coordinates of point A in this system using the OpenCV library? It would be nice to see sample code.
I've looked at opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html but haven't found it (or haven't understood it :) )
Your 'stereo' images are fine. What you have already done is solve the correspondence problem: in both images you have indicated point 'A'. This means that you know which pixels correspond to each other, both labelling point 'A'.
What you want to do is triangulation: you can only do this by first calibrating your camera. This is already included in OpenCV:
http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
This gives you the exact ray of light for each pixel, and the optical center of your camera through which the ray passes. Moreover, you need stereo calibration. This establishes the orientation and position of each camera with respect to the other.
From that point on, your triangulation is simple, knowing the pixel location of point 'A' in both images. You have:
Location and orientation of camera 1 and camera 2
Optical ray vector (from the pixel location) from each camera towards label 'A'.
So you have 2 locations in space, and 2 rays from these locations. The intersection of these rays is your 3D answer.
Note that in practice these rays will never exactly intersect (2 lines in 3D rarely do), so you need to approximate. Use the OpenCV function cv::triangulatePoints(), with the output of the stereo calibration and the pixel coordinates of label 'A' as input.
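As a rough illustration, here is a minimal sketch of that triangulation step; the projection matrices and pixel coordinates are placeholders standing in for your stereo-calibration results and the measured locations of point 'A'.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <iostream>

int main()
{
    // 3x4 projection matrices P = K [R | t] for each camera (placeholders:
    // identical intrinsics, second camera shifted 0.1 m along x)
    cv::Mat P1 = (cv::Mat_<double>(3, 4) << 800, 0, 320,   0,
                                              0, 800, 240,   0,
                                              0,   0,   1,   0);
    cv::Mat P2 = (cv::Mat_<double>(3, 4) << 800, 0, 320, -80,
                                              0, 800, 240,   0,
                                              0,   0,   1,   0);

    // Pixel coordinates of point 'A' in each image (one point per column)
    cv::Mat ptsA1 = (cv::Mat_<double>(2, 1) << 350.0, 260.0);
    cv::Mat ptsA2 = (cv::Mat_<double>(2, 1) << 310.0, 260.0);

    cv::Mat pointsH;                                    // 4xN homogeneous result
    cv::triangulatePoints(P1, P2, ptsA1, ptsA2, pointsH);

    cv::Mat A = pointsH.col(0) / pointsH.at<double>(3, 0);  // de-homogenise
    cv::Mat A3 = A.rowRange(0, 3).t();
    std::cout << "A = " << A3 << std::endl;
    return 0;
}
```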
First of all, this is not truly a stereo pair. A good stereo pair needs to have 60%-80% overlap and usually only small rotation differences between the images. Even if this pair had the necessary base (baseline) to be a good stereo pair, due to the extreme kappa rotation the resulting epipolar image would be useless.
Secondly, among other things, you should take a look at camera calibration (resectioning) and the collinearity equations, both supported by OpenCV:
http://en.wikipedia.org/wiki/Camera_resectioning
http://en.wikipedia.org/wiki/Collinearity_equation
You need to understand the maths.
If the page isn't enough, then you should look at the OpenCV book - it devotes a couple of chapters to this topic. There are also a lot of textbooks that cover it in more detail.