I am working on a project where I need to figure out real world distance of an object with reference to a fixed point using an image.
I have detected the object in a 2D image using SURF, and it is now enclosed in a bounding box. What will give me the position of the object's centroid? How can I use it to find the real-world distance?
If I plan to incorporate stereo vision, triangulating the centroid of the object, what is the difference between the distance I obtain here and in the previous method?
On a single image, probably the best starting point to learn about estimating metrical properties is Antonio Criminisi's work (2000 vintage, but still very relevant): http://www.cs.illinois.edu/~dhoiem/courses/vision_spring10/sources/criminisi00.pdf
Related
I am trying to figure out how to roughly project the geographic position of an annotated object in an image.
The Setup
A picture with a known object in it, i.e. we know its width and height.
A bounding box highlighting where that object is in frame. X,Y,Width,Height.
The precise longitude and latitude of the camera that took the picture. The Origin.
The heading of the camera.
The focal length of the camera.
The camera sensor size.
The height of the camera off the ground.
Can anyone point me toward a solution for roughly projecting the object's location from the image origin location, given those data points?
The solution is simple if you assume an ellipsoidal surface for the Earth. If you need to use a Digital Terrain Model (DTM) things will get quickly more complicated. For example, your object may be visible in the image but occluded on the DTM because of various sources of error. In the following I assume you work with the ellipsoid.
Briefly, what you need to do is backproject the vertices of the image bounding box, obtaining four vectors (rays) in camera coordinates. You then transform them into Earth-Centered Earth-Fixed (ECEF) coordinates and solve for the intersection of the rays with the WGS-72 (or WGS-84) ellipsoid as explained here.
I recommend using the nvector library to help with these kinds of calculations. You didn't specify the language you work with, but there are ports of nvector to many common languages, including Python and Matlab.
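The ray/ellipsoid intersection step described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the ray origin and direction are already expressed in ECEF metres (the camera-to-ECEF transform is not shown), the WGS-84 constants are used, and the function name `intersect_ray_with_ellipsoid` is hypothetical. It does not use nvector.

```python
import math

# WGS-84 ellipsoid constants (metres).
WGS84_A = 6378137.0                  # semi-major axis
WGS84_F = 1.0 / 298.257223563        # flattening
WGS84_B = WGS84_A * (1.0 - WGS84_F)  # semi-minor axis


def intersect_ray_with_ellipsoid(origin, direction, a=WGS84_A, b=WGS84_B):
    """Return the nearest ECEF point where the ray hits the ellipsoid, or None.

    origin and direction are (x, y, z) tuples in ECEF metres (assumed to be
    already transformed from camera coordinates); direction need not be unit.
    """
    # Scale the axes so that the ellipsoid becomes a unit sphere.
    ox, oy, oz = origin[0] / a, origin[1] / a, origin[2] / b
    dx, dy, dz = direction[0] / a, direction[1] / a, direction[2] / b

    # Solve |o + t d|^2 = 1 for t (a quadratic in t).
    qa = dx * dx + dy * dy + dz * dz
    qb = 2.0 * (ox * dx + oy * dy + oz * dz)
    qc = ox * ox + oy * oy + oz * oz - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if qa == 0.0 or disc < 0.0:
        return None  # degenerate direction, or the ray misses the ellipsoid

    root = math.sqrt(disc)
    candidates = ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa))
    hits = [t for t in candidates if t >= 0.0]
    if not hits:
        return None  # the ellipsoid lies behind the ray origin
    t = min(hits)
    return tuple(o + t * d for o, d in zip(origin, direction))
```

Running this once per bounding-box vertex gives four ECEF points, which can then be converted to latitude and longitude with whichever geodesy library you prefer.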
I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.
Currently, everything is configured properly so that all that I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately this has been giving me some difficulty as I am not acquiring the correct results.
My tracking system is configured as follows:
The optical tracker is mounted in such a way that the entire scene can be viewed.
Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
The pointer and camera (referred to as C from here on) markers each return a homogeneous transform that describes their position relative to the marker attached to the model (referred to as M from here on). The model's marker, being the origin, does not return any transformation.
I obtained two sets of points, one 2D and one 3D. The 2D points are the coordinates of a chessboard's corners in pixel coordinates, while the 3D points are the corresponding world coordinates of those same corners relative to M. These were recorded using OpenCV's findChessboardCorners() function for the 2D points and the pointer for the 3D points. I then transformed the 3D points from M space to C space by multiplying them by C inverse. This was done because the solvePnP() function requires that 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
Once all of this was done, I passed the point sets into solvePnP(). The transformation I got was completely incorrect, though. I am honestly at a loss for what I did wrong. Adding to my confusion is the fact that OpenCV uses a different coordinate format from OpenGL, which is what 3DSlicer is based on. If anyone can provide some assistance in this matter I would be exceptionally grateful.
Also if anything is unclear, please don't hesitate to ask. This is a pretty big project so it was hard for me to distill everything to just the issue at hand. I'm wholly expecting that things might get a little confusing for anyone reading this.
Thank you!
UPDATE #1: It turns out I'm a giant idiot. I recorded collinear points only because I was too impatient to record the entire checkerboard. Of course this meant that there were infinitely many solutions to the least squares regression, as I only locked the solution to 2 dimensions! My values are much closer to my ground truth now, and in fact the rotational columns seem correct except that they're all completely out of order. I'm not sure what could cause that, but it seems that my rotation matrix was mirrored across the center column. In addition to that, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
Mirror/rotational ambiguity.
You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform by an appropriate ("hand-built") rotation and/or mirroring.
The basic problem is that the calibration target you are using, even when all the corners are seen, has at least a 180° rotational ambiguity unless color information is used. If some corners are missed, things can get even weirder.
You can often use prior information about the camera orientation w.r.t. the scene to resolve this kind of ambiguity, as I was suggesting above. However, in more dynamic situations, or if a further degree of automation is needed in situations in which the target may be only partially visible, you'd be much better off using a target in which each small chunk of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" one, which uses sequences of square lengths with unique cross-ratios. See the paper here.
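As an illustration of constraint (1) above, here is a minimal sketch in plain Python, assuming a planar target whose points all have z = 0 in the target frame and a pose (R, t) that maps target coordinates to camera coordinates. For such a target the pose (R·diag(-1, -1, 1), -t) projects every target point to the same pixel, but places the target behind the camera, so the two can be swapped by checking the sign of the z translation. The function names are hypothetical, and the 180° in-plane ambiguity of constraint (2) is not handled here.

```python
def enforce_scene_in_front(R, t):
    """Return a pose (R, t) with the planar target in front of the camera.

    R is a 3x3 nested list and t a length-3 list, mapping target coordinates
    to camera coordinates. Assumes the target points lie in the plane z = 0.
    """
    if t[2] >= 0:
        return [list(row) for row in R], list(t)
    # Negate the first two columns of R (i.e. R * diag(-1, -1, 1)) and negate t.
    R_fixed = [[-row[0], -row[1], row[2]] for row in R]
    t_fixed = [-v for v in t]
    return R_fixed, t_fixed


def project(R, t, X):
    """Project a 3D target point X to normalised image coordinates."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return Xc[0] / Xc[2], Xc[1] / Xc[2]
```

Both poses reproject a planar point identically, which is why a least-squares fit cannot tell them apart without the extra constraint.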
I am working on converting 2D images to a 3D environment. The images were extracted from a video shot with lateral camera motion. The images were then stacked one behind the other, so that it would be easy to find the correspondences between consecutive images. This is called a spatiotemporal volume.
Next I take a slice from the spatiotemporal volume. That slice is called the Epipolar Plane Image.
Using the Epipolar Plane Image, I want to calculate the depth of the objects in the scene and build a 3D environment. I have listed the reference, but I have not been able to figure out the math described in the paper. Can someone help me figure this out? Any help is appreciated.
Reference
Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion
The math in this situation is easy and straightforward.
First, let's define the coordinate systems for two overlapping images taken by the same camera, and therefore with the same focal length, using the following schema:
Let us say that first camera position is defined as follows:
while its orientation, expressed by three Euler angles, is:
With this definition, the corresponding rotation matrix is the identity matrix.
The second camera position can be defined as follows:
And since the orientation is the same as the first camera, all Euler angles remain zero:
Which also means that the corresponding rotation matrix is the identity matrix.
If the images overlap and the orientation is the same, the situation in the image space looks like this:
Here the image coordinates and their measurement accuracy are defined as follows:
This geometrical situation can be described by using the Intercept Theorem:
As you can see, it's not complicated. But be aware that this solution is certainly not the best, since its base assumption, that all orientation angles are the same, can't be fulfilled in reality.
If you need to be accurate, then you have to perform a bundle adjustment. However, these equations are often used to determine an approximate solution for this geometric situation, whose values are then used to linearize the collinearity equations.
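For readers who cannot see the figures, the relation obtained from the intercept theorem in this "normal case" can be sketched as follows. This is a minimal sketch under assumptions of my own: both rotation matrices are the identity, the second camera is shifted by a baseline along the x axis only, the focal length `f` and the image coordinates are in the same units (e.g. pixels, measured from the principal point), and the symbol names are hypothetical. Under these assumptions the depth is Z = f · baseline / (x1 - x2), and a parallax error of sigma propagates to Z as Z² / (f · baseline) · sigma.

```python
def normal_case_point(f, baseline, x1, y1, x2, sigma_parallax=0.0):
    """Triangulate a point from two images taken with identical orientation.

    f              focal length, in the same units as the image coordinates
    baseline       shift of the second camera along the x axis (e.g. metres)
    x1, y1         image coordinates of the point in the first image
    x2             image x coordinate of the same point in the second image
    sigma_parallax standard deviation of the parallax x1 - x2

    Returns (X, Y, Z, sigma_Z) in the first camera's frame.
    """
    parallax = x1 - x2
    if parallax <= 0:
        raise ValueError("parallax must be positive for a point in front of the cameras")
    Z = f * baseline / parallax      # intercept theorem
    X = x1 * Z / f
    Y = y1 * Z / f
    sigma_Z = Z * Z / (f * baseline) * sigma_parallax  # first-order error propagation
    return X, Y, Z, sigma_Z
```

In an Epipolar Plane Image the parallax between two frames is simply the horizontal shift of a feature's track between the two corresponding rows, so nearer points produce tracks that shift faster.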
Does anyone know how to locate the coordinates of the moving object? I have found some examples online about tracking the objects by using optical flow, but I only got some tracked points on the moving objects. May I just draw rectangle around the each moving object instead? Is there a way to get the coordinates of each moving object? Appreciate any help in advance. Thanks!
Fit a rectangle to the points you get from optical flow; the centre of the fitted rectangle is then a fair estimate of the 2D position of the whole moving body, and following it over time gives the body's 2D trajectory.
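A minimal sketch of that idea in plain Python, assuming the tracked points of one object are available as (x, y) tuples and that an axis-aligned rectangle is good enough (the function name is hypothetical):

```python
def bounding_box_centre(points):
    """Fit an axis-aligned rectangle to tracked points and return it with its centre.

    points is a non-empty sequence of (x, y) tuples.
    Returns ((x, y, width, height), (centre_x, centre_y)).
    """
    if not points:
        raise ValueError("at least one tracked point is required")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    rect = (x_min, y_min, x_max - x_min, y_max - y_min)
    centre = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    return rect, centre
```

Calling it once per frame on the points belonging to one object gives both the rectangle to draw and one trajectory sample.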
You can use the moments operator.
First find the contours, then add this code block:
cv::Moments moment = cv::moments(contours[index]);
double area = moment.m00;      // m00 is the area enclosed by the contour
double x = moment.m10 / area;  // x coordinate of the centroid (requires area != 0)
double y = moment.m01 / area;  // y coordinate of the centroid
where contours is the output of findContours().
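To show what these moments mean, here is a small sketch in plain Python that computes m00, m10 and m01 for a closed polygon with the shoelace (Green's theorem) formulas and derives the centroid from them. It is only an illustration of the idea, not OpenCV's implementation, and the function name is hypothetical.

```python
def contour_centroid(contour):
    """Return (centroid_x, centroid_y, area) of a closed polygon, or None.

    contour is a list of (x, y) vertices in order, without repeating the first
    vertex at the end. None is returned for a degenerate (zero-area) contour.
    """
    m00 = m10 = m01 = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        m00 += cross
        m10 += (x0 + x1) * cross
        m01 += (y0 + y1) * cross
    m00 /= 2.0  # signed area
    m10 /= 6.0
    m01 /= 6.0
    if m00 == 0.0:
        return None
    # The sign of m00 depends on the vertex order; it cancels in the ratios.
    return m10 / m00, m01 / m00, abs(m00)
```

The zero-area check matters in practice: a contour consisting of a single point or a straight line has m00 equal to zero, and dividing by it would fail.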
It is pretty hard to tell the coordinates of an object from only a couple of points on it. You can use moments (here is a tutorial) to get a fairly stable point describing where your object is.
You may also do some additional work, like segmentation using the tracked points, to get the contour of the tracked object, which should make it even easier to find its centre of mass.
There is also a tracking method called CAMSHIFT which returns a rectangle bounding the tracked object.
If you know precisely what you are tracking, can make sure that some known points on the tracked object are tracked, and are able to recognise them, then you can use POSIT to determine the object's 3D coordinates and orientation. Take a glance at ArUco to get an idea of what I'm talking about.
To get the 3D position from the previous methods, you can use stereo vision and use the centre of mass from both cameras to compute the coordinates.
I need to find a marker like the ones used in Augmented Reality.
Like this:
I have a solid background in algebra and calculus, but no experience whatsoever in image processing. My thing is PHP, SQL and stuff.
I just want this to work. I've read the theory behind this, and it's extremely hard for me to see how it translates into code.
The main idea is to do this as a batch process, so no interactivity is needed. What do you suggest?
Input : The sample image.
Output: Coordinates and normal vector in 3D of the marker.
The use for this will be linking images that have the same marker to spatialize them, a primitive version of photosync we could say. Just a carousel of pinned images, the marker acting like the pin.
The reps given allowed me to post images, thanks.
You can always look at open source libraries such as ARToolkit and see how they work, but generally, in order to get the 3D coordinates of the marker, you would need to:
Do the camera calibration.
Find the marker in the image, using local features for example.
Using the calibrated camera parameters and the 2D coordinates of the marker, approximate its 3D coordinates.
I've never implemented something similar myself, but I think this is the general concept you should apply in your method.
Your problem can be solved by Perspective-n-Point (PnP) camera pose estimation. When you can reasonably assume that all correspondences are correct, a linear algorithm should do.
Since the marker is planar, you can also recover the displacement from the homography between the model plane and the image plane (link). As usual, best results are obtained by iterative algorithms (link).
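The homography route can be sketched as follows in plain Python. This is a minimal, non-iterative illustration under assumptions of my own: a pinhole camera with known intrinsics (fx, fy, cx, cy) and no lens distortion, a homography H that maps marker-plane coordinates (in the marker's own units, z = 0) to pixels, and hypothetical function names. It returns the marker origin and the marker's normal vector in camera coordinates, which is the output asked for in the question.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def marker_pose_from_homography(H, fx, fy, cx, cy):
    """Recover marker position and normal from a plane-to-image homography.

    H is a 3x3 nested list with H ~ K [r1 r2 t] up to scale.
    Returns (t, normal): the marker origin and its unit normal (the marker's
    z axis), both expressed in camera coordinates.
    """
    # A = K^-1 H for K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    A = [
        [(H[0][j] - cx * H[2][j]) / fx for j in range(3)],
        [(H[1][j] - cy * H[2][j]) / fy for j in range(3)],
        [H[2][j] for j in range(3)],
    ]
    h1 = [A[i][0] for i in range(3)]
    h2 = [A[i][1] for i in range(3)]
    h3 = [A[i][2] for i in range(3)]

    # The first column must be a unit vector, which fixes the scale.
    scale = 1.0 / math.sqrt(sum(v * v for v in h1))
    if h3[2] * scale < 0:
        scale = -scale  # keep the marker in front of the camera

    r1 = [scale * v for v in h1]
    r2 = [scale * v for v in h2]
    t = [scale * v for v in h3]
    n = cross(r1, r2)
    n_len = math.sqrt(sum(v * v for v in n))
    normal = [v / n_len for v in n]
    return t, normal
```

With a noisy homography, r1 and r2 will not be exactly orthonormal, which is one reason the iterative refinement mentioned above usually gives better results.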