Estimate 3D Line from Image projections with known Camera Pose and Calibration - opencv

I know the principle of triangulation for 3D Point estimation from images.
However, how would you solve the following problem, I have images from a Line in 3D space with known Camera position and also known calibration. But since I don't know how much/which segment of the Line is seen in eag image, I am not sure how to form an equation for the Line estimation. See image (I have more than 2 images available, in all images most part of the Line visible should be the same, but not exact the same):
I am thinking of spanning a plane from the Camera through the line in the image and intersecting all Planes spanned from each perspective to get an estimate of the Line?
However, I don't really know if that's possible or how I could do this.
Thanks for any help.
Cheers

Afraid other answers and comments have it backwards, pun intended.
Backprojecting ("triangulating") image lines into 3D space and then trying to fit them together with some ad-hoc heuristics may be good for an initial approximation.
However, to refine this approximation, you should then assume that a 3D line exists with unknown parameters (a point and a unit vector), plus additional scalar parameters identifying the initial and final points along the line of the segments you observe. Using the projection equations, you then set up an optimization problem whose goal is to find the set of parameters that minimize the projection errors of the 3D line with those parameters onto the images. This is essentially bundle adjustment, but expressed in the language of your problem, and in fact you can use any good software package for bundle adjustment (hint: Ceres) to solve it. The initial approximation computed with some ad-hoc heuristic will be used as the starting point of the bundle adjustment.

Plane can be defined with one point and direction.
If bot cameras are calibrated and have known position then ...
Make two planes, first from first camera, using eye and projected line second plane at the same way.
Line from screen you can easily convert to line in space because you have camera position.
What is left to intersect those two planes.

Related

OpenCV - align stack of images - different cameras

We have this camera array arranged in an arc around a person (red dot). Think The Matrix - each camera fires at the same time and then we create an animated gif from the output. The problem is that it is near impossible to align the cameras exactly and so I am looking for a way in OpenCV to align the images better and make it smoother.
Looking for general steps. I'm unsure of the order I would do it. If I start with image 1 and match 2 to it, then 2 is further from three than it was at the start. And so matching 3 to 2 would be more change... and the error would propagate. I have seen similar alignments done though. Any help much appreciated.
Here's a thought. How about performing a quick and very simple "calibration" of the imaging system by using a single reference point?
The best thing about this is you can try it out pretty quickly and even if results are too bad for you, they can give you some more insight into the problem. But the bad thing is it may just not be good enough because it's hard to think of anything "less advanced" than this. Here's the description:
Remove the object from the scene
Place a small object (let's call it a "dot") to position that rougly corresponds to center of mass of object you are about to record (the center of area denoted by red circle).
Record a single image with each camera
Use some simple algorithm to find the position of the dot on every image
Compute distances from dot positions to image centers on every image
Shift images by (-x, -y), where (x, y) is the above mentioned distance; after that, the dot should be located in the center of every image.
When recording an actual object, use these precomputed distances to shift all images. After you translate the images, they will be roughly aligned. But since you are shooting an object that is three-dimensional and has considerable size, I am not sure whether the alignment will be very convincing ... I wonder what results you'd get, actually.
If I understand the application correctly, you should be able to obtain the relative pose of each camera in your array using homographies:
https://docs.opencv.org/3.4.0/d9/dab/tutorial_homography.html
From here, the next step would be to correct for alignment issues by estimating the transform between each camera's actual position and their 'ideal' position in the array. These ideal positions could be computed relative to a single camera, or relative to the focus point of the array (which may help simplify calculation). For each image, applying this corrective transform will result in an image that 'looks like' it was taken from the 'ideal' position.
Note that you may need to estimate relative camera pose in 3-4 array 'sections', as it looks like you have a full 180deg array (e.g. estimate homographies for 4-5 cameras at a time). As long as you have some overlap between sections it should work out.
Most of my experience with this sort of thing comes from using MATLAB's stereo camera calibrator app and related functions. Their help page gives a good overview of how to get started estimating camera pose. OpenCV has similar functionality.
https://www.mathworks.com/help/vision/ug/stereo-camera-calibrator-app.html
The cited paper by Zhang gives a great description of the mathematics of pose estimation from correspondence, if you're interested.

Extrinsic Camera Calibration Using OpenCV's solvePnP Function

I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.
Currently, everything is configured properly so that all that I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately this has been giving me some difficulty as I am not acquiring the correct results.
My tracking system is configured as follows:
The optical tracker is mounted in such a way that the entire scene can be viewed.
Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
The pointer and camera (Referred to as C from hereon) markers each return a homogeneous transform that describes their position relative to the marker attached to the model (Referred to as M from hereon). The model's marker, being the origin, does not return any transformation.
I obtained two sets of points, one 2D and one 3D. The 2D points are the coordinates of a chessboard's corners in pixel coordinates while the 3D points are the corresponding world coordinates of those same corners relative to M. These were recorded using openCV's detectChessboardCorners() function for the 2 dimensional points and the pointer for the 3 dimensional. I then transformed the 3D points from M space to C space by multiplying them by C inverse. This was done as the solvePnP() function requires that 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
Once all of this was done, I passed in the point sets into solvePnp(). The transformation I got was completely incorrect, though. I am honestly at a loss for what I did wrong. Adding to my confusion is the fact that OpenCV uses a different coordinate format from OpenGL, which is what 3DSlicer is based on. If anyone can provide some assistance in this matter I would be exceptionally grateful.
Also if anything is unclear, please don't hesitate to ask. This is a pretty big project so it was hard for me to distill everything to just the issue at hand. I'm wholly expecting that things might get a little confusing for anyone reading this.
Thank you!
UPDATE #1: It turns out I'm a giant idiot. I recorded colinear points only because I was too impatient to record the entire checkerboard. Of course this meant that there were nearly infinite solutions to the least squares regression as I only locked the solution to 2 dimensions! My values are much closer to my ground truth now, and in fact the rotational columns seem correct except that they're all completely out of order. I'm not sure what could cause that, but it seems that my rotation matrix was mirrored across the center column. In addition to that, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
Mirror/rotational ambiguity.
You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform for an appropriate ("hand-built") rotation and/or mirroring.
The basic problems is that the calibration target you are using - even when all the corners are seen, has at least a 180^ deg rotational ambiguity unless color information is used. If some corners are missed things can get even weirder.
You can often use prior info about the camera orientation w.r.t. the scene to resolve this kind of ambiguities, as I was suggesting above. However, in more dynamical situation, of if a further degree of automation is needed in situations in which the target may be only partially visible, you'd be much better off using a target in which each small chunk of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" one, which uses sequences of square lengths with unique crossratios. See the paper here.

How to determine distance of objects from camera using Epipolar Plane Image?

I am working on converting 2d images to 3d environment. The images were collected from a video made in a lateral motion. Then the images were placed one behind the other, so it would be easy to find the correspondences between the two images. This is called a spatial-temporal volume.
Next I take a slice from the spatiotemporal volume. That slice is called the Epipolar Plane Image.
Using the Epipolar Plane Image, I want to calculate the depth of the objects in the scene and make a 3D enviornment. I have listed the reference but I have not been able to figure out the math described in the paper. Can someone help me figure this out? Any help is appreciated.
Reference
Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion* !
The math in this situation is easy and straight forward.
First let's define two the coordinate systems for two overlapping images taken by the same camera with the focal length with the following schema:
Let us say that first camera position is defined as follows:
While it's orientation by using three Euler angles is:
By using this definition the corresponding rotation matrix is the identity matrix
The second camera position can be defined as follows:
And since the orientation is the same as the first camera, all Euler angles remain zero:
Which also means that the corresponding rotation matrix is the identity matrix.
If the images overlap and the orientation is the same, the situation in the image space looks like this:
Here the image coordinates and their measurement accuracy are defined as follows:
This geometrical situation can be described by using the Intercept Theorem:
As you see it's not complicated. But be aware that this solution is certainly not the best, since it's base assumption that all orientation angles are the same can't be fulfilled in reality.
If you need to be accurate then you have to perform an bundle adjustment. However, this equations are often used to determine the approximated solution for this geometric situation, where the values are used to linearize the collinearity equations.

Project a 2D point from one camera view onto the corresponding 2D point in another camera view of the same scene

I'm using open cv in C++ in multi-view scene with two cameras. I have the intrinsic and extrinsic parameters for both cameras.
I would like to map a (X,Y) point in View 1 to the same point in the second View. I'm am slightly unsure how I should use the intrinsic and extrinsic matrices in order to convert the points to a 3D world and finally end up with the new 2D point in view 2.
It is (normally) not possible to take a 2D coordinate in one image and map it into another 2D coordinate without some additional information.
The main problem is that a single point in the left image will map to a line in the right image (an epipolar line). There are an infinite number of possible corresponding locations because depth is a free parameter. Secondly it's entirely possible that the point doesn't exist in the right image i.e. it's occluded. Finally it may be difficult to determine exactly which point is the right correspondence, e.g. if there is no texture in the scene or if it contains lots of repeating features.
Although the fundamental matrix (which you get out of cv::StereoCalibrate anyway) gives you a constraint between points in each camera: x'Fx = 0, for a given x' there will be a whole family of x's which will satisfy the equation.
Some possible solutions are as follows:
You know the 3D location of a 2D point in one image. Provided that 3D point is in a common coordinate system, you just use cv::projectPoints with the calibration parameters of the other camera you want to project into.
You do some sparse feature detection and matching using something like SIFT or ORB. Then you can calculate a homography to map the points from one image to the other. This makes a few assumptions about things being planes. If you Google panorama homography, there are plenty of lecture slides detailing this.
You calibrate your cameras, perform an epipolar rectification (cv::StereoRectify, cv::initUndistortRectifyMap, cv::remap) and then run them through a stereo matcher. The output is a disparity map which gives you exactly what you want: a per-pixel mapping from one camera to the other. That is, left[y,x] = right[y, x+disparity_map[y,x]].
(1) is by far the easiest, but it's unlikely you have that information already. (2) is often doable and might be suitable, and as another commenter pointed out will be poor where the planarity assumption fails. (3) is the general (ideal) solution, but has its own drawbacks and relies on the images being amenable to dense matching.

Measuring an object from a picture using a known object size

So what I need to do is measuring a foot length from an image taken by an ordinary user. That image will contain a foot with a black sock wearing, a coin (or other known size object), and a white paper (eg A4) where the other two objects will be upon.
What I already have?
-I already worked with opencv but just simple projects;
-I already started to read some articles about Camera Calibration ("Learn OpenCv") but still don't know if I have to go so far.
What I am needing now is some orientation because I still don't understand if I'm following right way to solve this problem. I have some questions: Will I realy need to calibrate camera to get two or three measures of the foot? How can I find the points of interest to get the line to measure, each picture is a different picture or there are techniques to follow?
Ps: sorry about my english, I really have to improve it :-/
First, some image acquisition things:
Can you count on the black sock and white background? The colors don't matter as much as the high contrast between the sock and background.
Can you standardize the viewing angle? Looking directly down at the foot will reduce perspective distortion.
Can you standardize the lighting of the scene? That will ease a lot of the processing discussed below.
Lastly, you'll get a better estimate if you zoom (or position the camera closer) so that the foot fills more of the image frame.
Analysis. (Note this discussion will directed to your question of identifying the axes of the foot. Identifying and analyzing the coin would use a similar process, but some differences would arise.)
The next task is to isolate the region of interest (ROI). If your camera is looking down at the foot, then the ROI can be limited to the white rectangle. My answer to this Stack Overflow post is a good start to square/rectangle identification: What is the simplest *correct* method to detect rectangles in an image?
If the foot lies completely in the white rectangle, you can clip the image to the rect found in step #1. This will limit the image analysis to region inside the white paper.
"Binarize" the image using a threshold function: http://opencv.willowgarage.com/documentation/cpp/miscellaneous_image_transformations.html#cv-threshold. If you choose the threshold parameters well, you should be able to reduce the image to a black region (sock pixels) and white regions (non-sock pixel).
Now the fun begins: you might try matching contours, but if this were my problem, I would use bounding boxes for a quick solution or moments for a more interesting (and possibly robust) solution.
Use cvFindContours to find the contours of the black (sock) region: http://opencv.willowgarage.com/documentation/structural_analysis_and_shape_descriptors.html#findcontours
Use cvApproxPoly to convert the contour to a polygonal shape http://opencv.willowgarage.com/documentation/structural_analysis_and_shape_descriptors.html#approxpoly
For the simple solution, use cvMinRect2 to find an arbitrarily oriented bounding box for the sock shape. The short axis of the box should correspond to the line in largura.jpg and the long axis of the box should correspond to the line in comprimento.jpg.
http://opencv.willowgarage.com/documentation/structural_analysis_and_shape_descriptors.html#minarearect2
If you want more (possible) accuracy, you might try cvMoments to compute the moments of the shape. http://opencv.willowgarage.com/documentation/structural_analysis_and_shape_descriptors.html#moments
Use cvGetSpatialMoment to determine the axes of the foot. More information on the spatial moment may be found here: http://en.wikipedia.org/wiki/Image_moments#Examples_2 and here http://opencv.willowgarage.com/documentation/structural_analysis_and_shape_descriptors.html#getspatialmoment
With the axes known, you can then rotate the image so that the long axis is axis-aligned (i.e. vertical). Then, you can simply count pixels horizontally and vertically to obtains the lengths of the lines. Note that there are several assumptions in this moment-oriented process. It's a fun solution, but it may not provide any more accuracy - especially since the accuracy of your size measurements is largely dependent on the camera positioning issues discussed above.
Lastly, I've provided links to the older C interface. You might take a look at the new C++ interface (I simply have not gotten around to migrating my code to 2.4)
Antonio Criminisi likely wrote the last word on this subject years ago. See his "Single View Metrology" paper , and his PhD thesis if you have time.
You don't have to calibrate the camera if you have a known-size object in your image. Well... at least if your camera doesn't distort too much and if you're not expecting high quality measurements.
A simple approach would be to detect a white (perspective-distorted) rectangle, mapping the corners to an undistorted rectangle (using e.g. cv::warpPerspective()) and use the known size of that rectangle to determine the size of other objects in the picture. But this only works for objects in the same plane as the paper, preferably not too far away from it.
I am not sure if you need to build this yourself, but if you just need to do it, and not code it. You can use KLONK Image Measurement for this. There is a free and payable versions.

Resources