I think the easiest way is to explain the problem with an image:
I have two cubes (same size) lying on a table. One side of each cube is marked with a green color (for easy tracking). I want to calculate the relative position (x, y) of the left cube with respect to the right cube (the red line in the picture), in cube-size units.
Is it even possible? I know the problem would be simple if those two green sides shared a common plane - like the top side of the cube - but I can't use that for tracking. Then I would just calculate the homography for one square and multiply it with the other cube's corner.
Should I 'rotate' the homography matrix by multiplying it with a 90-degree rotation matrix to get the 'ground' homography? I plan to do the processing on a smartphone, so maybe the gyroscope and the camera intrinsic parameters can be of some value.
This is possible.
Let's assume (or state) that the table is the z=0 plane and that your first box sits at the origin of this plane. This means the green corners of the left box have the (table) coordinates (0,0,0), (1,0,0), (0,0,1) and (1,0,1) (your box has size 1).
You also have the pixel coordinates of these points. If you give these 2D and 3D values (as well as the intrinsics and distortion of the camera) to cv::solvePnP, you get the relative pose of the camera to your box (and to the plane).
In the next step, you have to intersect the table plane with the ray that goes from your camera's center through the lower-right corner pixel of the second green box. This intersection will look like (x,y,0), and [x-1, y] will be the translation between the right corners of your boxes.
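A minimal sketch of that procedure, assuming the four green-corner 2D/3D correspondences of the left box, the camera matrix K, and the distortion coefficients dist are already available (the helper name and parameter layout are just illustrative):

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical helper: map an image pixel to (x, y) table coordinates by
// intersecting its viewing ray with the z = 0 plane.
cv::Point2f intersectTablePlane(const cv::Point2f& pixel,
                                const std::vector<cv::Point3f>& objectPoints,
                                const std::vector<cv::Point2f>& imagePoints,
                                const cv::Mat& K, const cv::Mat& dist)
{
    // pose of the table coordinate frame relative to the camera
    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, imagePoints, K, dist, rvec, tvec);
    cv::Mat R;
    cv::Rodrigues(rvec, R);
    cv::Mat Rt = R.t();

    // camera center and the pixel's viewing ray, expressed in table coordinates
    cv::Mat camCenter = -Rt * tvec;
    std::vector<cv::Point2f> undistorted;
    cv::undistortPoints(std::vector<cv::Point2f>(1, pixel), undistorted, K, dist);
    cv::Mat ray = (cv::Mat_<double>(3, 1) << undistorted[0].x, undistorted[0].y, 1.0);
    ray = Rt * ray;

    // intersect camCenter + s * ray with the plane z = 0
    double s = -camCenter.at<double>(2) / ray.at<double>(2);
    return cv::Point2f(camCenter.at<double>(0) + s * ray.at<double>(0),
                       camCenter.at<double>(1) + s * ray.at<double>(1));
}

Calling this with the lower-right corner pixel of the second box gives you the (x, y) you are after.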
If you have all the information (camera intrinsics), you can do it the way FooBar answered.
But you can use the fact that the points lie on a plane even more directly, with a homography (no need to compute rays etc.):
Compute the homography between the image plane and the ground plane.
Unfortunately you need 4 point correspondences, but only 3 cube points touching the ground plane are visible in the image.
Instead you can use the top plane of the cubes, where the same distance can be measured.
First the code:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // calibrate plane distance for boxes
    cv::Mat input = cv::imread("../inputData/BoxPlane.jpg");

    // if we had 4 known points on the ground plane, we could use the ground plane,
    // but here we instead use the top plane
    // points on the real-world plane: height = 1, so distances are not measured on
    // the ground plane but on the "top plane" of the cube
    std::vector<cv::Point2f> objectPoints;
    objectPoints.push_back(cv::Point2f(0,0)); // top front
    objectPoints.push_back(cv::Point2f(1,0)); // top right
    objectPoints.push_back(cv::Point2f(0,1)); // top left
    objectPoints.push_back(cv::Point2f(1,1)); // top back

    // image points:
    std::vector<cv::Point2f> imagePoints;
    imagePoints.push_back(cv::Point2f(141,302)); // top front
    imagePoints.push_back(cv::Point2f(334,232)); // top right
    imagePoints.push_back(cv::Point2f(42,231));  // top left
    imagePoints.push_back(cv::Point2f(223,177)); // top back

    cv::Point2f pointToMeasureInImage(741,200); // bottom right of second box

    // for the transform we need the point(s) to be in a vector
    std::vector<cv::Point2f> sourcePoints;
    sourcePoints.push_back(pointToMeasureInImage);
    //sourcePoints.push_back(pointToMeasureInImage);
    sourcePoints.push_back(cv::Point2f(718,141));
    sourcePoints.push_back(imagePoints[0]);

    // list with points that correspond to sourcePoints. This is not needed but used to create some output
    std::vector<int> distMeasureIndices;
    distMeasureIndices.push_back(1);
    //distMeasureIndices.push_back(0);
    distMeasureIndices.push_back(3);
    distMeasureIndices.push_back(2);

    // draw points for visualization
    for(unsigned int i=0; i<imagePoints.size(); ++i)
    {
        cv::circle(input, imagePoints[i], 5, cv::Scalar(0,255,255));
    }
    //cv::circle(input, pointToMeasureInImage, 5, cv::Scalar(0,255,255));
    //cv::line(input, imagePoints[1], pointToMeasureInImage, cv::Scalar(0,255,255), 2);

    // compute the relation between the image plane and the real-world top plane of the cubes
    cv::Mat homography = cv::findHomography(imagePoints, objectPoints);

    std::vector<cv::Point2f> destinationPoints;
    cv::perspectiveTransform(sourcePoints, destinationPoints, homography);

    // compute the distance between some defined points (here I use the input points, but it could be something else)
    for(unsigned int i=0; i<sourcePoints.size(); ++i)
    {
        std::cout << "distance: " << cv::norm(destinationPoints[i] - objectPoints[distMeasureIndices[i]]) << std::endl;

        cv::circle(input, sourcePoints[i], 5, cv::Scalar(0,255,255));
        // draw the line which was measured
        cv::line(input, imagePoints[distMeasureIndices[i]], sourcePoints[i], cv::Scalar(0,255,255), 2);
    }

    // just for fun, measure distances on the 2nd box:
    float distOn2ndBox = cv::norm(destinationPoints[0]-destinationPoints[1]);
    std::cout << "distance on 2nd box: " << distOn2ndBox << " which should be near 1.0" << std::endl;
    cv::line(input, sourcePoints[0], sourcePoints[1], cv::Scalar(255,0,255), 2);

    cv::imshow("input", input);
    cv::waitKey(0);
    return 0;
}
Here's the output which I want to explain:
distance: 2.04674
distance: 2.82184
distance: 1
distance on 2nd box: 0.882265 which should be near 1.0
Those distances are:
1. the yellow bottom one from one box to the other
2. the yellow top one
3. the yellow one on the first box
4. the pink one
So the red line (the one you asked for) should have a length of almost exactly 2 x the cube side length. But as you can see, there is some error.
The more accurate your pixel positions are before the homography computation, the more exact your results will be.
You need a pinhole camera model, so undistort your camera images first (in a real-world application).
Keep in mind, too, that you could compute the distances on the ground plane directly if you had 4 points with known positions visible there (not all lying on the same line)!
I can detect rectangles that are separate from each other. However, I am having problems with rectangles in contact such as below:
Two rectangles in contact
I should detect 2 rectangles in the image. I am using findContours as expected, and I have tried various modes: CV_RETR_TREE, CV_RETR_LIST. I always get the outermost single contour, as shown below:
Outermost contour detected
I have tried with or without canny edge detection. What I do is below:
cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3,3));
cv::erode(__mat,__mat, element);
cv::dilate(__mat,__mat, element);
// Find contours
std::vector<std::vector<cv::Point> > contours;
cv::Mat coloredMat;
cv::cvtColor(__mat, coloredMat, cv::COLOR_GRAY2BGR);
int thresh = 100;
cv::Mat canny_output;
cv::Canny( __mat, canny_output, thresh, thresh*2, 3 );
cv::findContours(canny_output, contours, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);
How can I detect both two rectangles separately?
If you already know the dimensions of the rectangle, you can use generalizedHoughTransform.
If the dimensions of the rectangles are not known, you can use distanceTransform. The local maxima will give you the center location as well as the distance from the center to the nearest edge (which will be equal to half the short side of your rectangle). With further processing (corner detection / watershed) you should be able to find the orientation and dimensions, though this method may fail if the two rectangles overlap each other by a lot. A sketch of this idea follows below.
Simple corner detection and a brute-force search (just try out all possible rectangle combinations given the corner points and see which one best matches the image; note that a rectangle can be defined by only 3 points) might also work.
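A minimal sketch of the distanceTransform idea, assuming the rectangles appear as filled white blobs in a binary mask (the file name is just a placeholder):

#include <opencv2/opencv.hpp>

int main()
{
    // assumed input: a binary mask where the (filled) rectangles are white on black
    cv::Mat mask = cv::imread("rects.png", cv::IMREAD_GRAYSCALE);
    cv::threshold(mask, mask, 128, 255, cv::THRESH_BINARY);

    // distance of every foreground pixel to the nearest background pixel
    cv::Mat dist;
    cv::distanceTransform(mask, dist, cv::DIST_L2, 5);

    // crude local-maximum detection: a pixel is a local maximum if dilation doesn't change it
    cv::Mat dilated;
    cv::dilate(dist, dilated, cv::Mat());
    cv::Mat localMax = (dist == dilated) & (dist > 1.0f);

    // each surviving blob is a candidate rectangle center; the distance value there
    // approximates half of that rectangle's short side
    cv::imshow("local maxima", localMax);
    cv::waitKey(0);
    return 0;
}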
I am attempting to map a fisheye image to a 360 degree view using a sky sphere in Unity. The scene is inside the sphere. I am very close but I am seeing some slight distortion. I am calculating UV coordinates as follows:
Vector3 v = currentVertice; // unit vector from edge of sphere, -1, -1, -1 to 1, 1, 1
float r = Mathf.Atan2(Mathf.Sqrt(v.x * v.x + v.y * v.y), v.z) / (Mathf.PI * 2.0f);
float phi = Mathf.Atan2(v.y, v.x);
textureCoordinates.x = (r * Mathf.Cos(phi)) + 0.5f;
textureCoordinates.y = (r * Mathf.Sin(phi)) + 0.5f;
Here is the distortion and triangles:
The rest of the entire sphere looks great, it's just at this one spot that I get the distortion.
Here is the source fish eye image:
And here is the same sphere with a UV test texture over the top, showing the same distortion area. The full UV test texture is shown on the right; it is a square, although stretched into a rectangle in my screenshot.
The distortion above is using sphere mapping rather than fish eye mapping. Here is the UV texture using fish eye mapping:
Math isn't my strong point; am I doing anything wrong here, or is this kind of mapping simply not 100% possible?
The spot you are seeing is the case where r gets very close to 1. As you can see in the source image, this is the border area between the very distorted image data and the black.
This area is very distorted; however, that's not the main problem. Looking at the result, you can see that there are problems with the UV orientation.
I've added a few lines to your source image to demonstrate what I mean. Where r is small (yellow lines) you can see that the UV coordinates can be interpolated between the corners of your quad (assuming quads instead of tris). However, where r is big (red corners), interpolating UV coordinates will make them travel through areas of your source image whose r is much smaller than 1 (red lines), causing distortions in UV space. Actually, those red lines should not be straight, but they should travel along the border of your source image data.
You can improve this by having a higher polycount in the area of your skysphere where r gets close to 1, but it will never be perfect as long as your UVs are interpolated in a linear way.
I also found another problem. If you look closely at the spot, you'll find that the complete source image is present there in miniature. This is because your UV coordinates wrap around at that point. As the rendering passes around the viewer, the UV coordinates travel from 0 towards 1. At the spot they are at 1, but the neighboring vertex is at 0.001 or so, causing the whole source image to be rendered in between. To fix that, you'll need two separate vertices at the seam of your sky sphere: one where the surface of the sphere starts, and one where it ends. In object space they are identical, but in UV space one is at 0 and the other at 1.
I am trying to determine camera position in world coordinates, relative to a fiducial position based on fiducial marker found in a scene.
My methodology for determining the viewMatrix is described here:
Determine camera pose?
I have the rotation and translation, [R|t], from the trained marker to the scene image. Given the camera calibration training, and thus the camera intrinsics, I should be able to discern the camera's position in world coordinates based on the perspective and orientation of the marker found in the scene image.
Can anybody direct me to a discussion or example similar to this? I'd like to know my camera's position based on the fiducial marker, and I'm sure that something similar to this has been done before; I'm just not searching with the correct keywords.
Appreciate your guidance.
What do you mean by world coordinates? If you mean object coordinates, then you should use the inverse transformation of solvePnP's result.
Given a view matrix [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. In OpenCV:
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, intrinsics, distortion, rvec, tvec);
cv::Mat R;
cv::Rodrigues(rvec, R);
R = R.t(); // inverse rotation
tvec = -R * tvec; // translation of inverse
// camPose is a 4x4 matrix with the pose of the camera in the object frame
cv::Mat camPose = cv::Mat::eye(4, 4, R.type());
R.copyTo(camPose.rowRange(0, 3).colRange(0, 3)); // copies R into camPose
tvec.copyTo(camPose.rowRange(0, 3).colRange(3, 4)); // copies tvec into camPose
Update #1:
Result of solvePnP
solvePnP estimates the object pose given a set of object points (model coordinates), their corresponding image projections (image coordinates), as well as the camera matrix and the distortion coefficients.
The object pose is given by two vectors, rvec and tvec. rvec is a compact representation of a rotation matrix for the pattern view seen on the image. That is, rvec together with the corresponding tvec brings the fiducial pattern from the model coordinate space (in which object points are specified) to the camera coordinate space.
That is, we are in the camera coordinate space; it moves with the camera, and the camera is always at the origin. The camera axes have the same directions as the image axes, so
x-axis is pointing to the right of the camera,
y-axis is pointing down,
and z-axis is pointing to the direction of camera view
The same applies to the model coordinate space, so if you specified the origin in the upper-right corner of the fiducial pattern, then
x-axis is pointing to the right (e.g. along the longer side of your pattern),
y-axis is pointing to the other side (e.g. along the shorter one),
and z-axis is pointing to the ground.
You can specify the world origin as the first point of the object points, that is, the first object point is set to (0, 0, 0) and all other points have z = 0 (in the case of planar patterns). Then tvec (combined with rvec) points to the origin of the world coordinate space in which you placed the fiducial pattern. solvePnP's output has the same units as the object points.
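For example, a minimal sketch of how the object points for a planar square fiducial might be specified (the side length s is an assumed value, and imagePoints must list the detected corners in the same order):

// corners of a planar square marker, world origin at the first corner, z = 0
std::vector<cv::Point3f> objectPoints;
float s = 0.05f; // assumed marker side length in meters
objectPoints.push_back(cv::Point3f(0, 0, 0)); // origin corner
objectPoints.push_back(cv::Point3f(s, 0, 0));
objectPoints.push_back(cv::Point3f(s, s, 0));
objectPoints.push_back(cv::Point3f(0, s, 0));
// cv::solvePnP(objectPoints, imagePoints, intrinsics, distortion, rvec, tvec)
// then returns tvec in the same units as objectPoints (meters here).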
Take a look at the following: 6dof positional tracking. I think this is very similar to what you need.
In the picture below, I have the 2D locations of the green points and I want to calculate the locations of the red points, or, as an intermediate step, the locations of the blue points. All in 2D.
Of course, I do not only want to find those locations for the picture above. In the end, I want an automated algorithm which takes a set of checkerboard corner points to calculate the outer corners.
I need the resulting coordinates to be as accurate as possible, so I think that I need a solution which does not only take the outer green points into account, but which also uses all the other green points' locations to calculate a best fit for the outer corners (red or blue).
If OpenCV can do this, please point me into that direction.
In general, if all you have is the detection of some, but not all, of the inner corners, the problem cannot be solved. This is because the configuration is invariant to translation - shifting the physical checkerboard by whole squares would produce the same detected corner positions in the image, but due to different physical corners.
Further, the configuration is also invariant to rotations by 180 deg in the checkerboard plane and, unless you are careful to distinguish between the colors of the squares adjacent to each corner, to rotations by 90 deg and to reflections with respect to the center and the midlines.
This means that, in addition to detecting the corners, you need to extract from the image some features of the physical checkerboard that can be used to break the above invariance. The simplest break is to detect all 9 corners of one row and one column, or at least their end corners. They can be used directly to rectify the image by imposing the condition that their lines be at a 90 deg angle. However, this may turn out to be impossible due to occlusions or detector failure, and more sophisticated methods may be necessary.
For example, you can try to directly detect the chessboard edges, i.e. the fat black lines at the boundary. One way to do that, for example, would be to detect the letters and numbers nearby, and use those locations to constrain a line detector to nearby areas.
By the way, if the photo you posted is just a red herring, and you are interested in detecting general checkerboard-like patterns, and can control the kind of pattern, there are way more robust methods of doing it. My personal favorite is the "known 2D crossratios" pattern of Matsunaga and Kanatani.
I solved it robustly, but not accurately, with the following solution:
Find lines with at least 3 green points closely matching the line. (thin red lines in pic)
Keep bounding lines: From these lines, keep those with points only to one side of the line or very close to the line.
Filter bounding lines: From the bounding lines, take the 4 best ones/those with most points on them. (bold white lines in pic)
Calculate the intersections of the 4 remaining bounding lines (none of the lines are perfectly parallel, so this results in 6 intersections, of which we want only 4).
From the intersections, remove the one farthest from the average position of the intersections until only 4 of them are left.
That's the 4 blue points.
You can then feed these 4 points into OpenCV's getPerspectiveTransform function to find a perspective transform (aka a homography):
Point2f srcPoints[4];
std::vector<Point2f> detectedCorners = CheckDet::getOuterCheckerboardCorners(srcImg);
for (int i = 0; i < MIN(4, (int)detectedCorners.size()); i++) {
    srcPoints[i] = detectedCorners[i];
}

Point2f dstPoints[4];
int dstImgSize = 400;
dstPoints[0] = Point2f(dstImgSize * 1/8, dstImgSize * 1/8);
dstPoints[1] = Point2f(dstImgSize * 7/8, dstImgSize * 1/8);
dstPoints[2] = Point2f(dstImgSize * 7/8, dstImgSize * 7/8);
dstPoints[3] = Point2f(dstImgSize * 1/8, dstImgSize * 7/8);

Mat m = getPerspectiveTransform(srcPoints, dstPoints);
For our example image, the input and output of getPerspectiveTransform look like this:
input
(349.1, 383.9) -> ( 50.0, 50.0)
(588.9, 243.3) -> (350.0, 50.0)
(787.9, 404.4) -> (350.0, 350.0)
(506.0, 593.1) -> ( 50.0, 350.0)
output
( 1.6 -1.1 -43.8 )
( 1.4 2.4 -1323.8 )
( 0.0 0.0 1.0 )
You can then transform the image's perspective to board coordinates:
Mat plainBoardImg;
warpPerspective(srcImg, plainBoardImg, m, Size(dstImgSize, dstImgSize));
This results in the following image:
For my project, the red points that you can see on the board in the question are not needed anymore, but I'm sure they can easily be calculated from the homography by inverting it and then using the inverse to back-transform the points (0, 0), (0, dstImgSize), (dstImgSize, dstImgSize), and (dstImgSize, 0).
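A minimal sketch of that back-transform, reusing m and dstImgSize from above:

std::vector<Point2f> boardCorners;
boardCorners.push_back(Point2f(0, 0));
boardCorners.push_back(Point2f(0, dstImgSize));
boardCorners.push_back(Point2f(dstImgSize, dstImgSize));
boardCorners.push_back(Point2f(dstImgSize, 0));

// map the board-space corners back into image space with the inverse homography
std::vector<Point2f> imageCorners;
perspectiveTransform(boardCorners, imageCorners, m.inv());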
The algorithm works surprisingly reliably; however, it does not use all the available information, because it uses only the outer points (those connected by the white lines). It does not use any data from the inner points for additional accuracy. I would still like to find an even better solution that uses the data of the inner points.
As titled, I have an item with a specific position in object space defined by a single vector.
I would like to retrieve the coordinates in camera space of the projection of this vector on the near clipping plane.
In other words, I need the intersection, in camera space, between this vector and the plane defined by the z coordinate being equal to -1 (my near plane).
I need it for moving objects linearly with the mouse in a perspective projection.
Edit: right now I go from object space down to window space, and then from there back up to camera space by setting the window depth window.z equal to 0, that is, the near plane.
Note that to get the camera space from unProject, I just pass an identity matrix, new Mat4(1f), as the modelview matrix:
public Vec3 getCameraSpacePositionOnNearPlane(Vec2i mousePoint) {
    int[] viewport = new int[]{0, 0, glViewer.getGlWindow().getWidth(), glViewer.getGlWindow().getHeight()};
    Vec3 window = new Vec3();
    window.x = mousePoint.x;
    window.y = viewport[3] - mousePoint.y - 1;
    window.z = 0;
    return Jglm.unProject(window, new Mat4(1f), glViewer.getVehicleCameraToClipMatrix(), new Vec4(viewport));
}
Is there a better (more efficient) way to get it without going down to window space and coming back up to camera space?
The most direct approach I could think of would be to simply transform your object-space position (let this be called vector x in the following) into eye space, construct a ray from the origin through those eye-space coordinates, and calculate the intersection between that ray and the near plane z_eye = -near.
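A minimal sketch of that first approach (with a hypothetical Vec3f struct; nearDist is the near-plane distance and eyePos is the position already transformed into eye space):

// intersect the ray from the eye-space origin through eyePos with the plane z_eye = -nearDist
struct Vec3f { float x, y, z; };

Vec3f projectOntoNearPlane(const Vec3f& eyePos, float nearDist)
{
    // origin + s * eyePos reaches z = -nearDist at s = -nearDist / eyePos.z;
    // only meaningful if eyePos.z < 0, i.e. the point is in front of the camera
    float s = -nearDist / eyePos.z;
    return Vec3f{ eyePos.x * s, eyePos.y * s, -nearDist };
}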
Another approach would be to transform fully into clip space. Since the near plane is z_clip = -w_clip there, you can just set the z coordinate of the result to -w and project that back to eye space using the inverse projection matrix.
In both cases, the result will be meaningless if the point lies behind the camera, or at the camera plane z_eye = 0.