OpenCV: solvePnP detection problems - ios

I've got problem with precise detection of markers using OpenCV.
I've recorded video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you see I'm markers that I'm detecting are slightly moved at some camera angles. I've read on the web that this may be camera calibration problems, so I'll tell you guys how I'm calibrating camera, and maybe you'd be able to tell me what am I doing wrong?
At the beginnig I'm collecting data from various images, and storing calibration corners in _imagePoints vector like this
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1), cvTermCriteria(CV_TERMCRIT_EPS+ CV_TERMCRIT_ITER, 30, 0.1));
cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
_imagePoints->push_back(_corners);
Than, after collecting enough data I'm calculating camera matrix and coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector< cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
std::vector<cv::Point3f> currentObjectPoints;
for (int j = 0; j < currentImagePoints.size(); j++) {
cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
currentObjectPoints.push_back(newPoint);
}
objectPoints->push_back(currentObjectPoints);
}
std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
Results are like you've seen in the video.
What am I doing wrong? is that an issue in the code? How much images should I use to perform calibration (right now I'm trying to obtain 20-30 images before end of calibration).
Should I use images that containg wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with circles grid instead of of chessboards, but results were much worse that now.
In case of questions how I'm detecting marker: I'm using solvepnp function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are coordinates of marker corners in processing image (I'm using custom algorithm to do that)

In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by #kobejohn in his comment and so that your code follows these steps:
collect various images of chessboard target
find chessboard corners in images of point 1)
calibrate the camera (with cv::calibrateCamera) and so obtain as a result the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
collect an image of your own custom target (the target is seen at 0:57 in your video) and it is shown in the following figure and find some relevant points in it (let's call the point you found in image image_custom_target_vertices and world_custom_target_vertices the corresponding 3D points).
estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your own custom target you get in point 4), with a call to cv::solvePnP like this one cv::solvePnP(world_custom_target_vertices,image_custom_target_vertices,intrinsic,distortion,R,t)
giving the 8 corners cube in 3D (let's call them world_cube_vertices) you get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv2::projectPoints like this one cv::projectPoints(world_cube_vertices,R,t,intrinsic,distortion,image_cube_vertices)
draw the cube with your own draw function.
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact is the
found parameters. This should be as close to zero as possible. Given
the intrinsic, distortion, rotation and translation matrices, we first
transform the object point to image point using cv2.projectPoints().
Then we calculate the absolute norm between what we got with our
transformation and the corner finding algorithm. To find the average
error we calculate the arithmetical mean of the errors calculate for
all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
Finding you own custom target: it does not seem to me that you explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where that results are "bad"?
Pose of the camera: I think that in 5) you use intrinsic found in 3), are you sure nothing is changed in the camera in the meanwhile? Referring to the Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens
after calibration". In particular, you may not refocus nor change the
f-stop, because both focusing and iris affect the nonlinear lens
distortion and (albeit less so, depending on the lens) the field of
view. Of course, you are completely free to change the exposure time,
as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.

So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of calibration questions I've asked.
First of all - in order to obtain good calibration results you have to use images with properly detected grid elements/circles positions!. Using all captured images in calibration process (even those that aren't properly detected) will result bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID), give much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners like these:
I've experimented with CALIB_CB_CLUSTERING and it haven't helped much - in some cases (different light environment) it got better, but not much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID) - better results than asymmetric grid, but still I've got much worse results than standard grid (chessboard). It often produces errors like these:
Chessboard (found using findChessboardCorners function) - this method is producing best possible results - it doesn't produce misaligned corners very often, and almost every calibration is producing similar results to best-possible results from symmetric circles grid
For every calibration I've been using 20-30 images that were coming from different angles. I've tried even with 100+ images but it haven't produced noticeable change in calibration results than smaller amount of images. It's worth noticing that larger number of test images is increasing time needed to compute camera parameters in non-linear way (100 test images in 480x360 resolution are computing 25 minutes in iPad4, compared with 4 minutes with ~50 images)
I've also experimented with solvePNP parameters - but is also haven't gave me any acceptable results: I've tried all 3 detection methods (ITERATIVE, EPNP and P3P), but I haven't seen aby noticeable change.
Also I've tried with useExtrinsicGuess set to true, and I've used rvec and tvec from previous detection, but this one resulted with complete disapperance of detected cube.
I've ran out of ideas - what else could be affecting these shifting problems?

For those still interested:
this is an old question, but I think your problem is not the bad calibration.
I developed an AR app for iOS, using OpenCV and SceneKit, and I have had your same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you wanna render the cube over the marker, at a specific distance along the Z axis of the marker, exactly at one half of the cube side size. So you need to improve the Z coordinate of the marker translation vector of this distance.
In fact, when you see your cube from the top, the cube is render properly.
I have done an image in order to explain the problem, but my reputation prevent to post it.

Related

How to use OpenCV stereoCalibrate output to map pixels from one camera to another

Context: I have two cameras of very different focus and size that I want to align for image processing. One is RGB, one is near-infrared. The cameras are in a static rig, so fixed relative to each other. Because the image focus/width are so different, it's hard to even get both images to recognize the chessboard at the same time. Pretty much only works when the chessboard is centered in both images with very little skew/tilt.
I need to perform computations on the aligned images, so I need as good of a mapping between the optical frames as I can get. Right now the results I'm getting are pretty far off. I'm not sure if I'm using the method itself wrong, or if I am misusing the output. Details and image below.
Computation: I am using OpenCV stereoCalibrate to estimate the rotation and translation matrices with the following code, and throwing out bad results based on final error.
int flag = cv::CALIB_FIX_INTRINSIC;
double err = cv::stereoCalibrate(temp_points_object_vec, temp_points_alignvec, temp_points_basevec, camera_mat_align, camera_distort_align, camera_mat_base, camera_distort_base, mat_align.size(), rotate_mat, translate_mat, essential_mat, F, flag, cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 30, 1e-6));
if (last_error_ == -1.0 || (err < last_error_ + improve_threshold_)) {
// -1.0 indicate first calibration, accept points. Other cond indicates acceptable error.
points_alignvec_.push_back(addalign);
points_basevec_.push_back(addbase);
points_object_vec_.push_back(object_points);
}
The result doesn't produce an OpenCV error as is, and due to the large difference between images, more than half of the matched points are rejected. Results are much better since I added the conditional on the error, but still pretty poor. Error as computed above starts around 30, but doesn't get lower than 15-17. For comparison, I believe a "good" error would be <1. So for starters, I know the output isn't great, but on top of that, I'm not sure I'm using the output right for validating visually. I've attached images showing some of the best and worst results I see. The middle image on the right of each shows the "cross-validated" chessboard keypoints. These are computed like this (note addalign is the temporary vector containing only the chessboard keypoints from the current image in the frame to be aligned):
for (int i = 0; i < addalign.size(); i++) {
cv::Point2f validate_pt;// = rotate_mat * addalign.at(i) + translate_mat;
// Project pixel from image aligned to 3D
cv::Point3f ray3d = align_camera_model_.projectPixelTo3dRay(addalign.at(i));
// Rotate and translate
rotate_mat.convertTo(rotate_mat, CV_32F);
cv::Mat temp_result = rotate_mat * cv::Mat(ray3d, false);
cv::Point3f ray_transformed;
temp_result.copyTo(cv::Mat(ray_transformed, false));
cv::Mat tmat = cv::Mat(translate_mat, false);
ray_transformed.x += tmat.at<float>(0);
ray_transformed.y += tmat.at<float>(1);
ray_transformed.z += tmat.at<float>(2);
// Reproject to base image pixel
cv::Point2f pixel = base_camera_model_.project3dToPixel(ray_transformed);
corners_validated.push_back(pixel);
}
Here are two images showing sample outputs, including both raw images, both images with "drawChessboard," and a cross-validated image showing the base image with above-computed keypoints translated from the alignment image.
Better result
Worse result
In the computation of corners_validated, I'm not sure I'm using rotate_mat andtranslate_mat correctly. I'm sure there is probably an OpenCV method that does this more efficiently, but I just did it the way that made sense to me at the time.
Also relevant: This is all inside a ROS package, using ROS noetic on Ubuntu 20.04 which only permits the use of OpenCV 4.2, so I don't have access to some of the newer opencv methods.

Opencv get accurate real world coordinates from 2 known parallel planes

So I have been tinkering a little bit with opencv and I want to be able to use a camera image to get the position of certain objects that are lying flat on plane. These objects are simple shapes such as circles squares etc. They all have the same height of 5cm. To be able to relate real world points to pixels on the camera I painted 4 white squares on the plane with known distances between them.
So the steps I have been taking are:
Initialization:
Calibrate my camera using a checkerboard image and save the calibration data.
Get the input image. call cv::undistort with the calibration data for my camera.
Find the center points of the 4 squares in the image and pass that data and the real world coordinates of the squares to the cv::solvePnP function. Save the rvec and tvec return parameters.
Warp the perspective of the image so you can get a top down view from the image. This is essentially following this tutorial: https://docs.opencv.org/3.4.1/d9/dab/tutorial_homography.html
Use the resulting image to again find the 4 white squares and then calculate a "pixels per meter" translation constant which can relate a certain amount of difference in pixels between points to the real world distance on the plane where the 4 squares are.
Finding object, This is done after initialization:
Get the input image. call cv::undistort with the calibration data for my camera.
Warp the perspective of the image so you can get a top down view from the image. This is the same as step 4 during initialisation.
Find the centerpoint of the object to detect.
Since the centerpoint of the object is on a higher plane then where I calibrated I use the following formula to correct this(d = is the pixel offset from the center of the image. camHeight is the cameraHeight I measured by using a tape measure. h is height of the object):
d = x - (h * (x / camHeight))
So here for an illustration how I got this formule:
But still the coordinates are not matching up...
So I am wondering at all if this is the correct. Specifically I have the following questions:
Is using cv::undistort before using cv::solvenPnP correct? cv::solvePnP also takes the camera calibration data as input so I'm not sure if I have to pass an undistorted image to it or not.
Similar to 1. During Finding object I call cv::undistort -> cv::warpPerspective. Is this undistort necessary here?
Is my calculation to correct for the parallel planes in step 4 correct? I feel like I am missing something but I can't see what. One thing I am wondering is whether I can get the camera height from opencv once solvePnp is done.
I am a newbie to CV so If anything else is totally wrong please also point it out to me.
Thank you for reading this wall of text!

camera calibration for single plane

My problem statement is very simple. But I am unable to get the opencv calibration work for me. I am using the code from here : source code.
I have to take images parallel to the camera at a fixed distance. I tried taking test images (about 20 of them) only parallel to the camera as well as at different planes. Also I changed the size and the no of squares.
What would be the best way to calibrate in this scenario?
The undistorted image is cropped later, that's why it looks smaller.
After going through the images closely, the pincushion distortion seems to have been corrected. But the "trapezoidal" distortion still remains. Since the camera is mounted in a closed box, the planes at which I can take images is limited.
To simplify what Vlad already said: It is theoretically impossible to calibrate your camera with test images only parallel to the camera. You have to change your calibration board's orientation. In fact, you should have different orientation in each test image.
Check out the first two images in the link below to see how the calibration board should be slanted (or tilted):
http://www.vision.caltech.edu/bouguetj/calib_doc/
think about calibration problem as finding a projection matrix P:
image_points = P * 3d_points, where P = intrinsic * extrinsic
Now just bear with me:
You basically are interested in intrinsic part but the calibration algorithm has to find both intrinsic and extrinsic. Now, each column of projection matrix can be obtained if you select a 3D point at infinity, for example xInf = [1, 0, 0, 0]. This point is at infinity because when you transform it from homogeneous coordinates to Cartesian you get
[1/0, 0, 0]. If you multiply a projection matrix with a point at infinity you will get its corresponding column (1st for Xinf, 2nd for yInf, 3rd for zInf and 4th for camera center).
Thus the conclusion is simple - to get a projection matrix (that is a successful calibration) you have to clearly see points at infinity or vanishing points from the converging extensions of lines in your chessboard rig (aka end of the railroad tracks at the horizon). Your images don’t make it easy to detect vanishing points since you don’t slant your chessboard, nor rotate nor scale it by stepping back. Thus your calibration will always fail.

Rectification of uncalibrated cameras, via fundamental matrix

I'm trying to do calibration of Kinect camera and external camera, with Emgu/OpenCV.
I'm stuck and I would really appreciate any help.
I've choose do this via fundamental matrix, i.e. epipolar geometry.
But the result is not as I've expected. Result images are black, or have no sense at all.
Mapx and mapy points are usually all equal to infinite or - infinite, or all equals to 0.00, and rarely have regular values.
This is how I tried to do rectification:
1.) Find image points get two arrays of image points (one for every camera) from set of images. I've done this with chessboard and FindChessboardCorners function.
2.) Find fundamental matrix
CvInvoke.cvFindFundamentalMat(points1Matrix, points2Matrix,
_fundamentalMatrix.Ptr, CV_FM.CV_FM_RANSAC,1.0, 0.99, IntPtr.Zero);
Do I pass all collected points from whole set of images, or just from two images trying to rectify?
3.) Find homography matrices
CvInvoke.cvStereoRectifyUncalibrated(points11Matrix, points21Matrix,
_fundamentalMatrix.Ptr, Size, h1.Ptr, h2.Ptr, threshold);
4.) Get mapx and mapy
double scale = 0.02;
CvInvoke.cvInvert(_M1.Ptr, _iM.Ptr, SOLVE_METHOD.CV_LU);
CvInvoke.cvMul(_H1.Ptr, _M1.Ptr, _R1.Ptr,scale);
CvInvoke.cvMul(_iM.Ptr, _R1.Ptr, _R1.Ptr, scale);
CvInvoke.cvInvert(_M2.Ptr, _iM.Ptr, SOLVE_METHOD.CV_LU);
CvInvoke.cvMul(_H2.Ptr, _M2.Ptr, _R2.Ptr, scale);
CvInvoke.cvMul(_iM.Ptr, _R2.Ptr, _R2.Ptr, scale);
CvInvoke.cvInitUndistortRectifyMap(_M1.Ptr,_D1.Ptr, _R1.Ptr, _M1.Ptr,
mapxLeft.Ptr, mapyLeft.Ptr) ;
I have a problem here...since I'm not using calibrated images, what is my camera matrix and distortion coefficients ? How can I get it from fundamental matrix or homography matrices?
5.) Remap
CvInvoke.cvRemap(src.Ptr, destRight.Ptr, mapxRight, mapyRight,
(int)INTER.CV_INTER_LINEAR, new MCvScalar(255));
And this doesn't returning good result. I would appreciate if someone would tell me what am I doing wrong.
I have set of 25 pairs of images, and chessboard pattern size 9x6.
The book "Learning OpenCV," from O'Reilly publishing, has two full chapters devoted to this specific topic. Both make heavy use of OpenCV's included routines cvCalibrateCamera2() and cvStereoCalibrate(); These routines are wrappers to code that is very similar to what you have written here, with the added benefit of having been more thoroughly debugged by the folks who maintain the OpenCV libraries. while they are convenient, both require quite a bit of preprocessing to achieve the necessary inputs to the routines. There may in fact be a sample program, somewhere deep in the samples directory of the OpenCV distribution, that uses these routines, with examples on how to go from chessboard image to calibration/intrinsics matrix. If you take an in depth look at any of these places, I am sure you will see how you can achieve your goal with advice from the experts.
cv::findFundamentalMat cannot work if the intrinsic parameter of your image points is an identity matrix. In other words, it cannot work with unprojected image points.

OpenCV cvRemap Cropping Image

So I am very new to OpenCV (2.1), so please keep that in mind.
So I managed to calibrate my cheap web camera that I am using (with a wide angle attachment), using the checkerboard calibration method to produce the intrinsic and distortion coefficients.
I then have no trouble feeding these values back in and producing image maps, which I then apply to a video feed to correct the incoming images.
I run into an issue however. I know when it is warping/correcting the image, it creates several skewed sections, and then formats the image to crop out any black areas. My question then is can I view the complete warped image, including some regions that have black areas? Below is an example of the black regions with skewed sections I was trying to convey if my terminology was off:
An image better conveying the regions I am talking about can be found here! This image was discovered in this post.
Currently: The cvRemap() returns basically the yellow box in the image linked above, but I want to see the whole image as there is relevant data I am looking to get out of it.
What I've tried: Applying a scale conversion to the image map to fit the complete image (including stretched parts) into frame
CvMat *intrinsic = (CvMat*)cvLoad( "Intrinsics.xml" );
CvMat *distortion = (CvMat*)cvLoad( "Distortion.xml" );
cvInitUndistortMap( intrinsic, distortion, mapx, mapy );
cvConvertScale(mapx, mapx, 1.25, -shift_x); // Some sort of scale conversion
cvConvertScale(mapy, mapy, 1.25, -shift_y); // applied to the image map
cvRemap(distorted,undistorted,mapx,mapy);
The cvConvertScale, when I think I have aligned the x/y shift correctly (guess/checking), is somehow distorting the image map making the correction useless. There might be some math involved here I am not correctly following/understanding.
Does anyone have any other suggestions to solve this problem, or what I might be doing wrong? I've also tried trying to write my own code to fix distortion issues, but lets just say OpenCV knows already how to do it well.
From memory, you need to use InitUndistortRectifyMap(cameraMatrix,distCoeffs,R,newCameraMatrix,map1,map2), of which InitUndistortMap is a simplified version.
cvInitUndistortMap( intrinsic, distort, map1, map2 )
is equivalent to:
cvInitUndistortRectifyMap( intrinsic, distort, Identity matrix, intrinsic,
map1, map2 )
The new parameters are R and newCameraMatrix. R species an additional transformation (e.g. rotation) to perform (just set it to the identity matrix).
The parameter of interest to you is newCameraMatrix. In InitUndistortMap this is the same as the original camera matrix, but you can use it to get that scaling effect you're talking about.
You get the new camera matrix with GetOptimalNewCameraMatrix(cameraMat, distCoeffs, imageSize, alpha,...). You basically feed in intrinsic, distort, your original image size, and a parameter alpha (along with containers to hold the result matrix, see documentation). The parameter alpha will achieve what you want.
I quote from the documentation:
The function computes the optimal new camera matrix based on the free
scaling parameter. By varying this parameter the user may retrieve
only sensible pixels alpha=0, keep all the original image pixels if
there is valuable information in the corners alpha=1, or get something
in between. When alpha>0, the undistortion result will likely have
some black pixels corresponding to “virtual” pixels outside of the
captured distorted image. The original camera matrix, distortion
coefficients, the computed new camera matrix and the newImageSize
should be passed to InitUndistortRectifyMap to produce the maps for
Remap.
So for the extreme example with all the black bits showing you want alpha=1.
In summary:
call cvGetOptimalNewCameraMatrix with alpha=1 to obtain newCameraMatrix.
use cvInitUndistortRectifymap with R being identity matrix and newCameraMatrix set to the one you just calculated
feed the new maps into cvRemap.

Resources