Error in marker pose estimation using single camera - opencv

I use the following OpenCV code to estimate the pose of a square marker and draw its 3 axes on the image. But the z-axis of the marker flips 180 degrees from time to time, as shown in the image below. How can I make the z-axis stable?
// Marker world coordinates
vector<Point3f> objectPoints;
objectPoints.push_back(Point3f(0, 0, 0));
objectPoints.push_back(Point3f(0, 2.4, 0));
objectPoints.push_back(Point3f(2.4, 2.4, 0));
objectPoints.push_back(Point3f(2.4, 0.0, 0));
// 2D image coordinates of 4 marker corners. They are arranged in the same order for each frame
vector<Point2f> marker2DPoints;
// Calculate rotation and translation
cv::Mat Rvec;
cv::Mat_<float> Tvec;
cv::Mat raux, taux;
cv::solvePnP(objectPoints, marker2DPoints, camMatrix, distCoeff, raux, taux);
raux.convertTo(Rvec, CV_32F);   // convert so Rvec/Tvec are filled before projectPoints below
taux.convertTo(Tvec, CV_32F);
// Draw marker pose on the image
vector<Point3f> axisPoints3D;
axisPoints3D.push_back(Point3f(0, 0, 0));
axisPoints3D.push_back(Point3f(2.4, 0, 0));
axisPoints3D.push_back(Point3f(0, 2.4, 0));
axisPoints3D.push_back(Point3f(0, 0, 2.4));
vector<Point2f> axisPoints2D;
// Take the camMatrix and distCoeff from camera calibration results
projectPoints(axisPoints3D, Rvec, Tvec, camMatrix, distCoeff, axisPoints2D);
line(srcImg, axisPoints2D[0], axisPoints2D[1], CV_RGB(0, 0, 255), 1, CV_AA);
line(srcImg, axisPoints2D[0], axisPoints2D[2], CV_RGB(0, 255, 0), 1, CV_AA);
line(srcImg, axisPoints2D[0], axisPoints2D[3], CV_RGB(255, 0, 0), 1, CV_AA);

This probably would be better as a comment but I don't have enough reputation for that. I think this may be happening due to the order in which solvePnP is getting the coordinates for your tag. Furthermore, since solvePnP is just trying to match (in this case) 4 points on a 3D plane to 4 2D points in an image, there are multiple solutions for this. The tag could be rotated around its up axis, as well as flipped upside down. solvePnP doesn't know from the provided points which is the upward-facing direction.
I have a hunch that solvePnP is a bit too general for this problem, as a stable tag-detection algorithm should be able to feed the corners to the pose estimation code in a stable order.
Edit: the order of the corners is important, and the solution given by solvePnP depends on it. Perhaps the algorithm generating your corner points is not providing the corners in a consistent order? Please share the output of tags.points.
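If your OpenCV version is recent enough (>= 3.4.5), one hedged option is the SOLVEPNP_IPPE_SQUARE solver, which is designed for exactly this square-marker case: it knows the points form a square and returns the candidate with the smaller reprojection error, which usually keeps the z-axis from flipping. A rough sketch, not the asker's code, assuming marker2DPoints arrive in a fixed top-left, top-right, bottom-right, bottom-left order and camMatrix/distCoeff are the calibration results from the question:
// Object points must be centred on the marker and given in exactly this order
std::vector<cv::Point3f> squarePoints;
squarePoints.push_back(cv::Point3f(-1.2f,  1.2f, 0));  // top-left  (marker is 2.4 x 2.4 units)
squarePoints.push_back(cv::Point3f( 1.2f,  1.2f, 0));  // top-right
squarePoints.push_back(cv::Point3f( 1.2f, -1.2f, 0));  // bottom-right
squarePoints.push_back(cv::Point3f(-1.2f, -1.2f, 0));  // bottom-left
cv::Mat rvec, tvec;
cv::solvePnP(squarePoints, marker2DPoints, camMatrix, distCoeff,
             rvec, tvec, false, cv::SOLVEPNP_IPPE_SQUARE);
Even without the special flag, making sure the corner detector always reports the corners in the same order is the first thing to check.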

Related

After Pose Estimation 3D Coordinate Axes Misplaced

As a newbie, I'm trying to calculate the pose of a planar object using OpenCV's solvePnP. However, I see a weird output: the axes are always drawn at the corner of my frame. To draw my axes I use:
drawFrameAxes(frame_copy, cameraMatrix, distanceCoeffisions, rvec, tvec, length);
The output I get is as follows:
P.S. (X: red, Y: green, Z: blue)
[image: output of my code]
I don't have any depth information.
I am not sure if this is correct, but to obtain 3D points I use the inliers and set the z coordinate to 0:
Points.push_back(Point3f(inliers[i].pt.x, inliers[i].pt.y, 0));
So what could be the problem? Any resource pointers or suggestions are welcome.
I solved the problem.
Solution: I fixed my camera calibration and the problem went away.
Thanks!
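For reference, a quick way to check whether the calibration itself is the culprit is the RMS reprojection error that cv::calibrateCamera returns. A rough sketch with placeholder names (not from the question); boardPoints3D/boardPoints2D stand for the usual per-view chessboard correspondences collected during calibration:
std::vector<std::vector<cv::Point3f> > boardPoints3D;   // filled per calibration view
std::vector<std::vector<cv::Point2f> > boardPoints2D;   // filled per calibration view
cv::Size imageSize(640, 480);                           // capture resolution (assumed)
cv::Mat cameraMatrix, distCoeffs;
std::vector<cv::Mat> rvecs, tvecs;
double rms = cv::calibrateCamera(boardPoints3D, boardPoints2D, imageSize,
                                 cameraMatrix, distCoeffs, rvecs, tvecs);
std::cout << "RMS reprojection error: " << rms << " px" << std::endl;  // much above ~1 px is suspect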

Is it possible to use an inverse distance transform?

I have an input depth image of rocks which I have to segment. I have found a good way to detect the edges. I want to apply the Watershed Algorithm to segment the rocks. I need to apply the distance transform now to know the distance to the edges. But the results are not as expected.
I tried the various other options provided by OpenCV, including CV_DIST_FAIR and CV_DIST_WELSCH.
But I need something like an inverse of the distance transform, which shows the highest intensity within the rocks. I can then use this as markers.
// Tried to invert the binary image, but that doesn't work
//cv::bitwise_not(cannyedge_detected_image, cannyedge_detected_image);
// Finding the distance to the boundaries
cv::Mat distance_transformed_image, dst_distance_transform;
cv::distanceTransform(cannyedge_detected_image, distance_transformed_image, CV_DIST_L2, 3);
cv::normalize(distance_transformed_image, distance_transformed_image, 0, 1., cv::NORM_MINMAX);
cv::resize(distance_transformed_image, dst_distance_transform, cv::Size(image.cols * 3, image.rows * 3));
cv::imshow("Distance Transform", dst_distance_transform);
cv::waitKey(200);
The results look like this: [image]
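One common pattern for getting watershed markers from an edge map (assuming the Canny result is white edges on a black background, as in the snippet above) is to invert it so the rock interiors are non-zero, take the distance transform, and threshold the strong peaks. If the inversion "doesn't work", the edges are often not closed, so sealing the gaps with a dilation first may be needed. A rough sketch:
// Optional: close small gaps so the inversion does not leak between rocks
// cv::dilate(cannyedge_detected_image, cannyedge_detected_image, cv::Mat());

// distanceTransform measures the distance to the nearest ZERO pixel, so invert:
// rock interiors become non-zero, edges become zero
cv::Mat inverted_edges;
cv::bitwise_not(cannyedge_detected_image, inverted_edges);

cv::Mat dist;
cv::distanceTransform(inverted_edges, dist, CV_DIST_L2, 3);
cv::normalize(dist, dist, 0, 1.0, cv::NORM_MINMAX);   // peaks are now deep inside each rock

// Keep only the strong peaks as seed markers for watershed
cv::Mat peaks;
cv::threshold(dist, peaks, 0.5, 1.0, cv::THRESH_BINARY);
peaks.convertTo(peaks, CV_8U, 255);

// Label each peak blob; 'markers' (CV_32S) is what cv::watershed expects
std::vector<std::vector<cv::Point> > contours;
cv::findContours(peaks, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
cv::Mat markers = cv::Mat::zeros(peaks.size(), CV_32S);
for (size_t i = 0; i < contours.size(); i++)
    cv::drawContours(markers, contours, (int)i, cv::Scalar((int)i + 1), -1);
// cv::watershed(original_bgr_image, markers);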

Servo control with OpenCV

I will track an object according to the coordinates that I read from OpenCV. The thing is: in order to turn my servo 120 degrees in the positive direction, I need to send 3300 more to the servo (for 1 degree, I need to send 27.5 more).
Now I need to find a relation between the coordinates that I read from OpenCV and the value I need to send to the servo. However, I could not understand OpenCV's coordinates. For example, I do not change the object's height, I only decrease the distance between the object and the camera; in that case only the z value should decrease, yet it seems like the x value also changes significantly. What is the reason for that?
In case I have a problem with my code (maybe x is not changing, but I am reading it wrong), could you please give me information about OpenCV's coordinates and how to interpret them? As I said in the beginning, I need to find a relation like: how many degrees of turn of my servo correspond to how much change in the ball's x coordinate that I read from OpenCV?
Regards
Edit 1 for @FvD:
int i;
for (i = 0; i < circles->total; i++)
{
    float *p = (float*) cvGetSeqElem(circles, i);
    printf("Ball! x=%f y=%f r=%f\n\r", p[0], p[1], p[2]);
    CvPoint center = cvPoint(cvRound(p[0]), cvRound(p[1]));
    CvScalar val = cvGet2D(finalthreshold, center.y, center.x);
    if (val.val[0] < 1) continue;
    cvCircle(frame, center, 3, CV_RGB(0,255,0), -1, CV_AA, 0);
    cvCircle(frame, center, cvRound(p[2]), CV_RGB(255,0,0), 3, CV_AA, 0);
    cvCircle(finalthreshold, center, 3, CV_RGB(0,255,0), -1, CV_AA, 0);
    cvCircle(finalthreshold, center, cvRound(p[2]), CV_RGB(255,0,0), 3, CV_AA, 0);
}
In general, there are no OpenCV coordinates, but you will frequently use the columns and rows of an image matrix as image coordinates.
If you have calibrated your camera, you can transform those image coordinates to real-world coordinates. In the general case, you cannot pinpoint the location of an object in space with a single camera image, unless you have a second camera (stereo vision) or supplementary information about the scene, e.g. if you are detecting objects on the ground and you know the orientation and position of your camera relative to the ground plane. In that case, moving your ball towards the camera would result in unexpected movement because the assumption that it is lying on the ground is violated.
The coordinates in the code snippet you provided are image coordinates. The third "coordinate" is the radius of the circular blob detected in the webcam image and the first two are the column and row of the circle's center in the image.
I'm not sure how you are moving the ball in your test, but if the center of the ball stays stationary in your images and you still get differing x coordinates, you should look into the detection algorithm you are using.
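If the goal is just to pan the servo so the ball stays centred, one hedged approach is to convert the ball's image column into a horizontal angle using the calibrated focal length, and then into the servo's command units (the question states 27.5 units per degree). A rough sketch, not tested; fx and cx are the focal length and principal point from a camera calibration and are made-up placeholders here:
double fx = 700.0;                  // focal length in pixels (assumed)
double cx = 320.0;                  // principal point x (assumed, 640-wide image)
double ball_x = p[0];               // column of the detected circle centre, from the loop above

double angle_rad = atan((ball_x - cx) / fx);        // positive = ball right of image centre
double angle_deg = angle_rad * 180.0 / CV_PI;
int servo_delta  = (int)(angle_deg * 27.5);         // 27.5 servo units per degree (from the question)
// send servo_delta relative to the servo's current position to pan towards the ball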

How to determine world coordinates of a camera?

I have a rectangular target of known dimensions and location on a wall, and a mobile camera on a robot. As the robot is driving around the room, I need to locate the target and compute the location of the camera and its pose. As a further twist, the camera's elevation and azimuth can be changed using servos. I am able to locate the target using OpenCV, but I am still fuzzy on calculating the camera's position (actually, I've gotten a flat spot on my forehead from banging my head against a wall for the last week). Here is what I am doing:
1. Read in the previously computed camera intrinsics file
2. Get the pixel coordinates of the 4 points of the target rectangle from the contour
3. Call solvePnP with the world coordinates of the rectangle, the pixel coordinates, the camera matrix and the distortion matrix
4. Call projectPoints with the rotation and translation vectors
5. ???
I have read the OpenCV book, but I guess I'm just missing something on how to use the projected points, rotation and translation vectors to compute the world coordinates of the camera and its pose (I'm not a math wiz) :-(
Edit (2013-04-02):
Following the advice from "morynicz", I have written this simple standalone program.
#include <Windows.h>
#include "opencv\cv.h"

using namespace cv;

int main (int argc, char** argv)
{
    const char *calibration_filename = argc >= 2 ? argv [1] : "M1011_camera.xml";
    FileStorage camera_data (calibration_filename, FileStorage::READ);
    Mat camera_intrinsics, distortion;
    vector<Point3d> world_coords;
    vector<Point2d> pixel_coords;
    Mat rotation_vector, translation_vector, rotation_matrix, inverted_rotation_matrix, cw_translate;
    Mat camera_rotation_vector, camera_translation_vector;
    Mat cw_transform = cv::Mat::eye (4, 4, CV_64FC1);

    // Read camera data
    camera_data ["camera_matrix"] >> camera_intrinsics;
    camera_data ["distortion_coefficients"] >> distortion;
    camera_data.release ();

    // Target rectangle coordinates in feet
    world_coords.push_back (Point3d (10.91666666666667, 10.01041666666667, 0));
    world_coords.push_back (Point3d (10.91666666666667, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 10.01041666666667, 0));

    // Coordinates of rectangle in camera
    pixel_coords.push_back (Point2d (284, 204));
    pixel_coords.push_back (Point2d (286, 249));
    pixel_coords.push_back (Point2d (421, 259));
    pixel_coords.push_back (Point2d (416, 216));

    // Get vectors for world->camera transform
    solvePnP (world_coords, pixel_coords, camera_intrinsics, distortion, rotation_vector, translation_vector, false, 0);
    dump_matrix (rotation_vector, String ("Rotation vector"));       // dump_matrix is my own print helper
    dump_matrix (translation_vector, String ("Translation vector"));

    // We need the inverse of the world->camera transform (camera->world) to calculate
    // the camera's location
    Rodrigues (rotation_vector, rotation_matrix);
    Rodrigues (rotation_matrix.t (), camera_rotation_vector);
    Mat t = translation_vector.t ();
    camera_translation_vector = -camera_rotation_vector * t;

    printf ("Camera position %f, %f, %f\n", camera_translation_vector.at<double>(0), camera_translation_vector.at<double>(1), camera_translation_vector.at<double>(2));
    printf ("Camera pose %f, %f, %f\n", camera_rotation_vector.at<double>(0), camera_rotation_vector.at<double>(1), camera_rotation_vector.at<double>(2));
}
The pixel coordinates I used in my test are from a real image that was taken about 27 feet left of the target rectangle (which is 62 inches wide and 20 inches high), at about a 45 degree angle. The output is not what I'm expecting. What am I doing wrong?
Rotation vector: [2.7005, 0.0328, 0.4590]
Translation vector: [-10.4774, 8.1194, 13.9423]
Camera position: -28.293855, 21.926176, 37.650714
Camera pose: -2.700470, -0.032770, -0.459009
Will it be a problem if my world coordinates have the Y axis inverted from that of OpenCV's screen Y axis? (The origin of my coordinate system is on the floor to the left of the target, while OpenCV's origin is the top left of the screen.)
What units is the pose in?
You get the translation and rotation vectors from solvePnP, which tell you where the object is in the camera's coordinates. You need the inverse transform.
The object -> camera transform can be written as the matrix [R T; 0 1] in homogeneous coordinates. Using its special properties, the inverse of this matrix is [R^t -R^t*T; 0 1], where R^t is R transposed. You can get the R matrix from the Rodrigues transform. This way you get the translation vector and rotation matrix for the camera -> object transformation, i.e. the camera's position and orientation expressed in the object's coordinates.
If you know where the object lies in world coordinates, you can compose the object -> world transform with this camera -> object transform to get the camera -> world transform, and extract the camera's translation and pose from it.
The pose is described either by a single vector or by the R matrix; you will surely find it in your book. If it's "Learning OpenCV", you will find it on pages 401-402 :)
Looking at your code, you need to do something like this:
cv::Mat R;
cv::Rodrigues(rotation_vector, R);
cv::Mat cameraRotationVector;
cv::Rodrigues(R.t(),cameraRotationVector);
cv::Mat cameraTranslationVector = -R.t()*translation_vector;
cameraTranslationVector contains camera coordinates. cameraRotationVector contains camera pose.
It took me forever to understand it, but the pose is the rotation about each axis - x, y, z.
It is in radians. The values are between pi and minus pi (-3.14 to 3.14).
Edit:
I might have been mistaken. I read that the pose is a vector that indicates the axis of rotation, and the length of the vector indicates how much to rotate the camera around that vector.
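The second reading is the one Rodrigues uses: the vector is an axis-angle rotation, so its direction is the rotation axis and its norm is the angle in radians. A small sketch using cameraRotationVector from the code above (assuming <iostream> is included):
double angle = cv::norm(cameraRotationVector);    // rotation angle in radians
cv::Mat axis = cameraRotationVector / angle;      // unit-length rotation axis
std::cout << "rotate " << angle * 180.0 / CV_PI << " degrees about axis " << axis << std::endl;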

How do I set ROIs in OpenCV for face then mouth detection in video feeds?

We are using OpenCV for our project, and we plan to detect the face, then use that created rectangle as our ROI, then detect the mouth there. How do we set the ROI for videos? We searched how, but could only find answers for still images. We would like to set our ROI to the lower 1/3 or lower half of the detected face.
"haarcascade_mcs_mouth.xml" is used, but the rectangle is placed in the wrong spot. The mouth is detected near the right eyebrow.
You could first restrict the mouth-cascade search to the bottom region of the face you have already detected.
For images and videos, the process is the same.
And if the rectangle is in the wrong place, then you are probably making a mistake in computing the mouth's points with respect to the frame.
Actually
mouth_cascade.detectMultiScale(faceROI, mouth);
will give you coordinates w.r.t. the face ROI. So, when you are defining points for the coordinates of the mouth, you must make sure to add the coordinates of the face.
Example:
// faceROI is the face sub-image, e.g. Mat faceROI = frame_gray(faces[j]);
mouth_cascade.detectMultiScale(faceROI, mouth);
for (size_t i = 0; i < mouth.size(); i++)
{
    Point pt1(faces[j].x + mouth[i].x, faces[j].y + mouth[i].y);   // offset back to frame coordinates
    Point pt2(pt1.x + mouth[i].width, pt1.y + mouth[i].height);
    rectangle(frame, pt1, pt2, Scalar(0, 0, 0, 255), 1, 8, 0);
}
I hope I am clear with my view.
You should add your code, as it's a bit confusing what the actual problem here is.
There is no difference in setting an ROI for a video or for a picture; for a video you will simply have a loop where the Mat frame is continually updated. (I'm assuming you're using the C++ API and not the C API.)
As for how to create an ROI in the bottom half of the face, take a look at this tutorial (which, by the way, uses video) and the cv::CascadeClassifier::detectMultiScale() function.
If you look in the tutorial, you'll see that they create the face ROI like so:
Mat faceROI = frame_gray( faces[i] );
If you look at faces, you see that it's a std::vector< Rect >, so faces[i] is a Rect containing the face detected by face_cascade.detectMultiScale( ... ).
So instead of creating the faceROI directly using that Rect, use a different Rect that only contains the lower half. Take a look at what a cv::Rect is, and you'll find it is defined by the Rect.x and Rect.y coordinates of the top-left corner, and then by its Rect.width and Rect.height. So create the ROI accordingly:
Rect tmp = faces[i];                    // start from the original face Rect
tmp.y = faces[i].y + faces[i].height/2; // move the top edge down to the middle of the face
                                        // (in image coordinates the origin is the top-left corner and y increases downwards)
tmp.height = faces[i].height/2;         // keep only the lower half, so the ROI stays inside the face
Mat faceROI = frame_gray(tmp);
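Putting the two answers together, a rough per-frame sketch might look like this (not tested; frame, face_cascade and mouth_cascade are assumed to be set up as in the linked tutorial, and mapping the mouth rectangle back to full-frame coordinates is the step that is easy to get wrong):
cv::Mat frame_gray;
cv::cvtColor(frame, frame_gray, CV_BGR2GRAY);
cv::equalizeHist(frame_gray, frame_gray);

std::vector<cv::Rect> faces;
face_cascade.detectMultiScale(frame_gray, faces, 1.1, 3);

for (size_t i = 0; i < faces.size(); i++)
{
    // ROI = lower half of the detected face
    cv::Rect lower = faces[i];
    lower.y += lower.height / 2;
    lower.height /= 2;
    cv::Mat mouthROI = frame_gray(lower);

    std::vector<cv::Rect> mouths;
    mouth_cascade.detectMultiScale(mouthROI, mouths, 1.1, 3);

    for (size_t j = 0; j < mouths.size(); j++)
    {
        // mouths[j] is relative to 'lower', so add the ROI's top-left corner
        cv::Rect m = mouths[j];
        m.x += lower.x;
        m.y += lower.y;
        cv::rectangle(frame, m, cv::Scalar(0, 255, 0), 2);
    }
}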
