I will track an object according to the coordinates that I read from OpenCV. The thing is that, in order to turn my servo 120 degrees in the positive direction, I need to increase the value I send to the servo by 3300 (27.5 per degree).
Now I need to find a relation between the coordinates that I read from OpenCV and the value I need to send to the servo. However, I don't understand OpenCV's coordinates. For example, if I don't change the object's height and only decrease the distance between the object and the camera, only the z value should decrease, yet the x value also seems to change significantly. What is the reason for that?
In case I have a problem with my code (maybe x is not actually changing and I am reading it wrong), could you please give me information about OpenCV coordinates and how to interpret them? As I said in the beginning, I need to find a relation such as: how many degrees of servo rotation correspond to how much change in the ball's x coordinate that I read from OpenCV?
Regards
Edit 1, for @FvD:
int i;
for (i = 0; i < circles->total; i++)
{
    // p[0], p[1] are the column and row of the circle's center in the image; p[2] is its radius
    float *p = (float*)cvGetSeqElem(circles, i);
    printf("Ball! x=%f y=%f r=%f\n\r", p[0], p[1], p[2]);

    CvPoint center = cvPoint(cvRound(p[0]), cvRound(p[1]));

    // skip circles whose center is not set in the thresholded image
    CvScalar val = cvGet2D(finalthreshold, center.y, center.x);
    if (val.val[0] < 1) continue;

    cvCircle(frame, center, 3, CV_RGB(0,255,0), -1, CV_AA, 0);
    cvCircle(frame, center, cvRound(p[2]), CV_RGB(255,0,0), 3, CV_AA, 0);
    cvCircle(finalthreshold, center, 3, CV_RGB(0,255,0), -1, CV_AA, 0);
    cvCircle(finalthreshold, center, cvRound(p[2]), CV_RGB(255,0,0), 3, CV_AA, 0);
}
In general, there are no OpenCV coordinates, but you will frequently use the columns and rows of an image matrix as image coordinates.
If you have calibrated your camera, you can transform those image coordinates to real-world coordinates. In the general case, you cannot pinpoint the location of an object in space from a single camera image unless you add a second camera (stereo vision) or supplementary information about the scene, e.g. if you are detecting objects on the ground and you know the orientation and position of your camera relative to the ground plane. In that case, moving your ball towards the camera results in unexpected movement because the assumption that it is lying on the ground is violated.
The coordinates in the code snippet you provided are image coordinates. The third "coordinate" is the radius of the circular blob detected in the webcam image and the first two are the column and row of the circle's center in the image.
I'm not sure how you are moving the ball in your test, but if the center of the ball stays stationary in your images and you still get differing x coordinates, you should look into the detection algorithm you are using.
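If all you need is the mapping from the ball's x coordinate to a servo command, a common shortcut is to convert the pixel offset from the image center into an angle using the camera's horizontal field of view. Below is a minimal sketch of that idea; the field of view and image width are assumptions you would replace with your own values, and the 27.5 units per degree comes from your question:

const double HFOV_DEG = 60.0;          // assumed horizontal field of view of your camera
const int    IMAGE_WIDTH = 640;        // assumed frame width in pixels
const double UNITS_PER_DEGREE = 27.5;  // from your numbers: 3300 units for 120 degrees

// Convert the ball's x pixel coordinate into a servo correction (value to add to the command).
double servoCorrectionForX(double ballX)
{
    double offsetPixels  = ballX - IMAGE_WIDTH / 2.0;              // offset from the image center
    double offsetDegrees = offsetPixels * HFOV_DEG / IMAGE_WIDTH;  // linear approximation of the angle
    return offsetDegrees * UNITS_PER_DEGREE;
}

This ignores lens distortion and the tangent relation between pixel offset and angle; if the camera is calibrated, atan((x - cx) / fx) gives a more accurate angle.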
I think the easiest way to explain the problem is with an image:
I have two cubes (of the same size) lying on a table. One side of each is marked with green colour (for easy tracking). I want to calculate the relative position (x,y) of the left cube with respect to the right cube (the red line in the picture), in units of the cube size.
Is it even possible? I know the problem would be simple if those two green sides shared a common plane, like the top side of the cube, but I can't use that for tracking. I would just calculate the homography for one square and multiply it with the other cube's corner.
Should I 'rotate' the homography matrix by multiplying it with a 90-degree rotation matrix to get the 'ground' homography? I plan to do the processing on a smartphone, so perhaps the gyroscope or the camera intrinsic parameters could be of use.
This is possible.
Let's assume (or state) that the table is the z=0 plane and that your first box is at the origin of this plane. This means that the green corners of the left box have the (table) coordinates (0,0,0), (1,0,0), (0,0,1) and (1,0,1) (your box has size 1).
You also have the pixel coordinates of these points. If you give these 2D and 3D values (as well as the intrinsics and distortion of the camera) to cv::solvePnP, you get the relative pose of the camera to your box (and the plane).
In the next step, you have to intersect the table plane with the ray that goes from your camera's center through the lower-right corner pixel of the second green box. This intersection will look like (x,y,0), and [x-1, y] will be the translation between the right corners of your boxes.
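A minimal sketch of that recipe might look like the following. The intrinsics, distortion coefficients and pixel coordinates below are placeholders that you would replace with your own calibration and detections:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // 3D corners of the left cube's green face (table = z=0 plane, cube size = 1)
    std::vector<cv::Point3f> objectPoints = { {0,0,0}, {1,0,0}, {0,0,1}, {1,0,1} };

    // matching pixel coordinates of those corners -- placeholder values
    std::vector<cv::Point2f> imagePoints = { {100,300}, {300,300}, {100,100}, {300,100} };

    // placeholder intrinsics and distortion -- replace with your calibration
    cv::Mat cameraMatrix = (cv::Mat_<double>(3,3) << 800, 0, 320,
                                                       0, 800, 240,
                                                       0,   0,   1);
    cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);

    // pose of the camera relative to the box / table plane
    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    // camera center in table coordinates: C = -R^T * t
    cv::Mat R;
    cv::Rodrigues(rvec, R);
    cv::Mat C = -R.t() * tvec;

    // ray from the camera center through the lower-right corner pixel of the second cube (placeholder pixel)
    double u = 700, v = 350;
    cv::Mat pixel = (cv::Mat_<double>(3,1) << u, v, 1.0);
    cv::Mat dir = R.t() * cameraMatrix.inv() * pixel;   // ray direction in table coordinates

    // intersect the ray C + s*dir with the table plane z = 0
    double s = -C.at<double>(2) / dir.at<double>(2);
    cv::Mat hit = C + s * dir;                          // (x, y, 0) on the table

    // (x - 1, y) is then the translation between the right corners of the two cubes
    std::cout << "corner on table: (" << hit.at<double>(0) << ", " << hit.at<double>(1) << ")" << std::endl;
    return 0;
}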
If you have all the information (camera intrinsics) you can do it the way FooBar answered.
But you can use the information that the points lie on a plane even more directly with a homography (no need to calculate rays etc):
Compute the homography between the image plane and the ground plane.
Unfortunately you need 4 point correspondences, but there are only 3 cube-points visible in the image, touching the ground plane.
Instead you can use the top-plane of the cubes, where the same distance can be measured.
First, the code:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // calibrate plane distance for boxes
    cv::Mat input = cv::imread("../inputData/BoxPlane.jpg");

    // if we had 4 known points on the ground plane, we could use the ground plane, but here we use the top plane instead
    // points on the real-world plane at height = 1: measured on the "top plane" of the cube, not on the ground plane
    std::vector<cv::Point2f> objectPoints;
    objectPoints.push_back(cv::Point2f(0,0)); // top front
    objectPoints.push_back(cv::Point2f(1,0)); // top right
    objectPoints.push_back(cv::Point2f(0,1)); // top left
    objectPoints.push_back(cv::Point2f(1,1)); // top back

    // image points:
    std::vector<cv::Point2f> imagePoints;
    imagePoints.push_back(cv::Point2f(141,302)); // top front
    imagePoints.push_back(cv::Point2f(334,232)); // top right
    imagePoints.push_back(cv::Point2f(42,231));  // top left
    imagePoints.push_back(cv::Point2f(223,177)); // top back

    cv::Point2f pointToMeasureInImage(741,200); // bottom right of second box

    // for the transform we need the point(s) to be in a vector
    std::vector<cv::Point2f> sourcePoints;
    sourcePoints.push_back(pointToMeasureInImage);
    //sourcePoints.push_back(pointToMeasureInImage);
    sourcePoints.push_back(cv::Point2f(718,141));
    sourcePoints.push_back(imagePoints[0]);

    // list with points that correspond to sourcePoints. This is not needed but used to create some output
    std::vector<int> distMeasureIndices;
    distMeasureIndices.push_back(1);
    //distMeasureIndices.push_back(0);
    distMeasureIndices.push_back(3);
    distMeasureIndices.push_back(2);

    // draw points for visualization
    for(unsigned int i=0; i<imagePoints.size(); ++i)
    {
        cv::circle(input, imagePoints[i], 5, cv::Scalar(0,255,255));
    }
    //cv::circle(input, pointToMeasureInImage, 5, cv::Scalar(0,255,255));
    //cv::line(input, imagePoints[1], pointToMeasureInImage, cv::Scalar(0,255,255), 2);

    // compute the relation between the image plane and the real-world top plane of the cubes
    cv::Mat homography = cv::findHomography(imagePoints, objectPoints);

    std::vector<cv::Point2f> destinationPoints;
    cv::perspectiveTransform(sourcePoints, destinationPoints, homography);

    // compute the distance between some defined points (here I use the input points but could be something else)
    for(unsigned int i=0; i<sourcePoints.size(); ++i)
    {
        std::cout << "distance: " << cv::norm(destinationPoints[i] - objectPoints[distMeasureIndices[i]]) << std::endl;

        cv::circle(input, sourcePoints[i], 5, cv::Scalar(0,255,255));
        // draw the line which was measured
        cv::line(input, imagePoints[distMeasureIndices[i]], sourcePoints[i], cv::Scalar(0,255,255), 2);
    }

    // just for fun, measure distances on the 2nd box:
    float distOn2ndBox = cv::norm(destinationPoints[0]-destinationPoints[1]);
    std::cout << "distance on 2nd box: " << distOn2ndBox << " which should be near 1.0" << std::endl;
    cv::line(input, sourcePoints[0], sourcePoints[1], cv::Scalar(255,0,255), 2);

    cv::imshow("input", input);
    cv::waitKey(0);
    return 0;
}
Here's the output which I want to explain:
distance: 2.04674
distance: 2.82184
distance: 1
distance on 2nd box: 0.882265 which should be near 1.0
Those distances are:
1. the yellow bottom one, from one box to the other
2. the yellow top one
3. the yellow one on the first box
4. the pink one
So the red line (the one you asked for) should have a length of almost exactly 2 x the cube side length, but as you can see we have some error.
The more accurate your pixel positions are before the homography computation, the more exact your results will be.
You need a pinhole camera model, so undistort your camera images (in a real-world application).
Keep in mind, too, that you could compute the distances on the ground plane if you had 4 points visible there that don't all lie on the same line!
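For completeness, a minimal sketch of that undistortion step; the camera matrix and distortion coefficients below are placeholders standing in for the output of a previous cv::calibrateCamera run:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat input = cv::imread("../inputData/BoxPlane.jpg");

    // placeholder calibration data -- replace with your own camera matrix and distortion coefficients
    cv::Mat cameraMatrix = (cv::Mat_<double>(3,3) << 800, 0, input.cols/2.0,
                                                       0, 800, input.rows/2.0,
                                                       0,   0,   1);
    cv::Mat distCoeffs = (cv::Mat_<double>(5,1) << -0.1, 0.01, 0, 0, 0);

    // remove lens distortion so that the pinhole model (and therefore the homography) holds
    cv::Mat undistorted;
    cv::undistort(input, undistorted, cameraMatrix, distCoeffs);

    cv::imshow("undistorted", undistorted);
    cv::waitKey(0);
    return 0;
}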
I have a mesh that is stored as an array of Vertices with an Index array used to draw it. Four of the vertices are also redrawn with a shader to highlight the points, and the indices for these are stored in another array.
The user can rotate the model using touches, which affects the modelViewMatrix:
modelViewMatrix = GLKMatrix4Multiply(modelViewMatrix, _rotMatrix);
My problem is that I need to detect which of my four highlighted points is closest to the screen when the user makes a rotation.
I think the best method would be to calculate the distance from the near clip plane of the view frustum to the point, but how do I calculate those points in the first place?
You can do this easily from camera/eye space[1], where everything is relative to the camera (So, the camera will be at (0, 0, 0) and looking down the negative z axis).
Use your modelViewMatrix to transform the vertex to camera space, say vertex_cs. The distance of the vertex from the camera (plane) is then simply -vertex_cs.z.
--
1. What exactly are eye space coordinates?
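A minimal sketch of that test with GLKit; the array name highlightVertices is a hypothetical stand-in for wherever your four highlighted vertices are stored, and modelViewMatrix is assumed to be the same matrix you use for rendering:

#import <GLKit/GLKit.h>
#include <float.h>

// Returns the index (0-3) of the highlighted vertex closest to the camera.
static int closestHighlightIndex(GLKMatrix4 modelViewMatrix, const GLKVector3 highlightVertices[4])
{
    int closestIndex = -1;
    float closestDistance = FLT_MAX;
    for (int i = 0; i < 4; i++) {
        GLKVector3 v = highlightVertices[i];
        // transform the vertex into camera/eye space
        GLKVector4 vertex_cs = GLKMatrix4MultiplyVector4(modelViewMatrix,
                                                         GLKVector4Make(v.x, v.y, v.z, 1.0f));
        float distance = -vertex_cs.z;   // distance in front of the camera plane
        if (distance < closestDistance) {
            closestDistance = distance;
            closestIndex = i;
        }
    }
    return closestIndex;
}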
I am totally new to OpenCV and Xcode.
I'm trying to find traffic signs using colour and Hough circle detection. This is what I have done so far:
cv::cvtColor(cvImage, cvGrayImage, CV_RGB2HSV);
cv::Mat cvThresh;
cv::inRange(cvGrayImage,cv::Scalar(170,160,10),cv::Scalar(180,255,256),cvThresh);
//cv::dilate(cvThresh,cvThresh, cv::Mat(),cv::Point(-1,-1),2,1,1);
cv::GaussianBlur(cvThresh, cvThresh, cv::Size(9,9), 2,2);
cv::vector<cv::Vec3f> circles;
cv::HoughCircles(cvThresh, circles, CV_HOUGH_GRADIENT, 1,cvThresh.rows/4,200,30);
// NSLog(@"Circles: %ld", circles.size());
for(size_t i = 0; i < circles.size(); i++)
{
cv::Point center(cvRound(circles[i][0]), cvRound(circles[i][1]));  // circles[i][1] is the y coordinate; circles[i][2] is the radius
int radius = cvRound(circles[i][2]);
cv::circle(cvImage, center, 3, cv::Scalar(255,0,0), -1, 8, 0);
cv::circle(cvImage, center, radius, cv::Scalar(0,0,255),3,8,0);
}
This is my result, but I have no idea where to go from here.
Any idea or advice or sample code would be appreciated.
If you have an idea what size circles you are looking for, then it would be best to set min_radius and max_radius accordingly. Otherwise, it will return anything circular of any size.
Parameters 1 and 2 don't affect accuracy as such, more reliability.
Param 1 will set the sensitivity; how strong the edges of the circles need to be. Too high and it won't detect anything, too low and it will find too much clutter.
Param 2 will set how many edge points it needs to find to declare that it's found a circle. Again, too high will detect nothing, too low will declare anything to be a circle. The ideal value of param 2 will be related to the circumference of the circles.
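For example, the HoughCircles call from the question could be extended with explicit radius bounds. The radius limits below are illustrative guesses rather than tuned values, and cvThresh is the blurred threshold image from the question:

std::vector<cv::Vec3f> circles;
cv::HoughCircles(cvThresh, circles, CV_HOUGH_GRADIENT,
                 1,                  // dp: accumulator with the same resolution as the image
                 cvThresh.rows / 4,  // minDist: minimum distance between detected centers
                 200,                // param1: Canny upper threshold (edge strength)
                 30,                 // param2: accumulator votes needed to declare a circle
                 20,                 // min_radius: ignore anything smaller (guess)
                 100);               // max_radius: ignore anything larger (guess)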
I have a rectangular target of known dimensions and location on a wall, and a mobile camera on a robot. As the robot is driving around the room, I need to locate the target and compute the location of the camera and its pose. As a further twist, the camera's elevation and azimuth can be changed using servos. I am able to locate the target using OpenCV, but I am still fuzzy on calculating the camera's position (actually, I've gotten a flat spot on my forehead from banging my head against a wall for the last week). Here is what I am doing:
Read in previously computed camera intrinsics file
Get the pixel coordinates of the 4 points of the target rectangle from the contour
Call solvePnP with the world coordinates of the rectangle, the pixel coordinates, the camera matrix and the distortion matrix
Call projectPoints with the rotation and translation vectors
???
I have read the OpenCV book, but I guess I'm just missing something on how to use the projected points, rotation and translation vectors to compute the world coordinates of the camera and its pose (I'm not a math wiz) :-(
Edit (2013-04-02):
Following the advice from "morynicz", I have written this simple standalone program.
#include <Windows.h>
#include "opencv\cv.h"

using namespace cv;

int main (int argc, char** argv)
{
    const char *calibration_filename = argc >= 2 ? argv [1] : "M1011_camera.xml";
    FileStorage camera_data (calibration_filename, FileStorage::READ);
    Mat camera_intrinsics, distortion;
    vector<Point3d> world_coords;
    vector<Point2d> pixel_coords;
    Mat rotation_vector, translation_vector, rotation_matrix, inverted_rotation_matrix, cw_translate;
    Mat camera_rotation_vector, camera_translation_vector;   // declarations added: these are used below
    Mat cw_transform = cv::Mat::eye (4, 4, CV_64FC1);

    // Read camera data
    camera_data ["camera_matrix"] >> camera_intrinsics;
    camera_data ["distortion_coefficients"] >> distortion;
    camera_data.release ();

    // Target rectangle coordinates in feet
    world_coords.push_back (Point3d (10.91666666666667, 10.01041666666667, 0));
    world_coords.push_back (Point3d (10.91666666666667, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 10.01041666666667, 0));

    // Coordinates of rectangle in camera
    pixel_coords.push_back (Point2d (284, 204));
    pixel_coords.push_back (Point2d (286, 249));
    pixel_coords.push_back (Point2d (421, 259));
    pixel_coords.push_back (Point2d (416, 216));

    // Get vectors for world->camera transform
    solvePnP (world_coords, pixel_coords, camera_intrinsics, distortion, rotation_vector, translation_vector, false, 0);
    dump_matrix (rotation_vector, String ("Rotation vector"));       // dump_matrix: printing helper (definition not shown)
    dump_matrix (translation_vector, String ("Translation vector"));

    // We need the inverse of the world->camera transform (camera->world) to calculate
    // the camera's location
    Rodrigues (rotation_vector, rotation_matrix);
    Rodrigues (rotation_matrix.t (), camera_rotation_vector);
    Mat t = translation_vector.t ();
    camera_translation_vector = -camera_rotation_vector * t;

    printf ("Camera position %f, %f, %f\n", camera_translation_vector.at<double>(0), camera_translation_vector.at<double>(1), camera_translation_vector.at<double>(2));
    printf ("Camera pose %f, %f, %f\n", camera_rotation_vector.at<double>(0), camera_rotation_vector.at<double>(1), camera_rotation_vector.at<double>(2));
}
The pixel coordinates I used in my test are from a real image that was taken about 27 feet left of the target rectangle (which is 62 inches wide and 20 inches high), at about a 45 degree angle. The output is not what I'm expecting. What am I doing wrong?
Rotation vector
2.7005
0.0328
0.4590
Translation vector
-10.4774
8.1194
13.9423
Camera position -28.293855, 21.926176, 37.650714
Camera pose -2.700470, -0.032770, -0.459009
Will it be a problem if my world coordinates have the Y axis inverted from that of OpenCV's screen Y axis? (The origin of my coordinate system is on the floor to the left of the target, while OpenCV's origin is the top-left corner of the screen.)
What units is the pose in?
You get the translation and rotation vectors from solvePnP, which tell you where the object is in the camera's coordinates. You need the inverse transform.
The transform camera -> object can be written as the matrix [R T; 0 1] for homogeneous coordinates. Using its special properties, the inverse of this matrix is [R^t -R^t*T; 0 1], where R^t is R transposed. You can get the R matrix from the Rodrigues transform. This way you get the translation vector and rotation matrix for the object -> camera transformation.
If you know where the object lies in world coordinates, you can use the world -> object transform multiplied by the object -> camera transform to extract the camera's translation and pose.
The pose is described either by a single vector or by the R matrix; you will surely find it in your book. If it's "Learning OpenCV", you will find it on pages 401 - 402 :)
Looking at your code, you need to do something like this:
cv::Mat R;
cv::Rodrigues(rotation_vector, R);
cv::Mat cameraRotationVector;
cv::Rodrigues(R.t(),cameraRotationVector);
cv::Mat cameraTranslationVector = -R.t()*translation_vector;
cameraTranslationVector contains camera coordinates. cameraRotationVector contains camera pose.
It took me forever to understand it, but the pose is the rotation about each axis: x, y, z.
It is in radians, with values between minus pi and pi (-3.14 to 3.14).
Edit:
I might have been mistaken. I read that the pose is the vector which indicates the direction of the camera, and the length of the vector indicates how much to rotate the camera around that vector.
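To make that interpretation concrete: the rotation vector is in axis-angle form, so its length is the rotation angle in radians and the normalized vector is the axis being rotated around. A small sketch, using the pose values printed in the question purely as example data:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // the "Camera pose" vector from the question's output, used only for illustration
    cv::Mat rvec = (cv::Mat_<double>(3,1) << -2.700470, -0.032770, -0.459009);

    double angle = cv::norm(rvec);   // how far to rotate, in radians
    cv::Mat axis = rvec / angle;     // unit vector: the axis to rotate around

    std::cout << "angle (radians): " << angle << std::endl;
    std::cout << "axis: " << axis << std::endl;
    return 0;
}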
We are using OpenCV for our project. We plan to detect the face, use the resulting rectangle as our ROI, and then detect the mouth inside it. How do we set the ROI for videos? We searched for how to do this, but could only find answers for still images. We would like to set our ROI to the lower 1/3 or lower half of the detected face.
"haarcascade_mcs_mouth.xml" is used, but the rectangle is placed in the wrong spot. The mouth is detected near the right eyebrow.
You could first restrict the mouth cascade search to the bottom face region you have already detected.
For images and videos, the process is the same.
And if the rectangle is in the wrong place, then you are probably making a mistake when converting the mouth coordinates back to the coordinates of the frame.
Actually,
mouth_cascade.detectMultiScale(faceROI, mouth);
will give you coordinates w.r.t. the face ROI. So, when you define the points for the mouth's coordinates, you must add the face's coordinates.
Ex.
// faceROI is the sub-image of the frame covered by the detected face rectangle
cv::Mat faceROI = frame(faces[j]);
mouth_cascade.detectMultiScale(faceROI, mouth);
for (size_t i = 0; i < mouth.size(); i++)
{
    // mouth[i] is relative to the face ROI, so add the face's offset to get frame coordinates
    cv::Point pt1(faces[j].x + mouth[i].x, faces[j].y + mouth[i].y);
    cv::Point pt2(pt1.x + mouth[i].width, pt1.y + mouth[i].height);
    cv::rectangle(frame, pt1, pt2, cv::Scalar(0, 0, 0, 255), 1, 8, 0);
}
I hope my point is clear.
You should add your code, as it's a bit confusing what the actual problem here is.
There is no difference between setting a ROI for a video and for a picture; for a video you will simply have a loop where the Mat frame is continually updated. (I'm assuming you're using the C++ API and not the C API.)
As for how to create a ROI in the bottom half of the face, take a look at this tutorial (which btw uses video) and the CascadeClassifier::detectMultiScale() function.
If you look in the tutorial, you'll see that they create the face ROI like so:
Mat faceROI = frame_gray( faces[i] );
If you look at faces, you see that it's a std::vector< Rect >, so faces[i] is a Rect containing the face detected by face_cascade.detectMultiScale( ... ).
So instead of creating the faceROI directly from that Rect, use a different Rect that only contains the lower half. Take a look at what a cv::Rect is, and you'll find it is defined by the Rect.x and Rect.y coordinates of the top-left corner, and then by its Rect.width and Rect.height. So create the ROI accordingly:
Rect tmp = faces[i];                     // start from the original face Rect
tmp.y = faces[i].y + faces[i].height/2;  // move the top edge down to the middle of the face
                                         // (in image coordinates the origin is the top-left corner and y increases downwards)
tmp.height = faces[i].height/2;          // keep only the lower half, so the ROI stays inside the face
Mat faceROI = frame_gray(tmp);