OpenCV Get distance to rotated fiducial marker (Aruco) - opencv

(In iOS) I need to determine the distance to a fiducial marker which may be rotated.
I'm using the Aruco library and I have so far implemented marker detection and pose estimation in iOS, but what I really need is the distance to that marker from the camera.
I've seen some examples where you use the real size of the marker, focal length of the camera and the size on screen of the marker to compute distance, but this doesn't take into account any rotation applied to the marker.
As I have the pose estimation working I'm "guessing" that it should be possible to un-rotate the corner points of the marker then use the bounding box of those points, along with the real size and camera focal length. Though I'm not entirely sure that is correct, or how to implement it.
This is my first time using OpenCV so I'm really just guessing at the moment.
Any and all help is very much appreciated.
Many thanks

I use the code below in my project to detect Z distance between camera and marker. I'm not sure whether it's completely correct or not, but it gives acceptable results.
Ptr<aruco::DetectorParameters> detectorParams = aruco::DetectorParameters::create();
detectorParams->doCornerRefinement = true;
Ptr<aruco::Dictionary> dictionary = aruco::getPredefinedDictionary(aruco::PREDEFINED_DICTIONARY_NAME(kMarkerDictionaryID));
Mat camMatrix, distCoeffs;
if (!readCameraParameters(kCameraParametersFile, camMatrix, distCoeffs)) {
// ...
return;
}
VideoCapture inputVideo;
inputVideo.open(kCameraID);
while (inputVideo.grab()) {
Mat image;
inputVideo.retrieve(image);
vector< int > ids;
vector< vector< Point2f > > corners;
vector< Vec3d > rvecs, tvecs;
aruco::detectMarkers(image, dictionary, corners, ids, detectorParams);
if (ids.size() > 0) {
aruco::estimatePoseSingleMarkers(corners, kMarkerLengthInMeters, camMatrix, distCoeffs, rvecs, tvecs);
for (unsigned int i = 0; i < ids.size(); i++) {
// if (ids[i] != kMarkerID)
// continue;
// Calc camera pose
Mat R;
Rodrigues(rvecs[i], R);
Mat cameraPose = -R.t() * (Mat)tvecs[i];
double x = cameraPose.at<double>(0,0);
double y = cameraPose.at<double>(0,1);
double z = cameraPose.at<double>(0,2);
// So z is the distance between camera and marker
// Or if you need rotation invariant offsets
// x = tvecs[i][0];
// y = tvecs[i][0];
// z = tvecs[i][0];
cout << "X: " << x << " Y: " << y << " Z: " << z << endl;
aruco::drawAxis(image, camMatrix, distCoeffs, rvecs[i], tvecs[i], kMarkerLengthInMeters * 0.5f);
}
aruco::drawDetectedMarkers(image, corners, resultIds);
}
resize(image, image, Size(image.cols / 2, image.rows / 2));
imshow("out", image);
char key = (char)waitKey(kWaitTimeInmS);
if (key == 27) break;
}

Related

Image perspective correction and measurement with a homography

I want to calibrate a camera in order to make real world measurements of distance between features in an image. I'm using the OpenCV demo images for this example. I'd like to know if what I'm doing is valid and/or if there is another/better way to go about it. I could use a checkerboard of known pitch to calibrate my camera. But in this example the pose of the camera is as shown in the image below. So I can't just calculate px/mm to make my real world distance measurements, I need to correct for the pose first.
I find the chessboard corners and calibrate the camera with a call to OpenCV's calibrateCamera which gives the intrinsics, extrinsics and distortion parameters.
Now, I want to be able to make measurements with the camera in this same pose. As an example I'll try and measure between two corners on the image above. I want to do a perspective correction on the image so I can effectively get a birds eye view. I do this using the method described here. The idea is that the rotation for camera pose that gets a birds eye view of this image is a rotation on the z-axis. Below is the rotation matrix.
Then following this OpenCV homography example I calculate the homography between the original view and my desired bird's eye view. If I do this on the image above I get an image that looks good, see below. Now that I have my perspective rectified image I find the corners again with findChessboardCorners and I calculate an average pixel distance between corners of ~36 pixels. If the distance between corners in rw units is 25mm I can say my px/mm scaling is 36/25 = 1.44 px/mm. Now for any image taken with the camera in this pose I can rectify the image and use this pixel scaling to measure distance between objects in the image. Is there a better way to allow for real world distance measurements here? Is it possible to do this perspective correction for pixels only? For example if I find the pixel locations of two corners in the original image can I apply the image rectification on the pixel coordinates only? Rather than on the entire image which can be computationally expensive? Just trying to deepen my understanding. Thanks
Some of my code
void MyCalibrateCamera(float squareSize, const std::string& imgPath, const Size& patternSize)
{
Mat img = imread(samples::findFile(imgPath));
Mat img_corners = img.clone(), img_pose = img.clone();
//! [find-chessboard-corners]
vector<Point2f> corners;
bool found = findChessboardCorners(img, patternSize, corners);
//! [find-chessboard-corners]
if (!found)
{
cout << "Cannot find chessboard corners." << endl;
return;
}
drawChessboardCorners(img_corners, patternSize, corners, found);
imshow("Chessboard corners detection", img_corners);
waitKey();
//! [compute-object-points]
vector<Point3f> objectPoints;
calcChessboardCorners(patternSize, squareSize, objectPoints);
vector<Point2f> objectPointsPlanar;
vector <vector <Point3f>> objectPointsArray;
vector <vector <Point2f>> imagePointsArray;
imagePointsArray.push_back(corners);
objectPointsArray.push_back(objectPoints);
Mat intrinsics;
Mat distortion;
vector <Mat> rotation;
vector <Mat> translation;
double RMSError = calibrateCamera(
objectPointsArray,
imagePointsArray,
img.size(),
intrinsics,
distortion,
rotation,
translation,
CALIB_ZERO_TANGENT_DIST |
CALIB_FIX_K3 | CALIB_FIX_K4 | CALIB_FIX_K5 |
CALIB_FIX_ASPECT_RATIO);
cout << "intrinsics: " << intrinsics << endl;
cout << "rotation: " << rotation.at(0) << endl;
cout << "translation: " << translation.at(0) << endl;
cout << "distortion: " << distortion << endl;
drawFrameAxes(img_pose, intrinsics, distortion, rotation.at(0), translation.at(0), 2 * squareSize);
imshow("FrameAxes", img_pose);
waitKey();
//todo: is this a valid px/mm measure?
float px_to_mm = intrinsics.at<double>(0, 0) / (translation.at(0).at<double>(2,0) * 1000);
//undistort the image
Mat imgUndistorted, map1, map2;
initUndistortRectifyMap(
intrinsics,
distortion,
Mat(),
getOptimalNewCameraMatrix(
intrinsics,
distortion,
img.size(),
1,
img.size(),
0),
img.size(),
CV_16SC2,
map1,
map2);
remap(
img,
imgUndistorted,
map1,
map2,
INTER_LINEAR);
imshow("OrgImg", img);
waitKey();
imshow("UndistortedImg", imgUndistorted);
waitKey();
Mat img_bird_eye_view = img.clone();
//rectify
// https://docs.opencv.org/3.4.0/d9/dab/tutorial_homography.html#tutorial_homography_Demo3
// https://stackoverflow.com/questions/48576087/birds-eye-view-perspective-transformation-from-camera-calibration-opencv-python
//Get to a birds eye view, or -90 degrees z rotation
Mat rvec = rotation.at(0);
Mat tvec = translation.at(0);
//-90 degrees z. Required depends on the frame axes.
Mat R_desired = (Mat_<double>(3, 3) <<
0, 1, 0,
-1, 0, 0,
0, 0, 1);
//Get 3x3 rotation matrix from rotation vector
Mat R;
Rodrigues(rvec, R);
//compute the normal to the camera frame
Mat normal = (Mat_<double>(3, 1) << 0, 0, 1);
Mat normal1 = R * normal;
//compute d, distance . dot product between the normal and a point on the plane.
Mat origin(3, 1, CV_64F, Scalar(0));
Mat origin1 = R * origin + tvec;
double d_inv1 = 1.0 / normal1.dot(origin1);
//compute the homography to go from the camera view to the desired view
Mat R_1to2, tvec_1to2;
Mat tvec_desired = tvec.clone();
//get the displacement between our views
computeC2MC1(R, tvec, R_desired, tvec_desired, R_1to2, tvec_1to2);
//now calculate the euclidean homography
Mat H = R_1to2 + d_inv1 * tvec_1to2 * normal1.t();
//now the projective homography
H = intrinsics * H * intrinsics.inv();
H = H / H.at<double>(2, 2);
std::cout << "H:\n" << H << std::endl;
Mat imgToWarp = imgUndistorted.clone();
warpPerspective(imgToWarp, img_bird_eye_view, H, img.size());
Mat compare;
hconcat(imgToWarp, img_bird_eye_view, compare);
imshow("Bird eye view", compare);
waitKey();
...

SceneKit 3D Marker Augmented Reality iOS

Last couple of weeks I've been working on developing a simple proof-of-concept application in which a 3D model is projected over a specific Augmented Reality marker (in my case I am using Aruco markers) in IOS (with Swift and Objective-C)
I calibrated an Ipad Camera with a specific fixed lens position and used that to estimate the pose of the AR marker (which from my debug analysis seem pretty accurate). The problem seems (surprise, surprise) when I try to use SceneKit scene to project a model over the marker.
I am aware that the axis in opencv and SceneKit are different (Y and Z) and already done this correction as well as the row order/column order difference between the two libraries.
After constructing the projection matrix, I apply that same transform to the 3D model and from my debug analysis the object seems to be translated to the desired position and with the desired rotation. The problem is that it never does overlap the specific image pixel position of the marker. I am using a AVCapturePreviewVideoLayer as to put the video in background that has the same bounds as my SceneKit View.
Has anyone has a clue why this happens? I tried to play with cameras FOV's but with no real impact in the results.
Thank you all for your time.
EDIT1: I Will post some of the code here to reveal what I am currently doing.
I have two subviews inside the main view, one which is a background AVCaptureVideoPreviewLayer and another which is a SceneKitView. Both have the same bounds as the main view.
At each frame I use an opencv wrapper which outputs the pose of each marker:
std::vector<int> ids;
std::vector<std::vector<cv::Point2f>> corners, rejected;
cv::aruco::detectMarkers(frame, _dictionary, corners, ids, _detectorParams, rejected);
if (ids.size() > 0 ){
cv::aruco::drawDetectedMarkers(frame, corners, ids);
cv::Mat rvecs, tvecs;
cv::aruco::estimatePoseSingleMarkers(corners, 2.6, _intrinsicMatrix, _distCoeffs, rvecs, tvecs);
// Let's protect ourselves agains multiple markers
if (rvecs.total() > 1)
return;
_markerFound = true;
cv::Rodrigues(rvecs, _currentR);
_currentT = tvecs;
for (int row = 0; row < _currentR.rows; row++){
for (int col = 0; col < _currentR.cols; col++){
_currentExtrinsics.at<double>(row, col) = _currentR.at<double>(row, col);
}
_currentExtrinsics.at<double>(row, 3) = _currentT.at<double>(row);
}
_currentExtrinsics.at<double>(3,3) = 1;
std::cout << tvecs << std::endl;
// Convert coordinate systems of opencv to openGL (SceneKit)
// Note that in openCV z goes away the camera (in openGL goes into the camera)
// and y points down and on openGL point up
// Another note: openCV has a column order matrix representation, while SceneKit
// has a row order matrix, but we'll take care of it later.
cv::Mat cvToGl = cv::Mat::zeros(4, 4, CV_64F);
cvToGl.at<double>(0,0) = 1.0f;
cvToGl.at<double>(1,1) = -1.0f; // invert the y axis
cvToGl.at<double>(2,2) = -1.0f; // invert the z axis
cvToGl.at<double>(3,3) = 1.0f;
_currentExtrinsics = cvToGl * _currentExtrinsics;
cv::aruco::drawAxis(frame, _intrinsicMatrix, _distCoeffs, rvecs, tvecs, 5);
Then in each frame I convert the opencv matrix for a SCN4Matrix:
- (SCNMatrix4) transformToSceneKit:(cv::Mat&) openCVTransformation{
SCNMatrix4 mat = SCNMatrix4Identity;
// Transpose
openCVTransformation = openCVTransformation.t();
// copy the rotationRows
mat.m11 = (float) openCVTransformation.at<double>(0, 0);
mat.m12 = (float) openCVTransformation.at<double>(0, 1);
mat.m13 = (float) openCVTransformation.at<double>(0, 2);
mat.m14 = (float) openCVTransformation.at<double>(0, 3);
mat.m21 = (float)openCVTransformation.at<double>(1, 0);
mat.m22 = (float)openCVTransformation.at<double>(1, 1);
mat.m23 = (float)openCVTransformation.at<double>(1, 2);
mat.m24 = (float)openCVTransformation.at<double>(1, 3);
mat.m31 = (float)openCVTransformation.at<double>(2, 0);
mat.m32 = (float)openCVTransformation.at<double>(2, 1);
mat.m33 = (float)openCVTransformation.at<double>(2, 2);
mat.m34 = (float)openCVTransformation.at<double>(2, 3);
//copy the translation row
mat.m41 = (float)openCVTransformation.at<double>(3, 0);
mat.m42 = (float)openCVTransformation.at<double>(3, 1)+2.5;
mat.m43 = (float)openCVTransformation.at<double>(3, 2);
mat.m44 = (float)openCVTransformation.at<double>(3, 3);
return mat;
}
At each frame in which the AR marker is found I add a box to the scene and apply the transformation to the object node:
SCNBox *box = [SCNBox boxWithWidth:5.0 height:5.0 length:5.0 chamferRadius:0.0];
_boxNode = [SCNNode nodeWithGeometry:box];
if (found){
[self.delegate returnExtrinsicsMat:extrinsicMatrixOfTheMarker];
Mat R, T;
[self.delegate returnRotationMat:R];
[self.delegate returnTranslationMat:T];
SCNMatrix4 Transformation;
Transformation = [self transformToSceneKit:extrinsicMatrixOfTheMarker];
//_cameraNode.transform = SCNMatrix4Invert(Transformation);
[_sceneKitScene.rootNode addChildNode:_cameraNode];
//_cameraNode.camera.projectionTransform = SCNMatrix4Identity;
//_cameraNode.camera.zNear = 0.0;
_sceneKitView.pointOfView = _cameraNode;
_boxNode.transform = Transformation;
[_sceneKitScene.rootNode addChildNode:_boxNode];
//_boxNode.position = SCNVector3Make(Transformation.m41, Transformation.m42, Transformation.m43);
std::cout << (_boxNode.position.x) << " " << (_boxNode.position.y) << " " << (_boxNode.position.z) << std::endl << std::endl;
}
For example if the translation vector is (-1, 5, 20) the object appears in the scene in position (-1, -5, -20) in the scene, and the rotation is correct also. The problem is that it never appears in the correct position in the background image. I will add some images to show the result.
Does anyone know why this is happening?
Found out the solution. Instead of applying the transform to the node of the object I applied the inverted transformation matrix to the camera node. Then for the camera perspective transform matrix I applied the following matrix:
projection = SCNMatrix4Identity
projection.m11 = (2 * (float)(cameraMatrix[0])) / -(ImageWidth*0.5)
projection.m12 = (-2 * (float)(cameraMatrix[1])) / (ImageWidth*0.5)
projection.m13 = (width - (2 * Float(cameraMatrix[2]))) / (ImageWidth*0.5)
projection.m22 = (2 * (float)(cameraMatrix[4])) / (ImageHeight*0.5)
projection.m23 = (-height + (2 * (float)(cameraMatrix[5]))) / (ImageHeight*0.5)
projection.m33 = (-far - near) / (far - near)
projection.m34 = (-2 * far * near) / (far - near)
projection.m43 = -1
projection.m44 = 0
being far and near the z clipping planes.
I also had to correct the box initial position to center it on the marker.

Camera pose estimation from homography or with solvePnP() function

I'm trying to build static augmented reality scene over a photo with 4 defined correspondences between coplanar points on a plane and image.
Here is a step by step flow:
User adds an image using device's camera. Let's assume it contains a rectangle captured with some perspective.
User defines physical size of the rectangle, which lies in horizontal plane (YOZ in terms of SceneKit). Let's assume it's center is world's origin (0, 0, 0), so we can easily find (x,y,z) for each corner.
User defines uv coordinates in image coordinate system for each corner of the rectangle.
SceneKit scene is created with a rectangle of the same size, and visible at the same perspective.
Other nodes can be added and moved in the scene.
I've also measured position of the iphone camera relatively to the center of the A4 paper. So for this shot the position was (0, 14, 42.5) measured in cm. Also my iPhone was slightly pitched to the table (5-10 degrees)
Using this data I've set up SCNCamera to get the desired perspective of the blue plane on the third image:
let camera = SCNCamera()
camera.xFov = 66
camera.zFar = 1000
camera.zNear = 0.01
cameraNode.camera = camera
cameraAngle = -7 * CGFloat.pi / 180
cameraNode.rotation = SCNVector4(x: 1, y: 0, z: 0, w: Float(cameraAngle))
cameraNode.position = SCNVector3(x: 0, y: 14, z: 42.5)
This will give me a reference to compare my result with.
In order to build AR with SceneKit I need to:
Adjust SCNCamera's fov, so that it matches real camera's fov.
Calculate position and rotation for camera node using 4 correnspondensies between world points (x,0,z) and image points (u, v)
H - homography; K - Intrinsic matrix; [R | t] - Extrinsic matrix
I tried two approaches in order to find transform matrix for camera: using solvePnP from OpenCV and manual calculation from homography based on 4 coplanar points.
Manual approach:
1. Find out homography
This step is done successfully, since UV coordinates of world's origin seems to be correct.
2. Intrinsic matrix
In order to get intrinsic matrix of iPhone 6, I have used this app, which gave me the following result out of 100 images of 640*480 resolution:
Assuming that input image has 4:3 aspect ratio, I can scale the above matrix depending on resolution
I am not sure but it feels like a potential problem here. I've used cv::calibrationMatrixValues to check fovx for the calculated intrinsic matrix and the result was ~50°, while it should be close to 60°.
3. Camera pose matrix
func findCameraPose(homography h: matrix_float3x3, size: CGSize) -> matrix_float4x3? {
guard let intrinsic = intrinsicMatrix(imageSize: size),
let intrinsicInverse = intrinsic.inverse else { return nil }
let l1 = 1.0 / (intrinsicInverse * h.columns.0).norm
let l2 = 1.0 / (intrinsicInverse * h.columns.1).norm
let l3 = (l1+l2)/2
let r1 = l1 * (intrinsicInverse * h.columns.0)
let r2 = l2 * (intrinsicInverse * h.columns.1)
let r3 = cross(r1, r2)
let t = l3 * (intrinsicInverse * h.columns.2)
return matrix_float4x3(columns: (r1, r2, r3, t))
}
Result:
Since I measured the approximate position and orientation for this particular image, I know the transform matrix, which would give the expected result and it is quite different:
I am also a bit conserned about 2-3 element of reference rotation matrix, which is -9.1, while it should be close to zero instead, since there is very slight rotation.
OpenCV approach:
There is a solvePnP function in OpenCV for this kind of problems, so I tried to use it instead of reinventing the wheel.
OpenCV in Objective-C++:
typedef struct CameraPose {
SCNVector4 rotationVector;
SCNVector3 translationVector;
} CameraPose;
+ (CameraPose)findCameraPose: (NSArray<NSValue *> *) objectPoints imagePoints: (NSArray<NSValue *> *) imagePoints size: (CGSize) size {
vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize: size];
cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
cv::Mat rvec(3,1,cv::DataType<double>::type);
cv::Mat tvec(3,1,cv::DataType<double>::type);
cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize: size];
cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);
SCNVector4 rotationVector = SCNVector4Make(rvec.at<double>(0), rvec.at<double>(1), rvec.at<double>(2), norm(rvec));
SCNVector3 translationVector = SCNVector3Make(tvec.at<double>(0), tvec.at<double>(1), tvec.at<double>(2));
CameraPose result = CameraPose{rotationVector, translationVector};
return result;
}
+ (vector<Point2f>) convertImagePoints: (NSArray<NSValue *> *) array withSize: (CGSize) size {
vector<Point2f> points;
for (NSValue * value in array) {
CGPoint point = [value CGPointValue];
points.push_back(Point2f(point.x - size.width/2, point.y - size.height/2));
}
return points;
}
+ (vector<Point3f>) convertObjectPoints: (NSArray<NSValue *> *) array {
vector<Point3f> points;
for (NSValue * value in array) {
CGPoint point = [value CGPointValue];
points.push_back(Point3f(point.x, 0.0, -point.y));
}
return points;
}
+ (cv::Mat) intrinsicMatrixWithImageSize: (CGSize) imageSize {
double f = 0.84 * max(imageSize.width, imageSize.height);
Mat result(3,3,cv::DataType<double>::type);
cv::setIdentity(result);
result.at<double>(0) = f;
result.at<double>(4) = f;
return result;
}
Usage in Swift:
func testSolvePnP() {
let source = modelPoints().map { NSValue(cgPoint: $0) }
let destination = perspectivePicker.currentPerspective.map { NSValue(cgPoint: $0)}
let cameraPose = CameraPoseDetector.findCameraPose(source, imagePoints: destination, size: backgroundImageView.size);
cameraNode.rotation = cameraPose.rotationVector
cameraNode.position = cameraPose.translationVector
}
Output:
The result is better but far from my expectations.
Some other things I've also tried:
This question is very similar, though I don't understand how the accepted answer is working without intrinsics.
decomposeHomographyMat also didn't give me the result I expected
I am really stuck with this issue so any help would be much appreciated.
Actually I was one step away from the working solution with OpenCV.
My problem with second approach was that I forgot to convert the output from solvePnP back to SpriteKit's coordinate system.
Note that the input (image and world points) was actually converted correctly to OpenCV coordinate system (convertObjectPoints: and convertImagePoints:withSize: methods)
So here is a fixed findCameraPose method with some comments and intermediate results printed:
+ (CameraPose)findCameraPose: (NSArray<NSValue *> *) objectPoints imagePoints: (NSArray<NSValue *> *) imagePoints size: (CGSize) size {
vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize: size];
std::cout << "object points: " << cvObjectPoints << std::endl;
std::cout << "image points: " << cvImagePoints << std::endl;
cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
cv::Mat rvec(3,1,cv::DataType<double>::type);
cv::Mat tvec(3,1,cv::DataType<double>::type);
cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize: size];
cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);
std::cout << "rvec: " << rvec << std::endl;
std::cout << "tvec: " << tvec << std::endl;
std::vector<cv::Point2f> projectedPoints;
cvObjectPoints.push_back(Point3f(0.0, 0.0, 0.0));
cv::projectPoints(cvObjectPoints, rvec, tvec, cameraMatrix, distCoeffs, projectedPoints);
for(unsigned int i = 0; i < projectedPoints.size(); ++i) {
std::cout << "Image point: " << cvImagePoints[i] << " Projected to " << projectedPoints[i] << std::endl;
}
cv::Mat RotX(3, 3, cv::DataType<double>::type);
cv::setIdentity(RotX);
RotX.at<double>(4) = -1; //cos(180) = -1
RotX.at<double>(8) = -1;
cv::Mat R;
cv::Rodrigues(rvec, R);
R = R.t(); // rotation of inverse
Mat rvecConverted;
Rodrigues(R, rvecConverted); //
std::cout << "rvec in world coords:\n" << rvecConverted << std::endl;
rvecConverted = RotX * rvecConverted;
std::cout << "rvec scenekit :\n" << rvecConverted << std::endl;
Mat tvecConverted = -R * tvec;
std::cout << "tvec in world coords:\n" << tvecConverted << std::endl;
tvecConverted = RotX * tvecConverted;
std::cout << "tvec scenekit :\n" << tvecConverted << std::endl;
SCNVector4 rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
SCNVector3 translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));
return CameraPose{rotationVector, translationVector};
}
Notes:
RotX matrix means rotation by 180 degrees around x axis, which will transform any vector from OpenCV coordinate system to SpriteKit's
Rodrigues method transforms rotation vector to rotation matrix (3x3) and vice versa

Augmented Reality iOS application tracking issue

I am able to detect markers, identify markers and initialise OpenGL objects on screen. The issue I'm having is overlaying them on top of the markers position in the camera world. My camera is calibrated best I can using this method Iphone 6 camera calibration for OpenCV. I feel there is an issue with my cameras projection matrix, I create it as follows:
-(void)buildProjectionMatrix:
(Matrix33)cameraMatrix:
(int)screen_width:
(int)screen_height:
(Matrix44&) projectionMatrix
{
float near = 0.01; // Near clipping distance
float far = 100; // Far clipping distance
// Camera parameters
float f_x = cameraMatrix.data[0]; // Focal length in x axis
float f_y = cameraMatrix.data[4]; // Focal length in y axis
float c_x = cameraMatrix.data[2]; // Camera primary point x
float c_y = cameraMatrix.data[5]; // Camera primary point y
std::cout<<"fx "<<f_x<<" fy "<<f_y<<" cx "<<c_x<<" cy "<<c_y<<std::endl;
std::cout<<"width "<<screen_width<<" height "<<screen_height<<std::endl;
projectionMatrix.data[0] = - 2.0 * f_x / screen_width;
projectionMatrix.data[1] = 0.0;
projectionMatrix.data[2] = 0.0;
projectionMatrix.data[3] = 0.0;
projectionMatrix.data[4] = 0.0;
projectionMatrix.data[5] = 2.0 * f_y / screen_height;
projectionMatrix.data[6] = 0.0;
projectionMatrix.data[7] = 0.0;
projectionMatrix.data[8] = 2.0 * c_x / screen_width - 1.0;
projectionMatrix.data[9] = 2.0 * c_y / screen_height - 1.0;
projectionMatrix.data[10] = -( far+near ) / ( far - near );
projectionMatrix.data[11] = -1.0;
projectionMatrix.data[12] = 0.0;
projectionMatrix.data[13] = 0.0;
projectionMatrix.data[14] = -2.0 * far * near / ( far - near );
projectionMatrix.data[15] = 0.0;
}
This is the method to estimate the position of the marker:
void MarkerDetector::estimatePosition(std::vector<Marker>& detectedMarkers)
{
for (size_t i=0; i<detectedMarkers.size(); i++)
{
Marker& m = detectedMarkers[i];
cv::Mat Rvec;
cv::Mat_<float> Tvec;
cv::Mat raux,taux;
cv::solvePnP(m_markerCorners3d, m.points, camMatrix, distCoeff,raux,taux);
raux.convertTo(Rvec,CV_32F);
taux.convertTo(Tvec ,CV_32F);
cv::Mat_<float> rotMat(3,3);
cv::Rodrigues(Rvec, rotMat);
// Copy to transformation matrix
for (int col=0; col<3; col++)
{
for (int row=0; row<3; row++)
{
m.transformation.r().mat[row][col] = rotMat(row,col); // Copy rotation component
}
m.transformation.t().data[col] = Tvec(col); // Copy translation component
}
// Since solvePnP finds camera location, w.r.t to marker pose, to get marker pose w.r.t to the camera we invert it.
m.transformation = m.transformation.getInverted();
}
}
The OpenGL shape is able to track and account for size and roation, but something is going wrong with the translation. If the camera is turned 90 degrees, the opengl shape swings around 90 degrees about the centre of the marker. Its almost as if I am translating before rotating, but I am not.
See video for issue:
https://vid.me/fLvX
I guess you can have some problem with projecting the 3-D modelpoints. Essentially, solvePnP gives a transformation that brings points from the model coordinate system to the camera coordinate system and this is composed of a rotation and translation vector (output of solvePnP):
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec)
At this point you are able to project model points onto the image plane
std::vector<cv::Vec2d> imagePointsRP; // Reprojected image points
cv::projectPoints(objectPoints, rvec, tvec, cameraMatrix, distCoeffs, imagePointsRP);
Now, you should only draw the points of imagePointsRP over the incoming image and if the pose estimation was correct then you'll see the reprojected corners over the corners of the marker
Anyway, the matrices of model TO camera and camera TO model direction can be composed as below:
cv::Mat rmat
cv::Rodrigues(rvec, rmat); // mRmat is 3x3
cv::Mat modelToCam = cv::Mat::eye(4, 4, CV_64FC1);
modelToCam(cv::Range(0, 3), cv::Range(0, 3)) = rmat * 1.0;
modelToCam(cv::Range(0, 3), cv::Range(3, 4)) = tvec * 1.0;
cv::Mat camToModel = cv::Mat::eye(4, 4, CV_64FC1);
cv::Mat rmatinv = rmat.t(); // rotation of inverse
cv::Mat tvecinv = -rmatinv * tvec; // translation of inverse
camToModel(cv::Range(0, 3), cv::Range(0, 3)) = rmatinv * 1.0;
camToModel(cv::Range(0, 3), cv::Range(3, 4)) = tvecinv * 1.0;
In any case, it's also useful to estimate reprojection error and discard the poses with high error (remember, the PnP problem has only unique solution if n=4 and these points are coplanar):
double totalErr = 0.0;
for (size_t i = 0; i < imagePoints.size(); i++)
{
double err = cv::norm(cv::Mat(imagePoints[i]), cv::Mat(imagePointsRP[i]), cv::NORM_L2);
totalErr += err*err;
}
totalErr = std::sqrt(totalErr / imagePoints.size());

Detect semicircle in OpenCV

I am trying to detect full circles and semicircles in an image.
I am following the below mentioned procedure:
Process image (including Canny edge detection).
Find contours and draw them on an empty image, so that I can eliminate unwanted components
(The processed image is exactly what I want).
Detect circles using HoughCircles. And, this is what I get:
I tried varying the parameters in HoughCircles but the results are not consistent as it varies based on lighting and the position of circles in the image.
I accept or reject a circle based on its size. So, the result is not acceptable. Also, I have a long list of "acceptable" circles. So, I need some allowance in the HoughCircle params.
As for the full circles, it's easy - I can simply find the "roundness" of the contour. The problem is semicircles!
Please find the edited image before Hough transform
Use houghCircle directly on your image, don't extract edges first.
Then test for each detected circle, how much percentage is really present in the image:
int main()
{
cv::Mat color = cv::imread("../houghCircles.png");
cv::namedWindow("input"); cv::imshow("input", color);
cv::Mat canny;
cv::Mat gray;
/// Convert it to gray
cv::cvtColor( color, gray, CV_BGR2GRAY );
// compute canny (don't blur with that image quality!!)
cv::Canny(gray, canny, 200,20);
cv::namedWindow("canny2"); cv::imshow("canny2", canny>0);
std::vector<cv::Vec3f> circles;
/// Apply the Hough Transform to find the circles
cv::HoughCircles( gray, circles, CV_HOUGH_GRADIENT, 1, 60, 200, 20, 0, 0 );
/// Draw the circles detected
for( size_t i = 0; i < circles.size(); i++ )
{
Point center(cvRound(circles[i][0]), cvRound(circles[i][1]));
int radius = cvRound(circles[i][2]);
cv::circle( color, center, 3, Scalar(0,255,255), -1);
cv::circle( color, center, radius, Scalar(0,0,255), 1 );
}
//compute distance transform:
cv::Mat dt;
cv::distanceTransform(255-(canny>0), dt, CV_DIST_L2 ,3);
cv::namedWindow("distance transform"); cv::imshow("distance transform", dt/255.0f);
// test for semi-circles:
float minInlierDist = 2.0f;
for( size_t i = 0; i < circles.size(); i++ )
{
// test inlier percentage:
// sample the circle and check for distance to the next edge
unsigned int counter = 0;
unsigned int inlier = 0;
cv::Point2f center((circles[i][0]), (circles[i][1]));
float radius = (circles[i][2]);
// maximal distance of inlier might depend on the size of the circle
float maxInlierDist = radius/25.0f;
if(maxInlierDist<minInlierDist) maxInlierDist = minInlierDist;
//TODO: maybe paramter incrementation might depend on circle size!
for(float t =0; t<2*3.14159265359f; t+= 0.1f)
{
counter++;
float cX = radius*cos(t) + circles[i][0];
float cY = radius*sin(t) + circles[i][1];
if(dt.at<float>(cY,cX) < maxInlierDist)
{
inlier++;
cv::circle(color, cv::Point2i(cX,cY),3, cv::Scalar(0,255,0));
}
else
cv::circle(color, cv::Point2i(cX,cY),3, cv::Scalar(255,0,0));
}
std::cout << 100.0f*(float)inlier/(float)counter << " % of a circle with radius " << radius << " detected" << std::endl;
}
cv::namedWindow("output"); cv::imshow("output", color);
cv::imwrite("houghLinesComputed.png", color);
cv::waitKey(-1);
return 0;
}
For this input:
It gives this output:
The red circles are Hough results.
The green sampled dots on the circle are inliers.
The blue dots are outliers.
Console output:
100 % of a circle with radius 27.5045 detected
100 % of a circle with radius 25.3476 detected
58.7302 % of a circle with radius 194.639 detected
50.7937 % of a circle with radius 23.1625 detected
79.3651 % of a circle with radius 7.64853 detected
If you want to test RANSAC instead of Hough, have a look at this.
Here is another way to do it, a simple RANSAC version (much optimization to be done to improve speed), that works on the Edge Image.
the method loops these steps until it is cancelled
choose randomly 3 edge pixel
estimate circle from them (3 points are enough to identify a circle)
verify or falsify that it's really a circle: count how much percentage of the circle is represented by the given edges
if a circle is verified, remove the circle from input/egdes
int main()
{
//RANSAC
//load edge image
cv::Mat color = cv::imread("../circleDetectionEdges.png");
// convert to grayscale
cv::Mat gray;
cv::cvtColor(color, gray, CV_RGB2GRAY);
// get binary image
cv::Mat mask = gray > 0;
//erode the edges to obtain sharp/thin edges (undo the blur?)
cv::erode(mask, mask, cv::Mat());
std::vector<cv::Point2f> edgePositions;
edgePositions = getPointPositions(mask);
// create distance transform to efficiently evaluate distance to nearest edge
cv::Mat dt;
cv::distanceTransform(255-mask, dt,CV_DIST_L1, 3);
//TODO: maybe seed random variable for real random numbers.
unsigned int nIterations = 0;
char quitKey = 'q';
std::cout << "press " << quitKey << " to stop" << std::endl;
while(cv::waitKey(-1) != quitKey)
{
//RANSAC: randomly choose 3 point and create a circle:
//TODO: choose randomly but more intelligent,
//so that it is more likely to choose three points of a circle.
//For example if there are many small circles, it is unlikely to randomly choose 3 points of the same circle.
unsigned int idx1 = rand()%edgePositions.size();
unsigned int idx2 = rand()%edgePositions.size();
unsigned int idx3 = rand()%edgePositions.size();
// we need 3 different samples:
if(idx1 == idx2) continue;
if(idx1 == idx3) continue;
if(idx3 == idx2) continue;
// create circle from 3 points:
cv::Point2f center; float radius;
getCircle(edgePositions[idx1],edgePositions[idx2],edgePositions[idx3],center,radius);
float minCirclePercentage = 0.4f;
// inlier set unused at the moment but could be used to approximate a (more robust) circle from alle inlier
std::vector<cv::Point2f> inlierSet;
//verify or falsify the circle by inlier counting:
float cPerc = verifyCircle(dt,center,radius, inlierSet);
if(cPerc >= minCirclePercentage)
{
std::cout << "accepted circle with " << cPerc*100.0f << " % inlier" << std::endl;
// first step would be to approximate the circle iteratively from ALL INLIER to obtain a better circle center
// but that's a TODO
std::cout << "circle: " << "center: " << center << " radius: " << radius << std::endl;
cv::circle(color, center,radius, cv::Scalar(255,255,0),1);
// accept circle => remove it from the edge list
cv::circle(mask,center,radius,cv::Scalar(0),10);
//update edge positions and distance transform
edgePositions = getPointPositions(mask);
cv::distanceTransform(255-mask, dt,CV_DIST_L1, 3);
}
cv::Mat tmp;
mask.copyTo(tmp);
// prevent cases where no fircle could be extracted (because three points collinear or sth.)
// filter NaN values
if((center.x == center.x)&&(center.y == center.y)&&(radius == radius))
{
cv::circle(tmp,center,radius,cv::Scalar(255));
}
else
{
std::cout << "circle illegal" << std::endl;
}
++nIterations;
cv::namedWindow("RANSAC"); cv::imshow("RANSAC", tmp);
}
std::cout << nIterations << " iterations performed" << std::endl;
cv::namedWindow("edges"); cv::imshow("edges", mask);
cv::namedWindow("color"); cv::imshow("color", color);
cv::imwrite("detectedCircles.png", color);
cv::waitKey(-1);
return 0;
}
float verifyCircle(cv::Mat dt, cv::Point2f center, float radius, std::vector<cv::Point2f> & inlierSet)
{
unsigned int counter = 0;
unsigned int inlier = 0;
float minInlierDist = 2.0f;
float maxInlierDistMax = 100.0f;
float maxInlierDist = radius/25.0f;
if(maxInlierDist<minInlierDist) maxInlierDist = minInlierDist;
if(maxInlierDist>maxInlierDistMax) maxInlierDist = maxInlierDistMax;
// choose samples along the circle and count inlier percentage
for(float t =0; t<2*3.14159265359f; t+= 0.05f)
{
counter++;
float cX = radius*cos(t) + center.x;
float cY = radius*sin(t) + center.y;
if(cX < dt.cols)
if(cX >= 0)
if(cY < dt.rows)
if(cY >= 0)
if(dt.at<float>(cY,cX) < maxInlierDist)
{
inlier++;
inlierSet.push_back(cv::Point2f(cX,cY));
}
}
return (float)inlier/float(counter);
}
inline void getCircle(cv::Point2f& p1,cv::Point2f& p2,cv::Point2f& p3, cv::Point2f& center, float& radius)
{
float x1 = p1.x;
float x2 = p2.x;
float x3 = p3.x;
float y1 = p1.y;
float y2 = p2.y;
float y3 = p3.y;
// PLEASE CHECK FOR TYPOS IN THE FORMULA :)
center.x = (x1*x1+y1*y1)*(y2-y3) + (x2*x2+y2*y2)*(y3-y1) + (x3*x3+y3*y3)*(y1-y2);
center.x /= ( 2*(x1*(y2-y3) - y1*(x2-x3) + x2*y3 - x3*y2) );
center.y = (x1*x1 + y1*y1)*(x3-x2) + (x2*x2+y2*y2)*(x1-x3) + (x3*x3 + y3*y3)*(x2-x1);
center.y /= ( 2*(x1*(y2-y3) - y1*(x2-x3) + x2*y3 - x3*y2) );
radius = sqrt((center.x-x1)*(center.x-x1) + (center.y-y1)*(center.y-y1));
}
std::vector<cv::Point2f> getPointPositions(cv::Mat binaryImage)
{
std::vector<cv::Point2f> pointPositions;
for(unsigned int y=0; y<binaryImage.rows; ++y)
{
//unsigned char* rowPtr = binaryImage.ptr<unsigned char>(y);
for(unsigned int x=0; x<binaryImage.cols; ++x)
{
//if(rowPtr[x] > 0) pointPositions.push_back(cv::Point2i(x,y));
if(binaryImage.at<unsigned char>(y,x) > 0) pointPositions.push_back(cv::Point2f(x,y));
}
}
return pointPositions;
}
input:
output:
console output:
press q to stop
accepted circle with 50 % inlier
circle: center: [358.511, 211.163] radius: 193.849
accepted circle with 85.7143 % inlier
circle: center: [45.2273, 171.591] radius: 24.6215
accepted circle with 100 % inlier
circle: center: [257.066, 197.066] radius: 27.819
circle illegal
30 iterations performed`
optimization should include:
use all inlier to fit a better circle
dont compute distance transform after each detected circles (it's quite expensive). compute inlier from point/edge set directly and remove the inlier edges from that list.
if there are many small circles in the image (and/or a lot of noise), it's unlikely to hit randomly 3 edge pixels or a circle. => try contour detection first and detect circles for each contour. after that try to detect all "other" circles left in the image.
a lot of other stuff
#Micka's answer is great, here it is roughly translated into python
The method takes a thresholded image mask as an argument, leaving that part as an exercise for the reader
def get_circle_percentages(image):
#compute distance transform of image
dist = cv2.distanceTransform(image, cv2.DIST_L2, 0)
rows = image.shape[0]
circles = cv2.HoughCircles(image, cv2.HOUGH_GRADIENT, 1, rows / 8, 50, param1=50, param2=10, minRadius=40, maxRadius=90)
minInlierDist = 2.0
for c in circles[0, :]:
counter = 0
inlier = 0
center = (c[0], c[1])
radius = c[2]
maxInlierDist = radius/25.0
if maxInlierDist < minInlierDist: maxInlierDist = minInlierDist
for i in np.arange(0, 2*np.pi, 0.1):
counter += 1
x = center[0] + radius * np.cos(i)
y = center[1] + radius * np.sin(i)
if dist.item(int(y), int(x)) < maxInlierDist:
inlier += 1
print(str(100.0*inlier/counter) + ' percent of a circle with radius ' + str(radius) + " detected")
I know that it's little bit late, but I used different approach which is much easier.
From the cv2.HoughCircles(...) you get centre of the circle and the diameter (x,y,r). So I simply go through all centre points of the circles and I check if they are further away from the edge of the image than their diameter.
Here is my code:
height, width = img.shape[:2]
#test top edge
up = (circles[0, :, 0] - circles[0, :, 2]) >= 0
#test left edge
left = (circles[0, :, 1] - circles[0, :, 2]) >= 0
#test right edge
right = (circles[0, :, 0] + circles[0, :, 2]) <= width
#test bottom edge
down = (circles[0, :, 1] + circles[0, :, 2]) <= height
circles = circles[:, (up & down & right & left), :]
The semicircle detected by the hough algorithm is most probably correct. The issue here might be that unless you strictly control the geometry of the scene, i.e. exact position of the camera relative to the target, so that the image axis is normal to the target plane, you will get ellipsis rather than circles projected on the image plane. Not to mention the distortions caused by the optical system, which further degenerate the geometric figure. If you rely on precision here, I would recommend camera calibration.
You better try with different kernel for gaussian blur.That will help you
GaussianBlur( src_gray, src_gray, Size(11, 11), 5,5);
so change size(i,i),j,j)

Resources