OpenCV + OpenGL: proper camera pose using solvePnP - ios

I've got problem with obtaining proper camera pose from iPad camera using OpenCV.
I'm using custom made 2D marker (based on AruCo library ) - I want to render 3D cube over that marker using OpenGL.
In order to recieve camera pose I'm using solvePnP function from OpenCV.
According to THIS LINK I'm doing it like this:
cv::solvePnP(markerObjectPoints, imagePoints, [self currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
tvec.at<double>(0, 0) *= -1; // I don't know why I have to do it, but translation in X axis is inverted
cv::Mat R;
cv::Rodrigues(rvec, R); // R is 3x3
R = R.t(); // rotation of inverse
tvec = -R * tvec; // translation of inverse
cv::Mat T(4, 4, R.type()); // T is 4x4
T(cv::Range(0, 3), cv::Range(0, 3)) = R * 1; // copies R into T
T(cv::Range(0, 3), cv::Range(3, 4)) = tvec * 1; // copies tvec into T
double *p = T.ptr<double>(3);
p[0] = p[1] = p[2] = 0;
p[3] = 1;
camera matrix & dist coefficients are coming from findChessboardCorners function, imagePoints are manually detected corners of marker (you can see them as green square in the video posted below), and markerObjectPoints are manually hardcoded points that represents marker corners:
markerObjectPoints.push_back(cv::Point3d(-6, -6, 0));
markerObjectPoints.push_back(cv::Point3d(6, -6, 0));
markerObjectPoints.push_back(cv::Point3d(6, 6, 0));
markerObjectPoints.push_back(cv::Point3d(-6, 6, 0));
Because marker is 12 cm long in real world, I've chosed the same size in the for easier debugging.
As a result I'm recieving 4x4 matrix T, that I'll use as ModelView matrix in OpenCV.
Using GLKit drawing function looks more or less like this:
- (void)glkView:(GLKView *)view drawInRect:(CGRect)rect {
// preparations
glClearColor(0.0, 0.0, 0.0, 0.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
float aspect = fabsf(self.bounds.size.width / self.bounds.size.height);
effect.transform.projectionMatrix = GLKMatrix4MakePerspective(GLKMathDegreesToRadians(39), aspect, 0.1f, 1000.0f);
// set modelViewMatrix
float mat[16] = generateOpenGLMatFromFromOpenCVMat(T);
currentModelMatrix = GLKMatrix4MakeWithArrayAndTranspose(mat);
effect.transform.modelviewMatrix = currentModelMatrix;
[effect prepareToDraw];
glDrawArrays(GL_TRIANGLES, 0, 36); // draw previously prepared cube
}
I'm not rotating everything for 180 degrees around X axis (as it was mentioned in previously linked article), because I doesn't look as necessary.
The problem is that it doesn't work! Translation vector looks OK, but X and Y rotations are messed up :(
I've recorded a video presenting that issue:
http://www.youtube.com/watch?v=EMNBT5H7-os
I've tried almost everything (including inverting all axises one by one), but nothing actually works.
What should I do? How should I properly display that 3D cube? Translation / rotation vectors that come from solvePnP are looking reasonable, so I guess that I can't correctly map these vectors to OpenGL matrices.

Thanks to Djo1509 from http://answers.opencv.org/ I've found out that the problem was unnecessary transposed rotation matrix R matrix used as part of matrix T, and unnecessary tvec = -R * tvec operation.
For more info look there

Related

Shadow Mapping - Space Transformations are going bad

I am currently studying shadow mapping, and my biggest issue right now is the transformations between spaces. This is my current working theory/steps.
Pass 1:
Get depth of pixel from camera, store in depth buffer
Get depth of pixel from light, store in another buffer
Pass 2:
Use texture coordinate to sample camera's depth buffer at current pixel
Convert that depth to a view space position by multiplying the projection coordinate with invProj matrix. (also do a perspective divide).
Take that view position and multiply by invV (camera's inverse view) to get a world space position
Multiply world space position by light's viewProjection matrix.
Perspective divide that projection-space coordinate, and manipulate into [0..1] to sample from light depth buffer.
Get current depth from light and closest (sampled) depth, if current depth > closest depth, it's in shadow.
Shader Code
Pass1:
PS_INPUT vs(VS_INPUT input) {
output.pos = mul(input.vPos, mvp);
output.cameraDepth = output.pos.zw;
..
float4 vPosInLight = mul(input.vPos, m);
vPosInLight = mul(vPosInLight, light.viewProj);
output.lightDepth = vPosInLight.zw;
}
PS_OUTPUT ps(PS_INPUT input){
float cameraDepth = input.cameraDepth.x / input.cameraDepth.y;
//Bundle cameraDepth in alpha channel of a normal map.
output.normal = float4(input.normal, cameraDepth);
//4 Lights in total -- although only 1 is active right now. Going to use r/g/b/a for each light depth.
output.lightDepths.r = input.lightDepth.x / input.lightDepth.y;
}
Pass 2 (Screen Quad):
float4 ps(PS_INPUT input) : SV_TARGET{
float4 pixelPosView = depthToViewSpace(input.texCoord);
..
float4 pixelPosWorld = mul(pixelPosView, invV);
float4 pixelPosLight = mul(pixelPosWorld, light.viewProj);
float shadow = shadowCalc(pixelPosLight);
//For testing / visualisation
return float4(shadow,shadow,shadow,1);
}
float4 depthToViewSpace(float2 xy) {
//Get pixel depth from camera by sampling current texcoord.
//Extract the alpha channel as this holds the depth value.
//Then, transform from [0..1] to [-1..1]
float z = (_normal.Sample(_sampler, xy).a) * 2 - 1;
float x = xy.x * 2 - 1;
float y = (1 - xy.y) * 2 - 1;
float4 vProjPos = float4(x, y, z, 1.0f);
float4 vPositionVS = mul(vProjPos, invP);
vPositionVS = float4(vPositionVS.xyz / vPositionVS.w,1);
return vPositionVS;
}
float shadowCalc(float4 pixelPosL) {
//Transform pixelPosLight from [-1..1] to [0..1]
float3 projCoords = (pixelPosL.xyz / pixelPosL.w) * 0.5 + 0.5;
float closestDepth = _lightDepths.Sample(_sampler, projCoords.xy).r;
float currentDepth = projCoords.z;
return currentDepth > closestDepth; //Supposed to have bias, but for now I just want shadows working haha
}
CPP Matrices
// (Position, LookAtPos, UpDir)
auto lightView = XMMatrixLookAtLH(XMLoadFloat4(&pos4), XMVectorSet(0,0,0,1), XMVectorSet(0,1,0,0));
// (FOV, AspectRatio (1000/680), NEAR, FAR)
auto lightProj = XMMatrixPerspectiveFovLH(1.57f , 1.47f, 0.01f, 10.0f);
XMStoreFloat4x4(&_cLightBuffer.light.viewProj, XMMatrixTranspose(XMMatrixMultiply(lightView, lightProj)));
Current Outputs
White signifies that a shadow should be projected there. Black indicates no shadow.
CameraPos (0, 2.5, -2)
CameraLookAt (0, 0, 0)
CameraFOV (1.57)
CameraNear (0.01)
CameraFar (10.0)
LightPos (0, 2.5, -2)
LightLookAt (0, 0, 0)
LightFOV (1.57)
LightNear (0.01)
LightFar (10.0)
If I change the CameraPosition to be (0, 2.5, 2), basically just flipped on the Z axis, this is the result.
Obviously a shadow shouldn't change its projection depending on where the observer is, so I think I'm making a mistake with the invV. But I really don't know for sure. I've debugged the light's projView matrix, and the values seem correct - going from CPU to GPU. It's also entirely possible I've misunderstood some theory along the way because this is quite a tricky technique for me.
Aha! Found my problem. It was a silly mistake, I was calculating the depth of pixels from each light, but storing them in a texture that was based on the view of the camera. The following image should explain my mistake better than I can with words.
For future reference, the solution I decided was to scrap my idea for storing light depths in texture channels. Instead, I basically make a new pass for each light, and bind a unique depth-stencil texture to render the geometry to. When I want to do light calculations, I bind each of the depth textures to a shader resource slot and go from there. Obviously this doesn't scale well with many lights, but for my student project where I'm only required to have 2 shadow casters, it suffices.
_context->DrawIndexed(indexCount, 0, 0); //Draw to regular render target
_sunlight->use(1, _context); //Use sunlight shader (basically just runs a Vertex Shader & Null Pixel shader so depth can be written to depth map)
_sunlight->bindDSVSetNullRenderTarget(_context);
_context->DrawIndexed(indexCount, 0, 0); //Draw to sunlight depth target
bindDSVSetNullRenderTarget(ctx){
ID3D11RenderTargetView* nullrv = { nullptr };
ctx->OMSetRenderTargets(1, &nullrv, _sunlightDepthStencilView);
}
//The purpose of setting a null render target before doing the draw call is
//that a draw call with only a depth target bound is much faster.
//(At least I believe so, from my reading online)

Pose Estimation - cv::SolvePnP with Scenekit - Coordinate System Question

I have been working on Pose Estimation (rectifying key points on a 3D model with 2D points on an image to match pose) via OpenCV's cv::solvePNP, using features / key points from Apples Vision framework.
TL-DR:
My scene kit model is being translated and the units look correct when introspecting the translation and rotation vectors from solvePnP (ie, they are the right order of magnitude), but the coordinate system of the translation appears off:
I am trying to understand the coordinate system requirements with solvePnP wrt to Metal / OpenGL coordinate system and my camera projection matrix.
What 'projectionMatrix' does my SCNCamera require to match image based coordinate system passed into solvePnP?
Some things ive read / believe I am taking into account.
OpenCV vs OpenGL (thus Metal) have row major vs column major differences.
OpenCV's coordinate system for 3D is different than OpenGL (thus Metal).
Longer with code:
My workflow is as such:
Step 1 - use a 3D model tool to introspect points on my 3D model and get the objects vertex positions for the major key points in the 2D detected features. I am using left pupil, right pupil, tip of nose, tip of chin, left outer lip corner, right outer lip corner.
Step 2 - Run a vision request and extract a list of points in image space (converting for OpenCV's top left coordinate system) and extract the same ordered list of 2D points.
Step 3 - Construct a camera matrix by using the size of the input image.
Step 4 - run cv::solvePnP, and then use cv::Rodrigues to convert the rotation vector to a matrix
Step 5 - Convert the coordinate system of the resulting transforms into something appropriate for the GPU - invert the y and z axis and combine the translation and rotation to a single 4x4 Matrix, and then transpose it for the appropriate major ness of OpenGL / Metal
Step 6 - apply the resulting transform to Scenekit via:
let faceNodeTransform = openCVWrapper.transform(for: landmarks, imageSize: size)
self.destinationView.pointOfView?.transform = SCNMatrix4Invert(faceNodeTransform)
Below is my Obj-C++ OpenCV Wrapper which takes in a subset of Vision Landmarks and the true pixel size of the image being looked at:
/ https://answers.opencv.org/question/23089/opencv-opengl-proper-camera-pose-using-solvepnp/
- (SCNMatrix4) transformFor:(VNFaceLandmarks2D*)landmarks imageSize:(CGSize)imageSize
{
// 1 convert landmarks to image points in image space (pixels) to vector of cv::Point2f's :
// Note that this translates the point coordinate system to be top left oriented for OpenCV's image coordinates:
std::vector<cv::Point2f > imagePoints = [self imagePointsForLandmarks:landmarks imageSize:imageSize];
// 2 Load Model Points
std::vector<cv::Point3f > modelPoints = [self modelPoints];
// 3 create our camera extrinsic matrix
// TODO - see if this is sane?
double max_d = fmax(imageSize.width, imageSize.height);
cv::Mat cameraMatrix = (cv::Mat_<double>(3,3) << max_d, 0, imageSize.width/2.0,
0, max_d, imageSize.height/2.0,
0, 0, 1.0);
// 4 Run solvePnP
double distanceCoef[] = {0,0,0,0};
cv::Mat distanceCoefMat = cv::Mat(1 ,4 ,CV_64FC1,distanceCoef);
// Output Matrixes
std::vector<double> rv(3);
cv::Mat rotationOut = cv::Mat(rv);
std::vector<double> tv(3);
cv::Mat translationOut = cv::Mat(tv);
cv::solvePnP(modelPoints, imagePoints, cameraMatrix, distanceCoefMat, rotationOut, translationOut, false, cv::SOLVEPNP_EPNP);
// 5 Convert rotation matrix (actually a vector)
// To a real 4x4 rotation matrix:
cv::Mat viewMatrix = cv::Mat::zeros(4, 4, CV_64FC1);
cv::Mat rotation;
cv::Rodrigues(rotationOut, rotation);
// Append our transforms to our matrix and set final to identity:
for(unsigned int row=0; row<3; ++row)
{
for(unsigned int col=0; col<3; ++col)
{
viewMatrix.at<double>(row, col) = rotation.at<double>(row, col);
}
viewMatrix.at<double>(row, 3) = translationOut.at<double>(row, 0);
}
viewMatrix.at<double>(3, 3) = 1.0f;
// Transpose OpenCV to OpenGL coords
cv::Mat cvToGl = cv::Mat::zeros(4, 4, CV_64FC1);
cvToGl.at<double>(0, 0) = 1.0f;
cvToGl.at<double>(1, 1) = -1.0f; // Invert the y axis
cvToGl.at<double>(2, 2) = -1.0f; // invert the z axis
cvToGl.at<double>(3, 3) = 1.0f;
viewMatrix = cvToGl * viewMatrix;
// Finally transpose to get correct SCN / OpenGL Matrix :
cv::Mat glViewMatrix = cv::Mat::zeros(4, 4, CV_64FC1);
cv::transpose(viewMatrix , glViewMatrix);
return [self convertCVMatToMatrix4:glViewMatrix];
}
- (SCNMatrix4) convertCVMatToMatrix4:(cv::Mat)matrix
{
SCNMatrix4 scnMatrix = SCNMatrix4Identity;
scnMatrix.m11 = matrix.at<double>(0, 0);
scnMatrix.m12 = matrix.at<double>(0, 1);
scnMatrix.m13 = matrix.at<double>(0, 2);
scnMatrix.m14 = matrix.at<double>(0, 3);
scnMatrix.m21 = matrix.at<double>(1, 0);
scnMatrix.m22 = matrix.at<double>(1, 1);
scnMatrix.m23 = matrix.at<double>(1, 2);
scnMatrix.m24 = matrix.at<double>(1, 3);
scnMatrix.m31 = matrix.at<double>(2, 0);
scnMatrix.m32 = matrix.at<double>(2, 1);
scnMatrix.m33 = matrix.at<double>(2, 2);
scnMatrix.m34 = matrix.at<double>(2, 3);
scnMatrix.m41 = matrix.at<double>(3, 0);
scnMatrix.m42 = matrix.at<double>(3, 1);
scnMatrix.m43 = matrix.at<double>(3, 2);
scnMatrix.m44 = matrix.at<double>(3, 3);
return (scnMatrix);
}
Some questions:
An SCNNode has no modelViewMatrix (just as I understand it, a transform, which is the modelMatrix) to just throw a matrix at - so I've read the inverse of the transform from SolvePNP process can be used to pose the camera instead, which appears to get me the closes result. I want to ensure this approach is correct.
If I have the modelViewMatrix, and the projectionMatrix, I should be able to calculate the appropriate modelMatrix? Is this the approach I should be taking?
Its unclear to me what projectionMatrix I should be using for my SceneKit Scene and If that has any bearing on my results. Do I need a pixel for pixel exact match of my viewport to the image size, and how do I properly configure my SCNCamera to ensure coordinate system agreeance for SolvePnP?
Thank you very much!

How do I get pixels per unit length after calling calibrateCamera?

I'm calibrating a camera using a grid of circles. The camera is in a fixed location above a table so I'm using a single image for calibration. (All the objects I’ll be working with will be flat and on the same table as my calibration image.) I'm putting the real-world locations of the circle centers into objectPoints and passing that to calibrateCamera.
Here is my calibration code (basically distilled down from the OpenCV calibration.cpp sample program to work for a single image):
int circlesPerRow = 56;
int circlesPerColumn = 32;
// The distance between circle centers is 4 cm
double centerToCenterDistance = 0.04;
Mat calibrationImage = imread(calibrationImageFileName, IMREAD_GRAYSCALE);
vector<Point2f> detectedCenters;
Size boardSize(circlesPerRow, circlesPerColumn);
bool found = findCirclesGrid(calibrationImage, boardSize, detectedCenters);
if (!found)
{
return ERR_INVALID_BOARD;
}
// Put the detected centers in the imagePoints vector
vector<vector<Point2f> > imagePoints;
imagePoints.push_back(detectedCenters);
// Set the aspect ratio to 1
Mat cameraMatrix = Mat::eye(3, 3, CV_64F);
double aspectRatio = 1.0;
cameraMatrix.at<double>(0, 0) = 1.0;
Size imageSize(calibrationImage.size());
vector<Mat> rvecs, tvecs;
Mat distCoeffs = Mat::zeros(8, 1, CV_64F);
// Create a vector of the centers in user units
vector<vector<Point3f> > objectPoints(1);
for (int i = 0; i < circlesPerColumn; i++)
for (int j = 0; j < circlesPerRow; j++)
objectPoints[0].push_back(Point3f(float(j*centerToCenterDistance), float(i*centerToCenterDistance), 0));
int flags = CALIB_FIX_ASPECT_RATIO | CALIB_FIX_K4 | CALIB_FIX_K5;
calibrateCamera(objectPoints, imagePoints, imageSize, cameraMatrix, distCoeffs, rvecs, tvecs, flags);
After calling calibrateCamera how do I calculate the number of pixels per meter on the same plane as the calibration circles in an undistorted image?
First things first, you are doing a calibration with only 1 image... it is recommended to use several images in different positions to get more accurate results, because you are calculating the intrisic parameters, if it was only the camera pose, PnP would be enough.
calibrateCamera will give you the intrinsics (camera matrix) parameters needed to project 3D points to the image plane of the camera. It will also give the Extrinsic parameters needed the origin to the camera origin (one per image given).
Once you do this calibration you can create a set of points, for example:
cv::Vec3f a(0., 0., 0.), b(1., 0., 0.);
Assuming that you are using meters in your world coordinate units, if not multiply accordingly :)
Now you have 2 options, the manual way which is apply the pin hole camera model formula to this two points, using as extrinsics the ones generated from your image that has the desired camera pose (in your case you only have one). Or you can use project points like:
// your last line
cv::calibrateCamera(objectPoints, imagePoints, imageSize, cameraMatrix, distCoeffs, rvecs, tvecs, flags);
// prepare the points
std::vector<cv::Point3f> pointsToProject{cv::Vec3f{0., 0., 0.},cv::Vec3f{0., 1., 0.}};
std::vector<cv::Point2f> projectedPoints;
// invert the extrinsic matrix
cv::Mat rotMat;
cv::rodrigues(rvecs[0], rotMat);
cv::Mat transformation = cv::Mat::eye(4,4,CV_32F);
rotMat.setTo(transformation(cv::Rect(0,0,3,3)));
transformation.at<float>(0,3) = tvecs[0][0];
transformation.at<float>(1,3) = tvecs[0][1];
transformation.at<float>(2,3) = tvecs[0][2];
transformation = transformation.inv();
// back rot and translation vectors
cv::Mat rvec, tvec(3,1,CV_32F);
cv::rodrigues(transformation(cv::Rect(0,0,3,3)), rvec);
tvec.at<float>(0) = transformation.at<float>(0,3);
tvec.at<float>(1) =transformation.at<float>(1,3);
tvec.at<float>(2) =transformation.at<float>(2,3);
cv::projectPoints(pointsToProject, rvec, tvec, cameraMatrix, distCoeffs, projectedPoints );
double amountOfPixelsPerMeter = cv::norm(projectedPoints[0]-projectedPoints[1]);
However this will give a meter distance before the extrinsics is applied, so even if it is in x axis, it may be different depending on the rotations.
I hope this helps, if not leave a comment. I wrote most of it out of my head, so it may have a typo or something.

SceneKit 3D Marker Augmented Reality iOS

Last couple of weeks I've been working on developing a simple proof-of-concept application in which a 3D model is projected over a specific Augmented Reality marker (in my case I am using Aruco markers) in IOS (with Swift and Objective-C)
I calibrated an Ipad Camera with a specific fixed lens position and used that to estimate the pose of the AR marker (which from my debug analysis seem pretty accurate). The problem seems (surprise, surprise) when I try to use SceneKit scene to project a model over the marker.
I am aware that the axis in opencv and SceneKit are different (Y and Z) and already done this correction as well as the row order/column order difference between the two libraries.
After constructing the projection matrix, I apply that same transform to the 3D model and from my debug analysis the object seems to be translated to the desired position and with the desired rotation. The problem is that it never does overlap the specific image pixel position of the marker. I am using a AVCapturePreviewVideoLayer as to put the video in background that has the same bounds as my SceneKit View.
Has anyone has a clue why this happens? I tried to play with cameras FOV's but with no real impact in the results.
Thank you all for your time.
EDIT1: I Will post some of the code here to reveal what I am currently doing.
I have two subviews inside the main view, one which is a background AVCaptureVideoPreviewLayer and another which is a SceneKitView. Both have the same bounds as the main view.
At each frame I use an opencv wrapper which outputs the pose of each marker:
std::vector<int> ids;
std::vector<std::vector<cv::Point2f>> corners, rejected;
cv::aruco::detectMarkers(frame, _dictionary, corners, ids, _detectorParams, rejected);
if (ids.size() > 0 ){
cv::aruco::drawDetectedMarkers(frame, corners, ids);
cv::Mat rvecs, tvecs;
cv::aruco::estimatePoseSingleMarkers(corners, 2.6, _intrinsicMatrix, _distCoeffs, rvecs, tvecs);
// Let's protect ourselves agains multiple markers
if (rvecs.total() > 1)
return;
_markerFound = true;
cv::Rodrigues(rvecs, _currentR);
_currentT = tvecs;
for (int row = 0; row < _currentR.rows; row++){
for (int col = 0; col < _currentR.cols; col++){
_currentExtrinsics.at<double>(row, col) = _currentR.at<double>(row, col);
}
_currentExtrinsics.at<double>(row, 3) = _currentT.at<double>(row);
}
_currentExtrinsics.at<double>(3,3) = 1;
std::cout << tvecs << std::endl;
// Convert coordinate systems of opencv to openGL (SceneKit)
// Note that in openCV z goes away the camera (in openGL goes into the camera)
// and y points down and on openGL point up
// Another note: openCV has a column order matrix representation, while SceneKit
// has a row order matrix, but we'll take care of it later.
cv::Mat cvToGl = cv::Mat::zeros(4, 4, CV_64F);
cvToGl.at<double>(0,0) = 1.0f;
cvToGl.at<double>(1,1) = -1.0f; // invert the y axis
cvToGl.at<double>(2,2) = -1.0f; // invert the z axis
cvToGl.at<double>(3,3) = 1.0f;
_currentExtrinsics = cvToGl * _currentExtrinsics;
cv::aruco::drawAxis(frame, _intrinsicMatrix, _distCoeffs, rvecs, tvecs, 5);
Then in each frame I convert the opencv matrix for a SCN4Matrix:
- (SCNMatrix4) transformToSceneKit:(cv::Mat&) openCVTransformation{
SCNMatrix4 mat = SCNMatrix4Identity;
// Transpose
openCVTransformation = openCVTransformation.t();
// copy the rotationRows
mat.m11 = (float) openCVTransformation.at<double>(0, 0);
mat.m12 = (float) openCVTransformation.at<double>(0, 1);
mat.m13 = (float) openCVTransformation.at<double>(0, 2);
mat.m14 = (float) openCVTransformation.at<double>(0, 3);
mat.m21 = (float)openCVTransformation.at<double>(1, 0);
mat.m22 = (float)openCVTransformation.at<double>(1, 1);
mat.m23 = (float)openCVTransformation.at<double>(1, 2);
mat.m24 = (float)openCVTransformation.at<double>(1, 3);
mat.m31 = (float)openCVTransformation.at<double>(2, 0);
mat.m32 = (float)openCVTransformation.at<double>(2, 1);
mat.m33 = (float)openCVTransformation.at<double>(2, 2);
mat.m34 = (float)openCVTransformation.at<double>(2, 3);
//copy the translation row
mat.m41 = (float)openCVTransformation.at<double>(3, 0);
mat.m42 = (float)openCVTransformation.at<double>(3, 1)+2.5;
mat.m43 = (float)openCVTransformation.at<double>(3, 2);
mat.m44 = (float)openCVTransformation.at<double>(3, 3);
return mat;
}
At each frame in which the AR marker is found I add a box to the scene and apply the transformation to the object node:
SCNBox *box = [SCNBox boxWithWidth:5.0 height:5.0 length:5.0 chamferRadius:0.0];
_boxNode = [SCNNode nodeWithGeometry:box];
if (found){
[self.delegate returnExtrinsicsMat:extrinsicMatrixOfTheMarker];
Mat R, T;
[self.delegate returnRotationMat:R];
[self.delegate returnTranslationMat:T];
SCNMatrix4 Transformation;
Transformation = [self transformToSceneKit:extrinsicMatrixOfTheMarker];
//_cameraNode.transform = SCNMatrix4Invert(Transformation);
[_sceneKitScene.rootNode addChildNode:_cameraNode];
//_cameraNode.camera.projectionTransform = SCNMatrix4Identity;
//_cameraNode.camera.zNear = 0.0;
_sceneKitView.pointOfView = _cameraNode;
_boxNode.transform = Transformation;
[_sceneKitScene.rootNode addChildNode:_boxNode];
//_boxNode.position = SCNVector3Make(Transformation.m41, Transformation.m42, Transformation.m43);
std::cout << (_boxNode.position.x) << " " << (_boxNode.position.y) << " " << (_boxNode.position.z) << std::endl << std::endl;
}
For example if the translation vector is (-1, 5, 20) the object appears in the scene in position (-1, -5, -20) in the scene, and the rotation is correct also. The problem is that it never appears in the correct position in the background image. I will add some images to show the result.
Does anyone know why this is happening?
Found out the solution. Instead of applying the transform to the node of the object I applied the inverted transformation matrix to the camera node. Then for the camera perspective transform matrix I applied the following matrix:
projection = SCNMatrix4Identity
projection.m11 = (2 * (float)(cameraMatrix[0])) / -(ImageWidth*0.5)
projection.m12 = (-2 * (float)(cameraMatrix[1])) / (ImageWidth*0.5)
projection.m13 = (width - (2 * Float(cameraMatrix[2]))) / (ImageWidth*0.5)
projection.m22 = (2 * (float)(cameraMatrix[4])) / (ImageHeight*0.5)
projection.m23 = (-height + (2 * (float)(cameraMatrix[5]))) / (ImageHeight*0.5)
projection.m33 = (-far - near) / (far - near)
projection.m34 = (-2 * far * near) / (far - near)
projection.m43 = -1
projection.m44 = 0
being far and near the z clipping planes.
I also had to correct the box initial position to center it on the marker.

OpenCV: rotation/translation vector to OpenGL modelview matrix

I'm trying to use OpenCV to do some basic augmented reality. The way I'm going about it is using findChessboardCorners to get a set of points from a camera image. Then, I create a 3D quad along the z = 0 plane and use solvePnP to get a homography between the imaged points and the planar points. From that, I figure I should be able to set up a modelview matrix which will allow me to render a cube with the right pose on top of the image.
The documentation for solvePnP says that it outputs a rotation vector "that (together with [the translation vector] ) brings points from the model coordinate system to the camera coordinate system." I think that's the opposite of what I want; since my quad is on the plane z = 0, I want a a modelview matrix which will transform that quad to the appropriate 3D plane.
I thought that by performing the opposite rotations and translations in the opposite order I could calculate the correct modelview matrix, but that seems not to work. While the rendered object (a cube) does move with the camera image and seems to be roughly correct translationally, the rotation just doesn't work at all; it on multiple axes when it should only be rotating on one, and sometimes in the wrong direction. Here's what I'm doing so far:
std::vector<Point2f> corners;
bool found = findChessboardCorners(*_imageBuffer, cv::Size(5,4), corners,
CV_CALIB_CB_FILTER_QUADS |
CV_CALIB_CB_FAST_CHECK);
if(found)
{
drawChessboardCorners(*_imageBuffer, cv::Size(6, 5), corners, found);
std::vector<double> distortionCoefficients(5); // camera distortion
distortionCoefficients[0] = 0.070969;
distortionCoefficients[1] = 0.777647;
distortionCoefficients[2] = -0.009131;
distortionCoefficients[3] = -0.013867;
distortionCoefficients[4] = -5.141519;
// Since the image was resized, we need to scale the found corner points
float sw = _width / SMALL_WIDTH;
float sh = _height / SMALL_HEIGHT;
std::vector<Point2f> board_verts;
board_verts.push_back(Point2f(corners[0].x * sw, corners[0].y * sh));
board_verts.push_back(Point2f(corners[15].x * sw, corners[15].y * sh));
board_verts.push_back(Point2f(corners[19].x * sw, corners[19].y * sh));
board_verts.push_back(Point2f(corners[4].x * sw, corners[4].y * sh));
Mat boardMat(board_verts);
std::vector<Point3f> square_verts;
square_verts.push_back(Point3f(-1, 1, 0));
square_verts.push_back(Point3f(-1, -1, 0));
square_verts.push_back(Point3f(1, -1, 0));
square_verts.push_back(Point3f(1, 1, 0));
Mat squareMat(square_verts);
// Transform the camera's intrinsic parameters into an OpenGL camera matrix
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
// Camera parameters
double f_x = 786.42938232; // Focal length in x axis
double f_y = 786.42938232; // Focal length in y axis (usually the same?)
double c_x = 217.01358032; // Camera primary point x
double c_y = 311.25384521; // Camera primary point y
cv::Mat cameraMatrix(3,3,CV_32FC1);
cameraMatrix.at<float>(0,0) = f_x;
cameraMatrix.at<float>(0,1) = 0.0;
cameraMatrix.at<float>(0,2) = c_x;
cameraMatrix.at<float>(1,0) = 0.0;
cameraMatrix.at<float>(1,1) = f_y;
cameraMatrix.at<float>(1,2) = c_y;
cameraMatrix.at<float>(2,0) = 0.0;
cameraMatrix.at<float>(2,1) = 0.0;
cameraMatrix.at<float>(2,2) = 1.0;
Mat rvec(3, 1, CV_32F), tvec(3, 1, CV_32F);
solvePnP(squareMat, boardMat, cameraMatrix, distortionCoefficients,
rvec, tvec);
_rv[0] = rvec.at<double>(0, 0);
_rv[1] = rvec.at<double>(1, 0);
_rv[2] = rvec.at<double>(2, 0);
_tv[0] = tvec.at<double>(0, 0);
_tv[1] = tvec.at<double>(1, 0);
_tv[2] = tvec.at<double>(2, 0);
}
Then in the drawing code...
GLKMatrix4 modelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, 0.0f);
modelViewMatrix = GLKMatrix4Translate(modelViewMatrix, -tv[1], -tv[0], -tv[2]);
modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, -rv[0], 1.0f, 0.0f, 0.0f);
modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, -rv[1], 0.0f, 1.0f, 0.0f);
modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, -rv[2], 0.0f, 0.0f, 1.0f);
The vertices I'm rendering create a cube of unit length around the origin (i.e. from -0.5 to 0.5 along each edge.) I know with OpenGL translation functions performed transformations in "reverse order," so the above should rotate the cube along the z, y, and then x axes, and then translate it. However, it seems like it's being translated first and then rotated, so perhaps Apple's GLKMatrix4 works differently?
This question seems very similar to mine, and in particular coder9's answer seems like it might be more or less what I'm looking for. However, I tried it and compared the results to my method, and the matrices I arrived at in both cases were the same. I feel like that answer is right, but that I'm missing some crucial detail.
You have to make sure the axis are facing the correct direction. Especially, the y and z axis are facing different directions in OpenGL and OpenCV to ensure the x-y-z basis is direct. You can find some information and code (with an iPad camera) in this blog post.
-- Edit --
Ah ok. Unfortunately, I used these resources to do it the other way round (opengl ---> opencv) to test some algorithms. My main issue was that the row order of the images was inverted between OpenGL and OpenCV (maybe this helps).
When simulating cameras, I came across the same projection matrices that can be found here and in the generalized projection matrix paper. This paper quoted in the comments of the blog post also shows some link between computer vision and OpenGL projections.
I'm not an IOS programmer, so this answer might be misleading!
If the problem is not in the order of applying the rotations and the translation, then suggest using a simpler and more commonly used coordinate system.
The points in the corners vector have the origin (0,0) at the top left corner of the image and the y axis is towards the bottom of the image. Often from math we are used to think of the coordinate system with the origin at the center and y axis towards the top of the image. From the coordinates you're pushing into board_verts I'm guessing you're making the same mistake. If that's the case, it's easy to transform the positions of the corners by something like this:
for (i=0;i<corners.size();i++) {
corners[i].x -= width/2;
corners[i].y = -corners[i].y + height/2;
}
then you call solvePnP(). Debugging this is not that difficult, just print the positions of the four corners and the estimated R and T, and see if they make sense. Then you can proceed to the OpenGL step. Please let me know how it goes.

Resources