Image transformation matrix in opencv - opencv

I'm currently working on this [opencv sample]
The interesting part is the warpPerspectiveRand method at line 89. I want to set the rotation angle, translation, scaling and other transformation values manually instead of using randomly generated values, but I don't know how to calculate the matrix elements.
A simple calculation example would be helpful.
Thanks

double ang = 0.1;            // rotation angle in radians
double xscale = 1.2;
double yscale = 1.5;
double xTranslation = 100;
double yTranslation = 200;
// 3x3 homogeneous transform: scale first, then rotate, then translate (M = T * R * S)
cv::Mat t = cv::Mat::zeros(3, 3, CV_64F);
t.at<double>(0,0) = xscale*cos(ang);
t.at<double>(0,1) = -yscale*sin(ang);
t.at<double>(1,0) = xscale*sin(ang);
t.at<double>(1,1) = yscale*cos(ang);
t.at<double>(0,2) = xTranslation;
t.at<double>(1,2) = yTranslation;
t.at<double>(2,2) = 1;
EDIT:
Rotation is always about (0,0). To rotate about a different point, translate that point to the origin, rotate, and translate back. This can be done by creating two matrices, one for the rotation (A) and one for the translation (T) that moves the pivot to the origin, and building a new matrix M as:
M = inv(T) * A * T
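A minimal sketch of that composition in plain C++ (no OpenCV, so it can be checked in isolation). The names Mat3, translation, rotation, rotationAbout and apply are made up for this illustration, and T is assumed to be the translation that moves the pivot (cx, cy) to the origin:

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Point2 { double x, y; };

// Multiply two 3x3 matrices (a * b).
Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Homogeneous translation by (tx, ty).
Mat3 translation(double tx, double ty) {
    return {{{1, 0, tx}, {0, 1, ty}, {0, 0, 1}}};
}

// Rotation about the origin by ang radians (counter-clockwise in a y-up frame).
Mat3 rotation(double ang) {
    return {{{std::cos(ang), -std::sin(ang), 0},
             {std::sin(ang),  std::cos(ang), 0},
             {0, 0, 1}}};
}

// M = inv(T) * A * T, with T = translation(-cx, -cy) and inv(T) = translation(cx, cy).
Mat3 rotationAbout(double ang, double cx, double cy) {
    return mul(translation(cx, cy), mul(rotation(ang), translation(-cx, -cy)));
}

// Apply an affine 3x3 matrix (last row 0 0 1) to a 2D point.
Point2 apply(const Mat3& m, Point2 p) {
    return {m[0][0] * p.x + m[0][1] * p.y + m[0][2],
            m[1][0] * p.x + m[1][1] * p.y + m[1][2]};
}
```

The pivot itself maps to itself under M, which is a quick sanity check for the multiplication order.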

What you're looking for is a projection matrix
http://en.wikipedia.org/wiki/3D_projection
There are different matrix styles: some are 4x4 (the complete theoretical projection matrix), some are 3x3 (as in OpenCV), because they consider the projection as a transform from one planar surface to another planar surface, and this constraint allows one to express the transform by a 3x3 matrix.

Related

Pose Estimation - cv::SolvePnP with Scenekit - Coordinate System Question

I have been working on Pose Estimation (rectifying key points on a 3D model with 2D points on an image to match pose) via OpenCV's cv::solvePnP, using features / key points from Apple's Vision framework.
TL;DR:
My SceneKit model is being translated and the units look correct when introspecting the translation and rotation vectors from solvePnP (i.e., they are the right order of magnitude), but the coordinate system of the translation appears off:
I am trying to understand the coordinate system requirements of solvePnP with respect to the Metal / OpenGL coordinate system and my camera projection matrix.
What 'projectionMatrix' does my SCNCamera require to match the image-based coordinate system passed into solvePnP?
Some things I've read / believe I am taking into account:
OpenCV vs OpenGL (thus Metal) have row major vs column major differences.
OpenCV's coordinate system for 3D is different than OpenGL (thus Metal).
Longer with code:
My workflow is as such:
Step 1 - use a 3D model tool to introspect points on my 3D model and get the objects vertex positions for the major key points in the 2D detected features. I am using left pupil, right pupil, tip of nose, tip of chin, left outer lip corner, right outer lip corner.
Step 2 - Run a vision request and extract a list of points in image space (converting for OpenCV's top left coordinate system) and extract the same ordered list of 2D points.
Step 3 - Construct a camera matrix by using the size of the input image.
Step 4 - run cv::solvePnP, and then use cv::Rodrigues to convert the rotation vector to a matrix
Step 5 - Convert the coordinate system of the resulting transforms into something appropriate for the GPU - invert the y and z axis and combine the translation and rotation to a single 4x4 Matrix, and then transpose it for the appropriate major ness of OpenGL / Metal
Step 6 - apply the resulting transform to Scenekit via:
let faceNodeTransform = openCVWrapper.transform(for: landmarks, imageSize: size)
self.destinationView.pointOfView?.transform = SCNMatrix4Invert(faceNodeTransform)
Below is my Obj-C++ OpenCV Wrapper which takes in a subset of Vision Landmarks and the true pixel size of the image being looked at:
// https://answers.opencv.org/question/23089/opencv-opengl-proper-camera-pose-using-solvepnp/
- (SCNMatrix4) transformFor:(VNFaceLandmarks2D*)landmarks imageSize:(CGSize)imageSize
{
// 1 convert landmarks to image points in image space (pixels) to vector of cv::Point2f's :
// Note that this translates the point coordinate system to be top left oriented for OpenCV's image coordinates:
std::vector<cv::Point2f > imagePoints = [self imagePointsForLandmarks:landmarks imageSize:imageSize];
// 2 Load Model Points
std::vector<cv::Point3f > modelPoints = [self modelPoints];
// 3 create our camera intrinsic matrix (focal length and principal point, in pixels)
// TODO - see if this is sane?
double max_d = fmax(imageSize.width, imageSize.height);
cv::Mat cameraMatrix = (cv::Mat_<double>(3,3) << max_d, 0, imageSize.width/2.0,
0, max_d, imageSize.height/2.0,
0, 0, 1.0);
// 4 Run solvePnP
double distanceCoef[] = {0,0,0,0};
cv::Mat distanceCoefMat = cv::Mat(1 ,4 ,CV_64FC1,distanceCoef);
// Output Matrixes
std::vector<double> rv(3);
cv::Mat rotationOut = cv::Mat(rv);
std::vector<double> tv(3);
cv::Mat translationOut = cv::Mat(tv);
cv::solvePnP(modelPoints, imagePoints, cameraMatrix, distanceCoefMat, rotationOut, translationOut, false, cv::SOLVEPNP_EPNP);
// 5 Convert rotation matrix (actually a vector)
// To a real 4x4 rotation matrix:
cv::Mat viewMatrix = cv::Mat::zeros(4, 4, CV_64FC1);
cv::Mat rotation;
cv::Rodrigues(rotationOut, rotation);
// Append our transforms to our matrix and set final to identity:
for(unsigned int row=0; row<3; ++row)
{
for(unsigned int col=0; col<3; ++col)
{
viewMatrix.at<double>(row, col) = rotation.at<double>(row, col);
}
viewMatrix.at<double>(row, 3) = translationOut.at<double>(row, 0);
}
viewMatrix.at<double>(3, 3) = 1.0f;
// Convert OpenCV to OpenGL coords (flip the y and z axes)
cv::Mat cvToGl = cv::Mat::zeros(4, 4, CV_64FC1);
cvToGl.at<double>(0, 0) = 1.0f;
cvToGl.at<double>(1, 1) = -1.0f; // Invert the y axis
cvToGl.at<double>(2, 2) = -1.0f; // invert the z axis
cvToGl.at<double>(3, 3) = 1.0f;
viewMatrix = cvToGl * viewMatrix;
// Finally transpose to get correct SCN / OpenGL Matrix :
cv::Mat glViewMatrix = cv::Mat::zeros(4, 4, CV_64FC1);
cv::transpose(viewMatrix , glViewMatrix);
return [self convertCVMatToMatrix4:glViewMatrix];
}
- (SCNMatrix4) convertCVMatToMatrix4:(cv::Mat)matrix
{
SCNMatrix4 scnMatrix = SCNMatrix4Identity;
scnMatrix.m11 = matrix.at<double>(0, 0);
scnMatrix.m12 = matrix.at<double>(0, 1);
scnMatrix.m13 = matrix.at<double>(0, 2);
scnMatrix.m14 = matrix.at<double>(0, 3);
scnMatrix.m21 = matrix.at<double>(1, 0);
scnMatrix.m22 = matrix.at<double>(1, 1);
scnMatrix.m23 = matrix.at<double>(1, 2);
scnMatrix.m24 = matrix.at<double>(1, 3);
scnMatrix.m31 = matrix.at<double>(2, 0);
scnMatrix.m32 = matrix.at<double>(2, 1);
scnMatrix.m33 = matrix.at<double>(2, 2);
scnMatrix.m34 = matrix.at<double>(2, 3);
scnMatrix.m41 = matrix.at<double>(3, 0);
scnMatrix.m42 = matrix.at<double>(3, 1);
scnMatrix.m43 = matrix.at<double>(3, 2);
scnMatrix.m44 = matrix.at<double>(3, 3);
return (scnMatrix);
}
Some questions:
An SCNNode has no modelViewMatrix to just throw a matrix at (as I understand it, it only has a transform, which is the modelMatrix), so I've read that the inverse of the transform from the solvePnP process can be used to pose the camera instead, which appears to get me the closest result. I want to ensure this approach is correct.
If I have the modelViewMatrix and the projectionMatrix, I should be able to calculate the appropriate modelMatrix? Is this the approach I should be taking?
It's unclear to me what projectionMatrix I should be using for my SceneKit scene and whether that has any bearing on my results. Do I need a pixel-for-pixel exact match of my viewport to the image size, and how do I properly configure my SCNCamera to ensure coordinate system agreement with solvePnP?
Thank you very much!
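One relation that may help with the projection question: for an ideal pinhole camera whose principal point is at the image center, the vertical field of view follows from the focal length in pixels and the image height. The sketch below assumes that model; verticalFovDegrees is a made-up helper, and how the value maps onto SCNCamera settings is something to verify against the SceneKit documentation:

```cpp
#include <cmath>

// Vertical field of view, in degrees, of a pinhole camera.
// fy: focal length in pixels (the [1][1] entry of the camera matrix).
// imageHeight: image height in pixels. Assumes the principal point is centered.
double verticalFovDegrees(double fy, double imageHeight) {
    const double kPi = std::acos(-1.0);
    return 2.0 * std::atan(imageHeight / (2.0 * fy)) * 180.0 / kPi;
}
```

With the camera matrix built in the wrapper above, fy is max(width, height), so for a landscape image the implied vertical field of view is 2 * atan(height / (2 * width)). If the render camera uses a different field of view, the posed model would not be expected to line up with the image.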

opencv Vec3d to Eigen::Quaternion, euler flipping on results

I am using cv::solvePnP to return a camera pose. I run PnP, and it returns the rvec and tvec values (rotation vector and position).
I then run this function to convert the values to the camera pose:
void GetCameraPoseEigen(cv::Vec3d tvecV, cv::Vec3d rvecV, Eigen::Vector3d &Translate, Eigen::Quaterniond &quats)
{
Mat R;
Mat tvec, rvec;
tvec = DoubleMatFromVec3b(tvecV);
rvec = DoubleMatFromVec3b(rvecV);
cv::Rodrigues(rvec, R); // R is 3x3
R = R.t(); // rotation of inverse
tvec = -R*tvec; // translation of inverse
Eigen::Matrix3d mat;
cv2eigen(R, mat);
Eigen::Quaterniond EigenQuat(mat);
quats = EigenQuat;
double x_t = tvec.at<double>(0, 0);
double y_t = tvec.at<double>(1, 0);
double z_t = tvec.at<double>(2, 0);
Translate.x() = x_t * 10;
Translate.y() = y_t * 10;
Translate.z() = z_t * 10;
}
This works, yet at some rotation angles the converted rotation values flip randomly between positive and negative values, while the source rvecV value does not. I assume this means I am going wrong with my conversion. How can I get a stable quaternion from the cv::Vec3d returned by PnP?
EDIT: This seems to be Quaternion flipping, as mentioned here:
Quaternion is flipping sign for very similar rotations?
Based on that, i have tried adding:
if(quat.w() < 0)
{
quat = quat.Inverse();
}
But I see the same flipping.
Both quat and -quat represent the same rotation. You can check that by taking a unit quaternion, converting it to a rotation matrix, then doing
quat.coeffs() = -quat.coeffs();
and converting that to a rotation matrix as well.
If for some reason you always want a positive w value, negate all coefficients if w is negative.
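A small stand-alone sketch of that check, using a hand-rolled quaternion type rather than Eigen so it compiles without dependencies. Quat, toMatrix and canonicalize are illustrative names; the quaternion is assumed to be unit length:

```cpp
#include <array>
#include <cmath>

struct Quat { double w, x, y, z; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rotation matrix of a unit quaternion. Every term is a product of two
// coefficients, so negating all four coefficients leaves the matrix unchanged.
Mat3 toMatrix(const Quat& q) {
    return {{{1 - 2 * (q.y * q.y + q.z * q.z), 2 * (q.x * q.y - q.w * q.z), 2 * (q.x * q.z + q.w * q.y)},
             {2 * (q.x * q.y + q.w * q.z), 1 - 2 * (q.x * q.x + q.z * q.z), 2 * (q.y * q.z - q.w * q.x)},
             {2 * (q.x * q.z - q.w * q.y), 2 * (q.y * q.z + q.w * q.x), 1 - 2 * (q.x * q.x + q.y * q.y)}}};
}

// Return the representative with w >= 0 by negating all four coefficients.
Quat canonicalize(Quat q) {
    if (q.w < 0) return {-q.w, -q.x, -q.y, -q.z};
    return q;
}
```

Comparing toMatrix(q) with toMatrix(canonicalize(q)) element by element shows the two are the same rotation.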
The sign should not matter...
... rotation-wise, as long as all four fields of the 4D quaternion are getting flipped. There's more to it explained here:
Quaternion to EulerXYZ, how to differentiate the negative and positive quaternion
Think of it this way:
Angle/axis both flipped mean the same thing
and mind the clockwise to counterclockwise transition much like in a mirror image.
There may be a convention to keep the quat.w() or quat[0] component positive and negate the other components accordingly. Assuming w = cos(angle/2), requiring w > 0 just means: I want the angle to be within the (-pi, pi) range, so that a -270 degree rotation becomes a +90 degree rotation.
Doing the quat.Inverse() is probably not what you want, because this creates a rotation in the opposite direction. That is -quat != quat.Inverse().
Also: check that both systems have the same handedness (chirality)! Test if your rotation matrix determinant is +1 or -1.
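The determinant test can be sketched as follows; Mat3, determinant and isProperRotation are hypothetical names and the tolerance is an arbitrary choice:

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix by cofactor expansion along the first row.
double determinant(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// A proper rotation has determinant +1; -1 indicates a reflection,
// i.e. a change of handedness somewhere in the pipeline.
bool isProperRotation(const Mat3& m, double tol = 1e-6) {
    return std::abs(determinant(m) - 1.0) < tol;
}
```

This only checks the determinant, not orthonormality, so it is a handedness check rather than a full validity check.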

What is the use of Projection matrix?

I've been trying to analyse Apple's pARk (an augmented reality sample application), where I came across the function below.
Method call with parameters below:
createProjectionMatrix(projectionTransform, 60.0f*DEGREES_TO_RADIANS, self.bounds.size.width*1.0f / self.bounds.size.height, 0.25f, 1000.0f);
void createProjectionMatrix(mat4f_t mout, float fovy, float aspect, float zNear, float zFar)
{
float f = 1.0f / tanf(fovy/2.0f);
mout[0] = f / aspect;
mout[1] = 0.0f;
mout[2] = 0.0f;
mout[3] = 0.0f;
mout[4] = 0.0f;
mout[5] = f;
mout[6] = 0.0f;
mout[7] = 0.0f;
mout[8] = 0.0f;
mout[9] = 0.0f;
mout[10] = (zFar+zNear) / (zNear-zFar);
mout[11] = -1.0f;
mout[12] = 0.0f;
mout[13] = 0.0f;
mout[14] = 2 * zFar * zNear / (zNear-zFar);
mout[15] = 0.0f;
}
I see this projection matrix is multiplied with the rotation matrix (obtained from the motionManager.deviceMotion API).
What is the use of the projection matrix? Why should it be multiplied with the rotation matrix?
multiplyMatrixAndMatrix(projectionCameraTransform, projectionTransform, cameraTransform);
Why does the resultant matrix have to be multiplied with the PointOfInterest vector coordinates again?
multiplyMatrixAndVector(v, projectionCameraTransform, placesOfInterestCoordinates[i]);
Appreciate any help here.
Sample code link here
In computer vision and in robotics, a typical task is to identify specific objects in an image and to determine each object's POSITION and ORIENTATION (or Translation and Rotation) relative to some coordinate system.
In Augmented Reality we normally calculate the pose of the detected object and then augment a virtual model on top of it. We can project the virtual model more realistically if we know the pose of the detected object.
The joint rotation-translation matrix [R|t] is called a matrix of extrinsic parameters. It is used to describe the camera motion around a static scene, or vice versa, rigid motion of an object in front of a still camera. That is, [R|t] translates coordinates of a point (X, Y, Z) to a coordinate system fixed with respect to the camera. This gives you the 6DOF pose (3 rotation and 3 translation) required for mobile AR.
A good read if you want to read more http://games.ianterrell.com/learn-the-basics-of-opengl-with-glkit-in-ios-5/
Sorry I am only working with Android AR. Hope this helps :)

Use of maths in the Apple pARk sample code

I've studied the pARk example project (http://developer.apple.com/library/IOS/#samplecode/pARk/Introduction/Intro.html#//apple_ref/doc/uid/DTS40011083) so I can apply some of its fundamentals in an app I'm working on. I understand nearly everything, except:
The way it calculates whether a point of interest must appear or not. It gets the attitude, multiplies it with the projection matrix (to get the rotation in GL coords?), then multiplies that matrix with the coordinates of the point of interest and, at last, looks at the last coordinate of that vector to find out if the point of interest must be shown. What are the mathematical fundamentals of this?
Thanks a lot!!
I assume you are referring to the following method:
- (void)drawRect:(CGRect)rect
{
if (placesOfInterestCoordinates == nil) {
return;
}
mat4f_t projectionCameraTransform;
multiplyMatrixAndMatrix(projectionCameraTransform, projectionTransform, cameraTransform);
int i = 0;
for (PlaceOfInterest *poi in [placesOfInterest objectEnumerator]) {
vec4f_t v;
multiplyMatrixAndVector(v, projectionCameraTransform, placesOfInterestCoordinates[i]);
float x = (v[0] / v[3] + 1.0f) * 0.5f;
float y = (v[1] / v[3] + 1.0f) * 0.5f;
if (v[2] < 0.0f) {
poi.view.center = CGPointMake(x*self.bounds.size.width, self.bounds.size.height-y*self.bounds.size.height);
poi.view.hidden = NO;
} else {
poi.view.hidden = YES;
}
i++;
}
}
This is performing an OpenGL like vertex transformation on the places of interest to check if they are in a viewable frustum. The frustum is created in the following line:
createProjectionMatrix(projectionTransform, 60.0f*DEGREES_TO_RADIANS, self.bounds.size.width*1.0f / self.bounds.size.height, 0.25f, 1000.0f);
This sets up a frustum with a 60 degree field of view, a near clipping plane of 0.25 and a far clipping plane of 1000. Any point of interest that is further away than 1000 units will then not be visible.
So, to step through the code, first the projection matrix that sets up the frustum, and the camera view matrix, which simply rotates the object so it is the right way up relative to the camera, are multiplied together. Then, for each place of interest, its location is multiplied by the viewProjection matrix. This will project the location of the place of interest into the view frustum, applying rotation and perspective.
The next two lines then convert the transformed location of the place into what is known as normalized device coordinates. The 4-component vector needs to be collapsed to 3-dimensional space; this is achieved by projecting it onto the plane w == 1, by dividing the vector by its w component, v[3]. It is then possible to determine if the point lies within the projection frustum by checking if its coordinates lie in the cube with side length 2 centered at [0, 0, 0]. In this case, the x and y coordinates are biased from the range [-1, 1] to [0, 1] to match up with the UIKit coordinate system, by adding 1 and dividing by 2.
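The divide and bias can be sketched in isolation like this. Vec4, Point2, insideClipVolume and clipToView are invented names; the visibility test shown is the usual OpenGL-style clip test (-w <= x, y, z <= w), which is not what the sample's drawRect: does:

```cpp
#include <cmath>

struct Vec4 { double x, y, z, w; };
struct Point2 { double x, y; };

// OpenGL-style clip test: after dividing by w, each of x, y, z must lie in [-1, 1].
// Testing against w before dividing also rejects points with w <= 0.
bool insideClipVolume(const Vec4& clip) {
    return clip.w > 0.0 &&
           std::abs(clip.x) <= clip.w &&
           std::abs(clip.y) <= clip.w &&
           std::abs(clip.z) <= clip.w;
}

// Perspective divide, then bias x and y from [-1, 1] to [0, 1] and scale to
// a view of the given size whose origin is at the top-left (y pointing down).
Point2 clipToView(const Vec4& clip, double width, double height) {
    double x = (clip.x / clip.w + 1.0) * 0.5;
    double y = (clip.y / clip.w + 1.0) * 0.5;
    return {x * width, height - y * height};
}
```

A point at the center of the frustum, (0, 0, 0, 1), lands in the middle of the view; the y flip accounts for UIKit's top-left origin.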
Next, the v[2] component, z, is tested against 0 (the code checks v[2] < 0.0f). This is not strictly correct: unlike x and y, z has been neither divided by w nor biased, so the normalized value should really be compared against the [-1, 1] range. The intent is to detect whether the place of interest is in the first half of the projection frustum; if it is, the object is deemed visible and displayed.
If you are unfamiliar with vertex projection and coordinate systems, this is a huge topic with a fairly steep learning curve. There is however a lot of material online covering it, here are a couple of links to get you started:
http://www.falloutsoftware.com/tutorials/gl/gl0.htm
http://www.opengl.org/wiki/Vertex_Transformation
Good luck

iOS: Questions about camera information within GLKMatrix4MakeLookAt result

The iOS 5 documentation reveals that GLKMatrix4MakeLookAt operates the same as gluLookAt.
The definition is provided here:
static __inline__ GLKMatrix4 GLKMatrix4MakeLookAt(float eyeX, float eyeY, float eyeZ,
float centerX, float centerY, float centerZ,
float upX, float upY, float upZ)
{
GLKVector3 ev = { eyeX, eyeY, eyeZ };
GLKVector3 cv = { centerX, centerY, centerZ };
GLKVector3 uv = { upX, upY, upZ };
GLKVector3 n = GLKVector3Normalize(GLKVector3Add(ev, GLKVector3Negate(cv)));
GLKVector3 u = GLKVector3Normalize(GLKVector3CrossProduct(uv, n));
GLKVector3 v = GLKVector3CrossProduct(n, u);
GLKMatrix4 m = { u.v[0], v.v[0], n.v[0], 0.0f,
u.v[1], v.v[1], n.v[1], 0.0f,
u.v[2], v.v[2], n.v[2], 0.0f,
GLKVector3DotProduct(GLKVector3Negate(u), ev),
GLKVector3DotProduct(GLKVector3Negate(v), ev),
GLKVector3DotProduct(GLKVector3Negate(n), ev),
1.0f };
return m;
}
I'm trying to extract camera information from this:
1. Read the camera position
GLKVector3 cPos = GLKVector3Make(mx.m30, mx.m31, mx.m32);
2. Read the camera right vector as `u` in the above
GLKVector3 cRight = GLKVector3Make(mx.m00, mx.m10, mx.m20);
3. Read the camera up vector as `v` in the above
GLKVector3 cUp = GLKVector3Make(mx.m01, mx.m11, mx.m21);
4. Read the camera look-at vector as `n` in the above
GLKVector3 cLookAt = GLKVector3Make(mx.m02, mx.m12, mx.m22);
There are two questions:
The look-at vector seems negated as they defined it, since they perform (eye - center) rather than (center - eye). Indeed, when I call GLKMatrix4MakeLookAt with a camera position of (0,0,-10) and a center of (0,0,1) my extracted look at is (0,0,-1), i.e. the negative of what I expect. So should I negate what I extract?
The camera position I extract is the result of the view transformation matrix premultiplying the view rotation matrix, hence the dot products in their definition. I believe this is incorrect - can anyone suggest how else I should calculate the position?
Many thanks for your time.
Per its documentation, gluLookAt calculates centre - eye, uses that for some intermediate steps, then negates it for placement into the resulting matrix. So if you want centre - eye back, taking the negative is exactly right.
You'll also notice that the result returned is equivalent to a multMatrix with the rotational part of the result followed by a glTranslate by -eye. Since the classic OpenGL matrix operations post-multiply, that means gluLookAt is defined to post-multiply the rotational part by the translational part. So Apple's implementation is correct, and is the same as first moving the camera, then rotating it.
So if you define R = (the matrix defining the rotational part of your instruction) and T = (the translational analogue), you get the product R * T. If you want to extract T you could premultiply by the inverse of R and then pull the results out of the final column, since matrix multiplication is associative. Because T translates by -eye, that column holds the negated camera position.
As a bonus, because R is orthonormal, the inverse is just the transpose.
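Put together, the extraction could look like the sketch below. It re-implements the look-at construction from the question in plain C++ (column-major storage, as in GLKMatrix4) purely so the example is self-contained; Vec3, Mat4, lookAt, eyeFromView and forwardFromView are illustrative names, not GLKit API:

```cpp
#include <array>
#include <cmath>

struct Vec3 { double x, y, z; };
using Mat4 = std::array<double, 16>;  // column-major: m[12..14] is the translation

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Same construction as the GLKMatrix4MakeLookAt listing in the question.
Mat4 lookAt(Vec3 eye, Vec3 center, Vec3 up) {
    Vec3 n = normalize(sub(eye, center));
    Vec3 u = normalize(cross(up, n));
    Vec3 v = cross(n, u);
    return {u.x, v.x, n.x, 0.0,
            u.y, v.y, n.y, 0.0,
            u.z, v.z, n.z, 0.0,
            -dot(u, eye), -dot(v, eye), -dot(n, eye), 1.0};
}

// eye = -R^T * t, where the rows of R are u, v, n and t is the last column.
Vec3 eyeFromView(const Mat4& m) {
    return {-(m[0] * m[12] + m[1] * m[13] + m[2]  * m[14]),
            -(m[4] * m[12] + m[5] * m[13] + m[6]  * m[14]),
            -(m[8] * m[12] + m[9] * m[13] + m[10] * m[14])};
}

// Viewing direction (center - eye, normalized) is the negated n vector.
Vec3 forwardFromView(const Mat4& m) { return {-m[2], -m[6], -m[10]}; }
```

For the example in the question, an eye at (0, 0, -10) looking at (0, 0, 1) gives n = (0, 0, -1), so the recovered forward vector is (0, 0, 1) and the recovered eye is (0, 0, -10).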
