View results of affine transform - opencv

I am trying to find out why, when I apply affine transformations to an image in OpenCV, the result is not visible in the preview window; instead, the entire window is black. How can I work around this so that I can always view the transformed image (the result of the affine transform) in the window, no matter which transformation is applied?
Update: I think this happens because all the transformations are calculated with respect to the origin of the coordinate system (the top-left corner of the image). For rotation I can specify the center of rotation and I am able to view the result, but when I perform scaling I cannot control where the transformed image goes. Is it possible to somehow move the coordinate system to make the image fit in the window?
Update2: I have an image that contains only an ROI at some position (the rest of the image is black), and I need to apply a set of affine transforms to it. To make things simpler and to see the effect of each individual transform, I applied the transforms one by one. What I noticed is that whenever I translate the image so that the center of the ROI lies at the origin of the coordinate system (the top-left corner of the view window), all the affine transforms behave correctly and the ROI does not move. However, with the ROI center at the origin, the upper and left parts of the ROI are cut off from the view window.
If I move the ROI's center to another point in the view window (for example the window center), an affine transform of the form
A=[a 0 0; 0 b 0] (A is the 2x3 matrix passed to the warpAffine function)
moves the image (ROI) out of the view window (which doesn't happen when the ROI's center is in the top-left corner). How can I modify the affine transform so that the image stays in place (behaves the same way as when the ROI center is at the origin of the coordinate system)?

If you want to be able to apply any affine transform, you will not always be able to view the result. A better idea might be to manually apply your transform to the 4 corners of your image and then look at the coordinates where those 4 points end up. That will tell you where your image is going.
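A minimal plain-Python sketch of that check, assuming the transform is a 2x3 matrix stored as a list of two rows (the layout warpAffine takes); the helper names are hypothetical. It also computes an extra translation that shifts the transformed corners back to (0, 0), which is one way to keep the result in view.

```python
def transform_point(A, x, y):
    """Apply a 2x3 affine matrix A (two rows) to the point (x, y)."""
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])


def corner_bounds(A, width, height):
    """Bounding box (min_x, min_y, max_x, max_y) of the transformed image corners."""
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    pts = [transform_point(A, x, y) for x, y in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def shift_into_view(A, width, height):
    """Return A with extra translation so the transformed corners start at
    (0, 0), together with the (width, height) needed to contain them."""
    min_x, min_y, max_x, max_y = corner_bounds(A, width, height)
    shifted = [[A[0][0], A[0][1], A[0][2] - min_x],
               [A[1][0], A[1][1], A[1][2] - min_y]]
    return shifted, (max_x - min_x, max_y - min_y)


# Rotate a 100x50 image by 90 degrees: (x, y) -> (-y, x)
M, size = shift_into_view([[0, -1, 0], [1, 0, 0]], 100, 50)
print(M, size)  # [[0, -1, 50], [1, 0, 0]] (50, 100)
```

With OpenCV in Python, the shifted matrix (converted to a float NumPy array) and the size (rounded up to integers) could then be passed as the M and dsize arguments of cv2.warpAffine.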
If you have several transforms, just combine them into one transform. If you have 3 transforms
[A], [B], [C]
transforming an image by A, then B, then C is equivalent to transforming the image once by
[C]*[B]*[A]
If your transforms are 2x3 matrices, convert them to 3x3 matrices by adding
[0,0,1]
as a new bottom row, then multiply the 3x3 matrices together. When you are finished, the bottom row will still be [0,0,1]; drop it to get your new, combined 2x3 affine transform.
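A minimal plain-Python sketch of this bookkeeping (the helper names are hypothetical; matrices are lists of rows):

```python
def to_3x3(A):
    """Append [0, 0, 1] as the bottom row of a 2x3 affine matrix."""
    return [list(A[0]), list(A[1]), [0, 0, 1]]


def matmul3(P, Q):
    """Product P*Q of two 3x3 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose(*transforms):
    """Combine 2x3 affine transforms applied in the given order.

    compose(A, B, C) returns the 2x3 part of C*B*A.
    """
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for T in transforms:
        # Left-multiply: each later transform goes on the left.
        result = matmul3(to_3x3(T), result)
    assert result[2] == [0, 0, 1]  # the bottom row survives unchanged
    return [result[0], result[1]]


shift = [[1, 0, 5], [0, 1, 0]]   # translate 5 px to the right
scale = [[2, 0, 0], [0, 2, 0]]   # scale by 2 about the origin
print(compose(shift, scale))     # [[2, 0, 10], [0, 2, 0]]
print(compose(scale, shift))     # [[2, 0, 5], [0, 2, 0]]
```

The two calls differ only in order, which shows why the order of the product matters.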
Update
If you want to apply a transform to an object as if the object were somewhere else, you can combine 3 transforms. First translate the object to the location where you want it to be transformed (the origin of the coordinate system in your case) with an affine transform [A]. Then apply your scaling transform [B], then translate back to where you started. The translation back is the inverse of [A], so your final transform is
final_transform = [A].inv()*[B]*[A]
The order of operations reads right to left in matrix multiplication: [A] is applied first, then [B], then [A].inv().
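A plain-Python sketch of that composition for the scaling case from the question, using 3x3 matrices; scale_about and the other helper names are hypothetical. The resulting matrix keeps the chosen center fixed, which is what keeps the ROI in place:

```python
def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scaling(a, b):
    return [[a, 0, 0], [0, b, 0], [0, 0, 1]]


def matmul3(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale_about(cx, cy, a, b):
    """2x3 matrix that scales by (a, b) while keeping the point (cx, cy) fixed."""
    A = translation(-cx, -cy)      # move the ROI center to the origin
    A_inv = translation(cx, cy)    # the inverse of a translation negates it
    B = scaling(a, b)              # the A=[a 0 0; 0 b 0] transform from the question
    final = matmul3(matmul3(A_inv, B), A)  # [A].inv()*[B]*[A], applied right to left
    return [final[0], final[1]]    # drop the bottom row for warpAffine


M = scale_about(50, 40, 2, 3)
print(M)  # [[2, 0, -50], [0, 3, -80]]
```

The translation column works out to cx*(1 - a) and cy*(1 - b), so the point (50, 40) maps to itself.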


Marker Tracking + perspective warp of marker

I'm tracking a marker with ARToolKit+. I receive a model view matrix that looks about right. Now I'd like to warp the image in a way that the marker looks just like it would look if I looked straight at it. But whatever I do, the result looks just extremely distorted. I know that ARToolKit stores the 4x4 matrix in column major order, so I fixed that for OpenCV.
What I tried so far was:
1) fix the order to row-major order
2) calculate the inverse with cvInvert (although transposing the 3x3 rotation part R and replacing the translation t with -R^T*t should suffice)
3) use that matrix with cvWarpPerspective
Am I doing something wrong?
tl;dr:
I want this: https://www.youtube.com/watch?v=qZ-LU-C2p2Q
I get some distorted lines and lots of black instead.
Your problem is in converting from 4x4 to 3x3. The short answer is that you want to drop the 3rd column and bottom row to make the 3x3 and then premultiply with your camera matrix. For a longer explanation see here
Clarification
The pose you get from ARTK represents a transform from one place to another. When I say "the initial image appears without rotation" I meant that your transform goes from an initial state which has no rotation about the x or y axis to the current state. That is a fine assumption for most augmented reality applications, I mentioned it just to be thorough.
As for why you can drop the 3rd column: since you are transforming a plane, your z coordinate is completely determined by your x and y coordinates through the equation of the plane. If we assume that initially there is no rotation, the initial z coordinate is a constant. If there is rotation, z is not constant, but it still varies deterministically with x and y according to the plane equation, which can still be expressed in one matrix (though you don't need that). In your case the 4x4 transform is probably expressing the transform from the marker lying flat at z = 0 to its current position, so the 3rd column of the 4x4 matrix does nothing (it always gets multiplied by 0) and can be dropped without affecting the result.
In short: forget about the rotation stuff, it's more complicated than you need. Just realize that the transform goes from initial coordinates to final coordinates, and your initial coordinates are always
[x,y,0,1]
which makes your third column irrelevant.
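A minimal plain-Python sketch of that conversion, assuming M has already been reordered to row-major (as in the question) and K is a standard 3x3 pinhole camera matrix. The function name is hypothetical, and any axis-convention differences between ARToolKit and OpenCV are not handled here. It keeps columns 0, 1 and 3 of the top three rows and premultiplies by K:

```python
def pose_to_homography(K, M):
    """3x3 homography from the marker plane (x, y, 1) with z = 0 to pixels.

    K is the 3x3 camera matrix, M the 4x4 marker-to-camera pose (row-major).
    """
    # Drop the third column (it only ever multiplies z = 0) and the bottom row.
    Rt = [[M[i][0], M[i][1], M[i][3]] for i in range(3)]
    # Premultiply by the camera matrix.
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

For any marker point, H*(x, y, 1) should equal K times the first three components of M*(x, y, 0, 1), which is exactly the claim that the third column is irrelevant.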
Update
I'm sorry! I just re-read your question and realized you just want to warp the marker so it looks like a straight-on view; I got caught up in describing a general conversion from 4x4 to 3x3. The 4x4 transform you get from ARTK is not the transform that will de-warp the marker; it is the transform that moves the marker from the origin to its final position. To de-warp the marker as you asked, the process is similar but slightly different. I haven't done that before, but here is my guess.
First, you need the 4x4 transform between where the marker is in world space and where you would like it to appear after warping. Right now the transform goes from the origin to the marker location. To make it go instead from some point farther along the z axis (say z = 100) to the marker location, define the transform
initial_marker_pose = [1,0,0,0
0,1,0,0
0,0,1,100
0,0,0,1];
Now you have the transform from the origin to what you want as your "initial" position, and the transform from the origin to your "final" position. To get the transform from initial to final, simply compute
initial_to_final = origin_to_marker*initial_marker_pose.inv();
Now you would follow the process outlined in the link I gave you, in this case your initial zpos is no longer 0, it is 100. Then when you are finished you will need to invert your 3x3 matrix. That is because this process takes you from a straight on view to the one defined by the pose from ARTK and you want the opposite of that. You will need to experiment with the initial z position. The smaller it is, the larger your marker will appear after de-warping.
Hopefully that works, sorry for the confusion about your question.

Transform position of point from one perspective into another

I'm trying to convert the position of a point which was filmed with a freely moving camera (local space) into the position in a image of the same scene (global space). The position of the point is given in local space and I need to calculate it in global space. I have markers distributed all over the scene to have corresponding points in both global and local space to calculate the perspective transform.
I tried to calculate the perspective transform matrix by comparing the points of corresponding markers in global and local space with the help of JavaCV (cvGetPerspectiveTransform(localMarker, globalMarker, mmat)). Then I transform the position of the point in local space with the perspective transform matrix (cvPerspectiveTransform(localFieldPoints, globalFieldPoints, mmat)).
I thought that would be enough to solve my problem, but it doesn't work well. I also noticed that when I calculate the perspective transform matrix from different markers in one specific frame of the video, I get different perspective transform matrices. If I understood everything correctly, this shouldn't happen, because the perspective is always the same there, so I should always get the same perspective transform matrix, shouldn't I?
Because I'm quite new to all of this and this was my first attempt, I just wanted to know if the method I used is generally right or if it should be done differently. Maybe I just missed something?
EDIT:
Again, I have one image of the complete scene I look at and a video from a camera that moves freely in the scene. I take every frame of the video and compare it with the image of the complete scene. (I used different cameras for the image and the video, so the camera intrinsics actually aren't the same. Could that be the problem?)
Perspective Transform Screenshot.
On the right side is the image of the scene; on the left, one frame of the video. The red circle in the left video frame is the given point. The red square in the right image is the point calculated with the perspective transform. As you can see, the calculated point isn't at the right position.
What I meant by "I get different perspective transform matrices" is that when I calculate a perspective transform matrix with marker "0E3E" I get a different matrix than with marker "0272".
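For reference, the OpenCV documentation describes the per-point operation of cvPerspectiveTransform as multiplying (x, y, 1) by the 3x3 matrix and dividing by the third component. A plain-Python sketch of that operation (the function name is hypothetical):

```python
def perspective_transform_point(H, x, y):
    """Map (x, y) through the 3x3 homography H, with the projective division."""
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    W = H[2][0] * x + H[2][1] * y + H[2][2]
    if W == 0:
        raise ValueError("point maps to infinity")
    return X / W, Y / W


print(perspective_transform_point([[2, 0, 1], [0, 2, 3], [0, 0, 1]], 1, 1))    # (3.0, 5.0)
print(perspective_transform_point([[1, 0, 0], [0, 1, 0], [0, 0.5, 1]], 4, 2))  # (2.0, 1.0)
```

Keep in mind that a single 3x3 matrix relates two views exactly only for points lying on one plane (or when the camera only rotates), so points off the plane of the markers used to compute it will not map correctly.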

Why does DirectX use a flipped Y axis?

I am saving my driven X/Y coordinates, then using a function that converts the coordinates to meters and adds 1280 to each point (so they fit nicely into a 2560x2560 image), and then drawing a polygon between the points, resulting in a sort of racing line. But once I have generated the polygon and saved it as an image, it is vertically flipped. Flipping the image vertically makes it match the track bitmaps perfectly. I was told this is because DirectX internally has the Y axis flipped. Why does DirectX use a flipped Y axis?
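A minimal sketch of the usual fix on the drawing side, assuming the coordinates are already in meters at 1 pixel per meter and the 2560x2560 bitmap from the question (the function name is hypothetical). Bitmap rows grow downward, so the Y offset has to have the world Y subtracted from it rather than added to it:

```python
SIZE = 2560
HALF = SIZE // 2  # 1280, the offset used in the question


def world_to_bitmap(x_m, y_m):
    """Map world coordinates (Y up) to bitmap pixels (row 0 at the top)."""
    px = x_m + HALF
    py = HALF - y_m  # flip Y: a larger world Y means a smaller row index
    return px, py


print(world_to_bitmap(0, 0))      # (1280, 1280)
print(world_to_bitmap(100, 200))  # (1380, 1080)
```

This is the same as computing SIZE - (y + HALF), i.e. flipping the finished image vertically.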
Well, the question is, does DirectX have a flipped Y-axis or does the image?
DirectX uses a 3D/4D coordinate system where the X axis points to the right and the Y axis points upwards when no transformation is applied. The screen (where the Y axis points downwards) is only the last stage that processes the image; every step before that uses the coordinate system with the upward Y axis. Since Direct3D is designed for 3D worlds, a coordinate system that is aligned like the world, and like most coordinate systems in maths, is much more convenient for the programmer and designer. Imagine you were creating a 3D model: wouldn't it be kind of weird if you had to design it with the Y axis pointing downwards?
With no transformation applied (no perspective projection and so on), you are working directly in this same coordinate system. Ignoring the Z axis, the top-left corner of the screen is (-1 | 1) and the bottom-right corner is (1 | -1), just as in the coordinate systems used in maths. In the end, this coordinate system is transformed by the viewport, which maps the top-left corner to (0 | 0) and the bottom-right corner to (ResolutionX | ResolutionY).
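A sketch of that last viewport step, assuming a viewport whose top-left corner is at pixel (0, 0) and ignoring depth (the function name is hypothetical). The Y flip between the upward mathematical axis and the downward screen axis happens in this single mapping:

```python
def ndc_to_screen(x_ndc, y_ndc, width, height):
    """Map normalized device coordinates (Y up, range -1..1) to pixel
    coordinates (Y down, origin at the top-left) for a viewport at (0, 0)."""
    sx = (x_ndc + 1.0) * 0.5 * width
    sy = (1.0 - y_ndc) * 0.5 * height  # the Y flip happens here
    return sx, sy


print(ndc_to_screen(-1, 1, 1920, 1080))  # (0.0, 0.0)
print(ndc_to_screen(1, -1, 1920, 1080))  # (1920.0, 1080.0)
print(ndc_to_screen(0, 0, 1920, 1080))   # (960.0, 540.0)
```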
So all in all, the reason why the Y-axis points upwards is that Direct3D's main purpose is to describe worlds in a convenient way independently of the screen's physical attributes.

Is there a reverse function of lookat for glMatrix?

I am using glMatrix to code WebGL and want to get the eye position, focal point and up direction from the existing projection and view matrices (kind of the reverse of the lookAt function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there were a method for it. I'd be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them, rather than trying to convert them back and forth between matrices and vectors.
Yes and No: Yes you can invert the model view transformation and no you will not get exactly all three vectors the same.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is only used to align the viewing axis with the -Z axis (in OpenGL the eye faces -Z). The distance between center and eye is lost. So you can easily get a center point along your viewing axis if you choose the distance yourself. Say we want a distance of d: we just transform (0,0,-d) with the inverse matrix and get a valid center point, though not exactly the same one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera, and thus for only one scalar value. So for the inverse transformation you could choose not only any positive value along the Y axis, but any point in the YZ plane with a positive Y value. To get an up vector that is exactly perpendicular to the viewing axis and of length 1, we just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
Now we have eye, center and up reconstructed in a way that gives exactly the same result from lookAt next time. But since this matrix contains only 6 values of information (3 for translation, plus pan, tilt and roll), we had to choose the 3 values that were lost (the distance from center to eye, and the length and angle of the up vector in the camera's YZ plane).
The model view matrix can of course do other transformation (any affine) but the lookAt function is using this matrix only for translation and rotation. It is adjusting the scene in front of the camera without distorting it.
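The reconstruction described above can be sketched in plain Python. This is an illustration under stated assumptions, not glMatrix code: matrices are row-major nested lists acting on column vectors (glMatrix stores column-major Float32Arrays), look_at reproduces the usual gluLookAt construction so the round trip can be checked, and all helper names are hypothetical. Only the view matrix is used.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(a):
    n = math.sqrt(dot(a, a))
    return [v / n for v in a]


def look_at(eye, center, up):
    """Row-major 4x4 view matrix in the gluLookAt convention (camera looks down -Z)."""
    f = normalize(sub(center, eye))
    s = normalize(cross(f, up))
    u = cross(s, f)
    return [s + [-dot(s, eye)],
            u + [-dot(u, eye)],
            [-f[0], -f[1], -f[2], dot(f, eye)],
            [0.0, 0.0, 0.0, 1.0]]


def invert_look_at(view, d=1.0):
    """Recover (eye, center, up) from a view matrix; d is the chosen eye-center distance."""
    R = [row[:3] for row in view[:3]]
    t = [row[3] for row in view[:3]]
    # The inverse of [R | t] is [R^T | -R^T t]; transforming the point (0, 0, 0) gives the eye.
    eye = [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]
    # Directions are transformed as vectors: R^T only, no translation.
    forward = [-R[2][i] for i in range(3)]  # R^T * (0, 0, -1)
    up = [R[1][i] for i in range(3)]        # R^T * (0, 1, 0)
    # Same as transforming the point (0, 0, -d) with the inverse matrix.
    center = [eye[i] + d * forward[i] for i in range(3)]
    return eye, center, up
```

Feeding the recovered triple back into look_at reproduces the original view matrix, even though center and up generally differ from the values originally passed in.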

Stretch an image to fit in any quadrangle

The application PhotoFiltre has an option to stretch part of an image. You select a rectangular shape and can then grab and move the vertices somewhere else to make any quadrangle. The image part you selected stretches along. Hopefully these images make my point a little clearer:
Is there a general algorithm which can handle this? I would like to obtain the same effect on HTML5 canvas - given an image and the resulting corner points, I would like to be able to draw the stretched image in such a way that it fills the new quadrangle neatly.
A while ago I asked something similar, and the solution was to divide the image into triangles and stretch each triangle so that its three points correspond to the three points in the original image. This technique turned out to be rather expensive, and I would like to know if there is a more general method of accomplishing this.
I would like to use this in a 3D renderer, but I would like to work with a (2D) quadrangle.
I don't know whether PhotoFiltre internally also uses triangles, or whether it uses another (cheaper) algorithm to stretch an image like this.
Does someone perhaps know if there is a cheaper or more general method/algorithm to stretch a rectangular image, so that it fills a quadrangle given four points?
The normal method is to start with the destination, pick an appropriate grid size, and then for each point in the new shape calculate the corresponding point in the source image (possibly with interpolation, depending on the quality you need).
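A minimal plain-Python sketch of that destination-driven loop with bilinear interpolation, assuming a grayscale image stored as a list of rows; dst_to_src is a hypothetical parameter standing for whatever destination-to-source mapping you use.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at the real-valued point (x, y),
    clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(src, dst_w, dst_h, dst_to_src):
    """Fill a dst_w x dst_h image by looking up, for every destination pixel,
    the source position returned by dst_to_src (hypothetical mapping)."""
    out = []
    for y2 in range(dst_h):
        row = []
        for x2 in range(dst_w):
            x1, y1 = dst_to_src(x2, y2)
            row.append(bilinear_sample(src, x1, y1))
        out.append(row)
    return out


img = [[0, 10], [20, 30]]
print(bilinear_sample(img, 0.5, 0.5))  # 15.0
```

Iterating over destination pixels guarantees every output pixel gets exactly one value, which is why this direction is preferred over pushing source pixels forward.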
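Affine (or, for a general quadrangle, perspective) transform.
Given four points for the "stretched" figure and four points for the figure it should match (e.g. a rectangle), a transform between the two provides the spatial mapping you need: for each point (x1,y1) in the original image there is a corresponding point (x2,y2) in the "stretched" image. Note that a strictly affine transform is fixed by three point pairs and keeps parallel lines parallel, so it can only map a rectangle onto a parallelogram; for an arbitrary quadrangle you need its projective generalization, a perspective transform (homography), which is fixed by four point pairs.
For each integer-valued pixel (x2, y2) in the stretched image, use the inverse of that mapping to find the corresponding real-valued point (x1, y1) in the original image and apply its color to (x2, y2).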
http://demonstrations.wolfram.com/AffineTransform/
You'll find sample code for Java and other languages online. .NET has the Matrix class.
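Because four arbitrary corner points generally cannot be matched by an affine transform (it has only six parameters and keeps parallel lines parallel), here is a plain-Python sketch of the four-point version: a perspective transform (homography) with eight unknowns, solved by Gaussian elimination. This is a swapped-in technique (the same mapping OpenCV's getPerspectiveTransform computes), and the function names are hypothetical. Solving with the stretched quadrangle as src and the image rectangle as dst gives the destination-to-source mapping needed by the per-pixel loop.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def homography_from_points(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping each of four src points to its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, x, y):
    """Map (x, y) through H, including the division by the third coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Destination quadrangle (the stretched shape) and the source image rectangle.
quad = [(10, 5), (120, 20), (90, 80), (0, 60)]
rect = [(0, 0), (100, 0), (100, 50), (0, 50)]
# For each destination pixel inside the quadrangle, H gives the source position to sample.
H = homography_from_points(quad, rect)
```

When the quadrangle happens to be a parallelogram, the bottom row comes out as [0, 0, 1] and the mapping reduces to an ordinary affine transform.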
