OpenCV POSIT algorithm - coordinate systems - opencv

I know that Posit calculates the translation and rotation between your camera and a 3d object.
the only problem i have right now is, i have no idea how the coordinate systems of the camera and the object are defined.
So for example if i get 90° around the z-axis, in which direction is the z axis pointing and is the object rotating around this axis or is the camera rotating around it?
Edit:
After some testing and playing around with different coordinate systems, i think this is right:
definition of the camera coordinate system:
z-axis is pointing in the direction, in which the camera is looking.
x-axis is pointing to the right, while looking in z-direction.
y-axis is pointing up, while looking in z-direction.
the object is defined in the same coordinate system, but each point is defined relative to the starting point and not to the coordinate systems origin.
the translation vector you get, tells you how point[0] of the object is moved away from the origin of the camera coordinate system.
the rotationmatrix tells you how to rotate the object in the cameras coordinate system, in order to get the objects starting orientation. so the rotation matrix basically doesnt tell you how the object is rotated right now, but it tells you how you have to reverse its current orientation.
can anyone approve this?

Check out this answer.
The Y axis is pointing downward. I don't know what do You mean by starting point. The camera lays in the origin of it's coordinate system, and object points are defined in this system.
You are right with the rotation matrix, well, half of. The rotation matrix tells You, how to rotate the coordinate system to make it oriented the same as the coordinate system used to define model of the object. So it does tell You how the object is oriented with respect to the camera coordinate system.

Related

OpenCV - Determine detected objects angle from "center" of frame

I am working on a machine vision project and need to determine the angle of an object in x and y relative to the center of the frame (center in my mind being where the camera is pointed). I originally did NOT do a camera calibration (calculated angle per pixel by taking a picture of a dense grid and doing some simple math). While doing some object tracking I was noticing some strange behaviour which I suspected was due to some distortion. I also noticed that an object that should be dead center of my frame was not, the camera had to be shifted or the angle changed for that to be true.
I performed a calibration in OpenCV and got a principal point of (363.31, 247.61) with a resolution of 640x480. The angle per pixel obtained by cv2.calibrationMatrixVales() was very close to what I had calculated, but up to this point I was assuming center of the frame was based on 640/2, 480/2. I'm hoping that someone can confirm, but going forward do I assume that my (0,0) in cartesian coordinates is now at the principal point? Perhaps I can use my new camera matrix to correct the image so my original assumption is true? Or I am out to lunch and need some direction on how to achieve this.
Also was my assumption of 640/2 correct or should it technically have been (640-1)/2. Thanks all!

Recreate the 3D outlines of a City street in iOS SceneKit with OSM XML data

What is best strategy to recreate part of a street in iOS SceneKit using .osm XML data?
Please assume part of a street is offered in the OSM XML data and contains the necessary geopoints with latitude and longitude denoting the Nodes to describe the paths/footprints of 6 buildings (i.e. ground floor plans that line the side of a street).
Specifically, what's the best strategy to convert latitude and longitude Nodes in order to locate these building footprints/polygons on the ground floor in a scene within SceneKit iOS? (i.e. running through position 0,0,0)? Thank you.
Very roughly and briefly, based on my own experience with 3D map rendering:
Transform the XML data from lat/long to appropriate coordinates for a 2D map (that is, project it to a plane using a map projection, then apply a 2D affine transform to get it into screen pixel coordinates). Create a 2D map that's wider and taller than the actual screen, because of what's going to happen in step 2:
Using a 3D coordinate system with your map vertical (i.e., set all the Z coordinates to zero), rotate the map so that it reclines at an appropriate shallow angle, as if you're in an aeroplane looking down on it; the angle might be 30 degrees from horizontal. To rotate the map you'll need to create a 3D rotation matrix. The axis of rotation will be the X axis: that is, the horizontal line that is the bottom border of your 2D map. The rotation is exactly the same as what happens when you rotate your laptop screen away from you.
Supply the new 3D coordinates to your rendering system. I haven't used SceneKit but I had a quick look at the documentation and you can use any coordinate system you like, so you will be able to use one that is convenient for the process I have just described: something that uses units the size of a screen pixel at the viewing plane, with Y going upwards, X going right, and Z going away from the viewer.
One final caveat: if you want to add extrusions giving a rough approximation of the 3D building shapes (such data is available in OSM for some areas) note that my scheme requires the tops of buildings, and indeed anything above ground level, to have negative Z coordinates.
Pretty simple. First, convert Your CLLocationCoordinate2D to a MKMapPoint, which is exactly the same as a CGRect. Second, scale down the MKMapPoint by some arbitrary number so it fits in with how you want it on your scene graph, let's say by 200. Since scenekit's coordinate system is centered at (0,0), you'll need to make sure your location is correct. Then just create your scnvector3's with the x/y of he MKMapPoint, and you will be locked to coordinates.

Camera pose and reflections using OpenCV's SovePNP

I'm trying to use the function SolvePNP to estimate the relative position of a camera. Mi question is this, when choosing world coordinates, do I need to be careful in choosing them so that there can be no reflections when transforming them to camera coordinates? Or will OpenCV correct that for me?
Details: I'm filming a tennis court and was originally setting the world coordinate origin to be the centre of the court, with the x-axis pointing parallel to the net towards the left, the y-axis pointing forwards vertically on the court, and the z-axis pointing upwards. If I've understood correctly, SolvePNP will transform these coordinates to a system with origin at some point behind the top left corner of an image, with x-axis pointing downwards on the image, y-axis pointing to the right, and z-axis pointing forwards to the scene. However this transformation would definitely involve a reflection, must I swap the x and y axis of my world coordinates to avoid this or is it fine to leave them as they are? (Also, let me know if I'm making a big mistake and SolvePnp actually puts the origin at a point behind the centre of the image rather than one the top left corner...)
Assuming that you have a camera calibration matrix (and that such calibration was done assuming a right hand coordinate system all along), and correct correspondences between the tennis field features in the image and the CAD-features:
You need to select the reference frame in the tennis court such that is a right hand coordinate system, so that your solution from solvePNP provides the pose and position of the tennis field reference frame with respect to the camera coordinate system (by default a right hand coordinate system).
Hope it helps

Is there a reverse function of lookat for glMatrix?

I am using the glMatrix to code Webgl and want to get the eye position, focal point and up direction from the existing projection and view matrix (kinda like the reverse of lookat function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question wether you would want to do this even if there was a method for it. I'll be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them rather than to try and pull them back and forth from matrix to vectors and back.
Yes and No: Yes you can invert the model view transformation and no you will not get exactly all three vectors the same.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is just used for adjusting the axis of view along the -Z axis. In openGL the eye is facing to -Z. The distance between center and eye is lost. So you can easy get a center point along your axis of view if you define the distance yourself. Let's say we want a distance of d. Then we just need to transform (0,0,-d) with the inverse matrix and we get a valid center point, but not exactly the same. The center point is defining only two rotation angles, the camera pan and tilt.
Even more worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus only for one scalar value. Thus for the inverse transformation you can not only choose any positive value along the Y axis, you could choose any point in the YZ plane with a positive Y value. To get a up vector perfectly normal to the viewing axis and of size 1 we just transform (0,1,0) with the inverse matrix. Remember to transform as vector this time (not as point).
Now we have eye, center and up reconstructed in a way to get exactly the same result of lookAt next time. But since this matrix contains only 6 values of information (translation,pan,tilt,roll) we had to choose 3 values that were lost (distance center to eye, size and angle of up vector in YZ plane of camera).
The model view matrix can of course do other transformation (any affine) but the lookAt function is using this matrix only for translation and rotation. It is adjusting the scene in front of the camera without distorting it.

What is this rotation behavior in XNA?

I am just starting out in XNA and have a question about rotation. When you multiply a vector by a rotation matrix in XNA, it goes counter-clockwise. This I understand.
However, let me give you an example of what I don't get. Let's say I load a random art asset into the pipeline. I then create some variable to increment every frame by 2 radians when the update method runs(testRot += 0.034906585f). The main thing of my confusion is, the asset rotates clockwise in this screen space. This confuses me as a rotation matrix will rotate a vector counter-clockwise.
One other thing, when I specify where my position vector is, as well as my origin, I understand that I am rotating about the origin. Am I to assume that there are perpendicular axis passing through this asset's origin as well? If so, where does rotation start from? In other words, am I starting rotation from the top of the Y-axis or the x-axis?
The XNA SpriteBatch works in Client Space. Where "up" is Y-, not Y+ (as in Cartesian space, projection space, and what most people usually select for their world space). This makes the rotation appear as clockwise (not counter-clockwise as it would in Cartesian space). The actual coordinates the rotation is producing are the same.
Rotations are relative, so they don't really "start" from any specified position.
If you are using maths functions like sin or cos or atan2, then absolute angles always start from the X+ axis as zero radians, and the positive rotation direction rotates towards Y+.
The order of operations of SpriteBatch looks something like this:
Sprite starts as a quad with the top-left corner at (0,0), its size being the same as its texture size (or SourceRectangle).
Translate the sprite back by its origin (thus placing its origin at (0,0)).
Scale the sprite
Rotate the sprite
Translate the sprite by its position
Apply the matrix from SpriteBatch.Begin
This places the sprite in Client Space.
Finally a matrix is applied to each batch to transform that Client Space into the Projection Space used by the GPU. (Projection space is from (-1,-1) at the bottom left of the viewport, to (1,1) in the top right.)
Since you are new to XNA, allow me to introduce a library that will greatly help you out while you learn. It is called XNA Debug Terminal and is an open source project that allows you to run arbitrary code during runtime. So you can see if your variables have the value you expect. All this happens in a terminal display on top of your game and without pausing your game. It can be downloaded at http://www.protohacks.net/xna_debug_terminal
It is free and very easy to setup so you really have nothing to lose.

Resources