I'm using XNA but it doesn't matter too much for this example. So let's say I have a sprite. I then apply a scaling matrix before anything. Is the scaling matrix applied scaling the local axis of the sprite or just moving the points down? In other words, is applying a scaling matrix of 0.5f in the world space to my sprite at the world origin scaling down the local axis of the sprite or just all the points that make up that sprite by half?
The same kind of applies to a translation and then scaling. In my head, I picture a translation matrix of 30,30 as moving the sprite's local origin to 30,30 and as a result, the sprite's local axis to 30,30. Then, scaling by 0.5f would scale back the local axis but I don't see why the origin of the sprite would now be at 15,15.
This confusion is compounded by the fact that if you perform a translation of 1 to the right on the x-axis in the world, you are now moving based on the scale which you applied (so you would only move .5 in the world). This leads me to believe that the scale is applied to the object's own axis.
Btw, if you guys talk about the origin in your followups, could you state which origin you are referring to?
Thanks
Normally a sprite is defined by its vertices (points). Applying a scaling matrix to a sprite will transform the vertices (points) of the sprite.
A scale matrix always assumes (0, 0) is the origin of the scale transform. So if you scale a sprite centered at (30, 30) all points will stretch away from the (0, 0) point. If it helps, imagine the sprite as a small dot on a circle around the (0, 0) point with that entire circle being scaled.
If you want to scale a sprite at (30, 30) from the center of the sprite, you have to translate the center of the sprite to (0, 0) first, then translate the sprite back out to (30, 30) after the scale has been performed.
So that would be:
Translate(-30, -30)
Scale(0.5)
Translate(30, 30)
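To make the three steps above concrete, here is a minimal sketch in plain Python (no XNA involved; the helper names translate, scale and scale_about are made up for this illustration). It shows that scaling alone drags the sprite centre from (30, 30) to (15, 15), while the translate/scale/translate sequence keeps the centre in place.

```python
def translate(p, dx, dy):
    # Move a 2D point by (dx, dy).
    return (p[0] + dx, p[1] + dy)

def scale(p, s):
    # A scale matrix always scales about (0, 0) of the current space.
    return (p[0] * s, p[1] * s)

def scale_about(p, center, s):
    # Translate(-center), Scale(s), Translate(+center)
    p = translate(p, -center[0], -center[1])
    p = scale(p, s)
    return translate(p, center[0], center[1])

center = (30.0, 30.0)   # sprite centre in world space
corner = (40.0, 40.0)   # one corner of the sprite

naive_center = scale(center, 0.5)               # pulled towards the world origin
kept_center = scale_about(center, center, 0.5)  # stays where it was
kept_corner = scale_about(corner, center, 0.5)  # moves halfway towards the centre

print(naive_center, kept_center, kept_corner)
```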
To expand on Empyrean's answer, 3D worlds usually have at least four coordinate systems, each with its own local origin:
Object Space
World Space
Camera Space
View Space (2D!)
with three transformations:
Object to World
World to Camera
Camera to View
You can create new coordinate systems, for example 'Model Space', with the transformation 'Model to Object'. Using this, you get a series of steps:
Model -> scale -> Object
Object -> rotate -> translate -> World
World -> rotate -> translate -> Camera
Camera -> perspective -> View
In OpenGL you would push the matrices in the reverse order listed above, so the Model->Object transformation is the last to be pushed, and OpenGL should render the object correctly. I would assume XNA / DirectX has a similar system.
Getting more complex, Model Space can have a hierarchy of translations, scales and rotations in a tree to produce a skeletal system which can then be used to deform the model mesh. This is usually called Skinning.
So, to answer the question: the result of a transformation depends on the stage at which you apply it. A rotation, for example, applied in the Model->Object transformation rotates the model about the object's origin; applied in the Object->World transformation, it rotates the object about the world's origin.
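A small sketch of that difference, in plain Python rather than any particular graphics API (the vertex, the placement and the helper names are made up for illustration). The same quarter turn is applied either before the translation, so it turns around the object's own origin, or after it, so it swings the object around the world origin.

```python
import math

def rotate(p, angle):
    # Standard 2D rotation about (0, 0) of the current space.
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def translate(p, t):
    return (p[0] + t[0], p[1] + t[1])

vertex = (1.0, 0.0)        # hypothetical vertex, relative to the object's origin
object_pos = (10.0, 0.0)   # hypothetical placement of the object in the world
quarter = math.pi / 2

# Rotation applied before the translation: spins about the object's own origin.
spun_in_place = translate(rotate(vertex, quarter), object_pos)

# Rotation applied after the translation: orbits about the world origin.
orbited = rotate(translate(vertex, object_pos), quarter)

print(spun_in_place, orbited)
```

With these numbers the first result is roughly (10, 1) and the second roughly (0, 11).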
The goal of the project is to create a drawing app. I want it so that when I touch the screen and move my finger, the paint follows the finger and leaves a cyan trail. I did create it, BUT there is one problem: the paint DEPTH is always randomly placed.
here is the code, just need to connect the sceneView with the storyboard.
https://github.com/javaplanet17/test/blob/master/drawingar
My question is: how do I make the program keep the depth consistent? By consistent I mean there is always the same distance between the paint and the camera.
If you run the code above you will see that I have printed out all the SCNMatrix4 values, but none of them is the DEPTH.
I have tried to change hitTransform.m43, but it only messes up the x and y.
If you want to get a point some consistent distance in front of the camera, you don’t want a hit test. A hit test finds the real world surface in front of the camera — unless your camera is pointed at a wall that’s perfectly parallel to the device screen, you’re always going to get a range of different distances.
If you want a point some distance in front of the camera, you need to get the camera’s position/orientation and apply a translation (your preferred distance) to that. Then to place SceneKit content there, use the resulting matrix to set the transform of a SceneKit node.
The easiest way to do this is to stick to SIMD vector/matrix types throughout rather than converting between those and SCN types. SceneKit adds a bunch of new accessors in iOS 11 so you can use SIMD types directly.
There’s at least a couple of ways to go about this, depending on what result you want.
Option 1
// set up z translation for 20 cm in front of whatever
// last column of a 4x4 transform matrix is translation vector
var translation = matrix_identity_float4x4
translation.columns.3.z = -0.2
// get camera transform the ARKit way (currentFrame is optional, so unwrap it)
guard let cameraTransform = view.session.currentFrame?.camera.transform else { return }
// if we wanted, we could go the SceneKit way instead; result is the same
// guard let cameraTransform = view.pointOfView?.simdTransform else { return }
// set node transform by multiplying matrices
node.simdTransform = cameraTransform * translation
This option, using a whole transform matrix, not only puts the node a consistent distance in front of your camera, it also orients it to point the same direction as your camera.
Option 2
// distance vector for 20 cm in front of whatever
let translation = float3(x: 0, y: 0, z: -0.2)
// treat distance vector as in camera space, convert to world space
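// (pointOfView is optional, so unwrap it first)
guard let pointOfView = view.pointOfView else { return }
let worldTranslation = pointOfView.simdConvertPosition(translation, to: nil)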
// set node position (not whole transform)
node.simdPosition = worldTranslation
This option sets only the position of the node, leaving its orientation unchanged. For example, if you place a bunch of cubes this way while moving the camera, they’ll all be lined up facing the same direction, whereas with option 1 they’d all be in different directions.
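To see the matrix arithmetic behind both options without ARKit, here is a rough sketch in plain Python. The camera pose (position (1, 2, 3), turned 90 degrees about Y) and the helper names are invented for the example, and matrices are written row-major with the translation in the last column. Multiplying the camera transform by a translation of -0.2 along z gives a transform 20 cm in front of the camera; its last column is the position that Option 2 would set.

```python
import math

def matmul(a, b):
    # 4x4 matrix product, matrices stored as lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Hypothetical camera pose: at (1, 2, 3), rotated 90 degrees about Y,
# so its viewing direction (-Z in camera space) points along world -X.
camera = matmul(translation(1.0, 2.0, 3.0), rotation_y(math.pi / 2))

# 20 cm in front of the camera, expressed in camera space.
offset = translation(0.0, 0.0, -0.2)

# Option 1: whole transform (position and orientation follow the camera).
node = matmul(camera, offset)

# Option 2 uses only the position, which is the last column of that matrix.
node_position = (node[0][3], node[1][3], node[2][3])
print(node_position)
```

With this pose the node ends up at roughly (0.8, 2, 3), i.e. 0.2 units from the camera along its viewing direction.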
Going beyond
Both of the options above are based only on the 3D transform of the camera — they don’t take the position of a 2D touch on the screen into account.
If you want to do that, too, you’ve got more work cut out for you. Essentially what you’re doing is hit testing touches not against the world, but against a virtual plane that’s always parallel to the camera and a certain distance away. That plane is a cross section of the camera projection frustum, so its size depends on what fixed distance from the camera you place it at. A point on the screen projects to a point on that virtual plane, with its position on the plane scaling proportional to the distance from the camera.
So, to map touches onto that virtual plane, there are a couple of approaches to consider. (Not giving code for these because it’s not code I can write without testing, and I’m in an Xcode-free environment right now.)
Make an invisible SCNPlane that’s a child of the view’s pointOfView node, parallel to the local xy-plane and some fixed z distance in front. Use SceneKit hitTest (not ARKit hit test!) to map touches to that plane, and use the worldCoordinates of the hit test result to position the SceneKit nodes you drop into your scene.
Use Option 1 or Option 2 above to find a point some fixed distance in front of the camera (or a whole translation matrix oriented to match the camera, translated some distance in front). Use SceneKit’s projectPoint method to find the normalized depth value Z for that point, then call unprojectPoint with your 2D touch location and that same Z value to get the 3D position of the touch location with your camera distance. (For extra code/pointers, see my similar technique in this answer.)
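The geometry of that virtual plane can be sketched without SceneKit. The following is plain Python, not the projectPoint/unprojectPoint route described above, and it assumes a simple symmetric perspective camera with a known vertical field of view, touch coordinates in pixels with the origin at the top left, and a hypothetical helper name touch_to_camera_space. It returns the touched point in camera space; you would still convert it to world space with the camera transform.

```python
import math

def touch_to_camera_space(touch_x, touch_y, width, height, fov_y, distance):
    # Pixel coordinates -> normalized coordinates in [-1, 1], flipping y
    # because screen y grows downwards while camera-space y grows upwards.
    ndc_x = 2.0 * touch_x / width - 1.0
    ndc_y = 1.0 - 2.0 * touch_y / height
    # Half-size of the frustum cross section at the chosen distance.
    half_h = distance * math.tan(fov_y / 2.0)
    half_w = half_h * (width / height)
    # The camera looks down -Z, so the plane sits at z = -distance.
    return (ndc_x * half_w, ndc_y * half_h, -distance)

# Assumed values: a 200x100 view, 90 degree vertical FOV, plane 0.2 away.
center_hit = touch_to_camera_space(100, 50, 200, 100, math.pi / 2, 0.2)
corner_hit = touch_to_camera_space(200, 0, 200, 100, math.pi / 2, 0.2)
print(center_hit, corner_hit)
```

A touch in the middle of the screen lands straight ahead of the camera, and the plane's half-size grows linearly with the distance, which is the proportional scaling mentioned above.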
I am saving my driven X/Y coordinates, and then using a function that converts the coordinates to meters and adds 1280 to each point (so it will fit nicely into a 2560x2560 image), and then draws a polygon between the 'points', resulting in some sort of racing line. But once I have generated the polygon and saved it as an image, it is vertically flipped somehow. Flipping the image vertically makes it match the track bitmaps perfectly. I was told this is because DirectX internally has the Y axis flipped. Why does DirectX use a flipped Y axis?
Well, the question is, does DirectX have a flipped Y-axis or does the image?
DirectX uses a 3D/4D coordinate system where the X-axis points to the right and the Y-axis points upwards when no transformation is applied. The screen (where the Y-axis points downwards) is only the last stage that has to process the image. Every step before that uses the coordinate system with the upward Y-axis. Since Direct3D is designed for 3D worlds, a coordinate system that is aligned like the world and like most coordinate systems in maths is much more convenient for the programmer and designer. Imagine you were creating a 3D model. Wouldn't it be kind of weird to design it so that the Y-axis points downwards?
When you have no transformation at all (nothing that would add perspective and so on), you work directly in this coordinate system. Ignoring the Z-axis, the top left corner is (-1, 1) and the bottom right corner is (1, -1). This matches the coordinate systems used in e.g. maths. In the end, this coordinate system is transformed by the viewport, which maps the top left corner to (0, 0) and the bottom right corner to (ResolutionX, ResolutionY).
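As a rough illustration of that last viewport step, here is a small Python sketch (the function name ndc_to_pixel is made up, and it assumes a viewport that covers the whole image with no offset). The y term is where the flip happens.

```python
def ndc_to_pixel(x, y, width, height):
    # x: -1 (left) .. 1 (right)  ->  0 .. width
    # y:  1 (top)  .. -1 (bottom) ->  0 .. height, so y is flipped here
    return ((x + 1.0) * 0.5 * width, (1.0 - y) * 0.5 * height)

top_left = ndc_to_pixel(-1.0, 1.0, 2560, 2560)
bottom_right = ndc_to_pixel(1.0, -1.0, 2560, 2560)
middle = ndc_to_pixel(0.0, 0.0, 2560, 2560)
print(top_left, bottom_right, middle)
```

If you compute pixel positions yourself and skip this flip, the picture comes out mirrored vertically, which matches the symptom in the question.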
So all in all, the reason why the Y-axis points upwards is that Direct3D's main purpose is to describe worlds in a convenient way independently of the screen's physical attributes.
I am using the glMatrix to code Webgl and want to get the eye position, focal point and up direction from the existing projection and view matrix (kinda like the reverse of lookat function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there was a method for it. I'll be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them rather than to try and pull them back and forth from matrix to vectors and back.
Yes and No: Yes you can invert the model view transformation and no you will not get exactly all three vectors the same.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is just used for aligning the axis of view with the -Z axis. In OpenGL the eye faces -Z. The distance between center and eye is lost. So you can easily get a center point along your axis of view if you define the distance yourself. Let's say we want a distance of d. Then we just need to transform (0,0,-d) with the inverse matrix and we get a valid center point, but not exactly the same one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus only for one scalar value. Thus for the inverse transformation you can not only choose any positive value along the Y axis, you could choose any point in the YZ plane with a positive Y value. To get an up vector perfectly normal to the viewing axis and of size 1 we just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
Now we have eye, center and up reconstructed in a way to get exactly the same result of lookAt next time. But since this matrix contains only 6 values of information (translation,pan,tilt,roll) we had to choose 3 values that were lost (distance center to eye, size and angle of up vector in YZ plane of camera).
The model view matrix can of course do other transformation (any affine) but the lookAt function is using this matrix only for translation and rotation. It is adjusting the scene in front of the camera without distorting it.
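Here is a minimal sketch of that reconstruction in plain Python instead of glMatrix. The look_at helper follows the usual OpenGL convention (camera looking down -Z) and is my own stand-in, not glMatrix's code; recover and the sample vectors are made up for the example. It assumes the matrix contains only rotation and translation, so the inverse is the transposed rotation plus a re-derived translation.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def look_at(eye, center, up):
    # Conventional OpenGL-style view matrix, stored as a list of rows.
    f = normalize(sub(center, eye))   # viewing direction
    s = normalize(cross(f, up))       # camera right
    u = cross(s, f)                   # camera up, normal to f
    return [[s[0], s[1], s[2], -dot(s, eye)],
            [u[0], u[1], u[2], -dot(u, eye)],
            [-f[0], -f[1], -f[2], dot(f, eye)],
            [0.0, 0.0, 0.0, 1.0]]

def recover(view, distance):
    rows = [view[i][:3] for i in range(3)]
    t = [view[i][3] for i in range(3)]
    # Inverse applied to the point (0,0,0): eye = -R^T * t
    eye = tuple(-sum(rows[i][j] * t[i] for i in range(3)) for j in range(3))
    # Inverse applied to the point (0,0,-distance): eye + distance * forward
    forward = tuple(-rows[2][j] for j in range(3))
    center = tuple(eye[j] + distance * forward[j] for j in range(3))
    # Inverse applied to the vector (0,1,0): second row of the rotation
    up = tuple(rows[1])
    return eye, center, up

view = look_at((1.0, 2.0, 3.0), (1.0, 2.0, -7.0), (0.0, 5.0, 1.0))
eye, center, up = recover(view, 1.0)
rebuilt = look_at(eye, center, up)
print(eye, center, up)
```

The recovered eye is the original (1, 2, 3), but center comes back as (1, 2, 2) and up as (0, 1, 0) rather than the original inputs. Feeding them to look_at again still reproduces the same matrix, which is the point made above.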
I am just starting out in XNA and have a question about rotation. When you multiply a vector by a rotation matrix in XNA, it goes counter-clockwise. This I understand.
However, let me give you an example of what I don't get. Let's say I load a random art asset into the pipeline. I then create a variable that is incremented by 2 degrees every frame when the update method runs (testRot += 0.034906585f, which is 2 degrees expressed in radians). The main source of my confusion is that the asset rotates clockwise in screen space. This confuses me, as a rotation matrix will rotate a vector counter-clockwise.
One other thing: when I specify where my position vector is, as well as my origin, I understand that I am rotating about the origin. Am I to assume that there are perpendicular axes passing through this asset's origin as well? If so, where does rotation start from? In other words, am I starting rotation from the top of the Y-axis or the x-axis?
The XNA SpriteBatch works in Client Space, where "up" is Y-, not Y+ (as in Cartesian space, projection space, and what most people usually select for their world space). This makes the rotation appear clockwise (not counter-clockwise as it would in Cartesian space). The actual coordinates the rotation produces are the same.
Rotations are relative, so they don't really "start" from any specified position.
If you are using maths functions like sin or cos or atan2, then absolute angles always start from the X+ axis as zero radians, and the positive rotation direction rotates towards Y+.
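A quick numeric check of this, in plain Python rather than XNA (the rotate helper is just the standard 2D rotation formula):

```python
import math

def rotate(v, angle):
    # Standard rotation: a positive angle turns X+ towards Y+.
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

x_axis = (1.0, 0.0)
rotated = rotate(x_axis, math.pi / 2)        # roughly (0, 1)
angle = math.atan2(rotated[1], rotated[0])   # roughly pi / 2

# On a Cartesian drawing (Y+ up), (0, 1) is above the origin, so the turn
# looks counter-clockwise. In client space Y+ points down the screen, so the
# very same (0, 1) is drawn below the origin and the turn looks clockwise.
print(rotated, angle)
```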
The order of operations of SpriteBatch looks something like this:
Sprite starts as a quad with the top-left corner at (0,0), its size being the same as its texture size (or SourceRectangle).
Translate the sprite back by its origin (thus placing its origin at (0,0)).
Scale the sprite
Rotate the sprite
Translate the sprite by its position
Apply the matrix from SpriteBatch.Begin
This places the sprite in Client Space.
Finally a matrix is applied to each batch to transform that Client Space into the Projection Space used by the GPU. (Projection space is from (-1,-1) at the bottom left of the viewport, to (1,1) in the top right.)
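The order of operations above can be sketched like this in plain Python. This is not SpriteBatch's actual code; the function name and sample values are invented, it uses a uniform scale, and it treats the matrix from SpriteBatch.Begin as identity.

```python
import math

def sprite_to_client(p, origin, scale, rotation, position):
    # 1. Translate back by the origin, so the origin sits at (0, 0).
    x, y = p[0] - origin[0], p[1] - origin[1]
    # 2. Scale.
    x, y = x * scale, y * scale
    # 3. Rotate (positive angle turns X+ towards Y+, which is down on screen).
    c, s = math.cos(rotation), math.sin(rotation)
    x, y = c * x - s * y, s * x + c * y
    # 4. Translate by the position.
    return (x + position[0], y + position[1])

# Assumed sprite: 64x64 texture, origin at its centre, scaled by 2,
# rotated a quarter turn, drawn at (100, 100).
origin = (32.0, 32.0)
pivot = sprite_to_client(origin, origin, 2.0, math.pi / 2, (100.0, 100.0))
top_left = sprite_to_client((0.0, 0.0), origin, 2.0, math.pi / 2, (100.0, 100.0))
print(pivot, top_left)
```

The origin lands exactly on the position, and the texture's top-left corner ends up at roughly (164, 36), i.e. at the top right on screen, which is a clockwise quarter turn as seen by the viewer.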
Since you are new to XNA, allow me to introduce a library that will greatly help you out while you learn. It is called XNA Debug Terminal and is an open source project that allows you to run arbitrary code during runtime. So you can see if your variables have the value you expect. All this happens in a terminal display on top of your game and without pausing your game. It can be downloaded at http://www.protohacks.net/xna_debug_terminal
It is free and very easy to setup so you really have nothing to lose.
I have a sprite object in XNA.
It has a size, position and rotation.
How do I translate a point from screen coordinates to sprite coordinates?
Thanks,
SW
You need to calculate the transform matrix for your sprite, invert that (so the transform now goes from world space -> local space) and transform the mouse position by the inverted matrix.
Matrix transform = Matrix.CreateScale(scale) * Matrix.CreateRotationZ(rotation) * Matrix.CreateTranslation(translation);
Matrix inverseTransform = Matrix.Invert(transform);
Vector3 transformedMousePosition = Vector3.Transform(mousePosition, inverseTransform);
You might find the following XNA picking sample useful:
http://creators.xna.com/en-us/sample/picking
One solution is to hit test against the sprite's original, unrotated bounding box.
So given the 2D screen vector (x,y):
translate the 2D vector into local sprite space: (x,y) - (spritex,spritey)
apply inverse sprite rotation
perform hit testing against bounding box
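Those three steps could look roughly like this in plain Python (not XNA). The function name is hypothetical, and the sketch assumes the sprite position is the centre of the sprite and that the sprite is not scaled.

```python
import math

def hit_test(point, sprite_pos, rotation, width, height):
    # 1. Translate the screen point into local sprite space.
    x, y = point[0] - sprite_pos[0], point[1] - sprite_pos[1]
    # 2. Apply the inverse sprite rotation (rotate by the negative angle).
    c, s = math.cos(-rotation), math.sin(-rotation)
    x, y = c * x - s * y, s * x + c * y
    # 3. Test against the unrotated bounding box centred on the sprite.
    return abs(x) <= width / 2 and abs(y) <= height / 2

# Assumed sprite: 80 wide, 20 tall, centred at (100, 100), rotated a quarter turn,
# so on screen it appears 20 wide and 80 tall.
inside = hit_test((100.0, 135.0), (100.0, 100.0), math.pi / 2, 80.0, 20.0)
outside = hit_test((135.0, 100.0), (100.0, 100.0), math.pi / 2, 80.0, 20.0)
print(inside, outside)
```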
The hit test can of course be made more accurate by taking into account the sprite shape.
I think it may be as simple as using the Contains method on Rectangle, the rectangle being the bounding box of your sprite. I've implemented drag-and-drop this way in XNA; I believe Contains tests based on x and y being screen coordinates.