SceneKit unproject Z documentation explanation? - ios

I am going through some SceneKit concepts and one I am trying to solidify in my head is unprojectPoint.
I understand that the function will take a point in 2D and return a point in 3D (so one with the proper Z value).
When I read the documentation I read this:
#method unprojectPoint
#abstract Unprojects a screenspace 2D point with depth info using the receiver's current point of view and viewport.
#param point The screenspace position to be unprojected.
#discussion A point whose z component is 0 (resp. 1) is unprojected on the near (resp. far) clip plane.
public func unprojectPoint(_ point: SCNVector3) -> SCNVector3
What I am not too clear on is the values 0 and 1 used when it talks about Z....
A point whose z component is 0 (resp. 1) is unprojected on the near (resp. far) clip plane.
As I was reading around online I then found this question:
How to use iOS (Swift) SceneKit SCNSceneRenderer unprojectPoint properly
When I deal with a SceneKit view, is Z = 0 always the near plane, and Z = 1 the far plane? If so, why? And, is Z = 0 and Z = 1 just normalized values?
So, can somebody help me understand why the value 0 and 1 are used for Z in this context? And ultimately help me understand the:
A point whose z component is 0 (resp. 1) is unprojected on the near (resp. far) clip plane.

Perspective projection is the task of converting a point from the 3D space used for modeling your scene into the 2D pixel space of the view your scene is rendered in. It's something the GPU does thousands of times per frame during rendering.
But it's not entirely a 3D-to-2D conversion. It's important during rendering to sort out which objects are nearer to or farther from the camera (so they obscure each other properly), so perspective projection also outputs a normalized depth component, where lower values indicate a point nearer to the camera (and vice versa). (Values are normalized because at this point all that's needed is relative depth. And/or for reasons of traditional 3D graphics math history and GPU design.) This information gets used during rendering but effectively thrown away afterward — all you see is the 2D view.
"Unprojecting" a point is doing the same thing in reverse: given a point in 2D screen space, you want a point in 3D scene space. But a 2D point in screen space corresponds to a line in 3D space: each pixel in your view is looking along a ray into the 3D scene, and what you see in that pixel comes from the first 3D object that ray intersects.
Thus, to unproject a 2D point into 3D scene space, you need the 2D point itself to define a ray into the 3D scene, then a normalized depth value to decide how far along the ray you want the resulting 3D point to be. (Beware, normalized depth doesn't linearly correspond to distance because of perspective division.)
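To make the non-linearity concrete, here is a minimal sketch of how a distance in front of the camera maps to a normalized depth. It assumes a conventional OpenGL-style perspective projection whose depth output is remapped to the 0 to 1 range; the function names and the near/far values are illustrative and not part of SceneKit's API.

```python
def normalized_depth(distance, z_near, z_far):
    """Map a distance in front of the camera to a depth in [0, 1].

    Assumes a standard perspective projection: 0 is the near clip plane,
    1 is the far clip plane, and the mapping is non-linear in between.
    """
    return z_far * (distance - z_near) / (distance * (z_far - z_near))


def distance_from_depth(depth, z_near, z_far):
    """Inverse mapping: recover the distance from a normalized depth."""
    return z_near * z_far / (z_far - depth * (z_far - z_near))


# A point halfway between near (1) and far (100) has a depth of about 0.99,
# not 0.5, because most of the depth range is spent close to the camera.
halfway_depth = normalized_depth(50.5, 1.0, 100.0)
```

This is why a guessed Z such as 0.5 usually lands much closer to the near plane than expected.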
If you don't know the depth of the point you're looking for, there are two things to consider:
Are you actually looking for the scene content (geometry) "behind" a specific pixel? If so, a hit test is more likely what you need.
Is the 3D point you want to get from unprojecting related to another point, enabling you to derive the Z value? There are a few other Q&As around here for that: How to use iOS (Swift) SceneKit SCNSceneRenderer unprojectPoint properly, How to convert 2D point to 3D using SceneKit's unprojectPoint without having a depth value?


Find the Transformation Matrix that maps 3D local coordinates to global coordinates

I'm coding a calibration algorithm for my depth camera. This camera outputs a one-channel 2D image with the distance of every object in the image.
From that image, and using the camera and distortion matrices, I was able to create a 3D point cloud, from the camera perspective. Now I wish to convert those 3D coordinates to a global/world coordinates. But, since I can't use any patterns like the chessboard to calibrate the camera, I need another alternative.
So I was thinking: if I provide some ground points (in the camera perspective), I would define a plane that I know should have a Z coordinate close to zero in the global perspective. So, how should I proceed to find the transformation matrix that horizontalizes the plane?
Local coordinates ground plane, with an object on top
I tried using OpenCV's solvePnP, but it didn't give me the correct transformation. I also thought of using OpenCV's estimateAffine3D, but I don't know where the global coordinates should be mapped to, since the provided ground points do not need to lie on any specific pattern/shape.
Thanks in advance
What you need is what's commonly called extrinsic calibration: a rigid transformation relating the 3D camera reference frame to the 'world' reference frame. Usually, this is done by finding known 3D points in the world reference frame and their corresponding 2D projections in the image. This is what solvePnP does.
To find the best rotation/translation between two sets of 3D points, in the sense of minimizing the root mean square error, the solution is:
Easier explanation:
Python code (from the easier explanation site):
So, if you want to transform 3D points from the camera reference frame, do the following:
As you proposed, define some 3D points with known position in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in a Nx3 matrix P.
Get the corresponding 3D points in the camera reference frame. Put them in a Nx3 matrix Q.
From the file defined in point 3 above, call rigid_transform_3D(P, Q). This will return a 3x3 matrix R and a 3x1 vector t.
Then, for any 3D point in the world reference frame p, as a 3x1 vector, you can obtain the corresponding camera point, q with:
q = R p + t
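As a small illustration of using the result, the sketch below applies a rotation R and translation t to go from world to camera coordinates and back. It assumes the convention q = R p + t, with R stored as three rows; the helper names are hypothetical and do not come from the linked code.

```python
def world_to_camera(R, t, p):
    """Apply q = R p + t, with R given as three rows of three numbers."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def camera_to_world(R, t, q):
    """Invert the rigid transform: p = R^T (q - t)."""
    d = tuple(q[r] - t[r] for r in range(3))
    return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))


# Example: a 90 degree rotation about Z followed by a translation.
R = ((0.0, -1.0, 0.0),
     (1.0, 0.0, 0.0),
     (0.0, 0.0, 1.0))
t = (1.0, 2.0, 3.0)
q = world_to_camera(R, t, (1.0, 0.0, 0.0))
p = camera_to_world(R, t, q)
```

Since the question goes from camera points to world points, the inverse direction (`camera_to_world`) is the one applied to the point cloud.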
EDIT: answer when 3D position of points in world are unspecified
Indeed, for this procedure to work, you need to know (or better, to specify) the 3D coordinates of the points in your world reference frame. As stated in your comment, you only know the points are in a plane but don't have their coordinates in that plane.
Here is a possible solution:
Take the selected 3D points in camera reference frame, let's call them q'i.
Fit a plane to these points. The result of this will be a normal vector n. To fully specify the plane, you also need to choose a point on it, for example the centroid (average) of q'i.
As the points surely don't perfectly lie in the plane, project them onto the plane, for example as described in: How to project a point onto a plane in 3D?. Let's call these projected points qi.
At this point you have a set of 3D points, qi, that lie on a perfect plane, which should correspond closely to the ground plane (z=0 in world coordinate frame). The coordinates are in the camera reference frame, though.
Now we need to specify an origin and the direction of the x and y axes in this ground plane. You don't seem to have any criteria for this, so an option is to arbitrarily set the origin just "below" the camera center, and align the X axis with the camera optical axis. For this:
Project the point (0,0,0) onto the plane, the same way you projected the points q'i above. Call this o. Project the point (0,0,1) onto the plane and call it a. Compute the vector a-o, normalize it and call it i.
o is the origin of the world reference frame, and i is the X axis of the world reference frame, in camera coordinates. Call j = n × i (cross product). j is the Y axis and we are almost finished.
Now, obtain the X-Y coordinates of the points qi in the world reference frame, by projecting them on i and j. That is, do the dot product between each qi and i to get the X values and the dot product between each qi and j to get the Y values. The Z values are all 0. Call these X, Y, 0 coordinates pi.
Use these values of pi and qi to estimate R and t, as in the first part of the answer!
Maybe there is a simpler solution. Also, I haven't tested this, but I think it should work. Hope this helps.
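The construction above can be sketched in plain Python as follows. This is an untested-on-real-data illustration: it assumes the plane normal and centroid have already been obtained from a plane fit, it flips the normal so that it points from the ground towards the camera (an assumption the steps above leave implicit), and all function names are made up for the example.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def scale(v, s): return tuple(x * s for x in v)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def normalize(v):
    return scale(v, 1.0 / math.sqrt(dot(v, v)))

def project_onto_plane(point, plane_point, normal):
    """Orthogonal projection of a point onto the plane (unit normal)."""
    offset = dot(sub(point, plane_point), normal)
    return sub(point, scale(normal, offset))

def ground_frame(normal, centroid):
    """Return origin o and axes i, j, n of the world frame, in camera coordinates."""
    n = normalize(normal)
    if dot(centroid, n) > 0:      # make n point from the ground towards the camera
        n = scale(n, -1.0)
    o = project_onto_plane((0.0, 0.0, 0.0), centroid, n)
    a = project_onto_plane((0.0, 0.0, 1.0), centroid, n)
    i = normalize(sub(a, o))      # fails if the optical axis is parallel to n
    j = cross(n, i)
    return o, i, j, n

def world_coordinates(q_raw, centroid, o, i, j, n):
    """Project a camera-frame point onto the plane and express it as (X, Y, 0)."""
    q = project_onto_plane(q_raw, centroid, n)
    rel = sub(q, o)
    return q, (dot(rel, i), dot(rel, j), 0.0)

# Example: ground 1.5 units below a camera whose Y axis points down.
centroid = (0.0, 1.5, 3.0)
o, i, j, n = ground_frame((0.0, 1.0, 0.0), centroid)
q, p = world_coordinates((2.0, 1.4, 5.0), centroid, o, i, j, n)
```

Because o lies along n and both i and j are perpendicular to n, subtracting o before taking the dot products gives the same X and Y as the plain dot products described in the text. The resulting (p, q) pairs are what would be passed to the rigid-transform estimation.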

finding the depth in arkit with SCNVector3Make

The goal of the project is to create a drawing app. I want it so that when I touch the screen and move my finger, it will follow the finger and leave a trail of cyan paint. I did create it, but there is one problem: the depth of the paint is always placed randomly.
Here is the code; you just need to connect the sceneView with the storyboard.
My question is: how do I make the program so that the depth is always consistent? By consistent I mean there is always the same distance between the paint and the camera.
If you run the code above you will see that I have printed out all of the SCNMatrix4 values, but none of them is the depth.
I have tried changing hitTransform.m43, but it only messes up the x and y.
If you want to get a point some consistent distance in front of the camera, you don’t want a hit test. A hit test finds the real world surface in front of the camera — unless your camera is pointed at a wall that’s perfectly parallel to the device screen, you’re always going to get a range of different distances.
If you want a point some distance in front of the camera, you need to get the camera’s position/orientation and apply a translation (your preferred distance) to that. Then to place SceneKit content there, use the resulting matrix to set the transform of a SceneKit node.
The easiest way to do this is to stick to SIMD vector/matrix types throughout rather than converting between those and SCN types. SceneKit adds a bunch of new accessors in iOS 11 so you can use SIMD types directly.
There’s at least a couple of ways to go about this, depending on what result you want.
Option 1
// set up z translation for 20 cm in front of whatever
// last column of a 4x4 transform matrix is translation vector
var translation = matrix_identity_float4x4
translation.columns.3.z = -0.2
// get camera transform the ARKit way
let cameraTransform =
// if we wanted, we could go the SceneKit way instead; result is the same
// let cameraTransform = view.pointOfView.simdTransform
// set node transform by multiplying matrices
node.simdTransform = cameraTransform * translation
This option, using a whole transform matrix, not only puts the node a consistent distance in front of your camera, it also orients it to point the same direction as your camera.
Option 2
// distance vector for 20 cm in front of whatever
let translation = float3(x: 0, y: 0, z: -0.2)
// treat distance vector as in camera space, convert to world space
let worldTranslation = view.pointOfView.simdConvertPosition(translation, to: nil)
// set node position (not whole transform)
node.simdPosition = worldTranslation
This option sets only the position of the node, leaving its orientation unchanged. For example, if you place a bunch of cubes this way while moving the camera, they’ll all be lined up facing the same direction, whereas with option 1 they’d all be in different directions.
Going beyond
Both of the options above are based only on the 3D transform of the camera — they don’t take the position of a 2D touch on the screen into account.
If you want to do that, too, you’ve got more work cut out for you — essentially what you’re doing is hit testing touches not against the world, but against a virtual plane that’s always parallel to the camera and a certain distance away. That plane is a cross section of the camera projection frustum, so its size depends on what fixed distance from the camera you place it at. A point on the screen projects to a point on that virtual plane, with its position on the plane scaling proportional to the distance from the camera (like in the below sketch):
So, to map touches onto that virtual plane, there are a couple of approaches to consider. (Not giving code for these because it’s not code I can write without testing, and I’m in an Xcode-free environment right now.)
Make an invisible SCNPlane that’s a child of the view’s pointOfView node, parallel to the local xy-plane and some fixed z distance in front. Use SceneKit hitTest (not ARKit hit test!) to map touches to that plane, and use the worldCoordinates of the hit test result to position the SceneKit nodes you drop into your scene.
Use Option 1 or Option 2 above to find a point some fixed distance in front of the camera (or a whole translation matrix oriented to match the camera, translated some distance in front). Use SceneKit’s projectPoint method to find the normalized depth value Z for that point, then call unprojectPoint with your 2D touch location and that same Z value to get the 3D position of the touch location with your camera distance. (For extra code/pointers, see my similar technique in this answer.)
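The geometry behind that virtual plane can be sketched without any framework. The example below assumes an idealized pinhole camera with a known vertical field of view and aspect ratio, and touch coordinates already normalized to the range -1 to 1 with y pointing up; none of these names are SceneKit or ARKit API, and the result is in camera space, so it would still need converting to world space.

```python
import math

def point_on_camera_plane(ndc_x, ndc_y, distance, fov_y_degrees, aspect):
    """Camera-space point for a normalized screen position on a plane
    parallel to the screen, `distance` units in front of the camera."""
    half_height = distance * math.tan(math.radians(fov_y_degrees) / 2.0)
    half_width = half_height * aspect
    # The camera looks down its local -z axis, so "in front" is negative z.
    return (ndc_x * half_width, ndc_y * half_height, -distance)

# The same screen position moves twice as far sideways on a plane twice as far away.
near_hit = point_on_camera_plane(0.5, 0.0, 0.2, 60.0, 0.75)
far_hit = point_on_camera_plane(0.5, 0.0, 0.4, 60.0, 0.75)
```

This only illustrates the proportional scaling; in an app, either of the two approaches above lets SceneKit do this arithmetic with the real camera parameters.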

Recreate the 3D outlines of a City street in iOS SceneKit with OSM XML data

What is best strategy to recreate part of a street in iOS SceneKit using .osm XML data?
Please assume part of a street is offered in the OSM XML data and contains the necessary geopoints with latitude and longitude denoting the Nodes to describe the paths/footprints of 6 buildings (i.e. ground floor plans that line the side of a street).
Specifically, what's the best strategy to convert latitude and longitude Nodes in order to locate these building footprints/polygons on the ground floor in a scene within SceneKit iOS? (i.e. running through position 0,0,0)? Thank you.
Very roughly and briefly, based on my own experience with 3D map rendering:
Transform the XML data from lat/long to appropriate coordinates for a 2D map (that is, project it to a plane using a map projection, then apply a 2D affine transform to get it into screen pixel coordinates). Create a 2D map that's wider and taller than the actual screen, because of what's going to happen in step 2:
Using a 3D coordinate system with your map vertical (i.e., set all the Z coordinates to zero), rotate the map so that it reclines at an appropriate shallow angle, as if you're in an aeroplane looking down on it; the angle might be 30 degrees from horizontal. To rotate the map you'll need to create a 3D rotation matrix. The axis of rotation will be the X axis: that is, the horizontal line that is the bottom border of your 2D map. The rotation is exactly the same as what happens when you rotate your laptop screen away from you.
Supply the new 3D coordinates to your rendering system. I haven't used SceneKit but I had a quick look at the documentation and you can use any coordinate system you like, so you will be able to use one that is convenient for the process I have just described: something that uses units the size of a screen pixel at the viewing plane, with Y going upwards, X going right, and Z going away from the viewer.
One final caveat: if you want to add extrusions giving a rough approximation of the 3D building shapes (such data is available in OSM for some areas) note that my scheme requires the tops of buildings, and indeed anything above ground level, to have negative Z coordinates.
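As a rough sketch of the tilt step, the function below rotates map points about the X axis using the coordinate convention described above (X right, Y up, Z away from the viewer). The angle is measured from the vertical map plane, so a map lying 30 degrees from horizontal corresponds to a 60 degree tilt; the function name is made up for the example.

```python
import math

def tilt_map_point(point, tilt_degrees):
    """Rotate (x, y, z) about the X axis so the top of the map leans away."""
    x, y, z = point
    c = math.cos(math.radians(tilt_degrees))
    s = math.sin(math.radians(tilt_degrees))
    return (x, y * c - z * s, y * s + z * c)

# A point one unit up the flat map moves away from the viewer (positive Z).
ground = tilt_map_point((0.0, 1.0, 0.0), 60.0)
# A rooftop one unit above ground level has Z = -1 before tilting,
# and ends up higher on screen (larger Y) than the ground beneath it.
roof = tilt_map_point((0.0, 0.0, -1.0), 60.0)
```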
Pretty simple. First, convert your CLLocationCoordinate2D to an MKMapPoint, which is much like a CGPoint (a plain x/y pair). Second, scale down the MKMapPoint by some arbitrary number so it fits in with how you want it on your scene graph, let's say by 200. Since SceneKit's coordinate system is centered at (0,0), you'll need to make sure your location is correct. Then just create your SCNVector3s with the x/y of the MKMapPoint, and you will be locked to coordinates.
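For a single street, one simple way to get coordinates centred on the scene origin is a local equirectangular approximation around a chosen reference node. The sketch below is an alternative to MKMapPoint rather than a description of it; the Earth radius, the choice of the XZ plane as ground, and the mapping of north to negative Z are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; adequate at street scale

def latlon_to_local_metres(lat, lon, origin_lat, origin_lon):
    """Approximate east/north offsets in metres from a reference node."""
    east = (EARTH_RADIUS_M * math.radians(lon - origin_lon)
            * math.cos(math.radians(origin_lat)))
    north = EARTH_RADIUS_M * math.radians(lat - origin_lat)
    return east, north

def to_scene_position(east, north, metres_per_unit=1.0):
    """Place the point on the XZ ground plane (Y up), with north along -Z."""
    return (east / metres_per_unit, 0.0, -north / metres_per_unit)

# The reference node itself lands on the scene origin.
origin = to_scene_position(*latlon_to_local_metres(51.5, -0.1, 51.5, -0.1))
# A node 0.001 degrees further north is roughly 111 m away.
east, north = latlon_to_local_metres(51.501, -0.1, 51.5, -0.1)
```

Each building footprint then becomes a polygon of such positions, using the same reference node for every building so their relative placement is preserved.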

How to convert 2D point to 3D using SceneKit's unprojectPoint without having a depth value?

Is it possible to use SceneKit's unprojectPoint to convert a 2D point to 3D without having a depth value?
I only need to find the 3D location in the XZ plane. Y can be always 0 or any value since I'm not using it.
I'm trying to do this for iOS 8 Beta.
I had something similar with JavaScript and Three.js (WebGL) like this:
function getMouse3D(x, y) {
    var pos = new THREE.Vector3(0, 0, 0);
    // mouse position in normalized device coordinates, with depth z = 1
    var pMouse = new THREE.Vector3(
        (x / renderer.domElement.width) * 2 - 1,
        -(y / renderer.domElement.height) * 2 + 1,
        1
    );
    projector.unprojectVector(pMouse, camera);
    // intersect the ray from the camera through pMouse with the plane y = 0
    var cam = camera.position;
    var m = pMouse.y / ( pMouse.y - cam.y );
    pos.x = pMouse.x + ( cam.x - pMouse.x ) * m;
    pos.z = pMouse.z + ( cam.z - pMouse.z ) * m;
    return pos;
}
But I don't know how to translate the part with unprojectVector to SceneKit.
What I want to do is to be able to drag an object around in the XZ plane only. The vertical axis Y will be ignored.
Since the object would need to move along a plane, one solution would be to use the hitTest method, but I don't think it is very good in terms of performance to do it for every touch/drag event. Also, it wouldn't allow the object to move outside the plane either.
I've tried a solution based on the accepted answer here, but it didn't work. Using one depth value for unprojectPoint, if I drag the object around in the +/-Z direction, the object doesn't stay under the finger for long and moves away from it instead.
I need the dragged object to stay under the finger no matter where it is moved in the XZ plane.
First, are you actually looking for a position in the xz-plane or the xy-plane? By default, the camera looks in the -z direction, so the x- and y-axes of the 3D Scene Kit coordinate system go in the same directions as they do in the 2D view coordinate system. (Well, y is flipped by default in UIKit, but it's still the vertical axis.) The xz-plane is then orthogonal to the plane of the screen.
Second, a depth value is a necessary part of converting from 2D to 3D. I'm not an expert on three.js, but from looking at their library documentation (which apparently can't be linked into), their unprojectVector still takes a Vector3. And that's what you're constructing for pMouse in your code above: a vector whose x- and y-coordinates come from the 2D mouse position, and whose z-coordinate is 1.
SceneKit's unprojectPoint works the same way — it takes a point whose z-coordinate refers to a depth in clip space, and maps that to a point in your scene's world space.
If your world space is oriented such that the only variation you care about is in the x- and y-axes, you may pass any z-value you want to unprojectPoint, and ignore the z-value in the vector you get back. Otherwise, pass 0 to map to the near clipping plane, 1 for the far clipping plane, or a value in between for a depth between the two (keeping in mind that normalized depth is not linear in distance). If you're using the unprojected point to position a node in the scene, the best advice is to just try different z-values (between 0 and 1) until you get the behavior you want.
However, it's a good idea to be thinking about what you're using an unprojected vector for — if the next thing you'd be doing with it is testing for intersections with scene geometry, look at hitTest: instead.
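For the specific case of dragging in the XZ plane, one option that mirrors the three.js code in the question is to unproject the touch twice (once at the near plane, once at the far plane) and intersect the resulting ray with the plane y = 0. The sketch below shows only the intersection arithmetic; the two input points are assumed to come from two unprojectPoint calls, and the function name is hypothetical.

```python
def intersect_ray_with_y_plane(near_point, far_point, plane_y=0.0):
    """Intersect the ray from near_point through far_point with y = plane_y.

    Returns the (x, y, z) hit point, or None if the ray is parallel to the
    plane or the plane lies behind the near point.
    """
    dy = far_point[1] - near_point[1]
    if abs(dy) < 1e-12:
        return None
    t = (plane_y - near_point[1]) / dy
    if t < 0:
        return None
    return tuple(n + t * (f - n) for n, f in zip(near_point, far_point))

# Example: a ray descending from y = 10 to y = -10 crosses y = 0 at its midpoint.
hit = intersect_ray_with_y_plane((0.0, 10.0, 0.0), (10.0, -10.0, 20.0))
```

Because the hit point is recomputed from the touch location on every drag event, the object stays under the finger regardless of how far it moves along Z.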

Is there a reverse function of lookat for glMatrix?

I am using the glMatrix to code Webgl and want to get the eye position, focal point and up direction from the existing projection and view matrix (kinda like the reverse of lookat function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there were a method for it. I'll be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them rather than to try and pull them back and forth from matrix to vectors and back.
Yes and No: Yes you can invert the model view transformation and no you will not get exactly all three vectors the same.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is just used for aligning the axis of view with the -Z axis. In OpenGL the eye faces -Z. The distance between center and eye is lost. So you can easily get a center point along your axis of view if you define the distance yourself. Let's say we want a distance of d. Then we just need to transform (0,0,-d) with the inverse matrix and we get a valid center point, but not exactly the same one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus only for one scalar value. Thus for the inverse transformation you can not only choose any positive value along the Y axis, you could choose any point in the YZ plane with a positive Y value. To get an up vector perfectly normal to the viewing axis and of length 1 we just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
Now we have eye, center and up reconstructed in a way to get exactly the same result of lookAt next time. But since this matrix contains only 6 values of information (translation,pan,tilt,roll) we had to choose 3 values that were lost (distance center to eye, size and angle of up vector in YZ plane of camera).
The model view matrix can of course do other transformation (any affine) but the lookAt function is using this matrix only for translation and rotation. It is adjusting the scene in front of the camera without distorting it.
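A minimal sketch of this reconstruction, assuming the view matrix is a pure rotation plus translation as produced by a lookAt-style function. To keep it self-contained it stores the matrix as three rotation rows and a translation vector rather than glMatrix's flat column-major array, and both function names are made up for the example.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def look_at(eye, center, up):
    """Build the view transform as (rotation rows, translation)."""
    f = normalize(sub(center, eye))      # viewing direction
    s = normalize(cross(f, up))          # camera right
    u = cross(s, f)                      # camera up, orthogonal to f
    rot = (s, u, tuple(-x for x in f))   # camera looks down -Z
    trans = tuple(-dot(row, eye) for row in rot)
    return rot, trans

def decompose_look_at(rot, trans, distance=1.0):
    """Recover eye, a center `distance` ahead of it, and a unit up vector."""
    # Inverse of a rigid transform: eye = -R^T t
    eye = tuple(-sum(rot[r][c] * trans[r] for r in range(3)) for c in range(3))
    forward = tuple(-x for x in rot[2])  # inverse-transformed (0, 0, -1)
    center = tuple(e + distance * fw for e, fw in zip(eye, forward))
    up = rot[1]                          # inverse-transformed (0, 1, 0)
    return eye, center, up

rot, trans = look_at((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
eye, center, up = decompose_look_at(rot, trans, distance=2.0)
rot2, trans2 = look_at(eye, center, up)  # reproduces the same view transform
```

The recovered center and up differ from the original inputs (the center is 2 units from the eye, and up is orthogonal to the view axis), but feeding them back into `look_at` yields the same matrix, which is the point made above.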
