I'm aware of the existence of SCNLookAtConstraint in SceneKit. This method, however, is not what I'm looking for.
What I need is able to point the camera at any given moment to a point (not a node) in space.
Having a X, Y, Z in world coordinates, I need the camera to rotate so that it points towards this point.
The points will change over time (meaning different points are being passed to a method, I need to 'jump' from one to the other).
I'd rather skip calculating eulerAngles, some solutions I have seen are based on trig functions and cos(angle) in the denominator. The user is basically free to rotate by dragging the cameraNode.
Any method I'm missing for this in SceneKit?
It seems to me that using a node as the target is the simplest way to go. If you create an empty node (no geometry or camera or light) it won't be visible, and you'll be able to use the highest available abstractions that the framework offers you. Move the node from point to point, with animation or with abrupt jumps.
The GLKMatrix4 methods may be of interest here, specifically GLKMatrix4MakeLookAt.
Given an eye position (camera), a position you wish to look at, and an up vector you will be able to create a transformation matrix (as a GLKMatrix4). The SceneKit function SCNMatrix4FromGLKMatrix4 will then translate this to a SCNMatrix4 that you can set as the transform of your camera node.
TBH, I would probably go with Hal's solution.
Related
We are doing a project in which we are detecting (using YOLOv4) runway debris using a fixed camera mounted at a pole on the side of the runway. We want to find out the position of the object with respect to the runway surface. How can we find out the distance of the object from the runway sides?
I would recommend using reliable sensors such as "light curtains" to make sure there is no debris on a runway. AI can fail, especially if things are hard to see.
As for mapping image coordinates to world coordinates on a plane, look into "homography". OpenCV has getPerspectiveTransform. it's easiest if you pick a big rectangle of known dimensions in your scene (runway), mark the corners, find them in your picture, note those points too, and those four pairs you give to getPerspectiveTransform. now you have a homography matrix that will transform coordinates back and forth (you can invert it).
check out the tutorials section in OpenCV's documentation for more on homographies.
I have few related questions.
1) Can you get the position of the device in the global coordinates? I tried to get this value using ARFrame.camera.transform.colums.3, but it seems like the [X,Y,Z] in this column is alway [0,0,0]. I interpreted this transform to be the camera's orientation with respective to the body frame. Can someone explain what exactly you get out of the ARFrame.camera.transform?
2) If we have the position of the device (camera) in the global coordinates, I assume we can easily get the velocity of the device. Is this a valid statement?
3) Can you only get the global position when you are tracking an object? Thus, you get your position relative to the tracked object? I would like to get the speed of the device even when the camera shakes a lot, thus the tracking quality is not always good.
Yes, you can make a speedometer with ARKit. A few people have already.
Regarding your more specific questions...
ARKit doesn’t have “global coordinates”, or probably not in the sense you’re thinking. Camera and anchor transforms use a shared reference frame (“world” space in traditional 3D graphics parlance), but that reference frame is valid only within the session: 0,0,0 is where your camera/device was at the beginning of the session.
If you have two positions at two different times in any shared reference frame, the difference between those positions is the average velocity over that time.
ARKit doesn’t track objects. The camera transform is always relative to “world” space. As mentioned above, it’s 0,0,0 at the beginning of your session because the reference frame is based within the session.
If you want Global positioning — that is, relative to the Earth — you should be looking at Core Location. Note that there’s a difference of scale and precision, though: GPS is accurate to a meter or two but operates at planet scale, and ARKit is accurate to a centimeter or two but operates at room scale.
I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.
Currently, everything is configured properly so that all that I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately this has been giving me some difficulty as I am not acquiring the correct results.
My tracking system is configured as follows:
The optical tracker is mounted in such a way that the entire scene can be viewed.
Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
The pointer and camera (Referred to as C from hereon) markers each return a homogeneous transform that describes their position relative to the marker attached to the model (Referred to as M from hereon). The model's marker, being the origin, does not return any transformation.
I obtained two sets of points, one 2D and one 3D. The 2D points are the coordinates of a chessboard's corners in pixel coordinates while the 3D points are the corresponding world coordinates of those same corners relative to M. These were recorded using openCV's detectChessboardCorners() function for the 2 dimensional points and the pointer for the 3 dimensional. I then transformed the 3D points from M space to C space by multiplying them by C inverse. This was done as the solvePnP() function requires that 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
Once all of this was done, I passed in the point sets into solvePnp(). The transformation I got was completely incorrect, though. I am honestly at a loss for what I did wrong. Adding to my confusion is the fact that OpenCV uses a different coordinate format from OpenGL, which is what 3DSlicer is based on. If anyone can provide some assistance in this matter I would be exceptionally grateful.
Also if anything is unclear, please don't hesitate to ask. This is a pretty big project so it was hard for me to distill everything to just the issue at hand. I'm wholly expecting that things might get a little confusing for anyone reading this.
Thank you!
UPDATE #1: It turns out I'm a giant idiot. I recorded colinear points only because I was too impatient to record the entire checkerboard. Of course this meant that there were nearly infinite solutions to the least squares regression as I only locked the solution to 2 dimensions! My values are much closer to my ground truth now, and in fact the rotational columns seem correct except that they're all completely out of order. I'm not sure what could cause that, but it seems that my rotation matrix was mirrored across the center column. In addition to that, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
Mirror/rotational ambiguity.
You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform for an appropriate ("hand-built") rotation and/or mirroring.
The basic problems is that the calibration target you are using - even when all the corners are seen, has at least a 180^ deg rotational ambiguity unless color information is used. If some corners are missed things can get even weirder.
You can often use prior info about the camera orientation w.r.t. the scene to resolve this kind of ambiguities, as I was suggesting above. However, in more dynamical situation, of if a further degree of automation is needed in situations in which the target may be only partially visible, you'd be much better off using a target in which each small chunk of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" one, which uses sequences of square lengths with unique crossratios. See the paper here.
I am working on converting 2d images to 3d environment. The images were collected from a video made in a lateral motion. Then the images were placed one behind the other, so it would be easy to find the correspondences between the two images. This is called a spatial-temporal volume.
Next I take a slice from the spatiotemporal volume. That slice is called the Epipolar Plane Image.
Using the Epipolar Plane Image, I want to calculate the depth of the objects in the scene and make a 3D enviornment. I have listed the reference but I have not been able to figure out the math described in the paper. Can someone help me figure this out? Any help is appreciated.
Reference
Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion* !
The math in this situation is easy and straight forward.
First let's define two the coordinate systems for two overlapping images taken by the same camera with the focal length with the following schema:
Let us say that first camera position is defined as follows:
While it's orientation by using three Euler angles is:
By using this definition the corresponding rotation matrix is the identity matrix
The second camera position can be defined as follows:
And since the orientation is the same as the first camera, all Euler angles remain zero:
Which also means that the corresponding rotation matrix is the identity matrix.
If the images overlap and the orientation is the same, the situation in the image space looks like this:
Here the image coordinates and their measurement accuracy are defined as follows:
This geometrical situation can be described by using the Intercept Theorem:
As you see it's not complicated. But be aware that this solution is certainly not the best, since it's base assumption that all orientation angles are the same can't be fulfilled in reality.
If you need to be accurate then you have to perform an bundle adjustment. However, this equations are often used to determine the approximated solution for this geometric situation, where the values are used to linearize the collinearity equations.
I'm using open cv in C++ in multi-view scene with two cameras. I have the intrinsic and extrinsic parameters for both cameras.
I would like to map a (X,Y) point in View 1 to the same point in the second View. I'm am slightly unsure how I should use the intrinsic and extrinsic matrices in order to convert the points to a 3D world and finally end up with the new 2D point in view 2.
It is (normally) not possible to take a 2D coordinate in one image and map it into another 2D coordinate without some additional information.
The main problem is that a single point in the left image will map to a line in the right image (an epipolar line). There are an infinite number of possible corresponding locations because depth is a free parameter. Secondly it's entirely possible that the point doesn't exist in the right image i.e. it's occluded. Finally it may be difficult to determine exactly which point is the right correspondence, e.g. if there is no texture in the scene or if it contains lots of repeating features.
Although the fundamental matrix (which you get out of cv::StereoCalibrate anyway) gives you a constraint between points in each camera: x'Fx = 0, for a given x' there will be a whole family of x's which will satisfy the equation.
Some possible solutions are as follows:
You know the 3D location of a 2D point in one image. Provided that 3D point is in a common coordinate system, you just use cv::projectPoints with the calibration parameters of the other camera you want to project into.
You do some sparse feature detection and matching using something like SIFT or ORB. Then you can calculate a homography to map the points from one image to the other. This makes a few assumptions about things being planes. If you Google panorama homography, there are plenty of lecture slides detailing this.
You calibrate your cameras, perform an epipolar rectification (cv::StereoRectify, cv::initUndistortRectifyMap, cv::remap) and then run them through a stereo matcher. The output is a disparity map which gives you exactly what you want: a per-pixel mapping from one camera to the other. That is, left[y,x] = right[y, x+disparity_map[y,x]].
(1) is by far the easiest, but it's unlikely you have that information already. (2) is often doable and might be suitable, and as another commenter pointed out will be poor where the planarity assumption fails. (3) is the general (ideal) solution, but has its own drawbacks and relies on the images being amenable to dense matching.