Measure real distance between two points using iOS Depth camera

Right now I'm exploring the features of the iOS depth camera, and I want to obtain the real-world distance between two points (for example, between two eyes).
I have successfully wired up the iOS depth-camera capture and I have AVDepthData in my hands, but I'm not quite sure how to get a real-world distance between two specific points.
I believe I could calculate it if I had the depth and the viewing angle, but I don't see the latter exposed as a parameter. I also know that this task could be handled with ARKit, but I'm really curious how to implement it myself. ARKit uses the depth camera as well, so there must be an algorithm where a depth map is all I need to calculate the real distance.
Could you please give me some advice on how to tackle this task? Thanks in advance!
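One way to tackle this (a rough sketch, not an Apple-documented recipe): get metric depth at each of the two pixels, back-project both pixels into 3D camera space using the intrinsics from AVCameraCalibrationData, and take the Euclidean distance between the two 3D points. The helper names below are illustrative, and the sketch assumes the depth map is already kCVPixelFormatType_DepthFloat32 (meters) and that the intrinsic matrix has been scaled to the depth map's resolution.

```swift
import AVFoundation
import simd

// Back-project a pixel (in depth-map coordinates) into camera space using the
// pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth in meters.
func cameraSpacePoint(pixel: SIMD2<Float>, depth: Float,
                      intrinsics K: matrix_float3x3) -> SIMD3<Float> {
    let fx = K.columns.0.x, fy = K.columns.1.y
    let cx = K.columns.2.x, cy = K.columns.2.y
    return SIMD3<Float>((pixel.x - cx) * depth / fx,
                        (pixel.y - cy) * depth / fy,
                        depth)
}

// Read a Float32 depth value (meters) at a pixel of a DepthFloat32 buffer.
func depthValue(in map: CVPixelBuffer, x: Int, y: Int) -> Float {
    CVPixelBufferLockBaseAddress(map, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(map, .readOnly) }
    let rowBytes = CVPixelBufferGetBytesPerRow(map)
    let base = CVPixelBufferGetBaseAddress(map)!
    let row = base.advanced(by: y * rowBytes).assumingMemoryBound(to: Float32.self)
    return row[x]
}

// Distance in meters between two pixels, assuming a DepthFloat32 map and an
// intrinsic matrix scaled to the depth map's resolution.
func realDistance(between p0: SIMD2<Float>, and p1: SIMD2<Float>,
                  depthMap: CVPixelBuffer, intrinsics K: matrix_float3x3) -> Float {
    let z0 = depthValue(in: depthMap, x: Int(p0.x), y: Int(p0.y))
    let z1 = depthValue(in: depthMap, x: Int(p1.x), y: Int(p1.y))
    let a = cameraSpacePoint(pixel: p0, depth: z0, intrinsics: K)
    let b = cameraSpacePoint(pixel: p1, depth: z1, intrinsics: K)
    return simd_distance(a, b)
}
```

If the capture delivered disparity rather than depth, AVDepthData.converting(toDepthDataType:) can convert it; the intrinsics come from AVCameraCalibrationData.intrinsicMatrix and are expressed for intrinsicMatrixReferenceDimensions, so they need to be scaled to the depth map's width and height before use.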

Related

Basic principle to measure length between two points using a single camera

What is the basic principle for creating a virtual measuring app, given that I have a normal camera, or a bunch of cameras focused on a single object from different angles? So basically: how do you measure the distance between two physical points in a video without knowing anything else?
Simple answer: You can't do that.
The size of a pixel in object space depends on the distance to the camera.
Either you need depth information, which is not available from a single standard camera, or you need to know the size of a known object at the same distance. The second option doesn't work in most real-world scenarios, though.
With a precisely known (calibrated) setup of multiple cameras you can do stereo vision.
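To make the pixel-size point concrete (standard pinhole geometry, nothing specific to any particular camera): the real-world width covered by a given pixel offset grows linearly with depth, so the same pixel distance can correspond to millimeters or to meters.

```latex
\Delta X \approx \frac{Z}{f_x}\,\Delta u
\qquad (\Delta u\text{: offset in pixels},\ Z\text{: depth},\ f_x\text{: focal length in pixels})
```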

Is it possible to get (force) absolute accuracy on AVDepthData from the iPhone X camera?

I need to get the distance from the camera to points in the camera image with AVDepthData. I understand there are two kinds of accuracy associated with AVDepthData: relative and absolute, the latter being the one that corresponds to real-life distance.
I cannot seem to generate an AVDepthData with absolute accuracy. Is it possible at all?
AVDepthData is a generic model object for representing depth maps from a variety of possible sources, including parallax-based disparity inference, time-of-flight-based depth inference, data recorded by third-party cameras, or data synthesized by a 3D rendering engine. Thus, it can represent and describe more types of data than the device you're currently using can capture.
(It's like having an image format that supports 10-bit-per-component color: just because UIImage or some other API can tell you it's holding a wide-color image doesn't mean you have a camera that captures such images.)
More specifically... you didn't say whether you're using the front or back camera on iPhone X, but that matters quite a bit to what kind of depth maps you can capture.
builtInDualCamera, which iPhone X has for the back-facing camera (as do iPhone 7/8 Plus), infers disparity — which is not quite the same as depth, but related — by analyzing the parallax offsets between two camera images. This technique doesn't produce absolute measurements of depth, but because disparity is inversely proportional to depth you can know which points are deeper than others. (And using the cameraCalibrationData you can do some math and maybe get some decent estimates of absolute depth.)
builtInTrueDepthCamera, which iPhone X (and so far only iPhone X) has for its front-facing camera, can measure disparity or depth with time-of-flight analysis. (And sharks with fricking laser beams!) This technique produces absolute measurements pretty well, as long as you can safely assume the speed of light.
Which technique is used determines what kind of measurement you can get, and which technique is used depends on the capture device you select. (And by the way, there's a wealth of information on how these techniques work in the WWDC17 talk on capturing depth.)
If you're looking for back-camera depth measurements in an absolute frame of reference, you might do better to look at ARKit — that's not going to get you accurate depth values for every pixel, because it depends on coarse scene reconstruction, but the distance values you can get are absolute.
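As a small illustration of the relative/absolute distinction in code (a sketch with an assumed helper name, not a complete capture pipeline): you can inspect depthDataAccuracy on the AVDepthData you receive and convert disparity to metric depth when it is available.

```swift
import AVFoundation

// A sketch (helper name is illustrative): check what kind of accuracy the
// captured AVDepthData claims, and normalize it to metric depth if possible.
func metricDepthMap(from depthData: AVDepthData) -> CVPixelBuffer? {
    // .absolute means values are intended to be metrically correct;
    // .relative only guarantees correct ordering of near vs. far.
    guard depthData.depthDataAccuracy == .absolute else { return nil }

    // Disparity (1/meters) and depth (meters) are inverses of each other;
    // AVDepthData can convert between the two representations.
    let metric = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
    return metric.depthDataMap
}
```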

Computing real depth map of image objects and reconstruction from several images

I have the following task: get a 3D projection of a room from multiple images (possibly a video stream, it doesn't matter). There will be a spherical camera (in fact, multiple cameras on a sphere-like rig), so the relevant case is the one on the right in the image.
I decided to code it on the iOS platform, as I'm an iOS developer, and to model the cameras with an iPhone camera, rotating it as shown in the picture above. As I decompose this task, I first need to get the real distance to the objects (walls, in most cases, I think). Is that possible? Which algorithms/methods should I use to achieve this? I'm obviously not asking you to do the task for me, but please point me in the right direction, because I have no idea: maybe some equations/tutorials/algorithms with an explanation that fit my case. Thank you!
The task of building a 3D model from multiple 2D images is called "scene reconstruction." It's still an active area of research, but solutions involve recognizing the same keypoint (e.g. a distinctive part of an object) in two images. Once you have that, you can use the known camera geometry to solve for the 3D position of that keypoint in the world.
Here's a reference:
http://docs.opencv.org/3.1.0/d4/d18/tutorial_sfm_scene_reconstruction.html#gsc.tab=0
You can google "scene reconstruction" to find lots more, and papers that go into more detail.

Estimating pose of one camera given another with known baseline

I am a beginner when it comes to computer vision, so I apologize in advance. Basically, the idea I am trying to code is this: given two cameras that simulate a multiple-baseline stereo system, I am trying to estimate the pose of one camera given the other.
Looking at the same scene, I would incorporate some noise into the pose of the second camera, and, given the clean image from camera 1 and the slightly distorted/skewed image from camera 2, I would like to estimate the pose of camera 2 from this data as well as from the known baseline between the cameras. I have been reading up on homography matrices and the related implementations in OpenCV, but I am just trying to get some suggestions about possible approaches. Most of the applications of the homography matrix that I have seen deal with stitching or overlaying images, but here I am looking to recover a six-degrees-of-freedom attitude of the camera from it.
It'd be great if someone could shed some light on these questions too: can such an approach be extended to more than two cameras? And is it also possible for both cameras to have some 'noise' in their pose and still recover the 6-DoF attitude at every instant?
Let's clear up your question first. I guess you are looking for the pose of one camera relative to the other camera's location. This is described by a homography only for pure camera rotations; for general motion that includes translation it is described by a rotation and a translation matrix.
If the cameras' fields of view overlap, the task can be solved with structure from motion, which still estimates only 5 DoF: the translation is recovered only up to scale. If there is a chessboard with known dimensions in the cameras' field of view, you can easily solve for 6 DoF by running a PnP algorithm; of course, the cameras should be calibrated first.
Finally, in 2008 Marc Pollefeys came up with an idea for estimating 6 DoF from two moving cameras with non-overlapping fields of view without using any chessboards. To give you more detail, please tell us a bit about the intended application.
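For reference, the textbook relations behind that summary (K: intrinsic matrix; R, t: relative rotation and translation; n, d: plane normal and distance; this is standard multi-view geometry, not specific to the asker's setup):

```latex
% Homography for a pure rotation between calibrated views:
H = K\,R\,K^{-1}
% Homography induced by a scene plane with unit normal n at distance d:
H = K\!\left(R + \frac{t\,n^{\top}}{d}\right)\!K^{-1}
% General motion (translation only up to scale) is captured by the essential matrix:
E = [t]_{\times}\,R
```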

How to find object distance from an Asus Xtion Pro camera (OpenCV, ROS)

Hi, I am using an Asus Xtion Pro Live camera for my object detection; I am also new to OpenCV. I'm trying to get the distance of an object from the camera. The detected object is in a 2D image. I'm not sure what I should use to get that information, or what calculations follow to get the distance between the camera and the detected object. Could someone advise me, please?
In short: You can't.
You're losing the depth information: any visible pixel in your camera image essentially corresponds to a ray originating from your camera.
So once you've located an object at pixel X, all you know is that the object lies somewhere along the ray cast from that pixel, as determined by the camera's intrinsic/extrinsic parameters.
You'll essentially need more information. One of the following should suffice:
Know at least one coordinate of the 3D point (e.g. everything detected is on the ground or in some known plane).
Know the relation between two projected points:
Either the same point from different positions (known camera movement/offset)
or two points with a known (significant) distance between them (like the two ends of some staff or bar of known length).
Once you've got either, you're able to use simple trigonometry (rule of three) to calculate the missing values.
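As a concrete example of the rule-of-three step (a minimal sketch; the function name and values are made up for illustration), assuming you know an object's real height and the focal length in pixels:

```swift
// Similar triangles: realHeight / distance = pixelHeight / focalLengthPixels,
// so distance = focalLengthPixels * realHeight / pixelHeight.
func estimateDistance(realHeightMeters: Double,
                      pixelHeight: Double,
                      focalLengthPixels: Double) -> Double {
    return focalLengthPixels * realHeightMeters / pixelHeight
}

// Example: a 1.8 m tall person spanning 360 px with a 1200 px focal length
// is roughly 6 m away.
let d = estimateDistance(realHeightMeters: 1.8, pixelHeight: 360, focalLengthPixels: 1200)
```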
Since I initially missed that this is a camera with an OpenNI-compatible depth sensor: it's possible to build OpenCV with support for it by enabling the WITH_OPENNI option when building the library.
I don't like to be the one breaking this to you, but what you are trying to do is either impossible or extremely difficult with a single camera.
You need to move the camera, record a video, and use a complex technique such as this. Usually, 3D information is recovered from at least two 2D images taken from two different places, and you also need to know quite precisely the relative distance and rotation between the two views. The common technique is to use two cameras with a precisely measured distance (baseline) between them.
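For a calibrated stereo pair, the standard relation between disparity and depth is the one below (f: focal length in pixels, B: baseline, d: disparity in pixels):

```latex
Z = \frac{f\,B}{d}
```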
The Xtion is not a basic webcam. It's a depth-sensing camera similar to the Kinect and PrimeSense sensors. The main API for this is OpenNI - see http://structure.io/openni.
