OpenCV: How to calculate distance between camera and object using an image? - image-processing

I am a newbie in OpenCV. I am working with the following formula to calculate distance:
distance to object (mm) = (focal length (mm) * real height of the object (mm) * image height (pixels)) / (object height (pixels) * sensor height (mm))
Is there a function in OpenCV that can determine object distance? If not, any reference to sample code?
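For reference, the formula above transcribes directly into code; here is a minimal Python sketch, where the function and parameter names are placeholders rather than any OpenCV API:
def distance_to_object_mm(focal_length_mm, real_height_mm, image_height_px,
                          object_height_px, sensor_height_mm):
    # Pinhole-camera estimate: the camera-specific values (focal length,
    # sensor height) have to come from calibration or the sensor datasheet.
    return (focal_length_mm * real_height_mm * image_height_px) / (
        object_height_px * sensor_height_mm)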

How to calculate distance given an object of known size
You need to know one of two things up front:
Focal-length (in mm and pixels per mm)
Physical size of the image sensor (to calculate pixels per mm)
I'm going to use focal-length since I don't want to google for the sensor datasheet.
Calibrate the camera
Use the OpenCV calibrate.py tool and the Chessboard pattern PNG provided in the source code to generate a calibration matrix. I took about 2 dozen photos of the chessboard from as many angles as I could and exported the files to my Mac. For more detail check OpenCV's camera calibration docs.
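If you prefer to script the calibration yourself rather than use calibrate.py, a minimal Python sketch looks roughly like this (the board size and file pattern are assumptions):
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # inner corners of the chessboard pattern (assumed)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points, image_size = [], [], None
for fname in glob.glob('chessboard_*.jpg'):          # the couple dozen photos
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    image_size = gray.shape[::-1]                    # (width, height)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

rms, camera_matrix, dist_coefs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print('RMS:', rms)
print('camera matrix:\n', camera_matrix)
print('distortion coefficients:', dist_coefs.ravel())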
Camera Calibration Matrix (iPhone 5S Rear Camera)
RMS: 1.13707201375
camera matrix:
[[ 2.80360356e+03 0.00000000e+00 1.63679133e+03]
[ 0.00000000e+00 2.80521893e+03 1.27078235e+03]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
distortion coefficients: [ 0.03716712 0.29130959 0.00289784 -0.00262589 -1.73944359]
f_x = 2803
f_y = 2805
c_x = 1637
c_y = 1271
Checking the details of the series of chessboard photos you took, you will find the native resolution of the photos (3264x2448) and, in their JPEG EXIF headers (visible in iPhoto), the Focal Length value (4.15mm). These values will vary from camera to camera.
Pixels per millimeter
We need to know the pixels per millimeter (px/mm) on the image sensor. From the page on camera resectioning we know that f_x and f_y are focal-length times a scaling factor.
f_x = f * m_x
f_y = f * m_y
Since we know two of the three variables in each formula, we can solve for m_x and m_y. I just averaged 2803 and 2805 to get 2804.
m = 2804px / 4.15mm = 676px/mm
Object size in pixels
I used OpenCV (C++) to extract the RotatedRect of the points and determined the size of the object to be 41px. Note that I have already retrieved the corners of the object, and I ask the bounding rectangle for its size.
cv::RotatedRect box = cv::minAreaRect(cv::Mat(points)); // minimum-area rotated rectangle around the detected corners; box.size gives the object size in pixels
Small wrinkle
The object is 41px in a video shot on the camera at 640x480.
Convert px/mm to the lower resolution
3264/676 = 640/x
x = 133 px/mm
So, given 41px / 133px/mm, the size of the object on the image sensor is 0.308mm.
Distance formula
distance_mm = object_real_world_mm * focal-length_mm / object_image_sensor_mm
distance_mm = 70mm * 4.15mm / .308mm
distance_mm = 943mm
This happens to be pretty good. I measured 910mm and with some refinements I can probably reduce the error.
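Putting the numbers above together, the whole chain fits in a few lines of Python (the values are copied from this answer, so treat it as a sketch of the arithmetic rather than reusable code):
f_mm = 4.15                  # focal length from EXIF
f_px = (2803 + 2805) / 2.0   # average of f_x and f_y from the calibration matrix
m = f_px / f_mm              # ~676 px/mm on the sensor at native resolution

native_width, video_width = 3264, 640
m_video = m * video_width / native_width        # ~133 px/mm at 640x480

object_px = 41                                  # measured with minAreaRect
object_on_sensor_mm = object_px / m_video       # ~0.308 mm

object_real_mm = 70
distance_mm = object_real_mm * f_mm / object_on_sensor_mm
print(round(distance_mm))   # ~939 mm; the rounded 133 px/mm and 0.308 mm above give 943 mm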
Feedback is appreciated.
Similar triangles approach
Adrian at pyimagesearch.com demonstrated a different technique using similar triangles. We discussed this topic beforehand; he took the similar-triangles approach and I used camera intrinsics.
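For completeness, the similar-triangles approach boils down to calibrating an apparent focal length in pixels from one reference photo of the object and reusing it. A rough sketch (the measurements below are placeholders, not values from either write-up):
# Calibration shot: an object of known width W at a known distance D
# appears P pixels wide, so the apparent focal length is F = P * D / W.
def apparent_focal_length_px(known_width_mm, known_distance_mm, measured_width_px):
    return measured_width_px * known_distance_mm / known_width_mm

# Later shots: invert the same relation to recover distance.
def distance_mm(known_width_mm, focal_px, measured_width_px):
    return known_width_mm * focal_px / measured_width_px

F = apparent_focal_length_px(70, 910, 215)   # placeholder measurements
print(distance_mm(70, F, 180))               # distance of a new observation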

There is no such function available in OpenCV to calculate the distance between an object and the camera. See this:
Finding distance from camera to object of known size
You should know that the parameters depend on the camera and will change if the camera is changed.

To get a mapping between the real world and the camera without any prior information about the camera, you need to calibrate the camera... here you can find some theory.
For calculating the depth, i.e. the distance between the camera and the object, you need at least two images of the same object taken by two different cameras... which is popularly called the stereo vision technique.
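If you go the stereo route, once the pair is calibrated and rectified, OpenCV's block matchers give you a disparity map that converts to depth via depth = focal_px * baseline / disparity. A minimal sketch, with the focal length and baseline as assumed placeholder values:
import cv2
import numpy as np

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # rectified pair (assumed)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

focal_px = 700.0       # from calibration (assumed)
baseline_mm = 60.0     # distance between the two cameras (assumed)
valid = disparity > 0
depth_mm = np.zeros_like(disparity)
depth_mm[valid] = focal_px * baseline_mm / disparity[valid]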

Related

How to compute distance values from the depth data embedded in iPhone portrait pictures?

I am working on an application that reconstructs a point cloud from depth data.
I am trying to use the depth data embedded in portrait images taken by some smartphones.
So far I have made progress with Google Camera app photos, which have a well-documented depth format (https://developer.android.com/training/camera2/Dynamic-depth-v1.0.pdf).
I am also able to extract depth-related data from iOS portrait pictures by uploading them to https://www.photopea.com/ or by using exiftool. Here is a preview of the embedded data:
Unfortunately, I am unable to determine the actual distance values encoded in that embedded image.
There seems to be encoding-related info in the XMP metadata, such as:
Stored Format = 'L008' // one component 8 bit
Native Format = 'hdis' // half precision floating point disparity
Depth Data Version = 125537 // varies
Int min value = 0
Int max value = 255
Float Min Value // for example 1.23
Float Max Value // for example 2.12
I tried to interpolate the image data values and use them as disparity the same way Google Camera does:
distance_to_camera = constant * 1.0 / (Float_Max_Value * pixel / 255 + Float_Min_Value * (1 - pixel / 255))
but it results in a heavily distorted point cloud. I think this is likely an incorrect interpretation of the depth data.
If you're working only on getting a point cloud out of the depth map, then you don't need the actual camera-object distance. You can generate the point cloud directly with some visualization library; I recommend Vedo.
Then, if you need that distance information, you need to get your camera parameters. Check out OpenCV's tutorial about camera calibration.
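A minimal back-projection sketch, assuming you already have a depth map in meters and the pinhole intrinsics (fx, fy, cx, cy) from calibration; the resulting N x 3 array can then be handed to a viewer such as Vedo:
import numpy as np

def depth_to_point_cloud(depth_m, fx, fy, cx, cy):
    # Back-project every pixel (u, v, depth) through the pinhole model.
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.dstack((x, y, depth_m)).reshape(-1, 3)   # N x 3 points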
There seem to be multiple types of images taken on Apple devices.
Images taken with the front (TrueDepth) camera have an "Accuracy: absolute" mark in the embedded depth XMP metadata. Disparity (~ 1 / distance) for those pictures can be calculated as a linear interpolation between the "Float min value" and "Float max value" values, as suggested in the question.
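For that absolute-accuracy (TrueDepth) case the conversion is then straightforward. A numpy sketch, where depth_map_u8 stands for the embedded 8-bit map and the Float Min/Max values come from the XMP metadata (the example numbers are the ones from the question):
import numpy as np

t = depth_map_u8.astype(np.float32) / 255.0           # embedded 8-bit map, scaled to 0..1
float_min, float_max = 1.23, 2.12                     # Float Min/Max Value from XMP (example)
disparity = float_min + (float_max - float_min) * t   # linear interpolation of disparity
depth = 1.0 / disparity                               # distance, up to the scale of the stored disparity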
Images taken with the rear (dual) camera have an "Accuracy: relative" mark in the depth XMP metadata. Disparity calculated with the method above has a shift which results in depth distortion. I was not able to compensate for this shift using the available image metadata.
I had an iPhone image dataset with different known distances, and it looks like this shift is not related to any of the parameters found in the XMP.
Here is a quote from the AVFoundation SDK documentation:
The accuracy of a depth data map is highly dependent on the camera calibration data used to generate it. If the camera's focal length cannot be precisely determined at the time of capture, scaling error in the z (depth) plane will be introduced. If the camera's optical center can't be precisely determined at capture time, principal point error will be introduced, leading to an offset error in the disparity estimate. AVDepthDataAccuracy constants report the accuracy of a map's values with respect to its reported units. If the accuracy is reported to be AVDepthDataAccuracyRelative, the values within the map are usable relative to one another (that is, larger depth values are farther away than smaller depth values), but do not accurately convey real world distance. Disparity maps with relative accuracy may still be used to reliably determine the difference in disparity between two points in the same map.

How can I get the depth intensity from a Kinect depth image, since it represents the distance of a pixel from the sensor?

Recently I read a paper in which they extract the depth intensity and the distance of a pixel from the camera using a depth image. But as far as I know, each pixel value in a depth image represents distance in mm [range: 0-65535], so how can they extract a depth intensity within a range [0 to 255] from the depth image? I don't understand it. The Kinect sensor returns a uint16 depth frame which includes each pixel's distance from the sensor. It does not return any intensity value, so how can the paper demonstrate that they extract depth intensity? I am really confused.
Here is the paper link
This is the graph that I want to extract (collected from the paper):
Since there is no answer for this question, I will suggest an approach for getting your own depth image data.
One simple way can be scaling the image based on the following formula:
Pixel_value = Pixel_value / 4500 * 65535
If you want to see the exact image that you get from uint8, I guess the following steps will work for you.
Probably, while casting the image to uint8, MATLAB first clips the values above some threshold, let's say 4095 = 2**12 - 1 (I'm not sure about the value), and then right-shifts them (4 shifts in our case) to bring them into the range 0-255.
So I guess multiplying the uint8 value by 256 and casting it as uint16 will help you get the same image:
Pixel_uint16_value = Pixel_uint8_value * 256   // or Pixel_uint16_value = Pixel_uint8_value << 8
// don't forget to cast the result as uint16
The other way is converting the raw data to a depth image in millimeters. The depth image should be stored in millimeters as 16-bit unsigned integers. The following two formulas can be used for converting raw data to millimeters:
distance = 1000 / (-0.00307 * rawDisparity + 3.33)
distance = 123.6 * tan(rawDisparity / 2842.5 + 1.1863)
Save each distance value to the corresponding rawDisparity pixel. Save them as 16-bit unsigned grayscale PNG images. Check this link for details.
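In Python, for example, the conversion and saving steps could look like this (raw_disparity is assumed to be the Kinect v1 raw disparity map as a numpy array):
import cv2
import numpy as np

raw = raw_disparity.astype(np.float32)               # Kinect v1 raw disparity map (assumed)
depth_mm = 1000.0 / (-0.00307 * raw + 3.33)          # first formula
# depth_mm = 123.6 * np.tan(raw / 2842.5 + 1.1863)   # or the second formula
depth_mm = np.clip(depth_mm, 0, 65535).astype(np.uint16)
cv2.imwrite('depth_mm.png', depth_mm)                # 16-bit unsigned grayscale PNG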
Quick answer:
You can get the intensity by taking the intensity of the corresponding IR pixel. Let's say you have an IR pixel array irdata;
then you can get the intensity of the i-th pixel by
byte intensity = (byte)(irdata[i] >> 8); // keep the high byte of the 16-bit IR value so it fits into 0-255
The Kinect v2 has only two cameras: one is an RGB camera and the other is an IR camera. It uses the IR camera to calculate the depth of the image using time-of-flight (ToF). If you need more information, please comment here or find my Kinect project on GitHub: https://github.com/shanilfernando/VRInteraction. I'm more than happy to help you.
Edit
As you know, depth is the distance between the Kinect sensor and the object in a given space. The Kinect IR emitter emits a bunch of IR rays and starts counting time. Once the IR rays reflect back to the depth sensor (IR sensor) of the Kinect, it stops the time counter. The time (t) between emission and reception of that specific ray is called the time-of-flight of that ray. Then the distance (d) between the Kinect and the object can be calculated by
d = (t * speed-of-light) / 2
This is done for all the rays it emits to build the IR image and the depth image. Each and every ray represents a pixel in the IR and depth images.
I read your reference paper. First of all, they are NOT using a depth image captured with the Kinect v2. It clearly says its resolution is 640 × 480 and its effective distance ranges from 0.8 meters to 3.5 meters.
I want you to clearly understand that the depth frame and the depth image are two different things. In the depth frame, each pixel is a distance, whereas in the depth image each pixel is an intensity (how bright it is).
In this plot they are trying to plot the intensity of the star point against the actual distance of the star point. They are starting with a depth (intensity) image, NOT a depth frame. You can scale a depth frame into a depth image where values go from 0 to 255, with near points having higher values and far points having lower values.
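A sketch of that scaling, assuming a uint16 depth frame in millimeters and the 0.8-3.5 m working range quoted above:
import numpy as np

near_mm, far_mm = 800, 3500                      # effective range from the paper
d = depth_frame_mm.astype(np.float32)            # uint16 Kinect depth frame (assumed)
d = np.clip(d, near_mm, far_mm)
intensity = 255 * (far_mm - d) / (far_mm - near_mm)   # near -> bright, far -> dark
depth_image = intensity.astype(np.uint8)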
I guess you were trying to read the depth from a .png image file, because of which the data was converted to binary form.
I would suggest saving the depth image in .tiff format rather than .png format.
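For instance, with OpenCV you can write and read the raw 16-bit values losslessly (depth_mm here is assumed to be your uint16 depth array in millimeters):
import cv2

cv2.imwrite('depth.tiff', depth_mm)                         # depth_mm: uint16 array (assumed)
restored = cv2.imread('depth.tiff', cv2.IMREAD_UNCHANGED)   # IMREAD_UNCHANGED keeps the 16-bit values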

Calculate the focal length to map a world point onto the image plane

I am trying to calculate the focal length value needed to map a world point onto the image plane.
I use the Raspberry Pi camera v2. I got the camera matrix from OpenCV; it gives me 204 for fx and fy. I got nearly the same value by measuring at a known distance with an object of known size.
But when I use a formula I get wrong values.
My formula is
Fpix = sensorsize_pix * focus_mm / sensorsize_mm = 1px * focus_mm / pixsize_mm
I'm using the following values:
320x240 image.
The image is taken at 640x480 resolution and then binned 2x2 in software.
Because the image is already binned by the driver, I would have a total binning of 4x4.
The original pixel size is 1.4um and the focal length is 3.00mm,
which would give me a binned pixel size of 5.6um.
So I would calculate
Fpix = 1px * 3.0mm / 0.0056mm = 536px
which is a huge difference from the 204px.
The specification for the sensor can be found here: link
As I would consider OpenCV and my measurements correct, something must be wrong with my formula.
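For what it's worth, here is the question's arithmetic written out in Python (all values are the ones stated above), just to make explicit what is being compared against the 204 px from calibrateCamera:
pixel_size_um = 1.4          # native pixel size of the sensor
binning = 4                  # 2x2 by the driver plus 2x2 in software
focus_mm = 3.00              # lens focal length

pixsize_mm = pixel_size_um * binning / 1000.0   # 0.0056 mm binned pixel size
Fpix = focus_mm / pixsize_mm                    # ~536 px
print(Fpix)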

Computing x,y,z coordinate (3D) from image point (2)

Referring to the question: Computing x,y coordinate (3D) from image point
If I have the Z coordinate of the point measured in pixels (not in mm), how can I do the same thing shown in the question above?
The calibration matrix A returned by calibrateCamera provides the scaling factors, when paired with the physical dimensions of the sensor. Use the calibrationMatrixValues routine to do the conversions. You can get the sensor dimensions from the camera spec sheet or (sometimes) from the image EXIF header.
Once you have the f_mm from it, it is Z_mm = f_mm / fx * Z_pixels.
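A sketch of that conversion with the Python bindings; camera_matrix, the image size and Z_pixels are assumed to be already available, and the sensor dimensions are placeholders to replace with the real spec-sheet values:
import cv2

aperture_width_mm, aperture_height_mm = 4.8, 3.6    # physical sensor size (placeholder)
fovx, fovy, f_mm, principal_point, aspect = cv2.calibrationMatrixValues(
    camera_matrix, (image_width, image_height), aperture_width_mm, aperture_height_mm)

fx = camera_matrix[0, 0]        # focal length in pixels
Z_mm = f_mm / fx * Z_pixels     # the conversion from the answer above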

OpenCV: Find focal length in mm of an analog camera

I have successfully calibrated an analog camera using OpenCV. The output focal length and principal point are in pixels.
I know in digital cameras you can easily multiply the size of the pixel in the sensor by the focal length in pixels and get the focal length in mm (or whatever).
How can I do this with an analog camera to get the focal length in mm?
The lens manufacturers usually write focal length on the lens. Even the name of the lens contains it, e.g. "canon lens 1.8 50mm".
If not, you can try to measure it manually.
Take the lens off the camera. Take a small, well-illuminated object, place it 1-3 meters in front of the lens, and put a sheet of paper behind the lens. Get a sharp, focused image of the object on the paper.
Now measure the following:
a - distance from the lens to the object;
y - object size;
y' - object image size on the paper;
f = a / (1 + y/y') - focal length.
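As a quick sanity check, the measurement boils down to one line (the numbers in the example call are made up, chosen to land near a 50mm lens):
def focal_length_from_projection(a_mm, object_size_mm, image_size_mm):
    # Thin-lens relation rearranged: f = a / (1 + y / y')
    return a_mm / (1.0 + object_size_mm / image_size_mm)

print(focal_length_from_projection(2000.0, 100.0, 2.6))   # example numbers: ~50 mm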
If your output is in pixels, you must be digitizing the analog input at some point. You just need to figure out the size of the pixel that you are creating.
For example, if you are scanning film in, then you use the pixel size of the scanner.
