How to read the geographical coordinates of a pixel in GDAL? - geolocation

I have an image in UTM projection. Given a pixel position (px, py), I need to get its corresponding UTM coordinates (utmx, utmy).
Is it possible? I tried
gdallocationinfo input.tif 256 256
but I did not get the expected result.
Thank you
M.

gdallocationinfo is for getting pixel values out of a raster. For coordinate conversion you should use gdaltransform: if you provide the image as an argument, you can enter pixel coordinates and get projected coordinates in return.
So:
gdaltransform input.tif
256 256

It's just a math problem: you need to fetch the affine transformation (geotransform) coefficients, which you can easily do with the Python bindings. It also depends on your pixel coordinate system (e.g., does it start at [1,1] or [0,0], and which corner is the origin?).
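For instance, a minimal sketch with the GDAL Python bindings, assuming the file is input.tif and that pixel (0,0) is the top-left corner as GDAL defines it:

from osgeo import gdal

ds = gdal.Open("input.tif")
gt = ds.GetGeoTransform()  # (origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height)
px, py = 256, 256          # column, row in pixel/line space
utmx = gt[0] + px * gt[1] + py * gt[2]
utmy = gt[3] + px * gt[4] + py * gt[5]
print(utmx, utmy)          # coordinates of the pixel's top-left corner; add 0.5 to px/py for its center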

Related

Align RGB image to Depth Image using Intrinsic and Extrinsic Matrix

Similar questions have been solved many times. However, they generally map depth coordinates to RGB coordinates by following these steps (a sketch of this forward mapping follows the list):
apply the inverse depth intrinsic matrix to the depth coordinates.
rotate and translate the 3D coordinates obtained, using the rotation R and translation T matrices that map 3D depth coordinates to 3D RGB coordinates.
apply the RGB intrinsic matrix to obtain the image coordinates.
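For reference, a minimal NumPy sketch of this forward mapping, with made-up intrinsics/extrinsics (K_depth, K_rgb, R, T) and an example depth pixel:

import numpy as np

K_depth = np.array([[365.0, 0.0, 256.0], [0.0, 365.0, 212.0], [0.0, 0.0, 1.0]])    # made-up depth intrinsics
K_rgb   = np.array([[1050.0, 0.0, 960.0], [0.0, 1050.0, 540.0], [0.0, 0.0, 1.0]])  # made-up RGB intrinsics
R, T = np.eye(3), np.array([0.05, 0.0, 0.0])   # made-up rotation and translation (depth -> RGB)
u, v, z = 320, 240, 1.5                        # example depth pixel and its depth in meters
p_depth = z * (np.linalg.inv(K_depth) @ np.array([u, v, 1.0]))  # step 1: back-project to 3D
p_rgb_3d = R @ p_depth + T                                      # step 2: rotate and translate
uvw = K_rgb @ p_rgb_3d                                          # step 3: apply the RGB intrinsics
u_rgb, v_rgb = uvw[0] / uvw[2], uvw[1] / uvw[2]                 # normalize to image coordinates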
However, I want to do the reverse process: from RGB coordinates, obtain the depth coordinates. Then I can obtain an interpolated value from the depth map based on those coordinates.
The problem is that I don't know how to define the z coordinate in the RGB image to make everything work.
The process should be:
obtain 3D RGB coordinates by applying the camera's inverse intrinsic matrix. How can I set the z coordinates? Should I use an estimated value? Set all the z coordinates to one?
rotate and translate the 3D RGB coordinates to the 3D depth coordinates.
apply the depth intrinsic matrix.
If this process cannot be done, how can I map RGB coordinates to depth coordinates instead of the other way around?
Thank you!

How can I get the depth intensity from a Kinect depth image since it represents the distance of a pixel from the sensor

Recently I read a paper in which they extract the depth intensity and the distance of a pixel from the camera using a depth image. But as far as I know, each pixel value in a depth image represents distance in mm [range: 0-65535], so how can they extract a depth intensity within the range [0 to 255] from the depth image? I don't understand it. The Kinect sensor returns a uint16 depth frame which contains each pixel's distance from the sensor. It does not return any intensity value, so how can the paper claim that they extract depth intensity? I am really confused.
Here is the paper link
This is the graph that I want to extract (collected from the paper):
Since there is no answer for this question yet, I will suggest an approach for getting your own depth image data.
One simple way can be scaling the image based on the following formula:
Pixel_value = Pixel_value / 4500 * 65535
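In NumPy, assuming depth_mm is the uint16 depth array and 4500 mm is taken as the maximum useful range, that scaling might look like:

import numpy as np

depth_mm = np.random.randint(0, 4500, (424, 512)).astype(np.uint16)  # stand-in for a real depth frame
scaled = np.clip(depth_mm.astype(np.float32) / 4500.0 * 65535.0, 0, 65535).astype(np.uint16)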
If you want to see the exact image that you get from uint8, I guess the following steps will work for you.
Probably, while casting the image to uint8, MATLAB first clips the values above some threshold, let's say 4095 = 2**12 - 1 (I'm not sure about the value), and then right-shifts them (4 shifts in our case) to bring them into the range 0-255.
So I guess multiplying the uint8 value by 256 and casting it as uint16 will help you get the same image:
Pixel_uint16_value = Pixel_uint8_value * 256   // or: Pixel_uint16_value = Pixel_uint8_value << 8
// don't forget to cast the result as uint16
The other way is to convert the raw data to a depth image in millimeters.
The depth image should be stored in millimeters and as 16-bit unsigned integers. The following two formulas can be used for converting raw data to millimeters.
distance = 1000 / (-0.00307 * rawDisparity + 3.33)
distance = 123.6 * tan(rawDisparity / 2842.5 + 1.1863)
Save each distance value to the corresponding rawDisparity pixel. Save them as 16-bit unsigned grayscale PNG images. Check this link for details.
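A hedged Python sketch of that conversion, with a stand-in array in place of a real raw disparity frame and a placeholder output file name:

import numpy as np
import cv2

raw = np.random.randint(0, 1024, (480, 640)).astype(np.float32)   # stand-in for the raw disparity frame
depth_mm = 1000.0 / (-0.00307 * raw + 3.33)                        # first formula above
# depth_mm = 123.6 * np.tan(raw / 2842.5 + 1.1863)                 # or the second formula
depth_mm = np.clip(depth_mm, 0, 65535).astype(np.uint16)
cv2.imwrite("depth.png", depth_mm)                                 # written as a 16-bit grayscale PNG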
Quick answer:
You can get the intensity by reading the intensity of the corresponding IR pixel. Let's say you have an IR pixel array irdata;
then you can get the intensity of the i-th pixel by
byte intensity = (byte)(irdata[i] >> 8);
The Kinect v2 has only two cameras: one is an RGB camera and the other is an IR camera. It uses the IR camera to calculate the depth of the image using time-of-flight (ToF). If you need more information, please comment here or find my Kinect project on GitHub: https://github.com/shanilfernando/VRInteraction. I'm more than happy to help you.
Edit
As you know, depth is the distance between the Kinect sensor and the object in a given space. The Kinect IR emitter emits a bunch of IR rays and starts counting time. Once the IR rays reflect back to the depth sensor (IR sensor) of the Kinect, it stops the time counter. The time (t) between emission and reception of that specific ray is called the time-of-flight of that ray. Then the distance (d) between the Kinect and the object can be calculated by
d = (t * speed-of-light)/2
This is done for all the rays it emits to build the IR image and the depth image. Each and every ray represents a pixel in the IR and depth images.
I read your reference paper. First of all, they are NOT using a depth image captured from the Kinect v2; it clearly says its resolution is 640 × 480 and its effective distance ranges from 0.8 meters to 3.5 meters.
I want you to clearly understand that the depth frame and the depth image are two different things. In the depth frame, each pixel is a distance, while in the depth image each pixel is an intensity (how bright it is).
In this plot they are trying to plot the intensity of the star point against the actual distance of the star point. They are starting with a depth (intensity) image, NOT a depth frame. A depth frame can be scaled into a depth image where values range from 0 to 255, near points have higher values, and farther points have lower values.
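As an illustration, a small NumPy sketch of that scaling (the array is a stand-in; 0 is treated as "no reading"):

import numpy as np

depth_frame = np.random.randint(0, 4500, (424, 512)).astype(np.uint16)  # stand-in for a real depth frame (mm)
valid = depth_frame > 0
d_min, d_max = depth_frame[valid].min(), depth_frame[valid].max()
depth_image = np.zeros(depth_frame.shape, dtype=np.uint8)
# near points get high intensities, far points get low intensities
depth_image[valid] = (255.0 * (d_max - depth_frame[valid]) / (d_max - d_min)).astype(np.uint8)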
I guess you were trying to read the depth from a .png image file, which is why the data was converted to binary form.
I would suggest saving the depth image in .tiff format rather than .png format.

Understanding Disparity Map in OpenCV

Can somebody explain to me what exactly a disparity map returns? There is not much given in the documentation, and I have a few questions related to it.
Does it return difference values of pixels with respect to both images?
How do I use disparity values in the formula for depth estimation, i.e.
Depth = focalLength*Baseline/Disparity
I have read somewhere that the disparity map gives a function of depth, f(z).
Please explain what that means. If depth is purely an absolute value, how can it be generated as a function, or is it a function with respect to the pixels?
The difference d = p_l − p_r of two corresponding image points is called the disparity.
Here, p_l is the position of the point in the left stereo image and p_r is the position of the point in the right stereo image.
For parallel optical axes, the disparity is d = x_l − x_r, so the search for depth information is equivalent to the search for disparity, i.e. for corresponding pixels, and the distance is inversely proportional to the disparity.
The disparity values are visualized in a so-called disparity map: each disparity value for each pixel in the reference image (here: left) is coded as a grayscale value. Pixels that do not have any correspondence are also assigned a grayscale value (here: black). The so-called ground-truth map is a disparity map that contains the ideal solution of the correspondence problem.
Relation between Disparity and Depth information:
The following image represents two cameras (left and right) used to find the depth of a point p(x_w, z_w).
The resulting depth is Z = focalLength * Baseline / Disparity (the formula from the question above),
so it can be seen that the depth is inversely proportional to the disparity.
UPDATE:
To calculate the disparity, you need two images: (1) the left image and (2) the right image. Let's say that there is a pixel at position (60, 30) in the left image and that same pixel is at position (40, 30) in the right image; then your disparity will be 60 - 40 = 20. So the disparity map gives you the difference between the positions of pixels in the left image and the right image. If a pixel is present in the left image but absent in the right image, then the value at that position in the disparity map will be zero. Once you have the disparity value for each pixel of the left image, you can easily calculate the depth using the formula given at the end of my answer.
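A hedged OpenCV (Python) sketch of this, with placeholder file names and made-up calibration values:

import cv2
import numpy as np

left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)           # placeholder file names
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disp = stereo.compute(left, right).astype(np.float32) / 16.0    # StereoBM returns fixed-point disparities (*16)
focal_px, baseline_m = 700.0, 0.12                              # assumed focal length [px] and baseline [m]
depth_m = np.where(disp > 0, focal_px * baseline_m / np.maximum(disp, 1e-6), 0.0)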

Computing x,y,z coordinate (3D) from image point (2)

Referring to the question: Computing x,y coordinate (3D) from image point
If I have the coordinate Z of the point measured in pixels (not in mm), how can I do the same thing shown in the question above?
The calibration matrix A returned by calibrateCamera provides the scaling factors, when paired with the physical dimensions of the sensor. Use the calibrationMatrixValues routine to do the conversions. You can get the sensor dimensions from the camera spec sheet or (sometimes) from the image EXIF header.
Once you have the f_mm from it, it is Z_mm = f_mm / fx * Z_pixels.
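A minimal sketch of that conversion with OpenCV's Python bindings, assuming a 1920×1080 image, a made-up 4.8 mm × 3.6 mm sensor (take the real values from the spec sheet), and a made-up camera matrix:

import numpy as np
import cv2

camera_matrix = np.array([[1400.0, 0.0, 960.0],      # stand-in for the matrix from cv2.calibrateCamera
                          [0.0, 1400.0, 540.0],
                          [0.0, 0.0, 1.0]])
fovx, fovy, f_mm, principal_pt, aspect = cv2.calibrationMatrixValues(
    camera_matrix, (1920, 1080), 4.8, 3.6)            # image size, sensor width/height in mm (assumed)
fx = camera_matrix[0, 0]                              # focal length in pixels
Z_pixels = 500.0                                      # example Z measured in pixel units
Z_mm = f_mm / fx * Z_pixels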

image transformations

So I've been using GNU GSL and CImg to implement some of the fundamental projective space techniques for affine and metric rectification.
I've finished computing an affine rectification, but I'm having a hard time figuring out how to apply the affine rectification matrix to the original (input) image.
My current thought process is to iterate across the input image for each pixel coordinate. Then multiply the original pixel coordinate (converted to a homogeneous coordinate) by the affine rectification matrix to get the output pixel coordinate.
Then access the output image using the output pixel coordinate and conduct a blend (addition) operation on the output image's pixel location with the pixel color from the original image.
Does that sound right? I'm getting a lot of really weird values after multiplying the original pixel coordinate by the affine rectification matrix.
No, your values should not be weird. Why don't you make a simple example: a small scale with a small translation, e.g.
x' = 1.01*x + 0.0*y + 5;
y' = 0.0*x + 0.98*y + 10;
Now the pixel at (10,10) should map to (15.1, 19.8), right?
If you want to make a nice output image, you should find the forward projection and then back project to the input image and interpolate there rather than try to blend into the output image. Otherwise you will end up with gaps in the output.
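For example, with OpenCV in Python this backward mapping (with bilinear interpolation) comes for free, since warpAffine inverts the forward matrix internally and samples the input image for every output pixel; the matrix below is just the small example transform from above and the file name is a placeholder:

import cv2
import numpy as np

src = cv2.imread("input.png")                        # placeholder input image
H = np.array([[1.01, 0.00,  5.0],                    # forward affine: x' = 1.01x + 5, y' = 0.98y + 10
              [0.00, 0.98, 10.0]], dtype=np.float32)
h, w = src.shape[:2]
out = cv2.warpAffine(src, H, (w, h), flags=cv2.INTER_LINEAR)   # backward-maps and interpolates, so no gaps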
You need to be careful with your terminology here; it sounds to me like you are doing projections, sometimes called warping in the computer graphics community. Rectification is something else, but it depends on what you are doing.
