I'm trying to test my algorithm on the lineMOD object detection dataset. According to the author, the depth values are stored as unsigned short values. I've managed to load the depth values into a cv::Mat but I would like to convert them to the typical float representation [0,1].
At first I assumed that I just have to divide by the maximum unsigned short value (65535), but this doesn't seem to be the case, since the maximum value I find is around 3399 while there are a lot of zeros in the depth map. I suppose the zeros mean that the specific pixel is a point that is too far for the depth camera to detect.
Is it possible that these unsigned shorts represent millimeters? If not, how should I convert the depth values before applying the transforms that generate the point cloud?
I guess the pixel values are not millimeters but rather some relative values, because it is easier for a depth camera to produce relative depth values than accurate millimeter values; the values might not even be linear. Consult the author to get more information.
You may try a few options:
Consult the author to fully understand what the depth values mean, then do the conversion accordingly.
Find out the actual pixel range within a single image, or across all of your images, say [534, 4399]; scale it to [0.1, 1.0] and set the zeros to 0.0 (see the sketch after this list).
Simply scale the full unsigned short range [0, 65535] to [0.0, 1.0].
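As a minimal sketch of options 2 and 3 in Python/NumPy (assuming the depth map has already been loaded into an array called depth_u16, a name I'm making up, and that [534, 4399] is just an illustrative range):

import numpy as np

depth = depth_u16.astype(np.float32)     # depth_u16: your uint16 depth map, loaded elsewhere
valid = depth > 0                        # zeros = no measurement
lo, hi = depth[valid].min(), depth[valid].max()

# option 2: scale the observed range of valid pixels to [0.1, 1.0], keep zeros at 0.0
scaled = np.zeros_like(depth)
scaled[valid] = 0.1 + 0.9 * (depth[valid] - lo) / (hi - lo)

# option 3: scale the full unsigned short range to [0.0, 1.0]
scaled_full = depth / 65535.0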
The documentation for convertMaps says that it supports the following transformation:
(CV_32FC1, CV_32FC1)→(CV_16SC2, CV_16UC1) This is the most frequently used conversion operation, in which the original floating-point maps (see remap) are converted to a more compact and much faster fixed-point representation. The first output array contains the rounded coordinates and the second array (created only when nninterpolation=false) contains indices in the interpolation tables.
I understand that (CV_32FC1, CV_32FC1) is encoding (x, y) coordinates as floats. How does the fixed point format work? What is encoded in each 2-channel entry of the CV_16SC2 matrix? What interpolation tables does the CV_16UC1 matrix index into?
I'm going by what I remember from the last time I investigated this. Grain of salt and all that.
the fixed point format splits the integer and fractional parts of your (x,y)-coordinates into different maps.
it's "compact" in that CV_32FC2 or 2x CV_32FC1 uses 8 bytes per pixel, while CV_16SC2 + CV_16UC1 uses 6 bytes per pixel. also it's integer-only, so using it can free up floating point compute resources for other work.
the integer parts go into the first map, which is 2-channel. no surprises there.
the fractional parts are converted to 5-bit integers, i.e. they're multiplied by 32. then they're packed together: the lowest 5 bits from one coordinate, the next higher 5 bits from the other one.
the resulting funny number has a range of 0 .. 1023, or 0b00000_00000 .. 0b11111_11111, which encodes fractional parts (0.0, 0.0) and (0.96875, 0.96875) respectively (that's 31/32).
during remap...
the integer map is used to look up, for every resulting pixel, several pixels in the source image required for interpolation.
the fractional map is taken as an index into an "interpolation table", which is internal to OpenCV. it contains whatever factors and shifts are required to correctly blend the several sampled pixels into one resulting pixel, all using integer math. I guess there are multiple tables, one for each interpolation method (linear, cubic, ...).
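a small sketch in Python/OpenCV that shows the conversion and the packing; which coordinate ends up in the low bits is my reading of it, so treat that part as an assumption:

import cv2
import numpy as np

# a single map entry pointing at source coordinate (3.25, 7.5)
map_x = np.array([[3.25]], dtype=np.float32)
map_y = np.array([[7.5]], dtype=np.float32)

fixed, frac = cv2.convertMaps(map_x, map_y, cv2.CV_16SC2)
print(fixed[0, 0])            # integer parts, e.g. [3 7]
idx = int(frac[0, 0])         # packed fractional index in 0 .. 1023
print(idx & 31, idx >> 5)     # quantized fractions in 1/32 steps (x assumed low, y high)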
I have a dataset of CT scans of hips. I'm currently not normalizing the pixel values, because in a CT scan the pixel value represents a specific kind of material (bone 1000+, water 0, air -1000, etc.). Also, the range of pixel values changes from scan to scan (e.g. -500:1500, -400:1200).
I'm wondering whether normalizing the pixel values to [0, 1] would be a plus for my training, or whether I would lose the information in the relation between the integer pixel value and the segmentation ground truth.
Thanks for the answers
It depends a little on your data. What you are describing are so-called Hounsfield Units (probably read up on that): you basically express every intensity relative to that of water.
Bone density (and with that the corresponding intensity) can vary greatly, not to mention if there is metal present.
Your HU range is highly dependent on the body region and mainly the patient.
https://images.app.goo.gl/WNLCs8eENTdbXWwM7
CT scans are usually uint16 grayscale; I would definitely normalize, as long as you can ensure that your float range is sufficient to distinguish the 2^16 different grayscale values.
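A minimal sketch of what I mean, in Python/NumPy, assuming the scan is already in Hounsfield Units; the window limits here are placeholder assumptions, pick whatever suits your data:

import numpy as np

def normalize_hu(volume, hu_min=-1000.0, hu_max=2000.0):
    # clip to a fixed HU window and scale to [0, 1] so every scan uses the same mapping
    vol = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (vol - hu_min) / (hu_max - hu_min)

Using a fixed window instead of each scan's own min/max keeps the mapping between HU value and tissue type consistent across scans, so you don't lose the relation to the segmentation ground truth.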
Recently I read a paper where they extract the depth intensity and the distance of each pixel from the camera using a depth image. But as far as I know, each pixel value in a depth image represents distance in mm [range: 0-65535], so how can they extract a depth intensity in the range [0, 255] from the depth image? I don't understand it. The Kinect sensor returns a uint16 depth frame which contains each pixel's distance from the sensor. It does not return any intensity value, so how can the paper claim that they extract depth intensity? I am really confused.
Here is the paper link
This is the graph I want to extract (taken from the paper):
Since there is no answer for this question, I will suggest an approach for getting your own depth image data.
One simple way is to scale the image based on the following formula:
Pixel_value = Pixel_value / 4500 * 65535
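In Python/NumPy that would look something like the following (assuming depth holds the raw values and 4500 is the maximum expected value):

import numpy as np

depth16 = np.clip(depth.astype(np.float32) / 4500.0 * 65535.0, 0, 65535).astype(np.uint16)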
If you want to see the exact image that you get from uint8, I guess the following steps will work for you.
Probably, while casting the image to uint8, MATLAB first clips the values above some threshold, let's say 4095 = 2^12 - 1 (I'm not sure about the value), and then right-shifts (4 shifts in our case) to bring them into the range 0-255.
So I guess multiplying the uint8 value by 256 and casting it to uint16 will help you get the same image:
Pixel_uint16_value = Pixel_uint8_value * 256 // or: Pixel_uint16_value = Pixel_uint8_value << 8
// don't forget to cast the result to uint16
The other way is to convert the raw data to a depth image in millimeters. The depth image should be stored in millimeters as 16-bit unsigned integers. The following two formulas can be used for converting raw data to millimeters:
distance = 1000 / (-0.00307 * rawDisparity + 3.33)
distance = 123.6 * tan(rawDisparity / 2842.5 + 1.1863)
Save each distance value to the corresponding rawDisparity pixel, and save the results as 16-bit unsigned grayscale PNG images. Check this link for details.
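A rough sketch of that conversion in Python/OpenCV (rawDisparity is assumed to be an array of raw Kinect disparity values that you have already loaded):

import cv2
import numpy as np

raw = rawDisparity.astype(np.float32)                  # raw Kinect disparity values
distance_mm = 1000.0 / (-0.00307 * raw + 3.33)         # first formula
# distance_mm = 123.6 * np.tan(raw / 2842.5 + 1.1863)  # or the second formula

depth_mm = np.clip(distance_mm, 0, 65535).astype(np.uint16)
cv2.imwrite("depth_mm.png", depth_mm)                  # 16-bit unsigned grayscale PNG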
Quick answer:
You can get the intensity from the corresponding IR pixel. Let's say you have an IR pixel array irdata;
then you can get the intensity of the ith pixel by
byte intensity = (byte)(irdata[i] >> 8);
The Kinect v2 has only two cameras: one is an RGB camera and the other is an IR camera. It uses the IR camera to calculate the depth of the image by using time-of-flight (TOF). If you need more information, please comment here or find my Kinect project on GitHub: https://github.com/shanilfernando/VRInteraction. I'm more than happy to help you.
Edit
As you know, depth is the distance from the Kinect sensor to the object at a given point in space. The Kinect IR emitter emits a bunch of IR rays and starts counting time. Once an IR ray reflects back to the depth sensor (IR sensor) of the Kinect, it stops the time counter. The time (t) between emission and reception of that specific ray is called the time-of-flight of that ray. Then the distance (d) between the Kinect and the object can be calculated by
d = (t * speed-of-light)/2
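For example, a ray that returns after t = 20 nanoseconds gives d = (20 × 10^-9 s × 3 × 10^8 m/s) / 2 = 3 m.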
This is done for all the rays it emits in order to build the IR image and the depth image. Each ray corresponds to one pixel in the IR and depth images.
I read your reference paper. First of all, they are NOT using a depth image captured with the Kinect v2; it clearly says the resolution is 640 × 480 and the effective distance ranges from 0.8 meters to 3.5 meters.
I want you to clearly understand that the depth frame and the depth image are two different things. In the depth frame each pixel is a distance, while in the depth image each pixel is an intensity (how bright it is).
In this plot they are plotting the intensity of the star point against the actual distance of the star point. They are starting from a depth (intensity) image, NOT a depth frame. You can scale a depth frame into a depth image whose values are 0 to 255, where near points have higher values and farther points have lower values, as in the sketch below.
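For example, a rough sketch in Python/NumPy, assuming depth_frame holds distances in millimeters and that 4500 mm is the maximum range (both the name and the number are assumptions):

import numpy as np

d = np.clip(depth_frame.astype(np.float32), 0, 4500)
intensity = (255 * (1.0 - d / 4500.0)).astype(np.uint8)  # near points bright, far points dark
intensity[depth_frame == 0] = 0                          # pixels with no measurement stay black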
I guess you were trying to read the depth from a .png image file, which is why the data was converted to binary form.
I would suggest saving the depth image in .tiff format rather than .png format.
I am creating a histogram of an image. I need a way to scale it along the y-axis to display it nicely, as standard image/video processing programs do. That is, I need to amplify the small values and attenuate the big values.
What I tried to do so far:
Scaling the y-values by dividing them by the greatest y-value. This allowed me to see the histogram, but the small values are still almost indistinguishable from zero.
What I have seen:
In a standard video processing tool, let's say the three biggest values appear with the same y-values in the histogram display even though the real values are different, and the small values are amplified in the histogram.
I would be thankful for the tips/formula/algorithm.
You can create a lookup table (LUT) and fill it with values from a curve that describes the desired behavior. It seems you want something like a gamma curve:
for i in 0 .. MaxValue
    LUT[i] = MaxValue (255?) * Power(i / MaxValue, gamma)
To apply it:
for every pixel
NewValue = LUT[OldValue]
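A minimal runnable sketch of the same idea in Python/NumPy (gamma = 0.5 and the 0-255 range are assumptions; for the histogram use case, OldValue would be the bin count scaled into that range):

import numpy as np

gamma = 0.5
max_value = 255
lut = (max_value * (np.arange(max_value + 1) / max_value) ** gamma).astype(np.uint8)

new_values = lut[old_values]   # NewValue = LUT[OldValue]; old_values is your array of values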
Consider the really simple difference kernel
kernel vec4 diffKernel(__sample image1, __sample image2)
{
return vec4(image1.rgb - image2.rgb, 1.0);
}
When used as a CIColorKernel, this produces the difference between two images. However, any values for which image1.rgb < image2.rgb (pointwise) will be forced to zero due to the "clamping" nature of the outputs of kernels in CIKernel.
For many image processing algorithms, such as those involving image pyramids (see my other question on how this can be achieved in Core Image), it is important to preserve these negative values for later use (reconstructing the pyramid, for example). If 0's are used in their place, you will actually get an incorrect output.
I've seen that one such way is to store abs(image1.rgb - image2.rgb), make a new image whose RGB values store 0 or 1 depending on whether a negative sign is attached to that value, and then do a multiply blend weighted by -1 in the correct places.
What are some other ways one can store the sign of a pixel value? Perhaps we can use the alpha channel if it is unused?
I actually ended up figuring this out: you can use an option on the CIContext to make sure that things are computed using the kCIFormatAf format. This means that any calculations done in that context will be done in floating-point precision, so values outside [0, 1] are preserved from one filter to the next!