How does the "compressed form" of `cv::convertMaps` work? - opencv

The documentation for convertMaps says that it supports the following transformation:
(CV_32FC1, CV_32FC1)→(CV_16SC2, CV_16UC1) This is the most frequently used conversion operation, in which the original floating-point maps (see remap) are converted to a more compact and much faster fixed-point representation. The first output array contains the rounded coordinates and the second array (created only when nninterpolation=false) contains indices in the interpolation tables.
I understand that (CV_32FC1, CV_32FC1) is encoding (x, y) coordinates as floats. How does the fixed point format work? What is encoded in each 2-channel entry of the CV_16SC2 matrix? What interpolation tables does the CV_16UC1 matrix index into?

I'm going by what I remember from the last time I investigated this. Grain of salt and all that.
the fixed point format splits the integer and fractional parts of your (x,y)-coordinates into different maps.
it's "compact" in that CV_32FC2 or 2x CV_32FC1 uses 8 bytes per pixel, while CV_16SC2 + CV_16UC1 uses 6 bytes per pixel. also it's integer-only, so using it can free up floating point compute resources for other work.
the integer parts go into the first map, which is 2-channel. no surprises there.
the fractional parts are converted to 5-bit integers, i.e. they're multiplied by 32. then they're packed together, lowest 5 bits from one coordinate, higher next 5 bits from the other one.
the resulting funny number has a range of 0 .. 1023, or 0b00000_00000 .. 0b11111_11111, which encodes fractional parts (0.0, 0.0) and (0.96875, 0.96875) respectively (that's 31/32).
during remap...
the integer map is used to look up, for every resulting pixel, several pixels in the source image required for interpolation.
the fractional map is taken as an index into an "interpolation table", which is internal to OpenCV. it contains whatever factors and shifts required to correctly blend the several sampled pixels into one resulting pixel, all using integer math. I guess there are multiple tables, one for each interpolation method (linear, cubic, ...).

Related

Interpreting unsigned short depth map values

I'm trying to test my algorithm on the lineMOD object detection dataset. According to the author, the depth values are stored as unsigned short values. I've managed to load the depth values into a cv::Mat but I would like to convert them to the typical float representation [0,1].
At first I assumed that I just have to divide with the maximum unsigned short but this doesn't seem to be the case since the maximum value I find seems to be 3399 while there are a lot of zeros in the depth map. I suppose the zeros mean that the specific pixel is a point that is too far for the depth camera to detect.
Is it possible that these unsigned shorts represent millimeters? If not, how should I convert the depth values before applying the transforms that generate the point cloud?
I guess the pixel values are not millimeters, rather some relative values, because it is easier for a depth camera to get relative depth values than accurate millimeter values, the values even might not be linear. Consult the author to get more information.
You may try a few options:
Consult the author to fully understand what does the depth value mean, then do the conversion accordingly.
Find out what is the actual pixel range among a single image, or among all of
your images, say [534, 4399], scale it to [0.1, 1.0], set those zeros to be 0.0
Simply scale the full range of unsigned short [0 ~ 65535] to [0.0, 1.0]

Is pixel value normalization needed in medical image segmentation?

I have a dataset of CT-Scan representing hips scan. I'm currently not normalizing the pixel value because in CT-Scan pixel value represent different part of the scan (bone 1000+, water0, air-1000, etc). Also the range of pixels value change every scan (ex. -500:1500, -400:1200).
I'm wondering if normalizing pixel value between [0,1] would be a + for my training or I would lost information on the relation between int pixel value and segmentation truth.
Thanks for the answers
It depends a little on your data. What you are describing are so called Hounsfield Units (probably read up on that), you basically express every intensity relative to the one of water.
Bone density (and with that the corresponding intensity) can vary greatly, not to mention if there is metal present.
Your HU range is highly dependent on the body region and mainly the patient.
https://images.app.goo.gl/WNLCs8eENTdbXWwM7
CT Scans are usually uint16 grayscale, I would definitely normalize as long as you can ensure that your float range is sufficient to accommodate the 2^16 different grayscale values.

How to choose the number of bins when creating HSV histogram?

I was reading some documentation about HSV histogram, and in several refs the Saturation channel was quantized into 256 values. Why is that? Is there any reason behind choosing this number?
I have the same questions for the Hue channel, often it is quantized into 180 values.
Disclaimer: Off-hand answers (i.e., not backed up by any documentation):
"256" is a popular number for a bin size because Programmers Like Round Numbers -- it fits in a single byte. And "180" because the HSB circle is "360 [degrees]", but "360" does not fit into a single byte.
For many image formats, the range of RGB values is limited to 0..255 per channel -- 3 bytes in total. To store the same amount of data (ignoring any artifacts of converting to another color model), Saturation and Brightness are often expressed in single bytes as well. The same could be done for Hue, by scaling the original range of 0..359 (as Hue is usually expressed as a value in degrees on the HSB Color Wheel) into the byte range 0..255. However, probably because it's easier to do calculations with a number close to the original 360° full circle, the range is clipped to 0..179. That way the value can be stored into a single byte (and thus "HSB" uses as much memory as "RGB") and can be converted trivially back to (close to) its original value -- multiply by 2. Obviously, sticking to the storage space wins over fidelity.
Given 256 values for both S and B, and 180 for H, you end up with a color space of 256*256*180 = 11,796,480 colors. To inspect the number of colors, you build a histogram: an array where you can read out the total amount of pixels in a certain color or color range. Using a color range here, instead of actual values, significantly cuts down the memory requirements.
For an RGB color image, with the colors fairly evenly distributed, you could shift down each channel a certain number of bits. This is how a straightforward conversion from 24-bit "true-color" RGB down to 15-bit RGB "high-color" space works: each channel gets divided by 8, reducing 256 values down to 32 (5 bits per channel). Conversion to a 16-bit high-color RGB space works the same; the bit that got left over in the 15-bit conversion is assigned to green. Thus, the range of colors for green is doubled, which is useful since the human eye is more perceptive for shades of green than for the other two primaries.
It gets more complicated when the colors in the input image are not evenly distributed. A naive solution is to create an array of [256][256][256], initialize all to zero, then fill the array with the colors of the image, and finally sort them. There are better alternatives -- let me consult my old Computer Graphics [1] here. Hold on.
13.4 Reproducing Color mentions the names of two different approaches from Heckbert (Color Image Quantization for Frame Buffer Display, SIGGRAPH 82): the popularity and the median-cut algorithms. (Unfortunately, that's all they say about this topic. I assume efficient code for both can be googled for.)
A rough guess:
The size for each bin (H,S,B) should be reflected by what you are trying to use it for. This older SO question, for example, uses a large bin for hue -- color is considered the most important -- and only 3 different values for both saturation and brightness. Thus, bright images with some subdued areas (say, a comic book) will give a good spread in this histogram, but a real-color photograph will not so much.
The main limit is that the bin sizes, multiplied with each other, should use a reasonably small amount of memory, yet cover enough of each component to get evenly filled. Perhaps some trial-and-error comes into play here. You could initially evenly distribute all of H, S, and B components over the available memory in your histogram and process a small part of the image; say, 1 out of 4 pixels, horizontally and vertically. If you notice one of the component bins fills up too fas where others stay untouched, adjust the ranges and restart.
If you need to do an analysis of multiple pictures, make sure they are all alike in their color gamut. You cannot expect a reasonable bin size to work on all sorts of images; you would end up with an evenly distribution, where all matches are only so-so.
[1] Computer Graphics. Principles and Practices. (1997) J.D. Foley, A. van Dam, S.K. Feiner, and J.F. Hughes, 2nd ed., Reading, MA: Addison-Wesley.

Matlab Camera Calibration - Correct lens distortion

In the Computer Vision System Toolbox for Matlab there are three types of interpolation methods used for Correct lens distortion.
Interpolation method for the function to use on the input image. The interp input interpolation method can be the string, 'nearest', 'linear', or 'cubic'.
My question is: what is the difference between 'nearest', 'linear', or 'cubic' ? and which one implemented in "Zhang" and "Heikkila, J, and O. Silven" methods.
I can't access the paged at the link you wrote in your question (it asks for a username and password) and so I assume your linked page has the same contents of the page http://www.mathworks.it/it/help/vision/ref/undistortimage.html which I quote here:
J = undistortImage(I,cameraParameters,interp) removes lens distortion from the input image, I and specifies the
interpolation method for the function to use on the input image.
Input Arguments
I — Input image
cameraParameters — Object for storing camera parameters
interp — Interpolation method
'linear' (default) | 'nearest' | 'cubic'
Interpolation method for the function to use on
the input image. The interp input interpolation method can be the
string, 'nearest', 'linear', or 'cubic'.
Furthermore, I assume you are referring to these papers:
ZHANG, Zhengyou. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2000, 22.11: 1330-1334.
HEIKKILA, Janne; SILVEN, Olli. A four-step camera calibration procedure with implicit image correction. In: Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 1997. p. 1106-1112.
I have searched for the word "interpolation" in the two pdf documents Zhang and Heikkila and Silven and I did not find any direct statement about the interpolation method they have used.
To my knowledge, in general, a camera calibration method is concerned on how to estimate the intrinsic, extrinsic and lens distortion parameters (all these parameters are inside the input argument cameraParameters of Matlab's undistortImage function); the interpolation method is part of a different problem, i.e. the problem of "Geometric Image Transformations".
I quote from the OpenCV's page Geometric Image Transformation (I have slightly modified the original omitting some details and adding some definitions, I assume you are working with grey level image):
The functions in this section perform various geometrical
transformations of 2D images. They do not change the image content but
deform the pixel grid and map this deformed grid to the destination
image. In fact, to avoid sampling artifacts, the mapping is done in
the reverse order, from destination to the source. That is, for each
pixel (x, y) of the destination image, the functions compute
coordinates of the corresponding “donor” pixel in the source image and
copy the pixel value:
dst(x,y) = src(f_x(x,y), f_y(x,y))
where
dst(x,y) is the grey value of the pixel located at row x and column y in the destination image
src(x,y) is the grey value of the pixel located at row x and column y in the source image
f_x is a function that maps the row x and the column y to a new row, it just uses coordinates and not the grey level.
f_y is a function that maps the row x and the column y to a new column, it just uses coordinates and not the grey level.
The actual implementations of the geometrical transformations, from
the most generic remap() and to the simplest and the fastest resize()
, need to solve two main problems with the above formula:
• Extrapolation of non-existing pixels. Similarly to the filtering
functions described in the previous section, for some (x,y) , either
one of f_x(x,y) , or f_y(x,y) , or both of them may fall outside of
the image. In this case, an extrapolation method needs to be used.
OpenCV provides the same selection of extrapolation methods as in the
filtering functions. In addition, it provides the method
BORDER_TRANSPARENT . This means that the corresponding pixels in the
destination image will not be modified at all.
• Interpolation of pixel
values. Usually f_x(x,y) and f_y(x,y) are floating-point numbers. This
means that <f_x, f_y> can be either an affine or
perspective transformation, or radial lens distortion correction, and
so on. So, a pixel value at fractional coordinates needs to be
retrieved. In the simplest case, the coordinates can be just rounded
to the nearest integer coordinates and the corresponding pixel can be
used. This is called a nearest-neighbor interpolation. However, a
better result can be achieved by using more sophisticated
interpolation methods, where a polynomial function is fit into some
neighborhood of the computed pixel (f_x(x,y), f_y(x,y)), and then the
value of the polynomial at (f_x(x,y), f_y(x,y)) is taken as the
interpolated pixel value. In OpenCV, you can choose between several
interpolation methods. See resize() for details.
For a "soft" introduction see also for example Cambridge in colour - DIGITAL IMAGE INTERPOLATION.
So let's say you need the grey level of pixel at x=20.2 y=14.7, since x and y are number with a fractional part different from zero you will need to "invent" (compute) the grey level in some way. In the simplest case ('nearest' interpolation) you just say that the grey level at (20.2,14.7) is the grey level you retrieve at (20,15), it is called "nearest" because 20 is the nearest integer value to 20.2 and 15 is the nearest integer value to 14.7.
In the (bi)'linear' interpolation you will compute the value at (20.2,14.7) with a combination of the grey levels of the four pixels at (20,14), (20,15), (21,14), (21,15); for the details on how to compute the combination see the Wikipedia page which has a numeric example.
The (bi)'cubic' interpolation considers the combination of sixteen pixels in order to compute the value at (20.2,14.7), see the Wikipedia page.
I suggest you to try all the three methods, with the same input image, and see the differences in the output image.
Interpolation method is actually independent of the camera calibration. Any time you apply a geometric transformation to an image, such as rotation, re-sizing, or distortion compensation, the pixels in the new image will correspond to points between the pixels of the old image. So you have to interpolate their values somehow.
'nearest' means you simply use the value of the nearest pixel.
'linear' means you use bi-linear interpolation. The new pixel's value is a weighted sum of the values of the neighboring pixels in the input image, where the weights are proportional to distances.
'cubic' means you use a bi-cubic interpolation, which is more complicated than bi-linear, but may give you a smoother image.
A good description of these interpolation methods is given in the documentation for the interp2 function.
And finally, just to clarify, the undistortImage function is in the Computer Vision System Toolbox.

Calculating a 48-bit image histogram

I have a 48-bit (16 bits per pixel) image I've loaded with FreeImage. I'm trying to generate a histogram from this image without having to convert it to a 24-bit image.
This is how I understand histograms are calculated..
for (pixel in pixels)
{
red_histo[pixel.red]++;
}
Where pixel.red can be between 0 and 255. So there is a range from 0 to 255 on my histogram. But if there is 16 bits per pixel, it could be between 0 and 65535, which is too large to be displayed on a histogram.
Is there a standard way to calculate histograms with 48-bit (or higher) images?
You have to decide how many bins you need in the histogram. For eg. the Matlab histogram function takes these forms
imhist(I)
imhist(I, n)
imhist(X, map)
In the first case, the number of bins is by default used as 256. So, if you have 16bit input, these will be scaled down to 8 bit and split into 256 bin histogram.
In the second one, you can specify number of bins 'n'. Lets say you specify n=2 for your 16 bit data. Then, this will essentially split the histogram as [0-2^15, 2^15-2^16-1].
The third case is where you specify the map for each bin. ie you have to specify the ranges of the pixel values for each bin.
http://www.mathworks.com/help/images/ref/imhist.html
How you want to choose the number of bins depends on your requirement.
This Stack Overflow Question May have the answer you are looking for.
I do not know if there is a "standard" way.
If this is for display purposes you can scale back the pixels to keep the range from 0-255 for instance:
double scalingFactor = 255/65535;
for (pixel in pixels)
{
red_histo[(int)(scalingFactor * pixel.red)]++;
}
This will allow the upper range of the 16 bit pixel to come in at 255 and lower range of the 16 bit pixel to come in at 0.

Resources