Calculating a 48-bit image histogram - image-processing

I have a 48-bit (16 bits per color channel) image I've loaded with FreeImage. I'm trying to generate a histogram from this image without having to convert it to a 24-bit image.
This is how I understand histograms are calculated:
for (pixel in pixels)
{
red_histo[pixel.red]++;
}
Here pixel.red can be between 0 and 255, so the histogram covers the range 0 to 255. But with 16 bits per channel, the value can be anywhere between 0 and 65535, which is too large a range to display in a histogram.
Is there a standard way to calculate histograms with 48-bit (or higher) images?

You have to decide how many bins you need in the histogram. For example, MATLAB's imhist function takes these forms:
imhist(I)
imhist(I, n)
imhist(X, map)
In the first form, the number of bins defaults to 256. So if you have 16-bit input, the values are scaled down to 8 bits and collected into a 256-bin histogram.
In the second form, you specify the number of bins n. Say you specify n=2 for your 16-bit data: the histogram is then split into the two bins [0, 2^15) and [2^15, 2^16 - 1].
The third form is for indexed images: map is the colormap, and the histogram gets one bin per colormap entry.
http://www.mathworks.com/help/images/ref/imhist.html
How you want to choose the number of bins depends on your requirement.
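Outside MATLAB, the same idea is just a choice of how to map the value range onto the bins. A minimal sketch in C++, assuming 16-bit channel values and a freely chosen bin count (the function name is illustrative):

#include <cstdint>
#include <vector>

// Map 16-bit channel values (0..65535) into `bins` equally sized bins.
std::vector<unsigned> histogram16(const std::vector<uint16_t>& values, unsigned bins)
{
    std::vector<unsigned> histo(bins, 0);
    for (uint16_t v : values)
        histo[(static_cast<uint32_t>(v) * bins) >> 16]++;   // equivalent to v * bins / 65536
    return histo;
}

With bins = 256 this reproduces the usual 8-bit-style histogram; with bins = 2 you get the [0, 2^15) / [2^15, 2^16 - 1] split described above.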

This Stack Overflow question may have the answer you are looking for.
I do not know if there is a "standard" way.

If this is for display purposes, you can scale the pixels back down so the range stays 0-255, for instance:
double scalingFactor = 255.0 / 65535.0; // note the floating-point literals: 255/65535 in integer arithmetic is 0
for (pixel in pixels)
{
red_histo[(int)(scalingFactor * pixel.red)]++;
}
This maps the upper end of the 16-bit range to 255 and the lower end to 0.
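If you would rather bin the 16-bit data directly instead of converting the image, a sketch along these lines should work, assuming the bitmap was loaded as FreeImage's 48-bit FIT_RGB16 type (the function name and the choice of 256 bins are mine, not part of the question):

#include <FreeImage.h>
#include <vector>

// 256-bin histogram of the red channel of a 48-bit (FIT_RGB16) bitmap.
std::vector<unsigned> redHistogram48(FIBITMAP* dib)
{
    std::vector<unsigned> histo(256, 0);
    const unsigned width  = FreeImage_GetWidth(dib);
    const unsigned height = FreeImage_GetHeight(dib);

    for (unsigned y = 0; y < height; ++y)
    {
        const FIRGB16* line = reinterpret_cast<const FIRGB16*>(FreeImage_GetScanLine(dib, y));
        for (unsigned x = 0; x < width; ++x)
            histo[line[x].red >> 8]++;   // 65536 values -> 256 bins (divide by 256)
    }
    return histo;
}

Checking FreeImage_GetImageType(dib) == FIT_RGB16 first would guard against other pixel layouts.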

Related

How Are the Input Image Pixel Values Scaled in YOLO-V4?

I have an issue with something that is vague to me regarding the input data preprocessing in YOLO-V4.
If the input image is a grayscale image of 16 bits per pixel, i.e. a range of pixel values of [0, 2^16) instead of [0, 2^8), it is mentioned that they are scaled, so I want to ask:
1- Are these values scaled to be within -1 to 1 or 0 to 1?
2- Which method is used for the scaling (and where can I find the corresponding piece of code)?
3- Is the scaling done using the maximum value per image or the maximum of the bit depth, i.e. in my case will the max used in the scaling be 2^16, or the maximum value actually present in the image (for example 2000 or whatever)?
Thanks in advance,
(1) From 0 to 1.
Scaling works as v = pix / 255.
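A minimal sketch of that scaling; the 16-bit variant (dividing by the bit-depth maximum 65535) is my assumption for the 16-bit case asked about above, not something the answer confirms:

#include <cstdint>

// 8-bit input, as described above: v = pix / 255, giving a value in [0, 1].
inline float normalize8(uint8_t pix)   { return pix / 255.0f; }

// Assumed 16-bit analogue: divide by the bit-depth maximum rather than the per-image maximum.
inline float normalize16(uint16_t pix) { return pix / 65535.0f; }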

How to iterate over pixels in a YUV NV12 buffer from camera and set color in Obj-c?

I need to iterate over the pixels of a YUV NV12 buffer and set color. I think the conversion for NV12 format should be easy but I can't figure it out. If I could set the top 50x50 pixels at 0,0 to white, I'd be set. Thank you in advance.
Have you tried overwriting the whole buffer with all 0x00 or all 0xFF? NV12 uses 12 bits (1.5 bytes) per pixel, so that is 1.5 * number-of-pixels bytes in total.
Since you really don't seem to care about the conversion, simply overwriting the buffer would suffice. If that works, you can tackle the other problems, like finding the right color and producing a rect instead of a line.
For the first, you need to understand the YUV coding: https://wiki.videolan.org/YUV#NV12. According to this document you will most likely need to overwrite bytes in the Y range and in the UV range, i.e. write at two different locations. That's very different from an RGB buffer, where all of a pixel's color components sit right next to each other. So you can start by overwriting the first byte (8 bits) in the Y plane and the first two bytes (one interleaved U/V pair) in the UV plane. That should set one pixel to a different color than before.
Finally you can tackle the display of the 50x50 rectangle. You'll need to know the image dimensions, because you'll need to offset after each row (if the buffer is transmitted by rows!). E.g., this graph:
.------.
|xx |
|xx |
| |
'------'
In an RGB color space, with row-major transmitted values, the buffer would look like this: xx0000xx0000000000. So you would need to overwrite bytes 0-5 and bytes 18-23: the first range is 2 pixels * 3 bytes (RGB), and the next range starts at row number (1) * image width (6) * 3 bytes (RGB), and so on. You have to apply the same thinking to the YUV color space.
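Putting this together, here is a hedged C-style sketch that paints a white 50x50 rectangle at (0,0) into an NV12 buffer. The plane pointers and strides are assumed to be available from the capture API; the function name is illustrative.

#include <cstddef>
#include <cstdint>
#include <cstring>

// NV12: a full-resolution Y plane followed by an interleaved, half-resolution UV plane.
// White is full luma (0xFF) with neutral chroma (U = V = 0x80).
void paintWhiteRect(uint8_t* yPlane, std::size_t yStride,
                    uint8_t* uvPlane, std::size_t uvStride,
                    int rectWidth, int rectHeight)
{
    for (int row = 0; row < rectHeight; ++row)
        std::memset(yPlane + row * yStride, 0xFF, rectWidth);     // 1 luma byte per pixel

    for (int row = 0; row < rectHeight / 2; ++row)
        std::memset(uvPlane + row * uvStride, 0x80, rectWidth);   // rectWidth bytes = rectWidth/2 U,V pairs
}

On iOS, the pointers and strides would typically come from CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane after locking the buffer with CVPixelBufferLockBaseAddress.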

How do you handle negative pixel values after filtering?

I have an 8-bit image and I want to filter it with a matrix for edge detection. My kernel matrix is
0 1 0
1 -4 1
0 1 0
For some indices it gives me a negative value. What am I supposed to do with them?
Your kernel is a Laplace filter. Applying it to an image yields a finite difference approximation to the Laplacian operator. The Laplace operator is not an edge detector by itself.
But you can use it as a building block for an edge detector: you need to detect the zero crossings to find edges (this is the Marr-Hildreth edge detector). To find zero crossings, you need to have negative values.
You can also use the Laplace filtered image to sharpen your image. If you subtract it from the original image, the result will be an image with sharper edges and a much crisper feel. For this, negative values are important too.
For both these applications, clamping the result of the operation, as suggested in the other answer, is wrong. That clamping sets all negative values to 0. This means there are no more zero crossings to find, so you can't find edges, and for the sharpening it means that one side of each edge will not be sharpened.
So, the best thing to do with the result of the Laplace filter is preserve the values as they are. Use a signed 16-bit integer type to store your results (I actually prefer using floating-point types, it simplifies a lot of things).
On the other hand, if you want to display the result of the Laplace filter on a screen, you will have to do something sensible with the pixel values. A common choice is to add 128 to each pixel. This shifts the zero to a mid-grey value, shows negative values as darker, and positive values as lighter. After adding 128, values above 255 and below 0 can be clipped. You can also scale the values if you want to avoid clipping, for example laplace / 2 + 128.
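A hedged sketch of what that looks like in code, assuming a single-channel 8-bit image stored row-major (the names are illustrative): the signed Laplace result is kept in 16-bit integers, and a separate display image applies the laplace / 2 + 128 mapping.

#include <algorithm>
#include <cstdint>
#include <vector>

// Apply the 3x3 Laplace kernel, keep the signed result, and build an 8-bit
// version for display with the zero level shifted to mid-grey (128).
void laplace(const std::vector<uint8_t>& src, int w, int h,
             std::vector<int16_t>& lap, std::vector<uint8_t>& display)
{
    lap.assign(w * h, 0);
    display.assign(w * h, 128);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x)
        {
            const int v = src[(y - 1) * w + x] + src[(y + 1) * w + x]
                        + src[y * w + x - 1] + src[y * w + x + 1]
                        - 4 * src[y * w + x];
            lap[y * w + x] = static_cast<int16_t>(v);                                   // preserved, signed
            display[y * w + x] = static_cast<uint8_t>(std::clamp(v / 2 + 128, 0, 255)); // display only
        }
}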
Out of range values are extremely common in JPEG. One handles them by clamping.
If X < 0 then X := 0 ;
If X > 255 then X := 255 ;

How to choose the number of bins when creating HSV histogram?

I was reading some documentation about HSV histogram, and in several refs the Saturation channel was quantized into 256 values. Why is that? Is there any reason behind choosing this number?
I have the same questions for the Hue channel, often it is quantized into 180 values.
Disclaimer: Off-hand answers (i.e., not backed up by any documentation):
"256" is a popular number for a bin size because Programmers Like Round Numbers -- it fits in a single byte. And "180" because the HSB circle is "360 [degrees]", but "360" does not fit into a single byte.
For many image formats, the range of RGB values is limited to 0..255 per channel -- 3 bytes in total. To store the same amount of data (ignoring any artifacts of converting to another color model), Saturation and Brightness are often expressed in single bytes as well. The same could be done for Hue, by scaling the original range of 0..359 (as Hue is usually expressed as a value in degrees on the HSB Color Wheel) into the byte range 0..255. However, probably because it's easier to do calculations with a number close to the original 360° full circle, the range is instead halved to 0..179. That way the value still fits into a single byte (and thus "HSB" uses as much memory as "RGB") and can be converted trivially back to (close to) its original value -- multiply by 2. Obviously, sticking to the storage space wins over fidelity.
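A tiny sketch of that packing (the function names are mine):

#include <cstdint>

// Hue in degrees (0..359) stored in a byte as 0..179 by halving...
uint8_t packHue(int hueDegrees)   { return static_cast<uint8_t>(hueDegrees / 2); }
// ...and recovered to (close to) the original value by doubling.
int     unpackHue(uint8_t stored) { return stored * 2; }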
Given 256 values for both S and B, and 180 for H, you end up with a color space of 256*256*180 = 11,796,480 colors. To inspect how those colors are distributed, you build a histogram: an array where you can read out the total number of pixels with a certain color or within a certain color range. Using a color range here, instead of the actual values, significantly cuts down the memory requirements.
For an RGB color image with the colors fairly evenly distributed, you could shift each channel down by a certain number of bits. This is how a straightforward conversion from 24-bit "true-color" RGB down to the 15-bit RGB "high-color" space works: each channel gets divided by 8, reducing 256 values down to 32 (5 bits per channel). Conversion to a 16-bit high-color RGB space works the same; the bit that was left over in the 15-bit conversion is assigned to green. Thus the range of values for green is doubled, which is useful since the human eye is more sensitive to shades of green than to the other two primaries.
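A sketch of that bit-shift conversion for the 16-bit (5-6-5) case:

#include <cstdint>

// Pack 24-bit RGB (8 bits per channel) into 16-bit "high color": red and blue
// keep 5 bits each, green keeps 6 (the eye is most sensitive to green).
uint16_t rgb888to565(uint8_t r, uint8_t g, uint8_t b)
{
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}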
It gets more complicated when the colors in the input image are not evenly distributed. A naive solution is to create an array of [256][256][256], initialize all to zero, then fill the array with the colors of the image, and finally sort them. There are better alternatives -- let me consult my old Computer Graphics [1] here. Hold on.
13.4 Reproducing Color mentions the names of two different approaches from Heckbert (Color Image Quantization for Frame Buffer Display, SIGGRAPH 82): the popularity and the median-cut algorithms. (Unfortunately, that's all they say about this topic. I assume efficient code for both can be googled for.)
A rough guess:
The size of each bin (H, S, B) should reflect what you are trying to use it for. This older SO question, for example, uses a large number of bins for hue -- color is considered the most important -- and only 3 different values for both saturation and brightness. Thus, bright images with some subdued areas (say, a comic book) will give a good spread in this histogram, but a real-color photograph will not so much.
The main limit is that the bin sizes, multiplied with each other, should use a reasonably small amount of memory, yet cover enough of each component to get evenly filled. Perhaps some trial-and-error comes into play here. You could initially distribute the H, S, and B components evenly over the available memory in your histogram and process a small part of the image, say 1 out of 4 pixels, horizontally and vertically. If you notice one of the component bins fills up too fast while others stay untouched, adjust the ranges and restart.
If you need to do an analysis of multiple pictures, make sure they are all alike in their color gamut. You cannot expect one bin layout to work on all sorts of images; you would end up with an even distribution, where all matches are only so-so.
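As a concrete, hedged illustration of such a layout -- fine hue bins, coarse saturation and brightness bins, in the spirit of the question referenced above -- the bin counts below are purely illustrative:

#include <cstdint>
#include <vector>

// 3-D HSV histogram: 30 hue bins x 3 saturation bins x 3 value bins = 270 bins.
// Assumes OpenCV-style ranges: H in 0..179, S and V in 0..255.
struct HsvHistogram
{
    static constexpr int kHueBins = 30, kSatBins = 3, kValBins = 3;
    std::vector<unsigned> counts = std::vector<unsigned>(kHueBins * kSatBins * kValBins, 0);

    void add(uint8_t h, uint8_t s, uint8_t v)
    {
        const int hb = h * kHueBins / 180;
        const int sb = s * kSatBins / 256;
        const int vb = v * kValBins / 256;
        counts[(hb * kSatBins + sb) * kValBins + vb]++;
    }
};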
[1] Computer Graphics: Principles and Practice (1997), J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, 2nd ed., Reading, MA: Addison-Wesley.

How does Huffman encoding construct the (JPEG) image from DCT coefficients?

I have a 512x512 image and I tried to recompress it. Here are the steps for recompressing an image to a JPEG file:
1) convert RGB to YCrCb
2) perform downsampling on Cr and Cb
3) convert YCrCb to DCT and quantize according to the chosen quality
4) perform Huffman encoding on the quantized DCT
But before Huffman encoding I counted the number of DCT coefficients and it is 393216. Dividing it by 64 gives the number of 8x8 DCT blocks, which is 6144.
Now I tried to count the number of 8x8 blocks in the pixel domain. 512/8 = 64, which gives me 64 blocks horizontally and 64 blocks vertically. 64 x 64 = 4096, which is not equal to the number of DCT blocks, while the number of pixels is 512x512 = 262144.
My question is: how does Huffman encoding magically transform 393216 coefficients into 262144 pixels, recover each pixel value, and work out the dimensions (512x512) of the compressed (JPEG) image?
Thank you in advance. :D
If your image was encoded with no color subsampling, then there would be a 1:1 ratio of 8x8 coefficient blocks to 8x8 color component blocks. Each MCU (minimum coded unit) would be 8x8 pixels and have 3 8x8 coefficient blocks. 512x512 pixels = 64x64 8x8 blocks x 3 (one each for Y, Cr and Cb) = 12288 coefficient blocks.
Since you said you subsampled the color (I assume in both directions), each MCU now needs 6 8x8 coefficient blocks. The MCU size in this case is 16x16 pixels, and each 16x16 block of pixels needs 6 8x8 coefficient blocks to define it (4 Y, 1 Cr, 1 Cb). If you divide the image into 16x16 MCUs, you get 32x32 MCUs, each with 6 8x8 blocks per MCU = 6144 coefficient blocks.
So, to answer your question, the Huffman encoding is not what changes the number of coefficients; it's the color subsampling. Part of the compression gained by using color subsampling in JPEG images comes from exploiting a feature of the human visual system: our eyes are more sensitive to changes in luminance than to changes in chrominance.
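A quick sketch of that arithmetic, assuming 4:2:0 subsampling (chroma halved in both directions):

#include <cstdio>

int main()
{
    const int width = 512, height = 512;

    // 4:2:0: each 16x16 MCU holds 4 Y blocks + 1 Cb + 1 Cr = 6 coefficient blocks.
    const int mcusX = width / 16, mcusY = height / 16;
    const int coeffBlocks = mcusX * mcusY * 6;
    std::printf("MCUs: %dx%d, 8x8 coefficient blocks: %d, coefficients: %d\n",
                mcusX, mcusY, coeffBlocks, coeffBlocks * 64);
    // Prints: MCUs: 32x32, 8x8 coefficient blocks: 6144, coefficients: 393216
}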
Huffman encoding doesn't transform coefficients into pixels or anything like that, at least not the Huffman encoding that I'm thinking of. All Huffman encoding does is take a list of tokens and represent them with fewer bits, based on the frequency of those tokens.
An example: you have the tokens a, b, c, and d.
Uncompressed, each of your tokens would require 2 bits (00, 01, 10, and 11).
Let's say a=00, b=01, c=10, and d=11.
Then aabaccda would be represented as 0000010010101100 (16 bits).
But with Huffman encoding you'd represent a with fewer bits because it's more common, and b and d with more bits because they're less common, something to the extent of:
a=0, b=110, c=10, d=111, and then
aabaccda would be represented as 00110010101110 (14 bits).
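For what it's worth, a tiny sketch that reproduces the bit counts of that example with the given code table:

#include <cstdio>
#include <map>
#include <string>

int main()
{
    const std::map<char, std::string> huffman = {
        {'a', "0"}, {'b', "110"}, {'c', "10"}, {'d', "111"}};
    const std::string tokens = "aabaccda";

    std::string encoded;
    for (char t : tokens) encoded += huffman.at(t);

    std::printf("fixed:   %zu bits\n", tokens.size() * 2);                     // 2 bits per token
    std::printf("huffman: %zu bits (%s)\n", encoded.size(), encoded.c_str());  // 14 bits
}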
Your image is 512x512 pixels.
The Y component is 512x512, hence 262144 pixels turned into 262144 DCT coefficients.
The Cb and Cr components are downsampled by 2, hence 256x256 pixels turned into 65536 DCT coefficients each.
The sum of all DCT coefficients is 262144 + 65536 + 65536 = 393216.
Huffman has nothing to do with this.
