I have a 512x512 image and I tried to recompress it. Here are the steps for recompressing an image to a JPEG file:
1) convert RGB to YCrCb
2) perform downsampling on Cr and Cb
3) transform each channel into DCT coefficients and quantize them according to the chosen quality
4) perform Huffman encoding on the quantized DCT coefficients
But before the Huffman encoding I counted the number of DCT coefficients, and it is 393216. Dividing it by 64 gives the number of 8x8 DCT blocks, which is 6144.
Now I tried to count the number of 8x8 blocks in the pixel domain. 512/8 = 64 gives me 64 blocks horizontally and 64 blocks vertically, and 64 x 64 = 4096, which is not equal to the number of DCT blocks, even though the number of pixels is 512x512 = 262144.
My question is: how does Huffman encoding magically transform 393216 coefficients into 262144 pixels, recover each pixel's value, and compute the dimensions (512x512) of the compressed JPEG image?
Thank you in advance. :D
If your image was encoded with no color subsampling, then there would be a 1:1 ratio of 8x8 coefficient blocks to 8x8 color component blocks. Each MCU (minimum coded unit) would be 8x8 pixels and have 3 8x8 coefficient blocks. 512x512 pixels = 64x64 8x8 blocks x 3 (one each for Y, Cr and Cb) = 12288 coefficient blocks.
Since you said you subsampled the color (I assume in both directions), you will now have 6 8x8 blocks for each MCU. With no subsampling of the colors, an MCU covers 8x8 pixels; with subsampling in both directions, the MCU size becomes 16x16 pixels. Each 16x16 block of pixels needs 6 8x8 coefficient blocks to define it (4 Y, 1 Cr, 1 Cb). If you divide the image into 16x16 MCUs, you will have 32x32 MCUs, each with 6 8x8 blocks per MCU = 6144 coefficient blocks. So, to answer your question, Huffman encoding is not what's changing the number of coefficients; it's the color subsampling. Part of the compression that comes from using color subsampling in JPEG images exploits a feature of the human visual system: our eyes are more sensitive to changes in luminance than chrominance.
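You can sanity-check the block counts with a few lines; here is a minimal sketch in Python (the 512x512 size and the 2x subsampling in both directions are taken from the question):

# Count 8x8 coefficient blocks for a 512x512 JPEG.
width, height = 512, 512

# 4:4:4 (no subsampling): every component is full resolution.
blocks_y = (width // 8) * (height // 8)             # 64 * 64 = 4096
no_subsampling = 3 * blocks_y                       # 12288 coefficient blocks

# 4:2:0 (subsampled 2x in both directions): Cb and Cr are 256x256.
blocks_c = (width // 2 // 8) * (height // 2 // 8)   # 32 * 32 = 1024
with_subsampling = blocks_y + 2 * blocks_c          # 4096 + 2048 = 6144

print(no_subsampling, with_subsampling)             # 12288 6144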
Huffman encoding doesn't transform coefficients to pixels or anything like that, at least not the Huffman encoding I'm thinking of. All Huffman encoding does is take a list of tokens and represent them with fewer bits based on the frequency of those tokens.
An example: you have tokens a, b, c, and d.
Uncompressed, each of your tokens would require 2 bits (00, 01, 10, and 11).
Let's say a=00, b=01, c=10, and d=11.
aabaccda would be represented as 0000010010101100 (16 bits)
But with Huffman encoding you'd represent a with fewer bits because it's more common, and you'd represent b and d with more bits because they're less common. Something to the extent of:
a=0, b=110, c=10, d=111 and then
aabaccda would be represented as 00110010101110 (14 bits)
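Here is a minimal sketch of that idea in Python (the heap-based tree construction is the standard approach; tie-breaking between equal frequencies may produce a different but equally short code):

import heapq
from collections import Counter

def huffman_codes(text):
    # Seed the heap with one single-symbol "tree" per distinct token.
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codes with 0/1.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("aabaccda")
print(codes)                                     # e.g. {'a': '0', 'c': '10', 'b': '110', 'd': '111'}
print(sum(len(codes[ch]) for ch in "aabaccda"))  # 14 bits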
Your image is 512x512 pixels.
The Y component is 512x512, hence 262144 pixels turned into 262144 DCT coefficients.
The Cb and Cr components are downsampled by 2, hence 256x256 pixels turned into 65536 DCT coefficients each.
The sum of all DCT coefficients is 262144+65536+65536 = 393216.
Huffman has nothing to do with this.
In Metal on iOS the default colorPixelFormat is bgra8Unorm. When I change the format to rgba16Float, all imagery brightens. Why?
An example (three screenshots):
- Artwork
- MTKView with format bgra8Unorm: texture-mapped quad, texture created with SRGB=false
- MTKView with format rgba16Float: texture-mapped quad, texture created with SRGB=false
Why is everything brighter with rgba16Float? My understanding is that SRGB=false implies that no gamma correction is done when importing artwork; the assumption is that the artwork has no gamma applied.
What is going on here?
If your artwork has a gamma (it does per the first image you uploaded), you have to convert it to a linear gamma if you want to use it in a linear space.
What is happening here is that you are displaying gamma encoded values of the image in a linear workspace, without using color management or a transform to convert those values.
BUT: Reading some of your comments, is the texture not an image but an .svg?? Did you convert your color values to linear space?
Here's the thing: RGB values are meaningless numbers unless you define how those RGB values relate to a given space.
#00FF00 in sRGB is a different color than #00FF00 in Adobe98 for instance. In your case you are going linear, but what primaries? Still using sRGB primaries? P3 Primaries? I'm not seeing a real hue shift, so I assume you are using sRGB primaries and a linear transfer curve for the second example.
THAT SAID, an RGB value of the top middle kid's green shirt is #8DB54F; normalized to 0-1, that's 0.553 0.710 0.310. These numbers by themselves don't know if they are gamma encoded or not.
THE RELATIONSHIP BETWEEN sRGB, Y, and Light:
For the purposes of this discussion, we will assume the SIMPLE sRGB gamma of 1/2.2 and not the piecewise version. Same for L*
In sRGB, when #8DB54F is displayed on an sRGB monitor with an sRGB gamma curve, the luminance (Y) is 39.
This can be found by
(0.553^2.2)*0.2126 + (0.710^2.2)*0.7152 + (0.310^2.2)*0.0722
or 0.057 + 0.33 + 0.0061 = 0.39 and 0.39 * 100 = 39 (Y)
But if color management is told the values are linear, then the gamma correction is discarded, and (more or less):
0.553*0.2126 + 0.710*0.7152 + 0.310*0.0722
or 0.1175 + 0.5078 + 0.0223 = 0.65 and 0.65 * 100 = 65 (Y)
(Assuming the same coefficients are used.)
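Both calculations can be reproduced in a few lines of Python (a sketch assuming, as in this answer, the simplified 2.2 gamma and those luminance coefficients):

r, g, b = 0.553, 0.710, 0.310        # normalized sRGB values of #8DB54F
coeffs = (0.2126, 0.7152, 0.0722)    # luminance weights used above

# Correct path: decode the (simplified) 2.2 gamma first, then weight.
y_decoded = sum(w * (v ** 2.2) for w, v in zip(coeffs, (r, g, b)))

# Wrong path: the values are treated as if they were already linear.
y_as_linear = sum(w * v for w, v in zip(coeffs, (r, g, b)))

print(y_decoded * 100, y_as_linear * 100)  # ~39-40 vs ~65, the two Y values above (up to rounding)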
Luminance (Y) is linear, like light. But human perception is not, and neither are sRGB values.
Y is the linear luminance from CIEXYZ. While it is spectrally weighted based on the eye's response to different wavelengths, it is NOT uniform in terms of lightness: on a scale of 0-100, 18.4 is perceived as the middle.
L* is the perceptual lightness from CIELAB (L* a* b*); its curve is (simplified):
L* = Y^0.42
On a scale of 0-100, L* 50 is the "perceived middle" value. So that green shirt at Y 39 is L* 69 when interpreted and displayed as sRGB, and the Y 65 is about L* 84 (those numbers are based on the math and match the values from the color picker on my MacBook).
sRGB is a gamma encoded signal, done to make the best use of the limited bit depth of 8 bits per channel. The effective gamma curve is similar to human perception, so more bits are used to define darker areas, as human perception is more sensitive to luminance changes in dark regions. As noted above, it is a simplified curve of:
sRGB_Video = Linear_Video^0.455 (And to be noted, the MONITOR adds an exponent of about 1.1)
So if 0% is black and 100% is white, then middle gray, the point most humans will say is in between 0% and 100% is:
Y 18.4% = L* 50% = sRGB 46.7%
That is, an sRGB hex value of #777777 will display a luminance of 18.4 Y, and is equivalent to a perceived lightness of 50 L*. Middle Grey.
BUT WAIT, THERE'S MORE
So what is happening is that you are telling MTKView you are sending it image data with linear values, but you are actually sending it sRGB values, which are lighter due to the applied gamma correction. Color management then takes what it thinks are linear values and transforms them to the needed values for the output display.
Color management needs to know what the values mean, what colorspace they relate to. When you set SRGB=false then you are telling it that you are sending it linear values, not gamma encoded values.
BUT you are clearly sending gamma encoded values into a linear space without transforming/decoding the values to linear. Linearization won't happen unless you explicitly do so.
SOLUTION
Linearize the image data OR set the flag SRGB=true
Please let me know if you have further questions. But also, you may wish to see the Poynton Gamma FAQ or the Color FAQ for clarification.
Also, for your grey: A linear value of 0.216 is equivalent to an sRGB (0-1) value of 0.500
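If you choose to linearize the data yourself, the transfer functions are short. A sketch in Python using the piecewise sRGB curve (the simple 2.2 gamma discussed above gives nearly identical numbers):

def srgb_to_linear(v):
    # v is a normalized (0-1) sRGB-encoded value.
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v):
    # v is a normalized (0-1) linear value.
    return v * 12.92 if v <= 0.0031308 else 1.055 * (v ** (1 / 2.4)) - 0.055

print(srgb_to_linear(0.5))    # ~0.214, close to the 0.216 quoted above
print(linear_to_srgb(0.216))  # ~0.502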
Can IDCT give negative values after applying it to block based DCT channels?
I divided the gray image into 4x4 blocks and took the 4x4 DCT of each block. Then, using those blocks, I created DCT channels. Each channel contained the spread of one frequency over the image.
Yes, it can. The DCT is a sum of cosines, which (apart from the DC term) have zero mean. So a block whose nonzero weights sit only on those zero-mean components may very well reconstruct to negative pixel values.
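A quick numerical check, as a sketch in Python with SciPy (one 4x4 block with a single AC coefficient set, matching the 4x4 blocks in the question):

import numpy as np
from scipy.fft import idctn

block = np.zeros((4, 4))
block[0, 1] = 1.0                   # one AC coefficient, DC = 0
pixels = idctn(block, norm="ortho") # inverse 2-D DCT
print(pixels.min())                 # negative: the AC basis oscillates around zero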
Starting from an RGB image (or from a video stream), I know how to obtain other image/video formats (for example YCrCb 4:2:2 or 4:2:0). I know the relation between RGB pixels and YCrCb, and I know how to subsample in order to obtain 4:2:2 or 4:2:0. The question is: why this notation? Where does this notation come from? What do the numbers mean?
These numbers give the ratio of luminance (Y) samples to chrominance (Cr, Cb) samples used in the representation. The number of bytes for chrominance is often reduced in order to reduce the size of the image.
4:4:4 means you transmit a Y, a Cr, and a Cb value for each pixel. 4:2:2 means that you transmit a Y value for each pixel, but you transmit Cr and Cb values only once for every two pixels horizontally. 4:2:0 means you transmit a Y value for each pixel but you downsample Cr and Cb by 2 in both directions (i.e. you send one Cr and one Cb for every 2x2 block of pixels).
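As a small illustration in Python/NumPy (real codecs filter or average the chroma rather than just keeping every other sample, as done here):

import numpy as np

h, w = 4, 4
y  = np.arange(h * w, dtype=np.uint8).reshape(h, w)  # one Y sample per pixel
cb = np.full((h, w), 128, dtype=np.uint8)            # full-resolution Cb plane

cb_420 = cb[::2, ::2]        # 4:2:0 - one Cb for every 2x2 block of pixels
print(y.size, cb_420.size)   # 16 luma samples, 4 chroma samples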
I'm getting all the pixels' RGB values from the picture into
R=[],
G=[],
B=[]
arrays. They are arrays containing 8-bit [0-255] values, and I need to use the Fourier transform to compress the image with a lossy method.
Fourier Transform:
X(k) = sum over n = 0 .. N-1 of x(n) * e^(-j*2*pi*k*n/N)
N will be the number of pixels, and n is the array index i. What will the k and the imaginary j be?
Can I implement this equation in a programming language and get a compressed image file?
Or do I need to apply the transform to something other than the RGB values?
First off, yes, you should convert from RGB to a luminance space, such as YCbCr. The human eye has higher resolution in luminance (Y) than in the color channels, so you can decimate the colors much more than the luminance for the same level of loss. It is common to begin by reducing the resolution of the Cb and Cr channels by a factor of two in both directions, reducing the size of the color channels by a factor of four. (Look up Chroma Subsampling.)
Second, you should use a discrete cosine transform (DCT), which is effectively the real part of the discrete Fourier transform of the samples shifted over one-half step. What is done in JPEG is to break the image up into 8x8 blocks for each channel and do a DCT on every column and row of each block. Then the DC component is in the upper left corner, and the AC components increase in frequency as you go down and to the right. You can use whatever block size you like, though the overall computation time of the DCT will go up with the size, and the artifacts from the lossy step will have a broader reach.
Now you can make it lossy by quantizing the resulting coefficients, more so in the higher frequencies. The result will generally have lots of small and zero coefficients, which is then highly compressible with run-length and Huffman coding.
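A minimal sketch of that lossy core in Python with SciPy (one 8x8 block; the smooth test block and the uniform quantizer step are made up here, a real JPEG uses standard quantization tables):

import numpy as np
from scipy.fft import dctn, idctn

# A smooth 8x8 test block (a gradient), level-shifted as JPEG does.
x = np.arange(8)
block = (x[:, None] * 8 + x[None, :] * 4).astype(float) - 30

coeffs = dctn(block, norm="ortho")   # 2-D DCT over rows and columns
q = 16.0                             # hypothetical uniform quantizer step
quantized = np.round(coeffs / q)     # the lossy step: most high-frequency terms round to 0

restored = idctn(quantized * q, norm="ortho")
print(int((quantized == 0).sum()), "of 64 quantized coefficients are zero")
print(float(np.abs(restored - block).max()))  # worst-case pixel error from quantization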
I have a 48-bit (16 bits per channel) image I've loaded with FreeImage. I'm trying to generate a histogram from this image without having to convert it to a 24-bit image.
This is how I understand histograms are calculated:
for (pixel in pixels)
{
    red_histo[pixel.red]++;
}
Where pixel.red can be between 0 and 255, so there is a range from 0 to 255 on my histogram. But with 16 bits per channel, the value could be between 0 and 65535, which is too large to display on a histogram.
Is there a standard way to calculate histograms with 48-bit (or higher) images?
You have to decide how many bins you need in the histogram. For example, the Matlab histogram function takes these forms:
imhist(I)
imhist(I, n)
imhist(X, map)
In the first case, the number of bins defaults to 256. So, if you have 16-bit input, the values will be scaled down to 8 bits and split into a 256-bin histogram.
In the second one, you can specify the number of bins n. Let's say you specify n=2 for your 16-bit data. Then this will essentially split the histogram into the two bins [0, 2^15-1] and [2^15, 2^16-1].
The third case is for indexed images: you pass the indexed image X together with its colormap map.
http://www.mathworks.com/help/images/ref/imhist.html
How you want to choose the number of bins depends on your requirement.
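Outside Matlab, the same binning is easy to do directly. A sketch with NumPy (256 bins over the full 16-bit range; the sample data here is made up):

import numpy as np

rng = np.random.default_rng(0)
red = rng.integers(0, 65536, 10000)  # stand-in for a 16-bit red channel

hist, edges = np.histogram(red, bins=256, range=(0, 65536))
print(hist.sum(), len(hist))         # 10000 samples counted into 256 bins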
This Stack Overflow question may have the answer you are looking for.
I do not know if there is a "standard" way.
If this is for display purposes you can scale the pixels back down to keep the range from 0-255, for instance:
double scalingFactor = 255.0 / 65535.0; // floating-point literals: 255/65535 is integer division and yields 0
for (pixel in pixels)
{
    red_histo[(int)(scalingFactor * pixel.red)]++;
}
This will bring the upper end of the 16-bit range in at 255 and the lower end in at 0.