I am trying to understand the JPEG compression algorithm. If I have a 3-channel color image, do I have to take a separate Discrete Cosine Transform (DCT) and quantize each channel? And after taking the inverse DCT, will the result be a JPEG image?
If I have a 3-channel color image, do I have to take a separate DCT and quantize each channel?
Yes, except that the color values are normally converted from RGB to YCbCr first.
Then you have to do run-length compression and Huffman coding on the resulting values. The DCT alone does not compress anything; on its own it actually expands the data slightly, since the coefficients need more precision than the original samples.
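For a concrete picture of that per-channel core, here is a minimal sketch in Python, assuming SciPy is available. Q is the standard JPEG luminance quantization table; in real JPEG you would run this on each Y/Cb/Cr channel, with a separate table normally used for chroma:

import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization table (ITU T.81, Annex K).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def jpeg_block_roundtrip(block):
    """DCT -> quantize -> dequantize -> inverse DCT for one 8x8 block."""
    coeffs = dctn(block - 128.0, norm='ortho')   # level shift, then 2-D DCT
    quantized = np.round(coeffs / Q)             # the lossy step
    return idctn(quantized * Q, norm='ortho') + 128.0

Note that the inverse DCT only gives you back (approximate) pixel values; a JPEG file is what you get after entropy-coding the quantized coefficients and wrapping them in the file container.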
I have a 512x512 grayscale image (or MultiArray) which is the output of a CoreML depth estimation model.
In Python, one can use Matplotlib or other packages to visualise grayscale images in different colormaps, like so:
[Example images showing the same data rendered with the Grayscale and Magma colormaps, from https://ai.googleblog.com/2019/08/turbo-improved-rainbow-colormap-for.html]
I was wondering if there was any way to take said output and present it as a cmap in Swift/iOS?
If you make the model output an image, you get a CVPixelBuffer object. This is easy enough to draw on the screen by converting it to a CIImage and then a CGImage.
If you want to draw it with a colormap, you'll have to replace each of the grayscale values with a color manually. One way to do this is to output an MLMultiArray and loop through each of the output values, and use a lookup table for the colors. A quicker way is to do this in a Metal compute shader.
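For illustration, the lookup-table idea looks like this in Python/NumPy; the Swift loop over the MLMultiArray would follow the same pattern. The lut here is a made-up ramp, not a real colormap like Magma:

import numpy as np

def apply_colormap(gray, lut):
    """Map 8-bit grayscale values to RGB through a 256-entry lookup table."""
    assert lut.shape == (256, 3)
    return lut[gray]                  # fancy indexing: one table lookup per pixel

# Hypothetical example: a trivial ramp LUT applied to a 512x512 depth map.
ramp = np.arange(256, dtype=np.uint8)
lut = np.stack([ramp, ramp // 2, ramp // 4], axis=1)
gray = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
rgb = apply_colormap(gray, lut)       # shape (512, 512, 3)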
I'm reading all the pixels' RGB values into
R=[],
G=[],
B=[]
arrays from the picture. They are arrays containing 8-bit [0-255] values. I need to use the Fourier transform to compress the image with a lossy method.
Fourier Transform:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2 \pi k n / N}$$

N will be the number of pixels and n is the index i into the array. What will be k and the imaginary j?
Can I implement this equation in a programming language and get a compressed image file?
Or do I need to apply the transform to different values instead of RGB?
First off, yes, you should convert from RGB to a luminance-chrominance space such as YCbCr. The human eye has higher resolution in luminance (Y) than in the color channels, so you can decimate the colors much more than the luminance for the same level of loss. It is common to begin by reducing the resolution of the Cb and Cr channels by a factor of two in both directions, shrinking the color channels by a factor of four. (Look up Chroma Subsampling.)
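A minimal sketch of that conversion and the 2x2 chroma subsampling, assuming the full-range BT.601 coefficients that JPEG/JFIF uses:

import numpy as np

def rgb_to_ycbcr(rgb):
    """rgb: float array in [0, 255], shape (H, W, 3). Returns Y, Cb, Cr."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_2x2(channel):
    """Average each 2x2 block, shrinking the channel by 4x in area."""
    h, w = channel.shape
    return channel[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))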
Second, you should use a discrete cosine transform (DCT), which is effectively the real part of the discrete Fourier transform of the samples shifted over one-half step. What is done in JPEG is to break the image up into 8x8 blocks for each channel and do a DCT on every column and row of each block. The DC component then ends up in the upper left corner, and the AC components increase in frequency as you go down and to the right. You can use whatever block size you like, though the overall computation time of the DCT goes up with the size, and the artifacts from the lossy step have a broader reach.
Now you can make it lossy by quantizing the resulting coefficients, more so in the higher frequencies. The result will generally have lots of small and zero coefficients, which is then highly compressible with run-length and Huffman coding.
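To make the "lots of zeros" point concrete, here is a sketch of the zigzag scan and the zero run-length step on one 8x8 block of quantized coefficients; the Huffman coding of the resulting (run, value) pairs is omitted:

def zigzag_indices(n=8):
    """(row, col) pairs in JPEG zigzag order for an n x n block."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_length(block):
    """Encode the AC coefficients as (zero_run, value) pairs."""
    flat = [block[r][c] for r, c in zigzag_indices()]
    pairs, run = [], 0
    for v in flat[1:]:                # flat[0] is the DC coefficient
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))              # end-of-block marker
    return pairs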
I want to segment an image, but someone told me that Euclidean distance on RGB is not as good as using HSV. But for HSV, since H, S, and V do not all have the same range, I need to normalize them. Is it a good idea to normalize HSV and then do clustering? If so, how should I normalize on the HSV scale?
Thanks
The HSV components signify the hue, saturation, and gray intensity of a pixel, and they are not correlated with each other in terms of color; each component has its own role in defining the properties of the pixel. Hue gives you information about color (wavelength, in other terms), saturation shows how much white is mixed into that color, and value is simply the magnitude of that color (its intensity, in other terms). That is why the components of HSV space do not follow the same scale: hue can even go negative on the scale (because hue values are cyclic), but intensity (V) can never be negative. So normalization will not help clustering much. The better idea is to apply clustering on hue alone if you want to do color clustering.
Now, the reason Euclidean distance is not good for multi-channel clustering is that its distribution around the mean is spherical (circular in 2D), so it cannot make any difference between (147,175,208) and (208,175,147): both have the same distance from the center. It is better to use the Mahalanobis distance, because it uses the covariance matrix of the components, which makes the distance distribution ellipsoidal around the mean rather than spherical.
So if you want to do color segmentation in RGB color space, use the Mahalanobis distance (but it is computationally expensive, so it will slow down the clustering process), and if you want to do clustering in HSV color space, use hue for the segmentation of colors and then use V for fine-tuning the segmentation output.
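As an illustration of that suggestion, a minimal NumPy sketch of the Mahalanobis distance over an (N, 3) array of pixels (pixels is hypothetical input data):

import numpy as np

def mahalanobis_distances(pixels):
    """Distance of each pixel from the mean, shaped by the covariance."""
    mean = pixels.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pixels, rowvar=False))
    diff = pixels - mean
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))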
Hope it will help. Thank You
Hue is cyclic.
Do not use the mean (and thus, k-means) on such data.
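To see why: hues of 10 and 350 degrees should average to about 0 degrees, not 180. If an average is needed at all, it has to be a circular mean over unit vectors; a minimal sketch:

import numpy as np

def circular_mean_deg(hues_deg):
    """Mean of cyclic hue values, computed on the unit circle."""
    rad = np.deg2rad(hues_deg)
    mean = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return np.rad2deg(mean) % 360

print(circular_mean_deg([10, 350]))   # 0.0, where a plain mean gives 180.0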
First, you need to know why HSV is preferred over RGB in image segmentation. HSV separates color information (chroma) from image intensity or brightness level (luma), which is very useful if you want to do image segmentation. For example, if you try an RGB approach on a photo with the sea as the background, there is a big chance the dominant RGB component in the sea is not blue (usually because of shadow or illumination). But if you are using HSV, the value is separated out, and you can construct a histogram or thresholding rules using only saturation and hue.
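A minimal sketch of such a thresholding rule, using scikit-image (consistent with the snippet later on this page); the hue band for "blue-ish" pixels is a made-up example range:

import numpy as np
from skimage.io import imread
from skimage.color import rgb2hsv

img = imread('path/to/my/image')      # hypothetical path
hsv = rgb2hsv(img)                    # H, S, V all scaled to [0, 1]
h, s = hsv[..., 0], hsv[..., 1]
# Keep saturated pixels whose hue falls in a blue-ish band,
# ignoring brightness (V) entirely.
mask = (h > 0.5) & (h < 0.7) & (s > 0.3)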
There is a really good paper that compares the RGB and HSV approaches, and I think it will be a good read for you: http://www.cse.msu.edu/~pramanik/research/papers/2002Papers/icip.hsv.pdf
I have a problem with normalization.
Let me explain what the problem is and how I attempted to solve it.
I take a three-channel color image, convert it to grayscale, and apply uniform or non-uniform quantization.
To this image I should apply normalization, but I have a problem: even though the image is grayscale, it still has three channels.
How can I apply normalization to a three-channel image?
Should the min and the max be taken across all three channels?
Could someone give me a hand?
The language I am using is Processing 2.
P.S.
Can you do the same thing with a color image instead of a grayscale image?
You can convert between the 1-channel and 3-channel representations easily. I'd recommend scikit-image (http://scikit-image.org/).
from skimage.io import imread
from skimage.color import rgb2gray, gray2rgb

rgb_img = imread('path/to/my/image')
gray_img = rgb2gray(rgb_img)           # single-channel float image in [0, 1]
# Now normalize the gray image by its maximum
gray_norm = gray_img / gray_img.max()
# Now convert back to a 3-channel representation
rgb_norm = gray2rgb(gray_norm)
I worked on a similar problem some time back. One of the good solutions was to:
Convert the image from RGB to HSI
Leaving the Hue and Saturation channels unchanged, simply normalize across the Intensity channel
Convert back to RGB
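A sketch of those three steps, using HSV via scikit-image as a stand-in for HSI (scikit-image has no HSI conversion; V plays the role of the intensity channel here):

import numpy as np
from skimage.io import imread
from skimage.color import rgb2hsv, hsv2rgb

img = imread('path/to/my/image')      # hypothetical path
hsv = rgb2hsv(img)
v = hsv[..., 2]
hsv[..., 2] = (v - v.min()) / (v.max() - v.min())   # normalize intensity only
rgb_norm = hsv2rgb(hsv)               # H and S come through untouched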
This logic can be applied across several other image processing tasks; for example, applying histogram equalization to RGB images.
I was writing code for histogram equalization on RGB images.
It was suggested not to perform the equalization operation on the R, G, and B channels separately.
So I first converted RGB to the YUV color space, performed equalization on the Y channel only, leaving the U and V channels as they were, and then converted the altered Y channel together with the original U and V channels back to the RGB color space.
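Roughly, what I did looks like this (sketched with scikit-image, using YCbCr as the YUV variant and a hypothetical image path):

import numpy as np
from skimage.io import imread
from skimage.color import rgb2ycbcr, ycbcr2rgb
from skimage.exposure import equalize_hist

img = imread('path/to/my/image')       # hypothetical path
ycbcr = rgb2ycbcr(img)                 # Y in [16, 235], Cb/Cr in [16, 240]
# Equalize only the luma channel; equalize_hist returns values in [0, 1],
# so rescale back to the nominal Y range.
ycbcr[..., 0] = equalize_hist(ycbcr[..., 0]) * (235 - 16) + 16
rgb_eq = ycbcr2rgb(ycbcr)              # float RGB in [0, 1]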
The resulting RGB output was not ideal, while the grayscale output generated from the Y channel alone was quite acceptable.
My question is: is it possible to get a fully colored, equalized RGB output? And how? Should I perform the equalization operation on the U and V channels as well?