How does a predictive coding aid in lossless compression? - image-processing

I'm working on this lab where we need to apply a lossless predictive coding to an image before compressing it (with Huffman, or some other lossless compression algorithm).
From the example seen below, it's pretty clear that by pre-processing the image with predictive coding, we've modified its histogram and concentrated all of its grey levels around 0. But why exactly does this aid compression?
Is there maybe a formula to determine the compression rate of Huffman, knowing the standard deviation and entropy of the original image? Otherwise, why would the compression ratio be any different; it's not like the range of values has changed between the original image and pre-processed image.
Thank you in advance,
Liam.
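A minimal sketch of why the histogram change matters, assuming a simple "previous pixel" predictor and a synthetic ramp as the image row (both are illustrative choices, not taken from the lab). The average Huffman code length L is bounded by the entropy H of the symbol distribution, H <= L < H + 1 bits per symbol, so what counts is how peaked the histogram is, not the range of values. The helper names `entropy_bits`, `predict_previous` and `reconstruct` are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (bits per symbol) of the empirical distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def predict_previous(row):
    """Residual of a 'previous pixel' predictor; the first pixel is kept as-is."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

def reconstruct(residual):
    """Invert predict_previous by cumulative summation, so the step is lossless."""
    out = [residual[0]]
    for r in residual[1:]:
        out.append(out[-1] + r)
    return out

# Synthetic smooth row: a ramp that uses each of the 256 grey levels once.
row = list(range(256))
residual = predict_previous(row)

h_row = entropy_bits(row)        # flat histogram: 8 bits per pixel
h_res = entropy_bits(residual)   # almost every residual is 1: well under 1 bit

print(f"entropy of original row: {h_row:.3f} bits/pixel")
print(f"entropy of residual:     {h_res:.3f} bits/pixel")
print("lossless:", reconstruct(residual) == row)
```

The ramp has a perfectly flat histogram, so no entropy coder can do better than 8 bits per pixel on it, while its residual is almost constant and codes very cheaply. Real images are less extreme, but the same effect is what concentrates the histogram around 0.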

Related

Why doesn't image compression use overlapped data?

For example, audio codecs like Opus use the MDCT with 50% overlap to avoid ringing artifacts. Why is a similar approach not used in image codecs? JPEG, for instance, uses non-overlapping 8x8 blocks.
Later lossy image codecs like JPEG2000 do use overlapped transforms, but these techniques just weren't around when JPEG was being defined. The wavelet transform that JPEG2000 is based on hadn't been invented yet, and time-domain aliasing cancellation techniques like the MDCT were extremely new.
As for the MDCT in particular, as far as I know it is not used for image compression at all, even today. I would guess that's because its basis vectors are asymmetric, which makes it an intuitively difficult choice for imaging applications.

JPG compression and noise

I am studying JPEG compression, and it seems to work by reducing the high-frequency components of an image. Since noise is usually high frequency, does this imply that JPEG compression to some extent reduces noise in images?
JPEG compression can reduce noise by smoothing out the high-frequency components of the image, but it also introduces visual noise in the form of compression artifacts. Here is a zoomed-in (3x) view of part of my avatar (a high-quality JPEG) and part of your avatar (a PNG drawing), on the left as downloaded and on the right as compressed with ImageMagick using -quality 60. To my eye they both look "noisier" when JPEG-compressed.
Strictly speaking, no.
JPEG does remove high frequencies (see below), but not selectively enough to be a denoising algorithm. In other words, it will remove high frequencies if they are noise, but also if they are useful detail information.
To understand this, it helps to know the basics of how JPEG works. First, the image is divided into 8x8 blocks. Then the discrete cosine transform (DCT) is applied to each block. As a result, each element of the 8x8 block contains the "weight" of a different frequency. Then the elements are quantized in a fixed way depending on the quality level selected a priori. This quantization means gaining coding performance at the cost of losing precision. The amount of precision lost is fixed a priori, and (as I said above) it does not differentiate between noise and useful detail.
You can test this yourself by saving the same image with different qualities (which technically control the amount of quantization applied to each block) and see that not only noise is removed. There is a nice video showing this effect for different quality levels here: https://upload.wikimedia.org/wikipedia/commons/f/f3/Continuously_varied_JPEG_compression_for_an_abdominal_CT_scan_-_1471-2342-12-24-S1.ogv.
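A minimal sketch of the DCT-plus-quantization step described above, in plain Python. It assumes a single uniform quantization step for all coefficients, whereas real JPEG uses a per-frequency quantization table scaled by the quality setting; the 8x8 block and the step sizes are made up for illustration, and `dct_2d`, `quantize` and `count_nonzero` are hypothetical helper names.

```python
import math

N = 8  # JPEG block size

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimised)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization: divide by the step and round to an integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def count_nonzero(q):
    return sum(1 for row in q for c in row if c != 0)

# Mid-grey block with a faint +/-1 checkerboard on top. The pattern could be
# sensor noise or genuine fine texture; the quantizer cannot tell which.
block = [[128 + (1 if (x + y) % 2 == 0 else -1) for y in range(N)]
         for x in range(N)]

coeffs = dct_2d(block)
fine = quantize(coeffs, step=1)     # small step: high-frequency terms survive
coarse = quantize(coeffs, step=16)  # large step: only the DC term survives

print("non-zero coefficients, step 1: ", count_nonzero(fine))
print("non-zero coefficients, step 16:", count_nonzero(coarse))
```

With the coarse step the checkerboard disappears entirely, whether it was noise or detail, which is the point made above: the loss is decided by the step size, not by what the high frequencies represent.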

Need advice on training Tesseract OCR (text with conversion/compression artifacts)

I need to do OCR on images that have gone through a digital to analog (interlaced video) to digital conversion, then jpeg compressed (resulting in compression artifacts). I have not been able to locate the exact fonts used, but we'll be looking at a mix of sans serif - e.g., Arial, Calibri, and Tiresias might work well as a training set. There is no way to get around the jpeg compression. These are text-only, white-on-black images at standard def resolution (720x480 deinterlaced).
An example is located here, resized at 1000%:
I've found a preprocessing pipeline that works fairly well for Tesseract:
Resize to 400-600%
Blur
Threshold (binarization)
Erode (get thinner stroke width)
One problem is that letters like 't' and 'f' end up with a diamond shape at the cross. This process works well, but it isn't quite perfect, so I'd like to train Tesseract. My question:
How should I create the training set?
Should I try to emulate the digital-to-analog-to-digital conversion by adding a small amount of noise, then compress with JPEG? Should I do preprocessing on my training set, similar to what I listed above? If I train with noisy JPEG-compressed images to match my captured images, is it best to skip preprocessing on the captured images?
Additionally, any hints on getting rid of the conversion/compression artifacts without sacrificing the text would be appreciated.

Image blending modes for HDR images

The blending modes Screen, Color Dodge, Soft Light, etc., as in Photoshop, each have their own math that works for the range 0-1. How do these blend modes work for HDR images?
Thanks
I am not familiar with Photoshop and its filters, but here is a general explanation of the math behind HDR filters.
Suppose you have 3 images (low light, medium and over exposed). You want to average those images but (I1+I2+I3)/3 is a stupid way. You want to give a higher weight to the image that captures more information in a given area.
So basically you average the images with a weight factor and there are different types of algorithms to calculate the weights. Here are few:
The simplest one uses the standard deviation (std). For each pixel in each image, calculate the standard deviation of its 3x3 neighbourhood (9 pixels) and use that std as the weight, normalized by the sum of the weights so that the result stays a weighted average:
HDR pixel(i,j) = (I1(i,j)*stdI1(i,j) + I2(i,j)*stdI2(i,j) + I3(i,j)*stdI3(i,j)) / (stdI1(i,j) + stdI2(i,j) + stdI3(i,j)).
Why is std used? Because a high std means a high variation in pixel intensity, which means more information was captured by that image in that area.
Instead of std you can use an entropy filter, edge detection, or any other measure that represents how much information is encoded around the given pixel.
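A minimal sketch of the std-weighted average in plain Python, with images as lists of rows. The normalization by the sum of the weights and the small `eps` (so that flat regions fall back to a plain average instead of dividing by zero) are my assumptions, as are the tiny 3x3 test images; `local_std` and `fuse` are hypothetical names.

```python
import statistics

def local_std(img, i, j):
    """Population std of the 3x3 neighbourhood of (i, j), clipped at the borders."""
    h, w = len(img), len(img[0])
    vals = [img[y][x]
            for y in range(max(0, i - 1), min(h, i + 2))
            for x in range(max(0, j - 1), min(w, j + 2))]
    return statistics.pstdev(vals)

def fuse(images, eps=1e-6):
    """Per-pixel weighted average of the exposures, weighted by local std."""
    h, w = len(images[0]), len(images[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # eps keeps the weights positive where every exposure is flat.
            weights = [local_std(img, i, j) + eps for img in images]
            total = sum(weights)
            out[i][j] = sum(wt * img[i][j]
                            for wt, img in zip(weights, images)) / total
    return out

# Three toy exposures: the dark and bright ones are clipped flat,
# only the medium one still shows local contrast.
under = [[10] * 3 for _ in range(3)]
well = [[100, 140, 100],
        [140, 100, 140],
        [100, 140, 100]]
over = [[250] * 3 for _ in range(3)]

fused = fuse([under, well, over])
print([[round(v, 2) for v in row] for row in fused])
```

Because the two clipped exposures have zero local variation, the fused result follows the medium exposure almost exactly, which is the behaviour the weighting is meant to produce.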
There are also slower but better ways to do HDR. Usually it is done with some kind of frequency-domain transform, such as a wavelet or Fourier transform. For example, each image is converted to Fourier space (the coefficients of the frequencies), and then for each frequency the maximal coefficient of the 3 images is taken.
You can even combine the std filter with such transforms. For example, break the image into different frequency bands, smooth the lower frequencies and take the plain average (I1+I2+I3)/3, but for the high frequencies use less smoothing and the std-weighted average. Smoothing the lower frequencies more heavily is called 'blending'. It is heavily used when stitching 2 images of different light exposure into a panorama.
Look at this image: http://magazine.magix.com/en/wp-content/uploads/2012/05/Panorama-3.jpg
You can clearly see that the sky gets different color on each image but since sky is a very low frequency (almost no information and no small object) it is heavily smoothed and averaged, thus allowing a gentle stitching.
Hope that answers your question

Image Processing - Do PSNR and SSIM metrics show smoothing (noise reduction) quality?

For my Image Processing class project, I am filtering an image with various filter algorithms (bilateral filter, NL-Means, etc.) and trying to compare the results as the parameters change. I came across the PSNR and SSIM metrics for measuring filter quality but could not fully understand what the values mean. Can anybody help me with the following:
Does a higher PSNR value mean higher quality smoothing (getting rid of noise)?
Should SSIM value be close to 1 in order to have high quality smoothing?
Are there any other metrics or methods to measure smoothing quality?
I am really confused. Any help will be highly appreciated. Thank you.
PSNR is computed from the mean squared reconstruction error between the denoised image and an ideal (noise-free) reference image. A higher PSNR means a smaller error, i.e. more noise removed. However, as a least-squares measure, it is slightly biased towards over-smoothed (= blurry) results, i.e. an algorithm that removes not only the noise but also part of the textures can still get a good score.
SSIM was developed as a reconstruction quality metric that also takes into account the similarity of the edges (high-frequency content) between the denoised image and the ideal one. To get a good SSIM score, an algorithm needs to remove the noise while also preserving the edges of the objects.
Hence, SSIM looks like a "better quality measure", but it is more complicated to compute (and the exact formula involves one number per pixel, while PSNR gives you an average value for the whole image).
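A minimal sketch of the PSNR computation, using the usual definition PSNR = 10 * log10(peak^2 / MSE) with peak = 255 for 8-bit images. The 2x2 images are made-up values, and `mse` and `psnr` are hypothetical helper names rather than any particular library's API.

```python
import math

def mse(ref, test):
    """Mean squared error between two equally sized images (lists of rows)."""
    total, n = 0.0, 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            n += 1
    return total / n

def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)

ideal = [[100, 100], [100, 100]]     # noise-free reference
noisy = [[110, 90], [90, 110]]       # errors of +/-10, MSE = 100
denoised = [[102, 98], [98, 102]]    # errors of +/-2,  MSE = 4

print(f"PSNR noisy:    {psnr(ideal, noisy):.2f} dB")
print(f"PSNR denoised: {psnr(ideal, denoised):.2f} dB")
```

Note that the score depends only on the squared error against the reference, which is why a blurry result with small pixel-wise error can still score well.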
Expanding on #sansuiso's answer
There are a lot of other image quality measures you can use to evaluate the denoising capability of various filters (in your case NL-Means, bilateral filter, etc.).
Here is a chart that shows the various parameters that could be used
Yes, and the higher the PSNR, the better the denoising capability
Here is a paper where you can find the details regarding these parameters and the MATLAB codes could be found here
PSNR is the standard measure for evaluating reconstructed image quality, and is an important feature
A large value of NAE (normalized absolute error) means that the image is of poor quality
A large value of SC (structural content) means that the image is of poor quality.
Regarding this article:
http://icpr2010.org/pdfs/icpr2010_WeAT8.44.pdf
I found out that PSNR can be obtained from SSIM and vice versa, and that PSNR is more sensitive to noise than SSIM. On the other hand, the two metrics are almost equally sensitive to the other parameters, such as Gaussian blur, when it comes to discriminating quality.

Resources