I am studying JPEG compression, and it seems to work by reducing the high-frequency components in images. Since noise is usually high frequency, does this imply that JPEG compression also reduces noise in images to some extent?
JPEG compression can reduce noise by smoothing out the high-frequency components of the image, but it also introduces visual noise in the form of compression artifacts. Here is a zoomed-in (3x) view of part of my avatar (a high-quality JPEG) and part of your avatar (a PNG drawing), on the left as downloaded and on the right as compressed with ImageMagick using -quality 60. To my eye they both look "noisier" when JPEG-compressed.
Strictly speaking, no.
JPEG does remove high frequencies (see below), but not selectively enough to be a denoising algorithm. In other words, it will remove high frequencies if they are noise, but also if they are useful detail information.
To understand this, it helps to know the basics of how JPEG works. First, the image is divided into 8x8 blocks. Then the discrete cosine transform (DCT) is applied to each block. As a result, each element of the 8x8 block contains the "weight" of a different frequency. Then the elements are quantized in a fixed way that depends on the quality level selected a priori. This quantization gains coding performance at the cost of losing precision. The amount of precision lost is fixed a priori, and (as I said above) it does not differentiate between noise and useful detail.
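The block transform and quantization described above can be sketched as follows. This is only an illustration: the step rule `baseStep * (1 + u + v)` is a made-up stand-in for a real quantization table (real encoders use 8x8 tables scaled by the quality setting), the names are hypothetical, and the usual level shift by 128 is left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Block = std::array<std::array<double, 8>, 8>;

// Orthonormal 2D DCT-II of one 8x8 block of pixel values.
Block dct8x8(const Block& pixels) {
    const double pi = std::acos(-1.0);
    Block coeffs{};
    for (int u = 0; u < 8; ++u) {
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int x = 0; x < 8; ++x)
                for (int y = 0; y < 8; ++y)
                    sum += pixels[x][y]
                         * std::cos((2 * x + 1) * u * pi / 16.0)
                         * std::cos((2 * y + 1) * v * pi / 16.0);
            const double cu = (u == 0) ? std::sqrt(0.5) : 1.0;
            const double cv = (v == 0) ? std::sqrt(0.5) : 1.0;
            coeffs[u][v] = 0.25 * cu * cv * sum;
        }
    }
    return coeffs;
}

// Quantization: divide by the step size and round to the nearest integer.
int quantize(double coeff, double step) {
    assert(step > 0.0);
    return static_cast<int>(std::lround(coeff / step));
}

// Number of coefficients that survive quantization.
// ASSUMPTION: the step grows linearly with frequency (illustrative rule only).
int countNonZero(const Block& coeffs, double baseStep) {
    int n = 0;
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v)
            if (quantize(coeffs[u][v], baseStep * (1 + u + v)) != 0) ++n;
    return n;
}
```

With a coarse `baseStep`, a block containing a fine +/-1 checkerboard pattern keeps only its DC coefficient. The quantizer has no way of knowing whether that pattern was noise or real texture, which is the point made above.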
You can test this yourself by saving the same image with different qualities (which technically control the amount of quantization applied to each block) and see that not only noise is removed. There is a nice video showing this effect for different quality levels here: https://upload.wikimedia.org/wikipedia/commons/f/f3/Continuously_varied_JPEG_compression_for_an_abdominal_CT_scan_-_1471-2342-12-24-S1.ogv.
As far as I know, there are several functions in the CMOS Image Sensor ISP (Image Signal Processor).
Specifically, I'd like to know the difference between binning and sub-sampling. I think they have the same purpose: to reduce the image size.
However, I'm not sure why both functions exist.
What is their purpose?
Binning and sub-sampling both reduce the image size, as you suspected, but they focus on different things. Let's tackle each one separately.
Binning
Binning in image processing deals primarily with quantization. The closest thing I can think of is related to what is known as data binning. Basically, consider breaking up your image into distinct (non-overlapping) M x N tiles, where M and N are the rows and columns of a tile and M and N should be much smaller than the rows and columns of the image.
If you consider any grid of M x N pixels, all of these pixels get replaced with a representative colour. This representative colour can be calculated in many ways; the average is a popular choice. Binning is performed primarily as a data pre-processing technique to reduce the effects of minor observation errors. It effectively reduces the amount of information that represents the image, and so it reduces the image size by reducing the number of unique colours in the image.
In addition, binning the data may also reduce the effect of CMOS sensor noise on the final processed image, but at the cost of a lower dynamic range of colours.
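As a rough sketch of the tile averaging described above (a single grayscale channel, row-major storage; `Gray` and `binByMean` are hypothetical names for this example, and the output keeps the input dimensions as in the description):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image type (row-major) for this sketch.
struct Gray {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> px;
    double at(std::size_t r, std::size_t c) const { return px[r * cols + c]; }
};

// Replace every non-overlapping m x n tile by its mean value.
// Tiles at the right/bottom border may be smaller if the size is not a multiple.
Gray binByMean(const Gray& in, std::size_t m, std::size_t n) {
    assert(m > 0 && n > 0 && in.px.size() == in.rows * in.cols);
    Gray out = in;
    for (std::size_t r0 = 0; r0 < in.rows; r0 += m) {
        for (std::size_t c0 = 0; c0 < in.cols; c0 += n) {
            const std::size_t r1 = std::min(r0 + m, in.rows);
            const std::size_t c1 = std::min(c0 + n, in.cols);
            double sum = 0.0;
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    sum += in.at(r, c);
            const double mean = sum / static_cast<double>((r1 - r0) * (c1 - c0));
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    out.px[r * out.cols + c] = mean;
        }
    }
    return out;
}
```

Averaging is what gives the noise reduction: independent noise on the pixels of a tile partly cancels out in the mean.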
Sub-sampling
Sub-sampling in the case of image processing mostly deals with image resizing. It's also called image scaling. The goal is to take an image and reduce its dimensions so that you get a smaller image as a result. Binning keeps the image the same size (i.e. the same dimensions as the original) while reducing the number of colours, which ultimately reduces the amount of space the image takes up. Subsampling reduces the image size by removing information altogether. Usually when you subsample, you also interpolate or smooth the image so that you reduce aliasing.
Sub-sampling has another application in video processing, especially in MPEG, where video is encoded in YCbCr. Y is the luminance while Cb and Cr are the chrominance pair. We tend to notice changes in luminance more than changes in chrominance, so the chrominance is subsampled to reduce the amount of space taken up by the video. Specifically, the human visual system has poorer acuity for colour information than for luminance / intensity. Usually, the chrominance values are filtered and then subsampled to 1/2 or even 1/4 of the resolution of the intensity. Even with a rather high subsampling rate, we don't notice any differences in terms of perceived image quality.
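A minimal sketch of the "filter, then subsample" step for one chroma plane, assuming a simple 2x2 box filter and even plane dimensions (both are simplifications chosen for this example; real codecs may use other filters). The second function just counts stored samples for the common layout where Cb and Cr are both halved in each dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Halve a plane in both dimensions by averaging each 2x2 block
// (box filter followed by decimation). Row-major storage.
std::vector<double> subsample2x2(const std::vector<double>& plane,
                                 std::size_t rows, std::size_t cols) {
    assert(rows % 2 == 0 && cols % 2 == 0 && plane.size() == rows * cols);
    std::vector<double> out((rows / 2) * (cols / 2));
    for (std::size_t r = 0; r < rows; r += 2)
        for (std::size_t c = 0; c < cols; c += 2)
            out[(r / 2) * (cols / 2) + c / 2] =
                (plane[r * cols + c] + plane[r * cols + c + 1] +
                 plane[(r + 1) * cols + c] + plane[(r + 1) * cols + c + 1]) / 4.0;
    return out;
}

// Samples stored per frame: full-resolution Y plus two half-by-half chroma planes.
std::size_t samplesWithHalvedChroma(std::size_t rows, std::size_t cols) {
    return rows * cols + 2 * (rows / 2) * (cols / 2);
}
```

Compared with three full-resolution planes (3 * rows * cols samples), this layout stores 1.5 * rows * cols samples, i.e. half the raw data before any further compression.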
This is obviously a rather rough introduction on the differences between them both, but I hope this gives you enough of what you're after for your purposes.
Good luck!
I'm wondering if there is a way to automatically choose a reasonable JPEG compression level in OpenCV.
The JPEG sizes I'm currently getting are too large, and nailing the quality to a fixed value feels dirty. If I recall correctly, such features existed in image editors such as Dreamweaver. If there is no such feature, I'm also wondering whether somebody knows of an algorithm that can estimate this parameter without performing hard disk I/O.
std::vector<int> params;
params.push_back(CV_IMWRITE_JPEG_QUALITY);
params.push_back(magic); //Want a way to estimate magic
cv::imwrite("my.jpg",image,params);
Unfortunately, to "optimize" JPEG compression, one would have to learn and apply many technical details about the JPEG compression. Because of this, many libraries do not offer the full suite of adjustment parameters. The 0-100 JPEG quality parameter is already a good compromise.
ImageMagick may have such functionality.
You are looking for a way to "automatically choose a reasonable JPEG compression level in OpenCV".
However, "reasonable" is subjective, and depends on the image owner's perception of what features are important in the given image. This means the perception can be different for every combination of (different owners) x (different images).
The short answer
No, OpenCV does not currently offer this functionality.
The "sysadmin" answer
Look at OpenCV ImageMagick integration.
http://www.imagemagick.org/discourse-server/viewtopic.php?f=22&t=20333&start=45
The quick and dirty answer
Use the method of bisection (0, 100, 50, 75, 87, ...) to search for a JPEG quality level that approaches a specified output file size.
The secant method may also be applicable.
Edited: Newton's method is probably not useful, because one cannot obtain the first derivative of the quality-file size curve without an analytical model.
Obviously this is too inefficient for practical every-day use, so it is not provided by the library.
If you want to use it, you have to implement it yourself with your own choice of techniques.
To avoid disk I/O, use cv::imencode which writes to memory instead of to disk.
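A minimal sketch of the bisection search, written against a caller-supplied size function so that it does not depend on any particular encoder. `pickQuality` and `encodedSize` are hypothetical names, and the search assumes that the encoded size does not decrease as the quality goes up.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the highest quality in [lo, hi] whose encoded size is <= maxBytes.
// If no quality in the range fits, returns lo.
// ASSUMPTION: encodedSize(q) is non-decreasing in q.
int pickQuality(const std::function<std::size_t(int)>& encodedSize,
                std::size_t maxBytes, int lo = 1, int hi = 100) {
    assert(lo <= hi);
    int best = lo;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        if (encodedSize(mid) <= maxBytes) {
            best = mid;      // fits: try a higher quality
            lo = mid + 1;
        } else {
            hi = mid - 1;    // too large: try a lower quality
        }
    }
    return best;
}
```

With OpenCV, the callback could encode into a `std::vector<uchar>` with `cv::imencode(".jpg", image, buf, params)`, where `params` holds the JPEG quality flag followed by the candidate quality, and then return `buf.size()`. The search needs about 7 encodes for the range 1 to 100, which is why it is workable for occasional use but expensive to run on every image.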
The slightly longer answer
Although OpenCV doesn't implement this functionality, it would obviously be a nice feature to have.
If someone is willing to implement it with code quality good enough for OpenCV, the OpenCV maintainers may consider accepting it.
The yet longer answer
OpenCV uses libjpeg, or optionally libjpeg-turbo, and both libraries allow one to configure the technical details of JPEG compression.
Below I will focus on these technical details.
Read first: JPEG compression on Wikipedia
Of the JPEG compression pipeline, three of the compression steps can be configured by users of libjpeg or libjpeg-turbo:
Chroma subsampling
After the conversion from RGB to YCbCr, the chroma (color-carrying) channels, Cb (chroma-blue) and Cr (chroma-red), are optionally stored at a lower resolution than the luminance (Y) channel, which is also known as the intensity or grayscale channel. The luminance channel is always stored at full resolution.
Most JPEG decoders can support these downsampling factors:
(1, 1) - no subsampling
(1, 2), (2, 1), (2, 2) - moderate subsampling, where one or both dimensions may be subsampled by 2.
(1, 4), (2, 4), (4, 2), (4, 1) - heavy subsampling. Note that the original JPEG specification forbids some of these combinations, but most JPEG decoders are able to decode them nevertheless.
Quantization table
Each JPEG image can define its own quantization tables, typically one for the luminance channel and one shared by the chroma channels.
Each table holds 64 step sizes: one for the "DC coefficient" (i.e. the average value of the 8x8 block) and 63 for the "AC coefficients" of the DCT.
Quantization is the "lossy step" of JPEG compression. So, a technical user will have to decide how much loss (quantization) is acceptable, and then configure the quantization table accordingly.
Huffman table
Huffman coding is a lossless compression technique. In other words, if one could really spend time optimizing the Huffman coding table based on the statistics of the quantized DCT coefficients of the whole image, one can often construct a good Huffman table to optimize compression without having to trade off quality.
Unfortunately, the reality is more complicated, and such optimization is often not enabled.
It requires keeping all DCT coefficients in memory, for the whole image. This bloats memory usage.
Writing to the file cannot start until everything is in memory. In contrast, if a library chooses the quantization table and Huffman table up-front, without looking at the statistics of the DCT coefficients, then the library is able to write to the file incrementally as rows of pixels are being processed. Because libjpeg is designed to be usable on lowest-common-denominator devices (including smart watches, and maybe your refrigerator too?), being able to operate with minimal memory is an important feature.
Sorry, but there is no way to tell the size before you compress the file. If you are not in a hurry, compress the image using different quality values and then select the best one.
i have just downloaded the latest win32 jpegtran.exe from http://jpegclub.org/jpegtran/ and observed the following:
i have prepared a 24 BPP jpeg test image with 14500 x 10000 pixels.
compressed size in file system is around 7.5 MB.
decompressing into memory (with some image viewer) inflates to around 450 MB.
monitoring the jpegtran.exe command line tool's memory consumption during lossless rotation (180) i can see the process consuming up to 900 MB memory!
i would have assumed that such jpeg lossless transformations don't require decoding the image file into memory and instead would just perform some mathematical transformations on the encoded file itself - keeping the memory footprint very low.
so which of the following is true?
some bug in this particular tool's implementation
some configuration switch i have missed
some misunderstanding at my end (i.e. jpeg lossless transformations also need to decode the image into memory?)
the "mathematical operations" consuming even more memory than "decoding the image into memory"
edit:
according to the answer by JasonD the reason seems to be the latter one. so i'll extend my question:
are there any implementations that can do those operations in small chunks (to avoid high memory usage)? or does it always need to be done on the whole and there's no way around it?
PS:
i'm not planning to implement my own codec / algorithm. instead i'm asking if there are any implementations out there that meet my requirements. or if there could be in theory, at least.
I don't know about the library in question, but in order to perform a lossless rotation on a jpeg image, you would at least have to decompress the DCT coefficients in order to rotate them, and then re-compress.
The DCT coefficients, fully expanded, will be the same size or larger than the original image data, as they have more bits of information.
It's lossless, because the loss in a jpeg is caused by quantization of the DCT coefficients. So long as you don't decode/re-encode/re-quantize these, no loss will be incurred.
But it will be memory intensive.
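A back-of-the-envelope check of the numbers in the question. The assumptions here are mine: one DCT coefficient per sample (i.e. no chroma subsampling), each coefficient held as a 16-bit integer, and padding to whole 8x8 blocks ignored.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for the decoded image: one 8-bit sample per channel per pixel.
std::uint64_t decodedPixelBytes(std::uint64_t width, std::uint64_t height,
                                std::uint64_t channels) {
    assert(channels > 0);
    return width * height * channels;
}

// Bytes needed to hold every DCT coefficient of the image in memory.
// ASSUMPTION: one coefficient per sample, 2 bytes per coefficient by default.
std::uint64_t coefficientBytes(std::uint64_t width, std::uint64_t height,
                               std::uint64_t channels,
                               std::uint64_t bytesPerCoefficient = 2) {
    return decodedPixelBytes(width, height, channels) * bytesPerCoefficient;
}
```

For the 14500 x 10000, 24 BPP image this gives 435,000,000 bytes for the decoded pixels and 870,000,000 bytes for the coefficients, which is in the same ballpark as the roughly 450 MB and 900 MB observed.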
jpeg compression works very roughly as follows:
Convert image into YCbCr colour space.
Optionally downsample some of the channels (colour error is less perceptible than luminance error, so it is typical to 2x downsample the chroma channels). This is obviously lossy, but very predictably/stably so.
Transform 8x8 blocks of the image by a discrete cosine transform (DCT), moving the image into frequency space. The DCT coefficients are also in an 8x8 block, and use more bits for storage than the 8-bit image data did.
Quantize the DCT coefficients by a variable amount (this is the quality setting in most packages). The aim is to produce as many small and especially zero coefficients as possible. This is the main "lossy" aspect of JPEG compression.
Zig-zag through the 2D data to turn it into a 1D stream of coefficients which is roughly in frequency order. High frequencies are more likely to be zeroed out, so many blocks will ideally end in a run of zeros which can be truncated.
Compress (non-lossily) the (now quite compressible) data using Huffman encoding.
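So a 'non-lossy' transformation wants to redo as little of that pipeline as possible. In particular, it must not go back past the quantization step (no inverse DCT, no re-quantization). It still has to undo the Huffman coding to get at the coefficients, though, and that is what expands the data.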
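The zig-zag step from the list above can be sketched like this. The function returns, for each position in the scan, the row-major index (`row * 8 + col`) of the coefficient that is visited; `zigzagOrder` is a hypothetical name for this example.

```cpp
#include <array>
#include <cassert>

// Zig-zag scan order of an 8x8 block, from the DC coefficient (index 0)
// to the highest-frequency coefficient (index 63).
std::array<int, 64> zigzagOrder() {
    std::array<int, 64> order{};
    int row = 0, col = 0;
    for (int i = 0; i < 64; ++i) {
        order[i] = row * 8 + col;
        if ((row + col) % 2 == 0) {        // on an "up-right" diagonal
            if (col == 7)      ++row;      // hit the right edge: step down
            else if (row == 0) ++col;      // hit the top edge: step right
            else { --row; ++col; }
        } else {                           // on a "down-left" diagonal
            if (row == 7)      ++col;      // hit the bottom edge: step right
            else if (col == 0) ++row;      // hit the left edge: step down
            else { ++row; --col; }
        }
    }
    assert(order[63] == 63);               // scan must end at the last coefficient
    return order;
}
```

Reading the quantized coefficients in this order puts the low frequencies first, so the zeros produced by quantization tend to cluster at the end of each block.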
The blending modes Screen, Color Dodge, Soft Light, etc., like in Photoshop, each have their own math that works for the range 0-1. I wonder how these blend modes work for HDR images?
Thanks
I am not familiar with Photoshop and its filters, but here is a general explanation of the math behind HDR filters.
Suppose you have 3 images (underexposed, medium and overexposed). You want to average those images, but (I1+I2+I3)/3 is a stupid way to do it. You want to give a higher weight to the image that captures more information in a given area.
So basically you average the images with a weight factor and there are different types of algorithms to calculate the weights. Here are few:
The simplest one uses the standard deviation (STD). For each pixel in each image, calculate the standard deviation of the 9 pixels around it (the 3x3 neighbourhood) and use that std as the weight, dividing by the sum of the weights so that the result is a weighted average:
HDR pixel(i,j) = (I1(i,j)*stdI1(i,j) + I2(i,j)*stdI2(i,j) + I3(i,j)*stdI3(i,j)) / (stdI1(i,j) + stdI2(i,j) + stdI3(i,j)).
Why is the std used? Because a high std means a high variation in pixel intensity, which means that more information was captured by the image.
Instead of STD you can use entropy filter, edge detection or any other which represents how much information is encoded around the given pixel
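A minimal sketch of the std-weighted average for grayscale exposures. Two details are my own assumptions: the weights are normalized by their sum so that the result stays within the input range, and a small `eps` is added to each weight so that completely flat regions fall back to a plain average instead of dividing by zero. `Image`, `localStd` and `fuse` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image type (row-major) for this sketch.
struct Image {
    int rows;
    int cols;
    std::vector<double> px;
    double at(int r, int c) const { return px[static_cast<std::size_t>(r) * cols + c]; }
};

// Standard deviation over the 3x3 neighbourhood of (r, c), clipped at the borders.
double localStd(const Image& im, int r, int c) {
    double sum = 0.0, sumSq = 0.0;
    int n = 0;
    for (int rr = std::max(r - 1, 0); rr <= std::min(r + 1, im.rows - 1); ++rr)
        for (int cc = std::max(c - 1, 0); cc <= std::min(c + 1, im.cols - 1); ++cc) {
            const double v = im.at(rr, cc);
            sum += v;
            sumSq += v * v;
            ++n;
        }
    const double mean = sum / n;
    return std::sqrt(std::max(sumSq / n - mean * mean, 0.0));
}

// Per-pixel weighted average of the exposures, weight = local std + eps.
Image fuse(const std::vector<Image>& exposures, double eps = 1e-6) {
    assert(!exposures.empty());
    Image out = exposures.front();
    for (int r = 0; r < out.rows; ++r)
        for (int c = 0; c < out.cols; ++c) {
            double num = 0.0, den = 0.0;
            for (const Image& im : exposures) {
                assert(im.rows == out.rows && im.cols == out.cols);
                const double w = localStd(im, r, c) + eps;
                num += w * im.at(r, c);
                den += w;
            }
            out.px[static_cast<std::size_t>(r) * out.cols + c] = num / den;
        }
    return out;
}
```

Swapping `localStd` for an entropy or edge-strength measure would give the variants mentioned above without changing the rest of the code.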
There are also slower but better ways to do HDR. Usually it is done with some kind of frequency-domain transform, such as a wavelet or Fourier transform. Each image is converted to Fourier space (the coefficients of the frequencies), and then for each frequency the maximal coefficient of the 3 images is taken.
You can even combine the std filter method with wavelet transforms. For example, break the image into different frequency bands, smooth the lower frequencies and take a stupid average (I1+I2+I3)/3, but for the high frequencies use less smoothing and the std-weighted average. Smoothing the lower frequencies more is called 'blending'. It is heavily used when stitching 2 images with different exposures into a panorama.
Look at this image: http://magazine.magix.com/en/wp-content/uploads/2012/05/Panorama-3.jpg
You can clearly see that the sky has a different color in each image, but since the sky is very low frequency (almost no information and no small objects) it is heavily smoothed and averaged, which allows a gentle stitching.
Hope that answers your question
I have some (millions) of 16-bit losslessly compressed TIFFs (about 2MB each) and after exhausting TB of disk space I think it's time I archive the older TIFFs as 8-bit JPEGs. Each individual image is a grayscale image, though there may be as many as 5 such images representing the same imaging area at different wavelengths. Now I want to preserve as much information as possible in this process, including the ability to restore the images to their approximate original values. I know there are ways to get further savings through spatial correlations across multiple channels, but the number of channels can vary, and it would be nice to be able to load channels independently.
The images themselves suggest some possible strategies to use since close to ~60% of the area in each image is dark 'background'. So one way to preserve more of the useful image range is just to threshold away anything below this 'background' before scaling and reducing the bit depth. This strategy is, of course, pretty subjective, and I'm looking for any other suggestions for strategies that are demonstrably superior and/or more general. Maybe something like trying to preserve the most image entropy?
Thanks.
Your 2MB TIFFs are already losslessly compressed, so you would be hard-pressed to find a method that allows you to "restore the images" to their original value ranges without some loss of intensity detail.
So here are some questions to narrow down your problem a bit:
What are the image dimensions and number of channels? It's a bit difficult to guess from the filesize and bit depth alone, because as you've mentioned you're using lossless compression. A sample image would be good.
What sort of images are they? E.g. are they B/W blueprints, X-ray/MRI images, color photographs. You mention that around 60% of the images is "background" -- could you tell us more about the image content?
What are they used for? Is it just for a human viewer, or are they training images for some computer algorithm?
What kind of coding efficiency are you expecting? E.g. for the current 2MB filesize, how small do you want your compressed files to be?
Based on that information, people may be able to suggest something. For example, if your images are just color photographs that people will look at, 4:2:0 chroma subsampling will give you a 50% reduction in space without any visually detectable quality loss. You may even be able to keep your 16-bit image depth, if the reduction is sufficient.
Finally, note that you've compared two fundamentally different things in your question:
"top ~40% of the pixels" -- here it sounds like you're talking about contiguous parts of the intensity spectrum (e.g. intensities from 0.6 to 1.0) -- essentially the probability density function of the image.
"close to ~60% of the area in each image" -- here you're talking about the distribution of pixels in the spatial domain.
In general, these two things are unrelated and comparing them is meaningless. There may be an exception for specific image content -- please put up a representative image to make it obvious what you're dealing with.
If you edit your question, I'll have a look and reply if I think of something.