I'm wondering whether there is a way to automatically choose a reasonable JPEG compression level in OpenCV?
The JPEG sizes I'm currently getting are too large, and nailing the quality to a fixed value feels dirty. If I recall correctly, such a feature existed in image editors such as Dreamweaver. If there is no such feature, I'm also wondering whether somebody knows of an algorithm that can estimate this parameter without performing hard-disk I/O.
#include <opencv2/opencv.hpp> // provides cv::imwrite and the imwrite flags
#include <vector>

std::vector<int> params;
params.push_back(CV_IMWRITE_JPEG_QUALITY); // cv::IMWRITE_JPEG_QUALITY in OpenCV 3 and later
params.push_back(magic);                   // want a way to estimate "magic"
cv::imwrite("my.jpg", image, params);
Unfortunately, to "optimize" JPEG compression, one would have to learn and apply many technical details of JPEG compression. Because of this, many libraries do not offer the full suite of adjustment parameters. The 0-100 JPEG quality parameter is already a good compromise.
ImageMagick may have such functionality.
You are looking for a way to "automatically choose a reasonable JPEG compression level in OpenCV".
However, "reasonable" is subjective, and depends on the the image owner's perception of what features are important in the given image. This means the perception can be different for every combination of (different owners) x (different images).
The short answer
No, OpenCV does not currently offer this functionality.
The "sysadmin" answer
Look at OpenCV ImageMagick integration.
http://www.imagemagick.org/discourse-server/viewtopic.php?f=22&t=20333&start=45
The quick and dirty answer
Use the bisection method (0, 100, 50, 75, 87, ...) to search for a JPEG quality level whose output approaches a specified file size.
Secant method may also be applicable.
Edit: Newton's method is probably not useful, because one cannot obtain the first derivative of the quality-versus-file-size curve without an analytical model.
Obviously this is too inefficient for practical everyday use, so it is not provided by the library.
If you want it, you will have to implement it yourself with your own choice of techniques.
To avoid disk I/O, use cv::imencode, which writes to memory instead of to disk; a sketch of this approach follows below.
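Here is a minimal sketch of that bisection idea, assuming a reasonably recent OpenCV; the findJpegQuality name and the iteration cap are my own choices, not anything OpenCV provides:

#include <opencv2/opencv.hpp>
#include <vector>

// Bisect the 0-100 JPEG quality range until the encoded size fits targetBytes.
// Everything stays in memory; nothing is written to disk.
// Returns 0 if even the lowest quality overshoots the target.
int findJpegQuality(const cv::Mat& image, size_t targetBytes, int maxIters = 8)
{
    int lo = 0, hi = 100, best = 0;
    std::vector<uchar> buffer;
    for (int i = 0; i < maxIters && lo <= hi; ++i)
    {
        int mid = (lo + hi) / 2;
        std::vector<int> params = { cv::IMWRITE_JPEG_QUALITY, mid };
        cv::imencode(".jpg", image, buffer, params); // in-memory JPEG encoding
        if (buffer.size() > targetBytes)
            hi = mid - 1;    // too large: try a lower quality
        else
        {
            best = mid;      // fits: remember it and try a higher quality
            lo = mid + 1;
        }
    }
    return best;
}

The returned quality can then be passed to cv::imwrite exactly as in the question. The quality-to-size relationship is image-dependent, so the loop has to re-encode at every step; eight iterations already narrow the 0-100 range down to a single value.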
The slightly longer answer
Although OpenCV doesn't implement this functionality, it would obviously be a nice feature to have.
If someone is willing to implement it with code quality good enough for use in OpenCV, the maintainers may consider accepting it.
The yet longer answer
OpenCV uses jpeglib, or optionally libjpeg-turbo, and both libraries allow one to configure the technical details of JPEG compression.
Below I will focus on these technical details.
Read first: JPEG compression on Wikipedia
Of the steps in the JPEG compression pipeline, three can be configured by users of jpeglib or libjpeg-turbo:
Chroma subsampling
After the conversion from RGB to YCbCr, the chroma (color-carrying) channels, chroma-blue and chroma-red, can optionally be stored at a lower resolution than the luminance (Y) channel, also known as the intensity or grayscale channel, which is always stored at full resolution.
Most JPEG decoders can support these downsampling factors:
(1, 1) - no subsampling
(1, 2), (2, 1), (2, 2) - moderate subsampling, where one or both dimensions may be subsampled by 2.
(1, 4), (2, 4), (4, 2), (4, 1) - heavy subsampling. Note that the original JPEG specification forbids some of these combinations, but most JPEG decoders are able to decode them nevertheless.
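For reference, here is a sketch of how the sampling factors are set when talking to libjpeg directly; the write_jpeg_420 helper is my own, and error handling is omitted:

#include <stdio.h>
#include <jpeglib.h>

/* Sketch: compress an 8-bit interleaved RGB buffer with 4:2:0 chroma subsampling. */
void write_jpeg_420(unsigned char *rgb, int width, int height, int quality, FILE *out)
{
    struct jpeg_compress_struct cinfo;
    struct jpeg_error_mgr jerr;

    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_compress(&cinfo);
    jpeg_stdio_dest(&cinfo, out);

    cinfo.image_width = width;
    cinfo.image_height = height;
    cinfo.input_components = 3;
    cinfo.in_color_space = JCS_RGB;
    jpeg_set_defaults(&cinfo);
    jpeg_set_quality(&cinfo, quality, TRUE);

    /* Component 0 is Y, 1 is Cb, 2 is Cr.  (2, 2) on Y against (1, 1) on the
     * chroma components yields the common 4:2:0 subsampling (this also happens
     * to be libjpeg's default; setting (1, 1) everywhere disables subsampling). */
    cinfo.comp_info[0].h_samp_factor = 2;
    cinfo.comp_info[0].v_samp_factor = 2;
    cinfo.comp_info[1].h_samp_factor = 1;
    cinfo.comp_info[1].v_samp_factor = 1;
    cinfo.comp_info[2].h_samp_factor = 1;
    cinfo.comp_info[2].v_samp_factor = 1;

    jpeg_start_compress(&cinfo, TRUE);
    while (cinfo.next_scanline < cinfo.image_height) {
        JSAMPROW row = rgb + (size_t)cinfo.next_scanline * width * 3;
        jpeg_write_scanlines(&cinfo, &row, 1);
    }
    jpeg_finish_compress(&cinfo);
    jpeg_destroy_compress(&cinfo);
}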
Quantization table
Each JPEG image can define quantization tables: 8x8 tables of divisors, with one entry for the "DC coefficient" (i.e. the average value of the 8x8 block) and 63 entries for the "AC coefficients" of the DCT-transformed block.
Typically one table is defined for the luminance channel and another for the chroma channels.
Quantization is the "lossy step" of JPEG compression. So, a technical user will have to decide how much loss (quantization) is acceptable, and then configure the quantization table accordingly.
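As a sketch of where a custom table hooks into libjpeg (the flat all-16 table below is purely illustrative, not a recommended table):

#include <jpeglib.h>

/* Sketch: install a custom quantization table in slot 0, the slot the
 * luminance component reads after jpeg_set_defaults().  A second call with
 * which_tbl = 1 would supply the table the chroma components read. */
void install_flat_quant_table(j_compress_ptr cinfo)
{
    unsigned int table[64];
    for (int i = 0; i < 64; ++i)
        table[i] = 16;                      /* same divisor for DC and all AC entries */

    /* scale_factor = 100 means "use the table as given";
     * force_baseline clamps entries to the 8-bit baseline range. */
    jpeg_add_quant_table(cinfo, 0, table, 100, TRUE);
}

Call this after jpeg_set_defaults() and before jpeg_start_compress(); otherwise the quality-scaled default tables remain in effect.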
Huffman table
Huffman coding is a lossless compression technique. This means that if one is willing to spend the time to build the Huffman table from the statistics of the quantized DCT coefficients of the whole image, one can often improve compression without trading off any quality.
Unfortunately, the reality is more complicated, and such optimization is often not enabled.
It requires keeping all DCT coefficients in memory, for the whole image. This bloats memory usage.
Writing to the file cannot start until everything is in memory. In contrast, if a library chooses the quantization table and Huffman table up-front, without looking at the statistics of the DCT coefficients, then it can write the file incrementally as rows of pixels are processed. Because libjpeg is designed to be usable on lowest-common-denominator devices (including smart watches, and maybe your refrigerator too?), being able to operate with minimal memory is an important feature.
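For completeness, this is how the optimization is requested through libjpeg; the helper below is just a sketch around a flag that already exists in the library:

#include <jpeglib.h>

/* Ask libjpeg to build image-specific Huffman tables (the equivalent of
 * cjpeg's -optimize switch).  Call after jpeg_set_defaults() and before
 * jpeg_start_compress(); it makes libjpeg buffer the quantized coefficients
 * of the whole image so their statistics can be gathered first. */
void enable_huffman_optimization(j_compress_ptr cinfo)
{
    cinfo->optimize_coding = TRUE;
}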
Sorry, but there is no way to tell the size before you compress the file. If you are not in a hurry, compress the image with different quality values and then select the best one.
Related
For example, in audio codecs like Opus, the MDCT is used with 50% overlap to avoid ringing artifacts. Why is a similar approach not used in image codecs? E.g., JPEG uses non-overlapping 8x8 blocks.
Later lossy image codecs like JPEG2000 do use overlapped transforms, but these techniques just weren't around when JPEG was being defined. The wavelet transform that JPEG2000 is based on hadn't been invented yet, and time-domain anti-aliasing techniques like MDCT were extremely new.
For the MDCT in particular, as far as I know it is not used for image compression at all, even today. I would guess that's because its basis vectors are asymmetric, which makes it an unintuitive choice for imaging applications.
I am studying JPEG compression, and it seems to work by reducing high-frequency components in images. Since noise is usually high frequency, does this imply that JPEG compression somewhat reduces noise in images?
JPEG compression can reduce noise by smoothing out the high-frequency components of the image, but it also introduces visual noise in the form of compression artifacts. Here is a zoomed-in (3x) view of part of my avatar (a high-quality JPEG) and part of your avatar (a PNG drawing), on the left as downloaded and on the right as compressed with ImageMagick using -quality 60. To my eye they both look "noisier" when JPEG-compressed.
Strictly speaking, no.
JPEG does remove high frequencies (see below), but not selectively enough to be a denoising algorithm. In other words, it will remove high frequencies if they are noise, but also if they are useful detail information.
To understand this, it helps to know the basics of how JPEG works. First, the image is divided into 8x8 blocks. Then the discrete cosine transform (DCT) is applied. As a result, each element of the 8x8 block contains the "weight" of a different frequency. Then the elements are quantized in a fixed way depending on the quality level selected a priori. This quantization trades precision for coding performance. The amount of precision lost is fixed a priori, and (as I said above) it does not differentiate between noise and useful detail.
You can test this yourself by saving the same image with different qualities (which technically control the amount of quantization applied to each block) and see that not only noise is removed. There is a nice video showing this effect for different quality levels here: https://upload.wikimedia.org/wikipedia/commons/f/f3/Continuously_varied_JPEG_compression_for_an_abdominal_CT_scan_-_1471-2342-12-24-S1.ogv.
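To see the quantization step in isolation, here is a toy sketch (my own illustration, not JPEG-compliant code) that applies a DCT to one 8x8 block with OpenCV and rounds the coefficients with a single made-up step size:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat block(8, 8, CV_32F);
    cv::randu(block, cv::Scalar(0), cv::Scalar(255)); // stand-in for one 8x8 block of pixels
    block -= 128.0f;                                  // JPEG level-shifts samples before the DCT

    cv::Mat coeffs;
    cv::dct(block, coeffs);              // frequency-domain representation of the block

    const float q = 16.0f;               // single illustrative step size; real JPEG
                                         // uses a full 8x8 quantization table
    cv::Mat scaled = coeffs / q;
    cv::Mat quantized;
    scaled.convertTo(quantized, CV_32S); // rounding to integers is the irreversible step

    // Dequantize and invert the DCT to see how much detail (or noise) was lost.
    cv::Mat dequantized, reconstructed;
    quantized.convertTo(dequantized, CV_32F, q);
    cv::idct(dequantized, reconstructed);
    std::cout << "max abs error: " << cv::norm(block, reconstructed, cv::NORM_INF) << "\n";
    return 0;
}

Increasing q wipes out more of the small high-frequency coefficients, whether they came from noise or from real detail, which is exactly the point made above.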
I have just downloaded the latest win32 jpegtran.exe from http://jpegclub.org/jpegtran/ and observed the following:
I have prepared a 24-bpp JPEG test image of 14500 x 10000 pixels.
The compressed size in the file system is around 7.5 MB.
Decompressing into memory (with some image viewer) inflates it to around 450 MB.
Monitoring the jpegtran.exe command-line tool's memory consumption during lossless rotation (180), I can see the process consuming up to 900 MB of memory!
I would have assumed that such lossless JPEG transformations don't require decoding the image file into memory, and instead would just perform some mathematical transformations on the encoded file itself - keeping the memory footprint very low.
So which of the following is true?
some bug in this particular tool's implementation
some configuration switch I have missed
some misunderstanding on my end (i.e. JPEG lossless transformations also need to decode the image into memory?)
the "mathematical operations" consuming even more memory than "decoding the image into memory"
Edit:
According to the answer by JasonD, the reason seems to be the latter one. So I'll extend my question:
Are there any implementations that can do those operations in small chunks (to avoid high memory usage)? Or does it always need to be done on the whole image, with no way around it?
PS:
I'm not planning to implement my own codec/algorithm. Instead, I'm asking whether there are any implementations out there that meet my requirements - or whether there could be, at least in theory.
I don't know about the library in question, but in order to perform a lossless rotation on a jpeg image, you would at least have to decompress the DCT coefficients in order to rotate them, and then re-compress.
The DCT coefficients, fully expanded, will be the same size as or larger than the original image data, as they carry more bits of information.
It's lossless because the loss in a JPEG is caused by quantization of the DCT coefficients. So long as you don't dequantize and re-quantize them, no loss will be incurred.
But it will be memory intensive.
JPEG compression works, very roughly, as follows:
Convert image into YCbCr colour space.
Optionally downsample some of the channels (colour error is less perceptible than luminance error, so it is typical to 2x downsample the chroma channels). This is obviously lossy, but very predictably/stably so.
Transform 8x8 blocks of the image by a discrete cosine transform (DCT), moving the image into frequency space. The DCT coefficients are also in an 8x8 block, and use more bits for storage than the 8-bit image data did.
Quantize the DCT coefficients by a variable amount (this is the quality setting in most packages). The aim is to produce as many small and especially zero coefficients as possible. This is the main "lossy" aspect of JPEG compression.
Zig-zag through the 2D data to turn it into a 1D stream of coefficients which is roughly in frequency order. High frequencies are more likely to be zeroed out, so each block will ideally end in a run of zeros which can be truncated.
Compress the (now quite compressible) data losslessly using Huffman encoding.
So a 'non-lossy' transformation wants to avoid redoing as much of that as possible - especially anything at or beyond the DCT quantization step - but that still does not avoid expanding the data in memory (a sketch of this coefficient-level path follows below).
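To make the memory point concrete, here is a rough sketch of the path that tools like jpegtran build on (libjpeg API; error handling and the actual permutation of the coefficient arrays for a rotation are omitted):

#include <stdio.h>
#include <jpeglib.h>

/* Sketch: re-write a JPEG losslessly.  The entropy-coded data is decoded back
 * to quantized DCT coefficients, which must all be held in memory, but the
 * coefficients are never dequantized or inverse-transformed, so no loss occurs. */
void recompress_lossless(FILE *in, FILE *out)
{
    struct jpeg_decompress_struct srcinfo;
    struct jpeg_compress_struct dstinfo;
    struct jpeg_error_mgr jsrcerr, jdsterr;

    srcinfo.err = jpeg_std_error(&jsrcerr);
    jpeg_create_decompress(&srcinfo);
    dstinfo.err = jpeg_std_error(&jdsterr);
    jpeg_create_compress(&dstinfo);

    jpeg_stdio_src(&srcinfo, in);
    jpeg_read_header(&srcinfo, TRUE);

    /* Decode the entropy-coded data into quantized DCT coefficient arrays.
     * This is where the large memory footprint comes from. */
    jvirt_barray_ptr *coef_arrays = jpeg_read_coefficients(&srcinfo);

    /* Copy compression parameters (tables, sampling factors, ...) and write the
     * same coefficients back out - a rotation would permute them at this point. */
    jpeg_copy_critical_parameters(&srcinfo, &dstinfo);
    jpeg_stdio_dest(&dstinfo, out);
    jpeg_write_coefficients(&dstinfo, coef_arrays);

    jpeg_finish_compress(&dstinfo);
    jpeg_destroy_compress(&dstinfo);
    jpeg_finish_decompress(&srcinfo);
    jpeg_destroy_decompress(&srcinfo);
}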
I'm working with lots of cameras which capture natively in a BG Bayer pattern.
Every time I record some data, I save it to disk in the raw Bayer pattern, in an AVI container. The problem is that this really adds up after a while: after one year of research, I have close to 4 TB of data...
So I'm looking for a lossless codec to compress this data. I know I could use libx264 (with --qp 0), huffYUV, Dirac or JPEG2000, but they all assume you have RGB or YUV data. It's easy enough to convert the Bayered data to RGB and then compress it, but that somewhat defeats the purpose of compression if you first triple the data. It would also mean that the demosaicing artefacts introduced by debayering end up in my source data, which is not great either. It would be nice to have a codec that can work on the Bayered data directly.
Even nicer would be a solution involving a codec that is already supported by GStreamer (or FFmpeg), since that's what I am already using.
A rather late suggestion, but maybe useful for others...
It helps to deinterleave the Bayer pattern into four quadrants and then treat that image as grayscale. The sub-images (e.g. all red pixels in top left) have half the spatial resolution, but their pixels are more highly correlated. This leads to lower residuals from predictors using nearby pixels and therefore to better compression ratios.
I've seen this reach 2-3x lossless compression on 12-bit raw camera data.
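A sketch of that deinterleaving step, assuming 16-bit single-channel frames (CV_16UC1) in a BGGR layout; the function name is my own:

#include <opencv2/opencv.hpp>
#include <vector>

// Split a BGGR mosaic into its four colour planes (B, G1, G2, R), each at
// half resolution, so they can be fed to an ordinary grayscale lossless codec.
std::vector<cv::Mat> deinterleaveBayer(const cv::Mat& raw)   // raw: CV_16UC1
{
    std::vector<cv::Mat> planes;
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx)
        {
            cv::Mat plane(raw.rows / 2, raw.cols / 2, CV_16UC1);
            for (int y = 0; y < plane.rows; ++y)
                for (int x = 0; x < plane.cols; ++x)
                    plane.at<ushort>(y, x) = raw.at<ushort>(2 * y + dy, 2 * x + dx);
            planes.push_back(plane);
        }
    return planes;
}

The four planes can then be tiled into a single grayscale frame (or kept as separate streams) before being handed to whichever lossless codec GStreamer or FFmpeg provides.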
If a commercial solution is OK, check out Cineform. I've used their SDK for a custom video compressor and it works great; plus, they have some great tools for processing the raw video.
Or if you prefer the open source route check out Elphel JP4.
All I know about Bayer patterns I learned from Wikipedia, but isn't conversion to RGB more of a deinterlacing than a tripling? Doesn't the resolution for red and blue go down by a factor of 4, and green by a factor of 2? If so, a lossless image compression scheme like lossless JPEG might be just the thing.
I have millions of 16-bit losslessly compressed TIFFs (about 2 MB each), and after exhausting terabytes of disk space I think it's time to archive the older TIFFs as 8-bit JPEGs. Each individual image is a grayscale image, though there may be as many as 5 such images representing the same imaging area at different wavelengths. I want to preserve as much information as possible in this process, including the ability to restore the images to their approximate original values. I know there are ways to get further savings through spatial correlations across multiple channels, but the number of channels can vary, and it would be nice to be able to load channels independently.
The images themselves suggest some possible strategies to use since close to ~60% of the area in each image is dark 'background'. So one way to preserve more of the useful image range is just to threshold away anything below this 'background' before scaling and reducing the bit depth. This strategy is, of course, pretty subjective, and I'm looking for any other suggestions for strategies that are demonstrably superior and/or more general. Maybe something like trying to preserve the most image entropy?
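For reference, the kind of conversion I have in mind looks roughly like this (OpenCV sketch; the background level is a placeholder I would have to estimate per image):

#include <opencv2/opencv.hpp>

// Threshold away the dark background, then stretch what remains into 8 bits.
cv::Mat archiveAs8Bit(const cv::Mat& img16, double background)   // img16: CV_16UC1
{
    cv::Mat clipped = img16.clone();
    clipped.setTo(0, img16 <= background);   // drop everything at or below background

    double lo = 0.0, hi = 0.0;
    cv::minMaxLoc(clipped, &lo, &hi);        // remaining intensity range

    cv::Mat img8;
    double scale = (hi > 0.0) ? 255.0 / hi : 1.0;
    clipped.convertTo(img8, CV_8U, scale);   // rescale into 0..255 before JPEG encoding
    return img8;
}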
Thanks.
Your 2MB TIFFs are already losslessly compressed, so you would be hard-pressed to find a method that allows you to "restore the images" to their original value ranges without some loss of intensity detail.
So here are some questions to narrow down your problem a bit:
What are the image dimensions and number of channels? It's a bit difficult to guess from the filesize and bit depth alone, because as you've mentioned you're using lossless compression. A sample image would be good.
What sort of images are they? E.g. are they B/W blueprints, X-ray/MRI images, or color photographs? You mention that around 60% of each image is "background" -- could you tell us more about the image content?
What are they used for? Is it just for a human viewer, or are they training images for some computer algorithm?
What kind of coding efficiency are you expecting? E.g. for the current 2MB filesize, how small do you want your compressed files to be?
Based on that information, people may be able to suggest something. For example, if your images are just color photographs that people will look at, 4:2:0 chroma subsampling will give you a 50% reduction in space without any visually detectable quality loss. You may even be able to keep your 16-bit image depth, if the reduction is sufficient.
Finally, note that you've compared two fundamentally different things in your question:
"top ~40% of the pixels" -- here it sounds like you're talking about contiguous parts of the intensity spectrum (e.g. intensities from 0.6 to 1.0) -- essentially the probability density function of the image.
"close to ~60% of the area in each image" -- here you're talking about the distribution of pixels in the spatial domain.
In general, these two things are unrelated and comparing them is meaningless. There may be an exception for specific image content -- please put up a representative image to make it obvious what you're dealing with.
If you edit your question, I'll have a look and reply if I think of something.