jpeg lossless transformations - memory consumption? - memory

i have just downloaded the latest win32 jpegtran.exe from http://jpegclub.org/jpegtran/ and observed the following:
i have prepared a 24 BPP jpeg test image with 14500 x 10000 pixels.
compressed size in file system is around 7.5 MB.
decompressing into memory (with some image viewer) inflates to around 450 MB.
monitoring the jpegtran.exe command line tool's memory consumption during lossless rotation (180) i can see the process consuming up to 900 MB memory!
i would have assumed that such jpeg lossless transformations don't require decoding the image file into memory and instead would just perform some mathematical transformations on the encoded file itself - keeping the memory footprint very low.
so which of the following is true?
some bug in this particular tool's implementation
some configuration switch i have missed
some misunderstanding at my end (i.e. jpeg lossless transformations also need to decode the image into memory?)
the "mathematical operations" consuming even more memory than "decoding the image into memory"
edit:
according to the answer by JasonD the reason seems to be the latter one. so i'll extend my question:
are there any implementations that can do those operations in small chunks (to avoid high memory usage)? or does it always need to be done on the whole and there's no way around it?
PS:
i'm not planning to implement my own codec / algorithm. instead i'm asking if there are any implementations out there that meet my requirements. or if there could be in theory, at least.

I don't know about the library in question, but in order to perform a lossless rotation on a jpeg image, you would at least have to decompress the DCT coefficients in order to rotate them, and then re-compress.
The DCT coefficients, fully expanded, will be the same size or larger than the original image data, as they have more bits of information.
It's lossless, because the loss in a jpeg is caused by quantization of the DCT coefficients. So long as you don't decode/re-encode/re-quantize these, no loss will be incurred.
But it will be memory intensive.
jpeg compression works very roughly as follows:
Convert image into YCbCr colour space.
Optionally downsample some of the channels (colour error is less perceptible than luminance error, so it is typical to 2x downsample the chroma channels). This is obviously lossy, but very predictably/stably so.
Transform 8x8 blocks of the image by a discrete cosine transform (DCT), moving the image into frequency space. The DCT coefficients are also in an 8x8 block, and use more bits for storage than the 8-bit image data did.
Quantize the DCT coefficients by a variable amount (this is the quality setting in most packages). The aim is to produce as many small and especially zero coefficients as possible. The is the main "lossy" aspect of jpeg compression.
Zig-zag through the 2D data to turn it into a 1D stream of coefficients which is roughly in frequency order. High frequencies are more likely to be zero'd out, so many packets will ideally end in a stream of zeros which can be truncated.
Compress (non-lossily) the (now quite compressible) data using huffman encoding.
So a 'non-lossy' transformation would want to avoid doing as much as possible of that - especially anything beyond the DCT quantization, but that does not avoid expanding the data.

Related

What is the best format for using SIFT?

Is there any difference in terms of precision and speed in using SIFT with JPEG, PNG or PGM images? Obviously supposing the same image size.
This question does not make sense.
SIFT is an algorithm, it operates on raw pixel data in memory, not on files.
You would need some framework that loads image files into some data structure in memroy.
SIFT itself doesn't care if you load a jpeg or a pgm. It will never know where the pixels came from.
Having the same size (to 1 bit) for the same image in three different formats would be pretty impossible in my opinion. As PGM is uncompressed this would also mean that JPEG and PNG would have to be uncompressed to have roughly the same size. If you have no compression you have no losses. So there would be no difference in SIFT performance.

JPG compression and noise

I am studying jpeg compression and it seems to work by reducing high frequency components in images. Since noise is usually high frequency, does this imply that jpeg compression somewhat works on reducing noise in images?
JPEG compression can reduce noise by smoothing out the high-frequency components of the image, but it also introduces visual noise in the form of compression artifacts. Here is a zoomed-in (3x) view of part of my avatar (a high-quality JPEG) and part of your avatar (a PNG drawing), on the left as downloaded and on the right as compressed with ImageMagick using -quality 60. To my eye they both look "noisier" when JPEG-compressed.
Strictly speaking, no.
JPEG does remove high frequencies (see below), but not selectively enough to be a denoising algorithm. In other words, it will remove high frequencies if they are noise, but also if they are useful detail information.
To understand this, it helps to know the basics of how JPEG works. First, the image is divided in 8x8 blocks. Then the discrete cosine transform (DCT) is applied. As a result, each element of the 8x8 block contains the "weight" of a different frequency. Then the elements are quantized in a fixed way depending on the quality level selected a priori. This quantization means gaining coding performance at the cost of losing precision. The amount of precision lost is fixed a priori, and (as I said above) it does not differenciate between noise and useful detail.
You can test this yourself by saving the same image with different qualities (which technically control the amount of quantization applied to each block) and see that not only noise is removed. There is a nice video showing this effect for different quality levels here: https://upload.wikimedia.org/wikipedia/commons/f/f3/Continuously_varied_JPEG_compression_for_an_abdominal_CT_scan_-_1471-2342-12-24-S1.ogv.

Automatically choose optimal JPEG compression in OpenCV?

I'm wonder if there is a way to automatically choose a reasonable JPEG compression level in OpenCV?
The current JPEG sizes I'm getting are too large, and nailing it to a fixed value feels dirty. If I recall such features existed in image editors such as Dreamweaver. If there is no such features, i'm also wondering if somebody knows of an algorithm that is able to estimate this parameter without performing hard disk IO.
std::vector<int> params;
params.push_back(CV_IMWRITE_JPEG_QUALITY);
params.push_back(magic); //Want a way to estimate magic
cv::imwrite("my.jpg",image,params);
Unfortunately, to "optimize" JPEG compression, one would have to learn and apply many technical details about the JPEG compression. Because of this, many libraries do not offer the full suite of adjustment parameters. The 0-100 JPEG quality parameter is already a good compromise.
ImageMagick may have such functionality.
You are looking for a way to "automatically choose a reasonable JPEG compression level in OpenCV".
However, "reasonable" is subjective, and depends on the the image owner's perception of what features are important in the given image. This means the perception can be different for every combination of (different owners) x (different images).
The short answer
No, OpenCV does not currently offer this functionality.
The "sysadmin" answer
Look at OpenCV ImageMagick integration.
http://www.imagemagick.org/discourse-server/viewtopic.php?f=22&t=20333&start=45
The quick and dirty answer
Use method of bisection (0, 100, 50, 75, 87, ...) to search for a JPEG quality level that will approach a specified output file size.
Secant method may also be applicable.
Edited: Newton's method is probably not useful, because one cannot obtain the first derivative of the quality-file size curve without an analytical model.
Obviously this is too inefficient for practical every-day use, so it is not provided by the library.
If you want to use it, you have to implement it yourself with your own choice of techniques.
To avoid disk I/O, use cv::imencode which writes to memory instead of to disk.
The slightly longer answer
Although it doesn't implement this functionality, it is obvious that it is a nice feature to have.
If someone is willing to implement it with code quality good for use in OpenCV, OpenCV may consider accept it.
The yet longer answer
OpenCV uses jpeglib, or optionally libjpeg-turbo, and both libraries allow one to configure the technical details of JPEG compression.
Below I will focus on these technical details.
Read first: JPEG compression on Wikipedia
Of the JPEG compression pipeline, three of the compression steps can be configured by users of jpeglib or libjpeg-turbo:
Chroma subsampling
After the conversion from RGB to YCbCr, the chroma (color-carrying) channels: Chroma-blue and Chroma-red, are optionally stored in a lower resolution relative to the Luminance (Y) channel, also known as the Intensity or Grayscale channel, the latter is always stored at full resolution.
Most JPEG decoders can support these downsampling factors:
(1, 1) - no subsampling
(1, 2), (2, 1), (2, 2) - moderate subsampling, where one or both dimensions may be subsampled by 2.
(1, 4), (2, 4), (4, 2), (4, 1) - heavy subsampling. Note that the original JPEG specification forbids some of these combinations, but most JPEG decoders are able to decode them nevertheless.
Quantization table
Each JPEG image can define a quantization table for the "AC coefficients" of the DCT transformed coefficients
Each JPEG image can define a quantization table for the "DC coefficient" (i.e. the average value of the 8x8 block) computed from the DCT transform.
Quantization is the "lossy step" of JPEG compression. So, a technical user will have to decide how much loss (quantization) is acceptable, and then configure the quantization table accordingly.
Huffman table
Huffman coding is a lossless compression technique. In other words, if one could really spend time optimizing the Huffman coding table based on the statistics of the quantized DCT coefficients of the whole image, one can often construct a good Huffman table to optimize compression without having to trade off quality.
Unfortunately, the reality is more complicated, and such optimization is often not enabled.
It requires keeping all DCT coefficients in memory, for the whole image. This bloats memory usage.
Writing to the file cannot start until everything is in memory. In contrast, if a library chooses the quantization table and Huffman table up-front, without looking at the statistics of the DCT coefficients, then the library would be able to write to the file incrementally as rows and rows of pixels are being processed. Because libjpeg is designed to be usable in the lowest-denominator devices (including smart watches, and maybe your refrigerator too?), being able to operate with minimum memory is an important feature.
Sorry but there is no way to tell the size before you make compress the file. If you are not in a hurry, compress the image using different quality values and then select the best one.

Do PVR textures consume less RAM?

It is my understanding that PVR textures, made with texturetool, are simply compressed images. Therefore the difference lies in the file size.
Frankly, the file size doesn't interest me. What I want to know is, can a PVR texture consume less RAM than a normal .PNG texture? Or does this depend entirely on the texture format (like RGBA8888 etc)?
The essential question would be:
Given X.png and X.pvr, if I display both with texture format RGBA8888, will one consume less RAM than the other?
Yes, the PVR will consume less RAM at all stages — it's unpacked live by the GPU as it's accessed. There's no intermediate decompression.
A PVR-like approach used in digital video is that instead of storing RGB at every pixel, convert to YUV, then store Y at every pixel and U and V only twice per four-pixel block. So you go from 128 bits for the block to 64 bits. To get back to RGB the outputter reads the exactly correct Y and interpolates or accesses the most nearby U and V as necessary.
Schemes like PVR do a similar thing of not storing the full value at every pixel but inferring parts of it from nearby context. What counts as nearby is picked directly corresponding to however the caching is arranged on that GPU. It's usually more in-depth that just scaling down the sampling resolution of some of the channels, e.g. specifying a base offset for samples and then using a tiny precision for each is also common.
So the GPU can always get a value for pixel X by reading only values in a very small, local region of the data.
This contrasts with traditional schemes like PNG where having to know every pixel in the stream prior to X is acceptable if it improves the compression. Processing such things live would flood the GPU's memory bandwidth and hence be completely impractical, so such textures are decompressed from disk and then uploaded.
So schemes like PVR tend to lead to poorer compression and lower per-pixel quality but the win is that they can sit in VRAM compressed. A game will often increase the resolution of its textures if using PVR to try to find a comfortable balance.
Uncompressed textures are:
16 bit per pixel (RGB565, RGBA4444),
24 bit per pixel (RGB888)
PVRTC textures are either 4bpp or even 2bpp. So yes they do use less memory.
Also they perform better because need less memory bandwidth to fetch textures.

PVRTC compression increasing the file sizes of PNG

For iPhone game development, I switched from PNG format to PVRTC format for the sake of performance. But PVRTC compression is creating files that are much bigger than the PNG files.. So a PNG of 140 KB (1024x1024) gets bloated to 512 KB or more in the PVRTC format.. I read somewhere that a PNG file of 50KB got compressed to some 10KB and all, in my case, its the other way around..
Any reason why it happens this way and how I can avoid this.. If PVRTC compression is blindly doing 4bpp conversion (1024x1024x0.5) irrespective of the transparencies in the PNG, then whats the compression we are achieving here..
I have 100s of these 1024x1024 images in my game as there are numerous characters each doing some complex animations.. so in this rate of 512KB per image, my app would get more than 50MB.. which is unacceptable for my customer.. ( with PNG, I could have got my app to 10MB)..
In general, uncompressed image data is either 24bpp (RGB) or 32bpp (RGBA) flatrate. PVRTC is 4bpp (or 2bpp) flatrate so there is a compression of 6 or 8 (12 or 16) times compared to this.
A requirement for graphics hardware to use textures natively is that the format of the texture must be random accessible for the hardware. PVRTC is this kind of format, PNG is not and this is why PNG can achieve greater compression ratios. PVRTC is a runtime, deployment format; PNG is a storage format.
PVRTC compression is carried out on 4x4 blocks of pixels at a time and at a flat bit rate so it is easy to calculate where in memory to retrieve the data required to derive a particular texel's value from and there is only one access to memory required. There is dedicated circuitry in the graphics core which will decode this 4x4 block and give the texel value to your shader/texture combiner etc.
PNG compression does not work at a flat bitrate and is more complicated to retrieve specific values from; memory needs to be accessed from multiple locations in order to retrieve a single colour value and far more memory and processing would be required every single time a texture read occurs. So it's not suitable for use as a native texture format and this is why your textures must be decompressed before the graphics hardware will use them. This increases bandwidth use when compared to PVRTC, which requires no decompression for use.
So for offline storage (the size of your application on disk), PNG is smaller than PVRTC which is smaller than completely uncompressed. For runtime memory footprint and performance, PVRTC is smaller and faster than PNG which, because it must be decompressed, is just as large and slow as uncompressed textures. You might gain some advantage with PNG at initialisation for disk access, but then you'd lose time for decompression.
If you want to reduce the storage footprint of PVRTC you could try zip-style compression on the texture files and expand these when you load from disk.
PVRTC (PowerVR Texture Compression) is a texture compression format. On devices using PowerVR e.g. most higher end mobile phones including the iPhone and other ARM-based gadgets like the iPod it is very fast to draw since drawing it is hardware accelerated. It also uses much less memory since images are represented in their compressed form and decoded each draw, whereas a PNG needs to be decompressed before being drawn.
PNG is lossless compression.
PVRTC is lossy compression meaning it approximates the image. It has a completely different design criteria.
PVRTC will 'compress' (by approximating) any type of artwork, giving a fixed bits per texel, including photographic images.
PNG does not approximate the image, so if the image contains little redundancy it will hardly compress at all. On the other hand, a uniform image e.g. an illustration will compress best with PNG.
Its apples and oranges.
Place more than one frame tiled onto a single image and blit the subrectangles of the texture. This will dramatically reduce your memory consumption.
If you images are, say, 64x64, then you can place 256 of them on a 1024x1024 texture in a 16x16 arrangement.
With a little effort, images do not need to be all the same size, just so long as you keep track in the code of the rectangle in the texture that each image is at.
This is how iPhone game developers do it.
I agree with Will. There is no point in the question. I read the question 3 times, but I still don't know what Sankar want to know. It's just a complain, no question.
The only thing I can advice, don't use PVRTC if you mind to use it. It offers performance gain and saves VRAM, but it won't help you in this case. Because what you want is just reducing game volume, not a consideration about trade-off between performance and quality.

Resources