Worst PNG compression scenario

I am using libpng to convert raw image data (3 channels, 8 bits, no metadata) to PNG and store it in a buffer. I now have the problem of allocating the right amount of buffer space to write the PNG data into. It is clear to me that the compressed data might be larger than the raw data (cf. the overhead for a 1x1 image).
Is there any general rule for an upper bound on the compressed data size with respect to the image size and the different filtering/compression options? If that is too generic, let's say we use PNG_COLOR_TYPE_RGB, PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT, PNG_FILTER_TYPE_DEFAULT.
Thank you

PNG overhead is 8 (signature) + 25 (IHDR) + 12 (first IDAT) + 12 (IEND), plus 1 byte per row (the filter byte), plus 12 bytes per additional IDAT when the size exceeds the zlib buffer size, which is typically 8192. Zlib overhead is 6 bytes (2-byte header and 4-byte checksum). Deflate overhead is 5 bytes, plus 5 bytes per additional 32k of size.
So figure (1.02 * (3*W+1) * H) + 68.
You can decrease the 1.02 factor if you use a larger zlib buffer size, or increase it if you use a smaller one. For example, a 256x256 RGB PNG compressed with a 1000000-byte buffer size (1000000 bytes per IDAT chunk) will have only one IDAT chunk, and the total overhead will be around 330 bytes, or less than 0.2 percent. If you compress it with a very small buffer size instead, say 100 bytes, there will be around 2000 IDAT chunks and the overhead will be about twelve percent.
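As a sanity check, here is that estimate as a function; a minimal C sketch, assuming the default 8192-byte zlib buffer (the 1.02 factor) and the 3-channel 8-bit case from the question:
#include <stddef.h>
/* Worst-case PNG output size for a W x H 8-bit RGB image, per the estimate above. */
size_t png_worst_case_size(size_t width, size_t height)
{
    size_t raw = (3 * width + 1) * height;  /* 3 bytes per pixel + 1 filter byte per row */
    return (size_t)(1.02 * raw) + 68;       /* zlib/deflate slack + fixed chunk overhead */
}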
See RFC-1950, RFC-1951, and RFC-2083.

You can use compressBound() in zlib to determine an upper bound on the size of the compressed data given an uncompressed data length, assuming the default zlib settings. For a specific set of non-default zlib settings, you can use deflateBound() after deflateInit2() has been used to establish the settings.
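For illustration, a minimal C sketch of both calls; the image size and the deflateInit2() parameters are illustrative choices, not libpng's exact settings, and the PNG chunk overhead from the previous answer still has to be added on top:
#include <stdio.h>
#include <zlib.h>
int main(void)
{
    uLong width = 256, height = 256;
    uLong src_len = (3 * width + 1) * height;   /* RGB scanlines + filter bytes */
    /* Upper bound with default zlib settings: */
    printf("compressBound: %lu\n", compressBound(src_len));
    /* Upper bound for specific settings (level 9, 32K window, Z_FILTERED strategy): */
    z_stream strm;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    if (deflateInit2(&strm, 9, Z_DEFLATED, 15, 8, Z_FILTERED) == Z_OK) {
        printf("deflateBound:  %lu\n", deflateBound(&strm, src_len));
        deflateEnd(&strm);
    }
    return 0;
}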

Related

Flipping bits in image bytes without decompressing the image

I am interested in flipping a few bits in the image section of popular image formats such as jpg, tiff, png, and heic. Let's consider an example.
Given an image as a byte array, a few bytes represent the header section, say from [0 to N].
Then a few more bytes contain metadata such as EXIF, from [N+1 to M].
Then a lot more bytes contain the image pixels in a compressed format, according to some compression algorithm, say from [M+1 to X].
Lastly, there is a tail section from [X+1 to Z], where Z is the length of the given byte array.
I am interested in flipping a few bits in the bytes from [M+1 to X]. I am assuming that these bytes do not contain anything else except the compressed bits of the image, so when bits are flipped, any image viewer will still work without any loss of image quality.
I need a recommendation for Java or Python libs that can parse the image and give me the indices for M+1 and X.
Thanks for reading and helping out in advance.
Best

libvips rotate is throwing no space left on device

I am using libvips to rotate images. I am using a VM (an AWS Lambda machine) that has 3002 MB RAM and 512 MB temp storage.
The command I am running to rotate images is
vips rot original.jpg rotated.jpg d90
It throws the following error:
Exit Code: 1, Error Output: ERROR: wbuffer_write: write failed unix error: No space left on device
The jpg image is around 10 MB.
Here's how libvips will rotate your jpg image.
A 90-degree rotate requires random access to the image pixels, but JPEG images can only be read strictly top-to-bottom, so as a first step libvips has to unpack the JPG to a random-access format. It uses the vips (.v) format for this, which is pretty much a C array with a small header.
For images under 100 MB decompressed (you can change this value, see below), it will unpack to a memory buffer. For images over 100 MB decompressed, it will unpack to a temporary file in /tmp (you can change this too, see below).
Next, it does the rotate to the output image. It can do this as a single streaming operation, so it will typically need enough memory for 256 scanlines of the input image and 256 of the output, around another 30 MB or so in this case, plus some more working area for each thread.
In your specific case, the input image is being decompressed to a temporary file of 30,000 x 10,000 x 3 bytes, or about 900 MB. This is way over the 512 MB you have in /tmp, so the operation fails.
The simplest solution is to force the loader to load via a memory buffer. First, for comparison, here is the default behaviour. If I try:
$ vipsheader x.jpg
x.jpg: 30000x10000 uchar, 3 bands, srgb, jpegload
$ time vips rot x.jpg y.jpg d90 --vips-progress --vips-leak
vips temp-3: 10000 x 30000 pixels, 8 threads, 128 x 128 tiles, 256 lines in buffer
vips x.jpg: 30000 x 10000 pixels, 8 threads, 30000 x 16 tiles, 256 lines in buffer
vips x.jpg: done in 0.972s
vips temp-3: done in 4.52s
memory: high-water mark 150.43 MB
real 0m4.647s
user 0m5.078s
sys 0m8.418s
The leak and progress flags make vips report some stats. You can see the initial decompress to the temporary file is taking 0.97s, the rotate to the output takes 4.5s, and it needs 150 MB of pixel buffers and 900 MB of disc.
If I raise the threshold, I see:
$ time VIPS_DISC_THRESHOLD=1gb vips rot x.jpg y.jpg d90 --vips-progress --vips-leak
vips temp-3: 10000 x 30000 pixels, 8 threads, 128 x 128 tiles, 256 lines in buffer
vips x.jpg: 30000 x 10000 pixels, 8 threads, 30000 x 16 tiles, 256 lines in buffer
vips x.jpg: done in 0.87s
vips temp-3: done in 1.98s
memory: high-water mark 964.79 MB
real 0m2.039s
user 0m3.842s
sys 0m0.443s
Now the second rotate phase is only 2s, since it's just reading memory, but memory use has gone up to around 1 GB.
This system is introduced in the libvips docs here:
http://jcupitt.github.io/libvips/API/current/How-it-opens-files.md.html
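For completeness, here is the same fix as a minimal sketch against the libvips 8.x C API; the filenames are taken from the question, and the assumption is that setting VIPS_DISC_THRESHOLD before any image is opened has the same effect as in the shell example above:
#include <stdlib.h>
#include <vips/vips.h>
int main(int argc, char **argv)
{
    /* Raise the disc threshold so the ~900 MB decompressed image stays in RAM. */
    setenv("VIPS_DISC_THRESHOLD", "1gb", 1);
    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);
    VipsImage *in, *out;
    if (!(in = vips_image_new_from_file("original.jpg", NULL)))
        vips_error_exit(NULL);
    if (vips_rot(in, &out, VIPS_ANGLE_D90, NULL))
        vips_error_exit(NULL);
    if (vips_image_write_to_file(out, "rotated.jpg", NULL))
        vips_error_exit(NULL);
    g_object_unref(out);
    g_object_unref(in);
    return 0;
}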

How to calculate VRAM usage from an image?

I am learning WebGL and I want to know the formula for calculating the VRAM usage of an image (jpg/png).
Thanks.
jpg or png makes no difference. They are expanded to uncompressed data before being uploaded to WebGL. There is no perfect way to compute the VRAM usage, because what the driver actually stores internally is unknown, but you can estimate.
bytesPerPixel * width * height
Where bytesPerPixel is derived from the format/type you pass to gl.texImage2D as in
gl.texImage2D(target, level, internalFormat, width, height, 0, format, type, data)
or
gl.texImage2D(target, level, internalFormat, format, type, img/canvas/video)
In WebGL2 you'd compute from the internalFormat passed to the same function (see the table here).
For WebGL1, common values are:
format                type                        bytesPerPixel
----------------------------------------------------------------
gl.RGBA               gl.UNSIGNED_BYTE            4
gl.RGB                gl.UNSIGNED_BYTE            3
gl.LUMINANCE          gl.UNSIGNED_BYTE            1
gl.ALPHA              gl.UNSIGNED_BYTE            1
gl.LUMINANCE_ALPHA    gl.UNSIGNED_BYTE            2
gl.RGB                gl.UNSIGNED_SHORT_5_6_5     2
gl.RGBA               gl.UNSIGNED_SHORT_4_4_4_4   2
gl.RGBA               gl.UNSIGNED_SHORT_5_5_5_1   2
gl.RGBA               gl.FLOAT                    16 (if enabled)
Then, if you upload a mipmap or generate one with gl.generateMipmap, you need to multiply by about 1.33, since the mip chain adds roughly a third. For example, a 16x16 pixel texture will have
16x16 + 8x8 + 4x4 + 2x2 + 1x1 = 256 + 64 + 16 + 4 + 1 = 341 pixels
versus 256 for the base level alone, and 256 * 1.33 ≈ 340.
But like I mentioned, it's up to the driver. As one example, some (most?) drivers will expand RGB to RGBA. Some drivers will expand the various 2-byte-per-pixel RGB/RGBA formats to 4 bytes.
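To make the estimate concrete, here is the arithmetic as a helper, written in C for brevity; the function name is hypothetical, and summing the mip chain explicitly is what the ~1.33 factor approximates:
#include <stddef.h>
/* Estimated bytes for a texture: base level, plus the mip chain if mipmapped. */
size_t estimate_texture_bytes(size_t w, size_t h, size_t bpp, int mipmapped)
{
    size_t total = w * h * bpp;        /* base level */
    while (mipmapped && (w > 1 || h > 1)) {
        w = w > 1 ? w / 2 : 1;         /* halve each dimension, clamping at 1 */
        h = h > 1 ? h / 2 : 1;
        total += w * h * bpp;          /* add each mip level */
    }
    return total;
}
For a mipmapped 16x16 gl.RGBA/gl.UNSIGNED_BYTE texture this gives 341 * 4 = 1364 bytes, matching the 341-pixel sum above.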

How to encode an image containing a single row using JPEG

While I was reading the JPEG spec, I came to know that when encoding a JPEG, the image is first broken into 8x8 blocks, and then the DCT and other things happen.
So I am curious to know how an image (raw file) containing a single row would get encoded using JPEG.
Would JPEG add 7 extra rows to the file so that it can break it into 8x8 blocks?
A very nice explanation is given in https://dsp.stackexchange.com/questions/35339/jpeg-dct-padding
From Baseline JPEG:
The image is partitioned into blocks of size 8x8.
Each block is then independently transformed using the 8x8 DCT. If the image dimensions are not exact multiples of 8, the blocks on the lower and right hand boundaries may be only partially occupied. These boundary blocks must be padded to the full 8x8 block size and processed in an identical fashion to every other block. The compressor is free to select the value used to pad partial boundary blocks.
In JPEG compression, images whose dimensions are not multiples of the MCU size are padded up to that size.
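A small worked example of that padding arithmetic, assuming plain 8x8 MCUs (with 4:2:0 chroma subsampling the MCU is 16x16, and the rounding goes up to that instead):
#include <stdio.h>
int main(void)
{
    int width = 1000, height = 1;            /* single-row image */
    int padded_w = (width + 7) / 8 * 8;      /* round up to a multiple of 8 */
    int padded_h = (height + 7) / 8 * 8;
    printf("padded to %dx%d => %d blocks\n",
           padded_w, padded_h, (padded_w / 8) * (padded_h / 8));
    /* A 1000x1 image is padded to 1000x8 and encoded as 125 8x8 blocks. */
    return 0;
}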

What is the size of my CUDA texture memory?

How do I interpret the texture memory information output by the deviceQuery sample to find the texture memory size?
Here is the output for my texture memory:
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535),3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
It is a common misconception, but there is no such thing as "texture memory" in CUDA GPUs. There are only textures, which are global memory allocations accessed through dedicated hardware that has built-in cache, filtering, and addressing limitations, which lead to the size limits you see reported in the documentation and device query. So the limit is either roughly the free amount of global memory (allowing for padding and alignment in CUDA arrays) or the dimensional limits you already quoted.
The output shows that the maximum texture dimensions are:
For 1D textures 65536
For 2D textures 65536*65535
For 3D textures 2048*2048*2048
If you want the size in bytes, multiply that by the maximum number of channels (4) and the maximum per-channel size (4 bytes).
(For layered textures, multiply the relevant numbers you got for the dimensions by the number of maximum layers you got.)
However, this is the maximum size for a single texture, not the available memory for all textures.
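As a worked example of that multiplication for the 2D case (an upper-bound estimate only, using the maximums quoted above):
#include <stdio.h>
int main(void)
{
    unsigned long long elems = 65536ULL * 65535ULL;     /* max 2D texture elements */
    unsigned long long bytes = elems * 4ULL * 4ULL;     /* 4 channels x 4 bytes each */
    printf("max single 2D texture: %llu bytes (~%.0f GiB)\n",
           bytes, bytes / (1024.0 * 1024.0 * 1024.0));  /* ~64 GiB */
    return 0;
}
This single-texture ceiling (~64 GiB) generally exceeds the GPU's physical memory, which is why in practice the limit is the free global memory, as noted above.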
