How to calculate VRAM usage from an image? - webgl

I am learning WebGL and I want to know the formula for calculating the VRAM usage of an image (JPG/PNG).
Thanks.

JPG vs PNG makes no difference: both are expanded to uncompressed data before being uploaded to WebGL. There is no perfect way to compute the VRAM usage, because what the driver actually stores internally is unknown, but you can estimate it as
bytesPerPixel * width * height
where bytesPerPixel is derived from the format/type you pass to gl.texImage2D, as in
gl.texImage2D(target, level, internalFormat, width, height, 0, format, type, data)
or
gl.texImage2D(target, level, internalFormat, format, type, img/canvas/video)
In WebGL2 you'd compute it from the internalFormat passed to the same function (see the internal-format table in the WebGL2 spec).
For WebGL1, common values are:
format              type                       bytesPerPixel
------------------------------------------------------------
gl.RGBA             gl.UNSIGNED_BYTE           4
gl.RGB              gl.UNSIGNED_BYTE           3
gl.LUMINANCE        gl.UNSIGNED_BYTE           1
gl.ALPHA            gl.UNSIGNED_BYTE           1
gl.LUMINANCE_ALPHA  gl.UNSIGNED_BYTE           2
gl.RGB              gl.UNSIGNED_SHORT_5_6_5    2
gl.RGBA             gl.UNSIGNED_SHORT_4_4_4_4  2
gl.RGBA             gl.UNSIGNED_SHORT_5_5_5_1  2
gl.RGBA             gl.FLOAT                   16 (if enabled)
Then, if you upload mip levels or generate them with gl.generateMipmap, you need to multiply by about 1.33 (i.e. add about a third). For example, a 16x16 pixel texture with a full mip chain has
16x16 + 8x8 + 4x4 + 2x2 + 1x1 = 341 pixels
versus 16x16 = 256 pixels for the base level alone, and 256 * 1.33 ≈ 340.
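As a rough sketch of the whole estimate (a hypothetical helper; it just sums the formula above over the mip chain):

// Estimate texture VRAM usage: bytesPerPixel * width * height summed
// over the base level and, if mipmapped, every level down to 1x1.
// Only an estimate, since the driver may pad or expand formats.
function estimateTextureBytes(width, height, bytesPerPixel, hasMips) {
  let total = 0;
  let w = width, h = height;
  while (true) {
    total += w * h * bytesPerPixel;
    if (!hasMips || (w === 1 && h === 1)) break;
    w = Math.max(1, w >> 1);
    h = Math.max(1, h >> 1);
  }
  return total;
}
// 16x16, gl.RGBA/gl.UNSIGNED_BYTE, with mips: 341 * 4 = 1364 bytes
console.log(estimateTextureBytes(16, 16, 4, true));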
But as mentioned, it's up to the driver. Some (most?) drivers will expand RGB to RGBA, as one example. Some drivers will expand the various 2-bytes-per-pixel RGB/RGBA formats to 4 bytes.

Related

How to iterate over pixels in a YUV NV12 buffer from camera and set color in Obj-c?

I need to iterate over the pixels of a YUV NV12 buffer and set their color. I think the conversion for the NV12 format should be easy, but I can't figure it out. If I could set the top 50x50 pixels at 0,0 to white, I'd be set. Thank you in advance.
Have you tried setting the whole buffer, i.e. 12 bits (1.5 bytes) * number of pixels, to all 0x00 or all 0xFF?
Since you really don't seem to care about the conversion, simply overwriting the buffer would suffice. If that works, you can tackle the other problems, like finding the right color and producing a rect instead of a line.
For the first, you need to understand the YUV coding: https://wiki.videolan.org/YUV#NV12. According to this document you will most likely need to overwrite bytes in the Y range and in the UV range, i.e. write at two different locations. That's very contrary to an RGB buffer, where all of a pixel's color bytes have close locality. So you can start by overwriting the first 8-bit Y sample and the corresponding U and V bytes in the interleaved UV range (note that these are shared with neighboring pixels). That should set one pixel to a different color than before.
Finally you can tackle the display of the 50x50 rectangle. You'll need to know the image dimensions, because you'll need to offset after each row (if the buffer is transmitted by rows!). E.g., this diagram:
.------.
|xx    |
|xx    |
|      |
'------'
In an RGB color space, with row-major transmitted values, the buffer would look like this: xx0000xx0000000000. So you would need to overwrite bytes 0-5 and bytes 18-23 (RGB), because the first range is 2 pixels * 3 bytes (RGB), and the next range starts at row number (1) * image width (6) * 3 bytes (RGB), and so on. You have to apply the same thinking to the YUV color space.
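As a minimal sketch of that thinking for NV12 (in JavaScript rather than Obj-C, assuming a tightly packed buffer with no row padding; the function name is made up):

// Paint a white rectangle at (0,0) into a packed NV12 buffer:
// a full-resolution Y plane (width*height bytes) followed by an
// interleaved UV plane at half resolution (width*height/2 bytes).
function whiteRectNV12(buf, width, height, rectW, rectH) {
  // Y plane: one byte per pixel; white has Y = 255 (0xFF).
  for (let y = 0; y < rectH; y++) {
    for (let x = 0; x < rectW; x++) {
      buf[y * width + x] = 0xFF;
    }
  }
  // UV plane: one interleaved U,V byte pair per 2x2 pixel block.
  // Neutral chroma (black/grey/white) is U = V = 128.
  const uvOffset = width * height;
  for (let y = 0; y < rectH / 2; y++) {
    for (let x = 0; x < rectW / 2; x++) {
      buf[uvOffset + y * width + 2 * x] = 128;     // U
      buf[uvOffset + y * width + 2 * x + 1] = 128; // V
    }
  }
}

Calling whiteRectNV12(buf, width, height, 50, 50) would paint the top-left 50x50 pixels white, since white in YUV is Y = 255 with neutral chroma.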

How to represent the magnitude of a Fourier transform of an image in 8-bit format?

I have computed the Fourier transform of a 256-level grayscale image, but I'm not sure how to represent the output in a visible format.
This matrix represents the original image:
0 127 127 195
0 255 255 195
While this matrix represents the Fourier transform of the image:
1154 + 0j -382 + 8j -390 + 0j -382 - 8j
-256 + 0j 128 + 128j 0 + 0j 128 - 128j
From what I know, the magnitude can be computed as sqrt((r)^2+(i)^2) where r is the real component and i is the imaginary component. However, this yields values outside of the range that can be represented in 8 bits. How do I correct this?
Typically, one takes the log magnitude of each complex FFT result value (ignoring ones with magnitude zero), and then scales the result so that the maximum expected value is 255 (the scale factor will depend on the dimensions and input gain of the 2D image).
Since the dynamic range of the spectrum is quite different from that of the original spatial signal, it is difficult to use the original 8-bit format directly. You can use log(1+x) to shrink the range, and then scale into the 8-bit range.
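A minimal sketch of that scaling, assuming the FFT output is given as parallel arrays of real and imaginary parts (names are illustrative):

// Map complex FFT output to 8-bit log-magnitude values.
// re[i] and im[i] hold the real/imaginary parts of coefficient i.
function logMagnitude8Bit(re, im) {
  const mags = re.map((r, i) => Math.log(1 + Math.hypot(r, im[i])));
  const max = Math.max(...mags);
  // Scale so the largest log magnitude maps to 255.
  return mags.map((m) => Math.round(255 * m / max));
}
// For the matrix above, the DC term 1154 + 0j maps to 255 and 0 + 0j maps to 0.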

Finding the size in bytes of cv::Mat

I'm using OpenCV with cv::Mat objects, and I need to know the number of bytes that my matrix occupies in order to pass it to a low-level C API. It seems that OpenCV's API doesn't have a method that returns the number of bytes a matrix uses, and I only have a raw uchar *data public member with no member that contains its actual size.
How can one find a cv::Mat size in bytes?
The common answer is to calculate the total number of elements in the matrix and multiply it by the size of each element, like this:
// Given cv::Mat named mat.
size_t sizeInBytes = mat.total() * mat.elemSize();
This will work in conventional scenarios, where the matrix was allocated as a contiguous chunk in memory.
But consider the case where the system has an alignment constraint on the number of bytes per row of the matrix. In that case, if mat.cols * mat.elemSize() is not properly aligned, mat.isContinuous() is false, each row is padded, and the previous size calculation is wrong: mat.total() * mat.elemSize() still counts the same number of elements, although the allocated buffer is larger!
The correct answer, then, is to find the size of each matrix row in bytes, and multiply it by the number of rows:
size_t sizeInBytes = mat.step[0] * mat.rows;
Read more about step in the OpenCV documentation for cv::Mat::step.

Worst PNG compression scenario

I am using libpng to convert raw image data (3-channel, 8-bit, no metadata) to PNG and store it in a buffer. My problem now is allocating the right amount of buffer space to write the PNG data into. It is clear to me that the compressed data might be larger than the raw data (cf. the overhead for a 1x1 image).
Is there any general rule for an upper bound on the compressed data size with respect to the image size and the different filtering/compression options? If that is too generic, let's say we use PNG_COLOR_TYPE_RGB, PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT, PNG_FILTER_TYPE_DEFAULT.
Thank you
PNG overhead is 8 (signature) + 25 (IHDR) + 12 (first IDAT) + 12 (IEND) bytes, plus 1 byte per row (filter byte), plus 12 bytes per additional IDAT chunk when the size exceeds the zlib buffer size (typically 8192). Zlib overhead is 6 bytes (2-byte header and 4-byte checksum). Deflate overhead is 5 bytes, plus 5 bytes per additional 32k in size.
So figure (1.02 * (3*W+1) * H) + 68.
You can decrease the 1.02 factor if you use a larger zlib buffer size, or increase it if you use a smaller one. For example, a 256x256 RGB PNG compressed with a 1000000-byte buffer size (1000000 bytes per IDAT chunk) will have only one IDAT chunk, and the total overhead will be around 330 bytes, or less than 0.2 percent, while if you compress it with a very small buffer size, for example 100 bytes, there will be around 2000 IDAT chunks and the overhead will be about twelve percent.
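As a quick sketch of that estimate (a hypothetical helper; the 1.02 factor assumes the default 8192-byte zlib buffer described above):

// Upper-bound estimate for the PNG size of a W x H, 8-bit RGB image:
// one filter byte per row, ~2% zlib/IDAT overhead, and 68 bytes of
// fixed signature/IHDR/IDAT/IEND framing.
function pngUpperBound(width, height) {
  return Math.ceil(1.02 * (3 * width + 1) * height) + 68;
}
console.log(pngUpperBound(256, 256)); // 200870 bytes is enough here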
See RFC-1950, RFC-1951, and RFC-2083.
You can use compressBound() in zlib to determine an upper bound on the size of the compressed data given an uncompressed data length, assuming the default zlib settings. For non-default zlib settings, you can use deflateBound() after deflateInit2() has been used to establish them.

How does Huffman encoding construct the image (JPEG) from DCT coefficients?

I have a 512x512 image and I tried to recompress it. Here are the steps for recompressing an image to a JPEG file:
1) convert RGB to YCrCb
2) perform downsampling on Cr and Cb
3) convert YCrCb to DCT and quantize according to the chosen quality
4) perform Huffman encoding on the quantized DCT
But before Huffman encoding I counted the number of DCT coefficients, and it is 393216. Dividing it by 64 gives the number of 8x8 DCT blocks, which is 6144.
Then I tried to count the number of 8x8 blocks in the pixel domain. 512/8 = 64 gives me 64 blocks horizontally and 64 blocks vertically. 64 x 64 = 4096, which is not equal to the number of DCT blocks, while the number of pixels is 512x512 = 262144.
My question is: how does Huffman encoding magically transform 393216 coefficients into 262144 pixels, recover each pixel value, and compute the dimensions (512x512) of the compressed JPEG image?
Thank you in advance. :D
If your image was encoded with no color subsampling, then there would be a 1:1 ratio of 8x8 coefficient blocks to 8x8 color component blocks. Each MCU (minimum coded unit) would be 8x8 pixels and have three 8x8 coefficient blocks. 512x512 pixels = 64x64 blocks of 8x8 pixels x 3 (one each for Y, Cr and Cb) = 12288 coefficient blocks.
Since you said you subsampled the color (I assume in both directions), you will now have six 8x8 coefficient blocks for each MCU, and the MCU size becomes 16x16 pixels. Each 16x16 block of pixels needs six 8x8 coefficient blocks to define it (4 Y, 1 Cr, 1 Cb). If you divide the image into 16x16 MCUs, you will have 32x32 MCUs, each with six 8x8 blocks = 6144 coefficient blocks. So, to answer your question: it's not the Huffman encoding that changes the number of coefficients, it's the color subsampling. Part of the compression that comes from using color subsampling in JPEG images exploits a feature of the human visual system: our eyes are more sensitive to changes in luminance than to changes in chrominance.
Huffman encoding doesn't transform coefficients into pixels or anything like that, at least not the Huffman encoding that I'm thinking of. All Huffman encoding does is take a list of tokens and represent them with fewer bits, based on the frequency of those tokens.
An example: you have tokens a, b, c, and d.
Uncompressed, each of your tokens would require 2 bits (00, 01, 10, and 11).
Let's say a=00, b=01, c=10, and d=11. Then
aabaccda would be represented as 0000010010101100 (16 bits)
But with Huffman encoding you'd represent a with fewer bits because it's more common, and you'd represent b and d with more because they're less common, something to the effect of:
a=0, b=110, c=10, d=111 and then
aabaccda would be represented as 00110010101110 (14 bits)
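A toy sketch of that example (the code table is the one just above):

// Encode a token string with a fixed Huffman code table.
const codes = { a: '0', b: '110', c: '10', d: '111' };
const encode = (s) => [...s].map((t) => codes[t]).join('');
console.log(encode('aabaccda'));        // "00110010101110"
console.log(encode('aabaccda').length); // 14 bits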
Your image is 512x512 pixels.
The Y component is 512x512, hence 262144 pixels turned into 262144 DCT coefficients.
The Cb and Cr components are downsampled by 2, hence 256x256 pixels turned into 65536 DCT coefficients each.
The sum of all DCT coefficients is 262144+65536+65536 = 393216.
Huffman has nothing to do with this.
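A minimal sketch of that count, assuming 2x2 (4:2:0) chroma subsampling (the function name is made up):

// Count DCT coefficients for a JPEG with 2x2 chroma subsampling.
function dctCoefficientCount(width, height) {
  const luma = width * height;               // full-resolution Y
  const chroma = (width / 2) * (height / 2); // each chroma plane is halved
  return luma + 2 * chroma;                  // Y + Cb + Cr
}
const coeffs = dctCoefficientCount(512, 512);
console.log(coeffs);      // 393216
console.log(coeffs / 64); // 6144 8x8 blocks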
