Packing Pixel Data in OpenCV

Whenever I read a color image with 3 channels via cv::imread, its data alignment is a bit awkward (neither a single byte nor an integer per pixel) and slows me down when I read single pixel data in GPU memory. It also seems the cv::Mat class's logic behind the alignment is a bit different from what I had initially thought: it does not add an extra byte between two pixels within a row so that each pixel starts at a 4-byte boundary; rather, it pads some extra bytes at the END of each row so that every row starts at a 4-byte boundary.
What should I do to pack each pixel's data into a single unsigned integer? Is there a built-in method in OpenCV, so that I do not have to pack each pixel one by one with bitwise OR operations?
Kind Regards.

You can convert the pixel format from BGR to BGRA; see the example below.
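A minimal sketch of that conversion with cv::cvtColor (the input filename is a placeholder); after conversion each pixel occupies exactly 4 bytes and can be read as one unsigned integer:

    #include <opencv2/opencv.hpp>
    #include <cstdint>

    int main() {
        cv::Mat bgr = cv::imread("input.png", cv::IMREAD_COLOR); // 3 bytes per pixel
        cv::Mat bgra;
        cv::cvtColor(bgr, bgra, cv::COLOR_BGR2BGRA);             // 4 bytes per pixel

        // A row of the BGRA image can now be viewed as packed 32-bit values.
        const std::uint32_t* row0 = bgra.ptr<std::uint32_t>(0);
        std::uint32_t firstPixel = row0[0];                      // packed B|G|R|A
        (void)firstPixel;
        return 0;
    }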

Related

What kind of texture compression/encoding is this? Run-length encoding? Is there a way to upload it to OpenGL?

I've got some textures from an old computer game. They are stored in one huge file with a simple run-length encoding. I know the image width and height; the images also have some fully transparent parts.
one texture is structured as follows:
- a color map: 256 colors, 4 bytes per color (one value each for b, g, r, plus one zero byte)
- texel data: in general each byte is an index into the color map, but transparent texels are compressed with a run-length encoding:
the first value indicates how many transparent texels the row starts with
the following value indicates how many of the following bytes are indices into the color map
(a value of 5 means you have to map the next 5 bytes to the color map)
the value after those 5 bytes could be another transparent-texel count, which would again be followed by another colored-texel count.
but if the value is 0xFE, there are no more colored texels in this row; if there are fewer texels in the row than the image width, the remaining texels are transparent.
if the value is 0xFF, the end of the image (the last row) is reached; if there are fewer rows than the image height, the remaining texels are transparent.
For example, a 4x4 texture could look like this:
02 01 99 FE (two transparent pixels, color 99, one transparent pixel to fill the width)
01 02 98 99 FE (one transparent pixel, color 98, color 99, one transparent pixel to fill the width)
00 01 99 01 02 98 99 FE (color 99, one transparent pixel, color 98, color 99)
02 02 99 98 FF (two transparent pixels, color 99, color 98)
Since this is a very rudimentary compression, maybe somebody knows whether it has a specific name?
And most importantly: is there a way to upload this "compressed" data to OpenGL? I know that for this I have to specify some encoding for the data in OpenGL.
I've already written an algorithm to convert this data to normal RGBA data. But this takes much more graphics memory than the game actually specifies (about 30% of each image is transparent, which could be run-length encoded instead). So if the game is not converting the image to full RGBA, I also want to find a way to avoid that.
Can anybody give me some help?
This is just a paletted image that uses RLE encoding for empty spaces. There's not really a specific name for that. It's like GIF, only not as good, but probably easier to decompress.
I've already written an algorithm to convert this data to normal RGBA data.
Then you're done. Upload that to an OpenGL texture.
So if the game is not converting the image to full RGBA, I also want to find a way to avoid that.
You can't.
While you could implement a palette in a shader by using either two textures or a texture and a UBO/SSBO for the palette, you can't implement the run-length encoding scheme in a shader.
RLE is fine for data storage and bulk decompression, but it is terrible at random access. And random access is precisely how textures work. There's no real way to map a texture coordinate to a memory address containing the data for the corresponding texel. And if you can't do that, you can't access the texel.
Actual compressed texture formats are designed in a way that you can go directly from a texture coordinate to an exact memory address for the block of data containing that texel. RLE isn't like that.
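As a rough illustration of the palette-in-a-shader idea mentioned above, here is a hedged sketch of the two-texture variant, with made-up uniform and variable names (one R8 texture holding a palette index per texel, one 256x1 RGBA palette texture):

    // GLSL fragment shader, embedded as a C++ raw string.
    const char* paletteFragShader = R"glsl(
    #version 330 core
    uniform sampler2D uIndices;  // R8: one palette index per texel
    uniform sampler2D uPalette;  // 256x1 RGBA palette
    in vec2 vUV;
    out vec4 fragColor;
    void main() {
        float idx = texture(uIndices, vUV).r;  // normalized: 0..1 maps to 0..255
        fragColor = texture(uPalette, vec2((idx * 255.0 + 0.5) / 256.0, 0.5));
    }
    )glsl";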
You have to know how the texels are indexed within the file, but you can actually access the data without decompressing it, though you have to walk the file linearly. You read it, say, "run" by "run" along with each run's associated colour, and you increment a counter by the number of texels in the "run" you just read. Given the pixel index as a scalar (into a flat array), you keep reading runs and comparing the counter to the index you want to access, while:
counter <= index
counter += texels_in_run(value)
As soon as:
index < counter
you know the pixel is within the last "run" you read, and you can get the colour. You can use this to read the data both on CPU and GPU, but on the GPU side each thread would need to read the data from the start until it finds the desired pixel. But since the data is still compressed, you get O(n) time in the number of "runs" you read, not in the number of pixels, which may not be that bad…
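As a sketch of this linear scan (assuming, purely for illustration, that the stream has already been parsed into runs of a transparent count followed by colored palette indices; the Run struct and function names are made up, not the game's actual format):

    #include <cstdint>
    #include <vector>

    struct Run {
        std::uint32_t transparent;        // transparent texels before the colors
        std::vector<std::uint8_t> colors; // palette indices for the colored texels
    };

    // Returns the palette index for flat pixel `index`, or -1 if transparent.
    int lookup(const std::vector<Run>& runs, std::uint32_t index) {
        std::uint32_t counter = 0;
        for (const Run& run : runs) {
            counter += run.transparent;               // skip the transparent stretch
            if (index < counter) return -1;           // pixel is in that stretch
            std::uint32_t start = counter;
            counter += static_cast<std::uint32_t>(run.colors.size());
            if (index < counter)                      // pixel is inside this run
                return run.colors[index - start];
        }
        return -1;                                    // past the last run: transparent
    }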
Or you could probably find a better way to read the data, instead of accessing it linearly... :)
I didn’t look closely at what you wrote, so I can’t say this method will work with the way the data is formatted in your files, but maybe with some tinkering you’re able to make it work.

How to iterate over pixels in a YUV NV12 buffer from camera and set color in Obj-C?

I need to iterate over the pixels of a YUV NV12 buffer and set a color. I think the conversion for the NV12 format should be easy, but I can't figure it out. If I could set the top-left 50x50 pixels at (0,0) to white, I'd be set. Thank you in advance.
Have you tried setting the first 12 bits (1.5 bytes) * number of pixels, i.e. the whole buffer, to all 0x00 or all 0xFF?
Since you really don't seem to care about the conversion, simply overwriting the buffer would suffice. If that works, you can tackle the other problems, like finding the right color and producing a rect instead of a line.
For the first, you need to understand the YUV coding: https://wiki.videolan.org/YUV#NV12. According to this document, you will most likely need to overwrite bits in the Y range and in the UV range, i.e. write at two different locations. That's very contrary to an RGB buffer, where all of a pixel's color data sits close together. So you can start by overwriting the first 8 bits (one byte) in the Y range and the first two bytes (the U and V values, which are shared by a 2x2 block of pixels) in the UV range. That should set one pixel to a different color than before.
Finally you can tackle the display of the 50x50 rectangle. You'll need to know the image dimensions, because you'll need to offset after each row (if the buffer is transmitted row by row!). E.g., this diagram:
.------.
|xx    |
|xx    |
|      |
'------'
In an RGB color space, with row-major transmitted values, the buffer would look like this: xx0000xx0000000000. So you would need to overwrite bytes 0..6 and bytes 18..24 (exclusive upper bounds). Because: the first range is 2 pixels * 3 bytes (RGB); the next range starts at row number (1) * image width (6) * 3 bytes (RGB), and so on. You have to apply the same thinking to the YUV color space.
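Putting the pieces together, a minimal sketch (a hedged illustration assuming full-range YUV, where white is roughly Y=255 with neutral chroma U=V=128, the NV12 layout from the VideoLAN wiki, and no extra row stride; the function and parameter names are made up):

    #include <cstddef>
    #include <cstdint>

    // Paint the top-left rect x rect pixels of an NV12 buffer white.
    // `buf` points to the full-resolution Y plane, which is immediately
    // followed by the half-resolution interleaved UV plane.
    void paintWhiteTopLeft(std::uint8_t* buf, int width, int height, int rect) {
        std::uint8_t* yPlane  = buf;
        std::uint8_t* uvPlane = buf + static_cast<std::size_t>(width) * height;

        for (int row = 0; row < rect; ++row)       // Y plane: 1 byte per pixel
            for (int col = 0; col < rect; ++col)
                yPlane[row * width + col] = 255;

        // UV plane: one interleaved U,V byte pair per 2x2 pixel block.
        for (int row = 0; row < rect / 2; ++row)
            for (int col = 0; col < rect / 2; ++col) {
                uvPlane[row * width + 2 * col]     = 128; // U (neutral chroma)
                uvPlane[row * width + 2 * col + 1] = 128; // V (neutral chroma)
            }
    }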

Are some 8x8 blocks skipped when encoding a JPEG?

I am trying to split a compressed JPEG bitstream into the 8x8 blocks of the original image. However, I routinely find fewer blocks than the size of the image implies.
I have narrowed this down to the first row of the image, where the padded edge block (identifiable by its lower mean value) is reached after 65 blocks when the image is 80 blocks across. The end of subsequent rows are then reached after the expected 80 blocks, indicating no further skipped blocks.
Am I simply missing some EOB markers in the first row, or is there a scenario in which some 8x8 blocks are not encoded into the bitstream?
If you are decoding a color image, it is very possible that the Cb and Cr components are subsampled so that there are not as many 8x8 blocks as for the Y component.
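For example (an illustration, not taken from the question's data): with 4:2:0 subsampling, each MCU covers a 16x16-pixel area and contains four Y blocks but only one Cb and one Cr block. An image 80 Y blocks across therefore has only 40 Cb and 40 Cr blocks per chroma block row, so counting blocks in the interleaved scan without accounting for the component gives misleading totals.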

How to encode an image containing a single row using JPEG

While I was reading the JPEG spec, I learned that when encoding a JPEG, the image is first broken into 8x8 blocks, and then the DCT and other steps happen.
So I am curious: how would an image (raw file) containing a single row get encoded as JPEG?
Would JPEG add 7 extra rows to the image so that it can be broken into 8x8 blocks?
A very nice explanation is given in https://dsp.stackexchange.com/questions/35339/jpeg-dct-padding
From Baseline JPEG:
The image is partitioned into blocks of size 8x8.
Each block is then independently transformed using the 8x8 DCT. If the image dimensions are not exact multiples of 8, the blocks on the lower and right hand boundaries may be only partially occupied. These boundary blocks must be padded to the full 8x8 block size and processed in an identical fashion to every other block. The compressor is free to select the value used to pad partial boundary blocks.
In JPEG compression, images that are not multiples of the MCU size are padded upwards to that size.
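For instance (a worked illustration of the rule above, assuming an 8x8 MCU, i.e. no chroma subsampling): a raw image of 100x1 pixels would be padded up to 104x8 and encoded as 13 partially occupied 8x8 blocks, with the compressor free to choose the values in the 7 padded rows and 4 padded columns.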

How to choose the number of bins when creating an HSV histogram?

I was reading some documentation about HSV histograms, and in several references the Saturation channel was quantized into 256 values. Why is that? Is there any reason behind choosing this number?
I have the same question for the Hue channel, which is often quantized into 180 values.
Disclaimer: Off-hand answers (i.e., not backed up by any documentation):
"256" is a popular number for a bin size because Programmers Like Round Numbers -- it fits in a single byte. And "180" because the HSB circle is "360 [degrees]", but "360" does not fit into a single byte.
For many image formats, the range of RGB values is limited to 0..255 per channel -- 3 bytes in total. To store the same amount of data (ignoring any artifacts of converting to another color model), Saturation and Brightness are often expressed in single bytes as well. The same could be done for Hue, by scaling the original range of 0..359 (as Hue is usually expressed as a value in degrees on the HSB Color Wheel) into the byte range 0..255. However, probably because it's easier to do calculations with a number close to the original 360° full circle, the range is instead halved to 0..179. That way the value still fits into a single byte (and thus "HSB" uses as much memory as "RGB") and can be converted trivially back to (close to) its original value: multiply by 2. Obviously, sticking to the storage space wins over fidelity.
Given 256 values for both S and B, and 180 for H, you end up with a color space of 256*256*180 = 11,796,480 colors. To inspect the distribution of colors, you build a histogram: an array where you can read off the total number of pixels with a certain color, or within a certain color range. Using a color range here, instead of actual values, significantly cuts down the memory requirements.
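For a concrete (hedged) illustration using OpenCV's conventions (H in 0..179, S in 0..255), a 2-D hue/saturation histogram with an arbitrary 30x32 bin layout might look like this; the filename and bin counts are placeholders:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat bgr = cv::imread("input.png", cv::IMREAD_COLOR); // placeholder file
        cv::Mat hsv;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV); // H: 0..179, S and V: 0..255

        int   channels[] = {0, 1};                 // histogram over H and S
        int   histSize[] = {30, 32};               // 30 hue bins, 32 saturation bins
        float hRange[]   = {0, 180};
        float sRange[]   = {0, 256};
        const float* ranges[] = {hRange, sRange};

        cv::Mat hist;
        cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
        // hist is a 30x32 CV_32F array: hist.at<float>(h, s) counts the pixels
        // whose hue falls in bin h and whose saturation falls in bin s.
        return 0;
    }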
For an RGB color image, with the colors fairly evenly distributed, you could shift down each channel a certain number of bits. This is how a straightforward conversion from 24-bit "true-color" RGB down to 15-bit RGB "high-color" space works: each channel gets divided by 8, reducing 256 values down to 32 (5 bits per channel). Conversion to a 16-bit high-color RGB space works the same; the bit that was left over in the 15-bit conversion is assigned to green. Thus, the range of values for green is doubled, which is useful since the human eye is more perceptive to shades of green than to the other two primaries.
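As a sketch of that shift-based reduction (standalone helpers, not from any particular library):

    #include <cstdint>

    // 24-bit RGB -> 15-bit (5-5-5) high color: drop the 3 low bits per channel.
    std::uint16_t toRGB555(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
        return static_cast<std::uint16_t>(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
    }

    // 24-bit RGB -> 16-bit (5-6-5): the spare bit goes to green.
    std::uint16_t toRGB565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
        return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }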
It gets more complicated when the colors in the input image are not evenly distributed. A naive solution is to create a [256][256][256] array, initialize it all to zero, then tally the colors of the image into it, and finally sort by count. There are better alternatives -- let me consult my old Computer Graphics [1] here. Hold on.
13.4 Reproducing Color mentions the names of two different approaches from Heckbert (Color Image Quantization for Frame Buffer Display, SIGGRAPH 82): the popularity and the median-cut algorithms. (Unfortunately, that's all they say about this topic. I assume efficient code for both can be googled for.)
A rough guess:
The number of bins for each component (H, S, B) should reflect what you are trying to use the histogram for. This older SO question, for example, uses many bins for hue -- color is considered the most important -- and only 3 different values for both saturation and brightness. Thus, bright images with some subdued areas (say, a comic book) will give a good spread in this histogram, but a real-color photograph will not spread as well.
The main limit is that the bin counts, multiplied with each other, should use a reasonably small amount of memory, yet cover enough of each component to get evenly filled. Perhaps some trial and error comes into play here. You could initially distribute each of the H, S, and B components evenly over the available memory in your histogram and process a small part of the image; say, 1 out of 4 pixels, horizontally and vertically. If you notice one of the component bins fills up too fast while others stay untouched, adjust the ranges and restart.
If you need to analyze multiple pictures, make sure they are all alike in their color gamut. You cannot expect a single reasonable bin layout to work on all sorts of images; you would end up with an even distribution, where all matches are only so-so.
[1] J.D. Foley, A. van Dam, S.K. Feiner, and J.F. Hughes, Computer Graphics: Principles and Practice, 2nd ed., Reading, MA: Addison-Wesley, 1997.
