While reading the JPEG spec, I learned that when encoding a JPEG the image is first broken into 8x8 blocks, and then the DCT and other steps are applied.
So I am curious: how would an image (raw file) containing a single row be encoded as JPEG?
Would JPEG add 7 extra rows to the file so that it can break it into 8x8 blocks?
A very nice explanation is given in https://dsp.stackexchange.com/questions/35339/jpeg-dct-padding
From Baseline JPEG:
The image is partitioned into blocks of size 8x8.
Each block is then independently transformed using the 8x8 DCT. If the image dimensions are not exact multiples of 8, the blocks on the lower and right hand boundaries may be only partially occupied. These boundary blocks must be padded to the full 8x8 block size and processed in an identical fashion to every other block. The compressor is free to select the value used to pad partial boundary blocks.
In short: in JPEG compression, images whose dimensions are not multiples of the MCU size are padded up to that size inside the encoder; the real dimensions are still recorded in the file header, so the decoder crops the padding away again.
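So a raw image with a single row is not rewritten on disk: the encoder itself extends the partial boundary blocks to a full 8 rows (or 16, with 4:2:0 chroma subsampling) before the DCT. Purely as an illustration (libjpeg and friends do this internally; the function name here is mine), a rough C++/OpenCV sketch of padding by edge replication looks like this:

#include <opencv2/opencv.hpp>

// Pad an image up to multiples of the block size by replicating the last
// row/column, the way a JPEG encoder fills partial boundary blocks.
// Illustrative sketch only - real encoders do this inside the codec.
cv::Mat padToBlockMultiple(const cv::Mat& src, int block = 8)
{
    int padBottom = (block - src.rows % block) % block;  // extra rows needed
    int padRight  = (block - src.cols % block) % block;  // extra cols needed
    cv::Mat padded;
    // Replicating edge pixels keeps the high-frequency DCT coefficients small,
    // which compresses better than padding with a constant such as black.
    cv::copyMakeBorder(src, padded, 0, padBottom, 0, padRight,
                       cv::BORDER_REPLICATE);
    return padded;   // e.g. a 1 x 100 image becomes 8 x 104 before the 8x8 DCT
}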
I wonder which of the two methods below preserves more image detail:
Downscaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 and then downscaling them.
Thanks for your recommendation.
Updated 2020-02-04:
To make my question clearer, I want to describe it a little more.
The images come from a video stream like this:
Video stream
-> 1. decoded to YV12.
-> 2. converted to BGRA.
-> 3. texts stamped.
-> 4. scaled down (or converted to YV12/NV12).
-> 5. converted to YV12/NV12 (or scaled down).
-> 6. H.264 encoder.
-> video stream.
The whole sequence of tasks takes from 300 to 500 ms.
The issue I have is that the text stamped over the images does not look very clear after the conversion and scaling.
I am wondering about the order of steps 4 and 5: scale down then convert, or convert then scale down?
Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to (a code sketch follows below):
Convert from the non-linear "R'G'B'" data to linear RGB (note this needs higher bit precision per channel; see the transfer function spec on Wikipedia)
Apply your downscaling filter
Convert the linear result back to non-linear R'G'B' (i.e. sRGB)
Convert this to YCbCr/NV12
Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this, the average of black (0) and white (255) in linear colour space will be ~128 but in sRGB this mid grey is represented as (IIRC) 186. If you thus do your maths in sRGB space, your result will look unnaturally dark/murky.
(If you are in a hurry, you can sometimes get away with just using squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa))
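To make those steps concrete, here is a rough C++/OpenCV sketch of this pipeline. It uses the squaring/sqrt kludge from the previous paragraph rather than the exact sRGB transfer curves, the function name and the INTER_AREA filter are my own choices, and OpenCV's COLOR_BGR2YCrCb produces full-range values, so an H.264 path may still need a range conversion:

#include <opencv2/opencv.hpp>

// Sketch: downscale in (approximately) linear light, then convert to YCbCr.
// x^2 and sqrt(x) stand in for the real sRGB <-> linear transfer functions.
cv::Mat downscaleLinearLight(const cv::Mat& bgra, cv::Size dstSize)
{
    cv::Mat bgr, f, lin, small, srgb, ycrcb;
    cv::cvtColor(bgra, bgr, cv::COLOR_BGRA2BGR);

    bgr.convertTo(f, CV_32F, 1.0 / 255.0);      // 8-bit -> float in [0,1]
    cv::pow(f, 2.0, lin);                       // ~sRGB -> linear (kludge)

    cv::resize(lin, small, dstSize, 0, 0, cv::INTER_AREA);  // filter in linear light

    cv::sqrt(small, srgb);                      // ~linear -> sRGB (kludge)
    srgb.convertTo(srgb, CV_8U, 255.0);

    cv::cvtColor(srgb, ycrcb, cv::COLOR_BGR2YCrCb);  // split/pack as NV12 or YV12 afterwards
    return ycrcb;
}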
To avoid two passes of spatial interpolation, the following order is recommended (a rough code sketch follows below):
Convert RGBA to YUV444 (YCbCr) without resizing.
Resize Y channel to your destination resolution.
Resize the U (Cb) and V (Cr) channels to half the destination resolution in each axis.
The result format is YUV420 in the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 in specific data ordering).
It is possible to do the resize and NV12 packing in a single pass (if efficiency is a concern).
If you don't do the conversion to YUV444 first, the U and V channels are going to be interpolated twice:
First interpolation when downscaling RGBA.
Second interpolation when U and V are downscaled by half during conversion to the 420 format.
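A rough C++/OpenCV sketch of this order (the function name is mine; video range, odd dimensions and performance are ignored, and note that OpenCV's YCrCb channel order is Y, Cr, Cb):

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: convert to 4:4:4 YCbCr at full resolution, resize Y to the target
// size and Cb/Cr to half of it, then pack the planes as NV12
// (full Y plane followed by one half-height plane of interleaved Cb,Cr).
cv::Mat bgraToNV12(const cv::Mat& bgra, cv::Size dst)   // dst width/height must be even
{
    cv::Mat bgr, ycrcb;
    cv::cvtColor(bgra, bgr, cv::COLOR_BGRA2BGR);
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);      // YUV444, no resizing yet

    std::vector<cv::Mat> ch;                            // ch[0]=Y, ch[1]=Cr, ch[2]=Cb
    cv::split(ycrcb, ch);

    cv::Size half(dst.width / 2, dst.height / 2);
    cv::Mat y, cb, cr;
    cv::resize(ch[0], y,  dst,  0, 0, cv::INTER_AREA);  // one interpolation for Y
    cv::resize(ch[2], cb, half, 0, 0, cv::INTER_AREA);  // one interpolation for Cb
    cv::resize(ch[1], cr, half, 0, 0, cv::INTER_AREA);  // one interpolation for Cr

    cv::Mat nv12(dst.height * 3 / 2, dst.width, CV_8UC1);
    y.copyTo(nv12(cv::Rect(0, 0, dst.width, dst.height)));
    for (int r = 0; r < half.height; ++r) {             // interleave Cb/Cr rows
        uchar* out = nv12.ptr<uchar>(dst.height + r);
        const uchar* pcb = cb.ptr<uchar>(r);
        const uchar* pcr = cr.ptr<uchar>(r);
        for (int c = 0; c < half.width; ++c) {
            out[2 * c]     = pcb[c];                     // U (Cb)
            out[2 * c + 1] = pcr[c];                     // V (Cr)
        }
    }
    return nv12;
}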
When downscaling the image, it's recommended to blur it before downscaling (sometimes referred to as an "anti-aliasing" filter).
Remark: since the eye is less sensitive to chroma resolution, you are probably not going to see any visible difference (unless the image has fine-resolution graphics like colored text).
Remarks:
Simon's answer is more accurate in terms of color handling.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.
Update, regarding the text stamped over the images not looking clear after conversion and scaling:
In case getting clear text is the main issue, the following stages are suggested (a code sketch follows at the end of this answer):
Downscale BGRA.
Stamp text (using smaller font).
Convert to NV12.
Downsampling an image with stamped text is going to result in unclear text.
A better solution is to stamp the text with a smaller font after downscaling.
Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image that already has text stamped on it.
The NV12 format is YUV420: the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower than in RGB or YUV444 format.
Encoding an image with text in it is also going to degrade the text.
For subtitles, the solution is to attach the subtitles as a separate stream and add the text after decoding the video.
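A minimal C++/OpenCV sketch of those stages (the text string, font, size and position are arbitrary placeholders, and the NV12 conversion itself is the same as in the earlier sketch):

#include <opencv2/opencv.hpp>

// Sketch: downscale first, then stamp the text at the final resolution so it
// never passes through the downscaling filter, then convert to NV12.
cv::Mat downscaleThenStamp(const cv::Mat& bgra, cv::Size dst)
{
    cv::Mat small;
    cv::resize(bgra, small, dst, 0, 0, cv::INTER_AREA);

    cv::putText(small, "2020-02-04 12:00:00", cv::Point(10, 30),
                cv::FONT_HERSHEY_SIMPLEX, 0.6,            // smaller font scale
                cv::Scalar(255, 255, 255, 255), 1, cv::LINE_AA);

    return small;   // convert to NV12 after stamping
}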
I am trying to split a compressed JPEG bitstream into the 8x8 blocks of the original image. However, I routinely find fewer blocks than I know there should be, based on the size of the image.
I have narrowed this down to the first row of the image, where the padded edge block (identifiable by its lower mean value) is reached after 65 blocks even though the image is 80 blocks across. The ends of subsequent rows are then reached after the expected 80 blocks, indicating no further skipped blocks.
Am I simply missing some EOB markers in the first row, or is there a scenario in which some 8x8 blocks are not encoded into the bitstream?
If you are decoding a color image, it is very possible that the Cb and Cr components are subsampled, so that there are not as many 8x8 blocks for them as for the Y component.
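As a worked example (the 640x480 size is only an assumption to make the arithmetic concrete), with 4:2:0 subsampling the chroma planes are half the luma size in each axis, so they contribute a quarter as many 8x8 blocks:

#include <cstdio>

// Block counts per component for a hypothetical 640 x 480 image with
// 4:2:0 chroma subsampling (dimensions chosen to divide evenly).
int main()
{
    int w = 640, h = 480;
    int yBlocks  = (w / 8) * (h / 8);            // 80 * 60 = 4800
    int cbBlocks = (w / 2 / 8) * (h / 2 / 8);    // 40 * 30 = 1200
    int crBlocks = cbBlocks;                     // 1200
    std::printf("Y: %d  Cb: %d  Cr: %d  total: %d\n",
                yBlocks, cbBlocks, crBlocks, yBlocks + cbBlocks + crBlocks);
    // In an interleaved baseline scan these arrive grouped into MCUs of
    // 4 Y blocks + 1 Cb + 1 Cr, not as one uniform run of Y blocks.
    return 0;
}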
I am using the inpainting command in GMIC, which takes in both an image and a mask which indicates which part of that image to inpaint. Values that are 255 on the mask are then filled in.
http://gmic.eu/reference.shtml
The input images I am using have huge black portions (the value of the pixels are 0 here). I want to define the mask to be exactly the pixels of the original image which are black.
Of course, I could preprocess all these masks in MATLAB, Python, etc., but this would take a long time as I am processing on the order of 1 million images. GMIC has a fast piping interface which does everything in memory, and a mathematical interpreter, so I should be able to do all of this with the GMIC command line and save a lot of time.
The answer I need does this entirely in GMIC, using its mathematical interpreter. Thanks in advance!
Something like this, probably:
$ gmic input.png --select_color 0,0,0,0 -inpaint[0] [1],.... -keep[0] -o output.png
(where you must set your inpaint parameters according to your needs).
Whenever I read a color image with 3 channels via cv::imread, its data alignment is a bit awkward (each pixel is neither a single byte nor a whole integer), which slows me down when I read single pixels from GPU memory. And it seems the logic behind cv::Mat's alignment is a bit different from what I had initially thought: it does not add an extra byte between two pixels within a row so that each pixel starts at a 4-byte boundary; rather, it pads some extra bytes at the END of each row so that every row starts at a 4-byte boundary.
What should I do to pack each pixel's data into a single unsigned integer? Is there a built-in method in OpenCV, so that I do not have to pack each pixel one by one with logical OR operations?
Kind Regards.
You can convert the pixel format from BGR to BGRA
See this example.
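For instance, a minimal sketch along those lines (the file name is a placeholder):

#include <opencv2/opencv.hpp>
#include <cstdint>
#include <cstdio>

// Widen BGR (3 bytes/pixel) to BGRA (4 bytes/pixel) so every pixel can be
// read as one 32-bit word and rows need no extra padding.
int main()
{
    cv::Mat bgr = cv::imread("input.png", cv::IMREAD_COLOR);   // placeholder file name
    if (bgr.empty()) return 1;

    cv::Mat bgra;
    cv::cvtColor(bgr, bgra, cv::COLOR_BGR2BGRA);   // alpha is filled with 255

    // The freshly allocated BGRA matrix is continuous, so its buffer can be
    // uploaded to the GPU and addressed as one 32-bit value per pixel.
    const uint32_t* pixels = reinterpret_cast<const uint32_t*>(bgra.ptr<uchar>(0));
    std::printf("first pixel packed: 0x%08X\n", pixels[0]);
    return 0;
}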
I am trying to develop an OCR in VB6 and I have some problems with the BMP format. I have been investigating the OCR process and the first step is to convert the image to "black and white" with a threshold. The conversion process is easy to understand and I have done it. However, I'm trying to reduce the size of the resulting image because it uses fewer colors (each pixel has only 256 possible values in grayscale). In the original image I have 3 colors (red, green and blue) but now I only need one value (the grayscale value). At the moment I have achieved the conversion, but the resulting grayscale images have the same size as the original color image (I assign the same value to all three channels).
I have tried to modify the header of the BMP file but I haven't achieved anything, and now I don't understand how it works. For example, if I convert the image with Paint, the offset specified in the header changes its value. If the header is constant, why does the offset change?
The thing is that a grey-scale bitmap image is the same size as a color bitmap image, because the data used to store the grey colors takes just as much space as the color data.
The only difference is that a grey pixel stores the same value three times, e.g. (160,160,160), while a color pixel has something like (123,200,60). The grey values are just a small subset of the RGB space.
You can trim down the size after converting to grey-scale by converting the image from 24-bit to 16-bit or 8-bit, for example. Whether that is already supplied to you depends on what you are using to do the conversion; otherwise you'll have to implement it yourself.
You can also try using something other than BMP images. PNG files are lossless too, and would save space even with the 24-bit version. Image processing libraries usually give you several output formats. Otherwise you can probably find a library that does this for you.
You can write your own conversion using a "lock bits" method. It takes a while to understand how to lock/unlock bits correctly, but the effort is worth it, and once you have the code working you'll see how it can be applied to other scenarios. For example, using a lock/unlock bits technique you can access the pixel values from a bitmap, copy those pixel values into an array, manipulate the array, and then copy the modified array back into a bitmap. That's much faster than calling GetPixel() and SetPixel(). It's still not the fastest image manipulation code one can write, but it's relatively easy to implement and maintain.
It's been a while since I've written VB6 code, but Bob Powell's site often has good examples, and he has a page about locking bits:
https://web.archive.org/web/20121203144033/http://www.bobpowell.net/lockingbits.htm
In a pinch you could create a new Bitmap of the appropriate format and call SetPixel() for every pixel (a rough sketch follows the steps below):
Every pixel (x,y) in your 24-bit color image will have a color value (r,g,b).
After conversion to a 24-bit gray image, each pixel (x,y) will have three equal values, one for each color channel; that can be expressed as (n,n,n), as Willem wrote in his reply. If all three colors R,G,B have the same value, then you can say that value is the "grayscale" value of that pixel. This is the same shade of gray that you will see in your final 8-bit bitmap.
Call SetPixel for each pixel (x,y) in a newly created 8-bit bitmap that has the same width and height as the original color image.
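I don't have VB6 at hand, but the per-pixel idea from the steps above looks roughly like this in C++ (buffer names are mine, and the tightly-packed-row assumption is only to keep the sketch short; real BMP rows are padded to 4-byte multiples and stored bottom-up):

#include <cstdint>
#include <vector>

// Sketch: read each (b,g,r) triple from a 24-bit buffer and write a single
// byte per pixel into an 8-bit buffer. Assumes tightly packed rows.
std::vector<uint8_t> toGray8(const std::vector<uint8_t>& bgr24, int width, int height)
{
    std::vector<uint8_t> gray(static_cast<size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const uint8_t* p = &bgr24[(static_cast<size_t>(y) * width + x) * 3]; // B, G, R
            // Standard luma weights; if the source already has R == G == B,
            // any one of the three channels would do.
            gray[static_cast<size_t>(y) * width + x] = static_cast<uint8_t>(
                0.114 * p[0] + 0.587 * p[1] + 0.299 * p[2]);
        }
    }
    return gray;   // write out with an 8-bit BMP header and a 256-entry gray palette
}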