Convert to grayscale and reduce the size - image-processing

I am trying to develop an OCR in VB6 and I have some problems with the BMP format. I have been investigating the OCR process, and the first step is to convert the image to "black and white" with a threshold. The conversion process is easy to understand and I have done it. However, I'm trying to reduce the size of the resulting image, because it uses fewer colors (each pixel has only 256 possible grayscale values). The original image has three channels (red, green and blue), but now I only need one channel (the grayscale value). At the moment I have achieved the conversion, but the resulting grayscale images have the same size as the original color image (I assign the same value to all three channels).
I have tried to modify the header of the BMP file, but I haven't achieved anything, and now I don't understand how it works. For example, if I convert the image with Paint, the offset specified in the header changes its value. If the header is constant, why does the offset change?

The thing is that a grayscale bitmap image is the same size as a color bitmap image, because the data used to store the gray colors takes just as much space as the color data.
The only difference is that a gray pixel repeats the same value three times, for example (160,160,160), whereas a color pixel has something like (123,200,60). The gray values are just a small subset of the RGB space.
You can trim down the size after converting to grayscale by converting the image from 24-bit to 16-bit or 8-bit, for example. Whether that conversion is already supplied depends on what you are using to do the conversion; otherwise you'll have to write it yourself.
You can also try using something other than BMP images. PNG files are lossless too, and would save space even for the 24-bit version. Image processing libraries usually give you several output formats to choose from. Otherwise you can probably find a library that does this for you.
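As a minimal sketch of the 24-bit to 8-bit conversion, here is how it looks in Python with Pillow rather than VB6 (the file names are placeholders). This also explains the offset question above: an 8-bit BMP stores a 256-entry palette (1024 bytes) between the header and the pixel data, which is why the pixel-data offset in the header changes after a conversion like the one Paint performs.

    # Minimal sketch (Pillow): 24-bit color BMP -> 8-bit grayscale BMP.
    # "input.bmp" / "gray8.bmp" are placeholder file names.
    from PIL import Image

    img = Image.open("input.bmp")   # 24-bit color image, 3 bytes per pixel
    gray = img.convert("L")         # 8-bit grayscale, 1 byte per pixel
    gray.save("gray8.bmp")          # ~1/3 the pixel data, plus a 1 KB palette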

You can write your own conversion using a "lockbits" method. It takes a while to understand how to lock/unlock bits correctly, but the effort is worth it, and once you have the code working you'll see how it can be applied to other scenarios. For example, using a lock/unlock bits technique you can access the pixel values from a bitmap, copy those pixel values into an array, manipulate the array, and then copy the modified array back into the bitmap. That's much faster than calling GetPixel() and SetPixel(). It's still not the fastest image manipulation code one can write, but it's relatively easy to implement and maintain.
It's been a while since I've written VB6 code, but Bob Powell's site often has good examples, and he has a page about lock bits:
https://web.archive.org/web/20121203144033/http://www.bobpowell.net/lockingbits.htm
In a pinch you could create a new Bitmap of the appropriate format and call SetPixel() for every pixel:
Every pixel (x,y) in your 24-bit color image will have a color value (r,g,b).
After conversion to a 24-bit gray image, each pixel (x,y) will have three equal values for the color channels; that can be expressed as (n,n,n), as Willem wrote in his reply. If all three colors R,G,B have the same value, then you can call that value the "grayscale" value of the pixel. It is the same shade of gray that you will see in your final 8-bit bitmap.
Call SetPixel for each pixel (x,y) in a newly created 8-bit bitmap that has the same width and height as the original color image.
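The lock-bits examples on that page are C#/GDI+; as a rough analogue of the same idea (copy the pixels into an array, manipulate the array, write the result back), here is a short Python/NumPy sketch with placeholder file names:

    import numpy as np
    from PIL import Image

    # Copy all pixel values into an array in one step (the equivalent of
    # locking bits), instead of calling GetPixel() per pixel.
    img = Image.open("color.bmp").convert("RGB")
    pixels = np.asarray(img, dtype=np.float32)

    # Standard luminance weights; a pixel already stored as (n, n, n)
    # simply collapses to n.
    gray = (0.299 * pixels[..., 0] +
            0.587 * pixels[..., 1] +
            0.114 * pixels[..., 2]).astype(np.uint8)

    Image.fromarray(gray, mode="L").save("gray8.bmp")  # 8-bit output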

Related

Scaling images before doing conversion or vice versa?

I wonder which of the two methods below preserves more detail in the images:
Down scaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 images and then down scaling them.
Thanks for your recommendation.
Updated 2020-02-04:
To make my question clearer, I want to describe the pipeline a little more.
The images come from a video stream like this:
Video stream:
1. decoded to YV12
2. converted to BGRA
3. texts stamped
4. scaled down (or converted to YV12/NV12)
5. converted to YV12/NV12 (or scaled down)
6. encoded with H264
7. output as a video stream
The whole sequence of tasks ranges from 300 to 500ms.
The issue I have is that the text stamped over the images looks unclear after they have been converted and scaled. I wonder about the order of steps 4 and 5: scale down then convert, or convert then scale down?
Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to:
Convert from the non-linear "R'G'B'" data to linear RGB (note this needs higher bit precision per channel; see the transfer function spec on Wikipedia).
Apply your downscaling filter.
Convert the linear result back to non-linear R'G'B' (i.e. sRGB).
Convert this to YCbCr/NV12.
Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this: the average of black (0) and white (255) in linear colour space will be ~128, but in sRGB that mid grey is represented as (IIRC) 186. If you do your maths in sRGB space, your result will look unnaturally dark/murky.
(If you are in a hurry, you can sometimes get away with just using squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa))
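For illustration, here is a hedged Python/OpenCV sketch of those four steps; the input path and the 640x360 output size are assumptions, and the piecewise constants come from the sRGB spec:

    import numpy as np
    import cv2

    def srgb_to_linear(x):
        # Piecewise sRGB transfer function (see the spec on Wikipedia);
        # x is a float array scaled to [0, 1].
        return np.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(x):
        return np.where(x <= 0.0031308, x * 12.92, 1.055 * x ** (1 / 2.4) - 0.055)

    bgr = cv2.imread("frame.png").astype(np.float32) / 255.0  # assumed input
    linear = srgb_to_linear(bgr)                  # 1. to linear (float precision)
    small = cv2.resize(linear, (640, 360),        # 2. downscale in linear light
                       interpolation=cv2.INTER_AREA)
    out = (linear_to_srgb(small) * 255 + 0.5).astype(np.uint8)  # 3. back to sRGB
    ycrcb = cv2.cvtColor(out, cv2.COLOR_BGR2YCrCb)  # 4. to YCbCr (pack as needed)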
To avoid two phases of spatial interpolation, the following order is recommended:
Convert RGBA to YUV444 (YCbCr) without resizing.
Resize Y channel to your destination resolution.
Resize the U (Cb) and V (Cr) channels to half the destination resolution in each axis.
The resulting format is YUV420 at the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 in specific data ordering).
It is possible to do the resize and the NV12 packing in a single pass (if efficiency is a concern); a sketch of the recommended order follows below.
If you don't do the conversion to YUV444 first, the U and V channels are going to be interpolated twice:
First interpolation when downscaling RGBA.
Second interpolation when U and V are downscaled by half during conversion to the 420 format.
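Here is a Python/OpenCV sketch of that order, assuming a placeholder input file and a 640x360 output (OpenCV's YCrCb conversion returns the channels in Y, Cr, Cb order):

    import numpy as np
    import cv2

    bgra = cv2.imread("input.png", cv2.IMREAD_UNCHANGED)
    dst_w, dst_h = 640, 360                       # assumed output resolution

    # 1. Convert to YUV444 (YCrCb channel order in OpenCV) without resizing.
    yuv = cv2.cvtColor(bgra[..., :3], cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(yuv)

    # 2. Resize Y to the destination resolution.
    y_small = cv2.resize(y, (dst_w, dst_h), interpolation=cv2.INTER_AREA)

    # 3. Resize chroma to half resolution in each axis (420 sampling).
    cb_small = cv2.resize(cb, (dst_w // 2, dst_h // 2), interpolation=cv2.INTER_AREA)
    cr_small = cv2.resize(cr, (dst_w // 2, dst_h // 2), interpolation=cv2.INTER_AREA)

    # 4. Pack as NV12: full-resolution Y plane followed by interleaved UV.
    uv = np.empty((dst_h // 2, dst_w), dtype=np.uint8)
    uv[:, 0::2] = cb_small                        # U (Cb)
    uv[:, 1::2] = cr_small                        # V (Cr)
    nv12 = np.vstack([y_small, uv])               # shape: (dst_h * 3 // 2, dst_w)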
When downscaling the image it's recommended to blur it before downscaling (this is sometimes referred to as an "anti-aliasing" filter).
Remark: since the eye is less sensitive to chromatic resolution, you are probably not going to see any visible difference (unless the image has fine-resolution graphics like colored text).
Remarks:
Simon's answer is more accurate in terms of color accuracy.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.
Update: regarding "text stamped over the images after being converted and scaled looks not so clear":
In case getting clear text is the main issue, the following stages are suggested:
Downscale BGRA.
Stamp text (using smaller font).
Convert to NV12.
Downsampling an image with stamped text is going to result in unclear text.
A better solution is to stamp the text with a smaller font after downscaling.
Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image that already has text stamped on it.
The NV12 format is YUV420: the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower compared to RGB or YUV444 formats.
Encoding an image with text is also going to damage the text.
For subtitles the solution is attaching the subtitles as a separate stream and adding the text after decoding the video.

TBitmap.LoadFromFile for a PNG changes the RGB values - can I stop it?

I need to access a .png image file's RGBA data. I found that reading a .png image using Firemonkey's TBitmap.LoadFromFile changes the RGB values. They get premultiplied by the alpha value, thus losing the original RGB values whenever alpha is not 255.
In Windows I traced it to TBitmapCodecWIC.DecodeFrame in FMX.Canvas.D2D where it uses the GUID_WICPixelFormat32bppPBGRA pixel format, which according to WIC docs implies D2D1_ALPHA_MODE_PREMULTIPLIED.
Investigating further, I understand I can approximately recover the lost RGB values by doing an "UnPreMultiplyAlpha", which effectively divides the RGB values by the alpha value again. This works, visually, but as you can imagine it is pretty lossy, especially for pixels with low alpha and/or low RGB values.
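For reference, a hypothetical "UnPreMultiplyAlpha" sketch in Python/NumPy (not FMX code), which also shows why the recovery is lossy: the stored value round(R * A / 255) has already discarded precision, so dividing the alpha back out can only approximate the original R:

    import numpy as np

    def unpremultiply(rgba):
        # rgba: uint8 array of shape (h, w, 4) with premultiplied RGB channels.
        out = rgba.astype(np.float32)
        alpha = out[..., 3:4]
        # Divide the alpha factor back out; pixels with alpha 0 stay 0.
        out[..., :3] = np.where(alpha > 0,
                                out[..., :3] * 255.0 / np.maximum(alpha, 1.0),
                                0.0)
        return np.clip(out + 0.5, 0, 255).astype(np.uint8)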
Is there a way to tell TBitmap.LoadFromFile to retain the original RGBA values?

UIImage/CGImage changing my pixel color

I have an image that is totally white in its RGB components, with varying alpha -- so, for example, 0xFFFFFF09 in RGBA format. But when I load this image with either UIImage or CGImage APIs, and then draw it in a CGBitmapContext, it comes out grayscale, with the RGB components set to the value of the alpha -- so in my example above, the pixel would come out 0x09090909 instead of 0xFFFFFF09. So an image that is supposed to be white, with varying transparency, comes out essentially black with transparency instead. There's nothing wrong with the PNG file I'm loading -- various graphics programs all display it correctly.
I wondered whether this might have something to do with my use of kCGImageAlphaPremultipliedFirst, but I can't experiment with it because CGBitmapContextCreate fails with other values.
The ultimate purpose here is to get pixel data that I can upload to a texture with glTexImage2D. I could use libPNG to bypass iOS APIs entirely, but any other suggestions? Many thanks.
White on a black background with an alpha of x IS a grey value corresponding to x in all the components. That's how premultiplied-alpha blending works.
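A quick arithmetic check of that, using the 0xFFFFFF09 example from the question:

    r, g, b, a = 0xFF, 0xFF, 0xFF, 0x09
    premultiplied = tuple(round(c * a / 255) for c in (r, g, b))
    print(premultiplied)  # (9, 9, 9) -> stored as 0x09090909, a near-black gray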

What is a good way of Enhancing contrast of color images?

I split a color image into 3 channels and enhanced the contrast of each channel separately.
Then I merged them back together. I like the resulting image, but it has different colors.
Black objects became yellow, and so on...
EDIT:
The algorithm I used is to calculate the 5th percentile and the 95th percentile as the min and max values, and then stretch the image values so that the min and max become 0 and 255. If there is a better approach, please tell me.
When doing contrast enhancement in color images, it is a good idea to only adjust the luminance (brightness) and leave the color information alone. This requires a colorspace conversion from RGB to something like YUV. In this colorspace, the Y component is similar to a grayscale version of the image, while the other components provide the color. This effectively allows you to adjust contrast (by running your algorithm on just the Y component) without distorting the color information. Finally, you can convert back to RGB.
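A short Python/OpenCV sketch of that idea, reusing the 5th/95th percentile stretch from the question but applying it only to the luminance channel (YCrCb stands in for YUV here; the input path is a placeholder):

    import numpy as np
    import cv2

    img = cv2.imread("photo.jpg")                    # placeholder input path
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)   # split luminance from color
    y = ycrcb[..., 0].astype(np.float32)

    # 5th/95th percentile stretch, applied to luminance only.
    lo, hi = np.percentile(y, (5, 95))
    y = np.clip((y - lo) * 255.0 / max(hi - lo, 1.0), 0, 255)

    ycrcb[..., 0] = y.astype(np.uint8)
    result = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)  # color left untouched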
Use the CLAHE algorithm (Contrast Limited Adaptive Histogram Equalization). OpenCV has an implementation of it: cv::createCLAHE()
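In Python that looks like the following, applied to the luminance channel only, as suggested above (the parameter values are common defaults, not a recommendation):

    import cv2

    img = cv2.imread("photo.jpg")                    # placeholder input path
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    ycrcb[..., 0] = clahe.apply(ycrcb[..., 0])       # equalize luminance only
    result = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)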

What's the easiest solution to get digitalized representation of the text on an image?

Assume the image is only in black and white.
Is there a software that can generate a matrix representation for the text on the image?
You should take a look at OCR software.
If you're referring to a simple image of black text on a white background, then it's pretty straightforward.
You just do the following:
Read the image into a 2D byte array so you can access the pixels
Loop through the array and look for every black pixel (0)
Store these in a matrix however you need to
This assumes that the lettering is pure black on white; you might have to allow values up to a certain threshold (5 or 10) if this isn't the case, as in the sketch below.
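A minimal Python sketch of those three steps (the file name is a placeholder; the threshold of 10 follows the allowance mentioned above):

    import numpy as np
    from PIL import Image

    # Placeholder file name; assumes dark text on a light background.
    img = Image.open("text_scan.png").convert("L")   # grayscale image
    pixels = np.asarray(img)                         # 2D array of pixel values

    # Threshold of 10 allows for near-black noise; use == 0 if the
    # lettering is guaranteed to be pure black.
    matrix = (pixels <= 10).astype(np.uint8)         # 1 = text pixel, 0 = background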
