Scaling images before doing conversion or vice versa? - image-processing

I wonder which of the methods below preserves more image detail:
Downscaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 and then downscaling them.
Thanks for your recommendation.
Updated 2020-02-04:
To make my question clearer, I want to describe the setup a little more.
The images come from a video stream like this:
Video stream
1. decoded to YV12.
2. converted to BGRA.
3. text stamped.
4. scaled down (or converted to YV12/NV12).
5. converted to YV12/NV12 (or scaled down).
6. H.264 encoder.
7. video stream.
The whole sequence of tasks takes 300 to 500 ms.
The issue I have is that the text stamped over the images does not look clear after conversion and scaling. I wonder about the order of steps 4 and 5: scale first and then convert, or convert first and then scale.

Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to:
Convert from the non-linear "R'G'B'" data to linear RGB (note this needs higher bit precision per channel; see the transfer function spec on Wikipedia).
Apply your downscaling filter.
Convert the linear result back to non-linear R'G'B' (i.e. sRGB).
Convert this to YCbCr/NV12.
Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this, the average of black (0) and white (255) in linear colour space will be ~128 but in sRGB this mid grey is represented as (IIRC) 186. If you thus do your maths in sRGB space, your result will look unnaturally dark/murky.
(If you are in a hurry, you can sometimes get away with just using squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa))
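As a rough illustration of those four steps, here is a minimal Python/OpenCV sketch (the file name and the 640x360 target size are placeholders; the sRGB transfer functions follow the standard piecewise definition):

import cv2
import numpy as np

def srgb_to_linear(c):
    # c is a float array normalized to [0, 1]
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

bgra = cv2.imread("input.png", cv2.IMREAD_UNCHANGED)                   # 8-bit BGR(A)
bgr = bgra[:, :, :3].astype(np.float32) / 255.0                        # drop alpha, normalize

linear = srgb_to_linear(bgr)                                           # 1. to linear RGB
small = cv2.resize(linear, (640, 360), interpolation=cv2.INTER_AREA)   # 2. downscale
srgb_small = np.clip(linear_to_srgb(small), 0.0, 1.0)                  # 3. back to sRGB

bgr8 = (srgb_small * 255.0 + 0.5).astype(np.uint8)
yuv420 = cv2.cvtColor(bgr8, cv2.COLOR_BGR2YUV_I420)                    # 4. to 4:2:0 (I420; YV12/NV12 differ only in chroma plane layout)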

To avoid two phases of spatial interpolation, the following order is recommended:
Convert RGBA to YUV444 (YCbCr) without resizing.
Resize Y channel to your destination resolution.
Resize U (Cb) and V (Cr) channels to half resolution in each axis.
The result format is YUV420 in the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 in a specific data ordering).
It is possible to do the resize and NV12 packing in a single pass (if efficiency is a concern).
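A minimal Python/OpenCV sketch of that resize-then-pack flow, using OpenCV's (full-range BT.601) YCrCb conversion to stand in for the YUV444 stage; the target size and file name are placeholders, and a real encoder may expect limited-range values:

import cv2
import numpy as np

dst_w, dst_h = 640, 360                                     # must be even for 4:2:0

bgra = cv2.imread("input.png", cv2.IMREAD_UNCHANGED)
ycrcb = cv2.cvtColor(bgra[:, :, :3], cv2.COLOR_BGR2YCrCb)   # full-resolution 4:4:4
y, cr, cb = cv2.split(ycrcb)

# Resize luma to the destination resolution, chroma to half resolution per axis.
y_small  = cv2.resize(y,  (dst_w, dst_h),           interpolation=cv2.INTER_AREA)
cb_small = cv2.resize(cb, (dst_w // 2, dst_h // 2), interpolation=cv2.INTER_AREA)
cr_small = cv2.resize(cr, (dst_w // 2, dst_h // 2), interpolation=cv2.INTER_AREA)

# Pack as NV12: full Y plane followed by a single interleaved CbCr (UV) plane.
uv = np.empty((dst_h // 2, dst_w), np.uint8)
uv[:, 0::2] = cb_small
uv[:, 1::2] = cr_small
nv12 = np.vstack([y_small, uv])                             # shape: (dst_h * 3 // 2, dst_w)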
If you don't do the conversion to YUV444, the U and V channels are going to be interpolated twice:
First interpolation when downscaling RGBA.
Second interpolation when U and V are halved during the conversion to the 4:2:0 format.
When downscaling the image, it's recommended to blur the image before downscaling (sometimes referred to as an "anti-aliasing" filter).
Remark: since the eye is less sensitive to chroma resolution, you are probably not going to see any visible difference (unless the image has fine-resolution graphics such as colored text).
Remarks:
Simon's answer is more accurate in terms of color accuracy.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.
Update: regarding the text stamped over the images not looking clear after conversion and scaling:
If getting clear text is the main issue, the following order of stages is suggested:
Downscale BGRA.
Stamp text (using smaller font).
Convert to NV12.
Downsampling an image with stamped text is going to result in unclear text.
A better solution is to stamp the text with a smaller font after downscaling.
Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image that already has text stamped on it.
Since NV12 is a YUV 4:2:0 format, the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower compared to an RGB or YUV444 format.
Encoding an image with text in it is also going to degrade the text.
For subtitles, the usual solution is to attach them as a separate stream and add the text after decoding the video.
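A rough OpenCV sketch of that downscale-then-stamp order (the text, font, position, and target size are placeholder choices; the final 4:2:0 conversion can reuse the NV12 packing shown earlier):

import cv2

bgra = cv2.imread("frame.png", cv2.IMREAD_UNCHANGED)

# 1. Downscale first.
small = cv2.resize(bgra[:, :, :3], (640, 360), interpolation=cv2.INTER_AREA)

# 2. Stamp the text at the destination resolution with an appropriately small font.
cv2.putText(small, "2020-02-04 12:00:00", (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2, cv2.LINE_AA)

# 3. Only now convert to 4:2:0 for the encoder.
yuv420 = cv2.cvtColor(small, cv2.COLOR_BGR2YUV_I420)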

Related

Apply HSV filter in Core Image

I want to set the V component (in the HSV color space) to a constant for all the pixels of a CIImage.
It is easy to imagine a solution that loops (using for) over each RGB pixel, converts (by hand?) to HSV, sets the value, and converts back to RGB.
But in OpenCV/Python, this is two lines (plus it uses SIMD processor instructions for processing speed).
So the question is: what is the Core Image way of doing that, and fast?
I am using Swift5 for an iOS (iPhone) application.
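For reference, the OpenCV/Python approach the question alludes to is roughly the following (the constant 200 is just an example value):

import cv2

img = cv2.imread("input.png")                  # BGR
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hsv[:, :, 2] = 200                             # set V to a constant for all pixels
out = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)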

How to blend 80x60 thermal and 640x480 RGB image?

How do I blend two images - thermal(80x60) and RGB(640x480) efficiently?
If I scale the thermal to 640x480 it doesn't scale up evenly or doesn't have enough quality to do any processing on it. Any ideas would be really helpful.
RGB image - http://postimg.org/image/66f9hnaj1/
Thermal image - http://postimg.org/image/6g1oxbm5n/
If you scale the resolution of the thermal image up by a factor of 8 and use Bilinear Interpolation you should get a smoother, less-blocky result.
When combining satellite images of different resolution (I talk about satellite imagery because that is my speciality), you would normally use the highest-resolution imagery as the Lightness, or L, channel to give you apparent resolution and detail in the shapes, because the human eye is good at detecting contrast. You would then use the lower-resolution imagery to fill in the Hue and Saturation, or a and b, channels to give you the colour graduations you are hoping to see.
So, in concrete terms, I would consider converting the RGB to Lab or HSL colourspace and retaining the L channel. Then take the thermal image and up-res it by 8 using bilinear interpolation and use the result as the a, or b, or H, or S channel, and maybe fill in the remaining channel with the one from the RGB that has the most variance. Then convert the result back to RGB for a false-colour image. It is hard to tell without seeing the images or knowing what you are hoping to find in them. But in general terms, that would be my approach. HTH.
Note: Given that a of Lab colourspace controls the red/green relationship, I would probably try putting the thermal data in that channel so it tends to show more red the "hotter" the thermal channel is.
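A minimal Python/OpenCV sketch of that idea; which Lab channel receives the thermal data, and how it is scaled, are assumptions you would tune:

import cv2

rgb = cv2.imread("rgb.png")                                  # 640x480 BGR
thermal = cv2.imread("thermal.png", cv2.IMREAD_GRAYSCALE)    # 80x60

# Upscale the thermal image by a factor of 8 with bilinear interpolation.
thermal_big = cv2.resize(thermal, (640, 480), interpolation=cv2.INTER_LINEAR)

# Keep the L channel from the RGB image and put the thermal data into the a channel.
lab = cv2.cvtColor(rgb, cv2.COLOR_BGR2Lab)
L, a, b = cv2.split(lab)
a = cv2.normalize(thermal_big, None, 0, 255, cv2.NORM_MINMAX)
result = cv2.cvtColor(cv2.merge([L, a, b]), cv2.COLOR_Lab2BGR)   # false-colour composite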
Updated Answer
Ok, now I can see your images and you have a couple more problems... firstly the images are not aligned, or registered, with each other which is not going to help - try using a tripod ;-) Secondly, your RGB image is very poorly exposed so it is not really going to contribute that much detail - especially in the shadows - to the combined image.
So, firstly, I used ImageMagick at the commandline to up-size the thermal image like this:
convert thermal.png -resize 640x480 thermal.png
Then, I used Photoshop to do a crude alignment/registration. If you want to try this, the easiest way is to put the two images into separate layers of the same document and set the Blending mode of the upper layer to Difference. Then use the Move Tool (shortcut v) to move the upper image around till the screen goes black which means that the details are on top of each other and when subtracted they come to zero, i.e. black. Then crop so the images are aligned and turn off one layer and save, then turn that layer back on and the other layer off and save again.
Now, I used ImageMagick again to separate the two images into Lab layers:
convert bigthermalaligned.png -colorspace Lab -separate thermal.png
convert rgbaligned.png -colorspace Lab -separate rgb.png
which gives me
thermal-0.png => L channel
thermal-1.png => a channel
thermal-2.png => b channel
rgb-0.png => L channel
rgb-1.png => a channel
rgb-2.png => b channel
Now I can take the L channel of the RGB image and the a and b channels of the thermal image and put them together:
convert rgb-0.png thermal-1.png thermal-2.png -normalize -set colorspace lab -combine result.png
And you get this monstrosity! Obviously you can play around with the channels and colourspaces and a tripod and proper exposures, but you should be able to see some of the details of the RGB image - especially the curtains on the left, the lights, the camera on the cellphone and the label on the water bottle - have come through into the final image.
Assuming that the images were not captured using a single camera, you need to note that the two cameras may have different parameters. Also, if it's two cameras, they are probably not located in the same world position (offset).
In order to resolve this, you need to get the intrinsic calibration matrix of each of the cameras, and find the offset between them.
Then, you can find a transformation between a pixel in one camera and the other. Unfortunately, if you don't have any depth information about the scene, the most you can do with the calibration matrix is get a ray direction from the camera position to the world.
The easy approach would be to ignore the offset (assuming the scene is not too close to the camera), and just transform the pixel.
p2=K2*(K1^-1 * p1)
Using this you can construct a new image that is a composite of both.
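A small NumPy sketch of that mapping, where K1 and K2 are the 3x3 intrinsic matrices and the pixel is in homogeneous coordinates (all numeric values here are placeholders):

import numpy as np

# Placeholder intrinsics for the thermal (K1) and RGB (K2) cameras.
K1 = np.array([[100.0, 0.0, 40.0], [0.0, 100.0, 30.0], [0.0, 0.0, 1.0]])
K2 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])

p1 = np.array([25.0, 10.0, 1.0])       # a pixel in camera 1, homogeneous coordinates

# Ignore the offset between the cameras: p2 = K2 * K1^-1 * p1
p2 = K2 @ np.linalg.inv(K1) @ p1
p2 /= p2[2]                            # back to inhomogeneous pixel coordinates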
The more difficult approach would be to reconstruct the 3D structure of the scene by finding features that you can match between both images, and then triangulate the point with both rays.

Threshold to amplify black lines

Given an image (like the one given below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I can't get perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried are, the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; more artefacts appear beyond this).
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward. Change the input image to the appropriate color space and check the Euclidean distance with the black color.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
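As one readily available implementation (not tied to either paper), scikit-image ships a Sauvola threshold; the window size and k below are values you would tune:

from skimage import io
from skimage.filters import threshold_sauvola

image = io.imread("scan.png", as_gray=True)

# Each pixel is compared against a threshold computed from the mean and
# standard deviation of its local window.
t = threshold_sauvola(image, window_size=25, k=0.2)
binary = image > t                     # True = light background, False = dark lines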
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results come with a window size larger than 11x11; don't forget to choose a negative value for the threshold constant.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
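In OpenCV terms that is roughly the following (the window size and the constant are values to tune):

import cv2

img = cv2.imread("scan.png")
v = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2]        # V channel

# cv2.adaptiveThreshold uses (local mean - C) as the per-pixel threshold, so a
# negative C raises the threshold and keeps the dark lines black.
binary = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 21, -10)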
Due to the noise and the illumination variation, you may need adaptive local thresholding; thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do mean or median local thresholding; I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on this technique.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get rid of the remaining noise you can just find the longest connected component in the skeletonized image.
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: take the input and scale the intensities (gamma correct) with parameters that simply dull the mid-tones, without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element window size of 11 and a constant threshold of 0.1 (25.5 on a 0-255 scale).
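A short OpenCV sketch of the black top-hat step, using the 11x11 structuring element mentioned above (the file name is a placeholder):

import cv2

gray = cv2.imread("map.png", cv2.IMREAD_GRAYSCALE)

# Black top-hat: closing(image) - image. Thin dark structures such as the lines
# come out bright, ready for a simple global threshold.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)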
You should get something like:
Which you can then easily threshold:
Best of luck.

Convert to grayscale and reduce the size

I am trying to develop an OCR in VB6 and I have some problems with the BMP format. I have been investigating the OCR process and the first step is to convert the image to "black and white" with a threshold. The conversion process is easy to understand and I have done it. However, I'm trying to reduce the size of the resulting image because it uses fewer colors (each pixel only has 256 possible values in grayscale). In the original image I have 3 colors (red, green and blue) but now I only need one color (the grayscale value). At the moment I have achieved the conversion, but the resulting grayscale images have the same size as the original color image (I assign the same value to the three channels).
I have tried to modify the header of the BMP file but I haven't achieved anything and now I don't understand how it works. For example, if I convert the image with Paint, the offset that is specified in the header changes its value. If the header is constant, why does the offset change its value?
The thing is that a grey-scale bitmap image is the same size as a color bitmap image because the data that is used to save the grey colors takes just as much space as the color.
The only difference is that grey is just the same value three times, (160,160,160) for example, whereas color gives something like (123,200,60). The grey values are just a small subset of the RGB field.
You can trim down the size after converting to grey-scale by converting it from 24-bit to 16-bit or 8-bit, for example. Whether that is already supplied to you depends on what you are using to do the conversion; otherwise you'll have to make it yourself.
You can also try using something other than BMP images. PNG files are lossless too, and would even save space with the 24-bit version. Image processing libraries usually give you several options as output formats. Otherwise you can probably find a library that does this for you.
You can write your own conversion in a "lockbits" method. It takes a while to understand how to lock/unlock bits correctly, but the effort is worth it, and once you have the code working you'll see how it can be applied to other scenarios. For example, using a lock/unlock bits technique you can access the pixel values from a bitmap, copy those pixel values into an array, manipulate the array, and then copy the modified array back into a bitmap. That's much faster than calling GetPixel() and SetPixel(). That's still not the fastest image manipulation code one can write, but it's relatively easy to implement and maintain the code.
It's been a while since I've written VB6 code, but Bob Powell's often has good examples, and he has a page about lock bits:
https://web.archive.org/web/20121203144033/http://www.bobpowell.net/lockingbits.htm
In a pinch you could create a new Bitmap of the appropriate format and call SetPixel() for every pixel:
Every pixel (x,y) in your 24-bit color image will have a color value (r,g,b)
After conversion to a 24-bit gray image, each pixel (x,y) will have three equal values, one for each color channel; that can be expressed as (n,n,n), as Willem wrote in his reply. If all three colors R,G,B have the same value, then you can say that value is the "grayscale" value of that pixel. This is the same shade of gray that you will see in your final 8-bit bitmap.
Call SetPixel for each pixel (x,y) in a newly created 8-bit bitmap that is the same width and height as the original color image.
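The question is about VB6, but the 24-bit-to-8-bit step itself is language-independent; purely as a compact illustration, here is the same idea in Python with Pillow (file names are placeholders):

from PIL import Image

# Open the 24-bit colour bitmap and convert it to a single-channel 8-bit image.
# Mode "L" stores one grayscale byte per pixel instead of three RGB bytes.
color = Image.open("scan.bmp")
gray = color.convert("L")
gray.save("scan_gray.bmp")             # written as an 8-bit BMP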

Differences between gamma correction and exposure in image processing

Anyone know what is the difference between gamma and exposure? And what is the difference between gamma correction and exposure adjustment in image processing?
Since you don't have an image processing background, I would start with the basics.
1) Every digital image has a dynamic range of gray levels. Gray levels are nothing but values which ultimately correspond to a color. Say a monochrome (black and white) image has only 2 gray levels, i.e. 0 and 1, where 0 means black and 1 means white. Here the dynamic range is [0-1]. In these images each pixel is stored as a single bit.
Similarly, gray-scale images have shades of gray in them. Here each pixel is stored as 8 bits, so the dynamic range is [0-255]. How? Just apply the formula (2^n - 1), where n is the number of bits: (2^8 - 1) = 256 - 1 = 255.
Similarly there are color images, which are 24-bit images. In general the dynamic range of gray levels in an image is given by [0, L-1], where L is the number of gray levels.
2) Now that you have understood what dynamic range is, let's look at gamma correction. Gamma correction is nothing but a function that compresses the dynamic range of an image so that we can view it more nicely or properly. But why do we need to compress the dynamic range? A good day-to-day example: during the day we cannot see the stars, because the intensity of the sun is so large compared to the intensity of the stars. Similarly, when the dynamic range of an image is higher than that of the display device, we cannot see the image properly. Therefore we can use gamma correction to compress the dynamic range of the image.
3) Gamma correction can be written as g(x,y) = c * f(x,y)^γ, where γ is the gamma value, f(x,y) is the original image with a high dynamic range, g(x,y) is the modified image, and c is a positive constant.
4) Exposure, as said in another answer, is a phenomenon in the camera. I don't know much about it, as it is not covered in the image processing syllabus I am currently studying.
Gamma correction is a non-linear global function that compresses certain ranges in your image. It is mainly used in order to be more efficient from a human vision point of view, in fixed-point formats. It is absent in raw files, but present in JPEG. Each pixel undergoes the following transformation:
y = x^p
Exposure is a physical phenomenon in your camera. Exposure adjustment, on the other hand, is a linear global function. It is used mainly to compensate for a lack or excess of exposure in the camera:
y = a*x
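A small NumPy illustration of the two transformations on a normalized image (the values of p and a are arbitrary examples):

import numpy as np

img = np.random.rand(4, 4).astype(np.float32)    # stand-in image, values in [0, 1]

p = 1.0 / 2.2                                    # gamma correction: non-linear power per pixel
gamma_corrected = img ** p

a = 1.5                                          # exposure adjustment: linear gain
exposure_adjusted = np.clip(a * img, 0.0, 1.0)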
Exposure is an indication of the total quantity of light that reaches the CCD of your camera (or the silver ions on film). It can be expressed as the number of photons that hit your image-recording elements.
Films and CCDs are calibrated to expect a certain quantity of light (a certain number of photons) in order to be able to create an "average" image.
The higher the "expected" quantity of light, the lower the ISO number of your film (or camera setting) => in order to obtain a normal image, a film (or camera setting) of 100 ISO needs more light than a film of 3200 ISO, hence the use of 3200 ISO films for night photography.
next step: the camera thing. When you want to make a picture (= have photons hit your CCD or film), you need to open the diaphragm of your camera. Depending on how much you open your diaphragm, the nature of your image will change (speaking from an artistic point of view here). If your diaphragm is wide open, most of the image which is not perfectly in focus will be blurred (e.g. as used in portrait photography). Conversely, if your diaphragm is only a little bit open during exposure, most of your image will be very sharp. This is used very often for landscape photography.
As your film (or CCD) expects a certain quantity of light at a given ISO value, it is obvious that a smaller diaphragm opening requires longer exposure times, whereas a wide open diaphragm requires a very short time.
Good books about this subject are the series "The Camera", "The Negative" and "The Print" by Ansel Adams.
Conclusion: exposure and gamma correction are different things.
- Exposure is a part of the parameters you need to control while creating your initial image through the use of a camera.
- Gamma correction is related to subsequent manipulation of your image file. I'm not sure if the notion of "gamma correction" is being used in the context of film.
Basically:
Gamma is a monitor thing.
Exposure is a camera thing.
