The task at hand is to split a raw BGR image into N equal parts. Can someone give me a hint on how raw BGR images are stored in memory?
For example:
If I have a 1920 x 1080 pixel BGR image and I would like to split it into 8 equal parts, is there any available framework that can help me? I'm trying to write native C++ code on Android, and working with OpenCV would be expensive, so is there any other alternative?
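For reference, my current understanding of the layout, written as a Python/numpy sketch (the file name "frame.bgr" and the assumption that rows are tightly packed with no padding are mine; the same offset arithmetic would apply in native C++):
import numpy as np

width, height, parts = 1920, 1080, 8
# Raw interleaved BGR: 3 bytes per pixel (B, G, R, B, G, R, ...), rows stored one after another
raw = np.fromfile("frame.bgr", dtype=np.uint8)
frame = raw.reshape(height, width, 3)

# Each horizontal strip is a contiguous block of (height/parts) * width * 3 bytes,
# so splitting into 8 strips needs no pixel copies at all
strips = np.array_split(frame, parts, axis=0)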
Related
I wonder which of the two methods below should preserve more image detail:
Down scaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 images and then down scaling them.
Thanks for your recommendation.
Updated 2020-02-04:
To make my question clearer, I want to describe it a little more.
The images come from a video stream like this:
Video Stream
-> decoded to YV12.
-> converted to BGRA.
-> stamped texts.
-> scaling down (or YV12/NV12).
-> YV12/NV12 (or scaling down).
-> H264 encoder.
-> video stream.
The whole sequence of tasks ranges from 300 to 500ms.
The issue I have is that the text stamped over the images does not look clear after it has been converted and scaled. I wonder about the order of the scaling and conversion steps above: scale down then convert to YV12/NV12, or convert to YV12/NV12 then scale down?
Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to:
Convert from the non-linear "R'G'B'" data to linear RGB (Note this needs higher bit precision per channel) (see function spec on wikipedia)
Apply your downscaling filter
Convert the linear result back to non-linear R'G'B' (ie. sRGB)
Convert this to YCbCr/NV12
Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this, the average of black (0) and white (255) in linear colour space will be ~128 but in sRGB this mid grey is represented as (IIRC) 186. If you thus do your maths in sRGB space, your result will look unnaturally dark/murky.
(If you are in a hurry, you can sometimes get away with just using squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa))
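For illustration, here is a minimal Python sketch of that sequence using the square/sqrt kludge (a plain 2x2 box filter stands in for "your downscaling filter"; treat this as an approximation, not a colour-accurate conversion):
import numpy as np

def downscale_2x_linear(srgb):
    # srgb: HxWx3 uint8 array of non-linear (sRGB-like) pixel data
    lin = (srgb.astype(np.float32) / 255.0) ** 2     # kludge: sRGB -> approx. linear
    h, w = lin.shape[:2]
    lin = lin[:h - h % 2, :w - w % 2]                # make both dimensions even
    # 2x2 box filter applied in linear space
    lin = (lin[0::2, 0::2] + lin[1::2, 0::2] +
           lin[0::2, 1::2] + lin[1::2, 1::2]) / 4.0
    out = np.sqrt(lin)                               # kludge: linear -> approx. sRGB
    return (out * 255.0 + 0.5).astype(np.uint8)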
To avoid two phases of spatial interpolation, the following order is recommended:
Convert RGBA to YUV444 (YCbCr) without resizing.
Resize Y channel to your destination resolution.
Resize U (Cb) and V (Cr) channels to half resolution in each axis.
The result format is YUV420 in the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 in specific data ordering).
It is possible to do the resize and NV12 packing in a single pass (if efficiency is a concern).
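A minimal sketch of that order in Python with OpenCV (assumptions: even output dimensions, BGRA input as in the question, and cv2's YCrCb conversion standing in for the exact YUV matrix and range your encoder expects):
import cv2
import numpy as np

def bgra_to_nv12(bgra, dst_w, dst_h):
    # Full-resolution "YUV444" first: convert without resizing
    ycrcb = cv2.cvtColor(cv2.cvtColor(bgra, cv2.COLOR_BGRA2BGR), cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)

    # Resize Y to the destination resolution and chroma straight to half resolution,
    # so U and V are only interpolated once (INTER_AREA also acts as an anti-alias filter)
    y = cv2.resize(y, (dst_w, dst_h), interpolation=cv2.INTER_AREA)
    cb = cv2.resize(cb, (dst_w // 2, dst_h // 2), interpolation=cv2.INTER_AREA)
    cr = cv2.resize(cr, (dst_w // 2, dst_h // 2), interpolation=cv2.INTER_AREA)

    # NV12 packing: full Y plane followed by an interleaved Cb/Cr plane
    uv = np.empty((dst_h // 2, dst_w), dtype=np.uint8)
    uv[:, 0::2] = cb
    uv[:, 1::2] = cr
    return np.vstack([y, uv])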
In case you don't do the conversion to YUV444, the U and V channels are going to be interpolated twice:
First interpolation when downscaling RGBA.
Second interpolation when U and V are downscaled by half when converting to 420 format.
When downscaling the image it's recommended to blur the image before downscaling (sometimes referred to as an "anti-aliasing" filter).
Remark: since the eye is less sensitive to chromatic resolution, you are probably not going to see any visible difference (unless the image has fine-resolution graphics like colored text).
Remarks:
Simon's answer is more accurate in terms of color accuracy.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.
Update: regarding the text stamped over the images not looking clear after conversion and scaling:
In case getting clear text is the main issue, the following stages are suggested:
Downscale BGRA.
Stamp text (using smaller font).
Convert to NV12.
Downsampling an image with stamped text is going to result in unclear text.
A better solution is to stamp the text with a smaller font after downscaling.
Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image with stamped text.
The NV12 format is YUV420: the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower compared to RGB or YUV444 formats.
Encoding an image with text is also going to degrade the text.
For subtitles the solution is attaching the subtitles as a separate stream and adding the text after decoding the video.
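A minimal sketch of the first two stages (downscale, then stamp) with OpenCV in Python; the font, position and colour are arbitrary choices, and the NV12 conversion would follow as in the earlier sketch:
import cv2

def downscale_then_stamp(bgra, dst_w, dst_h, text):
    small = cv2.resize(bgra, (dst_w, dst_h), interpolation=cv2.INTER_AREA)
    # Render the text at the output resolution so it is never resampled
    cv2.putText(small, text, (16, dst_h - 16), cv2.FONT_HERSHEY_SIMPLEX,
                0.7, (255, 255, 255, 255), 2, cv2.LINE_AA)
    return small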
I am looking at some view (screen) building code used to draw GUIs, such as transforming a distorted lens view into a planar view. In it I came across the terms forward LUT and reverse LUT, and I don't understand what they are or why they are used.
Can someone please explain them to me or give some pointers to where I can learn about them?
A "Look Up Table", or LUT is a small table, normally with 256 entries in it. It is used for applying "point processes" to images, i.e. where the new value after processing of each pixel only depends on the previous value at that point (and not any neighbouring pixels).
Instead of doing a load of maths or if statements for each of the 12 million pixels in your image, you just use the current 8-bit value of each pixel as an index into the Look Up Table to find the new value for that pixel. It is normally much faster than stalling your CPU doing if statements as it is just an indexing operation into a table. It is also very simple to implement in hardware at high speed.
You can use it to threshold an image, or to alter the contrast in an image, or to save space. In this last technique, you are basically creating an image with a palette of 256 colours, then instead of storing 3 bytes for each pixel (i.e. R, G and B), you just store 1 byte and use that byte to "look up" the colour - and as if by magic, your image is 1/3rd the size.
Here is a little example, I make a LUT with all elements below 64 black and all elements above that into white, then apply it to a greyscale image. I added the red border afterwards so you can see the extent of the image on Stack Overflow's white background:
#!/usr/local/bin/python3
import numpy as np
from PIL import Image
# Open the input image as numpy array, convert to greyscale
npImage=np.array(Image.open("grey.png").convert("L"))
# Make a LUT (Look-Up Table) to translate image values
LUT = np.zeros(256, dtype=np.uint8)
for idx in range(64, 256):
    # All pixels with value 64 or above become white
    LUT[idx] = 255
# Apply LUT
npImage = LUT[npImage]
# Save resulting image
Image.fromarray(npImage).save('result.png')
Start Image:
Result Image:
Here's another example where I make the LUT run backwards, so it inverts the image.
#!/usr/local/bin/python3
import numpy as np
from PIL import Image
# Open the input image as numpy array, convert to greyscale
npImage=np.array(Image.open("grey.png").convert("L"))
# Make a LUT (Look-Up Table) to translate image values to their inverse/negative
# i.e. 0 input maps to 255 output
# 1 input maps to 254 output
LUT = np.arange(255,-1,-1,dtype=np.uint8)
# Apply LUT
npImage = LUT[npImage]
# Save resulting image
Image.fromarray(npImage).save('result.png')
Keywords: Python, Numpy, image, image processing, LUT, Look-Up Table, Lookup, negate, inverse, threshold
I have an RGBN band .tif satellite image from PlanetScope which I would like to preprocess for a neural network. When I view the image in QGIS I get a nice RGB image; however, when importing it as a numpy array the image is very light. Some information on the image:
Type of the image : <class 'numpy.ndarray'>
Shape of the image : (7327, 7327, 5)
Image Height 7327
Image Width 7327
Image Shape (7327, 7327, 5)
Dimension of Image 3
Image size 268424645
Maximum RGB value in this image 65535
Minimum RGB value in this image 1
The image is of uint16 type. The last band (pic[:,:,4]) only contains a single value (65535) everywhere. Hence, I think this band should be removed, leaving the RGBN bands, whose information is as follows:
Type of the image : <class 'numpy.ndarray'>
Shape of the image : (7327, 7327, 4)
Image Height 7327
Image Width 7327
Image Shape (7327, 7327, 4)
Dimension of Image 3
Image size 214739716
Maximum RGB value in this image 19382
Minimum RGB value in this image 1
The maximum value (19382) of the RGBN image seems pretty low knowing that the range of uint16 images is 0-65535. Subsequently the function 'skimage.io.imshow(image)' shows a nearly white image. I do not understand why QGIS is able to show the image properly in real color but python does not.
The image is loaded by means of pic = skimage.io.imread("planetscope_20180502_43.tif")
I have tried scaling the image with img_scaled = pic / pic.max() and converting it to uint8 before viewing the image with img_as_ubyte(pic) without success. I view the image with skimage.io.imshow(pic).
If necessary the image can be downloaded here. I incorporate the image because somehow it seems not possible to import the image using certain packages (Tifffile for example does not work on this tif file).
The max values of the RGB channels are lower than that of the N channel:
>>> pic.max(axis=(0,1))
array([10300, 7776, 11530, 19382, 65535], dtype=uint16)
But look at the mean values of the RGB channels: they are much smaller than max/2:
>>> pic.mean(axis=(0,1))
array([ 439.14001492, 593.17588875, 542.4638124 , 3604.6826063 ,
65535. ])
You have a high dynamic range (HDR) image here and want to compress its high range to 8 bits for displaying. A linear scaling with the maximum value won't do as the highest peaks are an order of magnitude higher than the average image values. Plotting the histogram of the RGB values:
If you do a linear scaling with some factor that is a bit above the mean, and simply clip the remaining (now overexposed) values, you can display it and see that you have valid data:
import numpy as np  # pic is the array loaded as in the question
# Scale by a factor a bit above the mean, then clip the (overexposed) rest
rgb = pic[..., :3].astype(np.float32) / 2000
rgb = np.clip(rgb, 0.0, 1.0)
But to get a proper image, you will need to look into what the camera response of your data is, and how these HDR images are usually compressed into 8 bits for displaying (I'm not familiar with satellite imaging).
Thank you w-m, I was able to build on that and figured it out. Since w-m already did a neat job of elaborating on the problem, I will just leave the code here that I wrote to resolve the issue:
import numpy as np
import skimage

# Stretch each of the 4 bands to its 2nd-98th percentile range
image = image.astype(np.float64)
for i in range(4):
    min_ = np.percentile(image[:, :, i], 2)
    max_ = np.percentile(image[:, :, i], 98)
    # np.interp clamps values outside (min_, max_) and maps the range linearly to [0, 1]
    image[:, :, i] = np.interp(image[:, :, i], (min_, max_), (0, 1))

image_8bit_scaled = skimage.img_as_ubyte(image)
I would like to compare videos, specifically their quality (blurry or not), by coding a C program. Someone told me to learn about the DFT (Discrete Fourier Transform) for image analysis and to use an FFT or DFT tool to learn the difference between blurred and detailed (non-blurry) copies of the same image.
(copied from another question):
Let's say we have different files with different video quality: one is extremely clear, another is blurred, and one has rough colors. Compare all files basically frame by frame and report to the user which has better quality.
So can anyone help me with this?
Let's say we have various files having different video quality:
one is extremely clear, another is blurred, one has rough colors.
Compare all files basically frame by frame and report to the user which has better quality.
(1) Color Quality detection...
To check which has better color, you analyze the histograms of the test images. A histogram is a count of how many pixels have intensity X, where X is a number ranging from 0 to 255 (because the red, green and blue channels each hold one of those 256 possible intensities).
There are many tutorials online about how to create a histogram since it's a basic task in computer graphics.
Generally it goes like:
First make 3 arrays (eg: hist_Red) to hold data for red, green and blue channels.
Break up (using a FOR loop) each pixel into its individual R/G/B channel components:
example:
temp_Red = this_pixel >> 16 & 0x0ff;
temp_Grn = this_pixel >> 8 & 0x0ff;
temp_Blu = this_pixel >> 0 & 0x0ff;
Then add +1 to that specific red/green/blue intensity in the relevant histogram.
example:
hist_Red[ temp_Red ] += 1;
hist_Grn[ temp_Grn ] += 1;
hist_Blu[ temp_Blu ] += 1;
By adding up the totals of red, green and blue you will have the total intensities of R, G and B in arrays that could build charts like the one below. Check which image's arrays have the most values to find the image with the better quality of colors:
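One crude way to turn that comparison into a number, as a Python sketch (the "count how many intensity levels are actually used" measure is my own simplification, not an established metric):
import numpy as np

def colour_spread_score(rgb):
    # rgb: HxWx3 uint8 frame
    score = 0
    for ch in range(3):
        hist, _ = np.histogram(rgb[:, :, ch], bins=256, range=(0, 256))
        score += np.count_nonzero(hist)   # how many of the 256 levels are used
    return score                          # higher = wider spread of colours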
(2) Detailed vs Blurred detection...
You can try using a convolution filter to detect blur in an image. Give the filter a kernel (e.g. a matrix). The 3x3 matrix shown below gives an edge-detect filter, where blurred images produce fewer edges (and therefore more black pixels).
Use the logic that more black pixels equals a more blurred image (less detail).
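A minimal Python sketch of that idea (the exact kernel from the answer's image is not reproduced here, so a standard 3x3 Laplacian-style edge kernel is used as a stand-in):
import numpy as np
from scipy.ndimage import convolve

EDGE_KERNEL = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=np.float32)

def sharpness_score(grey):
    # grey: HxW greyscale frame; blurred frames give weak edge responses
    edges = convolve(grey.astype(np.float32), EDGE_KERNEL)
    return float(np.mean(np.abs(edges)))  # lower = more "black" pixels = blurrier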
You can read about convolutions here
Lode's Computer Graphics Tutorial: Image Filtering
Image Convolution with C/C++ code
PDF Image Manipulation: Filters and Convolutions
PDF Read page 10 onwards : Convolution filters
Given an image (like the one below), I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried two thresholding functions. The problem is I can't get perfect edges using either of them. Any help would be greatly appreciated.
The filters I have tried are the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; more artefacts appear beyond this):
Here it is after running an HSV threshold filter (at 30% the paths become barely visible, but it is clearly unusable because of the noise):
The code I am using is pretty straightforward. Change the input image to the appropriate color space and check the Euclidean distance to the black color:
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
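For what it's worth, scikit-image also ships a Sauvola implementation; a minimal Python sketch (the file name and window size are placeholders to tune):
import skimage.io
from skimage.color import rgb2gray
from skimage.filters import threshold_sauvola

grey = rgb2gray(skimage.io.imread("scan.png"))    # placeholder file name
thresh = threshold_sauvola(grey, window_size=25)  # local threshold per pixel
binary = grey > thresh                            # False where the dark lines are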
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. The best results come with a window size larger than 11x11; don't forget to choose a negative value for the threshold constant.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel)
    Pixel_Binary = 1;
else
    Pixel_Binary = 0;
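A minimal scikit-image sketch of that idea (the file name, block size and offset are guesses you would need to tune):
import skimage.io
from skimage.color import rgb2hsv
from skimage.filters import threshold_local

img = skimage.io.imread("scan.png")            # placeholder file name
v = rgb2hsv(img[:, :, :3])[:, :, 2]            # V channel, floats in [0, 1]
# block_size is the window (larger than 11x11 here); a negative offset plays the
# role of the negative threshold constant mentioned above
local_thresh = threshold_local(v, block_size=25, offset=-0.05)
binary = v > local_thresh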
Due to the noise and the illumination variation you may need adaptive local thresholding; thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do mean or median local thresholding; I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on these techniques.
To make sure the thresholding was working fine, I skeletonized it to see if there were any line breaks. This skeleton may be the one needed for further processing.
To get rid of the remaining noise you can just find the longest connected component in the skeletonized image.
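A minimal sketch of the skeletonize-then-keep-the-largest-component step with scikit-image (assuming a boolean image where True marks the foreground paths):
import numpy as np
from skimage.measure import label
from skimage.morphology import skeletonize

def largest_skeleton_component(binary):
    skeleton = skeletonize(binary)
    labels = label(skeleton, connectivity=2)
    if labels.max() == 0:
        return skeleton                  # nothing left to keep
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                         # ignore the background label
    return labels == np.argmax(sizes)    # keep only the largest component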
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: take the input and scale the intensities (gamma correct) with parameters that simply dull the mid tones, without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
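A rough Python sketch of the three steps (the gamma value and the final threshold are guesses; since the 5x5 kernel above is a smoothing kernel, its difference from the levelled image is used here as the edge response):
import numpy as np
from scipy.ndimage import convolve

# The 5x5 kernel from step 2, normalised so its weights sum to 1
K = np.array([[1, 2, 3, 2, 1],
              [2, 3, 4, 3, 2],
              [3, 4, 5, 4, 3],
              [2, 3, 4, 3, 2],
              [1, 2, 3, 2, 1]], dtype=np.float32)
K /= K.sum()

def level_edge_threshold(grey, gamma=2.0, edge_thresh=0.05):
    # grey: HxW greyscale image scaled to [0, 1]
    levelled = grey ** gamma                 # step 1: dull the mid tones
    smoothed = convolve(levelled, K)         # step 2: small-kernel convolution
    edges = np.abs(levelled - smoothed)      # difference-of-blur edge response
    return edges > edge_thresh               # step 3: threshold to binary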
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element with a window size of 11 and a constant threshold of 0.1 (25.5 on a 0-255 scale).
You should get something like:
Which you can then easily threshold:
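A minimal scikit-image sketch of the black top-hat plus threshold (the file name is a placeholder; the 11x11 structuring element and 0.1 threshold follow the description above):
import numpy as np
import skimage.io
from skimage.color import rgb2gray
from skimage.morphology import black_tophat

grey = rgb2gray(skimage.io.imread("scan.png"))   # placeholder file name, floats in [0, 1]
tophat = black_tophat(grey, np.ones((11, 11)))   # closing(image) - image
binary = tophat > 0.1                            # 0.1 on a 0-1 scale = 25.5 on 0-255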
Best of luck.