OpenCV (Emgu.CV) -- compositing images with alpha - image-processing

I'm using Emgu.CV to perform some basic image manipulation and composition. My images are loaded as Image<Bgra,Byte>.
Question #1: When I use the Image<,>.Add() method, the images are always blended together, regardless of the alpha value. Instead I'd like them to be composited one atop the other, and use the included alpha channel to determine how the images should be blended. So if I call image1.Add(image2) any fully opaque pixels in image2 would completely cover the pixels from image1, while semi-transparent pixels would be blended based on the alpha value.
Here's what I'm trying to do in visual form. There's a city image with some "transparent holes" cut out, and a frog behind. This is what it should look like:
And this is what openCV produces.
How can I get this effect with OpenCV? And will it be as fast as calling Add()?
Question #2: is there a way to perform this composition in-place instead of creating a new image with each call to Add()? (e.g. image1.AddImageInPlace(image2) modifies the bytes of image1?)
NOTE: Looking for answers within Emgu.CV, which I'm using because of how well it handles perspective warping.

Before OpenCV 2.4 there was no support of PNGs with alpha channel.
To verify if your current version supports it, print the number of channels after loading an image that you are certain to be RGBA. If it supports, the application will output the number 4, else it will output number 3 (RGB). Using the C API you would do:
IplImage* t_img = cvLoadImage(argv[1], CV_LOAD_IMAGE_UNCHANGED);
if (!t_img)
{
printf("!!! Unable to load transparent image.\n");
return -1;
}
printf("Channels: %d\n", t_img->nChannels);
If you can't update OpenCV:
There are some posts around that try to bypass this limitation but I haven't tested them myself;
The easiest solution would be to use another API to load the image and blend it, check blImageBlending;
Another alternative, not as lightweight, is to use Qt.
If your version already supports PNGs with RGBA:
Take a look at Emulating photoshop’s blending modes in OpenCV. It implements several Photoshop blending modes and I imagine you are capable of converting that code to .Net.
EDIT:
I had to deal with this problem recently and I've demonstrated how to deal with it on this answer.

You'll have to iterate through each pixel. I'm assuming image 1 is the frog image, and image 2 is the city image, with image1 always being bigger than image2.
//to simulate image1.AddInPlace(image2)
int image2w = image2.Width;
int image2h = image2.Height;
int i,j;
var alpha;
for (i = 0; i < w; i++)
{
for (j = 0; j < h; j++)
{
//alpha=255 is opaque > image2 should be used
alpha = image2[3][j,i].Intensity;
image1[j, i]
= new Bgra(
image2[j, i].Blue * alpha + (image1[j, i].Blue * (255-alpha)),
image2[j, i].Green * alpha + (image1[j, i].Green * (255-alpha)),
image2[j, i].Red * alpha + (image1[j, i].Red * (255-alpha)));
}
}

Using Osiris's suggestion as a starting point, and having checked out alpha compositing on Wikipedia, i ended up with the following which worked really nicely for my purposes.
This was used this with Emgucv. I was hoping that the opencv gpu::AlphaComposite methods were available in Emgucv which I believe would have done the following for me, but alas the version I am using didn't appear to have them implemented.
static public Image<Bgra, Byte> Overlay( Image<Bgra, Byte> image1, Image<Bgra, Byte> image2 )
{
Image<Bgra, Byte> result = image1.Copy();
Image<Bgra, Byte> src = image2;
Image<Bgra, Byte> dst = image1;
int rows = result.Rows;
int cols = result.Cols;
for (int y = 0; y < rows; ++y)
{
for (int x = 0; x < cols; ++x)
{
// http://en.wikipedia.org/wiki/Alpha_compositing
double srcA = 1.0/255 * src.Data[y, x, 3];
double dstA = 1.0/255 * dst.Data[y, x, 3];
double outA = (srcA + (dstA - dstA * srcA));
result.Data[y, x, 0] = (Byte)(((src.Data[y, x, 0] * srcA) + (dst.Data[y, x, 0] * (1 - srcA))) / outA); // Blue
result.Data[y, x, 1] = (Byte)(((src.Data[y, x, 1] * srcA) + (dst.Data[y, x, 1] * (1 - srcA))) / outA); // Green
result.Data[y, x, 2] = (Byte)(((src.Data[y, x, 2] * srcA) + (dst.Data[y, x, 2] * (1 - srcA))) / outA); // Red
result.Data[y, x, 3] = (Byte)(outA*255);
}
}
return result;
}
A newer version, using emgucv methods. rather than a loop. Not sure it improves on performance.
double unit = 1.0 / 255.0;
Image[] dstS = dst.Split();
Image[] srcS = src.Split();
Image[] rs = result.Split();
Image<Gray, double> srcA = srcS[3] * unit;
Image<Gray, double> dstA = dstS[3] * unit;
Image<Gray, double> outA = srcA.Add(dstA.Sub(dstA.Mul(srcA)));// (srcA + (dstA - dstA * srcA));
// Red.
rs[0] = srcS[0].Mul(srcA).Add(dstS[0].Mul(1 - srcA)).Mul(outA.Pow(-1.0)); // Mul.Pow is divide.
rs[1] = srcS[1].Mul(srcA).Add(dstS[1].Mul(1 - srcA)).Mul(outA.Pow(-1.0));
rs[2] = srcS[2].Mul(srcA).Add(dstS[2].Mul(1 - srcA)).Mul(outA.Pow(-1.0));
rs[3] = outA.Mul(255);
// Merge image back together.
CvInvoke.cvMerge(rs[0], rs[1], rs[2], rs[3], result);
return result.Convert<Bgra, Byte>();

I found an interesting blog post on internet, which I think is related to what you are trying to do.
Please have a look at the Creating Overlays Method (archive.org link). You can use this idea to implement your own function to add two images in the way you mentioned above, making some particular areas in the image transparent while leaving the rest as it is.

Related

Preprocessing for OCR [duplicate]

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.

how to remove a stamp from an image with opencv

I am working on a OCR project, and in the preprocessing, some RED stamps need to be removed, so that the text near the stamps could be detected. I try a lot of methods(like change the values of pixel, threshold in Red channel) but fail.
Any suggestions are highly appreciated.
Python, C++, Java or what? Since you didn't state the OpenCV implementation you are using, I'm giving my answer in C++.
An option is to use the HSV color space to filter out the range of red values that defines the seal. My approach is to use the CMYK color space to filter everything except the black (or dark) text. It should do a pretty good job on printed media, which is your case.
//read input image:
std::string imageName = "C://opencvImages//seal.png";
cv::Mat imageInput = cv::imread( imageName );
Now, perform the CMYK conversion. OpenCV does not support this operation out of the box, bear with me as I provide the helper function at the end of this post.
//CMYK conversion:
std::vector<cv::Mat> cmyk;
cmyk = rgb2cmyk( imageInput );
//This is the Black channel:
cv::Mat blackChannel = cmyk[3].clone();
This is the image of the black channel; it is nice how everything that is not black (or dark) practically disappears!
Now, optionally, enhance the result applying brightness and contrast adjustment. Just try to separate the text from the background a little bit better; we want some defined pixel distributions to get a nice binary image.
//Brightness and contrast adjustment:
float alpha = 2.0;
float beta = -50.0;
contrastBrightnessAdjustment( blackChannel, alpha, beta );
Again, OpenCV does not offer brightness and contrast adjustment out of the box; however, its implementation is very easy. Hold on a little bit, and let me show you the result of this operation:
Nice. Let's Otsu-threshold this bad boy to get a nice binary image containing the clean text:
cv::threshold( blackChannel, binaryImage ,0, 255, cv::THRESH_OTSU );
This is what you get:
Now, the RGB to CMYK conversion function. I'm using the following implementation. The function receives an RGB image and returns a vector containing each of the CMYK channels
std::vector<cv::Mat> rgb2cmyk( cv::Mat& inputImage ){
std::vector<cv::Mat> cmyk;
for (int i = 0; i < 4; i++) {
cmyk.push_back( cv::Mat( inputImage.size(), CV_8UC1 ) );
}
std::vector<cv::Mat> inputRGB;
cv::split( inputImage, inputRGB );
for (int i = 0; i < inputImage.rows; i++)
{
for (int j = 0; j < inputImage.cols; j++)
{
float r = (int)inputRGB[2].at<uchar>(i, j) / 255.;
float g = (int)inputRGB[1].at<uchar>(i, j) / 255.;
float b = (int)inputRGB[0].at<uchar>(i, j) / 255.;
float k = std::min(std::min(1-r, 1-g), 1-b);
cmyk[0].at<uchar>(i, j) = (1 - r - k) / (1 - k) * 255.;
cmyk[1].at<uchar>(i, j) = (1 - g - k) / (1 - k) * 255.;
cmyk[2].at<uchar>(i, j) = (1 - b - k) / (1 - k) * 255.;
cmyk[3].at<uchar>(i, j) = k * 255.;
}
}
return cmyk;
}
And the contrastBrightnessAdjustment function is this, implemented using pointer arithmetic. The function receives a grayscale image and applies the linear transformation via the alpha and beta parameters:
void contrastBrightnessAdjustment( cv::Mat inputImage, float alpha, int beta ){
cv::MatIterator_<cv::Vec3b> it, end;
for (it = inputImage.begin<cv::Vec3b>(), end = inputImage.end<cv::Vec3b>(); it != end; ++it) {
uchar &pixel = (*it)[0];
pixel = cv::saturate_cast<uchar>(alpha*pixel+beta);
}
}

How to know if a photo is black or too dark?

I have a UIImagePickerViewController where the user takes a photo. My problem is how to know before uploading the photo to the server if the user is sending a dark photo. I mean a totally or nearly black.
I was researching and I found this:
const UInt8 *pixels = CFDataGetBytePtr(imageData);
UInt8 blackThreshold = 10; // or some value close to 0
int bytesPerPixel = 4;
for(int x = 0; x < width1; x++) {
for(int y = 0; y < height1; y++) {
int pixelStartIndex = (x + (y * width1)) * bytesPerPixel;
UInt8 alphaVal = pixels[pixelStartIndex]; // can probably ignore this value
UInt8 redVal = pixels[pixelStartIndex + 1];
UInt8 greenVal = pixels[pixelStartIndex + 2];
UInt8 blueVal = pixels[pixelStartIndex + 3];
if(redVal < blackThreshold && blueVal < blackThreshold && greenVal < blackThreshold) {
//This pixel is close to black...do something with it
}
}
}
However, I don't know how to apply the algorithm.
Yep that's a fairly simple way of doing it. You could, for example, iterate through and see what percentage of the pixels are pure black (i.e. clipped shadows) or nearly black. Or you could average the pixel colors throughout the whole image and see if it falls below a certain threshold. There are lots of approaches and these two might be a tad simplistic, but I'm not sure if this calls for anything particularly sophisticated. What threshold you want to use is up to you.
Also, while it has little practical impact, if I was going to be picky about the algorithm, I might only perform the "brightness" logic if the alphaVal was over a certain threshold, as well, as the color information is meaningless at transparent portions of image. Having said that, real photos rarely have any transparency, so this may be non-issue.
FYI, here is Apple's code for retrieving the pixel buffer. It's an oldie, but a goodie. (If I recall correctly, the only hassle is that the kCGImageAlphaPremultipliedFirst reference in CreateARGBBitmapContext must be cast with (CGBitmapInfo).)
By the way, if you're trying to determine the luminance of a particular pixel, one common algorithm is:
luminance = 0.2126 * red + 0.7152 * green + 0.0722 * blue

Replicate OpenCV resize with bilinar interpolation in C (shrink only)

I'm trying to make a copy of the resizing algorithm of OpenCV with bilinear interpolation in C. What I want to achieve is that the resulting image is exactly the same (pixel value) to that produced by OpenCV. I am particularly interested in shrinking and not in the magnification, and I'm interested to use it on single channel Grayscale images. On the net I read that the bilinear interpolation algorithm is different between shrinkings and enlargements, but I did not find formulas for shrinking-implementations, so it is likely that the code I wrote is totally wrong. What I wrote comes from my knowledge of interpolation acquired in a university course in Computer Graphics and OpenGL. The result of the algorithm that I wrote are images visually identical to those produced by OpenCV but whose pixel values are not perfectly identical (in particular near edges). Can you show me the shrinking algorithm with bilinear interpolation and a possible implementation?
Note: The code attached is as a one-dimensional filter which must be applied first horizontally and then vertically (i.e. with transposed matrix).
Mat rescale(Mat src, float ratio){
float width = src.cols * ratio; //resized width
int i_width = cvRound(width);
float step = (float)src.cols / (float)i_width; //size of new pixels mapped over old image
float center = step / 2; //V1 - center position of new pixel
//float center = step / src.cols; //V2 - other possible center position of new pixel
//float center = 0.099f; //V3 - Lena 512x512 lower difference possible to OpenCV
Mat dst(src.rows, i_width, CV_8UC1);
//cycle through all rows
for(int j = 0; j < src.rows; j++){
//in each row compute new pixels
for(int i = 0; i < i_width; i++){
float pos = (i*step) + center; //position of (the center of) new pixel in old map coordinates
int pred = floor(pos); //predecessor pixel in the original image
int succ = ceil(pos); //successor pixel in the original image
float d_pred = pos - pred; //pred and succ distances from the center of new pixel
float d_succ = succ - pos;
int val_pred = src.at<uchar>(j, pred); //pred and succ values
int val_succ = src.at<uchar>(j, succ);
float val = (val_pred * d_succ) + (val_succ * d_pred); //inverting d_succ and d_pred, supposing "d_succ = 1 - d_pred"...
int i_val = cvRound(val);
if(i_val == 0) //if pos is a perfect int "x.0000", pred and succ are the same pixel
i_val = val_pred;
dst.at<uchar>(j, i) = i_val;
}
}
return dst;
}
Bilinear interpolation is not separable in the sense that you can resize vertically and the resize again vertically. See example here.
You can see OpenCV's resize code here.

How to convert an 8-bit OpenCV IplImage* to a 32-bit IplImage*?

I need to convert an 8-bit IplImage to a 32-bits IplImage. Using documentation from all over the web I've tried the following things:
// general code
img2 = cvCreateImage(cvSize(img->width, img->height), 32, 3);
int height = img->height;
int width = img->width;
int channels = img->nChannels;
int step1 = img->widthStep;
int step2 = img2->widthStep;
int depth1 = img->depth;
int depth2 = img2->depth;
uchar *data1 = (uchar *)img->imageData;
uchar *data2 = (uchar *)img2->imageData;
for(h=0;h<height;h++) for(w=0;w<width;w++) for(c=0;c<channels;c++) {
// attempt code...
}
// attempt one
// result: white image, two red spots which appear in the original image too.
// this is the closest result, what's going wrong?!
// see: http://files.dazjorz.com/cache/conversion.png
((float*)data2+h*step2+w*channels+c)[0] = data1[h*step1+w*channels+c];
// attempt two
// when I change float to unsigned long in both previous examples, I get a black screen.
// attempt three
// result: seemingly random data to the top of the screen.
data2[h*step2+w*channels*3+c] = data1[h*step1+w*channels+c];
data2[h*step2+w*channels*3+c+1] = 0x00;
data2[h*step2+w*channels*3+c+2] = 0x00;
// and then some other things. Nothing did what I wanted. I couldn't get an output
// image which looked the same as the input image.
As you see I don't really know what I'm doing. I'd love to find out, but I'd love it more if I could get this done correctly.
Thanks for any help I get!
The function you are looking for is cvConvertScale(). It automagically does any type conversion for you. You just have to specify that you want to scale by a factor of 1/255 (which maps the range [0...255] to [0...1]).
Example:
IplImage *im8 = cvLoadImage(argv[1]);
IplImage *im32 = cvCreateImage(cvSize(im8->width, im8->height), 32, 3);
cvConvertScale(im8, im32, 1/255.);
Note the dot in 1/255. - to force a double division. Without it you get a scale of 0.
Perhaps this link can help you?
Edit In response to the second edit of the OP and the comment
Have you tried
float value = 0.5
instead of
float value = 0x0000001;
I thought the range for a float color value goes from 0.0 to 1.0, where 1.0 is white.
Floating point colors go from 0.0 to 1.0, and uchars go from 0 to 255. The following code fixes it:
// h is height, w is width, c is current channel (0 to 2)
int b = ((uchar *)(img->imageData + h*img->widthStep))[w*img->nChannels + c];
((float *)(img2->imageData + h*img2->widthStep))[w*img2->nChannels + c] = ((float)b) / 255.0;
Many, many thanks to Stefan Schmidt for helping me fix this!
If you do not put the dot (.), some compilers will understand is as an int division, giving you a int result (zero in this case).
You can create an IplImage wrapper using boost::shared_ptr and template-metaprogramming. I have done that, and I get automatic garbage collection, together with automatic image conversions from one depth to another, or from one-channel to multi-channel images.
I have called the API blImageAPI and it can be found here:
http://www.barbato.us/2010/10/14/image-data-structure-based-shared_ptr-iplimage/
It is very fast, and make code very readable, (good for maintaining algorithms)
It is also can be used instead of IplImage in opencv algorithms without changing anything.
Good luck and have fun writing algorithms!!!
IplImage *img8,*img32;
img8 =cvLoadImage("a.jpg",1);
cvNamedWindow("Convert",1);
img32 = cvCreateImage(cvGetSize(img8),IPL_DEPTH_32F,3);
cvConvertScale(img8,img32,1.0/255.0,0.0);
//For Confirmation Check the pixel values (between 0 - 1)
for(int row = 0; row < img32->height; row++ ){
float* pt = (float*) (img32->imageData + row * img32->widthStep);
for ( int col = 0; col < width; col++ )
printf("\n %3.3f , %3.3f , %3.3f ",pt[3*col],pt[3*col+1],pt[3*col+2]);
}
cvShowImage("Convert",img32);
cvWaitKey(0);
cvReleaseImage(&img8);
cvReleaseImage(&img32);
cvDestroyWindow("Convert");

Resources