I am working on an OCR project, and during preprocessing some red stamps need to be removed so that the text near the stamps can be detected. I have tried several methods (such as changing pixel values and thresholding the red channel), but none of them worked.
Any suggestions are highly appreciated.
Python, C++, Java or what? Since you didn't state the OpenCV implementation you are using, I'm giving my answer in C++.
An option is to use the HSV color space to filter out the range of red values that defines the seal. My approach is to use the CMYK color space to filter everything except the black (or dark) text. It should do a pretty good job on printed media, which is your case.
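For comparison, the HSV option mentioned above could be sketched per pixel as follows. This is a minimal, OpenCV-free illustration: the names hueDegrees and isReddish are hypothetical, and the hue window and the saturation and value limits are illustrative guesses that would need tuning on real scans.

```cpp
#include <algorithm>
#include <cassert>

// Hue in degrees for an 8-bit RGB pixel; returns 0 for gray pixels.
double hueDegrees(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    const int maxC = std::max({r, g, b});
    const int minC = std::min({r, g, b});
    const double delta = maxC - minC;
    if (delta == 0.0) return 0.0;
    double h;
    if (maxC == r)      h = 60.0 * ((g - b) / delta);
    else if (maxC == g) h = 60.0 * ((b - r) / delta + 2.0);
    else                h = 60.0 * ((r - g) / delta + 4.0);
    return h < 0.0 ? h + 360.0 : h;
}

// True when the pixel is a saturated, reasonably bright red.
// The 20-degree hue window and the 0.4 / 0.3 limits are assumptions.
bool isReddish(int r, int g, int b) {
    const int maxC = std::max({r, g, b});
    const int minC = std::min({r, g, b});
    if (maxC == 0) return false;
    const double saturation = (maxC - minC) / static_cast<double>(maxC);
    const double value = maxC / 255.0;
    const double hue = hueDegrees(r, g, b);
    const bool redHue = hue <= 20.0 || hue >= 340.0;  // red wraps around 0
    return redHue && saturation >= 0.4 && value >= 0.3;
}
```

Pixels for which isReddish returns true would be painted with the background colour. The rest of this answer follows the CMYK route instead.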
//read input image:
std::string imageName = "C://opencvImages//seal.png";
cv::Mat imageInput = cv::imread( imageName );
Now, perform the CMYK conversion. OpenCV does not support this operation out of the box, so bear with me; I provide the helper function at the end of this post.
//CMYK conversion:
std::vector<cv::Mat> cmyk;
cmyk = rgb2cmyk( imageInput );
//This is the Black channel:
cv::Mat blackChannel = cmyk[3].clone();
This is the image of the black channel; it is nice how everything that is not black (or dark) practically disappears!
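To see why the stamp fades, it helps to work the K channel out for a single pixel. The sketch below uses the integer form of k = min(1-r, 1-g, 1-b), which reduces to 255 minus the largest channel; the function name blackChannelValue is hypothetical and the sample colours are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// K channel of an 8-bit RGB pixel, scaled to [0, 255].
// Integer form of k = min(1-r, 1-g, 1-b) with r, g, b normalised to [0, 1].
int blackChannelValue(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    return 255 - std::max({r, g, b});
}
```

A red stamp pixel such as (200, 30, 30) gives K = 55, while dark text such as (30, 30, 30) gives K = 225 and white paper gives K = 0, so only the dark text stays bright in the black channel.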
Now, optionally, enhance the result by applying a brightness and contrast adjustment. The goal is just to separate the text from the background a little better; we want well-defined pixel distributions to get a nice binary image.
//Brightness and contrast adjustment:
float alpha = 2.0;
float beta = -50.0;
contrastBrightnessAdjustment( blackChannel, alpha, beta );
Again, OpenCV has no dedicated brightness and contrast function (although cv::Mat::convertTo accepts a scale and an offset and can apply the same linear transform); implementing it by hand is very easy. Hold on a little bit, and let me show you the result of this operation:
Nice. Let's Otsu-threshold this bad boy to get a nice binary image containing the clean text:
cv::Mat binaryImage;
cv::threshold( blackChannel, binaryImage, 0, 255, cv::THRESH_OTSU );
This is what you get:
Now, the RGB to CMYK conversion function. I'm using the following implementation. The function receives an image as loaded by cv::imread (channels stored in BGR order) and returns a vector containing each of the CMYK channels:
std::vector<cv::Mat> rgb2cmyk( cv::Mat& inputImage ){
std::vector<cv::Mat> cmyk;
for (int i = 0; i < 4; i++) {
cmyk.push_back( cv::Mat( inputImage.size(), CV_8UC1 ) );
}
std::vector<cv::Mat> inputRGB;
cv::split( inputImage, inputRGB );
for (int i = 0; i < inputImage.rows; i++)
{
for (int j = 0; j < inputImage.cols; j++)
{
float r = (int)inputRGB[2].at<uchar>(i, j) / 255.;
float g = (int)inputRGB[1].at<uchar>(i, j) / 255.;
float b = (int)inputRGB[0].at<uchar>(i, j) / 255.;
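float k = std::min(std::min(1-r, 1-g), 1-b);
// For pure black (k == 1) the C, M and Y terms are 0/0; define them as 0.
float scale = (k < 1.f) ? 255.f / (1.f - k) : 0.f;
cmyk[0].at<uchar>(i, j) = cv::saturate_cast<uchar>((1 - r - k) * scale);
cmyk[1].at<uchar>(i, j) = cv::saturate_cast<uchar>((1 - g - k) * scale);
cmyk[2].at<uchar>(i, j) = cv::saturate_cast<uchar>((1 - b - k) * scale);
cmyk[3].at<uchar>(i, j) = cv::saturate_cast<uchar>(k * 255.f);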
}
}
return cmyk;
}
And this is the contrastBrightnessAdjustment function, implemented with a matrix iterator. The function receives a grayscale image and applies the linear transformation via the alpha and beta parameters:
void contrastBrightnessAdjustment( cv::Mat& inputImage, float alpha, float beta ){
// The input is single-channel (CV_8UC1), so iterate over uchar elements:
cv::MatIterator_<uchar> it, end;
for (it = inputImage.begin<uchar>(), end = inputImage.end<uchar>(); it != end; ++it) {
// Linear transform, clamped to [0, 255]:
*it = cv::saturate_cast<uchar>(alpha * (*it) + beta);
}
}
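The linear transform applied by contrastBrightnessAdjustment can be checked on single values without OpenCV. The sketch below is only an approximation of cv::saturate_cast (it rounds to the nearest integer and clamps to [0, 255]); the name adjustPixel is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Apply alpha * pixel + beta, round to nearest, and clamp to [0, 255].
int adjustPixel(int pixel, double alpha, double beta) {
    assert(pixel >= 0 && pixel <= 255);
    const long rounded = std::lround(alpha * pixel + beta);
    return static_cast<int>(std::max(0L, std::min(255L, rounded)));
}
```

With the values used above (alpha = 2.0, beta = -50.0), an input of 100 becomes 150, an input of 20 is clamped to 0, and an input of 200 is clamped to 255. This is what pushes the faint background towards black and the text towards white before thresholding.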
Related
I'm trying to reproduce Photoshop's multiply blend mode in OpenCV. Equivalents to this would be what you find in GIMP, or when you use the CIMultiplyBlendMode in Apple's CoreImage framework.
Everything I read online suggests that multiply blending is accomplished simply by multiplying the channels of the two input images (i.e., Blend = AxB). And, this works, except for the case(s) where alpha is < 1.0.
You can test this very simply in GIMP/Photoshop/CoreImage by creating two layers/images, filling each with a different solid color, and then modifying the opacity of the first layer. (BTW, when you modify alpha, the operation is no longer commutative in GIMP for some reason.)
A simple example: if A = (0,0,0,0) and B = (0.4,0,0,1.0), and C = AxB, then I would expect C to be (0,0,0,0). This is simple multiplication. But this is not how this blend is implemented in practice. In practice, C = (0.4,0,0,1.0), or C = B.
The bottom line is this: I need to figure out the formula for the multiply blend mode (which is clearly more than AxB) and then implement it in OpenCV (which should be trivial once I have the formula).
Would appreciate any insights.
Also, for reference, here are some links which show multiply blend as being simply AxB:
How does photoshop blend two images together
Wikipedia - Blend Modes
Photoshop Blend Modes
Here is an OpenCV solution based on the source code of GIMP, specifically the function gimp_operation_multiply_mode_process_pixels.
NOTE
Instead of looping over all pixels, it can be vectorized, but I followed the steps of GIMP.
Input images must be of type CV_8UC3 or CV_8UC4.
It also supports an opacity value, which must be in [0, 255].
The original GIMP implementation also supports a mask. It can be trivially added to the code if needed.
This implementation is in fact not symmetrical, and it reproduces your strange behaviour.
Code:
#include <opencv2/opencv.hpp>
using namespace cv;
Mat blend_multiply(const Mat& level1, const Mat& level2, uchar opacity)
{
CV_Assert(level1.size() == level2.size());
CV_Assert(level1.type() == level2.type());
CV_Assert(level1.channels() == level2.channels());
// Get 4 channel float images
Mat4f src1, src2;
if (level1.channels() == 3)
{
Mat4b tmp1, tmp2;
cvtColor(level1, tmp1, COLOR_BGR2BGRA);
cvtColor(level2, tmp2, COLOR_BGR2BGRA);
tmp1.convertTo(src1, CV_32F, 1. / 255.);
tmp2.convertTo(src2, CV_32F, 1. / 255.);
}
else
{
level1.convertTo(src1, CV_32F, 1. / 255.);
level2.convertTo(src2, CV_32F, 1. / 255.);
}
Mat4f dst(src1.rows, src1.cols, Vec4f(0., 0., 0., 0.));
// Loop on every pixel
float fopacity = opacity / 255.f;
float comp_alpha, new_alpha;
for (int r = 0; r < src1.rows; ++r)
{
for (int c = 0; c < src2.cols; ++c)
{
const Vec4f& v1 = src1(r, c);
const Vec4f& v2 = src2(r, c);
Vec4f& out = dst(r, c);
comp_alpha = min(v1[3], v2[3]) * fopacity;
new_alpha = v1[3] + (1.f - v1[3]) * comp_alpha;
if ((comp_alpha > 0.) && (new_alpha > 0.))
{
float ratio = comp_alpha / new_alpha;
out[0] = max(0.f, min(v1[0] * v2[0], 1.f)) * ratio + (v1[0] * (1.f - ratio));
out[1] = max(0.f, min(v1[1] * v2[1], 1.f)) * ratio + (v1[1] * (1.f - ratio));
out[2] = max(0.f, min(v1[2] * v2[2], 1.f)) * ratio + (v1[2] * (1.f - ratio));
}
else
{
out[0] = v1[0];
out[1] = v1[1];
out[2] = v1[2];
}
out[3] = v1[3];
}
}
Mat3b dst3b;
Mat4b dst4b;
dst.convertTo(dst4b, CV_8U, 255.);
cvtColor(dst4b, dst3b, COLOR_BGRA2BGR);
return dst3b;
}
int main()
{
Mat3b layer1 = imread("path_to_image_1");
Mat3b layer2 = imread("path_to_image_2");
Mat blend = blend_multiply(layer1, layer2, 255);
return 0;
}
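To check the behaviour against the example in the question, the per-pixel step of the loop in blend_multiply can be isolated without OpenCV. This is a sketch that mirrors that loop only; the names Rgba and multiplyPixel are hypothetical, and all values (including opacity) are assumed to be normalised to [0, 1].

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Rgba = std::array<float, 4>;  // r, g, b, a in [0, 1]

// Multiply-blend one pixel of `top` onto `bottom`, following the same steps
// as the loop in blend_multiply. `opacity` is in [0, 1].
Rgba multiplyPixel(const Rgba& bottom, const Rgba& top, float opacity) {
    assert(opacity >= 0.f && opacity <= 1.f);
    const float compAlpha = std::min(bottom[3], top[3]) * opacity;
    const float newAlpha = bottom[3] + (1.f - bottom[3]) * compAlpha;
    Rgba out = bottom;  // colour and alpha default to the bottom layer
    if (compAlpha > 0.f && newAlpha > 0.f) {
        const float ratio = compAlpha / newAlpha;
        for (int i = 0; i < 3; ++i) {
            const float product = std::max(0.f, std::min(bottom[i] * top[i], 1.f));
            out[i] = product * ratio + bottom[i] * (1.f - ratio);
        }
    }
    return out;
}
```

With bottom = (0.4, 0, 0, 1.0) and a fully transparent top = (0, 0, 0, 0), the composite alpha is 0 and the bottom pixel is returned unchanged, which matches the C = B result described in the question. With both layers opaque the result is the plain channel product.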
I managed to sort this out. Feel free to comment with any suggested improvements.
First, I found a clue as to how to implement the multiply function in this post:
multiply blending
And here's a quick OpenCV implementation in C++.
Mat MultiplyBlend(const Mat& cvSource, const Mat& cvBackground) {
// assumption: cvSource and cvBackground are of type CV_8UC4
// formula: (cvSource.rgb * cvBackground.rgb * cvSource.a) + (cvBackground.rgb * (1-cvSource.a))
Mat cvAlpha(cvSource.size(), CV_8UC3, Scalar::all(0));
Mat input[] = { cvSource };
int from_to[] = { 3,0, 3,1, 3,2 };
mixChannels(input, 1, &cvAlpha, 1, from_to, 3);
Mat cvBackgroundCopy;
Mat cvSourceCopy;
cvtColor(cvSource, cvSourceCopy, CV_RGBA2RGB);
cvtColor(cvBackground, cvBackgroundCopy, CV_RGBA2RGB);
// A = cvSource.rgb * cvBackground.rgb * cvSource.a
Mat cvBlendResultLeft;
multiply(cvSourceCopy, cvBackgroundCopy, cvBlendResultLeft, 1.0 / 255.0);
multiply(cvBlendResultLeft, cvAlpha, cvBlendResultLeft, 1.0 / 255.0);
cvSourceCopy.release(); // cv::Mat is not a pointer, so release() is used instead of delete
// invert alpha
bitwise_not(cvAlpha, cvAlpha);
// B = cvBackground.rgb * (1-cvSource.a)
Mat cvBlendResultRight;
multiply(cvBackgroundCopy, cvAlpha, cvBlendResultRight, 1.0 / 255.0);
cvBackgroundCopy.release();
cvAlpha.release();
// A + B
Mat cvBlendResult;
add(cvBlendResultLeft, cvBlendResultRight, cvBlendResult);
cvBlendResultLeft.release();
cvBlendResultRight.release();
cvtColor(cvBlendResult, cvBlendResult, CV_RGB2RGBA);
return cvBlendResult;
}
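The formula in the comment at the top of MultiplyBlend is easy to verify on a single channel. The sketch below is a plain C++ restatement of that formula with values normalised to [0, 1]; the name multiplyChannel is hypothetical.

```cpp
#include <cassert>

// One channel of:
//   (source * background * sourceAlpha) + background * (1 - sourceAlpha)
// All values are normalised to [0, 1].
double multiplyChannel(double source, double background, double sourceAlpha) {
    assert(sourceAlpha >= 0.0 && sourceAlpha <= 1.0);
    return source * background * sourceAlpha + background * (1.0 - sourceAlpha);
}
```

A fully transparent source leaves the background untouched (multiplyChannel(0.0, 0.4, 0.0) is 0.4), while a fully opaque source gives the plain product, so the formula interpolates between the background and the multiplied colour according to the source alpha.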
I'm trying to reproduce OpenCV's resizing algorithm with bilinear interpolation in C. What I want to achieve is a resulting image that is exactly the same (pixel for pixel) as the one produced by OpenCV. I am particularly interested in shrinking rather than magnification, and I want to use it on single-channel grayscale images. I have read online that the bilinear interpolation algorithm differs between shrinking and enlarging, but I did not find formulas for shrinking implementations, so it is likely that the code I wrote is totally wrong. What I wrote comes from my knowledge of interpolation acquired in a university course in Computer Graphics and OpenGL. My algorithm produces images that are visually identical to those produced by OpenCV, but whose pixel values are not perfectly identical (in particular near edges). Can you show me the shrinking algorithm with bilinear interpolation and a possible implementation?
Note: The attached code is written as a one-dimensional filter which must be applied first horizontally and then vertically (i.e. with the transposed matrix).
Mat rescale(Mat src, float ratio){
float width = src.cols * ratio; //resized width
int i_width = cvRound(width);
float step = (float)src.cols / (float)i_width; //size of new pixels mapped over old image
float center = step / 2; //V1 - center position of new pixel
//float center = step / src.cols; //V2 - other possible center position of new pixel
//float center = 0.099f; //V3 - Lena 512x512 lower difference possible to OpenCV
Mat dst(src.rows, i_width, CV_8UC1);
//cycle through all rows
for(int j = 0; j < src.rows; j++){
//in each row compute new pixels
for(int i = 0; i < i_width; i++){
float pos = (i*step) + center; //position of (the center of) new pixel in old map coordinates
int pred = floor(pos); //predecessor pixel in the original image
int succ = ceil(pos); //successor pixel in the original image
float d_pred = pos - pred; //pred and succ distances from the center of new pixel
float d_succ = succ - pos;
int val_pred = src.at<uchar>(j, pred); //pred and succ values
int val_succ = src.at<uchar>(j, succ);
float val = (val_pred * d_succ) + (val_succ * d_pred); //inverting d_succ and d_pred, supposing "d_succ = 1 - d_pred"...
int i_val = cvRound(val);
if(i_val == 0) //if pos is a perfect int "x.0000", pred and succ are the same pixel
i_val = val_pred;
dst.at<uchar>(j, i) = i_val;
}
}
return dst;
}
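Bilinear interpolation is not separable in the sense that you can resize horizontally and then resize again vertically and still get OpenCV's exact pixel values: rounding the intermediate image to 8 bits between the two passes changes the result. See example here.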
You can see OpenCV's resize code here.
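One detail that commonly explains differences near the edges is the coordinate mapping. As far as I know, OpenCV's bilinear resize aligns pixel centres, mapping destination index i to the source position (i + 0.5) * scale - 0.5 and clamping at the borders. The following is a floating-point sketch of that mapping for a single row; the name resizeRowLinear is hypothetical, and because OpenCV's 8-bit path works with fixed-point weights, this sketch is not guaranteed to match it bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Resample one row to dstWidth samples with linear interpolation, aligning
// pixel centres: srcPos = (dstIndex + 0.5) * scale - 0.5.
std::vector<double> resizeRowLinear(const std::vector<double>& src, int dstWidth) {
    assert(!src.empty() && dstWidth > 0);
    const int srcWidth = static_cast<int>(src.size());
    const double scale = static_cast<double>(srcWidth) / dstWidth;
    std::vector<double> dst(dstWidth);
    for (int i = 0; i < dstWidth; ++i) {
        const double pos = (i + 0.5) * scale - 0.5;
        int left = static_cast<int>(std::floor(pos));
        double frac = pos - left;
        // Clamp at the borders instead of reading outside the row.
        if (left < 0) { left = 0; frac = 0.0; }
        if (left >= srcWidth - 1) { left = srcWidth - 1; frac = 0.0; }
        const int right = std::min(left + 1, srcWidth - 1);
        dst[i] = src[left] * (1.0 - frac) + src[right] * frac;
    }
    return dst;
}
```

Halving the row {0, 10, 20, 30} samples the source at positions 0.5 and 2.5, giving {5, 25}. Compared with the code in the question, the difference is the extra -0.5 offset and the explicit border clamping.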
All,
I have a basic question that I am struggling with here. When you look at the findmyicone sample code from WWDC 2010, you will see this:
static const uint8_t orangeColor[] = {255, 127, 0};
uint8_t referenceColor[3];
// Remove luminance
static inline void normalize( const uint8_t colorIn[], uint8_t colorOut[] ) {
// Dot product
int sum = 0;
for (int i = 0; i < 3; i++)
sum += colorIn[i] / 3;
for (int j = 0; j < 3; j++)
colorOut[j] = (float) ((colorIn[j] / (float) sum) * 255);
}
And then it is called:
normalize(orangeColor, referenceColor);
Running the debugger, it is converting BGRA: (Red 255, Green 127, Blue 0) to (Red 0, Green 255, Blue 0). I have looked on the web and SO to find details on luminance and dot product and there is really no information.
1- Can someone guide me on what this function is doing?
2- Can you guide me to some helpful topics/primer online as well?
Thanks again
KMB
What they're trying to do is track a particular color across variations in brightness, so they're normalizing for the luminance of the color. I do something similar in the fragment shader I use in a color tracking example based on a GPU Gems paper from Apple, as well as the ColorObjectTracking sample application in my GPUImage framework:
vec3 normalizeColor(vec3 color)
{
return color / max(dot(color, vec3(1.0/3.0)), 0.3);
}
vec4 maskPixel(vec3 pixelColor, vec3 maskColor)
{
float d;
vec4 calculatedColor;
// Compute distance between current pixel color and reference color
d = distance(normalizeColor(pixelColor), normalizeColor(maskColor));
// If color difference is larger than threshold, return black.
calculatedColor = (d > threshold) ? vec4(0.0) : vec4(1.0);
//Multiply color by texture
return calculatedColor;
}
The above calculation takes the average of the three color components by multiplying each channel by 1/3 and then summing them (that's what the dot product does here). It then divides each color channel by this average to arrive at a normalized color.
The distance between this normalized color and the target one is calculated, and if it is within a certain threshold the pixel is marked as being of that color.
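The same idea can be sketched in plain C++ to see the numbers involved. The names Rgb, normalizeColor, colorDistance and matchesColor are hypothetical, channels are assumed to be in [0, 1], and the 0.3 floor is taken from the shader above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Rgb = std::array<double, 3>;  // channels in [0, 1]

// Divide each channel by the mean of the three, so brightness cancels out.
// The 0.3 floor mirrors the shader and avoids amplifying very dark pixels.
Rgb normalizeColor(const Rgb& c) {
    const double mean = (c[0] + c[1] + c[2]) / 3.0;
    const double divisor = std::max(mean, 0.3);
    return Rgb{{c[0] / divisor, c[1] / divisor, c[2] / divisor}};
}

// Euclidean distance between two colours.
double colorDistance(const Rgb& a, const Rgb& b) {
    const double dr = a[0] - b[0], dg = a[1] - b[1], db = a[2] - b[2];
    return std::sqrt(dr * dr + dg * dg + db * db);
}

// True when the pixel is within `threshold` of the reference after normalisation.
bool matchesColor(const Rgb& pixel, const Rgb& reference, double threshold) {
    assert(threshold >= 0.0);
    return colorDistance(normalizeColor(pixel), normalizeColor(reference)) <= threshold;
}
```

Note that a normalized channel can exceed the original range: for orange (1.0, 127/255, 0) the red component comes out at about 2.0. In the 8-bit function from the question the equivalent arithmetic is 255 / 127 * 255, roughly 512, which does not fit in a uint8_t; that is a likely explanation for the red value of 0 seen in the debugger.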
This is just one way of determining proximity of one color to another. Another way is to convert the RGB values into Y, Cr, and Cb (Y, U, and V) components and then take the distance between just the chrominance portions (Cr and Cb):
vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);
vec4 textureColor2 = texture2D(inputImageTexture2, textureCoordinate2);
float maskY = 0.2989 * colorToReplace.r + 0.5866 * colorToReplace.g + 0.1145 * colorToReplace.b;
float maskCr = 0.7132 * (colorToReplace.r - maskY);
float maskCb = 0.5647 * (colorToReplace.b - maskY);
float Y = 0.2989 * textureColor.r + 0.5866 * textureColor.g + 0.1145 * textureColor.b;
float Cr = 0.7132 * (textureColor.r - Y);
float Cb = 0.5647 * (textureColor.b - Y);
float blendValue = 1.0 - smoothstep(thresholdSensitivity, thresholdSensitivity + smoothing, distance(vec2(Cr, Cb), vec2(maskCr, maskCb)));
This code is what I use in a chroma keying shader, and it's based on a similar calculation that Apple uses in one of their sample applications. Which one is best can depend on the particular situation you're facing.
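A CPU-side sketch of the chrominance comparison, using the same coefficients as the shader, might look like this. The names Chroma, toChroma, chromaDistance and isKeyColor are hypothetical, and the smoothstep blending is left out for brevity.

```cpp
#include <cassert>
#include <cmath>

struct Chroma { double cr; double cb; };

// Chrominance of an RGB colour (channels in [0, 1]), luminance removed.
Chroma toChroma(double r, double g, double b) {
    const double y = 0.2989 * r + 0.5866 * g + 0.1145 * b;
    return {0.7132 * (r - y), 0.5647 * (b - y)};
}

// Distance between two colours in the (Cr, Cb) plane.
double chromaDistance(const Chroma& a, const Chroma& b) {
    return std::hypot(a.cr - b.cr, a.cb - b.cb);
}

// True when the colour's chrominance is within `threshold` of the key colour.
bool isKeyColor(const Chroma& color, const Chroma& key, double threshold) {
    assert(threshold >= 0.0);
    return chromaDistance(color, key) <= threshold;
}
```

Because the three luminance weights sum to 1, adding the same amount to every channel shifts Y by that amount and leaves Cr and Cb unchanged, so this comparison ignores uniform lightening or darkening of a pixel.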
I'm using Emgu.CV to perform some basic image manipulation and composition. My images are loaded as Image<Bgra,Byte>.
Question #1: When I use the Image<,>.Add() method, the images are always blended together, regardless of the alpha value. Instead I'd like them to be composited one atop the other, and use the included alpha channel to determine how the images should be blended. So if I call image1.Add(image2) any fully opaque pixels in image2 would completely cover the pixels from image1, while semi-transparent pixels would be blended based on the alpha value.
Here's what I'm trying to do in visual form. There's a city image with some "transparent holes" cut out, and a frog behind. This is what it should look like:
And this is what OpenCV produces.
How can I get this effect with OpenCV? And will it be as fast as calling Add()?
Question #2: is there a way to perform this composition in-place instead of creating a new image with each call to Add()? (e.g. image1.AddImageInPlace(image2) modifies the bytes of image1?)
NOTE: Looking for answers within Emgu.CV, which I'm using because of how well it handles perspective warping.
Before OpenCV 2.4 there was no support of PNGs with alpha channel.
To verify whether your current version supports it, print the number of channels after loading an image that you are certain is RGBA. If it is supported, the application will output 4; otherwise it will output 3 (RGB). Using the C API you would do:
IplImage* t_img = cvLoadImage(argv[1], CV_LOAD_IMAGE_UNCHANGED);
if (!t_img)
{
printf("!!! Unable to load transparent image.\n");
return -1;
}
printf("Channels: %d\n", t_img->nChannels);
If you can't update OpenCV:
There are some posts around that try to bypass this limitation but I haven't tested them myself;
The easiest solution would be to use another API to load the image and blend it, check blImageBlending;
Another alternative, not as lightweight, is to use Qt.
If your version already supports PNGs with RGBA:
Take a look at Emulating photoshop’s blending modes in OpenCV. It implements several Photoshop blending modes and I imagine you are capable of converting that code to .Net.
EDIT:
I had to deal with this problem recently and I've demonstrated how to deal with it on this answer.
You'll have to iterate through each pixel. I'm assuming image 1 is the frog image and image 2 is the city image, with image1 always being bigger than image2.
//to simulate image1.AddInPlace(image2)
int w = image2.Width;
int h = image2.Height;
for (int i = 0; i < w; i++)
{
for (int j = 0; j < h; j++)
{
Bgra top = image2[j, i];
Bgra bottom = image1[j, i];
//alpha = 1 is opaque > image2 should be used
double alpha = top.Alpha / 255.0;
image1[j, i]
= new Bgra(
top.Blue * alpha + bottom.Blue * (1 - alpha),
top.Green * alpha + bottom.Green * (1 - alpha),
top.Red * alpha + bottom.Red * (1 - alpha),
255); //assumption: the composited result is treated as fully opaque
}
}
Using Osiris's suggestion as a starting point, and having checked out alpha compositing on Wikipedia, I ended up with the following, which worked really nicely for my purposes.
This was used with Emgucv. I was hoping that the OpenCV gpu::AlphaComposite methods were available in Emgucv, which I believe would have done the following for me, but alas the version I am using didn't appear to have them implemented.
static public Image<Bgra, Byte> Overlay( Image<Bgra, Byte> image1, Image<Bgra, Byte> image2 )
{
Image<Bgra, Byte> result = image1.Copy();
Image<Bgra, Byte> src = image2;
Image<Bgra, Byte> dst = image1;
int rows = result.Rows;
int cols = result.Cols;
for (int y = 0; y < rows; ++y)
{
for (int x = 0; x < cols; ++x)
{
// http://en.wikipedia.org/wiki/Alpha_compositing
double srcA = 1.0/255 * src.Data[y, x, 3];
double dstA = 1.0/255 * dst.Data[y, x, 3];
double outA = (srcA + (dstA - dstA * srcA));
result.Data[y, x, 0] = (Byte)(((src.Data[y, x, 0] * srcA) + (dst.Data[y, x, 0] * (1 - srcA))) / outA); // Blue
result.Data[y, x, 1] = (Byte)(((src.Data[y, x, 1] * srcA) + (dst.Data[y, x, 1] * (1 - srcA))) / outA); // Green
result.Data[y, x, 2] = (Byte)(((src.Data[y, x, 2] * srcA) + (dst.Data[y, x, 2] * (1 - srcA))) / outA); // Red
result.Data[y, x, 3] = (Byte)(outA*255);
}
}
return result;
}
A newer version, using Emgucv methods rather than a loop. I'm not sure it improves performance.
double unit = 1.0 / 255.0;
Image[] dstS = dst.Split();
Image[] srcS = src.Split();
Image[] rs = result.Split();
Image<Gray, double> srcA = srcS[3] * unit;
Image<Gray, double> dstA = dstS[3] * unit;
Image<Gray, double> outA = srcA.Add(dstA.Sub(dstA.Mul(srcA)));// (srcA + (dstA - dstA * srcA));
// Blue, green and red (Bgra channel order), then alpha.
rs[0] = srcS[0].Mul(srcA).Add(dstS[0].Mul(1 - srcA)).Mul(outA.Pow(-1.0)); // Mul.Pow is divide.
rs[1] = srcS[1].Mul(srcA).Add(dstS[1].Mul(1 - srcA)).Mul(outA.Pow(-1.0));
rs[2] = srcS[2].Mul(srcA).Add(dstS[2].Mul(1 - srcA)).Mul(outA.Pow(-1.0));
rs[3] = outA.Mul(255);
// Merge image back together.
CvInvoke.cvMerge(rs[0], rs[1], rs[2], rs[3], result);
return result.Convert<Bgra, Byte>();
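For reference, the "over" operator from the Wikipedia article on alpha compositing also weights the destination colour by its own alpha, which only matters when the destination is not fully opaque. The sketch below is a plain C++ version for one pixel; the names Bgra and compositeOver are hypothetical, values are assumed to be normalised to [0, 1], and returning a transparent black pixel when both inputs are fully transparent is my own choice to avoid a division by zero.

```cpp
#include <array>
#include <cassert>

using Bgra = std::array<double, 4>;  // b, g, r, a in [0, 1]

// Composite `src` over `dst` with non-premultiplied alpha.
Bgra compositeOver(const Bgra& src, const Bgra& dst) {
    assert(src[3] >= 0.0 && src[3] <= 1.0 && dst[3] >= 0.0 && dst[3] <= 1.0);
    const double outA = src[3] + dst[3] * (1.0 - src[3]);
    Bgra out = {{0.0, 0.0, 0.0, 0.0}};
    if (outA <= 0.0) return out;  // both pixels fully transparent
    for (int i = 0; i < 3; ++i) {
        out[i] = (src[i] * src[3] + dst[i] * dst[3] * (1.0 - src[3])) / outA;
    }
    out[3] = outA;
    return out;
}
```

An opaque source pixel replaces the destination, a fully transparent source leaves it unchanged, and a half-transparent white over opaque black gives mid grey.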
I found an interesting blog post on the internet, which I think is related to what you are trying to do.
Please have a look at the Creating Overlays Method (archive.org link). You can use this idea to implement your own function to add two images in the way you mentioned above, making some particular areas in the image transparent while leaving the rest as it is.
I have optical flow stored in a 2-channel 32F matrix. I want to visualize the contents, what's the easiest way to do this?
How do I convert a CV_32FC2 to RGB with an empty blue channel, something imshow can handle? I am using OpenCV 2 C++ API.
Super Bonus Points
Ideally I would get the angle of flow in hue and the magnitude in brightness (with saturation at a constant 100%).
imshow can handle only 1-channel grayscale and 3- or 4-channel BGR/BGRA images, so you need to do the conversion yourself.
I think you can do something similar to:
//extract x and y channels
cv::Mat xy[2]; //X,Y
cv::split(flow, xy);
//calculate angle and magnitude
cv::Mat magnitude, angle;
cv::cartToPolar(xy[0], xy[1], magnitude, angle, true);
//translate magnitude to range [0;1]
double mag_max;
cv::minMaxLoc(magnitude, 0, &mag_max);
magnitude.convertTo(magnitude, -1, 1.0 / mag_max);
//build hsv image
cv::Mat _hsv[3], hsv;
_hsv[0] = angle;
_hsv[1] = cv::Mat::ones(angle.size(), CV_32F);
_hsv[2] = magnitude;
cv::merge(_hsv, 3, hsv);
//convert to BGR and show
cv::Mat bgr;//CV_32FC3 matrix
cv::cvtColor(hsv, bgr, cv::COLOR_HSV2BGR);
cv::imshow("optical flow", bgr);
cv::waitKey(0);
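The mapping used above can be checked for a single flow vector without OpenCV. In this sketch the names HueValue and flowToHueValue are hypothetical; the angle is expressed in degrees in the same way as cartToPolar with its last argument set to true, and the magnitude is divided by a caller-supplied maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct HueValue { double hueDegrees; double value; };

// Map one flow vector to a hue (direction, in degrees) and a value
// (magnitude relative to maxMagnitude, capped at 1).
HueValue flowToHueValue(double fx, double fy, double maxMagnitude) {
    assert(maxMagnitude > 0.0);
    const double pi = std::acos(-1.0);
    double angle = std::atan2(fy, fx) * 180.0 / pi;  // in (-180, 180]
    if (angle < 0.0) angle += 360.0;                 // shift negatives up by a full turn
    const double magnitude = std::hypot(fx, fy);
    return {angle, std::min(magnitude / maxMagnitude, 1.0)};
}
```

A vector pointing along +x maps to hue 0 (red), +y to hue 90, and so on around the colour wheel, while longer vectors come out brighter. Saturation is fixed at 1, as in the listing above.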
The MPI Sintel Dataset provides C and MATLAB code for visualizing computed flow. Download the ground truth optical flow of the training set from here. The archive contains a folder flow_code containing the mentioned source code.
You can port the code to OpenCV, however, I wrote a simple OpenCV wrapper to easily use the provided code. Note that the method MotionToColor is taken from the color_flow.cpp file. Note the comments in the listing below.
// Important to include this before flowIO.h!
#include "imageLib.h"
#include "flowIO.h"
#include "colorcode.h"
// I moved the MotionToColor method in a separate header file.
#include "motiontocolor.h"
cv::Mat flow;
// Compute optical flow (e.g. using OpenCV); result should be
// 2-channel float matrix.
assert(flow.channels() == 2);
// assert(flow.type() == CV_32FC2);
int rows = flow.rows;
int cols = flow.cols;
CFloatImage cFlow(cols, rows, 2);
// Convert flow to CFloatImage:
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
cFlow.Pixel(j, i, 0) = flow.at<cv::Vec2f>(i, j)[0];
cFlow.Pixel(j, i, 1) = flow.at<cv::Vec2f>(i, j)[1];
}
}
CByteImage cImage;
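// Note: max (presumably the maximum motion used for normalization) is not defined in this listing.
MotionToColor(cFlow, cImage, max);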
cv::Mat image(rows, cols, CV_8UC3, cv::Scalar(0, 0, 0));
// Compute back to cv::Mat with 3 channels in BGR:
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
image.at<cv::Vec3b>(i, j)[0] = cImage.Pixel(j, i, 0);
image.at<cv::Vec3b>(i, j)[1] = cImage.Pixel(j, i, 1);
image.at<cv::Vec3b>(i, j)[2] = cImage.Pixel(j, i, 2);
}
}
// Display or output the image ...
Below is the result when using the Optical Flow code and example images provided by Ce Liu.