Related
I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.
I am working on a OCR project, and in the preprocessing, some RED stamps need to be removed, so that the text near the stamps could be detected. I try a lot of methods(like change the values of pixel, threshold in Red channel) but fail.
Any suggestions are highly appreciated.
Python, C++, Java or what? Since you didn't state the OpenCV implementation you are using, I'm giving my answer in C++.
An option is to use the HSV color space to filter out the range of red values that defines the seal. My approach is to use the CMYK color space to filter everything except the black (or dark) text. It should do a pretty good job on printed media, which is your case.
//read input image:
std::string imageName = "C://opencvImages//seal.png";
cv::Mat imageInput = cv::imread( imageName );
Now, perform the CMYK conversion. OpenCV does not support this operation out of the box, bear with me as I provide the helper function at the end of this post.
//CMYK conversion:
std::vector<cv::Mat> cmyk;
cmyk = rgb2cmyk( imageInput );
//This is the Black channel:
cv::Mat blackChannel = cmyk[3].clone();
This is the image of the black channel; it is nice how everything that is not black (or dark) practically disappears!
Now, optionally, enhance the result applying brightness and contrast adjustment. Just try to separate the text from the background a little bit better; we want some defined pixel distributions to get a nice binary image.
//Brightness and contrast adjustment:
float alpha = 2.0;
float beta = -50.0;
contrastBrightnessAdjustment( blackChannel, alpha, beta );
Again, OpenCV does not offer brightness and contrast adjustment out of the box; however, its implementation is very easy. Hold on a little bit, and let me show you the result of this operation:
Nice. Let's Otsu-threshold this bad boy to get a nice binary image containing the clean text:
cv::threshold( blackChannel, binaryImage ,0, 255, cv::THRESH_OTSU );
This is what you get:
Now, the RGB to CMYK conversion function. I'm using the following implementation. The function receives an RGB image and returns a vector containing each of the CMYK channels
std::vector<cv::Mat> rgb2cmyk( cv::Mat& inputImage ){
std::vector<cv::Mat> cmyk;
for (int i = 0; i < 4; i++) {
cmyk.push_back( cv::Mat( inputImage.size(), CV_8UC1 ) );
}
std::vector<cv::Mat> inputRGB;
cv::split( inputImage, inputRGB );
for (int i = 0; i < inputImage.rows; i++)
{
for (int j = 0; j < inputImage.cols; j++)
{
float r = (int)inputRGB[2].at<uchar>(i, j) / 255.;
float g = (int)inputRGB[1].at<uchar>(i, j) / 255.;
float b = (int)inputRGB[0].at<uchar>(i, j) / 255.;
float k = std::min(std::min(1-r, 1-g), 1-b);
cmyk[0].at<uchar>(i, j) = (1 - r - k) / (1 - k) * 255.;
cmyk[1].at<uchar>(i, j) = (1 - g - k) / (1 - k) * 255.;
cmyk[2].at<uchar>(i, j) = (1 - b - k) / (1 - k) * 255.;
cmyk[3].at<uchar>(i, j) = k * 255.;
}
}
return cmyk;
}
And the contrastBrightnessAdjustment function is this, implemented using pointer arithmetic. The function receives a grayscale image and applies the linear transformation via the alpha and beta parameters:
void contrastBrightnessAdjustment( cv::Mat inputImage, float alpha, int beta ){
cv::MatIterator_<cv::Vec3b> it, end;
for (it = inputImage.begin<cv::Vec3b>(), end = inputImage.end<cv::Vec3b>(); it != end; ++it) {
uchar &pixel = (*it)[0];
pixel = cv::saturate_cast<uchar>(alpha*pixel+beta);
}
}
Currently I'm developing an Android application that involves some image processing. After some research I have made I found that is better to use Android NDK for bitmap manipulation for a good performance. So, I found some basic examples like this one:
static void myFunction(AndroidBitmapInfo* info, void* pixels){
int xx, yy, red, green, blue;
uint32_t* line;
for(yy = 0; yy < info->height; yy++){
line = (uint32_t*)pixels;
for(xx =0; xx < info->width; xx++){
//extract the RGB values from the pixel
blue = (int) ((line[xx] & 0x00FF0000) >> 16);
green = (int)((line[xx] & 0x0000FF00) >> 8);
red = (int) (line[xx] & 0x00000FF );
//change the RGB values
// set the new pixel back in
line[xx] =
((blue << 16) & 0x00FF0000) |
((green << 8) & 0x0000FF00) |
(red & 0x000000FF);
}
pixels = (char*)pixels + info->stride;
}
}
I used this code and it works very well for basic operations, but I want to make a more complex one, like a filter, where I need to access the above and below pixels from the current pixel. To be more specific, I'll give you an example: for dilation and erosion operations we move through pixels and we verify if the pixels from north west, north, north east, west, east, south west, south and south east (for 8 neighbors structure element) are object pixels. What I need to know is how can I access the values of the north and south pixels using the above code.
I'm not very familiarized with image processing using C (pointers etc.).
Thanks!
I've edit your function a little, basically to get the pixel position in the array the formula is:
position = y*width+x
static void myFunction(AndroidBitmapInfo* info, void* pixels){
int xx, yy, red, green, blue;
uint32_t* px = pixels;
for(yy = 0; yy < info->height; yy++){
for(xx =0; xx < info->width; xx++){
int position = yy*info->width+xx;//this formula gives you the address of pixel with coordinates (xx; yy) in 'px'
//extract the RGB values from the pixel
blue = (int) ((line[position] & 0x00FF0000) >> 16);
green = (int)((line[position] & 0x0000FF00) >> 8);
red = (int) (line[position] & 0x00000FF );
//change the RGB values
// set the new pixel back in
line[position] =
((blue << 16) & 0x00FF0000) |
((green << 8) & 0x0000FF00) |
(red & 0x000000FF);
//so the position of the south pixel is (yy+1)*info->width+xx
//and the one of the north is (yy-1)*info->width+xx
//the left yy*info->width+xx-1
//the right yy*info->width+xx+1
}
}
assuming you want to read/edit a pixel with coordinates x,y
you must check if 0 <= y < height and if 0 <= x < width , otherwise you may access unexisting pixels and you will have a memory access error
I would like to detect this pattern
As you can see it's basically the letter C, inside another, with different orientations. My pattern can have multiple C's inside one another, the one I'm posting with 2 C's is just a sample. I would like to detect how many C's there are, and the orientation of each one. For now I've managed to detect the center of such pattern, basically I've managed to detect the center of the innermost C. Could you please provide me with any ideas about different algorithms I could use?
And here we go! A high level overview of this approach can be described as the sequential execution of the following steps:
Load the input image;
Convert it to grayscale;
Threshold it to generate a binary image;
Use the binary image to find contours;
Fill each area of contours with a different color (so we can extract each letter separately);
Create a mask for each letter found to isolate them in separate images;
Crop the images to the smallest possible size;
Figure out the center of the image;
Figure out the width of the letter's border to identify the exact center of the border;
Scan along the border (in a circular fashion) for discontinuity;
Figure out an approximate angle for the discontinuity, thus identifying the amount of rotation of the letter.
I don't want to get into too much detail since I'm sharing the source code, so feel free to test and change it in any way you like.
Let's start, Winter Is Coming:
#include <iostream>
#include <vector>
#include <cmath>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
cv::RNG rng(12345);
float PI = std::atan(1) * 4;
void isolate_object(const cv::Mat& input, cv::Mat& output)
{
if (input.channels() != 1)
{
std::cout << "isolate_object: !!! input must be grayscale" << std::endl;
return;
}
// Store the set of points in the image before assembling the bounding box
std::vector<cv::Point> points;
cv::Mat_<uchar>::const_iterator it = input.begin<uchar>();
cv::Mat_<uchar>::const_iterator end = input.end<uchar>();
for (; it != end; ++it)
{
if (*it) points.push_back(it.pos());
}
// Compute minimal bounding box
cv::RotatedRect box = cv::minAreaRect(cv::Mat(points));
// Set Region of Interest to the area defined by the box
cv::Rect roi;
roi.x = box.center.x - (box.size.width / 2);
roi.y = box.center.y - (box.size.height / 2);
roi.width = box.size.width;
roi.height = box.size.height;
// Crop the original image to the defined ROI
output = input(roi);
}
For more details on the implementation of isolate_object() please check this thread. cv::RNG is used later on to fill each contour with a different color, and PI, well... you know PI.
int main(int argc, char* argv[])
{
// Load input (colored, 3-channel, BGR)
cv::Mat input = cv::imread("test.jpg");
if (input.empty())
{
std::cout << "!!! Failed imread() #1" << std::endl;
return -1;
}
// Convert colored image to grayscale
cv::Mat gray;
cv::cvtColor(input, gray, CV_BGR2GRAY);
// Execute a threshold operation to get a binary image from the grayscale
cv::Mat binary;
cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY);
The binary image looks exactly like the input because it only had 2 colors (B&W):
// Find the contours of the C's in the thresholded image
std::vector<std::vector<cv::Point> > contours;
cv::findContours(binary, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);
// Fill the contours found with unique colors to isolate them later
cv::Mat colored_contours = input.clone();
std::vector<cv::Scalar> fill_colors;
for (size_t i = 0; i < contours.size(); i++)
{
std::vector<cv::Point> cnt = contours[i];
double area = cv::contourArea(cv::Mat(cnt));
//std::cout << "* Area: " << area << std::endl;
// Fill each C found with a different color.
// If the area is larger than 100k it's probably the white background, so we ignore it.
if (area > 10000 && area < 100000)
{
cv::Scalar color = cv::Scalar(rng.uniform(0, 255), rng.uniform(0,255), rng.uniform(0,255));
cv::drawContours(colored_contours, contours, i, color,
CV_FILLED, 8, std::vector<cv::Vec4i>(), 0, cv::Point());
fill_colors.push_back(color);
//cv::imwrite("test_contours.jpg", colored_contours);
}
}
What colored_contours looks like:
// Create a mask for each C found to isolate them from each other
for (int i = 0; i < fill_colors.size(); i++)
{
// After inRange() single_color_mask stores a single C letter
cv::Mat single_color_mask = cv::Mat::zeros(input.size(), CV_8UC1);
cv::inRange(colored_contours, fill_colors[i], fill_colors[i], single_color_mask);
//cv::imwrite("test_mask.jpg", single_color_mask);
Since this for loop is executed twice, one for each color that was used to fill the contours, I want you to see all images that were generated by this stage. So the following images are the ones that were stored by single_color_mask (one for each iteration of the loop):
// Crop image to the area of the object
cv::Mat cropped;
isolate_object(single_color_mask, cropped);
//cv::imwrite("test_cropped.jpg", cropped);
cv::Mat orig_cropped = cropped.clone();
These are the ones that were stored by cropped (by the way, the smaller C looks fat because the image is rescaled by this page to have the same size of the larger C, don't worry):
// Figure out the center of the image
cv::Point obj_center(cropped.cols/2, cropped.rows/2);
//cv::circle(cropped, obj_center, 3, cv::Scalar(128, 128, 128));
//cv::imwrite("test_cropped_center.jpg", cropped);
To make it clearer to understand for what obj_center is for, I painted a little gray circle for educational purposes on that location:
// Figure out the exact center location of the border
std::vector<cv::Point> border_points;
for (int y = 0; y < cropped.cols; y++)
{
if (cropped.at<uchar>(obj_center.x, y) != 0)
border_points.push_back(cv::Point(obj_center.x, y));
if (border_points.size() > 0 && cropped.at<uchar>(obj_center.x, y) == 0)
break;
}
if (border_points.size() == 0)
{
std::cout << "!!! Oops! No border detected." << std::endl;
return 0;
}
// Figure out the exact center location of the border
cv::Point border_center = border_points[border_points.size() / 2];
//cv::circle(cropped, border_center, 3, cv::Scalar(128, 128, 128));
//cv::imwrite("test_border_center.jpg", cropped);
The procedure above scans a single vertical line from top/middle of the image to find the borders of the circle to be able to calculate it's width. Again, for education purposes I painted a small gray circle in the middle of the border. This is what cropped looks like:
// Scan the border of the circle for discontinuities
int radius = obj_center.y - border_center.y;
if (radius < 0)
radius *= -1;
std::vector<cv::Point> discontinuity_points;
std::vector<int> discontinuity_angles;
for (int angle = 0; angle <= 360; angle++)
{
int x = obj_center.x + (radius * cos((angle+90) * (PI / 180.f)));
int y = obj_center.y + (radius * sin((angle+90) * (PI / 180.f)));
if (cropped.at<uchar>(x, y) < 128)
{
discontinuity_points.push_back(cv::Point(y, x));
discontinuity_angles.push_back(angle);
//cv::circle(cropped, cv::Point(y, x), 1, cv::Scalar(128, 128, 128));
}
}
//std::cout << "Discontinuity size: " << discontinuity_points.size() << std::endl;
if (discontinuity_points.size() == 0 && discontinuity_angles.size() == 0)
{
std::cout << "!!! Oops! No discontinuity detected. It's a perfect circle, dang!" << std::endl;
return 0;
}
Great, so the piece of code above scans along the middle of the circle's border looking for discontinuity. I'm sharing a sample image to illustrate what I mean. Every gray dot on the image represents a pixel that is tested. When the pixel is black it means we found a discontinuity:
// Figure out the approximate angle of the discontinuity:
// the first angle found will suffice for this demo.
int approx_angle = discontinuity_angles[0];
std::cout << "#" << i << " letter C is rotated approximately at: " << approx_angle << " degrees" << std::endl;
// Figure out the central point of the discontinuity
cv::Point discontinuity_center;
for (int a = 0; a < discontinuity_points.size(); a++)
discontinuity_center += discontinuity_points[a];
discontinuity_center.x /= discontinuity_points.size();
discontinuity_center.y /= discontinuity_points.size();
cv::circle(orig_cropped, discontinuity_center, 2, cv::Scalar(128, 128, 128));
cv::imshow("Original crop", orig_cropped);
cv::waitKey(0);
}
return 0;
}
Very well... This last piece of code is responsible for figuring out the approximate angle of the discontinuity as well as indicate the central point of discontinuity. The following images are stored by orig_cropped. Once again I added a gray dot to show the exact positions detected as the center of the gaps:
When executed, this application prints the following information to the screen:
#0 letter C is rotated approximately at: 49 degrees
#1 letter C is rotated approximately at: 0 degrees
I hope it helps.
For start you could use Hough transformation. This algorithm is not very fast, but it's quite robust. Especially if you have such clear images.
The general approach would be:
1) preprocessing - suppress noise, convert to grayscale / binary
2) run edge detector
3) run Hough transform - IIRC it's `cv::HoughCircles` in OpenCV
4) do some postprocessing - remove surplus circles, decide which ones correspond to shape of letter C, and so on
My approach will give you 2 hough circles per letter C. One on inner boundary, one on outer letter C. If you want only one circle per letter you can use skeletonization algoritm. More info here http://homepages.inf.ed.ac.uk/rbf/HIPR2/skeleton.htm
Given that we have nested C structures and you know the centres of the Cs and would like to evaluate the orientations- one simply needs to observe the distribution of pixels along the radius of the concentric Cs in all directions.
This can be done by performing a simple morphological dilation operation from the centre. As we reach the right radius for the innermost C, we will reach a maximum number of pixels covered for the innermost C. The difference between the disc and the C will give us the location of the gap in the whole and one can perform an ultimate erosion to get the centroid of the gap in the C. The angle between the centre and this point is the orientation of the C. This step is iterated till all Cs are covered.
This can also be done quickly using the Distance function from the centre point of the Cs.
Does anyone know how adjustment layers work in Photoshop? I need to generate a result image having a source image and HSL values from Hue/Saturation adjustment layer. Conversion to RGB and then multiplication with the source color does not work.
Or is it possible to replace Hue/Saturation Adjustment Layer with normal layers with appropriately set blending modes (Mulitiply, Screen, Hue, Saturation, Color, Luminocity,...)?
If so then how?
Thanks
I've reverse-engineered the computation for when the "Colorize" checkbox is checked. All of the code below is pseudo-code.
The inputs are:
hueRGB, which is an RGB color for HSV(photoshop_hue, 100, 100).ToRGB()
saturation, which is photoshop_saturation / 100.0 (i.e. 0..1)
lightness, which is photoshop_lightness / 100.0 (i.e. -1..1)
value, which is the pixel.ToHSV().Value, scaled into 0..1 range.
The method to colorize a single pixel:
color = blend2(rgb(128, 128, 128), hueRGB, saturation);
if (lightness <= -1)
return black;
else if (lightness >= 1)
return white;
else if (lightness >= 0)
return blend3(black, color, white, 2 * (1 - lightness) * (value - 1) + 1)
else
return blend3(black, color, white, 2 * (1 + lightness) * (value) - 1)
Where blend2 and blend3 are:
blend2(left, right, pos):
return rgb(left.R * (1-pos) + right.R * pos, same for green, same for blue)
blend3(left, main, right, pos):
if (pos < 0)
return blend2(left, main, pos + 1)
else if (pos > 0)
return blend2(main, right, pos)
else
return main
I have figured out how Lightness works.
The input parameter brightness b is in [0, 2], Output is c (color channel).
if(b<1) c = b * c;
else c = c + (b-1) * (1-c);
Some tests:
b = 0 >>> c = 0 // black
b = 1 >>> c = c // same color
b = 2 >>> c = 1 // white
However, if you choose some interval (e.g. Reds instead of Master), Lightness behaves completely differently, more like Saturation.
Photoshop, dunno. But the theory is usually: The RGB image is converted to HSL/HSV by the particular layer's internal methods; each pixel's HSL is then modified according to the specified parameters, and the so-obtained result is being provided back (for displaying) in RGB.
PaintShopPro7 used to split up the H space (assuming a range of 0..360) in discrete increments of 30° (IIRC), so if you bumped only the "yellows", i.e. only pixels whose H component was valued 45-75 would be considered for manipulation.
reds 345..15, oranges 15..45, yellows 45..75, yellowgreen 75..105, greens 105..135, etc.
if (h >= 45 && h < 75)
s += s * yellow_percent;
There are alternative possibilities, such as applying a falloff filter, as in:
/* For h=60, let m=1... and linearly fall off to h=75 m=0. */
m = 1 - abs(h - 60) / 15;
if (m < 0)
m = 0;
s += s * yellow_percent * d;
Hello I wrote colorize shader and my equation is as folows
inputRGB is the source image which should be in monochrome
(r+g+b) * 0.333
colorRGB is your destination color
finalRGB is the result
pseudo code:
finalRGB = inputRGB * (colorRGB + inputRGB * 0.5);
I think it's fast and efficient
I did translate #Roman Starkov solution to java if any one needed, but for some reason It not worked so well, then I started read a little bit and found that the solution is very simple , there are 2 things have to be done :
When changing the hue or saturation replace the original image only hue and saturation and the lightness stay as is was in the original image this blend method called 10.2.4. luminosity blend mode :
https://www.w3.org/TR/compositing-1/#backdrop
When changing the lightness in photoshop the slider indicates how much percentage we need to add or subtract to/from the original lightness in order to get to white or black color in HSL.
for example :
If the original pixel is 0.7 lightness and the lightness slider = 20
so we need more 0.3 lightness in order to get to 1
so we need to add to the original pixel lightness : 0.7 + 0.2*0.3;
this will be the new blended lightness value for the new pixel .
#Roman Starkov solution Java implementation :
//newHue, which is photoshop_hue (i.e. 0..360)
//newSaturation, which is photoshop_saturation / 100.0 (i.e. 0..1)
//newLightness, which is photoshop_lightness / 100.0 (i.e. -1..1)
//returns rgb int array of new color
private static int[] colorizeSinglePixel(int originlPixel,int newHue,float newSaturation,float newLightness)
{
float[] originalPixelHSV = new float[3];
Color.colorToHSV(originlPixel,originalPixelHSV);
float originalPixelLightness = originalPixelHSV[2];
float[] hueRGB_HSV = {newHue,100.0f,100.0f};
int[] hueRGB = {Color.red(Color.HSVToColor(hueRGB_HSV)),Color.green(Color.HSVToColor(hueRGB_HSV)),Color.blue(Color.HSVToColor(hueRGB_HSV))};
int color[] = blend2(new int[]{128,128,128},hueRGB,newSaturation);
int blackColor[] = new int[]{Color.red(Color.BLACK),Color.green(Color.BLACK),Color.blue(Color.BLACK)};
int whileColor[] = new int[]{Color.red(Color.WHITE),Color.green(Color.WHITE),Color.blue(Color.WHITE)};
if(newLightness <= -1)
{
return blackColor;
}
else if(newLightness >=1)
{
return whileColor;
}
else if(newLightness >=0)
{
return blend3(blackColor,color,whileColor, (int) (2*(1-newLightness)*(originalPixelLightness-1) + 1));
}
else
{
return blend3(blackColor,color,whileColor, (int) ((1+newLightness)*(originalPixelLightness) - 1));
}
}
private static int[] blend2(int[] left,int[] right,float pos)
{
return new int[]{(int) (left[0]*(1-pos)+right[0]*pos),(int) (left[1]*(1-pos)+right[1]*pos),(int) (left[2]*(1-pos)+right[2]*pos)};
}
private static int[] blend3(int[] left,int[] main,int[] right,int pos)
{
if(pos < 0)
{
return blend2(left,main,pos+1);
}
else if(pos > 0)
{
return blend2(main,right,pos);
}
else
{
return main;
}
}
When the “Colorize” checkbox is checked, the lightness of the underlying layer is combined with the values of the Hue and Saturation sliders and converted from HSL to RGB according to the equations at https://en.wikipedia.org/wiki/HSL_and_HSV#From_HSL . (The Lightness slider just remaps the lightness to a subset of the scale as you can see from watching the histogram; the effect is pretty awful and I don’t see why anyone would ever use it.)