I have questions about the ‘Noise region definition’ and ‘Noise generation process’ sections of the paper “A character degradation model for grayscale ancient document images”.
In ‘Noise region definition’, g controls the flatness of the regions. What exactly does that mean? How can we say that one noise region is flatter than another?
Below is an illustration of the elliptical noise region within the document image. The green ellipse is defined to be the noise region.
According to the paragraph below, the average value bj of its 8-neighbours (in the initial grayscale image) is used to calculate the values of all pixels on the line CiBj.
Is the average value bj calculated by averaging the grayscale values of the adjacent pixels to the north, northeast, east, southeast, south, southwest, west and northwest?
Is the formula for getting Pk the one below?
Please refer to the paragraph below:
I just want to check whether my understanding of the article is correct. Thanks.
"Flattening" Refers to the effect of g on flatting the ellipse. If alpha is small b, the minor axis of the ellipse, is almost equal to a, the major axis, which will make the ellipse look like a circle. If g is almost 1 the minor axis will be so small that it will look like a line, very "flat".
Yes, I think you are correct; that is the standard 8-neighbourhood (see Pixel Connectivity).
Yes, I think you are correct. Each pixel along the line between ci and bj will be a random sample from a normal distribution, and the mean of that distribution is a linear interpolation of the expected gray level.
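If it helps, here is a minimal sketch of that process as I understand it (the function names, the sigma value and the helpers are mine, not the paper's code):

```python
import numpy as np

def eight_neighbour_average(img, x, y):
    """Average gray level of the N, NE, E, SE, S, SW, W, NW neighbours of (x, y)."""
    neighbours = [img[y + dy, x + dx]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]
    return float(np.mean(neighbours))

def noisy_line(gray_ci, b_j, n_pixels, sigma=10.0, seed=None):
    """Gray values for the n_pixels pixels on the line from ci to bj."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_pixels)
    means = (1.0 - t) * gray_ci + t * b_j      # linearly interpolated mean
    samples = rng.normal(means, sigma)         # one normal sample per pixel
    return np.clip(samples, 0, 255).astype(np.uint8)
```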
I am looking for a workflow that would clean (and possibly straighten) old and badly scanned images of musical scores (like the one below).
I've tried denoising, Hough filters, and ImageMagick geometry filters, and I am struggling to identify the series of filters that removes the scanner noise/bias.
Just some quick ideas (a rough code sketch of these steps follows the list):
Remove grayscale noise: Do a low pass on intensity (keep the darks), since the music is darker than a lot of the noise. The remaining noise is mostly vertical lines.
Rotate image: Sum the grayscale values for each column of the image; you'll get a vector with the total pixel lightness of each column. Use gradient descent or a search over the rotation of the image (within some bounds like +/-15 deg) to maximize the variance of that vector. The idea here is that the vertical noise lines indicate vertical alignment, so we want the columns of the image to align with these noise lines (= maximized variance).
Remove vertical line noise: After rotation, take the median value of each column. The greater the distance (squared difference) of a pixel from that median darkness, the more confident we are that it shows its true color (e.g. a pure white or black pixel where the vertical noise is gray). Since the noise is non-white, you could try blending this distance with the whiteness of the median for an alternative confidence metric. Ideally, I think you'd train some 7x7x2 convolution filter (the 2 channels being the pixel value and the distance from the median) to estimate the true value of each pixel; that would be the most minimal machine-learning approach, not a full-fledged NN. However, given your lack of training data, we'll have to come up with our own heuristic for what the true pixel value is. You will likely need to play around with it, but here's what I think might work:
Set some threshold of confidence; above that threshold we take the value as is. Below the threshold, set to white (the binary expected pixel value for the entire page).
For all values below the threshold, take the max confidence value within a ±2-pixel L1 distance (e.g. a 5x5 convolution) as that pixel's value. It seems features are separated by at least 2 pixels, but for lower resolutions that window size may need to be adjusted. Since white pixels may end up being more confident overall, you could experiment with prioritizing darker pixels (increase their confidence somehow).
Clamp the image contrast and maybe run another low pass filter.
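Here is a rough Python/OpenCV sketch of those steps. Every number in it (the dark threshold of 180, the squared-distance cutoff of 1500, the ±15° grid) is a guess to be tuned, and it simplifies things by skipping the 5x5 neighborhood fill-in above:

```python
import cv2
import numpy as np

def column_variance(gray, angle_deg):
    """Variance of per-column mean intensity after rotating by angle_deg."""
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(gray, M, (w, h), borderValue=255)
    return float(np.var(rotated.mean(axis=0)))

def clean_score(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # 1. Low pass on intensity: push everything brighter than the threshold
    #    to white, keeping only the dark music symbols and dark noise.
    dark = gray.copy()
    dark[dark > 180] = 255

    # 2. Search rotations within +/-15 deg for the one that maximizes the
    #    variance of the column means, i.e. aligns vertical noise lines
    #    with the image columns (grid search stands in for gradient descent).
    angles = np.arange(-15.0, 15.25, 0.25)
    best = max(angles, key=lambda a: column_variance(dark, a))
    h, w = dark.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), best, 1.0)
    rotated = cv2.warpAffine(dark, M, (w, h), borderValue=255)

    # 3. Vertical line noise: pixels close to their column median are assumed
    #    to be noise and are set to white (the page's expected value).
    col_median = np.median(rotated, axis=0)
    confidence = (rotated.astype(np.float32) - col_median) ** 2
    cleaned = np.where(confidence > 1500, rotated, 255).astype(np.uint8)

    # 4. Clamp/stretch the contrast.
    return cv2.normalize(cleaned, None, 0, 255, cv2.NORM_MINMAX)
```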
I have an image with letters, for example like this:
It's a binary image obtained from previous image-processing stages, and I know the boundingRect and RotatedRect of every letter, but these letters are not grouped into words yet. It is worth mentioning that a RotatedRect can be returned from minAreaRect() or fitEllipse(), as shown here and here. In my case the RotatedRects look like this:
The blue rectangles are obtained from minAreaRect and the red ones from fitEllipse. They give slightly different boxes (center, width, height, angle), but the biggest difference is in the angle values: in the first option the angle ranges from -90 to 0 degrees, in the second from 0 to 180 degrees. My problem is: how do I group these letters into words based on the parameters of the RotatedRects? I can check the angle of every RotatedRect and also measure the distance between the centers of every two RotatedRects. With simple assumptions about the direction of the text and the distance between letters, my grouping algorithm works, but in more complicated cases I run into problems. For example, in the image below there are a few groups of text with different directions, angles and distances between letters.
Problems arise when a letter from one word is close to a letter from another word, and when the angle of a RotatedRect inside a given word differs noticeably from the angles of its neighbours. What would be the best way to connect the letters into the right words?
First, you need to define a metric. It could be a Euclidean 3D distance, for example, defined as ||(Δx, Δy, Δangle)||, where Δx and Δy are the distances between the rectangle centers along the x and y coordinates, and Δangle is the distance between the angular orientations.
In short, your rectangles transform into 3D data points with coordinates (x, y, angle).
Once you define this, you can use a clustering algorithm on your data. DBSCAN seems like it should work well here. Check this article, for example: link; it may help you choose a clustering algorithm.
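A rough sketch of that idea with scikit-learn's DBSCAN on (x, y, angle) points built from cv2.minAreaRect output (the scaling weights, eps and min_samples are guesses you would tune, not a drop-in solution):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def group_letters(rects, eps=1.0):
    """rects: list of cv2.minAreaRect results ((cx, cy), (w, h), angle)."""
    # Build the 3D feature points (x, y, angle).
    pts = np.array([[cx, cy, angle] for (cx, cy), (w, h), angle in rects],
                   dtype=np.float64)

    # The coordinates live on very different scales (pixels vs degrees),
    # so weight them before taking Euclidean distances. The weights below
    # are guesses that depend on image resolution and font size.
    scale = np.array([1 / 30.0, 1 / 30.0, 1 / 15.0])   # ~30 px, ~15 deg
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(pts * scale)
    return labels   # letters with the same label belong to the same word

# Example: rects collected from cv2.minAreaRect on each letter contour.
# labels = group_letters(rects)
```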
I extended the metric above with a few other elements related to the geometric properties of letters and words (distances, angles, areas, the ratio of neighbouring letters' areas, etc.) and now it works fine. Thanks for the suggestion.
I am going through a computer vision paper, and I came across this line:
the L values, or the luminance values, for these pixels are then linearly and horizontally interpolated between the pixels on the (one pixel wide) brightest column in region B, and the pixels in regions A and C.
What does linear and horizontal interpolation mean?
I tried looking up linear interpolation: does it mean that we average out the values of pixels that lie on a line with each other? I can't find a proper definition.
Paper : http://140.118.9.222/publications/journal/haircut.pdf
Every programmer should know linear interpolation!!! Especially if you're entering the domain of image-processing.
Please read this and never ever forget about it.
https://en.wikipedia.org/wiki/Linear_interpolation
The paper describes pretty well what is going on. They synthesize skin texture by sampling the face and then interpolating between those samples. They sample 3 regions A, B and C.
They pick the brightest column of B, the left-most column of A and the right-most column of C.
Then for every row they linearly interpolate between the columns' pixels.
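In code, "linearly and horizontally interpolated" just means that, for each row, the pixels between two known columns are a weighted blend of those columns' values, with the weight depending on the horizontal position. A small sketch (the column names and the 40-pixel width are made up for illustration):

```python
import numpy as np

def interpolate_rows(left_col, right_col, width):
    """Return a (rows, width) block whose values blend linearly from
    left_col (at x = 0) to right_col (at x = width - 1), row by row."""
    left_col = np.asarray(left_col, dtype=np.float32)
    right_col = np.asarray(right_col, dtype=np.float32)
    t = np.linspace(0.0, 1.0, width)                    # horizontal weights
    # For row r and column x: (1 - t) * left[r] + t * right[r]
    return (1 - t)[None, :] * left_col[:, None] + t[None, :] * right_col[:, None]

# e.g. blend between the brightest column of region B and the left-most
# column of region A across a 40-pixel-wide gap:
# block = interpolate_rows(bright_B_column, left_A_column, 40)
```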
I know that the mean-shift algorithm calculates the mean of a pixel density and checks whether the center of the ROI coincides with this point. If not, it moves the ROI center to the mean and checks again, and so on, like in this picture:
For a density it is clear how to find the mean point, but the algorithm can't simply calculate the mean of a histogram and take the new position from that point. How can this algorithm work using a color histogram?
The feature space in your image is 2D.
Say you have an intensity image (so the feature space is 1D); then you would just have a line (e.g. from 0 to 255) on which the points are located. The circles shown above would just be line segments on that [0, 255] line. Depending on their means, these line segments would then shift, just like the circles do in 2D.
You talked about color histograms, so I assume you are talking about RGB.
In that case your feature space is 3D, so you have a sphere instead of a line segment or circle. Your axes are R, G and B, and the pixels of your image are points in that 3D feature space. You then still look at where the mean of the points inside the sphere is, and shift the center towards that mean.
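A minimal sketch of one such mean-shift pass in the RGB feature space (this is just the feature-space iteration, not a full histogram-based tracker; the radius and tolerance are arbitrary):

```python
import numpy as np

def mean_shift_rgb(pixels, start, radius=30.0, max_iter=50, tol=1e-3):
    """pixels: (N, 3) array of RGB values; start: initial (3,) center."""
    center = np.asarray(start, dtype=np.float64)
    for _ in range(max_iter):
        # Points inside the sphere around the current center.
        inside = np.linalg.norm(pixels - center, axis=1) <= radius
        if not inside.any():
            break
        new_center = pixels[inside].mean(axis=0)    # mean of the window
        if np.linalg.norm(new_center - center) < tol:
            break                                   # converged: mode found
        center = new_center                         # shift window to the mean
    return center

# pixels = img.reshape(-1, 3).astype(np.float64)
# mode = mean_shift_rgb(pixels, start=pixels[0])
```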
I am not able to understand the formula.
What do W (the window) and the intensity in the formula mean?
I found this formula in the OpenCV documentation:
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
For a grayscale image, the intensity level (0-255) tells you how bright a pixel is; I hope you already know that.
So, the explanation of your formula is below (with a small numeric sketch at the end):
Aim: We want to find the points that have maximum variation in intensity in all directions, i.e. the points that are very distinctive in the given image.
I(x,y): This is the intensity value of the pixel you are currently processing.
I(x+u,y+v): This is the intensity of another pixel that lies at an offset of (u,v) from the current pixel, which is located at (x,y) with intensity I(x,y).
I(x+u,y+v) - I(x,y): This term gives you the difference between the intensity levels of the two pixels.
w(u,v): You don't compare the current pixel with a pixel at some arbitrary position; you compare it with its neighbours, so you choose a range of values for u and v, as you do when applying a Gaussian mask, mean filter, etc. So basically w(u,v) represents (and weights) the window within which you compare the intensity of the current pixel with its neighbours.
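Putting that together, the quantity is E(u,v) = sum over the window of w(x,y) * [I(x+u,y+v) - I(x,y)]^2, the windowed sum of squared intensity differences for a shift (u,v). A small numeric sketch with a plain box window (w = 1 inside the window), just to make the terms concrete:

```python
import numpy as np

def harris_E(img, cx, cy, u, v, half=1):
    """Sum of squared intensity differences for a shift (u, v), over a
    (2*half+1) x (2*half+1) box window centered on pixel (cx, cy)."""
    img = img.astype(np.float64)
    E = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            x, y = cx + dx, cy + dy                  # pixel inside the window
            diff = img[y + v, x + u] - img[y, x]     # I(x+u, y+v) - I(x, y)
            E += diff * diff                         # w(x, y) = 1 (box window)
    return E

# On a flat patch E stays small for every (u, v); at a corner it is large in
# all shift directions, which is exactly what the detector looks for.
```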
This link should clear up all your doubts.
For visualizing the algorithm, consider the window function as a BoxFilter, Ix as the Sobel derivative along the x-axis, and Iy as the Sobel derivative along the y-axis.
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives.html will be useful to understand the final equations in the above pdf.
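A rough sketch of that visualization (k = 0.04 is the usual empirical Harris constant, "input.png" is a placeholder path, and this is not the exact cv2.cornerHarris implementation):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

Ix = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)   # derivative along the x-axis
Iy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)   # derivative along the y-axis

# Window the products of derivatives (the window function as a box filter).
Sxx = cv2.boxFilter(Ix * Ix, -1, (5, 5))
Syy = cv2.boxFilter(Iy * Iy, -1, (5, 5))
Sxy = cv2.boxFilter(Ix * Iy, -1, (5, 5))

# Harris response R = det(M) - k * trace(M)^2, computed per pixel.
k = 0.04
R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2
corners = R > 0.01 * R.max()   # keep only the strongest responses
```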