I am trying to read a handwritten form which has boxed input fields.
I have run tesseract on the image but get strange results. In my understanding, the best thing to do is to detect the bounding box and remove it from the image. What's the best way to detect the box (semi-box around the character)? I tried cv2.HoughLines(), but with no result.
I am new to OpenCV. It will be really helpful if someone can help me out here.
Thanks for your idea. I just realized that I can probably count the dark pixels in each vertical column and remove the columns where the count exceeds a certain threshold:
import numpy as np

def get_pixel_count_in_col(img, col):
    # Count the non-white (ink or box) pixels in one column.
    count = 0
    for j in range(img.shape[0]):
        if img[j, col] < 255:
            count = count + 1
    return count

def cleanup_img(img):
    # Columns that are mostly dark are treated as box lines and dropped,
    # together with their immediate neighbours if those are partly dark.
    foundlines = []
    for i in range(img.shape[1]):
        if get_pixel_count_in_col(img, i) > img.shape[0] * 0.7:
            foundlines.append(i)
            if i > 0 and get_pixel_count_in_col(img, i - 1) > img.shape[0] * 0.25:
                foundlines.append(i - 1)
            if i < img.shape[1] - 1 and get_pixel_count_in_col(img, i + 1) > img.shape[0] * 0.25:
                foundlines.append(i + 1)
    return np.delete(img, foundlines, 1)
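For reference, a minimal way to drive this (a sketch; the file names are placeholders, and the Otsu binarization step is an assumption on my part, since the helper counts pixels strictly darker than 255):

import cv2

# Read the form region as grayscale (placeholder file name).
img = cv2.imread("boxed_field.png", cv2.IMREAD_GRAYSCALE)

# Binarize with Otsu so ink/box pixels are 0 and the background is exactly 255,
# which is what cleanup_img expects.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cleaned = cleanup_img(binary)
cv2.imwrite("boxed_field_no_lines.png", cleaned)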
The resulting image makes more sense. But is there any other easy way to do this?
It seems that your input format is quite clean and consistent. You can simply hard-code the width of each box in pixels and crop out the characters. However, if the input format is not fixed, this answer can be extended to handle that as well (it would be a bit more expensive); as a first attempt we will simply hard-code the width of the boxes in pixels.
import cv2

def get_image_chunks(img, size):
    chunks = []
    # To remove black borders
    padding = 2
    for i in range(0, img.shape[1], size):
        col_start = i + padding
        col_end = i + size - padding
        # Slicing the numpy array.
        chunks.append(img[:-padding, col_start:col_end])
    return chunks
img = cv2.imread("/Users/anmoluppal/Downloads/GLUmJ.jpg", 0)
chunks = get_image_chunks(img, 42)
Outputs: the individual character crops (images shown in the original answer).
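To go from these chunks to recognized characters, each slice can be saved and handed to Tesseract in single-character mode. A sketch (pytesseract and the --psm 10 setting are assumptions on my part, not part of the original answer):

import cv2
import pytesseract  # assumes the pytesseract wrapper is installed

for idx, chunk in enumerate(chunks):
    # Skip empty slices that can appear at the right edge of the image.
    if chunk.size == 0:
        continue
    cv2.imwrite("chunk_%02d.png" % idx, chunk)
    # --psm 10 tells Tesseract to treat the image as a single character.
    text = pytesseract.image_to_string(chunk, config="--psm 10")
    print(idx, text.strip())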
I'm trying to remove the grid lines in a handwriting picture. I tried to use the FFT to extract the grid pattern and remove it (this idea comes from an answer to the original question, which has since been closed; it has more background as well). This image shows what I am able to get currently (illustration of the result):
The first line is a real image with handwritten characters. Since it's taken by phone in various conditions (light, direction, etc.), the grid lines might not be perfectly horizontal/vertical, and the color of the grid lines also varies and can be close to the color of the characters. I convert the image to grayscale, apply the FFT, and try thresholding to extract the patterns (in the red rectangle; the illustration uses OTSU). Then I mask the spectrum with the thresholded pattern and use the inverse FFT to get the result. It obviously fails on the real image.
The second line is a real image of a blank grid without handwritten characters. From this, I think the 3 lines (vertical and horizontal) in the center are the patterns I care about.
The third line is a synthetic image with perfect grid lines. It's just for reference, and after applying the same algorithm, the grid lines are removed successfully.
The fourth line is a synthetic image with perfectly dashed grid lines, which is closer to the grid lines on real handwriting practice paper. It's also for reference. It shows that the pattern of dashed lines is actually more complicated than the 3 lines in the center. With the same algorithm, the grid lines are removed almost completely as well.
The code I use is:
import cv2 as cv
import numpy as np
import util  # the poster's own plotting helper

def FFTCV(img):
    util.Plot(img, 'Input')
    print(img.shape)
    if len(img.shape) == 3 and img.shape[2] == 3:
        img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
        util.Plot(img, 'Gray')

    # Forward FFT, shifted so the low frequencies sit in the center.
    dft = cv.dft(np.float32(img), flags=cv.DFT_COMPLEX_OUTPUT)
    dft_shift = np.fft.fftshift(dft)
    util.Plot(cv.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1]), 'fft shift')
    magnitude_spectrum = np.uint8(20 * np.log(cv.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1])))
    util.Plot(magnitude_spectrum, 'Magnitude')

    # Build a binary mask from the magnitude spectrum (OTSU threshold).
    _, threshold = cv.threshold(magnitude_spectrum, 0, 1, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
    # threshold = cv.adaptiveThreshold(
    #     magnitude_spectrum, 1, cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY_INV, 11, 10)
    #     magnitude_spectrum, 1, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY_INV, 11, 10)
    util.Plot(threshold, 'Threshold Mask')

    # Apply the mask in the frequency domain and transform back.
    fshift = dft_shift * threshold[:, :, None]
    util.Plot(cv.magnitude(fshift[:, :, 0], fshift[:, :, 1]), 'fft shift Masked')
    magnitude_spectrum = np.uint8(20 * np.log(cv.magnitude(fshift[:, :, 0], fshift[:, :, 1])))
    util.Plot(magnitude_spectrum, 'Magnitude Masked')

    f_ishift = np.fft.ifftshift(fshift)
    img_back = cv.idft(f_ishift)
    img_back = cv.magnitude(img_back[:, :, 0], img_back[:, :, 1])
    util.Plot(img_back, 'Back')
    return img_back  # return the filtered image for the caller
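For completeness, the function is driven like this (a sketch; the file names are placeholders, and the float result of the inverse FFT needs rescaling before it can be saved):

import cv2 as cv
import numpy as np

img = cv.imread('grid_sample.png')          # placeholder path
cleaned = FFTCV(img)                        # float magnitude image from the inverse FFT
# Rescale back to 0..255 before saving.
out = cv.normalize(cleaned, None, 0, 255, cv.NORM_MINMAX).astype(np.uint8)
cv.imwrite('grid_sample_cleaned.png', out)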
So I'd like to hear suggestions on how to extract the patterns for real images. Thanks very much.
I've run into an issue concerning generating floating point coordinates from an image.
The original problem is as follows:
The input image is handwritten text. From this I want to generate a set of points (just x,y coordinates) that make up the individual characters.
At first I used findContours in order to generate the points. Since this finds the edges of the characters, it first needs to be run through a thinning algorithm, because I'm not interested in the shape of the characters, only the lines, or as in this case, points.
Input:
thinning:
So, I run my input through the thinning algorithm and all is fine, the output looks good. Running findContours on this, however, does not work out so well; it skips a lot of stuff and I end up with something unusable.
The second idea was to generate bounding boxes (with findContours), use these bounding boxes to grab the characters from the thinning output, and grab all non-white pixel indices as "points", offset by the bounding box position. This generates even worse output and seems like a bad method.
Horrible code for this:
Mat temp = new Mat(edges, bb);
byte roi_buff[] = new byte[(int) (temp.total() * temp.channels())];
temp.get(0, 0, roi_buff);

int COLS = temp.cols();
List<Point> preArrayList = new ArrayList<Point>();
for (int i = 0; i < roi_buff.length; i++)
{
    if (roi_buff[i] != 0)
    {
        // Offset the pixel index by the bounding box's top-left corner.
        Point tempP = bb.tl();
        tempP.x += i % COLS;
        tempP.y += i / COLS;
        preArrayList.add(tempP);
    }
}
Are there any alternatives, or am I overlooking something?
UPDATE:
I overlooked the fact that I need the points (pixels) to be ordered. In the method above I simply do a scanline approach to grabbing all the pixels. If you look at the 'o', for example, it would first grab the point on the left-hand side, then the one on the right-hand side. I need them to be ordered by their neighbouring pixels, since I want to draw paths with the points later on (outside of OpenCV).
Is this possible?
You should look into implementing your own connected components labelling. The concept is very simple: you scan the first line and assign unique labels to each horizontally connected strip of pixels. You basically check for every pixel if it is connected to its left neighbour and assign it either that neighbour's label or a new label. In the second row you do the same, but you also check against the pixels above it. Sometimes you need a label merge: two strips that were not connected in the previous row are joined in the current row. The way to deal with this is either to keep a list of label equivalences or use pointers to labels (so you can easily do a complete label change for an object).
This is basically what findContours does, but if you implement it yourself you have the freedom to go for 8-connectedness and even bridge a single-pixel or two-pixel gap. That way you get "almost-connected components labelling". It looks like you need this for the "w" in your example picture.
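If you would rather not hand-roll the two-pass labelling, a rough approximation of this "almost-connected" behaviour is to bridge the small gaps with a dilation first and then use a stock labelling routine. A Python/OpenCV sketch (the 3x3 kernel and the file name are assumptions):

import cv2
import numpy as np

# skeleton: binary image from the thinning step, foreground = 255 (placeholder file).
skeleton = cv2.imread("thinned.png", cv2.IMREAD_GRAYSCALE)
_, skeleton = cv2.threshold(skeleton, 127, 255, cv2.THRESH_BINARY)

# Dilate with a 3x3 kernel so one- or two-pixel gaps become connected,
# which approximates the "almost-connected components" described above.
kernel = np.ones((3, 3), np.uint8)
bridged = cv2.dilate(skeleton, kernel, iterations=1)

# 8-connected labelling on the bridged image...
num_labels, labels = cv2.connectedComponents(bridged, connectivity=8)

# ...but keep only the original skeleton pixels of each label.
components = []
for lab in range(1, num_labels):                   # label 0 is the background
    mask = (labels == lab) & (skeleton > 0)
    ys, xs = np.nonzero(mask)
    components.append(np.column_stack((xs, ys)))   # (x, y) pixel list per component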
Once you have the image labelled this way, you can push all the pixels of a single label to a vector and order them something like this: find the top-left pixel, push it to a new vector and erase it from the original vector. Now find the pixel in the original vector closest to it, push it to the new vector and erase it from the original. Continue until all pixels have been transferred.
It will not be very fast this way, but it should be a start.
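That ordering step transcribes almost directly into code. A sketch (greedy nearest-neighbour, so it can still jump at junctions; points is assumed to be an N x 2 array of (x, y) pixels for one label, e.g. one entry of the components list above):

import numpy as np

def order_points(points):
    # Greedy nearest-neighbour ordering of one component's pixels.
    remaining = points.astype(np.float64).tolist()
    # Start from the top-left-most pixel (smallest y, then smallest x).
    remaining.sort(key=lambda p: (p[1], p[0]))
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        # Find the remaining pixel closest to the last ordered one.
        dists = [(p[0] - last[0]) ** 2 + (p[1] - last[1]) ** 2 for p in remaining]
        ordered.append(remaining.pop(int(np.argmin(dists))))
    return np.array(ordered)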
I have a bunch of uncompressed bitonal TIF document images. All of them have a watermark in the middle. When I run them through OCR, the text that overlaps with the watermark does not get recognized. I am trying to see if I can apply some type of cleanup to remove those watermarks to be able to recognize the missing text.
Again, the images are black and white, but when you look at the watermark it appears grey since it has a pattern of black and white pixels that makes the letters in the watermark less "dense" than regular text. At the same time, the watermark letters are very big, much bigger than the regular text.
An example of a somewhat similar image is this (except this one is in color and the watermark characters in my case are a lot thicker and bigger; my watermarks are also a lot shorter: only 3 to 4 letters long).
It seems that there might be some sort of cleanup filter that would be similar to removing large black borders from an image, except borders are usually "denser" than a watermark, so they appear "more black".
I have 3 tools at my disposal: GIMP, ImageMagick and IrfanView. Can you recommend any specific features of any subset of these tools that might help me?
Playing with contrast etc. did not help, but I found a different way. As stated above, the regular text is a lot "denser" than the watermark text, meaning that a regular black pixel has more surrounding black pixels than a watermark black pixel. So I devised a simple window-based filtering and thresholding algorithm.
Here's how I did it in Matlab, using a 5X5 window:
im = imread('imageWithWmark.tif');
imInv = ~im;
nr = size(imInv,1);
nc = size(imInv,2);

d = 2; % for 5x5 window
counts = zeros(nr,nc);
for rr = d+1 : nr-d-1
    for cc = d+1 : nc-d-1
        counts(rr,cc) = nnz(imInv(rr-d:rr+d, cc-d:cc+d));
    end
end

thresh = 10; % 10 out of 25 -- the larger the thresh the thinner the resulting letters are
imThresh = (counts >= thresh) & imInv;
imwrite(~imThresh, sprintf('Thresh_%d.tif', thresh), 'Compression','none', 'Resolution',300);
Of course, the size of the window, the threshold and the other parameters depend on the characteristics of the regular text on the page (letters bigger/smaller, thicker/thinner, etc.), but even this initial version worked pretty well.
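The same density filter can also be written in a few lines of Python/OpenCV if MATLAB is not at hand; counting black neighbours in a 5x5 window is just a box-filter sum. A sketch with the same window and threshold as above (border pixels are handled by reflection instead of being skipped):

import cv2
import numpy as np

im = cv2.imread("imageWithWmark.tif", cv2.IMREAD_GRAYSCALE)
ink = (im < 128).astype(np.float32)              # 1.0 where the pixel is black

# Number of black pixels in the 5x5 neighbourhood of every pixel
# (filter2D reflects at the border rather than skipping it).
counts = cv2.filter2D(ink, -1, np.ones((5, 5), np.float32))

thresh = 10                                      # 10 out of 25, as in the MATLAB code
keep = (counts >= thresh) & (ink == 1)           # dense (regular-text) black pixels

out = np.where(keep, 0, 255).astype(np.uint8)    # kept pixels black, rest white
cv2.imwrite("Thresh_%d.tif" % thresh, out)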
I have a 128x128 array of elevation data (elevations from -400m to 8000m are displayed using 9 colors) and I need to resize it to 512x512. I did it with bicubic interpolation, but the result looks weird. In the picture you can see the original, nearest-neighbor and bicubic versions. Note: only the elevation data are interpolated, not the colors themselves (the gamut is preserved). Are the artifacts seen in the bicubic image the result of my bad interpolation code, or are they caused by interpolating discrete (9-step) data?
http://i.stack.imgur.com/Qx2cl.png
There must be something wrong with the bicubic code you're using. Here's my result with Python:
The black border around the outside is where the result was outside of the palette due to ringing.
Here's the program that produced the above:
from PIL import Image

im = Image.open(r'c:\temp\temp.png')

# convert the image to a grayscale with 8 values from 10 to 17
levels = ((0,0,255), (1,255,0), (255,255,0), (255,0,0), (255,175,175), (255,0,255), (1,255,255), (255,255,255))
img = Image.new('L', im.size)
iml = im.load()
imgl = img.load()
colormap = {}
for i, color in enumerate(levels):
    colormap[color] = 10 + i
width, height = im.size
for y in range(height):
    for x in range(width):
        imgl[x,y] = colormap[iml[x,y]]

# resize using Bicubic and restore the original palette
im4x = img.resize((4*width, 4*height), Image.BICUBIC)
palette = []
for i in range(256):
    if 10 <= i < 10+len(levels):
        palette.extend(levels[i-10])
    else:
        palette.extend((i, i, i))
im4x.putpalette(palette)
im4x.save(r'c:\temp\temp3.png')
Edit: Evidently Python's Bicubic isn't the best either. Here's what I was able to do by hand in Paint Shop Pro, using roughly the same procedure as above.
Bicubic interpolation can sometimes generate interpolated values outside the original range (can you verify whether this is happening to you?), but it really seems like you may have a bug; it is hard to say without looking at the code. As a general rule the bicubic result should be smoother than the nearest-neighbor result.
Edit: I take that back, I see no interpolated values outside the original range in your images. Still, I think the strange part is the "jaggedness" you get when using bicubic; you may want to double-check that.
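If you want to rule out ringing as the cause, one quick check is to look at the value range of the resized label image and clamp it back into the 10..17 palette range before re-applying the palette. A sketch, reusing the names im4x and palette from the snippet above (the output path is a placeholder):

import numpy as np
from PIL import Image

arr = np.asarray(im4x, dtype=np.int32)
# Ringing shows up as values outside the original 10..17 label range.
print("resized value range:", arr.min(), arr.max())

# Snap out-of-range values back into the palette before saving.
clamped = np.clip(arr, 10, 17).astype(np.uint8)
fixed = Image.fromarray(clamped, mode="L")
fixed.putpalette(palette)
fixed.save(r'c:\temp\temp3_clamped.png')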
I want to get a small image of every word in a lot of scanned books (they are in Persian, i.e. Arabic script).
I have no experience in image processing.
How can I do that in the most efficient way?
I suggest you write a script in MATLAB, something like this.
a : half of the maximum distance between the letters (in pixels).
b : half of the minimum distance between the words (in pixels).
(Let's hope a < b.)
Threshold the scanned image of the page.
I(I < Th) = 0; I(I >= Th) = 1;
Choose 'Th' by experimenting. You should get a binary image 'I' with 1's where the letters are.
Dilate the image.
I = imdilate(I, strel('disk', a));
This will connect the letters together.
Remove noise.
I = bwareaopen(I,n);
This will remove all connected components with fewer than n pixels.
Do connected component analysis.
CC = bwconncomp(I);
Rect = regionprops(CC, 'BoundingBox');
This will return a list of bounding boxes, one rectangle per word.
Extract the corresponding sub-matrix from the original image for each bounding box and write it out using imwrite().
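For anyone without MATLAB, here is the same pipeline sketched in Python/OpenCV (a sketch only; the kernel size, the area threshold and the file names are guesses that will need tuning for real scans, and connectedComponentsWithStats plays the role of bwconncomp/regionprops):

import cv2
import numpy as np

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate horizontally so the letters of one word merge into a single blob.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
words = cv2.dilate(binary, kernel)

num, labels, stats, _ = cv2.connectedComponentsWithStats(words)
for i in range(1, num):                          # label 0 is the background
    x, y, w, h, area = stats[i]
    if area < 30:                                # drop specks, like bwareaopen
        continue
    cv2.imwrite("word_%04d.png" % i, page[y:y+h, x:x+w])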