Extracting images of words from a scanned paper - image-processing

I want to get a small image of every word in a large number of scanned books (which are in Persian, i.e. Arabic script).
I have no experience in image processing.
How can I do that in the most efficient way?

I suggest you write a script in MATLAB, something like this. Define:
a : half of the maximum distance between letters (in pixels).
b : half of the minimum distance between words (in pixels).
(Let's hope a < b.)
Threshold the scanned image of the page.
I = (I < Th); % assuming dark letters on a light background
Choose 'Th' by experimenting. You should get a binary image 'I' with 1's where the letters are.
Dilate the image.
I = imdilate(I, strel('disk', a));
This will connect the letters of a word together.
Remove noise.
I = bwareaopen(I,n);
This will remove all connected components with fewer than n pixels.
Do connected component analysis.
CC = bwconncomp(I);
Rect = regionprops(CC, 'BoundingBox');
This will return, for each connected component, the coordinates of a bounding rectangle; each rectangle should contain a single word.
Extract each sub-matrix from the original image and write it out using imwrite().
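For comparison, here is a minimal sketch of the same pipeline in Python with OpenCV; the file name 'page.png', the half-gap a, and the noise threshold are placeholder values you would tune for your scans.
import cv2

# Sketch of the pipeline above in Python/OpenCV; constants are placeholders.
img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)

# Threshold (Otsu picks 'Th' automatically); invert so letters become 1's.
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate with a horizontal kernel so letters of a word merge but words do not.
a = 5  # half of the maximum distance between letters, in pixels
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2 * a, 1))
merged = cv2.dilate(bw, kernel)

# Connected components; skip tiny ones (the bwareaopen step).
n, labels, stats, _ = cv2.connectedComponentsWithStats(merged, connectivity=8)
for i in range(1, n):  # label 0 is the background
    x, y, w, h, area = stats[i]
    if area < 30:  # noise threshold
        continue
    cv2.imwrite('word_%d.png' % i, img[y:y + h, x:x + w])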

Related

How to divide image into two parts without crossing any object using openCV?

I am using an object-detection machine-learning model (with only one object class). It works well when there are only a few objects in the image, but if my image has more than 300 objects it can't recognize anything. So I want to divide the image into two or four parts without crossing any object.
I used Otsu thresholding and got this thresholded image. Actually, I want to divide my image along this line (see the expected image). I think my model will work well if it makes predictions on each part of the image.
I tried to use findContours, looked for a contourArea bigger than half the image area, drew it into a new image, and drew the remaining part into another image. But most contour areas don't even reach 1/10 of the image area, so it is not a good solution.
I also thought about detecting a line that touches both boundaries (top and bottom); how can I do that?
Any suggestion is appreciated. Thanks so much.
Since your regions of interest are already separated, you can use connectedComponents to get their bounding boxes. My approach is below.
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('circles.png', 0)
img = img[20:, 20:]  # remove the connecting lines on the top and the left sides
_, img = cv2.threshold(img, 0, 1, cv2.THRESH_BINARY)
labels, stats = cv2.connectedComponentsWithStats(img, connectivity=8)[1:3]
plt.imshow(labels, 'tab10')
plt.show()
As you can see, the two regions of interest have different labels. All we need to do is get the bounding boxes of those regions. But first, we have to find the indices of the regions. For this, we can use the sizes of the areas: after the background (blue), they are the largest components.
areas = stats[1:,cv2.CC_STAT_AREA] # the first index is always for the background, we do not need that, so remove the background index
roi_indices = np.flip(np.argsort(areas))[0:2] # this will give you the indices of two largest labels in the image, which are your ROIs
# Coordinates of bounding boxes
left = stats[1:,cv2.CC_STAT_LEFT]
top = stats[1:,cv2.CC_STAT_TOP]
width = stats[1:,cv2.CC_STAT_WIDTH]
height = stats[1:,cv2.CC_STAT_HEIGHT]
for i in range(2):
    roi_ind = roi_indices[i]
    roi = labels == roi_ind + 1
    roi_top = top[roi_ind]
    roi_bottom = roi_top + height[roi_ind]
    roi_left = left[roi_ind]
    roi_right = roi_left + width[roi_ind]
    roi = roi[roi_top:roi_bottom, roi_left:roi_right]
    plt.imshow(roi, 'gray')
    plt.show()
For your information, my method is only valid for 2 regions. In order to split into 4 regions, you would need some other approach.

How can I count the number of blobs in this image?

I have an image which is the result of k-means segmentation. The code to obtain it is here:
% Read the image and convert to L*a*b* color space
I = imread('Crop.jpg');
% h = ginput(2);
% Diameter = sqrt((h(2)-h(1))^2+(h(4)-h(3))^2);
% MeanArea = 3.14*(Diameter^2)/4;
Ilab = rgb2lab(I);
% Extract a* and b* channels and reshape
ab = double(Ilab(:,:,2:3));
nrows = size(ab,1);
ncols = size(ab,2);
ab = reshape(ab,nrows*ncols,2);
% Segmentation using k-means
nColors = 4;
[cluster_idx, cluster_center] = kmeans(ab,nColors,...
'distance', 'sqEuclidean', ...
'Replicates', 3);
% Show the result
pixel_labels = reshape(cluster_idx,nrows,ncols);
figure(1);
imshow(pixel_labels,[]), title('image labeled by cluster index');
Resulting in this picture:
Now, as you can see, most of the elements are connected, so I want to count all of the blobs (besides the background one) and then filter them using MeanArea, the area of an element's incircle. If a blob is smaller than MeanArea I do not count it; if a blob is larger than MeanArea I want to divide its area by MeanArea to estimate the number of elements it contains. All of this is to get a measure such that #blobs = #elements. I know it has something to do with 'bwlabel' and 'regionprops' but I don't know how to code this since I'm a beginner; any coding help is appreciated. Thanks.
EDIT: using the 'trees' approach linked in the comments I got very bad results, so I don't think it's the right method. I don't have objects of the same colour as in the tree example; mine only share the same shape.
I'm following this other approach: Color segmentation by k-means.
I obtained the labelled image above, but how can I save it into a variable so that I can erode it and count the number of blobs? That's my question.
EDIT2: The original picture is this one. I'm trying to detect the number of red, green, and blue objects.
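A minimal sketch of the counting logic described above, using OpenCV's connectedComponentsWithStats in Python in place of bwlabel/regionprops; 'mask.png' and mean_area are hypothetical stand-ins for one cluster's binary mask and the measured MeanArea.
import cv2
import numpy as np

# Count each blob once, but divide large merged blobs by the element area.
mask = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 0, 255, cv2.THRESH_BINARY)

mean_area = 500.0  # pi * Diameter^2 / 4, as in the commented-out code above
n, _, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)

count = 0
for i in range(1, n):  # label 0 is the background
    area = stats[i, cv2.CC_STAT_AREA]
    if area < mean_area:
        continue  # smaller than one element: do not count it
    count += int(round(area / mean_area))  # estimate elements in the blob
print('estimated number of elements:', count)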

Eliminating various backgrounds from image and segmenting object?

Let's say I have this input image, with any number of boxes. I want to segment out these boxes so that I can eventually extract them.
input image:
The background could be anything that is continuous, like a painted wall, a wooden table, or carpet.
My idea was that the gradient would be roughly constant throughout the background, so I could zero out the parts of the image where the gradient is about the same.
Through edge detection, I would dilate and fill the regions where edges were detected. Essentially, my goal is to produce one blob for each area where a box is. Having the blobs, I would know the exact locations of the boxes and could crop them out of the input image.
So in this case, I should be able to get four blobs, and then crop four images out of the input image.
This is how far I got:
segmented image:
query = imread('AllFour.jpg');
gray = rgb2gray(query);
[~, threshold] = edge(gray, 'sobel');
weightedFactor = 1.5;
BWs = edge(gray,'roberts');
%figure, imshow(BWs), title('binary gradient mask');
se90 = strel('disk', 30);
se0 = strel('square', 3);
BWsdil = imdilate(BWs, [se90]);
%figure, imshow(BWsdil), title('dilated gradient mask');
BWdfill = imfill(BWsdil, 'holes');
figure, imshow(BWdfill);
title('binary image with filled holes');
What a very interesting problem! Here's my solution in an attempt to solve this problem for you. This is assuming that the background has the same colour distribution throughout. First, transform your image from RGB to the HSV colour space with rgb2hsv. The HSV colour space is an ideal transform for analyzing colours. After this, I would look at the saturation and value planes. Saturation is concerned with how "pure" the colour is, while value is the intensity or brightness of the colour itself. If you take a look at the saturation and value planes for the image, this is what is shown:
im = imread('http://i.stack.imgur.com/1SGVm.jpg');
out = rgb2hsv(im);
figure;
subplot(2,1,1);
imshow(out(:,:,2));
subplot(2,1,2);
imshow(out(:,:,3));
This is what I get:
Taking a look at some locations in the grey background, the majority of the saturation values are less than 0.2, while the values in the value plane are greater than 0.3. As such, we want the opposite of those pixels in order to get our objects: pixels whose saturation is greater than 0.2, or whose value is less than 0.3:
seg = out(:,:,2) > 0.2 | out(:,:,3) < 0.3;
This is what we get:
Almost there! There are some spurious single pixels, so I'm going to perform an opening with imopen using a line structuring element.
After this, I'll perform a dilation with imdilate to close any gaps, then use imfill with the 'holes' option to fill in the holes, and finally use erosion with imerode to shrink the shapes back to their original form. As such:
se = strel('line', 3, 90);
pre = imopen(seg, se);
se = strel('square', 20);
pre2 = imdilate(pre, se);
pre3 = imfill(pre2, 'holes');
final = imerode(pre3, se);
figure;
imshow(final);
final contains the segmented image with the 4 candy boxes. This is what I get:
Try resizing the image. When you make it smaller, it is easier to join the edges. I tried what's shown below. You might have to tune it depending on the nature of the background.
close all;
clear all;
im = imread('1SGVm.jpg');
small = imresize(im, .25); % resize
grad = (double(imdilate(small, ones(3))) - double(small)); % extract edges
gradSum = sum(grad, 3);
bw = edge(gradSum, 'Canny');
joined = imdilate(bw, ones(3)); % join edges
filled = imfill(joined, 'holes');
filled = imerode(filled, ones(3));
imshow(label2rgb(bwlabel(filled))) % label the regions and show
If you have a recent version of MATLAB, try the Color Thresholder app in the image processing toolbox. It lets you interactively play with different color spaces, to see which one can give you the best segmentation.
If your candy covers are fixed, or you know in advance all the covers that can appear in the scene, then template matching is best for this, as it is independent of the background in the image.
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
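A minimal template-matching sketch in Python/OpenCV along these lines; the file names 'scene.jpg' and 'box_template.jpg' and the 0.8 similarity threshold are assumptions to adapt.
import cv2
import numpy as np

scene = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('box_template.jpg', cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Normalized cross-correlation scores every placement of the template.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(result >= 0.8)  # keep sufficiently similar placements
for x, y in zip(xs, ys):
    cv2.rectangle(scene, (int(x), int(y)), (int(x) + w, int(y) + h), 255, 2)
cv2.imwrite('matches.jpg', scene)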

colour detection bot

I want to create a script that automatically clicks on a moving target in a game.
To do this, I want to check colours on the screen to determine where my target is, then click him, repeating this if he moves.
I'm not skilled at programming, and there might be an easier solution than what I have proposed below:
1/ Split the screen into equal tiles - the size of a tile should match the in-game object.
2/ Loop through each pixel of each tile and build a histogram of the pixel colours.
3/ If the most common recorded colour matches what we need, we MIGHT have the correct tile. Save the coords and click the object to complete the task.
4/ Every 0.5 seconds, check the colour to determine whether the object has moved. If it hasn't, keep clicking; if it has, repeat steps 1, 2, and 3.
The step I am unsure how to do technically is step 1. What data structure would I need for a tile? Would a 2D array suffice? I could store the value of each colour in the array and then determine whether it is the object. Also, in pseudocode, how would I split the screen up into tiles to be searched? The tile issue is my main problem (see the sketch below).
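As an illustration of step 1 (an assumption, not from the thread): if a screenshot is held as a NumPy array, plain slicing already gives you the tiles, so no special data structure is needed.
import numpy as np

def tiles(screen, tile_h, tile_w):
    """Yield (top, left, tile_view) for each fixed-size tile of the screen array."""
    h, w = screen.shape[:2]
    for top in range(0, h - tile_h + 1, tile_h):
        for left in range(0, w - tile_w + 1, tile_w):
            yield top, left, screen[top:top + tile_h, left:left + tile_w]

# Step 2, a colour histogram of one tile (3-channel image assumed), could be:
# values, counts = np.unique(tile.reshape(-1, 3), axis=0, return_counts=True)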
EDIT for rayryeng 2:
I will be using Python for this task. This is not my game; I just want to create a macro to automatically perform a task for me in the game. I have no code yet; I am looking more for the ideas behind making this work than for actual code.
3rd edit and final code:
#!/usr/bin/python
import win32gui, win32api, win32con

# function to take in coords and return the colour at that pixel
def colour_return(x, y):
    colours = win32gui.GetPixel(win32gui.GetDC(win32gui.GetActiveWindow()), x, y)
    return colours

def click(x, y):
    win32api.SetCursorPos((x, y))
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTDOWN, x, y, 0, 0)
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTUP, x, y, 0, 0)

# variable declaration
x = 1
y = 1
pixel_value = []
colour_found = 0
while x < 1600:
    pixel_value = colour_return(x, y)
    if pixel_value == 1844766:
        click(x, y)
    x = x + 1
    #print x
    print y
    if x == 1600:  # end of the row: move on to the next row
        y = y + 1
        x = 1
        #print tile
    pixel_value = 0
This is the final code that I have produced. It works, but it is incredibly slow: it takes 30 seconds to search all 1600 pixels of the y = 1 row. I guess my method is not workable. Instead of using histograms and tiles, I am now just searching for a colour and clicking the coordinates when it matches. What is the fastest method for searching an entire screen for a certain colour? I've seen colour-detection bots that manage to keep up with a moving character every second.
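One common way to make this dramatically faster (a suggestion, not from the thread) is to grab the whole screen once per frame and search the pixel array with vectorized NumPy operations instead of calling GetPixel once per pixel. A sketch using PIL's ImageGrab:
import numpy as np
from PIL import ImageGrab

# The COLORREF 1844766 above unpacks to RGB (30, 38, 28): the low byte is red.
target = (30, 38, 28)

screen = np.array(ImageGrab.grab())                # H x W x 3 (or 4) array
mask = np.all(screen[:, :, :3] == target, axis=2)  # True where the colour matches
matches = np.argwhere(mask)                        # (row, col) pairs
if matches.size:
    y, x = matches[0]                              # first matching pixel
    print('first match at x=%d, y=%d' % (x, y))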

OpenCV: Generating points from image after thinning

I've run into an issue concerning generating floating-point coordinates from an image.
The original problem is as follows:
the input image is handwritten text. From this I want to generate a set of points (just x,y coordinates) that make up the individual characters.
At first I used findContours to generate the points. Since this finds the edges of the characters, the image first needs to be run through a thinning algorithm, because I'm not interested in the shape of the characters, only the lines, or as in this case, the points.
Input:
thinning:
So, I run my input through the thinning algorithm and all is fine; the output looks good. Running findContours on this, however, does not work out so well: it skips a lot of pixels and I end up with something unusable.
The second idea was to generate bounding boxes (with findContours), use these bounding boxes to grab the characters from the thinned image, collect all non-white pixel indices as "points", and offset them by the bounding-box position. This generates even worse output, and seems like a bad method.
Horrible code for this:
Mat temp = new Mat(edges, bb);
byte roi_buff[] = new byte[(int) (temp.total() * temp.channels())];
temp.get(0, 0, roi_buff);
int COLS = temp.cols();
List<Point> preArrayList = new ArrayList<Point>();
for (int i = 0; i < roi_buff.length; i++)
{
    if (roi_buff[i] != 0)
    {
        Point tempP = bb.tl();  // top-left corner of the bounding box
        tempP.x += i % COLS;    // column offset within the ROI
        tempP.y += i / COLS;    // row offset within the ROI
        preArrayList.add(tempP);
    }
}
Are there any alternatives, or am I overlooking something?
UPDATE:
I overlooked the fact that I need the points (pixels) to be ordered. In the method above I simply take a scanline approach to grabbing all the pixels. If you look at the 'o', for example, it would first grab the point on the left-hand side, then the one on the right-hand side. I need them to be ordered by their neighbouring pixels, since I want to draw paths with the points later on (outside of OpenCV).
Is this possible?
You should look into implementing your own connected-components labelling. The concept is very simple: you scan the first line and assign unique labels to each horizontally connected strip of pixels. Basically, you check for every pixel whether it is connected to its left neighbour, and assign it either that neighbour's label or a new label. In the second row you do the same, but you also check against the pixels above. Sometimes you need a label merge: two strips that were not connected in the previous row are joined in the current row. The way to deal with this is either to keep a list of label equivalences or to use pointers to labels (so you can easily do a complete label change for an object).
This is basically what findContours does, but if you implement it yourself you have the freedom to go for 8-connectedness and even to bridge a single-pixel or two-pixel gap. That way you get "almost-connected-components labelling". It looks like you need this for the "w" in your example picture.
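A compact sketch of that two-pass labelling in Python (4-connected for brevity; the 8-connectedness and gap bridging mentioned above would go into the neighbour check):
import numpy as np

def label_image(binary):
    """binary: 2D array of 0s and 1s. Returns an integer label image."""
    labels = np.zeros(binary.shape, dtype=int)
    parent = {}  # union-find structure recording label equivalences

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    h, w = binary.shape
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            left = labels[y, x - 1] if x > 0 else 0
            up = labels[y - 1, x] if y > 0 else 0
            if not left and not up:
                labels[y, x] = next_label        # new strip: new label
                parent[next_label] = next_label
                next_label += 1
            else:
                neighbours = [l for l in (left, up) if l]
                labels[y, x] = min(neighbours)
                if len(neighbours) == 2:         # merge: record equivalence
                    a, b = find(left), find(up)
                    parent[max(a, b)] = min(a, b)

    # second pass: resolve all recorded equivalences
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels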
Once you have the image labelled this way, you can push all the pixels of a single label into a vector and order them something like this: find the top-left pixel, push it to a new vector, and erase it from the original vector. Then find the pixel in the original vector closest to it, push it to the new vector, and erase it from the original. Continue until all pixels have been transferred.
It will not be very fast this way, but it should be a start.
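A sketch of that ordering step (again in Python, for brevity): start from the top-left pixel and repeatedly move the nearest remaining pixel into the path.
def order_pixels(points):
    """points: list of (x, y) pixel coordinates of one labelled component."""
    remaining = points[:]
    # start from the top-left-most pixel
    current = min(remaining, key=lambda p: (p[1], p[0]))
    remaining.remove(current)
    path = [current]
    while remaining:
        # pick the remaining pixel closest to the last one in the path
        current = min(remaining,
                      key=lambda p: (p[0] - path[-1][0]) ** 2 +
                                    (p[1] - path[-1][1]) ** 2)
        remaining.remove(current)
        path.append(current)
    return path

# e.g. order_pixels([(3, 1), (1, 1), (2, 1)]) -> [(1, 1), (2, 1), (3, 1)]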
