How to remove dotted band from text image? - image-processing

One of the problems I am working on is doing OCR on documents. A few of the paystub documents have a highlighted line with dots to set off important elements like Gross Pay, Net Pay, etc.
These dots produce erroneous OCR results: the engine interprets them as ':' characters and doesn't return the desired text. I have tried a number of image-processing approaches, such as ImageMagick, to remove these dots, but in each case the quality of the entire text is degraded, resulting in poor OCR.
One of the ImageMagick commands I tried is:
convert mm150.jpg -kuwahara 3 mm2.jpg
I have also tried connected components, erosion with kernels, etc., but each method fails in some way.
I would like to know if there is a method I should follow, or whether I am missing something from the available image-processing techniques.

This issue can be resolved using OpenCV's connectedComponentsWithStats function. I found a reference for it in the question "How do I remove the dots / noise without damaging the text?"
I changed it a bit to fit my needs, and this is the code that gave me the desired output:
import cv2
import numpy as np
import sys

img = cv2.imread(sys.argv[1], 0)
_, blackAndWhite = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(blackAndWhite, 4, cv2.CV_32S)
sizes = stats[1:, -1]  # get CC_STAT_AREA component
img2 = np.zeros((labels.shape), np.uint8)

for i in range(0, nlabels - 1):
    if sizes[i] >= 8:  # filter small dotted regions
        img2[labels == i + 1] = 255

res = cv2.bitwise_not(img2)
cv2.imwrite('res.jpg', res)
The output file I got is quite clean, with the dotted band removed, and it now gives essentially perfect OCR results.
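For completeness, feeding the cleaned output straight into Tesseract is then a one-liner. This is only a minimal sketch: it assumes pytesseract is installed and reuses the 'res.jpg' file written by the script above.
import cv2
import pytesseract

# Read the cleaned image produced above and run OCR on it
cleaned = cv2.imread('res.jpg', 0)
print(pytesseract.image_to_string(cleaned))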

Related

How to improve PyTesseract OCR Accuracy?

I am trying to improve the accuracy of an OCR pipeline I wrote. It performs well on a clean image but struggles on a noisy one.
The noisy image:
I wrote a function to remove the noise. It does remove a lot of the noise present, but it also thins the text a bit, and I am only able to capture around 60% of the text. I tried adjusting the contrast, sharpness and threshold, but was not able to improve OCR performance.
import cv2
import pytesseract
import numpy as np

def noise_remove(image):
    # dilate then erode with a 1x1 kernel, close small gaps, then median-blur
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.dilate(image, kernel, iterations=1)
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.erode(image, kernel, iterations=1)
    image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)
    image = cv2.medianBlur(image, 3)
    return image

img = cv2.imread('2.jpg')
img = cv2.resize(img, None, fx=0.8, fy=0.8)
blurImg = noise_remove(img)
hImg, wImg, _ = img.shape
text = pytesseract.image_to_string(blurImg)
print(text)
cv2.waitKey(0)
The output I get:
Result:
Little: afr aid its eat looked now. iy ye lady girl them good me make. It hardly
cousin ime always. fin shortiy village is raising we sheiting replied. She the ~
tavourabdle partiality inhabiting travelling impression pub luo. His six are
entreaties instrument acceptance unsatiable her. Athongs} as or on herself chapter
ertered carried no Sold oid ten are quit lose deal his sent. You correct how sex
several far distant believe journey parties. We shyniss enquire uncivil attied if
carried to A
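For reference, one preprocessing variant often tried alongside contrast and threshold tweaks is Otsu binarization before OCR. The sketch below is illustrative only and not from this question; the blur size, the Otsu flag and the --psm setting are assumptions.
import cv2
import pytesseract

# Grayscale, light median blur, then Otsu thresholding before OCR
gray = cv2.imread('2.jpg', 0)
gray = cv2.medianBlur(gray, 3)
_, binarized = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(pytesseract.image_to_string(binarized, config='--psm 6'))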

Matching a template image in CV2 with a different orientation

So I'm a very experienced developer trying to get into some machine learning / neural network code.
Essentially I need a HUGE dataset, so my first problem is that I need to find a way of labelling a lot of images quickly. Take this as the example.
I was thinking I could use template matching on the main image with the image below it? That way I would simply need to get permission to use this data and I could label it very quickly.
When using OpenCV (following the examples) I get some very odd results that don't find the plate in the image. It does draw boxes, but not around the plate; having tested it, it gets very, very close a few times, but not reliably. The code is:
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('./image2.jpg', 0)
img2 = img.copy()
template = cv.imread('./Plate2.test.png', 0)
w, h = template.shape[::-1]

# All the 6 methods for comparison in a list
methods = ['cv.TM_CCOEFF', 'cv.TM_CCOEFF_NORMED', 'cv.TM_CCORR',
           'cv.TM_CCORR_NORMED', 'cv.TM_SQDIFF', 'cv.TM_SQDIFF_NORMED']

for meth in methods:
    img = img2.copy()
    method = eval(meth)

    # Apply template matching
    res = cv.matchTemplate(img, template, method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)

    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take the minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)

    cv.rectangle(img, top_left, bottom_right, 255, 2)

    plt.subplot(121), plt.imshow(res, cmap='gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122), plt.imshow(img, cmap='gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)
    plt.show()
My first guess is that this isn't working because the main image we're searching for the template in is oriented differently.
The second thing I should point out is that I am NOT a Python programmer, so I'm learning that as well, and this is my first time touching OpenCV, so I'm trying to apply what I DO understand about object detection to things I don't properly understand yet.
What I want to do is get the coordinates of a bounding box in the MAIN image from the smaller plate. That way I can (with permission) create a decent dataset to train on really quickly; otherwise I have to do it manually :-(
Any help would be greatly appreciated. I have a lot of examples working, but this was an interesting problem I didn't find any reading on.
In my mind the steps are:
1) Find the plate and create a bounding box.
2) Train the dataset across as many images as possible for object detection on said plates.
3) When testing, the plate needs extracting from the main image and then a perspective transform applying.
4) If you wanted to, you'd then do text extraction once you've got the plate flattened out.
UPDATE:
So I tried SIFT from here; the results are as follows (note this image is already in the public domain on the above website). Not quite on target!
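For anyone following the same route, the general shape of such a keypoint-plus-homography pipeline is roughly the sketch below. It uses ORB in place of SIFT; the file names, feature counts and thresholds are placeholders, not values from the question.
import cv2 as cv
import numpy as np

scene = cv.imread('./image2.jpg', 0)
plate = cv.imread('./Plate2.test.png', 0)

# Detect keypoints and descriptors in both images
orb = cv.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(plate, None)
kp2, des2 = orb.detectAndCompute(scene, None)

# Match descriptors and keep the strongest matches
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
good = sorted(bf.match(des1, des2), key=lambda m: m.distance)[:50]

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Estimate the perspective transform from plate to scene with RANSAC
    H, mask = cv.findHomography(src, dst, cv.RANSAC, 5.0)
    h, w = plate.shape
    corners = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
    box = cv.perspectiveTransform(corners, H)  # plate corners in the scene image
    cv.polylines(scene, [np.int32(box)], True, 255, 3)
    cv.imwrite('detected.jpg', scene)
The four projected corners give exactly the bounding region needed for step 1, and the inverse of H can be used for the perspective flattening in step 3.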
UPDATE 2
I've managed to cobble together a solution from an article, as suggested by JD in the comments. Essentially it lets me label enough images to create a neural network that in turn is much better at detecting them. I'll post an update soon with the answer.

Block scheme detection in text document

I have an image of a text document. It includes text and block schemes. The main problem is to detect the block schemes. I think there are two approaches to this task: 1) detect the geometric primitives that make up a scheme; 2) detect the whole scheme.
How can I solve this task? Please suggest some approaches.
UPDATE 1
I am trying to detect where in the document the block scheme is placed. An example is shown in the picture below. I am not trying to detect the text inside the block scheme.
UPDATE 2
The main problem is that I have to find block schemes of different varieties, and even partial block schemes.
You can do either 1) object detection or 2) semantic segmentation. I would suggest segmentation, because boundary extraction is crucial for your application.
I'm assuming you have the pages of the documents as images.
The following are the steps involved in projects that use segmentation.
Dataset
Collect the images of the pages required to solve your problem and do preprocessing steps such as resizing, to bring all the images in your dataset to a common shape and to reduce the number of computations performed. Be sure to maintain variability in your samples.
Now you have to annotate the regions of the images that you are interested in and mark them with a name. You are essentially assigning a class (as in classification) to certain regions of the image. You can use the following tools for this:
Labelme -- (my recommendation)
VGG Annotation tool -- (a highly portable tool written in HTML, but it has fewer features than Labelme)
Model
You can use U-Net Model for your task. Unet Paper. It is very easy to implement but performs very robustly on most real-world tasks such as yours.
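For orientation only, a stripped-down U-Net-style model in Keras looks roughly like the sketch below. The input shape, layer widths and loss are assumptions for illustration, not values from this answer or the paper.
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(256, 256, 1)):
    inputs = layers.Input(shape=input_shape)
    # Encoder: two downsampling stages
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(p2)
    # Decoder with skip connections (the defining U-Net idea)
    u2 = layers.concatenate([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(32, 3, padding='same', activation='relu')(u2)
    u1 = layers.concatenate([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(16, 3, padding='same', activation='relu')(u1)
    # One output channel: probability that a pixel belongs to a block scheme
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c4)
    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model
The real U-Net uses deeper stacks and many more filters per stage; the point here is just the encoder-decoder shape with skip connections.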
We have done something similar at work. This is the blog post. We have explained in detail the steps involved in the pipeline, from the data collection stage to the results.
Literature on Document Layout Analysis:
https://arxiv.org/pdf/1804.10371.pdf -- They used U-Net with ResNet-50 as the encoder and achieved very good results compared to previous approaches.
https://github.com/leonlulu/DeepLayout -- A Python implementation of a page layout analysis tool using a DeepLab v2 model that performs semantic segmentation.
Conclusion
The approach presented here might seem tedious and time-consuming, but it is robust to variability in the documents at test time. Comment below if you have any questions.
I would prefer to have more examples of the types of diagram you are searching for, but based on the example you have given, here is my attempt at solving it naively.
1) Resize the image to a manageable size to improve speed and reduce the number of operations.
2) Use morphological opening to cluster all the dark objects together.
3) Binarize the dark objects.
4) Label the objects using OpenCV connected components. This gives us the bounding box of each region.
5) Cluster overlapping bounding boxes together.
6) Analyze each bounding box to find the one containing the diagram. Here you could apply a more sophisticated algorithm like box detection or even arrow detection, but in your example I think a simple aspect-ratio test is sufficient.
Here is the code for the implementation:
import cv2
import numpy as np

# Function to fill all the bounding boxes
def fill_rects(image, stats):
    for i, stat in enumerate(stats):
        if i > 0:
            p1 = (stat[0], stat[1])
            p2 = (stat[0] + stat[2], stat[1] + stat[3])
            cv2.rectangle(image, p1, p2, 255, -1)

# image name
img_name = 'test_image.png'

# Load image file
diagram = cv2.imread(img_name, 0)
diagram = cv2.blur(diagram, (5, 5))

fScale = 0.25

# Make it smaller to speed up everything and make it easier to cluster
small_img = cv2.resize(diagram, (0, 0), fx=fScale, fy=fScale)
img_h, img_w = np.shape(small_img)

# Morphological opening to cluster nearby objects
fat_img = cv2.morphologyEx(small_img, cv2.MORPH_OPEN, None, iterations=1)

# Threshold strong signals
_, bin_img = cv2.threshold(fat_img, 210, 255, cv2.THRESH_BINARY_INV)

# Analyse connected components
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(bin_img)

# Cluster all the intersecting bounding boxes together
rsmall, csmall = np.shape(small_img)
new_img1 = np.zeros((rsmall, csmall), dtype=np.uint8)
fill_rects(new_img1, stats)

# Analyse the new connected components to get the filled (merged) regions
num_labels_new, labels_new, stats_new, centroids_new = cv2.connectedComponentsWithStats(new_img1)

# Check for merged regions that satisfy the conditions corresponding to a diagram
min_dia_width = img_w * 0.1
dia_regions = []
for i, stat in enumerate(stats_new):
    if i > 0:
        # get basic dimensions
        x, y, w, h = stat[0:4]
        # calculate aspect ratio
        ratio = w / float(h)
        # if the condition is met, save the region (scaled back to the original image size)
        if ratio < 1 and w > min_dia_width:
            dia_regions.append((x / fScale, y / fScale, w / fScale, h / fScale))

# For display purposes
diagram_disp = cv2.imread(img_name)
for region in dia_regions:
    x, y, w, h = region
    x = int(x)
    y = int(y)
    w = int(w)
    h = int(h)
    cv2.rectangle(diagram_disp, (x, y), (x + w, y + h), (0, 255, 0), 2)

labels_disp = np.uint8(200 * labels / np.max(labels)) + 50
labels_disp2 = np.uint8(200 * labels_new / np.max(labels_new)) + 50

cv2.imshow('small_img', small_img)
cv2.imshow('fat_img', fat_img)
cv2.imshow('bin_img', bin_img)
cv2.imshow("labels", labels_disp)
cv2.imshow("labels_disp2", labels_disp2)
cv2.imshow("diagram_disp", diagram_disp)
cv2.waitKey(0)
Here is the result for another type of input.

Keras: Visualize ImageDataGenerator Output

I would like to see what my ImageDataGenerator yields to my network.
Edit:
I removed the channel_shift_range; I had accidentally left it in the code.
Generator
genNorm = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                             height_shift_range=0.1, zoom_range=0.1, horizontal_flip=True)
Get Batches
batches = genNorm.flow_from_directory(path + 'train', target_size=(224, 224),
                                      class_mode='categorical', batch_size=64)
x_batch, y_batch = next(batches)
Plot Images
for i in range(0, 32):
    image = x_batch[i]
    plt.imshow(image.transpose(2, 1, 0))
    plt.show()
Result
Generator Output
Is this normal or am I doing something wrong here?
The strange colors result from your channel shift. Do you really need that to augment your samples? Is a value of 10 (=very high) really what you want?
In addition to that: another, and likely more efficient, way of checking what your generator yields is to set a directory with save_to_dir (a parameter of the flow / flow_from_directory functions). In it you'll find all the images that have been transformed and delivered to your fit/flow function.
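A minimal sketch of that save_to_dir approach, reusing the generator from the question (the directory name and prefix are placeholders):
import os

os.makedirs('augmented_preview', exist_ok=True)
batches = genNorm.flow_from_directory(path + 'train', target_size=(224, 224),
                                      class_mode='categorical', batch_size=64,
                                      save_to_dir='augmented_preview',
                                      save_prefix='aug', save_format='jpeg')
next(batches)  # pulling one batch writes its augmented images into the folder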
Edit:
You still somehow seem to invert your images during processing or while displaying them. I assume the original images look more like this:
Save the results of your ImageDataGenerator to a directory and compare these with the results that you see with plt.show.
Try this; change the generator as follows:
import numpy as np

def my_preprocessing_func(img):
    image = np.array(img)
    return image / 255

genNorm = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                             height_shift_range=0.1, zoom_range=0.1, horizontal_flip=True,
                             preprocessing_function=my_preprocessing_func)
That worked for me,
Bruno
Keras performs its image operations with the Pillow backend, which loads images in 'RGB' order by default, so you don't need to reverse the channel axis in your plt.imshow().
Only with cv2.imread() (which is 'BGR' by default) might you need plt.imshow(img[:, :, ::-1]) to display the image correctly.
BR
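Putting the two answers together, a minimal plotting sketch would look like this (assuming the generator has not already rescaled the images to [0, 1]):
import matplotlib.pyplot as plt

x_batch, y_batch = next(batches)
for i in range(8):
    plt.imshow(x_batch[i] / 255.0)  # Keras/Pillow arrays are already RGB, so no transpose or channel flip
    plt.axis('off')
    plt.show()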

Converting Caffe model to CoreML

I am working to understand CoreML. For a starter model, I've downloaded Yahoo's Open NSFW caffemodel. You give it an image, and it gives you a probability score (between 0 and 1) that the image contains unsuitable content.
Using coremltools, I've converted the model to a .mlmodel and brought it into my app. It appears in Xcode like so:
In my app, I can successfully pass an image, and the output appears as an MLMultiArray. Where I am having trouble is understanding how to use this MLMultiArray to obtain my probability score. My code is like so:
func testModel(image: CVPixelBuffer) throws {
    let model = myModel()
    let prediction = try model.prediction(data: image)
    let output = prediction.prob // MLMultiArray
    print(output[0]) // 0.9992402791976929
    print(output[1]) // 0.0007597212097607553
}
For reference, the CVPixelBuffer is being resized to the 224x224 that the model asks for (I'll get into playing with Vision once I can figure this out).
The two indexes I've printed to the console do change if I provide a different image, but their scores are wildly different from the result I get when I run the model in Python. The same image passed into the model in Python gives me an output of 0.16, whereas my CoreML output, per the example above, is far different (and an MLMultiArray rather than Python's single double value) from what I'm expecting to see.
Is more work necessary to get a result like the one I am expecting?
It seems like you are not transforming the input image in the same way the model expects.
Most Caffe models expect "mean subtracted" images as input, and so does this one. If you inspect the Python code provided with Yahoo's Open NSFW (classify_nsfw.py):
# Note that the parameters are hard-coded for best results
caffe_transformer = caffe.io.Transformer({'data': nsfw_net.blobs['data'].data.shape})
caffe_transformer.set_transpose('data', (2, 0, 1)) # move image channels to outermost
caffe_transformer.set_mean('data', np.array([104, 117, 123])) # subtract the dataset-mean value in each channel
caffe_transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
caffe_transformer.set_channel_swap('data', (2, 1, 0)) # swap channels from RGB to BGR
Also there is a specific way an image is resized to 256x256 and then cropped to 224x224.
To obtain exactly the same results, you'll need to transform your input image in exactly the same way on both platforms.
See this thread for additional information.
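For completeness, the usual way to bake this preprocessing into the conversion itself is through coremltools' image preprocessing arguments. The snippet below is only a sketch: the file names are placeholders, and the bias values simply mirror the channel means above (with is_bgr matching the Caffe channel swap).
import coremltools

# Treat 'data' as an image input and replicate the Caffe transformer:
# BGR channel order and per-channel mean subtraction (pixels stay in [0, 255]).
coreml_model = coremltools.converters.caffe.convert(
    ('nsfw_model.caffemodel', 'deploy.prototxt'),
    image_input_names='data',
    is_bgr=True,
    red_bias=-123,
    green_bias=-117,
    blue_bias=-104)
coreml_model.save('OpenNSFW.mlmodel')
The resize-to-256-then-center-crop step is not covered by these arguments and still has to be reproduced in whatever code builds the CVPixelBuffer on the iOS side.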
