I am trying to improve the accuracy of an OCR pipeline I wrote. It performs well on normal images but struggles with noisy ones.
The noisy image:
I wrote a function to remove the noise, and it does remove a lot of it, but it also diminishes the text a bit. I am only able to capture around 60% of the text. I tried adjusting the contrast, sharpness and threshold, but was not able to improve OCR performance.
import cv2
import pytesseract
import numpy as np

def noise_remove(image):
    # Dilate then erode with a tiny kernel, then close remaining small gaps
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.dilate(image, kernel, iterations=1)
    image = cv2.erode(image, kernel, iterations=1)
    image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)
    # Median blur to suppress salt-and-pepper noise
    image = cv2.medianBlur(image, 3)
    return image

img = cv2.imread('2.jpg')
img = cv2.resize(img, None, fx=0.8, fy=0.8)
blurImg = noise_remove(img)
text = pytesseract.image_to_string(blurImg)
print(text)
The output I get:
Little: afr aid its eat looked now. iy ye lady girl them good me make. It hardly
cousin ime always. fin shortiy village is raising we sheiting replied. She the ~
tavourabdle partiality inhabiting travelling impression pub luo. His six are
entreaties instrument acceptance unsatiable her. Athongs} as or on herself chapter
ertered carried no Sold oid ten are quit lose deal his sent. You correct how sex
several far distant believe journey parties. We shyniss enquire uncivil attied if
carried to A
This question is a follow-up to this answered question. I'm using Tesseract with Python to read some dates from small images. The solution provided in the link worked for most cases, but I just found out that it is not able to read the character "5".
This is the raw image I'm working with:
Following the advice provided in the former question I have pre-processed the image to get this one:
It looks nice, but Tesseract is still not able to read the first "5". It produces "o MAY 2021".
How can I fine-tune Tesseract, either via parameters or image pre-processing, to get the correct reading?
Since the image is small, I resized it first. Then I binarized the grayscale image, because Tesseract gives more accurate output on binarized images.
>>> import cv2
>>> import pytesseract
>>> img = cv2.imread("5.jpg")
>>> img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[0, 0, 0])
>>> img = cv2.resize(img,None,fx=2, fy=2, interpolation = cv2.INTER_CUBIC)
>>> gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>> otsu = cv2.threshold(gry,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
>>> otsu = 255 - otsu  # invert the binarized image
>>> pytesseract.image_to_string(otsu)
'5 MAY 2021\n\x0c'
>>> print(pytesseract.image_to_string(otsu))
5 MAY 2021
>>>
Currently I am working on the flower classification dataset from Kaggle, which has only 210 images. With this set of images I am getting an accuracy of only 11% on the validation set.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
#from tqdm import tqdm
import os
import warnings
warnings.filterwarnings('ignore')
flower_img = r'C:\Users\asus\Downloads\flower_images\flower_images'
data = pd.read_csv(r'C:\Users\asus\Downloads\flower_images\flower_labels.csv')
img = os.listdir(flower_img)[1]
image_name = [img.split('.')[-2] for img in os.listdir(flower_img)]
label_array = np.array(data['label'])
label_unique = np.unique(label_array)
names = ['phlox','rose','calendula','iris','leucanthemum maximum','bellflower','viola','rudbeckia laciniata','peony','aquilegia']
Flower_names = {}
for i in range(10):
    Flower_names[i] = names[i]
print(Flower_names)
Flower_names.get(8)
x = data['label'][2]
Flower_names.get(x)
# Collect the resized images, then attach them as a new 'file' column
images = []
for img in os.listdir(flower_img):
    path = os.path.join(flower_img, img)
    img = cv2.imread(path)
    img = cv2.resize(img, (128, 128))
    images.append(np.array(img))
data['file'] = images
data['file'][0].shape
plt.imshow(data['file'][0])
plt.show()
import keras
from keras.models import Sequential
from keras.layers import Dense,Conv2D,Activation,MaxPool2D,Dropout,Flatten
model = Sequential()
model.add(Conv2D(32,kernel_size=3,activation='relu',input_shape=(128,128,3)))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(64,kernel_size=3,activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(128,kernel_size=3,activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
#model.add(Conv2D(512,kernel_size=3,activation='relu'))
#model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512,activation='relu'))
model.add(Dense(10,activation='softmax'))
model.add(Dropout(0.25))
from keras.optimizers import Adam
model.compile(loss='categorical_crossentropy',optimizer=Adam(lr=0.002),metrics=['accuracy'])
model.summary()
x = np.array([i for i in data['file']]).reshape(-1,128,128,3)
y = np.array([i for i in data['label']])
from keras.utils import to_categorical
y = to_categorical(y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y)
model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=10)
model.evaluate(x_test,y_test)
model.evaluate(x_train,y_train)
How can I increase accuracy using only this dataset? Also, how can I predict classes for any input image?
Link of Flower color images dataset : https://www.kaggle.com/olgabelitskaya/flower-color-images
Your dataset size is very small. Convolutional neural networks are optimal when trained using very large data sets. You really want to have thousands of images (or more!) in your data set.
You can try to enhance your current data set by using various image processing techniques to increase its size. These techniques take the original images and skew them, rotate them and apply other modifications to bolster the size of the training data. This can be helpful, but increasing the natural size of the data set is preferred.
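As a rough illustration, here is a minimal augmentation sketch using Keras's ImageDataGenerator, reusing the model and the x_train/y_train arrays from the question (the generator parameters are illustrative, not tuned):

from keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shift, zoom and flip the training images on the fly
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Each epoch now sees freshly perturbed copies of the 210 originals
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=30,
                    validation_data=(x_test, y_test))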
If you cannot increase the size of the dataset, you should examine why you need to use a CNN. There are other algorithms that may give better results when trained with a smaller data set. Take a look at Support Vector Machines or k-NN.
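For instance, a quick k-NN baseline on the raw flattened pixels (a sanity check rather than a tuned model) could look like this, again reusing the arrays from the question:

from sklearn.neighbors import KNeighborsClassifier

# Flatten each 128x128x3 image into one long feature vector
x_train_flat = x_train.reshape(len(x_train), -1)
x_test_flat = x_test.reshape(len(x_test), -1)

# y_train/y_test are one-hot here, so convert back to integer labels
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train_flat, y_train.argmax(axis=1))
print(knn.score(x_test_flat, y_test.argmax(axis=1)))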
If you must use a CNN, Transfer Learning is a good solution. You can use the features from a trained model and apply them to your problem. I have had great success with this approach.
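As a minimal sketch of that approach with an ImageNet-pretrained base (VGG16 is used here purely as an example), reusing the arrays from the question:

from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Sequential

# Pretrained convolutional base, frozen so only the new head is trained
base = VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
base.trainable = False

model = Sequential([
    base,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)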
The things you can do:
Progressive resizing link (see the sketch below)
Image augmentation link
Transfer learning link
To be honest, there are many more techniques that could be used to get more out of your data; try searching the topic. These are just the major ones that come to mind, and you can dig deeper with dedicated research.
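For progressive resizing, the idea is to train on small versions of the images first, then continue training on larger ones. A rough sketch, reusing x and y from the question (sizes and epoch counts are arbitrary; the global-pooling head is what lets one model accept both sizes):

import cv2
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, GlobalAveragePooling2D, Dense

# Fully convolutional body + global pooling, so the input size can vary
model = Sequential([
    Conv2D(32, 3, activation='relu', input_shape=(None, None, 3)),
    MaxPool2D(2),
    Conv2D(64, 3, activation='relu'),
    MaxPool2D(2),
    GlobalAveragePooling2D(),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Stage 1: train on downscaled 64x64 copies of the images
x_small = np.array([cv2.resize(im, (64, 64)) for im in x])
model.fit(x_small, y, epochs=10)

# Stage 2: continue training at the full 128x128 resolution
model.fit(x, y, epochs=10)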
So I'm a very experienced developer trying to get into some machine learning/neural networking code.
Essentially I need a HUGE dataset, so my first problem is that I need to find a way of labelling a lot of images quickly. So take this as an example.
I was thinking I could use template matching on the main image with the image below it? So that way I would simply need to get permission to use this data and I could label it very quickly.
When using OpenCV (from the examples) I get some very funky results which don't find the plate in the image. It does draw boxes, but not around the plate; having tested it, it gets very, very close a few times, but not often. The code is:
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('./image2.jpg', 0)
img2 = img.copy()
template = cv.imread('./Plate2.test.png', 0)
w, h = template.shape[::-1]

# All the 6 methods for comparison in a list
methods = ['cv.TM_CCOEFF', 'cv.TM_CCOEFF_NORMED', 'cv.TM_CCORR',
           'cv.TM_CCORR_NORMED', 'cv.TM_SQDIFF', 'cv.TM_SQDIFF_NORMED']

for meth in methods:
    img = img2.copy()
    method = eval(meth)
    # Apply template matching
    res = cv.matchTemplate(img, template, method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take the minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv.rectangle(img, top_left, bottom_right, 255, 2)
    plt.subplot(121), plt.imshow(res, cmap='gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122), plt.imshow(img, cmap='gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)
    plt.show()
My first guess is that this isn't working because the main image we're looking for the template in is oriented differently.
The second thing I should point out is that I am NOT a Python programmer, so I'm learning this as well; this is also my first time touching OpenCV, so I'm trying to apply what I DO understand about object detection to things I don't properly understand.
What I want to do is get the coordinates of a bounding box in the MAIN image from the smaller plate. That way I can (with permission) create a decent dataset to train on really quickly - otherwise, I have to do it manually :-(
Any help would be greatly appreciated. I have a lot of examples working, but this was an interesting problem I didn't find any reading on.
In my mind the steps are:
1) Find the plate and create a bounding box
2) Train across as many images as possible for object detection of said plates
3) When testing, the plate needs extracting from the main image and then a perspective transform applying (see the sketch after this list).
4) If you wanted to, then you'd do text extraction once you've got the plate flattened out.
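For step 3, here's a minimal sketch of the flattening I have in mind, assuming the plate's four corner points are already known (the coordinates below are made up):

import cv2 as cv
import numpy as np

img = cv.imread('./image2.jpg')

# Four corners of the plate in the main image (top-left, top-right,
# bottom-right, bottom-left) - hypothetical values for illustration
src = np.float32([[220, 310], [480, 330], [470, 420], [210, 400]])

# Where those corners should land in the flattened output
w, h = 300, 100
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Compute the perspective transform and warp the plate flat
M = cv.getPerspectiveTransform(src, dst)
flat_plate = cv.warpPerspective(img, M, (w, h))
cv.imwrite('plate_flat.jpg', flat_plate)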
UPDATE:
So I tried SIFT from here; the results are as follows (note this image is already in the public domain, from the above website) - not quite on target!
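For reference, the SIFT route amounts to matching keypoints between the plate template and the scene, then fitting a homography with RANSAC; a minimal sketch (filenames reused from above, thresholds untuned, assuming an OpenCV build where SIFT is available):

import cv2 as cv
import numpy as np

template = cv.imread('./Plate2.test.png', 0)
scene = cv.imread('./image2.jpg', 0)

# Detect SIFT keypoints and descriptors in both images
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Match descriptors and keep the good ones (Lowe's ratio test)
matches = cv.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

if len(good) >= 4:
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Homography that maps the template's corners into the scene
    H, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 5.0)
    h, w = template.shape
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    box = cv.perspectiveTransform(corners, H)  # bounding quad in the scene
    print(box.reshape(-1, 2))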
UPDATE 2:
I've managed to cobble together a solution from an article, as suggested by JD in the comments. Essentially it lets me label enough images to train a neural network that, in turn, is much better at detecting them - I'll post an update soon with the answer.
One of the problems I am working on is OCR on documents. A few of the paystub documents have a highlighted line with dots to differentiate important elements like Gross Pay, Net Pay, etc.
These dots give erroneous results in OCR: it reads them as the ':' character and doesn't give the desired results. I have tried a lot of image processing approaches, such as ImageMagick, to remove these dots, but in each case the quality of the entire text is degraded, resulting in poor OCR.
One ImageMagick command that I tried is:
convert mm150.jpg -kuwahara 3 mm2.jpg
I have also tried connected components, erosion with kernels, etc., but each method fails in some way.
I would like to know if there is some method I should follow, or whether I am missing something in image processing capabilities.
This issue can be resolved using OpenCV's connectedComponentsWithStats function. I found a reference for this in the question How do I remove the dots / noise without damaging the text?
I changed it a bit to fit my needs, and this is the code that helped me get the desired output.
import cv2
import numpy as np
import sys

img = cv2.imread(sys.argv[1], 0)
_, blackAndWhite = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(blackAndWhite, 4, cv2.CV_32S)
sizes = stats[1:, -1]  # get CC_STAT_AREA component
img2 = np.zeros((labels.shape), np.uint8)

for i in range(0, nlabels - 1):
    if sizes[i] >= 8:  # filter out small dotted regions
        img2[labels == i + 1] = 255

res = cv2.bitwise_not(img2)
cv2.imwrite('res.jpg', res)
The output file that I got is pretty clear, with the dotted band removed, and it gives perfect OCR results.
Face registration is an important task in a face recognition system. I know how to register a face by two eye-center points, like this. But I do not know how to use more than two points (e.g. two eye centers, nose tip and two mouth corners) to do face registration.
Any idea for that? Thanks in advance!
If you have a lot of points, say 68, then you can perform Delaunay triangulation and then a piecewise affine warp.
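As a rough sketch of that idea using scikit-image, which triangulates the landmarks internally (the file names and landmark arrays are placeholders; both arrays are (68, 2) in (x, y) order):

import numpy as np
from skimage import io
from skimage.transform import PiecewiseAffineTransform, warp

# Hypothetical inputs: the face, its detected landmarks, and where those
# landmarks should sit in the registered reference frame
face_img = io.imread('face.jpg', as_gray=True)
face_pts = np.load('face_landmarks.npy')   # (68, 2), detected on face_img
ref_pts = np.load('ref_landmarks.npy')     # (68, 2), reference positions

# warp() expects a mapping from output coords to input coords,
# so estimate the transform from the reference points to the face points
tform = PiecewiseAffineTransform()
tform.estimate(ref_pts, face_pts)

# Pixels outside the triangulated landmark hull are left undefined
registered = warp(face_img, tform, output_shape=face_img.shape)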
If you have far fewer than 68, say 5 or 6, then you can try least-squares fitting of an affine or perspective transform. I believe you can use OpenCV's findHomography function and then the perspectiveTransform function to perform this step.
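For example, with five landmarks per face, a minimal sketch of that route (point values and output size are placeholders):

import cv2
import numpy as np

# l eye, r eye, nose tip, l mouth, r mouth - detected vs. reference
# positions (illustrative values)
face_pts = np.float32([[106, 91], [147, 95], [129, 111], [104, 130], [141, 135]])
ref_pts = np.float32([[61, 61], [142, 62], [101, 104], [71, 143], [140, 139]])

# Least-squares homography from the face points to the reference points
H, mask = cv2.findHomography(face_pts, ref_pts, 0)

# Warp the whole face image into the reference frame
face_img = cv2.imread('face.jpg')
registered = cv2.warpPerspective(face_img, H, (200, 200))

# perspectiveTransform maps individual points, e.g. to check the fit
mapped = cv2.perspectiveTransform(face_pts.reshape(-1, 1, 2), H)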
For 2D alignment - to discover the affine transform that maps a set of landmark points onto another set - you are probably best starting with the classic Procrustes Analysis.
Here, someone very graciously provides an implementation converted (from Matlab) into Python.
Using this, here's how I can do what I think you are after...
import procrustes as pc
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import cv2
# Open images...
target_X_img = cv2.imread('arnie1.jpg',0)
input_Y_img = cv2.imread('arnie2.jpg',0)
# Landmark points - same number and order!
# l eye, r eye, nose tip, l mouth, r mouth
X_pts = np.asarray([[61,61],[142,62],[101,104],[71,143],[140,139]])
Y_pts = np.asarray([[106,91],[147,95],[129,111],[104,130],[141,135]])
# Calculate transform via procrustes...
d,Z_pts,Tform = pc.procrustes(X_pts,Y_pts)
# Build and apply transform matrix...
# Note: for affine need 2x3 (a,b,c,d,e,f) form
R = np.eye(3)
R[0:2,0:2] = Tform['rotation']
S = np.eye(3) * Tform['scale']
S[2,2] = 1
t = np.eye(3)
t[0:2,2] = Tform['translation']
M = np.dot(np.dot(R,S),t.T).T
tr_Y_img = cv2.warpAffine(input_Y_img,M[0:2,:],(400,400))
# Confirm points...
aY_pts = np.hstack((Y_pts,np.array(([[1,1,1,1,1]])).T))
tr_Y_pts = np.dot(M,aY_pts.T).T
# Show result - input transformed and superimposed on target...
plt.figure()
plt.subplot(1,3,1)
plt.imshow(target_X_img,cmap=cm.gray)
plt.plot(X_pts[:,0],X_pts[:,1],'bo',markersize=5)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(input_Y_img,cmap=cm.gray)
plt.plot(Y_pts[:,0],Y_pts[:,1],'ro',markersize=5)
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(target_X_img,cmap=cm.gray)
plt.imshow(tr_Y_img,alpha=0.6,cmap=cm.gray)
plt.plot(X_pts[:,0],X_pts[:,1],'bo',markersize=5)
plt.plot(Z_pts[:,0],Z_pts[:,1],'ro',markersize=5) # same as...
plt.plot(tr_Y_pts[:,0],tr_Y_pts[:,1],'gx',markersize=5)
plt.axis('off')
plt.show()
All this only holds true, of course, for planar/rigid and affine transforms. As soon as you start having to cater for non-affine/perspective and deformable surfaces, well, that is an entirely different topic...