tesseract not able to read all digits accurately - opencv

I'm using Tesseract to recognize numbers from images of a screen taken with a phone camera. After some preprocessing (processed image), Tesseract gives me mixed results. Running the following code on the image above outputs "EOE". However, with this other processed image, I get an exact match: "39:45.8"
import cv2
import pytesseract
from PIL import Image, ImageEnhance
from matplotlib import pyplot as plt
orig_name = "time3.jpg"
image_name = "time3_.jpg"
img = cv2.imread(orig_name, 0)
img = cv2.medianBlur(img, 5)
img_th = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 11, 2)
cv2.imshow('image', img_th)
cv2.waitKey(0)
cv2.imwrite(image_name, img_th)
im = Image.open(image_name)
time = pytesseract.image_to_string(im, config = "-psm 7")
print(time)
Is there anything I can do to get more consistent results?

I did three additional things to get the first image right:
1. Set a whitelist for Tesseract. In your case we know the text will only contain characters from the list 0123456789.: and restricting Tesseract to them improves the accuracy significantly.
2. Resized the image (3x in each direction) to make it easier for Tesseract.
3. Switched from psm mode 7 (single text line) to 11 (sparse text: find as much text as possible).
Code:
import cv2
import pytesseract
from PIL import Image, ImageEnhance
orig_name = "./time1.jpg"
img = cv2.imread(orig_name)
height, width, channels = img.shape
imgResized = cv2.resize(img, ( width*3, height*3))
cv2.imshow("img",imgResized)
cv2.waitKey()
im = Image.fromarray(imgResized)
time = pytesseract.image_to_string(im, config ='--tessdata-dir "/home/rvq/github/tesseract/tessdata/" -c tessedit_char_whitelist=01234567890.: -psm 11 -oem 0')
print(time)
Note:
You can use Image.fromarray(imgResized) to convert an OpenCV image to a PIL Image, so you don't have to write it to disk and read it back.
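On Tesseract 4 and later the page segmentation and engine flags take two dashes (--psm, --oem); the single-dash -psm form is Tesseract 3 syntax. Below is a minimal sketch of the same three steps under that assumption. The helper names digit_config and ocr_digits are made up for illustration, and the whitelist is reportedly ignored by the LSTM engine in Tesseract 4.0, so check which version and engine you are running.

```python
def digit_config(psm=7, whitelist="0123456789.:"):
    """Build a Tesseract config string that restricts output to the given characters."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def ocr_digits(bgr_image, scale=3, psm=7):
    """Upscale an OpenCV (BGR) image and OCR it with a digit whitelist."""
    # Imported here so digit_config stays usable without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Enlarge the image; INTER_CUBIC is a common choice when upscaling.
    enlarged = cv2.resize(gray, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_CUBIC)
    return pytesseract.image_to_string(enlarged, config=digit_config(psm)).strip()
```

Usage would look like ocr_digits(cv2.imread("time1.jpg"), psm=11). Which psm value works best depends on the image, so it is worth trying both 7 and 11.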

Related

Image recognition difficulties with OCR - reading numbers from a picture

I am trying to develop a Python script that reads numbers from pictures; more precisely, I am trying to get the gas consumption. The numbers are always in the same location. There are two "types" of pictures, bright and dark. (I take a photo every 10 minutes, so I have plenty of examples if needed.)
I would like to get 8 digits as a result, e.g. 10974748 (from the dark picture).
I am mainly using Pytesseract and OpenCV (cv2).
So far the best approach seems to be to crop the relevant part of the picture first and then call pytesseract.image_to_string() with config = --psm 7. Unfortunately it is not reliable: even for photos taken while there was no consumption, where the digits are identical, it does not return the same result every time.
import cv2
import numpy as np
import os
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
directory = r"C:\Users\user\Desktop\test_pcs\test"
for image in os.listdir(directory):
    OriginalImagePath = os.path.join(directory, image)
    OriginalImage = cv2.imread(OriginalImagePath)
    x_start, y_start = int(1110), int(445)
    x_end, y_end = int(1690), int(520)
    cropped_image = OriginalImage[y_start:y_end, x_start:x_end]
    text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
    cv2.imshow("Cropped", cropped_image)
    cv2.waitKey(0)
    print(text + " " + OriginalImagePath)
cv2.destroyAllWindows()
After that I tried using thresholding, but sadly I get worse results than with the simple image_to_string. Adaptive thresholding gives an output image which seems not that bad but tesseract can't read it.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
img = cv.imread(r"C:\Users\user\Desktop\test_pcs\new2\2022-10-30_14-49-30.jpg",0)
img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
#'Adaptive Mean Thresholding'
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C,
                           cv.THRESH_BINARY, 11, 2)
#'Adaptive Gaussian Thresholding'
th3 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv.THRESH_BINARY, 11, 2)
images = [img, th2, th3]
for i in range(3):
    plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
plt.show()
x_start, y_start = int(1110), int(450)
x_end, y_end = int(1690), int(520)
cropped_image = th2[y_start:y_end, x_start:x_end]
plt.imshow(cropped_image,'gray')
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
print("digits: " + text)
I also tried to read the digits character by character, but that failed as well.
I am now trying to get better pictures, but my options are quite limited.
I would be grateful for any suggestions, as I am doing this for my thesis.
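This does not fix the OCR itself, but since the expected result is always 8 digits and a photo is taken every 10 minutes, bad reads can at least be filtered out. A rough sketch of such a sanity check follows; the function names are hypothetical, and it assumes the counter never decreases between two photos.

```python
import re


def clean_reading(ocr_text, n_digits=8):
    """Strip everything except digits; return None unless exactly n_digits remain."""
    digits = re.sub(r"\D", "", ocr_text)
    return digits if len(digits) == n_digits else None


def accept_reading(candidate, previous):
    """Accept a reading only if it is well-formed and not lower than the previous one.

    Assumption: the counter is monotonically non-decreasing.
    """
    if candidate is None:
        return False
    if previous is None:
        return True
    return int(candidate) >= int(previous)
```

A rejected photo can then be skipped or retried with different preprocessing instead of being stored as a wrong value.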

how to set pytesseract to solve captcha alphanumeric and 5 length

Hi everybody, I'm using pytesseract with Tesseract (tesseract-ocr-w32-setup-v5.0.0-alpha) in Python 3.8. I wrote this code to try to recognize 5 alphanumeric characters:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
text = pytesseract.image_to_string(r'imagenes/captcha.JPG', lang='spa', config='psm 10')
if len(text)>5:
    text = text[0:5]
print(text)
The problem is that it doesn't work correctly. In the case of this image:
It returns swQgy. I read about some settings: for example, the dictionaries can be disabled by setting both of the configuration variables load_system_dawg and load_freq_dawg to false, but I don't know how to do this. In addition, I'm not sure whether I can specify the length of the captcha, or how to avoid confusion caused by the distracting lines. Thanks in advance.
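For reference, Tesseract configuration variables can be passed on the command line as -c name=value, and pytesseract forwards its config string to the tesseract executable. A minimal sketch of building such a string follows; the helper names are made up for illustration. As far as I know Tesseract has no option for a fixed output length, so that part is done by post-processing here. Note also that the page segmentation flag needs two dashes (--psm), and that mode 8 means "single word" while mode 10 means "single character".

```python
import re


def captcha_config(psm=8, use_dictionary=False):
    """Build a config string; by default both word-list dictionaries are disabled."""
    parts = [f"--psm {psm}"]
    if not use_dictionary:
        parts += ["-c load_system_dawg=false", "-c load_freq_dawg=false"]
    return " ".join(parts)


def first_alnum(text, length=5):
    """Keep only ASCII letters and digits, truncated to the expected length."""
    return re.sub(r"[^0-9A-Za-z]", "", text)[:length]
```

It would be used as first_alnum(pytesseract.image_to_string(image, config=captcha_config())). Whether disabling the dictionaries actually helps on a given captcha has to be tested.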
UPDATE:
I have an update: I was able to resolve the confusion by preprocessing the image with cv2. Now my problem is the letter z, which gets confused with the number 2. This is my new code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
image = cv2.imread('captcha.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.medianBlur(image, 3)
text = pytesseract.image_to_string(image, lang='spa', config='--oem 1 --psm 6')
if len(text)>5:
    text = text[0:5]
print(text)
Is it possible to tune this further? I'm new to pytesseract. This is my new captcha and the result:

Tesseract Failing on reasonably clear image

I have been trying out Tesseract OCR in combination with OpenCV (EmguCV, C#) and I am trying to improve the reliability. On the whole it's been good, and by applying various filters one at a time and attempting OCR after each (Original, Bilateral, AdaptiveThreshold, Dilate) I have seen significant improvement.
However...
The following image is being stubborn: despite seeming quite clear to begin with, I get no results from Tesseract (original image before filters):
In this case I am simply after the 2.57
Instead of filtering the image, scaling it helped the OCR. Below is the code I tried. Sorry, I am on Linux, so I tested with Python instead of C#.
#!/usr/bin/env python3
import argparse
import cv2
import pytesseract
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())
img = cv2.imread(args["image"])
#OCR
barroi = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
scale_percent = 8 # percent of original size
width = int(barroi.shape[1] * scale_percent / 100)
height = int(barroi.shape[0] * scale_percent / 100)
dim = (width, height)
barroi = cv2.resize(barroi, dim, interpolation = cv2.INTER_AREA)
text = pytesseract.image_to_string(barroi, lang='eng', config='--psm 10 --oem 3')
print(str(text))
imageName = "Result.tif"
cv2.imwrite(imageName, img)
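The "try one variant at a time" approach from the question and the scaling idea above can be combined into one small loop. This is only a sketch of the control flow: first_nonempty_ocr is a hypothetical helper, and the preprocessing steps and the OCR call are passed in as plain functions so the loop does not depend on any particular library.

```python
def first_nonempty_ocr(image, preprocessors, ocr):
    """Run OCR on each preprocessed variant in order; return the first non-empty result.

    preprocessors: list of (name, function) pairs, each function maps image -> image.
    ocr: function mapping image -> text.
    Returns (name, text), or (None, "") if every variant produced empty text.
    """
    for name, preprocess in preprocessors:
        text = ocr(preprocess(image)).strip()
        if text:
            return name, text
    return None, ""
```

With OpenCV and pytesseract, an entry could be ("scaled", lambda im: cv2.resize(im, None, fx=0.08, fy=0.08, interpolation=cv2.INTER_AREA)) and ocr could be lambda im: pytesseract.image_to_string(im, config="--psm 7"). The scale factor is taken from the snippet above and will need adjusting for other images.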

Denoising multiple grayscaled text images using Opencv [duplicate]

I am trying to denoise multiple grayscale text images from a folder. I have already converted all the images to grayscale. All I want is to remove noise or blurriness from the images without changing the text, and I am using OpenCV for this. I have written the code shown below; when I run it, it shows no error and displays nothing. Please help me solve this problem. I am new to image processing, which is why I am confused. Here's my code:
import numpy as np
from PIL import Image
import cv2
import glob
src_path = r"C:\Users\usama\Documents\FYP-Data\FYP Project Data\grayscale images\*.png" #images folder path
def get_string(src_path):
    for filename in glob.glob(src_path):
        img = cv2.imread(filename)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        kernel = np.ones((1, 1), np.uint8)
        img = cv2.dilate(img, kernel, iterations=1)
        img = cv2.erode(img, kernel, iterations=1)
        cv2.imwrite(src_path + "filename", img)
You should narrow down the files you load. I prefer to do this with glob, which supports simple shell-style wildcard patterns (not full regular expressions) when searching for files. I would expect that either you hit a file that is not an image but gets loaded anyway, or that you are missing a cv2.waitKey(0) to keep the window open until a key is pressed.
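import cv2
from glob import glob
files = glob('*.jpg')  # only pick up JPEG files
for filename in files:
    img = cv2.imread(filename)
    if img is None:  # cv2.imread returns None for files it cannot decode
        continue
    bilateral_blur = cv2.bilateralFilter(img, 9, 75, 75)
    cv2.imshow('denoised_images', bilateral_blur)
    cv2.waitKey(0)  # wait for a key press before showing the next image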
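Two more things stand out in the code from the question: get_string is defined but never called, and cv2.imwrite is given src_path + "filename", which still contains the *.png wildcard and the literal text "filename". Here is a sketch of saving each result under its own name; output_path and denoise_folder are hypothetical names, and the bilateral filter parameters are simply the ones used above.

```python
from pathlib import Path


def output_path(src_file, out_dir, suffix="_denoised"):
    """Map e.g. scans/page1.png to <out_dir>/page1_denoised.png."""
    src = Path(src_file)
    return Path(out_dir) / f"{src.stem}{suffix}{src.suffix}"


def denoise_folder(pattern, out_dir):
    """Denoise every image matching the glob pattern and write it to out_dir."""
    # Imported here so output_path stays usable without OpenCV installed.
    import glob
    import cv2

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for filename in glob.glob(pattern):
        img = cv2.imread(filename)
        if img is None:  # not a readable image, skip it
            continue
        denoised = cv2.bilateralFilter(img, 9, 75, 75)
        target = output_path(filename, out_dir)
        cv2.imwrite(str(target), denoised)
        written.append(target)
    return written
```

The function then has to be called, for example denoise_folder(src_path, "denoised"), otherwise nothing runs.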

google colab kernel crashes: cv.imshow('img',img) cv.waitKey(0) cv.destroyAllWindows()

I am trying to follow the OpenCV face detection tutorial, but my Google Colab kernel crashes when the following code is run:
from google.colab import files
xml = files.upload()
import numpy as np
import cv2 as cv
face_cascade = cv.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv.CascadeClassifier('haarcascade_eye.xml')
img = cv.imread('elonMusk.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x,y,w,h) in faces:
    cv.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex,ey,ew,eh) in eyes:
        cv.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)
cv.imshow('img',img)
cv.waitKey(0)
cv.destroyAllWindows()
Error displayed: Runtime died. Automatically restarting.
All the required xml and jpg files were uploaded.
The code is exactly the same as in the OpenCV face detection tutorial:
https://docs.opencv.org/3.4/d7/d8b/tutorial_py_face_detection.html
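The crash comes from cv.imshow rather than from OpenCV as a whole: Google Colab runs on a remote machine without a display, and cv.imshow tries to open a GUI window there. Either run the code locally in a Jupyter notebook or any other IDE, or display the image inline in Colab instead of calling cv.imshow, cv.waitKey and cv.destroyAllWindows.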
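If you want to stay in a hosted notebook, the image can be rendered inline with matplotlib instead of a GUI window. A small sketch follows; bgr_to_rgb and show_inline are hypothetical helper names. OpenCV stores colour images in BGR order, so the channels are reversed before plotting.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel order of an H x W x 3 array (OpenCV loads images as BGR)."""
    return image[..., ::-1]


def show_inline(bgr_image):
    """Render an OpenCV image inline in a notebook instead of opening a window."""
    # Imported here so bgr_to_rgb stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.imshow(bgr_to_rgb(bgr_image))
    plt.axis("off")
    plt.show()
```

In the code from the question, the last three lines would be replaced by show_inline(img). Colab also ships its own helper for this: from google.colab.patches import cv2_imshow, then cv2_imshow(img).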
