Hi everybody. I'm using pytesseract with Tesseract (tesseract-ocr-w32-setup-v5.0.0-alpha) in Python 3.8. I wrote this code to try to recognize 5 alphanumeric characters:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
text = pytesseract.image_to_string(r'imagenes/captcha.JPG', lang='spa', config='psm 10')
if len(text)>5:
    text = text[0:5]
print(text)
The problem is that it does not work correctly. For the image in question, it returns swQgy. I read about some settings: for example, the dictionaries can be disabled by setting both of the configuration variables load_system_dawg and load_freq_dawg to false, but I don't know how to do this. In addition, I'm not sure whether I can specify the length of the captcha or avoid confusion caused by the distracting lines. Thanks in advance.
UPDATE:
I solved that confusion by preprocessing the image with cv2. Now my problem is the letter z, which gets confused with the number 2. This is my new code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
image = cv2.imread('captcha.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.medianBlur(image, 3)
text = pytesseract.image_to_string(image, lang='spa', config='--oem 1 --psm 6')
if len(text)>5:
    text = text[0:5]
print(text)
Is it possible to tune this further? I'm new to pytesseract. This is my new captcha and the result:
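For the dictionary settings mentioned in the question, one option that may be worth trying is to pass them through the config string with `-c VAR=VALUE`. The sketch below only assembles that string; the helper names `build_tesseract_config` and `truncate`, the choice of `--psm 8` (single word), and the lowercase-plus-digits whitelist are assumptions for illustration, and the whitelist may not be honored by every Tesseract engine or version.

```python
def build_tesseract_config(psm=8, whitelist=None, use_dictionary=False):
    """Assemble a string for pytesseract's `config=` argument (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if not use_dictionary:
        # The two variables named in the question; setting them to false
        # asks Tesseract not to load its word lists.
        parts.append("-c load_system_dawg=false")
        parts.append("-c load_freq_dawg=false")
    if whitelist:
        # Restrict the output alphabet (assumed: lowercase letters and digits).
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def truncate(text, length=5):
    """Remove all whitespace and keep at most `length` characters."""
    return "".join(text.split())[:length]


config = build_tesseract_config(
    psm=8, whitelist="abcdefghijklmnopqrstuvwxyz0123456789"
)
print(config)
```

The resulting string would be passed as `pytesseract.image_to_string(image, config=config)`, and `truncate` applied to the returned text to enforce the five-character length.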
I am trying to develop a python script which can read numbers from pictures, to be more exact I am trying to get the gas consumption. The numbers' locations are always the same. There are two "types" of pics, bright and dark. (I am taking photos every 10 mins so I have a lot of examples if needed)
I would like to get as a result 8 digits. e.g. 10974748 (from the dark pic)
I am mainly using Pytesseract and OpenCV2.
So far the best solution seems to be to first crop the needed part of the picture and then use pytesseract.image_to_string() with config = --psm 7. Unfortunately it is not reliable: it does not always recognize the same digit combination in photos taken while there was no consumption.
import cv2
import numpy as np
import os
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
directory = r"C:\Users\user\Desktop\test_pcs\test"
for image in os.listdir(directory):
    OriginalImagePath = os.path.join(directory, image)
    OriginalImage = cv2.imread(OriginalImagePath)
    x_start, y_start = int(1110), int(445)
    x_end, y_end = int(1690), int(520)
    cropped_image = OriginalImage[y_start:y_end, x_start:x_end]
    text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
    cv2.imshow("Cropped", cropped_image)
    cv2.waitKey(0)
    print(text + " " + OriginalImagePath)
    cv2.destroyAllWindows()
After that I tried using thresholding, but sadly I get worse results than with the simple image_to_string. Adaptive thresholding gives an output image which seems not that bad but tesseract can't read it.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
img = cv.imread(r"C:\Users\user\Desktop\test_pcs\new2\2022-10-30_14-49-30.jpg",0)
img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
#'Adaptive Mean Thresholding'
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,
                           cv.THRESH_BINARY,11,2)
#'Adaptive Gaussian Thresholding'
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv.THRESH_BINARY,11,2)
images = [img, th2, th3]
for i in range(3):
    plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
plt.show()
x_start, y_start = int(1110), int(450)
x_end, y_end = int(1690), int(520)
cropped_image = th2[y_start:y_end, x_start:x_end]
plt.imshow(cropped_image,'gray')
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
print("digits: " + text)
I also tried to read the digits character by character but it failed as well.
Now I am trying to get better pictures somehow but the options are quite limited.
I would be grateful for any suggestions, as I am doing this for my thesis.
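Since the expected result is always 8 digits, one cheap safeguard is to validate each OCR result before trusting it. The sketch below is only an illustration: the helper names are hypothetical, and the rule that a reading should never be lower than the previous accepted one is an assumption about how a consumption counter behaves. A misread that is too high would still slip through and block later correct readings, so this is a filter, not a fix for the OCR itself.

```python
import re


def clean_reading(text, n_digits=8):
    """Return the digits in `text` if there are exactly `n_digits`, else None."""
    digits = re.sub(r"\D", "", text)  # drop spaces, newlines, stray symbols
    return digits if len(digits) == n_digits else None


def accept_readings(raw_texts, n_digits=8):
    """Keep well-formed readings that do not go below the last accepted one."""
    accepted = []
    last = None
    for text in raw_texts:
        reading = clean_reading(text, n_digits)
        if reading is None:
            continue  # wrong number of digits
        value = int(reading)
        if last is not None and value < last:
            continue  # assumption: the counter never runs backwards
        accepted.append(reading)
        last = value
    return accepted


samples = ["10974748\n", "1097 4748", "109747", "10974148", "10974751\x0c"]
print(accept_readings(samples))
```

Here the third sample is rejected for having too few digits and the fourth for being lower than the previous accepted reading.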
I have been trying out Tesseract OCR in combination with OpenCV (EMGUCV C#) and I am trying to improve its reliability. On the whole it has been good: by applying various filters one at a time and attempting OCR on each result (Original, Bilateral, AdaptiveThreshold, Dilate) I have seen significant improvement.
However...
The following image is being stubborn. Despite seeming quite clear to begin with, it gives no results from Tesseract (original image before filters):
In this case I am simply after the 2.57
Instead of applying a filter to the image, scaling the image helped the OCR. Below is the code I tried. Sorry, I am on Linux, so I tested with Python instead of C#.
#!/usr/bin/env python3
import argparse
import cv2
import pytesseract
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())
img = cv2.imread(args["image"])
#OCR
barroi = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
scale_percent = 8 # percent of original size
width = int(barroi.shape[1] * scale_percent / 100)
height = int(barroi.shape[0] * scale_percent / 100)
dim = (width, height)
barroi = cv2.resize(barroi, dim, interpolation = cv2.INTER_AREA)
text = pytesseract.image_to_string(barroi, lang='eng', config='--psm 10 --oem 3')
print(str(text))
imageName = "Result.tif"
cv2.imwrite(imageName, img)
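The size computation in the code above can be pulled into a small helper so that several scale factors can be tried in turn. This is only a sketch; `scaled_size` is a hypothetical name, and the clamp to at least one pixel is an added safeguard, not something the original code does.

```python
def scaled_size(shape, scale_percent):
    """Return (width, height) for cv2.resize from a NumPy-style shape (rows, cols, ...)."""
    height, width = shape[:2]
    new_width = max(1, int(width * scale_percent / 100))
    new_height = max(1, int(height * scale_percent / 100))
    # cv2.resize expects the target size as (width, height), not (rows, cols).
    return new_width, new_height


print(scaled_size((1080, 1920), 8))  # -> (153, 86)
```

The returned tuple can be passed directly as the `dim` argument in the `cv2.resize` call.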
I am trying to denoise multiple grayscale text images from a folder. I have already converted all the images to grayscale. All I want is to remove noise or blurriness from the images without changing the text, and I am using OpenCV for this. I have written the code shown below; when I run it, it shows no error and displays nothing. Please help me solve this problem. I am new to image processing, which is why I am confused. Here's my code:
import numpy as np
from PIL import Image
import cv2
import glob
src_path = r"C:\Users\usama\Documents\FYP-Data\FYP Project Data\grayscale images\*.png" #images folder path
def get_string(src_path):
    for filename in glob.glob(src_path):
        img = cv2.imread(filename)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        kernel = np.ones((1, 1), np.uint8)
        img = cv2.dilate(img, kernel, iterations=1)
        img = cv2.erode(img, kernel, iterations=1)
        cv2.imwrite(src_path + "filename", img)
You should narrow down the files you load. I prefer to do this with glob, which allows easy shell-style wildcard patterns when searching for files. I would expect that either you reach a file that is not an image but is still loaded, or you are missing a cv2.waitKey(0), which keeps the window displayed until a key is pressed.
import cv2
from glob import glob
files = glob('*.jpg')
for filename in files:
    img = cv2.imread(filename)
    bilateral_blur = cv2.bilateralFilter(img, 9, 75, 75)
    cv2.imshow('denoised_images', bilateral_blur)
    cv2.waitKey(0)
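If the denoised images should be saved rather than displayed, each one needs its own output path. The code in the question writes to `src_path + "filename"`, where `src_path` still contains the `*.png` wildcard and `"filename"` is a literal string. A minimal sketch of one way to select image files and derive output names is below; the helper names, the `denoised` tag, and the set of suffixes are assumptions for illustration.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed set of accepted extensions


def list_images(folder):
    """Return image files in `folder`, sorted, matched by extension (case-insensitive)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def output_path(src, out_dir, tag="denoised"):
    """Build e.g. out_dir/scan01_denoised.png from scan01.png."""
    src = Path(src)
    return Path(out_dir) / f"{src.stem}_{tag}{src.suffix}"


# Small demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.JPG", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in list_images(tmp)]
    print(found)

print(output_path("scan01.png", "out").name)
```

Inside the processing loop, `str(output_path(filename, out_dir))` would then be the first argument to `cv2.imwrite`.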
I am getting a strange error when saving a TIFF file (a grayscale stack). Any idea?
File "C:\Users\ptyimg_np.MT00200169\Anaconda3\lib\site-packages\tifffile\tifffile.py", line 1241, in save
    sampleformat = {'u': 1, 'i': 2, 'f': 3, 'c': 6}[datadtype.kind]
KeyError: 'b'
my code is
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from skimage.morphology import watershed
from skimage.feature import peak_local_max
from scipy import ndimage
from skimage import img_as_float
from skimage import exposure,io
from skimage import external
from skimage.color import rgb2gray
from skimage.filters import threshold_local , threshold_niblack
import numpy as np
import tifffile
from joblib import Parallel, delayed
import sys
# Load an example image
input_namefile = sys.argv[1]
output_namefile = 'seg_'+ input_namefile
#Settings
block_size = 25 #Size block of the local thresholding
img = io.imread(input_namefile, plugin='tifffile')
thresh = threshold_niblack(img, window_size=block_size , k=0.8) #
res = img > thresh
res = np.asanyarray(res)
print("saving segmentation")
tifffile.imsave(output_namefile, res , photometric='minisblack' )
It looks like the error is caused by a bug in writing boolean images in your installed version of tifffile. However, the bug has been fixed in more recent versions (I have 2020.2.16 in my current environment). On my machine, this works fine:
import numpy as np
import tifffile
tifffile.imsave('test.tiff', np.random.random((10, 10)) > 0.5)
and the line causing a crash in your version is never executed in the case of a boolean image.
So, long story short, use python -m pip install -U tifffile to upgrade your version of tifffile, and your program should work!
Some analysis first. The offending line:
sampleformat = {'u': 1, 'i': 2, 'f': 3, 'c': 6}[datadtype.kind]
is causing a KeyError exception because the value of datadtype.kind (the NumPy datatype) is set to b and there is no b in that dictionary. It only caters for types i, u, f, and c (respectively, signed integer, unsigned integer, floating-point, and complex floating-point). Type b is boolean.
This looks like a bug in the code that you're using. If it's something that's not supported, the code should really catch the exception and report on it in a more user-friendly manner rather than just dumping an exception for you to figure out.
My advice is to raise this as a bug with the author.
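The failure can be reproduced without tifffile at all, since it is just a dictionary lookup with a key that is not present. The sketch below copies the mapping from the traceback and wraps it in a hypothetical helper, `sample_format`, that reports the problem more clearly; it is an illustration of the analysis above, not tifffile's actual code.

```python
# Mapping copied from the traceback: NumPy dtype kind -> TIFF sample format.
SAMPLE_FORMATS = {"u": 1, "i": 2, "f": 3, "c": 6}


def sample_format(kind):
    """Look up the sample format, raising a descriptive error for unknown kinds."""
    try:
        return SAMPLE_FORMATS[kind]
    except KeyError:
        raise ValueError(
            f"unsupported dtype kind {kind!r}; cast the array first"
        ) from None


print(sample_format("u"))  # unsigned integer is supported

try:
    sample_format("b")  # 'b' is the kind of a boolean array
except ValueError as exc:
    message = str(exc)
    print(message)
```

On an affected version, a likely workaround is to cast the boolean array before saving, for example `res.astype(np.uint8) * 255`, which has kind `'u'`.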
In terms of the root cause of the issue (this is speculation based on analysis, so could be wrong, I'm just providing it as a possible cause), an examination of your code shows:
img = io.imread(input_namefile, plugin='tifffile')
thresh = threshold_niblack(img, window_size=block_size , k=0.8) #
res = img > thresh
res = np.asanyarray(res)
tifffile.imsave(output_namefile, res , photometric='minisblack' )
That third line above will set res to either a boolean value or a boolean array that depends on the respective values of each pixel in img and thresh (I don't know enough about NumPy to pontificate on this).
However, regardless of that, the result is one or more booleans so, when you try to write it with the imsave() call, it complains about the type being used (as mentioned above, it appears not to cater for boolean values correctly).
Based on some sample code found elsewhere:
image = data.coins()
mask = image > 128
masked_image = image * mask
I suspect that you should use something similar to that last line to apply the mask to the image, then write the resultant value:
img = io.imread(input_namefile, plugin='tifffile')
thresh = threshold_niblack(img, window_size=block_size , k=0.8)
mask = img > thresh # <-- boolean mask: True where a pixel exceeds its local threshold.
res = img * mask # <-- add this line.
res = np.asanyarray(res)
tifffile.imsave(output_namefile, res , photometric='minisblack' )
Applying the mask to the original image should give you an array of usable values that you can write back out to an image file. Note that threshold_niblack returns an array of per-pixel threshold values rather than a mask, so the img > thresh comparison is still needed to build the mask. I could be wrong on some of these points, so my advice is still to raise it with the author.
I'm using Tesseract to recognize numbers from images of a screen taken with a phone camera. I've done some preprocessing of the image (processed image), and with Tesseract I get mixed results. Using the following code on the above image, I get the output "EOE". However, with another processed image I get an exact match: "39:45.8"
import cv2
import pytesseract
from PIL import Image, ImageEnhance
from matplotlib import pyplot as plt
orig_name = "time3.jpg"
image_name = "time3_.jpg"
img = cv2.imread(orig_name, 0)
img = cv2.medianBlur(img, 5)
img_th = cv2.adaptiveThreshold(img, 255,
                               cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
cv2.imshow('image', img_th)
cv2.waitKey(0)
cv2.imwrite(image_name, img_th)
im = Image.open(image_name)
time = pytesseract.image_to_string(im, config = "-psm 7")
print(time)
Is there anything I can do to get more consistent results?
I did three additional things to get it correct for the first image:
1. You can set a whitelist for Tesseract. In your case we know that there will only be characters from this list: 01234567890.: (digits, period, and colon). This improves the accuracy significantly.
2. I resized the image to make it easier for Tesseract.
3. I switched from psm mode 7 to 11 (recognize as much as possible).
Code:
import cv2
import pytesseract
from PIL import Image, ImageEnhance
orig_name = "./time1.jpg";
img = cv2.imread(orig_name)
height, width, channels = img.shape
imgResized = cv2.resize(img, ( width*3, height*3))
cv2.imshow("img",imgResized)
cv2.waitKey()
im = Image.fromarray(imgResized)
time = pytesseract.image_to_string(im, config ='--tessdata-dir "/home/rvq/github/tesseract/tessdata/" -c tessedit_char_whitelist=01234567890.: -psm 11 -oem 0')
print(time)
Note:
You can use Image.fromarray(imgResized) to convert an opencv image to a PIL Image. You don't have to write to disk and read it again.
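Independently of the Tesseract whitelist, the returned string can also be checked after the fact, so that outputs like "EOE" are rejected instead of being used. The sketch below assumes the display always shows a time shaped like the "39:45.8" example (one or two digits, a colon, two digits, a period, one digit); that pattern and the helper name `parse_time` are assumptions, not something stated in the question.

```python
import re

ALLOWED = set("0123456789.:")
# Assumed display format, based on the single example "39:45.8".
TIME_PATTERN = re.compile(r"\d{1,2}:\d{2}\.\d")


def parse_time(ocr_text):
    """Return the time string if the OCR output matches the expected shape, else None."""
    # Drop anything outside the whitelist (newlines, letters, stray symbols).
    cleaned = "".join(ch for ch in ocr_text if ch in ALLOWED)
    match = TIME_PATTERN.fullmatch(cleaned)
    return match.group(0) if match else None


print(parse_time("39:45.8\n"))  # -> 39:45.8
print(parse_time("EOE"))        # -> None
```

A `None` result can be used as a signal to retry with different preprocessing or a different page segmentation mode.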