I am trying to develop a python script which can read numbers from pictures, to be more exact I am trying to get the gas consumption. The numbers' locations are always the same. There are two "types" of pics, bright and dark. (I am taking photos every 10 mins so I have a lot of examples if needed)
I would like to get as a result 8 digits. e.g. 10974748 (from the dark pic)
I am mainly using Pytesseract and OpenCV2.
So far the best solution seemes to be that first I crop the needed part of the picture than I use pytesseract.image_to_string() with config = --psm 7. But unfortunately it is really not a reliable solution, it can not recognize the same digit combinations when there were no consumption but photos were taken.
import cv2
import numpy as np
import os
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
directory = r"C:\Users\user\Desktop\test_pcs\test"
for image in os.listdir(directory):
OriginalImagePath = os.path.join(directory, image)
OriginalImage = cv2.imread(OriginalImagePath)
x_start, y_start = int(1110), int(445)
x_end, y_end = int(1690), int(520)
cropped_image = OriginalImage[y_start:y_end, x_start:x_end]
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
cv2.imshow("Cropped", cropped_image)
cv2.waitKey(0)
print(text + " " + OriginalImagePath)
cv2.destroyAllWindows()
After that I tried using thresholding, but sadly I get worse results than with the simple image_to_string. Adaptive thresholding gives an output image which seems not that bad but tesseract can't read it.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
img = cv.imread(r"C:\Users\user\Desktop\test_pcs\new2\2022-10-30_14-49-30.jpg",0)
img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
#'Adaptive Mean Thresholding'
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
cv.THRESH_BINARY,11,2)
#'Adaptive Gaussian Thresholding'
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
cv.THRESH_BINARY,11,2)
images = [img, th2, th3]
for i in range(3):
plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
plt.show()
x_start, y_start = int(1110), int(450)
x_end, y_end = int(1690), int(520)
cropped_image = th2[y_start:y_end, x_start:x_end]
plt.imshow(cropped_image,'gray')
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
print("digits: " + text)
I also tried to read the digits character by character but it failed as well.
Now I am trying to get better pictures somehow but the options are quite limited.
I would be greateful for any suggestions as I am doing this for my thesis.
Related
Hi everybody I'm using pytesseract and tesseract-ocr-w32-setup-v5.0.0-alpha and pytesseract in python 3.8, I did this code to try to recognize 5 alphanumeric characters:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
text = pytesseract.image_to_string(r'imagenes/captcha.JPG', lang='spa', config='psm 10')
if len(text)>5:
text = text[0:5]
print(text)
The problem is that it didn't work correctly. In the case of this image
It returns swQgy. I read about some settings: for example, they can be disabled by setting both of the configuration variables load_system_dawg and load_freq_dawg to false in case to avoid dictionary, but I don't know how to do this. In addition, I'm not sure if I can say the lenght of the captcha and avoiding confusion with lines distractions. Thanks in advance.
UPDATE:
I have an update, I could solve the confusion pre procesing the image with cv2. Now my problem is with the letter z, it confuses with the number 2. This is my new code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
image = cv2.imread('captcha.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.medianBlur(image, 3)
text = pytesseract.image_to_string(image, lang='spa', config='--oem 1 --psm 6')
if len(text)>5:
text = text[0:5]
print(text)
Is possible to tune much better? I'm new in pytesseract. This is my new captcha and the result:
I have been trying out Tesseract OCR in combination with Open CV (EMGUCV C#) and I am trying to improve the reliability, one the whole it's been good and by apply various filters one at a time and attempting OCR (Orignal, Bilateral, AdaptiveThreshold, Dilate) I have seem significant improvement.
However...
The following image is being stubborn, despite seeming quite clear to being with, I get no results from Tesseract (orignal image before filters):
In this case I am simply after the 2.57
Instead of using filter on the image, scaling the image did helps on the OCR. Below is the code i tried. sorry i am using linux, i test with python instead of C#
#!/usr/bin/env python3
import argparse
import cv2
import numpy as np
from PIL import Image
import pytesseract
import os
from PIL import Image, ImageDraw, ImageFilter
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())
img = cv2.imread(args["image"])
#OCR
barroi = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
scale_percent = 8 # percent of original size
width = int(barroi.shape[1] * scale_percent / 100)
height = int(barroi.shape[0] * scale_percent / 100)
dim = (width, height)
barroi = cv2.resize(barroi, dim, interpolation = cv2.INTER_AREA)
text = pytesseract.image_to_string(barroi, lang='eng', config='--psm 10 --oem 3')
print(str(text))
imageName = "Result.tif"
cv2.imwrite(imageName, img)
I am trying to implement face detection tutorial of openCV but my google colab kernel is crashed when following code is used:
from google.colab import files
xml = files.upload()
import numpy as np
import cv2 as cv
face_cascade = cv.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv.CascadeClassifier('haarcascade_eye.xml')
img = cv.imread('elonMusk.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x,y,w,h) in faces:
cv.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
roi_gray = gray[y:y+h, x:x+w]
roi_color = img[y:y+h, x:x+w]
eyes = eye_cascade.detectMultiScale(roi_gray)
for (ex,ey,ew,eh) in eyes:
cv.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)
cv.imshow('img',img)
cv.waitKey(0)
cv.destroyAllWindows()
Error displayed : Runtmie died. Automatically restarting.
All the desired xml and jpg files were uploaded.
The code used is exactly the same code as used for face detection openCV tutorial.
https://docs.opencv.org/3.4/d7/d8b/tutorial_py_face_detection.html
Google Colab is actually not designed to run opencv smoothly, so you will absolutely get an error. You should use Jupiter notebook or any other IDE.
I'm using Tesseract to recognize numbers from images of a screen taken with a phone camera. I've done some preprocessing of the image: processed image, and using Tesseract, I'm able to get some mixed results. Using the following code on the above images, I get the following output: "EOE". However, with this image, processed image, I get an exact match: "39:45.8"
import cv2
import pytesseract
from PIL import Image, ImageEnhance
from matplotlib import pyplot as plt
orig_name = "time3.jpg";
image_name = "time3_.jpg";
img = cv2.imread(orig_name, 0)
img = cv2.medianBlur(img, 5)
img_th = cv2.adaptiveThreshold(img, 255,\
cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY, 11, 2)
cv2.imshow('image', img_th)
cv2.waitKey(0)
cv2.imwrite(image_name, img_th)
im = Image.open(image_name)
time = pytesseract.image_to_string(im, config = "-psm 7")
print(time)
Is there anything I can do to get more consistent results?
I did three additional things to get it correct for the first Image.
You can set a whitelist for Tesseract. In your case we know that
there will only charachters from this List 01234567890.:. This
improves the accuracy significantly.
I resized the image to make it easier for tesseract.
I switched from psm mode 7 to 11 (Recoginze as much as possible)
Code:
import cv2
import pytesseract
from PIL import Image, ImageEnhance
orig_name = "./time1.jpg";
img = cv2.imread(orig_name)
height, width, channels = img.shape
imgResized = cv2.resize(img, ( width*3, height*3))
cv2.imshow("img",imgResized)
cv2.waitKey()
im = Image.fromarray(imgResized)
time = pytesseract.image_to_string(im, config ='--tessdata-dir "/home/rvq/github/tesseract/tessdata/" -c tessedit_char_whitelist=01234567890.: -psm 11 -oem 0')
print(time)
Note:
You can use Image.fromarray(imgResized) to convert an opencv image to a PIL Image. You don't have to write to disk and read it again.
I am trying to find the number of objects in a given image using watershed segmentation. Consider for example the coins image. Here I would like to know the number of coins in the image. I implemented the code available at Scikit-image documentation and tweaked with it a little and got results similar to those displayed on the documentation page.
After looking at functions used in the code in detail I found out that ndimage.label() also returns number of unique objects found in the image (mentioned in it's documentation), but when I print that value I am getting 53 which is very high as compared to the number of coins in the actual image.
Can somebody suggest some method to find the number of objects in an image.
Here is a version of your code that counts the coins in one of two ways: a) by directly segmenting the distance image and b) by doing watershed first and rejecting tiny intersecting regions.
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, color, filter as filters
from scipy import ndimage
from skimage.morphology import watershed
from skimage.feature import peak_local_max
from skimage.measure import regionprops, label
image = color.rgb2gray(io.imread('water_coins.jpg', plugin='freeimage'))
image = image < filters.threshold_otsu(image)
distance = ndimage.distance_transform_edt(image)
# Here's one way to measure the number of coins directly
# from the distance map
coin_centres = (distance > 0.8 * distance.max())
print('Number of coins (method 1):', np.max(label(coin_centres)))
# Or you can proceed with the watershed labeling
local_maxi = peak_local_max(distance, indices=False, footprint=np.ones((3, 3)),
labels=image)
markers, num_features = ndimage.label(local_maxi)
labels = watershed(-distance, markers, mask=image)
# ...but then you have to clean up the tiny intersections between coins
regions = regionprops(labels)
regions = [r for r in regions if r.area > 50]
print('Number of coins (method 2):', len(regions) - 1)
fig, axes = plt.subplots(ncols=3, figsize=(8, 2.7))
ax0, ax1, ax2 = axes
ax0.imshow(image, cmap=plt.cm.gray, interpolation='nearest')
ax0.set_title('Overlapping objects')
ax1.imshow(-distance, cmap=plt.cm.jet, interpolation='nearest')
ax1.set_title('Distances')
ax2.imshow(labels, cmap=plt.cm.spectral, interpolation='nearest')
ax2.set_title('Separated objects')
for ax in axes:
ax.axis('off')
fig.subplots_adjust(hspace=0.01, wspace=0.01, top=1, bottom=0, left=0,
right=1)
plt.show()