How to refine Tesseract to read numbers from tiny images - image-processing

This question is a follow-up to this answered question. I'm using Tesseract with Python to read some dates from small images. The solution provided in the link worked for most cases, but I just found out that it is not able to read the character "5".
This is the raw image I'm working with:
Following the advice provided in the former question I have pre-processed the image to get this one:
It looks nice, but Tesseract is still not able to read the first "5"; it produces "o MAY 2021".
How can I fine-tune Tesseract, either via parameters or image pre-processing, to get the correct reading?

Since the image is small, I resized it first. Then I binarized the grayscale image, because Tesseract gives more accurate output on binarized images.
>>> import cv2
>>> import pytesseract
>>> img = cv2.imread("5.jpg")
>>> img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[0, 0, 0])
>>> img = cv2.resize(img,None,fx=2, fy=2, interpolation = cv2.INTER_CUBIC)
>>> gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>> otsu = cv2.threshold(gry,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
>>> otsu = 255-otsu
>>> pytesseract.image_to_string(otsu)
'5 MAY 2021\n\x0c'
>>> print(pytesseract.image_to_string(otsu))
5 MAY 2021
>>>
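If the layout is known in advance, it can also help to hint Tesseract's page segmentation mode through pytesseract's config parameter. A minimal sketch, assuming the date always sits on a single line (--psm 7 is Tesseract's "treat the image as a single text line" mode); it reuses the otsu image from above and is not guaranteed to change the output:
# hint that the input is a single line of text; useful for short date strips
text = pytesseract.image_to_string(otsu, config="--psm 7")
print(text)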

Related

How to improve PyTesseract OCR Accuracy?

I am trying to improve the accuracy of an OCR script I wrote. It performs well on a normal image but struggles with a noisy one.
The noisy image:
I wrote a function to remove the noise; it does remove a lot of the noise present, but it also diminishes the text a bit, and I am only able to capture around 60% of the text. I tried adjusting the contrast, sharpness and threshold, but was not able to improve OCR performance.
import cv2
import pytesseract
import numpy as np
def noise_remove(image):
    # tiny 1x1 morphological operations followed by a median blur to suppress speckle noise
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.dilate(image, kernel, iterations=1)
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.erode(image, kernel, iterations=1)
    image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)
    image = cv2.medianBlur(image, 3)
    return image
img = cv2.imread('2.jpg')
img = cv2.resize(img, None, fx = 0.8, fy = 0.8)
blurImg = noise_remove(img)
hImg, wImg, _ = img.shape
text = pytesseract.image_to_string(blurImg)
print(text)
cv2.waitKey(0)
The output I get:
Result:
Little: afr aid its eat looked now. iy ye lady girl them good me make. It hardly
cousin ime always. fin shortiy village is raising we sheiting replied. She the ~
tavourabdle partiality inhabiting travelling impression pub luo. His six are
entreaties instrument acceptance unsatiable her. Athongs} as or on herself chapter
ertered carried no Sold oid ten are quit lose deal his sent. You correct how sex
several far distant believe journey parties. We shyniss enquire uncivil attied if
carried to A
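One direction worth trying, as a minimal sketch, is the same upscale-and-binarize preprocessing used in the Tesseract answer above; the filename 2.jpg, the scale factor and the assumption that Otsu thresholding suits this scan are all untested here:
import cv2
import pytesseract
img = cv2.imread('2.jpg')
# upscale rather than downscale; Tesseract tends to do better on larger glyphs
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu picks the binarization threshold automatically
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
print(pytesseract.image_to_string(thr))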

Keras Same Feature Extraction from Different Images

I'm using Keras' pre-trained models for feature extraction on two images, but they give the same outcome (np.array_equal returns True). I've tried other models like VGG16 and ResNet50, but the results are the same. Am I writing the code wrong, or is it a limitation of pre-trained models? Is there anything I can do to extract different features? Thanks!
import cv2
import numpy as np
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input
model = InceptionV3(weights='imagenet', include_top=False)
def get_img_vector(path):
    im = cv2.imread(path)
    im = cv2.resize(im, (224, 224))
    img = preprocess_input(np.expand_dims(im.copy(), axis=0))
    resnet_feature = model.predict(img)
    return np.array(resnet_feature)
arr1 = get_img_vector('image1.png')
arr2 = get_img_vector('image2.png')
np.array_equal(arr1,arr2)
Below are my two images:
I think the PNG file format is causing an image-loading problem: currently, cv2.imread on a PNG file followed by cv2.imshow results in a black screen, which makes the two images identical. I am saving the files as JPG and trying again.
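A quick, hypothetical check (not from the original post) to see whether the PNG is actually being loaded, since cv2.imread returns None on failure:
import cv2
im = cv2.imread('image1.png')  # returns None if the file cannot be decoded
if im is None:
    print('image1.png could not be read')
else:
    # an all-zero array would explain the black cv2.imshow window
    print(im.shape, im.dtype, 'max pixel value:', im.max())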
If you run the code, you should see a warning like this:
WARNING: TensorFlow:Model was constructed with shape (None, 299, 299, 3)
for input Tensor("input_3:0", shape=(None, 299, 299, 3), dtype=float32),
but it was called on an input with incompatible shape (None, 224, 224, 3).
Change your code to
im = cv2.resize(im,(299,299))
Now, about the similar features: the pre-trained ImageNet model classifies 1000 classes. If you decode the predictions for the given pictures, you'll see that both of them give the same output, and even for the top-5 predictions the confidence is very low; the closest match is a nematode.
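A minimal sketch of that decoding step, assuming a second InceptionV3 built with include_top=True so that decode_predictions can be applied (the variable names are illustrative, not from the original answer):
from keras.applications.inception_v3 import InceptionV3, decode_predictions
# a classifier head is needed for decoding, hence include_top=True here
clf = InceptionV3(weights='imagenet', include_top=True)
preds = clf.predict(img)  # img: a preprocessed (1, 299, 299, 3) batch
print(decode_predictions(preds, top=5))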
[[('n01930112', 'nematode', 0.11086103), ('n03729826', 'matchstick', 0.08173305), ('n03196217', 'digital_clock', 0.034744), ('n03590841', "jack-o'-lantern", 0.017616412), ('n04286575', 'spotlight', 0.016781498)]]
However, if you want to train a model that can differentiate these two images then you can use the pre-trained models for transfer learning with your own dataset.
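A minimal transfer-learning sketch along those lines, assuming a small labelled two-class dataset; the layer size, optimizer, and the train_images / train_labels arrays are illustrative assumptions:
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense
from keras.models import Model
# frozen ImageNet backbone, global-average-pooled, as the feature extractor
base = InceptionV3(weights='imagenet', include_top=False, pooling='avg')
base.trainable = False
out = Dense(2, activation='softmax')(base.output)
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# train_images: preprocessed (N, 299, 299, 3) batch, train_labels: one-hot (N, 2) -- both assumed
# model.fit(train_images, train_labels, epochs=5)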

Deep Learning - How can I test the MNIST tutorial model on WML?

This is a "Watson Studio" related question. I've done the following Deep-Learning tutorial/experiment assistant, successfully deployed a generated CNN model to WML(WebService). Cool!
Tutorial: Single convolution layer on MNIST data
Experiment Assistant
Next, I'd like to test whether the model can identify my own image (MNIST) in the deployed environment, and these questions came to mind.
What kind of input file (maybe a pixel image file) should I prepare as the model input? And how can I call the scoring endpoint, passing my image? (I saw a Python code snippet on the "Implementation" tab, but it's a JSON example and I'm not sure how to pass the pixel image...)
payload_scoring = {"fields": [array_of_feature_columns], "values": [array_of_values_to_be_scored, another_array_of_values_to_be_scored]}
Any advice/suggestions highly welcomed. Thx in advance.
The model that was trained accepts input data as a 4-dimensional array, i.e. [<batchsize>, 28, 28, 1], where 28 refers to the height and width of the image in pixels and 1 refers to the number of channels. Currently the WML online deployment and scoring service requires the payload data in a format that matches the input format of the model. So, to predict any image with this model, you must ...
convert the image to an array of dimension [1, 28, 28, 1]. Converting an image to an array is explained in the next section.
pre-process the image data as required by the model, i.e. (a) normalize the data and (b) convert the type to float.
specify the pre-processed data in JSON format with the appropriate keys. This JSON doc will be the input payload for the scoring request to the model.
How to convert image to an array?
There are two ways (using Python code).
a) The keras Python library ships the MNIST dataset with images already stored as 28 x 28 arrays. Using the Python code below, we can use this dataset to create the scoring payload.
import numpy as np
from keras.datasets import mnist
(X, y), (X_test, y_test) = mnist.load_data()
score_payload_data = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1)
score_payload_data = score_payload_data.astype("float32")/255
score_payload_data = score_payload_data[2].tolist() ## choose the image at index 2 in the test set to predict
scoring_payload = {'values': [score_payload_data]}
b) If you have an image of size 28 x 28 pixels, we can create the scoring payload using the code below.
img_file_name = "<image file name with full path>"
from scipy import misc
img = misc.imread(img_file_name)
img_to_predict = img.reshape(img.shape[0], img.shape[1], 1)/255
img_to_predict = img_to_predict.astype("float32").tolist()
scoring_payload = {"values": [img_to_predict]}

How to remove dotted band from text image?

One of the problems that I am working on is to do OCR on documents. A few of the paystub document have a highlighted line with dots to differentiate important elements like Gross Pay, Net Pay, etc.
These dots give erroneous results in OCR: it reads them as the ':' character and doesn't give the desired results. I have tried a lot of image-processing approaches, such as ImageMagick, to remove these dots, but in each case the quality of the entire text is degraded, resulting in poor OCR.
The ImageMagick command that I have tried is:
convert mm150.jpg -kuwahara 3 mm2.jpg
I have also tried connected components, erosion with kernels, etc, but each method fails in some way.
I would like to know if there is some method that I should follow, or am I missing something from Image Processing capabilities.
This issue can be resolved using the connectedComponentsWithStats function of OpenCV. I found a reference for this in this question: How do I remove the dots / noise without damaging the text?
I changed it a bit to fit my needs, and this is the code that gave me the desired output.
import cv2
import numpy as np
import sys
img = cv2.imread(sys.argv[1], 0)
_, blackAndWhite = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(blackAndWhite, 4, cv2.CV_32S)
sizes = stats[1:, -1] #get CC_STAT_AREA component
img2 = np.zeros((labels.shape), np.uint8)
for i in range(0, nlabels - 1):
    if sizes[i] >= 8:  # filter small dotted regions
        img2[labels == i + 1] = 255
res = cv2.bitwise_not(img2)
cv2.imwrite('res.jpg', res)
The output file that I got is pretty clean, with the dotted band removed, and it gives perfect OCR results.
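For completeness, a small sketch of feeding the cleaned image (res) straight back into pytesseract; this OCR call is not part of the original answer:
import pytesseract
# OCR on the cleaned, white-background image produced above
print(pytesseract.image_to_string(res))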

Converting Caffe model to CoreML

I am working to understand CoreML. For a starter model, I've downloaded Yahoo's Open NSFW caffemodel. You give it an image, it gives you a probability score (between 0 and 1) that the image contains unsuitable content.
Using coremltools, I've converted the model to a .mlmodel and brought it into my app. It appears in Xcode like so:
In my app, I can successfully pass an image, and the output appears as a MLMultiArray. Where I am having trouble is understanding how to use this MLMultiArray to obtain my probability score. My code is like so:
func testModel(image: CVPixelBuffer) throws {
    let model = myModel()
    let prediction = try model.prediction(data: image)
    let output = prediction.prob // MLMultiArray
    print(output[0]) // 0.9992402791976929
    print(output[1]) // 0.0007597212097607553
}
For reference, the CVPixelBuffer is being resized to the required 224x224 that the model asks (I'll get into playing with Vision once I can figure this out).
The two indexes I've printed to the console do change if I provide a different image, but their scores are wildly different from the result I get when I run the model in Python. The same image passed into the model in Python gives me an output of 0.16, whereas my CoreML output, per the example above, is far from what I'm expecting to see (and is a dictionary, unlike Python's double output).
Is more work necessary to get a result like I am expecting?
It seems like you are not transforming the input image in the same way the model expects.
Most Caffe models expect "mean subtracted" images as input, and so does this one. If you inspect the Python code provided with Yahoo's Open NSFW (classify_nsfw.py):
# Note that the parameters are hard-coded for best results
caffe_transformer = caffe.io.Transformer({'data': nsfw_net.blobs['data'].data.shape})
caffe_transformer.set_transpose('data', (2, 0, 1)) # move image channels to outermost
caffe_transformer.set_mean('data', np.array([104, 117, 123])) # subtract the dataset-mean value in each channel
caffe_transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
caffe_transformer.set_channel_swap('data', (2, 1, 0)) # swap channels from RGB to BGR
Also, there is a specific way the image is resized to 256x256 and then cropped to 224x224.
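A rough Python/OpenCV sketch of that resize-then-center-crop step; the interpolation and exact ordering used by the original script may differ, so treat it as an approximation:
import cv2
def resize_and_center_crop(bgr, resize_to=256, crop_to=224):
    # resize to 256x256, then cut out the central 224x224 patch
    resized = cv2.resize(bgr, (resize_to, resize_to), interpolation=cv2.INTER_LINEAR)
    off = (resize_to - crop_to) // 2
    return resized[off:off + crop_to, off:off + crop_to]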
To obtain exactly the same results, you'll need to transform your input image in exactly the same way on both platforms.
See this thread for additional information.
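If the conversion is done with coremltools' Caffe converter, the same mean subtraction and BGR ordering can usually be baked into the .mlmodel at conversion time. A hedged sketch follows; the file names are the ones shipped with Open NSFW, the bias values mirror the [104, 117, 123] BGR mean above, and the parameter names should be checked against your coremltools version:
import coremltools
# hedged sketch: bake the Caffe-style preprocessing into the converted model
coreml_model = coremltools.converters.caffe.convert(
    ('resnet_50_1by2_nsfw.caffemodel', 'deploy.prototxt'),
    image_input_names='data',
    is_bgr=True,      # the Caffe transformer swaps channels to BGR
    blue_bias=-104,   # subtract the per-channel dataset mean
    green_bias=-117,
    red_bias=-123,
)
coreml_model.save('OpenNSFW.mlmodel')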
