pytesseract not showing text in terminal - opencv

I have the following code to extract characters and digits from an image:
import cv2
import pytesseract
img=cv2.imread('/home/mubashir/jeh_Project/1.jpeg')
rgb_img=cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
print(pytesseract.image_to_string(rgb_img))
cv2.imshow('RGB_img',rgb_img)
cv2.waitKey(0)
I can view the image but the text is not printing in the terminal.

Related

Changing number of channels in image

I am trying to convert an image to a single channel. For example, I have an image that I need to resize to 98x98, which I have done with this code:
from PIL import Image
from skimage.transform import resize
import cv2
image = Image.open('//imagepath')
new_image =image.resize((98, 98))
This gives me the shape (98,98,3), but I need the shape (98,98,1). I have tried this code:
new_image = cv2.cvtColor(new_image, cv2.COLOR_BGR2GRAY)
but I am getting an error. How can I solve this?
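One possible approach, sketched under the assumption that the error comes from passing a PIL Image object to cv2.cvtColor (which expects a NumPy array): stay within Pillow, convert to the single-channel "L" mode, and add the trailing axis with NumPy. The in-memory image below is only a stand-in for the real file.

```python
import numpy as np
from PIL import Image

# Stand-in for Image.open(<your path>): a small in-memory RGB image.
image = Image.new("RGB", (200, 150), color=(10, 120, 240))

# Resize, then convert to single-channel 8-bit grayscale ("L" mode).
gray = image.resize((98, 98)).convert("L")

# A Pillow "L" image becomes a (height, width) array; add an explicit channel axis.
gray_array = np.array(gray)[:, :, np.newaxis]
print(gray_array.shape)  # (98, 98, 1)
```

If the OpenCV route is preferred, the PIL image would first need to be turned into an array with np.array(new_image), and since Pillow stores pixels in RGB order, cv2.COLOR_RGB2GRAY is probably the matching conversion code.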

How can I convert bounding box pixels of an image to white, and the background to black?

I have a set of images similar to this one:
And for each image, I have a text file with bounding box regions expressed in normalized pixel values, YOLOv5 format (a text document with rows of type: class, x_center, y_center, width, height). Here's an example:
3 0.1661542727623449 0.6696164480452673 0.2951388888888889 0.300925925925926
3 0.41214353459362196 0.851908114711934 0.2719907407407405 0.2961837705761321
I'd like to obtain a new dataset of masked images, where the bounding box area from the original image gets converted into white pixels, and the rest gets converted into black pixels. This would be an example of the output image:
I'm sure there is a way to do this in PIL (Pillow) in Python, but I just can't seem to find a way.
Would somebody be able to help?
Kindest thanks!
so here's the answer:
import os
import numpy as np
from PIL import Image
label=open(os.path.join(labPath, filename), 'r')
lines=label.read().split('\n')
square=np.zeros((1152,1152))
for line in lines:
    if line!='':
        line=line.split() #line: class, x, y, w, h
        left=int((float(line[1])-0.5*float(line[3]))*1152 )
        bot=int((float(line[2])+0.5*float(line[4]))*1152)
        top=int(bot-float(line[4])*1152)
        right=int(left+float(line[3])*1152)
        square[top:bot, left:right]=255
square_img = Image.fromarray(square)
square_img=square_img.convert("L")
Let me know if you have any questions!
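The same idea can be wrapped in a small function that takes the image size as a parameter instead of hard-coding 1152. This is a minimal sketch, not the answer's original code: the name yolo_boxes_to_mask and the clamping to the image borders are assumptions added for illustration.

```python
import numpy as np

def yolo_boxes_to_mask(label_text, width, height):
    """Return a uint8 mask: 255 inside every YOLO box, 0 elsewhere."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        # Row layout: class, x_center, y_center, width, height (all normalized)
        _, xc, yc, w, h = map(float, parts)
        # Convert normalized centre/size to pixel edges, clamped to the image.
        left = max(int((xc - w / 2) * width), 0)
        right = min(int((xc + w / 2) * width), width)
        top = max(int((yc - h / 2) * height), 0)
        bottom = min(int((yc + h / 2) * height), height)
        mask[top:bottom, left:right] = 255
    return mask

# One centred box covering half of each dimension of an 8x4 image.
labels = "3 0.5 0.5 0.5 0.5\n"
mask = yolo_boxes_to_mask(labels, width=8, height=4)
```

Because the mask is already uint8, Image.fromarray(mask) should produce an "L" mode image directly, so the extra convert("L") step would not be needed.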

How to filter bigger font sizes of a text?

I've been writing code to read text using OpenCV and Tesseract on a Raspberry Pi. It is working well, but I would like to extract only the title of the text, that is, distinguish the smallest characters from the biggest and keep only the biggest ones.
Is there any way to achieve this differentiation?
Here is the initial code:
import cv2
import pytesseract
cap = cv2.VideoCapture(0)
cap.set(3,640)
cap.set(4,480)
while True:
    success, img = cap.read()
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imshow("Video",img)
    if cv2.waitKey(1) & 0xFF ==ord('q'):
        cv2.imwrite("NewPicture.jpg",img)
        break
text = pytesseract.image_to_string(img, config='--oem 3 --psm 11')
print(text)
Example image
A quick search of the pytesseract documentation shows that it has:
# Get verbose data including boxes, confidences, line and page numbers
print(pytesseract.image_to_data(Image.open('test.png')))
You can get quite a bit of data from this API and then filter on the size of the bounding boxes.
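As a rough sketch of that filtering step: when image_to_data is called with output_type=pytesseract.Output.DICT, it returns a dictionary with parallel lists such as "text" and "height". The helper below keeps only the words whose box height is close to the tallest one. The name largest_words and the 0.8 ratio are assumptions chosen for illustration, and the sample dictionary is hand-written rather than real OCR output.

```python
def largest_words(data, ratio=0.8):
    """Keep words whose box height is at least `ratio` times the tallest word box."""
    words = [
        (text.strip(), height)
        for text, height in zip(data["text"], data["height"])
        if text.strip()  # rows for blocks, paragraphs and lines carry empty text
    ]
    if not words:
        return []
    tallest = max(height for _, height in words)
    return [text for text, height in words if height >= ratio * tallest]

# Hand-written stand-in for the dictionary pytesseract would return.
sample = {
    "text": ["", "Big", "Title", "small", "words"],
    "height": [100, 40, 38, 12, 11],
}
print(largest_words(sample))  # ['Big', 'Title']
```

With a real frame, the dictionary would come from pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT). The ratio will likely need tuning, since descenders and capital letters make word heights vary even within one font size.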

Why does reading an image with cv2 behave differently from PIL?

As shown in the above image, when I read it with Pillow:
import numpy as np
from PIL import Image
label = Image.open('example.png')
print(np.unique(np.array(label)))
The numbers are within the range [0, 34], which is correct. However, when I read it with cv2:
import cv2
label = cv2.imread('example.png')
print(np.unique(label))
The numbers are within [0, 255], which is not correct for my application. How can I make cv2 behave the same way as PIL?
Also when I checked the matlab example code parsing this image, it is written like this:
[labels, color_mappings] = imread('example.png')
It seems that the PNG file has two data fields: one with values ranging from 0 to 34, and the other with the color pixels. How can I parse it with cv2?
I think Dan has the right answer, but if you want to do some "quick and dirty" testing, you could use the following code to:
convert your palette image into a single channel greyscale PGM image of the palette indices that OpenCV can read without any extra libraries, and a separate palette file that you can apply back afterwards
load back a PGM file of the indices that OpenCV may have altered, and reapply the saved palette
#!/usr/bin/env python3
import numpy as np
from PIL import Image
# Open palette image and remove pointless alpha channel
im = Image.open('image.png').convert('P')
# Extract palette and save as CSV
np.array(im.getpalette()).tofile('palette.csv',sep=',')
# Save palette indices as single channel PGM image that OpenCV can read
na = np.array(im)
Image.fromarray(na).save('indices.pgm')
So that will have split image.png into indices.pgm that OpenCV can read as a single channel image and palette.csv that we can reload later.
And here is the second part, where we rebuild the image from indices.pgm and palette.csv
# First load indices
im = Image.open('indices.pgm')
# Now load palette
palette = np.fromfile('palette.csv',sep=',').astype(np.uint8)
# Put palette back into image
im.putpalette(palette)
# Save
im.save('result.png')
Remember not to use any interpolation other than nearest-neighbour (cv2.INTER_NEAREST) in OpenCV, or you will introduce new index values, and therefore colours, that are not present in the original image.
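To illustrate why nearest-neighbour matters for index images, here is a NumPy-only sketch (no OpenCV required). The helper upscale_nearest is a hypothetical name used for illustration; in OpenCV the corresponding call would be cv2.resize with interpolation=cv2.INTER_NEAREST.

```python
import numpy as np

def upscale_nearest(indices, factor):
    """Upscale a 2-D index map by an integer factor by repeating pixels."""
    return np.repeat(np.repeat(indices, factor, axis=0), factor, axis=1)

# A tiny palette-index image that uses only the indices 0 and 34.
indices = np.array([[0, 34], [34, 0]], dtype=np.uint8)

# Nearest-neighbour only copies existing values, so the set of indices is unchanged.
nearest = upscale_nearest(indices, 2)
print(np.unique(nearest))  # [ 0 34]

# Any blending interpolation averages neighbours and invents an index (17 here)
# that refers to an unrelated palette entry.
blended = (indices[0, 0].astype(float) + indices[0, 1]) / 2
print(blended)  # 17.0
```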
Keywords: Python, PNG, image processing, palette, palette indices, palette index

Show multiple images in same window with Python OpenCV?

I want to display the original image on the left and the grayscale image on the right. Below is my code: I create the grayscale image and a window, but I couldn't put the grayscale image on the right side. How can I do this?
import cv
import time
from PIL import Image
import sys
filePath = raw_input("file path: ")
filename = filePath
img = cv.LoadImage(filename)
imgGrayScale = cv.LoadImage(filename, cv.CV_LOAD_IMAGE_GRAYSCALE) # create grayscale image
imgW = img.width
imgH = img.height
cv.NamedWindow("title", cv.CV_WINDOW_AUTOSIZE)
cv.ShowImage("title", img )
cv.ResizeWindow("title", imgW * 2, imgH)
cv.WaitKey()
First concatenate the images either horizontally (across columns) or vertically (across rows) and then display it as a single image.
import numpy as np
import cv2
from skimage.data import astronaut
# astronaut() is RGB; swap channels to the BGR order that cv2.imshow expects
img=cv2.cvtColor(astronaut(),cv2.COLOR_RGB2BGR)
numpy_horizontal_concat = np.concatenate((img, img), axis=1)
cv2.imshow('Numpy Horizontal Concat', numpy_horizontal_concat)
cv2.waitKey(0) # keep the window open until a key is pressed
As far as I know, one window, one image. So create a new image with imgW*2 and copy the contents of the grayscale image at the region starting from (originalimage.width,0). The ROI capabilities may be helpful to you.
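For the colour-plus-grayscale case in the question, the two arrays have different shapes, (H, W, 3) and (H, W), so the grayscale image needs three channels before concatenation. A minimal NumPy sketch follows; the helper name side_by_side is hypothetical, and the channel mean is only a stand-in for a proper grayscale conversion such as cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).

```python
import numpy as np

def side_by_side(color, gray):
    """Place an (H, W, 3) colour image to the left of an (H, W) grayscale image."""
    # Repeat the single gray channel three times so both arrays have matching shapes.
    gray_3ch = np.repeat(gray[:, :, np.newaxis], 3, axis=2)
    return np.hstack((color, gray_3ch))

# Synthetic 4x6 image with only the last channel set.
color = np.zeros((4, 6, 3), dtype=np.uint8)
color[..., 2] = 200

# Stand-in grayscale: the plain channel mean, truncated to uint8.
gray = color.mean(axis=2).astype(np.uint8)

combined = side_by_side(color, gray)
print(combined.shape)  # (4, 12, 3)
```

The combined array can then be passed to a single cv2.imshow call followed by cv2.waitKey(0).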
