Implement binary threshold on my own using python but getting wrong result - image-processing

I am trying to implemnt the binary threshold on my own using python, First I have converted the image to to gray scale image. Then I have taken the max value of the gay scale image and threat that value as the threshold value. Then I put nested for loop to carry out that the condition checking. below is my code. I have taken the example Image used ion openCV website.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as im
img = im.imread('threshold.png')
plt.imshow(img)
plt.show()
R=img[:,:,0]
G=img[:,:,1]
B=img[:,:,2]
M,N=R.shape
gray_img= np.zeros((M,N))
intensity= np.zeros((M,N))
for i in range(M):
for j in range(N):
gray_img[i, j]=(R[i, j]*0.2989)+(G[i, j]*0.5870)+(B[i, j]*0.114);
t=gray_img.max()
for i in range(1, M-1):
for j in range(1,N-1):
intensity[i,j]=gray_img[i,j]
if(intensity[i,j]>t):
intensity[i,j]=1
else:
intensity[i,j]=0
plt.imshow(intensity)
plt.show()

Related

Image recognition difficulties with OCR - reading numbers from a picture

I am trying to develop a python script which can read numbers from pictures, to be more exact I am trying to get the gas consumption. The numbers' locations are always the same. There are two "types" of pics, bright and dark. (I am taking photos every 10 mins so I have a lot of examples if needed)
I would like to get as a result 8 digits. e.g. 10974748 (from the dark pic)
I am mainly using Pytesseract and OpenCV2.
So far the best solution seemes to be that first I crop the needed part of the picture than I use pytesseract.image_to_string() with config = --psm 7. But unfortunately it is really not a reliable solution, it can not recognize the same digit combinations when there were no consumption but photos were taken.
import cv2
import numpy as np
import os
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
directory = r"C:\Users\user\Desktop\test_pcs\test"
for image in os.listdir(directory):
OriginalImagePath = os.path.join(directory, image)
OriginalImage = cv2.imread(OriginalImagePath)
x_start, y_start = int(1110), int(445)
x_end, y_end = int(1690), int(520)
cropped_image = OriginalImage[y_start:y_end, x_start:x_end]
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
cv2.imshow("Cropped", cropped_image)
cv2.waitKey(0)
print(text + " " + OriginalImagePath)
cv2.destroyAllWindows()
After that I tried using thresholding, but sadly I get worse results than with the simple image_to_string. Adaptive thresholding gives an output image which seems not that bad but tesseract can't read it.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
img = cv.imread(r"C:\Users\user\Desktop\test_pcs\new2\2022-10-30_14-49-30.jpg",0)
img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
#'Adaptive Mean Thresholding'
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
cv.THRESH_BINARY,11,2)
#'Adaptive Gaussian Thresholding'
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
cv.THRESH_BINARY,11,2)
images = [img, th2, th3]
for i in range(3):
plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
plt.show()
x_start, y_start = int(1110), int(450)
x_end, y_end = int(1690), int(520)
cropped_image = th2[y_start:y_end, x_start:x_end]
plt.imshow(cropped_image,'gray')
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
print("digits: " + text)
I also tried to read the digits character by character but it failed as well.
Now I am trying to get better pictures somehow but the options are quite limited.
I would be greateful for any suggestions as I am doing this for my thesis.

Is there an equivalent function or an implmentation of skimage.feature.peak_local_max in OpenCV?

I have been trying to segment biological cells in an image using watershed algorithm. I found an excellent article on pyimagesearch which clearly gives an overview of the algorithm and its implementation in python. The code uses both opencv and scikit-image for processing the image.
My goal is to convert the whole code into pure opencv. But the issues is that there's a function called scipy.feature.peak_local_max in scikit-image which does the job of finding local peaks in an image very efficiently. I couldn't find or devise such function in OpenCV.
Original Code(I have documented this snippet according to my understanding, please correct if am wrong):
import the necessary packages
from skimage.feature import peak_local_max
from skimage.morphology import watershed
from scipy import ndimage
import numpy as np
import argparse
import imutils
import cv2
from matplotlib import pyplot as plt
# load the image and perform pyramid mean shift filtering
# to aid the thresholding step
image = cv2.imread("test2.png")
shifted = cv2.pyrMeanShiftFiltering(image, 21, 51)
# Apply grayscale
gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
# Convert to binary
thresh = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Watershed starts from here
# compute the exact Euclidean distance from every binary
# pixel to the nearest zero pixel, then find peaks in this
# distance map
D = ndimage.distance_transform_edt(thresh)
localMax = peak_local_max(D, indices=False, min_distance=10,labels=thresh)
# perform a connected component analysis on the local peaks,
# using 8-connectivity, then appy the Watershed algorithm
markers = ndimage.label(localMax, structure=np.ones((3, 3)))[0]
# Apply segmentation
labels = watershed(-D, markers, mask=thresh)
print("[INFO] {} unique segments found".format(len(np.unique(labels)) - 1))
cv2.imwrite("labels.png",labels)
# Contouring
for label in np.unique(labels):
# if the label is zero, we are examining the 'background'
# so simply ignore it
if label == 0:
continue
# otherwise, allocate memory for the label region and draw
# it on the mask
mask = np.zeros(gray.shape, dtype="uint8")
mask[labels == label] = 255
cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
c = max(cnts, key=cv2.contourArea)
# draw a circle enclosing the object
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.018 * peri, True)
cv2.drawContours(image, [approx], -1, (0,0,255), 2)
cv2.imwrite("output.jpg",image)
Pure OpenCV Code till finding distance map:
# import the necessary packages
import numpy as np
import cv2
# load the image and perform pyramid mean shift filtering
# to aid the thresholding step
image = cv2.imread("1.png")
shifted = cv2.pyrMeanShiftFiltering(image, 21, 51)
# Apply grayscale
gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
# Convert to binary
thresh = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Watershed starts from here
# compute the exact Euclidean distance from every binary
# pixel to the nearest zero pixel, then find peaks in this
# distance map
D = cv2.distanceTransform(thresh,cv2.DIST_L2,0)
The point till D, both the original code and the pure opencv code which I have tried have exactly the same outputs, the issue is I dont exactly have a clear idea on how to implement peak_local_max in opencv which would give identical result as scikit's function.
It would be really helpful if someone who has relavent knowledge could explain how this function works in finding those peaks in such a fine grained manner.
Input Image:
Peak Local max output in scikit-image(BGR format image):
Required output:

How to show kdeplot in a 5*4 subplot?

I am working on a machine learning project and am using the seaborn kdeplot to show the standard scaler after scaling. However, no matter how large the figure size I change, the graphs just can't show and will show the error: AttributeError: 'numpy.ndarray' object has no attribute 'plot'.The image I'm willing to show is a 5*4 subplot that look like this:
expected subplot image
#feature scaling
#since numerical attributes have very different scales,
#we use standardization to get all attributes to have the same scale
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
matplotlib.style.use('ggplot')
scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(train_set)
scaled_df = pd.DataFrame(scaled_df, columns=["SaleAmount","SaleCount","ReturnAmount","ReturnCount",
"KeyedAmount","KeyedCount","VoidRejectAmount","VoidRejectCount","RetrievalAmount",
"RetrievalCount","ChargebackAmount","ChargebackCount","DepositAmount","DepositCount",
"NetDeposit","AuthorizationAmount","AuthorizationCount","DeclinedAuthorizationAmount","DeclinedAuthorizationCount"])
fig, axes = plt.subplots(figsize=(20,10), ncols=5, nrows=4)
sns.kdeplot(scaled_df['SaleAmount'], ax=axes[0])
sns.kdeplot(scaled_df['SaleCount'], ax=axes[1])
sns.kdeplot(scaled_df['ReturnAmount'], ax=axes[2])
sns.kdeplot(scaled_df['ReturnCount'], ax=axes[3])
sns.kdeplot(scaled_df['KeyedAmount'], ax=axes[4])
sns.kdeplot(scaled_df['KeyedCount'], ax=axes[5])
sns.kdeplot(scaled_df['VoidRejectAmount'], ax=axes[6])
sns.kdeplot(scaled_df['VoidRejectCount'], ax=axes[7])
sns.kdeplot(scaled_df['RetrievalAmount'], ax=axes[8])
sns.kdeplot(scaled_df['RetrievalCount'], ax=axes[9])
sns.kdeplot(scaled_df['ChargebackAmount'], ax=axes[10])
sns.kdeplot(scaled_df['ChargebackCount'], ax=axes[11])
sns.kdeplot(scaled_df['DepositAmount'], ax=axes[12])
sns.kdeplot(scaled_df['DepositCount'], ax=axes[13])
sns.kdeplot(scaled_df['NetDeposit'], ax=axes[14])
sns.kdeplot(scaled_df['AuthorizationAmount'], ax=axes[15])
sns.kdeplot(scaled_df['AuthorizationCount'], ax=axes[16])
sns.kdeplot(scaled_df['DeclinedAuthorizationAmount'], ax=axes[17])
sns.kdeplot(scaled_df['DeclinedAuthorizationCount'], ax=axes[18])
You need to know that you have a two dimension array so something like this:
sns.kdeplot(scaled_df['DeclinedAuthorizationCount'], ax=axes[9,2])

OpenCV2 imwrite is writing a black image

I am messing around with opencv2 for neural style transfer... In cv2.imshow("Output", output), I am able to say my picture. But when I write output to file with cv2.imwrite("my_file.jpg", output). Is it because my file extension is wrong? When I do like cv2.imwrite("my_file.jpg", input) though, it does show my original input picture. Any ideas? Thank you in advance.
# import the necessary packages
from __future__ import print_function
import argparse
import time
import cv2
import imutils
import numpy as np
from imutils.video import VideoStream
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
help="neural style transfer model")
ap.add_argument("-i", "--image", required=True,
help="input image to apply neural style transfer to")
args = vars(ap.parse_args())
# load the neural style transfer model from disk
print("[INFO] loading style transfer model")
net = cv2.dnn.readNetFromTorch(args["model"])
# load the input image, resize it to have a width of 600 pixels, and
# then grab the image dimensions
image = cv2.imread(args["image"])
image = imutils.resize(image, width=600)
(h, w) = image.shape[:2]
# construct a blob from the image, set the input, and then perform a
# forward pass of the network
blob = cv2.dnn.blobFromImage(image, 1.0, (w, h),
(103.939, 116.779, 123.680), swapRB=False, crop=False)
net.setInput(blob)
start = time.time()
output = net.forward()
end = time.time()
# reshape the output tensor, add back in the mean subtraction, and
# then swap the channel ordering
output = output.reshape((3, output.shape[2], output.shape[3]))
output[0] += 103.939
output[1] += 116.779
output[2] += 123.680
output /= 255.0
output = output.transpose(1, 2, 0)
# show information on how long inference took
print("[INFO] neural style transfer took {:.4f} seconds".format(
end - start))
# show the images
cv2.imshow("Input", image)
cv2.imshow("Output", output)
cv2.waitKey(0)
cv2.imwrite("dogey.jpg", output)
Only the last 4 lines of code have to deal with imshow and imwrite, all lines before are trying to modify the output picture.
The variable output represents a colored image that is composed of pixels. Each pixel is determined by three values (RGB). Depending on the representation of the image each value is chosen from the discrete range [0, 255] or continuous range [0, 1] either. However, in the following line of code, you are scaling the entries of output from the discrete range [0,255] to the "continuous" range [0,1].
output /= 255.0
While the function cv2.imshow(...) can handle images stored with float values in the range [0, 1] the cv2.imwrite(...) function cannot. You have to pass an image composed of values in the range [0, 255]. In your case, you are passing values that are all close to zero and "far" away from 255. Hence, the image is assumed as colorless and therefore black. A quick fix might be:
cv2.imwrite("dogey.jpg", 255*output)

Count number of objects using watershed algorithm - Scikit-image

I am trying to find the number of objects in a given image using watershed segmentation. Consider for example the coins image. Here I would like to know the number of coins in the image. I implemented the code available at Scikit-image documentation and tweaked with it a little and got results similar to those displayed on the documentation page.
After looking at functions used in the code in detail I found out that ndimage.label() also returns number of unique objects found in the image (mentioned in it's documentation), but when I print that value I am getting 53 which is very high as compared to the number of coins in the actual image.
Can somebody suggest some method to find the number of objects in an image.
Here is a version of your code that counts the coins in one of two ways: a) by directly segmenting the distance image and b) by doing watershed first and rejecting tiny intersecting regions.
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, color, filter as filters
from scipy import ndimage
from skimage.morphology import watershed
from skimage.feature import peak_local_max
from skimage.measure import regionprops, label
image = color.rgb2gray(io.imread('water_coins.jpg', plugin='freeimage'))
image = image < filters.threshold_otsu(image)
distance = ndimage.distance_transform_edt(image)
# Here's one way to measure the number of coins directly
# from the distance map
coin_centres = (distance > 0.8 * distance.max())
print('Number of coins (method 1):', np.max(label(coin_centres)))
# Or you can proceed with the watershed labeling
local_maxi = peak_local_max(distance, indices=False, footprint=np.ones((3, 3)),
labels=image)
markers, num_features = ndimage.label(local_maxi)
labels = watershed(-distance, markers, mask=image)
# ...but then you have to clean up the tiny intersections between coins
regions = regionprops(labels)
regions = [r for r in regions if r.area > 50]
print('Number of coins (method 2):', len(regions) - 1)
fig, axes = plt.subplots(ncols=3, figsize=(8, 2.7))
ax0, ax1, ax2 = axes
ax0.imshow(image, cmap=plt.cm.gray, interpolation='nearest')
ax0.set_title('Overlapping objects')
ax1.imshow(-distance, cmap=plt.cm.jet, interpolation='nearest')
ax1.set_title('Distances')
ax2.imshow(labels, cmap=plt.cm.spectral, interpolation='nearest')
ax2.set_title('Separated objects')
for ax in axes:
ax.axis('off')
fig.subplots_adjust(hspace=0.01, wspace=0.01, top=1, bottom=0, left=0,
right=1)
plt.show()

Resources