Related
Recently, I used opencv to do a project on wheel size recognition. Now encounter this problem:
1 For the threshold processing of grayscale images, I don't know if the function cv2.adaptiveThreshold() should be used, because according to my experiments, the use of the above functions may make the boundary of the hub larger and affect the accuracy of detection.
2 When using Canny to process the edge of the image, I don't know how much upper and lower to choose. It is a waste of time to randomly choose the number to try.
3 The outer circle of the hub and the outline of the inner hole cannot be effectively identified, and the detection results are shown in the following figure:
My English is not very good, thank you for reading and answering, thank you very much! !
Attached code:
def midpoint(ptx, pty):
return ((ptx[0] + pty[0]) * 0.5, ptx[1] + pty[1] * 0.5)
image = cv2.imread('picture.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # 转化成灰度图
blur = cv2.GaussianBlur(gray, (3,3), 0) # 高斯模糊
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 29, 10)
kernel_size = (10,10)
edged = cv2.dilate(thresh.copy(),None, iterations=1)
edged = cv2.erode(edged.copy(),None,iterations=1)
cv2.imshow('orig', blur)
cv2.imshow('edged', edged)
cv2.waitKey(0)
I have multiple colored images of gauges. I apply adaptive Gaussian thresholding to make the filter the image so that the ticks and needle are more prominent.
For the above thresholding I used cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 2). (21,2) seemed to process images best in general.
However, when the gauge images are too dark adaptThresh(21,2) produces much noise
Increasing the kernel sizes (adaptiveThreshold parameters) filters the salt and pepper noise and produces an image I want.
I want to be able to determine how much Gaussian (salt and pepper) noise in the image so that if there is too much noise, my algorithm will increase the filter size. What is a good metric to measure the amount of noise in this case?
A popular way to do this is considering the "derivatives" of the picture, or in other words, how much the picture "varies". A popular way to do that is using the Total-Variation. There are many ways to define it in the discrete domain, but in the end they boild down to a (weighted) sum of the absolute differences of neighbouring pixels. This means if you have a small total variation if the image comprises large areas (with "short" boundaries) of uniform brightness, and you'll have a big value for noisy or high frequency images.
So a simple way to measure the noise is just measuring the total variation and if that is above a certain threshold you could try to increase the filter size.
Get a clean background image. you can use OpenCV to reduce the low-level image info.
code example
img = cv2.imread("/home/xx/Pictures/test.png",cv2.IMREAD_GRAYSCALE)
# filter kernel
kernel = np.ones((5, 5), np.float32) / 25
dst = cv2.filter2D(img, -1, kernel)
img = img - dst/20
img = cv2.adaptiveThreshold(img.astype(np.uint8), 255,
cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 2)
cv2.imshow('1',img)
cv2.waitKey()
Out of an image, I need to extract a sheet of paper, just like camscanner app does, https://www.camscanner.com/
I know that I can do this by detecting the edges of the sheet of paper i want to detect. And later performing perspective transform. I use openCV library in python.
This is the image in which I'm trying to find the sheet of paper:
Here is what I already tried:
Method 1:
(using thresholding)
Preprocessing the image with image smoothening (guassian
blur/bilateral blur)
splitting image into h,s,v channels
adaptive thresholding on the saturation channel
some morphological operations like dilation and erosion
finding contours, identifying the largest contour and finding the
corner points
I've implemented this method based on a stackoverflow answer:
Detecting a sheet of paper / Square Detection
I'm able to find the paper sheet for some images, but it fails for images like this:
Method 2:
(using sobel gradient operator)
Preprocessing the image by converting into grayscale, image smoothening (guassian
blur/bilateral blur)
Finding the gradients of the image
downsampling and upsampling the image
After this I don't know how to find the appropriate boundary enclosing the image.
I've implemented this method based on a stackoverflow answer:
detect paper from background almost same as paper color
Here's how far I got with the image:
Method 3:
(using canny edge detector)
According to the posts I've read on this community seems that everyone prefers canny edge method to extract the edges, but in my case the results are not satisfactory. Here's what I did:
Preprocessing the image by converting into grayscale, image smoothening (guassian
blur/bilateral blur)
Finding the edges using canny edge
some morphological operations like dilation and erosion
But the edges obtained from canny are really not up to the mark.
I've implemented this method based on a stackoverflow answer:
Detecting a sheet of paper / Square Detection, also I didn't quite what he does by iterating over multiple channels in this answer.
Here's how far I got with the image:
Here's some code on the method1(thresholding):
#READING IMAGE INTO BGR SPACE
image = cv2.imread("./images/sheet3.png")
#BILATERAL FILTERING TO SMOOTHEN THE IMAGE BUT NOT THE EDGES
img = cv2.bilateralFilter(image,20,75,75)
#CONVERTING BGR TO HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
#SPLITTING THE HSV CHANNELS
h,s,v = cv2.split(hsv)
#DOUBLING THE SATURATION CHANNEL
gray_s = cv2.addWeighted(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 0.0, s, 2.0, 0)
#THRESHOLDING USING ADAPTIVETHRESHOLDING
threshed = cv2.adaptiveThreshold(gray_s, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 109, 10)
#APPLYING MORPHOLOGICAL OPERATIONS OF DILATION AND EROSION
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
morph = cv2.morphologyEx(threshed, cv2.MORPH_OPEN, kernel)
#FINDING ALL THE CONTOURS
cnts = cv2.findContours(morph, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
canvas = img.copy()
#SORTING THE CONTOURS AND TAKING THE LARGEST CONTOUR
cnts = sorted(cnts, key = cv2.contourArea)
cnt = cnts[-1]
#FINDING THE PERIMETER OF THE CONTOUR
arclen = cv2.arcLength(cnt, True)
#FINDING THE END POINTS OF THE CONTOUR BY APPROX POLY DP
approx = cv2.approxPolyDP(cnt, 0.02* arclen, True)
cv2.drawContours(canvas, [cnt], -1, (255,0,0), 1, cv2.LINE_AA)
cv2.drawContours(canvas, [approx], -1, (0, 0, 255), 1, cv2.LINE_AA)
cv2.imwrite("detected.png", canvas)
I'm kind of new to image processing and openCV.
Please share some insights on how to take this further and obtain results more accurately. TIA.
I've gotten access to a lot of reports which are filled out by hand. One of the columns in the report contains a timestamp, which I would like to attempt to identify without going through each report manually.
I am playing with the idea of splitting the times, e.g. 00:30, into four digits, and running these through a classifier trained on MNIST to identify the actual timestamps.
When I manually extract the four digits in Photoshop and run these through an MNIST classifier, it works perfectly. But so far I haven't been able to figure out how to programatically split the number sequences into single digits. I tried to use different types of countour finding in OpenCV, but it didn't work very reliably.
Any suggestions?
I've added a screenshot of some of the relevant columns in the reports.
I would do something like this (no code as long as it is just an idea, you could test it to see if works):
Extract each area for each group of numbers as Rick M. suggested above. So you will have many Kl [hour] rectangles under image form.
For each of these rectangles extract (using OpenCV contours feature) each ROI. Delete Kl if you don't need it (you know the dimensions of this ROI (you can calculate it with img.shape) and they have more or less the same dimensions)
Extract all digits using the same script used above. You can take a look at my questions/answers to find some pieces of code which do this.
You will have a problem with underline in some cases. Search about this on SO, there are few solutions complete with code.
Now, about splitting up. We know the ROI's are in hour format, so hh:mm (or 4 digits). A simply (and very rudimental) solution to split chars wich are attached between would be to split in half the ROI you get with 2 digits inside. It's a raw solution but should perform well in your case because the digits attached are just 2.
Some digits will output with "missing pieces". This can be avoided by using some erosion/dilation/skeletonization.
Here you don't have letters, only numbers so MNIST should work well (not perfect, keep this in mind).
In a few, extracting the data it's not the hard task but recognizing the digits will make you sweat a bit.
I hope I can provide some code to show the steps above as soon as possible.
EDIT - code
This is some code I made. Final output is this:
The code works 100% with this image so, if something don't work for you, check folders/paths/modules installation.
Hope this helped.
import cv2
import numpy as np
# 1 - remove the vertical line on the left
img = cv2.imread('image.jpg', 0)
# gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(img, 100, 150, apertureSize=5)
lines = cv2.HoughLines(edges, 1, np.pi / 50, 50)
for rho, theta in lines[0]:
a = np.cos(theta)
b = np.sin(theta)
x0 = a * rho
y0 = b * rho
x1 = int(x0 + 1000 * (-b))
y1 = int(y0 + 1000 * (a))
x2 = int(x0 - 1000 * (-b))
y2 = int(y0 - 1000 * (a))
cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 10)
cv2.imshow('marked', img)
cv2.waitKey(0)
cv2.imwrite('image.png', img)
# 2 - remove horizontal lines
img = cv2.imread("image.png")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_orig = cv2.imread("image.png")
img = cv2.bitwise_not(img)
th2 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2)
cv2.imshow("th2", th2)
cv2.waitKey(0)
cv2.destroyAllWindows()
horizontal = th2
rows, cols = horizontal.shape
# inverse the image, so that lines are black for masking
horizontal_inv = cv2.bitwise_not(horizontal)
# perform bitwise_and to mask the lines with provided mask
masked_img = cv2.bitwise_and(img, img, mask=horizontal_inv)
# reverse the image back to normal
masked_img_inv = cv2.bitwise_not(masked_img)
cv2.imshow("masked img", masked_img_inv)
cv2.waitKey(0)
cv2.destroyAllWindows()
horizontalsize = int(cols / 30)
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontalsize, 1))
horizontal = cv2.erode(horizontal, horizontalStructure, (-1, -1))
horizontal = cv2.dilate(horizontal, horizontalStructure, (-1, -1))
cv2.imshow("horizontal", horizontal)
cv2.waitKey(0)
cv2.destroyAllWindows()
# step1
edges = cv2.adaptiveThreshold(horizontal, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 3, -2)
cv2.imshow("edges", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
# step2
kernel = np.ones((1, 2), dtype="uint8")
dilated = cv2.dilate(edges, kernel)
cv2.imshow("dilated", dilated)
cv2.waitKey(0)
cv2.destroyAllWindows()
im2, ctrs, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])
for i, ctr in enumerate(sorted_ctrs):
# Get bounding box
x, y, w, h = cv2.boundingRect(ctr)
# Getting ROI
roi = img[y:y + h, x:x + w]
# show ROI
rect = cv2.rectangle(img_orig, (x, y), (x + w, y + h), (255, 255, 255), -1)
cv2.imshow('areas', rect)
cv2.waitKey(0)
cv2.imwrite('no_lines.png', rect)
# 3 - detect and extract ROI's
image = cv2.imread('no_lines.png')
cv2.imshow('i', image)
cv2.waitKey(0)
# grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', gray)
cv2.waitKey(0)
# binary
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('thresh', thresh)
cv2.waitKey(0)
# dilation
kernel = np.ones((8, 45), np.uint8) # values set for this image only - need to change for different images
img_dilation = cv2.dilate(thresh, kernel, iterations=1)
cv2.imshow('dilated', img_dilation)
cv2.waitKey(0)
# find contours
im2, ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])
for i, ctr in enumerate(sorted_ctrs):
# Get bounding box
x, y, w, h = cv2.boundingRect(ctr)
# Getting ROI
roi = image[y:y + h, x:x + w]
# show ROI
# cv2.imshow('segment no:'+str(i),roi)
cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), 1)
# cv2.waitKey(0)
# save only the ROI's which contain a valid information
if h > 20 and w > 75:
cv2.imwrite('roi\\{}.png'.format(i), roi)
cv2.imshow('marked areas', image)
cv2.waitKey(0)
These are next steps:
Understand what I write ;). It's the most important step.
Using pieces of the code above (especially step 3) you can delete remaining Kl in extracted images.
Create folder for each image and extract digits.
Using MNIST, recognize each digit.
Breaking up text into individual characters is not as easy as it sounds at first. You can try to find some rules and manipulate the image by that, but there will be just too many exceptions. For example you can try to find disjoint marks, but the fourth one in your image, 0715 has it's "5" broken up into three pieces, and the 9th one, 17.00 has the two zeros overlapping.
You are very lucky with the horizontal lines - at least it's easy to separate different entries. But you have to come up with a lot of ideas related to semi-fixed character width, a "soft" disjointness rule, etc.
I did a project like that two years ago and we ended up using an external open source library called Tesseract. Here's this article of Roman numerals recognition with it, up to about 90% accuracy. You might also want to look into the Lipi Toolkit, but I have no experience with that.
You might also want to consider to just train a network to recognize the four digits at once. So the input would be the whole field with the four handwritten digits and the output would be the four numbers. And let the network sort out where the characters are. If you have enough training data, that's probably the easiest approach.
EDIT:
Inspired by #Link's answer, I just came up with this idea, you can give it a try. Once you extracted the area between the two lines, trim the image to get rid of white space all around. Then make an educated guess about how big the characters are. Use maybe the height of the area? Then create a sliding window over the image, and run the recognition all the way. There will most likely be four peaks which would correspond to the four digits.
I'm trying to make a fire detection using Machine learning. My features are mean RGB, variance RGB, and Hu moments.
So what I'm doing right now is I first segment an image based on this paper
According to the paper I use the rules
r > g && g > b
r > 190 && g > 100 && b < 140
here is the result of my color segmentation for the negative and positive images
The pictures on the right are now in
vector<Mat> processedImage
After that I get the hu moments of each picture by converting it into gray scale and blurring it.
cvtColor(processedImage[x], gray_image, CV_BGR2GRAY);
blur(gray_image, gray_image, Size(3, 3));
Canny(gray_image, canny_output, thresh, thresh * 2, 3);
findContours(canny_output, contours, hierarchy, CV_RETR_TREE,CV_CHAIN_APPROX_SIMPLE, Point(0, 0));
cv::Moments mom = cv::moments(contours[0]);
cv::HuMoments(mom, hu); // now in hu are your 7 Hu-Moments
Now I am stuck I'm not sure if my images are okay to obtain useful hu moments because the negative images are so scattered.
Am I on the right track with regards to Hu moments extraction? Will I do the same on testing where I do color segmentation before extracting hu moments?
I think you should follow these steps (Code in Python):
1.Create a binary image by iterating through the original. If a pixel is identified as fire will be turned to white otherwise to black (Be careful if you are using BRG or RGB. OpenCV read images in BRG so you need to convert first):
rows,cols = im2.shape[:2]
for i in xrange(rows):
for j in xrange(cols):
if im2[i,j,0]>im2[i,j,1] and im2[i,j,1]>im2[i,j,2] and im2[i,j,0]>190 and im2[i,j,1] > 100 and im2[i,j,2] <140:
im2[i,j,:]=255
else:
im2[i,j,:]=0
Result:
2.Use morphological operators and blurring to reduce noise/small contours.
# Convert to greyscale-monochromatic
gray = cv2.cvtColor(im2,cv2.COLOR_RGB2GRAY)
#Apply Gaussian Blur
blur= cv2.GaussianBlur(gray,(7,7),0)
# Threshold again since after gaussian blur the image is no longer binary
(thresh, bw_image) = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY| cv2.THRESH_OTSU)
# Difine Kernel Size and apply erosion filter
element = cv2.getStructuringElement(cv2.MORPH_RECT,(7,7))
dilated=cv2.dilate(bw_image,element)
eroded=cv2.erode(dilated,element)
3.Afterwards, you can detect contours using the cv2.RETR_EXTERNAL flag, so you can ignore all inner contours (you are interested only in the outer contours of the fire regions). Also you can retain only the contours whose are is bigger than e.g. 500px or just choose the bigger one if you know there is only one "fire".
g, contours,hierarchy = cv2.findContours(eroded,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
contours_retain=[]
for cnt in contours:
if cv2.contourArea(cnt)>500:
contours_retain.append(cnt)
cv2.drawContours(im_cp,contours_retain,-1,(255,0,255),3)
Here is the fire region:
4.Finally calculate your Hu moments
for cnt in contours_retain:
print cv2.HuMoments(cv2.moments(cnt)).flatten()
I hope this helps! Sorry I am not familiar with C++!