OpenCV-Python version 3.4.1
I am trying to detect multiple objects through a camera. The objects are a face, eyes, a spoon, and a pen. The spoon and pen are specific, i.e. the classifier should only detect the particular pen and spoon that I trained it with. But it detects all kinds of faces and eyes, since I used the '.xml' cascade files for face and eye detection that ship with OpenCV-Python.
My question is about the code. There is a line in my code below that reads
detectMultiScale(gray, 1.3, 10). I have read the documentation but still couldn't clearly understand the last two parameters.
My code:
# with camera feed
import cv2
import numpy as np
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
spoon_cascade = cv2.CascadeClassifier('SpoonCascade.xml')
pen_cascade = cv2.CascadeClassifier('PenCascade.xml')
cap = cv2.VideoCapture('link')
while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    spoons = spoon_cascade.detectMultiScale(gray, 1.3, 10)
    pens = pen_cascade.detectMultiScale(gray, 1.3, 10)

    for (x, y, w, h) in spoons:
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Spoon', (x - w, y - h), font, 0.5, (0, 255, 255), 2,
                    cv2.LINE_AA)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

    for (x, y, w, h) in pens:
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Pen', (x - w, y - h), font, 0.5, (0, 255, 255), 2,
                    cv2.LINE_AA)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

    for (x, y, w, h) in faces:
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Face', (x + w, y + h), font, 0.5, (0, 255, 255), 2,
                    cv2.LINE_AA)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_color = img[y:y + h, x:x + w]
        roi_gray = gray[y:y + h, x:x + w]

        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex, ey, ew, eh) in eyes:
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 0, 255), 2)

    cv2.imshow('Voila', img)
    cv2.imwrite('KuchhToDetected.jpg', img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
My question:
Is it just a matter of trial and error to adjust these last two parameters, or is there a way to choose them based on the images?
In my view these two parameters are significant and make the code very sensitive, since they affect false positives. How do I set them properly to reduce false positives?
These parameters are really important for object detection, so it would benefit everyone to have this answered once and for all.
Thank you.
Did you get the code (including the call to detectMultiScale) from somewhere, or write it yourself?
Is it just a matter of trial and error to adjust these last two parameters, or is there a way to choose them based on the images?
There is some trial and error in fine-tuning, but you should understand all the parameters and choose initial values which give a good level of performance. Then you can use some kind of automatic method for fine-tuning (i.e., iteratively re-train and re-test with different parameter values and see if detection improves or worsens, but be careful of overfitting). Since the parameters form a large multi-dimensional space, finding good parameters randomly is not practical.
Looking at the Python OpenCV bindings, it appears the two numeric parameters you pass are scaleFactor and minNeighbors respectively. There is a good explanation of minNeighbors on this question: OpenCV detectMultiScale() minNeighbors parameter. Setting it higher should reduce your false positives, as described there.
The scaleFactor parameter determines a trade-off between detection accuracy and speed. The detection window starts out at size minSize, and after testing all windows of that size, the window is scaled up by scaleFactor and re-tested, and so on until the window reaches or exceeds maxSize. If scaleFactor is large (e.g. 2.0), there will of course be fewer steps, so detection is faster, but you may miss objects whose size falls between two tested scales. Haar-like features are, however, inherently robust to small variations in scale, so there's no need to make scaleFactor very small (e.g. 1.001); that just wastes time on needless steps. That is why a value like 1.3 is commonly used rather than something much smaller.
Setting minSize and maxSize is also important to maximise detection speed. Don't test windows that are smaller or larger than the size range you expect given your setup. So you should specify those in your call.
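For illustration only (a sketch with made-up size values that you would tune for your own camera distance and object sizes), the call with the parameters named explicitly could look like this:

faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.3,     # how much the search window grows between scales
    minNeighbors=5,      # raise this to reject detections with few overlapping neighbours
    minSize=(60, 60),    # smallest face you expect, in pixels (assumed value)
    maxSize=(300, 300))  # largest face you expect, in pixels (assumed value)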
To be honest, I don't see Haar cascade classifiers being that good at detecting pens or spoons in unknown orientations (if that is your use case). Pens are long and thin, which is poorly suited to a square detection window. You may have more success with LINEMOD, for example.
In my view these two parameters are significant and make the code very sensitive, since they affect false positives. How do I set them properly to reduce false positives?
As long as your false negative rate and speed are OK, don't play with scaleFactor; instead, work on improving your training data to reduce your high false positive rate. If speed falls to unacceptable levels while doing that (because the cascade grows to include too many classifier stages), revisit scaleFactor.
So I have a temperature box where I am trying to pinpoint the coordinate location of a small triangle on each temperature dial. Here are some examples of the box with slight variations:
I have been able to isolate each dial and get its outline and center. I then have an algorithm that will compute an angle from the center point to an eventually found point on the triangle. However, I have been unable to, so to speak, "find" just the triangle using OpenCV. I've been able to outline it and such, but I cannot figure out how to isolate just its lines. I have tried multiple shape-detection and edge-detection blocks of code but have had no luck, because the triangle is so lightly raised from the actual dial. Even just getting a single point on it would be good enough.
There are several possible approaches you can try in order to find the direction of the dial. In this answer I will try classic contour detection. A well-trained ML model could be much more robust and reliable under different lighting conditions, but it is of course more effort to set up.
Let's say that you have already isolated the dial and know its radius and center. Starting from there, the straightforward approach would be:
Prepare the image for thresholding:
If the image is of low resolution as in our case, scale it up by some reasonable factor
If the image is of high resolution, blur it to reduce noise
Convert it to grayscale
Apply adaptiveThreshold or Canny; in this case the former is used
Only keep areas that are of interest:
In this case only keep the features in a circular range where the triangle is supposed to be
In this case only keep the contour with the largest area
Derive the result:
In this case just get the centroid of the largest contour
Code:
import cv2
import numpy as np
# read image, scale it up by some factor and apply adaptive thresholding
img = cv2.imread("img_red.jpg")
h, w, _ = img.shape
f = 8
img = cv2.resize(img, (w * f, h * f))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 71, 5)
cv2.imwrite("thresh.png", thresh)

# only examine the circular band where the triangle is supposed to be
mask = np.zeros_like(thresh)
cv2.circle(mask, (int(w * f / 2), int(h * f / 2)), int(w * f / 3), 255, int(w * f / 6))
thresh = cv2.bitwise_and(thresh, mask)
cv2.imwrite("thresh_mask.png", thresh)

# get contours, take the contour with the largest area and get its centroid
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    m = max([(c, cv2.contourArea(c)) for c in contours], key=lambda i: i[1])[0]
    M = cv2.moments(m)
    if M['m00'] > 0:
        x = round(M['m10'] / M['m00'])
        y = round(M['m01'] / M['m00'])
        # draw a small red circle at the centroid
        cv2.circle(img, (x, y), 2 * f, (0, 0, 255), f)

cv2.imwrite("out.png", img)
Results:
Recently I used OpenCV for a project on wheel (hub) size recognition, and I have run into these problems:
1. For thresholding the grayscale image, I am not sure whether cv2.adaptiveThreshold() should be used, because in my experiments it can make the boundary of the hub larger and affect the accuracy of detection.
2. When using Canny to find the edges of the image, I don't know how to choose the upper and lower thresholds. Randomly picking numbers to try is a waste of time.
3. The outer circle of the hub and the outline of the inner hole cannot be reliably detected; the detection results are shown in the following figure:
My English is not very good; thank you for reading and answering!
Attached code:
import cv2

def midpoint(ptx, pty):
    return ((ptx[0] + pty[0]) * 0.5, (ptx[1] + pty[1]) * 0.5)

image = cv2.imread('picture.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert to grayscale
blur = cv2.GaussianBlur(gray, (3, 3), 0)  # Gaussian blur
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 29, 10)
kernel_size = (10, 10)
edged = cv2.dilate(thresh.copy(), None, iterations=1)
edged = cv2.erode(edged.copy(), None, iterations=1)
cv2.imshow('orig', blur)
cv2.imshow('edged', edged)
cv2.waitKey(0)
For each pixel in an RGB image, I want to increase its brightness so that the strongest channel of that pixel (R, G, or B) is maximized. For example, if a pixel has an RGB value of (128, 64, 32), and the maximum channel value is 255, then that pixel should be changed to approximately (255, 128, 64). This is sort of a poor man's shadow removal system.
I can do this by iterating over the pixels explicitly, but that is very inefficient. What is the most efficient way to do this using only OpenCV methods? Perhaps it could use YUV space?
(By the way, I am using C# with EmguCV, but a straight Python/OpenCV answer would be fine. EDIT: But I can't use Python libraries)
In Python, OpenCV images are just numpy arrays, so here's a Python/numpy approach:
import numpy as np

# toy sample
np.random.seed(1)
a = np.random.randint(0, 100, (4, 4, 3), dtype=np.uint8)

# per-pixel maximum across channels, expressed as a fraction of 255
maxx = np.max(a, axis=-1) / 255

# divide a by maxx so that the strongest channel of each pixel becomes 255
a = (a / maxx[:, :, None]).astype(np.uint8)
Input (with plt.imshow() so in rgb):
Output:
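As a small follow-up sketch (my own addition, not part of the answer above), the same idea applied to an image loaded with cv2, with a guard against all-black pixels so the division never hits zero; the file names are just placeholders:

import cv2
import numpy as np

img = cv2.imread("input.jpg")             # placeholder file name
maxx = img.max(axis=-1).astype(np.float64)
maxx[maxx == 0] = 255                     # leave pure-black pixels unchanged instead of dividing by zero
out = np.clip(img * (255.0 / maxx)[:, :, None], 0, 255).astype(np.uint8)
cv2.imwrite("brightened.jpg", out)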
I've gotten access to a lot of reports which are filled out by hand. One of the columns in the report contains a timestamp, which I would like to attempt to identify without going through each report manually.
I am playing with the idea of splitting the times, e.g. 00:30, into four digits, and running these through a classifier trained on MNIST to identify the actual timestamps.
When I manually extract the four digits in Photoshop and run them through an MNIST classifier, it works perfectly. But so far I haven't been able to figure out how to programmatically split the number sequences into single digits. I tried different types of contour finding in OpenCV, but it didn't work very reliably.
Any suggestions?
I've added a screenshot of some of the relevant columns in the reports.
I would do something like this (no code for now, since it is just an idea; you could test it to see if it works):
Extract each area for each group of numbers, as Rick M. suggested above. You will then have many Kl [hour] rectangles as images.
For each of these rectangles, extract each ROI (using OpenCV's contour features). Delete the Kl if you don't need it (you know the dimensions of this ROI - you can calculate them with img.shape - and they are all roughly the same size).
Extract all digits using the same script as above. You can take a look at my questions/answers to find some pieces of code which do this.
You will have a problem with the underline in some cases. Search for this on SO; there are a few solutions complete with code.
Now, about splitting up. We know the ROIs are in hour format, so hh:mm (or 4 digits). A simple (and very rudimentary) solution for splitting characters that are attached to each other is to split the two-digit ROI in half (see the small sketch below). It's a crude solution, but it should perform well in your case because only two digits are ever attached.
Some digits will come out with "missing pieces". This can be avoided by using some erosion/dilation/skeletonization.
Here you don't have letters, only numbers, so MNIST should work well (not perfectly, keep that in mind).
In short, extracting the data is not the hard task; recognizing the digits is what will make you sweat a bit.
I hope I can provide some code to show the steps above as soon as possible.
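In the meantime, here is a minimal sketch of the split-in-half step mentioned above (my own illustration; two_digit_roi is a hypothetical grayscale crop known to contain exactly two attached digits):

def split_in_half(two_digit_roi):
    """Crudely split an ROI known to contain exactly two attached digits."""
    h, w = two_digit_roi.shape[:2]
    return two_digit_roi[:, :w // 2], two_digit_roi[:, w // 2:]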
EDIT - code
This is some code I made. The final output is this:
The code works 100% with this image, so if something doesn't work for you, check your folders/paths/module installation.
Hope this helped.
import cv2
import numpy as np
# 1 - remove the vertical line on the left
img = cv2.imread('image.jpg', 0)
# gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(img, 100, 150, apertureSize=5)
lines = cv2.HoughLines(edges, 1, np.pi / 50, 50)
for rho, theta in lines[0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a * rho
    y0 = b * rho
    x1 = int(x0 + 1000 * (-b))
    y1 = int(y0 + 1000 * (a))
    x2 = int(x0 - 1000 * (-b))
    y2 = int(y0 - 1000 * (a))
    cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 10)
cv2.imshow('marked', img)
cv2.waitKey(0)
cv2.imwrite('image.png', img)
# 2 - remove horizontal lines
img = cv2.imread("image.png")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_orig = cv2.imread("image.png")
img = cv2.bitwise_not(img)
th2 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2)
cv2.imshow("th2", th2)
cv2.waitKey(0)
cv2.destroyAllWindows()
horizontal = th2
rows, cols = horizontal.shape
# inverse the image, so that lines are black for masking
horizontal_inv = cv2.bitwise_not(horizontal)
# perform bitwise_and to mask the lines with provided mask
masked_img = cv2.bitwise_and(img, img, mask=horizontal_inv)
# reverse the image back to normal
masked_img_inv = cv2.bitwise_not(masked_img)
cv2.imshow("masked img", masked_img_inv)
cv2.waitKey(0)
cv2.destroyAllWindows()
horizontalsize = int(cols / 30)
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontalsize, 1))
horizontal = cv2.erode(horizontal, horizontalStructure, anchor=(-1, -1))
horizontal = cv2.dilate(horizontal, horizontalStructure, anchor=(-1, -1))
cv2.imshow("horizontal", horizontal)
cv2.waitKey(0)
cv2.destroyAllWindows()
# step1
edges = cv2.adaptiveThreshold(horizontal, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 3, -2)
cv2.imshow("edges", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
# step2
kernel = np.ones((1, 2), dtype="uint8")
dilated = cv2.dilate(edges, kernel)
cv2.imshow("dilated", dilated)
cv2.waitKey(0)
cv2.destroyAllWindows()
im2, ctrs, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])
for i, ctr in enumerate(sorted_ctrs):
    # Get bounding box
    x, y, w, h = cv2.boundingRect(ctr)
    # Getting ROI
    roi = img[y:y + h, x:x + w]
    # show ROI
    rect = cv2.rectangle(img_orig, (x, y), (x + w, y + h), (255, 255, 255), -1)

cv2.imshow('areas', rect)
cv2.waitKey(0)
cv2.imwrite('no_lines.png', rect)
# 3 - detect and extract ROI's
image = cv2.imread('no_lines.png')
cv2.imshow('i', image)
cv2.waitKey(0)
# grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', gray)
cv2.waitKey(0)
# binary
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('thresh', thresh)
cv2.waitKey(0)
# dilation
kernel = np.ones((8, 45), np.uint8) # values set for this image only - need to change for different images
img_dilation = cv2.dilate(thresh, kernel, iterations=1)
cv2.imshow('dilated', img_dilation)
cv2.waitKey(0)
# find contours
im2, ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])
for i, ctr in enumerate(sorted_ctrs):
    # Get bounding box
    x, y, w, h = cv2.boundingRect(ctr)
    # Getting ROI
    roi = image[y:y + h, x:x + w]
    # show ROI
    # cv2.imshow('segment no:' + str(i), roi)
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), 1)
    # cv2.waitKey(0)
    # save only the ROIs which contain valid information
    if h > 20 and w > 75:
        cv2.imwrite('roi\\{}.png'.format(i), roi)

cv2.imshow('marked areas', image)
cv2.waitKey(0)
These are the next steps:
Understand what I wrote ;) - it's the most important step.
Using pieces of the code above (especially step 3), you can delete the remaining Kl in the extracted images.
Create a folder for each image and extract the digits.
Using MNIST, recognize each digit.
Breaking up text into individual characters is not as easy as it sounds at first. You can try to find some rules and manipulate the image accordingly, but there will be just too many exceptions. For example, you can try to find disjoint marks, but the fourth one in your image, 0715, has its "5" broken up into three pieces, and the 9th one, 17.00, has the two zeros overlapping.
You are very lucky with the horizontal lines - at least it's easy to separate different entries. But you have to come up with a lot of ideas related to semi-fixed character width, a "soft" disjointness rule, etc.
I did a project like that two years ago and we ended up using an external open source library called Tesseract. Here's an article on Roman numeral recognition with it, reaching about 90% accuracy. You might also want to look into the Lipi Toolkit, but I have no experience with that.
You might also want to consider just training a network to recognize the four digits at once. So the input would be the whole field with the four handwritten digits, and the output would be the four numbers. Let the network sort out where the characters are. If you have enough training data, that's probably the easiest approach.
EDIT:
Inspired by @Link's answer, I just came up with this idea; you can give it a try. Once you have extracted the area between the two lines, trim the image to get rid of the white space all around. Then make an educated guess about how big the characters are - maybe use the height of the area? Then create a sliding window over the image and run the recognition all the way across. There will most likely be four peaks, which would correspond to the four digits.
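A minimal sketch of that sliding-window idea, just to make it concrete (my own illustration; classify_digit stands in for whatever MNIST-style classifier you already have and is not from the original answer):

import numpy as np

def sliding_window_scores(field, classify_digit, step=4):
    """Slide a square window across the trimmed field and record the classifier's
    confidence at each horizontal position; peaks hint at the four digit locations."""
    h, w = field.shape[:2]
    win = h  # rough guess: a digit is about as wide as the field is tall
    scores = []
    for x in range(0, max(1, w - win + 1), step):
        patch = field[:, x:x + win]
        digit, confidence = classify_digit(patch)  # hypothetical classifier
        scores.append((x, digit, confidence))
    return scores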
I'm trying to find coins in different images and mark their locations. The coins are always perfect circles (not ellipses), but they can touch or even overlap. Here are some example images, as well as the results of my attempts (a Python script using skimage, and its outputs), but it doesn't seem to perform well.
The script:
import cv2
import numpy as np
from skimage import io, feature, color
from skimage.color.adapt_rgb import adapt_rgb, each_channel

def edges(img, t):
    @adapt_rgb(each_channel)
    def filter_rgb(image):
        sigma = 1
        return feature.canny(image, sigma=sigma, low_threshold=t/sigma/2, high_threshold=t/sigma)
    edges = color.rgb2hsv(filter_rgb(img))
    edges = edges[..., 2]
    return edges

images = io.ImageCollection('*.bmp', conserve_memory=True)
for i, im in enumerate(images):
    es = edges(im, t=220)
    output = im.copy()
    circles = cv2.HoughCircles((es * 255).astype(np.uint8), cv2.cv.CV_HOUGH_GRADIENT, dp=1, minDist=50,
                               param2=50, minRadius=0, maxRadius=0)
    if circles is not None:
        circles = np.round(circles[0, :]).astype("int")
        for (x, y, r) in circles:
            cv2.circle(output, (x, y), r, (0, 255, 0), 4)
            cv2.rectangle(output, (x - 5, y - 5), (x + 5, y + 5), (0, 128, 255), -1)
    # now es is the edge image
    # and output is the image with the detected circles marked
A couple of example images, with detected edges and circles:
I am using Canny edge detection and the Hough transform, which is the most common way to detect circles. However, with the same parameters it finds almost nothing in some photos and finds way too many circles in others.
Can you give me any pointers and suggestions on how to do this better?
I ended up using dlib's object detector, and it performed very well. The detector can easily be applied to detect any kind of object. For some related discussion, see the question topic on reddit.
Hmmm, I would apply some morphological operations to the Canny results, such as closing and opening:
http://en.wikipedia.org/wiki/Mathematical_morphology
I would also recommend taking a look at the watershed scheme, applied directly to the image gradient, followed by a Hough transform on the result.
http://en.wikipedia.org/wiki/Watershed_%28image_processing%29
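As a minimal sketch of that morphology suggestion (my own illustration, not from the original answer), closing followed by opening on the Canny output before handing it to the Hough transform; the file name, thresholds, and kernel size are assumptions to tune:

import cv2

# placeholder input: a grayscale coin image run through Canny
edges = cv2.Canny(cv2.imread("coins.bmp", cv2.IMREAD_GRAYSCALE), 100, 200)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
# closing bridges small gaps in the circle outlines
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
# opening removes small isolated specks of noise
cleaned = cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)

# then run the Hough transform on the cleaned edge map (OpenCV 3+ constant name)
circles = cv2.HoughCircles(cleaned, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                           param2=50, minRadius=0, maxRadius=0)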