From pictures of tools on a sheet of paper, I'm asked to find their outline contour to vectorize them.
I'm a total beginner in computer-vision-related problems and the only thing I thought about was OpenCV and edge detection.
The result is better than I had imagined, but it is still very unreliable, especially if the source picture isn't "perfect".
I took 2 photographs of a wrench they gave me.
After playing around with opencv bindings for node, I get this:
Then I tried with the worse picture:
That's totally unusable.
I can get something a little better by changing the Canny threshold, but that must be automated (given that the picture is reasonably good).
So I've got a few questions:
Am I taking the right approach? Is GrabCut better for this? A combination of Grabcut and Canny edge detection? I still need vertices at the end, but I feel that GrabCut does what I want too.
The borders are rough and contain some errors. I can increase approxPolyDP's multiplier, but not without a loss of precision on the good parts.
Related to the above point, I'm thinking of using the Savitzky-Golay algorithm to smooth the outline, instead of polygon simplification with approxPolyDP. Is it a good idea?
Normally, the outer border must form a simple, cuttable block. Is there a way in OpenCV to prevent that line from doing impossible things, like crossing itself? Or, simply, to detect the problem? Such configurations are, of course, impossible, but they happen when the detection fails (like in the second pic).
I'm looking for a way to compute the Canny threshold automatically, since I must tweak it manually for each image. Do you have a good example of that?
I noticed that converting the image to grayscale before edge detection sometimes deteriorates the result, and sometimes makes it better. Which one should I choose? (tools can be of any color btw!)
here is the source for my tests:
const cv = require('opencv');
const lowThresh = 90;
const highThresh = 90;
const nIters = 1;
const GRAY = [120, 120, 120];
const WHITE = [255, 255, 255];
cv.readImage('./files/viv1.jpg', function(err, im) {
  if (err) throw err;
  const width = im.width();
  const height = im.height();
  if (width < 1 || height < 1) throw new Error('Image has no size');
  const out = new cv.Matrix(height, width);
  const im_canny = im.copy();
  im_canny.canny(lowThresh, highThresh);
  const contours = im_canny.findContours();
  // Draw every contour in gray and remember the one with the largest area.
  let maxArea = 0;
  let biggestContour;
  for (let i = 0; i < contours.size(); i++) {
    const area = contours.area(i);
    if (area > maxArea) {
      maxArea = area;
      biggestContour = i;
    }
    out.drawContour(contours, i, GRAY);
  }
  // Simplify the biggest contour and draw it in white.
  const arcLength = contours.arcLength(biggestContour, true);
  contours.approxPolyDP(biggestContour, 0.001 * arcLength, true);
  out.drawContour(contours, biggestContour, WHITE, 5);
  out.save('./tmp/out.png');
  console.log('Image saved to ./tmp/out.png');
});
You'll need to add some pre-processing to clean up the image. Shadows, poor lighting, and high shine on the tools cause large variations in intensity across the image, so you should equalize it. This will help you get a better response in the regions that are currently poorly lit or have high shine.
Here's an opencv tutorial on Histogram equalization in C++:
Hope this helps
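To make the idea concrete, here is a minimal sketch of histogram equalization in plain Python. It assumes an 8-bit grayscale image stored as a list of rows; the function name `equalize_hist` and the toy image are made up for illustration. In practice you would call OpenCV's `cv2.equalizeHist` rather than write this yourself.

```python
def equalize_hist(gray):
    """Equalize an 8-bit grayscale image given as a list of rows of ints (0-255)."""
    flat = [p for row in gray for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return [row[:] for row in gray]

    # Remap each level so that the used levels spread over the full 0..255 range.
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


# Toy low-contrast image (hypothetical values): all pixels sit between 100 and 103.
dim = [[100, 100, 101], [101, 102, 103]]
print(equalize_hist(dim))
```

After equalization the darkest pixel maps to 0 and the brightest to 255, which is why poorly lit regions tend to give a stronger edge response afterwards.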
You can choose the threshold automatically based on some kind of loss function. For example, if you know that the tool will be completely captured in the frame, you know that you should get a high value at every column from, say, x = 10 to x = 800. You could then keep reducing the threshold until you do get a high value at every one of those columns. This is a very naive way of doing it, but it's an interesting experiment, I think, since you are generating the images yourself and have control over object placement.
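A rough sketch of that search, in plain Python. It assumes you already have an edge-strength map (for example a gradient magnitude image) as a list of rows; the names `columns_covered` and `find_threshold`, the step size, and the toy data are all hypothetical.

```python
def columns_covered(strength, threshold, x_start, x_end):
    """True if every column in [x_start, x_end) has at least one pixel >= threshold."""
    return all(
        any(row[x] >= threshold for row in strength)
        for x in range(x_start, x_end)
    )


def find_threshold(strength, x_start, x_end, start=255, step=5, floor=5):
    """Lower the threshold until the object spans every expected column."""
    threshold = start
    while threshold >= floor:
        if columns_covered(strength, threshold, x_start, x_end):
            return threshold
        threshold -= step
    return None  # nothing in the searched range satisfied the constraint


# Toy edge-strength map: the object covers columns 1..3, with a weak edge in column 2.
strength = [
    [0, 200, 60, 180, 0],
    [0, 190, 40, 170, 0],
]
print(find_threshold(strength, 1, 4))  # prints 60
```

The weakest column decides the result, so a single shadowed stretch of the outline pulls the threshold down for the whole image. That is the main limitation of this naive version.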
You might also try running your images through an adaptive threshold first. This type of binarization is fairly adept at segmenting foreground and background in cases like this, even with inconsistent lighting/shadows (which seems to be the issue in your second example above). Adaptive thresholding will require some parameter fine-tuning, but once the entire tool is segmented from the background, Canny edge detection should produce more consistent results.
As for the roughness in your contours, you could try setting your findContours mode to one of the CV_CHAIN_APPROX methods described here.
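For the adaptive threshold suggested above, here is a minimal plain-Python sketch of the mean-based variant: each pixel is compared with the mean of its own neighbourhood instead of one global value. The function name, the block size, and the offset are assumptions for illustration. It is roughly what `cv2.adaptiveThreshold` computes with `ADAPTIVE_THRESH_MEAN_C` and `THRESH_BINARY_INV`, apart from border handling.

```python
def adaptive_threshold(gray, block_size=3, offset=10):
    """Mark a pixel as foreground (255) when it is darker than its local mean minus offset."""
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image border.
            window = [
                gray[yy][xx]
                for yy in range(max(0, y - half), min(height, y + half + 1))
                for xx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(window) / len(window)
            if gray[y][x] < local_mean - offset:
                out[y][x] = 255
    return out


# Toy image: brightness ramps up from left to right (uneven lighting),
# with two dark object pixels at (row 1, col 2) and (row 1, col 4).
gray = [
    [100, 110, 120, 130, 140, 150],
    [100, 110, 60, 130, 90, 150],
    [100, 110, 120, 130, 140, 150],
]
for row in adaptive_threshold(gray):
    print(row)
```

In this toy image, a global threshold low enough to catch the pixel with value 90 would also catch the dimly lit left columns; the local comparison picks out only the two object pixels.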
Context: I have two cameras of very different focus and size that I want to align for image processing. One is RGB, one is near-infrared. The cameras are in a static rig, so fixed relative to each other. Because the image focus/width are so different, it's hard to even get both images to recognize the chessboard at the same time. Pretty much only works when the chessboard is centered in both images with very little skew/tilt.
I need to perform computations on the aligned images, so I need as good of a mapping between the optical frames as I can get. Right now the results I'm getting are pretty far off. I'm not sure if I'm using the method itself wrong, or if I am misusing the output. Details and image below.
Computation: I am using OpenCV stereoCalibrate to estimate the rotation and translation matrices with the following code, and throwing out bad results based on final error.
int flag = cv::CALIB_FIX_INTRINSIC;
double err = cv::stereoCalibrate(
    temp_points_object_vec, temp_points_alignvec, temp_points_basevec,
    camera_mat_align, camera_distort_align,
    camera_mat_base, camera_distort_base,
    mat_align.size(), rotate_mat, translate_mat, essential_mat, F, flag,
    cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 30, 1e-6));
if (last_error_ == -1.0 || (err < last_error_ + improve_threshold_)) {
// -1.0 indicate first calibration, accept points. Other cond indicates acceptable error.
The result doesn't produce an OpenCV error as is, and due to the large difference between images, more than half of the matched points are rejected. Results are much better since I added the conditional on the error, but still pretty poor. Error as computed above starts around 30, but doesn't get lower than 15-17. For comparison, I believe a "good" error would be <1. So for starters, I know the output isn't great, but on top of that, I'm not sure I'm using the output right for validating visually. I've attached images showing some of the best and worst results I see. The middle image on the right of each shows the "cross-validated" chessboard keypoints. These are computed like this (note addalign is the temporary vector containing only the chessboard keypoints from the current image in the frame to be aligned):
for (int i = 0; i < addalign.size(); i++) {
    cv::Point2f validate_pt;// = rotate_mat * + translate_mat;
    // Project pixel from image aligned to 3D
    cv::Point3f ray3d = align_camera_model_.projectPixelTo3dRay(addalign.at(i));
    // Rotate and translate
    rotate_mat.convertTo(rotate_mat, CV_32F);
    cv::Mat temp_result = rotate_mat * cv::Mat(ray3d, false);
    cv::Point3f ray_transformed;
    temp_result.copyTo(cv::Mat(ray_transformed, false));
    cv::Mat tmat = cv::Mat(translate_mat, false);
    ray_transformed.x += tmat.at<float>(0);
    ray_transformed.y += tmat.at<float>(1);
    ray_transformed.z += tmat.at<float>(2);
    // Reproject to base image pixel
    cv::Point2f pixel = base_camera_model_.project3dToPixel(ray_transformed);
}
Here are two images showing sample outputs, including both raw images, both images with "drawChessboard," and a cross-validated image showing the base image with above-computed keypoints translated from the alignment image.
Better result
Worse result
In the computation of corners_validated, I'm not sure I'm using rotate_mat and translate_mat correctly. I'm sure there is probably an OpenCV method that does this more efficiently, but I just did it the way that made sense to me at the time.
Also relevant: This is all inside a ROS package, using ROS noetic on Ubuntu 20.04 which only permits the use of OpenCV 4.2, so I don't have access to some of the newer opencv methods.
The image below has many circles. Click and zoom in to see the circles.
I want to count the circles using any free language, such as Python.
Is there a function or idea to do it?
Edit: I came up with a better solution, partially inspired by this answer below. I thought of this method originally (as noted in the OP comments) but I decided against it. The original image was just not good enough quality for it. However I improved that method and it works brilliantly for the better quality image. The original approach is first, and then the new approach at the bottom.
First approach
So here's a general approach that seems to work well, but definitely just gives estimates. This assumes that circles are roughly the same size.
First, the image is mostly blue---so it seems reasonable to just do the analysis on the blue channel. Thresholding the blue channel, in this case, using Otsu thresholding (which determines an optimal threshold value without input) seems to work very well. This isn't too much of a surprise since the distribution of color values is pretty much binary. Check the mask that results from it!
Then, do a connected component analysis on the mask to get the area of each component (component = white blob in the mask). The statistics returned from connectedComponentsWithStats() give (among other things) the area, which is exactly what we need. Then we can simply count the circles by estimating how many circles fit in a given component based on its area. Also note that I'm taking the statistics for every label except the first one: this is the background label 0, and not any of the white blobs.
Now, how large in area is a single circle? It would be best to let the data tell us. So you could compute a histogram of all the areas, and since there are more single circles than anything else, there will be a high concentration around 250-270 pixels or so for the area. Or you could just take an average of all the areas between something like 50 and 350 which should also get you in a similar ballpark.
Really in this histogram you can see the demarcations between single circles, double circles, triple, and so on quite easily. Only the larger components will give pretty rough estimates. And in fact, the area doesn't seem to scale exactly linearly. Blobs of two circles are slightly larger than two single circles, and blobs of three are larger still than three single circles, and so on, so this makes it a little difficult to estimate nicely, but rounding should still keep us close. If you want, you could include a small multiplication parameter that increases as the area increases to account for that, but that would be hard to quantify without going through the histogram, so I didn't worry about this.
A single circle area divided by the average single circle area should be close to 1. And the area of a 5-circle group divided by the average circle area should be close to 5. This also means that small insignificant components, that are 1 or 10 or even 100 pixels in area, will not count towards the total: for those, area / avg_circle_size < 1/2, so they round down to a count of 0. Thus I should just be able to take all the component areas, divide them by the average circle size, round, and get to a decent estimate by summing them all up.
import cv2
import numpy as np
img = cv2.imread('circles.png')
mask = cv2.threshold(img[:, :, 0], 255, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
stats = cv2.connectedComponentsWithStats(mask, 8)[2]
label_area = stats[1:, cv2.CC_STAT_AREA]
min_area, max_area = 50, 350 # min/max for a single circle
singular_mask = (min_area < label_area) & (label_area <= max_area)
circle_area = np.mean(label_area[singular_mask])
n_circles = int(np.sum(np.round(label_area / circle_area)))
print('Total circles:', n_circles)
This code is simple and effective for rough counts.
However, there are definitely some assumptions here about the groups of circles compared to a normal circle size, and there are issues where circles that are at the boundaries will not be counted correctly (these aren't well defined---a two circle blob that is half cut off will look more like one circle---no clear way to count or not count these with this method). Further I just used automatic thresholding via Otsu here; you could get (probably better) results with more careful color filtering. Additionally in the mask generated by Otsu, some circles that are masked have a few pixels removed from their center. Morphology could add these pixels back in, which would give you a (slightly larger) more accurate area for the single circle components. Either way, I just wanted to give the general idea towards how you could easily estimate this with minimal code.
New approach
Before, the goal was to count circles. This new approach instead counts the centers of the circles. The general idea is you threshold and then flood fill from a background pixel to fill in the background (flood fill works like the paint bucket tool in photo editing apps), that way you only see the centers, as shown in this answer below.
However, this relies on global thresholding, which isn't robust to local lighting changes. This means that since some centers are brighter/darker than others, you won't always get good results with a single threshold.
Here I've created an animation to show looping through different threshold values; watch as some centers appear and disappear at different times, meaning you get different counts depending on the threshold you choose (this is just a small patch of the image, it happens everywhere):
Notice that the first blob to appear in the top left actually disappears as the threshold increases. However, if we actually OR each frame together, then each detected pixel persists:
But now every single speck appears, so we should clean up the mask each frame so that we remove single pixels as they come (otherwise they may build up and be hard to remove later). Simple morphological opening with a small kernel will remove them:
Applied over the whole image, this method works incredibly well and finds almost every single cell. There are only three false positives (detected blob that's not a center) and two misses I can spot, and the code is very simple. The final thing to do after the mask has been created is simply count the components, minus one for the background. The only user input required here is a single point to flood fill from that is in the background (seed_pt in the code).
img = cv2.imread('circles.png', 0)
seed_pt = (25, 25)
fill_color = 0
mask = np.zeros_like(img)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
for th in range(60, 120):
prev_mask = mask.copy()
mask = cv2.threshold(img, th, 255, cv2.THRESH_BINARY)[1]
mask = cv2.floodFill(mask, None, seed_pt, fill_color)[1]
mask = cv2.bitwise_or(mask, prev_mask)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
n_centers = cv2.connectedComponents(mask)[0] - 1
print('There are %d cells in the image.'%n_centers)
There are 874 cells in the image.
One possible solution would be to read the image using OpenCV, convert it to grayscale, then use Canny edge detection and perform contour finding in OpenCV. This will return a list of contours. It would look something like:
import cv2
image = cv2.imread('path-to-your-image')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# tweak the parameters of the GaussianBlur for best performance
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
# again, try different values here
edged = cv2.Canny(blurred, 20, 140)
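# findContours returns (image, contours, hierarchy) in OpenCV 3.x but
# (contours, hierarchy) in 2.x and 4.x; [-2] picks the contour list in all of them
contours = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]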
If all your images look like this, consider thresholding them, not necessarily with an automatic threshold-seeking algorithm like Otsu, but with a simple fixed threshold value. Before thresholding you have to convert your color input to grayscale, or take one of the color channels. Then, based on a few experiments with channels and threshold values, pick a threshold that leaves circles with holes in the binary result. Based on your PNG image I found that a value of 81 (gray intensity ranges from 0 to 255) works well for thresholding the grayscale version of your input so that the holes stay in place, as described above.
Then simply count those holes.
Holes can be determined by seed-filling the white area connected to the image border. As a result you will have white hole connected components on a black background, so simply count them.
You can find more details here, and use leptonica primitives to do the thresholding, hole counting and so on.
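The hole-counting steps above can be sketched in plain Python as follows. This is only an illustration, not leptonica code: it assumes a grayscale image stored as a list of rows, 4-connectivity, and a toy image with made-up gray values; `flood_fill` and `count_holes` are hypothetical helper names.

```python
from collections import deque


def flood_fill(binary, start, new_value):
    """Replace the 4-connected region containing start with new_value (in place)."""
    height, width = len(binary), len(binary[0])
    old_value = binary[start[0]][start[1]]
    if old_value == new_value:
        return
    queue = deque([start])
    binary[start[0]][start[1]] = new_value
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and binary[ny][nx] == old_value:
                binary[ny][nx] = new_value
                queue.append((ny, nx))


def count_holes(gray, threshold):
    """Threshold, erase the white background touching the border, count the white blobs left."""
    binary = [[1 if p > threshold else 0 for p in row] for row in gray]
    height, width = len(binary), len(binary[0])

    # Erase every white region connected to the image border.
    border = [(y, x) for y in range(height) for x in (0, width - 1)]
    border += [(y, x) for x in range(width) for y in (0, height - 1)]
    for y, x in border:
        if binary[y][x] == 1:
            flood_fill(binary, (y, x), 0)

    # Each remaining white component is one hole, i.e. one circle centre.
    holes = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 1:
                holes += 1
                flood_fill(binary, (y, x), 0)
    return holes


# Toy image: bright background (200), two dark rings (30) with bright centres (180).
B, R, C = 200, 30, 180
image = [
    [B, B, B, B, B, B, B, B, B],
    [B, R, R, R, B, R, R, R, B],
    [B, R, C, R, B, R, C, R, B],
    [B, R, R, R, B, R, R, R, B],
    [B, B, B, B, B, B, B, B, B],
]
print(count_holes(image, 81))  # prints 2
```

Touching circles still count separately with this approach, because each one keeps its own enclosed centre even when the rings merge.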
This is my first post, so forgive me if I miss something.
I have been playing around with OpenCV2 with Visual Studio C++. I have a basic object tracker working: it applies a Gaussian blur, converts to HSV, thresholds with trackbars, then erodes and dilates. Now I want to set up some way of easily calibrating the color to be thresholded without using the trackbars.
I've tried setting up an area of interest and taking the average BGR or HSV values (I've tried both ways). Then if needed use trackbars to make finer adjustments, but it does not seem to work. Am I on the right track, or is there a better way?
I have basically followed this video to get where I am.
I am not looking for a code to copy and paste. I am just looking for an Algorithm or explanation of a way to do it. Cheers
Sorry, I'll try to clear it up. What I have done is written an object tracking program for a home robot vision project. I just want to make it easier to calibrate what color is to be thresholded. At the moment I use trackbars to set the min and max HSV values for thresholding. Then I use erode and dilate to clean up the binary image, before using cv::findContours and cv::moments to find the centroid of the largest contour.
What I have tried is setting a small 40x40 pixel square in the center of the screen. When, for example, I hold a green ball in this square and hit spacebar, I cycle through each pixel in the square and get each separate Hue, Saturation and Value um...value. Then I take the mode of each and use that to set the min and max threshold values.
Here is a segment of the code
if (cv::waitKey(20) == 32) { // wait for spacebar
    int count = 0;
    cv::Mat roi_Crop = frame_HSV(roi); // create cropped image from frame_HSV
    for (int i = 0; i < roi_Crop.rows; i++) { // cycle through each pixel
        for (int j = 0; j < roi_Crop.cols; j++) {
            Hue[count] = roi_Crop.at<cv::Vec3b>(i, j)[0];
            Sat[count] = roi_Crop.at<cv::Vec3b>(i, j)[1];
            Val[count] = roi_Crop.at<cv::Vec3b>(i, j)[2];
            count++; // move on to the next slot in the sample arrays
        }
    }
    HSV_Mode[0] = findMode(Hue);
    HSV_Mode[1] = findMode(Sat);
    HSV_Mode[2] = findMode(Val);
}
I hope this helps.
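To make the calibration idea above explicit, here is the same logic as a small plain-Python sketch: take the per-channel mode of the sampled pixels and widen it by a tolerance to get the min and max bounds. The tolerance values and the names `mode` and `hsv_range` are assumptions, and the sketch ignores hue wrap-around (reds near 0/179).

```python
from collections import Counter


def mode(values):
    """Most frequent value in a sequence."""
    return Counter(values).most_common(1)[0][0]


def hsv_range(pixels, tolerance=(10, 60, 60)):
    """Build (lower, upper) HSV bounds around the per-channel mode of the sampled pixels.

    pixels: list of (h, s, v) tuples in OpenCV's 8-bit ranges (H 0-179, S and V 0-255).
    tolerance: half-width of the accepted band per channel (assumed values, tune per setup).
    """
    limits = (179, 255, 255)
    lower, upper = [], []
    for channel, (tol, limit) in enumerate(zip(tolerance, limits)):
        centre = mode([p[channel] for p in pixels])
        lower.append(max(0, centre - tol))
        upper.append(min(limit, centre + tol))
    return tuple(lower), tuple(upper)


# Hypothetical samples from the calibration square while holding a green ball.
samples = [(60, 200, 180), (61, 210, 180), (60, 200, 190), (60, 205, 180)]
print(hsv_range(samples))
```

The two tuples would then play the role of the min and max values currently set with the trackbars.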
How can I threshold this blurry image to make the digits as clear as possible?
In a previous post, I tried adaptively thresholding a blurry image (left), which resulted in distorted and disconnected digits (right):
Since then, I've tried using a morphological closing operation as described in this post to make the brightness of the image uniform:
If I adaptively threshold this image, I don't get significantly better results. However, because the brightness is approximately uniform, I can now use an ordinary threshold:
This is a lot better than before, but I have two problems:
I had to manually choose the threshold value. Although the closing operation results in uniform brightness, the level of brightness might be different for other images.
Different parts of the image would do better with slight variations in the threshold level. For instance, the 9 and 7 in the top left come out partially faded and should have a lower threshold, while some of the 6s have fused into 8s and should have a higher threshold.
I thought that going back to an adaptive threshold, but with a very large block size (1/9th of the image) would solve both problems. Instead, I end up with a weird "halo effect" where the centre of the image is a lot brighter, but the edges are about the same as the normally-thresholded image:
Edit: remi suggested morphologically opening the thresholded image at the top right of this post. This doesn't work too well. Using elliptical kernels, only a 3x3 is small enough to avoid obliterating the image entirely, and even then there are significant breakages in the digits:
Edit2: mmgp suggested using a Wiener filter to remove blur. I adapted this code for Wiener filtering in OpenCV to OpenCV4Android, but it makes the image even blurrier! Here's the image before (left) and after filtering with my code and a 5x5 kernel:
Here is my adapted code, which filters in-place:
private void wiener(Mat input, int nRows, int nCols) { // I tried nRows=5 and nCols=5
Mat localMean = new Mat(input.rows(), input.cols(), input.type());
Mat temp = new Mat(input.rows(), input.cols(), input.type());
Mat temp2 = new Mat(input.rows(), input.cols(), input.type());
// Create the kernel for convolution: a constant matrix with nRows rows
// and nCols cols, normalized so that the sum of the pixels is 1.
Mat kernel = new Mat(nRows, nCols, CvType.CV_32F, new Scalar(1.0 / (double) (nRows * nCols)));
// Get the local mean of the input. localMean = convolution(input, kernel)
Imgproc.filter2D(input, localMean, -1, kernel, new Point(nCols/2, nRows/2), 0);
// Get the local variance of the input. localVariance = convolution(input^2, kernel) - localMean^2
Core.multiply(input, input, temp); // temp = input^2
Imgproc.filter2D(temp, temp, -1, kernel, new Point(nCols/2, nRows/2), 0); // temp = convolution(input^2, kernel)
Core.multiply(localMean, localMean, temp2); //temp2 = localMean^2
Core.subtract(temp, temp2, temp); // temp = localVariance = convolution(input^2, kernel) - localMean^2
// Estimate the noise as mean(localVariance)
Scalar noise = Core.mean(temp);
// Compute the result. result = localMean + max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
Core.max(temp, noise, temp2); // temp2 = max(localVariance, noise)
Core.subtract(temp, noise, temp); // temp = localVariance - noise
Core.max(temp, new Scalar(0), temp); // temp = max(0, localVariance - noise)
Core.divide(temp, temp2, temp); // temp = max(0, localVar-noise) / max(localVariance, noise)
Core.subtract(input, localMean, input); // input = input - localMean
Core.multiply(temp, input, input); // input = max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
Core.add(input, localMean, input); // input = localMean + max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
}
Some hints that you might try out:
Apply the morphological opening in your original thresholded image (the one which is noisy at the right of the first picture). You should get rid of most of the background noise and be able to reconnect the digits.
Use a different preprocessing of your original image instead of morpho closing, such as median filter (tends to blur the edges) or bilateral filtering which will preserve better the edges but is slower to compute.
As far as the threshold is concerned, you can use the CV_THRESH_OTSU flag (cv::THRESH_OTSU in the C++ API) in cv::threshold to determine an optimal value for a global threshold. Local thresholding might still be better, but should work better with the bilateral or median filter.
I've tried thresholding each 3x3 box separately, using Otsu's algorithm (CV_THRESH_OTSU - thanks remi!) to determine an optimal threshold value for each box. This works a bit better than thresholding the entire image, and is probably a bit more robust.
Better solutions are welcome, though.
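For reference, the per-box Otsu idea can be sketched in plain Python like this. It is an illustration under assumptions (8-bit grayscale image as a list of rows, the image split into an even grid); `otsu_threshold` and `threshold_blocks` are hypothetical names, and with OpenCV you would pass the Otsu flag to the threshold call on each sub-image instead.

```python
def otsu_threshold(values):
    """Return the level t (0-255) maximizing between-class variance; pixels <= t form one class."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def threshold_blocks(gray, blocks_per_side=3):
    """Binarize each block of a blocks_per_side x blocks_per_side grid with its own Otsu threshold."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for by in range(blocks_per_side):
        y0 = by * height // blocks_per_side
        y1 = (by + 1) * height // blocks_per_side
        for bx in range(blocks_per_side):
            x0 = bx * width // blocks_per_side
            x1 = (bx + 1) * width // blocks_per_side
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            if not block:
                continue
            t = otsu_threshold(block)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = 255 if gray[y][x] > t else 0
    return out


# Toy example: the left half is dimly lit, the right half brightly lit.
gray = [[20, 80, 120, 220], [80, 20, 220, 120]]
print(threshold_blocks(gray, blocks_per_side=2))
```

In the toy example no single global threshold separates ink from paper in both halves (the bright half's ink, 120, is lighter than the dim half's paper, 80), but the per-block thresholds do.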
If you're willing to spend some cycles on it there are de-blurring techniques that could be used to sharpen up the picture prior to processing. Nothing in OpenCV yet but if this is a make-or-break kind of thing you could add it.
There's a bunch of literature on the subject:
And some chatter on the OpenCV mailing list:
The weird "halo effect" that you're seeing is likely due to OpenCV assuming black for the color when the adaptive threshold is at/near the edge of the image and the window that it's using "hangs over" the edge into non-image territory. There are ways to correct for this; most likely you would make a temporary image that's at least two full block-sizes taller and wider than the image from the camera. Then copy the camera image into the middle of it. Then set the surrounding "blank" portion of the temp image to be the average color of the image from the camera. Now when you perform the adaptive threshold the data at/near the edges will be much closer to accurate. It won't be perfect since it's not a real picture, but it will yield better results than the black that OpenCV is assuming is there.
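A minimal sketch of that padding step in plain Python, assuming a grayscale image stored as a list of rows; `pad_with_mean` and `crop` are hypothetical helper names. With OpenCV the same border can be produced by `cv2.copyMakeBorder` with `BORDER_CONSTANT` and the mean as the fill value.

```python
def pad_with_mean(gray, pad):
    """Surround a grayscale image (list of rows) with a border filled with its mean value."""
    height, width = len(gray), len(gray[0])
    mean = round(sum(sum(row) for row in gray) / (height * width))
    padded = [[mean] * (width + 2 * pad) for _ in range(height + 2 * pad)]
    for y, row in enumerate(gray):
        padded[y + pad][pad:pad + width] = row
    return padded


def crop(image, pad):
    """Cut the border off again, e.g. after thresholding the padded image."""
    return [row[pad:len(row) - pad] for row in image[pad:len(image) - pad]]


small = [[10, 20], [30, 40]]
padded = pad_with_mean(small, pad=1)
for row in padded:
    print(row)
print(crop(padded, 1) == small)
```

Here `pad` would be one block size on each side, so that the threshold window never reaches past the padded area; the adaptive threshold runs on the padded image and the result is cropped back to the original size.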
My proposal assumes you can identify the sudoku cells, which I think, is not asking too much. Trying to apply morphological operators (although I really like them) and/or binarization methods as a first step is the wrong way here, in my opinion of course. Your image is at least partially blurry, for whatever reason (original camera angle and/or movement, among other reasons). So what you need is to revert that, by performing a deconvolution. Of course asking for a perfect deconvolution is too much, but we can try some things.
One of these "things" is the Wiener filter, and in Matlab, for instance, the function is named deconvwnr. I noticed the blur is in the vertical direction, so we can perform a deconvolution with a vertical kernel of a certain length (10 in the following example) and also assume the input is not noise free (assumption of 5%) -- I'm just trying to give a very superficial view here, take it easy. In Matlab, your problem is at least partially solved by doing:
f = imread('some_sudoku_cell.png');
g = deconvwnr(f, fspecial('motion', 10, 90), 0.05);
h = im2bw(g, graythresh(g)); % graythresh is the Otsu method
Here are the results from some of your cells (original, otsu, otsu of region growing, morphological enhanced image, otsu from morphological enhanced image with region growing, otsu of deconvolution):
The enhanced image was produced by performing original + tophat(original) - bottomhat(original) with a flat disk of radius 3. I manually picked the seed point for region growing and manually picked the best threshold.
For empty cells you get weird results (original and otsu of deconvolution):
But I don't think you would have trouble detecting whether a cell is empty or not (the global threshold already solves it).
Added the best results I could get with a different approach: region growing. I also attempted some other approaches, but this was the second best one.
I'm looking for a way to detect which of two (similar) images is sharper.
I'm thinking this could be using some measure of overall sharpness and generating a score (hypothetical example: image1 has sharpness score of 9, image2 has sharpness score of 7; so image1 is sharper)
I've done some searches for sharpness detection/scoring algorithms, but have only come across ones that will enhance image sharpness.
Has anyone done something like this, or have any useful resources/leads?
I would be using this functionality in the context of a webapp, so PHP or C/C++ is preferred.
As shown, for example, on this Matlab Central page, sharpness can be estimated by the average gradient magnitude.
I used this in Python as
from PIL import Image
import numpy as np
im = Image.open(filename).convert('L') # to grayscale; filename is the path to your image
array = np.asarray(im, dtype=np.int32)
gy, gx = np.gradient(array)
gnorm = np.sqrt(gx**2 + gy**2)
sharpness = np.average(gnorm)
A similar number can be computed with the simpler numpy.diff instead of numpy.gradient. The resulting array sizes need to be adapted there:
dx = np.diff(array)[1:,:] # remove the first row
dy = np.diff(array, axis=0)[:,1:] # remove the first column
dnorm = np.sqrt(dx**2 + dy**2)
sharpness = np.average(dnorm)
The simple method is to measure contrast -- the image with the largest differences between pixel values is the sharpest. You can, for example, compute the variance (or standard deviation) of the pixel values, and whichever produces the larger number wins. That looks for maximum overall contrast, which may not be what you want though -- in particular, it will tend to favor pictures with maximum depth of field.
Depending on what you want, you may prefer to use something like an FFT, to see which displays the highest frequency content. This allows you to favor a picture that's extremely sharp in some parts (but less so in others) over one that has more depth of field, so more of the image is reasonably sharp, but the maximum sharpness is lower (which is common, due to diffraction with smaller apertures).
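The variance-based contrast measure from the first paragraph is short enough to sketch in plain Python. It assumes grayscale images stored as lists of rows; `contrast_score` is a hypothetical name and the two toy images are made up.

```python
import statistics


def contrast_score(gray):
    """Population variance of the pixel values of a grayscale image (list of rows)."""
    flat = [p for row in gray for p in row]
    return statistics.pvariance(flat)


# The same checkerboard pattern, once crisp and once washed out by blur.
sharp = [[0, 255, 0, 255], [255, 0, 255, 0]]
blurred = [[96, 160, 96, 160], [160, 96, 160, 96]]

scores = {"sharp": contrast_score(sharp), "blurred": contrast_score(blurred)}
print(max(scores, key=scores.get))
```

Blur pulls pixel values toward the local average, which shrinks the variance, so the image with the larger score wins. The caveats above about depth of field still apply.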
A simple practical approach would be to use edge detection (more edges == sharper image).
Quick and dirty hands-on using PHP GD:
function getBlurAmount($image) {
    $size = getimagesize($image);
    $image = imagecreatefromjpeg($image);
    imagefilter($image, IMG_FILTER_EDGEDETECT);
    $blur = 0;
    // Sum the blue channel of the edge-detected image over all pixels.
    for ($x = 0; $x < $size[0]; $x++) {
        for ($y = 0; $y < $size[1]; $y++) {
            $blur += imagecolorat($image, $x, $y) & 0xFF;
        }
    }
    return $blur;
}
$e1 = getBlurAmount('');
$e2 = getBlurAmount('');
echo "Relative blur amount: first image " . $e1 / min($e1, $e2) . ", second image " . $e2 / min($e1, $e2);
(image with less blur is sharper)
A more efficient approach would be to detect edges in your own code, using the Sobel operator. PHP example (rewriting in C++ should give a huge performance boost, I guess).
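As a rough illustration of the Sobel idea (in Python rather than PHP or C++), the sketch below averages the Sobel gradient magnitude over the interior pixels. It assumes grayscale images of at least 3x3 pixels stored as lists of rows; `sobel_sharpness` is a hypothetical name.

```python
import math


def sobel_sharpness(gray):
    """Mean Sobel gradient magnitude over the interior pixels of a grayscale image."""
    height, width = len(gray), len(gray[0])
    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal and vertical Sobel responses at (y, x).
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            total += math.hypot(gx, gy)
    return total / ((height - 2) * (width - 2))


# The same vertical edge, once as a hard step and once smeared over several pixels.
hard_edge = [[0, 0, 255, 255]] * 3
soft_edge = [[0, 85, 170, 255]] * 3
print(sobel_sharpness(hard_edge) > sobel_sharpness(soft_edge))
```

Both images should be the same size and show the same scene for the comparison to mean anything, since the score depends on content as well as on focus.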
This paper describes a method for computing a blur factor using DWT. Looked pretty straight forward but instead of detecting sharpness it's detecting blurredness. Seems it detects edges first (simple convolution) and then uses DWT to accumulate and score it.
Check Contrast Transfer Functions (CTF)
Here's an implementation
Here's an explanation