The image below has many circles. Click and zoom in to see the circles.
https://drive.google.com/open?id=1ox3kiRX5hf2tHDptWfgcbMTAHKCDizSI
What I want is to count the circles using any free language, such as Python.
Is there a function or idea to do it?
Edit: I came up with a better solution, partially inspired by this answer below. I thought of this method originally (as noted in the OP comments) but decided against it. The original image just wasn't of good enough quality for it. However, I improved the method and it works brilliantly on the better quality image. The original approach is first, and the new approach is at the bottom.
First approach
So here's a general approach that seems to work well, but definitely just gives estimates. This assumes that circles are roughly the same size.
First, the image is mostly blue---so it seems reasonable to just do the analysis on the blue channel. Thresholding the blue channel, in this case, using Otsu thresholding (which determines an optimal threshold value without input) seems to work very well. This isn't too much of a surprise since the distribution of color values is pretty much binary. Check the mask that results from it!
Then, do a connected component analysis on the mask to get the area of each component (component = white blob in the mask). The statistics returned from connectedComponentsWithStats() give (among other things) the area, which is exactly what we need. Then we can simply count the circles by estimating how many circles fit in a given component based on its area. Also note that I'm taking the statistics for every label except the first one: this is the background label 0, and not any of the white blobs.
Now, how large in area is a single circle? It would be best to let the data tell us. So you could compute a histogram of all the areas, and since there are more single circles than anything else, there will be a high concentration around 250-270 pixels or so for the area. Or you could just take an average of all the areas between something like 50 and 350 which should also get you in a similar ballpark.
Really in this histogram you can see the demarcations between single circles, double circles, triple, and so on quite easily. Only the larger components will give pretty rough estimates. And in fact, the area doesn't seem to scale exactly linearly. Blobs of two circles are slightly larger than two single circles, and blobs of three are larger still than three single circles, and so on, so this makes it a little difficult to estimate nicely, but rounding should still keep us close. If you want you could include a small multiplication parameter that increases as the area increases to account for that, but that would be hard to quantify without going through the histogram analytically...so, I didn't worry about this.
A single circle area divided by the average single circle area should be close to 1. And the area of a 5-circle group divided by the average circle area should be close to 5. And this also means that small insignificant components, that are 1 or 10 or even 100 pixels in area, will not count towards the total since round(50/avg_circle_size) < 1/2, so those will round down to a count of 0. Thus I should just be able to take all the component areas, divide them by the average circle size, round, and get to a decent estimate by summing them all up.
import cv2
import numpy as np
img = cv2.imread('circles.png')
# the threshold value is ignored when THRESH_OTSU is set; Otsu picks it automatically
mask = cv2.threshold(img[:, :, 0], 255, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# index [2] selects the per-label statistics array
stats = cv2.connectedComponentsWithStats(mask, 8)[2]
label_area = stats[1:, cv2.CC_STAT_AREA]
min_area, max_area = 50, 350 # min/max for a single circle
singular_mask = (min_area < label_area) & (label_area <= max_area)
circle_area = np.mean(label_area[singular_mask])
n_circles = int(np.sum(np.round(label_area / circle_area)))
print('Total circles:', n_circles)
This code is simple and effective for rough counts.
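If you want to look at the area histogram mentioned above, a quick matplotlib sketch (reusing label_area from the code above; the bin count is arbitrary):
import matplotlib.pyplot as plt
plt.hist(label_area, bins=100)      # peaks around 250-270 px correspond to single circles
plt.xlabel('component area (px)')
plt.ylabel('count')
plt.show()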
However, there are definitely some assumptions here about the groups of circles compared to a normal circle size, and there are issues where circles that are at the boundaries will not be counted correctly (these aren't well defined---a two circle blob that is half cut off will look more like one circle---no clear way to count or not count these with this method). Further I just used automatic thresholding via Otsu here; you could get (probably better) results with more careful color filtering. Additionally in the mask generated by Otsu, some circles that are masked have a few pixels removed from their center. Morphology could add these pixels back in, which would give you a (slightly larger) more accurate area for the single circle components. Either way, I just wanted to give the general idea towards how you could easily estimate this with minimal code.
New approach
Before, the goal was to count circles. This new approach instead counts the centers of the circles. The general idea is that you threshold and then flood fill from a background pixel to fill in the background (flood fill works like the paint bucket tool in photo editing apps); that way you only see the centers, as shown in the answer below.
However, this relies on global thresholding, which isn't robust to local lighting changes. This means that since some centers are brighter/darker than others, you won't always get good results with a single threshold.
Here I've created an animation to show looping through different threshold values; watch as some centers appear and disappear at different times, meaning you get different counts depending on the threshold you choose (this is just a small patch of the image, it happens everywhere):
Notice that the first blob to appear in the top left actually disappears as the threshold increases. However, if we actually OR each frame together, then each detected pixel persists:
But now every single speck appears, so we should clean up the mask each frame so that we remove single pixels as they come (otherwise they may build up and be hard to remove later). Simple morphological opening with a small kernel will remove them:
Applied over the whole image, this method works incredibly well and finds almost every single cell. There are only three false positives (detected blob that's not a center) and two misses I can spot, and the code is very simple. The final thing to do after the mask has been created is simply count the components, minus one for the background. The only user input required here is a single point to flood fill from that is in the background (seed_pt in the code).
img = cv2.imread('circles.png', 0)
seed_pt = (25, 25)
fill_color = 0
mask = np.zeros_like(img)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
for th in range(60, 120):
    prev_mask = mask.copy()
    mask = cv2.threshold(img, th, 255, cv2.THRESH_BINARY)[1]
    mask = cv2.floodFill(mask, None, seed_pt, fill_color)[1]
    mask = cv2.bitwise_or(mask, prev_mask)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
n_centers = cv2.connectedComponents(mask)[0] - 1
print('There are %d cells in the image.'%n_centers)
There are 874 cells in the image.
One possible solution would be to read the image using OpenCV, convert it to grayscale, then use Canny edge detection and perform contour finding in OpenCV. This will return a list of contours. It would look something like:
import cv2
image = cv2.imread('path-to-your-image')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# tweak the parameters of the GaussianBlur for best performance
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
# again, try different values here
edged = cv2.Canny(blurred, 20, 140)
# OpenCV 3.x returns three values here; in OpenCV 4.x use: contours, _ = cv2.findContours(...)
(_, contours, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))
If all your images are like this one, consider thresholding it, not necessarily with an automatic threshold-seeking algorithm like Otsu, but simply with a fixed threshold value. Before thresholding you have to convert your colour input to grayscale, or take one of the colour channels. Then, based on a few experiments with channels and threshold values, pick a threshold that leaves the circles with holes in the monochrome result. For your PNG image I found a value of 81 (gray intensity ranges from 0 to 255) to work well for thresholding the grayscale version of your input into such a binary image with the holes in place, as described above.
Then simply count those holes.
Holes can be isolated by seed-filling the white area connected to the image border. As a result you will have white hole components on a black background, so simply count them.
You can find more details at http://www.leptonica.com/filling.html and use Leptonica primitives to do the thresholding, hole counting and so on.
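A minimal OpenCV/Python sketch of this idea, as an alternative to the Leptonica primitives (the filename is a placeholder, and it assumes the background thresholds to white and that the top-left pixel is background; swap in THRESH_BINARY_INV otherwise):
import cv2
gray = cv2.imread('circles.png', cv2.IMREAD_GRAYSCALE)   # placeholder filename
# Fixed threshold at 81, as suggested above.
binary = cv2.threshold(gray, 81, 255, cv2.THRESH_BINARY)[1]
# Seed-fill the white area connected to the image border; only the white "holes" remain.
cv2.floodFill(binary, None, (0, 0), 0)
# Count the remaining white components; label 0 is the black background.
n_holes = cv2.connectedComponents(binary)[0] - 1
print(n_holes)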
Related
I have an array of data from a grayscale image, from which I have segmented sets of contiguous points of a certain intensity value.
Currently I am doing a naive bounding box routine where I find the minimum and maximum (x,y) [row, col] points. This obviously does not provide the smallest possible box that contains the set of points which is demonstrable by simply rotating a rectangle so the longest axis is no longer aligned with a principal axis.
What I wish to do is find the minimum sized oriented bounding box. This seems to be possible using an algorithm known as rotating calipers, however the implementations of this algorithm seem to rely on the idea that you have a set of vertices to begin with. Some details on this algorithm: https://www.geometrictools.com/Documentation/MinimumAreaRectangle.pdf
My main issue is in finding the vertices within the data that I currently have. I believe I need to at least find candidate vertices in order to reduce the number of iterations I am performing, since the number of points is relatively large, and treating the interior points as if they were vertices is unnecessary if I can figure out a way to exclude them.
Here is some example data that I am working with:
Here's the segmented scene using the naive algorithm, where it segments out the central objects relatively well due to the objects mostly being aligned with the image axes:
In red, you can see the current bounding boxes that I am drawing utilizing 2 vertices: top-left and bottom-right corners of the groups of points I have found.
The rotation part is where my current approach fails, as I am only defining the bounding box using two points, anything that is rotated and not axis-aligned will occupy much more area than necessary to encapsulate the points.
Here's an example with rotated objects in the scene:
Here's the current naive segmentation's performance on that scene, which is drawing larger than necessary boxes around the rotated objects:
Ideally the result would be bounding boxes aligned with the longest axis of the points that are being segmented, which is what I am having trouble implementing.
Here's an image roughly showing what I am really looking to accomplish:
You can also notice unnecessary segmentation done in the image around the borders as well as some small segments, which should be removed with some further heuristics that I have yet to develop. I would also be open to alternative segmentation algorithm suggestions that provide a more robust detection of the objects I am interested in.
I am not sure if this question will be completely clear, therefore I will try my best to clarify if it is not obvious what I am asking.
It's late, but that might still help. This is what you need to do:
expand pixels to make small segments connect larger bodies
find connected bodies
select a sample of pixels from each body
find the MBR ([oriented] minimum bounding rectangle) for selected set
For the first step you can perform dilation; it's somewhat like DBSCAN clustering. For step 3 you can simply select random pixels from a uniform distribution. Obviously, the more pixels you keep, the more accurate the MBR will be. I tested this in MATLAB:
% import image as a matrix of 0s and 1s
oI = ~im2bw(rgb2gray(imread('vSb2r.png'))); % original image
% expand pixels
dI = imdilate(oI,strel('disk',4)); % dilated
% find connected bodies of pixels
CC = bwconncomp(dI);
L = labelmatrix(CC) .* uint8(oI); % labeled
% mark some random pixels
rI = rand(size(oI))<0.3;
sI = L.* uint8(rI) .* uint8(oI); % sampled
% find MBR for a set of connected pixels
for i = 1:CC.NumObjects
    [Y,X] = find(sI == i);
    mbr(i) = getMBR( X, Y );  % getMBR: user-supplied helper returning the oriented MBR
end
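For comparison, here is a rough Python/OpenCV sketch of the same pipeline, where cv2.minAreaRect plays the role of getMBR. The kernel size and small-segment cutoff are assumptions, and the random-sampling step is skipped since minAreaRect is already fast on all component pixels:
import cv2
import numpy as np
# Binarise and invert, mirroring ~im2bw in the MATLAB snippet (assumes dark objects).
gray = cv2.imread('vSb2r.png', cv2.IMREAD_GRAYSCALE)
binary = (gray < 128).astype(np.uint8)
# Step 1: expand pixels so small segments connect to larger bodies.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
dilated = cv2.dilate(binary, kernel)
# Step 2: label connected bodies on the dilated image, then measure on the original pixels.
n_labels, labels = cv2.connectedComponents(dilated)
for i in range(1, n_labels):
    ys, xs = np.where((labels == i) & (binary > 0))
    if len(xs) < 10:                                  # skip tiny segments
        continue
    pts = np.column_stack((xs, ys)).astype(np.float32)
    rect = cv2.minAreaRect(pts)                       # ((cx, cy), (w, h), angle)
    box = cv2.boxPoints(rect)                         # 4 corner points, e.g. for drawing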
You can also remove some ineffective pixels using some more processing and morphological operations:
remove holes
find boundaries
find skeleton
In MATLAB:
I = imfill(I, 'holes');
I = bwmorph(I,'remove');
I = bwmorph(I,'skel');
I'm trying to show students how the RGB color model works to create a particular color (or moreover to convince them that it really does). So I want to take a picture and convert each pixel to an RGB representation so that when you zoom in, instead of a single colored pixel, you see the RGB colors.
I've done this but for some very obvious reasons the converted picture is either washed out or darker than the original (which is a minor inconvenience but I think it would be more powerful if I could get it to be more like the original).
Here are two pictures "zoomed out":
Here is a "medium zoom", starting to show the RGB artifacts in the converted picture:
And here is a picture zoomed in to the point that you can clearly see individual pixels and the RGB squares:
You'll notice the constant color surrounding the pixels; that is the average RGB of the picture. I put that there so that you could see individual pixels (otherwise you just see rows/columns of shades of red/green/blue). If I take that space out completely, the image is even darker and if I replace it with white, then the image looks faded (when zoomed out).
I know why displaying this way causes it to be darker: a "pure red" will come with a completely black blue and green. In a sense if I were to take a completely red picture, it would essentially be 1/3 the brightness of the original.
So my question is:
1: Are there any tools available that already do this (or something similar)?
2: Any ideas on how to get the converted image closer to the original?
For the 2nd question, I could of course just increase the brightness for each "RGB pixel" (the three horizontal stripes in each square), but by how much? I certainly can't just multiply the RGB ints by 3 (in apparent compensation for what I said above). I wonder if there is some way to adjust the background color to compensate, or would it just have to be fiddled with for each picture?
You were correct to assume you could retain the brightness by multiplying everything by 3. There's just one small problem: the RGB values in an image use gamma correction, so the intensity is not linear. You need to de-gamma the values, multiply, then gamma correct them again.
You also need to lose the borders around each pixel. Those borders take up 7/16 of the final image which is just too much to compensate for. I tried rotating every other pixel by 90 degrees, and while it gives the result a definite zig-zag pattern it does make clear where the pixel boundaries are.
When you zoom out in an image viewer you might see the gamma problem too. Many viewers don't bother to do gamma correction when they resize. For an in-depth explanation see Gamma error in picture scaling, and use the test image supplied at the end. It might be better to forgo scaling altogether and simply step back from the monitor.
Here's some Python code and a crop from the resulting image.
from PIL import Image
im = Image.open(filename)
im2 = Image.new('RGB', (im.size[0]*3, im.size[1]*3))
ld1 = im.load()
ld2 = im2.load()
for y in range(im.size[1]):
    for x in range(im.size[0]):
        rgb = ld1[x,y]
        rgb = [(c/255)**2.2 for c in rgb]
        rgb = [min(1.0,c*3) for c in rgb]
        rgb = tuple(int(255*(c**(1/2.2))) for c in rgb)
        x2 = x*3
        y2 = y*3
        if (x+y) & 1:
            for x3 in range(x2, x2+3):
                ld2[x3,y2] = (rgb[0],0,0)
                ld2[x3,y2+1] = (0,rgb[1],0)
                ld2[x3,y2+2] = (0,0,rgb[2])
        else:
            for y3 in range(y2, y2+3):
                ld2[x2,y3] = (rgb[0],0,0)
                ld2[x2+1,y3] = (0,rgb[1],0)
                ld2[x2+2,y3] = (0,0,rgb[2])
Don't waste too much time on this. You cannot make two images look the same if one of them carries less information. On top of that, your computer will subsample the image in odd ways while zooming out.
Just pass a magnifying glass around the class so they can see for themselves on their phones or other screens, or show pictures of a screen at different magnification levels.
If you want to stick to software, triple the resolution of your image, don't use empty rows and columns or at least make them black to increase contrast and scale the RGB components to full range.
Why don't you keep the magnified image for the background? That would let the two images look identical when zoomed out, while the RGB stripes would remain clearly visible when zoomed in.
If not, use the average color over the whole image to keep a similar intensity, but the washing effect will remain.
An intermediate option is to apply a strong lowpass filter to the image to smooth out the details and use that as the background, but I don't see a real advantage over the first approach.
I am looking for a "very" simple way to check if an image bitmap is blurry. I do not need an accurate and complicated algorithm involving FFT, wavelets, etc. Just a very simple idea, even if it is not accurate.
I've thought of computing the average Euclidean distance between pixel (x,y) and pixel (x+1,y), considering their RGB components, and then using a threshold, but it works very badly. Any other idea?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
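A minimal Python/NumPy sketch of this grid-sampling idea (the filename is a placeholder; the 10-pixel spacing and the green channel follow the description in this answer):
import numpy as np
from PIL import Image
img = np.asarray(Image.open('photo.jpg').convert('RGB'), dtype=int)   # placeholder filename
green = img[:, :, 1]
step = 10   # grid lines spaced 10 pixels apart
h_diff = np.abs(np.diff(green[::step, :], axis=1)).max()   # along horizontal lines
v_diff = np.abs(np.diff(green[:, ::step], axis=0)).max()   # along vertical lines
maxdiff = max(h_diff, v_diff)
sharpness = maxdiff / 255 * 100     # maximum difference as a percentage of 255
print('Sharpness: %.1f%%' % sharpness)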
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
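A self-contained sketch of this refined measure (horizontal differences only, for brevity; the 9×9 window and offset of 15 follow the text above):
import numpy as np
def refined_sharpness(gray, offset=15, half=4):
    # gray: 2D grayscale array; half=4 gives the 9x9 local search area
    d = np.abs(np.diff(gray.astype(int), axis=1))
    r, c = np.unravel_index(d.argmax(), d.shape)      # location of the maximum difference
    maxdiff = d[r, c]
    win = gray[max(0, r-half):r+half+1, max(0, c-half):c+half+1].astype(int)
    rng = win.max() - win.min()                       # local range around that location
    return (maxdiff / (offset + rng)) * (1.0 + offset / 255) * 100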
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference):
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value:
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus:
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry... How blurry, and to what degree, depends on your application. Broadly speaking, the blurriness or sharpness of an image can be measured in various ways. As a first easy attempt I would check the energy of the image, defined as the normalised sum of the squared pixel values:
E = (1/N) * Σ I², where I is the image and N the number of pixels (defined for grayscale)
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with a Gaussian filter
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E_or = 1265 (original)    E_bl = 88 (blurred)
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
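A rough Python/OpenCV sketch of this LoG-energy check; the filenames and kernel sizes are placeholders:
import cv2
import numpy as np
def log_energy(gray):
    # Laplacian of Gaussian: smooth first, then apply the Laplacian, then take the energy.
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)
    log = cv2.Laplacian(smoothed, cv2.CV_64F)
    return np.mean(log ** 2)          # normalised sum of squared values
gray = cv2.imread('lena.png', cv2.IMREAD_GRAYSCALE)   # placeholder filename
blurry = cv2.GaussianBlur(gray, (15, 15), 0)
print(log_energy(gray), log_energy(blurry))           # the sharp image gives much larger energy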
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images; for color you do this for each channel separately.
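A short NumPy sketch of these formulas for a grayscale image (illustrative only):
import numpy as np
def blurriness(gray):
    p = gray.astype(float)
    dx = p[:, 1:] - p[:, :-1]          # p(x,y) - p(x-1,y)
    dy = p[1:, :] - p[:-1, :]          # p(x,y) - p(x,y-1)
    n_pixel = p.size
    n1 = (np.abs(dx).sum() + np.abs(dy).sum()) / (2 * n_pixel)
    n2 = ((dx ** 2).sum() + (dy ** 2).sum()) / (2 * n_pixel)
    return n2 / (n1 * n1)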
What is a fast and reliable way to threshold images with possible blurring and non-uniform brightness?
Example (blurring but uniform brightness):
Because the image is not guaranteed to have uniform brightness, it's not feasible to use a fixed threshold. An adaptive threshold works alright, but because of the blurriness it creates breaks and distortions in the features (here, the important features are the Sudoku digits):
I've also tried using Histogram Equalization (using OpenCV's equalizeHist function). It increases contrast without reducing differences in brightness.
The best solution I've found is to divide the image by its morphological closing (credit to this post) to make the brightness uniform, then renormalize, then use a fixed threshold (using Otsu's algorithm to pick the optimal threshold level):
Here is code for this in OpenCV for Android:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
Mat closed = new Mat(); // closed will have type CV_32F
Imgproc.morphologyEx(image, closed, Imgproc.MORPH_CLOSE, kernel);
Core.divide(image, closed, closed, 1, CvType.CV_32F);
Core.normalize(closed, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
Imgproc.threshold(image, image, -1, 255, Imgproc.THRESH_BINARY_INV
+Imgproc.THRESH_OTSU);
This works great but the closing operation is very slow. Reducing the size of the structuring element increases speed but reduces accuracy.
Edit: Based on DCS's suggestion I tried using a high-pass filter. I chose the Laplacian filter, but I would expect similar results with Sobel and Scharr filters. The filter picks up high-frequency noise in the areas which do not contain features, and suffers from similar distortion to the adaptive threshold due to blurring. It also takes about as long as the closing operation. Here is an example with a 15x15 filter:
Edit 2: Based on AruniRC's answer, I used Canny edge detection on the image with the suggested parameters:
double mean = Core.mean(image).val[0];
Imgproc.Canny(image, image, 0.66*mean, 1.33*mean);
I'm not sure how to reliably automatically fine-tune the parameters to get connected digits.
Using Vaughn Cato and Theraot's suggestions, I scaled down the image before closing it, then scaled the closed image up to regular size. I also reduced the kernel size proportionately.
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5,5));
Mat temp = new Mat();
Imgproc.resize(image, temp, new Size(image.cols()/4, image.rows()/4));
Imgproc.morphologyEx(temp, temp, Imgproc.MORPH_CLOSE, kernel);
Imgproc.resize(temp, temp, new Size(image.cols(), image.rows()));
Core.divide(image, temp, temp, 1, CvType.CV_32F); // temp will now have type CV_32F
Core.normalize(temp, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
Imgproc.threshold(image, image, -1, 255,
Imgproc.THRESH_BINARY_INV+Imgproc.THRESH_OTSU);
The image below shows the results side-by-side for 3 different methods:
Left - regular size closing (432 pixels), size 19 kernel
Middle - half-size closing (216 pixels), size 9 kernel
Right - quarter-size closing (108 pixels), size 5 kernel
The image quality deteriorates as the size of the image used for closing gets smaller, but the deterioration isn't significant enough to affect feature recognition algorithms. The speed increases slightly more than 16-fold for the quarter-size closing, even with the resizing, which suggests that closing time is roughly proportional to the number of pixels in the image.
Any suggestions on how to further improve upon this idea (either by further reducing the speed, or reducing the deterioration in image quality) are very welcome.
Alternative approach:
Assuming your intention is to have the numerals clearly binarized ... shift your focus to components instead of the whole image.
Here's a pretty easy approach:
Do a Canny edge map of the image. First try the Canny function with the low threshold set to 0.66*[mean value] and the high threshold to 1.33*[mean value] (where the mean is taken over the greylevel values).
You would need to fiddle with the parameters a bit to get an image where the major components/numerals are visible clearly as separate components. Near perfect would be good enough at this stage.
Considering each Canny edge as a connected component (i.e. use cvFindContours() or its C++ counterpart, whichever), one can estimate the foreground and background greylevels and arrive at a threshold.
For the last bit, do take a look at sections 2. and 3. of this paper. Skipping most of the non-essential theoretical parts it shouldn't be too difficult to have it implemented in OpenCV.
Hope this helped!
Edit 1:
Based on the Canny edge thresholds here's a very rough idea just sufficient to fine-tune the values. The high_threshold controls how strong an edge must be before it is detected. Basically, an edge must have gradient magnitude greater than high_threshold to be detected in the first place. So this does the initial detection of edges.
Now, the low_threshold deals with connecting nearby edges. It controls how much nearby disconnected edges will get combined together into a single edge. For a better idea, read "Step 6" of this webpage. Try setting a very small low_threshold and see how things come about. You could discard that 0.66*[mean value] thing if it doesn't work on these images; it's just a rule of thumb anyway.
We use Bradley's algorithm for a very similar problem (segmenting letters from a background with uneven lighting and uneven background color), described here: http://people.scs.carleton.ca:8008/~roth/iit-publications-iti/docs/gerh-50002.pdf, with C# code here: http://code.google.com/p/aforge/source/browse/trunk/Sources/Imaging/Filters/Adaptive+Binarization/BradleyLocalThresholding.cs?r=1360. It works on an integral image, which can be calculated using OpenCV's integral function. It is very reliable and fast; it is not implemented in OpenCV itself, but it is easy to port.
Another option is the adaptiveThreshold method in OpenCV, though we did not try it: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold. The MEAN version is the same as Bradley's, except that it uses a constant to modify the mean value instead of a percentage, which I think is better.
Also, a good article is here: https://dsp.stackexchange.com/a/2504
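For reference, a rough NumPy sketch of Bradley-style thresholding via an integral image; the window size (width/8) and the 15% parameter follow the paper's defaults, but treat the whole thing as an illustration rather than the AForge port:
import numpy as np
def bradley_threshold(gray, t=0.15):
    h, w = gray.shape
    s = w // 8                          # window size, 1/8 of the image width
    half = s // 2
    # Integral image with a zero row/column prepended, so window sums are four lookups.
    integral = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
    integral = np.pad(integral, ((1, 0), (1, 0)), mode='constant')
    out = np.full((h, w), 255, np.uint8)
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            count = (y1 - y0) * (x1 - x0)
            window_sum = (integral[y1, x1] - integral[y0, x1]
                          - integral[y1, x0] + integral[y0, x0])
            # Black if the pixel is more than t percent darker than its local mean.
            if gray[y, x] * count < window_sum * (1.0 - t):
                out[y, x] = 0
    return out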
You could try working on a per-tile basis if you know you have a good crop of the grid. Working on 9 subimages rather than the whole pic will most likely lead to more uniform brightness on each subimage. If your cropping is perfect you could even try going for each digit cell individually; but it all depends on how reliable your crop is.
An ellipse-shaped kernel is more expensive to compute with than a flat (rectangular) one.
Try to change:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
to:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(19,19));
This can speed up your solution enough, with little impact on accuracy.
In my app, I will input an image of a person and I want to get only the face and neck of that person as output in a separate image. Example: the image below as input (source: http://www.fremantlepress.com.au):
And I want to get the upper image as output:
I want to perform the following algorithm:
1. Detect face
2. Select (face region * 2) area
3. Detect skin and neck
4. Cut the skin region of the selected image
5. Save that cut region into a new image
Going through the EmguCV wiki and other online resources, I am confident I can perform steps 1 and 2, but I am not sure how to accomplish steps 3 and 4.
There are some functions/methods I am looking at (Canny edge detection, contours, etc.) but I am not sure how and where I should apply them.
I am using EmguCV (C#) and Windows Form Application.
Please help me with steps 3 and 4. I would be glad if someone could elaborate on these two steps and provide some code as well.
Well there are several ways you could approach this. Edge detection will only give you a binary image of edges and you will have to perform some line tracing or Hough transforms to detect the location of these. Their accuracy will vary.
I will assume for now that you can detect the eyes and the relative location of the face. I would expect a statistical filter to provide a favourable outcome, with better performance than a neural network, which is the best alternative. A good alternative is naturally colour segmentation if colour images are used (this is far easier to implement). I will also assume that the head position can change slightly, with the neck being more or less visible within an image.
So for a Statistical Filter:
(Note that the background of the individual is similar to the face data when dealing with a greyscale image so a colour image would be better to work with).
Take a blank copy of the original image. We will form a binary map of the face on this; while not strictly necessary, it lets us examine our progress more easily.
Find the face, eyes and mouth in the original image.
We can assume that any data from the eyes and mouth forms part of the face, and mark these on the blank copy with "1"s.
Now we need a bit of maths, as we know the face detection algorithm can only detect a face at a certain angle to the camera. We use this and select a statistical mask from certain parts of the image, say two or three 10x10-pixel patches from the cheek area, which is the most likely area of the face within the image. From this data we get values such as the mean and standard deviation.
We now scan across the segmented part of the image where we have detected the face. We won't do the whole image, as that would take a long time. (Note: there is a border half the size of the mask that won't be looked at.) We examine each pixel and its surrounding neighbours up to the size of the 10x10 mask. If the average or standard deviation (whichever we are examining) is similar to that of our filter, say within 10%, then we mark this pixel in our blank copy as a "1" and consider that pixel to belong to the skin. (A rough sketch of this scan follows below.)
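A rough Python/NumPy sketch of the scanning step above, using just the patch mean (the standard-deviation variant would be analogous); the face ROI, the sampled patch and the 10% tolerance are assumptions:
import numpy as np
def skin_mask_from_patch(gray_face, patch, tol=0.10, win=10):
    # gray_face: 2D array of the detected face region; patch: a win x win cheek sample
    ref_mean = patch.mean()
    h, w = gray_face.shape
    mask = np.zeros((h, w), np.uint8)
    half = win // 2
    for y in range(half, h - half):
        for x in range(half, w - half):
            local = gray_face[y - half:y + half, x - half:x + half]
            if abs(local.mean() - ref_mean) <= tol * ref_mean:
                mask[y, x] = 1          # pixel classified as skin
    return mask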
As for Colour Segmentation:
(Note: you could also try this process on greyscale images, but it will be less successful due to the brickwork.)
Repeat steps 1 to 2.
Again we select certain areas of the image that we can expect to contain face data (e.g. 10 pixels below the eye). In this case, however, we examine the colour of each such pixel. Don't forget that HSV images can give better results for this process, and a combination even more so. We then scan across the image, examining each pixel for a similar colour; if it matches, mark it on your binary map. (A minimal sketch of this follows after this list.)
An alternative is subtracting or adding a calculated value from the R, G and B channels of the image, so that only the face data survives. You can convert this directly to a binary image by setting any value > 1 to 1.
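In the same spirit, a minimal colour-segmentation sketch using OpenCV's inRange on an HSV image; the filename and bounds are placeholders that would in practice be derived from the sampled face patches:
import cv2
import numpy as np
img = cv2.imread('face.jpg')                          # hypothetical filename
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Placeholder bounds -- derive them from the patches sampled below the eyes.
lower = np.array([0, 40, 60], np.uint8)
upper = np.array([25, 180, 255], np.uint8)
skin = cv2.inRange(hsv, lower, upper)                 # 255 where the colour matches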
This will only work for skin; for the hair we will need other filters. A few notes:
A statistical filter working on a colour image has a far greater ability however takes longer.
Use data from the image to form your statistical filter as this will allow for other skin colours to be classified. A mathematical designed filter or colour segmentation will require a lot of work to achieve the same variability.
The size of the mask is important the greater the mask size the less likely errors will occur but again processing time increases.
You can speed up the process by referencing the same area within the binary map copy: if the pixel you're examining is already a 1 (classified by eye/nose/mouth detection), then why examine it again? Just skip it.
Multiple skin filters will provide better results, however they may also introduce more noise, and remember each filter must then be compared with each pixel, increasing processing time.
Getting the algorithm working accurately will require a bit of trial and error, but you should see comparable results fairly quickly using these methods.
I hope this helps you on your way. Sorry for not including any code, but hopefully others can help you where you get stuck, and writing it yourself will help you understand what is going on and allow you to cut down on processing time. Let me know if you require any additional advice; I'm doing my PhD in image analysis, just so you know that the advice is sound.
Take Care
Chris
[EDIT]
Some quick results:
Here is a 20x20 filter applied in detecting the hair. The program I've written only works on greyscale images at the moment so the skin detection suffers interference from the stone (see later)
Colour Image of Face Region
Binary Map of Average Hair Filter 20x20 Mask 40% Error allowed
As can be observed there is interference from the shirt in this case as it matches the colour of the hair. This can be eliminated by simply only examining the top third or half of the detected facial region.
Binary Map of Average Skin Filter 20x20 Mask 40% Error allowed
In this image I use only one filter, formed from the chin area, as the stubble obviously changes the filter's behaviour. There is still noise present from the stone behind the individual, however using a colour image could eliminate this. The gaps in this case could be filled by an algorithm or another filter. Again there is noise from the edge of the shirt, but we could minimise this either by detecting the shirt and removing any data that forms it, or simply only looking in certain areas.
Examples of the Regions to Inspect
To eliminate false classification you could take the top two thirds of the segmented image and look for the face and the width of the detected eyes to the bottom of the facial region for neck data.
Cheers Again
Chris
Hello Chris, can you share the code for this? I have used the GrabCut algorithm to crop the face up to the neck, but the accuracy is not perfect. I am sharing the code where I use a webcam to capture images, blur the background and then apply the GrabCut algorithm. Please check it and reply.
import numpy as np
import cv2
import pixellib
from pixellib.tune_bg import alter_bg
rect = (0,0,0,0)
startPoint = False
endPoint = False
img_counter = 0
# function for mouse callback
def on_mouse(event, x, y, flags, params):
    global rect, startPoint, endPoint
    # get mouse click
    if event == cv2.EVENT_LBUTTONDOWN:
        if startPoint == True and endPoint == True:
            startPoint = False
            endPoint = False
            rect = (0, 0, 0, 0)
        if startPoint == False:
            rect = (x, y, 0, 0)
            startPoint = True
        elif endPoint == False:
            rect = (rect[0], rect[1], x, y)
            endPoint = True
#cap = cv2.VideoCapture("YourVideoFile.mp4")
#cap = cv2.imread("/home/mongoose/Projects/background removal/bg_grabcut/GrabCut-from-video-master/IMG_6471.jpg")
#capturing the camera feed, '0' denotes the first camera connected to the computer
cap = cv2.VideoCapture(0)
waitTime = 50
change_bg = alter_bg(model_type = "pb")
change_bg.load_pascalvoc_model("/home/mongoose/Projects/background removal/bg_grabcut/test/xception_pascalvoc.pb")
change_bg.blur_camera(cap, extreme = True, frames_per_second= 10, output_video_name= "output_video.mp4", show_frames= True, frame_name= "frame", detect = "person")
#Reading the first frame
(grabbed, frame) = cap.read()
while(cap.isOpened()):
    (grabbed, frame) = cap.read()
    cv2.namedWindow('frame')
    cv2.setMouseCallback('frame', on_mouse)
    # drawing rectangle
    if startPoint == True and endPoint == True:
        cv2.rectangle(frame, (rect[0], rect[1]), (rect[2], rect[3]), (255, 0, 255), 2)
    if not grabbed:
        break
    cv2.imshow('frame', frame)
    key = cv2.waitKey(waitTime)
    if key == ord('q'):
        # 'q' pressed
        break
    elif key % 256 == 32:
        # SPACE pressed
        alpha = 1  # Transparency factor.
        img_name = "opencv_frame_{}.png".format(img_counter)
        imgCopy = frame.copy()
        img = frame
        mask = np.zeros(img.shape[:2], np.uint8)
        bgdModel = np.zeros((1, 65), np.float64)
        fgdModel = np.zeros((1, 65), np.float64)
        w = abs(rect[0] - rect[2] + 10)
        h = abs(rect[1] - rect[3] + 10)
        rect2 = (rect[0] + 10, rect[1] + 10, w, h)
        cv2.grabCut(img, mask, rect2, bgdModel, fgdModel, 100, cv2.GC_INIT_WITH_RECT)
        mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
        img = img * mask2[:, :, np.newaxis]
        cv2.imwrite(img_name, img)
        print("{} written!".format(img_name))
        img_counter += 1
cap.release()
cv2.destroyAllWindows()