Let's say I have this input image, with any number of boxes. I want to segment out these boxes so I can eventually extract them.
input image:
The background could be anything that is continuous, like a painted wall, a wooden table, or carpet.
My idea was that the gradient would be roughly constant throughout the background, so I could turn the regions where the gradient is about the same into zeros in the image.
Through edge detection, I would dilate and fill the regions where edges are detected. Essentially, my goal is to produce a blob for each area where a box is. With the blobs, I would know the exact locations of the boxes and could crop them out of the input image.
So in this case, I should be able to have four blobs, and then I would be able to crop out four images from the input image.
This is how far I got:
segmented image:
query = imread('AllFour.jpg');
gray = rgb2gray(query);

[~, threshold] = edge(gray, 'sobel');   % Sobel threshold (currently unused below)
weightedFactor = 1.5;                   % currently unused

% binary gradient mask from the Roberts operator
BWs = edge(gray, 'roberts');
%figure, imshow(BWs), title('binary gradient mask');

% dilate the edges so the box outlines close up
se90 = strel('disk', 30);
se0 = strel('square', 3);               % currently unused
BWsdil = imdilate(BWs, se90);
%figure, imshow(BWsdil), title('dilated gradient mask');

% fill the enclosed regions to get solid blobs
BWdfill = imfill(BWsdil, 'holes');
figure, imshow(BWdfill);
title('binary image with filled holes');
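For comparison, here is a rough sketch of the same edge -> dilate -> fill -> crop pipeline in Python/OpenCV. The file name, kernel size, and minimum-area threshold are placeholders that would need tuning, and it assumes the top-left pixel belongs to the background:
import cv2
import numpy as np

img = cv2.imread('AllFour.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# binary gradient mask, then dilate so the box outlines close up
edges = cv2.Canny(gray, 50, 150)
dilated = cv2.dilate(edges, np.ones((15, 15), np.uint8))

# fill holes by flood-filling the background from the top-left corner and inverting
flood = dilated.copy()
h, w = flood.shape
mask = np.zeros((h + 2, w + 2), np.uint8)
cv2.floodFill(flood, mask, (0, 0), 255)
filled = dilated | cv2.bitwise_not(flood)

# one bounding box (and one crop) per remaining blob
contours, _ = cv2.findContours(filled, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x return values
crops = []
for c in contours:
    x, y, bw, bh = cv2.boundingRect(c)
    if bw * bh > 1000:  # skip tiny blobs
        crops.append(img[y:y + bh, x:x + bw])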
What a very interesting problem! Here's my solution in an attempt to solve this problem for you. This is assuming that the background has the same colour distribution throughout. First, transform your image from RGB to the HSV colour space with rgb2hsv. The HSV colour space is an ideal transform for analyzing colours. After this, I would look at the saturation and value planes. Saturation is concerned with how "pure" the colour is, while value is the intensity or brightness of the colour itself. If you take a look at the saturation and value planes for the image, this is what is shown:
im = imread('http://i.stack.imgur.com/1SGVm.jpg');
out = rgb2hsv(im);
figure;
subplot(2,1,1);
imshow(out(:,:,2));
subplot(2,1,2);
imshow(out(:,:,3));
This is what I get:
By taking a look at some locations in the gray background, it looks like most of the saturation values are less than 0.2, while the values in the value plane are greater than 0.3. We want the opposite of those pixels to get our objects, so we find the pixels whose saturation is greater than 0.2 or whose value is less than 0.3:
seg = out(:,:,2) > 0.2 | out(:,:,3) < 0.3;
This is what we get:
Almost there! There are some spurious single pixels, so I'm going to perform an opening with imopen using a line structuring element.
After this, I'll perform a dilation with imdilate to close any gaps, then use imfill with the 'holes' option to fill in the holes, and finally use erosion with imerode to shrink the shapes back to their original form. As such:
se = strel('line', 3, 90);
pre = imopen(seg, se);
se = strel('square', 20);
pre2 = imdilate(pre, se);
pre3 = imfill(pre2, 'holes');
final = imerode(pre3, se);
figure;
imshow(final);
final contains the segmented image with the 4 candy boxes. This is what I get:
Try resizing the image. When you make it smaller, it becomes easier to join the edges. I tried what's shown below. You might have to tune it depending on the nature of the background.
close all;
clear all;
im = imread('1SGVm.jpg');
small = imresize(im, .25); % resize
grad = (double(imdilate(small, ones(3))) - double(small)); % extract edges
gradSum = sum(grad, 3);
bw = edge(gradSum, 'Canny');
joined = imdilate(bw, ones(3)); % join edges
filled = imfill(joined, 'holes');
filled = imerode(filled, ones(3));
imshow(label2rgb(bwlabel(filled))) % label the regions and show
If you have a recent version of MATLAB, try the Color Thresholder app in the image processing toolbox. It lets you interactively play with different color spaces, to see which one can give you the best segmentation.
If your candy covers are fixed, or you know all the covers that can appear in the scene, then template matching is best for this, since it is independent of the background in the image.
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
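For example, a minimal OpenCV template-matching sketch in Python (the file names are placeholders, and you would run one template per known cover):
import cv2

scene = cv2.imread('AllFour.jpg')            # scene containing the boxes
template = cv2.imread('cover_template.jpg')  # one known candy cover
th, tw = template.shape[:2]

result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:                            # match-quality threshold to tune
    x, y = max_loc
    box = scene[y:y + th, x:x + tw]          # crop the matched cover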
I am working on some leaf images using OpenCV (Java). The leaves are captured on white paper, and some have shadows like this one:
Of course, this is somewhat an extreme case (there are milder shadows).
Now, I want to threshold the leaf and also remove the shadow (while preserving the leaf's details).
My current flow is this:
1) Converting to HSV and extracting the Saturation channel:
Imgproc.cvtColor(colorMat, colorMat, Imgproc.COLOR_RGB2HSV);
ArrayList<Mat> channels = new ArrayList<Mat>();
Core.split(colorMat, channels);
satImg = channels.get(1);
2) De-noising (median) and applying adaptiveThreshold:
Imgproc.medianBlur(satImg , satImg , 11);
Imgproc.adaptiveThreshold(satImg , satImg , 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 401, -10);
And the result is this:
It looks OK, but the shadow is causing some anomalies along the left boundary. Also, I have this feeling that I am not using the white background to my benefit.
Now, I have 2 questions:
1) How can I improve the result and get rid of the shadow?
2) Can I get good results without working on the saturation channel? The reason I ask is that on most of my images, working on the L channel (from HLS) gives far better results (apart from the shadow, of course).
Update: Using the Hue channel makes thresholding better, but makes the shadow situation worse:
Update 2: In some cases, the assumption that the shadow is darker than the leaf doesn't hold, so working on intensities won't help. I'm looking more toward a color-channels approach.
I don't use OpenCV; instead, I tried the MATLAB Image Processing Toolbox to extract the leaf. Hopefully OpenCV has equivalents of all the processing functions used here. Please see my result below. I did all the operations on channel 3 and channel 1 of your original image.
First I used your channel 3 and thresholded it at 100 (top left). Then I removed the regions touching the border and the regions smaller than 100 pixels, and filled the hole in the leaf; the result is shown at top right.
Next I used your channel 1 and did the same thing as for channel 3; the result is shown at bottom left. Then I found the connected regions (there are only two, as you can see in the bottom-left figure) and removed the one with the smaller area (shown at bottom right).
Suppose the top-right image is I1 and the bottom-right image is I; the leaf is then extracted by computing ~I & I1. The leaf is:
Hope it helps. Thanks
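For anyone who prefers Python, here is an illustrative sketch of the same steps using scikit-image and SciPy. The file name, the threshold of 100, and the direction of each comparison are assumptions; check them against your own figures.
import cv2
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.segmentation import clear_border
from skimage.morphology import remove_small_objects
from skimage.measure import label

# OpenCV loads BGR, so MATLAB channel 3 (blue) is index 0 and channel 1 (red) is index 2
img = cv2.imread('leaf.jpg')

# channel 3: threshold at 100, drop border-touching regions and regions < 100 px, fill holes -> I1
c3 = img[:, :, 0] < 100
I1 = binary_fill_holes(remove_small_objects(clear_border(c3), min_size=100))

# channel 1: same cleanup, then keep only the connected region with the larger area -> I
c1 = img[:, :, 2] < 100
I = binary_fill_holes(remove_small_objects(clear_border(c1), min_size=100))
lbl = label(I)
if lbl.max() > 0:
    sizes = np.bincount(lbl.ravel())[1:]   # area of each labelled region
    I = lbl == (np.argmax(sizes) + 1)      # keep the largest one

leaf = ~I & I1                             # the "~I & I1" combination from above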
I tried two different things:
1. Other thresholding on the saturation channel
2. Trying to find two contours: shadow and leaf
I use c++ so your code snippets will look a little different.
Trying Otsu thresholding instead of adaptive thresholding:
cv::threshold(hsv_imgs,mask,0,255,CV_THRESH_BINARY|CV_THRESH_OTSU);
leading to the following images (just Otsu thresholding on the saturation channel):
The other thing is computing gradient information (I used Sobel; see the OpenCV documentation), thresholding that, and after an opening operator I used findContours, giving something like this, which is not usable yet (gradient contour approach):
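In case a Python version is easier to experiment with, here is a rough sketch of that gradient -> threshold -> opening -> findContours pipeline (the kernel size and parameters are guesses to tune):
import cv2
import numpy as np

img = cv2.imread('leaf.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Sobel gradient magnitude, rescaled to 8-bit
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
mag = cv2.normalize(cv2.magnitude(gx, gy), None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Otsu threshold on the gradient, opening to remove speckle, then contours
_, bw = cv2.threshold(mag, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x return values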
I'm trying to do the same thing with photos of butterflies, but with more uneven and unpredictable backgrounds such as this. Once you've identified a good portion of the background (e.g. via thresholding, or as we do, flood filling from random points), what works well is to use the GrabCut algorithm to get all those bits you might miss on the initial pass. In python, assuming you still want to identify an initial area of background by thresholding on the saturation channel, try something like
import cv2
import numpy as np
img = cv2.imread("leaf.jpg")
sat = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:,:,1]
sat = cv2.medianBlur(sat, 11)
thresh = cv2.adaptiveThreshold(sat , 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 401, 10);
cv2.imwrite("thresh.jpg", thresh)
h, w = img.shape[:2]
bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
grabcut_mask = (thresh//255*3).astype(np.uint8) # background should be 0, probable foreground = 3; grabCut needs a uint8 mask
cv2.grabCut(img, grabcut_mask,(0,0,w,h),bgdModel,fgdModel,5,cv2.GC_INIT_WITH_MASK)
grabcut_mask = np.where((grabcut_mask ==2)|(grabcut_mask ==0),0,1).astype('uint8')
cv2.imwrite("GrabCut1.jpg", img*grabcut_mask[...,None])
This actually gets rid of the shadows for you in this case, because the edge of the shadow has high saturation levels, so it is included in the GrabCut deletion. (I would post images, but I don't have enough reputation.)
Usually, however, you can't trust shadows to be included in the background detection. In this case you probably want to compare areas in the image with the colour of the now-known background, using the chromacity distortion measure proposed by Horprasert et al. (1999) in "A Statistical Approach for Real-time Robust Background Subtraction and Shadow Detection". This measure takes into account the fact that, for desaturated colours, hue is not a relevant measure.
Note that the PDF of the preprint you find online has a mistake (missing + signs) in equation 6. You can use the version re-quoted in Rodriguez-Gomez et al. (2012), equations 1 & 2, or you can use my Python code below:
def brightness_distortion(I, mu, sigma):
    return np.sum(I*mu/sigma**2, axis=-1) / np.sum((mu/sigma)**2, axis=-1)

def chromacity_distortion(I, mu, sigma):
    alpha = brightness_distortion(I, mu, sigma)[...,None]
    return np.sqrt(np.sum(((I - alpha * mu)/sigma)**2, axis=-1))
You can feed the known background mean & stdev as the last two parameters of the chromacity_distortion function, and the RGB pixel image as the first parameter, which should show you that the shadow has basically the same chromacity as the background, and very different from the leaf. In the code below, I've then thresholded on chromacity and done another GrabCut pass. This works to remove the shadow even if the first GrabCut pass doesn't (e.g. if you originally thresholded on hue).
mean, stdev = cv2.meanStdDev(img, mask = 255-thresh)
mean = mean.ravel() #bizarrely, meanStdDev returns an array of size [3,1], not [3], so flatten it
stdev = stdev.ravel()
chrom = chromacity_distortion(img, mean, stdev)
chrom255 = cv2.normalize(chrom, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX).astype(np.uint8)[:,:,None]
cv2.imwrite("ChromacityDistortionFromBackground.jpg", chrom255)
thresh2 = cv2.adaptiveThreshold(chrom255 , 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 401, 10);
cv2.imwrite("thresh2.jpg", thresh2)
grabcut_mask[...] = 3
grabcut_mask[thresh==0] = 0 #where thresh == 0, definitely background, set to 0
grabcut_mask[np.logical_and(thresh == 255, thresh2 == 0)] = 2 #could try setting this to 2 or 0
cv2.grabCut(img, grabcut_mask,(0,0,w,h),bgdModel,fgdModel,5,cv2.GC_INIT_WITH_MASK)
grabcut_mask = np.where((grabcut_mask ==2)|(grabcut_mask ==0),0,1).astype('uint8')
cv2.imwrite("final_leaf.jpg", grabcut_mask[...,None]*img)
I'm afraid with the parameters I tried, this still removes the stalk, though. I think that's because GrabCut thinks that it looks a similar colour to the shadows. Let me know if you find a way to keep it.
Hi I'm using EmguCV and I enjoy programming with it.
However I'm wondering whether there is an elegant way to add two pixels individually.
To add images you can use CvInvoke.Add(), but for individual pixel operations you seem to have to write it in an ugly way:
say you have p, p1 and p2 as EmguCV::Bgr,
you have to write
p = new Bgr(p1.b + p2.b, p1.g + p2.g, p1.r + p2.r);
I really hate this and tried to write an operator for it, but that is apparently impossible, since operator overloading must be defined in the host class.
Is there any way to do this elegantly?
================Edit================
What I want to do is calculate the summation of the pixels in an image, so the basic operation here is adding pixels, i.e. Bgr values.
Let's suppose you have two images img1 and img2:
If you want to add them, you can do img3 = img1 + img2.
If you simply want the summation of each color channel on a single image img1 you can do:
Bgr sums = img1.GetSum();
double TotalVal = sums.Blue + sums.Green + sums.Red;
Hope this helps,
Luca
Hello, I am trying to do some image processing. I use a Microsoft Kinect to detect humans in a room. I get depth data, do some background-subtraction work, and end up with a video sequence like this when a person enters the scene and walks around:
http://www.screenr.com/h7f8
I put up a video so that you can see the behaviour of the noise. Different colors represent different levels of depth; white represents empty. As you can see, it is pretty noisy, especially the red noise.
I need to get rid of everything except the human as much as possible. When I do erosion/dilation (using a very big window size) I can get rid of a lot of the noise but I wondered if there are other methods I can use. Especially the red noise in the video is hard to remove using erosion/dilation.
Some notes:
1) A better background subtraction could be done if we knew when there are no humans in the scene, but our background subtraction is fully automatic and works even when there are humans in the scene, and even when the camera is moved, etc., so this is the best background subtraction we can get right now.
2) The algorithm will run on an embedded system in real time, so the more efficient and simple the algorithm the better, and it doesn't have to be perfect. Complicated signal processing techniques are also welcome, though (we might use them on another project that does not need embedded, real-time processing).
3) I don't need an actual code. Just ideas.
Just my two cents:
If you don't mind using the SDK for that, then you can very easily keep only the person pixels using the PlayerIndexBitmask as Outlaw Lemur shows.
Now, you may not want to be dependent on the drivers for that and may want to do it at the image-processing level. An approach that we tried in a project, and which worked pretty well, was contour based. We began with background subtraction, then detected the largest contour in the image, assuming that this was the person (since the remaining noise was usually very small blobs), and we filled that contour and kept it. You could also use some kind of median filtering as a first pass.
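A minimal sketch of that idea in OpenCV/Python, where mask stands for whatever 8-bit binary foreground mask your background subtraction produces:
import cv2
import numpy as np

def keep_largest_blob(mask):
    """Keep only the largest contour (assumed to be the person) and fill it."""
    mask = cv2.medianBlur(mask, 5)  # optional median pre-filter
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return np.zeros_like(mask)
    largest = max(contours, key=cv2.contourArea)
    out = np.zeros_like(mask)
    cv2.drawContours(out, [largest], -1, 255, thickness=cv2.FILLED)
    return out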
Of course, this is not perfect nor suitable in every case and probably there are a lot better methods. But I'm just throwing it out there in case it helps you come up with any ideas.
Take a look at EyesWeb.
It is a design platform that supports the Kinect device, and you can apply noise filters to the outputs. It is a very useful and simple tool for designing multimodal systems.
I may be wrong (I'd need the video without processing for that) but I'd tend to say that you are trying to get rid of illumination changes.
This is what makes people detection really difficult in 'real' environments.
You can check out this other SO question for some links.
I used to detect humans in real time in the same configuration as you, but with monocular vision.
In my case, a really good descriptor was LBPs (local binary patterns), which are mainly used for texture classification.
This is quite simple to put into practice (there are implementations all over the web).
The LBPs were basically used to define an area of interest where movement is detected, so that I could process only part of the image and get rid of all that noise.
This paper, for example, uses LBPs for grayscale correction of images.
Hope that brings some new ideas.
This is pretty simple assuming you are using the Kinect SDK. I would follow this video for Depth basics, and do something like this:
private byte[] GenerateColoredBytes(DepthImageFrame depthFrame)
{
//get the raw data from kinect with the depth for every pixel
short[] rawDepthData = new short[depthFrame.PixelDataLength];
depthFrame.CopyPixelDataTo(rawDepthData);
//use depthFrame to create the image to display on-screen
//depthFrame contains color information for all pixels in image
//Height x Width x 4 (Red, Green, Blue, empty byte)
Byte[] pixels = new byte[depthFrame.Height * depthFrame.Width * 4];
//Bgr32 - Blue, Green, Red, empty byte
//Bgra32 - Blue, Green, Red, transparency
//You must set transparency for Bgra as .NET defaults a byte to 0 = fully transparent
//hardcoded locations to Blue, Green, Red (BGR) index positions
const int BlueIndex = 0;
const int GreenIndex = 1;
const int RedIndex = 2;
//loop through all distances
//pick a RGB color based on distance
for (int depthIndex = 0, colorIndex = 0;
depthIndex < rawDepthData.Length && colorIndex < pixels.Length;
depthIndex++, colorIndex += 4)
{
//get the player (requires skeleton tracking enabled for values)
int player = rawDepthData[depthIndex] & DepthImageFrame.PlayerIndexBitmask;
//gets the depth value
int depth = rawDepthData[depthIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;
//.9M or 2.95'
if (depth <= 900)
{
//we are very close
pixels[colorIndex + BlueIndex] = Colors.White.B;
pixels[colorIndex + GreenIndex] = Colors.White.G;
pixels[colorIndex + RedIndex] = Colors.White.R;
}
// .9M - 2M or 2.95' - 6.56'
else if (depth > 900 && depth < 2000)
{
//we are a bit further away
pixels[colorIndex + BlueIndex] = Colors.White.B;
pixels[colorIndex + GreenIndex] = Colors.White.G;
pixels[colorIndex + RedIndex] = Colors.White.R;
}
// 2M+ or 6.56'+
else if (depth > 2000)
{
//we are the farthest
pixels[colorIndex + BlueIndex] = Colors.White.B;
pixels[colorIndex + GreenIndex] = Colors.White.G;
pixels[colorIndex + RedIndex] = Colors.White.R;
}
////equal coloring for monochromatic histogram
//byte intensity = CalculateIntensityFromDepth(depth);
//pixels[colorIndex + BlueIndex] = intensity;
//pixels[colorIndex + GreenIndex] = intensity;
//pixels[colorIndex + RedIndex] = intensity;
//Color all players "gold"
if (player > 0)
{
pixels[colorIndex + BlueIndex] = Colors.Gold.B;
pixels[colorIndex + GreenIndex] = Colors.Gold.G;
pixels[colorIndex + RedIndex] = Colors.Gold.R;
}
}
return pixels;
}
This turns everything except humans white, and the humans are gold. Hope this helps!
EDIT
I know you didn't necessarily want code, just ideas, so I would say: find an algorithm that finds the depth, and one that finds the number of humans, and color everything white except the humans. I have provided all of this, but I didn't know if you knew what was going on. Also, I have an image of the final program.
Note: I added the second depth frame for perspective
I can't get this to work in my AS1 application. I am using the Color.setTransform method.
Am I correct in thinking the following object creation should result in transforming a colour to white?
var AColorTransform = {ra:100, rb:255, ga:100, gb:255, ba:100, bb:255, aa:100, ab:255};
And this one to black?
AColorTransform = {ra:100, rb:-255, ga:100, gb:-255, ba:100, bb:-255, aa:100, ab:-255};
I read on some websites that calling setRGB or setTransform may not result in actually changing the display colour when the object you're performing the operation on has some kind of dynamic behaviour. Does anyone know more about these situations? And how to change the colour under all circumstances?
Regards.
It's been a long time since I've had to do anything in AS1, but I'll do my best.
The basic code for a color.setTransform() looks like this...
var AColorTransform = {ra:100, rb:255, ga:100, gb:255, ba:100, bb:255, aa:100, ab:255};
var myColor = new Color(mc);
myColor.setTransform(AColorTransform);
...where mc is a MovieClip on the stage somewhere.
Remember that you're asking about transform, which by its nature is intended to transform colors from what they are to something else. If you want to reliably paint in a specific color (such as black or white), you're usually far better off using setRGB, which would look like this:
var myColor = new Color(mc);
//set to black
myColor.setRGB(0x000000);
//or set to white
myColor.setRGB(0xFFFFFF);
These work reliably, though there can be some gotchas. Generally, just remember that the color is attached to the specific MovieClip...so if that MovieClip falls out of scope (ie, it disappears from the timeline) your color will be deleted with it.
Read further only if you want to understand color transform better:
Let's look at the components of that color transform.
channel   a (multiplier, 0 > 100%)   b (offset, -255 > 255)
r         ra                         rb
g         ga                         gb
b         ba                         bb
a         aa                         ab
There are four channels (r, g, b, and a). The first three are for red, green and blue, and the last one for alpha (transparency). Each channel has an 'a' component and a 'b' component, thus ra, rb, ga, gb, etc. The 'a' component is a percentage multiplier. That is, it will multiply any existing channel by the percent in that value. The 'b' component is an offset. So 'ra' multiplies the existing red channel. 'rb' offsets it. If your red channel starts as 'FF' (full on red), setting ra:100 will have no effect, since multiplying FF by 100% results in no change. Similarly, if red starts at '00' (no red at all), no value of 'ra' will have any effect, since (if you recall your Shakespeare) twice nothing is still nothing. Things in-between will multiply as you'd expect.
Offsets are added after multiplication. So you can multiply by some value, then offset it:
r (result red color) = (RR * ra%) + rb
g (result green color) = (GG * ga%) + gb
b (result blue color) = (BB * ba%) + bb
a (result alpha) = (AA * aa%) + ab
example: RR = 128 (hex 0x80), ra = 50 (50% or .5), rb = -20
resulting red channel: (128 * .5) + (-20) = 44 (hex 0x2C)
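If it helps to sanity-check that arithmetic, here is the per-channel formula as a tiny illustrative Python function (not AS1; the 0-255 clamp just keeps the result in display range):
def transform_channel(value, a, b):
    # multiply the existing channel by a percent, then add the offset
    result = value * (a / 100.0) + b
    return max(0, min(255, int(round(result))))

print(transform_channel(128, 50, -20))   # 44, i.e. hex 0x2C, matching the example above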
Frankly, this all gets so confusing that I tend to prefer the simple sanity of avoiding transforms altogether and going with the much simpler setRGB().