Background Subtraction in OpenCV

I am trying to subtract two images using the absdiff function to extract a moving object. It works well, but sometimes the background appears in front of the foreground.
This happens when the background and foreground colors are similar. Is there any solution to overcome this problem?
In case the description above is not enough, I have attached images at the following link.
Thanks.

You can use some pre-processing techniques like edge detection and a contrast stretching algorithm, which will give you extra information for the subtraction. Even when the colors are the same, a new object should have texture features such as edges; if the edges are preserved properly, the image subtraction will recover the object.
Process flow:
Run an edge detection algorithm.
Apply a contrast stretching algorithm (like histogram stretching).
Overlay the detected edges on top of the contrast-stretched image.
Now use the image subtraction algorithm from OpenCV. A rough sketch of this flow is shown below.
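Something along these lines (the file names, blend weights and threshold are assumptions, not values from the answer):

import cv2

# Hypothetical grayscale inputs: a background frame and the current frame.
bg = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

def enhance(img):
    # Contrast stretching via simple min-max normalization to 0..255.
    stretched = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
    # Edge map, to preserve texture even where the colors match.
    edges = cv2.Canny(stretched, 50, 150)
    # Put the detected edges on top of the contrast-stretched image.
    return cv2.addWeighted(stretched, 0.7, edges, 0.3, 0)

# Image subtraction on the enhanced images.
diff = cv2.absdiff(enhance(frame), enhance(bg))
_, fg_mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)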

There isn't enough information to formulate a complete solution to your problem but there are some tips I can offer:
First, prefilter the input and background images using a strong median (or Gaussian) filter. This will make your results much more robust to image noise and to confusion from minor, non-essential detail (like the horizontal lines of your background image). Unless you want to detect a single moving strand of hair, you don't need to process the raw pixels.
Next, take the advice offered in the comments to test all 3 color channels as opposed to going straight to grayscale.
Then create a grayscale image from the max of the 3 absdiffs done on each channel.
Then perform your closing and opening procedure. A sketch of these steps is given below.
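For example (the file names, filter size and threshold are placeholders):

import cv2
import numpy as np

# Prefilter both images with a strong median filter.
bg = cv2.medianBlur(cv2.imread("background.png"), 7)
frame = cv2.medianBlur(cv2.imread("frame.png"), 7)

# Per-channel absolute differences, then the max across B, G and R.
diff = cv2.absdiff(frame, bg)
gray = np.max(diff, axis=2)

# Threshold, then close and open to clean up the mask.
_, mask = cv2.threshold(gray, 25, 255, cv2.THRESH_BINARY)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)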
I don't know your requirements so I can't take them into account. If accuracy is of the utmost importance, I'd use the median filter on the input image over the Gaussian. If speed is an issue, I'd scale down the input images for processing by at least half, then scale the result up again. If the camera is in a fixed position and you have a pre-calibrated background, then the current naive difference method should work. If the system has to detect movement in a real-world environment over an extended period of time (moving shadows, plants, vehicles, weather, etc.), then a rolling average (or Gaussian) background model will work better. If the camera is moving you will need to do a lot more processing, probably some optical flow and/or Fourier transform tests. All of these things need to be considered to provide the best solution for the application.

Related

Which method of object detection using OpenCV is best suited to changing environments?

I am trying to identify the best OpenCV methods and implementations to identify objects that appear in a live camera feed. The implementation needs to be robust to changing light conditions and would ideally accommodate slight movement in the background (trees/clouds moving) without picking up too much noise.
The options I have tried so far include:
- Identifying the absolute pixel differences between an empty background frame and the current frame (works poorly if light conditions change or if the camera jiggles)
- Background subtraction (good for changing conditions but results in excessive noise)
- I have also thought about using edge detection, which would be unaffected by changing light conditions, and somehow comparing object shapes to see what is new
I would ideally like an output that allows me to generate a bounding box for any objects that move around in the frame, with the background reference gradually changing over time to accommodate changing light conditions (or can an input frame be normalized for light so this has no effect?).
What would you recommend? I'm running OpenCV through Visual Studio 2017 in C++
I am doing a course project on a similar topic. I used several methods such as HSV/RGB thresholding, the Accumulative Difference Image, and MOG from the OpenCV library.
For your case I would recommend the Accumulative Difference Image, which is very similar to MOG. Basically, you subtract consecutive frames from each other (previous from current) and then threshold to convert to a binary image. You can then perform morphological operations on the binary image to improve the detection.
This method is relatively good for your case, as it is not sensitive to reasonably slow lighting changes and camera movement. However, stationary objects will not be detected.
In order to remove the noise you may want to erode and then dilate the binary image (play around with the morphological operations).
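A rough sketch of that loop (untested; the thresholds, kernel size and area cutoff are placeholders):

import cv2

cap = cv2.VideoCapture(0)                       # or a video file path
_, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Difference of consecutive frames, then threshold to a binary image.
    diff = cv2.absdiff(gray, prev)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Erode then dilate to suppress noise.
    mask = cv2.erode(mask, kernel)
    mask = cv2.dilate(mask, kernel, iterations=2)

    # Bounding boxes around the remaining motion blobs (OpenCV 4.x API).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    prev = gray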
Good luck!

Ideas to process challenging image

I'm working with an infrared image that is the output of a 3D sensor. This sensor projects an infrared pattern in order to build a depth map, and because of this the IR image has a lot of white spots that reduce its quality. So I want to process this image to make it smoother, in order to make it possible to detect objects lying on the surface.
The original image looks like this:
My objective is to have something like this (which I obtained by blocking the IR projector with my hand):
An "open" morphological operation does remove some noise, but I think there should first be a noise removal step that addresses the white dots.
Any ideas?
I should mention that the noise reduction algorithm has to run in real time.
A median filter would be my first attempt, possibly followed by a Gaussian blur. It really depends on what you want to do with it afterwards.
For example, here's your original image after a 5x5 median filter and 5x5 Gaussian blur:
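In OpenCV that is just two calls (assuming an 8-bit grayscale frame; the file name is a placeholder):

import cv2

img = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)

# 5x5 median filter followed by a 5x5 Gaussian blur.
smoothed = cv2.medianBlur(img, 5)
smoothed = cv2.GaussianBlur(smoothed, (5, 5), 0)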
The main difficulty in your images is the large radius of the white dots.
Median and morphologic filters should be of little help here.
Usually I'm not a big fan of these algorithms, but you seem to have a perfect use case for decomposing your images, on a functional space, into a sketch and an oscillatory component.
Basically, these algorithms aim at solving for the cartoon-like image X that approximates the observed image Y, and that differs from Y only through the removal of some oscillatory texture.
You can find a list of related papers and algorithms here.
(Disclaimer: I'm not Jérôme Gilles, but I know him, and I know that most of his algorithms were implemented in plain C, so I think most of them are practical to implement with OpenCV.)
Otherwise, if you want to try simpler implementations first, you can try:
taking the difference between the input image and a blurred version to see if it emphasizes the dots; if it does, you have an easy way to find and mark them. The output of this step may already be enough, but you may also want to fill in the former locations of the dots using inpainting (see the sketch after this list),
or applying anisotropic diffusion (like the Rudin-Osher-Fatemi equation) to see if the dots disappear. Despite its apparent complexity, this diffusion can be implemented easily and efficiently in OpenCV by applying the algorithms in this paper. TV diffusion can also be used for the inpainting step of the previous item.
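A sketch of the first option, with inpainting to fill the dots (the blur size and threshold are placeholders to tune):

import cv2

img = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)

# The difference between the image and a heavily blurred version
# emphasizes the small bright dots.
blurred = cv2.GaussianBlur(img, (21, 21), 0)
residual = cv2.subtract(img, blurred)
_, dot_mask = cv2.threshold(residual, 20, 255, cv2.THRESH_BINARY)

# Grow the mask slightly, then fill the dot locations by inpainting.
dot_mask = cv2.dilate(dot_mask, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
cleaned = cv2.inpaint(img, dot_mask, 3, cv2.INPAINT_TELEA)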
My main point with the noise removal was to get a cleaner image so it would be easier to detect objects. However, as I tried to find a solution, I realized that it was unrealistic to remove all the noise using on-the-fly noise removal algorithms, since most of the image is actually noise. So I had to find the objects despite those conditions. Here is my approach (a rough code sketch follows the steps):
1 - Initial image
2 - Background subtraction followed by an opening operation to smooth the noise
3 - Binary threshold
4 - Morphological close operation to make sure the object has no edge discontinuities (necessary for thin objects)
5 - Fill holes + opening morphological operations to remove small noise blobs
6 - Detection
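Roughly, in OpenCV terms (a sketch only; the file names, threshold and kernel size are placeholders, and the hole filling assumes pixel (0, 0) belongs to the background):

import cv2
import numpy as np

bg = cv2.imread("background_ir.png", cv2.IMREAD_GRAYSCALE)   # pre-recorded background
frame = cv2.imread("frame_ir.png", cv2.IMREAD_GRAYSCALE)     # 1 - initial image
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# 2 - background subtraction followed by opening to smooth noise
diff = cv2.morphologyEx(cv2.absdiff(frame, bg), cv2.MORPH_OPEN, kernel)

# 3 - binary threshold
_, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

# 4 - close to remove edge discontinuities on thin objects
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=2)

# 5 - fill holes (flood-fill the background from a corner, invert, OR) + opening
flood = mask.copy()
ff_mask = np.zeros((mask.shape[0] + 2, mask.shape[1] + 2), np.uint8)
cv2.floodFill(flood, ff_mask, (0, 0), 255)
mask = cv2.morphologyEx(mask | cv2.bitwise_not(flood), cv2.MORPH_OPEN, kernel)

# 6 - detection: bounding boxes around the remaining blobs (OpenCV 4.x API)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 200]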
Is the IR projected pattern fixed, or does it change over time?
In the second case, you could try to take advantage of the movement of the dots.
For instance, you could acquire a sequence of images and assign each pixel of the result image to the minimum (or a very low percentile) value of the sequence.
Edit: here is a Python script you might want to try
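For instance, something along these lines (a sketch only, not the original script; it assumes a static camera and grayscale frames):

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                 # the IR stream
frames = []
for _ in range(15):                       # acquire a short sequence
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))

stack = np.stack(frames, axis=0)

# Per-pixel minimum (or a low percentile) over the sequence: since the
# projected dots move, each pixel should be dot-free in at least one frame.
result_min = stack.min(axis=0)
result_p10 = np.percentile(stack, 10, axis=0).astype(np.uint8)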

What is the correct method to auto-crop objects from light background?

I'm trying to extract objects from scanned images. There could be a few documents on a white background, and I need to crop and rotate them automatically. This seems like a rather simple task, but I've got stuck at some point and get bad results all the time.
I've tried to:
Binarise the image and get connected components by performing morphological operations.
Perform watershed segmentation by using dilated and eroded binary images as mask components.
Apply Canny detector and fill the contours.
None of this gets me good results. If the object doesn't have contrasting edges (i.e. a piece of paper on a white background), it splits into a lot of separate components. If I connect these components by applying excessive dilation, background noise also expands and everything becomes a mess.
For example, I have an image:
After applying Canny detector and filling the contours I get something like this:
As you can see, the components are not connected. They are even too far from each other to be connected by a reasonable amount of dilation. And when I apply watershed to this mask combined with some background points, it yields very bad results.
Some images are noisy:
In this particular case I was able to obtain the contour of the whole passport with the Canny detector because of its contrasting edges. But the threshold method doesn't work here.
If the images are always on a very light background, then you can binarize with a threshold close to the maximum possible value. After that it is a matter of correcting the binary image to get the objects, but this step will vary depending on what your other images look like.
For instance, the following image at left is what we get with a threshold at 99% of the maximum value after Gaussian filtering on the input. After removing components connected to the border and other small components, and also combining with some basic morphological tools, we get the image at right.
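Something like this, as a sketch (the 99% factor, blur size, area cutoff and file name are placeholders to adjust):

import cv2
import numpy as np

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# Binarize with a threshold close to the maximum value (99% of the image maximum);
# the inverted threshold marks everything darker than the background as object.
t = 0.99 * float(gray.max())
_, binary = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY_INV)

# Remove components touching the border and very small components.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
clean = np.zeros_like(binary)
h, w = binary.shape
for i in range(1, n):
    x, y, bw, bh, area = stats[i]
    touches_border = x == 0 or y == 0 or x + bw == w or y + bh == h
    if area > 500 and not touches_border:
        clean[labels == i] = 255

# Some basic morphology to consolidate each document into one blob.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
clean = cv2.morphologyEx(clean, cv2.MORPH_CLOSE, kernel)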
This may seem a bit wishy-washy but bear with me:
This looks like quite a challenging case for image processing recipes involving only edge detection, morphological operations and segmentation.
What you are not exploiting here is that you (I believe) know what your document should look like. You are currently looking at completely general solutions which do not take into account this prior knowledge. If you can get some training data then you can go all the way from simple template/patch-based matching (SSD, Normalized Cross-Correlation) to more sophisticated object detection techniques to find the position and rotation of your documents.
My guess is that if your objects are always more or less the same and at the same scale (e.g. passports scanned at a fixed resolution/similar machines) then you can get away with a fairly crude approach. There won't be any one correct method. It's also likely that the technique you end up using will not work until you have done a significant amount of parameter tweaking, so don't give up on anything too quickly.

how to remove background image and get fore image

There are two images:
http://bbs.shoucangshidai.com/attachments/month_1001/1001211535bd7a644e95187acd.jpg
http://bbs.shoucangshidai.com/attachments/month_1001/10012115357cfe13c148d3d8da.jpg
One is a background image and the other is a person's photo with the same background and the same size. What I want to do is remove the second image's background and extract only the person's profile. The common method is to subtract the first image from the second one, but my problem is that when the color of the person's clothes is similar to the background, the result of the subtraction is awful and I cannot get the whole person's profile. Does anyone have a good idea for removing the background? Please give me some advice.
Thank you in advance.
If you have a good estimate of the image background, subtracting it from the image with the person is a good first step. But it is only the first step. After that, you have to segment the image, i.e. you have to partition the image into "background" and "foreground" pixels, with constraints like these:
in the foreground areas, the average difference from the background image should be high
in the background areas, the average difference from the background image should be low
the areas should be smooth. Outline length and curvature should be minimal.
the borders of the areas should have a high contrast in the source image
If you are mathematically inclined, these constraints can be modeled perfectly with the Mumford-Shah functional. See here for more information.
But you can probably adapt other segmentation algorithms to the problem.
If you want a fast and simple (but not perfect) version, you could try this:
subtract the two images
find the largest connected "blob" of pixels with a background-foreground difference greater than some threshold. This is the first rough estimate for the "person area" in the foreground image, but the segmentation does not meet criteria 3 and 4 above.
Find the outline of the largest blob (EDIT: Note that you don't have to start at the outline. You can also start with a larger polygon, as the steps will automatically shrink it to the optimal position.)
now go through each point in the outline and smooth the outline. i.e. for each point find the point that minimizes the formula: c1*L - c2*G, where L is the length of the outline polygon if the point were moved here and G is the gradient at the location the point would be moved to, c1/c2 are constants to control the process. Move the point to that position. This has the effect of smoothing the contour polygon in areas of low gradient in the source image, while keeping it tied to high gradients in the source image (i.e. the visible borders of the person). You can try different expressions for L and G, for example, L could take the length and curvature into account, and G could also take the gradient in the background and subtracted images into account.
you probably will have to re-normalize the outline polygon, i.e. make sure that the points on the outline are spaced regularly. Either that, or make sure that the distances between the points stay regular in the step before. ("Geodesic Snakes")
repeat the last two steps until convergence
You now have an outline polygon that touches the visible person-background border and continues smoothly where the border is not visible or has low contrast.
Look up "Snakes" (e.g. here) for more information.
Low-pass filter (blur) the images before you subtract them.
Then use that difference signal as a mask to select the pixels of interest.
A wide-enough filter will ignore the too-small (high-frequency) features that end up carving out "awful" regions inside your object of interest. It'll also reduce the highlighting of pixel-level noise and misalignment (the highest-frequency information).
In addition, if you have more than two frames, introducing some time hysteresis will let you form more stable regions of interest over time too.
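For example (a sketch; the blur size and threshold are placeholders):

import cv2

bg = cv2.imread("background.jpg")
img = cv2.imread("person.jpg")

# Low-pass filter (blur) both images before subtracting.
bg_blur = cv2.GaussianBlur(bg, (21, 21), 0)
img_blur = cv2.GaussianBlur(img, (21, 21), 0)

# Use the difference as a mask to select the pixels of interest.
diff = cv2.cvtColor(cv2.absdiff(img_blur, bg_blur), cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
foreground = cv2.bitwise_and(img, img, mask=mask)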
One technique that I think is common is to use a mixture model. Grab a number of background frames and for each pixel build a mixture model for its color.
When you apply a frame with the person in it you will get some probability that the color is foreground or background, given the probability densities in the mixture model for each pixel.
After you have P(pixel is foreground) and P(pixel is background) you could just threshold the probability images.
Another possibility is to use the probabilities as inputs in some more clever segmentation algorithm. One example is graph cuts which I have noticed works quite well.
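OpenCV ships a per-pixel Gaussian mixture background model (MOG2) that does the modelling and thresholding for you; a minimal sketch (the parameters shown are just the usual defaults):

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("video.avi")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The returned mask is already thresholded: 255 = foreground,
    # 127 = shadow (when detectShadows is enabled), 0 = background.
    fg_mask = subtractor.apply(frame)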
However, if the person is wearing clothes that are visually indistinguishable from the background, obviously none of the methods described above would work. You'd either have to get another sensor (like IR or UV) or have a quite elaborate "person model" which could "add" the legs in the right position if it finds what it thinks is a torso and head.
Good luck with the project!
Background vs foreground detection is very subjective. The application scenario defines what is background and what is foreground. However, in the application you describe, I guess you are implicitly saying that the person is the foreground.
Using the above assumption, what you seek is a person detection algorithm. A possible solution is:
Run a Haar feature detector + boosted cascade of weak classifiers (see the OpenCV wiki for details)
Compute inter-frame motion (differences)
If there is a positive face detection for a frame, cluster motion pixels around the face (kNN algorithm)
Voila... you should have a simple person detector.
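The face detection and frame differencing halves of that recipe look roughly like this (a sketch; the clustering step is left out, and the cascade file is the frontal-face model shipped with the opencv-python package):

import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Inter-frame motion pixels (these would be clustered around each face).
    motion = cv2.threshold(cv2.absdiff(gray, prev), 25, 255, cv2.THRESH_BINARY)[1]
    prev = gray

    # Positive face detections.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)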
Post the photo on Craigslist and tell them that you'll pay $5 for someone to do it.
Guaranteed you'll get hits in minutes.
Instead of a straight subtraction, you could step through both images, pixel by pixel, and only "subtract" the pixels which are exactly the same. That of course won't account for minor variances in colors, though.
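In NumPy that check is short (a sketch; the file names are placeholders):

import cv2
import numpy as np

img1 = cv2.imread("background.jpg")
img2 = cv2.imread("person.jpg")

# Blank out only the pixels that are exactly equal in both images.
same = np.all(img1 == img2, axis=2)
result = img2.copy()
result[same] = 0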

Adaptive threshold Binarization's bad effects

I implemented some adaptive binarization methods; they use a small window, and at each pixel the threshold value is calculated. There are problems with these methods:
If we select the window size too small we will get this effect (I think the reason is that the window size is too small):
(source: piccy.info)
At the upper left corner there is the original image, upper right corner - the global threshold result. Bottom left - an example of dividing the image into parts (but I am talking about analyzing a pixel's small surrounding, for example a window of size 10x10).
So you can see the result of such algorithms in the bottom right picture: we got a black area, but it should be white.
Does anybody know how to improve an algorithm to solve this problem?
There should be quite a lot of research going on in this area, but unfortunately I have no good links to give.
An idea, which might work but I have not tested, is to try to estimate the lighting variations and then remove that before thresholding (which is a better term than "binarization").
The problem is then moved from adaptive thresholding to finding a good lighting model.
If you know anything about the light sources then you could of course build a model from that.
Otherwise a quick hack that might work is to apply a really heavy low pass filter to your image (blur it) and then use that as your lighting model. Then create a difference image between the original and the blurred version, and threshold that.
EDIT: After quick testing, it appears that my "quick hack" is not really going to work at all. After thinking about it I am not very surprised either :)
I = someImage
Ib = blur(I, 'a lot!')                    % heavy low-pass filter as the lighting model
Idiff = I - Ib                            % difference between original and blurred version
It = threshold(Idiff, 'some global threshold')
EDIT 2
Got one other idea which could work depending on how your images are generated.
Try estimating the lighting model from the first few rows in the image:
Take the first N rows in the image
Create a mean row from the N collected rows. You now have one row as your background model.
For each row in the image subtract the background model row (the mean row).
Threshold the resulting image.
Unfortunately I am at home without any good tools to test this.
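Untested, but the row-based idea might look roughly like this (N and the use of Otsu as the global threshold are arbitrary choices):

import cv2
import numpy as np

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Background model: the mean of the first N rows.
N = 10
model_row = gray[:N].mean(axis=0)

# Subtract the model row from every row, then apply a global threshold.
corrected = gray - model_row[np.newaxis, :]
corrected = cv2.normalize(corrected, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
_, binary = cv2.threshold(corrected, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)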
It looks like you're doing adaptive thresholding wrong. Your images look as if you divided your image into small blocks, calculated a threshold for each block and applied that threshold to the whole block. That would explain the "box" artifacts. Usually, adaptive thresholding means finding a threshold for each pixel separately, with a separate window centered around the pixel.
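That per-pixel windowed thresholding is what cv2.adaptiveThreshold does; for example (blockSize and C are just starting values to tune):

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# A separate threshold is computed for each pixel from the (Gaussian-weighted)
# mean of the blockSize x blockSize window centered on that pixel, minus C.
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, blockSize=31, C=10)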
Another suggestion would be to build a global model for your lighting: in your sample image, I'm pretty sure you could fit a plane (in X/Y/brightness space) to the image using least squares, then separate the pixels into pixels brighter than that plane (foreground) and darker than it (background). You can then fit separate planes to the background and foreground pixels, threshold using the mean between these planes again, and improve the segmentation iteratively. How well that works in practice depends on how well your lighting can be modeled with a linear model.
If the actual objects you are trying to segment are "thinner" (you said something about barcodes in a comment), you could try a simple opening/closing operation to get a lighting model (i.e. close the image to remove the foreground pixels, then use [closed image + X] as the threshold).
Or, you could try mean-shift filtering to get the foreground and background pixels to the same brightness. (Personally, I'd try that one first)
You have very non-uniform illumination and fairly large object (thus, no universal easy way to extract the background and correct the non-uniformity). This basically means you can not use global thresholding at all, you need adaptive thresholding.
You may want to try Niblack binarization. Matlab code is available at http://www.uio.no/studier/emner/matnat/ifi/INF3300/h06/undervisningsmateriale/week-36-2006-solution.pdf (page 4).
There are two parameters you'll have to tune by hand: window size (N in the above code) and weight.
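A rough NumPy/OpenCV equivalent of the Niblack rule T = mean + k * std (the window size N and weight k shown are just common starting points; I believe the opencv-contrib ximgproc module also provides cv2.ximgproc.niBlackThreshold):

import cv2
import numpy as np

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

N = 31           # window size
k = -0.2         # weight (negative for dark text on a light background)

# Local mean and standard deviation over an N x N window.
mean = cv2.boxFilter(gray, cv2.CV_32F, (N, N))
sq_mean = cv2.boxFilter(gray * gray, cv2.CV_32F, (N, N))
std = np.sqrt(np.maximum(sq_mean - mean * mean, 0))

# Niblack threshold per pixel.
threshold = mean + k * std
binary = np.where(gray > threshold, 255, 0).astype(np.uint8)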
Try to apply a local adaptive threshold using this procedure:
convolve the image with a mean or median filter
subtract the original image from the convolved one
threshold the difference image
The local adaptive threshold method selects an individual threshold for each pixel.
I'm using this approach extensively and it works fine with images that have a non-uniform background.
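In OpenCV terms, roughly (a sketch; the filter size and threshold are placeholders; a median filter is used here, a mean filter works the same way):

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# 1. Convolve the image with a mean or median filter.
background = cv2.medianBlur(gray, 31)

# 2. Subtract the original image from the filtered one (dark text becomes bright).
diff = cv2.subtract(background, gray)

# 3. Threshold the difference image.
_, binary = cv2.threshold(diff, 15, 255, cv2.THRESH_BINARY)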
