Vehicle Counting Based on BackgroundSubtractorMOG - opencv

I'm working on a project called ATCS(Automatic Traffic Controller System), it will modify traffic light duration based on the vehicle amount in front of the traffic light.
I used openCV and backgroundsubtractorMOG to detect vehicle, it runs successfully when vehicles are moving, but when the red signal turned on, all vehicle are uncountable. Of course that will make my software doesn't work.
So far I know that backgroundsubtractorMOG is best solution because this system work in many variation of wheather, light intensity, etc. It will compare current frame and previous frame so the moving object are detected as foreground (CMIIW). so how about the vehicle that was move and have stopped -- because the red signal of the traffic light is on and it force the driver to stop their vehicle? Will it still be detected as foreground object?
So I want to ask the most suitable algorithm to do it. How to count amount of vehicle when it moving, also when the vehicle stop moving because the red signal -- it still detected as vehicle.
thank you :)

How do you update your background? Because of changes in the lighting condition (clouds, day, night, dusk, weather) you cannot keep it statistic, however the presence of a stopped car could be still detectable if you still know the appearance of the background, that is the appearance of the road if the car is not there.
If you have an area in the image where car do not pass, you can use that to understand if the lighting conditions are changing.
Which is your view angle for the vehicles? There is chance that combining a Viola Jones detector with a KLT tracker you obtain better and more general results.

If background subtraction works for you (as you said), I would try to add another background model. Then you can perform background subtraction two times, once for the previous image (works for all moving objects) and once for your long term background model which will detect all stopped vehicles (and moving ones too) but might have some disadvantages for differing lighting conditions.
You could have a look at ViBe or Gaussian-Mixture-Models for creating those background models.
The other way would be to introduce some tracking mechanism as Antonio already mentioned. Once a vehicle is detected by background subtraction (only moving objects will appear in the image), you start a tracking and you will know that they're there even if they aren't detected again (because they don't move).
So you need a tracking method that's not "tracking by detection" but some other method. I would recommend Kalman Filter or Particle Filtering, or maybe mean-shift tracking.
EDIT: one method often used for vehicle detection which is similar to background subtraction techniques are Local Binary Patterns (LBP)

I would suggest using latent SVM detector with the "car" and "bus" model to detect vehicles and then apply simple tracking on the bounding boxes you get.
Latent SVM detector:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html

Related

How to extract foreground form a moving camera by using Opencv

I am performing motion detection using OpenCV. Challenge is that camera is moving so frame differencing is not a technique to be used directly. So I am trying to separate foreground and background and after that performing frame differencing on foreground images.
Ques is that how to separate foreground and background from a video taken from moving camera??
Any help from your side would be thankful to you!!
Subtracting background is an important technique to generate a foreground mask and its used widely in applications.
Many of the techniques which are used to get background subtraction assumes that camera is constant. Up to now I didn't see any paper or example which works with a moving camera.
You may have a look at the opencv example in here, which is very useful for background subtraction. Especially MOG2 and KNN algorithms are really good at to find shadows which is a problem in background subtraction. I suggest you to take the history parameters of these algorithms so low(1 to 10), by doing this you may get some good results even I don't think.
The best but the difficult way I suggest you is that using an AI. If your desired objects are specific (like people, car etc. ), you can use an AI to detect those objects and subtract the rest of the frame. The most proper AI algorithm for this problem is Mask R-CNN which will detect the mask of the objects also. Here are the some examples of this:
Reference 1
Reference 2
Reference 3
Reference 4

How do I detect multiple people and track them in opencv/emgucv

I have been trying to detect multiple people in a small space and hence track them.
Input: CCTV feed from a camera mounted in a small room.
Expected Output: Track and hence store the path that people take while moving from one end of the room to the other.
I tried to implement some of the basic methods like background subtraction and pedestrian detection. But the results are not as desired.
In the results obtained by implementing background subtraction, due to occlusion the blob is not one single entity(the blob of one person is broken into multiple small blobs) hence, detecting it as a single person is very difficult.
Now, consider the case when there are many people standing close to each other. In this case detecting people using simple background subtraction is a complete disaster.
Is there a better way to detect multiple people?
Or maybe is there a way to improve the result of background subtraction?
And please suggest a good way for tracking multiple people?
that's a quite hard problem and there is no out-of-the-box solution, so you might have to try different methods.
In the beginning you will want to make some assumptions like static camera position and everything that's not background is a person or part of a person, maybe multiple persons. Persons can't appear within the image but they will have to 'enter' it (and are detected on entering and tracked after detection).
Detection and tracking can both be difficult problems so you might want to focus on one of them first. I would start with tracking and choose a probabilisic tracking method, since simple tracking methods like tracking by detection probably can't handle overlap and multiple targets very well.
Tracking:
I would try a particle filter, like http://www.irisa.fr/vista/Papers/2002/perez_hue_eccv02.pdf
which is capable to track multiple targets.
Detection: There is a HoG Person Detector in OpenCV which works quite fine for upright persons
HOGDescriptor hog;
hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
but it's good to know the approximate size of a person in the image and scale the image accordingly. You can do this after background subtraction by scaling the blobs or combination of blobs, or you use a calibration of your camera and scale image parts of size 1.6m to 2.0m to your HoG detector size. Otherwise you might have many misses and many false alarms.
In the end you will have to work and research for some time to get the things running, but don't expect early success or 100% hit rates ;)
I would create a sample video and work on that, manually masking entering people as detection and implement the tracker with those detections.

How to Handle Occlusion and Fragmentation

I am trying to implement a people counting system using computer vision for uni project. Currently, my method is:
Background subtraction using MOG2
Morphological filter to remove noise
Track blob
Count blob passing a specified region (a line)
The problem is if people come as group, my method only counts one people. From my readings, I believe this is what called as occlusion. Another problem is when people looks similar to background (use dark clothing and passing a black pillar/wall), the blob is separated while it is actually one person.
From what I read, I should implement a detector + tracker (e.g. detect human using HOG). But my detection result is poor (e.g. 50% false positives with 50% hit rate; using OpenCV human detector and my own trained detector) so I am not convinced to use the detector as basis for tracking. Thanks for your answers and time for reading this post!
Tracking people in video surveillance sequences is still an open problem in the research community. However particule filters (PF) (aka sequential monte-carlo) gives good results towards occlusion and complex scene. You should read this. There is also extra links to example source code after biblio.
An advantage on using PF is the gain in computational time towards tracking by detection (only).
If you go this way, feel free to ask for better understanding about the maths behind the PF.
There is no single "good" answer to this as handling occlusion (and background substraction) are still open problems! There are several pointers that can be given that might help you along with your project.
You want to detect if a "blob" is one person or a group of people. There are several things you could do to handle this.
Use multiple cameras (it's unlikely that a group of people is detected as a single blob from all angles)
Try to detect parts of the human body. If you detect two heads on a single blob, there are multiple people. Same can be said for 3 legs, 5 shoulders, etc.
On the area of tracking a "lost" person (one walking behind another object), is to extrapolate it's position. You know that a person can only move so much in between frames. By holding this into account, you know that it's impossible for a user to be detected in the middle of your image and then suddenly disappear. After several frames of not seeing that person, you can discard the observation, as the person might have had enough time to move away.

single person tracking from video sequence

As a part of my thesis work, I need to build a program for human tracking from video or image sequence like the KTH or IXMAS dataset with the assumptions:
Illumination remains unchanged
Only one person appear in the scene
The program need to perform in real-time
I have searched a lot but still can not find a good solution.
Please suggest me a good method or an existing program that is suitable.
Case 1 - If camera static
If the camera is static, it is really simple to track one person.
You can apply a method called background subtraction.
Here, for better results, you need a bare image from camera, with no persons in it. It is the background. ( It can also be done, even if you don't have this background image. But if you have it, better. I will tell at end what to do if no background image)
Now start capture from camera. Take first frame,convert both to grayscale, smooth both images to avoid noise.
Subtract background image from frame.
If the frame has no change wrt background image (ie no person), you get a black image ( Of course there will be some noise, we can remove it). If there is change, ie person walked into frame, you will get an image with person and background as black.
Now threshold the image for a suitable value.
Apply some erosion to remove small granular noise. Apply dilation after that.
Now find contours. Most probably there will be one contour,ie the person.
Find centroid or whatever you want for this person to track.
Now suppose you don't have a background image, you can find it using cvRunningAvg function. It finds running average of frames from your video which you use to track. But you can obviously understand, first method is better, if you get background image.
Here is the implementation of above method using cvRunningAvg.
Case 2 - Camera not static
Here background subtraction won't give good result, since you can't get a fixed background.
Then OpenCV come with a sample for people detection sample. Use it.
This is the file: peopledetect.cpp
I also recommend you to visit this SOF which deals with almost same problem: How can I detect and track people using OpenCV?
One possible solution is to use feature points tracking algorithm.
Look at this book:
Laganiere Robert - OpenCV 2 Computer Vision Application Programming Cookbook - 2011
p. 266
Full algorithm is already implemented in this book, using opencv.
The above method : a simple frame differencing followed by dilation and erosion would work, in case of a simple clean scene with just the motion of the person walking with absolutely no other motion or illumination changes. Also you are doing a detection every frame, as opposed to tracking. In this specific scenario, it might not be much more difficult to track either. Movement direction and speed : you can just run Lucas Kanade on the difference images.
At the core of it, what you need is a person detector followed by a tracker. Tracker can be either point based (Lucas Kanade or Horn and Schunck) or using Kalman filter or any of those kind of tracking for bounding boxes or blobs.
A lot of vision problems are ill-posed, some some amount of structure/constraints, helps to solve it considerably faster. Few questions to ask would be these :
Is the camera moving : No quite easy, Yes : Much harder, exactly what works depends on other conditions.
Is the scene constant except for the person
Is the person front facing / side-facing most of the time : Detect using viola jones or train one (adaboost with Haar or similar features) for side-facing face.
How spatially accurate do you need it to be, will a bounding box do it, or do you need a contour : Bounding box : just search (intelligently :)) in neighbourhood with SAD (sum of Abs Differences). Contour : Tougher, use contour based trackers.
Do you need the "tracklet" or only just the position of the person at each frame, temporal accuracy ?
What resolution are we speaking about here, since you need real time :
Is the scene sparse like the sequences or would it be cluttered ?
Is there any other motion in the sequence ?
Offline or Online ?
If you develop in .NET you can use the Aforge.NET framework.
http://www.aforgenet.com/
I was a regular visitor of the forums and I seem to remember there are plenty of people using it for tracking people.
I've also used the framework for other non-related purposes and can say I highly recommend it for its ease of use and powerful features.

Image Processing: What are occlusions?

I'm developing an image processing project and I come across the word occlusion in many scientific papers, what do occlusions mean in the context of image processing? The dictionary is only giving a general definition. Can anyone describe them using an image as a context?
Occlusion means that there is something you want to see, but can't due to some property of your sensor setup, or some event.
Exactly how it manifests itself or how you deal with the problem will vary due to the problem at hand.
Some examples:
If you are developing a system which tracks objects (people, cars, ...) then occlusion occurs if an object you are tracking is hidden (occluded) by another object. Like two persons walking past each other, or a car that drives under a bridge.
The problem in this case is what you do when an object disappears and reappears again.
If you are using a range camera, then occlusion is areas where you do not have any information. Some laser range cameras works by transmitting a laser beam onto the surface you are examining and then having a camera setup which identifies the point of impact of that laser in the resulting image. That gives the 3D-coordinates of that point. However, since the camera and laser is not necessarily aligned there can be points on the examined surface which the camera can see but the laser can not hit (occlusion).
The problem here is more a matter of sensor setup.
The same can occur in stereo imaging if there are parts of the scene which are only seen by one of the two cameras. No range data can obviously be collected from these points.
There are probably more examples.
If you specify your problem, then maybe we can define what occlusion is in that case, and what problems it entails
The problem of occlusion is one of the main reasons why computer vision is hard in general. Specifically, this is much more problematic in Object Tracking. See the below figures:
Notice, how the lady's face is not completely visible in frames 0519 & 0835 as opposed to the face in frame 0005.
And here's one more picture where the face of the man is partially hidden in all three frames.
Notice in the below image how the tracking of the couple in red & green bounding box is lost in the middle frame due to occlusion (i.e. partially hidden by another person in front of them) but correctly tracked in the last frame when they become (almost) completely visible.
Picture courtesy: Stanford, USC
Occlusion is the one which blocks our view. In the image shown here, we can easily see the people in the front row. But the second row is partly visible and third row is much less visible. Here, we say that second row is partly occluded by first row, and third row is occluded by first and second rows.
We can see such occlusions in class rooms (students sitting in rows), traffic junctions (vehicles waiting for signal), forests (trees and plants), etc., when there are a lot of objects.
Additionally to what has been said I want to add the following:
For Object Tracking, an essential part in dealing with occlusions is writing an efficient cost function, which will be able to discriminate between the occluded object and the object that is occluding it. If the cost function is not ok, the object instances (ids) may swap and the object will be incorrectly tracked. There are numerous ways in which cost functions can be written some methods use CNNs[1] while some prefer to have more control and aggregate features[2]. The disadvantage of CNN models is that in case you are tracking objects that are in the training set in the presence of objects which are not in the training set, and the first ones get occluded, the tracker can latch onto the wrong object and may or may never recover. Here is a video showing this. The disadvantage of aggregate features is that you have to manually engineer the cost function, and this can take time and sometimes knowledge of advanced mathematics.
In the case of dense Stereo Vision reconstruction, occlusion happens when a region is seen with the left camera and not seen with the right(or vice versa). In the disparity map this occluded region appears black (because the corresponding pixels in that region have no equivalent in the other image). Some techniques use the so called background filling algorithms which fill the occluded black region with pixels coming from the background. Other reconstruction methods simply let those pixels with no values in the disparity map, because the pixels coming from the background filling method may be incorrect in those regions. Bellow you have the 3D projected points obtained using a dense stereo method. The points were rotated a bit to the right(in the 3D space). In the presented scenario the values in the disparity map which are occluded are left unreconstructed (with black) and due to this reason in the 3D image we see that black "shadow" behind the person.
As the other answers have explained the occlusion well, I will only add to that. Basically, there is semantic gap between us and the computers.
Computer actually see every image as the sequence of values, typically in the range 0-255, for every color in RGB Image. These values are indexed in the form of (row, col) for every point in the image. So if the objects change its position w.r.t the camera where some aspect of the object hides (lets hands of a person are not shown), computer will see different numbers (or edges or any other features) so this will change for the computer algorithm to detect, recognize or track the object.

Resources