I've been testing OpenCV on an RPi using Python. The video comes from a USB grabber attached to a CCTV camera.
I tested it in a room with 'ideal' stick figures and it worked great, tracking and zooming automatically.
However when testing in the real world, the first test location has a corrugated roof in view and the vertical lines of the roof always get detected as a person.
I was very surprised by this as the HoG detection seemed quite robust against bushes, trees and other optically jumbled images. A series of vertical lines seems to catch it out every time.
Why might this be?
Do I need to look at trying to re-train it? I would imagine this would be quite a task!
Has anyone else found this issue?
Maybe I should try and pre-filter the vertical lines out of the image?
Having a person tracker that can't cope with fences or roofs is a bit of a limitation!
Having false positives after just a single training session is common and should be expected. You should now record all these false positives and use them for hard negative training. That is, you add these false positives to the negative training set. Once you perform hard negative training, your model should perform much better and the number of false positives will drop.
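A minimal sketch of that collection step, assuming the stock OpenCV people detector and a folder of frames known to contain no people (the folder name and window size here are placeholders):

```python
import cv2
import glob

# Stock HOG person detector; whatever it fires on in person-free frames
# becomes a hard negative for the next training round.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

hard_negatives = []
for path in glob.glob("frames_without_people/*.png"):   # hypothetical folder
    frame = cv2.imread(path)
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        crop = frame[y:y + h, x:x + w]
        # Resize to the HOG window size so the crop can join the negative set.
        hard_negatives.append(cv2.resize(crop, (64, 128)))

# These crops are then added to the negative set and the SVM retrained.
# Note the retraining itself needs your own HOG+SVM pipeline; the default
# detector shipped with OpenCV cannot be updated in place.
```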
Understanding why the fence and other edges show up as false positives is a bit complicated to explain here; it is covered much better by the many articles on HOG and by the original HOG paper by Dalal and Triggs, which I would highly recommend.
I have many hours of video captured by an infrared camera placed by marine biologists in a canal. Their research goal is to count herring that swim past the camera. It is too time consuming to watch each video, so they'd like to employ some computer vision to help them filter out the frames that do not contain fish. They can tolerate some false positives and false negatives, and we do not have sufficient tagged training data yet, so we cannot use a more sophisticated machine learning approach at this point.
I am using a process that looks like this for each frame (a rough sketch of the loop follows the list):
Load the frame from the video
Apply a Gaussian (or median) blur
Subtract the background using the BackgroundSubtractorMOG2 class
Apply a brightness threshold — the fish tend to reflect the sunlight, or an infrared light that is turned on at night — and dilate
Compute the total area of all of the contours in the image
If this area is greater than a certain percentage of the frame, the frame may contain fish. Extract the frame.
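Not my exact code, but a boiled-down sketch of that per-frame loop; the brightness threshold, kernel sizes and area fraction are placeholders to be tuned, and the video path is hypothetical:

```python
import cv2

cap = cv2.VideoCapture("canal.mp4")              # hypothetical input file
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

BRIGHTNESS = 200        # placeholder threshold
AREA_FRACTION = 0.002   # placeholder "percentage of the frame"

while True:
    ok, frame = cap.read()
    if not ok:
        break

    blurred = cv2.GaussianBlur(frame, (5, 5), 0)         # or cv2.medianBlur
    fg_mask = subtractor.apply(blurred)

    # Bright pixels (fish reflecting sun / IR light) that are also moving.
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)
    _, bright = cv2.threshold(gray, BRIGHTNESS, 255, cv2.THRESH_BINARY)
    candidate = cv2.bitwise_and(bright, fg_mask)
    candidate = cv2.dilate(candidate, None, iterations=2)

    # OpenCV 4.x signature; OpenCV 3.x returns an extra first value.
    contours, _ = cv2.findContours(candidate, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    total_area = sum(cv2.contourArea(c) for c in contours)

    if total_area > AREA_FRACTION * frame.shape[0] * frame.shape[1]:
        pass  # candidate frame: record its index/timestamp for review
```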
To find optimal parameters for these operations, such as the blur algorithm and its kernel size, the brightness threshold, etc., I've taken a manually tagged video and run many versions of the detector algorithm using an evolutionary algorithm to guide me to optimal parameters.
However, even the best parameter set I can find still creates many false negatives (about 2/3rds of the fish are not detected) and false positives (about 80% of the detected frames in fact contain no fish).
I'm looking for ways that I might be able to improve the algorithm. I don't know specifically what direction to look in, but here are two ideas:
Can I identify the fish by the ellipse of their contour and its angle (they tend to be horizontal, or at an upward or downward angle, but not vertical or head-on)? A rough sketch of this idea follows the list.
Should I do something to normalize the lighting conditions so that the same brightness threshold works whether day or night?
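For the first idea, here is a minimal sketch of what I have in mind, assuming the contours from the loop above; the elongation and tilt limits are made-up values to be tuned:

```python
import cv2
import numpy as np

def fish_like(contour, min_elongation=2.0, max_tilt_deg=45.0):
    """Keep elongated blobs that are not close to vertical."""
    if len(contour) < 5:                      # fitEllipse needs >= 5 points
        return False
    (_, _), (ax1, ax2), _ = cv2.fitEllipse(contour)
    elongation = max(ax1, ax2) / max(min(ax1, ax2), 1e-6)

    # Major-axis tilt from horizontal, computed from central moments.
    m = cv2.moments(contour)
    if m["m00"] == 0:
        return False
    tilt = 0.5 * np.degrees(np.arctan2(2 * m["mu11"], m["mu20"] - m["mu02"]))

    return elongation >= min_elongation and abs(tilt) <= max_tilt_deg
```

In the frame loop, summing the areas of only the `fish_like(c)` contours instead of all contours would then be the actual filter.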
(I'm a novice when it comes to OpenCV, so examples are very appreciated.)
I think you're heading in the right direction. Your camera is fixed, so it will be easy to extract the fish.
But you're lacking a good tool to accelerate the process; believe me, coding everything by hand will cost you a lot of time.
Personally, in the past I chose a small subset of the data first. Then I used bgslibrary to check which background subtraction method worked for my data. Then I coded the program by hand to run over the entire dataset. The GUI is very easy to use and the library is awesome.
GUI video
Hope this will help you.
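If installing bgslibrary is a hurdle, OpenCV itself ships a couple of subtractors you can compare the same way on a short test clip first; a quick sketch (the clip name is a placeholder):

```python
import cv2

# Hypothetical short test clip; the point is just to eyeball which mask
# looks cleanest before committing to one method for the full dataset.
cap = cv2.VideoCapture("sample_clip.mp4")

subtractors = {
    "MOG2": cv2.createBackgroundSubtractorMOG2(detectShadows=True),
    "KNN": cv2.createBackgroundSubtractorKNN(detectShadows=True),
}

while True:
    ok, frame = cap.read()
    if not ok:
        break
    for name, sub in subtractors.items():
        cv2.imshow(name, sub.apply(frame))
    if cv2.waitKey(30) & 0xFF == 27:          # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```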
I am working on a project which involves detection of people in various frames. The detector is able to detect most of the people in the frame sequence.
But it sometimes detects stationary background objects as people. I would really like to know why this happens and how the detector's inner workings lead to these false positives.
And what can be done to remove these false positives?
A sample of false positive detection:
As the authors of this paper imply in the title, "How Far are We from Solving Pedestrian Detection?", we haven't yet solved the problem of visual pedestrian detection in real scenarios; in fact, some think it will never be completely solved.
Detecting people in urban scenarios may rank among the most difficult tasks in computer vision. The scenes are cluttered with chaotic, random and unpredictable elements, pedestrians may be occluded, they may be hidden in shadow or in such dark environments that a camera can't see them. In fact, visual pedestrian detection remains one of the most important challenges to date.
And you aren't even using the best method in the state of the art. As you can see in the graphic below, it has been a long time since HOG was the best-performing algorithm for this task.
(image taken from "Pedestrian Detection: An Evaluation of the State of the Art" by Piotr Dollar, Christian Wojek, Bernt Schiele, and Pietro Perona)
That paper is already a bit outdated, but you see that even the best performing algorithms still do not perform brilliantly in image datasets, let alone real scenarios.
So, to answer your question: what can you do to improve its performance? It depends. If there are assumptions you can make in your specific scenario that simplify the problem, then you may be able to eliminate some false positives. Another way to improve results, and what every single Advanced Driver Assistance System does, is fusing information from different sensors to help the visual system. Most use LIDAR and RADAR to feed the camera with places to look at, and this helps the algorithm in both accuracy and speed.
So, as you can see, it is very application dependent. If your application is supposed to work in a simple scenario, then a background subtraction algorithm will help remove false detections. You can also bootstrap your classifier with the wrongly detected data to improve its performance.
But know one thing: there is no 100% in Computer-Vision, no matter how much you try. It is always a balance between accepting false positives and system robustness.
Cheers.
EDIT: To answer the question in the title, why are background objects detected as people? HOG is all about evaluating the edges in the image, and you are probably feeding HOG features to an SVM, right? The vertical pole detected in the image you provided shares some visual properties with humans, such as its vertical edges. That is why these algorithms fail a lot on traffic signs and other vertical elements, as you can see in my master's thesis on this topic: Visual Pedestrian Detection using Integral Channels for ADAS
My xmas holiday project this year was to build a little Android app, which should be able to detect arbitrary Euro coins in a picture, recognize their value and sum the values up.
My assumptions/requirements for the picture for a good recognition are
uniform background
picture should be roughly the size of a DinA4 paper
coins may not overlap, but may touch each other
number-side of the coins must be up/visible
My initial thought was that for the later coin-value recognition it would be best to first detect the actual coins/their regions in the picture. Any recognition would then run only on those regions of the picture where actual coins were found.
So the first step was to find circles. This I have accomplished using this OpenCV 3 pipeline, as suggested in several books and SO postings (a sketch of the pipeline follows the list):
convert to gray
Canny edge detection
Gaussian blurring
HoughCircles detection
filtering out inner/redundant circles
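Roughly, that pipeline in OpenCV-Python terms (my own sketch; the Hough parameters are guesses that need tuning for the DinA4 setup, and the file name is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("coins_on_a4.jpg")                 # hypothetical photo
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (9, 9), 2)

# Note: HoughCircles runs its own Canny internally (param1 is the upper
# Canny threshold), so the explicit edge step can be folded in here.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.5,
                           minDist=gray.shape[0] // 12,   # coins don't overlap
                           param1=120, param2=60,
                           minRadius=gray.shape[0] // 40,
                           maxRadius=gray.shape[0] // 10)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (x, y), r, (255, 0, 0), 2)        # blue border
```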
The detection works rather successfully, IMHO; here is a picture of the result:
Coins detected with HoughCircles with blue border
Now on to the recognition of every found coin!
I searched for solutions to this problem and came up with
template matching
feature detection
machine learning
Template matching seems very inappropriate for this problem, as the coins can be arbitrarily rotated with respect to a template coin (and the template matching algorithm is not rotation-invariant, so I would have to rotate the coins!).
Also, the pixels of the template coin will never exactly match those of the region of the previously detected coin, so any algorithm computing the similarity will produce only poor results, I think.
Then I looked into feature detection. This seemed more appropriate to me. I detected the features of a template coin and of the candidate-coin picture and drew the matches (a combination of ORB and BRUTEFORCE_HAMMING). Unfortunately, the features of the template coin were also matched in the wrong candidate coins.
See the following picture, where the template or "feature" coin is on the left, a 20 cent coin. To the right are the candidate coins, where the left-most coin is a 20 cent coin. I actually expected this coin to have the most matches; unfortunately it did not. So again, this does not seem to be a viable way to recognize the value of coins.
Feature-matches drawn between a template coin and candidate coins
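For reference, a boiled-down sketch of that matching step (file names are placeholders, and the Lowe ratio test is an extra filter, not something my original attempt used):

```python
import cv2

# Hypothetical file names: the template coin and one detected coin region.
template = cv2.imread("template_20cent.png", cv2.IMREAD_GRAYSCALE)
candidate = cv2.imread("candidate_coin.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(candidate, None)

# knnMatch plus Lowe's ratio test throws away ambiguous matches, which
# helps a bit with matches landing on the wrong coin, but coin faces are
# so similar that the match count alone rarely decides the value.
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in bf.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(len(good), "plausible matches")
```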
So machine learning is the third possible solution. From university I still know about neural networks, how they work, etc. Unfortunately, my practical knowledge is rather poor AND I don't know Support Vector Machines (SVMs) at all, which is the machine-learning approach supported by OpenCV.
So my question is actually not source-code related, but more how to setup the learning process.
Should I train on the plain coin images, or should I first extract features and train on those? (I think: features)
How many positives and negatives per coin should be given?
Would I have to train on rotated coins too, or would the rotation be handled "automagically" by the SVM? That is, would the SVM recognize rotated coins even if I only trained it on non-rotated coins?
One of my picture requirements above ("DinA4") limits the size of a coin to a certain size, e.g. 1/12 of the picture height. Should I train on coins of roughly the same size or of different sizes? I think that different sizes would result in different features, which would not help the learning process; what do you think?
Of course, if you have a different possible solution, this is also welcome!
Any help is appreciated! :-)
Bye & Thanks!
Answering your questions:
1 - Should I train on the plain coin images, or should I first extract features and train on those? (I think: features)
For many object classification tasks it's better to extract the features first and then train a classifier using a learning algorithm (e.g. the features can be HOG and the learning algorithm can be something like SVM or AdaBoost). This is mainly because the features carry more meaningful information than raw pixel values: they can describe edges, shapes, texture, etc. However, approaches like deep learning extract useful features as part of the learning procedure itself.
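As an illustration rather than a recipe, this is roughly what the HOG-features-plus-SVM route looks like with OpenCV's own ml module; the window size, the SVM parameters and the `patches`/`values` containers are all assumptions:

```python
import cv2
import numpy as np

# HOG over a fixed-size grayscale patch per detected coin (64x64 here).
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def describe(coin_patch):
    # coin_patch is assumed to be an 8-bit grayscale crop of one coin.
    patch = cv2.resize(coin_patch, (64, 64))
    return hog.compute(patch).flatten()

# 'patches' and 'values' are assumed to exist: one cropped coin image and
# its value class (e.g. 0..7 for the eight Euro coins) each.
features = np.array([describe(p) for p in patches], dtype=np.float32)
labels = np.array(values, dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_RBF)
svm.setC(2.5)        # rough starting values; tune with cross-validation
svm.setGamma(0.05)
svm.train(features, cv2.ml.ROW_SAMPLE, labels)

# Later: _, prediction = svm.predict(describe(new_patch).reshape(1, -1))
```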
2 - How many positives and negatives per coin should be given?
You need to answer this question depending on the variation within the classes you want to recognize and on the learning algorithm you use. For an SVM, if you use HOG features and want to recognize the specific numbers on the coins, you won't need many.
3 - Would I have to train on rotated coins too, or would the rotation be handled "automagically" by the SVM? That is, would the SVM recognize rotated coins even if I only trained it on non-rotated coins?
Again, it depends on your final choice of features (not the SVM, which is the learning algorithm). HOG features are not rotation-invariant, but there are features like SIFT or SURF which are.
4 - One of my picture requirements above ("DinA4") limits the size of a coin to a certain size, e.g. 1/12 of the picture height. Should I train on coins of roughly the same size or of different sizes? I think that different sizes would result in different features, which would not help the learning process; what do you think?
Again, it depends on the algorithm you choose; some of them require a fixed or similar width/height ratio. You can find the specific requirements in the related papers.
If you decide to use an SVM, take a look at this, and if you feel comfortable with neural networks, using TensorFlow is a good idea.
I am using OpenCV sample code “peopledetect.cpp” to detect pedestrians.
The code uses HOG for feature extraction and an SVM for classification. Please find the reference paper used here.
The camera is mounted on a wall at a height of 10 feet, angled 45° down. There is no restriction on pedestrian movement within the frame.
I am satisfied with the true positive rate (correctly detecting pedestrians) but false positive rate is very high.
Some of the false detections I observed are moving car, tree, and wall among others.
Can anyone suggest how to improve the existing code to reduce the false detection rate?
Any reference to blogs/codes is very helpful.
You could apply a background subtraction algorithm on your video stream. I had some success on a similar project using BackgroundSubtractorMOG2.
Another trick I used is to eliminate all "moving pixels" that are too small or with a wrong aspect ratio. I did this by doing a blob/contour analysis of the background subtraction output image. You need to be careful with the aspect ratio to make sure you support overlapping pedestrians.
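Putting the two tricks together, in Python for brevity (my own rough sketch; the video source, area limit and aspect-ratio range are placeholders to tune):

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

MIN_MOVING_PIXELS = 500      # placeholder
MIN_AR, MAX_AR = 1.0, 5.0    # kept height/width range, loose for overlaps

cap = cv2.VideoCapture("camera_stream.mp4")   # hypothetical source
while True:
    ok, frame = cap.read()
    if not ok:
        break

    fg = subtractor.apply(frame)
    fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]  # drop shadow pixels
    fg = cv2.dilate(fg, None, iterations=2)

    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    for (x, y, w, h) in rects:
        moving = cv2.countNonZero(fg[y:y + h, x:x + w])
        aspect = h / float(w)
        # Keep detections that cover enough moving pixels and look
        # person-shaped; static walls and trees fail the first test.
        if moving > MIN_MOVING_PIXELS and MIN_AR <= aspect <= MAX_AR:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```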
Note that the model you're using (not sure which) was probably trained on front-facing pedestrians, not on a 45-degree downward view. This will obviously affect your accuracy.
I'm looking for the fastest and more efficient method of detecting an object in a moving video. Things to note about this video: It is very grainy and low resolution, also both the background and foreground are moving simultaneously.
Note: I'm trying to detect a moving truck on a road in a moving video.
Methods I've tried:
Training a Haar cascade - I attempted to train the classifier to identify the object by cropping multiple images of the desired object. This produced either many false detections or no detections at all (the desired object was never detected). I used about 100 positive images and 4000 negatives.
SIFT and SURF keypoints - When attempting to use either of these methods, which are based on features, I discovered that the object I wanted to detect was too low in resolution, so there were not enough features to match to make an accurate detection. (The desired object was never detected.)
Template matching - This is probably the best method I've tried. It's the most accurate, although the most hacky of them all. I can detect the object in one specific video using a template cropped from that video. However, there is no guaranteed accuracy, because all that is known is the best match for each frame; no analysis is done of how strongly the template actually matches the frame. Basically, it only works if the object is always in the video, otherwise it will create a false detection.
So those are the big 3 methods I've tried, and all have failed. What would work best is something like template matching but with scale and rotation invariance (which led me to try SIFT/SURF), but I have no idea how to modify the template matching function.
Does anyone have any suggestions how to best accomplish this task?
Apply optical flow to the image and then segment it based on the flow field. The background flow is very different from the "object" flow (which mainly diverges or converges depending on whether the object is moving towards or away from you, with some lateral component as well).
Here's an oldish project which worked this way:
http://users.fmrib.ox.ac.uk/~steve/asset/index.html
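A crude sketch of that idea using dense Farnebäck flow in OpenCV. It segments on flow magnitude only, which is a simplification of proper flow-field segmentation, and the deviation threshold and clip name are placeholders:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("road.mp4")              # hypothetical clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Camera motion dominates the frame, so treat the median flow as
    # "background" and keep pixels that deviate strongly from it.
    deviation = np.abs(mag - np.median(mag))
    mask = (deviation > 2.0).astype(np.uint8) * 255   # placeholder threshold
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Large surviving blobs are candidate independently moving objects.
    prev_gray = gray
```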
This vehicle detection paper uses a Gabor filter bank for low-level detection and then uses the responses to build the feature space in which it trains an SVM classifier.
The technique seems to work well and is at least scale invariant. I am not sure about rotation though.
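For the low-level stage, OpenCV can build the Gabor bank directly; a short sketch (the kernel parameters are illustrative, not taken from the paper, and the file name is a placeholder):

```python
import cv2
import numpy as np

def gabor_bank(ksize=31, sigma=4.0, lambd=10.0, gamma=0.5, n_orient=8):
    """Gabor kernels at evenly spaced orientations."""
    kernels = []
    for i in range(n_orient):
        theta = i * np.pi / n_orient
        k = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, 0)
        kernels.append(k / np.abs(k).sum())     # rough normalisation
    return kernels

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frame
responses = [cv2.filter2D(img, cv2.CV_32F, k) for k in gabor_bank()]

# Statistics of these responses over candidate windows (mean, energy per
# orientation, ...) can then be stacked into a feature vector for an SVM.
feature_map = np.stack([np.abs(r) for r in responses], axis=-1)
```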
Not knowing your application, my initial impression is normalized cross-correlation, especially since I remember seeing a purely optical cross-correlator that had vehicle-tracking as the example application. (Tracking a vehicle as it passes using only optical components and an image of the side of the vehicle - I wish I could find the link.) This is similar (if not identical) to "template matching", which you say kind of works, but this won't work if the images are rotated, as you know.
However, there's a related method based on log-polar coordinates that will work regardless of rotation, scale, shear, and translation.
I imagine this would also enable tracking that the object has left the scene of the video, too, since the maximum correlation will decrease.
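Two sketches in that direction: using the normalized cross-correlation peak itself as a presence score (so an empty frame can be rejected instead of always returning some "best" match), and the log-polar trick, where rotation and scale show up as plain shifts. The threshold and file names are placeholders:

```python
import cv2
import numpy as np

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)          # hypothetical
template = cv2.imread("truck_template.png", cv2.IMREAD_GRAYSCALE)

# 1) Normalized cross-correlation with a peak-height test.
res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, peak, _, loc = cv2.minMaxLoc(res)
truck_present = peak > 0.6          # placeholder confidence threshold

# 2) Log-polar map of the FFT magnitude spectrum: rotation and scale of
# the original image become shifts here, so cv2.phaseCorrelate between
# two such maps recovers rotation/scale (the Fourier-Mellin idea).
def logpolar_spectrum(img):
    f = np.fft.fftshift(np.fft.fft2(np.float32(img)))
    mag = np.log1p(np.abs(f)).astype(np.float32)
    center = (mag.shape[1] / 2.0, mag.shape[0] / 2.0)
    return cv2.warpPolar(mag, (mag.shape[1], mag.shape[0]), center,
                         min(center), cv2.WARP_POLAR_LOG)
```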
How low resolution are we talking? Could you also elaborate on the object? Is it a specific color? Does it have a pattern? The answers affect what you should be using.
Also, I might be reading your template matching statement wrong, but it sounds like you are overtraining it (by testing on the same video you extracted the object from??).
A Haar Cascade is going to require significant training data on your part, and will be poor for any adjustments in orientation.
Your best bet might be to combine template matching with an algorithm similar to camshift in opencv (5,7MB PDF), along with a probabilistic model (you'll have to figure this one out) of whether the truck is still in the image.
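One way that combination might look, with every file name and threshold below being an assumption rather than a recipe: seed the track with a template match, let CamShift follow the hue back-projection, and drop the track when the template score in the tracked window collapses.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("road.mp4")                     # hypothetical
template = cv2.imread("truck_template.png")            # hypothetical (BGR)
th, tw = template.shape[:2]

# Seed: best template match in the first frame.
ok, frame = cap.read()
res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, score, _, (x, y) = cv2.minMaxLoc(res)
track_window = (x, y, tw, th)

# Hue histogram of the seeded region drives CamShift.
hsv_roi = cv2.cvtColor(frame[y:y + th, x:x + tw], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    _, track_window = cv2.CamShift(back_proj, track_window, criteria)

    # Crude presence check: re-score the tracked region against the
    # template and give up once the correlation collapses.
    x, y, w, h = track_window
    patch = frame[y:y + h, x:x + w]
    if patch.shape[0] >= th and patch.shape[1] >= tw:
        if cv2.matchTemplate(patch, template, cv2.TM_CCOEFF_NORMED).max() < 0.4:
            break                              # truck has probably left the scene
```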