OpenCV: Effective way to automatically set NoiseSigma in BackgroundSubtractorMOG

I'm trying to make a program that detects people in CCTV footage and I've made a lot of progress. Unfortunately, the amount of noise in the videos varies a lot between cameras and times of day, and therefore between my sample videos. This means that the NoiseSigma needed varies from 1 to 25.
I've used the fastNlMeansDenoisingColored function and that helped a bit, but the NoiseSigma is still an issue.
Would it be effective to maybe loop through the video once, and somehow get an idea for how noisy the video is and make a relationship for noise vs NoiseSigma? Any ideas would be welcome.

I don't think it's possible to determine the noise level in an image (or video) without reference data that contains no noise. One thing that comes to mind is to record some static scenery, measure how much the frames differ from each other, and then try to find a relationship (hopefully linear) between that measure and NoiseSigma. If there were no noise, the accumulated difference between frames would be 0. By accumulated difference I mean something like this:
for (i = 1; i < frames.count(); ++i)
    cumulativeError += sum(abs(frame(i) - frame(i-1)))
Where sum adds up all elements of an image (frame) to produce a scalar value.
Please keep in mind that I'm just following my intuition here and it's not a method I've seen before.
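As a rough illustration, the accumulated difference could be sketched in plain Python as follows. Frames are assumed to be equally sized grayscale images stored as lists of rows; the function names and the per-pixel normalisation are my own additions, not part of any OpenCV API.

```python
def accumulated_difference(frames):
    """Sum of absolute pixel differences between consecutive frames.

    `frames` is a list of equally sized 2-D grayscale images,
    each given as a list of rows of numbers (an assumption of this sketch).
    """
    total = 0
    for prev, cur in zip(frames, frames[1:]):
        for row_prev, row_cur in zip(prev, cur):
            total += sum(abs(a - b) for a, b in zip(row_cur, row_prev))
    return total


def mean_difference_per_pixel(frames):
    """Divide by the number of frame pairs and pixels so that clips of
    different length and resolution can be compared (added assumption)."""
    pairs = len(frames) - 1
    pixels = len(frames[0]) * len(frames[0][0])
    return accumulated_difference(frames) / (pairs * pixels)
```

The normalised value is the one I would try to relate to NoiseSigma, since the raw sum grows with clip length and resolution. Whether that relationship is actually linear would have to be checked on real footage.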


Unstable homography estimation using ORB

I am developing a feature tracking application and so far, after trying almost all the feature detectors/descriptors, I've got the most satisfactory overall results with ORB.
Both my feature descriptor and detector are ORB.
I select a specific area of my source image for detecting features (by masking), and then match them with features detected in subsequent frames.
Then I filter my matches by performing a ratio test on the 'matches' obtained from the following code:
std::vector<std::vector<DMatch>> matches1;
m_matcher.knnMatch( m_descriptorsSrcScene, m_descriptorsCurScene, matches1,2 );
I also tried the two-way ratio test (filtering matches from the source to the current scene and vice versa, then keeping the common matches), but it didn't do much, so I went ahead with the one-way ratio test.
I also add a minimum distance check to my ratio test, which appears to give better results:
if (distanceRatio < m_fThreshRatio && bestMatch.distance < 5*min_dist)
Finally, I estimate the homography:
Mat H = findHomography(points1,points2);
I've tried using the RANSAC method for estimating inliers and then using those to recalculate my homography, but that gives more instability and consumes more time.
Then I draw a rectangle around the specific region that is to be tracked. I get the plane coordinates by:
perspectiveTransform( obj_corners, scene_corners, H);
where 'obj_corners' are the coordinates of my masked (or unmasked) region.
The rectangle I draw using 'scene_corners' seems to be vibrating. Increasing the number of features has reduced it quite a bit, but I can't increase them too much because of the time constraint.
How can I improve the stability?
Any suggestions would be appreciated.
If it is the vibrations that are really bothersome to you then you could try taking the moving average of the homography matrices over time:
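cv::Mat homoG = cv::findHomography(obj, scene, CV_RANSAC);
if (homography.empty()) {
    homoG.copyTo(homography);  // first frame: seed the running average
}
cv::accumulateWeighted(homoG, homography, 0.1);  // blend the new estimate into the average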
Make the 'homography' variable global, and keep calling this every time you get a new frame.
accumulateWeighted keeps an exponentially weighted running average, and its alpha parameter is roughly the reciprocal of the averaging period.
So 0.1 averages over roughly the last 10 frames, 0.2 over roughly the last 5, and so on.
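To make the update rule concrete, here is a small sketch of the same running average written out by hand in plain Python. It mirrors the formula dst = (1 - alpha) * dst + alpha * src; the class name HomographySmoother and the nested-list matrix layout are assumptions for illustration only.

```python
def accumulate_weighted(new, running, alpha):
    """Return (1 - alpha) * running + alpha * new, element by element."""
    return [[(1 - alpha) * r + alpha * n for n, r in zip(new_row, run_row)]
            for new_row, run_row in zip(new, running)]


class HomographySmoother:
    """Keeps an exponentially weighted average of 3x3 matrices."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.state = None  # no estimate seen yet

    def update(self, h):
        if self.state is None:
            # First frame: seed the average with the raw estimate.
            self.state = [row[:] for row in h]
        else:
            self.state = accumulate_weighted(h, self.state, self.alpha)
        return self.state
```

A smaller alpha gives a steadier rectangle but makes it lag behind fast motion, so the value probably needs tuning per application.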
A suggestion that comes to mind from experience with feature detection/matching is that sometimes you just have to accept the matched feature points will not work perfectly. Even subtle changes in the scene you are looking at can cause somewhat annoying problems, for example changes in light or unwanted objects coming into view.
From what you say, it appears that you already have decently working feature matching in place, so you may want to work on a way of keeping the region of interest constant. If you know the typical speed or any other movement patterns unique to the object you are trying to track between frames, or any constraints relating to the position of your camera, that may help you avoid recalculating the region of interest unnecessarily, which causes the vibrations. It may also help in creating a more efficient search algorithm, allowing you to increase the number of feature points you can detect and use.
Another (small) hack you can use is to avoid redrawing the region window if the previous window was of similar size and position.
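That last hack might look something like the sketch below. The helper names and the 3 pixel tolerance are hypothetical; corners are assumed to be (x, y) tuples in a fixed order.

```python
def corners_similar(prev, cur, tolerance=3.0):
    """True if every corner moved by at most `tolerance` pixels on each axis."""
    return all(abs(px - cx) <= tolerance and abs(py - cy) <= tolerance
               for (px, py), (cx, cy) in zip(prev, cur))


def stabilise(prev, cur, tolerance=3.0):
    """Keep the previously drawn window unless the new one differs enough."""
    if prev is not None and corners_similar(prev, cur, tolerance):
        return prev
    return cur
```

The trade-off is that real movements smaller than the tolerance are ignored until they add up to a larger jump.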

Idea needed about vehicle counting by analysing the white pixels of the binary foreground image

I am a bit new to image processing, so I'd like to ask about finding the best solution for my problem rather than for help with code.
I haven't been able to think of a good idea yet, so I wanted to ask for your advice. I hope you can help.
I'm working on a project with OpenCV that counts the vehicles in a video file or a live camera feed. Other people working on such a project generally track the moving objects and then count them, but I wanted to work from a different viewpoint: asking the user to set a ROI (region of interest) on the video window and working only within this region (for reasons such as not dealing with the whole frame and gaining some performance), as seen below. (By the way, the user can set more than one ROI, and is asked to set the height of the ROI to about twice that of a normal car, by sense of proportion.)
I've made some basic progress so far, such as background updating, morphological filters, thresholding and getting the moving object as a binary image, something like below.
After that, I tried counting the white pixels of the final thresholded foreground frame and estimating whether it was a car or not by checking the total number of white pixels (I set a lower bound by a static calculation based on the height of the ROI). To illustrate, I drew a sample graphic:
As you can see from the graphic, it was easy to count the white pixels, check whether the count draws a curve over time, and decide whether it was a car or something like noise.
I was quite successful until two cars passed through my ROI at the same time. As you can guess, my algorithm failed by counting them as one car. I tried different approaches for this problem and for similar ones, such as long vehicles, but I haven't found a good solution so far.
My question is: is it impossible to handle this task with this pixel counting approach? If it is possible, what would you suggest? I hope you have faced something similar before and can help me.
All ideas are welcome, thanks in advance friends.
Isolate the traffic from the background: take two images, run a high-pass filter on one of them, and convert the other to a binary image. Use the binary image to mask the filtered one. You should then be able to use edge detection to identify the roof of each vehicle as a quadrilateral and compute a relative measure of its size.
You then have four scenarios:
no quadrilaterals - no cars
large quadrilaterals - trucks
multiple small quadrilaterals - several cars
single quadrilateral - one car
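The four scenarios could be turned into a decision rule along these lines. This is only a sketch: it assumes the quadrilateral areas (in pixels) have already been extracted, and both the function name classify_roi and the truck_area threshold are made up for illustration.

```python
def classify_roi(quad_areas, truck_area=6000):
    """Map the detected roof quadrilaterals in a ROI to one of four cases.

    `quad_areas` holds one area per detected quadrilateral;
    `truck_area` is a hypothetical size threshold that needs calibrating.
    """
    if not quad_areas:
        return "no cars"
    if any(area >= truck_area for area in quad_areas):
        return "truck"
    if len(quad_areas) > 1:
        return "several cars"
    return "one car"
```

Note that a ROI containing both a truck and a car is reported as "truck" here; a real counter would probably want to count each quadrilateral separately.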
In answer to your question "Is it possible to do this using pixel counting?"
The short answer is "No", for the very reason you're quoting: mere pixel counting of static images is not enough.
If you are limited to pixel counting, you can try looking at pixel count velocity (the change in pixel count between successive frames), and you might pick out different "velocity" shapes when 1 car, 2 cars or trucks pass.
But just plain pixel counting? No. You need shape (geometric) information as well.
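For what it's worth, the pixel count velocity idea could be sketched like this. The function names and the min_rise threshold are hypothetical, and the approach only separates vehicles that enter the ROI at slightly different times.

```python
def pixel_count_velocity(counts):
    """Change in white-pixel count between successive frames."""
    return [b - a for a, b in zip(counts, counts[1:])]


def count_rise_runs(counts, min_rise):
    """Count separate runs of frames in which the pixel count rises by at
    least `min_rise`; each run is taken as one vehicle entering the ROI."""
    runs = 0
    rising = False
    for v in pixel_count_velocity(counts):
        if v >= min_rise and not rising:
            runs += 1  # a new rising run starts here
        rising = v >= min_rise
    return runs
```

Two cars that enter side by side in exactly the same frames still produce a single rise, which is why shape information is needed in the general case.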
If you apply any kind of thresholding algorithm (e.g. for background subtraction), don't forget to update the background whenever light levels change (e.g. day and night). Also consider the grief when it is partly cloudy, with sharp cloud shadows that move across your image.

Algorithm for real-time tracking of several simple objects

I'm trying to write a program to track relative position of certain objects while I'm playing the popular game, League of Legends. Specifically, I want to track the x,y screen coordinates of any "minions" currently on the screen (The "minions" are the little guys in the center of the picture with little red and green bars over their heads).
I'm currently using the Java Robot class to send screen captures to my program while I'm playing, and am trying to figure out the best algorithm for locating the minions and tracking them for as long as they stay on the screen.
My current thinking is to use a convolutional neural network to identify and locate the minions by the colored bars over their heads. However, I'd have to re-identify and locate the minions on every new frame, and this seems like it'd be computationally expensive if I want to do this in real time (~10-60 fps).
These sorts of computer vision algorithms aren't really my specialization, but it seems reasonable that algorithms exist that exploit the fact objects in videos move in a continuous manner (i.e. they don't jump around from frame to frame).
So, is there an easily implementable algorithm for accomplishing this task?
Since this is a computer game, I think that the color of the bars should be constant. That might not hold only if dynamic illumination affects the health bar, which is highly unlikely.
Thus, just find all of the pixels with this specific color. Then do some morphological operations and segment the image into blobs. By selecting only the blobs that fit some criteria, you can find the location of the units.
I know that my answer does not involve video, but the operations are so simple that it should be very quick.
As for the tracking, for each point just find the closest point in the next frame.
Since the HUD location is constant, there should be no problem removing it.
Here is my quick and not-so-robust implementation in Matlab, which has a few limitations:
Units must be quite healthy (at least 40 pixels wide).
The bars must not overlap.
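The closest-point tracking step could be sketched as a greedy nearest-neighbour match, shown here in plain Python. The function name, the (x, y) tuple format and the max_distance cut-off are assumptions of this sketch.

```python
import math


def match_nearest(prev_points, cur_points, max_distance=50.0):
    """Greedily pair each previous point with its closest unused current point.

    Returns a dict mapping index in `prev_points` to index in `cur_points`.
    Points farther apart than `max_distance` are left unmatched, so units
    that disappear or newly appear are not forced into a pairing.
    """
    matches = {}
    used = set()
    for i, (px, py) in enumerate(prev_points):
        best_j, best_d = None, max_distance
        for j, (cx, cy) in enumerate(cur_points):
            if j in used:
                continue
            d = math.hypot(px - cx, py - cy)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches
```

Current points that end up unmatched can be treated as units that just entered the screen. Greedy matching can go wrong when units cross paths, so it is only a starting point.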
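function FindUnits()
    img = double(imread('c:\1.jpg'));
    green = cat(3,149,194,151);                        % reference health-bar color (RGB)
    diff = abs(img - repmat(green,[size(img,1) size(img,2)]));
    diff = mean(diff,3);                               % mean absolute color distance per pixel
    diff = logical(diff < 30);                         % keep pixels close to the bar color
    diff = imopen(diff,strel('square',1));             % a 1x1 element changes nothing; enlarge it to remove specks
    rp = regionprops(diff,'Centroid','MajorAxisLength','MinorAxisLength','Orientation');
    long = [rp.MajorAxisLength]./[rp.MinorAxisLength]; % elongation of each blob
    rp( long < 20) = [];                               % keep only long, thin blobs (the bars)
    xy = [rp.Centroid];
    x = xy(1:2:end);
    y = xy(2:2:end);
    figure;imshow('c:\1.jpg');hold on ;scatter(x,y,'g');
end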
And the results:
You should use a model that has a dynamic structure in it. For your object tracking purpose, Hidden Markov Models (HMMs), or more generally Dynamic Bayesian Networks, are very well suited. You can find a lot of resources on HMMs online. The issues you are going to face, however, depend on your system model. If your system dynamics can easily be represented as a linear Gauss-Markov model, then a simple Kalman Filter will do fine. In the case of nonlinear, non-Gaussian dynamics, however, you should use Particle Filtering, which is a Sequential Monte Carlo method.
Both the Kalman Filter and the Particle Filter are sequential methods, so you will use the results you have at the current step to obtain a result at the next time step. I suggest you check some online tutorials and papers on Multiple Object Tracking via Particle Filters.
As far as I can see, the main difficulty you will have is the number of objects to track: you won't know it in advance, an object you are tracking can disappear (you may kill those little guys, or they may just leave the screen), and some other guy can enter the screen. Hope this helps.
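To give an idea of the Kalman Filter case, here is a minimal one-dimensional constant-velocity filter in plain Python. Everything about it is an assumption for illustration: a time step of one frame, position-only measurements, and arbitrary values for the process noise q, measurement noise r and initial uncertainty p0. One such filter per axis and per minion would be needed.

```python
class ConstantVelocityKalman1D:
    """State is [position, velocity]; only position is measured."""

    def __init__(self, position=0.0, velocity=0.0, p0=100.0, q=0.01, r=1.0):
        self.x = [position, velocity]
        self.P = [[p0, 0.0], [0.0, p0]]  # state covariance
        self.q = q                       # process noise (added to the diagonal)
        self.r = r                       # measurement noise variance

    def predict(self):
        """Advance one frame: x = F x, P = F P F^T + Q with F = [[1, 1], [0, 1]]."""
        p, v = self.x
        (a, b), (c, d) = self.P
        self.x = [p + v, v]
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z."""
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain
        y = z - self.x[0]           # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.x
```

Calling predict() without update() for a frame gives a guess of where a briefly missed minion should be, which can help re-associate detections.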

Identify a specific sound on iOS

I'd like to be able to recognise a specific sound in an iOS application. I guess it would basically work like speech recognition in that it's fairly fuzzy, but it would only have to be for 1 specific sound.
I've done some quick FFT stuff to identify specific frequencies over a certain threshold and only when they're solo (i.e., they're not surrounded by other frequencies), so I can identify individual tones pretty easily. I'm thinking it's just an extension of this, but comparing against an FFT data set of a recording of the sound, say in 0.1 second chunks over the length of the audio. I would also have to account for variation in amplitude, and a little in pitch and time.
Can anyone point me to any pre-existing source that I could use to speed this process along? I can't seem to find anything usable. Or failing that, any ideas on how to get started on something like this?
Thanks very much
From your description it is not entirely clear what you want to do.
What is the "specific" sound like? Does it have high background noise?
What's the specific recognizable feature (e.g. pitch, inharmonicity, timbre ...)?
Against which other "sounds" do you want to compare it?
Do you simply want to match an arbitrary sound spectrum against a "template sound"?
Is your sound percussive, melodic, speech, ...? Is it long, short ...?
What's the frequency range in which you expect the best discriminability? Are the features invariant with time?
There is no "general" solution that works for everything. Speech recognition in itself is fairly complex and won't work well for abstract sounds whose discriminable frequencies are not in, e.g., the mel bands.
So in conclusion, you are leaving too many open questions to get a useful answer.
The only suggestion I can make based on the little information given is the following:
For the template sound:
1) Extract spectral peak positions from the power spectrum
2) Measure the standard deviation around the peaks and construct a Gaussian from it
3) Save the Gaussians for later classification
For unknown sounds:
1) Extract spectral peak positions
2) Project those points onto the saved Gaussians, which leaves you with z-scores of the peak positions
3) With the computed z-scores you should be able to classify your template sound
Note: This is a very crude method which discriminates sounds according to their most powerful frequencies. Using Gaussians leaves room for slight shifts in the most powerful frequencies.
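The steps above might be sketched as follows, assuming the spectral peak positions have already been extracted. One reading of "standard deviation around the peaks" is used here, namely the spread of each peak across several recordings of the template sound; that choice, the function names, the min_std floor and the max_z threshold are all assumptions.

```python
import statistics


def build_template(peak_sets, min_std=1.0):
    """Build one (mean, std) Gaussian per peak from several recordings.

    `peak_sets` is a list of recordings, each a list with the same number
    of peak frequencies in Hz, sorted ascending. `min_std` avoids a zero
    standard deviation when all recordings agree exactly.
    """
    template = []
    for values in zip(*peak_sets):
        std = max(statistics.stdev(values), min_std)
        template.append((statistics.mean(values), std))
    return template


def z_scores(peaks, template):
    """Distance of each peak from its Gaussian, in standard deviations."""
    return [abs(f - mean) / std for f, (mean, std) in zip(peaks, template)]


def matches_template(peaks, template, max_z=3.0):
    """Accept the sound if every peak lies within `max_z` deviations."""
    if len(peaks) != len(template):
        return False
    return all(z <= max_z for z in z_scores(peaks, template))
```

Because only peak positions are compared, changes in amplitude do not matter, while small pitch shifts are absorbed by the width of the Gaussians.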

OpenCV: Detect blinking lights in a video feed

I have a video feed. This video feed contains several lights blinking at different rates. All lights are the same color (they are all infrared LEDs). How can I detect the position and frequency of these blinking lights?
Disclaimer: I am extremely new to OpenCV. I do have a copy of Learning OpenCV, but I am finding it a bit overwhelming. If anyone could explain a solution in OpenCV terminology, it would be greatly appreciated. I am not expecting code to be written for me.
Threshold each image in the sequence with a threshold that makes the LEDs visible. If you can find a threshold that keeps only the LEDs and removes the background, then you are more or less finished, since all you need to do now is keep track of each position where an LED has been seen and count how often it lights up.
As an intermediate step, if there is "background noise" in the thresholded image, you could use erosion to remove small errors, and then maybe dilation to "close holes" in the blobs you are actually interested in.
If the scene is static you could also make a simple background model by taking the median of a few frames, subtracting the resulting median image from each frame and thresholding the result. Stuff that has changed (your LEDs) will appear stronger.
If the scene is moving I see no other (easy) solution than making sure the LEDs are bright enough to be able to use the threshold approach given above.
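A sketch of the median background model in plain Python is shown below, with frames as lists of rows of grayscale values. The function names and the threshold are placeholders, and the sketch assumes each LED is off in more than half of the frames used for the median.

```python
import statistics


def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[statistics.median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold):
    """1 where the frame differs from the background by more than `threshold`."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]
```

If an LED is lit in most of the sampled frames, it ends up in the background itself, so the frames for the median should be spread over a longer period.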
As for OpenCV: if you know what you want to do, it is not very hard to find a function that does it. The hard part is coming up with a method to solve the problem, not the actual coding.
If the LEDs are stationary, the problem is far simpler than when they are moving. Assuming they are stationary, a solution to find the frequency could simply be to keep a vector or an array for each pixel location in which you store the values of that pixel over some timeframe, preferably after the preprocessing described by kigurai. You can then compute the 1D Fourier transform of those value vectors and find the fundamental frequency as the first significant component after the DC peak. If the DC peak is too low, it means there is no LED there.
Hope this problem is still somewhat relevant, and that my solution makes sense.
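For a single pixel location, the frequency estimate could be sketched with a naive discrete Fourier transform in plain Python. As a simplification it picks the strongest non-DC component rather than the first significant one, which gives the same answer for a clean on/off blink; the function name and the sample layout are assumptions.

```python
import cmath


def dominant_frequency(samples, fps):
    """Estimate the blink frequency in Hz of one pixel's brightness history.

    `samples` holds the pixel value in consecutive frames, `fps` is the
    frame rate. Returns 0.0 when the signal is constant.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the DC component
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(centred))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fps / n
```

The frequency resolution is fps / n, so a longer history gives finer estimates, and blink rates above half the frame rate cannot be measured this way.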
