Smoothing motion by using Kalman Filter or Particle Filter in video stabilization - opencv

I have read many papers about video stabilization. Almost all of them smooth motion using a Kalman Filter, because it is robust and runs in real-time applications.
But there is another powerful filter: the particle filter.
So why don't we use a particle filter to smooth motion and create a stabilized video?
Some papers use a particle filter only for estimating global motion between frames (the motion-estimation step).
I find this hard to understand. Can anyone explain it to me, please?
Thank you so much.

A Kalman Filter is uni-modal. That means it has one belief along with an error covariance matrix to represent the confidence in that belief as a normal distribution. If you are going to smooth some process, you want to get out a single, smoothed result. This is consistent with a KF. It's like using least squares regression to fit a line to data. You are simplifying the input to one result.
A particle filter is multi-modal by its very nature. Where a Kalman Filter represents belief as a central value and a variance around that central value, a particle filter just has many particles whose values are clustered around regions that are more likely. A particle filter can represent essentially the same state as a KF (imagine a histogram of the particles that looks like the classic bell curve of the normal distribution). But a particle filter can also have multiple humps or really any shape at all. This ability to have multiple simultaneous modes is ideally suited to handle problems like estimating motion, because one mode (cluster of particles) can represent one move, and another mode represents a different move. When presented with this ambiguity, a KF would have to abandon one of the possibilities altogether, but a particle filter can keep on believing both things at the same time until the ambiguity is resolved by more data.
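To make the smoothing half concrete, here is a minimal sketch (the matrices, noise values, and synthetic trajectory are all illustrative, not taken from any paper) of a 1D constant-velocity Kalman filter smoothing a jittery camera-translation signal:

```python
import numpy as np

# 1D constant-velocity Kalman filter over a jittery camera translation.
F = np.array([[1.0, 1.0],    # state transition: position += velocity
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])   # we measure position only
Q = np.eye(2) * 1e-4         # process noise covariance (tune)
R = np.array([[0.25]])       # measurement noise covariance (tune)

x = np.zeros((2, 1))         # state: [position, velocity]
P = np.eye(2)                # state covariance

rng = np.random.default_rng(0)
true_path = np.cumsum(rng.normal(0.0, 0.1, 100))   # slow camera drift
measured = true_path + rng.normal(0.0, 0.5, 100)   # drift + hand jitter

smoothed = []
for z in measured:
    x = F @ x                            # predict
    P = F @ P @ F.T + Q
    y = np.array([[z]]) - H @ x          # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y                        # update
    P = (np.eye(2) - K @ H) @ P
    smoothed.append(float(x[0, 0]))
```

The per-frame difference between `measured` and `smoothed` is the single-valued correction a stabilizer would warp each frame by, which is why the uni-modal KF fits this job so naturally.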

Related

Finding patterns in a numerical signal

Let's consider some 2D signals (amplitude over time).
I have a library of snippets that have specific 'shapes'. For the sake of the example, let's assume a square wave, a triangle wave and a sawtooth wave, but in practice, they're a lot more complex.
And I have a complex signal, such as an audio recording.
What would be the best way to train a system to spot elements from the library inside the complex signal, knowing that:
The library shape may be present at a different frequency.
The library shape may be present at a different amplitude.
Multiple shapes can be overlapping.
There is noise all over.
What I would like to recover is:
Which shape was recognized.
How close is it to the reference signal.
Where does that shape fit in the complex signal (position, frequency, amplitude range).
Bonus problem: since I'm looking for the shape itself, it may be stretched over time in a non-linear fashion.
I drew a quick picture to illustrate:
As an example here, I drew 3 basic shapes for my library and I overlapped a few at different places on an audio signal.
What would be the best approach to solve this problem?
I would lean toward training a classifier to recognize the shapes, but I am not sure this is the right approach, nor really how practical it would be with this kind of data, which has frequencies well distributed across a wide range (50 Hz to 15 kHz).
If you do not have a labeled data set, I would probably try a simplified convolutional autoencoder.
If the base shapes that you want to recognize are fixed, I would use a single hidden layer (the bottleneck) where the kernel functions are set to the base shapes. The values of the neurons at the bottleneck would tell you which shapes are detected and where.
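As a rough illustration of that fixed-kernel idea, the sketch below skips the autoencoder machinery and just computes what such a bottleneck would compute: the correlation of each base shape against the signal (a matched-filter view). The shapes, lengths, and data are invented for the example, and it ignores the frequency/amplitude variation, which you would handle by scanning over rescaled copies of each kernel:

```python
import numpy as np

def shape_activations(signal, kernels):
    """Correlate each fixed base shape with the signal. Peaks in the
    output say which shape occurs and where; essentially what a
    bottleneck layer with fixed kernels computes."""
    acts = {}
    for name, k in kernels.items():
        k = (k - k.mean()) / (np.linalg.norm(k) + 1e-9)  # zero-mean, unit norm
        acts[name] = np.correlate(signal, k, mode="same")
    return acts

t = np.linspace(0, 1, 200, endpoint=False)
kernels = {
    "square":   np.sign(np.sin(2 * np.pi * 5 * t)),
    "triangle": 2 * np.abs(2 * (t * 5 % 1) - 1) - 1,
    "sawtooth": 2 * (t * 5 % 1) - 1,
}

rng = np.random.default_rng(1)
signal = rng.normal(0, 0.2, 2000)            # noise floor
signal[300:500] += kernels["triangle"]       # hide a triangle burst in it

acts = shape_activations(signal, kernels)
best = max(acts, key=lambda n: acts[n].max())
print(best, int(np.argmax(acts[best])))      # which shape, and roughly where
```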

How to decorrelate accelerometer data

Is it possible to decorrelate accelerometer data in real-time? If so, how is it done?
Background:
My application is receiving (X,Y,Z) accelerometer data in real-time (sample rate is 6.75Hz). The sensor is moving in a periodic motion but the motion is not necessarily along only one axis. The 3 signals x(t), y(t) and z(t) are therefore slightly correlated and I would like to know if I can find a rotation matrix (in real time) which can be used to rotate the measured (x,y,z) into a new vector (x*,y*,z*) so that the entire motion is along the z-axis?
I would like to implement the algorithm in C.
Thanks.
What you're trying to do is generally called "principal component analysis". The Wikipedia article is pretty good:
https://en.wikipedia.org/wiki/Principal_component_analysis
For static data you generally use the eigenvectors of the covariance matrix as your new coordinate basis.
PCA in real time is doable, but not super easy. See, for example: http://www.bio-conferences.org/articles/bioconf/pdf/2011/01/bioconf_skills_00055.pdf
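For the static case, the eigenvector recipe is only a few lines of NumPy. A minimal sketch on synthetic data (the motion direction and noise level are invented; a real-time C version would update the covariance estimate recursively instead of batching all samples):

```python
import numpy as np

# Synthetic stand-in for a batch of accelerometer samples at 6.75 Hz;
# the motion direction and noise level are invented for the example.
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / 6.75)
direction = np.array([0.2, 0.3, 0.93])                   # true motion axis
data = np.outer(np.sin(2 * np.pi * 0.5 * t), direction)  # periodic motion
data += rng.normal(0, 0.05, data.shape)                  # sensor noise

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)                 # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)               # eigenvalues ascending
rotated = centered @ eigvecs                         # columns are x*, y*, z*
# eigh sorts eigenvalues in ascending order, so the last column of
# eigvecs is the dominant motion direction and rotated[:, 2] is "z*".
print(eigvecs[:, 2], rotated[:5, 2])
```

For the real-time version you would keep a running (e.g. exponentially weighted) mean and covariance and re-diagonalize the 3x3 matrix per sample or per block, which is cheap at 6.75 Hz.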
I'd like to first of all emphasize that Matt Timmermans' answer describes exactly what people actually do when classifying accelerometer data from clinical studies (a project I worked on).
Then: you're observing a sampled signal. In general, if you have a sensor that gives you samples at a rate of 6.75 Hz, the highest frequency you can detect is 6.75 Hz / 2 = 3.375 Hz. Everything with a higher frequency will inherently be aliased back and look like something with a frequency f, 0 <= f < 3.375 Hz. If you've not considered this, please go and read up on the Nyquist–Shannon sampling theorem. Especially: shield your sensors (however you do that, e.g. by employing dampeners) from all input above that limit, otherwise your measurements might be worth very little or even nothing. If your sensor does this internally (that's absolutely possible; there are plenty of accelerometers with analog low-pass filters), this has been taken care of. Either way, document the characteristics of your sensor.
Now, your case is a little easier, because you know your whole observation is going to be periodic, and it's measured along three orthogonal axes.
In this case, just do three discrete Fourier transforms at once, extract the "strongest" spectral component over all three channels, and find the phase of that spectral component (which is simply the complex argument of that DFT bin) in the other two channels; that gives you something you can map to a periodic movement around a specific axis in 3D space. If you want to, remove those values (set the bins to 0) and search for the strongest component again, etc.
Discrete Fourier transforms can be computed at staggering speed nowadays. At 6.75 Hz, no PC in this world will ever get into trouble running this while receiving further samples. It's a hilariously low sampling rate.
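A hedged sketch of that bin-picking procedure on synthetic data (the 1.2 Hz motion and all amplitudes are invented; only content below the 3.375 Hz Nyquist limit is representable):

```python
import numpy as np

fs = 6.75                                   # sampling rate in Hz
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / fs)                # 60 s of samples
# Synthetic periodic motion at 1.2 Hz (below the 3.375 Hz Nyquist limit):
x = 0.2 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.02, t.size)
y = 0.3 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.02, t.size)
z = 0.9 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.02, t.size)

X, Y, Z = (np.fft.rfft(c) for c in (x, y, z))
power = np.abs(X) ** 2 + np.abs(Y) ** 2 + np.abs(Z) ** 2
power[0] = 0                                # ignore the DC (gravity) bin
k = int(np.argmax(power))                   # strongest bin over all channels
freq = k * fs / t.size                      # its frequency in Hz
amps = [abs(X[k]), abs(Y[k]), abs(Z[k])]    # per-axis amplitude
phases = np.angle([X[k], Y[k], Z[k]])       # complex argument = per-axis phase
print(freq, amps, phases)
# To extract the next component: zero these bins and search again.
```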
Another, more elegant approach (read: you need fewer samples to compute this) would be a parametric estimator; in your case, a direction-of-arrival estimator from the world of RF technology with multiple antennas would, as far as I can tell, map directly to detecting the rotational axis. The classical algorithms here are MUSIC and ESPRIT, and for your case (a limited, known number of oscillating parts), ESPRIT might be the better choice.

Vehicle segmentation and tracking

I've been working on a project for some time to detect and track (moving) vehicles in video captured from UAVs. Currently I am using an SVM trained on bag-of-features representations of local features extracted from vehicle and background images, and then using a sliding-window detection approach to try to localise vehicles in the images, which I would then like to track. The problem is that this approach is far too slow, and my detector isn't as reliable as I would like, so I'm getting quite a few false positives.
So I have been considering attempting to segment the cars from the background to find their approximate positions, so as to reduce the search space before applying my classifier, but I am not sure how to go about this and was hoping someone could help.
Additionally, I have been reading about motion segmentation with layers, using optical flow to segment the frame by flow model. Does anyone have any experience with this method? If so, could you offer some input as to whether you think it would be applicable to my problem?
Below are two frames from a sample video:
frame 0:
frame 5:
Assuming your cars are moving, you could try to estimate the ground plane (road).
You may get a decent ground-plane estimate by extracting features (SURF rather than SIFT, for speed), matching them over frame pairs, and solving for a homography using RANSAC, since a plane in 3D moves according to a homography between two camera frames.
Once you have your ground plane, you can identify the cars by looking at clusters of pixels that don't move according to the estimated homography.
A more sophisticated approach would be to do Structure from Motion on the terrain. This only presupposes that it is rigid, not that it is planar.
Update
I was wondering if you could expand on how you would go about looking for clusters of pixels that don't move according to the estimated homography?
Sure. Say I and K are two video frames and H is the homography mapping features in I to features in K. First you warp I onto K according to H, i.e. you compute the warped image Iw as Iw( [x y]' ) = I( inv(H) [x y]' ) (roughly Matlab notation). Then you look at the squared or absolute difference image Diff = (Iw-K)*(Iw-K). Image content that moves according to the homography H should give small differences (assuming constant illumination and exposure between the images). Image content that violates H, such as moving cars, should stand out.
For clustering high-error pixel groups in Diff I would start with simple thresholding ("every pixel difference in Diff larger than X is relevant", maybe using an adaptive threshold). The thresholded image can be cleaned up with morphological operations (dilation, erosion) and clustered with connected components. This may be too simplistic, but it's easy to implement for a first try, and it should be fast. For something more fancy, look at Clustering on Wikipedia. A 2D Gaussian mixture model may be interesting; if you initialize it with the detection result from the previous frame, it should be pretty fast.
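A rough OpenCV sketch of the whole pipeline (warp by a RANSAC homography, difference, threshold, morphology, connected components). ORB is used instead of SURF/SIFT only because it ships with stock OpenCV; every threshold value is illustrative:

```python
import cv2
import numpy as np

def moving_object_mask(img_i, img_k, thresh=30):
    """Warp frame I onto frame K with a RANSAC homography, then
    threshold the absolute difference and clean it up morphologically."""
    gray_i = cv2.cvtColor(img_i, cv2.COLOR_BGR2GRAY)
    gray_k = cv2.cvtColor(img_k, cv2.COLOR_BGR2GRAY)

    # Feature extraction + matching (ORB here; SURF/SIFT work too)
    orb = cv2.ORB_create(2000)
    kp_i, des_i = orb.detectAndCompute(gray_i, None)
    kp_k, des_k = orb.detectAndCompute(gray_k, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_i, des_k)

    pts_i = np.float32([kp_i[m.queryIdx].pt for m in matches])
    pts_k = np.float32([kp_k[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(pts_i, pts_k, cv2.RANSAC, 3.0)

    # Iw([x y]') = I(inv(H)[x y]'); warpPerspective does this for us
    h, w = gray_k.shape
    warped = cv2.warpPerspective(gray_i, H, (w, h))

    diff = cv2.absdiff(warped, gray_k)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # erosion + dilation
    n, labels = cv2.connectedComponents(mask)              # cluster the blobs
    return mask, n - 1                                     # blobs minus background
```

With two frames like the ones above, the returned mask should light up mainly on the moving cars.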
I did a little experiment with the two frames you provided, and I have to say I am somewhat surprised myself at how well it works. :-) Left image: difference (color coded) between the two frames you posted. Right image: difference between the frames after matching them with a homography. The remaining differences clearly are the moving cars, and they are sufficiently strong for simple thresholding.
Thinking of the approach you currently use, it may be interesting to combine it with my proposal:
You could try to learn and classify the cars in the difference image Diff instead of the original image. This would amount to learning what a car's motion pattern looks like rather than what a car looks like, which could be more reliable.
You could get rid of the expensive window search and run the classifier only on regions of Diff with sufficiently high values.
Some additional remarks:
In theory, the cars should even stand out if they are not moving since they are not flat, but given your distance to the scene and camera resolution this effect may be too subtle.
You can replace the feature extraction / matching part of my proposal with Optical Flow, if you like. This amounts to identifying flow vectors that "stick out" from a consistent frame-to-frame motion of the ground. It may be prone to outliers in the optical flow, however. You can also try to get the homography from the flow vectors.
This is important: regardless of which method you use, once you have found cars in one frame you should use this information to robustify your search for these cars in consecutive frames, giving a higher likelihood to detections close to the old ones (Kalman filter, etc.). That's what tracking is all about!
If the number of cars in your field of view always remains the same but they move around, then you can use optical flow. It will give you good results against a still background. If the number of cars changes, then you need to call the goodFeaturesToTrack function in OpenCV after a certain number of frames and again track the cars using optical flow.
You can use background modelling so that the cars are always your foreground. The simplest example is frame differencing: subtract the previous frame from the current frame, diff(x,y,k) = I(x,y,k) - I(x,y,k-1). As your cars are moving in each frame, you will get their positions.
Both processes should work fine, since I presume you have a still background. Check this link to find out what optical flow can do.
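A minimal frame-differencing sketch in OpenCV along those lines (the file name "uav_clip.mp4" and the threshold 25 are placeholders):

```python
import cv2

# Frame differencing: foreground = pixels that changed since last frame.
cap = cv2.VideoCapture("uav_clip.mp4")      # hypothetical input clip
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)          # diff(x,y,k) = |I(x,y,k) - I(x,y,k-1)|
    _, fg = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    prev = gray                             # moving cars light up in fg
```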

Kalman filter eye tracking

I'd like to implement a Kalman filter using OpenCV to track an eye (in particular the eyeball). I have read a bit about the Kalman filter on the internet. I have to set the state of my filter. What can I use as the state? My only available data are the 3D coordinates of the eye (x, y, z).
You have to understand the Kalman filter first in order to use it. The most human readable intro with examples I have found so far is the SIGGRAPH Course Pack.
UPDATE
I do not know the Kalman filter implementation in OpenCV.
The state of the filter is perhaps the true coordinates of the eye. However, you can only estimate it from the frames (these are the coordinates you write in your question), hence the need for the filter.
To use the Kalman filter as a black box you will still need:
1. an initial estimate of the state,
2. the measurement noise covariance R,
3. the process noise covariance Q.
A reasonable estimate for 1. is the eye coordinates on the first frame.
As for 2. and 3., see 5.1 Parameter Estimation or Tuning in the SIGGRAPH Course Pack.
Perhaps the example 4.3 An Example: Estimating a Random Constant will also help to understand how the Kalman filter works and what you need.
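Putting those three ingredients into OpenCV's built-in filter might look like the sketch below, with a constant-velocity model for the 3D eye position; all covariance values are illustrative and need tuning per 5.1:

```python
import cv2
import numpy as np

# OpenCV Kalman filter for a 3D point, state = [x y z vx vy vz].
kf = cv2.KalmanFilter(6, 3)                 # 6 state vars, 3 measured
kf.transitionMatrix = np.eye(6, dtype=np.float32)
for i in range(3):
    kf.transitionMatrix[i, i + 3] = 1.0     # constant-velocity model
kf.measurementMatrix = np.hstack(
    [np.eye(3), np.zeros((3, 3))]).astype(np.float32)    # we observe x, y, z
kf.processNoiseCov = np.eye(6, dtype=np.float32) * 1e-4      # Q (tune)
kf.measurementNoiseCov = np.eye(3, dtype=np.float32) * 1e-1  # R (tune)

first = np.array([1.0, 2.0, 3.0], np.float32)   # eye position on frame 0
kf.statePost = np.concatenate([first, np.zeros(3, np.float32)]).reshape(6, 1)

for meas in [(1.1, 2.0, 2.9), (1.2, 2.1, 3.0)]:  # per-frame detections
    kf.predict()
    est = kf.correct(np.array(meas, np.float32).reshape(3, 1))
    print(est[:3].ravel())                  # filtered eye coordinates
```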

Perlin noise for motion?

I'm successfully using Perlin noise to generate terrain, clouds and a few other nifty things. However, I'm now trying to animate a group of flying insects (specifically fireflies), and it was suggested to me to use Perlin noise for this, as well. However, I'm not really sure how to go about this.
The first thing that occurred to me was, given a noise map like so:
Assign each firefly a random initial location, velocity and angular acceleration.
Each frame, advance the fly's position following its direction vector.
Read the noise map at the new location, and use it to adjust the angular acceleration, causing the fly to "turn" towards lighter pixels.
Adjust angular acceleration again by proximity of other flies, to avoid having them cluster around local maxima.
However, this doesn't cover cases where flies reach the edge of the map, or cases where they might wind up just orbiting a single point. The second case might not be a big deal, but I'm unsure of a reliable way to have them turn to avoid collisions with the map edge.
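For what it's worth, here is a minimal sketch of the loop described above (every constant is invented, and the toroidal wrap-around is only a placeholder where the unresolved edge behaviour would go):

```python
import numpy as np

rng = np.random.default_rng(3)
H, W = 256, 256
noise_map = rng.random((H, W))           # stand-in for the Perlin noise map

n = 20
pos = rng.random((n, 2)) * [W, H]        # step 1: random positions...
angle = rng.uniform(0, 2 * np.pi, n)     # ...and headings
ang_vel = np.zeros(n)
speed = 1.0

def brightness(p, a, look=4.0):
    """Sample the noise map a few pixels ahead along heading a."""
    x = (p[:, 0] + look * np.cos(a)).astype(int) % W
    y = (p[:, 1] + look * np.sin(a)).astype(int) % H
    return noise_map[y, x]

for frame in range(100):
    # step 2: advance each fly along its direction vector
    pos[:, 0] += speed * np.cos(angle)
    pos[:, 1] += speed * np.sin(angle)
    pos %= [W, H]                        # placeholder: wrap at the map edge
    # step 3: steer toward the lighter of the ahead-left/ahead-right samples
    ang_acc = 0.05 * (brightness(pos, angle + 0.5) - brightness(pos, angle - 0.5))
    # step 4: nudge apart flies that crowd the same bright spot
    d = pos[:, None, :] - pos[None, :, :]
    dist = np.hypot(d[..., 0], d[..., 1]) + np.eye(n) * 1e9
    crowded = dist.min(axis=1) < 5.0
    ang_acc[crowded] += rng.normal(0, 0.1, crowded.sum())
    ang_vel += ang_acc
    angle += ang_vel
```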
Suggestions? Tutorials or papers (in English, please)?
Here is a very good source for 2D Perlin noise. You can follow the exact same principles, but instead of creating a 2D grid of gradients, you can create a 1D array of gradients. You can use this to create your noise for a particular axis.
Simply follow this recipe, and you can create similar Perlin noise functions for each of your other axes too! Combine these motions, and you should have some good-looking noise on your hands. (You could also use these noise functions as random accelerations or velocities. Since the Perlin noise function is bounded, your flies won't rocket off to crazy distances.)
http://webstaff.itn.liu.se/~stegu/TNM022-2005/perlinnoiselinks/perlin-noise-math-faq.html
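Following that recipe in 1D might look like the sketch below (one random gradient per integer lattice point, smoothly interpolated between them; the table size and scale are arbitrary):

```python
import numpy as np

# Minimal 1D Perlin-style noise: one gradient per integer lattice point,
# smoothly blended between neighbours (the 1D analogue of the 2D recipe).
rng = np.random.default_rng(42)
gradients = rng.uniform(-1.0, 1.0, 256)

def fade(t):
    return t * t * t * (t * (t * 6 - 15) + 10)   # Perlin's smoothstep

def perlin1d(x):
    i0 = int(np.floor(x)) % 256
    i1 = (i0 + 1) % 256
    t = x - np.floor(x)
    v0 = gradients[i0] * t          # "dot product" in 1D: gradient * offset
    v1 = gradients[i1] * (t - 1.0)
    return v0 + fade(t) * (v1 - v0)

# One independent noise channel per axis, e.g. as a velocity over time
# (the 1000.0 offset just decorrelates the second channel):
vx = [perlin1d(0.02 * frame) for frame in range(1000)]
vy = [perlin1d(0.02 * frame + 1000.0) for frame in range(1000)]
```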
If you're curious about other types of motion, I would suggest Brownian motion. That is the same sort of motion that dust particles exhibit when they are floating around your room. This article gets into some more interesting math at the end, but if you're at all familiar with Matlab, the first few sets of instructions should be pretty easy to understand. If not, just google the functions and find their native equivalents for your environment (or create them yourself!). This will be a little more realistic, and much quicker to calculate, than Perlin noise:
http://lben.epfl.ch/files/content/sites/lben/files/users/179705/Simulating%20Brownian%20Motion.pdf
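For reference, Brownian motion itself is just a cumulative sum of independent Gaussian steps, e.g. (step size invented):

```python
import numpy as np

# 2D Brownian motion: cumulative sum of independent Gaussian steps.
rng = np.random.default_rng(7)
steps = rng.normal(0.0, 0.5, size=(1000, 2))   # (frames, xy); sigma = 0.5
path = np.cumsum(steps, axis=0)                # one firefly's trajectory
```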
Happy flying!
Maybe you're looking for boids?
Wikipedia page
It doesn't feature Perlin noise in the original concept, but maybe you could use the noise to generate attractors or repulsors, as you're trying to do with the "fly to lighter pixels" behavior.
PS: the page linked above features a related link to the Firefly algorithm; maybe you'll be interested in that?
