Wavelet packet decomposition vs bandpass filters - signal-processing

If I understand correctly, wavelet packet decomposition (WPT) breaks a signal into sub-bands using filter banks.
The same thing can be done using many bandpass filters.
My aim is to find the energy content of a signal with a high sampling rate (2000 Hz) in frequency bands such as 1-200 Hz, 200-400 Hz, 400-600 Hz.
What are the advantages and disadvantages of using WPT over bandpass filters?

With WPT (or the DWT, for that matter) you have quadrature mirror filters that ensure that if you add up all the reconstructed signals at the last level (the leaves) of the WPT tree, you get exactly the original signal, apart from finite-word-length rounding in the processor. The algorithm is also quite fast.
Moreover, if your signal is non-stationary you gain time-frequency localization, although the time resolution decreases drastically as you go further down the (inverted) tree.
The other aspect is that if you are lucky enough to find a wavelet that correlates well with the non-stationary components of your signal, the transform will represent those components more efficiently.
For your application, first see how many levels you have to descend in the WPT tree to go from your sampling frequency to the desired frequency intervals. You may not get exactly 200-400 Hz, 400-600 Hz, etc.; the deeper you go in the tree, the finer the frequency limits become, and you may have to join nodes to build your bands.
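For what it's worth, here is a minimal sketch of that node-joining step, assuming Python with NumPy and PyWavelets (pywt) is available; the toy signal, wavelet choice and decomposition level are illustrative, not from the question:

```python
import numpy as np
import pywt

fs = 2000.0                              # sampling rate from the question (Hz)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)  # toy signal

level = 5                                # leaf bandwidth = (fs/2) / 2**5 = 31.25 Hz
wp = pywt.WaveletPacket(data=x, wavelet='db4', mode='symmetric', maxlevel=level)
leaves = wp.get_level(level, order='freq')   # leaves sorted by frequency
leaf_bw = (fs / 2) / 2 ** level

for lo, hi in [(0, 200), (200, 400), (400, 600)]:    # desired bands (Hz)
    # join the leaves whose centre frequency falls inside [lo, hi)
    energy = sum(np.sum(n.data ** 2)
                 for k, n in enumerate(leaves)
                 if lo <= (k + 0.5) * leaf_bw < hi)
    print(f"{lo:3d}-{hi:3d} Hz: energy ~ {energy:.3f}")
```

The print-out makes the point above concrete: with leaves 31.25 Hz wide, 200 Hz band edges can only be approximated by grouping leaves.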

Related

How to remove sidelobes when computing frequencies from an FFT?

I am currently operating in the VHF band and trying to detect frequencies using an FFT thresholding method.
While detecting multiple frequencies, I receive spurs (perhaps not the right word) in addition to the original frequencies. For example,
with incoming frequencies f1 and f2, I also receive their sum f1+f2 and difference f1-f2.
I am trying to eliminate these by thresholding, but I cannot distinguish them from the magnitudes of the real frequencies.
Please suggest a method or methodology to eliminate this problem.
Input frequencies: F1, F2
Expected frequencies: F1, F2
Received frequencies: F1, F2, F1-F2, F1+F2
Plot illustrating the problem: https://imgur.com/3rYYNv2
Windowing can reduce spectral-leakage artifacts and distant sidelobes, but it widens the main lobe in exchange. A large reduction in both the main lobe and the near sidelobes normally requires using more data and a longer FFT.
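As a rough illustration of that trade-off (a sketch with made-up parameters, not the asker's actual VHF setup), one can compare the leakage floor of a rectangular and a Blackman window on a two-tone signal, assuming Python with NumPy:

```python
import numpy as np

fs = 10e6                          # assumed sample rate (Hz)
n = 4096
t = np.arange(n) / fs
f1, f2 = 1.1e6, 2.3e6              # illustrative tone frequencies
x = np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)

freqs = np.fft.rfftfreq(n, 1 / fs)
far = (freqs > 3e6) & (freqs < 4e6)    # region well away from both tones

for name, w in [("rectangular", np.ones(n)), ("blackman", np.blackman(n))]:
    spec = np.abs(np.fft.rfft(x * w))
    spec /= spec.max()
    leakage_db = 20 * np.log10(spec[far].max())   # sidelobe leakage level
    print(f"{name:12s} worst leakage 3-4 MHz: {leakage_db:6.1f} dB")
```

The windowed spectrum shows a much lower leakage floor far from the tones, at the cost of a wider main lobe around each tone.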

Is a Kalman filter needed?

I have a swarm robotics project. The localization system uses ultrasonic and infrared transmitters/receivers, and the accuracy is ±7 cm. I was able to implement a follow-the-leader algorithm. However, I was wondering: why do I still have to use a Kalman filter if the raw sensor data are good? What will it improve? Won't it just delay the coordinates being sent to the robots? (The coordinates won't be updated instantly, since the Kalman filter math takes time and each robot sends its coordinates 4 times a second.)
Sensor data are NEVER the truth, no matter how good the sensors are. They will always be perturbed by some noise, and they have finite precision. So sensor data are nothing but observations that you make, and what you want to do is estimate the true state based on those observations. In mathematical terms, you want to estimate a likelihood or joint probability based on those measurements. You can do that using different tools depending on the context. One such tool is the Kalman filter, which in the simplest case is just a moving average, but which is usually used together with dynamic models and some assumptions on the error/state distributions in order to be useful. The dynamic models describe the state propagation (e.g. motion given previous states) and the observations (measurements), and in robotics/SLAM one often assumes the errors are Gaussian. A very important and useful product of such filters is an estimate of the uncertainty in terms of covariances.
Now, what are the potential improvements? Basically, you make sure that your sensor measurements are coherent with a mathematical model and that they are "smooth". For example, if you want to estimate the position of a moving vehicle, the kinematic equations will tell you where you expect the vehicle to be, and you have an associated covariance. Your measurements also come with a covariance. So, if you get measurements that have low certainty, you will end up trusting the mathematical model instead of trusting the measurements, and vice-versa.
Finally, if you are worried about the delay... note that the complexity of a standard extended Kalman filter is roughly O(N^3), where N is the number of landmarks. So if you really don't have enough computational power, you can just reduce the state to pose and velocity, and the overhead will be negligible.
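To make the predict/update blending described above concrete, here is a minimal 1-D constant-velocity Kalman filter sketch in Python/NumPy. The noise values and time step are illustrative assumptions (the ±7 cm figure is reused as the measurement noise), not a drop-in solution for the swarm:

```python
import numpy as np

dt = 0.25                                  # 4 position updates per second
F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition for (pos, vel)
H = np.array([[1.0, 0.0]])                 # we only measure position
Q = 1e-3 * np.eye(2)                       # process noise: trust in the model
R = np.array([[0.07 ** 2]])                # measurement noise: ~ ±7 cm

x = np.zeros((2, 1))                       # initial state estimate
P = np.eye(2)                              # initial uncertainty

rng = np.random.default_rng(0)
true_pos = 0.2 * dt * np.arange(40)        # robot moving at 0.2 m/s
meas = true_pos + rng.normal(0, 0.07, size=40)

filtered = []
for z in meas:
    # predict: propagate state and covariance through the motion model
    x, P = F @ x, F @ P @ F.T + Q
    # update: weight measurement vs. prediction by their covariances
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    filtered.append(x[0, 0])               # smoothed position estimate
```

When the measurement covariance R is large relative to the predicted covariance, the gain K shrinks and the filter leans on the motion model, and vice versa, which is exactly the trade-off described above.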
In general, a Kalman filter improves sensor accuracy by summing (with the right coefficients) the measurement (sensor output) and a prediction of that output. The prediction is the hardest part, because you need to create a model that somehow predicts the sensors' output. In your case, I think it is unnecessary to spend time creating this model.
Although you are getting accurate data from the sensors, they cannot always be consistent. The Kalman filter will not only identify outliers in the measurement data but can also predict the state when a measurement is missing. However, if you are really looking for something with lower computational requirements, you can go for a complementary filter.

Time Series DFT Signals Clustering

I have a number of time-series data sets that I want to transform with the DFT in order to reduce dimensionality. After the transform, I want to cluster the resulting DFT data sets using the k-means algorithm.
Since DFT coefficients contain imaginary parts, how can one cluster them?
You could simply treat the imaginary part as another component in your vectors. In other applications, you will want to ignore it!
But you'll be facing other, more severe challenges.
Data mining, and clustering in particular, is rarely as easy as applying function A (DFT) and function B (k-means) and then you have the result, hooray. Sorry - that is not how exploratory data mining works.
First of all, for many time series, DFT will not be helpful at all. On others, you will first have to do appropriate resampling, or segmentation, or get rid of uninteresting effects such as seasonality. Even if DFT works, it may emphasize artifacts such as the sampling frequency or some interferences.
And then you'll run into one major problem: k-means is based on the assumption that all attributes have the same importance. And DFT is based on the very opposite idea: the first components capture most of the signal, the later ones only minor deviations from it (and that is the very motivation for using this as dimensionality reduction).
So based on this intuition, maybe you should never apply k-means to DFT coefficients at all. At the same time, data mining has repeatedly shown that approaches which are "statistical nonsense" can nevertheless provide useful results... so you can try, but verify your results with care, and avoid being too enthusiastic or optimistic.
With the help of the FFT, each data set is converted into its DFT coefficients; the FFT is simply an efficient way to calculate the DFT for each small data set.
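For completeness, a small sketch of the "treat real and imaginary parts as extra components" idea from the first answer, assuming Python with NumPy and scikit-learn; the random data, number of kept bins and number of clusters are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 256))              # 100 toy time series

coeffs = np.fft.rfft(series, axis=1)[:, :16]      # keep only the low-frequency bins
features = np.hstack([coeffs.real, coeffs.imag])  # real-valued feature vectors

labels = KMeans(n_clusters=3, n_init=10).fit_predict(features)
```

Keeping only the first few bins is the dimensionality reduction; whether those coefficients are meaningful inputs for k-means is exactly the caveat raised above.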

Phase difference between two signals?

I'm working on an embedded project where I have to drive a transducer at resonance by calculating the phase difference between its voltage and current waveforms and driving that difference to zero by changing the frequency. The current I and voltage V are at the same frequency at any instant, but that frequency is not fixed (approximately 47 kHz - 52 kHz). All I have to do is calculate the phase difference between these two signals. Which method will be most effective?
An FFT of the two signals and then the phase difference between the specific components?
Or cross-correlation of the two signals?
Or another method, if any? Which method will give me the most accurate result, and with what resolution? Does the sampling rate affect the phase-difference resolution (the minimum phase difference that can be sensed)?
I'm new to digital signal processing; please correct me if I've made any mistakes.
ADDITIONAL DETAILS:
Noise in my system can be white/Gaussian noise (not significant) and harmonics of the fundamental (which might be the significant one in the resonance-mismatch case).
Yes, the 4046 can be a good alternative with switching regulators. I'm working with an NCO/DDS, where I can scale/reshape the sinusoid on an ongoing basis.
Implementing an analog filter would be very complex, as I would require a higher-order filter with a high roll-off rate for harmonic removal, so I'm choosing a DSP-based filter; it's easy to work with in MATLAB and on DSP processors.
What sampling rate would you suggest for a ~50 kHz (47 kHz - 52 kHz) system to achieve an FFT or Goertzel result with a phase resolution of preferably 0.1 degrees or less, given that the frequency steps will vary from as small as ~1-2 Hz up to 50-200 Hz?
My frequency is variable (45 kHz - 55 kHz) but will be known to my system. Knowing the phase error for the last fed frequency is more desirable. After the FFT and digital filtering, an IFFT can be performed to obtain cleaner samples for further processing, so I guess the FFT does both tasks.
But I'm wondering about the phase-difference accuracy, because that's the crucial part.
The Goertzel algorithm (http://www.embedded.com/design/configurable-systems/4024443/The-Goertzel-Algorithm) is a fairly efficient tone-detection method that resolves the signal into real and imaginary components. I'll assume you can do the numerics to get the phase difference, or just the polarity, as you require.
Resolution versus time constant is a design tradeoff; this article highlights the issues: http://www.mstarlabs.com/dsp/goertzel/goertzel.html
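A rough Python/NumPy sketch of that idea: extract the complex component of each signal at the (known) drive frequency with a Goertzel-style recursion and take the phase of their ratio. The sample rate, tone frequency and 12.5 degree offset are illustrative assumptions:

```python
import numpy as np

def goertzel(x, f0, fs):
    """Complex spectral component of x at frequency f0 (Hz), Goertzel-style."""
    w = 2 * np.pi * f0 / fs
    coeff = 2 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev - s_prev2 * np.exp(-1j * w)   # combine the final two states

fs, f0 = 1_000_000, 50_000                      # assumed sample rate / drive frequency
t = np.arange(2000) / fs                        # an exact number of cycles, for simplicity
v = np.sin(2 * np.pi * f0 * t)                           # "voltage"
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(12.5))        # "current", lagging 12.5 deg

phase_diff = np.angle(goertzel(v, f0, fs) / goertzel(i, f0, fs))
print(np.rad2deg(phase_diff))                   # ~ 12.5 degrees
```

Taking the ratio of the two Goertzel outputs cancels any common phase term, so only the voltage-to-current phase difference remains.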
Additional
"What accuracy can be obtained?"
It depends... upon what you are faced with (signal levels, external noise, etc.), what hardware you have (ADC, processor, etc.), and how you implement your solution (sample rate, numerical precision, etc.). Without the complete picture, I'd only be guessing at what you could achieve, as the Goertzel approach is far from easy.
But I imagine that for a high-school project with good signal levels and low noise, the easier method of using the phase comparator of a 4046 PLL (comparator 2, as it locks at zero degrees; www.nxp.com/documents/data_sheet/HEF4046B.pdf) will likely get you down to a few degrees.
One other issue, if you have a high-Q transducer, is generating a high-resolution frequency. There is a method, but that's another avenue.
Yet more
"Harmonics of Fundamental (Which might be significant)"... hmm hence the digital filtering;
but if the sampling rate is too low there might be a problem with aliasing. Also, mismatched anti-aliasing filters are likely to eat your whole error budget. A rule-of-thumb sampling rate of ten times the signal frequency seems a bit low; a higher rate will make the filter design easier.
Spatial windowing addresses off-frequency issues along with higher roll-off and attenuation, and is described in "Sliding Spectrum Analysis" by Eric Jacobsen and Richard Lyons, in Streamlining Digital Signal Processing: http://www.amazon.com/Streamlining-Digital-Signal-Processing-Guidebook/dp/1118278380
In my previous project, after detecting either carrier I was interested in the timing of the frequency changes in immense noise. With carrier phase-generation inconsistencies, the phase error was never quiescent enough to be quantified, so I can't guess any better than you what you might get under your project conditions.
Not to detract from chip's answer (I upvoted it!) but some other options are:
Cross correlation. Off the top of my head, I am not sure what the performance difference between that and the Goertzel algorithm will be, but both should be doable on an embedded system.
Ad-hoc methods. For example, I would try something like this: bandpass the signals to eliminate noise, find the peaks and measure the time difference between the peaks. This will probably be more efficient, and, provided you do a reasonable job throwing out outliers and handling wrap-around, should be extremely robust. The bandpass filters will, themselves, alter the phase, so you'll have to make sure you apply exactly the same filter to both signals.
If the input signal-to-noise ratios are not too bad, a computationally efficient solution can be built based on zero-crossing detection. Also, have a look at http://www.metrology.pg.gda.pl/full/2005/M&MS_2005_427.pdf for a nice comparison of phase-difference detection algorithms, including zero-crossing ones.
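A hedged sketch of that zero-crossing approach in Python/NumPy; the sample rate, drive frequency and 20 degree offset are invented for illustration, and the drive frequency is assumed known (as it is in the question):

```python
import numpy as np

def rising_zero_crossings(x, fs):
    """Times (s) of positive-going zero crossings, with linear interpolation."""
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])      # sub-sample refinement
    return (idx + frac) / fs

fs, f0 = 1_000_000, 50_000                      # assumed sample rate / drive frequency
t = np.arange(5000) / fs
v = np.sin(2 * np.pi * f0 * t)                            # "voltage"
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(20.0))         # "current", lagging 20 deg

tv = rising_zero_crossings(v, fs)
ti = rising_zero_crossings(i, fs)
n = min(len(tv), len(ti))

# phase of each crossing-time difference, wrapped into (-180, 180] degrees
dphi = (((ti[:n] - tv[:n]) * f0) % 1.0) * 360.0
dphi = np.where(dphi > 180.0, dphi - 360.0, dphi)
print(dphi.mean())                              # ~ 20 degrees
```

Averaging many crossings and wrapping the result handles the outlier and wrap-around issues mentioned in the other answer.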
Computing one bin of a DFT (or using the similar complex Goertzel block filter) will work if the signal frequency is accurately known (set the DFT bin or the Goertzel filter to exactly that frequency).
If the frequency isn't exactly known, you could try using an FFT with an FFT-shift to interpolate the frequency-magnitude peak, and then interpolate the phase at that frequency for each of the two signals. An FFT will also allow you to window the data, which may improve phase-estimation accuracy if the frequency isn't exactly bin-centered (or exactly the Goertzel filter frequency). Different windows may improve the phase-estimation accuracy for frequencies "between bins". A Blackman-Nuttall window will be better than a rectangular window, but there may be better window choices.
The phase measurement accuracy will depend on the S/N ratio, the length of time one samples the two (assumed stationary) signals, and possibly the window used.
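A brief sketch of that windowed-FFT approach in Python/NumPy, with a Blackman window standing in for the Blackman-Nuttall suggestion; the sample rate, off-bin tone frequency and 7 degree offset are assumptions for illustration:

```python
import numpy as np

fs, n = 1_000_000, 8192
t = np.arange(n) / fs
f0 = 50_123.0                            # deliberately not bin-centred
v = np.sin(2 * np.pi * f0 * t)
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(7.0))

w = np.blackman(n)                       # window to tame leakage between bins
Sv = np.fft.rfft(v * w)
Si = np.fft.rfft(i * w)

k = np.argmax(np.abs(Sv))                # magnitude peak of the voltage spectrum
phase_diff = np.angle(Sv[k] / Si[k])     # common window/offset terms cancel in the ratio
print(np.rad2deg(phase_diff))            # ~ 7 degrees
```

Because both signals share the same frequency and the same window, the leakage-related phase term at the peak bin is common to both spectra and drops out of the ratio.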
If you have a Phase Locked Loop (PLL) that tracks each input, then you can subtract the phase coefficients (of the generator components) to determine the offset between the phases. This would also be robust against noise.

Clustering Method Selection in High-Dimension?

If the data to cluster are literally points (either 2D (x, y) or 3D (x, y, z)), choosing a clustering method is quite intuitive. Because we can draw and visualize them, we have a better idea of which clustering method is more suitable.
e.g.1 If my 2D data set looks like the one shown in the top-right corner, I would know that k-means may not be a wise choice here, whereas DBSCAN seems like a better idea.
However, just as the scikit-learn website states:
While these examples give some intuition about the algorithms, this
intuition might not apply to very high dimensional data.
AFAIK, in most practical problems we don't have such simple data. Most probably, we have high-dimensional tuples, which cannot be visualized in that way.
e.g.2 I wish to cluster a data set where each data point is represented as a 4-D tuple <characteristic1, characteristic2, characteristic3, characteristic4>. I CANNOT visualize it in a coordinate system and observe its distribution as before, so I will NOT be able to say that DBSCAN is superior to k-means in this case.
So my question:
How does one choose the suitable clustering method for such an "invisualizable" high-dimensional case?
"High-dimensional" in clustering probably starts at some 10-20 dimensions in dense data, and 1000+ dimensions in sparse data (e.g. text).
4 dimensions are not much of a problem, and can still be visualized; for example by using multiple 2d projections (or even 3d, using rotation); or using parallel coordinates. Here's a visualization of the 4-dimensional "iris" data set using a scatter plot matrix.
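For instance, a scatter-plot matrix of the 4-dimensional iris data can be produced in a few lines, assuming Python with pandas, matplotlib and scikit-learn; this is just an illustration of the kind of plot meant above:

```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)                      # 4 features, 3 species
scatter_matrix(iris.data, c=iris.target, figsize=(8, 8), diagonal="hist")
plt.show()
```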
However, the first thing you still should do is spend a lot of time on preprocessing, and finding an appropriate distance function.
If you really need methods for high-dimensional data, have a look at subspace clustering and correlation clustering, e.g.
Kriegel, Hans-Peter, Peer Kröger, and Arthur Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD) 3.1 (2009): 1.
The authors of that survey also publish a software framework which has a lot of these advanced clustering methods (not just k-means, but e.g. CASH, FourC, ERiC): ELKI
There are at least two common, generic approaches:
One can use some dimensionality-reduction technique in order to actually visualize the high-dimensional data; there are dozens of popular solutions, including (but not limited to):
PCA - principal component analysis
SOM - self-organizing maps
Sammon's mapping
Autoencoder Neural Networks
KPCA - kernel principal component analysis
Isomap
After this, one either goes back to the original space and uses techniques that seem reasonable based on observations in the reduced space, or performs clustering in the reduced space itself. The first approach uses all available information but can be misled by distortions introduced by the reduction process, while the second ensures that your observations and choice are valid (as you reduce your problem to a nice 2D/3D one) but loses a lot of information due to the transformation used.
One tries many different algorithms and chooses the one with the best metrics (many clustering-evaluation metrics have been proposed). This is a computationally expensive approach, but it has lower bias (since reducing the dimensionality changes the information content according to the transformation used).
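A compact sketch of both approaches in Python with scikit-learn; the iris data is only a stand-in for "high-dimensional" data, and the candidate algorithms and parameters are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score

X = load_iris().data                           # stand-in for higher-dimensional data

# Approach 1: reduce, so the data can at least be inspected in 2-D
X2 = PCA(n_components=2).fit_transform(X)      # plot X2, or cluster it directly

# Approach 2: try several algorithms and compare them with an evaluation metric
candidates = {
    "kmeans, k=2": KMeans(n_clusters=2, n_init=10),
    "kmeans, k=3": KMeans(n_clusters=3, n_init=10),
    "dbscan":      DBSCAN(eps=0.8),
}
for name, algo in candidates.items():
    labels = algo.fit_predict(X)
    if len(set(labels)) > 1:                   # silhouette needs at least 2 labels
        print(name, silhouette_score(X, labels))
```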
It is true that high-dimensional data cannot be easily visualized in a Euclidean high-dimensional space, but it is not true that there are no visualization techniques for them.
In addition, with just 4 features (your dimensions) you can easily try the parallel-coordinates visualization method, or simply try a multivariate data analysis taking two features at a time (so 6 pairs in total) to figure out which relations hold between them (correlation and dependency, generally). You can even use a 3D space for three features at a time.
Then, how do you get information from these visualizations? Well, it is not as easy as in a low-dimensional Euclidean space, but the point is to spot visually whether the data clusters into groups (e.g. near some values on an axis in a parallel-coordinates diagram) and to consider whether the data is somehow separable (e.g. whether it forms regions like circles, or is linearly separable, in the scatter plots).
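If it helps, a parallel-coordinates plot of 4-feature data also takes only a few lines, assuming Python with pandas, matplotlib and scikit-learn; iris is used here purely as an example 4-feature data set, and the class column is only used for colouring:

```python
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data.copy()
df["species"] = iris.target_names[iris.target]   # label column for colouring
parallel_coordinates(df, "species")
plt.show()
```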
A little digression: the diagram you posted is not indicative of the power or capabilities of each algorithm given particular data distributions; it simply highlights the nature of some algorithms. For instance, k-means is able to separate only convex and ellipsoidal areas (and keep in mind that convexity and ellipsoids exist in N dimensions as well). What I mean is that there is no rule that says: given the distributions depicted in this diagram, you have to choose the corresponding clustering algorithm.
I suggest using a data-mining toolbox that lets you explore and visualize the data (and easily transform it, since you can change its topology with transformations, projections and reductions; check the other answer by lejlot for that), such as Weka (plus you do not have to implement all the algorithms yourself).
Finally, I will point you to this resource for different cluster goodness and fitness measures, so you can compare the results from different algorithms.
I would also suggest soft subspace clustering, a pretty common approach nowadays, where feature weights are added to find the most relevant features. You can use these weights to increase performance and to improve the BMU (best-matching unit) calculation with Euclidean distance, for example.
