Discrete Wavelet Transform (Daubechies wavelet) of an array of complex numbers

Say, I have a signal represented as an array of real numbers y = [1,2,0,4,5,6,7,90,5,6]. I can use the Daubechies-4 coefficients D4 = [0.482962, 0.836516, 0.224143, -0.129409] and apply a wavelet transform to obtain the high- and low-frequency subbands of the signal. So, the high-frequency component will be calculated like this:
high[v] = y[2*v]*D4[0] + y[2*v+1]*D4[1] + y[2*v+2]*D4[2] + y[2*v+3]*D4[3],
and the low-frequency component can be calculated using another permutation of the D4 coefficients.
The question is: what if y is a complex array? Do I just multiply and add complex numbers to obtain the subbands, or is it correct to take the amplitude and phase, treat each of them as a real signal, do the wavelet transform on each, and then restore the complex array for each subband using the formulas real_part = abs * cos(phase) and imaginary_part = abs * sin(phase)?

To handle the case of complex data, you're looking at the Complex Wavelet Transform. It's actually a simple extension to the DWT. The most common way to handle complex data is to treat the real and imaginary components as two separate signals and perform a DWT on each component separately. You will then receive the decomposition of the real and imaginary components.
This is commonly known as the Dual-Tree Complex Wavelet Transform. It is best described by the dual-tree filter-bank block diagram on Wikipedia (figure not reproduced here; source: Wikipedia).
It's called "dual-tree" because you have two DWT decompositions happening in parallel, one for the real component and one for the imaginary. In that diagram, g0/h0 represent the low-pass and high-pass filters applied to the real part of the signal x, and g1/h1 represent the low-pass and high-pass filters applied to the imaginary part.
Once you decompose the real and imaginary parts into their respective DWT decompositions, you can combine them to get the magnitude and/or phase and proceed to the next step or whatever you desire to do with them.
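As a minimal sketch of that idea in NumPy (this reuses the D4 pair from the question for both parts; the actual dual-tree design uses a second, specially designed filter pair for the imaginary tree, and periodic extension at the signal boundary is assumed here):
import numpy as np

D4_lo = np.array([0.482962, 0.836516, 0.224143, -0.129409])    # scaling (low-pass) filter
D4_hi = np.array([-0.129409, -0.224143, 0.836516, -0.482962])  # wavelet (high-pass) filter

def dwt_step(signal):
    # One DWT level of a real 1-D signal; returns (low, high) subbands.
    y = np.asarray(signal, dtype=float)
    n = len(y)
    low, high = np.empty(n // 2), np.empty(n // 2)
    for v in range(n // 2):
        window = y[[(2 * v + k) % n for k in range(4)]]  # periodic boundary handling
        low[v] = np.dot(D4_lo, window)
        high[v] = np.dot(D4_hi, window)
    return low, high

y = np.array([1, 2, 0, 4, 5, 6, 7, 90, 5, 6]) * (1 + 0.5j)     # a made-up complex signal
low_re, high_re = dwt_step(y.real)
low_im, high_im = dwt_step(y.imag)
low, high = low_re + 1j * low_im, high_re + 1j * high_im       # complex-valued subbands
Because the filters are real and the transform is linear, filtering the complex array directly gives the same subbands as filtering the real and imaginary parts separately; decomposing magnitude and phase as separate real signals does not, since they are nonlinear functions of y.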
The mathematical proof of why this is correct is outside the scope of what we're talking about, but if you would like to see how it was derived, I refer you to Kingsbury's 1997 paper, Image Processing with Complex Wavelets: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=835E60EAF8B1BE4DB34C77FEE9BBBD56?doi=10.1.1.55.3189&rep=rep1&type=pdf. Pay close attention to the noise filtering of images using the CWT - this is probably what you're looking for.


Trying to do PCA analysis on interest rate swaps data (multivariate time series)

I have a data set with 20 non-overlapping different swap rates (spot1y, 1y1y, 2y1y, 3y1y, 4y1y, 5y2y, 7y3y, 10y2y, 12y3y...) over the past year.
I want to use PCA / multiregression and look at residuals in order to determine which sectors on the curve are cheap/rich. Has anyone had experience with this? I've done PCA but not for time series. I'd ideally like to model something similar to the first figure here but in USD.
https://plus.credit-suisse.com/rpc4/ravDocView?docid=kv66a7
Thanks!
Here are some broad strokes that can help answer your question. Also, that's a neat analysis from CS :)
Let's be pythonistas and use NumPy. You can imagine your dataset as a 20x261 array of floats. The first place to start is creating the array. Suppose you have a CSV file storing the raw data persistently. Then a reasonable first step to load the data would be something as simple as:
import numpy
x = numpy.loadtxt("path/to/my/file", delimiter=",")
The object x is our raw time-series matrix, and we can verify that x.shape == (20, 261). The next step is to transform this array into its covariance matrix. Whether or not that has been done on the raw data already, the first step is centering each time series on its mean, like this:
x_centered = x - x.mean(axis=1, keepdims=True)
The purpose of this step is to help simplify any necessary rescaling, and it is a very good habit that usually shouldn't be skipped. The call to x.mean uses the parameters axis and keepdims to make sure each row (e.g. the time series for spot1y, ...) is centered on its own mean.
The next steps are to square and scale x to produce a swap-rate covariance array. With 2-dimensional arrays like x, there are two ways to square it: one leads to a 261x261 array and the other to a 20x20 array. It's the second one we are interested in, and the squaring procedure that works for our purposes is:
x_centered_squared = numpy.matmul(x_centered, x_centered.transpose())
Then, to scale, one can choose between 1/261 and 1/(261-1) depending on the statistical context (population vs. sample covariance), which looks like this:
x_covariance = x_centered_squared * (1/261)
The array x_covariance has an entry for how each swap rate changes with itself, and changes with any one of the other swap rates. In linear-algebraic terms, it is a symmetric operator that characterizes the spread of each swap rate.
Linear algebra also tells us that this array can be decomposed into its associated eigen-spectrum, with elements in this spectrum being scalar-vector pairs, or eigenvalue-eigenvector pairs. In the analysis you shared, x_covariance's eigenvalues are plotted in exhibit two as percent variance explained. To produce the data for a plot like exhibit two (which you will always want to furnish to the readers of your PCA), you simply divide each eigenvalue by the sum of all of them, then multiply each by 100.0. Due to the convenient properties of x_covariance, a suitable way to compute its spectrum is like this:
vals, vects = numpy.linalg.eig(x_covariance)
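Since x_covariance is symmetric, numpy.linalg.eigh would also work here (and returns the eigenvalues sorted in ascending order). Either way, the percent-variance-explained figures described above take just a couple of lines, for example:
explained = 100.0 * vals / vals.sum()      # each eigenvalue as a share of the total variance
explained = numpy.sort(explained)[::-1]    # largest first, as in exhibit two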
We are now in a position to talk about residuals! Here is their definition (with our namespace): residuals_ij = x_ij − reconstructed_ij; i = 1:20; j = 1:261. Thus for every datum in x, there is a corresponding residual, and to find them, we need to recover the reconstructed_ij array. We can do this column-by-column, operating on each x_i with a change of basis operator to produce each reconstructed_i, each of which can be viewed as coordinates in a proper subspace of the original or raw basis. The analysis describes a modified Gram-Schmidt approach to compute the change of basis operator we need, which ensures this proper subspace's basis is an orthogonal set.
What we are going to do in this approach is take the eigenvectors corresponding to the three largest eigenvalues and transform them into three mutually orthogonal vectors b1, b2, b3 (avoiding the names x, y, z here, since x is already our data array). Note that numpy.linalg.eig does not sort the eigenvalues and returns the eigenvectors as the columns of vects, so we pick out the right columns first. There are plenty of discussions online about the Gram-Schmidt process for all sorts of practical applications, but for simplicity let's follow the analysis by hand:
order = numpy.argsort(vals)[::-1]                    # eigenvalue indices, largest first
v1, v2, v3 = (vects[:, order[k]] for k in range(3))

b1 = v1
b1b1 = numpy.dot(b1, b1)
b2 = v2 - (numpy.dot(b1, v2) / b1b1) * b1
b2b2 = numpy.dot(b2, b2)
b3 = v3 - (numpy.dot(b1, v3) / b1b1) * b1 - (numpy.dot(b2, v3) / b2b2) * b2
It's reasonable to normalize b1, b2, b3 to unit length at this point. Since the eigenvectors of a symmetric matrix come out orthonormal, this is nearly a no-op here, but the reconstruction below assumes an orthonormal basis, so we do it when building the change-of-basis matrix.
With the raw data we implicitly assumed the standard basis, so we need a map between {e1, e2, ..., e20} and {b1, b2, b3}, which is given by:
ch_of_basis = numpy.array([b1, b2, b3]).transpose()                    # shape (20, 3); columns are the basis vectors
ch_of_basis = ch_of_basis / numpy.linalg.norm(ch_of_basis, axis=0)     # normalize each column to unit length
This can be used to compute each reconstructed_i by projecting each centered observation onto the subspace and mapping the coordinates back into the original 20-dimensional basis:
reconstructed = []
for measurement in x_centered.transpose().tolist():
    coords = numpy.dot(ch_of_basis.transpose(), measurement)   # coordinates in the {b1, b2, b3} basis
    reconstructed.append(numpy.dot(ch_of_basis, coords))       # mapped back to the original basis
reconstructed = numpy.array(reconstructed).transpose()
And then you get the residuals by subtraction (staying with the centered data throughout):
residuals = x_centered - reconstructed
This flow obviously might need further tuning, but it's the gist of how to compute all the residuals. To get that periodic bar plot, take the average of each row in residuals.
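For example, a minimal sketch of that last step, assuming matplotlib is available:
import matplotlib.pyplot as plt
avg_residuals = residuals.mean(axis=1)     # one average residual per swap rate
plt.bar(range(20), avg_residuals)
plt.show()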

Is there a fundamental difference between DSP for Audio Signal Processing and Sensor Signal Processing?

Audio is made up of multiple frequencies occurring at any given time, and we can perform the FFT to get the frequency bins, but what does the concept of frequency mean when it comes to sensor data?
For example, a triaxial accelerometer somehow converts a voltage signal and produces acceleration readings in m/s^2. Is the FFT performed on those X, Y, Z readings or on the voltages sampled at Fs?
Am I overcomplicating things, or is there a difference in mindset when performing DSP for audio vs sensor data?
A Fourier transform is a tool for converting functions or signals into something that is easier to work with. It is a mathematical tool. The result can have a straightforward physical interpretation, but that is not always the case.
Assume you have an object with constant mass and several sinusoidal forces F_1*sin(c*t), F_2*sin(d*t), ... acting on it. The total force is just the sum of those forces:
F(t) = F_1*sin(c*t) + F_2*sin(d*t) + ...
You get the particle's acceleration using Newton's 2nd law:
m * a(t) = F(t)
=> a(t) = F(t) / m = F_1/m * sin(c*t) + F_2/m * sin(d*t) + ...
Let's assume you measured a(t) but don't know the equation above. If you perform a Fourier transform, you can calculate the values of F_1/m, F_2/m, ... That means the Fourier transform of the acceleration gives the amplitude of the force at each frequency divided by the object's mass.
This interpretation works because the Fourier transform is linear and so is the summing of forces (see Newton's 2nd law). If you are describing something non-linear, chances are there is no easy interpretation of the result of the transform.
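As a small illustration, here is a sketch with made-up numbers (assuming NumPy): two sinusoidal forces acting on a 2 kg mass, with the amplitudes F_i/m recovered from the FFT of the sampled acceleration.
import numpy as np

m = 2.0                                        # mass in kg
fs = 1000.0                                    # sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)
a = (3.0 / m) * np.sin(2 * np.pi * 50 * t) + (1.5 / m) * np.sin(2 * np.pi * 120 * t)

A = np.fft.rfft(a)
freqs = np.fft.rfftfreq(len(t), 1 / fs)
amplitude = 2 * np.abs(A) / len(t)             # scaled so a pure sine of amplitude A0 shows up as A0
# the peaks at 50 Hz and 120 Hz come out close to 3.0/m and 1.5/m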
So when do you perform the FFT? It depends:
If you do it to improve your signal (e.g. remove noise), do it on the measured data.
If you want to analyse the physical object (resonances), do it on the acceleration.
(If the conversion is linear (ADC output to m/s^2 is a simple multiplication) it does not matter.)
I hope this makes things a bit clearer (at least from the physical point of view).

Feed a complex-valued image into Neural network (tensorflow)

I'm working on a project which tries to "learn" a relationship between a set of around 10k complex-valued input images (amplitude/phase; real/imag) and a real-valued output vector with 48 entries. This output vector is not a set of labels, but a set of numbers which represents the best parameters to optimize the visual impression of the given complex-valued image. These parameters are generated by an algorithm. It's possible that there is some noise in the data (coming from the images and from the algorithm which generates the parameter vector).
Those parameters more or less depend on the FFT (fast Fourier transform) of the input image. Therefore I was thinking of feeding the network (5 hidden layers, but the architecture shouldn't matter right now) with a 1D-reshaped version of FFT(complexImage); some pseudocode:
% discretize spectrum
obj_ft = fftshift(fft2(object));
obj_real_2d = real(obj_ft);
obj_imag_2d = imag(obj_ft);
% convert 2D into 1D rows
obj_real_1d = reshape(obj_real_2d, 1, []);
obj_imag_1d = reshape(obj_imag_2d, 1, []);
% concatenate the 1D real and imaginary parts for this object, and store its target parameters
obj_complx_1d(index, :) = [obj_real_1d obj_imag_1d];
opt_param_1D(index, :) = get_opt_param(object);
I was wondering if there is a better approach for feeding complex-valued images into a deep network. I'd like to avoid the use of complex gradients, because it's not really necessary?! I "just" try to find a "black box" which outputs the optimized parameters after inserting a new image.
TensorFlow gets obj_complx_1d as the input and opt_param_1D as the output vector for training.
There are several ways you can treat complex signals as input.
Use a transform to turn them into 'images'. Short-time Fourier transforms are used to make spectrograms, which are 2D, with the x-axis being time and the y-axis frequency. If you have complex input data, you may choose to simply look at the magnitude spectrum, or the power spectral density, of your transformed data.
Something else that I've seen in practice is to treat the in-phase and quadrature (real/imaginary) channels separately in the early layers of the network and operate across both in higher layers. In the early layers your network will learn characteristics of each channel; in higher layers it will learn the relationship between the I/Q channels.
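A minimal sketch of that two-branch idea (assuming tf.keras, a hypothetical 64x64 input with real and imaginary parts stacked as two channels, and the 48-entry output vector from the question):
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(64, 64, 2))
real = layers.Lambda(lambda t: t[..., 0:1])(inp)   # in-phase (real) channel
imag = layers.Lambda(lambda t: t[..., 1:2])(inp)   # quadrature (imaginary) channel

# early layers: learn features of each channel separately
real_feat = layers.Conv2D(16, 3, activation="relu", padding="same")(real)
imag_feat = layers.Conv2D(16, 3, activation="relu", padding="same")(imag)

# higher layers: learn the relationship between the I/Q channels
merged = layers.Concatenate()([real_feat, imag_feat])
merged = layers.Conv2D(32, 3, activation="relu", padding="same")(merged)
out = layers.Dense(48)(layers.Flatten()(merged))   # 48 real-valued output parameters

model = tf.keras.Model(inp, out)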
These guys do a lot with complex signals and neural nets. In particular check out 'Convolutional Radio Modulation Recognition Networks'
https://radioml.com/research/
The simplest way to feed complex-valued numbers into your model without using complex gradients is to use a different representation of the complex values. The two main options are:
Magnitude/Angle components
Real/Imaginary components
I'll show this idea using magnitude/angle components. Assume you have a 2D numpy array representing an image with shape = (WIDTH, HEIGHT):
import numpy as np
kSpace = np.fft.ifftshift(np.fft.fft2(img))
This would give you a 2D complex array. You can then transform it into a real-valued array with two channels (magnitude and angle):
data = np.dstack((np.abs(kSpace), np.angle(kSpace)))
This array will be a numpy array with shape = (WIDTH, HEIGHT, 2), and it represents one complex-valued image. For a set of images, make sure to stack them together to get an array of shape = (NUM_IMAGES, WIDTH, HEIGHT, 2).
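For example, assuming a hypothetical list kspace_list holding one 2D complex array per image (all the same size), the stacking could look like this:
import numpy as np
dataset = np.stack([np.dstack((np.abs(k), np.angle(k))) for k in kspace_list])
# dataset.shape == (NUM_IMAGES, WIDTH, HEIGHT, 2)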
I made a simple example of using tensorflow to learn an Fourier Transform with a simple neural network. You can find this example at https://github.com/michaelmendoza/learning-tensorflow

kalman filter with redundant state measurements

I am trying to implement a Kalman filter for orientation detection. Just like most other implementations I found online, I will be using a gyro and accelerometer to measure pitch and roll; however, I intend to also add horizon detection. This will give me a second reading for pitch and roll, which means that I will have two means of measuring the current state (accelerometer and horizon detection), whilst the gyro will be used for control.
So far I have implemented the filter on the sensor data and horizon detection separately based on this tutorial: http://blog.tkjelectronics.dk/2012/09/a-practical-approach-to-kalman-filter-and-how-to-implement-it/
Which part of the Kalman filter do I have to modify so that the algorithm chooses the best reading among the predicted state, the accelerometer reading, and the horizon-detected reading? Any help, links to papers, or sites will be appreciated.
Thanks in advance for your help.
The KF consists of two parallel components:
1. the estimated state, AND
2. the uncertainty in that estimate (specifically, the covariance matrix of the state components).
When combining 2 estimates of the state, the standard method takes a weighted average, with the weights being the inverses of the (co)variances. That is, the more certain of the 2 estimates (smaller covariance) is weighted more highly than the other.
So if you are not already tracking the covariances for your 2 estimates, you will need to do this.
For a scalar state X with 2 estimates X' and X", you would have a variance for each estimate: V' and V", with inverses C' = 1/V' and C" = 1/V". (The "certainties" C are easier to use than the variances V.)
Then the MMSE estimate (which is what the KF attempts to optimize) of the state is given by: Xmmse = (X'/V' + X"/V") / (1/V' + 1/V").
[There is also a corresponding update to V at this point, based on V' and V".]
For vectorial states, V will be replaced by a covariance matrix, and the divisions will become matrix inverses. In this case, it may be easier to track the inverses C directly: Xmmse = (C' + C") \ (C' * X' + C" * X") [and corresponding update to C],
where "\" denotes pre-multiplication by the inverse of the first factor, C'+C".
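For concreteness, here is a minimal numerical sketch of that fusion step, with made-up pitch/roll estimates and covariances:
import numpy as np

x1 = np.array([0.10, 0.02])          # e.g. pitch/roll estimate from the accelerometer
P1 = np.diag([0.05, 0.05])           # its covariance
x2 = np.array([0.12, 0.00])          # e.g. pitch/roll estimate from horizon detection
P2 = np.diag([0.20, 0.10])           # its covariance

C1, C2 = np.linalg.inv(P1), np.linalg.inv(P2)           # "certainties" (information matrices)
x_fused = np.linalg.solve(C1 + C2, C1 @ x1 + C2 @ x2)   # (C' + C") \ (C' * X' + C" * X")
P_fused = np.linalg.inv(C1 + C2)                        # covariance of the fused estimate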
I hope this helps.

Why do the convolution results have different lengths when performed in time domain vs in frequency domain?

I'm not a DSP expert, but I understand that there are two ways that I can apply a discrete time-domain filter to a discrete time-domain waveform. The first is to convolve them in the time domain, and the second is to take the FFT of both, multiply the two complex spectra, and take the IFFT of the result. One key difference between these methods is that the second approach is subject to circular convolution.
As an example, if the filter and waveforms are both N points long, the first approach (i.e. convolution) produces a result that is N+N-1 points long, where the first half of this response is the filter filling up and the 2nd half is the filter emptying. To get a steady-state response, the filter needs to have fewer points than the waveform to be filtered.
Continuing this example with the second approach, and assuming the discrete time-domain waveform data is all real (not complex), the FFTs of the filter and the waveform are both N points long. Multiplying the two spectra and IFFT'ing the result produces a time-domain result that is also N points long. Here the responses where the filter fills up and empties overlap each other in the time domain, and there's no steady-state response. This is the effect of circular convolution. To avoid this, typically the filter would be smaller than the waveform and both would be zero-padded to allow room for the convolution to expand in time after the IFFT of the product of the two spectra.
My question is, I often see work in the literature from well-established experts/companies where they have a discrete (real) time-domain waveform (N points), they FFT it, multiply it by some filter (also N points), and IFFT the result for subsequent processing. My naive thinking is this result should contain no steady-state response and thus should contain artifacts from the filter filling/emptying that would lead to errors in interpreting the resulting data, but I must be missing something. Under what circumstances can this be a valid approach?
Any insight would be greatly appreciated
The basic problem is not about zero padding vs the assumed periodicity, but that Fourier analysis decomposes the signal into sine waves which, at the most basic level, are assumed to be infinite in extent. Both approaches are correct in that the IFFT using the full FFT will return the exact input waveform, and both approaches are incorrect in that using less than the full spectrum can lead to effects at the edges (that usually extend a few wavelengths). The only difference is in the details of what you assume fills in the rest of infinity, not in whether you are making an assumption.
Back to your first paragraph: Usually, in DSP, the biggest problem I run into with FFTs is that they are non-causal, and for this reason I often prefer to stay in the time domain, using, for example, FIR and IIR filters.
Update:
In the question statement, the OP correctly points out some of the problems that can arise when using FFTs to filter signals, for example, edge effects, which can be particularly problematic when doing a convolution that is comparable in length (in the time domain) to the sampled waveform. It's important to note, though, that not all filtering is done using FFTs, and in the paper cited by the OP, they are not using FFT filters, and the problems that would arise with an FFT filter implementation do not arise using their approach.
Consider, for example, a filter that implements a simple average over 128 sample points, using two different implementations.
FFT: In the FFT/convolution approach one would have a sample of, say, 256 points and convolve this with a wfm that is constant for the first half and goes to zero in the second half. The question here is (even after this system has run a few cycles), what determines the value of the first point of the result? The FFT assumes that the wfm is circular (i.e. infinitely periodic) so either: the first point of the result is determined by the last 127 (i.e. future) samples of the wfm (skipping over the middle of the wfm), or by 127 zeros if you zero-pad. Neither is correct.
FIR: Another approach is to implement the average with an FIR filter. For example, here one could use the average of the values in a 128-register FIFO queue. That is, as each sample point comes in: 1) put it in the queue, 2) dequeue the oldest item, 3) average the 128 items in the queue; this is your result for this sample point. This approach runs continuously, handling one point at a time and returning the filtered result after each sample, and has none of the problems that occur when the FFT is applied to finite sample chunks. Each result is just the average of the current sample and the 127 samples that came before it.
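A minimal sketch of that FIFO-based moving average (assuming the queue starts filled with zeros):
from collections import deque

def moving_average(samples, n=128):
    # Streaming n-point moving average; one output per input sample.
    fifo = deque([0.0] * n, maxlen=n)    # appending automatically drops the oldest item
    out = []
    for x in samples:
        fifo.append(x)
        out.append(sum(fifo) / n)
    return out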
The paper that the OP cites takes an approach much more similar to the FIR filter than to the FFT filter (note, though, that the filter in the paper is more complicated, and the whole paper is basically an analysis of this filter). See, for example, this free book which describes how to analyze and apply different filters, and note also that the Laplace approach to analysis of FIR and IIR filters is quite similar to what's found in the cited paper.
Here's an example of convolution without zero padding for the DFT (circular convolution) vs linear convolution. This is the convolution of a length M=32 sequence with a length L=128 sequence (using Numpy/Matplotlib):
import numpy as np
import matplotlib.pyplot as plt

f = np.random.rand(32); g = np.random.rand(128)
h1 = np.convolve(f, g)                                          # linear convolution, length L+M-1 = 159
h2 = np.real(np.fft.ifft(np.fft.fft(f, 128) * np.fft.fft(g)))   # circular convolution, length 128
plt.plot(h1); plt.plot(h2, 'r')
plt.grid()
The first M-1 points are different, and it's short by M-1 points since it wasn't zero padded. These differences are a problem if you're doing block convolution, but techniques such as overlap and save or overlap and add are used to overcome this problem. Otherwise if you're just computing a one-off filtering operation, the valid result will start at index M-1 and end at index L-1, with a length of L-M+1.
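Continuing the snippet above, a quick check that zero padding both FFTs out to L+M-1 points makes the circular result match the linear convolution exactly:
n = 32 + 128 - 1                                                # full linear-convolution length
h3 = np.real(np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)))
print(np.allclose(h1, h3))                                      # True: the padding removes the wrap-around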
As to the paper cited, I looked at their MATLAB code in appendix A. I think they made a mistake in applying the Hfinal transfer function to the negative frequencies without first conjugating it. Otherwise, you can see in their graphs that the clock jitter is a periodic signal, so using circular convolution is fine for a steady-state analysis.
Edit: Regarding conjugating the transfer function, the PLLs have a real-valued impulse response, and every real-valued signal has a conjugate symmetric spectrum. In the code you can see that they're just using Hfinal[N-i] to get the negative frequencies without taking the conjugate. I've plotted their transfer function from -50 MHz to 50 MHz:
from numpy import arange, pi, real, imag
from matplotlib.pyplot import subplot, plot, ylabel

N = 150000 # number of samples. Need >50k to get a good spectrum.
res = 100e6/N # resolution of single freq point
f = res * arange(-N/2, N/2) # set the frequency sweep [-50MHz,50MHz), N points
s = 2j*pi*f # set the xfer function to complex radians
f1 = 22e6 # define 3dB corner frequency for H1
zeta1 = 0.54 # define peaking for H1
f2 = 7e6 # define 3dB corner frequency for H2
zeta2 = 0.54 # define peaking for H2
f3 = 1.0e6 # define 3dB corner frequency for H3
# w1 = natural frequency
w1 = 2*pi*f1/((1 + 2*zeta1**2 + ((1 + 2*zeta1**2)**2 + 1)**0.5)**0.5)
# H1 transfer function
H1 = ((2*zeta1*w1*s + w1**2)/(s**2 + 2*zeta1*w1*s + w1**2))
# w2 = natural frequency
w2 = 2*pi*f2/((1 + 2*zeta2**2 + ((1 + 2*zeta2**2)**2 + 1)**0.5)**0.5)
# H2 transfer function
H2 = ((2*zeta2*w2*s + w2**2)/(s**2 + 2*zeta2*w2*s + w2**2))
w3 = 2*pi*f3 # w3 = 3dB point for a single pole high pass function.
H3 = s/(s+w3) # the H3 xfer function is a high pass
Ht = 2*(H1-H2)*H3 # Final transfer based on the difference functions
subplot(311); plot(f, abs(Ht)); ylabel("abs")
subplot(312); plot(f, real(Ht)); ylabel("real")
subplot(313); plot(f, imag(Ht)); ylabel("imag")
As you can see, the real component has even symmetry and the imaginary component has odd symmetry. In their code they only calculated the positive frequencies for a loglog plot (reasonable enough). However, for calculating the inverse transform they used the values for the positive frequencies for the negative frequencies by indexing Hfinal[N-i] but forgot to conjugate it.
I can shed some light on why "windowing" is applied before the FFT.
As already pointed out, the FFT assumes an infinite signal. When we take a sample over a finite time T, this is mathematically equivalent to multiplying the signal by a rectangular function.
Multiplication in the time domain becomes convolution in the frequency domain. The frequency response of a rectangle is the sinc function, i.e. sin(x)/x. The x in the denominator is the kicker, because the sidelobes die down only as O(1/N).
If you have frequency components which are exactly multiples of 1/T this does not matter, as the sinc function is zero at all those points except the one frequency where it is 1.
However, if you have a sine which falls between two bins, you will see the sinc function sampled at the frequency points. It looks like a magnified version of the sinc function, and the 'ghost' signals caused by the convolution die down with 1/N, or 6 dB/octave. If you have a signal 60 dB above the noise floor, you will not see the noise for 1000 frequency bins left and right of your main signal; it will be swamped by the "skirts" of the sinc function.
If you use a different time window you get a different frequency response; a cosine window, for example, dies down with 1/x^2, and there are specialized windows for different measurements. The Hann (Hanning) window is often used as a general-purpose window.
The point is that the implicit rectangular window used when not applying any "windowing function" creates far worse artefacts than a well-chosen window, i.e. by "distorting" the time samples we get a much better picture in the frequency domain, one which more closely resembles "reality", or rather the "reality" we expect and want to see.
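As a small sketch of that effect (assuming NumPy/Matplotlib): a tone that falls between two bins, plotted with and without a Hann window, shows how much lower the window's "skirts" are.
import numpy as np
import matplotlib.pyplot as plt

N = 1024
t = np.arange(N)
x = np.sin(2 * np.pi * 10.5 * t / N)            # 10.5 cycles per record: falls between two bins

rect = np.abs(np.fft.rfft(x))                   # implicit rectangular window
hann = np.abs(np.fft.rfft(x * np.hanning(N)))   # Hann window

plt.semilogy(rect / rect.max(), label="rectangular")
plt.semilogy(hann / hann.max(), label="Hann")
plt.legend(); plt.xlabel("frequency bin"); plt.ylabel("normalized magnitude")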
Although there will be artifacts from assuming that a rectangular window of data is periodic at the FFT aperture width, which is one interpretation of what circular convolution does without sufficient zero padding, the differences may or may not be large enough to swamp the data analysis in question.
