What is Phase unwrapping and Why its needed - phase

I am new to Digital Signal Processing field and trying to understand what is phase unwrapping and why it is needed. So far i have read that it's done to avoid phase jumps and to avoid multiple of 2pi is added to the difference between the two phases, but what i don't understand why phase jump happens. I guess this is the missing link in my understanding.
Thanks

Consider a simple oscillatory system in the xy plane (a point moving in a circle, with constant speed and amplitude). You have the time series of x and y of that point. You can apply the estimator phi = arctg(y/x) (since "amplitude and speed are constants").
Now, supose your system starts at phi=0, so phi(t=0) = 0 rads. Just before the it completes the first cicle, phi(t) approx 2Pi rads (one turn, as we expected). Just after that moment, in the begining of the 2nd cicle, phi(t) approx 0 again (oops, we want it to be 2pi! But note that x and y should have returned to the same values x(0) and y(0)... so we should expect that actg should be again 0). And in the end of that 2nd cicle, one has phi(t) = 2pi rads again (and we want it to be 4pi! Buts x and y shall now have the same values as in the final of the 1st cicle, and arctg naturally will be near 2pi and not 4pi).
So, for each cicle, the function arctg doesn't know that a complete turn was made. So, YOU need manually add 2Pi for each turn.
I suggest the following references:
Boashash, B. (1992). Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications. Proceedings of the IEEE, 80(4), 540–568. doi:10.1109/5.135378
Boccaletti, S., Kurths, J., Osipov, G., Valladares, D. L., & Zhou, C. S. (2002). The synchronization of chaotic systems. Physics Reports, 366, 1–101. doi:10.1016/S0370-1573(02)00137-0

Related

Algorithm for detecting the lowest level of EMG activity?

How, having only data from the EMG sensor, to determine whether a person is in the REM phase? In other words, I need to detect the lowest level of activity from the sensor's EMG. Well, or at least register the phase change ...
In more detail... I'm going to make a REM phase detector using an EMG (electromyography) sensor. There is already a sketch of the Android application on the github, if you are interested, I can post a link. Although there is still work to be done...)
The device should work based on the fact that in different states of the brain (wakefulness, slow sleep, REM sleep), different levels of activity will be recorded from the sensor. In REM sleep, this activity is minimal.
The Bluetooth sensor is attached to the body before going to bed, the Android program is launched, communicates with the sensor and sends the data read from it to the connected TCP client via WiFi. TCP client - python script running on a nettop. It receives data, and by design should determine in real time whether the current level of activity is the minimum for the entire observation period. If so, the script will tell the server (Android program) to turn on the hint - it can be a vibration on the phone or a fitness bracelet, playing an audio sample through a headphone, light flashes, a slight electric shock =), etc.
Because only one EMG sensor is used, I admit that it will not be possible to catch REM sleep phases with 100% accuracy, but this is not necessary. If the accuracy is 80% - it's already good. For starters, even an algorithm for detecting a change in the current activity level is suitable. - There will be something to experiment with and something to build on.
The problem is with the algorithm. I would not like to use fixed thresholds, because these thresholds will be different for different people, and even for the same person at different times and in different states they will differ. Therefore, I will be glad to ideas and tips from your side.
I found an interesting article "A Self-adaptive Threshold Method for Automatic Sleep Stage Classifi-
cation Using EOG and EMG" (https://www.researchgate.net/publication/281722966_A_Self-adaptive_Threshold_Method_for_Automatic_Sleep_Stage_Classification_Using_EOG_and_EMG):
But there remains incomprehensible, a few points. First, how is the energy calculated (Step 1-4: Energy)?
d = f.read(epoche_seconds*SAMPLE_RATE*2)
if not d:
break
print('epoche#{}'.format(i))
fmt = '<{}H'.format(len(d) // 2)
t = unpack(fmt, d)
d = list(t)
d = np.array(d)
d = d / (1 << 14) # 14 - bits per sample
df=pd.DataFrame({'signal': d, 'id': [x*2 for x in range(len(d))]})
df = df.set_index('id')
d = df.signal
# signal prepare:
d = butter_bandpass_filter(d, lowcut, highcut, SAMPLE_RATE, order=2)
# extract signal futures:
future_iv = np.mean(np.absolute(d))
print('iv={}'.format(future_iv))
future_var = np.var(d)
print('var={}'.format(future_var))
ws = 6
df = pd.DataFrame({'signal': np.absolute(d)})
future_e = df.rolling(ws).sum()[ws-1:].max().signal
print('E={}'.format(future_e))
-- Will it be right?
Secondly, can someone elaborate on this:
Step 2-1: normalized processed
EMG and EOG feature vectors were processed with
normalized function before involved into classification
steps. Feature vectors were normalized as the follow-
ing function (4):
Where, Xmax and Xmin were got by following
steps: first, sort x(i) vector, and set window length N
represents 50 ; then compare 2Nk (k, values from
10 to 1) with the length of x(i) , if 2Nk is bigger
than the later one, reduce k , until 2Nk is lower than
the length of x(i). If length of x(i) is greater than
2Nk, compute the mean of 50 larger values as
Xmax, and the average of 50 smaller values as Xmin.
?
If you are looking for REM (deep sleep); a spectogram will show you intensity and frequency information on a 1 page graph/chart. People of a certain age refer to spectograms as TFFT - the wiki link is... https://en.wikipedia.org/wiki/Spectrogram
Can you input the data into a spectogram display/plot. Use a large FFT window (you probably have hours of data) with a small overlap (15%). I would recommend starting with an FFT window around 1 second.

Anomaly detection with machine learning without labels

I am tracing multiple signals for a certain period of time and associating them with a timestamp like following:
t0 1 10 2 0 1 0 ...
t1 1 10 2 0 1 0 ...
t2 3 0 9 7 1 1 ... // pressed a button to change the mode
t3 3 0 9 7 1 1 ...
t4 3 0 8 7 1 1 ... // pressed button to adjust a certain characterstic like temperature (signal 3)
where t0 is the tamp stamp, 1 is the value for signal 1, 10 the value for signal 2 and so on.
That captured data during that certain period of time should be considered as the normal case. Now significant derivations should be detected from the normal case. With significant derivation I do NOT mean that one signal value just changes to a value that has not been seen during the tracing phase but rather that a lot of values change that have not yet been related to each other. I do not want to hardcode rules since in the future more signals might be added or removed and other "modi" that have other signal values might be implemented.
Can this be achieved via a certain Machine Learning algorithm? If a small derivation occurs I want the algorithm to first see it as a minor change to the training set and if it occurs multiple times in the future it should be "learned". The major goal is to detect the bigger changes / anomalies.
I hope I could explain my problem detailed enough. Thanks in advance.
you could just calculate the nearest neighbor in your feature space and set a threshold how far its allowed to be away from your test point to not be an anomaly.
Lets say you have 100 values in your "certain period of time"
so you use a 100 dimensional feature space with your training data (which doesn't contain anomalies)
If you get a new dataset you want to test, you calculate the (k) nearest neighbor(s) and calculate the (e.g. euclidean) distance in your featurespace.
If that distance is larger than a certain threshold it's an anomaly.
What you have to do in order to optimize is finding a good k and a good threshold. E.g. by Grid-search.
(1) Note that something like this probably only works well if your data has a fixed starting and ending point. Otherwise you would need a huge amount of data and even than it will not perform as good.
(2) Note It should be worth trying to create an own detector for every "mode" you have mentioned in your question.

What are some relevant A.I. techniques for programming a flock of entities?

Specifically, I am talking about programming for this contest: http://www.nodewar.com/about
The contest involves you facing other teams with a swarm of spaceships in a two dimensional world. The ships are constrained by a boundary (if they exit they die), and they must continually avoid moons (who pull the ships in with their gravity). The goal is to kill the opposing queen.
I have attempted to program a few, relatively basic techniques, but I feel as though I am missing something fundamental.
For example, I implemented some of the boids behaviours (http://www.red3d.com/cwr/boids/), but they seemed to lack a... goal, so to speak.
Are there any common techniques (or, preferably, combinations of techniques) for this sort of game?
EDIT
I would just like to open this up again with a bounty since I feel like I'm still missing critical pieces of information. The following is my NodeWar code:
boundary_field = (o, position) ->
distance = (o.game.moon_field - o.lib.vec.len(o.lib.vec.diff(position, o.game.center)))
return distance
moon_field = (o, position) ->
return o.lib.vec.len(o.lib.vec.diff(position, o.moons[0].pos))
ai.step = (o) ->
torque = 0;
thrust = 0;
label = null;
fields = [boundary_field, moon_field]
# Total the potential fields and determine a target.
target = [0, 0]
score = -1000000
step = 1
square = 1
for x in [(-square + o.me.pos[0])..(square + o.me.pos[0])] by step
for y in [(-square + o.me.pos[1])..(square + o.me.pos[1])] by step
position = [x, y]
continue if o.lib.vec.len(position) > o.game.moon_field
value = (fields.map (f) -> f(o, position)).reduce (t, s) -> t + s
target = position if value > score
score = value if value > score
label = target
{ torque, thrust } = o.lib.targeting.simpleTarget(o.me, target)
return { torque, thrust, label }
I may have implemented potential fields incorrectly, however, since all the examples I could find are about discrete movements (whereas NodeWar is continuous and not exact).
The main problem being my A.I. never stays within the game area for more than 10 seconds without flying off-screen or crashing into a moon.
You can easily make the boids algorithm play the game of Nodewar, by just adding additional steering behaviours to your boids and/or modifying the default ones. For example, you would add a steering behaviour to avoid the moon, and a steering behaviour for enemy ships (repulsion or attraction, depending on the position between yours and the enemy ship). The weights of the attraction/repulsion forces should then be tweaked (possibly by genetic algorithms, playing different configurations against each other).
I believe this approach would give you already a relatively strong baseline player, to which you can start adding collaborative strategies etc.
You can achieve the best possible flocking behaviour using control theory. We can start by expressing the state of a ship (a vector of positions and velocities) as variable X. We will use x to represent a small deviation from some state X0. For each node, linearize around the state you want the individual ships to be at:
d/dt(X) = f(X - X0)
where X0 is the state you want the ship to be at. f() can be nonlinear (as in the case of your potential field). Close to this point, the will obey
d/dt(x) = Ax
A is a fixed matrix which you should tweak to achieve stability. Much of the field of control theory tells you how to do this. It's very important to you that A leads to a stable system and that it converges fast. You should now break out MATLAB or GNU Octave. You should also read up on what poles mean in control theory, and Laplace transforms. The poles of the system (eigenvalues of matrix A) will characterize the response of the ship to disturbances of any kind, its stability, and its convergeance speed. It will also tell you whether the ship will move towards X0 or oscillate around X0.
There are several ways of choosing A (You probably don't get full control over A because of the game's limited physics) without having to do much analysis yourself. Two such algorithms are optimal control (You define the cost of controlling the system vs the cost of deviating from X0), and pole placement (You choose where the poles will go).
The state space theory also applies to an entire flock of ships, deviating from the configuration desired by every ship at that point in time. If the system is nonlinear, You might not be able to guarantee that all configurations are stable.
Conclusion:
Use control theory to guarantee individual stability, calculating X0 based on the ships around you, and add a potential field to prevent ships from bumping into eachother and moons.
Disclaimer:
There are several ways of expressing a state space system. This is the simplest but you'll need to express it differently when considering the open loop system (split up A into another 'A' which gives you the physical system and some other matrices which give you the control parameters)

setting nls parameters for a curve

I am a relative newcomer to R and not a mathematician but a geneticist. I have many sets of multiple pairs of data points. When they are plotted they yield a flattened S curve with most of the data points ending up near the zero mark. A minority of the data points fly far off creating what is almost two J curves, one down and one up. I need to find the inflection points where the data sharply veers upward or downward. This may be an issue with my math but is seems to me that if I can smooth and fit a curve to the line and get an equation I could then take the second derivative of the curve and determine the inflection points from where the second derivative changes sign. I tried it in excel and used the curve to get approximate fit to get the starting formula but the data has a bit of "wiggling" in it so determining any one inflection point is not possible even if I wanted to do it all manually (which I don't). Each of the hundreds of data sets that I have to find these two inflection points in will yield about the same curve but will have slightly different inflection points and determining those inflections points precisely is absolutely critical to the problem. So if I can set it up properly once in an equation that should do it. For simplicity I would like to break them into the positive curve and the negative curve and do each one separately. (Maybe there is some easier formula for s curves that makes that a bad idea?)
I have tried reading the manual and it's kind of hard to understand likely because of my weak math skills. I have also been unable to find any similar examples I could study from.
This is the head of my data set:
x y
[1,] 1 0.00000000
[2,] 2 0.00062360
[3,] 3 0.00079720
[4,] 4 0.00085100
[5,] 5 0.00129020
(X is just numbering 1 to however many data points and the number of X will vary a bit by the individual set.)
This is as far as I have gotten to resolve the curve fitting part:
pos_curve1 <- nls(curve_fitting ~ (scal*x^scal),data = cbind.data.frame(curve_fitting),
+ start = list(x = 0, scal = -0.01))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Am I just doing the math the hard way? What am I doing wrong with the nls? Any help would be much much appreciated.
Found it. The curve is exponential not J and the following worked.
fit <- nls(pos ~ a*tmin^b,
data = d,
start = list(a = .1, b = .1),
trace = TRUE)
Thanks due to Jorge I Velez at R Help Oct 26, 2009
Also I used "An Appendix to An R Companion to Applied Regression, second edition" by John Fox & Sanford Weisberg last revision 13: Dec 2010.
Final working settings for me were:
fit <- nls(y ~ a*log(10)^(x*b),pos_curve2,list(a = .01, b = .01), trace=TRUE)
I figured out what the formula should be by using open office spread sheet and testing the various curve fit options until I was able to show exponential was the best fit. I then got the structure of the equation from that. I used the Fox & Sanford article to understand the way to set the parameters.
Maybe I am not alone in this one but I really found it hard to figure out the parameters and there were few references or questions on it that helped me.

Why do the convolution results have different lengths when performed in time domain vs in frequency domain?

I'm not a DSP expert, but I understand that there are two ways that I can apply a discrete time-domain filter to a discrete time-domain waveform. The first is to convolve them in the time domain, and the second is to take the FFT of both, multiply both complex spectrums, and take IFFT of the result. One key difference in these methods is the second approach is subject to circular convolution.
As an example, if the filter and waveforms are both N points long, the first approach (i.e. convolution) produces a result that is N+N-1 points long, where the first half of this response is the filter filling up and the 2nd half is the filter emptying. To get a steady-state response, the filter needs to have fewer points than the waveform to be filtered.
Continuing this example with the second approach, and assuming the discrete time-domain waveform data is all real (not complex), the FFT of the filter and the waveform both produce FFTs of N points long. Multiplying both spectrums IFFT'ing the result produces a time-domain result also N points long. Here the response where the filter fills up and empties overlap each other in the time domain, and there's no steady state response. This is the effect of circular convolution. To avoid this, typically the filter size would be smaller than the waveform size and both would be zero-padded to allow space for the frequency convolution to expand in time after IFFT of the product of the two spectrums.
My question is, I often see work in the literature from well-established experts/companies where they have a discrete (real) time-domain waveform (N points), they FFT it, multiply it by some filter (also N points), and IFFT the result for subsequent processing. My naive thinking is this result should contain no steady-state response and thus should contain artifacts from the filter filling/emptying that would lead to errors in interpreting the resulting data, but I must be missing something. Under what circumstances can this be a valid approach?
Any insight would be greatly appreciated
The basic problem is not about zero padding vs the assumed periodicity, but that Fourier analysis decomposes the signal into sine waves which, at the most basic level, are assumed to be infinite in extent. Both approaches are correct in that the IFFT using the full FFT will return the exact input waveform, and both approaches are incorrect in that using less than the full spectrum can lead to effects at the edges (that usually extend a few wavelengths). The only difference is in the details of what you assume fills in the rest of infinity, not in whether you are making an assumption.
Back to your first paragraph: Usually, in DSP, the biggest problem I run into with FFTs is that they are non-causal, and for this reason I often prefer to stay in the time domain, using, for example, FIR and IIR filters.
Update:
In the question statement, the OP correctly points out some of the problems that can arise when using FFTs to filter signals, for example, edge effects, that can be particularly problematic when doing a convolution that is comparable in the length (in the time domain) to the sampled waveform. It's important to note though that not all filtering is done using FFTs, and in the paper cited by the OP, they are not using FFT filters, and the problems that would arise with an FFT filter implementation do not arise using their approach.
Consider, for example, a filter that implements a simple average over 128 sample points, using two different implementations.
FFT: In the FFT/convolution approach one would have a sample of, say, 256, points and convolve this with a wfm that is constant for the first half and goes to zero in the second half. The question here is (even after this system has run a few cycles), what determines the value of the first point of the result? The FFT assumes that the wfm is circular (i.e. infinitely periodic) so either: the first point of the result is determined by the last 127 (i.e. future) samples of the wfm (skipping over the middle of the wfm), or by 127 zeros if you zero-pad. Neither is correct.
FIR: Another approach is to implement the average with an FIR filter. For example, here one could use the average of the values in a 128 register FIFO queue. That is, as each sample point comes in, 1) put it in the queue, 2) dequeue the oldest item, 3) average all of the 128 items remaining in the queue; and this is your result for this sample point. This approach runs continuously, handling one point at a time, and returning the filtered result after each sample, and has none of the problems that occur from the FFT as it's applied to finite sample chunks. Each result is just the average of the current sample and the 127 samples that came before it.
The paper that OP cites takes an approach much more similar to the FIR filter than to the FFT filter (note though that the filter in the paper is more complicated, and the whole paper is basically an analysis of this filter.) See, for example, this free book which describes how to analyze and apply different filters, and note also that the Laplace approach to analysis of the FIR and IIR filters is quite similar what what's found in the cited paper.
Here's an example of convolution without zero padding for the DFT (circular convolution) vs linear convolution. This is the convolution of a length M=32 sequence with a length L=128 sequence (using Numpy/Matplotlib):
f = rand(32); g = rand(128)
h1 = convolve(f, g)
h2 = real(ifft(fft(f, 128)*fft(g)))
plot(h1); plot(h2,'r')
grid()
The first M-1 points are different, and it's short by M-1 points since it wasn't zero padded. These differences are a problem if you're doing block convolution, but techniques such as overlap and save or overlap and add are used to overcome this problem. Otherwise if you're just computing a one-off filtering operation, the valid result will start at index M-1 and end at index L-1, with a length of L-M+1.
As to the paper cited, I looked at their MATLAB code in appendix A. I think they made a mistake in applying the Hfinal transfer function to the negative frequencies without first conjugating it. Otherwise, you can see in their graphs that the clock jitter is a periodic signal, so using circular convolution is fine for a steady-state analysis.
Edit: Regarding conjugating the transfer function, the PLLs have a real-valued impulse response, and every real-valued signal has a conjugate symmetric spectrum. In the code you can see that they're just using Hfinal[N-i] to get the negative frequencies without taking the conjugate. I've plotted their transfer function from -50 MHz to 50 MHz:
N = 150000 # number of samples. Need >50k to get a good spectrum.
res = 100e6/N # resolution of single freq point
f = res * arange(-N/2, N/2) # set the frequency sweep [-50MHz,50MHz), N points
s = 2j*pi*f # set the xfer function to complex radians
f1 = 22e6 # define 3dB corner frequency for H1
zeta1 = 0.54 # define peaking for H1
f2 = 7e6 # define 3dB corner frequency for H2
zeta2 = 0.54 # define peaking for H2
f3 = 1.0e6 # define 3dB corner frequency for H3
# w1 = natural frequency
w1 = 2*pi*f1/((1 + 2*zeta1**2 + ((1 + 2*zeta1**2)**2 + 1)**0.5)**0.5)
# H1 transfer function
H1 = ((2*zeta1*w1*s + w1**2)/(s**2 + 2*zeta1*w1*s + w1**2))
# w2 = natural frequency
w2 = 2*pi*f2/((1 + 2*zeta2**2 + ((1 + 2*zeta2**2)**2 + 1)**0.5)**0.5)
# H2 transfer function
H2 = ((2*zeta2*w2*s + w2**2)/(s**2 + 2*zeta2*w2*s + w2**2))
w3 = 2*pi*f3 # w3 = 3dB point for a single pole high pass function.
H3 = s/(s+w3) # the H3 xfer function is a high pass
Ht = 2*(H1-H2)*H3 # Final transfer based on the difference functions
subplot(311); plot(f, abs(Ht)); ylabel("abs")
subplot(312); plot(f, real(Ht)); ylabel("real")
subplot(313); plot(f, imag(Ht)); ylabel("imag")
As you can see, the real component has even symmetry and the imaginary component has odd symmetry. In their code they only calculated the positive frequencies for a loglog plot (reasonable enough). However, for calculating the inverse transform they used the values for the positive frequencies for the negative frequencies by indexing Hfinal[N-i] but forgot to conjugate it.
I can shed some light to the reason why "windowing" is applied before FFT is applied.
As already pointed out the FFT assumes that we have a infinite signal. When we take a sample over a finite time T this is mathematically the equivalent of multiplying the signal with a rectangular function.
Multiplying in the time domain becomes convolution in the frequency domain. The frequency response of a rectangle is the sync function i.e. sin(x)/x. The x in the numerator is the kicker, because it dies down O(1/N).
If you have frequency components which are exactly multiples of 1/T this does not matter as the sync function is zero in all points except that frequency where it is 1.
However if you have a sine which fall between 2 points you will see the sync function sampled on the frequency point. It lloks like a magnified version of the sync function and the 'ghost' signals caused by the convolution die down with 1/N or 6dB/octave. If you have a signal 60db above the noise floor, you will not see the noise for 1000 frequencies left and right from your main signal, it will be swamped by the "skirts" of the sync function.
If you use a different time window you get a different frequency response, a cosine for example dies down with 1/x^2, there are specialized windows for different measurements. The Hanning window is often used as a general purpose window.
The point is that the rectangular window used when not applying any "windowing function" creates far worse artefacts than a well chosen window. i.e by "distorting" the time samples we get a much better picture in the frequency domain which closer resembles "reality", or rather the "reality" we expect and want to see.
Although there will be artifacts from assuming that a rectangular window of data is periodic at the FFT aperture width, which is one interpretation of what circular convolution does without sufficient zero padding, the differences may or may not be large enough to swamp the data analysis in question.

Resources