Why does FFT not have an effect on my smoothed signal? - signal-processing

I'm playing with FFT at the moment and I try to get periods from noisy signals by recreating this example. While experimenting, I've noticed that after smoothing a quite noisy signal, the result of fft() is actually the same signal again - which is what I don't understand.
Here is a full example which can be run in an IPython Notebook (You can create a notebook here and run the code if you want).
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
figsize = (16,8)
n = 500
ls = np.linspace(0,2*np.pi, n)
x_target = np.sin(12*ls) + np.sin(52*ls)
x = np.sin(12*ls) + np.sin(52*ls) + np.random.rand(n) * 3.5
x = x - np.mean(x)
x_smooth = pd.rolling_mean(pd.DataFrame(x), 14).replace(np.nan, 0.0).as_matrix()
x_smooth = x_smooth - np.mean(x_smooth)
x_smooth = np.roll(x_smooth, -7)
# Getting shwifty and showing what we've got
plt.figure(figsize=(16,8))
plt.scatter(ls, x, s=3, c=[1.0,0.0,0.0,1.0])
plt.plot(ls, x_target, color=[1.0,0.0,0.0, 0.3])
plt.plot(ls, x_smooth)
plt.legend(["Target", "Smooth", "Noisy Data"])
# Target
x_fft = np.abs(np.fft.fft(x_target))
pd.DataFrame(x_fft).plot(figsize=figsize)
# Looks like it should
x_fft = np.abs(np.fft.fft(x))
pd.DataFrame(x_fft).plot(figsize=figsize)
# Plots the same signal?
x_fft = np.abs(np.fft.fft(x_smooth))
pd.DataFrame(x_fft).plot(figsize=figsize)
Below you find the resulting plots of this script.
Noisy data with smoothed signal:
FFT of the target function
FFT of the noisy data
FFT of the smoothed data
I don't really get why this is the case here. Can somebody explain this to me or am I doing something wrong here?

The critical difference is between:
x_fft = np.abs(np.fft.fft(x_smooth))
and
x_fft = np.abs(np.fft.fft(x_smooth.flatten()))
because x_smooth has become 2-dimensional somewhere along the way (as_matrix() on a DataFrame returns a 2D array). Its shape is (500,1), and because np.fft.fft works by default along axis=-1 (i.e. the last axis), it takes 500 separate FFTs of 500 different 1-sample signals. (Unsurprisingly, each of those returns only its DC component, which is the sample itself, so put together you end up with the same signal you started with.)
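A quick way to check this, as a minimal sketch on a synthetic noise-free signal (the variable names are illustrative and not taken from the question):

```python
import numpy as np

n = 500
t = np.linspace(0, 2 * np.pi, n)
signal = np.sin(12 * t) + np.sin(52 * t)

# A column vector with the same (500, 1) shape as x_smooth in the question.
column = signal.reshape(-1, 1)

# FFT along the last axis: 500 transforms of length 1, each returning its input.
fft_column = np.fft.fft(column)
same_as_input = bool(np.allclose(fft_column.real, column)
                     and np.allclose(fft_column.imag, 0.0))

# Flattening (or passing axis=0) gives the single length-500 transform.
fft_flat = np.fft.fft(column.flatten())
fft_axis0 = np.fft.fft(column, axis=0)[:, 0]

# The two largest bins in the positive-frequency half of the spectrum.
peak_bins = np.sort(np.argsort(np.abs(fft_flat[: n // 2]))[-2:])

print(column.shape, same_as_input)
print(peak_bins)
```

Passing axis=0 is an alternative to flatten() if you would rather keep the column shape.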
The FFT from the smoothed signal really looks like this:


Can I calculate the confidence bound of a Prophet model that would contain a certain value?

Can I use the y-hat variance, the bounds, and the point estimate from the forecast data frame to calculate the confidence level of the interval that would contain a given value?
I've seen that I can change my interval level prior to fitting but programmatically that feels like a LOT of expensive trial and error.
Is there a way to estimate the confidence bound using only the information from the model parameters and the forecast data frame?
Something like:
for level in [.05, .1, .15, ... , .95]:
    if value_in_question in (yhat - Z_{level}*yhat_variance/N, yhat + Z_{level}*yhat_variance/N):
        print 'im in the bound level {level}'
# This is pseudocode, not meant to run in the console
EDIT: working prophet example
# csv from fbprophet's working examples https://github.com/facebook/prophet/blob/master/examples/example_wp_log_peyton_manning.csv
import numpy as np
import pandas as pd
import scipy.stats as st
from fbprophet import Prophet
import os
df = pd.read_csv('example_wp_log_peyton_manning.csv')
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
# the smallest confidence level s.t. the confidence interval of the 30th prediction contains 9
## My current approach
def __probability_calculation(estimate, forecast, j = 30):
    sd_residuals = (forecast.yhat_lower[j] - forecast.yhat[j])/(-1.28)
    for alpha in np.arange(.5, .95, .01):
        z_val = st.norm.ppf(alpha)
        if (forecast.yhat[j]-z_val*sd_residuals < estimate < forecast.yhat[j]+z_val*sd_residuals):
            return alpha
prob = __probability_calculation(9, forecast)
fbprophet uses the numpy.percentile method to estimate the percentiles as you can see here in the source code:
https://github.com/facebook/prophet/blob/0616bfb5daa6888e9665bba1f95d9d67e91fed66/python/prophet/forecaster.py#L1448
How to inverse calculate percentiles for values is already answered here:
Map each list value to its corresponding percentile
Combining everything based on your code example:
import pandas as pd
import numpy as np
import scipy.stats as st
from fbprophet import Prophet
url = 'https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv'
df = pd.read_csv(url)
# put the amount of uncertainty samples in a variable so we can use it later.
uncertainty_samples = 1000 # 1000 is the default
m = Prophet(uncertainty_samples=uncertainty_samples)
m.fit(df)
future = m.make_future_dataframe(periods=30)
# You need to replicate some of the preparation steps which are part of the predict() call internals
tmpdf = m.setup_dataframe(future)
tmpdf['trend'] = m.predict_trend(tmpdf)
sim_values = m.sample_posterior_predictive(tmpdf)
The sim_values object contains for every datapoint 1000 simulations on which the confidence interval is based.
Now you can call the scipy.stats.percentileofscore method with any target value
target_value = 8
st.percentileofscore(sim_values['yhat'], target_value, 'weak') / uncertainty_samples
# returns 44.26
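If you want the level for one specific forecast date rather than for the whole matrix, the same idea can be applied to a single set of simulations. The following is only a sketch: empirical_level is a hypothetical helper, the data is a synthetic stand-in rather than Prophet output, and it assumes you have already picked out the 1000 simulated values that belong to the date you care about.

```python
import numpy as np

def empirical_level(samples, value):
    """Return (p, width): the fraction of samples <= value, and the width of
    the narrowest central interval whose percentile bounds enclose p."""
    samples = np.asarray(samples, dtype=float).ravel()
    p = np.mean(samples <= value)
    # A central interval of width w spans percentiles (1 - w)/2 .. (1 + w)/2,
    # so it contains p exactly when w >= |2p - 1|.
    width = abs(2.0 * p - 1.0)
    return p, width

# Synthetic stand-in for the simulations of one forecast date.
row = np.arange(1000, dtype=float)
p, width = empirical_level(row, 249.0)
print(p, width)  # 0.25 0.5
```

Here the value sits at the 25th percentile, so the narrowest central interval that reaches it is the 50% one.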
To prove this works backwards and forwards you can get the output of the np.percentile method and put it in the scipy.stats.percentileofscore method.
This works for an accuracy of 4 decimals:
ACCURACY = 4
for test_percentile in np.arange(0, 100, 0.5):
    target_value = np.percentile(sim_values['yhat'], test_percentile)
    if not np.round(st.percentileofscore(sim_values['yhat'], target_value, 'weak') / uncertainty_samples, ACCURACY) == np.round(test_percentile, ACCURACY):
        print(test_percentile)
        raise ValueError('This doesnt work')

My speaker recognition neural network doesn’t work well

For the final project of my first degree I want to build a neural network that takes the first 13 MFCC coefficients of a WAV file and returns which speaker, out of a group of speakers, is talking in the audio file.
Please note that:
My audio files are text independent, so they differ in length and in the words spoken
I have trained the machine on about 35 audio files of 10 speakers (the first speaker had about 15, the second 10, and the third and fourth about 5 each)
I defined :
X=mfcc(sound_voice)
Y=zero_array + 1 in the i_th position ( where i_th position is 0 for the first speaker, 1 for the second, 2 for the third... )
Then I trained the machine and checked its output for some files...
So that's what I did... but unfortunately it looks like the results are completely random...
Can you help me understand why?
This is my code in python -
from sklearn.neural_network import MLPClassifier
import python_speech_features
import scipy.io.wavfile as wav
import numpy as np
from os import listdir
from os.path import isfile, join
from random import shuffle
import matplotlib.pyplot as plt
from tqdm import tqdm
winner = [] # this array count how much Bingo we had when we test the NN
for TestNum in tqdm(range(5)): # in every round we build NN with X,Y that out of them we check 50 after we build the NN
    X = []
    Y = []
    onlyfiles = [f for f in listdir("FinalAudios/") if isfile(join("FinalAudios/", f))] # Files in dir
    names = [] # names of the speakers
    for file in onlyfiles: # for each wav sound
        # UNESSECERY TO UNDERSTAND THE CODE
        if " " not in file.split("_")[0]:
            names.append(file.split("_")[0])
        else:
            names.append(file.split("_")[0].split(" ")[0])
    names = list(dict.fromkeys(names)) # names of speakers
    vector_names = [] # vector for each name
    i = 0
    vector_for_each_name = [0] * len(names)
    for name in names:
        vector_for_each_name[i] += 1
        vector_names.append(np.array(vector_for_each_name))
        vector_for_each_name[i] -= 1
        i += 1
    for f in onlyfiles:
        if " " not in f.split("_")[0]:
            f_speaker = f.split("_")[0]
        else:
            f_speaker = f.split("_")[0].split(" ")[0]
        (rate, sig) = wav.read("FinalAudios/" + f) # read the file
        try:
            mfcc_feat = python_speech_features.mfcc(sig, rate, winlen=0.2, nfft=512) # mfcc coeffs
            for index in range(len(mfcc_feat)): # adding each mfcc coeff to X, meaning if there is 50000 coeffs than
                # X will be [first coeff, second .... 50000'th coeff] and Y will be [f_speaker_vector] * 50000
                X.append(np.array(mfcc_feat[index]))
                Y.append(np.array(vector_names[names.index(f_speaker)]))
        except IndexError:
            pass
    Z = list(zip(X, Y))
    shuffle(Z) # WE SHUFFLE X,Y TO PERFORM RANDOM ON THE TEST LEVEL
    X, Y = zip(*Z)
    X = list(X)
    Y = list(Y)
    X = np.asarray(X)
    Y = np.asarray(Y)
    Y_test = Y[:50] # CHOOSE 50 FOR TEST, OTHERS FOR TRAIN
    X_test = X[:50]
    X = X[50:]
    Y = Y[50:]
    clf = MLPClassifier(solver='lbfgs', alpha=1e-2, hidden_layer_sizes=(5, 3), random_state=2) # create the NN
    clf.fit(X, Y) # Train it
    for sample in range(len(X_test)): # add 1 to winner array if we correct and 0 if not, than in the end it plot it
        if list(clf.predict([X[sample]])[0]) == list(Y_test[sample]):
            winner.append(1)
        else:
            winner.append(0)
# plot winner
plot_x = []
plot_y = []
for i in range(1, len(winner)):
    plot_y.append(sum(winner[0:i])*1.0/len(winner[0:i]))
    plot_x.append(i)
plt.plot(plot_x, plot_y)
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
This is my zip file that contains the code and the audio file : https://ufile.io/eggjm1gw
You have a number of issues in your code and it will be close to impossible to get it right in one go, but let's give it a try. There are two major issues:
Currently you're trying to teach your neural network with very few training examples, as few as a single one per speaker (!). No machine learning algorithm can learn anything useful from that.
To make matters worse, you feed the ANN only the MFCCs for the first 25 ms of each recording (25 comes from the winlen parameter of python_speech_features). In each of these recordings, the first 25 ms will be close to identical. Even if you had 10k recordings per speaker, you wouldn't get anywhere with this approach.
I will give you concrete advice, but won't do all the coding; it's your homework after all.
Use all MFCCs, not just the first 25 ms. Many of these should be skipped, simply because there's no voice activity. Normally there should be a VAD (Voice Activity Detector) telling you which ones to take, but in this exercise I'd skip it for starters (you need to learn the basics first).
Don't use dictionaries. Not only won't it fly with more than one MFCC vector per speaker, it's also a very inefficient data structure for your task. Use numpy arrays; they're much faster and more memory efficient. There are a ton of tutorials, including scikit-learn's, that demonstrate how to use numpy in this context. In essence, you create two arrays: one with training data, the second with labels. Example: if the omersk speaker "produces" 50000 MFCC vectors, you will get a (50000, 13) training array. The corresponding label array would have 50000 entries with a single constant value (id) that corresponds to the speaker (say, omersk is 0, lucas is 1 and so on). I'd consider taking longer windows (perhaps 200 ms, experiment!) to reduce the variance.
Don't forget to split your data into training, validation and test sets. You will have more than enough data. Also, for this exercise I'd watch out for feeding too much data from any single speaker, or take other steps to make sure the algorithm is not biased.
Later, when you make prediction, you will again compute MFCCs for the speaker. With 10 sec recording, 200 ms window and 100 ms overlap, you'll get 99 MFCC vectors, shape (99, 13). The model should run on each of the 99 vectors, for each producing probability. When you sum it (and normalise, to make it nice) and take top value, you'll get the most likely speaker.
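The summing step can be sketched as follows. This is only an illustration: most_likely_speaker is a hypothetical helper, and the probabilities are made up. In practice they would come from the model, for example from predict_proba on a fitted scikit-learn classifier.

```python
import numpy as np

def most_likely_speaker(frame_probs, speaker_names):
    """Sum per-frame class probabilities, normalise, and pick the top speaker.

    frame_probs has shape (n_frames, n_speakers): one row per MFCC vector.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    totals = frame_probs.sum(axis=0)   # accumulate evidence over all frames
    totals = totals / totals.sum()     # normalise so the scores sum to 1
    return speaker_names[int(np.argmax(totals))], totals

# Made-up probabilities for 3 frames and 2 speakers.
probs = np.array([[0.6, 0.4],
                  [0.3, 0.7],
                  [0.8, 0.2]])
name, scores = most_likely_speaker(probs, ["omersk", "lucas"])
print(name, scores)
```

A single frame may vote for the wrong speaker (the second row here), but the sum over the whole recording is much more stable.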
There are a dozen other things that would typically be taken into account, but in this case (homework) I'd focus on getting the basics right.
EDIT: I decided to take a stab at creating the model with your idea at heart, but basics fixed. It's not exactly clean Python, all because it's adapted from Jupyter Notebook I was running.
import python_speech_features
import scipy.io.wavfile as wav
import numpy as np
import glob
import os
from collections import defaultdict
from sklearn.neural_network import MLPClassifier
from sklearn import preprocessing
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestClassifier
audio_files_path = glob.glob('audio/*.wav')
win_len = 0.04 # in seconds
step = win_len / 2
nfft = 2048
mfccs_all_speakers = []
names = []
data = []
for path in audio_files_path:
    fs, audio = wav.read(path)
    if audio.size > 0:
        mfcc = python_speech_features.mfcc(audio, samplerate=fs, winlen=win_len,
                                           winstep=step, nfft=nfft, appendEnergy=False)
        filename = os.path.splitext(os.path.basename(path))[0]
        speaker = filename[:filename.find('_')]
        data.append({'filename': filename,
                     'speaker': speaker,
                     'samples': mfcc.shape[0],
                     'mfcc': mfcc})
    else:
        print(f'Skipping {path} due to 0 file size')
speaker_sample_size = defaultdict(int)
for entry in data:
    speaker_sample_size[entry['speaker']] += entry['samples']
person_with_fewest_samples = min(speaker_sample_size, key=speaker_sample_size.get)
print(person_with_fewest_samples)
max_accepted_samples = int(speaker_sample_size[person_with_fewest_samples] * 0.8)
print(max_accepted_samples)
training_idx = []
test_idx = []
accumulated_size = defaultdict(int)
for entry in data:
    if entry['speaker'] not in accumulated_size:
        training_idx.append(entry['filename'])
        accumulated_size[entry['speaker']] += entry['samples']
    elif accumulated_size[entry['speaker']] < max_accepted_samples:
        accumulated_size[entry['speaker']] += entry['samples']
        training_idx.append(entry['filename'])
X_train = []
label_train = []
X_test = []
label_test = []
for entry in data:
    if entry['filename'] in training_idx:
        X_train.append(entry['mfcc'])
        label_train.extend([entry['speaker']] * entry['mfcc'].shape[0])
    else:
        X_test.append(entry['mfcc'])
        label_test.extend([entry['speaker']] * entry['mfcc'].shape[0])
X_train = np.concatenate(X_train, axis=0)
X_test = np.concatenate(X_test, axis=0)
assert (X_train.shape[0] == len(label_train))
assert (X_test.shape[0] == len(label_test))
print(f'Training: {X_train.shape}')
print(f'Testing: {X_test.shape}')
le = preprocessing.LabelEncoder()
y_train = le.fit_transform(label_train)
y_test = le.transform(label_test)
clf = MLPClassifier(solver='lbfgs', alpha=1e-2, hidden_layer_sizes=(5, 3), random_state=42, max_iter=1000)
cv_results = cross_validate(clf, X_train, y_train, cv=4)
print(cv_results)
{'fit_time': array([3.33842635, 4.25872731, 4.73704267, 5.9454329 ]),
'score_time': array([0.00125694, 0.00073504, 0.00074005, 0.00078583]),
'test_score': array([0.40380048, 0.52969121, 0.48448687, 0.46043165])}
The test_score isn't stellar. There's a lot to improve (for starters, the choice of algorithm), but the basics are there. Notice, for starters, how I get the training samples. It's not random; I only consider recordings as a whole. You can't put samples from a given recording into both training and test, as the test set is supposed to be novel.
What was not working in your code? I'd say a lot. You were taking 200 ms frames and yet a very short FFT (nfft=512). python_speech_features likely complained to you that the FFT size should be at least as long as the frame you're processing.
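To make the frame-versus-FFT mismatch concrete, here is a small sketch. min_nfft is a hypothetical helper, and the sample rates are assumptions, since the rate of the original recordings isn't stated.

```python
def min_nfft(sample_rate, winlen):
    """Return (frame length in samples, smallest power of two that covers it)."""
    frame_len = int(round(sample_rate * winlen))
    nfft = 1
    while nfft < frame_len:
        nfft *= 2
    return frame_len, nfft

# Assumed sample rates, for illustration only.
print(min_nfft(16000, 0.2))    # (3200, 4096)
print(min_nfft(16000, 0.04))   # (640, 1024)
print(min_nfft(44100, 0.04))   # (1764, 2048)
```

At an assumed 16 kHz, a 200 ms frame is 3200 samples, far longer than an FFT of 512 points.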
I leave to you testing the model. It won't be good, but it's a starter.

How to normalise all training samples at once using MinMaxScaler

I have 1320 training samples (sea surface temperature) and each sample is a 2D array of shape (160,320), so the final array has the shape (1320,160,320). I would like to normalise them to values between 0 and 1 using MinMaxScaler(). I get the error "Found array with dim 3. MinMaxScaler expected <= 2.". My code is as follows. I could loop through all 1320 samples, normalising them one by one, but I would like to know if there is a way to normalise all of them at once, because the max and min of each sample are not the same.
scaler = prep.MinMaxScaler()
sst = scaler.fit_transform(sst)
As far as I know, you can't really do it only using MinMaxScaler(). np.apply_along_axis won't be useful either since you want to apply a min-max scaler over 2D slices. One solution could be something like this:
import numpy as np
a = np.random.random((2, 3, 3))
def customMinMaxScaler(X):
    return (X - X.min()) / (X.max() - X.min())
np.array([customMinMaxScaler(x) for x in a])
But I guess it wouldn't be much faster than iterating over the samples.
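For what it's worth, the loop can also be avoided with plain numpy reductions over the last two axes. This is a sketch on a small random array standing in for the (1320, 160, 320) data; it shows both readings of the question (per-sample min/max and one global min/max), and it assumes no sample is constant, since that would divide by zero.

```python
import numpy as np

rng = np.random.default_rng(0)
sst = rng.random((4, 3, 5)) * 30.0   # small stand-in for the real array

# Per-sample scaling: each 2D field uses its own min and max.
mins = sst.min(axis=(1, 2), keepdims=True)   # shape (4, 1, 1)
maxs = sst.max(axis=(1, 2), keepdims=True)
per_sample = (sst - mins) / (maxs - mins)

# Global scaling: one min and max shared by every sample.
global_scaled = (sst - sst.min()) / (sst.max() - sst.min())

print(per_sample.min(axis=(1, 2)), per_sample.max(axis=(1, 2)))
```

keepdims=True keeps the reduced axes so that the subtraction and division broadcast against the original array.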

Output of BatchNorm1d in PyTorch does not match output of manually normalizing input dimensions

In an attempt to understand how BatchNorm1d works in PyTorch, I tried to match the output of BatchNorm1d operation on a 2D tensor with manually normalizing it. The manual output seems to be scaled down by a factor of 0.9747. Here's the code (note that affine is set to false):
import torch
import torch.nn as nn
from torch.autograd import Variable
X = torch.randn(20,100) * 5 + 10
X = Variable(X)
B = nn.BatchNorm1d(100, affine=False)
y = B(X)
mu = torch.mean(X[:,1])
var_ = torch.var(X[:,1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:,1] - mu)/sigma
# the ratio below should be equal to one
print(x.data / y[:,1].data )
Output is:
0.9747
0.9747
0.9747
....
Doing the same thing for BatchNorm2d works without any issues. How does BatchNorm1d calculate its output?
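Found out the reason. torch.var uses Bessel's correction (dividing by n-1) by default when calculating the variance, whereas BatchNorm1d normalizes with the biased estimate (dividing by n). Passing the argument unbiased=False to torch.var gives identical values.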
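The 0.9747 factor itself can be reproduced without PyTorch. The sketch below uses numpy on one synthetic column of 20 values (the batch size from the question) and ignores the small eps term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # batch size used in the question
col = rng.normal(10.0, 5.0, size=n)      # synthetic stand-in for X[:, 1]

# Normalise with the biased (divide by n) and unbiased (divide by n - 1) variance.
biased = (col - col.mean()) / np.sqrt(col.var(ddof=0))
unbiased = (col - col.mean()) / np.sqrt(col.var(ddof=1))

ratio = unbiased / biased
print(ratio[:3])
print(np.sqrt((n - 1) / n))
```

Every element of the ratio equals sqrt((n-1)/n), which for n = 20 rounds to 0.9747.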

Kernel Function in Gaussian Processes

Given a kernel in Gaussian Process, is it possible to know the shape of functions being drawn from the prior distribution without sampling at first?
I think the best way to know the shape of prior functions is to draw them. Here's 1-dimensional example:
These are samples from the GP prior (zero mean, with the covariance matrix induced by the squared exponential kernel). As you can see, they are smooth, and in general the plot gives a feeling for how "wiggly" they are. Also note that in the multi-dimensional case each dimension will look somewhat like this.
Here's the full code I used; feel free to write your own kernel or tweak the parameters to see how it affects the samples:
import numpy as np
import matplotlib.pyplot as pl
def kernel(a, b, gamma=0.1):
    """ GP squared exponential kernel """
    sq_dist = np.sum(a**2, 1).reshape(-1, 1) + np.sum(b**2, 1) - 2*np.dot(a, b.T)
    return np.exp(-0.5 * (1 / gamma) * sq_dist)
n = 300 # number of points.
m = 10 # number of functions to draw.
s = 1e-6 # noise variance.
X = np.linspace(-5, 5, n).reshape(-1, 1)
K = kernel(X, X)
L = np.linalg.cholesky(K + s * np.eye(n))
f_prior = np.dot(L, np.random.normal(size=(n, m)))
pl.figure(1)
pl.clf()
pl.plot(X, f_prior)
pl.title('%d samples from the GP prior' % m)
pl.axis([-5, 5, -3, 3])
pl.show()
