How to get MFCC features with Octave

My goal is to create a program in Octave that loads an audio file (WAV, FLAC), calculates its MFCC features and returns them as output. The problem is that I don't have much experience with Octave and cannot get Octave to load the audio file, which is why I am not sure whether the extraction algorithm is correct. Is there a simple way to load the file and get its features?
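A minimal sketch of the loading step, assuming an Octave build with audio file support (audioread reads WAV and FLAC through libsndfile). The function name load_mono is hypothetical. It averages multichannel files down to one channel, which is a common but not mandatory choice before feature extraction.

```octave
1;  % script marker so this file can define functions

function [y, Fs] = load_mono(filename)
  % Read a WAV or FLAC file; audioread returns samples scaled to [-1, 1]
  [y, Fs] = audioread(filename);
  % Average the channels of a stereo (or multichannel) file into one column
  if columns(y) > 1
    y = mean(y, 2);
  endif
endfunction
```

For example, [y, Fs] = load_mono('recording.flac') (hypothetical file name) gives a column vector y and its sampling rate Fs, ready to pass to an MFCC routine.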

You can run the MFCC code from RASTAMAT in Octave; you only need to fix a few things. The fixed version is available for download here.
The changes are to set the window properly in powspec.m:
WINDOW = hanning(winpts);
and to fix a bug in the specgram function, which is not compatible with MATLAB.
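To illustrate what that window line is for, here is a sketch of a Hann-windowed, framed power spectrum, the kind of output powspec.m produces. This is not RASTAMAT's code: the function name hann_powspec and its parameters (frame length, hop and FFT size, all in samples) are hypothetical.

```octave
1;  % script marker so this file can define functions

function P = hann_powspec(x, winpts, steppts, nfft)
  % Power spectrum of overlapping Hann-windowed frames, one column per frame
  % winpts: frame length, steppts: hop between frames, nfft: FFT length (>= winpts)
  x = x(:);
  WINDOW = hanning(winpts);                  % column vector, as in the powspec.m fix
  nFrames = floor((numel(x) - winpts) / steppts) + 1;
  P = zeros(floor(nfft/2) + 1, nFrames);
  for k = 1:nFrames
    seg = x((k-1)*steppts + (1:winpts)) .* WINDOW;
    X = fft(seg, nfft);                      % zero-pads the frame to nfft points
    P(:, k) = abs(X(1:floor(nfft/2) + 1)).^2;   % keep the non-negative frequencies
  endfor
endfunction
```

Without a taper (a rectangular window), energy from strong components leaks into distant bins, which then shows up in every mel filter.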

Check out these Octave functions for calculating MFCCs: https://github.com/jagdish7908/mfcc-octave
For a detailed explanation of the steps involved in computing MFCCs, see http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
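The quantities computed by the code that follows can be written compactly. Here $P(b)$ is bin $b$ of a frame's power spectrum, $H_j$ is the $j$-th triangular filter, $J$ is the number of filters, and the DCT is written without normalization factors, which differ between implementations (so $c_n$ matches the output of Octave's dct only up to a per-coefficient scale).

```latex
m(f) = 1125\,\ln\!\left(1 + \frac{f}{700}\right), \qquad
f(m) = 700\left(e^{m/1125} - 1\right)

E_j = \sum_{b} H_j(b)\,P(b), \qquad
c_n = \sum_{j=0}^{J-1} \ln(E_j)\,\cos\!\left(\frac{\pi n \left(j + \tfrac{1}{2}\right)}{J}\right),
\quad n = 0, \dots, J-1
```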
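function frame = create_frames(y, Fs, Fsize, Fstep)
  % split y into overlapping frames, one frame per column
  % Fsize = frame length in seconds, Fstep = hop between frame starts in seconds
  % (consecutive frames overlap by Fsize - Fstep seconds)
  y = y(:);                                  % work with a column vector
  N = length(y);
  samplesPerFrame = floor(Fs*Fsize);
  samplesPerFramestep = floor(Fs*Fstep);
  % number of whole frames that fit, including one that ends exactly at N
  nFrames = floor((N - samplesPerFrame)/samplesPerFramestep) + 1;
  frame = zeros(samplesPerFrame, max(nFrames, 0));
  for k = 1:nFrames
    i = (k-1)*samplesPerFramestep + 1;       % first sample of frame k
    frame(:, k) = y(i:(i+samplesPerFrame-1));
  endfor
endfunction

function m = hz2mel(f)
  % Hz -> mel (natural-log form)
  m = 1125*log(1 + f/700);
endfunction

function f = mel2hz(m)
  % mel -> Hz (inverse of hz2mel)
  f = 700*(exp(m/1125) - 1);
endfunction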
function bank = melbank(n, fmin, fmax, sr)
  % n    = number of triangular filters
  % fmin = lowest filter edge in Hz
  % fmax = highest filter edge in Hz (keep it below sr/2)
  % sr   = sampling rate in Hz
  % returns an NFFT x n matrix, one filter per column
  % (fmin/fmax instead of min/max, which would shadow the built-in functions)
  NFFT = 512;   % number of spectrum bins covering 0..sr/2 (1024-point FFT)
  % n+2 points equally spaced on the mel scale; filter j spans points j..j+2
  m = linspace(hz2mel(fmin), hz2mel(fmax), n+2);
  h = mel2hz(m);
  % replace the frequencies in h with their (1-based) bin indices
  fbin = min(floor(NFFT*h/(sr/2)) + 1, NFFT);
  % create the triangular mel filter vectors
  bank = zeros(NFFT, n);
  for j = 1:n
    lo = fbin(j); c = fbin(j+1); hi = fbin(j+2);
    for k = lo:c-1
      bank(k, j) = (k - lo)/(c - lo);        % rising edge
    endfor
    bank(c, j) = 1;                          % peak at the centre bin
    for k = c+1:hi
      bank(k, j) = (hi - k)/(hi - c);        % falling edge
    endfor
  endfor
endfunction
clc;
clear all;
close all;
pkg load signal;
% record audio (3 seconds)
Fs = 44100;
y = record(3, Fs);
% OR %
% Load an existing file (audioread supersedes the deprecated wavread)
%[y, Fs] = audioread('../FILE_PATH/');
%y = y(Fs:2*Fs, 1);   % one second of the first channel
% create mel filterbanks
minFreq = 500;    % minimum cutoff frequency in Hz
maxFreq = 10000;  % maximum cutoff frequency in Hz
% melbank(number_of_banks, minFreq, maxFreq, sampling_rate)
foo = melbank(30, minFreq, maxFreq, Fs);
% drop any filter that covers no FFT bin (an all-zero column gives log(0) = -Inf)
foo = foo(:, any(foo, 1));
% create 25 ms frames with a 10 ms step
frames = create_frames(y, Fs, 0.025, 0.010);
NF = columns(frames);
% coefficients are stacked column-wise, one column per frame
PXX = zeros(columns(foo), NF);
for i = 1:NF
  % periodogram of the frame: 1024-point FFT -> 513 one-sided bins
  P = periodogram(frames(:,i), [], 1024, Fs);
  % apply the mel filters to the first 512 bins and sum the energy in each filter
  E = sum(foo .* P(1:512), 1)';
  % take the log (eps guards against silent frames), then the DCT
  PXX(:, i) = dct(log(max(E, eps)));
endfor
plot(PXX);

Related

Linear Regression - Implementing Feature Scaling

I was trying to implement Linear Regression in Octave 5.1.0 on a data set relating the GRE score to the probability of Admission.
The data set looks like this:
337 0.92
324 0.76
316 0.72
322 0.8
. . .
My main Program.m file looks like this:
% read the data
data = load('Admission_Predict.txt');
% initiate variables
x = data(:,1);
y = data(:,2);
m = length(y);
theta = zeros(2,1);
alpha = 0.01;
iters = 1500;
J_hist = zeros(iters,1);
% plot data
subplot(1,2,1);
plot(x,y,'rx','MarkerSize', 10);
title('training data');
% compute cost function
x = [ones(m,1), (data(:,1) ./ 300)]; % feature scaling
J = computeCost(x,y,theta);
% run gradient descent
[theta, J_hist] = gradientDescent(x,y,theta,alpha,iters);
hold on;
subplot(1,2,1);
plot((x(:,2) .* 300), (x*theta),'-');
xlabel('GRE score');
ylabel('Probability');
hold off;
subplot (1,2,2);
plot(1:iters, J_hist, '-b');
xlabel('no: of iteration');
ylabel('Cost function');
computeCost.m looks like this:
function J = computeCost(x,y,theta)
m = length(y);
h = x * theta;
J = (1/(2*m))*sum((h-y) .^ 2);
endfunction
and gradientDescent.m looks like this:
function [theta, J_hist] = gradientDescent(x,y,theta,alpha,iters)
m = length(y);
J_hist = zeros(iters,1);
for i=1:iters
diff = (x*theta - y);
theta = theta - (alpha * (1/(m))) * (x' * diff);
J_hist(i) = computeCost(x,y,theta);
endfor
endfunction
The resulting plots don't look right, even though my cost function seems to be minimized.
Can someone please tell me whether this is right? If not, what am I doing wrong?
The easiest way to check whether your implementation is correct is to compare it with a validated implementation of linear regression. I suggest using an alternative approach, like the one suggested here, and comparing the results. If the fits match, this is the best linear fit to your data; if they don't, there may be something wrong in your implementation.
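One concrete version of that comparison is sketched below; the function name fit_line_gd and the synthetic data are hypothetical. A likely reason for the odd fit is the scaling itself: dividing GRE scores by 300 leaves the feature close to 1 for every sample, nearly collinear with the column of ones, so gradient descent moves very slowly along the slope direction. The sketch z-scores the feature instead (subtract the mean, divide by the standard deviation), runs the same update rule with the question's alpha and iteration count, converts the parameters back to raw units, and checks them against ordinary least squares from Octave's backslash operator.

```octave
1;  % script marker so this file can define functions

function [b, a] = fit_line_gd(x, y, alpha, iters)
  % Fit y ~ a + b*x by batch gradient descent on a z-scored feature,
  % then convert the parameters back to the original units of x.
  x = x(:);  y = y(:);
  m = numel(y);
  mu = mean(x);  sigma = std(x);
  X = [ones(m, 1), (x - mu) / sigma];       % normalized design matrix
  theta = zeros(2, 1);
  for it = 1:iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));
  endfor
  b = theta(2) / sigma;                     % slope per unit of x
  a = theta(1) - b * mu;                    % intercept at x = 0
endfunction

% Synthetic, noise-free data so both methods should agree closely
x = (290:2:340)';
y = 0.01 * x - 2.5;
[b, a] = fit_line_gd(x, y, 0.01, 1500);
ab_ls = [ones(numel(x), 1), x] \ y;         % least squares: [intercept; slope]
printf("GD: a = %.4f, b = %.5f\nLS: a = %.4f, b = %.5f\n", a, b, ab_ls(1), ab_ls(2));
```

Plotting a + b*x against the raw scores then needs no manual rescaling of the x axis.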

Generating a Histogram by Harmonic Number

I am trying to create a program in GNU Octave that draws a histogram of the fundamental and harmonics of a modified sine wave (the output of an SCR dimmer, which is a sine wave held at zero until partway through each half-cycle).
I've been able to generate the waveform and perform an FFT to get a set of frequency vs. amplitude points; however, I am not sure how to convert this data into bins suitable for a histogram.
Sample code is below; thanks for the help!
clear();
vrms = 120;
freq = 60;
nCycles = 2;
level = 25;
vpeak = sqrt(2) * vrms;
sampleinterval = 0.00001;
num_harmonics = 10;
disp("Start");
% Draw the waveform
x = 0 : sampleinterval : nCycles * 1 / freq; % time in sampleinterval increments
dimmed_wave = [];
undimmed_wave = [];
for i = 1 : columns(x)
rad_value = x(i) * 2 * pi * freq;
off_time = mod(rad_value, pi);
on_time = pi*(100-level)/100;
if (off_time < on_time)
dimmed_wave = [dimmed_wave, 0]; % in the dimmed period, value is zero
else
dimmed_wave = [dimmed_wave, sin(rad_value)]; % when not dimmed, value = sine
endif
undimmed_wave = [undimmed_wave, sin(rad_value)];
endfor
y = dimmed_wave * vpeak; % calculate instantaneous voltage
undimmed = undimmed_wave * vpeak;
subplot(2,1,1)
plot(x*1000, y, '-', x*1000, undimmed, '--');
xlabel ("Time (ms)");
ylabel ("Voltage");
% Fourier Transform to determine harmonics
subplot(2,1,2)
N = length(dimmed_wave); % number of points
fft_vals = abs(fftshift(fft(dimmed_wave))); % perform fft
frequency = [ -(ceil((N-1)/2):-1:1) ,0 ,(1:floor((N-1)/2)) ] * 1 / (N *sampleinterval);
plot(frequency, fft_vals);
axis([0,400]);
xlabel ("Frequency");
ylabel ("Amplitude");
You know your base frequency (the fundamental); let's call it F. 2*F is the second harmonic, 3*F the third, etc. You want to set the histogram bin edges halfway between these: 1.5*F, 2.5*F, etc.
Your input signal contains two periods, so the fundamental falls in FFT bin k=2 (the value at fft_vals(k+1), the first peak in your plot). The second harmonic is at k=4, the third at k=6, etc.
So you would set your bin edges at k = 1:2:end.
In general, this would be k = nCycles/2:nCycles:end.
You can compute the bar graph from these bin edges as follows:
fft_vals = abs(fft(dimmed_wave));
nHarmonics = 9;
edges = nCycles/2 + (0:nHarmonics)*nCycles;
H = cumsum(fft_vals);
H = diff(H(edges));
bar(1:nHarmonics,H);
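The cumsum/diff version above uses the 0-based bin numbers k directly as 1-based indices into H, which hides the bookkeeping, and nCycles/2 is not a valid index when nCycles is odd. Below is a loop version of the same binning that makes the bin ranges explicit. harmonic_bins is a hypothetical name, and it assumes x spans a whole number of fundamental periods, as in the question.

```octave
1;  % script marker so this file can define functions

function H = harmonic_bins(x, nCycles, nHarmonics)
  % x          : samples covering nCycles periods of the fundamental
  % nHarmonics : number of bars to compute (h = 1 is the fundamental)
  % H(h)       : summed |FFT| over 0-based bins k in [(h-1/2)*nCycles, (h+1/2)*nCycles)
  X = abs(fft(x(:)));
  H = zeros(nHarmonics, 1);
  for h = 1:nHarmonics
    kLo = ceil((h - 0.5) * nCycles);         % first 0-based bin of the band
    kHi = ceil((h + 0.5) * nCycles) - 1;     % last 0-based bin of the band
    H(h) = sum(X(kLo+1 : kHi+1));            % +1 because Octave indices start at 1
  endfor
endfunction
```

With the question's variables, something like bar(1:9, harmonic_bins(dimmed_wave, nCycles, 9)) should produce the same kind of bar graph.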

Not getting accurate result when using normalized data for gradient descent

I am currently on week 2 of Andrew NG's Machine Learning course on Coursera, and I came across an issue that I cannot sort out.
Given a data set where the first column is the house size, the second the number of bedrooms, and the third the price, I need to use linear regression with gradient descent, after normalizing the data, to predict new house prices.
However, I am getting a gigantic number for my prediction, and I cannot find the error in my calculations.
I am using the following:
alpha = 0.03;
num_iters = 400;
Code to normalize the features (X is the data set matrix):
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
for i = 1:size(X, 2);
mu(1, i) = mean(X(:, i)); % Getting the mean of each column (feature).
sigma(1, i) = std(X(:, i)); % Getting the standard deviation of each column (feature).
for j = 1:size(X, 1);
X_norm(j, i) = (X(j, i) .- mu(1, i)) ./ sigma(1, i);
end;
end;
Code to calculate current cost:
m = length(y);
J = 0;
predictions = X * theta;
sqErrors = (predictions - y).^2;
J = (1/(2*m)) * sum(sqErrors);
Code to calculate gradient descent:
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% Getting the predictions for our firstly chosen theta values.
predictions = X * theta;
% Getting the error difference of the hypothesis(h(x)) and real results(y).
diff = predictions - y;
% Getting the number of features.
features_num = size(X, 2);
% Applying gradient descent for each feature.
for i = 1:features_num;
theta(i, 1) = theta(i, 1) - (alpha / m) * sum(diff .* X(:, i))
end;
% Saving the cost J in every iteration
J_history(iter) = computeCostMulti(X, y, theta);
end;
The price I get when predicting a house with 1650 square feet and 3 bedrooms is:
182329818.366117
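A common cause of a prediction this large is that theta was learned on normalized features while the query [1650 3] is fed in raw. The query has to be scaled with the same mu and sigma that were computed on the training data, and the intercept term of 1 is prepended after scaling. A sketch under that assumption (the function names feature_normalize and predict_price are hypothetical), which also replaces the inner normalization loop with broadcasting:

```octave
1;  % script marker so this file can define functions

function [X_norm, mu, sigma] = feature_normalize(X)
  % z-score every column (feature); broadcasting replaces the inner loop
  mu = mean(X);                  % 1 x n row of column means
  sigma = std(X);                % 1 x n row of column standard deviations
  X_norm = (X - mu) ./ sigma;
endfunction

function price = predict_price(x_new, mu, sigma, theta)
  % Scale the query with the training mu/sigma, then prepend the intercept term
  price = [1, (x_new - mu) ./ sigma] * theta;
endfunction
```

After training on [ones(m, 1), X_norm], the prediction would be predict_price([1650 3], mu, sigma, theta).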

Gibbs sampling gives small probabilities

As part of our final design project, we have to design a Gibbs sampler to denoise an image. We have chosen to use the Metropolis algorithm instead of a regular Gibbs sampler. A rough sketch of the algorithm follows; all pixels are 0-255 greyscale values, and we are using a simple smoothness prior.
main()
    get input image as img
    originalImg = img
    for k = 1 to 1000
        beta = 3/log(1+k)
        initialEnergy = energy(img,originalImg)
        for i = 0 to imageRows
            for j = 0 to imageCols
                img[i][j] = metropolisSample(beta,originalImg,img,initialEnergy,i,j)

energy(img,originalImg)
    ans = 0
    for i = 1 to imageRows
        for j = 1 to imageCols
            ans += (img[i][j] - originalImg[i][j])^2 / (255*255)
            ans += (img[i][j] - img[i][j+1])^2 / (255*255)
            ans += (img[i][j] - img[i][j-1])^2 / (255*255)
            ans += (img[i][j] - img[i+1][j])^2 / (255*255)
            ans += (img[i][j] - img[i-1][j])^2 / (255*255)
    return ans

metropolisSample(beta,originalImg,img,initialEnergy,i,j)
    imageCopy = img
    imageCopy[i][j] = random int between 0 and 255
    newEnergy = energy(imageCopy,originalImg)
    if (newEnergy < initialEnergy)
        initialEnergy = newEnergy
        return imageCopy[i][j]
    else
        rand = random float between 0 and 1
        prob = exp(-(1/beta) * (newEnergy - initialEnergy))
        if rand < prob
            initialEnergy = newEnergy
            return imageCopy[i][j]
        else
            return img[i][j]
That's pretty much the gist of the program. My issue is the step where I calculate the probability:
prob = exp(-(1/beta) * (newEnergy - initialEnergy))
The difference in energies is so large that the probability is almost always zero. What is the proper way to mitigate this? We have also tried the Gibbs sampling approach, but we run into a similar problem. The Gibbs sampler code is as follows; instead of metropolisSample, we use gibbsSample:
gibbsSample(beta,originalImg,img,initialEnergy,i,j)
    imageCopy = img
    sum = 0
    for k = 0 to 255
        imageCopy[i][j] = k
        energies[k] = energy(imageCopy,originalImg)
        prob[k] = exp(-(1/beta) * energies[k])
        sum += prob[k]
    for k = 0 to 255
        prob[k] = prob[k] / sum
    for k = 1 to 255
        prob[k] = prob[k-1] + prob[k]    //Convert our PDF to a CDF
    rand = random float between 0 and 1
    k = 0
    while (1)
        if (rand < prob[k])
            break
        k++
    initialEnergy = energies[k]
    return k
We had similar issues with this implementation as well. When we calculated
prob[k] = exp(-(1/beta) * energies[k])
our energies were so large that all of our probabilities went to zero. Theoretically, this shouldn't be an issue because we sum them all up and then divide by the sum, but the floating-point representation cannot hold values that small, so every term underflows to zero. What would be a good way to fix this?
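A standard remedy for the Gibbs step, independent of how the energies are scaled, is to subtract the smallest energy before exponentiating (the same idea as the log-sum-exp trick). Since exp(-(E - Emin)/T) = exp(-E/T) * exp(Emin/T), the constant factor cancels when dividing by the sum, so the sampled distribution is unchanged, but the largest weight is exactly 1 and the weights cannot all underflow to zero. A sketch, with the hypothetical function name sample_from_energies: E holds energies[0..255], T plays the role of beta in the question's exp(-(1/beta) * E), and the returned index k corresponds to pixel value k - 1.

```octave
1;  % script marker so this file can define functions

function k = sample_from_energies(E, T)
  % Draw an index k with probability proportional to exp(-E(k)/T)
  w = exp(-(E(:) - min(E)) / T);   % shifted weights; the largest is exactly 1
  p = w / sum(w);                  % normalized probabilities (the PDF)
  c = cumsum(p);                   % the CDF
  k = find(rand() < c, 1);         % first index whose CDF exceeds the draw
  if isempty(k)                    % guard against c(end) rounding to just below 1
    k = numel(E);
  endif
endfunction
```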
I know nothing about your specific problem, but my first response would be to scale the energies. Your pixels are in the range of 0..255, which is arbitrary. If the pixels were fractions between zero and one, you would have very different results.
If the energy units are pixel^2, try dividing the energies by 256^2; otherwise, try dividing by 256.
Also, given that the data is fully random, it is possible that there are very high energies, and there should in fact not be high probabilities.
My lack of knowledge of your problem may have resulted in a useless answer. If so, please ignore it.
I think the probability for Gibbs sampling in the Ising model should be
p = 1 / (1 + np.exp(-2 * beta * Energy(x,y)))

How to convert a low-pass filter to a band-pass filter

I have a low-pass filter described by the following impulse response:
h[n] = (w_c/Pi) * sinc( n * w_c / Pi ), where w_c is the cutoff frequency
I have to convert this low-pass filter to a band-pass filter.
Your h[n] transforms into a rect in the frequency domain. To make it band-pass, you need to move its centre frequency higher.
To do this, multiply h[n] by exp(j*w_offset*n), where w_offset is the amount to shift. If w_offset is positive, then you shift towards higher frequencies.
Multiplication in the time domain is convolution in the frequency domain. Since exp(j*w_offset*n) transforms into an impulse centred on w_offset, the multiplication shifts H(w) by w_offset.
See Discrete Time Fourier Transform for more details.
Note: the frequency response of such a filter is not symmetric about 0, which means its coefficients h[n] are complex. To make it symmetric, you need to add h[n] multiplied by exp(-j*w_offset*n):
h_bandpass[n] = h[n](exp(j*w_offset*n)+exp(-j*w_offset*n))
Since cos(w*n) = (exp(j*w*n)+exp(-j*w*n))/2, we get:
h_bandpass[n] = 2*h[n]*cos(w_offset*n)
This filter then has purely real values.
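A sketch of the cosine-modulation result in Octave. The function name bandpass_from_lowpass is hypothetical, and the Hamming window and odd tap count are my own choices: the infinite sinc has to be truncated in practice, and the window reduces the ripple that truncation causes. The passband is centred on w_offset and is 2*w_c wide; the two shifted copies of H(w) do not overlap as long as w_offset >= w_c.

```octave
1;  % script marker so this file can define functions

function h_bp = bandpass_from_lowpass(n_taps, w_c, w_offset)
  % n_taps   : odd number of taps, so the time index below is integer and symmetric
  % w_c      : low-pass cutoff in rad/sample (half the passband width)
  % w_offset : passband centre in rad/sample (w_offset >= w_c)
  n = (0:n_taps-1) - (n_taps-1)/2;           % symmetric time index
  h = (w_c/pi) * sinc(n * w_c/pi);           % truncated ideal low-pass h[n]
  h = h .* hamming(n_taps)';                 % taper to reduce the ripple
  h_bp = 2 * h .* cos(w_offset * n);         % copies of H(w) at +w_offset and -w_offset
endfunction
```

For example, bandpass_from_lowpass(101, 0.2*pi, 0.5*pi) should pass roughly 0.3*pi to 0.7*pi, and its coefficients are real.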
The short answer is that you will multiply by a complex exponential in the time domain. Multiplication in the time domain will shift the signal in the frequency domain.
MATLAB code (w_c and w_offset must already be defined, in radians per sample):
n_taps = 100;
n = 1:n_taps;
h = ( w_c / pi ) * sinc( ( n - n_taps / 2) * w_c / pi ) .* ...
    exp( i * w_offset * ( n - n_taps / 2) );
P.S. I happened to have implemented this exact functionality for school a couple of weeks ago.
Here is code for creating your own band pass filter using the windowing method:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: Create bandpass filter using windowing method
% Purpose: Simple method for creating filter taps ( useful when more elaborate
% filter design libraries are not available )
%
% #author Trevor B. Smith, 24MAR2009
%
% #param n_taps How many taps are in your output filter
% #param omega_p1 The lower cutoff frequency for your passband filter
% #param omega_p2 The upper cutoff frequency for your passband filter
% #return h_bpf_hammingWindow The filter coefficients for your passband filter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function h_bpf_hammingWindow = BPF_hammingWindow(n_taps,omega_p1,omega_p2)
% Error checking
if( ( omega_p2 == omega_p1 ) || ( omega_p2 < omega_p1 ) || ( n_taps < 10 ) )
str = 'ERROR - h_bpf_hammingWindow(): Incorrect input parameters'
h_bpf_hammingWindow = -1;
return;
end
% Compute constants from function parameters
length = n_taps - 1; % How many units of T ( i.e. how many units of T, sampling period, in the continuous time. )
passbandLength = omega_p2 - omega_p1;
passbandCenter = ( omega_p2 + omega_p1 ) / 2;
omega_c = passbandLength / 2; % LPF omega_c is half the size of the BPF passband
isHalfSample = 0;
if( mod(length,2) == 1 )
isHalfSample = 1/2;
end
% Compute hamming window
window_hamming = hamming(n_taps);
% Compute time domain samples
n = transpose(-ceil(length/2):floor(length/2));
h1 = ( omega_c / pi ) * sinc( (1/pi) * omega_c * ( n + isHalfSample ) ) .* exp( i * passbandCenter * ( n + isHalfSample ) ); % ideal LPF amplitude is omega_c/pi
% Window the time domain samples
h2 = h1 .* window_hamming;
if 1
figure; stem(h2); figure; freqz(h2);
end
% Return filter coefficients
h_bpf_hammingWindow = h2;
end % function BPF_hammingWindow()
Example on how to use this function:
h_bpf_hammingWindow = BPF_hammingWindow( 36, pi/4, 3*pi/4 );
freqz(h_bpf_hammingWindow); % View the frequency domain
Let f[n] be the signal you get from the low-pass filter with w_c at the lower edge of the desired band. Subtracting f[n] from the original signal leaves only the frequencies above this lower edge. That difference is the input you want for a second low-pass filter, with w_c at the upper edge of the band.
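Because both steps are linear, the whole chain can be folded into one FIR impulse response: "original minus low-pass" is a unit impulse minus h_lo, and the second low-pass is a convolution. A sketch under assumed choices (Hamming-windowed sinc low-passes with an odd tap count; windowed_lowpass and bandpass_by_subtraction are hypothetical names):

```octave
1;  % script marker so this file can define functions

function h = windowed_lowpass(n_taps, w_c)
  % Hamming-windowed ideal low-pass, cutoff w_c in rad/sample (n_taps odd)
  n = (0:n_taps-1) - (n_taps-1)/2;
  h = (w_c/pi) * sinc(n * w_c/pi) .* hamming(n_taps)';
endfunction

function h_bp = bandpass_by_subtraction(n_taps, w_lo, w_hi)
  % x - f, where f is x low-passed at w_lo, keeps the frequencies above w_lo
  delta = zeros(1, n_taps);
  delta((n_taps+1)/2) = 1;                   % unit impulse at the filter centre
  h_hp = delta - windowed_lowpass(n_taps, w_lo);
  % the second low-pass at w_hi removes everything above the band
  h_bp = conv(h_hp, windowed_lowpass(n_taps, w_hi));
endfunction
```

The impulse has to sit at the centre tap so that it lines up with the linear-phase delay of the low-pass; otherwise the subtraction does not cancel the low frequencies.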
