What exactly are the keyword list thresholds in Pocketsphinx?

I have been developing a program with Pocketsphinx and I have a keyword list.
I know that a threshold lies between 1 and 1e-x, with x going up to 50, and that x should increase as the word gets longer, but what exactly is that threshold? I can't find anything on the Internet explaining the concept.

Related

How would I break down a signal's sound pressure level by frequency?

I've been given some digitized sound recordings and asked to plot the sound pressure level per Hz.
The signal is sampled at 40 kHz and the units for the y axis are simply volts.
I've been asked to produce a graph of the SPL as dB/Hz vs Hz.
EDIT: The input units are voltage vs time.
Does this make sense? I thought SPL was a time-domain measure.
If it does make sense, how would I go about producing this graph? Apply the dB formula (20 * log10(x), IIRC) and do an FFT on that, or...?
What you're describing is a power spectral density (PSD). Matlab, for example, has a pwelch function that does literally what you're asking for. To scale to dB/Hz, apply 10*log10(psd), where psd is the output of pwelch (10 rather than 20, because the PSD is already a power quantity). For the result to be in dB SPL, the voltage first has to be converted to pascals using the microphone's calibration and referenced to 20 µPa. Let me know if you need help with the function inputs.
If you're working with a different framework, let me know which one. I'm all but certain it will have a version of this function, possibly with a different output format, in which case the scaling might differ.
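For anyone without Matlab, the same idea can be sketched in plain Python. This is a minimal, unoptimized version of Welch's method (Hann window, 50% overlap, naive DFT), not a reimplementation of pwelch. The function names, the segment length, the synthetic test tone and the microphone sensitivity are all assumptions made for illustration; the question does not give a calibration factor, so the absolute dB values here are meaningless until a real one is supplied.

```python
import cmath
import math

P_REF = 20e-6  # reference pressure for dB SPL, in pascals


def welch_psd(x, fs, nperseg=64):
    """One-sided PSD estimate in (input units)^2/Hz from averaged Hann-windowed periodograms."""
    if nperseg % 2 or len(x) < nperseg:
        raise ValueError("need an even nperseg that is no longer than the signal")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg) for n in range(nperseg)]
    scale = fs * sum(w * w for w in window)
    nbins = nperseg // 2 + 1
    psd = [0.0] * nbins
    nsegs = 0
    for start in range(0, len(x) - nperseg + 1, nperseg // 2):  # 50% overlap
        seg = [x[start + n] * window[n] for n in range(nperseg)]
        for k in range(nbins):
            dft = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / nperseg)
                      for n in range(nperseg))
            psd[k] += abs(dft) ** 2 / scale
        nsegs += 1
    psd = [p / nsegs for p in psd]
    for k in range(1, nbins - 1):  # fold negative frequencies into the one-sided estimate
        psd[k] *= 2
    freqs = [k * fs / nperseg for k in range(nbins)]
    return freqs, psd


def to_db_spl_per_hz(psd_volts, volts_per_pascal):
    """Convert a PSD in V^2/Hz to dB re (20 uPa)^2/Hz using the microphone sensitivity."""
    return [10 * math.log10(max(p / volts_per_pascal ** 2, 1e-30) / P_REF ** 2)
            for p in psd_volts]


# Synthetic check: a 5 kHz tone sampled at 40 kHz, the rate given in the question.
fs = 40000
signal = [0.1 * math.sin(2 * math.pi * 5000 * n / fs) for n in range(512)]
freqs, psd = welch_psd(signal, fs)
spl = to_db_spl_per_hz(psd, volts_per_pascal=0.05)  # 50 mV/Pa is a made-up sensitivity
peak_hz = freqs[max(range(len(spl)), key=spl.__getitem__)]
print(peak_hz)
```

Plotting spl against freqs gives the dB/Hz vs Hz graph. For real recordings a library routine will be far faster than this naive DFT.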

fundamental frequency of female voice

According to what I have read on the internet, the normal range of the fundamental frequency of a female voice is 165 to 255 Hz.
I am using Praat and a Python library called Parselmouth to get the fundamental frequency values of a female voice in an audio file (.wav). However, I got some values that are over 255 Hz (e.g. 400+ Hz, 500 Hz).
Is it normal to get big values like this?
It is possible, but unlikely, if you are trying to capture the fundamental frequency (F0) of a speaking voice. More likely you are capturing a harmonic that resonates more strongly, for example one boosted by a formant (F1 or F2), instead.
My experiments with Praat give me the impression that, with good parameters, it will reliably extract F0.
What you'll want to do is to verify that by comparing the pitch curve with a spectrogram. Here's an example of a fitting made by Praat (female speaker):
You can see from the image that
Most prominent frequency seems to be F2
Around 200 Hz seems likely to be F0, since there's only noise below that (compared to before/after the segment)
Praat has calculated a good estimate of F0 for the voiced speech segments
If, after a visual inspection, it seems that you are getting wrong results, you can try to tweak the parameters. Window length greatly affects the frequency resolution.
If you can't capture frequencies this low, you should try increasing the window length - the intuition is that it gives the algorithm a better chance at finding slowly changing periodic features in the data.
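To make the window-length intuition concrete, here is a minimal autocorrelation pitch estimator in plain Python. It is a simplified stand-in and not Praat's algorithm; the function name, the synthetic signal and the 75 to 600 Hz search range are assumptions for illustration. The test signal has a weak 200 Hz fundamental under a much stronger 400 Hz harmonic, and the estimator refuses windows that do not cover at least two periods of the lowest pitch it is asked to find.

```python
import math


def estimate_f0(frame, fs, floor_hz=75.0, ceiling_hz=600.0):
    """Return the F0 (Hz) whose lag maximizes the frame's autocorrelation."""
    min_lag = math.ceil(fs / ceiling_hz)   # shortest period considered
    max_lag = math.floor(fs / floor_hz)    # longest period considered
    if len(frame) < 2 * max_lag:
        raise ValueError("window too short for this pitch floor; lengthen it")
    best_lag = max(
        range(min_lag, max_lag + 1),
        key=lambda lag: sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)),
    )
    return fs / best_lag


fs = 8000
# 50 ms of a synthetic "voice": weak 200 Hz fundamental, strong 400 Hz harmonic.
frame = [0.3 * math.sin(2 * math.pi * 200 * n / fs) + math.sin(2 * math.pi * 400 * n / fs)
         for n in range(400)]
f0 = estimate_f0(frame, fs)
print(f0)
```

With a 12.5 ms window (100 samples) the same call raises an error, because a 75 Hz period no longer fits in the window twice.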

Explanation for Values in Scharr-Filter used in OpenCV (and other places)

The Scharr filter is explained in Scharr's dissertation. However, the values given on page 155 (167 in the PDF) are [47 162 47] / 256. Multiplying this with the derivative filter would yield a kernel with the coefficients 47 and 162 over a divisor of 512.
Yet all other references I found use the integer coefficients 3 and 10 instead,
which are roughly the same as the ones given by Scharr, scaled by a factor of 32.
Now my guess is that the range can be represented better, but I'm curious if there is an official explanation somewhere.
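The factor of roughly 32 can be checked with exact fractions. This is a sketch under two assumptions that are not stated in the question: the derivative filter is the central difference [-1, 0, 1] / 2, and the commonly used kernel is the 3x3 integer one built from 3 and 10 (the sign convention may differ between references).

```python
from fractions import Fraction

# Smoothing coefficients from the dissertation, as quoted in the question.
smooth_exact = [Fraction(47, 256), Fraction(162, 256), Fraction(47, 256)]
# Assumed derivative filter: the central difference [-1, 0, 1] / 2.
derivative = [Fraction(-1, 2), Fraction(0), Fraction(1, 2)]

# Outer product: smooth along one axis, differentiate along the other.
kernel_exact = [[s * d for d in derivative] for s in smooth_exact]
# Integer kernel found in most references.
kernel_common = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]

# Ratio between the two kernels for every non-zero coefficient.
ratios = sorted({float(c / e)
                 for row_c, row_e in zip(kernel_common, kernel_exact)
                 for c, e in zip(row_c, row_e) if e != 0})
print([[str(v) for v in row] for row in kernel_exact])
print(ratios)
```

The two ratios differ slightly (one for the 3s, one for the 10s), which is why the integer kernel is only approximately a scaled copy of the original.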
To get the ball rolling on this question in case no "expert" can be found...
I believe the values [3, 10, 3] are used instead of [47 162 47] / 256 simply for speed. Recall that this method is competing against the Sobel operator, whose coefficients are 0 and positive/negative 1s and 2s.
Even though the divisor, 256 or 512, is a power of 2 and the division can be performed by a shift, doing that and multiplying by 47 or 162 is going to take more time. A multiplication by 3, however, can be done on some RISC architectures, like the IBM POWER series, in a single shift-and-add operation: 3x = (x << 1) + x. (On these architectures, the shifter and adder are separate units and can operate independently.)
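The arithmetic behind that argument can be sketched as follows. Python integers say nothing about hardware cost, so this only demonstrates the identities; the helper names and the decomposition of 10x into two shifts are my own illustration, not something taken from OpenCV.

```python
def times3(x):
    """3 * x as one shift and one add."""
    return (x << 1) + x


def times10(x):
    """10 * x as 8x + 2x: two shifts and one add."""
    return (x << 3) + (x << 1)


def smooth_int(left, center, right):
    """[3, 10, 3] / 16 smoothing using only shifts and adds."""
    return (times3(left) + times10(center) + times3(right)) >> 4


def smooth_exact(left, center, right):
    """[47, 162, 47] / 256 smoothing: needs real multiplications before the shift."""
    return (47 * left + 162 * center + 47 * right) >> 8


print(times3(7), smooth_int(10, 20, 30), smooth_exact(10, 20, 30))
```

On this sample row both versions give the same result after normalization, which illustrates how little precision the integer approximation gives up.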
I don't find it surprising that the PhD thesis used the more complicated and probably more precise formula; it needed to prove or demonstrate something, and the author probably wasn't certain, or concerned, that it would be used and implemented alongside other methods. The purpose in the thesis was probably to have "perfect rotational symmetry". Afterwards, I suspect, whoever decided to implement it used the approximation and gave up a little of that rotational symmetry to gain speed. That person's goal, as I said, was to have something competitive in speed, at the expense of a little bit of rotational symmetry.
Since I'm guessing you are willing to do the work on this, as it is your thesis, my suggestion is to implement the original algorithm and benchmark it against both the OpenCV Scharr and Sobel code.
The other thing to try to get an "official" answer is: "Use the 'source', Luke!". The code is on github so check it out and see who added the Scharr filter there and contact that person. I won't put the person's name here, but I will say that the code was added 2010-05-11.

Which of the parameters in LibSVM is the slack variable?

I am a bit confused about the naming in SVMs. I am using the LibSVM library. There are so many parameters that can be set. Does anyone know which of these is the slack variable?
thx
The "slack variable" is C in C-SVM and nu in nu-SVM. (Strictly speaking, the slack variables themselves are solved for during training; C and nu are the parameters that control how much slack is tolerated.) These both serve the same function in their respective formulations: controlling the tradeoff between a wide margin and classifier error. In the case of C, one generally tests it in orders of magnitude, say 10^-4, 10^-3, 10^-2, ... up to 1, 5 or so. nu is a number between 0 and 1, generally from .1 to .8, which controls the ratio of support vectors to data points. When nu is .1, the margin is small and the number of support vectors will be a small percentage of the number of data points. When nu is .8, the margin is very large and most of the points will fall in the margin.
The other things to consider are your choice of kernel (linear, RBF, sigmoid, polynomial) and the parameters for the chosen kernel. Generally one has to do a lot of experimenting to find the best combination of parameters. However, be careful of over-fitting to your dataset.
Burges wrote a great tutorial: A Tutorial on Support Vector Machines for Pattern Recognition
But if you mostly just want to know how to USE it and less about how it works, read "A Practical Guide to Support Vector Classification" by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin (authors of libsvm).
First decide which type of SVM you intend to use: C-SVC, nu-SVC, epsilon-SVR or nu-SVR. In my opinion you need to vary C and gamma most of the time; the rest are usually fixed.
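As a rough illustration of how those names map onto LibSVM's command-line options, the sketch below builds svm-train command lines for a small grid: -s selects the SVM type (0 for C-SVC, 1 for nu-SVC), -t 2 the RBF kernel, -c the cost C, -n nu, -g gamma and -v the number of cross-validation folds. The C and nu ranges follow the answer above; the gamma grid and the file name train.scale are placeholders I made up.

```python
from itertools import product

c_values = [10.0 ** e for e in range(-4, 1)]          # 1e-4 ... 1, in orders of magnitude
nu_values = [round(0.1 * k, 1) for k in range(1, 9)]  # 0.1 ... 0.8
gamma_values = [10.0 ** e for e in range(-3, 1)]      # illustrative RBF gamma grid


def c_svc_commands(train_file, folds=5):
    """svm-train command lines for a C-SVC grid over C and gamma."""
    return [f"svm-train -s 0 -t 2 -c {c:g} -g {g:g} -v {folds} {train_file}"
            for c, g in product(c_values, gamma_values)]


def nu_svc_commands(train_file, folds=5):
    """svm-train command lines for a nu-SVC grid over nu and gamma."""
    return [f"svm-train -s 1 -t 2 -n {nu:g} -g {g:g} -v {folds} {train_file}"
            for nu, g in product(nu_values, gamma_values)]


commands = c_svc_commands("train.scale")  # hypothetical file name
print(len(commands), len(nu_svc_commands("train.scale")))
print(commands[0])
```

Each command reports a cross-validation accuracy; picking the best one on a single dataset is exactly where the over-fitting warning above applies.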

Software Phase Locked Loop example code needed

Does anyone know where I can find actual code examples of software phase-locked loops (SPLLs)?
I need an SPLL that can track a PSK-modulated signal that is somewhere between 1.1 kHz and 1.3 kHz. A Google search brings up plenty of academic papers and patents but nothing usable. Even a trip to the university library, which has a shelf full of books on hardware PLLs, turned up only a single chapter in one book on SPLLs, and that was more theoretical than practical.
Thanks for your time.
Ian
I suppose this is probably too late to help you (what did you end up doing?) but it may help the next guy.
Here's a golfed example of a software phase-locked loop I just wrote in one line of C, which will sing along with you:
main(a,b){for(;;)a+=((b+=16+a/1024)&256?1:-1)*getchar()-a/512,putchar(b);}
I present this tiny golfed version first in order to convince you that software phase-locked loops are actually fairly simple, as software goes, although they can be tricky.
If you feed it 8-bit linear samples on stdin, it will produce 8-bit samples of a sawtooth wave attempting to track one octave higher on stdout. At 8000 samples per second, it tracks frequencies in the neighborhood of 250Hz, just above B below middle C. On Linux you can do this by typing arecord | ./pll | aplay. The low 9 bits of b are the oscillator (what might be a VCO in a hardware implementation), which generates a square wave (the 1 or -1) which gets multiplied by the input waveform (getchar()) to produce the output of the phase detector. That output is then low-pass filtered into a to produce the smoothed phase error signal which is used to adjust the oscillation frequency of b to push a toward 0. The natural frequency of the square wave, when a == 0, is for b to increment by 16 every sample, which increments it by 512 (a full cycle) every 32 samples. 32 samples at 8000 samples per second are 1/250 of a second, which is why the natural frequency is 250Hz.
Then putchar() takes the low 8 bits of b, which make up a sawtooth wave at 500Hz or so, and spews them out as the output audio stream.
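For readers who find the golfed C hard to follow, here is a line-by-line transliteration into Python. It is my reading of the one-liner, not the author's readable version: it starts both state variables at zero (which the golfed C does not guarantee) and stops at end of input, and the helper c_div is added only because Python's // rounds toward negative infinity while C's / truncates toward zero.

```python
def c_div(n, d):
    """Integer division truncating toward zero, as C's / does (d must be positive)."""
    q = abs(n) // d
    return q if n >= 0 else -q


def pll(samples):
    """Yield one output byte per unsigned 8-bit input sample."""
    a = 0  # low-pass-filtered phase error
    b = 0  # oscillator phase; its low 9 bits are one full cycle
    for sample in samples:
        b += 16 + c_div(a, 1024)               # natural step of 16, nudged by the error
        square = 1 if b & 256 else -1          # square wave taken from bit 8 of the phase
        a += square * sample - c_div(a, 512)   # phase detector plus leaky integrator
        yield b & 0xFF                         # sawtooth, one octave above the square wave


# With all-zero input the error stays at zero and the oscillator free-runs.
out = list(pll([0] * 32))
print(out[:18])
```

The free-running output repeats every 16 samples, which is the 500 Hz sawtooth at 8000 samples per second; the square wave derived from bit 8 repeats every 32 samples, the 250 Hz natural frequency.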
There are several things missing from this simple example:
It has no good way to detect lock. If you have silence, noise, or a strong pure 250Hz input tone, a will be roughly zero and b will be oscillating at its default frequency. Depending on your application, you might want to know whether you've found a signal or not! Camenzind's suggestion in chapter 12 of Designing Analog Chips is to feed a second "phase detector" 90° out of phase from the real phase detector; its smoothed output gives you the amplitude of the signal you've theoretically locked onto.
The natural frequency of the oscillator is fixed and does not sweep. The capture range of a PLL, the interval of frequencies within which it will notice an oscillation if it's not currently locked onto one, is pretty narrow; its lock range, over which it will range in order to follow the signal once it's locked on, is much larger. Because of this, it's common to sweep the PLL's frequency all over the range where you expect to find a signal until you get a lock, and then stop sweeping.
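The quadrature lock detector from the first item can be sketched in isolation. To keep the behavior easy to check, this version uses a free-running oscillator rather than a closed loop and feeds it a signed square wave constructed to sit where a locked input would: a quarter cycle away from the phase detector's reference. The function names, the signed input and the amplitude of 100 are assumptions for illustration; in a real loop the sign of the lock level depends on which way the loop locks.

```python
def c_div(n, d):
    """Integer division truncating toward zero, as C's / does (d must be positive)."""
    q = abs(n) // d
    return q if n >= 0 else -q


def detector_outputs(samples, step=16):
    """Run a free-running oscillator; return (phase_error, lock_level)."""
    b = a = q = 0
    for sample in samples:
        b += step
        ref = 1 if b & 256 else -1             # reference used by the phase detector
        ref_q = 1 if (b + 128) & 256 else -1   # same oscillator, a quarter cycle ahead
        a += ref * sample - c_div(a, 512)      # stays small when the input is in quadrature
        q += ref_q * sample - c_div(q, 512)    # settles near 512 times the input amplitude
    return a, q


def locked_square_wave(count, amplitude=100, step=16):
    """Signed test input aligned with the quadrature reference (the locked condition)."""
    return [amplitude if (step * n + 128) & 256 else -amplitude
            for n in range(1, count + 1)]


a, q = detector_outputs(locked_square_wave(8000))
a0, q0 = detector_outputs([0] * 8000)
print(abs(a) < 1000, q // 512, q0)
```

Comparing q // 512 against a threshold gives a simple "signal present" flag, while silence leaves it at zero.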
The golfed version above is reduced from a much more readable example of a software phase-locked loop in C that I wrote today, which does do lock detection but does not sweep. It needs about 100 CPU cycles per input sample per PLL on the Atom CPU in my netbook.
I think that if I were in your situation, I would do the following (aside from obvious things like looking for someone who knows more about signal processing than I do, and generating test data). I probably wouldn't filter and downconvert the signal in a front end, since it's at such a low frequency already. Downconverting to a 200Hz-400Hz band hardly seems necessary. I suspect that PSK will bring up some new problems, since if the signal suddenly shifts phase by 90° or more, you lose the phase lock; but I suspect those problems will be easy to resolve, and it's hardly untrodden territory.
This is an interactive design package for designing digital (i.e. software) phase locked loops (PLLs). Fill in the form and press the "Submit" button, and a PLL will be designed for you.
Interactive Digital Phase Locked Loop Design
This will get you started, but you really need to understand the fundamentals of PLL design well enough to build it yourself in order to troubleshoot it later. This is the realm of digital signal processing, and while it is not black magic, it will certainly give you a run for your money during debugging.
-Adam
Have Matlab with Simulink? There are PLL demo files available at Matlab Central here. Matlab's code generation capabilities might get you from there to a PLL written in C.
