The false positive rate is the x-axis, spanning from 0 to 1. The true positive rate is the y-axis, also spanning from 0 to 1. The graphs show data points like (0.8, 0.8). But if the TPR is 0.8 and the FPR is 0.8, they add up to 1.6...
Typically the axes are normalised using the total number of negatives (for the FP axis) and positives (for the TP axis) in the test/validation set, i.e. the largest possible number of FPs and TPs. Otherwise the end of the curve wouldn't be at (1, 1). Because the two rates are divided by different totals, they are independent fractions and need not add up to 1. I personally prefer to label the axes by the number of instances.
Why not normalise by the total number? In real applications it gets rather complicated, as you often do not have labels for all examples. The typical example for ROC curves is mass mailing: to normalise the curve correctly you would need to spam the entire world.
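A minimal Python sketch of how ROC points are built, assuming binary labels (1 = positive) and one classifier score per example; `roc_points` and the toy scores are hypothetical, made up for illustration. It shows that FPR is divided by the number of negatives and TPR by the number of positives, so a point like (0.8, 0.8) is perfectly valid. Passing `normalise=False` returns raw counts instead, i.e. axes labelled by number of instances.

```python
def roc_points(scores, labels, normalise=True):
    """Return (FP, TP) points for every threshold, from strictest to loosest.

    Ties in scores are not merged; each example moves the threshold by one.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Rank by descending score: the top-k examples are "predicted positive".
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)] if normalise else [(0, 0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        if normalise:
            # FPR is divided by the negatives, TPR by the positives.
            points.append((fp / n_neg, tp / n_pos))
        else:
            points.append((fp, tp))
    return points


# Toy data (made up): 5 positives and 5 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]

rates = roc_points(scores, labels)
counts = roc_points(scores, labels, normalise=False)
print(rates[8])    # (0.8, 0.8): 4 of 5 negatives and 4 of 5 positives pass
print(counts[-1])  # (5, 5): with raw counts the curve ends at (n_neg, n_pos)
```

With raw counts the curve ends at (number of negatives, number of positives) rather than (1, 1), which is exactly why the normalised version needs every example labelled.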
I'm trying to measure the frequency of a simulated digital signal following the thread "SOLVED: measure frequency of digital random signal". I don't need an averaging window because my signal will be consistent, not random.
CODE:
This is my code, which is identical to the code from the solution, minus the averaging window:
QUESTION:
My simulated signal is a 0-5 VDC square wave with a frequency of 100 Hz and a sampling rate of 1000 Hz. Why does it report a frequency of 125 Hz? (NOTE: Regardless of what value I enter for the frequency, I measure a value higher than the true one.)
It was a stupid logic error on my end. I forgot to add dt to Period one last time before enqueuing onto the Periods queue.
Periods holds the duration of each recorded half period of the signal (hence the division by two at the end, before outputting to the graph). In this example, the signal frequency was 100 Hz (period 0.01 s), so every half cycle is 0.005 s long. By not adding the last sampling period before enqueuing, I was saying every half cycle was 0.004 s. This is why the frequency was 125 Hz (0.5 cycles / 0.004 s = 125 Hz).
Therefore, since we should add dt to Period regardless of whether a zero-crossing has occurred, we fix the code by moving the addition from the False case of the select statement to before the select statement, and simply pass the queue and sum through the False case (see below).
[Figures: the corrected code, showing the True case and the False case of the select statement]
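The original code is a LabVIEW diagram, so here is a rough Python sketch of the same accumulate-and-enqueue logic, offered only as an illustration. `estimate_frequency`, the `fixed` flag and the plain list standing in for the Periods queue are hypothetical names. On a 100 Hz, 0-5 V square wave sampled at 1000 Hz it reproduces both numbers from the question: about 125 Hz when dt is added only in the False case, about 100 Hz when dt is added before the transition check.

```python
def estimate_frequency(samples, dt, fixed=True):
    """Estimate the frequency of a two-level signal from its transitions.

    `period` accumulates elapsed time; at each transition the accumulated
    half period is appended to `half_periods` and the accumulator is reset.
    fixed=True : dt is added on every sample, before the transition check.
    fixed=False: dt is added only when no transition occurs (the bug).
    """
    half_periods = []  # stands in for the Periods queue
    period = 0.0
    previous = samples[0]
    for sample in samples[1:]:
        transition = sample != previous
        if fixed:
            period += dt
        if transition:
            half_periods.append(period)
            period = 0.0
        elif not fixed:
            period += dt
        previous = sample
    mean_half_period = sum(half_periods) / len(half_periods)
    # Each stored value is half a cycle, hence the factor of two.
    return 1.0 / (2.0 * mean_half_period)


fs = 1000.0                                   # sampling rate in Hz
dt = 1.0 / fs                                 # sampling period, 1 ms
f_signal = 100.0                              # true square-wave frequency
samples_per_half = int(fs / (2 * f_signal))   # 5 samples per half cycle
# One second of a 0-5 V square wave.
samples = [5.0 if (n // samples_per_half) % 2 == 0 else 0.0 for n in range(1000)]

print(estimate_frequency(samples, dt, fixed=False))  # about 125 Hz (the bug)
print(estimate_frequency(samples, dt))               # about 100 Hz (the fix)
```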
NOTE:
One could use the Express >> Signal Analysis >> Timing and Transitions VI to measure frequency, but for some reason it wasn't working for me when I started out, and more importantly I wanted to learn how to program this myself.
I have input data that looks like:
col1        col2        col3  col4  col5  col6
-0.1144887  -0.1717161  3847  3350  2823  2243
0.3534122   0.53008300  4230  3520  2421  3771
...
So columns 1 and 2 range from -1 to 1, and columns 3-6 range from 2000 to 5000.
The output data ranges from 5.0 to 10.0. I expect to predict a single real-valued output for each input vector and am using a linear regression dense neural network with an 'mse' loss function.
I'm thinking I should scale columns 3-6 to between 0 and 1 and leave columns 1 and 2 as is. Is that correct or should I also scale columns 1 and 2 to be between 0 and 1? If I scale the input, does that affect my predicted output value or does it only speed up the learning? Is there any need to scale the output?
You should scale all the features to the same range. The standard way is to center each feature on its mean and divide by its standard deviation (the square root of the variance):
1) compute the mean and the standard deviation of each feature using the training set only (e.g. col1_av = average(col1_train), col1_std = std(col1_train), col2_av = average(col2_train), ...)
2) from each feature subtract the corresponding mean and divide by the corresponding standard deviation (e.g. [x1 = -0.1144887, x2 = 0.3534122, ...] -> (x1 - col1_av)/col1_std). Samples in the test set must be scaled using the values estimated on the training set.
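The two steps above can be sketched in plain Python as follows; `fit_scaler` and `transform` are hypothetical helper names, and using the population standard deviation (`statistics.pstdev`) is a choice, not a requirement. The training rows are the two rows from the question; the test row is made up. In practice a library scaler such as scikit-learn's `StandardScaler` does the same job.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Step 1: per-column mean and standard deviation, training set only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Step 2: subtract the mean and divide by the standard deviation."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# The two rows shown in the question, used here as a (tiny) training set.
train = [
    [-0.1144887, -0.1717161, 3847, 3350, 2823, 2243],
    [0.3534122, 0.53008300, 4230, 3520, 2421, 3771],
]
# A made-up test row: scaled with the training statistics, never refitted.
test = [[0.0, 0.0, 3000, 3000, 3000, 3000]]

means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

After scaling, all six columns have zero mean and unit standard deviation on the training set, whatever their original range.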
Having features so different in magnitude will affect not only the learning process but also the output, since features with larger magnitude will carry more weight in the model.
In general there is no need to scale the output.
An interesting read: https://medium.com/greyatom/why-how-and-when-to-scale-your-features-4b30ab09db5e
I was going through this book to understand wavelets. It's a beautifully written, not very technical document.
web.iitd.ac.in/~sumeet/WaveletTutorial.pdf
But in its very first chapter it describes the figure below with this explanation:
The frequency is measured in cycles/second, or with a more common
name, in "Hertz". For example the electric power we use in our daily
life in the US is 60 Hz (50 Hz elsewhere in the world). This means
that if you try to plot the electric current, it will be a sine wave
passing through the same point 50 times in 1 second. Now, look at the
following figures. The first one is a sine wave at 3 Hz, the second
one at 10 Hz, and the third one at 50 Hz. Compare them
But I am unable to understand what the X and Y axis values represent. The X values range between -1 and 1, so I am assuming they are the values of the signal, while the Y axis represents time in milliseconds (1000 ms = 1 s). But then the document goes on to show the representation of the same signal in the frequency-amplitude domain:
So how do we measure frequency, or how do we find the frequency
content of a signal? The answer is FOURIER TRANSFORM (FT). If the FT
of a signal in time domain is taken, the frequency-amplitude
representation of that signal is obtained. In other words, we now have
a plot with one axis being the frequency and the other being the
amplitude. This plot tells us how much of each frequency exists in our
signal.
But I am not able to understand what the X and Y axis values in the upper graph represent. Shouldn't they be frequency (X axis) and amplitude (Y axis)? If I am correct, why does the Y axis have values like 0, 200 and 400? Shouldn't it be in the range [-1, 1], or rather [0, 1]?
For the time domain signals the X axis is time and the Y axis is amplitude.
For the frequency domain equivalents the X axis is frequency and the Y axis is magnitude.
Note that when using most FFTs there is a scaling factor of N, where N is the number of points, so the magnitude values in the frequency domain plots are much greater than the amplitude of the original time domain signal.
As Paul R wrote above, in the first image the horizontal X-axis represents time, in units of ms.
The time interval is 1000ms long.
The vertical Y-axis represents the amplitude of the signal. However, its unit is not volts; the signal is normalized to amplitude 1.
If you perform a Fourier Transformation on that time signal, you will get a frequency spectrum.
If you use a DFT (Discrete Fourier Transformation) or an FFT (Fast Fourier Transformation), the result depends on the implementation of the algorithm.
a) If the algorithm delivers a normalized result, the amplitude of your frequency line is 0.5 (if the amplitude of your input signal is 1).
b) If the algorithm delivers a non-normalized result, the amplitude of your frequency line is half the number of DFT/FFT input values.
Your frequency line has the value of 500, which means the algorithm does not use normalization and the number of input samples was 1000.
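To check cases a) and b) numerically, here is a Python sketch that evaluates a naive, direct DFT only at the bins of interest (slow in general, but it needs no library). `dft_bin` is a hypothetical helper, and "normalized" is taken here to mean dividing by N, as in case a). A 50 Hz sine of amplitude 1, sampled 1000 times over 1 s, gives about 500 in bin 50 without normalization and about 0.5 after dividing by N; bin 950 carries the same magnitude, which is the second peak in the upper diagram.

```python
import cmath
import math


def dft_bin(x, k):
    """Unnormalized DFT coefficient X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples) for n, v in enumerate(x))


fs = 1000.0        # sampling rate in Hz, i.e. dt = 1 ms
n_samples = 1000   # 1000 samples over T = 1 s
# 50 Hz sine with amplitude 1, as in the upper figure.
x = [math.sin(2 * math.pi * 50 * n / fs) for n in range(n_samples)]

peak_raw = abs(dft_bin(x, 50))          # non-normalized: about N/2 = 500
peak_normalized = peak_raw / n_samples  # divided by N: about 0.5
mirror_raw = abs(dft_bin(x, 950))       # the second peak, also about 500
```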
Now, what does the horizontal X-axis represent in the frequency domain?
In the time domain, the length of your input time interval is T = 1000ms = 1s.
Therefore the distance between the frequency lines in the frequency domain is df = 1/T = 1/(1s) = 1Hz.
As we know from the amplitude in the frequency domain, the input signal in the time domain had 1000 samples. This means the sampling time was dt = T/1000 = 1s/1000 = 1ms.
Therefore the total width of the frequency interval F = (fmin, ..., fmax) in the frequency domain is 1/dt = 1/(1ms) = 1kHz.
However, the range does NOT start at fmin = 0 Hz and end at 1kHz, as one could assume from the upper diagram in the second image. The spectrum calculated by a DFT/FFT contains a positive and a negative frequency range. This means you get the frequency range (-500Hz, -499Hz, -498Hz, ..., -1Hz, 0Hz, 1Hz, 2Hz, ..., 498Hz, 499Hz). The value 500Hz does not exist!
However, the spectrum is not output in this order: it is shifted (rotated) by 500Hz (F/2), because bin k of an N-point DFT with k >= N/2 represents the negative frequency (k - N)·df. This means the spectrum starts with the DC value:
0Hz, 1Hz, 2Hz, ..., 498Hz, 499Hz, -500Hz, -499Hz, -498Hz, ..., -2Hz, -1Hz.
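The frequency axis described above can be reproduced with a short Python sketch; `bin_frequencies` is a hypothetical helper whose ordering matches the convention of NumPy's `numpy.fft.fftfreq`. With fs = 1000 Hz and N = 1000 samples (T = 1 s), df = 1 Hz, bin 950 turns out to be -50 Hz and +500 Hz never appears.

```python
def bin_frequencies(n_samples, fs):
    """Frequency in Hz of each DFT output bin, in the order the DFT returns them.

    Bins k < N/2 are k*df; bins k >= N/2 are the negative frequencies (k - N)*df.
    """
    df = fs / n_samples  # frequency resolution, equal to 1 / T
    return [(k if k < n_samples // 2 else k - n_samples) * df for k in range(n_samples)]


fs = 1000.0        # sampling rate: dt = 1/fs = 1 ms, total width F = 1 kHz
n_samples = 1000   # T = n_samples / fs = 1 s, so df = 1 Hz

freqs = bin_frequencies(n_samples, fs)
print(freqs[:3], freqs[498:502], freqs[-2:])
print(freqs[950])  # the "950 Hz" peak is really at -50 Hz
```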
Because the spectrum of a real input function is Hermitian, Y(-f) = Y(f)*, the positive band carries the complete information, so you can cut off the negative side band.
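The Hermitian property can be checked numerically with a small Python sketch; the naive `dft` helper and the random 16-sample input are assumptions made only for this demonstration. For any real input, bin N-k (frequency -k·df) is the complex conjugate of bin k, so bins 0..N/2 already hold everything.

```python
import cmath
import math
import random


def dft(x):
    """Naive unnormalized DFT, O(N^2); fine for a short demonstration."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(x))
        for k in range(n)
    ]


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(16)]  # arbitrary real input
X = dft(x)
n = len(X)

# For real input, bin N-k (frequency -k*df) is the conjugate of bin k.
max_mismatch = max(abs(X[n - k] - X[k].conjugate()) for k in range(1, n))
# So the non-negative half (bins 0..N/2) is enough to describe the signal.
one_sided = X[: n // 2 + 1]
```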
The upper diagram in the second image shows two peaks. The first peak appears at f = 50Hz and the second at f = 950Hz. However, this is not correct: the labels of the horizontal axis are wrong, and the second peak actually lies at f = -50Hz.
In the lower diagram the frequency range ends at 500Hz (499Hz would be correct). The range of negative frequencies is cut off.
I'm a newbie in cepstrum analysis, so here is my question.
I have a signal of length 4096 with a sample rate of 8000 Hz. I compute the FFT and get an array of length 4096*2 (position 2*i holds the cosine coefficient, position 2*i+1 the sine coefficient). The frequency step is sampleRate/signalLength = 8000/4096 = 1.953125 Hz, so I can calculate the frequency at position i as i*sampleRate/signalLength.
Then I compute the cepstrum. I can't work out how to find the quefrency step, or how to find the frequency for a given quefrency.
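The bin number of an FFT result is inversely proportional to the length of the period of a sinusoidal component in the time domain. The bin number of a quefrency result is likewise inversely proportional to the distance between partials in a series of overtones in the frequency domain (this distance is often the same as the root or fundamental pitch). Thus the quefrency bin number is proportional to the period, or repeat lag (autocorrelation peak), of a harmonically rich periodic signal in the time domain. Concretely, if the cepstrum is computed with an inverse FFT of the same length as the forward FFT, quefrency bin q corresponds to a lag of q/sampleRate seconds (so the quefrency step is 1/sampleRate, 0.125 ms at 8000 Hz), and a peak at bin q indicates a fundamental of sampleRate/q Hz.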
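A sketch in pure Python of reading a pitch off the cepstrum, under several assumptions: the real cepstrum is taken as the inverse FFT of the log magnitude spectrum (one common definition), a small `eps` avoids log(0), and the peak search is limited to a plausible pitch range (`f_min`, `f_max`). All function names are hypothetical. The test signal is synthetic: a 250 Hz fundamental with 10 harmonics at 8000 Hz, using 1024 samples instead of 4096 to keep pure Python fast. Its period is 32 samples, so the peak should land at quefrency bin 32, i.e. 32/8000 s = 4 ms, i.e. 8000/32 = 250 Hz.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def cepstral_pitch(x, sample_rate, f_min=150.0, f_max=400.0, eps=1e-6):
    """Return (quefrency bin, lag in seconds, pitch in Hz) of the cepstral peak."""
    log_mag = [math.log(abs(v) + eps) for v in fft(x)]
    cepstrum = [v.real for v in ifft(log_mag)]
    # Quefrency bin q is a lag of q / sample_rate seconds (step = 1 / sample_rate),
    # so a peak at q corresponds to a fundamental of sample_rate / q Hz.
    q_lo = int(sample_rate / f_max)
    q_hi = int(sample_rate / f_min)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return q_peak, q_peak / sample_rate, sample_rate / q_peak


sample_rate = 8000.0
n_samples = 1024  # power of two, as required by this simple FFT
f0 = 250.0        # period of exactly 32 samples
x = [
    sum(math.sin(2 * math.pi * h * f0 * n / sample_rate) for h in range(1, 11))
    for n in range(n_samples)
]
q, lag, pitch = cepstral_pitch(x, sample_rate)
```

Restricting the search range matters: the cepstrum of a harmonic signal also peaks at multiples of the period (bins 64, 96, ...), so an unrestricted argmax could report a sub-multiple of the true pitch.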