How does x265 calculate bitrate? - video-encoding

I'm using x265 to encode a video. A get the bitrate summary and bitrate for each group of encoding frames.
When I calculate the average bitrate of the groups it does not converge with the summary bitrate.
Any ideas about how x265 calculates summary bitrate?

I found the solution.
It's the division of the average of the given Bits values per GOP with (1000 (for kb) multiply by the time length for the GOP).
For example GOP = 96, FPS = 30Hz, thus time length = 96/30 = 3.2
So, summary bitrate (per GOP) = (average bits)/(1000*3.2) valued in kbps

Related

How to calculate 512 point FFT using 2048 point FFT hardware module

I have a 2048 point FFT IP. How may I use it to calculate 512 point FFT ?
There are different ways to accomplish this, but the simplest is to replicate the input data 4 times, to obtain a signal of 2048 samples. Note that the DFT (which is what the FFT computes) can be seen as assuming the input signal being replicated infinitely. Thus, we are just providing a larger "view" of this infinitely long periodic signal.
The resulting FFT will have 512 non-zero values, with zeros in between. Each of the non-zero values will also be four times as large as the 512-point FFT would have produced, because there are four times as many input samples (that is, if the normalization is as commonly applied, with no normalization in the forward transform and 1/N normalization in the inverse transform).
Here is a proof of principle in MATLAB:
data = randn(1,512);
ft = fft(data); % 512-point FFT
data = repmat(data,1,4);
ft2 = fft(data); % 2048-point FFT
ft2 = ft2(1:4:end) / 4; % 512-point FFT
assert(all(ft2==ft))
(Very surprising that the values were exactly equal, no differences due to numerical precision appeared in this case!)
An alternate solution from the correct solution provided by Cris Luengo which does not require any rescaling is to pad the data with zeros to the required length of 2048 samples. You then get your result by reading every 2048/512 = 4 outputs (i.e. output[0], output[3], ... in a 0-based indexing system).
Since you mention making use of a hardware module, this could be implemented in hardware by connecting the first 512 input pins and grounding all other inputs, and reading every 4th output pin (ignoring all other output pins).
Note that this works because the FFT of the zero-padded signal is an interpolation in the frequency-domain of the original signal's FFT. In this case you do not need the interpolated values, so you can just ignore them. Here's an example computing a 4-point FFT using a 16-point module (I've reduced the size of the FFT for brievety, but kept the same ratio of 4 between the two):
x = [1,2,3,4]
fft(x)
ans> 10.+0.j,
-2.+2.j,
-2.+0.j,
-2.-2.j
x = [1,2,3,4,0,0,0,0,0,0,0,0,0,0,0,0]
fft(x)
ans> 10.+0.j, 6.499-6.582j, -0.414-7.242j, -4.051-2.438j,
-2.+2.j, 1.808+1.804j, 2.414-1.242j, -0.257-2.3395j,
-2.+0.j, -0.257+2.339j, 2.414+1.2426j, 1.808-1.8042j,
-2.-2.j, -4.051+2.438j, -0.414+7.2426j, 6.499+6.5822j
As you can see in the second output, the first column (which correspond to output 0, 3, 7 and 11) is identical to the desired output from the first, smaller-sized FFT.

AurioTouch2 sound frequency

Is there a way to get the frequency value in Apple's example project AurioTouch2? I've been looking all over for an answer to this and couldn't find one. Thanks
Maybe it helps
1.
The try-it-out way
FFTs use frequency bins and the bin frequency width is based on the FFT parameters. To find a frequency you will need to record it sampled at a rate at least twice the highest frequency present in the sample. Then find the time between the cycles. If it is not a pure frequency this will of course be harder.
2.
I heard of the frequency of each bin is
yFract * hwSampleRate * 1/2
yFract is the half of fftLength and the last bin of the FFT corresponds with half of the sampling rate. Using it like this way
NSLog(#"The magnitude for %f Hz is %f.", (yFract * hwSampleRate * .5), (interpVal * 120));

What actually does the size of FFT mean

While using FFT sample code from Apple documentation, what actually does the N, log2n, n and nOver2 mean?
Does N refer to the window size of the fft or the whole number of samples in a given audio, and
how do I calculate N from an audio file?
how are they related to the audio sampling rate i.e. 44.1kHz?
What would be the FFT frame size in this code?
Code:
/* Set the size of FFT. */
log2n = N;
n = 1 << log2n;
stride = 1;
nOver2 = n / 2;
printf("1D real FFT of length log2 ( %d ) = %d\n\n", n, log2n);
/* Allocate memory for the input operands and check its availability,
* use the vector version to get 16-byte alignment. */
A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));
originalReal = (float *) malloc(n * sizeof(float));
obtainedReal = (float *) malloc(n * sizeof(float));
N or n typically refers to the number of elements. log2n is the base-two logarithm of n. (The base-two logarithm of 32 is 5.) nOver2 is n/2, n divided by two.
In the context of an FFT, n is the number of samples being fed into the FFT.
n is usually determined by a variety of constraints. You want more samples to provide a better quality result, but you do not want so many samples that processing takes up a lot of computer time or that the result is not available until so late that the user notices a lag. Usually, it is not the length of an audio file that determines the size. Rather, you design a “window” that you will use for processing, then you read samples from the audio file into a buffer big enough to hold your window, then you process the buffer, then you repeat with more samples from the file. Repetitions continue until the entire file is processed.
A higher audio sampling rate means there will be more samples in a given period of time. E.g., if you want to keep your window under 1/30th of a second, then a 44.1 kHz sampling rate will have less than 44.1•1000/30 = 1470 samples. A higher sampling rate means you have more work to do, so you may need to adjust your window size to keep the processing within limits.
That code uses N for log2n, which is unfortunate, since it may confuse people. Otherwise, the code is as I described above, and the FFT frame size is n.
There can be some confusion about FFT size or length when a mix of real data and complex data is involved. Typically, for a real-to-complex FFT, the number of real elements is said to be the length. When doing a complex-to-complex FFT, the number of complex elements is the length.
'N' is the number of samples, i.e., your vector size. Corresponding, 'log2N' is the logarithm of 'N' with the base 2, and 'nOver2' is the half of 'N'.
To answer the other questions, one must know, what do you want to do with FFT. This document, even it is written with a specific system in mind, can serve as an survey about the relation and the meaning of the parameters in (D)FFT.

EEG Wavelet Analysis

I want to do a time-frequency analysis of an EEG signal. I found the GSL wavelet function for computing wavelet coefficients. How can I extract actual frequency bands (e.g. 8 - 12 Hz) from that coefficients? The GSL manual says:
For the forward transform, the elements of the original array are replaced by the discrete wavelet transform f_i -> w_{j,k} in a packed triangular storage layout, where J is the index of the level j = 0 ... J-1 and K is the index of the coefficient within each level, k = 0 ... (2^j)-1. The total number of levels is J = \log_2(n).
The output data has the following form, (s_{-1,0}, d_{0,0}, d_{1,0}, d_{1,1}, d_{2,0}, ..., d_{j,k}, ..., d_{J-1,2^{J-1}-1})
If I understand that right an output array data[] contains at position 1 (e.g. data[1]) the amplitude of the frequency band 2^0 = 1 Hz, and
data[2] = 2^1 Hz
data[3] = 2^1 Hz
data[4] = 2^2 Hz
until
data[7] = 2^2 Hz
data[8] = 2^3 Hz
and so on ...
That means I have only the amplitudes for the frequencies 1 Hz, 2 Hz, 4 Hz, 8 Hz, 16 Hz, ... How can I get e.g. the amplitude of a frequency component oscillating at 5.3 Hz? How can I get the amplitude of a whole frequency range, e.g. the amplitude of 8 - 13 Hz? Any recommendations how to get a good time-frequency distribution?
I'm not sure how familiar you are with general signal processing, so I'll try to be clear, but not chew the food for you.
Wavelets are essentially filter banks. Each filter splits a given signal into two non-overlapping independent high frequency and low frequency subbands such that it can then be reconstructed by the means of an inverse transform. When such filters are applied continually, you get a tree of filters with output of one fed into the next. The simplest, and the most intuitive way to build such tree is as follows:
Decompose a signal into low frequency (approximation) and high frequency (detail) components
Take the low frequency component, and perform the same processing on that
Keep going until you've processed the required number of levels
The reason for this is that you can then downsample the resulting approximation signal. For example, if your filter splits a signal with sampling frequency (Fs) 48000 Hz -- which yields maximum frequency of 24000 Hz by Nyquist Theorem -- into 0 to 12000 Hz approximation component and 12001 to 24000 Hz detail component, you can then take every second sample of the approximation component without aliasing, essentially decimating the signal. This is widely used in signal and image compression.
According to this description, at level one you split your frequency content down the middle and create two separate signals. Then you take your lower frequency component and split it down the middle again. You now get three components in total: 0 to 6000 Hz, 6001 to 12000 Hz, and 12001 to 24000 Hz. You see that the two newer components are both half the bandwidth of the first detail component. You get this sort of a picture:
This correlates with the bandwidths you describe above (2^1 Hz, 2^2 Hz, 2^3 Hz and so on). However, using a broader definition of a filter bank, we can arrange the above tree structure as we like, and it will still remain a filter bank. For example, we can feed both approximation and detail component to to split into two high-frequency and low-frequency signals like so
If you look at it carefully, you see that both high and low frequency components down the middle in their frequencies and as a result you get a uniform filter bank whose frequency separation looks more like this:
Notice that all bands are of the same size. By building a uniform filter bank with N levels, you end up with responses of 2^(N-1) band-bass filters. You can fine-tune your filter bank to eventually give you the desired band (8-13 Hz).
In general, I would not advise you to do this with wavelets. You can go through some literature on designing good band-pass filters and simply build a filter that would only let through 8-13 Hz of your EEG signals. That's what I've done before and it worked quite well for me.

Help with resampling/upsampling

I have an array of 240 data points sampled at 600hz, representing 400ms. I need to resample this data to 512 data points sampled at 1024hz, representing 500ms. I assume since I'm starting with 400ms of data, the last 100ms will just need to be padded with 0s.
Is there a best approach to take to accomplish this?
If you want to avoid interpolation then you need to upsample to a 76.8 kHz sample rate (i.e. insert 127 0s after every input sample), low pass filter, then decimate (drop 74 out of every 75 samples).
You can use windowed Sinc interpolation, which will give you the same result as upsampling and downsampling using a linear phase FIR low-pass filter with a windowed Sinc impulse response. When using a FIR filter, one normally has to pad a signal with zeros the length of the FIR filter kernel on both sides.
Added:
Another possibility is to zero pad 240 samples with 60 zeros, apply a non-power-of-2 FFT of length 300, "center" zero pad the FFT result with 212 complex zeros to make it 512 long, but with the identical spectrum, and do an IFFT of length 512 to get the resampled result.
Yes to endolith's response, if you want to interpolate x[n] by simply computing the FFT, zero-stuff, and then IFFT, you'll get errors if x[n] is not periodic. See this reference: http://www.embedded.com/design/other/4212939/Time-domain-interpolation-using-the-Fast-Fourier-Transform-
FFT based resampling/upsampling is pretty easy...
If you can use python, scipy.signal.resample should work.
For C/C++, there is a simple fftw trick to upsample if you have real (as opposed to complex) data.
nfft = the original data length
upnfft = the new data length
double * data = the original data
// allocate
fftw_complex * tmp_fd = (fftw_complex*)fftw_malloc((upnfft/2+1)*sizeof(fftw_complex));
double * result = (double*)fftw_malloc(upnfft*sizeof(double));
// create fftw plans
fftw_plan fft_plan = fftw_plan_dft_r2c_1d(nfft, data, tmp_fd, FFTW_ESTIMATE);
fftw_plan ifft_plan = fftw_plan_dft_c2r_1d(upnfft, tmp_fd, result, FFTW_ESTIMATE);
// zero out tmp_fd
memset(tmp_fd, 0, (upnfft/2+1)*sizeof(fftw_complex));
// execute the plans (forward then reverse)
fftw_execute_dft_r2c(fft_plan, data, tmp_fd);
fftw_execute_dft_c2r(ifft_plan, tmp_fd, result);
// cleanup
fftw_free(tmp_fd);

Resources