NVENC: Number of IDR slices per IDR refresh

I was examining an H.264 bitstream I generated with the Nvidia Video Encoder (NVENC) and noticed that each IDR refresh contains not a single IDR NAL, but 4 of them.
I don't understand why there is a need for 4 IDR NALs per refresh.
I am trying to find a way to reduce that number to a single IDR slice per refresh.
In the NVENC config I use idrPeriod to tell the encoder the refresh frequency, but I can't find a way to control the number of IDR NALs per refresh.
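For what it's worth, the knob I would try is NVENC's slice control, on the guess that the 4 IDR NALs are simply 4 slices of one IDR picture. sliceMode and sliceModeData sit next to idrPeriod in NV_ENC_CONFIG_H264; treat this sketch as an untested assumption, not a confirmed fix:

    #include <stdint.h>
    #include "nvEncodeAPI.h"   /* NVENC SDK header */

    /* Hypothetical helper (untested): ask NVENC for one slice per picture.
     * As I read the SDK docs, sliceMode = 3 makes the encoder interpret
     * sliceModeData as "number of slices per picture"; worth verifying
     * against your SDK version. */
    static void use_single_slice(NV_ENC_CONFIG *cfg, uint32_t idrPeriod)
    {
        NV_ENC_CONFIG_H264 *h264 = &cfg->encodeCodecConfig.h264Config;
        h264->idrPeriod     = idrPeriod;  /* refresh interval, as before      */
        h264->sliceMode     = 3;          /* sliceModeData = slices/picture   */
        h264->sliceModeData = 1;          /* one slice -> one IDR NAL/refresh */
    }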

Related

How can I resample an audio file programmatically in Swift?

I would like to know if it's possible to resample an already written AVAudioFile.
All the references I found don't address this particular problem, since:
They propose resampling while the user is recording an AVAudioFile, i.e. while installTap is running. In this approach, the AVAudioConverter works on each buffer chunk delivered by the inputNode and appends it to the AVAudioFile. [1] [2]
The point is that I would like to resample my audio file independently of any recording process.
The harder approach would be to upsample the signal by a factor of L and apply decimation by a factor of M, using vDSP:
Audio on Compact Disc has a sampling rate of 44.1 kHz; to transfer it to a digital medium that uses 48 kHz, method 1 above can be used with L = 160, M = 147 (since 48000/44100 = 160/147). For the reverse conversion, the values of L and M are swapped. Per above, in both cases, the low-pass filter should be set to 22.05 kHz. [3]
The last one obviously seems like too hard-coded a way to solve it. I hope there's a way to do it with AVAudioConverter, but it lacks documentation :(
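For the record, a minimal C sketch of that textbook upsample/filter/decimate method, since the arithmetic is framework-independent: gcd(48000, 44100) = 300, hence L = 160 and M = 147. The sketch interpolates with a windowed sinc directly instead of materializing the zero-stuffed signal; it illustrates the method and is no substitute for vDSP or AVAudioConverter:

    #include <math.h>
    #include <stddef.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define TAPS 48  /* filter half-width, in input samples */

    static double sinc(double x) { return x == 0.0 ? 1.0 : sin(M_PI * x) / (M_PI * x); }

    /* Rational resampling by L/M: conceptually insert L-1 zeros between
     * samples, low-pass at min(input, output) Nyquist, keep every M-th
     * sample. Here the three steps collapse into windowed-sinc
     * interpolation at positions n*M/L. out must hold ~nIn*L/M samples. */
    size_t resample(const float *in, size_t nIn, float *out, int L, int M)
    {
        /* Cutoff relative to the input Nyquist: 1.0 when upsampling
         * (22.05 kHz for 44.1 -> 48 kHz, as in the quote), L/M when
         * downsampling. */
        double fc = (L > M) ? 1.0 : (double)L / M;
        size_t nOut = nIn * (size_t)L / (size_t)M;
        for (size_t n = 0; n < nOut; n++) {
            double center = (double)n * M / L;  /* position in input samples */
            long k0 = (long)ceil(center - TAPS);
            long k1 = (long)floor(center + TAPS);
            double acc = 0.0;
            for (long k = k0; k <= k1; k++) {
                if (k < 0 || (size_t)k >= nIn) continue;
                double t = center - (double)k;
                double w = 0.5 + 0.5 * cos(M_PI * t / TAPS);  /* Hann window */
                acc += in[k] * fc * sinc(fc * t) * w;
            }
            out[n] = (float)acc;
        }
        return nOut;
    }

Called with L = 160, M = 147 this takes 44.1 kHz material to 48 kHz; swap the two for the reverse direction, exactly as the quote says.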

How to use CRC32 generator for an effective CRC16?

I am writing in C for an embedded STM32F105 microcontroller. I need to implement a CRC routine to validate a message sent over the air.
The microcontroller has a CRC32 generator built into its hardware. You feed it 4 bytes at a time and it calculates the CRC without additional processor overhead. It's non-configurable and uses the Ethernet CRC32 polynomial.
I want to use this hardware CRC generator, but I only want to add two bytes (not four) to each data packet. The packet will vary in size between 4 and 1022 bytes.
Can I simply use the two high (or low) bytes of the CRC32? Or can I always feed the CRC module 2 bytes at a time, with the high bytes being zero?
Is there some other way to get what I'm looking for?
For most applications, sure, you can just use the low two bytes of the CRC-32 as a 16-bit check value. However, that will not be a 16-bit CRC; it will be as good as any other hash value for checking for gross errors in a message.
It will not have certain desirable properties that true CRCs afford for small numbers of bit errors over short packet lengths.
There's no point in feeding the CRC generator zeros. Go ahead and give it four bytes of data for each instruction.
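To make the "use the low two bytes" suggestion concrete, here is a sketch. The bitwise software CRC stands in for the hardware block so the code runs anywhere; note the STM32 peripheral consumes 32-bit words and differs in bit-ordering and final-XOR conventions, so its output will not match a byte-wise Ethernet CRC-32 unless you compensate for that:

    #include <stdint.h>
    #include <stddef.h>

    /* Bitwise CRC-32, reflected Ethernet polynomial 0xEDB88320 (standing
     * in for the hardware unit; on the STM32F105 you would reset the CRC
     * peripheral and write the packet to its data register one 32-bit
     * word at a time, zero-padding the tail word). */
    static uint32_t crc32_sw(const uint8_t *p, size_t n)
    {
        uint32_t crc = 0xFFFFFFFFu;
        while (n--) {
            crc ^= *p++;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
        return ~crc;
    }

    /* 16-bit check value as suggested above: keep the low two bytes.
     * Good for catching gross corruption, but NOT a true CRC-16, so the
     * guaranteed detection of short burst errors is lost. */
    static uint16_t check16(const uint8_t *p, size_t n)
    {
        return (uint16_t)(crc32_sw(p, n) & 0xFFFFu);
    }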

Frequency analysis of very short signal in GNU Octave

I have some very short signals from an oscilloscope (50k-200k samples) recorded over about 2 ms. These are acoustic signals of an ESD (electrostatic discharge) spark.
I'd like to get some frequency data for that signal, in the near-acoustic range (up to about 30 kHz), with as high a time resolution as possible.
I have tried plotting a spectrogram (specgram in Octave) to view the signal, but the output is not really useful. Using specgram( x, N, fs ), where x is my signal sampled at rate fs, I get a plot dominated by very high frequencies of about 500 MHz for low values of N; big N values (like 2^12-2^13) give better frequency resolution, but then the window is so wide that I get only 2 spectra over the whole signal length.
I understand that this may be a limitation of the Fourier transform, which the specgram function presumably uses (actually, I don't know much about signal analysis).
Is there any other way to get frequency-over-time information for this kind of signal? I've read something about wavelets, but when I tried the dwt function of the signal package, I received this error:
error: 'wfilters' undefined near line 51 column 14
error: called from
dwt at line 51 column 12
Even if that worked, I am not so sure I'd know how to actually use the output of those wavelet functions...
To get audio-frequency information from such a high sample rate, you will need to obtain a sample vector long enough to contain at least a few whole cycles at audio frequencies, e.g. many tens of milliseconds of contiguous samples, which may or may not be more than your scope can gather. To process this amount of data reasonably, you might low-pass filter the sample data down to just the audio frequencies, and then resample it to a lower sample rate that is still above twice the filter's cut-off frequency. You will then end up with a much shorter sample vector to feed an FFT for your audio spectrum analysis.
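As a sketch of that pipeline in C (a crude boxcar average stands in for a proper anti-alias FIR; design a real low-pass before trusting the result): with the numbers from the question, 200k samples over 2 ms means fs = 100 MS/s, and decimating by D = 1000 leaves 100 kS/s, comfortably covering a 30 kHz band, but only 200 samples, which is exactly the point above about needing a longer capture:

    #include <stddef.h>

    /* Average each block of D input samples and emit one output sample.
     * The length-D boxcar is a (poor) low-pass whose first null sits at
     * fs/D; a designed FIR would protect the new Nyquist band properly.
     * Returns the number of output samples written. */
    size_t decimate(const float *in, size_t nIn, float *out, size_t D)
    {
        size_t nOut = nIn / D;
        for (size_t i = 0; i < nOut; i++) {
            float acc = 0.0f;
            for (size_t j = 0; j < D; j++)
                acc += in[i * D + j];
            out[i] = acc / (float)D;
        }
        return nOut;
    }

The decimated vector can then go back into specgram with a correspondingly reduced fs.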

FDK AAC encoder/decoder : Access Huffman encoded and decoded data

For the FDK AAC,
I want to access the spectral data before and after Huffman encoding/decoding in the encoder and in the decoder.
For accessing spectral data before Huffman encoding, I am using the pSpectralCoefficient pointer and dumping 1024 samples (on the decoder side), and using qcOutChannel[ch]->quantSpec and dumping 1024 samples (on the encoder side). Is this correct?
Secondly, how do I access the Huffman-encoded signal in the encoder and decoder? If someone can tell me the location in the code, the name of the pointer to use, and the length of this data, I will be extremely thankful.
Thirdly, I wanted to know: what is the frame size in the frequency domain (before Huffman encoding)?
I am dumping 1024 samples of *pSpectralCoefficient. Is that correct?
Is it possible that some frames are 1024 in length while others are a set of 8 sub-blocks with 128 frequency bins each? If so, is there any flag that can give me this information?
Thank you for your time. Please help me out with this as soon as possible.
Regards,
Akshay
To pull that specific data out of the bitstream you will need to step through the decoder and find the desired pieces of the stream. To do that, you have to have the AAC bitstream specification. The current AAC specification is:
ISO/IEC 14496-3:2009 "Information technology -- Coding of audio-visual objects -- Part 3: Audio"
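On the third question above, that spec does pin the frame structure down: every AAC frame carries 1024 spectral values, coded either as one long 1024-bin transform or as eight short 128-bin transforms, and the 2-bit window_sequence field of ics_info() says which (FDK keeps an equivalent member in its ICS info structure; I haven't chased down the exact field name). A minimal sketch of the distinction:

    /* Window sequences from ISO/IEC 14496-3 ics_info(); the 2-bit
     * window_sequence field selects one of these four values. */
    enum {
        ONLY_LONG_SEQUENCE   = 0,  /* one 1024-bin transform   */
        LONG_START_SEQUENCE  = 1,  /* one 1024-bin transform   */
        EIGHT_SHORT_SEQUENCE = 2,  /* eight 128-bin transforms */
        LONG_STOP_SEQUENCE   = 3   /* one 1024-bin transform   */
    };

    /* Bins per individual transform. Either way the frame holds 1024
     * spectral values in total (8 x 128 for short windows), so dumping
     * 1024 coefficients per frame is consistent in both cases. */
    static int bins_per_transform(unsigned window_sequence)
    {
        return (window_sequence == EIGHT_SHORT_SEQUENCE) ? 128 : 1024;
    }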

MPEG-TS fundamentals

I read some tutorials about the MPEG transport stream, but there are 2 fundamental issues I do not understand:
1. The MPEG-TS muxer receives PES packets from audio and video, and outputs MPEG-TS packets. How does it do this muxing? Is it that whenever a packet from any program is waiting at its input, the muxer wakes up and slices the PES into MPEG-TS packets?
2. Can the user select which bit rate the MPEG-TS muxer will output? What is the connection between the encoding rate and the MPEG-TS rate?
Thank you very much,
Ran
MPEG2-TS muxing is a complex art form. Suggested reading: the MPEG2-TS specification, SPTS/MPTS, VBR vs. CBR, the hypothetical reference decoder and its buffers (EB, MB, TB), jitter and drift.
A very short answer to your questions can be summarized like this:
For each encoder, at the other end of the line there is a decoder which wants to display a video (or audio) frame every frame interval. This frame needs to be decoded before its presentation time. If this frame uses other frames as references, they too need to be decoded prior to presentation.
When multiplexing, the data must arrive sufficiently ahead of presentation. A video frame to be presented at time n must be available at the decoder at time n - x, where x is a measure of time depending on the buffer rates of the decoder (see MB, TB, EB). If the TS bit rate is too low, "underflow" occurs and the video is not in the decoder on time. If the TS bit rate is too high, "overflow" occurs and the buffers have to drop packets, which also creates visual artifacts.
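A toy C illustration of the underflow half of that argument (nothing like a real T-STD simulation; the mux rate, frame sizes, and 150 ms pre-buffering delay are all made-up numbers): a frame of S bytes needs S*8/tsRate seconds of multiplex time, and all of it must be on the wire before the frame's decode deadline:

    #include <stdio.h>

    int main(void)
    {
        const double tsRate  = 4e6;        /* 4 Mbit/s multiplex (assumed)   */
        const double startup = 0.150;      /* 150 ms pre-buffering (assumed) */
        const double frameInterval = 1.0 / 25.0;                  /* 25 fps  */
        const double frameBytes[] = { 60000, 12000, 12000, 12000 }; /* I,P,P,P */

        double t = 0.0;  /* time at which the muxer finishes sending each frame */
        for (int i = 0; i < 4; i++) {
            t += frameBytes[i] * 8.0 / tsRate;        /* transmission time */
            double deadline = startup + i * frameInterval;
            printf("frame %d: sent by %.3f s, decode at %.3f s -> %s\n",
                   i, t, deadline, t <= deadline ? "ok" : "UNDERFLOW");
        }
        return 0;
    }

At 4 Mbit/s every frame makes its deadline; drop tsRate to 2e6 and the 60 kB I-frame alone needs 240 ms of wire time against a 150 ms deadline, i.e. underflow.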
