I read some tutorials about MPEG transport stream, but there are 2 fundamental issues I do not understand:
1. The MPEG-TS muxer receives PES packets from audio and video and outputs MPEG-TS packets. How does it do this muxing? Is it that whenever a packet from any program is waiting on its input, the muxer wakes up and slices that PES packet into MPEG-TS packets?
2. Can the user select which bit rate the MPEG-TS muxer will output? What is the connection between the encoding rate and the MPEG-TS rate?
Thank you very much,
Ran
MPEG2-TS muxing is a complex art form. Suggested reading: the MPEG2-TS specification, SPTS/MPTS, VBR vs. CBR, the hypothetical reference decoder and its buffers (TB, MB, EB), jitter and drift.
A very short answer to your questions can be summarized like this:
For each encoder, on the other end of the line there is a decoder which wants to display a video frame (or audio frame) every frame interval. Each frame needs to be decoded before its presentation time, and if it uses other frames as references, those also need to be decoded prior to presentation.
When multiplexing, the data must arrive at the decoder sufficiently ahead of presentation. A video frame to be presented at time n must be available at the decoder at time n - x, where x is an amount of time that depends on the decoder's buffers and their fill rates (see TB, MB, EB). If the TS bit rate is too low, the buffers underflow and the video is not in the decoder on time; if the TS bit rate is too high, the buffers overflow and packets have to be dropped, which also creates visual artifacts.
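To make that scheduling idea concrete, here is a minimal sketch of a CBR mux loop. Every type and helper name in it is hypothetical (EsQueue, emit_ts_packet_from, and so on), not from any real muxer library; it only illustrates the deadline-driven packet picking described above.

```c
/* Minimal sketch of a CBR TS mux scheduling loop. One call to mux_slot()
 * emits exactly one 188-byte packet slot; at a mux rate of R bits/s the
 * caller invokes it every 188*8/R seconds, so the output rate is constant. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool     has_data;   /* PES bytes waiting on this input */
    uint64_t next_dts;   /* DTS of the next access unit, in 90 kHz ticks */
} EsQueue;

/* stub outputs -- a real muxer would write 188-byte TS packets here */
static void emit_ts_packet_from(EsQueue *q) { (void)q; puts("data packet"); }
static void emit_null_packet(void)          { puts("null packet, PID 0x1FFF"); }

static void mux_slot(EsQueue *q, size_t n, uint64_t now, uint64_t decoder_delay)
{
    /* pick the input whose next frame has the earliest decode deadline */
    EsQueue *urgent = NULL;
    for (size_t i = 0; i < n; i++)
        if (q[i].has_data && (!urgent || q[i].next_dts < urgent->next_dts))
            urgent = &q[i];

    /* a frame decoded at next_dts must be fully delivered x ticks earlier,
     * so start sending its packets once that deadline comes within reach */
    if (urgent && urgent->next_dts <= now + decoder_delay)
        emit_ts_packet_from(urgent);   /* real data is due: slice PES into TS */
    else
        emit_null_packet();            /* nothing due: stuffing holds the rate */
}

int main(void)
{
    EsQueue q[2] = { { true, 3000 }, { true, 1800 } };  /* e.g. video, audio */
    for (uint64_t now = 0; now < 4; now++)              /* 4 packet slots */
        mux_slot(q, 2, now, 900);
    return 0;
}
```

That is also the link between encoding rate and TS rate: the mux rate is typically chosen by the user or operator, the elementary stream rates plus PSI and packetization overhead must fit inside it, and the remainder is filled with null packets. With VBR the reasoning is the same, but against the buffer model rather than a fixed packet-slot clock.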
I would like to know if it's possible to resample an already written AVAudioFile.
All the references that I found don't address this particular problem, since:
They propose resampling while the user is recording the AVAudioFile, i.e. while installTap is running. In that approach, the AVAudioConverter converts each buffer chunk delivered by the inputNode and appends it to the AVAudioFile. [1] [2]
The point is that I would like to resample my audio file independently of the recording process.
The harder approach would be to upsample the signal by a factor of L and then decimate by a factor of M, using vDSP:
Audio on Compact Disc has a sampling rate of 44.1 kHz; to transfer it to a digital medium that uses 48 kHz, method 1 above can be used with L = 160, M = 147 (since 48000/44100 = 160/147). For the reverse conversion, the values of L and M are swapped. Per above, in both cases, the low-pass filter should be set to 22.05 kHz. [3]
That last one obviously seems like too hard-coded a way to solve it. I hope there's a way to resample with AVAudioConverter, but it lacks documentation :(
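For concreteness, here is a naive plain-C sketch of that upsample-by-L / decimate-by-M scheme (for 44.1 kHz -> 48 kHz, L = 160 and M = 147 as in the quote). Everything in it is illustrative: the function is hand-rolled rather than vDSP-based (vDSP's FIR/decimation routines, e.g. vDSP_desamp, would replace the inner loop), and the 2-tap filter in main() is a toy stand-in for a real 22.05 kHz anti-aliasing low-pass.

```c
/* Naive rational-rate conversion: upsample by L (zero-stuffing),
 * low-pass filter, decimate by M. */
#include <stdio.h>
#include <stddef.h>

static size_t resample_rational(const float *in, size_t in_len,
                                float *out, size_t out_cap,
                                int L, int M,
                                const float *taps, int n_taps)
{
    size_t n_out = 0;
    for (size_t k = 0; n_out < out_cap; k++) {
        size_t pos = k * (size_t)M;          /* index in the upsampled grid */
        if (pos / (size_t)L >= in_len)
            break;
        float acc = 0.0f;
        for (int t = 0; t < n_taps; t++) {
            long v = (long)pos - t;          /* index in the zero-stuffed signal */
            /* only every L-th virtual sample is a real input sample */
            if (v >= 0 && v % L == 0 && (size_t)(v / L) < in_len)
                acc += taps[t] * in[v / L];
        }
        out[n_out++] = acc * (float)L;       /* compensate zero-stuffing gain */
    }
    return n_out;
}

int main(void)
{
    /* toy demo: L = 2, M = 1 doubles the rate of a short ramp */
    const float in[4]   = { 0.0f, 1.0f, 2.0f, 3.0f };
    const float taps[2] = { 0.5f, 0.5f };
    float out[16];
    size_t n = resample_rational(in, 4, out, 16, 2, 1, taps, 2);
    for (size_t i = 0; i < n; i++)
        printf("%.2f\n", out[i]);            /* 0 0 1 1 2 2 3 3 */
    return 0;
}
```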
I've been wondering how buffering works on microcontrollers. Essentially, I am using a temperature probe that I am sampling at around 50 Hz.
If I buffer 50 samples, then do processing on those 50 samples, and rinse and repeat up to 10,000 samples, will I be missing samples? Essentially, does buffering happen in parallel with processing, or can a simple microcontroller (an ATmega328, as an example) only do one at a time?
Can I still buffer data in from the sensor while I am manipulating and processing data?
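For reference, the standard pattern for this is interrupt-driven double buffering: the sampling interrupt keeps filling one buffer while the main loop processes the other. A minimal generic-C sketch, with read_probe() and process() as hypothetical placeholders for the ADC read and your processing:

```c
/* Double-buffer ("ping-pong") sketch: a 50 Hz timer interrupt fills one
 * buffer while the main loop processes the other, so sampling continues
 * during processing. No samples are lost as long as processing 50 samples
 * finishes within the 1 second it takes to collect the next 50.
 * On an ATmega328, sample_isr() would be a timer-compare ISR reading the ADC. */
#include <stdint.h>
#include <stdbool.h>

#define N 50

static volatile uint16_t buf[2][N];
static volatile uint8_t  fill_idx = 0;   /* buffer the ISR is filling */
static volatile uint8_t  pos      = 0;
static volatile bool     ready    = false;

static uint16_t read_probe(void) { return 0; }                  /* stub ADC */
static void process(const volatile uint16_t *s, int n) { (void)s; (void)n; }

/* called from the 50 Hz timer interrupt */
void sample_isr(void)
{
    buf[fill_idx][pos++] = read_probe();
    if (pos == N) {          /* buffer full: hand it off and swap */
        pos = 0;
        fill_idx ^= 1;
        ready = true;
    }
}

int main(void)
{
    /* ... set up the 50 Hz timer interrupt here ... */
    for (;;) {
        if (ready) {
            ready = false;
            process(buf[fill_idx ^ 1], N);  /* buffer the ISR is NOT filling */
        }
        /* the ISR keeps sampling "in the background" the whole time */
    }
}
```

So on a single-core part like the ATmega328 nothing runs truly in parallel, but the interrupt interleaves with the main loop finely enough that sampling effectively continues during processing; samples are only lost if processing one buffer takes longer than filling the other.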
I was examining an H.264 bitstream I generated with Nvidia Video Encoder (NVENC) and noticed that each IDR update contains not a single IDR NAL, but 4 of them.
I don't understand why there is a need for 4 IDR NALs per update.
I am trying to find a way to reduce that number to a single IDR slice per update.
In the NVENC config I use idrPeriod to tell the encoder the frequency of updates, but I can't find a way to control the number of IDR NALs per update.
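A likely explanation is that the encoder splits each picture into 4 slices, and each slice of an IDR picture comes out as its own IDR NAL. In the NVENC API the slice count is controlled by the sliceMode/sliceModeData fields of NV_ENC_CONFIG_H264; a hedged sketch (field names per the NVENC SDK headers, so verify against your SDK version):

```c
/* sliceMode 3 means "interpret sliceModeData as the number of slices
 * per frame"; setting it to 1 should yield a single NAL per picture,
 * and therefore a single IDR NAL per IDR update. */
#include "nvEncodeAPI.h"

void configure_single_slice(NV_ENC_CONFIG *cfg)
{
    NV_ENC_CONFIG_H264 *h264 = &cfg->encodeCodecConfig.h264Config;
    h264->sliceMode     = 3;   /* sliceModeData = slices per frame */
    h264->sliceModeData = 1;   /* one slice -> one IDR NAL per IDR picture */
}
```

This goes in the NV_ENC_CONFIG you pass at initialization (or reconfigure); idrPeriod stays as your update frequency.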
For FDK AAC,
I want to access the spectral data before and after Huffman encoding/decoding, in both the encoder and the decoder.
For accessing the spectral data before Huffman encoding, I am using the pSpectralCoefficient pointer and dumping 1024 samples (on the decoder side), and using qcOutChannel[ch]->quantSpec and dumping 1024 samples (on the encoder side). Is this correct?
Secondly, how do I access the Huffman-encoded signal in the encoder and the decoder? If someone can tell me the location in the code, the name of the pointer to use, and the length of this data, I will be extremely thankful.
Thirdly, I wanted to know what the frame size is in the frequency domain (before Huffman encoding).
I am dumping 1024 samples of *pSpectralCoefficient. Is that correct?
Is it possible that some frames are 1024 coefficients long while others are a set of 8 sub-blocks of 128 frequency bins each? If that is possible, is there any flag that can give me this information?
Thank you for your time. I would appreciate any help with this as soon as possible.
Regards,
Akshay
To pull that specific data out of the bitstream you will need to step through the decoder and find the desired pieces of the stream. In order to do that you have to have the AAC bitstream specification. The current AAC specification is:
ISO/IEC 14496-3:2009 "Information technology -- Coding of audio-visual objects -- Part 3: Audio"
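On the frame-size question specifically: an AAC frame always carries 1024 spectral coefficients per channel, but the ICS window_sequence field of the bitstream (defined in ISO/IEC 14496-3) tells you whether they form one long block of 1024 bins or 8 short sub-blocks of 128 bins each. The exact FDK struct and field names vary between versions, so the sketch below uses the spec's enum values with a plain float pointer rather than FDK identifiers:

```c
/* Long vs. short windows in ISO/IEC 14496-3 terms. Dumping 1024 values is
 * the right total either way; only their interpretation differs. */
#include <stdio.h>

enum window_sequence {                 /* values per ISO/IEC 14496-3 */
    ONLY_LONG_SEQUENCE   = 0,
    LONG_START_SEQUENCE  = 1,
    EIGHT_SHORT_SEQUENCE = 2,
    LONG_STOP_SEQUENCE   = 3
};

void dump_spectrum(const float *spec, int window_seq)
{
    if (window_seq == EIGHT_SHORT_SEQUENCE) {
        /* short frame: 8 sub-blocks of 128 coefficients = 1024 total */
        for (int w = 0; w < 8; w++)
            for (int k = 0; k < 128; k++)
                printf("%d %d %f\n", w, k, spec[w * 128 + k]);
    } else {
        /* long frame: one block of 1024 coefficients */
        for (int k = 0; k < 1024; k++)
            printf("%d %f\n", k, spec[k]);
    }
}

int main(void)
{
    float spec[1024] = { 0 };                 /* dummy spectrum */
    dump_spectrum(spec, ONLY_LONG_SEQUENCE);  /* one 1024-bin block */
    return 0;
}
```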
The default iPhone audio recorder has a sampling rate of 44.1 kHz and a bit rate of 64 kbps. When exporting an 8 minute audio recording from the recorder, I can see that the exported file size comes out to a little under 4 MB. When I try to export an audio file from my custom audio recorder, my 3 minute audio recording fails to export because it's 22 MB. How are they getting their file size so low with such a high sampling rate? Also, I see that the exported audio file is .m4a, but in iTunes the file "kind" is AAC. Shouldn't that make the audio file .aac?
Just read this: What is the difference between M4A and AAC Audio Files? So an m4a file can contain an AAC audio track?
Kind of confused here.
AAC is one of many audio codecs, and the codec is the flavor of compression ... once encoded it becomes binary data which needs to get wrapped inside a container format for transport over the wire or into a file ... one such container is m4a, so yes, an m4a file can contain an AAC audio track ... the .m4a extension names the container, while the iTunes "kind" names the codec inside it
Doing the math on 64 kbps (kilobits per second, which is the compressed AAC rate) for an 8 minute clip ... 64 * 60 * 8 == about 30,700 kilobits, and dividing by 8 bits per byte gives about 3.8 megabytes ... that is the 4 MB you quote ... the uncompressed bit rate is based on three underlying factors : sample rate (44.1 kHz), bit depth or the number of bits of resolution per sample (standard CD quality bit depth is 16 bits, whereas telephone audio can be as low as 8 bits per sample), and channel count (single channel mono versus 2 ch stereo, which doubles the required bits per second) ... mono CD-quality PCM is 44100 * 16 == about 706 kbps, so AAC at 64 kbps is roughly an 11 to 1 compression
under the covers all audio processing/recording/playback uses PCM which is raw audio prior to compression
Your figure of 22 meg for 3 minutes --> 22,000,000 bytes / 180 seconds == about 122 kilobytes per second == about 980 kbps, which sits in the uncompressed PCM range (mono 16-bit 44.1 kHz is ~706 kbps, stereo ~1411 kbps) ... so I would say your 22 meg is uncompressed
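A tiny sanity check of the arithmetic above, as plain C:

```c
/* Quick check of the bit-rate and file-size figures quoted above. */
#include <stdio.h>

int main(void)
{
    /* 64 kbps AAC for 8 minutes */
    double aac_bits = 64e3 * 60 * 8;             /* bits = rate * seconds */
    printf("AAC 8 min @ 64 kbps : %.1f MB\n", aac_bits / 8 / 1e6);   /* ~3.8 MB */

    /* 22 MB over 3 minutes: implied bit rate */
    double implied_bps = 22e6 * 8 / (60.0 * 3);
    printf("22 MB over 3 min    : %.0f kbps\n", implied_bps / 1e3);  /* ~978 kbps */

    /* uncompressed PCM reference rates at 44.1 kHz, 16-bit */
    printf("PCM mono 44.1k/16   : %.0f kbps\n", 44100.0 * 16 / 1e3);      /* ~706 */
    printf("PCM stereo 44.1k/16 : %.0f kbps\n", 44100.0 * 16 * 2 / 1e3);  /* ~1411 */
    return 0;
}
```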
hope this helps