FDK AAC encoder/decoder: Access Huffman-encoded and decoded data

For FDK AAC, I want to access the spectral data before and after Huffman encoding/decoding, in both the encoder and the decoder.
To access the spectral data before Huffman encoding, I dump 1024 samples from the pSpectralCoefficient pointer on the decoder side and 1024 samples from qcOutChannel[ch]->quantSpec on the encoder side. Is this correct?
Second, how do I access the Huffman-encoded signal in the encoder and decoder? If someone can tell me where in the code to look, the name of the pointer to use, and the length of this data, I will be extremely thankful.
Third, what is the frame size in the frequency domain (before Huffman encoding)?
I am dumping 1024 samples of *pSpectralCoefficient. Is that correct?
Is it possible that some frames are 1024 bins long while others are a set of 8 short windows of 128 frequency bins each? If so, is there a flag that gives me this information?
Thank you for your time. Please help me out with this as soon as possible.
Regards,
Akshay

To pull that specific data out of the bitstream, you will need to step through the decoder and find the desired pieces of the stream. To do that, you need the AAC bitstream specification. The current AAC specification is:
ISO/IEC 14496-3:2009 "Information technology -- Coding of audio-visual objects -- Part 3: Audio"
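The flag asked about in the question lives in the ics_info() element of that specification: its 2-bit window_sequence field says whether the channel uses one long window or EIGHT_SHORT_SEQUENCE (8 windows). The following is a minimal C sketch of reading those leading fields, assuming the caller has already positioned a bit reader at the start of ics_info() (inside individual_channel_stream, after global_gain). BitReader, read_bits and parse_ics_info_head are hypothetical names, not FDK's internal API, and the 1024/128 bin counts assume the usual 1024-sample AAC-LC frame (960, 480 and 512 frame lengths exist in other configurations).

```c
#include <stddef.h>
#include <stdint.h>

/* window_sequence values from the AAC ics_info() syntax. */
enum { ONLY_LONG_SEQUENCE = 0, LONG_START_SEQUENCE = 1,
       EIGHT_SHORT_SEQUENCE = 2, LONG_STOP_SEQUENCE = 3 };

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct { const uint8_t *buf; size_t len; size_t bitpos; } BitReader;

/* Reads n (<= 32) bits MSB-first; bits past the end of the buffer read as 0. */
static uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t v = 0;
    while (n--) {
        size_t byte = br->bitpos >> 3;
        unsigned bit = 0;
        if (byte < br->len)
            bit = (br->buf[byte] >> (7 - (br->bitpos & 7))) & 1u;
        v = (v << 1) | bit;
        br->bitpos++;
    }
    return v;
}

typedef struct {
    unsigned window_sequence;
    unsigned window_shape;
    unsigned max_sfb;
    unsigned num_windows;      /* 8 for EIGHT_SHORT_SEQUENCE, else 1 */
    unsigned bins_per_window;  /* 128 or 1024, assuming a 1024-sample frame */
} IcsInfo;

/* Parses the leading fields of ics_info(); br must already point at it. */
void parse_ics_info_head(BitReader *br, IcsInfo *ics)
{
    (void)read_bits(br, 1);                    /* ics_reserved_bit */
    ics->window_sequence = read_bits(br, 2);
    ics->window_shape    = read_bits(br, 1);
    if (ics->window_sequence == EIGHT_SHORT_SEQUENCE) {
        ics->max_sfb = read_bits(br, 4);
        (void)read_bits(br, 7);                /* scale_factor_grouping */
        ics->num_windows = 8;
        ics->bins_per_window = 128;
    } else {
        ics->max_sfb = read_bits(br, 6);
        /* predictor_data_present and what follows are not parsed here. */
        ics->num_windows = 1;
        ics->bins_per_window = 1024;
    }
}
```

In both cases the frame still carries 1024 spectral coefficients per channel; the flag only tells you whether to treat them as one block of 1024 or eight blocks of 128.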

Related

How to use CRC32 generator for an effective CRC16?

I am writing in C for an embedded STM32F105 microcontroller. I need to implement a CRC routine to validate a message sent over the air.
The microcontroller has a CRC32 generator built into its hardware. You feed it 4 bytes at a time and it calculates the CRC without additional processor overhead. It's non-configurable and uses the Ethernet CRC32 polynomial.
I want to use this hardware CRC generator, but I only want to add two bytes (not four) to each data packet. The packet will vary in size between 4 and 1022 bytes.
Can I simply use the two high (or low) bytes of the CRC32? Or can I always feed the CRC module 2 bytes at a time, with the high bytes being zero?
Is there some other way to get what I'm looking for?
For most applications, sure, you can just use the low two bytes of the CRC-32 as a 16-bit check value. However that will not be a 16-bit CRC. It will be as good as any other hash value for checking for gross errors in a message.
It will not have certain desirable properties for small numbers of bit errors in short packet lengths that are afforded by CRCs.
There's no point in feeding the CRC generator zeros. Go ahead and give it four bytes of data with each write.
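A minimal C sketch of the truncation approach described above, useful on the receiving side or for testing. It assumes the STM32F1 CRC unit's commonly documented behaviour (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, 32-bit words processed MSB-first, no final XOR); check those parameters against the reference manual. Since packets of 4 to 1022 bytes are not always a multiple of four, the sketch packs bytes big-endian into words and zero-pads only the final partial word; both of those are assumptions that sender and receiver must share. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One 32-bit word through an MSB-first CRC-32 with poly 0x04C11DB7. */
static uint32_t crc32_word_update(uint32_t crc, uint32_t word)
{
    crc ^= word;
    for (int bit = 0; bit < 32; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    return crc;
}

/* Word-at-a-time CRC over a packet: bytes packed big-endian into words,
   final partial word zero-padded (an assumption both ends must agree on). */
uint32_t packet_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i = 0;
    while (i < len) {
        uint32_t word = 0;
        for (int k = 0; k < 4; k++) {
            word <<= 8;
            if (i < len)
                word |= data[i++];
        }
        crc = crc32_word_update(crc, word);
    }
    return crc;
}

/* Byte-at-a-time reference (CRC-32/MPEG-2 parameters). For whole words it
   matches packet_crc32, e.g. for a receiver that processes a byte stream. */
uint32_t crc32_mpeg2_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}

/* 16-bit check value: the low half of the CRC-32. This is a truncated
   CRC-32, not a true CRC-16, so it lacks CRC-16 guarantees on short packets. */
uint16_t packet_check16(const uint8_t *data, size_t len)
{
    return (uint16_t)(packet_crc32(data, len) & 0xFFFFu);
}
```

On the STM32 itself, note that casting a byte buffer to uint32_t* on a little-endian core feeds the bytes in a different order than the big-endian packing above, so pick one convention and use it on both ends.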

NVENC: Number of IDR slices per IDR refresh

I was examining an H.264 bitstream I generated with the NVIDIA Video Encoder (NVENC) and noticed that each IDR update contains not a single IDR NAL unit, but 4 of them.
I don't understand why there is a need for 4 IDR NAL units per update.
I am trying to find a way to reduce that number to a single IDR slice per update.
In the NVENC config I use idrPeriod to set the update frequency, but I can't find a way to control the number of IDR NAL units per update.
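To check what those 4 IDR NAL units are, a small scanner can count them. This is a minimal C sketch, assuming the NVENC output is an Annex B byte stream (start codes, not length prefixes); IdrCount and count_idr_nals are hypothetical names. It counts NAL units with nal_unit_type 5 (coded slice of an IDR picture) and, separately, those whose first_mb_in_slice is 0, which normally marks the first slice of a new picture. If it reports 4 slices but 1 picture, each IDR picture is simply split into 4 slices.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t idr_slices;    /* NAL units with nal_unit_type == 5 */
    size_t idr_pictures;  /* IDR slices whose first_mb_in_slice == 0 */
} IdrCount;

IdrCount count_idr_nals(const uint8_t *buf, size_t len)
{
    IdrCount c = {0, 0};
    for (size_t i = 0; i + 3 < len; i++) {
        /* Annex B start code prefix 0x000001 (0x00000001 also ends with it). */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            uint8_t nal_type = buf[i + 3] & 0x1F;   /* low 5 bits of NAL header */
            if (nal_type == 5) {
                c.idr_slices++;
                /* first_mb_in_slice is ue(v): its value is 0 exactly when
                   the first bit of the slice header is 1. */
                if (i + 4 < len && (buf[i + 4] & 0x80))
                    c.idr_pictures++;
            }
            i += 2;   /* continue scanning after the prefix */
        }
    }
    return c;
}
```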

How to extract motion vectors from H.264 AVC CMBlockBufferRef after VTCompressionSessionEncodeFrame

I'm trying to read and understand the CMBlockBufferRef representation of a 1/30 s H.264 AVC frame.
The buffer and the encapsulating CMSampleBufferRef are created using a VTCompressionSessionRef.
https://gist.github.com/petershine/de5e3d8487f4cfca0a1d
The H.264 data is represented as an AVC-format memory buffer, a CMBlockBufferRef obtained from the compressed sample.
Without fully decompressing again, I'm trying to extract motion vectors or predictions from this CMBlockBufferRef.
I believe that, for the fastest performance, I need to read the data buffer byte by byte using CMBlockBufferGetDataPointer().
However, I'm having trouble finding the right way to read the data buffer, with the intention to find and extract motion vectors or predictions.
Is there really no way to do this without decompressing or using ffmpeg?
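An AVC-format buffer holds length-prefixed NAL units rather than Annex B start codes, so the first step in reading it byte by byte is walking those length prefixes. This is a minimal portable C sketch, assuming buf/len come from CMBlockBufferGetDataPointer() on a contiguous buffer and that the length prefix is 4 bytes (the real size is lengthSizeMinusOne + 1 from the avcC data in the format description); list_avc_nal_types is a hypothetical name. Note that this only reaches the NAL-unit level: motion vectors are coded as differences inside the entropy-coded slice data (CAVLC or CABAC), so extracting them still means parsing most of the slice and macroblock layers.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks length-prefixed NAL units and stores each nal_unit_type in types[].
   Returns the number of NAL units, or -1 if a length runs past the buffer. */
int list_avc_nal_types(const uint8_t *buf, size_t len, size_t length_size,
                       uint8_t *types, size_t max_types)
{
    size_t pos = 0;
    size_t count = 0;
    while (pos + length_size <= len) {
        uint32_t nal_len = 0;
        for (size_t k = 0; k < length_size; k++)
            nal_len = (nal_len << 8) | buf[pos + k];   /* big-endian length */
        pos += length_size;
        if (nal_len == 0 || nal_len > len - pos)
            return -1;
        if (count < max_types)
            types[count] = buf[pos] & 0x1F;            /* nal_unit_type */
        count++;
        pos += nal_len;                                 /* skip the payload */
    }
    return (int)count;
}
```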

Is there a way to compress DICOM data?

Say I have a sequence of .dicom files in a folder. The cumulative size is about 100 MB. It's a lot of data. I tried to convert the data to .nrrd and .nii, but the resulting files were about the same total size as the original .dicom files (which is fairly predictable, although the .nrrd was compressed with gzip). I'd like to know if there is a file format that would give much smaller files, or some other way to solve this. Perhaps .vtk, or something else (not sure it would work). Thanks in advance.
DICOM supports compression of the pixel data within the file itself. The idea of DICOM is that it is agnostic about the format of the pixel data it holds.
DICOM can hold raw pixel data and can also hold JPEG-compressed pixel data, as well as many other formats. The Transfer Syntax UID (0002,0010) of the DICOM file tells you how the pixel data within it is encoded and compressed.
The first thing is to figure out whether you need lossless or lossy compression. If lossy, there are a lot of options, and the compression ratio is quite high in some - the tradeoff is that you do lose fidelity and the images may not be adequate for diagnostic purposes. There are also lossless compression schemes - like JPEG2000, RLE and even JPEG-LS. These will compress the pixel data, but retain diagnostic quality without any image degradation.
You can also zip the files, which, if the pixel data is raw, should produce very good results. What are you looking to do with these compressed DICOMs?
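Before choosing a scheme, it helps to check what a file already uses. This is a minimal C sketch that reads the Transfer Syntax UID from an in-memory DICOM Part 10 file and classifies it, assuming the standard layout (128-byte preamble, "DICM", then a file meta group in explicit VR little endian). read_transfer_syntax, classify_transfer_syntax and the TsKind categories are hypothetical names, the UID table covers only a few common transfer syntaxes, and for production use an established DICOM toolkit is the safer choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { TS_UNKNOWN, TS_UNCOMPRESSED, TS_LOSSLESS, TS_LOSSY } TsKind;

/* A few standard Transfer Syntax UIDs; not an exhaustive list. */
static const struct { const char *uid; TsKind kind; } kTransferSyntaxes[] = {
    {"1.2.840.10008.1.2",        TS_UNCOMPRESSED}, /* Implicit VR Little Endian */
    {"1.2.840.10008.1.2.1",      TS_UNCOMPRESSED}, /* Explicit VR Little Endian */
    {"1.2.840.10008.1.2.1.99",   TS_LOSSLESS},     /* Deflated Explicit VR LE */
    {"1.2.840.10008.1.2.4.50",   TS_LOSSY},        /* JPEG Baseline */
    {"1.2.840.10008.1.2.4.70",   TS_LOSSLESS},     /* JPEG Lossless, SV1 */
    {"1.2.840.10008.1.2.4.80",   TS_LOSSLESS},     /* JPEG-LS Lossless */
    {"1.2.840.10008.1.2.4.81",   TS_LOSSY},        /* JPEG-LS Near-Lossless */
    {"1.2.840.10008.1.2.4.90",   TS_LOSSLESS},     /* JPEG 2000 Lossless Only */
    {"1.2.840.10008.1.2.4.91",   TS_LOSSY},        /* JPEG 2000, may be lossy */
    {"1.2.840.10008.1.2.5",      TS_LOSSLESS},     /* RLE Lossless */
};

TsKind classify_transfer_syntax(const char *uid)
{
    for (size_t i = 0; i < sizeof kTransferSyntaxes / sizeof kTransferSyntaxes[0]; i++)
        if (strcmp(uid, kTransferSyntaxes[i].uid) == 0)
            return kTransferSyntaxes[i].kind;
    return TS_UNKNOWN;
}

/* Copies (0002,0010) into out as a C string. Returns 0 on success, -1 otherwise. */
int read_transfer_syntax(const uint8_t *f, size_t len, char *out, size_t out_size)
{
    size_t pos = 132;                          /* 128-byte preamble + "DICM" */
    if (len < 132 || memcmp(f + 128, "DICM", 4) != 0)
        return -1;
    while (pos + 8 <= len) {
        uint16_t group = (uint16_t)(f[pos] | f[pos + 1] << 8);
        uint16_t elem  = (uint16_t)(f[pos + 2] | f[pos + 3] << 8);
        if (group != 0x0002)
            break;                             /* left the file meta group */
        const uint8_t *vr = f + pos + 4;
        size_t vlen, hdr;
        if (!memcmp(vr, "OB", 2) || !memcmp(vr, "OW", 2) || !memcmp(vr, "SQ", 2) ||
            !memcmp(vr, "UN", 2) || !memcmp(vr, "UT", 2)) {
            if (pos + 12 > len)
                return -1;                     /* 2 reserved bytes + 4-byte length */
            vlen = (size_t)f[pos + 8] | (size_t)f[pos + 9] << 8 |
                   (size_t)f[pos + 10] << 16 | (size_t)f[pos + 11] << 24;
            hdr = 12;
        } else {
            vlen = (size_t)(f[pos + 6] | f[pos + 7] << 8);   /* 2-byte length */
            hdr = 8;
        }
        if (pos + hdr + vlen > len)
            return -1;
        if (elem == 0x0010) {
            /* UI values are padded to even length with a trailing NUL. */
            while (vlen > 0 && (f[pos + hdr + vlen - 1] == 0 || f[pos + hdr + vlen - 1] == ' '))
                vlen--;
            if (vlen + 1 > out_size)
                return -1;
            memcpy(out, f + pos + hdr, vlen);
            out[vlen] = '\0';
            return 0;
        }
        pos += hdr + vlen;
    }
    return -1;
}
```

If the result is TS_UNCOMPRESSED, recompressing the pixel data (or zipping the files) should save a lot of space; if it is already lossless or lossy, further gains will be small.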

MPEG-TS fundamentals

I have read some tutorials about the MPEG transport stream, but there are 2 fundamental issues I do not understand:
1. An MPEG-TS muxer receives PES packets from audio and video and outputs MPEG-TS packets. How does it do this muxing? Is it that whenever a packet from any program is waiting on its input, the muxer wakes up and slices that PES into MPEG-TS packets?
2. Can the user select the bit rate of the MPEG-TS muxer output? What is the connection between the encoding bit rate and the MPEG-TS bit rate?
Thank you very much,
Ran
MPEG-2 TS muxing is a complex art form. Suggested reading: the MPEG-2 TS specification, SPTS/MPTS, VBR vs. CBR, the hypothetical reference decoder and its buffers (EB, MB, TB), jitter and drift.
A very short answer to your questions can be summarized like this:
For each encoder, on the other end of the line there is a decoder that wants to display a video frame (or audio frame) every frame interval. This frame needs to be decoded before its presentation time. If this frame uses other frames as references, they also need to be decoded before it is presented.
When multiplexing, the data must arrive sufficiently early before presentation. A video frame to be presented at time n must be available at the decoder at time n - x, where x is a measure of time that depends on the decoder's buffer model (see MB, TB, EB). If the TS bit rate is too low, "underflow" occurs and the video does not reach the decoder on time. If the TS bit rate is too high, "overflow" occurs, and the buffers have to drop packets, which also creates visual artifacts.
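The underflow/overflow trade-off can be made concrete with a deliberately simplified model. This is a minimal C sketch, assuming a single decoder buffer (not the full TB/MB/EB chain of the real transport stream target decoder), a constant delivery rate for this elementary stream, and instantaneous removal of frame i at start_delay_us + i * frame_us. check_buffer and MuxResult are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MUX_OK, MUX_UNDERFLOW, MUX_OVERFLOW } MuxResult;

/* Returns the first failure and stores the failing frame index in *bad_frame. */
MuxResult check_buffer(const uint32_t *frame_bits, size_t n,
                       uint64_t rate_bps, uint64_t frame_us,
                       uint64_t start_delay_us, uint64_t buffer_bits,
                       size_t *bad_frame)
{
    uint64_t total = 0;                          /* all bits of the stream */
    for (size_t i = 0; i < n; i++)
        total += frame_bits[i];

    uint64_t removed = 0;                        /* bits already decoded */
    for (size_t i = 0; i < n; i++) {
        uint64_t t_us = start_delay_us + (uint64_t)i * frame_us;
        uint64_t arrived = rate_bps * t_us / 1000000u;   /* delivered by t */
        if (arrived > total)
            arrived = total;                     /* cannot send more than exists */
        uint64_t fullness = arrived - removed;   /* occupancy just before decode */
        if (fullness < frame_bits[i]) {          /* frame not complete in time */
            *bad_frame = i;
            return MUX_UNDERFLOW;
        }
        if (fullness > buffer_bits) {            /* data arrived too early */
            *bad_frame = i;
            return MUX_OVERFLOW;
        }
        removed += frame_bits[i];
    }
    return MUX_OK;
}
```

For example, with 40,000-bit frames at 25 fps (an average of 1 Mbit/s), an 80 ms start delay and a 200,000-bit buffer, a delivery rate of 1 Mbit/s passes, 500 kbit/s underflows at the second frame, and 10 Mbit/s overflows immediately: the TS rate allocated to a stream has to track the encoder's rate within what the decoder buffer can absorb.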
