How to extract motion vectors from H.264 AVC CMBlockBufferRef after VTCompressionSessionEncodeFrame - ios

I'm trying to read and understand the CMBlockBufferRef representation of an H.264 AVC 1/30 s frame.
The buffer and the encapsulating CMSampleBufferRef are created using a VTCompressionSessionRef.
https://gist.github.com/petershine/de5e3d8487f4cfca0a1d
The H.264 data is represented as an AVC memory buffer, a CMBlockBufferRef obtained from the compressed sample.
Without fully decompressing again, I'm trying to extract motion vectors or predictions from this CMBlockBufferRef.
I believe that for the fastest performance, reading the data buffer byte by byte using CMBlockBufferGetDataPointer() is necessary.
However, I'm having trouble finding the right way to read the data buffer with the intention of finding and extracting motion vectors or predictions.
Is there no way at all to do this without decompressing or using ffmpeg?
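For reference, a minimal sketch of the byte-level walk over the buffer, assuming the usual VideoToolbox output of AVCC framing with 4-byte big-endian NAL length prefixes and a contiguous block buffer named blockBuffer (check the avcC parameter sets if your length size differs). It only locates the slice NAL units; the motion vectors themselves sit inside the entropy-coded (CAVLC/CABAC) slice data, so extracting them still requires parsing the slice header and macroblock layer, i.e. a partial decode:
size_t lengthAtOffset = 0, totalLength = 0;
char *data = NULL;
OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, &lengthAtOffset, &totalLength, &data);
if (status == kCMBlockBufferNoErr && lengthAtOffset == totalLength) { // contiguous buffer
    size_t offset = 0;
    while (offset + 5 <= totalLength) {
        uint32_t nalLength = 0;
        memcpy(&nalLength, data + offset, 4);          // 4-byte AVCC length prefix
        nalLength = CFSwapInt32BigToHost(nalLength);   // stored big-endian
        uint8_t nalType = (uint8_t)data[offset + 4] & 0x1F; // 1 = non-IDR slice, 5 = IDR slice
        // The slice header and macroblock data (where MVs are coded) start at data + offset + 5.
        offset += 4 + nalLength;
    }
}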

Related

How to let FFMPEG fetch frames from OpenCV and stream them to HTTP server

There is a camera that shoots at 20 frames per second; each frame is 4000x3000 pixels.
The frames are sent to a piece of software that contains OpenCV. OpenCV resizes the frames to 1920x1080, and then they must be sent to FFMPEG to be encoded to H264 or H265 using Nvidia NVENC.
The encoded video is then streamed over HTTP to a maximum of 10 devices.
The infrastructure is crazy good (10 Gb LAN) with state-of-the-art switches, routers, etc.
Right now, I can get 90 FPS when encoding the images from an NVMe SSD, which means that the required encoding speed is achieved.
The question is: how do I get the images from OpenCV to FFMPEG?
The stream will be watched on a web app built with the MERN stack (in case that is relevant).
For cv::Mat you have cv::VideoWriter. If you wish to use FFmpeg, assuming the Mat is continuous, which can be enforced:
if (!mat.isContinuous())
{
    mat = mat.clone();
}
you can feed mat.data into sws_scale, wrapped in the source-pointer and stride arrays that sws_scale expects:
const uint8_t *srcSlice[] = { mat.data };
const int srcStride[] = { static_cast<int>(mat.step) };
sws_scale(videoSampler, srcSlice, srcStride, 0, mat.rows, videoFrame->data, videoFrame->linesize);
or directly into AVFrame
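Putting those pieces together, here is a minimal sketch of one way to go from a BGR cv::Mat to a YUV420P AVFrame ready for the encoder (the function name bgrMatToYuvFrame is illustrative and the target pixel format is an assumption; in real code you would cache the SwsContext instead of recreating it per frame):
extern "C" {
#include <libswscale/swscale.h>
#include <libavutil/frame.h>
}
#include <opencv2/opencv.hpp>

AVFrame *bgrMatToYuvFrame(const cv::Mat &bgr) {
    AVFrame *frame = av_frame_alloc();
    frame->format = AV_PIX_FMT_YUV420P;
    frame->width  = bgr.cols;
    frame->height = bgr.rows;
    av_frame_get_buffer(frame, 0);                       // allocate the YUV planes

    SwsContext *sws = sws_getContext(bgr.cols, bgr.rows, AV_PIX_FMT_BGR24,
                                     bgr.cols, bgr.rows, AV_PIX_FMT_YUV420P,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    const uint8_t *srcSlice[] = { bgr.data };
    const int srcStride[]     = { static_cast<int>(bgr.step) };
    sws_scale(sws, srcSlice, srcStride, 0, bgr.rows, frame->data, frame->linesize);
    sws_freeContext(sws);
    return frame;                                        // hand this to avcodec_send_frame()
}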
For cv::cuda::GpuMat, a VideoWriter implementation is not available, but you can use the NVIDIA Video Codec SDK and similarly feed cv::cuda::GpuMat::data into NvEncoderCuda; just make sure your GpuMat has 4 channels (BGRA):
NV_ENC_BUFFER_FORMAT eFormat = NV_ENC_BUFFER_FORMAT_ABGR;
std::unique_ptr<NvEncoderCuda> pEnc(new NvEncoderCuda(cuContext, nWidth, nHeight, eFormat));
...
cv::cuda::cvtColor(srcIn, srcIn, cv::ColorConversionCodes::COLOR_BGR2BGRA);
NvEncoderCuda::CopyToDeviceFrame(cuContext, srcIn.data, 0, (CUdeviceptr)encoderInputFrame->inputPtr,
    (int)encoderInputFrame->pitch,
    pEnc->GetEncodeWidth(),
    pEnc->GetEncodeHeight(),
    CU_MEMORYTYPE_DEVICE,
    encoderInputFrame->bufferFormat,
    encoderInputFrame->chromaOffsets,
    encoderInputFrame->numChromaPlanes);
Here's my complete sample of using GpuMat with NVIDIA Video Codec SDK

converting pointcloud data from mmwave sensor to laserscan

I am using a TI mmWave 1642 EVM sensor to generate pointcloud data. For processing the data, I am using an Intel NUC.
I am facing the problem of converting the pointcloud data from the mmwave sensor to a laserscan.
By launching rviz_1642_2d.launch, I am able to see the pointcloud data in rviz.
How can I convert the pointcloud data generated by the mmwave sensor to a laserscan?
First of all, this conversion is not straightforward, since a pointcloud describes an unordered set of 3D points in the world. A laser scan, on the other hand, is a well-parametrized and ordered 2D description of equiangular distance measurements.
Therefore, converting a pointcloud into a laserscan will cause a massive loss of information.
However, there are packages like pointcloud_to_laserscan that do the conversion for you; furthermore, you can configure how the conversion should be applied.
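To make the information loss concrete, here is a minimal standalone sketch of the naive reduction (bin points near the scan plane by angle and keep the closest range per bin); the types and parameters are illustrative, not the pointcloud_to_laserscan implementation or the ROS message types:
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Point { float x, y, z; };

std::vector<float> toLaserScan(const std::vector<Point> &cloud,
                               float angleMin, float angleMax,
                               float angleIncrement, float maxHeight) {
    size_t bins = static_cast<size_t>((angleMax - angleMin) / angleIncrement) + 1;
    std::vector<float> ranges(bins, std::numeric_limits<float>::infinity());
    for (const Point &p : cloud) {
        if (std::fabs(p.z) > maxHeight) continue;        // drop points far from the scan plane (the 3D info is lost)
        float angle = std::atan2(p.y, p.x);
        if (angle < angleMin || angle > angleMax) continue;
        size_t bin = static_cast<size_t>((angle - angleMin) / angleIncrement);
        float range = std::hypot(p.x, p.y);
        ranges[bin] = std::min(ranges[bin], range);      // keep the closest return per angular bin
    }
    return ranges;
}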

What is the data held in an AudioQueue buffer?

Can anyone tell me what data is held by an AudioQueue buffer in AudioQueue Services? Are the samples amplitudes?
Float32 *samples = (Float32*)ioData->mBuffers[0].mData;
I referred to the link http://www.davidstarke.com/2015/04/waveforms.html, and it mentions that the samples hold amplitude values. Is that correct?
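For Float32 linear PCM (the format that snippet assumes), each Float32 in mData is one amplitude sample, so a buffer can be scanned like this (a sketch; for compressed formats the buffer holds encoded packets rather than amplitudes):
Float32 *samples = (Float32 *)ioData->mBuffers[0].mData;
UInt32 count = ioData->mBuffers[0].mDataByteSize / sizeof(Float32);
Float32 peak = 0.0f;
for (UInt32 i = 0; i < count; i++) {
    Float32 a = fabsf(samples[i]);   // amplitude, typically in the range [-1.0, 1.0]
    if (a > peak) peak = a;
}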

FDK AAC encoder/decoder : Access Huffman encoded and decoded data

For the FDK AAC,
I want to access the spectral data before and after Huffman encoding/decoding in the encoder and in the decoder.
For accessing the spectral data before Huffman encoding, I am using the pSpectralCoefficient pointer and dumping 1024 samples (on the decoder side), and using qcOutChannel[ch]->quantSpec and dumping 1024 samples (on the encoder side). Is this correct?
Secondly, how do I access the Huffman-encoded signal in the encoder and in the decoder? If someone can tell me the location in the code, the name of the pointer to use, and the length of this data, I will be extremely thankful.
Thirdly, I wanted to know what the frame size is in the frequency domain (before Huffman encoding).
I am dumping 1024 samples of *pSpectralCoefficient. Is that correct?
Is it possible that some frames are 1024 samples long while others are a set of 8 sub-blocks of 128 frequency bins each? If that is possible, is there a flag that can give me this information?
Thank you for your time. I would appreciate any help with this as soon as possible.
Regards,
Akshay
To pull out that specific data from the bitstream you will need to step through the decoder and find the desired pieces of the stream. In order to do that, you have to have the AAC bitstream specification. The current AAC specification is:
ISO/IEC 14496-3:2009 "Information technology -- Coding of audio-visual objects -- Part 3: Audio"

mpeg-ts fundamentals

I read some tutorials about the MPEG transport stream, but there are two fundamental issues I do not understand:
1. The mpeg-ts muxer receives PES packets from audio and video and outputs mpeg-ts packets. How does it do this muxing? Is it that whenever a packet from any program is waiting on its input, the muxer wakes up and slices the PES into mpeg-ts packets?
2. Can the user select which bit rate the mpeg-ts muxer will output? What is the connection between the encoding rate and the mpeg-ts rate?
Thank you very much,
Ran
MPEG2-TS muxing is a complex art form. Suggested reading: the MPEG2-TS specification, SPTS/MPTS, VBR vs. CBR, the hypothetical reference decoder and its buffers (EB, MB, TB), jitter and drift.
A very short answer to your questions can be summarized like this:
For each encoder, on the other end of the line there is a decoder that wants to display a video frame (or audio frame) every frame interval. Each frame needs to be decoded before its presentation time, and if it uses other frames as references, those also need to be decoded prior to presentation.
When multiplexing, the data must arrive sufficiently ahead of presentation. A video frame to be presented at time n must be available at the decoder at time n - x, where x is an amount of time that depends on the decoder's buffer rates (see MB, TB, EB). If the TS bit rate is too low, "underflow" occurs and the video is not in the decoder on time. If the TS bit rate is too high, "overflow" occurs, and the buffers have to drop packets, which will also create visual artifacts.
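As a toy illustration of that balance (not the real T-STD model, and all numbers below are made-up assumptions), one can simulate a decoder buffer that fills at the TS mux rate and is drained by one coded picture per frame interval; a mux rate that is too low underflows, and a buffer that is too small for the incoming rate overflows:
#include <cstdio>

int main() {
    const double tsBitrate  = 4e6;     // mux rate in bits per second (assumed)
    const double frameRate  = 25.0;    // frames per second (assumed)
    const double frameBits  = 150e3;   // average coded picture size in bits (assumed)
    const double bufferBits = 600e3;   // decoder buffer size in bits (assumed)

    double fill = 0.0;
    for (int frame = 0; frame < 60; ++frame) {
        fill += tsBitrate / frameRate;            // bits arriving during one frame interval
        if (fill > bufferBits) {                  // more data than the buffer can hold
            std::printf("overflow at frame %d\n", frame);
            fill = bufferBits;                    // a real decoder would have to drop packets
        }
        if (fill < frameBits) {                   // picture not fully in the buffer in time
            std::printf("underflow at frame %d\n", frame);
        } else {
            fill -= frameBits;                    // decoder removes the picture
        }
    }
    return 0;
}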
