How to render to a DirectX11 texture from H.264 NV12 IMFSample output? - directx

Are there any good examples that show how to render the IMFSample output from the H.264 decoder? My scenario uses a 4K H.264 stream, and the PC I am targeting will only accept 1080p through the DXGI buffers. The H.264 decoder itself handles 4K, so I need a way to feed that NV12 IMFSample directly to the DirectX 11 renderer. I have already tried the DX11VideoRenderer sample, but it fails because this particular IMFSample does not expose an IMFDXGIBuffer interface.
It looks like in DX11VideoRenderer the input IMFDXGIBuffer is NV12, and that renders successfully in hardware, so it seems logical that a non-DXGI buffer of NV12 data should be acceptable too?
Perhaps I need to create an ID3D11Texture2D texture or resource with an NV12 format? I found examples of how to create a texture from a file, but none for how to create a texture from a sample, which would seem to be even more useful. And if I can create an NV12 texture, how do I figure out the SysMemPitch and SysMemSlicePitch values in the D3D11_SUBRESOURCE_DATA structure for NV12?
Any help would be really appreciated! Thank you.

I was able to find a complete example that renders an NV12 sample to the screen. Although there are some simple stride-calculation errors in how it renders its own example image, the actual rendering code does work correctly. It appears to be an old Microsoft sample that I cannot find any other information about.
D3D11NV12Rendering
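For reference, here is a rough sketch of the texture-creation step, i.e. copying a system-memory NV12 IMFSample into an ID3D11Texture2D through the initial-data path. It is untested, CreateNV12TextureFromSample is a made-up helper name, and it assumes the decoder output is a tightly packed NV12 frame with no extra stride (so SysMemPitch is just the width and SysMemSlicePitch is the full 3/2 * width * height frame size); whether NV12 textures can be created with initial data also depends on the driver and feature level:

#include <d3d11.h>
#include <mfobjects.h>

// Untested sketch: upload one NV12 frame from a contiguous IMFMediaBuffer
// into a new D3D11 texture. Assumes even width/height and a tightly packed
// buffer (Y plane followed immediately by the interleaved UV plane).
HRESULT CreateNV12TextureFromSample(ID3D11Device* device, IMFSample* sample,
                                    UINT width, UINT height,
                                    ID3D11Texture2D** outTex)
{
    IMFMediaBuffer* buffer = nullptr;
    HRESULT hr = sample->ConvertToContiguousBuffer(&buffer);
    if (FAILED(hr)) return hr;

    BYTE* data = nullptr;
    DWORD maxLen = 0, curLen = 0;
    hr = buffer->Lock(&data, &maxLen, &curLen);
    if (FAILED(hr)) { buffer->Release(); return hr; }

    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_NV12;              // needs Windows 8+ and driver support
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE; // sample via R8 + R8G8 SRVs in the shader

    // NV12 layout: width*height bytes of Y, then width*height/2 bytes of
    // interleaved UV. For a tightly packed frame the row pitch is the width
    // and the slice pitch is the whole frame (3/2 * width * height).
    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = data;
    init.SysMemPitch = width;
    init.SysMemSlicePitch = width * height * 3 / 2;

    hr = device->CreateTexture2D(&desc, &init, outTex);

    buffer->Unlock();
    buffer->Release();
    return hr;
}

If the decoder reports a stride different from the width (MF_MT_DEFAULT_STRIDE on the output media type), the rows have to be copied plane by plane instead of handing the buffer over directly.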

Related

What factors determine DXGI_FORMAT?

I am not familiar with DirectX, but I ran into a problem in a small project, part of which involves capturing DirectX data. I hope the explanation below makes some sense.
General question:
I would like to know what factors determine the DXGI_FORMAT of a texture in the backbuffer (hardware? OS? application? DirectX version?). More importantly, when capturing a texture from the backbuffer, is it possible to receive a texture in a desired format by supplying that format as a parameter, with the format automatically converted if necessary?
Specifics about my problem:
I capture screens from games using Open Broadcaster Software (OBS) and process them using a specific library (OpenCV) prior to streaming. I noticed that, following updates to both Windows and OBS, I get 'DXGI_FORMAT_R10G10B10A2_UNORM' as the DXGI_FORMAT. This is a problem for me because, as far as I know, OpenCV does not provide a convenient way to build an OpenCV object when colors are 10 bits. Below are a few relevant lines from the modified OBS source file.
d3d11_copy_texture(data.texture, backbuffer);
...
hlog(toStr(data.format)); // prints 24 = DXGI_FORMAT_R10G10B10A2_UNORM
...
ID3D11Texture2D* tex;
bool success = create_d3d11_stage_surface(&tex);
if (success) {
...
HRESULT hr = data.context->Map(tex, subresource, D3D11_MAP_READ, 0, &mappedTex);
...
Mat frame(data.cy, data.cx, CV_8UC4, mappedTex.pData, (int)mappedTex.RowPitch); //This creates an OpenCV Mat object.
//No support for 10-bit colors. Expects 8-bit colors (CV_8UC4 argument).
//When the resulting Mat is viewed, colours are jumbled (Probably because 10-bits did not fit into 8-bits).
Before the updates (when I was working on this a year ago), I was probably receiving DXGI_FORMAT = DXGI_FORMAT_B8G8R8A8_UNORM, because the code above used to work.
Now I wonder what changed, and whether I can modify the source code of OBS to receive data with the desired DXGI_FORMAT.
The 'create_d3d11_stage_surface' method called above sets the DXGI_FORMAT, but I am not sure whether it means 'give me data in this DXGI_FORMAT' or 'I know you work with this format, give me what you have'.
static bool create_d3d11_stage_surface(ID3D11Texture2D **tex)
{
HRESULT hr;
D3D11_TEXTURE2D_DESC desc = {};
desc.Width = data.cx;
desc.Height = data.cy;
desc.Format = data.format;
...
I hoped that overriding desc.Format with DXGI_FORMAT_B8G8R8A8_UNORM would result in that format being used for the data returned by the ID3D11DeviceContext::Map call above, so I would get data in the specified format. But that did not work.
The choice of render-target format is up to the application, but it has to pick one supported by the Direct3D hardware feature level. Formats for render targets in swap chains are usually display scanout formats:
DXGI_FORMAT_R8G8B8A8_UNORM
DXGI_FORMAT_R8G8B8A8_UNORM_SRGB
DXGI_FORMAT_B8G8R8A8_UNORM
DXGI_FORMAT_B8G8R8A8_UNORM_SRGB
DXGI_FORMAT_R10G10B10A2_UNORM
DXGI_FORMAT_R16G16B16A16_FLOAT
DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM (rare)
See the DXGI documentation for the full list of supported formats and usages by feature level.
Direct3D 11 does not do format conversions when you copy resources such as copying to staging render textures, so if you want to do a format conversion you'll need to handle that yourself. Note that CPU-side conversion code for all the DXGI formats can be found in DirectXTex.
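If you do end up with DXGI_FORMAT_R10G10B10A2_UNORM in the staging texture and just need something that OpenCV's CV_8UC4 can consume, a CPU-side unpack along these lines would drop the extra bits (an untested sketch; UnpackR10G10B10A2ToBGRA8 is a made-up helper, and data.cx, data.cy and mappedTex refer to the names in the snippet above):

#include <cstdint>
#include <vector>

// Unpack DXGI_FORMAT_R10G10B10A2_UNORM pixels from a mapped staging texture
// into a tightly packed 8-bit BGRA buffer (the channel order OpenCV expects).
std::vector<uint8_t> UnpackR10G10B10A2ToBGRA8(const uint8_t* src, int width,
                                              int height, int srcRowPitch)
{
    std::vector<uint8_t> dst(static_cast<size_t>(width) * height * 4);
    for (int y = 0; y < height; ++y) {
        const uint32_t* row =
            reinterpret_cast<const uint32_t*>(src + y * srcRowPitch);
        uint8_t* out = dst.data() + static_cast<size_t>(y) * width * 4;
        for (int x = 0; x < width; ++x) {
            uint32_t p = row[x];
            uint32_t r = (p      ) & 0x3FF;  // bits 0-9
            uint32_t g = (p >> 10) & 0x3FF;  // bits 10-19
            uint32_t b = (p >> 20) & 0x3FF;  // bits 20-29
            uint32_t a = (p >> 30) & 0x3;    // bits 30-31
            out[x * 4 + 0] = static_cast<uint8_t>(b >> 2);  // keep top 8 of 10 bits
            out[x * 4 + 1] = static_cast<uint8_t>(g >> 2);
            out[x * 4 + 2] = static_cast<uint8_t>(r >> 2);
            out[x * 4 + 3] = static_cast<uint8_t>(a * 85);  // expand 2 bits to 0..255
        }
    }
    return dst;
}

// Usage after the Map call succeeds (the resulting buffer can back a
// cv::Mat(data.cy, data.cx, CV_8UC4, bgra.data(), data.cx * 4)):
//   auto bgra = UnpackR10G10B10A2ToBGRA8(
//       static_cast<const uint8_t*>(mappedTex.pData),
//       data.cx, data.cy, (int)mappedTex.RowPitch);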
It is the application that decides the format. The simplest one would be R8G8B8A8, which simply stores RGB and alpha values. But if the developer decides to use HDR, the back buffer will more likely be something like R11G11B10_FLOAT, because it can store a much wider range of values there, at the cost of the alpha channel. If the game is, for example, black and white, there is no need to keep all the RGB channels in the back buffer, and a simpler format could be used. I hope this helps.

Mov file has more frames than written/Possible iOS AVAsset writer usage issue

I am manually generating a .mov video file.
Here is a link to an example file: link. I wrote a few image frames, and then after a long break wrote approximately 15 more image frames, just to emphasise my point for debugging purposes. When I extract images from the video, ffmpeg returns around 400 frames instead of the 15-20 I expected. Is this because the API I am using is inserting these image frames automatically? Is it part of the .mov file format that requires this? Or is it due to the way the library is extracting the image frames from the video? I have tried searching the internet but could not arrive at an answer.
My use case is that I am trying to write the current "sensor data" (from Core Motion) while writing a video. For each frame I receive from the camera, I use "AppendPixelBuffer" to write the frame to the video and then write a corresponding row to a CSV file.
The end result I want is a 1:1 ratio of frames in the video to rows in the CSV file. I have confirmed I am writing the CSV file correctly using various counters etc., so my issue is clearly my understanding of the movie format or the API.
Thanks for any help.
UPDATED
It looks like your ffmpeg extractor is wrong. To extract only the timestamped frames (and not frames sampled at 24Hz) in your file, try this:
ffmpeg -i video.mov -r 1/1 image-%03d.jpeg
This gives me the 20 frames expected.
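If the frame timestamps are very irregular, it may also be worth trying timestamp passthrough, which writes every decoded frame out exactly once instead of resampling to a fixed rate (untested against this file, and assuming a reasonably recent ffmpeg):
ffmpeg -i video.mov -vsync 0 image-%03d.jpeg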
OLD ANSWER
ffprobe reports that your video has a frame rate of 2.19 frames/s and a duration of 17s, which gives 2.19 * 17 = 37 frames, which is closer to your expected 15-20 than ffmpeg's 400.
So maybe the ffmpeg extractor is at fault?
Hard to say if you don't show how you encode and decode the file.

In OpenCV many conversions to JPG using imEncode fail

For a specific purpose I am trying to convert an AVI video to a kind of Moving JPEG format using OpenCV. In order to do so I read images from the source video, convert them to JPEG using imEncode, and write these JPEG images to the target video.
After several hundred frames, the size of the resulting JPEG image suddenly nearly doubles. Here's a list of sizes:
68045
68145
68139
67885
67521
67461
67537
67420
67578
67573
67577
67635
67700
67751
127800
127899
127508
127302
126990
126904
Anybody got a clue what's going on here?
By the way: I'm using OpenCV.Net as a wrapper for OpenCV.
Thanks a lot in advance,
Paul
I found the solution. If I explicitly pass the third parameter to imEncode (for JPEG encoding this indicates the quality of the encoding, ranging from 0 to 100) instead of relying on the default (95), the problem disappears. It's likely this is a bug in OpenCV.Net, but it could also be a bug in OpenCV itself.
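For anyone hitting the same thing from C++ OpenCV rather than the OpenCV.Net wrapper, passing the quality explicitly looks roughly like this (a sketch, not the poster's actual code):

#include <opencv2/opencv.hpp>
#include <vector>

// Encode one frame to an in-memory JPEG with an explicit quality setting
// rather than relying on the default.
std::vector<uchar> encodeFrame(const cv::Mat& frame)
{
    std::vector<uchar> jpegBytes;
    std::vector<int> params = { cv::IMWRITE_JPEG_QUALITY, 95 };
    cv::imencode(".jpg", frame, jpegBytes, params);
    return jpegBytes;
}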

Get PTS from raw H264 mdat generated by iOS AVAssetWriter

I'm trying to simultaneously read and write an H.264 .mov file written by AVAssetWriter. I managed to extract individual NAL units, pack them into ffmpeg's AVPackets and write them into another video format using ffmpeg. It works, and the resulting file plays well, except that the playback speed is not right. How do I calculate the correct PTS/DTS values from raw H.264 data? Or maybe there exists some other way to get them?
Here's what I've tried:
Limiting the capture min/max frame rate to 30 and assuming that the output file will be 30 fps. In fact its fps is always less than the values I set, and I don't think the fps is constant from packet to packet.
Remembering each written sample's presentation timestamp, assuming that samples map one-to-one to NALUs, and applying the saved timestamps to the output packets. This doesn't work.
Setting PTS to 0 or AV_NOPTS_VALUE. Doesn't work.
From googling about it I understand that raw H.264 data usually doesn't contain any timing info. It can sometimes have some timing info inside SEI, but the files that I use don't have it. On the other hand, there are some applications that do exactly what I'm trying to do, so I suppose it is possible somehow.
You will either have to generate them yourself, or access the atoms containing timing information in the MP4/MOV container to generate PTS/DTS information. FFmpeg's mov.c in libavformat might help.
Each sample/frame you write with AVAssetWriter will map one-to-one with the VCL NALs. If all you are doing is converting containers, have FFmpeg do all the heavy lifting; it will properly maintain the timing information when going from one container format to another.
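For the pure conversion case, a plain container remux (no re-encode) keeps the original timestamps; something along these lines (input.mov and output.mp4 are placeholder names):
ffmpeg -i input.mov -c copy output.mp4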
The bitstream generated by AVAssetWriter does not contain SEI data. It only contains SPS/PPS/I/P frames. The SPS also does not contain VUI or HRD parameters.
-- Edit --
Also, keep in mind that if you are saving PTS information from the CMSampleBufferRefs, then the time base may be different from that of the target container. For instance, AVFoundation's time base is nanoseconds, while an FLV file's is milliseconds.
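As a concrete illustration of that last point, here is a sketch of rescaling a saved CMTime into the output stream's time base with ffmpeg's av_rescale_q (untested; set_packet_timing is a made-up helper, and dts == pts is only valid because the bitstream described above has no B-frames):

extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/mathematics.h>
}

// cmValue / cmTimescale is the CMTime captured when the frame was appended
// with AVAssetWriter; outStream is the stream in the output AVFormatContext.
void set_packet_timing(AVPacket* pkt, AVStream* outStream,
                       int64_t cmValue, int32_t cmTimescale)
{
    AVRational srcTb = { 1, cmTimescale };  // source time base, e.g. {1, 600}
    pkt->pts = av_rescale_q(cmValue, srcTb, outStream->time_base);
    pkt->dts = pkt->pts;                    // OK only because there are no B-frames
    pkt->duration = 0;                      // or rescale the frame duration the same way
}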

Load RAW YUV video in OPENCV

I have a problem loading a RAW YUV video in OpenCV. I can play it in mplayer with the following command:
mplayer myvideo.raw -rawvideo w=1280:h=1024:fps=30:y8 -demuxer rawvideo
My code for load in OpenCV is:
CvCapture* capture=cvCaptureFromFile("C:\\myvideo.raw");
cvCaptureFromFile always returns NULL. But if I try a normal AVI file, the code runs normally (capture is not NULL).
I'm working with the latest version of OpenCV under Windows 7.
EDIT: Output messages are
[IMGUTILS # 0036f724] Picture size 0x0 is invalid
[image2 # 009f3300] Could not find codec parameters (Video: rawvideo, yuv420p)
Thanks
OpenCV uses ffmpeg as a back-end; however, it includes only a subset of ffmpeg's functionality. What you can try is installing some codecs (K-Lite helped me some time ago).
But, if your aim is to obtain raw YUV in OpenCV, the answer is "not possible".
OpenCV is hardcoded to convert every input format to BGR, so even if you were able to open the raw input, it would automatically convert it to BGR before passing it on. There is no way around that; the only options are to use a different capture library or to hack into OpenCV.
What you can do (to simulate YUV input) is to capture the AVI, convert it to YUV with
cvtColor(..., CV_BGR2YCrCb /* or CV_BGR2YUV */);
and then process it.
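If the goal is really to work on the raw frames themselves rather than whatever OpenCV's capture hands back, reading the file directly is straightforward for this layout. A sketch assuming the y8 parameters from the mplayer command above (one 8-bit luma plane of 1280x1024 per frame, no headers between frames):

#include <fstream>
#include <opencv2/opencv.hpp>

int main()
{
    const int width = 1280, height = 1024;                         // from the mplayer command
    const size_t frameSize = static_cast<size_t>(width) * height;  // y8: 1 byte per pixel

    std::ifstream file("C:\\myvideo.raw", std::ios::binary);
    cv::Mat frame(height, width, CV_8UC1);

    // Read one full luma plane per iteration straight into the Mat's buffer.
    while (file.read(reinterpret_cast<char*>(frame.data), frameSize)) {
        cv::imshow("raw y8 frame", frame);
        if (cv::waitKey(33) == 27) break;                           // ~30 fps, Esc to quit
    }
    return 0;
}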
