D3D11CreateDeviceAndSwapChain slow - directx

Lately I've been bothered by the slow startup of my D3D11 app, so I started investigating and found that the culprit is D3D11CreateDeviceAndSwapChain. This single call takes roughly 1.5 seconds, which seems crazy slow to me. Is this also your experience?
This is the setup code:
DXGI_SWAP_CHAIN_DESC swap_chain_desc = {};
swap_chain_desc.BufferDesc.Width = window->window_width;
swap_chain_desc.BufferDesc.Height = window->window_height;
swap_chain_desc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
swap_chain_desc.SampleDesc.Count = 1;
swap_chain_desc.SampleDesc.Quality = 0;
swap_chain_desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swap_chain_desc.BufferCount = 2;
swap_chain_desc.OutputWindow = window->window_handle;
swap_chain_desc.BufferDesc.RefreshRate.Numerator = 60;
swap_chain_desc.BufferDesc.RefreshRate.Denominator = 1;
swap_chain_desc.Windowed = true;
swap_chain_desc.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;
swap_chain_desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
D3D_FEATURE_LEVEL feature_level = D3D_FEATURE_LEVEL_11_0;
D3D_FEATURE_LEVEL supported_feature_level;
UINT flags = 0;
#ifdef DEBUG
flags = D3D11_CREATE_DEVICE_DEBUG;
#endif
HRESULT hr = D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, flags, &feature_level, 1, D3D11_SDK_VERSION,
&swap_chain_desc, &context.swapChain, &context.device, &supported_feature_level, &context.context);

First, you should consider D3D11CreateDeviceAndSwapChain deprecated: the device should be created separately, and there are a few variants of swap chain creation depending on what your application is (HWND vs. CoreWindow, for example).
There is no good reason for device creation to be this slow, but in my past experience, slow device creation was due to D3D hooking by applications like Steam. The easiest way to confirm this is to take a profiling capture of your initialization and look at the call stacks and DLLs involved underneath the device creation call.
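Alongside a full profiling capture, you can time the call directly. This quantifies the delay and lets you compare runs, for example with and without overlay software such as Steam running. Below is a minimal sketch using only the C++ standard library. TimeMs and TimeSleepMs are hypothetical helpers. The D3D11 call appears only in a comment because it needs Windows.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so the result is never negative.
template <typename Fn>
double TimeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Quick self-check of the helper: sleeping for `ms` should take at least that long.
double TimeSleepMs(int ms) {
    return TimeMs([ms] { std::this_thread::sleep_for(std::chrono::milliseconds(ms)); });
}

// Hypothetical usage on Windows, wrapping the call from the question:
//   HRESULT hr = E_FAIL;
//   const double ms = TimeMs([&] {
//       hr = D3D11CreateDeviceAndSwapChain(/* same arguments as above */);
//   });
```

A timer only tells you how long the call takes. The profiler capture is still what shows which DLLs are responsible.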

Related

error when creating 2d texture with dynamic format

I am pretty new to DirectX, and I am struggling with handling resources.
Okay, first I created a texture that I can read/write on the GPU, and it worked well.
Now, as you can see in my code, I want to read this texture on the CPU as well (from the application side), so I changed the usage from USAGE_DEFAULT to USAGE_DYNAMIC.
Microsoft::WRL::ComPtr<ID3D11Texture2D> outputTexture;
D3D11_TEXTURE2D_DESC outputTex_desc;
outputTex_desc.Format = DXGI_FORMAT_R32_FLOAT;
outputTex_desc.Width = 3;
outputTex_desc.Height = 3;
outputTex_desc.MipLevels = 1;
outputTex_desc.ArraySize = 1;
outputTex_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS |
D3D11_BIND_SHADER_RESOURCE;
outputTex_desc.SampleDesc.Count = msCount;
outputTex_desc.SampleDesc.Quality = msQuality;
outputTex_desc.Usage = D3D11_USAGE_DYNAMIC;
outputTex_desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
outputTex_desc.MiscFlags = 0;
// CREATE 'TEXTURE'
device->CreateTexture2D( // FAIL HERE !!!
&outputTex_desc,
nullptr,
outputTexture.GetAddressOf());
// CREATE 'SRV'
...
// CREATE 'UAV'
...
It starts failing exactly when it executes device->CreateTexture2D().
Any advice would be amazing.
The first two steps in debugging any Direct3D 11 program are:
Make sure you are checking every HRESULT for either success (SUCCEEDED macro) or failure (FAILED macro). If it is safe to ignore the return value at runtime, then the function returns void. See ThrowIfFailed.
Enable the Direct3D debug device and look for debug output (a.k.a. use D3D11_CREATE_DEVICE_DEBUG in your Debug configuration).
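ThrowIfFailed is a helper you typically define in your own project (the DirectX project templates and DirectX Tool Kit samples do this). It is not a Windows SDK function. Below is a portable sketch of the same pattern. HResult is a hypothetical stand-in for the Windows HRESULT type so the code builds anywhere; in a real D3D11 project you would use HRESULT and the FAILED macro from <windows.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the 32-bit Windows HRESULT type.
using HResult = std::int32_t;

// Same test the FAILED macro performs: a negative HRESULT means failure,
// so success codes such as S_OK (0) and S_FALSE (1) pass.
constexpr bool IsFailure(HResult hr) { return hr < 0; }

class ComException : public std::runtime_error {
public:
    explicit ComException(HResult hr) : std::runtime_error(Format(hr)), result_(hr) {}
    HResult Result() const { return result_; }

private:
    static std::string Format(HResult hr) {
        char buf[64];
        std::snprintf(buf, sizeof(buf), "Failure with HRESULT of %08X",
                      static_cast<unsigned>(static_cast<std::uint32_t>(hr)));
        return buf;
    }
    HResult result_;
};

// Turns a failed HRESULT into an exception so it cannot be silently ignored.
inline void ThrowIfFailed(HResult hr) {
    if (IsFailure(hr)) throw ComException(hr);
}
```

Wrapping every call that returns an HRESULT this way stops a failure such as the CreateTexture2D one below from going unnoticed.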
If you enable the Direct3D debug device, you'll get detailed information on why the API returns failure codes in many cases.
If you did that, you'd see the error for this code:
D3D11 ERROR: ID3D11Device::CreateTexture2D: A D3D11_USAGE_DYNAMIC Resource
may only have the D3D11_CPU_ACCESS_WRITE CPUAccessFlags set.
[ STATE_CREATION ERROR #98: CREATETEXTURE2D_INVALIDCPUACCESSFLAGS]
In order to READ it on the CPU, you must copy it to a D3D11_USAGE_STAGING resource first. For example source code that does this, see the ScreenGrab code from DirectX Tool Kit.
You don't mention what your msCount or msQuality values are here. I assumed 1 and 0, respectively. If you use any other values, you'll get:
D3D11 ERROR: ID3D11Device::CreateTexture2D: Multisampling is not supported
with the D3D11_BIND_UNORDERED_ACCESS BindFlag. SampleDesc.Count must be 1
and SampleDesc.Quality must be 0.
[ STATE_CREATION ERROR #99: CREATETEXTURE2D_INVALIDBINDFLAGS]
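The sketch below gathers these rules in one place and checks a texture description before it reaches CreateTexture2D. It covers the two debug-layer rules above plus the no-bind-flags rule for staging resources. It uses hypothetical local stand-ins (Usage, TexDesc, CheckTexDesc) with made-up flag values, not the real d3d11.h definitions. It covers only these rules; the debug device remains the authoritative check.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the D3D11 enums/flags used in the question.
// The values are arbitrary and are NOT the ones from d3d11.h.
enum class Usage { Default, Immutable, Dynamic, Staging };
constexpr std::uint32_t kBindShaderResource  = 1u << 0;
constexpr std::uint32_t kBindUnorderedAccess = 1u << 1;
constexpr std::uint32_t kCpuAccessWrite      = 1u << 0;
constexpr std::uint32_t kCpuAccessRead       = 1u << 1;

struct TexDesc {
    Usage usage;
    std::uint32_t bindFlags;
    std::uint32_t cpuAccessFlags;
    unsigned sampleCount;
    unsigned sampleQuality;
};

// Returns nullptr if the description passes the checks discussed in this
// thread, otherwise a short reason. Only a subset of the runtime's rules.
const char* CheckTexDesc(const TexDesc& d) {
    if (d.usage == Usage::Dynamic && d.cpuAccessFlags != kCpuAccessWrite)
        return "DYNAMIC resources may only have CPU write access";
    if (d.usage == Usage::Staging && d.bindFlags != 0)
        return "STAGING resources cannot have bind flags";
    if ((d.bindFlags & kBindUnorderedAccess) != 0 &&
        (d.sampleCount != 1 || d.sampleQuality != 0))
        return "UAV textures need SampleDesc.Count = 1 and Quality = 0";
    return nullptr;
}
```

The question's description (DYNAMIC with CPU read access) fails the first check. The DEFAULT texture plus STAGING copy described below passes all three.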
In order to access a texture from the CPU (read mode), you need to create a separate staging texture, then copy your texture into it.
These are the flags for the GPU-only texture (note that I force the sample count to 1, as UAV access is not allowed for multisampled textures):
D3D11_TEXTURE2D_DESC gpuTexDesc;
gpuTexDesc.Format = DXGI_FORMAT_R32_FLOAT;
gpuTexDesc.Width = 3;
gpuTexDesc.Height = 3;
gpuTexDesc.MipLevels = 1;
gpuTexDesc.ArraySize = 1;
gpuTexDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS |
D3D11_BIND_SHADER_RESOURCE;
gpuTexDesc.SampleDesc.Count = 1;
gpuTexDesc.SampleDesc.Quality = 0;
gpuTexDesc.Usage = D3D11_USAGE_DEFAULT;
gpuTexDesc.CPUAccessFlags = 0; // no CPU access for the GPU-only texture
gpuTexDesc.MiscFlags = 0;
Then you create a second texture for reading:
D3D11_TEXTURE2D_DESC readTexDesc;
readTexDesc.Format = DXGI_FORMAT_R32_FLOAT;
readTexDesc.Width = 3;
readTexDesc.Height = 3;
readTexDesc.MipLevels = 1;
readTexDesc.ArraySize = 1;
readTexDesc.BindFlags = 0; //No bind flags allowed for staging
readTexDesc.SampleDesc.Count = 1;
readTexDesc.SampleDesc.Quality = 0;
readTexDesc.Usage = D3D11_USAGE_STAGING; //need staging flag for read
readTexDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
readTexDesc.MiscFlags = 0;
Then you can use CopyResource:
deviceContext->CopyResource(readTex, gpuTex);
Once that is done, you can finally access the texture data for reading using Map:
D3D11_MAPPED_SUBRESOURCE MappedResource;
deviceContext->Map(readTex, 0, D3D11_MAP_READ, 0, &MappedResource);
MappedResource gives you access to the data on the CPU. Once you are done processing, don't forget to Unmap the resource.
deviceContext->Unmap(readTex, 0);
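When reading through MappedResource, note that rows of the mapped staging texture are RowPitch bytes apart. RowPitch can be larger than Width * sizeof(float) because the driver may pad each row. Below is a minimal sketch in portable C++ that copies the 3x3 R32_FLOAT data row by row into a tightly packed vector. CopyPitchedR32F is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a width x height R32_FLOAT image out of a mapped buffer whose rows
// are rowPitch bytes apart into a tightly packed std::vector<float>.
// pData/rowPitch correspond to D3D11_MAPPED_SUBRESOURCE::pData/RowPitch.
std::vector<float> CopyPitchedR32F(const void* pData, std::size_t rowPitch,
                                   std::size_t width, std::size_t height) {
    const std::size_t rowBytes = width * sizeof(float);
    assert(rowPitch >= rowBytes);
    std::vector<float> out(width * height);
    const auto* src = static_cast<const std::uint8_t*>(pData);
    for (std::size_t y = 0; y < height; ++y) {
        // Skip any padding at the end of each source row.
        std::memcpy(out.data() + y * width, src + y * rowPitch, rowBytes);
    }
    return out;
}
```

Call it between Map and Unmap, for example CopyPitchedR32F(MappedResource.pData, MappedResource.RowPitch, 3, 3).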

ParamValidationExt error with WelsInitEncoderExt failed while setting up OpenH264 encoder

Scenario:
I am using OpenH264 with my App to encode into a video_file.mp4.
Environment:
Platform: macOS Sierra
Compiler: Clang++
The code:
Following is the crux of the code I have:
void EncodeVideoFile() {
ISVCEncoder * encoder_;
std::string video_file_name = "/Path/to/some/folder/video_file.mp4";
EncodeFileParam * pEncFileParam;
SEncParamExt * pEnxParamExt;
float frameRate = 1000;
EUsageType usageType = EUsageType::CAMERA_VIDEO_REAL_TIME;
bool denoise = false;
bool lossless = true;
bool enable_ltr = false;
int layers = 1;
bool cabac = false;
int sliceMode = 1;
pEncFileParam = new EncodeFileParam;
pEncFileParam->eUsageType = EUsageType::CAMERA_VIDEO_REAL_TIME;
pEncFileParam->pkcFileName = video_file_name.c_str();
pEncFileParam->iWidth = frame_width;
pEncFileParam->iHeight = frame_height;
pEncFileParam->fFrameRate = frameRate;
pEncFileParam->iLayerNum = layers;
pEncFileParam->bDenoise = denoise;
pEncFileParam->bLossless = lossless;
pEncFileParam->bEnableLtr = enable_ltr;
pEncFileParam->bCabac = cabac;
int rv = WelsCreateSVCEncoder (&encoder_);
pEnxParamExt = new SEncParamExt;
pEnxParamExt->iUsageType = pEncFileParam->eUsageType;
pEnxParamExt->iPicWidth = pEncFileParam->iWidth;
pEnxParamExt->iPicHeight = pEncFileParam->iHeight;
pEnxParamExt->fMaxFrameRate = pEncFileParam->fFrameRate;
pEnxParamExt->iSpatialLayerNum = pEncFileParam->iLayerNum;
pEnxParamExt->bEnableDenoise = pEncFileParam->bDenoise;
pEnxParamExt->bIsLosslessLink = pEncFileParam->bLossless;
pEnxParamExt->bEnableLongTermReference = pEncFileParam->bEnableLtr;
pEnxParamExt->iEntropyCodingModeFlag = pEncFileParam->bCabac ? 1 : 0;
for (int i = 0; i < pEnxParamExt->iSpatialLayerNum; i++) {
pEnxParamExt->sSpatialLayers[i].sSliceArgument.uiSliceMode = pEncFileParam->eSliceMode;
}
encoder_->InitializeExt(pEnxParamExt);
int videoFormat = videoFormatI420;
encoder_->SetOption (ENCODER_OPTION_DATAFORMAT, &videoFormat);
int frameSize = frame_width * frame_height * 3 / 2;
int total_num = 500;
BufferedData buf;
buf.SetLength (frameSize);
// check the buffer before proceeding
if (buf.Length() != (size_t)frameSize) {
CloseEncoder();
return;
}
SFrameBSInfo info;
memset (&info, 0, sizeof (SFrameBSInfo));
SSourcePicture pic;
memset (&pic, 0, sizeof (SSourcePicture));
pic.iPicWidth = frame_width;
pic.iPicHeight = frame_height;
pic.iColorFormat = videoFormatI420;
pic.iStride[0] = pic.iPicWidth;
pic.iStride[1] = pic.iStride[2] = pic.iPicWidth >> 1;
pic.pData[0] = buf.data();
pic.pData[1] = pic.pData[0] + frame_width * frame_height;
pic.pData[2] = pic.pData[1] + (frame_width * frame_height >> 2);
for(int num = 0; num < total_num; num++) {
// try to encode the frame
rv = encoder_->EncodeFrame (&pic, &info);
}
if (encoder_) {
encoder_->Uninitialize();
WelsDestroySVCEncoder (encoder_);
}
}
The code above is adapted from the official OpenH264 usage examples; BufferedData (BufferedData.h) is a class I reused from the OpenH264 utils.
Issue:
But, I am getting the following error:
[OpenH264] this = 0x0x1038bc8c0, Error:ParamValidationExt(), width > 0, height > 0, width * height <= 9437184, invalid 0 x 0 in dependency layer settings!
[OpenH264] this = 0x0x1038bc8c0, Error:WelsInitEncoderExt(), ParamValidationExt failed return 2.
[OpenH264] this = 0x0x1038bc8c0, Error:CWelsH264SVCEncoder::Initialize(), WelsInitEncoderExt failed.
This does not crash the application, but it runs through without creating video_file.mp4 with the dummy data that I am trying to write into it.
Question:
There seems to be something wrong with the configuration I am applying to pEnxParamExt, which is passed to encoder_->InitializeExt.
What am I doing wrong in setting up the encoder?
Note:
I am not trying to hook up to any camera device. I am just trying to create a .mp4 video out of some dummy image data.
If you want a complete, working OpenH264 encoder initialization procedure, you can find one here.
According to your problem description, you are trying to create a video file (.mp4/.avi) from some dummy images. This task can be accomplished using two different libraries: i) a library for the codec, and ii) a library for the container.
i) Library for Codec: It's quite easy to use OpenH264 to compress data. One thing I must mention is that OpenH264 always works with raw frames, e.g. YUV420 data. So if you want to compress your image data, you have to convert it into the YUV420 color format. To get OpenH264, click here.
ii) Library for Container: After getting the encoded data, you have to use another library to create the container (.mp4, .avi, .flv, etc.). There are a lot of libraries on GitHub that do this, such as FFmpeg, OpenCV, Bento4, MP4Maker, mp4parser, etc. Before using these libraries, please check their licenses carefully. If you use FFmpeg, you will not need OpenH264, because FFmpeg itself supports several codecs. You will also find many more working examples, as so many developers work with video data.
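For step i), here is a minimal sketch of an RGB-to-YUV420 conversion. It produces the planar I420 layout that the question's SSourcePicture setup expects: the Y plane, then U, then V, with chroma strides of width / 2. RgbToI420 is a hypothetical helper, and it makes these assumptions:
- input is packed 8-bit RGB with even dimensions;
- it uses the common BT.601 limited-range integer coefficients;
- it takes chroma from the top-left pixel of each 2x2 block rather than averaging the block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts packed 8-bit RGB (R, G, B per pixel, rows tightly packed) into
// planar I420: a full-size Y plane followed by quarter-size U and V planes.
std::vector<std::uint8_t> RgbToI420(const std::uint8_t* rgb, int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t ySize = static_cast<std::size_t>(width) * height;
    std::vector<std::uint8_t> out(ySize * 3 / 2);
    std::uint8_t* yPlane = out.data();
    std::uint8_t* uPlane = yPlane + ySize;
    std::uint8_t* vPlane = uPlane + ySize / 4;

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * width + col;
            const int r = rgb[i * 3], g = rgb[i * 3 + 1], b = rgb[i * 3 + 2];
            yPlane[i] = static_cast<std::uint8_t>(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
            if (row % 2 == 0 && col % 2 == 0) {
                // Adding 32768 (128 << 8) keeps the sum non-negative before the
                // shift and supplies the +128 chroma offset at the same time.
                const std::size_t c = static_cast<std::size_t>(row / 2) * (width / 2) + col / 2;
                uPlane[c] = static_cast<std::uint8_t>((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
                vPlane[c] = static_cast<std::uint8_t>((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
            }
        }
    }
    return out;
}
```

The returned buffer is width * height * 3 / 2 bytes, the same as frameSize in the question. It could therefore be copied into buf before each EncodeFrame call.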
Hope it helps. :)

Can't save to Flash Memory?

I am using the <flash.h> library to erase/write/read flash memory, but unfortunately the data I am trying to save doesn't seem to be written. I am using a PIC18F87J11 with the MPLAB XC8 compiler. When I read the program memory from the PIC after attempting to write to it, there is no data at address 0x1C0CA. What am I doing wrong?
char read[1];
/* set FOSC clock to 8MHZ */
OSCCON = 0b01110000;
/* turn off 4x PLL */
OSCTUNE = 0x00;
TRISDbits.TRISD6 = 0; // set as output
TRISDbits.TRISD7 = 0; // set as output
LATDbits.LATD6 = 0; // LED 1 OFF
LATDbits.LATD7 = 1; // LED 2 ON
EraseFlash(0x1C0CA, 0x1C0CA);
WriteBytesFlash(0x1C0CA, 1, 0x01);
ReadFlash(0x1C0CA, 1, read[0]);
if (read[0] == 0x01)
LATDbits.LATD6 = 1; // LED 1 ON
while (1) {
}
I don't know what WriteBytesFlash does internally, but the page (write block) size for your device is 64 bytes, and after writing the data you need to write an unlock sequence to the EECON2 and EECON1 registers to start programming the flash memory.
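A 64-byte page means that writing one byte at the unaligned address 0x1C0CA really rewrites the whole block containing it. The sketch below shows only the address arithmetic and a read-modify-write on a RAM copy, in portable C++. BlockBase, BlockOffset and PatchByte are hypothetical helpers. The erase block on this device family may be larger than the 64-byte write block, so check the datasheet before erasing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Write-block ("page") size quoted in the answer for the PIC18F87J11.
constexpr std::uint32_t kWriteBlockSize = 64;
static_assert((kWriteBlockSize & (kWriteBlockSize - 1)) == 0, "block size must be a power of two");

// First address of the write block that contains addr.
constexpr std::uint32_t BlockBase(std::uint32_t addr) { return addr & ~(kWriteBlockSize - 1); }

// Position of addr inside that block.
constexpr std::uint32_t BlockOffset(std::uint32_t addr) { return addr & (kWriteBlockSize - 1); }

using Block = std::array<std::uint8_t, kWriteBlockSize>;

// Read-modify-write on a RAM copy of one block. On the device the sequence
// would be: read the block at BlockBase(addr) into `block`, patch it here,
// erase, then write all 64 bytes back, with each write started by the
// EECON2/EECON1 unlock sequence mentioned above.
void PatchByte(Block& block, std::uint32_t addr, std::uint8_t value) {
    block[BlockOffset(addr)] = value;
}
```

For 0x1C0CA the block starts at 0x1C0C0 and the byte sits at offset 10 within it.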

Portaudio MME device behaviour issue

I am using the multiple-output-device feature provided by paMME host API to output audio through multiple stereo devices. I also need to use a single multichannel input device using MME.
- When I configure just the output devices and play internally generated audio, there is no problem.
- However, the problem starts when I configure both the input device and the multiple stereo output devices. The application crashes when I try to use more than two output channels. That is, if I increment the 'out' pointer beyond 2*frames_per_buffer, it crashes, which indicates that the buffer has been allocated for only two output channels.
Can anybody shed some light on what the problem could be? The configuration code is given below:
outputParameters.device = paUseHostApiSpecificDeviceSpecification;
outputParameters.channelCount = 8;
outputParameters.sampleFormat = paInt16;
outputParameters.hostApiSpecificStreamInfo = NULL;
wmmeStreamInfo.size = sizeof(PaWinMmeStreamInfo);
wmmeStreamInfo.hostApiType = paMME;
wmmeStreamInfo.version = 1;
wmmeStreamInfo.flags = paWinMmeUseMultipleDevices;
wmmeDeviceAndNumChannels[0].device = selectedDeviceIndex[0];
wmmeDeviceAndNumChannels[0].channelCount = 2;
wmmeDeviceAndNumChannels[1].device = selectedDeviceIndex[1];
wmmeDeviceAndNumChannels[1].channelCount = 2;
wmmeDeviceAndNumChannels[2].device = selectedDeviceIndex[2];
wmmeDeviceAndNumChannels[2].channelCount = 2;
wmmeDeviceAndNumChannels[3].device = selectedDeviceIndex[3];
wmmeDeviceAndNumChannels[3].channelCount = 2;
wmmeStreamInfo.devices = wmmeDeviceAndNumChannels;
wmmeStreamInfo.deviceCount = 4;
outputParameters.suggestedLatency = Pa_GetDeviceInfo( selectedDeviceIndex[0] )->defaultLowOutputLatency;
outputParameters.hostApiSpecificStreamInfo = &wmmeStreamInfo;
inputParameters.device = selectedInputDeviceIndex; /* multichannel input device */
inputParameters.channelCount = 8; /* 8 input channels */
inputParameters.sampleFormat = paInt16; /* 16-bit integer samples */
inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )->defaultLowInputLatency;
inputParameters.hostApiSpecificStreamInfo = NULL;
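The 'out' pointer arithmetic described above relies on PortAudio's default interleaved layout (paNonInterleaved is not set). With outputParameters.channelCount = 8, the callback's output buffer should hold framesPerBuffer * 8 samples. The sample for channel ch of frame f sits at out[f * 8 + ch].

The sketch below uses hypothetical helpers, TotalChannels and FillInterleaved, with no PortAudio dependency. It illustrates that layout and checks that the four 2-channel MME devices add up to the 8 requested channels. It illustrates the expected layout only; it does not diagnose the crash.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sum of the per-device channel counts (wmmeDeviceAndNumChannels[i].channelCount).
int TotalChannels(const std::vector<int>& perDeviceChannels) {
    int total = 0;
    for (int c : perDeviceChannels) total += c;
    return total;
}

// Fills an interleaved 16-bit buffer: channel ch of frame f is at
// out[f * channelCount + ch], so the buffer must hold
// framesPerBuffer * channelCount samples. Each channel gets a constant
// placeholder value (ch * 1000) instead of real audio.
void FillInterleaved(std::int16_t* out, unsigned long framesPerBuffer, int channelCount) {
    for (unsigned long f = 0; f < framesPerBuffer; ++f)
        for (int ch = 0; ch < channelCount; ++ch)
            out[f * channelCount + ch] = static_cast<std::int16_t>(ch * 1000);
}
```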
Thanks and regards,
Siddharth Kumar.

Return error when trying to copy the render target's backbuffer

I have a WDDM user-mode display driver for DX9, and I would like to dump the render target's back buffer to a BMP file. Since the render target resource is not lockable, I have to create a resource in system memory, blit from the render target to that system-memory resource, and then save it to the BMP file. However, the blit always returns the error code E_FAIL. I also tried calling pfnCaptureToSysMem, which returned the same error code. Is anything wrong here?
D3DDDI_SURFACEINFO nfo;
nfo.Depth = 0;
nfo.Width = GetRenderSize().cx;
nfo.Height = GetRenderSize().cy;
nfo.pSysMem = NULL;
nfo.SysMemPitch = 0;
nfo.SysMemSlicePitch = 0;
D3DDDIARG_CREATERESOURCE resource;
resource.Format = D3DDDIFMT_A8R8G8B8;
resource.Pool = D3DDDIPOOL_SYSTEMMEM;
resource.MultisampleType = D3DDDIMULTISAMPLE_NONE;
resource.MultisampleQuality = 0;
resource.pSurfList = &nfo;
resource.SurfCount = 1;
resource.MipLevels = 1;
resource.Fvf = 0;
resource.VidPnSourceId = 0;
resource.RefreshRate.Numerator = 0;
resource.RefreshRate.Denominator = 0;
resource.hResource = NULL;
resource.Flags.Value = 0;
resource.Flags.Texture = 1;
resource.Flags.Dynamic = 1;
resource.Rotation = D3DDDI_ROTATION_IDENTITY;
HRESULT hr = m_pDevice->m_deviceFuncs.pfnCreateResource(m_pDevice->GetDrv(), &resource);
HANDLE hSysSpace = resource.hResource;
D3DDDIARG_BLT blt;
blt.hSrcResource = m_pDevice->m_hRenderTarget;
blt.hDstResource = hSysSpace;
blt.SrcRect.left = 0;
blt.SrcRect.top = 0;
blt.SrcRect.right = GetRenderSize().cx;
blt.SrcRect.bottom = GetRenderSize().cy;
blt.DstRect = blt.SrcRect;
blt.DstSubResourceIndex = 0;
blt.SrcSubResourceIndex = 0;
blt.Flags.Value = 0;
blt.ColorKey = 0;
hr = m_pDevice->m_deviceFuncs.pfnBlt(m_pDevice, &blt);
You are on the right track, but I think you can use the DirectX functions for this.
In order to copy the render target from video memory to system memory you should use the IDirect3DDevice9::GetRenderTargetData() function.
This function requires the destination surface to be created in the D3DPOOL_SYSTEMMEM pool; an offscreen plain surface is the usual choice. The surface must also have the same dimensions and format as the render target (no stretching allowed). Use IDirect3DDevice9::CreateOffscreenPlainSurface() to create this surface.
Then this surface can be locked and the color data can be accessed by the CPU.
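Since the goal is a .bmp file, here is a sketch of the last step. Once the system-memory surface is locked, its A8R8G8B8 rows (pBits, Pitch) can be packed into an uncompressed 32-bpp bitmap. D3DFMT_A8R8G8B8 stores each pixel in memory as B, G, R, A bytes, which matches the byte order of a 32-bpp BI_RGB bitmap. MakeBmp32 is a hypothetical helper written with only the C++ standard library. It uses a negative height so rows can be copied top-down, and it drops any padding beyond width * 4 bytes per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
void PutU16(std::vector<std::uint8_t>& v, std::uint16_t x) {
    v.push_back(static_cast<std::uint8_t>(x & 0xFF));
    v.push_back(static_cast<std::uint8_t>(x >> 8));
}
void PutU32(std::vector<std::uint8_t>& v, std::uint32_t x) {
    for (int i = 0; i < 4; ++i) v.push_back(static_cast<std::uint8_t>(x >> (8 * i)));
}
}  // namespace

// Builds an in-memory 32-bpp BMP from locked A8R8G8B8 pixels
// (bits/pitch as returned in D3DLOCKED_RECT::pBits/Pitch).
std::vector<std::uint8_t> MakeBmp32(const void* bits, std::size_t pitch,
                                    std::uint32_t width, std::uint32_t height) {
    const std::uint32_t rowBytes = width * 4;
    assert(pitch >= rowBytes);
    const std::uint32_t headerSize = 14 + 40;
    const std::uint32_t imageSize = rowBytes * height;
    std::vector<std::uint8_t> bmp;
    bmp.reserve(headerSize + imageSize);
    // BITMAPFILEHEADER (14 bytes)
    PutU16(bmp, 0x4D42);                 // "BM"
    PutU32(bmp, headerSize + imageSize); // total file size
    PutU32(bmp, 0);                      // two reserved WORDs
    PutU32(bmp, headerSize);             // offset of the pixel data
    // BITMAPINFOHEADER (40 bytes)
    PutU32(bmp, 40);
    PutU32(bmp, width);
    PutU32(bmp, static_cast<std::uint32_t>(-static_cast<std::int32_t>(height))); // top-down rows
    PutU16(bmp, 1);                      // planes
    PutU16(bmp, 32);                     // bits per pixel
    PutU32(bmp, 0);                      // BI_RGB (uncompressed)
    PutU32(bmp, imageSize);
    PutU32(bmp, 2835);                   // ~72 DPI, in pixels per metre
    PutU32(bmp, 2835);
    PutU32(bmp, 0);                      // colours used
    PutU32(bmp, 0);                      // important colours
    const auto* src = static_cast<const std::uint8_t*>(bits);
    for (std::uint32_t y = 0; y < height; ++y)
        bmp.insert(bmp.end(), src + y * pitch, src + y * pitch + rowBytes);
    return bmp;
}
```

After UnlockRect, the resulting bytes can be written to disk with a std::ofstream opened in binary mode.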
