Best way to render in multiple windows using DirectX11

I am a newbie in DirectX and currently learning DirectX11. I have been assigned a task where I have to render our graphics in multiple windows (each window with its own HWND), usually two (say Window1 and Window2).
What is the most efficient way to implement this?
We are using an onboard Intel HD Graphics 4600. It's not a fancy chip, but it gets the job done. I would appreciate some insights into rendering to multiple windows.
The hardware we are using has two video outputs. We connect two physical displays (Display1 and Display2) to these outputs, and each window is then rendered on its own physical display.
That is, Window1 will be rendered on Display1 and Window2 on Display2.
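A common starting point (a sketch, not from the original thread; the helper and window handles below are illustrative) is to create a single ID3D11Device and one swap chain per HWND via IDXGIFactory2::CreateSwapChainForHwnd, then render to each swap chain's back buffer and Present them independently:

// Sketch: one shared device, one swap chain per window.
#include <d3d11.h>
#include <dxgi1_2.h>

IDXGISwapChain1* CreateSwapChainForWindow(ID3D11Device* device, HWND hwnd)
{
    IDXGIDevice* dxgiDevice = nullptr;
    device->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
    IDXGIAdapter* adapter = nullptr;
    dxgiDevice->GetAdapter(&adapter);
    IDXGIFactory2* factory = nullptr;
    adapter->GetParent(__uuidof(IDXGIFactory2), (void**)&factory);

    DXGI_SWAP_CHAIN_DESC1 desc = {};                 // Width/Height 0 = use window client area
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount = 2;
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // flip model (Windows 10); DXGI_SWAP_EFFECT_DISCARD on older OSes

    IDXGISwapChain1* swapChain = nullptr;
    factory->CreateSwapChainForHwnd(device, hwnd, &desc, nullptr, nullptr, &swapChain);

    factory->Release(); adapter->Release(); dxgiDevice->Release();
    return swapChain;
}

// Usage (hwnd1/hwnd2 are your two windows):
//   IDXGISwapChain1* sc1 = CreateSwapChainForWindow(device, hwnd1);
//   IDXGISwapChain1* sc2 = CreateSwapChainForWindow(device, hwnd2);
//   Each frame: bind each swap chain's render target view, draw, then sc1->Present(1, 0); sc2->Present(1, 0);

Since both windows run on the same adapter, the device, shaders, and most resources can be shared; only the swap chains and render target views are per-window.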

Related

Enforce use of independent flip mode with DXGI FLIP SwapChain

I currently face a problem with DXGI swap chains (DirectX 11). My C++ application shows (live) video, and my goal is to minimize latency. I have no user input to process.
In order to decrease latency I switched to a DXGI_SWAP_EFFECT_FLIP_DISCARD swap chain (I used BitBlt before - see "For best performance, use DXGI flip model" for further details). I use the following flags:
//Swapchain Init:
sc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
sc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT | DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING;
//Present:
m_pDXGISwapChain->Present(0, DXGI_PRESENT_ALLOW_TEARING);
On one computer the swap chain (windowed) goes into the "Hardware: Independent Flip" mode and I have perfectly low latency, as long as no other windows are in front. On all other computers I tried, I am stuck in the "Composed: Flip" mode with higher latency (~25-30 ms more). The software (binary) is exactly the same. I am checking this with PresentMon.
What I find interesting is that on the computer where independent flip works, it is also active without the ALLOW_TEARING flags - from what I understood, they should be required for it. Incidentally, I also see tearing in that case, but that is a separate problem.
I already tried comparing Windows 10 versions, graphics drivers, and driver settings. The GPU is a Quadro RTX 4000 on all systems. I couldn't spot any difference between them.
I would really appreciate any hints on additional preconditions for the independent flip mode I might have missed in the docs. Thanks for your help!
Update 1: I switched the "working" system from Nvidia driver 511.09 to 473.47 (the latest stable). After that I got the same behavior as on the other systems (no independent flip). After going back to 511.09 it worked again, so the driver seems to have an influence. The other systems also had 511.09 for my original tests, though.
Update 2: After working through all the DirectX debug output, it still does not work as desired. I manage to get into independent flip mode only in real full-screen mode, or in windowed mode when the window has no decorations and actually covers the whole screen. Unfortunately, using the Graphics Tools for VS I never enter independent flip, so I cannot do further analysis there. It is interesting, though, that when debugging with the Graphics Tools, PresentMon shows Composed Flip while the Graphics Analyzer from the Graphics Tools shows only DISCARD as the SwapEffect for the swap chain. I would have expected FLIP_DISCARD, as I explicitly used DXGI_SWAP_EFFECT_FLIP_DISCARD.
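For the latency goal itself, one complementary technique (a sketch; it does not by itself force independent flip) is to actually use the waitable object that DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT provides, and cap the queue at one frame:

// Sketch: wait on the frame-latency handle before rendering each frame.
#include <dxgi1_3.h>

IDXGISwapChain2* sc2 = nullptr;
m_pDXGISwapChain->QueryInterface(__uuidof(IDXGISwapChain2), (void**)&sc2);
sc2->SetMaximumFrameLatency(1);                  // allow at most one queued frame
HANDLE frameWait = sc2->GetFrameLatencyWaitableObject();

// Per frame, before any rendering work:
WaitForSingleObjectEx(frameWait, 1000, TRUE);    // block until DXGI can accept a new frame
// ... render ...
m_pDXGISwapChain->Present(0, DXGI_PRESENT_ALLOW_TEARING);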

How to get the memory address of a rendered image?

I am developing a simulator in Unity which has to communicate with another application. The problem is that I have to render a camera manually and send the resulting image to the other application, which takes too long.
My question: is there a way to get the memory address where the rendered image is stored? I need this so the other application can read the image directly, in order to reduce the transmission time.
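If both sides can use D3D11 natively (in Unity this would mean a native rendering plugin; producerDevice/consumerDevice below are placeholders, and transferring the handle value between processes is up to you), one way to avoid copying pixels through system memory is a shared texture: the producer renders into a texture created with a shared flag and exports its handle, and the consumer opens it with OpenSharedResource. A sketch:

// Sketch: share one GPU texture between two processes instead of sending pixels.
#include <d3d11.h>
#include <dxgi.h>

// Producer process: create a shareable texture and export its handle.
D3D11_TEXTURE2D_DESC desc = {};
desc.Width = 1920; desc.Height = 1080;           // placeholder size
desc.MipLevels = 1; desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
desc.MiscFlags = D3D11_RESOURCE_MISC_SHARED;     // openable from another device/process

ID3D11Texture2D* sharedTex = nullptr;
producerDevice->CreateTexture2D(&desc, nullptr, &sharedTex);

IDXGIResource* dxgiRes = nullptr;
sharedTex->QueryInterface(__uuidof(IDXGIResource), (void**)&dxgiRes);
HANDLE handle = nullptr;
dxgiRes->GetSharedHandle(&handle);               // send this value to the consumer process

// Consumer process: open the same texture by handle - no pixel copy involved.
ID3D11Texture2D* openedTex = nullptr;
consumerDevice->OpenSharedResource(handle, __uuidof(ID3D11Texture2D), (void**)&openedTex);

Note that a raw "memory address" of VRAM is not exposed to user code; resource sharing like this is the supported route.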

Access the whole video memory

I'm looking for a way to read the whole video memory that a video card outputs to a display. That includes hardware-accelerated output, video playback, and full-screen output (which, I suspect, could be handled differently from windowed mode).
In short: I want to be able to capture everything that is going to be presented on a display.
I suppose that IF that's possible, it would be OS-dependent. The targets I'm interested in are Windows, OS X, and Linux.
Do you have any hints?
For Windows, I guess you could take CamStudio, strip it down, and use it to record the screen, then do whatever you want with the output. Other than that, you could look into forensic kernel drivers for accessing RAM. It's not exactly as simple as a pointer into video memory anymore, haha.
Digital Rights Management, a feature of Windows, attempts to block your access to blocks of graphics-card frame-buffer memory. Using an open-source driver under Linux would seem to be the only way to access this memory directly, or, as mentioned earlier, some third-party software that knows some back doors or hacks for locating other programs' frame-buffer space.
If, on the other hand, you are trying to capture output from your own program (i.e. you are calling the video/graphics functions yourself), there are APIs to manipulate display frames in DirectX and OpenGL.
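On Windows 8 and later, the supported route for this kind of capture is the DXGI Desktop Duplication API, which delivers whatever the display actually shows, including hardware-accelerated and full-screen content (DRM-protected content excepted). A minimal sketch, assuming a device and an IDXGIOutput1 for the target display already exist:

// Sketch: DXGI Desktop Duplication - read back what a display presents.
#include <d3d11.h>
#include <dxgi1_2.h>

IDXGIOutputDuplication* dup = nullptr;
output1->DuplicateOutput(device, &dup);          // device: your ID3D11Device

DXGI_OUTDUPL_FRAME_INFO info = {};
IDXGIResource* resource = nullptr;
if (SUCCEEDED(dup->AcquireNextFrame(500, &info, &resource)))
{
    ID3D11Texture2D* frame = nullptr;
    resource->QueryInterface(__uuidof(ID3D11Texture2D), (void**)&frame);
    // Copy 'frame' into a D3D11_USAGE_STAGING texture, then Map() it to read pixels.
    frame->Release();
    resource->Release();
    dup->ReleaseFrame();
}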
I think I found some resources that can help with capturing the display memory on Windows:
Fastest method of screen capturing
How to save backbuffer to file in DirectX 10?
http://betterlogic.com/roger/2010/07/fast-screen-capture/

Is PIX replay using the actual driver?

If I run a 3D application (like a benchmark tool or a game) under PIX and replay the capture later, is the replay actually calling the same API (and thus invoking the actual driver and GPU, rather than falling back to a software or emulated 3D path on the CPU) the same way the original 3D application did? I'm focusing only on the Direct3D API part.
Is there any other way I can do the capture? For some applications, PIX fails to capture them.
Is there a way for me to capture only a subset of the rendering, say only the middle 50 frames?
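On the last point, one option (a sketch; this uses the programmable-capture interface exposed to graphics debuggers and only succeeds when the app is launched under a tool that implements it) is to bracket the frames you care about in code:

// Sketch: programmatically capture a chosen frame range.
#include <dxgi1_3.h>
#include <DXProgrammableCapture.h>

IDXGraphicsAnalysis* analysis = nullptr;
DXGIGetDebugInterface1(0, __uuidof(IDXGraphicsAnalysis), (void**)&analysis);

// In the render loop (frameIndex is a placeholder frame counter):
if (analysis && frameIndex == 100) analysis->BeginCapture();
// ... render and Present ...
if (analysis && frameIndex == 149) analysis->EndCapture();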

DirectX 11 - 2nd GPU is slower?

I have two GPUs, both the same (Nvidia GTX 680s), with SLI disabled. If I create the device and device context on the enum-0 adapter (GPU), my simple clear-surface program runs at 0.05 ms/frame. However, if I run it on the enum-1 adapter instead (the other GPU), it runs at over 1 ms/frame.
How can one of my GPUs be so much slower than the other? They are both installed in the correct PCIe 3.0 x16 slots according to the motherboard.
Am I missing something? I've looked over the code 1000 times and have virtually ruled out a coding mistake - I simply swapped the adapter used when creating the device and swap chain.
You're not providing enough information about what your code does, specifically with respect to the display.
But my guess would be that one GPU drives the display you're using for your output and the other does not. Displaying your render is then immediate on the first GPU, but requires a full framebuffer copy between the two GPUs in the other case.
That copy has to go over PCIe. x16 PCIe 3.0 is still about 15.6 GB/s, so ~1 ms of extra time implies on the order of 10-15 MB of transfer - about the size of a 1920x1200 framebuffer (1920 x 1200 x 4 bytes ≈ 9 MB).
Can you give more details about your resolution and display setup? Was this in full screen?
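One way to check that guess (a sketch): enumerate the adapters and the outputs attached to each with DXGI, and confirm which GPU actually drives the display you present on; creating the device on that adapter avoids the cross-GPU copy:

// Sketch: list which adapter (GPU) drives which display output.
#include <dxgi.h>
#include <cstdio>

IDXGIFactory* factory = nullptr;
CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&factory);

IDXGIAdapter* adapter = nullptr;
for (UINT a = 0; factory->EnumAdapters(a, &adapter) != DXGI_ERROR_NOT_FOUND; ++a)
{
    DXGI_ADAPTER_DESC ad = {};
    adapter->GetDesc(&ad);
    IDXGIOutput* output = nullptr;
    for (UINT o = 0; adapter->EnumOutputs(o, &output) != DXGI_ERROR_NOT_FOUND; ++o)
    {
        DXGI_OUTPUT_DESC od = {};
        output->GetDesc(&od);
        wprintf(L"Adapter %u (%s) drives output %s\n", a, ad.Description, od.DeviceName);
        output->Release();
    }
    adapter->Release();
}
factory->Release();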
