DXGI Waitable SwapChain not waiting (D3D11)

There is another question with the same title on this site, but it didn't solve my problem.
I'm writing a Direct3D 11 desktop application, and I'm trying to implement the waitable swap chain introduced in this document to reduce latency (specifically, the latency between when the user moves the mouse and when the monitor displays the change).
Now the problem is, I called WaitForSingleObject on the handle returned by GetFrameLatencyWaitableObject, but it did not wait at all and returned immediately (which resulted in my application running at about 200 to 1000 fps while my monitor is 60 Hz), so my questions are:
Did I even understand correctly what a waitable swap chain does? According to my understanding, it is very similar to VSync (which is done by passing 1 for the SyncInterval parameter when calling Present on the swap chain), except that instead of waiting for a previous frame to finish presenting on the screen at the end of a render loop (which is when we call Present), we can wait at the start of a render loop (by calling WaitForSingleObject on the waitable object).
If I understood correctly, then what am I missing? Or does this thing only work for UWP applications (since that document and its sample project are UWP)?
Here's my code to create the swap chain:
SwapChainDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
SwapChainDesc.Stereo = false;
SwapChainDesc.SampleDesc.Count = 1;
SwapChainDesc.SampleDesc.Quality = 0;
SwapChainDesc.BufferUsage = D3D11_BIND_RENDER_TARGET;
SwapChainDesc.BufferCount = 2;
SwapChainDesc.Scaling = DXGI_SCALING_STRETCH;
SwapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
SwapChainDesc.AlphaMode = DXGI_ALPHA_MODE_UNSPECIFIED;
SwapChainDesc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
result = Factory2->CreateSwapChainForHwnd(Device.Get(), hWnd, &SwapChainDesc, &FullscreenDesc, nullptr, &SwapChain1);
if (FAILED(result)) return result;
Here's my code to get the waitable object:
result = SwapChain2->SetMaximumFrameLatency(1); // also tried setting it to "2"
if (FAILED(result)) return result;
WaitableObject = SwapChain2->GetFrameLatencyWaitableObject(); // also, I never call ResizeBuffers
if (WaitableObject == NULL) return E_FAIL;
And here's my code for the render loop:
while (Running) {
    if (WaitForSingleObject(WaitableObject, 1000) == WAIT_OBJECT_0) {
        Render();
        HRESULT result = SwapChain->Present(0, 0);
        if (FAILED(result)) return result;
    }
}

So I took some time to download and test the official sample, and now I think I'm ready to answer my own questions:
No, a waitable swap chain does not work the way I thought: it does not wait until a previous frame has been presented on the monitor. Instead, I think what it does is wait until all the work before Present is either finished (the GPU has finished rendering to the render target but hasn't displayed it on the monitor yet) or queued (the CPU has finished sending the GPU all the commands, but the GPU hasn't finished executing them yet). I'm not sure which one is the real case, but either one would, in theory, help reduce input latency (and according to my tests, it did, both with VSync on and off). Also, now that I know this thing has almost nothing to do with framerate control, I know it shouldn't be compared with VSync.
No, I don't think it's limited to UWP.
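For anyone who finds this later, the pattern the official sample uses is roughly the following sketch (ReadInput and Render are placeholders for your own per-frame work, and error handling is trimmed):
SwapChain2->SetMaximumFrameLatency(1);
HANDLE WaitableObject = SwapChain2->GetFrameLatencyWaitableObject();
while (Running) {
    // Blocks until the swap chain can accept a new frame; with a maximum frame
    // latency of 1 this keeps the CPU from queuing ahead, so the input sampled
    // below is as fresh as possible when the frame reaches the screen.
    if (WaitForSingleObjectEx(WaitableObject, 1000, TRUE) != WAIT_OBJECT_0) break;
    ReadInput();   // placeholder: poll input as late as possible
    Render();      // placeholder: record and submit this frame's draw calls
    // SyncInterval 1 waits for vblank (no tearing); 0 presents immediately.
    if (FAILED(SwapChain2->Present(1, 0))) break;
}
CloseHandle(WaitableObject); // the handle must be closed when it's no longer needed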
And now I'd like to share some conclusions I've reached about input latency and framerate control:
I now believe that reducing input latency and controlling the framerate are mutually exclusive goals, and that a perfect balance point between them probably doesn't exist.
For example, if I limit the framerate to 1 frame per vblank, then the input latency (in an ideal scenario) would be as high as the monitor's frame time (about 16 ms for a 60 Hz monitor);
but when I don't limit the framerate, the input latency would only be as high as the time the GPU takes to finish a frame (in an ideal scenario about 1 or 2 ms, which is not just better on paper; the improvement is visible to the user as well), although a lot of frames (and the CPU/GPU resources used to render them) would be wasted.
As an FPS player myself, the reason I want to reduce input latency is obvious: I hate input lag;
and the reasons I want to invest in framerate control are, firstly, that I hate frame tearing (a little more than I hate input lag) and, secondly, that I want to ease CPU/GPU usage when possible.
However, I recently discovered that frame tearing is completely eliminated by using the flip model (I just don't get any tearing at all with the flip model, no VSync needed), so I don't need to worry about tearing anymore.
So for now I plan to prioritize latency reduction over framerate control, at least until (and if) I one day move on to D3D12 and figure out a way to ease CPU/GPU usage while preserving low input latency.

Related

What happens in the GPU between the call to gl.drawArrays() and the call to gl.readPixels()?

Changing the Title in the hopes of being more accurate.
We have some code which runs several programs in succession by calling drawArrays(). The output textures from each stage are fed into the next, and so on.
After the final call to draw, a call to readPixels() is made.
This call takes an enormous amount of time (for an output of < 1000 floats). I have measured a readPixels of that size in isolation, which takes 1 or 2 ms. However, in our case we see a delay of about 1500 ms.
So we conjectured that the actual computation must not have started until we called readPixels(). To test this theory and to force the computation, we placed a call to gl.flush() after each gl.drawxx(). This made no difference.
So we replaced that with a call to gl.finish(). Again, no difference. We finally replaced it with a call to getError(). Still no difference.
Can we conclude that the GPU does not actually draw anything unless the framebuffer is read from? Can we force it to do so?

Erratic timing results from mach_absolute_time()

I'm trying to optimize a function (an FFT) on iOS, and I've set up a test program to time its execution over several hundred calls. I'm using mach_absolute_time() before and after the function call to time it. I'm doing the tests on an iPod touch 4th generation running iOS 6.
Most of the timing results are roughly consistent with each other, but occasionally one run will take much longer than the others (as much as 100x longer).
I'm pretty certain this has nothing to do with my actual function. Each run has the same input data, and is a purely numerical calculation (i.e. there are no system calls or memory allocations). I can also reproduce this if I replace the FFT with an otherwise empty for loop.
Has anyone else noticed anything like this?
My current guess is that my app's thread is somehow being interrupted by the OS. If so, is there any way to prevent this from happening? (This is not an app that will be released on the App Store, so non-public APIs would be OK for this.)
I no longer have an iOS 5.x device, but I'm pretty sure this was not happening prior to the update to iOS 6.
EDIT:
Here's a simpler way to reproduce:
for (int i = 0; i < 1000; ++i)
{
    uint64_t start = mach_absolute_time();
    for (int j = 0; j < 1000000; ++j);
    uint64_t stop = mach_absolute_time();
    printf("%llu\n", stop - start);
}
Compile this in debug (so the for loop is not optimized away) and run; most of the values are around 220000, but occasionally a value is 10 times larger or more.
In my experience, mach_absolute_time is not reliable. Now I use CFAbsoluteTime instead. It returns the current time in seconds, with far finer precision than one second.
const CFAbsoluteTime newTime = CFAbsoluteTimeGetCurrent();
mach_absolute_time() is actually very low level and reliable. It runs at a steady 24MHz on all iOS devices, from the 3GS to the iPad 4th gen. It's also the fastest way to get timing information, taking between 0.5µs and 2µs depending on CPU. But if you get interrupted by another thread, of course you're going to get spurious results.
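If you want those raw ticks in real units, the usual conversion goes through mach_timebase_info(); here's a minimal sketch, reusing the busy loop from the repro above:
#include <stdio.h>
#include <stdint.h>
#include <mach/mach_time.h>

int main(void)
{
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);                   // numer/denom convert ticks to nanoseconds

    uint64_t start = mach_absolute_time();
    for (volatile int j = 0; j < 1000000; ++j);      // same busy loop as above
    uint64_t stop = mach_absolute_time();

    uint64_t ns = (stop - start) * timebase.numer / timebase.denom;
    printf("%llu ns\n", (unsigned long long)ns);
    return 0;
}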
SCHED_FIFO with maximum priority will allow you to hog the CPU, but only for a few seconds at most, then the OS decides you're being too greedy. You might want to try sleep( 5 ) before running your timing test, as this will build up some "credit".
You don't actually need to start a new thread, you can temporarily change the priority of the current thread with this:
#include <pthread.h>

struct sched_param sched = { 0 };  // zero the struct before use
sched.sched_priority = 62;         // top of the usable range (see below)
pthread_setschedparam( pthread_self(), SCHED_FIFO, &sched );
Note that sched_get_priority_min & max return a conservative 15 & 47, but this only corresponds to an absolute priority of about 0.25 to 0.75. The actual usable range is 0 to 62, which corresponds to 0.0 to 1.0.
It happens when the app spends some time in other threads.

How to create a steady memory load in Mono for Android

Like CPU simulation
I need to write an application that can simulate high memory usage at pre-set values (e.g., 30%, 50%, 90%) for a certain duration, meaning it will take two inputs (memory value and duration). Say I use 50% for memory usage and 2 minutes for the duration; this means that when I run the application, it should take 50% of memory for 2 minutes. Any ideas how this can be achieved?
Any help is appreciated.
You can simulate a memory leak like this (taken from this thread):
var list = new List<byte[]>();
while (true)
{
    list.Add(new byte[1024]); // Change the size here.
}
Similar to the app I wrote for simulating CPU load for a specific amount of time, you just write a method that allocates the desired amount of memory and create a timer which, when it fires, clears the list and then invokes the garbage collector.
Watch out: if you allocate too much memory, your system might become unresponsive and you might crash it.

How to get the page size

I was asked this question in an interview. Please tell me the answer:
You have no documentation for the kernel. You only know that your kernel supports paging.
How will you find the page size? There is no flag or macro available that can tell you the page size.
I was given the hint that you can use time to get the answer. I still have no clue.
Run code like the following:
for (int stride = 1; stride < maxpossiblepagesize; stride += searchgranularity) {
    char* somemem = (char*)malloc(veryverybigsize * stride);
    double starttime = getcurrentveryaccuratetime();
    for (char* pos = somemem; pos < somemem + veryverybigsize * stride; pos += stride) {
        // touch "veryverybigsize" locations, one every "stride" bytes
        *pos = 'Q'; // just write something to force the page into physical memory
    }
    double endtime = getcurrentveryaccuratetime();
    printf("stride %d, runtime %f\n", stride, endtime - starttime);
    free(somemem);
}
Graph the results with stride on the X axis and runtime on the Y axis. There should be a point at stride=pagesize, where the performance no longer drops.
This works by incurring a number of page faults. Once stride surpasses pagesize, the number of faults ceases to increase, so the program's performance no longer degrades noticeably.
If you want to be cleverer, you could exploit the fact that the mprotect system call must work on whole pages. Try it with something smaller, and you'll get an error. I'm sure there are other "holes" like that, too - but the code above will work on any system which supports paging and where disk access is much more expensive than RAM access. That would be every seminormal modern system.
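For what it's worth, here is one hedged sketch of that mprotect idea (the 4 MiB probe cap is an arbitrary assumption): mprotect() only accepts page-aligned addresses, so the smallest non-zero offset from a page-aligned mmap() base that it accepts is the page size.
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    const size_t max_probe = 1 << 22;                /* probe offsets up to 4 MiB */
    char *base = (char *)mmap(NULL, 2 * max_probe, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  /* mmap returns a page-aligned base */
    if (base == MAP_FAILED) return 1;

    for (size_t offset = 1; offset <= max_probe; offset <<= 1) {
        /* Succeeds only when base + offset is itself page-aligned,
           i.e. when offset is a multiple of the (power-of-two) page size. */
        if (mprotect(base + offset, max_probe, PROT_READ) == 0) {
            printf("page size appears to be %zu bytes\n", offset);
            break;
        }
    }
    munmap(base, 2 * max_probe);
    return 0;
}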
It looks to me like a question about 'how does paging actually work'
They want you to explain the impact that changing the page size will have on the execution of the system.
I am a bit rusty on this stuff, but when physical memory is full, the system starts swapping pages out, which slows everything down. So you want to run something that fills up memory to different sizes and measure the time it takes to do a task. At some point there will be a jump, where the time taken to do the task suddenly increases.
Like I said, I am a bit rusty on the implementation details, but I'm pretty sure that is the shape of the answer they were after.
Whatever answer they were expecting, it would almost certainly be a brittle solution. For one thing, you can have multiple page sizes, so any answer you may have gotten for one small allocation may be irrelevant for the next multi-megabyte allocation (see things like Linux's large page support).
I suspect the question was more aimed at seeing how you approached the problem rather than the final solution you came up with.
By the way, this question isn't about Linux, because there you do have documentation, as well as POSIX compliance, so you can just call sysconf(_SC_PAGE_SIZE).
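For the record, on a POSIX system (i.e. not the interview's hypothetical undocumented kernel) that documented route is just:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGE_SIZE);   /* also spelled _SC_PAGESIZE */
    printf("page size: %ld bytes\n", page);
    return 0;
}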

What is meant by "buffer"?

I see the word "BUFFER" everywhere, but I am unable to grasp what it exactly is.
Would anybody please explain what a buffer is, in layman's language?
When is it used?
How is it used?
Imagine that you're eating candy out of a bowl. You take one piece regularly. To prevent the bowl from running out, someone might refill the bowl before it gets empty, so that when you want to take another piece, there's candy in the bowl.
The bowl acts as a buffer between you and the candy bag.
If you're watching a movie online, the web service will continually download the next 5 minutes or so into a buffer; that way your computer doesn't have to fetch each part of the movie at the exact moment you're watching it (which would cause stalls).
The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is faster than the consumer, the producer can continue to store output in the buffer. When the consumer gets around to it, it can read from the buffer. The buffer is there in the middle to bridge the gap.
If you average out the definitions at http://en.wiktionary.org/wiki/buffer, I think you'll get the idea.
For proof that we really did "have to walk 10 miles through the snow every day to go to school", see the TOPS-10 Monitor Calls Manual Volume 1, section 11.9, "Using Buffered I/O", at bookmark 11-24. Don't read it if you're subject to nightmares.
A buffer is simply a chunk of memory used to hold data. In the most general sense, it's usually a single blob of memory that's loaded in one operation and then emptied in one or more, as in Perchik's "candy bowl" example above. In a C program, for example, you might have:
#include <unistd.h>   // for read() and write()

#define BUFSIZE 1024
char buffer[BUFSIZE];
ssize_t len = 0;
// ... later
while ((len = read(STDIN_FILENO, buffer, BUFSIZE)) > 0)
    write(STDOUT_FILENO, buffer, len);
... which is a minimal version of cp(1). Here, the buffer array is used to store the data read by read(2) until it's written; then the buffer is re-used.
There are more complicated buffer schemes used, for example a circular buffer, where some finite number of buffers are used, one after the next; once the buffers are all full, the index "wraps around" so that the first one is re-used.
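The paragraph above describes a ring of whole buffers; the same wrap-around idea applied to a single fixed array of bytes looks like this minimal sketch (the names and the size of 8 are made up for illustration):
#include <stddef.h>

#define RING_SIZE 8

struct ring {
    char   slots[RING_SIZE];
    size_t head;    /* next slot to write */
    size_t tail;    /* next slot to read  */
    size_t count;   /* slots currently in use */
};

static void ring_put(struct ring *r, char c)
{
    r->slots[r->head] = c;
    r->head = (r->head + 1) % RING_SIZE;          /* the index "wraps around" */
    if (r->count < RING_SIZE) r->count++;
    else r->tail = (r->tail + 1) % RING_SIZE;     /* full: the oldest slot is re-used */
}

static int ring_get(struct ring *r, char *out)
{
    if (r->count == 0) return 0;                  /* nothing buffered yet */
    *out = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    return 1;
}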
Buffer means 'temporary storage'. Buffers are important in computing because interconnected devices and systems are seldom 'in sync' with one another, so when information is sent from one system to another, it has somewhere to wait until the recipient system is ready.
Really it would depend on the context in each case, as there is no one definition - but speaking very generally, a buffer is a place to temporarily hold something. The best real-world analogy I can think of would be a waiting area. One simple example in computing is when "buffer" refers to a part of RAM used for temporary storage of data.
That a buffer is "a place to store something temporarily, in order to mitigate differences between input speed and output speed" is accurate; consider this an even more layman's way of understanding it.
"To buffer", the verb, has made its way into everyday vocabulary. For example, when an Internet connection is slow and a Netflix video is interrupted, we even hear our parents say stuff like, "Give it time to buffer."
What they are saying is, "Hit pause; allow time for more of the video to download into memory; and then we can watch it without it stopping or skipping."
Given the producer / consumer analogy, Netflix is producing the video. The viewer is consuming it (watching it). A space on your computer where extra downloaded video data is temporarily stored is the buffer.
A video progress bar is probably the best visual example of this:
Say the video is 5:05 long. Its total play time is represented by the white portion of the bar (which would be solid white if you had not started watching it yet).
As represented by the purple, I've actually consumed (watched) 10 seconds of the video.
The grey portion of the bar is the buffer. This is the video data that is currently downloaded into memory, the buffer, and is available to you locally. In other words, even if your Internet connection were to be interrupted, you could still watch the part you have buffered.
A buffer is a data area shared by hardware devices or program processes that operate at different speeds or with different sets of priorities. The buffer allows each device or process to operate without being held up by the other. For a buffer to be effective, the size of the buffer and the algorithms for moving data into and out of it need to be chosen carefully.
A buffer is a "midpoint holding place", but it exists not so much to accelerate the speed of an activity as to support the coordination of separate activities.
This term is used both in programming and in hardware. In programming, buffering sometimes implies the need to screen data from its final intended place so that it can be edited or otherwise processed before being moved to a regular file or database.
A buffer is a temporary placeholder (a variable, in many programming languages) in memory (RAM or disk) into which data can be dumped and then processed.
Buffering has many advantages: it allows things to happen in parallel, it improves I/O performance, and so on.
It also has downsides if not used correctly, such as buffer overflow and buffer underflow.
A C example of character buffers:
#include <stdlib.h>  /* for calloc() */

char *buffer1 = calloc(5, sizeof(char));   /* zero-initialized room for 5 chars */
char *buffer2 = calloc(15, sizeof(char));  /* a second, larger 15-char buffer */
