how to get a back buffer copy inside the video memory? - directx

Now I can get backbuffer pointer called pBackBuffer by
m_pDevice->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO,
IREF_GETPPTR(pBackBuffer,IDirect3DSurface9));
And then create a surface called pSurfTemp in syetem memory by
m_pDevice->CreateOffscreenPlainSurface(
g_Proc.m_Stats.m_SizeWnd.cx, g_Proc.m_Stats.m_SizeWnd.cy,
s_bbFormat, D3DPOOL_SYSTEMMEM,
IREF_GETPPTR(pSurfTemp,IDirect3DSurface9), NULL );
Then I get the back buffer data by
m_pDevice->GetRenderTargetData(pBackBuffer, pSurfTemp);
Data seems transforming like this: videoMemory->systemMemory
Since next step is to operate the data in video memory again, it will copy it from system memory back into video memory. It wastes time.
I want to copy the data inside video memory, How can I do it?

If I understood right, you will want to CreateOffscreenPlainSurface() with D3DPOOL_DEFAULT flag, so driver will choose appropriate memory location automatically. But it never guaranteed that it always will be videocard's on-board memory.
By the way, premature optimization is the root of all evil. =)
Edit: Another option is to switch API from DirectX 9 (which is obsolete) to DirectX 11, which allows much more precise resources manipulations. Also, OpenGL goes somewhere in between. Both, is a huge code rewrite.

Related

Is reusing a WebGLProgram a good idea?

Instead of creating new WebGlProgram's using gl.createProgram() is it a good idea to keep reusing one?
I am listing the steps that I should be doing if I am to reuse one:
AttachShader(s): in my case I need to attach a new Fragment shader only. (Question: Can I hang on to a compiled shader?)
linkProgram
useProgram
getAttribLocation
getUniformLocation
What are you trying to do? 99.9% of all GPU apps make shader programs and are done. They might make 1, they might make 5000, but they aren't ever in a position where they would even need to consider your question. So what are you really trying to do?
Those few apps that do allow you to edit shaders (shadertoy, glslsandbox, vertexshaderart, ...) either make new ones and delete old or reuse. There's no benefit to one or the other it's just a matter of style.
Yes you can hold on to shaders. You can use shaders with multiple programs. It's common to do so.
If you change a shader it won't affect a program unless/until you relink that program with gl.linkProgram. Anytime you call gl.linkProgram and it's successful all your previous uniform locations for that program are obsolete and you have to query new ones.

What is the result of sample with an uninitialized texture in DirectX?

The OpenGL has its description about this, but how about DirectX?
In my guess, the sample result is float(0, 0, 0, 0), or arise a crash by the driver. Whatever, it just my guess, or partial case if tested it myself only. I want to make it clear.
The uninitialized texture means, did not pass any data with D3DDevice::CreateTexture2D(), and also, did not map nor update resource.
I want to take the description about DirectX 11 version if possible.
From the Create2DTexture function (that you linked):
If you don't pass anything to pInitialData, the initial content of the memory for the resource is undefined. In this case, you need to write the resource content some other way before the resource is read.
Undefined can mean anything, the driver will determine the exact behavior. It's unlikely that it will crash, but as with undefined behavior, anything is possible. A sample from this texture is certainly not guaranteed to be float4(0,0,0,0) as it is with an unbound texture.
It's analogous to accessing uninitialized system memory. The contents might be filled memory written from previous operations that had allocated the same memory (depending on the allocator's behavior). I would suggest if you want consistent behavior, either use an unbound texture instead, or, initialize the contents.
Yes you get 0 for all values its the same if you sample from a null texture.
I cant remember where it was that I read it but I know its in the MSDN doc, I also happen to do this from time to time because I allocate most of the textures\buffers\views at the start and fill as I go and if I sample from null or uninitialized then it just read 0.

Why does VkAccessFlagBits include both read bits and write bits?

In vulkan.h, every instance of VkAccessFlagBits appears in a pair that contains a srcAccessMask and a dstAccessMask:
VkAccessFlags srcAccessMask;
VkAccessFlags dstAccessMask;
In every case, according to my understanding, the purpose of these masks is to help designate two sets of operations, such that results of operations in the first set will be visible to operations in the second set. For instance, write operations occurring prior to a barrier should not get hung up in caches but should instead propagate all the way to locations from which they can be read after the barrier. Or something like that.
The access flags come in both READ and WRITE forms:
/* ... */
VK_ACCESS_SHADER_READ_BIT = 0x00000020,
VK_ACCESS_SHADER_WRITE_BIT = 0x00000040,
VK_ACCESS_COLOR_ATTACHMENT_READ_BIT = 0x00000080,
VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT = 0x00000100,
/* ... */
But it seems to me that srcAccessMask should probably always be some sort of VK_ACCESS_*_WRITE_BIT combination, while dstAccessMask should always be a combination of VK_ACCESS_*_READ_BIT values. If that is true, then the READ/WRITE distinction is identical to and implicit in the src/dst distinction, and so it should be good enough to just have VK_ACCESS_SHADER_BIT etc., without READ_ or WRITE_ variants.
Why are there READ_ and WRITE_ variants, then? Is it ever useful to specify that some read operations must fully complete before some other operations have begun? Note that all operations using VkAccessFlagBits produce (I think) execution dependencies as well as memory dependencies. It seems to me that the execution dependencies should be good enough to prevent earlier reads from receiving values written by later writes.
While writing this question I encountered a statement in the Vulkan specification that provides at least part of an answer:
Memory dependencies are used to solve data hazards, e.g. to ensure that write operations are visible to subsequent read operations (read-after-write hazard), as well as write-after-write hazards. Write-after-read and read-after-read hazards only require execution dependencies to synchronize.
This is from the section 6.4. Execution And Memory Dependencies. Also, from earlier in that section:
The application must use memory dependencies to make writes visible before subsequent reads can rely on them, and before subsequent writes can overwrite them. Failure to do so causes the result of the reads to be undefined, and the order of writes to be undefined.
From this I surmise that, yes, the execution dependencies produced by the Vulkan commands that involve these access flags probably do free you from ever having to put a VK_ACCESS_*_READ_BIT into a srcAccessMask field--but that you might in fact want to have READ_ flags, WRITE_ flags, or both in some of your dstAccessMask fields, because apparently it's possible to use an explicit dependency to prevent read-after-write hazards in such a way that write-after-write hazards are NOT prevented. (And maybe vice-versa?)
Like, maybe your Vulkan will sometimes decide that a write does not actually need to be propagated all the way through a particular cache to its final specified destination for the sake of a subsequent read operation, IF Vulkan happens to know that that read operation will simply read from that same cache, saving some time? But then a second write might happen, and write to a different cache, and there'll be two caches left in a race (with the choice of winner undefined) to send their two values to the same spot. Or something? Maybe my mental model of these caches is entirely wrong.
It is fairly solidly established, at least, that memory barriers are confusing.
Let's go over all the possibilities:
read–read — well yeah that one is pretty useless. Khronos seems to agree #131 it is pointless value in src (basically equivalent to 0).
read–write — execution dependency should be sufficient to synchronize without this. Khronos seems to agree #131 it is pointless value in src (basically equivalent to 0).
write–read — that's the obvious and most common one.
write–write — similar reason to write–read above. Without it the order of the writes would be undefined. It is a bit pointless for most situations to write something you haven't even read in between. But hey, now you have a way to synchronize it.
You can provide bitmask of more of these masks to both src and dst. In which case it makes sense to have both masks for driver to sort the dependencies out for you. (I don't expect performance overhead from this on API level, so it is allowed as convenience)
From API design perspective, it could mean adding different enum for srcAccess. But perhaps _READ variants could just be forbidden in srcAccess through "Valid Usage", making this argument weak. The src == READ variant might have been kept, because it is benign.

ChromeWorker to write a huge file

In my extension, I need to write a huge file (say around 20 gigs) to the disk. Currently I am doing it in the main thread, but file creation is very expensive operation. I was about to move the whole file creation process to a ChromeWorker, but based on https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Functions_and_classes_available_to_workers I cannot have access to the nsiFile from a ChromeWorker.
So my questions are:
1. Is it possible to access Cc, Ci, and Cu from within a ChromeWorker?
2. If not what would be the most efficient way to create and fill large files in Firefox. Note that I need to write the file based on segments and offsets (Ci.nsISeekableStream).
It's not possible to access nsIFile from ChromeWorker. But nsIFile is horrible synchronus option.
Go with OS.File: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/OSFile.jsm
On that page go to the link for usage on workers: https://developer.mozilla.org/docs/Mozilla/JavaScript_code_modules/OSFile.jsm/OS.File_for_workers
On the mainthread os.file returns promises.
In worker they are synchronus. Wrap your os.file functions in worker with a try-catch, as when an error occurs, (like os.file.remove with option of ignoreAbsent set to false) then the catch will hold the OS.File.Error object.
Great move to ChromeWorker btw! I'm a huge fan of ChromeWorkers. I wrote a simple example of jsm using chromeworker here: https://github.com/Noitidart/jpm-chromeworker
For segments, you'll have to OS.File.open and then on the return value do a .setPosition() then you can read certain number of bytes from that position, or write, or whatever. Its awesome stuff. OS.File is the new way and the recommended way to do file operations. Its been around awhile now though since like Firefox 29 or before that.

Unix: sharing already-mapped memory between processes

I have a pre-built userspace library that has an API along the lines of
void getBuffer (void **ppBuf, unsigned long *pSize);
void bufferFilled (void *pBuf, unsigned long size);
The idea being that my code requests a buffer from the lib, fills it with stuff, then hands it back to the lib.
I want another process to be able to fill this buffer. I can do this by creating some new shared buffer via shm*/shm_* APIs, have the other process fill that, then copy it to the lib's buffer in the lib's local process, but this has the overhead of an extra (potentially large) copy.
Is there a way to share memory that has ALREADY been mapped for a process? eg something like:
[local lib process]
getBuffer (&myLocalBuf, &mySize);
shmName = shareThisMemory (myLocalBuf, mySize);
[other process]
myLocalBuf = openTheSharedMemory (shmName);
That way the other process could write directly into the lib's buffer.
(Synchronization between the processes is already taken care of so no problems there).
There are good reasons for not allowing this functionality, particularly from the security side of things. A "share this mem" API would subvert the access permissions system.
Just assume an application holds some sort of critical/sensitive information in memory; the app links (via e.g. using a shared library, a preload, a modified linker/loader) to whatever component outside, and said component for the sheer fun of it decides to "share out the address space". It'd be a free-for-all, a method to bypass any sort of data access permission/restriction. You'd tunnel your way into the app.
Not good for your usecase, admitted, but rather justified from the system / application integrity point of view. Try searching the web for /proc/pid/mem mmap vulnerability for some explanation why this sort of access isn't wanted (in general).
If the library you use is designed to allow such shared access, it must itself provide the hooks to either allocate such a shared buffer, or use an elsewhere-preallocated (and possibly shared) buffer.
Edit: To make this clear, the process boundary is explicitly about not sharing the address space (amongst other things).
If you require a shared address space, either use threads (then the entire address space is shared and there's never any need to "export" anything), or explicitly set up a shared memory region in the same way as you'd set up a shared file.
Look at it from the latter point of view, two processes not opening it O_EXCL would share access to a file. But if one process already has it open O_EXCL, then the only way to "make it shared" (open-able to another process) is to close() it first then open() it again without O_EXCL. There's no other way to "remove" exclusive access from a file that you've opened as such other than to close it first.
Just as there is no way to remove exclusive access to a memory region mapped as such other than to unmap it first - and for a process' memory, MAP_PRIVATE is the default, for good reasons.
More: a process-shared memory buffer really isn't much different than a process shared file; using SysV-IPC style semantics, you have:
| SysV IPC shared memory Files
==============+===================================================================
creation | id = shmget(key,..., IPC_CREAT); fd = open("name",...,O_CREAT);
lookup | id = shmget(key,...); fd = open("name",...);
access | addr = shmat(id,...); addr = mmap(...,fd,...);
|
global handle | IPC key filename
local handle | SHM ID number filedescriptor number
mem location | created by shmat() created by mmap()
I.e. the key is the "handle" you're looking for, pass that the same way you would pass a filename, and both sides of the IPC connection can then use that key to check whether the shared resource exists, as well at access (attach to the handle) the contents though that.
A more modern way to share memory among processes is to use the POSIX shm_open() API.
Essentially, it's a portable way of putting files on a ramdisk (tmpfs). So one process uses shm_open plus ftruncate plus mmap. The other uses shm_open (with the same name) plus mmap plus shm_unlink. (With more than two processes, the last one to mmap it can unlink it.)
This way the shared memory will get reclaimed automatically when the last process exits; no need to explicitly remove the shared segment (as with SysV shared memory).
You still need to modify your application to allocate shared memory in this way, though.
In theory at least, you can record the memory address of the buffer you got from your lib and have the other process mmap /proc/$PID_OF_FIRST_PROCCESS/mem file with the address as the offset.
I haven't tested it and I'm not sure /proc/PID/mem actually has an mmap file op implemented and there are a ton of security consideration but it might work. Best of luck :-)

Resources