I am new to DirectX, and I am trying to understand why we need to lock a surface before we can manipulate it. Can anyone explain?
You have to lock surfaces and buffers to tell the GPU that you are about to manipulate those resources. Locking is necessary to synchronize the GPU with the program running on the CPU.
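For illustration, here is a minimal sketch of that pattern (Direct3D 9; `surface` is assumed to be an existing lockable IDirect3DSurface9, and error handling is omitted):

```cpp
// Lock: the driver synchronizes with the GPU and hands the CPU a pointer.
D3DLOCKED_RECT locked;
if (SUCCEEDED(surface->LockRect(&locked, NULL, 0)))
{
    BYTE* pixels = static_cast<BYTE*>(locked.pBits);
    // ... read or write pixels here; rows are locked.Pitch bytes apart ...
    surface->UnlockRect();   // Unlock: the surface is handed back to the GPU.
}
```

Between LockRect and UnlockRect the GPU must not touch the surface, which is exactly the synchronization described above.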
I want to use Electron as a debug overlay for a Vulkan render engine I'm building. Since I have a lot of requirements for this debug tool, writing one in the engine myself would take far too long. I would like to use Electron instead of Qt or something similar because I feel it is a lot more powerful and flexible with less effort (once it's working).
The problem is that I somehow have to get either my render output into Electron or Electron's output into my engine. As far as I can tell, the easiest solution would be to copy the data back to the CPU and then transfer it, but that would be extremely slow and cost a lot of bandwidth. So I was wondering whether there is a better solution.
I have two ideas for making it work, but I haven't found any way to implement them, or even anyone talking about them.
The first would be to have Electron configured to run on the GPU, somehow get the handle of its output texture, and import it into my render engine using Vulkan external memory. However, since I have no experience with Chromium and nobody else seems to have done this, I don't think it would work out too well.
The second idea is the opposite: use a canvas element with WebGL and, again using Vulkan external memory, copy the output of my engine into a texture and display it. I have full control over the draw process here, so I think it would be a lot simpler and more stable. However, I again found no way to set up a WebGL texture handle as an external memory object.
Is there a better way of doing this, or any guidance on how to implement it?
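For reference, the Vulkan side of the import is the part I can already sketch. Everything below assumes Windows with VK_KHR_external_memory_win32, and `sharedHandle` is exactly the handle I don't know how to obtain from Chromium/WebGL:

```cpp
// Sketch of importing an externally allocated texture via Vulkan external memory.
// Assumes a valid VkDevice `device`; format/extent must match the source texture.
VkExternalMemoryImageCreateInfo extInfo{};
extInfo.sType = VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO;
extInfo.handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT;

VkImageCreateInfo imageInfo{};
imageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imageInfo.pNext = &extInfo;
// ... imageType, format, extent, usage, etc. ...
VkImage image;
vkCreateImage(device, &imageInfo, nullptr, &image);

VkImportMemoryWin32HandleInfoKHR importInfo{};
importInfo.sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_WIN32_HANDLE_INFO_KHR;
importInfo.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT;
importInfo.handle = sharedHandle;   // <-- the handle I don't know how to get

VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.pNext = &importInfo;
// allocationSize / memoryTypeIndex come from vkGetImageMemoryRequirements().
VkDeviceMemory memory;
vkAllocateMemory(device, &allocInfo, nullptr, &memory);
vkBindImageMemory(device, image, memory, 0);
```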
I am a desktop GL developer, and I am starting to explore the mobile world.
To avoid misunderstandings or welcome-but-trivial replies, I can humbly say that I am pretty well aware of the GL and GL|ES machinery.
The short question is: if we are using GL|ES 2.0 on a shared memory architecture, what is the point of using VBOs as opposed to client-side arrays?
In more detail:
Vertex buffers are raw chunks of memory; the driver cannot really optimize anything, because the access pattern depends on: 1) how the application configures the vertex data layout, 2) how a vertex shader consumes the buffer contents, and 3) the fact that many vertex shaders can operate in different ways and source the same buffer differently.
Alignment: individual VBO storage could start at addresses that are optimal for the underlying GL implementation; what if I simply force client-side array allocations onto those boundaries (i.e., respect the alignment best practices myself)?
Tile-Based Rendering vs. Immediate Mode architectures should not come into play: to my understanding, this is not related to my question (i.e., memory access).
I understand that using VBOs can make your code run better/faster on future platforms/hardware without modification, but that is not the focus of this question.
At the same time, I also realize that using VBOs on a shared memory architecture doubles memory usage (if, for some reason, you have to keep the vertex data at your disposal) and costs you a memcpy of the data.
As with interleaved vertex arrays, VBO usage gets a lot of hype in developer forums/blogs/official tech notes without any data (i.e., benchmarks) supporting those claims.
Is VBO usage worth it on shared memory architectures?
Do client-side arrays work well?
What do you think/know about this?
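For concreteness, here is a minimal sketch of the two submission paths I am comparing (GL|ES 2.0; `positionLoc`, `clientVertices`, `vertexCount`, and `Vertex` are placeholders):

```cpp
// (a) Client-side array: the pointer is dereferenced by the driver at draw time,
//     so the data is copied out of client memory on every draw call.
glEnableVertexAttribArray(positionLoc);
glVertexAttribPointer(positionLoc, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      clientVertices);                 // plain CPU pointer
glDrawArrays(GL_TRIANGLES, 0, vertexCount);

// (b) VBO: the same data is memcpy'd once into driver-owned storage.
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex),
             clientVertices, GL_STATIC_DRAW);
glEnableVertexAttribArray(positionLoc);
glVertexAttribPointer(positionLoc, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (const void*)0);                 // offset into the bound VBO
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
```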
I can report that using VBOs to store vertex data on Android devices gave me zero performance improvement. I tried it on Adreno, Mali-400 and PowerVR GPUs. However, we still use VBOs because it is considered best practice for OpenGL ES.
You can find notes about this in our article (Vertex Buffer Objects paragraph).
According to this report, even holding the shared memory architecture constant, it depends on both the OpenGL implementation (some VBO work is secretly done on the CPU) and the size of the VBOs:
http://sarofax.wordpress.com/2011/07/10/vbo-vertex-buffer-object-limitations-on-ios-devices/
I will tell you what I know about the iOS platform.
VBOs really do improve performance.
A VBO is perfect if you have static geometry: once copied, there is no additional overhead on each draw call. Client-side arrays copy your data from client memory to "GPU memory" on every draw call, and the driver may also realign the data if you did not.
A VBO can be mapped via glMapBuffer. It is an asynchronous operation, meaning it has almost no overhead, but remember: when you map/unmap a buffer, it is better to use it two frames after the unmap operation, to avoid synchronization stalls (a rough sketch follows after these points).
Apple engineers claim that a VBO will perform better than client-side arrays on SGX hardware even if you re-upload it every frame; I don't know the details.
VBOs are best practice. Client-side arrays are deprecated. It is better to keep pace with modern trends and stay as cross-platform as possible.
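Here is a rough sketch of the map/write/unmap pattern described above, assuming GL|ES on iOS (glMapBufferOES / glUnmapBufferOES come from the OES_mapbuffer extension) and placeholder buffer names and sizes:

```cpp
// iOS headers: <OpenGLES/ES2/gl.h> and <OpenGLES/ES2/glext.h>
#include <string.h>               // memcpy

static const int kRingSize = 3;   // reuse a buffer only a few frames after unmapping
static GLuint s_vbos[kRingSize];  // created earlier with glGenBuffers + glBufferData
static int    s_frame = 0;

void uploadDynamicVertices(const void* vertexData, size_t bytes)
{
    GLuint vbo = s_vbos[s_frame % kRingSize];   // round-robin: avoid touching a
    glBindBuffer(GL_ARRAY_BUFFER, vbo);         // buffer the GPU may still be reading

    void* dst = glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);
    if (dst != NULL)
    {
        memcpy(dst, vertexData, bytes);         // write this frame's vertices
        glUnmapBufferOES(GL_ARRAY_BUFFER);
    }
    // ... issue the draw calls that source `vbo` ...
    ++s_frame;
}
```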
I've known about screen capture using device contexts and GDI since Windows XP. Is there a better way (i.e., DirectX?) now that the desktop is mostly Direct3D?
How can I screen capture using DirectX?
I want to know the most efficient way to do user-mode screen capture, for a tech support program that needs frequent screen scrapes.
UPDATE: I don't want to resort to using kernel mode drivers.
I am unsure this will actually be faster than the algorithms you have in mind, but one way to do it would be to copy your buffer out using GetRenderTargetData.
GetRenderTargetData
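A minimal sketch of that approach (Direct3D 9, error handling omitted; `device`, `width`, and `height` are assumed to already exist, and the destination format must match the render target):

```cpp
// Copy the current render target into a lockable system-memory surface
// that the CPU can then read.
IDirect3DSurface9* renderTarget  = nullptr;
IDirect3DSurface9* sysmemSurface = nullptr;

device->GetRenderTarget(0, &renderTarget);
device->CreateOffscreenPlainSurface(width, height, D3DFMT_A8R8G8B8,
                                    D3DPOOL_SYSTEMMEM, &sysmemSurface, nullptr);

// GPU -> system memory copy of the render target.
device->GetRenderTargetData(renderTarget, sysmemSurface);

// Now the pixels can be read on the CPU.
D3DLOCKED_RECT rect;
sysmemSurface->LockRect(&rect, nullptr, D3DLOCK_READONLY);
// ... read rect.pBits, one row every rect.Pitch bytes ...
sysmemSurface->UnlockRect();

sysmemSurface->Release();
renderTarget->Release();
```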
Based upon vcsjones's answer (above), see CodeProject: http://www.codeproject.com/KB/dialog/screencap.aspx#And%20The%20DirectX%20way%20of%20doing%20it%20
An alternative method is to use Spazzarama's application, which uses DirectX (based on SlimDX) and EasyHook to inject your capture DLL into a running application's DirectX pipeline.
My application seems to be slow, but in terms of CPU and RAM it seems to be fine. So I want to know how much graphics card memory I am using. I've seen some questions about this on SO, but they talk about Linux or NVIDIA. I would like to get this information for ATI cards on Windows.
Thanks.
How about the OpenGL debugger?
If you use OpenSceneGraph to render your scene, there is a stats monitor that shows GPU usage.
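For example, a minimal sketch of turning that monitor on (the on-screen stats display is cycled at runtime, with the 's' key by default):

```cpp
#include <osgViewer/Viewer>
#include <osgViewer/ViewerEventHandlers>

int main()
{
    osgViewer::Viewer viewer;
    // viewer.setSceneData(yourScene);                    // your scene goes here
    viewer.addEventHandler(new osgViewer::StatsHandler);  // on-screen frame/GPU stats
    return viewer.run();
}
```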
I am rendering a certain scene to an off-screen frame buffer (FBO) and then I'm reading the rendered image using glReadPixels() for processing on the CPU. The processing involves some very simple scanning routines and extraction of data.
After profiling I realized that most of what my application does is spend time in glReadPixels() - more than 50% of the time. So the natural step is to move the processing to the GPU so that the data would not have to be copied.
So my question is - what would be the best way to program such a thing to the GPU?
GLSL?
CUDA?
Anything else I'm not currently aware of?
The main requirement is that it has access to the rendered off-screen frame buffers (or the texture data, since it is possible to render to a texture) and can output some information to the CPU, say on the order of 1-2 KB per frame.
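For reference, the read-back I am currently doing looks roughly like this (`fbo`, `width`, `height`, and `cpuPixels` are placeholders):

```cpp
// Current bottleneck: read the off-screen result back and scan it on the CPU.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);                  // the off-screen target
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, cpuPixels);
// ... simple scanning / data extraction over cpuPixels ...
```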
You might find the answers in the "Intro to GPU programming" questions useful.
-Adam
There are a number of pointers to getting started with GPU programming in other questions, but if you have an application that is already built using OpenGL, then probably your question really is "which one will interoperate with OpenGL"?
After all, your whole point is to avoid the overhead of reading your FBO back from the GPU to the CPU with glReadPixels(). If, for example, you had to read it back anyway, then copy the data into a CUDA buffer and transfer it back to the GPU using the CUDA APIs, there wouldn't be much point.
So you need a GPGPU package that will take your OpenGL FBO object as an input directly, without any extra copying.
That would probably rule out everything except GLSL.
I'm not 100% sure whether CUDA has any way of operating directly on an OpenGL buffer object, but I don't think it has that feature.
I am sure that ATI's Stream SDK doesn't do that. (Although it will interoperate with DirectX.)
I doubt that the DirectX 11 "technology preview" with compute shaders has that feature, either.
EDIT: Follow-up: it looks like CUDA, at least the most recent version, has some support for OpenGL interoperability. If so, that's probably your best bet.
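If you go that route, the interop path looks roughly like this (a sketch using the CUDA runtime API; `fboTexture` is assumed to be the texture your FBO renders into, and `myKernel` is a hypothetical kernel doing the scanning):

```cpp
// GL headers must be included before cuda_gl_interop.h.
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

cudaGraphicsResource* resource = nullptr;

// Register the GL texture once, after it has been created.
cudaGraphicsGLRegisterImage(&resource, fboTexture, GL_TEXTURE_2D,
                            cudaGraphicsRegisterFlagsReadOnly);

// Each frame: map it, get a CUDA array, run the kernel, unmap.
cudaGraphicsMapResources(1, &resource, 0);
cudaArray_t array = nullptr;
cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);
// ... bind `array` to a texture/surface object and launch myKernel on it,
//     writing the 1-2 KB result into a small device buffer for the CPU ...
cudaGraphicsUnmapResources(1, &resource, 0);
```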
I recently found this: Modern GPU.
You may find OpenAI Triton useful.