I am rendering a certain scene to an off-screen frame buffer (FBO) and then I'm reading the rendered image using glReadPixels() for processing on the CPU. The processing involves some very simple scanning routines and extraction of data.
After profiling I realized that most of what my application does is spend time in glReadPixels() - more than 50% of the time. So the natural step is to move the processing to the GPU so that the data would not have to be copied.
So my question is - what would be the best way to program such a thing to the GPU?
GLSL?
CUDA?
Anything else I'm not currently aware of?
The main requirements is that it'll have access to The rendered off-screen frame bufferes (or texture data since it is possible to render to a texture) and to be able to output some information to the CPU, say in the order of 1-2Kb per frame.
You might find the answers in the "Intro to GPU programming" questions useful.
-Adam
There are a number of pointers to getting started with GPU programming in other questions, but if you have an application that is already built using OpenGL, then probably your question really is "which one will interoperate with OpenGL"?
After all, your whole point is to avoid the overhead of reading your FBO back from the GPU to the CPU with glReadPixels(). If, for example you had to read it back anyway, then copy the data into a CUDA buffer, then transfer it back to the gpu using CUDA APIs, there wouldn't be much point.
So you need a GPGPU package that will take your OpenGL FBO object as an input directly, without any extra copying.
That would probably rule out everything except GLSL.
I'm not 100% sure whether CUDA has any way of operating directly on an OpenGL buffer object, but I don't think it has that feature.
I am sure that ATI's Stream SDK doesn't do that. (Although it will interoperate with DirectX.)
I doubt that the DirectX 11 "technology preview" with compute shaders has that feature, either.
EDIT: Follow-up: it looks like CUDA, at least the most recent version, has some support for OpenGL interoperability. If so, that's probably your best bet.
I recently found this Modern GPU
You may find OpenAI Triton useful
Related
I want to use Electron as a debug overlay for a Vulkan Render Engine im building. Since i have a lot of requirements on this debug tool writing one in engine myself would take way too long. I would like to use electron instead of Qt or similar since i feel its a lot more powerful and flexible with less effort (once its working).
The problem is now that i somehow either have to get my render output to electron or electrons output to my engine. As far as i can tell the easiest solution would be to copy the data back to cpu then transfer it. But that would be extremely slow and cost a lot of bandwidth. So i was wondering if there is a better solution.
I have two ideas to make it work but i didnt find any ways to implement them or even anyone talking about it.
The first would be to have electron configured to run on the gpu somehow get the handle for the output texture and importing it into my render engine using vulkan external memory. However as i have no experience with chromium and there doesnt seem to be anyone else that did it this i dont think it would work out to well.
The second idea was to do the opposite. Using a canvas element with webgl and again using vulkan external memory to copy the output of my engine to a texture and displaying it. I have full control over the draw process here so i think it would be a lot simpler and more stable. However again i found no way of setting up a webGL texture handle as an external memory object.
Is there any better way of doing this or some help on how to implement it?
I am a mathematician and not a programmer, I have a notion on the basics of programming and am a quite advanced power-user both in linux and windows.
I know some C and some python but nothing much.
I would like to make an overlay so that when I start a game it can get info about amd and nvidia GPUs like frame time and FPS because I am quite certain the current system benchmarks use to compare two GPUs is flawed because small instances and scenes that bump up the FPS momentarily (but are totally irrelevant in terms of user experience) result in a higher average FPS number and mislead the market either unintentionally or intentionally (for example, I cant remember the name of the game probably COD there was a highly tessellated entity on the map that wasnt even visible to the player which lead AMD GPUs to seemingly under perform when roaming though that area leading to lower average FPS count)
I have an idea on how to calculate GPU performance in theory but I dont know how to harvest the data from the GPU, Could you refer me to api manuals or references to help me making such an overlay possible?
I would like to study as little as possible (by that I mean I would like to learn what I absolutely have to learn in order to get the job done I dont intent to become a coder).
I thank you in advance.
It is generally what the Vulkan Layer system is for, which allows to intercept API commands and inject your own. But it is nontrivial to code it yourself. Here are some pre-existing open-source options for you:
To get to timing info and draw your custom overlay you can use (and modify) a tool like OCAT. It supports Direct3D 11, Direct3D 12, and Vulkan apps.
To just get the timing (and other interesting info) as CSV you can use a command-line tool like PresentMon. Should work in D3D, and I have been using it with Vulkan apps too and it seems to accept them.
I am a desktop GL developer, and I am starting to explore the mobile world.
To avoid misunderstandings, or welcome-but-trivial replies, I can humbly say that I am pretty well aware of the GL and GL|ES machinery.
The short question is: if we are using GL|ES 2.0 in a shared memory architecture, what's the point behind using VBOs against client-side arrays?
More in detail:
Vertex buffers are raw chunks of memory, the driver cannot in any way optimize anything because the access pattern depends on: 1) how the application configures vertex data layout, 2) how a vertex shader consumes buffer content, and 3) we can have lots of vertex shaders operating in different ways, and differently sourcing the same buffer.
Alignment: individual VBO storage could start at addresses that are optimal to the underlying GL system; what if I just force (e.g, respect alignment best practices) client-side arrays allocation to these boundaries?
Tile-Based Rendering vs. Immediate Mode architectures should not come into play: to my understanding, this is not related to my question (i.e., memory access).
I understand that using VBOs can have your code run better/faster in future platforms/hardware without modifying it, but this is not the focus of this question.
Alongside, I also realize that using VBOs in a shared memory architecture doubles memory usage (if you, for some reason, have to keep vertex data at your disposal), and it costs you a memcpy of the data.
As with interleaved vertex arrays, VBO usage has got a great "hype" in developers' forums/blogs/official_technotes without any data supporting those statements (i.e., benchmarks).
Is VBO usage worth it on shared memory architectures?
Do client-side arrays work well?
What do you think/know about this?
I can report that using of VBOs to store vertex data on Android devices gave me zero performance improvement. Tried it on Adreno, Mali400 and PowerVR GPUs. However, we use VBOs considering that it is the best practice for OpenGL ES.
You can find notes about this in our article (Vertex Buffer Objects paragraph).
According to this report, even holding SMA constant it depends on both the OpenGL implementation (some VBO work is secretly done on the CPU) and the size of the VBOs:
http://sarofax.wordpress.com/2011/07/10/vbo-vertex-buffer-object-limitations-on-ios-devices/
I will tell you, what i know about iOS platform.
VBO does really improve your performance.
VBO is perfect, if you have a static geometry - once copied, no additional overhead on every draw call. CA will do copy your data from client memory to "gpu memory" every drawcall. It may do data realigning, if you forgot about it.
VBO can be mapped to gpu vie glMapBuffer - it is an asynchronous operation, meaning, it almost has no overhead, but you should remember - when you are mapping\unmapping your buffer, it's better to use it 2 frames after unmap operation - to avoid synchronization
Apple engineers proclaim, that VBO will have better performance, than CA on SGX hardware, even if you'll reupload it every frame - i don't know the details.
VBO is a best practice. CA are deprecated. Better keep in pace with modern trends and stay as much cross-platform, as possible
I am working on an image processing app for the iOS, and one of the various stages of my application is a vector based image posterization/color detection.
Now, I've written the code that can, per-pixel, determine the posterized color, but going through each and every pixel in an image, I imagine, would be quite difficult for the processor if the iOS. As such, I was wondering if it is possible to use the graphics processor instead;
I'd like to create a sort of "pixel shader" which uses OpenGL-ES, or some other rendering technology to process and posterize the image quickly. I have no idea where to start (I've written simple shaders for Unity3D, but never done the underlying programming for them).
Can anyone point me in the correct direction?
I'm going to come at this sideways and suggest you try out Brad Larson's GPUImage framework, which describes itself as "a BSD-licensed iOS library that lets you apply GPU-accelerated filters and other effects to images, live camera video, and movies". I haven't used it and assume you'll need to do some GL reading to add your own filtering but it'll handle so much of the boilerplate stuff and provides so many prepackaged filters that it's definitely worth looking into. It doesn't sound like you're otherwise particularly interested in OpenGL so there's no real reason to look into it.
I will add the sole consideration that under iOS 4 I found it often faster to do work on the CPU (using GCD to distribute it amongst cores) than on the GPU where I needed to be able to read the results back at the end for any sort of serial access. That's because OpenGL is generally designed so that you upload an image and then it converts it into whatever format it wants and if you want to read it back then it converts it back to the one format you expect to receive it in and copies it to where you want it. So what you save on the GPU you pay for because the GL driver has to shunt and rearrange memory. As of iOS 5 Apple have introduced a special mechanism that effectively gives you direct CPU access to OpenGL's texture store so that's probably not a concern any more.
I want to do projects to make my resume more appealing to game companies. So I am going to start buying books. But I don't know rather to read DirectX 9 or 10 api books to start off with. DirectX10 is great, but it seems the industry is moving slow to 10. so should I use 9 or go with 10 ??
I would suggest learning the basics using directx9 and then rapidly moving on to dx11. DirectX11 is harder to get started in than DirectX9 because it's slightly more complex but also a lot of the utility functions in D3DX are no longer there, or have been moved to source code like the effects framework. This is no bad thing, but it does make it signifiacantly more complex to learn as you have to learn a lot more things at once.
Spend 2 or 3 weeks learning DX9 then move to DX11 for "real" work :P
Learn basic DX9 using the fixed pipeline and d3dx for loading models etc. It's a lot simpler than DX11 and much better documented, and you'll get a triangle and then a model on screen very much faster. Play with that until you completely understand the basic concepts and tranformations.
But then rewrite it all using shaders only. You'll need to use them in DX10/11 anyway but it's a lot easier to learn when you already have a working framework of code, and it's a lot simpler to get that working in DX9.
Once you have that working, learn DX11. You'll have to switch math libraries. You'll have to invent your own model formats and loaders. You'll have to either invent your own effects framework or use the example one, but they are all much easier now you already know the basics of 3d and programming shaders.
TBH further to OneOfOne's comment if you know how to do 3D development in GL, D3D9, D3D10 or D3D11 then you can transfer those skills to any of the others with a little bit of work.
Personally I'd aim for D3D11 as that way you are learning the cutting edge. You'll find you'll be able to do GL, D3D9 or D3D10 with a little work. Do enough work on the theory and you'll discover that its not even that hard to transfer the skills to a fully software engine.
If your intention is really to learn a skill that you would use in the game industry, stick with DirectX 9. Since DirectX 10 and 11 both require Vista or Window 7, game developers are still mostly ignoring them and targeting DirectX 9 in order to have support for Windows XP.
That being said, it doesn't really matter which you start with. The differences are not that large. If you understand the concepts behind 3D APIs and how the GPU pipeline works, you can pick up any of the three or even OpenGL with minimal effort.
Fact is, you need to learn both.
As long as 50% of gamers are still on WinXP, you're going to need to be able to program in Direct3D9.
D3D9 isn't any easier to get started with than D3D10/11. Its the same principles, with vertices to be placed, normals to be calculated, and meshes to be rendered. Whether you're creating a ID3D11BlendState structure or calling IDirect3DDevice9::SetRenderState(), its the same concept, just different ways of doing it.
After working with d3d11 a couple of days, I've come to think of it as better than DX9 in a lot of ways. For one, you're able to use the full caps of the GPU including geometry shaders. 2nd, it forces you to fully understand the graphics pipeline to even draw anything (note how functions are named after the stage of the pipeline they affect: here: (IA* fcns: input-assembler stage, OM* fcns: output-merger stage etc) ). This may result in a slightly larger INITIAL startup curve, but once you get it, its not any harder than D3D9 and is better, since the very naming of the functions helps concepts stick.
So get going on both, and learning them in tandem may help reduce the amount of effort you spend learning deprecated API's/methods of doing things from DX9 (ie you really want to spend more time using shaders, and don't use the fixed function pipeline section of DX9 too much).
You can check Luna's books for DX9 /DX11(I suggest you start with 11). You can check out http://www.rastertek.com/tutdx11.html but he doesn't explain everything so you can go in Luna s book to see what is with those functions or properties
With some little exceptions, DX10 is just a legacy free DX9. For example DX9 had build in options for rendering Flatshaded, Textured or using a Shader. In DX10 these options are gone, you always have to use a real shader. If you want to do flatshading, write a HLSL shader that does flat shading.
So I would suggest you learn DX10 (or DX11). You will be able to adopt fast to DX9 but with a more modern coding style by not using legacy functions. They can be quiet confusing, so DX10 will focus you on relevant things.
If you are a real beginner, and setting up a vertex-buffer to create a single triangle is confusing you (as real 3D-Programmer you are no more interesten in single triangles) I even would suggest to start with OpenGL. You will have faster success, but in reality this can be a little bit distracting as DX9-Legacy if you want to focus on modern 3D-Coding.
Yes do not waste your time with DX10 it was never really adopted as the industry standard for any period of time, there wasn't any big enough changes to warrant people upgrading from DX9 but for DX11 there was.
I suggest directx 11, there's no reason in my opinion to waste time on deprecated functions or techniques.
Learning shaders from the start will make things way more clear
Try doing the samples from the sample folder of both 9 and 10, and if your computer can support it, 11. This is what I am doing.