I have not found any definitive answers to this, so decided to ask here. Is there really no way to save and load compiled webgl shaders? It seems a waste to compile the shaders every time someone loads the page, when all you would have to do is compile the shaders once, save it to a file, then load the compiled shader object, as you would HLSL (I know it's not GLSL, but i'm still a little new to OpenGL).
So, if possible, how can i save and load a compiled shader in webgl?
There really is no way, and imho thats a good thing. It would pose a security issue(feeding arbitrary bytecode to the GPU) in addition to that when drivers are updated the precompiled shaders are potentially missing new optimizations or just break.
when all you would have to do is compile the shaders once, save it to a file, then load the compiled shader object, as you would HLSL
OpenGL(and its derivatives) does not support loading pre-compiled shaders the same way DirectX does:
Program binary formats are not intended to be transmitted. It is not reasonable to expect different hardware vendors to accept the same binary formats. It is not reasonable to expect different hardware from the same vendor to accept the same binary formats.
https://www.opengl.org/wiki/Shader_Compilation#Binary_limitations
There seems to be no intermediate format like SPIR-V in OpenGL so you would need to compile the shaders on the target platform introducing a whole lot of additional concerns with users changing their graphics cards / employing a hybrid graphics solution, storage limitations on the client(5 MB using localstorage) and the possibility of abusing it to fingerprint the hardware.
Related
I am making a 2D game with Direct3D,and use D3DXCreateTextureFromFileInMemoryEx to load my images of game.There are 221KB size of images in my game. But when I use D3DXCreateTextureFromFileInMemoryEx to load the image to memory.They are becoming Dozens of times in the memory of use.
So,how can i to reduce the memory of use?
The D3DX Texture loading functions by default are very accomodating and will try and be helpful by performing all kinds of format conversions, compression / decompression, scaling, mip-map generation and other tasks for you to turn an arbitrary source image into a usable texture.
If you want to prevent any of that conversion (which can be expensive in both CPU cost and memory usage) then you need to ensure that your source textures are already in exactly the format you want and then pass flags to D3DX to tell it not to perform any conversions or scaling.
Generally the only file format that supports all of the texture formats that D3D uses is the D3D format DDS so you will want to save all of your source assets as DDS files. If the software you use to make your textures does not directly support saving DDS files you can use the tools that come with the DirectX SDK to do the conversion for you, write your own tools or use other third party conversion tools.
If you have the full DirectX SDK installed (the June 2010 SDK, the last standalone release) you can use the DirectX Texture Tool which is installed along with it. This isn't shipped with the Windows 8 SDK though (the DirectX SDK is now part of the Windows SDK and doesn't ship as a standalone SDK).
Once you have converted your source assets to DDS files with exactly the format you want to use at runtime (perhaps DXT compressed, with a full mipmap chain pre-generated) you want to pass the right flags to D3DX texture loading functions to avoid any conversions. This means you should pass:
D3DX_DEFAULT_NONPOW2 for the Width and Height parameters and D3DX_DEFAULT for the MipLevels parameter (meaning take the dimensions from the file and don't round the size to a power of 2)*.
D3DFMT_FROM_FILE for the Format parameter (meaning use the exact texture format from the file).
D3DX_FILTER_NONE for the Filter and MipFilter parameters (meaning don't perform any scaling or filtering).
If you pre-process your source assets to be in the exact format you want to use at runtime, save them as DDS files and set all the flags above to tell D3DX not to perform any conversions then you should find your textures load much faster and without excessive memory usage.
*This requires your target hardware supports non power of 2 textures or you only use power of 2 textures. Almost all hardware from the last 5 years supports non power of 2 textures though so this should be pretty safe.
I am a desktop GL developer, and I am starting to explore the mobile world.
To avoid misunderstandings, or welcome-but-trivial replies, I can humbly say that I am pretty well aware of the GL and GL|ES machinery.
The short question is: if we are using GL|ES 2.0 in a shared memory architecture, what's the point behind using VBOs against client-side arrays?
More in detail:
Vertex buffers are raw chunks of memory, the driver cannot in any way optimize anything because the access pattern depends on: 1) how the application configures vertex data layout, 2) how a vertex shader consumes buffer content, and 3) we can have lots of vertex shaders operating in different ways, and differently sourcing the same buffer.
Alignment: individual VBO storage could start at addresses that are optimal to the underlying GL system; what if I just force (e.g, respect alignment best practices) client-side arrays allocation to these boundaries?
Tile-Based Rendering vs. Immediate Mode architectures should not come into play: to my understanding, this is not related to my question (i.e., memory access).
I understand that using VBOs can have your code run better/faster in future platforms/hardware without modifying it, but this is not the focus of this question.
Alongside, I also realize that using VBOs in a shared memory architecture doubles memory usage (if you, for some reason, have to keep vertex data at your disposal), and it costs you a memcpy of the data.
As with interleaved vertex arrays, VBO usage has got a great "hype" in developers' forums/blogs/official_technotes without any data supporting those statements (i.e., benchmarks).
Is VBO usage worth it on shared memory architectures?
Do client-side arrays work well?
What do you think/know about this?
I can report that using of VBOs to store vertex data on Android devices gave me zero performance improvement. Tried it on Adreno, Mali400 and PowerVR GPUs. However, we use VBOs considering that it is the best practice for OpenGL ES.
You can find notes about this in our article (Vertex Buffer Objects paragraph).
According to this report, even holding SMA constant it depends on both the OpenGL implementation (some VBO work is secretly done on the CPU) and the size of the VBOs:
http://sarofax.wordpress.com/2011/07/10/vbo-vertex-buffer-object-limitations-on-ios-devices/
I will tell you, what i know about iOS platform.
VBO does really improve your performance.
VBO is perfect, if you have a static geometry - once copied, no additional overhead on every draw call. CA will do copy your data from client memory to "gpu memory" every drawcall. It may do data realigning, if you forgot about it.
VBO can be mapped to gpu vie glMapBuffer - it is an asynchronous operation, meaning, it almost has no overhead, but you should remember - when you are mapping\unmapping your buffer, it's better to use it 2 frames after unmap operation - to avoid synchronization
Apple engineers proclaim, that VBO will have better performance, than CA on SGX hardware, even if you'll reupload it every frame - i don't know the details.
VBO is a best practice. CA are deprecated. Better keep in pace with modern trends and stay as much cross-platform, as possible
I am developing a large game that streams in level data (including shaders) as you move through the game world. I do not want to have hitches in my frame rate as shaders are compiled/linked or on the first time they are used.
I have my shader compilation and linking working on a separate thread with its own open-gl context. But I have not been able to get the prewarming of the shaders to work on the separate thread (so that there is no performance hit when the shader is first used).
Prewarming is really not mentioned anywhere in the iOS or OpenGL docs. It is however mentioned in the OpenGL ES Analyzer (one of the instruments available when profiling from xcode). In this tool I get a "Shader Compiled Outside of Prewarming Phase" warning each time something is rendered with a shader that has not been used to render something before. The "Extended detail" says this:
"OpenGL ES Analyzer detected a shader compilation that is not part of an initial prewarming phase. Shader compilation can be a time consuming operation. To avoid them, prewarm all shaders used for rendering. To do this, make a prewarming passwhen your application launches and execute a drawing call with each of the shader programs to be used, using any gl state settings the shader program will be used in conjunction with. States such as blending, color mask, logic ops, multisamping, texture formats, and point primitive state can all affect shader compilation."
The term "compilation" is a little confusing here. The vertex and fragment shaders have already been compiled and the program has been linked. But the first time something is rendered with a given OpenGL state it does some more work on the shader to optimize it for that state I guess.
I have code to pre-warm the shaders by rendering a zero sized triangle before it's first use.
If I compile, link and pre-warm the shaders on the main thread with the same Open GL context as the normal rendering then it works. However if I do it on the background thread with its separate Open GL context it does not work (it still gets the Analyzer warning on first use).
So... it could be that prewarming a shader on a separate context has no effect on other contexts. Or it could be that I don't have all the same state set up the separate context. There is a lot of potential Open GL state that might need to be set up. I'm using an offscreen render buffer on the background thread so that could be considered part of the state.
Has anyone succeeded in getting prewarming working on a background thread?
To be honest with you I was quite ignorant on this matter until yesterday though I have been working on my engine optimization for a while. So, first of all, thank you for the tip :).
I have studied since then the shader warming topic and I have not found much around.
I have found a mention the official AMD documentation in a document titled "ATI OpenGL Programming and Optimization Guide":
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=6&ved=0CEoQFjAF&url=http%3A%2F%2Fdeveloper.amd.com%2Fmedia%2Fgpu_assets%2FATI_OpenGL_Programming_and_Optimization_Guide.pdf&ei=3HIeT_-jKYbf8AOx3o3BDg&usg=AFQjCNFProzLiXf5Aqqs4jZ2jOb4x0pssg&sig2=6YV7SVA97EFglXv_SX5weg
This is an excerpt of which refers to the warming of the shaders:
Quote:
While the R500 natively supports flow control in the fragment shading unit, the R300 and R400
asics does not. Static flow control for the R300 and R400 is emulated by the driver compiling out
unused conditionals and unrolling loops based on the set constants. Even though the R500 asics family
natively support flow control, the driver will still attempt to compile out static flow conditions enabling
it to reorganize shader instructions for better instruction scheduling. The driver will also try to cache
away the compiled shader for a specific static flow condition set in anticipation for its reuse. So when
writing a fragment program that uses static flow control, it is recommended to “warm” the shader cache
by rendering a dummy triangle on the very first frame that uses the common static conditional
permutations relevant for the life of the shader.
The best explanation I have found around is the following:
http://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/
Quote:
Incidentally, this is also the reason why you’ll often see a delay the first time you use a new shader or resource; a lot of the creation/compilation work is deferred by the driver and only executed when it’s actually necessary (you wouldn’t believe how much unused crap some apps create!). Graphics programmers know the other side of the story – if you want to make sure something is actually created (as opposed to just having memory reserved), you need to issue a dummy draw call that uses it to “warm it up”. Ugly and annoying, but this has been the case since I first started using 3D hardware in 1999 – meaning, it’s pretty much a fact of life by this point, so get used to it. :)
In this presentation, it is mentioned how the cryteck engined performed it on the far cry engine though it is mostly related to DirectX.
http://www.powershow.com/view/11f2b1-MzUxN/Far_Cry_and_DirectX_flash_ppt_presentation
I hope these links help in some way.
Is the format of compiled pixel and vertex shader object files as produced by fxc.exe documented anywhere either officially or unofficially?
I'd like to be able to read the constant name to register assignments from the shader files. I know that the effects framework in D3DX can do this, but I need to avoid using D3DX as it may not be installed on user's machines and I don't need it for anything else so I want to avoid them having to run the directx update.
If the effects framework can do it, then so can I if I can find out the file format but I can' seem to find it documented anywhere.
(this is for use in directx9)
From MSDN:
Asm Shader Reference (Windows)
Shader Binary Format
The bitwise layout of the shader instruction stream is defined in D3d9types.h. If you want to design your own shader compiler or construction tools and you want more information about the shader token stream, refer to the Direct3D 9 Driver Development Kit (DDK).
So you can either look through 'D3d9type.h' and try to figure it out that way (had a quick look and could see the enums/types you should need, but not how its structured), or download the DDK and read the official documentation.
Some more info can be found here: Direct3D Shader Codes (expand the tree on the left hand side of the screen to get all the info).
Microsoft deliberately keep this information away from you. As you are using DirectX 9 its relatively easy to backward engineer the format though. If you write a simple piece of shader assembly you can check out what he compiled code returned at the other side is. By making modifications to the assembler you can see how they byte code changes. You will start to see patterns in how registers are handled and where the instruction is encoded. You can thus, slowly but surely, work out the byte code. It won't be too quick though!
Microsoft has put the format specification online here: Direct3D Shader Codes.
It refers to constants by name, however (eg. D3DSIO_DCL), so you'll likely still need the Windows DDK to get any use out of it.
I am rendering a certain scene to an off-screen frame buffer (FBO) and then I'm reading the rendered image using glReadPixels() for processing on the CPU. The processing involves some very simple scanning routines and extraction of data.
After profiling I realized that most of what my application does is spend time in glReadPixels() - more than 50% of the time. So the natural step is to move the processing to the GPU so that the data would not have to be copied.
So my question is - what would be the best way to program such a thing to the GPU?
GLSL?
CUDA?
Anything else I'm not currently aware of?
The main requirements is that it'll have access to The rendered off-screen frame bufferes (or texture data since it is possible to render to a texture) and to be able to output some information to the CPU, say in the order of 1-2Kb per frame.
You might find the answers in the "Intro to GPU programming" questions useful.
-Adam
There are a number of pointers to getting started with GPU programming in other questions, but if you have an application that is already built using OpenGL, then probably your question really is "which one will interoperate with OpenGL"?
After all, your whole point is to avoid the overhead of reading your FBO back from the GPU to the CPU with glReadPixels(). If, for example you had to read it back anyway, then copy the data into a CUDA buffer, then transfer it back to the gpu using CUDA APIs, there wouldn't be much point.
So you need a GPGPU package that will take your OpenGL FBO object as an input directly, without any extra copying.
That would probably rule out everything except GLSL.
I'm not 100% sure whether CUDA has any way of operating directly on an OpenGL buffer object, but I don't think it has that feature.
I am sure that ATI's Stream SDK doesn't do that. (Although it will interoperate with DirectX.)
I doubt that the DirectX 11 "technology preview" with compute shaders has that feature, either.
EDIT: Follow-up: it looks like CUDA, at least the most recent version, has some support for OpenGL interoperability. If so, that's probably your best bet.
I recently found this Modern GPU
You may find OpenAI Triton useful