After more than an hour of looking for an answer (trying stuff in Munshi's "OpenGL ES 2.0 Programming Guide", searching Apple's documentation, searching StackOverflow), I'm still at a loss for getting glReadPixels to work. I've tried so many different ways, and the best I've got is fluctuating (and therefore wrong) results.
I've set up the simple case of a quad being rendered with shaders to the screen, and I've manually assigned gl_FragColor to pure red, so there should be absolutely no fluctuation on the screen. Then I try something like the following code before presentRenderbuffer:
GLubyte *pixels = (GLubyte *)malloc(3);
glReadPixels(100, 100, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixels);
NSLog(@"%d", (int)pixels[0]);
free(pixels);
Basically, I'm trying to read a single pixel at (100, 100) and get its red value, which I expect to be either 1 or 255. Instead I get values like 1522775, 3587, and 65536, even though the image on the screen never changes. I did something like this on Mac and it worked fine, but for some reason I can't get it to work on iOS. I have the above statement (and have tried a number of variations I've come across on the internet) after the call to glDrawArrays() and before the presentRenderbuffer: call. I've even tried the method from the "OpenGL ES 2.0 Programming Guide" that handles all combinations of read types and read formats by querying glGetIntegerv() for the framebuffer's information.
Any ideas? I'm sure someone will say, "use the search feature," but I've seriously come up dry on it and can't get any further. Thanks for your help!
From the OpenGL ES manual:
Only two format/type parameter pairs are accepted. GL_RGBA/GL_UNSIGNED_BYTE is always accepted, and the other acceptable pair can be discovered by querying GL_IMPLEMENTATION_COLOR_READ_FORMAT and GL_IMPLEMENTATION_COLOR_READ_TYPE.
So you'd better use GL_RGBA instead of GL_RGB (and allocate four bytes per pixel instead of three).
I have a certain library that uses WebGL1 to render things.
It heavily uses float textures and instanced rendering.
Nowadays support for WebGL1 is pretty spotty: some devices support WebGL2, where these extensions are core, but not WebGL1; others support WebGL1 but not the extensions.
At the same time, support for WebGL2 isn't great either. Maybe one day it will be, but for now it isn't.
I started looking at what it will take to support both versions.
For shaders, I think I can mostly get away with #defineing things. For example, #define texture2D texture and other similar things.
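As a rough sketch of that #define approach (the helper and the gl_FragColor alias are my own assumptions, not taken from any library), one shader body can be compiled under either GLSL ES version by prepending a per-version prelude:
// Sketch only: prepend a per-version prelude so one fragment shader body
// compiles as GLSL ES 1.00 (WebGL1) or GLSL ES 3.00 (WebGL2).
function buildFragmentSource(isWebGL2, body) {
  if (isWebGL2) {
    return [
      '#version 300 es',               // must be the very first line
      'precision highp float;',
      'out vec4 outColor;',
      '#define texture2D texture',     // old built-in name -> new one
      '#define gl_FragColor outColor', // note: some strict compilers dislike gl_-prefixed macros
      body,
    ].join('\n');
  }
  return ['precision highp float;', body].join('\n');
}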
When it comes to extensions, it becomes more problematic, since the extension objects no longer exist.
As an experiment, I tried copying the extension properties into the context object, e.g. gl.drawArraysInstanced = (...args) => ext.drawArraysInstancedANGLE(...args).
When it comes to textures, not much needs to change; perhaps add something like gl.RGBA8 = gl.RGBA when running in WebGL1, so that code written against the WebGL2 names will "just work" on both.
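For instance, a rough sketch of that experiment might look like this (the entry-point names come from ANGLE_instanced_arrays; the helper itself is just an illustration):
// Sketch only: copy WebGL1 extension entry points onto the context, WebGL2-style.
function promoteWebGL1Extensions(gl) {
  const inst = gl.getExtension('ANGLE_instanced_arrays');
  if (inst) {
    gl.drawArraysInstanced = (...args) => inst.drawArraysInstancedANGLE(...args);
    gl.drawElementsInstanced = (...args) => inst.drawElementsInstancedANGLE(...args);
    gl.vertexAttribDivisor = (...args) => inst.vertexAttribDivisorANGLE(...args);
  }
  // Alias a WebGL2-style constant so texture setup code can be shared.
  gl.RGBA8 = gl.RGBA;
}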
So then comes the question...did anyone try this?
I am worried about it hurting performance, especially the extra indirection for function calls.
It will also make the code less obvious to read if the assumption is that it can run in WebGL1; after all, no WebGL1 context has drawArraysInstanced or RGBA8. It also gets in the way of TypeScript typings and other minor things.
The other option is to have branches all over the code: two versions of the shaders (or #ifdef trickery), lots of branching for every place where texture formats are needed, and for every place where instancing is done.
Having something like what follows all over the place is pretty ugly:
if (version === 1) {
  instancedArrays.vertexAttribDivisorANGLE(m0, 1);
  instancedArrays.vertexAttribDivisorANGLE(m1, 1);
  instancedArrays.vertexAttribDivisorANGLE(m2, 1);
  instancedArrays.vertexAttribDivisorANGLE(m3, 1);
} else {
  gl.vertexAttribDivisor(m0, 1);
  gl.vertexAttribDivisor(m1, 1);
  gl.vertexAttribDivisor(m2, 1);
  gl.vertexAttribDivisor(m3, 1);
}
Finally, maybe there's a third way I didn't think about.
Got any recommendations?
Unfortunately I think most answers will be primarily opinion based.
The first question is why support both? If your idea runs fine on WebGL1 then just use WebGL1. If you absolutely must have WebGL2 features then use WebGL2 and realize that many devices don't support WebGL2.
If you're intent on doing it, twgl tries to make it easier by providing a function that copies all the WebGL1 extensions into their WebGL2 API positions. For example, like you mentioned, instead of
ext = gl.getExtension('ANGLE_instanced_arrays');
ext.drawArraysInstancedANGLE(...)
You instead do
twgl.addExtensionsToContext(gl);
gl.drawArraysInstanced(...);
I don't believe there will be any noticeable perf difference. Since those functions are only called a few hundred times a frame, the wrapping is not going to be the bottleneck in your code.
The point though is not really to support WebGL1 and WebGL2 at the same time. Rather it's just to make it so the way you write code is the same for both APIs.
Still, there are real differences between the two APIs. For example, to create a FLOAT RGBA texture in WebGL1 you use
gl.texImage2D(target, level, gl.RGBA, width, height, 0, gl.RGBA, gl.FLOAT, ...)
In WebGL2 it's
gl.texImage2D(target, level, gl.RGBA32F, width, height, 0, gl.RGBA, gl.FLOAT, ...)
WebGL2 will fail if you try to call it the same as WebGL1 in this case. There are other differences as well.
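One way to keep a single code path for that case is to branch once on the context type. A minimal sketch (the isWebGL2 check is my own; note that WebGL1 additionally needs OES_texture_float for FLOAT data):
// Sketch only: pick the internal format for a FLOAT RGBA texture per API version.
const isWebGL2 = typeof WebGL2RenderingContext !== 'undefined' &&
                 gl instanceof WebGL2RenderingContext;
const internalFormat = isWebGL2 ? gl.RGBA32F : gl.RGBA;
gl.texImage2D(gl.TEXTURE_2D, 0, internalFormat, width, height, 0, gl.RGBA, gl.FLOAT, null);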
Note though that your example of needing RGBA8 is not true.
gl.texImage2D(target, level, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, ...)
Will work just fine in WebGL1 and WebGL2. The spec specifically says that combination results in RGBA8 on WebGL2.
The biggest difference, though, is that there is no reason to use WebGL2 if you can get by with WebGL1. Or, vice versa, if you need WebGL2 then you probably cannot easily fall back to WebGL1.
For example, you mentioned using defines for shaders, but what are you going to do about features in WebGL2 that aren't in WebGL1, like texelFetch, the integer mod (%) operator, integer attributes, and so on? If you need those features you mostly need to write a WebGL2-only shader. If you don't need them, then there was really no point in using WebGL2 in the first place.
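For instance, a GLSL ES 3.00 fragment like the following (a minimal illustration of texelFetch, not code from this answer) has no direct 1.00 equivalent:
// WebGL2-only: integer texel addressing via texelFetch, no normalized coordinates needed.
const fs300 = `#version 300 es
precision highp float;
uniform sampler2D data;
out vec4 outColor;
void main() {
  outColor = texelFetch(data, ivec2(gl_FragCoord.xy), 0);
}`;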
Of course, if you really want to go for it, maybe you want to make a fancier renderer when the user has WebGL2 and fall back to a simpler one on WebGL1.
TL;DR: IMO, pick one or the other.
I found this question while writing the documentation for my library, which has many objectives, but one of them is exactly this: to support WebGL1 and WebGL2 at the same time for higher cross-device compatibility.
https://xemantic.github.io/shader-web-background/
For example, I discovered with BrowserStack that Samsung phones don't support rendering to floating-point textures in WebGL1, while it works perfectly fine for them in WebGL2. At the same time, WebGL2 will never appear on Apple devices, but rendering to half-float textures is pretty well supported there.
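A rough feature-detection sketch along those lines (the extension names are the standard ones; the helper and its return values are my own, and some WebGL1 implementations only reveal render-to-float support through an actual framebuffer-completeness check):
// Sketch only: decide which floating-point render-target flavor is usable.
function pickRenderableFloatFormat(gl, isWebGL2) {
  if (isWebGL2) {
    // WebGL2 still gates rendering to (half) float textures behind extensions.
    if (gl.getExtension('EXT_color_buffer_float')) return 'float';
    if (gl.getExtension('EXT_color_buffer_half_float')) return 'half-float';
    return null;
  }
  const canFloat = gl.getExtension('OES_texture_float') &&
                   gl.getExtension('WEBGL_color_buffer_float');
  const canHalf = gl.getExtension('OES_texture_half_float') &&
                  gl.getExtension('EXT_color_buffer_half_float');
  return canFloat ? 'float' : (canHalf ? 'half-float' : null);
}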
My library does not provide a full WebGL abstraction, but rather configures a pipeline for fragment shaders. Here is the source on GitHub, with the WebGL strategy code depending on the version:
https://github.com/xemantic/shader-web-background/blob/main/src/main/js/webgl-utils.js
Therefore, to answer your question: it is doable and desirable, but doing it in a totally generic way, for every WebGL feature, might be quite challenging. I guess the first question to ask is "What would be the common denominator?" in terms of supported extensions.
I'm just starting out with procedural generation, and I've made a program that generates lines using D0L-systems, following Paul Bourke's website. For the first two simple examples it works great, but when I input the rules of the "Leaf" L-system, my results are incorrect, as can be seen in this image.
Could any of you more experienced people point out where I might be going wrong? I'm pretty sure I'm misunderstanding something about the usage of the length factor. In my case, lengthFactor is a static float that is set once before the generation starts and is used to multiply/divide the line's length in the current drawing state; lengthFactor itself won't change during the generation.
I'm using OpenGL for rendering and programming in C++.
What I'm doing is GPGPU on WebGL, and I don't know whether the access pattern I'll be talking about applies to general graphics and gaming programs. In our code we frequently come across data which needs to be summarized or reduced per output texel. A very simple example is matrix multiplication, during which, for every output texel, you return a value which is the dot product of a row of one input and a column of the other input.
This has been the sore point of our performance, not so much because of the computation but because of the repeated data access. So I've been trying to find a pattern of reads or a data layout which would expedite this operation, and I have been completely unsuccessful.
I will be describing some assumptions and some schemes below. The sample code for all of these is under https://github.com/jeffsaremi/webgl-experiments
Unfortunately, due to size, I wasn't able to use the 'snippet' feature of StackOverflow. NOTE: all examples write to the console, not the HTML page.
Base matmul implementation: Example: [2,3]x[3,4]->[2,4]. In a simplistic form this produces two textures of (w:3, h:2) and (w:4, h:3). For each output texel I will be reading along the X axis of the left texture but going along the Y axis of the right texture. (see webgl-matmul.html)
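For reference, the per-texel reduction in this base version looks roughly like the following fragment shader (a sketch in the spirit of the description, not the repo's exact code; outCoord and the fixed loop bound are my own assumptions):
// Each output texel walks a row of A (along X) and a column of B (along Y).
const matmulFS = `
precision highp float;
uniform sampler2D A;   // w: K, h: M
uniform sampler2D B;   // w: N, h: K
uniform float K;
varying vec2 outCoord; // this output texel's normalized (x, y)
void main() {
  float sum = 0.0;
  for (float k = 0.0; k < 2048.0; k += 1.0) { // GLSL ES 1.00 needs a constant bound
    if (k >= K) break;
    float a = texture2D(A, vec2((k + 0.5) / K, outCoord.y)).r;
    float b = texture2D(B, vec2(outCoord.x, (k + 0.5) / K)).r;
    sum += a * b;
  }
  gl_FragColor = vec4(sum, 0.0, 0.0, 1.0);
}`;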
Assuming that GPU accesses data similar to CPU -- that is block by block -- if I read along the width of the texture I should be hitting the cache pretty often.
For this, I'd lay out both textures in a way that I'd be doing dot products of corresponding rows (along the texture width) only. Example: [2,3]x[4,3]->[2,4]. Note that the data for the right texture is now transposed, so that for each output texel I'd be doing a dot product of one row from the left and one row from the right. (see webgl-matmul-shared-alongX.html)
To ensure that the above assumption is indeed working, I created a negative test as well. In this test I read along the Y axis of both the left and right textures, which should have the worst performance of all. Data is pre-transposed so that the results make sense. Example: [3,2]x[3,4]->[2,4]. (see webgl-matmul-shared-alongY.html)
So I ran these (and I hope you can run them too, to see for yourself) and I found no evidence to support the existence or non-existence of such caching behavior. You need to run each example a few times to get consistent results for comparison.
Then I came across this paper http://fileadmin.cs.lth.se/cs/Personal/Michael_Doggett/pubs/doggett12-tc.pdf which, in short, claims that the GPU caches data in blocks (or tiles, as I call them).
Based on this promising lead I created a version of matmul (or dot product) which uses blocks of 2x2 to do its calculation. Prior to using this, of course, I had to rearrange my inputs into such a layout. The cost of that rearrangement is not included in my comparison; let's say I could do it once and run my matmul many times afterwards. Even this scheme did not improve performance at all, if anything it took something away. (see webgl-dotprod-tiled.html)
At this point I am completely out of ideas and any hints would be appreciated.
thanks
I'm looking at getting a program written for DirectX11 to play nice on DirectX10. To do that, I need to compile the shaders for model 4, not 5. Right now the only problem with that is that the geometry shaders use instancing which is unsupported by 4. The general model is
[instance(NUM_INSTANCES)]
void Gs(..., in uint instanceId : SV_GSInstanceID) { }
I can't seem to find many documents on why this exists, because my thought is: can't I just replace this with a loop from instanceId=0 to instanceId=NUM_INSTANCES-1?
The answer seems to be no, as it doesn't seem to output correctly, but beyond my exact problem: can you help me understand why the concept of instancing exists? Is there some implication on the entire pipeline that instancing has, beyond simply calling the main function twice with a different index?
With regards to why my replacement did not work:
Geometry shaders are annotated with [maxvertexcount(N)]. I had incorrectly assumed this was the vertex input count and ignored it. In fact, the input is determined by the type of primitive coming in, so this count is about the output. Before, when N was my output over I instances, each instance output N vertices. But now that I want to use a loop, a single invocation outputs N*I vertices. As such, the answer was to do as I suggested, and also use [maxvertexcount(N*NUM_INSTANCES)].
To more broadly answer my own question on why instancing may be useful in a world that already has loops, I can only guess:
Loops are not truly supported in shaders, it turns out - graphics card cores do not have a concept of control flow. When loops are written in shaders, the loop is unrolled (see [unroll]). This has limitations, makes compilation slower, and makes the shader blob bigger.
Instances can be parallelized - one GPU core can run one instance of a shader while another runs the next instance of the same shader with the same input.
I'm using the SharpDX Toolkit, and I'm trying to create a Texture2D programmatically, so I can manually specify all the pixel values. And I'm not sure what pixel format to create it with.
SharpDX doesn't even document the toolkit's PixelFormat type (they have documentation for another PixelFormat class but it's for WIC, not the toolkit). I did find the DirectX enum it wraps, DXGI_FORMAT, but its documentation doesn't give any useful guidance on how I would choose a format.
I'm used to plain old 32-bit bitmap formats with 8 bits per color channel plus 8-bit alpha, which is plenty good enough for me. So I'm guessing the simplest choices will be R8G8B8A8 or B8G8R8A8. Does it matter which I choose? Will they both be fully supported on all hardware?
And even once I've chosen one of those, I then need to further specify whether it's SInt, SNorm, Typeless, UInt, UNorm, or UNormSRgb. I don't need the sRGB colorspace. I don't understand what Typeless is supposed to be for. UInt seems like the simplest -- just a plain old unsigned byte -- but it turns out it doesn't work; I don't get an error, but my texture won't draw anything to the screen. UNorm works, but there's nothing in the documentation that explains why UInt doesn't. So now I'm paranoid that UNorm might not work on some other video card.
Here's the code I've got, if anyone wants to see it. Download the SharpDX full package, open the SharpDXToolkitSamples project, go to the SpriteBatchAndFont.WinRTXaml project, open the SpriteBatchAndFontGame class, and add code where indicated:
// Add new field to the class:
private Texture2D _newTexture;
// Add at the end of the LoadContent method:
_newTexture = Texture2D.New(GraphicsDevice, 8, 8, PixelFormat.R8G8B8A8.UNorm);
var colorData = new Color[_newTexture.Width*_newTexture.Height];
_newTexture.GetData(colorData);
for (var i = 0; i < colorData.Length; ++i)
colorData[i] = (i%3 == 0) ? Color.Red : Color.Transparent;
_newTexture.SetData(colorData);
// Add inside the Draw method, just before the call to spriteBatch.End():
spriteBatch.Draw(_newTexture, new Vector2(0, 0), Color.White);
This draws a small rectangle with diagonal lines in the top left of the screen. It works on the laptop I'm testing it on, but I have no idea how to know whether that means it's going to work everywhere, nor do I have any idea whether it's going to be the most performant.
What pixel format should I use to make sure my app will work on all hardware, and to get the best performance?
The formats in the SharpDX Toolkit map to the underlying DirectX/DXGI formats, so you can, as usual with Microsoft products, get your info from the MSDN:
DXGI_FORMAT enumeration (Windows)
32-bit textures are a common choice for most texture scenarios and perform well on older hardware. UNorm means, as already answered in the comments, "in the range of 0.0 .. 1.0" and is, again, a common way to access color data in textures.
If you look at the Hardware Support for Direct3D 10Level9 Formats (Windows) page you will see that DXGI_FORMAT_R8G8B8A8_UNORM as well as DXGI_FORMAT_B8G8R8A8_UNORM are supported on DirectX 9 hardware. You will not run into compatibility problems with either of them.
Performance depends on how your device is initialized (RGBA or BGRA?) and on what hardware (i.e. supported DX feature level) and OS you are running your software on. You will have to run your own tests to find out (though with these common and similar formats the difference should be a single-digit percentage at most).