OpenGL ES 2.0 vertex skinning: maximum number of bones? (iOS)

When drawing a vertex skinned model, what is the maximum number of bones per draw-call/batch for different iOS devices?
On OpenGL ES 1.1 the limit is set by the number of palette matrices, but what sets the limit on OpenGL ES 2.0?

OpenGL ES 2.0 uses shaders for all of its vertex processing, so the limit is set by how many uniforms the vertex shader can hold (GL_MAX_VERTEX_UNIFORM_VECTORS, counted in vec4 slots). This is an implementation-defined limit, so it varies from hardware to hardware.
You can also store each bone as a quaternion plus a position instead of a full matrix to save space (two vec4 slots per bone instead of four).
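As a rough sketch (the bone count and all names below are made up for illustration), a GLSL ES 2.0 skinning vertex shader keeps its palette in a uniform array, and each mat4 bone costs four of the vec4 slots reported by GL_MAX_VERTEX_UNIFORM_VECTORS:

// Hypothetical skinning vertex shader: MAX_BONES * 4 vec4 slots, plus u_mvp
// and everything else, must fit within GL_MAX_VERTEX_UNIFORM_VECTORS
// (implementation-defined, spec minimum 128).
const int MAX_BONES = 32;

attribute vec3 a_position;
attribute vec4 a_boneWeights; // up to four influences per vertex
attribute vec4 a_boneIndices; // passed as floats in ES 2.0

uniform mat4 u_bones[MAX_BONES];
uniform mat4 u_mvp;

void main()
{
    mat4 skin = u_bones[int(a_boneIndices.x)] * a_boneWeights.x
              + u_bones[int(a_boneIndices.y)] * a_boneWeights.y
              + u_bones[int(a_boneIndices.z)] * a_boneWeights.z
              + u_bones[int(a_boneIndices.w)] * a_boneWeights.w;
    gl_Position = u_mvp * skin * vec4(a_position, 1.0);
}

With the spec minimum of 128 vec4 slots, a few dozen mat4 bones plus the other uniforms is roughly what fits in a single draw call; the quaternion+position encoding mentioned above halves the per-bone cost to two vec4 slots.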

From iOS 7 you can also access texture units from the vertex shader, so you can create a texture, fill it with your matrices, and read the matrices back in the vertex shader. This allows many more matrices to be accessed, at the expense of a more complex implementation.
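A hedged sketch of the vertex-shader side of that approach (the one-row texture layout and all names are assumptions; it also requires GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS > 0 and, for float data, an extension such as OES_texture_float):

// Hypothetical layout: bone i occupies texels 4*i .. 4*i+3 of a 1-texel-tall
// floating-point RGBA texture that is u_boneTexWidth texels wide.
uniform sampler2D u_boneTex;
uniform float u_boneTexWidth;

mat4 fetchBone(float index)
{
    float x = (index * 4.0 + 0.5) / u_boneTexWidth; // center of the first texel
    float texelStep = 1.0 / u_boneTexWidth;
    // texture2DLod is the texture fetch available in ES 2.0 vertex shaders
    vec4 c0 = texture2DLod(u_boneTex, vec2(x, 0.5), 0.0);
    vec4 c1 = texture2DLod(u_boneTex, vec2(x + texelStep, 0.5), 0.0);
    vec4 c2 = texture2DLod(u_boneTex, vec2(x + 2.0 * texelStep, 0.5), 0.0);
    vec4 c3 = texture2DLod(u_boneTex, vec2(x + 3.0 * texelStep, 0.5), 0.0);
    return mat4(c0, c1, c2, c3);
}

The practical limit then becomes the maximum texture size rather than the uniform budget.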

Related

Does flat shading require this much vertex duplication?

I'm very new to WebGL (1.0) / OpenGL and I'm having trouble understanding vertices for flat and smooth shading -- and whether data optimization is even possible for flat shading in this situation:
Say I want to use an icosphere (2-subdivision). It has 42 points that define its 80 faces. Those point coordinates lie on a unit sphere.
Both flat and smooth-shaded icospheres will appear on the same screen.
With smooth shading, the normals will be identical to position vectors, so I get them for free. So I could use 42 vec3 in one buffer for both a_position and v_normal and an index buffer of 240 unsigned_byte to access them for the object. Cheap!
But with flat shading, each face has its own normal, which I think means that in WebGL 1.0 there will be three duplicate normals for each face. 80 faces means 240 vec3 for a_position (with a lot of duplicate vectors) and 240 vec3 for a_normal (two-thirds of which are duplicates). I can't see any other way to do this. On the other hand, I can put position and normal data together in the same buffer and do without an index buffer entirely.
I've got this working and it seems fast, but am I correct? Does it matter?
Icosphere property  | Count | Floats needed
Faces               | 80    | -
Positions (smooth)  | 42    | 126 (+240 indices)
Normals (smooth)    | 42    | 0 (reuse positions)
Positions (flat)    | 240   | 720
Normals (flat)      | 240   | 720
I feel like either I missed something in my studies or this is just the reality of OpenGL and I should get used to it because it's inherently fast.
You are correct: for flat shading you have to duplicate positions, because a vertex is the whole tuple of position, normal, and all the other attributes.
However, this duplication has almost zero impact on rendering times. It adds some memory overhead, yes, but as far as the rendering process is concerned the same amount of data is transferred and processed. In fact, duplicating the attributes as a whole makes caching more predictable, since there is no data indirection involved (i.e. looking up a different normal depending on which face is being rendered), so in theory it can even be a small performance gain.
You're doing it exactly right.
Indeed, the most portable implementation of flat shading requires duplicating vertices for each drawn triangle, which adds memory overhead (considerable for complex geometry). It may potentially affect rendering performance as well, but that depends on the hardware and shouldn't be noticeable nowadays. That is all basic WebGL 1.0 lets you do.
However, WebGL 2.0, or WebGL 1.0 with the OES_standard_derivatives extension, gives another option: computing the triangle normal directly in the fragment shader via screen-space derivatives:
#extension GL_OES_standard_derivatives : enable
...
varying vec4 Position;
varying vec3 View;
...
void main()
{
  // Face normal from the screen-space derivatives of the interpolated position;
  // it is effectively constant across each triangle, giving the flat look.
  vec3 Normal = normalize (cross (dFdx (Position.xyz / Position.w), dFdy (Position.xyz / Position.w)));
  if (!gl_FrontFacing) { Normal = -Normal; }
  ...
  gl_FragColor = computeLighting (normalize (Normal), normalize (View), Position);
}
This requires per-fragment lighting (e.g. Phong shading instead of Gouraud shading). The shading result will NOT be exactly the same as duplicating vertices and precomputing triangle normals on the CPU, but the visual effect will be the same: flat shading with distinguishable triangles.
Practically speaking, GL_OES_standard_derivatives is widely adopted.
In fact, GLSL 1.10 in desktop OpenGL 2.0 supported derivatives from the very beginning (no extension required); it is only OpenGL ES 2.0 (and hence WebGL 1.0) that decided to exclude them.
There are, however, some complaints about derivative implementations on various GPUs. Precise derivatives are expensive to compute, so the GLSL specifications allow returning faster approximations instead, which was critical for older graphics hardware. In practice the method works mostly fine for flat shading, though one OpenGL ES implementation (Qualcomm) has odd behavior with the sign of the returned values flipped.
There is, for example, a study done on Android devices a couple of years ago (I don't know whether the same issues would show up in WebGL; web browsers might blacklist broken implementations or apply workarounds for known driver bugs).

Vertex normal, texture uv coordinate

I have a glTF question: in glTF, does one vertex have to correspond to exactly one vertex normal and one texture UV coordinate? If I have a model in a source format where one vertex can correspond to three vertex normals, how can I export such a model to a glTF file?
Example:
A cube in the source format's model: 8 vertexes, 24 vertex normals.
A cube in the glTF file: do I need to write 24 vertexes and 24 vertex normals?
Yes, glTF exporters/writers must "split" vertices anywhere that discontinuous UVs or normals appear.
glTF is designed to be a GPU-ready delivery format, a last-mile format, not an artist interchange format. As a result, its internal data structures are nearly a 1:1 match with the way vertex attributes are handed off to the GPU, for example with a vec3 position attribute, vec3 normal attribute, and possibly a vec2 texture coordinate in a typical case. So yes, one normal and one UV per vertex, as expected for a set of raw data being supplied to the GPU.
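To make that 1:1 mapping concrete, here is roughly what those attributes look like on the GPU side in a GLSL vertex shader (the attribute and uniform names are arbitrary; only the correspondence to the glTF accessors matters):

attribute vec3 a_position;  // fed by the POSITION accessor
attribute vec3 a_normal;    // fed by the NORMAL accessor
attribute vec2 a_texcoord0; // fed by the TEXCOORD_0 accessor

uniform mat4 u_modelViewProjection;

varying vec3 v_normal;
varying vec2 v_uv;

void main()
{
    // Exactly one normal and one UV per vertex; a position that needs several
    // normals in the source format must become several glTF vertices.
    v_normal = a_normal;
    v_uv = a_texcoord0;
    gl_Position = u_modelViewProjection * vec4(a_position, 1.0);
}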
Part of the advantage is that a well-curated collection of glTF files will contain binary payloads that can be sent to (for example) mobile devices, where those devices can then transfer whole sections of the binary data straight to GPU memory without further processing. WebGL frameworks, for example, don't have to do a lot of vertex processing after receiving the file; they just load it and render. The burden is placed explicitly on exporters and writers, not readers and loaders.
More details on the structure are spelled out fairly well in the glTF Tutorial. In particular the section on Buffers, Buffer Views, and Accessors covers the raw storage of vertex data in binary blobs. Generally, a programmer familiar with graphics APIs could think of a glTF accessor as an individual vertex attribute, and a bufferView as a block of GPU memory containing multiple vertex attributes (accessors) at the same stride, possibly interleaved. The buffer itself is just a lump of all binary data (bufferViews) in a glTF, without any stride.

Use single vertex buffer or many?

I'm implementing a 2D game with lots of independent rectangular game pieces of various dimensions. The dimensions of each piece do not change between frames. Most of the pieces will display an image and share the same fragment shader. I am new to WebGL and it is not clear to me what the best strategy is for managing vertex buffers in regard to performance for this situation.
Is it better to use a single vertex buffer (quad) to represent all of the game's pieces and then rescale those vertices in the vertex shader for each piece? Or, should I define a separate static vertex buffer for each piece?
The GPU is a state machine, and switching states is expensive (even more so through WebGL, because of the additional layer of validation introduced by the WebGL implementation), so binding vertex buffers is expensive.
It's good practice to reduce API calls to a minimum.
Even when you have multiple distinct objects, you still want to use a single vertex buffer and use the offset parameter of the drawArrays or drawElements methods.
Here is a list of API calls ordered by decreasing cost (top is most expensive):
FrameBuffer
Program
Texture binds
Vertex format
Vertex bindings
Uniform updates
For more information you can watch the great talk Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead by Cass Everitt and John McDonald; that talk is also where the list above comes from.
While those benchmarks were done on Nvidia hardware, it's a good guideline for AMD and Intel graphics hardware as well.

Is it possible to read floats out from OpenGL ES framebuffer via the iOS texture cache API?

This is related to OpenGL ES 2.0 :glReadPixels() with float or half_float textures.
I want to read out the float values from a framebuffer object after rendering.
On iOS, the following
GLint ext_type;
glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_TYPE, &ext_type);
really just tells us that glReadPixels only allows GL_UNSIGNED_BYTE data to be read out.
Is there a way to use the texture cache technique described in this article to get around this?
The back story is that I am trying to implement a general matrix multiplication routine for arbitrarily sized matrices (e.g., 100,000 x 100,000) using an OpenGL ES 2.0 fragment shader (similar to Dominik Göddeke's trusty ol' tutorial example). glReadPixels is not being particularly cooperative here because it converts the framebuffer floats to GL_UNSIGNED_BYTE values, causing a loss of precision.
I asked a similar question and I think the answer is NO, if only because texture caches (as an API) use CoreVideo pixel buffers, and those don't currently support float formats.
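A commonly used fallback when only GL_UNSIGNED_BYTE readback is available is to encode each float into the four 8-bit channels inside the fragment shader, so the bytes returned by glReadPixels preserve the value. A minimal sketch, assuming highp float support in the fragment shader and values already mapped into [0, 1) (the varying name is made up):

precision highp float; // the packing arithmetic needs full float precision
varying float v_value; // the result computed upstream, mapped into [0, 1)

// Pack a float in [0, 1) into an RGBA8 color; the CPU side recovers it by
// dotting the four bytes with (1, 1/255, 1/65025, 1/16581375).
vec4 packFloat(float v)
{
    vec4 enc = fract(v * vec4(1.0, 255.0, 65025.0, 16581375.0));
    enc -= enc.yzww * vec4(1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0, 0.0);
    return enc;
}

void main()
{
    gl_FragColor = packFloat(v_value); // one value per pixel
}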

Accessing multiple textures from fragment shader in OpenGL ES 2.0 on iOS

I have N textures in my app, which are 2D slices from 3D volumetric data. In the fragment shader, I need to access all of these textures. My understanding is that I can only access bound textures from the shader, which means I am limited by the number of multi-texturing units allowed.
N can vary from 8 to 512, depending on the data. Is there a way to do this without multi-texturing?
The reason for this approach is that 3D texturing is not available in OpenGL ES 2.0. I'd appreciate suggestions on any other way of doing this.
I also considered texture atlases, but I think the maximum single texture dimensions will be a problem.
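For reference, the atlas approach mentioned above usually looks something like this in the fragment shader (the tile layout, uniform names, and varying are assumptions; GL_MAX_TEXTURE_SIZE still caps how many slices fit into a single atlas, so a large N may need several atlases or reduced slice resolution):

// Hypothetical atlas: slices packed left-to-right, top-to-bottom in a grid of
// u_slicesPerRow x u_rowCount tiles inside one 2D texture.
precision mediump float;

uniform sampler2D u_atlas;
uniform float u_slicesPerRow;
uniform float u_rowCount;

varying vec3 v_texCoord3D; // (u, v, slice index) of the voxel to sample

vec4 sampleSlice(vec2 uv, float slice)
{
    float col = mod(slice, u_slicesPerRow);
    float row = floor(slice / u_slicesPerRow);
    vec2 tileUV = (vec2(col, row) + uv) / vec2(u_slicesPerRow, u_rowCount);
    return texture2D(u_atlas, tileUV);
}

void main()
{
    gl_FragColor = sampleSlice(v_texCoord3D.xy, floor(v_texCoord3D.z));
}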
