Speed difference between updating textures and updating buffers - WebGL

I'm interested in the speed of updating a texture versus updating a buffer in WebGL.
(I think the performance would be mostly the same in OpenGL ES 2.)
If I need to update a texture or a buffer once per frame, containing the same amount of data in bytes, which is better for performance?
The buffer usage would be DYNAMIC_DRAW, and these buffers would be drawn using index buffers.

This is really up to the device/driver/browser. There's no general answer. One device or driver might be faster for buffers, another for textures. There's also the question of access. Buffers don't have random access; textures do. So if you need random access, your only option is a texture.
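To make "random access" concrete, here is a minimal sketch assuming per-vertex data is packed one element per texel into an RGBA float texture of a chosen width. Element i then lives at texel (i % width, i // width), and a WebGL 1 shader, which has no texelFetch, would sample at that texel's center. The function names and the one-element-per-texel layout are illustrative assumptions.

```python
from __future__ import annotations

import math


def texture_size(num_elements: int, width: int) -> tuple[int, int]:
    """(width, height) of a texture that stores one element per texel."""
    return width, math.ceil(num_elements / width)


def texel_for_index(i: int, width: int) -> tuple[int, int]:
    """Integer texel coordinates of element i in a row-major layout."""
    return i % width, i // width


def uv_for_index(i: int, width: int, height: int) -> tuple[float, float]:
    """Normalized coordinates of the texel center, as a texture2D() lookup needs.
    Sampling the center avoids picking up a neighbouring texel."""
    x, y = texel_for_index(i, width)
    return (x + 0.5) / width, (y + 0.5) / height
```

Any vertex can then read any element, which is exactly what a vertex attribute stream cannot do.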
One example of a driver optimization: if you replace the entire buffer or texture, the driver can simply create a new buffer or texture internally and start using it when appropriate. If it doesn't do this and you update a buffer or texture that is currently in use (that is, commands have already been issued to draw something with it, but those commands have not yet been executed), then the driver has to stall your program and wait until the pending commands are done with the buffer or texture before it can replace the contents. This also suggests that gl.bufferData can be faster than gl.bufferSubData, and gl.texImage2D can be faster than gl.texSubImage2D, but that is a possibility, not a guarantee. Again, it's up to the driver what it does and which optimizations it can and does perform.
As for WebGL vs OpenGL ES 2, WebGL is stricter. You mentioned index buffers. WebGL has to validate index buffers: when you draw, it has to check that all the indices used are in range for the currently bound and used attribute buffers. WebGL implementations cache this information so they don't have to check again, but if you update an index buffer, the cache for that buffer is cleared, so in that case updating textures would probably be faster than updating index buffers. On the other hand, it comes back to usage. If you put vertex positions in a texture and look them up in the vertex shader, instead of supplying them in a buffer, then updating the texture might be faster, but rendering vertices with texture lookups is likely slower. Whether that is too slow again depends on your app and the device/driver, etc.
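A simplified model of the validation described above, assuming the implementation caches the maximum index of each index buffer and drops that cache whenever the buffer is updated. Real browsers differ in the details (for example, they consider only the range a draw call actually uses); the class and method names are hypothetical.

```python
from __future__ import annotations


class IndexBufferModel:
    """Hypothetical model of WebGL's draw-time index validation with a per-buffer cache."""

    def __init__(self, indices: list[int]):
        self._indices = list(indices)
        self._cached_max: int | None = None
        self.scans = 0  # how many times the whole index list had to be scanned

    def buffer_data(self, indices: list[int]) -> None:
        # Any update to the index buffer invalidates the cached maximum.
        self._indices = list(indices)
        self._cached_max = None

    def max_index(self) -> int:
        if self._cached_max is None:
            self.scans += 1  # the expensive part: touching every index
            self._cached_max = max(self._indices, default=-1)
        return self._cached_max

    def validate_draw(self, vertex_count: int) -> bool:
        """True if every index refers to a vertex present in the attribute buffers."""
        return self.max_index() < vertex_count
```

Repeated draws with an unchanged index buffer cost one scan in total; updating the index buffer every frame means a rescan every frame, which is the cost the answer attributes to index-buffer updates.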


If I use a vertex shader to do all operations on an object, can the constant buffer be empty?

The program cycle is as follows.
In Update(), the constant buffer for each object, which after the transformations holds that object's world matrix, is copied to the GPU upload heap. In UpdatePipeline(), among other things, the installed shaders are called. Because we do all matrix transformations on the CPU, the vertex shader just returns the position, right? If so, is it true that performance will increase?
Now I want to do all transformations on the GPU, i.e. in the vertex shader. Does that mean that in Update() I should just call memcpy() with an empty constant buffer as the source?
A constant buffer is just a buffer to move data from the CPU to the GPU. Whether you use one, or how many you use, and what you are using them for, is up to you.
The most common and most simple use-case for a constant buffer is to move a transformation matrix to the GPU. That matrix is indeed calculated by the CPU, and the vertex shader uses that matrix to transform the positions in the vertex buffer from local to screen space. This allows the CPU to move an object around without the need to update the (usually quite big) vertex buffer.
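A minimal sketch of that common case, assuming a row-major 4x4 world matrix computed on the CPU and one 256-byte constant-buffer slot per object (Direct3D 12 requires constant buffer views to be 256-byte aligned). The helper names, the matrix layout and the multiplication convention are assumptions; in a real renderer the packed bytes would be memcpy'd into the mapped upload heap, and the multiplication in `transform` would run in the vertex shader.

```python
from __future__ import annotations

import struct

CB_ALIGNMENT = 256  # D3D12 constant buffer views must be 256-byte aligned


def translation(tx: float, ty: float, tz: float) -> list[list[float]]:
    """Row-major 4x4 matrix that translates a column vector by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def pack_constant_buffer(world: list[list[float]]) -> bytes:
    """CPU side: pack the 16 floats of the world matrix, padded to 256 bytes."""
    flat = [v for row in world for v in row]
    data = struct.pack("<16f", *flat)
    return data + bytes((-len(data)) % CB_ALIGNMENT)


def transform(world: list[list[float]], position: tuple[float, float, float]) -> tuple[float, ...]:
    """GPU side (modelled): multiply the matrix by (x, y, z, 1)."""
    v = (*position, 1.0)
    return tuple(sum(world[r][c] * v[c] for c in range(4)) for r in range(4))
```

Moving the object only changes these 256 bytes per frame, while the vertex buffer passed to `transform` stays untouched.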
Whether performance increases depends on your hardware, your code, and, most importantly, what you are comparing the performance with. Since I know neither your current code nor the exact changes you intend to make, I can't even guess whether it will increase.
Also, even though I don't know your code, just from the way you phrased your question I would assume that you definitely don't want to use a constant buffer as a source for any operation on the CPU.

How do I use indexed normals as an attribute? (WebGL)

I have some vertex data. Positions, normals, texture coordinates. I probably loaded it from a .obj file or some other format. Maybe I'm drawing a cube. But each piece of vertex data has its own index. Can I render this mesh data using OpenGL/Direct3D?
In the most general sense, no. OpenGL and Direct3D only allow one index per vertex; the index fetches from each stream of vertex data. Therefore, every unique combination of components must have its own separate index.
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
Your best bet is to simply accept that your data will be larger. A great many model formats use multiple indices; you will need to fix up this vertex data before you can render with it. Many mesh loading tools, such as Open Asset Importer, will perform this fixup for you.
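The fixup amounts to turning every unique combination of per-attribute indices into one output vertex. A minimal sketch, assuming faces arrive as (position_index, normal_index) corner pairs like the 0-based "v//vn" corners of an .obj file; `deindex` and `unit_cube` are hypothetical helpers, and a real loader would also carry texture coordinates the same way.

```python
from __future__ import annotations


def deindex(faces, positions, normals):
    """Flatten multi-indexed faces into one index per vertex.

    faces: list of triangles, each a list of (position_index, normal_index) pairs.
    Returns (vertex_positions, vertex_normals, indices); each unique
    (position, normal) combination becomes exactly one output vertex.
    """
    remap: dict[tuple[int, int], int] = {}
    out_pos, out_nrm, indices = [], [], []
    for face in faces:
        for corner in face:
            if corner not in remap:
                remap[corner] = len(out_pos)
                p, n = corner
                out_pos.append(positions[p])
                out_nrm.append(normals[n])
            indices.append(remap[corner])
    return out_pos, out_nrm, indices


def unit_cube():
    """Example input: 8 positions, 6 face normals, 12 multi-indexed triangles.
    (Triangle winding is not made consistent; only the index structure matters.)"""
    positions = [(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]
    normals, faces = [], []
    for axis in range(3):
        for side in (0, 1):
            n = [0, 0, 0]
            n[axis] = 1 if side else -1
            normals.append(tuple(n))
            ni = len(normals) - 1
            a, b, c, d = [i for i in range(8) if (i >> axis) & 1 == side]
            faces.append([(a, ni), (b, ni), (d, ni)])
            faces.append([(a, ni), (d, ni), (c, ni)])
    return positions, normals, faces
```

Running `deindex(*unit_cube()[::-1])` is not the intended call order; call it as `deindex(faces, positions, normals)`. For the cube, the 8 positions and 6 normals expand into the 24 vertices the answer mentions.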
It should also be noted that most meshes are not cubes. Most meshes are smooth across the vast majority of vertices, only occasionally having different normals/texture coordinates/etc. So while this often comes up for simple geometric shapes, real models rarely have substantial amounts of vertex duplication.
GL 3.x and D3D10
For D3D10/OpenGL 3.x-class hardware, it is possible to avoid performing fixup and use multiple indexed attributes directly. However, be advised that this will likely decrease rendering performance.
The following discussion will use the OpenGL terminology, but Direct3D v10 and above has equivalent functionality.
The idea is to manually access the different vertex attributes from the vertex shader. Instead of sending the vertex attributes directly, the attributes that are passed are actually the indices for that particular vertex. The vertex shader then uses the indices to access the actual attribute through one or more buffer textures.
Attributes can be stored in multiple buffer textures or all within one. If the latter is used, then the shader will need an offset to add to each index in order to find the corresponding attribute's start index in the buffer.
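A CPU-side model of the single-buffer variant, assuming positions and normals are concatenated in one array that stands in for the buffer texture, and the per-vertex "attributes" are just index pairs. `fetch_vertex` mimics what the vertex shader would do with texelFetch; all names and the layout are hypothetical.

```python
from __future__ import annotations


def build_attribute_buffer(positions, normals):
    """Concatenate both attribute arrays into one 'buffer texture' and
    record where each attribute starts inside it."""
    buffer = list(positions) + list(normals)
    offsets = {"position": 0, "normal": len(positions)}
    return buffer, offsets


def fetch_vertex(buffer, offsets, position_index, normal_index):
    """Mimic the vertex shader: add each attribute's start offset to its own
    index and read from the shared buffer (texelFetch in GLSL)."""
    position = buffer[offsets["position"] + position_index]
    normal = buffer[offsets["normal"] + normal_index]
    return position, normal
```

The vertex stream itself would then hold only pairs such as (2, 1), one per vertex, with the offsets supplied to the shader as uniforms.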
Regular vertex attributes can be compressed in many ways. Buffer textures have fewer means of compression, allowing only a relatively limited number of vertex formats (via the image formats they support).
Please note again that any of these techniques may decrease overall vertex processing performance. Therefore, it should only be used in the most memory-limited of circumstances, after all other options for compression or optimization have been exhausted.
OpenGL ES 3.0 provides buffer textures as well. Higher OpenGL versions allow you to read buffer objects more directly via SSBOs rather than buffer textures, which might have better performance characteristics.
I found a way that allows you to reduce this sort of repetition that runs a bit contrary to some of the statements made in the other answer (but doesn't specifically fit the question asked here). It does however address my question which was thought to be a repeat of this question.
I just learned about interpolation qualifiers, specifically "flat". My understanding is that putting the flat qualifier on a vertex shader output causes only the provoking vertex to pass its value to the fragment shader.
This means for the situation described in this quote:
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
You can have 8 vertices, 6 of which carry the unique normals while the normals of the other 2 are disregarded, as long as you carefully order your primitives' indices so that the provoking vertex of each triangle carries the normal you want to apply to the entire face.
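A sketch of that index ordering, assuming the last-vertex provoking convention that OpenGL ES 3.0 and WebGL 2 use by default, and a normal output declared `flat`. Each face is given a distinct corner through a small bipartite matching; the face's quad is split along the diagonal through that corner, and each triangle is rotated so that corner comes last. The helper names are hypothetical.

```python
from __future__ import annotations


def cube_quads():
    """Six cube faces as (4 corner ids in cyclic order, outward normal).
    Corner i sits at (i & 1, i >> 1 & 1, i >> 2 & 1); winding is not checked."""
    quads = []
    for axis in range(3):
        for side in (0, 1):
            a, b, c, d = [i for i in range(8) if (i >> axis) & 1 == side]
            normal = [0, 0, 0]
            normal[axis] = 1 if side else -1
            quads.append(([a, b, d, c], tuple(normal)))  # a-b-d-c walks around the face
    return quads


def assign_provoking_corners(quads):
    """Give every face a distinct corner lying on it (augmenting-path matching)."""
    owner = {}  # corner -> face

    def try_assign(face, seen):
        for corner in quads[face][0]:
            if corner in seen:
                continue
            seen.add(corner)
            if corner not in owner or try_assign(owner[corner], seen):
                owner[corner] = face
                return True
        return False

    for face in range(len(quads)):
        if not try_assign(face, set()):
            raise ValueError("no distinct provoking corner for face %d" % face)
    return {face: corner for corner, face in owner.items()}


def flat_shaded_mesh(quads):
    """Per-vertex normals plus triangle indices that reuse the shared corners."""
    corner_of = assign_provoking_corners(quads)
    num_vertices = 1 + max(c for quad, _ in quads for c in quad)
    normals = [None] * num_vertices  # None marks a normal nobody reads
    indices = []
    for face, (quad, normal) in enumerate(quads):
        p = corner_of[face]
        normals[p] = normal
        k = quad.index(p)
        q = quad[k + 1:] + quad[:k]  # remaining corners, cyclic order after p
        # Split along the diagonal p-q[1]; p is last (provoking) in both triangles.
        indices += [q[0], q[1], p, q[1], q[2], p]
    return normals, indices
```

Rotating a triangle's indices cyclically does not change its winding, so correctly wound input quads stay correctly wound.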

Use a single vertex buffer or many?

I'm implementing a 2D game with lots of independent rectangular game pieces of various dimensions. The dimensions of each piece do not change between frames. Most of the pieces will display an image and share the same fragment shader. I am new to WebGL and it is not clear to me what the best strategy is for managing vertex buffers in regard to performance for this situation.
Is it better to use a single vertex buffer (quad) to represent all of the game's pieces and then rescale those vertices in the vertex shader for each piece? Or, should I define a separate static vertex buffer for each piece?
The GPU is a state machine, and switching states is expensive (even more so through WebGL, because of the additional layer of checks introduced by the WebGL implementation), so binding vertex buffers is expensive.
It's good practice to keep API calls to a minimum.
Even when you have multiple distinct objects, you still want to use a single vertex buffer, selecting each object's range with the first parameter of drawArrays or the offset parameter of drawElements.
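A minimal sketch of the single-buffer approach for the game pieces, assuming each piece is an axis-aligned rectangle drawn as two triangles (6 vertices of 2 floats each, no index buffer). All pieces go into one float32 array uploaded once, and each piece records the `first` vertex and `count` to pass to gl.drawArrays(gl.TRIANGLES, first, count). The `pack_rects` helper is hypothetical.

```python
from __future__ import annotations

from array import array


def pack_rects(rects):
    """rects: list of (x, y, width, height).
    Returns (vertex_data, draws): one float32 array for a single vertex buffer,
    and (first, count) per piece for drawArrays(TRIANGLES, first, count)."""
    data = array("f")
    draws = []
    for x, y, w, h in rects:
        first = len(data) // 2  # 2 floats per vertex
        x2, y2 = x + w, y + h
        data.extend([x, y, x2, y, x, y2,      # first triangle
                     x, y2, x2, y, x2, y2])   # second triangle
        draws.append((first, 6))
    return data, draws
```

If you switch to drawElements, remember that its offset is in bytes, so with UNSIGNED_SHORT indices the value is the first index times 2.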
Here is a list of API calls ordered by decreasing cost (most expensive at the top):
Texture binds
Vertex format
Vertex bindings
Uniform updates
For more information, watch the great talk "Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead" by Cass Everitt and John McDonald; it is also where the list above comes from.
While these benchmarks were done on Nvidia hardware, they are a good guideline for AMD and Intel graphics hardware as well.
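One common way to act on that ordering is to sort draw calls by a key that puts the most expensive state first, so expensive changes happen as rarely as possible. The sketch below, with a hypothetical `DrawCall` record, counts how often each kind of state would change before and after sorting; it is a model of submission order, not a benchmark.

```python
from __future__ import annotations

from typing import NamedTuple


class DrawCall(NamedTuple):
    # Field order mirrors the cost ranking above: most expensive first.
    texture: int
    vertex_format: int
    vertex_buffer: int
    uniform_value: float


def count_state_changes(draws):
    """Count changes per kind of state when submitting draws in this order."""
    changes = {field: 0 for field in DrawCall._fields}
    previous = None
    for draw in draws:
        for field in DrawCall._fields:
            if previous is None or getattr(draw, field) != getattr(previous, field):
                changes[field] += 1
        previous = draw
    return changes


def sort_by_state_cost(draws):
    # Tuples compare field by field, so this groups by texture, then format, then buffer.
    return sorted(draws)
```

Uniform updates usually differ per object anyway, so sorting mostly saves the expensive binds at the top of the list.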

What is a practical size limit to vertex buffer size?

In OpenGL, you generally get better performance by using vertex buffers, and even better performance by putting many objects into the same vertex buffer, so that lots of vertices can be drawn with a single glDrawArrays call.
But, what is the practical upper limit of this? How many MB of vertex data in the same buffer is too much? At what point should you cut a vertex array into two separate vertex arrays? How do you know this?
I know this answer could be seen as generic, but I believe it actually depends on the GPU you are using and the GPU memory available.
The OpenGL specification sets no hard limit on the number of vertices you can process.
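One concrete point at which a mesh is commonly split, assuming WebGL 1 without the OES_element_index_uint extension, is when it needs more than 65,536 distinct vertices, because UNSIGNED_SHORT indices cannot address more. A sketch of such a split with a hypothetical `split_for_u16` helper: triangles are added to the current chunk until the next one would exceed the vertex limit, and indices are remapped to be local to each chunk.

```python
from __future__ import annotations

MAX_U16_VERTICES = 1 << 16  # indices 0..65535


def split_for_u16(indices, max_vertices=MAX_U16_VERTICES):
    """Split a triangle list into chunks whose local indices fit the limit.
    Returns [(global_vertex_ids, local_indices), ...]; global_vertex_ids says
    which original vertices to copy into that chunk's vertex buffer.
    Assumes max_vertices >= 3."""
    chunks = []
    remap, vertices, local = {}, [], []
    for t in range(0, len(indices), 3):
        tri = indices[t:t + 3]
        new = len({v for v in tri if v not in remap})
        if len(vertices) + new > max_vertices:
            # Start a new chunk; vertices shared across the boundary get duplicated.
            chunks.append((vertices, local))
            remap, vertices, local = {}, [], []
        for v in tri:
            if v not in remap:
                remap[v] = len(vertices)
                vertices.append(v)
            local.append(remap[v])
    if local:
        chunks.append((vertices, local))
    return chunks
```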
Here you can find some useful information on this topic:
How many maximum triangles can be drawn on ipad using opengl es in 1 frame?

How do PowerVR GPUs provide a depth buffer?

iOS devices use a PowerVR graphics architecture. The PowerVR architecture is a tile-based deferred rendering model. The primary benefit of this model is that it does not use a depth buffer.
However, I can access the depth buffer on my iOS device. Specifically, I can use an offscreen Frame Buffer Object to turn the depth buffer into a color texture and render it.
If the PowerVR architecture doesn't use a depth buffer, how is it that I'm able to render a depth buffer?
It is true that a tile-based renderer doesn't need a traditional depth buffer in order to work.
A TBR splits the screen into tiles and completely renders the contents of each tile, using fast on-chip memory to store temporary colors and depths. Then, when the tile is finished, the final values are moved to the actual framebuffer. However, the values in a depth buffer are traditionally temporary, because they are only used for hidden surface removal. In that case the depth values can be discarded entirely after the tile is rendered.
That means that effectively tile-based renderers don't really need a full screen depth buffer in slower video memory, saving both bandwidth and memory.
The Metal API exposes this functionality directly: you can set the storeAction of the depth attachment to the 'don't care' value, meaning that the resulting depth values will not be written back to main memory.
The exception is when you need the depth buffer contents after rendering (e.g. for a deferred renderer, or as input to some algorithm that operates on depth values). In that case the hardware must ensure that the depth values are stored in the framebuffer for you to use.
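A toy model of this behaviour, not of any real GPU: the screen is processed one tile at a time, each tile gets a small local depth array standing in for on-chip memory, and depth is copied to the full-size "system memory" depth buffer only when the caller asks for it (the equivalent of a store action other than 'don't care'). Primitives are simplified to axis-aligned rectangles with constant depth; all names are hypothetical.

```python
from __future__ import annotations

import math


def render_tiled(width, height, tile, rects, store_depth):
    """rects: list of (x0, y0, x1, y1, depth, color) with half-open pixel ranges.
    Returns (color_buffer, depth_buffer or None, external_depth_writes)."""
    color = [[0] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)] if store_depth else None
    external_depth_writes = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            tw, th = min(tile, width - tx), min(tile, height - ty)
            local_depth = [[math.inf] * tw for _ in range(th)]  # "on-chip" tile memory
            local_color = [[0] * tw for _ in range(th)]
            for x0, y0, x1, y1, z, c in rects:
                # Only the pixels of this primitive that fall inside the tile.
                for y in range(max(y0, ty), min(y1, ty + th)):
                    for x in range(max(x0, tx), min(x1, tx + tw)):
                        if z < local_depth[y - ty][x - tx]:  # depth test stays on-chip
                            local_depth[y - ty][x - tx] = z
                            local_color[y - ty][x - tx] = c
            for y in range(th):
                color[ty + y][tx:tx + tw] = local_color[y]  # color always goes out
                if store_depth:  # depth goes out only if it is needed later
                    depth[ty + y][tx:tx + tw] = local_depth[y]
                    external_depth_writes += tw
    return color, depth, external_depth_writes
```

The final image is identical either way; only the external depth traffic changes.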
Tile-based deferred rendering — as the name clearly says — works on a tile-by-tile basis. Each portion of the screen is loaded into internal caches, processed and written out again. The hardware takes the list of triangles overlapping the current tile and the current depth values and from those comes up with a new set of colours and depths and then writes those all out again. So whereas a completely dumb GPU might do one read and one write to every relevant depth buffer value per triangle, the PowerVR will do one read and one write per batch of geometry, with the ray casting-style algorithm doing the rest in between.
It isn't really possible to implement OpenGL on a 'pure' tile-based deferred renderer because the depth buffer needs to be readable. It also generally isn't efficient to make the depth buffer write only because the colour buffer is readable at any time (either explicitly via glReadPixels or implicitly according to whenever you present the frame buffer as per your OS's mechanisms), meaning that the hardware may have to draw a scene, then draw more onto it.
PowerVR does use a depth buffer, but in a different way than a regular (immediate-mode rendering) GPU.
The deferred part of tile-based deferred rendering means that the triangles for a given scene are first processed (vertex-shaded, transformed, clipped, etc.) and saved into an intermediate buffer. Only after the entire scene has been processed are the tiles rendered one by one.
Having all the processed triangles in one buffer allows the hardware to perform hidden surface removal - removing the triangles that will end up being hidden/overdrawn by other triangles. This significantly reduces the number of rendered triangles, resulting in improved performance and reduced power consumption.
Hidden surface removal typically uses something called a tag buffer as well as a depth buffer. (Both are small on-chip memories, since they only store one tile at a time.)
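A toy comparison of why removing hidden surfaces before shading helps, assuming opaque primitives that each cover a set of pixels at a constant depth. An immediate-mode renderer that receives primitives back to front shades every fragment that passes the depth test at that moment; a deferred renderer first resolves which primitive is visible at each pixel (the role of the depth and tag buffers) and then shades each pixel once. This models the idea only, not PowerVR's actual hardware.

```python
from __future__ import annotations


def shade_count_immediate(prims):
    """prims: list of (depth, pixels) in submission order. With a depth test,
    a fragment is shaded whenever it is nearer than what is stored so far."""
    nearest = {}
    shaded = 0
    for z, pixels in prims:
        for p in pixels:
            if z < nearest.get(p, float("inf")):
                nearest[p] = z
                shaded += 1  # fragment shader runs, possibly overwritten later
    return shaded


def shade_count_deferred(prims):
    """Resolve visibility first (per-pixel tag = id of the nearest primitive),
    then shade each covered pixel exactly once."""
    tag = {}  # pixel -> (depth, primitive id)
    for prim_id, (z, pixels) in enumerate(prims):
        for p in pixels:
            if z < tag.get(p, (float("inf"), -1))[0]:
                tag[p] = (z, prim_id)
    return len(tag)  # one shading invocation per visible pixel
```

With three full-screen layers submitted back to front, the immediate-mode count is three times the deferred one; submitted front to back, the two match, which is why sorting helps immediate-mode GPUs.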
Not sure why you're saying that PowerVR doesn't use a depth buffer. My guess is that it is just a "marketing" way of saying that there is no need to perform expensive writes and reads to and from system memory in order to perform depth testing.
Just to add to Tommy's answer, the primary benefits of tile-based deferred rendering are:
Since fragments are processed a tile at a time, all color/depth/stencil buffer reads and writes are performed in fast on-chip memory. While the color buffer still has to be read from/written to system memory once per tile, in many cases the depth and stencil buffers need to be written to system memory only if they are required later (as in your use case). System memory traffic is a significant source of power consumption, so you can see how this reduces power consumption.
Deferred rendering enables hidden surface removal. Fewer rendered triangles means less fragment processing, which in turn means fewer texture memory accesses.
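For a sense of scale, assume a 1920x1080 target with a 32-bit combined depth/stencil format (24-bit depth plus 8-bit stencil) at 60 frames per second. Writing that buffer out once per frame costs the amount below; a tile-based GPU can skip this traffic entirely when the contents are not needed afterwards. This is plain arithmetic, not a measurement, and an immediate-mode GPU's real traffic also depends on its caches and compression.

```python
def depth_store_bandwidth(width, height, bytes_per_pixel=4, fps=60):
    """Bytes per second spent only on writing a full depth/stencil buffer
    to system memory once per frame."""
    return width * height * bytes_per_pixel * fps
```

For 1920x1080 this comes to 497,664,000 bytes per second, roughly 0.5 GB/s, before counting any reads.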
