I'm on iOS 5.1
I was trying to display several batches of lines in the same vertex array, separating them with degenerate vertices, but it does not seem to work: a line is drawn between each batch of vertices.
Googling the problem gave me results saying that degenerate vertices are not compatible with GL_LINE_STRIP, but I'm not really sure about it. Can someone confirm that? And what's the alternative?
As far as I know, you can only draw a continuous line using a single vertex array and GL_LINE_STRIP. The alternative is to use GL_LINES, which treats each vertex pair as an independent line segment. To get a contiguous line, duplicate the last vertex of the previous segment as the start of your next segment in your vertex array.
One possibility that may come to mind is to use vertices with strange values (like infinity, or a w of 0), but those will most probably just get rendered as normal points at some crazy distance away (and thus you get weird clipped lines). So this won't work in general.
When drawing triangle strips, you can use degenerate triangles to restart the strip. This works by duplicating a vertex (or better, two consecutive ones), which results in a triangle (or better, four) that degenerates to a line and thus has zero area (and is not drawn). But let's look at a line strip. When duplicating a vertex there, you get a line that degenerates to a point (and is thus not drawn), but as soon as you start the next line strip you need a new vertex, and since two distinct vertices always make a valid line, duplicating vertices cannot give you a line strip restart.
So there is no real way to put multiple line strips into a single draw call using degenerate vertices (though modern desktop GL has other ways to do it). The best idea would probably be to just use an ordinary line set (GL_LINES), as Drewsmits suggests. Yes, you will roughly double the number of vertices (if your strips are very long), but the reduced driver overhead from batching may well outweigh the additional memory and copying overhead.
Whereas you can't use degenerate vertices in a line strip, you can use a primitive restart index (usually the maximum index value possible for the numeric type), also called a strip-cut index in Direct3D parlance. It can be configured using glPrimitiveRestartIndex.
Related
I want to specify a color for each strip in my triangle strip, separated by degenerate triangles. As of now, I am sending that data with each vertex, so my vertices consist of: [ PosX, PosY, PosZ, ColorX, ColorY, ColorZ, ColorW ]. However, the color data is constant for every vertex until it may change after a degenerate triangle, so I am wasting 8 bytes per vertex (Wide P3 colors).
Obviously using many draw calls instead of degenerate triangles would solve the memory issue (by using a uniform buffer to store the color for each draw call), but would also create a massive performance overhead, which is unwanted.
Since I only have simple shapes, the degenerate triangles are better than index buffers in my case. Although the current implementation of repeating the color data for all vertices in each strip works, I would like to know if there is another way to pass this data more efficiently to the GPU. Could not find much about this topic on Google (presumably because degenerate triangles are not very common nowadays). If anyone could give me some insight on how the memory usage may be optimised without worsening the performance (maybe by using another buffer? I imagine I need to be able to tell which strip separated by degenerate triangles I am currently drawing in the shader) it would be highly appreciated.
I am using Swift 4 and Metal.
I have this huge model (a helix) created with 2 million vertices at once, and some million more indices specifying which vertices to use.
I am pretty sure this is a very bad way to draw so many vertices.
I need some hints to where I should start to optimize this?
I thought about copying 1 round of my helix (vertices) and moving the z of that. But in the end, I would be drawing a lot of triangles at once again...
How naive are you currently being? As per rickster's comment, there's a serious case of potential premature optimisation here: the correct way to optimise is to find the actual bottlenecks and to widen those.
Knee-jerk thoughts:
Minimise memory bandwidth. Pack your vertices into the smallest space they can fit into (i.e. limit precision where it is acceptable to do so) and make sure all the attributes that describe a single vertex are stored contiguously (i.e. the individual attribute arrays are interleaved).
Consider breaking your model up to achieve that aim. Instanced drawing as rickster suggests is a good idea if it's sufficiently repetitive. You might also consider what you can do with 65536-vertex segments, since that'll let you use 16-bit indices and halve your index size.
Use triangle strips if it allows you to specify the geometry in substantially fewer indices, even if you have to add degenerate triangles.
Consider where the camera will be. Do you really need that level of detail all the way around? Will the whole thing even ever be on screen? If not then consider level-of-detail solutions and subdivision for culling (both outside the viewport and within via the occlusion query).
My question may be a bit too broad, but I am going for the concept. How can I create a surface like they did in the "Cham Cham" app?
https://itunes.apple.com/il/app/cham-cham/id760567889?mt=8.
I got most of the stuff done in the app, but the surface that changes with user touch is quite different. You can change its altitude, and it grows and shrinks. How can this be done using SpriteKit? What is the concept behind it? Can anyone explain it a bit?
Thanks
Here comes the answer from Cham Cham developers :)
Let me split the explanation into different parts:
Note: As the project started quite a while ago, it is implemented using pure OpenGL. The SpriteKit implementation might differ, but you just need to map the idea over to it.
Defining the ground
The ground is represented by a set of points, which are interpolated using a Hermite spline. Basically, the game uses a bunch of control points defining the surface, plus a set of intermediate points between each pair of control points, like below:
The red dots are control points, and everything in between is computed using the mentioned Hermite interpolation. The green points in the middle have nothing to do with it, but make the whole thing look like boobs :)
You can choose an arbitrary number of interpolation steps to make your boobs look as smooth as possible, but this is mostly a performance trade-off.
Controlling the shape
All you need to do is allow the user to move the control points (or some of them, as in Cham Cham; you can define the range each point may move in, etc.). Recomputing the interpolated values will yield a changed shape, which remains smooth at all times (given you have picked enough intermediate points).
Texturing the thing
Again, it is up to you how you apply the texture. In Cham Cham, we use one big texture to hold the background image and recompute the texture coordinates at every shape change. You could try a more sophisticated algorithm, like squeezing the texture or whatever you find appropriate.
As for the surface texture (the one that covers the ground – grass, ice, sand etc) – you can just use the thing called Triangle Strips, with "bottom" vertices sitting at every interpolated point of the surface and "top" vertices raised over (by offsetting them against "bottom" ones in the direction of the normal to that point).
Rendering it
The easiest way is to utilize some tessellation library, like libtess. What it will do is convert your boundary line (composed of interpolated points) into a set of triangles. It preserves texture coordinates, so you can just feed these triangles to the renderer.
SpriteKit note
Unfortunately, I am not really familiar with SpriteKit engine, so cannot guarantee you will be able to copy the idea over one-to-one, but please feel free to comment on the challenging aspects of the implementation and I will try to help.
I decided to build my engine on triangle lists after reading (a while ago) that indexed triangle lists perform better because fewer draw calls are needed. Today I stumbled on 0xffffffff, which in DX is considered a strip-cut index, so you can draw multiple strips in one call. Does this mean that triangle lists no longer hold superior performance?
It is possible to draw multiple triangle strips in a single draw call using degenerate triangles, which have an area of zero. A strip cut is made by simply repeating the last vertex of the previous strip and the first vertex of the next strip, adding two elements per strip break (which produces four zero-area triangles at the join).
New in Direct3D 10 are the strip-cut index (for indexed geometry) and the RestartStrip HLSL function. Both can be used to replace the degenerate triangles method effectively cutting down the bandwidth cost. (Instead of two indices for a cut only one is needed.)
Expressiveness
Can any primitive list be converted to an equal strip and vice versa? Strip to list conversion is of course trivial. For list to strip conversion we have to assume that we can cut the strip. Then we can map each primitive in the list to a one-primitive sub-strip, though this would not be useful.
So, at least for triangle primitives, strips and lists always had the same expressiveness. Before Direct3D 10, strip cuts in line strips were not possible, so those actually were not equally expressive.
Memory and Bandwidth
How much data needs to be sent to the GPU? In order to compare the methods we need to be able to calculate the number of elements needed for a certain topology.
Primitive List Formula
N ... total number of elements (vertices or indices)
P ... total number of primitives
n ... elements per primitive (point => 1, line => 2, triangle => 3)
N = Pn
Primitive Strip Formula
N, P, n ... same as above
S ... total number of sub-strips
o ... primitive overlap
c ... strip cut penalty
N = P(n-o) + So + c(S-1)
Primitive overlap describes the number of elements shared by adjacent primitives. In a classical triangle strip a triangle uses two vertices from the previous primitive, so the overlap is 2. In a line strip only one vertex is shared between lines, so the overlap is 1. A triangle strip using an overlap of 1 is theoretically possible but has no representation in Direct3D.
Strip cut penalty is the number of elements needed to start a new sub-strip. It depends on the method used. Using strip-cut indices the penalty is 1, since one index separates two strips. Using degenerate triangles the penalty is 2, since two extra indices are needed per strip cut.
From these formulas we can deduce that it depends on the geometry which method needs the least space.
Caching
One important property of strips is the high temporal locality of the data. When a new primitive is assembled, each of its vertices needs to be fetched from GPU memory; for a triangle this happens three times. Accessing memory is slow, which is why processors use multiple levels of caches. In the best case the data needed is already in the cache, reducing memory access time. For triangle strips, the last two vertices of the previous primitive are reused, almost guaranteeing that two of the three vertices are already present in the cache.
Ease of Use
As stated above, converting a list to a strip is very simple. The problem is converting a list to an efficient primitive strip by reducing the number of sub-strips. For simple procedurally generated geometry (e.g. heightfield terrains) this is usually achievable. Writing a converter for existing meshes might be more difficult.
Conclusion
The introduction of Direct3D 10 does not have much impact on the strip vs. list question. There is now equal expressiveness for line strips, and a slight data reduction. In any case, when using strips you gain the most by reducing the number of sub-strips.
On modern hardware with pre- and post-transform vertex caches, tri-stripping is not a win over indexed triangle lists. The only time you really use tri-stripping would be non-indexed primitives generated by something where the strips are trivial to compute such as a terrain system.
Instead, you should do vertex cache optimization of indexed triangle lists for best performance. The Hoppe algorithm is implemented in DirectXMesh, or you can look at Tom Forsyth's alternative algorithm.
...or am I insane to even try?
As a novice to using bare vertices for 3d graphics, I haven't ever worked with vertex buffers and the like before. I am guessing that I should use a dynamic buffer because my game deals with manipulating, adding and deleting primitives. But how would I go about doing that?
So far I have stored my indices in a Triangle.cs class. Triangles are stored in Quads (which contain the vertices that correspond to their indices), quads are stored in blocks. In my draw method, I iterate through each block, each quad in each block, and finally each triangle, apply the appropriate texture to my effect, then call DrawUserIndexedPrimitives to draw the vertices stored in the triangle.
I'd like to use a vertex buffer because this method cannot support the scale I am going for. I am assuming it to be dynamic. Since my vertices and indices are stored in a collection of separate classes, though, can I still effectively use a buffer? Is using separate buffers for each quad silly (I'm guessing it is)? Is it feasible and effective for me to dump vertices into the buffer the first time a quad is drawn and then store where those vertices were so that I can apply that offset to that triangle's indices for successive draws? Is there a feasible way to handle removing vertices from the buffer in this scenario (perhaps event-based shifting of index offsets in triangles)?
I apologize that these questions may be either far too novicely or too confusing/vague. I'd be happy to provide clarification. But as I've said, I'm new to this and I may not even know what I'm talking about...
I can't exactly tell what you're trying to do, but using a separate buffer for every quad is very silly.
The golden rule in graphics programming is batch, batch, batch. This means packing as much stuff into a single DrawUserIndexedPrimitives call as possible; your graphics card will love you for it.
In your case, put all of your vertices and indices into one vertex buffer and one index buffer (you might need more; I have no idea how many vertices we're talking about). Whenever the user changes one of the primitives, regenerate the entire buffer. If you really have a lot of primitives, split them up into multiple buffers and only regenerate the ones affected when the user changes something.
The most important thing is to minimize the number of DrawUserIndexedPrimitives calls; those things have a lot of overhead, and you could easily make your game on the order of 20x faster.
Graphics cards are pipelines; they like being given a big chunk of data to chew through. Giving them one triangle at a time is like forcing a large-scale car factory to build one car at a time, unable to start the next car before the last one is finished.
Anyway good luck, and feel free to ask any questions.