Vulkan reading a storage buffer returns zero [duplicate]

The vec3 type is a very nice type. It only takes up 3 floats, and I have data that only needs 3 floats. And I want to use one in a structure in a UBO and/or SSBO:
layout(std140) uniform UBO
{
    vec4 data1;
    vec3 data2;
    float data3;
};
layout(std430) buffer SSBO
{
    vec4 data1;
    vec3 data2;
    float data3;
};
Then, in my C or C++ code, I can do this to create matching data structures:
struct UBO
{
    vector4 data1;
    vector3 data2;
    float data3;
};
struct SSBO
{
    vector4 data1;
    vector3 data2;
    float data3;
};
Is this a good idea?

NO! Never do this!
When declaring UBOs/SSBOs, pretend that all 3-element vector types don't exist. This includes column-major matrices with 3 rows and row-major matrices with 3 columns. Pretend that the only types are scalars and 2- and 4-element vectors (and the matrices built from them). You will save yourself a very great deal of grief if you do.
If you want the effect of a vec3 + a float, then you should pack it manually:
layout(std140) uniform UBO
{
    vec4 data1;
    vec4 data2and3;
};
Yes, you'll have to use data2and3.w to get the other value. Deal with it.
If you want arrays of vec3s, then make them arrays of vec4s. Same goes for matrices that use 3-element vectors. Just banish the entire concept of 3-element vectors from your SSBOs/UBOs; you'll be much better off in the long run.
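On the C++ side, the packed block above is trivial to mirror, because every member is a 16-byte vec4. A sketch (the `vec4` struct here is an illustrative host-side stand-in, not a real GLSL type):

```cpp
#include <cstddef>

// Illustrative 16-byte host-side type standing in for GLSL's vec4.
struct vec4 { float x, y, z, w; };

// Mirrors the packed UBO: vec4 data1; vec4 data2and3;
struct UBO {
    vec4 data1;
    vec4 data2and3; // .xyz is the "vec3", .w is the extra float
};

// With only vec4-sized members, C++ offsets match std140 automatically.
static_assert(offsetof(UBO, data1) == 0, "data1 at offset 0");
static_assert(offsetof(UBO, data2and3) == 16, "data2and3 at offset 16");
static_assert(sizeof(UBO) == 32, "no hidden padding");
```

Because every member is naturally 16-byte sized, there are no alignment surprises on either side of the API boundary.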
There are two reasons why you should avoid vec3:
It won't do what C/C++ does
If you use std140 layout, then you will probably want to define data structures in C or C++ that match the definitions in GLSL. That makes it easy to mix and match between the two, and std140 layout makes doing so at least possible in most cases. But its layout rules don't match the usual layout rules of C and C++ compilers when it comes to vec3s.
Consider the following C++ definitions for a vec3 type:
struct vec3a { float a[3]; };
struct vec3f { float x, y, z; };
Both of these are perfectly legitimate types. The sizeof and layout of these types match the size and layout that std140 requires. But they do not match the alignment behavior that std140 imposes.
Consider this:
//GLSL
layout(std140) uniform Block
{
    vec3 a;
    vec3 b;
} block;
//C++
struct Block_a
{
    vec3a a;
    vec3a b;
};
struct Block_f
{
    vec3f a;
    vec3f b;
};
On most C++ compilers, sizeof for both Block_a and Block_f will be 24, which means that the offset of b will be 12.
In std140 layout however, vec3 is always aligned to 4 words. And therefore, Block.b will have an offset of 16.
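You can verify the C++ side of this mismatch directly (a sketch using the `vec3f` definition above):

```cpp
#include <cstddef>

struct vec3f { float x, y, z; };

struct Block_f {
    vec3f a;
    vec3f b;
};

// C++ packs the second vec3 immediately after the first...
static_assert(sizeof(Block_f) == 24, "two tightly packed vec3s");
static_assert(offsetof(Block_f, b) == 12, "b directly after a");
// ...whereas std140 would place Block.b at offset 16,
// because vec3 is 16-byte aligned there.
```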
Now, you could try to fix that by using C++11's alignas functionality (or C11's similar _Alignas feature):
struct alignas(16) vec3a_16 { float a[3]; };
struct alignas(16) vec3f_16 { float x, y, z; };
struct Block_a
{
    vec3a_16 a;
    vec3a_16 b;
};
struct Block_f
{
    vec3f_16 a;
    vec3f_16 b;
};
If the compiler supports 16-byte alignment, this will work. Or at least, it will work in the case of Block_a and Block_f.
But it won't work in this case:
//GLSL
layout(std140) uniform Block2
{
    vec3 a;
    float b;
} block2;
//C++
struct Block2_a
{
    vec3a_16 a;
    float b;
};
struct Block2_f
{
    vec3f_16 a;
    float b;
};
By the rules of std140, each vec3 must start on a 16-byte boundary. But vec3 does not consume 16 bytes of storage; it only consumes 12. And since float can start on a 4-byte boundary, a vec3 followed by a float will take up 16 bytes.
But the rules of C++ alignment don't allow such a thing. If a type is aligned to an X byte boundary, then using that type will consume a multiple of X bytes.
So matching std140's layout requires that you pick a type based on exactly where it is used. If it's followed by a float, you have to use vec3a; if it's followed by some type that is more than 4-byte aligned, you have to use vec3a_16.
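A compile-time check makes the conflict concrete: with the alignas(16) type, C++ has no way to put the trailing float at offset 12 the way std140 does (sketch):

```cpp
#include <cstddef>

struct alignas(16) vec3a_16 { float a[3]; };

struct Block2_a {
    vec3a_16 a;
    float b;
};

// C++: the aligned vec3 consumes a full 16 bytes, so b lands at offset 16,
// and the whole struct is padded to a multiple of its 16-byte alignment.
static_assert(offsetof(Block2_a, b) == 16, "b pushed past the vec3's padding");
static_assert(sizeof(Block2_a) == 32, "struct rounded up to its alignment");
// std140, by contrast, puts b at offset 12, inside the vec3's tail padding.
```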
Or you can just not use vec3s in your shaders and avoid all this added complexity.
Note that an alignas(8)-based vec2 will not have this problem. Nor will C/C++ structs and arrays using the proper alignment specifier (though arrays of smaller types have their own issues). This problem only occurs when using a naked vec3.
Implementation support is fuzzy
Even if you do everything right, implementations have been known to incorrectly implement vec3's oddball layout rules. Some implementations effectively impose C++ alignment rules on GLSL: if you use a vec3, they treat it like C++ would treat a 16-byte-aligned type. On these implementations, a vec3 followed by a float will behave like a vec4 followed by a float.
Yes, it's the implementers' fault. But since you can't fix the implementation, you have to work around it. And the most reasonable way to do that is to just avoid vec3 altogether.
Note that, for Vulkan (and OpenGL using SPIR-V), the SDK's GLSL compiler gets this right, so you don't need to be worried about it for that.

Related

How to disable float arithmetic optimizations when working with uniforms in WebGL/WebGL2 shaders

I am trying to implement 64-bit arithmetic in WebGL or WebGL2 shaders based on 32-bit floats. One of the basic functions needed there is one that splits any float number into two "non-overlapping" floats: the first float contains the first half of the original float's fraction bits, and the second float contains the second half. Here is the implementation of this function:
precision highp float;
...
...
vec2 split(const float a)
{
    const float split = 4097.0; // 2^12 + 1
    vec2 result;
    float t = a * split; // almost 4097 * a
    float diff = t - a; // almost 4096 * a
    result.x = t - diff; // almost a
    result.y = a - result.x; // very small number
    return result;
}
This function works as expected if I pass it arguments defined in the shader:
precision highp float;
...
...
float number = 0.1;
vec2 splittedNumber = split(number);
if (splittedNumber.y != 0.0)
{
    // color with white
    // we step here and see the white screen
}
else
{
    // color with black
}
But whenever the number depends in any way on a uniform, everything starts to behave differently:
precision highp float;
uniform float uniformNumber;
...
...
float number = 0.2;
if (uniformNumber > 0.0)
{
    // uniform number is positive,
    // so we step here
    number = 0.1;
}
vec2 splittedNumber = split(number);
if (splittedNumber.y != 0.0)
{
    // color with white
}
else
{
    // color with black
    // we step here and see the black screen
}
So in the second situation, when the number depends on the uniform, the split function somehow gets optimized and returns a vec2 with a zero y value.
There is a similar question on Stack Overflow about the same problem in OpenGL: Differing floating point behaviour between uniform and constants in GLSL
The suggestion there was to use the "precise" qualifier inside the split function. Unfortunately, WebGL/WebGL2 shaders have no such qualifier.
Do you have any suggestions on how to get rid of the optimizations in my case and implement the split function?
Mathematically speaking, both your examples should output black pixels. Because:
float diff = t - a;
result.x = t - diff; // t - diff = t - t + a = a
result.y = a - result.x; // a - a = 0
It could be that in the case of the constant argument to split() (the value is known beforehand), the compiler took the path of evaluating the function before optimizing the expressions, and you got splittedNumber.y != 0.0 because of precision errors. When you used a uniform, the compiler took the path of optimizing the expression, which produced a mathematically exact zero.
To verify that this is the case you could try the following comparison instead:
if(abs(splittedNumber.y) > 1e-6)
{
}
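For comparison, on the CPU, where IEEE-754 single precision is guaranteed and the compiler does not algebraically fold the expressions, the same split really does produce a nonzero low part for 0.1. A C++ transcription of the shader function (illustrative sketch):

```cpp
#include <utility>

// Same algorithm as the GLSL split(); 4097 = 2^12 + 1 splits a
// 24-bit float mantissa into two roughly 12-bit halves.
std::pair<float, float> split(float a) {
    const float splitter = 4097.0f; // 2^12 + 1
    float t = a * splitter;
    float diff = t - a;
    float hi = t - diff;
    float lo = a - hi;
    return {hi, lo};
}
```

With `a = 0.1f`, `lo` is nonzero and `hi + lo` recombines to exactly `a`; that is the behavior the shader version loses once the driver optimizes the arithmetic away.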
On a side note, highp does not guarantee a 32-bit float in WebGL shaders. Depending on the hardware it could be a 24-bit float, or even 16-bit due to a fallback to mediump (if highp is not supported in fragment shaders). To see the actual precision you can use
gl.getShaderPrecisionFormat
https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/getShaderPrecisionFormat

using pointSize to trigger the fragment shader to draw pixels

I queried the point size range with gl.getParameter(gl.ALIASED_POINT_SIZE_RANGE) and got [1, 1024]. This means that when using a single point to cover a texture (so that the fragment shader runs for all pixels spanned by the pointSize),
at best I cannot render images larger than 1024x1024 with this method, right?
I guess I have to bind 2 triangles (6 points) so the fragment shader covers all of clip space, and then gl.viewport(x, y, width, height); will map this entire area to the output texture (framebuffer object or canvas)?
Is there any other way (maybe something new in WebGL2) other than using an attribute in the fragment shader?
Correct. The largest area you can render with a single point is whatever is returned by gl.getParameter(gl.ALIASED_POINT_SIZE_RANGE).
The spec does not require any size larger than 1. The fact that your GPU/Driver/Browser returned 1024 does not mean that your users' machines will also return 1024.
note: Answering based on your history of questions
The normal thing to do in WebGL for 99% of all cases is to submit vertices. Want to draw a quad? Submit 4 vertices and 6 indices, or 6 vertices. Want to draw a triangle? Submit 3 vertices. Want to draw a circle? Submit the vertices for a circle. Want to draw a car? Submit the vertices for a car, or more likely submit the vertices for a wheel, draw 4 wheels with those vertices, then submit the vertices for the other parts of the car and draw each part.
You multiply those vertices by some matrices to move, scale, rotate, and project them into 2D or 3D space. All your favorite games do this. The canvas 2D api does this via OpenGL ES internally. Chrome itself does this to render all the parts of this webpage. That's the norm. Anything else is an exception and will likely lead to limitations.
For fun, in WebGL2, there are some other things you can do. They are not the normal thing to do and they are not recommended to actually solve real world problems. They can be fun though just for the challenge.
In WebGL2 there is a global variable in the vertex shader called gl_VertexID, which is the index of the vertex currently being processed. You can use it, with clever math, to generate vertices in the vertex shader with no other data.
Here's some code that draws a quad that covers the canvas
function main() {
  const gl = document.querySelector('canvas').getContext('webgl2');
  const vs = `#version 300 es
  void main() {
    int x = gl_VertexID % 2;
    int y = (gl_VertexID / 2 + gl_VertexID / 3) % 2;
    gl_Position = vec4(ivec2(x, y) * 2 - 1, 0, 1);
  }
  `;
  const fs = `#version 300 es
  precision mediump float;
  out vec4 outColor;
  void main() {
    outColor = vec4(1, 0, 0, 1);
  }
  `;
  // compile shaders, link program
  const prg = twgl.createProgram(gl, [vs, fs]);
  gl.useProgram(prg);
  const count = 6;
  gl.drawArrays(gl.TRIANGLES, 0, count);
}
main();
<canvas></canvas>
<script src="https://twgljs.org/dist/4.x/twgl.min.js"></script>
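If the index math in that vertex shader looks opaque, it can be checked offline; this small C++ transcription reproduces the id-to-corner mapping:

```cpp
#include <utility>

// Mirrors the vertex shader: maps gl_VertexID in [0, 6) to a quad
// corner in {0,1}^2 (the shader then remaps to clip space via * 2 - 1).
std::pair<int, int> corner(int id) {
    int x = id % 2;
    int y = (id / 2 + id / 3) % 2;
    return {x, y};
}
// ids 0..2 yield (0,0) (1,0) (0,1); ids 3..5 yield (1,0) (0,1) (1,1):
// two triangles that together cover the whole square.
```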
And here's one that draws a circle:
function main() {
  const gl = document.querySelector('canvas').getContext('webgl2');
  const vs = `#version 300 es
  #define PI radians(180.0)
  void main() {
    const int TRIANGLES_AROUND_CIRCLE = 100;
    int triangleId = gl_VertexID / 3;
    int pointId = gl_VertexID % 3;
    int pointIdOffset = pointId % 2;
    float angle = float((triangleId + pointIdOffset) * 2) * PI /
                  float(TRIANGLES_AROUND_CIRCLE);
    float radius = 1. - step(1.5, float(pointId));
    float x = sin(angle) * radius;
    float y = cos(angle) * radius;
    gl_Position = vec4(x, y, 0, 1);
  }
  `;
  const fs = `#version 300 es
  precision mediump float;
  out vec4 outColor;
  void main() {
    outColor = vec4(1, 0, 0, 1);
  }
  `;
  // compile shaders, link program
  const prg = twgl.createProgram(gl, [vs, fs]);
  gl.useProgram(prg);
  const count = 300; // 100 triangles, 3 points each
  gl.drawArrays(gl.TRIANGLES, 0, count);
}
main();
<canvas></canvas>
<script src="https://twgljs.org/dist/4.x/twgl.min.js"></script>
There is an entire website based on this idea. The site is based on the puzzle of making pretty pictures given only an id for each vertex. It's the vertex shader equivalent of shadertoy.com, where the puzzle is basically: given only gl_FragCoord as input to a fragment shader, write a function that draws something interesting.
Both sites are toys/puzzles. Doing things this way is not recommended for solving real problems like drawing a 3D world in a game, doing image processing, or rendering the contents of a browser window. They are cute puzzles: given only minimal inputs, draw something interesting.
Why is this technique not advised? The most obvious reason is that it's hard-coded and inflexible, whereas the standard techniques are extremely flexible. For example, above, drawing a fullscreen quad required one shader and drawing a circle required a different one, whereas standard vertex-buffer-based attributes multiplied by matrices can handle any shape provided, 2D or 3D. And not just any shape: with a single matrix multiply in the shader, those shapes can be translated, rotated, scaled, and projected into 3D; their rotation centers and scale centers can be set independently; and so on.
Note: you are free to do whatever you want. If you like these techniques then by all means use them. The reason I'm trying to steer you away from them is that, based on your previous questions, you're new to WebGL, and I feel you'll end up making WebGL much harder for yourself if you use obscure, hard-coded techniques like these instead of the traditional, more flexible techniques that experienced devs use to get real work done. But again, it's up to you; do whatever you want.

How to provide custom data to SCNProgram?

I have array with SCNVector3, one for each vertex.
var terrainArray = [SCNVector3]()
I need to provide this data per vertex in my fragment shader. Something like this:
struct TerrainVertexInput
{
    float3 position [[attribute(SCNVertexSemanticPosition)]];
    float4 color [[attribute(SCNVertexSemanticColor)]];
};
struct TerrainVertexOutput
{
    float4 position [[position]];
    float3 terrain;
    float4 color;
};
vertex TerrainVertexOutput terrainVertex(TerrainVertexInput in [[stage_in]],
                                         constant SCNSceneBuffer& scn_frame [[buffer(0)]],
                                         constant MyNodeBuffer& scn_node [[buffer(1)]],
                                         constant float3 terrain [[buffer(2)]])
{
    TerrainVertexOutput v;
    v.position = scn_node.modelViewProjectionTransform * float4(in.position, 1.0);
    v.terrain = terrain;
    v.color = in.color;
    return v;
}
As I understand it, I need to create a Data object with the array data and provide it to the program with setValue(_:forKey:), but I'm not sure the vertex function will get the right element for each vertex.
How do I do this correctly?
You're on the right track, but you don't want to use SCNVector3 for data you're passing into a Metal shader. SceneKit's vector types have components of type CGFloat, the size of which is platform-dependent.
Instead, your data should use one of the simd vector types. In Swift and Metal, that means float3 or float4. Note that float3 actually occupies 16 bytes of space; there's a dummy element at the end for alignment purposes. If you want to pack your data tightly, using exactly 3 floats per vertex, you can type your buffer in Metal as packed_float3 and write 3 contiguous floats into your data buffer for each vertex. There is no three-element packed float vector type in Swift.
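The 16-versus-12-byte distinction can be sketched in C++ terms (these structs are illustrative stand-ins for simd float3 and Metal's packed_float3, not the real types):

```cpp
// Stand-in for simd float3: 16-byte aligned, so it carries 4 bytes of padding.
struct alignas(16) float3_like { float x, y, z; };

// Stand-in for Metal's packed_float3: three tightly packed floats.
struct packed_float3_like { float x, y, z; };

static_assert(sizeof(float3_like) == 16, "padded out for alignment");
static_assert(sizeof(packed_float3_like) == 12, "exactly three floats");
```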
There are many ways to copy an array of SCNVector3 into a suitably-typed data buffer. Here's one:
// Allocate enough memory to store three floats per vertex, ensuring we free it later
let terrainBuffer = UnsafeMutableBufferPointer<Float>.allocate(capacity: terrainArray.count * 3)
defer {
    terrainBuffer.deallocate()
}
// Copy each element of each vector into the buffer
terrainArray.enumerated().forEach { i, v in
    terrainBuffer[i * 3 + 0] = Float(v.x)
    terrainBuffer[i * 3 + 1] = Float(v.y)
    terrainBuffer[i * 3 + 2] = Float(v.z)
}
// Copy the buffer data into a Data object, as expected by SceneKit
let terrainData = Data(buffer: terrainBuffer)
You can then use setValue(_:forKey:) on your geometry or material:
material.setValue(terrainData, forKey: "terrain")
Rather than taking a single float3 as a parameter in your vertex function, instead take a pointer to packed_float3 and index into it according to the vertex ID:
vertex TerrainVertexOutput terrainVertex(TerrainVertexInput in [[stage_in]],
                                         constant SCNSceneBuffer& scn_frame [[buffer(0)]],
                                         constant MyNodeBuffer& scn_node [[buffer(1)]],
                                         constant packed_float3 *terrain [[buffer(2)]],
                                         uint vid [[vertex_id]])
{
    // ...
    v.terrain = terrain[vid];
    // ...
}
This assumes an exact correspondence between vertices in your geometry and terrain data points. Rather than using the vertex ID directly, you can of course do whatever sort of fancy indexing you want to look up the terrain data for a given vertex.

OpenGL ES 2.0 fragment shader arithmetic incorrect on iOS

I am very new to GLSL so please excuse the basic nature of these questions..
First of all, a call to:
int range[2], precision;
glGetShaderPrecisionFormat(GL_FRAGMENT_SHADER, GL_HIGH_FLOAT, range, &precision);
For my device this returns precision = 23 and range = {127, 127}. Before I go any further: my understanding is that this therefore describes a 32-bit float (1 sign + 8 exponent + 23 mantissa bits). Is this correct?
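That reading is consistent with IEEE-754 binary32; a quick C++ check of the corresponding host-side constants (illustrative, and assuming the platform uses IEEE-754 floats):

```cpp
#include <limits>

// IEEE-754 binary32: 24 significant bits, one of which is implicit,
// so 23 stored mantissa bits; the largest binary exponent is 127.
static_assert(std::numeric_limits<float>::digits - 1 == 23,
              "23 stored mantissa bits");
static_assert(std::numeric_limits<float>::max_exponent - 1 == 127,
              "maximum exponent 127");
```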
Second, my goal is to emulate double-float precision in a fragment shader using this method. I split a double into 2 floats on the CPU side using the following code:
static double const SPLITTER = (1 << 29) + 1;
static inline void set2d(double a, float* b, float* c) {
    double t = a * SPLITTER;
    double hi = t - (t - a);
    double lo = a - hi;
    *b = (float)hi;
    *c = (float)lo;
}
Then I set 2 uniforms in the fragment shader with the returned hi/lo values. The uniforms are declared like this:
precision highp float;
uniform float u_h0, u_h1;
The problem I have is that in the following code:
float a = u_h0;
float b = u_h1;
float s=a+b;
float e=b-(s-a); // always 0
The overflow term float e=b-(s-a); always evaluates to 0 on the GPU (I debugged by setting colours based on value), no matter what I pass in for u_h0 and u_h1 (I also confirmed that both these were non-zero).
It seems like the compiler may be interpreting e as b - ((a + b) - a) == (b - b) == 0, i.e. optimising the code away at compile time.
Does this seem likely, and if so how could I stop this from happening?
Thanks for any help.
Changing:
varying vec2 v_texCoords;
To:
invariant varying vec2 v_texCoords;
In both vertex and fragment shaders fixes the problem.

Fragment shader behavior on iOS

I'm doing some basic optimization on a fragment shader for an iOS app. I want to assign one of a few colors to gl_FragColor. My first attempt used the ternary operator. This displays things correctly on both the simulator and iPhone 5C:
lowp float maxC = max(color.r, max(color.g, color.b));
lowp float minC = min(color.r, min(color.g, color.b));
gl_FragColor.rgb = (maxC > 1.0 ? result0 : (minC < 0.0 ? result1 : result2));
I tried to replace the ternary operators with a combination of mix and step to see if I can replicate the above the logic with less branching. Unfortunately, this works in the simulator but not on the iOS device:
lowp float step0 = step(0.0, minC);
lowp float step1 = step(1.001, maxC);
lowp vec3 mix0 = mix(result1, result2, step0);
gl_FragColor.rgb = mix(mix0, result0, step1);
Specifically, some white areas of the screen that draw correctly in the simulator are drawn incorrectly as black areas on the device. Everything else looks fine.
What are some reasons for the above combination of step and mix not reproducing the same results as the approach using ternary operators?
The exact details of floating point operations in an OpenGL shader can be difficult to understand. What is likely going on is that the ternary operator is being optimized into a multiply by zero operation by the GLSL compiler. Have a look at avoiding-shader-conditionals for info about how folding conditionals into multiply or other simple operations. The most useful utility function is:
float when_eq(float x, float y) {
    return 1.0 - abs(sign(x - y));
}
An example of use would be:
float comp = 0.0;
comp += when_eq(vTest, 0.0) * v1;
comp += when_eq(vTest, 1.0) * v2;
The code above will set comp to either v1 or v2 depending on the current value of vTest. Be sure to test this explicit implementation against the ternary implementation generated by the compiler to see which one is actually faster.
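The helper is easy to sanity-check on the CPU; here is a C++ transcription of the GLSL (using a comparison-based sign, since C++ has no built-in sign()):

```cpp
#include <cmath>

// C++ transcription of the GLSL helper: returns 1.0 when x == y, else 0.0.
float when_eq(float x, float y) {
    float s = static_cast<float>((x > y) - (x < y)); // sign(x - y)
    return 1.0f - std::fabs(s);
}
```

With vTest = 1.0, `when_eq(vTest, 0.0) * v1 + when_eq(vTest, 1.0) * v2` selects v2, matching what the ternary version would have picked.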
