I have a 2D texture formatted as DXGI_FORMAT_R32_FLOAT. In my pixel shader I sample from it as follows:
float sample = texture.Sample(sampler, coordinates);
This results in the following compiler warning:
warning X3206: implicit truncation of vector type
I'm confused by this. Shouldn't Sample return a single channel, and therefore a scalar value, as opposed to a vector?
I'm using shader model 4 level 9_1.
Either declare your texture as having one channel, or specify which channel you want. Without the <float> part, the compiler assumes a four-channel texture, so Sample returns a float4.
Texture2D<float> texture;
or
float sample = texture.Sample(sampler, coordinates).r;
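For illustration, a minimal sketch of the first option (the resource names and register slots are placeholders, not from the question):

Texture2D<float> heightMap : register(t0); // single-channel R32_FLOAT texture
SamplerState linearSampler : register(s0);

float4 main(float2 uv : TEXCOORD0) : SV_Target
{
    // Sample on a Texture2D<float> returns a scalar, so there is no truncation warning
    float height = heightMap.Sample(linearSampler, uv);
    return float4(height, height, height, 1.0);
}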
My code works when I draw to an MTLTexture with the rgba32Float pixel format; I can then get a CVPixelBuffer out of it.
But FlutterTexture requires the bgra8Unorm format, and I do not want to convert the CVPixelBuffer because of the performance overhead.
So I'm trying to render to an MTLTexture with the bgra8Unorm pixel format, but the following fragment shader code won't compile:
fragment vector_uchar4 fragmentShader2(Vertex interpolated [[stage_in]]) {
    return 0xFFFFFFFF;
}
With error: Invalid return type 'vector_uchar4' for fragment function
I've tried replacing it with the uint type, but it crashes with this error:
Fatal error: 'try!' expression unexpectedly raised an error:
Error Domain=AGXMetalA11 Code=3
"output of type uint is not compatible with a MTLPixelFormatBGRA8Unorm color attachement."
UserInfo={NSLocalizedDescription=output of type uint is not compatible with a MTLPixelFormatBGRA8Unorm color attachement.}
If I use a vector_float4 or vector_half4 return type, my texture and buffers are empty.
Which return type do I have to use for the bgra8Unorm pixel format to get a non-empty image? Is it possible with Metal at all?
I've found the answer on page 30 of the Metal Shading Language specification.
And finally this code draws image as expected:
fragment float4 fragmentShader2(Vertex interpolated [[stage_in]]) {
    // ...
    rgba8unorm<float4> rgba;
    rgba = float4(color.r, color.g, color.b, 1.0);
    return rgba;
}
If someone can explain what is happening under the hood, I would really like not to waste the bounty.
It depends on many different factors. In most cases you should use float4 or half4.
All modern Apple GPUs that support Metal are designed to perform calculations on 16-bit or 32-bit floating-point data. This is how GPUs work: any read operation the shader performs on a Float, Snorm, or Unorm format is carried out in 16-bit or 32-bit floating point, regardless of the original input format.
On any write operation, the shader converts from 16-bit or 32-bit floating point to the target format. For the conversion rules, see the Metal Shading Language specification, page 217.
Any Metal format with the Float, Snorm, or Unorm suffix is a floating-point format, while Uint and Sint are unsigned and signed integer formats:
Float - A floating-point value in any of the representations defined by Metal.
Unorm - A floating-point value in the range [0.0, 1.0].
Snorm - A floating-point value in the range [-1.0, 1.0].
Uint - An unsigned integer.
Sint - A signed integer.
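As a hedged illustration of those conversions (the texture binding, sampler setup, and Vertex fields below are assumptions, not from the question):

#include <metal_stdlib>
using namespace metal;

struct Vertex {
    float4 position [[position]];
    float2 uv;
};

// Reading a bgra8Unorm texture already yields floating-point values in [0.0, 1.0];
// returning half4 (or float4) lets Metal convert back to 8-bit unorm on the write.
fragment half4 passthrough(Vertex in [[stage_in]],
                           texture2d<half> tex [[texture(0)]]) {
    constexpr sampler s(filter::linear);
    return tex.sample(s, in.uv);
}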
I have a question about the relationship between the fragment shader return value and MTKView.colorPixelFormat.
My fragment shader returns float4, which is a 4 × 32-bit vector, and MTKView.colorPixelFormat is .bgr10_xr.
How do I convert float4 to .bgr10_xr? Or does this conversion happen automatically?
It should just work; Metal will do the conversion for you. Refer to section 7.7 of the Metal Shading Language Specification; there's an entry about 10-bit formats: 7.7.4, "Conversion for 10- and 11-bit Floating-Point Pixel Data Type".
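For illustration, a minimal sketch (the view name is a placeholder); the fragment shader keeps returning float4, and the attachment format only changes how that value is stored:

import MetalKit

let metalView = MTKView(frame: .zero, device: MTLCreateSystemDefaultDevice())
metalView.colorPixelFormat = .bgr10_xr // the float4 output is converted on write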
I'm rewriting, in Metal, an OpenGL filter from the Android version of the app I'm currently working on. It uses the following texture lookup function:
vec4 texture2D(sampler2D sampler, vec2 coord, float bias)
Assuming my filter kernel function looks like this:
float4 fname(sampler src) {
    ...
}
The texture lookup call would be the following:
src.sample(coord)
But how can I pass the bias parameter? (The sample function takes only one argument.)
I'm afraid Core Image only supports 2D textures: no mipmapping or LOD selection. Only bilinear sampling is available.
If you need different LODs, you need to pass different samplers to your kernel and do the interpolation yourself.
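A hedged sketch of that idea (the kernel name and blend parameter are made up for illustration; each sampler would be fed a differently pre-filtered version of the image, standing in for two LODs):

#include <CoreImage/CoreImage.h>
using namespace metal;

extern "C" float4 biasedSample(coreimage::sampler lod0,
                               coreimage::sampler lod1,
                               float t)
{
    // Sample both pre-filtered inputs at the same location
    float4 a = lod0.sample(lod0.coord());
    float4 b = lod1.sample(lod1.coord());
    // Interpolate manually, which is what an LOD bias would otherwise do
    return mix(a, b, t);
}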
I am trying to digest these two links:
https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview
https://www.khronos.org/opengl/wiki/Vertex_Shader
The pipeline overview says that the vertex shader runs before primitive assembly.
The second one mentions this:
A vertex shader is (usually) invariant with its input. That is, within a single Drawing Command, two vertex shader invocations that get the exact same input attributes will return binary identical results. Because of this, if OpenGL can detect that a vertex shader invocation is being given the same inputs as a previous invocation, it is allowed to reuse the results of the previous invocation, instead of wasting valuable time executing something that it already knows the answer to.
OpenGL implementations generally do not do this by actually comparing the input values (that would take far too long). Instead, this optimization typically only happens when using indexed rendering functions. If a particular index is specified more than once (within the same Instanced Rendering), then this vertex is guaranteed to result in the exact same input data.
Therefore, implementations employ a cache on the results of vertex shaders. If an index/instance pair comes up again, and the result is still in the cache, then the vertex shader is not executed again. Thus, there can be fewer vertex shader invocations than there are vertices specified.
So if I have two quads with two triangles each:
indexed:
verts: { 0 1 2 3 }
tris: { 0 1 2 }
{ 1 2 3 }
soup:
verts: { 0 1 2 3 4 5 }
tris: { 0 1 2 }
{ 3 4 5 }
and perhaps a vertex shader that looks like this:
uniform mat4 mvm;
uniform mat4 pm;
attribute vec3 position;
void main() {
    vec4 res;
    for (int i = 0; i < 256; i++) {
        res = pm * mvm * vec4(position, 1.);
    }
    gl_Position = res;
}
Should I care that one has 4 vertices while the other has 6? Is this even consistent from GPU to GPU: will one invoke the vertex shader 4 times versus 6? How is this affected by the cache:
If an index/instance pair comes up again, and the result is still in the cache...
How is the number of primitives related to performance here? In both cases I have the same number of primitives.
In the case of a very simple fragment shader, but an expensive vertex shader:
void main() {
    gl_FragColor = vec4(1.);
}
And a tessellated quad (100×100 segments): can I say that the indexed version will run faster, or can run faster, or can I say nothing?
Like everything with GPUs, according to the spec you can say nothing; it's up to the driver and GPU. In reality, though, in your example 4 vertices will run faster than 6 pretty much everywhere.
Search for vertex order optimization and lots of articles come up:
Linear-Speed Vertex Cache Optimisation
Triangle Order Optimization
AMD Triangle Order Optimization Tool
Triangle Order Optimization for Graphics Hardware Computation Culling
Unrelated, but another example of the spec vs. reality: according to the spec, depth testing happens AFTER the fragment shader runs (otherwise you couldn't set gl_FragDepth in the fragment shader). In reality, though, as long as the results are the same, the driver/GPU can do whatever it wants, so fragments from shaders that don't set gl_FragDepth or discard are depth tested first and only shaded if the test passes.
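To make the two submission paths concrete, a minimal sketch (assuming a VAO with the position attribute set up and, for the indexed path, an element buffer holding { 0, 1, 2, 1, 2, 3 } already bound):

/* Indexed path: indices 1 and 2 appear twice, so the post-transform
   cache may reuse their vertex shader results (4 invocations, not 6). */
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0);

/* Soup path: 6 independent vertices, so no reuse is possible. */
glDrawArrays(GL_TRIANGLES, 0, 6);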
I am working on heavy linear algebra on the GPU on iOS, so I am trying out OpenGL ES 3.0. I started from Bartosz Ciechanowski's toy code to work out some general large-by-small matrix multiplication. I split the large matrix into vectors and use a set of uniform vectors to represent the smaller matrix. I have managed to get data in and out of the vertex shader, but I cannot get the uniform vectors right, as they always look empty. Here is what I did:
In the vertex shader, I just copy over what I sent in the uniform vectors:
uniform vec3 transMat0;
uniform vec3 transMat1;
in vec2 InV0;
out vec2 OutV0;
void main() {
    OutV0 = vec2(transMat0.x, transMat1.y);
    // OutV0 = InV0 + vec2(1.0, 2.0); // this works
}
In the ObjC code, I declared the buffer content as global variables:
static GLfloat mat0Data[] = {1000.0,100.0,10.0,1.0,0.1,0.01};//outside methods
static GLfloat mat1Data[] = {1000.0,100.0,10.0,1.0,0.1,0.01};//outside methods
And where I compile the shaders and generate the program handle, I set the uniforms:
GLint mat0 = glGetUniformLocation(program, "transMat0");
GLint mat1 = glGetUniformLocation(program, "transMat1");
glUniform3fv(mat0, 1, &(mat0Data[0]));
glUniform3fv(mat1, 1, &(mat1Data[0]));
The output buffer just reads all 0. To prove everything else works: I get the correct result from the transform feedback when I use the commented-out line
OutV0 = InV0 + vec2(1.0, 2.0); // this works
So I suspect I missed something with the uniform vectors. Any suggestions?
As Reto pointed out in the comments, I missed calling
glUseProgram()
With that added, I can get everything correct now.
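For completeness, a sketch of the fixed setup (glUniform* calls apply to the currently active program, so glUseProgram must come first):

glUseProgram(program);

GLint mat0 = glGetUniformLocation(program, "transMat0");
GLint mat1 = glGetUniformLocation(program, "transMat1");
glUniform3fv(mat0, 1, mat0Data);
glUniform3fv(mat1, 1, mat1Data);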