I'm rewriting an OpenGL filter in Metal from the Android version of the app I'm currently working on. It uses the following texture lookup function:
vec4 texture2D(sampler2D sampler, vec2 coord, float bias)
Assuming my filter kernel function looks like this:
float4 fname(sampler src) {
...
}
The texture lookup call would be the following:
src.sample(coord)
But how can I pass the bias parameter? (the sample function takes only 1 argument)
I'm afraid Core Image only supports 2D textures, with no mipmapping or LOD selection; only bilinear sampling is available.
If you need different LODs, you need to pass different samplers to your kernel and do the interpolation yourself.
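For completeness, a minimal sketch of that approach, assuming you pre-scale the image into two "LOD" versions on the CPU side and pass both to the kernel (the kernel name and the src0/src1/lodFraction parameters are illustrative, not part of the original filter):

#include <CoreImage/CoreImage.h>
using namespace metal;

extern "C" { namespace coreimage {

// Blend two pre-scaled inputs to emulate a biased/LOD texture lookup.
float4 fname(sampler src0, sampler src1, float lodFraction) {
    float4 fine   = src0.sample(src0.coord());  // higher-resolution input
    float4 coarse = src1.sample(src1.coord());  // lower-resolution input
    return mix(fine, coarse, lodFraction);
}

}}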
I'm trying to learn MSL through the Metal Shading Language Specification, and saw that you can set LOD options when sampling a texture by specifying the options in the sample function. This is one of the examples given in the spec:
Tv sample(sampler s, float2 coord, lod_options options, int2 offset = int2(0)) const
lod_options include bias, level, gradient2d, etc.
I've looked all over but cannot find the usage syntax for this. Are these named arguments? Is lod_options a struct? For example, if I want to specify the LOD level, what is the correct way to do it? I know these options can also be specified in the sampler object itself, but if I want to do it here, what would be the right syntax to do so?
There is no lod_options type as such; you can think of it as a placeholder for one of the bias, level, gradient2d, etc. types. Each of these types is a different struct, which allows the Metal standard library to have an overloaded variant of the sample function for each such option.
To specify, for example, the mipmap level to sample, you'd provide a parameter of level type:
float4 color = myTexture.sample(mySampler, coords, level(1));
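The other lod_options overloads work the same way; a few hedged examples, reusing the myTexture/mySampler/coords names from above (bias and gradient2d are only available inside fragment functions):

float4 withBias  = myTexture.sample(mySampler, coords, bias(0.5));
float4 atLevel2  = myTexture.sample(mySampler, coords, level(2));
float4 withGrads = myTexture.sample(mySampler, coords, gradient2d(dfdx(coords), dfdy(coords)));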
I am trying to digest these two links:
https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview
https://www.khronos.org/opengl/wiki/Vertex_Shader
The pipeline overview says that the vertex shader runs before primitive assembly.
The second one mentions this:
A vertex shader is (usually) invariant with its input. That is, within a single Drawing Command, two vertex shader invocations that get the exact same input attributes will return binary identical results. Because of this, if OpenGL can detect that a vertex shader invocation is being given the same inputs as a previous invocation, it is allowed to reuse the results of the previous invocation, instead of wasting valuable time executing something that it already knows the answer to.
OpenGL implementations generally do not do this by actually comparing the input values (that would take far too long). Instead, this optimization typically only happens when using indexed rendering functions. If a particular index is specified more than once (within the same Instanced Rendering), then this vertex is guaranteed to result in the exact same input data.
Therefore, implementations employ a cache on the results of vertex shaders. If an index/instance pair comes up again, and the result is still in the cache, then the vertex shader is not executed again. Thus, there can be fewer vertex shader invocations than there are vertices specified.
So if I have two quads with two triangles each:
indexed:
verts: { 0 1 2 3 }
tris: { 0 1 2 }
{ 1 2 3 }
soup:
verts: { 0 1 2 3 4 5 }
tris: { 0 1 2 }
{ 3 4 5 }
and perhaps a vertex shader that looks like this:
uniform mat4 mvm;
uniform mat4 pm;
attribute vec3 position;
void main() {
    vec4 res;
    for (int i = 0; i < 256; i++) {
        res = pm * mvm * vec4(position, 1.);
    }
    gl_Position = res;
}
Should I care that one has 4 vertices while the other one has 6? Is this even true from GPU to GPU; will one invoke the vertex shader 4 times vs. 6? How is this affected by the cache:
If an index/instance pair comes up again, and the result is still in the cache...
How is the primitive number related to performance here? In both cases I have the same number of primitives.
In the case of a very simple fragment shader, but an expensive vertex shader:
void main(){
gl_FragColor = vec4(1.);
}
And a tessellated quad (100x100 segments), can I say that the indexed version will run faster, or can run faster, or can nothing be said?
Like everything with GPUs, according to the spec you can say nothing; it's up to the driver and GPU. In reality, though, in your example 4 vertices will run faster than 6 pretty much everywhere.
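For reference, a rough sketch of how the two layouts above would typically be submitted (buffer creation and attribute setup omitted; index type and exact calls may differ in your setup):

// Indexed: 4 unique vertices plus the 6 indices { 0,1,2, 1,2,3 } in a bound
// GL_ELEMENT_ARRAY_BUFFER; the post-transform cache can reuse vertices 1 and 2.
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

// Soup: 6 independent vertices, no index buffer; every vertex gets shaded.
glDrawArrays(GL_TRIANGLES, 0, 6);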
Search for "vertex order optimization" and lots of articles come up:
Linear-Speed Vertex Cache Optimisation
Triangle Order Optimization
AMD Triangle Order Optimization Tool
Triangle Order Optimization for Graphics Hardware Computation Culling
Unrelated, but another example of the spec vs. reality is that according to the spec, depth testing happens AFTER the fragment shader runs (otherwise you couldn't set gl_FragDepth in the fragment shader). In reality, though, as long as the results are the same, the driver/GPU can do whatever it wants, so fragment shaders that don't set gl_FragDepth or discard certain fragments are depth tested first and only run if the test passes.
Metal supports kernel functions in addition to the standard vertex and fragment functions. I found a Metal kernel example that converts an image to grayscale.
What exactly is the difference between doing this in a kernel vs fragment? What can a compute kernel do (better) that a fragment shader can't and vice versa?
Metal has four different types of command encoders:
MTLRenderCommandEncoder
MTLComputeCommandEncoder
MTLBlitCommandEncoder
MTLParallelRenderCommandEncoder
If you're just doing graphics programming, you're probably most familiar with the MTLRenderCommandEncoder. That is where you set up your vertex and fragment shaders. It is optimized to deal with a lot of draw calls and object primitives.
The kernel shaders are primarily used with the MTLComputeCommandEncoder. I think the reason a kernel shader and a compute encoder were used for the image processing example is that you're not drawing any primitives, as you would be with the render command encoder. Even though both methods are working with graphics, in this instance it's simply modifying color data on a texture rather than calculating the depth of multiple objects on screen.
The compute command encoder is also more easily set up to do parallel computing using threads:
https://developer.apple.com/reference/metal/mtlcomputecommandencoder
So if your application wanted to utilize multithreading for data modification, it's easier to do that in this command encoder than in the render command encoder.
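For illustration, a minimal grayscale compute kernel sketch of the kind described above (the texture binding indices and names are assumptions):

#include <metal_stdlib>
using namespace metal;

kernel void grayscaleKernel(texture2d<float, access::read>  inTexture  [[texture(0)]],
                            texture2d<float, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
{
    // Skip threads that fall outside the texture.
    if (gid.x >= outTexture.get_width() || gid.y >= outTexture.get_height())
        return;

    float4 color = inTexture.read(gid);
    // Rec. 709 luma weights.
    float gray = dot(color.rgb, float3(0.2126, 0.7152, 0.0722));
    outTexture.write(float4(gray, gray, gray, color.a), gid);
}

You would dispatch this from a MTLComputeCommandEncoder with one thread per pixel, which is exactly the kind of parallelism the compute path is built for.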
I am working on doing heavy linear algebra on the GPU on iOS, so I am trying out OpenGL ES 3.0. I started from Bartosz Ciechanowski's toy code to work out some general large-by-small matrix multiplication. I split the large one into vectors and use a set of uniform vectors to represent the smaller matrix. I have managed to get data into and out of the vertex shader, but cannot get the uniforms right, as they always look empty. Here is what I did:
In the vertex shader, I just copy over what I sent in the uniform vector:
uniform vec3 transMat0;
uniform vec3 transMat1;
in vec2 InV0;
out vec2 OutV0;
void main(){
OutV0 = vec2(transMat0.x,transMat1.y);
// OutV0 = InV0+vec2(1.0,2.0); //this works
}
In the ObjC code, I declared the buffer content as global variables:
static GLfloat mat0Data[] = {1000.0,100.0,10.0,1.0,0.1,0.01};//outside methods
static GLfloat mat1Data[] = {1000.0,100.0,10.0,1.0,0.1,0.01};//outside methods
And where I compile the shader and generate the program handle, I upload the uniform data:
GLuint mat0 = glGetUniformLocation(program, "transMat0");
GLuint mat1 = glGetUniformLocation(program, "transMat1");
glUniform3fv(mat0, 1, &(mat0Data[0]));
glUniform3fv(mat1, 1, &(mat1Data[0]));
The output buffer just reads all 0. Just to prove everything else worked, I can get the correct result from the transform feedback when I use the commented line:
OutV0 = InV0+vec2(1.0,2.0); //this works
So I have to doubt I missed something with the uniform vectors. Any suggestions?
As Reto pointed out in the comments, I missed calling glUseProgram() before setting the uniforms. With that added, I get everything correct now.
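For anyone hitting the same thing, the working order looks roughly like this (assuming program has already been compiled and linked):

glUseProgram(program); // the program must be current before any glUniform* calls
GLint mat0 = glGetUniformLocation(program, "transMat0");
GLint mat1 = glGetUniformLocation(program, "transMat1");
glUniform3fv(mat0, 1, mat0Data);
glUniform3fv(mat1, 1, mat1Data);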
I have a 2D texture formatted as DXGI_FORMAT_R32_FLOAT. In my pixel shader I sample from it thusly:
float sample = texture.Sample(sampler, coordinates);
This results in the following compiler warning:
warning X3206: implicit truncation of vector type
I'm confused by this. Shouldn't Sample return a single channel, and therefore a scalar value, as opposed to a vector?
I'm using shader model 4 level 9_1.
Either declare your texture as having one channel, or specify which channel you want. Without the <float> part, the compiler assumes a four-channel texture, so Sample returns a float4.
Texture2D<float> texture;
or
float sample = texture.Sample(sampler, coordinates).r;
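Putting it together, a small sketch with an explicitly typed texture (the register slots and names here are illustrative):

Texture2D<float> heightTexture : register(t0);
SamplerState linearSampler : register(s0);

float4 main(float2 uv : TEXCOORD0) : SV_Target
{
    float height = heightTexture.Sample(linearSampler, uv); // scalar, no truncation warning
    return float4(height, height, height, 1.0);
}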