I'm making a sprite batcher that can handle more than one texture per batch. Each sprite's data is stored in one large uniforms buffer, which is sent to the GPU once the batch is full. My first attempt assumed 16 bound textures, some of which might go unused; based on the textureID passed through the instance uniforms, the fragment shader would pick the right texture to sample. This yielded roughly 60 fps with 800-1000 sprites on an iPhone 5s. I then tested with a single texture and got a satisfying 2000 sprites at 60 fps. Since I still need to be able to swap textures, I decided to use texture arrays and bind one texture with 16 slices. If I render using texture index 0, the fps is just as it was with the single texture. As soon as I sample from any other slice, however, performance drops massively.
Here is the shader:
struct VertexIn {
    packed_float2 position [[ attribute(0) ]];
    packed_float2 texCoord [[ attribute(1) ]];
};

struct VertexOut {
    float4 position [[ position ]];
    float2 texCoord;
    uint iid;
};

struct InstanceUniforms {
    float3x2 transformMatrix;
    float2 uv;
    float2 uvLengths;
    float textureID;
};

vertex VertexOut spriteVertexShader(const device VertexIn *vertex_array [[ buffer(0) ]],
                                    const device InstanceUniforms *instancedUniforms [[ buffer(1) ]],
                                    uint vid [[ vertex_id ]],
                                    uint iid [[ instance_id ]]) {
    VertexIn vertexIn = vertex_array[vid];
    InstanceUniforms instanceUniforms = instancedUniforms[iid];

    VertexOut vertexOut;
    vertexOut.position = float4(instanceUniforms.transformMatrix * float3(vertexIn.position, 1.0), 0.0, 1.0);
    vertexOut.texCoord = instanceUniforms.uv + vertexIn.texCoord * instanceUniforms.uvLengths;
    vertexOut.iid = iid;
    return vertexOut;
}

fragment float4 spriteFragmentShader(VertexOut interpolated [[ stage_in ]],
                                     const device InstanceUniforms *instancedUniforms [[ buffer(0) ]],
                                     texture2d_array<float> tex [[ texture(0) ]],
                                     sampler sampler2D [[ sampler(0) ]],
                                     float4 dst_color [[ color(0) ]]) {
    InstanceUniforms instanceUniforms = instancedUniforms[interpolated.iid];
    float2 texCoord = interpolated.texCoord;
    return tex.sample(sampler2D, texCoord, uint(instanceUniforms.textureID));
}
Everything is working exactly as expected until I use a texture slice greater than 0. I am using instanced rendering. All sprites share the same vertex and index buffer.
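For reference, the comparison described above comes down to which slice index reaches the sample call. The snippet below is only an illustrative sketch reusing the VertexOut and InstanceUniforms structs from the shader above; the function names are made up, and whether the fast test hard-coded slice 0 or simply stored 0 in every instance's uniforms is not stated here.

// Fast case observed: every fragment samples slice 0 of the array texture.
fragment float4 spriteFragmentSliceZero(VertexOut interpolated [[ stage_in ]],
                                        texture2d_array<float> tex [[ texture(0) ]],
                                        sampler sampler2D [[ sampler(0) ]]) {
    return tex.sample(sampler2D, interpolated.texCoord, 0u);
}

// Slow case observed: the slice index comes from the per-instance uniforms.
fragment float4 spriteFragmentSliceFromInstance(VertexOut interpolated [[ stage_in ]],
                                                const device InstanceUniforms *instancedUniforms [[ buffer(0) ]],
                                                texture2d_array<float> tex [[ texture(0) ]],
                                                sampler sampler2D [[ sampler(0) ]]) {
    uint slice = uint(instancedUniforms[interpolated.iid].textureID);
    return tex.sample(sampler2D, interpolated.texCoord, slice);
}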
I am trying to add a smudge effect to my paint brush project. To achieve that, I think I need to sample the current results (which are in paintedTexture) at the brush stroke's starting coordinates and pass them to the fragment shader.
I have a vertex shader such as:
vertex VertexOut vertex_particle(device Particle *particles [[ buffer(0) ]],
                                 constant RenderParticleParameters *params [[ buffer(1) ]],
                                 texture2d<half> imageTexture [[ texture(0) ]],
                                 texture2d<half> paintedTexture [[ texture(1) ]],
                                 uint instance [[ instance_id ]])
{
    VertexOut out;
And a fragment shader such as:
fragment half4 fragment_particle(VertexOut in [[ stage_in ]],
                                 half4 existingColor [[ color(0) ]],
                                 texture2d<half> brushTexture [[ texture(0) ]],
                                 float2 point [[ point_coord ]]) {
Is it possible to create a clipped texture from the paintedTexture and send it to the fragment shader?
paintedTexture holds the current results that have been painted to the canvas. I would like to create a new texture from paintedTexture covering the same area as the brush texture and pass it to the fragment shader.
The existingColor [[color(0)]] in the fragment shader is of no use since it is the current color, not the color at the beginning of a stroke. If I use existingColor, it's like using transparency (or a transfer mode based on what math is used to combine it with a new color).
If I am barking up the wrong tree, any suggestions on how to achieve a smudging effect with Metal would potentially be acceptable answers.
Update: I tried using a texture2d in the VertexOut struct:
struct VertexOut {
    float4 position [[ position ]];
    float point_size [[ point_size ]];
    texture2d<half> paintedTexture;
};
But it fails to compile with the error:
vertex function has invalid return type 'VertexOut'
It doesn't seem possible to have an array in the VertexOut struct either (which isn't nearly as ideal as a texture, but it could be a path forward):
struct VertexOut {
    float4 position [[ position ]];
    float point_size [[ point_size ]];
    half4 paintedPixels[65536];
};
Gives me the error:
type 'VertexOut' is not valid for attribute 'stage_in'
It's not possible for shaders to create textures. They can fill an existing one, but I don't think that's what you want or need here.
I would expect you could pass paintedTexture to the fragment shader and use the vertex shader only to work out where in that texture to sample. So, just coordinates.
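As a rough illustration of that suggestion, here is a minimal sketch: the vertex shader only computes coordinates, and the fragment shader samples paintedTexture at them. The Particle layout, the smudgeOrigin parameter, the buffer indices, and the blend at the end are all hypothetical, not the asker's actual setup.

#include <metal_stdlib>
using namespace metal;

struct Particle {                      // hypothetical particle layout
    float2 position;                   // normalized device coordinates
};

struct SmudgeVertexOut {
    float4 position   [[ position ]];
    float  point_size [[ point_size ]];
    float2 paintedCoord;               // where in paintedTexture to sample
};

vertex SmudgeVertexOut smudge_vertex(device Particle *particles    [[ buffer(0) ]],
                                     constant float2 &smudgeOrigin [[ buffer(1) ]],   // stroke start, normalized
                                     uint instance                 [[ instance_id ]])
{
    SmudgeVertexOut out;
    out.position   = float4(particles[instance].position, 0.0, 1.0);
    out.point_size = 16.0;
    // The vertex stage only passes on coordinates; the texture itself is bound to the fragment stage.
    out.paintedCoord = smudgeOrigin;
    return out;
}

fragment half4 smudge_fragment(SmudgeVertexOut in              [[ stage_in ]],
                               texture2d<half> brushTexture    [[ texture(0) ]],
                               texture2d<half> paintedTexture  [[ texture(1) ]],
                               float2 point                    [[ point_coord ]])
{
    constexpr sampler s(filter::linear, address::clamp_to_edge);
    // Color that was on the canvas at the start of the stroke.
    half4 pickedUp = paintedTexture.sample(s, in.paintedCoord);
    half4 brush    = brushTexture.sample(s, point);
    // Deposit the picked-up color through the brush's alpha (just one possible smudge blend).
    return half4(pickedUp.rgb, pickedUp.a * brush.a);
}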
I add a stock smoke SCNParticleSystem to a node in a scene in an ARSCNView. This works as expected.
I use ARSCNView.snapshot() to capture the image and process it before drawing in the MTKView draw() method.
I then call removeAllParticleSystems() on the main thread on the node with the particle system and remove the node from the scene with removeFromParent().
I then add other nodes to the scene and eventually the app crashes with the error: validateFunctionArguments:3469: failed assertion 'Vertex Function(uberparticle_vert): missing buffer binding at index 19 for vertexBuffer.1[0].'
The All Exceptions breakpoint often stops on the ARSCNView.snapshot() call.
Why is this crashing?
What does the error mean?
How should I be adding and removing particle systems from scenes in an ARSCNView?
UPDATE:
I hooked up the MTKView subclass I use from here to a working ARKit demo with a particle system, and the same vertex function crash occurs.
Does that mean the issue is with the passthrough vertex shader function?
Why are the particle systems treated differently?
Below are the shader functions. Thanks.
#include <metal_stdlib>
using namespace metal;

// Vertex input/output structure for passing results from vertex shader to fragment shader
struct VertexIO
{
    float4 position [[ position ]];
    float2 textureCoord [[ user(texturecoord) ]];
};

// Vertex shader for a textured quad
vertex VertexIO vertexPassThrough(device packed_float4 *pPosition [[ buffer(0) ]],
                                  device packed_float2 *pTexCoords [[ buffer(1) ]],
                                  uint vid [[ vertex_id ]])
{
    VertexIO outVertex;
    outVertex.position = pPosition[vid];
    outVertex.textureCoord = pTexCoords[vid];
    return outVertex;
}

// Fragment shader for a textured quad
fragment half4 fragmentPassThrough(VertexIO inputFragment [[ stage_in ]],
                                   texture2d<half> inputTexture [[ texture(0) ]],
                                   sampler samplr [[ sampler(0) ]])
{
    return inputTexture.sample(samplr, inputFragment.textureCoord);
}
I'd like to save a depth buffer to a texture in Metal, but nothing I've tried seems to work.
_renderPassDesc.colorAttachments[1].clearColor = MTLClearColorMake(0.f, 0.f, 0.f, 1.f);
[self createTextureFor:_renderPassDesc.colorAttachments[1]
                  size:screenSize
            withDevice:_device
                format:MTLPixelFormatRGBA16Float];

_renderPassDesc.depthAttachment.loadAction = MTLLoadActionClear;
_renderPassDesc.depthAttachment.storeAction = MTLStoreActionStore;
_renderPassDesc.depthAttachment.texture = self.depthTexture;
_renderPassDesc.depthAttachment.clearDepth = 1.0;
When I pass depthTexture into my shader (which works fine with data from my other textures), all I get is red pixels.
As I change clearDepth to values closer to zero, I get darker shades of red. Perhaps I'm somehow not sampling the texture correctly in my shader?
fragment float4 cubeFrag(ColorInOut in [[ stage_in ]],
                         texture2d<float> albedo [[ texture(0) ]],
                         texture2d<float> normals [[ texture(1) ]],
                         texture2d<float> albedo2 [[ texture(2) ]],
                         texture2d<float> normals2 [[ texture(3) ]],
                         texture2d<float> lightData [[ texture(4) ]],
                         texture2d<float> depth [[ texture(5) ]])
{
    constexpr sampler texSampler(min_filter::linear, mag_filter::linear);
    return depth.sample(texSampler, in.texCoord).rgba;
}
Use depth2d<float> instead of texture2d<float> as the argument type, and read a single float from the depth texture: float val = depth.sample(texSampler, in.texCoord);
OK, it turns out that I just needed to use depth2d instead of texture2d:
depth2d<float> depth [[ texture(5) ]])
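Putting that fix back into the original fragment shader gives roughly the following. This is a sketch based on the answer above; returning the depth value as a grayscale color is just one way to visualize it.

fragment float4 cubeFrag(ColorInOut in [[ stage_in ]],
                         texture2d<float> albedo [[ texture(0) ]],
                         texture2d<float> normals [[ texture(1) ]],
                         texture2d<float> albedo2 [[ texture(2) ]],
                         texture2d<float> normals2 [[ texture(3) ]],
                         texture2d<float> lightData [[ texture(4) ]],
                         depth2d<float> depth [[ texture(5) ]])   // depth2d, not texture2d
{
    constexpr sampler texSampler(min_filter::linear, mag_filter::linear);
    // Sampling a depth2d returns a single float rather than a float4.
    float val = depth.sample(texSampler, in.texCoord);
    return float4(val, val, val, 1.0);
}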
In iOS 8 there was a problem with the division of floats in Metal preventing proper texture projection, which I solved.
Today I discovered that the texture projection on iOS 9 is broken again, although I'm not sure why.
The results of warping a texture on the CPU (with OpenCV) and on the GPU are not the same. You can see it on your iPhone if you run this example app (which already includes the fix for iOS 8) on iOS 9.
The expected CPU warp is colored red, while the GPU warp done by Metal is colored green, so where they overlap they are yellow. Ideally you should not see green or red, but only shades of yellow.
Can you:
confirm the problem exists on your end;
give any advice on anything that might be wrong?
The shader code is:
struct VertexInOut
{
    float4 position [[ position ]];
    float3 warpedTexCoords;
    float3 originalTexCoords;
};

vertex VertexInOut warpVertex(uint vid [[ vertex_id ]],
                              device float4 *positions [[ buffer(0) ]],
                              device float3 *texCoords [[ buffer(1) ]])
{
    VertexInOut v;
    v.position = positions[vid];

    // example homography
    simd::float3x3 h = {
        {1.03140473, 0.0778113901, 0.000169219566},
        {0.0342947133, 1.06025684, 0.000459250761},
        {-0.0364957005, -38.3375587, 0.818259298}
    };

    v.warpedTexCoords = h * texCoords[vid];
    v.originalTexCoords = texCoords[vid];
    return v;
}

fragment half4 warpFragment(VertexInOut inFrag [[ stage_in ]],
                            texture2d<half, access::sample> original [[ texture(0) ]],
                            texture2d<half, access::sample> cpuWarped [[ texture(1) ]])
{
    constexpr sampler s(coord::pixel, filter::linear, address::clamp_to_zero);
    half4 gpuWarpedPixel = half4(original.sample(s, inFrag.warpedTexCoords.xy * (1.0 / inFrag.warpedTexCoords.z)).r, 0, 0, 255);
    half4 cpuWarpedPixel = half4(0, cpuWarped.sample(s, inFrag.originalTexCoords.xy).r, 0, 255);
    return (gpuWarpedPixel + cpuWarpedPixel) * 0.5;
}
Do not ask me why, but if I multiply the warped coordinates by 1.00005 (or any number close to 1.0), it is fixed, apart from very tiny details. See the last commit in the example app repo.
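For concreteness, here is roughly what that workaround looks like when dropped into the fragment shader above. This is a sketch: the answer does not say exactly where the 1.00005 factor is applied (the repo's last commit has the exact change), so applying it to the divided coordinates is an assumption.

fragment half4 warpFragment(VertexInOut inFrag [[ stage_in ]],
                            texture2d<half, access::sample> original [[ texture(0) ]],
                            texture2d<half, access::sample> cpuWarped [[ texture(1) ]])
{
    constexpr sampler s(coord::pixel, filter::linear, address::clamp_to_zero);
    // Scaling the warped coordinates by a factor very close to 1.0 works around the iOS 9 issue.
    float2 gpuCoords = inFrag.warpedTexCoords.xy * (1.0 / inFrag.warpedTexCoords.z) * 1.00005;
    half4 gpuWarpedPixel = half4(original.sample(s, gpuCoords).r, 0, 0, 255);
    half4 cpuWarpedPixel = half4(0, cpuWarped.sample(s, inFrag.originalTexCoords.xy).r, 0, 255);
    return (gpuWarpedPixel + cpuWarpedPixel) * 0.5;
}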
I have a supposedly simple task, but apparently I still don't understand how projections work in shaders. I need to do a 2D perspective transformation on a textured quad (2 triangles), but visually it doesn't look correct (e.g. the trapezoid is slightly taller or more stretched than it is in the CPU version).
I have this struct:
struct VertexInOut
{
    float4 position [[ position ]];
    float3 warp0;
    float3 warp1;
    float3 warp2;
    float3 warp3;
};
And in the vertex shader I do something like (texCoords are pixel coords of the quad corners and homography is calculated in pixel coords):
v.warp0 = texCoords[vid] * homographies[0];
Then in the fragment shader like this:
return intensity.sample(s, inFrag.warp0.xy / inFrag.warp0.z);
The result is not what I expect. I spent hours on this, but I cannot figure it out. (End of venting.)
UPDATE:
This is the code and result for the CPU (i.e. the expected result):
// _image contains the original image
cv::Matx33d h(1.03140473, 0.0778113901, 0.000169219566,
              0.0342947133, 1.06025684, 0.000459250761,
              -0.0364957005, -38.3375587, 0.818259298);
cv::Mat dest(_image.size(), CV_8UC4);
// h is transposed because the simd::float3x3 on the GPU is initialized column by column while cv::Matx33d is filled row by row;
// back-warping (WARP_INVERSE_MAP) is used because that is what the GPU does, so it is better for comparison
cv::warpPerspective(_image, dest, h.t(), _image.size(), cv::WARP_INVERSE_MAP | cv::INTER_LINEAR);
This is the code and result for the GPU (i.e. the wrong result):
// constants passed in buffers, image size 320x240
const simd::float4 quadVertices[4] =
{
    { -1.0f, -1.0f, 0.0f, 1.0f },
    { +1.0f, -1.0f, 0.0f, 1.0f },
    { -1.0f, +1.0f, 0.0f, 1.0f },
    { +1.0f, +1.0f, 0.0f, 1.0f },
};

const simd::float3 textureCoords[4] =
{
    { 0, IMAGE_HEIGHT, 1.0f },
    { IMAGE_WIDTH, IMAGE_HEIGHT, 1.0f },
    { 0, 0, 1.0f },
    { IMAGE_WIDTH, 0, 1.0f },
};
// vertex shader
vertex VertexInOut homographyVertex(uint vid [[ vertex_id ]],
                                    constant float4 *positions [[ buffer(0) ]],
                                    constant float3 *texCoords [[ buffer(1) ]],
                                    constant simd::float3x3 *homographies [[ buffer(2) ]])
{
    VertexInOut v;
    v.position = positions[vid];

    // example homography
    simd::float3x3 h = {
        {1.03140473, 0.0778113901, 0.000169219566},
        {0.0342947133, 1.06025684, 0.000459250761},
        {-0.0364957005, -38.3375587, 0.818259298}
    };

    v.warp = h * texCoords[vid];
    return v;
}
// fragment shader
fragment float4 homographyFragment(VertexInOut inFrag [[ stage_in ]],
                                   texture2d<float, access::sample> intensity [[ texture(1) ]])
{
    constexpr sampler s(coord::pixel, filter::linear, address::clamp_to_zero);
    float4 targetIntensity = intensity.sample(s, inFrag.warp.xy / inFrag.warp.z);
    return targetIntensity;
}
Original image:
UPDATE 2:
Contrary to the common belief that the perspective divide should be done in the fragment shader, I get a much closer result if I divide in the vertex shader (and no distortion or seam between the triangles), but why?
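For reference, moving the divide into the vertex shader as described would look roughly like this. This is a sketch: the "Divided" names and the warpDivided field are made up for illustration, not the asker's actual code.

struct VertexInOutDivided {
    float4 position [[ position ]];
    float2 warpDivided;   // perspective divide already applied per vertex
};

vertex VertexInOutDivided homographyVertexDivided(uint vid [[ vertex_id ]],
                                                  constant float4 *positions [[ buffer(0) ]],
                                                  constant float3 *texCoords [[ buffer(1) ]],
                                                  constant float3x3 *homographies [[ buffer(2) ]])
{
    VertexInOutDivided v;
    v.position = positions[vid];
    float3 warped = homographies[0] * texCoords[vid];
    v.warpDivided = warped.xy / warped.z;   // divide here instead of in the fragment shader
    return v;
}

fragment float4 homographyFragmentDivided(VertexInOutDivided inFrag [[ stage_in ]],
                                          texture2d<float, access::sample> intensity [[ texture(1) ]])
{
    constexpr sampler s(coord::pixel, filter::linear, address::clamp_to_zero);
    return intensity.sample(s, inFrag.warpDivided);
}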
UPDATE 3:
I get the same (wrong) result if:
I move the perspective divide to the fragment shader
I simply remove the divide from the code
Very strange, it looks like the divide is not happening.
OK, the solution was of course a very small detail: dividing the .xy of a simd::float3 by its .z behaves erratically in the shader. In fact, if I do the perspective divide in the fragment shader like this:
float4 targetIntensity = intensity.sample(s, inFrag.warp.xy * (1.0 / inFrag.warp.z));
it works!
This led me to find out that multiplying by the precomputed reciprocal is different from dividing by the float directly. The reason for this is still unknown to me; if anyone knows why, we can unravel this mystery.
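To make the difference concrete, the two forms from this post differ only in how the reciprocal enters the sample call. This sketch restates both variants side by side using the same names as the fragment shader above; the comments describe the behaviour reported here, not a general rule.

fragment float4 homographyFragmentCompare(VertexInOut inFrag [[ stage_in ]],
                                          texture2d<float, access::sample> intensity [[ texture(1) ]])
{
    constexpr sampler s(coord::pixel, filter::linear, address::clamp_to_zero);

    // Direct division: gave the wrong result in this post.
    // float4 wrong = intensity.sample(s, inFrag.warp.xy / inFrag.warp.z);

    // Dividing once into a scalar and then multiplying: the form that worked.
    return intensity.sample(s, inFrag.warp.xy * (1.0 / inFrag.warp.z));
}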