Memory leak when I use custom compute shaders in Metal - iOS

I'm trying to apply live camera filters through Metal, using the default MPSKernel filters provided by Apple together with custom compute shaders.
I'm applying the filters to a collection view in a grid, combining the default and the custom kernel functions.
It looks like the effect in Apple's Clips app.
What I observed is that the custom filters leak a lot of memory compared to the default kernel functions provided by Apple.
I don't know what mistake I've made, if any.
Here is my custom compute shader:
kernel void customFunction1(
    texture2d<float, access::read> inTexture [[texture(0)]],
    texture2d<float, access::write> outTexture [[texture(1)]],
    uint2 gid [[thread_position_in_grid]])
{
    const float4 colorAtPixel = inTexture.read(gid);
    const float4 outputColor = float4(colorAtPixel.r, colorAtPixel.g, colorAtPixel.b, 1);
    outTexture.write(outputColor, gid);
}
Here is the code that creates my pipeline and dispatches the work through threadgroups:
let blur = MPSImageGaussianBlur(device: device, sigma: 0)
let threadsPerThreadgroup = MTLSizeMake(4, 4, 1)
let threadgroupsPerGrid = MTLSizeMake(destinationTexture.width / threadsPerThreadgroup.width,
                                      destinationTexture.height / threadsPerThreadgroup.height, 1)

let commandEncoder = commandBuffer.makeComputeCommandEncoder()
commandEncoder.setComputePipelineState(pipelineState!)
commandEncoder.setTexture(sourceTexture, at: 0)
commandEncoder.setTexture(destinationTexture, at: 1)
commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
commandEncoder.endEncoding()

autoreleasepool {
    let inPlaceTexture = UnsafeMutablePointer<MTLTexture>.allocate(capacity: 1)
    inPlaceTexture.initialize(to: destinationTexture)
    blur.encode(commandBuffer: commandBuffer, inPlaceTexture: inPlaceTexture, fallbackCopyAllocator: nil)
}
The pipeline state with the custom shader is created like this:
let defaultLibrary = device.newDefaultLibrary()
let kernelFunction = defaultLibrary!.makeFunction(name: name)
do {
    pipelineState = try device.makeComputePipelineState(function: kernelFunction!)
} catch {
    fatalError("Unable to create pipeline state")
}
Instruments shows a leak in some Malloc 16 bytes allocations and in the [MTKView draw] method.
The screenshot is shown below.
I'd like help finding where the issue is coming from and how to fix it.
Thanks.

There's no reason to explicitly allocate an UnsafeMutablePointer to store the in-place texture parameter. Incidentally, that's the source of your leak: you allocate the pointer and then never deallocate it.
Use a local variable to pass the texture instead:
var inPlaceTexture = destinationTexture
blur.encode(commandBuffer: commandBuffer, inPlaceTexture: &inPlaceTexture, fallbackCopyAllocator: nil)
By the way, you're (eventually) going to have a bad time if you call the in-place encode method without providing a fallback allocator or checking the return value. In-place encoding will fail in certain situations, so you should provide a closure that allocates an appropriate texture in the event of failure.
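For example, a minimal fallback allocator might look like this (a sketch; it assumes a plain 2D texture and force-unwraps the allocation for brevity):
let fallbackAllocator: MPSCopyAllocator = { kernel, commandBuffer, sourceTexture in
    // Build a texture matching the source that the kernel can read from and write to
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: sourceTexture.pixelFormat,
        width: sourceTexture.width,
        height: sourceTexture.height,
        mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]
    return commandBuffer.device.makeTexture(descriptor: descriptor)!
}
blur.encode(commandBuffer: commandBuffer, inPlaceTexture: &inPlaceTexture, fallbackCopyAllocator: fallbackAllocator)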

Related

Metal defaultLibrary does not load .metal functions

My Metal default library does not contain the vertex and fragment functions from the .metal file in the same directory.
library.makeFunction(name:) then returns nil for both the vertex and fragment functions that should be assigned to the pipeline descriptor.
The .metal file and headers are copied from the Apple sample app "BasicTexturing" (Creating and Sampling Textures).
The files AAPLShaders.metal and AAPLShaderTypes.h contain vertexShader and samplingShader functions that are loaded by AAPLRenderer.m.
In the sample it's really straightforward:
id<MTLLibrary> defaultLibrary = [_device newDefaultLibrary];
id<MTLFunction> vertexFunction = [defaultLibrary newFunctionWithName:@"vertexShader"];
id<MTLFunction> fragmentFunction = [defaultLibrary newFunctionWithName:@"samplingShader"];
I have copied these files into a Ray Wenderlich Swift tutorial project and used the Swift version.
There is an init that sets the library:
Renderer.library = device.makeDefaultLibrary()
then:
let library = Renderer.library
let importVertexFunction = library?.makeFunction(name: "vertexShader")
let importShaderFunction = library?.makeFunction(name: "samplingShader")
This works just fine!
I do the same thing in my app with the same files copied over, and it does not load the functions.
I have checked Compile Sources in the build settings; it lists the .metal file.
I have compared everything in the settings and don't see a difference between the working apps and my app.
I don't see any error messages or log messages that would indicate a syntax or path problem.
Any ideas?
The Apple sample code AAPLShaders.metal
/*
See LICENSE folder for this sample’s licensing information.
Abstract:
Metal shaders used for this sample
*/
#include <metal_stdlib>
#include <simd/simd.h>
using namespace metal;
// Include header shared between this Metal shader code and C code executing Metal API commands
#import "AAPLShaderTypes.h"
// Vertex shader outputs and per-fragment inputs. Includes clip-space position and vertex outputs
// interpolated by rasterizer and fed to each fragment generated by clip-space primitives.
typedef struct
{
    // The [[position]] attribute qualifier of this member indicates this value is the clip space
    // position of the vertex when this structure is returned from the vertex shader
    float4 clipSpacePosition [[position]];

    // Since this member does not have a special attribute qualifier, the rasterizer will
    // interpolate its value with values of other vertices making up the triangle and
    // pass that interpolated value to the fragment shader for each fragment in that triangle
    float2 textureCoordinate;
} RasterizerData;
// Vertex Function
vertex RasterizerData
vertexShader(uint vertexID [[ vertex_id ]],
             constant AAPLVertex *vertexArray [[ buffer(AAPLVertexInputIndexVertices) ]],
             constant vector_uint2 *viewportSizePointer [[ buffer(AAPLVertexInputIndexViewportSize) ]])
{
    RasterizerData out;

    // Index into our array of positions to get the current vertex
    // Our positions are specified in pixel dimensions (i.e. a value of 100 is 100 pixels from
    // the origin)
    float2 pixelSpacePosition = vertexArray[vertexID].position.xy;

    // Get the size of the drawable so that we can convert to normalized device coordinates
    float2 viewportSize = float2(*viewportSizePointer);

    // The output position of every vertex shader is in clip space (also known as normalized device
    // coordinate space, or NDC). A value of (-1.0, -1.0) in clip space represents the
    // lower-left corner of the viewport whereas (1.0, 1.0) represents the upper-right corner of
    // the viewport.
    // In order to convert from positions in pixel space to positions in clip space we divide the
    // pixel coordinates by half the size of the viewport.
    out.clipSpacePosition.xy = pixelSpacePosition / (viewportSize / 2.0);

    // Set the z component of our clip space position to 0 (since we're only rendering in
    // 2 dimensions for this sample)
    out.clipSpacePosition.z = 0.0;

    // Set the w component to 1.0 since we don't need a perspective divide, which is also not
    // necessary when rendering in 2 dimensions
    out.clipSpacePosition.w = 1.0;

    // Pass our input textureCoordinate straight to our output RasterizerData. This value will be
    // interpolated with the other textureCoordinate values in the vertices that make up the
    // triangle.
    out.textureCoordinate = vertexArray[vertexID].textureCoordinate;

    return out;
}
// Fragment function
fragment float4
samplingShader(RasterizerData in [[stage_in]],
               texture2d<half> colorTexture [[ texture(AAPLTextureIndexBaseColor) ]])
{
    constexpr sampler textureSampler (mag_filter::linear,
                                      min_filter::linear);

    // Sample the texture to obtain a color
    const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);

    // We return the color of the texture
    return float4(colorSample);
}
The Apple Sample code header AAPLShaderTypes.h
/*
See LICENSE folder for this sample’s licensing information.
Abstract:
Header containing types and enum constants shared between Metal shaders and C/ObjC source
*/
#ifndef AAPLShaderTypes_h
#define AAPLShaderTypes_h
#include <simd/simd.h>
// Buffer index values shared between shader and C code to ensure Metal shader buffer inputs match
// Metal API buffer set calls
typedef enum AAPLVertexInputIndex
{
    AAPLVertexInputIndexVertices     = 0,
    AAPLVertexInputIndexViewportSize = 1,
} AAPLVertexInputIndex;
// Texture index values shared between shader and C code to ensure Metal shader buffer inputs match
// Metal API texture set calls
typedef enum AAPLTextureIndex
{
    AAPLTextureIndexBaseColor = 0,
} AAPLTextureIndex;
// This structure defines the layout of each vertex in the array of vertices set as an input to our
// Metal vertex shader. Since this header is shared between our .metal shader and C code,
// we can be sure that the layout of the vertex array in the code matches the layout that
// our vertex shader expects
typedef struct
{
    // Positions in pixel space (i.e. a value of 100 indicates 100 pixels from the origin/center)
    vector_float2 position;

    // 2D texture coordinate
    vector_float2 textureCoordinate;
} AAPLVertex;
#endif /* AAPLShaderTypes_h */
Debug print of my library
Printing description of self.library:
(MTLLibrary?) library = (object = 0x00006000004af7b0) {
object = 0x00006000004af7b0 {
baseNSObject#0 = {
isa = CaptureMTLLibrary
}
Debug print of working library from RayWenderlich sample app
The newly added samplingShader and vertexShader are shown in the library along with the existing fragment and vertex functions.
▿ Optional<MTLLibrary>
- some : <CaptureMTLLibrary: 0x600000f54210> -> <MTLDebugLibrary: 0x600002204050> -> <_MTLLibrary: 0x600001460280>
label = <none>
device = <MTLSimDevice: 0x15a5069d0>
name = Apple iOS simulator GPU
functionNames: fragment_main vertex_main samplingShader vertexShader
Did you check the target membership of the file? There's nothing weird in your code, so please check the target.
Answer: the issue of functions not loading into the Metal library was resolved by removing a leftover -fcikernel flag in the Other Metal Compiler Flags option of the project target's Build Settings.
The flag was set when testing a CoreImageKernel.metal as documented in https://developer.apple.com/documentation/coreimage/cikernel/2880194-init
I removed the kernel definition file from the app but missed the compiler flag, and missed it again when visually comparing build settings.
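A quick diagnostic for this kind of problem is to dump the function names the default library actually contains; functionNames is a standard MTLLibrary property. A minimal sketch:
if let library = device.makeDefaultLibrary() {
    // If a shader is missing here, its .metal file wasn't compiled into default.metallib
    print(library.functionNames)
}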

Metal Shading language for Core Image color kernel, how to pass an array of float3

I'm trying to port some CIFilters from this source by using the Metal shading language for Core Image.
I have a color palette composed of an array of RGB structs, and I want to pass it as an argument to a custom CI color image kernel.
The RGB structs are converted into an array of SIMD3<Float>:
static func SIMD3Palette(_ palette: [RGB]) -> [SIMD3<Float>] {
    return palette.map { $0.toFloat3() }
}
The kernel should take an array of simd_float3 values; the problem is that when I run the filter, it tells me that the argument at index 1 is expecting an NSData.
override var outputImage: CIImage? {
    guard let inputImage = inputImage else {
        return nil
    }
    let palette = EightBitColorFilter.palettes[Int(inputPaletteIndex)]
    let extent = inputImage.extent
    let arguments = [inputImage, palette, Float(palette.count)] as [Any]
    let final = colorKernel.apply(extent: extent, arguments: arguments)
    return final
}
This is the kernel:
float4 eight_bit(sample_t image, simd_float3 palette[], float paletteSize, destination dest) {
    float dist = distance(image.rgb, palette[0]);
    float3 returnColor = palette[0];
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(image.rgb, palette[i]);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette[i];
        }
    }
    return float4(returnColor, 1);
}
I'm wondering how I can pass a data buffer to the kernel, since converting the array into NSData doesn't seem to be enough. The examples I've seen use the "full" shading language, which is not available for Core Image; Core Image only supports a subset that deals with fragments.
Update
We have now figured out how to pass data buffers directly into Core Image kernels. Using a CIImage as described below is not needed, but still possible.
Assuming that you have your raw data as an NSData, you can just pass it to the kernel on invocation:
kernel.apply(..., arguments: [data, ...])
Note: Data might also work, but I know that NSData is an argument type that allows Core Image to cache filter results based on input arguments. So when in doubt, it's better to cast to NSData.
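For the palette in the question, that conversion could look like this (a sketch; it assumes the kernel expects the 16-byte stride that Swift uses for SIMD3<Float>):
let simd3Palette = EightBitColorFilter.SIMD3Palette(palette)
// Data(buffer:) copies count * stride bytes out of the array's storage
let data = simd3Palette.withUnsafeBufferPointer { Data(buffer: $0) } as NSData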
Then in the kernel function, you only need to declare the parameter with an appropriate constant type:
extern "C" float4 myKernel(constant float3 data[], ...) {
float3 data0 = data[0];
// ...
}
Previous Answer
Core Image kernels don't seem to support pointer or array parameter types, though something seems to be coming with iOS 13. From the release notes:
Metal CIKernel instances support arguments with arbitrarily structured data.
But, as so often with Core Image, there seems to be no further documentation for it…
However, you can still use the "old way" of passing buffer data by wrapping it in a CIImage and sampling it in the kernel. For example:
let array: [Float] = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
let data = array.withUnsafeBufferPointer { Data(buffer: $0) }
let dataImage = CIImage(bitmapData: data, bytesPerRow: data.count, size: CGSize(width: array.count/4, height: 1), format: .RGBAf, colorSpace: nil)
Note that there is no CIFormat for 3-channel images since GPUs don't support those. So you either have to use single-channel .Rf and re-pack the values into a float3 inside your kernel, or add some padding to your data and use .RGBAf with float4 (which I'd recommend, since it reduces texture fetches).
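For example, the padding might be added like this before building the data image (a sketch, assuming the palette is available as a [SIMD3<Float>]):
// Pad each RGB triple with an alpha of 1.0 so the layout matches .RGBAf
let padded: [Float] = palette.flatMap { [$0.x, $0.y, $0.z, 1.0] }
let data = padded.withUnsafeBufferPointer { Data(buffer: $0) }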
When you pass that image into your kernel, you probably want to set the sampling mode to nearest, otherwise you might get interpolated values when sampling between two pixels:
kernel.apply(..., arguments: [dataImage.samplingNearest(), ...])
In your (Metal) kernel, you can access the data as you would a normal input image, via a sampler:
extern "C" float4 myKernel(coreimage::sampler data, ...) {
float4 data0 = data.sample(data.transform(float2(0.5, 0.5))); // data[0]
float4 data1 = data.sample(data.transform(float2(1.5, 0.5))); // data[1]
// ...
}
Note that I added 0.5 to the coordinates so that they point in the middle of a pixel in the data image to avoid ambiguity and interpolation.
Also note that pixel values you get from a sampler always have 4 channels. So even when you create your data image with format .Rf, you'll get a float4 when sampling it (the other channels are filled with 0.0 for G and B, and 1.0 for alpha). In this case, you can just do
float data0 = data.sample(data.transform(float2(0.5, 0.5))).x;
Edit
I previously forgot to transform the sample coordinate from absolute pixel space (where (0.5, 0.5) would be the middle of the first pixel) to relative sampler space (where (0.5, 0.5) would be the middle of the whole buffer). It's fixed now.
I made it work. Even if the answer above was good and also deploys to a lower target, the result wasn't exactly what I was expecting: the difference between the original kernel written as a string and the above method of creating an image to be used as a data source was kind of big.
I didn't figure out exactly why, but the image I was passing as the palette source differed from the created one in size and color (probably due to color spaces).
Since there was no documentation about this statement:
Metal CIKernel instances support arguments with arbitrarily structured data.
I tried a lot in my spare time and came up with this.
First, the shader:
float4 eight_bit_buffer(sampler image, constant simd_float3 palette[], float paletteSize, destination dest) {
    float4 color = image.sample(image.transform(dest.coord()));
    float dist = distance(color.rgb, palette[0]);
    float3 returnColor = palette[0];
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(color.rgb, palette[i]);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette[i];
        }
    }
    return float4(returnColor, 1);
}
Second, the palette transformation into SIMD3<Float>:
static func toSIMD3Buffer(from palette: [RGB]) -> Data {
    var simd3Palette = SIMD3Palette(palette)
    // SIMD3<Float> occupies 16 bytes per element (its stride), not 12
    let byteCount = simd3Palette.count * MemoryLayout<SIMD3<Float>>.stride
    let palettePointer = UnsafeMutableRawPointer.allocate(
        byteCount: byteCount,
        alignment: MemoryLayout<SIMD3<Float>>.alignment)
    simd3Palette.withUnsafeMutableBufferPointer { buffer in
        palettePointer.initializeMemory(as: SIMD3<Float>.self,
                                        from: buffer.baseAddress!,
                                        count: buffer.count)
    }
    // The .free deallocator releases the allocation when the Data is destroyed
    let data = Data(bytesNoCopy: palettePointer, count: byteCount, deallocator: .free)
    return data
}
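Invoking the kernel then mirrors the original outputImage above, with the palette passed as Data (a sketch using the same names as this thread):
let data = EightBitColorFilter.toSIMD3Buffer(from: palette)
let arguments = [inputImage, data, Float(palette.count)] as [Any]
let final = colorKernel.apply(extent: extent, arguments: arguments)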
The first time I tried appending the SIMD3<Float> values to the Data object directly, but that didn't work, probably due to memory alignment (see the 16-byte stride above).
With the .free deallocator, the Data takes care of releasing the allocated memory once it's done with it.
Hope this helps someone else.

Called CAMetalLayer nextDrawable earlier than needed

The GPU frame capture is warning:
your application called CAMetalLayer nextDrawable earlier than needed
I am calling present(drawable) after all my other GPU calls have been made, directly before committing the command buffer:
guard let commandBuffer = commandQueue.makeCommandBuffer(),
      let computeBuffer = commandQueue.makeCommandBuffer(),
      let descriptor = view.currentRenderPassDescriptor else { return }
...
// First run the compute kernel
guard let computeEncoder = computeBuffer.makeComputeCommandEncoder() else { return }
computeEncoder.setComputePipelineState(computePipelineState)
dispatchThreads(particleCount: particleCount)
computeEncoder.endEncoding()
computeBuffer.commit()
// I need to wait here because I need values from a computed buffer first,
// which is also why I am not just using a single command buffer
computeBuffer.waitUntilCompleted()
// Next render the computed particles with a vertex shader to a texture
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor0)!
renderEncoder.setRenderPipelineState(renderPipelineState)
...
renderEncoder.endEncoding()
// Draw the texture (created above) using a vertex shader:
let renderTexture = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)
renderTexture?.setRenderPipelineState(renderCanvasPipelineState)
...
renderTexture?.endEncoding()
// Finally present the drawable and commit the command buffer:
guard let drawable = view.currentDrawable else { return }
commandBuffer.present(drawable)
commandBuffer.commit()
I don't see how it is possible to request the currentDrawable any later. Am I doing something wrong or sub-optimal?
I started looking into this because on most frames (which do basically the same thing) the wait for the current drawable is about 3-10 ms, but occasionally it is 35-45 ms.
I see a number of recommendations to use presentDrawable instead of present, but that does not seem to be an option in Swift.
Is there any way to request the current drawable when it is needed and make the warning go away?
You're calling view.currentRenderPassDescriptor at the top of the code you listed. The render pass descriptor has to have a reference to the drawable's texture as a color attachment, which implies obtaining the drawable and asking for its texture. So, that line requests the drawable.
Don't obtain the render pass descriptor until just before you create the render command encoder (let renderTexture = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)) that uses it.
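A sketch of the reordered frame, reusing the names from the question:
guard let commandBuffer = commandQueue.makeCommandBuffer(),
      let computeBuffer = commandQueue.makeCommandBuffer() else { return }
// ... encode and commit the compute pass, wait, then encode the render-to-texture pass ...

// Only now touch anything drawable-related:
guard let descriptor = view.currentRenderPassDescriptor,
      let drawable = view.currentDrawable,
      let renderTexture = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }
renderTexture.setRenderPipelineState(renderCanvasPipelineState)
// ... encode the final canvas pass ...
renderTexture.endEncoding()

commandBuffer.present(drawable)
commandBuffer.commit()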

"SceneKit: error, missing buffer [-1/2]" when trying to pass uniforms to a metal shader

I've attached an SCNProgram to a SceneKit geometry, and I'm trying to pass uniforms to the fragment shader. In this simple code snippet I just pass the output color to the fragment shader as a uniform, which returns it as an output value.
I've already tested the shaders and they work, in the sense that I can successfully rotate an object in the vertex shader, draw an object in a different color in the fragment shader, etc. The problem is passing the uniforms. This is my fragment shader:
struct Uniforms
{
    float4 color;
};

fragment float4 myFragment(MyVertexOutput in [[ stage_in ]],
                           constant Uniforms& uniforms [[ buffer(2) ]])
{
    return uniforms.color;
}
And this is how I try to pass the uniforms in my SceneKit+Swift code:
SCNTransaction.begin()
cube.geometry?.setValue(NSValue(SCNVector4:SCNVector4(0.0,1.0,0.0,1.0)), forKey: "uniforms.color")
SCNTransaction.commit()
But my object (it's a cube) is not even drawn (it's black), and I get this error:
2016-04-01 01:00:34.485 Shaded Cube[30266:12687154] SceneKit: error, missing buffer [-1/0]
EDIT
I tried to follow @lock's suggestions, but I'm still getting the same error. This is the full project repository: https://github.com/ramy89/Shaded-Cube.git
Your shader code looks fine, and the way you're passing the uniform in is close to working.
The name of the shader uniform must match the one used in setValue. In this case your Metal variable name is uniforms, and the key you use in setValue is uniforms.color. These don't match, hence the error you see. I suggest changing the Swift code to simply use uniforms in the setValue call.
Next you need to ensure the data passed as the value to setValue has the same layout in both Metal and Swift. An SCNVector4 is a struct of four 64-bit doubles (CGFloats), whereas Metal's float4 is a struct of four 32-bit floats. The documentation seems to indicate you can pass an SCNVector in an NSValue, but I've never gotten that to work.
In your Swift code I'd create a struct to contain the uniforms you want to pass in. There's not a lot of documentation on the vector_float4 struct, but it matches Metal's float4 struct in that it is four 32-bit floats.
struct Uniforms {
    var color: vector_float4
}
Pass this into the setValue function as NSData.
let myColor = vector_float4(0,1,0,1)
var myUniforms = Uniforms(color:myColor)
var myData = NSData(bytes:&myUniforms, length:sizeof(Uniforms))
cube.geometry?.setValue(myData, forKey: "uniforms")
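If you're on Swift 3 or later, sizeof is gone; the same code would read (a sketch):
let myColor = vector_float4(0, 1, 0, 1)
var myUniforms = Uniforms(color: myColor)
// MemoryLayout<Uniforms>.size replaces sizeof(Uniforms)
let myData = NSData(bytes: &myUniforms, length: MemoryLayout<Uniforms>.size)
cube.geometry?.setValue(myData, forKey: "uniforms")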
I'm not sure you need the SCNTransaction calls; I've not needed them in the past, and they could be costly performance-wise.
Update
After looking at Ramy's code, it seems there is an issue with setValue and SCNProgram when applied to the geometry. I can't tell you why, but setting the custom shader on the material seems to fix it, e.g.:
func makeShaders()
{
    let program = SCNProgram()
    program.vertexFunctionName = "myVertex"
    program.fragmentFunctionName = "myFragment"
    //cube.geometry?.program = program
    cube.geometry?.firstMaterial?.program = program

    var uniforms = Uniforms(color: vector_float4(0.0, 1.0, 0.0, 1.0))
    let uniformsData = NSData(bytes: &uniforms, length: sizeof(Uniforms))
    //cube.geometry?.setValue(uniformsData, forKey: "uniforms")
    cube.geometry?.firstMaterial?.setValue(uniformsData, forKey: "uniforms")
}

How do you get normalized device coordinates into Apple's Metal kernel functions?

I have a kernel function in Metal that I pass a texture to so that I can perform some operations on the image. I'm passing in uint2 gid [[thread_position_in_grid]], which gives me the pixel coordinates as integers.
To get the normalized device coordinates, I can do some simple math on gid.x and gid.y along with my texture width and height. Is this the best way to do it? Is there a better way?
Your approach is a good one. If you don't want to query the texture dimensions inside the kernel function or create a buffer just to pass them in, you can use the -[MTLComputeCommandEncoder setBytes:length:atIndex:] method to bind the texture dimensions in a "temporary" buffer of sorts that is handled by Metal:
[computeEncoder setBytes:&dimensions length:sizeof(dimensions) atIndex:0]
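In Swift, the same binding would look something like this (a sketch; the SIMD2<UInt32> type is an assumption and must match the uint2 your kernel declares):
var dimensions = SIMD2<UInt32>(UInt32(texture.width), UInt32(texture.height))
// setBytes copies the data into a small transient buffer managed by Metal
computeEncoder.setBytes(&dimensions, length: MemoryLayout<SIMD2<UInt32>>.size, index: 0)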
I think you're right, and it's good to use the same approach usually applied in GLSL:
Compute the texel size (note the casts to float; with integer division, 1/width would be 0):
float2 texSize = float2(1.0 / float(outTexture.get_width()),
                        1.0 / float(outTexture.get_height()));
then use it to get the normalized pixel position:
constexpr sampler s(address::clamp_to_edge, filter::linear, coord::normalized);
//
// something to do...
//
float4 color = inTexture.sample(s, float2(gid) * texSize);
//
// something to do with the pixel
//
outTexture.write(color, gid);
The method specified in the question works well. But for completeness, an alternate way to read from textures using non-normalized (and/or normalized) device coordinates would be to use samplers.
Create a sampler:
id<MTLSamplerState> GetSamplerState()
{
    MTLSamplerDescriptor *desc = [[[MTLSamplerDescriptor alloc] init] autorelease];
    desc.minFilter = MTLSamplerMinMagFilterNearest;
    desc.magFilter = MTLSamplerMinMagFilterNearest;
    desc.mipFilter = MTLSamplerMipFilterNotMipmapped;
    desc.maxAnisotropy = 1;
    desc.sAddressMode = MTLSamplerAddressModeClampToEdge;
    desc.tAddressMode = MTLSamplerAddressModeClampToEdge;
    desc.rAddressMode = MTLSamplerAddressModeClampToEdge;
    // The key point: specifies that the sampler reads non-normalized coordinates
    desc.normalizedCoordinates = NO;
    desc.lodMinClamp = 0.0f;
    desc.lodMaxClamp = FLT_MAX;

    id<MTLSamplerState> sampler_state = nil;
    sampler_state = [[device_ newSamplerStateWithDescriptor:desc] autorelease];
    // Release the descriptor
    desc = nil;
    return sampler_state;
}
And then attach it to your compute command encoder:
id<MTLComputeCommandEncoder> compute_encoder = [command_buffer computeCommandEncoder];
id<MTLSamplerState> ss = GetSamplerState();
// Attach the sampler state to the encoder, say at sampler bind point 0
[compute_encoder setSamplerState:ss atIndex:0];
// And set your texture, say at texture bind point 0
[compute_encoder setTexture:my_texture atIndex:0];
Finally use it in the kernel:
// An example kernel that samples from a texture and
// writes one component of the sample into an output buffer
kernel void compute_main(
    texture2d<uint, access::sample> tex_to_sample [[ texture(0) ]],
    sampler smp [[ sampler(0) ]],
    device uint *out [[ buffer(0) ]],
    uint2 tid [[thread_position_in_grid]])
{
    // Flatten the 2D grid position into a linear buffer index
    uint index = tid.y * tex_to_sample.get_width() + tid.x;
    out[index] = tex_to_sample.sample(smp, float2(tid)).x;
}
Using a sampler allows you to specify parameters for sampling (like filtering). You can also access the texture in different ways by using different samplers attached to the same kernel. A sampler also avoids having to pass in, and bounds-check against, the texture dimensions.
Note that the sampler can also be set up from within the compute kernel. Refer to Section 2.6, Samplers, in the Metal Shading Language Specification.
Finally, one main difference between the read function (using gid, as specified in the question) and sampling via a sampler is that read() takes integer coordinates, whereas sample() takes floating-point coordinates. So integer coordinates passed into sample will get cast to the equivalent floating-point values.
