I am trying to run my Metal program on my iPhone SE.
I tried many values for the threadsPerThreadGroup and threadsPerGrid sizes, and all of them gave me this error: TLValidateFeatureSupport:3539: failed assertion `Dispatch Threads with Non-Uniform Threadgroup Size is only supported on MTLGPUFamilyApple4 and later.'
Here is my code.
var threadsPerThreadGroup: MTLSize
var threadsPerGrid: MTLSize
computeCommandEncoder.setComputePipelineState(updateShader)
let w = updateShader.threadExecutionWidth
threadsPerThreadGroup = MTLSize(width: w, height: 1, depth: 1)
threadsPerGrid = MTLSize(width: Int(constants.bufferLength), height: 1, depth: 1)
if(frames % 2 == 0) {
computeCommandEncoder.setBuffer(buffer1, offset: 0, index: 0)
computeCommandEncoder.setBuffer(buffer2, offset: 0, index: 1)
} else {
computeCommandEncoder.setBuffer(buffer2, offset: 0, index: 0)
computeCommandEncoder.setBuffer(buffer1, offset: 0, index: 1)
}
computeCommandEncoder.setBytes(&constants, length: MemoryLayout<MyConstants>.stride, index: 2)
computeCommandEncoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerThreadGroup)
frames += 1
I am using iOS 13.4 and Xcode 11.4.
threadExecutionWidth evaluates to 32 and constants.bufferLength is 512.
The documentation states: "Use [dispatchThreads] only if the device supports non-uniform threadgroup sizes."
That is not worded as clearly as it could be. It means that dispatchThreads does not work on pre-A11 GPUs.
If you want a solution that works on all devices, you have to calculate how many threadgroups fit into the grid yourself and use dispatchThreadgroups instead.
If you want to have both methods in your code, you can detect the device's feature set at runtime.
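For example, here is a minimal sketch of that fallback, reusing the names from your snippet (it assumes device is your MTLDevice, an iOS 13 deployment target so supportsFamily(_:) is available, and that the kernel guards against out-of-range thread IDs):
computeCommandEncoder.setComputePipelineState(updateShader)
let w = updateShader.threadExecutionWidth
let threadsPerThreadgroup = MTLSize(width: w, height: 1, depth: 1)

if device.supportsFamily(.apple4) {
    // A11 and later: non-uniform threadgroup sizes are supported,
    // so Metal can handle a grid that is not a multiple of w.
    let threadsPerGrid = MTLSize(width: Int(constants.bufferLength), height: 1, depth: 1)
    computeCommandEncoder.dispatchThreads(threadsPerGrid,
                                          threadsPerThreadgroup: threadsPerThreadgroup)
} else {
    // Older GPUs: round the grid up to a whole number of threadgroups.
    // The kernel must then compare its thread ID against bufferLength and return early.
    let threadgroupsPerGrid = MTLSize(width: (Int(constants.bufferLength) + w - 1) / w,
                                      height: 1,
                                      depth: 1)
    computeCommandEncoder.dispatchThreadgroups(threadgroupsPerGrid,
                                               threadsPerThreadgroup: threadsPerThreadgroup)
}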
I already know how to render multiple triangles in Metal:
let vertexBuffer = device.makeBuffer(bytes: vertices_triangles,
                                     length: vertices_triangles.count * MemoryLayout<Vertex>.stride,
                                     options: [])
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: vertices_triangles.count)
renderEncoder.endEncoding()
commandBuffer.present(view.currentDrawable!)
commandBuffer.commit()
Here, vertices_triangles is an array of Vertex elements. Each set of three adjacent vertices defines a triangle to render.
However, I don't really know how to render multiple triangleStrips in Metal.
let vertexBuffer = device.makeBuffer(bytes: vertices_triangleStrips,
                                     length: vertices_triangleStrips.count * MemoryLayout<Vertex>.stride,
                                     options: [])
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: vertices_triangleStrips.count)
If I put adjacent vertices in vertices_triangleStrips and set the drawPrimitives type to .triangleStrip, I get one triangle strip. But how can I render multiple triangle strips? I tried using a for loop to make multiple vertex buffers and calling renderEncoder.drawPrimitives once per strip, but that doesn't seem like a good idea for performance reasons.
Referring to the documentation of drawIndexedPrimitives(type:indexCount:indexType:indexBuffer:indexBufferOffset:instanceCount:baseVertex:baseInstance:) in Metal:
Primitive restart functionality is enabled with the largest unsigned integer index value, relative to indexType (0xFFFF for MTLIndexTypeUInt16 or 0xFFFFFFFF for MTLIndexTypeUInt32). This feature finishes drawing the current primitive at the specified index and starts drawing a new one with the next index.
You can render multiple triangle strips by defining an index buffer in which the strips are separated by 0xFFFF or 0xFFFFFFFF.
e.g. rendering triangle strips over the vertices [0,1,2,3], [4,5,6,7], [8,9,10], and [11,12,13,14,15,16]:
let indexBytes: [UInt32] = [0, 1, 2, 3, 0xFFFFFFFF, 4, 5, 6, 7, 0xFFFFFFFF, 8, 9, 10, 0xFFFFFFFF, 11, 12, 13, 14, 15, 16, 0xFFFFFFFF]
let vertexBuffer = device.makeBuffer(bytes: vertices_triangleStrips,
length: vertices_triangleStrips.count * MemoryLayout<MetalPosition2>.stride,
options: [])!
let indexBuffer = device.makeBuffer(bytes: indexBytes,
length: indexBytes.count * MemoryLayout<UInt32>.stride,
options: [])!
renderEncoder.setVertexBuffer(vertexBuffer,
offset: 0,
index: 0)
renderEncoder.drawIndexedPrimitives(type: .triangleStrip,
indexCount: indexBytes.count,
indexType: .uint32,
indexBuffer: indexBuffer,
indexBufferOffset: 0) // only one instance
I've watched this WWDC session as well as its sample project: https://developer.apple.com/documentation/pencilkit/inspecting_modifying_and_constructing_pencilkit_drawings
However, when I try to plot CGPoints on my drawing canvas, nothing shows up.
Here's my setup:
var points: [CGPoint] = []
(500...1000).forEach { x in
(500...1000).forEach { y in
points.append(CGPoint(x: x, y: y))
}
}
let strokePoints = points.map {
PKStrokePoint(location: $0, timeOffset: 0, size: CGSize(uniform: 1), opacity: 2, force: 1, azimuth: 1, altitude: 1)
}
let strokePath = PKStrokePath(controlPoints: strokePoints, creationDate: Date())
let stroke = PKStroke(ink: PKInk(.pen, color: .systemGreen), path: strokePath)
canvasView.drawing = PKDrawing(strokes: [ stroke ])
I figured out that the problem was the size of my stroke points. This works:
PKStrokePoint(location: point, timeOffset: 0, size: CGSize(uniform: 3), opacity: 1, force: 0, azimuth: 0, altitude: 0)
Note that the size must be at least 3. If it's 1 the point is invisible; if it's 2 it's semi-transparent. This doesn't seem to be related to the screen's scale (my iPad Pro's UIScreen.main.scale is 2.0), so I just hardcoded it at 3 to represent a single pixel.
To arrive at these values, I drew a single pixel on the screen using the pencil and logged its contents, which showed me these parameters:
▿ PencilKit.PKStrokePoint
- strokePoint: <PKStrokePoint: 0x600001ad8f60 location={445, 333.5} timeOffset=0.000000 size={3.084399938583374, 3.084399938583374} opacity=0.999985 azimuth=-3.141593 force=0.000000 altitude=1.570796> #0
- super: NSObject
I then played around with those values (size, opacity, azimuth, force, and altitude) and found that none of them except size and opacity matter. That's why I set the rest to zero in my code.
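For reference, here is a minimal sketch of the corrected setup from the question, with the size bumped to 3 and the opacity set to 1 (using a plain CGSize instead of the CGSize(uniform:) helper):
let strokePoints = points.map {
    // Size of at least 3 and opacity 1; force, azimuth and altitude don't matter here.
    PKStrokePoint(location: $0,
                  timeOffset: 0,
                  size: CGSize(width: 3, height: 3),
                  opacity: 1,
                  force: 0,
                  azimuth: 0,
                  altitude: 0)
}
let strokePath = PKStrokePath(controlPoints: strokePoints, creationDate: Date())
let stroke = PKStroke(ink: PKInk(.pen, color: .systemGreen), path: strokePath)
canvasView.drawing = PKDrawing(strokes: [stroke])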
I am trying to get a disparity map from stereo images. When I try stereo_match.cpp with the images below, the result image is smaller than the original images and the left part of the image is missing.
Is it about the parameters? What is the reason, and how can I solve it?
How can I improve the result by changing the parameters or anything else?
Here are the parameters:
Ptr<StereoSGBM> sbm = StereoSGBM::create(16, 64, 3, 48, 192, 0, 0,10,200,100);
CV_WRAP static Ptr<StereoSGBM> create(int minDisparity = 0, int numDisparities = 16, int blockSize = 3,
int P1 = 0, int P2 = 0, int disp12MaxDiff = 0,
int preFilterCap = 0, int uniquenessRatio = 0,
int speckleWindowSize = 0, int speckleRange = 0,
int mode = StereoSGBM::MODE_SGBM);
Here is a theoretical explanation of disparity map creation:
https://www.eecis.udel.edu/~grauerg/globalStereoPresCompVisionClass.pdf
In order to create a disparity or depth map, an object must be visible to both cameras. In your example, the disparity map shows the cameras' common field of view.
If you want to increase this field of view, you could move the cameras closer to each other, but that has a cost: you will have less accuracy in the z dimension and more noise.
I'm trying to fill an MPSImage or a 2D Metal texture with values manually and pass it to a convolutional network operation. The input to a CNN (Metal Performance Shaders) is usually an image (like this: https://developer.apple.com/library/content/samplecode/MPSCNNHelloWorld/Introduction/Intro.html#//apple_ref/doc/uid/TP40017482-Intro-DontLinkElementID_2), which is why I could pass an UnsafePointer obtained from a CGContext, but this time I'd like to use a Float array as the input.
The following is what I tried. I converted the input array to NSData, but it didn't work.
var inputData = NSData(bytes: inputFloatArrayOfArray, length: inputFloatArrayOfArray.count * inputFloatArrayOfArray[0].count * MemoryLayout<Float>.size)
// The type of inputFloatArrayOfArray is [[Float]]
network.srcImage.texture.replace(region: MTLRegion( origin: MTLOrigin(x: 0, y: 0, z: 0),
size: MTLSize(width: inputWidth, height: inputHeight, depth: 1)),
mipmapLevel: 0,
slice: 0,
withBytes: &inputData,
bytesPerRow: inputWidth * MemoryLayout<Float>.size,
bytesPerImage: 0)
"Manually set a 1D Texture in Metal" may be related to my question (FYI: it says "deal with 2D textures that load the texture by converting a loaded UIImage to raw bytes data, but creating a dummy UIImage felt like a hack for me"), but it doesn't seem to have a sufficient answer. I have no idea how to tackle this. Please let me know if you have any ideas.
Thank you very much in advance.
If your tensor has <= 4 feature channels, then you just copy them in, with feature channels 0-3 sitting where RGBA would be in the texture. If your tensor has more than that, then you use a 2D texture array (MTLTextureType.type2DArray) instead. Additional feature channels beyond the first four go consecutively into the same coordinate in later images (slices) of the array.
Image[0]: // a.k.a. slice 0
pix[0][0] = {feature channel 0, feature channel 1, feature channel 2, feature channel 3}
Image[1]: // a.k.a. slice 1
pix[0][0] = {feature channel 4, feature channel 5, feature channel 6, feature channel 7}
Image[2]: // a.k.a. slice 2
pix[0][0] = {feature channel 8, feature channel 9, feature channel 10, feature channel 11}
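For the simple single-feature-channel case, here is a minimal sketch of copying a flat [Float] into the texture with replace(region:...), reusing the names from the question (it assumes srcImage was created with featureChannels == 1 and a 32-bit float pixel format such as .r32Float):
// Flatten the [[Float]] row by row and pass the raw float bytes,
// not a pointer to the NSData wrapper.
let flatInput: [Float] = inputFloatArrayOfArray.flatMap { $0 }
let region = MTLRegion(origin: MTLOrigin(x: 0, y: 0, z: 0),
                       size: MTLSize(width: inputWidth, height: inputHeight, depth: 1))
flatInput.withUnsafeBufferPointer { ptr in
    network.srcImage.texture.replace(region: region,
                                     mipmapLevel: 0,
                                     slice: 0,
                                     withBytes: ptr.baseAddress!,
                                     bytesPerRow: inputWidth * MemoryLayout<Float>.stride,
                                     bytesPerImage: 0)
}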
I'm using Genetic Algorithms (GA) on an image processing problem (an image segmentation, to be more precise). In this case, an individual represents a block of pixels (i.e. a set of pixel coordinates). I need to encourage individuals with contiguous pixels.
To encourage contiguous blocks of pixels:
The "contiguousness" of an individual needs to be considered in the fitness function so that individuals with adjacent pixels are favoured (best fit). Hence, during the evolution, the contiguousness of a set of coordinates (i.e. an individual) will influence the fitness of that individual.
The problem I'm facing is how to measure this property (how contiguous the pixels are) for a set of pixel coordinates (x, y).
As can be seen in the image below, the individual (the set of black pixels) on the right is clearly more "contiguous" (and therefore fitter) than the individual on the left:
I think I understand what you are asking, and my suggestion would be to count the number of shared "walls" between your pixels:
I would argue that from left to right the individuals are decreasing in continuity.
Counting the number of walls is not difficult to code, but might be slow the way I've implemented it here.
import random

width = 5
height = 5
image = [[0 for x in range(width)] for y in range(height)]

num_pts_in_individual = 4
# I realize this may give duplicate points
individual = [[int(random.uniform(0, height)), int(random.uniform(0, width))] for x in range(num_pts_in_individual)]

# Fill up the image
for point in individual:
    image[point[0]][point[1]] = 1

# Print out the image
for row in image:
    print(row)

def count_shared_walls(image):
    num_shared = 0
    height = len(image)
    width = len(image[0])
    for h in range(height):
        for w in range(width):
            if image[h][w] == 1:
                if h > 0 and image[h-1][w] == 1:
                    num_shared += 1
                if w > 0 and image[h][w-1] == 1:
                    num_shared += 1
                if h < height-1 and image[h+1][w] == 1:
                    num_shared += 1
                if w < width-1 and image[h][w+1] == 1:
                    num_shared += 1
    return num_shared

shared_walls = count_shared_walls(image)
print(shared_walls)
Different images and counts of shared walls:
[0, 0, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 0, 1, 1]
[0, 0, 0, 0, 0]
2
[1, 0, 0, 0, 0]
[0, 0, 0, 1, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 1, 0, 0]
0
[0, 0, 0, 1, 1]
[0, 0, 0, 1, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
4
One major problem with this is that if a change in pixel locations does not change the number of shared walls, it will not affect the score. Maybe a combination of the distance method you described and the shared-walls approach would be best.