Efficiently copying Swift Array to memory buffer for iOS Metal - ios

I am writing an iOS application using Apple's new Metal framework. I have an array of Matrix4 objects (see Ray Wenderlich's tutorial) that I need to pass in to a shader via the MTLDevice.newBufferWithLength() method. The Matrix4 object is leveraging Apple's GLKit (it contains a GLKMatrix4 object).
I'm leveraging instancing with the GPU calls.
I will later change this to a struct which includes more data per instance (beyond just the Matrix4 object).
How can I efficiently copy the array of [Matrix4] objects into this buffer?
Is there a better way to do this? Again, I'll expand this to use a struct with more data in the future.
Below is a subset of my code:
let sizeOfMatrix4 = sizeof(Float) * Matrix4.numberOfElements()
// This returns an array of [Matrix4] objects.
let boxArray = createBoxArray(parentModelViewMatrix)
let sizeOfUniformBuffer = boxArray.count * sizeOfMatrix4
var uniformBuffer = device.newBufferWithLength(sizeOfUniformBuffer, options: .CPUCacheModeDefaultCache)
let bufferPointer = uniformBuffer?.contents()
// Ouch - way too slow. How can I optimize?
for i in 0..<boxArray.count
{
    memcpy(bufferPointer! + (i * sizeOfMatrix4), boxArray[i].raw(), sizeOfMatrix4)
}
renderEncoder.setVertexBuffer(uniformBuffer, offset: 0, atIndex: 2)
Note:
The boxArray[i].raw() method is defined as this in the Objective-C code:
- (void *)raw {
    return glkMatrix.m;
}
You can see I'm looping through each array object and then doing a memcpy. I did this since I was experiencing problems treating the array as a contiguous set of memory.
Thanks!

A Swift Array is promised to be contiguous memory, but you need to make sure it's really a Swift Array and not secretly an NSArray. If you want to be completely certain, use a ContiguousArray. That will ensure contiguous memory even if the objects in it are bridgeable to ObjC. If you want even more control over the memory, look at ManagedBuffer.
With that, you should be using newBufferWithBytesNoCopy(_:length:options:deallocator:) to create an MTLBuffer around your existing memory.
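To illustrate what that contiguity buys you for the question's copy loop, here is a minimal sketch (modern API spellings; simd's float4x4 stands in for the question's Matrix4, and the MTLBuffer is assumed to already be large enough). Note this is the plain single-copy variant, not the no-copy route recommended above: the whole array is copied into the buffer in one operation instead of one memcpy per element.

import Metal
import simd

func fillUniformBuffer(_ buffer: MTLBuffer, with uniforms: ContiguousArray<float4x4>) {
    uniforms.withUnsafeBufferPointer { src in
        guard let base = src.baseAddress else { return }
        // One bulk copy of the whole contiguous element storage.
        buffer.contents().copyMemory(from: base,
                                     byteCount: MemoryLayout<float4x4>.stride * uniforms.count)
    }
}

After that, the buffer is bound exactly as in the question, e.g. renderEncoder.setVertexBuffer(uniformBuffer, offset: 0, atIndex: 2).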

I've done this with an array of particles that I pass to a compute shader.
In a nutshell, I define some constants and declare a handful of mutable pointers and a mutable buffer pointer:
let particleCount: Int = 1048576
var particlesMemory:UnsafeMutablePointer<Void> = nil
let alignment:UInt = 0x4000
let particlesMemoryByteSize:UInt = UInt(1048576) * UInt(sizeof(Particle))
var particlesVoidPtr: COpaquePointer!
var particlesParticlePtr: UnsafeMutablePointer<Particle>!
var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!
When I set up the particles, I populate the pointers and use posix_memalign() to allocate the memory:
posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)
particlesVoidPtr = COpaquePointer(particlesMemory)
particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)
particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
The loop to populate the particles is slightly different - I now loop over the buffer pointer:
for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex
{
    [...]
    let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)
    particlesParticleBufferPtr[index] = particle
}
Inside the applyShader() function, I create a copy of the memory which is used as both the input and output buffer:
let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),
                                                            options: nil, deallocator: nil)
commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)
commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
...and after the shader has run, I put the shared memory (particlesMemory) back into the buffer pointer:
particlesVoidPtr = COpaquePointer(particlesMemory)
particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)
particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
There's an up-to-date Swift 2.0 version of this at my GitHub repo here.
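For readers on current Swift, the same setup can be sketched with today's API names. This is an illustrative sketch, not the answer's original code; the Particle struct is assumed to hold the four Float fields used above, and the length is rounded up to a page multiple as makeBuffer(bytesNoCopy:) expects.

import Metal
import Darwin

struct Particle {
    var positionX, positionY, velocityX, velocityY: Float
}

func makeParticleBuffer(device: MTLDevice, particleCount: Int)
    -> (buffer: MTLBuffer, particles: UnsafeMutableBufferPointer<Particle>)? {
    let alignment = 0x4000                                                   // 16 KB page alignment
    let rawByteCount = particleCount * MemoryLayout<Particle>.stride
    let byteCount = (rawByteCount + alignment - 1) / alignment * alignment   // round up to a page multiple

    // Page-aligned shared allocation, as with posix_memalign above.
    var rawMemory: UnsafeMutableRawPointer? = nil
    guard posix_memalign(&rawMemory, alignment, byteCount) == 0, let memory = rawMemory else { return nil }

    // Typed view for CPU-side reads and writes.
    let typed = memory.bindMemory(to: Particle.self, capacity: particleCount)
    let particles = UnsafeMutableBufferPointer(start: typed, count: particleCount)

    // Wrap the same allocation in an MTLBuffer without copying.
    guard let buffer = device.makeBuffer(bytesNoCopy: memory,
                                         length: byteCount,
                                         options: .storageModeShared,
                                         deallocator: { pointer, _ in free(pointer) }) else {
        free(memory)
        return nil
    }
    return (buffer, particles)
}

Writes through the returned buffer pointer are visible to the GPU (and vice versa) because the allocation is shared, just as in the Swift 2 code above.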

Obviously the point of using shared memory and MTLDevice.makeBuffer(bytesNoCopy:...) is to avoid redundant memory copies. Therefore, ideally we look for a design that allows us to easily manipulate the data after it's already been loaded into the MTLBuffer object.
After researching this for a while, I've decided to try and create a semi-generic solution to allow for simplified allocation of page-aligned memory, loading your content into that memory, and subsequently manipulating your items in that shared memory block.
I've created a Swift array implementation called PageAlignedArray that matches the interface and functionality of the built-in Swift array, but always resides on page-aligned memory, and so can be very easily made into an MTLBuffer. I've also added a convenience method to directly convert PageAlignedArray into a Metal buffer.
Of course, you can continue to mutate your array afterwards and your updates will be automatically available to the GPU courtesy of the shared-memory architecture. However, keep in mind that you must regenerate your MTLBuffer object whenever the array's length changes.
Here's a quick code sample:
var alignedArray : PageAlignedContiguousArray<matrix_double4x4> = [matrixTest, matrixTest]
alignedArray.append(item)
alignedArray.removeFirst() // Behaves just like a built-in array, with all convenience methods
// When it's time to generate a Metal buffer:
let testMetalBuffer = device?.makeBufferWithPageAlignedArray(alignedArray)
The sample uses matrix_double4x4, but the array should work for any Swift value types. Please note that if you use a reference type (such as any kind of class), the array will contain pointers to your elements and so won't be usable from your GPU code.

Related

Casting a multidimensional array to Data object for TF inference

I am currently using the Swift release of Tensorflow in my iOS app.
My model is working fine, but I am having trouble copying the data into the first Tensor so I can use the neural net to detect stuff.
I consulted the test suite inside the repository, and their code works as follows:
They are using some extensions:
extension Array {
    /// Creates a new array from the bytes of the given unsafe data.
    ///
    /// - Note: Returns `nil` if `unsafeData.count` is not a multiple of
    ///   `MemoryLayout<Element>.stride`.
    /// - Parameter unsafeData: The data containing the bytes to turn into an array.
    init?(unsafeData: Data) {
        guard unsafeData.count % MemoryLayout<Element>.stride == 0 else { return nil }
        let elements = unsafeData.withUnsafeBytes {
            UnsafeBufferPointer<Element>(
                start: $0,
                count: unsafeData.count / MemoryLayout<Element>.stride
            )
        }
        self.init(elements)
    }
}
extension Data {
    /// Creates a new buffer by copying the buffer pointer of the given array.
    ///
    /// - Warning: The given array's element type `T` must be trivial in that it can be copied bit
    ///   for bit with no indirection or reference-counting operations; otherwise, reinterpreting
    ///   data from the resulting buffer has undefined behavior.
    /// - Parameter array: An array with elements of type `T`.
    init<T>(copyingBufferOf array: [T]) {
        self = array.withUnsafeBufferPointer(Data.init)
    }
}
to create the array containing the data, and a Data object from that:
static let inputData = Data(copyingBufferOf: [Float32(1.0), Float32(3.0)])
Afterwards, they copy the inputData into the neural net.
I've tried to modify their code to load an image into a [1,28,28,1] Tensor.
The image looks something like this:
[[[[Float32(254.0)],
[Float32(255.0)],
[Float32(254.0)],
[Float32(250.0)],
[Float32(252.0)],
[Float32(255.0)],
[Float32(255.0)],
[Float32(255.0)],
[Float32(255.0)],
[Float32(254.0)],
[Float32(214.0)],
[Float32(160.0)],
[Float32(130.0)],
[Float32(124.0)],
[Float32(129.0)],
...
you get the point.
But if I try to cast that to Data / init Data with the image data I somehow only get 8 bytes:
private func createTestData() -> Data {
return Data(copyingBufferOf:
[[[[Float32(254.0)],
[Float32(255.0)],
[Float32(254.0)],
...
Same goes for the code in the tests, but for them, it is fine (2*Float32 = 8 bytes).
For me, that is considerably too small (should be 28*28*4 = 3136 bytes)!
Is there something I am missing (have I overlooked something)?
What do I need to do to get my images into the correct arrays/data types?
A Swift Array is a fixed-sized structure with (opaque) pointers to the actual element storage. The withUnsafeBufferPointer() method calls the given closure with a buffer pointer to that element storage. In the case of a [Float] array, that is a pointer to the memory address of the floating point values. That's why
array.withUnsafeBufferPointer(Data.init)
works to get a Data value representing the floating point numbers.
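As a quick sanity check of that statement, here is a minimal sketch mirroring the two-element test case from the question:

import Foundation

let floats: [Float32] = [1.0, 3.0]
let data = floats.withUnsafeBufferPointer(Data.init)
print(data.count)   // 8 — two 4-byte floats, as expected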
If you pass a nested array (e.g. of type [[Float]]) to the withUnsafeBufferPointer() method then the closure is called with a pointer to the Array structures of the inner arrays. So the element type now is not Float but [Float] – and not a “trivial type” in the sense of the warning
/// - Warning: The given array's element type `T` must be trivial in that it can be copied bit
/// for bit with no indirection or reference-counting operations; otherwise, reinterpreting
/// data from the resulting buffer has undefined behavior.
What you need to do is to flatten the nested array to a simple array, and then create a Data value from the simple array.
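A minimal sketch of that flattening step, using the Data(copyingBufferOf:) extension quoted in the question (the nested literal here is just a truncated stand-in for the real [1, 28, 28, 1] pixel data):

let image: [[[[Float32]]]] = [[[[254.0], [255.0], [254.0]]]]   // truncated stand-in for the real pixels

// Collapse the nesting so the element type is Float32, a trivial type.
let flattened: [Float32] = image.flatMap { $0.flatMap { $0.flatMap { $0 } } }
let inputData = Data(copyingBufferOf: flattened)

// For a full 28 × 28 image this yields 28 * 28 * 4 = 3136 bytes, as expected.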

Converting Objective-C malloc to Swift

I am working on a project that was written in Objective-C and needs to be updated to Swift. We use a C file for transferring data.
Here is the code I was given in Objective-C:
- (NSData *)prepareEndPacket {
    UInt8 *buff_data;
    buff_data = (uint8_t *)malloc(sizeof(uint8_t)*(PACKET_SIZE+5));
    // Call to C File
    PrepareEndPacket(buff_data);
    NSData *data_first = [NSData dataWithBytes:buff_data length:sizeof(uint8_t)*(PACKET_SIZE+5)];
    return data_first;
}
In the C .h file I have this for reference:
#define PACKET_SIZE ((uint32_t)128)
I can not seem to find a good way of converting this to Swift. Any help would be appreciated.
malloc and free actually work fine in Swift; however, the UnsafeMutablePointer API is more "native". I'd probably use Data's bytesNoCopy for better performance. If you want, you can use Data(bytes:count:), but that will make a copy of the data (and then you need to make sure to deallocate the pointer after making the copy, or you'll leak memory, which is actually a problem in the Objective-C code above since it fails to free the buffer).
So, something like:
func prepareEndPacket() -> Data {
    let count = Int(PACKET_SIZE) + 5
    let buf = UnsafeMutablePointer<UInt8>.allocate(capacity: count)
    PrepareEndPacket(buf)
    return Data(bytesNoCopy: buf, count: count, deallocator: .custom { ptr, _ in
        ptr.deallocate()
    })
}
By using bytesNoCopy, the Data object returned is basically a wrapper around the original pointer, which will be freed by the deallocator when the Data object is destroyed.
Alternatively, you can create the Data object from scratch and get a pointer to its contents to pass to PrepareEndPacket():
func prepareEndPacket() -> Data {
    var data = Data(count: Int(PACKET_SIZE) + 5)
    data.withUnsafeMutableBytes { (ptr: UnsafeMutablePointer<UInt8>) in
        PrepareEndPacket(ptr)
    }
    return data
}
This is slightly less efficient, since the Data(count:) initializer will initialize all the Data's bytes to zero (similar to using calloc instead of malloc), but in many cases, that may not make enough of a difference to matter.

Working with UnsafeMutablePointer array

I'm trying to work with Brad Larson's splendid GPUImage framework, and I'm struggling to process the cornerArray returned by the GPUImageHarrisCornerDetectionFilter.
The corners are returned as an array of GLfloat in an UnsafeMutablePointer, and I would like to convert that to an array of CGPoint.
I've tried allocating space for the memory
var cornerPointer = UnsafeMutablePointer<GLfloat>.alloc(Int(cornersDetected) * 2)
but the data doesn't seem to make any sense - either zero or 1E-32
I found what looked like the perfect answer (how to loop through elements of an array of UnsafeMutablePointer in Swift) and tried
filter.cornersDetectedBlock = {(cornerArray:UnsafeMutablePointer<GLfloat>, cornersDetected:UInt, frameTime:CMTime) in
    crosshairGenerator.renderCrosshairsFromArray(cornerArray, count:cornersDetected, frameTime:frameTime)
    for floatData in UnsafeBufferPointer(start: cornerArray, count: cornersDetected)
    {
        println("\(floatData)")
    }
}
but the compiler didn't like the UnsafeBufferPointer - so I changed it to UnsafeMutablePointer, but it didn't like the argument list.
I'm sure this is nice and simple, and it sounds like something other people must have had to do - so what's the solution?
The UnsafeMutablePointer<GLfloat> type translated from C can have its elements accessed via a subscript, just like a normal array. To achieve your goal of converting these to CGPoints, I'd use the following code:
filter.cornersDetectedBlock = { (cornerArray:UnsafeMutablePointer<GLfloat>, cornersDetected:UInt, frameTime:CMTime) in
    var points = [CGPoint]()
    for index in 0..<Int(cornersDetected) {
        points.append(CGPoint(x:CGFloat(cornerArray[index * 2]), y:CGFloat(cornerArray[(index * 2) + 1])))
    }
    // Do something with these points
}
The memory backing cornerArray is allocated immediately before the callback is triggered, and deallocated immediately after it. Unless you copy these values over in the middle of the block, as I do above, I'm afraid that you'll leave yourself open to some nasty bugs. It's also easier to convert to the correct format at that point, anyway.
I have found a solution - and it's simple. The answer was here https://gist.github.com/kirsteins/6d6e96380db677169831
var dataArray = Array(UnsafeBufferPointer(start: cornerArray, count: Int(cornersDetected) * 2))
Try this:
var cornerPointer = UnsafeMutablePointer<GLfloat>.alloc(Int(cornersDetected) * 2)
filter.cornersDetectedBlock = {(cornerArray:UnsafeMutablePointer<GLfloat>, cornersDetected:UInt, frameTime:CMTime) in
    crosshairGenerator.renderCrosshairsFromArray(cornerArray, count:cornersDetected, frameTime:frameTime)
    for i in 0..<Int(cornersDetected)
    {
        print("\(cornerPointer[i])")
    }
}

Pointer (memory) alignment to 16K in Swift for Metal buffer creation

I'd like to create Metal buffers with the newBufferWithBytesNoCopy function, letting the CPU and GPU share memory for zero-copy data transfer.
The newBufferWithBytesNoCopy function takes an UnsafeMutablePointer-type pointer, and the pointer needs to be aligned to 16K (16384) bytes.
Could anyone provide advice on how to allocate memory aligned to a given boundary in Swift?
I believe this should work for you:
var memory:UnsafeMutablePointer<Void> = nil
var alignment:UInt = 0x4000 // 16K aligned
var size:UInt = bufferSize // bufferSize == your buffer size
posix_memalign(&memory, alignment, size)
For reference:
http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_memalign.html
Swift PageAligned Array (just works solution)
PageAlignedArray is a project that handles the allocation of page-aligned memory for you when the memory will be used with Metal.
More detail
4096 byte alignment
I'm not seeing your 16k byte requirement. The debugger says 4k. Maybe you can provide a reference? It should be based upon the system's page size.
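If in doubt, you can ask the system for its page size rather than hard-coding either value (a minimal sketch):

import Darwin

let pageSize = Int(getpagesize())
print("Page size: \(pageSize) bytes")   // commonly 16384 on arm64 iOS devices, 4096 elsewhere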
Using posix_memalign
The original idea came from memkite I believe and goes like this:
private func setupSharedMemoryWithSize(byteCount: Int) ->
    (pointer: UnsafeMutablePointer<Void>, memoryWrapper: COpaquePointer)
{
    let memoryAlignment = 0x1000 // 4096 bytes, 0x4000 would be 16k
    var memory: UnsafeMutablePointer<Void> = nil
    posix_memalign(&memory, memoryAlignment, byteSizeWithAlignment(memoryAlignment, size: byteCount))
    let memoryWrapper = COpaquePointer(memory)
    return (memory, memoryWrapper)
}
Calculating the memory size
You'll want to calculate the correct amount of memory to allocate so that it lands on the byte boundary. Which means you'll probably need to allocate a bit more than you desired. Here you pass the size you need and alignment and it will return the amount you should allocate.
private func byteSizeWithAlignment(alignment: Int, size: Int) -> Int
{
    return Int(ceil(Float(size) / Float(alignment))) * alignment
}
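As a side note, the same rounding can be done purely with integer arithmetic, which avoids any floating-point precision concerns for very large sizes (a small alternative sketch, not part of the original answer):

private func byteSizeWithAlignment(alignment: Int, size: Int) -> Int
{
    return (size + alignment - 1) / alignment * alignment
}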
Using the functions
let (pointer, memoryWrapper) = setupSharedMemoryWithSize(byteCount)
var unsafeVoidPointer: UnsafeMutablePointer<Void> = pointer
// Assuming your underlying data is unsigned 8-bit integers.
let unsafeMutablePointer = UnsafeMutablePointer<UInt8>(memoryWrapper)
let unsafeMutableBufferPointer = UnsafeMutableBufferPointer(start: unsafeMutablePointer, count: byteCount)
Don't forget to free the memory you allocated.
Allocating shared memory without posix_memalign.
These days you can allocate the memory without posix_memalign by just specifying .StorageModeShared.
let byteCount = 1000 * sizeof(Float)
let sharedMetalBuffer = self.metalDevice.newBufferWithLength(byteCount, options: .StorageModeShared)
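In current Swift the same option is spelled .storageModeShared, and the buffer's contents() pointer can be used directly from the CPU. A minimal sketch (the element count and Float payload are just illustrative):

import Metal

let device = MTLCreateSystemDefaultDevice()!
let count = 1000
let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// Bind the shared storage to Float and write to it like an array;
// the GPU sees the same memory without an explicit copy.
let floats = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { floats[i] = Float(i) }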
After experiencing some annoyance with this problem, I've decided to go ahead and create a simple solution that should make this a lot easier.
I've created a Swift array implementation called PageAlignedArray that matches the interface and functionality of the built-in Swift array, but always resides on page-aligned memory, and so can be very easily made into an MTLBuffer. I've also added a convenience method to directly convert PageAlignedArray into a Metal buffer.
Of course, you can continue to mutate your array afterwards and your updates will be automatically available to the GPU courtesy of the shared-memory architecture. However, keep in mind that you must regenerate your MTLBuffer object whenever the array's length changes.
Here's a quick code sample:
var alignedArray : PageAlignedContiguousArray<matrix_double4x4> = [matrixTest, matrixTest]
alignedArray.append(item)
alignedArray.removeFirst() // Behaves just like a built-in array, with all convenience methods
// When it's time to generate a Metal buffer:
let device = MTLCreateSystemDefaultDevice()
let testMetalBuffer = device?.makeBufferWithPageAlignedArray(alignedArray)
The sample uses matrix_double4x4, but the array should work for any Swift value types. Please note that if you use a reference type (such as any kind of class), the array will contain pointers to your elements and so won't be usable from your GPU code.
Please grab PageAlignedArray here: https://github.com/eldoogy/PageAlignedArray

Xcode / iOS: Simple example of a mutable C-Array as a class instance variable?

For some reason I just can't seem to get my head around the process of creating a C-array instance variable for a class that can have elements added to it dynamically at runtime.
My goal is to create a class called AEMesh. All AEMesh objects will have a C-array storing the vertex data for that specific AEMesh's 3D model, for use with OpenGL ES (more specifically, its functionality for drawing a model by passing it a simple C-array of vertex data).
Initially I was using an NSMutableArray, on the assumption that I could simply pass this array to OpenGL ES, however that isn't the case as the framework requires a C-array. I got around the issue by essentially creating a C-array of all of the vertex data for the current AEMesh when it came time to render that specific mesh, and passing that array to OpenGL ES. Obviously the issue here is performance, as I am constantly allocating and deallocating enough memory to hold every 3D model's vertex data in the app about a dozen times a second.
So, I'm not one to want the answer spoon-fed to me, but if anyone would be willing to explain the standard idiom for giving a class a mutable C-array (some articles I've read mention using malloc?), I would greatly appreciate it. From the information I've gathered, using malloc might work, but this isn't creating a standard C-array I can pass to OpenGL ES; instead it's more of a pseudo-C-array that works like a C-array?
Anyways, I will continue to experiment and search the internet, but again, if anyone can offer a helping hand I would greatly appreciate it.
Thanks,
- Adam Eisfeld
The idea would just be to add a pointer to an array of AEMesh structures to your class, and then maintain the array as necessary. Following is a little (untested) code that uses malloc() to create such an array and realloc() to resize it. I'm growing the array 10 meshes at a time:
@interface MyClass : NSObject
{
    int meshCount;
    AEMesh *meshes;
}
@end

@implementation MyClass

- (id)init {
    if ((self = [super init])) {
        meshCount = 0;
        meshes = malloc(sizeof(AEMesh) * 10);
    }
    return self;
}

- (void)addMesh:(AEMesh)mesh {
    if (meshCount % 10 == 0) {
        meshes = realloc(meshes, sizeof(AEMesh) * (meshCount + 10));
    }
    if (meshes != NULL) {
        meshes[meshCount] = mesh;
        meshCount++;
    }
}

@end
It might be worthwhile to factor the array management into its own Objective-C class, much as Brian Coleman's answer uses std::vector to manage the meshes. That way, you could use it for C arrays of any type, not just AEMesh.
From the information I've gathered, using malloc might work, but this isn't creating a standard C-array I can pass to OpenGL ES; instead it's more of a pseudo-C-array that works like a C-array?
A C array is nothing more than a series of objects ("objects" used here in the C sense of contiguous memory, not the OO sense) in memory. You can create one by declaring it on the stack:
int foo[10]; // array of 10 ints
or dynamically on the heap:
int *foo = malloc(sizeof(int)*10); // array of 10 ints, not on the stack
int *bar = calloc(10, sizeof(int)); // another way to get the same thing, zero-initialized
Don't forget to use free() to deallocate any blocks of memory you've created with malloc(), realloc(), calloc(), etc. when you're done with them.
I know it doesn't directly answer your question, but an even easier approach would be to work with an NSMutableArray instance variable until the point where you need to pass it to the API, where you would use getObjects:range: in order to convert it to a C-Array. That way you won't have to deal with "mutable" C-Arrays and save yourself the trouble.
If you're willing to use Objective-C++ and stray outside the bounds of C and Objective-C, then you can use a std::vector to amortise the cost of resizing the array of vertex data. Here's what things would look like:
#include <vector>
#include <gl.h>

@interface MyClass : NSObject {
    std::vector<GLfloat> vertexData;
}

-(void) createMyVertexData;
-(void) useMyVertexData;

@end

@implementation MyClass

-(void) createMyVertexData {
    // Erase all current data from vertexData
    vertexData.erase(vertexData.begin(), vertexData.end());

    // The number of vertices in a triangle
    std::size_t nVertices = 3;

    // The number of coordinates required to specify a vertex (x, y, z)
    std::size_t nDimensions = 3;

    // Reserve sufficient capacity to store the vertex data
    vertexData.reserve(nVertices * nDimensions);

    // Add the vertex data to the vector
    // First vertex
    vertexData.push_back(0);
    vertexData.push_back(0);
    vertexData.push_back(0);
    // And so on
}

-(void) useMyVertexData {
    // Get a pointer to the first element in the vertex data array
    GLfloat* rawVertexData = &vertexData[0];

    // Get the size of the vertex data
    std::size_t sizeVertexData = vertexData.size();

    // Use the vertex data
}

@end
The neat bit is that vertexData is automatically destroyed along with the instance of MyClass. You don't have to add anything to the dealloc method in MyClass. Remember to define MyClass in a .mm file
