Related
Yes, I know about using CIAreaAverate CIFilter to get the average color of pixels.
I am trying to create some alternative using Accelerate Framework to see if I can come with something faster.
I am rendering a CIImage to a context. For that purpose I have this CIImage extension...
let device: MTLDevice = MTLCreateSystemDefaultDevice()!
let context = CIContext.init(mtlDevice: device, options: [.workingColorSpace: kCFNull])
let w = self.extent.width
let h = self.extent.height
let size = w * h * 4
var bitmap = [UInt8](repeating: 0, count:Int(size))
context.render(self,
toBitmap: &bitmap,
rowBytes: 4 * Int(w),
bounds: self.extent,
format: .BGRA8,
colorSpace: nil)
At this point I have bitmap containing the BGRA bytes interleaved.
To get the average of R, G and B, all I have to do is something like this:
var averageBlue : Int = 0
for x in stride(from:0, through: bitmap.count-4, by: 4) {
let value = bitmap[Int(x)]
averageBlue += Int(value)
}
averageBlue /= numberOfPixels
but this for loop is slow as hell, as expected.
I was thinking about using some Accelerate function like
vDSP_meanvD(bitmap, 2, &r, vDSP_Length(numberOfPixels))
but this function requires bitmap to be an array of UnsafePointer<Double>...
I could convert bitmap to that, but that would require a for loop, that is slow...
Is there any way to extract those R, G and B pixels and have their individual averages using some accelerate stuff going on?
You can convert bitmap to single-precision floating-point values using vDSP_vfltu8(_:_:_:_:_:) :
let bitmap: [UInt8] = [1, 10, 50, 0,
2, 20, 150, 5,
3, 30, 250, 10]
//Blue
var blueFloats = [Float](repeating: 0, count: bitmap.count/4)
vDSP_vfltu8(bitmap,
vDSP_Stride(4),
&blueFloats,
vDSP_Stride(1),
vDSP_Length(blueFloats.count))
And then use vDSP_meanv(_:_:_:_:) :
var blue: Float = 0
vDSP_meanv(blueFloats,
vDSP_Stride(1),
&blue,
vDSP_Length(blueFloats.count))
print("blue =", blue) //2.0
As to the reds :
//Red
var redFloats = [Float](repeating: 0, count: bitmap.count/4)
vDSP_vfltu8(UnsafePointer.init(bitmap).advanced(by: 2),
vDSP_Stride(4),
&redFloats,
vDSP_Stride(1),
vDSP_Length(redFloats.count))
var red: Float = 0
vDSP_meanv(redFloats,
vDSP_Stride(1),
&red,
vDSP_Length(redFloats.count))
print("red =", red) //150.0
Like ielyamani’s said, you can use vDSP_vfltu8 to build that buffer of Float efficiently.
But rather than striding through that array four times, you can also use cblas_sgemv (or cblas_sgemm) to calculate all four averages in a single call:
let pixelCount: Int = width * height
let channelsPerPixel: Int = 4
let m: Int32 = Int32(channelsPerPixel)
let n: Int32 = Int32(pixelCount)
let lda = m
var a = [Float](repeating: 0, count: pixelCount * channelsPerPixel)
vDSP_vfltu8(pixelBuffer, vDSP_Stride(1), &a, vDSP_Stride(1), vDSP_Length(pixelCount * channelsPerPixel))
var x = [Float](repeating: 1 / Float(pixelCount), count: pixelCount)
var y = [Float](repeating: 0, count: channelsPerPixel)
cblas_sgemv(CblasColMajor, CblasNoTrans, m, n, 1, &a, lda, &x, 1, 1, &y, 1)
print(y)
I have got the following error :
[SceneKit] Error: C3DMeshElementSetPrimitives invalid index buffer size
It appears at every frame (many errors !)
Do you know how to solve this issue ?
Thanks
Check that indices array is paring the vertices in the vertices array properly.
Suppose you wanted to draw a square using lines with 4 vertices like this:
v1 ------ v0
| |
| |
v2 ------ v3
The vertices array would be v0, v1, v2, v3.
The lines pairs would be: (v0,v1) , (v1,v2) , (v2,v3) , (v3,v0)
So indices array would be: 0,1,1,2,2,3,3,0
In code, this is an example that draws a node that contains a red circle using lines:
func circlePathNode() -> SCNNode {
var vertices:[SCNVector3] = []
let N = 200
var indices:[Int16] = []
// vertices
for i in 1...N {
let t = (Float(i-1) / Float(N)) * 2 * .pi
let v = SCNVector3(cos(t), sin(t), 0)
vertices.append(v)
}
// indices pair vertices
for i in 1...N-1 {
indices.append(Int16(i-1))
indices.append(Int16(i))
}
// last:
indices.append(Int16(N-1))
indices.append(0)
let verticesSource = SCNGeometrySource(vertices: vertices)
let indexData = Data(bytes: indices, count: indices.count * MemoryLayout<Int16>.size) // 2 for 2 bytes each
let element = SCNGeometryElement(data: indexData,
primitiveType: .line,
primitiveCount: N,
bytesPerIndex: MemoryLayout<Int16>.size)
let geometry = SCNGeometry(sources: [verticesSource],elements: [element])
#if os(iOS)
geometry.firstMaterial?.emission.contents = UIColor.red
#else
geometry.firstMaterial?.emission.contents = NSColor.red
#endif
return SCNNode(geometry: geometry)
}
I had the same error when i accidentally pass index count to primitive count. Make sure that you pass exactly primitive count parameter (not index count) to primitive count parameter of SCNGeometryElement constructor.
Primitive count can be calculated this way:
func calculatePrimintiveCount( indexCount: Int, primitiveType: SCNGeometryPrimitiveType) -> Int {
switch (primitiveType){
case .triangles: return indexCount / 3
case .triangleStrip: return indexCount - 2
case .line: return indexCount / 2
case .point: return indexCount
case .polygon: return indexCount / 4 // not sure
}
}
Based on #Kametrixom answer, I have made some test application for parallel calculation of sum in an array.
My test application looks like this:
import UIKit
import Metal
class ViewController: UIViewController {
// Data type, has to be the same as in the shader
typealias DataType = CInt
override func viewDidLoad() {
super.viewDidLoad()
let data = (0..<10000000).map{ _ in DataType(200) } // Our data, randomly generated
var start, end : UInt64
var result:DataType = 0
start = mach_absolute_time()
data.withUnsafeBufferPointer { buffer in
for elem in buffer {
result += elem
}
}
end = mach_absolute_time()
print("CPU result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
result = 0
start = mach_absolute_time()
result = sumParallel4(data)
end = mach_absolute_time()
print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
result = 0
start = mach_absolute_time()
result = sumParralel(data)
end = mach_absolute_time()
print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
result = 0
start = mach_absolute_time()
result = sumParallel3(data)
end = mach_absolute_time()
print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
}
func sumParralel(data : Array<DataType>) -> DataType {
let count = data.count
let elementsPerSum: Int = Int(sqrt(Double(count)))
let device = MTLCreateSystemDefaultDevice()!
let parsum = device.newDefaultLibrary()!.newFunctionWithName("parsum")!
let pipeline = try! device.newComputePipelineStateWithFunction(parsum)
var dataCount = CUnsignedInt(count)
var elementsPerSumC = CUnsignedInt(elementsPerSum)
let resultsCount = (count + elementsPerSum - 1) / elementsPerSum // Number of individual results = count / elementsPerSum (rounded up)
let dataBuffer = device.newBufferWithBytes(data, length: strideof(DataType) * count, options: []) // Our data in a buffer (copied)
let resultsBuffer = device.newBufferWithLength(strideof(DataType) * resultsCount, options: []) // A buffer for individual results (zero initialized)
let results = UnsafeBufferPointer<DataType>(start: UnsafePointer(resultsBuffer.contents()), count: resultsCount) // Our results in convenient form to compute the actual result later
let queue = device.newCommandQueue()
let cmds = queue.commandBuffer()
let encoder = cmds.computeCommandEncoder()
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(dataBuffer, offset: 0, atIndex: 0)
encoder.setBytes(&dataCount, length: sizeofValue(dataCount), atIndex: 1)
encoder.setBuffer(resultsBuffer, offset: 0, atIndex: 2)
encoder.setBytes(&elementsPerSumC, length: sizeofValue(elementsPerSumC), atIndex: 3)
// We have to calculate the sum `resultCount` times => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up) because each threadgroup will process `threadExecutionWidth` threads
let threadgroupsPerGrid = MTLSize(width: (resultsCount + pipeline.threadExecutionWidth - 1) / pipeline.threadExecutionWidth, height: 1, depth: 1)
// Here we set that each threadgroup should process `threadExecutionWidth` threads, the only important thing for performance is that this number is a multiple of `threadExecutionWidth` (here 1 times)
let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)
encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
encoder.endEncoding()
var result : DataType = 0
cmds.commit()
cmds.waitUntilCompleted()
for elem in results {
result += elem
}
return result
}
func sumParralel1(data : Array<DataType>) -> UnsafeBufferPointer<DataType> {
let count = data.count
let elementsPerSum: Int = Int(sqrt(Double(count)))
let device = MTLCreateSystemDefaultDevice()!
let parsum = device.newDefaultLibrary()!.newFunctionWithName("parsum")!
let pipeline = try! device.newComputePipelineStateWithFunction(parsum)
var dataCount = CUnsignedInt(count)
var elementsPerSumC = CUnsignedInt(elementsPerSum)
let resultsCount = (count + elementsPerSum - 1) / elementsPerSum // Number of individual results = count / elementsPerSum (rounded up)
let dataBuffer = device.newBufferWithBytes(data, length: strideof(DataType) * count, options: []) // Our data in a buffer (copied)
let resultsBuffer = device.newBufferWithLength(strideof(DataType) * resultsCount, options: []) // A buffer for individual results (zero initialized)
let results = UnsafeBufferPointer<DataType>(start: UnsafePointer(resultsBuffer.contents()), count: resultsCount) // Our results in convenient form to compute the actual result later
let queue = device.newCommandQueue()
let cmds = queue.commandBuffer()
let encoder = cmds.computeCommandEncoder()
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(dataBuffer, offset: 0, atIndex: 0)
encoder.setBytes(&dataCount, length: sizeofValue(dataCount), atIndex: 1)
encoder.setBuffer(resultsBuffer, offset: 0, atIndex: 2)
encoder.setBytes(&elementsPerSumC, length: sizeofValue(elementsPerSumC), atIndex: 3)
// We have to calculate the sum `resultCount` times => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up) because each threadgroup will process `threadExecutionWidth` threads
let threadgroupsPerGrid = MTLSize(width: (resultsCount + pipeline.threadExecutionWidth - 1) / pipeline.threadExecutionWidth, height: 1, depth: 1)
// Here we set that each threadgroup should process `threadExecutionWidth` threads, the only important thing for performance is that this number is a multiple of `threadExecutionWidth` (here 1 times)
let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)
encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
encoder.endEncoding()
cmds.commit()
cmds.waitUntilCompleted()
return results
}
func sumParallel3(data : Array<DataType>) -> DataType {
var results = sumParralel1(data)
repeat {
results = sumParralel1(Array(results))
} while results.count >= 100
var result : DataType = 0
for elem in results {
result += elem
}
return result
}
func sumParallel4(data : Array<DataType>) -> DataType {
let queue = NSOperationQueue()
queue.maxConcurrentOperationCount = 4
var a0 : DataType = 0
var a1 : DataType = 0
var a2 : DataType = 0
var a3 : DataType = 0
let op0 = NSBlockOperation( block : {
for i in 0..<(data.count/4) {
a0 = a0 + data[i]
}
})
let op1 = NSBlockOperation( block : {
for i in (data.count/4)..<(data.count/2) {
a1 = a1 + data[i]
}
})
let op2 = NSBlockOperation( block : {
for i in (data.count/2)..<(3 * data.count/4) {
a2 = a2 + data[i]
}
})
let op3 = NSBlockOperation( block : {
for i in (3 * data.count/4)..<(data.count) {
a3 = a3 + data[i]
}
})
queue.addOperation(op0)
queue.addOperation(op1)
queue.addOperation(op2)
queue.addOperation(op3)
queue.suspended = false
queue.waitUntilAllOperationsAreFinished()
let aaa: DataType = a0 + a1 + a2 + a3
return aaa
}
}
And I have a shader that looks like this:
kernel void parsum(const device DataType* data [[ buffer(0) ]],
const device uint& dataLength [[ buffer(1) ]],
device DataType* sums [[ buffer(2) ]],
const device uint& elementsPerSum [[ buffer(3) ]],
const uint tgPos [[ threadgroup_position_in_grid ]],
const uint tPerTg [[ threads_per_threadgroup ]],
const uint tPos [[ thread_position_in_threadgroup ]]) {
uint resultIndex = tgPos * tPerTg + tPos; // This is the index of the individual result, this var is unique to this thread
uint dataIndex = resultIndex * elementsPerSum; // Where the summation should begin
uint endIndex = dataIndex + elementsPerSum < dataLength ? dataIndex + elementsPerSum : dataLength; // The index where summation should end
for (; dataIndex < endIndex; dataIndex++)
sums[resultIndex] += data[dataIndex];
}
On my surprise function sumParallel4 is the fastest, which I thought it shouldn't be. I noticed that when I call functions sumParralel and sumParallel3, the first function is always slower even if I change the order of function. (So if I call sumParralel first this is slower, if I call sumParallel3 this is slower.).
Why is this? Why is sumParallel3 not a lot faster than sumParallel ? Why is sumParallel4 the fastest, although it is calculated on CPU?
How can I update my GPU function with posix_memalign ? I know it should work faster because it would have shared memory between GPU and CPU, but I don't know witch array should be allocated this way (data or result) and how can I allocate data with posix_memalign if data is parameter passed in function?
In running these tests on an iPhone 6, I saw the Metal version run between 3x slower and 2x faster than the naive CPU summation. With the modifications I describe below, it was consistently faster.
I found that a lot of the cost in running the Metal version could be attributed not merely to the allocation of the buffers, though that was significant, but also to the first-time creation of the device and compute pipeline state. These are actions you'd normally perform once at application initialization, so it's not entirely fair to include them in the timing.
It should also be noted that if you're running these tests through Xcode with the Metal validation layer and GPU frame capture enabled, that has a significant run-time cost and will skew the results in the CPU's favor.
With those caveats, here's how you might use posix_memalign to allocate memory that can be used to back a MTLBuffer. The trick is to ensure that the memory you request is in fact page-aligned (i.e. its address is a multiple of getpagesize()), which may entail rounding up the amount of memory beyond how much you actually need to store your data:
let dataCount = 1_000_000
let dataSize = dataCount * strideof(DataType)
let pageSize = Int(getpagesize())
let pageCount = (dataSize + (pageSize - 1)) / pageSize
var dataPointer: UnsafeMutablePointer<Void> = nil
posix_memalign(&dataPointer, pageSize, pageCount * pageSize)
let data = UnsafeMutableBufferPointer(start: UnsafeMutablePointer<DataType>(dataPointer),
count: (pageCount * pageSize) / strideof(DataType))
for i in 0..<dataCount {
data[i] = 200
}
This does require making data an UnsafeMutableBufferPointer<DataType>, rather than an [DataType], since Swift's Array allocates its own backing store. You'll also need to pass along the count of data items to operate on, since the count of the mutable buffer pointer has been rounded up to make the buffer page-aligned.
To actually create a MTLBuffer backed with this data, use the newBufferWithBytesNoCopy(_:length:options:deallocator:) API. It's crucial that, once again, the length you provide is a multiple of the page size; otherwise this method returns nil:
let roundedUpDataSize = strideof(DataType) * data.count
let dataBuffer = device.newBufferWithBytesNoCopy(data.baseAddress, length: roundedUpDataSize, options: [], deallocator: nil)
Here, we don't provide a deallocator, but you should free the memory when you're done using it, by passing the baseAddress of the buffer pointer to free().
I am writing an app in Swift which employs the Scandit barcode scanning SDK. The SDK permits you to access camera frames directly and provides the frame as a CMSampleBuffer. They provide documentation in Objective-C, which I am having trouble getting to work in Swift. I do not know if the problem is in porting the code, or if there is something amiss with the sample buffer itself, perhaps due to a change in Core Media since their documentation was generated.
Their API exposes the frame as follows (Objective-C):
interface YourViewController () <SBSProcessFrameDelegate>
...
- (void)barcodePicker:(SBSBarcodePicker*)barcodePicker
didProcessFrame:(CMSampleBufferRef)frame
session:(SBSScanSession*)session {
// Process the frame yourself.
}
Building from several answers here on SO, I attempt to process the frame with:
let imageBuffer = CMSampleBufferGetImageBuffer(frame)!
CVPixelBufferLockBaseAddress(imageBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
let width = CVPixelBufferGetWidth(imageBuffer)
let height = CVPixelBufferGetHeight(imageBuffer)
let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.NoneSkipFirst.rawValue | CGBitmapInfo.ByteOrder32Little.rawValue)
let context = CGBitmapContextCreate(baseAddress, width, height, 8, bytesPerRow, colorSpace, bitmapInfo.rawValue)
let quartzImage = CGBitmapContextCreateImage(context)
CVPixelBufferUnlockBaseAddress(imageBuffer,0)
let image = UIImage(CGImage: quartzImage!)
But, this fails with:
Jan 29 09:01:30 Scandit[1308] <Error>: CGBitmapContextCreate: invalid data bytes/row: should be at least 7680 for 8 integer bits/component, 3 components, kCGImageAlphaNoneSkipFirst.
Jan 29 09:01:30 Scandit[1308] <Error>: CGBitmapContextCreateImage: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
fatal error: unexpectedly found nil while unwrapping an Optional value
The fatal error is in attempting to resolve a UIImage from quartzImage.
The width, height, and bytesPerRow are (at the base address):
Width: 1920
Height: 1080
Bytes per row: 2904
As passed from the delegate, here is what the buffer contains according to CMSampleBufferGetFormatDescription(frame):
Optional(<CMVideoFormatDescription 0x1447dafa0 [0x1a1864b68]> {
mediaType:'vide'
mediaSubType:'420f'
mediaSpecific: {
codecType: '420f' dimensions: 1920 x 1080
}
extensions: {<CFBasicHash 0x1447dba10 [0x1a1864b68]>{type = immutable dict, count = 6,
entries =>
0 : <CFString 0x19d28b678 [0x1a1864b68]>{contents = "CVImageBufferYCbCrMatrix"} = <CFString 0x19d28b6b8 [0x1a1864b68]>{contents = "ITU_R_601_4"}
1 : <CFString 0x19d28b7d8 [0x1a1864b68]>{contents = "CVImageBufferTransferFunction"} = <CFString 0x19d28b698 [0x1a1864b68]>{contents = "ITU_R_709_2"}
2 : <CFString 0x19d2b65c0 [0x1a1864b68]>{contents = "CVBytesPerRow"} = <CFNumber 0xb00000000000b582 [0x1a1864b68]>{value = +2904, type = kCFNumberSInt32Type}
3 : <CFString 0x19d2b6640 [0x1a1864b68]>{contents = "Version"} = <CFNumber 0xb000000000000022 [0x1a1864b68]>{value = +2, type = kCFNumberSInt32Type}
5 : <CFString 0x19d28b758 [0x1a1864b68]>{contents = "CVImageBufferColorPrimaries"} = <CFString 0x19d28b698 [0x1a1864b68]>{contents = "ITU_R_709_2"}
6 : <CFString 0x19d28b818 [0x1a1864b68]>{contents = "CVImageBufferChromaLocationTopField"} = <CFString 0x19d28b878 [0x1a1864b68]>{contents = "Center"}
}
}
})
I realize there may be multiple "planes" here, but even with:
let pixelBufferBytesPerRow0 = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0)
let pixelBufferBytesPerRow1 = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 1)
Gives:
Pixel buffer bytes per row (Plane 0): 1920
Pixel buffer bytes per row (Plane 1): 1920
I don't understand that discrepancy.
I also attempted to process each pixel individually as it is clear the buffer contains some manner of YCbCr, but it fails every way I have tried. The Scandit API suggest (Objective-C):
// Get the buffer info for the YCbCrBiPlanar format.
void *baseAddress = CVPixelBufferGetBaseAddress(imageBuffer);
CVPlanarPixelBufferInfo_YCbCrBiPlanar *bufferInfo = (CVPlanarPixelBufferInfo_YCbCrBiPlanar *)baseAddress;
But, I cannot find a Swift implementation that permits access to the buffer info using CVPlanarPixelBufferInfo... everything I have tried fails, so I am unable to determine the offset for "Y", "Cr", etc.
How can I access the pixel data in the buffer? Is this a problem with the CMSampleBuffer the SDK is passing, a problem with iOS9, or both?
Working from Codo's "hints" and integrating with Objective-C code in the Scandit documentation, I worked out a solution in Swift. Though I accepted Codo's answer as it helped tremendously, I'm also answering my own question in the hopes that a complete solution would help someone in the future:
let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let chromaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1)
let width = CVPixelBufferGetWidth(pixelBuffer)
let height = CVPixelBufferGetHeight(pixelBuffer)
let lumaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
let chromaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1)
let lumaBuffer = UnsafeMutablePointer<UInt8>(lumaBaseAddress)
let chromaBuffer = UnsafeMutablePointer<UInt8>(chromaBaseAddress)
var rgbaImage = [UInt8](count: 4*width*height, repeatedValue: 0)
for var x = 0; x < width; x++ {
for var y = 0; y < height; y++ {
let lumaIndex = x+y*lumaBytesPerRow
let chromaIndex = (y/2)*chromaBytesPerRow+(x/2)*2
let yp = lumaBuffer[lumaIndex]
let cb = chromaBuffer[chromaIndex]
let cr = chromaBuffer[chromaIndex+1]
let ri = Double(yp) + 1.402 * (Double(cr) - 128)
let gi = Double(yp) - 0.34414 * (Double(cb) - 128) - 0.71414 * (Double(cr) - 128)
let bi = Double(yp) + 1.772 * (Double(cb) - 128)
let r = UInt8(min(max(ri,0), 255))
let g = UInt8(min(max(gi,0), 255))
let b = UInt8(min(max(bi,0), 255))
rgbaImage[(x + y * width) * 4] = b
rgbaImage[(x + y * width) * 4 + 1] = g
rgbaImage[(x + y * width) * 4 + 2] = r
rgbaImage[(x + y * width) * 4 + 3] = 255
}
}
let colorSpace = CGColorSpaceCreateDeviceRGB()
let dataProvider: CGDataProviderRef = CGDataProviderCreateWithData(nil, rgbaImage, 4 * width * height, nil)!
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.NoneSkipFirst.rawValue | CGBitmapInfo.ByteOrder32Little.rawValue)
let cgImage: CGImageRef = CGImageCreate(width, height, 8, 32, width * 4, colorSpace!, bitmapInfo, dataProvider, nil, true, CGColorRenderingIntent.RenderingIntentDefault)!
let image: UIImage = UIImage(CGImage: cgImage)
CVPixelBufferUnlockBaseAddress(pixelBuffer,0)
Despite iterating through the entire 8.3MP image, the code executes very quickly. I freely admit that I don't have a deep understanding of Core Media frameworks, but I believe this means the code is executing on the GPU. But, I would appreciate any comments on the code to make it more efficient, or to improve the "Swiftness" as I am completely an amateur.
This is not a complete answer, just some hints:
Scandit uses the YCbCrBiPlanar format. It has a Y byte for each pixel and a Cb and a Cr byte for each group of 2x2 pixels. The Y values are on the first plane, the Cb and Cr values on the second plane.
If the image is w x h pixels large, then the first plane contains h rows of w bytes (and maybe some padding for each line).
The second plane contains h / 2 lines of w / 2 pairs of byte. Each pair consists of a Cb and Cr value. Again each line might have some padding at the end.
So the value of Y for the pixel at position (x, y) can be found at the address:
Y: baseAddressPlane1 + y * bytesPerRowPlane1 + x
And the value Cb and Cr for the pixel at position (x, y) can be found at the address:
Cb: baseAddressPlane2 + (y / 2) * bytesPerRowPlan2 + (x / 2) * 2
Cr: baseAddressPlane2 + (y / 2) * bytesPerRowPlan2 + (x / 2) * 2 + 1
The divisions by 2 are integer divisions that discard the fractional part.
I am trying to get the CIColorCube filter working. However the Apple documents only provide a poorly explained reference example here:
// Allocate memory
const unsigned int size = 64;
float *cubeData = (float *)malloc (size * size * size * sizeof (float) * 4);
float rgb[3], hsv[3], *c = cubeData;
// Populate cube with a simple gradient going from 0 to 1
for (int z = 0; z < size; z++){
rgb[2] = ((double)z)/(size-1); // Blue value
for (int y = 0; y < size; y++){
rgb[1] = ((double)y)/(size-1); // Green value
for (int x = 0; x < size; x ++){
rgb[0] = ((double)x)/(size-1); // Red value
// Convert RGB to HSV
// You can find publicly available rgbToHSV functions on the Internet
rgbToHSV(rgb, hsv);
// Use the hue value to determine which to make transparent
// The minimum and maximum hue angle depends on
// the color you want to remove
float alpha = (hsv[0] > minHueAngle && hsv[0] < maxHueAngle) ? 0.0f: 1.0f;
// Calculate premultiplied alpha values for the cube
c[0] = rgb[0] * alpha;
c[1] = rgb[1] * alpha;
c[2] = rgb[2] * alpha;
c[3] = alpha;
c += 4; // advance our pointer into memory for the next color value
}
}
}
// Create memory with the cube data
NSData *data = [NSData dataWithBytesNoCopy:cubeData
length:cubeDataSize
freeWhenDone:YES];
CIColorCube *colorCube = [CIFilter filterWithName:#"CIColorCube"];
[colorCube setValue:#(size) forKey:#"inputCubeDimension"];
// Set data for cube
[colorCube setValue:data forKey:#"inputCubeData"];
So I have attempted to translate this over to Swift with the following:
var filter = CIFilter(name: "CIColorCube")
filter.setValue(ciImage, forKey: kCIInputImageKey)
filter.setDefaults()
var size: UInt = 64
var floatSize = UInt(sizeof(Float))
var cubeDataSize:size_t = size * size * size * floatSize * 4
var colorCubeData:Array<Float> = [
0,0,0,1,
0,0,0,1,
0,0,0,1,
0,0,0,1,
0,0,0,1,
0,0,0,1,
0,0,0,1,
0,0,0,1
]
var cubeData:NSData = NSData(bytesNoCopy: colorCubeData, length: cubeDataSize)
However I get an error when trying to create the cube data:
"Extra argument 'bytesNoCopy' in call"
Basically I am creating the cubeData wrong. Can you advise me on how to properly create the cubeData object in Swift?
Thanks!
Looks like you are after the chroma key filter recipe described here. Here's some code that works. You get a filter for the color you want to make transparent, described by its HSV angle:
func RGBtoHSV(r : Float, g : Float, b : Float) -> (h : Float, s : Float, v : Float) {
var h : CGFloat = 0
var s : CGFloat = 0
var v : CGFloat = 0
let col = UIColor(red: CGFloat(r), green: CGFloat(g), blue: CGFloat(b), alpha: 1.0)
col.getHue(&h, saturation: &s, brightness: &v, alpha: nil)
return (Float(h), Float(s), Float(v))
}
func colorCubeFilterForChromaKey(hueAngle: Float) -> CIFilter {
let hueRange: Float = 60 // degrees size pie shape that we want to replace
let minHueAngle: Float = (hueAngle - hueRange/2.0) / 360
let maxHueAngle: Float = (hueAngle + hueRange/2.0) / 360
let size = 64
var cubeData = [Float](repeating: 0, count: size * size * size * 4)
var rgb: [Float] = [0, 0, 0]
var hsv: (h : Float, s : Float, v : Float)
var offset = 0
for z in 0 ..< size {
rgb[2] = Float(z) / Float(size) // blue value
for y in 0 ..< size {
rgb[1] = Float(y) / Float(size) // green value
for x in 0 ..< size {
rgb[0] = Float(x) / Float(size) // red value
hsv = RGBtoHSV(r: rgb[0], g: rgb[1], b: rgb[2])
// the condition checking hsv.s may need to be removed for your use-case
let alpha: Float = (hsv.h > minHueAngle && hsv.h < maxHueAngle && hsv.s > 0.5) ? 0 : 1.0
cubeData[offset] = rgb[0] * alpha
cubeData[offset + 1] = rgb[1] * alpha
cubeData[offset + 2] = rgb[2] * alpha
cubeData[offset + 3] = alpha
offset += 4
}
}
}
let b = cubeData.withUnsafeBufferPointer { Data(buffer: $0) }
let data = b as NSData
let colorCube = CIFilter(name: "CIColorCube", withInputParameters: [
"inputCubeDimension": size,
"inputCubeData": data
])
return colorCube!
}
Then to get your filter call
let chromaKeyFilter = colorCubeFilterForChromaKey(hueAngle: 120)
I used 120 for your standard green screen.
I believe you want to use NSData(bytes: UnsafePointer<Void>, length: Int) instead of NSData(bytesNoCopy: UnsafeMutablePointer<Void>, length: Int). Make that change and calculate the length in the following way and you should be up and running.
let colorCubeData: [Float] = [
0, 0, 0, 1,
1, 0, 0, 1,
0, 1, 0, 1,
1, 1, 0, 1,
0, 0, 1, 1,
1, 0, 1, 1,
0, 1, 1, 1,
1, 1, 1, 1
]
let cubeData = NSData(bytes: colorCubeData, length: colorCubeData.count * sizeof(Float))