My custom Metal image filter is slow. How can I make it faster? - ios

I've seen a lot of other people's online tutorials that manage to filter an image in the 0.0X-second range. Meanwhile, my code here takes 1.09 seconds to filter an image (just to reduce its brightness by half).
Edit after first comment:
Time was measured with two methods:
Date() time interval, from when the "apply filter" button is tapped until the apply-filter function finishes running
building the app on an iPhone and counting manually with the timer on my watch
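As an aside (not part of the original post): both of those measurements capture wall-clock time, which in the code below also includes texture creation and the CPU readback, not just the kernel itself. A minimal sketch of measuring only the GPU execution time, assuming commandBuffer is the MTLCommandBuffer being committed and iOS 10.3 or later:
// register the handler before committing; gpuStartTime/gpuEndTime are in seconds
commandBuffer.addCompletedHandler { buffer in
    let gpuSeconds = buffer.gpuEndTime - buffer.gpuStartTime
    print("GPU execution time: \(gpuSeconds * 1000) ms")
}
commandBuffer.commit()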
Since I'm new to Metal and kernel programming, I don't really know the difference between my code and those tutorials that achieve a faster result, or which part of my code could be improved or swapped for a different approach to make it a lot faster.
Here's my kernel code:
#include <metal_stdlib>
using namespace metal;

// Reads each pixel, scales the color channels down (r and g by 4, b by 2),
// and writes the result with full alpha to the output texture.
kernel void black(
    texture2d<float, access::write> outTexture [[texture(0)]],
    texture2d<float, access::read> inTexture [[texture(1)]],
    uint2 id [[thread_position_in_grid]]) {
    float3 val = inTexture.read(id).rgb;
    float r = val.r / 4;
    float g = val.g / 4;
    float b = val.b / 2;
    float4 out = float4(r, g, b, 1.0);
    outTexture.write(out.rgba, id);
}
This is my Swift code:
import Metal
import MetalKit
// UIImage -> CGImage -> MTLTexture -> COMPUTE HAPPENS |
// UIImage <- CGImage <- MTLTexture <--
class Filter {
var device: MTLDevice
var defaultLib: MTLLibrary?
var grayscaleShader: MTLFunction?
var commandQueue: MTLCommandQueue?
var commandBuffer: MTLCommandBuffer?
var commandEncoder: MTLComputeCommandEncoder?
var pipelineState: MTLComputePipelineState?
var inputImage: UIImage
var height, width: Int
// most devices have a limit of 512 threads per group
let threadsPerBlock = MTLSize(width: 32, height: 32, depth: 1)
init(){
print("initialized")
self.device = MTLCreateSystemDefaultDevice()!
print(device)
// changes: I did try do/catch, and used the bundle parameter when making the default library
let frameworkBundle = Bundle(for: type(of: self))
print(frameworkBundle)
self.defaultLib = device.makeDefaultLibrary()
self.grayscaleShader = defaultLib?.makeFunction(name: "black")
self.commandQueue = self.device.makeCommandQueue()
self.commandBuffer = self.commandQueue?.makeCommandBuffer()
self.commandEncoder = self.commandBuffer?.makeComputeCommandEncoder()
//ERROR HERE
if let shader = grayscaleShader {
print("in")
self.pipelineState = try? self.device.makeComputePipelineState(function: shader)
} else { fatalError("unable to make compute pipeline") }
self.inputImage = UIImage(named: "stockImage")!
self.height = Int(self.inputImage.size.height)
self.width = Int(self.inputImage.size.width)
}
func getCGImage(from uiimg: UIImage) -> CGImage? {
UIGraphicsBeginImageContext(uiimg.size)
uiimg.draw(in: CGRect(origin: .zero, size: uiimg.size))
let contextImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
return contextImage?.cgImage
}
func getMTLTexture(from cgimg: CGImage) -> MTLTexture {
let textureLoader = MTKTextureLoader(device: self.device)
do{
let texture = try textureLoader.newTexture(cgImage: cgimg, options: nil)
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: texture.pixelFormat, width: width, height: height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
return texture
} catch {
fatalError("Couldn't convert CGImage to MTLtexture")
}
}
func getCGImage(from mtlTexture: MTLTexture) -> CGImage? {
var data = Array<UInt8>(repeatElement(0, count: 4*width*height))
mtlTexture.getBytes(&data,
bytesPerRow: 4*width,
from: MTLRegionMake2D(0, 0, width, height),
mipmapLevel: 0)
let bitmapInfo = CGBitmapInfo(rawValue: (CGBitmapInfo.byteOrder32Big.rawValue | CGImageAlphaInfo.premultipliedLast.rawValue))
let colorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGContext(data: &data,
width: width,
height: height,
bitsPerComponent: 8,
bytesPerRow: 4*width,
space: colorSpace,
bitmapInfo: bitmapInfo.rawValue)
return context?.makeImage()
}
func getUIImage(from cgimg: CGImage) -> UIImage? {
return UIImage(cgImage: cgimg)
}
func getEmptyMTLTexture() -> MTLTexture? {
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: MTLPixelFormat.rgba8Unorm,
width: width,
height: height,
mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
return self.device.makeTexture(descriptor: textureDescriptor)
}
func getInputMTLTexture() -> MTLTexture? {
if let inputImage = getCGImage(from: self.inputImage) {
return getMTLTexture(from: inputImage)
}
else { fatalError("Unable to convert Input image to MTLTexture") }
}
func getBlockDimensions() -> MTLSize {
// integer division truncates, so if width/height aren't multiples of 32
// the rightmost/bottom pixels fall outside the dispatched grid
let blockWidth = width / self.threadsPerBlock.width
let blockHeight = height / self.threadsPerBlock.height
return MTLSizeMake(blockWidth, blockHeight, 1)
}
func applyFilter() -> UIImage? {
print("start")
let date = Date()
print(date)
if let encoder = self.commandEncoder, let buffer = self.commandBuffer,
let outputTexture = getEmptyMTLTexture(), let inputTexture = getInputMTLTexture() {
encoder.setTextures([outputTexture, inputTexture], range: 0..<2)
encoder.setComputePipelineState(self.pipelineState!)
encoder.dispatchThreadgroups(self.getBlockDimensions(), threadsPerThreadgroup: threadsPerBlock)
encoder.endEncoding()
buffer.commit()
buffer.waitUntilCompleted()
guard let outputImage = getCGImage(from: outputTexture) else { fatalError("Couldn't obtain CGImage from MTLTexture") }
print("stop")
let date2 = Date()
print(date2.timeIntervalSince(date))
return getUIImage(from: outputImage)
} else { fatalError("optional unwrapping failed") }
}
}

In case someone still needs the answer: I found a different approach, which is to make it a custom CIFilter. It works pretty fast and is super easy to understand!
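For illustration, a minimal sketch of what such a custom CIFilter could look like, reusing the same channel math as the Metal kernel above (the class and kernel names here are illustrative, not from the original answer):
import CoreImage
class HalfBrightnessFilter: CIFilter {
    var inputImage: CIImage?
    // CIColorKernel written in the Core Image kernel language
    private static let kernel = CIColorKernel(source:
        "kernel vec4 halfBrightness(__sample s) {" +
        "  return vec4(s.r / 4.0, s.g / 4.0, s.b / 2.0, 1.0);" +
        "}")
    override var outputImage: CIImage? {
        guard let input = inputImage, let kernel = HalfBrightnessFilter.kernel else { return nil }
        return kernel.apply(extent: input.extent, arguments: [input])
    }
}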

You are using UIImage and CGImage. These objects are stored in CPU memory.
You need to implement the code using just CIImage or MTLTexture.
These objects are stored in GPU memory and give the best performance.
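A rough sketch of that idea, assuming a Metal-backed CIContext that is created once and reused (CIColorControls is used here only as an example filter):
import CoreImage
import Metal
import UIKit
// Create the device and context once; creating a CIContext per image is expensive.
let device = MTLCreateSystemDefaultDevice()!
let ciContext = CIContext(mtlDevice: device)
func reduceBrightness(of image: UIImage) -> UIImage? {
    guard let input = CIImage(image: image),
          let filter = CIFilter(name: "CIColorControls") else { return nil }
    filter.setValue(input, forKey: kCIInputImageKey)
    filter.setValue(-0.25, forKey: kCIInputBrightnessKey) // darken
    guard let output = filter.outputImage,
          let cgImage = ciContext.createCGImage(output, from: output.extent) else { return nil }
    return UIImage(cgImage: cgImage)
}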

Related

How do I get the distance of a specific coordinate from the screen size depthMap of ARDepthData?

I am trying to get the distance of a specific coordinate from a depthMap resized to the screen size, but it is not working.
I have tried to implement the following steps:
convert the depthMap to a CIImage, then resize the image to the orientation and size of the screen using an affine transformation
convert the transformed image to a screen-sized CVPixelBuffer
get the distance in meters stored in the CVPixelBuffer from a one-dimensional array, indexed by width * y + x for the coordinate (x, y)
I have implemented the above procedure, but I cannot get the appropriate index from the one-dimensional array. What should I do?
The code for the procedure is shown below.
1.
let depthMap = depthData.depthMap
// convert the depthMap to CIImage
let image = CIImage(cvPixelBuffer: depthMap)
let imageSize = CGSize(width: depthMap.width, height: depthMap.height)
// 1) Convert the captured image to normalized 0.0-1.0 coordinates
let normalizeTransform = CGAffineTransform(scaleX: 1.0/imageSize.width, y: 1.0/imageSize.height)
// 2) "Flip the Y axis (for some mysterious reason this is only necessary in portrait mode)", so transform the coordinates in the portrait case.
// Not only the Y axis but also the X axis needs to be flipped.
let interfaceOrientation = self.arView.window!.windowScene!.interfaceOrientation
let flipTransform = (interfaceOrientation.isPortrait) ? CGAffineTransform(scaleX: -1, y: -1).translatedBy(x: -1, y: -1) : .identity
// 3) Move to the screen's orientation and position on the captured image
let displayTransform = frame.displayTransform(for: interfaceOrientation, viewportSize: arView.bounds.size)
// 4) Convert from the 0.0-1.0 coordinate system to the screen coordinate system
let toViewPortTransform = CGAffineTransform(scaleX: arView.bounds.size.width, y: arView.bounds.size.height)
// 5) Apply transforms 1-4 and crop the transformed image to the screen size
let transformedImage = image.transformed(by: normalizeTransform.concatenating(flipTransform).concatenating(displayTransform).concatenating(toViewPortTransform)).cropped(to: arView.bounds)
// convert the converted image to a screen-sized CVPixelBuffer
if let convertDepthMap = transformedImage.pixelBuffer(cgSize: arView.bounds.size) {
previewImage.image = transformedImage.toUIImage()
DispatchQueue.main.async {
self.processDepthData(convertDepthMap)
}
}
// The process of acquiring CVPixelBuffer is implemented in extension
extension CIImage {
func toUIImage() -> UIImage {
UIImage(ciImage: self)
}
func pixelBuffer(cgSize size:CGSize) -> CVPixelBuffer? {
var pixelBuffer: CVPixelBuffer?
let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
let width:Int = Int(size.width)
let height:Int = Int(size.height)
CVPixelBufferCreate(kCFAllocatorDefault,
width,
height,
kCVPixelFormatType_DepthFloat32,
attrs,
&pixelBuffer)
// put bytes into pixelBuffer
let context = CIContext()
context.render(self, to: pixelBuffer!)
return pixelBuffer
}
}
private func processDepthData(_ depthMap: CVPixelBuffer) {
CVPixelBufferLockBaseAddress(depthMap, .readOnly)
let width = CVPixelBufferGetWidth(depthMap)
let height = CVPixelBufferGetHeight(depthMap)
if let baseAddress = CVPixelBufferGetBaseAddress(depthMap) {
let mutablePointer = baseAddress.bindMemory(to: Float32.self, capacity: width*height)
let bufferPointer = UnsafeBufferPointer(start: mutablePointer, count: width*height)
let depthArray = Array(bufferPointer)
CVPixelBufferUnlockBaseAddress(depthMap, .readOnly)
// index = width * y + x, trying to get the distance in meters for the coordinate (300, 100), but it gets the distance for another coordinate
print(depthArray[width * 100 + 300])
}
}
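For reference, a stride-aware way to read a single depth value looks roughly like this (a sketch, not from the original question; it assumes the buffer holds kCVPixelFormatType_DepthFloat32 data, and accounts for CVPixelBufferGetBytesPerRow possibly being larger than width * 4):
func depthValue(atX x: Int, y: Int, in depthMap: CVPixelBuffer) -> Float32? {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }
    guard x < CVPixelBufferGetWidth(depthMap), y < CVPixelBufferGetHeight(depthMap),
          let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
    // advance to the start of row y using the real row stride, then index column x
    let rowStart = base.advanced(by: y * CVPixelBufferGetBytesPerRow(depthMap))
    return rowStart.assumingMemoryBound(to: Float32.self)[x]
}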

Tensorflow lite image output different between python and iOS/Android

I converted a Keras model to TF Lite; the output dimension is (1, 256, 256, 1).
The result in Python is correct, but when I try to construct the image on iOS (Swift), the result is wrong.
Here is the code that I use to construct a UIImage from the list of outputs.
// helper function
---------------------------------------
// MARK: - Extensions
extension Data {
init<T>(copyingBufferOf array: [T]) {
self = array.withUnsafeBufferPointer(Data.init)
}
/// Convert a Data instance to Array representation.
func toArray<T>(type: T.Type) -> [T] where T: ExpressibleByIntegerLiteral {
var array = [T](repeating: 0, count: self.count/MemoryLayout<T>.stride)
_ = array.withUnsafeMutableBytes { copyBytes(to: $0) }
return array
}
}
func imageFromSRGBColorArray(pixels: [UInt32], width: Int, height: Int) -> UIImage?
{
guard width > 0 && height > 0 else { return nil }
guard pixels.count == width * height else { return nil }
// Make a mutable copy
var data = pixels
// Convert array of pixels to a CGImage instance.
let cgImage = data.withUnsafeMutableBytes { (ptr) -> CGImage in
let ctx = CGContext(
data: ptr.baseAddress,
width: width,
height: height,
bitsPerComponent: 8,
bytesPerRow: MemoryLayout<UInt32>.size * width,
space: CGColorSpace(name: CGColorSpace.sRGB)!,
bitmapInfo: CGBitmapInfo.byteOrder32Little.rawValue
+ CGImageAlphaInfo.premultipliedFirst.rawValue
)!
return ctx.makeImage()!
}
// Convert the CGImage instance to an UIImage instance.
return UIImage(cgImage: cgImage)
}
let results = outputTensor.data.toArray(type: UInt32.self)
let maskImage = imageFromSRGBColorArray(pixels: results, width: 256, height: 256)
The result I get is completely wrong compared to Python.
I think the function imageFromSRGBColorArray is not correct.
Can anyone help me figure out the problem?
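For comparison, here is a rough sketch (not from the original post) of how a single-channel output could be turned into a grayscale image, under the assumption that the (1, 256, 256, 1) tensor actually contains Float32 values in 0...1 rather than packed UInt32 pixels:
import UIKit
func grayscaleImage(from floats: [Float32], width: Int, height: Int) -> UIImage? {
    guard floats.count == width * height else { return nil }
    // scale 0...1 floats to 0...255 bytes
    let pixels = floats.map { UInt8(max(0, min(255, $0 * 255))) }
    guard let provider = CGDataProvider(data: Data(pixels) as CFData),
          let cgImage = CGImage(width: width, height: height,
                                bitsPerComponent: 8, bitsPerPixel: 8,
                                bytesPerRow: width,
                                space: CGColorSpaceCreateDeviceGray(),
                                bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
                                provider: provider, decode: nil,
                                shouldInterpolate: false, intent: .defaultIntent) else { return nil }
    return UIImage(cgImage: cgImage)
}
// let floats = outputTensor.data.toArray(type: Float32.self)
// let maskImage = grayscaleImage(from: floats, width: 256, height: 256)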

Why is CoreML Prediction using over 10 times more RAM on older device?

I am using CoreML style transfer based on the torch2coreml implementation on GitHub. For purposes herein, the only change is that I have substituted my own mlmodel, which has an input/output size of 1200 pixels, for the sample mlmodels.
This works perfectly on my iPhone 7 Plus and uses a maximum of 65.11 MB of RAM. Running the identical code and identical mlmodel on an iPad Mini 2, it uses 758.87 MB of RAM before crashing with an out-of-memory error.
Memory allocations on the iPhone 7 Plus:
Memory allocations on the iPad Mini 2:
Running on the iPad Mini, there are two 200 MB and one 197.77 MB Espresso library allocations that are not present on the iPhone 7 Plus. The iPad Mini also uses a 49.39 MB allocation that the iPhone 7 Plus doesn't, and three 16.48 MB allocations versus one on the iPhone 7 Plus (see the screenshots above).
What on earth is going on, and how can I fix it?
Relevant code (download project linked above for full source):
private var inputImage = UIImage(named: "input")!
let imageSize = 1200
private let models = [
test().model
]
@IBAction func styleButtonTouched(_ sender: UIButton) {
guard let image = inputImage.scaled(to: CGSize(width: imageSize, height: imageSize), scalingMode: .aspectFit).cgImage else {
print("Could not get a CGImage")
return
}
let model = models[0] //Use my test model
toggleLoading(show: true)
DispatchQueue.global(qos: .userInteractive).async {
let stylized = self.stylizeImage(cgImage: image, model: model)
DispatchQueue.main.async {
self.toggleLoading(show: false)
self.imageView.image = UIImage(cgImage: stylized)
}
}
}
private func stylizeImage(cgImage: CGImage, model: MLModel) -> CGImage {
let input = StyleTransferInput(input: pixelBuffer(cgImage: cgImage, width: imageSize, height: imageSize))
let outFeatures = try! model.prediction(from: input)
let output = outFeatures.featureValue(for: "outputImage")!.imageBufferValue!
CVPixelBufferLockBaseAddress(output, .readOnly)
let width = CVPixelBufferGetWidth(output)
let height = CVPixelBufferGetHeight(output)
let data = CVPixelBufferGetBaseAddress(output)!
let outContext = CGContext(data: data,
width: width,
height: height,
bitsPerComponent: 8,
bytesPerRow: CVPixelBufferGetBytesPerRow(output),
space: CGColorSpaceCreateDeviceRGB(),
bitmapInfo: CGImageByteOrderInfo.order32Little.rawValue | CGImageAlphaInfo.noneSkipFirst.rawValue)!
let outImage = outContext.makeImage()!
CVPixelBufferUnlockBaseAddress(output, .readOnly)
return outImage
}
private func pixelBuffer(cgImage: CGImage, width: Int, height: Int) -> CVPixelBuffer {
var pixelBuffer: CVPixelBuffer? = nil
let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA , nil, &pixelBuffer)
if status != kCVReturnSuccess {
fatalError("Cannot create pixel buffer for image")
}
CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags.init(rawValue: 0))
let data = CVPixelBufferGetBaseAddress(pixelBuffer!)
let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
let bitmapInfo = CGBitmapInfo(rawValue: CGBitmapInfo.byteOrder32Little.rawValue | CGImageAlphaInfo.noneSkipFirst.rawValue)
let context = CGContext(data: data, width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer!), space: rgbColorSpace, bitmapInfo: bitmapInfo.rawValue)
context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
CVPixelBufferUnlockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
return pixelBuffer!
}
class StyleTransferInput : MLFeatureProvider {
/// input as color (kCVPixelFormatType_32BGRA) image buffer, 720 pixels wide by 720 pixels high
var input: CVPixelBuffer
var featureNames: Set<String> {
get {
return ["inputImage"]
}
}
func featureValue(for featureName: String) -> MLFeatureValue? {
if (featureName == "inputImage") {
return MLFeatureValue(pixelBuffer: input)
}
return nil
}
init(input: CVPixelBuffer) {
self.input = input
}
}

How to convert YUV frames (from OTVideoFrame) to CVPixelBuffer

I need to convert the YUV frames that I get from the OTVideoFrame class to a CVPixelBuffer.
This class provides an array of planes in the video frame, which contains three elements (the Y, U, and V planes) at indices 0, 1, and 2:
@property (nonatomic, retain) NSPointerArray *planes
and the format of the video frame:
@property (nonatomic, retain) OTVideoFormat *format
which contains properties like the width, height, bytesPerRow, etc. of the frame.
I need to add a filter to the image I receive in the form of an OTVideoFrame. I have already tried these answers:
How to convert from YUV to CIImage for iOS
Create CVPixelBuffer from YUV with IOSurface backed
These two links have solutions in Objective-C, but I want to do it in Swift. One of the answers in the second link was in Swift, but it lacks some information about the YUVFrame struct that the answer refers to.
The format that I receive is NV12.
Here is what I have been trying to do so far, but I don't know how to proceed next:
/**
* Calculate the size of each plane from OTVideoFrame.
*
* @param frame The frame to render.
* @return tuple containing three elements for size of each plane
*/
fileprivate func calculatePlaneSize(forFrame frame: OTVideoFrame)
-> (ySize: Int, uSize: Int, vSize: Int){
guard let frameFormat = frame.format
else {
return (0, 0 ,0)
}
let baseSize = Int(frameFormat.imageWidth * frameFormat.imageHeight) * MemoryLayout<GLubyte>.size
return (baseSize, baseSize / 4, baseSize / 4)
}
/**
* Renders a frame to the video renderer.
*
* @param frame The frame to render.
*/
func renderVideoFrame(_ frame: OTVideoFrame) {
let planeSize = calculatePlaneSize(forFrame: frame)
let yPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.ySize)
let uPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.uSize)
let vPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.vSize)
memcpy(yPlane, frame.planes?.pointer(at: 0), planeSize.ySize)
memcpy(uPlane, frame.planes?.pointer(at: 1), planeSize.uSize)
memcpy(vPlane, frame.planes?.pointer(at: 2), planeSize.vSize)
let yStride = frame.format!.bytesPerRow.object(at: 0) as! Int
// multiply chroma strides by 2 as bytesPerRow represents 2x2 subsample
let uStride = frame.format!.bytesPerRow.object(at: 1) as! Int
let vStride = frame.format!.bytesPerRow.object(at: 2) as! Int
let width = frame.format!.imageWidth
let height = frame.format!.imageHeight
var pixelBuffer: CVPixelBuffer? = nil
var err: CVReturn;
err = CVPixelBufferCreate(kCFAllocatorDefault, Int(width), Int(height), kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, nil, &pixelBuffer)
if (err != 0) {
NSLog("Error at CVPixelBufferCreate %d", err)
fatalError()
}
}
Taking guidance from those two links, I tried to create the pixel buffer, but I got stuck every time at this point because the conversion of the Objective-C code after this is not similar to what we have in Swift 3.
For those who are looking for a fast solution: I did it with Swift and Accelerate,
using vImage's YpCbCr-to-ARGB conversion (vImageConvert_420Yp8_Cb8_Cr8ToARGB8888).
import Foundation
import Accelerate
import UIKit
import OpenTok
class Accelerater{
var infoYpCbCrToARGB = vImage_YpCbCrToARGB()
init() {
_ = configureYpCbCrToARGBInfo()
}
func configureYpCbCrToARGBInfo() -> vImage_Error {
print("Configuring")
var pixelRange = vImage_YpCbCrPixelRange(Yp_bias: 0,
CbCr_bias: 128,
YpRangeMax: 255,
CbCrRangeMax: 255,
YpMax: 255,
YpMin: 1,
CbCrMax: 255,
CbCrMin: 0)
let error = vImageConvert_YpCbCrToARGB_GenerateConversion(
kvImage_YpCbCrToARGBMatrix_ITU_R_601_4!,
&pixelRange,
&infoYpCbCrToARGB,
kvImage420Yp8_Cb8_Cr8,
kvImageARGB8888,
vImage_Flags(kvImagePrintDiagnosticsToConsole))
print("Configration done \(error)")
return error
}
public func convertFrameVImageYUV(toUIImage frame: OTVideoFrame, flag: Bool) -> UIImage {
var result: UIImage? = nil
let width = frame.format?.imageWidth ?? 0
let height = frame.format?.imageHeight ?? 0
var pixelBuffer: CVPixelBuffer? = nil
_ = CVPixelBufferCreate(kCFAllocatorDefault, Int(width), Int(height), kCVPixelFormatType_32BGRA, nil, &pixelBuffer)
_ = convertFrameVImageYUV(frame, to: pixelBuffer)
var ciImage: CIImage? = nil
if let pixelBuffer = pixelBuffer {
ciImage = CIImage(cvPixelBuffer: pixelBuffer)
}
let temporaryContext = CIContext(options: nil)
var uiImage: CGImage? = nil
if let ciImage = ciImage {
uiImage = temporaryContext.createCGImage(ciImage, from: CGRect(x: 0, y: 0, width: CVPixelBufferGetWidth(pixelBuffer!), height: CVPixelBufferGetHeight(pixelBuffer!)))
}
if let uiImage = uiImage {
result = UIImage(cgImage: uiImage)
}
CVPixelBufferUnlockBaseAddress(pixelBuffer!, [])
return result!
}
func convertFrameVImageYUV(_ frame: OTVideoFrame, to pixelBufferRef: CVPixelBuffer?) -> vImage_Error{
let start = CFAbsoluteTimeGetCurrent()
if pixelBufferRef == nil {
print("No PixelBuffer refrance found")
return vImage_Error(kvImageInvalidParameter)
}
let width = frame.format?.imageWidth ?? 0
let height = frame.format?.imageHeight ?? 0
let subsampledWidth = frame.format!.imageWidth/2
let subsampledHeight = frame.format!.imageHeight/2
print("subsample height \(subsampledHeight) \(subsampledWidth)")
let planeSize = calculatePlaneSize(forFrame: frame)
print("ysize : \(planeSize.ySize) \(planeSize.uSize) \(planeSize.vSize)")
let yPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.ySize)
let uPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.uSize)
let vPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.vSize)
memcpy(yPlane, frame.planes?.pointer(at: 0), planeSize.ySize)
memcpy(uPlane, frame.planes?.pointer(at: 1), planeSize.uSize)
memcpy(vPlane, frame.planes?.pointer(at: 2), planeSize.vSize)
let yStride = frame.format!.bytesPerRow.object(at: 0) as! Int
// multiply chroma strides by 2 as bytesPerRow represents 2x2 subsample
let uStride = frame.format!.bytesPerRow.object(at: 1) as! Int
let vStride = frame.format!.bytesPerRow.object(at: 2) as! Int
var yPlaneBuffer = vImage_Buffer(data: yPlane, height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: yStride)
var uPlaneBuffer = vImage_Buffer(data: uPlane, height: vImagePixelCount(subsampledHeight), width: vImagePixelCount(subsampledWidth), rowBytes: uStride)
var vPlaneBuffer = vImage_Buffer(data: vPlane, height: vImagePixelCount(subsampledHeight), width: vImagePixelCount(subsampledWidth), rowBytes: vStride)
CVPixelBufferLockBaseAddress(pixelBufferRef!, .readOnly)
let pixelBufferData = CVPixelBufferGetBaseAddress(pixelBufferRef!)
let rowBytes = CVPixelBufferGetBytesPerRow(pixelBufferRef!)
var destinationImageBuffer = vImage_Buffer()
destinationImageBuffer.data = pixelBufferData
destinationImageBuffer.height = vImagePixelCount(height)
destinationImageBuffer.width = vImagePixelCount(width)
destinationImageBuffer.rowBytes = rowBytes
var permuteMap: [UInt8] = [3, 2, 1, 0] // BGRA
let convertError = vImageConvert_420Yp8_Cb8_Cr8ToARGB8888(&yPlaneBuffer, &uPlaneBuffer, &vPlaneBuffer, &destinationImageBuffer, &infoYpCbCrToARGB, &permuteMap, 255, vImage_Flags(kvImagePrintDiagnosticsToConsole))
CVPixelBufferUnlockBaseAddress(pixelBufferRef!, [])
yPlane.deallocate()
uPlane.deallocate()
vPlane.deallocate()
let end = CFAbsoluteTimeGetCurrent()
print("Decoding time \((end-start)*1000)")
return convertError
}
fileprivate func calculatePlaneSize(forFrame frame: OTVideoFrame)
-> (ySize: Int, uSize: Int, vSize: Int)
{
guard let frameFormat = frame.format
else {
return (0, 0 ,0)
}
let baseSize = Int(frameFormat.imageWidth * frameFormat.imageHeight) * MemoryLayout<GLubyte>.size
return (baseSize, baseSize / 4, baseSize / 4)
}
}
Performance was tested on an iPhone 7; one frame conversion takes less than a millisecond.
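A usage sketch for the class above (assuming frame is the incoming OTVideoFrame from the renderer callback):
// Create the converter once (the vImage conversion info is configured in init),
// then reuse it for every incoming frame.
let converter = Accelerater()
func renderVideoFrame(_ frame: OTVideoFrame) {
    let image = converter.convertFrameVImageYUV(toUIImage: frame, flag: false)
    // display `image` or pass it on to a filter
}
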
Here's what worked for me (I've taken your function and changed it a bit):
func createPixelBufferWithVideoFrame(_ frame: OTVideoFrame) -> CVPixelBuffer? {
if let fLock = frameLock {
fLock.lock()
let planeSize = calculatePlaneSize(forFrame: frame)
let yPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.ySize)
let uPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.uSize)
let vPlane = UnsafeMutablePointer<GLubyte>.allocate(capacity: planeSize.vSize)
memcpy(yPlane, frame.planes?.pointer(at: 0), planeSize.ySize)
memcpy(uPlane, frame.planes?.pointer(at: 1), planeSize.uSize)
memcpy(vPlane, frame.planes?.pointer(at: 2), planeSize.vSize)
let width = frame.format!.imageWidth
let height = frame.format!.imageHeight
var pixelBuffer: CVPixelBuffer? = nil
var err: CVReturn;
err = CVPixelBufferCreate(kCFAllocatorDefault, Int(width), Int(height), kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, nil, &pixelBuffer)
if (err != 0) {
NSLog("Error at CVPixelBufferCreate %d", err)
return nil
}
if let pixelBuffer = pixelBuffer {
CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let yPlaneTo = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
memcpy(yPlaneTo, yPlane, planeSize.ySize)
let uvRow: Int = planeSize.uSize*2/Int(width)
let halfWidth: Int = Int(width)/2
if let uPlaneTo = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1) {
let uvPlaneTo = uPlaneTo.bindMemory(to: GLubyte.self, capacity: Int(uvRow*halfWidth*2))
for i in 0..<uvRow {
for j in 0..<halfWidth {
let dataIndex: Int = Int(i) * Int(halfWidth) + Int(j)
let uIndex: Int = (i * Int(width)) + Int(j) * 2
let vIndex: Int = uIndex + 1
uvPlaneTo[uIndex] = uPlane[dataIndex]
uvPlaneTo[vIndex] = vPlane[dataIndex]
}
}
}
}
fLock.unlock()
return pixelBuffer
}
return nil
}
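Once the NV12 pixel buffer exists, a filter can be applied without copying back through UIImage; a rough sketch, assuming the createPixelBufferWithVideoFrame function above and a CIContext that is created once and reused:
if let pixelBuffer = createPixelBufferWithVideoFrame(frame) {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    // CISepiaTone is just an example filter
    let filtered = ciImage.applyingFilter("CISepiaTone", parameters: [kCIInputIntensityKey: 0.8])
    // render `filtered` with the shared CIContext
}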

Rendering a SceneKit scene to video output

As a primarily high-level/iOS dev, I'm interested in using SceneKit for animation projects.
I've been having fun with SceneKit for some months now. Despite it obviously being designed for 'live' interaction, I would find it incredibly useful to be able to 'render' an SKScene to video. Currently, I've been using QuickTime's screen recorder to capture video output, but (of course) the frame rate drops in doing so. Is there an alternative that allows a scene to be rendered at its own pace and output as a smooth video file?
I understand this is unlikely to be possible... Just thought I'd ask in case I was missing something lower-level!
You could use an SCNRenderer to render to a CGImage offscreen, then add the CGImage to a video stream using AVFoundation (a sketch of that AVFoundation half follows after the extension and error-check helper below).
I wrote this Swift extension for rendering into a CGImage.
public extension SCNRenderer {
public func renderToImageSize(size: CGSize, floatComponents: Bool, atTime time: NSTimeInterval) -> CGImage? {
var thumbnailCGImage: CGImage?
let width = GLsizei(size.width), height = GLsizei(size.height)
let samplesPerPixel = 4
#if os(iOS)
let oldGLContext = EAGLContext.currentContext()
let glContext = unsafeBitCast(context, EAGLContext.self)
EAGLContext.setCurrentContext(glContext)
objc_sync_enter(glContext)
#elseif os(OSX)
let oldGLContext = CGLGetCurrentContext()
let glContext = unsafeBitCast(context, CGLContextObj.self)
CGLSetCurrentContext(glContext)
CGLLockContext(glContext)
#endif
// set up the OpenGL buffers
var thumbnailFramebuffer: GLuint = 0
glGenFramebuffers(1, &thumbnailFramebuffer)
glBindFramebuffer(GLenum(GL_FRAMEBUFFER), thumbnailFramebuffer); checkGLErrors()
var colorRenderbuffer: GLuint = 0
glGenRenderbuffers(1, &colorRenderbuffer)
glBindRenderbuffer(GLenum(GL_RENDERBUFFER), colorRenderbuffer)
if floatComponents {
glRenderbufferStorage(GLenum(GL_RENDERBUFFER), GLenum(GL_RGBA16F), width, height)
} else {
glRenderbufferStorage(GLenum(GL_RENDERBUFFER), GLenum(GL_RGBA8), width, height)
}
glFramebufferRenderbuffer(GLenum(GL_FRAMEBUFFER), GLenum(GL_COLOR_ATTACHMENT0), GLenum(GL_RENDERBUFFER), colorRenderbuffer); checkGLErrors()
var depthRenderbuffer: GLuint = 0
glGenRenderbuffers(1, &depthRenderbuffer)
glBindRenderbuffer(GLenum(GL_RENDERBUFFER), depthRenderbuffer)
glRenderbufferStorage(GLenum(GL_RENDERBUFFER), GLenum(GL_DEPTH_COMPONENT24), width, height)
glFramebufferRenderbuffer(GLenum(GL_FRAMEBUFFER), GLenum(GL_DEPTH_ATTACHMENT), GLenum(GL_RENDERBUFFER), depthRenderbuffer); checkGLErrors()
let framebufferStatus = Int32(glCheckFramebufferStatus(GLenum(GL_FRAMEBUFFER)))
assert(framebufferStatus == GL_FRAMEBUFFER_COMPLETE)
if framebufferStatus != GL_FRAMEBUFFER_COMPLETE {
return nil
}
// clear buffer
glViewport(0, 0, width, height)
glClear(GLbitfield(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)); checkGLErrors()
// render
renderAtTime(time); checkGLErrors()
// create the image
if floatComponents { // float components (16-bits of actual precision)
// slurp bytes out of OpenGL
typealias ComponentType = Float
var imageRawBuffer = [ComponentType](count: Int(width * height) * samplesPerPixel * sizeof(ComponentType), repeatedValue: 0)
glReadPixels(GLint(0), GLint(0), width, height, GLenum(GL_RGBA), GLenum(GL_FLOAT), &imageRawBuffer)
// flip image vertically — OpenGL has a different 'up' than CoreGraphics
let rowLength = Int(width) * samplesPerPixel
for rowIndex in 0..<(Int(height) / 2) {
let baseIndex = rowIndex * rowLength
let destinationIndex = (Int(height) - 1 - rowIndex) * rowLength
swap(&imageRawBuffer[baseIndex..<(baseIndex + rowLength)], &imageRawBuffer[destinationIndex..<(destinationIndex + rowLength)])
}
// make the CGImage
var imageBuffer = vImage_Buffer(
data: UnsafeMutablePointer<Float>(imageRawBuffer),
height: vImagePixelCount(height),
width: vImagePixelCount(width),
rowBytes: Int(width) * sizeof(ComponentType) * samplesPerPixel)
var format = vImage_CGImageFormat(
bitsPerComponent: UInt32(sizeof(ComponentType) * 8),
bitsPerPixel: UInt32(sizeof(ComponentType) * samplesPerPixel * 8),
colorSpace: nil, // defaults to sRGB
bitmapInfo: CGBitmapInfo(CGImageAlphaInfo.PremultipliedLast.rawValue | CGBitmapInfo.ByteOrder32Little.rawValue | CGBitmapInfo.FloatComponents.rawValue),
version: UInt32(0),
decode: nil,
renderingIntent: kCGRenderingIntentDefault)
var error: vImage_Error = 0
thumbnailCGImage = vImageCreateCGImageFromBuffer(&imageBuffer, &format, nil, nil, vImage_Flags(kvImagePrintDiagnosticsToConsole), &error)!.takeRetainedValue()
} else { // byte components
// slurp bytes out of OpenGL
typealias ComponentType = UInt8
var imageRawBuffer = [ComponentType](count: Int(width * height) * samplesPerPixel * sizeof(ComponentType), repeatedValue: 0)
glReadPixels(GLint(0), GLint(0), width, height, GLenum(GL_RGBA), GLenum(GL_UNSIGNED_BYTE), &imageRawBuffer)
// flip image vertically — OpenGL has a different 'up' than CoreGraphics
let rowLength = Int(width) * samplesPerPixel
for rowIndex in 0..<(Int(height) / 2) {
let baseIndex = rowIndex * rowLength
let destinationIndex = (Int(height) - 1 - rowIndex) * rowLength
swap(&imageRawBuffer[baseIndex..<(baseIndex + rowLength)], &imageRawBuffer[destinationIndex..<(destinationIndex + rowLength)])
}
// make the CGImage
var imageBuffer = vImage_Buffer(
data: UnsafeMutablePointer<Float>(imageRawBuffer),
height: vImagePixelCount(height),
width: vImagePixelCount(width),
rowBytes: Int(width) * sizeof(ComponentType) * samplesPerPixel)
var format = vImage_CGImageFormat(
bitsPerComponent: UInt32(sizeof(ComponentType) * 8),
bitsPerPixel: UInt32(sizeof(ComponentType) * samplesPerPixel * 8),
colorSpace: nil, // defaults to sRGB
bitmapInfo: CGBitmapInfo(CGImageAlphaInfo.PremultipliedLast.rawValue | CGBitmapInfo.ByteOrder32Big.rawValue),
version: UInt32(0),
decode: nil,
renderingIntent: kCGRenderingIntentDefault)
var error: vImage_Error = 0
thumbnailCGImage = vImageCreateCGImageFromBuffer(&imageBuffer, &format, nil, nil, vImage_Flags(kvImagePrintDiagnosticsToConsole), &error)!.takeRetainedValue()
}
#if os(iOS)
objc_sync_exit(glContext)
if oldGLContext != nil {
EAGLContext.setCurrentContext(oldGLContext)
}
#elseif os(OSX)
CGLUnlockContext(glContext)
if oldGLContext != nil {
CGLSetCurrentContext(oldGLContext)
}
#endif
return thumbnailCGImage
}
}
func checkGLErrors() {
var glError: GLenum
var hadError = false
do {
glError = glGetError()
if glError != 0 {
println(String(format: "OpenGL error %#x", glError))
hadError = true
}
} while glError != 0
assert(!hadError)
}
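For the AVFoundation half mentioned above, a rough sketch of appending CGImages to a movie file with AVAssetWriter might look like this (modern Swift; the pixel-buffer helper, frame source, and sizes are placeholders, and error handling is minimal):
import AVFoundation
import CoreGraphics
import CoreVideo
import Foundation
func writeVideo(frames: [CGImage], size: CGSize, fps: Int32, to url: URL) throws {
    let writer = try AVAssetWriter(outputURL: url, fileType: .mov)
    let settings: [String: Any] = [AVVideoCodecKey: AVVideoCodecType.h264,
                                   AVVideoWidthKey: size.width,
                                   AVVideoHeightKey: size.height]
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: input,
                                                       sourcePixelBufferAttributes: nil)
    writer.add(input)
    writer.startWriting()
    writer.startSession(atSourceTime: CMTime(value: 0, timescale: fps))
    for (index, cgImage) in frames.enumerated() {
        while !input.isReadyForMoreMediaData { Thread.sleep(forTimeInterval: 0.001) }
        guard let buffer = makePixelBuffer(from: cgImage, size: size) else { continue }
        // present frame `index` at index/fps seconds
        adaptor.append(buffer, withPresentationTime: CMTime(value: CMTimeValue(index), timescale: fps))
    }
    input.markAsFinished()
    writer.finishWriting {}
}
func makePixelBuffer(from image: CGImage, size: CGSize) -> CVPixelBuffer? {
    var pb: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, Int(size.width), Int(size.height),
                        kCVPixelFormatType_32BGRA, nil, &pb)
    guard let buffer = pb else { return nil }
    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }
    let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                            width: Int(size.width), height: Int(size.height),
                            bitsPerComponent: 8,
                            bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                            space: CGColorSpaceCreateDeviceRGB(),
                            bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue)
    context?.draw(image, in: CGRect(origin: .zero, size: size))
    return buffer
}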
** This is the answer for SceneKit using Metal.
** Warning: This may not be a proper method for the App Store, but it works.
Step 1: Swap the method of nextDrawable of CAMetalLayer with a new one using swizzling.
Save the CAMetalDrawable for each render loop.
extension CAMetalLayer {
public static func setupSwizzling() {
struct Static {
static var token: dispatch_once_t = 0
}
dispatch_once(&Static.token) {
let copiedOriginalSelector = #selector(CAMetalLayer.orginalNextDrawable)
let originalSelector = #selector(CAMetalLayer.nextDrawable)
let swizzledSelector = #selector(CAMetalLayer.newNextDrawable)
let copiedOriginalMethod = class_getInstanceMethod(self, copiedOriginalSelector)
let originalMethod = class_getInstanceMethod(self, originalSelector)
let swizzledMethod = class_getInstanceMethod(self, swizzledSelector)
let oldImp = method_getImplementation(originalMethod)
method_setImplementation(copiedOriginalMethod, oldImp)
method_exchangeImplementations(originalMethod, swizzledMethod)
}
}
func newNextDrawable() -> CAMetalDrawable? {
let drawable = orginalNextDrawable()
// Save the drawable to any where you want
AppManager.sharedInstance.currentSceneDrawable = drawable
return drawable
}
func orginalNextDrawable() -> CAMetalDrawable? {
// This is just a placeholder. Implementation will be replaced with nextDrawable.
return nil
}
}
Step 2:
Setup the swizzling in AppDelegate: didFinishLaunchingWithOptions
func application(application: UIApplication, didFinishLaunchingWithOptions launchOptions: [NSObject: AnyObject]?) -> Bool {
CAMetalLayer.setupSwizzling()
return true
}
Step 3:
Disable framebufferOnly for your SCNView's CAMetalLayer (in order to call getBytes on the MTLTexture):
if let metalLayer = scnView.layer as? CAMetalLayer {
metalLayer.framebufferOnly = false
}
Step 4:
In your SCNView's delegate (SCNSceneRendererDelegate), play with the texture
func renderer(renderer: SCNSceneRenderer, didRenderScene scene: SCNScene, atTime time: NSTimeInterval) {
if let texture = AppManager.sharedInstance.currentSceneDrawable?.texture where !texture.framebufferOnly {
AppManager.sharedInstance.currentSceneDrawable = nil
// Get image from texture
let image = texture.toImage()
// Use the image for video recording
}
}
extension MTLTexture {
func bytes() -> UnsafeMutablePointer<Void> {
let width = self.width
let height = self.height
let rowBytes = self.width * 4
let p = malloc(width * height * 4) // beware of a memory leak: this buffer is never freed
self.getBytes(p, bytesPerRow: rowBytes, fromRegion: MTLRegionMake2D(0, 0, width, height), mipmapLevel: 0)
return p
}
func toImage() -> UIImage? {
var uiImage: UIImage?
let p = bytes()
let pColorSpace = CGColorSpaceCreateDeviceRGB()
let rawBitmapInfo = CGImageAlphaInfo.NoneSkipFirst.rawValue | CGBitmapInfo.ByteOrder32Little.rawValue
let bitmapInfo:CGBitmapInfo = CGBitmapInfo(rawValue: rawBitmapInfo)
let selftureSize = self.width * self.height * 4
let rowBytes = self.width * 4
let provider = CGDataProviderCreateWithData(nil, p, selftureSize, {_,_,_ in })!
if let cgImage = CGImageCreate(self.width, self.height, 8, 32, rowBytes, pColorSpace, bitmapInfo, provider, nil, true, CGColorRenderingIntent.RenderingIntentDefault) {
uiImage = UIImage(CGImage: cgImage)
}
return uiImage
}
func toImageAsJpeg(compressionQuality: CGFloat) -> UIImage? {
// one possible implementation (not in the original answer): re-encode the rendered image as JPEG
guard let image = toImage(), let data = UIImageJPEGRepresentation(image, compressionQuality) else { return nil }
return UIImage(data: data)
}
}
Step 5 (Optional):
You may need to confirm that the drawable you are getting from the CAMetalLayer is your target (in case more than one CAMetalLayer is active at the same time).
It would actually be pretty easy! Here's pseudo-code for how I would do it (on the SCNView):
int numberOfFrames = 300;
int currentFrame = 0;
int framesPerSecond = 30;
-(void) renderAFrame{
// advance the scene time by one frame per call; note that the integer
// division 1/framesPerSecond would evaluate to 0
[self renderAtTime:(double)currentFrame / framesPerSecond];
NSImage *frame = [self snapshot];
// save the image with the frame number in the name such as f_001.png
currentFrame++;
if(currentFrame < numberOfFrames){
[self renderAFrame];
}
}
This will output a sequence of images, rendered at 30 frames per second, that you can import into any editing software and convert to video.
You can do it this way with an SKVideoNode that you put into an SKScene, which you then use as an SCNNode's SCNMaterial.diffuse.contents (hope that's clear ;) )
player = AVPlayer(URL: fileURL!)
let videoSpriteKitNodeLeft = SKVideoNode(AVPlayer: player)
let videoNodeLeft = SCNNode()
let spriteKitScene1 = SKScene(size: CGSize(width: 1280 * screenScale, height: 1280 * screenScale))
spriteKitScene1.shouldRasterize = true
videoNodeLeft.geometry = SCNSphere(radius: 30)
spriteKitScene1.scaleMode = .AspectFit
videoSpriteKitNodeLeft.position = CGPoint(
x: spriteKitScene1.size.width / 2.0, y: spriteKitScene1.size.height / 2.0)
videoSpriteKitNodeLeft.size = spriteKitScene1.size
spriteKitScene1.addChild(videoSpriteKitNodeLeft)
videoNodeLeft.geometry?.firstMaterial?.diffuse.contents = spriteKitScene1
videoNodeLeft.geometry?.firstMaterial?.doubleSided = true
// Flip video upside down, so that it's shown in the right position
var transform = SCNMatrix4MakeRotation(Float(M_PI), 0.0, 0.0, 1.0)
transform = SCNMatrix4Translate(transform, 1.0, 1.0, 0.0)
videoNodeLeft.pivot = SCNMatrix4MakeRotation(Float(M_PI_2), 0.0, -1.0, 0.0)
videoNodeLeft.geometry?.firstMaterial?.diffuse.contentsTransform = transform
videoNodeLeft.position = SCNVector3(x: 0, y: 0, z: 0)
scene.rootNode.addChildNode(videoNodeLeft)
I've extracted the code from a github project of mine for a 360 video player using SceneKit to play a video inside a 3D Sphere: https://github.com/Aralekk/simple360player_iOS/blob/master/simple360player/ViewController.swift
I hope this helps !
Arthur
