I'm trying to convert an example from Bob McCune's Learning AVFoundation book, and I'm having some issues using AVAssetReader and NSInputStream. The graph should be a pure sine wave, but the values seem to be reflected across the X-axis somehow.
I've tried every variation of byte swapping I could think of, and none of them worked.
The playground is posted to GitHub here:
//: Playground - noun: a place where people can play
import UIKit
import AVFoundation
import XCPlayground
func plotArrayInPlayground<T>(arrayToPlot:Array<T>, title:String) {
for currentValue in arrayToPlot {
XCPCaptureValue(title, value: currentValue)
class SSSampleDataFilter {
var sampleData:NSData?
init(data:NSData) {
sampleData = data
func filteredSamplesForSize(size:CGSize) -> [Int]{
var filterSamples = [UInt16]()
if let sampleData = sampleData {
let sampleCount = sampleData.length
let binSize = CGFloat(sampleCount) / size.width
let stream = NSInputStream(data: sampleData)
stream.open()
var readBuffer = Array<UInt8>(count: 16 * 1024, repeatedValue: 0)
var totalBytesRead = 0
let size = sizeof(UInt16)
while (totalBytesRead < sampleData.length) {
let numberOfBytesRead = stream.read(&readBuffer, maxLength: size)
let u16: UInt16 = UnsafePointer<UInt16>(readBuffer).memory
var sampleBin = [UInt16]()
for _ in 0..<Int(binSize) {
totalBytesRead += numberOfBytesRead
//plotArrayInPlayground(filterSamples, title: "Samples")
return [0]
let sineURL = NSBundle.mainBundle().URLForResource("440.0-sine", withExtension: "aif")!
let asset = AVAsset(URL: sineURL)
var assetReader:AVAssetReader
do {
assetReader = try AVAssetReader(asset: asset)
} catch {
fatalError("Unable to read Asset: \(error) : \(__FUNCTION__).")
}
let track = asset.tracksWithMediaType(AVMediaTypeAudio).first
let outputSettings: [String:Int] =
[ AVFormatIDKey: Int(kAudioFormatLinearPCM),
AVLinearPCMIsBigEndianKey: 0,
AVLinearPCMIsFloatKey: 0,
AVLinearPCMBitDepthKey: 16,
AVLinearPCMIsNonInterleaved: 0]
let trackOutput = AVAssetReaderTrackOutput(track: track!, outputSettings: outputSettings)
assetReader.addOutput(trackOutput)
assetReader.startReading()
var sampleData = NSMutableData()
while assetReader.status == AVAssetReaderStatus.Reading {
if let sampleBufferRef = trackOutput.copyNextSampleBuffer() {
if let blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef) {
let bufferLength = CMBlockBufferGetDataLength(blockBufferRef)
var data = NSMutableData(length: bufferLength)
CMBlockBufferCopyDataBytes(blockBufferRef, 0, bufferLength, data!.mutableBytes)
var samples = UnsafeMutablePointer<Int16>(data!.mutableBytes)
sampleData.appendBytes(samples, length: bufferLength)
let view = UIView(frame: CGRectMake(0, 0, 375.0, 667.0))
//view.backgroundColor = UIColor.lightGrayColor()
if assetReader.status == AVAssetReaderStatus.Completed {
let filter = SSSampleDataFilter(data: sampleData)
let filteredSamples = filter.filteredSamplesForSize(view.bounds.size)
//XCPShowView("Bezier Path", view: view)
Here's what the graph should look like (taken from Audacity)
Here's what the graph looks like in the playground

Unfortunately, your playground doesn't render anything for me in Xcode 7 beta 5. However, you're asking the AVAssetReaderTrackOutput to give you signed 16-bit ints, yet your code treats them as unsigned UInt16s (and your Audacity file uses floats).
Changing all instances of UInt16 to Int16 in your playground seems to print sensible-looking sinusoidal data.
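The signed/unsigned mix-up can be reproduced without AVFoundation. With `AVLinearPCMIsBigEndianKey: 0` and `AVLinearPCMIsFloatKey: 0`, each sample is two little-endian bytes holding a signed value. The sketch below (the name `decodeLittleEndianPCM16` is hypothetical, and it uses current Swift rather than the question's Swift 2) decodes such bytes as `Int16`. Reading the same bits as `UInt16` turns every negative half-cycle into a value above 32767, which would explain the "reflected" look of the plot.

```swift
import Foundation

/// Decodes little-endian signed 16-bit PCM bytes into Int16 samples.
func decodeLittleEndianPCM16(_ bytes: [UInt8]) -> [Int16] {
    precondition(bytes.count % 2 == 0, "16-bit PCM needs an even byte count")
    var samples = [Int16]()
    samples.reserveCapacity(bytes.count / 2)
    for i in stride(from: 0, to: bytes.count, by: 2) {
        // Little-endian: low byte first, high byte second.
        let raw = UInt16(bytes[i]) | (UInt16(bytes[i + 1]) << 8)
        // Reinterpret the same 16 bits as a signed value (two's complement).
        samples.append(Int16(bitPattern: raw))
    }
    return samples
}

let bytes: [UInt8] = [0x00, 0x00, 0xFF, 0x7F, 0x01, 0x80, 0xFF, 0xFF]
let signedSamples = decodeLittleEndianPCM16(bytes)
// The mistaken interpretation: same bits, read as unsigned.
let unsignedView = signedSamples.map { UInt16(bitPattern: $0) }
print(signedSamples, unsignedView)
```

Here the samples 0, 32767, -32767 and -1 come out as 0, 32767, 32769 and 65535 when read as `UInt16`, so the smallest negative values land at the very top of the graph.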


How to make surface material double-sided for MDLAsset?

I am trying to create an app that lets me scan a room and then export a 3D file using LiDAR. I am able to do this with the following code (thanks to ARKit – How to export OBJ from iPhone/iPad with LiDAR?); however, the surfaces all seem to be one-sided (when you rotate the object, the back side of every surface is transparent). How can I make the surfaces double-sided in my saveButtonTapped method?
import RealityKit
import ARKit
import MetalKit
import ModelIO
@IBOutlet var arView: ARView!
var saveButton: UIButton!
let rect = CGRect(x: 50, y: 50, width: 100, height: 50)
override func viewDidLoad() {
super.viewDidLoad()
let tui = UIControl.Event.touchUpInside
saveButton = UIButton(frame: rect)
saveButton.setTitle("Save", for: [])
saveButton.addTarget(self, action: #selector(saveButtonTapped), for: tui)
self.view.addSubview(saveButton)
@objc func saveButtonTapped(sender: UIButton) {
print("Saving is executing...")
guard let frame = arView.session.currentFrame
else { fatalError("Can't get ARFrame") }
guard let device = MTLCreateSystemDefaultDevice()
else { fatalError("Can't create MTLDevice") }
let allocator = MTKMeshBufferAllocator(device: device)
let asset = MDLAsset(bufferAllocator: allocator)
let meshAnchors = frame.anchors.compactMap { $0 as? ARMeshAnchor }
for ma in meshAnchors {
let geometry = ma.geometry
let vertices = geometry.vertices
let faces = geometry.faces
let vertexPointer = vertices.buffer.contents()
let facePointer = faces.buffer.contents()
for vtxIndex in 0 ..< vertices.count {
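// Note: vertex(at:) is not an ARKit method; this code assumes a helper extension on ARMeshGeometry (not shown) that reads one float3 position.
let vertex = geometry.vertex(at: UInt32(vtxIndex))
var vertexLocalTransform = matrix_identity_float4x4
vertexLocalTransform.columns.3 = SIMD4<Float>(x: vertex.0,
y: vertex.1,
z: vertex.2,
w: 1.0)
// .position is likewise an assumed helper on simd_float4x4 that returns the translation (columns.3 x, y, z).
let vertexWorldTransform = (ma.transform * vertexLocalTransform).position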
let vertexOffset = vertices.offset + vertices.stride * vtxIndex
let componentStride = vertices.stride / 3
vertexPointer.storeBytes(of: vertexWorldTransform.x,
toByteOffset: vertexOffset,
as: Float.self)
vertexPointer.storeBytes(of: vertexWorldTransform.y,
toByteOffset: vertexOffset + componentStride,
as: Float.self)
vertexPointer.storeBytes(of: vertexWorldTransform.z,
toByteOffset: vertexOffset + (2 * componentStride),
as: Float.self)
let byteCountVertices = vertices.count * vertices.stride
let byteCountFaces = faces.count * faces.indexCountPerPrimitive * faces.bytesPerIndex
let vertexBuffer = allocator.newBuffer(with: Data(bytesNoCopy: vertexPointer,
count: byteCountVertices,
deallocator: .none), type: .vertex)
let indexBuffer = allocator.newBuffer(with: Data(bytesNoCopy: facePointer,
count: byteCountFaces,
deallocator: .none), type: .index)
let indexCount = faces.count * faces.indexCountPerPrimitive
let material = MDLMaterial(name: "material",
scatteringFunction: MDLPhysicallyPlausibleScatteringFunction())
let submesh = MDLSubmesh(indexBuffer: indexBuffer,
indexCount: indexCount,
indexType: .uInt32,
geometryType: .triangles,
material: material)
let vertexFormat = MTKModelIOVertexFormatFromMetal(vertices.format)
let vertexDescriptor = MDLVertexDescriptor()
vertexDescriptor.attributes[0] = MDLVertexAttribute(name: MDLVertexAttributePosition,
format: vertexFormat,
offset: 0,
bufferIndex: 0)
vertexDescriptor.layouts[0] = MDLVertexBufferLayout(stride: ma.geometry.vertices.stride)
let mesh = MDLMesh(vertexBuffer: vertexBuffer,
vertexCount: ma.geometry.vertices.count,
descriptor: vertexDescriptor,
submeshes: [submesh])
asset.add(mesh)
let filePath = FileManager.default.urls(for: .documentDirectory,
in: .userDomainMask).first!
let usd: URL = filePath.appendingPathComponent("model.usd")
if MDLAsset.canExportFileExtension("usd") {
do {
try asset.export(to: usd)
let controller = UIActivityViewController(activityItems: [usd],
applicationActivities: nil)
controller.popoverPresentationController?.sourceView = sender
self.present(controller, animated: true, completion: nil)
} catch let error {
print("Export failed: \(error.localizedDescription)")
} else {
fatalError("Can't export USD")
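One workaround, which is not part of the question's code and is offered only as a hedged sketch: the exported mesh carries positions but no normals, so a viewer typically decides which side of a triangle is the front from its winding order. Emitting every triangle a second time with two corners swapped gives it a front face on both sides. The helper `doubleSidedTriangleIndices` is hypothetical and works on a flat array of 32-bit indices, matching the `.uInt32` index type used above.

```swift
import Foundation

/// Returns a triangle index list in which every triangle appears twice:
/// once with its original winding (a, b, c) and once reversed (a, c, b),
/// so renderers that cull back faces still draw the surface from both sides.
func doubleSidedTriangleIndices(_ indices: [UInt32]) -> [UInt32] {
    precondition(indices.count % 3 == 0, "Expected a flat list of triangles")
    var result = indices
    result.reserveCapacity(indices.count * 2)
    for i in stride(from: 0, to: indices.count, by: 3) {
        // Swapping two corners flips the winding, and therefore the facing direction.
        result.append(contentsOf: [indices[i], indices[i + 2], indices[i + 1]])
    }
    return result
}

let original: [UInt32] = [0, 1, 2, 2, 1, 3]
print(doubleSidedTriangleIndices(original))
```

If you try this, the index buffer would be built from the returned array, and the `indexCount` passed to `MDLSubmesh` would double accordingly. Whether a particular viewer then shows both sides as expected is worth testing before relying on it.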

Having trouble with input image with iOS Swift TensorFlowLite Image Classification Model?

I've been trying to add a plant recognition classifier to my app through a Firebase cloud-hosted ML model, and I've gotten close; the problem is that I'm pretty sure I'm messing up the image input somewhere along the way. My classifier is producing nonsense probabilities, while the same model tested through a Python script gives accurate results.
The model's input is a 224x224 image with 3 channels scaled to [0, 1]. I've done all this, but I can't seem to get the CGImage from the camera/image picker right. Here is the part of the code that processes the input image:
if let imageData = info[.originalImage] as? UIImage {
DispatchQueue.main.async {
let resizedImage = imageData.scaledImage(with: CGSize(width:224, height:224))
let ciImage = CIImage(image: resizedImage!)
let CGcontext = CIContext(options: nil)
let image : CGImage = CGcontext.createCGImage(ciImage!, from: ciImage!.extent)!
guard let context = CGContext(
data: nil,
width: image.width, height: image.height,
bitsPerComponent: 8, bytesPerRow: image.width * 4,
space: CGColorSpaceCreateDeviceRGB(),
bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
) else { return }
context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
guard let imageData = context.data else { return }
print("Image data showing as: \(imageData)")
var inputData = Data()
do {
for row in 0 ..< 224 {
for col in 0 ..< 224 {
let offset = 4 * (row * context.width + col)
// (Ignore offset 0, the unused alpha channel)
let red = imageData.load(fromByteOffset: offset+1, as: UInt8.self)
let green = imageData.load(fromByteOffset: offset+2, as: UInt8.self)
let blue = imageData.load(fromByteOffset: offset+3, as: UInt8.self)
// Normalize channel values to [0.0, 1.0].
var normalizedRed = Float32(red) / 255.0
var normalizedGreen = Float32(green) / 255.0
var normalizedBlue = Float32(blue) / 255.0
// Append normalized values to Data object in RGB order.
let elementSize = MemoryLayout.size(ofValue: normalizedRed)
var bytes = [UInt8](repeating: 0, count: elementSize)
memcpy(&bytes, &normalizedRed, elementSize)
inputData.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedGreen, elementSize)
inputData.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedBlue, elementSize)
inputData.append(&bytes, count: elementSize)
print("Successfully added inputData")
self.parent.invokeInterpreter(inputData: inputData)
} catch let error {
print("Failed to add input: \(error)")
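The pixel loop above can be pulled out into a pure function, which makes it easy to test with a hand-made buffer. This is a sketch assuming the same XRGB layout that `CGImageAlphaInfo.noneSkipFirst` produces (one ignored byte, then R, G, B); `normalizedRGBData` is a hypothetical name. It takes the real `width`, `height` and `bytesPerRow` as parameters instead of hard-coding 224: if `scaledImage(with:)` renders at the screen scale, the CGImage could be two or three times larger than 224x224, and a 224x224 loop would then read only the top-left part of the picture. Printing `image.width` and `image.height` before building the tensor is a cheap check.

```swift
import Foundation

/// Converts an XRGB8888 bitmap (skip byte, then R, G, B per pixel) into
/// Float32 RGB values in [0, 1], row by row, packed into Data for an input tensor.
func normalizedRGBData(fromXRGB pixels: [UInt8], width: Int, height: Int, bytesPerRow: Int) -> Data {
    precondition(pixels.count >= height * bytesPerRow, "Pixel buffer is too small")
    var floats = [Float32]()
    floats.reserveCapacity(width * height * 3)
    for row in 0..<height {
        for col in 0..<width {
            // Byte 0 of each pixel is the unused skip byte.
            let offset = row * bytesPerRow + col * 4
            floats.append(Float32(pixels[offset + 1]) / 255.0)  // red
            floats.append(Float32(pixels[offset + 2]) / 255.0)  // green
            floats.append(Float32(pixels[offset + 3]) / 255.0)  // blue
        }
    }
    // Copy the Float32 values' bytes (host byte order) into Data.
    return floats.withUnsafeBufferPointer { Data(buffer: $0) }
}

// Two pixels: (R 255, G 0, B 51) and (R 0, G 255, B 0).
let tensorData = normalizedRGBData(fromXRGB: [0, 255, 0, 51, 0, 0, 255, 0], width: 2, height: 1, bytesPerRow: 8)
print(tensorData.count)
```

With a real `CGContext`, `context.bytesPerRow` and `context.width`/`context.height` would be the values to pass, since rows can be padded.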
Afterwards, I process the inputData with the following:
func invokeInterpreter(inputData: Data) {
do {
let interpreter = try Interpreter(modelPath: ProfileUserData.sharedUserData.modelPath)
var labels: [String] = []
try interpreter.allocateTensors()
try interpreter.copy(inputData, toInputAt: 0)
try interpreter.invoke()
let output = try interpreter.output(at: 0)
switch output.dataType {
case .uInt8:
guard let quantization = output.quantizationParameters else {
print("No results returned because the quantization values for the output tensor are nil.")
return
let quantizedResults = [UInt8](output.data)
let results = quantizedResults.map {
quantization.scale * Float(Int($0) - quantization.zeroPoint)
let sum = results.reduce(0, +)
print("Sum of all dequantized results is: \(sum)")
print("Count of dequantized results is: \(results.indices.count)")
let filename = "plantLabels"
let fileExtension = "csv"
guard let labelPath = Bundle.main.url(forResource: filename, withExtension: fileExtension) else {
print("Labels file not found in bundle. Please check labels file.")
return
do {
let contents = try String(contentsOf: labelPath, encoding: .utf8)
labels = contents.components(separatedBy: .newlines)
print("Count of label rows is: \(labels.indices.count)")
} catch {
fatalError("Labels file named \(filename).\(fileExtension) cannot be read. Please add a " +
"valid labels file and try again.")
let zippedResults = zip(labels.indices, results)
// Sort the zipped results by confidence value in descending order.
let sortedResults = zippedResults.sorted { $0.1 > $1.1 }.prefix(3)
print("Printing sortedResults: \(sortedResults)")
case .float32:
print("Output tensor data type [Float32] is unsupported for this model.")
default:
print("Output tensor data type \(output.dataType) is unsupported for this model.")
} catch {
//Error with interpreter
print("Error with running interpreter: \(error.localizedDescription)")
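For the output side, here is a sketch of dequantizing and picking the top results by label name rather than by index (the code above zips `labels.indices`, so it prints positions, not names). `topLabels` is hypothetical; it applies the same `scale * (q - zeroPoint)` formula as the code above. Splitting the CSV with `components(separatedBy: .newlines)` can leave an empty last element, and a header row, if the file has one, would shift every label by one position, so comparing the label count with the output count is worth doing.

```swift
import Foundation

/// Dequantizes uint8 model outputs and returns the k highest-scoring labels.
func topLabels(quantized: [UInt8], scale: Float, zeroPoint: Int,
               labels: [String], k: Int) -> [(label: String, score: Float)] {
    // Same affine dequantization as in invokeInterpreter: real = scale * (q - zeroPoint).
    let scores = quantized.map { scale * Float(Int($0) - zeroPoint) }
    // zip stops at the shorter sequence, so a trailing empty label line is ignored.
    return zip(labels, scores)
        .sorted { $0.1 > $1.1 }
        .prefix(k)
        .map { (label: $0.0, score: $0.1) }
}

let top = topLabels(quantized: [10, 200, 50], scale: 1.0 / 255.0, zeroPoint: 0,
                    labels: ["rose", "fern", "ivy"], k: 2)
for result in top {
    print(result.label, result.score)
}
```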

Swift AVFoundation timing info for audio measurements

I am creating an application that will take an audio measurement by playing some stimulus data and recording the microphone input, and then analysing the data.
I am having trouble accounting for the time taken to initialise and start the audio engine, as this varies each time and also depends on the hardware used, etc.
I have an audio engine with a tap installed on the hardware input, where input 1 is the microphone recording and input 2 is a reference input (also from the hardware). The output is physically Y-split and fed back into input 2.
The app initialises the engine, plays the stimulus audio plus 1 second of silence (to give the microphone time to record the whole signal), and then stops and closes the engine.
I write the two input buffers as a WAV file so that I can import it into an existing DAW to examine the signals visually. I can see that each time I take a measurement, the time difference between the two signals is different (even though the microphone has not moved and the hardware has stayed the same). I assume this is due to the hardware latency, the time taken to initialise the engine, and the way the device schedules tasks.
I have tried capturing the absolute time (using mach_absolute_time) of the first buffer callback in each installTap block and subtracting the two, and I can see that this varies quite a lot from run to run:
class newAVAudioEngine{
var engine = AVAudioEngine()
var audioBuffer = AVAudioPCMBuffer()
var running = true
var in1Buf:[Float]=Array(repeating:0, count:totalRecordSize)
var in2Buf:[Float]=Array(repeating:0, count:totalRecordSize)
var buf1current:Int = 0
var buf2current:Int = 0
var in1firstRun:Bool = false
var in2firstRun:Bool = false
var in1StartTime = 0
var in2startTime = 0
func measure(inputSweep:SweepFilter) -> measurement {
initializeEngine(inputSweep: inputSweep)
while running == true {
let measureResult = measurement.init(meas: meas,ref: ref)
return measureResult
func initializeEngine(inputSweep:SweepFilter) {
buf1current = 0
buf2current = 0
in1StartTime = 0
in2startTime = 0
in1firstRun = true
in2firstRun = true
in1Buf = Array(repeating:0, count:totalRecordSize)
in2Buf = Array(repeating:0, count:totalRecordSize)
engine = AVAudioEngine()
let srcNode = AVAudioSourceNode { _, _, frameCount, AudioBufferList -> OSStatus in
let ablPointer = UnsafeMutableAudioBufferListPointer(AudioBufferList)
if (Int(frameCount) + time) <= inputSweep.stimulus.count {
for frame in 0..<Int(frameCount) {
let value = inputSweep.stimulus[frame + time]
for buffer in ablPointer {
let buf: UnsafeMutableBufferPointer<Float> = UnsafeMutableBufferPointer(buffer)
buf[frame] = value
time += Int(frameCount)
return noErr
} else {
for frame in 0..<Int(frameCount) {
let value = 0
for buffer in ablPointer {
let buf: UnsafeMutableBufferPointer<Float> = UnsafeMutableBufferPointer(buffer)
buf[frame] = Float(value)
return noErr
let format = engine.outputNode.inputFormat(forBus: 0)
let stimulusFormat = AVAudioFormat(commonFormat: format.commonFormat,
sampleRate: Double(sampleRate),
channels: 1,
interleaved: format.isInterleaved)
do {
try AVAudioSession.sharedInstance().setCategory(.playAndRecord)
let ioBufferDuration = 128.0 / 44100.0
try AVAudioSession.sharedInstance().setPreferredIOBufferDuration(ioBufferDuration)
} catch {
assertionFailure("AVAudioSession setup failed")
let input = engine.inputNode
let inputFormat = input.inputFormat(forBus: 0)
print("InputNode Format is \(inputFormat)")
engine.connect(srcNode, to: engine.mainMixerNode, format: stimulusFormat)
if internalRefLoop == true {
srcNode.installTap(onBus: 0, bufferSize: 1024, format: stimulusFormat, block: {(buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
if self.in2firstRun == true {
var info = mach_timebase_info()
mach_timebase_info(&info)
let currentTime = mach_absolute_time()
let nanos = currentTime * UInt64(info.numer) / UInt64(info.denom)
self.in2startTime = Int(nanos)
self.in2firstRun = false
do {
let floatData = buffer.floatChannelData?.pointee
for frame in 0..<buffer.frameLength{
if (self.buf2current + Int(frame)) < totalRecordSize{
self.in2Buf[self.buf2current + Int(frame)] = floatData![Int(frame)]
self.buf2current += Int(buffer.frameLength)
if (self.numberOfSamples + Int(buffer.frameLength)) <= totalRecordSize{
try self.stimulusFile.write(from: buffer)
self.numberOfSamples += Int(buffer.frameLength) } else {
self.running = false
} catch {
print(NSString(string: "write failed"))
let micAudioConverter = AVAudioConverter(from: inputFormat, to: stimulusFormat!)
var micChannelMap:[NSNumber] = [0,-1]
micAudioConverter?.channelMap = micChannelMap
let refAudioConverter = AVAudioConverter(from: inputFormat, to: stimulusFormat!)
var refChannelMap:[NSNumber] = [1,-1]
refAudioConverter?.channelMap = refChannelMap
//Measurement Tap
engine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat, block: {(buffer2: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
if self.in1firstRun == true {
var info = mach_timebase_info()
mach_timebase_info(&info)
let currentTime = mach_absolute_time()
let nanos = currentTime * UInt64(info.numer) / UInt64(info.denom)
self.in1StartTime = Int(nanos)
self.in1firstRun = false
do {
let micConvertedBuffer = AVAudioPCMBuffer(pcmFormat: stimulusFormat!, frameCapacity: buffer2.frameCapacity)
let micInputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
outStatus.pointee = AVAudioConverterInputStatus.haveData
return buffer2
var error: NSError? = nil
//let status = audioConverter.convert(to: convertedBuffer!, error: &error, withInputFrom: inputBlock)
let status = micAudioConverter?.convert(to: micConvertedBuffer!, error: &error, withInputFrom: micInputBlock)
let floatData = micConvertedBuffer?.floatChannelData?.pointee
for frame in 0..<micConvertedBuffer!.frameLength{
if (self.buf1current + Int(frame)) < totalRecordSize{
self.in1Buf[self.buf1current + Int(frame)] = floatData![Int(frame)]
if (self.buf1current + Int(frame)) >= totalRecordSize {
self.running = false
self.buf1current += Int(micConvertedBuffer!.frameLength)
try self.measurementFile.write(from: micConvertedBuffer!)
} catch {
print(NSString(string: "write failed"))
if internalRefLoop == false {
if self.in2firstRun == true{
var info = mach_timebase_info()
mach_timebase_info(&info)
let currentTime = mach_absolute_time()
let nanos = currentTime * UInt64(info.numer) / UInt64(info.denom)
self.in2startTime = Int(nanos)
self.in2firstRun = false
do {
let refConvertedBuffer = AVAudioPCMBuffer(pcmFormat: stimulusFormat!, frameCapacity: buffer2.frameCapacity)
let refInputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
outStatus.pointee = AVAudioConverterInputStatus.haveData
return buffer2
var error: NSError? = nil
let status = refAudioConverter?.convert(to: refConvertedBuffer!, error: &error, withInputFrom: refInputBlock)
let floatData = refConvertedBuffer?.floatChannelData?.pointee
for frame in 0..<refConvertedBuffer!.frameLength{
if (self.buf2current + Int(frame)) < totalRecordSize{
self.in2Buf[self.buf2current + Int(frame)] = floatData![Int(frame)]
if (self.numberOfSamples + Int(buffer2.frameLength)) <= totalRecordSize{
self.buf2current += Int(refConvertedBuffer!.frameLength)
try self.stimulusFile.write(from: refConvertedBuffer!) } else {
self.running = false
} catch {
print(NSString(string: "write failed"))
assert(engine.inputNode != nil)
running = true
try! engine.start()
The code above is my entire class. Currently, each buffer callback from installTap writes the input directly to a WAV file, and this is where I can see the two results differing each time. I have tried recording a start time in each tap and subtracting the two, but the results still vary.
Do I need to take into account that my output will have latency too, which may vary with each call? If so, how do I add this time into the equation? What I am looking for is for the two inputs and the output to share a common time reference so that I can compare them. Differences in hardware latency will not matter too much, as long as I can identify the end call times.
If you are doing real-time measurements, you might want to use AVAudioSinkNode instead of a tap. The sink node is new; it was introduced alongside the AVAudioSourceNode you are already using. With a tap, you won't be able to get precise timing.
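Whichever node you use, the callbacks are handed a timestamp: tap blocks receive an `AVAudioTime` (whose `hostTime` is valid when `isHostTimeValid` is true), and the source and sink node blocks receive an `AudioTimeStamp` whose `mHostTime` uses the same host clock. That timestamp describes the buffer itself, which is usually a better reference than calling `mach_absolute_time()` whenever the block happens to run. Below is a sketch, under that assumption, of turning the difference between two host times into a sample-frame offset. `sampleOffset` is a hypothetical helper that takes the timebase ratio as parameters (on Apple platforms these come from `mach_timebase_info(&info)`) so it can be tested anywhere.

```swift
import Foundation

/// Converts the gap between two host-clock timestamps into sample frames.
/// timebaseNumer / timebaseDenom is the ratio mach_timebase_info reports
/// (ticks * numer / denom = nanoseconds).
func sampleOffset(fromHostTime start: UInt64,
                  toHostTime end: UInt64,
                  timebaseNumer: UInt32,
                  timebaseDenom: UInt32,
                  sampleRate: Double) -> Int {
    // Signed difference: negative when `end` is earlier than `start`.
    let deltaTicks = Double(Int64(bitPattern: end &- start))
    let deltaNanos = deltaTicks * Double(timebaseNumer) / Double(timebaseDenom)
    return Int((deltaNanos * sampleRate / 1_000_000_000).rounded())
}

// 24,000,000 ticks at a 125/3 timebase is exactly one second.
print(sampleOffset(fromHostTime: 0, toHostTime: 24_000_000,
                   timebaseNumer: 125, timebaseDenom: 3, sampleRate: 48_000))
```

On Apple platforms, `AVAudioTime.seconds(forHostTime:)` performs the tick-to-seconds conversion for you. `AVAudioSession` also reports `inputLatency`, `outputLatency` and `ioBufferDuration`, which may be worth logging alongside each measurement when deciding how much of the variation comes from the output path.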

Extract meter levels from audio file

I need to extract audio meter levels from a file so I can render the levels before playing the audio. I know AVAudioPlayer can get this information while playing the audio file through
func averagePower(forChannel channelNumber: Int) -> Float.
But in my case, I would like to obtain a [Float] array of meter levels beforehand.
Swift 4
Timings on an iPhone:
0.538 s to process an 8 MB MP3 file with a 4 min 47 s duration and a 44,100 Hz sampling rate
0.170 s to process a 712 KB MP3 file with a 22 s duration and a 44,100 Hz sampling rate
0.089 s to process a CAF file created by converting the file above with this command in the terminal: afconvert -f caff -d LEI16 audio.mp3 audio.caf
Let's begin:
A) Declare this class that is going to hold the necessary information about the audio asset:
/// Holds audio information used for building waveforms
final class AudioContext {
    /// The audio asset URL used to load the context
    public let audioURL: URL
    /// Total number of samples in loaded asset
    public let totalSamples: Int
    /// Loaded asset
    public let asset: AVAsset
    // Loaded assetTrack
    public let assetTrack: AVAssetTrack

    private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
        self.audioURL = audioURL
        self.totalSamples = totalSamples
        self.asset = asset
        self.assetTrack = assetTrack
    }

    public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
        let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])
        guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
            fatalError("Couldn't load AVAssetTrack")
        }
        asset.loadValuesAsynchronously(forKeys: ["duration"]) {
            var error: NSError?
            let status = asset.statusOfValue(forKey: "duration", error: &error)
            switch status {
            case .loaded:
                guard
                    let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                    let audioFormatDesc = formatDescriptions.first,
                    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                    else { break }
                let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                completionHandler(audioContext)
                return
            case .failed, .cancelled, .loading, .unknown:
                print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
            }
            // Reached on failure, or when the format description couldn't be read
            completionHandler(nil)
        }
    }
}
We are going to use its asynchronous load function and handle its result in a completion handler.
B) Import AVFoundation and Accelerate in your view controller:
import AVFoundation
import Accelerate
C) Declare the noise level in your view controller (in dB):
let noiseFloor: Float = -80
For example, anything below -80 dB will be treated as silence.
D) The following function takes an audio context and produces the desired dB powers. targetSamples defaults to 100; you can change it to suit your UI needs:
func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples
    guard let reader = try? AVAssetReader(asset: audioContext.asset)
        else {
            fatalError("Couldn't initialize the AVAssetReader")
    }
    reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                   duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))
    let outputSettingsDict: [String : Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                outputSettings: outputSettingsDict)
    readerOutput.alwaysCopiesSampleData = false
    reader.add(readerOutput)

    var channelCount = 1
    let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
    for item in formatDescriptions {
        guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
            fatalError("Couldn't get the format description")
        }
        channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
    }
    let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
    let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
    var outputSamples = [Float]()
    var sampleBuffer = Data()

    // 16-bit samples
    reader.startReading()
    defer { reader.cancelReading() }
    while reader.status == .reading {
        guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
            let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                break
        }
        // Append audio sample buffer into our current sample buffer
        var readBufferLength = 0
        var readBufferPointer: UnsafeMutablePointer<Int8>?
        CMBlockBufferGetDataPointer(readBuffer, 0, &readBufferLength, nil, &readBufferPointer)
        sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
        let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
        let downSampledLength = totalSamples / samplesPerPixel
        let samplesToProcess = downSampledLength * samplesPerPixel
        guard samplesToProcess > 0 else { continue }
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // Process the remaining samples at the end which didn't fit into samplesPerPixel
    let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
    if samplesToProcess > 0 {
        let downSampledLength = 1
        let samplesPerPixel = samplesToProcess
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
    guard reader.status == .completed else {
        fatalError("Couldn't read the audio file")
    }
    return outputSamples
}
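To make the down-sampling factor concrete, here is the arithmetic from `AudioContext.load` and `render` pulled into a hypothetical helper. The worked numbers assume a 44,100 Hz stereo file exactly 287 s long (the 4 min 47 s MP3 from the timings above; its channel count is not stated, so stereo is an assumption) whose duration timescale equals the sample rate. Because the reader output is interleaved, `samplesPerPixel` counts individual 16-bit values across all channels.

```swift
import Foundation

/// Mirrors the sample-count arithmetic in AudioContext.load and render(audioContext:targetSamples:).
func samplesPerPixel(sampleRate: Double, durationValue: Int64, durationTimescale: Int32,
                     channelCount: Int, targetSamples: Int) -> Int {
    // Frames in the file: sampleRate * duration in seconds.
    let totalSamples = Int(sampleRate * Double(durationValue) / Double(durationTimescale))
    // Interleaved Int16 values averaged into each output value.
    return max(1, channelCount * totalSamples / targetSamples)
}

let perPixelDefault = samplesPerPixel(sampleRate: 44_100, durationValue: 287 * 44_100,
                                      durationTimescale: 44_100, channelCount: 2, targetSamples: 100)
let perPixelFor300 = samplesPerPixel(sampleRate: 44_100, durationValue: 287 * 44_100,
                                     durationTimescale: 44_100, channelCount: 2, targetSamples: 300)
print(perPixelDefault, perPixelFor300)
```

Under these assumptions, 12,656,700 frames × 2 channels / 100 gives 253,134 values per output point, and the `targetSamples: 300` call used later gives 84,378.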
E) render uses this function to down-sample the data from the audio file, and convert to decibels:
func processSamples(fromData sampleBuffer: inout Data,
                    outputSamples: inout [Float],
                    samplesToProcess: Int,
                    downSampledLength: Int,
                    samplesPerPixel: Int,
                    filter: [Float]) {
    let downSampledData: [Float] = sampleBuffer.withUnsafeBytes { (samples: UnsafePointer<Int16>) -> [Float] in
        var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)
        let sampleCount = vDSP_Length(samplesToProcess)
        //Convert 16bit int samples to floats
        vDSP_vflt16(samples, 1, &processingBuffer, 1, sampleCount)
        //Take the absolute values to get amplitude
        vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)
        //get the corresponding dB, and clip the results
        getdB(from: &processingBuffer)
        //Downsample and average
        var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
        vDSP_desamp(processingBuffer,
                    vDSP_Stride(samplesPerPixel),
                    filter, &downSampledData,
                    vDSP_Length(downSampledLength),
                    vDSP_Length(samplesPerPixel))
        return downSampledData
    }
    //Remove processed samples (outside withUnsafeBytes, so the Data isn't mutated while its bytes are being read)
    sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)
    outputSamples += downSampledData
}
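`vDSP_desamp`, called with a filter of `samplesPerPixel` copies of `1 / samplesPerPixel` and a stride of `samplesPerPixel`, computes the mean of each consecutive block of `samplesPerPixel` values. A plain-Swift sketch of the same reduction (the name `blockAverage` is hypothetical, and it does not use Accelerate) can help when checking the output on a small input:

```swift
import Foundation

/// Averages consecutive, non-overlapping blocks of `samplesPerPixel` values,
/// like vDSP_desamp with a uniform 1/n filter. A trailing partial block is dropped,
/// which is why render(audioContext:targetSamples:) handles the remainder separately.
func blockAverage(_ values: [Float], samplesPerPixel: Int) -> [Float] {
    precondition(samplesPerPixel > 0, "Block size must be positive")
    let blockCount = values.count / samplesPerPixel
    return (0..<blockCount).map { block in
        let start = block * samplesPerPixel
        let slice = values[start..<(start + samplesPerPixel)]
        return slice.reduce(0, +) / Float(samplesPerPixel)
    }
}

print(blockAverage([1, 2, 3, 4, 5, 6, 7], samplesPerPixel: 3))
```

Note that in `processSamples` the averaging runs after the dB conversion, so each output value is a mean of decibel values rather than the decibel level of a mean amplitude.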
F) Which in turn calls this function that gets the corresponding dB, and clips the results to [noiseFloor, 0]:
func getdB(from normalizedSamples: inout [Float]) {
    // Convert samples to a log scale
    var zero: Float = 32768.0
    vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)
    //Clip to [noiseFloor, 0]
    var ceil: Float = 0.0
    var noiseFloorMutable = noiseFloor
    vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
}
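With the last argument of `vDSP_vdbcon` set to 1, each value x becomes 20·log10(x / 32768), and `vDSP_vclip` then bounds it to [noiseFloor, 0]. A scalar sketch of that mapping (the name `clippedDecibels` is hypothetical, and it uses plain Swift instead of Accelerate) is handy for spot-checking a few samples by hand:

```swift
import Foundation

/// Scalar version of the abs + getdB steps: 16-bit sample -> dB relative to full scale, clipped.
func clippedDecibels(_ sample: Int16, noiseFloor: Float = -80) -> Float {
    // Convert to Float before abs so Int16.min doesn't overflow.
    let amplitude = abs(Float(sample))
    // 20 * log10(a / 32768); log10(0) is -infinity, which the clip maps to noiseFloor.
    let db = 20 * log10(amplitude / 32768)
    return min(0, max(noiseFloor, db))
}

for sample: Int16 in [0, 1, 16384, -32768] {
    print(sample, clippedDecibels(sample))
}
```

A half-scale sample (16384) comes out at about -6.02 dB, full scale at 0 dB, and silence is pinned to the noise floor.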
G) Finally you can get the waveform of the audio like so:
guard let path = Bundle.main.path(forResource: "audio", ofType:"mp3") else {
    fatalError("Couldn't find the file path")
}
let url = URL(fileURLWithPath: path)
var outputArray : [Float] = []
AudioContext.load(fromAudioURL: url, completionHandler: { audioContext in
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    outputArray = self.render(audioContext: audioContext, targetSamples: 300)
})
Don't forget that AudioContext.load(fromAudioURL:) is asynchronous.
This solution is synthesized from this repo by William Entriken. All credit goes to him.
Swift 5
Here is the same code updated to Swift 5 syntax:
import AVFoundation
import Accelerate
/// Holds audio information used for building waveforms
final class AudioContext {
    /// The audio asset URL used to load the context
    public let audioURL: URL
    /// Total number of samples in loaded asset
    public let totalSamples: Int
    /// Loaded asset
    public let asset: AVAsset
    // Loaded assetTrack
    public let assetTrack: AVAssetTrack

    private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
        self.audioURL = audioURL
        self.totalSamples = totalSamples
        self.asset = asset
        self.assetTrack = assetTrack
    }

    public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
        let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])
        guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
            fatalError("Couldn't load AVAssetTrack")
        }
        asset.loadValuesAsynchronously(forKeys: ["duration"]) {
            var error: NSError?
            let status = asset.statusOfValue(forKey: "duration", error: &error)
            switch status {
            case .loaded:
                guard
                    let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                    let audioFormatDesc = formatDescriptions.first,
                    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                    else { break }
                let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                completionHandler(audioContext)
                return
            case .failed, .cancelled, .loading, .unknown:
                print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
            }
            // Reached on failure, or when the format description couldn't be read
            completionHandler(nil)
        }
    }
}
let noiseFloor: Float = -80
func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples
    guard let reader = try? AVAssetReader(asset: audioContext.asset)
        else {
            fatalError("Couldn't initialize the AVAssetReader")
    }
    reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                   duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))
    let outputSettingsDict: [String : Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                outputSettings: outputSettingsDict)
    readerOutput.alwaysCopiesSampleData = false
    reader.add(readerOutput)

    var channelCount = 1
    let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
    for item in formatDescriptions {
        guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
            fatalError("Couldn't get the format description")
        }
        channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
    }
    let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
    let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
    var outputSamples = [Float]()
    var sampleBuffer = Data()

    // 16-bit samples
    reader.startReading()
    defer { reader.cancelReading() }
    while reader.status == .reading {
        guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
            let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                break
        }
        // Append audio sample buffer into our current sample buffer
        var readBufferLength = 0
        var readBufferPointer: UnsafeMutablePointer<Int8>?
        CMBlockBufferGetDataPointer(readBuffer,
                                    atOffset: 0,
                                    lengthAtOffsetOut: &readBufferLength,
                                    totalLengthOut: nil,
                                    dataPointerOut: &readBufferPointer)
        sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
        let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
        let downSampledLength = totalSamples / samplesPerPixel
        let samplesToProcess = downSampledLength * samplesPerPixel
        guard samplesToProcess > 0 else { continue }
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // Process the remaining samples at the end which didn't fit into samplesPerPixel
    let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
    if samplesToProcess > 0 {
        let downSampledLength = 1
        let samplesPerPixel = samplesToProcess
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
    guard reader.status == .completed else {
        fatalError("Couldn't read the audio file")
    }
    return outputSamples
}
func processSamples(fromData sampleBuffer: inout Data,
                    outputSamples: inout [Float],
                    samplesToProcess: Int,
                    downSampledLength: Int,
                    samplesPerPixel: Int,
                    filter: [Float]) {
    var downSampledData = [Float](repeating: 0.0, count: downSampledLength)

    sampleBuffer.withUnsafeBytes { (samples: UnsafeRawBufferPointer) in
        var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)
        let sampleCount = vDSP_Length(samplesToProcess)

        //Create an UnsafePointer<Int16> from samples
        let unsafeBufferPointer = samples.bindMemory(to: Int16.self)
        let unsafePointer = unsafeBufferPointer.baseAddress!

        //Convert 16bit int samples to floats
        vDSP_vflt16(unsafePointer, 1, &processingBuffer, 1, sampleCount)

        //Take the absolute values to get amplitude
        vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)

        //get the corresponding dB, and clip the results
        getdB(from: &processingBuffer)

        //Downsample and average
        vDSP_desamp(processingBuffer,
                    vDSP_Stride(samplesPerPixel),
                    filter, &downSampledData,
                    vDSP_Length(downSampledLength),
                    vDSP_Length(samplesPerPixel))
    }

    //Remove processed samples once withUnsafeBytes has returned, so the Data isn't mutated while its bytes are being read
    sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)

    outputSamples += downSampledData
}
func getdB(from normalizedSamples: inout [Float]) {
    // Convert samples to a log scale
    var zero: Float = 32768.0
    vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)

    //Clip to [noiseFloor, 0]
    var ceil: Float = 0.0
    var noiseFloorMutable = noiseFloor
    vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
}
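If Accelerate isn't available (for example, to unit-test the math on another platform), the per-bin computation in processSamples and getdB can be mirrored in plain Swift. This is a minimal sketch, assuming interleaved Int16 samples that are already decoded into an array; the function name `downsampledDecibels` is hypothetical and not part of the answer.

```swift
import Foundation

/// Hypothetical pure-Swift mirror of processSamples + getdB (no Accelerate).
/// Each Int16 sample becomes 20 * log10(|x| / 32768), clipped to [noiseFloor, 0],
/// and every `samplesPerBin` consecutive values are averaged (a box filter).
func downsampledDecibels(_ samples: [Int16],
                         samplesPerBin: Int,
                         noiseFloor: Float = -80) -> [Float] {
    precondition(samplesPerBin > 0, "samplesPerBin must be positive")

    // Like vflt16 + vabs + vdbcon (amplitude flag) + vclip
    let decibels = samples.map { sample -> Float in
        let amplitude = abs(Float(sample))        // convert first: abs(Int16.min) would trap
        let db = 20 * log10(amplitude / 32768)    // log10(0) is -inf, clipped below
        return min(max(db, noiseFloor), 0)
    }

    // Like vDSP_desamp with a 1/samplesPerBin filter: average each full bin
    let binCount = decibels.count / samplesPerBin
    return (0..<binCount).map { bin -> Float in
        let slice = decibels[(bin * samplesPerBin)..<((bin + 1) * samplesPerBin)]
        return slice.reduce(0, +) / Float(samplesPerBin)
    }
}
```

Unlike render, this sketch drops a trailing partial bin instead of averaging it into one extra value.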
Old solution
Here is a function you could use to pre-render the meter levels of an audio file without playing it:
func averagePowers(audioFileURL: URL, forChannel channelNumber: Int, completionHandler: @escaping(_ success: [Float]) -> ()) {
    let audioFile = try! AVAudioFile(forReading: audioFileURL)
    let audioFilePFormat = audioFile.processingFormat
    let audioFileLength = audioFile.length

    //Set the size of frames to read from the audio file, you can adjust this to your liking
    let frameSizeToRead = Int(audioFilePFormat.sampleRate/20)

    //This is to how many frames/portions we're going to divide the audio file
    let numberOfFrames = Int(audioFileLength)/frameSizeToRead

    //Create a pcm buffer the size of a frame
    guard let audioBuffer = AVAudioPCMBuffer(pcmFormat: audioFilePFormat, frameCapacity: AVAudioFrameCount(frameSizeToRead)) else {
        fatalError("Couldn't create the audio buffer")
    }

    //Do the calculations in a background thread, if you don't want to block the main thread for larger audio files
    DispatchQueue.global(qos: .userInitiated).async {
        //This is the array to be returned
        var returnArray : [Float] = [Float]()

        //We're going to read the audio file, frame by frame
        for i in 0..<numberOfFrames {
            //Change the position from which we are reading the audio file, since each frame starts from a different position in the audio file
            audioFile.framePosition = AVAudioFramePosition(i * frameSizeToRead)

            //Read the frame from the audio file
            try! audioFile.read(into: audioBuffer, frameCount: AVAudioFrameCount(frameSizeToRead))

            //Get the data from the chosen channel
            let channelData = audioBuffer.floatChannelData![channelNumber]

            //This is the array of floats
            let arr = Array(UnsafeBufferPointer(start:channelData, count: frameSizeToRead))

            //Calculate the mean value of the absolute values
            let meanValue = arr.reduce(0, {$0 + abs($1)})/Float(arr.count)

            //Calculate the dB power (You can adjust this), if average is less than 0.000_000_01 we limit it to -160.0
            let dbPower: Float = meanValue > 0.000_000_01 ? 20 * log10(meanValue) : -160.0

            //append the db power in the current frame to the returnArray
            returnArray.append(dbPower)
        }

        //Return the dBPowers (the handler is called on this background queue)
        completionHandler(returnArray)
    }
}
And you can call it like so:
let path = Bundle.main.path(forResource: "audio.mp3", ofType:nil)!
let url = URL(fileURLWithPath: path)
averagePowers(audioFileURL: url, forChannel: 0, completionHandler: { array in
    //Use the array (dispatch to the main queue before updating the UI)
})
Profiled with Instruments, this solution causes high CPU usage for about 1.2 seconds, takes about 5 seconds to return the returnArray, and up to 10 seconds in Low Power Mode.
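The per-frame math in averagePowers does not depend on AVAudioFile, so it can be pulled out and tested (or profiled) on samples already in memory. A minimal sketch, assuming a single channel of Float samples; the name `meanAbsoluteDecibels` is hypothetical. It works on array slices instead of copying each frame into a new Array, which may help performance but has not been measured here.

```swift
import Foundation

/// Hypothetical helper: the averagePowers math applied to in-memory samples.
/// Splits `samples` into frames of `frameSize`, averages |x| per frame and
/// converts the mean to dB, limiting near-silent frames to -160 dB.
func meanAbsoluteDecibels(_ samples: [Float], frameSize: Int) -> [Float] {
    precondition(frameSize > 0, "frameSize must be positive")
    let numberOfFrames = samples.count / frameSize   // a trailing partial frame is dropped
    var powers = [Float]()
    powers.reserveCapacity(numberOfFrames)
    for i in 0..<numberOfFrames {
        let frame = samples[(i * frameSize)..<((i + 1) * frameSize)]
        let meanValue = frame.reduce(0) { $0 + abs($1) } / Float(frameSize)
        powers.append(meanValue > 0.000_000_01 ? 20 * log10(meanValue) : -160.0)
    }
    return powers
}
```

With a 44.1 kHz file, the answer's frame size of sampleRate/20 corresponds to `frameSize: 2205`.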
First of all, this is a heavy operation, so it will take some OS time and resources to complete. In the example below I use the standard frame rate and sampling, but you should really sample far less if, for example, you only want to display bars as an indication.
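One common way to "sample far less" for a bar display is to reduce the samples to one value per bar before doing any dB math. This sketch keeps the peak magnitude of each bucket, which is a different reduction from the per-sample analysis in this answer and is offered only as an alternative; `peakBars` is a hypothetical name.

```swift
/// Hypothetical helper: reduce `samples` to at most `barCount` values by keeping
/// the largest magnitude in each bucket, so later processing runs on far fewer points.
func peakBars(_ samples: [Float], barCount: Int) -> [Float] {
    let bars = min(barCount, samples.count)
    guard bars > 0 else { return [] }
    return (0..<bars).map { i -> Float in
        // Proportional bounds spread any remainder across the buckets,
        // and every bucket holds at least one sample because bars <= samples.count
        let start = i * samples.count / bars
        let end = (i + 1) * samples.count / bars
        return samples[start..<end].reduce(0) { max($0, abs($1)) }
    }
}
```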
You don't need to play the sound to analyze it, so I will not use AVAudioPlayer at all. I assume the track is given as a URL:
let path = Bundle.main.path(forResource: "example3.mp3", ofType:nil)!
let url = URL(fileURLWithPath: path)
Then I use AVAudioFile to read the track into an AVAudioPCMBuffer. Once it is in the buffer, you have all the information about your track:
func buffer(url: URL) {
    do {
        let track = try AVAudioFile(forReading: url)
        let format = AVAudioFormat(commonFormat:.pcmFormatFloat32, sampleRate:track.fileFormat.sampleRate, channels: track.fileFormat.channelCount, interleaved: false)
        let buffer = AVAudioPCMBuffer(pcmFormat: format!, frameCapacity: UInt32(track.length))!
        try track.read(into : buffer, frameCount:UInt32(track.length))
        self.analyze(buffer: buffer)
    } catch {
        print("Couldn't read \(url): \(error)")
    }
}
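To put "heavy" in numbers: buffer(url:) allocates the whole track as deinterleaved Float32, that is frames × channels × 4 bytes. A tiny helper (hypothetical name `float32BufferBytes`) makes the arithmetic explicit; the 44.1 kHz stereo figures used below are assumptions for illustration.

```swift
/// Hypothetical helper: bytes needed to hold a whole track as Float32 PCM,
/// as buffer(url:) does (frames x channels x 4 bytes per sample).
func float32BufferBytes(frameCount: Int, channelCount: Int) -> Int {
    return frameCount * channelCount * MemoryLayout<Float>.size
}
```

For example, a 3:36 track (216 s) at an assumed 44.1 kHz in stereo needs 216 × 44,100 × 2 × 4 = 76,204,800 bytes, roughly 76 MB, before any analysis starts.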
As you may notice, there is an analyze method for it. The samples you need are in the buffer's floatChannelData property. It is plain data, so you'll need to parse it. Here is the method, with an explanation below:
func analyze(buffer: AVAudioPCMBuffer) {
    let channelCount = Int(buffer.format.channelCount)
    let frameLength = Int(buffer.frameLength)
    var result = Array(repeating: [Float](repeatElement(0, count: frameLength)), count: channelCount)
    for channel in 0..<channelCount {
        for sampleIndex in 0..<frameLength {
            // Samples lie in -1...1: sqrt of a negative sample is NaN, and log10(0) for silence is -inf
            let sqrtV = sqrt(buffer.floatChannelData![channel][sampleIndex*buffer.stride]/Float(buffer.frameLength))
            let dbPower = 20 * log10(sqrtV)
            result[channel][sampleIndex] = dbPower
        }
    }
    // result now holds one value per channel and sample; store or return it as needed
}
There are some (heavy) calculations involved. When I was working on a similar solution a couple of months ago, I came across the tutorial at https://www.raywenderlich.com/5154-avaudioengine-tutorial-for-ios-getting-started which has an excellent explanation of this calculation. Parts of the code above come from it and I also use them in my project, so I want to credit the author here: Scott McAlister 👏
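Note that analyze(buffer:) takes the square root of each raw sample divided by frameLength, which is NaN for every negative sample (the Objective-C answer below has to filter those NaNs out). A root-mean-square level avoids this because the samples are squared before the root is taken. A minimal sketch computing RMS in dBFS per window; `rmsDecibels`, `windowSize` and the -160 dB floor are assumptions, not code from the answer or the tutorial.

```swift
import Foundation

/// Hypothetical helper: level of each window as root-mean-square, in dBFS.
/// Squaring keeps every term non-negative, so no NaN is produced.
func rmsDecibels(_ samples: [Float], windowSize: Int, silenceFloor: Float = -160) -> [Float] {
    precondition(windowSize > 0, "windowSize must be positive")
    let windowCount = samples.count / windowSize   // a trailing partial window is dropped
    return (0..<windowCount).map { w -> Float in
        let window = samples[(w * windowSize)..<((w + 1) * windowSize)]
        let meanSquare = window.reduce(0) { $0 + $1 * $1 } / Float(windowSize)
        let rms = meanSquare.squareRoot()
        // Silent windows would give log10(0) = -inf, so report the floor instead
        return rms > 0 ? max(20 * log10(rms), silenceFloor) : silenceFloor
    }
}
```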
Based on @Jakub's answer above, here's an Objective-C version.
If you want to increase the accuracy, change the deciblesCount variable, but beware of the performance hit. If you want to return more bars, you can increase the divisions variable when you call the function (with no additional performance hit). You should probably put it on a background thread in any case.
A 3:36 minute / 5.2 MB song takes about 1.2 s. The above images are of a shotgun firing, with 30 and 100 divisions respectively.
-(NSArray *)returnWaveArrayForFile:(NSString *)filepath numberOfDivisions:(int)divisions{
    //pull file (filepath is a file system path, so build a file URL from it)
    NSError * error;
    NSURL * url = [NSURL fileURLWithPath:filepath];
    AVAudioFile * file = [[AVAudioFile alloc] initForReading:url error:&error];

    //create av stuff
    AVAudioFormat * format = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32 sampleRate:file.fileFormat.sampleRate channels:file.fileFormat.channelCount interleaved:false];
    AVAudioPCMBuffer * buffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:format frameCapacity:(int)file.length];
    [file readIntoBuffer:buffer frameCount:(int)file.length error:&error];

    //grab total number of decibels, 1000 seems to work
    int deciblesCount = MIN(1000,buffer.frameLength);
    NSMutableArray * channels = [NSMutableArray new];
    float frameIncrement = buffer.frameLength / (float)deciblesCount;

    //needed later
    float maxDecible = 0;
    float minDecible = HUGE_VALF;
    NSMutableArray * sd = [NSMutableArray new]; //used for standard deviation
    for (int n = 0; n < MIN(buffer.format.channelCount, 2); n++){ //go through channels
        NSMutableArray * decibles = [NSMutableArray new]; //holds actual decibel values

        //go through pulling the decibels
        for (int i = 0; i < deciblesCount; i++){
            int offset = frameIncrement * i; //grab offset

            //equation from stack, no idea the maths
            float sqr = sqrtf(buffer.floatChannelData[n][offset * buffer.stride]/(float)buffer.frameLength);
            float decible = 20 * log10f(sqr);

            decible += 160; //make positive
            decible = (isnan(decible) || decible < 0) ? 0 : decible; //if it's not a number or silent, make it zero

            if (decible > 0){ //if it has volume
                [sd addObject:@(decible)];
            }
            [decibles addObject:@(decible)];//add to decibles array

            maxDecible = MAX(maxDecible, decible); //grab biggest
            minDecible = MIN(minDecible, decible); //grab smallest
        }

        [channels addObject:decibles]; //add to channels array
    }

    //find standard deviation and then deduct the bottom slag
    NSExpression * expression = [NSExpression expressionForFunction:@"stddev:" arguments:@[[NSExpression expressionForConstantValue:sd]]];
    float standardDeviation = [[expression expressionValueWithObject:nil context:nil] floatValue];
    float deviationDeduct = standardDeviation / (standardDeviation + (maxDecible - minDecible));

    NSMutableArray * returning = [NSMutableArray new];

    for (int c = 0; c < (int)channels.count; c++){
        NSArray * channel = channels[c];

        //go through calculating deviation percentage (one array per channel, so indices start at 0 for each channel)
        NSMutableArray * deviations = [NSMutableArray new];
        for (int n = 0; n < (int)channel.count; n++){
            float decible = [channel[n] floatValue];
            float remainder = (maxDecible - decible);
            float deviation = standardDeviation / (standardDeviation + remainder) - deviationDeduct;
            [deviations addObject:@(deviation)];
        }

        //go through creating percentage
        float maxTotal = 0;
        int catchCount = floorf(deciblesCount / divisions); //total decibel values within a segment or division
        NSMutableArray * totals = [NSMutableArray new];
        for (int n = 0; n < divisions; n++){
            float total = 0.0f;
            for (int k = 0; k < catchCount; k++){ //go through each segment
                int index = n * catchCount + k; //create the index
                float deviation = [deviations[index] floatValue]; //grab value
                total += deviation; //add to total
            }
            //max out maxTotal var -> used later to calc percentages
            maxTotal = MAX(maxTotal, total);
            [totals addObject:@(total)]; //add to totals array
        }

        //normalise percentages and return
        NSMutableArray * percentages = [NSMutableArray new];
        for (int n = 0; n < divisions; n++){
            float total = [totals[n] floatValue]; //grab the total value for that segment
            float percentage = total / maxTotal; //divide by the biggest value -> making it a percentage
            [percentages addObject:@(percentage)]; //add to the array
        }

        //add to the returning array
        [returning addObject:percentages];
    }

    //return channel data -> array of two arrays of percentages
    return (NSArray *)returning;
}
Call like this:
int divisions = 30; //number of segments you want for your display
NSString * path = [[NSBundle mainBundle] pathForResource:@"satie" ofType:@"mp3"];
NSArray * channels = [_audioReader returnWaveArrayForFile:path numberOfDivisions:divisions];
You get up to two channels back in that array (one for a mono file), which you can use to update your UI. Values in each array are between 0 and 1, which you can use to build your bars.
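The final stage of returnWaveArrayForFile: (sum each division, then divide by the largest total) is what keeps the values in 0...1. A Swift sketch of just that step, assuming non-negative input values; `normalizedBars` is a hypothetical name, and unlike the Objective-C code it returns zeros instead of dividing by zero when every total is 0.

```swift
/// Hypothetical Swift port of the last stage of returnWaveArrayForFile:
/// sum `values` in `divisions` equal segments, then divide by the largest total.
func normalizedBars(_ values: [Float], divisions: Int) -> [Float] {
    precondition(divisions > 0, "divisions must be positive")
    let catchCount = values.count / divisions   // leftover values are ignored, as in the original
    let totals = (0..<divisions).map { n -> Float in
        values[(n * catchCount)..<((n + 1) * catchCount)].reduce(0, +)
    }
    // Avoid dividing by zero when the input is silent
    guard let maxTotal = totals.max(), maxTotal > 0 else {
        return [Float](repeating: 0, count: divisions)
    }
    return totals.map { $0 / maxTotal }
}
```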

Converting M4A file into Raw data

I am trying to read the raw values of a sound file. I am pretty new to iOS development. I am ultimately trying to take a fast Fourier transform (FFT) of the audio file. The output data looks like a sound wave, but when I take the FFT of a beep sound provided here, I do not get an obvious frequency from the FFT, which leads me to believe I am not getting the real raw data. I have put together the following code from several Stack Overflow posts. Am I reading the file incorrectly?
class AudioAnalyzer {
init(file_path: NSURL) {
var assetOptions = [
AVURLAssetPreferPreciseDurationAndTimingKey : 1,
AVFormatIDKey : kAudioFormatLinearPCM
var videoAsset=AVURLAsset(URL: file_path, options: assetOptions)
var error:NSError?
var videoAssetReader=AVAssetReader(asset: videoAsset, error: &error)
if error != nil
var tracksArray=videoAsset?.tracksWithMediaType(AVMediaTypeAudio)
var videotrack = tracksArray?[0] as! AVAssetTrack
var fps = videotrack.nominalFrameRate
var videoTrackOutput=AVAssetReaderTrackOutput(track:videotrack as AVAssetTrack , outputSettings: nil)
if videoAssetReader.canAddOutput(videoTrackOutput)
if videoAssetReader.status == AVAssetReaderStatus.Reading {
var sampleBuffer = videoTrackOutput.copyNextSampleBuffer()
var audioBuffer = CMSampleBufferGetDataBuffer(sampleBuffer)
let samplesInBuffer = CMSampleBufferGetNumSamples(sampleBuffer)
var currentZ = Double(samplesInBuffer)
let buffer: CMBlockBufferRef = CMSampleBufferGetDataBuffer(sampleBuffer)
var lengthAtOffset: size_t = 0
var totalLength: size_t = 0
var data: UnsafeMutablePointer<Int8> = nil
var output: Array<Float> = [];
if( CMBlockBufferGetDataPointer( buffer, 0, &lengthAtOffset, &totalLength, &data ) != noErr ) {
println("some sort of error happened")
} else {
for i in stride(from: 0, to: totalLength, by: 2) {
var myint = Int16(data[i]) << 8 | Int16(data[i+1])
var myFloat = Float(myint)
Because you pass nil as outputSettings, your AVAssetReaderTrackOutput is giving you raw packet data in the file's stored format. For LPCM output, pass in some outputSettings:
var settings = [NSObject : AnyObject]()
settings[AVFormatIDKey] = kAudioFormatLinearPCM
settings[AVLinearPCMBitDepthKey] = 16
settings[AVLinearPCMIsFloatKey] = false
var videoTrackOutput=AVAssetReaderTrackOutput(track:videotrack as AVAssetTrack , outputSettings: settings)
p.s. I'd feel much better if you renamed videoTrackOutput to audioTrackOutput.
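Once the output settings request 16-bit integer LPCM, the bytes still have to be combined in the right order. The question's `Int16(data[i]) << 8 | Int16(data[i+1])` treats the first byte as the high byte (big-endian), and converting a negative Int8 to Int16 sign-extends it, which sets the upper 8 bits. A minimal sketch for little-endian data, assuming interleaved samples and that `AVLinearPCMIsBigEndianKey` is set to false in the settings; `decodeInt16LittleEndian` is a hypothetical name.

```swift
/// Hypothetical helper: decode interleaved little-endian signed 16-bit PCM bytes
/// into Floats in [-1, 1).
func decodeInt16LittleEndian(_ bytes: [UInt8]) -> [Float] {
    // Walk the bytes in pairs; a trailing odd byte is ignored
    return stride(from: 0, to: bytes.count - 1, by: 2).map { i -> Float in
        // Low byte first, both widened as unsigned so nothing is sign-extended
        let bits = UInt16(bytes[i]) | (UInt16(bytes[i + 1]) << 8)
        // Reinterpret the 16 bits as a signed sample and scale by 32768
        return Float(Int16(bitPattern: bits)) / 32768
    }
}
```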
