I've been stuck on this problem for days now and have looked through nearly every related StackOverflow page. Through this, I now have a much greater understanding of what FFT is and how it works. Despite this, I'm having extreme difficulties implementing it into my application.
In short, what I am trying to do is make a spectrum visualizer for my application (Similar to this). From what I've gathered, I'm pretty sure I need to use the magnitudes of the sound as the heights of my bars. So with all this in mind, currently I am able to analyze an entire .caf file all at once. To do this, I am using the following code:
let audioFile = try! AVAudioFile(forReading: soundURL!)
let frameCount = UInt32(audioFile.length)
let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
do {
try audioFile.readIntoBuffer(buffer, frameCount:frameCount)
} catch {
}
let log2n = UInt(round(log2(Double(frameCount))))
let bufferSize = Int(1 << log2n)
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
var realp = [Float](count: bufferSize/2, repeatedValue: 0)
var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)
vDSP_ctoz(UnsafePointer<DSPComplex>(buffer.floatChannelData.memory), 2, &output, 1, UInt(bufferSize / 2))
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))
var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
let bufferOver2: vDSP_Length = vDSP_Length(bufferSize / 2)
vDSP_zvmags(&output, 1, &fft, 1, bufferOver2)
This works fine and outputs a long array of data. However, the problem with this code is it analyzes the entire audio file at once. What I need is to be analyzing the audio file as it is playing, very similar to this video: Spectrum visualizer.
So I guess my question is this: How do you perform FFT analysis while the audio is playing?
Also, on top of this, how do I go about converting the output of an FFT analysis to actual heights for a bar? One of the outputs I received for an audio file using the FFT analysis code from above was this: http://pastebin.com/RBLTuGx7. The only reason for the pastebin is due to how long it is. I'm assuming I average all these numbers together and use those values instead? (Just for reference, I got that array by printing out the 'fft' variable in the code above)
I've attempted reading through the EZAudio code, however I am unable to find how they are reading in samples of audio in live time. Any help is greatly appreciated.
Here's how it is done in AudioKit, using EZAudio's FFT tools:
Create a class for your FFT that will hold the data:
#objc public class AKFFT: NSObject, EZAudioFFTDelegate {
internal let bufferSize: UInt32 = 512
internal var fft: EZAudioFFT?
/// Array of FFT data
public var fftData = [Double](count: 512, repeatedValue: 0.0)
...
}
Initialize the class and setup the FFT. Also install the tap on the appropriate node.
public init(_ input: AKNode) {
super.init()
fft = EZAudioFFT.fftWithMaximumBufferSize(vDSP_Length(bufferSize), sampleRate: 44100.0, delegate: self)
input.avAudioNode.installTapOnBus(0, bufferSize: bufferSize, format: AKManager.format) { [weak self] (buffer, time) -> Void in
if let strongSelf = self {
buffer.frameLength = strongSelf.bufferSize;
let offset: Int = Int(buffer.frameCapacity - buffer.frameLength);
let tail = buffer.floatChannelData[0];
strongSelf.fft!.computeFFTWithBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
}
}
}
Then implement the callback to load your internal fftData array:
#objc public func fft(fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
dispatch_async(dispatch_get_main_queue()) { () -> Void in
for i in 0...511 {
self.fftData[i] = Double(fftData[i])
}
}
}
AudioKit's implementation may change so you should check https://github.com/audiokit/AudioKit/ to see if any improvements were made. EZAudio is at https://github.com/syedhali/EZAudio
Related
I struggle to make the following scenario work as expected (code will be provided below).
Record my microphone input and store an AVAudioPCMBuffer in memory, this is done with AVAudioPCMBuffer extension method copy(from buffer: AVAudioPCMBuffer, readOffset: AVAudioFrameCount = default, frames: AVAudioFrameCount = default). I indeed get the buffer at the end of my recording.
When record is ended pass the buffer to AKPlayer and play. Here is a code snippet to demonstrate what I do (I know it is no the full app code, if needed I can share it):
.
private var player: AKPlayer = AKPlayer()
self.player.buffering = .always
// in the record complete callbak:
self.player.buffer = self.bufferRecorder?.pcmBuffer
self.player.volume = 1
self.player.play()
please note that the plater is connected to a mixer which is eventually connected to the AudioKit output.
when I inspect and debug the application I could see the buffer is with the correct length, and all my output/input setup uses the same processing format (sample rate, channels, bitrate etc) as well as the buffer recorded, but still my app crashes on this line:
2018-10-28 08:40:32.625001+0200 BeatmanApp[71037:6731884] [avae] AVAEInternal.h:70:_AVAE_Check:
required condition is false: [AVAudioPlayerNode.mm:665:ScheduleBuffer: (_outputFormat.channelCount == buffer.format.channelCount)]
when I debug and walk through the AudioKit code I can see that the breaking line is on AKPlayer+Playback.swift on line 162 on the method: playerNode.scheduleBuffer
more information that could be helpful:
the buffer recorded is 16 seconds long.
when I tried to pass the buffer straight to the player node in the tap method it seems as it worked, I did hear a delay from mic to speaker but it indeed played back.
I tried call prepare on the player before play method invoked, no help
thanks!
Ok, this was super uncool debugging session. I had to investigate the AVAudioEngine and how this kind of scenario could be done there, which of course not the final result I was looking. This quest helped me to understand how to solve it with AudioKit (half of my app is implemented using AudioKit's tools so it doesn't make sense to rewrite it with AVFoundation).
AFFoundation solution:
private let engine = AVAudioEngine()
private let bufferSize = 1024
private let p: AVAudioPlayerNode = AVAudioPlayerNode()
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(.playAndRecord, mode: .default, options: .defaultToSpeaker)
} catch {
print("Setting category to AVAudioSessionCategoryPlayback failed.")
}
let inputNode = self.engine.inputNode
engine.connect(inputNode, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))
// !!! the following lines are the key to the solution.
// !!! the player has to be attached to the engine before actually connected
engine.attach(p)
engine.connect(p, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))
do {
try engine.start()
} catch {
print("could not start engine \(error.localizedDescription)")
}
recordBufferAndPlay(duration: 4)
recordBufferAndPlay function:
func recordBufferAndPlay(duration: Double){
let inputNode = self.engine.inputNode
let total: Double = AVAudioSession.sharedInstance().sampleRate * duration
let totalBufferSize: UInt32 = UInt32(total)
let recordedBuffer : AVAudioPCMBuffer! = AVAudioPCMBuffer(pcmFormat: inputNode.inputFormat(forBus: 0), frameCapacity: totalBufferSize)
var alreadyRecorded = 0
inputNode.installTap(onBus: 0, bufferSize: 256, format: inputNode.inputFormat(forBus: 0)) {
(buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
recordedBuffer.copy(from: buffer) // this helper function is taken from audio kit!
alreadyRecorded = alreadyRecorded + Int(buffer.frameLength)
print(alreadyRecorded, totalBufferSize)
if(alreadyRecorded >= totalBufferSize){
inputNode.removeTap(onBus: 0)
self.p.scheduleBuffer(recordedBuffer, at: nil, options: .loops, completionHandler: {
print("completed playing")
})
self.p.play()
}
}
}
AudioKit solution:
So in the AudioKit solution these line should be invoked on your AKPlayer object. Note that this should be done before you actually start your engine.
self.player.buffering = .always
AudioKit.engine.attach(self.player.playerNode)
AudioKit.engine.connect(self.player.playerNode, to: self.mixer.inputNode, format: AudioKit.engine.inputNode.outputFormat(forBus: 0))
than the record is done pretty similarly to how you would have done it in AVAudioEngine, you install a tap on your node (microphone or other node) and record the buffer of PCM samples.
I am trying to save depth images from the iPhoneX TrueDepth camera. Using the AVCamPhotoFilter sample code, I am able to view the depth, converted to grayscale format, on the screen of the phone in real-time. I cannot figure out how to save the sequence of depth images in the raw (16 bits or more) format.
I have depthData which is an instance of AVDepthData. One of its members is depthDataMap which is an instance of CVPixelBuffer and image format type kCVPixelFormatType_DisparityFloat16. Is there a way to save it to the phone to transfer for offline manipulation?
There's no standard video format for "raw" depth/disparity maps, which might have something to do with AVCapture not really offering a way to record it.
You have a couple of options worth investigating here:
Convert depth maps to grayscale textures (which you can do using the code in the AVCamPhotoFilter sample code), then pass those textures to AVAssetWriter to produce a grayscale video. Depending on the video format and grayscale conversion method you choose, other software you write for reading the video might be able to recover depth/disparity info with sufficient precision for your purposes from the grayscale frames.
Anytime you have a CVPixelBuffer, you can get at the data yourself and do whatever you want with it. Use CVPixelBufferLockBaseAddress (with the readOnly flag) to make sure the content won't change while you read it, then copy data from the pointer CVPixelBufferGetBaseAddress provides to wherever you want. (Use other pixel buffer functions to see how many bytes to copy, and unlock the buffer when you're done.)
Watch out, though: if you spend too much time copying from buffers, or otherwise retain them, they won't get deallocated as new buffers come in from the capture system, and your capture session will hang. (All told, it's unclear without testing whether a device has the memory & I/O bandwidth for much recording this way.)
You can use Compression library to create a zip file with the raw CVPixelBuffer data.
Few problems with this solution.
It's a lot of data and zip is not a good compression. (the compressed file is 20 times bigger than 32bits per frame video with the same number of frames).
Apple's Compression library creates a file which standard zip program does't open. I use zlib in C code to read it and use inflateInit2(&strm, -15); to make it work.
You'll need to do some work to export the file out of your application
Here is my code (which I limited to 250 frames since it hold it in RAM but you can flush to disk if needed more frames):
// DepthCapture.swift
// AVCamPhotoFilter
//
// Created by Eyal Fink on 07/04/2018.
// Copyright © 2018 Resonai. All rights reserved.
//
// Capture the depth pixelBuffer into a compress file.
// This is very hacky and there are lots of TODOs but instead we need to replace
// it with a much better compression (video compression)....
import AVFoundation
import Foundation
import Compression
class DepthCapture {
let kErrorDomain = "DepthCapture"
let maxNumberOfFrame = 250
lazy var bufferSize = 640 * 480 * 2 * maxNumberOfFrame // maxNumberOfFrame frames
var dstBuffer: UnsafeMutablePointer<UInt8>?
var frameCount: Int64 = 0
var outputURL: URL?
var compresserPtr: UnsafeMutablePointer<compression_stream>?
var file: FileHandle?
// All operations handling the compresser oobjects are done on the
// porcessingQ so they will happen sequentially
var processingQ = DispatchQueue(label: "compression",
qos: .userInteractive)
func reset() {
frameCount = 0
outputURL = nil
if self.compresserPtr != nil {
//free(compresserPtr!.pointee.dst_ptr)
compression_stream_destroy(self.compresserPtr!)
self.compresserPtr = nil
}
if self.file != nil {
self.file!.closeFile()
self.file = nil
}
}
func prepareForRecording() {
reset()
// Create the output zip file, remove old one if exists
let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0] as NSString
self.outputURL = URL(fileURLWithPath: documentsPath.appendingPathComponent("Depth"))
FileManager.default.createFile(atPath: self.outputURL!.path, contents: nil, attributes: nil)
self.file = FileHandle(forUpdatingAtPath: self.outputURL!.path)
if self.file == nil {
NSLog("Cannot create file at: \(self.outputURL!.path)")
return
}
// Init the compression object
compresserPtr = UnsafeMutablePointer<compression_stream>.allocate(capacity: 1)
compression_stream_init(compresserPtr!, COMPRESSION_STREAM_ENCODE, COMPRESSION_ZLIB)
dstBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
compresserPtr!.pointee.dst_ptr = dstBuffer!
//defer { free(bufferPtr) }
compresserPtr!.pointee.dst_size = bufferSize
}
func flush() {
//let data = Data(bytesNoCopy: compresserPtr!.pointee.dst_ptr, count: bufferSize, deallocator: .none)
let nBytes = bufferSize - compresserPtr!.pointee.dst_size
print("Writing \(nBytes)")
let data = Data(bytesNoCopy: dstBuffer!, count: nBytes, deallocator: .none)
self.file?.write(data)
}
func startRecording() throws {
processingQ.async {
self.prepareForRecording()
}
}
func addPixelBuffers(pixelBuffer: CVPixelBuffer) {
processingQ.async {
if self.frameCount >= self.maxNumberOfFrame {
// TODO now!! flush when needed!!!
print("MAXED OUT")
return
}
CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let add : UnsafeMutableRawPointer = CVPixelBufferGetBaseAddress(pixelBuffer)!
self.compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(add.assumingMemoryBound(to: UInt8.self))
let height = CVPixelBufferGetHeight(pixelBuffer)
self.compresserPtr!.pointee.src_size = CVPixelBufferGetBytesPerRow(pixelBuffer) * height
let flags = Int32(0)
let compression_status = compression_stream_process(self.compresserPtr!, flags)
if compression_status != COMPRESSION_STATUS_OK {
NSLog("Buffer compression retured: \(compression_status)")
return
}
if self.compresserPtr!.pointee.src_size != 0 {
NSLog("Compression lib didn't eat all data: \(compression_status)")
return
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
// TODO(eyal): flush when needed!!!
self.frameCount += 1
print("handled \(self.frameCount) buffers")
}
}
func finishRecording(success: #escaping ((URL) -> Void)) throws {
processingQ.async {
let flags = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
self.compresserPtr!.pointee.src_size = 0
//compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(0)
let compression_status = compression_stream_process(self.compresserPtr!, flags)
if compression_status != COMPRESSION_STATUS_END {
NSLog("ERROR: Finish failed. compression retured: \(compression_status)")
return
}
self.flush()
DispatchQueue.main.sync {
success(self.outputURL!)
}
self.reset()
}
}
}
I am trying to perform fft on Spotify's audio stream using EZAudio.
Following this suggestion, I have subclassed SPTCoreAudioController, overrode attemptToDeliverAudioFrames:ofCount:streamDescription:, and initialized my SPTAudioStreamingController with my new class successfully.
Spotify does not say if the pointer, which is passed into the overridden function, points to a float, double, integer, etc.. I have interpreted it as many different data types, which all failed, leaving me confused if my fft is wrong or if my audio buffer is wrong. Here is spotify's documentation on SPTCoreAudioController.
Assuming the audio buffer is a buffer of floats, here is one of my attempts at FFT:
class GetAudioPCM : SPTCoreAudioController, EZAudioFFTDelegate{
let ViewControllerFFTWindowSize: vDSP_Length = 128
var fft: EZAudioFFTRolling?
//var fft: EZAudioFFT?
override func attempt(toDeliverAudioFrames audioFrames: UnsafeRawPointer!, ofCount frameCount: Int, streamDescription audioDescription: AudioStreamBasicDescription) -> Int {
if let fft = fft{
let newPointer = (UnsafeMutableRawPointer(mutating: audioFrames)!.assumingMemoryBound(to: Float.self))
let resultBuffer : UnsafeMutablePointer<Float> = (fft.computeFFT(withBuffer: newPointer, withBufferSize: 128))
print("results: \(resultBuffer.pointee)")
}else{
fft = EZAudioFFTRolling(windowSize: ViewControllerFFTWindowSize, sampleRate: Float(audioDescription.mSampleRate), delegate: self)
//fft = EZAudioFFT(maximumBufferSize: 128, sampleRate: Float(audioDescription.mSampleRate))
}
return super.attempt(toDeliverAudioFrames: audioFrames, ofCount: frameCount, streamDescription: audioDescription)
}
func fft(_ fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
print("\n \n DATA ---------------------")
print(bufferSize)
if (fft?.fftData) != nil {
print("First: \(fftData.pointee)")
for i : Int in 0..<Int(bufferSize) {
print(fftData[i], terminator: " :: ")
}
}
}
}
I make my custom class the EZAudioFFTDelegate, I initialize an EZAudioFFTRolling (have also tried plain EZAudioFFT) object, and tell it to perform an fft with the buffer. I chose a small size of 128 just for initial testing.
I have tried different data types for the buffer and different methods of fft. I decided using a well known library that has fft should give me the correct results. Yet, my output from this and similar FFT's produced 'nan' for almost every single float in the new buffer.
Is the way I access spotify's audio buffer wrong, or is my FFT process at fault?
I am making an app which needs to stream audio to a server. What I want to do is to divide the recorded audio into chunks and upload them while recording.
I used two recorders to do that, but it didn't work well; I can hear the difference between the chunks (stops for couple of milliseconds).
How can I do this?
Your problem can be broken into two pieces: recording and chunking (and uploading, but who cares).
For recording from the microphone and writing to the file, you can get started quickly with AVAudioEngine and AVAudioFile. See below for a sample, which records chunks at the device's default input sampling rate (you will probably want to rate convert that).
When you talk about the "difference between the chunks" you are referring to the ability to divide your audio data into pieces in such a way that when you concatenate them you don't hear discontinuities. e.g. LPCM audio data can be divided into chunks at the sample level, but the LPCM bitrate is high, so you're more likely to use a packetised format like adpcm (called ima4 on iOS?), or mp3 or aac. These formats can only be divided on packet boundaries, e.g. 64, 576 or 1024 samples, say. If your chunks are written without a header (usual for mp3 and aac, not sure about ima4), then concatenation is trivial: simply lay the chunks end to end, exactly as the cat command line tool would. Sadly, on iOS there is no mp3 encoder, so that leaves aac as a likely format for you, but that depends on your playback requirements. iOS devices and macs can definitely play it back.
import AVFoundation
class ViewController: UIViewController {
let engine = AVAudioEngine()
struct K {
static let secondsPerChunk: Float64 = 10
}
var chunkFile: AVAudioFile! = nil
var outputFramesPerSecond: Float64 = 0 // aka input sample rate
var chunkFrames: AVAudioFrameCount = 0
var chunkFileNumber: Int = 0
func writeBuffer(_ buffer: AVAudioPCMBuffer) {
let samplesPerSecond = buffer.format.sampleRate
if chunkFile == nil {
createNewChunkFile(numChannels: buffer.format.channelCount, samplesPerSecond: samplesPerSecond)
}
try! chunkFile.write(from: buffer)
chunkFrames += buffer.frameLength
if chunkFrames > AVAudioFrameCount(K.secondsPerChunk * samplesPerSecond) {
chunkFile = nil // close file
}
}
func createNewChunkFile(numChannels: AVAudioChannelCount, samplesPerSecond: Float64) {
let fileUrl = NSURL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("chunk-\(chunkFileNumber).aac")!
print("writing chunk to \(fileUrl)")
let settings: [String: Any] = [
AVFormatIDKey: kAudioFormatMPEG4AAC,
AVEncoderBitRateKey: 64000,
AVNumberOfChannelsKey: numChannels,
AVSampleRateKey: samplesPerSecond
]
chunkFile = try! AVAudioFile(forWriting: fileUrl, settings: settings)
chunkFileNumber += 1
chunkFrames = 0
}
override func viewDidLoad() {
super.viewDidLoad()
let input = engine.inputNode!
let bus = 0
let inputFormat = input.inputFormat(forBus: bus)
input.installTap(onBus: bus, bufferSize: 512, format: inputFormat) { (buffer, time) -> Void in
DispatchQueue.main.async {
self.writeBuffer(buffer)
}
}
try! engine.start()
}
}
I'd like to record the some audio using AVAudioEngine and the users Microphone. I already have a working sample, but just can't figure out how to specify the format of the output that I want...
My requirement would be that I need the AVAudioPCMBuffer as I speak which it currently does...
Would I need to add a seperate node that does some transcoding? I can't find much documentation/samples on that problem...
And I am also a noob when it comes to Audio-Stuff. I know that I want NSData containing PCM-16bit with a max sample-rate of 16000 (8000 would be better)
Here's my working sample:
private var audioEngine = AVAudioEngine()
func startRecording() {
let format = audioEngine.inputNode!.inputFormatForBus(bus)
audioEngine.inputNode!.installTapOnBus(bus, bufferSize: 1024, format: format) { (buffer: AVAudioPCMBuffer, time:AVAudioTime) -> Void in
let audioFormat = PCMBuffer.format
print("\(audioFormat)")
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch { /* Imagine some super awesome error handling here */ }
}
If I changed the format to let' say
let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.PCMFormatInt16, sampleRate: 8000.0, channels: 1, interleaved: false)
then if will produce an error saying that the sample rate needs to be the same as the hwInput...
Any help is very much appreciated!!!
EDIT: I just found AVAudioConverter but I need to be compatible with iOS8 as well...
You cannot change audio format directly on input nor output nodes. In the case of the microphone, the format will always be 44KHz, 1 channel, 32bits. To do so, you need to insert a mixer in between. Then when you connect inputNode > changeformatMixer > mainEngineMixer, you can specify the details of the format you want.
Something like:
var inputNode = audioEngine.inputNode
var downMixer = AVAudioMixerNode()
//I think you the engine's I/O nodes are already attached to itself by default, so we attach only the downMixer here:
audioEngine.attachNode(downMixer)
//You can tap the downMixer to intercept the audio and do something with it:
downMixer.installTapOnBus(0, bufferSize: 2048, format: downMixer.outputFormatForBus(0), block: //originally 1024
{ (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
print(NSString(string: "downMixer Tap"))
do{
print("Downmixer Tap Format: "+self.downMixer.outputFormatForBus(0).description)//buffer.audioBufferList.debugDescription)
})
//let's get the input audio format right as it is
let format = inputNode.inputFormatForBus(0)
//I initialize a 16KHz format I need:
let format16KHzMono = AVAudioFormat.init(commonFormat: AVAudioCommonFormat.PCMFormatInt16, sampleRate: 11050.0, channels: 1, interleaved: true)
//connect the nodes inside the engine:
//INPUT NODE --format-> downMixer --16Kformat--> mainMixer
//as you can see I m downsampling the default 44khz we get in the input to the 16Khz I want
audioEngine.connect(inputNode, to: downMixer, format: format)//use default input format
audioEngine.connect(downMixer, to: audioEngine.outputNode, format: format16KHzMono)//use new audio format
//run the engine
audioEngine.prepare()
try! audioEngine.start()
I would recommend using an open framework such as EZAudio, instead, though.
The only thing I found that worked to change the sampling rate was
AVAudioSettings.sharedInstance().setPreferredSampleRate(...)
You can tap off engine.inputNode and use the input node's output format:
engine.inputNode.installTap(onBus: 0, bufferSize: 2048,
format: engine.inputNode.outputFormat(forBus: 0))
Unfortunately, there is no guarantee that you will get the sample rate that you want, although it seems like 8000, 12000, 16000, 22050, 44100 all worked.
The following did NOT work:
Setting the my custom format in a tap off engine.inputNode. (Exception)
Adding a mixer with my custom format and tapping that. (Exception)
Adding a mixer, connecting it with the inputNode's format, connecting the mixer to the main mixer with my custom format, then removing the input of the outputNode so as not to send the audio to the speaker and get instant feedback. (Worked, but got all zeros)
Not using my custom format at all in the AVAudioEngine, and using AVAudioConverter to convert from the hardware rate in my tap. (Length of the buffer was not set, no way to tell if results were correct)
This was with iOS 12.3.1.
In order to change the sample rate of input node, you have to first connect the input node to a mixer node, and specify a new format in the parameter.
let input = avAudioEngine.inputNode
let mainMixer = avAudioEngine.mainMixerNode
let newAudioFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100, channels: 1, interleaved: true)
avAudioEngine.connect(input, to: mainMixer, format: newAudioFormat)
Now you can call installTap function on input node with the newAudioFormat.
One more thing I'd like to point out is, since the new launch of iPhone12, the default sample rate of input node has been no longer 44100 anymore. It has been upgraded to 48000.
You cannot change the configuration of input node, try to create a mixer node with the format that you want, attach it to the engine, then connect it to the input node and then connect the mainMixer to the node that you just created. Now you can install a tap on this node to get PCM data.
Note that for some strange reasons, you don't have a lot of choice for sample rate! At least not on iOS 9.1, Use standard 11025, 22050 or 44100. Any other sample rate will fail!
If you just need to change the sample rate and channel, I recommend using row-level API. You do not need to use a mixer or converter. Here you can find the Apple document about low-level recording. If you want, you will be able to convert to Objective-C class and add protocol.
Audio Queue Services Programming Guide
If your goal is simply to end up with AVAudioPCMBuffers that contains audio in your desired format, you can convert the buffers returned in the tap block using AVAudioConverter. This way, you actually don't need to know or care what the format of the inputNode is.
class MyBufferRecorder {
private let audioEngine:AVAudioEngine = AVAudioEngine()
private var inputNode:AVAudioInputNode!
private let audioQueue:DispatchQueue = DispatchQueue(label: "Audio Queue 5000")
private var isRecording:Bool = false
func startRecording() {
if (isRecording) {
return
}
isRecording = true
// must convert (unknown until runtime) input format to our desired output format
inputNode = audioEngine.inputNode
let inputFormat:AVAudioFormat! = inputNode.outputFormat(forBus: 0)
// 9600 is somewhat arbitrary... min seems to be 4800, max 19200... it doesn't matter what we set
// because we don't re-use this value -- we query the buffer returned in the tap block for it's true length.
// Using [weak self] in the tap block is probably a better idea, but it results in weird warnings for now
inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(9600), format: inputFormat) { (buffer, time) in
// not sure if this is necessary
if (!self.isRecording) {
print("\nDEBUG - rejecting callback, not recording")
return }
// not really sure if/why this needs to be async
self.audioQueue.async {
// Convert recorded buffer to our preferred format
let convertedPCMBuffer = AudioUtils.convertPCMBuffer(bufferToConvert: buffer, fromFormat: inputFormat, toFormat: AudioUtils.desiredFormat)
// do something with converted buffer
}
}
do {
// important not to start engine before installing tap
try audioEngine.start()
} catch {
print("\nDEBUG - couldn't start engine!")
return
}
}
func stopRecording() {
print("\nDEBUG - recording stopped")
isRecording = false
inputNode.removeTap(onBus: 0)
audioEngine.stop()
}
}
Separate class:
import Foundation
import AVFoundation
// assumes we want 16bit, mono, 44100hz
// change to what you want
class AudioUtils {
static let desiredFormat:AVAudioFormat! = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: Double(44100), channels: 1, interleaved: false)
// PCM <--> PCM
static func convertPCMBuffer(bufferToConvert: AVAudioPCMBuffer, fromFormat: AVAudioFormat, toFormat: AVAudioFormat) -> AVAudioPCMBuffer {
let convertedPCMBuffer = AVAudioPCMBuffer(pcmFormat: toFormat, frameCapacity: AVAudioFrameCount(bufferToConvert.frameLength))
var error: NSError? = nil
let inputBlock:AVAudioConverterInputBlock = {inNumPackets, outStatus in
outStatus.pointee = AVAudioConverterInputStatus.haveData
return bufferToConvert
}
let formatConverter:AVAudioConverter = AVAudioConverter(from:fromFormat, to: toFormat)!
formatConverter.convert(to: convertedPCMBuffer!, error: &error, withInputFrom: inputBlock)
if error != nil {
print("\nDEBUG - " + error!.localizedDescription)
}
return convertedPCMBuffer!
}
}
This is by no means production ready code -- I'm also learning IOS Audio... so please, please let me know any errors, best practices, or dangerous things going on in that code and I'll keep this answer updated.