It's hard to explain, but I will try...
Is it possible to add some delay between the audio processing and the sound the user hears from the speakers?
I'm using the Speech framework to recognize an audio file (using SFSpeechAudioBufferRecognitionRequest and AVAudioEngine to stream the audio file), but I get the final result from the recognizer only after the user has already heard the sound from the speakers.
The sound should play only after I get a result from the recognizer, so I need to add some latency between the input fed to the recognizer from my AVAudioMixerNode and the sound the user hears.
if self.recognizer?.isAvailable == true {
    let recordingFormat = self.node.outputFormat(forBus: 0)
    self.node.installTap(onBus: 0, bufferSize: 2048, format: recordingFormat) { buffer, _ in
        self.request?.append(buffer)
    }
    self.task = self.recognizer?.recognitionTask(with: self.request!, resultHandler: { [weak self] result, error in
        guard let self = self, let result = result else { return }
        print("Result: \(result.bestTranscription.formattedString)") // Need to print the result before the user hears the sound from the speakers
    })
}
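What I imagine is something like the following (an untested sketch: playerNode here is a hypothetical AVAudioPlayerNode attached to the same engine and connected to the output, while the mixer's own path to the speakers stays muted):
self.node.installTap(onBus: 0, bufferSize: 2048, format: recordingFormat) { buffer, _ in
    self.request?.append(buffer)                                   // the recognizer sees the audio immediately
    self.playerNode.scheduleBuffer(buffer, completionHandler: nil) // the same audio is queued for playback
}
// Start the player roughly half a second after the tap begins delivering buffers,
// so the sound from the speakers lags the recognizer input by that amount.
DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
    self.playerNode.play()
}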
I'm struggling to make the following scenario work as expected (code provided below).
Record my microphone input and store an AVAudioPCMBuffer in memory. This is done with the AVAudioPCMBuffer extension method copy(from buffer: AVAudioPCMBuffer, readOffset: AVAudioFrameCount = default, frames: AVAudioFrameCount = default). I do indeed get the buffer at the end of my recording.
When the recording ends, pass the buffer to AKPlayer and play it. Here is a code snippet to demonstrate what I do (I know it is not the full app code; if needed I can share it):
private var player: AKPlayer = AKPlayer()
self.player.buffering = .always
// in the record-complete callback:
self.player.buffer = self.bufferRecorder?.pcmBuffer
self.player.volume = 1
self.player.play()
Please note that the player is connected to a mixer, which is eventually connected to the AudioKit output.
When I inspect and debug the application, I can see the buffer has the correct length, and all my output/input setup uses the same processing format (sample rate, channels, bit rate, etc.), as does the recorded buffer, but my app still crashes with this error:
2018-10-28 08:40:32.625001+0200 BeatmanApp[71037:6731884] [avae] AVAEInternal.h:70:_AVAE_Check:
required condition is false: [AVAudioPlayerNode.mm:665:ScheduleBuffer: (_outputFormat.channelCount == buffer.format.channelCount)]
When I debug and step through the AudioKit code, I can see that the breaking line is in AKPlayer+Playback.swift, on line 162, in the call to playerNode.scheduleBuffer.
More information that could be helpful:
The recorded buffer is 16 seconds long.
When I tried to pass the buffer straight to the player node in the tap method, it seemed to work: I did hear a delay from mic to speaker, but it did indeed play back.
I tried calling prepare on the player before invoking play; it didn't help.
thanks!
OK, this was a super uncool debugging session. I had to investigate AVAudioEngine and how this kind of scenario could be done there, which of course was not the final result I was looking for. This quest helped me understand how to solve it with AudioKit (half of my app is implemented using AudioKit's tools, so it doesn't make sense to rewrite it with AVFoundation).
AVFoundation solution:
private let engine = AVAudioEngine()
private let bufferSize = 1024
private let p: AVAudioPlayerNode = AVAudioPlayerNode()

let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(.playAndRecord, mode: .default, options: .defaultToSpeaker)
} catch {
    print("Setting category to .playAndRecord failed.")
}

let inputNode = self.engine.inputNode
engine.connect(inputNode, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))

// !!! the following lines are the key to the solution.
// !!! the player has to be attached to the engine before it is actually connected
engine.attach(p)
engine.connect(p, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))

do {
    try engine.start()
} catch {
    print("could not start engine \(error.localizedDescription)")
}

recordBufferAndPlay(duration: 4)
recordBufferAndPlay function:
func recordBufferAndPlay(duration: Double) {
    let inputNode = self.engine.inputNode
    let total: Double = AVAudioSession.sharedInstance().sampleRate * duration
    let totalBufferSize: UInt32 = UInt32(total)
    let recordedBuffer: AVAudioPCMBuffer! = AVAudioPCMBuffer(pcmFormat: inputNode.inputFormat(forBus: 0), frameCapacity: totalBufferSize)
    var alreadyRecorded = 0
    inputNode.installTap(onBus: 0, bufferSize: 256, format: inputNode.inputFormat(forBus: 0)) {
        (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
        recordedBuffer.copy(from: buffer) // this helper function is taken from AudioKit!
        alreadyRecorded += Int(buffer.frameLength)
        print(alreadyRecorded, totalBufferSize)
        if alreadyRecorded >= Int(totalBufferSize) {
            inputNode.removeTap(onBus: 0)
            self.p.scheduleBuffer(recordedBuffer, at: nil, options: .loops, completionHandler: {
                print("completed playing")
            })
            self.p.play()
        }
    }
}
AudioKit solution:
So in the AudioKit solution, these lines should be invoked on your AKPlayer object. Note that this should be done before you actually start your engine.
self.player.buffering = .always
AudioKit.engine.attach(self.player.playerNode)
AudioKit.engine.connect(self.player.playerNode, to: self.mixer.inputNode, format: AudioKit.engine.inputNode.outputFormat(forBus: 0))
Then the recording is done pretty similarly to how you would do it in AVAudioEngine: you install a tap on your node (microphone or other node) and record the buffer of PCM samples, as sketched below.
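For completeness, a minimal sketch of that tap (assuming recordedBuffer, alreadyRecorded and totalBufferSize are set up exactly as in the AVFoundation version above, and mic is the AKNode being recorded):
let tappedNode = mic.avAudioNode
let format = tappedNode.outputFormat(forBus: 0)
tappedNode.installTap(onBus: 0, bufferSize: 256, format: format) { buffer, _ in
    recordedBuffer.copy(from: buffer)        // AudioKit's AVAudioPCMBuffer helper
    alreadyRecorded += Int(buffer.frameLength)
    if alreadyRecorded >= Int(totalBufferSize) {
        tappedNode.removeTap(onBus: 0)
        self.player.buffer = recordedBuffer  // hand the finished buffer to AKPlayer
        self.player.play()
    }
}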
I'm new to AudioKit and I'm trying to do some real-time digital signal processing on input audio from the microphone.
I know the data I want is in AKAudioFile's FloatChannelData, but what if I want to obtain this in real-time? I'm currently using AKMicrophone, AKFrequencyTracker, AKNodeOutputPlot, AKBooster and I'm plotting the tracker's amplitude data. However, that data is not the same as the audio signal (as you know, it's the RMS). Is there any way I can obtain the signal's Float data from the mic? Or even from the AKNodeOutputPlot? I just need read-access.
AKSettings.audioInputEnabled = true
mic = AKMicrophone()
plot = AKNodeOutputPlot(mic, frame: audioInputPlot.bounds)
tracker = AKFrequencyTracker.init(mic)
silence = AKBooster(tracker,gain:0)
AudioKit.output = silence
AudioKit.start()
The creator of AudioKit recommends here:
AKNodeOutputPlot works, its one short file. You're basically just tapping the node and grabbing the data.
How would this work in my viewController if I have an instance of plot (AKNodeOutputPlot) and mic (AKMicrophone) and want to output those values to a label?
Use a tap on whichever node you want to get the data out from. I used AKNodeOutputPlot in my quote above because it is fairly straightforward, just using that data as input for a plot, but you could take the data and do whatever with it. In this code (from AKNodeOutputPlot):
internal func setupNode(_ input: AKNode?) {
    if !isConnected {
        input?.avAudioNode.installTap(
            onBus: 0,
            bufferSize: bufferSize,
            format: nil) { [weak self] (buffer, _) in
                guard let strongSelf = self else {
                    AKLog("Unable to create strong reference to self")
                    return
                }
                buffer.frameLength = strongSelf.bufferSize
                let offset = Int(buffer.frameCapacity - buffer.frameLength)
                if let tail = buffer.floatChannelData?[0] {
                    strongSelf.updateBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
                }
        }
    }
    isConnected = true
}
You get the buffer data in real time. Here we just send it to "updateBuffer" where it gets plotted, but instead of plotting you'd do something else.
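For example (a sketch with a hypothetical node; any AKNode works), instead of plotting you could compute the peak amplitude of each buffer:
node.avAudioNode.installTap(onBus: 0, bufferSize: 1024, format: nil) { buffer, _ in
    guard let samples = buffer.floatChannelData?[0] else { return }
    var peak: Float = 0
    for i in 0..<Int(buffer.frameLength) {
        peak = max(peak, abs(samples[i]))
    }
    print("peak amplitude: \(peak)")   // do whatever you like with the value
}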
To complete Aurelius Prochazka's answer:
To record the audio flowing through a node, you need to attach a tap to it. A tap is just a closure which gets called each time a buffer is available.
Here is a sample code you can reuse in your own class:
var mic = AKMicrophone()

func initMicrophone() {
    // Optional: allows setting the sampling rate of the microphone
    AKSettings.sampleRate = 44100
    // Link the microphone node to the output of AudioKit with a volume of 0.
    AudioKit.output = AKBooster(mic, gain: 0)
    // Start the AudioKit engine
    try! AudioKit.start()
    // Add a tap to the microphone (I chose a buffer size of 4096)
    mic?.avAudioNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { [weak self] (buffer, _) in
        // self is now a weak reference, to prevent retain cycles.
        // We try to create a strong reference to self, and name it strongSelf
        guard let strongSelf = self else {
            print("Recorder: Unable to create strong reference to self #1")
            return
        }
        // We check whether the buffer contains data
        buffer.frameLength = 4096
        let offset = Int(buffer.frameCapacity - buffer.frameLength)
        if let tail = buffer.floatChannelData?[0] {
            // We convert the content of the buffer to a Swift array
            let samples = Array(UnsafeBufferPointer(start: tail + offset, count: 4096))
            strongSelf.myFunctionHandlingData(samples)
        }
    }
}

func myFunctionHandlingData(_ data: [Float]) {
    // ...
}
Be careful to use DispatchQueue or another synchronisation mechanism if you need to interact with this data between different threads.
In my case I use:
DispatchQueue.main.async { [weak self] in
    guard let strongSelf = self else {
        print("Recorder: Unable to create strong reference to self #2")
        return
    }
    strongSelf.myFunctionHandlingData(samples)
}
so that my function runs on the main thread.
I am having a really difficult time with playing audio in the background of my app. The app is a timer that is counting down and plays bells, and everything worked using the timer originally. Since you cannot run a timer over 3 minutes in the background, I need to play the bells another way.
The user has the ability to choose bells and set the time for these bells to play (e.g. play bell immediately, after 5 minutes, repeat another bell every 10 minutes, etc).
So far I have tried using notifications via DispatchQueue.main, and this will work fine if the user does not pause the timer. If they re-enter the app and pause, though, I cannot seem to cancel this queue or pause it in any way.
Next I tried using AVAudioEngine, and created a set of nodes. These will play while the app is in the foreground but seem to stop upon backgrounding. Additionally when I pause the engine and resume later, it won't pause the sequence properly. It will squish the bells into playing one after the other or not at all.
If anyone has any ideas of how to solve my issue that would be great. Technically I could try remove everything from the engine and recreate it from the paused time when the user pauses/resumes, but this seems quite costly. It also doesn't solve the problem of the audio stopping in the background. I have the required background mode 'App plays audio or streams audio/video using Airplay', and it is also checked under the background modes in capabilities.
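One thing worth checking (my snippets below don't show it) is whether an AVAudioSession with the .playback category is active; without one, the engine can be silenced in the background even with the background mode enabled:
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playback, mode: .default)
    try session.setActive(true)
} catch {
    print("could not activate audio session: \(error)")
}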
Below is a sample of how I tried to set up the audio engine. The registerAndPlaySound method is called several other times to create the chain of nodes (or is this done incorrectly?). The code is kinda messy at the moment because I have been trying many ways trying to get this to work.
func setupSounds() {
    if attached {
        engine.detach(player)
    }
    engine.attach(player)
    attached = true

    let mixer = engine.mainMixerNode
    engine.connect(player, to: mixer, format: mixer.outputFormat(forBus: 0))

    var bell = ""
    do {
        try engine.start()
    } catch {
        return
    }

    if currentSession.bellObject?.startBell != nil {
        bell = (currentSession.bellObject?.startBell)!
        guard let url = Bundle.main.url(forResource: bell, withExtension: "mp3") else {
            return
        }
        registerAndPlaySound(url: url, delay: warmUpTime)
    }
}
func registerAndPlaySound(url: URL, delay: Double) {
    do {
        let file = try AVAudioFile(forReading: url)
        let format = file.processingFormat
        let capacity = file.length
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(capacity)) else {
            return
        }
        do {
            try file.read(into: buffer)
        } catch {
            return
        }
        let sampleRate = buffer.format.sampleRate
        let sampleTime = sampleRate * delay
        let futureTime = AVAudioTime(sampleTime: AVAudioFramePosition(sampleTime), atRate: sampleRate)
        player.scheduleBuffer(buffer, at: futureTime, options: AVAudioPlayerNodeBufferOptions(rawValue: 0), completionHandler: nil)
        player.play()
    } catch {
        return
    }
}
To explain my situation a little better: I'm trying to make an app which will play a ping noise when a button is pressed, and then immediately record and transcribe the user's voice.
For the ping sound I'm using System Sound Services, to record the audio I'm using AudioToolbox, and to transcribe it I'm using the Speech framework.
I believe the crux of my problem lies in the timing of the asynchronous System Sound Services play function:
//Button pressed function
let audiosession = AVAudioSession.sharedInstance()
let filename = "Ping"
let ext = "wav"
if let soundUrl = Bundle.main.url(forResource: filename, withExtension: ext) {
    var soundId: SystemSoundID = 0
    AudioServicesCreateSystemSoundID(soundUrl as CFURL, &soundId)
    AudioServicesAddSystemSoundCompletion(soundId, nil, nil, { (soundId, _) -> Void in
        AudioServicesDisposeSystemSoundID(soundId)
        print("Sound played!")
    }, nil)
    AudioServicesPlaySystemSound(soundId)
}
do {
    try audiosession.setCategory(AVAudioSessionCategoryRecord)
    try audiosession.setMode(AVAudioSessionModeMeasurement)
    try audiosession.setActive(true, with: .notifyOthersOnDeactivation)
    print("Changing modes!")
} catch {
    print("error with audio session")
}

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let inputNode = audioEngine.inputNode else {
    fatalError("Audio engine has no input node!")
}
guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create a speech audio buffer recognition request object")
}
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, delegate: self)

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()
do {
    try audioEngine.start()
    delegate?.didStartRecording()
} catch {
    print("audioEngine couldn't start because of an error")
}
What happens when I run this code is that it records the voice and transcribes it successfully. However, the ping is never played. The two (non-error) print statements I have in there fire in this order:
Changing modes!
Sound played!
So to my understanding, the reason the ping sound isn't being played is because by the time it actually completes I've already changed the audio session category from playback to record. Just to verify this is true, I tried removing everything but the sound services ping and it plays the sound as expected.
So my question is what is the best way to bypass the asynchronous nature of the AudioServicesPlaySystemSound call? I've experimented with trying to pass self into the completion function so I could have it trigger a function in my class which then runs the recording chunk. However I haven't been able to figure out how one actually goes about converting self to an UnsafeMutableRawPointer so it can be passed as clientData. Furthermore, even if I DID know how to do that, I'm not sure if it's even a good idea or the intended use of that parameter.
Alternatively, I could probably solve this problem by relying on something like notification center. But once again that just seems like a very clunky way of solving the problem that I'm going to end up regretting later.
Does anyone know what the correct way to handle this type of situation is?
Update:
As per Gruntcake's request, here is my attempt to access self in the completion block.
First I create a userData constant which is an UnsafeMutableRawPointer to self:
var me = self
let userData = withUnsafePointer(to: &me) { ptr in
    return unsafeBitCast(ptr, to: UnsafeMutableRawPointer.self)
}
Next I use that constant in my callback block, and attempt to access self from it:
AudioServicesAddSystemSoundCompletion(soundId, nil, nil, { (soundId, clientData) -> Void in
    AudioServicesDisposeSystemSoundID(soundId)
    let myself = Unmanaged<MyClassName>.fromOpaque(clientData!).takeRetainedValue()
    myself.doOtherStuff()
    print("Sound played!")
}, userData)
Your attempt to call doOtherStuff() in the completion block is a correct approach (the only other one is notifications, those are the only two options)
What is complicating it in this case is the bridging from Obj-C to Swift that is necessary. Code to do that is:
let myData = unsafeBitCast(self, UnsafeMutablePointer<Void>.self)
AudioServicesAddSystemSoundCompletion(YOUR_SOUND_ID, CFRunLoopGetMain(), kCFRunLoopDefaultMode, { (mSound, mVoid) in
    let me = unsafeBitCast(mVoid, YOURCURRENTCLASS.self)
    // me is your current object, so if you have a variable like
    // var someVar, you can do:
    print(me.someVar)
}, myData)
Credit: This code was taken from an answer to this question, though it is not the accepted answer:
How do I implement AudioServicesSystemSoundCompletionProc in Swift?
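In Swift 3 and later, the same bridging can be written with Unmanaged (a sketch; MyClass stands in for your own class):
let userData = Unmanaged.passUnretained(self).toOpaque()
AudioServicesAddSystemSoundCompletion(soundId, CFRunLoopGetMain(), CFRunLoopMode.defaultMode.rawValue, { soundId, clientData in
    AudioServicesDisposeSystemSoundID(soundId)
    guard let clientData = clientData else { return }
    // passUnretained above pairs with takeUnretainedValue here (no ownership transfer)
    let myself = Unmanaged<MyClass>.fromOpaque(clientData).takeUnretainedValue()
    myself.doOtherStuff()
}, userData)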
I'd like to record some audio using AVAudioEngine and the user's microphone. I already have a working sample, but just can't figure out how to specify the format of the output that I want...
My requirement is that I need the AVAudioPCMBuffer as I speak, which it currently delivers...
Would I need to add a separate node that does some transcoding? I can't find much documentation/samples on that problem...
And I am also a noob when it comes to audio stuff. I know that I want NSData containing PCM 16-bit with a max sample rate of 16000 (8000 would be better).
Here's my working sample:
private var audioEngine = AVAudioEngine()

func startRecording() {
    let format = audioEngine.inputNode!.inputFormatForBus(bus)
    audioEngine.inputNode!.installTapOnBus(bus, bufferSize: 1024, format: format) { (buffer: AVAudioPCMBuffer, time: AVAudioTime) -> Void in
        let audioFormat = buffer.format
        print("\(audioFormat)")
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch { /* Imagine some super awesome error handling here */ }
}
If I change the format to, let's say,
let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.PCMFormatInt16, sampleRate: 8000.0, channels: 1, interleaved: false)
then it will produce an error saying that the sample rate needs to be the same as the hwInput...
Any help is very much appreciated!!!
EDIT: I just found AVAudioConverter but I need to be compatible with iOS8 as well...
You cannot change the audio format directly on input or output nodes. In the case of the microphone, the format will always be 44.1 kHz, 1 channel, 32 bits. To work around that, you insert a mixer in between. Then when you connect inputNode > changeformatMixer > mainEngineMixer, you can specify the details of the format you want.
Something like:
var inputNode = audioEngine.inputNode
var downMixer = AVAudioMixerNode()

//I think the engine's I/O nodes are already attached to it by default, so we attach only the downMixer here:
audioEngine.attachNode(downMixer)

//You can tap the downMixer to intercept the audio and do something with it:
downMixer.installTapOnBus(0, bufferSize: 2048, format: downMixer.outputFormatForBus(0), block: //originally 1024
    { (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
        print(NSString(string: "downMixer Tap"))
        print("Downmixer Tap Format: " + self.downMixer.outputFormatForBus(0).description) //buffer.audioBufferList.debugDescription
    })

//let's get the input audio format right as it is
let format = inputNode.inputFormatForBus(0)
//I initialize the low-rate mono format I need (note: stick to a standard rate such as 11025):
let format16KHzMono = AVAudioFormat.init(commonFormat: AVAudioCommonFormat.PCMFormatInt16, sampleRate: 11025.0, channels: 1, interleaved: true)

//connect the nodes inside the engine:
//INPUT NODE --format--> downMixer --low-rate format--> mainMixer
//as you can see I'm downsampling the default 44.1 kHz we get from the input to the lower rate I want
audioEngine.connect(inputNode, to: downMixer, format: format) //use default input format
audioEngine.connect(downMixer, to: audioEngine.outputNode, format: format16KHzMono) //use new audio format

//run the engine
audioEngine.prepare()
try! audioEngine.start()
I would recommend using an open framework such as EZAudio, instead, though.
The only thing I found that worked to change the sampling rate was
try AVAudioSession.sharedInstance().setPreferredSampleRate(...)
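In full, that looks something like this (the call throws, and the rate is only a preference the system may adjust):
let session = AVAudioSession.sharedInstance()
do {
    try session.setPreferredSampleRate(16000)
    try session.setActive(true)
    print("actual sample rate: \(session.sampleRate)") // may differ from the preference
} catch {
    print("could not set preferred sample rate: \(error)")
}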
You can tap off engine.inputNode and use the input node's output format:
engine.inputNode.installTap(onBus: 0, bufferSize: 2048,
format: engine.inputNode.outputFormat(forBus: 0))
Unfortunately, there is no guarantee that you will get the sample rate that you want, although it seems like 8000, 12000, 16000, 22050, 44100 all worked.
The following did NOT work:
Setting my custom format in a tap off engine.inputNode. (Exception)
Adding a mixer with my custom format and tapping that. (Exception)
Adding a mixer, connecting it with the inputNode's format, connecting the mixer to the main mixer with my custom format, then removing the input of the outputNode so as not to send the audio to the speaker and get instant feedback. (Worked, but got all zeros)
Not using my custom format at all in the AVAudioEngine, and using AVAudioConverter to convert from the hardware rate in my tap. (Length of the buffer was not set, no way to tell if results were correct)
This was with iOS 12.3.1.
In order to change the sample rate of the input node, you have to first connect the input node to a mixer node, and specify the new format in the parameter.
let input = avAudioEngine.inputNode
let mainMixer = avAudioEngine.mainMixerNode
let newAudioFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100, channels: 1, interleaved: true)
avAudioEngine.connect(input, to: mainMixer, format: newAudioFormat)
Now you can call the installTap function on the input node with the newAudioFormat, as in the sketch below.
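For example (a sketch following the snippet above):
input.installTap(onBus: 0, bufferSize: 1024, format: newAudioFormat) { buffer, _ in
    // buffer now arrives as 44100 Hz, mono, Float32
    print("got \(buffer.frameLength) frames")
}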
One more thing I'd like to point out: since the launch of the iPhone 12, the default sample rate of the input node is no longer 44100. It has been bumped up to 48000.
You cannot change the configuration of the input node. Instead, create a mixer node with the format that you want, attach it to the engine, connect it after the input node, and then connect the mainMixer after the mixer you just created. Now you can install a tap on this mixer to get PCM data, as in the sketch below.
Note that for some strange reason, you don't have a lot of choice for the sample rate! At least not on iOS 9.1: use the standard 11025, 22050 or 44100. Any other sample rate will fail!
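A compact sketch of that wiring (all names are illustrative):
let engine = AVAudioEngine()
let tapMixer = AVAudioMixerNode()
engine.attach(tapMixer)
// stick to a standard rate (11025, 22050 or 44100), as noted above
let tapFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)
engine.connect(engine.inputNode, to: tapMixer, format: engine.inputNode.outputFormat(forBus: 0))
engine.connect(tapMixer, to: engine.mainMixerNode, format: tapFormat)
tapMixer.installTap(onBus: 0, bufferSize: 1024, format: tapFormat) { buffer, _ in
    // PCM data arrives here in the requested format
}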
If you just need to change the sample rate and channel count, I recommend using the low-level API. You do not need a mixer or converter. Below is the Apple document about low-level recording; if you want, you can wrap it in an Objective-C class and add a protocol.
Audio Queue Services Programming Guide
If your goal is simply to end up with AVAudioPCMBuffers that contains audio in your desired format, you can convert the buffers returned in the tap block using AVAudioConverter. This way, you actually don't need to know or care what the format of the inputNode is.
class MyBufferRecorder {
private let audioEngine:AVAudioEngine = AVAudioEngine()
private var inputNode:AVAudioInputNode!
private let audioQueue:DispatchQueue = DispatchQueue(label: "Audio Queue 5000")
private var isRecording:Bool = false
func startRecording() {
if (isRecording) {
return
}
isRecording = true
// must convert (unknown until runtime) input format to our desired output format
inputNode = audioEngine.inputNode
let inputFormat:AVAudioFormat! = inputNode.outputFormat(forBus: 0)
// 9600 is somewhat arbitrary... min seems to be 4800, max 19200... it doesn't matter what we set
// because we don't re-use this value -- we query the buffer returned in the tap block for its true length.
// Using [weak self] in the tap block is probably a better idea, but it results in weird warnings for now
inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(9600), format: inputFormat) { (buffer, time) in
// not sure if this is necessary
if (!self.isRecording) {
print("\nDEBUG - rejecting callback, not recording")
return }
// not really sure if/why this needs to be async
self.audioQueue.async {
// Convert recorded buffer to our preferred format
let convertedPCMBuffer = AudioUtils.convertPCMBuffer(bufferToConvert: buffer, fromFormat: inputFormat, toFormat: AudioUtils.desiredFormat)
// do something with converted buffer
}
}
do {
// important not to start engine before installing tap
try audioEngine.start()
} catch {
print("\nDEBUG - couldn't start engine!")
return
}
}
func stopRecording() {
print("\nDEBUG - recording stopped")
isRecording = false
inputNode.removeTap(onBus: 0)
audioEngine.stop()
}
}
Separate class:
import Foundation
import AVFoundation
// assumes we want 16bit, mono, 44100hz
// change to what you want
class AudioUtils {
static let desiredFormat:AVAudioFormat! = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: Double(44100), channels: 1, interleaved: false)
// PCM <--> PCM
static func convertPCMBuffer(bufferToConvert: AVAudioPCMBuffer, fromFormat: AVAudioFormat, toFormat: AVAudioFormat) -> AVAudioPCMBuffer {
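    // Note: sizing the output by the input's frameLength assumes both formats share a
    // sample rate; converting to a higher rate would need a proportionally larger frameCapacity.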
let convertedPCMBuffer = AVAudioPCMBuffer(pcmFormat: toFormat, frameCapacity: AVAudioFrameCount(bufferToConvert.frameLength))
var error: NSError? = nil
let inputBlock:AVAudioConverterInputBlock = {inNumPackets, outStatus in
outStatus.pointee = AVAudioConverterInputStatus.haveData
return bufferToConvert
}
let formatConverter:AVAudioConverter = AVAudioConverter(from:fromFormat, to: toFormat)!
formatConverter.convert(to: convertedPCMBuffer!, error: &error, withInputFrom: inputBlock)
if error != nil {
print("\nDEBUG - " + error!.localizedDescription)
}
return convertedPCMBuffer!
}
}
This is by no means production-ready code -- I'm also learning iOS audio... so please, please let me know about any errors, best practices, or dangerous things going on in that code, and I'll keep this answer updated.
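Usage is then just:
let recorder = MyBufferRecorder()
recorder.startRecording()
// ... later
recorder.stopRecording()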