Swift 5: Voice recognition with YouTubePlayerKit via AirPods Pro

I need to play a YouTube video and recognize my voice at the same time. Voice recognition succeeds when the system speaker is selected for YouTube playback. However, recognition fails when a Bluetooth hands-free device such as AirPods Pro is selected for playback. If I play sound with AVPlayer or AVAudioPlayer and recognize my voice at the same time, recognition succeeds via the Bluetooth hands-free device. Please advise me on how to route my voice from the BLE hands-free device as the sound input when specifying the allowBluetooth option on AVAudioSession.
Before playing with YouTubePlayerKit, the audio session is configured as follows. The AVAudioSession option .allowBluetooth works fine for BLE and system sound input with AVPlayer and AVAudioPlayer, but not with YouTubePlayerKit.
let configuration = YouTubePlayer.Configuration(
    isUserInteractionEnabled: false,
    autoPlay: false,
    showControls: false,
    enableJavaScriptAPI: true,
    loopEnabled: false,
    playInline: true
)
do {
    audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .measurement, options: .allowBluetooth)
    // try audioSession.setCategory(.playAndRecord, mode: .measurement, options: .duckOthers)
    try AVAudioSession.sharedInstance().setActive(true, options: .notifyOthersOnDeactivation)
} catch _ {
    handleError(withMessage: "play youtube audio session error") // failed to record
}
youTubeID = URL(string: movieURL)!.lastPathComponent
let videoID = URL(string: movieURL)?.deletingLastPathComponent()
print("youTubeID \(String(describing: youTubeID)) videoID \(String(describing: videoID))")
if (videoID)!.absoluteString.contains("https://youtu.be/") {
    playYoutube = true
    DispatchQueue.main.async { [self] in
        youTubePlayer = YouTubePlayer(
            source: .video(id: youTubeID),
            configuration: configuration
        )
        youTubePlayerView = YouTubePlayerHostingView(player: youTubePlayer)
        youTubePlayer.pause()
        youTubePlayerView.frame = self.topView.bounds
        self.topView.addSubview(youTubePlayerView)
        self.checkYoutubeDuration(player: youTubePlayer)
    }
}
The audioEngine is then set up as follows when doing voice recognition.
guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")), recognizer.isAvailable else {
    handleError(withMessage: "voice recognition error")
    return
}
// Create the speech recognition request before installing the tap,
// so buffers appended in the tap block are not dropped while the request is still nil.
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
recognitionRequest!.shouldReportPartialResults = true
audioEngine = AVAudioEngine()
inputNode = audioEngine.inputNode
inputNode.removeTap(onBus: 0)
// Note: this explicit format is currently unused; the tap below uses the input node's own format.
guard let commonFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100, channels: 2, interleaved: false) else { return }
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    self.recognitionRequest?.append(buffer)
}
self.audioEngine.prepare()
do {
    try self.audioEngine.start()
    print("audioEngine.start()")
} catch {
    print("audioEngine.start failed")
}
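For reference, the Bluetooth HFP microphone can also be requested explicitly instead of relying on .allowBluetooth alone. Below is a minimal sketch of that approach (assuming AirPods Pro are connected); it is not a confirmed fix for the YouTubePlayerKit case, only the session-level routing that should make the Bluetooth hands-free input available.

import AVFoundation

// Sketch: explicitly prefer the Bluetooth HFP input, if one is available.
func preferBluetoothInput() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .measurement, options: [.allowBluetooth])
    if let bluetoothInput = session.availableInputs?.first(where: { $0.portType == .bluetoothHFP }) {
        try session.setPreferredInput(bluetoothInput)
    }
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}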

Related

How to implement band stop filter using AVAudioEngine

I am building an app that needs to perform analysis on the audio it receives from the microphone in real time. In my app, I also need to play a beep sound and start recording audio at the same time; in other words, I can't play the beep sound and then start recording. This introduces the problem of hearing the beep sound in my recording (this might be because I am playing the beep sound through the speaker, but unfortunately I cannot compromise in this regard either). Since the beep sound is just a tone of about 2350 Hz, I was wondering how I could exclude that range of frequencies (say from 2300 Hz to 2400 Hz) in my recordings and prevent it from influencing my audio samples. After doing some googling I came up with what I think might be the solution: a band-stop filter. According to Wikipedia, "a band-stop filter or band-rejection filter is a filter that passes most frequencies unaltered, but attenuates those in a specific range to very low levels". This seems like what I need to exclude frequencies from 2300 Hz to 2400 Hz in my recordings (or at least for the first second of the recording, while the beep sound is playing). My question is: how would I implement this with AVAudioEngine? Is there a way I can turn off the filter after the first second of the recording, when the beep sound is done playing, without stopping the recording?
Since I am new to working with audio via AVAudioEngine (I've always just stuck to the higher levels of AVFoundation), I followed this tutorial to help me create a class to handle all the messy stuff. This is what my code looks like:
class Recorder {
    enum RecordingState {
        case recording, paused, stopped
    }

    private var engine: AVAudioEngine!
    private var mixerNode: AVAudioMixerNode!
    private var state: RecordingState = .stopped
    private var audioPlayer = AVAudioPlayerNode()

    init() {
        setupSession()
        setupEngine()
    }

    fileprivate func setupSession() {
        let session = AVAudioSession.sharedInstance()
        // The original tutorial sets the category to .record
        // try? session.setCategory(.record)
        try? session.setCategory(.playAndRecord, options: [.mixWithOthers, .defaultToSpeaker])
        try? session.setActive(true, options: .notifyOthersOnDeactivation)
    }

    fileprivate func setupEngine() {
        engine = AVAudioEngine()
        mixerNode = AVAudioMixerNode()
        // Set volume to 0 to avoid audio feedback while recording.
        mixerNode.volume = 0
        engine.attach(mixerNode)
        // Attach the audio player node
        engine.attach(audioPlayer)
        makeConnections()
        // Prepare the engine in advance, in order for the system to allocate the necessary resources.
        engine.prepare()
    }

    fileprivate func makeConnections() {
        let inputNode = engine.inputNode
        let inputFormat = inputNode.outputFormat(forBus: 0)
        engine.connect(inputNode, to: mixerNode, format: inputFormat)
        let mainMixerNode = engine.mainMixerNode
        let mixerFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: inputFormat.sampleRate, channels: 1, interleaved: false)
        engine.connect(mixerNode, to: mainMixerNode, format: mixerFormat)
        // AudioPlayer connection
        let path = Bundle.main.path(forResource: "beep.mp3", ofType: nil)!
        let url = URL(fileURLWithPath: path)
        let file = try! AVAudioFile(forReading: url)
        engine.connect(audioPlayer, to: mainMixerNode, format: nil)
        audioPlayer.scheduleFile(file, at: nil)
    }

    // MARK: Start Recording Function
    func startRecording() throws {
        print("Start Recording!")
        let tapNode: AVAudioNode = mixerNode
        let format = tapNode.outputFormat(forBus: 0)
        let documentURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
        // AVAudioFile uses the Core Audio Format (CAF) to write to disk.
        // So we're using the caf file extension.
        let file = try AVAudioFile(forWriting: documentURL.appendingPathComponent("recording.caf"), settings: format.settings)
        tapNode.installTap(onBus: 0, bufferSize: 4096, format: format, block: { (buffer, time) in
            try? file.write(from: buffer)
            print(buffer.description)
            print(buffer.stride)
            let floatArray = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: Int(buffer.frameLength)))
        })
        try engine.start()
        audioPlayer.play()
        state = .recording
    }

    // MARK: Other recording functions
    func resumeRecording() throws {
        try engine.start()
        state = .recording
    }

    func pauseRecording() {
        engine.pause()
        state = .paused
    }

    func stopRecording() {
        // Remove existing taps on nodes
        mixerNode.removeTap(onBus: 0)
        engine.stop()
        state = .stopped
    }
}
AVAudioUnitEQ supports a band-stop filter.
Perhaps something like:
// Create an instance of AVAudioUnitEQ and connect it to the engine's main mixer
let eq = AVAudioUnitEQ(numberOfBands: 1)
engine.attach(eq)
engine.connect(eq, to: engine.mainMixerNode, format: nil)
engine.connect(player, to: eq, format: nil)
eq.bands[0].frequency = 2350
eq.bands[0].filterType = .bandStop
eq.bands[0].bypass = false
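The second part of the question (switching the filter off after the first second without stopping the recording) is not covered above. A minimal sketch, assuming the eq instance above is kept reachable, would be to flip the band's bypass flag after a delay while the engine keeps running:

// Sketch: stop rejecting 2350 Hz about one second after recording starts.
DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {
    eq.bands[0].bypass = true // let the band pass through unaltered again
}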
A slightly more complete answer, linked to an IBAction; in this example, I use .parametric for the filter type, with more bands than required, to give broader insight into how to use it:
@IBAction func PlayWithEQ(_ sender: Any) {
    self.engine.stop()
    self.engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    let url = Bundle.main.url(forResource: "yoursong", withExtension: "m4a")!
    let f = try! AVAudioFile(forReading: url)
    self.engine.attach(player)

    // adding eq effect node
    let effect = AVAudioUnitEQ(numberOfBands: 4)
    let bands = effect.bands
    let freq = [125, 250, 2350, 8000]
    for i in 0...(bands.count - 1) {
        bands[i].frequency = Float(freq[i])
    }

    bands[0].gain = 0.0
    bands[0].filterType = .parametric
    bands[0].bandwidth = 1

    bands[1].gain = 0.0
    bands[1].filterType = .parametric
    bands[1].bandwidth = 0.5

    // filter of interest, rejecting 2350 Hz (adjust bandwidth as needed)
    bands[2].gain = -60.0
    bands[2].filterType = .parametric
    bands[2].bandwidth = 1

    bands[3].gain = 0.0
    bands[3].filterType = .parametric
    bands[3].bandwidth = 1

    self.engine.attach(effect)
    self.engine.connect(player, to: effect, format: f.processingFormat)
    let mixer = self.engine.mainMixerNode
    self.engine.connect(effect, to: mixer, format: f.processingFormat)

    player.scheduleFile(f, at: nil) {
        // `delay` is a small helper from the original answer that runs the closure
        // on the main queue after the given number of seconds.
        delay(0.05) {
            if self.engine.isRunning {
                self.engine.stop()
            }
        }
    }
    self.engine.prepare()
    try! self.engine.start()
    player.play()
}
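Both answers above wire the EQ into the playback path. If the goal is to keep the beep out of the recorded file itself, the same band-stop idea could presumably be inserted between the input node and the tapped mixer in the Recorder class from the question. This is a sketch only, reusing the engine and mixerNode names assumed from that class:

// Sketch: band-stop EQ in the recording path (engine / mixerNode are assumed from the Recorder class above).
let eq = AVAudioUnitEQ(numberOfBands: 1)
eq.bands[0].filterType = .bandStop
eq.bands[0].frequency = 2350 // centre of the beep tone, in Hz
eq.bands[0].bandwidth = 0.1  // in octaves; ~0.1 octave spans roughly 2300-2400 Hz
eq.bands[0].bypass = false
engine.attach(eq)

let inputNode = engine.inputNode
let inputFormat = inputNode.outputFormat(forBus: 0)
// INPUT -> EQ -> mixerNode (which is tapped and written to disk), replacing the direct INPUT -> mixerNode connection.
engine.connect(inputNode, to: eq, format: inputFormat)
engine.connect(eq, to: mixerNode, format: inputFormat)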

Tap audio output using AVAudioEngine

I'm trying to install a tap on the audio output that is played by my app. I have no issue catching buffers from the microphone input, but when it comes to catching sound that goes through the speaker, the earpiece, or whatever the output device is, it does not succeed. Am I missing something?
In my example I'm trying to catch the audio buffers from an audio file that an AVPlayer is playing. But let's pretend I don't have direct access to the AVPlayer instance.
The goal is to perform Speech Recognition on an audio stream.
func catchAudioBuffers() throws {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .voiceChat, options: .allowBluetooth)
    try audioSession.setActive(true)

    let outputNode = audioEngine.outputNode
    let recordingFormat = outputNode.outputFormat(forBus: 0)
    outputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        // PROCESS AUDIO BUFFER
    }

    audioEngine.prepare()
    try audioEngine.start()

    // For example I am playing an audio conversation with an AVPlayer and a local file.
    player.playSound()
}
This code results in a:
AVAEInternal.h:76 required condition is false: [AVAudioIONodeImpl.mm:1057:SetOutputFormat: (_isInput)]
*** Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: _isInput'
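The answer below works around this by leaving the tap format nil and tapping the main mixer rather than the output node. A distilled sketch of that idea (note that it captures what the engine itself renders, e.g. an AVAudioPlayerNode, not the output of a separate AVPlayer):

func tapPlaybackMix(on audioEngine: AVAudioEngine) throws {
    // Per the guidance quoted below, the tap format is left nil here.
    audioEngine.mainMixerNode.installTap(onBus: 0, bufferSize: 1024, format: nil) { buffer, _ in
        // PROCESS AUDIO BUFFER (e.g. feed it to speech recognition)
    }
    audioEngine.prepare()
    try audioEngine.start()
}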
I was facing the same problem and, during two days of brainstorming, found the following.
Apple says that for AVAudioOutputNode, the tap format must be specified as nil. I'm not sure how important that is, but in my case what finally worked was a nil format.
You need to start recording, and don't forget to stop it.
Removing the tap is really important; otherwise you will have a file that you can't open.
Try to save the file with the same audio settings that you used in the source file.
Here's my code that finally worked. It was partly taken from this question: Saving Audio After Effect in iOS.
func playSound() {
    let rate: Float? = effect.speed
    let pitch: Float? = effect.pitch
    let echo: Bool? = effect.echo
    let reverb: Bool? = effect.reverb

    // initialize audio engine components
    audioEngine = AVAudioEngine()

    // node for playing audio
    audioPlayerNode = AVAudioPlayerNode()
    audioEngine.attach(audioPlayerNode)

    // node for adjusting rate/pitch
    let changeRatePitchNode = AVAudioUnitTimePitch()
    if let pitch = pitch {
        changeRatePitchNode.pitch = pitch
    }
    if let rate = rate {
        changeRatePitchNode.rate = rate
    }
    audioEngine.attach(changeRatePitchNode)

    // node for echo
    let echoNode = AVAudioUnitDistortion()
    echoNode.loadFactoryPreset(.multiEcho1)
    audioEngine.attach(echoNode)

    // node for reverb
    let reverbNode = AVAudioUnitReverb()
    reverbNode.loadFactoryPreset(.cathedral)
    reverbNode.wetDryMix = 50
    audioEngine.attach(reverbNode)

    // connect nodes
    if echo == true && reverb == true {
        connectAudioNodes(audioPlayerNode, changeRatePitchNode, echoNode, reverbNode, audioEngine.mainMixerNode, audioEngine.outputNode)
    } else if echo == true {
        connectAudioNodes(audioPlayerNode, changeRatePitchNode, echoNode, audioEngine.mainMixerNode, audioEngine.outputNode)
    } else if reverb == true {
        connectAudioNodes(audioPlayerNode, changeRatePitchNode, reverbNode, audioEngine.mainMixerNode, audioEngine.outputNode)
    } else {
        connectAudioNodes(audioPlayerNode, changeRatePitchNode, audioEngine.mainMixerNode, audioEngine.outputNode)
    }

    // schedule to play and start the engine!
    audioPlayerNode.stop()
    audioPlayerNode.scheduleFile(audioFile, at: nil) {
        var delayInSeconds: Double = 0
        if let lastRenderTime = self.audioPlayerNode.lastRenderTime, let playerTime = self.audioPlayerNode.playerTime(forNodeTime: lastRenderTime) {
            if let rate = rate {
                delayInSeconds = Double(self.audioFile.length - playerTime.sampleTime) / Double(self.audioFile.processingFormat.sampleRate) / Double(rate)
            } else {
                delayInSeconds = Double(self.audioFile.length - playerTime.sampleTime) / Double(self.audioFile.processingFormat.sampleRate)
            }
        }
        // schedule a stop timer for when audio finishes playing
        self.stopTimer = Timer(timeInterval: delayInSeconds, target: self, selector: #selector(EditViewController.stopAudio), userInfo: nil, repeats: false)
        RunLoop.main.add(self.stopTimer!, forMode: RunLoop.Mode.default)
    }

    do {
        try audioEngine.start()
    } catch {
        showAlert(Alerts.AudioEngineError, message: String(describing: error))
        return
    }

    // Try to save
    let dirPaths: String = (NSSearchPathForDirectoriesInDomains(.libraryDirectory, .userDomainMask, true)[0]) + "/sounds/"
    let tmpFileUrl = URL(fileURLWithPath: dirPaths + "effected.caf")
    // Save the tmpFileUrl into a global variable so we don't lose it (not important if you want to do something else)
    filteredOutputURL = tmpFileUrl
    do {
        print(dirPaths)
        let settings = [AVSampleRateKey: NSNumber(value: Float(44100.0)),
                        AVFormatIDKey: NSNumber(value: Int32(kAudioFormatMPEG4AAC)),
                        AVNumberOfChannelsKey: NSNumber(value: 1),
                        AVEncoderAudioQualityKey: NSNumber(value: Int32(AVAudioQuality.medium.rawValue))]
        self.newAudio = try! AVAudioFile(forWriting: tmpFileUrl, settings: settings)
        let length = self.audioFile.length
        audioEngine.mainMixerNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { (buffer: AVAudioPCMBuffer, time: AVAudioTime) -> Void in
            // Let us know when to stop saving the file; otherwise it saves infinitely
            if self.newAudio.length <= length {
                do {
                    try self.newAudio.write(from: buffer)
                } catch {
                    print("Problem Writing Buffer")
                }
            } else {
                // if we don't remove it, it will keep on tapping infinitely
                self.audioEngine.mainMixerNode.removeTap(onBus: 0)
            }
        }
    }

    // play the recording!
    audioPlayerNode.play()
}

@objc func stopAudio() {
    if let audioPlayerNode = audioPlayerNode {
        let engine = audioEngine
        audioPlayerNode.stop()
        engine?.mainMixerNode.removeTap(onBus: 0)
    }
    if let stopTimer = stopTimer {
        stopTimer.invalidate()
    }
    configureUI(.notPlaying)
    if let audioEngine = audioEngine {
        audioEngine.stop()
        audioEngine.reset()
    }
    isPlaying = false
}

Preventing playback while recording using AVAudioEngine on watchOS

I used AVAudioEngine to gather PCM data from the microphone on iOS and it worked fine; however, when I moved the project to watchOS, I get feedback while recording. How would I stop playback from the speaker while recording?
var audioEngine = AVAudioEngine()
try AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default)
try AVAudioSession.sharedInstance().setActive(true)

let input = audioEngine.inputNode
let inputFormat = input.inputFormat(forBus: 0)
audioEngine.connect(input, to: audioEngine.mainMixerNode, format: inputFormat)
try! audioEngine.start()

let mixer = audioEngine.mainMixerNode
let format = mixer.outputFormat(forBus: 0)
let sampleRate = format.sampleRate
let fft_size = 2048
mixer.installTap(onBus: 0, bufferSize: UInt32(fft_size), format: format,
                 block: { (buffer: AVAudioPCMBuffer, time: AVAudioTime) -> Void in
    // Processing
})
For anyone else that runs into this, I fixed it by removing the connection from the inputNode to the mainMixerNode and installing the tap straight on the inputNode. The way I was doing it before, I guess, creates a feedback loop where it plays back what it's recording. I'm not sure why this only happens on watchOS and not on iPhone... perhaps the iPhone was playing back from the ear speaker rather than the one next to the mic. Fixed code:
var audioEngine = AVAudioEngine()
try AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default)
try AVAudioSession.sharedInstance().setActive(true)
try! audioEngine.start()

let input = audioEngine.inputNode
let format = input.outputFormat(forBus: 0) // the mixer is no longer involved, so use the input node's own format
let sampleRate = format.sampleRate
let fft_size = 2048
input.installTap(onBus: 0, bufferSize: UInt32(fft_size), format: format,
                 block: { (buffer: AVAudioPCMBuffer, time: AVAudioTime) -> Void in
    // Processing
})

Connecting AVAudioMixerNode to AVAudioEngine

I use AVAudioMixerNode to change the audio format. This entry helped me a lot. The code below gives me the data I want, but I hear my own voice on the phone's speaker. How can I prevent that?
func startAudioEngine() {
    engine = AVAudioEngine()
    guard let engine = engine else {
        // #TODO: error out
        return
    }
    let input = engine.inputNode

    let downMixer = AVAudioMixerNode()
    // I think the engine's I/O nodes are already attached to it by default, so we attach only the downMixer here:
    engine.attach(downMixer)

    // You can tap the downMixer to intercept the audio and do something with it:
    downMixer.installTap(onBus: 0, bufferSize: 2048, format: downMixer.outputFormat(forBus: 0), block: // originally 1024
        { (buffer: AVAudioPCMBuffer, time: AVAudioTime) -> Void in
            // I get audio data here
        }
    )

    // let's get the input audio format right as it is
    let format = input.inputFormat(forBus: 0)
    // I initialize the lower-rate mono format I need (11025 Hz here):
    let format16KHzMono = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 11025.0, channels: 1, interleaved: true)

    // connect the nodes inside the engine:
    // INPUT NODE --format--> downMixer --low-rate format--> output
    // as you can see I am downsampling the default 44 kHz we get from the input to the lower rate I want
    engine.connect(input, to: downMixer, format: format) // use default input format
    engine.connect(downMixer, to: engine.outputNode, format: format16KHzMono) // use new audio format
    engine.prepare()

    do {
        try engine.start()
    } catch {
        // #TODO: error out
    }
}
You can hear your microphone recording through your speakers because your microphone is connected to downMixer, which is connected to engine.outputNode. You could probably just mute the output for the downMixer if you aren't using it with other inputs:
downMixer.outputVolume = 0.0
I did it like this to change the format to 48000 Hz / 16 bits per sample / 2 channels, and save it to a WAV file:
let outputAudioFileFormat = [AVFormatIDKey: Int(kAudioFormatLinearPCM), AVSampleRateKey: 48000, AVNumberOfChannelsKey: 2, AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue]
let audioRecordingFormat: AVAudioFormat = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 48000, channels: 2, interleaved: true)!
do {
    try file = AVAudioFile(forWriting: url, settings: outputAudioFileFormat, commonFormat: .pcmFormatInt16, interleaved: true)
    let recordingSession = AVAudioSession.sharedInstance()
    try recordingSession.setPreferredInput(input)
    try recordingSession.setPreferredSampleRate(audioRecordingFormat.sampleRate)
    engine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: audioRecordingFormat, block: self.bufferAvailable)
    engine.connect(engine.inputNode, to: engine.outputNode, format: audioRecordingFormat) // configure graph
} catch {
    debugPrint("Could not initialize the audio file: \(error)")
}
And the function block
func bufferAvailable(buffer: AVAudioPCMBuffer, time: AVAudioTime) {
    do {
        try self.file?.write(from: buffer)
        if self.onBufferAvailable != nil {
            DispatchQueue.main.async {
                self.onBufferAvailable!(buffer) // outside function used for analyzing and displaying a wave meter
            }
        }
    } catch {
        self.stopEngine()
        DispatchQueue.main.async {
            self.onRecordEnd(false)
        }
    }
}
The stopEngine function is below; you should also call it when you want to stop the recording:
private func stopEngine() {
    self.engine.inputNode.removeTap(onBus: 0)
    self.engine.stop()
}

Give priority to background music (iTunes) over AVFoundation player node

I'm using the AVFoundation framework. Whenever the player plays the buffer, my background music stops, so I used the code below to allow it to continue playing regardless of the AVFoundation player.
try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, with: [.mixWithOthers,.allowBluetooth])
try audioSession.setMode(AVAudioSessionModeDefault)
try audioSession.setActive(true)
It does work, but the problem is that the quality of the background music is dramatically affected. The music no longer has its bass whenever the AVPlayer plays the buffer.
I want the background music to be uninterrupted while using AVPlayer. Is it possible?
Update: I added the full code if anyone wants to check. You can feel the difference in the background iTunes music as soon as the app is opened or the session is activated with this code.
class ViewController: UIViewController {

    var engine = AVAudioEngine()
    let audioSession = AVAudioSession.sharedInstance()
    let player = AVAudioPlayerNode()
    let mixer = AVAudioMixerNode()

    override func viewDidLoad() {
        super.viewDidLoad()
        do {
            try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, with: [.mixWithOthers, .allowBluetooth])
            try audioSession.setMode(AVAudioSessionModeDefault)
            try audioSession.setActive(true)
        } catch {
        }

        let input = engine.inputNode
        let bus = 0
        let inputFormat = input.outputFormat(forBus: bus)
        let recordingFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 11025.0, channels: 1, interleaved: false)

        engine.attach(player)
        engine.attach(mixer)
        engine.connect(input, to: mixer, format: input.outputFormat(forBus: 0))
        engine.connect(player, to: engine.mainMixerNode, format: recordingFormat)

        mixer.installTap(onBus: bus, bufferSize: AVAudioFrameCount(inputFormat.sampleRate * 0.4), format: inputFormat, block: { (buffer: AVAudioPCMBuffer, time: AVAudioTime) -> Void in
            let converter: AVAudioConverter = AVAudioConverter(from: inputFormat, to: recordingFormat!)!
            let newbuffer = AVAudioPCMBuffer(pcmFormat: recordingFormat!, frameCapacity: AVAudioFrameCount((recordingFormat?.sampleRate)! * 0.4))
            let inputBlock: AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
                outStatus.pointee = AVAudioConverterInputStatus.haveData
                let audioBuffer: AVAudioBuffer = buffer
                return audioBuffer
            }
            var error: NSError?
            converter.convert(to: newbuffer!, error: &error, withInputFrom: inputBlock)
            self.player.scheduleBuffer(newbuffer!)
        })

        do {
            try engine.start()
            player.play()
        } catch {
            print(error)
        }
    }
}
Unless this is some weird mixing quirk, the quality change you report may just be that recording categories change the default audio output device to the tiny, tinny receiver (because telephones, don't ask). Override this behaviour by adding .defaultToSpeaker to your setCategory() call:
try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, with: [.mixWithOthers,.allowBluetooth, .defaultToSpeaker])
I think you need this one:
try audioSession.setCategory(AVAudioSessionCategoryAmbient)
Documentation:
https://developer.apple.com/documentation/avfoundation/avaudiosessioncategoryambient
When you use this category, audio from other apps mixes with your audio
