Recording voice while playing music - filter speakers input (iOS)

I am developing a karaoke app in which you can record your voice while listening to the music. When the user wears headphones, everything is great: he can listen to the music and to himself in the headphones while singing. We then have his pure voice recorded and can mix it with the playback.
The problem occurs when the user does not use headphones. Then we play the music through the speakers (with the AVAudioSessionCategoryPlayAndRecord category) and record at the same time. In the final recording we get the user's voice and the playback from the speakers mixed together. The problem is that the playback is very loud and it "covers" the user's voice. At first I thought this was normal behaviour, because the speakers are close to the microphone, and that there was nothing I could do.
However, when I tried the same thing in GarageBand, it somehow lowered the playback from the speakers, making the voice more audible.
I also tried it with Instagram (you can record while playing music, e.g. from Spotify) and I noticed that after about one second the playback volume decreases and the voice can be heard more clearly.
I don't think it is post-processing, because that would be very complicated, so maybe there is an option to let iOS handle it.
To be clear: the playback is not lowered during recording; the effect is only audible when listening to the final video.
I use AVCaptureSession for recording and an AudioKit player for playback.
Thanks in advance for any thoughts/tips/advice!
Regards

OK, so I asked Apple Technical Support and the response was exactly what I wanted: https://developer.apple.com/documentation/avfoundation/avaudiosession/mode/1616455-voicechat. You just have to set this mode on the AVAudioSession and the system will handle it; with this mode, "the device's tonal equalization is optimized for voice".
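For reference, here is a minimal sketch of that configuration in Swift (the exact option set is an assumption; activate the session before you start recording):

    import AVFoundation

    // Minimal sketch: put the shared session into play-and-record with the
    // voice-chat mode so the system applies its voice-optimized processing
    // (echo cancellation and tonal EQ) to the built-in mic/speaker path.
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord,
                                mode: .voiceChat,
                                options: [.defaultToSpeaker, .allowBluetooth])
        try session.setActive(true)
    } catch {
        print("Audio session setup failed: \(error)")
    }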

iOS cannot "just handle" that; there is no "filter out the music" function. The fact that it doesn't happen live, but later or with a delay, strongly implies they are doing some post-processing. I'm not a machine-learning expert, but I think that with just an equalizer and a noise gate you could get this effect. It would be hard to extract an a cappella track, but you could certainly improve the voice. Instagram likely takes that second to identify where the voice frequencies are so it knows how to EQ the signal (see the sketch below).
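As a rough illustration of the equalizer idea (this is not Instagram's actual pipeline; the band settings and the file name are assumptions), you could run the recording through an AVAudioUnitEQ that cuts the low end and boosts the vocal range:

    import AVFoundation

    // Rough sketch of the EQ idea: cut low-frequency playback energy and
    // gently boost the 1-3 kHz region where vocal intelligibility lives.
    func playWithVoiceEQ() throws {
        let engine = AVAudioEngine()
        let player = AVAudioPlayerNode()
        let eq = AVAudioUnitEQ(numberOfBands: 2)

        eq.bands[0].filterType = .highPass     // drop rumble below ~120 Hz
        eq.bands[0].frequency = 120
        eq.bands[0].bypass = false

        eq.bands[1].filterType = .parametric   // presence boost around 2 kHz
        eq.bands[1].frequency = 2000
        eq.bands[1].bandwidth = 1.0
        eq.bands[1].gain = 6
        eq.bands[1].bypass = false

        engine.attach(player)
        engine.attach(eq)

        let file = try AVAudioFile(forReading: URL(fileURLWithPath: "recording.caf"))
        engine.connect(player, to: eq, format: file.processingFormat)
        engine.connect(eq, to: engine.mainMixerNode, format: file.processingFormat)

        player.scheduleFile(file, at: nil, completionHandler: nil)
        try engine.start()
        player.play()
        // Note: keep `engine` alive (e.g. in a property) for playback to continue.
    }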

Related

Is there any possibility to read the frequency of the currently playing song with Swift?

I'm new to iOS programming and I don't know where to start. I found code examples showing how to read frequencies from the microphone with the AudioKit framework, but this is not what I am looking for. Is it possible to retrieve the frequency of the currently playing song in real time without using a microphone?
Thank you for your help.
The iOS security sandbox prevents apps from capturing the general audio output of any other app, such as the Music app.
Certain music apps, such as GarageBand, might share inter-app audio, but this isn't supported by the majority of apps that output "songs".
An app might play the "song" itself, via an AVAudioPlayer or AVAudioEngine, and tap the player's output to get raw sample data for spectral frequency and pitch analysis (two very different things, by the way).
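A minimal sketch of that approach, assuming the app plays the file itself with AVAudioEngine (AVAudioPlayer exposes no tap API; the file path and analysis hook are placeholders):

    import AVFoundation

    // Sketch: play the song ourselves and tap the mixer to receive raw
    // PCM buffers, which can then be fed to an FFT or pitch detector
    // (e.g. vDSP, or AudioKit's tap nodes if you already use AudioKit).
    func analyzeOwnPlayback() throws {
        let engine = AVAudioEngine()
        let player = AVAudioPlayerNode()
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: nil)

        let format = engine.mainMixerNode.outputFormat(forBus: 0)
        engine.mainMixerNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
            guard let channel = buffer.floatChannelData?[0] else { return }
            let samples = UnsafeBufferPointer(start: channel, count: Int(buffer.frameLength))
            // Run your frequency/pitch analysis on `samples` here.
            _ = samples
        }

        let file = try AVAudioFile(forReading: URL(fileURLWithPath: "song.m4a"))
        player.scheduleFile(file, at: nil, completionHandler: nil)
        try engine.start()
        player.play()
        // Note: keep `engine` alive (e.g. in a property) for playback to continue.
    }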

Managing text-to-speech and speech recognition at same time in iOS

I'd like my iOS app to use text-to-speech to read the user some information that it receives from a server, and I'd also like to allow the user to stop the speech with a voice command. I have tried speech-recognition frameworks for iOS such as OpenEars, and the problem I find is that the recognizer listens to and detects what the app itself is "saying", which interferes with the recognition of the user's voice commands.
Has somebody dealt with this scenario in iOS and found a solution? Thanks in advance.
This is not a trivial thing to implement. Unfortunately, iOS (like other platforms) records the sound that is playing through the speaker. The only choice you have is to use a headset; in that case speech recognition can continue listening for input. In OpenEars, recognition is disabled during TTS unless a headset is plugged in.
If you still want to implement this feature, which is called "barge-in", you have to do the following (a naive sketch of the alignment in step 2 follows below):
1. Store the audio you play through the speaker.
2. Implement a noise-cancellation algorithm that effectively removes that audio from the recording. You can use cross-correlation to find the proper offset in the recording and spectral subtraction to remove the audio.
3. Recognize the speech in the remaining signal.
It is not possible to do that without significant modification of the OpenEars sources.
A related question is Android Speech Recognition while music is playing.
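A naive illustration of the alignment part of step 2, in Swift (brute-force O(n·m); a real implementation would use FFT-based correlation, e.g. via vDSP, and the function name here is hypothetical):

    // Find the lag at which the known playback signal best matches the
    // microphone recording, via brute-force cross-correlation.
    func bestOffset(recording: [Float], playback: [Float], maxLag: Int) -> Int {
        var bestLag = 0
        var bestScore = -Float.greatestFiniteMagnitude
        for lag in 0..<maxLag {
            let n = min(playback.count, recording.count - lag)
            guard n > 0 else { break }
            var score: Float = 0
            for i in 0..<n {
                score += recording[lag + i] * playback[i]
            }
            if score > bestScore {
                bestScore = score
                bestLag = lag
            }
        }
        return bestLag
    }
    // Once aligned, spectral subtraction removes the playback's magnitude
    // spectrum from the recording's, frame by frame, before recognition.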

Audio record and play simultaneously

I am trying to develop an iOS app which reads sound from the microphone, applies some effects, and plays it through the headset instantly, possibly with some acceptable delay.
Is this possible? As a first step, I am trying to play the sound received from the microphone in my headset at the same time, but I am struggling to do so.
I was able to record the sound, save it, and then play it back easily, but I couldn't easily find relevant questions or articles. Any ideas or links are much appreciated.
I did check Apple's aurioTouch sample, but I couldn't find simultaneous record and play of the same signal in it.
Request the shortest buffers possible using the audio session APIs (less than 6 ms is possible on most iOS devices). Then feed the raw audio samples you get from the RemoteIO recording callbacks to the buffers in the RemoteIO play callbacks, possibly with a lock-free circular FIFO in between.
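For comparison, here is a higher-level sketch of the same mic-to-output monitoring using AVAudioEngine, which wraps RemoteIO (the 5 ms buffer request is an assumption; insert effect nodes between input and mixer to apply effects):

    import AVFoundation

    // Sketch: route the microphone straight to the output for live
    // monitoring. Latency follows the session's IO buffer duration.
    func startMonitoring() throws -> AVAudioEngine {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, options: [.allowBluetooth])
        try session.setPreferredIOBufferDuration(0.005) // ask for ~5 ms buffers
        try session.setActive(true)

        let engine = AVAudioEngine()
        // Insert effect nodes (e.g. AVAudioUnitReverb) between the input
        // and the mixer to apply effects on the way through.
        engine.connect(engine.inputNode,
                       to: engine.mainMixerNode,
                       format: engine.inputNode.outputFormat(forBus: 0))
        try engine.start()
        return engine // caller must retain the engine to keep audio running
    }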

Does AVPlayer stop the user's music?

I want to know what happens to a user's music when an AVPlayer starts to play. Does nothing happen, does the music stop, or does the play action cancel itself? And how would I check whether the user is playing music?
Apple's documentation on this confused me.
Alright, looking up the info for this, it seems that AVAudioPlayer objects only interrupt the user's audio if the app's AVAudioSession category is set to Solo Ambient (the default).
I saw this answer: App with AVPlayer plays mp4 interrupt iPod music after launched
I also saw this table in the Apple Developer documentation that explains whether or not different categories interrupt the user's music, with examples of different scenarios, including one for a game: https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/MobileHIG/Sound.html#//apple_ref/doc/uid/TP40006556-CH44-SW1
In order not to interrupt the user's music, I need to set the category to Ambient.
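A minimal sketch of that setup, together with the standard check for whether another app is already playing audio:

    import AVFoundation

    // Sketch: use the Ambient category so our AVPlayer mixes with, rather
    // than interrupts, the user's music.
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.ambient)
        try session.setActive(true)
    } catch {
        print("Audio session error: \(error)")
    }

    // Check whether some other app (e.g. Music) is already playing:
    if session.isOtherAudioPlaying {
        // Another app's audio is active; stay in Ambient and mix.
    }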

Intercept/modify audio stream on iOS

I am looking at the feasibility of capturing the raw audio stream that is currently playing and doing things with it, such as streaming it over Bluetooth or equalizing it. Is there any way to do this in iOS 8?
For example: apps such as Pandora/Spotify are playing music and I want to access the audio they are playing.
To process audio from another app, that app needs to participate in Inter-App Audio.
I don't know if your example apps do that.
