I am using a VoiceProcessingIO audio unit for VoIP calls. However, when I route the output to the loudspeaker (by setting the kAudioSessionOverrideAudioRoute_Speaker audio session property), the PCM data received in the input callback via AudioUnitRender has a very low volume.
For a VoIP call this is actually fine: the other party hears the audio more faintly, but still hears it. However, I would like to save a good-quality version of the input audio to disk, ideally the raw audio from the mic.
Is that actually possible? In my tests I have not been able to do it. When VoiceProcessingIO is in use, the audio from the input callback is simply very quiet. Can I perhaps get the unprocessed audio from some other source? Note that VoiceProcessingIO must still be used during the VoIP call.
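For context, the session setup being described looks roughly like the following minimal sketch, using the modern AVAudioSession API (the deprecated kAudioSessionOverrideAudioRoute_Speaker property corresponds to overrideOutputAudioPort(.speaker); the VoiceProcessingIO unit setup itself is omitted):

```swift
import AVFoundation

// Minimal sketch: PlayAndRecord + VoiceChat session with the output
// routed to the loudspeaker (modern equivalent of the deprecated
// kAudioSessionOverrideAudioRoute_Speaker property).
func configureVoIPSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
    try session.overrideOutputAudioPort(.speaker)
    try session.setActive(true)
}
```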
The same question was asked on Apple's forum (thread-655091) a year ago and has no answers.
The closest questions I found on SO are Two audio units? and Effect before render callback?, but they are more concerned with the output of VoiceProcessingIO than with its input.
One idea would be to add a parallel "raw" RemoteIO unit to get the audio from the mic, but in both Two audio units? and apple-forum-110816, developers say it will not be possible to add another RemoteIO in parallel to the VoiceProcessingIO: with the audio session category set to PlayAndRecord and the audio mode set to VoiceChat, RemoteIO will not function as usual. I have not had a chance to try it, but it seems possible.
Are there other strategies? Are there any "pre-render input callbacks" that are called before the VoiceProcessingIO unit kicks in and processes the raw data from the mic?
Is it possible to install some kind of tap between the mic and the VoiceProcessingIO unit?
AFAIK, there is no public API that allows getting both processed and unprocessed input from the microphone on an iOS device.
If you need processed input (voice processing for echo cancellation, etc.), then your best bet is to just add gain to the audio data for your other needs (via some DSP library, etc.), since it is float data.
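For example, a minimal sketch of applying a fixed gain to float PCM data (assuming the rendered input has been copied into an AVAudioPCMBuffer; the clamp avoids clipping):

```swift
import AVFoundation

// Minimal sketch: boost quiet Float32 PCM data by a fixed gain before
// writing it to disk. Assumes deinterleaved float samples.
func applyGain(_ gain: Float, to buffer: AVAudioPCMBuffer) {
    guard let channels = buffer.floatChannelData else { return }
    for ch in 0..<Int(buffer.format.channelCount) {
        for i in 0..<Int(buffer.frameLength) {
            channels[ch][i] = max(-1.0, min(1.0, channels[ch][i] * gain))
        }
    }
}
```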
I'm trying to identify how much latency is being experienced when using AirPods, compared to using the device mic & speaker, for the purposes of recording user video & audio that must be synchronised to a backing track.
Here's how my system currently works:
I have a recording pipeline that uses AVCaptureSession to record video, and AVAudioEngine to record audio.
During the recording process, I play audio via AVAudioEngine, which the user will 'perform to'. I create a movie file using AVAssetWriter where the user's captured audio (utilising noise cancellation) is added to the file, and the backing audio file is written into a separate track.
The audio file's presentation timestamps are modified slightly to account for the initial playback delay experienced in AVAudioEngine, and this works well (I previously used AVPlayer for audio playback and the start delay was more significant, which is what led me to this technique).
I know about AVAudioSession's inputLatency, outputLatency and ioBufferDuration properties, and I've read that these can be used to estimate latency, at least in one sense. I notice that this calculation yields a total round-trip latency of around 0.01 s when using the device on its own, and around 0.05 s when using the AirPods' input and output.
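A sketch of that kind of calculation, assuming the round trip is the sum of the session's reported input and output latencies plus one I/O buffer in each direction:

```swift
import AVFoundation

// Sketch: estimate round-trip latency from AVAudioSession's reported
// values. Counting the I/O buffer once per direction is an assumption.
let session = AVAudioSession.sharedInstance()
let roundTrip = session.inputLatency +
                session.outputLatency +
                session.ioBufferDuration * 2
print("Estimated round-trip latency: \(roundTrip) s")
```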
This is useful, and I can apply that extra time difference in my own logic to improve synchronisation, but there is definitely additional latency in the output, and I can't identify its source.
Strangely, it looks as though the recorded audio and video are in sync with each other, but not with the backing track. This makes me think the system is still adding compensation to one of those two forms of captured media, but that it doesn't take the actively played-back audio into account, so the user is potentially listening to delayed playback and I'm not accounting for that extra delay.
Does anyone have any thoughts on what other considerations may be required? I feel as though most use cases for Bluetooth synchronisation involve either synchronising audio and visual output, or synchronising only the audio and visual input when recording, not a third factor where the user performs alongside an audio or video source on the device that is later added to the resulting asset-writing session/media file.
I am building an application which needs to do real-time audio recording. I am using Swift for the project, so I am unable to use the Novocaine library (as it contains some Obj-C++ code).
What I need is to get small chunks of the audio recording in real time, which I can process or send to my websocket. Is there a Swift library I can use to achieve this?
In addition to getting the live audio from the microphone, I also need to show a real-time waveform.
Start recording
Get an event for every few bytes of recorded data, so I can send these bytes to my websocket.
Show a waveform for the audio.
Let me know.
You do not need any third-party tools to get audio from the mic; it can be set up easily using AVAudioEngine. However, to minimise network traffic I suggest using LAME to compress the raw PCM audio stream into MP3.
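A minimal sketch of the AVAudioEngine part (just the mic tap; the LAME encoding, websocket sending and waveform drawing would hang off the buffer callback):

```swift
import AVFoundation

// Minimal sketch: capture mic audio with an AVAudioEngine input tap and
// hand each small buffer to a callback (encode/send it, compute an RMS
// value for a live waveform, etc.). Audio session setup is omitted.
final class MicCapture {
    private let engine = AVAudioEngine()

    func start(onBuffer: @escaping (AVAudioPCMBuffer) -> Void) throws {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            onBuffer(buffer)   // called repeatedly with chunks of recorded audio
        }
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
```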
Here you can find a project with minimal functionality for capturing mic input and compressing it into MP3. In this example project the MP3 is stored in the Documents folder, so you can try it and listen to make sure it works.
From this point you can take the MP3 buffer and send it via your socket. You can also play with the LAME settings to change the quality, etc.
There is another branch called no-lame where the same functionality is implemented without LAME encoding. Look here.
I have an audio callback that I use to access a bufferList and analyse the audio.
I need to record this audio too. Firstly, would it be wise to do the recording in the same callback?
e.g. memcpy(destBuffer, ioData->mBuffers[0].mData, byteCount);
Or should the recording have its own callback?
Either way, is this memcpy the correct way to do this and how would I write this audio to a file?
Should the total byte count be used with pointer arithmetic on the destination pointer once the audio input completes, with the data then passed to a file writer?
What is the best way to record audio in a core-audio render callback?
I think you can have two different callbacks, one each for the input and output audio streams. Normally, when you open a particular stream (input or output), you specify its callback as well. In the callback you can do all of your audio processing, provided you can meet the callback deadline; otherwise you may end up dropping audio samples. A better way is to use some kind of circular buffer: in the callback you just fill the buffer, and you do all the other processing (along with the recording) in the main thread.
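For illustration, a sketch of that pattern (not tied to any particular framework, and with the overrun checks and atomic index handling that a real single-producer/single-consumer ring buffer needs left out):

```swift
// Sketch of the circular-buffer idea: the audio callback only copies
// samples into a preallocated ring buffer; another thread drains it and
// does the heavy work (analysis, file writing). A real implementation
// needs atomic indices and overrun/underrun checks (e.g. portaudio's
// lock-free ring buffer or TPCircularBuffer).
final class RingBufferSketch {
    private var storage: [Float]
    private var writeIndex = 0
    private var readIndex = 0

    init(capacity: Int) {
        storage = [Float](repeating: 0, count: capacity)
    }

    // Called from the audio callback: copy only, no allocation or locks.
    func write(_ samples: UnsafePointer<Float>, count: Int) {
        for i in 0..<count {
            storage[writeIndex] = samples[i]
            writeIndex = (writeIndex + 1) % storage.count
        }
    }

    // Called from a worker thread to drain data for recording/analysis.
    func read(into out: inout [Float]) {
        for i in 0..<out.count {
            out[i] = storage[readIndex]
            readIndex = (readIndex + 1) % storage.count
        }
    }
}
```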
I'm not sure which audio framework you are using. I've used portaudio in my project and it worked fine. Portaudio also provides a lock-free circular buffer which can be used inside the callback without the need for a thread-locking mechanism.
The following links might help you:
http://portaudio.com/docs/v19-doxydocs/paex__record_8c.html
http://portaudio.com/docs/v19-doxydocs/paex_ocean_shore_8c.html
I'm doing an audio app where I need to manipulate audio buffers in real time for different sounds. I need an asynchronous clock running in the background from which I can pull the time in various callbacks, accurate to the millisecond, for various audio-manipulation purposes such as the start and stop times of my audio playback and recording.
I need enough precision to determine the latency caused by the various processes and compensate for it in my code.
How would one implement such a clock in iOS?
If you are using the RemoteIO Audio Unit and requesting very short audio buffers from the Audio Session, then counting RemoteIO callbacks and PCM samples within the callback buffers appears to be the most precise way to align audio with sub-millisecond resolution. Anything else will be off by the amount of latency between callbacks.
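For illustration, the sample-counting idea boils down to keeping a running frame total and dividing by the sample rate (inside a render callback this is just adding inNumberFrames each time); a sketch:

```swift
// Sketch: a sample-accurate "clock" driven by the audio callback itself.
var samplesRendered: Int64 = 0
let sampleRate = 44_100.0   // use the actual hardware/session rate

// Call this from the render callback (or tap) with the buffer's frame count.
func didRender(frameCount: Int) {
    samplesRendered += Int64(frameCount)
}

// Elapsed audio time in seconds, exact to the sample.
func audioTimeSeconds() -> Double {
    Double(samplesRendered) / sampleRate
}
```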
You can also use the timer in the mach_time.h header for fairly precise monotonic time measurement, but this won't be synced to the audio playback rate, nor account for the various latencies between subsystems.
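A sketch of the mach_time approach, converting host ticks to nanoseconds with the timebase:

```swift
import Darwin   // mach_absolute_time, mach_timebase_info

// Sketch: monotonic host time in nanoseconds. Ticks must be scaled by
// numer/denom from the timebase to get real time units.
func monotonicNanoseconds() -> UInt64 {
    var timebase = mach_timebase_info_data_t()
    mach_timebase_info(&timebase)
    return mach_absolute_time() * UInt64(timebase.numer) / UInt64(timebase.denom)
}
```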
If you have not already done so, I would definitely advise you to look into the AVFoundation framework and make use of Core Audio. You can forget about the native NSTimer and NSDate options, given your need for accuracy.
I've been happily synthesizing audio (at 44.1 kHz) and sending it out through the RemoteIO audio unit. It's come to my attention that my app's audio is "garbled" when going out via HDMI to a certain model of TV. It looks to me like the problem is that this TV expects audio data at 48 kHz.
Here are some questions:
Does RemoteIO adopt the sample rate of whichever device it's outputting to? If I'm sending audio via HDMI to a device that asks for 48 kHz, do my RemoteIO callback buffers become 48 kHz?
Is there some tidy way to force RemoteIO to keep thinking in terms of 44.1 kHz and be smart enough to perform any necessary sample rate conversion on its own, before it hands the data off to the device?
If RemoteIO does indeed just defer to the device it's connected to, then presumably I need to do some sample rate conversion between my synthesis engine and RemoteIO. Is AudioConverterConvertComplexBuffer the best way to do this?
Fixed my problem. I was incorrectly assuming that the number of frames requested by the render callback would be a power of two. Changed my code to accommodate any arbitrary number of frames and all seems to work fine now.
If you want sample rate conversion, try using the Audio Queue API, or do the conversion within your own app using some DSP code.
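If you go the do-it-yourself route, one option on current iOS versions (not mentioned above) is AVAudioConverter; a sketch, assuming standard float PCM buffers:

```swift
import AVFoundation

// Sketch: convert a 44.1 kHz float buffer to 48 kHz with AVAudioConverter.
let inFormat  = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 2)!
let outFormat = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!
let converter = AVAudioConverter(from: inFormat, to: outFormat)!

func resample(_ input: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    let ratio = outFormat.sampleRate / inFormat.sampleRate
    let capacity = AVAudioFrameCount(Double(input.frameLength) * ratio) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: outFormat,
                                        frameCapacity: capacity) else { return nil }
    var delivered = false
    let status = converter.convert(to: output, error: nil) { _, inputStatus in
        if delivered {
            inputStatus.pointee = .noDataNow
            return nil
        }
        delivered = true
        inputStatus.pointee = .haveData
        return input
    }
    return status == .error ? nil : output
}
```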
Whether the RemoteIO buffer size or sample rate can be configured might depend on the iOS device model, OS version, audio route, background modes, etc., so an app must accommodate different buffer sizes and sample rates when using RemoteIO.
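For example, a sketch of requesting preferred values and then reading back what the session actually granted:

```swift
import AVFoundation

// Sketch: ask for preferred values, then configure the audio code with
// whatever the session actually grants (it may differ per route/device).
let session = AVAudioSession.sharedInstance()
try? session.setPreferredSampleRate(44_100)
try? session.setPreferredIOBufferDuration(0.005)
try? session.setActive(true)

let actualRate = session.sampleRate          // e.g. 48 kHz on some routes
let actualBuffer = session.ioBufferDuration  // may differ from the request
print("Using \(actualRate) Hz, \(actualBuffer * 1_000) ms buffers")
```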