I have been asked to add VOIP to a game (cross-platform, so I can't use Apple's GameKit for it).
For 3 or 4 days now I've been trying to wrap my head around Audio Units and RemoteIO...
I have looked over dozens of examples, but every one of them only applies a simple algorithm to the input PCM and plays it back on the speaker.
According to Apple's documentation, in order to do VOIP we should use kAudioSessionCategory_PlayAndRecord:
UInt32 audioCategory = kAudioSessionCategory_PlayAndRecord;
status = AudioSessionSetProperty(kAudioSessionProperty_AudioCategory,
                                 sizeof(audioCategory),
                                 &audioCategory);
XThrowIfError(status, "couldn't set audio category");
1) But it seems (to me) that PlayAndRecord will always play back whatever comes from the mic (or, more exactly, from the PerformThru callback in aurioTouch). Am I wrong?
I have the simplest possible callback, doing nothing but calling AudioUnitRender:
static OSStatus PerformThru(void                       *inRefCon,
                            AudioUnitRenderActionFlags *ioActionFlags,
                            const AudioTimeStamp       *inTimeStamp,
                            UInt32                     inBusNumber,
                            UInt32                     inNumberFrames,
                            AudioBufferList            *ioData)
{
    // Pull the captured mic samples from the input element (bus 1) into ioData.
    OSStatus err = AudioUnitRender(THIS->rioUnit, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData);
    if (err)
        printf("PerformThru: error %d\n", (int)err);
    return err;
}
From that callback I intend to send the data to the peer (not directly, of course, but the outgoing data will come from it)...
I do not see how I can play output that is different from the input, except maybe with two units, one recording and one playing, but that doesn't seem to be what Apple intended (still according to the documentation).
And of course I cannot find any documentation about it; Audio Units are still pretty much under-documented...
Does anyone have an idea of the best way to do this?
I have not used VOIP or kAudioSessionCategory_PlayAndRecord. But if you want to record/transmit voice picked up from the mic and play back incoming data from network packets: here is a good sample that includes both mic capture and playback. Also, if you have not read this doc from Apple, I would strongly recommend it.
In short: you create a single audio unit instance and configure two callbacks on it: one for the mic and one for playback. The mic callback supplies you with the data picked up from the mic; you can then convert it and transmit it to other devices with whatever network protocol you choose. The playback callback is where you supply the incoming data from the other devices to be played back.
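For illustration, here is a minimal sketch of that two-callback setup on a Remote I/O unit; micCallback and playbackCallback are hypothetical functions you would write, and all error handling is omitted:

AudioComponentDescription desc = {0};
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_RemoteIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

AudioComponent comp = AudioComponentFindNext(NULL, &desc);
AudioUnit rioUnit;
AudioComponentInstanceNew(comp, &rioUnit);

// Enable capture on the input element (bus 1); playback on bus 0 is on by default.
UInt32 one = 1;
AudioUnitSetProperty(rioUnit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &one, sizeof(one));

// Mic side: micCallback fires when captured samples are available; inside it you
// call AudioUnitRender with your own AudioBufferList (ioData is NULL there).
AURenderCallbackStruct inputCb = { micCallback, self };
AudioUnitSetProperty(rioUnit, kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global, 1, &inputCb, sizeof(inputCb));

// Playback side: playbackCallback fills ioData with the decoded network audio.
AURenderCallbackStruct renderCb = { playbackCallback, self };
AudioUnitSetProperty(rioUnit, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 0, &renderCb, sizeof(renderCb));

AudioUnitInitialize(rioUnit);
AudioOutputUnitStart(rioUnit);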
You can look at this simple example. It describes how to use the Remote I/O unit. After understanding this example, you should look at PJSIP's audio driver. These should help you implement your own solution. Best of luck.
Related
I'm using the Linphone SDK for a VoIP iOS app, and I found that the proximity sensor (the one that dims your screen when you put the phone close to your ear) badly affects the incoming voice.
I found that inNumberFrames for the input render callback increases to 1024 when the proximity sensor is covered; normally it's 256. When this happens it also causes a gap of about 180 ms during which the Audio Unit doesn't trigger the callback, which destroys Linphone's buffering strategy.
Setting up the render callback:
AURenderCallbackStruct renderCallbackStruct;
renderCallbackStruct.inputProc = au_write_cb;
renderCallbackStruct.inputProcRefCon = card;
auresult = AudioUnitSetProperty(card->io_unit,
                                kAudioUnitProperty_SetRenderCallback,
                                kAudioUnitScope_Input,
                                outputBus,
                                &renderCallbackStruct,
                                sizeof(renderCallbackStruct));
In the render callback:
static OSStatus au_write_cb(void                       *inRefCon,
                            AudioUnitRenderActionFlags *ioActionFlags,
                            const AudioTimeStamp       *inTimeStamp,
                            UInt32                     inBusNumber,
                            UInt32                     inNumberFrames, // changes to 1024 when the proximity sensor is triggered
                            AudioBufferList            *ioData)
{
    // ...
    return noErr;
}
My understanding was that inNumberFrames only changes when switching playback devices (such as switching from earphones to Bluetooth). Is there any way to pin this figure when the proximity sensor is triggered?
I also tried setting kAudioUnitProperty_MaximumFramesPerSlice to 256 and calling setPreferredIOBufferDuration on the audio session, but neither works.
I downloaded Apple's official demo named Speakerbox and found that its render callback's inNumberFrames stays at 256 no matter how I trigger the proximity sensor. I compared Apple's code with mine but can't find any difference that could cause this. I'd appreciate any help, thank you.
Your understanding is incorrect. iOS can change inNumberFrames for other reasons as well, such as the currently running app's life-cycle state and power-management changes. An app's audio unit buffer-management strategy needs to tolerate such changes in buffer size, for example by audio dropout/error concealment or by resynchronization.
As for differences in iOS buffer-size behavior, those might be affected by the app's choice of audio unit, audio session type and options, and background mode options.
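As a minimal sketch of that tolerance, a playback callback can size its work off inNumberFrames on every call and conceal underruns with silence. fifo_read here is a hypothetical lock-free FIFO read that returns the number of bytes actually copied:

static OSStatus tolerant_write_cb(void                       *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp       *inTimeStamp,
                                  UInt32                     inBusNumber,
                                  UInt32                     inNumberFrames,
                                  AudioBufferList            *ioData)
{
    for (UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
        AudioBuffer *buf = &ioData->mBuffers[i];
        UInt32 wanted = buf->mDataByteSize; // already reflects inNumberFrames, whatever it is
        UInt32 got = fifo_read(inRefCon, buf->mData, wanted);
        if (got < wanted)                   // underrun: conceal with silence
            memset((char *)buf->mData + got, 0, wanted - got);
    }
    return noErr;
}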
I have an app where audio recording is the main and most important part. However, the user can switch to a table view controller where all recordings are displayed and no recording is performed.
The question is which approach is better: "start & stop the audio system, or just start it". It may seem obvious that the first one is more correct, in the spirit of "allocate when you need it, deallocate when you're done with it". I will lay out my thoughts on this question, and I hope to find approval or disapproval, with arguments, from experienced people.
When I first built AudioController.m, I implemented methods to open/close the audio session and to start/stop the audio unit. I wanted to stop the audio system when recording is not active. I used the following code:
- (BOOL)startAudioSystem {
    // open audio session
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    NSError *err = nil;
    if (![audioSession setActive:YES error:&err]) {
        NSLog(@"Couldn't activate audio session: %@", err);
    }
    // start audio unit
    OSStatus status = AudioOutputUnitStart([self audioUnit]);
    BOOL noErrors = err == nil && status == noErr;
    return noErrors;
}
and
- (BOOL)stopAudioSystem {
    // stop audio unit
    BOOL result = AudioOutputUnitStop([self audioUnit]) == noErr;
    HANDLE_RESULT(result);
    // close audio session
    NSError *err = nil;
    HANDLE_RESULT([[AVAudioSession sharedInstance] setActive:NO
                                                withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation
                                                      error:&err]);
    HANDLE_ERROR(err);
    BOOL noErrors = err == nil && result;
    return noErrors;
}
I found this approach problematic for the following reasons:
The audio system starts with a delay, meaning recording_callback() is not called for some time. I suspect AudioOutputUnitStart is responsible: when I commented out that line and moved the call to initialization, the delay was gone.
If the user switches between the recording view and the table view very quickly (so the audio system starts and stops in rapid succession), it kills the media service (I know that observing AVAudioSessionMediaServicesWereResetNotification could help here, but that is not the point).
To resolve these issues I rewrote AudioController.m with the other approach I discovered: start the audio system when the application becomes active and do not stop it until the app terminates. This approach also has several issues:
CPU usage
If the audio category is set to record-only, no other audio can be played while the user explores the table view controller.
The first one, surprisingly, is not a big deal if you skip all processing in recording_callback(), like this:
static OSStatus recordingCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
AudioController *input = (__bridge AudioController*)inRefCon;
if(!input->shouldPerformProcessing)
return noErr;
// processing
// ...
//
return noErr;
}
Doing this, CPU usage equals 0% on a real device when no recording is needed and no other actions are performed.
The second issue can be solved by switching the audio category to PlayAndRecord and enabling mixing (see the sketch below), or by just ignoring the problem. In my case, for example, the app requires the mini jack to be used by an external device, so no headphones can be used in parallel anyway.
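For reference, a minimal sketch of the PlayAndRecord-with-mixing setup via the AVAudioSession API (error handling elided):

NSError *err = nil;
[[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord
                                 withOptions:AVAudioSessionCategoryOptionMixWithOthers
                                       error:&err];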
Despite all this, the first approach still appeals to me more, since I like to close/clean up every stream/resource when it is no longer needed. And I want to be sure that there is indeed no option other than just leaving the audio system running. Please reassure me that I'm not the only one who came to this solution and that it is the correct one.
The key to solving this problem is to note that the audio system actually runs in another (real-time) thread. You can't really stop and deallocate something running in another thread at the exact moment you (or the app's main UI thread) "don't need it"; you have to wait for the other thread to realize it needs to finish and clean up after itself. For audio, that can potentially take many hundreds of milliseconds.
Given that, strategy 2 (just start) is safer and more realistic.
Alternatively, wait many seconds of non-use before attempting to stop audio, and possibly add another short delay after that before attempting any restart.
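A minimal sketch of that delayed-stop idea, reusing the start/stop methods above; stopTimer is a hypothetical property, and the 10-second delay is an arbitrary example:

- (void)recordingViewDidDisappear {
    // Don't stop immediately; schedule a stop for later.
    [self.stopTimer invalidate];
    self.stopTimer = [NSTimer scheduledTimerWithTimeInterval:10.0
                                                      target:self
                                                    selector:@selector(stopTimerFired:)
                                                    userInfo:nil
                                                     repeats:NO];
}

- (void)stopTimerFired:(NSTimer *)timer {
    [self stopAudioSystem];
}

- (void)recordingViewWillAppear {
    // Cancel any pending stop, then make sure audio is running.
    [self.stopTimer invalidate];
    self.stopTimer = nil;
    [self startAudioSystem]; // assumes this tolerates being called while already running
}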
I'm writing an app that should mix several sounds from disk and save the resulting file to disk. I'm trying to use Audio Units.
I used Apple's MixerHost as the base for my app. It has a Multichannel Mixer connected to a Remote I/O unit. When I try to add a render callback to the Remote I/O, I get error -10861, "The attempted connection between two nodes cannot be made", from the call to AUGraphConnectNodeInput(...).
What am I doing wrong? What's the right way to mix and record a file to disk?
callback stub:
static OSStatus saveToDiskRenderCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
return noErr;
}
Adding the callback to the Remote I/O unit:
AURenderCallbackStruct saveToDiskCallbackStruct;
saveToDiskCallbackStruct.inputProc = &saveToDiskRenderCallback;
result = AUGraphSetNodeInputCallback(processingGraph,
                                     iONode,
                                     0,
                                     &saveToDiskCallbackStruct);
The error occurs here:
result = AUGraphConnectNodeInput(processingGraph,
                                 mixerNode,  // source node
                                 0,          // source node output bus number
                                 iONode,     // destination node
                                 0);         // destination node input bus number
You are confused about how audio units work.
The node input callback (set by AUGraphSetNodeInputCallback) and the node input connection (set by AUGraphConnectNodeInput) both attach to the same input side of your Remote I/O unit. It looks like you believe the input callback will be the output of your graph. That is wrong.
An AUGraph offers two ways to feed the input of an audio unit:
Either from another upstream node (AUGraphConnectNodeInput),
or from a custom callback (AUGraphSetNodeInputCallback).
So you can't set them both simultaneously; that has no meaning.
Now there are two possibilities:
1) Real-time monitoring
This is not what you describe, but it is the easier one to reach from where you are. So I assume you want to listen to the mix on the Remote I/O while it is being processed (in real time).
Then read this.
2) Offline rendering
If you don't plan to listen in real time (which is what I first understood from your description), then the Remote I/O has nothing to do here, since its purpose is to talk to a physical output. Then read that: it replaces the Remote I/O unit with a Generic Output unit. Be aware that such a graph is not run in the same way (see the sketch below).
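A rough sketch of that offline pull loop, assuming the graph is built and initialized with a Generic Output node, and that genericOutputUnit, framesPerPull, totalFrames and a pre-allocated bufferList are set up elsewhere. With a Generic Output you do not call AUGraphStart; you pull the samples yourself:

AudioComponentDescription outDesc = {0};
outDesc.componentType = kAudioUnitType_Output;
outDesc.componentSubType = kAudioUnitSubType_GenericOutput; // instead of kAudioUnitSubType_RemoteIO
outDesc.componentManufacturer = kAudioUnitManufacturer_Apple;
// ... build and initialize the graph with outDesc ...

AudioTimeStamp ts = {0};
ts.mFlags = kAudioTimeStampSampleTimeValid;
ts.mSampleTime = 0;

while (ts.mSampleTime < totalFrames) {
    AudioUnitRenderActionFlags flags = 0;
    // Pull framesPerPull frames through the whole graph into bufferList.
    OSStatus err = AudioUnitRender(genericOutputUnit, &flags, &ts,
                                   0, framesPerPull, bufferList);
    if (err != noErr) break;
    // Write the rendered buffer to disk, e.g. ExtAudioFileWrite(outFile, framesPerPull, bufferList);
    ts.mSampleTime += framesPerPull;
}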
I am using the render callback of the I/O unit to store the audio data in a circular buffer:
OSStatus ioUnitRenderCallback(
void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
OSStatus err = noErr;
AMNAudioController *This = (__bridge AMNAudioController*)inRefCon;
err = AudioUnitRender(This.encoderMixerNode->unit,
ioActionFlags,
inTimeStamp,
inBusNumber,
inNumberFrames,
ioData);
// Copy the audio to the encoder buffer
TPCircularBufferCopyAudioBufferList(&(This->encoderBuffer), ioData, inTimeStamp, kTPCircularBufferCopyAll, NULL);
return err;
}
I then want to read the bytes out of the circular buffer, feed them to libLame and then to libShout.
I have tried starting a thread and using NSCondition to make it wait until data is available, but this causes all sorts of issues because it takes locks in the Core Audio callback.
What would be the recommended way to do this?
Thanks in advance.
More detail on how I implemented Adam's answer
I ended up taking Adam's advice and implemented it like so.
Producer
I use TPCircularBufferProduceBytes in the Core Audio render callback to add the bytes to the circular buffer. In my case the audio data is non-interleaved, so I ended up using two circular buffers.
Consumer
1) I spawn a new thread using pthread_create.
2) Within the new thread, I create a CFRunLoopTimer and add it to the current CFRunLoop (an interval of 0.005 seconds appears to work well).
3) I tell the current CFRunLoop to run.
4) Within my timer callback I encode the audio and send it to the server (returning quickly if no data is buffered).
I also use a buffer size of 5 MB, which appears to work well (2 MB was giving me overruns). That does seem a bit high :/ A sketch of the thread setup follows.
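Roughly, the consumer-thread scaffolding looks like this; encoderThreadMain, timerFired, encodeAndSendAvailableAudio, and controllerContext are hypothetical names, and error handling is omitted:

#include <pthread.h>
#include <CoreFoundation/CoreFoundation.h>

static void timerFired(CFRunLoopTimerRef timer, void *info) {
    // Encode and ship whatever is in the circular buffer; return fast if empty.
    encodeAndSendAvailableAudio(info);
}

static void *encoderThreadMain(void *arg) {
    CFRunLoopTimerContext ctx = {0, arg, NULL, NULL, NULL};
    CFRunLoopTimerRef timer = CFRunLoopTimerCreate(kCFAllocatorDefault,
                                                   CFAbsoluteTimeGetCurrent(), // first fire date
                                                   0.005,                      // repeat interval
                                                   0, 0, timerFired, &ctx);
    CFRunLoopAddTimer(CFRunLoopGetCurrent(), timer, kCFRunLoopDefaultMode);
    CFRunLoopRun(); // blocks here, servicing the timer
    CFRelease(timer);
    return NULL;
}

// Spawning it, e.g. from the controller's init:
pthread_t encoderThread;
pthread_create(&encoderThread, NULL, encoderThreadMain, controllerContext);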
Use a repeating timer (NSTimer or CADisplayLink) to poll your lock-free circular buffer or FIFO, and skip the work and return (to the run loop) if there is not enough data in the buffer. This works because you know the sample rate with high accuracy, and you know how much data you prefer or need to handle at a time, so you can set the polling rate just slightly faster to be on the safe side, while still staying very close to the efficiency of condition locks.
Using semaphores or locks (or anything else with unpredictable latency) in a real-time audio thread callback is not recommended.
You're on the right track, but you don't need NSCondition. You definitely don't want to block. The circular buffer implementation you're using is lock free and should do the trick. In the audio render callback, put the data into the buffer by calling TPCircularBufferProduceBytes. Then in the reader context (a timer callback is good, as hotpaw suggests), call TPCircularBufferTail to get the tail pointer (read address) and number of available bytes to read, and then call TPCircularBufferConsume to do the actual reading. Now you've done the transfer without taking any locks. Just make sure the buffer you allocate is large enough to handle the worst-case condition where your reader thread gets held off by the os for whatever reason, otherwise you can hit a buffer overrun condition and will lose data.
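Put together, the lock-free transfer amounts to a couple of calls on each side. A sketch, where circBuffer, bytesPerEncodeChunk, and encodeChunk are illustrative names:

// Producer: inside the audio render callback (real-time thread, no locks).
AudioBuffer *src = &ioData->mBuffers[0];
TPCircularBufferProduceBytes(&circBuffer, src->mData, src->mDataByteSize);

// Consumer: inside the timer callback on the reader thread.
int32_t availableBytes;
void *tail = TPCircularBufferTail(&circBuffer, &availableBytes);
if (availableBytes >= bytesPerEncodeChunk) {
    encodeChunk(tail, bytesPerEncodeChunk); // e.g. hand off to LAME, then libshout
    TPCircularBufferConsume(&circBuffer, bytesPerEncodeChunk);
}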
Recently I was looking at aurioTouch.
But I can't understand this line:
OSStatus err = AudioUnitRender(THIS->rioUnit, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData);
According to Apple's documentation, it "initiates a rendering cycle for an audio unit."
But I find that ambiguous. What does it actually do?
Core Audio works on a "pull" model: the output unit starts the process by asking for audio samples from the unit connected to its input bus, and that unit in turn asks for samples from whatever is connected to its own input bus. Each of those "asks" is a rendering cycle.
AudioUnitRender() typically passes in a buffer of samples that your audio unit can optionally process in some way. That buffer is the last argument to the function, ioData. inNumberFrames is the number of frames being passed in via ioData. The 1 is the element (or 'bus') to render from; this could change depending on your configuration. rioUnit is the audio unit in question that does the processing.
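The same call, annotated parameter by parameter (comments only; no behavior changed):

OSStatus err = AudioUnitRender(
    THIS->rioUnit,   // the Remote I/O unit that performs the render
    ioActionFlags,   // render flags, passed through from the callback
    inTimeStamp,     // the time these samples correspond to
    1,               // element/bus to render from (bus 1 = mic input on Remote I/O)
    inNumberFrames,  // how many frames to produce
    ioData);         // buffer list the unit fills with the rendered samples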
Apple's Audio Unit Hosting Guide contains a section on rendering which I've found helpful.