Using AVCaptureSession and Audio Units Together Causes Problems for AVAssetWriterInput - ios

I'm working on an iOS app that does two things at the same time:
It captures audio and video and relays them to a server to provide video chat functionality.
It captures local audio and video and encodes them into an mp4 file to be saved for posterity.
Unfortunately, when we configure the app with an audio unit to enable echo cancellation, the recording functionality breaks: the AVAssetWriterInput instance we're using to encode audio rejects incoming samples. When we don't set up the audio unit, recording works, but we have terrible echo.
To enable echo cancellation, we configure an audio unit like this (paraphrasing for the sake of brevity):
AudioComponentDescription desc;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
AudioComponent comp = AudioComponentFindNext(NULL, &desc);
OSStatus status = AudioComponentInstanceNew(comp, &_audioUnit);
status = AudioUnitInitialize(_audioUnit);
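For context, a VoiceProcessingIO unit typically also needs input enabled on bus 1 and a client stream format set before AudioUnitInitialize. A minimal sketch of those steps follows (this is the standard I/O-unit convention, not necessarily our exact code; the 48 kHz mono 16-bit format is an assumption):
// Enable input on bus 1 (the microphone element); output on bus 0 is enabled by default
UInt32 enableInput = 1;
AudioUnitSetProperty(_audioUnit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &enableInput, sizeof(enableInput));
// Client format: 16-bit signed mono PCM (assumed here; use whatever the chat pipeline needs)
AudioStreamBasicDescription fmt = {0};
fmt.mSampleRate = 48000;
fmt.mFormatID = kAudioFormatLinearPCM;
fmt.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
fmt.mChannelsPerFrame = 1;
fmt.mBitsPerChannel = 16;
fmt.mBytesPerFrame = 2;
fmt.mFramesPerPacket = 1;
fmt.mBytesPerPacket = 2;
// Mic data comes out of bus 1's output scope; playback data goes into bus 0's input scope
AudioUnitSetProperty(_audioUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output, 1, &fmt, sizeof(fmt));
AudioUnitSetProperty(_audioUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Input, 0, &fmt, sizeof(fmt));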
This works fine for video chat, but it breaks the recording functionality, which is set up like this (again, paraphrasing—the actual implementation is spread out over several methods).
_captureSession = [[AVCaptureSession alloc] init];
// Need to use the existing audio session & configuration to ensure we get echo cancellation
_captureSession.usesApplicationAudioSession = YES;
_captureSession.automaticallyConfiguresApplicationAudioSession = NO;
[_captureSession beginConfiguration];
AVCaptureDeviceInput *audioInput = [[AVCaptureDeviceInput alloc] initWithDevice:[self audioCaptureDevice] error:NULL];
[_captureSession addInput:audioInput];
_audioDataOutput = [[AVCaptureAudioDataOutput alloc] init];
[_audioDataOutput setSampleBufferDelegate:self queue:_cameraProcessingQueue];
[_captureSession addOutput:_audioDataOutput];
[_captureSession commitConfiguration];
And the relevant portion of captureOutput looks something like this:
NSLog(#"Audio format, channels: %d, sample rate: %f, format id: %d, bits per channel: %d", basicFormat->mChannelsPerFrame, basicFormat->mSampleRate, basicFormat->mFormatID, basicFormat->mBitsPerChannel);
if (_assetWriter.status == AVAssetWriterStatusWriting) {
if (_audioEncoder.readyForMoreMediaData) {
if (![_audioEncoder appendSampleBuffer:sampleBuffer]) {
NSLog(#"Audio encoder couldn't append sample buffer");
}
}
}
What happens is the call to appendSampleBuffer fails, but—and this is the strange part—only if I don't have earphones plugged into my phone. Examining the logs produced when this happens, I found that without earphones connected, the number of channels reported in the log message was 3, whereas with earphones connected, the number of channels was 1. This explains why the encode operation was failing, since the encoder was configured to expect just a single channel.
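(For reference, the channel count in that log line comes straight from the sample buffer's format description; this is roughly how basicFormat is obtained inside captureOutput:)
CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
const AudioStreamBasicDescription *basicFormat = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription);
if (basicFormat != NULL && basicFormat->mChannelsPerFrame != 1) {
    // Incoming audio no longer matches the mono format the encoder was configured for
}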
What I don't understand is why I'm getting three channels here. If I comment out the code that initializes the audio unit, I only get a single channel and recording works fine, but echo cancellation doesn't work. Furthermore, if I remove these lines
// Need to use the existing audio session & configuration to ensure we get echo cancellation
_captureSession.usesApplicationAudioSession = YES;
_captureSession.automaticallyConfiguresApplicationAudioSession = NO;
recording works (I only get a single channel with or without headphones), but again, we lose echo cancellation.
So, the crux of my question is: why am I getting three channels of audio when I configure an audio unit to provide echo cancellation? Furthermore, is there any way to prevent this from happening or to work around this behavior using AVCaptureSession?
I've considered piping the microphone audio directly from the low-level audio unit callback into the encoder, as well as to the chat pipeline, but it seems like conjuring up the necessary Core Media buffers to do so would be a bit of work that I'd like to avoid if possible.
Note that the chat and recording functions were written by different people—neither of them me—which is the reason this code isn't more integrated. If possible, I'd like to avoid having to refactor the whole mess.

Ultimately, I was able to work around this issue by gathering audio samples from the microphone via the I/O audio unit, repackaging these samples into a CMSampleBuffer, and passing the newly constructed CMSampleBuffer into the encoder.
The code to do the conversion looks like this (abbreviated for brevity):
// Create a CMSampleBufferRef from the list of samples, which we'll own
AudioStreamBasicDescription monoStreamFormat;
memset(&monoStreamFormat, 0, sizeof(monoStreamFormat));
monoStreamFormat.mSampleRate = 48000;
monoStreamFormat.mFormatID = kAudioFormatLinearPCM;
monoStreamFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved;
monoStreamFormat.mBytesPerPacket = 2;
monoStreamFormat.mFramesPerPacket = 1;
monoStreamFormat.mBytesPerFrame = 2;
monoStreamFormat.mChannelsPerFrame = 1;
monoStreamFormat.mBitsPerChannel = 16;
CMFormatDescriptionRef format = NULL;
OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &monoStreamFormat, 0, NULL, 0, NULL, NULL, &format);
// Convert the AudioTimestamp to a CMTime and create a CMTimingInfo for this set of samples
uint64_t timeNS = (uint64_t)(hostTime * _hostTimeToNSFactor);
CMTime presentationTime = CMTimeMake(timeNS, 1000000000);
CMSampleTimingInfo timing = { CMTimeMake(1, 48000), presentationTime, kCMTimeInvalid };
CMSampleBufferRef sampleBuffer = NULL;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &sampleBuffer);
// add the samples to the buffer
status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer,
                                                        kCFAllocatorDefault,
                                                        kCFAllocatorDefault,
                                                        0,
                                                        samples);
// Pass the buffer into the encoder...
Please note that I've removed error handling and cleanup of the allocated objects.
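For completeness, the samples, numSamples, and hostTime values used above come from the I/O unit's input-side callback. The actual capture code isn't shown here, but the wiring looks roughly like the sketch below; the Recorder class, its audioUnit property, and the encodeSamples:count:hostTime: method are placeholders, and in real code the buffer should be preallocated rather than malloc'd inside the callback.
static OSStatus InputCallback(void *inRefCon,
                              AudioUnitRenderActionFlags *ioActionFlags,
                              const AudioTimeStamp *inTimeStamp,
                              UInt32 inBusNumber,
                              UInt32 inNumberFrames,
                              AudioBufferList *ioData)
{
    Recorder *recorder = (__bridge Recorder *)inRefCon;
    // One mono 16-bit buffer big enough for this render cycle (preallocate in production)
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize = inNumberFrames * sizeof(SInt16);
    bufferList.mBuffers[0].mData = malloc(bufferList.mBuffers[0].mDataByteSize);
    // Pull the echo-cancelled microphone samples out of the I/O unit (bus 1)
    OSStatus status = AudioUnitRender(recorder.audioUnit, ioActionFlags, inTimeStamp,
                                      inBusNumber, inNumberFrames, &bufferList);
    if (status == noErr) {
        // Feeds the conversion shown above: samples = &bufferList,
        // numSamples = inNumberFrames, hostTime = inTimeStamp->mHostTime
        [recorder encodeSamples:&bufferList count:inNumberFrames hostTime:inTimeStamp->mHostTime];
    }
    free(bufferList.mBuffers[0].mData);
    return status;
}
// Registered on the VoiceProcessingIO unit before AudioUnitInitialize:
AURenderCallbackStruct inputCallback = { InputCallback, (__bridge void *)self };
AudioUnitSetProperty(_audioUnit, kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global, 1, &inputCallback, sizeof(inputCallback));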

Related

AudioUnit is not playing back audio on tvOS

I'm working on an application for tvOS platform for playing back audio using WebRTC (https://webrtc.org/). WebRTC uses AudioUnit for audio playout (https://chromium.googlesource.com/external/webrtc/+/7a82467d0db0d61f466a1da54b94f6a136726a3c/sdk/objc/native/src/audio/voice_processing_audio_unit.mm). It works perfectly on iOS, but produces errors on tvOS.
First of all I've disabled audio capturing at all. The first error happens when creating a Voice Processing IO audio unit:
// Create an audio component description to identify the Voice Processing
// I/O audio unit.
AudioComponentDescription vpio_unit_description;
vpio_unit_description.componentType = kAudioUnitType_Output;
vpio_unit_description.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
vpio_unit_description.componentManufacturer = kAudioUnitManufacturer_Apple;
vpio_unit_description.componentFlags = 0;
vpio_unit_description.componentFlagsMask = 0;
// Obtain an audio unit instance given the description.
AudioComponent found_vpio_unit_ref =
    AudioComponentFindNext(nullptr, &vpio_unit_description);
// Create a Voice Processing IO audio unit.
OSStatus result = noErr;
result = AudioComponentInstanceNew(found_vpio_unit_ref, &vpio_unit_);
if (result != noErr) {
    vpio_unit_ = nullptr;
    RTCLogError(@"AudioComponentInstanceNew failed. Error=%ld.", (long)result);
    return false;
}
AudioComponentInstanceNew returns -3000 OSStatus (I assume it means an invalid component ID). This issue can be fixed by replacing kAudioUnitSubType_VoiceProcessingIO → kAudioUnitSubType_GenericOutput (I'm not sure this is a correct replacement, but the error is gone).
After that WebRTC is trying to enable output
// Enable output on the output scope of the output element.
UInt32 enable_output = 1;
result = AudioUnitSetProperty(vpio_unit_, kAudioOutputUnitProperty_EnableIO,
                              kAudioUnitScope_Output, kOutputBus,
                              &enable_output, sizeof(enable_output));
if (result != noErr) {
    DisposeAudioUnit();
    RTCLogError(@"Failed to enable output on output scope of output element. "
                "Error=%ld.",
                (long)result);
    return false;
}
and this doesn't work either: it returns OSStatus -10879 (which I assume means an invalid property). I think the problem is with the kAudioOutputUnitProperty_EnableIO property, but I have no idea what should be used instead.
Any ideas or advice are very much appreciated. Thanks in advance.

Muxing AAC audio and h.264 video streams to mp4 with AVFoundation

For OSX and IOS, I have streams of real time encoded video (h.264) and audio (AAC) data coming in, and I want to be able to mux these together into an mp4.
I'm using an AVAssetWriter to perform the muxing.
I have video working, but my audio still sounds like jumbled static. Here's what I'm trying right now (skipping some of the error checks here for brevity):
I initialize the writer:
NSURL *url = [NSURL fileURLWithPath:mContext->filename];
NSError* err = nil;
mContext->writer = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeMPEG4 error:&err];
I initialize the audio input:
NSDictionary* settings;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
settings = nil; // set output to nil so it becomes a pass-through
CMAudioFormatDescriptionRef audioFormatDesc = nil;
{
    AudioStreamBasicDescription absd = {0};
    absd.mSampleRate = mParameters.audioSampleRate; // known sample rate
    absd.mFormatID = kAudioFormatMPEG4AAC;
    absd.mFormatFlags = kMPEG4Object_AAC_Main;
    CMAudioFormatDescriptionCreate(NULL, &absd, 0, NULL, 0, NULL, NULL, &audioFormatDesc);
}
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:settings sourceFormatHint:audioFormatDesc];
mContext->aacWriterInput.expectsMediaDataInRealTime = YES;
[mContext->writer addInput:mContext->aacWriterInput];
And start the writer:
[mContext->writer startWriting];
[mContext->writer startSessionAtSourceTime:kCMTimeZero];
Then, I have a callback where I receive a packet with a timestamp (milliseconds), and a std::vector<uint8_t> with the data containing 1024 compressed samples. I make sure isReadyForMoreMediaData is true. Then, if this is our first time receiving the callback, I set up the CMAudioFormatDescription:
OSStatus error = 0;
AudioStreamBasicDescription streamDesc = {0};
streamDesc.mSampleRate = mParameters.audioSampleRate;
streamDesc.mFormatID = kAudioFormatMPEG4AAC;
streamDesc.mFormatFlags = kMPEG4Object_AAC_Main;
streamDesc.mChannelsPerFrame = 2; // always stereo for us
streamDesc.mBitsPerChannel = 0;
streamDesc.mBytesPerFrame = 0;
streamDesc.mFramesPerPacket = 1024; // Our AAC packets contain 1024 samples per frame
streamDesc.mBytesPerPacket = 0;
streamDesc.mReserved = 0;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &streamDesc, sizeof(acl), &acl, 0, NULL, NULL, &mContext->audioFormat);
And finally, I create a CMSampleBufferRef and send it along:
CMSampleBufferRef buffer = NULL;
CMBlockBufferRef blockBuffer;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, NULL, packet.data.size(), kCFAllocatorDefault, NULL, 0, packet.data.size(), kCMBlockBufferAssureMemoryNowFlag, &blockBuffer);
CMBlockBufferReplaceDataBytes((void*)packet.data.data(), blockBuffer, 0, packet.data.size());
CMTime duration = CMTimeMake(1024, mParameters.audioSampleRate);
CMTime pts = CMTimeMake(packet.timestamp, 1000);
CMSampleTimingInfo timing = {duration , pts, kCMTimeInvalid };
size_t sampleSizeArray[1] = {packet.data.size()};
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr, mContext->audioFormat, 1, 1, &timing, 1, sampleSizeArray, &buffer);
// First input buffer must have an appropriate kCMSampleBufferAttachmentKey_TrimDurationAtStart since the codec has encoder delay
if (mContext->firstAudioFrame)
{
    CFDictionaryRef dict = NULL;
    dict = CMTimeCopyAsDictionary(CMTimeMake(1024, 44100), kCFAllocatorDefault);
    CMSetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, dict, kCMAttachmentMode_ShouldNotPropagate);
    // we must trim the start time on first audio frame...
    mContext->firstAudioFrame = false;
}
CMSampleBufferMakeDataReady(buffer);
BOOL ret = [mContext->aacWriterInput appendSampleBuffer:buffer];
I guess the part I'm most suspicious of is my call to CMSampleBufferCreate. It seems I have to pass in a sample sizes array, otherwise I get this error message immediately when checking my writer's status:
Error Domain=AVFoundationErrorDomain Code=-11800 "The operation could not be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12735), NSLocalizedDescription=The operation could not be completed, NSUnderlyingError=0x604001e50770 {Error Domain=NSOSStatusErrorDomain Code=-12735 "(null)"}}
Where the underlying error appears to be kCMSampleBufferError_BufferHasNoSampleSizes.
I did notice an example in Apple's documentation for creating the buffer with AAC data:
https://developer.apple.com/documentation/coremedia/1489723-cmsamplebuffercreate?language=objc
In their example, they specify a long sampleSizeArray with an entry for every single sample. Is that necessary? I don't have that information with this callback. And in our Windows implementation we didn't need that data. So I tried sending in packet.data.size() as the sample size but that doesn't seem right and it certainly doesn't produce pleasant audio.
Any ideas? Either tweaks to my calls here or different APIs I should be using to mux together streams of encoded data.
Thanks!
If you don't want to transcode, do not pass the outputSettings dictionary. You should pass nil there:
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:nil sourceFormatHint:audioFormatDesc];
It is explained somewhere in this article:
https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/05_Export.html

Core audio: file playback render callback function

I am using RemoteIO Audio Unit for audio playback in my app with kAudioUnitProperty_ScheduledFileIDs.
Audio files are in PCM format. How can I implement a render callback function for this case, so I could manually modify buffer samples?
Here is my code:
static AudioComponentInstance audioUnit;
AudioComponentDescription desc;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_RemoteIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
AudioComponent comp = AudioComponentFindNext(NULL, &desc);
CheckError(AudioComponentInstanceNew(comp, &audioUnit), "error AudioComponentInstanceNew");
NSURL *playerFile = [[NSBundle mainBundle] URLForResource:@"short" withExtension:@"wav"];
AudioFileID audioFileID;
CheckError(AudioFileOpenURL((__bridge CFURLRef)playerFile, kAudioFileReadPermission, 0, &audioFileID), "error AudioFileOpenURL");
// Determine file properties
UInt64 packetCount;
UInt32 size = sizeof(packetCount);
CheckError(AudioFileGetProperty(audioFileID, kAudioFilePropertyAudioDataPacketCount, &size, &packetCount),
"AudioFileGetProperty(kAudioFilePropertyAudioDataPacketCount)");
AudioStreamBasicDescription dataFormat;
size = sizeof(dataFormat);
CheckError(AudioFileGetProperty(audioFileID, kAudioFilePropertyDataFormat, &size, &dataFormat),
"AudioFileGetProperty(kAudioFilePropertyDataFormat)");
// Assign the region to play
ScheduledAudioFileRegion region;
memset (&region.mTimeStamp, 0, sizeof(region.mTimeStamp));
region.mTimeStamp.mFlags = kAudioTimeStampSampleTimeValid;
region.mTimeStamp.mSampleTime = 0;
region.mCompletionProc = NULL;
region.mCompletionProcUserData = NULL;
region.mAudioFile = audioFileID;
region.mLoopCount = 0;
region.mStartFrame = 0;
region.mFramesToPlay = (UInt32)packetCount * dataFormat.mFramesPerPacket;
CheckError(AudioUnitSetProperty(audioUnit, kAudioUnitProperty_ScheduledFileRegion, kAudioUnitScope_Global, 0, &region, sizeof(region)),
"AudioUnitSetProperty(kAudioUnitProperty_ScheduledFileRegion)");
// Prime the player by reading some frames from disk
UInt32 defaultNumberOfFrames = 0;
CheckError(AudioUnitSetProperty(audioUnit, kAudioUnitProperty_ScheduledFilePrime, kAudioUnitScope_Global, 0, &defaultNumberOfFrames, sizeof(defaultNumberOfFrames)),
"AudioUnitSetProperty(kAudioUnitProperty_ScheduledFilePrime)");
AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = MyCallback;
callbackStruct.inputProcRefCon = (__bridge void * _Nullable)(self);
CheckError(AudioUnitSetProperty(audioUnit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &callbackStruct, sizeof(callbackStruct)), "error AudioUnitSetProperty[kAudioUnitProperty_setRenderCallback]");
CheckError(AudioUnitInitialize(audioUnit), "error AudioUnitInitialize");
Callback function:
static OSStatus MyCallback(void *inRefCon,
                           AudioUnitRenderActionFlags *ioFlags,
                           const AudioTimeStamp *inTimeStamp,
                           UInt32 inBusNumber,
                           UInt32 inNumberFrames,
                           AudioBufferList *ioData) {
    printf("my callback");
    return noErr;
}
Audio Unit start playback on button press:
- (IBAction)playSound:(id)sender {
    CheckError(AudioOutputUnitStart(audioUnit), "error AudioOutputUnitStart");
}
This code fails with a kAudioUnitErr_InvalidProperty (-10879) error. The goal is to modify the buffer samples that have been read from the AudioFileID and send the result to the speakers.
Seeing as how you are just getting familiar with core audio, I suggest you first get your remoteIO callback working independently of your file player. Just remove all of your file player related code and try to get that working first.
Then, once you have that working, move on to incorporating your file player.
As far as I can see, what's wrong is that you are confusing the Audio File Services API with an audio unit. That API is used to read a file into a buffer which you would manually feed to remoteIO; if you do want to go that route, use the Extended Audio File Services API instead, it's a LOT easier. The kAudioUnitProperty_ScheduledFileRegion property is supposed to be set on a file player audio unit. To get one of those, you would need to create it the same way as your remoteIO, except that the AudioComponentDescription's componentSubType and componentType are kAudioUnitSubType_AudioFilePlayer and kAudioUnitType_Generator respectively. Then, once you have that unit, you would need to connect it to the remoteIO using the kAudioUnitProperty_MakeConnection property.
But seriously, start with just getting your remoteIO callback working, then try making a file player audio unit and connecting it (without the callback), then go from there.
Ask very specific questions about each of these steps independently, posting code you have tried that's not working, and you'll get a ton of help.
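If you do end up going the file player route, the create-and-connect step described above might look roughly like this (a sketch reusing the audioUnit variable and CheckError from your code, untested):
// Create the file player unit (a generator, not an output unit)
AudioComponentDescription playerDesc;
playerDesc.componentType = kAudioUnitType_Generator;
playerDesc.componentSubType = kAudioUnitSubType_AudioFilePlayer;
playerDesc.componentManufacturer = kAudioUnitManufacturer_Apple;
playerDesc.componentFlags = 0;
playerDesc.componentFlagsMask = 0;
AudioComponent playerComp = AudioComponentFindNext(NULL, &playerDesc);
AudioComponentInstance filePlayerUnit;
CheckError(AudioComponentInstanceNew(playerComp, &filePlayerUnit), "error creating file player unit");
// Connect the file player's output (element 0) to the remoteIO's input (element 0).
// This takes the place of the render callback you set on that bus.
AudioUnitConnection connection;
connection.sourceAudioUnit = filePlayerUnit;
connection.sourceOutputNumber = 0;
connection.destInputNumber = 0;
CheckError(AudioUnitSetProperty(audioUnit, kAudioUnitProperty_MakeConnection,
                                kAudioUnitScope_Input, 0, &connection, sizeof(connection)),
           "error AudioUnitSetProperty(kAudioUnitProperty_MakeConnection)");
// The kAudioUnitProperty_ScheduledFileIDs / ScheduledFileRegion / ScheduledFilePrime
// properties from your question would then be set on filePlayerUnit, not on the remoteIO.
CheckError(AudioUnitInitialize(filePlayerUnit), "error AudioUnitInitialize(filePlayer)");
CheckError(AudioUnitInitialize(audioUnit), "error AudioUnitInitialize(remoteIO)");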

Core Audio - Interapp Audio - How to Retrieve output audio packets from Node app inside Host App?

I am writing a HOST app that uses Core Audio's new iOS 7 Inter-App Audio technology to pull audio from a single NODE "generator" app and route it into my host app. I am using the Audio Component Services and Audio Unit Component Services C frameworks to achieve this.
What I want to achieve is to establish a connection to an external node app that can generate sound. I want that sound to be routed into my host app, and for my host app to be able to directly access the audio packet data as a stream of raw audio data.
I have written the code inside my HOST app that does the following sequentially:
Sets up and activates an audio session with the correct session category.
Refreshes a list of inter-app audio compatible apps that are of type kAudioUnitType_RemoteGenerator or kAudioUnitType_RemoteInstrument (I'm not interested in effects apps).
Pulls out the last object from that list and attempts to establish a connection using AudioComponentInstanceNew()
Sets the AudioStreamBasicDescription for the audio format my host app needs.
Sets up audio unit properties and callbacks as well as an audio unit render callback on the output scope (bus).
Initializes the audio unit.
So far so good, I have been able to successfully establish a connection, but my problem is that my render callback is not being called at all. What I am having trouble understanding is how exactly to pull the audio from the node application. I have read that I need to call AudioUnitRender() in order to initiate a rendering cycle on the node app, but how exactly does this need to be set up in my situation? I have seen other examples where AudioUnitRender() is called from inside the rendering callback, but this isn't going to work for me because my render callback isn't being called currently. Do I need to set up my own audio processing thread and periodically call AudioUnitRender() on my 'node'?
The following is the code described above inside my HOST app.
static OSStatus MyAURenderCallback (void *inRefCon,
                                    AudioUnitRenderActionFlags *ioActionFlags,
                                    const AudioTimeStamp *inTimeStamp,
                                    UInt32 inBusNumber,
                                    UInt32 inNumberFrames,
                                    AudioBufferList *ioData)
{
    // Do something here with the audio data?
    // This method is never being called?
    // Do I need to put AudioUnitRender() in here?
    return noErr;
}
- (void)start
{
    [self configureAudioSession];
    [self refreshAUList];
}

- (void)configureAudioSession
{
    NSError *audioSessionError = nil;
    AVAudioSession *mySession = [AVAudioSession sharedInstance];
    [mySession setPreferredSampleRate: _graphSampleRate error: &audioSessionError];
    [mySession setCategory: AVAudioSessionCategoryPlayAndRecord error: &audioSessionError];
    [mySession setActive: YES error: &audioSessionError];
    self.graphSampleRate = [mySession sampleRate];
}
- (void)refreshAUList
{
    _audioUnits = @[].mutableCopy;
    AudioComponentDescription searchDesc = { 0, 0, 0, 0, 0 }, foundDesc;
    AudioComponent comp = NULL;
    while (true) {
        comp = AudioComponentFindNext(comp, &searchDesc);
        if (comp == NULL) break;
        if (AudioComponentGetDescription(comp, &foundDesc) != noErr) continue;
        if (foundDesc.componentType == kAudioUnitType_RemoteGenerator || foundDesc.componentType == kAudioUnitType_RemoteInstrument) {
            RemoteAU *rau = [[RemoteAU alloc] init];
            rau->_desc = foundDesc;
            rau->_comp = comp;
            AudioComponentCopyName(comp, &rau->_name);
            rau->_image = AudioComponentGetIcon(comp, 48);
            rau->_lastActiveTime = AudioComponentGetLastActiveTime(comp);
            [_audioUnits addObject:rau];
        }
    }
    [self connect];
}
- (void)connect {
    if ([_audioUnits count] <= 0) {
        return;
    }
    RemoteAU *rau = [_audioUnits lastObject];
    AudioUnit myAudioUnit;
    // Node application will get launched in background
    Check(AudioComponentInstanceNew(rau->_comp, &myAudioUnit));
    AudioStreamBasicDescription format = {0};
    format.mChannelsPerFrame = 2;
    format.mSampleRate = [[AVAudioSession sharedInstance] sampleRate];
    format.mFormatID = kAudioFormatMPEG4AAC;
    UInt32 propSize = sizeof(format);
    Check(AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &propSize, &format));
    // Output format from node to host
    Check(AudioUnitSetProperty(myAudioUnit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 0, &format, sizeof(format)));
    // Set up a render callback on the output scope of the audio unit representing the node app
    AURenderCallbackStruct callbackStruct = {0};
    callbackStruct.inputProc = MyAURenderCallback;
    callbackStruct.inputProcRefCon = (__bridge void *)(self);
    Check(AudioUnitSetProperty(myAudioUnit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Output, 0, &callbackStruct, sizeof(callbackStruct)));
    // Set up property listener callbacks
    Check(AudioUnitAddPropertyListener(myAudioUnit, kAudioUnitProperty_IsInterAppConnected, IsInterappConnected, NULL));
    Check(AudioUnitAddPropertyListener(myAudioUnit, kAudioOutputUnitProperty_HostTransportState, AudioUnitPropertyChangeDispatcher, NULL));
    // Initialize the audio unit representing the node application
    Check(AudioUnitInitialize(myAudioUnit));
}

Can I use AVCaptureSession to encode an AAC stream to memory?

I'm writing an iOS app that streams video and audio over the network.
I am using AVCaptureSession to grab raw video frames using AVCaptureVideoDataOutput and encode them in software using x264. This works great.
I wanted to do the same for audio, only I don't need that much control on the audio side, so I wanted to use the built-in hardware encoder to produce an AAC stream. This meant using Audio Converter from the Audio Toolbox layer. In order to do so I put in a handler for AVCaptureAudioDataOutput's audio frames:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // get the audio samples into a common buffer _pcmBuffer
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &_pcmBufferSize, &_pcmBuffer);
    // use AudioConverter to encode the PCM samples into AAC
    UInt32 outputPacketsCount = 1;
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize = sizeof(_aacBuffer);
    bufferList.mBuffers[0].mData = _aacBuffer;
    OSStatus st = AudioConverterFillComplexBuffer(_converter, converter_callback, (__bridge void *) self, &outputPacketsCount, &bufferList, NULL);
    if (0 == st) {
        // ... send bufferList.mBuffers[0].mDataByteSize bytes from _aacBuffer...
    }
}
In this case the callback function for the audio converter is pretty simple (assuming packet sizes and counts are setup properly):
- (void) putPcmSamplesInBufferList:(AudioBufferList *)bufferList withCount:(UInt32 *)count
{
    bufferList->mBuffers[0].mData = _pcmBuffer;
    bufferList->mBuffers[0].mDataByteSize = _pcmBufferSize;
}
And the setup for the audio converter looks like this:
{
    // ...
    AudioStreamBasicDescription pcmASBD = {0};
    pcmASBD.mSampleRate = ((AVAudioSession *) [AVAudioSession sharedInstance]).currentHardwareSampleRate;
    pcmASBD.mFormatID = kAudioFormatLinearPCM;
    pcmASBD.mFormatFlags = kAudioFormatFlagsCanonical;
    pcmASBD.mChannelsPerFrame = 1;
    pcmASBD.mBytesPerFrame = sizeof(AudioSampleType);
    pcmASBD.mFramesPerPacket = 1;
    pcmASBD.mBytesPerPacket = pcmASBD.mBytesPerFrame * pcmASBD.mFramesPerPacket;
    pcmASBD.mBitsPerChannel = 8 * pcmASBD.mBytesPerFrame;

    AudioStreamBasicDescription aacASBD = {0};
    aacASBD.mFormatID = kAudioFormatMPEG4AAC;
    aacASBD.mSampleRate = pcmASBD.mSampleRate;
    aacASBD.mChannelsPerFrame = pcmASBD.mChannelsPerFrame;
    size = sizeof(aacASBD);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &aacASBD);

    AudioConverterNew(&pcmASBD, &aacASBD, &_converter);
    // ...
}
This seems pretty straightforward, only IT DOES NOT WORK. Once the AVCaptureSession is running, the audio converter (specifically AudioConverterFillComplexBuffer) returns an 'hwiu' (hardware in use) error. Conversion works fine if the session is stopped, but then I can't capture anything...
I was wondering if there was a way to get an AAC stream out of AVCaptureSession. The options I'm considering are:
Somehow using AVAssetWriterInput to encode audio samples into AAC and then getting the encoded packets out (not through AVAssetWriter, which would only write to a file).
Reorganizing my app so that it uses AVCaptureSession only on the video side and uses Audio Queues on the audio side. This will make flow control (starting and stopping recording, responding to interruptions) more complicated and I'm afraid that it might cause synching problems between the audio and video. Also, it just doesn't seem like a good design.
Does anyone know if getting the AAC out of AVCaptureSession is possible? Do I have to use Audio Queues here? Could this get me into synching or control problems?
I ended up asking Apple for advice (it turns out you can do that if you have a paid developer account).
It seems that AVCaptureSession grabs a hold of the AAC hardware encoder but only lets you use it to write directly to file.
You can use the software encoder but you have to ask for it specifically instead of using AudioConverterNew:
AudioClassDescription *description = [self
    getAudioClassDescriptionWithType:kAudioFormatMPEG4AAC
                    fromManufacturer:kAppleSoftwareAudioCodecManufacturer];
if (!description) {
    return false;
}
// see the question for setting up pcmASBD and aacASBD
OSStatus st = AudioConverterNewSpecific(&pcmASBD, &aacASBD, 1, description, &_converter);
if (st) {
    NSLog(@"error creating audio converter: %s", OSSTATUS(st));
    return false;
}
with
- (AudioClassDescription *)getAudioClassDescriptionWithType:(UInt32)type
                                            fromManufacturer:(UInt32)manufacturer
{
    static AudioClassDescription desc;
    UInt32 encoderSpecifier = type;
    OSStatus st;
    UInt32 size;
    st = AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders,
                                    sizeof(encoderSpecifier),
                                    &encoderSpecifier,
                                    &size);
    if (st) {
        NSLog(@"error getting audio format property info: %s", OSSTATUS(st));
        return nil;
    }
    unsigned int count = size / sizeof(AudioClassDescription);
    AudioClassDescription descriptions[count];
    st = AudioFormatGetProperty(kAudioFormatProperty_Encoders,
                                sizeof(encoderSpecifier),
                                &encoderSpecifier,
                                &size,
                                descriptions);
    if (st) {
        NSLog(@"error getting audio format property: %s", OSSTATUS(st));
        return nil;
    }
    for (unsigned int i = 0; i < count; i++) {
        if ((type == descriptions[i].mSubType) &&
            (manufacturer == descriptions[i].mManufacturer)) {
            memcpy(&desc, &(descriptions[i]), sizeof(desc));
            return &desc;
        }
    }
    return nil;
}
The software encoder will take up CPU resources, of course, but will get the job done.
