AVAudioRecorder in Swift 3: Get Byte stream instead of saving to file - ios

I am new to iOS programming and I want to port an Android app to iOS using Swift 3. The core functionality of the app is to read the byte stream from the microphone and to process this stream live. So it is not sufficient to store the audio stream to a file and process it after recording has stopped.
I already found the AVAudioRecorder class which works, but I don't know how to process the data stream live (filtering, sending it to a server, etc). The init-function of the AVAudioRecorder looks like that:
AVAudioRecorder(url: filename, settings: settings)
What I would need is a class where I can register an event handler or something like that which is called every time x bytes have been read so I can process it.
Is this possible with AVAudioRecorder? If not, is there another class in the Swift library that allows me to process audio streams live? In Android I use android.media.AudioRecord so It would be great if there's an equivalent class in Swift.
Regards

Use Audio Queue service in Core Audio framework.
https://developer.apple.com/library/content/documentation/MusicAudio/Conceptual/AudioQueueProgrammingGuide/AQRecord/RecordingAudio.html#//apple_ref/doc/uid/TP40005343-CH4-SW1
static const int kNumberBuffers = 3; // 1
struct AQRecorderState {
AudioStreamBasicDescription mDataFormat; // 2
AudioQueueRef mQueue; // 3
AudioQueueBufferRef mBuffers[kNumberBuffers]; // 4
AudioFileID mAudioFile; // 5
UInt32 bufferByteSize; // 6
SInt64 mCurrentPacket; // 7
bool mIsRunning; // 8
};
Here’s a description of the fields in this structure:
1 Sets the number of audio queue buffers to use.
2 An AudioStreamBasicDescription structure (from CoreAudioTypes.h)
representing the audio data format to write to disk. This format gets
used by the audio queue specified in the mQueue field. The mDataFormat
field gets filled initially by code in your program, as described in
Set Up an Audio Format for Recording. It is good practice to then
update the value of this field by querying the audio queue's
kAudioQueueProperty_StreamDescription property, as described in
Getting the Full Audio Format from an Audio Queue. On Mac OS X v10.5,
use the kAudioConverterCurrentInputStreamDescription property instead.
For details on the AudioStreamBasicDescription structure, see Core
Audio Data Types Reference.
3 The recording audio queue created by your application.
4 An array holding pointers to the audio queue buffers managed by the
audio queue.
5 An audio file object representing the file into which your program
records audio data.
6 The size, in bytes, for each audio queue buffer. This value is
calculated in these examples in the DeriveBufferSize function, after
the audio queue is created and before it is started. See Write a
Function to Derive Recording Audio Queue Buffer Size.
7 The packet index for the first packet to be written from the current
audio queue buffer.
8 A Boolean value indicating whether or not the audio queue is
running.

Related

AudioSessionAddPropertyListener deprecated for IOBufferDuration

I need to determine when my RemoteIO callback is changing the buffer size. Until iOS 7 we could add a session property listener using AudioSessionAddPropertyListener and then property kAudioSessionProperty_PreferredHardwareIOBufferDuration. This is now deprecated. Is there any replacement? AVAudioSession is meant to be KVO compliant, but not for the IOBufferDuration or preferredIOBufferDuration properties.
What is the replacement here?
The buffer duration is given to the RemoteIO callback in the form of the frameCount (proportional to the number of samples in the callback buffer) at a known sample rate. Any other notification would be asynchronous to this callback information, and thus possibly received at the wrong time compared to the actual change (which happens in the audio thread, not in the UI main run loop).
But your audio callback could change some visible state (global or in the parameter struct) which could be found by any other polling thread or consumer thread after the buffer duration update.

Adding audio buffer [from file] to 'live' audio buffer [recording to file]

What I'm trying to do:
Record up to a specified duration of audio/video, where the resulting output file will have a pre-defined background music from external audio-file added - without further encoding/exporting after recording.
As if you were recording video using the iPhones Camera-app, and all the recorded videos in 'Camera Roll' have background-songs. No exporting or loading after ending recording, and not in a separate AudioTrack.
How I'm trying to achieve this:
By using AVCaptureSession, in the delegate-method where the (CMSampleBufferRef)sample buffers are passed through, I'm pushing them to an AVAssetWriter to write to file. As I don't want multiple audio tracks in my output file, I can't pass the background-music through a separate AVAssetWriterInput, which means I have to add the background-music to each sample buffer from the recording while it's recording to avoid having to merge/export after recording.
The background-music is a specific, pre-defined audio file (format/codec: m4a aac), and will need no time-editing, just adding beneath the entire recording, from start to end. The recording will never be longer than the background-music-file.
Before starting the writing to file, I've also made ready an AVAssetReader, reading the specified audio-file.
Some pseudo-code(threading excluded):
-(void)startRecording
{
/*
Initialize writer and reader here: [...]
*/
backgroundAudioTrackOutput = [AVAssetReaderTrackOutput
assetReaderTrackOutputWithTrack:
backgroundAudioTrack
outputSettings:nil];
if([backgroundAudioReader canAddOutput:backgroundAudioTrackOutput])
[backgroundAudioReader addOutput:backgroundAudioTrackOutput];
else
NSLog(#"This doesn't happen");
[backgroundAudioReader startReading];
/* Some more code */
recording = YES;
}
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
{
if(!recording)
return;
if(videoConnection)
[self writeVideoSampleBuffer:sampleBuffer];
else if(audioConnection)
[self writeAudioSampleBuffer:sampleBuffer];
}
The AVCaptureSession is already streaming the camera-video and microphone-audio, and is just waiting for the BOOL recording to be set to YES. This isn't exactly how I'm doing this, but a short, somehow equivalent representation. When the delegate-method receives a CMSampleBufferRef of type Audio, I call my own method writeAudioSamplebuffer:sampleBuffer. If this was to be done normally, without a background-track as I'm trying to do, I'd simply put something like this: [assetWriterAudioInput appendSampleBuffer:sampleBuffer]; instead of calling my method. In my case though, I need to overlap two buffers before writing it:
-(void)writeAudioSamplebuffer:(CMSampleBufferRef)recordedSampleBuffer
{
CMSampleBufferRef backgroundSampleBuffer =
[backgroundAudioTrackOutput copyNextSampleBuffer];
/* DO MAGIC HERE */
CMSampleBufferRef resultSampleBuffer =
[self overlapBuffer:recordedSampleBuffer
withBackgroundBuffer:backgroundSampleBuffer];
/* END MAGIC HERE */
[assetWriterAudioInput appendSampleBuffer:resultSampleBuffer];
}
The problem:
I have to add incremental sample buffers from a local file to the live buffers coming in. The method I have created named overlapBuffer:withBackgroundBuffer: isn't doing much right now. I know how to extract AudioBufferList, AudioBuffer and mData etc. from a CMSampleBufferRef, but I'm not sure how to actually add them together - however - I haven't been able to test different ways to do that, because the real problem happens before that. Before the Magic should happen, I am in possession of two CMSampleBufferRefs, one received from microphone, one read from file, and this is the problem:
The sample buffer received from the background-music-file is different than the one I receive from the recording-session. It seems like the call to [self.backgroundAudioTrackOutput copyNextSampleBuffer]; receives a large number of samples. I realize that this might be obvious to some people, but I've never before been at this level of media-technology. I see now that it was wishful thinking to call copyNextSampleBuffer each time I receive a sampleBuffer from the session, but I don't know when/where to put it.
As far as I can tell, the recording-session gives one audio-sample in each sample-buffer, while the file-reader gives multiple samples in each sample-buffer. Can I somehow create a counter to count each received recorded sample/buffers, and then use the first file-sampleBuffer to extract each sample, until the current file-sampleBuffer has no more samples 'to give', and then call [..copyNext..], and do the same to that buffer?
As I'm in full control of both the recording and the file's codecs, formats etc, I am hoping that such a solution wouldn't ruin the 'alignment'/synchronization of the audio. Given that both samples have the same sampleRate, could this still be a problem?
Note
I'm not even sure if this is possible, but I see no immediate reason why it shouldn't.
Also worth mentioning that when I try to use a Video-file instead of an Audio-file, and try to continually pull video-sampleBuffers, they align up perfectly.
I am not familiarized with AVCaptureOutput, since all my sound/music sessions were built using AudioToolbox instead of AVFoundation. However, I guess you should be able to set the size of the recording capturing buffer. If not, and you are still get just one sample, I would recommend you to store each individual data obtained from the capture output in an auxiliar buffer. When the auxiliar buffer reaches the same size as the file-reading buffer, then call [self overlapBuffer:auxiliarSampleBuffer withBackgroundBuffer:backgroundSampleBuffer];
I hope this would help you. If not, I can provide example about how to do this using CoreAudio. Using CoreAudio I have been able to obtain 1024 LCPM samples buffer from both microphone capturing and file reading. So the overlapping is immediate.

What causes ExtAudioFileRead to make ioData->mBuffers[0].mDataByteSize negative?

The problem occurs when I often stop and start audio playback and seek a lot back and forth in an AAC audio file through an ExtAudioFileRef object. In few cases, this strange behaviour is shown by ExtAudioFileRead:
Sometimes it assigns these numbers to the mDataByteSize of the only AudioBuffer of the AudioBufferList:
-51604480
-51227648
-51350528
-51440640
-51240960
In hex, these numbers have the pattern 0xFC....00.
The code:
status = ExtAudioFileRead(_file, &numberFramesRead, ioData);
printf("s=%li d=%p d.nb=%li, d.b.d=%p, d.b.dbs=%li, d.b.nc=%li\n", status, ioData, ioData->mNumberBuffers, ioData->mBuffers[0].mData, ioData->mBuffers[0].mDataByteSize, ioData->mBuffers[0].mNumberChannels);
Output:
s=0 d=0x16668bd0 d.nb=1, d.b.d=0x30de000, d.b.dbs=1024, d.b.nc=2 // good (usual)
s=0 d=0x16668bd0 d.nb=1, d.b.d=0x30de000, d.b.dbs=-51240960, d.b.nc=2 // misbehaving
The problem occurs on an iPhone 4S on iOS 7. I could not reproduce the problem in the Simulator.
The problem occurs when concurrently calling ExtAudioFileRead() and ExtAudioFileSeek() for the same ExtAudioFileRef from two different threads/queues.
The read function was called directly from the AURenderCallback, so it was executed on AudioUnit's real-time thread while the seek was done on my own serial queue.
I've modified the code of the render callback to also dispatch_sync() to the same serial queue to which the seek gets dispatched. That solved the problem.

iOS: Playing PCM buffers from a stream

I'm receiving a series of UDP packets from a socket containing encoded PCM buffers. After decoding them, I'm left with an int16 * audio buffer, which I'd like to immediately play back.
The intended logic goes something like this:
init(){
initTrack(track, output, channels, sample_rate, ...);
}
onReceiveBufferFromSocket(NSData data){
//Decode the buffer
int16 * buf = handle_data(data);
//Play data
write_to_track(track, buf, length_of_buf, etc);
}
I'm not sure about everything that has to do with playing back the buffers though. On Android, I'm able to achieve this by creating an AudioTrack object, setting it up by specifying a sample rate, a format, channels, etc... and then just calling the "write" method with the buffer (like I wish I could in my pseudo-code above) but on iOS I'm coming up short.
I tried using the Audio File Stream Services, but I'm guessing I'm doing something wrong since no sound ever comes out and I feel like those functions by themselves don't actually do any playback. I also attempted to understand the Audio Queue Services (which I think might be close to what I want), however I was unable to find any simple code samples for its usage.
Any help would be greatly appreciated, specially in the form of example code.
You need to use some type of buffer to hold your incoming UDP data. This is an easy and good circular buffer that I have used.
Then to play back data from the buffer, you can use Audio Unit framework. Here is a good example project.
Note: The first link also shows you how to playback using Audio Unit.
You could use audioQueue services as well, make sure your doing some kind of packet re-ordering, if your using ffmpeg to decode the streams there is an option for this.
otherwise audio queues are easy to set up.
https://github.com/mooncatventures-group/iFrameExtractor/blob/master/Classes/AudioController.m
You could also use AudioUnits, a bit more complicated though.

Removing Silence from Audio Queue session recorded audio in ios

I'm using Audio Queue to record audio from the iphone's mic and stop recording when silence detected (no audio input for 10seconds) but I want to discard the silence from audio file.
In AudioInputCallback function I am using following code to detect silence :
AudioQueueLevelMeterState meters[1];
UInt32 dlen = sizeof(meters);
OSStatus Status AudioQueueGetProperty(inAQ,kAudioQueueProperty_CurrentLevelMeterDB,meters,&dlen);
if(meters[0].mPeakPower < _threshold)
{ // NSLog(#"Silence detected");}
But how to remove these packets? Or Is there any better option?
Instead of removing the packets from the AudioQueue, you can delay the write up by writing it to a buffer first. The buffer can be easily defined by having it inside the inUserData.
When you finish recording, if the last 10 seconds is not silent, you write it back to whatever file you are going to write. Otherwise just free the buffer.
after the file is recorded and closed, simply open and truncate the sample data you are not interested in (note: you can use AudioFile/ExtAudioFile APIs to properly update any dependent chunk/header sizes).

Resources