I have a continuous stream of raw PCM audio data coming from a TCP socket and I want to play it. I've done a lot of research and looked at many samples, but with no result. This gist was the closest solution I found, but the problem is that it streams an MP3 file.
So I have a socket that receives linear PCM audio data and hands it to the player like this:
func play(_ data: Data) {
    // This function is called for every 320 bytes of linear PCM data.
    // Play the 320 bytes of PCM data here!
}
So is there any "Simple" way to play raw PCM audio data?
For iOS, you can use the RemoteIO Audio Unit or AVAudioEngine with a circular buffer for real-time audio streaming.
You can't feed network data directly to the audio output. Instead, put it in a circular buffer from which the audio subsystem's play callback can consume it at its own fixed rate. You will need to pre-buffer some amount of audio samples to cover network jitter.
Simple "ways" of doing this might not handle network jitter gracefully.
Answering late, but if you are still stuck playing TCP bytes, try the approach below: put your TCP audio bytes into a circular buffer and play them via an AudioUnit.
The code below receives bytes from TCP and puts them into a TPCircularBuffer:
func tcpReceive() {
    receivingQueue.async {
        repeat {
            do {
                let datagram = try self.tcpClient?.receive()
                guard let byteData = datagram?["data"] as? Data,
                      let dataLength = datagram?["length"] as? Int else { continue }
                // Copy the received PCM bytes into the circular buffer
                // (assuming "length" is the byte count of the received PCM).
                byteData.withUnsafeBytes { rawBuffer in
                    _ = TPCircularBufferProduceBytes(&self.circularBuffer,
                                                     rawBuffer.baseAddress,
                                                     UInt32(dataLength))
                }
            } catch {
                fatalError(error.localizedDescription)
            }
        } while true
    }
}
Create Audio Unit
...
var desc = AudioComponentDescription(
componentType: OSType(kAudioUnitType_Output),
componentSubType: OSType(kAudioUnitSubType_VoiceProcessingIO),
componentManufacturer: OSType(kAudioUnitManufacturer_Apple),
componentFlags: 0,
componentFlagsMask: 0
)
let inputComponent = AudioComponentFindNext(nil, &desc)
status = AudioComponentInstanceNew(inputComponent!, &audioUnit)
if status != noErr {
print("Audio component instance new error \(status!)")
}
// Enable IO for playback
status = AudioUnitSetProperty(
audioUnit!,
kAudioOutputUnitProperty_EnableIO,
kAudioUnitScope_Output,
kOutputBus,
&flag,
SizeOf32(flag)
)
if status != noErr {
print("Enable IO for playback error \(status!)")
}
//Use your own format; mine is a 16000 Hz sample rate with 16-bit PCM
var ioFormat = CAStreamBasicDescription(
sampleRate: 16000.0,
numChannels: 1,
pcmf: .int16,
isInterleaved: false
)
//This is playbackCallback
var playbackCallback = AURenderCallbackStruct(
inputProc: AudioController_PlaybackCallback, // The render callback where the audio unit asks for bytes to play
inputProcRefCon: UnsafeMutableRawPointer(Unmanaged.passUnretained(self).toOpaque())
)
status = AudioUnitSetProperty(
audioUnit!,
AudioUnitPropertyID(kAudioUnitProperty_SetRenderCallback),
AudioUnitScope(kAudioUnitScope_Input),
kOutputBus,
&playbackCallback,
MemoryLayout<AURenderCallbackStruct>.size.ui
)
if status != noErr {
print("Failed to set recording render callback \(status!)")
}
//Init Audio Unit
status = AudioUnitInitialize(audioUnit!)
if status != noErr {
print("Failed to initialize audio unit \(status!)")
}
//Start AudioUnit
status = AudioOutputUnitStart(audioUnit!)
if status != noErr {
print("Failed to initialize output unit \(status!)")
}
This is my playback callback function, where I play the audio from the circular buffer:
func performPlayback(
_ ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
inTimeStamp: UnsafePointer<AudioTimeStamp>,
inBufNumber: UInt32,
inNumberFrames: UInt32,
ioData: UnsafeMutablePointer<AudioBufferList>
) -> OSStatus {
let buffer = ioData[0].mBuffers
let bytesToCopy = ioData[0].mBuffers.mDataByteSize
var bufferTail: UnsafeMutableRawPointer?
var availableBytes: UInt32 = 0
bufferTail = TPCircularBufferTail(&self.circularBuffer, &availableBytes)
let bytesToWrite = min(bytesToCopy, availableBytes)
var bufferList = AudioBufferList(
mNumberBuffers: 1,
mBuffers: ioData[0].mBuffers)
// Debug only: building a Swift array and printing here allocates memory on the
// real-time audio thread; remove this in production code.
var monoSamples = [Int16]()
let ptr = bufferList.mBuffers.mData?.assumingMemoryBound(to: Int16.self)
monoSamples.append(contentsOf: UnsafeBufferPointer(start: ptr, count: Int(inNumberFrames)))
print(monoSamples)
memcpy(buffer.mData, bufferTail, Int(bytesToWrite))
TPCircularBufferConsume(&self.circularBuffer, bytesToWrite)
return noErr
}
For TPCircularBuffer I used this pod
'TPCircularBuffer', '~> 1.6'
Detailed documentation and sample code are available for
AudioToolbox / AudioUnit
You can register a render callback to get the PCM data from the AUGraph and send the PCM buffer to the socket.
Another example of the usage:
https://github.com/rweichler/coreaudio-examples/blob/master/CH08_AUGraphInput/main.cpp
Related
I am recording audio on iOS from an AudioUnit, encoding the bytes with Opus, and sending them via UDP to the Android side. The problem is that the sound plays a bit clipped. I have also tested the sound by sending the raw data from iOS to Android, and it plays perfectly.
My AudioSession code is
try audioSession.setCategory(.playAndRecord, mode: .voiceChat, options: [.defaultToSpeaker])
try audioSession.setPreferredIOBufferDuration(0.02)
try audioSession.setActive(true)
My recording callBack code is:
func performRecording(
_ ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
inTimeStamp: UnsafePointer<AudioTimeStamp>,
inBufNumber: UInt32,
inNumberFrames: UInt32,
ioData: UnsafeMutablePointer<AudioBufferList>) -> OSStatus
{
var err: OSStatus = noErr
err = AudioUnitRender(audioUnit!, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData)
if let mData = ioData[0].mBuffers.mData {
let ptrData = mData.bindMemory(to: Int16.self, capacity: Int(inNumberFrames))
let bufferPtr = UnsafeBufferPointer(start: ptrData, count: Int(inNumberFrames))
count += 1
addedBuffer += Array(bufferPtr)
if count == 2 {
let _ = TPCircularBufferProduceBytes(&circularBuffer, addedBuffer, UInt32(addedBuffer.count * 2))
count = 0
addedBuffer = []
let buffer = TPCircularBufferTail(&circularBuffer, &availableBytes)
memcpy(&targetBuffer, buffer, Int(min(bytesToCopy, Int(availableBytes))))
TPCircularBufferConsume(&circularBuffer, UInt32(min(bytesToCopy, Int(availableBytes))))
self.audioRecordingDelegate(inTimeStamp.pointee.mSampleTime / Double(16000), targetBuffer)
}
}
return err;
}
Here I am getting inNumberFrames of roughly 341, and I am appending two arrays together to get a bigger frame size (640 is needed for Android), but I only encode 640 of them with the help of TPCircularBuffer.
func gotSomeAudio(timeStamp: Double, samples: [Int16]) {
    let encodedData = opusHelper?.encodeStream(of: samples)
    let myData = encodedData!.withUnsafeBufferPointer {
        Data(buffer: $0)
    }
    var protoModel = ProtoModel()
    seqNumber += 1
    protoModel.sequenceNumber = seqNumber
    protoModel.timeStamp = Date().currentTimeInMillis()
    protoModel.payload = myData
    DispatchQueue.global().async {
        do {
            try self.tcpClient?.send(data: protoModel)
        } catch {
            print(error.localizedDescription)
        }
    }
    let diff = CFAbsoluteTimeGetCurrent() - start
    print("Time diff is \(diff)")
}
In the above code I am Opus-encoding a 640-sample frame, adding it to the protobuf payload, and sending it via UDP.
On the Android side I parse the protobuf, decode the 640-sample frame, and play it with AudioTrack. There is no problem on the Android side: recording and playing entirely on Android sounds fine; the problem only appears when I record on iOS and play on Android.
Please don't suggest increasing the frame size by setting the preferred IO buffer duration. I want to do it without changing that.
https://stackoverflow.com/a/57873492/12020007 It was helpful.
https://stackoverflow.com/a/58947295/12020007
I have updated my code according to your suggestion and removed the delegate and the array concatenation, but there is still clipping on the Android side. I have also measured the time it takes to encode the bytes: approximately 2-3 ms.
The updated callback code is:
var err: OSStatus = noErr
// we are calling AudioUnitRender on the input bus of AURemoteIO
// this will store the audio data captured by the microphone in ioData
err = AudioUnitRender(audioUnit!, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData)
if let mData = ioData[0].mBuffers.mData {
_ = TPCircularBufferProduceBytes(&circularBuffer, mData, inNumberFrames * 2)
print("mDataByteSize: \(ioData[0].mBuffers.mDataByteSize)")
count += 1
if count == 2 {
count = 0
let buffer = TPCircularBufferTail(&circularBuffer, &availableBytes)
memcpy(&targetBuffer, buffer, min(bytesToCopy, Int(availableBytes)))
TPCircularBufferConsume(&circularBuffer, UInt32(min(bytesToCopy, Int(availableBytes))))
let encodedData = opusHelper?.encodeStream(of: targetBuffer)
let myData = encodedData!.withUnsafeBufferPointer {
Data(buffer: $0)
}
var protoModel = ProtoModel()
seqNumber += 1
protoModel.sequenceNumber = seqNumber
protoModel.timeStamp = Date().currentTimeInMillis()
protoModel.payload = myData
do {
try self.udpClient?.send(data: protoModel)
} catch {
print(error.localizedDescription)
}
}
}
return err;
Your code is doing Swift memory allocation (Array concatenation) and Swift method calls (your recording delegate) inside the audio callback. Apple (in a WWDC session on Audio) recommends not doing any memory allocation or method calls inside the real-time audio callback context (especially when requesting short Preferred IO Buffer Durations). Stick to C function calls, such as memcpy and TPCircularBuffer.
Added: Also, don't discard samples. If you get 680 samples but only need 640 for a packet, keep the 40 leftover samples and prepend them to a later packet; the circular buffer will hold them for you. Rinse and repeat. Send a packet whenever you've accumulated enough samples from the audio callback, or yet another packet when you end up accumulating 1280 (2 * 640) or more.
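A sketch of that accumulate-and-drain pattern, reusing names from the question's code (circularBuffer and opusHelper are carried over as assumptions; send(_:) is a hypothetical network call). The draining and encoding should happen off the real-time audio thread:
// 640 samples of 16-bit PCM per Opus packet.
let packetBytes = 640 * MemoryLayout<Int16>.size

func drainAndSend() {
    var availableBytes: UInt32 = 0
    // Consume the ring buffer in exact packet-sized slices; whatever is left
    // over simply stays in the buffer for the next round.
    while let tail = TPCircularBufferTail(&circularBuffer, &availableBytes),
          Int(availableBytes) >= packetBytes {
        var packet = [Int16](repeating: 0, count: 640)
        memcpy(&packet, tail, packetBytes)
        TPCircularBufferConsume(&circularBuffer, UInt32(packetBytes))
        if let encoded = opusHelper?.encodeStream(of: packet) {
            send(encoded)   // hypothetical: wrap in ProtoModel and send over UDP
        }
    }
}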
I am developing a VoIP calling app, and I am now at the stage where I need to transfer the voice data to the server. For that I want to get real-time audio data from the mic with 20-millisecond callbacks.
I have searched many links, but I am unable to find a solution, as I am new to the audio frameworks.
Details
We have our own stack, similar to WebRTC, which delivers RTP data from the remote side every 20 milliseconds and asks for 20 milliseconds of data from the mic. What I am trying to achieve is to get 20 milliseconds of data from the mic and pass it to the stack, so I need to know how to do that. The audio format is pcmFormatInt16 and the sample rate is 8000 Hz, with 20 milliseconds of data per callback.
I have searched for
AVAudioEngine,
AUAudioUnit,
AVCaptureSession, etc.
1. I am using AVAudioSession and AUAudioUnit, but setPreferredIOBufferDuration on the audio session does not take the exact value I set, so I am not getting the exact data size. Can anybody help me with setPreferredIOBufferDuration?
2. One more issue: auAudioUnit.outputProvider() gives inputData as an UnsafeMutableAudioBufferListPointer. The inputData list has two elements and I only want one. Can anybody help me convert it into a data format that can be played with AVAudioPlayer?
I have followed the link below:
https://gist.github.com/hotpaw2/ba815fc23b5d642705f2b1dedfaf0107
let hwSRate = audioSession.sampleRate
try audioSession.setActive(true)
print("native Hardware rate : \(hwSRate)")
try audioSession.setPreferredIOBufferDuration(preferredIOBufferDuration)
try audioSession.setPreferredSampleRate(8000) // at 8000.0 Hz
print("Changed native Hardware rate : \(audioSession.sampleRate) buffer duration \(audioSession.ioBufferDuration)")
auAudioUnit = try AUAudioUnit(componentDescription: self.audioComponentDescription)
auAudioUnit.outputProvider = { // AURenderPullInputBlock
(actionFlags, timestamp, frameCount, inputBusNumber, inputData) -> AUAudioUnitStatus in
if let block = self.renderBlock { // AURenderBlock?
let err : OSStatus = block(actionFlags,
timestamp,
frameCount,
1,
inputData,
.none)
if err == noErr {
// save samples from current input buffer to circular buffer
print("inputData = \(inputData) and frameCount: \(frameCount)")
self.recordMicrophoneInputSamples(
inputDataList: inputData,
frameCount: UInt32(frameCount) )
}
}
let err2 : AUAudioUnitStatus = noErr
return err2
}
Log:-
Changed native Hardware rate : 8000.0 buffer duration 0.01600000075995922
Try to get 40 ms of data from the audio interface and then split it up into 20 ms chunks.
Also check whether you are able to set the sampling frequency (8 kHz) of the audio interface.
The render block will give you callbacks according to whatever setup the hardware accepted for the AUAudioUnit and the AudioSession. We have to manage a buffer ourselves if we want a different input size from the mic. Output to the speaker should be the size it expects, e.g. 128, 256, or 512.
try audioSession.setPreferredSampleRate(sampleRateProvided) // at 48000.0
try audioSession.setPreferredIOBufferDuration(preferredIOBufferDuration)
These values can be different from our preferred size. That is why we have to use buffering logic to get our preferred input size.
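As a rough sketch of that buffering logic (my own illustration, not the gist's code): at 8 kHz, 16-bit mono, a 20 ms packet is 160 samples = 320 bytes. Accumulate whatever the render block delivers and hand out fixed 320-byte chunks, keeping the remainder for the next callback; deliver(_:) is a hypothetical hand-off to the RTP stack.
var accumulated = Data()
let packetBytes = 160 * MemoryLayout<Int16>.size   // 20 ms at 8 kHz, Int16 mono = 320 bytes

func handleMicBuffer(_ bufferList: UnsafeMutablePointer<AudioBufferList>) {
    let buf = UnsafeMutableAudioBufferListPointer(bufferList)[0]
    if let mData = buf.mData {
        accumulated.append(Data(bytes: mData, count: Int(buf.mDataByteSize)))
    }
    // Emit as many fixed 20 ms packets as we have; keep the remainder for next time.
    while accumulated.count >= packetBytes {
        deliver(accumulated.prefix(packetBytes))
        accumulated.removeFirst(packetBytes)
    }
}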
Link: https://gist.github.com/hotpaw2/ba815fc23b5d642705f2b1dedfaf0107
renderBlock = auAudioUnit.renderBlock
if ( enableRecording
&& micPermissionGranted
&& audioSetupComplete
&& audioSessionActive
&& isRecording == false ) {
auAudioUnit.inputHandler = { (actionFlags, timestamp, frameCount, inputBusNumber) in
if let block = self.renderBlock { // AURenderBlock?
var bufferList = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: audioFormat!.channelCount,
mDataByteSize: 0,
mData: nil))
let err : OSStatus = block(actionFlags,
timestamp,
frameCount,
inputBusNumber,
&bufferList,
.none)
if err == noErr {
// save samples from current input buffer to circular buffer
print("inputData = \(bufferList.mBuffers.mDataByteSize) and frameCount: \(frameCount) and count: \(count)")
count += 1
if !self.isMuteState {
self.recordMicrophoneInputSamples(
inputDataList: &bufferList,
frameCount: UInt32(frameCount) )
}
}
}
}
auAudioUnit.isInputEnabled = true
auAudioUnit.outputProvider = { ( // AURenderPullInputBlock?
actionFlags,
timestamp,
frameCount,
inputBusNumber,
inputDataList ) -> AUAudioUnitStatus in
if let block = self.renderBlock {
if let dataReceived = self.getInputDataForConsumption() {
let mutabledata = NSMutableData(data: dataReceived)
var bufferListSpeaker = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: 1,
mDataByteSize: 0,
mData: nil))
let err : OSStatus = block(actionFlags,
timestamp,
frameCount,
1,
&bufferListSpeaker,
.none)
if err == noErr {
bufferListSpeaker.mBuffers.mDataByteSize = UInt32(mutabledata.length)
bufferListSpeaker.mBuffers.mData = mutabledata.mutableBytes
inputDataList[0] = bufferListSpeaker
print("Output Provider mDataByteSize: \(inputDataList[0].mBuffers.mDataByteSize) output FrameCount: \(frameCount)")
return err
} else {
print("Output Provider \(err)")
return err
}
}
}
return 0
}
auAudioUnit.isOutputEnabled = true
do {
circInIdx = 0 // initialize circular buffer pointers
circOutIdx = 0
circoutSpkIdx = 0
circInSpkIdx = 0
try auAudioUnit.allocateRenderResources()
try auAudioUnit.startHardware() // equivalent to AudioOutputUnitStart ???
isRecording = true
} catch let e {
print(e)
}
I'm struggling to encode audio buffers received from AVCaptureSession using
AudioConverter and then append them to an AVAssetWriter.
I'm not getting any errors (including OSStatus responses), and the
CMSampleBuffers generated seem to have valid data, however the resulting file
simply does not have any playable audio. When writing together with video, the video
frames stop getting appended a couple of frames in (appendSampleBuffer()
returns false, but with no AVAssetWriter.error), probably because the asset
writer is waiting for the audio to catch up. I suspect it's related to the way
I'm setting up the priming for AAC.
The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to
understand for a wider audience.
Please check out comments in the code below for more... comments
Given a settings struct:
import Foundation
import AVFoundation
import CleanroomLogger
public struct AVSettings {
let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset = AVCaptureSessionPreset1280x720
let videoBitrate: Int = 2_000_000
let videoExpectedFrameRate: Int = 30
let videoMaxKeyFrameInterval: Int = 60
let audioBitrate: Int = 32 * 1024
/// Settings that are `0` mean variable rate.
/// The `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
/// to values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
mSampleRate: AVAudioSession.sharedInstance().sampleRate,
mFormatID: kAudioFormatMPEG4AAC,
mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
mBytesPerPacket: 0,
mFramesPerPacket: 1024,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0)
let audioEncoderClassDescriptions = [
AudioClassDescription(
mType: kAudioEncoderComponentType,
mSubType: kAudioFormatMPEG4AAC,
mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
}
Some helper functions:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
switch (settings.sessionPreset, settings.orientation) {
case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
default: fatalError("Unsupported session preset and orientation")
}
}
public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
var result = noErr
var absd = settings.audioOutputABSD
var description: CMAudioFormatDescription?
withUnsafePointer(&absd) { absdPtr in
result = CMAudioFormatDescriptionCreate(nil,
absdPtr,
0, nil,
0, nil,
nil,
&description)
}
if result != noErr {
Log.error?.message("Could not create audio format description")
}
return description!
}
public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
var result = noErr
var description: CMVideoFormatDescription?
let (width, height) = getVideoDimensions(fromSettings: settings)
result = CMVideoFormatDescriptionCreate(nil,
kCMVideoCodecType_H264,
Int32(width),
Int32(height),
[:],
&description)
if result != noErr {
Log.error?.message("Could not create video format description")
}
return description!
}
This is how the asset writer is initialized:
guard let audioDevice = defaultAudioDevice() else
{ throw RecordError.MissingDeviceFeature("Microphone") }
guard let videoDevice = defaultVideoDevice(.Back) else
{ throw RecordError.MissingDeviceFeature("Camera") }
let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)
let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
outputSettings: nil,
sourceFormatHint: videoFormatHint)
let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
outputSettings: nil,
sourceFormatHint: audioFormatHint)
writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true
let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
.URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
.URLByAppendingPathExtension("mp4")
let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
throw RecordError.Unknown("Could not add video input") }
if !assetWriter.canAddInput(writerAudioInput) {
throw RecordError.Unknown("Could not add audio input") }
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
And this is how audio samples are being encoded; the problem area is most likely around here. I've re-written this so that it doesn't use any Rx-isms.
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)
var converter: AudioConverter?
// Indicates whether priming information has been attached to the first buffer
var primed = false
func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
// Create the audio converter if it's not available
if converter == nil {
var classDescriptions = settings.audioEncoderClassDescriptions
var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory
var outputABSD = settings.audioOutputABSD
outputABSD.mSampleRate = inputABSD.mSampleRate
outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame
var converter: AudioConverterRef = nil
var result = noErr
result = withUnsafePointer(&outputABSD) { outputABSDPtr in
return withUnsafePointer(&inputABSD) { inputABSDPtr in
return AudioConverterNewSpecific(inputABSDPtr,
outputABSDPtr,
UInt32(classDescriptions.count),
&classDescriptions,
&converter)
}
}
if result != noErr { throw RecordError.Unknown }
// At this point I made an attempt to retrieve priming info from
// the audio converter assuming that it will give me back default values
// I can use, but ended up with `nil`
var primeInfo: AudioConverterPrimeInfo? = nil
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
// The following returns a `noErr` but `primeInfo` is still `nil``
AudioConverterGetProperty(converter,
kAudioConverterPrimeInfo,
&primeInfoSize,
&primeInfo)
// I've also tried to set `kAudioConverterPrimeInfo` so that it knows
// the leading frames that are being primed, but the set didn't seem to work
// (`noErr` but getting the property afterwards still returned `nil`)
}
let converter = converter!
// Need to give a big enough output buffer.
// The assumption is that it will always be <= to the input size
let numSamples = CMSampleBufferGetNumSamples(buffer)
// This becomes 1024 * 2 = 2048
let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)
defer {
outputBufferPtr.destroy()
outputBufferPtr.dealloc(1)
}
var result = noErr
var outputPacketCount = UInt32(1)
var outputData = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: outputABSD.mChannelsPerFrame,
mDataByteSize: UInt32(outputBufferSize),
mData: outputBufferPtr))
// See below for `EncodeAudioUserData`
var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
inputBytesPerPacket: inputABSD.mBytesPerPacket)
withUnsafeMutablePointer(&userData) { userDataPtr in
// See below for `fetchAudioProc`
result = AudioConverterFillComplexBuffer(
converter,
fetchAudioProc,
userDataPtr,
&outputPacketCount,
&outputData,
nil)
}
if result != noErr {
Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
return nil
}
// See below for `CMSampleBufferCreateCopy`
guard let newBuffer = CMSampleBufferCreateCopy(buffer,
fromAudioBufferList: &outputData,
newFormatDescription: outputFormatDescription) else {
Log.error?.message("Could not create sample buffer from audio buffer list")
return nil
}
if !primed {
primed = true
// Simply picked 2112 samples based on convention, is there a better way to determine this?
let samplesToPrime: Int64 = 2112
let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)
// Without setting the attachment the asset writer will complain about the
// first buffer missing the `TrimDurationAtStart` attachment, is there are way
// to infer the value from the given `AudioBufferList`?
CMSetAttachment(newBuffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, nil),
kCMAttachmentMode_ShouldNotPropagate)
}
return newBuffer
}
Below is the proc that fetches samples for the audio converter, and the data
structure that gets passed to it:
private class EncodeAudioUserData {
var inputSampleBuffer: CMSampleBuffer?
var inputBytesPerPacket: UInt32
init(inputSampleBuffer: CMSampleBuffer,
inputBytesPerPacket: UInt32) {
self.inputSampleBuffer = inputSampleBuffer
self.inputBytesPerPacket = inputBytesPerPacket
}
}
private let fetchAudioProc: AudioConverterComplexInputDataProc = {
(inAudioConverter,
ioDataPacketCount,
ioData,
outDataPacketDescriptionPtrPtr,
inUserData) in
var result = noErr
if ioDataPacketCount.memory == 0 { return noErr }
let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory
// If its already been processed
guard let buffer = userData.inputSampleBuffer else {
ioDataPacketCount.memory = 0
return -1
}
var inputBlockBuffer: CMBlockBuffer?
var inputBufferList = AudioBufferList()
result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
buffer,
nil,
&inputBufferList,
sizeof(AudioBufferList),
nil,
nil,
0,
&inputBlockBuffer)
if result != noErr {
Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
ioDataPacketCount.memory = 0
return result
}
let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
ioDataPacketCount.memory = packetsCount
ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData
if outDataPacketDescriptionPtrPtr != nil {
outDataPacketDescriptionPtrPtr.memory = nil
}
return noErr
}
This is how I am converting AudioBufferLists to CMSampleBuffers:
public func CMSampleBufferCreateCopy(
buffer: CMSampleBuffer,
inout fromAudioBufferList bufferList: AudioBufferList,
newFormatDescription formatDescription: CMFormatDescription? = nil)
-> CMSampleBuffer? {
var result = noErr
var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
// Copy timing info from the previous buffer
var timingInfo = CMSampleTimingInfo()
result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
if result != noErr { return nil }
var newBuffer: CMSampleBuffer?
result = CMSampleBufferCreateReady(
kCFAllocatorDefault,
nil,
formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
Int(bufferList.mNumberBuffers),
1, &timingInfo,
1, &sizeArray,
&newBuffer)
if result != noErr { return nil }
guard let b = newBuffer else { return nil }
CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
return newBuffer
}
Is there anything that I am obviously doing wrong? Is there a proper way to
construct CMSampleBuffers from AudioBufferList? How do you transfer priming
information from the converter to CMSampleBuffers that you create?
For my use case I need to do the encoding manually as the buffers will be
manipulated further down the pipeline (although I've disabled all
transformations after the encode in order to make sure that it works.)
Any help would be much appreciated. Sorry that there's so much code to
digest, but I wanted to provide as much context as possible.
Thanks in advance :)
Some related questions:
CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
Can I use AVCaptureSession to encode an AAC stream to memory?
Writing video + generated audio to AVAssetWriterInput, audio stuttering
How do I use CoreAudio's AudioConverter to encode AAC in real-time?
Some references I've used:
Apple sample code demonstrating how to use AudioConverter
Note describing AAC encoder delay
Turns out there were a variety of things that I was doing wrong. Instead of posting a garble of code, I'm going to try and organize this into bite-sized pieces of things that I discovered.
Samples vs Packets vs Frames
This had been a huge source of confusion for me:
Each CMSampleBuffer can contain 1 or more samples (you can find how many via CMSampleBufferGetNumSamples)
Each CMSampleBuffer that contains 1 sample represents a single audio packet.
Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and frame count.
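A quick worked example of that size relationship for linear PCM (an illustration; it does not apply to compressed AAC buffers):
// 1024 frames of 16-bit mono linear PCM in one CMSampleBuffer:
let numSamples = 1024        // what CMSampleBufferGetNumSamples reports
let framesPerPacket = 1      // linear PCM always has mFramesPerPacket == 1
let bytesPerFrame = 2        // Int16, mono
let totalBytes = numSamples * framesPerPacket * bytesPerFrame   // 2048 bytes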
AudioConverterComplexInputDataProc
This callback is used to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time push-style encoding, I needed to ensure that each data push contains at least that minimum number of packets. Something like this (pseudo-code):
let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalSize(buffers) < minimumPackets {
buffers = buffers + [getNextBuffer()]
}
AudioConverterFillComplexBuffer(...)
Slicing CMSampleBuffers
You can actually slice CMSampleBuffers if they contain multiple samples. The tool to do this is CMSampleBufferCopySampleBufferForRange. This is nice because you can provide the AudioConverterComplexInputDataProc with the exact number of packets it asks for, which makes handling timing information for the resulting encoded buffer easier. If you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
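For illustration, a sketch of that slicing call (using the unlabeled Swift 2-era call style that the rest of this answer uses; later Swift versions add argument labels). Here buffer is the larger incoming PCM sample buffer and packetsNeeded is however many packets the converter asked for:
var slice: CMSampleBuffer? = nil
let packetsNeeded = 1024   // e.g. the count requested via ioNumberDataPackets
let status = CMSampleBufferCopySampleBufferForRange(kCFAllocatorDefault,
                                                    buffer,
                                                    CFRangeMake(0, packetsNeeded),
                                                    &slice)
if status != noErr { /* handle the error */ }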
Priming and trim duration
When doing AAC encoding, you must set the trim duration like so:
CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
One thing I did wrong was that I added the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.
Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:
Priming frames: 2112
Sample rate: 44100
Priming duration: 2112 / 44100 = ~0.0479s
First frame, frames: 1024, priming duration: 1024 / 44100
Second frame, frames: 1024, priming duration: 1088 / 44100
Creating the new CMSampleBuffer
AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr. You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:
let bufferList: AudioBufferList
let packetDescriptions: [AudioStreamPacketDescription]
var newBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(
kCFAllocatorDefault, // allocator
nil, // dataBuffer
false, // dataReady
nil, // makeDataReadyCallback
nil, // makeDataReadyRefCon
formatDescription, // formatDescription
Int(bufferList.mNumberBuffers), // numSamples
CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
&packetDescriptions, // packetDescriptions
&newBuffer)
I am creating a VoIP app for iOS in Objective-C. Currently I am working on the audio part: recording the audio data from the microphone, encoding it with Opus, decoding it, and then playing it. For the recording and playing I use AudioUnit. I also made a buffer implementation which allocates blocks of memory, each with an initially set size. There are three main methods:
- setBufferSize - sets the size of the buffer's sub-allocated spaces.
- writeDataToBuffer - creates a new space (if needed) and fills data into the current write space.
- readDataFromBuffer - reads data from the current read space.
I use the buffer to store the audio data. It works well; I've tested it. If I use it without Opus, just reading the audio data, storing it in the buffer, reading it back, and playing it, everything works great. But the problem comes when I include Opus. It does encode and decode the audio data, but the quality is not good and there is some crackling as well. What am I doing wrong? Here are pieces of my code:
AudioUnit:
OSStatus status;
m_sAudioDescription.componentType = kAudioUnitType_Output;
m_sAudioDescription.componentSubType = kAudioUnitSubType_VoiceProcessingIO/*kAudioUnitSubType_RemoteIO*/;
m_sAudioDescription.componentFlags = 0;
m_sAudioDescription.componentFlagsMask = 0;
m_sAudioDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
AudioComponent inputComponent = AudioComponentFindNext(NULL, &m_sAudioDescription);
status = AudioComponentInstanceNew(inputComponent, &m_audioUnit);
// Enable IO for recording
UInt32 flag = 1;
status = AudioUnitSetProperty(m_audioUnit,
kAudioOutputUnitProperty_EnableIO,
kAudioUnitScope_Input,
VOIP_AUDIO_INPUT_ELEMENT,
&flag,
sizeof(flag));
// Enable IO for playback
status = AudioUnitSetProperty(m_audioUnit,
kAudioOutputUnitProperty_EnableIO,
kAudioUnitScope_Output,
VOIP_AUDIO_OUTPUT_ELEMENT,
&flag,
sizeof(flag));
// Describe format
m_sAudioFormat.mSampleRate = 48000.00;//48000.00;/*44100.00*/;
m_sAudioFormat.mFormatID = kAudioFormatLinearPCM;
m_sAudioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked/* | kAudioFormatFlagsCanonical*/;
m_sAudioFormat.mFramesPerPacket = 1;
m_sAudioFormat.mChannelsPerFrame = 1;
m_sAudioFormat.mBitsPerChannel = 16; //8 * bytesPerSample
m_sAudioFormat.mBytesPerFrame = /*(UInt32)bytesPerSample;*/2; //bitsPerChannel / 8 * channelsPerFrame
m_sAudioFormat.mBytesPerPacket = 2; //bytesPerFrame * framesPerPacket
// Apply format
status = AudioUnitSetProperty(m_audioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
VOIP_AUDIO_INPUT_ELEMENT,
&m_sAudioFormat,
sizeof(m_sAudioFormat));
status = AudioUnitSetProperty(m_audioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
VOIP_AUDIO_OUTPUT_ELEMENT,
&m_sAudioFormat,
sizeof(m_sAudioFormat));
// Set input callback
AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = inputRenderCallback;
callbackStruct.inputProcRefCon = this;
status = AudioUnitSetProperty(m_audioUnit,
kAudioOutputUnitProperty_SetInputCallback,
kAudioUnitScope_Global,
VOIP_AUDIO_INPUT_ELEMENT,
&callbackStruct,
sizeof(callbackStruct));
// Set output callback
callbackStruct.inputProc = outputRenderCallback;
callbackStruct.inputProcRefCon = this;
status = AudioUnitSetProperty(m_audioUnit,
kAudioUnitProperty_SetRenderCallback,
kAudioUnitScope_Global,
VOIP_AUDIO_OUTPUT_ELEMENT,
&callbackStruct,
sizeof(callbackStruct));
//Enable Echo cancelation:
this->_setEchoCancelation(true);
//Enable Automatic Gain control:
this->_setAGC(false);
// Initialise
status = AudioUnitInitialize(m_audioUnit);
return noErr;
Input buffer allocation and setting the size of the storage buffers:
void VoipAudio::_allocBuffer()
{
UInt32 numFramesPerBuffer;
UInt32 size = sizeof(/*VoipUInt32*/VoipInt16);
AudioUnitGetProperty(m_audioUnit,
kAudioUnitProperty_MaximumFramesPerSlice,
kAudioUnitScope_Global,
VOIP_AUDIO_OUTPUT_ELEMENT, &numFramesPerBuffer, &size);
UInt32 inputBufferListSize = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * m_sAudioFormat.mChannelsPerFrame);
inputBuffer = (AudioBufferList *)malloc(inputBufferListSize);
inputBuffer->mNumberBuffers = m_sAudioFormat.mChannelsPerFrame;
//pre-malloc buffers for AudioBufferLists
for(VoipUInt32 tmp_int1 = 0; tmp_int1 < inputBuffer->mNumberBuffers; tmp_int1++)
{
inputBuffer->mBuffers[tmp_int1].mNumberChannels = 1;
inputBuffer->mBuffers[tmp_int1].mDataByteSize = 2048;
inputBuffer->mBuffers[tmp_int1].mData = malloc(2048);
memset(inputBuffer->mBuffers[tmp_int1].mData, 0, 2048);
}
this->m_oAudioBuffer = new VoipBuffer();
this->m_oAudioBuffer->setBufferSize(2048);
this->m_oAudioReadBuffer = new VoipBuffer();
this->m_oAudioReadBuffer->setBufferSize(2880);
}
Record callback:
this->m_oAudioReadBuffer->writeDataToBuffer(samples, samplesSize);
void* tmp_buffer = this->m_oAudioReadBuffer->readDataFromBuffer();
if (tmp_buffer != nullptr)
{
sVoipAudioCodecOpusEncodedResult* encodedSamples = VoipAudioCodecs::Opus_Encode((VoipInt16*)tmp_buffer, 2880);
sVoipAudioCodecOpusDecodedResult* decodedSamples = VoipAudioCodecs::Opus_Decode(encodedSamples->m_data, encodedSamples->m_dataSize);
this->m_oAudioBuffer->writeDataToBuffer(decodedSamples->m_data, decodedSamples->m_dataSize);
free(encodedSamples->m_data);
free(encodedSamples);
free(decodedSamples->m_data);
free(decodedSamples);
}
Playing callback:
void* tmp_buffer = this->m_oAudioBuffer->readDataFromBuffer();
if (tmp_buffer != nullptr)
{
memset(buffer->mBuffers[0].mData, 0, 2048);
memcpy(buffer->mBuffers[0].mData, tmp_buffer, 2048);
buffer->mBuffers[0].mDataByteSize = 2048;
} else {
memset(buffer->mBuffers[0].mData, 0, 2048);
buffer->mBuffers[0].mDataByteSize = 2048;
}
Opus Init Code:
int _error = 0;
VoipAudioCodecs::m_oEncoder = opus_encoder_create(SAMPLE_RATE, CHANNELS, APPLICATION, &_error);
if (_error < 0)
{
fprintf(stderr, "VoipAudioCodecs error: failed to create an encoder: %s\n", opus_strerror(_error));
return;
}
_error = opus_encoder_ctl(VoipAudioCodecs::m_oEncoder, OPUS_SET_BITRATE(BITRATE/*OPUS_BITRATE_MAX*/));
if (_error < 0)
{
fprintf(stderr, "VoipAudioCodecs error: failed to set bitrate: %s\n", opus_strerror(_error));
return;
}
VoipAudioCodecs::m_oDecoder = opus_decoder_create(SAMPLE_RATE, CHANNELS, &_error);
if (_error < 0)
{
fprintf(stderr, "VoipAudioCodecs error: failed to create decoder: %s\n", opus_strerror(_error));
return;
}
Opus encode/decode:
sVoipAudioCodecOpusEncodedResult* VoipAudioCodecs::Opus_Encode(VoipInt16* number, int samplesCount)
{
unsigned char cbits[MAX_PACKET_SIZE];
VoipInt32 nbBytes;
nbBytes = opus_encode(VoipAudioCodecs::m_oEncoder, number, FRAME_SIZE, cbits, MAX_PACKET_SIZE);
if (nbBytes < 0)
{
fprintf(stderr, "VoipAudioCodecs error: encode failed: %s\n", opus_strerror(nbBytes));
return nullptr;
}
sVoipAudioCodecOpusEncodedResult* result = (sVoipAudioCodecOpusEncodedResult* )malloc(sizeof(sVoipAudioCodecOpusEncodedResult));
result->m_data = (unsigned char*)malloc(nbBytes);
memcpy(result->m_data, cbits, nbBytes);
result->m_dataSize = nbBytes;
return result;
}
sVoipAudioCodecOpusDecodedResult* VoipAudioCodecs::Opus_Decode(void* encoded, VoipInt32 nbBytes)
{
VoipInt16 decodedPacket[MAX_FRAME_SIZE];
int frame_size = opus_decode(VoipAudioCodecs::m_oDecoder, (const unsigned char*)encoded, nbBytes, decodedPacket, MAX_FRAME_SIZE, 0);
if (frame_size < 0)
{
fprintf(stderr, "VoipAudioCodecs error: decoder failed: %s\n", opus_strerror(frame_size));
return nullptr;
}
sVoipAudioCodecOpusDecodedResult* result = (sVoipAudioCodecOpusDecodedResult* )malloc(sizeof(sVoipAudioCodecOpusDecodedResult));
result->m_data = (VoipInt16*)malloc(frame_size / sizeof(VoipInt16));
memcpy(result->m_data, decodedPacket, (frame_size / sizeof(VoipInt16)));
result->m_dataSize = frame_size / sizeof(VoipInt16);
return result;
}
Here are some constants i use:
#define FRAME_SIZE 2880 //120, 240, 480, 960, 1920, 2880
#define SAMPLE_RATE 48000
#define CHANNELS 1
#define APPLICATION OPUS_APPLICATION_VOIP//OPUS_APPLICATION_AUDIO
#define BITRATE 64000
#define MAX_FRAME_SIZE 4096
#define MAX_PACKET_SIZE (3*1276)
Can you help me please?
Your audio callback time may need to be increased. Try increasing your session's setPreferredIOBufferDuration time. I have used Opus on iOS and have measured the decoding time: it takes 2 to 3 ms to decode about 240 frames of data. There is a good chance you are missing subsequent callbacks because it is taking too long to decode the audio.
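For example (a minimal sketch in Swift; the question's code is Objective-C++, where the AVAudioSession call is analogous):
import AVFoundation

// Ask for a larger IO buffer (~20 ms) so each callback has more headroom for
// Opus decoding; the hardware may round to a nearby supported value.
let session = AVAudioSession.sharedInstance()
try session.setPreferredIOBufferDuration(0.02)
try session.setActive(true)
print("Actual IO buffer duration: \(session.ioBufferDuration) s")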
I had the same problem in my project: iOS gives me an unstable frame size. I tried both Audio Queue Services and an Audio Unit, and they gave me the same result (a crackled voice).
All you have to do is save some samples into a ring buffer in the audio callback.
Then, in a separate thread, do the audio processing to build a fixed-size frame on each round.
For example:
The audio unit gives you frames or samples like this: [2048 .. 2048 .. 2048]
The Opus codec needs 2880 frames for each packet, so you need to take 2048 from the first buffer and the remaining 832 frames from the next buffer to get a fixed frame size to send to the Opus encoder.
This is the function I used in my project:
func audioProcessing(){
DispatchQueue.global(qos: .default).async {
// used to carry over the remaining data between rounds
var remainData:NSMutableData = NSMutableData()
var remainDataSize = 0
while self.room_oppened{
// here we define the fixed frame size we want to feed the Opus encoder
var packetOffset = 0
let fixedFrameSize:Int = 5760
var dataToGetFullFrame:Int = 5760
let packetData:NSMutableData = NSMutableData(length: fixedFrameSize)! // this needs to be filled with data
if remainDataSize > 0 {
if remainDataSize < fixedFrameSize{
memcpy(packetData.mutableBytes.advanced(by: packetOffset), remainData.mutableBytes.advanced(by: 0), remainDataSize) // add the remaining data
dataToGetFullFrame = dataToGetFullFrame - remainDataSize
packetOffset = packetOffset + remainDataSize// - 1
}else{
memcpy(packetData.mutableBytes.advanced(by: packetOffset), remainData.mutableBytes.advanced(by: 0), fixedFrameSize) // add the remaining data
dataToGetFullFrame = 0
}
remainDataSize = 0
}
// if the packet is not yet full, we need to get more data from the circular buffer
if dataToGetFullFrame > 0 {
while dataToGetFullFrame > 0 {
let bufferData = self.ringBufferEncodedAudio.read() // read a chunk of data from the buffer
if bufferData != nil{
var chunkOffset = 0
if dataToGetFullFrame > bufferData!.length{
memcpy(packetData.mutableBytes.advanced(by: packetOffset) , bufferData!.mutableBytes , bufferData!.length)
chunkOffset = bufferData!.length// this how much data we read
dataToGetFullFrame = dataToGetFullFrame - bufferData!.length // how much of data we need to fill packet
packetOffset = packetOffset + bufferData!.length// + 1
}else{
memcpy(packetData.mutableBytes.advanced(by: packetOffset) , bufferData!.mutableBytes , dataToGetFullFrame)
chunkOffset = dataToGetFullFrame// this how much data we read
packetOffset = packetOffset + dataToGetFullFrame// + 1
dataToGetFullFrame = dataToGetFullFrame - dataToGetFullFrame // how much of data we need to fill packet
}
if dataToGetFullFrame <= 0 {
var size = bufferData!.length - chunkOffset
remainData = NSMutableData(bytes: bufferData?.mutableBytes.advanced(by: chunkOffset), length: size)
remainDataSize = size
}
}
usleep(useconds_t(8 * 1000))
}
}
// send packet to encoder
if self.enable_streaming {
let dataToEncode:Data = packetData as Data
let packet = OpusSwiftPort.shared.encodeData(dataToEncode)
if packet != nil{
self.sendAudioPacket(packet: packet!)// <--- this to network
}
}
}
}
}
After I added this audio processing I got very clear audio.
I hope this is helpful for you.
I am writing a real-time audio playback program on iOS.
It receives RTP audio packets from the peer and puts them into an audio queue to play.
When playback starts, the sound is OK, but after 1 or 2 minutes the sound mutes, and no error is reported from the AudioQueue API. The callback function continues to be called normally, nothing abnormal.
It just goes silent.
My callback function:
1: Loop until there is enough data to copy to the audio queue buffer
do
{
read_bytes_enabled = g_audio_playback_buf.GetReadByteLen();
if (read_bytes_enabled >= kAudioQueueBufferLength)
{
break;
}
usleep(10*1000);
}
while (true);
2: Copy to the AudioQueue buffer and enqueue it. This callback function keeps running normally, with no error.
//copy to audio queue buffer
read_bytes = kAudioQueueBufferLength;
g_audio_playback_buf.Read((unsigned char *)inBuffer->mAudioData, read_bytes);
WriteLog(LOG_PHONE_DEBUG, "AudioQueueBuffer(Play): copy [%d] bytes to AudioQueue buffer! Total len = %d", read_bytes, read_bytes_enabled);
inBuffer->mAudioDataByteSize = read_bytes;
UInt32 nPackets = read_bytes / g_audio_periodsize; // mono
inBuffer->mPacketDescriptionCount = nPackets;
// re-enqueue this buffer
AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
The problem has been resolved.
The key point is that you cannot let the audio queue buffer wait; you must keep feeding it, or it may go silent. If you don't have enough data, fill it with blank (silent) data.
So the following code should be changed:
do
{
read_bytes_enabled = g_audio_playback_buf.GetReadByteLen();
if (read_bytes_enabled >= kAudioQueueBufferLength)
{
break;
}
usleep(10*1000);
}
while (true);
to this:
read_bytes_enabled = g_audio_playback_buf.GetReadByteLen();
if (read_bytes_enabled < kAudioQueueBufferLength)
{
memset(inBuffer->mAudioData, 0x00, kAudioQueueBufferLength);
}
else
{
inBuffer->mAudioDataByteSize = kAudioQueueBufferLength;
}
...
You can let the AudioQueue wait if you use AudioQueuePause.
In this example, in Swift 5, I use a generic queue. When this queue is empty, as you did, I fill my buffer with empty data in the callback and call AudioQueuePause. It's important to note that every AudioQueueBuffer sent to the AudioQueueRef with AudioQueueEnqueueBuffer before calling AudioQueuePause will still be played.
Create a UserData class to pass everything you need to your callback:
class UserData {
let dataQueue = Queue<Data>()
let semaphore = DispatchSemaphore(value: 1)
}
private var inQueue: AudioQueueRef!
private var userData = UserData()
Pass an instance of this class when you create your AudioQueue, then start it:
AudioQueueNewOutput(&inFormat, audioQueueOutputCallback, &userData, nil, nil, 0, &inQueue)
AudioQueueStart(inQueue, nil)
Generate all your buffers and don't enqueue them directly; call your callback function instead:
for _ in 0...2 {
var bufferRef: AudioQueueBufferRef!
AudioQueueAllocateBuffer(inQueue, 320, &bufferRef)
audioQueueOutputCallback(&userData, inQueue, bufferRef)
}
When you receive audio data, you can call a method that enqueues your data and lets it wait for the callback function to pick it up:
func audioReceived(_ audio: Data) {
let dataQueue = userData.dataQueue
let semaphore = userData.semaphore
semaphore.wait()
dataQueue.enqueue(audio)
semaphore.signal()
// Start AudioQueue every time, if it's already started this call do nothing
AudioQueueStart(inQueue, nil)
}
Finally, you can implement a callback function like this:
private let audioQueueOutputCallback: AudioQueueOutputCallback = { (inUserData, inAQ, inBuffer) in
// Get data from the UnsafeMutableRawPointer
let userData: UserData = (inUserData!.bindMemory(to: UserData.self, capacity: 1).pointee)
let queue = userData.dataQueue
let semaphore = userData.semaphore
// bind UnsafeMutableRawPointer to UnsafeMutablePointer<UInt8> for data copy
let audioBuffer = inBuffer.pointee.mAudioData.bindMemory(to: UInt8.self, capacity: 320)
if queue.isEmpty {
print("Queue is empty: pause")
AudioQueuePause(inAQ)
audioBuffer.assign(repeating: 0, count: 320)
inBuffer.pointee.mAudioDataByteSize = 320
} else {
semaphore.wait()
if let data = queue.dequeue() {
data.copyBytes(to: audioBuffer, count: data.count)
inBuffer.pointee.mAudioDataByteSize = UInt32(data.count)
} else {
print("Error: queue is empty")
semaphore.signal()
return
}
semaphore.signal()
}
AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, nil)
}
In my case I use a 320-byte buffer for 20 ms of PCM data: 16-bit, 8 kHz, mono.
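That figure follows directly from the format, as a quick check:
// 8000 samples/s * 0.020 s * 2 bytes per Int16 mono sample = 320 bytes per 20 ms.
let bytesPer20ms = Int(8000 * 0.020) * MemoryLayout<Int16>.size   // 320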
This solution is more complex, but easier on your CPU than a pseudo-infinite loop feeding empty audio data. Apple is very punitive with greedy apps ;)
I hope this solution will help.