Clipping sound with Opus on Android, sent from iOS - ios

I am recording audio on iOS from an Audio Unit, encoding the bytes with Opus and sending them via UDP to the Android side. The problem is that the sound plays back slightly clipped. I have also tested by sending the raw data from iOS to Android, and it plays perfectly.
My audio session code is:
try audioSession.setCategory(.playAndRecord, mode: .voiceChat, options: [.defaultToSpeaker])
try audioSession.setPreferredIOBufferDuration(0.02)
try audioSession.setActive(true)
My recording callback code is:
func performRecording(
_ ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
inTimeStamp: UnsafePointer<AudioTimeStamp>,
inBufNumber: UInt32,
inNumberFrames: UInt32,
ioData: UnsafeMutablePointer<AudioBufferList>) -> OSStatus
{
var err: OSStatus = noErr
err = AudioUnitRender(audioUnit!, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData)
if let mData = ioData[0].mBuffers.mData {
let ptrData = mData.bindMemory(to: Int16.self, capacity: Int(inNumberFrames))
let bufferPtr = UnsafeBufferPointer(start: ptrData, count: Int(inNumberFrames))
count += 1
addedBuffer += Array(bufferPtr)
if count == 2 {
let _ = TPCircularBufferProduceBytes(&circularBuffer, addedBuffer, UInt32(addedBuffer.count * 2))
count = 0
addedBuffer = []
let buffer = TPCircularBufferTail(&circularBuffer, &availableBytes)
memcpy(&targetBuffer, buffer, Int(min(bytesToCopy, Int(availableBytes))))
TPCircularBufferConsume(&circularBuffer, UInt32(min(bytesToCopy, Int(availableBytes))))
self.audioRecordingDelegate(inTimeStamp.pointee.mSampleTime / Double(16000), targetBuffer)
}
}
return err;
}
Here I am getting an inNumberFrames of roughly 341, so I append two buffers together to get the bigger frame size (640) needed for Android, but I am only encoding 640 frames at a time with the help of TPCircularBuffer.
func gotSomeAudio(timeStamp: Double, samples: [Int16]) {
let encodedData = opusHelper?.encodeStream(of: samples)
let myData = encodedData!.withUnsafeBufferPointer {
Data(buffer: $0)
}
var protoModel = ProtoModel()
seqNumber += 1
protoModel.sequenceNumber = seqNumber
protoModel.timeStamp = Date().currentTimeInMillis()
protoModel.payload = myData
DispatchQueue.global().async {
do {
try self.tcpClient?.send(data: protoModel)
} catch {
print(error.localizedDescription)
}
}
let diff = CFAbsoluteTimeGetCurrent() - start
print("Time diff is \(diff)")
}
In the above code I am Opus-encoding a 640-sample frame, adding it to the ProtoBuf payload, and sending it via UDP.
On the Android side I am parsing the Protobuf, decoding the 640-sample frame, and playing it with AudioTrack. There is no problem with the Android side, as I have recorded and played sound using Android alone; the problem only appears when I record sound on iOS and play it through the Android side.
Please don't suggest increasing the frame size by setting the preferred IO buffer duration. I want to do it without changing this.
https://stackoverflow.com/a/57873492/12020007 was helpful.
https://stackoverflow.com/a/58947295/12020007
I have updated my code according to your suggestion and removed the delegate and array concatenation, but there is still clipping on the Android side. I have also measured the time it takes to encode the bytes: approximately 2-3 ms.
The updated callback code is:
var err: OSStatus = noErr
// we are calling AudioUnitRender on the input bus of AURemoteIO
// this will store the audio data captured by the microphone in ioData
err = AudioUnitRender(audioUnit!, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData)
if let mData = ioData[0].mBuffers.mData {
_ = TPCircularBufferProduceBytes(&circularBuffer, mData, inNumberFrames * 2)
print("mDataByteSize: \(ioData[0].mBuffers.mDataByteSize)")
count += 1
if count == 2 {
count = 0
let buffer = TPCircularBufferTail(&circularBuffer, &availableBytes)
memcpy(&targetBuffer, buffer, min(bytesToCopy, Int(availableBytes)))
TPCircularBufferConsume(&circularBuffer, UInt32(min(bytesToCopy, Int(availableBytes))))
let encodedData = opusHelper?.encodeStream(of: targetBuffer)
let myData = encodedData!.withUnsafeBufferPointer {
Data(buffer: $0)
}
var protoModel = ProtoModel()
seqNumber += 1
protoModel.sequenceNumber = seqNumber
protoModel.timeStamp = Date().currentTimeInMillis()
protoModel.payload = myData
do {
try self.udpClient?.send(data: protoModel)
} catch {
print(error.localizedDescription)
}
}
}
return err;

Your code is doing Swift memory allocation (Array concatenation) and Swift method calls (your recording delegate) inside the audio callback. Apple (in a WWDC session on Audio) recommends not doing any memory allocation or method calls inside the real-time audio callback context (especially when requesting short Preferred IO Buffer Durations). Stick to C function calls, such as memcpy and TPCircularBuffer.
Added: Also, don't discard samples. If you get 680 samples, but only need 640 for a packet, keep the 40 "left over" samples and use them appended in front of a later packet. The circular buffer will save them for you. Rinse and repeat. Send all the samples you get from the audio callback when you've accumulated enough for a packet, or yet another packet when you end up accumulating 1280 (2*640) or more.
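The accumulate-and-carry logic the answer describes can be sketched in plain C. This is an illustrative sketch only, not the actual TPCircularBuffer API; the names Accumulator, accumulate, and pop_packet are hypothetical helpers:

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define PACKET_FRAMES 640   /* frames per Opus packet */
#define ACCUM_CAPACITY 4096 /* must exceed max callback size + leftovers */

typedef struct {
    int16_t samples[ACCUM_CAPACITY];
    size_t  count; /* frames currently buffered */
} Accumulator;

/* Append one callback's worth of frames; returns how many full
   640-frame packets are now available. Leftover frames stay buffered. */
static size_t accumulate(Accumulator *acc, const int16_t *in, size_t frames) {
    memcpy(acc->samples + acc->count, in, frames * sizeof(int16_t));
    acc->count += frames;
    return acc->count / PACKET_FRAMES;
}

/* Pop one packet; the leftover frames shift to the front for the
   next packet, so no samples are ever discarded. */
static int pop_packet(Accumulator *acc, int16_t out[PACKET_FRAMES]) {
    if (acc->count < PACKET_FRAMES) return 0;
    memcpy(out, acc->samples, PACKET_FRAMES * sizeof(int16_t));
    acc->count -= PACKET_FRAMES;
    memmove(acc->samples, acc->samples + PACKET_FRAMES,
            acc->count * sizeof(int16_t));
    return 1;
}
```

With ~341-frame callbacks, two callbacks give 682 frames: one 640-frame packet goes out and 42 frames carry over to the front of the next packet, which is exactly why the sound no longer clips.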

Related

Streaming PCM audio over tcp socket

I have a continuous stream of raw PCM audio data from a TCP socket and I want to play it. I've done a lot of research and looked at many samples, but with no result. This gist was the closest solution, but the problem is that it streams an MP3 file.
So I have a socket which receives linear PCM audio data and give them to the player like this:
func play(_ data: Data) {
// this function is called for every 320 bytes of linear PCM data.
// play the 320 bytes of PCM data here!
}
So is there any "Simple" way to play raw PCM audio data?
For iOS, you can use the RemoteIO Audio Unit or the AVAudioEngine with a circular buffer for real-time audio streaming.
You can't give network data directly to audio output, but instead should put it in a circular buffer from which an audio subsystem play callback can consume it at its fixed rate. You will need to pre-buffer some amount of audio samples to cover network jitter.
Simple "ways" of doing this might not handle network jitter gracefully.
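As a rough illustration of the pre-buffering math (a hedged sketch; prebuffer_bytes and JitterGate are made-up helpers, and the 60 ms threshold is only an example value):

```c
#include <stdint.h>

/* Bytes needed to pre-buffer `ms` milliseconds of PCM audio. */
static uint32_t prebuffer_bytes(uint32_t sample_rate, uint32_t channels,
                                uint32_t bytes_per_sample, uint32_t ms) {
    return sample_rate * channels * bytes_per_sample * ms / 1000;
}

typedef struct {
    uint32_t buffered;  /* bytes received so far */
    uint32_t threshold; /* bytes required before playback starts */
    int started;
} JitterGate;

/* Feed received bytes; playback may start only once the threshold
   is met, so short network stalls don't immediately starve output. */
static int jitter_gate_ready(JitterGate *g, uint32_t received) {
    g->buffered += received;
    if (!g->started && g->buffered >= g->threshold) g->started = 1;
    return g->started;
}
```

For example, 60 ms of 16 kHz mono 16-bit PCM is 1920 bytes; until that much has arrived, the play callback would output silence.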
Answering late, but if you are still stuck playing the TCP bytes, try following my answer, where you put your TCP audio bytes into a circular buffer and play them via an Audio Unit.


The code below receives bytes from TCP and puts them into a TPCircularBuffer:

func tcpReceive() {
receivingQueue.async {
repeat {
do {
let datagram = try self.tcpClient?.receive()
if let byteData = datagram?["data"] as? Data,
let dataLength = datagram?["length"] as? Int {
// dataLength is the byte count reported by the socket layer
_ = byteData.withUnsafeBytes { rawBuffer in
TPCircularBufferProduceBytes(&self.circularBuffer, rawBuffer.baseAddress, UInt32(dataLength))
}
}
} catch {
fatalError(error.localizedDescription)
}
} while true
}
}
Create Audio Unit

...
var desc = AudioComponentDescription(
componentType: OSType(kAudioUnitType_Output),
componentSubType: OSType(kAudioUnitSubType_VoiceProcessingIO),
componentManufacturer: OSType(kAudioUnitManufacturer_Apple),
componentFlags: 0,
componentFlagsMask: 0
)
let inputComponent = AudioComponentFindNext(nil, &desc)
status = AudioComponentInstanceNew(inputComponent!, &audioUnit)
if status != noErr {
print("Audio component instance new error \(status!)")
}
// Enable IO for playback
status = AudioUnitSetProperty(
audioUnit!,
kAudioOutputUnitProperty_EnableIO,
kAudioUnitScope_Output,
kOutputBus,
&flag,
SizeOf32(flag)
)
if status != noErr {
print("Enable IO for playback error \(status!)")
}
// Use your own format; mine is a 16000 Hz sample rate with 16-bit PCM
var ioFormat = CAStreamBasicDescription(
sampleRate: 16000.0,
numChannels: 1,
pcmf: .int16,
isInterleaved: false
)
// This is the playback callback
var playbackCallback = AURenderCallbackStruct(
inputProc: AudioController_PlaybackCallback, // the C callback from which the audio unit pulls the bytes to play
inputProcRefCon: UnsafeMutableRawPointer(Unmanaged.passUnretained(self).toOpaque())
)
status = AudioUnitSetProperty(
audioUnit!,
AudioUnitPropertyID(kAudioUnitProperty_SetRenderCallback),
AudioUnitScope(kAudioUnitScope_Input),
kOutputBus,
&playbackCallback,
MemoryLayout<AURenderCallbackStruct>.size.ui
)
if status != noErr {
print("Failed to set playback render callback \(status!)")
}
//Init Audio Unit
status = AudioUnitInitialize(audioUnit!)
if status != noErr {
print("Failed to initialize audio unit \(status!)")
}
//Start AudioUnit
 status = AudioOutputUnitStart(audioUnit!)
if status != noErr {
print("Failed to start output unit \(status!)")
}
 
This is my playbackCallback function where I play the audio from Circular Buffer

func performPlayback(
_ ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
inTimeStamp: UnsafePointer<AudioTimeStamp>,
inBufNumber: UInt32,
inNumberFrames: UInt32,
ioData: UnsafeMutablePointer<AudioBufferList>
) -> OSStatus {
let buffer = ioData[0].mBuffers
let bytesToCopy = ioData[0].mBuffers.mDataByteSize
var bufferTail: UnsafeMutableRawPointer?
var availableBytes: UInt32 = 0
bufferTail = TPCircularBufferTail(&self.circularBuffer, &availableBytes)
let bytesToWrite = min(bytesToCopy, availableBytes)
var bufferList = AudioBufferList(
mNumberBuffers: 1,
mBuffers: ioData[0].mBuffers)
// Debug only: building a Swift array inside the real-time render
// callback allocates memory; remove this in production.
var monoSamples = [Int16]()
let ptr = bufferList.mBuffers.mData?.assumingMemoryBound(to: Int16.self)
monoSamples.append(contentsOf: UnsafeBufferPointer(start: ptr, count: Int(inNumberFrames)))
memcpy(buffer.mData, bufferTail, Int(bytesToWrite))
TPCircularBufferConsume(&self.circularBuffer, bytesToWrite)
return noErr
}
For TPCircularBuffer I used this pod
pod 'TPCircularBuffer', '~> 1.6'
Detailed descriptions and sample code are available for AudioToolbox / AudioUnit.
You can register a render callback to get the PCM data from the AUGraph and send the PCM buffer to the socket.
Some more examples of the usage:
https://github.com/rweichler/coreaudio-examples/blob/master/CH08_AUGraphInput/main.cpp

Get Mic data callbacks for 20 Miliseconds VoIP App

I am developing a VoIP calling app, and I am now at the stage where I need to transfer the voice data to the server. For that I want to get real-time audio data from the mic in 20-millisecond callbacks.
I have searched many links but was unable to find a solution, as I am new to audio frameworks.
Details
We have our own stack, like WebRTC, which delivers RTP data from the remote side every 20 milliseconds and asks for 20 milliseconds of data from the mic. What I am trying to achieve is to get 20 milliseconds of data from the mic and pass it to the stack. The audio format is pcmFormatInt16 with a sample rate of 8000 Hz and 20 milliseconds of data per packet.
I have searched for
AVAudioEngine,
AUAudioUnit,
AVCaptureSession Etc.
1. I am using AVAudioSession and AUAudioUnit, but setPreferredIOBufferDuration on the audio session does not take exactly the value I set, so I am not getting the exact data size. Can anybody help me with setPreferredIOBufferDuration?
2. Another issue is that auAudioUnit.outputProvider() gives inputData as an UnsafeMutableAudioBufferListPointer. The list has two elements and I want only one of them. Can anybody help me convert it into a data format that can be played with AVAudioPlayer?
I have followed this link before:
https://gist.github.com/hotpaw2/ba815fc23b5d642705f2b1dedfaf0107
let hwSRate = audioSession.sampleRate
try audioSession.setActive(true)
print("native Hardware rate : \(hwSRate)")
try audioSession.setPreferredIOBufferDuration(preferredIOBufferDuration)
try audioSession.setPreferredSampleRate(8000) // at 8000.0 Hz
print("Changed native Hardware rate : \(audioSession.sampleRate) buffer duration \(audioSession.ioBufferDuration)")
try auAudioUnit = AUAudioUnit(componentDescription: self.audioComponentDescription)
auAudioUnit.outputProvider = { // AURenderPullInputBlock
(actionFlags, timestamp, frameCount, inputBusNumber, inputData) -> AUAudioUnitStatus in
if let block = self.renderBlock { // AURenderBlock?
let err : OSStatus = block(actionFlags,
timestamp,
frameCount,
1,
inputData,
.none)
if err == noErr {
// save samples from current input buffer to circular buffer
print("inputData = \(inputData) and frameCount: \(frameCount)")
self.recordMicrophoneInputSamples(
inputDataList: inputData,
frameCount: UInt32(frameCount) )
}
}
let err2 : AUAudioUnitStatus = noErr
return err2
}
Log:-
Changed native Hardware rate : 8000.0 buffer duration 0.01600000075995922
Try to get 40 ms of data from the audio interface and then split it up into two 20 ms chunks.
Also check whether you are able to set the sampling frequency (8 kHz) of the audio interface.
The render block will give you callbacks according to the setup accepted by the hardware for AUAudioUnit and AudioSession. We have to manage a buffer if we want a different input size from the mic. Output to the speaker should be the same size it expects, e.g. 128, 256, or 512 bytes.
try audioSession.setPreferredSampleRate(sampleRateProvided) // at 48000.0
try audioSession.setPreferredIOBufferDuration(preferredIOBufferDuration)
These values can differ from our preferred sizes. That is why we have to use buffer logic to get our preferred input size.
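The packet arithmetic behind this can be sketched as follows (illustrative helper names, not part of any Apple API). At 8 kHz, a 20 ms packet is 160 frames, i.e. 320 bytes of 16-bit mono PCM, while the 16 ms buffer duration the hardware actually accepted (see the log above) delivers only 128 frames per callback, which is why accumulation across callbacks is unavoidable:

```c
#include <stdint.h>

/* Frames in one packet of `ms` milliseconds at `sample_rate`. */
static uint32_t frames_per_packet(uint32_t sample_rate, uint32_t ms) {
    return sample_rate * ms / 1000;
}

/* Byte size of those frames for 16-bit mono PCM (2 bytes/frame). */
static uint32_t bytes_per_packet(uint32_t sample_rate, uint32_t ms) {
    return frames_per_packet(sample_rate, ms) * 2;
}
```

So two hardware callbacks (256 frames) cover one 160-frame packet, with 96 frames left over for the next packet.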
Link: https://gist.github.com/hotpaw2/ba815fc23b5d642705f2b1dedfaf0107
renderBlock = auAudioUnit.renderBlock
if ( enableRecording
&& micPermissionGranted
&& audioSetupComplete
&& audioSessionActive
&& isRecording == false ) {
auAudioUnit.inputHandler = { (actionFlags, timestamp, frameCount, inputBusNumber) in
if let block = self.renderBlock { // AURenderBlock?
var bufferList = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: audioFormat!.channelCount,
mDataByteSize: 0,
mData: nil))
let err : OSStatus = block(actionFlags,
timestamp,
frameCount,
inputBusNumber,
&bufferList,
.none)
if err == noErr {
// save samples from current input buffer to circular buffer
print("inputData = \(bufferList.mBuffers.mDataByteSize) and frameCount: \(frameCount) and count: \(count)")
count += 1
if !self.isMuteState {
self.recordMicrophoneInputSamples(
inputDataList: &bufferList,
frameCount: UInt32(frameCount) )
}
}
}
}
auAudioUnit.isInputEnabled = true
auAudioUnit.outputProvider = { ( // AURenderPullInputBlock?
actionFlags,
timestamp,
frameCount,
inputBusNumber,
inputDataList ) -> AUAudioUnitStatus in
if let block = self.renderBlock {
if let dataReceived = self.getInputDataForConsumption() {
let mutabledata = NSMutableData(data: dataReceived)
var bufferListSpeaker = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: 1,
mDataByteSize: 0,
mData: nil))
let err : OSStatus = block(actionFlags,
timestamp,
frameCount,
1,
&bufferListSpeaker,
.none)
if err == noErr {
bufferListSpeaker.mBuffers.mDataByteSize = UInt32(mutabledata.length)
bufferListSpeaker.mBuffers.mData = mutabledata.mutableBytes
inputDataList[0] = bufferListSpeaker
print("Output Provider mDataByteSize: \(inputDataList[0].mBuffers.mDataByteSize) output FrameCount: \(frameCount)")
return err
} else {
print("Output Provider \(err)")
return err
}
}
}
return 0
}
auAudioUnit.isOutputEnabled = true
do {
circInIdx = 0 // initialize circular buffer pointers
circOutIdx = 0
circoutSpkIdx = 0
circInSpkIdx = 0
try auAudioUnit.allocateRenderResources()
try auAudioUnit.startHardware() // equivalent to AudioOutputUnitStart ???
isRecording = true
} catch let e {
print(e)
}

Piping AudioKit Microphone to Google Speech-to-Text

I'm trying to get AudioKit to pipe the microphone to Google's Speech-to-Text API as seen here but I'm not entirely sure how to go about it.
To prepare the audio for the Speech-to-Text engine, you need to set up the encoding and pass it through in chunks. In the example Google uses, they use Apple's AVFoundation, but I'd like to use AudioKit so I can perform some pre-processing, such as cutting off low amplitudes, etc.
I believe the right way to do this is to use a Tap:
First, I should match the format by:
var asbd = AudioStreamBasicDescription()
asbd.mSampleRate = 16000.0
asbd.mFormatID = kAudioFormatLinearPCM
asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
asbd.mBytesPerPacket = 2
asbd.mFramesPerPacket = 1
asbd.mBytesPerFrame = 2
asbd.mChannelsPerFrame = 1
asbd.mBitsPerChannel = 16
AudioKit.format = AVAudioFormat(streamDescription: &asbd)!
Then create a tap such as:
open class TestTap {
internal let bufferSize: UInt32 = 1_024
@objc public init(_ input: AKNode?) {
input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
// do work here
}
}
}
But I wasn't able to identify the right way of handling this data to send it to the Google Speech-to-Text API via the streamAudioData method in real time with AudioKit. Perhaps I am going about this the wrong way?
UPDATE:
I've created a Tap as such:
open class TestTap {
internal var audioData = NSMutableData()
internal let bufferSize: UInt32 = 1_024
func toData(buffer: AVAudioPCMBuffer) -> NSData {
let channelCount = 2 // the PCMBuffer's channel count
let channels = UnsafeBufferPointer(start: buffer.floatChannelData, count: channelCount)
return NSData(bytes: channels[0], length:Int(buffer.frameCapacity * buffer.format.streamDescription.pointee.mBytesPerFrame))
}
@objc public init(_ input: AKNode?) {
input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
self.audioData.append(self.toData(buffer: buffer) as Data)
// We recommend sending samples in 100ms chunks (from Google)
let chunkSize: Int /* bytes/chunk */ = Int(0.1 /* seconds/chunk */
* AudioKit.format.sampleRate /* samples/second */
* 2 /* bytes/sample */ )
if self.audioData.length > chunkSize {
SpeechRecognitionService
.sharedInstance
.streamAudioData(self.audioData,
completion: { response, error in
if let error = error {
print("ERROR: \(error.localizedDescription)")
SpeechRecognitionService.sharedInstance.stopStreaming()
} else if let response = response {
print(response)
}
})
self.audioData = NSMutableData()
}
}
}
}
and in viewDidLoad:, I'm setting AudioKit up with:
AKSettings.sampleRate = 16_000
AKSettings.bufferLength = .shortest
However, Google complains with:
ERROR: Audio data is being streamed too fast. Please stream audio data approximately at real time.
I've tried changing multiple parameters such as the chunk size to no avail.
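One plausible reading of that error is that chunks are being pushed faster than real time. The pacing math can be sketched in C (hypothetical helper names; the actual fix belongs wherever the calls to streamAudioData are scheduled):

```c
#include <stdint.h>

/* Microseconds of audio represented by `bytes` of PCM data. */
static uint64_t chunk_duration_us(uint64_t bytes, uint32_t sample_rate,
                                  uint32_t channels, uint32_t bytes_per_sample) {
    return bytes * 1000000ull /
           ((uint64_t)sample_rate * channels * bytes_per_sample);
}

/* Earliest time (us since stream start) at which the next chunk may be
   sent so the stream never runs ahead of real time: everything sent so
   far must have "played out" first. */
static uint64_t next_send_time_us(uint64_t bytes_sent_so_far,
                                  uint32_t sample_rate, uint32_t channels,
                                  uint32_t bytes_per_sample) {
    return chunk_duration_us(bytes_sent_so_far, sample_rate,
                             channels, bytes_per_sample);
}
```

For the format in the question (16 kHz, mono, 16-bit), a 100 ms chunk is 3200 bytes, so each chunk should be spaced roughly 100 ms apart.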
I found the solution here.
Final code for my Tap is:
open class GoogleSpeechToTextStreamingTap {
internal var converter: AVAudioConverter!
@objc public init(_ input: AKNode?, sampleRate: Double = 16000.0) {
let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: sampleRate, channels: 1, interleaved: false)!
self.converter = AVAudioConverter(from: AudioKit.format, to: format)
self.converter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Normal
self.converter?.sampleRateConverterQuality = .max
let sampleRateRatio = AKSettings.sampleRate / sampleRate
let inputBufferSize = 4410 // 100ms of 44.1K = 4410 samples.
input?.avAudioNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(inputBufferSize), format: nil) { buffer, time in
let capacity = Int(Double(buffer.frameCapacity) / sampleRateRatio)
let bufferPCM16 = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(capacity))!
var error: NSError? = nil
self.converter?.convert(to: bufferPCM16, error: &error) { inNumPackets, outStatus in
outStatus.pointee = AVAudioConverterInputStatus.haveData
return buffer
}
let channel = UnsafeBufferPointer(start: bufferPCM16.int16ChannelData!, count: 1)
let data = Data(bytes: channel[0], count: capacity * 2)
SpeechRecognitionService
.sharedInstance
.streamAudioData(data,
completion: { response, error in
if let error = error {
print("ERROR: \(error.localizedDescription)")
SpeechRecognitionService.sharedInstance.stopStreaming()
} else if let response = response {
print(response)
}
})
}
}
You can likely record using AKNodeRecorder and pass along the buffer from the resulting AKAudioFile to the API. If you want more real-time behavior, you could try installing a tap on the avAudioNode property of the AKNode you want to record and pass the buffers to the API continuously.
However, I'm curious why you see the need for pre-processing - I'm sure the Google API is plenty optimized for recordings produced by the sample code you noted.
I've had a lot of success / fun with the iOS Speech API. Not sure if there's a reason you want to go with the Google API, but I'd consider checking it out and seeing if it might better serve your needs if you haven't already.
Hope this helps!

AAC encoding using AudioConverter and writing to AVAssetWriter

I'm struggling to encode audio buffers received from AVCaptureSession using
AudioConverter and then appending them to an AVAssetWriter.
I'm not getting any errors (including OSStatus responses), and the
CMSampleBuffers generated seem to have valid data, however the resulting file
simply does not have any playable audio. When writing together with video, the video
frames stop getting appended a couple of frames in (appendSampleBuffer()
returns false, but with no AVAssetWriter.error), probably because the asset
writer is waiting for the audio to catch up. I suspect it's related to the way
I'm setting up the priming for AAC.
The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to
understand for a wider audience.
Please check out comments in the code below for more... comments
Given a settings struct:
import Foundation
import AVFoundation
import CleanroomLogger
public struct AVSettings {
let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset = AVCaptureSessionPreset1280x720
let videoBitrate: Int = 2_000_000
let videoExpectedFrameRate: Int = 30
let videoMaxKeyFrameInterval: Int = 60
let audioBitrate: Int = 32 * 1024
/// Settings that are `0` mean variable rate.
/// The `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
/// to values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
mSampleRate: AVAudioSession.sharedInstance().sampleRate,
mFormatID: kAudioFormatMPEG4AAC,
mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
mBytesPerPacket: 0,
mFramesPerPacket: 1024,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0)
let audioEncoderClassDescriptions = [
AudioClassDescription(
mType: kAudioEncoderComponentType,
mSubType: kAudioFormatMPEG4AAC,
mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
}
Some helper functions:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
switch (settings.sessionPreset, settings.orientation) {
case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
default: fatalError("Unsupported session preset and orientation")
}
}
public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
var result = noErr
var absd = settings.audioOutputABSD
var description: CMAudioFormatDescription?
withUnsafePointer(&absd) { absdPtr in
result = CMAudioFormatDescriptionCreate(nil,
absdPtr,
0, nil,
0, nil,
nil,
&description)
}
if result != noErr {
Log.error?.message("Could not create audio format description")
}
return description!
}
public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
var result = noErr
var description: CMVideoFormatDescription?
let (width, height) = getVideoDimensions(fromSettings: settings)
result = CMVideoFormatDescriptionCreate(nil,
kCMVideoCodecType_H264,
Int32(width),
Int32(height),
[:],
&description)
if result != noErr {
Log.error?.message("Could not create video format description")
}
return description!
}
This is how the asset writer is initialized:
guard let audioDevice = defaultAudioDevice() else
{ throw RecordError.MissingDeviceFeature("Microphone") }
guard let videoDevice = defaultVideoDevice(.Back) else
{ throw RecordError.MissingDeviceFeature("Camera") }
let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)
let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
outputSettings: nil,
sourceFormatHint: videoFormatHint)
let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
outputSettings: nil,
sourceFormatHint: audioFormatHint)
writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true
let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
.URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
.URLByAppendingPathExtension("mp4")
let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
throw RecordError.Unknown("Could not add video input") }
if !assetWriter.canAddInput(writerAudioInput) {
throw RecordError.Unknown("Could not add audio input") }
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
And this is how audio samples are being encoded, problem area is most likely to
be around here. I've re-written this so that it doesn't use any Rx-isms.
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)
var converter: AudioConverter?
// Indicates whether priming information has been attached to the first buffer
var primed = false
func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
// Create the audio converter if it's not available
if converter == nil {
var classDescriptions = settings.audioEncoderClassDescriptions
var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory
var outputABSD = settings.audioOutputABSD
outputABSD.mSampleRate = inputABSD.mSampleRate
outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame
var converter: AudioConverterRef = nil
var result = noErr
result = withUnsafePointer(&outputABSD) { outputABSDPtr in
return withUnsafePointer(&inputABSD) { inputABSDPtr in
return AudioConverterNewSpecific(inputABSDPtr,
outputABSDPtr,
UInt32(classDescriptions.count),
&classDescriptions,
&converter)
}
}
if result != noErr { throw RecordError.Unknown }
// At this point I made an attempt to retrieve priming info from
// the audio converter assuming that it will give me back default values
// I can use, but ended up with `nil`
var primeInfo: AudioConverterPrimeInfo? = nil
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
// The following returns a `noErr` but `primeInfo` is still `nil``
AudioConverterGetProperty(converter,
kAudioConverterPrimeInfo,
&primeInfoSize,
&primeInfo)
// I've also tried to set `kAudioConverterPrimeInfo` so that it knows
// the leading frames that are being primed, but the set didn't seem to work
// (`noErr` but getting the property afterwards still returned `nil`)
}
let converter = converter!
// Need to give a big enough output buffer.
// The assumption is that it will always be <= to the input size
let numSamples = CMSampleBufferGetNumSamples(buffer)
// This becomes 1024 * 2 = 2048
let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)
defer {
outputBufferPtr.destroy()
outputBufferPtr.dealloc(1)
}
var result = noErr
var outputPacketCount = UInt32(1)
var outputData = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: outputABSD.mChannelsPerFrame,
mDataByteSize: UInt32(outputBufferSize),
mData: outputBufferPtr))
// See below for `EncodeAudioUserData`
var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
inputBytesPerPacket: inputABSD.mBytesPerPacket)
withUnsafeMutablePointer(&userData) { userDataPtr in
// See below for `fetchAudioProc`
result = AudioConverterFillComplexBuffer(
converter,
fetchAudioProc,
userDataPtr,
&outputPacketCount,
&outputData,
nil)
}
if result != noErr {
Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
return nil
}
// See below for `CMSampleBufferCreateCopy`
guard let newBuffer = CMSampleBufferCreateCopy(buffer,
fromAudioBufferList: &outputData,
newFromatDescription: outputFormatDescription) else {
Log.error?.message("Could not create sample buffer from audio buffer list")
return nil
}
if !primed {
primed = true
// Simply picked 2112 samples based on convention, is there a better way to determine this?
let samplesToPrime: Int64 = 2112
let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)
// Without setting the attachment the asset writer will complain about the
// first buffer missing the `TrimDurationAtStart` attachment, is there are way
// to infer the value from the given `AudioBufferList`?
CMSetAttachment(newBuffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, nil),
kCMAttachmentMode_ShouldNotPropagate)
}
return newBuffer
}
Below is the proc that fetches samples for the audio converter, and the data
structure that gets passed to it:
private class EncodeAudioUserData {
var inputSampleBuffer: CMSampleBuffer?
var inputBytesPerPacket: UInt32
init(inputSampleBuffer: CMSampleBuffer,
inputBytesPerPacket: UInt32) {
self.inputSampleBuffer = inputSampleBuffer
self.inputBytesPerPacket = inputBytesPerPacket
}
}
private let fetchAudioProc: AudioConverterComplexInputDataProc = {
(inAudioConverter,
ioDataPacketCount,
ioData,
outDataPacketDescriptionPtrPtr,
inUserData) in
var result = noErr
if ioDataPacketCount.memory == 0 { return noErr }
let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory
// If its already been processed
guard let buffer = userData.inputSampleBuffer else {
ioDataPacketCount.memory = 0
return -1
}
var inputBlockBuffer: CMBlockBuffer?
var inputBufferList = AudioBufferList()
result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
buffer,
nil,
&inputBufferList,
sizeof(AudioBufferList),
nil,
nil,
0,
&inputBlockBuffer)
if result != noErr {
Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
ioDataPacketCount.memory = 0
return result
}
let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
ioDataPacketCount.memory = packetsCount
ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData
if outDataPacketDescriptionPtrPtr != nil {
outDataPacketDescriptionPtrPtr.memory = nil
}
return noErr
}
This is how I am converting AudioBufferLists to CMSampleBuffers:
public func CMSampleBufferCreateCopy(
buffer: CMSampleBuffer,
inout fromAudioBufferList bufferList: AudioBufferList,
newFromatDescription formatDescription: CMFormatDescription? = nil)
-> CMSampleBuffer? {
var result = noErr
var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
// Copy timing info from the previous buffer
var timingInfo = CMSampleTimingInfo()
result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
if result != noErr { return nil }
var newBuffer: CMSampleBuffer?
result = CMSampleBufferCreateReady(
kCFAllocatorDefault,
nil,
formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
Int(bufferList.mNumberBuffers),
1, &timingInfo,
1, &sizeArray,
&newBuffer)
if result != noErr { return nil }
guard let b = newBuffer else { return nil }
CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
return newBuffer
}
Is there anything that I am obviously doing wrong? Is there a proper way to
construct CMSampleBuffers from AudioBufferList? How do you transfer priming
information from the converter to CMSampleBuffers that you create?
For my use case I need to do the encoding manually as the buffers will be
manipulated further down the pipeline (although I've disabled all
transformations after the encode in order to make sure that it works.)
Any help would be much appreciated. Sorry that there's so much code to
digest, but I wanted to provide as much context as possible.
Thanks in advance :)
Some related questions:
CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
Can I use AVCaptureSession to encode an AAC stream to memory?
Writing video + generated audio to AVAssetWriterInput, audio stuttering
How do I use CoreAudio's AudioConverter to encode AAC in real-time?
Some references I've used:
Apple sample code demonstrating how to use AudioConverter
Note describing AAC encoder delay
Turns out there were a variety of things that I was doing wrong. Instead of posting a garble of code, I'm going to try to organize this into bite-sized pieces of things that I discovered.
Samples vs Packets vs Frames
This had been a huge source of confusion for me:
Each CMSampleBuffer can contain 1 or more samples (discovered via CMSampleBufferGetNumSamples)
Each CMSampleBuffer that contains 1 sample represents a single audio packet.
Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and frame count.
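As a quick sanity check of that relationship, the LPCM case can be computed directly. This is just an illustrative sketch (the `LPCMFormat` type and the constants are mine, not from the question):

```swift
// For linear PCM, every packet holds exactly mFramesPerPacket frames (1 for LPCM),
// and each frame is mBytesPerFrame bytes, so buffer size is fully determined.
struct LPCMFormat {
    let framesPerPacket: Int   // 1 for LPCM
    let bytesPerFrame: Int     // bytesPerSample * channelCount
}

func bufferSize(format: LPCMFormat, frames: Int) -> Int {
    // total size = frames * bytes per frame (packets == frames when framesPerPacket == 1)
    return frames * format.bytesPerFrame
}

// 16-bit stereo PCM: 2 bytes/sample * 2 channels = 4 bytes per frame
let stereo16 = LPCMFormat(framesPerPacket: 1, bytesPerFrame: 4)
print(bufferSize(format: stereo16, frames: 1024)) // 4096
```

For a compressed format like AAC no such formula exists; you have to read each packet's size from its AudioStreamPacketDescription.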
AudioConverterComplexInputDataProc
This callback is used to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time push-style encoding, I needed to ensure that each data push contains at least that minimum number of packets. Something like this (pseudo-code):
let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalPacketCount(buffers) < minimumPackets {
buffers = buffers + [getNextBuffer()]
}
AudioConverterFillComplexBuffer(...)
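Modeling each pending buffer by just its packet count, that accumulation loop can be fleshed out like this. It's a sketch under the assumption that the input is LPCM (1 frame per packet), and the 341-ish render sizes are illustrative:

```swift
// Accumulate pushed buffers until we hold at least the minimum number of
// input packets the converter will ask for in one callback.
func accumulate(minimumPackets: Int, nextBuffer: () -> Int) -> [Int] {
    var pending: [Int] = []   // each element: packet count of one pending buffer
    var total = 0
    while total < minimumPackets {
        let packets = nextBuffer()
        pending.append(packets)
        total += packets
    }
    return pending
}

// AAC output is 1024 frames/packet; LPCM input is 1 frame/packet,
// so we need at least 1024 input packets before calling the converter.
let incoming = [341, 341, 342, 341]   // e.g. remote I/O render sizes
var i = 0
let batch = accumulate(minimumPackets: 1024) { defer { i += 1 }; return incoming[i] }
print(batch, batch.reduce(0, +)) // [341, 341, 342] 1024
```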
Slicing CMSampleBuffers
You can actually slice CMSampleBuffers if they contain multiple samples. The tool for this is CMSampleBufferCopySampleBufferForRange. This is nice because you can provide the AudioConverterComplexInputDataProc with the exact number of packets it asks for, which makes handling timing information for the resulting encoded buffer easier. If you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
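The duration skew described above is easy to verify numerically (illustrative numbers, assuming a 44.1 kHz sample rate):

```swift
// Duration of an audio buffer is frames / sampleRate.
func duration(frames: Int, sampleRate: Double) -> Double {
    return Double(frames) / sampleRate
}

let expected = duration(frames: 1024, sampleRate: 44100) // what the converter consumed
let supplied = duration(frames: 1500, sampleRate: 44100) // what you handed it
// The encoded buffer reports 1024/44100 even though you supplied 1500/44100;
// the extra 476 frames of timing silently go missing unless you slice first.
print(supplied - expected > 0) // true
```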
Priming and trim duration
When doing AAC encoding, you must set the trim duration like so:
CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
One thing I did wrong was that I added the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.
Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:
Priming frames: 2112
Sample rate: 44100
Priming duration: 2112 / 44100 = ~0.0479s
First frame, frames: 1024, priming duration: 1024 / 44100
Second frame, frames: 1088, priming duration: 1088 / 44100
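Under that rule, each buffer's trim is simply the remaining priming, capped at the buffer's own frame count. A minimal sketch of that bookkeeping (the buffer sizes below are illustrative):

```swift
// Distribute total priming frames across successive output buffers:
// each buffer trims min(remaining priming, its own frame count),
// so the trim duration never exceeds the buffer's duration.
func trimFrames(priming: Int, bufferFrames: [Int]) -> [Int] {
    var remaining = priming
    return bufferFrames.map { frames in
        let trim = min(remaining, frames)
        remaining -= trim
        return trim
    }
}

// 2112 priming frames spread over buffers of 1024, 1088, and 1024 frames:
print(trimFrames(priming: 2112, bufferFrames: [1024, 1088, 1024])) // [1024, 1088, 0]
```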
Creating the new CMSampleBuffer
AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr. You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:
let bufferList: AudioBufferList
var packetDescriptions: [AudioStreamPacketDescription] // must be var to pass as &packetDescriptions
var newBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(
kCFAllocatorDefault, // allocator
nil, // dataBuffer
false, // dataReady
nil, // makeDataReadyCallback
nil, // makeDataReadyRefCon
formatDescription, // formatDescription
Int(bufferList.mNumberBuffers), // numSamples
CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
&packetDescriptions, // packetDescriptions
&newBuffer)

The sound muted after playing audio with Audio Queue on iOS for a while

I am writing a real-time audio playback program on iOS.
It receives audio RTP packets from the peer and puts them into an audio queue for playback.
When playback starts, the sound is fine. But after 1 or 2 minutes the sound goes mute, with no error reported from the AudioQueue API. The callback function keeps being called normally; nothing looks abnormal.
It just goes silent.
My callback function:
1: Loop until there is enough data to copy to the audio queue buffer
do
{
read_bytes_enabled = g_audio_playback_buf.GetReadByteLen();
if (read_bytes_enabled >= kAudioQueueBufferLength)
{
break;
}
usleep(10*1000);
}
while (true);
2: Copy to the AudioQueue buffer and enqueue it. This callback function keeps running normally with no error.
//copy to audio queue buffer
read_bytes = kAudioQueueBufferLength;
g_audio_playback_buf.Read((unsigned char *)inBuffer->mAudioData, read_bytes);
WriteLog(LOG_PHONE_DEBUG, "AudioQueueBuffer(Play): copy [%d] bytes to AudioQueue buffer! Total len = %d", read_bytes, read_bytes_enabled);
inBuffer->mAudioDataByteSize = read_bytes;
UInt32 nPackets = read_bytes / g_audio_periodsize; // mono
inBuffer->mPacketDescriptionCount = nPackets;
// re-enqueue this buffer
AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
The problem has been resolved.
The key point is that you cannot make the audio queue buffer wait; you must keep feeding it, or it may go mute. If you don't have enough data, fill the buffer with silence.
so the following code should be changed:
do
{
read_bytes_enabled = g_audio_playback_buf.GetReadByteLen();
if (read_bytes_enabled >= kAudioQueueBufferLength)
{
break;
}
usleep(10*1000);
}
while (true);
changed to this:
read_bytes_enabled = g_audio_playback_buf.GetReadByteLen();
if (read_bytes_enabled < kAudioQueueBufferLength)
{
    // Not enough data: fill the buffer with silence instead of waiting
    memset(inBuffer->mAudioData, 0x00, kAudioQueueBufferLength);
}
else
{
    // Enough data: copy from the playback buffer as before
    g_audio_playback_buf.Read((unsigned char *)inBuffer->mAudioData, kAudioQueueBufferLength);
}
inBuffer->mAudioDataByteSize = kAudioQueueBufferLength;
...
You can let the AudioQueue wait if you use AudioQueuePause.
In this example, in Swift 5, I use a generic queue. When that queue is empty, I fill my buffer with empty data in the callback, as you did, and call AudioQueuePause. It's important to note that every AudioQueueBuffer sent to the AudioQueueRef with AudioQueueEnqueueBuffer before AudioQueuePause is called will still be played.
Create a UserData class to pass everything you need to your callback:
class UserData {
let dataQueue = Queue<Data>()
let semaphore = DispatchSemaphore(value: 1)
}
private var inQueue: AudioQueueRef!
private var userData = UserData()
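The generic Queue<Data> type isn't shown in the answer; assuming it only needs FIFO semantics, a minimal sketch could be:

```swift
// A minimal FIFO queue; the answer's Queue<Data> presumably looks similar.
// Note: thread safety is provided externally by the semaphore in UserData.
final class Queue<T> {
    private var elements: [T] = []
    var isEmpty: Bool { return elements.isEmpty }
    func enqueue(_ element: T) { elements.append(element) }
    func dequeue() -> T? {
        return elements.isEmpty ? nil : elements.removeFirst()
    }
}

let q = Queue<Int>()
q.enqueue(1); q.enqueue(2)
print(q.dequeue() ?? -1, q.isEmpty) // 1 false
```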
Pass an instance of this class when you create your AudioQueue, and start it:
AudioQueueNewOutput(&inFormat, audioQueueOutputCallback, &userData, nil, nil, 0, &inQueue)
AudioQueueStart(inQueue, nil)
Generate all your buffers, and don't enqueue them directly: call your callback function instead:
for _ in 0...2 {
var bufferRef: AudioQueueBufferRef!
AudioQueueAllocateBuffer(inQueue, 320, &bufferRef)
audioQueueOutputCallback(&userData, inQueue, bufferRef)
}
When you receive audio data, you can call a method that enqueues the data and lets the callback function pick it up:
func audioReceived(_ audio: Data) {
let dataQueue = userData.dataQueue
let semaphore = userData.semaphore
semaphore.wait()
dataQueue.enqueue(audio)
semaphore.signal()
// Start the AudioQueue every time; if it's already started, this call does nothing
AudioQueueStart(inQueue, nil)
}
Finally, you can implement a callback function like this:
private let audioQueueOutputCallback: AudioQueueOutputCallback = { (inUserData, inAQ, inBuffer) in
// Get our UserData back from the UnsafeMutableRawPointer
let userData: UserData = (inUserData!.bindMemory(to: UserData.self, capacity: 1).pointee)
let queue = userData.dataQueue
let semaphore = userData.semaphore
// bind UnsafeMutableRawPointer to UnsafeMutablePointer<UInt8> for data copy
let audioBuffer = inBuffer.pointee.mAudioData.bindMemory(to: UInt8.self, capacity: 320)
if queue.isEmpty {
print("Queue is empty: pause")
AudioQueuePause(inAQ)
audioBuffer.assign(repeating: 0, count: 320)
inBuffer.pointee.mAudioDataByteSize = 320
} else {
semaphore.wait()
if let data = queue.dequeue() {
data.copyBytes(to: audioBuffer, count: data.count)
inBuffer.pointee.mAudioDataByteSize = UInt32(data.count)
} else {
print("Error: queue is empty")
semaphore.signal()
return
}
semaphore.signal()
}
AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, nil)
}
In my case I use a 320-byte buffer for 20 ms of 16-bit, 8 kHz, mono PCM data.
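That 320-byte figure follows directly from the format (samples per buffer × bytes per sample × channels); a quick check:

```swift
// Bytes needed for a PCM buffer of a given duration.
func pcmBytes(sampleRate: Double, seconds: Double, bytesPerSample: Int, channels: Int) -> Int {
    // round to the nearest whole sample count before multiplying
    return Int((sampleRate * seconds).rounded()) * bytesPerSample * channels
}

// 8 kHz, 20 ms, 16-bit (2 bytes), mono -> 320 bytes
print(pcmBytes(sampleRate: 8000, seconds: 0.02, bytesPerSample: 2, channels: 1)) // 320
```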
This solution is more complex, but kinder to your CPU than a pseudo-infinite loop of empty audio data. Apple is very punitive with greedy apps ;)
I hope this solution helps.

Resources