Read UInt32 from InputStream - iOS

I need to communicate with a server that has a special message format: each message begins with 4 bytes (together an unsigned long / UInt32 in big-endian format) which determine the length of the message that follows. After those 4 bytes, the message itself is sent as a plain string.
So I first need to read 4 bytes into an integer (32-bit unsigned). In Java I do it like this:
DataInputStream is;
...
int len = is.readInt();
How can I do this in Swift 4?
At the moment I use
var lengthbuffer = [UInt8](repeating: 0, count: 4)
let bytecount = istr.read(&lengthbuffer, maxLength: 4)
let lengthbytes = lengthbuffer[0...3]
let bigEndianValue = lengthbytes.withUnsafeBufferPointer {
($0.baseAddress!.withMemoryRebound(to: UInt32.self, capacity: 1) { $0 })
}.pointee
let bytes_expected = Int(UInt32(bigEndian: bigEndianValue))
But this does not look like the most elegant way. Furthermore, sometimes (I cannot reproduce it reliably) a wrong value is read (far too big). When I then try to allocate memory for the following message, the app crashes:
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bytes_expected)
let bytes_read = istr.read(buffer, maxLength: bytes_expected)
So what is the Swift way to read a UInt32 from an InputStream?
EDIT:
My current code (I implemented suggestions from the comments, thanks!) looks like this:
private let inputStreamAccessQueue = DispatchQueue(label: "SynchronizedInputStreamAccess") // NOT concurrent!!!
// This is called on Stream.Event.hasBytesAvailable
func handleInput() {
    self.inputStreamAccessQueue.sync(flags: .barrier) {
        guard let istr = self.inputStream, istr.hasBytesAvailable else {
            log.error(self.buildLogMessage("handleInput() called when inputstream has no bytes available"))
            return
        }

        let lengthbuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: 4)
        defer { lengthbuffer.deallocate(capacity: 4) }

        let lenbytes_read = istr.read(lengthbuffer, maxLength: 4)
        guard lenbytes_read == 4 else {
            self.errorHandler(NetworkingError.InputError("Input Stream received \(lenbytes_read) (!=4) bytes"))
            return
        }

        let bytes_expected = Int(UnsafeRawPointer(lengthbuffer).load(as: UInt32.self).bigEndian)
        log.info(self.buildLogMessage("expect \(bytes_expected) bytes"))

        let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bytes_expected)
        let bytes_read = istr.read(buffer, maxLength: bytes_expected)
        guard bytes_read == bytes_expected else {
            print("Error: Expected \(bytes_expected) bytes, read \(bytes_read)")
            return
        }

        guard let message = String(bytesNoCopy: buffer, length: bytes_expected, encoding: .utf8, freeWhenDone: true) else {
            log.error("ERROR WHEN READING")
            return
        }

        self.handleMessage(message)
    }
}
This works most of the time, but sometimes istr.read() does not read bytes_expected bytes and instead returns bytes_read < bytes_expected. This results in another hasBytesAvailable event, and handleInput() is called again. This time, of course, the first 4 bytes read do not contain the length of a new message but some content of the previous message. My code does not know that, so the first bytes are interpreted as the length. In many cases this is a very large value => allocating too much memory => crash.
I think this is the explanation for the bug. But how to solve it?
Should I call read() on the stream in a loop while hasBytesAvailable == true? Is there maybe a better solution?
I would assume that when I loop, the hasBytesAvailable event would still fire after every read() => handleInput() would still be called again too early... How can I avoid this?
EDIT 2: I have implemented the loop now; unfortunately, it is still crashing with the same error (and probably for the same reason). Relevant code:
let bytes_expected = Int(UnsafeRawPointer(lengthbuffer).load(as: UInt32.self).bigEndian)

var message = ""
var bytes_missing = bytes_expected
while bytes_missing > 0 {
    print("missing", bytes_missing)
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bytes_missing)
    let bytes_read = istr.read(buffer, maxLength: bytes_missing)

    guard bytes_read > 0 else {
        print("bytes_read not <= 0: \(bytes_read)")
        return
    }

    guard bytes_read <= bytes_missing else {
        print("Read more bytes than expected. missing=\(bytes_missing), read=\(bytes_read)")
        return
    }

    guard let partial_message = String(bytesNoCopy: buffer, length: bytes_expected, encoding: .utf8, freeWhenDone: true) else {
        log.error("ERROR WHEN READING")
        return
    }

    message = message + partial_message
    bytes_missing -= bytes_read
}
My console output when it crashes:
missing 1952807028
malloc: *** mach_vm_map(size=1952808960) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
So it seems that the whole handleInput() method is called too early, although I use the barrier! What am I doing wrong?

I'd do it like this (ready to be pasted into a playground):
import Foundation
var stream = InputStream(data: Data([0,1,0,0]))
stream.open()
defer { stream.close() }
var buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: 4)
defer { buffer.deallocate(capacity: 4) }
guard stream.read(buffer, maxLength: 4) >= 4 else {
    // handle all cases: end of stream, error, waiting for more data to arrive...
    fatalError()
}
let number = UnsafeRawPointer(buffer).load(as: UInt32.self)
number // 256
number.littleEndian // 256
number.bigEndian // 65536
Using UnsafeRawPointer.load directly (without explicit rebinding) is safe for trivial types according to the documentation. Trivial types are generally those that don't require ARC operations.
Alternatively, you can access the same memory as a different type without rebinding through untyped memory access, so long as the bound type and the destination type are trivial types.
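If you want to avoid unsafe pointers altogether, a short alternative (my own sketch, not part of the original answer) is to assemble the big-endian value from the question's lengthbuffer by hand, which also sidesteps any alignment concerns:

let lengthbuffer: [UInt8] = [0, 1, 0, 0] // e.g. the four bytes read from the stream
// Fold the bytes most-significant-first, which is exactly the big-endian interpretation.
let length = lengthbuffer.prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
// length == 65536 for the bytes above, matching number.bigEndian in the playground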

I would suggest load(as:) to convert the buffer to a UInt32, and I would make the endianness explicit, e.g.
let value = try stream.read(type: UInt32.self, endianness: .little)
Where:
enum InputStreamError: Error {
    case readFailure
}

enum Endianness {
    case little
    case big
}

extension InputStream {
    func read<T: FixedWidthInteger>(type: T.Type, endianness: Endianness = .little) throws -> T {
        let size = MemoryLayout<T>.size
        var buffer = [UInt8](repeating: 0, count: size)
        let count = read(&buffer, maxLength: size)
        guard count == size else {
            throw InputStreamError.readFailure
        }

        return buffer.withUnsafeBytes { pointer -> T in
            switch endianness {
            case .little: return T(littleEndian: pointer.load(as: T.self))
            case .big:    return T(bigEndian: pointer.load(as: T.self))
            }
        }
    }

    func readFloat(endianness: Endianness) throws -> Float {
        return try Float(bitPattern: read(type: UInt32.self, endianness: endianness))
    }

    func readDouble(endianness: Endianness) throws -> Double {
        return try Double(bitPattern: read(type: UInt64.self, endianness: endianness))
    }
}
Note, I made read(type:endianness:) a generic, so it can be reused with any of the standard integer types. I have also thrown in readFloat and readDouble for good measure.
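For the asker's protocol (a 4-byte big-endian length followed by a UTF-8 payload), a usage sketch on top of that extension could look like this. Everything below is illustrative, not part of the original answer, and it loops on read(_:maxLength:) because a single call may return fewer bytes than requested:

func readMessage(from stream: InputStream) throws -> String {
    // 1. Read the 4-byte big-endian length prefix.
    let length = Int(try stream.read(type: UInt32.self, endianness: .big))

    // 2. Read exactly `length` payload bytes, accumulating across partial reads.
    var payload = Data()
    var chunk = [UInt8](repeating: 0, count: 1024)
    while payload.count < length {
        let wanted = min(chunk.count, length - payload.count)
        let count = stream.read(&chunk, maxLength: wanted)
        guard count > 0 else { throw InputStreamError.readFailure }
        payload.append(contentsOf: chunk[0..<count])
    }

    // 3. Decode the payload as UTF-8.
    guard let message = String(data: payload, encoding: .utf8) else {
        throw InputStreamError.readFailure
    }
    return message
}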

Xcode 14 UnsafePointer.withMemoryRebound throwing EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP) during unit tests of unzip protocol

Hello!
I'm trying to run unit tests on a piece of code, but I'm getting an EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP) exception when the following code is executed:
/// Decompresses the data using the gzip deflate algorithm. Self is expected to be a gzip deflate
/// stream according to [RFC-1952](https://tools.ietf.org/html/rfc1952).
/// - returns: uncompressed data
func gunzip() -> Data?
{
// 10 byte header + data + 8 byte footer. See https://tools.ietf.org/html/rfc1952#section-2
let overhead = 10 + 8
guard count >= overhead else { return nil }
typealias GZipHeader = (id1: UInt8, id2: UInt8, cm: UInt8, flg: UInt8, xfl: UInt8, os: UInt8)
let hdr: GZipHeader = withUnsafeBytes { (ptr: UnsafePointer<UInt8>) -> GZipHeader in
// +---+---+---+---+---+---+---+---+---+---+
// |ID1|ID2|CM |FLG| MTIME |XFL|OS |
// +---+---+---+---+---+---+---+---+---+---+
return (id1: ptr[0], id2: ptr[1], cm: ptr[2], flg: ptr[3], xfl: ptr[8], os: ptr[9])
}
typealias GZipFooter = (crc32: UInt32, isize: UInt32)
let ftr: GZipFooter = withUnsafeBytes { (bptr: UnsafePointer<UInt8>) -> GZipFooter in
// +---+---+---+---+---+---+---+---+
// | CRC32 | ISIZE |
// +---+---+---+---+---+---+---+---+
return bptr.advanced(by: count - 8).withMemoryRebound(to: UInt32.self, capacity: 2) { ptr in
return (ptr[0].littleEndian, ptr[1].littleEndian)
}
}
// Wrong gzip magic or unsupported compression method
guard hdr.id1 == 0x1f && hdr.id2 == 0x8b && hdr.cm == 0x08 else { return nil }
let has_crc16: Bool = hdr.flg & 0b00010 != 0
let has_extra: Bool = hdr.flg & 0b00100 != 0
let has_fname: Bool = hdr.flg & 0b01000 != 0
let has_cmmnt: Bool = hdr.flg & 0b10000 != 0
let cresult: Data? = withUnsafeBytes { (ptr: UnsafePointer<UInt8>) -> Data? in
var pos = 10 ; let limit = count - 8
if has_extra {
pos += ptr.advanced(by: pos).withMemoryRebound(to: UInt16.self, capacity: 1) {
return Int($0.pointee.littleEndian) + 2 // +2 for xlen
}
}
if has_fname {
while pos < limit && ptr[pos] != 0x0 { pos += 1 }
pos += 1 // skip null byte as well
}
if has_cmmnt {
while pos < limit && ptr[pos] != 0x0 { pos += 1 }
pos += 1 // skip null byte as well
}
if has_crc16 {
pos += 2 // ignoring header crc16
}
guard pos < limit else { return nil }
let config = (operation: COMPRESSION_STREAM_DECODE, algorithm: COMPRESSION_ZLIB)
return perform(config, source: ptr.advanced(by: pos), sourceSize: limit - pos)
}
guard let inflated = cresult else { return nil }
guard ftr.isize == UInt32(truncatingIfNeeded: inflated.count) else { return nil }
guard ftr.crc32 == inflated.crc32().checksum else { return nil }
return inflated
}
fileprivate extension Data {
func withUnsafeBytes<ResultType, ContentType>(_ body: (UnsafePointer<ContentType>) throws -> ResultType) rethrows -> ResultType
{
return try self.withUnsafeBytes({ (rawBufferPointer: UnsafeRawBufferPointer) -> ResultType in
return try body(rawBufferPointer.bindMemory(to: ContentType.self).baseAddress!)
})
}
}
This portion of code always throws the mentioned exception.
The problem has been happening consistently. The code has not been touched in 2 years, and it does not fail when running in a cloud pipeline (Azure) with macOS 10.15.
It fails on multiple macOS devices running Monterey (12.6) and Xcode 14.
Does anyone have any thoughts on what could be causing it?
I know the problem is pretty vague, but that's the only context I have.
Thanks
Based on the code snippet you provided, there is no obvious error that would cause the EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP) exception.
It seems that the error is related to memory management: the code uses raw pointers, and the memory that bptr points to may not be valid, or may not match the type specified in the withMemoryRebound call.
One possible cause could be that the code expects a specific memory layout or alignment that is no longer guaranteed on the newer versions of macOS and Xcode. There are a few things you could try:
Update the code to use the latest Swift syntax and features; this will help you avoid deprecated methods and pointer-related errors.
Run the code on an older version of macOS and Xcode to see if that resolves the issue.
Check whether any system library is missing from the environment or is outdated.
Run the code on a simulator to see whether the problem is related to the specific device or the operating system version.
Use a debugging tool such as lldb to trace which instruction is causing the error; this will help you understand the problem and fix it.
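One concrete suspect in the snippet is the footer read: bptr.advanced(by: count - 8) is not guaranteed to be 4-byte aligned, and rebinding or loading a UInt32 at an unaligned address can trap on newer toolchains. If that turns out to be the cause, a sketch of an alternative (my own illustration, not the original code; requires Swift 5.7+ for loadUnaligned) that avoids withMemoryRebound entirely:

import Foundation

extension Data {
    /// Reads a little-endian UInt32 at `offset` without assuming any alignment.
    func readLEUInt32(at offset: Int) -> UInt32 {
        return withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> UInt32 in
            UInt32(littleEndian: raw.loadUnaligned(fromByteOffset: offset, as: UInt32.self))
        }
    }
}

// Footer = last 8 bytes of the gzip stream: CRC32 followed by ISIZE, both little-endian.
// let ftr = (crc32: data.readLEUInt32(at: data.count - 8),
//            isize: data.readLEUInt32(at: data.count - 4))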

Using CoreBluetooth CBL2CAPChannel to move data

I have set up a data-transfer function, using CoreBluetooth CBL2CAPChannel, in a Swift iOS app. Here is the function for sending data:
func sendData(_ outStream: OutputStream) -> Bool {
let data = tranferBlock! // tranferBlock holds some .utf8 data.
let bytesWritten = data.withUnsafeBytes {outStream.write($0, maxLength: data.count)}
if bytesWritten > 0 {
tranferBlock = nil
return true
}
return false
}
And here is the function for receiving data:
func receiveData(_ inStream: InputStream) {
let bufLength = 1024
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufLength)
let bytesRead = inStream.read(buffer, maxLength: bufLength)
if let string = String(bytesNoCopy: buffer,
length: bytesRead,
encoding: .utf8,
freeWhenDone: false) {
print("READ-L2CAPData:\(bytesRead):\(string)")
if bytesRead != 0 {
receiveBuffer = string // receiveBuffer holds the received data.
let ntfc = NotificationCenter.default
ntfc.post(name:Notification.Name(rawValue: "DATACAMEIN"),
object: nil, userInfo: nil)
}
}
}
I can say it is working, because it allows me to transfer data. But there is one issue: I do not receive the data the same way I send it. For example, let us say I send the data in 3 packets, like this:
.......
tranferBlock = Data("aaaAAAaaa".utf8)
sendData(outStream)
tranferBlock = Data("bbbBBBbbb".utf8)
sendData(outStream)
tranferBlock = Data("cccCCCccc".utf8)
sendData(outStream)
.......
I would expect, in that case, to receive the data in 3 packets, at about the same pace I sent them:
aaaAAAaaa
bbbBBBbbb
cccCCCccc
But instead I receive one packet:
aaaAAAaaabbbBBBbbbcccCCCccc
And very often, for some reason I do not understand, the data only arrives when I kill the sending app.
I would like to know what I need to change to receive the data the way I expect.
Stream-based protocols do not provide the framing you're after. You can't rely on how bytes are coalesced into packets in the underlying protocol layers. If you need framing, define it at the application layer.
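For illustration (not part of the original answer), a minimal application-layer framing could reuse the 4-byte length-prefix idea from the first question. The names below are hypothetical:

import Foundation

// Sender side: prefix every payload with its byte count (big-endian).
func frame(_ payload: Data) -> Data {
    var length = UInt32(payload.count).bigEndian
    var framed = Data(bytes: &length, count: 4)
    framed.append(payload)
    return framed
}

// Receiver side: accumulate whatever arrives and only emit complete frames.
var pending = Data()
func consume(_ chunk: Data, onMessage: (Data) -> Void) {
    pending.append(chunk)
    while pending.count >= 4 {
        let length = Int(pending.prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) })
        guard pending.count >= 4 + length else { break }   // wait for the rest of this frame
        onMessage(pending.subdata(in: 4..<(4 + length)))
        pending.removeSubrange(0..<(4 + length))
    }
}

receiveData(_:) would then feed every chunk it reads into consume(_:onMessage:) instead of treating each read as one complete message.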

Why am I not getting all the bytes of the image in the server?

I am building an iOS app that takes a photo and sends it to a TCP server running on my computer. The way I'm doing it is configuring the connection with Streams like this:
func setupCommunication() {
var readStream: Unmanaged<CFReadStream>?
var writeStream: Unmanaged<CFWriteStream>?
CFStreamCreatePairWithSocketToHost(kCFAllocatorDefault,
"192.168.1.40" as CFString, 2323, &readStream, &writeStream)
outputStream = writeStream!.takeRetainedValue()
outputStream.schedule(in: .current, forMode: .common)
outputStream.open()
}
Then, when I press the camera button, the photo is taken and sent through the outputStream. Since the TCP server doesn't know how much data it has to read, the first 8 bytes correspond to the size of the image, and the image is sent right after, as we can see in this code:
func photoOutput(_ output: AVCapturePhotoOutput,
didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
if let image = photo.fileDataRepresentation() {
print(image)
print(image.count)
var nBytes = UInt64(image.count)
let nData = Data(bytes: &nBytes, count: 8)
_ = nData.withUnsafeBytes({
outputStream.write($0, maxLength: nData.count)
})
_ = image.withUnsafeBytes({
outputStream.write($0, maxLength: image.count)
})
outputStream.close()
}
}
On the server side, which is written in C, I perform the next actions:
Read first 8 bytes to know the size of the image
printf("\n[*] New client connected\n");
while (n_recv < sizeof(uint64_t)) {
if ((n = read(client_sd, buffer, BUF_SIZ)) == -1) {
printf("\n[-] Error reading data from the client\n");
close(client_sd);
close(server_sd);
return 0;
}
n_recv += n;
}
memcpy(&img_size, buffer, sizeof(uint64_t));
printf("\n[+] Client says he's going to send %llu bytes\n", img_size);
Allocate enough memory to store the received image and, if we have already read any bytes of the image along with its size, copy them.
if ((img_data = (uint8_t *) malloc(img_size)) == NULL) {
printf("\n[-] Error allocating memory for image\n");
close(client_sd);
close(server_sd);
return 0;
}
n_recv -= sizeof(uint64_t);
if (n_recv > 0) {
memcpy(img_data, buffer, n_recv);
}
From now on, n_recv is the number of bytes received of the image only, not including the first 8 bytes for the size. Then just read till the end.
while (n_recv < img_size) {
if ((n = read(client_sd, buffer, BUF_SIZ)) == -1) {
printf("\n[-] Error reading data from the client\n");
close(client_sd);
close(server_sd);
return 0;
}
memcpy(img_data + n_recv, buffer, n);
n_recv += n;
}
printf("\n[+] Data correctly recived from client\n");
close(client_sd);
close(server_sd);
This works pretty well at the beginning. In fact, I can see that I'm getting the right number for the image size every time.
However, I'm not getting the full image, and the server just stays blocked in the read function, waiting. To see what's happening, I added this
printf("%llu\n", n_recv);
inside the loop that reads the image, to watch the number of bytes received. It stops in the middle of the image, for some reason I'm not able to explain.
What's the problem that is causing the communication to stop? Is the problem in the server code or is it something related to iOS app?
First, the C code looks okay to me... but do you realize you are missing return code/result handling in Swift?
In the C code you check the return value of read to know whether the bytes were read, i.e. you check whether read returns -1.
However, in the Swift code you assume that ALL the data was written. You never check the result of the write operation on OutputStream, which tells you how many bytes were written, or returns -1 on failure.
You should be doing the same thing (after all, you did it in C). For such cases I created two extensions:
extension InputStream {
/**
* Reads from the stream into a data buffer.
* Returns the count of the amount of bytes read from the stream.
* Returns -1 if reading fails or an error has occurred on the stream.
**/
func read(data: inout Data) -> Int {
let bufferSize = 1024
var totalBytesRead = 0
while true {
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
defer { buffer.deallocate() } // deallocate at the end of each iteration (and on early return)
let count = read(buffer, maxLength: bufferSize)
if count == 0 {
return totalBytesRead
}
if count == -1 {
if let streamError = self.streamError {
debugPrint("Stream Error: \(String(describing: streamError))")
}
return -1
}
data.append(buffer, count: count)
totalBytesRead += count
}
return totalBytesRead
}
}
extension OutputStream {
/**
* Writes from a buffer into the stream.
* Returns the count of the amount of bytes written to the stream.
* Returns -1 if writing fails or an error has occurred on the stream.
**/
func write(data: Data) -> Int {
var bytesRemaining = data.count
var bytesWritten = 0
while bytesRemaining > 0 {
let count = data.withUnsafeBytes {
self.write($0.advanced(by: bytesWritten), maxLength: bytesRemaining)
}
if count == 0 {
return bytesWritten
}
if count < 0 {
if let streamError = self.streamError {
debugPrint("Stream Error: \(String(describing: streamError))")
}
return -1
}
bytesRemaining -= count
bytesWritten += count
}
return bytesWritten
}
}
Usage:
var readStream: Unmanaged<CFReadStream>?
var writeStream: Unmanaged<CFWriteStream>?
//For testing I used 127.0.0.1
CFStreamCreatePairWithSocketToHost(kCFAllocatorDefault, "192.168.1.40" as CFString, 2323, &readStream, &writeStream)
//Actually not sure if these need to be retained or unretained might be fine..
//Again, not sure..
var inputStream = readStream!.takeRetainedValue() as InputStream
var outputStream = writeStream!.takeRetainedValue() as OutputStream
inputStream.schedule(in: .current, forMode: .common)
outputStream.schedule(in: .current, forMode: .common)
inputStream.open()
outputStream.open()
var dataToWrite = Data() //Your Image
var dataRead = Data(capacity: 256) //Server response -- Pre-Allocate something large enough that you "think" you might read..
outputStream.write(data: dataToWrite)
inputStream.read(data: &dataRead)
Now you get error handling (printing) and you have buffered reading/writing. After all, you're not guaranteed that the socket or pipe or whatever the stream is attached to has read/written ALL your bytes at once, hence reading/writing in chunks.
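Tying this back to the question's protocol (an 8-byte size header followed by the image), a sketch of the sending side using the write(data:) helper above might look like this; it is an illustration, not a drop-in fix:

func send(image: Data, over outputStream: OutputStream) -> Bool {
    // 8-byte size header in native (little-endian) byte order, matching the
    // plain memcpy on the C side; both ends here are little-endian machines.
    var size = UInt64(image.count)
    let header = Data(bytes: &size, count: 8)

    // write(data:) loops until everything is written or an error occurs.
    guard outputStream.write(data: header) == header.count else { return false }
    guard outputStream.write(data: image) == image.count else { return false }
    return true
}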

Piping AudioKit Microphone to Google Speech-to-Text

I'm trying to get AudioKit to pipe the microphone to Google's Speech-to-Text API as seen here but I'm not entirely sure how to go about it.
To prepare the audio for the Speech-to-Text engine, you need to set up the encoding and pass it through in chunks. In the example Google provides, they use Apple's AVFoundation, but I'd like to use AudioKit so I can perform some pre-processing, such as cutting off low amplitudes, etc.
I believe the right way to do this is to use a Tap:
First, I should match the format by:
var asbd = AudioStreamBasicDescription()
asbd.mSampleRate = 16000.0
asbd.mFormatID = kAudioFormatLinearPCM
asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
asbd.mBytesPerPacket = 2
asbd.mFramesPerPacket = 1
asbd.mBytesPerFrame = 2
asbd.mChannelsPerFrame = 1
asbd.mBitsPerChannel = 16
AudioKit.format = AVAudioFormat(streamDescription: &asbd)!
Then create a tap such as:
open class TestTap {
internal let bufferSize: UInt32 = 1_024
@objc public init(_ input: AKNode?) {
input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
// do work here
}
}
}
But I wasn't able to figure out the right way to hand this data to the Google Speech-to-Text API via the streamAudioData method in real time with AudioKit. Perhaps I am going about this the wrong way?
UPDATE:
I've created a Tap as such:
open class TestTap {
internal var audioData = NSMutableData()
internal let bufferSize: UInt32 = 1_024
func toData(buffer: AVAudioPCMBuffer) -> NSData {
let channelCount = 2 // given PCMBuffer channel count is 2
let channels = UnsafeBufferPointer(start: buffer.floatChannelData, count: channelCount)
return NSData(bytes: channels[0], length:Int(buffer.frameCapacity * buffer.format.streamDescription.pointee.mBytesPerFrame))
}
@objc public init(_ input: AKNode?) {
input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
self.audioData.append(self.toData(buffer: buffer) as Data)
// We recommend sending samples in 100ms chunks (from Google)
let chunkSize: Int /* bytes/chunk */ = Int(0.1 /* seconds/chunk */
* AudioKit.format.sampleRate /* samples/second */
* 2 /* bytes/sample */ )
if self.audioData.length > chunkSize {
SpeechRecognitionService
.sharedInstance
.streamAudioData(self.audioData,
completion: { response, error in
if let error = error {
print("ERROR: \(error.localizedDescription)")
SpeechRecognitionService.sharedInstance.stopStreaming()
} else if let response = response {
print(response)
}
})
self.audioData = NSMutableData()
}
}
}
}
and in viewDidLoad:, I'm setting AudioKit up with:
AKSettings.sampleRate = 16_000
AKSettings.bufferLength = .shortest
However, Google complains with:
ERROR: Audio data is being streamed too fast. Please stream audio data approximately at real time.
I've tried changing multiple parameters such as the chunk size to no avail.
I found the solution here.
Final code for my Tap is:
open class GoogleSpeechToTextStreamingTap {
internal var converter: AVAudioConverter!
@objc public init(_ input: AKNode?, sampleRate: Double = 16000.0) {
let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: sampleRate, channels: 1, interleaved: false)!
self.converter = AVAudioConverter(from: AudioKit.format, to: format)
self.converter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Normal
self.converter?.sampleRateConverterQuality = .max
let sampleRateRatio = AKSettings.sampleRate / sampleRate
let inputBufferSize = 4410 // 100ms of 44.1K = 4410 samples.
input?.avAudioNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(inputBufferSize), format: nil) { buffer, time in
let capacity = Int(Double(buffer.frameCapacity) / sampleRateRatio)
let bufferPCM16 = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(capacity))!
var error: NSError? = nil
self.converter?.convert(to: bufferPCM16, error: &error) { inNumPackets, outStatus in
outStatus.pointee = AVAudioConverterInputStatus.haveData
return buffer
}
let channel = UnsafeBufferPointer(start: bufferPCM16.int16ChannelData!, count: 1)
let data = Data(bytes: channel[0], count: capacity * 2)
SpeechRecognitionService
.sharedInstance
.streamAudioData(data,
completion: { response, error in
if let error = error {
print("ERROR: \(error.localizedDescription)")
SpeechRecognitionService.sharedInstance.stopStreaming()
} else if let response = response {
print(response)
}
})
}
}
}
You can likely record using AKNodeRecorder, and pass along the buffer from the resulting AKAudioFile to the API. If you wanted more real-time, you could try installing a tap on the avAudioNode property of the AKNode you want to record and pass the buffers to the API continuously.
However, I'm curious why you see the need for pre-processing - I'm sure the Google API is plenty optimized for recordings produced by the sample code you noted.
I've had a lot of success / fun with the iOS Speech API. Not sure if there's a reason you want to go with the Google API, but I'd consider checking it out and seeing if it might better serve your needs if you haven't already.
Hope this helps!
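For completeness, a minimal sketch of the Apple Speech framework route mentioned above (names and setup are illustrative, not from the original answer; you still need the usual microphone and speech-recognition usage descriptions in Info.plist):

import AVFoundation
import Speech

final class AppleSpeechTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var task: SFSpeechRecognitionTask?

    func start(attachingTo node: AVAudioNode) {
        SFSpeechRecognizer.requestAuthorization { status in
            guard status == .authorized else { return }
            // Feed whatever buffers the tap produces straight into the request.
            node.installTap(onBus: 0, bufferSize: 1024, format: node.outputFormat(forBus: 0)) { buffer, _ in
                self.request.append(buffer)
            }
            self.task = self.recognizer?.recognitionTask(with: self.request) { result, error in
                if let result = result {
                    print(result.bestTranscription.formattedString)
                }
                if let error = error {
                    print("Speech error: \(error.localizedDescription)")
                }
            }
        }
    }

    func stop(detachingFrom node: AVAudioNode) {
        node.removeTap(onBus: 0)
        request.endAudio()
        task?.cancel()
    }
}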

AAC encoding using AudioConverter and writing to AVAssetWriter

I'm struggling to encode audio buffers received from AVCaptureSession using
AudioConverter and then append them to an AVAssetWriter.
I'm not getting any errors (including OSStatus responses), and the
CMSampleBuffers generated seem to have valid data; however, the resulting file
simply does not have any playable audio. When writing together with video, the video
frames stop getting appended a couple of frames in (appendSampleBuffer()
returns false, but with no AVAssetWriter.error), probably because the asset
writer is waiting for the audio to catch up. I suspect it's related to the way
I'm setting up the priming for AAC.
The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to
understand for a wider audience.
Please check out comments in the code below for more... comments
Given a settings struct:
import Foundation
import AVFoundation
import CleanroomLogger
public struct AVSettings {
let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset = AVCaptureSessionPreset1280x720
let videoBitrate: Int = 2_000_000
let videoExpectedFrameRate: Int = 30
let videoMaxKeyFrameInterval: Int = 60
let audioBitrate: Int = 32 * 1024
/// Settings that are `0` mean variable rate.
/// The `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
/// to values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
mSampleRate: AVAudioSession.sharedInstance().sampleRate,
mFormatID: kAudioFormatMPEG4AAC,
mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
mBytesPerPacket: 0,
mFramesPerPacket: 1024,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0)
let audioEncoderClassDescriptions = [
AudioClassDescription(
mType: kAudioEncoderComponentType,
mSubType: kAudioFormatMPEG4AAC,
mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
}
Some helper functions:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
switch (settings.sessionPreset, settings.orientation) {
case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
default: fatalError("Unsupported session preset and orientation")
}
}
public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
var result = noErr
var absd = settings.audioOutputABSD
var description: CMAudioFormatDescription?
withUnsafePointer(&absd) { absdPtr in
result = CMAudioFormatDescriptionCreate(nil,
absdPtr,
0, nil,
0, nil,
nil,
&description)
}
if result != noErr {
Log.error?.message("Could not create audio format description")
}
return description!
}
public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
var result = noErr
var description: CMVideoFormatDescription?
let (width, height) = getVideoDimensions(fromSettings: settings)
result = CMVideoFormatDescriptionCreate(nil,
kCMVideoCodecType_H264,
Int32(width),
Int32(height),
[:],
&description)
if result != noErr {
Log.error?.message("Could not create video format description")
}
return description!
}
This is how the asset writer is initialized:
guard let audioDevice = defaultAudioDevice() else
{ throw RecordError.MissingDeviceFeature("Microphone") }
guard let videoDevice = defaultVideoDevice(.Back) else
{ throw RecordError.MissingDeviceFeature("Camera") }
let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)
let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
outputSettings: nil,
sourceFormatHint: videoFormatHint)
let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
outputSettings: nil,
sourceFormatHint: audioFormatHint)
writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true
let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
.URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
.URLByAppendingPathExtension("mp4")
let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
throw RecordError.Unknown("Could not add video input") }
if !assetWriter.canAddInput(writerAudioInput) {
throw RecordError.Unknown("Could not add audio input") }
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
And this is how audio samples are being encoded; the problem area is most likely
around here. I've re-written this so that it doesn't use any Rx-isms.
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)
var converter: AudioConverter?
// Indicates whether priming information has been attached to the first buffer
var primed = false
func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
// Create the audio converter if it's not available
if converter == nil {
var classDescriptions = settings.audioEncoderClassDescriptions
var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory
var outputABSD = settings.audioOutputABSD
outputABSD.mSampleRate = inputABSD.mSampleRate
outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame
var converter: AudioConverterRef = nil
var result = noErr
result = withUnsafePointer(&outputABSD) { outputABSDPtr in
return withUnsafePointer(&inputABSD) { inputABSDPtr in
return AudioConverterNewSpecific(inputABSDPtr,
outputABSDPtr,
UInt32(classDescriptions.count),
&classDescriptions,
&converter)
}
}
if result != noErr { throw RecordError.Unknown }
// At this point I made an attempt to retrieve priming info from
// the audio converter assuming that it will give me back default values
// I can use, but ended up with `nil`
var primeInfo: AudioConverterPrimeInfo? = nil
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
// The following returns a `noErr` but `primeInfo` is still `nil``
AudioConverterGetProperty(converter,
kAudioConverterPrimeInfo,
&primeInfoSize,
&primeInfo)
// I've also tried to set `kAudioConverterPrimeInfo` so that it knows
// the leading frames that are being primed, but the set didn't seem to work
// (`noErr` but getting the property afterwards still returned `nil`)
}
let converter = converter!
// Need to give a big enough output buffer.
// The assumption is that it will always be <= to the input size
let numSamples = CMSampleBufferGetNumSamples(buffer)
// This becomes 1024 * 2 = 2048
let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)
defer {
outputBufferPtr.destroy()
outputBufferPtr.dealloc(1)
}
var result = noErr
var outputPacketCount = UInt32(1)
var outputData = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: outputABSD.mChannelsPerFrame,
mDataByteSize: UInt32(outputBufferSize),
mData: outputBufferPtr))
// See below for `EncodeAudioUserData`
var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
inputBytesPerPacket: inputABSD.mBytesPerPacket)
withUnsafeMutablePointer(&userData) { userDataPtr in
// See below for `fetchAudioProc`
result = AudioConverterFillComplexBuffer(
converter,
fetchAudioProc,
userDataPtr,
&outputPacketCount,
&outputData,
nil)
}
if result != noErr {
Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
return nil
}
// See below for `CMSampleBufferCreateCopy`
guard let newBuffer = CMSampleBufferCreateCopy(buffer,
fromAudioBufferList: &outputData,
newFromatDescription: outputFormatDescription) else {
Log.error?.message("Could not create sample buffer from audio buffer list")
return nil
}
if !primed {
primed = true
// Simply picked 2112 samples based on convention, is there a better way to determine this?
let samplesToPrime: Int64 = 2112
let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)
// Without setting the attachment the asset writer will complain about the
// first buffer missing the `TrimDurationAtStart` attachment, is there are way
// to infer the value from the given `AudioBufferList`?
CMSetAttachment(newBuffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, nil),
kCMAttachmentMode_ShouldNotPropagate)
}
return newBuffer
}
Below is the proc that fetches samples for the audio converter, and the data
structure that gets passed to it:
private class EncodeAudioUserData {
var inputSampleBuffer: CMSampleBuffer?
var inputBytesPerPacket: UInt32
init(inputSampleBuffer: CMSampleBuffer,
inputBytesPerPacket: UInt32) {
self.inputSampleBuffer = inputSampleBuffer
self.inputBytesPerPacket = inputBytesPerPacket
}
}
private let fetchAudioProc: AudioConverterComplexInputDataProc = {
(inAudioConverter,
ioDataPacketCount,
ioData,
outDataPacketDescriptionPtrPtr,
inUserData) in
var result = noErr
if ioDataPacketCount.memory == 0 { return noErr }
let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory
// If its already been processed
guard let buffer = userData.inputSampleBuffer else {
ioDataPacketCount.memory = 0
return -1
}
var inputBlockBuffer: CMBlockBuffer?
var inputBufferList = AudioBufferList()
result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
buffer,
nil,
&inputBufferList,
sizeof(AudioBufferList),
nil,
nil,
0,
&inputBlockBuffer)
if result != noErr {
Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
ioDataPacketCount.memory = 0
return result
}
let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
ioDataPacketCount.memory = packetsCount
ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData
if outDataPacketDescriptionPtrPtr != nil {
outDataPacketDescriptionPtrPtr.memory = nil
}
return noErr
}
This is how I am converting AudioBufferLists to CMSampleBuffers:
public func CMSampleBufferCreateCopy(
buffer: CMSampleBuffer,
inout fromAudioBufferList bufferList: AudioBufferList,
newFromatDescription formatDescription: CMFormatDescription? = nil)
-> CMSampleBuffer? {
var result = noErr
var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
// Copy timing info from the previous buffer
var timingInfo = CMSampleTimingInfo()
result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
if result != noErr { return nil }
var newBuffer: CMSampleBuffer?
result = CMSampleBufferCreateReady(
kCFAllocatorDefault,
nil,
formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
Int(bufferList.mNumberBuffers),
1, &timingInfo,
1, &sizeArray,
&newBuffer)
if result != noErr { return nil }
guard let b = newBuffer else { return nil }
CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
return newBuffer
}
Is there anything that I am obviously doing wrong? Is there a proper way to
construct CMSampleBuffers from AudioBufferList? How do you transfer priming
information from the converter to CMSampleBuffers that you create?
For my use case I need to do the encoding manually as the buffers will be
manipulated further down the pipeline (although I've disabled all
transformations after the encode in order to make sure that it works.)
Any help would be much appreciated. Sorry that there's so much code to
digest, but I wanted to provide as much context as possible.
Thanks in advance :)
Some related questions:
CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
Can I use AVCaptureSession to encode an AAC stream to memory?
Writing video + generated audio to AVAssetWriterInput, audio stuttering
How do I use CoreAudio's AudioConverter to encode AAC in real-time?
Some references I've used:
Apple sample code demonstrating how to use AudioConverter
Note describing AAC encoder delay
Turns out there were a variety of things that I was doing wrong. Instead of posting a garble of code, I'm going to try to organize this into bite-sized pieces of things that I discovered.
Samples vs Packets vs Frames
This had been a huge source of confusion for me:
Each CMSampleBuffer can have 1 or more samples (discovered via CMSampleBufferGetNumSamples)
Each CMSampleBuffer that contains 1 sample represents a single audio packet.
Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and frame count.
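As a concrete illustration of that last point (my own sketch, not from the original answer), for linear PCM you can predict the payload size of a sample buffer from its ASBD, which is exactly what you cannot do for AAC:

import CoreMedia
import AudioToolbox

// Illustrative helper: expected payload size of a linear PCM sample buffer.
func expectedPCMByteSize(of sampleBuffer: CMSampleBuffer) -> Int? {
    guard
        let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer),
        let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription)?.pointee,
        asbd.mFormatID == kAudioFormatLinearPCM
    else { return nil }
    // For LPCM, mFramesPerPacket == 1 and every frame has a fixed size,
    // so total bytes = samples (== frames here) * bytes per frame.
    return CMSampleBufferGetNumSamples(sampleBuffer) * Int(asbd.mBytesPerFrame)
}
// For AAC, mBytesPerFrame is 0 and packet sizes vary from packet to packet,
// so you have to rely on the AudioStreamPacketDescriptions instead.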
AudioConverterComplexInputDataProc
This callback is used to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time push-style encoding, I needed to ensure that each data push contains the minimum amount of packets. Something like this (pseudo-code):
let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalSize(buffers) < minimumPackets {
buffers = buffers + [getNextBuffer()]
}
AudioConverterFillComplexBuffer(...)
Slicing CMSampleBuffers
You can actually slice CMSampleBuffers if they contain multiple samples. The tool to do this is CMSampleBufferCopySampleBufferForRange. This is nice because you can provide the AudioConverterComplexInputDataProc with the exact number of packets it asks for, which makes handling timing information for the resulting encoded buffer easier: if you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
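A sketch of that call (illustrative; packetCount is however many packets the proc asked for):

import Foundation
import CoreMedia

// Illustrative helper: copy the first `packetCount` samples/packets out of `buffer`.
func firstPackets(_ packetCount: Int, of buffer: CMSampleBuffer) -> CMSampleBuffer? {
    var slice: CMSampleBuffer?
    let status = CMSampleBufferCopySampleBufferForRange(
        kCFAllocatorDefault,
        buffer,
        CFRangeMake(0, packetCount),
        &slice)
    return status == noErr ? slice : nil
}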
Priming and trim duration
When doing AAC encoding, you must set the trim duration like so:
CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
One thing I did wrong was that I added the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.
Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:
Priming frames: 2112
Sample rate: 44100
Priming duration: 2112 / 44100 = ~0.0479s
First frame, frames: 1024, priming duration: 1024 / 44100
Second frame, frames: 1024, priming duration: 1088 / 44100
Creating the new CMSampleBuffer
AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr. You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:
let bufferList: AudioBufferList
let packetDescriptions: [AudioStreamPacketDescription]
var newBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(
kCFAllocatorDefault, // allocator
nil, // dataBuffer
false, // dataReady
nil, // makeDataReadyCallback
nil, // makeDataReadyRefCon
formatDescription, // formatDescription
Int(bufferList.mNumberBuffers), // numSamples
CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
&packetDescriptions, // packetDescriptions
&newBuffer)
