How to remove the riff headers from wav file in Swift? - ios

I want to keep just the data chunk of the .wav file and exclude all other chunks, i.e. the RIFF headers.
let voiceData = try? Data(contentsOf: soundUrl).advanced(by: 44)
I tried this, but for some reason there is still some baggage left before the actual audio. Could anyone please help me with this issue? Is there an efficient way to read the .wav file and only include the data section?

First, are you certain this is actually a WAV file? WAV does typically have 44 bytes of header. Why do you believe there is "some baggage"? How are you determining that?
You can of course parse the RIFF format directly. The easiest (sloppiest) approach is to scan down until you find the bytes "data" (0x64 61 74 61). The next 4 bytes will be the length (in little-endian format, which you can skip if you're just going to read to the end), followed by the actual data you want.
Finding the data bytes is done with range(of:):
let dataBytes = Data([0x64, 0x61, 0x74, 0x61])
if let dataRange = riff.range(of: dataBytes) {
let start = dataRange.endIndex + 4 // Skip over length bytes
let samples = riff[start...] // read the rest of the bytes
// use samples
}
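If the file carries extra chunks (for example a LIST chunk with metadata) between the format chunk and the audio, the header is longer than 44 bytes, which would explain the "baggage". A slightly stricter alternative to scanning for "data" is to walk the chunk list. The following is a minimal sketch, not a full RIFF parser; the helper name wavDataChunk is hypothetical, and it assumes a plain RIFF/WAVE file with 32-bit little-endian chunk sizes.

```swift
import Foundation

// Hypothetical helper: walks the RIFF chunk list and returns the payload
// of the first "data" chunk, or nil if the input is not a RIFF/WAVE file.
func wavDataChunk(in riff: Data) -> Data? {
    let bytes = [UInt8](riff)
    // 12-byte preamble: "RIFF", 4-byte total size, "WAVE"
    guard bytes.count >= 12,
          Array(bytes[0..<4]) == Array("RIFF".utf8),
          Array(bytes[8..<12]) == Array("WAVE".utf8) else { return nil }

    var offset = 12
    while offset + 8 <= bytes.count {
        let id = Array(bytes[offset..<(offset + 4)])
        // Chunk size is stored little-endian, so fold the bytes in reverse order
        let size = bytes[(offset + 4)..<(offset + 8)].reversed().reduce(0) { ($0 << 8) | Int($1) }
        let start = offset + 8
        if id == Array("data".utf8) {
            // Clamp in case the declared size runs past the end of the file
            let end = min(start + size, bytes.count)
            return Data(bytes[start..<end])
        }
        // Skip this chunk; chunks are padded to an even number of bytes
        offset = start + size + (size % 2)
    }
    return nil
}

// Usage, with soundUrl as in the question:
// let samples = (try? Data(contentsOf: soundUrl)).flatMap(wavDataChunk(in:))
```

Unlike the byte scan, this cannot be fooled by the four bytes "data" appearing inside another chunk, at the cost of a few more lines.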

Related

How to compress data

I'm trying to compress data to improve the space complexity, but I'm not sure if I'm incorrectly compressing data or incorrectly measuring the size.
I tried the following in the Playground.
import Foundation
import Compression
// Example data
struct MyData: Encodable {
let property = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
}
// I tried using MemoryLayout to measure the size of the uncompressed data
let size = MemoryLayout<MyData>.size
print("myData type size", size) // 16
let myData = MyData()
let myDataSize = MemoryLayout.size(ofValue: myData)
print("myData instance size", myDataSize) // 16
func run() {
// 1. This shows the size of the encoded data
guard let encoded = try? JSONEncoder().encode(myData) else { return }
print("myData encoded size", encoded) // 589 bytes
/// 2. This shows the size after using a first compression method
guard let compressed = try? (encoded as NSData).compressed(using: .lzfse) else { return }
let firstCompression = Data(compressed)
print("firstCompression", firstCompression) // 491 bytes
/// 3. Second compression method (just wanted to try a different compression method)
let secondCompression = compress(encoded)
print("secondCompression", secondCompression) // 491 bytes
/// 4. Wanted to compare the size difference between compressed and uncompressed for a bigger data so here is the array of uncompressed data.
var myDataArray = [MyData]()
for _ in 0 ... 100 {
myDataArray.append(MyData())
}
guard let encodedArray = try? JSONEncoder().encode(myDataArray) else { return }
print("myData encodedArray size", encodedArray) // 59591 bytes
print("memory layout", MemoryLayout.size(ofValue: encodedArray)) // 16
/// 5. Compressed array
var compressedArray = [Data]()
for _ in 0 ... 100 {
guard let compressed = try? (encoded as NSData).compressed(using: .lzfse) else { return }
let data = Data(compressed)
compressedArray.append(data)
}
guard let encodedCompressedArray = try? JSONEncoder().encode(compressedArray) else { return }
print("myData compressed array size", encodedCompressedArray) // 66661 bytes
print("memory layout", MemoryLayout.size(ofValue: encodedCompressedArray)) // 16
/// 6. Compression using lzma
var differentCompressionArray = [Data]()
for _ in 0 ... 100 {
guard let compressed = try? (encoded as NSData).compressed(using: .lzma) else { return }
let data = Data(compressed)
differentCompressionArray.append(data)
}
guard let encodedCompressedArray2 = try? JSONEncoder().encode(differentCompressionArray) else { return }
print("myData compressed array size", encodedCompressedArray2) // 60702 bytes
print("memory layout", MemoryLayout.size(ofValue: encodedCompressedArray2)) // 16
}
run()
// The implementation for the second compression method
func compress(_ sourceData: Data) -> Data {
let pageSize = 128
var compressedData = Data()
do {
let outputFilter = try OutputFilter(.compress, using: .lzfse) { (data: Data?) -> Void in
if let data = data {
compressedData.append(data)
}
}
var index = 0
let bufferSize = sourceData.count
while true {
let rangeLength = min(pageSize, bufferSize - index)
let subdata = sourceData.subdata(in: index ..< index + rangeLength)
index += rangeLength
try outputFilter.write(subdata)
if (rangeLength == 0) {
break
}
}
}catch {
fatalError("Error occurred during encoding: \(error.localizedDescription).")
}
return compressedData
}
The MemoryLayout object doesn't seem to be helpful in measuring the size of encoded arrays, whether or not they're compressed. I'm not sure how to measure a struct or an array of structs without encoding them with JSONEncoder, which already compresses the data.
The before/after compression for the single instance of MyData (#1, #2, and #3) seems to show that the data is being properly compressed going from 589 bytes to 491 bytes. However, the comparison between an array of uncompressed data and an array of compressed data (#4, #5) seems to show that the size increased from 59591 to 66661 after the compression.
Finally, I tried using a different compression algorithm lzma (#6). It reduced the size to 60702 which is lower than the previous compression, but it still wasn't smaller than the uncompressed data.
To get a bit of confusion out of the way first: MemoryLayout gives you information about the size and structure of the layout of a type at compile time, but can't be used to determine the amount of storage an Array value needs at runtime because the size of the Array structure itself does not depend on how much data it contains.
Highly simplified, the layout of an Array value looks like this:
┌─────────────────────┐
│        Array        │
├──────────┬──────────┤    ┌──────────────────┐
│  length  │ buffer  ─┼───▶│     storage      │
└──────────┴──────────┘    └──────────────────┘
  1 word /   1 word /
  8 bytes    8 bytes
 └─────────┬─────────┘
           └─▶ MemoryLayout<Array<UInt8>>.size
An Array value stores its length, or count (mixed in with some flags, but we don't need to worry about that) and a pointer to the actual space where the items it contains are stored. Those items aren't stored as part of the Array value itself, but separately in allocated memory which the Array points to. Whether the Array "contains" 10 values or 100000 values, the size of the Array structure remains the same: 1 word (or 8 bytes on a 64-bit system) for the length, and 1 word for the pointer to the actual underlying storage. (The size of the storage buffer, however, is exactly determined by the number of elements it is able to contain, at runtime.)
In practice, Array is significantly more complicated than this for bridging and other reasons, but this is the basic gist; this is why you only ever see MemoryLayout.size(ofValue:) return the same number every time. [And incidentally, the size of String is the same as Array for similar reasons, which is why MemoryLayout<MyData>.size also reports 16.]
In order to know how many bytes an Array or a Data effectively take up, it's sufficient to ask them for their .count: Array<UInt8> and Data are both collections of UInt8 values (bytes), and their .count will reflect the amount of data effectively stored in their underlying storage.
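A quick way to see the difference is to compare two Data values of very different lengths. This is only an illustrative sketch; the variable names are made up for the example.

```swift
import Foundation

// Two values that hold very different amounts of data
let small = Data(count: 10)
let large = Data(count: 100_000)

// MemoryLayout only measures the fixed-size value itself, not its storage,
// so both lines report the same number
print(MemoryLayout.size(ofValue: small))
print(MemoryLayout.size(ofValue: large))

// .count reflects the number of bytes actually stored
print(small.count) // 10
print(large.count) // 100000
```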
As for the size increase between step (4) and (5), note that
Step 4 takes 101 copies of your MyData (the range 0 ... 100 is inclusive) and joins them together before converting them to JSON, while
Step 5 takes 101 individually compressed copies of MyData, joins those together, and then re-converts them to JSON
Step 5 has a few issues compared to step 4:
Compression benefits heavily from repetition in data: a bit of data compressed and repeated 100 times won't be nearly as compact as a bit of data repeated 100 times, then compressed, because each round of compression can't benefit from knowing that there's another copy of the data that came before it. As a simple example:
Let's say we wanted to use a form of run-length encoding to compress the string Hello: there isn't a lot we can do, except maybe turn it into Hel{2}o (where {2} indicates a repetition of the last character 2 times)
If we compress Hello and join it 3 times, we might get Hel{2}oHel{2}oHel{2}o,
But if we first joined Hello 3 times and then compressed, we could get {Hel{2}o}{3}, which is much more compact
Compression also typically needs to insert some information about how the data was compressed in order to be able to recognize and decompress the data later. By compressing MyData 101 times and joining all of those instances, you're repeating that metadata 101 times
Even after compressing your MyData instances, re-representing them as JSON decreases how compressed they are because JSON can't represent the binary data exactly. Instead, it has to convert each Data blob into a Base64-encoded string, which causes it to grow again
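The Base64 growth in the last point is easy to check in isolation. The sketch below uses a zero-filled stand-in blob of 491 bytes (the compressed size reported in the question) rather than real compressed output, so only the sizes are meaningful.

```swift
import Foundation

// Stand-in for one compressed MyData payload: 491 zero bytes (an assumption,
// chosen only to match the size printed in the question)
let blob = Data(count: 491)

// Base64 emits 4 characters for every 3 input bytes, rounded up
let base64 = blob.base64EncodedString()
print(blob.count, base64.count) // 491 656

// JSONEncoder stores Data as a Base64 string by default, plus quotes and brackets
let json = try! JSONEncoder().encode([blob])
print(json.count) // 660
```

So each 491-byte blob costs well over 650 bytes once it is wrapped in JSON, before any separators are counted.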
Between these issues, it's not terribly surprising that your data is growing. What you actually want is a modification to step 4, which is compressing the joined data:
guard let encodedArray = try? JSONEncoder().encode(myDataArray) else { fatalError() }
guard let compressedEncodedArray = try? (encodedArray as NSData).compressed(using: .lzma) else { fatalError() }
print(compressedEncodedArray.count) // => 520
This is significantly better than
guard let encodedCompressedArray = try? JSONEncoder().encode(compressedArray) else { fatalError() }
print(encodedCompressedArray.count) // => 66661
As an aside: it seems unlikely that you're actually using JSONEncoder in practice to join data in this way, and this was just for measurement here — but if you actually are, consider other mechanisms for doing this. Converting binary data to JSON in this way is very inefficient storage-wise, and with a bit more information about what you might actually need in practice, we might be able to recommend a more effective way to do this.
If what you're actually doing in practice is encoding an Encodable object tree and then compressing that the one time, that's totally fine.

AVAudioFile.write(from:) fails when buffer contains interleaved audio

I'm trying to write out an audio file after doing some processing, and am getting an error. I've reduced the error to this simple standalone case:
import Foundation
import AVFoundation
do {
let inputFileURL = URL(fileURLWithPath: "/Users/andrewmadsen/Desktop/test.m4a")
let file = try AVAudioFile(forReading: inputFileURL, commonFormat: .pcmFormatFloat32, interleaved: true)
guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat, frameCapacity: AVAudioFrameCount(file.length)) else {
throw NSError()
}
buffer.frameLength = buffer.frameCapacity
try file.read(into: buffer)
let tempURL =
URL(fileURLWithPath: NSTemporaryDirectory())
.appendingPathComponent("com.openreelsoftware.AudioWriteTest")
.appendingPathComponent(UUID().uuidString)
.appendingPathExtension("caf")
let fm = FileManager.default
let dirURL = tempURL.deletingLastPathComponent()
if !fm.fileExists(atPath: dirURL.path, isDirectory: nil) {
try fm.createDirectory(at: dirURL, withIntermediateDirectories: true, attributes: nil)
}
var settings = buffer.format.settings
settings[AVAudioFileTypeKey] = kAudioFileCAFType
let tempFile = try AVAudioFile(forWriting: tempURL, settings: settings)
try tempFile.write(from: buffer)
} catch {
print(error)
}
When this code runs, the tempFile.write(from: buffer) call throws an error:
Error Domain=com.apple.coreaudio.avfaudio Code=-50 "(null)" UserInfo={failed call=ExtAudioFileWrite(_imp->_extAudioFile, buffer.frameLength, buffer.audioBufferList)}
test.m4a is a stereo, 44.1 kHz AAC file (from the iTunes store), though the failure occurs with other stereo files in other formats (AIFF and WAV) as well.
The code does not fail, and instead correctly saves the original audio out to a new file if I change the interleaved parameter to false when creating the original input AVAudioFile (file). However, in this case, the following message is logged to the console:
Audio files cannot be non-interleaved. Ignoring setting AVLinearPCMIsNonInterleaved YES.
It seems strange and confusing that writing a non-interleaved buffer works fine, despite a message saying that files must be interleaved, while writing an interleaved buffer fails. This is the opposite of what I expected.
I'm aware that reading a file using the plain AVAudioFile(forReading:) initializer without specifying a format defaults to using non-interleaved (ie. the "standard" AVAudioFormat at the file's actual sample rate and channel count). Does this mean that I really do have to convert interleaved audio to non-interleaved before trying to write it?
Notably, in the actual program where this problem came up, I'm doing something much more complex than simply reading a file in and writing it back out again, and I do need to handle interleaved audio. I have confirmed however that that original, more complex code is also failing only for interleaved stereo audio.
Is there something tricky I need to do to get AVAudioFile to write out a buffer containing interleaved PCM audio?
The mixup here is that there are TWO formats in play: the format of the output file, and the format of the buffers you will write (the processing format). The initializer AVAudioFile(forWriting:settings:) does not let you choose the processing format and defaults to de-interleaved, hence your error. That initializer is described as follows:
"This opens the file for writing using the standard format (deinterleaved floating point)."
You need to use the other initializer, AVAudioFile(forWriting:settings:commonFormat:interleaved:), whose last two arguments specify the processing format (the argument names could have been clearer about that tbh).
var settings: [String : Any] = [:]
settings[AVFormatIDKey] = kAudioFormatMPEG4AAC
settings[AVAudioFileTypeKey] = kAudioFileCAFType
settings[AVSampleRateKey] = buffer.format.sampleRate
settings[AVNumberOfChannelsKey] = 2
settings[AVLinearPCMIsFloatKey] = (buffer.format.commonFormat == .pcmFormatFloat32) // true only for floating-point samples
let tempFile = try AVAudioFile(forWriting: tempURL, settings: settings, commonFormat: buffer.format.commonFormat, interleaved: buffer.format.isInterleaved)
try tempFile.write(from: buffer)
p.s. passing the buffer format setting directly to AVAudioFile gets you an LPCM caf file, which you may not want, hence I reconstruct the file settings.
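For reference, the two buffer layouts this answer distinguishes differ only in sample order: interleaved stereo stores L R L R ... in one buffer, while de-interleaved (the "standard" processing format) keeps one buffer per channel. If a manual conversion is ever needed, it could look like the following sketch. The function name deinterleave is hypothetical and it works on plain arrays, not on AVAudioPCMBuffer.

```swift
// Hypothetical helper: splits an interleaved sample array into one array per channel
func deinterleave(_ samples: [Float], channels: Int) -> [[Float]] {
    precondition(channels > 0 && samples.count % channels == 0,
                 "sample count must be a multiple of the channel count")
    var planes = [[Float]](repeating: [], count: channels)
    for (index, sample) in samples.enumerated() {
        // Sample i of frame f sits at position f * channels + i
        planes[index % channels].append(sample)
    }
    return planes
}

let interleaved: [Float] = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3] // L R L R L R
let planes = deinterleave(interleaved, channels: 2)
print(planes) // [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]
```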
Not positive here, but maybe since you're making the outputFile settings the same as the processing format, it's possible that the processing format has an inflexible policy on interleaving, whereas the file settings format will be fine with it - or vice versa.
Here's what I'd try first. Incomplete example, but should be enough to illustrate the areas to test.
let sourceFile: AVAudioFile
let format: AVAudioFormat
do {
// for the moment, try this without any specific format and see what it gives you
sourceFile = try AVAudioFile(forReading: inputFileURL) // assign the outer constant; a second `let` here would shadow it and leave it uninitialized
format = sourceFile.processingFormat
print(format) // let's see what we're getting so far, maybe some clues
} catch {
fatalError("Unable to load the source audio file: \(error.localizedDescription).")
}
let sourceSettings = sourceFile.fileFormat.settings
var outputSettings = sourceSettings // start with the settings of the original file rather than the buffer format settings
outputSettings[AVAudioFileTypeKey] = kAudioFileCAFType
// etc...

How can I change the first two bits of an mp4 file to 00, 00 in swift?

I am trying to change the first two bits of an mp4 file to 00, 00 to change it back to a normal, working state. After downloading the mp4 file from an API, I discovered it would not play and acted weird. On the internet I found that some people said it was encoded, and that changing the first two bits to 0 makes it work! (It does.) But I don't know how to do that in Swift. Any help would be appreciated!
I found out how to do it! This is the code:
import Foundation
// URL to an mp4 file called videoFile located in the project's folder:
let videoURL = Bundle.main.url(forResource: "videoFile", withExtension: "mp4")
// Func for "decoding" (changing its first 2 bytes to 0); returns the patched bytes:
func changeFirstTwoBytesOfFile(fileURL: URL) throws -> Data {
    // Load the file's contents. (URL.dataRepresentation would only return the bytes of the URL string itself.)
    var fileData = try Data(contentsOf: fileURL)
    guard fileData.count >= 4 else { return fileData }
    // Printing first 4 bytes of the unchanged file for checking:
    print("Bytes before change:")
    print(fileData[0], fileData[1], fileData[2], fileData[3])
    // Changing its first 2 bytes to 0
    fileData[0] = 0
    fileData[1] = 0
    // Printing the changed file's bytes to check if the change took place:
    print("Bytes after change:")
    print(fileData[0], fileData[1], fileData[2], fileData[3])
    return fileData
}
if let videoURL = videoURL, let patched = try? changeFirstTwoBytesOfFile(fileURL: videoURL) {
    // The app bundle is read-only, so save the playable copy somewhere writable
    let outputURL = FileManager.default.temporaryDirectory.appendingPathComponent("videoFile-fixed.mp4")
    try? patched.write(to: outputURL)
}

iOS Swift playing audio (aac) from network stream

I'm developing an iOS application and I'm quite new to iOS development. So far I have implemented a h264 decoder from network stream using VideoToolbox, which was quite hard.
Now I need to play an audio stream that comes from the network, but with no file involved, just a raw AAC stream read directly from the socket. This stream comes from the output of an ffmpeg instance.
The problem is that I don't know how to start with this; there seems to be little information about this topic. I have already tried AVAudioPlayer but got just silence. I think I first need to decompress the packets from the stream, just like with the h264 decoder.
I have also been trying AVAudioEngine and AVAudioPlayerNode but with no success, same as with AVAudioPlayer. Can someone provide me some guidance? Maybe AudioToolbox? AudioQueue?
Thank you very much for the help :)
Edit:
I'm playing around with AVAudioCompressedBuffer and having no error using AVAudioEngine and AVAudioNode. But, I don't know what this output means:
inBuffer: <AVAudioCompressedBuffer#0x6040004039f0: 0/1024 bytes>
Does this mean that the buffer is empty? I have been trying to feed this buffer in several ways, but always returns something like 0/1024. I think I'm not doing this right:
compressedBuffer.mutableAudioBufferList.pointee = audioBufferList
Any idea?
Thank you!
Edit 2:
I'm editing to show my code for decompressing the buffer. Maybe someone can point me in the right direction.
Note: The packet that is ingested by this function actually is passed without the ADTS header (9 bytes) but I have also tried passing it with the header.
func decodeCompressedPacket(packet: Data) -> AVAudioPCMBuffer {
var packetCopy = packet
var streamDescription: AudioStreamBasicDescription = AudioStreamBasicDescription.init(mSampleRate: 44100, mFormatID: kAudioFormatMPEG4AAC, mFormatFlags: UInt32(MPEG4ObjectID.AAC_LC.rawValue), mBytesPerPacket: 0, mFramesPerPacket: 1024, mBytesPerFrame: 0, mChannelsPerFrame: 1, mBitsPerChannel: 0, mReserved: 0)
let audioFormat = AVAudioFormat.init(streamDescription: &streamDescription)
let compressedBuffer = AVAudioCompressedBuffer.init(format: audioFormat!, packetCapacity: 1, maximumPacketSize: 1024)
print("packetCopy count: \(packetCopy.count)")
var audioBuffer: AudioBuffer = AudioBuffer.init(mNumberChannels: 1, mDataByteSize: UInt32(packetCopy.count), mData: &packetCopy)
var audioBufferList: AudioBufferList = AudioBufferList.init(mNumberBuffers: 1, mBuffers: audioBuffer)
var mNumberBuffers = 1
var packetSize = packetCopy.count
// memcpy(&compressedBuffer.mutableAudioBufferList[0].mBuffers, &audioBuffer, MemoryLayout<AudioBuffer>.size)
// memcpy(&compressedBuffer.mutableAudioBufferList[0].mBuffers.mDataByteSize, &packetSize, MemoryLayout<Int>.size)
// memcpy(&compressedBuffer.mutableAudioBufferList[0].mNumberBuffers, &mNumberBuffers, MemoryLayout<UInt32>.size)
// compressedBuffer.mutableAudioBufferList.pointee = audioBufferList
var bufferPointer = compressedBuffer.data
for byte in packetCopy {
memset(compressedBuffer.mutableAudioBufferList[0].mBuffers.mData, Int32(byte), MemoryLayout<UInt8>.size)
}
print("mBuffers: \(compressedBuffer.audioBufferList[0].mBuffers.mNumberChannels)")
print("mBuffers: \(compressedBuffer.audioBufferList[0].mBuffers.mDataByteSize)")
print("mBuffers: \(compressedBuffer.audioBufferList[0].mBuffers.mData)")
var uncompressedBuffer = uncompress(inBuffer: compressedBuffer)
print("uncompressedBuffer: \(uncompressedBuffer)")
return uncompressedBuffer
}
So you are right in thinking you will (most likely) need to decompress the packets received from the stream. The idea is to get them to raw PCM format so that this can be sent directly to the audio output. This way you could also apply any DSP / audio manipulation you could want to the audio stream.
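Before any decoding, each packet's ADTS header has to be located and skipped correctly. Note that an ADTS header is 7 bytes long, or 9 bytes only when a CRC is present (protection_absent bit set to 0). The sketch below assumes the ffmpeg output is ADTS-framed AAC, as the note in the question suggests; ADTSFrame and parseADTSFrame are hypothetical names, and it only splits one frame, it does not decode audio.

```swift
import Foundation

// Hypothetical container for the fields of one ADTS frame that matter here
struct ADTSFrame {
    let headerLength: Int          // 7, or 9 when a CRC follows the header
    let sampleRateIndex: Int       // 4 means 44100 Hz in the MPEG-4 table
    let channelConfiguration: Int  // 1 = mono, 2 = stereo
    let payload: Data              // raw AAC access unit without the header
}

// Hypothetical helper: parses a single ADTS frame at the start of `packet`
func parseADTSFrame(_ packet: Data) -> ADTSFrame? {
    let bytes = [UInt8](packet)
    // The header starts with a 12-bit syncword of all ones
    guard bytes.count >= 7, bytes[0] == 0xFF, (bytes[1] & 0xF0) == 0xF0 else { return nil }

    // protection_absent == 1 means no CRC, so the header is 7 bytes
    let headerLength = (bytes[1] & 0x01) == 1 ? 7 : 9
    let sampleRateIndex = Int((bytes[2] >> 2) & 0x0F)
    let channelConfiguration = (Int(bytes[2] & 0x01) << 2) | Int(bytes[3] >> 6)

    // 13-bit frame length, header included, spread over bytes 3 to 5
    let high = Int(bytes[3] & 0x03) << 11
    let mid = Int(bytes[4]) << 3
    let low = Int(bytes[5]) >> 5
    let frameLength = high | mid | low
    guard frameLength >= headerLength, frameLength <= bytes.count else { return nil }

    return ADTSFrame(headerLength: headerLength,
                     sampleRateIndex: sampleRateIndex,
                     channelConfiguration: channelConfiguration,
                     payload: Data(bytes[headerLength..<frameLength]))
}
```

The sample rate index and channel configuration read here should agree with whatever is put into the AudioStreamBasicDescription; a mismatch is one possible cause of silent output.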
As you mentioned, you will probably need to be looking into the AudioQueue direction and the Apple Docs provide a good example of streaming audio in realtime, although this is in obj-c (in this case I think it may be a good idea to carry this out in obj-c). This is probably the best place to get started (interfacing the obj-c to swift is super simple).
Looking again at it in Swift there is the class AVAudioCompressedBuffer which seems to handle AAC for your case (would not need to decode the AAC if you get this to work), however there is no direct method for setting the buffer as it is intended for just being a storage container, I believe. Here's a working example of someone using the AVAudioCompressedBuffer along with an AVAudioFile (maybe you could buffer everything into files in background threads? I think it would be too much IO overhead).
However, if you tackle this in obj-c there is a post on how to set the AVAudioPCMBuffer (maybe works with AVAudioCompressedBuffer?) directly through memset (kind of disgusting but at the same time lovely as an embedded programmer myself).
// make a silent stereo buffer
AVAudioChannelLayout *chLayout = [[AVAudioChannelLayout alloc] initWithLayoutTag:kAudioChannelLayoutTag_Stereo];
AVAudioFormat *chFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32
sampleRate:44100.0
interleaved:NO
channelLayout:chLayout];
AVAudioPCMBuffer *thePCMBuffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:chFormat frameCapacity:1024];
thePCMBuffer.frameLength = thePCMBuffer.frameCapacity;
for (AVAudioChannelCount ch = 0; ch < chFormat.channelCount; ++ch) {
memset(thePCMBuffer.floatChannelData[ch], 0, thePCMBuffer.frameLength * chFormat.streamDescription->mBytesPerFrame);
}
I know this is a lot to take in and in no way seems like a simple solution, but I think the obj-c AudioQueue technique would be my first stop!
Hope this helps!

Swift: best way to send large arrays of numbers ([Double]) over HTTP

I thought I'd ask after hours of inconclusive research and tests:
Introduction
I'm trying to send very large arrays of Doubles from an app to a server. Naturally, I want to compress this as much as possible.
Specifically, these arrays contain CMDeviceMotion components (acceleration, x, y, z, gyroscope, etc...), but this question should apply to any large array of numbers (over 100K or a million values)
What I've tried and found by researching options
Say I have a large array of Double (There are many others) :
var CMX = CM.map({$0.userAcceleration.x})
here, CMX is of type [Double] and CM is [CMDeviceMotion]
I've tried making POST requests to my server by sending CMX in different ways, then calculating the total size after I receive it on the server :
First, as a single comma separated string :
{"AX":"-0.0441827848553658,-0.103976868093014,-0.117475733160973,-0.206566318869591,-0.266509801149368,-0.282151937484741,-0.260240525007248,-0.266505032777786,-0.315020948648453,-0.305839896202087,0.0255246963351965,0.0783950537443161,0.0749507397413254,0.0760494321584702,-0.0101579604670405,0.106710642576218,0.131824940443039,0.0630970001220703,0.21177926659584,0.27022996544838,0.222621202468872,0.234281644225121,0.288497060537338,0.176655143499374,0.193904414772987,0.169417425990105,0.150193274021149,0.00871349219232798,-0.0270088445395231,-0.0 ....
Size 153 Kb.
It makes sense that this is larger than sending as binary data, since a single number here is 64 bits (8 bytes), and becomes 17 bytes long (one byte per character) +1 = 18 (added a character for the comma).
With this reasoning, sending the array as binary data should be smaller.
Base 64 encoding
Here, I convert the array to a Data object using NSKeyedArchiver and base 64 encode the data before sending it :
["AX":NSKeyedArchiver.archivedData(withRootObject:CM.map({$0.userAcceleration.x})).base64EncodedString()]
This made the file size 206 Kb
Sending the data as a JSON array
By just sending :
["AX": CM.map({$0.userAcceleration.x})]
It turned out that this array of numbers was practically converted to a comma separated string, the size ended up being the same as in trial 1 (160Kb)
Sending as Data without base 64 encoding
Doing this:
["AX":NSKeyedArchiver.archivedData(withRootObject:CM.map({$0.userAcceleration.x}))]
made the application crash at runtime, so I can't send a Data object as a value in a JSON
Question
How can I send these arrays in a more condensed way in a JSON object?
Note that I already have downsampling in mind, and using 32 bit floats to reduce the size.
A simple way would be to do this:
let data: Data = CMX.withUnsafeBufferPointer { pointer in
return Data(buffer: pointer)
}
And you have a binary buffer with all your Doubles/Floats combined.
An HTTP body can carry raw bytes, but a JSON value cannot, so to keep this inside a JSON object you will have to convert the data to a Base64 string:
let base64String = data.base64EncodedString()
This base64String should then be passed as the AX parameter of your POST(?) HTTP request.
EDIT:
To convert it back you may use code like this:
extension Array {
init?(data: Data) {
// This check should be more complex, but here we just check if total byte count divides to one element size in bytes
guard data.count % MemoryLayout<Element>.size == 0 else { return nil }
let elementCount = data.count / MemoryLayout<Element>.size
let buffer = UnsafeMutableBufferPointer<Element>.allocate(capacity: elementCount)
data.copyBytes(to: buffer)
self = buffer.map({$0})
buffer.deallocate()
}
// Wraps the Array-to-Data conversion shown above
var data: Data {
return self.withUnsafeBufferPointer { pointer in
return Data(buffer: pointer)
}
}
}
let converted: [Double]? = Array(data: CMX.data) // converted now should be equal to CMX
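Since the question mentions switching to 32-bit floats, the effect on payload size can be estimated up front. The following is a rough sketch with made-up sample values; packAsFloat32 and unpackFloat32 are hypothetical helper names, and it assumes sender and receiver share the same byte order.

```swift
import Foundation

// Hypothetical helper: narrows the values to Float and packs them as raw bytes
func packAsFloat32(_ values: [Double]) -> Data {
    let floats = values.map { Float($0) }
    return floats.withUnsafeBufferPointer { Data(buffer: $0) }
}

// Hypothetical helper: the reverse direction, copying the bytes into a [Float]
func unpackFloat32(_ data: Data) -> [Float]? {
    let elementSize = MemoryLayout<Float>.size
    guard data.count % elementSize == 0 else { return nil }
    var floats = [Float](repeating: 0, count: data.count / elementSize)
    _ = floats.withUnsafeMutableBufferPointer { data.copyBytes(to: $0) }
    return floats
}

// Made-up stand-in for CMX
let samples = (0..<1000).map { Double($0) * 0.001 }

let doubleBytes = samples.withUnsafeBufferPointer { Data(buffer: $0) }
let floatBytes = packAsFloat32(samples)

// Raw size, then size after Base64 (4 characters per 3 bytes, rounded up)
print(doubleBytes.count, doubleBytes.base64EncodedString().count) // 8000 10668
print(floatBytes.count, floatBytes.base64EncodedString().count)   // 4000 5336
```

Compared with roughly 18 characters per value in the comma-separated form, Base64 of Doubles costs about 10.7 characters per value and Base64 of Floats about 5.3, at the price of reduced precision in the Float case.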
