Interpreting AudioBuffer.mData to display audio visualization - iOS

I am trying to process audio data in real time so that I can display an on-screen spectrum analyzer/visualization based on sound input from the microphone. I am using AVFoundation's AVCaptureAudioDataOutputSampleBufferDelegate to capture the audio data, which triggers the delegate function captureOutput. Function below:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    autoreleasepool {
        guard CMSampleBufferDataIsReady(sampleBuffer) else { return }

        // Check this is AUDIO (and not VIDEO) being received
        if connection.audioChannels.count > 0 {
            // Determine number of frames in buffer
            var numFrames = CMSampleBufferGetNumSamples(sampleBuffer)

            // Get AudioBufferList
            var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
            var blockBuffer: CMBlockBuffer?
            CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, nil, &audioBufferList, MemoryLayout<AudioBufferList>.size, nil, nil, UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment), &blockBuffer)

            let audioBuffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))
            for audioBuffer in audioBuffers {
                let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
                let i16array = data.withUnsafeBytes {
                    UnsafeBufferPointer<Int16>(start: $0, count: data.count / 2).map(Int16.init(bigEndian:))
                }
                for dataItem in i16array {
                    print(dataItem)
                }
            }
        }
    }
}
The code above prints positive and negative Int16 values as expected, but I need help converting these raw numbers into meaningful data such as power and decibels for my visualizer.

I was on the right track. Thanks to RobertHarvey's comment on my question: the Accelerate framework's FFT functions are what you need to build a spectrum analyzer. But before you can use those functions, you need to convert your raw data into an Array of type Float, since many of them require a Float array.
Firstly, we load the raw data into a Data object:
//Read data from AudioBuffer into a variable
let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
I like to think of a Data object as a "list" of 1-byte sized chunks of info (8 bits each), but if I check the number of frames I have in my sample and the total size of my Data object in bytes, they don't match:
//Get number of frames in sample and total size of Data
var numFrames = CMSampleBufferGetNumSamples(sampleBuffer) //= 1024 frames in my case
var dataSize = audioBuffer.mDataByteSize //= 2048 bytes in my case
The total size (in bytes) of my data is twice the number of frames in my CMSampleBuffer, which means each frame of audio is 2 bytes long. To read the data meaningfully, I need to convert my Data object from a "list" of 1-byte chunks into an array of 2-byte chunks. An Int16 contains 16 bits (2 bytes, exactly what we need), so let's create an Array of Int16:
//Convert to Int16 array (copy into an Array inside the closure so no pointer escapes)
let samples = data.withUnsafeBytes {
    Array(UnsafeBufferPointer<Int16>(start: $0, count: data.count / MemoryLayout<Int16>.size))
}
Now that we have an Array of Int16, we can convert it to an Array of Float:
//Convert to Float array
let factor = Float(Int16.max)
var floats: [Float] = Array(repeating: 0.0, count: samples.count)
for i in 0..<samples.count {
    floats[i] = Float(samples[i]) / factor
}
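As an aside, if you want to avoid the hand-written loop, Accelerate can do the same conversion; a minimal sketch assuming the samples array from above (vDSP_vflt16 widens Int16 to Float, vDSP_vsdiv divides by Int16.max):
import Accelerate

// Convert Int16 samples to Float with vDSP, then scale to the -1.0...1.0 range.
var vFloats = [Float](repeating: 0.0, count: samples.count)
samples.withUnsafeBufferPointer { int16Buffer in
    vDSP_vflt16(int16Buffer.baseAddress!, 1, &vFloats, 1, vDSP_Length(samples.count))
}
var divisor = Float(Int16.max)
vDSP_vsdiv(vFloats, 1, &divisor, &vFloats, 1, vDSP_Length(vFloats.count))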
Now that we have our Float array, we can use the Accelerate framework's complex math to convert the raw Float values into meaningful ones like magnitude, decibels, etc. Links to documentation:
Apple's Accelerate Framework
Fast Fourier Transform (FFT)
I found Apple's documentation rather overwhelming. Luckily, I found a really good example online which I was able to re-purpose for my needs, called TempiFFT. Implementation as follows:
//Initiate FFT
let fft = TempiFFT(withSize: numFrames, sampleRate: 44100.0)
fft.windowType = TempiFFTWindowType.hanning

//Pass array of Floats
fft.fftForward(floats)

//I only want to display 20 bands on my analyzer
fft.calculateLinearBands(minFrequency: 0, maxFrequency: fft.nyquistFrequency, numberOfBands: 20)

//Then use a loop to iterate through the bands of your spectrum analyzer
var magnitudeArr = [Float](repeating: 0, count: 20)
var magnitudeDBArr = [Float](repeating: 0, count: 20)
for i in 0..<20 {
    magnitudeArr[i] = fft.magnitudeAtBand(i)
    magnitudeDBArr[i] = TempiFFT.toDB(fft.magnitudeAtBand(i))
    //...I didn't, but you could perform drawing functions here...
}
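If you do draw inside that loop, one simple approach (an illustrative sketch, not part of TempiFFT; the 0 dB ceiling and -80 dB floor are assumptions to tune) is to clamp each band's dB value to a fixed range and normalise it to 0...1 for a bar height:
// Map a band's dB value to a 0...1 bar height (assumed range: -80 dB ... 0 dB).
let minDB: Float = -80.0
let clampedDB = min(0, max(minDB, magnitudeDBArr[i]))
let barHeight = (clampedDB - minDB) / (0 - minDB)   // 0.0 = silent, 1.0 = full scale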
Other useful references:
Converting Data into Array of Int16
Converting Array of Int16 to Array of Float

Related

How do I remove noise from an audio signal? And what threshold/threshold range should I use?

I have loaded an audio file and created an input and an output buffer.
But when I follow Apple's post I get an output signal that sounds distorted.
private func extractSignal(input: AVAudioPCMBuffer, output: AVAudioPCMBuffer) {
    let count = 256
    let forward = vDSP.DCT(previous: nil, count: count, transformType: .II)!
    let inverse = vDSP.DCT(previous: nil, count: count, transformType: .III)!

    // Iterates over the signal.
    input.iterate(signalCount: 32000) { step, signal in
        var series = forward.transform(signal)
        series = vDSP.threshold(series, to: 0.0003, with: .zeroFill) // What should this threshold be?

        var inversed = inverse.transform(series)
        let divisor: Float = Float(count / 2)
        inversed = vDSP.divide(inversed, divisor)

        // Code: write inversed to output buffer.
        output.frameLength = AVAudioFrameCount(step * signal.count + signal.count)
    }
}
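There is no single correct threshold; one possibility (purely an illustrative sketch, with an arbitrary keep-the-top-10% assumption you would tune by ear) is to derive it from the distribution of the DCT coefficient magnitudes instead of hard-coding 0.0003:
import Accelerate

// A data-driven starting point: keep roughly the largest 10% of DCT coefficients
// (by magnitude) and zero the rest. The 10% figure is an assumption to tune by ear.
func suggestedThreshold(for series: [Float], keepingTopFraction fraction: Float = 0.1) -> Float {
    guard !series.isEmpty else { return 0 }
    let magnitudes = vDSP.absolute(series).sorted()
    let cutoff = Int(Float(magnitudes.count) * (1.0 - fraction))
    return magnitudes[min(max(cutoff, 0), magnitudes.count - 1)]
}

// e.g. series = vDSP.threshold(series, to: suggestedThreshold(for: series), with: .zeroFill)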

How to correctly convert AVAudioCompressedBuffer into Data and back

I have an AVAudioCompressedBuffer instance that gets correctly decoded and played by my AVAudioEngine.
The problem is that after converting it to Data and then back to AVAudioCompressedBuffer it is no longer playable and throws a kAudioCodecBadDataError.
This is how I'm currently managing the conversion to and from Data:
// Convert AVAudioCompressedBuffer to Data
let capacity = Int(compressedBuffer.byteLength)
let compressedBufferPointer = compressedBuffer.data.bindMemory(to: UInt8.self, capacity: capacity)
var compressedBytes: [UInt8] = [UInt8](repeating: 0, count: capacity)
compressedBufferPointer.withMemoryRebound(to: UInt8.self, capacity: capacity) { sourceBytes in
    compressedBytes.withUnsafeMutableBufferPointer {
        $0.baseAddress!.initialize(from: sourceBytes, count: capacity)
    }
}
let data = Data(compressedBytes)

// Convert Data to AVAudioCompressedBuffer
let compressedBuffer = AVAudioCompressedBuffer(format: format, packetCapacity: packetCapacity, maximumPacketSize: maximumPacketSize)
compressedBuffer.byteLength = byteLength
compressedBuffer.packetCount = packetCount
data.withUnsafeBytes {
    compressedBuffer.data.copyMemory(from: $0.baseAddress!, byteCount: byteLength)
}
let buffer = compressedBuffer
The values of all the buffer attributes (format, packetCapacity, maximumPacketSize, byteLength, packetCount) are the same on both ends of the conversion.
It turns out that, for some reason, converting an AVAudioCompressedBuffer that way fails to include the buffer's packetDescriptions. These are stored in the buffer as a C-style array of AudioStreamPacketDescription structs. By creating a Codable struct (PacketDescription) and mapping the description objects separately, the reassembled buffer worked as expected.
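The PacketDescription type itself isn't shown in the answer; a minimal sketch of what it could look like (optional properties, matching the force-unwraps used below):
// Codable mirror of AudioStreamPacketDescription, so it can be serialized alongside the audio bytes.
struct PacketDescription: Codable {
    var mStartOffset: Int64?
    var mVariableFramesInPacket: UInt32?
    var mDataByteSize: UInt32?
}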
var packetDescriptions = [PacketDescription]()
for index in 0..<compressedBuffer.packetCount {
    if let packetDescription = compressedBuffer.packetDescriptions?[Int(index)] {
        packetDescriptions.append(
            .init(mStartOffset: packetDescription.mStartOffset,
                  mVariableFramesInPacket: packetDescription.mVariableFramesInPacket,
                  mDataByteSize: packetDescription.mDataByteSize))
    }
}

packetDescriptions?.enumerated().forEach { index, element in
    compressedBuffer.packetDescriptions?[index] = AudioStreamPacketDescription(mStartOffset: element.mStartOffset!,
                                                                                mVariableFramesInPacket: element.mVariableFramesInPacket!,
                                                                                mDataByteSize: element.mDataByteSize!)
}
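To get both the bytes and the descriptions across in one value, you could bundle them in a single Codable envelope; a sketch under that assumption (the CompressedAudioPayload type and the use of JSONEncoder are illustrative choices, not from the original answer):
// Hypothetical envelope carrying the compressed bytes plus their packet descriptions.
struct CompressedAudioPayload: Codable {
    let bytes: Data
    let packetCount: UInt32
    let packetDescriptions: [PacketDescription]
}

let payload = CompressedAudioPayload(bytes: data,
                                     packetCount: compressedBuffer.packetCount,
                                     packetDescriptions: packetDescriptions)
let encoded = try JSONEncoder().encode(payload)   // send this instead of the raw bytes alone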

What samples will the buffer contain when I use installTap(onBus:) for multichannel audio?

What samples will the buffer contain when I use installTap(onBus:) for multichannel audio?
If the channel count is greater than 1, will it contain samples from the left microphone only, or samples from both the left and the right microphone?
When I use the iPhone simulator, the format is pcmFormatFloat32, channelCount = 2, sampleRate = 44100.0, not interleaved.
I use this code
let bus = 0
inputNode.installTap(onBus: bus, bufferSize: myTapOnBusBufferSize, format: theAudioFormat) {
    (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
    self.onNewBuffer(buffer)
}

func onNewBuffer(_ inputBuffer: AVAudioPCMBuffer!) {
    var samplesAsDoubles: [Double] = []
    for i in 0 ..< Int(inputBuffer.frameLength) {
        let theSample = Double((inputBuffer.floatChannelData?.pointee[i])!)
        samplesAsDoubles.append(theSample)
    }
}
print("number of input busses = \(inputNode.numberOfInputs)")
it print
number of input buses = 1
For each sample in my samplesAsDoubles array from the buffer inside the block, which channel does it come from? And at what time was that sample recorded?
From the header comments for floatChannelData:
The returned pointer is to format.channelCount pointers to float. Each of these pointers
is to "frameLength" valid samples, which are spaced by "stride" samples.
If format.interleaved is false (as with the standard deinterleaved float format), then
the pointers will be to separate chunks of memory. "stride" is 1.
floatChannelData gives you a 2D float array. For a non-interleaved 2-channel buffer, you would access the individual samples like this:
let channelCount = Int(buffer.format.channelCount)
let frameCount = Int(buffer.frameLength)

if let channels = buffer.floatChannelData { // channels is a 2D float array
    for channelIndex in 0..<channelCount {
        let channel = channels[channelIndex] // 1D float array
        print(channelIndex == 0 ? "left" : "right")
        for frameIndex in 0..<frameCount {
            let sample = channel[frameIndex]
            print("  \(sample)")
        }
    }
}
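For completeness: if the format had been interleaved instead (stride > 1), both channels share one chunk of memory and the samples alternate L R L R ...; a sketch of reading it in that case (assuming a 2-channel interleaved Float32 buffer):
if buffer.format.isInterleaved, let channels = buffer.floatChannelData {
    let channelCount = Int(buffer.format.channelCount)
    let stride = buffer.stride        // number of interleaved channels
    let interleaved = channels[0]     // single chunk holding every channel
    for frameIndex in 0..<Int(buffer.frameLength) {
        for channelIndex in 0..<channelCount {
            let sample = interleaved[frameIndex * stride + channelIndex]
            print("channel \(channelIndex): \(sample)")
        }
    }
}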

How to find the maximum value of an UnsafeMutablePointer<Float> with a bufferSize of type UInt32 in Swift?

I am new to Swift and working on a project where I have to visualize audio waves.
I am using the EZAudio pod; to plot the waves on the screen, a function UpdatePlot is used, and an UnsafeMutablePointer is passed as a parameter.
I want the maximum value in each UnsafeMutablePointer, to find the highest wave on the plot:
buffer[0]//with a bufferSize UInt32
I want to find the highest value in that buffer[0] array!
Please Help!!!
p.s : thanks in advance
For the array buffer[]:
Swift 3:
buffer.max()
Swift 2:
buffer.maxElement()
These two lines are so confusing:
buffer[0]//with a bufferSize UInt32
I want to find the highest value in that buffer[0] array!
Which is an UnsafeMutablePointer<Float>, buffer itself or buffer[0]? Is buffer a Swift Array?
I assume buffer is of type UnsafeMutablePointer<Float>.
Manually:
func myMax(buf: UnsafePointer<Float>, count: UInt32) -> Float? {
guard case let bufCount = Int(count) where count > 0 else {return nil}
var max = -Float.infinity
for i in 0..<bufCount {
if buf[i] > max {
max = buf[i]
}
}
return max
}
Use it like this:
if let maxValue = myMax(buffer, count: bufferSize) {
    //Use the maximum value
    print(maxValue)
} else {
    //No maximum value when empty
    print("empty")
}
If you want to utilize the Swift standard library, you can write myMax as:
func myMax(buf: UnsafePointer<Float>, count: UInt32) -> Float? {
return UnsafeBufferPointer(start: buf, count: Int(count)).maxElement()
}
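In Swift 3 and later the same thing can be written directly over a buffer pointer without a helper; a quick sketch, assuming buffer is an UnsafeMutablePointer<Float> and bufferSize is a UInt32:
// Wrap the raw pointer in a collection and use the standard library's max().
let maxValue = UnsafeBufferPointer(start: buffer, count: Int(bufferSize)).max()   // Float?, nil if empty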

How to create a NSData from hex value in swift

I'm a Swift/iOS newbie and I have a problem to solve.
I'm trying to get data from a Texas Instruments SensorTag 2. To activate a sensor, following the instructions, I have to write a binary string to the sensor's configuration bank.
I have this snippet of code:
if SensorTag.validConfigCharacteristic(thisCharacteristic) {
    // Enable Sensor
    let enableByte = SensorTag.getEnableByteFor(thisCharacteristic)
    self.sensorTagPeripheral.writeValue(enableByte, forCharacteristic: thisCharacteristic, type: CBCharacteristicWriteType.WithResponse)
}
and I wrote the function that produces the value to write; enableByte's type is NSData.
class func getEnableByteFor(thisCharacteristic: CBCharacteristic) -> NSData {
    print(thisCharacteristic.UUID)
    var enableValue = 0
    if thisCharacteristic.UUID == MovementConfigUUID {
        enableValue = ...
    } else { // any other activation
        enableValue = 1
    }
    return NSData(bytes: &enableValue, length: sizeof(UInt8))
}
For every sensor I have to write 1 to enable it and 0 to disable it, but for the movement sensor, according to this guide, I have to write 16 bits (2 bytes). For my configuration I have to write the binary value 0000000001111111, i.e. 0x007F. How can I initialize an NSData object with the value 0x007F?
Try this:
let bytes : [CChar] = [0x0, 0x7F]
let data = NSData(bytes: bytes, length: 2)
NSData(bytes:length:) creates an NSData object from a byte stream. In Objective-C, this byte stream is of type char *. The Swift equivalent is [CChar]. The question (and another answer) use an Int to represent this byte stream. This is wrong and dangerous.
var enableValue = 0 // enableValue is a 64-bit integer
NSData(bytes: &enableValue, length: sizeof(UInt8)) // this trims it to the first 8 bits
It works because x86 uses Little Endian encoding, which puts the least significant byte first. It will fail on PowerPC, which uses Big Endian. ARM uses switchable endianness, so it may or may not fail there. When the situation calls for an exact bit layout, you should not rely on the architecture's endianness:
class func getEnableByteFor(thisCharacteristic: CBCharacteristic) -> NSData {
    print(thisCharacteristic.UUID)
    let enableValue: [CChar]
    if thisCharacteristic.UUID == MovementConfigUUID {
        enableValue = [0x0, 0x7F]
    } else { // any other activation
        enableValue = [0x1]
    }
    return NSData(bytes: enableValue, length: enableValue.count)
}
A much shorter solution that takes byte order into account:
NSData(bytes: [UInt16(0x007F).bigEndian], length: 2)
There is nothing wrong with using [UInt16] as the byte stream here, because UInt16 has a bigEndian property that returns the big-endian representation of the integer, changing the byte order if necessary.
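For reference, the modern Data equivalent of the same two big-endian bytes (not part of the original answer) looks like this:
// 0x007F as two big-endian bytes (0x00, 0x7F), using Data instead of NSData.
let value = UInt16(0x007F).bigEndian
let data = withUnsafeBytes(of: value) { Data($0) }
// or simply: let data = Data([0x00, 0x7F])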
