I'm new to Audiokit and I'm trying to do some real-time digital signal processing on input audio from the microphone.
I know the data I want is in AKAudioFile's FloatChannelData, but what if I want to obtain this in real-time? I'm currently using AKMicrophone, AKFrequencyTracker, AKNodeOutputPlot, AKBooster and I'm plotting the tracker's amplitude data. However, that data is not the same as the audio signal (as you know, it's the RMS). Is there any way I can obtain the signal's Float data from the mic? Or even from the AKNodeOutputPlot? I just need read-access.
AKSettings.audioInputEnabled = true
mic = AKMicrophone()
plot = AKNodeOutputPlot(mic, frame: audioInputPlot.bounds)
tracker = AKFrequencyTracker.init(mic)
silence = AKBooster(tracker,gain:0)
AudioKit.output = silence
AudioKit.start()
The creator of AudioKit recommends here:
AKNodeOutputPlot works; it's one short file. You're basically just tapping the node and grabbing the data.
How would this work in my view controller if I have an instance of plot (AKNodeOutputPlot) and mic (AKMicrophone) and want to output those values to a label?
Use a tap on whichever node you want to get the data out of. I used AKNodeOutputPlot in my quote above because it is fairly straightforward, just using that data as input for a plot, but you could take the data and do whatever you want with it. In this code (from AKNodeOutputPlot):
internal func setupNode(_ input: AKNode?) {
    if !isConnected {
        input?.avAudioNode.installTap(
            onBus: 0,
            bufferSize: bufferSize,
            format: nil) { [weak self] (buffer, _) in
                guard let strongSelf = self else {
                    AKLog("Unable to create strong reference to self")
                    return
                }
                buffer.frameLength = strongSelf.bufferSize
                let offset = Int(buffer.frameCapacity - buffer.frameLength)
                if let tail = buffer.floatChannelData?[0] {
                    strongSelf.updateBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
                }
        }
    }
    isConnected = true
}
You get the buffer data in real time. Here we just send it to "updateBuffer" where it gets plotted, but instead of plotting you'd do something else.
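For example, here is a rough sketch of doing "something else" with the same kind of tap: reading the raw samples straight off the microphone node and pushing a peak value to a label. The mic and amplitudeLabel properties and the 1024-frame buffer size are assumptions for illustration, not part of AKNodeOutputPlot.
// Sketch: tap the microphone node directly and read the raw Float samples.
// `mic` and `amplitudeLabel` are assumed to exist in your view controller.
mic.avAudioNode.installTap(onBus: 0, bufferSize: 1024, format: nil) { [weak self] (buffer, _) in
    guard let strongSelf = self,
          let channelData = buffer.floatChannelData?[0] else { return }

    // Copy the samples into a Swift array so they can be used safely elsewhere.
    let frameCount = Int(buffer.frameLength)
    let samples = Array(UnsafeBufferPointer(start: channelData, count: frameCount))

    // Example "something else": take the peak absolute sample in this buffer.
    let peak = samples.map { abs($0) }.max() ?? 0

    // UI work has to happen on the main thread.
    DispatchQueue.main.async {
        strongSelf.amplitudeLabel.text = String(format: "%.3f", Double(peak))
    }
}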
To complete Aurelius Prochazka's answer:
To record the audio flowing through a node, you need to attach a tap to it. A tap is just a closure that gets called each time a buffer is available.
Here is some sample code you can reuse in your own class:
var mic = AKMicrophone()
let bufferSize: UInt32 = 4096 // buffer size used by the tap below

func initMicrophone() {
    // Optional: set the sampling rate of the microphone
    AKSettings.sampleRate = 44100
    // Link the microphone node to the output of AudioKit with a volume of 0.
    AudioKit.output = AKBooster(mic, gain: 0)
    // Start the AudioKit engine
    try! AudioKit.start()
    // Add a tap to the microphone
    mic?.avAudioNode.installTap(
        onBus: 0, bufferSize: bufferSize, format: nil // I chose a buffer size of 4096
    ) { [weak self] (buffer, _) in // self is now a weak reference, to prevent retain cycles
        // We try to create a strong reference to self, and name it strongSelf
        guard let strongSelf = self else {
            print("Recorder: Unable to create strong reference to self #1")
            return
        }
        // Check whether the buffer contains data
        buffer.frameLength = strongSelf.bufferSize
        let offset = Int(buffer.frameCapacity - buffer.frameLength)
        if let tail = buffer.floatChannelData?[0] {
            // Convert the content of the buffer to a Swift array
            let samples = Array(UnsafeBufferPointer(start: tail + offset,
                                                    count: Int(strongSelf.bufferSize)))
            strongSelf.myFunctionHandlingData(samples)
        }
    }
}

func myFunctionHandlingData(_ data: [Float]) {
    // ...
}
Be careful to use DispatchQueue or another synchronization mechanism if you need to interact with this data across different threads.
In my case I use:
DispatchQueue.main.async { [weak self] in
    guard let strongSelf = self else {
        print("Recorder: Unable to create strong reference to self #2")
        return
    }
    strongSelf.myFunctionHandlingData(samples)
}
so that my function runs on the main thread.
It's hard to explain, but I will try...
Is it possible to add a delay between the audio processing and the sound the user hears from the speakers?
I'm using the Speech framework to recognize an audio file (using SFSpeechAudioBufferRecognitionRequest and AVAudioEngine to stream the audio file), but the user hears the sound from the speakers before I get the final result from the recognizer.
The sound should only play after I get a result from the recognizer, so I need to add some latency between the input going from my AVAudioMixerNode into the recognizer and the sound reaching the user.
if (self.recognizer?.isAvailable)! {
    let recordingFormat = self.node.outputFormat(forBus: 0)
    self.node.installTap(onBus: 0, bufferSize: 2048, format: recordingFormat) { buffer, _ in
        self.request?.append(buffer)
    }
    self.task = self.recognizer?.recognitionTask(with: self.request!, resultHandler: { [weak self] result, error in
        guard let self = self, let result = result else { return }
        // Need to print the result before the user hears the sound from the speakers
        print("Result: \(result.bestTranscription.formattedString)")
    })
}
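One possible way to add that latency (a sketch only, not tested against this setup; engine, audioFile, and the 2-second delay are placeholder assumptions) is to keep feeding the recognizer from the tap immediately, and produce the audible output from a separate AVAudioPlayerNode that is started slightly later:
import AVFoundation

// Sketch: the recognizer keeps getting buffers from the tap right away, while the
// audible output comes from a player node started a couple of seconds later.
let playerNode = AVAudioPlayerNode()
engine.attach(playerNode)
engine.connect(playerNode, to: engine.mainMixerNode, format: audioFile.processingFormat)

playerNode.scheduleFile(audioFile, at: nil, completionHandler: nil)

// Start playback at "now + delay" using a host-time based AVAudioTime.
let delaySeconds: TimeInterval = 2.0
let startHostTime = mach_absolute_time() + AVAudioTime.hostTime(forSeconds: delaySeconds)
playerNode.play(at: AVAudioTime(hostTime: startHostTime))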
I struggle to make the following scenario work as expected (code will be provided below).
Record my microphone input and store an AVAudioPCMBuffer in memory; this is done with the AVAudioPCMBuffer extension method copy(from buffer: AVAudioPCMBuffer, readOffset: AVAudioFrameCount = default, frames: AVAudioFrameCount = default). I do get the buffer at the end of my recording.
When recording ends, pass the buffer to AKPlayer and play it. Here is a code snippet to demonstrate what I do (I know it is not the full app code; if needed I can share it):
private var player: AKPlayer = AKPlayer()
self.player.buffering = .always
// in the record-complete callback:
self.player.buffer = self.bufferRecorder?.pcmBuffer
self.player.volume = 1
self.player.play()
Please note that the player is connected to a mixer, which is eventually connected to the AudioKit output.
When I inspect and debug the application I can see that the buffer has the correct length, and all my output/input setup uses the same processing format (sample rate, channels, bit rate, etc.) as the recorded buffer, but my app still crashes with this error:
2018-10-28 08:40:32.625001+0200 BeatmanApp[71037:6731884] [avae] AVAEInternal.h:70:_AVAE_Check:
required condition is false: [AVAudioPlayerNode.mm:665:ScheduleBuffer: (_outputFormat.channelCount == buffer.format.channelCount)]
When I debug and step through the AudioKit code I can see that the breaking line is in AKPlayer+Playback.swift, line 162, in the call to playerNode.scheduleBuffer.
More information that could be helpful:
The recorded buffer is 16 seconds long.
When I tried to pass the buffer straight to the player node in the tap method, it seemed to work; I heard a delay from mic to speaker, but it did play back.
I tried calling prepare on the player before invoking play, but it didn't help.
Thanks!
OK, this was a super-uncool debugging session. I had to investigate AVAudioEngine and how this kind of scenario could be done there, which of course was not the final result I was looking for. This quest helped me understand how to solve it with AudioKit (half of my app is implemented using AudioKit's tools, so it doesn't make sense to rewrite it with AVFoundation).
AVFoundation solution:
private let engine = AVAudioEngine()
private let bufferSize = 1024
private let p: AVAudioPlayerNode = AVAudioPlayerNode()

let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(.playAndRecord, mode: .default, options: .defaultToSpeaker)
} catch {
    print("Setting category to AVAudioSessionCategoryPlayback failed.")
}

let inputNode = self.engine.inputNode
engine.connect(inputNode, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))

// !!! the following lines are the key to the solution.
// !!! the player has to be attached to the engine before actually connected
engine.attach(p)
engine.connect(p, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))

do {
    try engine.start()
} catch {
    print("could not start engine \(error.localizedDescription)")
}

recordBufferAndPlay(duration: 4)
recordBufferAndPlay function:
func recordBufferAndPlay(duration: Double) {
    let inputNode = self.engine.inputNode
    let total: Double = AVAudioSession.sharedInstance().sampleRate * duration
    let totalBufferSize: UInt32 = UInt32(total)

    let recordedBuffer: AVAudioPCMBuffer! = AVAudioPCMBuffer(pcmFormat: inputNode.inputFormat(forBus: 0),
                                                             frameCapacity: totalBufferSize)
    var alreadyRecorded = 0

    inputNode.installTap(onBus: 0, bufferSize: 256, format: inputNode.inputFormat(forBus: 0)) {
        (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in

        recordedBuffer.copy(from: buffer) // this helper function is taken from AudioKit!
        alreadyRecorded = alreadyRecorded + Int(buffer.frameLength)
        print(alreadyRecorded, totalBufferSize)

        if alreadyRecorded >= totalBufferSize {
            inputNode.removeTap(onBus: 0)
            self.p.scheduleBuffer(recordedBuffer, at: nil, options: .loops, completionHandler: {
                print("completed playing")
            })
            self.p.play()
        }
    }
}
AudioKit solution:
So in the AudioKit solution these lines should be invoked on your AKPlayer object. Note that this should be done before you actually start your engine.
self.player.buffering = .always
AudioKit.engine.attach(self.player.playerNode)
AudioKit.engine.connect(self.player.playerNode, to: self.mixer.inputNode, format: AudioKit.engine.inputNode.outputFormat(forBus: 0))
Then the recording is done pretty much the same way you would do it in AVAudioEngine: you install a tap on your node (microphone or any other node) and record the buffer of PCM samples.
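As a rough sketch of that recording step (mic and player are assumptions based on the snippets above, and copy(from:) is the same AudioKit AVAudioPCMBuffer helper mentioned in the question):
// Sketch of the AudioKit-side recording, mirroring recordBufferAndPlay above.
// `mic` is assumed to be an AKMicrophone (or any AKNode) and `player` the AKPlayer
// that was attached and connected before AudioKit.start() was called.
func recordAndPlay(duration: Double) {
    let input = mic.avAudioNode
    let format = input.outputFormat(forBus: 0)
    let totalFrames = AVAudioFrameCount(format.sampleRate * duration)

    guard let recordedBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: totalFrames) else { return }
    var recordedFrames: AVAudioFrameCount = 0

    input.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        // copy(from:) is the AVAudioPCMBuffer extension shipped with AudioKit,
        // used here the same way as in the AVAudioEngine version above.
        recordedBuffer.copy(from: buffer)
        recordedFrames += buffer.frameLength

        if recordedFrames >= totalFrames {
            input.removeTap(onBus: 0)
            DispatchQueue.main.async {
                self.player.buffer = recordedBuffer
                self.player.play()
            }
        }
    }
}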
I am trying to save depth images from the iPhoneX TrueDepth camera. Using the AVCamPhotoFilter sample code, I am able to view the depth, converted to grayscale format, on the screen of the phone in real-time. I cannot figure out how to save the sequence of depth images in the raw (16 bits or more) format.
I have depthData which is an instance of AVDepthData. One of its members is depthDataMap which is an instance of CVPixelBuffer and image format type kCVPixelFormatType_DisparityFloat16. Is there a way to save it to the phone to transfer for offline manipulation?
There's no standard video format for "raw" depth/disparity maps, which might have something to do with AVCapture not really offering a way to record it.
You have a couple of options worth investigating here:
Convert depth maps to grayscale textures (which you can do using the code in the AVCamPhotoFilter sample code), then pass those textures to AVAssetWriter to produce a grayscale video. Depending on the video format and grayscale conversion method you choose, other software you write for reading the video might be able to recover depth/disparity info with sufficient precision for your purposes from the grayscale frames.
Anytime you have a CVPixelBuffer, you can get at the data yourself and do whatever you want with it. Use CVPixelBufferLockBaseAddress (with the readOnly flag) to make sure the content won't change while you read it, then copy data from the pointer CVPixelBufferGetBaseAddress provides to wherever you want. (Use other pixel buffer functions to see how many bytes to copy, and unlock the buffer when you're done.)
Watch out, though: if you spend too much time copying from buffers, or otherwise retain them, they won't get deallocated as new buffers come in from the capture system, and your capture session will hang. (All told, it's unclear without testing whether a device has the memory & I/O bandwidth for much recording this way.)
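For the second option, here is a minimal sketch of snapshotting one DisparityFloat16 buffer into a Data value (where the bytes go afterwards, a file, the network, and so on, is up to you):
import CoreVideo
import Foundation

// Sketch of the "copy the bytes yourself" option for a single depth/disparity frame.
func copyDepthData(from pixelBuffer: CVPixelBuffer) -> Data? {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    guard let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer) else { return nil }
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    // Note: bytesPerRow may include padding beyond width * 2 bytes; copy row by row
    // if you need a tightly packed result.
    return Data(bytes: baseAddress, count: bytesPerRow * height)
}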
You can use the Compression library to create a compressed file with the raw CVPixelBuffer data.
A few problems with this solution:
It's a lot of data, and zlib is not great compression for it (the compressed file is about 20 times bigger than a video with the same number of frames at 32 bits per frame).
Apple's Compression library creates a file that a standard zip program doesn't open. I use zlib in C code to read it and call inflateInit2(&strm, -15); to make it work.
You'll need to do some work to export the file out of your application.
Here is my code (which I limited to 250 frames since it holds everything in RAM, but you can flush to disk if you need more frames):
// DepthCapture.swift
// AVCamPhotoFilter
//
// Created by Eyal Fink on 07/04/2018.
// Copyright © 2018 Resonai. All rights reserved.
//
// Capture the depth pixelBuffer into a compressed file.
// This is very hacky and there are lots of TODOs, but eventually it should be
// replaced with much better compression (video compression)...
import AVFoundation
import Foundation
import Compression
class DepthCapture {
    let kErrorDomain = "DepthCapture"
    let maxNumberOfFrame = 250
    lazy var bufferSize = 640 * 480 * 2 * maxNumberOfFrame  // maxNumberOfFrame frames
    var dstBuffer: UnsafeMutablePointer<UInt8>?
    var frameCount: Int64 = 0
    var outputURL: URL?
    var compresserPtr: UnsafeMutablePointer<compression_stream>?
    var file: FileHandle?

    // All operations handling the compressor objects are done on the
    // processingQ so they will happen sequentially
    var processingQ = DispatchQueue(label: "compression",
                                    qos: .userInteractive)

    func reset() {
        frameCount = 0
        outputURL = nil
        if self.compresserPtr != nil {
            //free(compresserPtr!.pointee.dst_ptr)
            compression_stream_destroy(self.compresserPtr!)
            self.compresserPtr = nil
        }
        if self.file != nil {
            self.file!.closeFile()
            self.file = nil
        }
    }

    func prepareForRecording() {
        reset()
        // Create the output zip file, remove old one if exists
        let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0] as NSString
        self.outputURL = URL(fileURLWithPath: documentsPath.appendingPathComponent("Depth"))
        FileManager.default.createFile(atPath: self.outputURL!.path, contents: nil, attributes: nil)
        self.file = FileHandle(forUpdatingAtPath: self.outputURL!.path)
        if self.file == nil {
            NSLog("Cannot create file at: \(self.outputURL!.path)")
            return
        }
        // Init the compression object
        compresserPtr = UnsafeMutablePointer<compression_stream>.allocate(capacity: 1)
        compression_stream_init(compresserPtr!, COMPRESSION_STREAM_ENCODE, COMPRESSION_ZLIB)
        dstBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
        compresserPtr!.pointee.dst_ptr = dstBuffer!
        //defer { free(bufferPtr) }
        compresserPtr!.pointee.dst_size = bufferSize
    }

    func flush() {
        //let data = Data(bytesNoCopy: compresserPtr!.pointee.dst_ptr, count: bufferSize, deallocator: .none)
        let nBytes = bufferSize - compresserPtr!.pointee.dst_size
        print("Writing \(nBytes)")
        let data = Data(bytesNoCopy: dstBuffer!, count: nBytes, deallocator: .none)
        self.file?.write(data)
    }

    func startRecording() throws {
        processingQ.async {
            self.prepareForRecording()
        }
    }

    func addPixelBuffers(pixelBuffer: CVPixelBuffer) {
        processingQ.async {
            if self.frameCount >= self.maxNumberOfFrame {
                // TODO now!! flush when needed!!!
                print("MAXED OUT")
                return
            }

            CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
            let add: UnsafeMutableRawPointer = CVPixelBufferGetBaseAddress(pixelBuffer)!
            self.compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(add.assumingMemoryBound(to: UInt8.self))
            let height = CVPixelBufferGetHeight(pixelBuffer)
            self.compresserPtr!.pointee.src_size = CVPixelBufferGetBytesPerRow(pixelBuffer) * height

            let flags = Int32(0)
            let compression_status = compression_stream_process(self.compresserPtr!, flags)
            if compression_status != COMPRESSION_STATUS_OK {
                NSLog("Buffer compression returned: \(compression_status)")
                return
            }
            if self.compresserPtr!.pointee.src_size != 0 {
                NSLog("Compression lib didn't eat all data: \(compression_status)")
                return
            }
            CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
            // TODO(eyal): flush when needed!!!
            self.frameCount += 1
            print("handled \(self.frameCount) buffers")
        }
    }

    func finishRecording(success: @escaping ((URL) -> Void)) throws {
        processingQ.async {
            let flags = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
            self.compresserPtr!.pointee.src_size = 0
            //compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(0)
            let compression_status = compression_stream_process(self.compresserPtr!, flags)
            if compression_status != COMPRESSION_STATUS_END {
                NSLog("ERROR: Finish failed. compression returned: \(compression_status)")
                return
            }
            self.flush()
            DispatchQueue.main.sync {
                success(self.outputURL!)
            }
            self.reset()
        }
    }
}
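For context, here is a hedged sketch of how this class might be driven from the capture side; the delegate wiring and the depthCapture property are assumptions based on an AVCamPhotoFilter-style setup, not part of the code above:
// Hypothetical usage from an AVCaptureDepthDataOutputDelegate, roughly where
// AVCamPhotoFilter hands you each AVDepthData.
let depthCapture = DepthCapture()

func depthDataOutput(_ output: AVCaptureDepthDataOutput,
                     didOutput depthData: AVDepthData,
                     timestamp: CMTime,
                     connection: AVCaptureConnection) {
    depthCapture.addPixelBuffers(pixelBuffer: depthData.depthDataMap)
}

// Somewhere in your UI code:
try? depthCapture.startRecording()
// ... capture for a while ...
try? depthCapture.finishRecording { url in
    print("Compressed depth frames written to \(url)")
}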
To explain my situation a little better I'm trying to make an app which will play a ping noise when a button is pressed and then proceed to record and transcribe the user's voice immediately after.
For the ping sound I'm using System Sound Services, to record the audio I'm using AudioToolbox, and to transcribe it I'm using Speech kit.
I believe the crux of my problem lies in the timing of the asynchronous System sound services play function:
// Button pressed function
let audiosession = AVAudioSession.sharedInstance()
let filename = "Ping"
let ext = "wav"

if let soundUrl = Bundle.main.url(forResource: filename, withExtension: ext) {
    var soundId: SystemSoundID = 0
    AudioServicesCreateSystemSoundID(soundUrl as CFURL, &soundId)
    AudioServicesAddSystemSoundCompletion(soundId, nil, nil, { (soundid, _) -> Void in
        AudioServicesDisposeSystemSoundID(soundid)
        print("Sound played!")
    }, nil)
    AudioServicesPlaySystemSound(soundId)
}

do {
    try audiosession.setCategory(AVAudioSessionCategoryRecord)
    try audiosession.setMode(AVAudioSessionModeMeasurement)
    try audiosession.setActive(true, with: .notifyOthersOnDeactivation)
    print("Changing modes!")
} catch {
    print("error with audio session")
}

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

guard let inputNode = audioEngine.inputNode else {
    fatalError("Audio engine has no input node!")
}
guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create a speech audio buffer recognition request object")
}

recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, delegate: self)

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()
do {
    try audioEngine.start()
    delegate?.didStartRecording()
} catch {
    print("audioEngine couldn't start because of an error")
}
What happens when I run this code is that it records the voice and transcribes it successfully. However, the ping is never played. The two (non-error) print statements I have in there fire in this order:
Changing modes!
Sound played!
So to my understanding, the reason the ping sound isn't being played is that by the time it actually completes, I've already changed the audio session category from playback to record. Just to verify this, I tried removing everything but the System Sound Services ping, and it plays the sound as expected.
So my question is: what is the best way to work around the asynchronous nature of the AudioServicesPlaySystemSound call? I've experimented with passing self into the completion function so I could have it trigger a function in my class that then runs the recording chunk. However, I haven't been able to figure out how to convert self to an UnsafeMutableRawPointer so it can be passed as clientData. Furthermore, even if I did know how to do that, I'm not sure it's a good idea or the intended use of that parameter.
Alternatively, I could probably solve this problem by relying on something like NotificationCenter. But once again, that just seems like a very clunky way of solving the problem that I'm going to end up regretting later.
Does anyone know the correct way to handle this type of situation?
Update:
As per Gruntcake's request, here is my attempt to access self in the completion block.
First I create a userData constant which is an UnsafeMutableRawPointer to self:
var me = self
let userData = withUnsafePointer(to: &me) { ptr in
    return unsafeBitCast(ptr, to: UnsafeMutableRawPointer.self)
}
Next I use that constant in my callback block, and attempt to access self from it:
AudioServicesAddSystemSoundCompletion(soundId, nil, nil, { (sounded, me) -> Void in
    AudioServicesDisposeSystemSoundID(sounded)
    let myself = Unmanaged<myclassname>.fromOpaque(me!).takeRetainedValue()
    myself.doOtherStuff()
    print("Sound played!")
}, userData)
Your attempt to call doOtherStuff() in the completion block is a correct approach (the only other one is notifications; those are the only two options).
What is complicating it in this case is the bridging from Obj-C to Swift that is necessary. Code to do that is:
let myData = unsafeBitCast(self, UnsafeMutablePointer<Void>.self)
AudioServicesAddSystemSoundCompletion(YOUR_SOUND_ID, CFRunLoopGetMain(), kCFRunLoopDefaultMode, { (mSound, mVoid) in
    let me = unsafeBitCast(mVoid, YOURCURRENTCLASS.self)
    // me is your current object, so if you have a variable like
    // var someVar you can do
    print(me.someVar)
}, myData)
Credit: This code was taken from an answer to this question, though it is not the accepted answer:
How do I implement AudioServicesSystemSoundCompletionProc in Swift?
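That snippet uses Swift 2-era syntax. In current Swift, the same bridging is usually written with Unmanaged; here is a sketch, with YourClass, soundId, and doOtherStuff() standing in for the names from the question:
// Sketch of the same idea in modern Swift. Pass an unretained reference to self as
// the clientData pointer, and recover it inside the C callback with Unmanaged.
let userData = Unmanaged.passUnretained(self).toOpaque()

AudioServicesAddSystemSoundCompletion(soundId, CFRunLoopGetMain(), CFRunLoopMode.defaultMode.rawValue, { soundId, clientData in
    AudioServicesDisposeSystemSoundID(soundId)
    guard let clientData = clientData else { return }
    let myself = Unmanaged<YourClass>.fromOpaque(clientData).takeUnretainedValue()
    myself.doOtherStuff() // e.g. switch the session to record mode and start the engine
}, userData)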
I've been stuck on this problem for days now and have looked through nearly every related StackOverflow page. Through this, I now have a much greater understanding of what FFT is and how it works. Despite this, I'm having extreme difficulties implementing it into my application.
In short, what I am trying to do is make a spectrum visualizer for my application (Similar to this). From what I've gathered, I'm pretty sure I need to use the magnitudes of the sound as the heights of my bars. So with all this in mind, currently I am able to analyze an entire .caf file all at once. To do this, I am using the following code:
let audioFile = try! AVAudioFile(forReading: soundURL!)
let frameCount = UInt32(audioFile.length)
let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
do {
    try audioFile.readIntoBuffer(buffer, frameCount: frameCount)
} catch {
}
let log2n = UInt(round(log2(Double(frameCount))))
let bufferSize = Int(1 << log2n)
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
var realp = [Float](count: bufferSize/2, repeatedValue: 0)
var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)
vDSP_ctoz(UnsafePointer<DSPComplex>(buffer.floatChannelData.memory), 2, &output, 1, UInt(bufferSize / 2))
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))
var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
let bufferOver2: vDSP_Length = vDSP_Length(bufferSize / 2)
vDSP_zvmags(&output, 1, &fft, 1, bufferOver2)
This works fine and outputs a long array of data. However, the problem with this code is it analyzes the entire audio file at once. What I need is to be analyzing the audio file as it is playing, very similar to this video: Spectrum visualizer.
So I guess my question is this: How do you perform FFT analysis while the audio is playing?
Also, on top of this, how do I go about converting the output of an FFT analysis to actual heights for a bar? One of the outputs I received for an audio file using the FFT analysis code from above was this: http://pastebin.com/RBLTuGx7. The only reason for the pastebin is due to how long it is. I'm assuming I average all these numbers together and use those values instead? (Just for reference, I got that array by printing out the 'fft' variable in the code above)
I've attempted reading through the EZAudio code; however, I am unable to find how they are reading in samples of audio in real time. Any help is greatly appreciated.
Here's how it is done in AudioKit, using EZAudio's FFT tools:
Create a class for your FFT that will hold the data:
@objc public class AKFFT: NSObject, EZAudioFFTDelegate {

    internal let bufferSize: UInt32 = 512
    internal var fft: EZAudioFFT?

    /// Array of FFT data
    public var fftData = [Double](count: 512, repeatedValue: 0.0)

    ...
}
Initialize the class and set up the FFT. Also install the tap on the appropriate node.
public init(_ input: AKNode) {
    super.init()
    fft = EZAudioFFT.fftWithMaximumBufferSize(vDSP_Length(bufferSize), sampleRate: 44100.0, delegate: self)
    input.avAudioNode.installTapOnBus(0, bufferSize: bufferSize, format: AKManager.format) { [weak self] (buffer, time) -> Void in
        if let strongSelf = self {
            buffer.frameLength = strongSelf.bufferSize
            let offset: Int = Int(buffer.frameCapacity - buffer.frameLength)
            let tail = buffer.floatChannelData[0]
            strongSelf.fft!.computeFFTWithBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
        }
    }
}
Then implement the callback to load your internal fftData array:
@objc public func fft(fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
    dispatch_async(dispatch_get_main_queue()) { () -> Void in
        for i in 0...511 {
            self.fftData[i] = Double(fftData[i])
        }
    }
}
AudioKit's implementation may change so you should check https://github.com/audiokit/AudioKit/ to see if any improvements were made. EZAudio is at https://github.com/syedhali/EZAudio
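As for turning the FFT output into bar heights (the other part of the question): rather than averaging the entire array, a common approach is to group the magnitude bins into as many bands as you have bars and map each band's average level to a height, usually on a dB scale. A rough sketch, where barCount, maxBarHeight, and the dB range are arbitrary choices you would tune for your data:
import Foundation

// Sketch: collapse FFT magnitudes into `barCount` bar heights.
// `fftData` is assumed to hold magnitudes (e.g. from vDSP_zvmags or EZAudioFFT).
func barHeights(from fftData: [Double], barCount: Int, maxBarHeight: Double) -> [Double] {
    let binsPerBar = max(fftData.count / barCount, 1)
    return (0..<barCount).map { bar in
        let start = bar * binsPerBar
        let end = min(start + binsPerBar, fftData.count)
        let band = fftData[start..<end]
        guard !band.isEmpty else { return 0 }
        let average = band.reduce(0, +) / Double(band.count)

        // Convert to decibels so quiet bands are still visible, then normalize.
        // The -60 dB floor and the scaling are arbitrary; adjust for your signal.
        let db = 10 * log10(max(average, 1e-12))
        let normalized = (db + 60) / 60
        return max(0, min(1, normalized)) * maxBarHeight
    }
}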