I am creating a metronome as part of a larger app and I have a few very short wav files to use as the individual sounds. I would like to use AVAudioEngine because NSTimer has significant latency problems and Core Audio seems rather daunting to implement in Swift. I'm attempting the following, but I'm currently unable to implement the first 3 steps and I'm wondering if there is a better way.
Code outline:
1. Create an array of file URLs according to the metronome's current settings (number of beats per bar and subdivisions per beat; file A for beats, file B for subdivisions)
2. Programmatically create a wav file with the appropriate number of frames of silence, based on the tempo and the length of the files, and insert it into the array between each of the sounds
3. Read those files into a single AudioBuffer or AudioBufferList
4. audioPlayer.scheduleBuffer(buffer, atTime:nil, options:.Loops, completionHandler:nil)
So far I have been able to play a looping buffer (step 4) of a single sound file, but I haven't been able to construct a buffer from an array of files or create silence programmatically, nor have I found any answers on StackOverflow that address this. So I'm guessing that this isn't the best approach.
My question is: Is it possible to schedule a sequence of sounds with low latency using AVAudioEngine and then loop that sequence? If not, which framework/approach is best suited for scheduling sounds when coding in Swift?
I was able to make a buffer containing the sound from a file followed by silence of the required length. Hope this helps:
// audioFile here is an instance of AVAudioFile initialized with the wav file
func tickBuffer(forBpm bpm: Int) -> AVAudioPCMBuffer {
    // Position in the file to read from; must be reset if you read several times from one AVAudioFile
    audioFile.framePosition = 0
    // Length of one tick at the given bpm (sound length + silence length), in frames
    let periodLength = AVAudioFrameCount(audioFile.processingFormat.sampleRate * 60 / Double(bpm))
    let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: periodLength)
    try! audioFile.readIntoBuffer(buffer) // sorry for the forced try
    // Key to success: extending frameLength to the full period appends silence after the sound
    buffer.frameLength = periodLength
    return buffer
}
// player is an instance of AVAudioPlayerNode within your AVAudioEngine
func startLoop() {
    player.stop()
    let buffer = tickBuffer(forBpm: bpm)
    player.scheduleBuffer(buffer, atTime: nil, options: .Loops, completionHandler: nil)
    player.play()
}
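The same idea extends to a whole bar with beats and subdivisions, which is what steps 1 to 3 of the question were after. Below is a minimal sketch in plain Swift that lays the clicks out in one flat array of mono Float samples, with silence in between. The names framesPerBeat and makeBar are made up for this example, and it assumes both clicks are already decoded to mono samples at the same sample rate.

```swift
/// Number of frames in one beat at the given tempo.
func framesPerBeat(bpm: Double, sampleRate: Double) -> Int {
    return Int(sampleRate * 60.0 / bpm)
}

/// Lays out one bar as mono samples: `beatClick` at the start of each beat,
/// `subdivisionClick` at each later subdivision, and silence (0) elsewhere.
/// Assumes subdivisionsPerBeat >= 1 (hypothetical helper, not an AVFoundation API).
func makeBar(beatClick: [Float], subdivisionClick: [Float],
             beatsPerBar: Int, subdivisionsPerBeat: Int,
             bpm: Double, sampleRate: Double) -> [Float] {
    let beatFrames = framesPerBeat(bpm: bpm, sampleRate: sampleRate)
    let slotFrames = beatFrames / subdivisionsPerBeat
    var bar = [Float](repeating: 0, count: beatFrames * beatsPerBar)
    for beat in 0..<beatsPerBar {
        for sub in 0..<subdivisionsPerBeat {
            let click = (sub == 0) ? beatClick : subdivisionClick
            let start = beat * beatFrames + sub * slotFrames
            // Truncate a click that would run past the end of the bar.
            let count = min(click.count, bar.count - start)
            for i in 0..<count {
                bar[start + i] = click[i]
            }
        }
    }
    return bar
}
```

The resulting samples could then be copied into an AVAudioPCMBuffer through its floatChannelData and scheduled with the looping option as in startLoop() above. Note that a click longer than one slot is simply overwritten by the next click in this sketch.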
I think one way to have sounds played with the lowest possible timing error is to provide the audio samples directly via a callback. On iOS you can do this with an AudioUnit.
In this callback you can keep a running sample count, so you always know which sample you are at. From the sample counter you can derive a time value (using the sample rate) and use it for your high-level tasks, such as a metronome. When you see that it is time to play the metronome sound, you simply start copying audio samples from that sound into the output buffer.
This is only the theory, without any code, but you can find many examples of AudioUnit and the callback technique.
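To make the sample-counting idea concrete, here is a minimal sketch in plain Swift. It is not AudioUnit code: ClickScheduler and render(into:) are hypothetical names, and the output array stands in for the buffer a real render callback would hand you. It assumes mono Float samples and framesPerBeat > 0.

```swift
struct ClickScheduler {
    let click: [Float]      // mono samples of the metronome sound
    let framesPerBeat: Int  // sampleRate * 60 / bpm, must be > 0
    var sampleCounter: Int  // total number of frames rendered so far

    init(click: [Float], framesPerBeat: Int) {
        precondition(framesPerBeat > 0, "framesPerBeat must be positive")
        self.click = click
        self.framesPerBeat = framesPerBeat
        self.sampleCounter = 0
    }

    /// Fills `output` the way a render callback would: click samples at the
    /// start of every beat, silence for the rest of the beat.
    mutating func render(into output: inout [Float]) {
        for i in 0..<output.count {
            let positionInBeat = sampleCounter % framesPerBeat
            output[i] = positionInBeat < click.count ? click[positionInBeat] : 0
            sampleCounter += 1
        }
    }
}
```

Because the position is derived from the running counter, the click lands on the right sample even when a beat boundary falls in the middle of a callback buffer. In a real render callback you would write into the provided buffer rather than a Swift array, and avoid allocating or locking on the audio thread.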
To expand upon 5hrp's answer:
Take the simple case where you have two beats, an upbeat (tone1) and a downbeat (tone2), and you want them out of phase with each other so the audio will be (up, down, up, down) to a certain bpm.
You will need two instances of AVAudioPlayerNode (one for each beat), let's call them audioNode1 and audioNode2
You will want the first beat to be in phase, so set it up as normal:
let buffer = tickBuffer(forBpm: bpm)
audioNode1.scheduleBuffer(buffer, atTime: nil, options: .loops, completionHandler: nil)
Then you want the second beat to be exactly out of phase, i.e. to start half a beat period later, at t = (60 / bpm) / 2 seconds. For this you can use an AVAudioTime variable. Note that the sample time is expressed in frames, so the rate passed to AVAudioTime should be the sample rate:
let audioTime2 = AVAudioTime(sampleTime: AVAudioFramePosition(audioFile2.processingFormat.sampleRate * 60 / Double(bpm) * 0.5), atRate: audioFile2.processingFormat.sampleRate)
You can use this variable when scheduling the buffer, like so:
audioNode2.scheduleBuffer(buffer, atTime: audioTime2, options: .loops, completionHandler: nil)
This will play your two beats on a loop, half a beat out of phase with each other!
It's easy to see how to generalise this to more beats, to create a whole bar. It's not the most elegant solution though, because if you want to say do 16th notes you'd have to create 16 nodes.
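As a rough sketch of that generalisation, the start offset of each node can be computed in frames with a small helper. startOffsets is a made-up name for this example, and it assumes the sounds are evenly spaced within one beat.

```swift
/// Start offsets, in frames, for `count` evenly spaced sounds within one beat.
/// Hypothetical helper; each offset is truncated to a whole frame.
func startOffsets(count: Int, bpm: Double, sampleRate: Double) -> [Int64] {
    let framesPerBeat = sampleRate * 60.0 / bpm
    return (0..<count).map { index in
        Int64(framesPerBeat * Double(index) / Double(count))
    }
}
```

For example, startOffsets(count: 2, bpm: 120, sampleRate: 44100) gives [0, 11025]. Each offset could then be used as the sampleTime of an AVAudioTime when scheduling the corresponding node.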
Related
I am trying to extract MFCC vectors from the audio signal as input into a recurrent neural network. However, I am having trouble figuring out how to obtain the raw audio frames in Swift using Core Audio. Presumably, I have to go low-level to get that data, but I cannot find helpful resources in this area.
How can I get the audio signal information that I need using Swift?
Edit: This question was flagged as a possible duplicate of How to capture audio samples in iOS with Swift?. However, that particular question does not have the answer that I am looking for. Namely, the solution to that question is the creation of an AVAudioRecorder, which is a component, not the end result, of a solution to my question.
This question How to convert WAV/CAF file's sample data to byte array? is more in the direction of where I am headed. The solutions to that are written in Objective-C, and I am wondering if there is a way to do it in Swift.
Attaching a tap to the default input node on AVAudioEngine is pretty straightforward and will get you real-time ~100ms chunks of audio from the microphone as Float32 arrays. You don't even have to connect any other audio units. If your MFCC extractor & network are sufficiently responsive this may be the easiest way to go.
let audioEngine = AVAudioEngine()
if let inputNode = audioEngine.inputNode {
    inputNode.installTap(onBus: 0,         // mono input
                         bufferSize: 1000, // a request, not a guarantee
                         format: nil,      // no format translation
                         block: { buffer, when in
        // This block will be called over and over for successive buffers
        // of microphone data until you stop() AVAudioEngine
        let actualSampleCount = Int(buffer.frameLength)
        // buffer.floatChannelData?.pointee[n] has the data for point n
        var i = 0
        while (i < actualSampleCount) {
            let val = buffer.floatChannelData?.pointee[i]
            // do something to each sample here...
            i += 1
        }
    })
    do {
        try audioEngine.start()
    } catch let error as NSError {
        print("Got an error starting audioEngine: \(error.domain), \(error)")
    }
}
You will need to request and obtain microphone permission as well.
I find the amplitudes to be rather low, so you may need to apply some gain or normalization depending on your network's needs.
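If you go the normalization route, a minimal sketch could look like the following. It assumes you have already copied the samples of one tap buffer into a Swift array; peakNormalized is a made-up name, and peak normalization is only one of several options (RMS normalization or a fixed gain may suit your network better).

```swift
/// Scales the samples so that the largest absolute value becomes `targetPeak`.
/// Silent input (all zeros) is returned unchanged to avoid dividing by zero.
func peakNormalized(_ samples: [Float], targetPeak: Float = 1.0) -> [Float] {
    let peak = samples.map { abs($0) }.max() ?? 0
    guard peak > 0 else { return samples }
    let gain = targetPeak / peak
    return samples.map { $0 * gain }
}
```

Normalizing each buffer independently changes the gain from chunk to chunk, so for feature extraction you may prefer to track the peak over a longer window.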
To process your WAV files, I'd try AVAssetReader, though I don't have code at hand for that.
Let's say I have an AVAudioFile with a duration of 10 seconds. I want to load that file into an AVAudioPCMBuffer but I only want to load the audio frames that come after a certain number of seconds/milliseconds or after a certain AVAudioFramePosition.
It doesn't look like AVAudioFile's readIntoBuffer methods give me that kind of precision so I'm assuming I'll have to work at the AVAudioBuffer level or lower?
You just need to set the AVAudioFile's framePosition property before reading.
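Since framePosition is expressed in frames, a time offset has to be converted using the file's sample rate first. A minimal sketch of that arithmetic, with made-up helper names:

```swift
/// Converts a time offset in seconds to a frame position at the given sample rate.
func framePosition(forSeconds seconds: Double, sampleRate: Double) -> Int64 {
    return Int64((seconds * sampleRate).rounded())
}

/// Number of frames left to read from `position` to the end of the file.
func framesRemaining(from position: Int64, totalFrames: Int64) -> Int64 {
    return max(0, totalFrames - position)
}
```

For a 10-second file at 44.1 kHz, skipping the first 2.5 seconds means setting framePosition to 110250, which leaves 330750 frames to read into a buffer of at least that capacity. The sample rate would come from the file's processingFormat and the total frame count from its length property.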
I need to send audio data in real time in PCM format: 8 kHz, 16-bit, mono.
The audio must be sent as an array of chars together with its length:
(char *data, int len).
I'm a beginner in audio processing and can't really understand how to accomplish that. My best attempt was to convert to the iLBC format, but it didn't work. Is there any sample showing how to record audio and convert it to a given format? I have already read Learning Core Audio by Chris Adamson and Kevin Avila, but I didn't find a solution that works.
Simply put, what I need is:
(record)->(convert?)-> send(char *data, int length);
Because I need to send the data as arrays of chars, I can't use a player.
EDIT:
I managed to make everything work with recording and with reading buffers. What I can't manage is :
if (ref[i]->mAudioDataByteSize != 0) {
    char *data = (char *)ref[i]->mAudioData;
    sendData(mHandle, data, ref[i]->mAudioDataByteSize);
}
This is not really a beginner task. The solutions are to use either the RemoteIO Audio Unit, the Audio Queue API, or an AVAudioEngine installTapOnBus block. These will give you near real-time (depending on the buffer size) buffers of audio samples (Int16s or Floats, etc.) that you can convert, compress, pack into other data types or arrays, etc. They usually do this by calling a callback function or block that you provide, in which you can do whatever you want with the incoming recorded audio sample buffers.
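As an illustration of the "convert and pack" step, here is a minimal sketch in plain Swift that turns Float samples in the range -1...1 into 16-bit little-endian PCM bytes. pcm16Bytes is a made-up name, and the sketch assumes the samples are already mono and already at the target sample rate (resampling to 8 kHz is a separate step that is not shown).

```swift
/// Packs Float samples (expected range -1...1) into 16-bit little-endian PCM bytes.
/// Out-of-range values are clamped. Assumes the input contains no NaN values,
/// because converting NaN to Int16 would trap.
func pcm16Bytes(from samples: [Float]) -> [UInt8] {
    var bytes = [UInt8]()
    bytes.reserveCapacity(samples.count * 2)
    for sample in samples {
        let clamped = max(-1.0, min(1.0, sample))
        let value = Int16(clamped * 32767.0)
        let bits = UInt16(bitPattern: value)
        bytes.append(UInt8(bits & 0xFF)) // low byte first (little-endian)
        bytes.append(UInt8(bits >> 8))   // then high byte
    }
    return bytes
}
```

The resulting byte array and its count are the kind of (data, length) pair that a send function such as the one in the question expects.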
I'm working on a simple audio playback app. It has approximately 10 audio files, each with a normal playback tempo of 100 beats per minute. The user is able to input a tempo variable (between 70 and 140 b.p.m.) which is assigned (tempo/100) to the AVAudioPlayer rate var, just before the play() function is called...
@IBAction func playPause(sender: AnyObject) {
    if !isPlaying {
        let audioPath = NSBundle.mainBundle().pathForResource(selectedTrack, ofType: "mp3")
        do {
            try player = AVAudioPlayer(contentsOfURL: NSURL(fileURLWithPath: audioPath!))
            player.enableRate = true
            player.rate = Float(tempo) / 100
            player.play()
            isPlaying = !isPlaying
        } catch {
            print("play error")
        }
    }
}
Playback at the audio's normal tempo (100b.p.m.) works perfectly fine. However, by changing the tempo by even a single bpm unit, the playback sounds really poor. The tempo shift sounds accurate (i.e. lowering the tempo var results in the audio slowing down, and vice versa), and the pitch sounds like it is maintained (albeit, a little 'wobbly' in sustained notes), but the quality of the sound seems to be negatively affected in a major way. I would perhaps expect this for more extreme rate changes (rate<50% or rate>200%), but it is totally apparent even at 99% and 101%.
I was using 44k/16bit .wav, then tried .mp3 (in a variety of qualities), all with the same result. I've also looked at this, which seems similar if not the same (though the query was never resolved)... AVAudioPlayer rate change introduces artifacts/distortion
Changing the playback speed of these files using other software (DAWs and virtual DJ softs) does not create the same anomalies, so my assumption is that perhaps the algorithm that interpolates the extra data points on the waveform is simply not robust enough for my purpose.
But if anyone can give me a solution, I'd be totally stoked.
I'm trying to make an accurate timer to analyze an input. I'd like to be able to measure 1% deviation in signals of ~200ms.
My understanding is that using an AudioUnit will be able to get <1ms.
I tried implementing the code from Stefan Popp's example
After updating a few things to get it to work on Xcode 6.3, I have the example working. However:
While I do eventually want to capture audio, I thought there should be some way to get a notification, like NSTimer, so I tried an AudioUnitAddRenderNotify, but it does exactly what it says it should - i.e it's tied to the render, not just an arbitrary timer. Is there some way to get a callback triggered without having to record or play?
When I examine mSampleTime, I find that the interval between slices does match the inNumberFrames - 512 - which works out to 11.6ms. I see the same interval for both record and play. I need more resolution than that.
I tried playing with kAudioSessionProperty_PreferredHardwareIOBufferDuration but all the examples I could find use the deprecated AudioSessions, so I tried to convert to AudioUnits:
Float32 preferredBufferSize = .001; // in seconds
status = AudioUnitSetProperty(audioUnit, kAudioSessionProperty_PreferredHardwareIOBufferDuration, kAudioUnitScope_Output, kOutputBus, &preferredBufferSize, sizeof(preferredBufferSize));
But I get OSStatus -10879, kAudioUnitErr_InvalidProperty.
Then I tried kAudioUnitProperty_MaximumFramesPerSlice with values of 128 and 256, but inNumberFrames is always 512.
UInt32 maxFrames = 128;
status = AudioUnitSetProperty(audioUnit, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFrames, sizeof(maxFrames));
[EDIT]
I am trying to compare the timing of an input (user's choice of MIDI or microphone) to when it should be. Specifically, is the instrument being played before or after the beat/metronome and by how much? This is for musicians, not a game, so precision is expected.
[EDIT]
The answers seem re-active to events. i.e. They let me precisely see when something happened, however I don't see how I do something accurately. My fault for not being clear. My app needs to be the metronome as well - synchronize playing a click on the beat and flash a dot on the beat - then I can analyze the user's action to compare timing. But if I can't play the beat accurately, the rest falls apart. Maybe I'm supposed to record audio - even if I don't want it - just to get inTimeStamp from the callback?
[EDIT]
Currently my metronome is:
- (void)setupAudio
{
    AVAudioPlayer *audioPlayer;
    NSString *path = [NSString stringWithFormat:@"%@/click.mp3", [[NSBundle mainBundle] resourcePath]];
    NSURL *soundUrl = [NSURL fileURLWithPath:path];
    audioPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:soundUrl error:nil];
    [audioPlayer prepareToPlay];
    CADisplayLink *syncTimer;
    syncTimer = [CADisplayLink displayLinkWithTarget:self selector:@selector(syncFired:)];
    syncTimer.frameInterval = 30;
    [syncTimer addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSDefaultRunLoopMode];
}
- (void)syncFired:(CADisplayLink *)displayLink
{
    [audioPlayer play];
}
You should be using a circular buffer, and performing your analysis on the signal in chunks that match your desired frame count on your own timer. To do this you set up a render callback, then feed your circular buffer the input audio in the callback. Then you set up your own timer which will pull from the tail of the buffer and do your analysis. This way you could be feeding the buffer 1024 frames every 0.023 seconds (1024 / 44100), and your analysis timer could fire maybe every 0.000725 seconds and analyze 32 samples. Here is a related question about circular buffers.
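For illustration, a very small ring buffer could be sketched like this in plain Swift. This RingBuffer type is hypothetical and deliberately simplified: it is not thread-safe, so it only shows the indexing logic, and a real render callback would need a lock-free implementation.

```swift
struct RingBuffer {
    private var storage: [Float]
    private var readIndex = 0
    private var writeIndex = 0
    private(set) var count = 0  // number of samples currently stored

    init(capacity: Int) {
        storage = [Float](repeating: 0, count: capacity)
    }

    /// Appends samples; returns false (and writes nothing) if they do not fit.
    @discardableResult
    mutating func write(_ samples: [Float]) -> Bool {
        guard samples.count <= storage.count - count else { return false }
        for sample in samples {
            storage[writeIndex] = sample
            writeIndex = (writeIndex + 1) % storage.count
        }
        count += samples.count
        return true
    }

    /// Removes and returns up to `n` of the oldest samples.
    mutating func read(_ n: Int) -> [Float] {
        let amount = max(0, min(n, count))
        var out = [Float]()
        out.reserveCapacity(amount)
        for _ in 0..<amount {
            out.append(storage[readIndex])
            readIndex = (readIndex + 1) % storage.count
        }
        count -= amount
        return out
    }
}
```

The callback side would call write with each incoming block of frames, while the analysis timer calls read with its own, smaller chunk size.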
EDIT
To get precision timing using a ring buffer, you could also store the timestamp corresponding to the audio buffer. I use TPCircularBuffer for doing just that. TPCircularBufferPrepareEmptyAudioBufferList, TPCircularBufferProduceAudioBufferList, and TPCircularBufferNextBufferList will copy and retrieve the audio buffer and timestamp to and from a ring buffer. Then when you are doing your analysis, there will be a timestamp corresponding to each buffer, eliminating the need to do all of your work in the render thread, and allowing you to pick and choose your analysis window.
If you are using something like cross-correlation and/or a peak detector to find a matched sample vector within an audio sample buffer (or a ring buffer containing samples), then you should be able to count samples between sharp events to within one sample (1/44100 of a second, or 0.0226757 milliseconds, at a 44.1 kHz sample rate), plus or minus some time estimation error. For events more than one Audio Unit buffer apart, you can sum the number of samples within the intervening buffers to get a more precise time interval than just using (much coarser) buffer timing.
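A brute-force version of that matching step might look like the sketch below. lagOfBestMatch and milliseconds(forSamples:sampleRate:) are made-up names, and this is a plain unnormalized cross-correlation meant only to show the idea. For real buffers you would likely want a normalized or FFT-based variant.

```swift
/// Returns the offset (in samples) at which `template` best matches `signal`,
/// using an unnormalized cross-correlation score, or nil if it cannot fit.
func lagOfBestMatch(of template: [Float], in signal: [Float]) -> Int? {
    guard !template.isEmpty, signal.count >= template.count else { return nil }
    var best = 0
    var bestScore = -Float.infinity
    for lag in 0...(signal.count - template.count) {
        var score: Float = 0
        for i in 0..<template.count {
            score += signal[lag + i] * template[i]
        }
        if score > bestScore {
            bestScore = score
            best = lag
        }
    }
    return best
}

/// Converts a sample count to milliseconds at the given sample rate.
func milliseconds(forSamples samples: Int, sampleRate: Double) -> Double {
    return Double(samples) / sampleRate * 1000.0
}
```

The difference between two such offsets, plus the frame counts of any buffers in between, gives the interval in samples, which milliseconds(forSamples:sampleRate:) converts to time.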
However, note that there is a latency or delay between every sample buffer and speaker audio going out, as well as between microphone sound reception and buffer callbacks. That has to be measured, as in you can measure the round trip time between sending a sample buffer out, and when the input buffer autocorrelation estimation function gets it back. This is how long it takes the hardware to buffer, convert (analog to digital and vice versa) and pass the data. That latency might be around the area of 2 to 6 times 5.8 milliseconds, using appropriate Audio Session settings, but might be different for different iOS devices.
Yes, the most accurate way to measure audio is to capture the audio and look at the data in the actual sampled audio stream.