I'm building a rhythm game and trying to provide extremely low latency audio response to the user using AudioKit.
I'm new to AudioKit, following the Hello world example, I built a very simple test app using AKOscillator:
...
let oscillator = AKOscillator()
...
AudioKit.output = oscillator
oscillator.frequency = 880
AKSettings.bufferLength = .shortest
AKSettings.ioBufferDuration = 0.002
AudioKit.start()
... // On Touch event ///
oscillator.start()
... // 20 ms later ///
oscillator.stop()
I measured the latency between the touch event and the first sound coming out; it's around 100 ms, which is way too slow for us...
A few possibilities I could think of:
100 ms is hitting the hardware limit of the audio output delay
some more magic settings can fix this
oscillator.start() has some delay; to achieve the lowest latency I should use something else (see the sketch after this question)
something wrong with the other parts of the code (touch handling etc.)
Since I have no experience with AudioKit (nor with the iOS audio system...), any piece of information will be really appreciated!
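For the third possibility (start()/stop() itself adding delay), one pattern sometimes used with AudioKit is to keep the oscillator running from the start and only toggle its amplitude on touch, so the render chain is already warm. This is only a minimal sketch under that assumption, written against the AudioKit 4-style API used in the question (on newer AudioKit versions AudioKit.start() is throwing and needs try); it is not a confirmed fix for the 100 ms figure:

import AudioKit
import UIKit

class ToneViewController: UIViewController {

    let oscillator = AKOscillator()

    override func viewDidLoad() {
        super.viewDidLoad()

        // Ask for the smallest buffers before the engine starts
        // (the OS may still choose a larger value).
        AKSettings.bufferLength = .shortest

        oscillator.frequency = 880
        oscillator.amplitude = 0          // silent, but the node keeps rendering

        AudioKit.output = oscillator
        AudioKit.start()                  // `try AudioKit.start()` on newer AudioKit versions
        oscillator.start()                // start once, up front, and leave it running
    }

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        // Toggling amplitude avoids whatever setup cost start() may carry.
        oscillator.amplitude = 1
    }

    override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
        oscillator.amplitude = 0
    }
}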
I am writing an app to record single-channel audio with the built-in microphone on an iPhone 6. The app works as expected when configured to record at 8000 Hz. Here's the code:
// Set up audio session
let session = AVAudioSession.sharedInstance()
// Configure audio session
do {
    try session.setCategory(AVAudioSessionCategoryPlayAndRecord)
    var recordSettings = [String: AnyObject]()
    recordSettings[AVFormatIDKey] = Int(kAudioFormatLinearPCM) as AnyObject
    // Set the sampling rate
    recordSettings[AVSampleRateKey] = 8000.0 as AnyObject
    recordSettings[AVNumberOfChannelsKey] = 1 as AnyObject
    recorder = try AVAudioRecorder(url: outputFileURL, settings: recordSettings)
    recorder?.delegate = self
    recorder?.isMeteringEnabled = true
    recorder?.prepareToRecord()
    return true
}
catch {
    throw Error.AVConfiguration
}
To reduce the storage requirements, I would like to record at a much lower sample rate (ideally less than 1000 Hz). If I set the sample rate to 1000 Hz, the app records at 8000 Hz.
According to Apple's documentation,
The available range for hardware sample rate is device dependent. It typically ranges from 8000 through 48000 hertz.
Question...is it possible to use AVAudioSession (or other framework) to record audio at a low sample rate?
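For what it's worth, the sample rate you ask the session for is only a preference, and the session reports back what the hardware actually granted, which is what the documentation quote above describes. A small sketch that shows the mismatch (the 1000 Hz figure is just the value from the question):

import AVFoundation

let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(AVAudioSessionCategoryPlayAndRecord)
    // A preference only; the hardware decides what it can actually do.
    try session.setPreferredSampleRate(1000.0)
    try session.setActive(true)
} catch {
    print("Session configuration failed: \(error)")
}
// On an iPhone this typically reports 8000 Hz or more, not the 1000 Hz requested.
print("Actual hardware sample rate: \(session.sampleRate)")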
Audio recording on the iPhone is done with hardware codecs, so the available sample rates are hardcoded and can't be changed. But if you need a 1 kHz sample rate, you can record at 8 kHz and then just resample the recording with some resampling library. Personally, I prefer to use ffmpeg for such tasks.
I hope you are aware that, by the Nyquist theorem, you cannot expect very useful results from what you are trying to achieve.
That is, unless you are targeting low frequencies only. In that case you might want to use a low-pass filter first. It's almost impossible to understand voices with only frequencies below 500 Hz. Speech is usually said to require 3 kHz, which makes for a sample rate of 6000.
For an example of what you'd have to expect try something similar to:
ffmpeg -i tst.mp3 -ar 1000 tst.wav
with e.g. some vocals and listen to the result.
You can however possibly achieve some acceptable trade-off using e.g. a sample rate of 3000.
An alternative would be to do some compression on the fly as @manishg suggested. Since smartphones these days can do video compression in real time, that should be totally feasible with the iPhone's hardware and software. But it's a totally different thing from reducing the sample rate.
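If the underlying goal is smaller files rather than a low sample rate as such, here is a hedged sketch of that on-the-fly compression idea, using AVAudioRecorder with AAC instead of linear PCM (the bit-rate value is only illustrative, not from the thread):

import AVFoundation

func makeCompressedRecorder(to outputFileURL: URL) throws -> AVAudioRecorder {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(AVAudioSessionCategoryPlayAndRecord)
    try session.setActive(true)

    // AAC at a low bit rate shrinks the file without fighting the
    // hardware's fixed range of sample rates.
    let settings: [String: Any] = [
        AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
        AVSampleRateKey: 8000.0,        // a rate the hardware actually supports
        AVNumberOfChannelsKey: 1,
        AVEncoderBitRateKey: 16_000     // illustrative value, not from the thread
    ]

    let recorder = try AVAudioRecorder(url: outputFileURL, settings: settings)
    recorder.prepareToRecord()
    return recorder
}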
I'm working on a simple audio playback app. It has approximately 10 audio files, each with a normal playback tempo of 100 beats per minute. The user is able to input a tempo variable (between 70 and 140 b.p.m.) which is assigned (tempo/100) to the AVAudioPlayer rate var, just before the play() function is called...
@IBAction func playPause(sender: AnyObject) {
    if !isPlaying {
        let audioPath = NSBundle.mainBundle().pathForResource(selectedTrack, ofType: "mp3")
        do {
            try player = AVAudioPlayer(contentsOfURL: NSURL(fileURLWithPath: audioPath!))
            player.enableRate = true
            player.rate = Float(tempo) / 100
            player.play()
            isPlaying = !isPlaying
        } catch {
            print("play error")
        }
    }
}
Playback at the audio's normal tempo (100b.p.m.) works perfectly fine. However, by changing the tempo by even a single bpm unit, the playback sounds really poor. The tempo shift sounds accurate (i.e. lowering the tempo var results in the audio slowing down, and vice versa), and the pitch sounds like it is maintained (albeit, a little 'wobbly' in sustained notes), but the quality of the sound seems to be negatively affected in a major way. I would perhaps expect this for more extreme rate changes (rate<50% or rate>200%), but it is totally apparent even at 99% and 101%.
I was using 44k/16bit .wav, then tried .mp3 (in a variety of qualities), all with the same result. I've also looked at this, which seems similar if not the same (though the query was never resolved)... AVAudioPlayer rate change introduces artifacts/distortion
Changing the playback speed of these files using other software (DAWs and virtual DJ softs) does not create the same anomalies, so my assumption is that perhaps the algorithm that interpolates the extra data points on the waveform is simply not robust enough for my purpose.
But if anyone can give me a solution, I'd be totally stoked.
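Not something this thread confirms, but one commonly suggested route for better-quality tempo changes is to play the file through AVAudioEngine and let AVAudioUnitTimePitch do the time-stretching instead of AVAudioPlayer's rate property. A minimal sketch in current Swift syntax, reusing the question's tempo/100 mapping:

import AVFoundation

// Plays a file through AVAudioUnitTimePitch so the tempo change uses its
// time-stretch algorithm rather than AVAudioPlayer's rate handling.
class RateAdjustedPlayer {
    private let engine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private let timePitch = AVAudioUnitTimePitch()

    init() {
        engine.attach(playerNode)
        engine.attach(timePitch)
        engine.connect(playerNode, to: timePitch, format: nil)
        engine.connect(timePitch, to: engine.mainMixerNode, format: nil)
    }

    func play(fileURL: URL, tempo: Double) throws {
        let file = try AVAudioFile(forReading: fileURL)

        // Same mapping as the question: 100 b.p.m. is the file's native tempo.
        timePitch.rate = Float(tempo) / 100

        playerNode.stop()
        playerNode.scheduleFile(file, at: nil, completionHandler: nil)
        try engine.start()
        playerNode.play()
    }
}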
I downloaded the SpeakHere example and changed the parameters as below:
#define kBufferDurationSeconds 0.020

void AQRecorder::SetupAudioFormat(UInt32 inFormatID)
{
    memset(&mRecordFormat, 0, sizeof(mRecordFormat));
    mRecordFormat.mFormatID = kAudioFormatLinearPCM;
    mRecordFormat.mSampleRate = 8000.0;
    mRecordFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
    mRecordFormat.mBitsPerChannel = 16;
    mRecordFormat.mFramesPerPacket = mRecordFormat.mChannelsPerFrame = 1;
    mRecordFormat.mBytesPerFrame = (mRecordFormat.mBitsPerChannel / 8) * mRecordFormat.mChannelsPerFrame;
    mRecordFormat.mBytesPerPacket = mRecordFormat.mBytesPerFrame;
}
But I found that the callback function AQRecorder::MyInputBufferHandler() is not called every 20 ms. Instead it is called four times at roughly 1 ms intervals, then not again for about 500 ms, then four more times at 1 ms intervals, then another 500 ms gap, over and over, even though I set kBufferDurationSeconds = 0.02.
What causes this? Please help me.
In iOS, the Audio Session setPreferredIOBufferDuration API (did you even use an OS buffer duration call?) is only a request expressing the app's preference. The OS is free to choose a different buffer duration that is compatible with what iOS thinks is best (for battery life, compatibility with other apps, etc.).
Audio Queues run on top of Audio Units. If the RemoteIO Audio Unit is using 500 ms buffers, it will cut them up into 4 smaller Audio Queue buffers and pass those smaller buffers to the Audio Queue API in a quick burst.
If you use the Audio Unit API instead of the Audio Queue API, and the Audio Session API for a setPreferredIOBufferDuration message, you may be able to request and get shorter, more evenly spaced, audio buffers.
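For reference, the buffer-duration request itself goes through the audio session, and the session reports back what was actually granted. A small sketch using the same Swift session API style as the recording question above (the 20 ms value mirrors kBufferDurationSeconds):

import AVFoundation

let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(AVAudioSessionCategoryPlayAndRecord)
    // This is only a preference; iOS may round it to something it prefers.
    try session.setPreferredIOBufferDuration(0.020)
    try session.setActive(true)
} catch {
    print("Audio session setup failed: \(error)")
}
// Compare the request with what the OS actually granted.
print("Granted IO buffer duration: \(session.ioBufferDuration) s")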
I simply need to know when the user starts talking into the microphone. I will not be doing any speech processing or anything fancy, just detect when the microphone has picked up anything. I've been looking for an hour now and can't find anything as simple as that. Can somebody point me in the right direction?
Update 1
I apologise for how late this is; I have been having connectivity issues. Here's the code I've been using:
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.
    let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let bus = 0
    inputNode.installTapOnBus(bus, bufferSize: 8192, format: inputNode.inputFormatForBus(bus)) {
        (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
        println("Speech detected.")
    }
    audioEngine.prepare()
    audioEngine.startAndReturnError(nil)
}
The callback you're passing to installTapOnBus will be called with every audio block coming from the mic. The code above detects that your app has started listening -- to silence or anything else -- not whether someone is speaking into the mic.
In order to actually identify the start of speech you would need to look at the data in the buffer.
A simple version of this is similar to an audio noise gate used in PA systems: Pick an amplitude threshold and a duration threshold and once both are met you call it speech. Because phones, mics, and environments all vary you will probably need to adaptively determine an amplitude threshold to get good performance.
Even with an adaptive threshold, sustained loud sounds will still be considered "speech". If you need to weed those out too then you'll want to look at some sort of frequency analysis (e.g., FFT) and identify sufficient amplitude and variation over time in speech frequencies. Or you could pass the buffers to the speech recognition engine (e.g., SFSpeechRecognizer) and see whether it recognizes anything, hence piggybacking on Apple's signal processing work. But that's pretty heavyweight if you do it often.
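A minimal sketch of the noise-gate idea described above, computing an RMS level inside the tap and comparing it against a fixed threshold. It is written against the current installTap(onBus:) spelling of the same API; the 0.02 threshold and 8192-frame buffer size are placeholders, and the duration check is left out:

import AVFoundation

let audioEngine = AVAudioEngine()   // keep a strong reference (e.g. a property) so it isn't deallocated
let inputNode = audioEngine.inputNode
let bus = 0
let speechThreshold: Float = 0.02   // placeholder; adapt per device/environment

inputNode.installTap(onBus: bus, bufferSize: 8192, format: inputNode.inputFormat(forBus: bus)) { buffer, _ in
    guard let channelData = buffer.floatChannelData?[0] else { return }
    let frameCount = Int(buffer.frameLength)

    // Root-mean-square amplitude of this buffer.
    var sumOfSquares: Float = 0
    for i in 0..<frameCount {
        sumOfSquares += channelData[i] * channelData[i]
    }
    let rms = sqrt(sumOfSquares / Float(max(frameCount, 1)))

    if rms > speechThreshold {
        print("Loud enough to be speech (RMS \(rms))")
    }
}

audioEngine.prepare()
do {
    try audioEngine.start()
} catch {
    print("Could not start audio engine: \(error)")
}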
I use ffmpeg to decode a video/audio stream and use PortAudio to play the audio. I've run into a sync problem with PortAudio. I have a function like the one below:
double AudioPlayer::getPlaySec() const
{
double const latency = Pa_GetStreamInfo( mPaStream )->outputLatency;
double const bytesPerSec = mSampleRate * Pa_GetSampleSize( mSampleFormat ) * mChannel;
double const playtime = mConsumedBytes / bytesPerSec;
return playtime - latency;
}
mConsumedBytes is the byte count written to the audio device in the PortAudio callback function. I thought I could derive the playing time from that byte count. In practice, when I run another process (like opening Firefox) that keeps the CPU busy, the audio becomes intermittent, but the callback doesn't stop, so mConsumedBytes grows more than expected and getPlaySec() returns a time larger than the actual playing time.
I have no idea how this happens. Any suggestion is welcome. Thanks!
Latency in PortAudio is defined a bit vaguely: something like the average time between when you put data into the buffer and when you can expect it to play. That's not something you want to use for this purpose.
Instead, to find the current playback time of the device, you can actually poll the device using the Pa_GetStreamTime function.
You may want to see this document for more detailed info.
I know this is old. But still: PortAudio v19+ can provide you with its own (actual) sample rate. You should use that for audio sync, since the actual playback sample rate can differ between different pieces of hardware. PortAudio might try to compensate (depending on the implementation). If you have drift problems, try using that.