I have an AVAssetWriter, and I validate my audio compression settings dictionary using the canApply(outputSettings: audioCompressionSettings, forMediaType: .audio) API.
One of the fields in the compression settings is the audio sample rate, set via AVSampleRateKey. My question is: if the sample rate I set in this key is different from the sample rate of the audio sample buffers that are appended, can this cause the audio to drift away from the video? Is setting an arbitrary sample rate in the asset writer settings not recommended?
If the sample rate of the sample buffers and the AVAssetWriterInput's outputSettings differ, then the sample buffers will be rate converted.
I have not observed AV sync problems due to Apple sample rate conversion APIs.
They seem to do the right thing.
Are you seeing a problem?
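For reference, a minimal sketch of what such settings might look like; the specific key values (AAC, 44.1 kHz, 128 kbps) are illustrative, not a recommendation:

```swift
import AVFoundation

// Sketch: AAC output settings pinned to 44.1 kHz. If the appended sample
// buffers arrive at a different rate (say 48 kHz), the writer input
// rate-converts them to match these settings.
let audioCompressionSettings: [String: Any] = [
    AVFormatIDKey: kAudioFormatMPEG4AAC,
    AVSampleRateKey: 44_100,
    AVNumberOfChannelsKey: 2,
    AVEncoderBitRateKey: 128_000
]

let audioInput = AVAssetWriterInput(mediaType: .audio,
                                    outputSettings: audioCompressionSettings)
audioInput.expectsMediaDataInRealTime = true
```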
Related
I need to analyse chunks of audio data of (approximately) 1 second with a sample rate of 8kHz. Although the audio will be recorded in real time, it will only be used for detecting specific events. Hence, there are no strict latency requirements. What would be the best framework to use in this case?
I already started learning Core Audio and worked through the book Learning Core Audio. With the minimal amount of Swift documentation available on the internet, I was able to set up an AUGraph for iOS to record audio with the Remote I/O audio unit and to get access to the raw samples with the output render callback. Unfortunately, I got stuck trying to create chunks of 1 second of audio samples to perform the audio analysis. Could a custom AudioBufferList be used for this? Or could a large ring buffer be implemented on the Remote I/O audio unit (as is required in the case of a HAL audio unit)?
I also tried to adopt AVFoundation with AVAssetReader to obtain the audio chunks. Although I was able to obtain samples of a recorded audio signal, I did not succeed in creating a buffer of 1 second (and I don't even know whether it would be possible to do this in real time). Would AVFoundation be a good choice in this situation anyhow?
I would appreciate any advice on this.
A major problem for me is that I'm trying to use Swift, but there is not much example code available and even less documentation. I feel it would be better to switch to Objective-C for audio programming and to stop trying to get everything in Swift. I am curious whether that would be a better time investment.
For analyzing 1-second windows of audio samples, the simplest solution would be to use the Audio Queue API with a lock-free ring buffer (say, around 2 seconds long) to record samples. You can use a repeating NSTimer task to poll how full the buffer is, and emit 1-second chunks to a processing task as they become available.
Core Audio and the RemoteIO Audio Unit are for when you need much shorter data windows, with latency requirements on the order of a few milliseconds.
Core Audio is a C API.
Objective-C is an extension of C. I find that Objective-C is much nicer for working with Core Audio than Swift.
I created a cross-platform C lockless ring buffer. There is sample code that demonstrates setting up the ring, setting up the mic, playing audio, and reading and writing from the ring.
The ring records the last N seconds that you specify; old data is overwritten by new data. So you can specify, for example, that you want the latest 3 seconds recorded. The sample I show plays a sine wave while recording through the microphone. Every 7 seconds, it grabs the last 2 seconds of recorded audio.
Here is the complete sample code on github.
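To illustrate the overwrite-the-oldest idea (this is an invented sketch, not the code from the GitHub sample; a truly lock-free version would also make the write counter atomic with proper memory ordering):

```c
#include <stdint.h>
#include <stdlib.h>

/* Single-producer/single-consumer ring that keeps the most recent
 * `capacity` samples; old data is silently overwritten by new data. */
typedef struct {
    float   *data;
    size_t   capacity;  /* total samples the ring can hold       */
    uint64_t written;   /* running count of samples ever written */
} Ring;

static void ring_init(Ring *r, size_t capacity) {
    r->data = calloc(capacity, sizeof(float));
    r->capacity = capacity;
    r->written = 0;
}

/* Producer side: call from the audio input callback. */
static void ring_write(Ring *r, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        r->data[(r->written + i) % r->capacity] = src[i];
    r->written += n;
}

/* Consumer side: copy out the most recent n samples (n <= capacity,
 * and n <= the number of samples written so far). */
static void ring_read_latest(const Ring *r, float *dst, size_t n) {
    uint64_t start = r->written - n;
    for (size_t i = 0; i < n; i++)
        dst[i] = r->data[(start + i) % r->capacity];
}
```

Grabbing "the last 2 seconds" is then just `ring_read_latest(&ring, out, 2 * sampleRate)` from the consumer thread.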
I'm recording audio files at a sample rate of 44.1 kHz. I like having high-quality audio for playback purposes. However, when I want to export via text or email, the audio files fail to export because they're larger than 15 MB (usually audio files longer than about 3 minutes). Is there a way to reduce the bit rate only when I want to export? I've seen the following tutorial, but I'd rather keep my files as .m4a rather than converting to AAC:
http://atastypixel.com/blog/easy-aac-compressed-audio-conversion-on-ios/.
You can use AVAssetReader and AVAssetWriter to transcode an audio file to one with different parameters (lower bit rate, higher compression, etc.). Just because you create a new (temporary?) audio file for export doesn't force you to delete the current higher quality audio file you want for playback.
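A rough sketch of that reader/writer transcode; sourceURL and exportURL are placeholders, and the 64 kbps bit rate is just an illustrative "smaller for export" choice:

```swift
import AVFoundation

// Sketch: re-encode an existing recording at a lower AAC bit rate for export.
let asset = AVURLAsset(url: sourceURL)                 // placeholder URL
let track = asset.tracks(withMediaType: .audio)[0]

let reader = try AVAssetReader(asset: asset)
let readerOutput = AVAssetReaderTrackOutput(track: track, outputSettings: [
    AVFormatIDKey: kAudioFormatLinearPCM               // decode to PCM first
])
reader.add(readerOutput)

let writer = try AVAssetWriter(outputURL: exportURL, fileType: .m4a)
let writerInput = AVAssetWriterInput(mediaType: .audio, outputSettings: [
    AVFormatIDKey: kAudioFormatMPEG4AAC,
    AVSampleRateKey: 44_100,
    AVNumberOfChannelsKey: 2,
    AVEncoderBitRateKey: 64_000                        // smaller export file
])
writer.add(writerInput)

reader.startReading()
writer.startWriting()
writer.startSession(atSourceTime: .zero)

writerInput.requestMediaDataWhenReady(on: DispatchQueue(label: "transcode")) {
    while writerInput.isReadyForMoreMediaData {
        if let buffer = readerOutput.copyNextSampleBuffer() {
            writerInput.append(buffer)
        } else {
            writerInput.markAsFinished()
            writer.finishWriting { /* export file is ready */ }
            break
        }
    }
}
```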
What I want to do is take the output samples of an AVAsset corresponding to an audio file (no video involved) and send them to an audio effect class that takes in a block of samples, and I want to be able to do this in real time.
I am currently looking at the AVFoundation class reference and programming guide, but I can't see a way to redirect the output of a player item and send it to my effect class, and from there send the transformed samples to an audio output (using AVAssetReaderAudioMixOutput?) and hear them from there. I see that the AVAssetReader class gives me a way to get a block of samples using
[myAVAssetReader addOutput:myAVAssetReaderTrackOutput];
[myAVAssetReaderTrackOutput copyNextSampleBuffer];
but Apple's documentation specifies that the AVAssetReader class is not designed for, and should not be used in, real-time situations. Does anybody have a suggestion on where to look, or whether I am taking the right approach?
The MTAudioProcessingTap is perfect for this. By leveraging an AVPlayer, you can avoid having to pull the samples yourself with an AVAssetReaderOutput and then render them yourself in an Audio Queue or with an Audio Unit.
Instead, attach an MTAudioProcessingTap to the inputParameters of your AVAsset's audioMix, and you'll be given samples in blocks which are easy to then throw into an effect unit.
Another benefit from this is that it will work with AVAssets derived from URLs that can't always be opened by other Apple APIs (like Audio File Services), such as the user's iPod library. Additionally, you get all of the functionality like tolerance of audio interruptions that the AVPlayer provides for free, which you would otherwise have to implement by hand if you went with an AVAssetReader solution.
To set up a tap you have to set up some callbacks that the system invokes as appropriate during playback. Full code for such processing can be found at this tutorial here.
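As an outline of what that setup looks like in Swift (songURL is a placeholder, and the process callback is reduced to the bare minimum):

```swift
import AVFoundation
import MediaToolbox

// Sketch: create a tap whose process callback pulls the source audio and
// leaves a spot for your effect to modify the samples in place.
var callbacks = MTAudioProcessingTapCallbacks(
    version: kMTAudioProcessingTapCallbacksVersion_0,
    clientInfo: nil,
    init: nil,
    finalize: nil,
    prepare: nil,
    unprepare: nil,
    process: { tap, numberFrames, _, bufferListInOut, numberFramesOut, flagsOut in
        // Fill bufferListInOut with the source audio...
        MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut,
                                           flagsOut, nil, numberFramesOut)
        // ...then run your effect over the samples in bufferListInOut here.
    })

var tap: Unmanaged<MTAudioProcessingTap>?
MTAudioProcessingTapCreate(kCFAllocatorDefault, &callbacks,
                           kMTAudioProcessingTapCreationFlag_PostEffects, &tap)

// Wire the tap into the asset's audio mix and hand it to a player item.
let asset = AVURLAsset(url: songURL)                    // placeholder URL
let track = asset.tracks(withMediaType: .audio)[0]
let params = AVMutableAudioMixInputParameters(track: track)
params.audioTapProcessor = tap?.takeRetainedValue()
let audioMix = AVMutableAudioMix()
audioMix.inputParameters = [params]

let playerItem = AVPlayerItem(asset: asset)
playerItem.audioMix = audioMix
// AVPlayer(playerItem: playerItem).play() drives the tap during playback.
```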
There's a new MTAudioProcessingTap object in iOS 6 and OS X 10.8. Check out the Session 517 video from WWDC 2012; they demonstrate exactly what you want to do.
WWDC Link
AVAssetReader is not ideal for realtime usage because it handles the decoding for you, and in various cases copyNextSampleBuffer can block for random amounts of time.
That being said, AVAssetReader can be used wonderfully well in a producer thread feeding a circular buffer. It depends on your required usage, but I've had good success using this method to feed a RemoteIO output, and doing my effects/signal processing in the RemoteIO callback.
I've been happily synthesizing audio (at 44.1 kHz) and sending it out through the RemoteIO audio unit. It's come to my attention that my app's audio is "garbled" when going out via HDMI to a certain model of TV. It looks to me like the problem is related to the fact that this TV is looking for audio data at 48 kHz.
Here are some questions:
Does RemoteIO adopt the sample rate of whichever device it's outputting to? If I'm sending audio via HDMI to a device that asks for 48 kHz, do my RemoteIO callback buffers become 48 kHz?
Is there some tidy way to force RemoteIO to keep thinking in terms of 44.1 kHz, and be smart enough to perform any necessary sample rate conversion on its own before it hands data off to the device?
If RemoteIO does indeed just defer to the device it's connected to, then presumably I need to do some sample rate conversion between my synthesis engine and RemoteIO. Is AudioConverterConvertComplexBuffer the best way to do this?
Fixed my problem. I was incorrectly assuming that the number of frames requested by the render callback would be a power of two. Changed my code to accommodate any arbitrary number of frames and all seems to work fine now.
If you want sample rate conversion, try using the Audio Queue API, or do the conversion within your own app using some DSP code.
Whether the RemoteIO buffer size or sample rate can be configured might depend on the iOS device model, OS version, audio routes, background modes, etc., so an app must accommodate different buffer sizes and sample rates when using RemoteIO.
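If you do go the DIY DSP route, the simplest (and lowest-quality) conversion is linear interpolation; a sketch with invented names, assuming mono float samples (a production converter applies a proper anti-aliasing filter instead):

```c
#include <stddef.h>

/* Naive linear-interpolation resampler: converts mono samples at inRate
 * to samples at outRate, writing at most outCap samples into out.
 * Returns the number of output samples actually produced. */
static size_t resample_linear(const float *in, size_t inCount, double inRate,
                              float *out, size_t outCap, double outRate) {
    double step = inRate / outRate;   /* input samples per output sample */
    size_t n = 0;
    for (double pos = 0.0; n < outCap; pos += step, n++) {
        size_t i = (size_t)pos;
        if (i + 1 >= inCount)         /* ran out of input to interpolate */
            break;
        double frac = pos - (double)i;
        out[n] = (float)((1.0 - frac) * in[i] + frac * in[i + 1]);
    }
    return n;
}
```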
Can I set any sample rate I want? What are the restrictions?
How about the hardware sample rate? And once that is set, what is the restriction on the internal sample rates passed between units?
I'm guessing that the actual hardware rate may have to be some bit shift of 44.1 kHz, and any internal sample rates must be a downward bit shift of this original value (e.g. 22.05 kHz, 11.025 kHz). Is this close?
As far as I understand,
1. I set the hardware sample rate from the Audio Session.
2. The system will set a sample rate as close as it is able to the sample rate I specified.
3. I then query the audio session for the same property I set, which will give me the actual sample rate it is using.
At the level of audio units, specifically the RemoteIO unit, the documentation states that at the two points where the unit connects to the hardware (i.e., the input scope of the microphone (input) bus and the output scope of the speaker (output) bus), the sample rate may be retrieved but not set.
However, when I attempt to access this value while constructing the Remote I/O unit, it returns zero. I guess I may need to start the unit before I can get meaningful data from its connections (the act of starting it probably creates the connections). So the solution here appears to be to get the sample rate from the audio session and use that, as per the above.
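In modern Swift terms (AVAudioSession rather than the old C Audio Session API), the three steps above look roughly like this; the 48 kHz request is just an example value:

```swift
import AVFoundation

let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord)
try session.setPreferredSampleRate(48_000)  // 1. request a hardware rate
try session.setActive(true)                 // 2. system picks the closest rate it can
let actualRate = session.sampleRate         // 3. query the rate actually in use
```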
I'm assuming you're on iOS since you mention AudioSessions. So you'll want to:
Check for audio input hardware. AudioSessionGetProperty (kAudioSessionProperty_AudioInputAvailable...)
Set audio session to "play & record" mode. AudioSessionSetProperty (kAudioSessionProperty_AudioCategory...) with kAudioSessionCategory_PlayAndRecord
Activate the session. AudioSessionSetActive()
Get the current hardware sample rate. AudioSessionGetProperty (kAudioSessionProperty_CurrentHardwareSampleRate)
Then you can set up your audio processing chain with the correct sample rate.
As for playing back audio, you can use any sample rate and the API should convert it to the hardware's output sample rate. Obviously, if you use a very high sample rate, it'll consume a lot of memory and CPU time.