iOS: Audio Units: setting arbitrary sample rate

Can I set any sample rate I want? What are the restrictions?
How about the hardware sample rate? And once that is set, what is the restriction on the internal sample rates passed between units?
I'm guessing that the actual hardware rate may have to be some bit shift of 44.1 kHz, and any internal sample rates must be a downward bit shift of this original value (e.g. 22.05 kHz, 11.025 kHz). Is this close?
As far as I understand,
1. I set the hardware sample rate from the Audio Session.
2. The system will set a sample rate as close as it is able to the sample rate I specified.
3. I then query the audio session for the same property I set, which will give me the actual sample rate it is using.
At the level of audio units, specifically the RemoteIO unit, the documentation states that at the two points where the unit connects to the hardware (i.e. the input scope of the microphone (input) bus and the output scope of the speaker (output) bus), the sample rate may be retrieved but not set.
However, when I attempt to access this value while constructing the RemoteIO unit, it returns zero. I guess I may need to start the unit before I can get meaningful data from its connections (the act of starting it probably creates the connections). So the solution here appears to be to get the sample rate from the audio session and use that, as described above.

I'm assuming you're on iOS since you mention AudioSessions. So you'll want to:
1. Check for audio input hardware: AudioSessionGetProperty(kAudioSessionProperty_AudioInputAvailable...)
2. Set the audio session to "play and record" mode: AudioSessionSetProperty(kAudioSessionProperty_AudioCategory...) with kAudioSessionCategory_PlayAndRecord
3. Activate the session: AudioSessionSetActive()
4. Get the current hardware sample rate: AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate)
Then you can set up your audio processing chain with the correct sample rate.
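In case it helps, here is a minimal Swift sketch of that sequence using AVAudioSession, the modern replacement for the deprecated AudioSession C calls above; the function name is just illustrative and error handling is kept to a bare minimum:

import AVFoundation

// Minimal sketch of the steps above using AVAudioSession (the modern
// replacement for the deprecated AudioSession C API).
func configureSession(preferredRate: Double = 44_100) -> Double? {
    let session = AVAudioSession.sharedInstance()

    // 1. Check for audio input hardware.
    guard session.isInputAvailable else { return nil }

    do {
        // 2. Play-and-record category.
        try session.setCategory(.playAndRecord, mode: .default, options: [])

        // Ask for a sample rate; the system treats this as a hint only.
        try session.setPreferredSampleRate(preferredRate)

        // 3. Activate the session.
        try session.setActive(true)
    } catch {
        print("Audio session setup failed: \(error)")
        return nil
    }

    // 4. Read back the rate the hardware actually chose and use that when
    //    setting up the rest of the audio processing chain.
    return session.sampleRate
}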
As for playing back audio, you can use any sample rate and the API should convert it to the hardware's output sample rate. Obviously, if you use a very high sample rate, it'll consume a lot of memory and CPU time.

Related

Can I use TokBox OTSubscriberKitNetworkStatsDelegate to calculate bandwidth

I am building a video conferencing app with TokBox. I would like to give the user an indication of how well the streams are behaving. I have noticed that the OTSubscriberKitNetworkStatsDelegate lets you view how many audio and video packets a subscriber has lost. What is unclear is whether this is an indication of the health of your connection or theirs. I assume that I could use this delegate to view my own dropped packets (as a publisher AND a subscriber). Would this be the way to calculate some kind of bandwidth indicator for TokBox?
UPDATE:
Great answers, and so quickly too! Impressive OpenTok community. Just to finish up here: the OTNetworkTest is awesome and actually uses the OTSubscriberKitNetworkStatsDelegate to calculate the quality of the stream, as I suspected. The only issue with it is that it is designed to run before you start your session. I need a test that can run as part of the existing session, so I am going to strip out the calculation parts and create a version of this class that uses your own subscriber data. Thanks for all the help, folks.
Well actually there are a few approaches.
Naive solution
A rough but simple approach: calculate the size of a frame, multiply it by the frame rate (the real one, not the nominal one), and then add the audio bitrate (kbps). That should give you a reasonably accurate picture of the actual bandwidth.
For the frame rate calculation, read about Dynamic frame rate controls.
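To make that arithmetic concrete, here is a tiny Swift sketch; the frame size, frame rate, and audio bitrate are placeholders you would measure in your own app:

// Rough bandwidth estimate: bits per frame times the real frame rate,
// converted to kilobits per second, plus the audio bitrate.
func estimatedKbps(frameSizeBytes: Double, realFps: Double, audioKbps: Double) -> Double {
    let videoKbps = frameSizeBytes * 8.0 * realFps / 1000.0
    return videoKbps + audioKbps
}

// e.g. ~30 KB frames at 24 fps plus 40 kbps of audio:
// estimatedKbps(frameSizeBytes: 30_000, realFps: 24, audioKbps: 40)   // ≈ 5,800 kbps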
OpenTok approach (The legit one)
I'd bet that a good user-experience solution would be not to show that everything's bad, but to adjust the stream quality, indicating errors only in case of a total failure (like Skype does). Look at this:
Starting with our 2.7.0 mobile SDK release, you can start a publisher with pre-determined video resolution and frames per second (fps). Before using the API, you should be aware of the following:
- Though HD video sounds like a good idea at first, from a practical standpoint you may run into issues with device CPU load on low to medium range devices. You may also be limited by the user's available bandwidth. Lastly, data charges for your users could run high.
- The actual empirical values for these parameters will vary based on the specific device. Your selection can be seen as a maximum for the resolution and frame rate you are willing to publish.
- The published resolution and frame rate are automatically adjusted based on various parameters like a user's packet loss, CPU utilization, and network bandwidth/bit-rate. Rather than attempting to do this dynamically on your own, we recommend picking meaningful values and allowing OpenTok to handle the fine tuning.
- To save bandwidth, set your publisher video type property to "screen" instead of the default "camera" value.
Taken from here
So, here's what you should do:
Implement the <OTSubscriberKitNetworkStatsDelegate> protocol first. It has a method called
- (void)subscriber:(OTSubscriberKit *)subscriber videoNetworkStatsUpdated:(OTSubscriberKitVideoNetworkStats *)stats
which, as you can see, receives an OTSubscriberKitVideoNetworkStats object.
Next, you can extract the following properties from this object:
@property (readonly) uint64_t videoPacketsLost – The estimated number of video packets lost by this subscriber.
@property (readonly) uint64_t videoPacketsReceived – The number of video packets received by this subscriber.
@property (readonly) uint64_t videoBytesReceived – The number of video bytes received by this subscriber.
@property (readonly) double timestamp – The timestamp, in milliseconds since the Unix epoch, for when these stats were gathered.
So, feel free to play around with these values and implement the best solution for your app.
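If you work in Swift, here is a rough sketch (assuming the standard Swift bridging of the delegate method above; everything outside the OpenTok names listed here is illustrative) that turns two consecutive callbacks into a packet-loss ratio and an approximate received video bitrate:

import Foundation
// Requires the OpenTok SDK for the OTSubscriberKit types.

final class SubscriberStatsMonitor: NSObject, OTSubscriberKitNetworkStatsDelegate {

    private var lastStats: OTSubscriberKitVideoNetworkStats?

    func subscriber(_ subscriber: OTSubscriberKit,
                    videoNetworkStatsUpdated stats: OTSubscriberKitVideoNetworkStats) {
        defer { lastStats = stats }
        guard let previous = lastStats else { return }

        // The counters are cumulative, so work with deltas between callbacks.
        let lost     = Double(stats.videoPacketsLost)     - Double(previous.videoPacketsLost)
        let received = Double(stats.videoPacketsReceived) - Double(previous.videoPacketsReceived)
        let bytes    = Double(stats.videoBytesReceived)   - Double(previous.videoBytesReceived)
        let seconds  = (stats.timestamp - previous.timestamp) / 1000.0   // timestamps are in ms

        guard seconds > 0, lost + received > 0 else { return }

        let lossRatio = lost / (lost + received)            // 0.0 ... 1.0
        let videoKbps = (bytes * 8.0) / seconds / 1000.0    // approximate received bitrate

        // Map these onto whatever "connection health" indicator your UI uses.
        print("video ~\(Int(videoKbps)) kbps, packet loss \(Int(lossRatio * 100))%")
    }
}

// Hook it up with something like: subscriber.networkStatsDelegate = statsMonitor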
Moreover, they have published an article specifically addressing how to manage different bandwidths on conference calls. Check it out.
UPD:
While I was writing this answer, @JaideepShah mentioned an amazing example. Read through the explanation for that example thoroughly. There is a table indicating the proper resolutions for the values I mentioned above.
It would be the health of your network connections to the TokBox platform/cloud.
The code at https://github.com/opentok/opentok-network-test shows you how to calculate the audio and video bitrate and this could be used as an indicator.
You are calculating the subscriber stats and not the publisher stats.

How to obtain audio chunks for analysis in core audio or AVFoundation

I need to analyse chunks of audio data of (approximately) 1 second with a sample rate of 8kHz. Although the audio will be recorded in real time, it will only be used for detecting specific events. Hence, there are no strict latency requirements. What would be the best framework to use in this case?
I already started learning Core Audio and worked through the book Learning Core Audio. With the minimal amount of Swift documentation available on the internet, I was able to set up an AUGraph on iOS to record audio with the remote I/O audio unit and to get access to the raw samples with the output render callback. Unfortunately, I got stuck trying to create chunks of 1 second of audio samples to perform the audio analysis. Could a custom AudioBufferList be used for this? Or could a large ring buffer be implemented on the remote I/O audio unit (like is required in the case of a HAL audio unit)?
I also tried to adopt AVFoundation with AVAssetReader to obtain the audio chunks. Although I was able to obtain samples of a recorded audio signal, I did not succeed in creating a buffer of 1 second (and I don't even know whether it would be possible to do this in real time). Would AVFoundation be a good choice in this situation anyway?
I would appreciate any advice on this.
A major problem for me is that I am trying to use Swift, but there is not much example code available and even less documentation. I feel that it would be better to switch to Objective-C for audio programming and stop trying to get everything done in Swift. I am curious whether this would be a better time investment?
For analyzing 1-second windows of audio samples, the simplest solution would be to use the Audio Queue API with a lock-free ring buffer (say around 2 seconds long) to record samples. You can use a repeating NSTimer task to poll how full the buffer is and emit 1-second chunks to a processing task when they become available.
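A sketch of that polling pattern in Swift; note the ring buffer here is deliberately simplified and not actually lock-free (a production version needs atomic read/write indices), and the write call would be driven by your Audio Queue or render callback:

import Foundation

// Simplified single-producer/single-consumer ring buffer. NOTE: this sketch
// is NOT truly lock-free; a real implementation needs atomic indices with
// proper memory ordering.
final class SampleRing {
    private var storage: [Float]
    private var written = 0    // total samples ever written
    private var consumed = 0   // total samples ever read

    init(capacity: Int) { storage = [Float](repeating: 0, count: capacity) }

    // Called from the audio callback (Audio Queue or render callback).
    func write(_ samples: UnsafePointer<Float>, count: Int) {
        for i in 0..<count { storage[(written + i) % storage.count] = samples[i] }
        written += count
    }

    // Called from the polling timer; returns nil until a full chunk exists.
    func readChunk(of length: Int) -> [Float]? {
        guard written - consumed >= length else { return nil }
        var chunk = [Float](repeating: 0, count: length)
        for i in 0..<length { chunk[i] = storage[(consumed + i) % storage.count] }
        consumed += length
        return chunk
    }
}

let sampleRate = 8_000                              // matches the 8 kHz requirement
let ring = SampleRing(capacity: sampleRate * 2)     // about 2 seconds of headroom

func analyze(_ chunk: [Float]) { /* event detection goes here */ }

// Poll a few times per second and hand off 1-second chunks as they fill up.
let poller = Timer.scheduledTimer(withTimeInterval: 0.25, repeats: true) { _ in
    while let chunk = ring.readChunk(of: sampleRate) {
        analyze(chunk)
    }
}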
Core Audio and the RemoteIO Audio Unit are for when you need much shorter data windows, with latency requirements on the order of a few milliseconds.
Core Audio is a C API.
Objective-C is an extension of C. I find that Objective-C is much nicer for working with Core Audio than Swift.
I created a cross-platform C lockless ring buffer. There is sample code that demonstrates setting up the ring, setting up the mic, playing audio, and reading and writing from the ring.
The ring records the last N seconds that you specify; old data is overwritten by new data. So you could specify that you want the latest 3 seconds recorded. The sample I show plays a sine wave while recording through the microphone. Every 7 seconds, it grabs the last 2 seconds of recorded audio.
Here is the complete sample code on github.

MTAudioProcessingTap - produce more output samples?

Inside my iOS 8.0 app I need to apply some custom audio processing on (non-realtime) audio playback. Typically, the audio comes from a device-local audio file.
Currently, I use an MTAudioProcessingTap on an AVMutableAudioMix. Inside the process callback I then call my processing code. In certain cases this processing code may produce more samples than the number of samples being passed in, and I wonder what's the best way to handle this (think of a time-stretching effect, for example).
The process callback takes a CMItemCount *numberFramesOut argument that signals the number of outgoing frames. For in-place processing, where the number of incoming and outgoing frames is identical, this is no problem. In the case where my processing generates more samples, I need a way to keep playback going until my output buffers are emptied.
Is MTAudioProcessingTap the right choice here anyway?
MTAudioProcessingTap does not support changing the number of samples between the input and the output (to skip silences for instance).
You will need a custom audio unit graph for this.
A circular buffer/FIFO is one of the most common methods to mediate between different producer and consumer rates, as long as the long-term rates are the same. If, over the long term, you plan on producing more samples than are played, you may need to occasionally pause producing samples, while still playing, so that you don't fill up the entire buffer or the system's memory.
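A rough Swift sketch of that flow-control idea (the type and its methods are placeholders, not any framework API): the producer checks for space before enqueueing and backs off when the FIFO is nearly full, while the consumer drains whatever the output asks for:

// Sketch only: in a real tap/render setup the two sides run on different
// threads, so the counters would need locking or atomics.
final class StretchBuffer {
    private(set) var queuedFrames = 0
    let capacity: Int
    init(capacity: Int) { self.capacity = capacity }

    // Producer side (e.g. the time-stretching code): returns false when the
    // FIFO is too full, telling the caller to hold off for now.
    func tryEnqueue(frames: Int) -> Bool {
        guard queuedFrames + frames <= capacity else { return false }
        queuedFrames += frames
        return true
    }

    // Consumer side (playback): drain up to what the output asks for.
    // If the returned count is short, the caller pads with silence.
    func dequeue(frames: Int) -> Int {
        let granted = min(frames, queuedFrames)
        queuedFrames -= granted
        return granted
    }
}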

Match a sound in recorded audio stream

I have a PCM stream coming in from the microphone. I am analyzing short chunks of it (in Java) to detect short spikes in sound loudness (amplitude). I have a known sound that plays periodically, and I need to know whether a detected spike is in fact this sound being recorded. I have the PCM for the sound played; it's completely determined.
I have no clue where to start: should I perform some comparison in the time domain or the frequency domain? It would be great if someone could give me some insight into how this is done and where I should dig.
Thanks.
It sounds like you want to compare an incoming set of pulses to a reference set of pulses. Cross-correlation is probably what you want to use. You may need to precondition your data first, e.g. create an envelope instead of using the raw data, or the cross-correlation may fail unless the match is perfect.
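To make the idea concrete, here is a minimal sketch (written in Swift, but the logic maps directly to Java) of normalized cross-correlation of a reference against an incoming chunk; both inputs are assumed to already be preconditioned envelopes, and the threshold is something you would tune empirically:

import Foundation

// Slide the reference over the chunk and compute a normalized
// cross-correlation score at each offset; a score near 1.0 means a close
// (scaled) match.
func bestCorrelation(of reference: [Double], in chunk: [Double]) -> (offset: Int, score: Double)? {
    guard !reference.isEmpty, reference.count <= chunk.count else { return nil }
    let refEnergy = sqrt(reference.reduce(0) { $0 + $1 * $1 })
    var best: (offset: Int, score: Double)?

    for offset in 0...(chunk.count - reference.count) {
        var dot = 0.0
        var windowEnergy = 0.0
        for i in 0..<reference.count {
            let sample = chunk[offset + i]
            dot += sample * reference[i]
            windowEnergy += sample * sample
        }
        let norm = refEnergy * sqrt(windowEnergy)
        let score = norm > 0 ? dot / norm : 0
        if best == nil || score > best!.score {
            best = (offset, score)
        }
    }
    return best
}

// Usage idea: treat scores above an empirically chosen threshold (say 0.7)
// as "the known sound was detected in this chunk".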

iOS audio over HDMI -- how to deal with 48khz sample rate?

I've been happily synthesizing audio (at 44.1 kHz) and sending it out through the RemoteIO audio unit. It's come to my attention that my app's audio is "garbled" when going out via HDMI to a certain model of TV. It looks to me like the problem is related to the fact that this TV is looking for audio data at 48 kHz.
Here are some questions:
Does RemoteIO adopt the sample rate of whichever device it's outputting to? If I'm sending audio via HDMI to a device that asks for 48 kHz, do my RemoteIO callback buffers become 48 kHz?
Is there some tidy way to force RemoteIO to keep thinking in terms of 44.1 kHz, and have it be smart enough to perform any necessary sample rate conversion on its own before it hands data off to the device?
If RemoteIO does indeed just defer to the device it's connected to, then presumably I need to do some sample rate conversion between my synthesis engine and RemoteIO. Is AudioConverterConvertComplexBuffer the best way to do this?
Fixed my problem. I was incorrectly assuming that the number of frames requested by the render callback would be a power of two. Changed my code to accommodate any arbitrary number of frames and all seems to work fine now.
If you want sample rate conversion, try using the Audio Queue API, or do the conversion within your own app using some DSP code.
Whether the RemoteIO buffer size or sample rate can be configured or not might depend on the iOS device model, OS version, audio route, background modes, etc., so an app must accommodate different buffer sizes and sample rates when using RemoteIO.
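If you end up doing the conversion in your own app, one option (a substitute for the AudioConverterConvertComplexBuffer route mentioned in the question) is AVAudioConverter; a rough sketch, assuming standard non-interleaved float buffers, and noting that a streaming app would keep one converter alive rather than creating one per buffer:

import AVFoundation

// Resample one buffer (e.g. 44.1 kHz synthesis output) to whatever rate the
// current route reports, such as 48 kHz over HDMI.
func resample(_ input: AVAudioPCMBuffer, to outputRate: Double) -> AVAudioPCMBuffer? {
    guard let outFormat = AVAudioFormat(standardFormatWithSampleRate: outputRate,
                                        channels: input.format.channelCount),
          let converter = AVAudioConverter(from: input.format, to: outFormat) else {
        return nil
    }

    let ratio = outputRate / input.format.sampleRate
    let capacity = AVAudioFrameCount(Double(input.frameLength) * ratio) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: capacity) else {
        return nil
    }

    var delivered = false
    var conversionError: NSError?
    let status = converter.convert(to: output, error: &conversionError) { _, outStatus in
        if delivered {
            outStatus.pointee = .endOfStream
            return nil
        }
        delivered = true
        outStatus.pointee = .haveData
        return input
    }
    return (status != .error && conversionError == nil) ? output : nil
}

// Usage idea:
// let hardwareRate = AVAudioSession.sharedInstance().sampleRate   // e.g. 48000 over HDMI
// let converted = resample(synthesizedBuffer, to: hardwareRate)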
