Millisecond (and greater) precision for audio file elapsed time in iOS - ios

I am looking for a low-latency way to find out how many seconds have elapsed in an audio file, to guaranteed millisecond precision, in real time. According to the AVAudioPlayer class reference, a call to -currentTime returns "the offset of the current playback position, measured in seconds from the start of the sound"; since NSTimeInterval is a double, this implies that fractions of a second are possible.
As a testing scenario, I have an audio file playing and the user taps a button. Playback does NOT pause or stop, but at the moment the button is tapped I would like to obtain information about the elapsed time. In the real application the button may be pressed many times in one second, hence the need for millisecond precision.
My files are stored as AIFF files and are around 1-10 minutes in length. Ideally I would like to find out exactly which sample frame is up next when playback resumes; however, that level of precision is a little excessive, and millisecond precision is perfectly acceptable.
Is AVAudioPlayer's -currentTime method sufficient to achieve guaranteed millisecond precision for a currently playing audio file? Or would it be preferable to use a lower-level API such as iOS's Audio Units?
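To make the scenario concrete, a minimal sketch (the player setup and button wiring are assumed to exist elsewhere):

```swift
import AVFoundation

// Sketch of the tap scenario described above; names are illustrative.
final class PlaybackController {
    var player: AVAudioPlayer! // assumed configured and already playing

    func buttonTapped() {
        // currentTime is an NSTimeInterval (Double), so sub-second values
        // are reported; whether they are millisecond-accurate is the question.
        let elapsed = player.currentTime
        print(String(format: "elapsed: %.3f s", elapsed))
    }
}
```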

If you want sub-millisecond relative time resolution, convert to raw PCM and count elapsed samples (completed buffers × buffer length, plus the offset within the current buffer) using a low-latency RemoteIO Audio Unit configuration. Most iOS devices will support RemoteIO buffers as small as 256 samples (roughly 6 ms at 44.1 kHz), with a callback for each buffer.
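A minimal sketch of that counting scheme, with the RemoteIO callback wiring elided (the class and method names are illustrative, not an existing API):

```swift
import AVFoundation

// Elapsed samples = completed buffers * buffer length + samples in the
// current buffer; dividing by the sample rate gives elapsed seconds.
final class SampleClock {
    private(set) var framesRendered: Int64 = 0
    let sampleRate: Double // e.g. 44_100

    init(sampleRate: Double) { self.sampleRate = sampleRate }

    // Call once per RemoteIO render callback with that buffer's frame count.
    // In production this counter should be updated atomically, since the
    // callback runs on a real-time audio thread.
    func advance(by frameCount: AVAudioFrameCount) {
        framesRendered += Int64(frameCount)
    }

    // Elapsed playback time, accurate to a single sample frame.
    var elapsedSeconds: Double { Double(framesRendered) / sampleRate }
}
```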

Related

Record PCM audio data per 10 milliseconds without playback

I need to record PCM audio every 10 milliseconds, without playback, in Swift.
I have tried this code but I can't figure out how to stop playback while recording.
RecordAudio Github Repo
And a second question: how can I properly get PCM data out of the circular buffer for an encode/decode process? When I convert the recorded audio data to signed bytes, unsigned bytes, or anything else, the converted data is sometimes corrupted. What is the best practice for this kind of processing?
In the RecordAudio sample code, the audio format is specified as Float (32-bit floats). When doing a float-to-integer conversion, you have to make sure your scale and offset produce a value in the legal range for the destination type: e.g. check that -1.0 to 1.0 maps to 0 to 255 (unsigned byte), and that out-of-range values are clipped to legal values. Also pay attention to the number of samples you convert, as an Audio Unit callback can vary the frameCount sent (the number of samples returned). You most likely won't get exactly 10 ms in any single RemoteIO callback, but may have to observe a circular buffer filled by multiple callbacks, or a larger buffer that you will have to split.
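For illustration, a sketch of such a conversion with clipping, assuming 32-bit float samples in -1.0 ... 1.0 (the function name is hypothetical):

```swift
// Float -> unsigned byte, clipping out-of-range input first.
func floatsToUnsignedBytes(_ samples: [Float]) -> [UInt8] {
    samples.map { sample in
        // Clip to the legal input range, then scale and offset so that
        // -1.0 -> 0 and +1.0 -> 255.
        let clipped = min(max(sample, -1.0), 1.0)
        return UInt8((clipped + 1.0) * 127.5)
    }
}
```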
When RemoteIO is running in play-and-record mode, you can usually silence playback by zeroing the bufferList buffers (after copying, analyzing, or otherwise using the data in the buffers) before returning from the Audio Unit callback.
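A sketch of that zeroing step, assuming the bufferList pointer handed to a render callback (the function name is illustrative):

```swift
import Foundation
import AudioToolbox

// Zero every output buffer so the hardware renders silence; call this
// after copying or analyzing the data, before returning from the callback.
func silenceOutput(_ bufferList: UnsafeMutablePointer<AudioBufferList>) {
    for buffer in UnsafeMutableAudioBufferListPointer(bufferList) {
        memset(buffer.mData, 0, Int(buffer.mDataByteSize))
    }
}
```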

Read an AVAudioFile into a buffer starting at a certain time

Let's say I have an AVAudioFile with a duration of 10 seconds. I want to load that file into an AVAudioPCMBuffer but I only want to load the audio frames that come after a certain number of seconds/milliseconds or after a certain AVAudioFramePosition.
It doesn't look like AVAudioFile's readIntoBuffer methods give me that kind of precision, so I'm assuming I'll have to work at the AVAudioBuffer level or lower?
You just need to set the AVAudioFile's framePosition property before reading.
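For example, a minimal sketch (the URL, start time, and function name are placeholders, not from the question):

```swift
import AVFoundation

// Seek an AVAudioFile to a given time, then read the rest into a buffer.
func readBuffer(from url: URL, startingAt seconds: Double) throws -> AVAudioPCMBuffer {
    let file = try AVAudioFile(forReading: url)
    // framePosition is measured in sample frames, so convert seconds
    // using the file's processing sample rate.
    file.framePosition = AVAudioFramePosition(seconds * file.processingFormat.sampleRate)
    let remaining = AVAudioFrameCount(file.length - file.framePosition)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: remaining) else {
        throw NSError(domain: "ReadBuffer", code: -1)
    }
    try file.read(into: buffer) // reads from framePosition onward
    return buffer
}
```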

Mute the audio at a particular interval of time while casting the video in ios using chromecast

I am working on a Chromecast-based application using the Google Cast API. Initially, I join the Chromecast session from YouTube and play the video. Later I join this session from my application.
There is a requirement in my application to mute the audio at a particular interval of time.
I need to mute the audio from 00:01:34:03 (hh:mm:ss:ms) to 00:01:34:15 (hh:mm:ss:ms).
I am converting the time to seconds as follows.
Time-to-seconds conversion: (00*60*60) + (01*60) + 34 + (03/1000) = 94.003 -> mute start time
I call the mute method after an interval of: mute start time - current streaming position.
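A sketch of that conversion in Swift (function and variable names are illustrative; note the millisecond field must be divided as a floating-point value):

```swift
// hh:mm:ss:ms -> seconds, as described above.
func toSeconds(hours: Double, minutes: Double, seconds: Double, millis: Double) -> Double {
    hours * 3600 + minutes * 60 + seconds + millis / 1000
}

let muteStart = toSeconds(hours: 0, minutes: 1, seconds: 34, millis: 3)  // 94.003
let muteEnd   = toSeconds(hours: 0, minutes: 1, seconds: 34, millis: 15) // 94.015
```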
I am using the approximateStreamPosition value (declared in the GCKMediaControlChannel header file) to know the stream position of the casting video. It returns the value as a double, say 94.70801001358.
Here 94 is the duration in seconds; what does the value after the decimal point (.70801001358) indicate? Is it milliseconds? If so, can I round it to three digits?
Since I need to mute the audio with millisecond precision, will rounding off the value cause the mute to happen early or late?
The 0.70801001358 is in seconds; I am not sure what you mean by asking whether it is in milliseconds. Expressed in milliseconds, that value would be 708.01001358.
You won't be able to have millisecond accuracy when controlling mute (or any other control command, for that matter); the command setup plus the transfer time from your iOS device to the Chromecast will together throw your calculations off by a good number of milliseconds.

Get peak volume of audio input on iOS

On iOS 7, how do I get the current microphone input volume in a range between 0 and 1?
I've seen several approaches like this one, but the results I get baffle me.
The return values of peakPowerForChannel: are documented to be in the range of -160 to 0 with 0 being the loudest and -160 near absolute silence.
Problem: Given a quiet room and a short but loud noise, the power goes all the way up in an instant but takes a very long time to drop back to the quiet level (way longer than the actual noise...).
What I want: Essentially I want an exact copy of the Audio Input patch of Quartz Composer with its Volume Peak output. Any tips?
To get a similar volume-peak measurement, you might have to capture raw audio via the iOS Audio Queue API (or the RemoteIO Audio Unit) and analyze the raw PCM waveform samples in each audio callback, looking for the magnitude maximum over your desired frame width or analysis time.
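For instance, a minimal sketch of a per-buffer peak using the Accelerate framework, assuming float samples in -1.0 ... 1.0 (the function name is illustrative):

```swift
import Accelerate

// Maximum magnitude (absolute value) over one buffer of PCM floats.
func peakLevel(of samples: [Float]) -> Float {
    var peak: Float = 0
    vDSP_maxmgv(samples, 1, &peak, vDSP_Length(samples.count))
    return peak // 0.0 (silence) ... 1.0 (full scale)
}
```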

VTCompressionSessionEncodeFrame: last seconds are lost?

I am using VTCompressionSessionEncodeFrameWithOutputHandler to compress pixel buffers from the camera into a raw H.264 stream. I am using kVTEncodeFrameOptionKey_ForceKeyFrame to make sure that every output from VTCompressionSessionEncodeFrame is not dependent on other pieces. The session is also initialized with kVTCompressionPropertyKey_AllowFrameReordering = false and kVTCompressionPropertyKey_RealTime = true, and VTCompressionSessionCompleteFrames is called after each VTCompressionSessionEncodeFrame call.
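For reference, a minimal sketch of a session configured with those properties (dimensions are placeholders and error handling is elided):

```swift
import VideoToolbox

var session: VTCompressionSession?
VTCompressionSessionCreate(allocator: nil, width: 1920, height: 1080,
                           codecType: kCMVideoCodecType_H264,
                           encoderSpecification: nil,
                           imageBufferAttributes: nil,
                           compressedDataAllocator: nil,
                           outputCallback: nil, refcon: nil,
                           compressionSessionOut: &session)
if let session = session {
    // No B-frames, real-time rate control: the setup described above.
    VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AllowFrameReordering,
                         value: kCFBooleanFalse)
    VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime,
                         value: kCFBooleanTrue)
}
```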
I also collect the samples produced by VTCompressionSessionEncodeFrame and periodically save them as an MP4 file (using the Bento4 library).
But the final track is always 1-2 seconds shorter than the samples fed to VTCompressionSessionEncodeFrame. After several attempts to resolve this, I became convinced that VTCompressionSessionEncodeFrame outputs frames that depend on later frames to be decoded properly, so those frames are lost, since they cannot be used to produce the "final chunks" of the track.
So the question: how can one force VTCompressionSessionEncodeFrame to produce totally independent data chunks?
Turns out this was... an FPS issue! NAL units do not carry timing themselves (aside from the PTS, which in my case is bound to the capture FPS), so it is quite important that they are produced at exactly the rate the movie's FPS expects. Nothing was lost; the saved frames were just played back faster (this was not so easy to spot, in fact).
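A sketch of what that implies: give the encoder an explicit PTS and duration derived from the capture rate. The 30 fps rate is an assumption, and `session` and `pixelBuffer` are assumed to come from elsewhere:

```swift
import VideoToolbox

// Encode one frame with timing consistent with a fixed capture FPS,
// so the muxed track plays back at the rate the frames were produced.
func encode(_ pixelBuffer: CVPixelBuffer, frameIndex: Int64,
            into session: VTCompressionSession) {
    let fps: Int32 = 30
    let pts = CMTime(value: frameIndex, timescale: fps)
    let duration = CMTime(value: 1, timescale: fps)
    let status = VTCompressionSessionEncodeFrame(
        session, imageBuffer: pixelBuffer,
        presentationTimeStamp: pts, duration: duration,
        frameProperties: nil, sourceFrameRefcon: nil, infoFlagsOut: nil)
    assert(status == noErr, "encode failed: \(status)")
}
```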
