AVAudioEngine schedule sample-accurate parameter changes - iOS

I am trying to create an app using a combination of AVAudioPlayerNode instances and other AUAudioUnits for EQ, compression, etc. Everything connects up well, and using the V3 version of the API certainly makes configuration easier for connecting nodes together. However, during playback I would like to automate parameter changes, such as the gain on a mixer, so that the changes are ramped (e.g. a fade-out or fade-in) and to feel confident that the changes are sample accurate.
One solution I have considered is to install a tap on a node (perhaps the engine's mixer node) and within that adjust the gain for a given unit, but since the tap is on the output of a unit this is always going to be too late to have the desired effect (I think) without doing offset calculations and then delaying my source audio playback to match up to the parameter changes. I have also looked at the scheduleParameterBlock property on AUAudioUnit, but it seems I would need to implement my own custom unit to make use of that rather than use built-in units, even though it was mentioned in
WWDC session 508: "...So the first argument to the schedule is a sample time; the parameter value can ramp over time if the Audio Unit has advertised it as being rampable. For example, the Apple Mixer does this. And the last two function parameters, of course, are the address of the parameter to be changed and the new parameter value..."
Perhaps this meant that internally the Apple Mixer uses it, not that we can tap into any rampable capabilities ourselves. I can't find many docs or examples other than implementing a custom audio unit, as in Apple's example attached to this talk.
Other potential solutions I have seen include using NSTimer, CADisplayLink or dispatchAfter..., but these feel even worse and less sample accurate than offsetting from the installed tap block on the output of a unit.
I feel like I've missed something very obvious since there are other parts of the new AVAudioEngine API that make a lot of sense and the old AUGraph API allowed more access to sample accurate sequencing and parameter changing.

This is not as obvious as you'd hope it would be. Unfortunately in my tests, the ramp parameter on scheduleParameterBlock (or even the underlying AudioUnitScheduleParameters) simply doesn't do anything. Very odd for such a mature API.
The bottom line is that you can only set a parameter value within a single buffer, not at the sample level. Setting a parameter value at a sample time will automatically ramp from the current value to the new value by the end of the containing buffer. There seems to be no way to disable this automatic ramping.
Longer fades have to be done in sections by setting fractional values across multiple buffers and keeping track of the fade's relative progress. In reality, for normal-duration fades, this timing discrepancy is unlikely to be a problem, because sample accuracy would be overkill.
So to sum up, sample-level parameter changes seem to be impossible, but buffer-level parameter changes are easy. If you need to do very short fades (within a single buffer or across a couple of buffers), then this can be done at the sample level by manipulating the individual samples via an AURenderCallback.
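For illustration, here is a minimal sketch of that short-fade idea, applying a per-sample gain ramp to an AVAudioPCMBuffer before it is scheduled (a pre-processing variation on the render-callback approach; the function name is mine, not any API):

import AVFoundation

// Sketch: fade a buffer out linearly, sample by sample, before scheduling it
// on an AVAudioPlayerNode. Sample-accurate, but only within this one buffer.
func applyLinearFadeOut(to buffer: AVAudioPCMBuffer) {
    guard let channels = buffer.floatChannelData else { return }
    let frameCount = Int(buffer.frameLength)
    for channel in 0..<Int(buffer.format.channelCount) {
        let samples = channels[channel]
        for frame in 0..<frameCount {
            // Gain runs from 1.0 at the first frame to 0.0 at the last.
            samples[frame] *= 1.0 - Float(frame) / Float(max(frameCount - 1, 1))
        }
    }
}

A longer fade would chain several such buffers, giving each one start and end gains derived from the fade's overall progress.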

Related

How many sounds can be played at a time on iOS - AVAudioPlayer vs. AVAudioEngine & AVAudioPlayerNode

I have an application in which there is a set of about 50 sounds, which range in length from about 300 ms to about 4 seconds. Various combinations of sounds need to be played at precise times (up to 10 of them can be triggered at once). Some sounds need to be repeated at intervals as short as 100 ms.
I've implemented this as a two-dimensional array of AVAudioPlayers, all of which are loaded with sounds at application launch. There are several players for each sound, to accommodate rapidly repeating sounds. The players for a particular sound are reused in strict rotation. When a new sound is scheduled, the oldest player for that sound is stopped and its current time is set to 0, so the sound will repeat from the start the next time it's scheduled using player.play(atTime:). There's a thread that schedules new sets of sounds about 300 ms before they are to be played.
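In sketch form, the rotation scheme looks something like this (class and method names are simplified for illustration, not my actual code):

import AVFoundation

// One slot per sound: several AVAudioPlayer instances reused oldest-first.
final class SoundSlot {
    private var players: [AVAudioPlayer]
    private var next = 0

    init(url: URL, voices: Int) {
        players = (0..<voices).compactMap { _ in try? AVAudioPlayer(contentsOf: url) }
        players.forEach { $0.prepareToPlay() }
    }

    func schedule(at deviceTime: TimeInterval) {
        let player = players[next]
        next = (next + 1) % players.count
        player.stop()                    // reclaim the oldest voice
        player.currentTime = 0           // rewind so the sound repeats from the start
        player.play(atTime: deviceTime)  // relative to player.deviceCurrentTime
    }
}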
It all works quite nicely, up to a point that varies with the device. Eventually, as sounds are played more rapidly, and/or more simultaneous sounds are scheduled, some sounds will refuse to play.
I'm contemplating switching to AVAudioEngine and AVAudioPlayerNodes, using a mixer node. Does anyone know if that approach is likely to handle more simultaneous sounds? My guess is that both approaches translate into a rather similar set of CoreAudio functions, but I haven't actually written the code to test that hypothesis - before I do that, I'm hoping that someone else may have explored this issue before me. I've been deep into CoreAudio before, and I'm hoping to be able to use these handy high-level functions instead!
Also, does anyone know of a way to trigger a closure when a sound initiates? The documented functionality allows for a completion callback closure, but the only way I've been able to trigger events when the sounds start is to create a high quality-of-service DispatchQueue. Unfortunately, depending on the system load, queued events may be executed at times that vary from the scheduled times by up to about 50 ms, which is not quite as precise as I'd prefer.
Using AVAudioEngine with AVAudioPlayerNodes provides much better performance, albeit at the cost of a bit of code complexity. I was able to easily increase the playback rate by a factor of five, with better buffer control.
The main drawback in switching to this approach was that Apple's documentation is less than stellar. A few additions to Apple's documentation would have made this task a LOT easier:
Mixer nodes are documented as being able to convert sample rates and channel counts, so I attempted to configure audioEngine.mainMixerNode to convert mono buffers to the output node's settings. Setting the main mixer node's output to the output node's format appeared to be accepted, but threw opaque errors at run time that complained about channel count mismatches.
It appears that the main mixer node is not actually a fully functional mixer node. To get this to work, I had to insert another mixer node that performed the channel conversion, and connect it to the main mixer node. If Apple's documentation had actually mentioned this, it would have saved me a lot of experimentation.
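In sketch form, the working graph looks something like this (node and variable names are mine, for illustration):

import AVFoundation

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let conversionMixer = AVAudioMixerNode()  // performs the mono-to-output conversion

engine.attach(player)
engine.attach(conversionMixer)

// The player delivers mono buffers into the conversion mixer...
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)
engine.connect(player, to: conversionMixer, format: monoFormat)
// ...which feeds the main mixer in whatever format the output side expects.
engine.connect(conversionMixer, to: engine.mainMixerNode, format: nil)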
Also, just scheduling a buffer does not cause anything to play. You need to call play() on the player node before anything will happen. Apple's documentation is confusing here - it says that calling play() with no arguments will cause playback to occur immediately, which wasn't what I wanted. It took some experimentation to determine that play() just tells the player node to wake up, and that scheduled buffers will actually be played at the scheduled time, rather than immediately.
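A small sketch of that behaviour (the player and buffer are assumed to be attached and loaded already):

import AVFoundation

// play() only starts the node's render timeline; a buffer scheduled at a
// future AVAudioTime still waits until that time before it is heard.
func playOneSecondIn(player: AVAudioPlayerNode, buffer: AVAudioPCMBuffer) {
    let sampleRate = buffer.format.sampleRate
    let startTime = AVAudioTime(sampleTime: AVAudioFramePosition(sampleRate),
                                atRate: sampleRate)  // one second, in samples
    player.scheduleBuffer(buffer, at: startTime, options: [], completionHandler: nil)
    player.play()  // without this call, nothing is rendered at all
}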
It would have been enormously helpful if Apple had provided more than the auto-generated class documentation. A bit of human-generated documentation would have saved me an awful lot of frustrating experimentation.
Chris Adamson's well-written "Learning Core Audio" was very helpful when I was working with Core Audio - it's a shame that the newer AVAudioEngine functionality isn't documented nearly as well.

Flink: Are multiple execution environments supported?

Is it OK to create multiple ExecutionEnvironments in a Flink program? More specifically, create one ExecutionEnvironment and one StreamExecutionEnvironment in the same main method, so that one can work with batch and later transit to streaming without problems?
I guess that the other possibility would be to split the program in two, but for my testing purposes this seems better. Is Flink prepared for this scenario?
All seems to work fine, except I am currently having problems with no output when joining two streams on a common index and using window(TumblingProcessingTimeWindows.of(Time.seconds(1))). I have already called setStreamTimeCharacteristic(TimeCharacteristic.EventTime) on the StreamExecutionEnvironment and even tried assigning custom watermarks on both joined streams with assignTimestampsAndWatermarks where I just return System.currentTimeMillis() as the timestamp of each record.
Since it finishes really quickly, both streams should fit in that 1-second window, no? Both streams print just fine right before the join. I can try supplying the important parts of code (it's rather lengthy) if anyone's interested.
UPDATE: OK, so I separated the two environments (put each inside a main method) and then I simply call the first main from the second main method. The described problem no longer occurs.
No, this is not supported, and won't really work.
At least up through Flink 1.9, a given application must either have an ExecutionEnvironment and use the DataSet API, or a StreamExecutionEnvironment and use the DataStream API. You cannot mix the two in one application.
There is ongoing work to more completely unify batch and streaming, but that's a work in progress. To understand this better you might want to watch the video for this recent Flink Forward talk when it becomes available.

How to find an offset between two audio files? One is noisy and one is clear

I have a scenario in which the user captures a concert scene, with the real-time audio of the performer, while at the same time the device downloads a live stream from the audio broadcaster's device. Later I replace the real-time noisy audio (captured while recording) with the streamed audio I have saved on the phone (good-quality audio). Right now I set the audio offset manually, on a trial-and-error basis while merging, so that I can sync the audio and video activity at the exact position.
Now what I want to do is automate the synchronisation of the audio. Instead of merging the video with the clear audio at a given offset, I want to merge them automatically, with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clear audio. E.g. when the user starts and stops the recording, I take that sample of real-time audio, compare it with the live-streamed audio, extract the exactly matching part, and sync it at the perfect time.
Does anyone have any idea how to find the offset by comparing the two audio files and syncing with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms etc
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating the audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then it is as simple as calculating the difference between audio stream time codes, where the time codes are based on the global reference time.
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction is itself an extensive, non-trivial topic.
Another problem could be that the signals captured by the two devices actually differ a lot: for example, the good-quality audio (I guess the output from the live mix console?) will be fairly different from the live version (which I guess is coming out of the on-stage monitors / FOH system, captured by a phone mic?).
Perhaps the simplest approach to start with would be to use cross-correlation to do the time delay analysis.
A peak in the cross-correlation function indicates the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
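On iOS, a minimal sketch of that peak search using the Accelerate framework might look like this (assuming both signals are mono Float arrays at the same sample rate, with the clean excerpt no longer than the noisy recording; the function name is mine):

import Accelerate

// Slide the clean excerpt across the noisy recording; the index of the
// correlation peak is the lag, in samples, where the two line up best.
func estimateOffset(noisy: [Float], clean: [Float]) -> Int {
    // vDSP.correlate requires clean.count <= noisy.count.
    let correlation = vDSP.correlate(noisy, withKernel: clean)
    var peakValue: Float = 0
    var peakIndex: vDSP_Length = 0
    vDSP_maxvi(correlation, 1, &peakValue, &peakIndex, vDSP_Length(correlation.count))
    return Int(peakIndex)
}

In practice you would probably want to normalise the correlation (or at least band-pass both signals first), since raw cross-correlation can be dominated by loud noise bursts.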
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is running both sounds through a speech-to-text library (or an API) and matching the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, doing audio processing on a mobile device may not play well (because of low performance, high battery drain, or both). I suggest you use a server if you go that way.
Good luck.

Simultaneously generate multiple sine waves into sample buffer for audio unit (iOS)

Given an array (of changing length) of frequencies and amplitudes, can I generate a single audio buffer on a sample-by-sample basis that includes all the tones in the array? If not, what is the best way to generate multiple tones in a single audio unit? Have each note generate its own buffer, then sum those into an output buffer? Wouldn't that be the same thing as doing it all at once?
Working on an iOS app that generates notes from touches, considering using STK but don't want to have to send note off messages, would rather just generate sinusoidal tones for the notes I'm holding in an array. Each note actually needs to produce two sinusoids, with varying frequency and amplitude. One note may be playing the same frequency as a different note so a note off message at that frequency could cause problems. Eventually I want to manage amplitude (adsr) envelopes for each note outside of the audio unit. I also want response time to be as fast as possible so I'm willing to do some extra work/learning to keep the audio stuff as low level as I can.
I've been working with sine wave single tone generator examples. Tried essentially doubling one of these, something like:
Buffer[frame] = (sin(theta1) + sin(theta2))/2
Incrementing theta1/theta2 by frequency1/frequency2 over the sample rate (I realize calling sin() like this is not the most efficient approach), but I get aliasing effects. I've yet to find an example with multiple frequencies or data sources other than reading audio from a file.
Any suggestions/examples? I originally had each note generate its own audio unit, but that gave me too much latency from touch to note sounding (and seems inefficient too). I am newer to this level of programming than I am to digital audio in general, so please be gentle if I'm missing something obvious.
Yes, of course you can; you can do whatever you like inside your render callback. When you set this callback up, you can pass in a pointer to an object.
That object could contain the on/off states for each tone. In fact, the object could contain a method responsible for filling up the buffer. (Just make sure the object is nonatomic if it is a property, otherwise you will get artefacts due to locking issues.)
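As a sketch, the fill method on such an object might sum the active tones per sample, with one wrapped phase accumulator per tone (all names here are hypothetical):

import Foundation

struct Tone {
    var frequency: Double
    var amplitude: Double
    var phase: Double = 0
}

// Sum every tone into the output buffer, one sample at a time.
func fill(tones: inout [Tone], buffer: UnsafeMutablePointer<Float>,
          frameCount: Int, sampleRate: Double) {
    for frame in 0..<frameCount {
        var sample = 0.0
        for i in tones.indices {
            sample += tones[i].amplitude * sin(tones[i].phase)
            tones[i].phase += 2.0 * .pi * tones[i].frequency / sampleRate
            // Wrap the phase so it never grows without bound; an ever-growing
            // theta loses float precision and can sound like aliasing.
            if tones[i].phase > 2.0 * .pi { tones[i].phase -= 2.0 * .pi }
        }
        // Scale by the tone count so the mix cannot clip.
        buffer[frame] = Float(sample / Double(max(tones.count, 1)))
    }
}

Also check that every frequency stays below half the sample rate; anything above Nyquist will genuinely alias no matter how the phase is handled.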
What exactly are you trying to achieve? Do you really need to generate on the fly?
If so, you run the risk of overloading the remoteIO audio unit's render callback, which will give you glitches and artefacts.
You might get away with it on the simulator, then move it over to a device and find that mysteriously it isn't working any more, because you are running on a processor fifty times slower, and one callback cannot complete before the next one arrives.
Having said that, you can get away with a lot.
I have made a 12-tone player that can simultaneously play any number of individual tones.
All I do is have a ring buffer for each tone (I am using quite a complex waveform, so this takes a lot of time; in fact I actually calculate it the first time the application is run and subsequently load it from file), and maintain a read head and an enabled flag for each ring.
Then I add everything up in the render callback, and this handles fine on the device, even if all 12 are playing together. I know the documentation tells you not to do this (it recommends only using this callback to fill one buffer from another), but you can get away with a lot, and it is a PITA to code up some sort of buffering system that calculates on a different thread.
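In sketch form (names illustrative, and simplified compared to what a real-time-safe callback would want):

// One precomputed ring per tone; the render callback just sums the enabled ones.
struct ToneRing {
    var samples: [Float]     // precomputed waveform, at least one full cycle
    var readHead: Int = 0
    var enabled: Bool = false
}

func mix(rings: inout [ToneRing], into out: UnsafeMutablePointer<Float>, frameCount: Int) {
    for frame in 0..<frameCount {
        var sum: Float = 0
        for i in rings.indices where rings[i].enabled {
            sum += rings[i].samples[rings[i].readHead]
            rings[i].readHead = (rings[i].readHead + 1) % rings[i].samples.count
        }
        out[frame] = sum
    }
}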

How should a parser filter behave in directshow editing services?

We've created a custom push source / parser filter that is expected to work in a DirectShow Editing Services timeline.
Now everything is great, except that the filter does not stop delivering samples when the current cut has reached its end. The rendering stops, but the downstream filter continues to consume samples. The filter delivers samples until it reaches EOF. This causes high CPU load, so the application is simply unusable.
After a lot of investigation, I'm not able to find a suitable mechanism that can inform my filter that the cut is over, so that the filter can be stopped:
• The Deliver function on the connected decoder pins always returns S_OK, meaning the attached decoder is also not aware that the IMediaSamples are being discarded downstream.
• There's no flushing in the filter graph.
• The IMediaSeeking::SetPositions interface is used, but only the start positions are set; our filter is always instructed to play up to the end of the file.
• I would expect that using IAMTimelineSrc::SetMediaTimes(Start, Stop) from the application would set a stop time too, but this does not happen.
• I've also tried to manipulate the XTL timeline, adding 'mstop' attributes to all the clips in the hope that this would imply a stop position being set, but to no avail.
From the filter's point of view, the output buffers are always available (as the IMediaSamples are being discarded downstream), so the filter keeps filling samples as fast as it can until the source file is finished.
Is there any way the filter can detect when to stop, or can we do anything from the application side?
Many thanks
Tilo
You can try adding a custom interface to your filter and calling a method on it externally from your client application. See this SO question for a bit more detail on this approach. You should be careful with thread safety while implementing this method, and it is indeed possible that there is a neater way of detecting that the capturing should be stopped.
I'm not that familiar with DES, but I have tried my demux filters in DES and the stop time was set correctly when there was a "stop=" tag for the clip.
Perhaps your demux does not implement IMediaSeeking correctly. Do you expose IMediaSeeking through the pins?
I had a chance to work with DES and custom push source filter recently.
From my experience:
DES actually does return an error code to the Receive() call, which is in turn returned to Deliver() on the source side, when the cut reaches its end.
I hit a similar situation where the source does not receive it and continues to run to the end of the stream.
The problem I found (after a huge amount of ad-hoc trials) is that the source needs to call the DeliverNewSegment() method at each restart after a seek. DES seems to take incoming samples only after that notification. It looks like DES accepts the samples with S_OK even without that notification, but it just throws them away.
I don't see DES set an end time via IMediaSeeking::SetPositions, either.
I hope this helps, although this question is very old and I suppose Tilo does not care about this any more...
