I have created an iOS 5/iOS 6 app with a display that responds to changes in the musical pitch performed by the user. It uses the record function in the sample SpeakHere code but does not actually save a file because it is designed to respond in real time.
I would now like to extend this app to respond simultaneously to the pitch itself and the duration for which the same pitch is sustained (for example, changing the color when the same pitch is held steadily for a minimum period of time). I have been reading about NSTimer and NSDate functions, which seem straightforward, as well as AudioTimeStamp functions, which are apparently C based and which I find very confusing. Based on other posts, it seems like NSTimer and NSDate checks might cause the display's real-time response to an actual musical performance to lag. How about dispatch_after? Could I expect the block to execute at the scheduled time?
My question is, what approach is most likely to yield the desired result of allowing me to measure duration of a particular pitch in the AudioQueue and update my display continuously in real time? Do I need to be saving to a file for this to work?
I am self-taught and have only been programming for a few months, so no matter what I will have to do a lot of learning of APIs/C language features that are new to me. I'm hoping someone can point me in a fruitful direction. Thanks!
You're definitely getting into pretty advanced stuff here. Here are a few thoughts:
Your audio processing seems to be the most intensive operation. Because this processing needs to be continuous, you're probably going to have to do this processing in another thread. By processing, I mean examining the audio to determine pitch.
Once you've identified the pitch, you should store the time for which it began.
Then, in the main thread, set up an NSTimer that repeats continuously, and in the NSTimer's fire method, subtract the pitch's start date from the current date to get the elapsed time as an NSTimeInterval.
Send the NSTimeInterval to your display logic so that you can update the color on screen.
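A rough sketch of that idea in modern Swift (newer than the iOS 5/6 APIs in the question; the class, property, and threshold names here are all made up):

```swift
import Foundation

final class PitchDurationTracker {
    // Updated from the audio-processing thread whenever a pitch is detected.
    // (Real code would need to synchronise access across threads.)
    private var currentPitch: Double = 0
    private var currentPitchStart = Date()
    private var timer: Timer?

    // Call from the audio thread when a pitch has been identified.
    func pitchDetected(_ pitch: Double) {
        if abs(pitch - currentPitch) > 1.0 {   // hypothetical "new pitch" threshold
            currentPitch = pitch
            currentPitchStart = Date()         // store when this pitch began
        }
    }

    // Call once from the main thread to start the display updates.
    func startDisplayUpdates() {
        timer = Timer.scheduledTimer(withTimeInterval: 0.05, repeats: true) { [weak self] _ in
            guard let self = self else { return }
            let heldFor = Date().timeIntervalSince(self.currentPitchStart)   // NSTimeInterval
            self.updateDisplay(pitch: self.currentPitch, heldFor: heldFor)
        }
    }

    private func updateDisplay(pitch: Double, heldFor: TimeInterval) {
        // e.g. change the colour once heldFor exceeds some minimum duration
    }
}
```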
Some things to check out:
Beginner's tutorial on multi-threading and Grand Central Dispatch on iOS
NSTimer
Using NSTimers
Hope that helps you out!
I have an application in which there is a set of about 50 sounds, which range in length from about 300 ms to about 4 seconds. Various combinations of sounds need to be played at precise times (up to 10 of them can be triggered at once). Some sounds need to be repeated at intervals as short as 100 ms.
I've implemented this as a two-dimensional array of AVAudioPlayers, all of which are loaded with sounds at application launch. There are several players for each sound, to accommodate rapidly repeating sounds. The players for a particular sound are reused in strict rotation. When a new sound is scheduled, the oldest player for that sound is stopped and its current time is set to 0, so the sound will repeat from the start the next time it's scheduled using player.play(atTime:). There's a thread that schedules new sets of sounds about 300 ms before they are to be played.
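For reference, a stripped-down sketch of the rotation scheme described above (class and method names are illustrative, not from the actual app):

```swift
import AVFoundation

// Several AVAudioPlayers per sound, reused in strict rotation.
final class SoundPool {
    private var players: [AVAudioPlayer]
    private var nextIndex = 0

    init(url: URL, voices: Int) throws {
        players = try (0..<voices).map { _ -> AVAudioPlayer in
            let player = try AVAudioPlayer(contentsOf: url)
            player.prepareToPlay()
            return player
        }
    }

    // `deviceTime` must be in the players' deviceCurrentTime timebase,
    // e.g. players[0].deviceCurrentTime + 0.3 for "300 ms from now".
    func schedule(atDeviceTime deviceTime: TimeInterval) {
        let player = players[nextIndex]
        nextIndex = (nextIndex + 1) % players.count
        player.stop()
        player.currentTime = 0          // restart from the beginning of the sound
        player.play(atTime: deviceTime)
    }
}
```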
It all works quite nicely, up to a point that varies with the device. Eventually, as sounds are played more rapidly, and/or more simultaneous sounds are scheduled, some sounds will refuse to play.
I'm contemplating switching to AVAudioEngine and AVAudioPlayerNodes, using a mixer node. Does anyone know if that approach is likely to handle more simultaneous sounds? My guess is that both approaches translate into a rather similar set of CoreAudio functions, but I haven't actually written the code to test that hypothesis - before I do that, I'm hoping that someone else may have explored this issue before me. I've been deep into CoreAudio before, and I'm hoping to be able to use these handy high-level functions instead!
Also, does anyone know of a way to trigger a closure when a sound starts playing? The documented functionality allows for a callback closure, but the only way I've been able to trigger events when the sounds start is to create a DispatchQueue with a high quality-of-service class. Unfortunately, depending on the system load, queued events may execute at times that vary from the scheduled times by up to about 50 ms, which is not quite as precise as I'd prefer to be.
Using AVAudioEngine with AVAudioPlayerNodes provides much better performance, albeit at the cost of a bit of code complexity. I was able to easily increase the playback rate by a factor of five, with better buffer control.
The main drawback in switching to this approach was that Apple's documentation is less than stellar. A few additions to Apple's documentation would have made this task a LOT easier:
Mixer nodes are documented as being able to convert sample rates and channel counts, so I attempted to configure audioEngine.mainMixerNode to convert mono buffers to the output node's settings. Setting the main mixer node's output to the output node's format appeared to be accepted, but threw opaque errors at run time that complained about channel count mismatches.
It appears that the main mixer node is not actually a fully functional mixer node. To get this to work, I had to insert another mixer node that performed the channel conversion, and connect it to the main mixer node. If Apple's documentation had actually mentioned this, it would have saved me a lot of experimentation.
Also, just scheduling a buffer does not cause anything to play. You need to call play() on the player node before anything will happen. Apple's documentation is confusing here - it says that calling play() with no arguments will cause playback to occur immediately, which wasn't what I wanted. It took some experimentation to determine that play() just tells the player node to wake up, and that scheduled buffers will actually be played at the scheduled time, rather than immediately.
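A rough Swift sketch of the graph described above, with an extra mixer in front of the main mixer doing the mono-to-output conversion; treat it as illustrative rather than a drop-in solution:

```swift
import AVFoundation

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let converterMixer = AVAudioMixerNode()   // extra mixer doing the channel/rate conversion

engine.attach(player)
engine.attach(converterMixer)

// Mono format of the source buffers (example values).
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)!

// player -> converter mixer (mono), converter mixer -> main mixer (output format)
engine.connect(player, to: converterMixer, format: monoFormat)
engine.connect(converterMixer, to: engine.mainMixerNode,
               format: engine.mainMixerNode.outputFormat(forBus: 0))

do {
    try engine.start()
} catch {
    print("engine failed to start: \(error)")
}

// play() only wakes the player node up; scheduled buffers still start at their scheduled times.
player.play()
// e.g. schedule `buffer` one second into the player's sample timeline:
// let when = AVAudioTime(sampleTime: 44_100, atRate: 44_100)
// player.scheduleBuffer(buffer, at: when, options: [], completionHandler: nil)
```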
It would have been enormously helpful if Apple had provided more than the auto-generated class documentation. A bit of human-generated documentation would have saved me an awful lot of frustrating experimentation.
Chris Adamson's well-written "Learning Core Audio" was very helpful when I was working with Core Audio - it's a shame that the newer AVAudioEngine functionality isn't documented nearly as well.
I have a scenario in which the user captures a concert scene, with the real-time audio of the performer, while the device simultaneously downloads a live stream from an audio broadcaster device. Later I replace the noisy real-time audio (captured while recording) with the streamed audio I saved on the phone (good quality audio). Right now I set the audio offset manually, by trial and error, while merging, so that I can sync the audio and video at exactly the right position.
Now what I want to do is automate the audio synchronisation. Instead of merging the video with the clean audio at a manually chosen offset, I want to merge the video with the clean audio automatically, with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clean audio. For example, when the user starts and stops recording, I will take that sample of real-time audio, compare it with the live streamed audio, extract the matching portion of the stream, and sync it at exactly the right time.
Does anyone have any idea how to find the offset by comparing the two audio files and syncing with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms, etc.
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then, it is as simple as calculating the difference between audio stream time codes, where the time codes are based on the global reference time.
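Under those assumptions the offset calculation is just a subtraction; a tiny sketch with made-up numbers:

```swift
import Foundation

// Both timestamps are in the same globally synchronised timebase (e.g. an NTP-corrected
// wall clock), taken at the first audio sample of each capture. Values are made up.
let broadcastStart: TimeInterval = 1_700_000_000.250       // clean stream began here
let phoneRecordingStart: TimeInterval = 1_700_000_001.975  // phone recording began here

// Positive offset: the phone started recording this long after the clean stream began,
// so the clean audio must be shifted/trimmed by this amount before merging.
let offset = phoneRecordingStart - broadcastStart          // 1.725 s
```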
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes the comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in itself is an extensive, non-trivial topic.
Another problem could be that the signals captured by the two devices actually differ a lot: for example, the good quality audio (I guess the output of the live mixing console?) will be fairly different from the live version (which I guess comes out of the on-stage monitors / FOH system and is captured by a phone mic?).
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
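To make the idea concrete, here is a naive (O(n·m)) cross-correlation sketch in Swift; for real signals you would use an FFT-based version or the Accelerate framework instead:

```swift
// Naive cross-correlation: returns the lag (in samples) at which `needle`
// lines up best inside `haystack`. Both are mono buffers at the same sample rate.
func bestOffset(haystack: [Float], needle: [Float]) -> Int {
    precondition(haystack.count >= needle.count)
    var bestLag = 0
    var bestScore = -Float.greatestFiniteMagnitude
    for lag in 0...(haystack.count - needle.count) {
        var score: Float = 0
        for i in 0..<needle.count {
            score += haystack[lag + i] * needle[i]   // dot product at this lag
        }
        if score > bestScore {
            bestScore = score
            bestLag = lag
        }
    }
    return bestLag
}

// let offsetSeconds = Double(bestOffset(haystack: cleanAudio, needle: phoneAudio)) / sampleRate
```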
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is to run both sounds through a speech-to-text library (or an API) and match the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, doing audio processing on a mobile device may not work well (because of low performance, high battery drain, or both). I suggest you use a server if you go that way.
Good luck.
I'm trying to build an app that plays a sequence of tones in a loop.
Currently I use OpenAL, and my experience with that framework has been positive, since it also lets me pitch-shift a sound.
Here's the scenario:
load a short sound (3 seconds) from a CAF file
play that sound in a loop and apply a pitch shift as well.
This works well, provided the tact rate isn't too high - I mean more than 10 milliseconds per tone.
However, my NSTimer (which drives the sound sequence) needs to be configurable - and as soon as the tact rate increases (I mean less than 10 ms per tone), the sound is no longer played back correctly - some tones are even dropped in an apparently random way.
It seems that real time sound processing becomes an issue.
I'm still a novice in iOS programming, but I believe Apple imposes limits on time consumption and/or semaphores.
Now my questions:
OpenAL is written in C - so far I haven't understood the whole code and philosophy behind that framework. Is there a way to resolve the problem described above by making some modifications - I mean setting flags/values or overriding certain methods?
If not, do you know of another iOS sound framework more appropriate for this kind of real-time sound processing?
Many thanks in advance!
I know this is quite an extraordinary and difficult problem - maybe some of you have solved a similar one? Just to emphasize: pitch shifting must still be guaranteed!
It is not immediately clear from your explanation precisely what you're trying to achieve; some code would help.
However, your use of NSTimer to sequence audio playback is clearly problematic. It is intended neither as a reliable nor as a high-resolution timer.
NSTimer delivers events through a run-loop queue - probably your application's main queue - where they contend with user interface events.
As the main thread is not a real-time thread, it may not even be scheduled to run for some time.
There may be quantisation effects on the delay you request, meaning your events effectively round to zero clock ticks and get scheduled immediately.
Periodic timers also have deleterious effects on battery life; iOS and macOS both take steps to reduce their impact through timer coalescing.
The clock you should be using for sequencing events is the playback sample clock, which is available in the render handler of whatever framework you use. As well as being reliable, this is also efficient, since the render handler will be running periodically anyway, and in a real-time thread.
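As an illustration of sequencing off the sample clock, here is a hedged sketch using AVAudioSourceNode (a much newer API than the OpenAL setup in the question); the render block counts frames and decides per sample when the next tone starts:

```swift
import AVFoundation

let sampleRate = 44_100.0
let framesPerTone = Int(sampleRate * 0.005)     // hypothetical: a new tone every 5 ms
var framesRendered = 0
var phase = 0.0

// The render block runs on the audio thread; its frame count is the sample clock.
let sourceNode = AVAudioSourceNode { _, _, frameCount, audioBufferList -> OSStatus in
    let buffers = UnsafeMutableAudioBufferListPointer(audioBufferList)
    for frame in 0..<Int(frameCount) {
        // Sample-accurate sequencing: the decision happens per sample,
        // not per NSTimer tick on the main thread.
        if (framesRendered + frame) % framesPerTone == 0 {
            phase = 0                            // restart the tone exactly on the grid
        }
        let sample = Float(sin(phase))
        phase += 2.0 * .pi * 440.0 / sampleRate  // 440 Hz placeholder tone
        for buffer in buffers {
            buffer.mData?.assumingMemoryBound(to: Float.self)[frame] = sample
        }
    }
    framesRendered += Int(frameCount)
    return noErr
}

let engine = AVAudioEngine()
engine.attach(sourceNode)
engine.connect(sourceNode, to: engine.mainMixerNode,
               format: AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1))
try! engine.start()                              // sketch only; handle errors properly
```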
I'm working on an iOS7-only app that needs to display a clock complete with ticking sound. I've used a NSTimer of 1s and I use AVAudioPlayer to play the tick sound every second.
Unfortunately, there's something slightly off with the timing. I've measured that the timer is off by between 2 and 22 thousandths of a second, which you wouldn't think would matter a great deal, but the lag creates a nail-biting tension... kind of like a heart flutter :-)
I've looked around a bit, but it sounds like using Audio Queue Services is the only way to go... and I really don't fancy delving into the depths of that particular framework again.
My question: Is there some other way of getting precisely scheduled sound events in iOS 7 and failing that is there a decent wrapper framework for audio queue services available somewhere? Or better still is there a way of more precisely scheduling NSTimers?
NSTimer, libdispatch, and spawning a thread that sleeps for the tick duration all rely on the underlying thread getting scheduled in time. The kernel provides no guarantee of this, so it is not surprising that you observe timing jitter; the latency you describe looks reasonable.
NSTimer running on the main thread is likely to perform the worst of these, as you are also contending against other events delivered through it.
I think your options here are to use Audio Queue Services, a real-time thread that schedules the events with AVAudioPlayer, or rendering the audio yourself to a RemoteIO unit.
I don't think AVPlayer provides any particular guarantees about timing either.
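For what it's worth, here is a sketch of the sample-accurate scheduling idea using AVAudioPlayerNode; note that AVAudioEngine requires iOS 8, so this is illustrative only for an iOS 7 target:

```swift
import AVFoundation

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
engine.attach(player)

let format = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)!
engine.connect(player, to: engine.mainMixerNode, format: format)

// Build a 10 ms "tick" buffer (a real app would load its own tick sample instead).
let tickFrames: AVAudioFrameCount = 441
let tickBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: tickFrames)!
tickBuffer.frameLength = tickFrames
for i in 0..<Int(tickFrames) {
    tickBuffer.floatChannelData![0][i] = i < 44 ? 0.8 : 0.0   // short click, then silence
}

try! engine.start()                                           // sketch only
player.play()

// One tick per second at exact sample positions; the audio hardware clock,
// not the CPU scheduler, determines when each tick is heard.
for second in 0..<60 {
    let when = AVAudioTime(sampleTime: AVAudioFramePosition(second * 44_100), atRate: 44_100)
    player.scheduleBuffer(tickBuffer, at: when, options: [], completionHandler: nil)
}
```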
Basically, for my team's app, we need to be able to synchronize music across multiple iOS devices. The first way we did this was by having the music on all the devices already and just sending a play command to all the devices. Some would get it later than others, so that method did not work. There was an idea mentioned to calculate the latency between all the devices and send the commands at the appropriate times based on the latency.
The second way proposed would be to stream the music. If we were to implement streaming, how should we go about doing it? Should Audio Units be used, OpenAL, or something else? Also, if streaming were used, how would we make sure that each device's stream was in sync?
Basically, the audio has to be in sync so that the person hearing it cannot differentiate between the devices. A few milliseconds off should not be a problem (unless the listener has super-human hearing).
You'd be amazed at how good the human ear is at spotting audio anomalies...
Sync the time of day
Effectively you're trying to meet a real-time requirement with a whole load of very variable things in the way (Wi-Fi, etc.). I strongly suspect the only way you're going to get close to doing this is to issue a 'play' instruction that includes a particular time to start playing. Of course that relies on all the clocks being accurately set.
NTP
I don't know how iPhones get their time of day. If they use (or could use) NTP then you'll be getting close. NTP is designed to convey accurate time of day information over a network despite variable network delays. I've had a quick look and it seems that most NTP clients for iOS are the simple ones, not the full NTP that measures and tunes out network delays, clock drifts, etc.
GPS
Alternatively GPS is also a very good source of time information. Again I don't know if iPhones can or do use GPS for setting their clock but if it could be done then that would likely be pretty good. On Solaris (and I think Linux too) the 1 pulse per second that most GPS chips generate from the GPS signal can be used to condition the internal OS clock, making it very accurate indeed (sub microsecond accuracy).
I fear that iPhones don't do either of these things natively; both involve using a fair bit of electricity, so I wouldn't be surprised if they did something else less sophisticated.
Cell Time Service
Some Cell networks provide a time service too, but I don't think it's designed for accurate time setting. Also it tends not to be available everywhere. You often find it at major airports so that recent arrivals get their phones set to something close to local time.
Play at time X
So if one of those could be used to ensure that all the iPhones are set to exactly the same time of day then all you have to do is write your software to start playing at a specific time. That will probably involve polling the clock in a very tight loop waiting for it to tick over; most OSes don't provide a means of sleeping until a specific time. They do at least allow for sleeping for a period of time, which can be used to sleep until close to the appointed time. You'd then start polling the clock until the right time is reached.
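A minimal sketch of that sleep-then-poll approach (assuming the clocks have already been synchronised):

```swift
import Foundation

// Wait until `startDate` (a wall-clock time agreed on by all devices), then start playback.
func startPlayback(at startDate: Date, play: () -> Void) {
    // Coarse sleep until ~50 ms before the target, to avoid burning CPU the whole time.
    let coarse = startDate.timeIntervalSinceNow - 0.05
    if coarse > 0 {
        Thread.sleep(forTimeInterval: coarse)
    }
    // Then spin-poll the clock for the last few milliseconds.
    while Date() < startDate { /* busy wait */ }
    play()
}
```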
Delay Measurement and Standard Deviation
Your first method is doomed, I think. You might be able to measure average delays and so forth, but that doesn't mean every message has exactly the same latency. The standard deviation in the latency will tell you what you can expect to achieve, and I don't think it's going to be particularly small. If that's the case, the message has to include a timestamp.
NTP can work because it's only interested in the average delay measured over a period of time (hours sometimes), whereas you're interested in instantaneous delay.
Streaming with RTP
Your second method may work if you can time sync the devices as discussed above. The RTP protocol was designed for use in these circumstances; it doesn't help with achieving sync, but it does help a lot with the streaming. It tells you whereabouts in the stream any one piece of received data fits, allowing you to play it at the right time.
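The playout-time arithmetic RTP enables looks roughly like this (a sketch with assumed names and clock rate):

```swift
import Foundation

// Map an RTP packet's timestamp to a local playout time, given one known reference
// pair (firstRtpTimestamp was agreed to play at referencePlayoutDate). Names and the
// 44.1 kHz RTP clock rate are assumptions, not something RTP fixes for you.
func playoutDate(rtpTimestamp: UInt32,
                 firstRtpTimestamp: UInt32,
                 referencePlayoutDate: Date,
                 rtpClockRate: Double = 44_100) -> Date {
    let elapsedTicks = Double(rtpTimestamp &- firstRtpTimestamp)   // wrap-safe subtraction
    return referencePlayoutDate.addingTimeInterval(elapsedTicks / rtpClockRate)
}
```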
Clock Drift
Another problem to deal with is how long you're playing for. If it's a long time then you may discover that the 44kHz (or whatever) audio clock rate on each device isn't quite the same. So, whilst you might find a way of starting to play all at the same time, the separate devices will then start diverging ever so slightly. Over a long period of time they may be noticeably out.
BlueTooth
It might be possible to do something with BlueTooth. It has many weird and wonderful profiles, and it might be that one of those would serve to send an accurate 'start now' message.
Audio Trigger
You might also use sound as a means of conveying a start signal. One device can play a particular sound whilst your software in the others is listening with the mic. When a particular feature is detected in the sound, that's the time for everyone to start playing. Sort of a computerised "1, 2, a 1 2 3 4".
Camera Flash
Should be easy to spot in software...
I think your first way would work if you expand it a little bit. Assuming all the clocks on the devices are in sync you could include a timestamp in your play command. Then each device would calculate the time between the timestamp and when it received the command. You would then play the music and offset it by the time difference.
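A small sketch of that expanded approach (assuming synchronised clocks; names are illustrative):

```swift
import AVFoundation

// The play command carries the sender's wall-clock timestamp for "start of playback".
func handlePlayCommand(sentAt commandTimestamp: Date, player: AVAudioPlayer) {
    // How long the command took to arrive (plus any residual clock error).
    let delay = Date().timeIntervalSince(commandTimestamp)

    // Start playing, but skip ahead by the delay so every device ends up at the same
    // position in the track even though the command arrived at different times.
    player.currentTime = max(0, delay)
    player.play()
}
```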