What is the best method of synchronizing audio across iOS devices with WiFi? - ios

Basically, for my team's app, we need to be able to synchronize music across multiple iOS devices. The first way we did this was by having the music on all the devices already and just sending a play command to all the devices. Some would get it later than others, so that method did not work. There was an idea mentioned to calculate the latency between all the devices and send the commands at the appropriate times based on the latency.
The second way proposed would be to stream the music. If we were to implement streaming, how should we go about doing it. Should Audio Units be used, OpenAL, etc.? Also, if streaming was being done, how would we go about making sure that each device's stream was in sync.
Basically, the audio has to be in sync so that the person hearing it cannot differentiate between the devices. A few milliseconds off should not be a problem (unless the listener has super-human hearing).

You'd be amazed at how good the human ear us at spotting audio anomalies...
Sync the time of day
Effectively your trying to meet a real time requirement with a whole load if very variable things in the way (wifi, etc). I strongly suspect the only way you're going to get close to doing this is to issue a 'play' instruction that includes a particular time to start playing. Of course that relies on all the clocks being accurately set.
NTP
I don't know how iPhones get their time of day. If they use (or could use) NTP then you'll be getting close. NTP is designed to convey accurate time of day information over a network despite variable network delays. I've had a quick look and it seems that most NTP clients for iOS are the simple ones, not the full NTP that measures and tunes out network delays, clock drifts, etc.
GPS
Alternatively GPS is also a very good source of time information. Again I don't know if iPhones can or do use GPS for setting their clock but if it could be done then that would likely be pretty good. On Solaris (and I think Linux too) the 1 pulse per second that most GPS chips generate from the GPS signal can be used to condition the internal OS clock, making it very accurate indeed (sub microsecond accuracy).
I fear that iPhones don't do either of these things natively; both involve using a fair bit of electricity, so I wouldn't be surprised if they did something else less sophisticated.
Cell Time Service
Some Cell networks provide a time service too, but I don't think it's designed for accurate time setting. Also it tends not to be available everywhere. You often find it at major airports so that recent arrivals get their phones set to something close to local time.
Play at time X
So if one of those could be used to ensure that all the iPhones are set to exactly the same time of day then all you have to do is write your software to start playing at a specific time. That will probably involve polling the clock in a very tight loop waiting for it to tick over; most OSes don't provide a means of sleeping until a specific time. They do at least allow for sleeping for a period of time, which can be used to sleep until close to the appointed time. You'd then start polling the clock until the right time is reached.
Delay Measurement and Standard Deviation
Your first method is doomed I think. You might be able to measure average delays and so forth but that doesn't mean that every message has exactly the same latency. The standard deviation in the latency will tell you what you can expect to achieve, and I don't think that's going to be particularly small. If so then the message has got to include a timestamp.
NTP can work because it's only interested in the average delay measured over a period of time (hours sometimes), whereas you're interested in instantaneous delay.
Streaming with RTP
Your second method may work if you can time sync the devices as discussed above. The RTP protocol was designed for use in these circumstances; it doesn't help with achieving sync, but it does help a lot with the streaming. It tells you whereabouts in the stream any one piece of received data fits, allowing you to play it at the right time.
Clock Drift
Another problem to deal with is how long you're playing for. If it's a long time then you may discover that the 44kHz (or whatever) audio clock rate on each device isn't quite the same. So, whilst you might find a way of starting to play all at the same time, the separate devices will then start diverging ever so slightly. Over a long period of time they may be noticeably out.
BlueTooth
It might be possible to do something with BlueTooth. It has many weird and wonderful profiles, and it might be that one of those would serve to send an accurate 'start now' message.
Audio Trigger
You might also use sound as a means of conveying a start signal. One device can play a particular sound whilst your software in the others is listening with the mic. When a particular feature is detected in the sound, that's the time for everyone to start playing. Sort of a computerised "1, 2, a 1 2 3 4".
Camera Flash
Should be easy to spot in software...

I think your first way would work if you expand it a little bit. Assuming all the clocks on the devices are in sync you could include a timestamp in your play command. Then each device would calculate the time between the timestamp and when it received the command. You would then play the music and offset it by the time difference.

Related

how to find an offset from two audio file ? one is noisy and one is clear

I have once scenario in which user capturing the concert scene with the realtime audio of the performer and at the same time device is downloading the live streaming from audio broadcaster device.later i replace the realtime noisy audio (captured while recording) with the one i have streamed and saved in my phone (good quality audio).right now i am setting the audio offset manually with trial and error basis while merging so i can sync the audio and video activity at exact position.
Now what i want to do is to automate the process of synchronisation of audio.instead of merging the video with clear audio at given offset i want to merge the video with clear audio automatically with proper sync.
for that i need to find the offset at which i should replace the noisy audio with clear audio.e.g. when user start the recording and stop the recording then i will take that sample of real time audio and compare with live streamed audio and take the exact part of that audio from that and sync at perfect time.
does any one have any idea how to find the offset by comparing two audio files and sync with the video.?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms etc
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While
math and fancy signal processing like this can give great results, and
do some pretty magical stuff, there can be outlying cases where the
algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing,
there's another way? After some thought, there might be. If you meet
all the following conditions:
• You are in control of the server component (audio broadcaster
device)
• The broadcaster is aware of the 'real audio' recording
latency
• The broadcaster and receiver are communicating in a way
that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably
trivial. You could use NTP or some other more accurate time
synchronisation method so that there is a global point of reference
for time. Then, it is as simple as calculating the difference between
audio stream time codes, where the time codes are based on the global
reference time.
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in its self is an extensive non-trivial topic.
Another problem could be that the signal captured by the two devices could actually differ a lot, for example the good quality audio (i guess output from the live mix console?) will be fairly different than the live version (which is guess is coming out of on stage monitors/ FOH system captured by a phone mic?)
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is running both sounds through a speech to text library (or an API) and matching relevant part. This would be of course not very reliable. Sentences frequently repeat in songs and concert maybe instrumental.
Also, doing audio processing on a mobile device may not play well (because of low performance or high battery drain or both). I suggest you to use a server if you go that way.
Good luck.

Establishing synchronized music streaming across devices

I am attempting to stream audio files from a server to iOS devices and play them completely synchronized. For example on my phone I might be 20 secs into a song and then my friend next to me should also be 20 secs into the song as well. I know this is not an easy problem to solve, but I am attempting to do so.
I can currently get them within one second of each other by calculating the difference in time between the devices and then have them sync up, however that is not good enough because the human ear can detect a major difference in a second and this is over WIFI.
My next approach is going to be to unicast the one file from the server and then have the all devices pick it up directly from the server and then implement some type of buffer system similar to netflix so that network connectivity would be a limiting factor. http://www.wowza.com/ is what I would use to help with that.
I know this can be done, because http://lysn.in/ is does it with their app and I want to be able to do something similar.
Any other recommendations after I try my unicast option?
Would implementing firebase help solve a lot of the heavy lifting problems?
(1) In answer to ONE of your questions (the final one):
Firebase is not "realtime" in "that sense" -- PubNub is probably (almost certainly) the fastest "realtime" messaging for and between apps/browser/etc.
But they don't mean real-time in the sense of real-time, say, as race game engineers mean it or indeed in your use-case.
So firebase is not relevant to you here and won't help.
(2) Regarding your second general question: "how to synchronise time on two or more devices, given that we have communications delays."
Now, this is a really well-travelled problem in computer science.
It would be pointless outlining it here, because it is fully explained here http://www.ntp.org/ntpfaq/NTP-s-algo.htm if you click on "How is time synchronised"?
So in fact, to get a good time base on both machines, you should use that! Have both machines really accurately set a time to NTP using the existing (perfected for decades) NTP synchronisation.
(So for example https://stackoverflow.com/a/6744978/294884 )
In fact are you doing this?
It's possible that doing that will solve all your problems; then just agree to start at a certain exact time.
Hope it helps!
I would recommend against using the data movement to synchronize the playback. This should be straightforward to do with a buffer and a periodic "sync" signal that is sent at a period of < 1/2 the buffer size. Worst case this should generate a small blip on devices that get ahead or behind relative to the sync signal.

How to determine if two devices are listening to audio at the same time

If I have two devices listening to audio and sending data to a server, is there a way for me to align the data (at the server level) based on audio so I know when the devices were listening at the same time? The audio would be scheduled so the only thing to really account for I guess would be cable network/timezone issues.
I've been looking at things like FFT and other questions related to sound but realize that I may be chasing the wrong problem or over engineering. Would it be best to try and compare frequency or use a solution like this question suggests?
Most Audio API's on iOS have callbacks, especially the low-level one. After you determine that the metadata of the song is the same (you could probably do this over bluetooth using GameKit), you could use the callbacks to determine if the time elapsed on the playback is the same, or within your tolerence.
If the devices are not near each other, you would need to send the metadata to your server similar to how Last.fm scrobbles now-listening tracks, then compare against known devices.

How to synchronize audio playback on 2 or more iOS devices?

I would like to write a web application that allows me to sync audio playback of an MP3 down to ~50ms, or close enough that the human ear can't detect the difference.
The idea would be that two or more smartphones could each be paired to a bluetooth speaker, and two or more speakers would play the same audio at the exact same time.
How would you suggest I go about setting this up, both client-side and server-side? I'm planning to use Rails/Ruby for backend, and iOS/obj c for mobile dev.
I had though of the idea of syncing to a global/atomic clock on the server, and having the server provide instructions to clients on when to start playing/jump in to an already playing track. My concern is that, if I want to stream the audio, that it will be impossible to load a song into memory and start playback accurately on the millisecond level.
Thoughts?
The jitter in internet packet delivery will be too large, so forget about syncing over the internet. However you could check the accuracy of NTP which is still used (I guess, I know that older UNIX's used it) by the OS when you switch on automatic date/time in Settings, but my guess is that it won't be good enough either. But perhaps the OS may also use other time sources like GPS; I'm don't know how iOS does it but accuracy within 20ms is not to be expected. You could create experimental app to check it out.
So, what's left is a sync closer to home, meaning between the devices directly. Of course you need to make sure that all devices haves loaded (enough of) the song, and have preloaded it in AVAudioPlayer or whatever you're using, to be able to start playing immediately. (It may actually not be the best idea to use higher level 'AVAudioPlayer` API's as it may give higher delays, and what more important higher jitter, than lower level API's.)
Here are three ideas (one device needs to be master triggering the start play, the others are slaves that are waiting for the trigger):
Use an audio trigger pulse, like a high tone of a defined length and frequency. Then use FFT to recognise this tone.
Connect the devices via GameKit Bluetooth and transmit the trigger on these connections.
Use the iPhone 4+ flash as trigger: flash in a certain pattern. This would require you to sample the video data which is quite doable and can be very fast.
I'm going with a solution that uses an atomic clock for synchronization, and an external service that allows server instructions/messages to be sent to all devices in close sync.

Handling latency while synchronizing client-side timers using Juggernaut

I need to implement a draft application for a fantasy sports website. Each users will have 1m30 to choose a player on its team and if that time has elapsed it will be selected automatically. Our planned implementation will use Juggernaut to push the turn changes to each user participating in the draft. But I'm still not sure about how to handle latency.
The main issue here is if a user got a higher latency than the others, he will receive the turn changes a little bit later and his timer won't be synchronized. Say someone receive a turn change after choosing a player himself while on his side he think he still got 2 seconds left, how can we handle that case? Is it better to try to measure each user latency and adjust the client-side timer to minimize that issue? If so, how could we implement that?
This is a tricky issue, but there are some good solutions out there. Look into what time.gov does, and how it does it; essentially, as I understand it, they use Java to perform multiple repeated requests to the server, to attempt to get an idea of the latency involved in the communication, then they generate a measure of latency that they use to skew the returned time data. You could use the same process for your application, with even more accuracy; keeping track of what the latency is and how it varies over time lets you make some statistical inferences about how reliable your latency numbers are, etc. It can be a bit complex, but it can definitely allow you to smooth out your performance. My understanding is that this is what most MMOs do as well, to manage lag.

Resources