Establishing synchronized music streaming across devices - iOS

I am attempting to stream audio files from a server to iOS devices and play them completely synchronized. For example, if I am 20 seconds into a song on my phone, my friend next to me should also be 20 seconds into the song. I know this is not an easy problem to solve, but I am attempting to do so.
I can currently get them within one second of each other by calculating the time difference between the devices and then having them sync up, but that is not good enough: the human ear can easily detect an offset of a second. This is over Wi-Fi.
My next approach is to unicast the one file from the server, have all the devices pick it up directly from the server, and then implement some type of buffering system similar to Netflix, so that network connectivity is less of a limiting factor. http://www.wowza.com/ is what I would use to help with that.
I know this can be done, because http://lysn.in/ does it with their app, and I want to be able to do something similar.
Any other recommendations after I try my unicast option?
Would implementing Firebase help solve a lot of the heavy-lifting problems?

(1) In answer to ONE of your questions (the final one):
Firebase is not "realtime" in "that sense" -- PubNub is probably (almost certainly) the fastest "realtime" messaging for and between apps/browsers/etc.
But neither means real-time in the sense that, say, racing-game engineers mean it, or indeed in the sense of your use case.
So Firebase is not relevant to you here and won't help.
(2) Regarding your second general question: "how to synchronise time on two or more devices, given that we have communications delays."
Now, this is a really well-travelled problem in computer science.
It would be pointless outlining it here, because it is fully explained at http://www.ntp.org/ntpfaq/NTP-s-algo.htm under "How is time synchronised?"
So in fact, to get a good time base on both machines, you should use that! Have both machines set their time really accurately using the existing (refined over decades) NTP synchronisation.
(So for example https://stackoverflow.com/a/6744978/294884 )
In fact, are you already doing this?
It's possible that doing that will solve all your problems; then just agree to start at a certain exact time.
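For illustration, here is a minimal sketch of the round-trip offset estimation that underlies this (essentially Cristian's algorithm, which NTP refines). The HTTP endpoint returning the server's epoch time is a hypothetical stand-in; a real app would use a proper (S)NTP client as in the link above.

```swift
import Foundation

/// Estimates the offset between this device's clock and a server's clock.
/// Assumes a hypothetical endpoint that returns the server's Unix time in
/// seconds as plain text -- a stand-in for a real (S)NTP exchange.
func estimateClockOffset(from url: URL,
                         completion: @escaping (TimeInterval?) -> Void) {
    let t0 = Date().timeIntervalSince1970          // local time at send
    URLSession.shared.dataTask(with: url) { data, _, error in
        let t1 = Date().timeIntervalSince1970      // local time at receive
        guard error == nil, let data = data,
              let text = String(data: data, encoding: .utf8),
              let serverTime = TimeInterval(text.trimmingCharacters(in: .whitespacesAndNewlines))
        else { completion(nil); return }
        // Assume the server stamped its time halfway through the round trip.
        let localAtStamp = (t0 + t1) / 2
        completion(serverTime - localAtStamp)
    }.resume()
}
```

Add the offset to the local clock to get "server time" on every device, then agree to start playback at one exact server time.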
Hope it helps!

I would recommend against using the data movement itself (the streaming) to synchronize the playback. This should be straightforward to do with a buffer and a periodic "sync" signal sent at a period of less than half the buffer duration. Worst case, this should generate a small blip on devices that get ahead of or behind the sync signal.
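As a hedged sketch of that correction step (the names and the transport for the sync message are assumptions here): each device compares its playback position against the position announced by the sync signal and nudges itself only when the drift exceeds a tolerance.

```swift
import AVFoundation

/// Applies a periodic sync signal to a local player. `announcedPosition`
/// is the playback position (seconds) the server says everyone should be
/// at right now; how it arrives (socket, push, ...) is left open here.
func applySyncSignal(player: AVAudioPlayer,
                     announcedPosition: TimeInterval,
                     tolerance: TimeInterval = 0.05) {
    let drift = player.currentTime - announcedPosition
    if abs(drift) > tolerance {
        // Jump to the announced position; worst case this is the small
        // audible "blip" mentioned above.
        player.currentTime = announcedPosition
    }
}
```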

Matching time on iPhones worldwide - synchronizing playing video?

Two quick questions:
Is the seconds value of the clock/time on all iPhones around the world the same? If not, are they at least close, and how close?
Does the seconds value of the clock/time on all iPhones around the world change/increment at exactly the same time?
Upon request, I'm editing this post and adding the purpose for asking such questions:
I'm trying to make a corporate app that can play a video on multiple iPhones around the world at the exact same time (or as close as possible, ideally the exact same moment). Could you please guide me on how to do this?
Much thanks in advance!
To answer your actual question per se,
Are the seconds value of the clock/time on all iPhones around the world the same?
The fact is, yes, 99.9999% of iPhone users simply use the "get time from a server" system which is of course built in to any phone now.
(Indeed, this simply applies to any Windows, Mac, Android, iOS, etc etc device.)
Yes, they are all "about the same".
You could rely on it being within a second or two, probably even closer.
You cannot rely on it being closer than that.
It seems that what you are actually trying to do is this:
I'm trying to make a corporate app that can play a video on multiple iPhones around the world at the exact same time
Synchronization in general and streaming video (is it streaming?) is a well-explored (and rather technical) branch of software engineering. (For example a massive amount of game engineering, which is a huge field, relates to inside-the-frame synchronization.)
This is not something you can learn how to do in five minutes, and it requires a stack of device-cloud stuff. Go ahead and ask new questions about this, or just google to get started! Good luck.
Regarding using push notifications.
With the push notification you would send a time. You would not send a "command to play it".
So say right now it is (example) 09:13:28. You would pick a time in the future by a couple minutes. So let's say 09:14:30.
Then using a push notification you would send that information "09:14:30" (and a video file name) to everyone connected. (You'd be sending a "command" as it were, to play video X at 09:14:30.)
Then every device would in fact play the video at 09:14:30 (simply using the local clock, as asked in your question).
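As a minimal sketch of that receiving side, assuming the "09:14:30" payload has already been parsed into a Date: AVAudioPlayer can schedule playback at a future point on its device clock, which avoids hand-rolled timers.

```swift
import AVFoundation

/// Schedules `player` to start at `startDate` (the wall-clock time agreed
/// via the push payload, e.g. "09:14:30", already parsed into a Date).
func schedulePlayback(_ player: AVAudioPlayer, at startDate: Date) {
    player.prepareToPlay()                           // preload buffers now
    let delay = startDate.timeIntervalSinceNow
    guard delay > 0 else { player.play(); return }   // time already passed
    // deviceCurrentTime is the audio hardware clock; play(atTime:) starts
    // playback when that clock reaches the given value.
    player.play(atTime: player.deviceCurrentTime + delay)
}
```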
Be aware that sending push notifications is extremely sloppy and slow. It can take any amount of time from 5 seconds to a minute, and quite often there are delays beyond that (i.e. ten minutes or the like).
I personally would not even bother starting to experiment with push notifications, for the project you describe.
These days, making apps is entirely about using device-cloud services, such as Firebase. Everything is about "OCC" - occasionally connected computing.
(So, you can't get a job "making apps" anymore - i.e. if you know how to move buttons around on an iPhone screen. You get a job because you can make a total, live, device-cloud system - indeed such as you are making.)
Indeed your example project is the perfect such "demo" project for learning about how to do modern apps.
Simply use Firebase to sync everything up.
You'll essentially put a piece of information on Firebase ("play video X at 09:14:30") and that information will be communicated fairly quickly/reliably to everyone connected.
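A minimal sketch of that with the Firebase Realtime Database; the "playCommand" path and the payload shape are my assumptions, not a prescribed schema.

```swift
import FirebaseDatabase

// Assumes FirebaseApp.configure() has been called at app launch.
let commandRef = Database.database().reference(withPath: "playCommand")

// Publisher: announce what to play and when.
commandRef.setValue(["video": "X", "startAt": "09:14:30"])

// Every device: listen for the command and schedule playback locally,
// e.g. with the schedulePlayback(_:at:) sketch above.
commandRef.observe(.value) { snapshot in
    guard let command = snapshot.value as? [String: String],
          let video = command["video"],
          let startAt = command["startAt"] else { return }
    print("Play \(video) at \(startAt)")
}
```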
For the particular task you describe, I personally would use PubNub which is faster than Firebase and basically made for game-like problems precisely like you describe.
http://pubnub.com
If you truly needed performance/reliability better than pubnub, you are really talking major engineering. So, the (buildings of) engineers who make live games at Nintendo, Warcraft etc, would tackle such an issue as "being even faster than PubNub".
So, the answer in brief!
The very short answer then to your question posed is:
Learn to use the various device-cloud services, which are at the heart of all apps today. (Knowing how to make "an Android or iOS app", as such, is of no consequence today.) For your particular problem, you'll want to use PubNub specifically, as it is built for precisely realtime problems such as this. (Firebase leans more towards "OCC"-type data problems.)
Really that's it.

How to find an offset between two audio files? One is noisy and one is clear

I have a scenario in which the user captures a concert scene along with the real-time audio of the performer, while at the same time the device is downloading a live stream from the audio broadcaster device. Later I replace the noisy real-time audio (captured while recording) with the good-quality audio that I streamed and saved on my phone. Right now I am setting the audio offset manually, by trial and error while merging, so that I can sync the audio and video activity at the exact position.
Now what I want to do is automate this process of audio synchronisation. Instead of merging the video with the clear audio at a manually found offset, I want to merge the video with the clear audio automatically with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clear audio. E.g. when the user starts and stops the recording, I will take that sample of real-time audio, compare it with the live-streamed audio, take the exact matching part of that audio, and sync it at the perfect time.
Does anyone have any idea how to find the offset by comparing two audio files and syncing with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms etc
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then, it is as simple as calculating the difference between audio stream time codes, where the time codes are based on the global reference time.
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in its self is an extensive non-trivial topic.
Another problem could be that the signals captured by the two devices could actually differ a lot; for example, the good-quality audio (I guess the output from the live mixing console?) will be fairly different from the live version (which I guess is coming out of the on-stage monitors/FOH system, captured by a phone mic?).
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
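As an illustrative sketch of that (not production DSP; for long recordings you would use an FFT-based correlation, e.g. via Accelerate, and the buffer names here are assumptions): a brute-force cross-correlation that searches for the lag where the two signals line up best.

```swift
/// Returns the lag (in samples) at which `noisy` best aligns with `clean`,
/// using brute-force cross-correlation over positive lags only (i.e. the
/// noisy recording is assumed to start later than the clean stream).
/// O(n * maxLag): fine as a sketch, too slow for long recordings.
func findBestLag(clean: [Float], noisy: [Float], maxLag: Int) -> Int {
    var best = 0
    var bestScore = -Float.infinity
    for lag in 0...maxLag {
        let n = min(clean.count, noisy.count - lag)
        guard n > 0 else { break }
        var score: Float = 0
        for i in 0..<n {
            score += clean[i] * noisy[i + lag]
        }
        score /= Float(n)                 // normalise by overlap length
        if score > bestScore {
            bestScore = score
            best = lag
        }
    }
    return best                           // shift the clean audio by this
}
```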
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is running both sounds through a speech-to-text library (or an API) and matching the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, doing audio processing on a mobile device may not play well (because of low performance, high battery drain, or both). I suggest you use a server if you go that way.
Good luck.

What is the best method of synchronizing audio across iOS devices with WiFi?

Basically, for my team's app, we need to be able to synchronize music across multiple iOS devices. The first way we did this was by having the music on all the devices already and just sending a play command to all the devices. Some would get it later than others, so that method did not work. There was an idea mentioned to calculate the latency between all the devices and send the commands at the appropriate times based on the latency.
The second way proposed would be to stream the music. If we were to implement streaming, how should we go about doing it? Should we use Audio Units, OpenAL, etc.? Also, if streaming were being done, how would we go about making sure that each device's stream was in sync?
Basically, the audio has to be in sync so that the person hearing it cannot differentiate between the devices. A few milliseconds off should not be a problem (unless the listener has super-human hearing).
You'd be amazed at how good the human ear is at spotting audio anomalies...
Sync the time of day
Effectively you're trying to meet a real-time requirement with a whole load of very variable things in the way (Wi-Fi, etc.). I strongly suspect the only way you're going to get close to doing this is to issue a 'play' instruction that includes a particular time to start playing. Of course, that relies on all the clocks being accurately set.
NTP
I don't know how iPhones get their time of day. If they use (or could use) NTP then you'll be getting close. NTP is designed to convey accurate time of day information over a network despite variable network delays. I've had a quick look and it seems that most NTP clients for iOS are the simple ones, not the full NTP that measures and tunes out network delays, clock drifts, etc.
GPS
Alternatively GPS is also a very good source of time information. Again I don't know if iPhones can or do use GPS for setting their clock but if it could be done then that would likely be pretty good. On Solaris (and I think Linux too) the 1 pulse per second that most GPS chips generate from the GPS signal can be used to condition the internal OS clock, making it very accurate indeed (sub microsecond accuracy).
I fear that iPhones don't do either of these things natively; both involve using a fair bit of electricity, so I wouldn't be surprised if they did something else less sophisticated.
Cell Time Service
Some Cell networks provide a time service too, but I don't think it's designed for accurate time setting. Also it tends not to be available everywhere. You often find it at major airports so that recent arrivals get their phones set to something close to local time.
Play at time X
So if one of those could be used to ensure that all the iPhones are set to exactly the same time of day then all you have to do is write your software to start playing at a specific time. That will probably involve polling the clock in a very tight loop waiting for it to tick over; most OSes don't provide a means of sleeping until a specific time. They do at least allow for sleeping for a period of time, which can be used to sleep until close to the appointed time. You'd then start polling the clock until the right time is reached.
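A hedged sketch of that sleep-then-poll pattern (the 100 ms spin margin is an arbitrary choice):

```swift
import Foundation

/// Blocks until the local clock reaches `deadline`: sleeps for most of
/// the wait, then busy-polls the last stretch so the wake-up lands as
/// close to the target instant as the clock allows.
func waitUntil(_ deadline: Date) {
    let margin: TimeInterval = 0.1        // assumption: 100 ms spin window
    let sleepTime = deadline.timeIntervalSinceNow - margin
    if sleepTime > 0 {
        Thread.sleep(forTimeInterval: sleepTime)
    }
    while Date() < deadline { /* tight poll */ }
}

// waitUntil(agreedStartTime); player.play()
```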
Delay Measurement and Standard Deviation
Your first method is doomed I think. You might be able to measure average delays and so forth but that doesn't mean that every message has exactly the same latency. The standard deviation in the latency will tell you what you can expect to achieve, and I don't think that's going to be particularly small. If so then the message has got to include a timestamp.
NTP can work because it's only interested in the average delay measured over a period of time (hours sometimes), whereas you're interested in instantaneous delay.
Streaming with RTP
Your second method may work if you can time sync the devices as discussed above. The RTP protocol was designed for use in these circumstances; it doesn't help with achieving sync, but it does help a lot with the streaming. It tells you whereabouts in the stream any one piece of received data fits, allowing you to play it at the right time.
Clock Drift
Another problem to deal with is how long you're playing for. If it's a long time then you may discover that the 44kHz (or whatever) audio clock rate on each device isn't quite the same. So, whilst you might find a way of starting to play all at the same time, the separate devices will then start diverging ever so slightly. Over a long period of time they may be noticeably out.
Bluetooth
It might be possible to do something with Bluetooth. It has many weird and wonderful profiles, and it might be that one of those would serve to send an accurate 'start now' message.
Audio Trigger
You might also use sound as a means of conveying a start signal. One device can play a particular sound whilst your software in the others is listening with the mic. When a particular feature is detected in the sound, that's the time for everyone to start playing. Sort of a computerised "1, 2, a 1 2 3 4".
Camera Flash
Should be easy to spot in software...
I think your first way would work if you expand it a little bit. Assuming all the clocks on the devices are in sync you could include a timestamp in your play command. Then each device would calculate the time between the timestamp and when it received the command. You would then play the music and offset it by the time difference.
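A sketch of that expanded method (the transport for the command is left open; AVAudioPlayer's seekable currentTime does the offsetting, and device clocks are assumed to be already synchronised):

```swift
import AVFoundation

/// Handles a "play" command that carries the sender's timestamp. The
/// receiver measures how late the command arrived and starts playback
/// that far into the track, so all devices line up.
func handlePlayCommand(sentAt timestamp: Date, player: AVAudioPlayer) {
    let latency = Date().timeIntervalSince(timestamp)  // transit delay
    player.currentTime = max(0, latency)               // skip what was missed
    player.play()
}
```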

How to synchronize audio playback on 2 or more iOS devices?

I would like to write a web application that allows me to sync audio playback of an MP3 down to ~50ms, or close enough that the human ear can't detect the difference.
The idea would be that two or more smartphones could each be paired to a bluetooth speaker, and two or more speakers would play the same audio at the exact same time.
How would you suggest I go about setting this up, both client-side and server-side? I'm planning to use Rails/Ruby for backend, and iOS/obj c for mobile dev.
I had thought of the idea of syncing to a global/atomic clock on the server, and having the server provide instructions to clients on when to start playing or jump into an already-playing track. My concern is that, if I want to stream the audio, it will be impossible to load a song into memory and start playback accurately at the millisecond level.
Thoughts?
The jitter in internet packet delivery will be too large, so forget about syncing over the internet. However, you could check the accuracy of NTP, which I guess is still used by the OS when you switch on automatic date/time in Settings (I know that older UNIXes used it), but my guess is that it won't be good enough either. Perhaps the OS also uses other time sources like GPS; I don't know how iOS does it, but accuracy within 20 ms is not to be expected. You could create an experimental app to check it out.
So, what's left is a sync closer to home, meaning between the devices directly. Of course, you need to make sure that all devices have loaded (enough of) the song, and have preloaded it in AVAudioPlayer or whatever you're using, to be able to start playing immediately. (It may actually not be the best idea to use the higher-level `AVAudioPlayer` APIs, as they may give higher delays, and more importantly higher jitter, than lower-level APIs.)
Here are three ideas (one device needs to be master triggering the start play, the others are slaves that are waiting for the trigger):
Use an audio trigger pulse, like a high tone of a defined length and frequency, and use an FFT (or a simpler single-frequency filter; see the sketch after this list) to recognise this tone.
Connect the devices via GameKit Bluetooth and transmit the trigger on these connections.
Use the iPhone 4+ flash as trigger: flash in a certain pattern. This would require you to sample the video data which is quite doable and can be very fast.
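As a hedged sketch of the first idea: a Goertzel filter is a cheap alternative to a full FFT when you only need to detect one known frequency. The tone frequency and detection threshold are assumptions you would tune.

```swift
import Foundation

/// Goertzel energy of `samples` at `targetFrequency` -- a cheap way to
/// detect one known tone without a full FFT. The result spikes while
/// the trigger tone is present in the buffer.
func goertzelMagnitude(samples: [Float],
                       targetFrequency: Float,
                       sampleRate: Float) -> Float {
    let w = 2 * Float.pi * targetFrequency / sampleRate
    let coeff = 2 * cos(w)
    var s1: Float = 0, s2: Float = 0
    for x in samples {
        let s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    }
    // Squared magnitude of the filter state at the target frequency.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
}

// Slaves run this over incoming mic buffers; when the magnitude for,
// say, an 18 kHz tone exceeds a tuned threshold, they start playing.
```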
I'm going with a solution that uses an atomic clock for synchronization, and an external service that allows server instructions/messages to be sent to all devices in close sync.

How can I determine the quality of a connection in iOS?

I'm familiar with using Reachability to determine the type of internet connection (if any) being used on an iOS device. Unfortunately that's not a decent indicator of connection quality. Wifi with low signal strength is pretty sketchy and 3G with anything less than 3 bars is a disaster (not to mention networks that only allow EDGE connections).
How can I determine the quality of my connection so I can help my users decide if they should be downloading larger files on their current connection?
A pragmatic approach would be to download one moderately large-sized file hosted on a reliable, worldwide CDN, at the start of your application. You know the filesize beforehand, you just have to measure the time it takes, make a simple computation and then you've got your estimate of the quality of the connection.
For example, jQuery UI source code, unminified, gzipped weighs roughly 90kB. Downloading it from http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.14/jquery-ui.js takes 327ms here on my Mac. So one can assume I have at least a decent connection that can handle approximately 300kB/s (and in fact, it can handle much more).
The trick is to find the good balance between the original file size and the latency of the network, as the full download speed is never reached on a small file like this. On the other hand, downloading 1MB right after launching your application will surely penalize most of your users, even if it will allow you to measure more precisely the speed of the connection.
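A minimal sketch of that measurement with URLSession (the test URL would be a known-size file on a CDN you trust, such as the jQuery UI example above):

```swift
import Foundation

/// Downloads a known test file and reports estimated throughput in kB/s,
/// or nil on failure.
func estimateBandwidth(testURL: URL,
                       completion: @escaping (Double?) -> Void) {
    let start = Date()
    URLSession.shared.dataTask(with: testURL) { data, _, error in
        guard error == nil, let data = data else {
            completion(nil)
            return
        }
        let seconds = Date().timeIntervalSince(start)
        let kiloBytes = Double(data.count) / 1024.0
        completion(kiloBytes / seconds)   // e.g. ~90 kB in 0.3 s ≈ 300 kB/s
    }.resume()
}
```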
Cyrille's answer is a good pragmatic answer, but is not really in the end a great solution in the mobile context for these reasons:
It involves doing a test "at the start of your application" by which I assume he means when your app launches. But your app may execute for a long while, may go background and then back into the foreground, and all the while the user is changing network contexts with changes in underlying network performance - so that initial test result may bear no relationship to the "current" performance of the network connection.
For the reason he rightly points out, that it is "penalizing" your user by making them download a test file over what may already be constrained network conditions.
You also suggest in your original post that you want your user to decide if they should download based on information you present to them. But I would suggest that this is not a good way to approach interacting with mobile users - that you should not be asking them to make complicated decisions. If absolutely necessary, only ask if they want to download the file if you think it may present a problem, but keep it that simple - "Do you want to download XYZ file (100 MB)?" I personally would even avoid even that.
Instead of downloading a test file, the better solution is to monitor and adapt. Measure the performance of the connection as you go along, keep track of the "freshness" of that information you have about how well the connection is performing, and only present your user with a decision to make if based on the on-going performance of the connection it seems necessary.
EDIT: For example, if you determine a patience threshold that in your opinion represents tolerable download performance, keep track of each download that the user does in order to determine if that threshold is being reached. That way, instead of clogging up the users connection with test downloads, you're using the real world activity as the determining factor for "quality of the connection", which is ultimately about the end-user experience of the quality of the connection. If you decide to provide the user with the ability to cancel downloads, then you have an excellent "input" about the user's actual patience threshold, and can adapt your functionality to that situation, by subsequently giving them the choice before they start the download. If you've flipped into this type of "confirmation" mode, but then find that files are starting to download faster, you could dynamically exit the confirmation mode.
Rob's answer is very good, but for a more specific implementation, start with Apple's SimplePing example source code: https://developer.apple.com/library/archive/samplecode/SimplePing/Introduction/Intro.html#//apple_ref/doc/uid/DTS10000716
Target the domain for the server that you want to monitor connection quality to. Use the ping library to "ping" it on a regular basis (say 1 or 10 seconds depending upon your UI needs). Measure how long it takes to get a response to your ping (or if it never returns) to develop an estimate of the connection quality to communicate to your user.
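SimplePing's delegate API is best read from the sample itself; as a rough stand-in, here is a sketch that measures round-trip time with a lightweight HEAD request to the target server. This is an HTTP-level substitute for an ICMP ping, and the 10-second cadence is an assumption.

```swift
import Foundation

/// Periodically measures round-trip time to `host` with a HEAD request
/// and reports each sample as a rough connection-quality signal.
final class LatencyMonitor {
    private var timer: Timer?

    func start(host: URL, interval: TimeInterval = 10,
               onSample: @escaping (TimeInterval?) -> Void) {
        timer = Timer.scheduledTimer(withTimeInterval: interval,
                                     repeats: true) { _ in
            var request = URLRequest(url: host)
            request.httpMethod = "HEAD"       // headers only, tiny payload
            let start = Date()
            URLSession.shared.dataTask(with: request) { _, response, error in
                // nil = no/failed response; else round-trip time in seconds.
                onSample(error == nil && response != nil
                         ? Date().timeIntervalSince(start) : nil)
            }.resume()
        }
        timer?.fire()                         // take a first sample now
    }

    func stop() { timer?.invalidate() }
}
```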
