Real time audio conversation iOS - ios

I am designing an iOS app for a customer who wants to allow real-time (with minimum lag, max 50ms) conversations between users (a sort of Teamspeak). The lag must be low because the audio can also be live music, played with instruments, so all the users need to synchronize. I need a server, which will request audio recordings to every client and send to others (and make them hear the same sound at the same time).
HTTP is easy to manage/implement and easy to scale, but very low-performing because an average HTTP request takes > 50ms... (with a mid-level hardware), so I was thinking of TCP/UDP connections kept open between clients and server.
But I have some questions:
If I develop the server in Python (using TwistedMatrix, for example), how are its performance ?
I can't develop the server in C++ because it is hard to manage (scalable) and to develop.
Anyone used Nodejs (which is easy to scale) to manage TCP/UDP connections?
If I use HTTP, will it be fast enough with Keep-Alive? Becuase usually the time required for an HTTP Request to be performed is > 50ms (because opening-closing connection is hard), and I want the total procedure to be less than that time.
The server will be running on a Linux machine.
And finally: which type of compression can you suggest me? I thought Ogg Vorbis would be nice, but if there's anything better (and can be used in iOS), I am open to changes.
Thank you,
Umar.

First off, you are not going to get sub 50 ms latency. Others have tried this. See for example http://ejamming.com/ a service that attempts to do what you are doing, but has a musically noticeable delay over the line and is therefore, in the ears of many, completely unusable. They use special routing techniques to get the latency as low as possible and last I heard their service doesn't work with some router configurations.
Secondly, what language you use on server probably doesn't make much difference, as the delay from client to server will be worse than any delay caused by your service, but if I understand your service correctly, you are going to need a lot of servers (or server threads) just relaying audio data between clients or doing some sort of minimal mixing. This is a small amount of work per connection, but a lot of connections, so you need something that can handle that. I would lean towards something like Java, Scala, or maybe Go. I could be wrong, but I don't think this is a good use-case for node, which, as I understand it, does not do multithreading well at this time. Also, don't poo-poo C++, scalable services have been built C++. You could also build the relay part of the service in C++ and the rest in whatever.
Third, when choosing a compression format, you'll have to choose one that can survive packet loss if you plan to use UDP, and I think UDP is the only way to go for this. I don't think vorbis is up to this task, but I could be wrong. Off the top of my head, I'm not sure of anything that works on the iPhone and is UDP friendly, but I'm sure there are lots of things. Speex is an example and is open-source. Not sure if the latency and quality meet your needs.
Finally, to be blunt, I think there are som other things you should research a bit more. eg. DNS is usually cached locally and not checked every http call (though it may depend on the system/library. At least most systems cache dns locally). Also, there is no such protocol as TCP/UDP. There is TCP/IP (sometimes just called TCP) and UDP/IP (sometimes just called UDP). You seem to refer to the two as if they are one. The difference is very important for what you are doing. For example, HTTP runs on top of TCP, not UDP, and UDP is considered "unreliable", but has less overhead, so it's good for streaming.
Edit: speex

What concerns the server, the request itself is not a bottleneck. I guess you have sufficient time to set up the connection, as it happens only in the beginning of the session. Therefore the protocol is not of much relevance.
But consider that HTTP is a stateless protocol and not suitable for audio streaming. There are a couple of real time streaming protocols you can choose from. All of them will work over TCP or UDP (e.g. use raw sockets), and there are plenty of implementations.
In your case, the bottleneck with latency is not the server but the network itself. The connection between an iOS device and a wireless access point (AP) eats up about 40ms if the AP is not misconfigured and connection is good. (ping your iPhone.) In total, you'd have a minimum of 80ms for the path iOS -> AP -> Server -> AP -> iOS. But it is difficult to keep that latency stable. (Typical latency of AirPlay on my local network is about 300ms.)
I think live music over iOS devices is not practicable today. Try skype between two iOS devices and look how close you can get to 50ms. I'd bet no one can do it significantly better, what concerns latency.
Update: New research result!
I have to revise my claims regarding the latency of wifi connections of the iDevice. Apparently when you first ping your device, latency will be bad. But if I ping again no later than 200ms after that, I see an average latency 2ms-3ms between AP and iDevice.
My interpretation is that if there is no communication between AP and iDevice for more than 200ms, the network adapter of the iDevice will go to a less responsive sleep mode, probably to save battery power.
So it seems, live music is within reach again... :-)
Update 2
The ping-interval required for keep alive of low latency apparently differs from device to device. The reported 200ms is for an 3rd gen. iPad. For my iPhone 4 it's more like 50ms.
While streaming audio you probably don't need to bother with this, as data is exchanged on a more frequent basis. In my own context, I have sparse communication between an iDevice and a server, but low latency is crucial. A keep alive therefore is the way to go.
Best, Peter

Related

iOS - synching audio between devices with millisecond accuracy

I need to synch audio in networked devices with millisecond accuracy. I've hacked together something that works quite well, but is'nt perfectly reliable :
1)Server device sends an rpc with a timeSinceClick param
2)client device launches the same click offsetted according to the time the rpc spent in transit,
3)System.Diagnostics.StopWatch checks periodicaly on all connected devices to make sure playback hasn't deviated too much from absolute time and corrects if necessary
Are there any more elegant ways to do this? Also, my way of doing it requires manual synching if non iOS devices are added to the mix : latency divergences make it very hard to automate...
I'm all eyes!
Cheers,
Gregzo
It is difficult to synchronize multiple devices on the same machine with millisecond accuracy, so if you are able to do this on multiple machines I would say you are doing well. I'm not familiar enough with iOS to the steps you describe, but I can tell you how I would approach this in cross-platform way. Maybe your approach amounts to the same thing.
one machine (the "master") would send a UDP packet with to all other machines.
all other machines would reply as quickly as possible.
the time it takes to receive the reply, divided by two, would be (approximately) the time it takes to get a packet from one machine to another. (this would have to be validated. maybe it takes much longer to process and send a packet? probably not)
after repeating steps 1-3, ignoring any extreme values and averaging the remaining results, you know about how long it takes to get a message from one machine to another.
now "sync" UDP packets can be sent from the main machine to the "slave" machines. The sync packets will include delay information so that when the save machine receive the packets, they know they were sent x milliseconds ago. Several sync packets may need to be sent in case the network delays or drops some of them.

Large number of WebSocket connections

I am writing an application that keeps track of content pushed around between users of a certain task. I am thinking of using WebSockets to send down new content as they are available to all users who are currently using the app for that given task.
I am writing this on Rails and the client side app is on iOS (probably going to be in Android too). I'm afraid that this WebSocket solution might not scale well. I am after some advice and things to consider while making the decision to go with WebSockets vs. some kind of polling solution.
Would Ruby on Rails servers (like Heroku) support large number of WebSockets open at the same time? Let's say a million connections for argument sake. Any material anyone can provide me of such stuff?
Would it cost a lot more on server hosting if I architect it this way?
Is it even possible to maintain millions of WebSockets simultaneously? I feel like this may not be the best design decision.
This is my first try at a proper Rails API. Any advice is greatly appreciated. Thx.
Million connections over WebSockets, using Ruby, I can't see its real if you not using clustering to spread connections between different instances to handle all the data processing.
The problem here is serializing and deserializing data.
As well you have to research of how often you will need to pull data to client from server, and if it worth to have just periodical checks using AJAX, then handling connection for whole time. Because if you do handle connection and then you not using it - it is waste of resources. WebSockets are build on top of TCP layer, and all connections are not "cheap" as well going through for OS and asking them for data available again is not the simple process, with millions connections it is something really almost impossible without using most advanced technologies in the world.
I head that Erlang is able to handle millions of connections, but I don't have details over it. As well connection is one thing, another is processing data and interaction between connections - this you might want to check, because if you have heavy processing algorithms, then you definitely need to look into horizontal scaling options over clustering solutions.
If you are implementing chat, use websockets.
If you are implementing 1 way messages in realtime use server sent events.
If you are implementing 1 way messages sent every few hours or so, use APNS.
The saying goes phone in hand, use websockets / server sent events.
Phone in pocket, use APNS.
APNS will alleviate wifi dips, tcp/ip socket hangs and many other issues. Really useful. There is the chance that it may take a little time to get through. But then again, there is the chance that websockets will take
Recent versions of iOS let you send APNS to the client without a popup message to the client so it can ask the server for more information. That along with some backgrounding implementations really improves things.
If possible, do not implement totally anonymous clients. It is very tricky to detect if a client reinstalls the app. So you'll end up sending duplicates to the client. Need to take that into account.
APNS looks trivial to implement in ruby, but I'd suggest avoiding the urge and going to using an existing gem/service out there that supports both google and apple. It is much trickier to implement than it may seem at first.
If you decide to stick with websockets, it may make sense to just leverage websockets in nginx like https://github.com/wandenberg/nginx-push-stream-module
ASIDE:
Using SMS where speed is critical is very expensive. $1/month per phone number only sends a max rate of 1 message per second. So sending 100 messages per second = $100/month plus message fees. Do note that 100 messages at a rate of 50 messages/second = $50/month. But if you want to send 1k messages, that takes 20 seconds.
Good luck

How do I increase the priority of a TCP packet in Delphi?

I have a server application that receives some special TCP packet from a client and needs to react to it as soon as possible by sending an high-level ACK to the client (the TCP ACK won't suite my needs).
However, this server is really network intensive and sometimes the packet will take too long to be sent (like 200ms in a local network, when a simple server application can send it in less than 1ms).
Is there a way to mark this packet with a high-priority tag or something like that in Delphi? Or maybe with the Win32 API?
Thanks in advance.
EDIT
Thanks for all the answers so far. I'll add some details. My product has the following setup: there are several devices that are built upon vehicles with WIFI conectivity. When they arrive at the garage, those device connect to my server and start to transmit data.
Because of hardware limitations, I implemented a high-level ACK to make the device aware that the last packet arrived successfully (please, don't argue about this - the data may be broken even if I got a correct TCP ACK). However, if I use my server software, that communicates with a remote database, to issue this ACK, I get very long delay (>200ms). If I use an exclusive software to do this task, I get small latencies (<1ms). So, I was imagining if I could just tell Windows to send those special packets first, as it seems to me that this package is getting delayed so the database ones can get delivered.
That's the motivation behind my question.
EDIT 2
As requested: this is legacy software and I'm using the legacy dclsockets140.bpl package and Delphi 2010 (14.0.3593.25826).
IMO it is very difficult to realize this. there are a lot of equipment and software involved. first of all, if you communicate between 2 different OS's you got a latency. second, soft and hard firewalls, antiviruses, everything is filtering/delaying your package.
you can try also to 'hack' the system(this involve some very good knowledge on how the frames/segments are packed/send,flow control,congestion,etc), either by altering it from code, either by using some tools like http://half-open.com/ or others.
In short, passing MSG_OOB flag to the send function marks the data as "urgent". Detailed discussion about the OOB in the context of Windows Sockets implementation specifics is available here.

Managing security on UDP socket

I am looking at developing my first multiplayer RTS game and I'm naturally going to be using UDP sockets for receiving/sending data.
One thing that I've been trying to figure out is how to protect these ports from being flooded by fake packets in a DoS attack. Normally a firewall would protect against flood attacks but I will need to allow packets on the ports that I'm using and will have to rely on my own software to reject bogus packets. What will stop people from sniffing my packets, observing any authentication or special structure I'm using and spamming me with similar packets? Source addresses can easily be changed to make detecting and banning offenders nearly impossible. Are there any widely accepted methods for protecting against these kind of attacks?
I know all about the differences between UDP and TCP and so please don't turn this into a lecture about that.
===================== EDIT =========================
I should add that I'm also trying to work out how to protect against someone 'hacking' the game and cheating by sending packets that I believe are coming from my game. Sequencing/sync numbers or id's could easily be faked. I could use an encryption but I am worried about how much this would slow the responses of my server and this wouldn't provide protection from DoS.
I know these are basic problems every programmer using a UDP socket must encounter, but for the life of me I cannot find any relevant documentation on methods for working around them!
Any direction would be appreciated!
The techniques you need would not be specific to UDP: you are looking for general message authentication to handle spoofing, rate throttling to handle DoS, and server-side state heuristics ("does this packet make sense?") to handle client hacks.
For handling DoS efficiently, you need layers of detection. First drop invalid source addresses without even looking at the contents. Put a session ID at the start of each packet with an ID that isn't assigned or doesn't match the right source. Next, keep track of the arrival rates per session. Start dropping from addresses that are coming in too fast. These techniques will block everything except someone who is able to sniff legitimate packets in real-time.
But a DoS attack based on real-time sniffing would be very rare and the rate of attack would be limited to the speed of a single source network. The only way to block packet sniffing is to use encryption and checksums, which is going to be a lot of work. Since this is your "first multiplayer RTS", I suggest doing everything short of encryption.
If you do decide to use encryption, AES-128 is relatively fast and very secure. Brian Gladman's reference Rijndael implementation is a good starting point if you really want to optimize, or there are plenty of AES libraries out there. Checksumming the clear-text data can be done with a simple CRC-16. But that's probably overkill for your likely attack vectors.
Most important of all: Never trust the client! Always keep track of everything server-side. If a packet arrives that seems bogus (like a unit moving Y units per second while it should only be able to mov X units per second) then simply drop the packet.
Also, if the number of packets per second grows to big, start dropping packets as well.
And don't use the UDP packets for "unimportant" things... In-game chat and similar things can go though normal TCP streams.

What's the most simple way to bi-directional communication between iOS and Mac OS X?

I'm considering using 2 NSStream for up/down channels. However, it looks somewhat complex. If you know simpler way (or recommendations) to do this, please let me know!
-- edit --
This is a kind of fast prototyping of internal/in-house remote controller. Low latency is best, but not required.
Binary formatted data, but not so heavy. Most of them are short control messages, and sometimes big chunks exceptionally.
On Cocoa / Cocoa touch. Platforms are limited to them.
Two peers are on LAN or at least WiFi network. So I can assume the connection is basically fast.
Compatibility to unknown hosts, high efficiency/performance/reliability and such things are no need to be considered. Just simplicity is most important now.
Without knowing acceptable latency, amount of data, type of data, and/or network topology (same LAN? routing over WAN?), it is impossible to say.
For most purposes, HTTP provides an awfully big and versatile hammer. And HTTP is supported by just about everything.
You want simple? Nothing is as simple as HTTP simply because it is a ubiquitous high level protocol that everyone and there brother has implemented anywhere from high level APIs (like NSHTTP*/NSURL*) down to less-than-$1 embedded chips.
If the devices you want to control have an option for an HTTP server, go for that. It'll be dead simple and debugging is much much easier when working with a high level protocol like HTTP.
At this point, it is hard not to buy a device with a LAN/wLAN port that doesn't also have an HTTP server in it (off the top of my head, my home theater receiver, solar controller, bbq, printer, security camera, PS3, VOIP box, and U-verse router all have HTTP servers).
However, the requirements on your non-Cocoa Touch side may dictate otherwise.

Resources