How is SIP scaled for high load?

How is SIP scaled for high load? - scalability

Basically, I want to implement a VoIP system with sip in a vps server. But it seems that it would not be able to handle more than ~20 simultaneous calls(just bare sip). What are the workarounds to this problem? Can the sip server be just used as a database to tell the clients where to find their intended targets..? Like p2p? I am quite new to sip. Additional info is appreciated.

Your VPS server looks to pretty low-key and when you say it cant handle more than 20 Cps that seems to indicate it topped out on CPU. Correct me if thats not the case.
Options to Scale SIP
Of the Shelf SIP Load balancer - Available in Virtual / Hardware / Opensource and every flavor that you want. It hides a farm of SIP Servers that you have and it can be managed to spread the load accordingly.
Unless the nature of SIP server is defined, it can be difficult to understand the bottlenecks you face and without that its difficult to give a simple solution.

SIP scalability comes from delegating as much work to the endpoints and doing as little on the servers as possible.
What you describe is a "redirect server": it accepts and stores registrations from the endpoints (softphones, hardphones, etc), and responds with "3xx redirect" to incoming calls and forgets about them immediately.
This is probably the most extreme example of server minimization. SIP is a very versatile protocol, it lets you set up your server infrastructure in many different ways with varying degree of control over calls. It lets you trade off features for performance.
Even the flimsiest VPS should be able to handle the signalling for way more than 20 parallel calls even in full "stateful proxy" mode.
Just make sure media (the RTP streams) is not routed through your server. Set up STUN to help firewalled endpoints send media to each other directly.

Related

So many persistent connections to the server. Is that the right way?

I would like to understand networking services with a large user base a bit better so that I know how to approach a project I am busy with.
The following statements that I make may be incorrect but they still lead to the question that I want to ask...
Please consider Skype and TeamViewer clients. It seems that both keep persistent network connections open to their respective servers. They use these persistent connections to initiate additional connections. Some of these connections are created by means of Hole Punching if the clients are behind NATs. They are then used for direct Peer-to-Peer communications.
Now according to http://expandedramblings.com/index.php/skype-statistics/ there are 300 million users using Skype and 4.9 million daily active users. I would assume that most of that 4.9 million users will most probably have their client apps running most of the day. That is a lot of connections to the Skype servers that are open at any given time.
So to my question; Is this feasible or at least acceptable? I mean, wouldn't it be better to not have a network connection open while idle and aspecially when there are so many connections open to the servers at once? The only reason I can think is that it would be the only way to properly do Hole Punching. Techically, how is this achieved on the server side?

Is this feasible or at least acceptable?
Feasible it certainly is, you mention already two popular apps that do it, so it is very doable in practice.
As for acceptable, to start no internet authority (e.g. IETF) has ever said it is unacceptable to have long-lived connections even with low traffic.
Furthermore, the only components for which this matters are network elements that keep connection/flow state. These are for sure the endpoints and so-called middleboxes like NAT and firewalls. For the client this is only one connection, the server is usually fine tuned by the application developers (who made this choice) themselves, so for these it is acceptable. For middleboxes it's simple: they have no choice, they're designed to just work with all kind of flows, including long-lived persistent connections.
I mean, wouldn't it be better to not have a network connection open while idle and aspecially when there are so many connections open to the servers at once?
Not at all. First of all, that could be 'much' slower as you'd need to set up a full connection before each control-plane call. This is especially noticeable if your RTT is big or if the servers do some complicated connection proxying/redirection for load-balancing/localization purposes.
Next to that this would historically make incoming calls difficult for a huge amount of users. Many ISP's block/blocked unknown incoming connections from the internet by means of a firewall. Similar, if you are behind a NAT device that does not support UPnP or PCP you can't open a port to listen on for your public IP address. So you need it even aside from hole-punching.
The only reason I can think is that it would be the only way to
properly do Hole Punching. Techically, how is this achieved on the
server side?
Technically you can't do proper hole-punching as soon as the NAT devices maintain a full <src-ip,src-port,dest-ip,dest-port,protocol> (classical 5-tuple) flow match. Then the best you can do with 'hole punching' is set up a proxy between peers.
What hole-punching relies on is that the NAT flow lookup is only looking at <src-ip,src-port,protocol> upstream and <dest-ip,dest-port,protocol> downstream to do the translation. In that case both clients just set up a connection to the server, their ip and port gets translated and the server passes this to the other client. The other client can now start sending packets to that translated <ip,port> combination which should work because NAT ignores the server's ip/port. But even if the particular NAT would work like this, some security device (e.g. stateful firewall) might detect session hi-jacking and drop this anyway.
Nowadays you rather use UPnP to open up a port to listen on your public IP which is much easier if supported.

ZeroMQ REQ/REP the other way round

I have a strange szenario:
Webserver / Appserver (Java) sends requests to many different satellite systems (on customers site). Only satellite systems can initiate connection due to firewall rules.
The model I think should be something like REQ/REP, but here the REQuester have to bind and the REPlyer would have to connect.
Is this possible and a stable architecture?
Are there better solutions? (We first had WebSockets in mind...)
Remark: we don't have to use Java on both ends. To be precise on customers site we have Delphi, but we could bridge it somehow.

The model I think should be something like REQ/REP, but here the
REQuester have to bind and the REPlyer would have to connect.
This will be problematic. When the server initiates the connection, it must be aware of all peers and their bind address. Not a big deal for a handful of peers, but for many peers changing constantly, it's a mess.
Only satellite systems can initiate connection due to firewall rules.
If that's the case, your mileage will vary with WebSockets; google around, lots of info on this.
Are there better solutions?
Well, with ZeroMq, one solution that comes to mind to support client request initiation is this:
Server binds with ROUTER
Clients connect with DEALER.
This approach offers bi-directional request/reply, does not block (asynchronous), and eliminates the client-side bind problem mentioned in your question. Here, the server binds, and either side can initiate the conversation.
I recommend reading this section in the guide, it covers extended async request/reply and message enveloping, important when using ROUTER/DEALER sockets.

Large number of WebSocket connections

I am writing an application that keeps track of content pushed around between users of a certain task. I am thinking of using WebSockets to send down new content as they are available to all users who are currently using the app for that given task.
I am writing this on Rails and the client side app is on iOS (probably going to be in Android too). I'm afraid that this WebSocket solution might not scale well. I am after some advice and things to consider while making the decision to go with WebSockets vs. some kind of polling solution.
Would Ruby on Rails servers (like Heroku) support large number of WebSockets open at the same time? Let's say a million connections for argument sake. Any material anyone can provide me of such stuff?
Would it cost a lot more on server hosting if I architect it this way?
Is it even possible to maintain millions of WebSockets simultaneously? I feel like this may not be the best design decision.
This is my first try at a proper Rails API. Any advice is greatly appreciated. Thx.

Million connections over WebSockets, using Ruby, I can't see its real if you not using clustering to spread connections between different instances to handle all the data processing.
The problem here is serializing and deserializing data.
As well you have to research of how often you will need to pull data to client from server, and if it worth to have just periodical checks using AJAX, then handling connection for whole time. Because if you do handle connection and then you not using it - it is waste of resources. WebSockets are build on top of TCP layer, and all connections are not "cheap" as well going through for OS and asking them for data available again is not the simple process, with millions connections it is something really almost impossible without using most advanced technologies in the world.
I head that Erlang is able to handle millions of connections, but I don't have details over it. As well connection is one thing, another is processing data and interaction between connections - this you might want to check, because if you have heavy processing algorithms, then you definitely need to look into horizontal scaling options over clustering solutions.

If you are implementing chat, use websockets.
If you are implementing 1 way messages in realtime use server sent events.
If you are implementing 1 way messages sent every few hours or so, use APNS.
The saying goes phone in hand, use websockets / server sent events.
Phone in pocket, use APNS.
APNS will alleviate wifi dips, tcp/ip socket hangs and many other issues. Really useful. There is the chance that it may take a little time to get through. But then again, there is the chance that websockets will take
Recent versions of iOS let you send APNS to the client without a popup message to the client so it can ask the server for more information. That along with some backgrounding implementations really improves things.
If possible, do not implement totally anonymous clients. It is very tricky to detect if a client reinstalls the app. So you'll end up sending duplicates to the client. Need to take that into account.
APNS looks trivial to implement in ruby, but I'd suggest avoiding the urge and going to using an existing gem/service out there that supports both google and apple. It is much trickier to implement than it may seem at first.
If you decide to stick with websockets, it may make sense to just leverage websockets in nginx like https://github.com/wandenberg/nginx-push-stream-module
ASIDE:
Using SMS where speed is critical is very expensive. $1/month per phone number only sends a max rate of 1 message per second. So sending 100 messages per second = $100/month plus message fees. Do note that 100 messages at a rate of 50 messages/second = $50/month. But if you want to send 1k messages, that takes 20 seconds.
Good luck

Websocket scalability, broadcasting concerns

If you have a complex requirement set with many users(&servers) how will your websocket infrastructure (server[s]) will scale, especially with broadcasting?
Of course, broadcasting is not part of the any websocket spec but it's there even in basic chat examples (a.k.a. hello world for websocket).
Client side (asking for new data) solution still seems more scalable than server side (broadcasting) solution with websockets' low latency and relatively cheap (http headerless) nature.
Edit:
OK, just think that you want to replace all your ajax code with websocket implementations which may mean that so many connections within so many different contexts. This adds enormous complexity to your system if you want to keep track of every possible scenario for broadcasting.
Low (network/thread etc) level implementation suggestions are also part of the problem not the solution, because this means you have to code a special server unlike general http servers.
Moreover, broadcasting brings some sort of stateful nature to the table which can't easily scale. Think about adding more servers and load balancing.

Scaling realtime web solutions can be a complex problem but one that services like Pusher (who I work for) have solved, and one that there are most definitely solutions defined for self hosted realtime web solutions - the PubSub paradigm is well understood and has been solved many times and in order to solve the problem there needs to be some state (who is subscribing to what). This paradigm is used in broadcasting the the types of scenarios that you are talking about.
Realtime web technologies have been built with large amounts of simultaneous connections in mind - many from the ground up. If you wanted to create a scalable solution you would most likely use an existing realtime web server that supports WebSockets, in the same way that it's highly unlikely that you would decide to implement your own HTTP Server you are unlikely to want to implement your own server which supports WebSockets from scratch.
Dedicated Realtime web servers also let you separate your application logic from your realtime communication mechanism (separation of concerns). Your application might need to maintain some state but the realtime technology deals with managing subscriptions and connections. How communication between the application and the realtime web technology is achieved is up to you but frequently messages queues are used and specifically redis is very popular in this space.
HTTP polling may conceptually be easier to understand - you can maintain statelessness and with each HTTP poll request you specify exactly what you are looking for. But it most definitely means that you will need to start scaling much sooner (adding more resource to handle the load).
WebSocket polling is something I've not considered before and I don't think I've seen it suggested anywhere before either; the idea that the client should say "I'm ready for my next set of data and here's what I want" is an interesting one. WebSockets have generally taken a leap away from the request/response paradigm but there may be scenarios where the increased efficiency of WebSockets and request/response using them may have some benefits. The SocketStream application framework might be worth a look as it might be relevant; after the initial application load all communication is performed over WebSockets which means that event basic request/response functionality uses WebSockets.
However, since we are talking about broadcasting data we need to go back to the PubSub paradigm where it makes much more sense to have active subscriptions and when new data is available that new data is distributed to those active subscriptions (pushed). All your application needs to know is if there are any active subscriptions or not in order to decide whether to publish the data or not. That problem has been solved.

The idea of websockets is that you keep a persistent connection with each client. When there is new data that you want to send to every client, you already know who all the clients are so you should just send it.
It sound like you want each client to constantly be sending requests to the server for new data. Why? It seems like that would waste everyone's bandwidth and I don't know why you think it will be more scalable. Maybe you could add more detail to your question like what kind of information you are broadcasting, how often, how many bytes, how many clients, etc.
Why not just consider an open websocket connection to be like a standing request from the client for more data?

Best practice for rate limiting users of a REST API?

I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? Implementation is Rails with mysql.

This is all done with outer webserver, which listens to the world (i recommend nginx or lighttpd).
Regarding rate limits, nginx is able to limit, i.e. 50 req/minute per each IP, all over get 503 page, which you can customize.
Regarding expected temporary down, in rails world this is done via special maintainance.html page. There is some kind of automation that creates or symlinks that file when rails app servers go down. I'd recommend relying not on file presence, but on actual availability of app server.
But really you are able to start/stop services without losing any connections at all. I.e. you can run separate instance of app server on different UNIX socket/IP port and have balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down old instance and all clients are served with only new one. No connection lost. Of course this scenario is not always possible, depends on type of change you introduced in new version.
haproxy is a balancer-only solution. It can extremely efficiently balance requests to app servers in your farm.
For quite big service you end-up with something like:
api.domain resolving to round-robin N balancers
each balancer proxies requests to M webservers for static and P app servers for dynamic content. Oh well your REST API don't have static files, does it?
For quite small service (under 2K rps) all balancing is done inside one-two webservers.

Good answers already - if you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net) which does rate limiting, analytics etc. for APIs. It works using a plugin (see here for the ruby api plugin) which hooks into the 3scale architecture. You can also use it via varnish and have varnish act as a rate limiting proxy.

I'd recommend implementing the rate limits outside of your application since otherwise the high traffic will still have the effect of killing your app. One good solution is to implement it as part of your apache proxy, with something like mod_evasive

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart