What are common ways of implementing web API request throttling/rate-limiting? - ruby-on-rails

What are common ways of implementing web API request throttling? Are there any libraries for common web frameworks (Rails, Django, Java, etc.) that give you this along with temporary banning?
A related question suggests that the rate limiting is done at the web server by limiting requests by IP, but that would mean that all requests are treated equally. It seems like throttling needs to be handled by the application because:
Some API calls may have different rate limits (e.g. an autocompletion API would have a higher limit than other calls)
Temporary banning by API key can't be handled by the web server
Requests coming from behind a proxy are treated the same (?)

Django-Piston has some neat throttling built in. Check out the source: http://bitbucket.org/jespern/django-piston/wiki/Home

You might also want to use tools like iptables (Linux) to hard-limit some of the incoming traffic. There are also third-party services like 3scale (http://www.3scale.net - disclaimer: I work for them :-) ) which let you keep track of and manage all the usage limits you want to apply to traffic on a per-user basis.
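To make the application-level option concrete, here is a minimal in-memory token bucket keyed by API key, sketched as Express middleware in TypeScript. The header name, limits, and routes are illustrative assumptions, not any particular library's API; a real deployment would usually keep the counters in Redis so all app servers agree.

    // A minimal in-memory token bucket keyed by API key (falling back to IP).
    import express from 'express';

    interface Bucket { tokens: number; last: number; }

    function rateLimit(ratePerSec: number, burst: number): express.RequestHandler {
      const buckets = new Map<string, Bucket>();
      return (req, res, next) => {
        const key = req.header('X-Api-Key') ?? req.ip ?? 'anonymous'; // assumed header
        const now = Date.now();
        const b = buckets.get(key) ?? { tokens: burst, last: now };
        // Refill tokens in proportion to the time elapsed since the last hit.
        b.tokens = Math.min(burst, b.tokens + ((now - b.last) / 1000) * ratePerSec);
        b.last = now;
        const allowed = b.tokens >= 1;
        if (allowed) b.tokens -= 1;
        buckets.set(key, b);
        if (!allowed) {
          res.status(429).send('Rate limit exceeded');
          return;
        }
        next();
      };
    }

    const app = express();
    // Different endpoints can get different limits, e.g. a chatty
    // autocompletion API versus everything else.
    app.use('/autocomplete', rateLimit(20, 40));
    app.use('/', rateLimit(2, 10));
    app.listen(3000);

The same bucket map also gives you a natural place to hang temporary bans by API key: once a key trips the limit repeatedly, keep refusing it for N minutes, which is exactly the kind of policy a web server limiting purely by IP can't express.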

Related

Comparison between service worker and AppCache

What are the core differences between Service Worker and AppCache? What are the pros and cons of each, and when should you prefer one over the other?
The primary difference is that AppCache is a high-level, declarative API, with which you specify the set of resources you'd like the browser to cache; whereas Service Worker is a low-level, imperative, event-driven API with which you write a script that can intercept fetch events and cache their responses along with doing other things (like displaying push notifications).
The pros and cons are largely a function of API design: theoretically, AppCache is easier to use, while having more limited use cases; whereas Service Worker is harder to use, but is more flexible.
Nevertheless, AppCache is considered hard to use in practice due to poor design (see Application Cache Is A Douchebag for a list of design issues). And it has been deprecated, so it is being removed from browsers (per Using the application cache).
Thus the only reason to prefer AppCache is to offline an app on browsers that don't yet support Service Worker, as Kenneth Ormandy recommends in Don’t Wait for ServiceWorker: Adding Offline Support with One-Line.
Compare Can I use Service Workers? to Can I use Offline web applications? to see the differences in browser support. But note that browsers that support Service Worker, like Chrome and Firefox, are removing support for AppCache, so you'll need to implement both to offline your app across all browsers that support either standard.
In addition to what Myk Melez said, one of the main benefits of Service Workers over Application Cache is that Application Cache only kicks in when the user is fully disconnected from the network, so it cannot manage situations like:
1- "Slow network": your connection signal is strong, but some external entity (the server, a route in between, etc.) is delaying the transmission to your specific application.
2- "Lie-fi": your phone shows it is connected to Wi-Fi or a cell network with a weak signal, so it seems to be connected when it actually is not.
A Service Worker is like middleware giving you control over the requests the browser makes: you can intercept a request and respond however you want, whether the user is connected or not. So you can implement the "offline first" principle.
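To make the declarative/imperative contrast concrete, here is a minimal cache-first Service Worker sketch in TypeScript; the file name, cache name, and URL list are illustrative, and you'd compile with the "webworker" lib so the event types resolve:

    // sw.ts: a minimal cache-first Service Worker.
    declare const self: ServiceWorkerGlobalScope;

    const CACHE = 'app-shell-v1';

    self.addEventListener('install', event => {
      // Precache the app shell, roughly what an AppCache manifest used to declare.
      event.waitUntil(
        caches.open(CACHE).then(cache => cache.addAll(['/', '/app.js', '/app.css']))
      );
    });

    self.addEventListener('fetch', event => {
      // Imperative control: intercept the request, answer from the cache
      // first, and fall back to the network.
      event.respondWith(
        caches.match(event.request).then(cached => cached ?? fetch(event.request))
      );
    });

The install handler is the part AppCache gave you declaratively; the fetch handler is the part AppCache could never express, which is why slow-network and lie-fi cases are only solvable with Service Workers.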

How is SIP scaled for high load?

Basically, I want to implement a VoIP system with SIP on a VPS. But it seems it would not be able to handle more than ~20 simultaneous calls (just bare SIP). What are the workarounds to this problem? Can the SIP server just be used as a database to tell the clients where to find their intended targets, like P2P? I am quite new to SIP. Additional info is appreciated.
Your VPS looks to be pretty low-key, and when you say it can't handle more than 20 CPS, that seems to indicate it has topped out on CPU. Correct me if that's not the case.
Options to scale SIP:
Off-the-shelf SIP load balancer: available as virtual appliances, hardware, open source, and every other flavor you want. It hides the farm of SIP servers behind it and can be managed to spread the load accordingly.
Unless the nature of the SIP server is defined, it can be difficult to understand the bottlenecks you face, and without that it's difficult to give a simple solution.
SIP scalability comes from delegating as much work to the endpoints and doing as little on the servers as possible.
What you describe is a "redirect server": it accepts and stores registrations from the endpoints (softphones, hardphones, etc.), responds with a 3xx redirect to incoming calls, and forgets about them immediately.
This is probably the most extreme example of server minimization. SIP is a very versatile protocol, it lets you set up your server infrastructure in many different ways with varying degree of control over calls. It lets you trade off features for performance.
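For illustration, a redirect server's reply to an INVITE is just a 3xx response whose Contact header tells the caller where to reach the callee directly; the server keeps no call state afterwards. A sketch of such a response (all addresses, tags, and branch values below are made up):

    SIP/2.0 302 Moved Temporarily
    Via: SIP/2.0/UDP alice-pc.example.com;branch=z9hG4bK776asdhds
    From: Alice <sip:alice@example.com>;tag=1928301774
    To: Bob <sip:bob@example.com>;tag=a6c85cf
    Call-ID: a84b4c76e66710@alice-pc.example.com
    CSeq: 314159 INVITE
    Contact: <sip:bob@192.0.2.4:5060>
    Content-Length: 0

The caller then re-sends its INVITE straight to the address in Contact, and the redirect server is already out of the loop.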
Even the flimsiest VPS should be able to handle the signalling for way more than 20 parallel calls even in full "stateful proxy" mode.
Just make sure media (the RTP streams) is not routed through your server. Set up STUN to help firewalled endpoints send media to each other directly.

Websocket scalability, broadcasting concerns

If you have a complex set of requirements with many users (and servers), how will your WebSocket infrastructure (server[s]) scale, especially with broadcasting?
Of course, broadcasting is not part of any WebSocket spec, but it's there even in basic chat examples (a.k.a. hello world for WebSockets).
A client-side solution (asking for new data) still seems more scalable than a server-side solution (broadcasting), even given WebSockets' low latency and relatively cheap (HTTP-headerless) nature.
Edit:
OK, just suppose you want to replace all your AJAX code with WebSocket implementations, which may mean a great many connections across many different contexts. This adds enormous complexity to your system if you want to keep track of every possible scenario for broadcasting.
Low-level (network/thread, etc.) implementation suggestions are also part of the problem, not the solution, because they mean you have to code a special server, unlike general-purpose HTTP servers.
Moreover, broadcasting brings a stateful nature to the table, which doesn't scale easily. Think about adding more servers and load balancing.
Scaling realtime web solutions can be a complex problem, but it is one that services like Pusher (who I work for) have solved, and one for which there are definitely established solutions if you self-host: the PubSub paradigm is well understood and has been solved many times. To solve the problem there does need to be some state (who is subscribing to what), and this paradigm is exactly what is used to broadcast in the types of scenarios you are talking about.
Realtime web technologies have been built with large numbers of simultaneous connections in mind, many from the ground up. If you wanted to create a scalable solution you would most likely use an existing realtime web server that supports WebSockets; in the same way that it's highly unlikely you would implement your own HTTP server, you are unlikely to want to implement your own WebSocket-capable server from scratch.
Dedicated realtime web servers also let you separate your application logic from your realtime communication mechanism (separation of concerns). Your application might need to maintain some state, but the realtime technology deals with managing subscriptions and connections. How communication between the application and the realtime web technology is achieved is up to you, but frequently message queues are used, and Redis in particular is very popular in this space.
HTTP polling may conceptually be easier to understand: you can maintain statelessness, and with each HTTP poll request you specify exactly what you are looking for. But it most definitely means you will need to start scaling much sooner (adding more resources to handle the load).
WebSocket polling is something I've not considered before, and I don't think I've seen it suggested anywhere either; the idea that the client should say "I'm ready for my next set of data, and here's what I want" is an interesting one. WebSockets have generally taken a leap away from the request/response paradigm, but there may be scenarios where the increased efficiency of WebSockets combined with request/response has some benefits. The SocketStream application framework might be worth a look here: after the initial application load, all communication is performed over WebSockets, which means that even basic request/response functionality uses WebSockets.
However, since we are talking about broadcasting data we need to go back to the PubSub paradigm where it makes much more sense to have active subscriptions and when new data is available that new data is distributed to those active subscriptions (pushed). All your application needs to know is if there are any active subscriptions or not in order to decide whether to publish the data or not. That problem has been solved.
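A minimal sketch of that subscription state over raw WebSockets, in TypeScript for Node with the 'ws' package (v8+); the JSON message shape ({ action, channel, data }) is an illustrative assumption:

    // A minimal pub/sub broadcaster: the only state kept is who subscribed to what.
    import { WebSocketServer, WebSocket } from 'ws';

    const wss = new WebSocketServer({ port: 8080 });
    const channels = new Map<string, Set<WebSocket>>();

    wss.on('connection', socket => {
      socket.on('message', raw => {
        const msg = JSON.parse(raw.toString());
        if (msg.action === 'subscribe') {
          if (!channels.has(msg.channel)) channels.set(msg.channel, new Set());
          channels.get(msg.channel)!.add(socket);
        } else if (msg.action === 'publish') {
          // Broadcast to every active subscription on this channel.
          for (const client of channels.get(msg.channel) ?? []) {
            if (client.readyState === WebSocket.OPEN) {
              client.send(JSON.stringify({ channel: msg.channel, data: msg.data }));
            }
          }
        }
      });
      socket.on('close', () => {
        for (const subs of channels.values()) subs.delete(socket);
      });
    });

Scaling this beyond one process is where the message queue mentioned above comes in: each WebSocket server node subscribes to a shared Redis channel and fans messages out to its local sockets.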
The idea of websockets is that you keep a persistent connection with each client. When there is new data that you want to send to every client, you already know who all the clients are so you should just send it.
It sounds like you want each client to constantly be sending requests to the server for new data. Why? It seems like that would waste everyone's bandwidth, and I don't know why you think it would be more scalable. Maybe you could add more detail to your question, like what kind of information you are broadcasting, how often, how many bytes, how many clients, etc.
Why not just consider an open websocket connection to be like a standing request from the client for more data?

Real-time ASP.NET MVC Web Application

I need to add a "real-time" element to my web application. Basically, I need to detect "changes" which are stored in a SQL Server table, and update various parts of the UI when a change has occurred.
I'm currently doing this by polling. I send an ajax request to the server every 3 seconds asking for any new changes - these are then returned and processed. It works, but I don't like it - it means that for each browser I'll be issuing these requests frequently, and the server will always be busy processing them. In short, it doesn't scale well.
Is there any clever alternative that avoids polling overhead?
Edit
In the interests of completeness, I'm updating this to mention the solution we eventually went with: SignalR. It's open source and comes from Microsoft. It has risen in popularity, and I can heartily recommend it, or indeed WebSync, which we also looked at.
Check out WebSync, a comet server designed for ASP.NET/IIS.
In particular, what I would do is use the SqlDependency class, and when you detect a change, use RequestHandler.Publish("/channel", data); to send the info out to the appropriate listening clients.
Should work pretty nicely.
Taken directly from the link referenced by Jakub:
Reverse AJAX with IIS/ASP.NET
PokeIn on CodePlex gives you enhanced JSON functionality to make your server-side objects available on the client side. Simply put, it is a Reverse Ajax library which makes it easy to call JavaScript functions from C#/VB.NET and to call C#/VB.NET functions from JavaScript. It has numerous features like event ordering, resource management, exception handling, marshaling, an Ajax upload control, Mono compatibility, WCF & .NET Remoting integration, and scalable server push.
There is a free community license option for this library, and the licensing is quite cost-effective in comparison to others.
I've actually used this, and the community edition is pretty special. Well worth a look, as this type of tech will begin to dominate the landscape in the coming months/years. The CodePlex site comes complete with ASP.NET MVC samples.
No matter what, you will always be limited by the fact that HTTP is (mostly) a one-way street. Unless you implement some sensible code on the client (i.e. to listen for incoming network connections), anything else will involve polling the server for updates, no matter what others tell you.
We had a similar requirement: very fast response times in one of our real-time web applications, serving about 400-500 clients per web server. The server needed to notify the clients within about 0.1 seconds (telephony & VoIP).
In the end we implemented an async handler. On each polling request we put the request to sleep for up to 5 seconds, waiting for a semaphore pulse signal before responding to the client. If the 5 seconds are up, we respond with a "no event" and the client posts the request again immediately. This resulted in very fast response times, and we never had any problems with up to 500 clients per machine; no idea how many more we could have added before the polling requests became a problem.
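For reference, the client half of that long-poll scheme is just a loop that re-issues the request as soon as each one completes; a sketch in TypeScript (the /poll endpoint and the response shape are illustrative assumptions):

    // The server parks each request for up to ~5 seconds; we re-issue
    // immediately whether or not any events arrived.
    async function pollLoop(handle: (events: unknown[]) => void): Promise<never> {
      while (true) {
        try {
          const res = await fetch('/poll');  // held open by the async handler
          const body = await res.json();     // assumed shape: { events: [...] }
          if (body.events.length > 0) handle(body.events);
        } catch {
          // Back off briefly on network errors so the loop doesn't spin.
          await new Promise(resolve => setTimeout(resolve, 1000));
        }
      }
    }

The effect is close to server push: a waiting request is almost always parked on the server, so an event can be delivered within the semaphore-pulse latency rather than a polling interval.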
Take a look at this article.
I've read somewhere (I don't remember where) that using this WCF feature makes the host process handle requests in a way that doesn't consume blocked threads.
Depending on the restrictions on your application, you can use Silverlight to make this connection. You don't need any UI for the Silverlight piece; you can use its sockets support to hold a connection that accepts server-side pushes of data.

Best practice for rate limiting users of a REST API?

I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? The implementation is Rails with MySQL.
This is all done with an outer web server which listens to the world (I recommend nginx or lighttpd).
Regarding rate limits, nginx is able to limit requests, e.g. to 50 req/minute per IP; everything over that gets a 503 page, which you can customize.
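A sketch of that in nginx configuration, using the limit_req_zone/limit_req directives (the zone name, rate, burst, and upstream name are illustrative):

    # Track clients by IP; allow ~50 requests/minute with a small burst.
    limit_req_zone $binary_remote_addr zone=api:10m rate=50r/m;

    server {
        location / {
            limit_req zone=api burst=10 nodelay;
            # Requests over the limit get 503 by default; error_page lets
            # you serve a custom body for it.
            error_page 503 /rate_limited.html;
            proxy_pass http://app_backend;
        }
    }
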
Regarding expected temporary downtime, in the Rails world this is done via a special maintenance.html page. There is some kind of automation that creates or symlinks that file when the Rails app servers go down. I'd recommend relying not on file presence, but on the actual availability of the app server.
But really you are able to start/stop services without losing any connections at all. That is, you can run a separate instance of the app server on a different UNIX socket/IP port and have the balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down the old instance and all clients are served by the new one only. No connection lost. Of course this scenario is not always possible; it depends on the type of change you introduced in the new version.
haproxy is a balancer-only solution. It can extremely efficiently balance requests to app servers in your farm.
For a quite big service you end up with something like:
api.domain resolving to round-robin across N balancers
each balancer proxying requests to M web servers for static content and P app servers for dynamic content. Oh well, your REST API doesn't have static files, does it?
For a quite small service (under 2K rps) all the balancing is done inside one or two web servers.
Good answers already. If you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net), which does rate limiting, analytics, etc. for APIs. It works using a plugin (see here for the Ruby API plugin) which hooks into the 3scale architecture. You can also use it via Varnish and have Varnish act as a rate-limiting proxy.
I'd recommend implementing the rate limits outside of your application, since otherwise high traffic will still have the effect of killing your app. One good solution is to implement them as part of your Apache proxy, with something like mod_evasive.
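Whichever layer enforces the limit, it also helps to tell clients that a refusal is deliberate, which covers the "gracefully bring the service down" part of the question. A minimal application-level sketch in TypeScript with Express (the flag, route, and values are illustrative assumptions):

    import express from 'express';

    const app = express();
    let maintenanceMode = false; // toggled by your ops tooling in real life

    app.use((req, res, next) => {
      if (maintenanceMode) {
        // 503 plus Retry-After tells well-behaved clients when to come back.
        res.set('Retry-After', '120'); // seconds; illustrative value
        res.status(503).json({ error: 'Service temporarily offline' });
        return;
      }
      next();
    });
    app.listen(3000);
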
