Here is my task.
Every second, a backgrounder task should generate JSON based on some data. The operation is not CPU-intensive (mostly network) and it produces 5-10KB of JSON content. Each run takes about 200ms.
I also have about 1000 clients asking for this content once every few seconds; let's say it's about 200 requests/sec.
The server should just return the current, up-to-date JSON.
I currently have a Rails 4 + nginx + Passenger + Debian server doing other jobs related to this work.
Being a student, I want to build my server in the most cost-effective way, with the ability to scale easily in these ways:
Adding a few more backgrounder jobs that generate more JSON documents
Increasing the number of requests to 10,000 per second
I currently have a Linode 2048 SSD with 2 CPU cores. My questions are:
What gem/solution should I use for my backgrounder tasks (they are currently written in Ruby)?
How do I effectively store the current JSON and pass it from the backgrounder(s) to Rails/nginx?
How do I serve the JSON as fast as possible?
you mentioned "Server should just output current actual json", I guess the JSON generation may not become a bottleneck as you can cache it to Memcache and serve Memcache directly:
1) Periodically background process -> dump data to Memcache (even gzip to speed it up)
2) User -> Nginx -> Memcache
See Nginx's built-in memcached module: http://nginx.org/en/docs/http/ngx_http_memcached_module.html
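As a sketch of that second step (minimal example; the key name, Memcached address, and fallback location are assumptions, not part of the question's setup):

location = /current.json {
    set            $memcached_key "current.json";  # must match the key the backgrounder writes
    memcached_pass 127.0.0.1:11211;
    default_type   application/json;
    error_page     404 502 504 = @rails;           # fall back if the key is missing or expired
}

location @rails {
    passenger_enabled on;  # hand off to the Rails app as usual
}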
The bottleneck is any backend with a blocking mechanism: the GIL, IO locks, etc. Try to avoid that class of problem by splitting the request/response cycle with an intermediate Memcached data point.
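The producer side might look like this; a minimal sketch assuming the dalli gem, where fetch_data is a hypothetical stand-in for the actual (network-bound) data gathering:

require 'dalli'
require 'json'

cache = Dalli::Client.new('127.0.0.1:11211')

loop do
  payload = JSON.generate(fetch_data)                 # ~200ms, mostly network
  cache.set('current.json', payload, 10, raw: true)   # raw: true so nginx can serve the bytes as-is
  sleep 1                                             # regenerate every second
end

Note the raw: true option: without it Dalli marshals the value, and nginx would serve the marshaled bytes rather than the JSON.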
Related
I understand a benefit of Puma over other Rails web servers is how it handles slow clients. While a Puma server receives and downloads a (potentially slow) request, it can still receive and download other requests that might download quicker and be passed on to a worker for processing before the slow request has finished being received.
But I can't find any information about what, if any, limits there are to this.
Can Puma download any number of requests at the same time? If 1000 slow requests hit it at the same time, would the 1001st request reach a Puma worker first assuming it wasn't a slow request?
I guess what I'm interested in generally is what impact multiple slow requests have on other requests, including each other - because I'm working on an application that's likely to involve plenty of 'slow requests' (image uploads from phones via 3G).
This great article by Nate Berkopec helps explain in principle how Puma helps with slow clients: "In clustered mode, then, Puma can deal with slow requests (thanks to a separate master process whose responsibility it is to download requests and pass them on)..." Any more light anyone can shed would be very welcome.
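For reference, clustered mode is what you get when workers is set in Puma's config file; a minimal sketch (the numbers are illustrative, not recommendations):

# config/puma.rb
workers 2        # clustered mode: a master process forks worker processes
threads 1, 16    # min/max threads per worker
preload_app!     # load the app before forking, to share memory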
There are a number of considerations, such as the IO polling system, memory and concurrency concerns.
IO polling system
Edit (Sep. 9th, 2020): By now the Puma server runs on the nio4r gem and should no longer be subject to the limits of the select system call (where file descriptor values are limited to 1023).
As far as I know, Puma uses the select system call (unlike iodine or Passenger, which also protect you from slow clients but use kqueue or epoll).
The select system call is limited on most systems (usually up to 1024 clients / maxfd). I would assume that would create a limit.
However, I know Puma is working on replacing the select system call with something both portable and effective (such as leveraging the nio4r gem).
I don't know if that was already achieved, but it will break this limit and probably improve performance.
Memory
Slow clients still consume memory as they slowly fill the buffer with their header data or slowly download the buffered data that was sent (keeping the buffer in the memory until the download is complete).
Memory limitations will always constrain slow-client handling.
Some limitations can be alleviated, such as by sending static files using X-Sendfile (supported by iodine, and by Puma or Passenger when running under nginx)... but this isn't really something you can fix.
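For instance, Rails can delegate file delivery to the front-end server with a single setting (a sketch using the stock Rails option; the header value depends on which server sits in front):

# config/environments/production.rb
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'  # nginx
# config.action_dispatch.x_sendfile_header = 'X-Sendfile'      # Apache mod_xsendfile

With this set, send_file responds with just a header and nginx streams the file, so a slow download never occupies a Ruby thread.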
Concurrency
Puma handles slow clients within Ruby's GIL (global interpreter lock). This means that no other threads/instructions can execute while Puma is handling slow clients.
This is often a non-issue, but a large enough number of slow clients increases the cost of context switching and system calls. This could (potentially) slow a server down considerably.
Both Passenger and iodine perform slow-client buffering outside of the GIL, allowing these system calls to be truly concurrent (when multiple CPU cores are available).
This mitigates the issue but doesn't fully solve it.
Conclusion and Caveats
The biggest issue is usually the IO polling system. A solution for this is on Puma's roadmap (maybe already implemented, I'm not sure).
The other issues (memory limits and concurrency limits) are comparatively less important, but they can't be mitigated without using language extensions (the iodine server is written in C, and Passenger in C++).
Since Puma doesn't (currently) require any language extensions (except for its integrated HTTP parsers in C and Java), these issues remain.
I should point out that I'm the author for the iodine HTTP/Websocket server, so I'm somewhat biased.
I'm aware of the hugely trafficked sites built in Django or Ruby on Rails. I'm considering one of these frameworks for an application that will be deployed on ONE box and used internally by a company. I'm a noob and I'm wondering how many concurrent users I can support with a response time of under 2 seconds.
Example box spec: Core i5, 8GB RAM, 2.3GHz. Apache webserver. Postgres DB.
App overview: Simple CRUD operations. Small models of 10-20 fields, probably <1KB of data per record. Relational database schema, around 20 tables.
Example usage scenario: 100 users making a CRUD request every 5 seconds (= 20 requests per second). At the same time, 2 users are uploading a video and one background search process is running to identify potentially related data entries.
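(As a back-of-envelope check on that scenario, with every number an assumption rather than a measurement:

app_processes  = 4      # e.g. 4 single-threaded workers on the quad-core i5
service_time_s = 0.05   # assume ~50ms per simple CRUD request
max_rps = app_processes / service_time_s
puts max_rps            # => 80.0, comfortably above the stated 20 req/s

So on paper the plain CRUD load is light; my real questions are about the uploads and background work below.)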
1) Would a video upload process run outside the GIL once an initial request set it off uploading?
2) For the above system built in Django with the full stack deployed on the box described above, with the usage figures above, should I expect response times <2s? If so what would you estimate my maximum hits per second could be for response time <2s?
3) For the same scenario with Ruby on Rails, could I expect response times of <2s?
4) Specifically with regard to response times at the above usage levels, would it be significantly better built in Java (Play framework), given the JVM's support for concurrent processing?
Many thanks
Duncan
I am interested in ways to optimize my Unicorn setup for my Ruby on Rails 3.1.3 app. I'm currently spawning 14 worker processes on a High-CPU Extra Large instance, since my application appears to be CPU-bound during load tests. At about 20 requests per second, replaying requests in a simulated load test, all 8 cores on my instance max out and the box's load spikes to 7-8. Each Unicorn instance is using about 56-60% CPU.
I'm curious what ways there are to optimize this. I'd like to be able to funnel more requests per second onto an instance of this size. Memory is completely fine, as is all other I/O. CPU is getting tanked during my tests.
If you are CPU-bound, you want to run no more Unicorn processes than you have cores; otherwise you overload the system and slow down the scheduler. You can test this on a dev box using ab (ApacheBench); you will notice that 2 Unicorns outperform 20 (the exact number depends on cores, but the concept holds true).
The exception to this rule is if you're IO-bound, in which case add as many Unicorns as memory can hold.
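For example, you might hit the same endpoint at different concurrency levels and compare the requests-per-second figures ab reports (the URL is a placeholder):

ab -n 1000 -c 2  http://localhost:8080/
ab -n 1000 -c 20 http://localhost:8080/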
A good performance trick is to route IO-bound requests to a different app server hosting many Unicorns: for example, requests that run a slow SQL query or wait on an external service such as a credit card transaction. If you're using nginx, define an upstream server for the IO-bound requests and forward those URLs to a box with 40 Unicorns; forward CPU-bound or really fast requests to a box with 8 Unicorns (you stated you have 8 cores, but on AWS you might want to try 4-6, as its hypervisor-managed schedulers are already very busy).
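A sketch of that split (the pool addresses and the /payments/ path are illustrative assumptions):

upstream fast_pool { server 10.0.0.1:8080; }   # 8 Unicorns: CPU-bound work
upstream slow_pool { server 10.0.0.2:8080; }   # 40 Unicorns: IO-bound work

server {
    listen 80;
    location /payments/ { proxy_pass http://slow_pool; }  # slow external calls
    location /          { proxy_pass http://fast_pool; }
}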
Also, I'm not sure you can count on AWS giving you reliable CPU usage numbers, as you're getting a percentage of an obscure percentage.
First off, you probably don't want instances at 45-60% CPU; in that case, a traffic spike will choke all of your instances.
Next, 14 Unicorn instances seems like a lot. Unicorn does not use threading; rather, each process runs a single thread, and the master only passes a request to a worker that is able to handle it. Because of this, the number of cores isn't a metric you should use to measure performance with Unicorn.
A more conservative setup might use 4 or so Unicorn processes per instance, responding to maybe 5-8 requests per second each. Then adjust the number of instances until your CPU use is around 35%. This will ensure stability under the stressful '20 requests per second' scenario.
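In Unicorn terms that conservative setup is just (a sketch; tune the numbers to your app):

# config/unicorn.rb
worker_processes 4      # roughly one per core for CPU-bound work
timeout 30              # restart workers stuck for more than 30s
preload_app true        # load the app once in the master, then fork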
Lastly, you can get more gritty stats and details by using God.
For a high CPU extra large instance, 20 requests per second is very low. It is likely there is an issue with the code. A unicorn-specific problem seems less likely. If you are in doubt, you could try a different app server and confirm it still happens.
In this scenario, questions I'd be thinking about...
1 - Are you doing something CPU-intensive in code, maybe something that should really be in the database? For example, if you are bringing back a large recordset and looping through it in Ruby/Rails to sort it or do some other operation, that would explain a CPU bottleneck at this level, as opposed to within the database. The recommendation in this case is to revamp the query to do more work and take the burden off of Rails. For example, sorting the result set in your controller rather than through SQL would cause an issue like this.
2 - Are you doing anything unusual compared to a vanilla crud app, like accessing a shared resource, or anything where contention could be an issue?
3 - Do you have any loops that might burn CPU, especially if there was contention for a resource?
4 - Try unhooking various parts of the controller logic in question. For example, how well does it scale if you hack your code to just return a static hello-world response instead (see the stub below)? I bet Unicorn will suddenly be blazingly fast. Then add parts of your code back in until you discover the source of the slowness.
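A hypothetical stub for that bisection test (Rails 3 syntax, since the question mentions 3.1.3; the action name is a placeholder):

def show
  render :text => 'hello world'   # skip all real work to establish a baseline
end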
I'd like to mock large (>100MB) and slow file downloads locally with a Ruby service (Rails, Sinatra, Rack, or something else).
After starting the server and requesting something like http://localhost:3000/large_file.rar, I'd like to slooowly download the file (for testing purposes).
My question is: how do I throttle the local webserver to a certain maximum speed? If the file is served locally, it will download very fast by default.
You should use curl for this, which allows you to specify a maximum transfer speed with the --limit-rate option. The following would download a file at about 10KB per second:
curl --limit-rate 10K http://localhost:3000/large_file.rar
From the documentation:
The given speed is measured in bytes/second, unless a suffix is appended. Appending 'k' or 'K' will count the number as kilobytes, 'm' or 'M' makes it megabytes, while 'g' or 'G' makes it gigabytes. Examples: 200K, 3m and 1G.
The given rate is the average speed counted during the entire transfer. It means that curl might use higher transfer speeds in short bursts, but over time it uses no more than the given rate.
More examples here (search for "speed limit"): http://www.cs.sunysb.edu/documentation/curl/index.html
We expect to serve a few thousand uploads within 2 or 3 minutes. Most of the uploads will be about 20-200 MB.
Technically, I think uploads have little to do with Rails itself and more to do with the web server (Apache/Nginx), so as long as the server can handle concurrent requests, there's not much work for the Rails app to do (except to move the file to proper storage and create a database record to track it).
Is my assumption right? Normally, how many concurrent uploads can a single Rails app process be expected to handle? (Say the Rails app takes 20ms for all the calculation, moving the file and creating the database record, but the connection must be kept alive for 1 minute so that the file can be successfully transferred.)
Not really, but close. A single Rails application instance can only handle a single request at a time, but it's easy to run a pool of these instances using nginx with Passenger, or Mongrels behind a load balancer.
You should create a load test to confirm any of your assumptions.
I would use curl to simulate 10/100/1000 users uploading a few megabytes each, using multiple processes, and tune the upload speed to simulate slow clients and see how that affects your performance. Measure the response times for 10 concurrent requests, then record and observe the results.
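A hypothetical driver for such a test (the URL, form field, and file are placeholders; --limit-rate throttles uploads as well as downloads):

for i in $(seq 1 100); do
  curl --limit-rate 50K -F "file=@sample.bin" http://localhost:3000/uploads &
done
wait   # let all background uploads finish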
You could use the nginx upload module to bypass Rails, if you can and if that helps. Always test your assumptions.