Phusion Passenger + Rails App: website is under heavy load

I have a Rails app running on the Passenger web server. From time to time I get a "queue full" error from Passenger.
I have no idea what could be causing this, because there are no long-running requests in the production log, nor can I see anything like that in the New Relic monitor.
There are also no memory leaks; memory and CPU consumption are around their usual averages. Any ideas on how to debug this situation?
I know about the passenger-status --show=requests flag, but it is only somewhat helpful because I don't know which requests it includes: only queued ones, or running (possibly hung) ones as well?
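One way to narrow this down is to log slow requests yourself, so the actions that tie up Passenger workers show up in production.log. Below is a minimal sketch of a Rack middleware that does this; the file name, the 5-second threshold and the log format are all assumptions to adapt, nothing Passenger-specific:

# lib/slow_request_logger.rb (hypothetical file name)
# Logs any request that takes longer than THRESHOLD seconds.
class SlowRequestLogger
  THRESHOLD = 5.0 # seconds; an assumed cut-off, tune it to your app

  def initialize(app, logger = Rails.logger)
    @app    = app
    @logger = logger
  end

  def call(env)
    started = Time.now
    @app.call(env)
  ensure
    elapsed = Time.now - started
    if elapsed > THRESHOLD
      @logger.warn "SLOW REQUEST #{env['REQUEST_METHOD']} #{env['PATH_INFO']} took #{elapsed.round(2)}s"
    end
  end
end

# In config/application.rb:
#   require_relative '../lib/slow_request_logger'
#   config.middleware.use SlowRequestLogger

Requests that never finish at all won't be logged this way, but anything slow enough to pile up a queue usually is.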

Related

Puma crashing after running at 100% for a few seconds

I've deployed my RoR app on an AWS EC2 instance using Nginx and Puma. I have a page in the app that runs lots of queries in loops (I know that's bad, but we'll be improving it in time).
The problem is that this page returns a 502 Bad Gateway error and crashes the Puma server. I investigated the processes on the server, and the Ruby process runs at 100% CPU for a few seconds, after which Puma crashes.
I'm unsure why this is happening, as the same page with the same data loads on my local PC in 6-7 seconds.
Is this some limit AWS puts on processes?
Is this something on the Puma side?
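Separately from the crash, the "lots of queries in loops" pattern is usually worth fixing first, since it multiplies both response time and memory use. A minimal eager-loading sketch, using hypothetical Post/Comment models rather than the asker's actual schema:

# N+1 pattern: one query for the posts, then one query per post.
Post.all.each do |post|
  post.comments.each { |comment| puts comment.body }
end

# Eager loading: two queries total, no matter how many posts there are.
Post.includes(:comments).each do |post|
  post.comments.each { |comment| puts comment.body }
end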
Without further information, it's not possible to say exactly what's causing the problem.
As an "educated guess", I'd say it could be an out-of-memory issue.
I found the issue after multiple hours of debugging. It was a very edge-case scenario that put the server into an infinite loop, causing memory to overflow.
I used top -i to watch the memory usage grow.
Thank you all for the suggestions and responses.

nginx + Passenger: "This website is under heavy load" error and no queue is processed

I am getting the error message below. I understand this is because of queued requests, but it always stays like this (no queue is ever processed; it is stuck). I'm using Passenger 4 and nginx, and I had to restart nginx to get it working again.
This website is under heavy load
We're sorry, too many people are accessing this website at the same time. We're working on this problem. Please try again later.

Unicorn optimized with unicorn-worker-killer gives worse performance

I used the 'unicorn-worker-killer' gem with some additional modifications for the Ruby GC from here: http://blog.newrelic.com/2013/05/28/unicorn-rawk-kick-gc-out-of-the-band/
But after following the instructions both there and at https://github.com/kzk/unicorn-worker-killer and deploying to the production server, my application's performance degraded gradually:
App server response time went from 350 ms average to 1100 ms
Page load time went from 6 s average to 13 s
My Heroku setup is:
6 web dynos with 1 GB of memory each
1 worker dyno at 1x speed
3 Unicorn worker processes
My database connection limit is 40, and the DB pool is set to 2 on Heroku.
Please help me figure out how to optimize page load time and app server response time.
Any ideas?
There isn't enough information here to diagnose your issue.
I would recommend that you don't use the unicorn-worker-killer gem. You're better off using a normal application server like Unicorn configured for Heroku, unless you have a specific problem (hung workers or memory leaks).
If you want to diagnose your performance problems and load times, the best thing to do is use a service like New Relic (available as a Heroku add-on) that will let you measure your request times and drill down into what specifically is a bottleneck, and then fix that.
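As a reference point, a plain Heroku-style config/unicorn.rb without any worker-killer looks roughly like this; the worker count and timeout are assumptions to tune for your dyno size and traffic:

# config/unicorn.rb -- a plain Unicorn setup with no worker-killer gem
# (the Procfile would start it with: bundle exec unicorn -p $PORT -c ./config/unicorn.rb)
worker_processes Integer(ENV['WEB_CONCURRENCY'] || 3)
timeout 30            # Heroku's router gives up on a request after about 30s anyway
preload_app false     # simplest option; preload_app true needs reconnection hooks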

Identify Bottleneck with Passenger

We are running a big server (12 threads, 6 cores, 64 GB RAM, 2 SSDs in RAID-0) for our Rails app deployed with nginx/Passenger.
Unfortunately, pages are taking forever to load, something between 10 and 40 seconds. However, the server is under a really light load, with a load average of 0.61 0.56 0.53. Memory is also reported oddly: free -ml reports 57 GB (of 64 GB) used, whereas htop reports only 4 GB (of 64 GB).
We have checked our production log, and Rails requests take something like 100-200 ms to complete, so almost nothing.
How can we identify the bottleneck?
This question is fairly vague, but I'll see if I can give you some pointers.
My first guess is that your app is spending a lot of time doing database-related work; see below for my advice.
As for the odd memory usage, are you looking at the correct part of the free -ml output? Just to clarify, you want to look at the -/+ buffers/cache: line to get the accurate figure.
You might also check whether any of your Passenger workers are hanging, as that is a fairly common issue with Passenger. You can do this by running strace -p $pid on your Passenger workers. If one is hanging, it will show an obvious lack of activity.
As for troubleshooting response time within Rails itself, I would highly suggest looking into New Relic (http://newrelic.com/). You can often see exactly which part of your app is causing the bad response times by breaking down how much time is spent in each part of your app. It's a simple gem to install, and once reporting is working it's pretty invaluable for issues like this.
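The install really is small; the agent is a single gem (the license key then goes into config/newrelic.yml, which you download from your New Relic account):

# Gemfile
gem 'newrelic_rpm'   # reports per-action timings, DB time, GC time, etc.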
Finally, the bottleneck was Passenger itself; passenger-status is pretty useful for showing how much of the queue is left.
Our server is pretty decent, so we just increased the number of Passenger processes in nginx.conf to 600, resulting in:
passenger_root /usr/local/rvm/gems/ruby-2.0.0-p195/gems/passenger-4.0.5;
passenger_ruby /usr/local/rvm/wrappers/ruby-2.0.0-p195/ruby;
passenger_max_pool_size 600;

Thin vs Unicorn on Heroku

Just wanted to get people's opinions on using Unicorn vs Thin as a Rails server. Most of the articles/benchmarks I found online seem very incomplete, so it would be nice to have a centralized place to discuss it.
Unicorn is a multi-process server, while Thin is an event-based/non-blocking server. Event-based servers are great... if your code is asynchronous/non-blocking - vanilla Rails is blocking. So unless you use non-blocking Rails libraries, I really don't see the advantage of using Thin. Even worse, in a non-blocking server, if your I/O loop is blocking, you're going to block the entire loop and not be able to handle any more requests until the blocking call returns. Blocking libraries are going to slow Thin down!
Why did Heroku choose Thin as their default server (for cedar)? They are smart guys, so I'm sure they had a reason.
Below is a link that suggests replacing Thin with 4 Unicorn workers - this makes perfect sense to me.
4 Unicorn workers on Heroku
Thin is easy to configure - not optimal, but it just works in the Heroku environment.
Unicorn can be more efficient, but it needs to be configured: how many workers? Preload the app or not? What do you pick?
I have released Unicorn Heroku apps with workers set to 3, 5 and 8 - based purely on how big each app is. How much code, how much memory is used and how much traffic you get all go into picking this number, and you need to monitor over time to make sure you got the number right and your app isn't running out of memory.
Preload false - this will make your app start slower, but when Unicorn restarts a worker it is 'safer' with respect to network connections (memcache, postgres, mongo etc.).
Preload true - this is better, but you need to handle server reconnections correctly in the pre- and post-fork code (see the sketch after the summary below).
Thin has none of these issues out of the box, but you only get one process of execution.
Summary: It's really hard to configure Unicorn out of the box to work well (or at all) for everyone, whereas Thin can just work to get people running with fewer support requests.
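To make the preload_app true case concrete, the reconnection handling usually looks something like the sketch below in config/unicorn.rb (ActiveRecord shown; memcache/mongo clients need the same treatment in their own way):

# config/unicorn.rb -- the reconnection hooks that preload_app true requires
preload_app true

before_fork do |server, worker|
  # The master process loaded the app and may hold open connections;
  # close them so forked workers don't share sockets with the master.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each freshly forked worker opens its own database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end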
Recently (only a few months ago) the folks behind Phusion Passenger added support for Heroku. This is definitely an alternative you should try to see if it fits your needs.
It is blazing fast even with 1 dyno, and the drop in response time is palpable.
A simple Passenger Ruby Heroku Demo is hosted on github.
The main benefits that Passenger on Heroku claims are:
Static asset acceleration through Nginx - Don't let your Ruby app serve static assets; let Nginx do it for you and free your app for the really important tasks. Nginx will do a much better job.
Multiple worker processes - Instead of running only one worker on a dyno, Phusion Passenger runs multiple workers on a single dyno, thus utilizing its resources to the fullest and giving you more bang for the buck. This approach is similar to Unicorn's. But unlike Unicorn, Phusion Passenger dynamically scales the number of worker processes based on current traffic, thus freeing up resources when they're not necessary.
Memory optimizations - Phusion Passenger uses less memory than Thin and Unicorn. It also supports copy-on-write virtual memory in combination with code preloading, thus making your app use even less memory when run on Ruby 2.0.
Request/response buffering - The included Nginx buffers requests and responses, thus protecting your app against slow clients (e.g. mobile devices on mobile networks) and improving performance.
Out-of-band garbage collection - Ruby's garbage collector is slow, but why bother your visitors with long response times? Fix this by running garbage collection outside of the normal request-response cycle! This concept, first introduced by Unicorn, has been improved upon: Phusion Passenger ensures that only one request at a time runs out-of-band garbage collection, thus eliminating the problems Unicorn's out-of-band garbage collection has.
JRuby support - Unicorn's a better choice than Thin, but it doesn't support JRuby. Phusion Passenger does.
Hope this helps.
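If you want to try it, the switch itself is small; a sketch of the Gemfile change (the Procfile command follows the demo repository linked above):

# Gemfile -- swap the app server gem
# gem 'thin'        # or 'unicorn': remove the old server
gem 'passenger'

# Procfile (one line, shown here as a comment; see the demo repo for the exact command):
#   web: bundle exec passenger start -p $PORT --max-pool-size 3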
Heroku does not use intelligent routing - it will randomly assign jobs to dynos regardless of whether a dyno is busy. Thus, if your dyno cannot handle multiple jobs at once, you will get latency (perhaps massive latency) even if you are paying for lots of other dynos that are sitting idle. "That's right — if your app needs 80 dynos with an intelligent router, it needs 4,000 with a random router."
http://news.rapgenius.com/James-somers-herokus-ugly-secret-lyrics
Heroku says they are working on this, and their plan is to make it easier to use Unicorn. They basically said "Oops, we didn't notice that this was a problem for a few years... and now that we look, it's definitely a problem for Thin... so I guess you need to use a different program than the one we've been pushing all this time."
http://news.rapgenius.com/Jesper-joergensen-routing-performance-update-lyrics
From the official Heroku explanation (second link above):
"Rails, in fact, does not yet reliably support concurrent request handling. This leaves Rails developers unable to leverage the additional concurrency capabilities offered by the Cedar stack, unless they move to a concurrent web server like Puma or Unicorn.
Rails apps deployed to Cedar with Thin can rather quickly end up with request queuing problems. Because the Cedar router no longer does any queuing on behalf of the app, requests queued at the dyno must wait until the single Rails process works its way through the queue. Many customers have run into this issue and we failed to take action and provide them with a better approach to deploying Rails apps on Cedar."
Also of interest is that their performance tools, including New Relic, have not been reporting time spent in the dyno queue.
http://news.rapgenius.com/Lemon-money-trees-rap-genius-response-to-heroku-lyrics
Oops.
