"The website is under heavy load" + ROR (ruby-on-rails)

We are running a website with ROR on CentOS 6 with 2 web servers and 1 database server. Sometimes it shows the message "The website is under heavy load"... Can someone please help with what to check here?
We are using Passenger 4.0.21 with Ruby 1.8.7 and Apache 2.2.15. The web server is running with the default settings.
Below is some output of passenger-status:
# passenger-status
Version : 4.0.21
Date : Thu Dec 12 02:02:44 -0500 2013
Instance: 20126
----------- General information -----------
Max pool size : 6
Processes : 6
Requests in top-level queue : 0
----------- Application groups -----------
/home/web/html#default:
App root: /home/web/html
Requests in queue: 100
* PID: 20290 Sessions: 1 Processed: 53 Uptime: 24h 3m 5s
CPU: 0% Memory : 634M Last used: 23h 16m 8
* PID: 22657 Sessions: 1 Processed: 37 Uptime: 23h 15m 55s
CPU: 0% Memory : 609M Last used: 22h 44m
* PID: 29147 Sessions: 1 Processed: 146 Uptime: 20h 47m 48s
CPU: 0% Memory : 976M Last used: 18h 20m
* PID: 22216 Sessions: 1 Processed: 26 Uptime: 10h 3m 19s
CPU: 0% Memory : 538M Last used: 9h 44m 4
* PID: 23306 Sessions: 1 Processed: 75 Uptime: 9h 43m 22s
CPU: 0% Memory : 483M Last used: 8h 44m 4
* PID: 25626 Sessions: 1 Processed: 115 Uptime: 8h 46m 42s
CPU: 0% Memory : 540M Last used: 7h 59m 5

You have too many requests in the queue. Since version 4.0.15 there is a limit, which is 100 by default. Here is a short excerpt from http://blog.phusion.nl/2013/09/06/phusion-passenger-4-0-16-released/ which says:
Phusion Passenger now displays an error message to clients if too many
requests are queued up, instead of letting them wait. This much
improves quality of service. By default, "too many" is 100. You may
customize this with PassengerMaxRequestQueueSize (Apache) or
passenger_max_request_queue_size (Nginx).
Have a look at the user guide about this: http://www.modrails.com/documentation/Users%20guide%20Apache.html#PassengerMaxRequestQueueSize
You could try increasing it, or setting it to 0 in order to disable the limit.
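For example, a minimal sketch for Apache (the value 250 is just an illustration; pick something suited to your hardware and traffic):
# In the global Passenger config or the relevant virtual host:
PassengerMaxRequestQueueSize 250
# Or disable the limit entirely, so clients queue up instead of receiving an error:
# PassengerMaxRequestQueueSize 0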
EDIT
You should also check your logs to see whether there are requests which take too long; maybe some code paths in your application are slow. I prefer using NewRelic for monitoring those things.
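If you want a quick first pass without NewRelic, you can grep the Rails log for slow requests. A rough sketch, assuming the "Completed ... in NNNms" request lines of Rails 3+ (older Rails 2.x logs timings in seconds, so the pattern would differ):
# requests that completed in 1000 ms or more (four or more digits):
grep -E "Completed .* in [0-9]{4,}ms" log/production.log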

Related

Ruby on Rails app stuck on futex; 'kill -QUIT PID' gives no trace from Passenger

Part of the passenger-status (nginx/1.14.0 Phusion_Passenger/6.0.1) output shows that two processes are shutting down but cannot quit.
* PID: 10351 Sessions: 1 Processed: 279777 Uptime: 6d 1h 41m 37s
CPU: 3% Memory : 625M Last used: 2d 3
Shutting down...
* PID: 10370 Sessions: 1 Processed: 290718 Uptime: 6d 1h 41m 37s
CPU: 3% Memory : 778M Last used: 6h 5
Shutting down...
The strace output tells me the Ruby process is stuck on a futex call.
$ strace -p 10351
strace: Process 10351 attached
futex(0x7fd7cbf02210, FUTEX_WAIT_PRIVATE, 0, NULL
kill -QUIT 10351 also gives me no trace info.
The output of ps -efL for process 10351 shows another thread, id 10353.
ubuntu 10351 1 10351 0 2 Feb02 ? 00:00:02 Passenger AppPreloader: /var/www/app/current (forking...)
ubuntu 10351 1 10353 0 2 Feb02 ? 00:00:00 Passenger AppPreloader: /var/www/app/current (forking...)
and strace -p 10353 outputs:
strace: Process 10353 attached
restart_syscall(<... resuming interrupted poll ...>
Any idea how to get the ruby trace info to debug the issue?
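One technique that sometimes helps when SIGQUIT yields nothing is to attach gdb and ask the interpreter for a backtrace directly. A sketch, assuming an MRI Ruby with debug symbols available; the output goes to the process's stderr, which Passenger normally redirects into the web server error log (and the call may hang if the stuck thread holds the GVL):
sudo gdb -p 10351
(gdb) call (void) rb_backtrace()   # print the Ruby-level backtrace of the current thread
(gdb) detach
(gdb) quit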

How to check whether a running Rails app is in the development or production environment

I'm running a Rails app on a CentOS 6.5 server with Passenger and Nginx. How can I check which environment it's running in without stopping it?
Use the passenger-status command. For example, this shows me passenger is running the production environment (the first line under the Application groups heading):
(production-web) ubuntu@ip-10-0-3-146 ~% sudo passenger-status
Version : 5.0.15
Date : 2015-08-20 17:40:24 +0000
Instance: lNNFwV1C (Apache/2.4.7 (Ubuntu) Phusion_Passenger/5.0.15)
----------- General information -----------
Max pool size : 12
App groups : 1
Processes : 6
Requests in top-level queue : 0
----------- Application groups -----------
/home/my-app/deploy/current (production):
App root: /home/my-app/deploy/current
Requests in queue: 0
* PID: 11123 Sessions: 0 Processed: 12997 Uptime: 21h 14m 2s
CPU: 0% Memory : 190M Last used: 1s ago
* PID: 11130 Sessions: 0 Processed: 140 Uptime: 21h 14m 2s
CPU: 0% Memory : 153M Last used: 9m 32s a
* PID: 11137 Sessions: 0 Processed: 15 Uptime: 21h 14m 2s
CPU: 0% Memory : 103M Last used: 57m 54s
* PID: 11146 Sessions: 0 Processed: 6 Uptime: 21h 14m 2s
CPU: 0% Memory : 101M Last used: 7h 47m 4
* PID: 11153 Sessions: 0 Processed: 5 Uptime: 21h 14m 1s
CPU: 0% Memory : 100M Last used: 8h 42m 3
* PID: 11160 Sessions: 0 Processed: 2 Uptime: 21h 14m 1s
CPU: 0% Memory : 81M Last used: 8h 42m 3
rails console is not reliable - it only tells you what environment the console is running under. Passenger may be configured to run in a different environment.
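The environment Passenger boots the app in is normally set in the web server config, so that is the authoritative place to look (or change it). An illustrative sketch; directive names vary with the Passenger version and web server:
# Nginx:
passenger_app_env production;   # older releases use: rails_env production;
# Apache:
RailsEnv production             # newer releases also accept PassengerAppEnv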
Your environment is found in Rails.env:
Loading development environment (Rails 4.2.3)
2.1.2 :001 > Rails.env
=> "development"
You can also query the environment with predicate ("question-style") methods for conditionals:
2.1.2 :002 > Rails.env.production?
=> false
2.1.2 :003 > Rails.env.pickle?
=> false
2.1.2 :004 > Rails.env.development?
=> true
A word of warning: this is for when you want to check the environment from within your own code.
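For example, a minimal in-code check (illustrative only):
# e.g. in an initializer or a controller:
if Rails.env.production?
  # production-only behaviour goes here
end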

Phusion Passenger processes dying and new ones starting up mysteriously

As you can see, Passenger processes are dying and new ones are booting up, even though we're not explicitly restarting Passenger ourselves. We can't pinpoint what's causing this. What are some common places we should look to find out what's triggering these restarts?
The two passenger-status outputs below were taken about 30 minutes apart. passenger_pool_idle_time is set to 0 in our conf file, which you can see here: https://gist.github.com/panabee/8ddf95a72d6a07e29c7f
We're on Passenger 4.0.5, Rails 3.2.12, and nginx 1.4.1.
[root@mongo ~]# passenger-status
----------- General information -----------
Max pool size : 20
Processes : 3
Requests in top-level queue : 0
----------- Application groups -----------
/home/p/p#default:
App root: /home/p/p
Requests in queue: 0
* PID: 17171 Sessions: 0 Processed: 536 Uptime: 27m 56s
CPU: 0% Memory : 62M Last used: 20s ago
* PID: 18087 Sessions: 0 Processed: 363 Uptime: 17m 31s
CPU: 0% Memory : 36M Last used: 39s ago
* PID: 19382 Sessions: 0 Processed: 51 Uptime: 2m 55s
CPU: 0% Memory : 34M Last used: 5s ago
[root@mongo ~]# passenger-status
----------- General information -----------
Max pool size : 20
Processes : 2
Requests in top-level queue : 0
----------- Application groups -----------
/home/p/p#default:
App root: /home/p/p
Requests in queue: 0
* PID: 25266 Sessions: 0 Processed: 73 Uptime: 2m 56s
CPU: 0% Memory : 32M Last used: 34s ago
* PID: 25462 Sessions: 1 Processed: 18 Uptime: 51s
CPU: 0% Memory : 28M Last used: 0s ago
[root@mongo ~]#
Look in the web server error log. If the application dies you will probably see the reason in that log file.
This is a bug in 4.0.5; 4.0.6 patches it. In the meantime, set the value to a very large number.
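A sketch of that workaround, assuming the value in question is the passenger_pool_idle_time from the gist above (nginx.conf excerpt):
# Instead of passenger_pool_idle_time 0; which hits the 4.0.5 bug,
# use a value so large it is effectively "never":
passenger_pool_idle_time 315360000;   # roughly ten years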

Passenger Spawning a lot of Rack Applications

output of passenger-memory-stats
----- Passenger processes -----
PID VMSize Private Name
-------------------------------
28572 207.4 MB ? Rack: /home/myapp/application
28580 207.0 MB ? Rack: /home/myapp/application
28588 206.0 MB ? Rack: /home/myapp/application
28648 206.5 MB ? Rack: /home/myapp/application
29005 23.0 MB ? PassengerWatchdog
29008 100.5 MB ? PassengerHelperAgent
29010 43.1 MB ? Passenger spawn server
29013 70.8 MB ? PassengerLoggingAgent
29053 202.0 MB ? Passenger ApplicationSpawner: /home/myapp/application
29105 202.3 MB ? Rack: /home/myapp/application
29114 202.3 MB ? Rack: /home/myapp/application
29121 202.3 MB ? Rack: /home/myapp/application
29130 202.3 MB ? Rack: /home/myapp/application
29138 202.3 MB ? Rack: /home/myapp/application
That looks like a lot of spawned processes... this is an app currently in development, with no one (that I know of) hitting it...
The output of passenger-status:
App root: /home/myapp/application
* PID: 29105 Sessions: 1 Processed: 0 Uptime: 15m 11s
* PID: 29114 Sessions: 1 Processed: 0 Uptime: 14m 0s
* PID: 29121 Sessions: 1 Processed: 0 Uptime: 14m 0s
* PID: 29130 Sessions: 1 Processed: 0 Uptime: 14m 0s
* PID: 29138 Sessions: 1 Processed: 0 Uptime: 14m 0s
First, is this normal?
Second, possible causes?
For anyone having this issue of Rails hanging: if you are running on a limited-memory VPS, make sure you tune your max pool size so that you don't spawn more instances of the app than your system can handle. The default is 6, which is apparently too many for memory-strapped VPSs.
Docs about max pool setting:
http://www.modrails.com/documentation/Users%20guide%20Nginx.html#PassengerMaxPoolSize
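A minimal sketch for nginx (the value 2 is just an example for a small VPS):
# in the http block of nginx.conf:
passenger_max_pool_size 2;   # the default is 6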
It may be that some processes survive from earlier versions of your app. Our app's Rack processes each point to a specific release of our app.
95171 2491.8 MB 4.8 MB Rack: /Deploy/theapp/releases/20120530013237
There were multiple processes pointing to many different releases, which leads me to conclude these are left over when the app is restarted.
I thought maybe touching tmp/restart.txt instead of restarting Apache has this effect. So I set :use_sudo to true and am restarting with run "#{try_sudo} /opt/local/apache2/bin/apachectl graceful" instead, and the only Rack processes I see are those that were just started.
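For reference, that restart looks roughly like this as a Capistrano 2-style task (a sketch built from the command above; the apachectl path is the one from this answer):
set :use_sudo, true

namespace :deploy do
  task :restart, :roles => :app do
    run "#{try_sudo} /opt/local/apache2/bin/apachectl graceful"
  end
end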

Odd restarts on specific passenger processes

I've recently been working with our Passenger setup and monitoring our app via NewRelic's RPM. Over the last week I've noticed that the production version of our app restarts about once an hour (it doesn't track to exactly once an hour, it's seemingly random, and as far as I can tell it only happens during the day, though there are seldom requests at night so I never see the startup blip). However, the other sites on the same box do not.
Taking a look at passenger-status I see this:
----------- Domains -----------
/web/marketing/current:
PID: 2897 Sessions: 0 Processed: 178 Uptime: 22h 35m 58s
/web/demo/current:
PID: 11664 Sessions: 0 Processed: 58 Uptime: 17h 14m 59s
PID: 11026 Sessions: 0 Processed: 20 Uptime: 17h 50m 21s
/web/production/current:
PID: 20103 Sessions: 0 Processed: 12 Uptime: 9m 49s
PID: 20107 Sessions: 0 Processed: 3 Uptime: 9m 49s
PID: 20099 Sessions: 0 Processed: 20 Uptime: 9m 49s
PID: 20032 Sessions: 0 Processed: 20 Uptime: 11m 46s
PID: 20105 Sessions: 0 Processed: 17 Uptime: 9m 49s
PID: 20101 Sessions: 0 Processed: 2 Uptime: 9m 49s
PID: 20110 Sessions: 0 Processed: 1 Uptime: 9m 43s
Our passenger setup is currently:
PassengerRoot /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.15
PassengerRuby /usr/local/bin/ruby_gc_wrapper
PassengerMaxPoolSize 20
PassengerUseGlobalQueue on
PassengerStatThrottleRate 120
PassengerPoolIdleTime 0
RailsSpawnMethod smart
RailsFrameworkSpawnerIdleTime 0
RailsAppSpawnerIdleTime 0
and ruby_gc_wrapper looks like:
#!/bin/sh
# wrap ruby with gc tuning parameters
export RUBY_HEAP_MIN_SLOTS=500000
export RUBY_HEAP_SLOTS_INCREMENT=250000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000
export RUBY_HEAP_FREE_MIN=4096
exec "/usr/local/bin/ruby" "$#"
From what I understand, PassengerPoolIdleTime 0 should prevent the app from timing out. The only difference that I know of between the demo instance and the production instance is how much more often the prod one is called. However, I don't have PassengerMaxRequests set anywhere, so I'm baffled as to why it'd suddenly restart like this. I've looked at logrotate, monit and others to see if there are any outside processes messing with apache2, but if that were happening I'd expect all processes to have the same uptime.
Really rather strange. Any clue?
After closer inspection, the restarts were far more regular than I initially thought. While they weren't pegged to an exact minute, they were generally around an hour apart over the last 3 hours and landed roughly 15 minutes after the hour. Turns out there's one thing that runs 15 after the hour on that box: chef.
Now the baffling thing is why it would restart only one of the applications and not all of them. That I still don't know, but there are possibilities in there. Either way, I disabled chef's automatic runs (which I prefer not to have in production anyway), and now I have instances that have been up for 5 hours with a tiny response time. Beautiful.
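If you need to track down what fires at a particular time on a box like this, the usual suspects are cron and a config-management daemon; a rough sketch of where to look:
sudo crontab -l                    # root's crontab (chef-client is often scheduled here)
ls /etc/cron.d /etc/cron.hourly    # system-wide cron entries
ps aux | grep chef-client          # a chef-client daemon running on an interval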
