I checked my applications, and they're using a huge amount of memory, which is crashing my server.
Here's my ps output:
RSS COMMAND
1560 sshd: shadyfront@pts/0
1904 -bash
1712 PassengerNginxHelperServer /home/shadyfront/webapps/truejersey/gems/gems/p
8540 Passenger spawn server
612 nginx: master process /home/shadyfront/webapps/truejersey/nginx/sbin/nginx
1368 nginx: worker process
94796 Rails: /home/shadyfront/webapps/truejersey/True-Jersey
1580 PassengerNginxHelperServer /home/shadyfront/webapps/age_of_revolt/gems/gem
8152 Passenger spawn server
548 nginx: master process /home/shadyfront/webapps/age_of_revolt/nginx/sbin/ng
1240 nginx: worker process
92196 Rack: /home/shadyfront/webapps/age_of_revolt/Age-of-Revolt
904 ps -u shadyfront -o rss,command
Is this abnormally large for an e-commerce application?
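To put a single number on a listing like the one above, the RSS column (which ps reports in KiB on Linux) can be totalled. This is only a rough sketch: the helper names are made up for illustration, and the sample string is three rows copied from the listing.

```ruby
# Hypothetical helpers for reading `ps -o rss,command` output.

# Three rows copied from the listing above, plus the header.
SAMPLE_PS_OUTPUT = <<~PS
  RSS COMMAND
  612 nginx: master process /home/shadyfront/webapps/truejersey/nginx/sbin/nginx
  1368 nginx: worker process
  94796 Rails: /home/shadyfront/webapps/truejersey/True-Jersey
PS

# Parses each row into [rss_kib, command]. The header row and blank
# lines are skipped because they do not start with a number.
def rss_rows(ps_output)
  ps_output.each_line.map do |line|
    match = line.match(/\A\s*(\d+)\s+(.+?)\s*\z/)
    match && [match[1].to_i, match[2]]
  end.compact
end

# Total resident set size across all rows, in KiB.
def total_rss_kib(ps_output)
  rss_rows(ps_output).sum { |rss, _command| rss }
end

puts format("%.1f MiB resident", total_rss_kib(SAMPLE_PS_OUTPUT) / 1024.0)
```

Applied to the full listing, the total comes to 215112 KiB (about 210 MiB), and the two application processes (94796 and 92196 KiB) account for most of it.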
If you are on Linux, you can use ulimit to cap the resources a process may consume:
http://ss64.com/bash/ulimit.html
Not sure why it is eating your memory though.
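For a Ruby process, the same kind of limit can also be read and set from inside the process itself. The sketch below assumes Linux and uses Ruby's built-in Process.getrlimit and Process.setrlimit. The helper names are hypothetical and nothing here comes from the original answer. Note that RLIMIT_AS limits virtual address space in bytes (bash's ulimit -v sets the same limit in KiB), which is not the same thing as the RSS column shown by ps.

```ruby
# Illustrative helpers (names are hypothetical, not part of any library).

# Returns [soft, hard] limits on virtual address space, in bytes.
# Process::RLIM_INFINITY means "no limit".
def address_space_limit
  Process.getrlimit(Process::RLIMIT_AS)
end

# Lowers (or sets) the soft address-space limit of the current process.
# The hard limit is left unchanged.
def cap_address_space(max_bytes)
  _soft, hard = address_space_limit
  Process.setrlimit(Process::RLIMIT_AS, max_bytes, hard)
end

soft, hard = address_space_limit
puts "address space limit: soft=#{soft} hard=#{hard}"
```

Calling something like cap_address_space(8 * 1024**3) early in a boot script should make a runaway process fail its own allocations rather than take the whole server down, but the right figure depends on the application and is an assumption here.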
If you're using a 64-bit OS, then it's fairly normal.
RSS COMMAND
89824 Rack: /var/www/vhosts/zmdev.net/zmdev # RefineryCMS on Passenger
148216 thin server (0.0.0.0:5000) # Redmine
238856 thin server (0.0.0.0:3000) # Spree after a couple of weeks
140260 thin server (0.0.0.0:3000) # Spree after a fresh reboot
All of these are on 64-bit OSes; memory use drops significantly on a 32-bit OS.
Here's the exact same Spree application running WEBrick in my dev environment on 32-bit Ubuntu:
RSS COMMAND
58904 /home/chris/.rvm/rubies/ruby-1.9.2-p180/bin/ruby script/rails s
Related
Related questions/issues not directly addressing this issue
Puma won't die when killing rails server with ctrl-c
Can't kill a process - stopping rails server
Unstoppable server - rails
How to stop/kill server (development) in rubymine
Can't stop rails server
Signal sent by the stopping button
Send SIGINT signal to running programm.
Sending SIGTERM to a server process when running via rails s (in clustered mode) fails to stop the server successfully
Appropriately wait for worker child process when shutting down via SIGTERM
Signal traps exit with status 0 instead of killing own process
Rails server doesn't stop on windows bash killing
Rubymine on Windows with WSL based interpreter: Puma fails to kill the server process when it termiinates
WSL process tree survives a taskkill
[WSL] Rubymine does not kill rails process, need to manually do it.
Specifications
Windows 10 (10.0.17134 Build 17134 with Feature Update 1803) running Ubuntu 18.04.1 LTS via Windows Subsystem for Linux
RubyMine 2018.2.3
Puma 3.12.0
Rails 5.1.4
Ruby 2.5.1p57
Issue
Starting a development environment with the green start arrow in RubyMine starts a Puma server as expected. The server console log shows:
]0;Ubuntu^Z
=> Booting Puma
=> Rails 5.1.4 application starting in development
=> Run `rails server -h` for more startup options
[2331] Puma starting in cluster mode...
[2331] * Version 3.9.1 (ruby 2.5.1-p57), codename: Private Caller
[2331] * Min threads: 5, max threads: 5
[2331] * Environment: development
[2331] * Process workers: 2
[2331] * Preloading application
[2331] * Listening on tcp://127.0.0.1:3000
[2331] Use Ctrl-C to stop
[2331] - Worker 0 (pid: 2339) booted, phase: 0
[2331] - Worker 1 (pid: 2343) booted, phase: 0
(Not sure what's with the broken characters around Ubuntu at the top, but that's another issue...)
I can monitor the server by issuing the ps aux | grep puma command in console. The output is as follows:
samort7 2456 16.9 0.3 430504 66420 tty5 Sl 23:58 0:05 puma 3.9.1 (tcp://127.0.0.1:3000) [rails-sample-app]
samort7 2464 1.6 0.3 849172 54052 tty5 Sl 23:58 0:00 puma: cluster worker 0: 2456 [rails-sample-app]
samort7 2468 1.5 0.3 849176 54052 tty5 Sl 23:58 0:00 puma: cluster worker 1: 2456 [rails-sample-app]
samort7 2493 0.0 0.0 14804 1200 tty4 S 23:59 0:00 grep --color=auto puma
Clicking the red "Stop" button in RubyMine causes this line to show up in the server console log:
Process finished with exit code 1
However, running ps aux | grep puma again reveals that the Puma workers are still running:
samort7 2464 0.2 0.3 849172 54340 ? Sl Oct12 0:00 puma: cluster worker 0: 2456 [rails-sample-app]
samort7 2468 0.2 0.3 849176 54332 ? Sl Oct12 0:00 puma: cluster worker 1: 2456 [rails-sample-app]
samort7 2505 0.0 0.0 14804 1200 tty4 S 00:01 0:00 grep --color=auto puma
Repeatedly clicking start and stop will cause more and more zombie processes to be created. These processes can be killed by issuing a pkill -9 -f puma command, but that is less than ideal and defeats the whole purpose of the stop button.
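The leftover workers can be picked out of the ps listing mechanically, because each worker's process title carries the PID of its master (2456 above). In the strict Unix sense they are orphaned rather than zombie processes, since they are still running after their parent has gone. The following is a rough sketch with a hypothetical helper name; it assumes the listing contains the master's row whenever the master is alive, which holds for ps aux | grep puma.

```ruby
# Matches titles such as "puma: cluster worker 0: 2456 [rails-sample-app]"
# and captures the master PID (2456).
WORKER_TITLE = /\Apuma: cluster worker \d+: (\d+)/

# Hypothetical helper: given lines of `ps aux` output, returns the PIDs of
# Puma cluster workers whose master PID does not appear in the listing.
def orphaned_worker_pids(ps_lines)
  # ps aux has 11 columns; the last one (COMMAND) may contain spaces.
  rows = ps_lines.map { |line| line.split(" ", 11) }
                 .select { |fields| fields.length == 11 }
  listed_pids = rows.map { |fields| fields[1].to_i }

  rows.map do |fields|
    match = WORKER_TITLE.match(fields[10])
    next unless match
    master_pid = match[1].to_i
    fields[1].to_i unless listed_pids.include?(master_pid)
  end.compact
end
```

Fed the second listing above, this should return the two worker PIDs; fed the first, where the master is still listed, it should return an empty array.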
Expected Result
Running the server directly from the terminal with rails s -b 127.0.0.1 -p 3000 starts the server as before and then pressing Ctrl+C gives the following output and doesn't create zombie processes:
[2688] - Gracefully shutting down workers...
[2688] === puma shutdown: 2018-10-13 00:12:00 -0400 ===
[2688] - Goodbye!
Exiting
Analysis
According to the RubyMine documentation, clicking the stop button:
invokes soft kill allowing the application to catch the SIGINT event and perform graceful termination (on Windows, the Ctrl+C event is emulated).
Despite what the documentation claims, this does not appear to be happening. It seems that the stop button is actually issuing a SIGKILL signal, stopping the process without allowing it to gracefully clean up.
I have also noticed that if I close all terminal windows both inside and outside of RubyMine, then open up a new terminal window, the zombie processes are gone.
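The difference between the two signals discussed in this analysis can be reproduced outside RubyMine. The sketch below is not how RubyMine or Puma are implemented; it only shows that a process can trap SIGINT and clean up, whereas SIGKILL cannot be trapped, so no cleanup code runs. It uses fork, so it assumes a Unix-like Ruby (Linux or WSL, not native Windows), and the helper name is hypothetical.

```ruby
# Hypothetical helper: forks a child that traps SIGINT, sends it `signal`,
# and returns whatever the child managed to write before it died.
def child_output_after(signal)
  out_r, out_w = IO.pipe       # child -> parent: what the handler wrote
  ready_r, ready_w = IO.pipe   # child -> parent: "handler installed"

  pid = fork do
    out_r.close
    ready_r.close
    Signal.trap("INT") do
      out_w.syswrite("graceful shutdown")  # stands in for real cleanup
      exit!(0)
    end
    ready_w.close              # EOF tells the parent the trap is in place
    sleep                      # idle until a signal arrives
  end

  out_w.close
  ready_w.close
  ready_r.read                 # blocks until the child closes its end
  Process.kill(signal, pid)
  Process.wait(pid)
  out_r.read
ensure
  [out_r, ready_r].each { |io| io.close if io && !io.closed? }
end

puts "after INT:  #{child_output_after('INT').inspect}"
puts "after KILL: #{child_output_after('KILL').inspect}"
```

If the stop button really delivers SIGKILL to the Puma master, the master would get no chance to tell its workers to shut down, which would be consistent with the workers surviving in the ps output.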
Question
How can I get RubyMine to issue the correct SIGINT signal upon pressing the red stop button so that the Puma server can be shut down gracefully without creating zombie processes?
For reference, I am experiencing this issue in this commit of my codebase (a chapter from Michael Hartl's Rails Tutorial).
I'm running a basic Rails 4 (ruby 2.1.4) app on Heroku with a Puma config as follows:
workers Integer(ENV['PUMA_WORKERS'] || 1)
threads Integer(ENV['MIN_THREADS'] || 6), Integer(ENV['MAX_THREADS'] || 6)
I currently do not have any ENV vars set so I should be defaulting to 1 worker.
The problem is that, while investigating a potential memory leak, it appears that 2 'instances' of my web.1 dyno are running, at least according to NewRelic.
I have heroku labs:enable log-runtime-metrics enabled and it shows my memory footprint at ~400MB. On NewRelic it shows my footprint at ~200MB AVG across 2 'instances'.
heroku ps shows:
=== web (1X): `bundle exec puma -C config/puma.rb`
web.1: up 2014/10/30 13:49:29 (~ 4h ago)
So why would NewRelic think I have 2 instances running? If I do a heroku restart, NewRelic will see only 1 instance for a while and then bump up to 2. Is this something Heroku is doing but not reporting to me, or is it a Puma thing, even though workers should be set to 1?
See the Feb 17, 2015 release of New Relic 3.10.0.279 which addresses this specific issue when tracking Puma instances. I'm guessing since your app is running on Heroku, you have preload_app! set in your Puma config so this should apply.
From the release notes:
Metrics no longer reported from Puma master processes.
When using Puma's cluster mode with the preload_app! configuration directive, the agent will no longer start its reporting thread in the Puma master process. This should result in more accurate instance counts, and more accurate stats on the Ruby VMs page (since the master process will be excluded).
I'm testing the update on a project with a similar issue, and it seems to be reporting more accurately.
It's because Puma always has one master process, from which all the workers are spawned.
So, the instance count will come from the following:
1 (master process) + <N_WORKER>
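The arithmetic above can be written down as a tiny sketch. The helper name and its keyword arguments are hypothetical. It assumes that workers 0 means Puma's single mode (one process, no separate master) and that master_reports says whether the monitoring agent also reports from the master process, which is what the release notes quoted earlier describe changing.

```ruby
# Hypothetical helper: how many Puma processes a monitoring agent counts.
def reported_instances(workers:, master_reports: true)
  return 1 if workers.zero?            # single mode: one process in total
  workers + (master_reports ? 1 : 0)   # cluster mode: workers (+ master)
end

puts reported_instances(workers: 1)                         # older agent
puts reported_instances(workers: 1, master_reports: false)  # updated agent
```

With the question's config (1 worker), this gives 2 when the master is counted and 1 when it is not.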
I have a server with about 30 Ruby On Rails applications.
When I (re)start 1 of the 30 apps, then all other apps are not accessible and they seem to be waiting for the 1 app to (re)start.
Even the passenger-status command seems to wait until the 1 app is (re)started.
Is this normal behavior?
Or how can this be fixed?
(Virtual) Server specifications:
CPU: 3 cores x 2.2 GHz
Memory: 4 GB
Hard disk: 40 GB
Server software:
CentOS release 6.3 (Final)
Nginx version: nginx/1.0.12
Ruby 1.9.3p125 (2012-02-16 revision 34643)[x86_64-linux]
Phusion Passenger version 3.0.18
Nginx/Passenger config:
passenger_max_pool_size 50;
passenger_min_instances 1;
passenger_max_instances_per_app 2;
I'm happy to help with more details if needed.
Update:
I installed Passenger Enterprise with nginx, and the apps no longer block each other while starting, so I think the problem was fixed by passenger_rolling_restarts on;
Hmm... it looks like this may be a "feature" of Passenger's open-source version. From http://phusionpassenger.com/enterprise :
In the open source version of Phusion Passenger restarting an application involves shutting down all application processes and spawning new ones. Because starting a new process can take a long time (depending on the application), visitors may experience slow responses while the restart is in progress. With rolling restarts, Phusion Passenger Enterprise restarts your application processes in the background.
So, the options would appear to be:
1) upgrade to the enterprise edition of Passenger
2) switch to some other server.
Yuck.
Actually, two issues are at work here:
1) When rolling restarts are not being used, your visitors have to wait until the restart is finished, just as drosboro says.
2) In addition, Phusion Passenger 3 locks the entire application pool when spawning the first process for an application. During this time, no requests can be handled. Subsequent process spawns are done in the background, and rolling restart spawns are also done in the background, so they don't affect requests. This locking limitation has been entirely lifted in Phusion Passenger 4 (and of course, Phusion Passenger Enterprise 4): everything has been made asynchronous.
I've inherited the maintenance and development of a Ruby on Rails site that runs on Ruby 1.8.7 and Rails 2.3.2. While we try to deploy to Linux servers using Passenger as much as possible, my boss has told me that we must be able to deploy to Windows at times for our clients.
I have installed my Rails app fine and it works perfectly when I test with the WEBrick server. I have also installed Apache 2.2, which is serving up generic HTML pages perfectly. However, when I try to run my Rails app under Apache I get a 503 Service Temporarily Unavailable error.
There is no error listed in the Apache logs but when I check the RoR logs it does show
127.0.0.1 - - [09/Aug/2012:10:31:02 +1000] "GET / HTTP/1.1" 503 323
127.0.0.1 - - [09/Aug/2012:10:31:02 +1000] "GET /favicon.ico HTTP/1.1" 503 323
and
[Thu Aug 09 10:31:06 2012] [error] proxy: BALANCER: (balancer://mmapscluster). All workers are in error state
[Thu Aug 09 10:31:07 2012] [error] proxy: BALANCER: (balancer://mmapscluster). All workers are in error state
As you may have guessed, we are running Mongrel behind an Apache proxy for performance reasons.
When I removed all of the proxying from the Apache configuration (incidentally restarting Apache is not enough for the proxy config - I had to reboot the entire machine), I got a seemingly endless list of the following Apache errors,
[notice] Parent: Created child process 1944
[notice] Child 1944: Child process is running
[notice] Parent: child process exited with status 255 -- Restarting.
[notice] Apache/2.2.15 (Win32) configured -- resuming normal operations
I have gone round and round on this and I've checked my config against a working installation that we have but I cannot see any differences in the setup. The only real difference is that the working one is running on a 32-bit machine and the failing one is running on a 64-bit machine.
Could this be the problem? Has anybody else had any similar types of problems running Apache on 64-bit machines?
At work we're running some high-traffic sites in Rails. We often get a problem with the following being spammed in the nginx error log:
2011/05/24 11:20:08 [error] 90248#0: *468577825 connect() to unix:/app_path/production/shared/system/unicorn.sock failed (61: Connection refused) while connecting to upstream
Our setup is nginx on the frontend server (load balancing), and unicorn on our 4 app servers. Each unicorn is running with 8 workers. The setup is very similar to the one GitHub uses.
Most of our content is cached, and when the request hits nginx it looks for the page in memcached and serves it if it can find it; otherwise the request goes to Rails.
I can solve the above issue - SOMETIMES - by doing a pkill of the unicorn processes on the servers followed by a:
cap production unicorn:check (removing all the pid's)
cap production unicorn:start
Do you have any idea how I can debug this issue? We don't have any significantly high load on our database server when these problems occur.
Something killed your unicorn process on one of the servers, or it timed out. Or you have an old app server in your upstream app_server { } block that is no longer valid. Nginx will retry it from time to time. The default is to re-try another upstream if it gets a connection error, so hopefully your clients didn't notice anything.
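One way to tell these cases apart on an app server is to try connecting to the socket directly, bypassing nginx. The sketch below uses Ruby's standard socket library; the helper name is hypothetical, and the path you pass in should be the one from your nginx error log. A stale socket file left behind by a dead process refuses connections, which matches the "Connection refused" in the log.

```ruby
require "socket"

# Hypothetical helper: true if something is accepting connections on the
# Unix domain socket at `path`, false if the file is missing or stale.
def socket_accepting?(path)
  UNIXSocket.new(path).close
  true
rescue Errno::ENOENT, Errno::ECONNREFUSED
  false
end
```

Running this periodically on each of the app servers while the errors are occurring should show which server's unicorn master has gone away, assuming the failure is a dead or hung master rather than a full listen backlog.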
I don't think this is an nginx issue in my case; restarting nginx didn't help. It seems to be gunicorn. A quick and dirty way to avoid this is to recycle the gunicorn instances when the system is not being used, say at 1 AM if that is an acceptable maintenance window. I run gunicorn as a service that comes back up if killed, so a pkill script takes care of the recycle/respawn:
start on runlevel [2345]
stop on runlevel [06]
respawn
respawn limit 10 5
exec /var/web/proj/server.sh
I am starting to wonder if this is at all related to memory allocation. I have MongoDB running on the same system and it reserves all the memory for itself but it is supposed to yield if other applications require more memory.
Another thing worth trying is getting rid of eventlet or other dependent modules when running gunicorn. uWSGI can also be used as an alternative to gunicorn.