I have a Rails app running with Phusion Passenger as a standalone server, started with the command bundle exec passenger start --port 8000 --user ubuntu --daemonize.
The issue is that Passenger launches more processes than I need and consumes quite a lot of memory. The server is only used for my private work, so it receives almost no requests. How can I control the number of processes with Phusion Passenger? Which configuration options will minimize memory consumption?
Edit
With --max-pool-size 1, I don't see a dramatic improvement; I still have multiple RubyApp processes and preloaders.
Edit 2 (working with nginx)
From https://www.phusionpassenger.com/documentation/Users%20guide%20Nginx%203.0.html I learned more about the options I can add to the nginx.conf file.
passenger_max_pool_size 1;
passenger_pool_idle_time 1;
passenger-status now shows much lower memory usage (only one process in the pool).
ubuntu@ip-172-31-63-19 public> sudo passenger-status
Version : 5.0.21
Date : 2015-11-06 05:50:24 +0000
Instance: aSCyt3IW (nginx/1.8.0 Phusion_Passenger/5.0.21)
----------- General information -----------
Max pool size : 1
App groups : 1
Processes : 1
Requests in top-level queue : 0
----------- Application groups -----------
/home/ubuntu/webapp/rails/passenger-ruby-rails-demo/public (development):
App root: /home/ubuntu/webapp/rails/passenger-ruby-rails-demo
Requests in queue: 0
* PID: 3099 Sessions: 0 Processed: 49 Uptime: 33s
CPU: 1% Memory : 69M Last used: 11s ago
Try this:
passenger start --max-pool-size <NUMBER>
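For Passenger Standalone the pool options can also be passed on the command line. A sketch for a low-traffic private server (--max-pool-size is documented; treat --min-instances and --pool-idle-time as assumptions to verify against passenger start --help for your version):
bundle exec passenger start --port 8000 --user ubuntu --daemonize --max-pool-size 1 --min-instances 0 --pool-idle-time 10
With a pool size of 1, a minimum of 0 instances and a short idle time, Passenger keeps at most one application process and shuts it down shortly after the last request, at the cost of a cold start on the next one.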
We are using Ruby on Rails v5.2 and Puma v4, rotating the Rails application log with logrotate and restarting Puma when the log is rotated.
We restart Puma with pumactl phased-restart now, but we previously sent the SIGUSR1 signal to restart it.
Specifically, we use the following command:
pid=/path/to/app/tmp/pids/puma.pid
test -s "$pid" && kill -USR1 "$(cat "$pid")"
Refs: https://github.com/puma/puma/blob/master/docs/restart.md
Puma restarts without any problem for a while, but after several restarts leftover workers remain, as shown below. In the listing there are two worker 7 processes; pid 15241 appears to be the unneeded one.
$ ps aux | grep [p]uma
username 10096 22.8 3.0 3357088 1012940 ? Sl 13:01 1:12 puma: cluster worker 0: 14749 [app]
username 10135 26.6 3.0 3562476 1016716 ? Sl 13:01 1:22 puma: cluster worker 1: 14749 [app]
username 10196 17.9 3.4 3688000 1145328 ? Sl 13:01 0:54 puma: cluster worker 2: 14749 [app]
username 10243 15.6 3.9 3753452 1287424 ? Sl 13:01 0:46 puma: cluster worker 3: 14749 [app]
username 10325 16.8 3.6 3758280 1189004 ? Sl 13:01 0:48 puma: cluster worker 4: 14749 [app]
username 10381 15.9 3.1 3423564 1034904 ? Sl 13:01 0:45 puma: cluster worker 5: 14749 [app]
username 10420 16.6 3.2 3689844 1070112 ? Sl 13:01 0:46 puma: cluster worker 6: 14749 [app]
username 10464 21.9 2.6 3155296 888148 ? Sl 13:01 0:59 puma: cluster worker 7: 14749 [app]
username 14749 0.0 0.0 216884 20340 ? Sl Jul19 0:56 puma 4.0.0 (tcp://0.0.0.0:9292) [app]
username 15241 60.7 9.2 5273704 3049272 ? Sl 11:02 75:20 puma: cluster worker 7: 14749 [app]
My questions: How can I restart Puma without leaving unneeded workers behind?
Alternatively, is there a way to see what each worker is doing, so I can find out why this happens?
Or is there a way to log enough information to investigate the situation after the fact?
The versions in use are as follows.
Ruby: 2.4.2
Ruby on Rails: 5.2.2.1
Puma: 4.0.0
As mentioned above, Puma restarts are performed with pumactl phased-restart, but the situation was the same when sending the SIGUSR1 signal.
Puma's phased-restart is a rolling restart: each worker is restarted one by one. Another thing to note is that the command returns before the workers have finished restarting.
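If you want to check what the cluster thinks each worker is doing (for example, whether the second worker 7 is still registered), Puma can report per-worker status through its control server. This is a sketch that assumes you add a state file and the control app to your Puma config; the paths are placeholders:
# config/puma.rb additions (assumed paths)
state_path "/path/to/app/tmp/pids/puma.state"
activate_control_app "unix:///path/to/app/tmp/sockets/pumactl.sock"
You can then run bundle exec pumactl -S /path/to/app/tmp/pids/puma.state stats at any time; the JSON it prints includes each worker's pid, whether it has booted, and its last checkin, so a process that ps shows but stats does not list is a good candidate for investigation.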
I also use phased-restart, but I don't have logrotate restart Puma at all; instead I just have it copytruncate the log files. That lets the file descriptors stay the same. There may be unforeseen problems with my approach, but I haven't seen any issues yet.
Example logrotate config:
/path/to/rails/app/shared/log/*.log {
compress
copytruncate
daily
dateext
dateformat .%Y%m%d
dateyesterday
delaycompress
missingok
notifempty
rotate 36
su app_user app_user
}
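If you go the copytruncate route, it helps when Puma opens its log files in append mode. A minimal sketch of the relevant puma.rb lines (the paths are placeholders; the third argument to stdout_redirect enables append mode so truncation doesn't leave a gap of NUL bytes at the start of the file):
# config/puma.rb (sketch)
workers 8
# Append mode (true) so each worker keeps writing at the file's new end
# after logrotate copies and truncates it.
stdout_redirect "/path/to/app/log/puma.stdout.log", "/path/to/app/log/puma.stderr.log", true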
I am attempting to run Celery as a Windows service using Supervisord. I followed the configuration laid out on the Celery site and [here][1]. I have set up a virtual environment to run supervisord through Cygwin. I have highlighted the lines I think are most important (with **). It appears supervisord and RabbitMQ are working; the problem is with Celery.
I set up the service with these commands:
$ cygrunsrv --install supervisord --path /usr/bin/python --args "/usr/bin/supervisord -n -c /usr/etc/supervisord.conf"
$ supervisord
UPDATED: I now have the following in my supervisord.log file:
2014-08-07 12:46:40,676 INFO exited: celery (exit status 1; not expected)
2014-08-07 12:47:07,187 INFO Increased RLIMIT_NOFILE limit to 1024
2014-08-07 12:47:07,238 INFO RPC interface 'supervisor' initialized
2014-08-07 12:47:07,251 INFO daemonizing the supervisord process
2014-08-07 12:47:07,253 INFO supervisord started with pid 7508
2014-08-07 12:47:08,272 INFO spawned: 'celery' with pid 8056
**2014-08-07 12:47:08,833 INFO success: celery entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)**
The config file is:
[inet_http_server] ; inet (TCP) server disabled by default
port=127.0.0.1:8072 ; (ip_address:port specifier, *:port for all iface)
username = user
password = 123
[supervisord]
logfile= /home/HBA/venv/logFiles/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
;user=HBA ; (default is current user, required if root)
childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP)
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=http://127.0.0.1:8072 ; use an http:// url to specify an inet socket
[program:celery]
command= celery worker -A runLogProject --loglevel=INFO ; the program (relative uses PATH, can take args)
directory= /home/HBA/venv/runLogProject
environment=PATH="/home/HBA/venv/;/home/HBA/venv/Scripts/"
numprocs=1
stdout_logfile= /home/HBA/venv/logFiles/%(program_name)s/worker.log ; stdout log path, NONE for none; default AUTO
stderr_logfile= /home/HBA/venv/logFiles/%(program_name)s/worker.log ; stderr log path, NONE for none; default AUTO
autostart=true ; start at supervisord start (default: true)
autorestart=true ; whether/when to restart (default: unexpected)
startsecs=0
stopwaitsecs=1000
killasgroup=true
My celery log file gives me:
**[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-4' pid:12284 exited with 'signal -1'
[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-3' pid:4432 exited with 'signal -1'
[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-2' pid:9120 exited with 'signal -1'
[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-1' pid:6280 exited with 'signal -1'**
C:\Python27\lib\site-packages\celery\apps\worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.
The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.
If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
You must only enable the serializers that you will actually use.
warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
[2014-08-07 19:47:08,822: WARNING/MainProcess] C:\Python27\lib\site-packages\celery\apps\worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.
The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.
If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
You must only enable the serializers that you will actually use.
warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
**[2014-08-07 19:47:08,944: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2014-08-07 19:47:08,954: INFO/MainProcess] mingle: searching for neighbors
[2014-08-07 19:47:09,963: INFO/MainProcess] mingle: all alone**
C:\Python27\lib\site-packages\celery\fixups\django.py:236: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2014-08-07 19:47:09,982: WARNING/MainProcess] C:\Python27\lib\site-packages\celery\fixups\django.py:236: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2014-08-07 19:47:09,982: WARNING/MainProcess] celery@CORONADO ready.
I solved my issue using the following command: /home/HBA/venv/Scripts/celery worker -A runLogProject --loglevel=INFO
My biggest issue was an unfamiliarity with virtual environments. I needed to make sure the files were in the correct folders within the venv.
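In supervisord terms, that presumably means pointing command= at the executable inside the venv rather than relying on PATH, something like (the path is the one from the command above):
command=/home/HBA/venv/Scripts/celery worker -A runLogProject --loglevel=INFO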
I have a Rails 4 app that is experiencing a problem with capistrano deployment. After each capistrano deploy, the web app hangs (http requests are on hold and the website cannot be reached via browser) for a long period of time, in some cases up to 20 or even 30 minutes.
I like to think the deploy is fairly standard. The general procedure is:
git push the repo to remote server
standard capistrano :publishing
Bundle install with capistrano/bundler
touch tmp/restart
And there's nothing tricky in the Capfile:
require 'capistrano/setup'
require 'capistrano/deploy'
require 'capistrano/bundler'
Running a cap server deploy completes successfully and returns in about 10 seconds. But after it's done, the server goes down and stays down ("freezes up") for a really long time, until finally it comes back and the new version is deployed.
I am able to ssh into the server (although the prompt is super laggy) while this is happening and I can see that all of its resources are being used up by ruby / Passenger RackApp:
top:
>top
PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
.. .. .. .. .. . 50.1 18.4 .. ruby
.. .. .. .. .. . 49.8 18.7 .. ruby
.. .. .. .. .. . 49.9 21.1 .. ruby
.. .. .. .. .. . 49.5 20.9 .. ruby
This server has two cores so that is all of its available CPU.
ps aux shows several instances of Passenger RackApp all churning away:
>ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user .... 11.0 18.3 .. .. .. .. .. 2:05 Passenger RackApp: /www/myRailsApp/current
user .... 12.1 18.3 .. .. .. .. .. 1:35 Passenger RackApp: /www/myRailsApp/current
user .... 12.1 20.3 .. .. .. .. .. 1:33 Passenger RackApp: /www/myRailsApp/current
user .... 14.7 21.0 .. .. .. .. .. 1:21 Passenger RackApp: /www/myRailsApp/current
user .... 5.6 12.5 .. .. .. .. .. 0:24 Passenger RackApp: /www/myRailsApp/current
user .... 5.8 6.6 .. .. .. .. .. 0:07 Passenger AppPreloader: /www/myRailsApp/current
user .... 0.7 7.6 .. .. .. .. .. 0:01 Passenger RackApp: /www/myRailsApp/current
Passenger-status looks generally like this:
>passenger-status
Version : 4.0.41
Date : 2014-07-23 15:25:11 +0000
Instance: 19086
----------- General information -----------
Max pool size : 6
Processes : 3
Requests in top-level queue : 0
----------- Application groups -----------
/www/myRailsApp/current#default:
App root: /www/myRailsApp/current
Requests in queue: 0
* PID: 3173 Sessions: 1 Processed: 1 Uptime: 3m 7s
CPU: 70% Memory : 426M Last used: 3m 7s ago
* PID: 3194 Sessions: 1 Processed: 1 Uptime: 3m 1s
CPU: 69% Memory : 361M Last used: 3m 0s ago
* PID: 3220 Sessions: 1 Processed: 1 Uptime: 2m 40s
CPU: 67% Memory : 349M Last used: 2m 39s ago
The logs (nginx, rails) do not show anything.
Versions:
Rails 4.1.0
nginx version: nginx/1.4.1
Passenger Version : 4.0.41
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
My main questions are:
What exactly is going on here?
How can I diagnose this more effectively?
How can I set up Capistrano so that it deploys quickly?
It appears that something in your Capistrano deploy causes your app processes (not Passenger itself) to do some heavy processing that takes 20 minutes. You can see that not only in the CPU usage, but also in the "Sessions: 1" label: it means that all those processes are, at that moment, busy handling a request.
The best way to diagnose this is to find out what your app is doing. Read the Phusion blog to learn how to do that. In particular, you can use the SIGQUIT trick to get a backtrace for your processes.
You can also run passenger-status --show=requests to see which requests your apps are stuck at.
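For example, picking one of the busy PIDs from the passenger-status output above (this assumes Passenger's default Ruby signal handling is in place; the backtrace should end up in the web server's error log):
sudo kill -QUIT 3173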
Problem:
I am trying to get Sphinx running again after a server reboot. There seems to be no sphinx.conf file when I try to start it:
>searchd
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
FATAL: no readable config file (looked in /etc/sphinxsearch/sphinx.conf, ./sphinx.conf).
I have run:
rake thinking_sphinx:configure
rake thinking_sphinx:index
rake thinking_sphinx:start
The problem is that, for some reason, no /etc/sphinxsearch/sphinx.conf file is being created... I am new to thinking_sphinx, and this might not be the only problem (with the site), but it doesn't seem to be fully set up. For output and more information, read below:
Background info:
I am working on a project I didn't set up initially. We rebooted the server to pick up some changes we made in a constants file, but after the reboot the project no longer displays when you navigate to the site. When you go straight to the IP address it just says "Welcome to nginx".
The port is open and working through our hosting provider, so I was told I have to restart some services. One of the issues I came upon was with thinking_sphinx. I referenced the Thinking Sphinx rake tasks page, as well as posts on common Sphinx configuration issues.
I set up the sphinx.yml development paths (we aren't using production). Then I ran
>rake thinking_sphinx:index
which seems to have worked even though it output some warnings:
Generating Configuration to /home/potato/streetpotato/config/development.sphinx.conf
(0.2ms) SELECT @@global.sql_mode, @@session.sql_mode;
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/home/potato/streetpotato/config/development.sphinx.conf'...
indexing index 'bar_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14080 kb
collected 249 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 249 docs, 32394 bytes
total 0.254 sec, 127298 bytes/sec, 978.49 docs/sec
indexing index 'bar_delta'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14080 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.003 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'bar'...
indexing index 'synonym_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 103 bytes
total 0.003 sec, 30356 bytes/sec, 884.17 docs/sec
indexing index 'synonym_delta'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.002 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'synonym'...
indexing index 'user_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 100 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 100 docs, 3146 bytes
total 0.013 sec, 239348 bytes/sec, 7608.03 docs/sec
skipping non-plain index 'user'...
total 11 reads, 0.000 sec, 3.8 kb/call avg, 0.0 msec/call avg
total 37 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg
Then I ran
>rake thinking_sphinx:configure
Generating Configuration to /home/potato/streetpotato/config/development.sphinx.conf
(0.2ms) SELECT @@global.sql_mode, @@session.sql_mode;
Lastly running:
>rake thinking_sphinx:start
Started successfully (pid 29623).
Now even though my log says:
[Fri Nov 16 19:34:29.820 2012] [29623] accepting connections
There is still no sphinx.conf file being generated and when I try to use the searchd command it still gives me the error...
>searchd --stop
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
FATAL: no readable config file (looked in /etc/sphinxsearch/sphinx.conf, ./sphinx.conf).
I am at a loss. I know this is super long, but only because I am so lost and trying to give as much information as possible. I got further than I did yesterday, but it still doesn't seem to be fully working. I might have to do more setup with Unicorn or Thin as well. I'm just trying to figure out how to get the site back up and running again... If anyone has run into similar issues with their site going down after a reboot and got it back up (specifically a Rails project on nginx with Unicorn or Thin, using Sphinx), any insight would be appreciated.
Thanks,
Alan
Calm down!! :-)
Firstly, you don't need a /etc/sphinxsearch/sphinx.conf file; that is just the default file that searchd tries to use when you don't specify any configuration file.
As your log output shows, your Rails application uses the /home/potato/streetpotato/config/development.sphinx.conf file when it starts the searchd process.
Run ps -fe | grep searchd on your dev machine; you should see something like this as the output:
501 14128 1 0 0:00.00 ttys004 0:00.00 searchd --pidfile --config /home/potato/streetpotato/config/development.sphinx.conf
501 14130 13546 0 0:00.00 ttys004 0:00.01 grep searchd
So the Rails app calls searchd with the --config /home/potato/streetpotato/config/development.sphinx.conf argument to point it at a different config file.
From your logs, it is clear that Thinking Sphinx is running fine. You can confirm it further by logging into the Rails console and running a search method on one of the models that have thinking_sphinx indexes defined on them.
E.g., if your app has an Article model as shown in the link above, the following will show all articles containing "National Parks":
$ rails console
> Article.search( "National Parks" )
=> [#<Article id: 15,... >, #<Article id: 22,...>,...]
The real problem is the application not coming up after restarting the server. That has nothing to do with Thinking Sphinx, which is running fine.
Try rolling back all the changes made in the constants file that you mention above, and make sure the application is working fine. Then start making the changes one by one and isolate the one change that breaks your application.
So yeah, this is a hole in Thinking Sphinx (IMHO) -- you can start the searchd server using the various rake tasks (which generate the config as needed)... but this doesn't work in production.
On a project I worked on last year (running on a Linux server) we created an /etc/init.d script to start searchd -- it takes options, including a path to the configuration file. We did our deploys with Capistrano and put generated code in app/shared -- a directory outside of the source tree. I believe there are some predefined Capistrano tasks that will rebuild the Rails-specific config files when models change or otherwise affect what Sphinx does (same as the rake task you mention).
This was one of those cases for us where we had been putting off site search for a long time, and one of our developers got it "all set up" in an afternoon. Getting it deployed took a lot more work.
(Just saw the answer from @prakash-murthy -- he provides some details of how to specify the config path when you start searchd. The trick is to have it start when the system starts, pointing at the config that Thinking Sphinx generates.)
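A minimal sketch of what such an init script ends up invoking (the shared config path is an assumption; point it at wherever your deploy writes the generated Sphinx config):
searchd --config /path/to/shared/config/production.sphinx.conf
searchd --config /path/to/shared/config/production.sphinx.conf --stop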
OK, so after a day and a half I finally set it all up and got it running (it was more than just Sphinx). I also had to get nginx and Unicorn up and running in the background, since we didn't have scripts set up to restart them when the server was rebooted...
When rebooting the server you have to restart some services before the app will be accessible:
1) thinking_sphinx
reference sites
http://pat.github.com/ts/en/rake_tasks.html
http://www.claytonlz.com/2010/09/thinkingsphinx-conf-problems/
a) create/modify app/config/sphinx.yml
development:
morphology: stem_en
port: 9312
bin_path: "/usr/bin" # set up the path to binary for searchd
searchd_binary_name: searchd
indexer_binary_name: indexer
#mem_limit: 128M
test:
morphology: stem_en
port: 9312
mem_limit: 128M
production:
morphology: stem_en
port: 9312
mem_limit: 512M
# the searchd ip, in case it's not on localhost
# address: 10.10.0.0
# this is by default included in db/sphinx
# searchd_file_path: "/path/to/shared/folder/sphinx"
b) rake thinking_sphinx:index
c) rake thinking_sphinx:configure # creates config/development.sphinx.conf, which defines Sphinx's indexing
d) # then you have to start Sphinx; there are two ways to do this
rake thinking_sphinx:start
rake thinking_sphinx:stop
OR
searchd
searchd --stop
# only the rake commands worked for me; when I tried to run searchd
# I got an error FATAL: no readable config file (looked in /etc/sphinxsearch/sphinx.conf, ./sphinx.conf).
# for some reason we don't have a sphinx.conf file, but the rake commands work without it
e) # once you start thinking_sphinx, check the log/searchd.log file for the line
[Fri Nov 16 19:34:29.820 2012] [29623] accepting connections
2) nginx
reference site:
http://wiki.nginx.org/CommandLine
a) check that nginx is up and running
i) start server
# to check where nginx resides type in this into server console
which nginx
# whatever path it gives you is how you start the server this is my path
/usr/sbin/nginx
ii) stop server
/usr/sbin/nginx -s stop # use the path given by which command
3) unicorn (starting app server)
reference site:
http://codelevy.com/2010/02/09/getting-started-with-unicorn.html
a) test if unicorn will run after previous changes
unicorn_rails -p 3000
# the site should now be up and running, check that it is
# console should now log the different actions you do on the site
b) create unicorn.rb in config folder (if none is there)
# only start this step if the step above got the site running
# close the console or exit the process you started above
# contents of unicorn.rb
worker_processes 2 # (starts 2 child processes, not strictly necessary)
preload_app true
timeout 30
listen 3000
after_fork do |server, worker|
ActiveRecord::Base.establish_connection
end
c) run unicorn in the background
# make sure you exited the process above before running this
unicorn_rails -c config/unicorn.rb -D
# this was giving me an error that it said was logged by stderr
# I got the command to run by adding a command to the front
http://stackoverflow.com/questions/2325152/check-for-stdout-or-stderr
exec 2> /dev/null unicorn_rails -c config/unicorn.rb -D
d) (optional) check stats from starting unicorn
i) pgrep -lf unicorn_rails
#sample output
5374 unicorn_rails master -c config/unicorn.rb -D
5388 unicorn_rails worker[0] -c config/unicorn.rb -D # not needed currently
5391 unicorn_rails worker[1] -c config/unicorn.rb -D # not needed currently
ii) cat tmp/pids/unicorn.pid # from inside the streetpotato folder
#sample output
5374
I have an m1.small instance in Amazon with 8 GB of disk space, on which my Rails application runs. It runs smoothly for two weeks and then crashes, saying the memory is full.
The app is running on Rails 3.1.1, Unicorn and nginx.
I simply don't understand what is taking 13 GB.
I killed Unicorn and the 'free' command shows some free space, while df still says 100%.
I rebooted the instance and everything started working fine.
free (before killing unicorn)
total used free shared buffers cached
Mem: 1705192 1671580 33612 0 321816 405288
-/+ buffers/cache: 944476 760716
Swap: 917500 50812 866688
df -l (before killing unicorn)
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8256952 7837520 4 100% /
none 847464 120 847344 1% /dev
none 852596 0 852596 0% /dev/shm
none 852596 56 852540 1% /var/run
none 852596 0 852596 0% /var/lock
/dev/xvda2 153899044 192068 145889352 1% /mnt
/dev/xvdf 51606140 10276704 38707996 21% /data
sudo du -hc --max-depth=1 (before killing unicorn)
28K ./root
6.6M ./etc
4.0K ./opt
9.7G ./data
1.7G ./usr
4.0K ./media
du: cannot access `./proc/27220/task/27220/fd/4': No such file or directory
du: cannot access `./proc/27220/task/27220/fdinfo/4': No such file or directory
du: cannot access `./proc/27220/fd/4': No such file or directory
du: cannot access `./proc/27220/fdinfo/4': No such file or directory
0 ./proc
14M ./boot
120K ./dev
1.1G ./home
66M ./lib
4.0K ./selinux
6.5M ./sbin
6.5M ./bin
4.0K ./srv
148K ./tmp
16K ./lost+found
20K ./mnt
0 ./sys
253M ./var
13G .
13G total
free (after killing unicorn)
total used free shared buffers cached
Mem: 1705192 985876 **719316** 0 365536 228576
-/+ buffers/cache: 391764 1313428
Swap: 917500 46176 871324
df -l (after killing unicorn)
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8256952 7837516 8 100% /
none 847464 120 847344 1% /dev
none 852596 0 852596 0% /dev/shm
none 852596 56 852540 1% /var/run
none 852596 0 852596 0% /var/lock
/dev/xvda2 153899044 192068 145889352 1% /mnt
/dev/xvdf 51606140 10276704 38707996 21% /data
unicorn.rb
rails_env = 'production'
working_directory "/home/user/app_name"
worker_processes 5
preload_app true
timeout 60
rails_root = "/home/user/app_name"
listen "#{rails_root}/tmp/sockets/unicorn.sock", :backlog => 2048
# listen 3000, :tcp_nopush => false
pid "#{rails_root}/tmp/pids/unicorn.pid"
stderr_path "#{rails_root}/log/unicorn/unicorn.err.log"
stdout_path "#{rails_root}/log/unicorn/unicorn.out.log"
GC.copy_on_write_friendly = true if GC.respond_to?(:copy_on_write_friendly=)
before_fork do |server, worker|
ActiveRecord::Base.connection.disconnect!
##
# When sent a USR2, Unicorn will suffix its pidfile with .oldbin and
# immediately start loading up a new version of itself (loaded with a new
# version of our app). When this new Unicorn is completely loaded
# it will begin spawning workers. The first worker spawned will check to
# see if an .oldbin pidfile exists. If so, this means we've just booted up
# a new Unicorn and need to tell the old one that it can now die. To do so
# we send it a QUIT.
#
# Using this method we get 0 downtime deploys.
old_pid = "#{rails_root}/tmp/pids/unicorn.pid.oldbin"
if File.exists?(old_pid) && server.pid != old_pid
begin
Process.kill("QUIT", File.read(old_pid).to_i)
rescue Errno::ENOENT, Errno::ESRCH
# someone else did our job for us
end
end
end
after_fork do |server, worker|
ActiveRecord::Base.establish_connection
worker.user('rails', 'rails') if Process.euid == 0 && rails_env == 'production'
end
I've just released the 'unicorn-worker-killer' gem. It lets you kill a Unicorn worker based on 1) the maximum number of requests served and 2) process memory size (RSS), without affecting in-flight requests.
It's really easy to use and requires no external tool. First, add this line to your Gemfile:
gem 'unicorn-worker-killer'
Then add the following lines to your config.ru:
# Unicorn self-process killer
require 'unicorn/worker_killer'
# Max requests per worker
use Unicorn::WorkerKiller::MaxRequests, 10240 + Random.rand(10240)
# Max memory size (RSS) per worker
use Unicorn::WorkerKiller::Oom, (96 + Random.rand(32)) * 1024**2
It's highly recommended to randomize the threshold to avoid killing all workers at once.
I think you are conflating memory usage and disk space usage. It looks like Unicorn and its children were using around 500 MB of memory; look at the second "-/+ buffers/cache:" number to see the real free memory. As far as the disk space goes, my bet is on some log file growing out of control. You should run du -h in the data directory to find out exactly what is using so much storage. As a final note, it is a little-known fact that Ruby never returns allocated memory back to the OS: it still uses it internally, but once Ruby grabs some memory, the only way to give the unused portion back to the OS is to quit the process. For example, if a request spikes your memory usage to 500 MB, you won't get that 500 MB back even after the request has completed and the GC has run. However, Ruby will reuse that allocated memory for future requests, so it is unlikely to grow much further.
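For example (assuming the /data mount from the df output is the directory in question):
sudo du -h --max-depth=1 /data | sort -h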
Finally, Sergei mentions God to monitor the process memory. If you are interested in using this, there is already a good config file here. Be sure to read the associated article as there are key things in the unicorn config file that this god config assumes you have.
As Preston mentioned, you don't have a memory problem (over 40% free); you have a disk-full problem. du reports that most of the storage is consumed in /root/data.
You could use find to identify very large files; e.g., the following will show all files under that directory greater than 100 MB in size:
sudo find /root/data -size +100M
If unicorn is still running, lsof (LiSt Open Files) can show which files are in use by your running programs or by a specific set of processes (-p PID), e.g.:
sudo lsof | awk '$5 ~/REG/ && $7 > 100000000 { print }'
will show you open files greater than 100MB in size
You can set up god to watch your Unicorn workers and kill them if they eat too much memory. The Unicorn master process will then fork another worker to replace it. Problem worked around. :-)
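god doesn't watch Unicorn's individual workers out of the box, so a real setup needs more than this, but a rough sketch of a memory-based restart condition (the paths and the 300 MB threshold are assumptions; adapt them to your app) looks like:
# unicorn.god (sketch)
rails_root = "/home/user/app_name"
God.watch do |w|
  w.name     = "unicorn"
  w.pid_file = "#{rails_root}/tmp/pids/unicorn.pid"
  w.start    = "cd #{rails_root} && bundle exec unicorn -c config/unicorn.rb -E production -D"
  w.stop     = "kill -QUIT $(cat #{rails_root}/tmp/pids/unicorn.pid)"
  w.interval = 30.seconds
  # Restart if memory stays above the threshold for 3 of the last 5 checks
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 300.megabytes
      c.times = [3, 5]
    end
  end
end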
Try removing New Relic from your app if you are using it. The newrelic_rpm gem itself was leaking memory. I had the same issue, and I scratched my head for almost 10 days to figure it out.
Hope that helps you.
I contacted the New Relic support team and below is their reply.
Thanks for contacting support. I am deeply sorry for the frustrating
experience you have had. As a performance monitoring tool, our
intention is "first do no harm", and we take these kind of issues very
seriously.
We recently identified the cause of this issue and have released a
patch to resolve it. (see https://newrelic.com/docs/releases/ruby). We
hope you'll consider resuming monitoring with New Relic with this fix.
If you are interested in doing so, make sure you are using at least
v3.6.8.168 from now on.
Please let us know if you have any addition questions or concerns.
We're eager to address them.
Even after updating the newrelic gem, it was still leaking memory. In the end I had to remove New Relic; it is a great tool, but we cannot use it at such a cost (a memory leak).
Hope that helps you.