Unicorn completely ignores USR2 signal - ruby-on-rails

I'm experiencing a rather strange problem with unicorn on my production server.
Although the config file states preload_app true, sending USR2 to the master process does not generate any response, and it seems like unicorn is ignoring the signal altogether.
On another server, sending USR2 puts the master process into an (old) state and successfully starts a new master process.
The problematic server is using RVM & bundler, so I'm assuming it's somehow related (the other one is vanilla ruby).
Sending signals other than USR2 (QUIT, HUP) works just fine.
Is there a way to trace what's going on behind the scenes here? Unicorn's log file is completely empty.

I suspect your issue might be that your Gemfile has changed, but you haven't started your unicorn in a way that allows USR2 to use the new Gemfile. It's therefore crashing when you try to restart the app.
Check your /log/unicorn.log for details of what might be failing.
If you're using Capistrano, specify the BUNDLE_GEMFILE as the symlink, e.g.:
run "cd #{current_path} && BUNDLE_GEMFILE=#{current_path}/Gemfile bundle exec unicorn -c #{config_path} -E #{unicorn_env} -D"
Here's a PR that demonstrates this.
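Alternatively, the same idea can live in the Unicorn config itself, so the re-exec triggered by USR2 always picks up the symlinked Gemfile. A minimal sketch (the /www/app/current path is an assumption about your layout):
# unicorn.rb
# Pin bundler to the symlinked Gemfile so a USR2 re-exec loads the
# current release's gems instead of the release Unicorn was started from.
before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "/www/app/current/Gemfile"
end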

I experienced a similar problem, but my logs clearly identified the issue: sending USR2 initially worked after deployments, but as old deployments got cleaned up, the release that the Unicorn master was originally started from was deleted, so sending a USR2 signal appeared to do nothing / fail, with the error log stating:
forked child re-executing... 53
/var/www/application/releases/153565b36021c0b8c9cbab1cc373a9c5199073db/vendor/bundle/ruby/1.9.1/gems/unicorn-4.3.1/lib/unicorn/http_server.rb:439:in
`exec': No such file or directory -
/var/www/application/releases/153565b36021c0b8c9cbab1cc373a9c5199073db/vendor/bundle/ruby/1.9.1/bin/unicorn
(Errno::ENOENT)
The Unicorn documentation mentions this potential problem at http://unicorn.bogomips.org/Sandbox.html: "cleaning up old revisions will cause revision-specific installations of unicorn to go missing and upgrades to fail", which in my case meant USR2 appeared to 'do nothing'.
I'm using Chef's application recipe to deploy applications, which creates a symlinked vendor_bundle directory that is shared across deployments, but calling bundle exec unicorn still resulted in the original Unicorn master holding a path reference that included a specific release directory.
To fix it I had to call bundle exec /var/www/application/shared/vendor_bundle/ruby/1.9.1/bin/unicorn to ensure the Unicorn master had a path to a binary that would be valid from one deployment to the next. Once that was done I could deploy to my heart's content, and kill -USR2 PID would work as advertised.
The Unicorn docs mention you can manually change the binary path reference by setting the following in the Unicorn config file and sending HUP to reload Unicorn before sending a USR2 to fork a new master: Unicorn::HttpServer::START_CTX[0] = "/some/path/to/bin/unicorn"
Perhaps this is useful to some people in similar situations, but I didn't implement this as it appears specifying an absolute path to the shared unicorn binary was enough.
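For reference, a minimal sketch of that START_CTX approach in the Unicorn config file, reusing the shared bundle path from above (send HUP to reload the config before the next USR2):
# unicorn.rb
# Point the re-exec at a binary path that survives release cleanup.
Unicorn::HttpServer::START_CTX[0] =
  "/var/www/application/shared/vendor_bundle/ruby/1.9.1/bin/unicorn"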

I've encountered a similar problem on my VDS. Strace'ing revealed the cause:
write(2, "E, [2011-07-23T04:40:27.240227 #19450] ERROR -- : Cannot allocate memory - fork(2) (Errno::ENOMEM) <...>
Try increasing the memory size or the Xen memory-on-demand limits (they were too strict in my case), or maybe turn on overcommit, though the latter may have some serious unwanted side effects, so do it carefully.

Related

How to debug and find the code that is blocking the main thread

I have a Rails app running version 6.1.1, and currently we use Ruby 2.7.2.
While trying to upgrade to Ruby 3, I'm facing an interesting issue: some code apparently is blocking the main thread. When I try to boot the server or run the tests, the console gets stuck and I can't even stop the process; I have to kill it.
I tracked it down to one gem called valvat, used to validate EU VAT numbers. I opened an issue on its GitHub repo, but the maintainer couldn't reproduce it even using the same Gemfile.lock I have, which led me to believe that it might not be just the gem; it must be something else in my code.
This is what happens when I try to boot the server:
=> Booting Puma
=> Rails 6.1.1 application starting in development
=> Run `bin/rails server --help` for more startup options
^C^C^C
As one can see, I can't even stop it; the thread is now hanging and I can't tell exactly what is blocking it.
I tried running the specs with -b -w to see what I could find, but got the same result: the thread hangs, and the warnings I get from Ruby are just generic ones like "method redefined" or something like that.
This is the last output from the console while running specs with -b -w before the thread hangs:
/Users/luiz/rails_dev/app/config/initializers/logging_rails.rb:18: warning: method redefined; discarding old tag_logger
/Users/luiz/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/activejob-6.1.1/lib/active_job/logging.rb:19: warning: previous definition of tag_logger was here
The thing is, I also get these warnings when I remove the gem and run the same command, yet the specs then run without issues.
Is there a way to track this down to whatever is causing the thread to hang?
If you have no error message, it's hard to understand where exactly your application hangs.
As the developer cannot reproduce it with your Gemfile.lock, it's possible that one of your config files or initializers is the culprit. You could create a new blank application with the same Gemfile, add your config and initializer files one by one, and test each time whether the server runs until you find which one is causing the freeze.
Also test your application with another Ruby version (are you using rbenv or RVM?).
Also check your syslog: since valvat calls a web service, you may find connection errors there. Check this doc on how to access your syslog: https://www.howtogeek.com/356942/how-to-view-the-system-log-on-a-mac/
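If bisecting initializers doesn't narrow it down, you can also try to catch the hang in the act: drop a temporary watchdog thread into an initializer that dumps every thread's backtrace if boot hasn't finished in time, then look at what the main thread is sitting on. A rough sketch (the file name and the 30-second threshold are arbitrary assumptions):
# config/initializers/zz_hang_watchdog.rb  (hypothetical file; remove after debugging)
Thread.new do
  sleep 30  # boot should normally be done well before this
  Thread.list.each do |thread|
    backtrace = thread.backtrace || ["<no backtrace available>"]
    warn "=== #{thread.inspect} ==="
    warn backtrace.join("\n")
  end
end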

Running filewatcher as a separate process

I am still very new to Ruby, so I hope you can help. I have a Ruby on Rails app that needs to watch a specific directory, "Dir A", to which I keep adding txt files. Once a new file appears, it needs to be processed into a CSV file, which is placed in a tmp directory before being attached to a user and removed from tmp once the file goes into ActiveStorage, while the original txt file is kept in "Dir A" for a limited amount of time.
I am using the filewatcher gem to watch "Dir A"; however, I need it to run on server startup and continue to run in the background. I understand I need to daemonize the process, but how can I do that from *.rb files rather than from the terminal?
At the moment I am using threads, but I am not sure that's the best solution...
I also have the following issues:
- how do I process files that appeared in the folder before the server started up?
- filewatcher does not seem to pick up another new file while it's still processing the previous one, and threads don't seem to help with that
- what would you recommend as the best way to keep track of processed files: a database, renaming/copying files into a different folder, global variables, or something else? I have to know which files are processed so I don't repeat the work, especially since I need to schedule filewatcher restarts due to its declining performance (the filewatcher documentation states it is best to restart the process if it has been long-running)
I'm sorry to bombard you with questions, but I need some guidance. Maybe there's a better gem I've missed; I looked at the guard gem, but I'm not entirely sure how it works and filewatcher seemed simpler.
This question should probably be split into two, one about running filewatcher as a background process, and another about managing processed files, but as far as filewatcher goes, a simple solution would be to use the foreman gem with a Procfile.
You can start your Rails app in one process and filewatcher in another with a Procfile in the root of your app, like this:
# Procfile
web: bundle exec puma -t 5:5 -p ${PORT:-3000} -e ${RACK_ENV:-development}
filewatcher: filewatcher "**/*.txt" "bundle exec rake process_txt_files"
and move whatever processing needs to be done into a rake task. With this you could just run foreman start locally to start both processes, and if your production server supports Procfiles (like Heroku, for example), this makes it easy to do the same in production.
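The rake task itself can also cover the other two concerns (files that existed before startup, and tracking what has been processed), since each run just scans the directory and skips anything already handled. A sketch, where the directory path, TxtToCsvConverter, and the ProcessedFile tracking model are all assumptions about your app:
# lib/tasks/process_txt_files.rake
desc "Convert any unprocessed txt files from Dir A into CSVs"
task process_txt_files: :environment do
  Dir.glob(Rails.root.join("dir_a", "*.txt")).each do |path|
    name = File.basename(path)
    next if ProcessedFile.exists?(original_name: name)  # skip files already handled
    TxtToCsvConverter.new(path).call                    # your existing conversion logic
    ProcessedFile.create!(original_name: name)          # record it so reruns and restarts are safe
  end
end
Running the task once at boot (or simply letting filewatcher's first trigger call it) takes care of files that appeared while nothing was watching.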

Memory leak with delayed_job: what's a better alternative on Rails?

I send mails with a delayed_job worker.
The web app runs on an EC2 instance with 2 GB of RAM; somehow, the instance runs out of memory a while after booting.
I guess the root cause is delayed_job.
What's a good alternative for that?
Can I send the mail in Thread.new, so the user won't be blocked while the email is sent?
Here's how I run the servers and worker on boot:
every :reboot do
  command " cd #{PROJECT} ; git pull origin develop "
  command " cd #{PROJECT} ; memcached -vv "
  command " cd #{PROJECT} ; bundle exec rake Delayed::Backend::Mongoid::Job.create_indexes "
  command " cd #{PROJECT} ; bundle exec rake jobs:work "
  command " cd #{PROJECT} ; bundle exec puma config/puma.rb"
  command " cd #{PROJECT} ; ruby app_periodic_tasks.rb"
end
Try Sidekiq, which seems much more solid.
There are a solid number of background processing tools in the Rails world and Sidekiq and Resque would be the top ones on that list for me.
I've been using Sidekiq for the last 3 years at work and it's a phenomenal tool that has done wonders for our large application.
We use it on two worker boxes: one handles assembling, building, storing, and sending e-newsletters and the other handles our application's members (uploading lists, parsing them, validating/verifying emails, and the like).
The assembly worker never runs into memory issues. Once or twice a month, I'll restart Sidekiq just to freshen it up but that's about it.
The member manager worker, which does far less than the assembly worker, requires its instance of Sidekiq to be restarted every few days. I end up with a Sidekiq worker/process that eats up a lot of memory and forces me to restart the service.
Sidekiq, however, isn't really the problem. It's my code ... there's a lot of I/O involved in managing members and I am certain that it's something I'm not doing right versus the tool itself.
So, the long and short of it for me is this: your background tool is important, but what you're doing in the code, both in the worker and elsewhere in the app, matters more (also check that memcached and caching aren't filling up your RAM, etc.).
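For the original question, moving the mail delivery into a Sidekiq worker instead of Thread.new keeps the request fast without tying delivery to the web process's lifetime. A minimal sketch, where the worker name, queue name, and UserMailer.welcome_email are assumptions about your app:
# app/workers/mail_delivery_worker.rb
class MailDeliveryWorker
  include Sidekiq::Worker
  sidekiq_options queue: :mailers, retry: 3

  def perform(user_id)
    user = User.find(user_id)
    UserMailer.welcome_email(user).deliver_now  # assumes an existing mailer method
  end
end

# enqueue from a controller or model instead of sending inline:
# MailDeliveryWorker.perform_async(user.id)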

Rails.root points to the wrong directory in production during a Resque job

I have two jobs that are queued simultaneously, and one worker runs them in succession. Both jobs copy some files from the builds/ directory in the root of my Rails project and place them into a temporary folder.
The first job always succeeds, never have a problem - it doesn't matter which job runs first either. The first one will work.
The second one receives this error when trying to copy the files:
No such file or directory - /Users/apps/Sites/my-site/releases/20130829065128/builds/foo
That releases folder is two weeks old and should not still be on the server. It is empty, housing only a public/uploads directory and nothing else. I have killed all of my workers and restarted them multiple times, and have redeployed the Rails app multiple times. When I delete that releases directory, it gets created again.
I don't know what to do at this point. Why would this worker always create/look in this old releases directory? Why would only the second worker do this? I am getting the path by using:
Rails.root.join('builds'). Rails.root is apparently a two-week-old Capistrano release? I should also mention this only happens in the production environment. What can I do?
Resque is not being restarted (stopped and started) on deployments, which causes old versions of the code to keep running. Each worker continues to service the queue, resulting in strange errors or behaviors.
Based on the path name it looks like you are using Capistrano for deploying.
Are you using the capistrano-resque gem? If not, you should give that a look.
I had exactly the same problem and here is how I solved it:
In my case the problem was how Capistrano handles the PID files, which record which workers currently exist. These files are normally stored in tmp/pids/. You need to tell Capistrano NOT to store them in each release folder, but in shared/tmp/pids/. Otherwise Resque does not know which workers are currently running after you make a new deployment: it looks into the new release's pids folder, finds no files, and therefore assumes that no workers exist which need to be shut down. Resque just creates new workers, and all the old workers still exist, but you cannot see them in the Resque dashboard. You can only see them if you check the processes on the server.
Here is what you need to do:
Add the following lines to your deploy.rb (by the way, I am using Capistrano 3.5):
append :linked_dirs, ".bundle", "tmp/pids"
set :resque_pid_path, -> { File.join(shared_path, 'tmp', 'pids') }
On the server, run htop in the terminal and then press T to see all the processes that are currently running. It is easy to spot the resque worker processes; you can also see the release folder's name attached to them.
You need to kill all the worker processes by hand. Exit htop and run the following command to kill all resque processes (I like to have it completely clean):
sudo kill -9 `ps aux | grep [r]esque | grep -v grep | cut -c 10-16`
Now you can make a new deploy. You also need to start the resque-scheduler again.
I hope that helps.

Restarting Unicorn with USR2 doesn't seem to reload production.rb settings

I'm running unicorn and am trying to get zero downtime restarts working.
So far it is all awesome sauce: the master process forks and starts 4 new workers, then kills the old one, and everyone is happy.
Our scripts send the following command to restart unicorn:
kill -s USR2 `cat /www/app/shared/pids/unicorn.pid`
On the surface everything looks great, but it turns out unicorn isn't reloading production.rb. (Each time we deploy we change the config.action_controller.asset_host value to a new CDN container endpoint with our pre-compiled assets in it).
After restarting unicorn in this way the asset host is still pointing to the old release. Doing a real restart (ie: stop the master process, then start unicorn again from scratch) picks up the new config changes.
preload_app is set to true in our unicorn configuration file.
Any thoughts?
My guess is that your unicorns are being restarted in the old production directory rather than the new production directory -- in other words, if your working directory in unicorn.rb is <capistrano_directory>/current, you need to make sure the symlink happens before you attempt to restart the unicorns.
This would explain why stopping and starting them manually works: you're doing that post-deploy, presumably, which causes them to start in the correct directory.
When in your deploy process are you restarting the unicorns? You should make sure the USR2 signal is being sent after the new release directory is symlinked as current.
If this doesn't help, please gist your unicorn.rb and deploy.rb; it'll make it a lot easier to debug this problem.
Keep in mind that your working directory in unicorn.rb should be:
/your/cap/directory/current
and NOT:
File.expand_path("../..", __FILE__)
because Unicorn does not handle the symlink well when it forks and re-executes. For example:
cd current  # current is a symlink to another directory
When Unicorn resolves its working directory, it gets the absolute path of the release, not the path through "current", so a USR2 re-exec keeps running from the old release instead of the new one.
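A minimal unicorn.rb sketch of that setup (the paths are taken from the question's pid file location and are otherwise assumptions):
# unicorn.rb
working_directory "/www/app/current"        # the Capistrano symlink, not a release path
pid "/www/app/shared/pids/unicorn.pid"
preload_app true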
