What causes a 'deadlock; recursive locking' error in a Rails app?

My rails app tracks any delayed_job errors, and we saw this one today for the first time:
deadlock; recursive locking /app/vendor/bundle/ruby/1.9.1/gems/delayed_job-3.0.5/lib/delayed/worker.r
The app has been performing flawlessly, with millions of delayed jobs handled w/o error.
Is this just "one of those random things" or is there something different we can/should do to prevent it from happening again?
I'm especially confused because we run only a single worker.
Our setup: Rails 3.2.12, Heroku app, Postgres, several web dynos but only 1 worker dyno.

This is an issue with Rack. See similar bug reports:
https://github.com/rack/rack/issues/658
https://github.com/rack/rack/issues/349
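For what it's worth, "deadlock; recursive locking" is the ThreadError Ruby's Mutex raises when the same thread tries to take a lock it already holds, which is roughly what the Rack issues above are about. A minimal reproduction of the message itself (not your app's code):

# Minimal reproduction of the error message, not delayed_job/Rack code:
# a non-reentrant Mutex raises when the thread that holds it locks it again.
mutex = Mutex.new
mutex.lock
begin
  mutex.lock # same thread locks a second time
rescue ThreadError => e
  puts e.message # => "deadlock; recursive locking"
end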

I had the same issue. The fix was to upgrade RubyGems. The command I used to upgrade:
gem update --system
Ref: https://github.com/pry/pry/issues/2137#issuecomment-720775183

Related

How to debug and find the code that is blocking the main thread

I have a Rails app running version 6.1.1, and currently we use Ruby 2.7.2.
While trying to upgrade to Ruby 3, I'm facing an interesting issue: some code apparently is blocking the main thread. When I try to boot the server or run the tests, the console gets stuck and I can't even stop the process; I have to kill it.
I tracked it down to one gem called valvat, used to validate EU VAT numbers. I opened an issue on its GitHub repo, but the maintainer couldn't reproduce it even using the same Gemfile.lock I have, which led me to believe that it might not be just the gem; it must be something else in my code.
This is what happens when I try to boot the server:
=> Booting Puma
=> Rails 6.1.1 application starting in development
=> Run `bin/rails server --help` for more startup options
^C^C^C
As one can see, I can't even stop it; the thread is now hanging and I can't tell exactly what is holding it up.
I tried to run the specs with -b -w to see what I could find, but got the same result: the thread hangs, and the warnings I get from Ruby are just generic ones like 'method redefined' or similar.
This is the last output from the console while running specs with -b -w before the thread hangs:
/Users/luiz/rails_dev/app/config/initializers/logging_rails.rb:18: warning: method redefined; discarding old tag_logger
/Users/luiz/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/activejob-6.1.1/lib/active_job/logging.rb:19: warning: previous definition of tag_logger was here
The thing is, I also get these warnings when I remove the gem and run this command, though the specs then run without issues.
Is there a way to track this down to whatever is causing the thread to hang?
If you have no error message, it's hard to understand where exactly your application hangs.
As the developer cannot reproduce with your Gemfile.lock, it's possible that one of your config files or initializers is the culprit. You could create a new blank application with the same Gemfile, add your config and initializer files one by one, and test each time whether the server runs, until you find which one is causing the freeze.
Test your application with another Ruby version, too (are you using rbenv or RVM?).
Also check your syslog: as valvat calls a web service, you may find connection errors there. Check this doc on how to access your syslog: https://www.howtogeek.com/356942/how-to-view-the-system-log-on-a-mac/
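One more idea, since there is no error message to go on: dump every thread's backtrace from a signal handler while the process hangs. This is only a minimal sketch; the initializer path and the choice of USR1 are just examples, so adapt as needed and remove it after debugging.

# config/initializers/debug_hang.rb (example path; remove after debugging)
# While the process hangs, run `kill -USR1 <pid>` to print every thread's backtrace.
Signal.trap("USR1") do
  Thread.list.each do |thread|
    puts "Thread #{thread.object_id} (#{thread.status}):"
    puts (thread.backtrace || []).join("\n")
    puts "-" * 40
  end
end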

Why does Rails console create so many Ruby processes?

I experimented with a Rake task run by Cron. I started with no Ruby processes; then the Cron job started and spawned one process, which showed up in my process list as expected.
I wanted to check whether any records were being written to the database, so I ran rails c to enter the Rails console, and noticed that suddenly four other Ruby processes showed up in my process list. Why would this happen? I would expect running the console to create one other process, not four.
After quitting the console, I am left with three Ruby processes including the Rake task that is running.
I am using Rails 4.2.
It's not that I find this to be problematic, but I am curious why there would need to be more than one process for a REPL and then two leftover processes after the REPL is closed.
This is because of Spring, which has shipped with Rails by default for a while now.
You might notice that the second time you run rails c it is a lot faster than the first time. This is because the first time you run a springified script, your app is loaded as normal and then forked to run what you requested. The second time around, that loader process can just fork again, so you get a much faster startup. This is possible because of the extra processes you have noticed.
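The mechanism is a preload-and-fork trick. Here is a tiny runnable sketch of the idea only; it is not Spring's actual code, and the sleep simply stands in for loading the Rails environment.

# Sketch of preload-and-fork (not Spring's real implementation)
puts "preloading app... (slow, happens once)"
sleep 2 # stand-in for loading the whole Rails environment

3.times do |i|
  pid = fork do
    # Each "command" (rails c, rake, specs) runs in a cheap fork of the
    # already-loaded process, so it starts almost instantly.
    puts "command #{i} running in forked process #{Process.pid}"
  end
  Process.wait(pid)
end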
You can see them by running
spring status
And you can kill them by running
spring stop

Heroku gem versioning issue

I've been working on an application that runs on Heroku for a while, and occasionally I run into a funny issue where my background workers start failing. It's as if the background workers have an old version of the gem, which used to talk to an HTTP API and was switched to hit an HTTPS endpoint. The gem that's causing problems is written by me and is pulled from GitHub with the following line in my Gemfile:
gem 'stubhub', github: 'Zanfa/stubhub'
From my logs, I can see that I'm getting 403s because it tries to hit the non-HTTPS URL (which no longer exists anywhere in the gem). However, when I open the console with heroku run rails c and run the job from there, it never has the same issue.
I've also tried heroku run bundle list and bundle list to compare whether there's a mismatch in versions, but both always report the current version, 0.0.23.
And to make things more interesting, this doesn't always happen. There's only about a 20% chance that it will start hitting the non-HTTPS endpoint, and doing heroku restart usually fixes it, but it will pop up again within a couple of pushes.
What you need to remember is that every time you do heroku run, you're instantiating your slug into a new dyno, not interacting with your existing dynos. Googling will show different people dealing with Bundler caching issues, like http://artsy.github.io/blog/2013/01/15/debugging-bundler-issues-with-heroku/
My advice is to push a version of your repo with an empty Gemfile (to clear out the cache) and then push your normal repo again.
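If you want to rule out the Git-sourced gem resolving to different code on different dynos, you could also pin it to an explicit ref in the Gemfile so every build uses the exact same revision. The SHA below is only a placeholder.

# Gemfile -- pin the GitHub-sourced gem to an exact commit (placeholder SHA)
gem 'stubhub', github: 'Zanfa/stubhub', ref: 'abc1234'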

delayed_job dies without error -- leaving job in locked state

After DJ dies, the log files indicate nothing.
running: ./script/delayed_job status
gives: pid-file for killed process 1143 found (/appPath/tmp/pids/delayed_job.pid), deleting.
delayed_job: no instances running
The strange thing is that if I use ./script/delayed_job run, it runs perfectly in the foreground and never dies!
I've tried many versions of delayed_job and mongoid with the same results.
Does anyone know how to debug this?
Using:
rails (3.2.7)
delayed_job_mongoid (2.0.0)
mongoid (3.0.3)
delayed_job (3.0.3)
It turns out delayed_job was executing a job that caused a segmentation fault, which would kill the delayed_job daemon.
After debugging, it turns out Random.rand() will cause a reproducible segmentation fault when run in a daemonized environment. This has to do with the initial seeding and setup of the random generator, which apparently does not get handled properly by daemonize.
The solution: Random.new.rand()
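In practice that just means giving the daemonized code its own generator instance instead of the implicit global one. A minimal before/after sketch; the job class below is only illustrative, not the actual job that crashed.

# Illustrative only: this job class is hypothetical.
class RetryDelayJob
  def perform
    # Before (segfaulted for the poster when run under the daemonized worker):
    # delay = Random.rand(60)

    # After: use an explicitly constructed generator instead of the global one.
    delay = Random.new.rand(60)
    sleep(delay)
  end
end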
I'm wondering if the weird behaviour in this Stack Overflow DJ log question could account for the behaviour you had. The answer there looks plausible too. Stranger things have happened.
Pt 2:
Permission issues could very well be mucking things up too. Is this in production or dev? Does it work in dev?
Pt 3: From the GitHub page of DJM: make sure you are using MongoDB version 1.3 or newer. Are you?
Pt 4: And this? script/rails runner 'Delayed::Backend::Mongoid::Job.create_indexes'
Lastly, as of today DJM is running red on Travis, with some errors that may affect you. I had a shoddy build in a gem once, driving me to drink, only for it to be fixed 2 days later. http://travis-ci.org/#!/collectiveidea/delayed_job_mongoid/jobs/1962498
If that isn't it, add pry to the Gemfile and add binding.pry to that script, starting at the top and working down.
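A minimal version of that suggestion; the file placement is just an example, and you would move the breakpoint down through the script until the crash is bracketed.

# Gemfile
gem 'pry'

# Then near the top of script/delayed_job (or wherever the worker is started):
require 'pry'
binding.pry # drops you into an interactive session before the daemon forks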

Avoiding deployment gem-fail downtime while using Passenger, bundler & git-based deploys

I'm deploying a Rails 3 app on Passenger 3.0.7 using capistrano and git-based deployment, similar to GitHub's setup:
https://github.com/blog/470-deployment-script-spring-cleaning -- this means the app operates entirely out of one directory, with no /releases/123456 and symlink switching.
If we've added any gems, our app starts throwing 500 errors during deployment, during the bundle:install phase but before deploy:restart. The code has been updated, and it seems like Passenger is already starting to use it while the required gems can't be found yet.
This is not caused by new workers being spun up, as I've tried setting the Passenger idle_time to 0 and max_instances and min_instances to the same value, so that workers are never spun down.
Running on Linux with ruby-ee 1.8.7-2011.03. Sample error from Passenger: https://gist.github.com/54794e85d2c799e4f697
I've also considered doing "two-directory" git-based deployment as a hack -- swapping in the new code once the bundle is complete. Ideas welcome.
Go with the two-directory deployment. Apart from avoiding the 500s during deployment, this will also act as a safety net if you need to roll back during or after deployment.
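A minimal sketch of the idea, with example paths and a placeholder repo URL: build the new release next to the live one, bundle there, and only then swap the symlink that Passenger's docroot points at, so requests never see new code with missing gems.

# deploy_switch.rb -- illustrative sketch only; paths, repo URL, and flags are examples
deploy_root = '/var/www/myapp'                           # example path
release_dir = File.join(deploy_root, 'releases', Time.now.strftime('%Y%m%d%H%M%S'))
current     = File.join(deploy_root, 'current')          # Passenger serves current/public

system("git clone --depth 1 git@github.com:me/myapp.git #{release_dir}") or abort('clone failed')
system("cd #{release_dir} && bundle install --deployment --path vendor/bundle") or abort('bundle failed')

# Only after bundling succeeds does the live symlink move; ln -sfn replaces
# the link in one step, then touching tmp/restart.txt tells Passenger to restart.
system("ln -sfn #{release_dir} #{current}") or abort('symlink failed')
system("touch #{File.join(current, 'tmp', 'restart.txt')}")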
