I'm running a Rails 2 app on Unicorn and seeing weird behavior. When I first start the server, the first request is pretty fast (< 3 sec for the entire page load). Every subsequent request takes 30-45 seconds or more. When I tail the logs, nothing is logged for the first 30+ seconds; then the page suddenly loads and the log output shows up. It's as if the request just hangs for over 30 seconds.
When I run the same code using script/server or thin it works fast and as expected.
Currently we are running Unicorn on Heroku without these problems. I'm in the process of migrating to AWS, and that's where I'm running into the problems. I'm also running into the same issue on my local VM when running Unicorn.
Here is my configuration.
bundle exec unicorn -c ./config/unicorn.rb
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
timeout 90
preload_app true
listen (ENV['PORT'] || 8000), :backlog => Integer(ENV['UNICORN_BACKLOG'] || 20)
before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end
  # AR
  ActiveRecord::Base.connection.disconnect!
  # Redis
  begin
    Timeout.timeout(2) { REDIS.quit }
  rescue Exception
    # This likely means the redis service is down
  end
end
after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end
  # AR
  ActiveRecord::Base.establish_connection
  # Redis
  uri = URI.parse(ENV["REDIS_URL"])
  REDIS = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password)
  # Dalli
  Rails.cache.reset
  # Mongoid
  Mongoid::Setup.run
end
UPDATE
I've narrowed it down to something preventing the worker from completing. I have a timeout of 30 seconds and even if the request only takes 1 sec to complete it will not accept any more requests until the timeout occurs. Are there any good ways to figure out what is preventing the worker from becoming available?
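One way to narrow this down further is to log when a worker picks up a request and when it hands the response back. Below is a minimal sketch of a Rack-style middleware for that, assuming the app is Rack-based (Rails 2.3 or later). `RequestTimer` is a hypothetical name, not something from Unicorn or Rails.

```ruby
# Hypothetical helper: logs the PID plus the start and end of every request,
# so a stall can be placed before, inside, or after the application code.
class RequestTimer
  def initialize(app, out = $stderr)
    @app = app
    @out = out
  end

  def call(env)
    started = Time.now
    @out.puts "[pid #{Process.pid}] start #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    @app.call(env)
  ensure
    # Runs even when the app raises, so every "start" line gets a "done" line.
    @out.puts format('[pid %d] done in %.3fs', Process.pid, Time.now - started)
  end
end
```

If the "done" line shows up after about a second but the worker still refuses new requests until the timeout, the stall is happening after the app has returned. If the "start" line itself shows up late, the delay is in front of the app. In a Rails 2.3 app this could probably be registered with `config.middleware.use RequestTimer`, but treat that wiring as an assumption to verify.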
Related
I have been troubleshooting memory usage of about 550 MB on a Heroku Rails app running on Unicorn, which is causing response times of around 2,000 ms.
I looked at my New Relic graphs and realized I am running two instances, but I only have 1 worker and I am only running 1 dyno (Hobby). I don't understand why there are two instances! It seems like I am accidentally using a "ghost" instance. This only happens when I use Unicorn, not on Puma.
Edit: I added a worker to see what would happen with 2 workers. According to New Relic this resulted in 3 running instances, so the count is not doubled; there is just one extra ghost instance.
Once every 10 minutes I run a short scheduled task, which can be seen in the graphs. ENV["WEB_CONCURRENCY"] is not set, by the way.
# Unicorn.rb:
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 1)
timeout 15
preload_app true
before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end
after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end
# Proc
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
Running heroku ps gives:
heroku ps
=== web (Hobby): bundle exec unicorn -p $PORT -c ./config/unicorn.rb (1)
web.1: up 2018/01/12 11:34:08 +0100 (~ 5h ago)
Is this behavior to be expected or am I doing something terribly wrong here? What could cause the second instance to run? Is it possible to accidentally start two versions of the app on boot?
I removed the scheduler, some gems and the Dalli cache, and this removed the extra/ghost instance. Then I put them back one after another, but it stayed at one instance. In other words, exactly the same setup that previously showed two instances is now down to one (which makes the most sense).
The memory consumption remains the same, so unfortunately I will put this down to a New Relic bug.
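One other possible reading of the numbers (1 worker showing 2 instances, 2 workers showing 3): a preforking server like Unicorn runs a master process in addition to its workers, so N workers means N + 1 Ruby processes, and with preload_app true the master loads the app as well. Whether New Relic was counting the master here is an assumption, not something confirmed above. The sketch below only illustrates the process arithmetic with plain fork; `prefork_pids` is a made-up helper and has nothing to do with Unicorn's internals.

```ruby
# Fork `worker_count` children the way a preforking server does and
# collect every PID involved, including the parent ("master").
def prefork_pids(worker_count)
  reader, writer = IO.pipe
  children = Array.new(worker_count) do
    fork do
      reader.close
      writer.puts Process.pid   # each child reports its own PID
      writer.close
      exit!(0)                  # leave without running at_exit hooks
    end
  end
  writer.close
  worker_pids = reader.read.split.map(&:to_i)
  reader.close
  children.each { |pid| Process.wait(pid) }
  { master: Process.pid, workers: worker_pids.sort }
end

result = prefork_pids(1)
puts "workers=#{result[:workers].size} total processes=#{1 + result[:workers].size}"
```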
We have a pretty complex Rails setup on Heroku.
In a typical day we have about 10 Web Dynos with Unicorn (each 2x and running 3 Unicorn workers) and 15 Worker Dynos running Delayed Jobs, though it fluctuates, so we use hirefire to scale up and down when we can to save on cost. Our Postgres database allows for 400 connections.
Last week I finally got fed up with the Delayed::Job queue we had been using for several years. We have a series of jobs that run every 10 minutes, and it got to the point where running them all took more than 10 minutes, so the queue would get backed up. I decided to move over to Sidekiq, as I had had some success with it in the past.
It is working decently well so far, though I am finding our web dynos to be way less consistent. For example, here is our New Relic graph of a 3-hour period yesterday:
But here's what the exact same time period looked like the week before:
Basically, before Sidekiq, our jobs didn't seem to be affecting our web dynos at all, but now they are. My only guess is that when our every-10-minutes jobs run, they temporarily overwhelm our Postgres connections, which slows down the web dynos. It's the only way I can imagine the jobs would affect the web.
Any thoughts on how to keep these a bit more separate, or so they are affecting each other less, and our web response time more consistent?
Here's our sidekiq.yml:
---
:concurrency: 5
production:
  :concurrency: <%= ENV['WORKER_POOL'] || 15 %>
:queues:
  - [instant, 3]
  - [fetchers, 2]
  - [mailers, 1]
  - [fetch_all, 1]
  - [moderation, 1]
  - [default, 1]
  - [reports, 1]
  - [images, 1]
  - [slack, 1]
And our sidekiq.rb
require 'sidekiq'
Sidekiq.configure_server do |config|
  database_url = ENV['DATABASE_URL']
  if database_url
    pool = ENV['WORKER_POOL'] || 15
    new_database_url = "#{database_url}?pool=#{pool}"
    ActiveRecord::Base.establish_connection(new_database_url)
  end
end
Sidekiq.default_worker_options = { retry: 1 }
We are overwriting the pool setting on the db for the sidekiq worker instances so we can take full advantage of the concurrency.
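A side note on the pool override: appending "?pool=..." with string interpolation only works while DATABASE_URL has no query string of its own. Here is a small sketch of a more defensive variant using only Ruby's standard URI library. `with_pool` is a hypothetical helper name, and the URLs in the usage line are made-up examples.

```ruby
require 'uri'

# Hypothetical helper: set (or replace) the pool parameter on a database URL
# while keeping any query parameters that are already present.
def with_pool(database_url, pool)
  uri = URI.parse(database_url)
  params = URI.decode_www_form(uri.query || '')
  params.reject! { |key, _value| key == 'pool' }
  params << ['pool', pool.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts with_pool('postgres://user:secret@db.example.com:5432/myapp?sslmode=require', 15)
```

The result could then be passed to ActiveRecord::Base.establish_connection in place of new_database_url.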
And our database.yml
production:
  database: myapp_production
  adapter: postgresql
  encoding: unicode
  pool: 5
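To make the connection math concrete, here is a rough worst-case budget using the numbers from this post: 10 web dynos with 3 Unicorn workers each and pool 5, plus 15 worker dynos with a Sidekiq pool of 15, against a 400-connection limit. It assumes one Sidekiq process per worker dyno, and the dyno counts fluctuate with hirefire, so treat it as a sketch rather than a measurement.

```ruby
# Upper bound on database connections for one group of dynos:
# every process can open at most `pool_per_process` connections.
def max_connections(dynos, processes_per_dyno, pool_per_process)
  dynos * processes_per_dyno * pool_per_process
end

web     = max_connections(10, 3, 5)   # web dynos x Unicorn workers x pool: 5
sidekiq = max_connections(15, 1, 15)  # worker dynos x 1 Sidekiq process (assumed) x WORKER_POOL
total   = web + sidekiq

puts "web=#{web} sidekiq=#{sidekiq} total=#{total} limit=400"
```

Unicorn workers handle one request at a time, so in practice each usually holds a single connection, well under the web upper bound. The Sidekiq side is different: 15 threads per process can all hit Postgres at once. If the old Delayed Job setup ran one single-threaded worker per dyno (an assumption), the every-10-minutes burst has gone from roughly 15 concurrent connections to as many as 225.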
And our unicorn.rb
worker_processes 3
timeout 30
preload_app true
listen ENV['PORT'], backlog: Integer(ENV['UNICORN_BACKLOG'] || 200)
before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end
after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end
And our Procfile:
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
redis: redis-server
worker: bundle exec sidekiq -e production -C config/sidekiq.yml
Our hirefire managers are set up like
Web:
Workers:
Any suggestions?
It remains to be seen, but a very promising fix seems to have been turning off prepared_statements for the sidekiq workers in my config/database.yml:
default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5
  prepared_statements: <%= !Sidekiq.server? %>
I have a rails app that serves the web and an Android client. I recently upgraded to Ruby 2.0 from 1.9.3. I went to do some work on the Android client and kept getting status 500 from the API. I return JSON data for each user by sending their auth_token in the headers.
I checked the server logs to see what was up. This method was returning nil:
def api_user
  my_auth = request.headers["auth_token"]
  api_user = User.where(authentication_token: my_auth).first
end
I hadn't touched my Android client so I knew the problem wasn't there.
I tried curl --header "auth_token: xxxxxxxxxxxxxxxxxxxx" https://myapp.com/api/etc.json
And got status 500 as well.
I recently made a bunch of changes to the Rails app and moved from Heroku to DigitalOcean, so I figured that maybe I'd messed something up in the process. I tried locally:
curl --header "auth_token: xxxxxxxxxxxxxxxxxxxx" http://localhost:3000/api/etc.json
And it returned data for the specified user!
I'm really confused. I tried logging in production to see if the headers were there at all:
Rails.logger.info "auth_token #{request.headers["auth_token"]}"
In development this produced auth_token xxxxxxxxxxxxxxxx as it should.
In production this logged only auth_token, with no value. So I'm not able to access the header in production. Why is this?
Setup info: unicorn on a DigitalOcean droplet running Dokku. Let me know if you need more info - I'm really stumped....
ADDITIONAL INFO
Procfile
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
worker: bundle exec rake jobs:work
config unicorn.rb
worker_processes 3
timeout 3000
preload_app true
before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end
after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end
Okay, this ended up being something related to nginx. I found the answer here.
Basically, you have to add the line below to your nginx.conf file, right after http {, so that nginx accepts underscores in header names.
underscores_in_headers on;
It works.
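For background: nginx drops request headers with underscores in their names unless underscores_in_headers is on, which is why the header made it through locally (no nginx in front) but not in production. The usual reason given is the CGI-style naming that Rack inherits, where dashes become underscores, so auth_token and Auth-Token end up under the same key. A tiny sketch of that mapping follows. `cgi_env_key` is a made-up helper for illustration, not a Rack or Rails method.

```ruby
# Illustration of the CGI convention for exposing request headers:
# prefix with HTTP_, upcase the name, and turn dashes into underscores.
def cgi_env_key(header_name)
  'HTTP_' + header_name.upcase.tr('-', '_')
end

%w[auth_token Auth-Token].each do |name|
  puts "#{name} -> #{cgi_env_key(name)}"
end
```

Given that collision, an alternative to changing nginx.conf would be to have clients send a hyphenated name such as Auth-Token (a hypothetical rename), which nginx passes through by default.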
I just switched from WEBrick to Unicorn on Heroku and ran into problems I would never have dreamed of.
A customer contacted me to say they could no longer perform particular actions. I went through the logs on Heroku and found no problems there. I did notice some memory-related problems, but those were solved by scaling down the number of parallel requests on Unicorn, and there have been no problems since. I ended up downloading the DB and attaching it locally, only to find no problem there either.
I tried everything except changing the server back to WEBrick.
Totally desperate, I tried even that and IT WORKED!
Is there any reason why the app is behaving differently? There is no error, and Heroku shows no problems.
Put simply, the DB transaction apparently doesn't complete, because no data ends up stored in the DB.
I cannot run on WEBrick indefinitely, because right now it's causing the connection pool to deplete pretty fast, and because of that I am forced to restart the server from time to time.
Any help?
Thanks
UPDATE:
This is the content of config/unicorn.rb
#config/unicorn.rb
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
timeout 60
preload_app true
before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end
after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end
and this is config/initializers/database_connection.rb
#config/initializers/database_connection.rb
Rails.application.config.after_initialize do
  ActiveRecord::Base.connection_pool.disconnect!
  ActiveSupport.on_load(:active_record) do
    config = ActiveRecord::Base.configurations[Rails.env] ||
             Rails.application.config.database_configuration[Rails.env]
    config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    config['pool'] = ENV['DB_POOL'] || ENV['MAX_THREADS'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end
UPDATE 2:
I just tried forking the application and provisioning all the resources to replicate the original environment. Everything works properly on the forked application. The only difference is that the fork has no SSL endpoint.
I don't get it.
My Unicorn config (copied from Heroku's docs):
# config/unicorn.rb
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 30
preload_app true
before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end
after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end
But every time a dyno is restarted, we get this:
heroku web.5 - - Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Ruby 2.0, Rails 3.2, Unicorn 4.6.3
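For context on what this config does with signals: the worker ignores TERM and exits only once QUIT arrives, and Unicorn treats QUIT as a graceful shutdown that lets an in-flight request finish. So a request that is still running (up to the 30-second timeout) could in principle keep a process alive past the 10-second window named in the R12 message. Whether that is what happens here is an assumption. The sketch below reproduces only the TERM/QUIT pattern with a plain forked child; `term_then_quit` is a made-up helper and does not involve Unicorn.

```ruby
# Fork a child that ignores TERM and exits only on QUIT, like the worker-side
# trap in the config. Returns [alive_after_term, exit_status].
def term_then_quit(grace = 0.2)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    Signal.trap('TERM') { }            # do nothing on TERM
    Signal.trap('QUIT') { exit!(0) }   # leave only when QUIT arrives
    writer.puts 'ready'
    writer.close
    loop { sleep 1 }
  end
  writer.close
  reader.gets                          # wait until the child has installed its traps
  reader.close
  Process.kill('TERM', pid)
  sleep grace
  alive = Process.waitpid(pid, Process::WNOHANG).nil?  # nil means still running
  Process.kill('QUIT', pid)
  Process.wait(pid)
  [alive, $?.exitstatus]
end

alive, status = term_then_quit
puts "alive after TERM: #{alive}, exit status after QUIT: #{status}"
```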
We've had issues like this with Unicorn for some time . . . we also get seemingly random timeout errors, even though we never see much load and have 4 dynos with 4 workers each (we never have any request queuing). We have had 0 luck getting rid of these errors, even with help from Heroku. I get the feeling even they aren't 100% confident in the optimal settings for Unicorn on Heroku.
We just recently switched to Puma and so far so good, much better performance and no weird timeouts yet. One of the other reasons we switched to Puma is that I suspect some of our random timeouts come from "slow clients" . . . Unicorn isn't designed to handle slow clients.
I will let you know if we see continued success with Puma, but so far so good. The switch is pretty painless, assuming your app is thread-safe.
Here are the puma settings we are using. We are using "Clustered Mode".
procfile:
web: bundle exec puma -p $PORT -C ./config/puma.rb
puma.rb:
environment ENV['RACK_ENV']
threads Integer(ENV["PUMA_THREADS"] || 5), Integer(ENV["PUMA_THREADS"] || 5)
workers Integer(ENV["WEB_CONCURRENCY"] || 4)
preload_app!
on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end
We currently have WEB_CONCURRENCY set to 4 and PUMA_THREADS set to 5.
We aren't using an initializer for DB_POOL, just using the default DB_POOL setting of 5 (hence the 5 threads).
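To spell out the "hence the 5 threads" reasoning: each Puma worker is its own process with its own ActiveRecord pool, so the pool only needs to cover the threads inside one worker. A quick sketch with our numbers follows. `puma_connection_check` is a made-up helper, not part of Puma.

```ruby
# Relate Puma workers and threads to the per-process ActiveRecord pool.
def puma_connection_check(workers, threads_per_worker, pool_size)
  {
    threads_per_dyno: workers * threads_per_worker,
    max_connections_per_dyno: workers * pool_size,
    pool_covers_threads: pool_size >= threads_per_worker
  }
end

p puma_connection_check(4, 5, 5)
```

If PUMA_THREADS were raised above the pool size, threads in the same worker could end up waiting on each other for a connection.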
The only reason we are using WEB_CONCURRENCY as our environment variable name is so that log2viz reports the correct number of workers. Would rather call it PUMA_WORKERS but whatever, not a huge deal.
Hope this helps . . . again, will let you know if we see any issues with Puma.
I hate to add another answer, especially one this simple, but ultimately what fixed this problem for us was removing the 'rack-timeout' gem. I realize this is probably not best practice but I'm curious if there is some conflict between rack-timeout and Unicorn and/or Puma (which is odd because Heroku recommends rack-timeout for use with Unicorn).
Anyway Puma is working great for us but we did still see some random inexplicable timeouts even after the Puma upgrade . . . but removing rack-timeout got rid of the issue completely. Obviously we still get timeouts but only for code we haven't optimized or if we are getting heavy usage (basically when you would expect to see timeouts). Thus I would blame this issue on rack-timeout and not on Unicorn . . . thus contradicting my previous answer :)
Hope this helps. If anyone else wants to poke holes in my theory, feel free!