I sought, but did not find, a max-requests-per-worker option in unicorn similar to gunicorn's max_requests or apache's MaxRequestsPerChild.
Does it exist?
If not, has anyone implemented it?
I'm thinking of putting it in the file where I have oobgc, since that gets control after every request anyway. Does that sound about right?
The problem is that my unicorn workers are getting big and fat, and garbage collection is taking more and more of my CPU.
I've just released the 'unicorn-worker-killer' gem. It lets you kill Unicorn workers based on 1) a maximum number of requests and 2) process memory size (RSS), without affecting the request in flight. It's really easy to use. First, add this line to your Gemfile.
gem 'unicorn-worker-killer'
Then, please add the following lines to your config.ru.
# Unicorn self-process killer
require 'unicorn/worker_killer'
# Max requests per worker
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096
# Max memory size (RSS) per worker
use Unicorn::WorkerKiller::Oom, (256*(1024**2)), (384*(1024**2))
It's highly recommended to randomize the threshold to avoid killing all workers at once.
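To illustrate why two numbers are passed, here is a minimal sketch of the randomization idea. It is not the gem's actual code, and the helper name `pick_limit` is hypothetical: each worker draws its own limit between the minimum and maximum, so workers that started together do not all reach their limit on the same request.

```ruby
# Hypothetical helper (not part of unicorn-worker-killer):
# draw a per-worker limit between min and max, inclusive.
def pick_limit(min, max, rng = Random.new)
  rng.rand(min..max)
end

# Four workers, each with its own threshold somewhere in 3072..4096.
limits = Array.new(4) { pick_limit(3072, 4096) }
puts limits.inspect
```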
Unicorn doesn't offer a max-requests option.
The unicorn master will re-spawn any worker that exits, and a worker will gracefully exit at the end of the current request when it receives a QUIT signal, so you could easily roll your own max-request logic into your worker request life-cycle.
With Rails, you could put something like the following in your application controller (alternatively, similar logic in a rack middleware):
after_filter do
  # MAX_REQUESTS is a constant you define yourself
  @@request_count ||= 0
  Process.kill('QUIT', $$) if (@@request_count += 1) > MAX_REQUESTS
end
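The rack middleware variant mentioned above could look roughly like this. It is a sketch under assumptions: the class name `MaxRequestsLimiter` and the `on_limit` parameter are hypothetical, and the callback is injectable only so the logic can be tried without actually signalling a process. Since each Unicorn worker handles one request at a time, a plain counter is enough here.

```ruby
# Minimal Rack-style middleware (no gems needed): counts requests handled
# by this worker process and fires a callback once the limit is exceeded.
class MaxRequestsLimiter
  # By default the callback sends QUIT to the current process, which a
  # Unicorn worker treats as "finish the current request, then exit".
  def initialize(app, max_requests, on_limit: -> { Process.kill("QUIT", Process.pid) })
    @app = app
    @max_requests = max_requests
    @on_limit = on_limit
    @count = 0
  end

  def call(env)
    response = @app.call(env)
    @count += 1
    @on_limit.call if @count > @max_requests
    response
  end
end

# Demonstration with a stub app and a harmless callback instead of a signal.
triggered = 0
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
limiter = MaxRequestsLimiter.new(app, 2, on_limit: -> { triggered += 1 })
3.times { limiter.call({}) }
puts triggered
```

In config.ru this would be wired in with something like `use MaxRequestsLimiter, 4096`.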
I want to run an infinite loop on a separate thread that starts as soon as the app initializes (in an initializer). Here's what it might look like:
# in config/initializers/item_loop.rb
Thread.new do
  loop do
    Item.find_each do |item|
      # Get price from third-party api and update record.
      item.update_price!
      # Need to wait a little between requests to avoid getting throttled.
      sleep 5
    end
  end
end
I usually accomplish this kind of thing by running batch updates in recurring background jobs. But that doesn't make sense here, since I don't really need parallelization, downtime, or queueing; I just want to update one item at a time in a single thread, forever.
Yet there are multiple things that concern me:
Leaked Connections: Should I open up a new connection_pool for the thread? Should I use a gem like safely to avoid crashing the thread?
Thread Safety: Should I be worried about race conditions? Should I make use of Mutex and synchronize? Does using ActiveRecord::Base.transaction impact thread safety?
Deadlock: Should I use Rails.application.executor.wrap?
Concurrent Ruby/Sleep Intervals: Should I use TimerTask from concurrent-ruby gem instead of sleep or something other than Thread.new?
Information on any of these subjects is appreciated.
Usually, a job that should run in a background (non-web-server) process is handled by a background worker manager. Rails has a common interface for such managers called ActiveJob, and there are several implementations: Sidekiq, DelayedJob, Resque, etc. Sidekiq is preferred. Coming back to the actual problem: you can create a schedule that runs UpdatePriceMasterWorker at a fixed interval using the sidekiq-scheduler gem. Another nice extension, for throttling Sidekiq workers, is sidekiq-throttler.
Some code snippets:
# app/workers/update_price_worker.rb
# Actual Worker class
class UpdatePriceWorker
  include Sidekiq::Worker
  sidekiq_options throttle: { threshold: 720, period: 1.hour }
  def perform(item_id)
    Item.find(item_id).update_price!
  end
end
# app/workers/update_price_master_worker.rb
# Master worker that loops over items
class UpdatePriceMasterWorker
  include Sidekiq::Worker
  def perform
    Item.find_each { |item| UpdatePriceWorker.perform_async item.id }
  end
end
# config/sidekiq.yml
:schedule:
  update_price:
    cron: '0 */4 * * *' # Runs once every 4 hours - depends on how many Items there are
    class: UpdatePriceMasterWorker
The idea of this setup: we run the master worker every 4 hours (the interval depends on how long it takes to update all items). The master worker enqueues one job per item to update its price. UpdatePriceWorker is throttled to at most 720 jobs per hour.
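As a rough check on how the throttle and the schedule interval relate, the arithmetic can be sketched as follows. The item counts (2,880 and 5,000) and the helper names are hypothetical; only the 720-per-hour threshold and the 4-hour interval come from the setup above.

```ruby
# Seconds between jobs implied by an hourly throttle.
def seconds_between_jobs(jobs_per_hour)
  3600.0 / jobs_per_hour
end

# Hours needed for one full pass over all items at that rate.
def hours_per_pass(item_count, jobs_per_hour)
  item_count.fdiv(jobs_per_hour)
end

puts seconds_between_jobs(720)   # 5.0, the same pace as `sleep 5`
puts hours_per_pass(2_880, 720)  # 4.0, just fits a 4-hour schedule
puts hours_per_pass(5_000, 720)  # about 6.9, so passes would pile up
```

If a pass takes longer than the cron interval, the queue keeps growing, so the interval should be chosen from the item count rather than the other way round.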
I use rails runner combined with a process supervisor (the god gem or k8s) in a similar case of ours.
rails runner runs in a separate process, so we do not have to worry about connection leaks or thread safety.
The god gem or k8s takes care of concurrency and of monitoring job failures. Running one process with a fixed sleep time keeps you within the third-party API's throttle limit (running N processes with N API keys could speed things up).
I think deadlock can happen in any concurrent setup.
I do not think this loop + sleep approach is a design flaw, because:
cron always starts jobs on schedule, so long-running jobs can end up running simultaneously, and we would need to add logic to avoid overlapping jobs. A plain loop + sleep, by contrast, keeps maximum throughput without any overlap.
ActiveJob is good for one-shot long-running tasks, but it is not a good fit for a daemon.
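A loop + sleep script of the kind described above might be sketched like this. The function name `update_all` and its `pause`, `sleeper`, and `logger` parameters are assumptions made for illustration; the sleeper is injectable only so the loop can be exercised without real waiting.

```ruby
# Sketch of one pass of a long-running updater, e.g. started via `rails runner`.
# `items` is anything that responds to #each.
def update_all(items, pause: 5, sleeper: ->(s) { sleep s }, logger: $stderr)
  updated = 0
  items.each do |item|
    begin
      item.update_price!
      updated += 1
    rescue StandardError => e
      # One failing item must not kill the whole loop.
      logger.puts "update failed for #{item.inspect}: #{e.message}"
    end
    sleeper.call(pause) # stay under the third-party API's throttle
  end
  updated
end

# In the real script the pass would be repeated forever, for example:
#   loop { update_all(Item.find_each) }
```

A supervisor such as god or k8s would then only need to restart the process if it exits.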
Can loading multiple models in a Sidekiq worker cause a memory leak? Do they get garbage collected?
For example:
class Worker
  include Sidekiq::Worker
  def perform
    Model.find_each do |item|
    end
  end
end
Can using ActiveRecord::Base.connection inside a worker cause problems? Or is the connection closed automatically?
I think you are running into a problem that I also had with a "worker" - the actual problem was the code, not Sidekiq in any way, shape or form.
In my problematic code, I thoughtlessly just loaded up a boatload of models with a big, fat, greedy query (hundreds of thousands of instances).
I fixed my worker/code quite simply. For my instance, I transitioned my DB call from all to use find_in_batches with a lower number of objects pulled for the batch.
Model.find_in_batches(batch_size: 100) do |records|
  # ... I like find_in_batches better than find_each because you can use lower numbers for the batch size
  # (find_each accepts a batch_size option as well)
  # ... other programming stuff
end
As soon as I did this, a job that would bring down Sidekiq after a while (running out of memory on the box) has run with find_in_batches for 5 months without me even having to restart Sidekiq ... Ok, I may have restarted Sidekiq some in the last 5 months when I've deployed or done maintenance :), but not because of the worker!
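To see why batching keeps memory bounded, here is a plain-Ruby analogy with no ActiveRecord involved. `each_slice` stands in for find_in_batches, and the helper name `process_in_batches` is hypothetical: the source is consumed one slice at a time, so only one slice needs to be held in memory.

```ruby
# Plain-Ruby analogy: consume a large source in fixed-size slices.
def process_in_batches(source, batch_size)
  largest = 0
  total = 0
  source.each_slice(batch_size) do |batch|
    largest = [largest, batch.size].max
    total += batch.size # stand-in for the real per-record work
  end
  { total: total, largest_batch: largest }
end

stats = process_in_batches(1..1_050, 100)
p stats
```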
I am using Unicorn as the app server for my Rails app, and am trying to figure out why there is sometimes a non-trivial (> 5 seconds) delay between the start of a request and when it reaches my controller.
This is what my production.log prints out:
Started GET "/search/articles.json?q=mashable.com" for 138.7.7.33 at 2015-07-23 14:59:19 -0400
Parameters: {"q"=>"mashable.com"}
Searching articles for keyword: mashable.com, format: json, Time: 2015-07-23 14:59:26 -0400
Notice how there is a 7 second delay in between STARTED GET: and "Searching articles for keyword", which is the first thing the controller method does.
articles.json is routed to my controller method "articles" which simply does this for now:
def articles
  format = params[:format]
  keyword = params["q"]
  Rails.logger.info "Searching articles for keyword: #{keyword}, format: #{format}, Time: #{Time.now.to_s}"
end
This is my routes.rb
MyApp::Application.routes.draw do
  match '/search/articles' => 'search#articles'
  # more routes here, but articles is the first route
end
What could possibly cause this delay? Is it because a Unicorn worker is busy? Is it because a Unicorn worker is taking up too much memory, which makes the system slow?
Note: I don't believe the delay is in making any database connections but I could be wrong. The code doesn't need to make a database call, and the max connections for my database is 1000, and there are usually at most 1-2 connections.
Three thoughts:
You'll probably be better served using Puma instead of Unicorn
It could be that your system is running out of memory, or it could have plenty of memory available: install New Relic to troubleshoot where the bottleneck is
It could also be that you have more Unicorn instances than the number of connections your DB allows, in which case the instance is having to wait for others to disconnect before it can connect. This would likely manifest itself with irregular 5-second delays rather than happening every time.
Actually, it might be caused by a before_filter callback; you should check that.
I think it could be due to a lack of memory and thus frequent garbage collection, which freezes the whole system.
If it's a production problem it could be caused by slow clients sending requests. New Relic and Monit are good options. You could consider sending signals to Unicorn workers to restart them to better understand the problem.
You could also try adding preload_app true in your Unicorn config to speed up the startup time of worker processes.
I'm writing a rake task that would be called every minute (possibly every 30 seconds in the future) by Whenever, and it contacts a polling API endpoint (per user in our database). Obviously, this is not efficient run as a single thread, but is it possible to multithread? If not, is there a good event-based HTTP library that would be able to get the job done?
I'm writing a rake task that would be called every minute (possibly every 30 seconds in the future) by Whenever
Beware of Rails startup times; it might be better to use a long-running worker process such as Resque or Sidekiq. Resque provides https://github.com/bvandenbos/resque-scheduler, which should be able to do what you need. I can't speak for Sidekiq, but I'm sure it has something similar available (Sidekiq is much newer than Resque).
Obviously, this is not efficient run as a single thread, but is it possible to multithread? If not, is there a good event-based HTTP library that would be able to get the job done?
I'd suggest you look at ActiveRecord's find_each and find_in_batches for tips on making your finder process more efficient. Once you have your batches, you can easily do something using threads, such as:
#
# find_in_batches yields 1000 records per batch by default; you can
# pass batch_size to optimize that for larger (or smaller) batches
# depending on your available RAM
#
User.find_in_batches(batch_size: 50) do |batch_of_users|
  #
  # find_in_batches yields an Array of users for each batch;
  # it will always be smaller than or equal to the
  # batch size chosen above
  #
  #
  # We collect a bunch of new threads, one for each
  # user, eac
  #
  batch_threads = batch_of_users.collect do |user|
    #
    # We pass the user to the thread, this is good
    # habit for shared variables, in this case
    # it doesn't make much difference
    #
    Thread.new(user) do |u|
      #
      # Do the API call here use `u` (not `user`)
      # to access the user instance
      #
      # We shouldn't need to use an evented HTTP library
      # Ruby threads will pass control when the IO happens
      # control will return to the thread sometime when
      # the scheduler decides, but 99% of the time
      # HTTP and network IO are the best thread optimized
      # thing you can do in Ruby.
      #
    end
  end
  #
  # Joining threads means waiting for them to finish
  # before moving onto the next batch.
  #
  batch_threads.map(&:join)
end
This will start no more than batch_size threads at a time, waiting for each batch to finish before starting the next.
It would be possible to do something like this, but then you would have an uncontrollable number of threads. There's an alternative you might benefit from here, though it gets a lot more complicated: a ThreadPool and a shared list of work to do. I've posted it on GitHub so as not to spam Stack Overflow: https://gist.github.com/6767fbad1f0a66fa90ac
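For readers who want a feel for the thread-pool idea without opening the gist, here is a minimal sketch using only Ruby's standard `Queue`. It is not the code from the gist; the method name `each_in_pool` is hypothetical, and it assumes no work item is `nil` or `false`.

```ruby
# Fixed-size pool: `size` threads pull work from a shared queue until it
# is drained, so the number of threads never depends on the number of items.
def each_in_pool(items, size)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # after this, pop returns nil once the queue is empty

  results = Queue.new
  workers = Array.new(size) do
    Thread.new do
      while (item = queue.pop)
        results << yield(item) # e.g. the per-user API call
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

squares = each_in_pool((1..10).to_a, 3) { |n| n * n }
puts squares.sort.inspect
```

Results arrive in completion order, not input order, which is why the example sorts them before printing.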
I would suggest using sidekiq which is great at multithreading. You can then enqueue separate jobs per user for polling the API. clockwork can be used to make the jobs you enqueue recurring.
I have a script called 'worker.rb'. When run, this script will perform processing for a while (an hour, let's say) and then die.
I need to have another script which is going to be responsible for spawning the worker script above. Let's call this script 'runner.rb'. 'runner.rb' will be called with an argument dictating how many workers it is allowed to spawn.
I'd like runner.rb to do the following: (e.g. 'ruby runner.rb 5')
- Query the database for specific values (e.g. got 100 values)
- Spawn 5 instances of 'worker.rb' (passing the first 5 values respectively)
- Keep checking for any of the instances of 'worker.rb' spawned above to finish and then call 'worker.rb' again with the 6th value from the database and continue this process indefinitely.
I'm using the Daemons gem but am lost as to the best way to go about this. The 'runner' script should definitely be daemonized - but should the worker also be daemonized?
How should 'runner' go about checking if 'worker' has finished or not? Can this be done using a PID stored in a file?
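One way to get the behaviour described above without PID files is to let the runner spawn the workers as child processes and wait on them directly. The following is a sketch under assumptions: the method name `run_workers` is hypothetical, and a trivial `ruby -e` command stands in for the real `worker.rb`.

```ruby
require "rbconfig"

# Keep up to max_workers child processes busy; whenever one exits, start
# the next value. The block returns the command (an argv array) for a value.
def run_workers(values, max_workers)
  pending = values.dup
  running = {} # pid => value being processed
  finished = []

  until pending.empty? && running.empty?
    # Top up to max_workers children.
    while running.size < max_workers && !pending.empty?
      value = pending.shift
      running[Process.spawn(*yield(value))] = value
    end
    # Block until any child exits, then free its slot.
    pid = Process.wait
    value = running.delete(pid)
    finished << value if value
  end
  finished
end

# Stand-in command; the real one would be ["ruby", "worker.rb", value.to_s].
done = run_workers(%w[a b c d e f g], 3) { |v| [RbConfig.ruby, "-e", "exit 0", v] }
puts done.sort.inspect
```

With this shape only the runner would need to be daemonized, since the workers are its children and are reaped by `Process.wait`.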
I have used the Daemons gem before, but somehow it didn't do well at keeping the number of child processes constant. So I wrote another one, called light_daemon. You can have light_daemon prefork a certain number of worker processes. If one of the workers dies for any reason, light_daemon spawns a new one to replace it. If your worker process may leak memory, you can have the worker exit on its own before it gets too big; the parent process keeps the number of worker processes constant. I used it on the production site of one of my projects, and it worked pretty well.
The following is an example daemon using the light-daemon gem.
require 'rubygems'
require 'light_daemon'
class Client
  def initialize
    @count = 0
  end
  # Returns true to keep going, false once this worker has done 100 calls
  def call
    `echo "process: #{Process.pid}" >> /tmp/light-daemon.txt`
    sleep 3
    @count += 1
    @count < 100
  end
end
LightDaemon::Daemon.start(Client.new, :children => 2, :pid_file => "/tmp/light-daemon.pid")
In the daemon, the worker process dies after the method "call" is invoked 100 times. Then a new worker process is spawned and the process continues.