Sidekiq ignoring concurrency when using multiple queues - ruby-on-rails

I have two queues: development and ar_updater. I have a bunch of sidekiq workers that are continuously updating an ActiveRecord object, but sometimes they are querying the record at the same time, rather than only after another one has updated it. Therefore, I was thinking about just creating a queue and limiting its concurrency to 1 so that the correct data can be appended.
Here's my config/sidekiq.yml file:
development:
:concurrency: 10
ar_updater:
:concurrency: 1
:queues:
- development
- ar_updater
and then I just have a simple worker that looks like this:
class ArUpdateWorker
include Sidekiq::Worker
sidekiq_options queue: "ar_updater"
def perform(options)
options = options.transform_keys(&:to_sym)
schedule_id = options[:schedule_id]
progress = options[:progress]
schedule = Schedule.find(schedule_id)
old_progress = schedule.current_progress
schedule.update(current_progress: old_progress + progress)
end
end
Is there a way to make sure that these workers are only running one at a time in this particular queue? When calling ArUpdateWorker multiple times, it seems like sidekiq ignores the concurrency and just runs the worker as many times as I want, but if I switch the queue of the worker to development, then it adheres to the concurrency of 10. A little confusing.
It seems the only way to workaround this is to run sidekiq with the concurrency settings set on the command line interface

You cannot actually do that. You have a queue called development. That can be very confusing because you would also have an environment called development.
The configuration you have sets the configuration of the environment called "development" to 10, rather than the queue. A sidekiq process will have one concurrency setting depending on the environment. You can't set that process to have different concurrency levels.
This is a similar question:
How can I specify different concurrency queues for sidekiq configuration?

Related

Rails Initializer: An infinite loop in a separate thread to update records in the background

I want to run an infinite loop on a separate thread that starts as soon as the app initializes (in an initializer). Here's what it might look like:
# in config/initializers/item_loop.rb
Thread.new
loop do
Item.find_each do |item|
# Get price from third-party api and update record.
item.update_price!
# Need to wait a little between requests to avoid getting throttled.
sleep 5
end
end
end
I tend to accomplish this by running batch updates in recurring background jobs. But this doesn't make sense since I don't really need parallelization, downtime, or queueing, I just want to update one item at a time in a single thread, forever.
Yet there are multiple things that concern me:
Leaked Connections: Should I open up a new connection_pool for the thread? Should I use a gem like safely to avoid crashing the thread?
Thread Safety: Should I be worried about race conditions? Should I make use of Mutex and synchronize? Does using ActiveRecord::Base.transaction impact thread safety?
Deadlock: Should I use Rails.application.executor.wrap?
Concurrent Ruby/Sleep Intervals: Should I use TimerTask from concurrent-ruby gem instead of sleep or something other than Thread.new?
Information on any of these subjects is appreciated.
Usually to perform a job in a background process(non web-server process) a background workers manager is used. Rails has a specific interface for that manager called ActiveJob There are few implementation of a background workers manager - Sidekiq, DelayedJob, Resque, etc. Sidekiq is preferred. Returning back to actual problem - you may create a schedule to run UpdatePriceJob every interval using gem sidekiq-scheduler Another nice extension for throttling Sidekiq workers is sidekiq-throttler
Some code snippets:
# app/workers/update_price_worker.rb
# Actual Worker class
class UpdatePriceWorker
include Sidekiq::Worker
sidekiq_options throttle: { threshold: 720, period: 1.hour }
def perform(item_id)
Item.find(item_id).update_price!
end
end
# app/workers/update_price_master_worker.rb
# Master worker that loops over items
class UpdatePriceMasterWorker
include Sidekiq::Worker
def perform
Item.find_each { |item| UpdatePriceWorker.perform_async item.id }
end
end
# config/sidekiq.yml
:schedule:
update_price:
cron: '0 */4 * * *' # Runs once per 4 hours - depends on how many Items are there
class: UpdatePriceMasterWorker
Idea of this setup - we run MasterWorker every 4 hours(this depends on how much time it takes to update all items). Master worker creates jobs to update price of an every particular item. UpdatePriceWorker is throttled to max 720 RPH.
I use rails runner x (god gem or k8s) in our similar case.
Rails runner runs in another process so that we do not have to worry about connection-leak and thread-safety.
God-gem or k8s supports concurrency and monitoring the job failure. Running 1 process with some specific sleep-time would promise third-party API throttles (running N process with N API-key could support speed up).
I think deadlock would happen in any concurrency situation.
I do not think this loop + sleep approach is a design flaw, because:
cron always starts based on schedule so that long running jobs could run simultaneously. We need to add a logic to avoid job overlapping. Rather, just loop + sleep keeps maximum throughput without any job overlap.
ActiveJob is good for one-shot long-running task, but it does not fit for daemon.

Delayed job exclude queue

I have a delayed job queue which contains particularly slow running tasks, which I want to be crunched by its own set of dedicated workers, so there is less risk it'll bottleneck the rest of the worker pipeline.
RAILS_ENV=production script/delayed_job --queue=super_slow_stuff start
However I then also want a general worker pool for all other queues, hopefully without having to specify them seperately (as their names etc are often changed/added too). Something akin to:
RAILS_ENV=production script/delayed_job --except-queue=super_slow_stuff start
I could use the wildcard * charecter but I imagine this would cause the second worker to pickup the super slow jobs too?
Any suggestions on this?
you can define a global constant for your app with all queues.
QUEUES={
mailers: 'mailers',
etc..
}
then use this constant in yours delay method calls
object.delay(queue: QUEUES[:mailers]).do_something
and try to build delayed_job_args dinamically
system("RAILS_ENV=production script/delayed_job --pool=super_slow_stuff --pool:#{(QUEUES.values-[super_slow_stuff]).join(',')}:number_of_workers start")
Unfortunately this functionality not realized in delayed jobs.
See:
https://github.com/collectiveidea/delayed_job/pull/466
https://github.com/collectiveidea/delayed_job/pull/901
You may fork delayed jobs repository and apply the simple patches from https://github.com/collectiveidea/delayed_job/pull/466.
Then use your GitHub repo, but please vote into https://github.com/collectiveidea/delayed_job/pull/466 to make it merged finally into upstream.
Update:
I wrote option to exclude queues for myself. It is in (exclude_queues) branch: https://github.com/one-more-alex/delayed_job/tree/exclude_queues
https://github.com/one-more-alex/delayed_job_active_record/tree/exclude_queues
Options description included in Readme.md
Parts about exclusion.
# Option --exclude-specified-queues will do inverse of queues processing by skipping onces from --queue, --queues.
# If both --pool=* --exclude-specified-queues given, no exclusions will by applied on "*".
If EXCLUDE_SPECIFIED_QUEUES set to YES, then queues defined by QUEUE, QUEUES will be skipped instead. See opton --exclude-specified-queues description for specal case of queue "*"
If answer strictly on question, the calling of general worker will be like:
RAILS_ENV=production script/delayed_job --queue=super_slow_stuff --exclude-specified-queues start
Warning
Please not, that am not going to support DelayedJobs and code placed "as is" in hope it will be useful.
Corresponding pull request was made by me https://github.com/collectiveidea/delayed_job/pull/1019
Also for Active Record backend: https://github.com/collectiveidea/delayed_job_active_record/pull/151
Only ActiveRecord backend supported.

Memory consumpton inside sidekiq worker

Does loading multiple Models in sidekiq worker can cause memory leak? Does it get garbage collected?
For example:
class Worker
include Sidekiq::Worker
def perform
Model.find_each do |item|
end
end
end
Does using ActiveRecord::Base.connection inside worker can cause problems? Or this connection automatically closes?
I think you are running into a problem that I also had with a "worker" - the actual problem was the code, not Sidekiq in any way, shape or form.
In my problematic code, I thoughtlessly just loaded up a boatload of models with a big, fat, greedy query (hundreds of thousands of instances).
I fixed my worker/code quite simply. For my instance, I transitioned my DB call from all to use find_in_batches with a lower number of objects pulled for the batch.
Model.find_in_batches(100) do |record|
# ... I like find_in_batches better than find_each because you can use lower numbers for the batch size
# ... other programming stuff
As soon as I did this, a job that would bring down Sidekiq after a while (running out of memory on the box) has run with find_in_batches for 5 months without me even having to restart Sidekiq ... Ok, I may have restarted Sidekiq some in the last 5 months when I've deployed or done maintenance :), but not because of the worker!

ActiveRecord::Base.connection.query_cache_enabled in sidekiq

I have a piece of code that performs the same queries over and over, and it's doing that in a background worker within a thread.
I checkout out the activerecord query cache middleware but apparently it needs to be enabled before use. However I'm not sure if it's a safe thing to do and if it will affect other running threads.
you can see the tests here: https://github.com/rails/rails/blob/3e36db4406beea32772b1db1e9a16cc1e8aea14c/activerecord/test/cases/query_cache_test.rb#L19
my question is: can I borrow and/or use the middleware directly to enable query cache for the duration of a block safely in a thread?
when I tried ActiveRecord::Base.cache do my CI started failing left and right...
EDIT: Rails 5 and later: the ActiveRecord query cache is automatically enabled even for background jobs like Sidekiq (see: https://github.com/mperham/sidekiq/wiki/Problems-and-Troubleshooting#activerecord-query-cache for information on how to disable it).
Rails 4.x and earlier:
The difficulty with applying ActiveRecord::QueryCache to your Sidekiq workers is that, aside from the implementation details of it being a middleware, it's meant to be built during the request and destroyed at the end of it. Since background jobs don't have a request, you need to be careful about when you clear the cache. A reasonable approach would be to cache only during the perform method, though.
So, to implement that, you'll probably need to write your own piece of Sidekiq middleware, based on ActiveRecord::QueryCache but following Sidekiq's middleware guide. E.g.,
class SidekiqQueryCacheMiddleware
def call(worker, job, queue)
connection = ActiveRecord::Base.connection
enabled = connection.query_cache_enabled
connection_id = ActiveRecord::Base.connection_id
connection.enable_query_cache!
yield
ensure
ActiveRecord::Base.connection_id = connection_id
ActiveRecord::Base.connection.clear_query_cache
ActiveRecord::Base.connection.disable_query_cache! unless enabled
end
end

Ruby How to create daemon process that will spawn multiple workers

I have a script called 'worker.rb'. When ran this script will perform processing for a while (an hour lets say) and then die.
I need to have another script which is going to be responsible for spawning the worker script above. Let's call this script 'runner.rb'. 'runner.rb' will be called with an argument dictating how many workers it is allowed to spawn.
I'd like runner.rb to do the following: (e.g. 'ruby runner.rb 5')
- Query the database for specific values (e.g. got 100 values)
- Spawn 5 instances of 'worker.rb' (passing the first 5 values respectively)
- Keep checking for any of the instances of 'worker.rb' spawned above to finish and then call 'worker.rb' again with the 6th value from the database and continue this process indefinitely.
I'm using the Daemons gem but am lost as the best way to go about this. The 'runner' script should definitely be daemonized - but should worker also be daemonized?
How should 'runner' go about checking if 'worker' has finished or not? Can this be done using a PID stored in a file?
I used Daemons gem before. But somehow it didn't do well on keep the number of child processes. Then I made a another one, called light_daemon. You could let light_daemon to prefork certain number of worker processes. If one of the worker dies for any reason, the light_daemon will spawn a new one to replace it. If your worker process may cause memory leaking issue, you could let the work to actively die before it gets too big. The parent process will keep the number of the worker processes constant. I used it in the produce site of one of my projects. I worked pretty well.
The following is an example daemon using the light-daemon gem.
require 'rubygems'
require 'light_daemon'
class Client
def initialize
#count = 0
end
def call
`echo "process: #{Process.pid}" >> /tmp/light-daemon.txt`
sleep 3
#count +=1
(#count < 100)? true : false
end
end
LightDaemon::Daemon.start(Client.new, :children=> 2, :pid_file => "/tmp/light-daemon.pid" )
In the daemon, the worker process dies after the method "call" is invoked 100 times. Then a new worker process is spawned and the process continues.

Resources