Rails 5.0.1, Ruby 2.4.0, Sidekiq 4.2.9
I need to count some specific data in background jobs. I already implemented it through Postgres, but I ran into a problem: Sidekiq's concurrency puts a heavy load on DB connections, and if I decrease the concurrency number, the jobs take a long time to run.
I found that it's possible to use an atomic counter and periodically save the result to the DB.
So can I share a variable between threads in Sidekiq? If so, how should I initialize the shared variable?
Thanks for any advice.
If you share a variable between threads, you need to worry about locking it with a Mutex and it only scales to a single process.
Instead, use Redis commands to increment counters.
https://redis.io/commands/incr
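To make the trade-off concrete, here is a minimal sketch (names are illustrative, not from the question) of the in-process approach this answer warns about: a counter shared between threads must be guarded by a Mutex, and it only counts within a single process. The commented Redis lines sketch the INCR-based alternative, which is atomic across processes; they assume the redis gem and a running Redis server.

```ruby
# A shared in-process counter must be guarded by a Mutex, and it only
# scales to a single process.
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    100.times do
      lock.synchronize { counter += 1 } # without the lock, increments can be lost
    end
  end
end
threads.each(&:join)

puts counter # => 1000

# The Redis alternative (assuming the redis gem and a running server):
# INCR is atomic across threads, processes, and machines, so no Mutex
# is needed, and a periodic job can flush the total to Postgres.
#   redis = Redis.new
#   redis.incr("pending_count")              # atomic increment
#   total = redis.getset("pending_count", 0) # read and reset when flushing
```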
The task seems simple and straightforward: I need to limit the number of jobs that can be performed at the same time, so my server won't blow up. But Google is silent; perhaps I'm doing something wrong? Please enlighten me.
I use the standard async adapter.
It's not recommended to use the default Rails async adapter in production, especially on a Heroku dyno, which restarts itself once per day.
For enqueuing and executing jobs in production you need to set up a
queuing backend, that is to say you need to decide for a 3rd-party
queuing library that Rails should use. Rails itself only provides an
in-process queuing system, which only keeps the jobs in RAM. If the
process crashes or the machine is reset, then all outstanding jobs are
lost with the default async backend. This may be fine for smaller apps
or non-critical jobs, but most production apps will need to pick a
persistent backend.
There are plenty of supported adapters to choose from, such as:
Sidekiq
Resque
Delayed Job
It's easy to get started; they provide clear instructions and examples.
If you would like to use the default async adapter in development and would like to limit the maximum number of jobs that will be executed in parallel, you can add the following line in your config/application.rb file:
config.active_job.queue_adapter = ActiveJob::QueueAdapters::AsyncAdapter.new min_threads: 1, max_threads: 2, idletime: 600.seconds
So in this case, at most two jobs will run at the same time. I believe the default maximum is twice the number of processors.
I have a case where I need to limit it to one and that works just fine (Rails 7).
Source: https://api.rubyonrails.org/classes/ActiveJob/QueueAdapters/AsyncAdapter.html
Here are some questions I have on ActiveJob:
Say I've queued up n jobs on a Sidekiq queue via ActiveJob. On my EC2 instance, I've set Puma to have 4 workers with 5 threads each. Does this mean up to 20 jobs will run concurrently? Will each thread pick up a queued job when it's idle and process it? I tried this setting, but it still seems to process serially, one job at a time. Are there more settings I need to change?
Regarding concurrency: how would I set up even more EC2 instances just to tackle the job queue itself?
Regarding the queues themselves: is there a way to manage or look at the queue from within Rails, or should I rely on Sidekiq's web interface?
Sidekiq has a good wiki. As for your questions:
Sidekiq (and other background-job implementations) works as a producer-consumer system, where the producer is your Rails app(s), the queue is Redis, and the consumer is the Sidekiq worker(s). All three entities are completely independent applications, which may run on different servers. So neither Puma nor the Rails application can affect Sidekiq's concurrency at all.
A full description of Sidekiq concurrency goes far beyond an SO answer; you can find long posts by googling "scaling Sidekiq workers". In short: yes, you can run separate EC2 instance(s), set up Redis, and tune the Sidekiq worker count, concurrency per worker, the Ruby runtime, queue concurrency and priority, and so on.
Edited: Sidekiq has per-worker configuration (usually sidekiq.yml), but the number of workers is managed by system tools such as Upstart or systemd. Or you can buy Sidekiq Pro/Enterprise, which adds many features (like sidekiqswarm).
From the wiki: Sidekiq API
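For the third question (inspecting queues from within Rails), the Sidekiq API linked above can be used directly from a console or controller. A quick sketch, assuming the sidekiq gem is loaded and Redis is reachable; the queue name "default" is an assumption:

```ruby
require "sidekiq/api"

# Inspect a single queue by name.
queue = Sidekiq::Queue.new("default")
puts queue.size          # number of jobs currently enqueued
queue.each do |job|
  puts job.klass         # class name of each enqueued job
end

# Aggregate statistics across all queues.
stats = Sidekiq::Stats.new
puts stats.enqueued      # total jobs waiting
puts stats.processed     # total jobs processed so far
```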
Is there anything in its architecture that makes it hard to do?
I want to run an existing Rails + Sidekiq application in a VM with very little memory, and loading the entire Rails stack in two different processes uses a lot of RAM.
Puma is built to spin up homogeneous web worker threads and divide incoming requests among them. If you wanted to modify it to spawn off separate Sidekiq threads, it should technically be possible with a crazy puma.rb file, but there's no precedent I can find for doing so (edit: Mike's answer below points out that the sucker_punch gem can essentially do this, for the same purpose of memory efficiency). Practically speaking, if your VM cannot support running two Rails processes at a time, it probably won't be able to handle the increased memory load as your application does the work of both Sidekiq and Puma, but that depends on your workload.
If this is just for development purposes, you might be able to accomplish what you're looking for by turning on Sidekiq's inline mode (normally meant just for testing):
require 'sidekiq/testing'
Sidekiq::Testing.inline!
This will cause all perform_async calls to actually execute inline, instead of going into Redis and being picked up by the Sidekiq process.
Nothing official.
This is what sucker_punch is designed for.
For long-running processes like mailing and posting to external sites, is it OK to use Ruby's Thread.new instead of a background worker like Delayed Job or Resque?
It depends what you mean by OK. Ruby has a Global Interpreter Lock (most implementations do anyway - JRuby is one exception) which means you will not get true concurrency using the Thread.new method. That doesn't mean you aren't getting any concurrency at all though. This is discussed in more depth in multiple places:
http://ablogaboutcode.com/2012/02/06/the-ruby-global-interpreter-lock/
http://merbist.com/2011/10/03/about-concurrency-and-the-gil/
http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/
The Delayed Job and Resque approaches both involve having one or more separate processes perform the long-running operation(s). With multiple processes you get true concurrency between your Rails app and the background worker process(es), since the GIL does not get in the way at all.
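As a concrete illustration of the trade-off, here is a minimal Thread.new sketch; deliver_notification is a hypothetical stand-in for a slow external call such as sending mail. Note that if the process exits or crashes before the thread finishes, the work is silently lost, which is exactly what a persistent backend like Delayed Job or Resque protects against.

```ruby
# Hypothetical stand-in for a slow external call (e.g. sending an email).
def deliver_notification(user_email)
  sleep 0.1                    # simulate network/IO latency
  "delivered to #{user_email}"
end

# Fire-and-forget: the main thread continues immediately while the
# background thread does the slow work. MRI's GIL is released during
# sleep and IO, so this kind of work does overlap in practice.
worker = Thread.new { deliver_notification("user@example.com") }

# ... serve the rest of the request here ...

# If you need the result (or must ensure completion before the process
# exits), wait on the thread:
result = worker.value
puts result # => "delivered to user@example.com"
```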
I have 16 Resque queues, and when I look at the memory allocation for these queues it shows about 4% of memory for each of them, even while all of the queues are empty. So out of 100% of my memory, nearly 64% is used by the environment load itself, as far as I can tell.
My questions are:
1. Does each of these Resque queues load the complete application into memory separately?
2. If yes, can I change the Resque configuration so that all queues share a single environment loaded in one place in memory?
Thanks in advance.
I think you are out of luck if you're using Resque. I believe this is why Sidekiq was developed as a nearly drop-in replacement for Resque. The author of Sidekiq wrote a blog post describing how he improved Resque's memory usage. Here's a little bit from the Sidekiq FAQs:
Why Sidekiq over Multi-threaded Resque?
Back at Carbon Five I worked on improving Resque to use threads
instead of forking. That project was the basis for Sidekiq. I would
suggest using Sidekiq over that fork of Resque for a few reasons:
MT Resque was a one-off for a Carbon Five client and is not supported.
There are a number of bugs that were not solved, e.g. the web UI's
display of worker threads, because they were not important to the
client.
Sidekiq was built from the ground up to use threads via
Celluloid.
Sidekiq has middleware, which lets you do cool things in
the job lifespan. Resque doesn't support middleware like this
natively.
In short, MT Resque: a quick hack to save one client a lot of money, Sidekiq: the well designed solution for the same problem.