Ruby on Rails batch processing

I am working on a Rails app that runs regularly scheduled Sidekiq jobs, and I understand how queues and background jobs work. I'm working with a 3rd party that requests that I batch jobs to them so that each worker handles one job at a time, with 50 workers running in parallel.
I've been researching this for hours, but I'm unclear on how to do this and how to tell if it's actually working. Currently, my Procfile looks like this:
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
worker: bundle exec sidekiq -C ./config/sidekiq.yml
Is it as simple as increasing the concurrency from the rake task to -c 50 in the worker line? Or do I need to use ConnectionPool inside the worker class? The Rails docs say that using find_each is "useful if you want multiple workers dealing with the same processing queue." If I run find_each inside the rake task and call the worker once for each item, will it run the jobs in parallel? I read one article that says that concurrency and parallelism are often confused, so I am, in turn, a little confused about which direction to take.
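Roughly, what I have in mind is something like this (a sketch only; Item and PartnerSyncWorker are placeholder names, with concurrency: 50 set in config/sidekiq.yml or -c 50 on the command line):
# lib/tasks/batch.rake -- enqueue one job per record; find_each loads records in batches
namespace :batch do
  task enqueue_partner_jobs: :environment do
    Item.find_each do |item|                     # placeholder model
      PartnerSyncWorker.perform_async(item.id)   # one job per item
    end
  end
end

# app/workers/partner_sync_worker.rb -- each of the 50 Sidekiq threads works one job at a time
class PartnerSyncWorker
  include Sidekiq::Worker

  def perform(item_id)
    item = Item.find(item_id)
    # ... send this one item to the 3rd party ...
  end
end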

Related

concurrency in delayed_jobs

I have a ROR application and one delayed_job process started with rake jobs:work.
The ROR application adds jobs to multiple queues; let's say we have queue 1 and queue 2.
My question is: will a task in queue 1 and a task in queue 2 be executed concurrently?
Currently, after running rake jobs:work, only one thread is spawned, which executes the queue 1 task and then the queue 2 task.
If I have to execute them in parallel, I have to run two jobs:work rake tasks.
Is this the correct behavior, or can they run concurrently within one jobs:work rake task?
Also, what is a "worker" in Delayed Job? Is "delayed job" used interchangeably with "worker"?
Thanks,
Priyanka
No, one worker cannot run two jobs concurrently; you need more than one process running for that.
In the example you are describing, you are starting a worker that runs in the foreground (rake jobs:work). What you could do instead is start workers in the background by running bin/delayed_job (script/delayed_job for earlier versions). That command has multiple options you can use to specify how you want delayed_job to behave.
One of the options is -n or --number_of_workers=workers. That means you can start two workers by running the following command:
bundle exec bin/delayed_job --number_of_workers=2 start
It is also possible to dedicate certain workers to only run jobs from a specific queue, or only high priority jobs.
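For example, dedicating workers to specific queues could look something like this (flag names per the delayed_job README; check bin/delayed_job --help for your version):
bundle exec bin/delayed_job --queue=queue1 start
bundle exec bin/delayed_job --queues=queue2,queue3 --number_of_workers=2 start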

Is there something like cron for a Rails application on Windows?

I'm trying to use cron in my application to send mails every week, but I think it doesn't work on Windows.
Does anybody know of an equivalent to cron that works on Windows?
The Windows equivalent of Unix's cron is the Task Scheduler. You can configure your periodic task there.
Purely Ruby solution
If you want a purely Ruby solution, look into:
rufus-scheduler - a pure-Ruby scheduler gem that also works on Windows (see the sketch after this list).
crono - an in-Rails cron scheduler, so it should work anywhere.
Web services - there are plenty of free online services that will make a request to a given URL at specified intervals. This is basically a poor man's cron job.
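As a rough sketch of the rufus-scheduler option (WeeklyMailer is a made-up name; adapt it to your own mailer):
# config/initializers/scheduler.rb -- requires the rufus-scheduler gem
require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

# run every Monday at 08:00
scheduler.cron '0 8 * * 1' do
  WeeklyMailer.digest.deliver_now   # placeholder for your weekly mail job
end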
I recommend taking a look at the Resque gem and its Resque-scheduler extension. You will need to have a Resque scheduler process running with bundle exec rake resque:scheduler and at least one worker process running with QUEUE=* bundle exec rake resque:work.
If you want these services to run in the background as a windows service, you can do it with srvany.exe as described in this SO question.
The above assumes you are ok with installing Redis - a key-value store that is very popular among the Rails community as it can be easily used to support other Rails components such as caching and ActionCable, and it is awesome by itself for many multi-process use cases.
Resque is a queue system on top of Redis that allows you to define jobs that can be executed asynchronously in the background. When you run QUEUE=* bundle exec rake resque:work, a worker process runs constantly and polls the queue. Once a job is enqueued, an available worker pops it from the queue and starts working on it. This architecture is quite scalable, as you can have multiple workers listening to the queues if you'd like.
To define a job, you do this:
class MyWeeklyEmailSenderJob
  # queue used when the job is enqueued with Resque.enqueue
  @queue = :email_sender_queue

  def self.perform
    # Your code to send weekly emails
  end
end
While you can enqueue this job to the queue yourself from anywhere (e.g. from a controller as a response to an action), in your case you want it to automatically be placed into the queue once a week. This is what Resque-scheduler is for. It allows you to configure a file such as app/config/resque_schedule.yml in which you can define which jobs should be enqueued in which time interval. For example:
send_weekly_emails:
  cron: "0 8 * * Mon"
  class: "MyWeeklyEmailSenderJob"
  queue: email_sender_queue
  description: "Send weekly emails"
Remember that a scheduler process has to be running (bundle exec rake resque:scheduler) in order for this to work.
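For comparison, enqueuing the job manually from application code is a one-liner with Resque's API; the worker process then picks it up from the queue:
Resque.enqueue(MyWeeklyEmailSenderJob)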
Thanks guys, actually I tried the rufus-scheduler gem and it worked for me. I guess it's the best and easiest solution.

Redis::CommandError: ERR max number of clients reached

I am receiving the above error and trying to get more insight. My app has 3 types of background jobs and about 100 users so nothing too heavy.
My goal is to be able to process multiple background jobs at the same time (so if 10 users perform the same job, they don't need to wait for each other's jobs to finish before starting).
I'm confused as to how many dynos I need, how many workers I need, how many redis connections I need. What's the difference between all these things?
My current setup has:
1 x professional web dyno
1 x professional scheduler dyno
3 x professional worker dyno
and my procfile:
web: bundle exec rails server -p $PORT
scheduler: bundle exec rake resque:scheduler
worker: env TERM_CHILD=1 QUEUE='*' COUNT='3' bundle exec rake resque:workers
And I am getting the error:
Redis::CommandError: ERR max number of clients reached
I am just surprised because it seems like what I'm trying to achieve is pretty simple.

Prevent sidekiq from executing queued up jobs when starting from command line?

When I start sidekiq in my development environment (Rails 3.2), I use the following command:
bundle exec sidekiq
When I do this, sidekiq will execute all jobs that have been queued up when it was not running. e.g. If I have created a bunch of new user accounts during testing, it will try and send welcome emails to all of the fake accounts (my emails are sent from a sidekiq job).
Is there a way to start sidekiq and tell it to delete all pending jobs? That way I can turn it back on without worrying that it will try and run a bunch of jobs that don't need to run (since this is my dev environment).
I have looked in documentation, but can't find an answer, hopefully it's something simple I overlooked...
redis-cli flushall && bundle exec sidekiq
I found a solution: Using the sidekiq monitoring UI that comes with sidekiq (https://github.com/mperham/sidekiq/wiki/Monitoring), I'm able to view all queues (even when sidekiq is not running). Deleting the queue will remove all of the jobs in it, which solves the problem.
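Alternatively, pending jobs can be cleared from a Rails console using Sidekiq's API, which avoids flushing everything else stored in Redis (a sketch; adjust to the sets you actually want to empty):
require 'sidekiq/api'

Sidekiq::Queue.all.each(&:clear)   # delete every enqueued job
Sidekiq::RetrySet.new.clear        # delete jobs waiting to retry
Sidekiq::ScheduledSet.new.clear    # delete scheduled (future) jobs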

delayed_job rake task parameters and concurrency

The documentation states that a delayed job worker can be invoked using a rake task like so: rake jobs:work, or QUEUE=queue1 rake jobs:work if you want it to work on a specific queue.
I have a couple of questions about this way to invoke jobs:
Is there a way to pass other parameters like sleep-delay or read-ahead (like you would do if you start the worker using the script: delayed_job start --sleep-delay 30 --read-ahead 500 --queue=queue1)?
Is there any gain in processing speed if you launch 2 workers on the same queue using the rake task?
In answer to question 1: yes, you can set the sleep delay and read-ahead from the command line. You do it via environment variables:
QUEUE=queue1 SLEEP_DELAY=1 rake jobs:work
for example. See this commit.
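Assuming your delayed_job version includes that change, the read-ahead should be settable the same way (READ_AHEAD here is my reading of that commit, so verify it against your version):
QUEUE=queue1 SLEEP_DELAY=1 READ_AHEAD=500 rake jobs:work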
rake jobs:work is just a means to an end for bringing up another worker, for development purposes or for working off a big queue (though you have rake jobs:workoff for that), so all the benefits and caveats of multiple workers apply:
two jobs process in parallel, so if you've got the CPU power, your queue will be worked through more quickly.
I don't know about question #1 though; it's possible the rake tasks weren't intended to be used outside of development.
