I have 2 Sidekiq workers:
Foo:
# frozen_string_literal: true
class FooWorker
include Sidekiq::Worker
sidekiq_options queue: :foo
def perform
loop do
File.open(File.join(Rails.root, 'foo.txt'), 'w') { |file| file.write('FOO') }
end
end
end
Bar:
# frozen_string_literal: true
class BarWorker
include Sidekiq::Worker
sidekiq_options queue: :bar
def perform
loop do
File.open(File.join(Rails.root, 'bar.txt'), 'w') { |file| file.write('BAR') }
end
end
end
Which has pretty the same functionality, both runs on different queues and the yaml file looks like this:
---
:queues:
- foo
- bar
development:
:concurrency: 5
The problem is, even both are running and showing in the Busy page of Sidekiq UI, only one of them will actually create a file and put contents in. Shouldn't Sidekiq be multi-threaded?
Update:
this happens only on my machine
i created a new project with rails new and same
i cloned a colleague project and ran his sidekiq and is working!!!
i used his sidekiq version, not working!
New Update:
this happens also on my colleague machine if he clone my project
if I run 2 jobs with a finite loop ( like 10 times do something with a sleep), first job will be executed and then the second, but after the second finishes and start again both will work on same time as expected -- everyone who cloned the project from: github.com/ArayB/sidekiq-test encountered the problem.
It's not an issue with Sidekiq. It's an issue somewhere in Ruby/MRI/Thread/GIL. Google for more info, but my understanding is that sometimes threads aren't real threads (see "green threads") so really just simulate threading. The important thing is that only one thread can execute at a time.
It's interesting that with only two threads the system isn't giving time to the second thread. No idea why, but it must realize it's mistake when you run it again.
Interestingly if you run your same app but instead fire off 10 TestWorkers (and tweak the output so you can tell the difference) sidekiq will run all 10 "at once".
10.times {|i| TestWorker.perform_async(i) }
Here is the tweaked worker. Be sure to flush the output cause that can also cause issues with threading and TTY buffering not reflecting reality.
class TestWorker
include Sidekiq::Worker
def perform(n)
10.times do |i|
puts "#{n} - #{i} - #{Time.current}"
$stdout.flush
sleep 1
end
end
end
Some interesting links:
https://en.wikipedia.org/wiki/Green_threads
http://ruby-doc.org/core-2.4.1/Thread.html#method-c-pass
https://github.com/ruby/ruby/blob/v2_4_1/thread.c
Does ruby have real multithreading?
Related
I am trying to run a cron job in every 10 seconds that runs a piece of code. I have used an approach which requires running a code and making it sleep for 10 seconds, but it seems to make drastically degrading the app performance. I am using whenever gem, which run every minute and sleeps for 10 seconds. How can I achieve the same w/o using sleep method. Following is my code.
every 1.minute do
runner "DailyNotificationChecker.send_notifications"
end
class DailyNotificationChecker
def self.send_notifications
puts "Triggered send_notifications"
expiry_time = Time.now + 57
while (Time.now < expiry_time)
if RUN_SCHEDULER == "true" || RUN_SCHEDULER == true
process_notes
end
sleep 10 #seconds
end
def self.process_notes
notes = nil
time = Benchmark.measure do
Note.uncached do
notes = Note.where(status: false)
notes.update_all(status: true)
end
end
puts "time #{time}"
end
end
Objective of my code is to change the boolean status of objects to true which gets checked every 10 seconds. This table has 2 million records.
I suggest using a Sidekiq background jobs for this. With the sidekiq-scheduler gem you can run ordinary sidekiq jobs schedules in whatever internal you need. Bonus points for having a web-interface to handle and monitor the jobs via the Sidekiq gem.
You would use the clockwork gem. It runs in a separate process. The configuration is pretty simple.
require 'clockwork'
include Clockwork
every(10.seconds, 'frequent.job') { DailyNotificationChecker.process_notes }
In my app I am trying to perform two worker tasks sequentially.
First, a PDF is being created with Wicked pdf and then, once the PDF is created, to send an email to two different recipients with the PDF attached.
This is what is called in the controller :
PdfWorker.perform_async(#d.id)
MailingWorker.perform_in(1.minutes, #d.id,#d.class.name.to_s)
First worker creates the PDF and second worker sends email.
Here is second worker :
class MailingWorker
include Sidekiq::Worker
sidekiq_options retry: false
def perform(d_id,model)
#d = eval(model).find(d_id)
#model = model
if #d.pdf.present?
ProfessionnelMailer.notification_d(#d).deliver
ClientMailer.notification_d(#d).deliver
else
MailingWorker.perform_in(1.minutes, #d.id, #model.to_s)
end
end
end
The if statement checks if the PDF has been created. If true two mails are sent, otherwise, the same worker is called again one minute later, just to let the Heroku server extra time to process the PDF creation in case it takes more time or a long queue.
Though if the PDF has definitely failed to be processed, the above ends up in an infinite loop.
Is there a way to fix this ?
One option I see is calling the second worker inside the PDF creation worker though I don't really want to nest workers too deep. It makes my controller more clear to have them separate, I can see the sequence of actions. But any advice welcome.
Another option is to use sidekiq_options retry: 5 and request a retry of the controller that could be counted towards the full total of 5 retries, instead of retrying the worker with else MailingWorker.perform_in(1.minutes, #d.id, #model.to_s) but I don't know how to do this. As per this thread https://github.com/mperham/sidekiq/issues/769 it would be to raise an exception but I am not sure how to do this ... (also I am not sure how long the retry will wait before being processed with the exception method, with the solution above I can control the time frame..)
If you do not want to have nested workers, then in MailingWorker instead of enqueuing it again, raise an exception if the PDF is not present.
Also, configure the worker retry option, so that sidekiq will push it to the retry queue and run it again in sometime. According to the documentation,
Sidekiq will retry failures with an exponential backoff using the
formula (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1)) (i.e.
15, 16, 31, 96, 271, ... seconds + a random amount of time). It will
perform 25 retries over approximately 21 days.
Worker code will be more like,
class MailingWorker
include Sidekiq::Worker
sidekiq_options retry: 5
def perform(d_id,model)
#d = eval(model).find(d_id)
#model = model
if #d.pdf.present?
ProfessionnelMailer.notification_d(#d).deliver
ClientMailer.notification_d(#d).deliver
else
raise "PDF not present"
end
end
end
I believe the "correct" and most asynchroneous way to do this is to have two queues, and two workers:
Queue 1: CreatePdfWorker
Queue 2: SendPdfWorker
When the CreatePdfWorker has generated the PDF, it then enqueues the SendPdfWorker with the newly generated PDF and recipients.
This way, each worker can work independently and pluck from the queue asynchroneously, and you're not struggling against the design choices of Sidekiq.
I have an issue with importing a lot of records from a user provided excel file into a database. The logic for this is working fine, and I’m using ActiveRecord-import to cut down on the number of database calls. However, when a file is too large, the processing can take too long and Heroku will return a timeout. Solution: Resque and moving the processing to a background job.
So far, so good. I’ve needed to add CarrierWave to upload the files to S3 because I can’t just hold the file in memory for the background job. The upload portion is also working fine, I created a model for them and am passing the IDs through to the queued job to retrieve the file later as I understand I can’t pass a whole ActiveRecord object through to the job.
I’ve installed Resque and Redis locally, and everything seems to be setup correctly in that regard. I can see the jobs I’m creating being queued and then run without failing. The job seems to run fine, but no records are added to the database. If I run the code from my job line by line in the console, the records are added to the database as I would expect. But when the queued jobs I’m creating run, nothing happens.
I can’t quite work out where the problem might be.
Here’s my upload controller’s create action:
def create
#upload = Upload.new(upload_params)
if #upload.save
Resque.enqueue(ExcelImportJob, #upload.id)
flash[:info] = 'File uploaded.
Data will be processed and added to the database.'
redirect_to root_path
else
flash[:warning] = 'Upload failed. Please try again.'
render :new
end
end
This is a simplified version of the job with fewer sheet columns for clarity:
class ExcelImportJob < ApplicationJob
#queue = :default
def perform(upload_id)
file = Upload.find(upload_id).file.file.file
data = parse_excel(file)
if header_matches? data
# Create a database entry for each row, ignoring the first header row
# using activerecord-import
sales = []
data.drop(1).each_with_index do |row, index|
sales << Sale.new(row)
if index % 2500 == 0
Sale.import sales
sales = []
end
end
Sale.import sales
end
def parse_excel(upload)
# Open the uploaded excel document
doc = Creek::Book.new upload
# Map rows to the hash keys from the database
doc.sheets.first.rows.map do |row|
{ date: row.values[0],
title: row.values[1],
author: row.values[2],
isbn: row.values[3],
release_date: row.values[5],
units_sold: row.values[6],
units_refunded: row.values[7],
net_units_sold: row.values[8],
payment_amount: row.values[9],
payment_amount_currency: row.values[10] }
end
end
# Returns true if header matches the expected format
def header_matches?(data)
data.first == {:date => 'Date',
:title => 'Title',
:author => 'Author',
:isbn => 'ISBN',
:release_date => 'Release Date',
:units_sold => 'Units Sold',
:units_refunded => 'Units Refunded',
:net_units_sold => 'Net Units Sold',
:payment_amount => 'Payment Amount',
:payment_amount_currency => 'Payment Amount Currency'}
end
end
end
I can probably have some improved logic anyway as right now I’m holding the whole file in memory, but that isn’t the issue I’m having – even with a small file that has only 500 or so rows, the job doesn’t add anything to the database.
Like I said my code worked fine when I wasn’t using a background job, and still works if I run it in the console. But for some reason the job is doing nothing.
This is my first time using Resque so I don’t know if I’m missing something obvious? I did create a worker and as I said it does seem to run the job. Here’s the output from Resque’s verbose formatter:
*** resque-1.27.4: Waiting for default
*** Checking default
*** Found job on default
*** resque-1.27.4: Processing default since 1508342426 [ExcelImportJob]
*** got: (Job{default} | ExcelImportJob | [15])
*** Running before_fork hooks with [(Job{default} | ExcelImportJob | [15])]
*** resque-1.27.4: Forked 63706 at 1508342426
*** Running after_fork hooks with [(Job{default} | ExcelImportJob | [15])]
*** done: (Job{default} | ExcelImportJob | [15])
In the Resque dashboard the jobs aren’t logged as failed. They get executed and I can see an increment in the ‘processed’ jobs on the stats page. But as I say the DB remains untouched. What’s going on? How can I debug the job more clearly? Is there a way to get into it with Pry?
It looks like my problem was with Resque.enqueue(ExcelImportJob, #upload.id).
I changed my code to ExcelImportJob.perform_later(#upload.id) and now my code actually runs!
I also added a resque.rake task to lib/tasks as described here: http://bica.co/2015/01/20/active-job-resque/.
That link also notes how to use rails runner to call the job without running the full Rails server and triggering the job, which is useful for debugging.
Strangely, I didn't quite manage to get the job to print anything to STDOUT as suggested by #hoffm but at least it led me down a good avenue of inquiry.
I still don't fully understand the difference between why calling Resqueue.enqueue still added my jobs to the queue and indeed seemed to run them, but the code wasn't executed, so if someone has a better grasp and an explanation, that would be much appreciated.
TL;DR: calling perform_later rather than Resque.enqueue fixed the problem but I don't know why.
In my ruby script,I am using celluloid-zmq gem. where I am trying to run evaluate_response asynchronously inside pollers using,
async.evaluate_response(socket.read_multipart)
But if I remove sleep from loop, somehow thats not working out, It is not reaching to "evaluate_response" method. But if I put sleep inside loop it works perfectly.
require 'celluloid/zmq'
Celluloid::ZMQ.init
module Celluloid
module ZMQ
class Socket
def socket
#socket
end
end
end
end
class Indefinite
include Celluloid::ZMQ
## Readers
attr_reader :dealersock,:pullsock,:pollers
def initialize
prepare_dealersock and prepare_pullsock and prepare_pollers
end
## prepare DEALER SOCK
def prepare_dealersock
#dealersock = DealerSocket.new
#dealersock.identity = "IDENTITY"
#dealersock.connect("tcp://localhost:20482")
end
## prepare PULL SOCK
def prepare_pullsock
#pullsock = PullSocket.new
#pullsock.connect("tcp://localhost:20483")
end
## prepare the Pollers
def prepare_pollers
#pollers = ZMQ::Poller.new
#pollers.register_readable(dealersock.socket)
#pollers.register_readable(pullsock.socket)
end
def run!
loop do
pollers.poll ## this is blocking operation never mind though we need it
pollers.readables.each do |socket|
## we know socket.read_multipart is blocking call this would give celluloid the chance to run other process in mean time.
async.evaluate_response(socket.read_multipart)
end
## If you remove the sleep the async evaluate response would never be executed.
## sleep 0.2
end
end
def evaluate_response(message)
## Hmmm, the code just not reaches over here
puts "got message: #{message}"
...
...
...
...
end
end
## Code is invoked like this
Indefinite.new.run!
Any idea why this is happening?
The question was 100% changed, so my previous answer does not help.
Now, the issues are...
ZMQ::Poller is not part of Celluloid::ZMQ
You are directly using the ffi-rzmq bindings, and not using the Celluloid::ZMQ wrapping, which provides evented & threaded handling of the socket(s).
It would be best to make multiple actors -- one per socket -- or to just use Celluloid::ZMQ directly in one actor, rather than undermining it.
Your actor never gets time to work with the response
This part makes it a duplicate of:
Celluloid async inside ruby blocks does not work
The best answer is to use after or every and not loop ... which is dominating your actor.
You need to either:
Move evaluate_response to another actor.
Move each socket to their own actor.
This code needs to be broken up into several actors to work properly, with a main sleep at the end of the program. But before all that, try using after or every instead of loop.
I have the following code that I run in a rake task in rails:
10.times do |i|
Thread.new do
puts "#{i}"
end
end
When I run this locally, I get the following:
0
3
5
1
7
8
2
4
9
6 (with new lines)
However, when I run the same code in EC2 via the same rake task, it will print out maybe one or two lines, and then the task will terminate. I'm not sure why, but it seems my EC2 instance can't handle the multithreading for some reason.
Any insights why?
You've just been getting lucky locally - there is nothing that guarantees that your 10 threads will execute to completion before your program exits. If you want to wait for your threads then you must do so explicitly:
threads = 10.times.collect do |i|
Thread.new do
puts i
end
end
threads.each(&:join)
The join method blocks the calling thread until the specified thread has completed. It also returns the return value of that thread.