Sidekiq - pick up DB changes made by another worker - ruby-on-rails

In a Rails app, I have projects which have many tasks.
A task may have a predecessor that needs to be completed before the task can start.
I use Sidekiq for creating tasks:
class ScheduleProjectJob < ApplicationJob
  queue_as :default

  def perform(project)
    tasks = Array(project.tasks)
    until tasks.empty?
      task = tasks.shift
      if task.without_predecessor? || task.predecessor_scheduled?
        ScheduleTaskJob.perform_later(task)
      else
        tasks << task
      end
    end
  end
end
I loop through the tasks and schedule a task if it doesn't have a predecessor or, in case it has one, when the predecessor has already been scheduled.
To check if the predecessor has been scheduled, I check in the database whether the predecessor's state is scheduled (tasks are created with the created state and updated to scheduled at the end of ScheduleTaskJob).
The check is as follows:
Task.joins(:task_template).
     where(%q(task_templates.dep_id = :dep AND
              task_templates.tag = :tag AND
              tasks.state = :state),
           dep: task_template.dep_id,
           tag: task_template.runs_after_tag,
           state: 'scheduled').
     count > 0
The query above seems to work fine when I manually set the DB up and run it.
However, when it runs inside ScheduleProjectJob, the state of the predecessor task is always reported as created, even though I can see in the DB that the record has been updated to scheduled.
Am I missing anything here?

ActiveRecord caches query results. When you expect a query's result to change between calls, wrap the query with:
ActiveRecord::Base.uncached do # or YourModel.uncached do
  some_query.count
end
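Applied to the check from the question, that might look like the following (a sketch; the method name predecessor_scheduled? and its body are assumed from the question above):

# Sketch: wrap the predecessor check so each call bypasses the query cache
# and hits the database, making another worker's committed update visible.
def predecessor_scheduled?
  Task.uncached do
    Task.joins(:task_template).
         where(%q(task_templates.dep_id = :dep AND
                  task_templates.tag = :tag AND
                  tasks.state = :state),
               dep: task_template.dep_id,
               tag: task_template.runs_after_tag,
               state: 'scheduled').
         count > 0
  end
end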

Related

Timeout in a delayed job

I have some code that can potentially run for a long period of time. However, if it does, I want to kill it. Here is what I'm doing at the moment:
def perform
  Timeout.timeout(ENV['JOB_TIMEOUT'].to_i, Exceptions::WorkerTimeout) { do_perform }
end

private

def do_perform
  # ...some code...
end
Where JOB_TIMEOUT is an environment variable with a value such as 10.seconds. I've had reports that this still doesn't prevent my job from running longer than it should.
Is there a better way to do this?
I believe delayed_job does some exception-handling voodoo with multiple retries etc., not to mention that I think do_perform will return immediately and the job will continue as usual in another thread. I would imagine a better approach is doing flow control inside the worker:
def perform
  # A nil timeout would mean no timeout at all, so protect against an unset ENV var
  timeout = (ENV['JOB_TIMEOUT'] || 10).to_i
  do_stuff
  begin
    Timeout.timeout(timeout) { do_long_running_stuff }
  rescue Timeout::Error
    clean_up_after_self
    notify_business_logic_of_failure
  end
end
This will work. An added benefit is that delayed_job is not coupled so tightly to your business logic: this code can be ported to any other job queueing system unmodified.
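To illustrate that portability claim, here is the same flow-control pattern dropped into a hypothetical Sidekiq worker (the worker class is illustrative; the helper method names are carried over from the example above):

require 'timeout'

class LongRunningWorker
  include Sidekiq::Worker

  def perform
    timeout = (ENV['JOB_TIMEOUT'] || 10).to_i
    do_stuff
    begin
      Timeout.timeout(timeout) { do_long_running_stuff }
    rescue Timeout::Error
      # Same recovery hooks as in the delayed_job version above
      clean_up_after_self
      notify_business_logic_of_failure
    end
  end
end

Only the class declaration changes; the timeout logic itself is queue-agnostic.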

Testing error callback from Delayed::Job with RSpec

Hi all,
I have a custom Delayed::Job setup that uses the success and error callbacks to change the attributes of the object being modified in the background. This object interacts with an external API. To test this, I'm using RSpec with VCR to record external API interactions.
Here's my worker:
class SuperJob < Struct.new(:thingy_id)
  include JobMethods

  def perform
    thing = Thingy.find(thingy_id)
    run_update(thing)
  end

  def success(job)
    thing = Thingy.find_by_job_id(job.id)
    thing.update(job_finished_at: Time.now, job_id: nil)
  end

  def error(job, exception)
    thing = Thingy.find_by_job_id(job.id)
    thing.update(job_id: -1, disabled: true)
  end
end
Here are my DJ settings:
Delayed::Worker.delay_jobs = !Rails.env.test?
Delayed::Worker.max_run_time = 2.minutes
I've successfully used RSpec to test the results of the success callback. What I'd like to do is test the results of the error callback. The external API doesn't have any particular limit on response time, so for my app I'd like to cap the maximum wait at 2 minutes (as seen in the max_run_time setting for DJ).
Now, how do I test that? The API isn't returning a timeout, so I'm not sure how I need to handle this in VCR. The DJ job isn't running in a queue and I don't particularly want the suite to delay for 2 minutes on every run.
Thoughts or suggestions would be greatly appreciated! Thanks!
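One way to exercise the error path without waiting two minutes is to invoke the callback directly with a stand-in job (a sketch; the double and the attribute expectations are assumptions based on the worker code above):

require 'rails_helper'

RSpec.describe SuperJob do
  it 'disables the Thingy when the job errors' do
    thing = Thingy.create!(job_id: 42)
    job = double('Delayed::Job', id: 42)

    # Call the callback directly instead of waiting for max_run_time to expire
    SuperJob.new(thing.id).error(job, Timeout::Error.new)

    expect(thing.reload.disabled).to be(true)
    expect(thing.job_id).to eq(-1)
  end
end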

How to check Sidekiq status without an extension

On the Sidekiq wiki it says I can check the status of Sidekiq by polling it.
I know that when I call the delay method, Sidekiq returns the process id, but how do I check Sidekiq to see if that id is finished processing?
I would expect some code like this:
if Sidekiq.check_complete?(process_id)
  puts 'Process completed'
end
I want to do this without any extra gems.
From your comment above, if you want to check whether Sidekiq is done processing, the best approach is to check your queue sizes. For example, using some helper functions like the ones below:
def sidekiq_stats
  stats = Sidekiq::Stats.new
  { processed: stats.processed,
    failed: stats.failed,
    enqueued: stats.enqueued,
    queues: stats.queues }
end

def queue_stats(queue_name)
  queue = Sidekiq::Queue.new(queue_name)
  { size: queue.size,
    latency: queue.latency }
end
You can check each queue's size; if all queues have size 0, the Sidekiq worker is idle.
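For example, to block until Sidekiq has drained a given queue (using the queue_stats helper above; the queue name and poll interval are illustrative):

# Poll the default queue once per second until it is empty
sleep(1) until queue_stats('default')[:size].zero?
puts 'Process completed'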

How to set max_run_time for a specific job?

I want to set Delayed::Worker.max_run_time = 1.hour for a specific job that I know will take a while. However, this is set as a global configuration in initializers/delayed_job_config.rb. As a result, this change will make ALL of my jobs have a max run time of 1 hour. Is there a way to just change it for one specific job without creating a custom job?
Looking at the Worker class on GitHub:
def run(job)
  job_say job, 'RUNNING'
  runtime = Benchmark.realtime do
    Timeout.timeout(self.class.max_run_time.to_i, WorkerTimeout) { job.invoke_job }
    job.destroy
  end
  job_say job, 'COMPLETED after %.4f' % runtime
  return true # did work
rescue DeserializationError => error
  job.last_error = "#{error.message}\n#{error.backtrace.join("\n")}"
  failed(job)
rescue Exception => error
  self.class.lifecycle.run_callbacks(:error, self, job) { handle_failed_job(job, error) }
  return false # work failed
end
It doesn't appear that you can set a per-job max. But I would think you could roll your own timeout inside your job, assuming the Timeout class allows nesting. Worth a try:
class MyLongJobClass
  def perform
    Timeout.timeout(1.hour.to_i, WorkerTimeout) { do_perform }
  end

  private

  def do_perform
    # ... real perform work
  end
end
You can now set a per-job max run time, but it must be lower than the global constant.
To set a per-job max run time that overrides Delayed::Worker.max_run_time, you can define a max_run_time method on the job.
NOTE: this can ONLY be used to set a max_run_time that is lower than Delayed::Worker.max_run_time. Otherwise the lock on the job would expire and another worker would start working on the in-progress job.
I have a parent Job class where I set max_run_time to 10 minutes, then override that method for the one job that I want to run really long, and set the global constant to be really long as well.
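A sketch of that pattern (class names and durations are illustrative):

# config/initializers/delayed_job_config.rb: the global ceiling, set very high
Delayed::Worker.max_run_time = 12.hours

# Parent job class with a sensible default for most jobs
class BaseJob
  def max_run_time
    10.minutes
  end
end

# The one job known to run long overrides the method, while still
# staying below the global Delayed::Worker.max_run_time
class ReallyLongJob < BaseJob
  def max_run_time
    6.hours
  end

  def perform
    # ... long-running work
  end
end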

Parallelizing methods in Rails

My Rails web app has dozens of methods that make calls to an API and process the query results. These methods have the following structure:
def method_one
  batch_query_API
  process_data
end

# ..........

def method_nth
  batch_query_API
  process_data
end

def summary
  method_one
  # ......
  method_nth
  collect_results
end
How can I run all the query methods at the same time instead of sequentially in Rails (without firing up multiple workers, of course)?
Edit: all of the methods are called on a single instance variable. I think this limits the use of Sidekiq or Delayed::Job for submitting jobs simultaneously.
Ruby has the excellent promise gem. Your example would look like:
require 'future'

def method_one
  # ...
end

# ..........

def method_nth
  # ...
end

def summary
  result1 = future { method_one }
  # ......
  resultn = future { method_nth }
  collect_results result1, ..., resultn
end
Simple, isn't it? But let's get into more detail. This is a future object:
result1 = future { method_one }
It means result1 is being evaluated in the background. You can pass it around to other methods, but result1 doesn't hold any result yet; it is still processing in the background. Think of passing around a Thread. The major difference is that the moment you try to read it, instead of passing it around, it blocks and waits for the result at that point. So in the above example, all the result1 .. resultn variables keep being evaluated in the background, but when the time comes to collect the results and you actually try to read these values, the reads wait for the queries to finish.
Install the promise gem and try the below in a Ruby console:
require 'future'

x = future { sleep 20; puts 'x calculated'; 10 }; nil
# adding a nil at the end so that the console does not immediately try to print x
y = future { sleep 25; puts 'y calculated'; 20 }; nil

# At this point, you'll still be using the console!
# The sleeps are happening in the background.
# Now do:
x + y
# At this point, the program actually waits for the x & y future blocks to complete
You can take a look at a new option in town: the futoroscope gem.
As you can see from the announcing blog post, it tries to solve the same problem you are facing by making simultaneous API queries. It seems to have pretty good support and good test coverage.
Assuming that your problem is a slow external API, a solution could be the use of either threaded or asynchronous programming. By default, when doing IO, your code will block. This basically means that if you have a method that does an HTTP request to retrieve some JSON, your method will tell your operating system that you're going to sleep and you don't want to be woken up until the operating system has a response to that request. Since that can take several seconds, your application will just have to wait idly.
This behavior is not specific to HTTP requests. Reading from a file or a device such as a webcam has the same implications. Software does this to avoid hogging the CPU when it obviously has no use for it.
So the question in your case is: do we really have to wait for one method to finish before we can call another? If the behavior of method_two depends on the outcome of method_one, then yes. But in your case, it seems that they are individual units of work without co-dependence. So there is potential for concurrent execution.
You can start new threads by initializing an instance of the Thread class with a block that contains the code you'd like to run. Think of a thread as a program inside your program. Your Ruby interpreter will automatically alternate between the thread and your main program. You can start as many threads as you'd like, but the more threads you create, the longer your main program may have to wait for its turn before returning to execution. However, we are probably talking microseconds or less. Let's look at an example of threaded execution:
def main_method
  Thread.new { method_one }
  Thread.new { method_two }
  Thread.new { method_three }
end

def method_one
  # something_slow_that_does_an_http_request
end

def method_two
  # something_slow_that_does_an_http_request
end

def method_three
  # something_slow_that_does_an_http_request
end
Calling main_method will cause all three methods to be executed in what appears to be parallel. In reality they are still being sequentially processed, but instead of going to sleep when method_one blocks, Ruby just returns to the main thread and switches back to the method_one thread when the OS has the input ready.
Assuming each method takes 2 ms to execute, minus the wait for the response, all three methods are running after just 6 ms - practically instantly.
If we assume a response takes 500 ms to arrive, you can cut your total execution time from 2 + 500 + 2 + 500 + 2 + 500 down to 2 + 2 + 2 + 500 - in other words, from 1506 ms to 506 ms.
It will feel like the methods are running simultaneously, but in fact they are just sleeping simultaneously.
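A quick way to sanity-check that arithmetic in a console (a toy sketch; sleep stands in for the slow API call):

require 'benchmark'

def fake_request
  sleep 0.5 # stands in for a 500 ms HTTP round trip
end

sequential = Benchmark.realtime { 3.times { fake_request } }
threaded = Benchmark.realtime do
  threads = 3.times.map { Thread.new { fake_request } }
  threads.each(&:join) # wait for all three threads to finish
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
# => roughly "sequential: 1.50s, threaded: 0.50s"

Note the explicit Thread#join: it is what makes the main thread wait, much like the sleepy loop shown further down.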
In your case, however, you have a challenge: one operation depends on the completion of a set of previous operations. In other words, if you have tasks A, B, C, D, E and F, then A, B, C, D and E can be performed simultaneously, but F cannot be performed until A, B, C, D and E are all complete.
There are different ways to solve this. Let's look at a simple solution: a sleepy loop in the main thread that periodically examines a list of return values until some condition is fulfilled.
def task_1
  # Something slow
  return results
end

def task_2
  # Something slow
  return results
end

def task_3
  # Something slow
  return results
end

my_responses = {}

Thread.new { my_responses[:result_1] = task_1 }
Thread.new { my_responses[:result_2] = task_2 }
Thread.new { my_responses[:result_3] = task_3 }

# Prevent the main thread from continuing until the three spawned threads
# are done and have dumped their results in the hash.
while my_responses.count < 3
  # Sleep 100 ms between checks; without it, the main thread would check the
  # response count thousands of times per second, which is most likely unnecessary.
  sleep(0.1)
end

# Any code below this line will not execute until all three results are collected.
Keep in mind that multithreaded programming is a tricky subject with numerous pitfalls. With MRI it's not so bad: while MRI will happily switch between blocked threads, it doesn't support executing two threads simultaneously, and that sidesteps quite a few concurrency concerns.
If you want to get into multithreaded programming, I recommend this book:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601
It's centered around Java, but the pitfalls and concepts explained are universal.
You should check out Sidekiq.
RailsCasts episode about Sidekiq.
