Ruby (rails) non-blocking recursive algorithm? - ruby-on-rails

I've written the following pseudo-ruby to illustrate what I'm trying to do. I've got some computers, and I want to see if anything's connected to them. If nothing is connected, try again for another two attempts, and if that's still the case, shut it down.
This is for a big deployment so this recursive timer could be running for hundreds of nodes. I just want to check, is this approach sound? Will it generate tonnes of threads and eat up lots of RAM while blocking the worker processes? (I expect it will be running as a delayed_job)
def check_status(i)
  return if instance.connected?
  if i < 3
    sleep 5.minutes
    check_status(i + 1)
  else
    instance.shutdown
  end
end

check_status(0)

With a maximum recursion depth of 3, this is not going to be a problem. Recursing into a method does not create threads, but each call does push another frame onto the call stack, and with enough depth the resources used for that storage would eventually run out. Not after 3 calls though; that is quite safe.
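To make the stack-growth point concrete, here is a small plain-Ruby illustration (countdown is a made-up method): depth 3 is trivially safe, while unbounded depth eventually exhausts the stack.

```ruby
# Each recursive call pushes another frame onto the call stack.
def countdown(i)
  return :done if i.zero?
  countdown(i - 1)
end

countdown(3)            # depth 3: perfectly safe

begin
  countdown(1_000_000)  # unbounded depth eventually overflows the stack
rescue SystemStackError
  :stack_overflow
end
```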
However, there is no need for recursion to solve your problem. The following loop should do just as well:
def check_status
  return if instance.connected?
  2.times do
    sleep 5.minutes
    return if instance.connected?
  end
  instance.shutdown
end

You got answers from other users already. However, since you are waiting 5 minutes at least twice, you might consider using another language or changing the design.
Ruby (MRI) has a global interpreter lock (GIL), which prevents parallel execution of Ruby code; MRI threads are not truly parallel, so you risk being inefficient here.
Consider using threads (a reasonably sized thread pool might make sense), probably fed by a queue of tasks.
Make sure you don't busy-wait for 5 minutes; instead, put the threads to sleep for that time, so other threads can execute while some are sleeping/waiting.
You could also consider JRuby, which offers true parallelism.
Or consider another programming language that might be more performant for this workload.
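As a sketch of the thread-pool-plus-queue idea (all names here are invented, and the short sleep stands in for the real 5-minute wait): a few worker threads pull tasks from a Queue and sleep while waiting, so the sleepers don't block each other.

```ruby
require 'thread'

queue   = Queue.new   # tasks to process
results = Queue.new   # thread-safe collector

# A small pool of workers; each sleeps without blocking the others.
workers = 4.times.map do
  Thread.new do
    while (task = queue.pop) != :done
      sleep 0.01                  # stand-in for the real wait
      results << "checked node #{task}"
    end
  end
end

10.times { |i| queue << i }
4.times { queue << :done }        # one stop token per worker
workers.each(&:join)

results.size  # => 10
```

Because the queue is FIFO, all ten tasks are consumed before any worker sees its stop token.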

If it's running via delayed_job why not use the gem's functionality to implement what you want? I, for one, would go for something like the following. No need to sleep the delayed jobs or anything.
class CheckStatusJob
  def before(job)
    @job = job
  end

  def perform
    return if instance.connected?
    if @job.attempts < 3
      raise 'The job failed!' # a failed attempt gets rescheduled
    else
      instance.shutdown
    end
  end

  def max_attempts
    3
  end

  def reschedule_at(current_time, attempts)
    current_time + 5.minutes
  end
end

Related

Is there a way, with Ruby, to prevent the CPU from going idle without blocking anything

I have a Ruby on Rails app. Ruby 2.3 Rails 3.2 The app uses resque which runs jobs asynchronously off a queue. One job, in particular, makes calls to an external (Ebay) api. While the api call is being made, the CPU of the ec2 instance doesn't process anything. Is there a way to prevent the CPU from going idle during the api call?
Is there a way, with Ruby, to prevent the CPU from going idle without blocking anything?
Yes, absolutely. In fact it's trivial:
require 'securerandom'

t = Thread.new do
  loop do
    print SecureRandom.alphanumeric(3)
    sleep 0.00001
  end
end
This will just continue printing a Matrix screensaver indefinitely, without blocking, until you call t.exit.
But it's most likely the wrong answer to the wrong question.
Here's a solution I came up with which is working so far. I realize this question tends to offend people's sensibilities. But, perhaps with the caveat that this code should never actually be executed, I'd be interested to know of any potential shortcomings:
require 'securerandom'

module AppHelpers
  def self.prevent_idle_cpu
    uuid = SecureRandom.uuid
    cache_key = "/prevent_idle_cpu/#{uuid}"
    Rails.cache.write(cache_key, "busy", expires_in: 1.day)

    thread = Thread.new do
      while Rails.cache.read(cache_key).present?
        1_000_000.times { 13 * 13 }
      end
    end
    thread.priority = -1

    begin
      yield
    ensure
      Rails.cache.delete(cache_key)
    end
  end
end

AppHelpers.prevent_idle_cpu do
  api.make_call
end

Idempotent Design with Sidekiq Ruby on Rails Background Job

Sidekiq recommends that all jobs be idempotent (able to run multiple times without being an issue) as it cannot guarantee a job will only be run one time.
I am having trouble understanding the best way to achieve that in certain cases. For example, say you have the following table:
User
id
email
balance
The background job that is run simply adds some amount to their balance
def perform(user_id, balance_adjustment)
  user = User.find(user_id)
  user.balance += balance_adjustment
  user.save
end
If this job is run more than once their balance will be incorrect. What is best practice for something like this?
Thinking about it, a potential solution I can come up with is to create a record before scheduling the job, something like:
PendingBalanceAdjustment
user_id
balance_adjustment
When the job runs it will need to acquire a lock for this user so that there's no chance of a race condition between two workers and then will need to both update the balance and delete the record from pending balance adjustment before releasing the lock.
The job then looks something like this?
def perform(user_id, balance_adjustment_id)
  user = User.find(user_id)
  pba = PendingBalanceAdjustment.where(:balance_adjustment_id => balance_adjustment_id).take
  if pba.present?
    $redis.lock("#{user_id}/balance_adjustment") do
      user.balance += pba.balance_adjustment
      user.save
      pba.delete
    end
  end
end
This seems to solve both
a) Race condition between two workers taking the job at the same time (though you'd think Sidekiq could guarantee this already?)
b) A job being run multiple times after running successfully
Is this pattern a good solution?
You're on the right track; you want to use a database transaction, not a redis lock.
I think you're on the right track too, but your solution might be overkill; I don't have full knowledge of your application.
BUT, a simpler solution would simply be to have a flag on your User model, like balance_updated:datetime, so you could check that before updating.
As Mike mentions using a Transaction block should ensure it's thread safe.
In any case, to answer your question more generally... having an updated_ column is usually good enough to start with, and then if it gets complicated you can move this stuff to another model.
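To make the one-shot idea behind the pending-adjustment record concrete, here is a plain-Ruby stand-in (no ActiveRecord or Redis; all class and method names are invented): the pending adjustment acts as a token that is consumed atomically with the balance update, so replaying the job is a no-op.

```ruby
class BalanceStore
  def initialize
    @mutex    = Mutex.new
    @balances = Hash.new(0)
    @pending  = {}   # adjustment_id => [user_id, amount]
  end

  def schedule(adjustment_id, user_id, amount)
    @mutex.synchronize { @pending[adjustment_id] = [user_id, amount] }
  end

  # Safe to call any number of times for the same adjustment_id:
  # the token is deleted in the same critical section as the update.
  def perform(adjustment_id)
    @mutex.synchronize do
      user_id, amount = @pending.delete(adjustment_id)
      next unless user_id          # already applied: no-op
      @balances[user_id] += amount
    end
  end

  def balance(user_id)
    @balances[user_id]
  end
end

store = BalanceStore.new
store.schedule(1, 42, 100)
3.times { store.perform(1) }  # duplicate deliveries
store.balance(42)             # => 100
```

In a real database the same effect comes from deleting the PendingBalanceAdjustment row inside the same transaction that updates the balance.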

Moving a Resque job between queues

Is there anyway to move a resque job between two different queues?
We sometimes get in the situation that we have a big queue and a job that is near the end we find a need to "bump up its priority." We thought it might be an easy way to simply move it to another queue that had a worker waiting for any high priority jobs.
This happens rarely and is usually a case where we get a special call from a customer, so scaling, re-engineering don't seem totally necessary.
There is nothing built-in in Resque. You can use rpoplpush like:
module Resque
  def self.move_queue(source, destination)
    r = Resque.redis
    r.llen("queue:#{source}").times do
      r.rpoplpush("queue:#{source}", "queue:#{destination}")
    end
  end
end
https://gist.github.com/rafaelbandeira3/7088498
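rpoplpush pops from the tail of the source list and pushes onto the head of the destination, so repeating it llen times preserves job order. In plain-Ruby terms (arrays modeling the two Redis lists, head at index 0):

```ruby
source      = %w[job1 job2 job3]
destination = []

source.length.times do
  destination.unshift(source.pop)  # what rpoplpush does per call
end

destination  # => ["job1", "job2", "job3"] -- order preserved
```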
If it's a rare occurrence, you're probably better off just manually pushing a new job into a shorter queue. You'll want to make sure your system has a way to identify that the job has already run and to bail out, so that when the copy in the long queue is finally reached it is not processed again (if double processing is a problem for you).

How to avoid meeting Heroku's API rate limit with delayed job and workless

My Survey model has about 2500 instances and I need to apply the set_state method to each instance twice. I need to apply it the second time only after every instance has had the method applied to it once. (The state of an instance can depend on the state of other instances.)
I'm using delayed_job to create delayed jobs and workless to automatically scale up/down my worker dynos as required.
The set_state method typically takes about a second to execute. So I've run the following at the heroku console:
2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end
Shouldn't be any issues with overloading the API, right?
And yet I'm still seeing the following in my logs for each delayed job:
Heroku::API::Errors::ErrorWithResponse: Expected(200) <=> Actual(429 Unknown)
I'm not seeing any infinite loops -- it just returns this message as soon as I create the delayed job.
How can I avoid blowing Heroku's API rate limits?
Reviewing workless, it looks like it incurs an API call per delayed job to check the worker count, and potentially a second API call to scale up/down. So if you are running 5000 (2500x2) jobs within a short period, you'll end up with 5000+ API calls, well in excess of the 1,200-requests-per-hour limit. I've commented over there to hopefully help reduce the overall API usage (https://github.com/lostboy/workless/issues/33#issuecomment-20982433), but I think we can offer a more specific solution for you.
In the meantime, especially if your workload is pretty predictable (like this), I'd recommend skipping workless and doing that portion yourself. I.e. it sounds like you already know WHEN the scaling needs to happen (scale up right before the loop above, scale down right after). If that is the case, you could do something like this to emulate the behavior in workless:
require 'heroku-api'

heroku = Heroku::API.new(:api_key => ENV['HEROKU_API_KEY'])
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', Survey.count)

2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end

min_workers = ENV['WORKLESS_MIN_WORKERS'].present? ? ENV['WORKLESS_MIN_WORKERS'].to_i : 0
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', min_workers)
Note that you'll need to remove workless from these jobs also. I didn't see a way to do this JUST for certain jobs, so you might want to ask on that project if you need it. Also, if this needs to be two-pass (the first pass needs to finish before the second), the 4-second sleep may in some cases be insufficient, but that is a different can of worms.
I hope that helps narrow in on what you needed, but I'm certainly happy to discuss further and/or elaborate on the above as needed. Thanks!

Are rails timers reliable when using Net::HTTP?

When reading data from a potentially slow website, I want to ensure that get_response can not hang, and so added a timer to timeout after x seconds. So far, so good. I then read http://ph7spot.com/musings/system-timer which illustrates that in certain situations timer.rb doesn't work due to ruby's implementation of threads.
Does anyone know if this is one of these situations?
url = URI.parse(someurl)
begin
  Timeout::timeout(30) do
    response = Net::HTTP.get_response(url)
    #responseValue = CGI.unescape(response.body)
  end
rescue Exception => e
  dosomething
end
Well, first of all, Timeout is not a class defined in Rails but in Ruby; second, Timeout is not reliable in cases where you make system calls.
Ruby uses what are called green threads. Suppose you have 3 threads: you'd think all of them run in parallel, but if one of the threads makes a syscall, all the rest of the threads will be blocked until the syscall finishes. In this case Timeout won't work as expected, so it's always better to use something reliable like SystemTimer.
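Besides SystemTimer, Net::HTTP enforces its own timeouts at the socket level, which avoids relying on Timeout entirely (fetch_with_timeouts and the 5/30-second values here are just an illustrative sketch, not part of the original code):

```ruby
require 'net/http'
require 'uri'

def fetch_with_timeouts(someurl)
  uri  = URI.parse(someurl)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = 5    # max seconds to establish the connection
  http.read_timeout = 30   # max seconds to wait on each read
  http.request(Net::HTTP::Get.new(uri.request_uri))
rescue Net::OpenTimeout, Net::ReadTimeout
  nil  # handle the slow site here, like dosomething above
end
```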