Run callback before (or during) Sidekiq retry - ruby-on-rails

I am looking to run a callback before (or when) a retry occurs for my Sidekiq worker jobs. We store the state of our jobs as an attribute (queued, performing, completed, etc.), and retries do not automatically reset this state. I need a way to run a callback that sets the retried job's state back to queued before the retried job gets queued. For example:
before_retry do
  self.status = "queued"
end
I read another post (Sidekiq Pro callback when batch is retried?) suggesting some sort of server middleware to detect retries and run callbacks. I'm not sure that will work, though, because I suspect the retry would already be queued by the time any middleware catches it. Are there any other options? Thanks in advance!

Would it work to rescue inside perform?
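def perform
  # ... do the work ...
rescue YourException
  # Reset the state before Sidekiq schedules the retry
  self.status = "queued"
  raise # re-raise so Sidekiq still sees the failure
end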
Re-raising the exception with a bare raise means Sidekiq still sees the failure, so the job is still retried.
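On the middleware concern in the question: as far as I know, in recent Sidekiq versions the retry handler wraps the server middleware chain. An exception raised in perform therefore passes through your server middleware before the retry is scheduled. That lets a single middleware do the reset for every worker, so you don't have to repeat the rescue in each perform.

The sketch below assumes a hypothetical convention: a worker opts in by defining a before_retry instance method. RetryCallbackMiddleware is also a made-up name. Only the call(worker, job, queue) middleware signature and the configure_server registration come from Sidekiq. The sketch does not try to detect the final attempt after retries are exhausted.

```ruby
# Sketch of a Sidekiq server middleware that runs a worker-defined
# `before_retry` hook (a hypothetical convention, not a Sidekiq API)
# whenever a job raises.
class RetryCallbackMiddleware
  # Sidekiq invokes server middleware as call(worker, job_hash, queue) { ... }
  def call(worker, job, _queue)
    yield
  rescue StandardError
    # Only reset state if the job is configured to be retried at all.
    # This does NOT detect the final attempt once retries are exhausted.
    worker.before_retry if retries_enabled?(job) && worker.respond_to?(:before_retry)
    raise # re-raise so Sidekiq's retry handling still sees the failure
  end

  private

  # "retry" is true, false, or a maximum retry count in the job payload.
  def retries_enabled?(job)
    setting = job["retry"]
    setting == true || (setting.is_a?(Integer) && setting.positive?)
  end
end

# Registration, e.g. in an initializer; skipped when Sidekiq isn't loaded.
if defined?(Sidekiq)
  Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
      chain.add RetryCallbackMiddleware
    end
  end
end
```

A worker would then define before_retry to do what the question's pseudo-callback does, for example setting its record's status back to "queued".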

Related

Rails: How to send json response immediately for long running job

I am processing long-running jobs, and I am invoking a long-running method from a controller. I want to send a JSON response to the user immediately. Below is my code, but it's not working.
def process
  record = Proces.create(:status => "in-progress")
  render :json => {:id => record.id}
  long_running_job()
end
How can I send the JSON response immediately?
You need to run your long_running_job in a background job queue. This moves the long-running task outside the web request cycle, so you can return the JSON immediately while the long-running process continues elsewhere. Look into ActiveJob if you're on Rails 4.2+, and/or Sidekiq.
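The fix has three steps: create the record, enqueue the work, and render immediately. The job then updates the record later. With ActiveJob this is roughly LongRunningJob.perform_later(record.id) followed by the render.

The framework-free sketch below only models that flow. An in-memory array stands in for the queue and a Struct stands in for the model. ProcessRecord, JOB_QUEUE, create_process and work_off are hypothetical names, not Rails or Sidekiq APIs.

```ruby
require "json"

# Hypothetical stand-in for the ActiveRecord model.
ProcessRecord = Struct.new(:id, :status)

RECORDS   = {} # id => ProcessRecord, plays the role of the database table
JOB_QUEUE = [] # plays the role of the Sidekiq/ActiveJob queue

# Plays the role of the controller action: enqueue, then respond at once.
def create_process
  record = ProcessRecord.new(RECORDS.size + 1, "in-progress")
  RECORDS[record.id] = record
  JOB_QUEUE << record.id # like LongRunningJob.perform_later(record.id)
  JSON.generate({ id: record.id, status: record.status }) # no waiting on the work
end

# Plays the role of a separate worker process draining the queue later.
def work_off
  while (id = JOB_QUEUE.shift)
    record = RECORDS.fetch(id)
    # ... the long-running work would happen here ...
    record.status = "completed"
  end
end
```

The response body is built before any work runs. The client then uses the returned id to check the status later, by polling or through a push channel as discussed further down.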

Rails avoid blocking worker in slow controller

Most DB/file I/O, and even external HTTP requests, are generally quick, but I am finding that the slower ones can hold up all my workers (and memory limits how many Ruby instances I can run). Creating large numbers of threads per worker has other issues, with CPU- or memory-heavy actions clogging up the system.
Can I have Rails process these actions asynchronously (more like Node.js), or else introduce threads for just that action in some way?
Since I want to respond to the original request, neither background workers nor simply spawning another thread myself seems appropriate, because Rails sends the response when the original thread returns from the controller.
# Synchronous version
def my_action
  @data1 = get_data("https://slow.com/data") # e.g. Net::HTTP
  @data2 = get_data("https://slow.com/data2?group_id=#{@data1["id"]}")
  render
end

# Promise-style version
def my_action
  get_data("https://slow.com/data").then do |data1| # e.g. internal thread, not sure on other options
    get_data("https://slow.com/data2?group_id=#{data1["id"]}").then do |data2|
      @data1 = data1
      @data2 = data2
      render # Appears to have no effect
    end
  end
  # Rails does an implicit "render" on return
end

# Explicit thread version
def my_action
  Thread.new do # explicit thread just for this request
    @data1 = get_data("https://slow.com/data")
    @data2 = get_data("https://slow.com/data2?group_id=#{@data1["id"]}")
    render
  end
end
In a Rails application, you're better off relying on an external process to run background jobs rather than using Ruby Threads.
Sidekiq is a pretty standard gem now for this purpose.
If it takes 10 seconds to process a request, and you want to send your response to the original HTTP request, then you've got to hold open that HTTP connection for 10 seconds. You can't get around that. If your server can handle X HTTP connections, and you have X+1 people making these slow requests... someone is going to get blocked.
There are only three possible solutions:
1. Figure out a way to process the requests faster. This is ideal, if you can do it.
2. Don't hold open the HTTP connection. Run a background task (using Sidekiq or a similar gem) to do the work. When it's done, send the result via websocket, or have the client poll for it. It makes your API more complicated for the client, but as a client I'd rather deal with a little complexity than have my requests blocked and maybe time out.
3. Scale up your server until it can handle the traffic. This is the "throw money at the problem" solution. I generally disapprove of this, since you'll have to keep throwing more money every time demand grows. But if your organization has more money than dev time, it might work for a while.
Those are your options.
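This relates to option 1 and to the thread-based attempt in the question. The Thread.new version renders nothing useful because the action returns, and Rails renders, before the thread finishes. If threads are used at all, the action has to wait for them before rendering, for example with Thread#value.

That only saves time when the slow calls are independent. In the question, data2 needs an id from data1, so those two calls cannot overlap. Below is a minimal sketch with a hypothetical get_data that just sleeps to simulate a slow HTTP call.

```ruby
# Hypothetical stand-in for a slow HTTP call (e.g. Net::HTTP.get).
def get_data(url)
  sleep 0.2
  { "url" => url }
end

# Fetch independent URLs concurrently and wait for all of them.
# Thread#value joins the thread and returns its block's result,
# re-raising any exception the thread raised.
def fetch_all(urls)
  threads = urls.map { |url| Thread.new { get_data(url) } }
  threads.map(&:value)
end
```

Inside a controller this would run before the implicit render. The request still holds a worker for as long as the slowest call takes, so this shortens the wait but does not remove the blocking described above.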

How to stop action execution with another action in rails?

I have the following actions in a Test controller:
def run
  while true do
    # code
  end
end

def stop
  # stop the run action
end
How can the stop action be implemented to halt the run action?
Because a client waits for a response from the server, you can't have a loop in one endpoint that waits for another endpoint to be called.
In this case, a client visits /test/run, but since the server won't return anything until the loop finishes, the client just keeps waiting.
This also means that, unless you have specifically configured your web server to handle concurrent requests, no other connection can reach the /test/stop endpoint.
If you must have a "job" that runs and can be cancelled by an endpoint, turn it into an actual background task.
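A sketch of that idea: the long-running job checks a cancellation flag between units of work, and the stop endpoint only sets the flag and returns. In a real deployment the job would run in a separate worker process (Sidekiq or similar). The flag therefore has to live somewhere both processes can see, such as a database column or a Redis key.

Here a mutex-guarded in-memory store and a Ruby thread stand in for both. CancellationStore, run_job and the job id 42 are hypothetical.

```ruby
# In-memory stand-in for shared state (a DB column or Redis key in a real app).
class CancellationStore
  def initialize
    @mutex = Mutex.new
    @cancelled = {}
  end

  def cancel(job_id)
    @mutex.synchronize { @cancelled[job_id] = true }
  end

  def cancelled?(job_id)
    @mutex.synchronize { @cancelled.fetch(job_id, false) }
  end
end

# The body of the former `run` action, moved into a background job.
# Returns the number of iterations completed before it was cancelled.
def run_job(job_id, store)
  iterations = 0
  until store.cancelled?(job_id)
    # ... one unit of work ...
    iterations += 1
    sleep 0.01
  end
  iterations
end

store  = CancellationStore.new
worker = Thread.new { run_job(42, store) } # what `run` would enqueue
sleep 0.05
store.cancel(42)                           # what `stop` would do
count = worker.value                       # the loop exits at its next check
```

Checking the flag once per unit of work means cancellation takes effect at the next check, not instantly. Choose the unit size with that latency in mind.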

Pull/push status in rails 3

I have a long-running task in the background. How exactly should the front end pull its status, or would it be better to push a notification of the task's completion to the front end?
Background:
Basically, my app uses a third-party service to process data. I don't want this external web service's workload to block incoming requests to my website, so I put the call inside a background job (I use Sidekiq). When the task is done, I was thinking of sending a webhook to a controller, which would notify the front end that the task is complete.
How can I do this? Is there a better solution?
Update:
My app is hosted on heroku
Update II:
I've done some research on the topic and found that I can create a separate app on Heroku to handle this. I found this example:
https://github.com/heroku-examples/ruby-websockets-chat-demo
This long running task will be run per user, on a website with a lot of traffic, is this a good idea?
I would implement this using a pub/sub system such as Faye or Pusher. The idea behind this is that you would publish the status of your long running job to a channel, which would then cause all subscribers of that channel to be notified of the status change.
For example, within your job runner you could notify Faye of a status change with something like:
client = Faye::Client.new('http://localhost:9292/')
client.publish('/jobstatus', {id: jobid, status: 'in_progress'})
And then in your front end you can subscribe to that channel using javascript:
var client = new Faye.Client('http://localhost:9292/');
client.subscribe('/jobstatus', function(message) {
  // the published payload has the keys id and status
  alert('the status of job #' + message.id + ' changed to ' + message.status);
});
Using a pub/sub system in this way allows you to scale your realtime page events separately from your main app - you could run Faye on another server. You could also go for a hosted (and paid) solution like Pusher, and let them take care of scaling your infrastructure.
It's also worth mentioning that Faye uses the Bayeux protocol, which means it will utilise WebSockets where they are available, and long-polling where they are not.
We have this pattern and use two different approaches. In both cases background jobs are run with Resque, but you could likely do something similar with DelayedJob or Sidekiq.
Polling
In the polling approach, we have a javascript object on the page that sets a timeout for polling with a URL passed to it from the rails HTML view.
This causes an Ajax ("script") call to the provided URL, so Rails looks for the JS template. We use that template to respond with the current state and to fire an event that the object responds to, whether or not the result is available yet.
This is somewhat complicated and I wouldn't recommend it at this point.
Sockets
The better solution we found was to use WebSockets (with shims). In our case we use PubNub, but there are numerous services that handle this. That keeps the polling and open connections off your web server, and it is much more cost-effective than running the servers needed to handle these connections.
You've stated you are looking for front-end solutions and you can handle all the front-end with PubNub's client JavaScript library.
Here's a rough idea of how we notify PubNub from the backend.
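class BackgroundJob
  @queue = :some_queue

  # Resque calls perform and its hooks as class methods with the job's arguments
  def self.perform(client_channel)
    # Do some action
  end

  def self.after_perform(client_channel)
    some_state = { status: "complete" } # whatever state the front end needs
    publish(some_state, client_channel)
  end

  def self.publish(some_state, client_channel)
    Pubnub.new(
      publish_key: Settings.pubnub.publish_key,
      subscribe_key: Settings.pubnub.subscribe_key,
      secret_key: Settings.pubnub.secret_key
    ).publish(
      channel: client_channel,
      message: some_state.to_json,
      http_sync: true
    )
  end
  private_class_method :publish
end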
The simplest approach that I can think of is that you set a flag in your DB when the task is complete, and your front-end (view) sends an ajax request periodically to check the flag state in db. In case the flag is set, you take appropriate action in the view. Below are code samples:
Since you mentioned that this long-running task runs per user, let's add a boolean column, task_complete, to the users table. When you add the job to Sidekiq, you unset the flag:
# Sidekiq worker: app/workers/task.rb
class Task
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    # Long running task code here, which executes per user
    user.task_complete = true
    user.save!
  end
end

# When adding the task to the Sidekiq queue
user = User.find(params[:id])
# The flag is true if the previous run finished (so the column should default to true).
# If it is false, Sidekiq already has a job entry for this user, so don't add another.
if user.task_complete?
  # Clear the flag before enqueuing, so a fast job can't set it to true
  # and then have it overwritten with false here.
  user.task_complete = false
  user.save!
  Task.perform_async(user.id)
end
In the view you can periodically check whether the flag was set using ajax requests:
<script type="text/javascript">
  var complete = false;
  (function worker() {
    $.ajax({
      url: '/task/status/<%= @user.id %>',
      dataType: 'script', // evaluate the returned status.js.erb
      success: function(data) {
        // update the view based on the ajax response, if you need to
      },
      complete: function() {
        // Schedule the next request when the current one is complete. Once the
        // global variable 'complete' is set to true, the task is done and no
        // more requests are needed.
        if (!complete) {
          setTimeout(worker, 5000); // in milliseconds
        }
      }
    });
  })();
</script>
# status action which returns the status of the task
# GET /task/status/:id
def status
  @user = User.find(params[:id])
end
# status.js.erb - add view logic based on what you want to achieve, given whether the task is complete or not
<% if @user.task_complete? %>
  $('#success').show();
  complete = true;
<% else %>
  $('#processing').show();
<% end %>
You can set the timeout based on the average execution time of your task. If your task takes 10 minutes on average, for example, there's no point in checking it every 5 seconds.
Also, if the task runs on a more complex schedule (not just once per day), you may want to add a task_completed_at timestamp and base your logic on a combination of the flag and the timestamp.
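One way to combine the flag and the timestamp: treat the task as done only if the flag is set and the completion happened after the run you are waiting for was requested. This guards against a stale true left over from an earlier run.

The sketch models the users row with a hypothetical Struct that holds the existing flag plus the suggested task_completed_at column. Where requested_at comes from is an assumption; it could be stored when the job is enqueued or sent by the polling client.

```ruby
# Hypothetical stand-in for the users row: the existing boolean flag plus
# the suggested task_completed_at timestamp.
UserRow = Struct.new(:task_complete, :task_completed_at)

# Done only if the flag is set AND the completion is not older than the
# request we are waiting on.
def task_done_since?(user, requested_at)
  return false unless user.task_complete
  return false if user.task_completed_at.nil?
  user.task_completed_at >= requested_at
end
```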
As for this part:
"This long running task will be run per user, on a website with a lot of traffic, is this a good idea?"
I don't see a problem with this approach, though architectural changes such as running jobs (Sidekiq workers) on separate hardware will help. These are lightweight ajax calls, and some intelligence built into your JavaScript (like the global complete flag) avoids unnecessary requests. If you have heavy traffic and DB reads/writes are a concern, you may want to store the flag directly in Redis instead, since you already have it for Sidekiq. I believe that will resolve your read/write concerns, and I don't expect it to cause problems. This is the simplest and cleanest approach I can think of. You could also achieve the same via WebSockets, which most modern browsers support, though they can cause problems in older versions.
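If the flag moves to Redis, a small wrapper keeps the key naming in one place. This sketch assumes a client with get/set like the redis gem's Redis#get and Redis#set (values come back as strings). Sidekiq's own pool is also reachable via Sidekiq.redis { |conn| ... }.

TaskFlag and the task_complete:<user_id> key format are made up for illustration. The test uses a Hash-backed fake client instead of a real Redis server.

```ruby
# Stores the per-user "task complete" flag in Redis instead of the users table.
# `redis` is any client responding to get/set, e.g. Redis.new from the redis gem.
class TaskFlag
  def initialize(redis)
    @redis = redis
  end

  def mark_complete(user_id)
    @redis.set(key(user_id), "1")
  end

  def mark_pending(user_id)
    @redis.set(key(user_id), "0")
  end

  # A missing key (never run) counts as complete, matching the original
  # logic where a user with no job in flight may be enqueued.
  def complete?(user_id)
    value = @redis.get(key(user_id))
    value.nil? || value == "1"
  end

  private

  def key(user_id)
    "task_complete:#{user_id}"
  end
end
```

The worker would call mark_complete where it previously saved task_complete = true, and the status action would read complete? instead of loading the user.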

Concurrent requests: second request does not recognize updates from the first

I have an extremely long operation (~30 s on average) that happens in a request cycle. It is not a huge deal because it is run very rarely and only through an administrative portal; otherwise I would push it to Resque or similar.
When the request starts and the object is not flagged, the first thing that happens in the controller method is to set a flag:
def create
  raise if @foo.flag?
  @foo.update_attributes! flag: true
  # perform lengthy operation
ensure
  @foo.update_attributes! flag: false
end
When I start the first request, it sets flag to true and begins the operation. When I then start a second request while the first is still running, it never raises an error, even though I can clearly see in the Rails console that flag is true for @foo.
I was under the impression that update_attributes! commits the database transaction, and so subsequent requests should see the changes. Is this not the case?
I am just testing in development with the thin server. Is there any way I can do this?
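Independently of the visibility question, the check in create is a read followed by a separate write. Two overlapping requests can therefore both read false before either writes true. The method-level ensure also runs when the initial raise fires, so a rejected request would reset the flag that the running request had set.

One common pattern is to let the database do the check and the set in a single statement, and to reset the flag only after a successful acquisition. With ActiveRecord that statement would be Foo.where(id: @foo.id, flag: false).update_all(flag: true), which returns the number of rows updated; 1 means this request took the flag. The pure-Ruby sketch below models that conditional update with a mutex so it runs without a database. FlagTable, try_acquire and create_with_lock are hypothetical names.

```ruby
# Models `UPDATE foos SET flag = true WHERE id = ? AND flag = false`:
# the check and the write happen as one atomic step.
class FlagTable
  def initialize
    @mutex = Mutex.new
    @flags = Hash.new(false)
  end

  # Returns true only for the caller that flipped the flag from false to true.
  def try_acquire(id)
    @mutex.synchronize do
      next false if @flags[id]
      @flags[id] = true
    end
  end

  def release(id)
    @mutex.synchronize { @flags[id] = false }
  end
end

# Shape of the controller action: reject if the flag is taken, and only
# the request that acquired the flag resets it afterwards.
def create_with_lock(table, id)
  raise "operation already running" unless table.try_acquire(id)
  begin
    yield # the lengthy operation
  ensure
    table.release(id)
  end
end
```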
