Non-blocking I/O in Ruby on Rails

My application builds reports based on data gathered from external APIs, which can take a few minutes. Naturally, I do not want to block the worker process. Is it possible to send the data gathering to the background and let the user keep working with the application?
It would also be good to display the state of the job with SSE or WebSockets.

Usually you would dispatch a background job to Resque or another background job queue, and have your worker perform jobs from the queue.
That looks like this:
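class ReportGenerationJob
  @queue = :reports # Resque reads the queue name from this class instance variable

  # Resque calls the class method perform with the (JSON-serializable) enqueued arguments
  def self.perform(report_id)
    # do expensive operations here
  end
end

Resque.enqueue(ReportGenerationJob, report_id) # Not a blocking operation! (report_id is illustrative)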
Once a given job is completed, your worker can then signal when it's done in some helpful way (e.g. e-mailing the user that the job is completed; writing a "done!" value to your database; et cetera).
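The "write a done! value" idea, which also covers the SSE/WebSockets part of the question, can be sketched in plain Ruby. This is a minimal, stdlib-only sketch under assumptions: a thread stands in for the Resque/Sidekiq worker process, a mutex-guarded Hash stands in for the database table or Redis hash a real app would share between web and worker processes, and `JobStatusStore`, `ReportJob` and the status strings are hypothetical names. A controller, SSE stream or WebSocket channel would read the stored status instead of waiting for the job.

```ruby
require "securerandom"

# Hypothetical stand-in for a status table or Redis hash shared by web and worker.
class JobStatusStore
  def initialize
    @mutex = Mutex.new
    @statuses = {}
  end

  def write(job_id, status)
    @mutex.synchronize { @statuses[job_id] = status }
  end

  def read(job_id)
    @mutex.synchronize { @statuses[job_id] }
  end
end

STORE = JobStatusStore.new

# Hypothetical job: records its progress so a UI can poll or stream it.
class ReportJob
  def self.perform(job_id, steps)
    STORE.write(job_id, "running")
    steps.times do |i|
      sleep 0.01 # simulate a slow external API call
      STORE.write(job_id, "running (#{i + 1}/#{steps})")
    end
    STORE.write(job_id, "done")
  rescue StandardError => e
    STORE.write(job_id, "failed: #{e.message}")
    raise
  end
end

# The "request" hands the work off and immediately gets an id it can poll later.
job_id = SecureRandom.uuid
STORE.write(job_id, "queued")
worker = Thread.new { ReportJob.perform(job_id, 3) }
puts STORE.read(job_id) # "queued" or an early "running..." value; the caller is not blocked
worker.join
puts STORE.read(job_id) # => done
```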

Related

Display info (date) about completed Sidekiq job

I'm using Rails 6 in my app with Sidekiq on board. I've got a FetchAllProductsWorker like the one below:
module Imports
  class FetchAllProductsWorker
    include Sidekiq::Worker
    sidekiq_options queue: 'imports_fetch_all', retry: 0

    def perform
      # (...)
    end
  end
end
I want to check when FetchAllProductsWorker last finished successfully and display this info in my front end. This job is fired sporadically, but the user must get feedback on when the last database sync (which FetchAllProductsWorker is responsible for) succeeded.
I only need this info for this one worker. I saw a lot of useful things in the Sidekiq API docs, but none of them relate to the history of completed jobs.
You could use the Sidekiq Batches API (a Sidekiq Pro feature), which provides an on_success callback, but it is mostly used for tracking batch work and is overkill for your problem. I suggest writing your code at the end of the perform method.
def perform
  # (...) Run the already implemented code
  # (notify/log success) Only reached if the code above did not raise
end
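To answer the original question (when did the last sync succeed?), the success step at the end of perform can simply persist a timestamp. A minimal sketch, assuming a key/value store: here a plain Hash named `LAST_SUCCESS` stands in for Redis or a database column, and `SyncJob` is a hypothetical name. Because the timestamp is written only after the job body returns, a failed run leaves the previous value untouched.

```ruby
require "time"

LAST_SUCCESS = {} # hypothetical stand-in for Redis or a settings table

class SyncJob
  def perform(should_fail: false)
    # ... run the already implemented code
    raise "sync failed" if should_fail

    # Only reached when nothing above raised.
    LAST_SUCCESS[self.class.name] = Time.now.utc.iso8601
  end
end

SyncJob.new.perform
first = LAST_SUCCESS["SyncJob"]

begin
  SyncJob.new.perform(should_fail: true)
rescue RuntimeError
  # In Sidekiq, the failure would be recorded and retried here.
end

puts LAST_SUCCESS["SyncJob"] == first # => true, the failed run did not overwrite it
```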
The simplified default lifecycle of a Sidekiq job looks like this:
– If there is an error, the job is retried (by default up to 25 times, with exponential backoff; read about Retries in the Sidekiq docs). During that time you can see the failing job and the error in the Sidekiq Web UI, if configured.
– If the job finishes successfully, it is removed from Redis, and no information about this specific job remains available to the application.
That means Sidekiq does not really support querying jobs that ran successfully in the past. If you need information about past jobs, you have to build it yourself. I see three basic options for monitoring Sidekiq jobs:
1. Write useful information to your application's log. Most logging tools support monitoring for specific messages, sending alerts, or creating views for specific events. This might be enough if you only need this information for debugging.
def perform
  Rails.logger.info("#{self.class.name} started")
  begin
    # job code
  rescue => exception
    Rails.logger.error("#{self.class.name} failed: #{exception.message}")
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    Rails.logger.info("#{self.class.name} finished successfully")
  end
end
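If several workers need the same started/failed/finished logging, the begin/rescue/else block can be written once and prepended. This is a plain-Ruby sketch, not a Sidekiq feature (Sidekiq's own mechanism for cross-cutting behavior like this is server middleware); `JobLogging` and `ImportJob` are hypothetical names, and a stdlib `Logger` writing to a `StringIO` stands in for `Rails.logger`.

```ruby
require "logger"
require "stringio"

LOG_OUTPUT = StringIO.new
LOGGER = Logger.new(LOG_OUTPUT) # stand-in for Rails.logger

# Wraps #perform of any class it is prepended to.
module JobLogging
  def perform(*args)
    LOGGER.info("#{self.class.name} started")
    begin
      result = super
    rescue StandardError => e
      LOGGER.error("#{self.class.name} failed: #{e.message}")
      raise # re-raise so the job system's retry handling still sees the error
    else
      LOGGER.info("#{self.class.name} finished successfully")
      result
    end
  end
end

class ImportJob
  prepend JobLogging

  def perform(fail = false)
    raise ArgumentError, "bad input" if fail
    :imported
  end
end

ImportJob.new.perform
begin
  ImportJob.new.perform(true)
rescue ArgumentError
  # expected: the failure was logged, then re-raised
end
puts LOG_OUTPUT.string
```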
2. If you are mostly interested in being informed when there is a problem, I suggest looking at a tool like Dead Man's Snitch. These tools work like this: you ping their API as the last step of a job, which is only reached when there was no error. Then you configure the tool to notify you if its API hasn't been pinged within the expected timeframe. For example, if you have a daily import job, Dead Man's Snitch would send you a message only if there wasn't a successful import job in the last 24 hours. If the job succeeds, it will not spam you every single day.
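require 'open-uri'

def perform
  # job code
  # Only reached if the job code did not raise. URI.open replaces the bare
  # open(url) form, which Ruby 3.0 no longer routes through open-uri.
  URI.open("https://nosnch.in/#{TOKEN}")
end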
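The monitoring side of such a dead man's switch boils down to one comparison: alert when the last ping is older than the expected interval plus some grace period. This is an illustrative sketch of that rule, not Dead Man's Snitch's actual implementation; `overdue?` and the one-hour default grace period are assumptions.

```ruby
# Returns true when no successful run has pinged within interval + grace seconds.
def overdue?(last_ping_at, now:, interval:, grace: 3600)
  return true if last_ping_at.nil? # never pinged at all
  now - last_ping_at > interval + grace
end

day = 24 * 60 * 60
now = Time.utc(2024, 1, 10, 12, 0, 0)

puts overdue?(now - 20 * 3600, now: now, interval: day) # => false, last run 20h ago
puts overdue?(now - 2 * day, now: now, interval: day)   # => true, a day was missed
puts overdue?(nil, now: now, interval: day)             # => true
```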
3. If you want your application's users to see job statuses on a dashboard in the application, then it makes sense to store that information in the database. You could, for example, create a JobStatus ActiveRecord model with columns like job_name, status, payload, and created_at, and then create records in that table whenever it is useful. Once the data is in the database, you can present it to the user like any other model's data.
def perform
  begin
    # job code
  rescue => exception
    JobStatus.create(job_name: self.class.name, status: 'failed', payload: exception.to_json)
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    JobStatus.create(job_name: self.class.name, status: 'success')
  end
end
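With rows like these, "when did FetchAllProductsWorker last succeed?" becomes a query; in ActiveRecord it could look something like `JobStatus.where(job_name: 'Imports::FetchAllProductsWorker', status: 'success').order(created_at: :desc).first`. The stdlib sketch below mirrors that query over an in-memory array so it runs standalone; the `JobStatus` Struct here is only a stand-in for the model described above, and the rows are made up for illustration.

```ruby
JobStatus = Struct.new(:job_name, :status, :created_at, keyword_init: true)

# In-memory stand-in for the job_statuses table.
rows = [
  JobStatus.new(job_name: "Imports::FetchAllProductsWorker", status: "success", created_at: Time.utc(2024, 1, 1)),
  JobStatus.new(job_name: "Imports::FetchAllProductsWorker", status: "failed",  created_at: Time.utc(2024, 1, 3)),
  JobStatus.new(job_name: "Imports::FetchAllProductsWorker", status: "success", created_at: Time.utc(2024, 1, 2)),
  JobStatus.new(job_name: "OtherWorker",                     status: "success", created_at: Time.utc(2024, 1, 5))
]

# Mirrors where(job_name:, status: "success").order(created_at: :desc).first
def last_success(rows, job_name)
  rows.select { |r| r.job_name == job_name && r.status == "success" }
      .max_by(&:created_at)
end

puts last_success(rows, "Imports::FetchAllProductsWorker").created_at # => 2024-01-02 00:00:00 UTC
```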
And, of course, you can combine these techniques and tools for different use cases: 1. for history and statistics, 2. for admins and people on call, 3. for users of your application.

Sidekiq worker best practices for recurring jobs

I'm trying to determine the best/most efficient way to schedule background jobs in Sidekiq. I need these jobs to run periodically (e.g. every 15 minutes, every day, etc.) for each user.
The jobs are tied to several objects such as calendars, postings, blogs, etc. Each user can have 0-many of those objects.
I have considered two options:
1) Have a scheduler job that in turn schedules a background worker for each one of the objects above. The worker would look something like this:
```ruby
class WorkerScheduler
  include Sidekiq::Worker

  def perform
    CalendarWorker.perform_async
    BlogWorker.perform_async
    # etc...
  end
end
```
and inside each worker, I would go through each of the available records (which may require threading, as discussed in "How get best performance rails requests parallel sidekiq worker"):
```ruby
class CalendarWorker
  include Sidekiq::Worker

  def perform
    calendars = Calendar.all
    calendars.each do |calendar|
      # actions for each calendar
    end
    # reschedule worker
    CalendarWorker.perform_in(15.minutes)
  end
end
```
2) Every time a new calendar, posting, blog, etc. record is created, schedule a background worker, and within that worker, reschedule it to perform again later as desired, i.e.:
```ruby
class CalendarWorker
  include Sidekiq::Worker

  def perform(calendar_id)
    calendar = Calendar.find(calendar_id)
    # ... complete all logic for this calendar ...
    CalendarWorker.perform_in(15.minutes, calendar_id)
  end
end
```
Is either of the above better than the other? I'm looking to make sure this is done in the most efficient way and also that my worker dyno (Heroku) does not get overloaded. Right now, I've been scheduling a job per record, and my memory seems to be maxing out the dyno at just under 500 MB, even though there is little to no traffic. Does the number of scheduled jobs have a big impact on memory usage?
Are there other potential ways to do this both for the job running itself and the scheduling?
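One way to reason about the memory question is to count how many entries each option keeps in Sidekiq's scheduled set at any moment. Sidekiq keeps scheduled jobs in a Redis sorted set rather than inside the worker process, so this count mainly affects Redis memory. The simulation below is a rough model under assumptions: plain arrays stand in for that sorted set, and the record counts are made up for illustration. Option 1 keeps one pending entry per worker class, option 2 keeps one per record.

```ruby
# Hypothetical record counts, for illustration only.
records = { "Calendar" => 1_000, "Blog" => 250, "Posting" => 4_000 }

# Option 1: one self-rescheduling job per worker class, no arguments.
option1_scheduled = records.keys.map { |klass| ["#{klass}Worker", []] }

# Option 2: one self-rescheduling job per record, each carrying its record id.
option2_scheduled = records.flat_map do |klass, count|
  (1..count).map { |id| ["#{klass}Worker", [id]] }
end

puts option1_scheduled.size # => 3
puts option2_scheduled.size # => 5250
```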

Ruby on Rails background application to run jobs automatically at a time dynamically defined by users?

I have a use case where a user schedules a 'command' from the web interface. The user also specifies the date and time at which the command needs to be triggered.
This is the sequence of steps:
1. The user schedules a command 'Restart Device' for May 31, 3pm.
2. This is saved in a database table called Command.
3. Now a background job needs to be triggered at the specified time to do something (make an API call, send an email, etc.).
4. Once the job is executed, the command is removed or marked done, until a new command is issued.
There could be multiple users concurrently performing the above sequence of steps.
Is delayed_job a good choice for the above? I couldn't find an example of how to implement it using delayed_job.
EDIT: The reason I was looking at delayed_job is that eventually I would need to leverage an existing relational database.
I would advise using Sidekiq. With it, you can use scheduled jobs to tell Sidekiq when to perform the jobs.
Example:
MyWorker.perform_at(3.hours.from_now, 'mike', 1)
EDIT: worker example
# app/workers/restart_device_worker.rb
class RestartDeviceWorker
  include Sidekiq::Worker

  def perform(params)
    # Do the job
    # ...
    # update in DB
  end
end
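Tying this back to the Command table from the question: a common pattern is to pass only the command's id to perform_at (for example `RestartDeviceWorker.perform_at(command.run_at, command.id)`) and have the worker reload the row, skip it if it is no longer pending, and mark it done afterwards. Below is a plain-Ruby sketch of that worker logic; `Command`, `COMMANDS`, `EXECUTED` and `run_command` are hypothetical stand-ins for the ActiveRecord model and the worker's perform.

```ruby
Command = Struct.new(:id, :name, :run_at, :status, keyword_init: true)

# In-memory stand-in for the commands table.
COMMANDS = {
  1 => Command.new(id: 1, name: "Restart Device", run_at: Time.utc(2024, 5, 31, 15), status: "pending")
}

EXECUTED = [] # records side effects so they can be inspected

# What the worker's perform(command_id) might do.
def run_command(command_id)
  command = COMMANDS[command_id]
  return :missing if command.nil?                    # row deleted since scheduling
  return :skipped unless command.status == "pending" # already run or cancelled

  EXECUTED << command.name # e.g. make the API call or send the email here
  command.status = "done"
  :done
end

puts run_command(1) # => done
puts run_command(1) # => skipped (a duplicate delivery does nothing)
puts run_command(2) # => missing
```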
See the docs: https://blog.codeship.com/how-to-use-rails-active-job/
https://guides.rubyonrails.org/active_job_basics.html
If you are using Rails 5, your best option is Active Job, a built-in feature (available since Rails 4.2).
Use ActiveJob
"Active Job – Make work happen later. Active Job is a framework for declaring jobs and making them run on a variety of queuing backends. These jobs can be everything from regularly scheduled clean-ups, to billing charges, to mailings. Anything that can be chopped up into small units of work and run in parallel, really."
Active Job has built-in adapters for multiple queuing backends (Sidekiq, Resque, Delayed Job and others). You just need to tell it which one to use, e.g. config.active_job.queue_adapter = :sidekiq.
Scenario: I want to delete a story after 24 hours (1 day). We create a job named StoriesCleanupJob and enqueue it when the story is created, like below:
StoriesCleanupJob.set(wait: 1.day).perform_later(story)
The job will then run after 1 day.
class StoriesCleanupJob < ApplicationJob
  queue_as :default

  def perform(story)
    if story.destroy
      # Put your own conditions here, e.g. update a status, whatever you want to perform.
    end
  end
end

Multithreading vs Background jobs in Rails

I have an application that makes thousands of requests to a web service API. Each request takes about 2 seconds, and then the response creates a new record in the database. I want to fire off as many of those requests as I can simultaneously, and save each response to the database as soon as I get it.
Is this something I should be using a gem like Sidekiq for, or Ruby's Thread class? I don't want to just hand off the requests to be handled synchronously.
Sounds like you need a thread pool for performing the operation, and a database thread to commit the results.
You can build one of these really simply:
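require 'thread'

db_queue = Queue.new

# Single thread that owns all database writes
writer = Thread.new do
  while (item = db_queue.pop)
    # ... Deal with item in queue
  end
end

# Example of supplying a job
db_queue.push(api_response)

# When finished: nil ends the while loop, and join waits for pending writes
db_queue.push(nil)
writer.join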
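Due to the Global Interpreter Lock in the standard Ruby runtime (MRI), only one thread executes Ruby code at a time, so threads are mainly useful for I/O-bound work such as waiting on many HTTP responses: the lock is released while a thread waits on I/O. If you need something more heavy-duty (true parallelism for CPU-bound work), JRuby might be what you're looking for.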
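The answer describes a thread pool for the API calls plus one thread for database writes, but only shows the writer side. Here is a fuller stdlib-only sketch of both halves under stated assumptions: `fake_api_call` (a short `sleep` standing in for the ~2 s HTTP request), the pool size of 10, and the `saved` array standing in for the database are all hypothetical. Because MRI releases the GVL while a thread sleeps or waits on I/O, the calls overlap even on standard Ruby.

```ruby
POOL_SIZE = 10

# Hypothetical stand-in for the slow HTTP request; sleeping releases the GVL.
def fake_api_call(n)
  sleep 0.05
  { id: n, body: "response #{n}" }
end

work_queue = Queue.new # Queue and Thread are core classes in modern Ruby
db_queue = Queue.new
saved = [] # stand-in for the database; only the writer thread touches it

# Single writer thread: serializes all database work.
writer = Thread.new do
  while (response = db_queue.pop)
    saved << response # e.g. a hypothetical Record.create!(response) in the real app
  end
end

# Fixed-size pool of API worker threads.
workers = Array.new(POOL_SIZE) do
  Thread.new do
    while (n = work_queue.pop)
      db_queue << fake_api_call(n)
    end
  end
end

(1..50).each { |n| work_queue << n }
POOL_SIZE.times { work_queue << nil } # one stop signal per worker
workers.each(&:join)

db_queue << nil # stop the writer only after every worker has finished
writer.join

puts saved.size # => 50
```

The single writer keeps database access serialized, which avoids each pool thread needing its own connection; the pool size caps how many requests are in flight at once.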

Sidekiq handling re-queue when processing large data

See the updated question below.
Original question:
In my current Rails project, I need to parse a large XML/CSV data file and save it into MongoDB.
Right now I use these steps:
1. Receive the uploaded file from the user and store the data in MongoDB.
2. Use Sidekiq to process the data in MongoDB asynchronously.
3. After processing finishes, delete the raw data.
For small and medium files on localhost, the steps above work well. But on Heroku, I use HireFire to dynamically scale the worker dyno up and down. When a worker is still processing a large file, HireFire sees an empty queue and scales the worker dyno down. This sends a kill signal to the process and leaves the processing in an incomplete state.
I'm looking for a better way to do the parsing that allows the process to be killed at any time (saving the current state when it receives the kill signal) and to be re-queued.
Right now I'm using Model.delay.parse_file, and it doesn't get re-queued.
UPDATE
After reading the Sidekiq wiki, I found an article about job control. Can anyone explain the code, how it works, and how it preserves its state when it receives a SIGTERM signal and the worker gets re-queued?
Is there any alternative way to handle job termination, save current state, and continue right from the last position?
It might be easiest to explain the process and the high-level steps, give a sample implementation (a stripped-down version of one that I use), and then talk about throw and catch:
1. Insert the raw CSV rows with an incrementing index (to be able to resume from a specific row/index later).
2. Process the CSV, stopping after every 'chunk' to check whether the job should stop, i.e. whether Sidekiq::Fetcher.done? returns true.
3. When the fetcher is done?, store the index of the current item on the user and return, so that the job completes and control is returned to Sidekiq.
Note that if a job is still running after a short timeout (default 20s), it will be killed.
4. When the job runs again, simply start where you left off last time (or at 0).
Example:
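class UserCSVImportWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    # Resume from the saved index; order by index so resumption is deterministic
    items = user.raw_csv_items.where(:index => { '$gte' => user.last_csv_index.to_i }).order_by(index: :asc)
    items.each_with_index do |item, i|
      # Every 100 items, check whether shutdown has started
      if ((i + 1) % 100).zero? && Sidekiq::Fetcher.done?
        user.update(last_csv_index: item.index) # this item has not been processed yet
        self.class.perform_async(user_id) # re-enqueue: a job that returns normally is not re-run
        return
      end
      # Process the item as normal
    end
  end
end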
The class above checks, every 100 items, whether the fetcher is done (a proxy for whether shutdown has started) and, if so, ends execution of the job. Before execution ends, however, we update the user with the index of the next unprocessed item so that we can start where we left off next time.
throw/catch is a way to implement the above functionality a little more cleanly (maybe), but, like Fibers, it is a nice concept that is hard to wrap your head around. Technically, throw/catch is closer to goto than most people are generally comfortable with.
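For comparison, here is roughly what a throw/catch variant could look like in plain Ruby. It is a sketch under assumptions: `shutting_down` is a lambda standing in for `Sidekiq::Fetcher.done?`, `process_items` is a hypothetical name, and the resume index is returned instead of being saved on the user. `throw` unwinds out of the loop straight to the matching `catch`, and the thrown value becomes the value of the `catch` block.

```ruby
# Processes items from start_index; returns [resume_index_or_nil, processed_items].
def process_items(items, start_index, shutting_down, chunk: 100)
  processed = []
  resume_at = catch(:shutdown) do
    (start_index...items.size).each do |i|
      # After every `chunk` items processed in this run, see whether shutdown began.
      if !processed.empty? && (processed.size % chunk).zero? && shutting_down.call
        throw :shutdown, i # jump straight to `catch`, carrying the resume index
      end
      processed << items[i].upcase # "process" the item
    end
    nil # only reached when the loop finished without a throw
  end
  [resume_at, processed]
end

items = (1..250).map { |n| "row#{n}" }
calls = 0
stop_on_second_check = -> { (calls += 1) >= 2 }

resume_at, done = process_items(items, 0, stop_on_second_check)
puts resume_at # => 200
puts done.size # => 200

resume_at, done = process_items(items, resume_at, -> { false })
p resume_at    # => nil
puts done.size # => 50
```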
Edit:
Alternatively, you could skip the call to Sidekiq::Fetcher.done? and record last_csv_index after each row or each chunk of rows processed. That way, if your worker is killed without the opportunity to record last_csv_index, you can still resume close to where you left off.
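The "record the index as you go" alternative can be simulated without Sidekiq. In this sketch a Hash named `checkpoint` stands in for the last_csv_index field on the user, raising `KilledError` stands in for the worker being killed, and `CHUNK` and `import_rows` are hypothetical names. Because the checkpoint is only written every `CHUNK` rows, a resumed run can repeat up to `CHUNK` rows, which is why the per-row processing should also be idempotent.

```ruby
class KilledError < StandardError; end

CHUNK = 10

# Processes rows from the saved checkpoint, saving progress every CHUNK rows.
# kill_at simulates the process dying just before that row is handled.
def import_rows(rows, checkpoint, handled, kill_at: nil)
  start = checkpoint.fetch(:last_csv_index, 0)
  (start...rows.size).each do |i|
    raise KilledError, "killed at #{i}" if i == kill_at
    handled << rows[i] # process the row
    checkpoint[:last_csv_index] = i + 1 if ((i + 1) % CHUNK).zero?
  end
  checkpoint[:last_csv_index] = rows.size # finished
end

rows = (0...35).to_a
checkpoint = {}
handled = []

begin
  import_rows(rows, checkpoint, handled, kill_at: 27)
rescue KilledError
  # The job gets re-queued; only the last checkpoint survived.
end
puts checkpoint[:last_csv_index] # => 20

import_rows(rows, checkpoint, handled) # resumes at row 20
puts handled.size      # => 42 (rows 20..26 were handled twice)
puts handled.uniq.size # => 35
```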
You are trying to address idempotency: the idea that processing something multiple times, possibly with incomplete runs, does not cause problems. (https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-jobs-idempotent-and-transactional)
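Idempotency here usually means keying each write on something stable, such as the row index, so that re-processing a row overwrites instead of duplicating. A minimal stdlib sketch, assuming a Hash stands in for the target collection; in a real store this would be an upsert keyed on the index field rather than a plain insert. `save_row` and `collection` are hypothetical names.

```ruby
collection = {} # stand-in for the target collection, keyed by row index

# Upsert: writing the same row twice leaves one document, not two.
def save_row(collection, index, attrs)
  collection[index] = attrs.merge(index: index)
end

rows = [{ sku: "A1" }, { sku: "B2" }, { sku: "C3" }]

# The first (interrupted) run gets through two rows...
rows.first(2).each_with_index { |row, i| save_row(collection, i, row) }
# ...and the retried run starts from scratch and processes everything again.
rows.each_with_index { |row, i| save_row(collection, i, row) }

puts collection.size # => 3, not 5
```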
Possible steps forward
Split the file up into parts and process those parts with a job per part.
Raise HireFire's threshold so that it only scales down once jobs are likely to have fully completed (e.g. 10 minutes).
Don't allow HireFire to scale down while a job is working (set a Redis key on start and clear it on completion).
Track progress of the job as it is processing and pick up where you left off if the job is killed.
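The first suggestion, splitting the file into parts with one job per part, keeps each job short enough to finish within a shutdown window. A rough sketch using `each_slice`; `enqueue_part` is a hypothetical placeholder for something like `PartImportWorker.perform_async(upload_id, first_row, last_row)` (also hypothetical), and the part size of 500 rows is an arbitrary assumption.

```ruby
PART_SIZE = 500 # arbitrary; pick a size that finishes well inside the shutdown timeout

# Returns [first_row, last_row] pairs, one per job to enqueue.
def plan_parts(total_rows, part_size = PART_SIZE)
  (0...total_rows).each_slice(part_size).map { |slice| [slice.first, slice.last] }
end

enqueued = []
enqueue_part = ->(first_row, last_row) { enqueued << [first_row, last_row] } # hypothetical enqueue

plan_parts(1_234).each { |first_row, last_row| enqueue_part.call(first_row, last_row) }

p enqueued # => [[0, 499], [500, 999], [1000, 1233]]
```

If a part's job is killed, only that part is retried, so the work lost to a scale-down is bounded by the part size.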

Resources