I have a worker that downloads a JSON file from S3, then a streaming JSON parser (Oj::Saj) that parses the file into my DB. I can update the worker's status from the worker's class, but once I am inside the parser class I am outside the scope of the worker (or so it appears to me):
class Worker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker

  class SajParser < Oj::Saj
    at 5 # this doesn't update the status of the worker
  end

  def perform
    at 5 # this does update the status of the worker
  end
end
I would like a solution that allows me to update the status of the worker as the parser goes over the JSON and inserts it into the DB.
If you can make Sidekiq::Status's #at method take the jid of the job to update along with its current status, then yes, it is possible. Maybe you could follow up on this issue; it seems the same as the one you have.
If I may make a suggestion: actually use Sidekiq's power to parallelize the parser's job across more workers instead of having one worker do everything. Since you have the JSON in memory, you can spin up a job for each part of the JSON, and whenever one of them is done it should store its status somewhere accessible to all jobs, like the DB for example.
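Alternatively, one way around the scope problem without changing sidekiq-status at all is to pass the worker instance into the Saj handler so progress reports flow through the worker's own #at method. A minimal sketch, where the hash_end callback, the row counter, and the fetch_from_s3 helper are illustrative assumptions, not part of the original question:

class Worker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker

  class SajParser < Oj::Saj
    def initialize(worker)
      @worker = worker
      @rows = 0
      super()
    end

    def hash_end(key)
      @rows += 1
      @worker.at(@rows) # progress flows through the worker's own #at
    end
  end

  def perform
    json = fetch_from_s3 # hypothetical S3 download helper
    Oj.saj_parse(SajParser.new(self), json)
  end
end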
Related
I'm using Rails 6 in my app with Sidekiq on board. I've got a FetchAllProductsWorker like below:
module Imports
  class FetchAllProductsWorker
    include Sidekiq::Worker
    sidekiq_options queue: 'imports_fetch_all', retry: 0

    def perform
      (...)
    end
  end
end
I want to check when FetchAllProductsWorker last finished successfully and display this info in my front-end. This job will be fired sporadically, but the user must have feedback on when the last database sync (which FetchAllProductsWorker is responsible for) succeeded.
I want to have this info only for this one worker. I saw a lot of useful things in the Sidekiq API docs, but none of them relate to the history of completed jobs.
You could use the Sidekiq Batches API, which provides an on_success callback, but that is mostly meant for tracking batch work and is overkill for your problem. I suggest adding your code at the end of the perform method:
def perform
  (...) # run the already implemented code
  (notify/log for success) # if it is successful, notify/log
end
The simplified default lifecycle of a Sidekiq job looks like this:
– If there is an error, the job is retried a couple of times (read about Retries in the Sidekiq docs). During that time you can see the failing job and the error in the Sidekiq Web UI, if configured.
– If the job finishes successfully, it is removed from Redis and no information about this specific job remains available to the application.
That means Sidekiq does not really support querying jobs that ran successfully in the past. If you need information about past jobs, you have to build it on your own. I basically see three options for monitoring Sidekiq jobs:
Write useful information to your application's log. Most logging tools support monitoring for specific messages, sending alerts, or creating views for specific events. This might be enough if you just need this information for debugging.
def perform
  Rails.logger.info("#{self.class.name} started")
  begin
    # job code
  rescue => exception
    Rails.logger.error("#{self.class.name} failed: #{exception.message}")
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    Rails.logger.info("#{self.class.name} finished successfully")
  end
end
If you are mostly interested in being informed when there is a problem, I suggest looking at a tool like Dead Man's Snitch. Those tools work by having you ping their API as the last step of a job, which is only reached when there was no error. You then configure the tool to notify you if its API hasn't been pinged in the expected timeframe. For example, if you have a daily import job, Dead Man's Snitch would send you a message only if there wasn't a successful import job in the last 24 hours. If the job was successful, it will not spam you every single day.
require 'open-uri'

def perform
  # job code
  open("https://nosnch.in/#{TOKEN}")
end
If you want your application's users to see job statuses on a dashboard in the application, then it makes sense to store that information in the database. You could, for example, create a JobStatus ActiveRecord model with columns like job_name, status, payload, and created_at, and then create records in that table whenever it feels useful. Once the data is in the database, you can present it like every other model's data to the user.
def perform
  begin
    # job code
  rescue => exception
    JobStatus.create(job_name: self.class.name, status: 'failed', payload: exception.to_json)
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    JobStatus.create(job_name: self.class.name, status: 'success')
  end
end
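If it helps, here is a possible migration for that table plus the dashboard query; the column names and the index are assumptions mirroring the snippet above:

class CreateJobStatuses < ActiveRecord::Migration[6.0]
  def change
    create_table :job_statuses do |t|
      t.string :job_name, null: false
      t.string :status, null: false
      t.text :payload
      t.timestamps
    end
    add_index :job_statuses, [:job_name, :created_at]
  end
end

# the front-end can then show the last successful sync with something like:
JobStatus.where(job_name: 'Imports::FetchAllProductsWorker', status: 'success').maximum(:created_at)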
And, of course, you can combine all those techniques and tools for different use cases: 1. for history and statistics, 2. for admins and people on call, 3. for users of your application.
I have a use case where a user schedules a 'command' from the web interface. The user also specifies the date and time the command needs to be triggered.
This is the sequence of steps:
1. User schedules a command 'Restart Device' at May 31, 3pm.
2. This is saved in a database table called Command.
3. A background job needs to be triggered at the specified time to do something (make an API call, send an email, etc.).
4. Once the job is executed, it is removed or marked done, until a new command is issued.
There could be multiple users concurrently performing the above sequence of steps.
Is delayed_job a good choice for the above? I couldn't find an example of how to implement this using delayed_job.
EDIT: the reason I was looking at delayed_job is that eventually I would need to leverage the existing relational database.
I would advise using Sidekiq. With it you can use scheduled jobs to tell Sidekiq when to perform the jobs.
Example:
MyWorker.perform_at(3.hours.from_now, 'mike', 1)
EDIT : worker example
# app/workers/restart_device_worker.rb
class RestartDeviceWorker
  include Sidekiq::Worker

  def perform(params)
    # Do the job
    # ...
    # update in DB
  end
end
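A hypothetical sketch of wiring this to the Command table from the question; command.run_at and the controller shape are assumptions, not part of the original answer:

class CommandsController < ApplicationController
  def create
    command = Command.create!(command_params) # step 2: persist the command
    # step 3: schedule the worker for the user-specified time
    RestartDeviceWorker.perform_at(command.run_at, command.id)
    head :created
  end
end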
See the docs:
https://blog.codeship.com/how-to-use-rails-active-job/
https://guides.rubyonrails.org/active_job_basics.html
If you are using Rails 5 or later, then your best option is ActiveJob (a built-in feature).
Use ActiveJob
"Active Job – Make work happen later. Active Job is a framework for declaring jobs and making them run on a variety of queuing backends. These jobs can be everything from regularly scheduled clean-ups, to billing charges, to mailings. Anything that can be chopped up into small units of work and run in parallel, really."
Active Job has built-in adapters for multiple queuing backends (Sidekiq, Resque, Delayed Job, and others). You just need to tell Rails which one to use.
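For example, in config/application.rb (Sidekiq here is just an illustration; any supported adapter is set the same way):

# config/application.rb
config.active_job.queue_adapter = :sidekiq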
Scenario: I want to delete my story after 24 hours (1 day). So we create a job named StoriesCleanupJob and enqueue it at the time of the story's creation, like below:
StoriesCleanupJob.set(wait: 1.day).perform_later(story)
It will run the job after 1 day.
class StoriesCleanupJob < ApplicationJob
  queue_as :default

  def perform(story)
    if story.destroy
      # put your own logic here, like updating a status or whatever else you want to perform
    end
  end
end
I have a "Play" button in my app that checks a stock value from an API and creates a Position object that holds that value. This action uses Resque to make a background job using Resque and Redis in the following way:
Controller - stock_controller.rb:
def start_tracking
  @stock = Stock.find(params[:id])
  Resque.enqueue(StockChecker, @stock.id)
  redirect_to :back
end
Worker:
class StockChecker
  @queue = :stock_checker_queue

  def self.perform(stock_id)
    stock = Stock.find_by(id: stock_id)
    stock.start_tracking_position
  end
end
Model - stock.rb:
def start_tracking_position
  # a Position instance that holds the stock value is created
end
I now want this to happen every 15 minutes for every Stock object. I looked at the scheduling section on the Ruby Toolbox website and have a hard time deciding what fits my needs and how to start implementing it.
My concern is that my app will create tons of Position objects, so I need something that is simple, uses Resque, and can withstand this kind of object creation without overloading the app.
Which gem should I use, and what is the simplest way to make my Resque job run every 15 minutes once the start_tracking action happens on a Stock object?
I've found resque-scheduler to be useful: https://github.com/resque/resque-scheduler.
Configure the schedule.yml for a 15-minute interval, as sketched below.
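A minimal sketch, assuming resque-scheduler is loaded in an initializer. Because the schedule is global rather than per-record, one option is a small wrapper job that enqueues StockChecker for every tracked stock (Stock.tracked is an assumed scope):

require 'resque-scheduler'

# the Ruby-hash equivalent of a config/resque_schedule.yml entry
Resque.schedule = {
  'stock_checker' => {
    'every' => '15m',
    'class' => 'StockCheckerScheduler',
    'queue' => 'stock_checker_queue',
    'description' => 'Enqueues a StockChecker job per tracked stock'
  }
}

class StockCheckerScheduler
  @queue = :stock_checker_queue

  def self.perform
    Stock.tracked.find_each do |stock| # assumed scope for stocks being tracked
      Resque.enqueue(StockChecker, stock.id)
    end
  end
end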
The biggest issue I found was ensuring it keeps running after releases etc. In the end I set up God to shut it down and restart it.
In terms of load, I'm not sure I follow: the scheduler only triggers events; the load is determined by the number of workers you have and how you decide to implement the creation. You can set the priority of the queues and the workers per queue, but if you don't process jobs in a timely way you get a backlog; is that acceptable? Normally you would run the workers on a separate server, minimising the impact on the front end.
In my Rails 3.2 project, I am using SuckerPunch to run an expensive background task when a model is created/updated.
Users can perform different types of interactions on this model. Most of the time these updates are pretty well spaced out; however, for some other actions like re-ordering, bulk updates, etc., those POST requests can come in very frequently, and that's when they overwhelm the server.
My question is: what would be the most elegant/smart strategy to start the background job when the first update happens, but wait, say, 10 seconds to make sure no more updates are coming in to that Model (table, not row) and then execute the job? So effectively throttling without queuing.
My sucker_punch worker looks something like this:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    # perform some expensive job
  end
end
It gets called from the Marker and Map models, and sometimes from controllers (for update_all cases), like so:
after_save :generate_static_map_html

def generate_static_map_html
  StaticMapWorker.new.async.perform(self.map, self.map.markers)
end
So, a pretty standard setup for running a background job. How do I make the job wait, or not get scheduled, until there have been no updates to my Model (or table) for x seconds?
If it helps, Map has_many Markers, so triggering the job whenever any marker association of a map updates would be alright too.
What you are looking for is delayed jobs, implemented through ActiveJob's perform_later. According to the edge guides, that isn't implemented in sucker_punch.
ActiveJob::QueueAdapters comparison
Fret not, however, because you can implement it yourself pretty simply. When your worker picks the job up from the queue, first compare the record's modified_at timestamp to 10 seconds ago. If the model has been modified within that window, simply re-add the job to the queue and abort gracefully.
code!
As per the example about two-fifths of the way down the sucker_punch GitHub page, which explains how to enqueue a job from within a worker:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    if Map.where(modified_at: 10.seconds.ago..Time.now).count > 0
      # the table was modified in the last 10 seconds: re-enqueue and retry
      StaticMapWorker.new.async.perform(map, markers)
    else
      # perform some expensive job
    end
  end
end
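One hedged refinement of the sketch above: re-enqueueing immediately spins in a tight loop until the quiet period is reached, so backing off briefly before retrying may be gentler. The 10-second values mirror the answer; note that sleep does occupy one of the worker threads while it waits:

def perform(map, markers)
  if Map.where(modified_at: 10.seconds.ago..Time.now).exists?
    sleep 10 # back off before checking again
    StaticMapWorker.new.async.perform(map, markers)
  else
    # perform some expensive job
  end
end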
I have a Rails application where I want to run a job in the background, but I need to run it 2 hours after the original event.
The use case might be something like this:
User posts a product listing.
A background job is queued to syndicate the listing to 3rd-party APIs, but even after the original request the response can take a while, and the 3rd party's solution is to poll them every 2 hours to see if we can get a success acknowledgement.
So is there a way to queue a job so that a worker daemon knows to ignore it, or only pick it up, at the scheduled time?
I don't want to use cron because it will load up a whole application stack and may execute twice on overlapping long-running jobs.
Can a priority queue be used for this? What solutions are there to implement this?
Try delayed_job: https://github.com/collectiveidea/delayed_job
Something along these lines?
class ProductCheckSyndicateResponseJob < Struct.new(:product_id)
  def perform
    product = Product.find(product_id)
    if product.still_needs_syndicate_response
      # do it ...
      # still no response, check again in two hours
      Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(product.id), :run_at => 2.hours.from_now)
    else
      # nothing to do ...
    end
  end
end
Enqueue the job the first time in the controller, or maybe in a before_create callback on the model:
Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(@product.id), :run_at => 2.hours.from_now)
Use the Rufus Scheduler gem. It runs as a background thread, so you don't have to load the entire application stack again. Add it to your Gemfile, and then your code is as simple as:
# in an initializer:
SCHEDULER = Rufus::Scheduler.start_new

# then, wherever you want in your Rails app:
SCHEDULER.in('2h') do
  # whatever code you want to run in 2 hours
end
The GitHub page has tons more examples.
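For the question's polling use case specifically, rufus-scheduler's every can repeat the check and unschedule itself once the third party acknowledges. A sketch, assuming the same SCHEDULER constant; Product#syndicate_acknowledged? is an assumed predicate:

product_id = @product.id

SCHEDULER.every('2h') do |job|
  # stop polling once the 3rd party has acknowledged
  job.unschedule if Product.find(product_id).syndicate_acknowledged?
end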