I have a "Play" button in my app that checks a stock value from an API and creates a Position object that holds that value. This action enqueues a background job with Resque and Redis in the following way:
Controller - stock_controller.rb:
def start_tracking
  @stock = Stock.find(params[:id])
  Resque.enqueue(StockChecker, @stock.id)
  redirect_to :back
end
Worker:
class StockChecker
  @queue = :stock_checker_queue

  def self.perform(stock_id)
    stock = Stock.find_by(id: stock_id)
    stock.start_tracking_position
  end
end
Model - stock.rb:
def start_tracking_position
  # A Position instance that holds the stock value is created
end
I now want this to happen every 15 minutes for every Stock object. I looked at the scheduling section on the Ruby Toolbox website and have a hard time deciding what fits my needs and how to start implementing it.
My concern is that my app will create tons of Position objects, so I need something that is simple, uses Resque, and can withstand this kind of object creation without overloading the app.
What gem should I use, and what is the simplest way to make my Resque job happen every 15 minutes once the start_tracking action has happened on a Stock object?
I've found resque-scheduler to be useful: https://github.com/resque/resque-scheduler.
Configure schedule.yml for a 15-minute interval.
The biggest issue I found was ensuring it keeps running after releases and the like; in the end I set up God to shut it down and restart it.
In terms of load, I'm not sure I follow: the scheduler only triggers events, and the load is determined by the number of workers you have and how you decide to implement the creation. You can set the priority of the queues and assign workers per queue, but if you don't process jobs in a timely way you get a backlog; is that acceptable? Normally you would run the workers on a separate server, minimising the impact on the front end.
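As a sketch, a resque-scheduler schedule.yml entry for a 15-minute interval could look like the following. One common pattern is to schedule a single fan-out job that enqueues one StockChecker job per Stock; the StockCheckerScheduler class name here is hypothetical, while the queue name comes from the question:

```yaml
# config/schedule.yml (loaded by resque-scheduler)
stock_checker_scheduler:
  every: 15m
  class: StockCheckerScheduler
  queue: stock_checker_queue
  description: "Enqueues a StockChecker job for every tracked Stock"
```

The fan-out worker itself would just loop over `Stock.find_each` and call `Resque.enqueue(StockChecker, stock.id)` for each one, so the per-stock jobs are still processed by ordinary Resque workers.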
Related
I am building an online e-commerce store, and I am trying to use Rails with Action Cable to update a product from being out of stock to in stock at a certain datetime, e.g. 12:00:00 2020-02-19.
The idea is as soon as the time is reached, I want to push a Websocket that the product is now available.
I have tried a few solutions such as:
Thread.new do
  while true do
    if **SOMETIME** == Time.now
      ActionCable.server.broadcast "product_channel", content: "product-in-stock"
    end
  end
end
The main issue with this approach is that it creates another thread and makes rails unresponsive. Furthermore, if this value is set for say 1 week from now I do not want every user who queries the endpoint to create a brand-new thread running like this.
You have two options: use Sidekiq scheduled jobs, or use the whenever gem.
https://github.com/mperham/sidekiq/wiki/Scheduled-Jobs
Whenever allows you to set a specific day and time; check the documentation for more info:
https://github.com/javan/whenever
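With Sidekiq, a scheduled job for this looks roughly like the sketch below. The worker name is hypothetical, and Sidekiq itself is stubbed so the sketch stands alone; in a real app you would `include Sidekiq::Worker`, and `perform_at` would persist the job to Redis and run it at the given time:

```ruby
require "time"

# Stub standing in for Sidekiq (assumption: replace with the real gem).
module FakeSidekiqWorker
  def perform_at(timestamp, *args)
    # Real Sidekiq serializes the job to Redis and runs #perform at `timestamp`.
    [timestamp, args]
  end
end

class ProductAvailableWorker
  extend FakeSidekiqWorker # real code: include Sidekiq::Worker

  def perform(product_id)
    # Broadcast over Action Cable once the product is in stock, e.g.:
    # ActionCable.server.broadcast "product_channel", content: "product-in-stock"
  end
end

release_time = Time.parse("2020-02-19 12:00:00")
ProductAvailableWorker.perform_at(release_time, 42) # 42: a hypothetical product id
```

Because the job lives in Redis rather than in a thread, it survives restarts and costs nothing while it waits, which avoids both problems with the busy-loop approach above.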
Currently I am building an application that allows users to place bids on products and admins to approve them. The 'transactions' themselves take place outside of the scope of the application. Currently, users see the price of an asset on the transaction/new page and submit a bid by submitting the form. Admins click a button to approve the bid.
class TransactionsController < ApplicationController
  before_action :get_price

  def new
    @price = get_price
    @transaction = Transaction.new
  end

  ###

  def get_price
    @price = <<Some External Query>>
  end

  def approve
    t = Transaction.find(params[:id])
    t.status = "Approved"
    t.save!
  end
end
Obviously this is not ideal. I don't want to query the API every time a user wants to submit a bid. Ideally, I could query this API every 5-10 seconds in the background and use the price in that manner. I have looked at a couple of techniques for running background jobs including delayed_job, sidekiq, and resque. For example in sidekiq I could run something like this:
# app/workers/price_worker.rb
class PriceWorker
  include Sidekiq::Worker

  def perform(*args)
    get_price
  end

  def get_price
    @price = <<Some External Query>>
  end
end

# config/initializers/sidekiq.rb
schedule_file = "config/schedule.yml"

if File.exist?(schedule_file) && Sidekiq.server?
  Sidekiq::Cron::Job.load_from_hash YAML.load_file(schedule_file)
end

# config/schedule.yml
my_price_job:
  cron: "*/10 * * * * *"
  class: "PriceWorker"
That code runs. The problem is that I am kind of lost on how to handle the price variable and pass it back to the user from the worker. I have watched the Railscasts episodes for both Sidekiq and Resque. I have written background workers and jobs that queue and run properly, but I cannot figure out how to integrate them into my application. This is the first time I have dealt with background jobs, so I have a lot to learn. I have spent some time researching this issue, and it seems like background jobs are used more for longer-running tasks, like rebuilding DB indexes, than for constantly recurring jobs (like an API request every 5 seconds).
So to sum up: what is the proper technique for running a constantly recurring task, such as querying an external API, in Rails? Any feedback on how to do this properly will be greatly appreciated! Thank you.
That is not how background jobs work. You're right, you have a lot of reading up to do. Think of running an asynchronous job processor like Sidekiq as running an entirely separate app. It shares the same code base as your Rails app but it runs completely separately. If you want these two separate apps to talk to each other then you have to design and write that code.
For example, I would define a cache with reader and writer methods, then have the cache populated when necessary:
someone loads product "foo" for the first time on your site
Rails checks the cache and finds it empty
Rails calls the external service
Rails saves the external service response to the cache using its writer method
Rails returns the cached response to the client
The cache would be populated thereafter by Sidekiq:
someone loads product "foo" for the second time on your site
Rails checks the cache and finds the cached value from above
Rails fires a Sidekiq job telling it to refresh the cache
Rails returns the cached response to the client
Continuing from step 3 above:
Sidekiq checks to see when the cache was last refreshed, and if it was more than x seconds ago then continue, else quit
Sidekiq calls the external service
Sidekiq saves the external service response to the cache using its writer method
When the next client loads product "foo", Rails will read the cache that was updated (or not updated) by Sidekiq.
With this type of system, the cache must be an external store of some kind, like a relational database (MySQL, Postgres, SQLite) or a NoSQL store (Redis, Memcached). You cannot use the internal Rails cache because the Rails cache exists only within the memory space of the Rails app and is not readable by Sidekiq (because Sidekiq runs as a totally separate app).
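To make the shape of this concrete, here is a pure-Ruby sketch of the reader/writer cache described above. As assumptions for illustration, a plain hash stands in for the external store (Redis, Postgres, ...) and a constant stands in for the external service call; all method names are made up:

```ruby
# A plain hash stands in for the shared external store.
CACHE = {}
CACHE_MAX_AGE = 10 # seconds before Sidekiq should refresh an entry

def write_price(product_id, value)
  CACHE[product_id] = { value: value, refreshed_at: Time.now }
end

def read_price(product_id)
  entry = CACHE[product_id]
  entry && entry[:value]
end

# Sidekiq side: decide whether the entry is stale enough to refresh.
def needs_refresh?(product_id)
  entry = CACHE[product_id]
  entry.nil? || (Time.now - entry[:refreshed_at]) > CACHE_MAX_AGE
end

# Rails side: return the cached price, populating it on a cache miss.
def price_for(product_id)
  cached = read_price(product_id)
  return cached if cached # in a real app, also enqueue a Sidekiq refresh here

  fresh = 100.0 # stands in for the external service call
  write_price(product_id, fresh)
  fresh
end
```

The controller only ever reads through `price_for`, so users never wait on the external API after the first request; the Sidekiq worker calls `needs_refresh?` and `write_price` on its own schedule.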
I guess in this case you should use the Rails cache. Put something like this in your controller:
@price = Rails.cache.fetch('price') do
  <<Some external query>>
end
You can also configure the cache expiration by setting the expires_in argument; see https://apidock.com/rails/ActiveSupport/Cache/Store/fetch for more information.
Regarding using background jobs to update your "price" value: you would need to store the retrieved data anyway (in some kind of database) and fetch it in your controller.
In my Rails 3.2 project, I am using SuckerPunch to run an expensive background task when a model is created/updated.
Users can do different types of interactions on this model. Most of the time these updates are pretty well spaced out; however, for some other actions like re-ordering, bulk updates, etc., those POST requests can come in very frequently, and that's when they overwhelm the server.
My question is: what would be the most elegant/smart strategy to start the background job when the first update happens, but wait, say, 10 seconds to make sure no more updates are coming in to that model (the table, not a row), and only then execute the job? So effectively throttling without queuing.
My sucker_punch worker looks something like this:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    # perform some expensive job
  end
end
It gets called from the Marker and Map models, and sometimes from controllers (for update_all cases), like so:
after_save :generate_static_map_html

def generate_static_map_html
  StaticMapWorker.new.async.perform(self.map, self.map.markers)
end
So, a pretty standard setup for running a background job. How do I make the job wait, or not get scheduled, until there have been no updates to my model (or table) for x seconds?
If it helps, Map has_many Markers so triggering the job with logic that when any marker associations of a map update would be alright too.
What you are looking for is delayed jobs, implemented through ActiveJob's perform_later. According to the edge guides, that isn't implemented in sucker_punch.
ActiveJob::QueueAdapters comparison
Fret not, however, because you can implement it yourself pretty simply. When your worker picks the job up from the queue, first do some arithmetic on the record's updated_at timestamp, comparing it to 10 seconds ago. If the model has been modified within that window, simply add the job back to the queue and abort gracefully.
code!
As per the example (about two fifths of the way down the page) explaining how to enqueue a job from within a worker: Github sucker punch
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    # updated_at is the standard Rails timestamp column
    if Map.where(updated_at: 10.seconds.ago..Time.now).count > 0
      StaticMapWorker.new.async.perform(map, markers)
    else
      # perform some expensive job
    end
  end
end
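The core of the debounce check is just timestamp arithmetic, which can be sketched in pure Ruby (the quiet period and method name here are made up for illustration):

```ruby
QUIET_PERIOD = 10 # seconds with no updates before the expensive job may run

# True when the record has been left alone long enough for the job to run;
# false means "re-enqueue the job and check again later".
def quiet?(last_updated_at, now: Time.now)
  (now - last_updated_at) >= QUIET_PERIOD
end
```

The worker calls `quiet?` with the model's most recent updated_at; every fresh update pushes that timestamp forward, so the expensive work keeps getting deferred until the burst of updates stops.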
I'm looking for a solution which enables:
Repetitive executing of a scraping task (nokogiri)
Changing the time interval via http://www.myapp.com/interval (example)
What is the best solution/way to get this done?
Options I know about
Custom Rake task
Rufus Scheduler
Current situation
In ./config/initializers/task_scheduler.rb I have:
require 'nokogiri'
require 'open-uri'
require 'rufus-scheduler'
require 'rake'
scheduler = Rufus::Scheduler.new

scheduler.every "1h" do
  puts "BEGIN SCHEDULER at #{Time.now}"

  @url = "http://www.marktplaats.nl/z/computers-en-software/apple-ipad/ipad-mini.html?query=ipad+mini&categoryId=2722&priceFrom=100%2C00&priceTo=&startDateFrom=always"
  @doc = Nokogiri::HTML(open(@url))
  @title = @doc.at_css("title").text
  @number = 0

  2.times do
    @doc.css(".defaultSnippet.group-#{@number}").each do |listing|
      @listing_title = listing.at_css(".mp-listing-title").text
      @listing_subtitle = listing.at_css(".mp-listing-description").text
      @listing_price = listing.at_css(".price").text
      @listing_priority = listing.at_css(".mp-listing-priority-product").text
      Listing.create(title: @listing_title, subtitle: @listing_subtitle, price: @listing_price)
    end
    @number += 1
  end

  puts "END SCHEDULER at #{Time.now}"
end
Is it working?
Yes, the current setup is working. However, I don't know how to enable changing the interval time via http://www.myapp.com/interval (example).
Changing scheduler.every "1h" to scheduler.every "#{@interval}" do does not work.
In what file do I have to define @interval for it to work in task_scheduler.rb?
I'm not very familiar with Rufus Scheduler, but it appears that it will be difficult to achieve both of your goals (regular heartbeat, dynamic rescheduling) with it. In order for it to work, you'll have to capture the job_id that it returns, use that job_id to stop the job when a rescheduling event occurs, and then create the new job. Rufus also points out that it's an in-memory application whose jobs disappear when the process disappears: reboot the server, restart the application, etc., and you've got to reschedule from scratch.
I'd consider two things. First, I'd consider creating a model that wraps the screen-scraping that you want to do. At a minimum you'd capture the url and the interval. The model may wrap up the code for processing the html response (basically what's wrapped up in the 2.times block) as instance methods that you trigger based on the URL. You may also capture this in a text column and use eval on it, assuming that only "good guys" get access to this part of the system. This has a couple of advantages: you can quickly expand to scraping other sites and you can sanitize the interval sent back by the user.
Second, something like Delayed::Job may better suit your needs. Delayed::Job allows you to specify a time for the job's execution which you could fill in by reading the model and converting the interval to a time. The key to this approach is that the job must schedule the next iteration of itself before it exits.
This won't be as rock-steady as something like cron but it does seem to better address the rescheduling need.
First off: your Rufus Scheduler code is in an initializer, which is fine, but it is executed when the Rails process is started, and only then. So in the initializer you have no access to any variable @interval you might set, for instance, in a controller.
What are possible options, instead of a class variable:
read it from a config file
read it from a database (but you will have to set up your own connection; in the initializer ActiveRecord is not started yet, as far as I know)
And... if you change the value, you will have to restart your Rails process for it to take effect again.
So an alternative approach, where your Rails process handles the interval of the scheduled job, is to use a recurring background job. At the end of the background job, it reschedules itself with the interval that is active at that moment. I would propose fetching the interval from the database.
Any background job handler could do this. Check Ruby Toolbox; I vote for resque or delayed_job.
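The "reschedule yourself" pattern looks like the sketch below. As assumptions for illustration, a fake scheduler stands in for rufus-scheduler or Delayed::Job, and the interval variable stands in for a database lookup; all names are made up:

```ruby
# Minimal stand-in for a scheduler: records (interval, block) pairs.
class FakeScheduler
  attr_reader :queue

  def initialize
    @queue = []
  end

  def in(interval, &block)
    @queue << [interval, block]
  end

  def run_next
    _interval, block = @queue.shift
    block.call
  end
end

current_interval = "1h" # imagine this re-read from the database on every run

schedule_scrape = lambda do |scheduler|
  scheduler.in(current_interval) do
    # ... run the Nokogiri scrape here ...
    schedule_scrape.call(scheduler) # schedule the next run with the fresh interval
  end
end

scheduler = FakeScheduler.new
schedule_scrape.call(scheduler)
```

Because the interval is read again on each run, a controller action that writes a new value to the database takes effect on the very next cycle, with no process restart.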
I have a rails application where I want to run a job in the background, but I need to run the job 2 hours from the original event.
The use case might be something like this:
User posts a product listing.
Background job is queued to syndicate the listing to 3rd-party APIs, but even after the original request the response could take a while; the 3rd party's solution is to poll them every 2 hours to see if we can get a success acknowledgement.
So is there a way to queue a job so that a worker daemon knows to ignore it, or only pick it up, at the scheduled time?
I don't want to use cron because it will load up a whole application stack and may be executed twice when long-running jobs overlap.
Can a priority queue be used for this? What solutions are there to implement this?
Try Delayed::Job: https://github.com/collectiveidea/delayed_job
Something along these lines?
class ProductCheckSyndicateResponseJob < Struct.new(:product_id)
  def perform
    product = Product.find(product_id)

    if product.still_needs_syndicate_response
      # do it ...
      # still no response, check again in two hours
      Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(product.id), :run_at => 2.hours.from_now)
    else
      # nothing to do ...
    end
  end
end
Initialize the job the first time in the controller, or maybe in a before_create callback on the model:
Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(#product.id), :run_at => 2.hours.from_now)
Use the Rufus Scheduler gem. It runs as a background thread, so you don't have to load the entire application stack again. Add it to your Gemfile, and then your code is as simple as:
# in an initializer:
SCHEDULER = Rufus::Scheduler.start_new

# then, wherever you want in your Rails app:
SCHEDULER.in('2h') do
  # whatever code you want to run in 2 hours
end
The GitHub page has tons more examples.