How to run a certain task when all other jobs are finished - ruby-on-rails

I have a Rails app and use Sidekiq jobs for certain long-running tasks and calculations.
I want to refresh my menu navigation once all jobs are done, by invalidating the cache that returns the navigation and recalculating its content, so users see up-to-date results and don't hit long-rendering pages.
I don't use Action View; I use Rails as an API.
Example
Users are able to add items, which kick off long-running tasks after saving (resizing images, up/downloading media, converting stuff...).
When all these jobs have finished, the cache should get invalidated and recalculated.
At the moment I invalidate the cache in each job, but that's not optimal, because jobs for other items are almost always still running, so the cache is effectively never available.
I thought of a specific database field (e.g. visible) or a separate db table that becomes true once all jobs are done, then observing that state and recalculating the results based on it.
Another thought is to create a db table that holds the calculated results and acts as a cache for my navigation.
What is the "right" way to handle that within Rails?
Process of saving items:
DatabaseItem: save! && disable visibility
Up/downloading stuff (job)
Resizing images (job)
Converting media (job)
Enabling the item
Invalidating the cache
Recalculating stuff (job, or on first visit of the page)
Example of my navigation tree, which is very deep (>20k entries):
RootItem (10 items)
  ChildItem (3 items)
    SubItem (2 items)
    SubItem (1 item)
  ChildItem (4 items)
    SubItem (2 items)
    SubItem (2 items)
I tried to keep this as generic as possible; if you need more details, please let me know.
I use Rails, Postgres, Memcached, Dalli, actionpack-action_caching
Simplified:
class ItemSavedJob
  include Sidekiq::Job
  sidekiq_options retry: 3

  # It activates or deactivates items
  def perform(id)
    item = Item.find(id)
    item.make_invisible!
    item.start_long_running_job
    item.images.each do |image|
      ImageConvertJob.perform_async(image.id)
      # HERE: when the last conversion has finished,
      # it should activate the item and invalidate the navigation
    end
    CacheHelper.invalidate_navigation_cache!
  end
end
class ImageConvertJob
  include Sidekiq::Job
  sidekiq_options retry: 3

  def perform(id)
    image = Image.find(id)
    image.do_a_long_convertion
  end
end
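For what it's worth, one way to express the "visible flag" idea from above is a per-item counter of pending jobs: the last conversion job to finish flips the item back to visible and invalidates the navigation cache exactly once. This is only a sketch; the pending_jobs_count column and the image.item association are assumptions for illustration (Sidekiq Pro's Batches offer the same "callback when all jobs in a group are done" behaviour out of the box).
class ItemSavedJob
  include Sidekiq::Job
  sidekiq_options retry: 3

  def perform(id)
    item = Item.find(id)
    item.make_invisible!
    # Remember how many conversion jobs must finish before the item may become
    # visible again (hypothetical integer column pending_jobs_count).
    item.update!(pending_jobs_count: item.images.size)
    item.images.each { |image| ImageConvertJob.perform_async(image.id) }
  end
end

class ImageConvertJob
  include Sidekiq::Job
  sidekiq_options retry: 3

  def perform(id)
    image = Image.find(id)
    image.do_a_long_convertion

    # Atomic SQL decrement, so concurrent jobs don't lose updates.
    Item.decrement_counter(:pending_jobs_count, image.item_id)

    item = image.item.reload
    if item.pending_jobs_count <= 0
      # This was the last job for the item: enable it and bust the cache once.
      item.make_visible!
      CacheHelper.invalidate_navigation_cache!
    end
  end
end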

Related

When updating an individual child record triggers update to parent, and child records can be updated all at once, how to manage efficiency

I have a parent Order that has many child Items. There is a status attribute on both parent and children, and the statuses interact. For example, only when all Items are complete can the parent Order be complete.
Think of it like a restaurant where the policy is to bring out all the food in a given Order at once. Each food Item goes through various statuses like in_prep, being_cooked, etc. to complete, and when all Items are complete, then the Order is complete and can be brought out, at which point all Items and the parent Order can have a status of enjoyed_by_customer.
Extending this example, let's say the various line cooks at each part of the kitchen have a tablet where they can update statuses of specific Items and only those Items in their purview. The chef, however, can view the entirety of the Order and make updates to any individual Item status, because the chef is walking around and tweaking and tasting and making changes, with creative ability to change any Item at will. For example, maybe the line cook says the Item salad is complete, but the chef tastes it, says it needs more dressing, and sets it back to in_prep.
First, I built a dashboard for line cooks to mark specific Items in their purview. A line cook making salads, for example, should be able to view all the salads from all the Orders that are not complete, and make batch updates to all of them. For example, the last step might be garnish, so they can garnish a dozen plates all at once and then set all those salads to complete. This meant that I had thought to put a callback on the Item model that called a set_status method on the parent Order:
class Item
  after_update :set_order_status

  # after_update so we know that the status on this Item is valid
  def set_order_status
    if status_changed?
      self.order.set_status
    end
  end
end
class Order
  def set_status
    # e.g. if all items are 'complete', this returns 'complete'
    new_status = calculated_somehow_from_item_statuses
    self.update_attributes(status: new_status)
  end
end
Second, I built an overall Order dashboard for the chef, where they can, again, modify statuses of any Item as well as other attributes on the Order itself. For example, maybe chef wants to mark the Order with a special discount or something. For efficiency's sake, I set up the Order dashboard to update the entire Order all at once using nested attributes. In other words, the submitted params are like:
{
  "discount": "true",
  "items_attributes": [
    {"id" => "1", "status" => "complete"},
    {"id" => "2", "status" => "in_prep"}
  ]
}
# this way it's easy to just do:
# @order.update_attributes(params)
However, of course, because of the way the individual line cooks' dashboards have been set up, @order.update_attributes is actually called once for the @order and then once again after each child item updates. In other words, there's redundancy caused by the fact that child items can be updated both 1) individually and 2) en masse, where in either situation the parent order should be updated at the end based on all child items collectively.
My only thought on revising this is to adjust the set_order_status callback on the child Item so that it isn't triggered if the item update is done at the Order level by a chef. In other words:
class Item
  after_update :set_order_status
  attr_accessor :changed_by_chef

  def set_order_status
    if status_changed? && !changed_by_chef
      self.order.set_status
    end
  end
end
# where updates from the chef's dashboard, rather than individual line cooks, will have changed_by_chef in params
{
  "discount": "true",
  "items_attributes": [
    {"id" => "1", "status" => "complete", "changed_by_chef" => "true"},
    {"id" => "2", "status" => "in_prep", "changed_by_chef" => "true"}
  ]
}
# @order.assign_attributes(params) # assign statuses to child items and run validations
# @order.set_status # this method calculates the appropriate order-level status based on validated child item statuses, and calls the final update_attributes to save everything
In this way, @order.update_attributes is effectively called once. However, this also feels somewhat hacky to me, and I'm wondering if there's a more conventional Railsy way of doing this.
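A minimal sketch of that flow, assuming the items association is loaded by the nested-attributes assignment, the statuses are the plain strings used above, and order_params is the usual strong-parameters helper:
class Order
  def set_status
    # e.g. all items 'complete' => order 'complete'; adjust to the real rules
    new_status = items.all? { |i| i.status == "complete" } ? "complete" : "in_prep"
    # One save: this also persists the item changes assigned via nested attributes
    update_attributes(status: new_status)
  end
end

# chef's dashboard controller action (illustrative)
@order = Order.find(params[:id])
@order.assign_attributes(order_params) # includes items_attributes, nothing saved yet
@order.set_status                      # single update_attributes call at the end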
Thanks!

Expire cache based on saved value

In my app there is a financial overview page with quite a lot of queries. This page is refreshed once a month after executing a background job, so I added caching:
@iterated_hours = Rails.cache.fetch("productivity_data", expires_in: 24.hours) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end
The cache must expire when the background job finishes, so I created a model CacheExpiration:
class CacheExpiration < ApplicationRecord
  validates :cache_key, :expires_in, presence: true
end
So in the background job a record is created:
CacheExpiration.create(cache_key: "productivity_data", expires_in: DateTime.now)
And the Rails.cache.fetch is updated to:
expires_in = get_cache_key_expiration("productivity_data")
@iterated_hours = Rails.cache.fetch("productivity_data", expires_in: expires_in) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end

private def get_cache_key_expiration(cache_key)
  cache_expiration = CacheExpiration.find_by_cache_key(cache_key)
  if cache_expiration.present?
    cache_expiration.expires_in
  else
    24.hours
  end
end
So now the expiration is set to a DateTime, is this correct or should it be a number of seconds? Is this the correct approach to make sure the cache is expired only once when the background job finishes?
Explicitly setting an expires_in value is very limiting and error prone IMO. You will not be able to change the value once a cache value has been created (well you can clear the cache manually) and if ever you want to change the background job to run more/less often, you also have to remember to update the expires_in value. Additionally, the time when the background job is finished might be different from the time the first request to the view is made. As a worst case, the request is made a minute before the background job updates the information for the view. Your users will have to wait a whole day to get current information.
A more flexible approach is to rely on updated_at or in their absence created_at fields of ActiveRecord models.
For that, you can either rely on the CacheExpiration model you already created (it might already have the appropriate fields) or use the last of the "huge number of records" you create. Simply order them and take the last: SomeArModel.order(created_at: :desc).first
The benefit of this approach is that whenever the AR model you create is updated/created, your cache is busted and a new one will be created. There is no longer a coupling between the time a user calls the endpoint and the time the background job ran. In case a record is created by any means other than the background job, it will also simply be handled.
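For instance, something along these lines (SomeArModel is the same placeholder as above):
# Key the cache on the newest record: a freshly created record yields a new
# cache key, so the stale entry is simply never read again.
latest = SomeArModel.order(created_at: :desc).first
@iterated_hours = Rails.cache.fetch(["productivity_data", latest]) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end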
ActiveRecord models are first class citizens when it comes to caching. You can simply pass them in as cache keys. Your code would then change to:
Rails.cache.fetch(CacheExpiration.find_by_cache_key("productivity_data")) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end
But if at all possible, try to find an alternative model so you no longer have to maintain CacheExpiration.
Rails also has a guide on that topic

Ruby on Rails constantly updating a variable

Currently I am building an application that allows users to place bids on products and admins to approve them. The 'transactions' themselves take place outside of the scope of the application. Currently, users see the price of an asset on the transaction/new page and submit a bid by submitting the form. Admins click a button to approve the bid.
class TransactionsController < ApplicationController
  before_action :get_price

  def new
    @price = get_price
    @transaction = Transaction.new
  end

  ###

  def get_price
    @price = <<Some External Query>>
  end

  def approve
    t = Transaction.find(params[:id])
    t.status = "Approved"
    t.save!
  end
end
Obviously this is not ideal. I don't want to query the API every time a user wants to submit a bid. Ideally, I could query this API every 5-10 seconds in the background and use the price in that manner. I have looked at a couple of techniques for running background jobs including delayed_job, sidekiq, and resque. For example in sidekiq I could run something like this:
# app/workers/price_worker.rb
class PriceWorker
  include Sidekiq::Worker

  def perform(*args)
    get_price
  end

  def get_price
    @price = <<Some External Query>>
  end
end

# config/initializers/sidekiq.rb
schedule_file = "config/schedule.yml"
if File.exist?(schedule_file) && Sidekiq.server?
  Sidekiq::Cron::Job.load_from_hash YAML.load_file(schedule_file)
end

# config/schedule.yml
my_price_job:
  cron: "*/10 * * * * *"
  class: "PriceWorker"
That code runs. The problem is that I am kind of lost on how to handle the price variable and pass it back to the user from the worker. I have watched the Railscasts episodes for both Sidekiq and Resque. I have written background workers and jobs that queue and run properly, but I cannot figure out how to integrate them into my application. This is the first time I have dealt with background jobs, so I have a lot to learn. I have spent some time researching this issue, and it seems like background jobs are used more for longer-running tasks like updating db indexes rather than constantly recurring jobs (like an API request every 5 seconds).
So to sum up: what is the proper technique for running a constantly recurring task, such as querying an external API, in Rails? Any feedback on how to do this properly will be greatly appreciated! Thank you.
That is not how background jobs work. You're right, you have a lot of reading up to do. Think of running an asynchronous job processor like Sidekiq as running an entirely separate app. It shares the same code base as your Rails app but it runs completely separately. If you want these two separate apps to talk to each other then you have to design and write that code.
For example, I would define a cache with reader and writer methods, then have the cache populated when necessary:
someone loads product "foo" for the first time on your site
Rails checks the cache and finds it empty
Rails calls the external service
Rails saves the external service response to the cache using its writer method
Rails returns the cached response to the client
The cache would be populated thereafter by Sidekiq:
someone loads product "foo" for the second time on your site
Rails checks the cache and finds the cached value from above
Rails fires a Sidekiq job telling it to refresh the cache
Rails returns the cached response to the client
Continuing from step 3 above:
Sidekiq checks to see when the cache was last refreshed, and if it was more than x seconds ago then continue, else quit
Sidekiq calls the external service
Sidekiq saves the external service response to the cache using its writer method
When the next client loads product "foo", Rails will read the cache that was updated (or not updated) by Sidekiq.
With this type of system, the cache must be an external store of some kind like a relational database (MySQL, Postgres, Sqlite) or a NoSQL database (Redis, memcache). You cannot use the internal Rails cache because the Rails cache exists only within the memory space of the Rails app, and is not readable by Sidekiq. (because Sidekiq runs as a totally separate app)
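A rough sketch of that flow, assuming Redis as the shared store; PriceCache, PriceRefreshWorker and fetch_price_from_external_api are hypothetical names standing in for the question's "<<Some External Query>>":
require "json"
require "redis"

# Hypothetical shared cache with reader and writer methods: Rails reads it,
# Sidekiq refreshes it.
class PriceCache
  KEY = "price:asset"
  STALE_AFTER = 10 # seconds

  def self.read
    raw = redis.get(KEY)
    raw && JSON.parse(raw)
  end

  def self.write(price)
    redis.set(KEY, { "price" => price, "refreshed_at" => Time.now.to_i }.to_json)
  end

  def self.stale?
    entry = read
    entry.nil? || Time.now.to_i - entry["refreshed_at"] > STALE_AFTER
  end

  def self.redis
    @redis ||= Redis.new
  end
end

# Controller side: populate the cache on the first hit, otherwise serve the
# cached value and let Sidekiq refresh it in the background.
def get_price
  entry = PriceCache.read
  if entry
    PriceRefreshWorker.perform_async   # refresh for the next request
    @price = entry["price"]
  else
    @price = fetch_price_from_external_api  # hypothetical external query helper
    PriceCache.write(@price)
  end
end

# Sidekiq side: only hit the external service if the value is old enough.
class PriceRefreshWorker
  include Sidekiq::Worker

  def perform
    return unless PriceCache.stale?
    PriceCache.write(fetch_price_from_external_api)
  end

  def fetch_price_from_external_api
    # stand-in for "<<Some External Query>>" from the question
    rand(100..200)
  end
end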
I guess in this case you should use rails cache. Put something like this in your controller:
@price = Rails.cache.fetch('price') do
  <<Some external query>>
end
You can also configure the cache expiration by setting the expires_in argument; see https://apidock.com/rails/ActiveSupport/Cache/Store/fetch for more information.
Regarding using background jobs to update your "price" value: you would need to store the retrieved data anyway (in some kind of database) and fetch it in your controller.
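If the Rails cache store is shared between the web and Sidekiq processes (e.g. a Memcached or Redis cache store rather than the in-memory default), the scheduled worker from the question could simply write the same key the controller reads; fetch_price_from_external_api is again a stand-in for the external query:
class PriceWorker
  include Sidekiq::Worker

  def perform(*args)
    # Overwrite the value the controller reads via Rails.cache.fetch('price').
    Rails.cache.write("price", fetch_price_from_external_api, expires_in: 1.minute)
  end
end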

Rails and sucker_punch: Debounce x seconds before executing job to control rate of execution

In my Rails 3.2 project, I am using SuckerPunch to run an expensive background task when a model is created/updated.
Users can do different types of interactions on this model. Most of the times these updates are pretty well spaced out, however for some other actions like re-ordering, bulk-updates etc, those POST requests can come in very frequently, and that's when it overwhelms the server.
My question is, what would be the most elegant/smart strategy to start the background job when first update happens, but wait for say 10 seconds to make sure no more updates are coming in to that Model (Table, not a row) and then execute the job. So effectively throttling without queuing.
My sucker_punch worker looks something like this:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    # perform some expensive job
  end
end
It gets called from the Marker and Map models, and sometimes from controllers (for update_all cases), like so:
after_save :generate_static_map_html

def generate_static_map_html
  StaticMapWorker.new.async.perform(self.map, self.map.markers)
end
So, a pretty standard setup for running a background job. How do I make the job wait, or not get scheduled, until there have been no updates to my Model (or table) for x seconds?
If it helps, Map has_many Markers, so triggering the job with logic that fires when any marker association of a map updates would be alright too.
What you are looking for is delayed jobs, implemented through ActiveJob's perform_later. According to the edge guides, that isn't implemented in sucker_punch.
ActiveJob::QueueAdapters comparison
Fret not, however, because you can implement it yourself pretty simply. When your worker picks the job up from the queue, first compare the record's modified_at timestamp to 10 seconds ago. If the model has been modified within that window, simply add the job back to the queue and abort gracefully.
code!
As per the example about two-fifths of the way down the sucker_punch GitHub README, which explains how to enqueue a job from within a worker:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    if Map.where(modified_at: 10.seconds.ago..Time.now).count > 0
      # Still being updated: re-enqueue and check again later.
      StaticMapWorker.new.async.perform(map, markers)
    else
      # perform some expensive job
    end
  end
end
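If you are on sucker_punch 2.x, the same idea can be expressed with perform_in, so the re-enqueued job waits instead of immediately re-checking; a sketch only, with updated_at assumed in place of modified_at:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  DEBOUNCE = 10 # seconds

  def perform(map, markers)
    if map.reload.updated_at > DEBOUNCE.seconds.ago
      # Records are still being touched: push the job back by another 10 seconds.
      StaticMapWorker.perform_in(DEBOUNCE, map, markers)
    else
      # perform some expensive job
    end
  end
end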

Working with stale data when performing asynchronous jobs with Sidekiq

In order to process events asynchronously and create an activity feed, I'm using Sidekiq and Ruby on Rails' Global ID.
This works well for most types of activities, however some of them require data that could change by the time the job is performed.
Here's a completely made-up example:
class Movie < ActiveRecord::Base
  include Redis::Objects

  value :score # stores an integer in Redis

  has_many :likes

  def popular?
    likes.count > 1000
  end
end
And a Sidekiq worker performing a job every time a movie is updated:
class MovieUpdatedWorker
  include Sidekiq::Worker

  def perform(global_id)
    movie = GlobalID::Locator.locate(global_id)
    MovieUpdatedActivity.create(movie: movie, score: movie.score) if movie.popular?
  end
end
Now, imagine Sidekiq is lagging behind and, before it gets a chance to perform its job, the movie's score is updated in Redis, some users unlike the movie, and the popular? method now returns false.
Sidekiq ends up working with updated data.
I'm looking for ways to schedule jobs while making sure the required data won't change when the job is performed. A few ideas:
1/ Manually pass in all the required data and adjust the worker accordingly:
MovieUpdatedWorker.perform_async(
  movie: self,
  score: score,
  likes_count: likes.count
)
This could work but would require reimplementing/duplicating all methods that rely on data such as score and popular? (imagine an app with much more than these two/three movable pieces).
This doesn't scale well either since serialized objects could take up a lot of room in Redis.
2/ Stubbing some methods on the record passed in to the worker:
MovieUpdatedWorker.perform_async(
  global_id,
  stubs: { score: score, popular?: popular? }
)

class MovieUpdatedWorker
  include Sidekiq::Worker

  def perform(global_id, stubs: {})
    movie = GlobalID::Locator.locate(global_id)
    # inspired by RSpec
    stubs.each do |message, return_value|
      movie.stub(message) { return_value }
    end
    MovieUpdatedActivity.create(movie: movie, score: movie.score) if movie.popular?
  end
end
This isn't functional, but you can imagine the convenience of dealing with an actual record, not having to reimplement existing methods, and dealing with the actual data.
Do you see other strategies to "freeze" object data and asynchronously process them? What do you think about these two?
I wouldn't say that the data was stale since you would actually have the newest version of it, just that it was no longer popular. It sounds like you actually want the stale version.
If you don't want the data to have changed, you need to capture it somehow: either, like you say, pass the data to the job directly, or add some form of versioning of the data in the database and pass a reference to the old version.
I think passing along the data you need through Redis is a reasonable way. You could serialize only the attributes you actually care about, like score.
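A sketch of the first option that keeps the duplication down by snapshotting only the volatile values at enqueue time; attribute names come from the question, and score.value is assumed to be the Redis::Objects reader:
class Movie < ActiveRecord::Base
  include Redis::Objects
  value :score
  has_many :likes

  after_update :enqueue_updated_activity

  def popular?
    likes.count > 1000
  end

  private

  def enqueue_updated_activity
    # Snapshot the values as they are right now; the job only receives scalars.
    MovieUpdatedWorker.perform_async(to_global_id.to_s, score.value.to_i, popular?)
  end
end

class MovieUpdatedWorker
  include Sidekiq::Worker

  def perform(global_id, score_at_enqueue, popular_at_enqueue)
    movie = GlobalID::Locator.locate(global_id)
    # Use the frozen enqueue-time values, not whatever the movie looks like now.
    MovieUpdatedActivity.create(movie: movie, score: score_at_enqueue) if popular_at_enqueue
  end
end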
