Rails cache fetch with failover

We use this to get a value from an external API:
def get_value
  Rails.cache.fetch "some_key", expires_in: 15.seconds do
    # hit some external API
  end
end
But sometimes the external API goes down and when we try to hit it, it raises exceptions.
To fix this we'd like to:
- try updating it every 15 seconds
- but if it goes offline, use the old value for up to 5 minutes, retrying every 15 seconds or so
- if it's stale for more than 5 minutes, only then start raising exceptions
Is there a convenient wrapper/library for this, or what would be a good solution? We could code up something custom, but it seems like a common enough use case that there should be something battle-tested out there. Thanks!

Didn't end up finding any good solutions, so I ended up using this:
# This helper is useful for caching a response from an API, where the API is unreliable.
# It will try to refresh the value every :expires_in seconds, but if the block raises an
# exception it will use the old value for up to :fail_in seconds before re-raising.
def cache_with_failover(key, options = nil)
  key_fail = "#{key}_fail"
  options ||= {}
  options[:expires_in] ||= 15.seconds
  options[:fail_in]    ||= 5.minutes

  val = Rails.cache.read(key)
  return val if val

  begin
    val = yield
    Rails.cache.write(key, val, expires_in: options[:expires_in])
    Rails.cache.write(key_fail, val, expires_in: options[:fail_in])
    return val
  rescue StandardError => e # rescuing Exception would also swallow signals and the like
    val = Rails.cache.read(key_fail)
    return val if val
    raise e
  end
end
# Demo
fail_at = 10.seconds.from_now # renamed from `fail`, which shadows Kernel#fail
a = cache_with_failover('test', expires_in: 5.seconds, fail_in: 10.seconds) do
  if Time.now < fail_at
    Time.now
  else
    p 'failed'
    raise 'a'
  end
end
An even better solution would probably exponentially back off retries after the first failure. As it's currently written, it will pummel the API with retries (in the yield) after the first failure.
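A rough sketch of that improvement, for reference (the backoff key names and the doubling/cap timings are invented for illustration, not battle-tested):

# Variant of cache_with_failover that backs off after a failure: each
# consecutive failure doubles a short-lived "backoff" key's TTL, and while
# that key exists we serve the stale value without touching the API.
def cache_with_failover_and_backoff(key, options = {})
  key_fail    = "#{key}_fail"     # stale copy, kept for :fail_in
  key_backoff = "#{key}_backoff"  # present => currently backing off
  key_delay   = "#{key}_delay"    # remembers the current backoff length
  options[:expires_in] ||= 15.seconds
  options[:fail_in]    ||= 5.minutes

  val = Rails.cache.read(key)
  return val if val

  # While the backoff key exists, skip the API and serve the stale value.
  if Rails.cache.read(key_backoff)
    stale = Rails.cache.read(key_fail)
    return stale if stale
  end

  begin
    val = yield
    Rails.cache.write(key, val, expires_in: options[:expires_in])
    Rails.cache.write(key_fail, val, expires_in: options[:fail_in])
    Rails.cache.delete(key_delay)
    val
  rescue StandardError
    # Double the backoff window on each consecutive failure, capped at 2 minutes.
    delay = [(Rails.cache.read(key_delay) || 15) * 2, 120].min
    Rails.cache.write(key_delay, delay, expires_in: options[:fail_in])
    Rails.cache.write(key_backoff, true, expires_in: delay)
    stale = Rails.cache.read(key_fail)
    return stale if stale
    raise
  end
end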

Related

How to set an expiry on a cached Ruby search?

I have a function which returns a list of IDs. In the Rails caching guide I can see that an expiration can be set on cached results, but I have implemented my caching somewhat differently.
def provide_book_ids(search_param)
  @returned_ids ||= begin
    search = client.search(query: search_param, reload: true)
    search.fetch
    search.options[:query] = search_param # was search_str, presumably a typo
    search.fetch(true)
    search.map(&:id)
  end
end
What is the recommended way to set a 10-minute cache expiry when written as above?
def provide_book_ids(search_param)
  # Consider including search_param in the cache key (e.g. "zendesk_ids/#{search_param}")
  # so that different searches don't share a single cache entry.
  @returned_ids = Rails.cache.fetch("zendesk_ids", expires_in: 10.minutes) do
    search = client.search(query: search_param, reload: true)
    search.fetch
    search.options[:query] = search_param
    search.fetch(true)
    search.map(&:id)
  end
end
I am assuming this code is part of some request-response cycle and not something else (for example, a long-running worker, or some class that is initialized once in your app; in such a case you wouldn't want to use @returned_ids directly, but would instead call provide_book_ids to get the value). From what I understand that's not your scenario, so the approach above should work.
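To illustrate that caveat, in a hypothetical long-lived object the memoized form would pin the first result:

# Hypothetical: ||= caches for the object's lifetime, so a second call
# with a different search_param would silently return the first result.
class BookSearcher
  def provide_book_ids(search_param)
    @returned_ids ||= expensive_lookup(search_param)
  end
end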

How can I prevent many sidekiq jobs from exceeding the API calls limit

I am working on a Ruby on Rails application. We have many Sidekiq workers that can process multiple jobs at a time. Each job makes calls to the Shopify API; the limit set by Shopify is 2 calls per second. I want to synchronize that, so that only two jobs can call the API in a given second.
The way I'm doing that right now is like this:
# frozen_string_literal: true

class Synchronizer
  attr_reader :shop_id, :queue_name, :limit, :wait_time

  def initialize(shop_id:, queue_name:, limit: nil, wait_time: 1)
    @shop_id    = shop_id
    @queue_name = queue_name.to_s
    @limit      = limit
    @wait_time  = wait_time
  end

  # This method should be called for each API call
  def synchronize_api_call
    raise "a block is required." unless block_given?
    get_api_call
    time_to_wait = calculate_time_to_wait
    sleep(time_to_wait) unless Rails.env.test? || time_to_wait.zero?
    yield
  ensure
    return_api_call
  end

  def set_api_calls
    redis.del(api_calls_list)
    redis.rpush(api_calls_list, calls_list)
  end

  private

  def get_api_call
    logger.log_message(synchronizer: 'Waiting for api call', color: :yellow)
    @api_call_timestamp = redis.brpop(api_calls_list)[1].to_i
    logger.log_message(synchronizer: 'Got api call.', color: :yellow)
  end

  def return_api_call
    redis_timestamp = redis.time[0]
    redis.rpush(api_calls_list, redis_timestamp)
  ensure
    redis.ltrim(api_calls_list, 0, limit - 1)
  end

  def last_call_timestamp
    @api_call_timestamp
  end

  def calculate_time_to_wait
    current_time = redis.time[0]
    time_passed  = current_time - last_call_timestamp.to_i
    time_to_wait = wait_time - time_passed
    time_to_wait > 0 ? time_to_wait : 0
  end

  def reset_api_calls
    redis.multi do |r|
      r.del(api_calls_list)
    end
  end

  def calls_list
    redis_timestamp = redis.time[0]
    limit.times.map { redis_timestamp }
  end

  def api_calls_list
    @api_calls_list ||= "api-calls:shop:#{shop_id}:list"
  end

  def redis
    Thread.current[:redis] ||= Redis.new(db: $redis_db_number)
  end
end
The way I use it is like this:
synchronizer = Synchronizer.new(shop_id: shop_id, queue_name: 'shopify_queue', limit: 2, wait_time: 1)

# This is called once when the process starts, i.e. not by the jobs themselves but by the
# app that kicks the process off. It populates api_calls_list with 2 timestamps, which are
# used to know when the last API call was sent.
synchronizer.set_api_calls

Then, when a job wants to make a call:
synchronizer.synchronize_api_call do
  # make the call
end
The problem
The problem with this is that if for some reason a job fails to return the api_call it took to the api_calls_list, that will leave that job and the other jobs stuck forever, or until we notice and call set_api_calls again. The problem won't affect that particular shop only, but the other shops as well, because the Sidekiq workers are shared between all the shops using our app. It sometimes happens that we don't notice until a user calls us, and we find that something was stuck for many hours when it should have finished in a few minutes.
The Question
I just recently realised that Redis is not the best tool for shared locking. So I'm asking: is there any other good tool for this job? If not in the Ruby world, I'd like to learn from others as well. I'm interested in the techniques as well as the tools, so every bit helps.
You may want to restructure your code and create a micro-service to process the API calls, which would use a local locking mechanism and force your workers to wait on the socket. It comes with the added complexity of maintaining the micro-service, but if you're in a hurry, Ent-Rate-Limiting looks cool too.
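For what it's worth, one way to remove the stuck-forever failure mode without building a micro-service is to stop handing out tokens that must be returned, and instead count recent calls in a sliding window. A rough sketch with a Redis sorted set (class and key names invented for illustration; the check-then-add is not strictly atomic, so a Lua script would be needed for hard guarantees):

require 'redis'
require 'securerandom'

# Sliding-window limiter: nothing is checked out, so a crashed job can
# never strand a token; old entries simply age out of the window.
class SlidingWindowLimiter
  def initialize(shop_id:, limit: 2, window: 1.0, redis: Redis.new)
    @key    = "api-calls:shop:#{shop_id}:window"
    @limit  = limit
    @window = window
    @redis  = redis
  end

  # Blocks until a slot is free, then yields.
  def synchronize_api_call
    sleep(0.05) until try_acquire
    yield
  end

  private

  def try_acquire
    now = @redis.time.join('.').to_f # use the Redis clock, not each worker's
    @redis.zremrangebyscore(@key, 0, now - @window) # drop calls outside the window
    return false if @redis.zcard(@key) >= @limit

    @redis.zadd(@key, now, "#{now}-#{SecureRandom.hex(4)}")
    @redis.expire(@key, @window.ceil + 1) # key self-expires even if everything dies
    true
  end
end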

How to handle external service failure in Open-Uri?

In my Rails app I am trying to fetch a number of currency exchange rates from an external service and store them in the cache:
require 'open-uri'

module ExchangeRate
  def self.all
    Rails.cache.fetch("exchange_rates", expires_in: 24.hours) { load_all }
  end

  private

  def self.load_all
    hashes = {}
    CURRENCIES.each do |currency|
      begin
        hash = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read) # what if not available?
        hashes[currency] = hash["rates"]
      rescue Timeout::Error
        puts "Timeout"
      rescue OpenURI::HTTPError => e
        puts e.message
      end
    end
    hashes
  end
end
This works great in development but I am worried about the production environment. How can I prevent the whole thing from being cached if the external service is not available? How can I ensure ExchangeRate.all always contains data, even if it's old and can't be updated due to an external service failure?
I tried to add some basic error handling but I'm afraid it's not enough.
If you're worried about your external service not being reliable enough to keep up with caching every 24 hours, then you should disable the automatic cache expiration, let users work with old data, and set up some kind of notification system to tell you if load_all fails.
Here's what I'd do:
Assume ExchangeRate.all always returns a cached copy, with no expiration (this will return nil if no cache is found):
module ExchangeRate
  def self.all
    rates = Rails.cache.fetch("exchange_rates")
    UpdateCurrenciesJob.perform_later if rates.nil?
    rates
  end
end
Create an ActiveJob that handles the updates on a regular basis:
class UpdateCurrenciesJob < ApplicationJob
  queue_as :default

  def perform(*_args)
    hashes = {}
    CURRENCIES.each do |currency|
      begin
        hash = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read) # what if not available?
        hashes[currency] = hash['rates'].merge('updated_at' => Time.current)
      rescue Timeout::Error
        puts 'Timeout'
      rescue OpenURI::HTTPError => e
        puts e.message
      end

      if hashes[currency].blank? || hashes[currency]['updated_at'] < Time.current - 24.hours
        # send a mail saying "this currency hasn't been updated"
      end
    end
    # Caveat: a currency that failed this run drops out of the cache here;
    # merging with the previously cached hash would preserve its old value.
    Rails.cache.write('exchange_rates', hashes)
  end
end
Set the job up to run every few hours (4, 8, 12, anything less than 24). This way the currencies will load in the background, the clients will always have data, and you will always know if the currencies aren't updating.
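For the scheduling itself, any recurring-job tool will do; for example, with the whenever gem (assuming it is in your Gemfile), the schedule could be:

# config/schedule.rb (whenever gem): refresh the rates every 4 hours
every 4.hours do
  runner "UpdateCurrenciesJob.perform_later"
end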

Speed up rake task by using typhoeus

So I stumbled across this: https://github.com/typhoeus/typhoeus
I'm wondering if this is what I need to speed up my rake task.
Event.all.each do |row|
  begin
    url = urlhere + row.first + row.second
    doc = Nokogiri::HTML(open(url))
    doc.css('.table__row--event').each do |tablerow|
      table = tablerow.css('.table__cell__body--location').css('h4').text
      next unless table == row.eventvenuename
      tablerow.css('.table__cell__body--availability').each do |button|
        buttonurl = button.css('a')[0]['href']
        row.update(row: buttonurl) unless buttonurl.include?('/checkout/external')
      end
    end
  rescue Faraday::ConnectionFailed
    puts "connection failed"
    next
  end
end
I'm wondering if this would speed it up, or whether it wouldn't because I'm doing a .each?
If it would, could you provide an example?
Sam
If you set up Typhoeus::Hydra to run parallel requests, you might be able to speed up your code, assuming that the Kernel#open calls are what's slowing you down. Before you optimize, you might want to run benchmarks to validate this assumption.
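For example, a quick check with Ruby's built-in Benchmark can show whether the HTTP fetch or the parsing dominates (the URL here is a placeholder):

require 'benchmark'
require 'open-uri'
require 'nokogiri'

url  = 'https://example.com/events' # placeholder: build this the way your task does
html = nil
http_time  = Benchmark.realtime { html = URI.open(url).read }
parse_time = Benchmark.realtime { Nokogiri::HTML(html).css('.table__row--event') }
puts "http: #{http_time.round(2)}s, parse: #{parse_time.round(2)}s"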
If it is true, and parallel requests would speed it up, you would need to restructure your code to load events in batches, build a queue of parallel requests for each batch, and then handle them after they execute. Here's some sketch code.
class YourBatchProcessingClass
  def initialize(batch_size: 200)
    @batch_size = batch_size
    @hydra = Typhoeus::Hydra.new(max_concurrency: @batch_size)
  end

  def perform
    # Get an array of records
    Event.find_in_batches(batch_size: @batch_size) do |batch|
      # Store all the requests so we can access their responses later.
      requests = batch.map do |record|
        request = Typhoeus::Request.new(your_url_build_logic(record))
        @hydra.queue request
        request
      end
      @hydra.run # Run requests in parallel

      # Process responses from each request
      requests.each do |request|
        your_response_processing(request.response.body)
      end
    end
  rescue WhateverError => e
    puts e.message
  end

  private

  def your_url_build_logic(event)
    # TODO
  end

  def your_response_processing(response_body)
    # TODO
  end
end

# Run the service by calling this in your Rake task definition
YourBatchProcessingClass.new.perform
Ruby can be used for pure scripting, but it functions best as an object-oriented language. Decomposing your processing work into clear methods can help clarify your code and help you catch things like Tom Lord mentioned in the comments on your question. Also, instead of wrapping your whole script in a begin..rescue block, you can use method-level rescues as in #perform above, or just wrap @hydra.run.
As a note, .all.each is a memory hog and is considered a bad way to iterate over records: .all loads all of the records into memory before iterating over them with .each. To save memory, it's better to use .find_each or .find_in_batches, depending on your use case. See: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
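With that change, the opening line of the loop above would become, for example:

# Streams Events from the database in batches (1000 by default) instead of
# loading every record into memory at once.
Event.find_each do |row|
  # ... same body as above ...
end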

Will returning a nil value from a block passed to Rails.cache.fetch clear it?

Let's suppose I have a method like this:
def foo
  Rails.cache.fetch("cache_key", expires_in: 60.minutes) do
    return_something
  end
end
return_something sometimes returns a nil value. When this happens, I don't want the nil value to be cached for 60 minutes. Instead, the next time I call foo, I want the block passed to fetch to be executed again.
Is Rails.cache.fetch working like this by default? Or do I have to implement this functionality?
Update (with Answer)
Turns out, the answer was no, at least when using Memcached.
It depends on the implementation of the cache store you are using. I would say that it should not cache nil values, but empty strings are OK to cache.
Look at the Dalli store implementation, i.e.:
def fetch(name, options=nil)
  options ||= {}
  name = expanded_key name

  if block_given?
    unless options[:force]
      entry = instrument(:read, name, options) do |payload|
        payload[:super_operation] = :fetch if payload
        read_entry(name, options)
      end
    end

    if !entry.nil?
      instrument(:fetch_hit, name, options) { |payload| }
      entry
    else
      result = instrument(:generate, name, options) do |payload|
        yield
      end
      write(name, result, options)
      result
    end
  else
    read(name, options)
  end
end
The updated answer to this question is: by default, fetch caches nil values, but using the dalli_store engine you can avoid that with the cache_nils option:
Rails.cache.fetch("cache_key", expires_in: 60.minutes, cache_nils: false) do
return_something
end
Worth noting, the defaults for Dalli have changed in recent years - the flag for nil-caching is currently false by default. See https://github.com/petergoldstein/dalli
It's definitely worth adding a test to check that your setup does what you expect (especially for production mode).
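A minimal sketch of such a test (the key name is arbitrary, and note that the cache store configured for the test environment may differ from the one used in production):

class NilCachingTest < ActiveSupport::TestCase
  test "fetch does not cache a nil result" do
    calls = 0
    2.times { Rails.cache.fetch("nil_check") { calls += 1; nil } }
    assert_equal 2, calls, "nil was cached: the block should have run twice"
  ensure
    Rails.cache.delete("nil_check")
  end
end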
