How to memoize hash across requests?

In my Rails application I have a very expensive function that fetches a bunch of conversion rates from an external service once per day:
require 'open-uri'

module Currency
  def self.all
    @all ||= fetch_all
  end

  def self.get_rate(from_curr = "EUR", to_curr = "USD")
    all[from_curr][to_curr]
  end

  private

  def self.fetch_all
    hashes = {}
    CURRENCIES.keys.each do |currency|
      hash = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read)
      hashes[currency] = hash["rates"]
    end
    hashes
  end
end
Is there a way to store the result of this function (a hash) to speed things up? Right now I am trying to store it in an instance variable @all, which speeds things up a little; however, it is not persisted across requests. How can I keep it across requests?

Create a file, say currency_rates.rb, in your config/initializers directory with the following code:
require 'open-uri'

hashes = {}
CURRENCIES.keys.each do |currency|
  hashes[currency] = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read)["rates"]
end
CURRENCY_RATES = hashes
Then write the following rake task and schedule it to run daily:
task update_currency_rates: :environment do
  require 'open-uri'
  hashes = {}
  CURRENCIES.keys.each do |currency|
    hashes[currency] = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read)["rates"]
  end
  # Reassign the top-level constant with the fresh rates
  # (emits an "already initialized constant" warning)
  Object.const_set('CURRENCY_RATES', hashes)
end
The only drawback is that the fetch will run every time you deploy a new version of your app or restart it. You can go with that if it's acceptable.
You can avoid it by using a cache store such as Memcached (e.g. via MemCachier); then you can do something like:
def currency_rates
  Rails.cache.fetch('currency_rates', expires_in: 24.hours) do
    # Extract the fetching code above into a method and call it here;
    # the hash it returns will be cached.
  end
end
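For completeness, a minimal sketch of the filled-in version, assuming the fetch loop above is extracted into a hypothetical fetch_all_rates helper:

def fetch_all_rates
  hashes = {}
  CURRENCIES.keys.each do |currency|
    hashes[currency] = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read)["rates"]
  end
  hashes
end

def currency_rates
  Rails.cache.fetch('currency_rates', expires_in: 24.hours) do
    fetch_all_rates # runs only on a cache miss; the result is cached for 24 hours
  end
end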

I would initialize the hash lazily, like this:
require 'open-uri'
require 'json'

module Currency
  def self.get_rate(from_curr = "EUR", to_curr = "USD")
    @memorized_result ||= {}
    @memorized_result.fetch(from_curr) do |not_found_key|
      data = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{not_found_key}")).read)
      @memorized_result[not_found_key] = data["rates"]
    end[to_curr]
  end
end
I think you don't need all the exchange rates all the time, so you can speed things up by fetching only the required base currency at a time. Over time you keep all the rates in memory.
Whether this is persisted between requests depends on your server in some edge cases. For instance, Unicorn uses multiple processes, and every process has its own @memorized_result variable, which needs to be filled separately.
If you want to share this data between multiple processes or servers, then you need a storage for the fetched data that can be shared between multiple processes.
If you need a time to live for your entries, then I would tweak @Md. Farhan Memon's Rails cache hint like this:
def get_rate(from_curr = "EUR", to_curr = "USD")
  Rails.cache.fetch("currency_rates_#{from_curr}_#{to_curr}", expires_in: 24.hours) do
    data = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{from_curr}")).read)
    data["rates"][to_curr]
  end
end

Related

How to set an expiry on a cached Ruby search?

I have a function which returns a list of IDs. In the Rails caching guide I can see that an expiration can be set on cached results, but I have implemented my caching somewhat differently.
def provide_book_ids(search_param)
  @returned_ids ||= begin
    search = client.search(query: search_param, :reload => true)
    search.fetch
    search.options[:query] = search_str
    search.fetch(true)
    search.map(&:id)
  end
end
What is the recommended way to set a 10-minute cache expiry, when written as above?
def provide_book_ids(search_param)
  @returned_ids = Rails.cache.fetch("zendesk_ids", expires_in: 10.minutes) do
    search = client.search(query: search_param, :reload => true)
    search.fetch
    search.options[:query] = search_str
    search.fetch(true)
    search.map(&:id)
  end
end
I am assuming this code is part of some request-response cycle and not something else (for example a long-running worker, or some class that is initialized once in your app). In such a case you wouldn't want to use @returned_ids directly but instead call provide_book_ids to get the value; but from what I understand that's not your scenario, so the approach above should work.
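One caveat worth adding (my assumption, not part of the original answer): the fixed "zendesk_ids" key means all searches share a single cache entry, so if provide_book_ids is called with different search_param values, you would want the parameter in the key. A minimal, simplified sketch:

def provide_book_ids(search_param)
  Rails.cache.fetch("zendesk_ids/#{search_param}", expires_in: 10.minutes) do
    search = client.search(query: search_param, :reload => true)
    search.fetch
    search.map(&:id)
  end
end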

How can I prevent many sidekiq jobs from exceeding the API calls limit

I am working on a Ruby on Rails application. We have many Sidekiq workers that can process multiple jobs at a time. Each job makes calls to the Shopify API; the limit set by Shopify is 2 calls per second. I want to synchronize that, so that only two jobs can call the API in a given second.
The way I'm doing that right now, is like this:
# frozen_string_literal: true

class Synchronizer
  attr_reader :shop_id, :queue_name, :limit, :wait_time

  def initialize(shop_id:, queue_name:, limit: nil, wait_time: 1)
    @shop_id = shop_id
    @queue_name = queue_name.to_s
    @limit = limit
    @wait_time = wait_time
  end

  # This method should be called for each api call
  def synchronize_api_call
    raise "a block is required." unless block_given?
    get_api_call
    time_to_wait = calculate_time_to_wait
    sleep(time_to_wait) unless Rails.env.test? || time_to_wait.zero?
    yield
  ensure
    return_api_call
  end

  def set_api_calls
    redis.del(api_calls_list)
    redis.rpush(api_calls_list, calls_list)
  end

  private

  def get_api_call
    logger.log_message(synchronizer: 'Waiting for api call', color: :yellow)
    @api_call_timestamp = redis.brpop(api_calls_list)[1].to_i
    logger.log_message(synchronizer: 'Got api call.', color: :yellow)
  end

  def return_api_call
    redis_timestamp = redis.time[0]
    redis.rpush(api_calls_list, redis_timestamp)
  ensure
    redis.ltrim(api_calls_list, 0, limit - 1)
  end

  def last_call_timestamp
    @api_call_timestamp
  end

  def calculate_time_to_wait
    current_time = redis.time[0]
    time_passed = current_time - last_call_timestamp.to_i
    time_to_wait = wait_time - time_passed
    time_to_wait > 0 ? time_to_wait : 0
  end

  def reset_api_calls
    redis.multi do |r|
      r.del(api_calls_list)
    end
  end

  def calls_list
    redis_timestamp = redis.time[0]
    limit.times.map { redis_timestamp }
  end

  def api_calls_list
    @api_calls_list ||= "api-calls:shop:#{shop_id}:list"
  end

  def redis
    Thread.current[:redis] ||= Redis.new(db: $redis_db_number)
  end
end
The way I use it is like this:
synchronizer = Synchronizer.new(shop_id: shop_id, queue_name: 'shopify_queue', limit: 2, wait_time: 1)
# This is called once when the process starts, i.e. it's not called by the jobs
# themselves but by the app from where the process is kicked off. It populates
# api_calls_list with 2 timestamps, which are used to know when the last api call was sent.
synchronizer.set_api_calls
Then, when a job wants to make a call:
synchronizer.synchronize_api_call do
  # make the call
end
The problem
The problem with this is that if for some reason a job fails to return the api_call it took to the api_calls_list, that will leave that job and the other jobs stuck forever, or until we notice and call set_api_calls again. The problem doesn't affect only that particular shop, but the other shops as well, because the Sidekiq workers are shared between all the shops using our app. Sometimes we don't notice until a user calls us, and we find that something that should finish in a few minutes has been stuck for many hours.
The Question
I just recently realised that Redis is not the best tool for shared locking. So I am asking: is there any other good tool for this job? If not in the Ruby world, I'd like to learn from others as well. I'm interested in the techniques as well as the tools, so every bit helps.
You may want to restructure your code and create a micro-service to process the API calls, which would use a local locking mechanism and force your workers to wait on the socket. It comes with the added complexity of maintaining the micro-service. But if you're in a hurry, then Ent-Rate-Limiting looks cool too.
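To make the local-locking idea concrete, here is a minimal sketch of an in-process limiter such a micro-service could use; the class name and the 2-calls-per-second figure are illustrative assumptions. Because the state lives in one process behind a Mutex, a crashed caller cannot leave a shared token un-returned the way the Redis list can:

require 'thread'

class LocalRateLimiter
  def initialize(calls_per_second: 2)
    @mutex = Mutex.new
    @interval = 1.0 / calls_per_second
    @next_slot = Time.now.to_f
  end

  def synchronize_api_call
    # Reserve the next free time slot, spaced @interval seconds apart
    wait_until = @mutex.synchronize do
      slot = [@next_slot, Time.now.to_f].max
      @next_slot = slot + @interval
      slot
    end
    delay = wait_until - Time.now.to_f
    sleep(delay) if delay > 0
    yield
  end
end

limiter = LocalRateLimiter.new(calls_per_second: 2)
limiter.synchronize_api_call do
  # make the Shopify call
end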

How to handle external service failure in Open-Uri?

In my Rails app I am trying to fetch a number of currency exchange rates from an external service and store them in the cache:
require 'open-uri'

module ExchangeRate
  def self.all
    Rails.cache.fetch("exchange_rates", :expires_in => 24.hours) { load_all }
  end

  private

  def self.load_all
    hashes = {}
    CURRENCIES.each do |currency|
      begin
        hash = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read) # what if not available?
        hashes[currency] = hash["rates"]
      rescue Timeout::Error
        puts "Timeout"
      rescue OpenURI::Error => e
        puts e.message
      end
    end
    hashes
  end
end
This works great in development but I am worried about the production environment. How can I prevent the whole thing from being cached if the external service is not available? How can I ensure ExchangeRate.all always contains data, even if it's old and can't be updated due to an external service failure?
I tried to add some basic error handling but I'm afraid it's not enough.
If you're worried about your external service not being reliable enough to keep up with caching every 24 hours, then you should disable the automatic cache expiration, let users work with old data, and set up some kind of notification system to tell you if load_all fails.
Here's what I'd do:
Assume ExchangeRate.all always returns a cached copy, with no expiration (this will return nil if no cache is found):
module ExchangeRate
  def self.all
    rates = Rails.cache.fetch("exchange_rates")
    UpdateCurrenciesJob.perform_later if rates.nil?
    rates
  end
end
Create an ActiveJob that handles the updates on a regular basis:
class UpdateCurrenciesJob < ApplicationJob
  queue_as :default

  def perform(*_args)
    hashes = {}
    CURRENCIES.each do |currency|
      begin
        hash = JSON.parse(open(URI("http://api.fixer.io/latest?base=#{currency}")).read) # what if not available?
        hashes[currency] = hash['rates'].merge('updated_at' => Time.current)
      rescue Timeout::Error
        puts 'Timeout'
      rescue OpenURI::Error => e
        puts e.message
      end
      if hashes[currency].blank? || hashes[currency]['updated_at'] < Time.current - 24.hours
        # send a mail saying "this currency hasn't been updated"
      end
    end
    Rails.cache.write('exchange_rates', hashes)
  end
end
Set the job up to run every few hours (4, 8, 12, anything less than 24). This way the currencies will load in the background, the clients will always have data, and you will always know if the currencies aren't updating.
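How you schedule it depends on your stack; as one example (assuming the whenever gem, which generates cron entries), you could add this to config/schedule.rb:

every 8.hours do
  runner "UpdateCurrenciesJob.perform_later"
end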

Speed up rake task by using typhoeus

So I stumbled across this: https://github.com/typhoeus/typhoeus
I'm wondering if this is what I need to speed up my rake task:
Event.all.each do |row|
  begin
    url = urlhere + row.first + row.second
    doc = Nokogiri::HTML(open(url))
    doc.css('.table__row--event').each do |tablerow|
      table = tablerow.css('.table__cell__body--location').css('h4').text
      next unless table == row.eventvenuename
      tablerow.css('.table__cell__body--availability').each do |button|
        buttonurl = button.css('a')[0]['href']
        unless buttonurl.include? '/checkout/external'
          row.update(row: buttonurl)
        end
      end
    end
  rescue Faraday::ConnectionFailed
    puts "connection failed"
    next
  end
end
I'm wondering whether this would speed it up, or whether it wouldn't because I'm doing a .each? If it would, could you provide an example?
Sam
If you set up Typhoeus::Hydra to run parallel requests, you might be able to speed up your code, assuming that the Kernel#open calls are what's slowing you down. Before you optimize, you might want to run benchmarks to validate this assumption.
If it is true, and parallel requests would speed it up, you would need to restructure your code to load events in batches, build a queue of parallel requests for each batch, and then handle them after they execute. Here's some sketch code.
class YourBatchProcessingClass
  def initialize(batch_size: 200)
    @batch_size = batch_size
    @hydra = Typhoeus::Hydra.new(max_concurrency: @batch_size)
  end

  def perform
    # Get an array of records
    Event.find_in_batches(batch_size: @batch_size) do |batch|
      # Store all the requests so we can access their responses later.
      requests = batch.map do |record|
        request = Typhoeus::Request.new(your_url_build_logic(record))
        @hydra.queue request
        request
      end
      @hydra.run # Run requests in parallel

      # Process responses from each request
      requests.each do |request|
        your_response_processing(request.response.body)
      end
    end
  rescue WhateverError => e
    puts e.message
  end

  private

  def your_url_build_logic(event)
    # TODO
  end

  def your_response_processing(response_body)
    # TODO
  end
end

# Run the service by calling this in your Rake task definition
YourBatchProcessingClass.new.perform
Ruby can be used for pure scripting, but it functions best as an object-oriented language. Decomposing your processing work into clear methods can help clarify your code and help you catch things like Tom Lord mentioned in the comments on your question. Also, instead of wrapping your whole script in a begin..rescue block, you can use method-level rescues as in #perform above, or just wrap @hydra.run.
As a note, .all.each is a memory hog and is thus considered a bad way to iterate over records: .all loads all of the records into memory before iterating over them with .each. To save memory, it's better to use .find_each or .find_in_batches, depending on your use case. See: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
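For example, the loop at the top of the question could load events in batches (1000 per batch by default) instead of all at once:

Event.find_each do |row|
  # same body as before; rows are streamed in batches rather than loaded all at once
end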

How to test the number of database calls in Rails

I am creating a REST API in Rails. I'm using RSpec. I'd like to minimize the number of database calls, so I would like to add an automatic test that verifies the number of database calls being executed as part of a certain action.
Is there a simple way to add that to my test?
What I'm looking for is some way to monitor/record the calls that are being made to the database as a result of a single API call.
If this can't be done with RSpec but can be done with some other testing tool, that's also great.
The easiest thing in Rails 3 is probably to hook into the notifications API.
This subscriber
class SqlCounter < ActiveSupport::LogSubscriber
  def self.count=(value)
    Thread.current['query_count'] = value
  end

  def self.count
    Thread.current['query_count'] || 0
  end

  def self.reset_count
    result, self.count = self.count, 0
    result
  end

  def sql(event)
    self.class.count += 1
    puts "logged #{event.payload[:sql]}"
  end
end

SqlCounter.attach_to :active_record
will print every executed SQL statement to the console and count them. You could then write specs such as:
expect do
  # do stuff
end.to change(SqlCounter, :count).by(2)
You'll probably want to filter out some statements, such as the ones starting/committing transactions or the ones Active Record emits to determine the structure of tables.
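A sketch of what that filtering could look like in the subscriber's sql method (Active Record labels its introspection queries 'SCHEMA'; the exact patterns to skip are assumptions to check against your own logs):

def sql(event)
  return if event.payload[:name] == 'SCHEMA'
  return if event.payload[:sql] =~ /\A\s*(BEGIN|COMMIT|ROLLBACK)/i
  self.class.count += 1
end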
You may be interested in using explain, but that won't be automatic; you will need to analyse each action manually. But maybe that is a good thing, since the important thing is not the number of DB calls but their nature. For example: are they using indexes?
Check this:
http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain/
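For instance (a minimal illustration; explain is available on relations since Rails 3.2, and Event is borrowed from the earlier questions):

puts Event.where(id: 1).explain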
Use the db-query-matchers gem.
expect { subject.make_one_query }.to make_database_queries(count: 1)
Fredrick's answer worked great for me, but in my case I also wanted to know the number of calls for each ActiveRecord class individually. I made some modifications and ended up with this, in case it's useful for others:
class SqlCounter < ActiveSupport::LogSubscriber
  # Returns the number of database "Loads" for a given ActiveRecord class.
  def self.count(clazz)
    name = clazz.name + ' Load'
    Thread.current['log'] ||= {}
    Thread.current['log'][name] || 0
  end

  # Returns a list of ActiveRecord classes that were counted.
  def self.counted_classes
    log = Thread.current['log']
    loads = log.keys.select { |key| key =~ /Load$/ }
    loads.map { |key| Object.const_get(key.split.first) }
  end

  def self.reset_count
    Thread.current['log'] = {}
  end

  def sql(event)
    name = event.payload[:name]
    Thread.current['log'] ||= {}
    Thread.current['log'][name] ||= 0
    Thread.current['log'][name] += 1
  end
end
SqlCounter.attach_to :active_record
expect do
  # do stuff
end.to change { SqlCounter.count(SomeModel) }.by(2) # SomeModel is whichever class's loads you're counting
