I work with Rails 5 / audited 4.6.5.
I have batch actions on more than 2000 items at once.
To make them usable, I need to use the update_all function.
Then I would like to create the needed audits in one go.
What I would like to do is something like this:
Followup.where(id: ids).in_batches(of: 500) do |group|
#quick update for user responsiveness
group.update_all(step_id: 1)
group.delay.newAudits(step_id: 1)
end
But the audited gem seems to be too basic for that.
I'm sure a lot of people have faced an issue like this before.
After a lot of iteration, I managed to create an optimized query.
I put it in an AuditBatch class and also added delay:
class AuditBatch
  ############################################
  ## Init a batch update from a list of ids.
  ## The aim is to avoid instantiations in controllers.
  ## @param auditable_type [String] class name (e.g. 'Followup')
  ## @param auditable_ids [Array<Integer>] ids of the objects to update
  ## @param changes [Hash] changes {key1: newval, key2: newval}
  ############################################
  def self.init_batch_creation(auditable_type, auditable_ids, changes)
    obj = Object.const_get(auditable_type)
    group = obj.where(id: auditable_ids)
    AuditBatch.delay.batch_creation(group, changes, false)
  end

  ############################################
  ## Insert a long list of audits in one statement.
  ## @param group [Enumerable] auditable objects
  ## @param changes [Hash] changes {key1: newval, key2: newval}
  ## @param delayed [Boolean] run the INSERT in a background job
  ############################################
  def self.batch_creation(group, changes, delayed = true)
    conn = ActiveRecord::Base.connection
    rows = group.map do |g|
      id = conn.quote(g.id)
      type = conn.quote(g.class.name)
      # Match on type as well as id: ids are only unique per model
      own_audits = "auditable_id = #{id} AND auditable_type = #{type}"
      parameters = changes.map do |key, val|
        k = conn.quote(key.to_s)
        # [previous value from the latest audit, new value]; quote escapes strings
        "#{k}, json_build_array((SELECT ((audited_changes -> #{k})::json->>1) " \
          "FROM audits WHERE #{own_audits} ORDER BY id DESC LIMIT 1), #{conn.quote(val)})"
      end.join(', ')
      # COALESCE gives version 1 to records that have no audit yet
      "('update', json_build_object(#{parameters}), #{id}, #{type}, #{conn.quote(Time.now)}, " \
        "(SELECT COALESCE(MAX(version), 0) + 1 FROM audits WHERE #{own_audits}), " \
        "#{conn.quote(SecureRandom.uuid)})"
    end
    columns = '"action", "audited_changes", "auditable_id", "auditable_type", "created_at", "version", "request_uuid"'
    sql = "INSERT INTO audits (#{columns}) VALUES #{rows.join(', ')}"
    if delayed
      AuditBatch.delay.execute_delayed_sql(sql)
    else
      conn.execute(sql)
    end
  end

  def self.execute_delayed_sql(sql)
    ActiveRecord::Base.connection.execute(sql)
  end
end
With group.update_all, callbacks are skipped, so no new audit is recorded for the changes.
You cannot easily create those audits by hand either: even if you insert the audit rows yourself (`action`, `audited_changes`, `auditable_id`, `auditable_type`, `created_at`, `version`, `request_uuid`), you still need to know "what changed?" for audited_changes, and those old values are already lost once update_all has run on the group.
This is also documented in this audited issue: https://github.com/collectiveidea/audited/issues/352
paper_trail, another gem that keeps change logs, has the same limitation: https://github.com/airblade/paper_trail/issues/337
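A minimal sketch of the point above, assuming the old values are captured with pluck before update_all runs (for example `Followup.where(id: ids).pluck(:id, :step_id)`). The helper name build_audit_rows and the row layout are hypothetical, version numbers are left out, and the resulting hashes would still have to be written with a single multi-row INSERT such as the one AuditBatch builds.

```ruby
require 'json'
require 'securerandom'

# Hypothetical helper (not part of audited): builds one audit row per record
# from old values captured *before* update_all runs.
#   old_values: { id => { "step_id" => old_value } }, e.g. built from
#     Followup.where(id: ids).pluck(:id, :step_id)
#   changes:    { "step_id" => new_value }
# Version numbers are not computed here.
def build_audit_rows(auditable_type, old_values, changes, now = Time.now.utc)
  request_uuid = SecureRandom.uuid # one uuid for the whole batch (a design choice)
  old_values.map do |id, olds|
    audited_changes = changes.each_with_object({}) do |(key, new_val), acc|
      acc[key] = [olds[key], new_val] # [old, new] pair for an update
    end
    {
      action: 'update',
      auditable_type: auditable_type,
      auditable_id: id,
      audited_changes: audited_changes.to_json,
      created_at: now,
      request_uuid: request_uuid
    }
  end
end
```

The order matters: pluck the old values, run update_all, then build and insert the rows, so "what changed?" is known when the audits are written.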
Related
I have an API endpoint that accounts for a little less than half of the average response time (on average it takes about 514 ms, yikes). The endpoint simply returns some statistics about stored data scoped to particular time periods, such as this week, last week, this month, and so on.
There are a number of ways we could reduce its impact, like getting the clients to hit it less often and with more specific queries, such as only querying for "this week" when only that data is used. Here I focus on what can be done at the database level first. Our current implementation generates this data for all "time scopes" on the fly, so the number of queries is enormous, and the endpoint is hit multiple times per second. No caching is used, but maybe there is a way to use Rails's cache_key, or the low-level Rails.cache?
The current implementation looks something like this:
class FooSummaries
include SummaryStructs
def self.generate_for(user)
@user = user
summaries = Struct::Summaries.new
TimeScope::TIME_SCOPES.each do |scope|
foos = user.foos.by_scope(scope.to_sym)
summary = Struct::Summary.new
# e.g: summaries.last_week = build_summary(foos)
summaries.send("#{scope}=", build_summary(summary, foos))
end
summaries
end
private_class_method
def self.build_summary(summary, foos)
summary.all_quuz = @user.foos_count
summary.all_quux = all_quux(foos)
summary.quuw = quuw(foos).to_f
%w[foo bar baz qux].product(
%w[quux quuz corge]
).each do |a, b|
# e.g: summary.foo_quux = quux(foos, "foo")
summary.send("#{a.downcase}_#{b}=", send(b, foos, a) || 0)
end
summary
end
def self.all_quuz(foos)
foos.count
end
def self.all_quux(foos)
foos.sum(:quux)
end
def self.quuw(foos)
foos.quuwable.total_quuw
end
def self.corge(foos, foo_type)
return if foos.count.zero?
count = self.quuz(foos, foo_type) || 0
count.to_f / foos.count
end
def self.quux(foos, foo_type)
case foo_type
when "foo"
foos.where(foo: true).sum(:quux)
when "bar"
foos.bar.where(foo: false).sum(:quux)
when "baz"
foos.baz.where(foo: false).sum(:quux)
when "qux"
foos.qux.sum(:quux)
end
end
def self.quuz(foos, foo_type)
case foo_type
when "foo"
foos.where(foo: true).count
when "bar"
foos.bar.where(foo: false).count
when "baz"
foos.baz.where(foo: false).count
when "qux"
foos.qux.count
end
end
end
To avoid changing the model or creating a migration for a table to store this data (both of which may be valid, and better, solutions), I decided it might be easier to construct one large SQL query executed at once, hoping that building the query string and running it would be faster than the ActiveRecord overhead of setting up and tearing down many separate SQL queries.
The new approach looks something like this; it horrifies me, and I know there must be a more elegant way:
class FooSummaries
include SummaryStructs
def self.generate_for(user)
results = ActiveRecord::Base.connection.execute(build_query_for(user))
results.each do |result|
# build up summary struct from query results
end
end
def self.build_query_for(user)
TimeScope::TIME_SCOPES.map do |scope|
time_scope = TimeScope.new(scope)
%w[foo bar baz qux].map do |foo_type|
%[
select
'#{scope}_#{foo_type}',
sum(quux) as quux,
count(*) as quuz,
round(100.0 * (count(*) / #{user.foos_count.to_f}), 3) as corge
from
"foos"
where
"foos"."user_id" = #{user.id}
and "foos"."foo_type" = '#{foo_type.humanize}'
and "foos"."end_time" between '#{time_scope.from}' AND '#{time_scope.to}'
and "foos"."foo" = '#{foo_type == 'foo' ? 't' : 'f'}'
union
]
end
end.join.reverse.sub("union".reverse, "").reverse
end
end
The funny way of replacing the last occurrence of union also horrifies me, but it seems to work. There must be a better way, as there are probably many things wrong with the above implementation(s). It may help to note that I use PostgreSQL and have no problem writing queries that are not portable to other databases. Any advice is truly appreciated!
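On the "funny way" of removing the last union: a sketch, assuming the per-scope SELECT strings are built as above, is to collect them into an array and let Array#join put the separator only between elements. SCOPES, FOO_TYPES and build_union_query are hypothetical stand-ins, the SELECT is trimmed down, and it uses union all (no duplicate elimination) because each row already carries a distinct label. Interpolation is only tolerable here because the values are internal constants; anything user-supplied should go through bind parameters.

```ruby
# Hypothetical stand-ins for TimeScope::TIME_SCOPES and the foo types.
SCOPES = %w[this_week last_week].freeze
FOO_TYPES = %w[foo bar].freeze

# Builds one SELECT per (scope, foo_type) pair and joins them.
# Array#join only inserts the separator *between* elements, so there is
# no trailing "union" to strip off afterwards.
def build_union_query(scopes, foo_types)
  selects = scopes.product(foo_types).map do |scope, foo_type|
    "select '#{scope}_#{foo_type}' as label, count(*) as quuz " \
      "from foos where foo_type = '#{foo_type}'"
  end
  selects.join("\nunion all\n")
end
```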
Thanks for reading!
Update: I found a solution that works for me and sped up the endpoint that uses this service object by 500%! Essentially, instead of building a query string and then executing it for each set of parameters, we create a prepared statement using prepare, followed by exec_prepared calls that pass the parameters to the query. Since this query is made many times over, this is a useful optimization because, as per the PostgreSQL documentation:
A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
We prepare the query like so:
def prepare_query!
ActiveRecord::Base.transaction do
connection.prepare("foos_summary",
%[with scoped_foos as (
select
*
from
"foos"
where
"foos"."user_id" = $3
and ("foos"."end_time" between $4 and $5)
)
select
$1::text as scope,
$2::text as foo_type,
sum(quux)::float as quux,
sum(eggs + bacon + ham)::float as food,
count(*) as count,
round((sum(quux) / nullif(
(select
sum(quux)
from
scoped_foos), 0))::numeric,
5)::float as quuz
from
scoped_foos
where
(case $6
when 'Baz'
then (baz = 't')
else
(baz = 'f' and foo_type = $6)
end
)
])
end
end
This query uses a common table expression (CTE) for readability and to avoid writing the same select query twice.
Then we execute the query, passing in the parameters we need:
def connection
@connection ||= ActiveRecord::Base.connection.raw_connection
end
def query_results
prepare_query! unless query_already_prepared?
@results ||= TimeScope::TIME_SCOPES.map do |scope|
time_scope = TimeScope.new(scope)
%w[bacon eggs ham spam].map do |foo_type|
connection.exec_prepared("foos_summary",
[scope,
foo_type,
@user.id,
time_scope.from,
time_scope.to,
foo_type.humanize])
end
end
end
Here, query_already_prepared? is a simple check against the pg_prepared_statements view that PostgreSQL maintains:
def query_already_prepared?
connection.exec(%(select
name
from
pg_prepared_statements
where name = 'foos_summary')).count.positive?
end
A nice solution, I thought! Hopefully the technique illustrated here will help others with similar problems.
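One possible refinement, sketched under assumptions: PostgreSQL prepared statements live for the duration of the database session, so the code could remember that it already prepared a statement on a given connection object instead of querying pg_prepared_statements on every call. PreparedQuery is a hypothetical class that works with any object responding to prepare(name, sql) and exec_prepared(name, params), such as the raw PG::Connection used above. Unlike the pg_prepared_statements check, it does not notice if the connection is reset, so treat it as a trade-off rather than a drop-in replacement.

```ruby
# Hypothetical helper: prepares a named statement lazily, once per
# connection object, then executes it with the given parameters.
class PreparedQuery
  def initialize(connection, name, sql)
    @connection = connection
    @name = name
    @sql = sql
    @prepared = false
  end

  def exec(params)
    # Only the first call pays for PREPARE; later calls go straight to EXECUTE
    unless @prepared
      @connection.prepare(@name, @sql)
      @prepared = true
    end
    @connection.exec_prepared(@name, params)
  end
end
```

Keeping one PreparedQuery per raw connection matters, because a statement prepared on one pooled connection is not visible from another.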
I am getting an argument error with this setup. My thought is that the file cannot properly access the CSV that I am attempting to import. I need to import from one CSV to create another CSV using the model data. What do I put in the controller and views to show the new CSV / manipulated data? Basically, how can I pass one CSV file (orders.csv) into a model for manipulation and out into another CSV file (redemptions.csv)? The code in the model just calculates the existing numbers in orders.csv a certain way for export. How do I do this without the argument error?
The controller (I don't really know what to do here)
class OrdersController < ApplicationController
def index
orders = Order.new
end
def redemptions
orders = Order.new
end
end
The View (not confident about this either)
<h1>Chocolates</h1>
puts "#{order.purchased_chocolate_count}"
<%= link_to "CSV", orders_redemptions_path, :format => :csv %>
Model
require 'csv'
# Define an Order class to make it easier to store / calculate chocolate tallies
class Order < ActiveRecord::Base
module ChocolateTypes
MILK = 'milk'
DARK = 'dark'
WHITE = 'white'
SUGARFREE = 'sugar free'
end
BonusChocolateTypes = {
ChocolateTypes::MILK => [ChocolateTypes::MILK, ChocolateTypes::SUGARFREE],
ChocolateTypes::DARK => [ChocolateTypes::DARK],
ChocolateTypes::WHITE => [ChocolateTypes::WHITE, ChocolateTypes::SUGARFREE],
ChocolateTypes::SUGARFREE => [ChocolateTypes::SUGARFREE, ChocolateTypes::DARK]
}
# attr_reader defines a getter method for each of these instance variables,
# so they can be read (but not assigned) from outside the object:
attr_reader :order_value
attr_reader :chocolate_price
attr_reader :required_wrapper_count
attr_reader :order_chocolate_type
attr_reader :chocolate_counts
def initialize(order_value, chocolate_price, required_wrapper_count, order_chocolate_type)
@order_value = order_value
@chocolate_price = chocolate_price
@required_wrapper_count = required_wrapper_count
@order_chocolate_type = order_chocolate_type
# Initialize a new hash to store the chocolate counts by chocolate type.
# Set the default value for each chocolate type to 0
@chocolate_counts = Hash.new(0)
process
end
# Return the number of chocolates purchased
def purchased_chocolate_count
# order_value and chocolate_price are converted to floats when the CSV is read,
# so floor the result explicitly to get a whole number of chocolates
(order_value / chocolate_price).floor
end
# Return the number of chocolate bonuses to award (which can include
# multiple different chocolate types; see BonusChocolateTypes above)
def bonus_chocolate_count
(purchased_chocolate_count / required_wrapper_count).to_i
end
# Process the order:
# 1. Add chocolate counts to the totals hash for the specified order type
# 2. Add the bonus chocolate types awarded for this order
def process
chocolate_counts[order_chocolate_type] += purchased_chocolate_count
bonus_chocolate_count.times do |i|
BonusChocolateTypes[order_chocolate_type].each do |bonus_chocolate_type|
chocolate_counts[bonus_chocolate_type] += 1
end
end
end
# Output the chocolate counts (including bonuses) for the order as an array
# of strings suitable for piping to an output CSV
def csv_data
ChocolateTypes.constants.map do |output_chocolate_type|
# Get the display string (lowercase)
chocolate_key = ChocolateTypes.const_get(output_chocolate_type)
chocolate_count = chocolate_counts[chocolate_key].to_i
"#{chocolate_key} #{chocolate_count}"
end
end
end
# Create a file handle to the output file
CSV.open("redemptions.csv", "wb") do |redemption_csv|
# Read in the input file and store it as an array of lines
input_lines = CSV.read("orders.csv")
# Remove the first line from the input file (it just contains the CSV headers)
input_lines.shift()
input_lines.each do |input_line|
order_value, chocolate_price, required_wrapper_count, chocolate_type = input_line
# Correct the input values to the correct types
order_value = order_value.to_f
chocolate_price = chocolate_price.to_f
required_wrapper_count = required_wrapper_count.to_i
# Sanitize the chocolate type from the input line so that it doesn't
# include any quotes or leading / trailing whitespace
chocolate_type = chocolate_type.gsub(/[']/, '').strip
order = Order.new(order_value, chocolate_price, required_wrapper_count, chocolate_type)
# process is already called from initialize, so calling it again here would double the counts
puts order.purchased_chocolate_count
# Append the order to the output file as a new CSV line
redemption_csv << order.csv_data
end
end
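Note that the CSV.open block above sits at the top level of the model file, so it runs whenever the file is loaded rather than when a controller asks for it. A sketch, under the assumption that the conversion is moved into a method that takes CSV text in and returns CSV text out: RedemptionReport and convert are hypothetical names, and the block stands in for the per-order calculation (for example building an order and returning its csv_data).

```ruby
require 'csv'

# Hypothetical helper: turns the text of orders.csv into the text of
# redemptions.csv. The block receives the fields of one order row and
# returns the fields of the corresponding output row.
module RedemptionReport
  def self.convert(orders_csv_text)
    rows = CSV.parse(orders_csv_text, headers: true) # header line is consumed
    CSV.generate do |out|
      rows.each { |row| out << yield(row.fields) }
    end
  end
end
```

In a controller action, the returned string could then be sent to the browser with `send_data csv_text, filename: "redemptions.csv", type: "text/csv"`.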
Your initialize method does not provide default values for its arguments:
def initialize(order_value, chocolate_price, required_wrapper_count, order_chocolate_type)
So when you call orders = Order.new, Ruby expects four arguments and you haven't provided any, hence the argument error.
One more issue: by naming convention, your local variable should be order, not orders, since it holds a single object.
To assign default values properly, you can look here.
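A minimal sketch of default argument values, using a plain Ruby class with the hypothetical name ChocolateOrder rather than the ActiveRecord model: because Order inherits from ActiveRecord::Base, overriding initialize there without calling super skips ActiveRecord's own initialization, which is likely to cause further problems. The default values below are illustrative only.

```ruby
# Plain Ruby class (hypothetical name), not an ActiveRecord model.
class ChocolateOrder
  attr_reader :order_value, :chocolate_price, :required_wrapper_count, :order_chocolate_type

  # Every parameter has a default, so ChocolateOrder.new with no arguments
  # no longer raises ArgumentError. The defaults are illustrative only.
  def initialize(order_value = 0, chocolate_price = 1, required_wrapper_count = 1, order_chocolate_type = 'milk')
    @order_value = order_value
    @chocolate_price = chocolate_price
    @required_wrapper_count = required_wrapper_count
    @order_chocolate_type = order_chocolate_type
  end

  # Whole chocolates that the order value can buy
  def purchased_chocolate_count
    (order_value / chocolate_price).to_i
  end
end
```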
I have this "heavy_rotation" filter I'm working on. Basically, it grabs tracks from our database based on certain parameters (a mixture of listens_count, staff_pick and purchase_count, to name a few).
An XHR request is made to the filter_tracks controller action. In there I have a flag to check whether it's "heavy_rotation". I will likely move this to the model (because this controller is getting fat)... Anyway, how can I efficiently ensure that it doesn't pull the same records? I've considered an offset, but then I have to keep track of the offset for every query. Or maybe store track ids to compare against for each query? Any ideas? I'm having trouble thinking of an elegant way to do this.
It should be noted that a limit of 14 is set via JavaScript, and when a user hits "view more" to paginate, it sends another request to filter_tracks.
Any help appreciated! Thanks!
def filter_tracks
params[:limit] ||= 50
params[:offset] ||= 0
params[:order] ||= 'heavy_rotation'
# heavy rotation filter flag
heavy_rotation ||= (params[:order] == 'heavy_rotation')
@result_offset = params[:offset]
@tracks = Track.ready.with_artist
params[:order] = "tracks.#{params[:order]}" unless heavy_rotation
if params[:order]
order = params[:order]
order.match(/artist.*/){|m|
params[:order] = params[:order].sub /tracks\./, ''
}
order.match(/title.*/){|m|
params[:order] = params[:order].sub /tracks.(title)(.*)/i, 'LOWER(\1)\2'
}
end
searched = params[:q] && params[:q][:search].present?
@tracks = parse_params(params[:q], @tracks)
@tracks = @tracks.offset(params[:offset])
@result_count = @tracks.count
@tracks = @tracks.order(params[:order], 'tracks.updated_at DESC').limit(params[:limit]) unless heavy_rotation
# structure heavy rotation results
if heavy_rotation
puts "*" * 300
week_ago = Time.now - 7.days
two_weeks_ago = Time.now - 14.days
three_months_ago = Time.now - 3.months
# mix in top licensed tracks within last 3 months
t = Track.top_licensed
tracks_top_licensed = t.where(
"tracks.updated_at >= :top",
top: three_months_ago).limit(5)
# mix top listened to tracks within last two weeks
tracks_top_listens = @tracks.order('tracks.listens_count DESC').where(
"tracks.updated_at >= :top",
top: two_weeks_ago)
.limit(3)
# mix top downloaded tracks within last two weeks
tracks_top_downloaded = @tracks.order("tracks.downloads_count DESC").where(
"tracks.updated_at >= :top",
top: two_weeks_ago)
.limit(2)
# mix in 25% of staff picks added within 3 months
tracks_staff_picks = Track.ready.staff_picks.
includes(:artist).order("tracks.created_at DESC").where(
"tracks.updated_at >= :top",
top: three_months_ago)
.limit(4)
@tracks = tracks_top_licensed + tracks_top_listens + tracks_top_downloaded + tracks_staff_picks
end
render partial: "shared/results"
end
Seeking an "elegant" solution is going to yield many diverse opinions, so I'll offer one approach and my reasoning. In this case, I feel it's optimal and elegant to enforce uniqueness across the queries' results by filtering the returned record objects, instead of trying to restrict each query to yield only unique results. For getting contiguous results for pagination, on the other hand, I would store the offset from each query and use it as the starting point for the next query, using instance variables or sessions, depending on how the data needs to be persisted.
Here's a gist to my refactored version of your code with a solution implemented and comments explaining why I chose to use certain logic or data structures: https://gist.github.com/femmestem/2b539abe92e9813c02da
#filter_tracks holds a hash map, @tracks_offset, which the other methods can access and update; each query method is responsible for adding its own offset key to @tracks_offset.
#filter_tracks also holds a collection, @result_track_ids, of the ids of tracks that already appear in the results.
If you need persistence, store @tracks_offset and @result_track_ids in the session/cookies instead of instance variables. The logic should be the same. If you use the session to store the offsets and ids from results, remember to clear them when your user is done interacting with this feature.
See below. Note, I refactored your #filter_tracks method to separate the responsibilities into 9 different methods: #filter_tracks, #heavy_rotation, #order_by_params, #heavy_rotation?, #validate_and_return_top_results, and #tracks_top_licensed... #tracks_top_<whatever>. This will make my notes easier to follow and your code more maintainable.
def filter_tracks
  # Does this need to be so high when JavaScript limits display to 14?
  @limit ||= 50
  @tracks_offset ||= {}
  @tracks_offset[:default] ||= 0
  @result_track_ids ||= []
  @order ||= params[:order] || 'heavy_rotation'

  tracks = Track.ready.with_artist
  tracks = parse_params(params[:q], tracks)
  @result_count = tracks.count

  # Checks for heavy_rotation filter flag
  if heavy_rotation? @order
    @tracks = heavy_rotation
  else
    @tracks = order_by_params
  end

  render partial: "shared/results"
end
All #heavy_rotation does is call the various query methods. This makes it easy to add, modify, or delete any one of the query methods as the criteria change, without affecting any other method.
def heavy_rotation
week_ago = Time.now - 7.days
two_weeks_ago = Time.now - 14.days
three_months_ago = Time.now - 3.months
tracks_top_licensed(date_range: three_months_ago, max_results: 5) +
tracks_top_listens(date_range: two_weeks_ago, max_results: 3) +
tracks_top_downloaded(date_range: two_weeks_ago, max_results: 2) +
tracks_staff_picks(date_range: three_months_ago, max_results: 4)
end
Here's what one of the query methods looks like. They're all basically the same, but with custom SQL/ORM queries. You'll notice that I'm not setting the :limit parameter to the number of results that I want the query method to return. That would create a problem if one of the returned records duplicated a record from another query method, e.g. if the same track was returned by both staff_picks and top_downloaded: I would then have to make an additional query to get another record. That's not a wrong approach, just not the one I chose.
def tracks_top_licensed(args = {})
  args = @default.merge args
  max = args[:max_results]
  date_range = args[:date_range]

  # Adds own offset key to #filter_tracks hash map => @tracks_offset
  @tracks_offset[:top_licensed] ||= 0

  unfiltered_results = Track.top_licensed
    .where("tracks.updated_at >= :date_range", date_range: date_range)
    .limit(@limit)
    .offset(@tracks_offset[:top_licensed])

  top_tracks = validate_and_return_top_results(unfiltered_results, max)

  # Add offset of your most recent query to the cumulative offset
  # so triggering 'view more'/pagination returns contiguous results
  @tracks_offset[:top_licensed] += top_tracks[:offset]

  top_tracks[:top_results]
end
In each query method, I clean the record objects through a custom method, #validate_and_return_top_results. The validator checks the record objects for duplicates against the @result_track_ids collection set up in #filter_tracks, and returns the number of records specified by its caller.
def validate_and_return_top_results(collection, max = 1)
  records = collection.to_a # run the query once
  top_results = []
  i = 0 # offset incrementer
  # Stop when we have enough results or run out of records
  while top_results.count < max && i < records.size
    record = records[i]
    # Checks if track has already appeared in the results
    unless @result_track_ids.include? record.id
      # this will be returned to the caller
      top_results << record
      # this is the point of reference to validate your query method results
      @result_track_ids << record.id
    end
    i += 1
  end
  { top_results: top_results, offset: i }
end
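To see how the offset bookkeeping behaves, here is a plain-Ruby sketch of the same dedup step with arrays instead of ActiveRecord relations; Track is a hypothetical Struct and take_unique a hypothetical name. When a record is skipped as a duplicate, the returned offset is larger than the number of results, which is exactly what the cumulative offset has to account for.

```ruby
# Hypothetical stand-in for a Track record.
Track = Struct.new(:id)

# Takes up to `max` records whose ids are not in `seen_ids`, and reports how
# many records were consumed so the caller can add that to its stored offset.
def take_unique(records, max, seen_ids)
  taken = []
  consumed = 0
  records.each do |record|
    break if taken.size >= max
    consumed += 1
    next if seen_ids.include?(record.id) # duplicate: consumed but not returned
    taken << record
    seen_ids << record.id
  end
  { top_results: taken, offset: consumed }
end
```

For example, with track 2 already shown and tracks 1 to 4 returned by the query, asking for two results yields tracks 1 and 3 with an offset of 3, so the next "view more" request starts at track 4.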
I saw other threads explaining how to do it for MySQL, and even how to do it in Java, but not how to set the query timeout in Ruby.
I'm trying to use the setQueryTimeout function in JRuby with OJDBC7, but can't find how to do it in Ruby. I've tried the following:
@c.connection.instance_variable_get(:@connection).instance_variable_set(:@query_timeout, 1)
@c.connection.instance_variable_get(:@connection).instance_variable_set(:@read_timeout, 1)
@c.connection.setQueryTimeout(1)
I also tried modifying my database.yml file to include
adapter: jdbc
driver: oracle.jdbc.driver.OracleDriver
timeout: 1
None of the above had any effect, other than setQueryTimeout, which threw a method error.
Any help would be great.
So I found a way to make it work, but I don't like it. It's very hackish and orphans queries on the database, but it at least allows my app to continue executing. I would still love to find a way to cancel the statement so I'm not orphaning queries that take longer than 10 seconds.
require 'timeout'

query_thread = Thread.new {
  # execute query
}
begin
  Timeout.timeout(10) do
    query_thread.join
  end
rescue Timeout::Error
  Thread.kill(query_thread)
  results = []
end
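A possibly simpler variant of the same workaround, sketched in plain Ruby: Thread#join accepts a limit in seconds and returns nil if the thread is still running, so the Timeout module is not needed. It has the same caveat as above: killing the Ruby thread does not cancel the statement on the database. The helper name run_with_timeout is hypothetical.

```ruby
# Hypothetical helper: runs the block in a thread and waits at most
# `seconds` for it. Returns [true, value] if the block finished in time,
# or [false, nil] if it timed out.
def run_with_timeout(seconds, &block)
  worker = Thread.new(&block)
  # join returns nil when the limit expires, and re-raises any exception
  # raised inside the thread
  if worker.join(seconds)
    [true, worker.value]
  else
    worker.kill # stops the Ruby thread only; a database statement keeps running
    [false, nil]
  end
end
```

Usage would look like `finished, results = run_with_timeout(10) { run_the_query }`, where run_the_query stands for your own query code, falling back to `results = []` when finished is false.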
Query timeout on Oracle DB works for me with Rails 4 and JRuby.
With JRuby you can use the JDBC function Statement.setQueryTimeout to define the query timeout.
Unfortunately, this requires patching the oracle-enhanced adapter, as shown below.
This example implements an iterating query that does not store the whole result in an array and also applies the query timeout.
# hold open SQL-Cursor and iterate over SQL-result without storing whole result in Array
# Peter Ramm, 02.03.2016
# expand class by getter to allow access on internal variable #raw_statement
ActiveRecord::ConnectionAdapters::OracleEnhancedJDBCConnection::Cursor.class_eval do
def get_raw_statement
@raw_statement
end
end
# Class extension by Module-Declaration : module ActiveRecord, module ConnectionAdapters, module OracleEnhancedDatabaseStatements
# does not work as Engine with Winstone application server, therefore hard manipulation of class ActiveRecord::ConnectionAdapters::OracleEnhancedAdapter
# and extension with method iterate_query
ActiveRecord::ConnectionAdapters::OracleEnhancedAdapter.class_eval do
# Method comparable with ActiveRecord::ConnectionAdapters::OracleEnhancedDatabaseStatements.exec_query,
# but without storing whole result in memory
def iterate_query(sql, name = 'SQL', binds = [], modifier = nil, query_timeout = nil, &block)
type_casted_binds = binds.map { |col, val|
[col, type_cast(val, col)]
}
log(sql, name, type_casted_binds) do
cursor = nil
cached = false
if without_prepared_statement?(binds)
cursor = @connection.prepare(sql)
else
unless @statements.key? sql
@statements[sql] = @connection.prepare(sql)
end
cursor = @statements[sql]
binds.each_with_index do |bind, i|
col, val = bind
cursor.bind_param(i + 1, type_cast(val, col), col)
end
cached = true
end
cursor.get_raw_statement.setQueryTimeout(query_timeout) if query_timeout
cursor.exec
if name == 'EXPLAIN' and sql =~ /^EXPLAIN/
res = true
else
columns = cursor.get_col_names.map do |col_name|
@connection.oracle_downcase(col_name).freeze
end
fetch_options = {:get_lob_value => (name != 'Writable Large Object')}
while row = cursor.fetch(fetch_options)
result_hash = {}
columns.each_index do |index|
# Remove a possible 0x00 at the end of strings (it leads to errors in Internet Explorer);
# strip before storing, so the hash receives the cleaned value
row[index] = row[index].strip if row[index].is_a?(String)
result_hash[columns[index]] = row[index]
end
result_hash.extend SelectHashHelper
modifier.call(result_hash) unless modifier.nil?
yield result_hash
end
end
cursor.close unless cached
nil
end
end #iterate_query
end #class_eval
class SqlSelectIterator
def initialize(stmt, binds, modifier, query_timeout)
@stmt = stmt
@binds = binds
@modifier = modifier # proc for modification of record
@query_timeout = query_timeout
end
def each(&block)
# Execute SQL and call block for every record of result
ActiveRecord::Base.connection.iterate_query(@stmt, 'sql_select_iterator', @binds, @modifier, @query_timeout, &block)
end
end
Use the SqlSelectIterator class above like this:
SqlSelectIterator.new(stmt, binds, modifier, query_timeout).each do |record|
process(record)
end
I am attempting to generate a fragment cache (using a Dalli/Memcached store) however the key is being generated with "#" as part of the key, so Rails doesn't seem to be recognizing that there is a cache value and is hitting the database.
My cache key in the view looks like this:
cache([@jobs, "index"]) do
The controller has:
@jobs = @current_tenant.active_jobs
With the actual Active Record query like this:
def active_jobs
self.jobs.where("published = ? and expiration_date >= ?", true, Date.today).order("(featured and created_at > now() - interval '" + self.pinned_time_limit.to_s + " days') desc nulls last, created_at desc")
end
Looking at the Rails server log, I see the cache read, but the SQL query still runs:
Cache read: views/#<ActiveRecord::Relation:0x007fbabef9cd58>/1-index
Read fragment views/#<ActiveRecord::Relation:0x007fbabef9cd58>/1-index (1.0ms)
(0.6ms) SELECT COUNT(*) FROM "jobs" WHERE "jobs"."tenant_id" = 1 AND (published = 't' and expiration_date >= '2013-03-03')
Job Load (1.2ms) SELECT "jobs".* FROM "jobs" WHERE "jobs"."tenant_id" = 1 AND (published = 't' and expiration_date >= '2013-03-03') ORDER BY (featured and created_at > now() - interval '7 days') desc nulls last, created_at desc
Any ideas as to what I might be doing wrong? I'm sure it has to do with the key generation and ActiveRecord::Relation, but I'm not sure how.
Background:
The problem is that the string representation of the relation is different each time your code is run:
|This changes|
views/#<ActiveRecord::Relation:0x007fbabef9cd58>/...
So you get a different cache key each time.
Besides that, it is not possible to get rid of the database queries completely (your own answer is the best one can do).
Solution:
To generate a valid key, instead of this
cache([@jobs, "index"])
do this:
cache([@jobs.to_a, "index"])
This queries the database and builds an array of the models, from which the cache_key is retrieved.
PS: I could swear using relations worked in previous versions of Rails...
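The difference between the two keys can be sketched in plain Ruby, assuming hypothetical stand-ins (Job, FakeRelation) rather than ActiveRecord classes: interpolating an object that uses the default to_s embeds its memory address, while a key built from the records themselves (roughly what passing @jobs.to_a achieves, though the exact Rails key format differs) stays the same until the records change.

```ruby
# Hypothetical stand-ins (not ActiveRecord classes).
Job = Struct.new(:id, :updated_at)

class FakeRelation
  def initialize(records)
    @records = records
  end

  def to_a
    @records
  end
end

# Mimics the problem: the default to_s embeds the object's address,
# so each new relation object yields a different key.
def unstable_key(relation)
  "views/#{relation}/index"
end

# Mimics the to_a fix: the key is built from each record's id and
# updated_at, so it only changes when the records change.
def stable_key(relation)
  record_keys = relation.to_a.map do |job|
    "jobs/#{job.id}-#{job.updated_at.getutc.strftime('%Y%m%d%H%M%S')}"
  end
  (['views'] + record_keys + ['index']).join('/')
end
```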
We've been doing exactly what you're mentioning in production for about a year. I extracted it into a gem a few months ago:
https://github.com/cmer/scope_cache_key
Basically, it allows you to use a scope as part of your cache key. There are significant performance benefits to doing so, since you can now cache a page containing multiple records in a single cache element rather than looping over each element in the scope and retrieving caches individually. I feel that combining this with the standard "Russian Doll Caching" principles is optimal.
I have had similar problems: I have not been able to pass relations to the cache function successfully, and your @jobs variable is a relation.
I coded up a solution for cache keys that deals with this issue along with some others that I was having. It basically involves generating a cache key by iterating through the relation.
A full write up is on my site here.
http://mark.stratmann.me/content_items/rails-caching-strategy-using-key-based-approach
In summary, I added a get_cache_key method to ActiveRecord::Base:
module CacheKeys
extend ActiveSupport::Concern
# Instance Methods
def get_cache_key(prefix=nil)
cache_key = []
cache_key << prefix if prefix
cache_key << self
self.class.get_cache_key_children.each do |child|
if child.macro == :has_many
self.send(child.name).all.each do |child_record|
cache_key << child_record.get_cache_key
end
end
if child.macro == :belongs_to
cache_key << self.send(child.name).get_cache_key
end
end
return cache_key.flatten
end
# Class Methods
module ClassMethods
def cache_key_children(*args)
@v_cache_key_children = []
# validate the children
args.each do |child|
#is it an association
association = reflect_on_association(child)
if association == nil
raise "#{child} is not an association!"
end
@v_cache_key_children << association
end
end
def get_cache_key_children
return @v_cache_key_children ||= []
end
end
end
# include the extension
ActiveRecord::Base.send(:include, CacheKeys)
I can now create cache fragments by doing
cache(@model.get_cache_key(['textlabel'])) do
I've done something like Hopsoft, but it uses the method in the Rails Guide as a template. I've used the MD5 digest to distinguish between relations (so User.active.cache_key can be differentiated from User.deactivated.cache_key), and used the count and max updated_at to auto-expire the cache on updates to the relation.
require "digest/md5"
module RelationCacheKey
def cache_key
model_identifier = name.underscore.pluralize
relation_identifier = Digest::MD5.hexdigest(to_sql.downcase)
max_updated_at = maximum(:updated_at).try(:utc).try(:to_s, :number)
"#{model_identifier}/#{relation_identifier}-#{count}-#{max_updated_at}"
end
end
ActiveRecord::Relation.send :include, RelationCacheKey
While I marked @mark-stratmann's response as correct, I actually resolved this by simplifying the implementation. I added touch: true to my model relationship declaration:
belongs_to :tenant, touch: true
and then set the cache key based on the tenant (with a required query param as well):
<% cache([@current_tenant, params[:query], "#{@current_tenant.id}-index"]) do %>
That way if a new Job is added, it touches the Tenant cache as well. Not sure if this is the best route, but it works and seems pretty simple.
I'm using this code (MySQL-specific, because of group_concat and date_format):
class ActiveRecord::Base
def self.cache_key
pluck("concat_ws('/', '#{table_name}', group_concat(#{table_name}.id), date_format(max(#{table_name}.updated_at), '%Y%m%d%H%i%s'))").first
end
def self.updated_at
maximum(:updated_at)
end
end
Maybe this can help you out:
https://github.com/casiodk/class_cacher , it generates a cache_key from the Model itself, but maybe you can use some of the principles in the codebase
As a starting point you could try something like this:
def self.cache_key
["#{model_name.cache_key}-all",
"#{count}-#{updated_at.utc.to_s(cache_timestamp_format) rescue 'empty'}"
] * '/'
end
def self.updated_at
maximum :updated_at
end
I have a normalized database where multiple models relate to the same other model: think of clients, locations, etc., all having addresses by means of a street_id.
With this solution you can generate cache_keys based on scope, e.g.
cache [@client, @client.locations] do
# ...
end
cache [@client, @client.locations.active, 'active'] do
# ...
end
and I could simply modify self.updated_at from above to also include associated objects (because has_many does not support "touch", so if I updated the street, the cache would not see it otherwise):
belongs_to :street
def cache_key
[street.cache_key, super] * '/'
end
# ...
def self.updated_at
[maximum(:updated_at),
joins(:street).maximum('streets.updated_at')
].max
end
As long as you don't "undelete" records and use touch in belongs_to, you should be alright with the assumption that a cache key made of count and max updated_at is sufficient.
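A plain-Ruby sketch of the count + max updated_at key and of the "undelete" caveat, with hypothetical names (Record, count_max_key) rather than ActiveRecord APIs: restoring an old record in place of a deleted one leaves both the count and the newest timestamp unchanged, so the key, and therefore the cached fragment, goes stale.

```ruby
# Hypothetical stand-in for a record.
Record = Struct.new(:id, :updated_at)

# Key made of the record count and the newest updated_at, as in the
# answers above (a plain-Ruby illustration, not a Rails API).
def count_max_key(records)
  newest = records.map(&:updated_at).max
  stamp = newest ? newest.getutc.strftime('%Y%m%d%H%M%S') : 'empty'
  "jobs/all-#{records.size}-#{stamp}"
end
```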
I'm using a simple patch on ActiveRecord::Relation to generate cache keys for relations.
require "digest/md5"
module RelationCacheKey
def cache_key
Digest::MD5.hexdigest to_sql.downcase
end
end
ActiveRecord::Relation.send :include, RelationCacheKey