How to join 2 queries in Arel, one being an aggregation of the other (Rails 5.2.4)?

My application monitors ProductionJobs, which are derived from BusinessProcesses in successive versions. The unique key of the ProductionJob class is therefore composed of the business_process_id and version fields.
Initially, the ProductionJob index displayed the list of objects (including all versions) using an Arel structured query (@production_jobs).
But it is more convenient to only show the last version of each ProductionJob. So I created a query (@recent_jobs) to retrieve the last version of the ProductionJob for a given BusinessProcess.
Joining the two queries should return only the last version of each ProductionJob. This is what I can't achieve with my knowledge of Arel, and I would be grateful if you could show me how!
Here is the code in production_jobs_controller:
a) Arel objects setup
private

def jobs
  ProductionJob.arel_table
end

def processes # jobs are built on the processes
  BusinessProcess.arel_table
end

def flows # flows provide a classification of processes
  BusinessFlow.arel_table
end

def owners # owners of the jobs
  User.arel_table.alias('owners')
end

def production_jobs # job index
  jobs.
    join(owners).on(jobs[:owner_id].eq(owners[:id])).
    join(processes).on(jobs[:business_process_id].eq(processes[:id])).
    join(flows).on(processes[:business_flow_id].eq(flows[:id])).
    join_sources
end

def job_index_fields
  [jobs[:id],
   jobs[:code].as("job_code"),
   jobs[:status_id],
   jobs[:created_at],
   jobs[:updated_by],
   jobs[:updated_at],
   jobs[:business_process_id],
   jobs[:version],
   processes[:code].as("process_code"),
   flows[:code].as("statistical_activity_code"),
   owners[:name].as("owner_name")]
end

def order_by
  [jobs[:code], jobs[:updated_at].desc]
end

# Latest jobs
def recent_jobs
  jobs.
    join(owners).on(jobs[:owner_id].eq(owners[:id])).
    join_sources
end

def recent_jobs_fields
  [jobs[:code],
   jobs[:business_process_id].as('bp_id'),
   jobs[:version].maximum.as('max_version')]
end
b) The index method
# GET /production_jobs or /production_jobs.json
def index
  @production_jobs = ProductionJob.joins(production_jobs).
    pgnd(current_playground).
    where("business_flows.code in (?)", current_user.preferred_activities).
    order(order_by).
    select(job_index_fields).
    paginate(page: params[:page], per_page: params[:per_page])
  @recent_jobs = ProductionJob.joins(recent_jobs).select(recent_jobs_fields).group(:business_process_id, :code)
  @selected_jobs = @production_jobs.joins(@recent_jobs).where(business_process_id: :bp_id, version: :max_version)
end
Unfortunately, @selected_jobs returns a nil object, even though @production_jobs and @recent_jobs show linkable results. How should I build the @selected_jobs statement to reach the expected result?
Thanks a lot!

After several trials, I finally included the subquery in a 'where ... in()' clause. This may not be optimal, and I am open to other proposals.
The result can be understood as follows:
@recent_jobs provides the list of ProductionJobs' last versions, based on their code and version
@production_jobs provides the list of all ProductionJobs
@selected_jobs adds the where clause to @production_jobs, based on @recent_jobs.
The last statement is updated to:
@selected_jobs = @production_jobs.
  where("(production_jobs.code,
          production_jobs.business_process_id,
          production_jobs.version)
         in (?)",
        @recent_jobs)
It works this way, but I'd be glad to receive suggestions to enhance this query. Thanks!
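For reference, one common alternative to the "where ... in (?)" form is to join the aggregated subquery directly, so the filtering happens in a single pass. A minimal, untested sketch (latest_versions is a hypothetical name; assumes Postgres and the same variables as above):

# Sketch only: join the grouped subquery instead of using "in (?)".
latest_versions = ProductionJob.
  group(:business_process_id).
  select(:business_process_id, 'max(version) as max_version')

@selected_jobs = @production_jobs.joins(<<~SQL)
  inner join (#{latest_versions.to_sql}) latest
    on latest.business_process_id = production_jobs.business_process_id
   and latest.max_version = production_jobs.version
SQL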

Related

How to extract a week from a date in Arel query with Rails 5.2?

I use Arel to build reusable and structured queries, but reading around, I didn't find a clear and efficient way to extract the calendar week from a date.
Using the Ruby cweek method as a model, I am trying to build the following query (on Postgres):
week_series = Skill.joins(concepts_list).
  select(skills[:created_at].cweek, skills[:id].count.as("count")).
  group(skills[:created_at].cweek)
Here is my base query upon skills:
# Base object
def skills
  Skill.arel_table
end

# Additional tables
def users
  User.arel_table
end

def organisations
  Organisation.arel_table
end

def themes
  Playground.arel_table.alias('themes')
end

def domains
  BusinessArea.arel_table.alias('domains')
end

def collections
  BusinessObject.arel_table.alias('collections')
end

# Queries
def concepts_list
  skills.
    join(users).on(skills[:owner_id].eq(users[:id])).
    join(organisations).on(skills[:organisation_id].eq(organisations[:id])).
    join(collections).on(skills[:business_object_id].eq(collections[:id])).
    join(domains).on(collections[:parent_id].eq(domains[:id]).and(collections[:parent_type].eq('BusinessArea'))).
    join(themes).on(domains[:playground_id].eq(themes[:id])).
    join_sources
end

def concepts_list_output
  [skills[:id], skills[:created_at], users[:user_name], users[:name],
   organisations[:code], themes[:code], domains[:code]]
end
The PostgreSQL equivalent of Ruby's cweek is EXTRACT/DATE_PART (it might differ a bit if you're using a different database). The SQL you're after is:
DATE_PART('week', "skills"."created_at")
That is just a NamedFunction:
Arel::Nodes::NamedFunction.new(
  'DATE_PART',
  [Arel::Nodes.build_quoted('week'), skills[:created_at]]
)
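Plugged back into the original query, that might look like the following (a sketch, assuming the skills and concepts_list helpers shown above):

week = Arel::Nodes::NamedFunction.new(
  'DATE_PART',
  [Arel::Nodes.build_quoted('week'), skills[:created_at]]
)

week_series = Skill.joins(concepts_list).
  select(week.as('week'), skills[:id].count.as('count')).
  group(week)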

ActiveRecord how to use Where only if the parameter you're querying has been passed?

I'm running a query like the below:
Item.where("created_at >=?", Time.parse(params[:created_at])).where(status_id: params[:status_id])
...where the user can decide NOT to provide a parameter, in which case it should be excluded from the query entirely. For example, if the user decides not to pass created_at, I want to run only the following:
Item.where(status_id: params[:status_id])
I was thinking that even with a try statement like Time.try(:parse, params[:created_at]), if params[:created_at] were empty the query would become .where("created_at >= ?", nil), which is NOT the intent at all. The same goes for params[:status_id]: if the user just didn't pass it, you'd have a query that's .where(status_id: nil), which is again inappropriate, because that's a valid query in itself!
I suppose you can write code like this:
if params[:created_at].present?
  @items = Item.where("created_at >= ?", Time.parse(params[:created_at]))
end
if params[:status_id].present?
  @items = @items.where(status_id: params[:status_id])
end
However, this is less efficient with multiple db calls, and I'm trying to be more efficient. Just wondering if possible.
def index
  @products = Product.where(nil) # creates an anonymous scope
  @products = @products.status(params[:status]) if params[:status].present?
  @products = @products.location(params[:location]) if params[:location].present?
  @products = @products.starts_with(params[:starts_with]) if params[:starts_with].present?
end
You can do something like this. Rails is smart enough to build the query only when it actually needs to ;)
You might be interested in checking out this blog post; it was very useful for me and may be for you too.
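For completeness, the index example above assumes matching scopes on Product, along these lines (a hypothetical sketch; the scope bodies are guesses, not part of the original answer):

class Product < ApplicationRecord
  scope :status,      ->(status) { where(status: status) }
  scope :location,    ->(loc)    { where(location: loc) }
  scope :starts_with, ->(prefix) { where('name like ?', "#{prefix}%") }
end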
If you read the #where documentation, you can see the option to pass nil to the where clause:
blank condition:
If the condition is any blank-ish object, then #where is a no-op and returns the current relation.
This gives us the option to pass a condition when it is valid, or to return nil, which simply preserves the previous relation.
@items = Item.where(status_condition).where(created_at_condition)

private

def status_condition
  ['status = ?', params[:status]] unless params[:status].blank?
end

def created_at_condition
  ['created_at >= ?', Time.parse(params[:created_at])] unless params[:created_at].blank?
end
This would be another option to achieve the desired result. Hope this helps!

How to speed up a very frequently made query using raw SQL and without ORM?

I have an API endpoint that accounts for a little less than half of the average response time (on average taking about 514 ms, yikes). The endpoint simply returns some statistics about stored data, scoped to particular time periods such as this week, last week, this month, and so on...
There are a number of ways we could reduce its impact, like getting the clients to hit it less often and with more particular queries, such as only querying for "this week" when only that data is used. Here we focus on what can be done at the database level first. In our current implementation we generate this data for all "time scopes" on the fly, and the number of queries is enormous and made multiple times per second. No caching is used, but maybe there is a way to use Rails's cache_key, or the low-level Rails.cache?
The current implementation looks something like this:
class FooSummaries
  include SummaryStructs

  def self.generate_for(user)
    @user = user
    summaries = Struct::Summaries.new
    TimeScope::TIME_SCOPES.each do |scope|
      foos = user.foos.by_scope(scope.to_sym)
      summary = Struct::Summary.new
      # e.g: summaries.last_week = build_summary(foos)
      summaries.send("#{scope}=", build_summary(summary, foos))
    end
    summaries
  end

  private_class_method

  def self.build_summary(summary, foos)
    summary.all_quuz = @user.foos_count
    summary.all_quux = all_quux(foos)
    summary.quuw = quuw(foos).to_f
    %w[foo bar baz qux].product(
      %w[quux quuz corge]
    ).each do |a, b|
      # e.g: summary.foo_quux = quux(foos, "foo")
      summary.send("#{a.downcase}_#{b}=", send(b, foos, a) || 0)
    end
    summary
  end

  def self.all_quuz(foos)
    foos.count
  end

  def self.all_quux(foos)
    foos.sum(:quux)
  end

  def self.quuw(foos)
    foos.quuwable.total_quuw
  end

  def self.corge(foos, foo_type)
    return if foos.count.zero?
    count = self.quuz(foos, foo_type) || 0
    count.to_f / foos.count
  end

  def self.quux(foos, foo_type)
    case foo_type
    when "foo"
      foos.where(foo: true).sum(:quux)
    when "bar"
      foos.bar.where(foo: false).sum(:quux)
    when "baz"
      foos.baz.where(foo: false).sum(:quux)
    when "qux"
      foos.qux.sum(:quux)
    end
  end

  def self.quuz(foos, foo_type)
    case foo_type
    when "foo"
      foos.where(foo: true).count
    when "bar"
      foos.bar.where(foo: false).count
    when "baz"
      foos.baz.where(foo: false).count
    when "qux"
      foos.qux.count
    end
  end
end
To avoid making changes to the model, or creating migrations to add a table to store this data (both of which may be valid and better solutions), I decided it might be easier to construct one large SQL query that is executed all at once, in the hope that building the query string and executing it in one go would be faster than the overhead of Active Record's setup and teardown of many queries.
The new approach looks something like this; it is horrifying to me and I know there must be a more elegant way:
class FooSummaries
  include SummaryStructs

  def self.generate_for(user)
    results = ActiveRecord::Base.connection.execute(build_query_for(user))
    results.each do |result|
      # build up summary struct from query results
    end
  end

  def self.build_query_for(user)
    TimeScope::TIME_SCOPES.map do |scope|
      time_scope = TimeScope.new(scope)
      %w[foo bar baz qux].map do |foo_type|
        %[
          select
            '#{scope}_#{foo_type}',
            sum(quux) as quux,
            count(*) as quuz,
            round(100.0 * (count(*) / #{user.foos_count.to_f}), 3) as corge
          from
            "foos"
          where
            "foos"."user_id" = #{user.id}
            and "foos"."foo_type" = '#{foo_type.humanize}'
            and "foos"."end_time" between '#{time_scope.from}' AND '#{time_scope.to}'
            and "foos"."foo" = '#{foo_type == 'foo' ? 't' : 'f'}'
          union
        ]
      end
    end.join.reverse.sub("union".reverse, "").reverse
  end
end
The funny way of replacing the last occurrence of union also horrifies me, but it seems to work. There must be a better way, as there are probably many things wrong with the above implementation(s). It may be helpful to note that I use PostgreSQL and have no problem writing queries that are not portable to other DBs. Any advice is truly appreciated!
Thanks for reading!
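For what it's worth, the reverse/sub/reverse trick can be avoided by collecting the SELECT statements without a trailing union and joining them afterwards. A sketch under the same structure (select_sql_for is a hypothetical extraction of the interpolated %[...] string above):

def self.build_query_for(user)
  # Build each per-scope, per-type SELECT, then glue them with "union".
  selects = TimeScope::TIME_SCOPES.flat_map do |scope|
    time_scope = TimeScope.new(scope)
    %w[foo bar baz qux].map do |foo_type|
      select_sql_for(user, scope, time_scope, foo_type)
    end
  end
  selects.join("\nunion\n")
end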
Update: I found a solution that works for me and sped up the endpoint that uses this service object by 500%! Essentially the idea is: instead of building a query string and then executing it for each set of parameters, we create a prepared statement with prepare and then call exec_prepared, passing the parameters in. Since this query is made many times over, this is a useful optimization because, as per the documentation:
A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
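Stripped of the application specifics, the shape of the pg gem API is just this (a minimal sketch; the statement name and SQL are made up for illustration):

# prepare once per connection, then execute many times with new params
conn = ActiveRecord::Base.connection.raw_connection
conn.prepare('user_foo_count', 'select count(*) from foos where user_id = $1')
result = conn.exec_prepared('user_foo_count', [42])
result[0]['count'] # e.g. "17" (pg returns values as strings by default)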
We prepare the query like so:
def prepare_query!
  ActiveRecord::Base.transaction do
    connection.prepare("foos_summary", %[
      with scoped_foos as (
        select
          *
        from
          "foos"
        where
          "foos"."user_id" = $3
          and ("foos"."end_time" between $4 and $5)
      )
      select
        $1::text as scope,
        $2::text as foo_type,
        sum(quux)::float as quux,
        sum(eggs + bacon + ham)::float as food,
        count(*) as count,
        round((sum(quux) / nullif(
          (select
             sum(quux)
           from
             scoped_foos), 0))::numeric,
          5)::float as quuz
      from
        scoped_foos
      where
        (case $6
         when 'Baz'
           then (baz = 't')
         else
           (baz = 'f' and foo_type = $6)
         end
        )
    ])
  end
end
You can see in this query we use a common table expression for more readability and to avoid writing the same select query twice over.
Then we execute the query, passing in the parameters we need:
def connection
  @connection ||= ActiveRecord::Base.connection.raw_connection
end

def query_results
  prepare_query! unless query_already_prepared?
  @results ||= TimeScope::TIME_SCOPES.map do |scope|
    time_scope = TimeScope.new(scope)
    %w[bacon eggs ham spam].map do |foo_type|
      connection.exec_prepared("foos_summary",
                               [scope,
                                foo_type,
                                @user.id,
                                time_scope.from,
                                time_scope.to,
                                foo_type.humanize])
    end
  end
end
Where query_already_prepared? is a simple check in the prepared statements table maintained by postgres:
def query_already_prepared?
  connection.exec(%(select name
                    from pg_prepared_statements
                    where name = 'foos_summary')).count.positive?
end
A nice solution, I thought! Hopefully the technique illustrated here will help others with similar problems.

ActiveRecord has_and_belongs_to_many: find models with all given elements

I'm implementing a search system that uses name, tags, and location. There is a has_and_belongs_to_many relationship between Server and Tag. Here's what my search method currently looks like:
def self.search(params)
  @servers = Server.all
  if params[:name]
    @servers = @servers.where "name ILIKE ?", "%#{params[:name]}%"
  end
  if params[:tags]
    @tags = Tag.find params[:tags].split(",")
    # How do I eliminate servers that do not have these tags?
  end
  # TODO: Eliminate those that do not have the location specified in params.
end
The tags parameter is just a comma-separated list of IDs. My question is stated in a comment in the if params[:tags] conditional block. How can I eliminate servers that do not have the tags specified?
Bonus question: any way to speed this up? All fields are optional, and I am using Postgres exclusively.
EDIT
I found a way to do this, but I have reason to believe it will be extremely slow to run. Is there any way that's faster than what I've done? Perhaps a way to make the database do the work?
tags = Tag.find tokens
servers = servers.reject do |server|
  missing_a_tag = false
  tags.each do |tag|
    if server.tags.find_by_id(tag.id).nil?
      missing_a_tag = true
    end
  end
  missing_a_tag
end
Retrieve the servers with all the given tags with
if params[:tags]
  tags_ids = params[:tags].split(',')
  @tags = Tag.find(tags_ids)
  @servers = @servers.joins(:tags).where(tags: { id: tags_ids }).group('servers.id').having("count(*) = #{tags_ids.count}")
end
The group(...).having(...) part selects the servers with all requested tags. If you're looking for servers which have at least one of the tags, remove it.
With this solution, the search is done in a single SQL request, so it will be better than your solution.
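As a minor hardening of the same query, the count can be bound as a parameter instead of interpolated into the having string (an untested variation on the answer above):

@servers = @servers.joins(:tags).
  where(tags: { id: tags_ids }).
  group('servers.id').
  having('count(*) = ?', tags_ids.count)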

Rails Cache Key generated as ActiveRecord::Relation

I am attempting to generate a fragment cache (using a Dalli/Memcached store); however, the key is being generated with "#" as part of it, so Rails doesn't seem to recognize that there is a cache value and hits the database anyway.
My cache key in the view looks like this:
cache([@jobs, "index"]) do
The controller has:
@jobs = @current_tenant.active_jobs
With the actual Active Record query like this:
def active_jobs
  self.jobs.
    where("published = ? and expiration_date >= ?", true, Date.today).
    order("(featured and created_at > now() - interval '" + self.pinned_time_limit.to_s + " days') desc nulls last, created_at desc")
end
Looking at the Rails server log, I see the cache read, but the SQL query still runs:
Cache read: views/#<ActiveRecord::Relation:0x007fbabef9cd58>/1-index
Read fragment views/#<ActiveRecord::Relation:0x007fbabef9cd58>/1-index (1.0ms)
(0.6ms) SELECT COUNT(*) FROM "jobs" WHERE "jobs"."tenant_id" = 1 AND (published = 't' and expiration_date >= '2013-03-03')
Job Load (1.2ms) SELECT "jobs".* FROM "jobs" WHERE "jobs"."tenant_id" = 1 AND (published = 't' and expiration_date >= '2013-03-03') ORDER BY (featured and created_at > now() - interval '7 days') desc nulls last, created_at desc
Any ideas as to what I might be doing wrong? I'm sure it has to do with the key generation and ActiveRecord::Relation, but I'm not sure how.
Background:
The problem is that the string representation of the relation is different each time your code is run, because the object address in
views/#<ActiveRecord::Relation:0x007fbabef9cd58>/...
changes. So you get a different cache key each time.
Besides that, it is not possible to get rid of database queries completely. (Your own answer is the best one can do.)
Solution:
To generate a valid key, instead of this:
cache([@jobs, "index"])
do this:
cache([@jobs.to_a, "index"])
This queries the database and builds an array of the models, from which the cache_key is retrieved.
PS: I could swear using relations worked in previous versions of Rails...
We've been doing exactly what you're mentioning in production for about a year. I extracted it into a gem a few months ago:
https://github.com/cmer/scope_cache_key
Basically, it allows you to use a scope as part of your cache key. There are significant performance benefits to doing so, since you can now cache a page containing multiple records in a single cache element, rather than looping over each element in the scope and retrieving caches individually. I feel that combining this with the standard "Russian Doll Caching" principles is optimal.
I have had similar problems; I have not been able to successfully pass relations to the cache function, and your @jobs variable is a relation.
I coded up a solution for cache keys that deals with this issue along with some others that I was having. It basically involves generating a cache key by iterating through the relation.
A full write-up is on my site here:
http://mark.stratmann.me/content_items/rails-caching-strategy-using-key-based-approach
In summary, I added a get_cache_key function to ActiveRecord::Base:
module CacheKeys
  extend ActiveSupport::Concern

  # Instance Methods
  def get_cache_key(prefix = nil)
    cache_key = []
    cache_key << prefix if prefix
    cache_key << self
    self.class.get_cache_key_children.each do |child|
      if child.macro == :has_many
        self.send(child.name).all.each do |child_record|
          cache_key << child_record.get_cache_key
        end
      end
      if child.macro == :belongs_to
        cache_key << self.send(child.name).get_cache_key
      end
    end
    return cache_key.flatten
  end

  # Class Methods
  module ClassMethods
    def cache_key_children(*args)
      @v_cache_key_children = []
      # validate the children
      args.each do |child|
        # is it an association?
        association = reflect_on_association(child)
        if association == nil
          raise "#{child} is not an association!"
        end
        @v_cache_key_children << association
      end
    end

    def get_cache_key_children
      return @v_cache_key_children ||= []
    end
  end
end

# include the extension
ActiveRecord::Base.send(:include, CacheKeys)
I can now create cache fragments by doing:
cache(@model.get_cache_key(['textlabel'])) do
I've done something like Hopsoft, but it uses the method in the Rails Guide as a template. I've used the MD5 digest to distinguish between relations (so User.active.cache_key can be differentiated from User.deactivated.cache_key), and used the count and max updated_at to auto-expire the cache on updates to the relation.
require "digest/md5"
module RelationCacheKey
def cache_key
model_identifier = name.underscore.pluralize
relation_identifier = Digest::MD5.hexdigest(to_sql.downcase)
max_updated_at = maximum(:updated_at).try(:utc).try(:to_s, :number)
"#{model_identifier}/#{relation_identifier}-#{count}-#{max_updated_at}"
end
end
ActiveRecord::Relation.send :include, RelationCacheKey
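With that patch in place, any relation gains a stable key (hypothetical usage, assuming a User.active scope exists):

User.active.cache_key
# => something like "users/<md5 of sql>-<count>-<max updated_at>"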
While I marked @mark-stratmann's response as correct, I actually resolved this by simplifying the implementation. I added touch: true to my model relationship declaration:
belongs_to :tenant, touch: true
and then set the cache key based on the tenant (with a required query param as well):
<% cache([@current_tenant, params[:query], "#{@current_tenant.id}-index"]) do %>
That way if a new Job is added, it touches the Tenant cache as well. Not sure if this is the best route, but it works and seems pretty simple.
I'm using this code (note that concat_ws, group_concat and date_format are MySQL functions, so this snippet won't run as-is on Postgres):
class ActiveRecord::Base
  def self.cache_key
    pluck("concat_ws('/', '#{table_name}', group_concat(#{table_name}.id), date_format(max(#{table_name}.updated_at), '%Y%m%d%H%i%s'))").first
  end

  def self.updated_at
    maximum(:updated_at)
  end
end
Maybe this can help you out: https://github.com/casiodk/class_cacher. It generates a cache_key from the model itself, but maybe you can use some of the principles in the codebase.
As a starting point you could try something like this:
def self.cache_key
  ["#{model_name.cache_key}-all",
   "#{count}-#{updated_at.utc.to_s(cache_timestamp_format) rescue 'empty'}"
  ] * '/'
end

def self.updated_at
  maximum :updated_at
end
I have a normalized database where multiple models relate to the same other model; think of clients, locations, etc., all having addresses by means of a street_id.
With this solution you can generate cache_keys based on scope, e.g.
cache [@client, @client.locations] do
  # ...
end

cache [@client, @client.locations.active, 'active'] do
  # ...
end
and I could simply modify self.updated_at from above to also include associated objects (because has_many does not support touch, so if I updated the street, the change would not otherwise be seen by the cache):
belongs_to :street

def cache_key
  [street.cache_key, super] * '/'
end

# ...

def self.updated_at
  [maximum(:updated_at),
   joins(:street).maximum('streets.updated_at')
  ].max
end
As long as you don't "undelete" records and use touch in belongs_to, you should be alright with the assumption that a cache key made of count and max updated_at is sufficient.
I'm using a simple patch on ActiveRecord::Relation to generate cache keys for relations.
require "digest/md5"
module RelationCacheKey
def cache_key
Digest::MD5.hexdigest to_sql.downcase
end
end
ActiveRecord::Relation.send :include, RelationCacheKey
