Parameterize an ActiveRecord #joins method - ruby-on-rails

I am refactoring a fairly complex query that chains multiple .joins calls together. In one of these joins I am using raw SQL with string interpolation, i.e. joining on foo.id = #{id}. I know I can parameterize ActiveRecord #where by using the ? placeholder and passing the arguments as extra parameters, but the joins method does not support multiple arguments in this fashion. For example:
Using:
Post.my_scope_name.joins("LEFT JOIN posts ON posts.id = images.post_id and posts.id = ?", "1") in order to pass in an id of 1 produces an ActiveRecord::StatementInvalid
because the generated SQL looks like this:
"LEFT JOIN posts ON posts.id = images.post_id and posts.id = ? 1"
What is the standard approach to parameterizing queries when using the joins method?

Arel ("A Relational Algebra") is the query assembler underlying Rails. It can be used to construct queries, conditions, joins, CTEs, etc. that Rails' high-level query interface does not support. Since this library is an integral part of Rails, most Rails query methods accept Arel objects directly without issue (to be honest, most methods convert your arguments into one of these objects anyway).
In your case you can construct the join you want as follows:
posts_table = Post.arel_table
images_table = Image.arel_table
some_id = 1
post_join = Arel::Nodes::OuterJoin.new(
posts_table,
Arel::Nodes::On.new(
posts_table[:id].eq(images_table[:post_id])
.and(posts_table[:id].eq(some_id))
)
)
SQL produced:
post_join.to_sql
#=> "LEFT OUTER JOIN [posts] ON [posts].[id] = [images].[post_id] AND [posts].[id] = 1"
Then you just add this join to your current query
Image.joins(post_join)
#=> SELECT images.* FROM images LEFT OUTER JOIN [posts] ON [posts].[id] = [images].[post_id] AND [posts].[id] = 1
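If you would rather keep the string join, another option is to bind the value yourself before handing the string to joins. As the generated SQL in the question shows, joins treats the extra argument as just another join fragment, not as a bind value. ActiveRecord ships sanitize_sql_array for this purpose (a public class method in recent Rails versions), e.g. Post.joins(Post.sanitize_sql_array([sql, 1])). The plain-Ruby sketch below only imitates that idea. It uses a hypothetical bind_placeholders helper with deliberately naive quoting so it runs without Rails, and it is not how Rails implements sanitization.

```ruby
# Hypothetical helper for illustration only; this is NOT how Rails implements
# sanitize_sql_array. It swaps each "?" for a quoted literal, in order.
# Limitation: a "?" inside a quoted string in the SQL would also be replaced.
def bind_placeholders(sql, *values)
  remaining = values.dup
  bound = sql.gsub("?") do
    raise ArgumentError, "not enough bind values" if remaining.empty?
    quote_literal(remaining.shift)
  end
  raise ArgumentError, "too many bind values" unless remaining.empty?
  bound
end

# Deliberately naive quoting: numbers pass through, nil becomes NULL, and
# anything else is single-quoted with embedded single quotes doubled.
def quote_literal(value)
  case value
  when Integer, Float then value.to_s
  when nil then "NULL"
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

join_sql = bind_placeholders(
  "LEFT JOIN posts ON posts.id = images.post_id AND posts.id = ?", 1
)
puts join_sql
# => LEFT JOIN posts ON posts.id = images.post_id AND posts.id = 1
```

The key design point is that the value is turned into a quoted SQL literal before the string reaches joins. Interpolating #{id} directly skips that step.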

Related

Rails: How to remove n+1 query when we need to query association inside loop?

I have code that loads a result set and then runs more queries for each record (only a basic version is shown here).
Basically, I need the sum of the custom line items as well as the sum of all line items for each order:
results = Order.includes(:customer, :line_items).where('completed_at IS NOT NULL')
results.each do |result|
custom_items_sum = result.line_items.where(line_item_type: 'custom').sum(:amount)
total_sum = result.line_items.sum(:amount)
end
This code has an n+1 query issue. I have tried adding includes, but that is not going to work because there is another query inside the loop. Any help will be appreciated!
If you don't want to trigger other queries in the loop, you need to avoid methods that work on relations and use the ones that work on (already loaded) collections. Try:
custom_items_sum = result.line_items.
select { |line_item| line_item.line_item_type == 'custom' }.
sum(&:amount)
This should work without n+1 queries.
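The difference is that select with a block and sum(&:amount) here are Enumerable methods. They run over records that includes(:line_items) has already loaded. A plain-Ruby sketch follows, with a hypothetical LineItem struct standing in for the model and an array standing in for the preloaded association:

```ruby
# Hypothetical stand-in for the LineItem model (assumption for the sketch).
LineItem = Struct.new(:line_item_type, :amount)

# Stand-in for result.line_items after includes(:line_items): the records
# are already in memory, so Enumerable methods need no further query.
line_items = [
  LineItem.new("custom", 10),
  LineItem.new("standard", 25),
  LineItem.new("custom", 5)
]

# Filter and sum in Ruby instead of calling .where(...).sum(:amount).
custom_items_sum = line_items
  .select { |line_item| line_item.line_item_type == "custom" }
  .sum(&:amount)
total_sum = line_items.sum(&:amount)

puts custom_items_sum # => 15
puts total_sum        # => 40
```

The same in-memory approach applies to the total. Summing the loaded records avoids calling sum(:amount) on the association, which typically issues a SQL query of its own.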
Note that it's possible to write just one query and avoid this computation anyway but that's beyond the scope of your question :)
Rails was never known for being a particularly robust ORM. Use plain SQL instead:
results =
Order.connection.execute <<-SQL
SELECT orders.id,
SUM(line_items.amount) AS total_sum,
SUM(CASE WHEN line_items.line_item_type = 'custom' THEN line_items.amount ELSE 0 END) AS custom_items_sum
FROM orders
JOIN line_items
ON (line_items.order_id = orders.id)
WHERE orders.completed_at IS NOT NULL
GROUP BY orders.id
SQL
That way you get both sums for every order in a single query, which is much faster than performing all the calculations in Ruby.
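A GROUP BY over the joined rows can compute both per-order sums at once. The custom sum needs conditional aggregation (SUM(CASE WHEN ... THEN amount ELSE 0 END)) rather than a HAVING filter, because HAVING can only test grouped or aggregated values. The plain-Ruby sketch below mirrors what such a query computes. It uses hypothetical sample rows shaped like the output of the orders/line_items join.

```ruby
# Hypothetical rows as the orders/line_items JOIN would return them.
JoinedRow = Struct.new(:order_id, :line_item_type, :amount)

rows = [
  JoinedRow.new(1, "custom", 10),
  JoinedRow.new(1, "standard", 25),
  JoinedRow.new(2, "standard", 7)
]

# GROUP BY order_id with
#   SUM(amount)                                              -> total_sum
#   SUM(CASE WHEN type = 'custom' THEN amount ELSE 0 END)    -> custom_items_sum
sums = rows.group_by(&:order_id).transform_values do |items|
  {
    total_sum: items.sum(&:amount),
    custom_items_sum: items.sum { |i| i.line_item_type == "custom" ? i.amount : 0 }
  }
end

p sums
```

Orders without custom items still appear, with a custom sum of 0. A HAVING-based filter would instead drop or miscount them.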
Just because @AlekseiMatiushkin says to write it in raw SQL, let's do the same with Rails (via Arel):
order_table = Order.arel_table
line_items_table = LineItem.arel_table
custom_items = Arel::Table.new(:custom_items)
Order.select(
order_table[Arel.star],
line_items_table[:amount].sum.as('total_sum'),
custom_items[:amount].sum.as('custom_items_sum')
).joins(
order_table.join(line_items_table).on(
line_items_table[:order_id].eq(order_table[:id])
).join(
Arel::Nodes::As.new(line_items_table,:custom_items),
Arel::Nodes::OuterJoin
).on(
custom_items[:order_id].eq(order_table[:id]).and(
custom_items[:line_item_type].eq('custom')
)
).join_sources
).where(
order_table[:completed_at].not_eq(nil)
).group(:id)
This will produce an ActiveRecord::Relation of Order objects with the virtual attributes total_sum and custom_items_sum, using the following query:
SELECT
orders.*,
SUM(line_items.amount) AS total_sum,
SUM(custom_items.amount) As custom_items_sum
FROM
orders
INNER JOIN line_items ON line_items.order_id = orders.id
LEFT OUTER JOIN line_items AS custom_items ON custom_items.order_id = orders.id
AND custom_items.line_item_type = 'custom'
WHERE
orders.completed_at IS NOT NULL
GROUP BY
orders.id
This should handle the request in a single query by using 2 joins to aggregate the needed data.
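One thing to watch with this approach: line_items is joined twice, so each order's rows are multiplied before SUM runs. Every line item row is paired with every matching custom item, which can inflate both totals. The plain-Ruby simulation below reproduces that row multiplication for a single order, using hypothetical sample data. A single join with conditional aggregation (SUM(CASE ...)), or separately pre-aggregated subqueries, avoids the problem.

```ruby
# Hypothetical line items for ONE order: [line_item_type, amount]
items = [["custom", 10], ["standard", 25], ["custom", 5]]
custom = items.select { |type, _| type == "custom" }

# INNER JOIN line_items: one row per line item.
# LEFT OUTER JOIN line_items AS custom_items ON same order AND type = 'custom':
# each of those rows is paired with EVERY custom item of the order.
joined_rows = items.flat_map do |item|
  matches = custom.empty? ? [nil] : custom # LEFT JOIN keeps unmatched rows
  matches.map { |custom_item| [item, custom_item] }
end

# What SUM(line_items.amount) and SUM(custom_items.amount) see
# (nil treated as 0 here; SQL's SUM simply skips NULLs).
total_sum = joined_rows.sum { |item, _| item[1] }
custom_items_sum = joined_rows.sum { |_, custom_item| custom_item ? custom_item[1] : 0 }

# The figures actually wanted, computed without the second join.
expected_total = items.sum { |_, amount| amount }
expected_custom = custom.sum { |_, amount| amount }

puts "joined: total=#{total_sum} custom=#{custom_items_sum}"
puts "actual: total=#{expected_total} custom=#{expected_custom}"
```

With two custom items out of three, the joined sums come out as 80 and 45 instead of 40 and 15.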
Try using a scoping block. The following code generates very clean SQL queries.
Order.includes(:line_items).where.not(completed_at: nil).scoping do
@custom_items_sum = Order.where(line_items: { line_item_type: 'custom' })
.sum('line_items.amount')
@total_sum = Order.sum('line_items.amount')
end
There isn't much documentation about the scoping block, but it scopes your model to the query conditions defined before it (here: where.not(completed_at: nil), with :line_items included).
Hope this helps! :)

How can I combine COUNT(*) for two different ActiveRecord relations into a single SQL query?

With two different ActiveRecord relation objects, is there a way to issue one SQL query to compare the counts of the relations?
E.g., say I have two ActiveRecord::Relation objects like this:
posts = Post.where().where().some_scope
users = User.where().some_other_scope.where().joins(:something)
To compare the counts of each relation, I'd have to issue two SQL queries.
posts.count == users.count
# => SELECT COUNT(*) FROM posts WHERE... ;
# => SELECT COUNT(*) FROM users INNER JOIN ... WHERE... ;
I want to be able to issue just one query. Something like:
Post.select("COUNT(first) == COUNT(second) as are_equal", posts, users).are_equal
It is not possible to combine two counts over two different tables into one query unless you use a UNION (or scalar subqueries), which still runs the two counts separately and merges the results. This takes about the same time as running the two queries separately. The only difference is that you go to the database server once (one query), but you lose readability. So IMHO I really wonder whether that is worth it.
For example, in one case you can write:
if posts.count == users.count
In the other case one would write:
count_sql = <<-SQL
select 'Posts' as count_type, count(*) from posts where ...
union
select 'Users' as count_type, count(*) from users where ...
SQL
result = Post.connection.execute(count_sql)
if result[0]["count"] == result[1]["count"]
You will have to decide whether the performance improvement outweighs the loss of readability.
This isn't possible with ActiveRecord query methods, but the underlying Arel query builder (which ActiveRecord uses internally) can achieve it; it just looks a bit less elegant:
posts = Post.where().where().some_scope
users = User.where().some_other_scope.where().joins(:something)
posts_table = Post.arel_table
users_table = User.arel_table
posts_count = Arel::Nodes::Count.new([posts_table[:id]]).as('count')
users_count = Arel::Nodes::Count.new([users_table[:id]]).as('count')
# UNION ALL keeps two equal counts as two rows (a plain UNION would collapse them)
union = posts.select(posts_count).arel.union(:all, users.select(users_count).arel)
post_count, user_count = Post.from(posts_table.create_table_alias(union, :posts)).map(&:count)
Although it may not actually be beneficial in this case (as discussed in other answers), it's worth being aware of Arel because there are times where it is useful - I always try to avoid raw SQL in my Rails applications and Arel makes that possible.
An excellent introduction can be found here: https://danshultz.github.io/talks/mastering_activerecord_arel/#/
You can always write your own SQL query.
Let's say you have two models, AdminUser and Company. One way of doing what you want would be the following:
ActiveRecord::Base.connection.execute("SELECT COUNT(*) as nb from admin_users UNION ALL SELECT COUNT(*) as nb from companies;").to_a
You'll end up with an array of two hashes, each containing the number of records of each database table.
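Whichever variant you use, note that a plain UNION removes duplicate rows. If both counts happen to be equal (the very case being tested), the result has one row instead of two. UNION ALL keeps both. Ruby's Array#| and Array#+ behave the same way, which makes for a quick illustration with hypothetical counts:

```ruby
# Each "query" returns a one-row result holding its COUNT(*) (hypothetical values).
posts_count_rows = [[42]]
users_count_rows = [[42]]

# UNION de-duplicates rows, like Array#| ...
union_rows = posts_count_rows | users_count_rows
# ... while UNION ALL keeps every row, like Array#+.
union_all_rows = posts_count_rows + users_count_rows

p union_rows     # => [[42]]
p union_all_rows # => [[42], [42]]

# Destructuring the two counts only works reliably with the UNION ALL result.
post_count, user_count = union_all_rows.flatten
lost_post_count, lost_user_count = union_rows.flatten # second value is nil
```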

Avoiding duplicate joins in Rails ActiveRecord query

I have a scenario where I have SQL joins somewhere in the query chain, and then at some further point I need to append a condition which needs the same join, but I don't know at this point whether that join exists already in the scope. For example:
@foo = Foo.joins("INNER JOIN foos_bars ON foos_bars.foo_id = foos.id")
....
@foo.joins(:bars).where(bars: { id: 1 })
This will produce an SQL error about duplicate table/alias names.
I write the SQL join manually in the first place to improve efficiency: the classic Rails (Arel) join produces two INNER JOINs, whereas in my case I only need one.
Is there a recommended way around this? Some way to inspect the joins currently in a scope for example.
Response to comment:
With a has_and_belongs_to_many relationship Rails produces two INNER JOINS like this:
SELECT "journals".* FROM "journals"
INNER JOIN "categories_journals"
ON "categories_journals"."journal_id" = "journals"."id"
INNER JOIN "categories"
ON "categories"."id" = "categories_journals"."category_id"
WHERE "categories"."id" = 1
Whereas I believe I can do this instead:
SELECT "journals".* FROM "journals"
INNER JOIN "categories_journals"
ON "categories_journals"."journal_id" = "journals"."id"
WHERE "categories_journals"."category_id" = 1
Correct me if I'm wrong.
The solution was to use string joins everywhere. Unbeknownst to me, Rails actually de-duplicates (uniqs) string joins, so as long as they're string-identical this problem doesn't occur.
This article put me on the scent, the author exhibits the exact same problem as me and patches Rails, and it looks like the patch was implemented a long time ago. I don't think it's perfect though. There should be a way for rails to handle hash parameter joins and string joins and not bomb out when they overlap. Might see if I can patch that ..
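To inspect the scope, ActiveRecord relations expose the joins added so far through joins_values. The sketch below is a hypothetical, Rails-free JoinList class that mimics the behaviour described above. Identical join strings collapse, while strings that differ even by whitespace do not. Keeping each join fragment in one shared constant is therefore a simple way to guarantee the strings stay identical.

```ruby
# Hypothetical stand-in for a relation's join list; NOT ActiveRecord code.
class JoinList
  attr_reader :joins

  def initialize(joins = [])
    @joins = joins.freeze
  end

  # Returns a new list, the way each .joins call returns a new relation.
  def add(join_sql)
    JoinList.new(@joins + [join_sql])
  end

  # Exact string comparison: only identical fragments are collapsed.
  def to_sql
    @joins.uniq.join(" ")
  end

  def include_join?(join_sql)
    @joins.include?(join_sql)
  end
end

# One shared constant keeps every use of this join string-identical.
FOOS_BARS_JOIN = "INNER JOIN foos_bars ON foos_bars.foo_id = foos.id"

list = JoinList.new.add(FOOS_BARS_JOIN).add(FOOS_BARS_JOIN)
puts list.to_sql # => INNER JOIN foos_bars ON foos_bars.foo_id = foos.id

# A near-duplicate (extra space) is NOT collapsed and would clash in SQL.
spaced = list.add("INNER JOIN foos_bars  ON foos_bars.foo_id = foos.id")
puts spaced.to_sql.scan("INNER JOIN").size # => 2
```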
EDIT:
I did a couple of benchmarks to see if I was really worrying about nothing or not (between the two ways of joining):
1.9.3p194 :008 > time = Benchmark.realtime { 1000.times { a = Incident.joins("INNER JOIN categories_incidents ON categories_incidents.incident_id = incidents.id").where("categories_incidents.category_id = 1") } }
=> 0.042458
1.9.3p194 :009 > time = Benchmark.realtime { 1000.times { a = Incident.joins(:categories).where(categories: { id: 1 }) } }
=> 0.152703
I'm not a regular benchmarker so my benchmarks may not be perfect but it looks to me as though my more efficient way does make real world performance improvements over large queries or lots of queries.
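One caveat about these numbers: ActiveRecord relations are lazy, so joins(...).where(...) inside the benchmark only builds a relation and never sends SQL to the database. The timings therefore measure query construction, not query execution. Calling .load (or .to_a) inside the block would include the database work. The sketch below shows the difference with Benchmark.realtime. It uses a hypothetical FakeRelation class, with sleep standing in for database time.

```ruby
require "benchmark"

# Hypothetical stand-in for a lazy relation: #where only records a condition,
# and only #load pretends to run a query against the database.
class FakeRelation
  @executions = 0
  class << self
    attr_accessor :executions
  end

  def initialize(conditions = [])
    @conditions = conditions
  end

  def where(condition)
    FakeRelation.new(@conditions + [condition]) # no query is run here
  end

  def load
    FakeRelation.executions += 1
    sleep 0.001 # stand-in for a database round trip
    self
  end
end

build_only = Benchmark.realtime { 100.times { FakeRelation.new.where("x = 1") } }
executions_after_build = FakeRelation.executions

with_load = Benchmark.realtime { 100.times { FakeRelation.new.where("x = 1").load } }

puts "build only: #{build_only.round(4)}s, queries run: #{executions_after_build}"
puts "build+load: #{with_load.round(4)}s, queries run: #{FakeRelation.executions}"
```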
The downside of joining in the way I have done is that if a Category didn't exist but was still recorded in the join table then that might cause a few problems that would otherwise be avoidable with the more thorough join.

Rails inner join two sql statements to get ActiveRecord::Relation

I have a complex query that I want to join with itself to do additional computations. In SQL I can do:
SELECT t1.*, t2.*
FROM (SQL) AS t1
INNER JOIN
(SQL) AS t2
ON t1.num = t2.num - 1
Here, SQL stands for the query I want to join.
How can I do that in Rails with ActiveRecord / Arel (or something else), so that I get an ActiveRecord::Relation and can still use where() on it?
If I do it with SQL and execute/select_all, I get a PG::Result or a hash and can't use the where method anymore.
The first solution is interpolating the other query using the to_sql method:
Place.select("places.*, communes.*").joins("INNER JOIN (#{
Commune.where(:id => [1,2,3]).to_sql
}) as communes ON communes.id = places.commune_id")
The second solution is using ActiveRecord's merge method:
Place.select("places.*, communes.*")
.joins(:commune)
.merge(Commune.where(:id => [1,2,3]))
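A third option, closer to the SQL in the question, is to put the subquery in the FROM clause with ActiveRecord's from. It accepts a SQL string and still returns a relation. The hypothetical self_join_sql helper below only assembles the two fragments in plain Ruby; inner_sql stands for the relation's to_sql output, and the table and column names are placeholders.

```ruby
# Hypothetical helper: wraps the same subquery twice and joins it to itself.
# inner_sql would come from some_relation.to_sql in Rails.
def self_join_sql(inner_sql, left: "t1", right: "t2", on:)
  {
    from: "(#{inner_sql}) AS #{left}",
    joins: "INNER JOIN (#{inner_sql}) AS #{right} ON #{on}"
  }
end

inner = "SELECT id, num FROM measurements" # hypothetical subquery
parts = self_join_sql(inner, on: "t1.num = t2.num - 1")

puts parts[:from]
puts parts[:joins]
```

In Rails this would be used along the lines of Measurement.from(parts[:from]).joins(parts[:joins]).select("t1.*, t2.num AS next_num"), where Measurement is a hypothetical model. Conditions written as strings, such as where("t1.num > ?", 5), refer to the aliases directly.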

ActiveRecord Custom Query vs find_by_sql loading

I have a custom query that looks like this:
self.account.websites.find(:all,:joins => [:group_websites => {:group => :users}],:conditions=>["users.id =?",self])
where self is a User object.
I managed to generate the equivalent SQL for it.
Here is how it looks:
sql = "select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = #{account_id} AND (users.id = #{user_id}))"
With a decent understanding of SQL and ActiveRecord, I assumed (as most would) that the custom query above would take longer than find_by_sql(sql).
But surprisingly,
when I ran the two,
I found the ActiveRecord custom query beating find_by_sql in terms of load time.
Here are the test results:
ActiveRecord Custom Query load time
Website Load (0.9ms)
Website Columns(1.0ms)
find_by_sql load time
Website Load (1.3ms)
Website Columns(1.0ms)
I repeated the test again and again, and the result still came out the same (with the custom query winning the battle).
I know the difference isn't that big, but I just can't figure out why a plain find_by_sql query is slower than the custom query.
Can anyone shed some light on this?
Thanks anyway
Regards
Viren Negi
With the find case, the query is parameterized; this means the database can cache the query plan and will not need to parse and compile the query again.
With the find_by_sql case the entire query is passed to the database as a string. This means there is no caching that the database can do on the structure of the query, and it needs to be parsed and compiled on each occasion.
I think you can test this: try find_by_sql in this way (parameterized):
Website.find_by_sql(["select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = ? AND (users.id = ?))", account_id, user_id])
Well, the reason is probably quite simple: with custom SQL, the query is sent to the database server for execution immediately.
Remember that Ruby is an interpreted language, so Rails has to generate a new SQL query from the ORM calls you used before it can be sent to the database server for execution. I would say the additional 0.1 ms is the time the framework takes to generate the query.
