Ruby on Rails ActiveRecord efficiency - ruby-on-rails

This code should update the entire table by applying a filter to its "name" values:
entries = select('id, name').all
entries.each do |entry|
  puts entry.id
  update(entry.id, { :name => sanitize(entry.name) })
end
I am pretty new to Ruby on Rails and found it interesting that my selection query is split into single-row selections:
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 1) LIMIT 1
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 2) LIMIT 1
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 3) LIMIT 1
...
As I understand it, this is a kind of optimization provided by Rails: it selects a row only when it's needed (on every iteration) rather than all entries at once.
However, is it really more efficient in this case? I mean, if I have 1000 records in my database table, is it better to make 1000 queries than a single one? If not, how can I force Rails to select more than one row per query?
Another question: not all rows are updated by this code. Does Rails skip the update query if the provided values are the same as the ones that already exist (in other words, if entry.name == sanitize(entry.name))?

ActiveRecord is an abstraction layer, but when doing certain operations (especially those involving large datasets) it is useful to know what is happening underneath the abstraction layer.
This is pretty much true for all abstractions. (see Joel Spolsky's classic article on leaky abstractions: http://www.joelonsoftware.com/articles/LeakyAbstractions.html )
To deal with the case in point here, Rails provides the update_all method

Entry.find_each do |entry|
#...
end
That fetches all entries in batches (1000 per query by default) and exposes each entry for your pleasure.
If attributes are not changed, Rails will not perform an UPDATE query.
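
To make the two points above concrete, here is a plain-Ruby simulation with no database involved. It only counts how many SELECTs and UPDATEs a batched loop would issue. The Row struct, the sanitize body, and the batch_size value are all assumptions made up for the sketch; they are not the asker's code and not ActiveRecord itself.

```ruby
# Plain-Ruby simulation of batched loading plus "skip unchanged rows".
# Nothing here talks to a database; it only counts would-be queries.
Row = Struct.new(:id, :name)

# Hypothetical stand-in for the question's sanitize method.
def sanitize(name)
  name.strip.squeeze(" ")
end

def sanitize_in_batches(rows, batch_size: 1000)
  stats = { selects: 0, updates: 0 }
  rows.each_slice(batch_size) do |batch|
    stats[:selects] += 1             # one SELECT would load the whole batch
    batch.each do |row|
      cleaned = sanitize(row.name)
      next if cleaned == row.name    # unchanged value: no UPDATE is needed
      row.name = cleaned
      stats[:updates] += 1           # one UPDATE per changed row
    end
  end
  stats
end

rows = [Row.new(1, "  foo "), Row.new(2, "bar"), Row.new(3, "a  b")]
stats = sanitize_in_batches(rows, batch_size: 2)
puts stats.inspect
```

With three rows and a batch size of two, the loop needs two loads instead of three, and the row whose name is already clean triggers no update.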

Related

How do I write a Rails finder method where none of the has_many items has a non-nil field?

I'm using Rails 5. I have the following model ...
class Order < ApplicationRecord
...
has_many :line_items, :dependent => :destroy
The LineItem model has an attribute, "discount_applied." I would like to return all orders where there are zero instances of a line item having the "discount_applied" field being not nil. How do I write such a finder method?
First of all, this really depends on whether or not you want to use a pure Arel approach or if using SQL is fine. The former is IMO only advisable if you intend to build a library but unnecessary if you're building an app where, in reality, it's highly unlikely that you're changing your DBMS along the way (and if you do, changing a handful of manual queries will probably be the least of your troubles).
Assuming using SQL is fine, the simplest solution that should work across pretty much all databases is this:
Order.where("(SELECT COUNT(*) FROM line_items WHERE line_items.order_id = orders.id AND line_items.discount_applied IS NOT NULL) = 0")
This should also work pretty much everywhere (and has a bit more Arel and less manual SQL):
Order.left_joins(:line_items).group("orders.id").having("COUNT(line_items.discount_applied) = 0")
Depending on your specific DBMS (more specifically: its respective query optimizer), one or the other might be more performant.
Hope that helps.
Not efficient but I thought it may solve your problem:
orders = Order.includes(:line_items).select do |order|
order.line_items.all? { |line_item| line_item.discount_applied.nil? }
end
Update:
Instead of finding orders all of whose line items have no discount, we can exclude from the result all orders that have a line item with a discount applied. This can be done with a subquery inside the where clause:
# Find all ids of orders which have line items with a discount applied:
excluded_ids = LineItem.select(:order_id)
.where.not(discount_applied: nil)
.distinct.map(&:order_id)
# exclude those ids from all orders:
Order.where.not(id: excluded_ids)
You can combine them in a single finder method:
Order.where.not(id: LineItem
.select(:order_id)
.where.not(discount_applied: nil))
Hope this helps
A possible code
Order.includes(:line_items).where.not(line_items: {discount_applied: nil})
I advise getting familiar with the AR documentation for Query Methods.
Update
This seems to be more interesting than I initially thought. And more complicated, so I will not be able to give you working code. But I would look into a solution using LineItem.group(order_id).having(discount_applied: nil), which should give you a collection of line_items, and then use it as a sub-query to find the related orders.
If you want all the records where discount_applied is not nil then:
Order.includes(:line_items).where.not(line_items: {discount_applied: nil})
(use includes to avoid n+1 problem)
or
Order.joins(:line_items).where.not(line_items: {discount_applied: nil})
Here is the solution to your problem
order_ids = Order.joins(:line_items).where.not(line_items: {discount_applied: nil}).pluck(:id)
orders = Order.where.not(id: order_ids)
The first query returns the ids of orders with at least one line_item that has discount_applied set. The second query returns all orders where there are zero instances of a line_item having discount_applied.
I would use the NOT EXISTS feature from SQL, which is at least available in both MySQL and PostgreSQL
it should look like this
class Order
  has_many :line_items
  scope :without_discounts, -> {
    where("NOT EXISTS (SELECT 1 FROM line_items WHERE line_items.order_id = orders.id AND line_items.discount_applied IS NOT NULL)")
  }
end
If I understood correctly, you want to get all orders for which no line item (if any) has a discount applied.
One way to get those orders using ActiveRecord would be the following:
Order.distinct.left_outer_joins(:line_items).where(line_items: { discount_applied: nil })
Here's a brief explanation of how that works:
The solution uses left_outer_joins, assuming you won't be accessing the line items for each order. You can also use left_joins, which is an alias.
If you need to instantiate the line items for each Order instance, add .eager_load(:line_items) to the chain which will prevent doing an additional query for every order (N+1), i.e., doing order.line_items.each in a view.
Using distinct is essential to make sure that orders are only included once in the result.
Update
My previous solution was only checking that discount_applied IS NULL for at least one line item, not all of them. The following query should return the orders you need.
Order.left_joins(:line_items).group(:id).having("COUNT(line_items.discount_applied) = ?", 0)
This is what's going on:
The solution still needs to use a left outer join (orders LEFT OUTER JOIN line_items) so that orders without any associated items are included.
It groups the joined rows to get a single Order object regardless of how many items it has (GROUP BY orders.id).
It counts the number of line items that were given a discount for each order, only selecting the ones whose items have zero discounts applied (HAVING (COUNT(line_items.discount_applied) = 0)).
I hope that helps.
You cannot do this efficiently with a classic Rails left_joins, but the SQL LEFT JOIN was built to handle these cases:
Order.joins("LEFT JOIN line_items AS li ON li.order_id = orders.id
AND li.discount_applied IS NOT NULL")
.where("li.id IS NULL")
A simple inner join returns all orders joined with all their line_items, but if an order has no line_items, the order is dropped (like a where that evaluates to false).
With a left join, if no line_items are found, SQL joins the order to an empty (all-NULL) row in order to keep it.
So we left join the line_items we don't want, and then find all orders joined with an empty line_items row.
And avoid any code with where(id: pluck(:id)) or having("COUNT(*) = 0"); one day it will kill your database.
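
Since several of the answers above disagree on what gets returned, it can help to pin down the intended result on a tiny in-memory data set before comparing SQL. The following is a plain-Ruby sketch with made-up data (the Item struct and the ids are assumptions for illustration only); it is not an ActiveRecord query.

```ruby
# Plain-Ruby model of the question: which orders have zero line items
# with a non-nil discount_applied? The data below is invented for the sketch.
Item = Struct.new(:order_id, :discount_applied)

order_ids = [1, 2, 3]
items = [
  Item.new(1, nil),  # order 1 has one item without a discount...
  Item.new(1, 5),    # ...and one with a discount, so it must be excluded
  Item.new(2, nil)   # order 2 has only undiscounted items
]                    # order 3 has no line items at all

# Orders that own at least one discounted item.
discounted_ids = items.reject { |i| i.discount_applied.nil? }.map(&:order_id).uniq

# Wanted result: everything else, including orders without any items.
without_discounts = order_ids - discounted_ids

# For contrast: "at least one item with a nil discount" is a different set.
any_nil_ids = items.select { |i| i.discount_applied.nil? }.map(&:order_id).uniq

puts without_discounts.inspect
puts any_nil_ids.inspect
```

A correct query should return orders 2 and 3 here. A query that returns orders 1 and 2 is only checking that some line item lacks a discount.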

Speed up Active Record group by count query

How can I speed up the following query? I'm looking to find records with 6 or fewer unique values of fb_id. The select doesn't seem to add much time; it's the group and count that are slow. Is there an alternate way to query? I added an index on fb_id and it only sped up the query by 50%.
FbGroupApplication.group(:fb_id).where.not(
  fb_id: _get_exclude_fb_group_ids
).order(
  "count_fb_id desc"
).count(
  "fb_id"
).select { |k, v| v <= 6 }
The query is looking for FbGroupApplications that have 6 or less applications to the same fb_id
Passing a block to the select method makes Rails run the SQL first: count returns a Hash of fb_id => count covering every group, and then select filters that Hash in Ruby based on the block you gave. This whole process is costly (Ruby is not good at this).
You can "delegate" the responsibility of comparing the count vs 6 to the database with a having clause:
FbGroupApplication
  .group(:fb_id)
  .where.not(fb_id: _get_exclude_fb_group_ids)
  .having('count(fb_id) <= 6')
  .count

I need advice in speeding up this rails method that involves many queries

I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the "date_sent" attribute is included in all of the queries, which implies that the SQL is searching through all 1M records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks.controller.rb
def index
  def set_sub_count_hash(thip)
    {
      gmail_hooks: {opened: a = thip.gmail.send(@event).size, total_sent: b = thip.gmail.sent.size, perc_opened: find_perc(a, b)},
      hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
      yahoo_hooks: {opened: a = thip.yahoo.send(@event).size, total_sent: b = thip.yahoo.sent.size, perc_opened: find_perc(a, b)},
      other_hooks: {opened: a = thip.other.send(@event).size, total_sent: b = thip.other.sent.size, perc_opened: find_perc(a, b)},
    }
  end
  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [
    # List of three ip addresses
  ]
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # Stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is very simple, actually. You gotta remember that when you use where(condition), the query is not executed in the DB right away.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count or #size like in your case) and chain your queries while you don't need one. In your code, you keep chaining conditions to the main query inside a loop (date_range). And it gets worse, you start another loop inside this one adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet, it was not yet executed and does not have results!) to the method set_sub_count_hash which goes on to call the same query many times.
Therefore you have something like:
10(date_range) * 3(ip list) * 8 # (times the query is materialized in the #set_sub_count method)
and then you have a problem.
What you want to do is to do the whole query at once and group it by date, ip and email. You should have a hash structure after that, which you would pass to the #set_sub_count method and do some ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
  .where(sending_ip: @m_list_of_ips)
Ok, now you have one query, which is nice, but I think you should separate the query into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, the main_query, will not be executed until you call for materialized results, don't forget it). Still, like 100 times faster.
I think this is the result that should be grouped, mapped and passed to #set_sub_count instead of passing the raw query and calling methods on it every time and many times. It will be a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
  .group('date_sent', 'sending_ip', 'event', 'esp').count
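
A grouped count like the one above comes back as a Hash whose keys are arrays of the grouped columns, so the remaining "ruby gymnastics" is reshaping that Hash into table cells. Here is a minimal plain-Ruby sketch of that step. The sample data, the build_table name, and the find_perc body are assumptions for illustration (the question never shows find_perc); the event names "unique_opened" and "sent" are taken from the question.

```ruby
require "date"

# Hypothetical version of the question's find_perc helper (not shown there).
def find_perc(part, whole)
  whole.zero? ? 0.0 : (100.0 * part / whole).round(1)
end

# grouped_counts mimics the shape of a grouped count result:
#   { [date_sent, sending_ip, event, esp] => count, ... }
def build_table(grouped_counts, opened_event: "unique_opened")
  table = Hash.new { |by_date, date| by_date[date] = Hash.new { |by_ip, ip| by_ip[ip] = {} } }
  grouped_counts.each do |(date, ip, event, esp), count|
    cell = (table[date][ip][esp] ||= { opened: 0, total_sent: 0 })
    cell[:opened] += count if event == opened_event
    cell[:total_sent] += count if event == "sent"
  end
  # Second pass: fill in the percentage once both counts are known.
  table.each_value do |by_ip|
    by_ip.each_value do |by_esp|
      by_esp.each_value do |cell|
        cell[:perc_opened] = find_perc(cell[:opened], cell[:total_sent])
      end
    end
  end
  table
end

# Invented sample standing in for the real query result.
day = Date.new(2020, 1, 1)
sample = {
  [day, "10.0.0.1", "sent", "gmail"] => 200,
  [day, "10.0.0.1", "unique_opened", "gmail"] => 50,
  [day, "10.0.0.1", "sent", "hotmail"] => 80
}
table = build_table(sample)
puts table[day]["10.0.0.1"]["gmail"].inspect
```

The database does all the counting in one pass, and Ruby only rearranges a few hundred small numbers, which is cheap compared with issuing a COUNT per cell.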

includes/joins case in rails 4

I have a habtm relationship between my Product and Category model.
I'm trying to write a query that searches for products with minimum of 2 categories.
I got it working with the following code:
p = Product.joins(:categories).group("product_id").having("count(product_id) > 1")
p.length # 178
When iterating on it though, for each time I call product.categories, it will do a new call to the database - not good. I want to prevent these calls and have the same result. Doing more research I've seen that I could include (includes) my categories table and it would load all the table in memory so it's not necessary to call the database again when iterating. So I got it working with the following code:
p2 = Product.includes(:categories).joins(:categories).group("product_id").having("count(product_id) > 1")
p2.length # 178 - I compared and the objects are the same as last query
Here's what I am confused about:
p.first.eql? p2.first # true
p.first.categories.eql? p2.first.categories # false
p.first.categories.length # 2
p2.first.categories.length # 1
Why with the includes query I get the right objects but I don't get the categories relationship right?
It has something to do with the group method. Your p2 only contains the first category for each product.
You could break this up into two queries:
product_ids = Product.joins(:categories).group("product_id").having("count(product_id) > 1").pluck(:product_id)
result = Product.includes(:categories).find(product_ids)
Yeah, you hit the database twice, but at least you don't go to the database when you're iterating.
You must know that includes doesn't play well with joins (joins will just suppress the former).
Also, when you include an association, ActiveRecord figures out whether it'll use eager_load (with a left join) or preload (with a separate query). includes is just a wrapper for one of those two.
The thing is, preload plays well with joins! So you can do this:
products = Product.preload(:categories). # this will trigger a separate query
joins(:categories). # this will build the relevant query
group("products.id").
having("count(product_id) > 1").
select("products.*")
Note that this will also hit the database twice, but you will not have any O(n) query.

How do you set the value of a newly added ActiveRecord counter cache?

I have a model object which did not have a counter cache on it before and I added it via a migration. The thing is, I tried and failed to set the starting value of the counter cache based on the number of child objects I already had in the migration. Any attempt to update the cache value did not get written to the database. I even tried to do it from the console but it was never going to happen. Any attempt to write directly to that value on the parent was ignored.
Changing the number of children updated the counter cache (as it should), and removing the ":counter_cache => true" from the child would let me update the value on the parent. But that's cheating. I needed to be able to add the counter cache and then set its starting value to the number of children in the migration so I could then start with correct values for pages which would show it.
What's the correct way to do that so that ActiveRecord doesn't override me?
You want to use the update_counters method, this blog post has more details:
josh.the-owens.com add a counter cache to an existing db-table
This RailsCasts on the topic is also a good resource:
http://railscasts.com/episodes/23-counter-cache-column
The canonical way is to use reset_counters, i.e.:
Author.find_each do |author|
  Author.reset_counters(author.id, :books)
end
...and that's how you should do it if those tables are of modest size, i.e. <= 1,000,000 rows.
BUT: for anything large this will take on the order of days, because it requires two queries for each row, and fully instantiates a model etc.
Here's a way to do it about 5 orders of magnitude faster:
Author
  .joins(:books)
  .select("authors.id, authors.books_count, count(books.id) as count")
  .group("authors.id")
  .having("authors.books_count != count(books.id)")
  .pluck(:id, :books_count, "count(books.id)")
  .each_with_index do |(author_id, old_count, fixed_count), index|
    puts "at index %7i: fixed author id %7i, new books_count %4i, previous count %4i" % [index, author_id, fixed_count, old_count] if index % 1000 == 0
    Author.update_counters(author_id, books_count: fixed_count - old_count)
  end
It's also possible to do it directly in SQL using just a single query, but the above worked well enough for me. Note the somewhat convoluted way it uses the difference of the previous count to the correct one: this is necessary because update_counters doesn't allow setting an absolute value, but only to increase/decrease it. The column is otherwise marked readonly.
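
The difference trick is easier to see on plain numbers. Here is a small plain-Ruby sketch with invented rows in the same (id, cached count, actual count) shape that the pluck above yields; the rows and variable names are assumptions for illustration, and no database is touched.

```ruby
# Each row: [author_id, cached_books_count, actual_books_count].
# The values are made up for this sketch.
rows = [
  [1, 0, 3],  # cache says 0, there are really 3 books
  [2, 5, 2],  # cache is too high
  [3, 4, 4]   # already correct, nothing to do
]

# update_counters only accepts increments, so compute the delta that
# moves each cached value to the correct one, skipping rows that match.
deltas = rows
  .reject { |_id, cached, actual| cached == actual }
  .map { |id, cached, actual| [id, actual - cached] }
  .to_h

puts deltas.inspect
```

Each entry in deltas is the amount that would be passed as the books_count increment for that author.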
