My Active Record query causing slow loading times

I have a sidebar on my app where I list customers with "open" tickets. When there are a lot of customers with open tickets, my site runs really slow. Upon examining the logs, it seems the lag is coming from the sidebar. Here's the query:
@sidebar_customers = current_user.company.customers.open_or_claimed.where("user_id = ? OR user_id IS NULL", current_user.id).order(aasm_state: :asc)
and this is how I display the sidebar:
- @sidebar_customers.each do |customer|
  %li#sidebar-customer{:class => "#{active_state_class(customer)}"}
    = link_to raw("<i class = 'menu-icon fa #{claimed_class(customer)}' id = 'icon-cust-#{customer.id}'></i><span class = 'mm-text'>#{customer.full_name}</span>#{'<span id = "open" class = "label label-success pull-right">open</span>' if customer.open?}#{"<span id = 'new-cust-#{customer.id}' class = 'label label-danger'>new</span>" if customer.not_viewed_count > 0 && customer.claimed?}"), customer, id: "cust-#{customer.id}"
How can I make this faster? Am I doing something wrong?
EDIT: this also seems to be making the site slow:
@messages = current_user.company.messages.find(:all, :order => "id desc", :limit => 25)

This is not surprising, since current_user.company.customers performs SQL joins across the users, companies, and customers tables, and SQL joins are expensive (time consuming). As you noticed, the performance of the joins degrades significantly as the data grows (more customers, in this case).
There are many ways to optimize the performance here.
First, experiment with eager loading. See the section on Eager Loading Multiple Associations in the Active Record querying guide: http://guides.rubyonrails.org/active_record_querying.html
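For example, a minimal sketch of the sidebar query with eager loading (the :user association is an assumption based on the user_id column; eager-load whatever associations the view actually touches):
@sidebar_customers = current_user.company.customers
                                 .open_or_claimed
                                 .where("user_id = ? OR user_id IS NULL", current_user.id)
                                 .includes(:user)
                                 .order(aasm_state: :asc)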
Next, review this SO thread on the performance differences between includes and joins:
Rails :include vs. :joins
Be sure to review the SQL generated in the log file as you experiment, and see if you can eliminate any unwanted joins.
Other usual ways to speed up queries: add indexes to the tables, and reduce the data returned by adding more constraints (where conditions).
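For example, a hedged sketch of a migration adding indexes for the columns the sidebar query filters and sorts on (the migration name and column list are assumptions based on the query above):
class AddSidebarCustomerIndexes < ActiveRecord::Migration
  def change
    # customers are filtered on user_id and sorted on aasm_state
    add_index :customers, :user_id
    add_index :customers, :aasm_state
  end
end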

Related

I need advice in speeding up this rails method that involves many queries

I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the "date_sent" attribute is included in all of the queries, which implies that the SQL is searching through all 1M records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks_controller.rb
def index
  def set_sub_count_hash(thip)
    {
      gmail_hooks: {opened: a = thip.gmail.send(@event).size, total_sent: b = thip.gmail.sent.size, perc_opened: find_perc(a, b)},
      hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
      yahoo_hooks: {opened: a = thip.yahoo.send(@event).size, total_sent: b = thip.yahoo.sent.size, perc_opened: find_perc(a, b)},
      other_hooks: {opened: a = thip.other.send(@event).size, total_sent: b = thip.other.sent.size, perc_opened: find_perc(a, b)},
    }
  end

  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [...] # list of three IP addresses
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is actually very simple. You have to remember that when you use where(condition), the query is not immediately executed against the DB.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count/#size like in your case) and keeps chaining your queries until then. In your code, you keep chaining conditions to the main query inside a loop (date_range). And it gets worse: you start another loop inside this one, adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet: it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to execute the same query many times.
Therefore you end up with something like:
10 (date_range) * 3 (IP list) * 8 (times the query is materialized inside set_sub_count_hash) = 240 queries
and then you have a problem.
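A minimal sketch of that lazy behavior (model and column names taken from the question; the IP is made up):
scope = MWebhook.where(:date_sent => Date.today)  # no SQL executed yet
scope = scope.where(:sending_ip => '1.2.3.4')     # still no SQL
scope.size                                        # SELECT COUNT(*) ... runs now
scope.size                                        # ...and runs again; the count is not cached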
What you want to do is run the whole query at once and group it by date, IP, and ESP. You should have a hash structure after that, which you would pass to set_sub_count_hash and do some Ruby gymnastics on to get the counts you're looking for.
I imagine the query looking something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
                        .where(sending_ip: @m_list_of_ips)
OK, now you have one query, which is nice, but I think you should separate it into 4 (gmail, hotmail, yahoo, and other), which gives you 4 queries (the first one, main_query, will not be executed until you ask for materialized results, don't forget it). Still, that's something like 100 times faster.
I think this is the result that should be grouped, mapped, and passed to set_sub_count_hash, instead of passing the raw query and calling methods on it every time and many times. It will be a little work to do the grouping, mapping, and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
                                     .group('date_sent', 'sending_ip', 'event', 'esp').count
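The result is a single hash keyed by [date_sent, sending_ip, event, esp] arrays, so each cell becomes an in-memory lookup. A sketch of reading it (the 'unique_opened'/'gmail' keys are illustrative assumptions):
counts = MWebhook.where('date_sent > ?', start_date.to_date)
                 .group('date_sent', 'sending_ip', 'event', 'esp').count
opened = counts.fetch([this_date, ip, 'unique_opened', 'gmail'], 0)
sent   = counts.fetch([this_date, ip, 'sent', 'gmail'], 0)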

Rails left join with conditions

I have users, problems, and attempts, which is a join table between users and problems. I'm looking to show an index of all the problems along with the current user's most recent attempt for each, if they have one.
I've tried four things to get a left join with conditions and none of them have worked.
The naive approach is something like...
@problems = Problem.enabled
@problems.each do |prob|
  prob.last_attempt = prob.attempts
                          .where(user_id: current_user.id)
                          .last
end
This gets all the problems and the attempts I want but is N+1 queries. So...
@problems = Problem.enabled
                   .includes(:attempts)
This does the left join (or the equivalent two queries) getting all the problems but also all the attempts, not just those for the current user. So...
@problems = Problem.enabled
                   .includes(:attempts)
                   .where(attempts: {user_id: current_user.id})
This gets only those problems that the current user has already attempted.
So...
# problem.rb
has_many :user_attempts,
         -> (user) { where(user_id: user.id) },
         class_name: 'Attempt'

# problem_controller.index
@problems = Problem.enabled
                   .includes(:user_attempts, current_user)
And this gives an error message from Rails saying that joins with instance arguments are not supported.
So I'm stuck. What is the best way to do this? Is Arel the right tool? Can I skip active record and just get back a JSON blob? Am I just being dumb?
This question is quite similar to this one, but I'd need an argument to the joined scope, which isn't supported. And I'm hoping Rails has added something in the last couple of years.
Thanks so much for your help.
The way I solved this was to use raw SQL. It's ugly and a security risk, but I didn't find anything better.
results = Problem.connection.exec_query(%(
  SELECT *
  FROM problems
  LEFT JOIN (
    SELECT *
    -- etc.
  )
))
And then manipulating the results array in memory.
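For anyone who wants to stay inside ActiveRecord, a hedged sketch of an alternative: push the user condition into the join itself, so problems without attempts still come back (table and column names are assumptions from the models described above, and sanitize_sql_array keeps current_user.id out of raw interpolation):
join_sql = Problem.send(:sanitize_sql_array,
  ['LEFT OUTER JOIN attempts ON attempts.problem_id = problems.id AND attempts.user_id = ?',
   current_user.id])
@problems = Problem.enabled
                   .joins(join_sql)
                   .select('problems.*, MAX(attempts.id) AS last_attempt_id')
                   .group('problems.id')
Note that MAX(attempts.id) is only a proxy for "most recent attempt", and grouping by the primary key like this works on PostgreSQL; adjust for your database.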

Rails 4: order model by latest date of child

In Rails 4, I have a model Thread which has_many Emails. Each Email has a field named internal_date. I want to return a collection of threads, ordered in a way where the thread with the latest email.internal_date comes first (very similar to how Gmail would sort its inbox).
This is the current line in my controller (not ordering them so far):
@threads = selected_threads.joins(:tags).filter(params_filters).includes(:emails, [some other stuff]).distinct.all.paginate(page: params[:page], :per_page => 10)
I'm doing the joins because of the filtering, and using includes to speed things up.
Ideally I would add a scope order_by_latest_email to my Thread model, without killing the loading time with too many DB queries. Any tips?
Thanks!
I think this is only possible with a really ugly query like so:
Thread.joins(:emails)
      .select('threads.*, emails.internal_date')
      .joins('LEFT OUTER JOIN emails em ON (emails.internal_date < em.internal_date AND emails.thread_id = em.thread_id)')
      .where('em.id IS NULL')
      .order('emails.internal_date DESC')
# additional filters here
You can see details in a blog post here, but this is a semi-common SQL problem known as greatest-n-per-group.
You need to find the latest email internal_date in the group of emails connected to a thread. So what you do is:
- compare all the emails with each other
- continue until you find one where no other email (em represents that other email) has a later internal_date (that's what the em.id IS NULL is doing)
- order by that email's internal_date
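If a correlated subquery is acceptable, a simpler hedged alternative is a scope that sorts by each thread's latest internal_date (assuming emails.thread_id is the foreign key); unlike the self-join, it composes more easily with the distinct and pagination already in the controller:
# thread.rb -- a sketch of the scope the question asked for
scope :order_by_latest_email, -> {
  order('(SELECT MAX(emails.internal_date)
          FROM emails
          WHERE emails.thread_id = threads.id) DESC')
}
An index on emails (thread_id, internal_date) keeps the subquery cheap, and newer Rails versions may want the fragment wrapped in Arel.sql.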

Rails 3.2 Query - .exists?

I have about 500 outlets. Each outlet will be monitored a minimum of one time per day. I am trying to get a list of outlets that have been monitored each day.
I am having a problem with the query at the moment, any help is appreciated:
<% for outlet in @outlets %>
  <% if Monitoring.exists?( :outlet_id => outlet.id, 'DATE(created_at) = ?', Date.today ) %>
The @outlets is an instance variable containing Outlet.all.
This query leaves me with a syntax error. What would be the correct way to do this? I'm trying to check that the Monitoring belongs to the Outlet, and that the Monitoring record was created today.
Also, I'm not entirely sure of the speed implications of this query. There will be a max of 2000 outlets on a page at one time (it's a dashboard, so they appear as either red or green dots).
Any help greatly appreciated.
You're getting a syntax error because you're trying to mix implicit-Hash and implicit-Array arguments:
Monitoring.exists?(:outlet_id => outlet.id, 'DATE(created_at) = ?', Date.today)
The exists? method wants a Hash as its single argument. You want to use an SQL function in the query, though, which means you have to use the Model.where(...).exists? form:
Monitoring.where(:outlet_id => outlet.id).where('date(created_at) = ?', Date.today).exists?
That still leaves you hitting the database over and over again to light up your lights. You could precompute the whole mess with something like this:
counts = Monitoring.where('date(created_at) = ?', Date.today).count(:group => :outlet_id)
And then use counts.has_key?(outlet.id) in your loop. Adding a where(:outlet_id => outlet_ids) (where outlet_ids are the IDs you're interested in) might make sense as well. You might also be able to combine the count query with the query that generates @outlets.
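Putting it together, a sketch of the dashboard loop with the precomputed hash (the red/green class names are assumptions based on the question):
counts = Monitoring.where('date(created_at) = ?', Date.today)
                   .count(:group => :outlet_id)
@outlets.each do |outlet|
  dot_class = counts.has_key?(outlet.id) ? 'green' : 'red'
  # render the outlet's dot with dot_class...
end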

Ruby on Rails ActiveRecord efficiency

This code should update the entire table by applying a filter to its "name" values:
entries = select('id, name').all
entries.each do |entry|
  puts entry.id
  update(entry.id, { :name => sanitize(entry.name) })
end
I am pretty new to Ruby on Rails and found it interesting that my selection query is split into single-row selections:
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 1) LIMIT 1
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 2) LIMIT 1
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 3) LIMIT 1
...
As I understand it, this is a kind of optimization provided by Rails: select a row only when it's needed (each cycle) rather than all the entries at once.
However, is it really more efficient in this case? I mean, if I have 1000 records in my database table, is it better to make 1000 queries than a single one? If not, how can I force Rails to select more than one row per query?
Another question: does Rails skip the UPDATE query if the provided values are the same as the ones that already exist (in other words, if entry.name == sanitize(entry.name))?
ActiveRecord is an abstraction layer, but when doing certain operations (especially those involving large datasets) it is useful to know what is happening underneath the abstraction layer.
This is pretty much true for all abstractions. (see Joel Spolsky's classic article on leaky abstractions: http://www.joelonsoftware.com/articles/LeakyAbstractions.html )
To deal with the case in point here, Rails provides update_all for mass updates; but since your sanitize filter has to run in Ruby for each record, the better fit is to batch the reads with find_each:
Entry.find_each do |entry|
  # ...
end
That fetches the entries in batches (1000 per query by default) and exposes each entry for your pleasure.
If attributes are not changed, Rails will not perform an UPDATE query.
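Putting the two together, a minimal sketch (sanitize here is the filter from the question, and it assumes no validations need the unselected columns):
Entry.select('id, name').find_each do |entry|
  entry.update_attributes(:name => sanitize(entry.name))
  # When sanitize(entry.name) == entry.name, no attribute changes,
  # so Rails skips the UPDATE entirely.
end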
