I am trying to compare the execution speed of two queries using EXPLAIN ANALYZE and Benchmark, because I got a timeout for one query but I am not sure that query was the cause.
queue_count = purchase.purchase_items.where("queue_id = ?", queue.id).count
which generates this SQL:
SELECT COUNT(*) FROM "purchase_items" WHERE "purchase_items"."purchase_id" = 1241422 AND (queue_id = 3479783)
So I tried removing the count; one suggested solution was to load all the records into an array and count them in Ruby, which gave me this:
queue_count = purchase.purchase_items.where("queue_id = ?", queue.id).all.count
which generates this SQL:
SELECT "purchase_items".* FROM "purchase_items" WHERE "purchase_items"."purchase_id" = 1241422 AND (queue_id = 3479783)
I finally saw some slight variation when checking with EXPLAIN ANALYZE and also Benchmark, so is this the correct way? Or am I doing something wrong?
In terms of performance, the second query will be quite terrible: it will load all the records into memory and count them in Ruby. The database is designed to do this kind of work quickly.
To analyze a query you can run EXPLAIN ANALYZE in the psql console. My guess is that you're missing some indexes (on purchase_id and queue_id). You can check by running:
EXPLAIN ANALYZE SELECT COUNT(*) FROM purchase_items WHERE purchase_id = 1241422 AND (queue_id = 3479783)
If you see that PostgreSQL is scanning the whole table, performance will not be optimal. Try adding indexes:
CREATE INDEX purchase_id_purchase_items_idx ON purchase_items (purchase_id);
CREATE INDEX queue_id_purchase_items_idx ON purchase_items (queue_id);
and then examining the performance with EXPLAIN ANALYZE again. But never load all records into Ruby just to do a simple .count on them.
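If you manage your schema with Rails migrations, the equivalent indexes can be added there instead; a minimal sketch, assuming a standard Rails setup (the migration class name is made up):

class AddIndexesToPurchaseItems < ActiveRecord::Migration
  def change
    # Single-column indexes matching the two WHERE conditions above
    add_index :purchase_items, :purchase_id
    add_index :purchase_items, :queue_id
  end
end

If this count is a hot query, a composite index on (purchase_id, queue_id) would let PostgreSQL answer it from a single index scan.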
Related
How can I speed up the following query? I'm looking to find records with 6 or fewer unique values of fb_id. The select doesn't seem to be adding much in terms of time; instead it's the group and count. Is there an alternative way to write the query? I added an index on fb_id and it only sped up the query by 50%.
FbGroupApplication.group(:fb_id).where.not(
  fb_id: _get_exclude_fb_group_ids
).order(
  "count_fb_id desc"
).count(
  "fb_id"
).select { |k, v| v <= 6 }
The query is looking for FbGroupApplications that have 6 or fewer applications to the same fb_id.
Passing a block to the select method makes Rails trigger the SQL, convert the found rows into ActiveRecord objects (records), and then run a select over that array with the block you gave. This whole process is costly (Ruby is not good at this).
You can "delegate" the responsibility of comparing the count vs 6 to the database with a having clause:
FbGroupApplication
.group(:fb_id)
.where.not(fb_id: _get_exclude_fb_group_ids)
.having('count(fb_id) <= 6')
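If you also want the per-group counts back in Ruby, chaining .count onto the same relation returns a hash of fb_id => count; a sketch under the same assumptions:

counts = FbGroupApplication
  .group(:fb_id)
  .where.not(fb_id: _get_exclude_fb_group_ids)
  .having('count(fb_id) <= 6')
  .count('fb_id')
# => { "some_fb_id" => 4, "another_fb_id" => 6, ... } (illustrative values)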
I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the date_sent attribute is included in all of the queries, which implies that the SQL is searching through all the records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks.controller.rb
def index
  def set_sub_count_hash(thip)
    {
      gmail_hooks:   { opened: a = thip.gmail.send(@event).size,   total_sent: b = thip.gmail.sent.size,   perc_opened: find_perc(a, b) },
      hotmail_hooks: { opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b) },
      yahoo_hooks:   { opened: a = thip.yahoo.send(@event).size,   total_sent: b = thip.yahoo.sent.size,   perc_opened: find_perc(a, b) },
      other_hooks:   { opened: a = thip.other.send(@event).size,   total_sent: b = thip.other.sent.size,   perc_opened: find_perc(a, b) }
    }
  end

  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [ ... ] # list of three IP addresses
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = { :this_date => this_date }
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is actually very simple. You have to remember that when you use where(condition), the query is not immediately executed against the DB.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count or #size like in your case) and keeps chaining your queries until you do. In your code, you keep chaining conditions onto the main query inside a loop (date_range). And it gets worse: you start another loop inside this one, adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet: it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to execute the same query many times.
Therefore you end up with something like:
10 (date_range) × 3 (IP list) × 8 (times the query is materialized in #set_sub_count_hash) = 240 queries
and then you have a problem.
What you want to do is to do the whole query at once and group it by date, ip and email. You should have a hash structure after that, which you would pass to the #set_sub_count method and do some ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
                        .where(sending_ip: @m_list_of_ips)
OK, now you have one query, which is nice, but I think you should split it in 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, the main_query, will not be executed until you ask for materialized results, don't forget it). Still about 100 times faster.
I think this is the result that should be grouped, mapped and passed to #set_sub_count_hash, instead of passing the raw query and calling methods on it every time, many times over. It will take a little work to do the grouping, mapping and counting, for sure, but hey, it's faster. =)
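Under those assumptions, the split might look something like this (treating gmail/hotmail/yahoo/other as scopes on MWebhook, which is how the question's code appears to use them):

provider_counts = {}
%w[gmail hotmail yahoo other].each do |esp|
  # One grouped count per provider: a single SQL query each
  provider_counts[esp] = main_query.public_send(esp)
                                   .group(:date_sent, :sending_ip, :event)
                                   .count
end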
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
                                     .group('date_sent', 'sending_ip', 'event', 'esp').count
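For reference, a multi-column group returns a hash keyed by arrays of the grouped values, which you can then slice in plain Ruby (the values below are made up for illustration):

counts = MWebhook.where('date_sent > ?', start_date.to_date)
                 .group('date_sent', 'sending_ip', 'event', 'esp').count
# => { [Date.new(2015, 11, 20), "1.2.3.4", "sent", "hotmail"] => 1523,
#      [Date.new(2015, 11, 20), "1.2.3.4", "unique_opened", "hotmail"] => 311, ... }
counts.select { |(_date, _ip, event, esp)| event == "sent" && esp == "hotmail" }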
Nodes with the Location node label have an index on Location.name.
Profiling the following query gives me a smart plan, with a NodeHashJoin between the two sides of the graph on either side of Trip nodes. Very clever. Works great.
PROFILE MATCH (rosen:Location)<-[:OCCURS_AT]-(ev:Event)<-[:HAS]-(trip:Trip)-[:OPERATES_ON]->(date:Date)
WHERE rosen.name STARTS WITH "U Rosent" AND
ev.scheduled_departure_time > "07:45:00" AND
date.date = '2015-11-20'
RETURN rosen.name, ev.scheduled_departure_time, trip.headsign
ORDER BY ev.scheduled_departure_time
LIMIT 20;
However, just changing one line of the query from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE id(rosen) = 4752371 AND
seems to alter the entire behavior of the query plan, which now appears to become more "sequential", losing the parallel execution of (Trip)-[:OPERATES_ON]->(Date).
Much slower. 6x more DB hits in total.
Question
Why does changing the retrieval of one, seemingly-unrelated Location node via a different index/mechanism alter the behavior of the whole query?
(I'm not sure how best to convey more information about the graph model, but please advise, and I'd be happy to add details that are missing)
Edit:
It gets better. Changing that query line from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE rosen.name = "U Rosenthaler Platz." AND
results in the same loss of parallelism in the query plan!
Seems odd that a STARTS WITH query is faster than an =, doesn't it?
I'm running the following query:
User.where("number > ?", 5).order(&:age).first(20)
I noticed that the speed of the query was about the same whether I replaced first(20) with first(200) or even just first. This seems to imply that all records are retrieved by the server, no matter how many I actually want in the array. Is there any way to speed this process up?
The performance may well be similar, because in general the database is going to have to identify all of the rows that match the conditions, then order them all, then read the first n rows from the sorted set. If n is 200 then obviously it will have to return more rows to the application, but the primary driver on database performance is probably not the quantity of rows returned but the quantity of rows to be ordered.
As others state:
User.where("number > ?", 5).order(:age).limit(20)
... or to get those with the highest age ...
User.where("number > ?", 5).order(:age => :desc).limit(20)
(Rails 4 syntax)
There are occasions when the database can use an index to provide the sort order, in which case you'd likely see a much larger performance difference between 20 or 200 rows.
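For example, with an index on age (an assumption about your schema, not something shown in the question), the database can walk the index in sorted order and stop as soon as it has found 20 rows satisfying number > 5, instead of sorting the whole matching set:

# Hypothetical migration; whether the planner actually uses the index
# this way depends on your data distribution and your database.
class AddAgeIndexToUsers < ActiveRecord::Migration
  def change
    add_index :users, :age
  end
end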
You can perform the query with limit:
User.where("number > ?", 5).order(:age).limit(20)
Check this Rails Guides article for more examples.
Good luck!
You can use limit since you're ordering the results:
User.where("number > ?", 5).order('age desc').limit(20)
Ruby 1.9.2 / Rails 3.1 / deployed on Heroku --> PostgreSQL
Hi. Once the number of rows relating to an object goes over a certain amount, I wish to pull back every nth row instead. This is simply because the rows are used (in part) to display data for graphing, so once the number of rows returned goes above, say, 20, it's better to return every second one, and so forth.
This question seemed to point in the right direction:
ActiveRecord Find - Skipping Records or Getting Every Nth Record
Doing a mod on the row number makes sense, but basically using:
@widgetstats = self.widgetstats.find(:all, :conditions => 'MOD(ROW_NUMBER(),3) = 0')
doesn't work, it returns an error:
PGError: ERROR: window function call requires an OVER clause
And any attempt to solve that by basing my OVER clause syntax on the answers to this question:
Row numbering in PostgreSQL
ends in syntax errors and I can't get a result.
Am I missing a more obvious way of efficiently returning every nth row, or, if I'm on the right track, are there any pointers on the way to go? Obviously returning all the data and fixing it in Rails afterwards is possible, but terribly inefficient.
Thank you!
I think you are looking for a query like this one:
SELECT * FROM (
  SELECT widgetstats.*, row_number() OVER () AS rownum
  FROM widgetstats ORDER BY id
) stats
WHERE mod(rownum, 3) = 0
This is difficult to build using ActiveRecord, so you might be forced to do something like:
@widgetstats = self.widgetstats.find_by_sql(
  %{
    SELECT * FROM
    (
      SELECT widgetstats.*, row_number() OVER () AS rownum
      FROM widgetstats ORDER BY id
    ) AS stats
    WHERE mod(rownum, 3) = 0
  }
)
You'll obviously want to change the ordering used and add any WHERE clauses or other modifications to suit your needs.
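For instance, if you only wanted rows for one widget, the filter has to go inside the subquery so that the numbering counts only that widget's rows; a sketch (the widget_id column is an assumption):

@widgetstats = self.widgetstats.find_by_sql([
  %{
    SELECT * FROM
    (
      SELECT widgetstats.*, row_number() OVER (ORDER BY id) AS rownum
      FROM widgetstats
      WHERE widget_id = ?
    ) AS stats
    WHERE mod(stats.rownum, 3) = 0
  },
  widget_id
])

Note that moving the ORDER BY inside the OVER clause guarantees the numbering follows id order, which a subquery-level ORDER BY alone does not.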
Were I to solve this, I would either write the SQL myself, like the SQL in the question you linked to, which you can do with
my_model.connection.execute('...')
or just get the id numbers and find by id
ids = (1..30).step(2).to_a
my_model.where(:id => ids)
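One caveat: this assumes ids are contiguous, so deleted rows will make the sampling uneven. If that matters, you can fetch just the ids first and take every nth one (pluck would be tidier but needs Rails 3.2+, so this sticks to 3.1):

# Fetch only the ids (cheap), keep every 3rd, then load just those rows.
ids = my_model.order(:id).select(:id).map(&:id).each_slice(3).map(&:first)
my_model.where(:id => ids)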