I have a model called Game in which I build up a scoped query.
Something like:
games = Game.scoped
games = games.team(team_name) if team_name
games = games.opponent(opponent_name) if opponent_name
total_games = games
I then calculate several subsets like:
wins = games.where("team_score > opponent_score").count
losses = games.where("opponent_score > team_score").count
Everything is great. Then I decided that I want to limit the original scope to show the last X number of games.
total_games = games.limit(10)
If there are 100 games that match what I want for total_games, and then I add .limit(10) - it gets the last 10. Great. But now calling
total_games.where("team_score > opponent_score").count
will reach back beyond the last 10, and into results that aren't part of total_games. Since adding .limit(10), I'll always get 10 total games, but also 10 wins, and 10 losses.
After typing this all out, I've realized that the cases where I want to use limit are for showing a smaller set of results - so I'll probably end up just looping through the results to calculate things like wins and losses (instead of doing separate queries as in my subsets above).
I tried this out when total_games had hundreds or thousands of results, and it's significantly slower to loop through than it is to just do separate queries for the subsets.
So, now I must know: what is the best way to limit a scoped query, and then run further queries that restrict themselves to the results returned by the original .limit(x)?
I don't think you can do what you want without separating the query into two steps: first get the 10 games from total_games, executing the DB query with all:
last_10_games = total_games.limit(10).all
then selecting from the resulting array and getting the size of the result:
wins = last_10_games.select { |g| g.team_score > g.opponent_score }.count
losses = last_10_games.select { |g| g.opponent_score > g.team_score }.count
I know this is not exactly what you asked for, but I think it's probably the most straightforward solution to the problem.
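As a plain-Ruby illustration of that two-step approach (hypothetical Game struct and scores, no database):

```ruby
# Hypothetical stand-in for total_games.limit(10).all, i.e. already materialized.
Game = Struct.new(:team_score, :opponent_score)
last_10_games = [Game.new(3, 1), Game.new(0, 2), Game.new(5, 5), Game.new(4, 0)]

# Counting happens in Ruby, so it can only ever see the limited set.
wins   = last_10_games.count { |g| g.team_score > g.opponent_score }   # 2
losses = last_10_games.count { |g| g.opponent_score > g.team_score }   # 1
```

If the counts must stay in SQL, another option is a subquery over the limited ids, e.g. games.where(id: games.order(:played_at).limit(10).select(:id)) (played_at is a hypothetical ordering column here). Note that MySQL has historically rejected a LIMIT inside an IN subquery, so this route is mainly PostgreSQL-friendly.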
I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
The "date_sent" condition appears in every one of these queries, which implies that the SQL is searching through all 1M records on every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks_controller.rb
def index
  def set_sub_count_hash(thip)
    {
      gmail_hooks:   { opened: a = thip.gmail.send(@event).size,   total_sent: b = thip.gmail.sent.size,   perc_opened: find_perc(a, b) },
      hotmail_hooks: { opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b) },
      yahoo_hooks:   { opened: a = thip.yahoo.send(@event).size,   total_sent: b = thip.yahoo.sent.size,   perc_opened: find_perc(a, b) },
      other_hooks:   { opened: a = thip.other.send(@event).size,   total_sent: b = thip.other.sent.size,   perc_opened: find_perc(a, b) }
    }
  end

  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [...] # list of three IP addresses

  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i

  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = { :this_date => this_date }
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is actually very simple. You have to remember that when you call where(condition), the query is not immediately executed against the DB.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count or #size like in your case), and until then it just keeps chaining your conditions. In your code, you keep chaining conditions onto the main query inside a loop (date_range). It gets worse: you start another loop inside that one, adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet; it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to execute that same query many times.
Therefore you have something like:
10 (date_range) * 3 (IP list) * 8 (times the query is materialized inside #set_sub_count_hash) = 240 queries
and then you have a problem.
What you want to do is run the whole query at once and group it by date, IP, event and ESP. You should have a hash structure after that, which you would pass to #set_sub_count_hash and do some Ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
                        .where(sending_ip: @m_list_of_ips)
Ok, now you have one query, which is nice, but I think you should separate it into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, the main_query, will not be executed until you call for materialized results, don't forget it). Still, like 100 times faster.
This is the result that should be grouped, mapped and passed to #set_sub_count_hash, instead of passing the raw query and calling methods on it over and over. It will be a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
                                     .group('date_sent', 'sending_ip', 'event', 'esp').count
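For anyone wiring that grouped count into the per-cell totals: group(...).count returns a hash keyed by [date_sent, sending_ip, event, esp] arrays, which you can reshape in plain Ruby (the values below are made up for illustration):

```ruby
require "date"

# Hypothetical result of MWebhook.group('date_sent', 'sending_ip', 'event', 'esp').count
grouped = {
  [Date.new(2018, 5, 1), "1.2.3.4", "sent",          "gmail"]   => 120,
  [Date.new(2018, 5, 1), "1.2.3.4", "unique_opened", "gmail"]   => 30,
  [Date.new(2018, 5, 1), "5.6.7.8", "sent",          "hotmail"] => 50,
}

# Reshape into counts[[date, ip, esp]][event] => n, so each table cell is one lookup.
counts = Hash.new { |h, k| h[k] = Hash.new(0) }
grouped.each do |(date, ip, event, esp), n|
  counts[[date, ip, esp]][event] += n
end

cell = counts[[Date.new(2018, 5, 1), "1.2.3.4", "gmail"]]
# cell["sent"] is 120, cell["unique_opened"] is 30
```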
I have an app that returns a bunch of records in a list in a random order with some pagination. For this, I save a seed value (so that refreshing the page will return the list in the same order again) and then use .order('random()').
However, say that out of 100 records, I have 10 records that have a preferred_level = 1 while all the other 90 records have preferred_level = 0.
Is there some way that I can place the preferred_level = 1 records first but still keep everything randomized?
For example, I have [A,1],[X,0],[Y,0],[Z,0],[B,1],[C,1],[W,0] and I hope I would get back something like [B,1],[A,1],[C,1],[Z,0],[X,0],[Y,0],[W,0].
Note that even the ones with preferred_level = 1 are randomized within themselves, just that they come before all the 0 records. In theory, I would hope whatever solution would place preferred_level = 2 before the 1 records if I were ever to add them.
------------
I had hoped it would be as intuitively simple as Model.all.order('random()').order('preferred_level DESC') but that doesn't seem to be the case. The second order doesn't seem to affect anything.
Any help would be appreciated!
This got the job done for me on a very similar problem.
select * from table order by preferred_level = 1 desc, random()
or I guess the Rails way
Model.order('preferred_level = 1 desc, random()')
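The same idea in plain Ruby (hypothetical [name, preferred_level] pairs). It also shows why ordering by the level itself, i.e. preferred_level desc, random(), would automatically put a future preferred_level = 2 group first, which preferred_level = 1 desc would not:

```ruby
records = [["A", 1], ["X", 0], ["Y", 0], ["Z", 0], ["B", 1], ["C", 1], ["W", 0]]

# Sort by level descending first; ties are then broken randomly within each level.
shuffled = records.sort_by { |(_name, level)| [-level, rand] }

levels = shuffled.map { |(_name, level)| level }
# levels is [1, 1, 1, 0, 0, 0, 0]: preferred records first, random order inside each group
```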
Nodes with the Location label have an index on Location.name.
Profiling the following query gives me a smart plan, with a NodeHashJoin between the two sides of the graph on either side of Trip nodes. Very clever. Works great.
PROFILE MATCH (rosen:Location)<-[:OCCURS_AT]-(ev:Event)<-[:HAS]-(trip:Trip)-[:OPERATES_ON]->(date:Date)
WHERE rosen.name STARTS WITH "U Rosent" AND
ev.scheduled_departure_time > "07:45:00" AND
date.date = '2015-11-20'
RETURN rosen.name, ev.scheduled_departure_time, trip.headsign
ORDER BY ev.scheduled_departure_time
LIMIT 20;
However, just changing one line of the query from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE id(rosen) = 4752371 AND
seems to alter the entire behavior of the query plan, which now appears to become more "sequential", losing the parallel execution of (Trip)-[:OPERATES_ON]->(Date)
Much slower. 6x more DB hits in total.
Question
Why does changing the retrieval of one, seemingly-unrelated Location node via a different index/mechanism alter the behavior of the whole query?
(I'm not sure how best to convey more information about the graph model, but please advise, and I'd be happy to add details that are missing)
Edit:
It gets better. Changing that query line from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE rosen.name = "U Rosenthaler Platz." AND
results in the same loss of parallelism in the query plan!
Seems odd that a LIKE-style query would be faster than an =?
I have a query that loads thousands of objects and I want to tame it by using find_in_batches:
Car.includes(:member).where(:engine => "123").find_in_batches(batch_size: 500) ...
According to the docs, I can't have a custom sorting order: http://www.rubydoc.info/docs/rails/4.0.0/ActiveRecord/Batches:find_in_batches
However, I need a custom sort order of created_at DESC. Is there another method to run this query in chunks like it does in find_in_batches so that not so many objects live on the heap at once?
Hm, I've been thinking about a solution for this (I'm the person who asked the question). It makes sense that find_in_batches doesn't allow a custom order. Let's say you sort by created_at DESC and specify a batch_size of 500: the first loop goes through rows 1-500, the second through rows 501-1000, and so on. What if, before the 2nd loop runs, someone inserts a new record into the table? It would land at the top of the query results and shift everything down by one, so your 2nd loop would repeat a record.
You could argue though that created_at ASC would be safe then, but it's not guaranteed if your app specifies a created_at value.
UPDATE:
I wrote a gem for this problem: https://github.com/EdmundMai/batched_query
Since using it, the average memory of my application has HALVED. I highly suggest anyone having similar issues to check it out! And contribute if you want!
The slower, manual way to do this is something like this:
last_id = nil
loop do
  scope = Car.includes(:member).where(:engine => "123")
             .order(id: :desc) # id DESC stands in for created_at DESC, assuming ids grow with creation time
             .limit(500)
  scope = scope.where("id < ?", last_id) if last_id
  ids = scope.ids # plucks just the ids
  break if ids.empty?
  cars = Car.find(ids)
  # cars.each or cars.update_all
  # do your updating
  last_id = ids.last
end
Can you imagine how find_in_batches with sorting would work on 1M rows or more? It would sort all the rows for every batch.
So I think it's better to decrease the number of sort calls. For example, with a batch size of 500, you can load only the IDs (with the sorting applied) for all N * 500 rows up front, and after that just load each batch of objects by its IDs. That way, the number of sorted queries hitting the DB decreases by a factor of N.
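That idea can be sketched in plain Ruby with in-memory rows (in Rails, the first step would be an order(created_at: :desc).pluck(:id) and each batch a where(id: batch_ids); the data here is hypothetical):

```ruby
# Hypothetical in-memory stand-in for the cars table.
rows = (1..7).map { |i| { id: i, created_at: Time.at(i) } }

# Sort once, keeping only the IDs.
sorted_ids = rows.sort_by { |r| r[:created_at] }.reverse.map { |r| r[:id] }

# Load each batch by ID, then restore the DESC order inside the batch,
# since a WHERE id IN (...) lookup does not preserve it.
batches = sorted_ids.each_slice(3).map do |batch_ids|
  batch = rows.select { |r| batch_ids.include?(r[:id]) }
  batch.sort_by { |r| batch_ids.index(r[:id]) }
end
# batches.first has ids [7, 6, 5]; one sort in total instead of one per batch
```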
I need to create a sort of deductive query, where the same records are ordered multiple times and the order each time is performed independently from previous orders.
Such as:
tallest_trees = Trees.order(:height => :desc).limit(10) # Gets 10 of the tallest trees out of all available trees
tallest_trees.order(:width => :desc) # From the 10 of the tallest trees, gets the widest trees **without** any regard to their height
I've attempted to do it in a similar way as above, but the result has always been that it orders the trees by both height and width at the same time, which is the opposite of what I need.
It made no difference whether I used ActiveRecord or SQL directly. I believe the answer is that I somehow have to "stabilize" the first query so that the second one doesn't just extend it, but instead orders its results anew.
There are 2 ways. First:
base_scope = Trees.limit(10)
tallest_trees = base_scope.order(height: :desc)
widest_trees = base_scope.order(width: :desc)
Second - with reorder method:
tallest_trees = Trees.order(height: :desc).limit(10)
widest_trees = tallest_trees.reorder(width: :desc)
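One caveat: in SQL, ORDER BY is applied before LIMIT, so tallest_trees.reorder(width: :desc) asks the database for the 10 widest trees overall, not the 10 tallest re-sorted by width. To genuinely re-sort the limited set you can use a subquery (in ActiveRecord, something like Trees.where(id: Trees.order(height: :desc).limit(10).select(:id)).order(width: :desc)) or sort in memory. A plain-Ruby sketch of the intended result, with hypothetical data:

```ruby
trees = [
  { name: "a", height: 30, width: 2 },
  { name: "b", height: 25, width: 9 },
  { name: "c", height: 40, width: 1 },
  { name: "d", height: 10, width: 8 },
]

# Take the 3 tallest first, then re-sort only that subset by width.
tallest = trees.sort_by { |t| -t[:height] }.first(3)
widest_of_tallest = tallest.sort_by { |t| -t[:width] }
# "d" is wider than "a" and "c" but never appears: it is not among the tallest
```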