How can I speed up the following query? I'm looking to find records with 6 or fewer unique values of fb_id. The select doesn't seem to add much in terms of time; instead it's the group and count. Is there an alternate way to query? I added an index on fb_id and it only sped up the query by 50%.
FbGroupApplication.group(:fb_id).where.not(
  fb_id: _get_exclude_fb_group_ids
).group(
  "count_fb_id desc"
).count(
  "fb_id"
).select { |k, v| v <= 6 }
The query is looking for FbGroupApplications that have 6 or fewer applications to the same fb_id.
Passing a block to the select method made Rails trigger the SQL, convert the found rows into ActiveRecord objects (records), and then run Array#select over that array using the block you gave. This whole process is costly (Ruby is not good at this).
You can "delegate" the responsibility of comparing the count vs 6 to the database with a having clause:
FbGroupApplication
.group(:fb_id)
.where.not(fb_id: _get_exclude_fb_group_ids)
.having('count(fb_id) <= 6')
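If you also need the per-fb_id counts that the original .count("fb_id") call returned, count chains onto the same scope; a minimal sketch, reusing the names from the question:
# Returns a { fb_id => count } hash, filtered in the database by HAVING,
# so no rows are instantiated only to be discarded by Array#select.
counts = FbGroupApplication
  .group(:fb_id)
  .where.not(fb_id: _get_exclude_fb_group_ids)
  .having('count(fb_id) <= 6')
  .count('fb_id')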
Related
I have an ActiveRecord query:
Post.all.select { |p| Date.today < p.created_at.weeks_since(2) }
And I want to be able to see what SQL request this produces using .to_sql
The error I get is: NoMethodError: undefined method 'to_sql'
TIA!
ISSUE
There are 2 types of select when it comes to ActiveRecord objects; from the docs:
select with a Block.
First: takes a block so it can be used just like Array#select.
This will build an array of objects from the database for the scope, converting them into an array and iterating through them using Array#select.
This is what you are using right now. This implementation will load every post, instantiate a Post object for each row, and then iterate over them using Array#select to filter the results into an Array. This is highly inefficient, cannot be chained with other ActiveRecord query methods (e.g. where, order, etc.), and will cause very long lags at scale. (This is also what is causing your error, because Array does not have a to_sql method.)
select with a list of columns (or a String if you prefer)
Second: Modifies the SELECT statement for the query so that only certain fields are retrieved...
This version is unnecessary in your case as you do not wish to limit the columns returned by the query to posts.
Suggested Resolution:
Instead what you are looking for is a WHERE clause to filter the records at the database level before returning them to the ORM.
Your current filter is (X < Y + 2)
Date.today < p.created_at.weeks_since(2)
which means Today's Date is less than Created At plus 2 Weeks.
We can rearrange this criterion to make it easier to query, switching it to Today's Date minus 2 weeks is less than Created At (X - 2 < Y):
Date.today.weeks_ago(2) < p.created_at
This is equivalent to p.created_at > Date.today.weeks_ago(2) which we can convert to a where clause using standard ActiveRecord query methods:
Post.where(created_at: Date.today.weeks_ago(2)...)
This will result in SQL like:
SELECT
  posts.*
FROM
  posts
WHERE
  posts.created_at >= '2022-10-28'
Notes:
created_at is a timestamp, so it might be better to use Time.now (or Time.current) vs Date.today; see the sketch below.
Additional concerns may be involved from a time-zone perspective, since you will be performing date/time-specific comparisons.
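A time-zone-aware variant of the same query, as a minimal sketch (2.weeks.ago is based on Time.current, which respects Rails' configured time zone, whereas Time.now uses the server's system zone):
# 2.weeks.ago returns an ActiveSupport::TimeWithZone, so the comparison
# follows config.time_zone rather than the server clock.
Post.where(created_at: 2.weeks.ago...)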
You need to call to_sql on a relation. select executes the query and gives you the result, and on the result you don't have to_sql method.
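For example (a sketch; the exact quoting in the generated string depends on your database adapter):
relation = Post.where(created_at: Date.today.weeks_ago(2)...)
relation.to_sql  # works: relation is an ActiveRecord::Relation, nothing has executed yet

# The block form executes the query and returns an Array, which has no to_sql:
Post.all.select { |p| Date.today < p.created_at.weeks_since(2) }.to_sql  # NoMethodError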
There are similar questions which you can look at as they offer some alternatives.
How can I use the join table's column value with arithmetic operation during the where condition on Rails?
User and Order are the two models; Order has a user via a foreign-key relation.
My goal is to find whether an Order was created/placed within 5 minutes of User creation (understanding users who sign up just to place an order).
I tried the following queries:
Order.where('country': 'US').joins(:user).where('orders.created_at <= :u_date', {u_date: 'users.created_at' + 5.minutes })
With this query I get the following error: no implicit conversion of Time into String, so users.created_at is not evaluating to a date.
Hence I tried converting the string to a DateTime object, which failed too:
Order.joins(:user).where('orders.created_at < ?', 'users.created_at'+ 5.minutes)
How can I do the comparison inside the Where query?
Right now I am plucking the data and comparing it. It'd be great to make this work inside the where (or any relevant query) itself.
You're invoking + on a string and passing a Time object as the argument, which is not an out-of-the-box operation, at least in Rails.
If the time to add is not dynamic you could try;
where("orders.created_at <= users.created_at + INTERVAL '5.minutes'")
which makes your DBMS add the proper interval to users.created_at (in this case I'm assuming PostgreSQL).
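If the interval does need to be dynamic, PostgreSQL's make_interval() lets you bind it as a parameter instead of interpolating it; a minimal sketch, reusing the models from the question:
minutes = 5
# make_interval(mins => 5) builds the interval server-side, so the value
# can be bound safely rather than spliced into the SQL string.
Order.joins(:user)
     .where("orders.created_at <= users.created_at + make_interval(mins => ?)", minutes)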
I need to retrieve a list of collections, i.e. @medications, @treatments, @therapies, etc., with a count of each collection's related records.
This works but creates the initial query and then a new query for each related record count. Is there a way I can minimize the number of queries?
@medications = Medication.includes(:records).select(:id, :name).where(office_id: current_user.selected_office)
@medications.each do |medication|
  medication.record_count = medication.records.count
end
If the @medications query has 10 results, I have a total of 11 queries. Since I need 10 collections with related record counts, I would end up with 110 queries per request.
All models have the same attributes of name, office_id, etc.
I am wondering how I can restructure the database or the query to do better. Incidentally, I am using a Postgres db, v9.6.
What about joins and count?
Medication.left_outer_joins(:records)
          .select('medications.name, medications.id, COUNT(records.id) AS records_count')
          .group(:id)
          .where(office_id: current_user.selected_office)
If you're planning to count the records per medication, then you can use a join and count the records under an alias.
As each model has name and id columns, you need to be more precise when referencing them; and group, in this case (correct me if I'm wrong), is mandatory, otherwise you'd get a PG::GroupingError.
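Each object the scope returns responds to the SQL alias as a method, so all the counts arrive in one query; a minimal sketch of the usage:
@medications = Medication.left_outer_joins(:records)
                         .select('medications.name, medications.id, COUNT(records.id) AS records_count')
                         .group(:id)
                         .where(office_id: current_user.selected_office)

# records_count is read straight off each row; no extra COUNT queries run.
@medications.each { |m| puts "#{m.name}: #{m.records_count}" }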
I have a query that loads thousands of objects and I want to tame it by using find_in_batches:
Car.includes(:member).where(:engine => "123").find_in_batches(batch_size: 500) ...
According to the docs, I can't have a custom sorting order: http://www.rubydoc.info/docs/rails/4.0.0/ActiveRecord/Batches:find_in_batches
However, I need a custom sort order of created_at DESC. Is there another way to run this query in chunks, like find_in_batches does, so that not so many objects live on the heap at once?
Hm, I've been thinking about a solution for this (I'm the person who asked the question). It makes sense that find_in_batches doesn't allow a custom order. Let's say you sort by created_at DESC and specify a batch_size of 500. The first loop goes from 1-500, the second loop goes from 501-1000, etc. What if, before the 2nd loop occurs, someone inserts a new record into the table? It would be put at the top of the query results, your results would be shifted one to the left, and your 2nd loop would include a repeat.
You could argue that created_at ASC would then be safe, but that's not guaranteed if your app ever sets a created_at value explicitly.
UPDATE:
I wrote a gem for this problem: https://github.com/EdmundMai/batched_query
Since using it, the average memory of my application has HALVED. I highly suggest anyone having similar issues to check it out! And contribute if you want!
The slower manual way to do this is something like the following:
count = Car.where(engine: "123").count
batches = (count / 500.0).ceil  # round up so the final partial batch is included
batches.times do |i|
  # pluck just the ids for this batch, in the custom sort order
  ids = Car.where(engine: "123")
           .order(created_at: :desc)
           .limit(500)
           .offset(i * 500)
           .ids
  cars = Car.includes(:member).where(id: ids).order(created_at: :desc)
  # cars.each or cars.update_all
  # do your updating
end
Can you imagine how find_in_batches with sorting would work on 1M rows or more? It would sort all the rows for every batch.
So I think it is better to reduce the number of sort calls. For example, with a batch size of 500, you can load only the IDs (with sorting) for N * 500 rows at a time, and afterwards just load each batch of objects by those IDs. That way, the number of sorted queries hitting the database drops by a factor of N.
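A minimal sketch of that idea, assuming N = 10 (one sorted query per 5,000 ids) and the Car scope from the question:
offset = 0
loop do
  # one sorted query fetches the ids for the next 10 batches
  ids = Car.where(engine: "123")
           .order(created_at: :desc)
           .limit(5_000)
           .offset(offset)
           .ids
  break if ids.empty?
  ids.each_slice(500) do |batch_ids|
    # loading by primary key needs no sort on the database side
    cars = Car.includes(:member).where(id: batch_ids)
    cars = cars.sort_by { |c| batch_ids.index(c.id) }  # restore the sorted order in Ruby
    # process cars...
  end
  offset += ids.size
end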
I'm running the following query:
User.where("number > ?", 5).order(&:age).first(20)
I noticed that the speed of the query was about the same whether I replaced "first(20)" with "first(200)" or even just "first". This seems to imply that all records are retrieved by the server, no matter how many records I actually want in the array. Are there any ways to possibly expedite this process?
The performance may well be similar, because in general the database is going to have to identify all of the rows that match the conditions, then order them all, then read the first n rows from the sorted set. If n is 200 then obviously it will have to return more rows to the application, but the primary driver on database performance is probably not the quantity of rows returned but the quantity of rows to be ordered.
As others state:
User.where("number > ?", 5).order(:age).limit(20)
... or to get those with the highest age ...
User.where("number > ?", 5).order(:age => :desc).limit(20)
(Rails 4 syntax)
There are occasions when the database can use an index to provide the sort order, in which case you'd likely see a much larger performance difference between 20 or 200 rows.
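When such an index exists, the database can read rows already in age order and stop after the first 20 matches; a hypothetical migration sketch (table and column names taken from the question; adjust the migration version to your Rails):
# An index on age lets the planner walk rows in sorted order and stop at
# the LIMIT, instead of sorting every row that matches the WHERE clause.
class AddAgeIndexToUsers < ActiveRecord::Migration[7.0]
  def change
    add_index :users, :age
  end
end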
You can perform the query with limit:
User.where("number > ?", 5).order(:age).limit(20)
Check this Rails Guides article for more examples.
Good luck!
You can use limit since you're ordering the results:
User.where("number > ?", 5).order('age desc').limit(20)