The problem with the typical Rails pagination gem is that it runs 2 queries: one for the page you're on and one for the total count. When you don't care how many pages there are (e.g. in an endless scroll), that 2nd query is unnecessary: just add 1 to the LIMIT clause of the 1st query and you know whether there are more results or not.
Is there a gem that'll do pagination without the 2nd query? The 2nd query is expensive when applying non-indexed filters in my WHERE clause on large datasets and indexing all my various filters is unacceptable because I need my inserts to be fast.
Thanks!
Figured it out. When using the will_paginate gem, you can supply your own total_entries option to ActiveRecord::Base.paginate. This makes it so the 2nd query doesn't run.
This works for sufficiently large datasets where you only care about recent entries.
This isn't necessarily acceptable if you actually expect to hit the end of your list because if the list size is divisible by per_page you're going to query an empty set on your last query. With endless scroll, this is fine. With a manual "load more" button, you'll be displaying "load more" at the very end when there are no more items to load.
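The edge case described above can be reproduced with a small plain-Ruby simulation. This is only a sketch: full_page_heuristic is a hypothetical helper, an in-memory array stands in for the table, and it assumes the "load more" decision is based on whether the page came back full.

```ruby
# Hypothetical helper: paginate an in-memory array and guess "more?" from
# whether the page came back full (no COUNT, no extra row fetched).
def full_page_heuristic(records, page:, per_page:)
  items = records.drop((page - 1) * per_page).first(per_page)
  { items: items, more: items.size == per_page }
end

records = (1..40).to_a # 40 is divisible by per_page = 20

page2 = full_page_heuristic(records, page: 2, per_page: 20)
page3 = full_page_heuristic(records, page: 3, per_page: 20)

puts page2[:more]         # true, although page 2 is really the last page
puts page3[:items].empty? # true: the extra request returns nothing
```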
The standard approach, as you've identified, is to fetch N+1 records when you need N and if you get more than N records in the response, there is at least one additional page of results you can display.
The only reason you'd want to do an explicit COUNT(*) call is if you need to know specifically how many more records you will need to fetch. On some engines this can take a good chunk of time to compute so it is best avoided especially if the value is never directly used.
Since this is so simple, you really don't need a plugin to do it. Plugins like will_paginate are more concerned with the number of pages available, so they do the count operation.
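A minimal sketch of the N+1 fetch, in plain Ruby so it runs without Rails. fetch_page is a hypothetical helper and the array stands in for the table; with ActiveRecord the same idea would presumably be expressed with limit(per_page + 1) and offset.

```ruby
# Hypothetical helper: ask for per_page + 1 rows and use the extra row only
# as a "there is more" signal. With ActiveRecord the equivalent query would
# use limit(per_page + 1) and offset((page - 1) * per_page).
def fetch_page(records, page:, per_page:)
  rows = records.drop((page - 1) * per_page).first(per_page + 1)
  { items: rows.first(per_page), more: rows.size > per_page }
end

records = (1..40).to_a

first_page = fetch_page(records, page: 1, per_page: 20)
last_page  = fetch_page(records, page: 2, per_page: 20)

puts first_page[:more]      # true
puts last_page[:items].size # 20
puts last_page[:more]       # false
```

Because the extra row is discarded before display, the caller never sees more than per_page items, and the last page correctly reports that nothing follows even when the list size is divisible by per_page.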
Related
My current code is:
first_three_posts = Post.first(3)
last_three_posts = Post.last(3)
This hits the database twice.
Is there any way I can reduce it to one query?
Since you want the first and last elements of an ordered table, the only option you have for executing a single query (strongly discouraged) is to extract the whole dataset and take the head and tail of the resulting collection in Ruby itself.
Needless to say, unless your Post collection is very small, it is much faster to just run 2 different queries.
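For illustration only, the single-load approach looks like this in plain Ruby. The all_posts array is a stand-in for the result of loading every post in one query; the ids are made up.

```ruby
# Stand-in for the result of loading every post in a single query
# (for example Post.order(:id).to_a in ActiveRecord).
all_posts = (1..10).map { |id| { id: id } }

# Head and tail are taken in Ruby, after everything is already in memory.
first_three_posts = all_posts.first(3)
last_three_posts  = all_posts.last(3)

puts first_three_posts.map { |post| post[:id] }.inspect # [1, 2, 3]
puts last_three_posts.map { |post| post[:id] }.inspect  # [8, 9, 10]
```

The cost is that all ten (or ten million) rows are loaded to keep six of them, which is why two small queries are usually the better trade.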
This is a follow-up to this last question I asked: Sort Users by Number of Followers. That code is:
@ordered_users = User.all.sort{|a,b| b.followers.count <=> a.followers.count}
What I hope to accomplish is take the ordered users and get the top 100 of those and then randomly choose 5 out of that 100. Is there a way to accomplish this?
Thanks.
users_in_descending_order_of_followers = User.all.sort_by { |u| -u.followers.count }
sample_of_top = users_in_descending_order_of_followers.take(100).sample(5)
You can use sort_by which can be easier to use than sort, and combine take and sample to get the top 100 users and sample 5 of those users.
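The same chain can be tried without a database. In this sketch UserStub is a hypothetical stand-in for the User model, with the follower count stored directly on the struct.

```ruby
# Hypothetical stand-in for the User model: only a name and a follower count.
UserStub = Struct.new(:name, :followers_count)

users = (1..500).map { |i| UserStub.new("user#{i}", i) }

# Sort descending by follower count, keep the top 100, then pick 5 at random.
top_hundred = users.sort_by { |u| -u.followers_count }.take(100)
random_five = top_hundred.sample(5)

puts random_five.size # 5
```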
User.all.sort can cause problems in the long run, depending on the total number of users and the available resources, particularly memory. It is also a lot slower, because the sort block calls .followers.count twice per comparison and each call is a separate DB query, which means on the order of 2 x N extra queries or more, N being the number of users. This is because User.all.sort executes the User.all query immediately and fetches every User record into memory, unlike a plain User.all, which is lazily loaded until you iterate over it (for example with .each, or better yet .find_each).
I suggest something like below (I extended Deekshith's answer referring to your link to the other question):
User.joins(:followers).group('users.id').order('count(followers.user_id) desc').limit(100).sample(5)
.joins, .group, .order, and .limit above each extend a single SQL query (.group is needed so that the count is computed per user). That query is executed once, and only then does .sample(5) run, which is no longer SQL but a plain Ruby method on the loaded records, yielding the result you need.
I would strongly consider using a counter cache on the User model, to hold the count of followers.
This would give a very small performance impact on adding or removing followers, and greatly increase performance when performing sorts:
User.order(followers_count: :desc)
This would be particularly noticeable if you wanted the top-n users by follower count, or finding users with no followers.
User.order(followers_count: :desc).limit(100).sample(5)
This method will out-perform others using count(*). Add an index on followers_count for best effect.
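The idea behind a counter cache can be modelled in a few lines of plain Ruby. This is a concept sketch, not ActiveRecord: CachedUser and its methods are hypothetical, and in Rails the bookkeeping would normally be handled by the counter_cache option on the belongs_to side of the association.

```ruby
# Plain-Ruby model of a counter cache: the count is updated on every write
# so that reads and sorts never have to count followers.
class CachedUser
  attr_reader :name, :followers_count

  def initialize(name)
    @name = name
    @followers = []
    @followers_count = 0
  end

  def add_follower(follower)
    @followers << follower
    @followers_count += 1
  end

  def remove_follower(follower)
    # Only decrement when the follower was actually present.
    @followers_count -= 1 if @followers.delete(follower)
  end
end

alice = CachedUser.new("alice")
bob = CachedUser.new("bob")
alice.add_follower("carol")
alice.add_follower("dave")
bob.add_follower("carol")
alice.remove_follower("dave")
alice.add_follower("erin")

ranked = [bob, alice].sort_by { |u| -u.followers_count }
puts ranked.map(&:name).inspect # ["alice", "bob"]
```

Each write pays a small constant cost, and sorting reads a stored integer instead of counting rows.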
Can Kaminari work without hitting the DB with a COUNT(*) query?
My app's database is huge and counting the items takes much much longer than getting the items itself, leading to performance issues.
Suggestions for other pagination solutions with large datasets are also welcome.
Paginating Without Issuing SELECT COUNT Query
Generally the paginator needs to know the total number of records to display the links, but sometimes we don't need the total number of records and just need the "previous page" and "next page" links. For such use case, Kaminari provides without_count mode that creates a paginatable collection without counting the number of all records. This may be helpful when you're dealing with a very large dataset because counting on a big table tends to become slow on RDBMS.
Just add .without_count to your paginated object:
User.page(3).without_count
In your view file, you can only use simple helpers like the following instead of the full-featured paginate helper:
<%= link_to_prev_page @users, 'Previous Page' %>
<%= link_to_next_page @users, 'Next Page' %>
Source: github.com/kaminari
Well, Kaminari or will_paginate needs to count total somehow in order to determine total_pages to be rendered. This is inevitable. My solution was to look at the database query and try to optimize it. That's the way to go.
(this answer is outdated, see above answers)
We have a case where we do want a total count, but don't want to hit the database for it — our COUNT query takes a couple of seconds in some cases, even with good indexes.
So we've added a counter cache to the parent table, keep it up to date with triggers, and override the total_count singleton on the Relation object:
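my_house = House.find(1)
paginated = my_house.cats.page(1)
# define_singleton_method takes a block, so the local my_house stays visible
# (a plain `def paginated.total_count` opens a new scope and cannot see it)
paginated.define_singleton_method(:total_count) { my_house.cats_count }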
... and all the things that require counts work without making that query.
This is an unusual thing to do. Maintaining a counter cache has some costs. There may be weird side effects if you do further relational stuff with your paginated data. Overriding singleton methods can sometimes make debugging into a nightmare. But used sparingly and documented well, you can get the behavior you want with good performance.
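The singleton override can be tried in isolation with plain Ruby. In this sketch FakeRelation is a hypothetical stand-in for the paginated relation, and cached_cats_count stands in for the counter cache value; the override is written with define_singleton_method so that the block can read the local variable.

```ruby
# Hypothetical stand-in for a paginated relation whose total_count would
# normally run a slow COUNT query.
class FakeRelation
  def total_count
    raise "slow COUNT query executed"
  end
end

cached_cats_count = 42 # stand-in for my_house.cats_count
paginated = FakeRelation.new

# define_singleton_method takes a block, so it can read the local variable;
# the override applies to this one object only.
paginated.define_singleton_method(:total_count) { cached_cats_count }

puts paginated.total_count # 42
```

Other instances of the class keep the original method, which is part of why this kind of override can be hard to spot while debugging.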
My question is about how to perform varying levels of search into a database while limiting the number of queries.
Let's start simple:
@companies = Company.where("active = ?", true)
Let's say we display records from this set. Then, we need:
@clientcompanies = @companies.where("client_id = ?", @client.id)
We display something from @clientcompanies. Then, we want to drill down further.
@searchcompanies = @clientcompanies.where("name LIKE ? OR notes LIKE ?", "#{params[:search]}%", "#{params[:search]}%")
Are these three statements the most efficient way to go about this?
If indeed the database is starting with the entire Company table each time around, is there a way to limit the scope so each of the above statements would take a shorter amount of time as the size of the set diminishes?
In case it matters, I'm running Rails 3 on both MySQL and PostgreSQL.
It doesn't get much more optimized than what you're already doing. Exactly zero of those statements will execute a SQL query until you try to iterate over the results. The query is executed when you call methods like all, first, inspect, any?, each, etc.
Each time you chain on a new where or other arel method, it appends to the SQL query that will be executed at the end. If, somewhere in the middle, you want to see the query that will be executed, you can do puts @searchcompanies.to_sql
Note that if you run these commands in the console each statement appears to run a SQL query only because the console automatically runs .inspect on the line you entered.
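The deferred execution described above can be imitated in plain Ruby. This is a much-simplified sketch, not how ActiveRecord is implemented: LazyQuery, its log, and the sample rows are all hypothetical.

```ruby
# Hypothetical, much-simplified stand-in for a lazy relation: where only
# records a condition, and the "query" runs when results are first needed.
class LazyQuery
  include Enumerable

  def initialize(rows, log, conditions = [])
    @rows = rows
    @log = log
    @conditions = conditions
  end

  # Returns a new query object; nothing is executed here.
  def where(&condition)
    LazyQuery.new(@rows, @log, @conditions + [condition])
  end

  # Iterating is what finally "runs" the query, once, with every condition.
  def each(&block)
    @log << "query with #{@conditions.size} condition(s)"
    @rows.select { |row| @conditions.all? { |c| c.call(row) } }.each(&block)
  end
end

log = []
rows = [
  { name: "Acme", active: true, client_id: 1 },
  { name: "Apex", active: true, client_id: 2 },
  { name: "Bolt", active: false, client_id: 1 }
]

companies = LazyQuery.new(rows, log)
                     .where { |r| r[:active] }
                     .where { |r| r[:client_id] == 1 }
                     .where { |r| r[:name].start_with?("A") }

queries_before = log.size
names = companies.map { |r| r[:name] }

puts queries_before # 0, nothing has run yet
puts names.inspect  # ["Acme"]
puts log.inspect    # ["query with 3 condition(s)"]
```

Three chained conditions produce one execution, which mirrors why the three where calls in the question do not mean three trips to the database.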
Hopefully I answered your question :)
There's a great Railscast here: http://railscasts.com/episodes/239-activerecord-relation-walkthrough that explains how ActiveRecord::Relation works, and what you can do with it.
EDIT:
I may have mis-understood your question. You indicated that after each where call you were displaying information from the query. What's the use-case for this? Are you displaying all companies on the same page that you have filtered-out companies from a search? If you display something from that very first query then you will be pulling every single company row from your database (which is not going to be very scalable or performant at larger quantities of company entries).
Would it not make sense to only display information from the @searchcompanies variable?
On Ruby on Rails, say, if the Actor model object is Tom Hanks, and the "has_many" fans is 20,000 Fan objects, then
actor.fans
gives an Array with 20,000 elements. Probably, the elements are not pre-populated with values? Otherwise, getting each Actor object from the DB can be extremely time consuming.
So it is on a "need to know" basis?
So does it pull data when I access actor.fans[500], and pull data when I access actor.fans[0]? If it jumps from each record to record, then it won't be able to optimize performance by doing sequential read, which can be faster on the hard disk because those records could be in the nearby sector / platter layer -- for example, if the program touches 2 random elements, then it will be faster just to read those 2 records, but what if it touches all elements in random order, then it may be faster just to read all records in a sequential way, and then process the random elements. But how will RoR know whether I am doing only a few random elements or all elements in random?
Why would you want to fetch 20,000 records if you only use 2 of them? Fetch only those two from the DB. If you want to list the fans, then you will probably use pagination, i.e. use limit and offset in your query, or a pagination gem like will_paginate.
I see no logical reason to go about it the way you describe. Explain a real situation so we can help you.
However, there is one thing you need to know while loading many associated objects from the DB: use :include, like
Actor.all(:include => :fans)
This will eager-load all the fans, so there will only be 2 queries instead of N+1, where N is the number of actors.
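The difference in query counts can be shown with a plain-Ruby simulation. FakeDb is a hypothetical in-memory stand-in that only logs the queries it is asked to run; it is not how ActiveRecord performs eager loading internally.

```ruby
# Hypothetical in-memory "database" that logs every query it is asked to run.
class FakeDb
  attr_reader :queries

  def initialize(actors, fans)
    @actors = actors
    @fans = fans
    @queries = []
  end

  def all_actors
    @queries << "SELECT * FROM actors"
    @actors
  end

  def fans_for(actor_ids)
    ids = Array(actor_ids)
    @queries << "SELECT * FROM fans WHERE actor_id IN (#{ids.join(', ')})"
    @fans.select { |fan| ids.include?(fan[:actor_id]) }
  end
end

actors = [{ id: 1 }, { id: 2 }, { id: 3 }]
fans = [{ actor_id: 1 }, { actor_id: 1 }, { actor_id: 3 }]

# Lazy loading: one query for the actors, then one per actor (N + 1).
lazy_db = FakeDb.new(actors, fans)
lazy_db.all_actors.each { |actor| lazy_db.fans_for(actor[:id]) }
puts lazy_db.queries.size # 4

# Eager loading: one query for the actors, one for all of their fans.
eager_db = FakeDb.new(actors, fans)
ids = eager_db.all_actors.map { |actor| actor[:id] }
fans_by_actor = eager_db.fans_for(ids).group_by { |fan| fan[:actor_id] }
puts eager_db.queries.size # 2
```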
Look at the SQL which is spewed out by the server in development mode, and that will tell you how many fan records are being loaded. In this case actor.fans will indeed cause them all to be loaded, which is probably not what you want.
You have several options:
Use a paginator as suggested by Tadas;
Set up another association with the fans table that pulls in just the ones you're interested in. This can be done either with the :conditions option on the has_many statement, e.g.
has_many :uk_fans, :class_name => "Fan", :conditions => "country_of_residence = 'UK'"
or by specifying the full SQL to narrow down the rows returned with the :finder_sql option;
Specify the :limit option, which will, well, limit the number of rows returned.
All depends on what you want to do.