When using Rails and Kaminari, is it possible to limit the maximum number of results to be returned for a given query?
@things = Thing.page(params[:page]).per(10)
Assuming we have 500k+ Thing records in the database, how can I ensure that Kaminari's paginate method will never return more than 10,000 rows to paginate?
Kaminari has a max_per_page option that may be helpful, but it may work via the limit scope method also offered here.
So, I figured out a way to do that using an initializer for Kaminari in config/initializers/kaminari_config.rb:
Kaminari.configure do |config|
  config.default_per_page = 10
  config.max_per_page = 100
  config.max_pages = 100
end
In the above case, 100 * 100 = 10,000 maximum total results
More info here
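As a rough check on that arithmetic, the cap implied by the two settings can be sketched in plain Ruby. This only illustrates the numbers and is not Kaminari's own code; the `clamp_pagination` and `max_reachable_rows` helpers are hypothetical names.

```ruby
# Hypothetical helper: clamps a requested page and page size to configured limits.
# It mirrors the numbers in the initializer; it is not Kaminari's implementation.
def clamp_pagination(page, per_page, max_per_page: 100, max_pages: 100)
  per = per_page.to_i.clamp(1, max_per_page)
  pg  = page.to_i.clamp(1, max_pages)
  { page: pg, per_page: per, offset: (pg - 1) * per }
end

# Largest number of rows reachable by paging within these limits.
def max_reachable_rows(max_per_page: 100, max_pages: 100)
  max_per_page * max_pages
end

max_reachable_rows            # 100 * 100
clamp_pagination(5_000, 500)  # an oversized request is pulled back to page 100, 100 per page
```

With both limits at 100, the last reachable row sits at offset 9,900 plus 100 rows, which is where the 10,000 figure comes from.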
max_per_page applies globally. For per-resource limits, Kaminari can also paginate Array-like objects, so something like the following works well:
limited_newsfeeds = Newsfeed.order(:created_at).limit(20)
@newsfeeds = Kaminari.paginate_array(limited_newsfeeds).page(params[:page]).per(3)
You can then use this @newsfeeds object like a normal collection in Kaminari views, for example:
<%= paginate @newsfeeds, remote: true %>
TIP: Calling limit() directly before or after page() doesn't serve the purpose. It either overwrites Kaminari's limit (when called after page()) or is replaced by Kaminari's limit (when called before page()).
Source: https://github.com/kaminari/kaminari#paginating-a-generic-array-object
So you're interested in just the top 10k rows? This should do it, but I would consider adding some sorting criteria in there (before the call to limit) so that your chosen 10000 rows make sense.
Thing.limit(10000).page(params[:page]).per(10)
I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the "date_sent" attribute is included in all of the queries, which implies that the SQL is searching through all 1M records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks.controller.rb
def index
  # Builds the opened/sent counts for each ESP within one date/IP slice.
  def set_sub_count_hash(thip)
    {
      gmail_hooks: {opened: a = thip.gmail.send(@event).size, total_sent: b = thip.gmail.sent.size, perc_opened: find_perc(a, b)},
      hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
      yahoo_hooks: {opened: a = thip.yahoo.send(@event).size, total_sent: b = thip.yahoo.sent.size, perc_opened: find_perc(a, b)},
      other_hooks: {opened: a = thip.other.send(@event).size, total_sent: b = thip.other.sent.size, perc_opened: find_perc(a, b)},
    }
  end

  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [
    # List of three ip addresses
  ]
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # Stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is actually very simple. Remember that when you call where(condition), the query is not executed against the DB right away.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count or #size, as in your case) and keeps chaining your queries until you need one. In your code, you keep chaining conditions onto the main query inside a loop (date_range). It gets worse: you start another loop inside that one, adding conditions to each query created in the first loop.
Then you pass the query (not yet concrete: it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to run that same query many times.
Therefore you have something like:
10(date_range) * 3(ip list) * 8 # (times the query is materialized in the #set_sub_count method)
and then you have a problem.
What you want to do is to do the whole query at once and group it by date, ip and email. You should have a hash structure after that, which you would pass to the #set_sub_count method and do some ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
  .where(sending_ip: @m_list_of_ips)
OK, now you have one query, which is nice, but I think you should split it into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, the main_query, will not be executed until you ask for materialized results, don't forget that). Still, that is something like 100 times faster.
I think this is the result that should be grouped, mapped and passed to #set_sub_count instead of passing the raw query and calling methods on it every time and many times. It will be a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
  .group('date_sent', 'sending_ip', 'event', 'esp').count
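To illustrate the "Ruby gymnastics" step: when several columns are passed to group, count returns a hash keyed by arrays of the grouped values. Below is a minimal sketch of reshaping such a hash, using a hand-written hash in place of the real query result. The sample dates, IPs and counts are made up, and `opened_percentages` is a hypothetical helper name.

```ruby
require "date"

# Stand-in for MWebhook.where(...).group('date_sent', 'sending_ip', 'event', 'esp').count
# Keys are [date_sent, sending_ip, event, esp]; values are row counts. Sample data only.
grouped_counts = {
  [Date.new(2024, 1, 1), "10.0.0.1", "sent",   "gmail"]   => 200,
  [Date.new(2024, 1, 1), "10.0.0.1", "opened", "gmail"]   => 50,
  [Date.new(2024, 1, 1), "10.0.0.1", "sent",   "hotmail"] => 100,
  [Date.new(2024, 1, 2), "10.0.0.1", "sent",   "gmail"]   => 80,
  [Date.new(2024, 1, 2), "10.0.0.1", "opened", "gmail"]   => 20
}

# Hypothetical helper: builds
#   { [date, ip] => { esp => { opened:, total_sent:, perc_opened: } } }
def opened_percentages(grouped_counts, opened_event: "opened")
  table = Hash.new do |h, k|
    h[k] = Hash.new { |hh, kk| hh[kk] = { opened: 0, total_sent: 0 } }
  end

  # One pass over the grouped result; no further queries are needed.
  grouped_counts.each do |(date, ip, event, esp), count|
    cell = table[[date, ip]][esp]
    cell[:total_sent] += count if event == "sent"
    cell[:opened]     += count if event == opened_event
  end

  # Fill in the percentage once both counts are known.
  table.each_value do |by_esp|
    by_esp.each_value do |cell|
      sent = cell[:total_sent]
      cell[:perc_opened] = sent.zero? ? 0.0 : (100.0 * cell[:opened] / sent).round(1)
    end
  end
  table
end

result = opened_percentages(grouped_counts)
```

All the counting happens in memory on the small grouped result, so the database is hit once rather than once per cell.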
When querying a certain model in my Rails application, it returns the correct results, except for the size, length or count information, even when using the limit criteria.
recipes = Recipe
.where(:bitly_url => /some.url/)
.order_by(:date => :asc)
.skip(10)
.limit(100)
recipes.size # => 57179
recipes.count # => 57179
recipes.length # => 57179
I can't understand why this is happening, it keeps showing the total count of the recipes collection, and the correct value should be 100 since I used limit.
count = 0
recipes.each do |recipe|
count += 1
end
# WAT
count # => 100
Can somebody help me?
Thanks!
--
Rails version: 3.2.3
Mongoid version: 2.4.10
MongoDB version: 1.8.4
From the fine manual:
- (Integer) length
Also known as: size
Get's the number of documents matching the query selector.
But .limit doesn't really alter the query selector as it doesn't change what the query matches, .offset and .limit alter what segment of the matches are returned. This doesn't match the behavior of ActiveRecord and the documentation isn't exactly explicit about this subtle point. However, Mongoid's behaviour does match what the MongoDB shell does:
> db.things.find().limit(2).count()
23
My things collection contains 23 documents and you can see that the count ignores the limit.
If you want to know how many results are returned then you could to_a it first:
recipes.to_a.length
As mentioned in one of the comments, in newer Mongoid versions (not sure which ones), you can simply use recipes.count(true) and this will include the limit, without needing to query the result set, as per the API here.
In the current version of Mongoid (5.x), count(true) no longer works. Instead, count now accepts an options hash, which includes a :limit option:
criteria.count(limit: 10)
Or, to reuse whatever limit is already set on the criteria
criteria.count(criteria.options.slice(:limit))
In my application, I have an array named @apps which is loaded by ActiveRecord with records containing each app's name, environment, etc.
I am currently using @apps.count to get the number of apps in the array, but I am having trouble counting the number of applications in the array where the environment = 0.
I tried @apps.count(0) but that didn't work since there are multiple fields for each record.
I also tried something like @apps.count{ |environment| environment = 0} but nothing happened.
Any suggestions?
Just use select to narrow down to what you want:
@apps.select {|a| a.environment == 0}.count
However, if this is based on ActiveRecord, you'd be better off just making your initial query limit it unless of course you need all of the records and are just filtering them in different ways for different purposes.
I'll assume your model is called App since you are putting them in @apps:
App.where(environment: 0).count
You have the variable wrong. Also, you have assignment instead of comparison.
@apps.count{|app| app.environment == 0}
or
@apps.count{|app| app.environment.zero?}
I would use reduce OR each_with_object here:
reduce docs:
@apps.reduce(Hash.new(0)) do |counts, app|
  counts[app.environment] += 1
  counts
end
each_with_object docs:
@apps.each_with_object(Hash.new(0)) do |app, counts|
  counts[app.environment] += 1
end
If you are able to query, use sql
App.group(:environment).count will return a hash with keys as environment and values as the count.
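The block forms of count and each_with_object work on any Enumerable, so they can be tried without a database. A small sketch, using a Struct as a stand-in for the App model (the sample records are invented):

```ruby
# Stand-in for the ActiveRecord model; only the fields needed here.
App = Struct.new(:name, :environment)

apps = [
  App.new("billing", 0),
  App.new("search",  1),
  App.new("mailer",  0),
  App.new("reports", 2)
]

# Count only the records whose environment is 0.
zero_env_count = apps.count { |app| app.environment.zero? }

# Count every environment in a single pass.
counts_by_env = apps.each_with_object(Hash.new(0)) do |app, counts|
  counts[app.environment] += 1
end
```

The in-memory version is fine once the records are already loaded; when only the numbers are needed, the grouped SQL count avoids loading them at all.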
I am using the kaminari gem for pagination. I have a resources controller which paginates perfectly (due to the simple nature of the ordering). That can be seen here:
@resources = Resource.order("created_at desc").page(params[:page]).per(25)
That just sorts them by latest first. When I call .class on it, it appears to be an ActiveRecord::Relation.
On my tags, though, I want to sort by a relationship (the number of resources assigned to that tag):
@tags = Tag.all.sort{|a, b| b.number_of_resources <=> a.number_of_resources}.page(params[:page]).per(50)
However, it gives me the error: undefined method `page' for #
Tag.all returns an Array, hence your #page call failing, as it expects an ARel relation.
If #number_of_resources maps to a DB column, then all you need to do is:
Tag.order('number_of_resources').page(params[:page]).per(50)
If it's not, you either need to add it to the Tag database table, or just do your sort/paginate in Ruby rather than using kaminari. This will be feasible if the number of tags is under ~1000 or so.
If you do add the info to the db, check out this post: Counter Cache for a column with conditions?
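For the in-Ruby route, here is a minimal sketch of sorting and then slicing out one page by hand. The Tag struct, the sample data and the `page_of` helper are hypothetical stand-ins. With Kaminari installed, wrapping the sorted array in Kaminari.paginate_array is another option that keeps the view helpers working.

```ruby
# Stand-in for the Tag model, with the computed count as a plain field.
Tag = Struct.new(:name, :number_of_resources)

tags = [
  Tag.new("ruby", 12), Tag.new("rails", 30), Tag.new("sql", 7),
  Tag.new("css", 3),   Tag.new("js", 21)
]

# Hypothetical helper: returns the records for a 1-based page number,
# or an empty array when the page is past the end.
def page_of(records, page:, per_page:)
  page = [page.to_i, 1].max
  records.each_slice(per_page).to_a[page - 1] || []
end

# Most-used tags first, then take the requested page.
sorted = tags.sort_by { |tag| -tag.number_of_resources }
first_page  = page_of(sorted, page: 1, per_page: 2).map(&:name)
second_page = page_of(sorted, page: 2, per_page: 2).map(&:name)
```

This loads and sorts every tag on each request, which is why it only suits small tables.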
You should do something like this: 1) join the two tables, 2) group the rows by tag, 3) count how many rows belong to each group, 4) order by that new count column.
Once you have a good SQL statement, you can call pagination on it.
I'm using the will_paginate gem for Ruby. I am using will_paginate to get a bunch of people, sorted on a field. What I want is only the first 100 of those: essentially, the top people. I can't seem to do this. How would I go about it? Thanks
As far as my knowledge goes will_paginate doesn't provide an option for this, I just had a root around in its source too to check, but if you don't mind having a subquery in your conditions it can be done...
people = Person.paginate(page: 10).where('people.id IN (SELECT id FROM people ORDER BY a_field LIMIT 100)').per_page(5)
people.total_pages
=> 20
Replace a_field with the field you're sorting on, and this should work for you.
(Note: the above uses the current version of will_paginate (3.0.2) for its syntax, but the same concept applies to older versions as well, using the options hash.)
The will_paginate gem accepts a total_entries parameter to limit the number of entries to paginate.
@problems = Problem.paginate(page: params[:page], per_page: 10,
                             total_entries: 30)
This gives you 3 pages of 10 records each.
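The page count follows from a ceiling division, which is why 30 entries at 10 per page come out as 3 pages. A quick sketch of that arithmetic (`pages_for` is a hypothetical helper name, not part of will_paginate):

```ruby
# Hypothetical helper: number of pages needed to show total_entries records.
def pages_for(total_entries, per_page)
  (total_entries.to_f / per_page).ceil
end

pages_for(30, 10)   # => 3
pages_for(100, 5)   # => 20
pages_for(31, 10)   # => 4, a partial last page still counts
```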
people = Person.limit(100).paginate(:per_page => 5)