ActiveRecord: size vs count

In Rails, you can find the number of records using both Model.size and Model.count. If you're dealing with more complex queries, is there any advantage to using one method over the other? How are they different?
For instance, I have users with photos. If I want to show a table of users and how many photos they have, will running many instances of user.photos.size be faster or slower than user.photos.count?
Thanks!

Which method you use depends on your needs. Basically:
if you already have all the entries loaded, say User.all, then you should use length to avoid another DB query
if you haven't loaded anything, use count to issue a count query against your DB
if you don't want to bother with these considerations, use size, which will adapt (see the sketch below)
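A minimal sketch of the three behaviours on Rails 4+ (model and column names hypothetical):
users = User.where(admin: true)  # no query yet, relations are lazy
users.count                      # issues SELECT COUNT(*) ..., loads no records
users.load                       # issues SELECT * ..., loads the records
users.length                     # counts the in-memory array, no query
users.size                       # also no query here, because the relation is loaded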

As the other answers state:
count will perform an SQL COUNT query
length will calculate the length of the resulting array
size will try to pick the most appropriate of the two to avoid excessive queries
But there is one more thing. We noticed a case where size acts differently to count/length altogether, and I thought I'd share it since it is rare enough to be overlooked.
If you use a :counter_cache on a has_many association, size will use the cached count directly, and not make an extra query at all.
class Image < ActiveRecord::Base
  belongs_to :product, counter_cache: true
end

class Product < ActiveRecord::Base
  has_many :images
end
> product = Product.first # query, load product into memory
> product.images.size # no query, reads the :images_count column
> product.images.count # query, SQL COUNT
> product.images.length # query, loads images into memory
This behaviour is documented in the Rails Guides, but I either missed it the first time or forgot about it.

tl;dr
If you know you won't be needing the data, use count.
If you know you will use or have used the data, use length.
If you don't know where it will be used, or the speed difference is negligible, use size...
count
Resolves to sending a SELECT COUNT(*)... query to the DB. The way to go if you don't need the data, but just the count.
Example: count of new messages, total elements when only a page is going to be displayed, etc.
length
Loads the required data, i.e. runs the query as required, and then just counts it. The way to go if you are using the data.
Example: Summary of a fully loaded table, titles of displayed data, etc.
size
It checks whether the data is already loaded (i.e. already in Rails); if so, it just counts it, otherwise it calls count. (Plus the pitfalls already mentioned in other answers.)
def size
  loaded? ? @records.length : count(:all)
end
What's the problem?
You might hit the DB twice if you don't do things in the right order (e.g. if you render the number of elements above the rendered table itself, there will effectively be 2 calls sent to the DB), as sketched below.
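A sketch of that double hit (model and column names hypothetical):
posts = Post.all
posts.size                      # not loaded yet, so this issues a COUNT query
posts.each { |p| puts p.title } # rendering the rows loads them: a second query
# Loading first (posts.load) and then calling size counts in memory instead.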

Sometimes size "picks the wrong one" and returns a hash (which is what count does for a grouped relation).
In that case, use length to get an Integer instead of a hash.
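A sketch of the grouped case (the role column is hypothetical): on a relation with a group clause, count and size both return a hash keyed by group, while length loads the records and returns a plain Integer.
User.group(:role).count  # => { "admin" => 2, "member" => 40 }
User.group(:role).size   # the same hash, not an Integer
User.group(:role).length # loads the records and returns 42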

The following strategies all make a call to the database to perform a COUNT(*) query.
Model.count
Model.all.size
records = Model.all
records.count
The following is not as efficient, as it loads all the records from the database into Ruby and then counts the size of the collection (note the explicit load; on an unloaded relation, size would just issue a COUNT as above).
records = Model.all.load
records.size
If your models have associations and you want to find the number of belonging objects (e.g. @customer.orders.size), you can avoid database queries (disk reads). Use a counter cache and Rails will keep the cache value up to date, and return that value in response to the size method.

I recommend using the size method.
class Customer < ActiveRecord::Base
  has_many :customer_activities
end

class CustomerActivity < ActiveRecord::Base
  belongs_to :customer, counter_cache: true
end
Consider these two models. The customer has many customer activities.
If you use a :counter_cache on a has_many association, size will use the cached count directly, and not make an extra query at all.
Consider one example: in my database, one customer has 20,000 customer activities, and I counted that customer's activities with each of the count, length and size methods. The benchmark report for all three is below.
              user     system      total        real
Count:    0.000000   0.000000   0.000000 (  0.006105)
Size:     0.010000   0.000000   0.010000 (  0.003797)
Length:   0.030000   0.000000   0.030000 (  0.026481)
So I found that with a :counter_cache in place, size is the best option for counting records.
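For the counter cache to work, the customers table needs an integer column named after the association; a hypothetical migration (Rails 4-era syntax):
class AddCustomerActivitiesCountToCustomers < ActiveRecord::Migration
  def change
    # counter_cache: true looks for a column called <association>_count
    add_column :customers, :customer_activities_count, :integer, default: 0, null: false
  end
end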

Here's a flowchart to simplify your decision-making process. Hope it helps.
Source: Difference Between the Length, Size, and Count Methods in Rails

Related

Rails fastest way to get table records count

I have a user table in a PostgreSQL database. If I run User.count, it takes 150ms to get the result, which is far too slow for us. Ideally it should take less than 10ms. Is there any way to cache the SQL result at the model level? Something like:
def self.total_count
  User.count.cached # that's my imagination
end
In my opinion, there are several ways you could go about this:
You could have another table that stores the total number of users, incrementing the count there when a user is added or deleted, or at frequent time intervals.
If your table is extremely big and accuracy is not the most important thing, you can also look into Postgres' count estimate (a sketch of running it from Rails follows this list):
SELECT reltuples AS approximate_row_count FROM pg_class WHERE relname = 'users';
You should look into counter_cache. It will work great if your users belong_to another model: http://guides.rubyonrails.org/association_basics.html
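A hedged sketch of running that estimate from Rails (table name assumed to be users):
# Approximate row count from PostgreSQL's planner statistics; fast, but only
# as fresh as the last ANALYZE / autovacuum run.
approx_users = ActiveRecord::Base.connection.select_value(
  "SELECT reltuples::bigint FROM pg_class WHERE relname = 'users'"
).to_i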

Query Optimization with ActiveRecord for each method

The query mentioned below is taking too much time, and I can't work out how to optimize it.
Code and Associations:
temp = []
platforms = current_user.company.advisory_platforms
platforms.each { |x| temp << x.advisories.published.collect(&:id) }
class Advisory
  has_many :advisory_platforms, :through => :advisory_advisory_platforms
end

class AdvisoryPlatform
  has_many :companies, :through => :company_advisory_platforms
  has_many :company_advisory_platforms, :dependent => :destroy
  has_many :advisory_advisory_platforms, :dependent => :destroy
  has_many :advisories, :through => :advisory_advisory_platforms
end
There are three glaring performance issues in your example.
First, you are iterating the records using each, which means that you are loading the entire record set into memory at once. If you must iterate records this way, you should always use find_each so it is done in batches.
Second, every iteration of your each loop is performing an additional SQL call to get its results. You want to limit SQL calls to the bare minimum.
Third, you are instantiating entire Rails models simply to collect a single value, which is very wasteful. Instantiating Rails models is expensive.
I'm going to solve these problems in two ways. First, construct an ActiveRecord relation that will access all the data you need in one query. Second, use pluck to grab the id you need without paying the model instantiation cost.
You didn't specify what published is doing so I am going to assume it is a scope on Advisory. You also left out some of the data model so I am going to have to make assumptions about your join models.
advisory_ids = AdvisoryAdvisoryPlatform
  .where(advisory_platform_id: current_user.company.advisory_platforms)
  .where(advisory_id: Advisory.published)
  .pluck(:advisory_id)
If you pass a Relation object as the value of a field, ActiveRecord will convert it into a subquery.
So
where(advisory_id: Advisory.published)
is analogous to
WHERE advisory_id IN (SELECT id FROM advisories WHERE published = true)
(or whatever the published scope is doing).
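To sanity-check what gets sent to the database, you can print the generated SQL before running anything:
relation = AdvisoryAdvisoryPlatform
  .where(advisory_platform_id: current_user.company.advisory_platforms)
  .where(advisory_id: Advisory.published)
puts relation.to_sql # should show WHERE ... IN (SELECT ...) subqueries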

Queries with include in Rails

I have the following problem. I need to run a big query on a table named professionals, and I need to optimize it because for each professional I also query the associated tables.
But I have a problem with two associated tables: comments and tariffs.
Comments:
I need to fetch 3 comments for each professional. I tried:
@professionals.includes(:comments).where(:comments => { type: 0 }).last(3)
The problem is that this query only returns 3 professionals, not what I need: all the professionals, each with only three comments where type equals zero.
And when I try:
@professionals.includes(:comments).where(:comments => { type: 0 })
the result is only the professionals that have comments, when I need all professionals, with or without comments. But if a professional has comments, I only need the last three where type equals zero.
Tariffs:
With tariffs I have a similar problem; in this case I need the last 4 tariffs for each professional. I tried:
@professionals.includes(:tariffs).last(4)
But that only returns the last 4 professionals.
Models:
class Comment < ActiveRecord::Base
  belongs_to :client
  belongs_to :professional
end

class Professionals < ActiveRecord::Base
  has_many :comments # note: the association name must be plural
end
You can't use limit on the joining table in ActiveRecord. The limit is applied to the first relation, which in this case happens to be @professionals.
You have a few choices:
Preload all comments for each professional and limit them on output (reduces the number of queries needed but increases memory consumption since you are potentially preloading a lot of AR objects).
Lazy load the required number of comments (increases the number of queries by n+1, but reduces the potential memory consumption).
Write a custom query with raw SQL.
If you preload everything, then you don't have to change much. Just limit the number of comments while iterating through each professional.
@professionals.each do |professional|
  professional.comments.first(3) # first(3) reads from the preloaded array; limit(3) would issue a new query
end
If you lazy load only what you need, then you would apply the limit scope to the comments relation.
@professionals = Professionals.all
@professionals.each do |professional|
  professional.comments.where(type: 0).limit(3) # one query per professional
end
Writing a custom query is a bit more complex, and you might find it less performant depending on the number of joins you have to make and how well indexed your tables are.
I suggest you take approach two, and use query and fragment caching to improve performance. For example:
- cache @professionals do
  - @professionals.each do |professional|
    - cache professional do
      = professional.name
This approach will hit the database the first time, but on subsequent loads the comments will be read from the cache, avoiding the DB hit. You can read more about caching in the Rails Guides.

How to group by multiple attributes on children, and then count?

In a Rails 3.2 app I have a User model that has many Awards.
The Award class has :type, :level and :image attributes.
On a User's show page I want to show their Awards, but with some criteria. User.awards should be grouped by both type and level, and for each type-level combination I want to display its image, and a count of the awards.
I'm struggling to construct the queries and views to achieve this (and to explain this clearly).
How can I group on two attributes of a child record, and then display both a count and attribute (i.e. image) of those children?
It took me some time to figure this out because of the complicated mix of ActiveRecord objects, arrays and grouped arrays.
Anyway, in case this is useful for anyone else:
Given a User has many Awards, and Award has attributes :type, :level, :image.
for award in @user.awards.group_by { |award| [award.type, award.level] }.sort_by { |award| [award[0][0], award[0][1]] }
  puts "#{award[0][0].capitalize} - Level #{award[0][1]}" # e.g. Award_Name - Level 1
  puts award[1].first.image # outputs the value of award.image, i.e. the image url
  puts award[1].count # counts the number of grouped awards
end
A bit fiddly! Maybe there are ways to optimize this code?
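One possibility: a slightly tidier sketch of the same idea, destructuring the grouped keys (same assumptions about the Award attributes):
grouped = @user.awards.group_by { |a| [a.type, a.level] }.sort_by(&:first)
grouped.each do |(type, level), awards|
  puts "#{type.capitalize} - Level #{level}"
  puts awards.first.image # all awards in the group share the same image
  puts awards.count       # number of awards in this type-level group
end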
Depending on the database you're using, you can build a custom SQL query using a GROUP BY on type and level:
SELECT * FROM awards GROUP BY awards.type, awards.level
(Postgres has a strict interpretation of GROUP BY, so check the documentation of the database you're using.)
To write it in Rails read the documentation: http://guides.rubyonrails.org/active_record_querying.html#group
For the count you'll have to do it in a second step (in Ruby you could call the size method on the Array of ActiveRecord objects the query returns), or let the database count for you, as sketched below.
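On newer Rails (4+), ActiveRecord can also group on multiple columns and count in a single query; a sketch under that assumption:
# One query; keys are [type, level] pairs, values are counts per group.
counts = @user.awards.group(:type, :level).count
# e.g. => { ["gold", 1] => 3, ["silver", 2] => 1 } (illustrative output)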

Doing analytics on a large table in Rails / PostGreSQL

I have a "Vote" table in my database which is growing in size everyday, currently at around 100 million rows. For internal analytics / insights I used to have a rake task which would compute a few basic metrics, like the number of votes made daily in the past few days. It's just a COUNT with a where clause on the date "created_at".
This rake task was doing fine until I deleted the index on "created_at" because it seems that it had a negative impact on the app performance for all the other user-facing queries that didn't need this index, especially when inserting a new row.
Currently I don't have a lot of insights as to what is going on in my app and in this table. However I don't really want to add indexes on such a large table if it's only for my own use.
What else can I try?
Alternately, you could sidestep the Vote table altogether and keep an external tally.
Every time a vote is cast, a separate tally class that keeps a running count of votes cast will be invoked. There will be one tally record per day. A tally record will have an integer representing the number of votes cast on that day.
Each increment call to the tally class will find a tally record for the current date (today), increment the vote count, and save the record. If no record exists, one will be created and incremented accordingly.
For example, let's have a class called VoteTally with two attributes: a date (date), and a vote count (integer), no timestamps, no associations. Here's what the model will look like:
class VoteTally < ActiveRecord::Base
  def self.tally_up!
    find_or_create_by_date(Date.today).increment!(:votes)
  end

  def self.tally_down!
    find_or_create_by_date(Date.today).decrement!(:votes)
  end

  def self.votes_on(date)
    find_by_date(date).votes
  end
end
Then, in the Vote model:
class Vote < ActiveRecord::Base
  after_create :tally_up
  after_destroy :tally_down

  # ...

  private

  def tally_up   ; VoteTally.tally_up!   ; end
  def tally_down ; VoteTally.tally_down! ; end
end
These methods will get vote counts:
VoteTally.votes_on Date.today
VoteTally.votes_on Date.yesterday
VoteTally.votes_on 3.days.ago
VoteTally.votes_on Date.new(2013, 5, 28)
Of course, this is a simple example and you will have to adapt it to suit. This will result in an extra query during vote casting, but it's a hell of a lot faster than a where clause on 100M records with no index. Minor inaccuracies are possible with this solution, but I assume that's acceptable given the anecdotal nature of daily vote counts.
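The table behind VoteTally might look like this (hypothetical migration, matching the attributes described above):
class CreateVoteTallies < ActiveRecord::Migration
  def change
    create_table :vote_tallies do |t|
      t.date :date, null: false
      t.integer :votes, null: false, default: 0
    end
    # One tally row per day; the unique index also speeds up votes_on lookups.
    add_index :vote_tallies, :date, unique: true
  end
end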
It's just a COUNT with a where clause on the date "created_at".
In that case the only credible index you can use is the one on created_at...
If write performance is an issue (methinks it's unlikely...) and you're using a composite primary key, clustering the table using that index might help too.
If the index really has an impact on write performance, and it's only a few people who run statistics now and then, you might consider another, more general approach:
You could separate your "transaction processing database" from your "reporting database".
You could update your reporting database on a regular basis and create reporting-only indexes there. What's more, reporting queries will not conflict with transaction-oriented traffic, and it doesn't matter how long they run.
Of course, this introduces a certain delay and increases system complexity. On the other hand, if you roll forward your reporting database on a regular basis, you can ensure that your backup scheme actually works.
