Rails, data structure and performance

Let's say I have a Rails app with 3 tables: one for questions, one for options (the possible answers to a question), and one for votes.
Currently, when requesting the statistics on a given question, I have to make an SQL query for each option that looks in the "votes" table (around 1.5 million entries) and counts the number of times that option has been selected. It's slow and takes 4-5 seconds.
I was thinking of adding a column directly to the questions table which would store the statistics and update them each time someone votes. Is that good practice? It seems redundant with the information that is already in the votes table, only faster to load.
Or maybe I should create another table which would save these statistics for each question?
Thanks for your advice!

Rails offers a feature called counter_cache which will serve your purpose.
Add the counter_cache option to the Vote model:
class Vote < ActiveRecord::Base
  belongs_to :question, :counter_cache => true
end
and the following migration:
add_column :questions, :votes_count, :integer, :default => 0
This will increment the votes_count field in the questions table for every new record in the votes table.
For more info: RailsCast
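Since the votes table already holds around 1.5 million rows, the new counter starts at zero for existing questions. A rough sketch of a one-off backfill, assuming Question has_many :votes, using Rails' standard reset_counters:
# Run once (in a migration or from the console) after adding votes_count.
# reset_counters recounts the associated votes and writes the column.
Question.find_each do |question|
  Question.reset_counters(question.id, :votes)
end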

It would be a wise decision; ActiveRecord::CounterCache is made for exactly that purpose.
Also, there's a Railscast for that.

You can probably do a "clever" SQL query using GROUP BY that will give you the expected result in one query. If your query is that slow, you'll probably also need to add some indexes to your tables.
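As a sketch of that idea (assuming votes carry an option_id column, which isn't shown in the question), one grouped query returns the per-option counts in a single round trip:
# Counts votes per option for one question in a single query.
# Returns a hash like { option_id => count }.
Vote.where(question_id: question.id).group(:option_id).count
An index on votes (question_id, option_id) would let the database answer this from the index alone.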

Related

where query causing a timeout in rails

I have this query in my Rails application where I get the names of the Books.
Book.where(:name => @name_list).pluck(:name)
Basically it finds the Books whose names are present in @name_list and then returns an array of their names.
But since there is a huge number of books in the database, the request times out when I call this particular endpoint.
Please let me know if there is any way to make this faster so that the endpoint will work.
Also, will the query speed increase if we add an index on the name column to the books table?
add_index :books, :name
You asked if an index on books.name will increase the performance of a query filtering books by a given list of names.
The answer is yes. This is exactly the textbook use-case for database indexes. The bigger the table is, the bigger the performance benefit will be. For huge tables, it is not unlikely that queries using the index will be an order of magnitude faster.
I highly suggest adding such an index with the method you already named in your question and try again:
add_index :books, :name
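Once the index is in place, you can confirm the database actually uses it with ActiveRecord's explain (available since Rails 3.2); the exact plan output depends on your adapter:
# Prints the query plan; look for an index scan on the new books.name index.
puts Book.where(:name => @name_list).explain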
I think this is what you're looking for: it's best to process the data in batches to prevent memory bloat, rather than processing this large amount of data at once.
In Rails 3.2, how to "pluck_in_batches" for a very large table
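As a rough sketch of that approach (in_batches is available from Rails 5 onwards; on Rails 3.2 the linked answer's find_in_batches-based technique applies instead):
# Collects the names in fixed-size batches so only one batch of rows
# is materialized in memory at a time.
names = []
Book.where(:name => @name_list).in_batches(of: 1_000) do |batch|
  names.concat(batch.pluck(:name))
end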

Queries with include in Rails

I have the following problem. I need to do a massive query on a table named professionals, and I need to optimize it because for each professional I query the associated tables.
But I have a problem with two associated tables: comments and tariffs.
Comments:
I need to fetch 3 comments for each professional. I tried:
@professionals.includes(:comments).where(:comments => { type: 0 }).last(3)
The problem is that the query only returns 3 professionals, not what I need: all the professionals, each with only three comments where type equals zero.
And when I try:
@professionals.includes(:comments).where(:comments => { type: 0 })
the result is only the professionals that have comments (with all of their comments), when I need all the professionals, with or without comments. And if a professional has comments, I only need the last three comments where type equals zero.
Tariffs:
With tariffs I have a similar problem; in this case I need the last 4 tariffs for each professional. I tried:
@professionals.includes(:tariffs).last(4)
But it only returns the last 4 professionals.
Models:
class Comment < ActiveRecord::Base
  belongs_to :client
  belongs_to :professional
end

class Professionals < ActiveRecord::Base
  has_many :comments
end
You can't use limit on the joined table in ActiveRecord. The limit is applied to the first relation, which in this case happens to be @professionals.
You have a few choices:
Preload all comments for each professional and limit them on output (reduces the number of queries needed but increases memory consumption since you are potentially preloading a lot of AR objects).
Lazy load the required number of comments (increases the number of queries by n+1, but reduces the potential memory consumption).
Write a custom query with raw SQL.
If you preload everything, then you don't have to change much. Just limit the number of comments while iterating through each @professional.
@professionals.each do |professional|
  # first(3) works on the already-loaded association without issuing a new query
  professional.comments.first(3)
end
If you lazy load only what you need, then you would apply the limit scope to the comments relation.
@professionals.all
@professionals.each do |professional|
  professional.comments.where(type: 0).limit(3)
end
Writing a custom query is a bit more complex, and you might find that it is less performant depending on the number of joins you have to make and on how well indexed your tables are.
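As a sketch of that route (assuming the comments table has professional_id and created_at columns and the database supports window functions), the newest three type-0 comments per professional can be fetched in one statement:
# Ranks each professional's type-0 comments by recency and keeps the top 3.
# select_all returns plain hashes, which sidesteps ActiveRecord's STI
# handling of the question's "type" column.
rows = Comment.connection.select_all(<<-SQL)
  SELECT * FROM (
    SELECT comments.*,
           ROW_NUMBER() OVER (
             PARTITION BY professional_id
             ORDER BY created_at DESC
           ) AS row_rank
    FROM comments
    WHERE comments.type = 0
  ) ranked
  WHERE row_rank <= 3
SQL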
I suggest you take approach two, and use query and fragment caching to improve performance. For example:
- cache @professionals do
  - @professionals.each do |professional|
    - cache professional do
      = professional.name
This approach will hit the database the first time, but on subsequent loads the comments will be read from the cache, avoiding the DB hit. You can read more about caching in the Rails Guides.

Rails subquery reduce amount of raw SQL

I have two ActiveRecord models: Post and Vote. I want to make a simple query:
SELECT *,
  (SELECT COUNT(*)
   FROM votes
   WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do this in the ActiveRecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("(SELECT COUNT(*) FROM votes WHERE votes.id = posts.id) AS vote_count")
Two problems with this:
Raw SQL. Any way to write this in the DSL?
This returns only the attribute vote_count and not "*" plus vote_count. I can append .select("*"), but I would be repeating that every time. Is there a better/DRY way to do this?
Thanks
Well, if you want to reduce the amount of SQL, you can split that query into two smaller ones and execute them separately. For instance, the vote-counting part could be extracted into the query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from the Post model; for example, you may define #votes_count as a method:
class Post
  def votes_count
    @@votes_count_cache ||= Vote.group(:id).count
    @@votes_count_cache[id] || 0
  end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
But I strongly encourage you to consider yet another approach.
I believe that writing complicated queries like yours with ActiveRecord methods (even if it were possible), or splitting queries into two as I proposed earlier, are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong with using raw, complicated SQL when it's hidden behind a nice interface. See: M. Fowler's P of EAA and Brynary's post on the Code Climate Blog.
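A rough sketch of the query-object idea, wrapping the original SQL behind a small class (the class name, and the votes.post_id foreign key used for the correlation, are assumptions rather than details from the question):
# Hides the raw SQL behind a single, well-named entry point.
class PostsWithVoteCountQuery
  def call
    Post.find_by_sql(<<-SQL)
      SELECT posts.*,
             (SELECT COUNT(*)
              FROM votes
              WHERE votes.post_id = posts.id) AS vote_count
      FROM posts
    SQL
  end
end

# Each returned Post exposes the selected alias as an attribute:
posts = PostsWithVoteCountQuery.new.call
posts.first.vote_count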
How about doing this with no additional SQL at all? Consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.
Maybe you can create a MySQL view and just map it to a new AR model. It works in a similar way to a table; you just need to specify it with set_table_name "your_view_name". Maybe at the DB level it will be faster, and it will be recalculated automatically.
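A minimal sketch of that idea, assuming a database view named post_vote_counts with post_id and vote_count columns has been created separately:
# Read-only model backed by the view rather than a real table.
class PostVoteCount < ActiveRecord::Base
  self.table_name = "post_vote_counts"   # set_table_name in older Rails versions
  self.primary_key = "post_id"

  belongs_to :post
end

# Usage: PostVoteCount.find(post.id).vote_count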
Just stumbled upon the postgres_ext gem, which adds support for Common Table Expressions in Arel and ActiveRecord, which is exactly what you asked for. The gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.

Comments for many different models: polymorphic or not? (Ruby on Rails)

I am building an app that allows comments on 5 unique models (Posts, Photos, Events, etc), with 2 or 3 more on the way. As it stands, each model has an associated comment model (PostComments, PhotoComments, EventComments, etc), though the comments themselves are generally the same across all models.
I recently discovered the power of polymorphic associations, explained perfectly in Railscast #154, which would essentially combine many models into a single model and many tables into a single table.
While polymorphic associations would clean up code and redundancy, how do they affect performance? I don't know much about database optimization, but it seems like it would take longer to query a comment from 1,000,000 rows in a generic comment table than 200,000 rows in a specific comment table. Is it worth making the switch to polymorphic associations (while the app is still relatively early in development) or should I continue making models/tables for each type of comment?
It really depends on how big the site will be. First you have to add an index on the two columns:
add_index :comments, [:commentable_type, :commentable_id]
This will speed things up a lot.
If you have a big speed problem in the future because you have 1,000,000 comments, you can always use caching or even migrate to several tables. But really, you will need a lot of comments before you have speed problems, as long as you index your table! A search query over 1,000,000 records isn't that much anyway.
I say, make 1 table!
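For reference, the single-table polymorphic setup from Railscast #154 looks roughly like this (a sketch using the model names from the question):
class Comment < ActiveRecord::Base
  belongs_to :commentable, :polymorphic => true
end

class Post < ActiveRecord::Base
  has_many :comments, :as => :commentable
end

class Photo < ActiveRecord::Base
  has_many :comments, :as => :commentable
end

# The comments table stores commentable_id and commentable_type,
# which is what the composite index above covers.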
A little improvement over Michael's answer:
add_index :comments, [:commentable_id, :commentable_type]
I think this would be better because the :commentable_id attribute narrows down the query more, which means the overall lookup over the index would be a lot faster. Give me feedback on this :)

Rails Eager Loading Question Find(:all, :include => [:model])

I have a Topic and a Project model, with a many-to-many association between them (a HABTM one).
On the Topic's index page, I want to display the number of projects that each topic has. So I have:
@topics = Topic.all(:include => [:projects])
in my controller, and so far so good. The problem is that the Project model is so big that the query is still really slow:
Topic Load (1.5ms) SELECT * FROM "topics"
Project Load (109.2ms) SELECT "projects".*, t0.topic_id as the_parent_record_id FROM "projects" INNER JOIN "projects_topics" t0 ON "projects".id = t0.project_id WHERE (t0.topic_id IN (1,2,3,4,5,6,7,8,9,10,11))
Is there a way to make the second query select not * but just the name or the ID? counter_cache is not supported by HABTM associations and I don't really want to implement it myself... so is there a way to make this second query faster?
I just need to pull the count without loading the whole project object...
Thanks in advance,
Nicolás Hock Isaza
counter_cache is very easy to implement.
You can convert the HABTM into a double has_many, i.e. has_many :projects_topics in both the Project and Topic models (and belongs_to in ProjectsTopic), and then use counter_cache or do eager loading only on projects_topics.
You can do :select => "count(projects_topics.id)", :group => "topics.id", but this won't work well with PostgreSQL if you care about that...
The second option is the best IMO; I usually don't use HABTM at all, only a double has_many :)
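A sketch of that second option, with an explicit join model carrying the counter cache (the projects_count column is an assumption and needs its own add_column migration on topics):
class ProjectsTopic < ActiveRecord::Base
  belongs_to :project
  belongs_to :topic, :counter_cache => :projects_count
end

class Topic < ActiveRecord::Base
  has_many :projects_topics
  has_many :projects, :through => :projects_topics
end

class Project < ActiveRecord::Base
  has_many :projects_topics
  has_many :topics, :through => :projects_topics
end

# topic.projects_count then reads the cached column with no join at all.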
To expand on Devenv's answer, a counter cache is what you would typically use for this kind of scenario.
From the API docs:
Caches the number of belonging objects on the associate class through the use of increment_counter and decrement_counter. The counter cache is incremented when an object of this class is created and decremented when it's destroyed. This requires that a column named #{table_name}_count (such as comments_count for a belonging Comment class) is used on the associate class (such as a Post class). You can also specify a custom counter cache column by providing a column name instead of a true/false value to this option (e.g., :counter_cache => :my_custom_counter). Note: Specifying a counter cache will add it to that model's list of readonly attributes using attr_readonly.
Here is a screencast from Ryan Bates' RailsCasts on counter_cache.
Here is an answer to a question I asked half a year ago where the solution was an easily implemented home-brew counter cache.
