Rails subquery reduce amount of raw SQL - ruby-on-rails

I have two ActiveRecord models: Post and Vote. I want a make a simple query:
SELECT *,
(SELECT COUNT(*)
FROM votes
WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do it in activerecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("COUNT(*) from votes where votes.id = posts.id as vote_count")
Two problems with this:
Raw SQL. Anyway to write this in DSL?
This returns only attribute vote_count and not "*" + vote_count. I can append .select("*") but I will be repeating this every time. Is there an much better/DRY way to do this?
Thanks

Well, if you want to reduce amount of SQL, you can split that query into smaller two end execute them separately. For instance, the votes counting part could be extracted to query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from Post model, for example you may define #votes_count as a method:
class Post
def votes_count
##votes_count_cache ||= Vote.group(:id).count
##votes_count_cache[id] || 0
end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
But I strongly encourage you to consider yet another approach.
I believe writing complicated queries like yours with ActiveRecord methods — even if would be possible — or splitting queries into two as I proposed earlier are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong in using raw, complicated SQL when it's hidden behind nice interface. See: M. Fowler's P of EAA and Brynary's post on Code Climate Blog.

How about doing this with no additional SQL at all - consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.

Maybe you can create mysql view and just map it to new AR model. It works similar way to table, you just need to specify with set_table_name "your_view_name"....maybe on DB level it will work faster and will be automatically re-calculating.

Just stumbled upon postgres_ext gem which adds support for Common Table Expressions in Arel and ActiveRecord which is exactly what you asked. Gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.

Related

Rails Data Modelling

In my company, we are trying to cache some data that we are querying from an API. We are using Rails. Two of my models are 'Query' and 'Response'. I want to create a one-to-many relationship between Query and Response, wherein, one query can have many responses.
I thought this is the right way to do it.
Query = [query]
Response = [query_id, response_detail_1, response_detail_2]
Then, in the Models, I did the following Data Associations:
class Query < ActiveRecord::Base
has_many :response
end
class Response < ActiveRecord::Base
belongs_to :query
end
So, canonically, whenever I want to find all the responses for a given query, I would do -
"_id" = Query.where(:query => "given query").id
Response.where(:query_id => "_id")
But my boss made me use an Array column in the Query model, remove the Data Associations between the models and put the id of each response record in that array column in the Query model. So, now the Query model looks like
Query = [query_id, [response_id_1, response_id_2, response_id_3,...]]
I just want to know what are the merits and demerits of doing it both ways and which is the right way to do it.
If the relationship is really a one-to-many relationship, the "standard" approach is what you originally suggested, or using a junction table. You're losing out on referential integrity that you could get with a FK by using the array. Postgres almost had FK constraints on array columns, but from what I researched it looks like it's not currently in the roadmap:
http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/
You might get some performance advantages out of the array approach if you consider it like a denormalization/caching assist. See this answer for some info on that, but it still recommends using a junction table:
https://stackoverflow.com/a/17012344/4280232. This answer and the comments also offer some thoughts on the array performance vs the join performance:
https://stackoverflow.com/a/13840557/4280232
Another advantage of using the array is that arrays will preserve order, so if order is important you could get some benefits there:
https://stackoverflow.com/a/2489805/4280232
But even then, you could put the order directly on the responses table (assuming they're unique to each query) or you could put it on a join table.
So, in sum, you might get some performance advantages out of the array foreign keys, and they might help with ordering, but you won't be able to enforce FK constraints on them (as of the time of this writing). Unless there's a special situation going on here, it's probably better to stick with the "FK column on the child table" approach, as that is considerably more common.
Granted, that all applies mainly to SQL databases, which I notice now you didn't specify in your question. If you're using NoSQL there may be other conventions for this.

How do I get Rails to eager load counts?

This is related to a question a year and change ago.
I put up an example of the question that should work out of the box, provided you have sqlite3 available: https://github.com/cairo140/rails-eager-loading-counts-demo
Installation instructions (for the main branch)
git clone git://github.com/cairo140/rails-eager-loading-counts-demo.git
cd rails-eager-loading-counts-demo
rails s
I have a fuller write-up in the repository, but my general question is this.
How can I make Rails eager load counts in a way that minimizes db queries across the board?
The n+1 problem emerges whenever you use #count on an association, despite having included that association via #includes(:associated) in the ActiveRelation. A workaround is to use #length, but this works well only when the object it's being called on has already been loaded up, not to mention that I suspect it duplicates something that the Rails internals have done already. Also, an issue with using #length is that it results in an unfortunate over-loading when the association was not loaded to begin with and the count is all you need.
From the readme:
We can dodge this issue by running #length on the posts array (see appendix), which is already loaded, but it would be nice to have count readily available as well. Not only is it more consistent; it provides a path of access that doesn't necessarily require posts to be loaded. For instance, if you have a partial that displays the count no matter what, but half the time, the partial is called with posts loaded and half the time without, you are faced with the following scenario:
Using #count
n COUNT style queries when posts are already loaded
n COUNT style queries when posts are not already loaded
Using #length
Zero additional queries when posts are already loaded
n * style queries when posts are not already loaded
Between these two choices, there is no dominant option. But it would be nice to revise #count to defer to #length or access the length that is some other way stored behind the scenes so that we can have the following scenario:
Using revised #count
Zero additional queries when posts are already loaded
n COUNT style queries when posts are not already loaded
So what's the correct approach here? Is there something I've overlooked (very, very likely)?
As #apneadiving suggested, counter_cache works well because the counter column gets automatically updated when records are added or removed. So when you load the parent object, the count is included in the object without needing to access the other table.
However, if for whatever reason you don't like that approach, you could do this:
Post.find(:all,
:select => "posts.*, count(comments.id) `comments_count`",
:joins => "left join comments on comments.post_id = posts.id")
An alternative approach to the one of Zubin:
Post.select('posts.*, count(comments.id) `comments_count`').joins(:comments).group('posts.id')
It appears that the best way to implement this sort of facility might be to create SQL Views (ref: here and here) for the seperate model-and-child-count objects that you want; and their associated ActiveRecord models.
You might be able to be very clever and use subclassing on the original model combined with set_table_name :sql_view_name to retain all the original methods on the objects, and maybe even some of their associations.
For instance, say we were to add 'Post.has_many :comments' to your example, like in #Zubin's answer above; then one might be able to do:
class CreatePostsWithCommentsCountsView < ActiveRecord::Migration
def self.up
#Create SQL View called posts_with_comments_counts which maps over
# select posts.*, count(comments.id) as comments_count from posts
# left outer join comments on comments.post_id = posts.id
# group by posts.id
# (As zubin pointed out above.)
#*Except* this is in SQL so perhaps we'll be able to do further
# reducing queries against it *as though it were any other table.*
end
end
class PostWithCommentsCount < Post #Here there be cleverness.
#The class definition sets up PWCC
# with all the regular methods of
# Post (pointing to the posts table
# due to Rails' STI facility.)
set_table_name :posts_with_comment_counts #But then we point it to the
# SQL view instead.
#If you don't really care about
# the methods of Post being in PWCC
# then you could just make it a
# normal subclass of AR::Base.
end
PostWithCommentsCount.all(:include => :user) #Obviously, this sort of "upward
# looking" include is best used in big lists like "latest posts" rather than
# "These posts for this user." But hopefully it illustrates the improved
# activerecordiness of this style of solution.
PostWithCommentsCount.all(:include => :comments) #And I'm pretty sure you
# should be able to do this without issue as well. And it _should_ only be
# the two queries.
I have set up a small gem that adds an includes_count method to ActiveRecord, that uses a SELECT COUNT to fetch the number of records in an association, without resorting to a JOIN which might be expensive (depending on the case).
See https://github.com/manastech/includes-count
Hope it helps!

Rails optimization Question

In Rails while using activeRecord why are join queries considered bad.
For example
Here i'm trying to find the number of companies that belong to a certain category.
class Company ActiveRecord::Base
has_one :company_profile
end
Finding the number of Company for a particular category_id
number_of_companies = Company.find(:all, :joins=>:company_profile, :conditions=>["(company_profiles.category_id = #{c_id}) AND is_published = true"])
How could this be better or is it just poor design?
company_profiles = CompanyProfile.find_all_by_category_id(c_id)
companies = []
company_profiles.each{|c_profile| companies.push(c_profile.company) }
Isn't it better that the first request creates a single query while i'd be running several queries for the second case.
Could someone explain why joins are considered to be bad practice in Rails
Thanks in advance
To my knowledge, there is no such rule. The rule is to hit the database as least as possible, and rails gives you the right tools for that, using the joins.
The example Sam gives above is exemplary. Simple code, but behind the scenes rails has to do two queries, instead of only one using a join.
If there is one rule that comes to mind, that i think is related, is to avoid SQL where possible and use the rails way as much as possible. This keeps your code database agnostic (as rails handles the differences for you). But sometimes even that is unavoidable.
It comes down to good database design, creating the correct indexes (which you need to define manually in migrations), and sometimes big nested structures/joins are needed.
Join queries are not bad, in fact, they are good, and ActiveRecord has them at its very heart. You don't need to break into find_by_sql to use them, options like :include will handle it for you. You can stay within the ORM, which gives the readability and ease of use, whilst still, for the most part, creating very efficient SQL (providing you have your indexes right!)
Bottom line - you need to do the bare minimum of database operations. Joins are a good way of letting the database do the heavy lifting for you, and lowering the number of queries that you execute.
By the by, DataMapper and Arel (the query engine in Rails 3) feature a lot of lazy loading - this means that code such as:
#category = Category.find(params[:id])
#category.companies.size
Would most likely result in a join query that only did a COUNT operation, as the first line wouldn't result in a query being sent to the db.
If you just want to find the number of companies on a category all you need to do is find the category and then call the association name and size because it will return an array.
#category = Category.find(params[:id])
#category.companies.size

How will ActiveRelation affect rails' includes() 's capabilities?

I've looked over the Arel sources, and some of the activerecord sources for Rails 3.0, but I can't seem to glean a good answer for myself as to whether Arel will be changing our ability to use includes(), when constructing queries, for the better.
There are instances when one might want to modify the conditions on an activerecord :include query in 2.3.5 and before, for the association records which would be returned. But as far as I know, this is not programmatically tenable for all :include queries:
(I know some AR-find-includes make t#{n}.c#{m} renames for all the attributes, and one could conceivably add conditions to these queries to limit the joined sets' results; but others do n_joins + 1 number of queries over the id sets iteratively, and I'm not sure how one might hack AR to edit these iterated queries.)
Will Arel allow us to construct ActiveRecord queries which specify the resulting associated model objects when using includes()?
Ex:
User :has_many posts( has_many :comments)
User.all(:include => :posts) #say I wanted the post objects to have their
#comment counts loaded without adding a comment_count column to `posts`.
#At the post level, one could do so by:
posts_with_counts = Post.all(:select => 'posts.*, count(comments.id) as comment_count',
:joins => 'left outer join comments on comments.post_id = posts.id',
:group_by => 'posts.id') #i believe
#But it seems impossible to do so while linking these post objects to each
#user as well, without running User.all() and then zippering the objects into
#some other collection (ugly)
#OR running posts.group_by(&:user) (even uglier, with the n user queries)
Why don't you actually use AREL at its core. Once you get down to the actual table scope you can use Arel::Relation which is COMPLETELY different from ActiveRecord implementation itself. I truly believe that the ActiveRecord::Relation is a COMPLETELY different (and busted) implementation of a wrapper around an Arel::Relation & Arel::Table. I choose to use Arel at its core by either doing Thing.scoped.table (Arel::Table) which is the active record style OR Arel::Table.new(:table_name) which gives me a fresh Arel::Table (my preferred method). From this you can do the following.
posts = Arel::Table.new(:thing, :as => 'p') #derived relation
comments = Arel::Table.new(:comments, :as => 'c') # derived relation
posts_and_comments = posts.join(comments).on( posts[:id].eq(:comments[:id]) )
# now you can iterate through the derived relation by doing the following
posts_and_comments.each {...} # this will actually return Arel::Rows which is another story.
#
An Arel::Row returns a TRUE definition of a tuple from the set which will consist of an Arel::Header (set of Arel::Attributes) and a tuple.
Also slightly more verbose, the reason why I use Arel at its core is because it truly exposes the relational model to me which is the power behind ActiveRelation. I have noticed that ActiveRecord is exposing like 20% of what Arel has to offer and I am affraid that developers will not realize this NOR will they understand the true core of Relational Algebra. Using the conditions hash is to me "old school" and an ActiveRecord style programming in a Relational Algebra world. Once we learn to break away from the Martin Fowler model based approach and adopt the E.F. Codd Relational Model based approach this is actually what RDBMS have been trying to do for decades but gotten very wrong.
I've taken the liberty to start a seven part learning series on Arel and Relational Algebra for the ruby community. These will consist of short videos ranging from absolute beginner to advanced techniques like self referencing relations and closure under composition. The first video is at http://Innovative-Studios.com/#pilot Please let me know if you need more information or this was not descriptive enough for you.
The future looks bright with Arel.
ActiveRecord::Relation is a fairly weak wrapper around Base#find_by_sql, so :include queries are not extended in any way by its inclusion.
Isn't
Post.includes([:author, :comments]).where(['comments.approved = ?', true]).all
what you're looking for? (taken from the official docs)

Best practices for getting a list of IDs from an ActiveRecord model

I have an ActiveRecord model Language, with columns id and short_code (there are other columns, but they are not relevant to this question). I want to create a method that will be given a list of short codes, and return a list of IDs. I do not care about associations, I just need to end up with an array that looks like [1, 2, 3, ...].
My first thought was to do something like
def get_ids_from_short_codes(*short_codes)
Language.find_all_by_short_code(short_codes.flatten, :select => 'id').map(&:id)
end
but I'm not sure if that's needlessly wasting time/memory/processing.
My question is twofold:
Is there a way to run an ActiveRecord find that will just return an array of a certain table column rather than instantiating objects?
If so, would it actually be worthwhile to collect an array of length n rather than instantiating n ActiveRecord objects?
Note that for my specific purpose, n would be approximately 200.
In Rails 3.x, you can use the pluck method which returns the values from the requested field without instantiating objects to hold them.
This would give you an array of IDs:
Language.where(short_code: short_codes.flatten).pluck(:id)
I should mention that in Rails 3.x you can pluck only one column at a time but in Rails 4 you can pass multiple columns to pluck.
By the way, here's a similar answer to a similar question
Honestly, for 200 records, I wouldn't worry about it. When you get to 2000, 20,000, or 200,000 records - then you can worry about optimization.
Make sure you have short_code indexed in your table.
If you are still concerned about performance, take a look at the development.log and see what the database numbers are for that particular call. You can adjust the query and see how it affects performance in the log. This should give you a rough estimate of performance.
Agree with the previous answer, but if you absolutely must, you can try this
sql = Language.send(:construct_finder_sql, :select => 'id', :conditions => ["short_code in (?)", short_codes])
Language.connection.select_values(sql)
A bit ugly as it is, but it doesn't create in-memory objects.
if you're using associations you can get raw ids directly from ActiveRecord.
eg.:
class User < ActiveRecord::Base
has_many :users
end
irb:=> User.find(:first).user_ids
irb:>> [1,2,3,4,5]
Phil is right about this, but if you do find that this is an issue. You can send a raw SQL query to the database and work at a level below ActiveRecord. This can be useful for situations like this.
ActiveRecord::Base.connection.execute("SQL CODE!")
Benchmark your code first before you resort to this.
This really is a matter of choice.
Overkill or not, ActiveRecord is supposed to give you objects since it's an ORM. And Like Ben said, if you do not want objects, use raw SQL.

Resources