This is more of a "why do things work this way" question rather than a "I don't know how to do this" question...
So the gospel on pulling associated records that you know you're going to use is to use :include because you'll get a join and avoid a whole bunch of extra queries:
Post.all(:include => :comments)
However when you look at the logs, there's no join happening:
Post Load (3.7ms) SELECT * FROM "posts"
Comment Load (0.2ms) SELECT "comments".* FROM "comments"
WHERE ("comments".post_id IN (1,2,3,4))
ORDER BY created_at asc
It is taking a shortcut because it pulls all of the comments at once, but it's still not a join (which is what all the documentation seems to say). The only way I can get a join is to use :joins instead of :include:
Post.all(:joins => :comments)
And the logs show:
Post Load (6.0ms) SELECT "posts".* FROM "posts"
INNER JOIN "comments" ON "posts".id = "comments".post_id
Am I missing something? I have an app with half a dozen associations and on one screen I display data from all of them. Seems like it would be better to have one join-ed query instead of six individual ones. I know that performance-wise it's not always better to do a join rather than individual queries (in fact if you're going by time spent, it looks like the two individual queries above are faster than the join), but after all the docs I've been reading I'm surprised to see :include not working as advertised.
Maybe Rails is cognizant of the performance issue and doesn't join except in certain cases?
It appears that the :include functionality was changed with Rails 2.1. Rails used to do the join in all cases, but for performance reasons it was changed to use multiple queries in some circumstances. This blog post by Fabio Akita has some good information on the change (see the section entitled "Optimized Eager Loading").
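To make the two-query strategy concrete, here is a minimal plain-Ruby sketch of what the log in the question amounts to: load the parents, load all their children with one IN query, then stitch the two together in memory. This is not Rails' actual implementation; preload_comments and the hash-based rows are hypothetical stand-ins for illustration only.

```ruby
# Simplified model of the two-query strategy; plain hashes stand in for rows.
# `preload_comments` is a hypothetical helper, not part of Rails.
def preload_comments(posts, all_comments)
  # Query 1 has already produced `posts`. Query 2 corresponds to
  #   SELECT * FROM comments WHERE post_id IN (...)
  post_ids = posts.map { |post| post[:id] }
  comments = all_comments.select { |c| post_ids.include?(c[:post_id]) }

  # Stitch the two result sets together in memory, keyed by the foreign key.
  by_post = comments.group_by { |c| c[:post_id] }
  posts.map { |post| post.merge(comments: by_post.fetch(post[:id], [])) }
end

posts = [{ id: 1, title: "First" }, { id: 2, title: "Second" }]
comments = [
  { id: 10, post_id: 1, body: "Nice" },
  { id: 11, post_id: 1, body: "Agreed" }
]

preload_comments(posts, comments).each do |post|
  puts "#{post[:title]}: #{post[:comments].size} comment(s)"
end
```

Note that a post without comments still comes back, just with an empty list, and the number of queries stays at two no matter how many posts there are.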
.joins simply joins the tables and returns the selected fields. If you call an association on a record from a joins query, it will fire a database query again.
:includes eager loads the included associations into memory, along with all the attributes of the included tables. If you call an association on a record from an includes query, it will not fire any further queries.
The difference between joins and include is that using the include statement generates a much larger SQL query loading into memory all the attributes from the other table(s).
For example, if you have a table full of comments and you use :joins => :users to pull in all the user information for sorting purposes, etc., it will work fine and take less time than :include. But say you want to display the comment along with the user's name, email, etc. To get that information using :joins, Rails will have to make a separate SQL query for each user it fetches, whereas if you used :include this information is ready for use.
Great example:
http://railscasts.com/episodes/181-include-vs-joins
I was recently reading more on the difference between :joins and :includes in Rails. Here is an explanation of what I understood (with examples :))
Consider this scenario:
A User has_many comments and a comment belongs_to a User.
The User model has the following attributes: Name (string), Age (integer). The Comment model has the following attributes: Content, user_id. For a comment, user_id can be null.
Joins:
:joins performs an inner join between two tables. Thus
Comment.joins(:user)
#=> <ActiveRecord::Relation [#<Comment id: 1, content: "Hi I am Aaditi.This is my first comment!", user_id: 1, created_at: "2014-11-12 18:29:24", updated_at: "2014-11-12 18:29:24">,
#<Comment id: 2, content: "Hi I am Ankita.This is my first comment!", user_id: 2, created_at: "2014-11-12 18:29:29", updated_at: "2014-11-12 18:29:29">,
#<Comment id: 3, content: "Hi I am John.This is my first comment!", user_id: 3, created_at: "2014-11-12 18:30:25", updated_at: "2014-11-12 18:30:25">]>
will fetch all records where user_id (of comments table) is equal to user.id (users table). Thus if you do
Comment.joins(:user).where("comments.user_id is null")
#=> <ActiveRecord::Relation []>
You will get an empty result, as shown.
Moreover, joins does not load the joined table into memory. Thus if you do
comment_1 = Comment.joins(:user).first
comment_1.user.age
#=> User Load (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? ORDER BY "users"."id" ASC LIMIT 1 [["id", 1]]
#=> 24
As you can see, comment_1.user.age fires another database query in the background to get the result.
Includes:
:includes behaves like a left outer join between the two tables: records from the first table are returned whether or not they have a match in the second. Thus
Comment.includes(:user)
#=><ActiveRecord::Relation [#<Comment id: 1, content: "Hi I am Aaditi.This is my first comment!", user_id: 1, created_at: "2014-11-12 18:29:24", updated_at: "2014-11-12 18:29:24">,
#<Comment id: 2, content: "Hi I am Ankita.This is my first comment!", user_id: 2, created_at: "2014-11-12 18:29:29", updated_at: "2014-11-12 18:29:29">,
#<Comment id: 3, content: "Hi I am John.This is my first comment!", user_id: 3, created_at: "2014-11-12 18:30:25", updated_at: "2014-11-12 18:30:25">,
#<Comment id: 4, content: "Hi This is an anonymous comment!", user_id: nil, created_at: "2014-11-12 18:31:02", updated_at: "2014-11-12 18:31:02">]>
will return all the records from the comments table, including those without a user. Thus if you do
Comment.includes(:user).where("comments.user_id is null")
#=> #<ActiveRecord::Relation [#<Comment id: 4, content: "Hi This is an anonymous comment!", user_id: nil, created_at: "2014-11-12 18:31:02", updated_at: "2014-11-12 18:31:02">]>
it will fetch records where comments.user_id is nil as shown.
Moreover, includes loads the records from both tables into memory. Thus if you do
comment_1 = Comment.includes(:user).first
comment_1.user.age
#=> 24
As you can see, comment_1.user.age simply reads the result from memory without firing a database query in the background.
In addition to performance considerations, there's a functional difference too.
When you join comments, you are asking for posts that have comments: an inner join by default.
When you include comments, you are asking for all posts: an outer join.
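The functional difference is easy to see on plain data. Below is a small Ruby sketch, with arrays of hashes standing in for tables; inner_join and left_outer_join are hypothetical helpers written for this illustration and are not ActiveRecord methods.

```ruby
# Each helper returns [post, comment] pairs, the way a SQL join returns rows.

# Inner join: a post appears only if at least one comment points at it.
def inner_join(posts, comments)
  posts.flat_map do |post|
    comments.select { |c| c[:post_id] == post[:id] }
            .map { |c| [post, c] }
  end
end

# Left outer join: every post appears; a post without comments is paired with nil.
def left_outer_join(posts, comments)
  posts.flat_map do |post|
    matches = comments.select { |c| c[:post_id] == post[:id] }
    matches.empty? ? [[post, nil]] : matches.map { |c| [post, c] }
  end
end

posts = [{ id: 1, title: "With comments" }, { id: 2, title: "No comments" }]
comments = [{ id: 10, post_id: 1 }, { id: 11, post_id: 1 }]

puts inner_join(posts, comments).size       # 2 rows, post 2 is missing
puts left_outer_join(posts, comments).size  # 3 rows, post 2 is kept
```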
tl;dr
I contrast them in two ways:
joins - For conditional selection of records.
includes - When using an association on each member of a result set.
Longer version
Joins is meant to filter the result set coming from the database. You use it to do set operations on your table. Think of this as a where clause that performs set theory.
Post.joins(:comments)
is the same as
Post.where('id in (select post_id from comments)')
Except that if a post has more than one comment, joins returns that post once per comment, so you will get duplicate posts back. But every post will be a post that has comments. You can correct this with distinct:
Post.joins(:comments).count
=> 10
Post.joins(:comments).distinct.count
=> 2
In contrast, the includes method will simply make sure that there are no additional database queries when referencing the relation (so that we don't make n + 1 queries).
Post.includes(:comments).count
=> 4 # includes posts without comments so the count might be higher.
The moral is, use joins when you want to do conditional set operations and use includes when you are going to be using a relation on each member of a collection.
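Here is a rough plain-Ruby model of the duplicate-rows point, in case it helps. joined_posts and posts_with_comments are hypothetical names: the first mimics the rows an inner join yields, the second mimics the IN-subquery form.

```ruby
# One row per matching (comment, post) pair, like Post.joins(:comments).
def joined_posts(posts, comments)
  comments.flat_map { |c| posts.select { |p| p[:id] == c[:post_id] } }
end

# Like WHERE id IN (SELECT post_id FROM comments): each post at most once.
def posts_with_comments(posts, comments)
  commented_ids = comments.map { |c| c[:post_id] }
  posts.select { |p| commented_ids.include?(p[:id]) }
end

posts = [{ id: 1 }, { id: 2 }, { id: 3 }]
comments = [{ post_id: 1 }, { post_id: 1 }, { post_id: 1 }, { post_id: 2 }]

puts joined_posts(posts, comments).size         # 4, post 1 appears three times
puts joined_posts(posts, comments).uniq.size    # 2, what distinct gives you
puts posts_with_comments(posts, comments).size  # 2, same posts as the line above
```

Post 3 has no comments, so it shows up in none of the three results.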
.joins works as a database join: it joins two or more tables and fetches the selected data from the backend (database).
.includes works like a database left join. It loads all the records from the left side, regardless of whether they have a match in the right-hand model. It is used for eager loading because it loads all associated objects into memory. If we call associations on an includes query result, it does not fire a query on the database; it simply returns the data already loaded in memory.
'joins' is just used to join tables, and when you call associations on a joins result it will fire a query again (meaning many queries will fire).
Let's suppose you have two models, User and Organisation.
User has_many organisations
Suppose a user has 10 organisations.
@records = User.joins(:organisations).where("organisations.user_id = 1")
The query will be
select * from users INNER JOIN organisations ON organisations.user_id = users.id where organisations.user_id = 1
It will return one user record for each organisation related to the user (10 records here),
and @records.map { |u| u.organisations.map(&:name) }
will run a query like
select * from organisations where organisations.user_id = 1
ten times (once per returned record).
The total number of SQL queries is 11 in this case.
But
'includes' will eager load the included associations into memory (all associations are loaded on the first load) and will not fire queries again.
When you get records with includes, like
@records = User.includes(:organisations).where("organisations.user_id = 1")
then the query will be
select * from users INNER JOIN organisations ON organisations.user_id = users.id where organisations.user_id = 1
and
select * from organisations where organisations.id IN (ids of the organisations, 1 to 10) if there are 10 organisations
and when you run this
@records.map { |u| u.organisations.map(&:name) }
no query will fire.
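The query counts can be illustrated without a database. In the sketch below, fetch_organisations is a hypothetical stand-in for one SQL round trip that bumps a counter each time it is called; names_lazily models calling the association per record after joins, and names_eagerly models the single up-front load that includes performs. None of these names come from Rails.

```ruby
# Stand-in for one SQL round trip; bumps the counter each time it is called.
def fetch_organisations(table, user_ids, stats)
  stats[:queries] += 1
  table.select { |org| user_ids.include?(org[:user_id]) }
end

# One lookup per user record: the N in "N + 1".
def names_lazily(table, users, stats)
  users.map { |u| fetch_organisations(table, [u[:id]], stats).map { |o| o[:name] } }
end

# One lookup for all users, then grouping in memory.
def names_eagerly(table, users, stats)
  all = fetch_organisations(table, users.map { |u| u[:id] }, stats)
  by_user = all.group_by { |o| o[:user_id] }
  users.map { |u| by_user.fetch(u[:id], []).map { |o| o[:name] } }
end

table = [
  { user_id: 1, name: "Acme" },
  { user_id: 1, name: "Globex" },
  { user_id: 2, name: "Initech" }
]
users = [{ id: 1 }, { id: 2 }, { id: 3 }]

lazy_stats = { queries: 0 }
eager_stats = { queries: 0 }
lazy = names_lazily(table, users, lazy_stats)
eager = names_eagerly(table, users, eager_stats)

puts lazy == eager          # true, same data either way
puts lazy_stats[:queries]   # 3, one per user
puts eager_stats[:queries]  # 1
```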
Related
I am testing some Active Record call in the console. I often have raw SQL selects like this:
Model.joins(:other_model).where(some_column:foo).select("other_model.column AS col1, model.column AS col2")
which just results in this:
[#<Model:0x00007fa45f25b3d0 id: nil>,
#<Model:0x00007fa45f25b1c8 id: nil>,
#<Model:0x00007fa45f25afc0 id: nil>,
#<Model:0x00007fa45f25acc8 id: nil>,
#<Model:0x00007fa45f25aa48 id: nil>,
#<Model:0x00007fa45f25a6d8 id: nil>]
Can I have the console output the values from the "other_model.column"? I swear I saw a trick for this before but can't seem to find this anywhere.
Fundamentally, the issue is that ActiveRecord is going to attempt to cast the resulting rows as Models. However, the attributes are still loaded! They're just not in the string representation of the model, since they don't exist as columns on the table.
Given a relation:
relation = Model.joins(:other_model).where(some_column:foo)
.select("other_model.column AS col1, model.column AS col2")
You can easily see the computed columns by just mapping out #attributes:
relation.map(&:attributes)
You will still end up with id fields in attributes regardless, but you can easily ignore them, or further process the mapped attributes to remove them if needed.
For example:
> Account.where("id < 1000")
.select("id * 1024 as bigid")
.limit(10)
.map(&:attributes)
=> [{"id"=>nil, "bigid"=>819200},
{"id"=>nil, "bigid"=>820224},
{"id"=>nil, "bigid"=>822272}]
Another alternative would be to skip the ActiveRecord loading, and use #to_sql to just execute a raw query. In this case, I have a Postgres connection, so I'm using #exec_query; it may vary based on your RDBMS:
Given our relation:
> ActiveRecord::Base.connection.exec_query(relation.to_sql).to_hash
=> [{"bigid"=>"819200"}, {"bigid"=>"820224"}, {"bigid"=>"822272"}]
You can use the #pluck method to get an array of column values, for example:
Model.joins(:other_model).where(some_column:foo).pluck("other_model.column")
I am working on a complex SQL query with rails 4.2.1
def self.proofreader_for_job(job)
User.find_by_sql(
"SELECT * FROM users
INNER JOIN timers
ON users.id = timers.proofreader_id
INNER JOIN tasks
ON tasks.id = timers.task_id
WHERE tasks.job_id = #{job.id}")
end
My schema is: jobs has_many tasks, tasks has_many timers, and a timer belongs_to a user (role: proofreader) through the foreign key proofreader_id.
The issue is that when I call the method it is returning what is the correct user's email and attributes but the id doesn't match.
For example, User.proofreader_for_job(job) returns
[#<User id: 178, email: "testemail@gmail.com">]
testemail@gmail.com is the correct email, but I don't have a user in my db with an id of 178.
User.all just returns
[#<User id: 12, email: "fakeemail@gmail.com">,
#<User id: 11, email: "testemail@gmail.com">]
I noticed the issue in my rspec tests, but it happens on both development and test environments.
Does anyone have any idea why my method is returning a user with such a high id? Is this by design, and if so, why?
Thank you.
Since you're doing 'Select *', your statement will return all columns for each of the tables in the JOIN statement. So when you're casting the output from the SQL statement to a User type, I think the wrong 'id' column is being grabbed for the User id (likely the timers or tasks table).
Try explicitly specifying the columns to return like the below statement:
def self.proofreader_for_job(job)
  User.find_by_sql(
    "SELECT users.id, users.email FROM users
     INNER JOIN timers
     ON users.id = timers.proofreader_id
     INNER JOIN tasks
     ON tasks.id = timers.task_id
     WHERE tasks.job_id = #{job.id}")
end
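To see why SELECT * can hand back the wrong id, here is a simplified Ruby model of turning one result row into an attribute hash. It assumes the row is mapped by column name, which I believe is roughly what happens; row_to_attributes is a hypothetical helper, and apart from the ids 11 and 178 from the question the values are made up.

```ruby
# Simplified model: build a name => value hash from one result row.
# When a column name repeats, the later value overwrites the earlier one.
def row_to_attributes(columns, values)
  columns.zip(values).to_h
end

# SELECT * over users JOIN timers: both tables contribute an "id" column.
columns = %w[id email id proofreader_id task_id]
values  = [11, "testemail@gmail.com", 178, 11, 5]

attrs = row_to_attributes(columns, values)
puts attrs["id"]     # 178, the timer's id rather than the user's
puts attrs["email"]  # testemail@gmail.com
```

Selecting users.id explicitly (or users.*) leaves only one id column in the row, so there is nothing to overwrite it.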
I am trying to get a "Select As" query statement to work, but keep getting an error and am not sure why it is not working. Per the API docs, the format is correct.
User.select("firstname as fname")
Results in:
User Load (1.7ms) SELECT firstname as fname FROM "users"
=> #<ActiveRecord::Relation [#<User id: nil>, #<User id: nil>]
However, if I use:
User.select(:firstname)
I get:
User Load (2.8ms) SELECT "users"."firstname" FROM "users"
=> #<ActiveRecord::Relation [#<User id: nil, firstname: "John">, #<User id: nil, firstname: "Brian">,
So I can see from the query why it's not returning the results, but I don't understand why it's creating the incorrect query. (The actual query I need to use the select as on is more complicated than this query, but I was trying to use the simpler query to figure out why it wasn't working properly.)
The reason I need to use a select as query is that I have two separate objects from two very different tables that I need to join together, and I need to change one of the column names so I can sort by that column. I'm not sure if there is an easier way to change the name prior to combining the objects.
Thanks!
You can use alias_attribute :fname, :firstname in the model (the new name comes first, then the existing attribute) and then use User.select(:fname) in the controller as well.
I am new to rails. What is the equivalent rails query for the following sql query?
mysql> select a.code, b.name, b.price, b.page from article a inner join books b on a.id =b.article_id;
I am trying
Article.joins(:books).select("books.name, books.price, books.page, articles.code")
The Active Record relation shows only the first table's data:
=> #<ActiveRecord::Relation [#<Article id: 1, code: "x", created_at: "2014-11-12 13:28:08", updated_at: "2014-11-14 04:16:06">, #<Article id: 2, code: "y", created_at: "2014-11-12 13:28:08", updated_at: "2014-11-14 04:00:16">]>
How can I get the data from both tables?
You normally don't really query like that directly with Rails. Instead, you'd set up your models and use other associated models to achieve this. If speed is an issue, you can use eager loading. If you absolutely need exactly this join, it's:
class Article < ActiveRecord::Base
has_many :books
scope :with_books, lambda {
joins(:books)
.select('articles.code, books.name, books.price, books.page')
}
end
class Book < ActiveRecord::Base
belongs_to :article
end
But this is not so useful. It generates the join you want, but retrieving the book details like that won't be fun. You could do something like:
a = Article.with_books.where("books.name = 'Woken Furies'").first
a.code
And that should give you the article code. Depending on what you need, a better way could be to remove the scope from the Article class and instead query like:
b = Book.where(name: 'Woken Furies')
.joins(:article)
.where("articles.code = 'something'")
b = Book.where("name = 'Woken Furies' AND articles.code = 'something'")
.joins(:article)
Both of these queries should be equivalent. You can get from one associated record to the other with:
book = b.first
article_code = book.article.code
I'm not sure what you need to accomplish with the join, but I think you might get more beautiful code by using plain ActiveRecord. If you need the speed gain, avoid n+1 problems, etc., it might make sense to write those joins out by hand.
I hope I understood your question correctly.
There's more about joining in the Rails guides:
http://guides.rubyonrails.org/active_record_querying.html#joining-tables
Update: You can use pluck if you need to retrieve e.g. just the code and the name:
Article.with_books
.where("books.name = 'Woken Furies'")
.pluck('articles.code, books.name')
I have users with first_name and last_name fields, and I need to find all the users that have duplicate accounts based on first and last names. For example, I want a find that will search through all the other users and check whether any have the same name and email. I was thinking of a nested loop like this:
User.all.each do |user|
  # maybe another loop to search through all the users and, if a match occurs, put that user in an array
end
Is there a better way?
You could go a long way toward narrowing down your search by finding out what the duplicated data is in the first place. For example, say you want to find each combination of first name and email that is used more than once.
User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )
That will return an array containing one of each of the duplicated records. From that, say one of the returned users had "Fred" and "fred@example.com"; then you could search for only Users having those values to find all of the affected users.
The return from that find will be something like the following. Note that the array only contains a single record from each set of duplicated users.
[#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">,
#<User id: 5, first: "foo1", last: "baasdasdr", email: "abc@example.com", created_at: "2010-12-30 17:20:49", updated_at: "2010-12-30 17:20:49">]
For example, the first element in that array shows one user with "foo" and "foo@example.com". The rest of them can be pulled out of the database as needed with a find.
> User.find(:all, :conditions => {:email => "foo@example.com", :first => "foo"})
=> [#<User id: 1, first: "foo", last: "bar", email: "foo@example.com", created_at: "2010-12-30 17:14:28", updated_at: "2010-12-30 17:14:28">,
#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">]
And it also seems like you'll want to add some better validation to your code to prevent duplicates in the future.
Edit:
If you need to use the big hammer of find_by_sql, because Rails 2.2 and earlier didn't support :having with find, the following should work and give you the same array that I described above.
User.find_by_sql("select * from users group by first,email having count(*) > 1")
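For a small table, the same grouping can also be done in memory instead of a nested loop. The sketch below is a plain-Ruby analogue of GROUP BY ... HAVING count(*) > 1; duplicate_groups is a hypothetical helper and the sample users are made up. It loads every row, so it is not a substitute for the SQL on a large table.

```ruby
# Group rows by the given keys and keep only groups with more than one member.
def duplicate_groups(users, *keys)
  users.group_by { |user| keys.map { |key| user[key] } }
       .select { |_values, members| members.size > 1 }
end

users = [
  { id: 1, first: "foo",  email: "foo@example.com" },
  { id: 3, first: "foo",  email: "foo@example.com" },
  { id: 4, first: "fred", email: "fred@example.com" }
]

duplicate_groups(users, :first, :email).each do |values, members|
  puts "#{values.inspect}: ids #{members.map { |m| m[:id] }.inspect}"
end
```

Unlike the grouped query, this returns every member of each duplicated set, so no second lookup is needed.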
After some googling, I ended up with this:
ActiveRecord::Base.connection.execute(<<-SQL).to_a
SELECT
variants.id, variants.variant_no, variants.state
FROM variants INNER JOIN (
SELECT
variant_no, state, COUNT(1) AS count
FROM variants
GROUP BY
variant_no, state HAVING COUNT(1) > 1
) tt ON
variants.variant_no = tt.variant_no
AND variants.state IS NOT DISTINCT FROM tt.state;
SQL
Note the part that says IS NOT DISTINCT FROM: it is there to deal with NULL values, which can't be compared with the equals sign in Postgres.
If you are going the route of @hakunin and creating a query manually, you may wish to use the following:
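If the NULL behaviour is unclear, this small Ruby sketch models the two comparisons, with nil standing in for NULL (and for the "unknown" result). sql_equal and sql_not_distinct are hypothetical names used only for illustration.

```ruby
# SQL "=": the result is unknown (nil here) whenever either side is NULL,
# so a join condition using it never matches rows whose state is NULL.
def sql_equal(a, b)
  return nil if a.nil? || b.nil?

  a == b
end

# SQL "IS NOT DISTINCT FROM": always true or false, and NULL matches NULL.
def sql_not_distinct(a, b)
  return a.nil? && b.nil? if a.nil? || b.nil?

  a == b
end

p sql_equal(nil, nil)              # nil
p sql_not_distinct(nil, nil)       # true
p sql_not_distinct(nil, "active")  # false
```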
ActiveRecord::Base.connection.exec_query(<<-SQL).to_a
SELECT
variants.id, variants.variant_no, variants.state
FROM variants INNER JOIN (
SELECT
variant_no, state, COUNT(1) AS count
FROM variants
GROUP BY
variant_no, state HAVING COUNT(1) > 1
) tt ON
variants.variant_no = tt.variant_no
AND variants.state IS NOT DISTINCT FROM tt.state;
SQL
The change is replacing connection.execute(<<-SQL)
with connection.exec_query(<<-SQL)
There can be a problem with memory leakage when using execute.
Please read "Clarify DataBaseStatements#execute" to get an in-depth understanding of the problem.