Rails Arel selecting distinct columns

I've hit a slight block with the new scope methods (Arel 0.4.0, Rails 3.0.0.rc)
Basically I have:
A Topic model, which has_many :comments, and a Comment model (with a topic_id column), which belongs_to :topic.
I'm trying to fetch a collection of "Hot Topics", i.e. the topics that were most recently commented on. Current code is as follows:
# models/comment.rb
scope :recent, order("comments.created_at DESC")
# models/topic.rb
scope :hot, joins(:comments) & Comment.recent & limit(5)
If I execute Topic.hot.to_sql, the following query is generated:
SELECT "topics".* FROM "topics" INNER JOIN "comments"
ON "comments"."topic_id" = "topics"."id"
ORDER BY comments.created_at DESC LIMIT 5
This works fine, but it potentially returns duplicate topics: if topic #3 was recently commented on several times, it would be returned several times.
My question
How would I go about returning a distinct set of topics, bearing in mind that I still need to access the comments.created_at field, to display how long ago the last post was? I would imagine something along the lines of distinct or group_by, but I'm not too sure how best to go about it.
Any advice / suggestions are much appreciated - I've added a 100 rep bounty in hopes of coming to an elegant solution soon.

Solution 1
This doesn't use Arel, but Rails 2.x syntax:
Topic.all(:select => "topics.*, C.id AS last_comment_id,
C.created_at AS last_comment_at",
:joins => "JOIN (
SELECT DISTINCT A.id, A.topic_id, B.created_at
FROM comments A,
(
SELECT topic_id, max(created_at) AS created_at
FROM comments
GROUP BY topic_id
ORDER BY max(created_at) DESC
LIMIT 5
) B
WHERE A.topic_id = B.topic_id AND
A.created_at = B.created_at
) AS C ON topics.id = C.topic_id
"
).each do |topic|
p "topic id: #{topic.id}"
p "last comment id: #{topic.last_comment_id}"
p "last comment at: #{topic.last_comment_at}"
end
Make sure you index the created_at and topic_id columns in the comments table.
Solution 2
Add a last_comment_id column in your Topic model. Update the last_comment_id after creating a comment. This approach is much faster than using complex SQL to determine the last comment.
E.g:
class Topic < ActiveRecord::Base
has_many :comments
belongs_to :last_comment, :class_name => "Comment"
scope :hot, joins(:last_comment).order("comments.created_at DESC").limit(5)
end
class Comment < ActiveRecord::Base
belongs_to :topic
after_create :update_topic
def update_topic
topic.last_comment = self
topic.save
# OR better still
# topic.update_attribute(:last_comment_id, id)
end
end
This is much more efficient than running a complex SQL query to determine the hot topics.
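To make the idea behind Solution 2 concrete, here is a minimal plain-Ruby sketch that needs no database. The `topics` hash and the `create_comment` lambda are hypothetical stand-ins for the topics table and the after_create callback, and `last_comment_at` is an illustrative extra field rather than a column from the answer. It only shows why the hot list can be read straight off the topic rows once the last comment is recorded at write time.

```ruby
# Hypothetical in-memory stand-in for the topics table.
topics = {
  1 => { last_comment_id: nil, last_comment_at: nil },
  2 => { last_comment_id: nil, last_comment_at: nil },
  3 => { last_comment_id: nil, last_comment_at: nil }
}
next_comment_id = 0

# Stand-in for Comment.create plus the after_create :update_topic hook.
create_comment = lambda do |topic_id, created_at|
  next_comment_id += 1
  # This is the denormalization step: record the newest comment on the topic.
  topics[topic_id][:last_comment_id] = next_comment_id
  topics[topic_id][:last_comment_at] = created_at
  next_comment_id
end

create_comment.call(1, Time.utc(2010, 8, 24))
create_comment.call(3, Time.utc(2010, 8, 25))
create_comment.call(3, Time.utc(2010, 8, 26))

# "Hot" topics: only topics that have a last comment, newest first, at most 5.
hot_ids = topics.select { |_id, row| row[:last_comment_at] }
                .sort_by { |_id, row| row[:last_comment_at] }
                .reverse
                .first(5)
                .map(&:first)
```

Each topic appears at most once because there is exactly one last comment per topic, so no DISTINCT or GROUP BY is needed.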

This is not that elegant in most SQL implementations. One way is to first get the list of the five most recent comments grouped by topic_id. Then get the comments.created_at by sub selecting with the IN clause.
I'm very new to Arel, but something like this could work:
c = Comment.arel_table
t = Topic.arel_table
recent_unique_comments = Comment.group(c[:topic_id]) \
.order('comments.created_at DESC') \
.limit(5) \
.select(c[:topic_id])
recent_topics = Topic.where(t[:id].in(recent_unique_comments.map(&:topic_id)))
# Another experiment (there has to be another way...)
recent_comments = Comment.join(Topic) \
.on(Comment[:topic_id].eq(Topic[:topic_id])) \
.where(t[:topic_id].in(recent_unique_comments)) \
.order('comments.topic_id, comments.created_at DESC') \
.group_by(&:topic_id).to_a.map{|hsh| hsh[1][0]}
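The intended result of the two steps above (one row per topic, ordered by that topic's newest comment) can be sketched in plain Ruby without a database. The `comment_row` struct and the sample rows are made up for illustration; this shows the semantics the SQL should have, not the query itself.

```ruby
# Hypothetical stand-in for rows of the comments table.
comment_row = Struct.new(:id, :topic_id, :created_at)

comments = [
  comment_row.new(1, 1, Time.utc(2010, 8, 20)),
  comment_row.new(2, 3, Time.utc(2010, 8, 21)),
  comment_row.new(3, 2, Time.utc(2010, 8, 22)),
  comment_row.new(4, 3, Time.utc(2010, 8, 25)),
  comment_row.new(5, 3, Time.utc(2010, 8, 26))
]

# GROUP BY topic_id, keeping the newest comment of each group.
latest_per_topic = comments.group_by(&:topic_id).map do |_topic_id, rows|
  rows.max_by(&:created_at)
end

# ORDER BY latest comment DESC LIMIT 5.
hot = latest_per_topic.sort_by(&:created_at).reverse.first(5)
hot_topic_ids = hot.map(&:topic_id)
```

Topic 3 has three comments but appears once, and its newest comment (id 5) is still available for displaying how long ago the last post was.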

In order to accomplish this you need to have a scope with a GROUP BY to get the latest comment for each topic. You can then order this scope by created_at to get the most recent commented on topics.
The following works for me using sqlite
class Comment < ActiveRecord::Base
belongs_to :topic
scope :recent, order("comments.created_at DESC")
scope :latest_by_topic, group("comments.topic_id").order("comments.created_at DESC")
end
class Topic < ActiveRecord::Base
has_many :comments
scope :hot, joins(:comments) & Comment.latest_by_topic & limit(5)
end
I used the following seeds.rb to generate the test data
(1..10).each do |t|
topic = Topic.new
(1..10).each do |c|
topic.comments.build(:subject => "Comment #{c} for topic #{t}")
end
topic.save
end
And the following are the test results
ruby-1.9.2-p0 > Topic.hot.map(&:id)
=> [10, 9, 8, 7, 6]
ruby-1.9.2-p0 > Topic.first.comments.create(:subject => 'Topic 1 - New comment')
=> #<Comment id: 101, subject: "Topic 1 - New comment", topic_id: 1, content: nil, created_at: "2010-08-26 10:53:34", updated_at: "2010-08-26 10:53:34">
ruby-1.9.2-p0 > Topic.hot.map(&:id)
=> [1, 10, 9, 8, 7]
ruby-1.9.2-p0 >
The SQL generated for SQLite (reformatted below) is extremely simple. I would hope Arel renders different SQL for other engines, because this query would fail in many databases: the columns selected from topics are not in the GROUP BY list. If that is a problem, you could probably work around it by limiting the selected columns to just comments.topic_id.
puts Topic.hot.to_sql
SELECT "topics".*
FROM "topics"
INNER JOIN "comments" ON "comments"."topic_id" = "topics"."id"
GROUP BY comments.topic_id
ORDER BY comments.created_at DESC LIMIT 5

Since the question was about Arel, I thought I'd add this in, since Rails 3.2.1 adds uniq to the QueryMethods:
If you add .uniq to the Arel it adds DISTINCT to the select statement.
e.g. Topic.hot.uniq
Also works in scope:
e.g. scope :hot, joins(:comments).order("comments.created_at DESC").limit(5).uniq
So I would assume that
scope :hot, joins(:comments) & Comment.recent & limit(5) & uniq
should also probably work.
See http://apidock.com/rails/ActiveRecord/QueryMethods/uniq

Related

Correctness of using methods in model that joins other models in Rails

Maybe the title is confusing, but I didn't know how to explain my doubt.
Say I have the following class methods that will be helpful in order to do chainings to query a model called Player. A Player belongs_to a User, but if I want to fetch Players from a particular village or city, I have to fetch the User model.
def self.by_village(village)
joins(:user).where(:village => "village")
end
def self.by_city(city)
joins(:user).where(:city => "city")
end
Let's say I want to fetch a Player by village but also by city, so I would do...
Player.by_city(city).by_village(village).
This would be doing a join of the User twice, and I don't think that is correct.. Right?
So my question is: What would be the correct way of doing so?
I haven't tried that, but I would judge the answer to your question by the actual sql query ActiveRecord generates. If it does only one join, I would use it as you did, if this results in two joins you could create a method by_village_and_city.
OK. Tried it now:
1.9.2p290 :022 > Player.by_city("Berlin").by_village("Kreuzberg")
Player Load (0.3ms) SELECT "players".* FROM "players" INNER JOIN "users" ON "users"."id" = "players"."user_id" WHERE "users"."city" = 'Berlin' AND "users"."village" = 'Kreuzberg'
=> [#<Player id: 1, user_id: 1, created_at: "2012-07-28 17:05:35", updated_at: "2012-07-28 17:05:35">, #<Player id: 2, user_id: 2, created_at: "2012-07-28 17:08:14", updated_at: "2012-07-28 17:08:14">]
So ActiveRecord combines the two queries and does the right thing, and I would use it as is.
I did have to change your implementation, though:
class Player < ActiveRecord::Base
belongs_to :user
def self.by_village(village)
joins(:user).where('users.village' => village)
end
def self.by_city(city)
joins(:user).where('users.city' => city)
end
end
and what you're doing is usually handled with parameterized scopes:
class Player < ActiveRecord::Base
belongs_to :user
scope :by_village, lambda { |village| joins(:user).where('users.village = ?', village) }
scope :by_city, lambda { |city| joins(:user).where('users.city = ?', city) }
end

ActiveRecord Association select counts for included records

Example
class User
has_many :tickets
end
I want to create an association that contains the logic for counting a user's tickets, and use it in includes (user has_one ticket_count):
User.includes(:tickets_count)
I tried
has_one :tickets_count, :select => "COUNT(*) as tickets_count,tickets.user_id " ,:class_name => 'Ticket', :group => "tickets.user_id", :readonly => true
User.includes(:tickets_count)
ArgumentError: Unknown key: group
In this case association query in include should use count with group by ...
How can I implement this using rails?
Update
I can't change table structure
I want AR generate 1 query for collection of users with includes
Update2
I know SQL and I know how to select this with joins, but my question is not "How to get the data". My question is about building an association that I can use in includes. Thanks
Update3
I tried creating an association like user has_one ticket_count, but it looks like:
has_one doesn't support association extensions
has_one doesn't support the :group option
has_one doesn't support finder_sql
Try this:
class User
has_one :tickets_count, :class_name => 'Ticket',
:select => "user_id, tickets_count",
:finder_sql => '
SELECT b.user_id, COUNT(*) tickets_count
FROM tickets b
WHERE b.user_id = #{id}
GROUP BY b.user_id
'
end
Edit:
It looks like the has_one association does not support the finder_sql option.
You can easily achieve what you want by using a combination of scope/class methods
class User < ActiveRecord::Base
def self.include_ticket_counts
joins(
%{
LEFT OUTER JOIN (
SELECT b.user_id, COUNT(*) tickets_count
FROM tickets b
GROUP BY b.user_id
) a ON a.user_id = users.id
}
).select("users.*, COALESCE(a.tickets_count, 0) AS tickets_count")
end
end
Now
User.include_ticket_counts.where(:id => [1,2,3]).each do |user|
p user.tickets_count
end
This solution has performance implications if you have millions of rows in the tickets table. You should consider filtering the JOIN result set by providing WHERE to the inner query.
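What the LEFT OUTER JOIN with COALESCE computes can be illustrated with a small plain-Ruby sketch. The `users` and `tickets` arrays are invented sample data, and hashes stand in for table rows.

```ruby
# Hypothetical sample rows.
users = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }, { id: 3, name: "cy" }]
tickets = [{ id: 10, user_id: 1 }, { id: 11, user_id: 1 }, { id: 12, user_id: 3 }]

# Inner query: SELECT user_id, COUNT(*) FROM tickets GROUP BY user_id.
counts = Hash.new(0)
tickets.each { |ticket| counts[ticket[:user_id]] += 1 }

# LEFT OUTER JOIN + COALESCE(..., 0): users without tickets are kept, with a count of 0.
users_with_counts = users.map { |user| user.merge(tickets_count: counts[user[:id]]) }
```

An inner join would drop the user with no tickets. The outer join keeps that row, and the default of 0 plays the role of COALESCE.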
You can simply use for a particular user:
user.tickets.count
Or, if you want this value automatically cached by Rails,
declare the :counter_cache => true option on the other side of the association:
class Ticket < ActiveRecord::Base
belongs_to :user, :counter_cache => true
end
You also need a column in your users table named tickets_count.
With this, each time you add a new ticket to a user, Rails will update this column, so when you fetch your user record you can simply access this column to get the ticket count without an additional query.
Not pretty, but it works:
users = User.joins("LEFT JOIN tickets ON users.id = tickets.user_id").select("users.*, count(tickets.id) as ticket_count").group("users.id")
users.first.ticket_count
What about adding a method in the User model that does the query?
You wouldn't be modifying the table structure, or you can't modify that either?
How about adding a subselect scope to ApplicationRecord:
scope :subselect,
lambda { |aggregate_fn, as:, from:|
query = self.klass
.select(aggregate_fn)
.from("#{self.table_name} _#{self.table_name}")
.where("_#{self.table_name}.id = #{self.table_name}.id")
.joins(from)
select("(#{query.to_sql}) AS #{as}")
}
Then, one might use the following query:
users = User.select('users.*').subselect('COUNT(*)', as: :tickets_count, from: :tickets)
users.first.tickets_count
# => 5

Rails HABTM joining with another condition

I am trying to get a list, and I will use books as an example.
class Book < ActiveRecord::Base
belongs_to :type
has_and_belongs_to_many :genres
end
class Genre < ActiveRecord::Base
has_and_belongs_to_many :books
end
So in this example I want to show a list of all Genres, but the first column should be the type. So if, say, a genre is "Space", the types could be "Non-fiction" and "Fiction", and it would show:
Type Genre
Fiction Space
Non-fiction Space
The Genre table has only "id", "name", and "description", the join table genres_books has "genre_id" and "book_id", and the Book table has "type_id" and "id". I am having trouble getting this to work however.
I know the sql code I would need which would be:
SELECT distinct genres.name, books.type_id FROM `genres` INNER JOIN genres_books ON genres.id = genres_books.genre_id INNER JOIN books ON genres_books.book_id = books.id order by genres.name
and I found I could do
@genre = Genre.all
@genre.each do |genre|
@type = genre.books.find(:all, :select => 'type_id', :group => 'type_id')
@type.each do |type|
and this would let me see the type along with each genre and print them out, but I couldn't really work with them all at once. I think what would be ideal is if at the Genre.all statement I could somehow group them there so I can keep the genre/type combinations together and work with them further down the road. I was trying to do something along the lines of:
@genres = Genre.find(:all, :include => :books, :select => 'DISTINCT genres.name, genres.description, books.product_id', :conditions => [Genre.book_id = :books.id, Book.genres.id = :genres.id] )
But at this point I am running around in circles and not getting anywhere. Do I need to be using has_many :through?
The following examples use your models, defined above. You should use scopes to push associations back into the model (alternately you can just define class methods on the model). This helps keep your record-fetching calls in check and helps you stick within the Law of Demeter.
Get a list of Books, eagerly loading each book's Type and Genres, without conditions:
class Book < ActiveRecord::Base
scope :with_types_and_genres, includes(:type, :genres)
end
@books = Book.with_types_and_genres #=> [ * a bunch of book objects * ]
Once you have that, if I understand your goal, you can just do some in-Ruby grouping to corral your Books into the structure that you need to pass to your view.
@books_by_type = @books.group_by { |book| book.type }
# or the same line, more concisely
@books_by_type = @books.group_by(&:type)
@books_by_type.each_pair do |type, books|
books.each do |book|
puts "#{book.genres.map(&:name).join(', ')} (#{type.name})"
end
end
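If the goal is the distinct Type/Genre table from the question, the in-Ruby step could look like the following sketch. It uses a hypothetical `book` struct with plain strings in place of the Type and Genre records, so the names and sample data are assumptions.

```ruby
# Hypothetical stand-in for eagerly loaded books: a type name and a list of genre names.
book = Struct.new(:title, :type, :genres)

books = [
  book.new("A", "Fiction", ["Space", "Romance"]),
  book.new("B", "Non-fiction", ["Space"]),
  book.new("C", "Fiction", ["Space"])
]

# One [type, genre] pair per book/genre combination, de-duplicated like SELECT DISTINCT,
# then ordered by genre name (and type within a genre).
pairs = books.flat_map { |b| b.genres.map { |genre| [b.type, genre] } }
             .uniq
             .sort_by { |type, genre| [genre, type] }
```

With real records, `b.type` and `genre` would be model objects, and the view would print `type.name` and `genre.name` for each pair.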

Rails 3 HABTM Proper form for query with condition = array

I was implementing my first HABTM relationship and have run into an issue with my query.
I am looking to validate my approach and to see if I have found a bug in the AREL (or some other part of Rails) code.
I have the following models
class Item < ActiveRecord::Base
belongs_to :user
belongs_to :category
has_and_belongs_to_many :regions
end
class Region < ActiveRecord::Base
has_ancestry
has_and_belongs_to_many :items
end
I have the associated items_regions table:
class CreateItemsRegionsTable < ActiveRecord::Migration
def self.up
create_table :items_regions, :id => false do |t|
t.references :item, :null => false
t.references :region, :null => false
end
add_index(:items_regions, [:item_id, :region_id], :unique => true)
end
def self.down
drop_table :items_regions
end
end
My goal is to create a scope/query as follows:
Find all items in a region (and its subregions)
The ancestry gem provides a method to retrieve a region's subtree (the region and its descendants) as an array of ids. In this case,
ruby-1.9.2-p180 :167 > a = Region.find(4)
=> #<Region id: 4, name: "All", created_at: "2011-04-12 01:14:00", updated_at: "2011-04-12 01:14:00", ancestry: nil, cached_slug: "all">
ruby-1.9.2-p180 :168 > region_list = a.subtree_ids
=> [1, 2, 3, 4]
If there is only one element in the array, the following works
items = Item.joins(:regions).where(["region_id = ?", [1]])
The sql generated is
"SELECT `items`.* FROM `items` INNER JOIN `items_regions` ON `items_regions`.`item_id` = `items`.`id` INNER JOIN `regions` ON `regions`.`id` = `items_regions`.`region_id` WHERE (region_id = 1)"
However, if there are multiple items in the array and I try to use IN
Item.joins(:regions).where(["region_id IN ?", [1,2,3,4]])
ActiveRecord::StatementInvalid: Mysql::Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1,2,3,4)' at line 1: SELECT `items`.* FROM `items` INNER JOIN `items_regions` ON `items_regions`.`item_id` = `items`.`id` INNER JOIN `regions` ON `regions`.`id` = `items_regions`.`region_id` WHERE (region_id IN 1,2,3,4)
The sql generated has an error at the end
"SELECT `items`.* FROM `items` INNER JOIN `items_regions` ON `items_regions`.`item_id` = `items`.`id` INNER JOIN `regions` ON `regions`.`id` = `items_regions`.`region_id` WHERE (region_id IN 1,2,3,4)"
the last part of the generated code should be
(region_id IN ("1,2,3,4"))
If I edit the sql manually and run it, I get what I expect.
So, two questions:
Is my approach for the single value case correct?
Is the sql generation a bug or have I configured things incorrectly?
Thanks
Alan
.where('regions.id' => array)
Should work in all cases, whether or not you specify one value or multiple.
The reason your original query doesn't work is that you actually need to specify valid SQL. So alternatively you can do
.where('region_id IN (?)', [1,2,3,4])
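The difference between the two forms is easier to see with a toy version of placeholder substitution. The `bind` lambda below is a hypothetical, heavily simplified stand-in and not ActiveRecord's actual implementation. It assumes integer values only and simply mimics the behaviour visible in the generated SQL above, where an array is expanded to a comma-separated list with no parentheses added.

```ruby
# Hypothetical, simplified placeholder substitution (integers only, single "?").
bind = lambda do |template, value|
  quoted = value.is_a?(Array) ? value.map(&:to_i).join(",") : value.to_i.to_s
  template.sub("?", quoted)
end

single = bind.call("region_id = ?", [1])
broken = bind.call("region_id IN ?", [1, 2, 3, 4])
valid  = bind.call("region_id IN (?)", [1, 2, 3, 4])
```

Here `broken` reproduces the invalid fragment from the error message, while `valid` has the parentheses that SQL requires around an IN list. The single-element case only happens to work because a one-item list expands to a bare value.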
The other responders are correct regarding use of the conditions hash, but the specific issue you're running into after that has to do with field specificity:
Mysql::Error: Unknown column 'items.region_id' in 'where clause'
You're trying to draw a condition based on "region_id", but since you didn't explicitly give a table it uses "items" by default. It sounds like your column is actually on the "items_regions" table. Try this:
where("items_regions.region_id IN (?)", [1,2,3,4])
Or alternatively:
where(:items_regions => {:region_id => [1,2,3,4]})
Did you try the
.where('region_id IN (?)', [1,2,3,4])
form? You need the () to be valid.
I think the cleanest and most idiomatic way to do this in Arel is the nested hash syntax which avoids string literals (and any direct reference to the HABTM join table):
Item.joins(:regions).where(regions: { id: [1,2,3,4] })

Grab only the latest comment in Rails

In a typical User - Post - Comment model in Rails, every user can create a Post and can also create a Comment. The question is how to grab every user's latest comment on a specific post.
Example:
Post A have 3 user making comment
User 1 have comment 1, 2, 3, 4, 5, 6
User 2 have comment 1, 2, 3, 4
User 3 have comment 1, 2
So the view I want is just the latest comment for every user:
Post A have 3 user making comment
User 1 latest comment that is 6
User 2 latest comment that is 4
user 3 latest comment that is 2
How to do it ?
thanks
Something like this:
post.comments.for_user(current_user).last
add a named_scope in your model
class Comment < ActiveRecord::Base
named_scope :for_user, lambda { |user| { :conditions => { :user_id => user.id } } }
end
That should do the trick.
If you'd rather do it in Ruby:
messages_by_users = post.comments.group_by(&:user)
messages_by_users.each do |key, value|
messages_by_users[key] = value.last
end
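The same grouping idea as a self-contained sketch with made-up data. It assumes, as one of the other answers does, that comment ids are assigned sequentially, so the highest id per user is that user's latest comment. The `comment` struct and the rows are hypothetical.

```ruby
# Hypothetical stand-in for comment rows.
comment = Struct.new(:id, :user_id, :post_id)

comments = [
  comment.new(1, 1, 1), comment.new(2, 2, 1), comment.new(3, 1, 1),
  comment.new(4, 3, 1), comment.new(5, 2, 1), comment.new(6, 1, 2),
  comment.new(7, 3, 1)
]

# Latest comment per user on post 1: filter by post, group by user, keep the highest id.
latest_by_user = comments.select { |c| c.post_id == 1 }
                         .group_by(&:user_id)
                         .map { |user_id, rows| [user_id, rows.max_by(&:id)] }
                         .to_h
```

Using `max_by(&:id)` instead of `last` avoids depending on the order in which the comments were loaded.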
I have had to get this kind of data, and I usually end up doing two queries. In my case I have Blogs and their Posts, and I wanted a list of the 3 most recent blog posts with the restriction that the blogs are unique: I don't want 2 posts from the same blog. I ended up doing something like this (MySQL):
q = <<-EOQ
SELECT id,pub_date FROM
(
SELECT id,blog_id,pub_date
FROM posts
ORDER BY pub_date DESC
LIMIT 40
)
t
GROUP BY blog_id
ORDER BY pub_date DESC
LIMIT #{num_posts}
EOQ
post_ids = Post.connection.select_values(q)
Post.find(:all, :include => [:blog], :conditions => ["id IN (?)", post_ids], :order => "posts.pub_date DESC")
So in your case you might have something like:
q = <<-EOQ
SELECT id FROM
(
SELECT id,post_id
FROM comments
ORDER BY id DESC
LIMIT 40
)
t
GROUP BY post_id
ORDER BY id DESC
LIMIT 10
EOQ
comment_ids = Comment.connection.select_values(q)
Comment.find(:all, :conditions => ["id IN (?)", comment_ids], :order => "comments.id DESC")
Assuming that your database is assigning sequential IDs to the comments, you can do this:
class Comment < ActiveRecord::Base
named_scope :most_recent, lambda {
latest_comments = Comment.maximum :id, :group => "user_id, post_id"
{ :conditions => [ "comments.id IN (?)", latest_comments.map(&:last) ] }
}
end
This gives you a two-query method that you can use in a variety of ways. The named_scope above pulls back the most recent comments for all users on all posts. This might be a problem if your database is gigantic, but you can certainly add conditions to make it more specific.
As it stands, it is a flexible method that allows you to do the following:
Comment.most_recent.find_by_user @user #-> the most recent comments on all posts by a user
@user.comments.most_recent #-> same as above
Comment.most_recent.find_by_post @post #-> the most recent comments on a single post by all users
@post.comments.most_recent #-> same as above
Comment.most_recent.find_by_user_and_post @user, @post #-> the specific most recent comment by a certain user on a certain post
@post.comments.most_recent.find_by_user @user #-> you get the idea
