:select with find_in_batches in rails - ruby-on-rails

How can I include a :select clause with find_in_batches? The following throws the error "Mysql::Error: Unknown column 'users.id' in 'field list'":
Post.find_in_batches(:batch_size => 100, :select => "users.id, users.name, categories.name, posts.id", :include => [:user, :category]) do |group|
#stuff with group
end

So, if you're considering using find_in_batches, it probably means you have a lot of records to go through, and you may well want only selected fields returned from the DB.
In Rails 3/4 you can chain find_in_batches with most other ActiveRecord::Relation methods (I have not personally tested all of them).
This is probably what you're looking for:
User.select(:id).find_in_batches(:batch_size => 100) do |group|
# do something with group...
# like print all the ids
puts group.map(&:id)
end
If you try this in the console it generates SQL like this...
SELECT id FROM `users` WHERE (`users`.`id` > 895846) ORDER BY `users`.`id` ASC LIMIT 100
See more info here: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
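The SQL above hints at how the batching works: each query resumes just after the last id seen in the previous batch, which is presumably also why the id column has to stay in the select list. Here is a minimal plain-Ruby sketch of that idea, with no Rails or database involved. USERS and each_id_batch are made-up names for illustration, and ids are assumed to be positive integers.

```ruby
# Plain-Ruby stand-in for a users table; no database or Rails required.
USERS = (1..7).map { |i| { id: i, name: "user#{i}" } }

# Hypothetical helper mimicking keyset batching: each pass takes the rows whose
# id is greater than the last id seen, ordered by id, limited to batch_size.
def each_id_batch(rows, batch_size)
  last_id = 0 # assumes ids are positive integers
  loop do
    batch = rows.select { |row| row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
                .map { |row| row[:id] } # comparable to SELECT id
    break if batch.empty?
    yield batch
    last_id = batch.last
  end
end

batches = []
each_id_batch(USERS, 3) { |ids| batches << ids }
p batches
```

Because every pass filters on the id, no OFFSET is needed, so later batches should cost about the same as early ones.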

Your life with Rails will be much easier if you just retrieve all of the fields for each model queried, like so:
Post.find_in_batches(:batch_size => 100, :include => [:user, :category]) do |posts|
  # find_in_batches yields an array of records, so iterate over each batch
  posts.each do |post|
    u = post.user
    c = post.category
    # do stuff
  end
end
A trimmed select list, as in your question, provides a limited DB performance improvement, but in most cases not enough to be worth the clunkier code.

Related

left join not providing the object properties i want

I'm sure I'm doing something stupid, but my AR code
@comments = Comment.find(:all, :limit => 10,
:joins => "LEFT JOIN `users` ON comments.user_id = users.id",
:select => 'comments.*, users.theme')
is generating the correct SQL:
SELECT comments.*, users.theme from comments LEFT JOIN users on comments.user_id = users.id
and when I run it in mysql, I get the results I want, but when I try to access @comment.theme (in an each loop on @comments) after the above AR call is made, theme is not there.
So, is there something special I have to do to the Comment model to allow joins to populate the associated columns? I thought that Rails would just add them as properties I could dot-grab.
Try this:
@comments.each do |comment|
puts comment.theme
end
All the fields are in the @comments array, as objects of type Comment, and the attribute names are as given in the select, regardless of which table they came from.
Good luck.

ActiveRecord optimization - best way to query all at once?

I am trying to reduce the number of queries using ActiveRecord 3.0.9. I generated about 200K 'dummy' customers and 500K orders.
Here are the models:
class Customer < ActiveRecord::Base
has_many :orders
end
class Order < ActiveRecord::Base
belongs_to :customer
has_many :products
end
class Product < ActiveRecord::Base
belongs_to :order
end
When you use this code in the controller:
@customers = Customer.where(:active => true).paginate(:page => params[:page], :per_page => 100)
# SELECT * FROM customers ...
and this in the view (I removed the HAML markup to make it easier to read):
@order = @customers.each do |customer|
customer.orders.each do |order| # SELECT * FROM orders ...
%td= order.products.count # SELECT COUNT(*) FROM products ...
%td= order.products.sum(:amount) # SELECT SUM(amount) FROM products ...
end
end
The page renders a table with 100 rows per page. The problem is that it is slow to load because it fires about 3-5 queries per customer's orders, which comes to about 300 queries to load the page.
Is there an alternative way to reduce the number of queries and load the page faster?
Notes:
1) I have attempted to use includes(:orders), but it included more than 200,000 order_ids, which is a problem.
2) The tables are already indexed.
If you're only using COUNT and SUM(amount) then what you really need is to retrieve only that information and not the orders themselves. This is easily done with SQL:
SELECT orders.customer_id, orders.id AS order_id, COUNT(products.id) AS order_count, SUM(products.amount) AS order_total FROM orders LEFT JOIN products ON orders.id = products.order_id GROUP BY orders.customer_id, orders.id
You can wrap this in a method that returns a nice, orderly hash by re-mapping the SQL results into a structure that fits your requirements:
class Order < ActiveRecord::Base
def self.totals
query = "..." # Query from above
result = { }
self.connection.select_rows(query).each do |row|
# Build out an array for each unique customer_id in the results
customer_set = result[row[0].to_i] ||= [ ]
# Add a hash representing each order to this customer order set
customer_set << { :order_id => row[1].to_i, :count => row[2].to_i, :total => row[3].to_i }
end
result
end
end
This means you can fetch all order counts and totals in a single pass. If you have an index on customer_id, which is imperative in this case, then the query will usually be really fast even for large numbers of rows.
You can save the results of this method into a variable such as @order_totals and reference it when rendering your table:
- @order = @customers.each do |customer|
- @order_totals[customer.id].each do |order|
%td= order[:count]
%td= order[:total]
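To make the re-mapping step concrete, here is a small plain-Ruby sketch of the same grouping, run against hard-coded rows instead of a database. SAMPLE_ROWS and group_totals are hypothetical names, and the row values are made up. The rows are assumed to arrive as arrays of strings in the order customer_id, order_id, count, total.

```ruby
# Rows shaped like the query result: [customer_id, order_id, count, total],
# all as strings. The values are made up for illustration.
SAMPLE_ROWS = [
  ["1", "10", "2", "30"],
  ["1", "11", "1", "5"],
  ["2", "12", "3", "45"]
]

# Hypothetical helper: group the flat rows into { customer_id => [order hashes] }.
def group_totals(rows)
  rows.each_with_object({}) do |(customer_id, order_id, count, total), result|
    (result[customer_id.to_i] ||= []) << {
      :order_id => order_id.to_i,
      :count    => count.to_i,
      :total    => total.to_i
    }
  end
end

order_totals = group_totals(SAMPLE_ROWS)

# fetch with a default avoids calling each on nil for customers without orders.
[1, 2, 3].each do |customer_id|
  order_totals.fetch(customer_id, []).each do |order|
    puts "customer #{customer_id}: order #{order[:order_id]} count=#{order[:count]} total=#{order[:total]}"
  end
end
```

A customer with no orders never appears in the query result, so looking the customer up with fetch and an empty default is a little safer in the view than indexing the hash directly.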
You can try something like this (yes, it looks ugly, but you want performance):
orders = Order.find_by_sql([<<-EOD, customer.id])
SELECT os.id, os.name, COUNT(ps.amount) AS count, SUM(ps.amount) AS amount
FROM orders os
JOIN products ps ON ps.order_id = os.id
WHERE os.customer_id = ? GROUP BY os.id, os.name
EOD
%td= orders.name
%td= orders.count
%td= orders.amount
Added: Probably it is better to create count and amount cache in Orders, but you will have to maintain it (count can be counter-cache, but I doubt there is a ready recipe for amount).
You can join the tables in with Arel (I prefer to avoid writing raw sql when possible). I believe that for your example you would do something like:
Customer.joins(:orders => :products).select("customers.id, customers.name, count(products.id) as count, sum(products.amount) as total_amount")
The first method--
Customer.joins(:orders => :products)
--pulls in the nested association in one statement. Then the second part--
.select("customers.id, customers.name, count(products.id) as count, sum(products.amount) as total_amount")
--specifies exactly what columns you want back.
Chain those and I believe you'll get a list of Customer instances populated only with what you've specified in the select method. You have to be careful, though, because you now have read-only objects in hand that are possibly in an invalid state.
As with all the Arel methods what you get from those methods is an ActiveRecord::Relation instance. It's only when you start to access that data that it goes out and executes the SQL.
I have some basic nervousness that my syntax is incorrect but I'm confident that this can be done w/o relying on executing raw SQL.

ActiveRecord Query Union

I've written a couple of complex queries (at least to me) with Ruby on Rails' query interface:
watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})
Both of these queries work fine by themselves. Both return Post objects. I would like to combine these posts into a single ActiveRelation. Since there could be hundreds of thousands of posts at some point, this needs to be done at the database level. If it were a MySQL query, I could simply use the UNION operator. Does anybody know if I can do something similar with RoR's query interface?
Here's a quick little module I wrote that allows you to UNION multiple scopes. It also returns the results as an instance of ActiveRecord::Relation.
module ActiveRecord::UnionScope
def self.included(base)
base.send :extend, ClassMethods
end
module ClassMethods
def union_scope(*scopes)
id_column = "#{table_name}.id"
sub_query = scopes.map { |s| s.select(id_column).to_sql }.join(" UNION ")
where "#{id_column} IN (#{sub_query})"
end
end
end
Here's the gist: https://gist.github.com/tlowrimore/5162327
Edit:
As requested, here's an example of how UnionScope works:
class Property < ActiveRecord::Base
include ActiveRecord::UnionScope
# some silly, contrived scopes
scope :active_nearby, -> { where(active: true).where('distance <= 25') }
scope :inactive_distant, -> { where(active: false).where('distance >= 200') }
# A union of the aforementioned scopes
scope :active_near_and_inactive_distant, -> { union_scope(active_nearby, inactive_distant) }
end
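To see what condition union_scope ends up building, here is a plain-Ruby sketch of just the string assembly. union_in_condition is a hypothetical stand-in, and the two SQL strings are made-up examples of what s.select(id_column).to_sql might return for the scopes above.

```ruby
# Hypothetical helper mirroring the string assembly inside union_scope.
# scope_sqls stands in for what scope.select(id_column).to_sql would return.
def union_in_condition(table_name, scope_sqls)
  id_column = "#{table_name}.id"
  sub_query = scope_sqls.join(" UNION ")
  "#{id_column} IN (#{sub_query})"
end

# Made-up SQL for the two contrived scopes.
active_nearby    = "SELECT properties.id FROM properties WHERE active = 1 AND distance <= 25"
inactive_distant = "SELECT properties.id FROM properties WHERE active = 0 AND distance >= 200"

puts union_in_condition("properties", [active_nearby, inactive_distant])
```

Since the result is only a WHERE condition on the model's own table, it can still be chained with further where, order, or limit calls like any other relation.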
I also have encountered this problem, and now my go-to strategy is to generate SQL (by hand or using to_sql on an existing scope) and then stick it in the from clause. I can't guarantee it's any more efficient than your accepted method, but it's relatively easy on the eyes and gives you a normal ARel object back.
watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})
Post.from("(#{watched_news_posts.to_sql} UNION #{watched_topic_posts.to_sql}) AS posts")
You can do this with two different models as well, but you need to make sure they both "look the same" inside the UNION -- you can use select on both queries to make sure they will produce the same columns.
topics = Topic.select('user_id AS author_id, description AS body, created_at')
comments = Comment.select('author_id, body, created_at')
Comment.from("(#{comments.to_sql} UNION #{topics.to_sql}) AS comments")
Based on Olives' answer, I did come up with another solution to this problem. It feels a little bit like a hack, but it returns an instance of ActiveRelation, which is what I was after in the first place.
Post.where('posts.id IN
(
SELECT post_topic_relationships.post_id FROM post_topic_relationships
INNER JOIN "watched" ON "watched"."watched_item_id" = "post_topic_relationships"."topic_id" AND "watched"."watched_item_type" = "Topic" WHERE "watched"."user_id" = ?
)
OR posts.id IN
(
SELECT "posts"."id" FROM "posts" INNER JOIN "news" ON "news"."id" = "posts"."news_id"
INNER JOIN "watched" ON "watched"."watched_item_id" = "news"."id" AND "watched"."watched_item_type" = "News" WHERE "watched"."user_id" = ?
)', id, id)
I'd still appreciate it if anybody has any suggestions to optimize this or improve the performance, because it's essentially executing three queries and feels a little redundant.
You could also use Brian Hempel's active_record_union gem, which extends ActiveRecord with a union method for scopes.
Your query would be like this:
Post.joins(:news => :watched).
where(:watched => {:user_id => id}).
union(Post.joins(:post_topic_relationships => {:topic => :watched}).
where(:watched => {:user_id => id}))
Hopefully this will be merged into ActiveRecord some day.
Could you use an OR instead of a UNION?
Then you could do something like:
Post.joins(:news => :watched, :post_topic_relationships => {:topic => :watched})
.where("watched.user_id = :id OR topic_watched.user_id = :id", :id => id)
(Since you are joining the watched table twice, I'm not too sure what the table names will be in the query.)
Since there are a lot of joins, it might also be quite heavy on the database, but it might be able to be optimized.
How about...
def union(scope1, scope2)
ids = scope1.pluck(:id) + scope2.pluck(:id)
where(id: ids.uniq)
end
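The id handling in that method can be tried out on plain arrays. This is only a sketch: union_ids is a made-up helper and the two id lists stand in for what scope1.pluck(:id) and scope2.pluck(:id) would return.

```ruby
# Hypothetical helper: combine two id lists the way the method above does.
def union_ids(first_ids, second_ids)
  (first_ids + second_ids).uniq
end

# Made-up ids standing in for the results of the two pluck calls.
news_post_ids  = [1, 2, 3, 5]
topic_post_ids = [3, 4, 5]

ids = union_ids(news_post_ids, topic_post_ids)
p ids

# Array#| gives the same result in one step.
p news_post_ids | topic_post_ids
```

Note that this approach runs both queries first and then sends every id back to the database in a third one, so with hundreds of thousands of posts the id list itself can become large.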
Arguably, this improves readability, but not necessarily performance:
def my_posts
Post.where <<-SQL, self.id, self.id
posts.id IN
(SELECT post_topic_relationships.post_id FROM post_topic_relationships
INNER JOIN watched ON watched.watched_item_id = post_topic_relationships.topic_id
AND watched.watched_item_type = "Topic"
AND watched.user_id = ?
UNION
SELECT posts.id FROM posts
INNER JOIN news ON news.id = posts.news_id
INNER JOIN watched ON watched.watched_item_id = news.id
AND watched.watched_item_type = "News"
AND watched.user_id = ?)
SQL
end
This method returns an ActiveRecord::Relation, so you could call it like this:
my_posts.order("watched_item_type, post.id DESC")
There is an active_record_union gem.
Might be helpful
https://github.com/brianhempel/active_record_union
With ActiveRecordUnion, we can do:
# the current user's (draft) posts and all published posts from anyone
current_user.posts.union(Post.published)
Which is equivalent to the following SQL:
SELECT "posts".* FROM (
SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1
UNION
SELECT "posts".* FROM "posts" WHERE (published_at < '2014-07-19 16:04:21.918366')
) posts
In a similar case I concatenated two arrays and used Kaminari.paginate_array(). It is a nice, working solution. I was unable to use where() because I needed to combine two results with different order() clauses on the same table.
Here's how I joined SQL queries using UNION in my own Ruby on Rails application.
You can use the code below as inspiration for your own.
class Preference < ApplicationRecord
scope :for, ->(object) { where(preferenceable: object) }
end
Below is the UNION where I joined the scopes together.
def zone_preferences
zone = Zone.find params[:zone_id]
zone_sql = Preference.for(zone).to_sql
region_sql = Preference.for(zone.region).to_sql
operator_sql = Preference.for(Operator.current).to_sql
Preference.from("(#{zone_sql} UNION #{region_sql} UNION #{operator_sql}) AS preferences")
end
Fewer problems and easier to follow:
def union_scope(*scopes)
scopes[1..-1].inject(where(id: scopes.first)) { |all, scope| all.or(where(id: scope)) }
end
So in the end:
union_scope(watched_news_posts, watched_topic_posts)
gem 'active_record_extended'
Also has a set of union helpers among many others.
I would just run the two queries you need and combine the arrays of records that are returned:
@posts = watched_news_posts + watched_topic_posts
Or, at the very least, test it out. Do you think the array combination in Ruby will be far too slow? Looking at the suggested queries to get around the problem, I'm not convinced that there will be a significant performance difference.
Elliot Nelson's answer is good, except for the case where one of the relations is empty. I would do something like this:
def union_2_relations(relation1, relation2)
  if relation1.any? && relation2.any?
    # wrap the whole UNION in parentheses so it can be aliased as a subquery
    sql = "((#{relation1.to_sql}) UNION (#{relation2.to_sql})) AS #{relation1.klass.table_name}"
    relation1.klass.from(sql)
  elsif relation1.any?
    relation1
  elsif relation2.any?
    relation2
  else
    # both relations are empty; none (Rails 4+) returns an empty relation
    relation1.klass.none
  end
end
When we add UNION to the scopes, it sometimes breaks because of an ORDER BY clause added before the UNION.
So I changed it to produce the same effect as a UNION.
module UnionScope
def self.included(base)
base.send(:extend, ClassMethods)
end
module ClassMethods
def union_scope(*scopes)
id_column = "#{table_name}.id"
sub_query = scopes.map { |s| s.pluck(:id) }.flatten
where("#{id_column} IN (?)", sub_query)
end
end
end
And then use it like this in any model
class Model
include UnionScope
scope :union_of_scopeA_scopeB, -> { union_scope(scopeA, scopeB) }
end
Tim's answer is great. It uses the ids of the scopes in the WHERE clause. As shosti reports, this method is problematic in terms of performance because all ids need to be generated during query execution. This is why I prefer joeyk16's answer. Here is a generalized module:
module ActiveRecord::UnionScope
def self.included(base)
base.send :extend, ClassMethods
end
module ClassMethods
def union(*scopes)
from("(#{scopes.map(&:to_sql).join(' UNION ')}) AS #{table_name}")
end
end
end
If you don't want to use SQL syntax inside your code, here's a solution with Arel:
watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id}).arel
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id}).arel
results = Arel::Nodes::Union.new(watched_news_posts, watched_topic_posts)
Post.from(Post.arel_table.create_table_alias(results, :posts))

Simple ActiveRecord Question

I have a database model set up such that a post has many votes, a user has many votes, and a vote belongs to both a user and a post. I'm using will_paginate and I'm trying to create a filter so that the user can sort posts by either the date or the number of votes a post has. The date option is simple and looks like this:
@posts = Post.paginate :order => "date DESC"
However, I can't quite figure out how to do the ordering for the votes. If this were SQL, I would simply use GROUP BY on the votes user_id column, along with the count function, and then I would join the result with the posts table.
What's the correct way to do this with ActiveRecord?
1) Use the counter cache mechanism to store the vote count in Post model.
# add a column called votes_count
class Post
has_many :votes
end
class Vote
belongs_to :post, :counter_cache => true
end
Now you can sort the Post model by vote count as follows:
Post.order(:votes_count)
2) Use group by.
Post.select("posts.*, COUNT(votes.post_id) votes_count").
joins(:votes).group("votes.post_id").order(:votes_count)
If you want to include the posts without votes in the result-set then:
Post.select("posts.*, COUNT(votes.post_id) votes_count").
joins("LEFT OUTER JOIN votes ON votes.post_id=posts.id").
group("posts.id").order(:votes_count)
I prefer approach 1 as it is efficient and the cost of vote count calculation is front loaded (i.e. during vote casting).
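The trade-off between the two approaches can be illustrated in plain Ruby, without a database. This is only a sketch: PostRow, VoteRow, and the three helper methods are made-up names, and the vote data is invented.

```ruby
PostRow = Struct.new(:id, :votes_count)
VoteRow = Struct.new(:post_id, :user_id)

# Approach 1: keep a cached count on the post, updated when the vote is cast.
def cast_vote(posts, votes, post_id, user_id)
  votes << VoteRow.new(post_id, user_id)
  posts[post_id].votes_count += 1
end

# Sorting by the cached column needs no access to the votes at all.
def ids_by_cached_count(posts)
  posts.values.sort_by { |post| -post.votes_count }.map(&:id)
end

# Approach 2: count the votes per post at read time, like GROUP BY with COUNT.
def ids_by_grouped_count(posts, votes)
  counts = votes.each_with_object(Hash.new(0)) { |vote, h| h[vote.post_id] += 1 }
  posts.values.sort_by { |post| -counts[post.id] }.map(&:id)
end

posts = { 1 => PostRow.new(1, 0), 2 => PostRow.new(2, 0), 3 => PostRow.new(3, 0) }
votes = []
[[2, 7], [2, 8], [1, 7], [2, 9]].each do |post_id, user_id|
  cast_vote(posts, votes, post_id, user_id)
end

p ids_by_cached_count(posts)
p ids_by_grouped_count(posts, votes)
```

Both orderings agree, but the first one pays a small cost on every vote, while the second one re-counts all votes on every read.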
Just do all the normal SQL stuff as part of the query with options.
@posts = Post.paginate :order => "date DESC", :joins => " inner join votes on post.id..." , :group => " votes.user_id"
http://apidock.com/rails/ActiveRecord/Base/find/class
I don't know much about your models, but you seem to know something about SQL, so consider
named scopes: you basically just put the query into a class method:
named_scope :index, :order => 'date DESC', :joins => .....
They can also take parameters:
named_scope :blah, lambda { |param|
# base query on param
}
If you are more familiar with SQL, you can also write your own query:
@posts = Post.find_by_sql( <<-SQL )
SELECT posts.*
....
SQL

Rails: how to load 2 models via join?

I am new to rails and would appreciate some help optimizing my database usage.
Is there a way to load two models associated with each other with one DB query?
I have two models Person and Image:
class Person < ActiveRecord::Base
has_many :images
end
class Image < ActiveRecord::Base
belongs_to :person
end
I would like to load a set of people and their associated images with a single trip to the DB using a join command. For instance, in SQL, I can load all the data I need with the following query:
select * from people join images on people.id = images.person_id where people.id in (2, 3) order by timestamp;
So I was hoping that this rails snippet would do what I need:
>> people_and_images = Person.find(:all, :conditions => ["people.id in (?)", "2, 3"], :joins => :images, :order => :timestamp)
This code executes the SQL statement I am expecting and loads the instances of Person I need. However, I see that accessing a Person's images leads to an additional SQL query.
>> people_and_images[0].images
Image Load (0.004889) SELECT * FROM `images` WHERE (`images`.person_id = 2)
Using the :include option in the call to find() does load both models, however it will cost me an additional SELECT by executing it along with the JOIN.
I would like to do in Rails what I can do in SQL which is to grab all the data I need with one query.
Any help would be greatly appreciated. Thanks!
You want to use :include, like this:
Person.find(:all, :conditions => ["people.id in (?)", [2, 3]], :include => :images, :order => :timestamp)
Check out the find documentation for more details.
You can use :include for eager loading of associations, and it does indeed run exactly 2 queries instead of the one you get with :joins: the first query loads the primary model and the second loads the associated models. This is especially helpful in solving the infamous N+1 query problem, which you will face if you don't use :include, because :joins doesn't eager-load the associations.
The difference between :joins and :include is one more query for :include, but the cost of not using :include will be a whole lot more.
you can check it up here: http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations
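To make the query counts concrete, here is a plain-Ruby simulation that only counts how many "queries" each loading strategy issues. Nothing here touches Rails or a real database: FakeDb, load_lazily, and load_eagerly are hypothetical names and the data is made up.

```ruby
# In-memory stand-ins for the people and images tables (made-up data).
PEOPLE = [{ id: 2, name: "Ann" }, { id: 3, name: "Bob" }]
IMAGES = [
  { id: 1, person_id: 2 }, { id: 2, person_id: 2 }, { id: 3, person_id: 3 }
]

# Hypothetical fake database that only counts how many "queries" it receives.
class FakeDb
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  def people(ids)
    @query_count += 1
    PEOPLE.select { |person| ids.include?(person[:id]) }
  end

  def images_for(person_ids)
    @query_count += 1
    IMAGES.select { |image| person_ids.include?(image[:person_id]) }
  end
end

# Lazy loading: one query for the people, then one more per person for images.
def load_lazily(db, ids)
  db.people(ids).map { |person| [person[:id], db.images_for([person[:id]])] }.to_h
end

# Eager loading: one query for the people, one for all images, grouped in Ruby.
def load_eagerly(db, ids)
  people = db.people(ids)
  images = db.images_for(people.map { |person| person[:id] })
             .group_by { |image| image[:person_id] }
  people.map { |person| [person[:id], images.fetch(person[:id], [])] }.to_h
end

lazy_db = FakeDb.new
load_lazily(lazy_db, [2, 3])
eager_db = FakeDb.new
load_eagerly(eager_db, [2, 3])
puts "lazy: #{lazy_db.query_count} queries, eager: #{eager_db.query_count} queries"
```

With N people the lazy version issues N + 1 queries, while the eager version stays at two regardless of N, which is the trade-off described above.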

Resources