ActiveRecord Query Union - ruby-on-rails

I've written a couple of queries (complex, at least to me) with Ruby on Rails' query interface:
watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})
Both of these queries work fine by themselves, and both return Post objects. I would like to combine these posts into a single ActiveRecord::Relation. Since there could be hundreds of thousands of posts at some point, this needs to be done at the database level. If it were a MySQL query, I could simply use the UNION operator. Does anybody know if I can do something similar with Rails' query interface?

Here's a quick little module I wrote that allows you to UNION multiple scopes. It also returns the results as an instance of ActiveRecord::Relation.
module ActiveRecord::UnionScope
  def self.included(base)
    base.send :extend, ClassMethods
  end
  module ClassMethods
    def union_scope(*scopes)
      id_column = "#{table_name}.id"
      sub_query = scopes.map { |s| s.select(id_column).to_sql }.join(" UNION ")
      where "#{id_column} IN (#{sub_query})"
    end
  end
end
Here's the gist: https://gist.github.com/tlowrimore/5162327
Edit:
As requested, here's an example of how UnionScope works:
class Property < ActiveRecord::Base
  include ActiveRecord::UnionScope
  # some silly, contrived scopes
  scope :active_nearby, -> { where(active: true).where('distance <= 25') }
  scope :inactive_distant, -> { where(active: false).where('distance >= 200') }
  # A union of the aforementioned scopes
  scope :active_near_and_inactive_distant, -> { union_scope(active_nearby, inactive_distant) }
end

I have also encountered this problem, and now my go-to strategy is to generate SQL (by hand or using to_sql on an existing scope) and then stick it in the from clause. I can't guarantee it's any more efficient than your accepted method, but it's relatively easy on the eyes and gives you a normal ActiveRecord::Relation back.
watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})
Post.from("(#{watched_news_posts.to_sql} UNION #{watched_topic_posts.to_sql}) AS posts")
You can do this with two different models as well, but you need to make sure they both "look the same" inside the UNION: use select on both queries so they produce the same columns in the same order (UNION matches columns by position, not by name).
topics = Topic.select('user_id AS author_id, description AS body, created_at')
comments = Comment.select('author_id, body, created_at')
Comment.from("(#{comments.to_sql} UNION #{topics.to_sql}) AS comments")

Based on Olives' answer, I did come up with another solution to this problem. It feels a little bit like a hack, but it returns an instance of ActiveRecord::Relation, which is what I was after in the first place.
Post.where(<<-SQL, id, id)
  posts.id IN (
    SELECT post_topic_relationships.post_id FROM post_topic_relationships
    INNER JOIN "watched" ON "watched"."watched_item_id" = "post_topic_relationships"."topic_id"
      AND "watched"."watched_item_type" = 'Topic'
    WHERE "watched"."user_id" = ?
  )
  OR posts.id IN (
    SELECT "posts"."id" FROM "posts"
    INNER JOIN "news" ON "news"."id" = "posts"."news_id"
    INNER JOIN "watched" ON "watched"."watched_item_id" = "news"."id"
      AND "watched"."watched_item_type" = 'News'
    WHERE "watched"."user_id" = ?
  )
SQL
I'd still appreciate it if anybody has any suggestions to optimize this or improve the performance, because it's essentially executing three queries and feels a little redundant.

You could also use Brian Hempel's active_record_union gem, which extends ActiveRecord with a union method for scopes.
Your query would be like this:
Post.joins(:news => :watched).
  where(:watched => {:user_id => id}).
  union(Post.joins(:post_topic_relationships => {:topic => :watched}).
    where(:watched => {:user_id => id}))
Hopefully this will eventually be merged into ActiveRecord.

Could you use an OR instead of a UNION?
Then you could do something like:
Post.joins(:news => :watched, :post_topic_relationships => {:topic => :watched})
.where("watched.user_id = :id OR topic_watched.user_id = :id", :id => id)
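(Since you are joining the watched table twice, I'm not sure what alias the second join will get in the generated SQL. Also note that joins produces INNER JOINs, so a post is only returned if it matches both join paths; left_outer_joins would be needed for true OR semantics.)
Since there are a lot of joins, it might also be quite heavy on the database, but it might be possible to optimize.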

How about...
def union(scope1, scope2)
  ids = scope1.pluck(:id) + scope2.pluck(:id)
  where(id: ids.uniq)
end

Arguably, this improves readability, but not necessarily performance:
def my_posts
  Post.where <<-SQL, self.id, self.id
    posts.id IN
    (SELECT post_topic_relationships.post_id FROM post_topic_relationships
     INNER JOIN watched ON watched.watched_item_id = post_topic_relationships.topic_id
       AND watched.watched_item_type = 'Topic'
       AND watched.user_id = ?
     UNION
     SELECT posts.id FROM posts
     INNER JOIN news ON news.id = posts.news_id
     INNER JOIN watched ON watched.watched_item_id = news.id
       AND watched.watched_item_type = 'News'
       AND watched.user_id = ?)
  SQL
end
This method returns an ActiveRecord::Relation, so you can keep chaining on it, for example:
my_posts.order("posts.id DESC")

There is an active_record_union gem that might be helpful:
https://github.com/brianhempel/active_record_union
With ActiveRecordUnion, we can do:
# the current user's (draft) posts and all published posts from anyone
current_user.posts.union(Post.published)
Which is equivalent to the following SQL:
SELECT "posts".* FROM (
SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1
UNION
SELECT "posts".* FROM "posts" WHERE (published_at < '2014-07-19 16:04:21.918366')
) posts

In a similar case I concatenated two arrays and paginated them with Kaminari.paginate_array(), which worked nicely. I couldn't use where() because I needed to combine two results from the same table with different order() clauses.

Here's how I joined SQL queries using UNION in my own Ruby on Rails application. You can use the code below as inspiration for your own.
class Preference < ApplicationRecord
  scope :for, ->(object) { where(preferenceable: object) }
end
Below is the method where I joined the scopes together with UNION:
def zone_preferences
  zone = Zone.find params[:zone_id]
  zone_sql = Preference.for(zone).to_sql
  region_sql = Preference.for(zone.region).to_sql
  operator_sql = Preference.for(Operator.current).to_sql
  Preference.from("(#{zone_sql} UNION #{region_sql} UNION #{operator_sql}) AS preferences")
end

Fewer problems and easier to follow (Relation#or requires Rails 5 or newer):
def union_scope(*scopes)
  scopes[1..-1].inject(where(id: scopes.first)) { |all, scope| all.or(where(id: scope)) }
end
So in the end:
union_scope(watched_news_posts, watched_topic_posts)

The active_record_extended gem also has a set of union helpers, among many others:
gem 'active_record_extended'

I would just run the two queries you need and combine the arrays of records that are returned:
@posts = watched_news_posts + watched_topic_posts
Or at least test it out. Do you think the array combination in Ruby will be far too slow? Looking at the suggested queries to get around the problem, I'm not convinced there will be that significant a performance difference.
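One caveat with combining the arrays using +: a post matched by both queries appears twice. A plain-Ruby illustration, where a Struct is a hypothetical stand-in for a Post record (ActiveRecord records compare by class and id, Structs by member values, so the effect is the same). Array#| performs a set union and keeps the first occurrence of each element.

```ruby
# Hypothetical stand-in for a Post record.
FakePost = Struct.new(:id, :title)

watched_news_posts  = [FakePost.new(1, "a"), FakePost.new(2, "b")]
watched_topic_posts = [FakePost.new(2, "b"), FakePost.new(3, "c")]

concatenated = watched_news_posts + watched_topic_posts # keeps both copies of post 2
deduplicated = watched_news_posts | watched_topic_posts # set union, first occurrence wins

p concatenated.map(&:id) # => [1, 2, 2, 3]
p deduplicated.map(&:id) # => [1, 2, 3]
```

Either way, both result sets are fully loaded into memory before they are combined.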

Elliot Nelson's answer is good, except for the case where some of the relations are empty. I would do something like this:
def union_2_relations(relation1, relation2)
  table_name = relation1.klass.table_name
  if relation1.any? && relation2.any?
    relation1.klass.from("((#{relation1.to_sql}) UNION (#{relation2.to_sql})) AS #{table_name}")
  elsif relation1.any?
    relation1
  elsif relation2.any?
    relation2
  else
    relation1.klass.none
  end
end

Adding UNION to scopes sometimes breaks because of an ORDER BY clause added before the UNION.
So I changed the approach to give it a UNION effect without an actual UNION:
module UnionScope
  def self.included(base)
    base.send(:extend, ClassMethods)
  end
  module ClassMethods
    def union_scope(*scopes)
      id_column = "#{table_name}.id"
      sub_query = scopes.map { |s| s.pluck(:id) }.flatten
      where("#{id_column} IN (?)", sub_query)
    end
  end
end
And then use it like this in any model:
class Model < ApplicationRecord
  include UnionScope
  scope :union_of_scopeA_scopeB, -> { union_scope(scopeA, scopeB) }
end

Tim's answer is great. It uses the ids of the scopes in the WHERE clause. As shosti reports, this method is problematic in terms of performance because all ids need to be generated during query execution. This is why I prefer joeyk16's answer. Here is a generalized module:
module ActiveRecord::UnionScope
  def self.included(base)
    base.send :extend, ClassMethods
  end
  module ClassMethods
    # Defined without self. so that extend turns it into a class method of the model.
    def union(*scopes)
      from("(#{scopes.map(&:to_sql).join(' UNION ')}) AS #{table_name}")
    end
  end
end

If you don't want to use SQL syntax inside your code, here's a solution with Arel:
watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id}).arel
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id}).arel
results = Arel::Nodes::Union.new(watched_news_posts, watched_topic_posts)
Post.from(Post.arel_table.create_table_alias(results, :posts))

Related

active_model_serializer returning more than one result

I'm trying to return the another_id for a related record. I would just add a has_many and belongs_to relation for each project, but I need to have the user id in order to return the correct results. However, with the code I have below, it returns all of the possible another_ids for the current_user.
If I enter this into psql, it works fine:
WITH RECURSIVE t(id, parent_id, path) AS (
SELECT thing.id, thing.parent_id, ARRAY[thing.id]
FROM thing, projects
WHERE thing.id = 595
UNION
SELECT i.id, i.parent_id, i.parent_id || t.path
FROM thing i
INNER JOIN t ON t.parent_id = i.id
)
SELECT DISTINCT user_thing.another_id FROM user_thing
INNER JOIN t on t.id = user_thing.thing_id
WHERE t.id = user_thing.thing_id AND user_thing.user_id = 2;
another_id
-----------
52
(1 row)
But if I run the code from the serializer, it returns: [52, 51]:
class ProjectSerializer < ActiveModel::Serializer
  attributes :id, :another_id
  def another_id__sql
    "(WITH RECURSIVE t(id, parent_id, path) AS (
      SELECT thing.id, thing.parent_id, ARRAY[thing.id]
      FROM thing, projects
      WHERE thing.id = projects.thing_id
      UNION
      SELECT i.id, i.parent_id, i.parent_id || t.path
      FROM thing i
      INNER JOIN t ON t.parent_id = i.id
    )
    SELECT DISTINCT user_thing.another_id FROM user_thing
    INNER JOIN t on t.id = user_thing.thing_id
    WHERE t.id = user_thing.thing_id AND user_thing.user_id = #{options[:current_user].id})"
  end
end
class API::V1::ProjectsController < API::APIController
  def index
    render json: Project.all
  end
  private
  def default_serializer_options
    { current_user: @current_user }
  end
end
As far as I can tell, I'm misunderstanding how active_model_serializers serializes more than one record.
I'm using rails 4.2.3 and active_model_serializers 0.8.3. I'm afraid I can't change the schema. Also, it probably doesn't matter, but this is the API for an Ember app.
Thanks in advance. I'm a bit embarrassed that I'm having trouble with this.
Edit:
I should probably mention that this is what my project model looks like:
class Project < ActiveRecord::Base
  belongs_to :thing
  has_many :user_thing, through: :thing
  attr_accessor :another_id
  def set_another_id(user)
    connection = ActiveRecord::Base.connection
    result = connection.execute("(WITH RECURSIVE t(id, parent_id, path) AS (
      SELECT thing.id, thing.parent_id, ARRAY[thing.id]
      FROM thing, projects
      WHERE thing.id = #{thing_id}
      UNION
      SELECT i.id, i.parent_id, i.parent_id || t.path
      FROM thing i
      INNER JOIN t ON t.parent_id = i.id
    )
    SELECT DISTINCT user_thing.another_id FROM user_thing
    INNER JOIN t on t.id = user_thing.thing_id
    WHERE t.id = user_thing.thing_id AND user_thing.user_id = #{user.id})")
    @another_id = result[0]["another_id"].to_i
  end
end
And this is the show action in the controller:
def show
  @project = Project.find(params[:id])
  @project.set_another_id(@current_user)
  render json: @project
end
The show action does return the correct id.
Also, I know what I have is incorrect. The thing is that I can't just use the activerecord associations, because it depends on that session's current user.
Edit 2:
I thought I was able to get it to work if I just rendered it using render json: Project.all.to_json and got rid of the another_id__sql method in the serializer. That does work if the project has an another_id. However, if that's nil, I get the error: "NoMethodError in API::V1::ProjectsController#index undefined method `[]' for nil:NilClass". It looks like this is a possible bug in 0.8, so I'll either have to ask another Stack Overflow question, or see if I can upgrade the active_model_serializers gem. I was wrong! See my answer below.
All the DB logic belongs in your model, not in your serializer. Serializers simply state what is supposed to be exposed; they should not be responsible for computing it.
So here, I'd advise making another_id a method on your model. That won't solve your issue by itself (it seems to be more of an SQL issue than anything else), but it means you won't have a problem with AMS anymore.
Serializers take a record and return a serialized representation suitable for JSON or XML encoding.
They are meant as an alternative to littering your controllers with this:
render json: @users, except: [:foo, :bar, :baz], include: [..........]
And the mental flatulence that is jbuilder.
SQL queries and scopes instead belong in your models.
You can set the serializer by using the each_serializer option, but in this case it will not do you much good: the objects you serialize must at least implement the base methods of a serializable model.
So you need to re-write your query so that it returns a collection or array of records.
see:
http://apidock.com/rails/ActiveRecord/Base/find_by_sql/class
https://github.com/rails-api/active_model_serializers/blob/master/lib/active_model_serializers/model.rb
https://github.com/rails-api/active_model_serializers
Got it! It appears that I needed one more method in the serializer:
project_serializer.rb
def another_id
  object.another_id
end
def another_id__sql
  # sql from before
end
I'm not 100% sure why this works, but I had noticed that if I left out another_id__sql, I would get the error column projects.another_id does not exist. So I'm guessing that another_id__sql is called when it's returning an array, but the another_id method is used when the object is a single project record.
I'd still love to hear better ways to do this!

Condition true for ALL records in join

I'm trying to return records from A where all matching records from B satisfy a condition. At the moment my query returns records from A where there is any record from B that satisfies the condition. Let me put this into a real world scenario.
Post.joins(:categories)
.where(:categories => { :type => "foo" })
This will return Posts that have a category of type "foo", what I want is Posts whose categories are ALL of type "foo"!
Help appreciated!
Using your db/schema.rb as posted in #rubyonrails on IRC, something like:
Incident.select("incidents.id").
  joins("INNER JOIN category_incidents ON category_incidents.incident_id = incidents.id").
  joins("INNER JOIN category_marks ON category_marks.category_id = category_incidents.category_id").
  where(:category_marks => { :user_group_id => current_user.user_group_id }).
  group("incidents.id").
  having("SUM(CASE WHEN category_marks.inc = 1 THEN 1 ELSE 0 END) = count(category_incidents.incident_id)")
would do the trick.
It joins the category_marks for the current_user and checks whether the count of records with .inc = 1 equals the count of all joined records.
Do note that this only fetches incidents.id.
I would add a select to the end of this query to check whether all categories have type foo. I would also simplify that check by adding an instance method to the Category model. (distinct keeps a post with several categories from appearing once per category.)
Post.joins(:categories).distinct.select { |p| p.categories.all?(&:type_foo?) }
Category model:
def type_foo?
  type == "foo"
end
ADDITION: This is a bit "hacky" but you could make it a scope this way.
class Post < ActiveRecord::Base
  scope :category_type_foo, lambda {
    post_ids = Post.all.collect { |p| p.id if p.categories.all?(&:type_foo?) }.compact
    Post.where(id: post_ids)
  }
end
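Enumerable#all? returns true for an empty collection, so the Post.all version of the scope also returns posts that have no categories at all, while the joins version drops them. A plain-Ruby sketch of that edge case, with hypothetical Structs standing in for the Post and Category models:

```ruby
# Hypothetical stand-ins for the Category and Post models.
FakeCategory = Struct.new(:type) do
  def type_foo?
    type == "foo"
  end
end
FakePost = Struct.new(:id, :categories)

posts = [
  FakePost.new(1, [FakeCategory.new("foo"), FakeCategory.new("foo")]),
  FakePost.new(2, [FakeCategory.new("foo"), FakeCategory.new("bar")]),
  FakePost.new(3, [])
]

# Vacuous truth: post 3 has no categories, so all? is true for it.
all_foo_ids = posts.select { |p| p.categories.all?(&:type_foo?) }.map(&:id)
# Requiring at least one category reproduces what the joins version returns.
non_empty_all_foo_ids = posts.select { |p| p.categories.any? && p.categories.all?(&:type_foo?) }.map(&:id)

p all_foo_ids           # => [1, 3]
p non_empty_all_foo_ids # => [1]
```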
Have you tried querying in the opposite direction? i.e.
Category.where(type: 'foo').joins(:posts)
I may have misunderstood your question though.
Another alternative is
Post.joins(:classifications).where(type: 'foo')

Rails ActiveRecord: Find All Users Except Current User

I feel this should be very simple but my brain is short-circuiting on it. If I have an object representing the current user, and want to query for all users except the current user, how can I do this, taking into account that the current user can sometimes be nil?
This is what I am doing right now:
def index
  @users = User.all
  @users.delete current_user
end
What I don't like is that I am doing post-processing on the query result. Besides feeling a little wrong, I don't think this will work nicely if I convert the query over to be run with will_paginate. Any suggestions for how to do this with a query? Thanks.
It is possible to do the following in Rails 4 and up:
User.where.not(id: id)
You can wrap it in a nice scope.
scope :all_except, ->(user) { where.not(id: user) }
@users = User.all_except(current_user)
Or use a class method if you prefer:
def self.all_except(user)
where.not(id: user)
end
Both methods will return an AR relation object. This means you can chain method calls:
@users = User.all_except(current_user).paginate
You can exclude any number of users because where() also accepts an array.
@users = User.all_except([1,2,3])
For example:
@users = User.all_except(User.unverified)
And even through other associations:
class Post < ActiveRecord::Base
  has_many :comments
  has_many :commenters, -> { distinct }, through: :comments
end
@commenters = @post.commenters.all_except(@post.author)
See where.not() in the API Docs.
@users = (current_user.blank? ? User.all : User.find(:all, :conditions => ["id != ?", current_user.id]))
You can also create named_scope, e.g. in your model:
named_scope :without_user, lambda{|user| user ? {:conditions => ["id != ?", user.id]} : {} }
and in controller:
def index
  @users = User.without_user(current_user).paginate
end
This scope will return all users when called with nil and all users except given in param in other case. The advantage of this solution is that you are free to chain this call with other named scopes or will_paginate paginate method.
Here is a shorter version:
User.all :conditions => (current_user ? ["id != ?", current_user.id] : [])
One note on GhandaL's answer - at least in Rails 3, it's worth modifying to
scope :without_user, lambda{|user| user ? {:conditions => ["users.id != ?", user.id]} : {} }
(the primary change here is from 'id != ...' to 'users.id !=...'; also scope instead of named_scope for Rails 3)
The original version works fine when simply scoping the Users table. When applying the scope to an association (e.g. team.members.without_user(current_user).... ), this change was required to clarify which table we're using for the id comparison. I saw a SQL error (using SQLite) without it.
Apologies for the separate answer; I don't yet have the reputation to comment directly on GhandaL's answer.
A very easy solution I used:
@users = User.all.where("id != ?", current_user.id)
In Rails 3, where User.all returns an Array, User.all.where("id NOT IN(?)", current_user.id) will throw an exception:
undefined method where for #<Array:0x0000000aef08f8>
Call where on the model directly instead:
User.where("id NOT IN (?)", current_user.id)
Another easy way you could do it:
@users = User.all.where("id NOT IN(?)", current_user.id)
An array of ids is more flexible:
array_ids = [1, 3]
User.where.not(id: array_ids)
With Mongoid rather than ActiveRecord, the equivalent uses its symbol operators:
User.where(:id.ne => current_user.id)
ActiveRecord::QueryMethods#excluding (Rails 7+)
Starting from Rails 7, there is a new method ActiveRecord::QueryMethods#excluding.
A quote right from the official Rails docs:
excluding(*records)
Excludes the specified record (or collection of records) from the resulting relation. For example:
Post.excluding(post)
# SELECT "posts".* FROM "posts" WHERE "posts"."id" != 1
Post.excluding(post_one, post_two)
# SELECT "posts".* FROM "posts" WHERE "posts"."id" NOT IN (1, 2)
This can also be called on associations. As with the above example, either a single record of collection thereof may be specified:
post = Post.find(1)
comment = Comment.find(2)
post.comments.excluding(comment)
# SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = 1 AND "comments"."id" != 2
This is short-hand for .where.not(id: post.id) and .where.not(id: [post_one.id, post_two.id]).
An ArgumentError will be raised if either no records are specified, or if any of the records in the collection (if a collection is passed in) are not instances of the same model that the relation is scoping.
Also aliased as: without
Sources:
Official docs - ActiveRecord::QueryMethods#excluding
PR - Add #excluding to ActiveRecord::Relation to exclude a record (or collection of records) from the resulting relation.
What's Cooking in Rails 7?
What you are doing is deleting current_user from the loaded @users array after the query has already run. Array subtraction gives you the same result without mutating the loaded array (it still filters in Ruby, not in SQL):
def index
  @users = User.all.to_a - [current_user]
end
This assigns a copy of the users array with the current_user object removed (if it was in the array in the first place).
Note: array subtraction compares elements with hash and eql?, and ActiveRecord defines both in terms of the record's class and id, so a separately loaded copy of the same user is still removed. Remember to wrap current_user in [] to turn it into an array; if current_user is nil, subtracting [nil] simply removes nothing.
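How array subtraction matches records is easy to check in plain Ruby. Array#- compares elements with hash and eql?, and ActiveRecord::Core defines both in terms of the record's class and id. The sketch uses a hypothetical FakeRecord class that mimics that behavior instead of a real model, so it runs without a database.

```ruby
# Hypothetical stand-in that mimics ActiveRecord record equality:
# same class and same non-nil id means equal, and hash is derived from the id.
class FakeRecord
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  def ==(other)
    other.instance_of?(self.class) && !id.nil? && other.id == id
  end
  alias eql? ==

  def hash
    id ? [self.class, id].hash : super
  end
end

users = [FakeRecord.new(1, "alice"), FakeRecord.new(2, "bob")]
current_user = FakeRecord.new(2, "bob, loaded separately") # different object, same id

remaining = users - [current_user]
p remaining.map(&:name) # => ["alice"]

# Subtracting [nil] removes nothing, so a missing current_user is harmless.
unchanged = users - [nil]
p unchanged.size # => 2
```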

Fastest way to search two models connected via join table in rails given large data set

I have a user model and a cd model connected through a join table 'cds_users'. I'm trying to return a hash of users plus each cd they have in common with the original user.
@user.users_with_similar_cds(1,4,5)
# => {:bob => [4], :tim => [1,5]}
Is there a better/faster way of doing this without looping so much? Maybe a more direct way?
def users_with_similar_cds(*args)
  similar_users = {}
  Cd.find(:all, :conditions => ["cds.id IN (?)", args]).each do |cd|
    cd.users.find(:all, :conditions => ["users.id != ?", self.id]).each do |user|
      if similar_users[user.name]
        similar_users[user.name] << cd.id
      else
        similar_users[user.name] = [cd.id]
      end
    end
  end
  similar_users
end
[addition]
Taking the join model idea, I could do something like this. I'll call the model 'joined'.
def users_with_similar_cds(*args)
  similar_users = {}
  Joined.find(:all, :conditions => ["user_id != ? AND cd_id IN (?)", self.id, args]).each do |joined|
    if similar_users[joined.user_id]
      similar_users[joined.user_id] << joined.cd_id
    else
      similar_users[joined.user_id] = [joined.cd_id]
    end
  end
  similar_users
end
Would this be the fastest way on large data sets?
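Once the join rows are loaded (a single query on the join table, as in the second version), the grouping itself needs no nested loops. A plain-Ruby sketch, where JoinRow is a hypothetical stand-in for the join model with user_id and cd_id; group_by plus transform_values (Ruby 2.4+) builds the user_id => [cd_ids] hash in one pass. In Rails, the filtering step would normally stay in the WHERE clause, as in the Joined.find call.

```ruby
# Hypothetical stand-in for a row of the cds_users join table.
JoinRow = Struct.new(:user_id, :cd_id)

# Group the CDs that other users share with current_user_id, keyed by user id.
def users_with_similar_cds(rows, current_user_id, cd_ids)
  rows
    .select { |row| row.user_id != current_user_id && cd_ids.include?(row.cd_id) }
    .group_by(&:user_id)
    .transform_values { |user_rows| user_rows.map(&:cd_id) }
end

rows = [
  JoinRow.new(1, 1), JoinRow.new(1, 4), # the current user's own CDs
  JoinRow.new(2, 4),
  JoinRow.new(3, 1), JoinRow.new(3, 5), JoinRow.new(3, 9)
]

# User 2 shares CD 4; user 3 shares CDs 1 and 5 (CD 9 was not asked about).
similar = users_with_similar_cds(rows, 1, [1, 4, 5])
p similar
```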
You could use find_by_sql on the User model, and Active Record will dynamically add methods for any extra fields returned by the query. For example:
similar_cds = Hash.new
peeps = User.find_by_sql("SELECT users.*, group_concat(cds_users.cd_id) AS cd_ids FROM users INNER JOIN cds_users ON cds_users.user_id = users.id GROUP BY users.id")
peeps.each { |p| similar_cds[p.name] = p.cd_ids.split(',') }
I haven't tested this code, and this particular query will only work if your database supports group_concat (e.g. MySQL or SQLite), but you should be able to do something similar with whatever database you use.
Yep, you can, with only 2 selects:
Make a join table model named CdUser (use has_many :through)
# first select
cd_users = CdUser.find(:all, :conditions => ["cd_id IN (?)", args])
cd_users_by_cd_id = cd_users.group_by { |cd_user| cd_user.cd_id }
user_ids = cd_users.collect { |cd_user| cd_user.user_id }.uniq
# second select
users_by_id = User.find_all_by_id(user_ids).index_by(&:id)
result_hash = {}
cd_users_by_cd_id.each do |cd_id, cd_users_for_cd|
  result_hash[cd_id] = cd_users_for_cd.collect { |cd_user| users_by_id[cd_user.user_id] }
end
This is just an idea; I haven't tested it :)
FYI: http://railscasts.com/episodes/47-two-many-to-many

Find all objects with no associated has_many objects

In my online store, an order is ready to ship if it is in the "authorized" state and doesn't already have any associated shipments. Right now I'm doing this:
class Order < ActiveRecord::Base
  has_many :shipments, :dependent => :destroy
  def self.ready_to_ship
    unshipped_orders = Array.new
    Order.all(:conditions => "state = 'authorized'", :include => :shipments).each do |o|
      unshipped_orders << o if o.shipments.empty?
    end
    unshipped_orders
  end
end
Is there a better way?
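In Rails 3 using Arel:
Order.includes('shipments').where(['orders.state = ?', 'authorized']).where('shipments.id IS NULL')
In Rails 4.1 and newer, a string condition on an included table also needs .references(:shipments), otherwise the shipments table is not joined into the query.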
You can also query on the association using the normal find syntax:
Order.find(:all, :include => "shipments", :conditions => ["orders.state = ? AND shipments.id IS NULL", "authorized"])
One option is to put a shipment_count column on Order that is automatically updated with the number of shipments attached to it (for example with belongs_to :order, :counter_cache => :shipment_count on Shipment). Then you just need:
Order.all(:conditions => {:state => "authorized", :shipment_count => 0})
Alternatively, you can get your hands dirty with some SQL:
Order.find_by_sql("SELECT * FROM
  (SELECT orders.*, count(shipments.id) AS shipment_count FROM orders
  LEFT JOIN shipments ON orders.id = shipments.order_id
  WHERE orders.state = 'authorized' GROUP BY orders.id)
  AS authorized_orders WHERE shipment_count = 0")
Test that prior to using it, as SQL isn't exactly my bag, but I think it's close to right. I got it to work for similar arrangements of objects on my production DB, which is MySQL.
Note that if you don't have an index on orders.state, I'd strongly advise adding one!
What the query does: the subquery grabs the shipment count for every order in the authorized state. The outer query filters that list down to only the ones whose shipment count is zero.
There's probably another way you could do it, a little counterintuitively:
"SELECT DISTINCT orders.* FROM orders
LEFT JOIN shipments ON orders.id = shipments.order_id
WHERE orders.state = 'authorized' AND shipments.id IS NULL"
Grab all orders which are authorized and don't have an entry in the shipments table ;)
This is going to work just fine if you're using Rails 6.1 or newer:
Order.where(state: 'authorized').where.missing(:shipments)
