Ruby on Rails 4 count distinct with inner join - ruby-on-rails

I have created a validation rule to limit the number of records a member can create.
class Engine < ActiveRecord::Base
  validates :engine_code, presence: true
  belongs_to :group
  delegate :member, to: :group
  validate :engines_within_limit, on: :create

  def engines_within_limit
    if self.member.engines(:reload).distinct.count(:engine_code) >= self.member.engine_limit
      errors.add(:engine, "Exceeded engine limit")
    end
  end
end
The above doesn't work, specifically this part,
self.member.engines(:reload).distinct.count(:engine_code)
The query it produces is
SELECT "engines".*
FROM "engines"
INNER JOIN "groups"
ON "engines"."group_id" = "groups"."id"
WHERE "groups"."member_id" = $1 [["member_id", 22]]
and returns the count 0 which is wrong
Whereas the following
Engine.distinct.count(:engine_code)
produces the query
SELECT DISTINCT COUNT(DISTINCT "engines"."engine_code")
FROM "engines"
and returns 3 which is correct
What am I doing wrong? It is the same query just with a join?

After a long chat, we found that the query below works:
self.member
.engines(:reload)
.count("DISTINCT engine_code")

AR:: means ActiveRecord:: below.
The reason for the "wrong" result in the question is that the collection association isn't used correctly. A collection association (e.g. has_many) for a record is not an AR::Relation, it's an AR::Associations::CollectionProxy. It's a subclass of AR::Relation, and some methods, e.g. distinct, are overridden.
self.member.engines(:reload).distinct.count(:engine_code) will cause this to happen:
self.member.engines(:reload) is an AR::Associations::CollectionProxy.
.distinct on that first fires the db read, then does a .to_a on the result, and then does "its own" distinct, which is a uniq on the array of records based on the id of the records. The result is an array.
.count(:engine_code) does Array#count on that array, which returns 0 since no record in the array equals the symbol :engine_code.
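The last step can be reproduced without Rails. The following is a minimal plain-Ruby sketch, assuming the records have already been loaded into an array as described above; EngineRow and its sample values are hypothetical stand-ins for the Engine records.

```ruby
# Plain-Ruby stand-in for loaded Engine records (hypothetical data).
EngineRow = Struct.new(:id, :engine_code)

loaded = [
  EngineRow.new(1, "V8"),
  EngineRow.new(2, "V6"),
  EngineRow.new(3, "V8")
]

# Array#count(obj) counts the elements that are == obj.
# No record equals the Symbol :engine_code, so the result is 0.
array_count = loaded.count(:engine_code)

# What the validation actually wants: the number of distinct engine codes.
distinct_codes = loaded.map(&:engine_code).uniq.size

puts array_count
puts distinct_codes
```

Here array_count is 0 and distinct_codes is 2, which mirrors the gap between the count the validation saw and the count it expected.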
To get the correct result you should use the relation of the association proxy, .scope:
self.member.engines(:reload).scope.distinct.count(:engine_code)
I think the way Rails handles collection associations is a little bit confusing. Many of the "normal" relation methods work as usual, e.g. this will work without using .scope:
self.member.engines(:reload).where('true').distinct.count(:engine_code)
that is because where isn't overridden by AR::Associations::CollectionProxy.
Perhaps it would be better to always have to use .scope when using the collection as a relation.

Related

Rails 5 select from two different tables and get one result

I have 3 models, Shop, Client, Product.
A shop has many clients, and a shop has many products.
Then I have 2 extra models, one is ShopClient, that groups the shop_id and client_id. The second is ShopProduct, that groups the shop_id and product_id.
Now I have a controller that receives two params, client_id and product_id. I want to select all the shops (in one instance variable, @shops) filtered by client_id and product_id, without repeating any shop. How can I do this?
I hope I was clear, thanks.
ps: I'm using Postgresql as database.
The query below will work for you.
class Shop
  has_many :shop_clients
  has_many :clients, through: :shop_clients
  has_many :shop_products
  has_many :products, through: :shop_products
end
class Client
end
class Product
end
class ShopClient
  belongs_to :shop
  belongs_to :client
end
class ShopProduct
  belongs_to :shop
  belongs_to :product
end
@shops = Shop.joins(:clients).where(clients: {id: params[:client_id]}).merge(Shop.joins(:products).where(products: {id: params[:product_id]}))
Just to riff on the answer provided by Prince Bansal. How about creating some class methods for those joins? Something like:
class Shop
  has_many :shop_clients
  has_many :clients, through: :shop_clients
  has_many :shop_products
  has_many :products, through: :shop_products

  class << self
    def with_clients(clients)
      joins(:clients).where(clients: {id: clients})
    end

    def with_products(products)
      joins(:products).where(products: {id: products})
    end
  end
end
Then you could do something like:
@shops = Shop.with_clients(params[:client_id]).with_products(params[:product_id])
By the way, I'm sure someone is going to say you should make those class methods into scopes. And you certainly can do that. I did it as class methods because that's what the Guide recommends:
Using a class method is the preferred way to accept arguments for scopes.
But, I realize some people strongly prefer the aesthetics of using scopes instead. So, whichever pleases you most.
I feel like the best way to solve this issue is to use sub-queries. I'll first collect all valid shop ids from ShopClient, followed by all valid shop ids from ShopProduct, then feed them into the where query on Shop. This will result in one SQL query.
shop_client_ids = ShopClient.where(client_id: params[:client_id]).select(:shop_id)
shop_product_ids = ShopProduct.where(product_id: params[:product_id]).select(:shop_id)
@shops = Shop.where(id: shop_client_ids).where(id: shop_product_ids)
#=> #<ActiveRecord::Relation [#<Shop id: 1, created_at: "2018-02-14 20:22:18", updated_at: "2018-02-14 20:22:18">]>
The above query results in the SQL query below. I didn't specify a limit; the LIMIT 11 is added because the console calls #inspect on the relation, which loads at most 11 records for display.
SELECT "shops".*
FROM "shops"
WHERE
"shops"."id" IN (
SELECT "shop_clients"."shop_id"
FROM "shop_clients"
WHERE "shop_clients"."client_id" = ?) AND
"shops"."id" IN (
SELECT "shop_products"."shop_id"
FROM "shop_products"
WHERE "shop_products"."product_id" = ?)
LIMIT ?
[["client_id", 1], ["product_id", 1], ["LIMIT", 11]]
Combining the two sub-queries in one where doesn't result in a correct response:
@shops = Shop.where(id: [shop_client_ids, shop_product_ids])
#=> #<ActiveRecord::Relation []>
Produces the query:
SELECT "shops".* FROM "shops" WHERE "shops"."id" IN (NULL, NULL) LIMIT ? [["LIMIT", 11]]
note
Keep in mind that when you run the statements one by one in the console this will normally result in 3 queries. This is due to the fact that the return value uses the #inspect method to let you see the result. This method is overridden by Rails to execute the query and display the result.
You can simulate the behavior of the normal application by suffixing the statements with ;nil. This makes sure nil is returned and the #inspect method is not called on the where chain, thus not executing the query and keeping the chain in memory.
edit
If you want to clean up the controller you might want to move these sub-queries into model methods (inspired by jvillian's answer).
class Shop
  # ...
  def self.with_clients(*client_ids)
    client_ids.flatten! # allows passing of multiple arguments or an array of arguments
    where(id: ShopClient.where(client_id: client_ids).select(:shop_id))
  end
  # ...
end
Rails sub-query vs join
The advantage of a sub-query over a join is that a join might end up returning the same record multiple times if you query on an attribute that is not unique. For example, say a product has an attribute product_type that is either 'physical' or 'digital'. If you want to select all shops selling a digital product, you must not forget to call distinct on the chain when you're using a join, otherwise the same shop may be returned multiple times.
However, if you have to query on multiple attributes of product and you use multiple helpers in the model (where each helper calls joins(:products)), multiple sub-queries are likely slower (assuming you set has_many :products, through: :shop_products), since Rails reduces all joins to the same association to a single one. Example: Shop.joins(:products).joins(:products) (from multiple class methods) will still end up joining the products table a single time, whereas sub-queries will not be reduced.
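The duplicate-row point can be illustrated without a database. This is only an in-memory sketch, assuming hypothetical shops and products rows; it imitates what a join and an IN sub-query return and is not how ActiveRecord executes either.

```ruby
# Hypothetical in-memory rows standing in for the shops and products tables.
shops = [{ id: 1, name: "Alpha" }, { id: 2, name: "Beta" }]
products = [
  { shop_id: 1, product_type: "digital" },
  { shop_id: 1, product_type: "digital" },
  { shop_id: 2, product_type: "physical" }
]

digital = products.select { |row| row[:product_type] == "digital" }

# Join-like: one output row per matching (shop, product) pair.
joined = shops.flat_map do |shop|
  digital.select { |row| row[:shop_id] == shop[:id] }.map { shop }
end

# Sub-query-like: keep a shop if its id is IN the set of matching shop ids.
digital_shop_ids = digital.map { |row| row[:shop_id] }
via_subquery = shops.select { |shop| digital_shop_ids.include?(shop[:id]) }

joined_ids = joined.map { |shop| shop[:id] }
subquery_ids = via_subquery.map { |shop| shop[:id] }

puts joined_ids.inspect
puts subquery_ids.inspect
```

In this sketch the join-like result lists shop 1 twice, once per digital product, while the sub-query-like result lists it once. Calling uniq on joined_ids plays the role of distinct.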
The SQL query below will possibly work for you.
--
-- assuming
-- tables: shops, products, clients, shop_products, shop_clients
--
SELECT DISTINCT shops.* FROM shops
JOIN shop_products
ON shop_products.shop_id = shops.id
JOIN shop_clients
ON shop_clients.shop_id = shops.id
WHERE shop_clients.client_id = ? AND shop_products.product_id = ?
If you have difficulty writing an adequate AR expression for this SQL query, let me know.
Btw, here is a mock

Rails optional association query

i've been trying to solve this issue for some time now with no success :(
i have 2 model classes - ConfigurationKey and ConfigurationItem, as follows:
class ConfigurationKey < ActiveRecord::Base
  has_many :configuration_items
  # this class also has a 'name' attribute
end
class ConfigurationItem < ActiveRecord::Base
  belongs_to :app
  belongs_to :configuration_key
end
i would like to fetch all of the ConfigurationKeys that have a specific 'name' attribute, along with a filtered subset of their associated ConfigurationItems, in one single query.
i used the following command:
configuration_key = ConfigurationKey.includes(:configuration_items).where(name: key_name, configuration_items: { app: [nil, app] })
but the ConfigurationKeys that don't have any associated ConfigurationItems are not returned.
i thought the 'includes' clause, or the explicit use of 'LEFT OUTER JOIN', would make it work, but it didn't :/
is there any possible way to do this, or do i have to use 2 queries - one to get all of the relevant ConfigurationKeys, and another in order to get all of the relevant ConfigurationItems?
thanks ;)
Using includes with where clause in rails 4.2 generates a LEFT OUTER JOIN query.
Please take a look at the generated sql in rails console.
$ rails c
> ConfigurationKey.includes(:configuration_items).where(name: key_name, configuration_items: { app: [nil, app] })
# Sql is displayed...
You'll probably see SQL with a LEFT OUTER JOIN, and if so, it's correct.
Mind you, what you get via ActiveRecord is NOT equal to the raw results of the SQL: ActiveRecord returns DISTINCT records built from those rows.
So I think it's impossible to do this in one single query.

ActiveRecord query array intersection?

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select{|x| x.tags & Article::EXPERT_TAGS}.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!
Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread How to check if an array field is a part of another array in MongoDB?
Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope of improvement, since you need to get all the data into Ruby for processing.
However, there is one problem with your database query
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
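Once only the tags are loaded, the counting step can be sketched in plain Ruby. The tag values and the tags_per_article array below are hypothetical stand-ins for the plucked column. Note that an empty array is truthy in Ruby, so a select block that returns tags & EXPERT_TAGS, as in the question, keeps every article; the sketch checks for a non-empty intersection instead.

```ruby
# Hypothetical expert tags; the real list lives in Article::EXPERT_TAGS.
EXPERT_TAGS = ["Guest Writer", "Press Release"].freeze

# Hypothetical result of plucking the tags column: one array of tags per article.
tags_per_article = [
  ["Guest Writer", "News Article"],
  ["News Article"],
  [],
  ["Press Release", "Guest Writer"]
]

# An empty array is truthy in Ruby, so this block selects every article.
always_selected = tags_per_article.select { |tags| tags & EXPERT_TAGS }.size

# Checking that the intersection is non-empty counts only the matching articles.
expert_count = tags_per_article.count { |tags| (tags & EXPERT_TAGS).any? }

puts always_selected
puts expert_count
```

With this sample data always_selected is 4 (all articles) while expert_count is 2.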
I answered a question regarding general intersection like queries in ActiveRecord here.
Extracted below:
The following is a general approach I use for constructing intersection like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person

  def self.with_types(*types)
    where(service_type: types)
  end
end
class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end
class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people

  def self.with_cities(cities)
    where(city_id: cities)
  end

  # intersection like query
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.
Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.
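The map/reduce shape of with_all_service_types can be mimicked on plain arrays. This is a sketch under assumptions: the services rows and the people_with_all_types helper are hypothetical, and array intersection (&) stands in for AND-ing the IN (subquery) conditions.

```ruby
# Hypothetical rows: which person has which service type.
services = [
  { person_id: 1, service_type: 1 },
  { person_id: 1, service_type: 2 },
  { person_id: 2, service_type: 1 },
  { person_id: 3, service_type: 2 }
]
all_people_ids = [1, 2, 3]

# Hypothetical helper mirroring the map/reduce structure of the class method.
def people_with_all_types(all_ids, services, *types)
  types
    # One id list per type, playing the role of one subquery each.
    .map { |t| services.select { |s| s[:service_type] == t }.map { |s| s[:person_id] } }
    # AND the subqueries together by intersecting their id lists.
    .reduce(all_ids) { |ids, subquery_ids| ids & subquery_ids }
end

both = people_with_all_types(all_people_ids, services, 1, 2)
only_type_one = people_with_all_types(all_people_ids, services, 1)

puts both.inspect
puts only_type_one.inspect
```

Only person 1 has both service types, so both is [1], while asking for type 1 alone gives [1, 2].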

Specifying conditions on eager loaded associations returns ActiveRecord::RecordNotFound

The problem is that when a Restaurant does not have any MenuItems that match the condition, ActiveRecord says it can't find the Restaurant. Here's the relevant code:
class Restaurant < ActiveRecord::Base
  has_many :menu_items, dependent: :destroy
  has_many :meals, through: :menu_items

  def self.with_meals_of_the_week
    includes({menu_items: :meal}).where(:'menu_items.date' => Time.now.beginning_of_week..Time.now.end_of_week)
  end
end
And the sql code generated:
Restaurant Load (0.0ms) SELECT DISTINCT "restaurants".id FROM "restaurants"
LEFT OUTER JOIN "menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN "meals" ON "meals"."id" = "menu_items"."meal_id" WHERE
"restaurants"."id" = ? AND ("menu_items"."date" BETWEEN '2012-10-14 23:00:00.000000'
AND '2012-10-21 22:59:59.999999') LIMIT 1 [["id", "1"]]
However, according to this part of the Rails Guides, this shouldn't be happening:
Post.includes(:comments).where("comments.visible", true)
If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded.
The SQL generated is a correct translation of your query. But look at it just at the SQL level (I shortened it a bit):
SELECT *
FROM
"restaurants"
LEFT OUTER JOIN
"menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN
"meals" ON "meals"."id" = "menu_items"."meal_id"
WHERE
"restaurants"."id" = ?
AND
("menu_items"."date" BETWEEN '2012-10-14' AND '2012-10-21')
The left outer joins do the work you expect them to do: restaurants are combined with menu_items and meals; if there is no menu_item to go with a restaurant, the restaurant is still kept in the result, with all the missing pieces (menu_items.id, menu_items.date, ...) filled in with NULL.
Now look at the second part of the WHERE: the BETWEEN operator demands that menu_items.date is not NULL! This is where you filter out all the restaurants without meals.
So we need to change the query in a way that makes NULL dates OK. Going back to Ruby, you can write:
def self.with_meals_of_the_week
  includes({menu_items: :meal})
    .where('menu_items.date is NULL or menu_items.date between ? and ?',
      Time.now.beginning_of_week,
      Time.now.end_of_week
    )
end
The resulting SQL is now
.... WHERE (menu_items.date is NULL or menu_items.date between '2012-10-21' and '2012-10-28')
and the restaurants without meals stay in.
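The effect of the two WHERE clauses can be imitated on plain Ruby rows. This is a sketch with hypothetical data shaped like the result of the LEFT OUTER JOIN, where nil stands in for NULL; it is not the SQL engine's own evaluation.

```ruby
require "date"

# Hypothetical rows as a LEFT OUTER JOIN would return them:
# a restaurant without menu items appears once with a nil (NULL) date.
rows = [
  { restaurant_id: 1, date: Date.new(2012, 10, 16) }, # item inside the week
  { restaurant_id: 2, date: nil },                    # no menu items at all
  { restaurant_id: 3, date: Date.new(2012, 9, 1) }    # only an older item
]
week = Date.new(2012, 10, 15)..Date.new(2012, 10, 21)

# "date BETWEEN a AND b": a NULL date never satisfies the condition.
between_only = rows.select { |r| !r[:date].nil? && week.cover?(r[:date]) }

# "date IS NULL OR date BETWEEN a AND b": rows with NULL dates are kept.
null_or_between = rows.select { |r| r[:date].nil? || week.cover?(r[:date]) }

between_ids = between_only.map { |r| r[:restaurant_id] }
null_or_between_ids = null_or_between.map { |r| r[:restaurant_id] }

puts between_ids.inspect
puts null_or_between_ids.inspect
```

In this sketch restaurant 2 (no menu items) survives the second filter, but restaurant 3, which has menu items only outside the week, is dropped by both. The same holds for the SQL version, so the IS NULL variant keeps restaurants with no menu items at all rather than every restaurant.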
As the Rails Guide says, all Posts in your query are returned only if you do not use a where clause with includes, because a where clause on the included table generates a LEFT OUTER JOIN whose WHERE condition is applied to the outer-joined table, so rows without a match are filtered out.
That behavior is very helpful when you need some base objects (all of them, or some of them selected with a where on the base model) and want their related models loaded if there are any, while still getting the list of base models if there are none.
On the other hand, if you put conditions on the included tables, then in most cases you want only the objects that satisfy those conditions, which means selecting only the Restaurants that have menu_items.
So in your case, if you still want to use only 2 queries (and not N+1), I would probably do something like this:
class Restaurant < ActiveRecord::Base
  has_many :menu_items, dependent: :destroy
  has_many :meals, through: :menu_items
  # per-instance holder for this week's menu items (filled in below)
  attr_accessor :meals_of_the_week

  def self.with_meals_of_the_week
    restaurants = Restaurant.all.to_a
    week = Time.now.beginning_of_week..Time.now.end_of_week
    # load this week's menu items for those restaurants, grouped by restaurant
    items_by_restaurant = MenuItem.includes(:meal)
      .where(date: week, restaurant_id: restaurants.map(&:id))
      .group_by(&:restaurant_id)
    restaurants.each { |r| r.meals_of_the_week = items_by_restaurant.fetch(r.id, []) }
    restaurants
  end
end
Update: Rails 4 will raise a deprecation warning when you simply put conditions on included models like this.
Sorry for possible typo.
I think there is some misunderstanding of this passage:
If there was no where condition, this would generate the normal set of two queries.
If, in the case of this includes query, there were no comments for any
posts, all the posts would still be loaded. By using joins (an INNER
JOIN), the join conditions must match, otherwise no records will be
returned.
[from guides]
I think this statement doesn't refer to the example Post.includes(:comments).where("comments.visible", true)
but to the one without a where clause, Post.includes(:comments).
So everything works correctly! This is the way LEFT OUTER JOIN works.
So... you wrote: "If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded." Ok! But this is true ONLY when there is NO where clause! You missed the context of the phrase.

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
  has_many :posts
  has_many :topics, :through => :posts
end
class Post
  belongs_to :user
  belongs_to :topic
end
class Topic
  has_many :posts
end
I can read all the Topic ids through user.topic_ids but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of an ActiveRecord::Relation.
The problem is, given a User and an existing set of Topics, marking the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
  # only returns the ids of the topics for which this user has a post
  topic_ids = user.topic_ids
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
  # only returns the topics where user has a post within the subset of interest
  topic_ids = user.topic_ids.where(:id=>topics.map(&:id))
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
  # needlessly create Post objects only to unwrap them later
  topic_ids = user.posts.where(:topic_id=>topics.map(&:id)).select(:topic_id).map(&:topic_id)
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end
Is there a better way?
Is it possible to have something like select_values on a association or scope?
FWIW, I'm on rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option which would be doing a join [Topic,Post] so that the results come out as marked or not from the search, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user)
Notice the approaches outlined above do work, they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
If user.topic_ids.where(:id=>topics.map(&:id)) were possible, it would generate this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
this is exactly the same query that is generated doing: user.topics.select("topics.id").where(:id=>topics.map(&:id))
while user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
which one of the two is more efficient depends on the data in the actual tables and indices defined (and which db is used).
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
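Whichever query fetches the ids, the marking step itself can be sketched in plain Ruby. The version below is an assumption-laden illustration: topics are hypothetical hashes rather than Topic records, and posted_topic_ids stands in for the ids returned by the query. Converting the ids to a Set removes repeated ids and keeps each lookup cheap.

```ruby
require "set"

# Hypothetical helper: topics are plain hashes here, not ActiveRecord objects.
def mark_topics_with_post(topics, posted_topic_ids)
  posted = posted_topic_ids.to_set # repeated ids collapse into one entry
  topics.each { |t| t[:has_post] = posted.include?(t[:id]) }
end

topics = [{ id: 10 }, { id: 11 }, { id: 12 }]
posted_topic_ids = [10, 12, 12, 10] # ids may repeat when a user posts often

marked = mark_topics_with_post(topics, posted_topic_ids)
flags = marked.map { |t| t[:has_post] }

puts flags.inspect
```

With this sample data, topics 10 and 12 are marked and topic 11 is not.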
Assuming your Topic model has a column named id, you can do something like this:
Topic.select("topics.id").joins(:posts).where("posts.user_id = ?", user_id)
This runs only one query against your database and gives you all the topic ids that have posts for a given user_id.
