How to make ActiveRecord query unique by a column - ruby-on-rails

I have a Company model that has many Disclosures. The Disclosure has columns named title, pdf and pdf_sha256.
class Company < ActiveRecord::Base
has_many :disclosures
end
class Disclosure < ActiveRecord::Base
belongs_to :company
end
I want the disclosures to be unique by pdf_sha256, and a disclosure whose pdf_sha256 is nil should always be treated as unique.
If it were an Array, I would write it like this:
companies_with_sha256 = company.disclosures.where.not(pdf_sha256: nil).group_by(&:pdf_sha256).map do |key,values|
values.max_by{|v| v.title.length}
end
companies_without_sha256 = company.disclosures.where(pdf_sha256: nil)
companies = companies_with_sha256 + companies_without_sha256
How can I get the same result by using ActiveRecord query?

It is possible to do it in one query by first getting a different id for each different pdf_sha256 as a subquery, then in the query getting the elements within that set of ids by passing the subquery as follows:
def unique_disclosures_by_pdf_sha256(company)
subquery = company.disclosures.select('MIN(id) as id').group(:pdf_sha256)
company.disclosures.where(id: subquery)
.or(company.disclosures.where(pdf_sha256: nil))
end
The great thing about this is that ActiveRecord relations are lazily evaluated, so the subquery is not run on its own; it is embedded in the main query, and the database executes a single statement. That statement retrieves one disclosure per distinct pdf_sha256 plus all the ones that have pdf_sha256 set to nil.
In case you are curious, given a company, the resulting query will be something like:
SELECT "disclosures".* FROM "disclosures"
WHERE (
"disclosures"."company_id" = $1 AND "disclosures"."id" IN (
SELECT MIN(id) as id FROM "disclosures" WHERE "disclosures"."company_id" = $2 GROUP BY "disclosures"."pdf_sha256"
)
OR "disclosures"."company_id" = $3 AND "disclosures"."pdf_sha256" IS NULL
)
Another advantage of this solution is that the returned value is an ActiveRecord relation, so nothing is loaded until you actually need it. You can also keep chaining onto it. For example, you can select only the id instead of the whole model and limit the number of results returned by the database:
unique_disclosures_by_pdf_sha256(company).select(:id).limit(10).each { |d| puts d }
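To see what the combined query selects, here is a plain-Ruby sketch of the same logic on in-memory data. The DisclosureRow struct and the sample values are made up for illustration, and no database is involved. Note that it keeps the lowest id in each group, whereas the Array version in the question keeps the longest title.

```ruby
# Hypothetical stand-in for Disclosure rows; not an ActiveRecord model.
DisclosureRow = Struct.new(:id, :title, :pdf_sha256)

rows = [
  DisclosureRow.new(1, "Q1", "aaa"),
  DisclosureRow.new(2, "Q1 report", "aaa"),
  DisclosureRow.new(3, "Q2", "bbb"),
  DisclosureRow.new(4, "Draft", nil),
  DisclosureRow.new(5, "Draft 2", nil)
]

# Equivalent of SELECT MIN(id) ... GROUP BY pdf_sha256 (NULLs form one group).
min_ids = rows.group_by(&:pdf_sha256).map { |_sha, group| group.map(&:id).min }

# Equivalent of WHERE id IN (subquery) OR pdf_sha256 IS NULL.
selected = rows.select { |r| min_ids.include?(r.id) || r.pdf_sha256.nil? }

selected_ids = selected.map(&:id)
```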

You can achieve this with the uniq method:
Company.first.disclosures.to_a.uniq(&:pdf_sha256)
This will return the disclosure records unique by the column "pdf_sha256".
Hope this helps you! Cheers
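One caveat, sketched in plain Ruby with a made-up HashedRow struct instead of real records: uniq treats all nil keys as equal, so only the first record without a hash survives, which is not what the question asks for. Falling back to the id as the key for nil rows is one possible workaround, assuming ids can never collide with hash strings.

```ruby
# Hypothetical stand-in for loaded Disclosure records.
HashedRow = Struct.new(:id, :pdf_sha256)

rows = [
  HashedRow.new(1, "aaa"),
  HashedRow.new(2, "aaa"),
  HashedRow.new(3, nil),
  HashedRow.new(4, nil)
]

# uniq keeps the first element for each distinct key, and nil is a single key.
collapsed = rows.uniq(&:pdf_sha256).map(&:id)

# Using the id as the key for nil rows keeps every one of them.
kept = rows.uniq { |r| r.pdf_sha256 || r.id }.map(&:id)
```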

Assuming you are using Rails 5 you could chain a .or command to merge both your queries.
pdf_sha256_unique_disclosures = company.disclosures.where(pdf_sha256: nil).or(company.disclosures.where.not(pdf_sha256: nil))
Then you can proceed with your group_by logic.
However, I'm not exactly sure what the objective of the example above is, and I am curious to better understand how you would use the resulting companies variable.
If you wanted to have a hash of unique pdf_sha256 keys including nil, and its resultant unique disclosure document you could try the following:
sorted_disclosures = company.disclosures.group_by(&:pdf_sha256).each_with_object({}) do |entries, hash|
hash[entries[0]] = entries[1].max_by{|v| v.title.length}
end
This should give you a resultant hash like structure similar to the group_by where your keys are all your unique pdf_sha256 and the value would be the longest named disclosure that match that pdf_sha256.

Why not:
ids = Disclosure.select(:id, :pdf_sha256).distinct.map(&:id)
Disclosure.find(ids)
The id will be distinct either way since it's the primary key, so all you have to do is map the ids and find the Disclosures by id.

If you need a relation with distinct pdf_sha256, where you require no explicit conditions, you can use group for that -
scope :unique_pdf_sha256, -> { where.not(pdf_sha256: nil).group(:pdf_sha256) }
scope :nil_pdf_sha256, -> { where(pdf_sha256: nil) }
You could have used or, but the relation passed to it must be structurally compatible. So even though both scopes return the same type of relation, you cannot combine them with or, because only one of them has a group clause.
Edit: To make them structurally compatible with each other, see @AlexSantos's answer.


Related

How do I write a Rails finder method where none of the has_many items has a non-nil field?

I'm using Rails 5. I have the following model ...
class Order < ApplicationRecord
...
has_many :line_items, :dependent => :destroy
The LineItem model has an attribute, "discount_applied." I would like to return all orders where there are zero instances of a line item having the "discount_applied" field being not nil. How do I write such a finder method?
First of all, this really depends on whether or not you want to use a pure Arel approach or if using SQL is fine. The former is IMO only advisable if you intend to build a library but unnecessary if you're building an app where, in reality, it's highly unlikely that you're changing your DBMS along the way (and if you do, changing a handful of manual queries will probably be the least of your troubles).
Assuming using SQL is fine, the simplest solution that should work across pretty much all databases is this:
Order.where("(SELECT COUNT(*) FROM line_items WHERE line_items.order_id = orders.id AND line_items.discount_applied IS NOT NULL) = 0")
This should also work pretty much everywhere (and has a bit more Arel and less manual SQL):
Order.left_joins(:line_items).group("orders.id").having("COUNT(line_items.discount_applied) = 0")
Depending on your specific DBMS (more specifically: its respective query optimizer), one or the other might be more performant.
Hope that helps.
Not efficient but I thought it may solve your problem:
orders = Order.includes(:line_items).select do |order|
order.line_items.all? { |line_item| line_item.discount_applied.nil? }
end
Update:
Instead of finding orders whose line items all have no discount, we can exclude from the result every order that has a line item with a discount applied. This can be done with a subquery inside the where clause:
# Find all ids of orders which have line items with a discount applied:
excluded_ids = LineItem.select(:order_id)
.where.not(discount_applied: nil)
.distinct.map(&:order_id)
# exclude those ids from all orders:
Order.where.not(id: excluded_ids)
You can combine them in a single finder method:
Order.where.not(id: LineItem
.select(:order_id)
.where.not(discount_applied: nil))
Hope this helps
A possible code
Order.includes(:line_items).where.not(line_items: {discount_applied: nil})
I advise getting familiar with the AR documentation for Query Methods.
Update
This seems to be more interesting than I initially thought, and more complicated, so I will not be able to give you working code. But I would look into a solution using LineItem.group(:order_id).having(discount_applied: nil), which should give you a collection of line_items, and then use it as a sub-query to find the related orders.
If you want all the orders that have a line item where discount_applied is not nil, then:
Order.includes(:line_items).where.not(line_items: {discount_applied: nil})
(use includes to avoid n+1 problem)
or
Order.joins(:line_items).where.not(line_items: {discount_applied: nil})
Here is the solution to your problem
order_ids = Order.joins(:line_items).where.not(line_items: {discount_applied: nil}).pluck(:id)
orders = Order.where.not(id: order_ids)
First query will return ids of Orders with at least one line_item having discount_applied. The second query will return all orders where there are zero instances of a line_item having the discount_applied.
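The same two-step idea in plain Ruby, using made-up structs rather than ActiveRecord models, shows why orders with no line items at all are also returned:

```ruby
require "set"

# Hypothetical stand-ins for the orders and line_items tables.
OrderRow = Struct.new(:id)
ItemRow  = Struct.new(:order_id, :discount_applied)

orders = [OrderRow.new(1), OrderRow.new(2), OrderRow.new(3)]
items = [
  ItemRow.new(1, nil),
  ItemRow.new(1, 5),   # order 1 has a discounted item
  ItemRow.new(2, nil)  # order 2 has only undiscounted items
]                      # order 3 has no items at all

# Step 1: ids of orders with at least one discounted line item.
excluded_ids = items.reject { |i| i.discount_applied.nil? }.map(&:order_id).to_set

# Step 2: every order whose id is not in that set.
result_ids = orders.reject { |o| excluded_ids.include?(o.id) }.map(&:id)
```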
I would use the NOT EXISTS feature from SQL, which is at least available in both MySQL and PostgreSQL
it should look like this
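class Order
has_many :line_items
# Correlated subquery: line items of the current order that have a discount.
scope :without_discounts, -> {
where("NOT EXISTS (?)", LineItem.where("line_items.order_id = orders.id").where.not(discount_applied: nil))
}
end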
If I understood correctly, you want to get all orders for which no line item (if any) has a discount applied.
One way to get those orders using ActiveRecord would be the following:
Order.distinct.left_outer_joins(:line_items).where(line_items: { discount_applied: nil })
Here's a brief explanation of how that works:
The solution uses left_outer_joins, assuming you won't be accessing the line items for each order. You can also use left_joins, which is an alias.
If you need to instantiate the line items for each Order instance, add .eager_load(:line_items) to the chain which will prevent doing an additional query for every order (N+1), i.e., doing order.line_items.each in a view.
Using distinct is essential to make sure that orders are only included once in the result.
Update
My previous solution was only checking that discount_applied IS NULL for at least one line item, not all of them. The following query should return the orders you need.
Order.left_joins(:line_items).group(:id).having("COUNT(line_items.discount_applied) = ?", 0)
This is what's going on:
The solution still needs to use a left outer join (orders LEFT OUTER JOIN line_items) so that orders without any associated items are included.
It groups the joined rows by order to get a single Order object regardless of how many items it has (GROUP BY orders.id).
It counts the number of line items that were given a discount for each order, only selecting the ones whose items have zero discounts applied (HAVING (COUNT(line_items.discount_applied) = 0)).
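The key detail is that COUNT(column) counts only non-NULL values. A plain-Ruby sketch of the join, grouping and HAVING steps, with made-up sample rows standing in for the joined result:

```ruby
# Rows as a LEFT OUTER JOIN would produce them: [order_id, discount_applied].
# Order 3 has no line items, so its joined column is nil (SQL NULL).
joined = [
  [1, nil], [1, 5],
  [2, nil], [2, nil],
  [3, nil]
]

# GROUP BY orders.id, then COUNT(line_items.discount_applied),
# which counts only the non-NULL values in each group.
counts = joined.group_by(&:first).transform_values do |rows|
  rows.count { |_id, discount| !discount.nil? }
end

# HAVING COUNT(line_items.discount_applied) = 0
matching = counts.select { |_id, n| n.zero? }.keys
```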
I hope that helps.
You cannot do this efficiently with a classic Rails left_joins, but the SQL left join was built to handle those cases:
Order.joins("LEFT JOIN line_items AS li ON li.order_id = orders.id
AND li.discount_applied IS NOT NULL")
.where("li.id IS NULL")
A simple inner join returns every order joined with each of its line_items,
but if an order has no line_items, the order is dropped (like a false where).
With a left join, if no matching line_items row is found, SQL joins the order to an empty (all-NULL) row in order to keep it.
So we left join only the line_items we don't want, and keep the orders that were joined to an empty line_items row.
And avoid any code with where(id: pluck(:id)) or having("COUNT(*) = 0"); one day it will kill your database.

Can i write this Query in ActiveRecord

For a data analysis I need both results in one set.
a.follower_trackings.pluck(:date, :new_followers, :deleted_followers)
a.data_trackings.pluck(:date, :followed_by_count)
Instead of an ugly merge of the two arrays (they can have different starting dates, and I obviously need only the values whose date exists in both arrays), I thought about doing it in MySQL:
SELECT
followers.new_followers,
followers.deleted_followers,
trackings.date,
trackings.followed_by_count
FROM
instagram_user_follower_trackings AS followers,
instagram_data_trackings AS trackings
WHERE
followers.date = trackings.date
AND
followers.user_id=5
AND
trackings.user_id=5
ORDER
BY trackings.date DESC
This is working fine, but I wonder if I can write the same with ActiveRecord?
You can do the following which should render the same query as your raw SQL, but it's also quite ugly...:
a.follower_trackings.
merge(a.data_trackings).
from("instagram_user_follower_trackings, instagram_data_trackings").
where("instagram_user_follower_trackings.date = instagram_data_trackings.date").
order(:date => :desc).
pluck("instagram_data_trackings.date",
:new_followers, :deleted_followers, :followed_by_count)
A few tricks turned out to be useful while playing with the scopes: the merge trick adds the data_trackings.user_id = a.id condition but does not join in the data_trackings table, which is why the from clause has to be added; listing both tables there, together with the date condition, essentially performs the INNER JOIN. The rest is pretty straightforward and leverages the fact that the order and pluck clauses do not need the table name to be specified if the columns are either unique among the tables or are specified in the SELECT (pluck).
Well, when looking again, I would probably rather define a scope for retrieving the data for a given user (a record) that would essentially use the raw SQL you have in your question. I might also define a helper instance method that would call the scope with self, something like:
class Model
scope :tracking_info, ->(user) { ... }
def tracking_info
Model.tracking_info(self)
end
end
Then one can use simply:
a = Model.find(1)
a.tracking_info
# => [[...], [...]]
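For comparison, the in-Ruby merge that the question wanted to avoid does not have to be ugly. A minimal sketch, assuming the two pluck calls return arrays shaped like the made-up samples below (dates are shown as ISO strings for brevity):

```ruby
# Sample pluck results (made-up values): [date, new_followers, deleted_followers]
follower_rows = [["2016-01-01", 10, 2], ["2016-01-02", 7, 1], ["2016-01-03", 4, 0]]
# [date, followed_by_count]
tracking_rows = [["2016-01-02", 105], ["2016-01-03", 109], ["2016-01-04", 112]]

# Index one side by date so the join is a hash lookup rather than a nested loop.
followers_by_date = follower_rows.each_with_object({}) do |(date, added, removed), index|
  index[date] = [added, removed]
end

# Keep only dates present on both sides (inner join), newest first.
joined = tracking_rows
  .select { |date, _count| followers_by_date.key?(date) }
  .sort_by { |date, _count| date }
  .reverse
  .map { |date, count| [*followers_by_date[date], date, count] }
```

Each element of joined has the same column order as the SQL in the question: new_followers, deleted_followers, date, followed_by_count.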

does x = User.all create a hash? How do I traverse it?

Let's say I have a User table and a Messages table, and they have a has_many / belongs_to relationship. I want to find the ids of users whose names are "Bob", then pull the message history for one of those ids.
x = User.where(name: "Bob")
Does that create a hash in variable x, with all the results of users whose names were Bob? The result in the console certainly looks like a hash when I run x. To include the messages tied to all the Bobs, I think I do:
x = User.where(name: "Bob").includes(:messages)
Now that I have x...how do I find the id's of the people whose names are Bob? I don't want to query the db again, I'd like to do it all via the variable, is that possible?
I then want to get the first message of the first id (the first Bob) in my table. Can that be done via the variable, or do I have to go back to the DB once I have the first id?
Thanks for all the help guys and gals!
Most ActiveRecord queries return a Relation.
You can call x = x.to_a to make rails perform the actual query(there will be 2 SQL queries - one for users and one for messages) and then traverse the resulting array.
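Once loaded, the result behaves like an array of User objects rather than a hash, so the usual Enumerable methods apply. A plain-Ruby sketch, with made-up structs standing in for the loaded records (with a real relation built with includes(:messages), the messages of each user would already be in memory):

```ruby
# Hypothetical stand-ins for loaded User and Message records.
MessageRow = Struct.new(:id, :body)
UserRow    = Struct.new(:id, :name, :messages)

x = [
  UserRow.new(3, "Bob", [MessageRow.new(10, "hi"), MessageRow.new(11, "bye")]),
  UserRow.new(8, "Bob", [])
]

bob_ids       = x.map(&:id)             # ids of every Bob
first_message = x.first.messages.first  # first message of the first Bob
```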
This will do it. As referenced in the rails guides. http://guides.rubyonrails.org/active_record_querying.html section 13.2
x = Message.includes(:user).where(users: { name: "Bob" })
and then to get the first message just tack on .first at the end of the query.
x = Message.includes(:user).where(users: { name: "Bob" }).first
You need to query from Message, not User. joins (inner join) lets you query across multiple tables, and includes eager-loads the association, using a left outer join when the conditions reference the joined table, as they do here.
Message.joins(:user).where("users.name = 'Bob'")

Activerecord specifications with 2 different models

I need to find a way to display all Vacancies from my Vacancy model except the ones that a user already applied for.
I keep the IDs of the vacancies a certain user applied for in a separate model, AppliedVacancies.
I was thinking something along the lines of:
@applied = AppliedVacancies.where(employee_id: current_employee)
@appliedvacancies_id = []
@applied.each do |appliedvacancy|
@appliedvacancies_id << appliedvacancy.id
end
@notyetappliedvacancies = Vacancy.where("id != ?", @appliedvacancies_id)
But it does not seem to like getting an array of IDs. How would I go about fixing this?
I get following error:
PG::DatatypeMismatch: ERROR: argument of WHERE must be type boolean, not type record
LINE 1: SELECT "vacancies".* FROM "vacancies" WHERE (id != 13,14)
^
: SELECT "vacancies".* FROM "vacancies" WHERE (id != 13,14)
This is purely an SQL problem.
You cannot use != to compare a value to a set of values. You need to use the IN operator.
@notyetappliedvacancies = Vacancy.where("id NOT IN (?)", @appliedvacancies_id)
As an aside, you can drastically improve the code you've written so far. You are needlessly instantiating complete ActiveRecord models for every record found in your applied_vacancies table, when all you need are the IDs.
A first pass at improvement would be to use pluck to skip the entire process and go straight to the list of IDs:
ids = AppliedVacancies.where(employee_id: current_employee).pluck(:id)
@notyetappliedvacancies = Vacancy.where("id NOT IN (?)", ids)
Next, you can go a step further and eliminate the first query altogether (or rather, combine it with the last query as a sub-query) by leaving it as an AREL projection which can be subbed into the second query directly:
ids = AppliedVacancies.select(:id).where(employee_id: current_employee)
@notyetappliedvacancies = Vacancy.where("id NOT IN (?)", ids)
This will generate a single query:
select * from vacancies where id not in (select id from applied_vacancies where employee_id = <value>)
Same answer as @meagar's, but the Rails 4 way:
@notyetappliedvacancies = Vacancy.where.not(id: @appliedvacancies_id)
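The difference between != and NOT IN can be sketched in plain Ruby. The VacancyRow struct and the ids below are made up for illustration:

```ruby
require "set"

# Hypothetical stand-in for Vacancy records.
VacancyRow = Struct.new(:id, :title)

vacancies = [
  VacancyRow.new(12, "Cook"),
  VacancyRow.new(13, "Driver"),
  VacancyRow.new(14, "Clerk")
]
applied_ids = Set[13, 14]

# NOT IN is a membership test against the whole set,
# not a comparison with a single value.
not_yet_applied = vacancies.reject { |v| applied_ids.include?(v.id) }.map(&:id)
```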

Rails: select unique values from a column

I already have a working solution, but I would really like to know why this doesn't work:
ratings = Model.select(:rating).uniq
ratings.each { |r| puts r.rating }
It selects, but it doesn't print unique values; it prints all values, including the duplicates. And it's in the documentation: http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields
Model.select(:rating)
The result of this is a collection of Model objects. Not plain ratings. And from uniq's point of view, they are completely different. You can use this:
Model.select(:rating).map(&:rating).uniq
or this (most efficient):
Model.uniq.pluck(:rating)
Rails 5+
Model.distinct.pluck(:rating)
Update
Apparently, as of rails 5.0.0.1, it works only on "top level" queries, like above. Doesn't work on collection proxies ("has_many" relations, for example).
Address.distinct.pluck(:city) # => ['Moscow']
user.addresses.distinct.pluck(:city) # => ['Moscow', 'Moscow', 'Moscow']
In this case, deduplicate after the query
user.addresses.pluck(:city).uniq # => ['Moscow']
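Why Array#uniq sees loaded objects as different can be sketched in plain Ruby. The RatingRow class is a made-up stand-in that, like any plain object, compares by identity; this is an assumption about how records loaded without an id behave, not ActiveRecord code:

```ruby
# Made-up class standing in for records loaded with select(:rating).
# It does not override ==, eql? or hash, so equality is by object identity.
class RatingRow
  attr_reader :rating

  def initialize(rating)
    @rating = rating
  end
end

rows = [RatingRow.new(5), RatingRow.new(5), RatingRow.new(3)]

by_object = rows.uniq.size           # every object is distinct
by_value  = rows.map(&:rating).uniq  # plain values are deduplicated
```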
If you're going to use Model.select, then you might as well just use DISTINCT, as it will return only the unique values. This is better because it returns fewer rows and should be slightly faster than returning a number of rows and then telling Rails to pick the unique values.
Model.select('DISTINCT rating')
Of course, this is provided your database understands the DISTINCT keyword, and most should.
This works too.
Model.pluck("DISTINCT rating")
If you also want to select extra fields (DISTINCT ON is PostgreSQL-specific):
Model.select('DISTINCT ON (models.ratings) models.ratings, models.id').map { |m| [m.id, m.ratings] }
Model.uniq.pluck(:rating)
# SELECT DISTINCT "models"."rating" FROM "models"
This has the advantages of not using sql strings and not instantiating models
Model.select(:rating).uniq
This code works as 'DISTINCT' (not as Array#uniq) since rails 3.2
Model.select(:rating).distinct
Another way to collect uniq columns with sql:
Model.group(:rating).pluck(:rating)
If I understand correctly:
The current query
Model.select(:rating)
returns an array of objects, and you have written
Model.select(:rating).uniq
uniq is applied to that array of objects, and each object counts as different from the others, so uniq is doing its job correctly: every element of the array is already unique.
There are many ways to select distinct ratings:
Model.select('distinct rating').map(&:rating)
or
Model.select('distinct rating').collect(&:rating)
or
Model.select(:rating).map(&:rating).uniq
or
Model.select(:rating).collect(&:rating).uniq
One more thing: the first and second queries find distinct data in SQL.
Depending on the database and collation (MySQL with a PAD SPACE collation, for example), these queries can treat "london" and "london " as the same value, ignoring the trailing space, so 'london' appears only once in the result.
The third and fourth queries:
load the data with SQL and then apply Ruby's uniq method to get distinct values.
These queries treat "london" and "london " as different, so both 'london' and 'london ' appear in the result.
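The Ruby side of that difference is easy to check on a made-up list of values. Ruby compares strings exactly, so stripping whitespace first is one way to merge the variants:

```ruby
cities = ["london", "london ", "paris"]

ruby_unique = cities.uniq               # exact comparison keeps both variants
normalized  = cities.map(&:strip).uniq  # strip first to merge them
```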
If anyone is looking for the same with Mongoid, that is
Model.distinct(:rating)
Some answers don't take into account that the OP wants an array of values.
Other answers don't work well if your Model has thousands of records.
That said, I think a good answer is:
Model.uniq.select(:ratings).map(&:ratings)
=> "SELECT DISTINCT ratings FROM `models` "
This works because you first load an array of Model objects (reduced in size by the select), then extract the only attribute those selected models have (ratings).
You can use the following Gem: active_record_distinct_on
Model.distinct_on(:rating)
Yields the following query:
SELECT DISTINCT ON ( "models"."rating" ) "models".* FROM "models"
In my scenario, I wanted a list of distinct names after ordering them by creation date and applying an offset and limit; basically a combination of ORDER BY and DISTINCT ON.
All you need to do is put DISTINCT ON inside the pluck method, as follows:
Model.order("name, created_at DESC").offset(0).limit(10).pluck("DISTINCT ON (name) name")
This returns an array of distinct names.
Model.pluck("DISTINCT column_name")
