Enforcing uniqueness on a relation based on a single column - ruby-on-rails

I have a class method on my Consumer model that functions as a scope (acts on an ActiveRecord::Relation, returns an ActiveRecord::Relation, can be daisy-chained) and returns duplicates of some consumers. I want it to only return unique consumers, and I can't seem to find either a way to do it with Rails helpers or the right SQL to pass in to get what I want. I feel like there's a simple solution here - it's not a complicated problem.
I essentially want Array#uniq, but for ActiveRecord::Relation. I tried a select DISTINCT statement:
Consumer.joins(...).where(...).select('DISTINCT consumers.id')
It returned the correct 'uniqueness on consumers.id' property, but it only returned the consumer ids, so that when I eventually loaded the relation, the Consumers didn't have their other attributes.
Without select DISTINCT:
my_consumer = Consumer.joins(...).where(...).first
my_consumer.status # => "active"
With select DISTINCT:
my_consumer = Consumer.joins(...).where(...).select('DISTINCT consumers.id').first
my_consumer.status # => ActiveModel::MissingAttributeError: missing attribute: status
I didn't think that an ActiveRecord::Relation would ever load less than the whole model, but I guess it will with the select statement. When I query the Relation after the select statement for the class it contains, it returns Consumer, which seems strange.

.select('DISTINCT consumers.*')
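As a rough in-memory analogy of why this works (plain Ruby, no database; the helper name and sample rows are made up for illustration): DISTINCT applies to whatever columns are selected, so selecting only the id throws the other attributes away, while selecting every consumers column keeps them and still collapses the duplicate join rows.

```ruby
# Plain-Ruby analogy only; no database is involved.
# Hypothetical helper: de-duplicate rows, optionally projecting onto a few
# columns first, roughly like SELECT DISTINCT <columns>.
def distinct_rows(rows, columns = nil)
  projected = columns ? rows.map { |row| row.slice(*columns) } : rows
  projected.uniq
end

# One hash per row produced by the join; consumer 1 matched twice.
joined_rows = [
  { id: 1, status: "active" },
  { id: 1, status: "active" },
  { id: 2, status: "inactive" }
]

distinct_rows(joined_rows, [:id]) # like DISTINCT consumers.id: [{ id: 1 }, { id: 2 }]
distinct_rows(joined_rows)        # like DISTINCT consumers.*: two rows, status kept
```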

Related

Can I force the execution of an active record query chain?

I have an edge case where I want to use .first only after my SQL query has been executed.
My case is the following:
User.select("sum((type = 'foo')::int) as foo_count",
            "sum((type = 'bar')::int) as bar_count")
    .first
    .yield_self { |r| r.bar_count / r.foo_count.to_f }
However, this would throw an SQL error saying that I should include my user_id in the GROUP BY clause. I've already found a hacky solution using to_a, but I really wonder if there is a proper way to force execution before my call to .first.
The error occurs because first adds an ORDER BY on the primary key when no order is defined:
"Find the first record (or first N records if a parameter is supplied). If no order is defined it will order by primary key."
Instead, try take:
"Gives a record (or N records if a parameter is supplied) without any implied order. The order will depend on the database implementation. If an order is supplied it will be respected."
So
User.select("sum((type = 'foo')::int) as foo_count",
            "sum((type = 'bar')::int) as bar_count")
    .take
    .yield_self { |r| r.bar_count / r.foo_count.to_f }
should work. However, as stated, the order is indeterminate.
You may want to use pluck, which retrieves only the data, instead of select, which just alters which fields get loaded into models:
User.pluck(
  "sum((type = 'foo')::int) as foo_count",
  "sum((type = 'bar')::int) as bar_count"
).map do |foo_count, bar_count|
  bar_count / foo_count.to_f
end
You can probably do the division in the query as well if necessary.
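Wherever the division ends up, it may be worth guarding against a zero or missing foo_count, since dividing by 0.0 in Ruby yields Infinity or NaN rather than raising. A minimal sketch, assuming the two counts have already been fetched (the helper name bar_to_foo_ratio is hypothetical):

```ruby
# Hypothetical helper: ratio of bar rows to foo rows.
# Returns nil when there are no foo rows (or the SUM came back as NULL/nil),
# instead of Infinity or NaN.
def bar_to_foo_ratio(foo_count, bar_count)
  return nil if foo_count.to_i.zero?

  bar_count / foo_count.to_f
end

bar_to_foo_ratio(4, 2)     # => 0.5
bar_to_foo_ratio(0, 3)     # => nil
bar_to_foo_ratio(nil, nil) # => nil
```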

How to make ActiveRecord query unique by a column

I have a Company model that has many Disclosures. The Disclosure has columns named title, pdf and pdf_sha256.
class Company < ActiveRecord::Base
  has_many :disclosures
end
class Disclosure < ActiveRecord::Base
  belongs_to :company
end
I want the disclosures to be unique by pdf_sha256, and a disclosure whose pdf_sha256 is nil should always be treated as unique.
If it were an Array, I would write it like this:
companies_with_sha256 = company.disclosures.where.not(pdf_sha256: nil).group_by(&:pdf_sha256).map do |key, values|
  values.max_by { |v| v.title.length }
end
companies_without_sha256 = company.disclosures.where(pdf_sha256: nil)
companies = companies_with_sha256 + companies_without_sha256
How can I get the same result by using ActiveRecord query?
It is possible to do it in one query by first getting a different id for each different pdf_sha256 as a subquery, then in the query getting the elements within that set of ids by passing the subquery as follows:
def unique_disclosures_by_pdf_sha256(company)
  subquery = company.disclosures.select('MIN(id) as id').group(:pdf_sha256)
  company.disclosures.where(id: subquery)
         .or(company.disclosures.where(pdf_sha256: nil))
end
The great thing about this is that ActiveRecord relations are lazily loaded, so the subquery is not run on its own; it is embedded in the main query, and the database receives a single query. It then retrieves one disclosure per distinct pdf_sha256 plus all the ones that have pdf_sha256 set to nil.
In case you are curious, given a company, the resulting query will be something like:
SELECT "disclosures".* FROM "disclosures"
WHERE (
  "disclosures"."company_id" = $1 AND "disclosures"."id" IN (
    SELECT MIN(id) as id FROM "disclosures" WHERE "disclosures"."company_id" = $2 GROUP BY "disclosures"."pdf_sha256"
  )
  OR "disclosures"."company_id" = $3 AND "disclosures"."pdf_sha256" IS NULL
)
Another advantage of this solution is that the returned value is an ActiveRecord relation, so it won't be loaded until you actually need it. You can also keep chaining queries onto it. For example, you can select only the id instead of the whole model and limit the number of results returned by the database:
unique_disclosures_by_pdf_sha256(company).select(:id).limit(10).each { |d| puts d }
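To make the selection rule concrete, here is an in-memory sketch of what that query computes, using a hypothetical DisclosureRow struct instead of the real model. Note that, like the MIN(id) subquery, it keeps the lowest id per hash, not the longest title as in the question's Array version.

```ruby
# Hypothetical stand-in for a Disclosure record; not the real model.
DisclosureRow = Struct.new(:id, :title, :pdf_sha256)

def unique_by_pdf_sha256(rows)
  # Like SELECT MIN(id) ... GROUP BY pdf_sha256 (nil values form one group).
  min_ids = rows.group_by(&:pdf_sha256).map { |_sha, group| group.map(&:id).min }
  # Like WHERE id IN (subquery) OR pdf_sha256 IS NULL.
  rows.select { |row| min_ids.include?(row.id) || row.pdf_sha256.nil? }
end

rows = [
  DisclosureRow.new(1, "Q1", "abc"),
  DisclosureRow.new(2, "Q1 report", "abc"), # same hash as id 1, dropped
  DisclosureRow.new(3, "Notice", nil),
  DisclosureRow.new(4, "Other notice", nil), # nil hashes are all kept
  DisclosureRow.new(5, "Q2", "def")
]

unique_by_pdf_sha256(rows).map(&:id) # => [1, 3, 4, 5]
```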
You can achieve this by using the uniq method:
Company.first.disclosures.to_a.uniq(&:pdf_sha256)
This will return the disclosure records unique by the column "pdf_sha256".
Hope this helps you! Cheers
Assuming you are using Rails 5 you could chain a .or command to merge both your queries.
pdf_sha256_unique_disclosures = company.disclosures.where(pdf_sha256: nil).or(company.disclosures.where.not(pdf_sha256: nil))
Then you can proceed with your group_by logic.
However, I'm not exactly sure what the objective is in the example above, and I am curious to better understand how you would use the resulting companies variable.
If you wanted to have a hash of unique pdf_sha256 keys including nil, and its resultant unique disclosure document you could try the following:
sorted_disclosures = company.disclosures.group_by(&:pdf_sha256).each_with_object({}) do |entries, hash|
  hash[entries[0]] = entries[1].max_by { |v| v.title.length }
end
This should give you a resultant hash like structure similar to the group_by where your keys are all your unique pdf_sha256 and the value would be the longest named disclosure that match that pdf_sha256.
Why not:
ids = Disclosure.select(:id, :pdf_sha256).distinct.map(&:id)
Disclosure.find(ids)
The id will be distinct either way since it's the primary key, so all you have to do is map the ids and find the Disclosures by id.
If you need a relation with distinct pdf_sha256, where you require no explicit conditions, you can use group for that -
scope :unique_pdf_sha256, -> { where.not(pdf_sha256: nil).group(:pdf_sha256) }
scope :nil_pdf_sha256, -> { where(pdf_sha256: nil) }
You could have used or, but the relation passed to it must be structurally compatible. Even though both scopes return the same type of relation, the group clause on the first one makes them structurally incompatible, so you cannot combine them with or.
Edit: To make them structurally compatible with each other, see @AlexSantos's answer.
Model.select(:rating)
The result of this is a collection of Model objects, not plain ratings. And from uniq's point of view, they are all different. You can use this:
Model.select(:rating).map(&:rating).uniq
or this (most efficient):
Model.uniq.pluck(:rating) # older Rails; Relation#uniq was removed in Rails 5.1
Model.distinct.pluck(:rating) # Rails 4 and later
Update
Apparently, as of rails 5.0.0.1, it works only on "top level" queries, like above. Doesn't work on collection proxies ("has_many" relations, for example).
Address.distinct.pluck(:city) # => ['Moscow']
user.addresses.distinct.pluck(:city) # => ['Moscow', 'Moscow', 'Moscow']
In this case, deduplicate after the query
user.addresses.pluck(:city).uniq # => ['Moscow']
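The difference between de-duplicating objects and de-duplicating values can be seen in plain Ruby. The sketch below uses a hypothetical RatingRow class as a stand-in for model instances loaded with only the rating column; it is an analogy, not ActiveRecord's actual equality logic.

```ruby
# Hypothetical stand-in for a record loaded with select(:rating).
class RatingRow
  attr_reader :rating

  def initialize(rating)
    @rating = rating
  end
end

rows = [RatingRow.new(5), RatingRow.new(5), RatingRow.new(3)]

rows.uniq.size                    # => 3, each object is distinct to uniq
rows.map(&:rating).uniq           # => [5, 3], the values are compared instead
rows.uniq(&:rating).map(&:rating) # => [5, 3], uniq with a block compares by rating
```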

Blank check causing extra count call

I have been trying hard to find a comparison of result.blank? and result[0], and today I finally checked one query with both methods.
Here is the code; the result variable is @categories, which is an ActiveRecord result.
This blank? check triggers one extra database call like SELECT COUNT(*) AS count_all:
if @categories.blank?
end
But here that extra query does not show up:
if @categories[0]
end
Is there any logic behind that? I couldn't find it.
It is important to note that assigning an ActiveRecord query to a variable does not return the result of the query. Something like this:
@categories = Category.where(public: true)
does not return an array of all public categories. Instead it returns a Relation, which defines a query. The query is executed against the database once you call a method on the relation that needs the actual records, for example each, load or count.
That said: when you call blank? on a relation, Rails needs to know whether the relation would return an empty array. Therefore Rails executes a query like:
SELECT COUNT(*) FROM categories WHERE public = 1
because that query is much faster than fetching all records when the only thing you need to know is whether there are any matching records.
@categories[0] works differently. Here Rails needs to load all records into an array holding all matching categories and then returns the first record in that array.
At this point both versions have run only one query against the database. But I guess your next step would be to iterate over the records if there are any. If you used the first version (blank?), the objects were not loaded, only counted, so Rails needs to query for the actual records, which results in a second query. The second example ([0]) has the records already loaded, so no second query is needed.
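The query counts described in this answer can be mimicked with a toy class. FakeRelation below is entirely hypothetical: it only logs the queries a lazy relation would issue under the behaviour described here, and is not how ActiveRecord is implemented.

```ruby
# Hypothetical stand-in for a lazy relation; it logs "queries" instead of running them.
class FakeRelation
  attr_reader :query_log

  def initialize(rows)
    @rows = rows
    @loaded = nil
    @query_log = []
  end

  # An unloaded relation answers blank? with a COUNT query, as described above.
  def blank?
    return @loaded.empty? if @loaded

    @query_log << "SELECT COUNT(*)"
    @rows.empty?
  end

  # Loads all records on first use, then reuses them.
  def to_a
    return @loaded if @loaded

    @query_log << "SELECT *"
    @loaded = @rows.dup
  end

  def [](index)
    to_a.at(index)
  end

  def each(&block)
    to_a.each(&block)
  end
end

with_blank = FakeRelation.new(%w[books music])
with_blank.each { |category| category } unless with_blank.blank?
with_blank.query_log # => ["SELECT COUNT(*)", "SELECT *"]

with_index = FakeRelation.new(%w[books music])
with_index.each { |category| category } if with_index[0]
with_index.query_log # => ["SELECT *"]
```

In this toy model the blank? path costs two queries once the records are iterated, while the [0] path costs one, matching the explanation.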

Can i write this Query in ActiveRecord

For a data analysis I need both results in one set.
a.follower_trackings.pluck(:date, :new_followers, :deleted_followers)
a.data_trackings.pluck(:date, :followed_by_count)
Instead of an ugly array merge (they can have different starting dates, and I obviously need only those values where the date exists in both arrays), I thought about using MySQL:
SELECT
  followers.new_followers,
  followers.deleted_followers,
  trackings.date,
  trackings.followed_by_count
FROM
  instagram_user_follower_trackings AS followers,
  instagram_data_trackings AS trackings
WHERE
  followers.date = trackings.date
  AND followers.user_id = 5
  AND trackings.user_id = 5
ORDER BY
  trackings.date DESC
This is working fine, but I wonder if I can write the same with ActiveRecord?
You can do the following which should render the same query as your raw SQL, but it's also quite ugly...:
a.follower_trackings.
merge(a.data_trackings).
from("instagram_user_follower_trackings, instagram_data_trackings").
where("instagram_user_follower_trackings.date = instagram_data_trackings.date").
order(:date => :desc).
pluck("instagram_data_trackings.date",
:new_followers, :deleted_followers, :followed_by_count)
A few tricks turned out to be useful while playing with the scopes: the merge trick adds the data_trackings.user_id = a.id condition but does not join in the data_trackings table, which is why the from clause has to be added; together with the date condition, it essentially performs the INNER JOIN. The rest is pretty straightforward and leverages the fact that the order and pluck clauses do not need the table name to be specified if the columns are either unique among the tables, or are specified in the SELECT (pluck).
Well, when looking again, I would probably rather define a scope for retrieving the data for a given user (a record) that would essentially use the raw SQL you have in your question. I might also define a helper instance method that would call the scope with self, something like:
class Model < ActiveRecord::Base
  scope :tracking_info, ->(user) { ... }

  def tracking_info
    Model.tracking_info(self)
  end
end
Then one can use simply:
a = Model.find(1)
a.tracking_info
# => [[...], [...]]
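For comparison, the in-memory merge the question wanted to avoid can be kept fairly tidy by indexing one pluck result by date. This is only a sketch with made-up sample rows and a hypothetical helper name (merge_by_date); the column order follows the raw SQL in the question.

```ruby
require "date"

# Hypothetical helper. Inputs have the shape returned by the two pluck calls:
#   follower_rows: [date, new_followers, deleted_followers]
#   data_rows:     [date, followed_by_count]
def merge_by_date(follower_rows, data_rows)
  followed_by = data_rows.to_h # date => followed_by_count

  follower_rows
    .select { |date, _new_followers, _deleted| followed_by.key?(date) } # dates in both sets
    .map { |date, new_followers, deleted| [new_followers, deleted, date, followed_by[date]] }
    .sort_by { |row| row[2] }
    .reverse # newest date first, like ORDER BY date DESC
end

follower_rows = [
  [Date.new(2016, 1, 1), 10, 2],
  [Date.new(2016, 1, 2), 7, 1],
  [Date.new(2016, 1, 3), 4, 0]
]
data_rows = [
  [Date.new(2016, 1, 2), 150],
  [Date.new(2016, 1, 3), 154],
  [Date.new(2016, 1, 4), 160]
]

merge_by_date(follower_rows, data_rows)
# Keeps only 2016-01-03 and 2016-01-02, in that order.
```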

Select with count distinct in ActiveRecord query not returning aggregated fields

I'm doing a select with a count-distinct in ActiveRecord, but it's not returning any of my aggregated fields.
User.
select(
'users.id, count(distinct(shc.id)) as shipping_credit_count,
count(distinct(sc.id)) as service_credit_count'
).
...
...
group('users.id')
This only returns #<ActiveRecord::Relation [#<User id: 119>]>. I was expecting to see the counts in my aggregated fields. Why is nothing being returned?
Your query probably works as expected, but the inspect method is throwing you off. Read my answer here for a better description: Why group calculation fields do not show up in query result?
You should be able to call shipping_credit_count and service_credit_count on your objects even though they do not show up when you log them.
I would, however, implement it a little differently. I would add these methods to the User model:
def service_credit_count
  return service_credit_count_sql if self.respond_to?(:service_credit_count_sql)
  services.count
end
def shipping_credit_count
  return shipping_credit_count_sql if self.respond_to?(:shipping_credit_count_sql)
  shippings.count
end
And then in your query, name the fields with the _sql suffix. This way you can always use these counts. There is also a small (quite immature) gem I have written that does this: https://github.com/trialbee/association_count
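The fallback pattern can be tried out without a database. In the sketch below, FakeUser is a hypothetical stand-in for the User model, and define_singleton_method simulates the extra reader a record gains when the query selects "... AS service_credit_count_sql".

```ruby
# Hypothetical stand-in for the User model; @services plays the association.
class FakeUser
  def initialize(services)
    @services = services
  end

  def service_credit_count
    # Prefer the value precomputed by the query when it is present.
    return service_credit_count_sql if respond_to?(:service_credit_count_sql)

    @services.count
  end
end

plain = FakeUser.new(%w[a b c])
plain.service_credit_count # => 3, counted from the association stand-in

# Simulate a record loaded with the aggregated column in its select list.
aggregated = FakeUser.new([])
aggregated.define_singleton_method(:service_credit_count_sql) { 42 }
aggregated.service_credit_count # => 42, taken from the precomputed value
```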

Resources