Can I write this query in ActiveRecord - ruby-on-rails

For a data analysis I need both results in one set.
a.follower_trackings.pluck(:date, :new_followers, :deleted_followers)
a.data_trackings.pluck(:date, :followed_by_count)
Instead of an ugly array merge (the two arrays can have different starting dates, and I obviously need only those values where the date exists in both arrays), I thought about MySQL:
SELECT
followers.new_followers,
followers.deleted_followers,
trackings.date,
trackings.followed_by_count
FROM
instagram_user_follower_trackings AS followers,
instagram_data_trackings AS trackings
WHERE
followers.date = trackings.date
AND
followers.user_id=5
AND
trackings.user_id=5
ORDER
BY trackings.date DESC
This is working fine, but I wonder if I can write the same with ActiveRecord?

You can do the following which should render the same query as your raw SQL, but it's also quite ugly...:
a.follower_trackings.
merge(a.data_trackings).
from("instagram_user_follower_trackings, instagram_data_trackings").
where("instagram_user_follower_trackings.date = instagram_data_trackings.date").
order(:date => :desc).
pluck("instagram_data_trackings.date",
:new_followers, :deleted_followers, :followed_by_count)
A few tricks turned out to be useful while playing with the scopes. The merge trick adds the data_trackings.user_id = a.id condition, but it does not join in the data_trackings table; that is why the from clause has to be added, which, together with the date condition in where, essentially performs the INNER JOIN. The rest is pretty straightforward and leverages the fact that order and pluck clauses do not need the table name to be specified if the columns are either unique among the tables, or are specified in the SELECT (pluck).
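To make the join semantics concrete, here is a minimal plain-Ruby sketch of what the query computes, working on two arrays shaped like the pluck results in the question. The sample rows and variable names are invented for illustration; nothing here touches the database.

```ruby
require "date"

# Hypothetical sample rows, shaped like the two pluck results in the question.
follower_rows = [
  [Date.new(2016, 1, 1), 5, 1],
  [Date.new(2016, 1, 2), 7, 2],
  [Date.new(2016, 1, 3), 4, 0]
]
data_rows = [
  [Date.new(2016, 1, 2), 120],
  [Date.new(2016, 1, 3), 124],
  [Date.new(2016, 1, 4), 130]
]

# Index one side by date so each lookup is a hash access, not a nested scan.
count_by_date = data_rows.to_h

# Keep only dates present on both sides (inner join), newest first.
joined = follower_rows
  .select { |date, _new_f, _deleted_f| count_by_date.key?(date) }
  .map { |date, new_f, deleted_f| [date, new_f, deleted_f, count_by_date[date]] }
  .sort_by { |row| row.first }
  .reverse
```

Rows for 1 January and 4 January are dropped because each date appears on only one side, which is the same effect the date condition has in the SQL.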
Well, when looking again, I would probably rather define a scope for retrieving the data for a given user (a record) that would essentially use the raw SQL you have in your question. I might also define a helper instance method that would call the scope with self, something like:
class Model < ActiveRecord::Base
  scope :tracking_info, ->(user) { ... }

  # Convenience wrapper so the scope can be called on a record
  def tracking_info
    Model.tracking_info(self)
  end
end
Then one can use simply:
a = Model.find(1)
a.tracking_info
# => [[...], [...]]

Related

Rails 5 ActiveRecord optional inclusive where for nested association's attribute

Assuming this simplified schema:
users has_many discount_codes
discount_codes has_many orders
I want to grab all users, and if they happen to have any orders, only include the orders that were created between two dates. But if they don't have orders, or have orders only outside of those two dates, still return the users and do not exclude any users ever.
What I'm doing now:
users = User.all.includes(discount_codes: :orders)
users = users.where("orders.created_at BETWEEN ? AND ?", date1, date2).
or(users.where(orders: { id: nil }))
I believe my OR clause allows me to retain users who do not have any orders whatsoever, but what happens is if I have a user who only has orders outside of date1 and date2, then my query will exclude that user.
For what it's worth, I want to use this orders where clause here specifically so I can avoid n + 1 issues later in determining orders per user.
Thanks in advance!
It doesn't make sense to try and control the orders that are loaded as part of the where clause for users. If you were to control that it'd have to be part of the includes (which I think means it'd have to be a part of the association).
Although technically it can combine them into a single query in some cases, ActiveRecord is going to do this as two queries.
The first query will be executed when you go to iterate over the users and will use that where clause to limit the users found.
It will then run a second query behind the scenes based on that includes statement. This will simply be a query to get all orders which are associated with the users that were found by the previous query. As such the only way to control the orders that are found through the user's where clause is to omit users from the result set.
If I were you I would create an instance method in User model for what you are looking for but instead of using where use a select block:
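# `end` is a reserved word in Ruby, so it cannot be used as a parameter name
def orders_in_timespan(start_time, end_time)
  orders.select { |o| o.created_at.between?(start_time, end_time) }
end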
Because ActiveRecord caches the orders loaded by includes on each user instance, I believe that starting your users query with an includes means this will not result in n queries.
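As a rough illustration of the in-memory filtering, here is a sketch that uses plain Structs in place of the real models. SketchUser, SketchOrder and the sample dates are hypothetical stand-ins; with real records, orders would be the association already loaded by includes.

```ruby
# Hypothetical stand-ins for the User and Order models; no database involved.
SketchOrder = Struct.new(:id, :created_at)
SketchUser = Struct.new(:name, :orders) do
  # Filters the already-loaded orders in memory instead of issuing a query.
  def orders_in_timespan(start_time, end_time)
    orders.select { |o| o.created_at.between?(start_time, end_time) }
  end
end

range_start = Time.utc(2020, 1, 1)
range_end   = Time.utc(2020, 1, 31)

users = [
  SketchUser.new("no orders", []),
  SketchUser.new("only outside", [SketchOrder.new(1, Time.utc(2019, 6, 1))]),
  SketchUser.new("mixed", [SketchOrder.new(2, Time.utc(2020, 1, 15)),
                           SketchOrder.new(3, Time.utc(2021, 1, 1))])
]

# Every user stays in the list; only the per-user order set is narrowed.
counts = users.map { |u| [u.name, u.orders_in_timespan(range_start, range_end).size] }
```

No user is excluded here, including the one whose orders all fall outside the range, which is the case the where clause in the question could not handle.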
Something like:
render json: User.includes(:orders), methods: :orders_in_timespan
Of course, the easiest way to confirm the number of queries is to look at the logs. I believe this approach should have two queries regardless of the number of users being rendered (as likely does your code in the question).
Also, I'm not sure how familiar you are with sql but you can call .to_sql on the end of things such as your users variable in order to see the sql that would be generated which might help shed some light on the discrepancies between what you're getting and what you're looking for.
Option 1: Write a custom query in SQL (ugly).
Option 2: Create 2 separate queries like below...
@users = User.limit(10)
@orders = Order.joins(:discount_code)
  .where(created_at: [10.days.ago..1.day.ago], discount_codes: {user_id: @users.select(:id)})
  .group_by{|order| order.discount_code.user_id}
Now you can use it like this ...
@users.each do |user|
  # Users without matching orders have no key in the hash, so fall back to []
  orders = @orders[user.id] || []
  puts user.name
  puts user.id
  puts orders.count
end
I hope this will solve your problem.
You need to use joins instead of includes. Rails joins use inner joins and will reject all the records which don't have associations.
User.joins(discount_codes: :orders).where(orders: {created_at: [10.days.ago..1.day.ago]}).distinct
This will give you all distinct users who placed orders in a given period of time.
user = User.joins(:discount_codes).joins(:orders).where("orders.created_at BETWEEN ? AND ?", date1, date2) +
User.left_joins(:discount_codes).left_joins(:orders).group("users.id").having("count(orders.id) = 0")

Ordering a collection by instance method

I would like to order a collection first by priority and then due time like this:
@ods = Od.order(:priority, :due_date_time)
The problem is due_date_time is an instance method of Od, so I get
PG::UndefinedColumn: ERROR: column ods.due_date_time does not exist
I have tried the following, but it seems that by sorting and mapping ids, then finding them again with .where means the sort order is lost.
@ods = Od.where(id: (Od.all.sort {|a,b| a.due_date_time <=> b.due_date_time}.map(&:id))).order(:priority)
due_date_time calls a method from a child association:
def due_date_time
run.cut_off_time
end
run.cut_off_time is defined here:
def cut_off_time
(leave_date.beginning_of_day + route.cut_off_time_mins_since_midnight * 60)
end
I'm sure there is an easier way. Any help much appreciated! Thanks.
ActiveRecord's order is similar to Ruby's sort, except that it runs in the database. Od.all.sort iterates in Ruby after the Od.all query has run, map iterates a second time, and where then sends a new database query. Sorting before the where is also pointless, because where(id: ids) selects every record whose id is included in ids; it does not look up one record per id in the order given.
It is easier to do something like this:
Od.all.sort_by { |od| [od.priority, od.due_date_time] }
But that is a slow solution if the ods table has 10k+ records. Prefer storing the value to sort by in a database column. When that is not possible, put the logic that calculates due_date_time into the database query.
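The sort_by call works because Ruby compares arrays element by element. A small sketch with a hypothetical OdRow struct standing in for the Od model (due_date_time is a plain attribute here rather than a method that reaches into an association):

```ruby
# Hypothetical stand-in for Od records.
OdRow = Struct.new(:id, :priority, :due_date_time)

rows = [
  OdRow.new(1, 2, Time.utc(2020, 5, 1, 9)),
  OdRow.new(2, 1, Time.utc(2020, 5, 1, 12)),
  OdRow.new(3, 1, Time.utc(2020, 5, 1, 8)),
  OdRow.new(4, 2, Time.utc(2020, 5, 1, 7))
]

# Arrays compare element by element, so priority decides first and
# due_date_time only breaks ties within the same priority.
sorted_ids = rows.sort_by { |od| [od.priority, od.due_date_time] }.map(&:id)
```

Both priority 1 rows come first, ordered by time, followed by the priority 2 rows.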

How to make ActiveRecord query unique by a column

I have a Company model that has many Disclosures. The Disclosure has columns named title, pdf and pdf_sha256.
class Company < ActiveRecord::Base
has_many :disclosures
end
class Disclosure < ActiveRecord::Base
belongs_to :company
end
I want to make it unique by pdf_sha256 and if pdf_sha256 is nil that should be treated as unique.
If it is an Array, I'll write like this.
companies_with_sha256 = company.disclosures.where.not(pdf_sha256: nil).group_by(&:pdf_sha256).map do |key,values|
values.max_by{|v| v.title.length}
end
companies_without_sha256 = company.disclosures.where(pdf_sha256: nil)
companies = companies_with_sha256 + companies_without_sha256
How can I get the same result by using ActiveRecord query?
It is possible to do it in one query by first getting a different id for each different pdf_sha256 as a subquery, then in the query getting the elements within that set of ids by passing the subquery as follows:
def unique_disclosures_by_pdf_sha256(company)
subquery = company.disclosures.select('MIN(id) as id').group(:pdf_sha256)
company.disclosures.where(id: subquery)
.or(company.disclosures.where(pdf_sha256: nil))
end
The great thing about this is that ActiveRecord relations are lazily evaluated, so the subquery is not run on its own; it is embedded in the main query and the database receives a single query. It will then retrieve all the disclosures unique by pdf_sha256 plus all the ones that have pdf_sha256 set to nil.
In case you are curious, given a company, the resulting query will be something like:
SELECT "disclosures".* FROM "disclosures"
WHERE (
"disclosures"."company_id" = $1 AND "disclosures"."id" IN (
SELECT MIN(id) as id FROM "disclosures" WHERE "disclosures"."company_id" = $2 GROUP BY "disclosures"."pdf_sha256"
)
OR "disclosures"."company_id" = $3 AND "disclosures"."pdf_sha256" IS NULL
)
The great thing about this solution is that the returned value is an ActiveRecord relation, so it won't be loaded until you actually need it. You can also keep chaining queries on it. For example, you can select only the id instead of the whole model and limit the number of results returned by the database:
unique_disclosures_by_pdf_sha256(company).select(:id).limit(10).each { |d| puts d }
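The selection logic of the query above can be mirrored in plain Ruby to see which rows survive. This is only a sketch over a hypothetical DocRow struct with made-up data, not a replacement for the database query:

```ruby
# Hypothetical stand-in for Disclosure rows of one company.
DocRow = Struct.new(:id, :title, :pdf_sha256)

docs = [
  DocRow.new(1, "A", "abc"),
  DocRow.new(2, "A (amended)", "abc"),
  DocRow.new(3, "B", nil),
  DocRow.new(4, "C", nil),
  DocRow.new(5, "D", "def")
]

# Subquery: the smallest id in each pdf_sha256 group (nil forms a group too).
min_ids = docs.group_by(&:pdf_sha256).map { |_sha, group| group.map(&:id).min }

# Main query: rows whose id is in that set, OR rows with no hash at all.
result_ids = docs.select { |d| min_ids.include?(d.id) || d.pdf_sha256.nil? }.map(&:id)
```

Row 2 is dropped as a duplicate of row 1, while both rows without a hash are kept. Note that this keeps the lowest id per hash, not the longest title as in the array version from the question.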
You can achieve this by using the uniq method:
Company.first.disclosures.to_a.uniq(&:pdf_sha256)
This will return the disclosure records unique by the column "pdf_sha256".
Hope this helps you! Cheers
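One caveat with the uniq approach: nil counts as an ordinary key, so records without a pdf_sha256 collapse into one, whereas the question wants each of them treated as unique. A plain-Ruby sketch of the difference, using a hypothetical DiscRow struct and a made-up fallback key:

```ruby
# Hypothetical stand-in for loaded Disclosure records.
DiscRow = Struct.new(:id, :pdf_sha256)

records = [
  DiscRow.new(1, "abc"),
  DiscRow.new(2, "abc"),
  DiscRow.new(3, nil),
  DiscRow.new(4, nil)
]

# Plain uniq treats nil as just another key, so only one nil record survives.
collapsed_ids = records.uniq(&:pdf_sha256).map(&:id)

# Falling back to a per-record key keeps every record that has no hash.
kept_ids = records.uniq { |d| d.pdf_sha256 || [:no_hash, d.id] }.map(&:id)
```

The second form keeps records 3 and 4 while still collapsing the duplicated "abc" hash.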
Assuming you are using Rails 5 you could chain a .or command to merge both your queries.
pdf_sha256_unique_disclosures = company.disclosures.where(pdf_sha256: nil).or(company.disclosures.where.not(pdf_sha256: nil))
Then you can proceed with your group_by logic.
However, in the example above I'm not exactly sure what the objective is, and I am curious to better understand how you would use the resulting companies variable.
If you wanted to have a hash of unique pdf_sha256 keys including nil, and its resultant unique disclosure document you could try the following:
sorted_disclosures = company.disclosures.group_by(&:pdf_sha256).each_with_object({}) do |entries, hash|
hash[entries[0]] = entries[1].max_by{|v| v.title.length}
end
This should give you a hash-like structure similar to the group_by result, where the keys are all your unique pdf_sha256 values and each value is the disclosure with the longest title that matches that pdf_sha256.
Why not:
ids = Disclosure.select(:id, :pdf_sha256).distinct.map(&:id)
Disclosure.find(ids)
The id will be distinct either way since it's the primary key, so all you have to do is map the ids and find the Disclosures by id.
If you need a relation with distinct pdf_sha256, where you require no explicit conditions, you can use group for that -
scope :unique_pdf_sha256, -> { where.not(pdf_sha256: nil).group(:pdf_sha256) }
scope :nil_pdf_sha256, -> { where(pdf_sha256: nil) }
You could have used or, but the relation passed to it must be structurally compatible. So even if you get same type of relations in these two scopes, you cannot use it with or.
Edit: To make them structurally compatible with each other, see @AlexSantos's answer.
Model.select(:rating)
The result of this is a collection of Model objects, not plain ratings, and from uniq's point of view they are all different. You can use this:
Model.select(:rating).map(&:rating).uniq
or this (most efficient):
Model.uniq.pluck(:rating) # older Rails; Relation#uniq was deprecated in Rails 5.0
Model.distinct.pluck(:rating) # Rails 4 and later
Update
Apparently, as of rails 5.0.0.1, it works only on "top level" queries, like above. Doesn't work on collection proxies ("has_many" relations, for example).
Address.distinct.pluck(:city) # => ['Moscow']
user.addresses.distinct.pluck(:city) # => ['Moscow', 'Moscow', 'Moscow']
In this case, deduplicate after the query
user.addresses.pluck(:city).uniq # => ['Moscow']

tricky union query using ruby on rails/active record

I have
a = Profile.last
a.mailbox.inbox
a.mailbox.sentbox
active_conversations = [IDS OF ACTIVE CONVERSATIONS]
a.mailbox.inbox & active_conversations
returns part of what I need
I want
(a.mailbox.inbox & active_conversations) AND a.mailbox.sentbox
but I need it as SQL, so that I can order it efficiently. I want to order it by ('updated_at')
I have tried joins and other things but they don't work. The classes of a.mailbox.inbox and the sentbox are
ActiveRecord::Relation::ActiveRecord_Relation_Conversation
but
(a.mailbox.inbox & active_conversations)
is an array
edit
Something as simple as a.mailbox.inbox JOINS SOMEHOW a.mailbox.sentbox I should be able to work with, but I also can't seem to figure out.
Instead of doing
(a.mailbox.inbox & active_conversations)
you should be able to do
a.mailbox.inbox.where('conversations.id IN (?)', active_conversations)
I believe the Conversation class (and its underlying conversations table) should be right according to the mailboxer code.
However, this gives you an ActiveRecord::Relation object instead of an array. You can transform this to pure SQL using to_sql. So I think something like this should work:
# get the SQL of both statements
inbox_sql = a.mailbox.inbox.where('conversations.id IN (?)', active_conversations).to_sql
sentbox_sql = a.mailbox.sentbox.to_sql
# use both statements in a UNION SQL statement issued on the Conversation class
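# the UNION must be wrapped in parentheses and aliased to act as the FROM source
Conversation.from("(#{inbox_sql} UNION #{sentbox_sql}) AS conversations").order("conversations.updated_at")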
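For comparison, this is roughly what the UNION computes, expressed on in-memory arrays. ConvRow and the sample values are hypothetical; the point is only that UNION removes duplicates the way Array#| does, after which the combined set can be ordered by updated_at.

```ruby
# Hypothetical stand-in for Conversation rows; updated_at is a plain integer here.
ConvRow = Struct.new(:id, :updated_at)

inbox      = [ConvRow.new(1, 10), ConvRow.new(2, 30), ConvRow.new(3, 20)]
sentbox    = [ConvRow.new(3, 20), ConvRow.new(4, 40)]
active_ids = [2, 3]

# Same filter as the IN (?) condition on the inbox.
active_inbox = inbox.select { |c| active_ids.include?(c.id) }

# Array#| drops duplicates like UNION does (Struct equality compares members).
combined_ids = (active_inbox | sentbox).sort_by(&:updated_at).map(&:id)
```

Conversation 3 appears in both boxes but only once in the result. Doing this in SQL instead keeps the result a relation, so ordering and further chaining happen in the database.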

Enforcing uniqueness on a relation based on a single column

I have a class method on my Consumer model that functions as a scope (acts on an ActiveRecord::Relation, returns an ActiveRecord::Relation, can be daisy-chained) and returns doubles of some consumers. I want it to only return unique consumers, and I can't seem to find either a way to do it with Rails helpers or the right SQL to pass in to get what I want. I feel like there's a simple solution here - it's not a complicated problem.
I essentially want Array#uniq, but for ActiveRecord::Relation. I tried a select DISTINCT statement
Consumer.joins(...).where(...).select('DISTINCT consumers.id')
It returned the correct 'uniqueness on consumers.id' property, but it only returned the consumer ids, so that when I eventually loaded the relation, the Consumers didn't have their other attributes.
Without select DISTINCT:
my_consumer = Consumer.joins(...).where(...).first
my_consumer.status # => "active"
With select DISTINCT:
my_consumer = Consumer.joins(...).where(...).select('DISTINCT consumers.id').first
my_consumer.status # => ActiveModel::MissingAttributeError: missing attribute: status
I didn't think that an ActiveRecord::Relation would ever load less than the whole model, but I guess it will with the select statement. When I query the Relation after the select statement for the class it contains, it returns Consumer, which seems strange.
.select('DISTINCT consumers.*')

Resources