How to find records with maximum value (defined in associated model) per day - ruby-on-rails

I would like to fetch all records per day with highest priority (as defined in associated model)
I'm struggling to build this with activerecord (rails 4.2)
The problem is very similar to this one
Get records with max value for each group of grouped SQL results
except that the age would come from the second model
or also this one
with activerecord how can I select records based on the highest value of a field?
Model 1: Workduration:
date, duration
belongs_to :timerule
Model 2: Timerule:
priority
has_many :workdurations
I put together the data as follows (all in Workduration)
def self.withPrio
select("workdurations.*, timerules.prio AS prio").joins(:timerule)
end
I couldn't find the proper way to build the LEFT OUTER JOIN (self-join) on it.
Try-And-Error-Code:
Workduration.withPrio.joins("left join ? workdurations.date = wd2.date and workdurations.prio < wd2.prio", Workduration.withPrio)
Any help is appreciated!

I ended up doing this with (a big) find_by_sql and a second query to keep the scopes chainable:
scope :maxPrioIds, ->{find_by_sql('SELECT o.*
FROM
(SELECT workdurations.*, timerules.prio AS prio FROM "workdurations" INNER JOIN "timerules" ON "timerules"."id" = "workdurations"."timerule_id") o
LEFT JOIN (SELECT workdurations.*, timerules.prio AS prio FROM "workdurations" INNER JOIN "timerules" ON "timerules"."id" = "workdurations"."timerule_id") b
ON o.date = b.date AND o.prio < b.prio
WHERE b.prio is NULL').map(&:id)}
scope :relevant, -> {where(id: Workduration.maxPrioIds)}
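For anyone trying to follow what the self-join actually selects, here is a minimal plain-Ruby sketch of the same anti-join logic. The Row struct and the sample data are made up for illustration; they stand in for workdurations rows already joined to their timerule's prio.

```ruby
# Hypothetical stand-in for a workdurations row joined to timerules.prio.
Row = Struct.new(:id, :date, :prio)

rows = [
  Row.new(1, "2015-03-01", 1),
  Row.new(2, "2015-03-01", 3),
  Row.new(3, "2015-03-02", 2),
  Row.new(4, "2015-03-02", 2)
]

# Keep a row only if no other row on the same date has a higher prio.
# This mirrors "LEFT JOIN ... ON o.date = b.date AND o.prio < b.prio
# WHERE b.prio IS NULL": rows that find no "better" partner survive.
max_prio_ids = rows.reject { |o|
  rows.any? { |b| b.date == o.date && o.prio < b.prio }
}.map(&:id)

p max_prio_ids
```

Note that rows tied for the highest prio on a day are all kept, which is also how the SQL version behaves.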

Related

having in ActiveRecord

I have been trying to find a solution to my problem for a few days, so I am turning towards the community, hopefully I am not missing something obvious here.
I have 2 models in rails:
class Room
has_many :accesses
end
class Access
belongs_to :accessor, polymorphic: true
end
Accessor can be of 2 types: Person or Team
I am trying to find the most efficient way to find the rooms that a user has access to, but which are not accessible from any teams.
I tried:
Room.joins(:accesses).where(accesses: {accessor: Person.find(1234)}).where.not(accesses: {accessor_type: 'Team'})
But that returns the rooms that people have accesses to, it does not filter out the ones that Team AND People have access to.
I am thinking the having clause is the way to go, in which it would count the number of Teams accesses to rooms, and keep the rooms that have 0 team accesses. Though all my attempts are failing.
I would love to hear any advice.
Left join
Instead of using HAVING, which requires us to add a GROUP BY, I'd start with a LEFT JOIN and a WHERE.
You can do this by left-joining to the room_accesses table specifically on "Team" accessor_type. We're left-joining because we're going to scope this join to only team accesses, and select only the rows where no such accesses exist. An inner join would not return these rows at all. We'll need to use a table alias as we're already using the room_accesses table to join to the person you are looking up.
We may as well admit Rails isn't great at this level of query abstraction, so let's just construct the raw SQL fragments for our first solution:
person = Person.find(1234)
person.rooms.joins(
"LEFT JOIN room_accesses team_accesses
ON team_accesses.room_id = rooms.id
AND team_accesses.accessor_type = 'Team'"
).where("team_accesses.id IS NULL")
This generates, for SQLite,
SELECT "rooms".* FROM "rooms"
INNER JOIN "room_accesses"
ON "rooms"."id" = "room_accesses"."room_id"
LEFT JOIN room_accesses team_accesses
ON team_accesses.room_id = rooms.id
AND team_accesses.accessor_type = 'Team'
WHERE "room_accesses"."accessor_id" = 1
AND "room_accesses"."accessor_type" = 'Person'
AND (team_accesses.id IS NULL)
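If it helps to see the intent without SQL, the query above boils down to a set difference. This is only a plain-Ruby sketch with made-up data; the Access struct is a hypothetical stand-in for rows of the access table.

```ruby
# Hypothetical stand-in for rows of the access join table.
Access = Struct.new(:room_id, :accessor_type, :accessor_id)

accesses = [
  Access.new(1, "Person", 1234),
  Access.new(1, "Team", 7),
  Access.new(2, "Person", 1234),
  Access.new(3, "Team", 7)
]

# Rooms the person can access (the INNER JOIN part).
person_room_ids = accesses
  .select { |a| a.accessor_type == "Person" && a.accessor_id == 1234 }
  .map(&:room_id)

# Rooms with at least one team access (the rows the LEFT JOIN would match).
team_room_ids = accesses
  .select { |a| a.accessor_type == "Team" }
  .map(&:room_id)

# "WHERE team_accesses.id IS NULL" keeps rooms with no team match at all.
person_only_room_ids = (person_room_ids - team_room_ids).uniq

p person_only_room_ids
```

Room 1 drops out because a team can also access it, and room 3 never qualifies because the person has no access to it.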
Having
You can do this with a HAVING by similarly joining to room_accesses again with the team_accesses alias, grouping by rooms.id (since we want at most one record per room), and selecting the groups HAVING a zero count of team accesses:
person.rooms.joins(
"LEFT JOIN room_accesses team_accesses
ON team_accesses.room_id = rooms.id
AND team_accesses.accessor_type = 'Team'"
).group("rooms.id").having("COUNT(team_accesses.id) = 0")
generates:
SELECT "rooms".* FROM "rooms"
INNER JOIN "room_accesses"
ON "rooms"."id" = "room_accesses"."room_id"
LEFT JOIN room_accesses team_accesses
ON team_accesses.room_id = rooms.id
AND team_accesses.accessor_type = 'Team'
WHERE "room_accesses"."accessor_id" = 1
AND "room_accesses"."accessor_type" = 'Person'
GROUP BY rooms.id
HAVING (COUNT(team_accesses.id) = 0)
Using associations instead of raw SQL
You can get halfway there in Rails by defining a scoped association:
class Room < ApplicationRecord
has_many :room_accesses
has_many :team_accesses, ->{ where accessor_type: "Team" }, class_name: "RoomAccess"
end
Assuming you're using a recent version of ActiveRecord, this allows you to do
person.rooms.left_joins(:team_accesses)
However, the table alias used for this left join is "team_accesses_rooms", which is predictable in this simple case but, to my knowledge, not part of the public API and subject to change if other joins are used in the same query. Still, if you're feeling daring:
person.rooms.left_joins(:team_accesses).where(team_accesses_rooms: {id: nil})
Frankly I would not recommend this method as you're relying on a table alias that you're not in control of and is not obvious where it comes from. With the raw SQL, you are in control of it and it's obvious where it came from.

Select records all of whose records exist in another join table

In the following book club example with associations:
class User
has_and_belongs_to_many :clubs
has_and_belongs_to_many :books
end
class Club
has_and_belongs_to_many :users
has_and_belongs_to_many :books
end
class Book
has_and_belongs_to_many :users
has_and_belongs_to_many :clubs
end
given a specific club record:
club = Club.find(params[:id])
how can I find all the users in the club who have all books in array of books?
club.users.where_has_all_books(books)
In PostgreSQL it can be done with a single query. (Maybe in MySQL too, I'm just not sure.)
So, some basic assumptions first. 3 tables: clubs, users and books, every table has id as a primary key. 3 join tables, books_clubs, books_users, clubs_users, each table contains pairs of ids (for books_clubs it will be [book_id, club_id]), and those pairs are unique within that table. Quite reasonable conditions IMO.
Building a query:
First, let's get ids of books from given club:
SELECT book_id
FROM books_clubs
WHERE club_id = 1
ORDER BY book_id
Then get users from given club, and group them by user.id:
SELECT CU.user_id
FROM clubs_users CU
JOIN users U ON U.id = CU.user_id
JOIN books_users BU ON BU.user_id = CU.user_id
WHERE CU.club_id = 1
GROUP BY CU.user_id
Join these two queries by adding having to 2nd query:
HAVING array_agg(BU.book_id ORDER BY BU.book_id) @> ARRAY(##1##)
where ##1## is the 1st query.
What's going on here: the array_agg function on the left side creates a sorted list (of array type) of book_ids. These are the user's books. ARRAY(##1##) on the right side returns the sorted list of the club's books. The @> operator checks whether the 1st array contains all elements of the 2nd (i.e. whether the user has all books of the club).
Since the 1st query needs to be performed only once, it can be moved to a WITH clause.
Your complete query:
WITH club_book_ids AS (
SELECT book_id
FROM books_clubs
WHERE club_id = :club_id
ORDER BY book_id
)
SELECT CU.user_id
FROM clubs_users CU
JOIN users U ON U.id = CU.user_id
JOIN books_users BU ON BU.user_id = CU.user_id
WHERE CU.club_id = :club_id
GROUP BY CU.user_id
HAVING array_agg(BU.book_id ORDER BY BU.book_id) @> ARRAY(SELECT * FROM club_book_ids);
It can be verified in this sandbox: https://www.db-fiddle.com/f/cdPtRfT2uSGp4DSDywST92/5
Wrap it in find_by_sql and that's it.
Some notes:
ordering by book_id is not necessary; the @> operator works with unordered arrays too. I just have a suspicion that comparison of ordered arrays is faster.
JOIN users U ON U.id = CU.user_id in 2nd query is only necessary for fetching user properties; in case of fetching user ids only it can be removed
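To make the array containment check in the HAVING clause concrete, here is a rough plain-Ruby equivalent. The ids and the books_by_user hash are invented sample data, standing in for the club's book ids and the per-user aggregated book ids.

```ruby
# Hypothetical sample data: book ids of the club, and book ids per user id
# (what array_agg would produce for each group).
club_book_ids = [10, 11]
books_by_user = {
  1 => [10, 11, 12],
  2 => [10],
  3 => [11, 10]
}

# A user's list "contains" the club's list when nothing is left over after
# removing the user's books from the club's books. Order does not matter.
users_with_all_books = books_by_user
  .select { |_user_id, book_ids| (club_book_ids - book_ids).empty? }
  .keys

p users_with_all_books
```

User 3 qualifies even though the ids are in a different order, which matches the note above that ordering is not required for containment.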
It appears to work by grouping and counting.
club.users.joins(:books).where(books: { id: club.books.pluck(:id) }).group('users.id').having('count(*) = ?', club.books.count)
If anyone knows how to run the query without intermediate queries that would be great and I will accept the answer.
This looks like a situation where you'd make two queries: one to get all the ids you need, the other to perform a WHERE IN.

Increase performance: avoid looking for the right element in a collection

I have this situation.
activity.rb
belongs_to :user
belongs_to :cause
belongs_to :sub_cause
belongs_to :client
def amount
duration / 60.0 * user.hourly_cost_by_year(date.year).amount rescue 0
end
user.rb
has_many :hourly_costs # one hourly_cost for year
has_many :activities
def hourly_cost_by_year(year = Date.today.year)
hourly_costs.find { |hc| hc.year == year }
end
hourly_cost.rb
belongs_to :user
I have a big report where I achieved good performance (the number of SQL queries is fixed) but I think I could do better. The query I use is
activities = Activity.includes(:client, :cause, :sub_cause, user: :hourly_costs)
And this is ok, it's fast, but I think it can be improved because of the hourly_cost_by_year method. I mean, an activity has a date and I can use that date to know which of those hourly costs I should use. Something like this in activity
def self.user_with_single_hourly_cost
joins('LEFT JOIN users u ON u.id = activities.user_id').
joins('LEFT JOIN hourly_costs hc ON hc.user_id = u.id AND hc.year = EXTRACT(year from activities.date)')
end
But I don't know how to integrate this into my query. Whatever I tried did not work. I could use raw SQL but I'm trying to use ActiveRecord. I even thought of using redis to cache every hourly cost by user and year, which could work, but I think this query, with the extract part, should do the best job because I'd have a flat table.
Update: I try to clarify. Whatever query I use in my action at some point I have to do
activities.sum(&:amount)
and that method, you know, is
def amount
duration / 60.0 * user.hourly_cost_by_year(date.year).amount rescue 0
end
And I don't know how to pick the hourly_cost I want directly, without searching through hourly_costs. Is this possible?
You may consider using Arel for this. Arel is the underlying query assembler for rails/activerecord (so no new dependencies) and can be very useful when building complex queries because it offers far more depth than the high level ActiveRecord::QueryMethods.
Obviously with a broader API comes more verbosity (which actually adds quite a bit to the readability) and less syntactical sugar which takes some getting used to but has proven indispensable for me on multiple occasions.
While I did not take the time to recreate your data structure something like this may work for you
activities = Activity.arel_table
users = User.arel_table
hourly_costs = HourlyCost.arel_table
activity_users_hourly_cost = activities
.join(users,Arel::Nodes::OuterJoin)
.on(activities[:user_id].eq(users[:id]))
.join(hourly_costs,Arel::Nodes::OuterJoin)
.on(hourly_costs[:user_id].eq(users[:id])
.and(hourly_costs[:year].eq(Arel::Nodes::Extract.new(activities[:date],'year'))
)
)
Activity.includes(:client, :cause, :sub_cause).joins(activity_users_hourly_cost.join_sources)
This will add the requested join e.g.
activity_users_hourly_cost.to_sql
#=> SELECT
FROM [activities]
LEFT OUTER JOIN [users] ON [activities].[user_id] = [users].[id]
LEFT OUTER JOIN [hourly_costs] ON [hourly_costs].[user_id] = [users].[id]
AND [hourly_costs].[year] = EXTRACT(YEAR FROM [activities].[date])
Update
If you just want to add the "hourly_cost" this should work for you
Activity.includes(:client, :cause, :sub_cause)
.joins(activity_users_hourly_cost.join_sources)
.select("activities.*, activities.duration / 60.0 * ISNULL([hourly_costs].[amount],0) as hourly_cost_by_year")
Please note that this will only return Activity objects but they will now have a method called hourly_cost_by_year which will return the result of that calculation. Full SQL will look like
SELECT
[activities].*,
activities.duration / 60.0 * ISNULL([hourly_costs].[amount],0) as hourly_cost_by_year
FROM [activities]
-- Dependent upon WHERE Clause
LEFT OUTER JOIN causes ON [activities].[cause_id] = [causes].[id]
LEFT OUTER JOIN sub_causes ON [activities].[sub_cause_id] = [sub_causes].[id]
LEFT OUTER JOIN clients ON [activities].[client_id] = [clients].[id]
--
LEFT OUTER JOIN [users] ON [activities].[user_id] = [users].[id]
LEFT OUTER JOIN [hourly_costs] ON [hourly_costs].[user_id] = [users].[id]
AND [hourly_costs].[year] = EXTRACT(YEAR FROM [activities].[date])
You could build the select portion in Arel too if you like but seems overkill for such a simple statement.
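To illustrate what the join on user and year buys you, here is a small plain-Ruby sketch of the same pairing: each activity is matched with at most one hourly cost, and a missing match counts as 0 (the role ISNULL plays in the select above). The structs and numbers are hypothetical, and the year is stored directly on the activity to keep the example short.

```ruby
# Hypothetical stand-ins for the hourly_costs and activities tables.
HourlyCostRow = Struct.new(:user_id, :year, :amount)
ActivityRow   = Struct.new(:user_id, :year, :duration)

hourly_costs = [
  HourlyCostRow.new(1, 2014, 30.0),
  HourlyCostRow.new(1, 2015, 40.0)
]
activities = [
  ActivityRow.new(1, 2015, 90), # 90 minutes, matching cost 40.0
  ActivityRow.new(1, 2013, 60)  # no hourly cost recorded for 2013
]

# Index costs by [user_id, year] so each lookup is a single hash access
# instead of a scan over the user's hourly_costs.
cost_index = hourly_costs.each_with_object({}) do |hc, index|
  index[[hc.user_id, hc.year]] = hc.amount
end

amounts = activities.map do |a|
  rate = cost_index.fetch([a.user_id, a.year], 0) # 0 when nothing matches
  a.duration / 60.0 * rate
end

total = amounts.sum
p amounts, total
```

In the SQL version the database does this matching for you, so the per-activity value arrives already computed on each returned row.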

Exclude object if one of the has_many related entities has the attribute with value x

I came across the problem of excluding records when attribute x of any one of their associated records has the value 'a'.
Example:
class Order < ActiveRecord::Base
has_many :items
end
class Item < ActiveRecord::Base
belongs_to :order
validates_presence_of :status
end
The query should return all Orders that don't have an Item with status = 'paid' (status != 'paid').
Because of the 1:n association an Order can have many Items, and one of the Items can have the status = 'paid'. These Orders must be excluded from the result of my query even if the order has other items with a status different from 'paid'.
How would I solve this problem:
paid_items = Item.where(status: 'paid').pluck(:order_id)
orders_wo_paid = Order.where('id NOT IN (?)', paid_items)
Is there an ActiveRecord solution that solves this problem in one query?
Or are there other ways to solve this question?
I'm not looking for a Ruby solution such as:
Order.select do |order|
!order.items.pluck(:status).include?('paid')
end
thx for ideas and inspirations.
You can do:
Order.where('orders.id NOT IN (?)', Item.where(status: 'paid').select(:order_id))
If you're using Rails 4.x then:
Order.where.not(id: Item.where(status: 'paid').select(:order_id))
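In case the subquery is hard to picture, this is roughly what NOT IN does here, written as plain Ruby over made-up data (the hashes stand in for rows of the items table and the ids for the orders table):

```ruby
# Hypothetical sample rows; order 3 has no items at all.
order_ids = [1, 2, 3]
items = [
  { order_id: 1, status: "paid" },
  { order_id: 1, status: "open" },
  { order_id: 2, status: "open" }
]

# The inner query: order ids that have at least one paid item.
paid_order_ids = items
  .select { |item| item[:status] == "paid" }
  .map { |item| item[:order_id] }
  .uniq

# The outer query: every order whose id is not in that list.
orders_without_paid = order_ids - paid_order_ids

p orders_without_paid
```

Order 1 is excluded even though it also has an open item, and order 3 is kept because it has no items.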
The query you are interested in is the following, but building it with ActiveRecord would be hard and not very readable:
SELECT
orders.*
FROM
orders
LEFT JOIN
order_items ON orders.id = order_items.order_id
GROUP BY
orders.id
HAVING
COUNT(order_items.id) = COUNT(CASE WHEN order_items.status <> 'paid' THEN 1 END)
The HAVING clause keeps an order only when every one of its items has a status other than 'paid'.
Sorry for the SQL indentation, I have no idea what the conventions for it are.
One way (not the best one at all) to do it with Rails (unfortunately writing SQL for the most important parts) would be the following:
Order.group("orders.id").joins("LEFT JOIN order_items ON orders.id = order_items.order_id")
.having("COUNT(order_items.id) = COUNT(CASE WHEN order_items.status <> 'paid' THEN 1 END)")
Of course you can play with AREL to get rid of the hard coded sql, but in my opinion it will not be easier to read.
You can have an example of creating lefts joins in this gist: https://gist.github.com/mildmojo/3724189

Sequel -- How To Construct This Query?

I have a users table, which has a one-to-many relationship with a user_purchases table via the foreign key user_id. That is, each user can make many purchases (or may have none, in which case he will have no entries in the user_purchases table).
user_purchases has only one other field that is of interest here, which is purchase_date.
I am trying to write a Sequel ORM statement that will return a dataset with the following columns:
user_id
date of the user's SECOND purchase, if it exists
So users who have not made at least 2 purchases will not appear in this dataset. What is the best way to write this Sequel statement?
Please note I am looking for a dataset with ALL users returned who have >= 2 purchases
Thanks!
EDIT FOR CLARITY
Here is a similar statement I wrote to get users and their first purchase date (as opposed to 2nd purchase date, which I am asking for help with in the current post):
DB[:users].join(:user_purchases, :user_id => :id)
.select{[:user_id, min(:purchase_date)]}
.group(:user_id)
You don't seem to be worried about the dates, just the counts so
DB[:user_purchases].group_and_count(:user_id).having(:count > 1).all
will return a list of user_ids and counts where the count (of purchases) is >= 2. Something like
[{:count=>2, :user_id=>1}, {:count=>7, :user_id=>2}, {:count=>2, :user_id=>3}, ...]
If you want to get the users with that, the easiest way with Sequel is probably to extract just the list of user_ids and feed that back into another query:
DB[:users].where(:id => DB[:user_purchases].group_and_count(:user_id).
having(:count > 1).all.map{|row| row[:user_id]}).all
Edit:
I felt like there should be a more succinct way and then I saw this answer (from Sequel author Jeremy Evans) to another question using select_group and select_more : https://stackoverflow.com/a/10886982/131226
This should do it without the subselect:
DB[:users].
left_join(:user_purchases, :user_id=>:id).
select_group(:id).
select_more{count(:purchase_date).as(:purchase_count)}.
having(:purchase_count > 1)
It generates this SQL
SELECT `id`, count(`purchase_date`) AS 'purchase_count'
FROM `users` LEFT JOIN `user_purchases`
ON (`user_purchases`.`user_id` = `users`.`id`)
GROUP BY `id` HAVING (`purchase_count` > 1)
Generally, this could be the SQL query that you need:
SELECT u.id, up1.purchase_date FROM users u
LEFT JOIN user_purchases up1 ON u.id = up1.user_id
LEFT JOIN user_purchases up2 ON u.id = up2.user_id AND up2.purchase_date < up1.purchase_date
GROUP BY u.id, up1.purchase_date
HAVING COUNT(up2.purchase_date) = 1;
Try converting that to sequel, if you don't get any better answers.
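As a sanity check on what the result should look like, here is a plain-Ruby sketch that computes the second purchase date per user from in-memory rows. The data is invented and dates are ISO strings (which sort correctly as text); this is not Sequel code, just the expected outcome.

```ruby
# Hypothetical rows of the user_purchases table.
purchases = [
  { user_id: 1, purchase_date: "2012-01-05" },
  { user_id: 1, purchase_date: "2012-01-01" },
  { user_id: 1, purchase_date: "2012-02-01" },
  { user_id: 2, purchase_date: "2012-03-01" },
  { user_id: 3, purchase_date: "2012-04-01" },
  { user_id: 3, purchase_date: "2012-04-09" }
]

# Group by user, drop users with fewer than two purchases, then take the
# second-earliest date for everyone who is left.
second_purchase_dates = purchases
  .group_by { |row| row[:user_id] }
  .select { |_user_id, rows| rows.size >= 2 }
  .map { |user_id, rows| [user_id, rows.map { |r| r[:purchase_date] }.sort[1]] }
  .to_h

p second_purchase_dates
```

User 2 does not appear because there is only one purchase on record, which is the behaviour asked for in the question.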
The date of the user's second purchase would be the second row retrieved if you do an order_by(:purchase_date) as part of your query.
To access that, do a limit(2) to constrain the query to two results, load them with all, then take the [-1] (or last) one. So, if you're not using models and are working with datasets only, and know the user_id you're interested in, your (untested) query would be:
DB[:user_purchases].where(:user_id => user_id).order_by(:user_purchases__purchase_date).limit(2).all[-1]
Here's some output from Sequel's console:
DB[:user_purchases].where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT * FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
Add the appropriate select clause:
.select(:user_id, :purchase_date)
and you should be done:
DB[:user_purchases].select(:user_id, :purchase_date).where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT user_id, purchase_date FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"

Resources