Rails 4 - Getting most sold objects with a relationship in between - ruby-on-rails

I have a Product model with :name and price. I also have a Order model with :amount (of units sold) and belongs_to :product
How could I get a array of the top 5 most sold object, in terms on units sold?
I was thinking of getting something like:
{"Razors"=>4, "Axes"=>2, "Cars"=>1, ...}
P.S.: If possible, how would it be getting the top 5 most sold objects, in terms of income made?

You can pull all needed data with one single SQL query.
You didn't mention database you are using, but here are some examples:
MySQL:
SELECT products.*, SUM(amount) total_amount FROM orders
LEFT JOIN products on orders.product_id = products.id
GROUP BY product_id ORDER BY total_amount DESC LIMIT 5
PostgreSQL:
SELECT products.*, SUM(amount) total_amount FROM orders
LEFT JOIN products on orders.product_id = products.id
GROUP BY products.id, products.name ORDER BY total_amount DESC LIMIT 5
Conctruct the query using Rails or use find_by_sql method to insert raw SQL.

This is a very long winded solution but answers your original question I think.
#Get only products that have a price and name. (may not be required).
products = Product.includes(:orders).where.not(name: nil, price: nil)
#Sets up an empty hash
hash = {}
products.each do |product|
product.orders.each do |order|
hash[product.name] = [order.amount]
end
end
Then you can sort your hash based on the values to sort by the amount of units sold (and also limit to 5)
Hash[hash.sort_by{|k, v| v}.reverse].take(5)
Interested to what others suggest via an ActiveRecord or SQL query though.

Related

Update models based on association record value

I have thousands of price comparators where each of them have many products. The comparator has an attribute :minimum_price which is the minimum price of it's products. What would be the fastest way to update all comparators :minimum_price
Comparator.rb
has_many :products
Product.rb
belongs_to :comparator
Let's imagine the following:
comparator_1 have 3 products with a price of 3, 5, 7
comparator_2 have 2 products with a price of 2, 4
How could I update all comparators :minimum_price in one query ?
Updating all in one query will require the use of a CTE which are not supported by default by ActiveRecord. There are libraries that provide you with tools to use them in Rails (e.g. this) or you can also do it with a direct query like this:
ActiveRecord::Base.connection.execute("
update comparators set minimum_price = min_vals.min_price
from (
select comparators.id as comp_id, min(products.price) as min_price
from comparators inner join products on comparators.id = products.comparator_id
group by comparators.id
) as min_vals
where comparators.id = min_vals.comp_id
")
NOTE: This is a postgresql query, so the syntax may vary slightly if it's a different database.
Try this, but I don't know how you store your minimum values.
Comparator.update_all([
'minimum_price = ?, updated_at = ?',
Product.find(self.product_id).price, Time.now
])

Select records all of whose records exist in another join table

In the following book club example with associations:
class User
has_and_belongs_to_many :clubs
has_and_belongs_to_many :books
end
class Club
has_and_belongs_to_many :users
has_and_belongs_to_many :books
end
class Book
has_and_belongs_to_many :users
has_and_belongs_to_many :clubs
end
given a specific club record:
club = Club.find(params[:id])
how can I find all the users in the club who have all books in array of books?
club.users.where_has_all_books(books)
In PostgreSQL it can be done with a single query. (Maybe in MySQL too, I'm just not sure.)
So, some basic assumptions first. 3 tables: clubs, users and books, every table has id as a primary key. 3 join tables, books_clubs, books_users, clubs_users, each table contains pairs of ids (for books_clubs it will be [book_id, club_id]), and those pairs are unique within that table. Quite reasonable conditions IMO.
Building a query:
First, let's get ids of books from given club:
SELECT book_id
FROM books_clubs
WHERE club_id = 1
ORDER BY book_id
Then get users from given club, and group them by user.id:
SELECT CU.user_id
FROM clubs_users CU
JOIN users U ON U.id = CU.user_id
JOIN books_users BU ON BU.user_id = CU.user_id
WHERE CU.club_id = 1
GROUP BY CU.user_id
Join these two queries by adding having to 2nd query:
HAVING array_agg(BU.book_id ORDER BY BU.book_id) #> ARRAY(##1##)
where ##1## is the 1st query.
What's going on here: Function array_agg from the left part creates a sorted list (of array type) of book_ids. These are books of user. ARRAY(##1##) from the right part returns the sorted list of books of the club. And operator #> checks if 1st array contains all elements of the 2nd (ie if user has all books of the club).
Since 1st query needs to be performed only once, it can be moved to WITH clause.
Your complete query:
WITH club_book_ids AS (
SELECT book_id
FROM books_clubs
WHERE club_id = :club_id
ORDER BY book_id
)
SELECT CU.user_id
FROM clubs_users CU
JOIN users U ON U.id = CU.user_id
JOIN books_users BU ON BU.user_id = CU.user_id
WHERE CU.club_id = :club_id
GROUP BY CU.user_id
HAVING array_agg(BU.book_id ORDER BY BU.book_id) #> ARRAY(SELECT * FROM club_book_ids);
It can be verified in this sandbox: https://www.db-fiddle.com/f/cdPtRfT2uSGp4DSDywST92/5
Wrap it to find_by_sql and that's it.
Some notes:
ordering by book_id is not necessary; #> operator works with unordered arrays too. I just have a suspicion that comparison of ordered array is faster.
JOIN users U ON U.id = CU.user_id in 2nd query is only necessary for fetching user properties; in case of fetching user ids only it can be removed
It appears to work by grouping and counting.
club.users.joins(:books).where(books: { id: club.books.pluck(:id) }).group('users.id').having('count(*) = ?', club.books.count)
If anyone knows how to run the query without intermediate queries that would be great and I will accept the answer.
This looks like a situation where you'd make two queries, one to get all the ids you need, the other select perform a WHERE IN.

How to write sub query in active record?

I have two tables users and posts and they have association of has_many. I want to fetch details of both users and posts in a single query. I'm able to manage the sql query but I don't want to use the raw query in the code (using execute method) as i think it is kind of simple thing and can be written using active record.
Here is the sql query
SELECT a.id, a.name, a.timestamp, b.id, b.user_id, b.title
FROM users a
INNER JOIN (SELECT id, user_id, title, from, to FROM posts) b on b.user_id = a.id
where id IN ( 1, 2, 3);
I think includes does not help here because i'm dealing with large data.
Can any one help me ?
If you just want those specific columns and nothing else then this will work
User.joins(:post)
.where(id: [1,2,3])
.select("users.id, users.name, users.timestamp,
posts.id as post_id, posts.user_id as post_user_id,
posts.title as post_title")
This will return an ActiveRecord::Relation of User objects with virtual attributes for post_id, post_user_id (Not sure why you need this one since you already selected users.id), and post_title.
The query produced will be
SELECT users.id,
users.name,
users.timestamp,
posts.id as post_id,
posts.user_id as post_user_id,
posts.title as post_title
FROM users
INNER JOIN posts on posts.user_id = users.id
where users.id IN ( 1, 2, 3);
Please note you may have multiple User objects, one for each Post, just as the SQL query does.
You can execute your exact query using the string version of joins e.g.
User.joins("INNER JOIN (SELECT id, user_id, title, from, to FROM posts) b on b.user_id = users.id")
.where(id: [1,2,3])
.select("users.id, users.name, users.timestamp,
b.id as post_id, b.user_id as post_user_id,
b.title as post_title")
Additionally to avoid some of the overhead you can use arel instead e.g.
users_table = User.arel_table
posts_table = Post.arel_table
query = users_table.project(Arel.star)
.join(posts_table)
.on(posts_table[:user_id].eq(users_table[:id]))
.where(users_table[:id].in([1,2,3]))
ActiveRecord::Base.connection.exec_query(query.to_sql)
This will return an ActiveRecord::Result with 2 useful methods columns (the columns selected) and rows. You can convert this to a Hash(#to_hash) but note that any columns with duplicate names (id for instance) will overwrite one another.
You could fix this by specifying the colums you want selected in the project portion. e.g. your current query would be:
query = users_table.project(
users_table[:id],
users_table[:name],
users_table[:timestamp],
posts_table[:id].as('post_id'),
posts_table[:user_id].as('post_user_id'),
posts_table[:title].as('post_title')
).join(posts_table)
.on(posts_table[:user_id].eq(users_table[:id]))
.where(users_table[:id].in([1,2,3]))
ActiveRecord::Base.connection.exec_query(query.to_sql).to_hash
Since none of the names collide now it can be structured into a nice Hash where the keys are the column names and the values or the row value for that record.
users = User.joins(:posts).includes(:posts).where(id: [1, 2, 3])
Will give you all the users with theirs posts.
then you can do whatever you want with them, but to access posts data for first retrieved user
first_user_posts = users.first.posts # this will not make additional DB queries as you used includes and data is already added
We use joins to have INNER JOIN statement in the SQL
We use includes to load all posts in the memory
I have two tables users and posts and they have association of
has_many. I want to fetch details of both users and posts in a single
query.
can be done with includes like
users = User.includes(:posts).where({posts: {user_id: [1,2,3]}})
other is eager_load and preload you can use as per your requirements, for more https://blog.arkency.com/2013/12/rails4-preloading/

Sequel -- How To Construct This Query?

I have a users table, which has a one-to-many relationship with a user_purchases table via the foreign key user_id. That is, each user can make many purchases (or may have none, in which case he will have no entries in the user_purchases table).
user_purchases has only one other field that is of interest here, which is purchase_date.
I am trying to write a Sequel ORM statement that will return a dataset with the following columns:
user_id
date of the users SECOND purchase, if it exists
So users who have not made at least 2 purchases will not appear in this dataset. What is the best way to write this Sequel statement?
Please note I am looking for a dataset with ALL users returned who have >= 2 purchases
Thanks!
EDIT FOR CLARITY
Here is a similar statement I wrote to get users and their first purchase date (as opposed to 2nd purchase date, which I am asking for help with in the current post):
DB[:users].join(:user_purchases, :user_id => :id)
.select{[:user_id, min(:purchase_date)]}
.group(:user_id)
You don't seem to be worried about the dates, just the counts so
DB[:user_purchases].group_and_count(:user_id).having(:count > 1).all
will return a list of user_ids and counts where the count (of purchases) is >= 2. Something like
[{:count=>2, :user_id=>1}, {:count=>7, :user_id=>2}, {:count=>2, :user_id=>3}, ...]
If you want to get the users with that, the easiest way with Sequel is probably to extract just the list of user_ids and feed that back into another query:
DB[:users].where(:id => DB[:user_purchases].group_and_count(:user_id).
having(:count > 1).all.map{|row| row[:user_id]}).all
Edit:
I felt like there should be a more succinct way and then I saw this answer (from Sequel author Jeremy Evans) to another question using select_group and select_more : https://stackoverflow.com/a/10886982/131226
This should do it without the subselect:
DB[:users].
left_join(:user_purchases, :user_id=>:id).
select_group(:id).
select_more{count(:purchase_date).as(:purchase_count)}.
having(:purchase_count > 1)
It generates this SQL
SELECT `id`, count(`purchase_date`) AS 'purchase_count'
FROM `users` LEFT JOIN `user_purchases`
ON (`user_purchases`.`user_id` = `users`.`id`)
GROUP BY `id` HAVING (`purchase_count` > 1)"
Generally, this could be the SQL query that you need:
SELECT u.id, up1.purchase_date FROM users u
LEFT JOIN user_purchases up1 ON u.id = up1.user_id
LEFT JOIN user_purchases up2 ON u.id = up2.user_id AND up2.purchase_date < up1.purchase_date
GROUP BY u.id, up1.purchase_date
HAVING COUNT(up2.purchase_date) = 1;
Try converting that to sequel, if you don't get any better answers.
The date of the user's second purchase would be the second row retrieved if you do an order_by(:purchase_date) as part of your query.
To access that, do a limit(2) to constrain the query to two results then take the [-1] (or last) one. So, if you're not using models and are working with datasets only, and know the user_id you're interested in, your (untested) query would be:
DB[:user_purchases].where(:user_id => user_id).order_by(:user_purchases__purchase_date).limit(2)[-1]
Here's some output from Sequel's console:
DB[:user_purchases].where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT * FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
Add the appropriate select clause:
.select(:user_id, :purchase_date)
and you should be done:
DB[:user_purchases].select(:user_id, :purchase_date).where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT user_id, purchase_date FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"

Activerecord opitimization - best way to query all at once?

I am trying to achieve by reducing the numbers of queries using ActiveRecord 3.0.9. I generated about 'dummy' 200K customers and 500K orders.
Here's Models:
class Customer < ActiveRecord::Base
has_many :orders
end
class Orders < ActiveRecord::Base
belongs_to :customer
has_many :products
end
class Product < ActiveRecord::Base
belongs_to :order
end
when you are using this code in the controller:
#customers = Customer.where(:active => true).paginate(page => params[:page], :per_page => 100)
# SELECT * FROM customers ...
and use this in the view (I removed HAML codes for easier to read):
#order = #customers.each do |customer|
customer.orders.each do |order| # SELECT * FROM orders ...
%td= order.products.count # SELECT COUNT(*) FROM products ...
%td= order.products.sum(:amount) # SELECT SUM(*) FROM products ...
end
end
However, the page is rendered the table with 100 rows per page. The problem is that it kinda slow to load because its firing about 3-5 queries per customer's orders. thats about 300 queries to load the page.
There's alternative way to reduce the number of queries and load the page faster?
Notes:
1) I have attempted to use the includes(:orders), but it included more than 200,000 order_ids. that's issue.
2) they are already indexed.
If you're only using COUNT and SUM(amount) then what you really need is to retrieve only that information and not the orders themselves. This is easily done with SQL:
SELECT customer_id, order_id, COUNT(id) AS order_count, SUM(amount) AS order_total FROM orders LEFT JOIN products ON orders.id=products.order_id GROUP BY orders.customer_id, products.order_id
You can wrap this in a method that returns a nice, orderly hash by re-mapping the SQL results into a structure that fits your requirements:
class Order < ActiveRecord::Base
def self.totals
query = "..." # Query from above
result = { }
self.connection.select_rows(query).each do |row|
# Build out an array for each unique customer_id in the results
customer_set = result[row[0].to_i] ||= [ ]
# Add a hash representing each order to this customer order set
customer_set << { :order_id => row[1].to_i, :count => row[2].to_i, :total => row[3].to_i } ]
end
result
end
end
This means you can fetch all order counts and totals in a single pass. If you have an index on customer_id, which is imperative in this case, then the query will usually be really fast even for large numbers of rows.
You can save the results of this method into a variable such as #order_totals and reference it when rendering your table:
- #order = #customers.each do |customer|
- #order_totals[customer.id].each do |order|
%td= order[:count]
%td= order[:total]
You can try something like this (yes, it looks ugly, but you want performance):
orders = Order.find_by_sql([<<-EOD, customer.id])
SELECT os.id, os.name, COUNT(ps.amount) AS count, SUM(ps.amount) AS amount
FROM orders os
JOIN products ps ON ps.order_id = os.id
WHERE os.customer_id = ? GROUP BY os.id, os.name
EOD
%td= orders.name
%td= orders.count
%td= orders.amount
Added: Probably it is better to create count and amount cache in Orders, but you will have to maintain it (count can be counter-cache, but I doubt there is a ready recipe for amount).
You can join the tables in with Arel (I prefer to avoid writing raw sql when possible). I believe that for your example you would do something like:
Customer.joins(:orders -> products).select("id, name, count(products.id) as count, sum(product.amount) as total_amount")
The first method--
Customer.joins(:orders -> products)
--pulls in the nested association in one statement. Then the second part--
.select("id, name, count(products.id) as count, sum(product.amount) as total_amount")
--specifies exactly what columns you want back.
Chain those and I believe you'll get a list of Customer instances only populated with what you've specified in the select method. You have to be careful though because you now have in hand read only objects that are possibly in in invalid state.
As with all the Arel methods what you get from those methods is an ActiveRecord::Relation instance. It's only when you start to access that data that it goes out and executes the SQL.
I have some basic nervousness that my syntax is incorrect but I'm confident that this can be done w/o relying on executing raw SQL.

Resources