How do I calculate the most popular combination of order lines? (or any similar order/order lines db arrangement) - ruby-on-rails

I'm using Ruby on Rails. I have a couple of models which fit the normal order/order lines arrangement, i.e.
class Order < ActiveRecord::Base
  has_many :order_lines
end
class OrderLine < ActiveRecord::Base
  belongs_to :order
  belongs_to :product
end
class Product < ActiveRecord::Base
  has_many :order_lines
end
(greatly simplified from my real model!)
It's fairly straightforward to work out the most popular individual products via order lines, but what magical Ruby-fu could I use to calculate the most popular combination(s) of products ordered?
Cheers,
Graeme

My suggestion is to create an array a of Product.id numbers for each order and then do the equivalent of
h = Hash.new(0)
# for each a
h[a.sort.hash] += 1
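A minimal plain-Ruby sketch of that counting step, assuming each order has already been reduced to an array of product ids (the sample data below is made up). It keys the Hash by the sorted array itself rather than by a.sort.hash, so the combination stays readable and two different combinations can never share a key through an Integer hash collision.

```ruby
# Each inner array holds the product ids of one order (made-up sample data).
orders = [
  [3, 1, 2],
  [1, 2, 3],
  [2, 5],
  [5, 2],
  [1, 2, 3, 4]
]

# Sort each order's ids so [3, 1, 2] and [1, 2, 3] count as the same
# combination, then use the sorted array itself as the Hash key.
combo_counts = Hash.new(0)
orders.each { |product_ids| combo_counts[product_ids.sort] += 1 }

combo_counts[[1, 2, 3]]    # => 2
combo_counts[[2, 5]]       # => 2
combo_counts[[1, 2, 3, 4]] # => 1
```

On Ruby 2.7 or later, orders.map(&:sort).tally builds the same counts in one call.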
You will naturally need to consider the scale of your operation and how much you are willing to approximate the results.
External Solution
Create a "Combination" model and index the table by the hash, then each order could increment a counter field. Another field would record exactly which combination that hash value referred to.
In-memory Solution
Look at the last 100 orders and recompute the combination popularity in memory when you need it. Sorting the counts by value (e.g. with Hash#sort_by, since Hash#sort orders by key) gives you a list ranked by popularity. You could either make a composite object that remembers which product combination was being counted, or just scan the original data looking for the hash value.
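A sketch of the in-memory variant under the same assumptions (made-up history, product ids per order, oldest first). The hypothetical top_combinations helper only looks at the last window orders and ranks by count; because the sorted id array is the key, the winning combination is known directly, with no need to scan the original data afterwards.

```ruby
# Made-up order history, oldest first; each entry is the product ids of one order.
history = [[1, 2], [2, 1], [4], [1, 2], [3, 4], [4], [2, 1, 3]]

# Rank the combinations in the most recent `window` orders and return the
# `n` most popular as [combination, count] pairs, most popular first.
def top_combinations(history, window: 100, n: 3)
  counts = Hash.new(0)
  history.last(window).each { |ids| counts[ids.sort] += 1 }
  # Hash#sort would order by key; sort by descending count instead,
  # breaking ties by the combination so the result is deterministic.
  counts.sort_by { |combo, count| [-count, combo] }.first(n)
end

top_combinations(history, window: 6, n: 2) # => [[[1, 2], 2], [[4], 2]]
```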

Thanks for the tip, DigitalRoss. I followed the external solution idea and did the following. It varies slightly from the suggestion: it keeps a record of each individual order combo rather than storing a counter, so it's possible to query by date as well, e.g. the 10 most popular orders in the last week.
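I created a method in my order which converts the order's products to a comma-separated string of sorted product ids (order line ids are unique to each order, so using them would mean no two combos ever match).
def to_s
  # Sort the product ids so the same set of products always yields the same string.
  order_lines.map(&:product_id).sort.join(",")
end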
I then added a callback so the combo is created every time an order is placed.
after_create :create_order_combo
def create_order_combo
  # after_create rather than after_save, so editing an order does not record its combo again
  OrderCombo.create(:user => user, :combo => to_s)
end
And finally my OrderCombo class looks something like below. I've also included a cached version of the method.
class OrderCombo < ActiveRecord::Base
  belongs_to :user
  scope :by_user, lambda { |user| where(:user_id => user.id) }
  # Returns [combo, count] pairs for the user's `count` most frequent combos.
  def self.top_n_orders_by_user(user, count = 10)
    by_user(user).group(:combo).count.sort_by { |_combo, n| -n }.first(count)
  end
  def self.cached_top_orders_by_user(user, count = 10)
    # Rails.cache.fetch expects :expires_in; :expiry is not a recognized option
    Rails.cache.fetch("order_combo_#{user.id}_#{count}", :expires_in => 10.minutes) { top_n_orders_by_user(user, count) }
  end
end
It's not perfect as it doesn't take into account increased popularity when someone orders more of one item in an order.
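Keeping one row per order, rather than a counter, is what makes date filters possible. The sketch below mimics that query in plain Ruby with a Struct standing in for the OrderCombo table; OrderComboRow, top_combos and the sample rows are hypothetical. In Rails the equivalent would be roughly OrderCombo.by_user(user).where("created_at >= ?", 1.week.ago).group(:combo).count followed by the same sort and slice.

```ruby
# Stand-in for OrderCombo rows (hypothetical data): the comma-separated
# product-id string produced by Order#to_s plus a timestamp.
OrderComboRow = Struct.new(:user_id, :combo, :created_at)

DAY = 24 * 60 * 60
now = Time.utc(2024, 1, 15)
rows = [
  OrderComboRow.new(1, "1,2", now - 1 * DAY),
  OrderComboRow.new(1, "1,2", now - 2 * DAY),
  OrderComboRow.new(1, "3",   now - 3 * DAY),
  OrderComboRow.new(1, "3",   now - 20 * DAY), # older than a week
  OrderComboRow.new(1, "3",   now - 30 * DAY), # older than a week
  OrderComboRow.new(2, "1,2", now - 1 * DAY)   # different user
]

# Filter by user and date, group by combo, count, then keep the `limit`
# most frequent combos (ties broken by the combo string).
def top_combos(rows, user_id:, since:, limit: 10)
  rows
    .select { |r| r.user_id == user_id && r.created_at >= since }
    .group_by(&:combo)
    .map { |combo, matching| [combo, matching.size] }
    .sort_by { |combo, count| [-count, combo] }
    .first(limit)
end

top_combos(rows, user_id: 1, since: now - 7 * DAY)  # => [["1,2", 2], ["3", 1]]
top_combos(rows, user_id: 1, since: now - 60 * DAY) # => [["3", 3], ["1,2", 2]]
```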

Related

Include scalar on active record relation?

Starting with two tables
class Single < ActiveRecord::Base
  has_many :manies
  def total_value
    manies.sum(:value)
  end
end
class Many < ActiveRecord::Base
  belongs_to :single
  # also has an integer 'value'
end
singles = Single.all
singles.each { |s| s.total_value } # Makes a new SQL call for each single
I need to list many singles, and for each single I need to total up all of the values for that single. If I just call total_value it ends up with N+1 SQL calls.
It is pretty trivial to do this in a single raw SQL query, but I can't figure out how to tell active record to do it.
Grouping kind of works?
Many.group(:single).sum(:value)
=> {#<Single id: "d80b4132-7ef1-4fe1-a9e0-7da89c00d295">=>0.171322e4}
But it returns a Hash, not an ActiveRecord relation. I'm not sure how that will behave when I have thousands of singles, but it doesn't seem good.
Plus, now I want to add a second scalar on value2. That's still fairly trivial in SQL.
What I would prefer to do is use includes.. but I believe it only works on associations, and I can't figure out how to make has_one work on a scalar.
For example, something like this:
class Single < ActiveRecord::Base
  has_many :manies
  # Desired API, not real Rails: has_one cannot wrap an aggregate like SUM
  has_one :total_value, -> { manies.sum(:value) }
  has_one :total_value2, -> { manies.sum(:value2) }
end
class Many < ActiveRecord::Base
  belongs_to :single
  # also has integer 'value' and 'value2' columns
end
singles = Single.all.includes(:total_value, :total_value2)
singles.each { |s| [s.total_value, s.total_value2] } # already cached, does not make SQL call
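The thread shows no answer here, but the usual way out of the N+1 is to compute all totals with one grouped query (for example Many.group(:single_id).sum(:value), which returns a Hash keyed by single_id) and then look each single up in that Hash; a Hash lookup per single is cheap even for thousands of rows. The plain-Ruby sketch below, with hypothetical SingleRow/ManyRow structs and made-up data, shows what such a single pass computes for both value and value2.

```ruby
# Stand-ins for the two tables (hypothetical data).
SingleRow = Struct.new(:id)
ManyRow   = Struct.new(:single_id, :value, :value2)

singles = [SingleRow.new(1), SingleRow.new(2), SingleRow.new(3)]
manies = [
  ManyRow.new(1, 10, 1),
  ManyRow.new(1, 5, 2),
  ManyRow.new(2, 7, 3)
]

# One pass over the child rows builds {single_id => [sum(value), sum(value2)]},
# the same shape a single GROUP BY single_id query would return.
totals = Hash.new { |h, k| h[k] = [0, 0] }
manies.each do |m|
  totals[m.single_id][0] += m.value
  totals[m.single_id][1] += m.value2
end

# Attach totals to each parent without further queries; parents with no
# children fall back to zero (fetch does not trigger the default block).
report = singles.map { |s| [s.id, *totals.fetch(s.id, [0, 0])] }
# => [[1, 15, 3], [2, 7, 3], [3, 0, 0]]
```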

Does splitting up an active record query over 2 methods hit the database twice?

I have a database query where I want to get an array of Users that are distinct for the set:
# @range is a predefined date range
# @shift_list is a list of filtered shifts
def listing
  Shift
    .where(date: @range, shiftname: @shift_list)
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
and I read somewhere that for readability, isolation for testing, or code reuse, you could split this into separate methods:
def listing
  shift_list
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
def shift_list
  Shift
    .where(date: @range, shiftname: @shift_list)
end
So I rewrote this and some other code, and now the page takes 4 times as long to load.
My question is, does this type of method splitting cause the database to be hit twice? Or is it something that I did elsewhere?
And I'd love a suggestion to improve the efficiency of this code.
Further to the need to remove mapping from the code, this shift list is being created with the following code:
def _month_shift_list
  Shift
    .select(:shiftname)
    .distinct
    .where(date: @range)
    .map { |x| x.shiftname }
end
My intention is to create an array of shiftnames as strings.
I am obviously missing some key understanding of database access, as this method is clearly creating part of the problem.
And I think I have found the solution to this with the following:
def month_shift_list
  Shift
    .where(date: @range)
    .pluck(:shiftname) # plain strings, no Shift objects instantiated
    .uniq
end
Nope, the database will not be hit twice. The queries in both methods are lazily evaluated: a relation only runs when its records are actually needed (by map, each, to_a and so on), so splitting the chain across methods does not add queries. The slow page load comes from the map call, which does one User.find per row, and each find is a separate SELECT against the DB. You can re-write your query to this:
def listing
  User.
    joins(:shift).
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
This makes just one query to the DB, so it will be much faster, and it should produce the same result as above.
The assumption here is that there is a has_one/has_many relationship on the User model for Shifts (with has_many :shifts, join with joins(:shifts)).
class User < ActiveRecord::Base
  has_one :shift
end
If you don't want to establish the has_one/has_many relationship on User, you can re-write it to:
def listing
  User.
    joins("INNER JOIN shifts on shifts.user_id = users.id").
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
ALTERNATIVE:
You can use 2 queries if you experience issues with using ActiveRecord#merge.
def listing
  user_ids = Shift.where(date: @range, shiftname: @shift_list).distinct.pluck(:user_id).sort
  User.find(user_ids)
end
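To make the claims above concrete, here is a toy model in plain Ruby (FakeDB, LazyRelation and the sample rows are hypothetical, not ActiveRecord). Building the relation in a separate method runs nothing; the query runs once when the records are loaded. The cost comes from fetching users one at a time inside map, compared with one batched lookup.

```ruby
# Toy "database" that counts the queries it runs (hypothetical, not ActiveRecord).
class FakeDB
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  # Every call counts as one round trip to the "database".
  def run(&filter)
    @queries += 1
    @rows.select(&filter)
  end
end

# Minimal lazy relation: where only records a condition and returns a new
# relation; nothing runs until to_a, much like ActiveRecord::Relation.
class LazyRelation
  def initialize(db, conditions = [])
    @db = db
    @conditions = conditions
  end

  def where(&condition)
    LazyRelation.new(@db, @conditions + [condition])
  end

  def to_a
    @db.run { |row| @conditions.all? { |c| c.call(row) } }
  end
end

shifts_db = FakeDB.new([
  { user_id: 1, shiftname: "early" },
  { user_id: 2, shiftname: "late" },
  { user_id: 1, shiftname: "late" },
  { user_id: 3, shiftname: "night" }
])
users_db = FakeDB.new([{ id: 1 }, { id: 2 }, { id: 3 }])

# Split into its own method, like shift_list in the question.
def shift_list(db)
  LazyRelation.new(db).where { |s| %w[early late].include?(s[:shiftname]) }
end

relation = shift_list(shifts_db)
queries_before_load = shifts_db.queries               # 0: nothing has run yet
user_ids = relation.to_a.map { |s| s[:user_id] }.uniq # one shifts query

# N+1 style, like .map { |id| User.find(id.user_id) }: one query per user.
n_plus_one = user_ids.map { |id| users_db.run { |u| u[:id] == id }.first }
n_plus_one_queries = users_db.queries

# Batched, like User.where(id: user_ids): one query for all users.
batched = users_db.run { |u| user_ids.include?(u[:id]) }
batched_queries = users_db.queries - n_plus_one_queries
```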

Ruby/rails sort_by not ordering properly?

I have a long array of Photo model objects, and I want to sort them by created_at, newest first, then get a new array with the first 21 photos.
My problem is that the final array is not ordered properly.
Here is my code:
@recent_photos = photos.sort_by(&:created_at).reverse.first(21)
When I print out @recent_photos, the created_at values are ordered like this:
1458948707
1458943713
1458947042
1458945171
...
What is the correct way to sort objects?
UPDATE:
Here's how the initial list is compiled:
photos = @user.photos
@following = @user.following
@following.each do |f|
  photos += f.photos if f.id != @user.id
end
@user.memberships.each do |group|
  photos += group.photos
end
SOLUTION:
The problem was with the question: I wanted to sort by timestamp, not created_at, and the values in the output were timestamp values.
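A small plain-Ruby sketch of the mix-up, using a hypothetical PhotoRow struct whose timestamp values are the ones printed above and whose created_at values are made up. Sorting by created_at works correctly, but the printed timestamps then look shuffled; sorting by the displayed field fixes it.

```ruby
PhotoRow = Struct.new(:id, :created_at, :timestamp)

photos = [
  PhotoRow.new(1, Time.at(1_458_940_000), 1_458_948_707),
  PhotoRow.new(2, Time.at(1_458_941_000), 1_458_943_713),
  PhotoRow.new(3, Time.at(1_458_942_000), 1_458_947_042),
  PhotoRow.new(4, Time.at(1_458_943_000), 1_458_945_171)
]

# Newest first by created_at: correct for created_at, but the
# timestamps of the selected photos are not in order.
by_created = photos.sort_by(&:created_at).reverse.first(3).map(&:id)   # => [4, 3, 2]

# Newest first by the field actually being displayed.
by_timestamp = photos.sort_by(&:timestamp).reverse.first(3).map(&:id) # => [1, 3, 4]

# max_by(n) returns the n largest elements, largest first.
top = photos.max_by(3, &:timestamp).map(&:id)                         # => [1, 3, 4]
```

Enumerable#max_by(n) gives the top n without an explicit reverse.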
You can crunch it all down into a single query:
@recent_photos = Photo.where(
  user_id: @user.following_ids
).order('created_at DESC').limit(21)
You really do not want to run a separate query for each of these, because it gets slower and slower as a person follows more people. If they follow 10,000 people, that's 10,000 queries for a single page.
If you add a :through definition to your model you may even be able to query the photos directly:
has_many :follower_photos,
  through: :followers,
  source: :photos
Whatever your constraints are, boil them down to something you can query in one shot whenever possible. If that's not practical, get it down to a predictable number of queries, never N.
Try:
@recent_photos = Photo.order('created_at desc').first(21)

ActiveRecord query array intersection?

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select { |x| (x.tags & Article::EXPERT_TAGS).any? }.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so it is actually using a Mongo wrapper such as MongoMapper or Mongoid, not ActiveRecord. This was an error on my part, sorry! It throws off the analysis of this question. Thanks, PinnyM, for pointing out the error!
Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread How to check if an array field is a part of another array in MongoDB?
Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope for improvement, since you need to get all the data into Ruby for processing.
However, there is one problem with your database query:
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
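For completeness, a plain-Ruby sketch of the counting step once only the tags are fetched (the tag data and EXPERT_TAGS values are made up). Note that tags & EXPERT_TAGS returns an Array, and even an empty Array is truthy in Ruby, so using it directly as a select condition matches every article. If the tags live in MongoDB, the $in operator on an array field matches documents where any element is in the given list, which would let the database do this count; check your wrapper's documentation for how it spells that query.

```ruby
require "set"

# Hypothetical expert tags and made-up tag arrays, standing in for
# Article::EXPERT_TAGS and Article.where(status: 'Finished').pluck(:tags).
EXPERT_TAGS = Set["Guest Writer", "Interview"]
finished_tags = [
  ["Guest Writer", "News Article", "Press Release"],
  ["News Article"],
  ["Interview"],
  []
]

# Pitfall: tags & list is an Array, and even [] is truthy, so every row counts.
naive_count = finished_tags.count { |tags| tags & EXPERT_TAGS.to_a } # => 4

# Count only rows that share at least one tag with the expert list.
expert_count = finished_tags.count { |tags| tags.any? { |t| EXPERT_TAGS.include?(t) } } # => 2
```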
I answered a question regarding general intersection-like queries in ActiveRecord here.
Extracted below:
The following is a general approach I use for constructing intersection-like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person
  def self.with_types(*types)
    where(service_type: types)
  end
end
class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end
class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people
  has_many :services
  def self.with_cities(cities)
    where(city_id: cities)
  end
  # intersection-like query: one id subquery per type, AND'ed together
  # (all replaces Rails 3's scoped as the starting relation)
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types(t)).select(:id)
    }.reduce(all) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach, based on any conditions, joins, etc., as long as each subquery returns the id of a matching person in its result set.
The subquery conditions are AND'ed together, restricting the matching set to the intersection of all of the subqueries.
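The set logic behind those AND'ed subqueries, sketched in plain Ruby with made-up [person_id, service_type] rows: each subquery yields a set of person ids, and AND-ing the conditions amounts to intersecting those sets. The helper names are hypothetical.

```ruby
require "set"

# Made-up join rows: [person_id, service_type].
PERSON_SERVICES = [
  [1, 1], [1, 2], [1, 3],
  [2, 1],
  [3, 2], [3, 1],
  [4, 3]
]

# One "subquery" per type: the set of person ids offering that type.
def ids_with_type(rows, type)
  rows.select { |_, t| t == type }.map(&:first).to_set
end

# AND-ing the subqueries equals intersecting their id sets, which is what
# the reduce over scope.where(id: subquery) builds up in SQL.
def ids_with_all_types(rows, *types)
  types.map { |t| ids_with_type(rows, t) }.reduce { |acc, ids| acc & ids }
end

ids_with_all_types(PERSON_SERVICES, 1, 2).to_a.sort    # => [1, 3]
ids_with_all_types(PERSON_SERVICES, 1, 2, 3).to_a.sort # => [1]
```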

Paginate through a randomized list of blog posts using will_paginate

I want to give users the ability to page through my blog posts in random order.
I can't implement it like this:
@posts = Post.paginate :page => params[:page], :order => 'RANDOM()'
since RANDOM() is re-evaluated on every query, i.e. for every page, and therefore I risk repeating blog posts.
What's the best way to do this?
RAND accepts a seed in MySQL:
RAND(N)
From the MySQL docs:
RAND(), RAND(N)
Returns a random floating-point value v in the range 0 <= v < 1.0. If a constant integer argument N is specified, it is used as the seed value, which produces a repeatable sequence of column values. In the following example, note that the sequences of values produced by RAND(3) is the same both places where it occurs.
Other databases should have similar functionality.
If you use the SAME seed each time you call RAND, the order will be consistent across requests and you can paginate accordingly.
You can then store the seed in the user's session, so each user sees their own ordering, consistently across pages.
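A plain-Ruby analogue of the seeded approach (it uses Ruby's Random, not the database's RAND, and the ids and page size are made up): the same seed always produces the same permutation, so each request can re-derive the order and slice out its page without repeats.

```ruby
# Hypothetical post ids and page size.
POST_IDS = (1..10).to_a
PER_PAGE = 4

# The same seed always yields the same permutation, so every page request
# slices the same ordering and no post repeats across pages.
def random_page(seed, page)
  order = POST_IDS.shuffle(random: Random.new(seed))
  order.each_slice(PER_PAGE).to_a.fetch(page - 1, [])
end

seed = 42 # e.g. generated once and kept in the user's session
pages = (1..3).map { |p| random_page(seed, p) }
```

In a controller the seed would be generated once, e.g. session[:random_seed] ||= rand(1_000_000) (a hypothetical key), and reused on every page request.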
To avoid each page (generated from a new request) potentially repeating a post, you'll need to store the order of posts somewhere for retrieval across multiple requests.
If you want each user to have a unique random order, save the order in a session array of IDs.
If you don't mind all users having the same random order, add a position column to the posts table.
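A sketch of the session-array approach, with a plain Hash standing in for the Rails session (the :post_order key and page_ids helper are hypothetical).

```ruby
# `session` stands in for the Rails session hash; all_ids would come from the posts table.
def page_ids(session, all_ids, page, per_page: 5)
  # Shuffle once per user, then reuse the stored order on later requests.
  session[:post_order] ||= all_ids.shuffle
  session[:post_order].each_slice(per_page).to_a.fetch(page - 1, [])
end

session = {}
ids = (1..12).to_a
first  = page_ids(session, ids, 1)
second = page_ids(session, ids, 2)
third  = page_ids(session, ids, 3)
```

The posts for a page can then be loaded with one where(id: ...) query and re-ordered in Ruby to match the stored list, since SQL IN does not preserve order. With the default cookie session store, keep the roughly 4 KB cookie size limit in mind if there are many posts.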
You could use :order => 'RANDOM()' on your original query that populates @posts, and then not specify the order when you paginate.
Create a named scope on your Post model that encapsulates the random behaviour:
class Post < ActiveRecord::Base
  named_scope :random, :order => 'RANDOM()'
  # ...
end
Your PostsController code then becomes:
@posts = Post.random.paginate :page => params[:page]
