In Rails, I have some models that look like this:
class Product
  has_many :listings
end
class Listing
  belongs_to :product
  # quantity_in_kg
  # total_price
  # price_per_kg = total_price / quantity_in_kg
end
I'd like to be able to compare the listings for a product based on the price per kilogram, compared to the price per kilogram for the product. For example, this listing is only $2 per kilogram, whereas the product's average is $3.
Eventually, I'd like to be able to run a query that says "give me all of the listings which are below the average price of their product".
What's an effective way of doing this? I was thinking of something custom with ActiveRecord callbacks, and caching the per-kilo average in the products table, and the per-kilo price for each listing in the listings table. There's probably a lot of scope for getting that wrong, so I was wondering if there was another way.
I'm using Postgres 9.6 and Rails 5.1.0.
(Bonus points: listings can also be active/inactive, and I'd only like to compare the average of all active listings).
My suggestion is to start with a simple after_save callback and see where it takes you. You can add some updating criteria like "only recalculate on create/destroy or if active has been updated", add some transactions if you're feeling extra careful, etc.
If it gets too slow, add a background worker to update it regularly (for example).
class Listing
  after_save do
    product.update(
      avg_price_per_kg: product.listings.where(active: true).average(:price_per_kg)
    )
  end
end
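The comparison itself, including the active-only bonus, can be sketched in plain Ruby without a database. This is only an in-memory illustration under assumptions: the `Listing` struct and the `below_product_average` helper are hypothetical names, not the question's ActiveRecord models, and in a real app the cached `avg_price_per_kg` column (or a SQL subquery) would take the place of the grouping step.

```ruby
# Hypothetical in-memory stand-in for rows of the listings table.
Listing = Struct.new(:product_id, :quantity_in_kg, :total_price, :active) do
  def price_per_kg
    total_price.to_f / quantity_in_kg
  end
end

# Returns the active listings priced below the average price per kg
# of the active listings that belong to the same product.
def below_product_average(listings)
  active = listings.select(&:active)
  averages = active.group_by(&:product_id).transform_values do |group|
    group.sum(&:price_per_kg) / group.size
  end
  active.select { |l| l.price_per_kg < averages[l.product_id] }
end

listings = [
  Listing.new(1, 10, 20.0, true),  # 2.0 per kg
  Listing.new(1, 10, 40.0, true),  # 4.0 per kg
  Listing.new(1, 10, 5.0, false),  # inactive, ignored
  Listing.new(2, 5, 25.0, true)    # 5.0 per kg, the only listing of product 2
]

cheap = below_product_average(listings)
```

A listing that equals its product's average (as for product 2 here) is not counted as "below" it.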
Trying to avoid n+1 query
I'm working on a web based double entry accounting application that has the following basic models;
class Account < ApplicationRecord
has_many :splits
has_many :entries, through: :splits
end
class Entry < ApplicationRecord
has_many :splits, -> {order(:account_id)}, dependent: :destroy, inverse_of: :entry
attribute :amount, :integer
attribute :reconciled
end
class Split < ApplicationRecord
belongs_to :entry, inverse_of: :splits
belongs_to :account
attribute :debit, :integer
attribute :credit, :integer
attribute :transfer, :string
end
This is a fairly classic Accounting model, at least it is patterned after GnuCash, but it leads to somewhat complex queries. (From ancient history this is pretty much a 3rd normal form structure!)
First, Account is a hierarchical tree structure: an Account belongs to a parent (except ROOT) and may have many children; children may also have many children, which I call a family. Most of these relations are covered in the Account model and optimized as much as a recursive structure can be.
An Account has many Entries (transactions), and each entry must have at least two Splits whose Amount attributes (or Debits/Credits) sum to 0.
The primary use of this structure is to produce Ledgers, which is just a list of Entries and their associated Splits usually filtered by a date range. This is fairly simple if the account has no Family/Children
# self = a single Account
entries = self.entries.where(post_date: @bom..@eom).includes(:splits).order(:post_date, :numb)
It gets more complex if you want a ledger of an account that has many children (say, I want a Ledger of all Current Assets).
def self.scoped_acct_range(family,range)
# family is a single account_id or array of account_ids
Entry.where(post_date:range).joins(:splits).
where(splits: {account_id:family}).
order(:post_date,:numb).distinct
end
While this works, I guess I have an n+1 query: if I use includes instead of joins, I won't get all the splits for an Entry, only those in the family, and I want all splits. That means the view reloads (queries) the splits for each entry. Also, distinct is needed because an entry's splits could reference the same account multiple times.
My question: is there a better way to handle this three-model query?
I threw together a few hacks, one going backwards from splits:
def self.scoped_split_acct_range(family, range)
  # family is a single account_id or array of account_ids
  # get filtered Entry ids
  entry_ids = Split.where(account_id: family).
    joins(:entry).
    where(entries: { post_date: range }).
    pluck(:entry_id).uniq
  # use ids to get entries and eager loaded splits
  Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
end
This also works and, by the ms reported in the log, may even be faster. Normal use of either would be looking at 50 or so Entries for a month, but you can also filter a year's worth of transactions, and you get what you asked for. For normal use, a ledger for a month takes about 70ms; even a quarter is around 100ms.
I've used a few attributes in both Splits and Accounts that got rid of a few view-level queries. Transfer is basically the concatenated Account names going up the tree.
Again, just looking to see if I'm missing something and there is a better way.
Using a nested select is the proper option IMO.
You can optimize your code with the nested select to use the following:
entry_ids = Entry.where(post_date: range)
                 .joins(:splits)
                 .where(splits: { account_id: family })
                 .select('entries.id')
                 .distinct
Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
This will generate a single SQL statement with a nested select, instead of having 2 SQL queries: 1 to get the Entry ids and pass it to Rails and 1 other query to select entries based on those ids.
The following gem, developed by an ex-colleague, can help you deal with this kind of stuff: https://github.com/MaxLap/activerecord_where_assoc
In your case, it would enable you to do the following:
Entry.where_assoc_exists(:splits, account_id: 123)
.where(post_date: range)
.includes(:splits)
.order(:post_date, :numb)
This does the same thing as I suggested, but behind the scenes.
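To make the two steps of the nested select concrete, here is a rough plain-Ruby model of the logic with no database involved. Everything in it is an assumption for illustration: `LedgerEntry`, `LedgerSplit` and `ledger` are hypothetical names, and the arrays stand in for the entries and splits tables.

```ruby
require "date"
require "set"

# Hypothetical in-memory rows standing in for the entries and splits tables.
LedgerEntry = Struct.new(:id, :post_date)
LedgerSplit = Struct.new(:entry_id, :account_id, :amount)

def ledger(entries, splits, family, range)
  family = Array(family)
  in_range = entries.select { |e| range.cover?(e.post_date) }.map(&:id).to_set

  # Inner select: distinct ids of in-range entries with a split in the family.
  entry_ids = splits
    .select { |s| family.include?(s.account_id) && in_range.include?(s.entry_id) }
    .map(&:entry_id)
    .to_set

  # Outer query plus eager load: each matching entry with ALL of its splits,
  # not only the splits that belong to the family.
  splits_by_entry = splits.group_by(&:entry_id)
  entries
    .select { |e| entry_ids.include?(e.id) }
    .sort_by(&:post_date)
    .map { |e| [e, splits_by_entry.fetch(e.id, [])] }
end

entries = [
  LedgerEntry.new(1, Date.new(2020, 1, 5)),
  LedgerEntry.new(2, Date.new(2020, 1, 9)),
  LedgerEntry.new(3, Date.new(2020, 2, 1))
]
splits = [
  LedgerSplit.new(1, 10, 100), LedgerSplit.new(1, 20, -100),
  LedgerSplit.new(2, 30, 50),  LedgerSplit.new(2, 40, -50),
  LedgerSplit.new(3, 10, 7),   LedgerSplit.new(3, 20, -7)
]

january = Date.new(2020, 1, 1)..Date.new(2020, 1, 31)
result = ledger(entries, splits, [10], january)
```

Entry 1 comes back with both of its splits (accounts 10 and 20), which is the behaviour the joins-only version loses.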
I want to know if it's possible to achieve something like this: Product.last(cost: 100)
Where it will return the product, but with a virtually updated cost attribute (so there is no change to the database).
Here is the scenario, maybe someone can tell me a more elegant way of doing what I am trying to do: I have three models, Order, LineItem, and Product. LineItem belongs_to Order and Order has_many LineItems. LineItem is polymorphic. Order has_many products through LineItems (but Product does not belong to LineItem or Order). Order.products << Product.first works as I would expect: the line item is assigned the product and the order saves correctly. However, each line item in an order has a cost attribute that by default inherits from the cost of the product. I want to modify the cost attribute on LineItem without modifying it on Product (e.g. a Product has some sort of temporary discount on it that adjusts the price). Am I going about this in the best way?
Not sure I understood your question, but you can do this to get an object and change the value of one of its attributes locally:
@product =
  Product.last.tap do |product|
    product.cost = 100
    product.readonly! # to ensure you won't persist this change
  end
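For the line-item scenario in the question, another way to look at it is that the line item owns its cost and only falls back to the product's cost when no override is set. The following is a plain-Ruby sketch of that idea, not ActiveRecord code; `CatalogProduct` and `OrderLine` are hypothetical names invented for the illustration.

```ruby
# Hypothetical stand-ins for the Product and LineItem models.
CatalogProduct = Struct.new(:name, :cost)

class OrderLine
  attr_reader :product
  attr_writer :cost

  def initialize(product, cost: nil)
    @product = product
    @cost = cost
  end

  # Uses the override when one is set, otherwise the product's current cost.
  def cost
    @cost || product.cost
  end
end

product = CatalogProduct.new("Widget", 150)
line = OrderLine.new(product)
default_cost = line.cost   # inherited from the product
line.cost = 100            # temporary discount on this line only
discounted_cost = line.cost
```

The product's own cost is never touched, so other orders keep seeing the regular price.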
Many content aggregators, like reddit or hackernews, sort their stories with an algorithm based on a combination of the number of upvotes and the time since submission. The simplest way to implement such sorting would be a database function that calculates the ranking for every item and sorts on it, but that would quickly become inefficient, since it would require calculating the ranking for all the items on each query.
Another way would be to save the ranking for every item in the database. But when would I recalculate it? If I did it only when a submission was voted on, then those which were not voted on would stay with the same ranking even though they should fall because of passing time. So, what's the best way to implement such sorting? I'm not asking what's the best algorithm for that, but how it would be applied.
You should store the number of votes with Rails' standard counter_cache:
class Vote < ActiveRecord::Base
belongs_to :post, counter_cache: true
end
class Post < ActiveRecord::Base
# Migration for this model should create 'votes_count' column:
# t.integer :votes_count
has_many :votes, dependent: :destroy
end
As for applying a complex algorithm, especially one that has time as one of its parameters, I doubt there is a better way than recalculating the rating each midnight (in general, you can also come up with a simpler algorithm to save effort).
In Rails, you can use whenever gem:
# config/schedule.rb
every 1.day, at: '12am' do
  runner "Post.update_rating"
end
# post.rb
class Post < ActiveRecord::Base
  # Add 'rating' column in migration
  # t.integer :rating
  def self.update_rating
    # Ruby-way
    # find_in_batches do |batch|
    #   batch.each do |post|
    #     age_in_days = [(Date.today - post.created_at.to_date).to_i, 1].max
    #     post.update(rating: post.votes_count / age_in_days)
    #   end
    # end
    # SQL-way (probably a quicker one, but you should think on how
    # not to lock your database for a long period of time)
    # SOME_DATE_FUNCTION will depend on DB engine that you use
    update_all("rating = votes_count / SOME_DATE_FUNCTION(created_at, NOW())")
  end
end
UPD. Response to the points made in comments.
If we're dealing with a site that is as dynamic as reddit, we should refresh the ratings more often.
As far as I understand you consider Reddit's main page as a reference on posts sorted by rating (which is calculated using vote count and age penalty). I seriously doubt that Reddit recalculates 'age penalty' more often than once per day. Of course, it could calculate vote counts in real-time (and that's what you can do easily with counter_cache in Rails).
So when a new record suddenly gets 1000 upvotes, it goes to the main page immediately. But then each midnight it will receive a penalty equal to, say, 20% of the votes it received that day.
Well, you said that we won't discuss the specific algorithm, which could be really complex :)
Also, do you think the ranking should in addition to that be recalculated each time a post is upvoted?
Of course, nothing should stop you from updating the 'vote counts' part of the rating in real time (and showing records based on the resulting rating). Then you can calculate the 'age penalty' for each record once per day, sinking old posts a bit.
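As a small illustration of the split between the real-time vote count and the daily age penalty, here is a plain-Ruby sketch. The formula (votes divided by age in days) is the one used in the example code of this answer; the `rating` helper name and the guard that treats day zero as one day are assumptions added for the sketch.

```ruby
require "date"

# Hypothetical scoring rule: votes divided by the post's age in days.
# The age is clamped to at least 1 so a brand-new post does not divide by zero.
def rating(votes_count, created_on, today)
  age_in_days = [(today - created_on).to_i, 1].max
  votes_count / age_in_days
end

created = Date.new(2020, 3, 1)

fresh  = rating(1000, created, created)      # same day: no penalty yet
day_4  = rating(1000, created, created + 4)  # after four nightly runs
voted  = rating(1200, created, created + 4)  # new votes raise it again
```

The vote count can change at any moment, while the age term only changes when the date does, which is why a nightly job is enough for that part.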
I've got a Match model and a Team model.
I want to count how many goals a Team scores during the league (so I have to sum all the scores of that team, in both home_matches and away_matches).
How can I do that? What columns should I put into the matches and teams database tables?
I'd assume your Match model looks something like this:
belongs_to :home_team, class_name:"Team"
belongs_to :away_team, class_name:"Team"
attr_accessible :home_goal_count, :away_goal_count
If so, you could add a method to extract the number of goals:
def goal_count
home_matches.sum(:home_goal_count) + away_matches.sum(:away_goal_count)
end
Since this could be expensive (especially if you do it often), you might just cache this value into the team model and use an after_save hook on the Match model (and, if matches ever get deleted, then an after_destroy hook as well):
after_save :update_team_goals
def update_team_goals
home_team.update_attribute(:goal_count_cache, home_team.goal_count)
away_team.update_attribute(:goal_count_cache, away_team.goal_count)
end
Since you want to do this for leagues, you probably want to add a belongs_to :league on the Match model, a league parameter to the goal_count method (and its query), and a goal_count_cache_league column if you want to cache the value (only cache the most recently changed with my suggested implementation, but tweak as needed).
You don't put that in any table. There's a rule for databases: don't ever store data in your database that could be calculated from other fields.
You can calculate that easily using this function:
def total_goals
  home_matches.map(&:home_goals).inject(0, :+) + away_matches.map(&:away_goals).inject(0, :+)
end
That should do it for you. If you want the matches filtered for a league, you can use a scope for that.
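The home-plus-away sum can be tried out in plain Ruby without any models. This is only a sketch under assumptions: `FootballMatch` and its field names are hypothetical stand-ins for the Match model, and the array stands in for the matches table.

```ruby
# Hypothetical in-memory stand-in for rows of the matches table.
FootballMatch = Struct.new(:home_team, :away_team, :home_goals, :away_goals)

# Goals scored by a team: its home goals in home matches
# plus its away goals in away matches.
def total_goals(matches, team)
  home = matches.select { |m| m.home_team == team }.sum(&:home_goals)
  away = matches.select { |m| m.away_team == team }.sum(&:away_goals)
  home + away
end

matches = [
  FootballMatch.new("Reds", "Blues", 2, 1),
  FootballMatch.new("Blues", "Reds", 0, 3),
  FootballMatch.new("Greens", "Blues", 1, 1)
]

reds_goals  = total_goals(matches, "Reds")
blues_goals = total_goals(matches, "Blues")
```

Because `sum` returns 0 for an empty list, a team with no home or no away matches yet is handled without a special case.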
Live site: http://iatidata.heroku.com
Github: https://github.com/markbrough/IATI-Data
Based on aid information released through the IATI Registry: iatiregistry.org
I'm a bit of a Rails n00b so sorry if this is a really stupid question.
There are two key Models in this app:
Activity - which contains details such as recipient country, funding organisation
Transaction - which contains details such as how much money (value) was committed or disbursed (transaction_type), when, to whom, etc.
All Transactions nest under an Activity. Each Activity has multiple Transactions. They are connected together by activity_id. has_many :transactions and belongs_to :activity are defined in the Activity and Transaction Models respectively.
So: all of this works great when I'm trying to get details of transactions for a single activity - either when looking at a single activity (activity->show) or looping through activities on the all activities page (activity->index). I just call
@activities.each do |activity|
  activity.transactions.each do |transaction|
    transaction.value # do something like display it
  end
end
But what I now really want to do is to get the sum of all transactions for all activities (subject to :conditions for the activity).
What's the best way to do this? I guess I could do something like:
@totalvalue = 0
@activities.each do |activity|
  activity.transactions.each do |transaction|
    @totalvalue = @totalvalue + transaction.value
  end
end
... but that doesn't seem very clean, and it makes the server do unnecessary work. I figure it might be something to do with the model...?! sum() is another option maybe?
This has partly come about because I want to show the total amount going to each country for the nice bubbles on the front page :)
Thanks very much for any help!
Update:
Thanks for all the responses! So, this works now:
@thiscountry_activities.each do |a|
  @thiscountry_value = @thiscountry_value + a.transactions.sum(:value)
end
But this doesn't work:
@thiscountry_value = @thiscountry_activities.transactions.sum(:value)
It gives this error:
undefined method `transactions' for #<Array:0xb5670038>
Looks like I have some sort of association problem. This is how the models are set up:
class Transaction < ActiveRecord::Base
belongs_to :activity
end
class Activity < ActiveRecord::Base
has_and_belongs_to_many :policy_markers
has_and_belongs_to_many :sectors
has_many :transactions
end
I think this is probably quite a simple problem, but I can't work out what's going on. The two models are connected together via id (in Activity) and activity_id (in Transactions).
Thanks again!
Use Active Record's awesome sum method, available for classes:
Transaction.sum(:value)
Or, like you want, associations:
activity.transactions.sum(:value)
Let the database do the work:
@total_value = Transaction.sum(:value)
This gives the total for all transactions. If you have some activities already loaded, you can filter them this way:
@total_value = Transaction.where(:activity_id => @activities.map(&:id)).sum(:value)
You can do it with one query:
@total_value = Transaction.joins(:activity).where("activities.name" => 'foo').sum(:value)
My code was getting pretty messy summing up virtual attributes. So I wrote this little method to do it for me. You just pass in a collection and a method name as a string or symbol and you get back a total. I hope someone finds this useful.
def vsum(collection, v_attr) # Totals the virtual attributes of a collection
  total = 0
  collection.each { |item| total += item.public_send(v_attr) }
  total
end
# Example use (account is a single Account record)
total_credits = vsum(account.transactions, :credit)
Of course you don't need this if :credit is a table column. You are better off using the built-in ActiveRecord method above. In my case I have a :quantity column that when positive is a :credit and negative is a :debit. Since :debit and :credit are not table columns, they can't be summed using ActiveRecord.
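On Ruby 2.4 or later, the same helper can lean on `Enumerable#sum` with a block. The sketch below is self-contained plain Ruby; `QuantityLine` is a hypothetical stand-in for a record with a `quantity` column and virtual `credit`/`debit` attributes as described above.

```ruby
# Hypothetical record: positive quantities are credits, negative ones debits.
QuantityLine = Struct.new(:quantity) do
  def credit
    quantity.positive? ? quantity : 0
  end

  def debit
    quantity.negative? ? -quantity : 0
  end
end

# Totals a virtual attribute over any enumerable collection.
def vsum(collection, v_attr)
  collection.sum { |item| item.public_send(v_attr) }
end

lines = [QuantityLine.new(5), QuantityLine.new(-3), QuantityLine.new(2)]

total_credits = vsum(lines, :credit)
total_debits  = vsum(lines, :debit)
```

Note that this still loads every record into Ruby, so it only makes sense for attributes the database cannot compute itself.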
As I understood, you would like to have the sum of all values of the transaction table. You can use SQL for that. I think it will be faster than doing it the Ruby way.
select sum(value) as transaction_value_sum from transactions;
You could do
@total_value = activity.transactions.sum(:value)
http://ar.rubyonrails.org/classes/ActiveRecord/Calculations/ClassMethods.html