How to define reputation system values in model - ruby-on-rails

How can I set up the default_scope in my blogging application so that the index orders the entries by an algorithm defined in the model?
If I were to use a HackerNews-like formula for the ranking algorithm as shown below, how can I define it in my model?
total_score = (votes_gained - 1) / (age_in_hours + 2)^1.5
The votes_gained variable relies on the ActiveRecord Reputation System gem, and is written as follows in my views:
votes_gained = @post.reputation_value_for(:votes).to_i
Finally, age_in_hours is pretty straightforward:
age_in_hours = (Time.now - @post.created_at) / 1.hour
How can I use these figures to order my blog posts index? I've been trying to figure out how to define total_score correctly in the model so that I can add it to the default scope as default_scope order("total_score DESC") or something similar. Direct substitution has not worked, and I'm not sure how to "rephrase" each part of the formula.
How exactly should I define total_score? Thanks much for your insight!
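For reference, a direct Ruby translation of the formula as an instance method would look something like the sketch below; it computes a score per record, but an instance method like this cannot be handed to order, which is what makes the question non-trivial.
# Sketch only: a per-record version of the formula
# (reputation_value_for comes from the reputation system gem used above).
def total_score
  votes_gained = reputation_value_for(:votes).to_i
  age_in_hours = (Time.now - created_at) / 1.hour
  (votes_gained - 1) / (age_in_hours + 2)**1.5
end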

Seeing as how you can't rely on Active Record to translate the formula into SQL, you have to write it yourself. The only potential concern here is that this is not a database-independent solution.
Since you are using Postgres, you can define your scope as (I haven't tested this yet, so let me know whether it works):
AGE_IN_HOURS = "(#{Time.now.tv_sec} - EXTRACT (EPOCH FROM posts.created_at))/3600"
TOTAL_SCORE = "(rs_reputations.value - 1)/((#{AGE_IN_HOURS}) + 2)^1.5"
default_scope joins("INNER JOIN rs_reputations ON rs_reputations.target_id = posts.id").where("rs_reputations.target_type = 'Post'").order(TOTAL_SCORE)
EDIT: Actually this won't work because, as it stands, Time.now is calculated one time (when the model loads), but it needs to be recalculated each time records are pulled. Use
default_scope lambda { order_by_score }

def self.order_by_score
  age_in_hours = "(#{Time.now.tv_sec} - EXTRACT (EPOCH FROM posts.created_at))/3600"
  total_score = "(rs_reputations.value - 1)/((#{age_in_hours}) + 2)^1.5"
  joins("INNER JOIN rs_reputations ON rs_reputations.target_id = posts.id")
    .where("rs_reputations.target_type = 'Post'")
    .order(total_score)
end
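With the scope in place, the index action only needs a plain query; a minimal usage sketch (Post and @posts are the obvious names here, not taken from the question):
# The default scope now applies the ranking automatically.
@posts = Post.all
# If another screen needs the natural order instead, bypass it with:
Post.unscoped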

Related

Rails - How to use custom attribute in where?

I have an Order model with a datetime column start and integer columns arriving_dur, drop_off_dur, etc., which are durations in seconds from start.
Then in my model I have
class Order < ApplicationRecord
  def finish_time
    self.start + self.arriving_duration + self.drop_off_duration
  end
  # other def something_time ... end
end
I want to be able to do this:
Order.where(finish_time: Time.now..(Time.now+2.hours) )
But of course I can't, because there's no such column finish_time. How can I achieve such result?
I've read 4 possible solutions on SO:
eager load all orders and filter them in Ruby - that would not work well if there were more orders
have a parametrized scope for each time I need, but that means so much code duplication
have an SQL function for each time and bind it to the model with select() - it's just pain
somehow use http://api.rubyonrails.org/classes/ActiveRecord/Attributes/ClassMethods.html#method-i-attribute ? But I have no idea how to use it in my case or whether it even solves the problem I have.
Do you have any idea or some 'best practice' how to solve this?
Thanks!
You have different options to implement this behaviour.
Add an additional finish_time column and update it whenever you update/create your time values. This could be done in Rails (with either before_validation or after_save callbacks) or as a PostgreSQL trigger.
class Order < ApplicationRecord
  before_validation :update_finish_time

  private

  def update_finish_time
    self.finish_time = start_time + arriving_duration.seconds + drop_off_duration.seconds
  end
end
This is especially useful when you need finish_time in many places throughout your app. It has the downside that you need to manage that column with extra code and it stores data you actually already have. The upside is that you can easily create an index on that column should you ever have many orders and need to search on it.
An option could be to implement the finish-time update as a PostgreSQL trigger instead of in Rails. This has the benefit of being independent from your Rails application (e.g. when other sources/scripts access your db too) but has the downside of splitting your business logic into several places (Ruby code, PostgreSQL code).
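As a rough illustration of the trigger route (untested; the column names start_time, arriving_duration and drop_off_duration are taken from the snippet above and may differ from your schema, the migration class and version are assumptions, and it presumes the finish_time column from the first option already exists), a migration could install it like this:
class AddOrderFinishTimeTrigger < ActiveRecord::Migration[5.2]
  def up
    execute <<-SQL
      CREATE OR REPLACE FUNCTION set_order_finish_time() RETURNS trigger AS $$
      BEGIN
        NEW.finish_time := NEW.start_time +
          (NEW.arriving_duration + NEW.drop_off_duration) * interval '1 second';
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER order_finish_time
      BEFORE INSERT OR UPDATE ON orders
      FOR EACH ROW EXECUTE PROCEDURE set_order_finish_time();
    SQL
  end

  def down
    execute "DROP TRIGGER order_finish_time ON orders; DROP FUNCTION set_order_finish_time();"
  end
end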
Your second option is adding a virtual column just for your query.
def orders_within_the_next_2_hours
  finishing_orders = Order.select("*, (start_time + (arriving_duration + drop_off_duration) * interval '1 second') AS finish_time")
  Order.from("(#{finishing_orders.to_sql}) AS orders").where(finish_time: Time.now..(Time.now + 2.hours))
end
The code above creates the SQL query for finishing_orders, which is the orders table with the additional finish_time column. In the second line we use that finishing_orders SQL as the FROM clause ("cleverly" aliased to orders so Rails is happy). This way we can query finish_time as if it were a normal column.
The SQL is written for relatively old PostgreSQL versions (I guess it works for 9.3+). If you use make_interval instead of multiplying with interval '1 second', the SQL might be a little more readable (but needs a newer PostgreSQL version, 9.4+ I think).
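For completeness, an untested sketch of the make_interval variant (same assumed column names as above):
# make_interval(secs => n) builds an interval of n seconds (PostgreSQL 9.4+)
finishing_orders = Order.select(
  "*, (start_time + make_interval(secs => arriving_duration + drop_off_duration)) AS finish_time"
)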

I need advice in speeding up this rails method that involves many queries

I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the date_sent attribute is included in all of the queries, which implies that the SQL is searching through all 1M+ records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks_controller.rb
def index
  def set_sub_count_hash(thip)
    {
      gmail_hooks:   {opened: a = thip.gmail.send(@event).size,   total_sent: b = thip.gmail.sent.size,   perc_opened: find_perc(a, b)},
      hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
      yahoo_hooks:   {opened: a = thip.yahoo.send(@event).size,   total_sent: b = thip.yahoo.sent.size,   perc_opened: find_perc(a, b)},
      other_hooks:   {opened: a = thip.other.send(@event).size,   total_sent: b = thip.other.sent.size,   perc_opened: find_perc(a, b)}
    }
  end

  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [#List of three ip addresses]
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is actually very simple. You have to remember that when you use where(condition), the query is not immediately executed in the DB.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count or #size like in your case) and keeps chaining your conditions until you do. In your code, you keep chaining conditions onto the main query inside a loop (date_range). And it gets worse: you start another loop inside this one, adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet: it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to execute the same query many times.
Therefore you have something like:
10 (date range) * 3 (IP list) * 8 (times the query is materialized in set_sub_count_hash) = 240 queries
and then you have a problem.
What you want to do is to do the whole query at once and group it by date, ip and email. You should have a hash structure after that, which you would pass to the #set_sub_count method and do some ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
                        .where(sending_ip: @m_list_of_ips)
Ok, now you have one query, which is nice, but I think you should split the query into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, the main_query, will not be executed until you ask for materialized results, don't forget it). Still, like 100 times faster.
I think this is the result that should be grouped, mapped and passed to #set_sub_count instead of passing the raw query and calling methods on it every time and many times. It will be a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
                                     .group('date_sent', 'sending_ip', 'event', 'esp').count
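For anyone adapting this: the grouped count comes back as a hash keyed by [date_sent, sending_ip, event, esp] arrays, so individual cells can be read without extra queries. A small, untested sketch (variable names reuse the ones from the question):
counts = MWebhook.where('date_sent > ?', start_date.to_date)
                 .group('date_sent', 'sending_ip', 'event', 'esp')
                 .count

# e.g. opened vs. sent for one date/IP/ESP cell (keys follow the group order above)
opened = counts[[this_date, ip, 'unique_opened', 'hotmail']].to_i
sent   = counts[[this_date, ip, 'sent', 'hotmail']].to_i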

Computing ActiveRecord nil attributes

In my Rails app I have something like this in one of the models
def self.calc
  columns_to_sum = "sum(price_before + price_after) as price"
  where('product.created_at >= ?', 1.month.ago.beginning_of_day).select(columns_to_sum)
end
For some of the rows we have price_before and/or price_after as nil. This is not ideal, as I want to add both columns and call the result price. How do I achieve this without hitting the database too many times?
You can ensure the NULL values are treated as 0 by using COALESCE, which returns the first non-NULL value:
columns_to_sum = "sum(COALESCE(price_before, 0) + COALESCE(price_after, 0)) as price"
This would, however, calculate the summed price across all products.
On the other hand, you might not have to do this if all you want to do is have an easy way to calculate the price of one product. Then you could add a method to the Product model
def price
  price_before.to_i + price_after.to_i
end
This has the advantage of being able to reflect changes to the price (via price_before or price_after) without having to go through the db again as price_before and price_after will be fetched by default.
But if you want to e.g. select records from the db based on the price you need to place that functionality in the DB.
For that I'd modularize your scopes and combine them again later:
def self.with_price
  columns_to_sum = "(COALESCE(price_before, 0) + COALESCE(price_after, 0)) as price"
  select(column_names, columns_to_sum)
end
This will return all records with an additional price reader method.
And a scope independent from the one before:
def self.one_month_ago
  where('product.created_at >= ?', 1.month.ago.beginning_of_day)
end
Which could then be used like this:
Product.with_price.one_month_ago
This allows you to continue modifying the scope before hitting the DB, e.g. to get all Products where the price is higher than x
Product.with_price.one_month_ago.where('price > 5')
If you are trying to get the sum of price_before and price_after for each individual record (as opposed to a single sum for the entire query result), you want to do it like this:
columns_to_sum = "(coalesce(price_before, 0) + coalesce(price_after, 0)) as price"
I suspect that's what you're after, since you have no group in your query. If you are after a single sum, then the answer by @ulferts is correct.
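A quick, untested sketch of how that per-row expression could be used (assuming the table is named products; the where clause just mirrors the question's date filter):
products = Product.select(
  "products.*, (COALESCE(price_before, 0) + COALESCE(price_after, 0)) AS price"
).where('products.created_at >= ?', 1.month.ago.beginning_of_day)

products.first.price # reads the computed alias per record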

Extract records which satisfy a model function in Rails

I have following method in a model named CashTransaction.
def is_refundable?
  self.amount > self.total_refunded_amount
end

def total_refunded_amount
  self.refunds.sum(:amount)
end
Now I need to extract all the records which satisfy the above function, i.e. records for which it returns true.
I got that working by using following statement:
CashTransaction.all.map { |x| x if x.is_refundable? }
But the result is an Array. I am looking for ActiveRecord_Relation object as I need to perform join on the result.
I feel I am missing something here as it doesn't look that difficult. Anyways, it got me stuck. Constructive suggestions would be great.
Note: Just amount is a CashTransaction column.
EDIT
Following SQL does the job. If I can change that to ORM, it will still do the job.
SELECT `cash_transactions`.* FROM `cash_transactions` INNER JOIN `refunds` ON `refunds`.`cash_transaction_id` = `cash_transactions`.`id` WHERE (cash_transactions.amount > (SELECT SUM(`amount`) FROM `refunds` WHERE refunds.cash_transaction_id = cash_transactions.id GROUP BY `cash_transaction_id`));
Sharing Progress
I managed to get it working with the following ORM:
CashTransaction
  .joins(:refunds)
  .group('cash_transactions.id')
  .having('cash_transactions.amount > sum(refunds.amount)')
But what I was actually looking for was something like:
CashTransaction.joins(:refunds).where(is_refundable?: true)
with is_refundable? being a model method. Initially I thought defining is_refundable? as an attr_accessor would work, but I was wrong.
Just a thought: can the problem be solved in an elegant way using Arel?
There are two options.
1) Finish what you have started (which is extremely inefficient for bigger amounts of data, since everything is loaded into memory before processing):
CashTransaction.all.select(&:is_refundable?) # like your map, but keeps only the matching records
So get the ids:
ids = CashTransaction.all.select(&:is_refundable?).map(&:id)
And now, to get an ActiveRecord relation:
CashTransaction.where(id: ids) # will return a relation
2) Move the calculation to SQL:
CashTransaction.where('amount > total_refunded_amount')
The second option is faster and more efficient in every possible way.
When you deal with a database, try to process things at the database level, with the smallest Ruby involvement possible.
EDIT
According to the edited question, here is how you would achieve the desired result:
CashTransaction.joins(:refunds).group('cash_transactions.id').having('cash_transactions.amount > SUM(refunds.amount)')
EDIT #2
As to your updates in the question - I don't really understand why you have latched onto using is_refundable?, an instance method, inside a query, which is basically not possible in AR, but...
My suggestion is to create a scope is_refundable:
scope :is_refundable, -> {
  CashTransaction
    .joins(:refunds)
    .group('cash_transactions.id')
    .having('cash_transactions.amount > sum(refunds.amount)')
}
Now it is available in notation as short as
CashTransaction.is_refundable
which is shorter and clearer than the aimed-for
CashTransaction.where('is_refundable = ?', true)
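Regarding the Arel idea floated in the question, a roughly equivalent, untested sketch (assuming the default table names) could look like this:
ct = CashTransaction.arel_table
r  = Refund.arel_table

# Same group/having query as above, expressed with Arel nodes
CashTransaction
  .joins(:refunds)
  .group(ct[:id])
  .having(ct[:amount].gt(r[:amount].sum))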
You can do it this way:
cash_transactions = CashTransaction.all.select { |x| x.is_refundable? } # Array
CashTransaction.where(id: cash_transactions.map(&:id)) # ActiveRecord_Relation
But this is an inefficient way of doing it, as the other answerers also mentioned.
You can do it using SQL if amount and total_refunded_amount are columns of the cash_transactions table in the database, which will be much more efficient:
CashTransaction.where('amount > total_refunded_amount')
But if amount or total_refunded_amount are not actual columns in the database, then you can't do it this way. Then, I guess, you have to do it the other way, which is less efficient than raw SQL.
I think you should pre-compute the is_refundable result (in a new column) whenever a CashTransaction and its refunds (has_many, presumably?) are updated, by using callbacks:
class CashTransaction
  before_save :update_is_refundable

  def update_is_refundable
    self.is_refundable = amount > total_refunded_amount
  end

  def total_refunded_amount
    self.refunds.sum(:amount)
  end
end

class Refund
  belongs_to :cash_transaction
  after_save :update_cash_transaction_is_refundable

  def update_cash_transaction_is_refundable
    cash_transaction.update_is_refundable
    cash_transaction.save!
  end
end
Note: the above code could certainly be optimized to avoid some queries.
Then you can query the is_refundable column:
CashTransaction.where(is_refundable: true)
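This approach needs the column to exist first; a minimal migration sketch (class name and migration version are assumptions):
class AddIsRefundableToCashTransactions < ActiveRecord::Migration[5.2]
  def change
    # Hypothetical boolean column backing the pre-computed flag described above
    add_column :cash_transactions, :is_refundable, :boolean, default: false
  end
end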
I think it's not bad to do this with two queries instead of a join, something like this:
def refundable
  where('amount > ?', total_refunded_amount)
end
This will do a single sum query and then use the sum in the second query; when the tables grow larger, you might find that this is faster than doing a join in the database.

update_all with a method

Let's say I have a model:
class Result < ActiveRecord::Base
  attr_accessible :x, :y, :sum
end
Instead of doing
Result.all.find_each do |s|
  s.sum = compute_sum(s.x, s.y)
  s.save
end
assuming compute_sum is an available method that does some computation which cannot be translated into SQL:
def compute_sum(x, y)
  sum_table[x][y]
end
Is there a way to use update_all, probably something like:
Result.all.update_all(sum: compute_sum(:x, :y))
I have more than 80,000 records to update. Each record in find_each creates its own BEGIN and COMMIT queries, and each record is updated individually.
Or is there any other faster way to do this?
If the compute_sum function can't be translated into SQL, then you cannot do update_all on all records at once. You will need to iterate over the individual instances. However, you could speed it up if there are a lot of repeated sets of values in the columns, by only doing the calculation once per set of inputs and then doing one mass update per calculation, e.g.
Result.all.group_by { |result| [result.x, result.y] }.each do |inputs, results|
  sum = compute_sum(*inputs)
  Result.update_all("sum = #{sum}", "id in (#{results.map(&:id).join(',')})")
end
You can replace result.x, result.y with the actual inputs to the compute_sum function.
EDIT - forgot to put the square brackets around result.x, result.y in the group_by block.
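If the inputs are mostly unique and you do have to touch every row, wrapping the loop in a single transaction at least removes the per-record BEGIN/COMMIT overhead mentioned in the question; an untested sketch:
Result.transaction do
  Result.find_each do |s|
    # update_column (Rails 3.1+) writes directly, skipping validations and callbacks
    s.update_column(:sum, compute_sum(s.x, s.y))
  end
end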
update_all makes an SQL query, so any processing you do on the values needs to be in SQL. So you'll need to find the SQL function, in whichever DBMS you're using, to add two numbers together. In Postgres, for example, I believe you would do
Sum.update_all("sum = x + y")
which will generate this sql:
update sums set sum = x + y;
which will calculate the x + y value for each row, and set the sum field to the result.
EDIT - for MariaDB. I've never used this, but a quick google suggests that the sql would be
update sums set sum = sum(x + y);
Try this first, in your sql console, for a single record. If it works, then you can do
Sum.update_all(sum: "sum(x + y)")
in Rails.
EDIT 2: there are a lot of things called sum here, which makes the example quite confusing. Here's a more generic example.
set col_c to the result of adding col_a and col_b together, in class Foo:
Foo.update_all("col_c = col_a + col_b")
I just noticed that I'd copied the (incorrect) Sum.all.update_all from your question. It should just be Sum.update_all - I've updated my answer.
I'm a complete beginner, just wondering: why not add a self method like below? Without adding a separate column in the db, you can still access Sum.sum from outside.
def self.sum
  x + y
end
