How to update thousands of records - ruby-on-rails

I have to update an age column based on the value in a date of birth column. There are thousands of records to update.
How do I do this using rails?
Is this the right way to do it?
User.update_all(:age => some_method);
def some_method
age = Date.today.year - dob.year
end

Yes, update_all is the right method but no, you can't do it like this. Your some_method will only get called once to set up a database call (I assume you're persisting to a database). You'll then get an error because dob won't be recognised in the scope of the User class.
You'll need to translate your date logic to SQL functions.
Something like (for mysql):
User.update_all("age = year(now()) -
year(dob) -
(DATE_FORMAT(now(), '%m%d') < DATE_FORMAT(dob, '%m%d'))")
(NB. The date_format comparison is there so that you get the right age for people whose birthdays fall later in the year than the current date - see this question for more details)
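The same birthday adjustment can be checked in plain Ruby, which is handy for verifying the SQL against a few known dates. This is only a sketch: age_on is a hypothetical helper, not part of Rails or of the answer above.

```ruby
require 'date'

# Hypothetical helper: age in whole years on a given day.
# Mirrors the SQL: year difference, minus 1 if the birthday
# (compared as zero-padded "mmdd" strings) has not yet occurred.
def age_on(dob, today = Date.today)
  age = today.year - dob.year
  age -= 1 if today.strftime('%m%d') < dob.strftime('%m%d')
  age
end

puts age_on(Date.new(1990, 12, 31), Date.new(2020, 6, 1)) # birthday still ahead
puts age_on(Date.new(1990, 1, 15), Date.new(2020, 6, 1))  # birthday already passed
```

Comparing the "mmdd" strings works because both are zero-padded and the same length, so string order matches calendar order.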

The other option is to use one of the batch methods in Rails.
User.where(some_condition).in_batches do |group_of_users|
# some logic
# e.g. group_of_users.update_all(:age => some_logic)
end
Note that in_batches (Rails 5+) yields relations, so update_all works on each group. find_in_batches yields plain arrays, which have no update_all.
This would lock your db for less time. Note that you should pretty much always update with a condition in mind. I can't think of many cases where you would want to update an entire table every time something happens.
There are a few options; check out the Rails docs or the API.
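The idea behind batching can be illustrated without a database: split the ids into fixed-size groups and issue one update per group. A rough sketch in plain Ruby, where each_slice stands in for the Rails batch methods and batch_size is an assumed value:

```ruby
# Stand-in for a large set of record ids (no database involved).
user_ids = (1..10).to_a
batch_size = 4 # assumed value, chosen small for illustration

# each_slice yields consecutive groups of at most batch_size ids,
# similar to how a batched update touches a limited set of rows per statement.
batches = user_ids.each_slice(batch_size).to_a

batches.each_with_index do |ids, i|
  puts "batch #{i + 1}: would update ids #{ids.first}..#{ids.last}"
end
```

The last group is simply smaller when the total is not a multiple of the batch size.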

Your query is right.
There are many ways to update records in a batch.
But I think your query is the best, because it is a Rails query that works with every condition on any database.
To update more than one attribute:
Model.update_all(:column1 => value1, :column2 => value2, ........)
or you can use:
Model.update_all("column1 = value1, column2 = value2, ........")

Related

Rails - How to use custom attribute in where?

I have an Order model with a datetime column start and int columns arriving_dur, drop_off_dur, etc., which are durations in seconds from start.
Then in my model I have
class Order < ApplicationRecord
def finish_time
self.start + self.arriving_duration + self.drop_off_duration
end
# other def something_time ... end
end
I want to be able to do this:
Order.where(finish_time: Time.now..(Time.now+2.hours) )
But of course I can't, because there's no such column finish_time. How can I achieve such result?
I've read 4 possible solutions on SA:
eager load all orders and select it with filter - that would not work well if there were more orders
have a parametrized scope for each time I need, but that means so much code duplication
have sql function for each time and bind it to model with select() - it's just pain
somehow use http://api.rubyonrails.org/classes/ActiveRecord/Attributes/ClassMethods.html#method-i-attribute ? But I have no idea how to use it for my case or whether it even solves the problem I have.
Do you have any idea or some 'best practice' how to solve this?
Thanks!
You have different options to implement this behaviour.
Add an additional finish_time column and update it whenever you update/create your time values. This could be done in Rails (with either before_validation or after_save callbacks) or with PostgreSQL triggers.
class Order < ApplicationRecord
before_validation :update_finish_time
private
def update_finish_time
self.finish_time = start_time + arriving_duration.seconds + drop_off_duration.seconds
end
end
This is especially useful when you need finish_time in many places throughout your app. It has the downside that you need to manage that column with extra code and it stores data you actually already have. The upside is that you can easily create an index on that column should you ever have many orders and need to search on it.
An option could be to implement the finish-time update as a postgresql trigger instead of in rails. This has the benefit of being independent from your rails application (e.g. when other sources/scripts access your db too) but has the downside of splitting your business logic into many places (ruby code, postgres code).
Your second option is adding a virtual column just for your query.
def orders_within_the_next_2_hours
finishing_orders = Order.select("*, (start_time + (arriving_duration + drop_off_duration) * interval '1 second') AS finish_time")
Order.from("(#{finishing_orders.to_sql}) AS orders").where(finish_time: Time.now..(Time.now+2.hours) )
end
The code above creates the SQL query for finishing_orders, which is the orders table with the additional finish_time column. In the second line we use that finishing_orders SQL as the FROM clause ("cleverly" aliased to orders so Rails is happy). This way we can query finish_time as if it were a normal column.
The SQL is written for relatively old postgresql versions (I guess it works for 9.3+). If you use make_interval instead of multiplying with interval '1 second' the SQL might be a little more readable (but needs newer postgresql version, 9.4+ I think).
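To sanity-check the arithmetic the SQL performs per row, the same calculation can be written in plain Ruby. This is only a sketch for small test sets, not a replacement for the query: Order here is a Struct stand-in rather than the ActiveRecord model, and the fixed timestamp is an assumption made so the example is repeatable.

```ruby
# Plain-Ruby version of the finish_time arithmetic (durations in seconds).
Order = Struct.new(:start_time, :arriving_duration, :drop_off_duration) do
  def finish_time
    start_time + arriving_duration + drop_off_duration
  end
end

now = Time.utc(2024, 1, 1, 12, 0, 0) # fixed time instead of Time.now
window = now..(now + 2 * 60 * 60)    # the next two hours

orders = [
  Order.new(now, 600, 300),        # finishes 15 minutes after now
  Order.new(now, 3 * 60 * 60, 600) # finishes after the window has closed
]

# Keep only the orders whose finish time falls inside the window.
finishing_soon = orders.select { |o| window.cover?(o.finish_time) }
puts finishing_soon.size
```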

Get first entry from an associated table Ruby on Rails

I have a one to many relationship: User has many Payments. I am trying to find a query that gets the first payment of each user(using created_at from the payments table).
I have found a similar question with an SQL response, but I have no idea how to write it with Active Record.
how do I query sql for a latest record date for each user
Quoting the answer:
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
For me, it would be min instead of max.
Thank you :)
Try this one for POSTGRES
Payment.select("DISTINCT ON(user_id) *").order("user_id, created_at ASC")
And For SQL
Payment.group(:user_id).having('created_at = MAX(created_at)')
If I were to answer the question as quoted (rather than based on the raw SQL given):
User has many Payments. I am trying to find a query that gets the first payment of each user(using created_at from the payments table).
Let's say:
# Assume a single user as reference
user = User.first
# Now get that user's first payment (from the Payment model)
user.payments.order(:created_at).first
# Ordering by created_at makes .first return the earliest payment.
If I fully understand what you're trying to do, I don't know why you need the max or min date.
What about this?
If you want first payment of each user
dates = Payment.group(:user_id).minimum(:created_at).values
payments = Payment.where(created_at: dates)
From each payment you can find the user too.
I think you have username as the foreign key; you can change it accordingly. :)
Let me know if you face any issue; I tested it and it works.
I know this answer is not the best, but it will work even for transactions only milliseconds apart, as Rails saves timestamps (created_at and updated_at) at the ms level.
I am sorry for not replying to everything, but after multiple tests, this is the quickest answer (in run time) I came up with:
Payment.where(:id => Payment.group(:user_id).pluck(:id))
I am saying it might not be the quickest way because I am using a sub query. I am getting the unique values and getting the ID's:
Payment.group(:user_id).pluck(:id)
Then I am matching those ID's.
The downside of this is that it won't work reversed, for getting the last payment.
There was also a possibility to use group_by and map but, since map is coming from ruby, it is taking much more time.
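For reference, the group_by and map variant mentioned above looks roughly like this in plain Ruby. It is a sketch with a Struct standing in for the Payment model and made-up sample rows; with real data every payment would be loaded into memory first, which is why it is slower than doing the work in SQL.

```ruby
# Struct stand-in for the Payment model (not ActiveRecord).
Payment = Struct.new(:id, :user_id, :created_at)

# Hypothetical sample data.
payments = [
  Payment.new(1, 10, Time.utc(2024, 1, 3)),
  Payment.new(2, 10, Time.utc(2024, 1, 1)),
  Payment.new(3, 20, Time.utc(2024, 1, 2))
]

# Group by user, then keep the earliest payment in each group.
first_payments = payments
  .group_by(&:user_id)
  .map { |_user_id, group| group.min_by(&:created_at) }

puts first_payments.map(&:id).inspect
```

Swapping min_by for max_by gives the last payment per user instead.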
I'm not sure, but try this:
In your controller:
def Page
@payments = Payment.first
end
In your html.erb:
<% @payments.each do |payment| %>
<p> <%= payment.amount %> </p>
<% end %>
Hope this helps!
Record.association.order(:created_at).first

Extract records which satisfy a model function in Rails

I have following method in a model named CashTransaction.
def is_refundable?
self.amount > self.total_refunded_amount
end
def total_refunded_amount
self.refunds.sum(:amount)
end
Now I need to extract all the records which satisfy the above function i.e records which return true.
I got that working by using following statement:
CashTransaction.all.map { |x| x if x.is_refundable? }
But the result is an Array. I am looking for ActiveRecord_Relation object as I need to perform join on the result.
I feel I am missing something here as it doesn't look that difficult. Anyways, it got me stuck. Constructive suggestions would be great.
Note: Just amount is a CashTransaction column.
EDIT
Following SQL does the job. If I can change that to ORM, it will still do the job.
SELECT `cash_transactions`.* FROM `cash_transactions` INNER JOIN `refunds` ON `refunds`.`cash_transaction_id` = `cash_transactions`.`id` WHERE (cash_transactions.amount > (SELECT SUM(`amount`) FROM `refunds` WHERE refunds.cash_transaction_id = cash_transactions.id GROUP BY `cash_transaction_id`));
Sharing Progress
I managed to get it work by following ORM:
CashTransaction
.joins(:refunds)
.group('cash_transactions.id')
.having('cash_transactions.amount > sum(refunds.amount)')
But what I was actually looking was something like:
CashTransaction.joins(:refunds).where(is_refundable? : true)
where is_refundable? is a model function. Initially I thought setting is_refundable? as an attr_accessor would work. But I was wrong.
Just a thought: can the problem be fixed in an elegant way using Arel?
There are two options.
1) Finish what you have started (which is extremely inefficient with larger amounts of data, since everything is loaded into memory before processing):
CashTransaction.all.select(&:is_refundable?) # same intent as what you've written, but shorter and without nils.
So get the ids:
ids = CashTransaction.all.select(&:is_refundable?).map(&:id)
And now, to get an ActiveRecord relation:
CashTransaction.where(id: ids) # will return a relation
2) Move the calculation to SQL:
CashTransaction.where('amount > total_refunded_amount')
The second option is faster and more efficient in every way.
When you deal with a database, try to do the processing at the database level, with the smallest Ruby involvement possible.
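The in-memory filtering step of option 1 can be sketched with plain Ruby objects. Txn and its refunded field are hypothetical stand-ins for CashTransaction and its refund total, used here only to show the difference between select and map.

```ruby
# Struct stand-in for CashTransaction; refunded is a precomputed refund total.
Txn = Struct.new(:id, :amount, :refunded) do
  def is_refundable?
    amount > refunded
  end
end

txns = [Txn.new(1, 100, 40), Txn.new(2, 50, 50), Txn.new(3, 80, 0)]

# select keeps the matching objects, so their ids can be collected afterwards.
ids = txns.select(&:is_refundable?).map(&:id)

# map returns one true/false value per element instead of filtering.
flags = txns.map(&:is_refundable?)

puts ids.inspect
puts flags.inspect
```

The ids array is what you would then pass to where(id: ids) to get back a relation.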
EDIT
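According to the edited question, here is how you would achieve the desired result (an aggregate such as SUM has to go in HAVING, not WHERE):
CashTransaction.joins(:refunds).group('cash_transactions.id').having('cash_transactions.amount > SUM(refunds.amount)')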
EDIT #2
As for the updates to your question - I don't really understand why you have latched onto is_refundable? as an instance method that could be used in a query, which is basically not possible in AR, but..
My suggestion is to create a scope is_refundable:
scope :is_refundable, -> { CashTransaction
.joins(:refunds)
.group('cash_transactions.id')
.having('cash_transactions.amount > sum(refunds.amount)')
}
Now it is available in a notation as short as
CashTransaction.is_refundable
which is shorter and clearer than the intended
CashTransaction.where('is_refundable = ?', true)
You can do it this way:
cash_transactions = CashTransaction.all.select { |x| x.is_refundable? } # Array
CashTransaction.where(id: cash_transactions.map(&:id)) # ActiveRecord_Relation
But this is an inefficient way of doing it, as the other answerers also mentioned.
You can do it using SQL if amount and total_refunded_amount are columns of the cash_transactions table in the database, which will be much more efficient and performant:
CashTransaction.where('amount > total_refunded_amount')
But if amount or total_refunded_amount are not actual columns in the database, you can't do it this way. Then I guess you have to do it the other way, which is less efficient than using raw SQL.
I think you should pre-compute the is_refundable result (in a new column) whenever a CashTransaction or one of its refunds (has_many, I suppose?) is updated, by using callbacks:
class CashTransaction
before_save :update_is_refundable
def update_is_refundable
# self. is required here; a bare is_refundable = ... would only set a local variable
self.is_refundable = amount > total_refunded_amount
end
def total_refunded_amount
self.refunds.sum(:amount)
end
end
class Refund
belongs_to :cash_transaction
after_save :update_cash_transaction_is_refundable
def update_cash_transaction_is_refundable
cash_transaction.update_is_refundable
cash_transaction.save!
end
end
Note: the above code should certainly be optimized to avoid some queries.
Then you can query the is_refundable column:
CashTransaction.where(is_refundable: true)
I think it's not bad to do this in two queries instead of a join, something like this:
def refundable
where('amount < ?', total_refunded_amount)
end
This does a single sum query, then uses the sum in the second query. When the tables grow larger, you might find that this is faster than doing a join in the database.

Find_or_initialize_by Issue

I have a DB Table for a Model entitled TradeDailyAverage. It has a date (DateTime) & averageprice (decimal) column
When I run this, I can't get the averageprice attribute to update:
newaverage = TradeDailyAverage.find_or_initialize_by_date(date)
newaverage.update_attributes :averageprice => dailyaverage
Further, when I run this, the date will show up, but the averageprice will not show up in rails console. It only shows up as blank:
newaverage = TradeDailyAverage.find_or_initialize_by_date(date)
puts newaverage.averageprice
puts newaverage.date
Is there anything special that I need to do to averageprice before I save it?
Here is the entire Rake task for your reference:
averages = Trade.where('date >= ?', 7.days.ago).average(:price, :group => "DATE_TRUNC('day', date - INTERVAL '1 hour')")
# Loops through each daily average produced above
averages.each do |date, avg|
# Converts the BigDecimal to Floating Point(?)
averagefloat = avg.to_f
# Rounds the Daily Average to only two decimal points
dailyaverage = number_with_precision(averagefloat, :precision => 2)
newaverage = TradeDailyAverage.find_or_initialize_by_date(date)
newaverage.update_attributes :averageprice => dailyaverage
end
If you want to use find_or_initialize_by you need to think carefully about the implications. Let's take your first example:
newaverage = TradeDailyAverage.find_or_initialize_by_date(date)
newaverage.update_attributes :averageprice => dailyaverage
This should work, when the TradeDailyAverage for the given date is already in the database. It should not work however, when you get a new record back. The reason is simply because a new record is not persisted to the database. There is no way for update_attributes to update a non persisted record. You have two options here:
1.) Do not use update_attributes; assign the value and call save instead. This works for both new and existing records:
newaverage = TradeDailyAverage.find_or_initialize_by_date(date)
newaverage.averageprice = dailyaverage
newaverage.save
2.) Do not use find_or_initialize_by but find_or_create_by. This way if the record does not exist, a new one is directly written to the database. Now update_attributes should work because you always get persisted records back. Note that this approach has the drawback that you save records without an averageprice to the database. I would not recommend that.
The explanation above should also explain your second example:
newaverage = TradeDailyAverage.find_or_initialize_by_date(date)
puts newaverage.averageprice
puts newaverage.date
This should output the averageprice and the date for persisted records. If you get a newly initialized record back though, it will only display the date. Since you only initialized the record with a date object, there is no way that the averageprice is already set.
My issue was simply that upon saving to my database, PostgreSQL was changing the hourly time, possibly due to a timezone issue. Thus, all of my instances above were new, and I couldn't update attributes of an existing model. So, I converted my datetime data to dates, changed my date db column to date instead of datetime, and now everything is working smoothly. Yves gives some great info above though which helped me later on.

Rails & Sqlite question - comparing dates to db timestamps

I'm trying to do this: map the total sales on a day to an array of dates for highcharts (yes my project is effectively exactly the same as the railscast example).
I'm unfortunately just ending up with a lot of 0s; I believe the piece in my model:
def self.total_revenue_on(date)
where("date(created_at) = ?", date).sum(:amt)
end
is failing to match the date to the datetime written in my database, e.g. "2011-07-21 09:22:28.388944+0000". Pretty sure that's where it's failing because if I remove the timezone piece manually from my database (get rid of "+0000" and leave just "2011-07-21 09:22:28.388944") it works just fine.
I think this is really a rails/sqlite question: am I storing the timestamp improperly, or comparing improperly? Any help is greatly appreciated!
The best practice is to use to_s(:db) for referencing datetimes in a database in Rails. Try:
def self.total_revenue_on(date)
where("date(created_at) = ?", date.to_s(:db)).sum(:amt)
end
OK, I managed to solve this by using a different lookup method:
def self.total_revenue_on(date)
where("datetime >= ? and datetime < ?", date, date + 1.day).sum(:amt)
end
Still completely perplexed by the problem with the original, but this seems to be working.
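The range lookup can be reasoned about in plain Ruby: a timestamp belongs to a day if it is at or after that day's midnight and strictly before the next midnight. A small sketch, assuming UTC throughout; day_bounds and on_day? are hypothetical helper names, not Rails methods.

```ruby
require 'date'

# Half-open range [midnight, next midnight) for a given day, in UTC (assumed).
def day_bounds(date)
  start = Time.utc(date.year, date.month, date.day)
  [start, start + 24 * 60 * 60]
end

# True when the timestamp falls on the given day.
def on_day?(timestamp, date)
  from, to = day_bounds(date)
  timestamp >= from && timestamp < to
end

day = Date.new(2011, 7, 21)
puts on_day?(Time.utc(2011, 7, 21, 9, 22, 28), day) # inside the day
puts on_day?(Time.utc(2011, 7, 22, 0, 0, 0), day)   # next midnight is excluded
```

Because this compares timestamps with timestamps, it does not depend on how the database parses the stored string into a date.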

Resources