Check for new records - ruby-on-rails

I have an interesting dilemma. I have an application where, when a new record (in this case a user) is created, it is published to another application (only specific information is published).
Now I could use User.last and get the latest and greatest. Publishing happens as soon as the record is saved, and it only takes a second. So assume I have 500 users signing up at once.
That's 500 new records published to the second app. For each of those I need to say:
If this user is new, do x with it, else ignore it.
I am using the whenever gem to create a cron job on the second app that checks every 5 seconds for new records. In that time 5 new records could come through, so I need to update the above statement to say:
If the record is 5 seconds old or younger, do x with it, else ignore it.
Can I do the following:
User.all.each do |u|
  if u # ... is 5 seconds or less old (placeholder condition)
    # do something here
  end
end
I am not sure what the if statement would be. Would it be u.created_at <= 5.seconds?

User.where('created_at >= ?', 5.seconds.ago) will find records that are no more than five seconds old. But it sounds like you might be better off with an API that would push create events to your second app.
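For the push approach, a minimal sketch using an after_commit callback in the first app (SecondAppClient and its publish method are hypothetical names, not an existing library):
class User < ActiveRecord::Base
  # Fire only after the INSERT commits, so the second app never
  # sees a user that later rolls back.
  after_commit :publish_to_second_app, on: :create

  private

  def publish_to_second_app
    # Hypothetical client for the second application's endpoint.
    SecondAppClient.publish(id: id, created_at: created_at)
  end
end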

You could. It would look like this:
gauge = Time.now - 5
User.all.each do |u|
  if u.created_at >= gauge
    # do something here
  end
end
The issue is that it takes time to run the loop. Even if you ran a more efficient query, you're still relying on timestamps for a high level of precision.
User.where('created_at >= ?', Time.now - 5)
If there's a delay in the system, say the computer takes 0.5 seconds to get to your query, you'll miss any users created in that 0.5-second gap. Better to add a published column to users, mark each record as published once it's processed, and then check for unpublished records.
users = User.where(published: nil)
users.each { |u| u.publish }
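A minimal migration sketch for that flag (the column and index names here are illustrative assumptions):
class AddPublishedToUsers < ActiveRecord::Migration
  def change
    # NULL means "not yet published"; the index keeps the polling query cheap.
    add_column :users, :published, :boolean
    add_index :users, :published
  end
end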
Or, as another post mentioned, look for an API that will push create events.

Related

Rails transaction isn't acting atomic?

I have an accounting system I wrote which follows standard dual-entry accounting practices.
There is a feature of dual-entry accounting called a 'trial balance' where you can verify the entire system is correct: when you run it, it will always equal 0.00.
I have written tests and always run my trial balance when the system is 'stopped', but under write-heavy database load while seeding lots of records, I noticed my trial balance is WRONG about 1 out of 10 tries.
When it's at rest (no inserts), it's always correct at 0.00, however.
When I insert transactions they're always in a transaction, like this:
2000.times do |i|
  ActiveRecord::Base.transaction do
    puts "#{i} ==================================================================="
    entry = JournalEntry.create!(description: 'Purchase mower on credit', user: user)
    entry.line_items.create!(amount: Money.from_amount(1551.75).cents, account: property.accounts.find_by(name: 'Equipment'), side: :debit)
    entry.line_items.create!(amount: Money.from_amount(1551.75).cents, account: property.accounts.find_by(name: 'Accounts Payable'), side: :credit)
  end
end
The fact it breaks under load makes me think I'm not understanding something vital about how Rails transactions work.
What could be causing this?
FWIW my trial balance function (GeneralLedger.new(property).trial_balance) executes the following pseudo-SQL (NOT in a transaction):
SELECT sum(...) WHERE account = 'asset'
SELECT sum(...) WHERE account = 'liability'
SELECT sum(...) WHERE account = 'equity'
SELECT sum(...) WHERE account = 'income'
SELECT sum(...) WHERE account = 'expense'
I then add them together according to the Accounting Formula to arrive at 0.00:
def trial_balance
  balance_category(:asset) - (balance_category(:liability) + balance_category(:equity) + balance_category(:income) - balance_category(:expense))
end
The balance_category function is what triggers each SELECT, for a total of 5 queries, once for each category.
Because it's returning a non-zero value, it must somehow be selecting while data is halfway inserted, and I have no idea how this is happening.
I could understand it if the creation of the journal entry/line items was not in a transaction and it was SELECTing half-inserted rows, but it should only see the group as a whole, after the transaction commits?
Your five SELECTs run as separate statements outside any transaction, so another session can commit between them; each sum then sees a different snapshot of the data, which is exactly the mismatch you're seeing. A single statement sees one consistent snapshot, so if you want to avoid repeated statements (and the race), collapse them into one, something of this form:
SELECT account, SUM(...) AS amount FROM ...
WHERE account IN ('asset', 'liability', ...)
GROUP BY account
You can fetch these like this:
where(account: ACCOUNT_TYPES).group(:account).pluck('account, SUM(...)')
Where ACCOUNT_TYPES is an array of the account types you need to fetch.
You can always take this pluck form and convert it to a quick look-up hash with .to_h, then use it like this:
balance_category = ...where(...)...pluck(...).to_h
balance_category[:asset]
If you need a default value, consider:
balance_category = Hash.new(0).merge(...where(...)...pluck(...).to_h)
Where that default can be an integer (0), a float (0.0), or anything at all.
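Putting that together, a hedged sketch; the LineItem model, the accounts join, and the category column are assumptions about your schema, not taken from the question:
ACCOUNT_TYPES = %w[asset liability equity income expense].freeze

# One statement, one snapshot: all five sums come from the same
# point-in-time view of the data, which removes the race above.
balances = Hash.new(0).merge(
  LineItem.joins(:account)
          .where(accounts: { category: ACCOUNT_TYPES })
          .group('accounts.category')
          .pluck(Arel.sql('accounts.category, SUM(line_items.amount)'))
          .to_h
)

trial_balance = balances['asset'] -
                (balances['liability'] + balances['equity'] +
                 balances['income'] - balances['expense'])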

How to make a job idempotent for 1 run every month (Rails 5)?

I need to generate invoices once per month, according to the EST/EDT timezone (the clients are throughout the country, but in this industry billing happens in the same timezone).
I'm creating a GenerateInvoicesJob, but I'm having trouble reasoning about a 100% bulletproof way to generate invoices so that there isn't any possible duplication/confusion with regard to:
Generate invoices only once per month
Let the job run every day
Make the job idempotent
Then the final point, which is the hard one for me: how do I ensure that there are no bugs around EST/EDT transitions, with an hour slipping through?
Here is my clock.rb:
every(1.day, 'GenerateInvoicesJob', tz: 'America/New_York', at: '04:00') do
  Delayed::Job.enqueue GenerateInvoicesJob.new, queue: 'high'
end
And here's the top of my job:
Unit.where(enabled: true)
    .joins(:user)
    .where('last_invoice_generated_at <= ?', Time.now.utc.end_of_month)
    .each do |unit|
  ActiveRecord::Base.transaction do
    unit.update_attributes(
      last_invoice_generated_at: Time.now.utc
    )
    invoice = Invoice.create!(
      ...
    )
    line_item = LineItem.create!(
      ...
    )
  end
end
I realize the direct conditional logic might be wrong, so that's not entirely my question... my main addition to that question is: what's the best way overall to do this so I can make sure that all times in EST are 100% accounted for, including weird off-by-one-hour bugs, etc.? This job is super important, so I'm anxious to get it perfect.
On top of that, I'm not sure whether I should store UTC in the database... normally I know you're always supposed to store UTC, but UTC doesn't have DST, so I'm afraid that if I store it like that, the job could run at the wrong time and invoices would not be generated properly.
I would do something like this in the worker:
# `beginning_of_month` because we want to load units that haven't
# been billed this month
units_to_bill = Unit.where(enabled: true)
                    .where('last_invoice_generated_at < ?', Time.current.beginning_of_month)

# `find_each` because it needs less memory
units_to_bill.find_each do |unit|
  # Begin a transaction to ensure all or nothing is updated
  Unit.transaction do
    # reload the unit, because it might have been updated by another
    # task in the meantime
    unit.reload

    # lock the current unit for updates
    unit.lock!

    # check if the condition is still true
    if unit.last_invoice_generated_at < 1.month.ago
      # generate invoices
      # invoice = Invoice.create!(
      # line_item = LineItem.create!(

      # last step: update the unit
      unit.update_attributes(
        last_invoice_generated_at: Time.current
      )
    end
  end
end
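On the DST worry: keep storing UTC in the database, and compute the month boundary in the billing zone. A minimal sketch, assuming the billing calendar follows America/New_York (the tz database handles the EST/EDT shift for you):
Time.use_zone('America/New_York') do
  # Zone-aware boundary; Rails converts it to UTC when it reaches
  # the database, so stored timestamps stay in UTC.
  month_start = Time.current.beginning_of_month

  units_to_bill = Unit.where(enabled: true)
                      .where('last_invoice_generated_at < ?', month_start)
  # ... then proceed exactly as in the worker above
end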

ActiveRecord query for unique on a given column, given a certain order

I have Events, each with a date and a name, that has_many Minutes.
I need to find all the events that are the first one by date to have the given name, then collect all of the minutes associated with those events.
Event.order(date: :asc).uniq_by(&:name).map(&:minutes).flatten
almost works, except that I end up with an array and not an ActiveRecord::Relation, and I'd really like to start from Minute and make this into a scope, like... Minute.first_time_events gives me all the minutes from events that were the first event by date with their name.
One final caveat: I'm using Rails 3.2. Thanks
If you don't mind having raw SQL in your app, you could do it all in one call with
Minute.where(event_id: Event.joins(
  "INNER JOIN (SELECT name, MIN(date) AS minDate FROM events GROUP BY name) groupedEvents " \
  "ON events.name = groupedEvents.name AND events.date = groupedEvents.minDate"
))
If you don't mind making two calls to the db, then I would definitely go with the readability of Max's suggestion:
ids = Event.order(date: :asc).uniq_by(&:name).map(&:id)
minutes = Minute.where(event_id: ids)
I would pull the ids of the events and then use that in a scope to fetch Minutes.
ids = Event.uniq(:name).order(date: :asc).ids
minutes = Minute.joins(:event).where(events: { id: ids })
The caveat here is that this is how I would do it in Rails 4. Some backporting may be necessary, but you should really be focusing on updating your app, since 3.2 will be EOL any day now.
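To get the scope shape the question asked for, a hedged Rails 3.2-era sketch reusing the grouped sub-select from the first answer (first_time_events is the asker's name; the rest is illustrative):
class Minute < ActiveRecord::Base
  belongs_to :event

  # Minutes belonging to the earliest event (by date) of each name.
  def self.first_time_events
    where(event_id: Event.joins(
      "INNER JOIN (SELECT name, MIN(date) AS minDate FROM events GROUP BY name) groupedEvents " \
      "ON events.name = groupedEvents.name AND events.date = groupedEvents.minDate"
    ).select(:id))
  end
end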

How do I ensure a model always uses a transaction and locks (in Rails)?

I noticed that Rails can have concurrency issues with multiple servers and would like to force my model to always lock. Is this possible in Rails, similar to unique constraints to force data integrity? Or does it just require careful programming?
Terminal One
irb(main):033:0* Vote.transaction do
irb(main):034:1* v = Vote.lock.first
irb(main):035:1> v.vote += 1
irb(main):036:1> sleep 60
irb(main):037:1> v.save
irb(main):038:1> end
Terminal Two, while sleeping
irb(main):240:0* Vote.transaction do
irb(main):241:1* v = Vote.first
irb(main):242:1> v.vote += 1
irb(main):243:1> v.save
irb(main):244:1> end
DB Start
select * from votes where id = 1;
id | vote | created_at | updated_at
----+------+----------------------------+----------------------------
1 | 0 | 2013-09-30 02:29:28.740377 | 2013-12-28 20:42:58.875973
After execution
Terminal One
irb(main):040:0> v.vote
=> 1
Terminal Two
irb(main):245:0> v.vote
=> 1
DB End
select * from votes where id = 1;
id | vote | created_at | updated_at
----+------+----------------------------+----------------------------
1 | 1 | 2013-09-30 02:29:28.740377 | 2013-12-28 20:44:10.276601
Other Example
http://rhnh.net/2010/06/30/acts-as-list-will-break-in-production
You are correct that transactions by themselves don't protect against many common concurrency scenarios, incrementing a counter being one of them. There isn't a general way to force a lock; you have to ensure you use it everywhere necessary in your code.
For the simple counter incrementing scenario there are two mechanisms that will work well:
Row Locking
Row locking will work as long as you do it everywhere in your code where it matters. Knowing where it matters may take some experience to get an instinct for :/. If, as in your above code, you have two places where a resource needs concurrency protection and you only lock in one, you will have concurrency issues.
You want to use the with_lock form; this does a transaction and a row-level lock. (Table locks are obviously going to scale much more poorly than row locks, although for tables with few rows there is no difference, as PostgreSQL (not sure about MySQL) will use a table lock anyway.) It looks like this:
v = Vote.first
v.with_lock do
  v.vote += 1
  sleep 10
  v.save
end
The with_lock creates a transaction, locks the row the object represents, and reloads the object's attributes, all in one step, minimizing the opportunity for bugs in your code. However, this does not necessarily help you with concurrency issues involving the interaction of multiple objects. It can work if (a) all possible interactions depend on one object, and you always lock that object, and (b) the other objects each only interact with one instance of that object, e.g. locking a user row and doing stuff with objects which all belong_to (possibly indirectly) that user object.
Serializable Transactions
The other possibility is to use serializable transactions. Since 9.1, PostgreSQL has had "real" serializable transactions. These can perform much better than locking rows (though it is unlikely to matter in the simple counter-incrementing use case).
The best way to understand what serializable transactions give you is this: if you take all the possible orderings of all the (isolation: :serializable) transactions in your app, what happens when your app is running is guaranteed to always correspond with one of those orderings. With ordinary transactions this is not guaranteed to be true.
However, what you have to do in exchange is to take care of what happens when a transaction fails because the database is unable to guarantee that it was serializable. In the case of the counter increment, all we need to do is retry:
begin
  Vote.transaction(isolation: :serializable) do
    v = Vote.first
    v.vote += 1
    sleep 10 # this is to simulate concurrency
    v.save
  end
rescue ActiveRecord::StatementInvalid => e
  sleep rand / 100 # this is NECESSARY in scalable real-world code,
                   # although the amount of sleep is something you can tune
  retry
end
Note the random sleep before the retry. This is necessary because failed serializable transactions have a non-trivial cost, so if we don't sleep, multiple processes contending for the same resource can swamp the db. In a heavily concurrent app you may need to gradually increase the sleep with each retry. The randomness is VERY important to avoid harmonic deadlocks: if all the processes sleep the same amount of time, they can get into a rhythm with each other, where they all sleep, the system is idle, then they all try for the lock at the same time and the system deadlocks, causing all but one to sleep again.
When the transaction that needs to be serializable involves interaction with a source of concurrency other than the database, you may still have to use row-level locks to accomplish what you need. An example of this would be when a state machine transition determines what state to transition to based on a query to something other than the db, like a third-party API. In this case you need to lock the row representing the object with the state machine while the third party API is queried. You cannot nest transactions inside serializable transactions, so you would have to use object.lock! instead of with_lock.
Another thing to be aware of is that any objects fetched outside the transaction(isolation: :serializable) should have reload called on them before use inside the transaction.
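A minimal sketch of that combination (ThirdPartyApi and the state column are hypothetical names, not from the question):
Vote.transaction(isolation: :serializable) do
  v = Vote.first
  v.lock! # row lock (and reload) inside the existing transaction
  # external source of concurrency, queried while the row is held
  v.state = ThirdPartyApi.next_state_for(v.id) # hypothetical client
  v.save!
end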
ActiveRecord always wraps save operations in a transaction.
For your simple case it might be best to just use a SQL update instead of performing logic in Ruby and then saving. Here is an example which adds a model method to do this:
class Vote < ActiveRecord::Base
  def vote!
    # A single SQL UPDATE; no read-modify-write race in Ruby.
    self.class.where(id: id).update_all('vote = vote + 1')
  end
end
This method avoids the need for locking in your example. If you need more general database locking, see David's suggestion.
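For instance, a quick usage sketch (the increment happens entirely in SQL, so concurrent callers are serialized by the database itself):
Vote.first.vote!
# issues roughly: UPDATE votes SET vote = vote + 1 WHERE id = ...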
You can do the following in your model, like so:
class Vote < ActiveRecord::Base
  validate :handle_conflict, on: :update

  attr_accessible :original_updated_at
  attr_writer :original_updated_at

  def original_updated_at
    @original_updated_at || updated_at
  end

  def handle_conflict
    # If we want to use this across multiple models
    # then extract this into a module
    if @conflict || updated_at.to_f > original_updated_at.to_f
      @conflict = true
      @original_updated_at = nil
      # If two updates are made at the same time, a validation error
      # is displayed along with the fields that changed
      errors.add :base, 'This record changed while you were editing'
      changes.each do |attribute, values|
        errors.add attribute, "was #{values.first}"
      end
    end
  end
end
The original_updated_at is a virtual attribute that gets set from the form. handle_conflict is fired when the record is updated; it checks whether the updated_at attribute in the database is later than the hidden one (defined on your page). By the way, you should define the following in your app/views/votes/_form.html.erb:
<%= f.hidden_field :original_updated_at %>
If there is a conflict, the validation error is raised.
And if you are using Rails 4 you won't have attr_accessible, and will need to add :original_updated_at to your vote_params method in your controller.
Hopefully this sheds some light.
For a simple +1:
Vote.increment_counter :vote, Vote.first.id
Because vote is used both for the table name (votes) and the field, this is how the two map:
ModelName.increment_counter :field_name, id_of_the_row

Update database after duration (Ruby on Rails)

I have a Cycle model with two fields: duration (string) and completed (boolean). When a user creates a cycle, they enter the duration (let's say 30 minutes) and the cycle is set to not complete (boolean 0). How do I update that database entry after the cycle duration (30 minutes) has elapsed to mark the cycle as complete (boolean 1)? Is there a way to handle this with Ruby/Rails code, or do I have to execute a JavaScript function?
The goal is to be able to find and display all completed cycles using Cycle.all(:conditions..) and query the SQL database. I wrote a "complete?" method on the Cycle model that compares the age of the cycle to the duration, but this is useless for SQL find methods.
What's the best way to tackle this? Thanks!
Define a rake task that runs something like:
desc "Expire old cycles"
task :cron => :environment do
  expired = Cycle.all :conditions => ["expiration < ?", DateTime.now]
  expired.each { |c| c.expire! }
end
Where c.expire! is a method that'll mark it as expired in the database. Then set up rake cron to run every N minutes via a cronjob.
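If you're using the whenever gem (mentioned in the first question above), a minimal config/schedule.rb sketch (the 10-minute interval is illustrative):
every 10.minutes do
  rake 'cron'
end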
If you're comfortable doing this in SQL, you can optimize by writing a single query: UPDATE cycles SET complete = 1 WHERE expiration < NOW();
You can add another field, let's say Expired_time, which is set to when the cycle will be complete. For example:
# Here is an example record:
Duration    Created_at    Expired_time
30 mins     Time          Time + 30 mins
Now simply compare the current time with Expired_time to check whether the cycle is complete or not.
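A hedged scope sketch of that approach, assuming an expired_time datetime column (so the check runs in SQL, which is what the question wanted):
class Cycle < ActiveRecord::Base
  # Completed cycles are those whose expiry time has passed.
  scope :completed, lambda { where('expired_time <= ?', Time.now.utc) }
end

Cycle.completed # all completed cycles, resolved by the database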
