Activerecord transaction concurrency race condition issues

Activerecord transaction concurrency race condition issues - ruby-on-rails

I'm currently doing live testing of a game I'm making for Android. The services are written in rails 3.1 and I'm using Postgresql. Some of my more technically savvy testers have been able to manipulate the game by recording their requests to the server and replaying them with high concurrency. I'll try to briefly describe the scenario below without getting caught up in the code.
A user can purchase multiple items, each item has its own record in the database.
The request goes to a controller action, which creates a purchase model to record information about the transaction.
The trade model has a method that sets up the purchase of the items. It essentially does a few logical steps to see if they can purchase the item. The most important is that they have a limit of 100 items per user at any given time. If all the conditions pass, a simple loop is used to create the number of items they requested.
So, what they are doing is, recording 1 valid request purchase via a proxy. Then replaying it with high concurrency, which essentially is allowing a few extra to slip through each time. So if they set it to purchase 100 quantity, they can get it up to 300-400 or if they do 15 quantity, they can get it up to like 120.
The above purchase method is wrapped in a transaction. However, even though its wrapped it won't stop it in certain circumstances where the requests are executing nearly at the same time. I'm guessing this may require some DB locking. Another twist in this that needs to be known is that at any given time rake task are being ran in cron jobs against the user table to update the players health and energy attributes. So, that cannot be blocked either.
Any assistance would be really awesome. This is my little hobby side project and I want to make sure the game is fair and fun for everyone.
Thanks so much!
Controller action:
def hire
worker_asset_type_id = (params[:worker_asset_type_id])
quantity = (params[:quantity])
trade = Trade.new()
trade_response = trade.buy_worker_asset(current_user, worker_asset_type_id, quantity)
user = User.find(current_user.id, select: 'money')
respond_to do |format|
format.json {
render json: {
trade: trade,
user: user,
messages: {
messages: [trade_response.to_s]
}
}
}
end
end
Trade Model Method:
def buy_worker_asset(user, worker_asset_type_id, quantity)
ActiveRecord::Base.transaction do
if worker_asset_type_id.nil?
raise ArgumentError.new("You did not specify the type of worker asset.")
end
if quantity.nil?
raise ArgumentError.new("You did not specify the amount of worker assets you want to buy.")
end
if quantity <= 0
raise ArgumentError.new("Please enter a quantity above 0.")
end
quantity = quantity.to_i
worker_asset_type = WorkerAssetType.where(id: worker_asset_type_id).first
if worker_asset_type.nil?
raise ArgumentError.new("There is no worker asset of that type.")
end
trade_cost = worker_asset_type.min_cost * quantity
if (user.money < trade_cost)
raise ArgumentError.new("You don't have enough money to make that purchase.")
end
# Get the users first geo asset, this will eventually have to be dynamic
potential_total = WorkerAsset.where(user_id: user.id).length + quantity
# Catch all for most people
if potential_total > 100
raise ArgumentError.new("You cannot have more than 100 dealers at the current time.")
end
quantity.times do
new_worker_asset = WorkerAsset.new()
new_worker_asset.worker_asset_type_id = worker_asset_type_id
new_worker_asset.geo_asset_id = user.geo_assets.first.id
new_worker_asset.user_id = user.id
new_worker_asset.clocked_in = DateTime.now
new_worker_asset.save!
end
self.buyer_id = user.id
self.money = trade_cost
self.worker_asset_type_id = worker_asset_type_id
self.trade_type_id = TradeType.where(name: "market").first.id
self.quantity = quantity
# save trade
self.save!
# is this safe?
user.money = user.money - trade_cost
user.save!
end
end

Sounds like you need idempotent requests so that request replay is ineffective. Where possible implement operations so that repeating them has no effect. Where not possible, give each request a unique request identifier and record whether requests have been satisfied or not. You can keep the request ID information in an UNLOGGED table in PostgreSQL or in redis/memcached since you don't need it to be persistent. This will prevent a whole class of exploits.
To deal with just this one problem create an AFTER INSERT OR DELETE ... FOR EACH ROW EXECUTE PROCEDURE trigger on the user items table. Have this trigger:
BEGIN
-- Lock the user so only one tx can be inserting/deleting items for this user
-- at the same time
SELECT 1 FROM user WHERE user_id = <the-user-id> FOR UPDATE;
IF TG_OP = 'INSERT' THEN
IF (SELECT count(user_item_id) FROM user_item WHERE user_item.user_id = <the-user-id>) > 100 THEN
RAISE EXCEPTION 'Too many items already owned, adding this item would exceed the limit of 100 items';
END IF;
ELIF TG_OP = 'DELETE' THEN
-- No action required, all we needed to do is take the lock
-- so a concurrent INSERT won't run until this tx finishes
ELSE
RAISE EXCEPTION 'Unhandled trigger case %',TG_OP;
END IF;
RETURN NULL;
END;
Alternately, you can implement the same thing in the Rails application by taking row-level lock on the customer ID before adding or deleting any item ownership records. I prefer to do this sort of thing in triggers where you can't forget to apply it somewhere, but I realise you might prefer to do it at the app level. See Pessimistic locking.
Optimistic locking is not a great fit for this application. You can use it by incrementing the lock counter on the user before adding/removing items, but it'll cause row churn on the users table and is really unnecessary when your transactions will be so short anyway.

We can't help much unless you show us your relevant schema and queries. I suppose that you do something like:
$ start transaction;
$ select amount from itemtable where userid=? and itemid=?;
15
$ update itemtable set amount=14 where userid=? and itemid=?;
commit;
An you should do something like:
$ start transaction;
$ update itemtable set amount=amount-1 returning amount where userid=? and itemid=?;
14
$ commit;

Related

Stripe API auto_paging get all Stripe::BalanceTransaction except some charge

I'm trying to get all Stripe::BalanceTransaction except those they are already in my JsonStripeEvent
What I did =>
def perform(*args)
last_recorded_txt = REDIS.get('last_recorded_stripe_txn_last')
txns = Stripe::BalanceTransaction.all(limit: 100, expand: ['data.source', 'data.source.application_fee'], ending_before: last_recorded_txt)
REDIS.set('last_recorded_stripe_txn_last', txns.data[0].id) unless txns.data.empty?
txns.auto_paging_each do |txn|
if txn.type.eql?('charge') || txn.type.eql?('payment')
begin
JsonStripeEvent.create(data: txn.to_json)
rescue StandardError => e
Rails.logger.error "Error while saving data from stripe #{e}"
REDIS.set('last_recorded_stripe_txn_last', txn.id)
break
end
end
end
end
But It doesnt get the new one from the API.
Can anyone could help me for this ? :)
Thanks

I think it's because the way auto_paging_each works is almost opposite to what you expect :)
As you can see from its source, auto_paging_each calls Stripe::ListObject#next_page, which is implemented as follows:
def next_page(params={}, opts={})
return self.class.empty_list(opts) if !has_more
last_id = data.last.id
params = filters.merge({
:starting_after => last_id,
}).merge(params)
list(params, opts)
end
It simply takes the last (already fetched) item and adds its id as the starting_after filter.
So what happens:
You fetch 100 "latest" (let's say) records, ordered by descending date (default order for BalanceTransaction API according to Stripe docs)
When you call auto_paging_each on this dataset then, it takes the last record, adds its id as the
starting_after filter and repeats the query.
The repeated query returns nothing because there are noting newer (starting later) than the set you initially fetched.
As far as there are no more newer items available, the iteration stops after the first step
What you could do here:
First of all, ensure that my hypothesis is correct :) - put the breakpoint(s) inside Stripe::ListObject and check. Then 1) rewrite your code to use starting_after traversing logic instead of ending_before - it should work fine with auto_paging_each then - or 2) rewrite your code to control the fetching order manually.
Personally, I'd vote for (2): for me slightly more verbose (probably), but straightforward and "visible" control flow is better than poorly documented magic.

Getting a Primary Key error in Rails using Sidekiq and Sidekiq-Cron

I have a Rails project that uses Sidekiq for worker tasks, and Sidekiq-Cron to handle scheduling. I am running into a problem, though. I built a controller (below) that handled all of my API querying, validation of data, and then inserting data into the database. All of the logic functioned properly.
I then tore out the section of code that actually inserts API data into the database, and moved it into a Job class. This way the Controller method could simply pass all of the heavy lifting off to a job. When I tested it, all of the logic functioned properly.
Finally, I created a Job that would call the Controller method every minute, do the validation checks, and then kick off the other Job to save the API data (if necessary). When I do this the first part of the logic seems to work, where it inserts new event data, but the logic where it checks to see if this is the first time we've seen an event for a specific object seems to be failing. The result is a Primary Key violation in PG.
Code below:
Controller
require 'date'
class MonnitOpenClosedSensorsController < ApplicationController
def holderTester()
#MonnitschedulerJob.perform_later(nil)
end
# Create Sidekiq queue to process new sensor readings
def queueNewSensorEvents(auth_token, network_id)
m = Monnit.new("iMonnit", 1)
# Construct the query to select the most recent communication date for each sensor in the network
lastEventForEachSensor = MonnitOpenClosedSensor.select('"SensorID", MAX("LastCommunicationDate") as "lastCommDate"')
lastEventForEachSensor = lastEventForEachSensor.group("SensorID")
lastEventForEachSensor = lastEventForEachSensor.where('"CSNetID" = ?', network_id)
todaysDate = Date.today
sevenDaysAgo = (todaysDate - 7)
lastEventForEachSensor.each do |event|
# puts event["lastCommDate"]
recentEvent = MonnitOpenClosedSensor.select('id, "SensorID", "LastCommunicationDate"')
recentEvent = recentEvent.where('"CSNetID" = ? AND "SensorID" = ? AND "LastCommunicationDate" = ?', network_id, event["SensorID"], event["lastCommDate"])
recentEvent.each do |recent|
message = m.get_extended_sensor(auth_token, recent["SensorID"])
if message["LastDataMessageMessageGUID"] != recent["id"]
MonnitopenclosedsensorJob.perform_later(auth_token, network_id, message["SensorID"])
# puts "hi inner"
# puts message["LastDataMessageMessageGUID"]
# puts recent['id']
# puts recent["SensorID"]
# puts message["SensorID"]
# raise message
end
end
end
# Queue up any Sensor Events for new sensors
# This would be sensors we've never seen before, from a Postgres standpoint
sensors = m.get_sensor_ids(auth_token)
sensors.each do |sensor|
sensorCheck = MonnitOpenClosedSensor.select(:SensorID)
# sensorCheck = MonnitOpenClosedSensor.select(:SensorID)
sensorCheck = sensorCheck.group(:SensorID)
sensorCheck = sensorCheck.where('"CSNetID" = ? AND "SensorID" = ?', network_id, sensor)
# sensorCheck = sensorCheck.where('id = "?"', sensor["LastDataMessageMessageGUID"])
if sensorCheck.any? == false
MonnitopenclosedsensorJob.perform_later(auth_token, network_id, sensor)
end
end
end
end
The above code breaks Sensor Events for new sensors. It doesn't recognize that a sensor already exists, first issue, and then doesn't recognize that the event it is trying to create is already persisted to the database (uses a GUID for comparison).
Job to persist data
class MonnitopenclosedsensorJob < ApplicationJob
queue_as :default
def perform(auth_token, network_id, sensor)
m = Monnit.new("iMonnit", 1)
newSensor = m.get_extended_sensor(auth_token, sensor)
sensorRecord = MonnitOpenClosedSensor.new
sensorRecord.SensorID = newSensor['SensorID']
sensorRecord.MonnitApplicationID = newSensor['MonnitApplicationID']
sensorRecord.CSNetID = newSensor['CSNetID']
lastCommunicationDatePretty = newSensor['LastCommunicationDate'].scan(/[0-9]+/)[0].to_i / 1000.0
nextCommunicationDatePretty = newSensor['NextCommunicationDate'].scan(/[0-9]+/)[0].to_i / 1000.0
sensorRecord.LastCommunicationDate = Time.at(lastCommunicationDatePretty)
sensorRecord.NextCommunicationDate = Time.at(nextCommunicationDatePretty)
sensorRecord.id = newSensor['LastDataMessageMessageGUID']
sensorRecord.PowerSourceID = newSensor['PowerSourceID']
sensorRecord.Status = newSensor['Status']
sensorRecord.CanUpdate = newSensor['CanUpdate'] == "true" ? 1 : 0
sensorRecord.ReportInterval = newSensor['ReportInterval']
sensorRecord.MinimumThreshold = newSensor['MinimumThreshold']
sensorRecord.MaximumThreshold = newSensor['MaximumThreshold']
sensorRecord.Hysteresis = newSensor['Hysteresis']
sensorRecord.Tag = newSensor['Tag']
sensorRecord.ActiveStateInterval = newSensor['ActiveStateInterval']
sensorRecord.CurrentReading = newSensor['CurrentReading']
sensorRecord.BatteryLevel = newSensor['BatteryLevel']
sensorRecord.SignalStrength = newSensor['SignalStrength']
sensorRecord.AlertsActive = newSensor['AlertsActive']
sensorRecord.AccountID = newSensor['AccountID']
sensorRecord.CreatedOn = Time.now.getutc
sensorRecord.CreatedBy = "Monnit Open Closed Sensor Job"
sensorRecord.LastModifiedOn = Time.now.getutc
sensorRecord.LastModifiedBy = "Monnit Open Closed Sensor Job"
sensorRecord.save
sensorRecord = nil
end
end
Job to call controller every minute
class MonnitschedulerJob < ApplicationJob
queue_as :default
def perform(*args)
m = Monnit.new("iMonnit", 1)
getImonnitUsers = ImonnitCredential.select('"auth_token", "username", "password"')
getImonnitUsers.each do |user|
# puts user["auth_token"]
# puts user["username"]
# puts user["password"]
if user["auth_token"] != nil
m.logon(user["auth_token"])
else
auth_token = m.get_auth_token(user["username"], user["password"])
auth_token = auth_token["Result"]
end
network_list = m.get_network_list(auth_token)
network_list.each do |network|
# puts network["NetworkID"]
MonnitOpenClosedSensorsController.new.queueNewSensorEvents(auth_token, network["NetworkID"])
end
end
end
end
Sorry about the length of the post. I tried to include as much information as I could about the code involved.
EDIT
Here is the code for the extended sensor, along with the JSON response:
def get_extended_sensor(auth_token, sensor_id)
response = self.class.get("/json/SensorGetExtended/#{auth_token}?SensorID=#{sensor_id}")
if response['Result'] != "Invalid Authorization Token"
response['Result']
else
response['Result']
end
end
{
"Method": "SensorGetExtended",
"Result": {
"ReportInterval": 180,
"ActiveStateInterval": 180,
"InactivityAlert": 365,
"MeasurementsPerTransmission": 1,
"MinimumThreshold": 4294967295,
"MaximumThreshold": 4294967295,
"Hysteresis": 0,
"Tag": "",
"SensorID": 189092,
"MonnitApplicationID": 9,
"CSNetID": 24391,
"SensorName": "Open / Closed - 189092",
"LastCommunicationDate": "/Date(1500999632000)/",
"NextCommunicationDate": "/Date(1501010432000)/",
"LastDataMessageMessageGUID": "d474b3db-d843-40ba-8e0e-8c4726b61ec2",
"PowerSourceID": 1,
"Status": 0,
"CanUpdate": true,
"CurrentReading": "Open",
"BatteryLevel": 100,
"SignalStrength": 84,
"AlertsActive": true,
"CheckDigit": "QOLP",
"AccountID": 14728
}
}

Some thoughts:
recentEvent = MonnitOpenClosedSensor.select('id, "SensorID", "LastCommunicationDate"') -
this is not doing any ordering; you are presuming that the records you retrieve here are the latest records.
m = Monnit.new("iMonnit", 1)
newSensor = m.get_extended_sensor(auth_token, sensor)
without the implementation details of get_extended_sensor it's impossible to tell you how
sensorRecord.id = newSensor['LastDataMessageMessageGUID']
is resolving.
It's highly likely that you are getting duplicate messages. It's almost never a good idea to use input data as a primary key - rather autogenerate a GUID in your job, use that as the primary key, and then use the LastDataMessageMessageGUID as a correlation id.

So the issue that I was running into, as it turns out, is as follows:
A sensor event was pulled from the API and queued up in as a worker job in Sidekiq.
If the queue is running a bit slow, API speed or simply a lot of jobs to process, the 1 minute poll might hit again and pull the same sensor event down and queue it up.
As the queue processes, the sensor event gets inserted into the database with it's GUID being the primary key
As the queue continues to catch up with itself, it hits the same event that was scheduled as a secondary job. This job then fails.
My solution to this was to move my "does this SensorID and GUID exist in the database" to the actual job. So when the job ran the first thing it'd do is check AGAIN for the record to already exist. This means I am checking twice, but this quick check has low overhead.
There is still the risk that a check could happen and pass while another job is inserting the record, before it commits it to the database, and then it could fail. But the retry would catch it, and then clear it on out as a successful process when the check doesn't validate on the second round. Having said that, however, the check occurs AFTER the API data has been pulled. Since, in theory, the database persist of a single record from the API data would happen really fast (much faster than the API call would happen), it really does lower the chances of you having to hit a retry on any job....and I mean you'd have a better chance of hitting the lottery than having the second check fail and trigger a retry.
If anyone else has a better, or more clean solution, please feel free to include it as a secondary answer!

Log long-running transactions in Rails

I'm getting "Mysql2::Error: Lock wait timeout exceeded" errors in my production Rails app, and I'm looking for help debugging which transaction is locking the tables for an excessively long time. MySQL has a "slow query" log, but not a log of slow transactions, as far as I can tell.
Is there a way to log information about how long transactions take directly from ActiveRecord?

My team logs slow transactions like this:
In lib/slow_transaction_logger.rb:
module SlowTransactionLogger
LOG_IF_SLOWER_THAN_SECONDS = ENV.fetch("LOG_TRANSACTIONS_SLOWER_THAN_SECONDS", "5").to_f
def transaction(*args)
start = Time.now
value = super
finish = Time.now
duration = finish - start
return value if duration <= LOG_IF_SLOWER_THAN_SECONDS
backtrace = caller.
select { |path| path.start_with?(Rails.root.to_s) }.
reject { |row|
row.include?("/gems/") ||
row.include?("/app/middleware/") ||
row.include?("/config/initializers/")
}
puts "slow_transaction_logger: duration=#{duration.round(2)} start=#{start.to_s.inspect} finish=#{finish.to_s.inspect} class=#{name} caller=#{backtrace.to_json}"
value
end
end
In config/initializers/active_record.rb:
class ActiveRecord::Base
class << self
prepend SlowTransactionLogger
end
end
So by default, it logs transactions slower than 5 seconds, in every environment.
If you set a LOG_TRANSACTIONS_SLOWER_THAN_SECONDS environment variable (e.g. on Heroku), you can tweak that limit.
We log with puts intentionally because it ends up in the logs in Heroku, and we also we see it more easily in dev and tests. But feel free to use Rails.logger if you prefer.

How do I iterate on a collection when I don't know what the upper limit of iterations is?

I have an API that I am pulling data from, and I want to collect all the tags from this API...but I don't know the number of tags in advance, and the API throttles access via the max number of results returned in any 1 call (100). It has an unlimited number of pages though.
So a call may look like this: Tag.update_tags(100, 5) where 100 is the max number of objects returned in 1 call and 5 is the page to begin (i.e. if you assume that the tags are stored sequentially, what this is saying is return the tag records with IDs in the range of 401 - 500.
The issue is, I don't want to manually have to enter 5 (i.e. I don't know what the upper limit is). There is no way for me to ping the total number of tags (if there were, I would simply divide it and put this call in a loop up to that number).
All I do know is that once it reaches a page that doesn't have any results, it will return an empty array [].
So, how do I loop through all the tags and stop when the result returned is an empty array (which would be the final result returned and therefore not evaluated)?
What does that loop look like?

Use an unconditional loop with a break statement when the result returns the empty array.
i = 1
loop do
result = call_to_api(i)
do_something_with(result)
i += 1
break if result.empty?
end
Of course in a production scenario you want something a little more robust, including exception handlers, some progress log reporting, and some kind of concrete iteration limit to ensure that the loop does not become infinite.
Update
Here's an example using a class to wrap up the logic.
class Api
DEFAULT_OPTIONS = {:start_position => 1, :max_iterations => 1000}
def initialize(base_uri, config)
#config = DEFAULT_OPTIONS.merge(config)
#position = config[:start_position]
#results_count = 0
end
def each(&block)
advance(&block) while can_advance?
log("Processed #{#results_count} results")
end
def advance(&block)
yield result
#results_count += result.count
#position += 1
#current_result = nil
end
def result
#current_result ||= begin
response = Net::HTTP.get_response(current_uri)
JSON.decode(response.body)
rescue
# provide some exception handling/logging
end
end
def can_advance?
#position < (#config[:start_position] + #config[:max_iterations]) && result.any?
end
def current_uri
Uri.parse("#{#base_uri}?page=#{#position}")
end
end
api = Api.new('http://somesite.com/api/v1/resource')
api.each do |result|
do_something_with(result)
end
There's also an angle with this to allow for concurrency by setting the start and iteration count for each thread which would definetly speed this up with the concurrent http requests.

Hmmm. You can get 100 items at a time, and start at a particular page. How to implement the iteration depends on what you want to do. Let's suppose that you want to collect all the unique tags. Establish a map (for example, a HashMap), then retrieve one page at a time and process it. When you hit a page that's empty, you're done.
// Implements a map and methods to update it
MyHashMap uniqueTags;
// Stores a page of tags
Page page;
Do
// get a page of tags
page = readTags();
if (page != null) {
uniqueTags.getUniqueTags(page);
} else {
break;
}
until (page == null);

find_or_create and race-condition in rails, theory and production

Hi I've this piece of code
class Place < ActiveRecord::Base
def self.find_or_create_by_latlon(lat, lon)
place_id = call_external_webapi
result = Place.where(:place_id => place_id).limit(1)
result = Place.create(:place_id => place_id, ... ) if result.empty? #!
result
end
end
Then I'd like to do in another model or controller
p = Post.new
p.place = Place.find_or_create_by_latlon(XXXXX, YYYYY) # race-condition
p.save
But Place.find_or_create_by_latlon takes too much time to get the data if the action executed is create and sometimes in production p.place is nil.
How can I force to wait for the response before execute p.save ?
thanks for your advices

You're right that this is a race condition and it can often be triggered by people who double click submit buttons on forms. What you might do is loop back if you encounter an error.
result = Place.find_by_place_id(...) ||
Place.create(...) ||
Place.find_by_place_id(...)
There are more elegant ways of doing this, but the basic method is here.

I had to deal with a similar problem. In our backend a user is is created from a token if the user doesn't exist. AFTER a user record is already created, a slow API call gets sent to update the users information.
def self.find_or_create_by_facebook_id(facebook_id)
User.find_by_facebook_id(facebook_id) || User.create(facebook_id: facebook_id)
rescue ActiveRecord::RecordNotUnique => e
User.find_by_facebook_id(facebook_id)
end
def self.find_by_token(token)
facebook_id = get_facebook_id_from_token(token)
user = User.find_or_create_by_facebook_id(facebook_id)
if user.unregistered?
user.update_profile_from_facebook
user.mark_as_registered
user.save
end
return user
end
The step of the strategy is to first remove the slow API call (in my case update_profile_from_facebook) from the create method. Because the operation takes so long, you are significantly increasing the chance of duplicate insert operations when you include the operation as part of the call to create.
The second step is to add a unique constraint to your database column to ensure duplicates aren't created.
The final step is to create a function that will catch the RecordNotUnique exception in the rare case where duplicate insert operations are sent to the database.
This may not be the most elegant solution but it worked for us.

I hit this inside a sidekick job that retries and gets the error repeatedly and eventually clears itself. The best explanation I've found is on a blog post here. The gist is that postgres keeps an internally stored value for incrementing the primary key that gets messed up somehow. This rings true for me because I'm setting the primary key and not just using an incremented value so that's likely how this cropped up. The solution from the comments in the link above appears to be to call ActiveRecord::Base.connection.reset_pk_sequence!(table_name) This cleared up the issue for me.
begin
result = Place.where(:place_id => place_id).limit(1)
result = Place.create(:place_id => place_id, ... ) if result.empty? #!
rescue ActiveRecord::StatementInvalid => error
#save_retry_count = (#save_retry_count || 1)
ActiveRecord::Base.connection.reset_pk_sequence!(:place)
retry if( (#save_retry_count -= 1) >= 0 )
raise error
end

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart