I have an application running on Rails 4, using MongoDB with Mongoid as the ODM.
Let me paste in the few lines of troublemaker code.
def generate_transaction_uuid
  current_time = Time.now
  julian_date = current_time.strftime("%j")
  year = current_time.strftime("%Y")
  hour = current_time.strftime("%H")
  channel_identifier = '1'

  # Read the last stored uuid and derive the next 5-digit running sequence
  last_transaction_random_sequence = TransactionUuid.last.transaction_uuid[-5..-1]
  if last_transaction_random_sequence.to_i == 99999
    running_sequence_numbers = '00000'  # wrap around after 99999
  else
    current_sequence_number = (last_transaction_random_sequence.to_i + 1).to_s
    running_sequence_numbers = current_sequence_number.rjust(5, '0')
  end

  uuid = year[-1] + julian_date[-3..-1] + hour + channel_identifier + running_sequence_numbers
  TransactionUuid.create(transaction_uuid: uuid)
  uuid
end
The code above generates a transaction uuid. The core of the logic is generating the last 5 digits of the transaction id, the running sequence number. Currently the flow is: generate an id and store it in the 'TransactionUuid' collection. When the next request arrives, the last stored uuid is queried from the collection, the last 5 digits are extracted from it, and the next sequence number is built from those. It is again inserted into the DB, and the process goes on.
last_transaction_random_sequence = TransactionUuid.last.transaction_uuid[-5..-1]
if last_transaction_random_sequence.to_i == 99999
  running_sequence_numbers = '00000'
else
  current_sequence_number = (last_transaction_random_sequence.to_i + 1).to_s
  running_sequence_numbers = current_sequence_number.rjust(5, '0')
end
This works fine in most cases and gives unique transaction ids for each Rails request. But in certain cases it generates duplicate transaction ids. Why exactly that happens is quite unclear to me, because from the code it seems evident that the generated transaction ids should be unique. Also, when I run the code manually it doesn't seem to have a problem.
I inspected the Rails logs to check whether the problem is two simultaneous requests hitting the database at the same time, which is possible. But from the Rails logs, even requests that are about 20 seconds apart seem to generate the same transaction id on certain instances. This is what's really confusing me. We use nginx with Passenger to serve our requests.
What exactly might be the issue here: some kind of database caching that happens internally, some totally unrelated weird problem, or maybe even a bug in my code? Any kind of help on this would be much appreciated.
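One suspect worth noting: the read-increment-write above is not atomic, so two Passenger workers can both read the same last record before either inserts. A minimal sketch of an atomic alternative, assuming Mongoid 5+ where Criteria#find_one_and_update is available; the Counter model and its field names are hypothetical, not part of the original code:
# Atomic counter: MongoDB performs the increment server-side, so no two
# requests can observe the same value, however the requests interleave.
class Counter
  include Mongoid::Document
  field :name,  type: String
  field :value, type: Integer, default: 0
end

def next_running_sequence
  doc = Counter.where(name: 'transaction_uuid')
               .find_one_and_update({ '$inc' => { value: 1 } },
                                    upsert: true, return_document: :after)
  format('%05d', doc.value % 100_000)  # wraps after 99999, zero-padded
end
This sidesteps the read-modify-write window regardless of what the underlying duplicate trigger turns out to be.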
I have a Ruby on Rails project with millions of products, each with a different URL. I have a method "test_response" that checks the URL and sets the Product attribute marked_as_broken to either true or false; either way the Product is saved and its "updated_at" attribute is updated to the current timestamp.
Since this is a very tedious process, I have created a task which in turn starts off 15 tasks, each with N/15 products to check. The first one should check, for example, from the first to the 10,000th product, the second one from the 10,000th to the 20,000th, and so on, using limit and offset.
The script starts off the 15 processes fine, but each one rather quickly completes, one after another, far too early. They don't crash; each finishes with a "Process exited with status 0".
My guess is that using find_each together with a filter on updated_at, while the script itself is updating "updated_at" as it runs, changes the result set underneath the query and keeps the script from going through the 10,000 items as supposed, but I can't verify this.
Is there something inherently wrong with doing what I do here? For example, does "find_each" run a new SQL query once in a while, providing completely different results each time than anticipated? I do expect it to provide the same 10,000 -> 20,000 slice, just split up in pieces.
task :big_response_launcher => :environment do
  nbr_of_fps = Product.where(:marked_as_broken => false).where("updated_at < ?", 1.year.ago.to_date).size
  nbr_of_processes = 15
  batch_size = (nbr_of_fps / nbr_of_processes) - 2
  heroku = PlatformAPI.connect_oauth(auth_code_provided_elsewhere)
  (0..nbr_of_processes-1).each do |i|
    puts "Launching #{i.to_s}"
    current_offset = batch_size * i
    puts "rake big_response_tester[#{current_offset},#{batch_size}]"
    heroku.dyno.create('kopa', {
      :command => "rake big_response_tester[#{current_offset},#{batch_size}]",
      :attach => false
    })
  end
end
task :big_response_tester, [:current_offset, :batch_size] => :environment do |task, args|
  current_limit = args[:batch_size].to_i
  current_offset = args[:current_offset].to_i
  puts "Launching with offset #{current_offset.to_s} and limit #{current_limit.to_s}"
  Product.where(:marked_as_broken => false).where("updated_at < ?", 1.year.ago.to_date).limit(current_limit).offset(current_offset).find_each do |fp|
    fp.test_response
  end
end
As many have noted in the comments, it seems like using find_each will ignore the order and limit. I found this answer (ActiveRecord find_each combined with limit and order) that seems to be working for me. It's not working 100% but it is a definite improvement. The rest seems to be a memory issue, i.e. I cannot have too many processes running at the same time on Heroku.
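For reference, a minimal sketch of manual batching that honors both limit and offset, under the assumption (true for the Rails version in question) that find_each ignores them. Plucking the ids up front also freezes the result set, so updating "updated_at" mid-run no longer shifts the slices:
task :big_response_tester, [:current_offset, :batch_size] => :environment do |task, args|
  current_limit = args[:batch_size].to_i
  current_offset = args[:current_offset].to_i
  scope = Product.where(:marked_as_broken => false)
                 .where("updated_at < ?", 1.year.ago.to_date)
  # A stable order is required for offset to mean anything; find_each is
  # avoided entirely, so the limit/offset slice is honored.
  ids = scope.order(:id).limit(current_limit).offset(current_offset).pluck(:id)
  ids.each_slice(1000) do |batch|
    Product.where(:id => batch).each(&:test_response)
  end
end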
We recently upgraded from Rails 4.1 to Rails 4.2 and are seeing problems using Arel + ActiveRecord, because we're getting this type of error:
ActiveRecord::StatementInvalid: PG::ProtocolViolation: ERROR: bind message supplies 0 parameters, but prepared statement "" requires 8
Here's the code that is breaking:
customers = Customer.arel_table

ne_subquery = ImportLog.where(
  importable_type: Customer.to_s,
  importable_id: customers['id'],
  remote_type: remote_type.to_s.singularize,
  destination: 'hello'
).exists.not

first = Customer.where(ne_subquery).where(company_id: @company.id)

second = Customer.joins(:import_logs).merge(
  ImportLog.where(
    importable_type: Customer.to_s,
    importable_id: customers['id'],
    remote_type: remote_type.to_s.singularize,
    status: 'pending',
    destination: 'hello',
    remote_id: nil
  )
).where(company_id: @company.id)

Customer.from(
  customers.create_table_alias(
    first.union(second),
    Customer.table_name
  )
)
We figured out how to solve the first part of the query (running into the same Rails bug of losing bindings) by moving the exists.not to be within Customer.where, like so:
ne_subquery = ImportLog.where(
  importable_type: Customer.to_s,
  importable_id: customers['id'],
  destination: 'hello'
)

first = Customer.where("NOT (EXISTS (#{ne_subquery.to_sql}))").where(company_id: @company.id)
This seemed to work but we ran into the same issue with this line of code:
first.union(second)
whenever we run this part of the query, the bindings get lost. first and second are both ActiveRecord relations, but as soon as we "union" them, they lose the bindings and become Arel objects.
We tried cycling through the query and manually replacing the bindings but couldn't seem to get it working properly. What should we do instead?
EDIT:
We also tried extracting the bind values from first and second, and then manually replacing them in the arel object like so:
union.grep(Arel::Nodes::BindParam).each_with_index do |bp, i|
  bv = bind_values[i]
  bp.replace(Customer.connection.substitute_at(bv, i))
end
However, it fails because:
NoMethodError: undefined method `replace' for #<Arel::Nodes::BindParam:0x007f8aba6cc248>
This was a solution suggested in the Rails GitHub repo.
I know this question is a bit old, but the error sounded familiar. I had some notes and our solution in a repository, so I thought I'd share.
The error we were receiving was:
PG::ProtocolViolation: ERROR: bind message supplies 0 parameters, but
prepared statement "" requires 1
So as you can see, our situation is a bit different. We didn't have 8 bind values. However, our single bind value was still being clobbered. I changed the naming of things to keep it general.
first_level = Blog.all_comments
second_level = Comment.where(comment_id: first_level.select(:id))
third_level = Comment.where(comment_id: second_level.select(:id))
Blog.all_comments is where we have the single bind value. That's the piece we're losing.
union = first_level.union second_level

union2 = Comment.from(
  Comment.arel_table.create_table_alias union, :comments
).union third_level

relation = Comment.from(Comment.arel_table.create_table_alias union2, :comments)
We created a union much like yours, except that we needed to union three different queries.
To get the lost bind values back at this point, we did a simple assignment. In the end, this is a somewhat simpler case than yours, but it may be helpful.
relation.bind_values = first_level.bind_values
relation
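Applied to the original question's two-relation union, the same idea would presumably look something like the following. This is a sketch against the Rails 4.2-era API (bind_values was removed in later Rails versions), and combining both sides' bind values is my assumption, extrapolated from the single-source assignment above:
# Rebuild the aliased union, then carry the bind values over manually,
# since the union step strips them.
relation = Customer.from(
  customers.create_table_alias(first.union(second), Customer.table_name)
)
relation.bind_values = first.bind_values + second.bind_values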
By the way, here's the GitHub issue we found while working on this. It doesn't appear to have any updates since this question was posted though.
This code just displays the values inside the array model.request_reports.
To get the most recent report, I have to loop through and compare the current report.updated_at with the last saved report.updated_at value. One thing to find out is what class the updated_at field is and how to compare two of them against each other. The class is ActiveSupport::TimeWithZone.
I need to keep track of the array index of the report that has the most recent updated_at as I loop, so that I can access it after the loop.
The problem is, I don't know how to do this:
msg = ""
reports_arr = model.request_reports
reports_arr.each do |report|
updated_at = report.updated_at
if updated_at
msg = msg + "#{updated_at} --- "
msg = msg + "#{updated_at.class}---"
end
end
msg
To add to @meagar's comment: you should be using the DB to do sorts on tables.
With that said, we need to know which DB you are using, as the exact command differs for each.
Mongo w/ Mongoid would be Model.order_by(:updated_at => 'desc').first
My loop had to go through the array and check by greatest date value because the system I'm using automatically sorts the reports array by the field "due_at", which does not correspond to the most recently updated record. The code below works for me.
msg = ""
reports_arr = model.request_reports
last_modified_report = model.last_modified_report
recent = nil
recent_report = nil
reports_arr.each_with_index do |report,index|
updated_at = report.updated_at
if index == 0
recent = updated_at
recent_report = report
end
if updated_at > recent
recent = updated_at
recent_report = report
end
last_modified_report = recent_report
end
msg = msg + "#{recent}---"
msg = msg + "#{recent_report}---"
msg = msg + "#{last_modified_report}"
model.last_modified_report = last_modified_report
model.save(validate: false)
msg
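For what it's worth, when the array is already loaded in memory, Ruby's Enumerable#max_by expresses the same scan in one line; the select guard mirrors the original loop's assumption that updated_at may be missing:
# In-memory equivalent of the loop above; reports without an
# updated_at are skipped rather than compared.
last_modified_report = reports_arr.select(&:updated_at).max_by(&:updated_at)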
The OP's answer is only good if you absolutely cannot query the database for the info you want directly. I assume you only want the index so you can find the most recent one?
Even if automatic sorting is on one column, your query for the data can have it sorted on a different column.
model.request_reports.order_by(:updated_at => 'desc').first
If you have a default scope that's messing with your query, you can ask for an unscoped list, although I doubt a default ordering would cause any trouble.
model.unscoped.order_by(:updated_at => 'desc').first
You can string together queries that are already written: that can be useful even if request_reports is a query or scope you have somewhere.
It will be far less expensive than fetching everything and looping through it: you are always better off finding a way to get just the info you need in a DB query if you can.
I'm currently doing live testing of a game I'm making for Android. The services are written in Rails 3.1 and I'm using PostgreSQL. Some of my more technically savvy testers have been able to manipulate the game by recording their requests to the server and replaying them with high concurrency. I'll try to briefly describe the scenario below without getting caught up in the code.
A user can purchase multiple items, each item has its own record in the database.
The request goes to a controller action, which creates a purchase model to record information about the transaction.
The trade model has a method that sets up the purchase of the items. It essentially does a few logical steps to see if they can purchase the item. The most important is that they have a limit of 100 items per user at any given time. If all the conditions pass, a simple loop is used to create the number of items they requested.
So, what they are doing is recording one valid purchase request via a proxy, then replaying it with high concurrency, which essentially allows a few extra purchases to slip through each time. So if they set it to purchase a quantity of 100, they can get it up to 300-400, or if they do a quantity of 15, they can get it up to around 120.
The purchase method above is wrapped in a transaction. However, even though it's wrapped, it won't stop the duplicates in certain circumstances where the requests execute nearly at the same time. I'm guessing this may require some DB locking. Another twist that needs to be known: at any given time, rake tasks are being run in cron jobs against the user table to update the players' health and energy attributes, so that cannot be blocked either.
Any assistance would be really awesome. This is my little hobby side project and I want to make sure the game is fair and fun for everyone.
Thanks so much!
Controller action:
def hire
  worker_asset_type_id = params[:worker_asset_type_id]
  quantity = params[:quantity]
  trade = Trade.new()
  trade_response = trade.buy_worker_asset(current_user, worker_asset_type_id, quantity)
  user = User.find(current_user.id, select: 'money')

  respond_to do |format|
    format.json {
      render json: {
        trade: trade,
        user: user,
        messages: {
          messages: [trade_response.to_s]
        }
      }
    }
  end
end
Trade Model Method:
def buy_worker_asset(user, worker_asset_type_id, quantity)
  ActiveRecord::Base.transaction do
    if worker_asset_type_id.nil?
      raise ArgumentError.new("You did not specify the type of worker asset.")
    end
    if quantity.nil?
      raise ArgumentError.new("You did not specify the amount of worker assets you want to buy.")
    end
    # Convert before comparing; params arrive as strings, and comparing
    # a String against 0 raises an ArgumentError.
    quantity = quantity.to_i
    if quantity <= 0
      raise ArgumentError.new("Please enter a quantity above 0.")
    end

    worker_asset_type = WorkerAssetType.where(id: worker_asset_type_id).first
    if worker_asset_type.nil?
      raise ArgumentError.new("There is no worker asset of that type.")
    end

    trade_cost = worker_asset_type.min_cost * quantity
    if user.money < trade_cost
      raise ArgumentError.new("You don't have enough money to make that purchase.")
    end

    # Get the user's first geo asset; this will eventually have to be dynamic
    potential_total = WorkerAsset.where(user_id: user.id).length + quantity

    # Catch-all for most people
    if potential_total > 100
      raise ArgumentError.new("You cannot have more than 100 dealers at the current time.")
    end

    quantity.times do
      new_worker_asset = WorkerAsset.new()
      new_worker_asset.worker_asset_type_id = worker_asset_type_id
      new_worker_asset.geo_asset_id = user.geo_assets.first.id
      new_worker_asset.user_id = user.id
      new_worker_asset.clocked_in = DateTime.now
      new_worker_asset.save!
    end

    self.buyer_id = user.id
    self.money = trade_cost
    self.worker_asset_type_id = worker_asset_type_id
    self.trade_type_id = TradeType.where(name: "market").first.id
    self.quantity = quantity

    # save trade
    self.save!

    # is this safe?
    user.money = user.money - trade_cost
    user.save!
  end
end
Sounds like you need idempotent requests so that request replay is ineffective. Where possible, implement operations so that repeating them has no effect. Where that is not possible, give each request a unique request identifier and record whether each request has been satisfied or not. You can keep the request ID information in an UNLOGGED table in PostgreSQL, or in redis/memcached, since you don't need it to be persistent. This will prevent a whole class of exploits.
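A minimal sketch of the request-identifier variant, assuming a hypothetical ProcessedRequest model backed by a unique index, and a client that sends a request_id with each purchase:
# Hypothetical guard: the unique index makes a replayed request_id fail
# fast, so the purchase logic runs at most once per request_id.
# Migration (assumed): add_index :processed_requests, :request_id, unique: true
def hire
  ProcessedRequest.create!(request_id: params[:request_id])
  # ... proceed with the purchase as before ...
rescue ActiveRecord::RecordNotUnique
  # Replay detected: refuse to run the purchase again.
  render json: { messages: { messages: ["Duplicate request."] } }, status: :conflict
end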
To deal with just this one problem, create an AFTER INSERT OR DELETE ... FOR EACH ROW EXECUTE PROCEDURE trigger on the user items table. Have this trigger do the following:
BEGIN
    -- Lock the user so only one tx can be inserting/deleting items for this user
    -- at the same time
    SELECT 1 FROM user WHERE user_id = <the-user-id> FOR UPDATE;

    IF TG_OP = 'INSERT' THEN
        IF (SELECT count(user_item_id) FROM user_item WHERE user_item.user_id = <the-user-id>) > 100 THEN
            RAISE EXCEPTION 'Too many items already owned, adding this item would exceed the limit of 100 items';
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        -- No action required, all we needed to do is take the lock
        -- so a concurrent INSERT won't run until this tx finishes
    ELSE
        RAISE EXCEPTION 'Unhandled trigger case %', TG_OP;
    END IF;

    RETURN NULL;
END;
Alternately, you can implement the same thing in the Rails application by taking a row-level lock on the customer ID before adding or deleting any item ownership records. I prefer to do this sort of thing in triggers, where you can't forget to apply it somewhere, but I realise you might prefer to do it at the app level. See Pessimistic locking.
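A minimal sketch of that app-level variant, reusing the WorkerAsset names from the question; lock! issues a SELECT ... FOR UPDATE against the user's row:
# Serialize purchases per user: the row lock makes concurrent requests
# queue up, so the count check and the inserts happen atomically.
ActiveRecord::Base.transaction do
  user.lock!  # row-level lock on this user until the transaction ends
  if WorkerAsset.where(user_id: user.id).count + quantity > 100
    raise ArgumentError.new("You cannot have more than 100 dealers at the current time.")
  end
  # ... create the quantity items here, inside the same transaction ...
end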
Optimistic locking is not a great fit for this application. You can use it by incrementing the lock counter on the user before adding/removing items, but it'll cause row churn on the users table and is really unnecessary when your transactions will be so short anyway.
We can't help much unless you show us your relevant schema and queries. I suppose that you do something like:
$ start transaction;
$ select amount from itemtable where userid=? and itemid=?;
15
$ update itemtable set amount=14 where userid=? and itemid=?;
$ commit;
And you should do something like:
$ start transaction;
$ update itemtable set amount=amount-1 where userid=? and itemid=? returning amount;
14
$ commit;
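In ActiveRecord terms, the equivalent atomic decrement might look like this (a sketch; the Item model and column names are inferred from the SQL above):
# Let the database do the read-modify-write in one statement instead of
# reading the amount into Ruby first. update_all returns the number of
# rows touched, so 0 means nothing was left to sell.
rows = Item.where(userid: user_id, itemid: item_id)
           .where("amount > 0")
           .update_all("amount = amount - 1")
raise ArgumentError.new("Item not available.") if rows.zero?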
Hi, I have this piece of code:
class Place < ActiveRecord::Base
  def self.find_or_create_by_latlon(lat, lon)
    place_id = call_external_webapi
    result = Place.where(:place_id => place_id).limit(1)
    result = Place.create(:place_id => place_id, ... ) if result.empty? #!
    result
  end
end
Then, in another model or controller, I'd like to do:
p = Post.new
p.place = Place.find_or_create_by_latlon(XXXXX, YYYYY) # race-condition
p.save
But Place.find_or_create_by_latlon takes too much time to get the data when the executed action is create, and sometimes in production p.place is nil.
How can I force it to wait for the response before executing p.save?
Thanks for your advice.
You're right that this is a race condition, and it can often be triggered by people who double-click submit buttons on forms. What you might do is loop back if you encounter an error.
result = Place.find_by_place_id(...) ||
Place.create(...) ||
Place.find_by_place_id(...)
There are more elegant ways of doing this, but the basic method is here.
I had to deal with a similar problem. In our backend, a user is created from a token if the user doesn't exist. AFTER a user record is already created, a slow API call gets sent to update the user's information.
def self.find_or_create_by_facebook_id(facebook_id)
  User.find_by_facebook_id(facebook_id) || User.create(facebook_id: facebook_id)
rescue ActiveRecord::RecordNotUnique => e
  User.find_by_facebook_id(facebook_id)
end

def self.find_by_token(token)
  facebook_id = get_facebook_id_from_token(token)
  user = User.find_or_create_by_facebook_id(facebook_id)
  if user.unregistered?
    user.update_profile_from_facebook
    user.mark_as_registered
    user.save
  end
  return user
end
The first step of the strategy is to remove the slow API call (in my case update_profile_from_facebook) from the create method. Because the operation takes so long, you significantly increase the chance of duplicate insert operations when you include it as part of the call to create.
The second step is to add a unique constraint to your database column to ensure duplicates aren't created.
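For completeness, a sketch of that constraint as a migration (the migration class name is assumed):
# A unique index makes the database reject concurrent duplicate inserts,
# which is what surfaces as ActiveRecord::RecordNotUnique in the final
# step below.
class AddUniqueIndexToUsersFacebookId < ActiveRecord::Migration
  def change
    add_index :users, :facebook_id, :unique => true
  end
end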
The final step is to create a function that will catch the RecordNotUnique exception in the rare case where duplicate insert operations are sent to the database.
This may not be the most elegant solution but it worked for us.
I hit this inside a Sidekiq job that retries, gets the error repeatedly, and eventually clears itself. The best explanation I've found is in a blog post here. The gist is that Postgres keeps an internally stored value for incrementing the primary key that gets messed up somehow. This rings true for me because I'm setting the primary key myself rather than using an auto-incremented value, so that's likely how this cropped up. The solution from the comments in the link above is to call ActiveRecord::Base.connection.reset_pk_sequence!(table_name). This cleared up the issue for me.
begin
  result = Place.where(:place_id => place_id).limit(1)
  result = Place.create(:place_id => place_id, ... ) if result.empty? #!
rescue ActiveRecord::StatementInvalid => error
  @save_retry_count = (@save_retry_count || 1)
  ActiveRecord::Base.connection.reset_pk_sequence!(:place)
  retry if (@save_retry_count -= 1) >= 0
  raise error
end