I'm getting "Mysql2::Error: Lock wait timeout exceeded" errors in my production Rails app, and I'm looking for help debugging which transaction is locking the tables for an excessively long time. MySQL has a "slow query" log, but not a log of slow transactions, as far as I can tell.
Is there a way to log information about how long transactions take directly from ActiveRecord?
My team logs slow transactions like this:
In lib/slow_transaction_logger.rb:
module SlowTransactionLogger
  LOG_IF_SLOWER_THAN_SECONDS = ENV.fetch("LOG_TRANSACTIONS_SLOWER_THAN_SECONDS", "5").to_f

  def transaction(*args)
    start = Time.now
    value = super
    finish = Time.now
    duration = finish - start

    return value if duration <= LOG_IF_SLOWER_THAN_SECONDS

    backtrace = caller.
      select { |path| path.start_with?(Rails.root.to_s) }.
      reject { |row|
        row.include?("/gems/") ||
        row.include?("/app/middleware/") ||
        row.include?("/config/initializers/")
      }

    puts "slow_transaction_logger: duration=#{duration.round(2)} start=#{start.to_s.inspect} finish=#{finish.to_s.inspect} class=#{name} caller=#{backtrace.to_json}"

    value
  end
end
In config/initializers/active_record.rb:
class ActiveRecord::Base
  class << self
    prepend SlowTransactionLogger
  end
end
So by default, it logs transactions slower than 5 seconds, in every environment.
If you set a LOG_TRANSACTIONS_SLOWER_THAN_SECONDS environment variable (e.g. on Heroku), you can tweak that limit.
We log with puts intentionally because it ends up in the logs on Heroku, and we also see it more easily in dev and tests. But feel free to use Rails.logger if you prefer.
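For illustration, a transaction that took about 7 seconds would produce a line shaped like this (the class name, timestamps, and caller path here are made up):

slow_transaction_logger: duration=7.03 start="2021-06-01 14:05:02 +0000" finish="2021-06-01 14:05:09 +0000" class=Order caller=["/app/app/models/order.rb:42:in `charge!'"]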
In a Rails 6.x app, I have a controller method which backgrounds queries that take longer than 2 minutes (to avoid a browser timeout), advises the user, stores the results, and sends a link that can retrieve them to generate a live page (with Highcharts charts). This works fine.
Now, I'm trying to implement the same logic with a method that backgrounds the creation of a report, via a Tempfile, and attaches the contents to an email, if the query runs too long. This code works just fine if the 2-minute timeout is NOT reached, but the Tempfile is empty at the commented line if the timeout IS reached.
I've tried wrapping the second part in another thread, and wrapping the internals of each thread in a mutex, but this is getting over my head. I haven't done much multithreading, and every time I do, I feel like I stumble around until I get it. This time, I can't even seem to stumble into it.
I don't know if the problem is with my thread(s), or a race condition with the Tempfile object. I've had trouble using Tempfiles before, because they seem to disappear quicker than I can close them. Is this one getting cleaned up before it can be sent? The file handle actually still exists on the file system at the commented point, even though it's empty, so I'm not clear on what's happening.
def report
  queue = Queue.new
  file = Tempfile.new('report')

  thr = Thread.new do
    query = %Q(blah blah blah)
    @calibrations = ActiveRecord::Base.connection.exec_query query

    query = %Q(blah blah blah)
    @tunings = ActiveRecord::Base.connection.exec_query query

    if queue.empty?
      unless @tunings.empty?
        CSV.open(file.path, 'wb') do |csv|
          csv << ["headers...", @parameters].flatten
          @calibrations.each do |c|
            line = [c["h1"], c["h2"], c["h3"], c["h4"], c["h5"], c["h6"], c["h7"], c["h8"]]
            t = @tunings.find { |tuning| tuning["code"] == c["code"] }
            @parameters.each do |parameter|
              line << t[parameter.downcase]
            end
            csv << line
          end
        end

        send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=\"report.csv\""
      end
    else
      # When "timed out", `file` is empty here
      NotificationMailer.report_ready(current_user, file.read).deliver_later
    end
  end

  give_up_at = Time.now + 120.seconds
  while Time.now < give_up_at
    break unless thr.alive?
    sleep 1
  end

  if thr.alive?
    queue << "Timeout"
    render html: "Your report is taking longer than 2 minutes to generate. To avoid a browser timeout, it will finish in the background, and the report will be sent to you in email."
  end
end
The reason the file is empty is that you give the query 120 seconds to complete. If that hasn't happened after 120 seconds, you add "Timeout" to the queue. The query is still running inside the thread and has not yet reached the point where you check whether the queue is empty. When the query does complete, the queue is no longer empty, so you skip the part where you write the CSV file and go straight to the NotificationMailer.report_ready line. At that point the file is still empty because you never wrote anything into it.
In the end I think you need to rethink the overall logic of what you are trying to accomplish: there needs to be more communication between the threads and the top level.
Each thread needs to tell the top level whether it has already sent the result, and the top level needs to let the thread know that it's past time to send the result directly and that it should email the result instead.
Here is some code that I think / hope will give some insight into how to approach this problem.
timeout_limit = 10
query_times = [5, 15, 1, 15]

timeout = []
sent_response = []
send_via_email = []

puts "time out is set to #{timeout_limit} seconds"

query_times.each_with_index do |query_time, query_id|
  puts "starting query #{query_id} that will take #{query_time} seconds"

  timeout[query_id] = false
  sent_response[query_id] = false
  send_via_email[query_id] = false

  Thread.new do
    ## do query
    sleep query_time

    unless timeout[query_id]
      puts "query #{query_id} has completed, displaying results now"
      sent_response[query_id] = true
    else
      puts "query #{query_id} has completed, emailing result now"
      send_via_email[query_id] = true
    end
  end

  give_up_at = Time.now + timeout_limit
  while Time.now < give_up_at
    break if sent_response[query_id]
    sleep 1
  end

  unless sent_response[query_id]
    puts "query #{query_id} timed out, we will email the result of your query when it is completed"
    timeout[query_id] = true
  end
end

# simulate server environment
loop { }
=>
time out is set to 10 seconds
starting query 0 that will take 5 seconds
query 0 has completed, displaying results now
starting query 1 that will take 15 seconds
query 1 timed out, we will email the result of your query when it is completed
starting query 2 that will take 1 seconds
query 2 has completed, displaying results now
starting query 3 that will take 15 seconds
query 1 has completed, emailing result now
query 3 timed out, we will email the result of your query when it is completed
query 3 has completed, emailing result now
I have a Sidekiq worker that waits for a change to happen to a record made by a remote client. Something like the following:
# myworker: async process to wait for client to confirm status
def perform(myRecordID)
  sendClient(myRecordID)

  didClientAcknowledge = false

  while !didClientAcknowledge
    didClientAcknowledge = myRecords.find(myRecordID).status == :ACK_OK
    if didClientAcknowledge
      break
    end
    # wait for client to perform an update on the record to confirm status
    sleep 5.seconds
  end

  Rails.logger.info("client got the message")
end
My problem is that although I can see the client has in fact performed the acknowledgement and updated the record with the correct status (ACK_OK), my Sidekiq thread continues to see the old status for myRecord.
I'm sure my logic is flawed here, but it seems like the Sidekiq process does not "see" changes to the DB... yet if I use my Rails console I can see that the client has in fact updated the DB as expected...
Thanks!
Edit 1
OK, so here's a thought: instead of the loop, I'll schedule another call to the worker in 5 seconds... so here's the updated code:
def perform(myRecordID, retry_count)
  retry_count -= 1
  if retry_count < 1
    return
  end

  sendClient(myRecordID)

  didClientAcknowledge = false

  if !didClientAcknowledge
    didClientAcknowledge = myRecords.find(myRecordID).status == :ACK_OK
    if didClientAcknowledge
      Rails.logger.info("client got the message")
      return
    end
    # wait for client to perform an update on the record to confirm status
    myWorker.perform_in(5.seconds, myRecordID, retry_count)
  end

  Rails.logger.info("client got the message")
end
This seems to work, but I will test a bit more... One challenge is having a retry count, which means I need to maintain some sort of variable between calls to the worker...
Edit 2: possibly this can be done by passing in the time of the first call and then checking whether a timeout has been exceeded before invoking the next instance... assuming time does not stand still inside the async call as well...
Edit 3: adding the retry_count argument allows us to control how many times this worker will be spawned...
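For what it's worth, here is a rough sketch of the Edit 2 idea: pass the time of the first call along with each re-enqueue and stop once a deadline has passed. The worker and model names and the 2-minute deadline are placeholders for illustration, not taken from the code above:

class AckWaitWorker
  include Sidekiq::Worker

  DEADLINE_SECONDS = 120 # hypothetical overall timeout

  # first_enqueued_at is passed as an epoch float so it survives JSON serialization
  def perform(my_record_id, first_enqueued_at = Time.now.to_f)
    return if Time.now.to_f - first_enqueued_at > DEADLINE_SECONDS

    if MyRecord.find(my_record_id).status == :ACK_OK # mirrors the status check from the question
      Rails.logger.info("client got the message")
    else
      # re-check in 5 seconds, carrying the original enqueue time along
      self.class.perform_in(5.seconds, my_record_id, first_enqueued_at)
    end
  end
end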
I have a Ruby on Rails project with millions of products, each with a different URL. I have a method "test_response" that checks the URL and sets the Product attribute marked_as_broken to either true or false; either way the Product is saved and its "updated_at" attribute is set to the current timestamp.
Since this is a very tedious process, I have created a task which in turn starts 15 tasks, each with N/15 products to check. The first one should check, for example, the 1st to the 10,000th product, the second one the 10,000th to the 20,000th, and so on, using limit and offset.
This script starts the 15 processes fine, but each one completes far too early, one after another. It does not error out; it finishes with "Process exited with status 0".
My guess is that using find_each together with a condition on updated_at, while also updating "updated_at" as the script runs, changes the result set and keeps the script from going through the 10,000 items as intended, but I can't verify this.
Is there something inherently wrong with what I'm doing here? For example, does "find_each" run a new SQL query every so often, providing completely different results each time than anticipated? I expect it to cover the same 10,000 -> 20,000 range, just split into pieces.
task :big_response_launcher => :environment do
  nbr_of_fps = Product.where(:marked_as_broken => false).where("updated_at < '" + 1.year.ago.to_date.to_s + "'").size.to_i
  nbr_of_processes = 15
  batch_size = ((nbr_of_fps / nbr_of_processes)) - 2
  heroku = PlatformAPI.connect_oauth(auth_code_provided_elsewhere)

  (0..nbr_of_processes-1).each do |i|
    puts "Launching #{i.to_s}"
    current_offset = batch_size * i
    puts "rake big_response_tester[#{current_offset},#{batch_size}]"
    heroku.dyno.create('kopa', {
      :command => "rake big_response_tester[#{current_offset},#{batch_size}]",
      :attach => false
    })
  end
end

task :big_response_tester, [:current_offset, :batch_size] => :environment do |task, args|
  current_limit = args[:batch_size].to_i
  current_offset = args[:current_offset].to_i
  puts "Launching with offset #{current_offset.to_s} and limit #{current_limit.to_s}"

  Product.where(:marked_as_broken => false).where("updated_at < '" + 1.year.ago.to_date.to_s + "'").limit(current_limit).offset(current_offset).find_each do |fp|
    fp.test_response
  end
end
As many have noted in the comments, it seems that find_each ignores order and limit. I found this answer (ActiveRecord find_each combined with limit and order) that seems to be working for me. It's not working 100%, but it is a definite improvement. The rest seems to be a memory issue, i.e. I cannot have too many processes running at the same time on Heroku.
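For reference, one way to make the limit and offset stick is to pick the slice of IDs up front and then batch over that explicit set, so find_each no longer needs to honour limit/offset. A rough sketch, assuming the same Product scope as above:

task :big_response_tester, [:current_offset, :batch_size] => :environment do |task, args|
  current_limit  = args[:batch_size].to_i
  current_offset = args[:current_offset].to_i

  scope = Product.where(marked_as_broken: false).where("updated_at < ?", 1.year.ago.to_date)

  # Fix the slice of IDs first, so later updates to updated_at can't shift the window,
  # then batch over that explicit ID set (find_each is free to ignore limit/offset here).
  ids = scope.order(:id).offset(current_offset).limit(current_limit).pluck(:id)

  Product.where(id: ids).find_each do |fp|
    fp.test_response
  end
end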
Per the docs, I thought it would be (for every day at 3pm):
daily.hour_of_day(15)
What I'm getting is a random mess. First, it executes whenever I push to Heroku, regardless of time, and beyond that, seemingly at random. The latest push to Heroku was at 1:30pm; it executed twice at 1:30pm, once at 2pm, once at 4pm, and once at 5pm.
Thoughts on what's wrong?
Full code (note this is for the Fist of Fury gem, but FoF is heavily influenced by Sidetiq so help from Sidetiq users would be great as well).
class Outstanding
  include SuckerPunch::Job
  include FistOfFury::Recurrent

  recurs { daily.hour_of_day(15) }

  def perform
    ActiveRecord::Base.connection_pool.with_connection do
      # Auto email lenders every other day if they have outstanding requests
      lender_array = Array.new
      Inventory.where(id: Borrow.where(status1: 1).all.pluck("inventory_id")).each { |i| lender_array << i.signup.id }
      lender_array.uniq!
      lender_array.each { |l| InventoryMailer.outstanding_request(l).deliver }
    end
  end
end
Maybe you should use:
recurrence { daily.hour_of_day(15) }
instead of recurs?
I'm currently doing live testing of a game I'm making for Android. The services are written in Rails 3.1 and I'm using PostgreSQL. Some of my more technically savvy testers have been able to manipulate the game by recording their requests to the server and replaying them with high concurrency. I'll try to briefly describe the scenario below without getting caught up in the code.
A user can purchase multiple items, each item has its own record in the database.
The request goes to a controller action, which creates a purchase model to record information about the transaction.
The trade model has a method that sets up the purchase of the items. It essentially does a few logical steps to see if they can purchase the item. The most important is that they have a limit of 100 items per user at any given time. If all the conditions pass, a simple loop is used to create the number of items they requested.
So, what they are doing is recording one valid purchase request via a proxy, then replaying it with high concurrency, which lets a few extra items slip through each time. If they set it to purchase a quantity of 100, they can get it up to 300-400; if they do a quantity of 15, they can get it up to around 120.
The purchase method below is wrapped in a transaction. However, even though it's wrapped, that doesn't stop the exploit when the requests execute at nearly the same time. I'm guessing this may require some DB locking. Another twist to be aware of is that at any given time, rake tasks are being run as cron jobs against the user table to update the players' health and energy attributes, so that cannot be blocked either.
Any assistance would be really awesome. This is my little hobby side project and I want to make sure the game is fair and fun for everyone.
Thanks so much!
Controller action:
def hire
  worker_asset_type_id = params[:worker_asset_type_id]
  quantity = params[:quantity]

  trade = Trade.new()
  trade_response = trade.buy_worker_asset(current_user, worker_asset_type_id, quantity)

  user = User.find(current_user.id, select: 'money')

  respond_to do |format|
    format.json {
      render json: {
        trade: trade,
        user: user,
        messages: {
          messages: [trade_response.to_s]
        }
      }
    }
  end
end
Trade Model Method:
def buy_worker_asset(user, worker_asset_type_id, quantity)
  ActiveRecord::Base.transaction do
    if worker_asset_type_id.nil?
      raise ArgumentError.new("You did not specify the type of worker asset.")
    end

    if quantity.nil?
      raise ArgumentError.new("You did not specify the amount of worker assets you want to buy.")
    end

    if quantity <= 0
      raise ArgumentError.new("Please enter a quantity above 0.")
    end

    quantity = quantity.to_i

    worker_asset_type = WorkerAssetType.where(id: worker_asset_type_id).first
    if worker_asset_type.nil?
      raise ArgumentError.new("There is no worker asset of that type.")
    end

    trade_cost = worker_asset_type.min_cost * quantity
    if user.money < trade_cost
      raise ArgumentError.new("You don't have enough money to make that purchase.")
    end

    # Get the user's first geo asset, this will eventually have to be dynamic
    potential_total = WorkerAsset.where(user_id: user.id).length + quantity

    # Catch-all for most people
    if potential_total > 100
      raise ArgumentError.new("You cannot have more than 100 dealers at the current time.")
    end

    quantity.times do
      new_worker_asset = WorkerAsset.new()
      new_worker_asset.worker_asset_type_id = worker_asset_type_id
      new_worker_asset.geo_asset_id = user.geo_assets.first.id
      new_worker_asset.user_id = user.id
      new_worker_asset.clocked_in = DateTime.now
      new_worker_asset.save!
    end

    self.buyer_id = user.id
    self.money = trade_cost
    self.worker_asset_type_id = worker_asset_type_id
    self.trade_type_id = TradeType.where(name: "market").first.id
    self.quantity = quantity

    # save trade
    self.save!

    # is this safe?
    user.money = user.money - trade_cost
    user.save!
  end
end
Sounds like you need idempotent requests so that request replay is ineffective. Where possible, implement operations so that repeating them has no effect. Where that's not possible, give each request a unique request identifier and record whether each request has been satisfied or not. You can keep the request ID information in an UNLOGGED table in PostgreSQL, or in Redis/memcached, since you don't need it to be persistent. This will prevent a whole class of exploits.
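As a rough illustration of that request-ID approach (the ProcessedRequest model, its request_uid column with a unique index, and the client_request_id parameter are made up for this sketch, not part of your code):

# Client sends a unique client_request_id with each purchase request.
def hire
  # The INSERT hits a unique index on request_uid; a replayed request violates
  # the index and is rejected instead of being processed a second time.
  ProcessedRequest.create!(request_uid: params[:client_request_id])

  # ... perform the actual purchase as before ...
rescue ActiveRecord::RecordNotUnique
  render json: { messages: { messages: ["Duplicate request ignored."] } }
end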
To deal with just this one problem create an AFTER INSERT OR DELETE ... FOR EACH ROW EXECUTE PROCEDURE trigger on the user items table. Have this trigger:
BEGIN
    -- Lock the user row so only one tx can be inserting/deleting items for this user
    -- at the same time
    SELECT 1 FROM user WHERE user_id = <the-user-id> FOR UPDATE;

    IF TG_OP = 'INSERT' THEN
        IF (SELECT count(user_item_id) FROM user_item WHERE user_item.user_id = <the-user-id>) > 100 THEN
            RAISE EXCEPTION 'Too many items already owned, adding this item would exceed the limit of 100 items';
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        -- No action required, all we needed to do is take the lock
        -- so a concurrent INSERT won't run until this tx finishes
    ELSE
        RAISE EXCEPTION 'Unhandled trigger case %', TG_OP;
    END IF;

    RETURN NULL;
END;
Alternatively, you can implement the same thing in the Rails application by taking a row-level lock on the user row before adding or deleting any item ownership records. I prefer to do this sort of thing in triggers, where you can't forget to apply it somewhere, but I realise you might prefer to do it at the app level. See Pessimistic locking.
Optimistic locking is not a great fit for this application. You can use it by incrementing the lock counter on the user before adding/removing items, but it'll cause row churn on the users table and is really unnecessary when your transactions will be so short anyway.
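A minimal sketch of that app-level variant, using ActiveRecord's pessimistic locking helper with_lock and the model names from the question (the details of the body are elided):

def buy_worker_asset(user, worker_asset_type_id, quantity)
  # with_lock opens a transaction and takes SELECT ... FOR UPDATE on the user row,
  # so concurrent purchases for the same user serialize here.
  user.with_lock do
    potential_total = WorkerAsset.where(user_id: user.id).count + quantity
    raise ArgumentError.new("You cannot have more than 100 dealers at the current time.") if potential_total > 100

    # ... create the worker assets, record the trade, and debit user.money as before ...
  end
end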
We can't help much unless you show us your relevant schema and queries. I suppose that you do something like:
$ start transaction;
$ select amount from itemtable where userid=? and itemid=?;
15
$ update itemtable set amount=14 where userid=? and itemid=?;
$ commit;
And you should do something like:
$ start transaction;
$ update itemtable set amount=amount-1 where userid=? and itemid=? returning amount;
14
$ commit;
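For completeness, a rough ActiveRecord equivalent of that atomic decrement (the Item model and its column names are assumed for this sketch, not taken from the question):

# update_all returns the number of rows updated; 0 means the item wasn't available
# (or a concurrent request already consumed it), so nothing is oversold.
rows = Item.where(user_id: user.id, item_type_id: item_type_id)
           .where("amount > 0")
           .update_all("amount = amount - 1")

raise ArgumentError.new("Item not available.") if rows.zero?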