What could trigger a deadlock message on Firebird when there is only a single transaction writing to the DB?
I am building a webapp with a backend written in Delphi 2010 on top of a Firebird 2.1 database. I am getting a concurrent-update error that I cannot make sense of. Maybe someone can help me debug the issue or explain scenarios that may lead to the message.
I am trying an UPDATE to a single field on a single record.
UPDATE USERS SET passwdhash=? WHERE (RECID=?)
The message I am seeing is the standard one:
deadlock
update conflicts with concurrent update
concurrent transaction number is 659718
deadlock
Error Code: 16
I understand what it tells me, but I do not understand why I am seeing it here, as there are no concurrent updates that I know of.
Here is what I did to investigate.
I started my application server and checked the result of this query:
SELECT
A.MON$ATTACHMENT_ID,
A.MON$USER,
A.MON$REMOTE_ADDRESS,
A.MON$REMOTE_PROCESS,
T.MON$STATE,
T.MON$TIMESTAMP,
T.MON$TOP_TRANSACTION,
T.MON$OLDEST_TRANSACTION,
T.MON$OLDEST_ACTIVE,
T.MON$ISOLATION_MODE
FROM MON$ATTACHMENTS A
LEFT OUTER JOIN MON$TRANSACTIONS T
ON (T.MON$ATTACHMENT_ID = A.MON$ATTACHMENT_ID)
The result indicates a number of connections, but only one of them has non-NULL values in the MON$TRANSACTIONS columns. This connection is the one I am using from IBExperts to query the monitoring tables.
Am I right to think that a connection with no active transaction can be disregarded as not contributing to a deadlock situation?
Next I put a breakpoint on the line submitting the UPDATE statement in my application server and executed the request that triggers it. When the breakpoint stopped the application, I reran the monitoring query above.
This time I could see another transaction active, just as I would expect.
Then I let my appserver execute the UPDATE and got the error message shown above.
What can trigger the deadlock message when there is only one writing transaction? Or are there more and I am misinterpreting the output? Any other suggestions on how to debug this?
Firebird uses MVCC (Multiversion Concurrency Control) for its transaction model. One of its features is that, depending on the transaction isolation, you will only see the last version committed when your transaction started (consistency and concurrency isolation levels), or the last version committed when your statement started (read committed). A change to a record creates a new version of the record, which only becomes visible to other active transactions once it has been committed (and even then only to read committed transactions).
As a basic rule there can only be one uncommitted version of a record. So attempts by two transactions to update the same record will fail for one of those transactions. For historical reasons this type of error is grouped under the deadlock error family, even though it is not actually a deadlock in the normal concurrency vernacular.
This rule is actually a bit more restrictive depending on your transaction isolation: for the consistency and concurrency levels there can also be no newer committed version of the record that is not visible to your transaction.
My guess is that for you something like this happened:
1. Transaction 1 started
2. Transaction 2 started with concurrency or consistency isolation
3. Transaction 1 modifies record (new version created)
4. Transaction 1 commits
5. Transaction 2 attempts to modify same record
(Note: steps 1, 3 and 2 could occur in a different order, e.g. 1, 3, 2 or 2, 1, 3.)
Step 5 fails because the new version created in step 3 is not visible to transaction 2. If read committed had been used instead, step 5 would succeed, as the new version would be visible to the transaction at that point.
Related
On the production server I got this error:
ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR: canceling statement due to statement timeout <SQL query here>
In this line:
Contact.where(id: contact_ids_to_delete).delete_all
The SQL query was a DELETE command with a huge list of ids. It timed out.
I came up with a solution, which is to delete the Contacts in batches:
Contact.where(id: contact_ids_to_delete).in_batches.delete_all
The question is, how do I test my solution? Or what is the common way to test it? Or is there any gem that would make testing it convenient?
I see two possible ways to test it:
1. (Dynamically) set the timeout in the test database to a small number of seconds and create a test in which I generate a lot of Contacts and then try to run my code to delete them.
It seems to be the right way to do it, but it could potentially slow down test execution, and setting the timeout dynamically (which would be the ideal way to do it) could be tricky.
2. Test that deletions are in batches.
It could be tricky, because this way I would have to monitor the queries (one possible way to do that is sketched below).
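For option 2, one way to watch the queries without touching any timeouts is to subscribe to Active Support's sql.active_record notifications and count the DELETE statements. A rough RSpec-style sketch, assuming a FactoryBot factory for Contact; the inline in_batches(of: 2) call and the tiny fixture are only stand-ins so a handful of records still produces several batches:

it "deletes contacts in several batches" do
  contacts = create_list(:contact, 5)              # hypothetical FactoryBot factory
  delete_count = 0

  # Count every DELETE statement ActiveRecord sends to the database
  counter = ->(_name, _start, _finish, _id, payload) do
    delete_count += 1 if payload[:sql].to_s.start_with?("DELETE")
  end

  ActiveSupport::Notifications.subscribed(counter, "sql.active_record") do
    Contact.where(id: contacts.map(&:id)).in_batches(of: 2).delete_all
  end

  expect(delete_count).to be > 1                   # batching produced more than one DELETE
end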
This is not an edge case that I would test for because it requires building and running a query that exceeds your database's built-in timeouts; the minimum runtime for this single test would be at least that time.
Even then, you may write a test for this that passes 100% of the time in your test environment but fails 100% of the time in production, because of differences between the two environments that you can never fully replicate: for one, your test database is used by a single concurrent user, while your production database will have multiple concurrent users, different available resources, and different active locks. This isn't the type of issue that you write a test for, because the test won't ensure it doesn't happen in production. Best practices will do that.
I recommend that you follow the best practices for Rails by working in batches, using the in_batches or find_each methods, with the expectation that the database server can successfully act on batches of 1000 records at a time:
Contact.where(id: contact_ids_to_delete).in_batches do |batch|
  batch.delete_all
end
Or if you prefer:
Contact.where(id: contact_ids_to_delete).in_batches(&:delete_all)
You can tweak the batch size with the of: option if you're paranoid about your production database server not being able to act on 1000 records at a time:
Contact.where(id: contact_ids_to_delete).in_batches(of: 500) { |batch| batch.delete_all }
As I understand it, if any error (from save!, etc.) occurs inside this transaction block, the entire block will be rolled back. The problem here is what happens if a timeout (Rack::Timeout set to 12) occurs inside this block.
def create
  ActiveRecord::Base.transaction do
    # timeout happens here
  end
end
How can we roll back a transaction when a Rack::Timeout occurs?
When a Rack timeout happens, any transaction in progress will be rolled back, but transactions that have already been committed will, of course, remain committed. You should not have to worry about it.
Once you start a database transaction, it will eventually either be committed or rolled back. Those are the only two possibilities for ending a transaction. When you commit the transaction, you are saying that you want those changes to be saved regardless of what happens next. If you do not commit the transaction, the database will automatically roll back the transaction once it gets into any state where the transaction cannot move forward, such as a broken network connection.
ActiveRecord automatically commits the transaction when the ActiveRecord::Base.transaction do block exits normally. Abnormal exits cause ActiveRecord to issue a ROLLBACK command to the database, which is efficient, good practice, and returns the connection to a ready state, but it is not strictly necessary: unless the transaction is explicitly committed, the database will eventually roll it back automatically.
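A minimal sketch of that behaviour, assuming user is any already-persisted record; the raise stands in for whatever abnormal exit occurs, including a Rack::Timeout interrupt:

begin
  ActiveRecord::Base.transaction do
    user.update!(name: "changed")   # executed inside the transaction, not yet committed
    raise "simulated failure"       # abnormal exit -> ActiveRecord issues ROLLBACK
  end
rescue RuntimeError
  user.reload.name                  # still the old value: the update was rolled back
end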
If you look at ActiveRecord::ConnectionAdapters::TransactionManager#within_new_transaction (lines 270 and 283), Rails rescues Exception. Rescuing Exception will catch anything and everything, including kill commands, and should generally be avoided. It is used in this case to ensure that no matter what is raised (including Rack::Timeout) the transaction will roll back.
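A tiny illustration of why that matters, assuming the timeout is raised as an Exception subclass rather than a StandardError (which is why a bare rescue would miss it):

begin
  raise Exception, "simulated hard timeout"   # stand-in for an Exception-level interrupt
rescue
  # never reached: a bare rescue only catches StandardError and its descendants
rescue Exception
  # reached: only `rescue Exception` sees it, which is why Rails uses it to guarantee the ROLLBACK
end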
You need to explicitly specify an error class; then it will be rescued and ActiveRecord will roll back the transaction.
e.g. Timeout.timeout(1, Timeout::Error) do
https://ruby-doc.org/stdlib-2.4.0/libdoc/timeout/rdoc/Timeout.html
The exception thrown to terminate the given block cannot be rescued
inside the block unless klass is given explicitly.
Without it, ActiveRecord thinks that there is no error and issues a COMMIT.
It seems this was the default behavior until Ruby 2.1.
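A sketch of what that suggestion looks like in the controller from the question; the 1-second limit, the model call, and the sleep are placeholders to make the effect visible:

require "timeout"

def create
  Timeout.timeout(1, Timeout::Error) do   # explicit class: the error is a normal exception
    ActiveRecord::Base.transaction do
      SomeModel.create!(some_attrs)       # hypothetical slow work inside the transaction
      sleep 5                             # exceeds the limit -> Timeout::Error is raised here
    end                                   # ActiveRecord sees the error and rolls back
  end
end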
API clients in a busy application are competing for existing resources. They request 1 or 2 at a time, then attempt actions upon those records. I am trying to use transactions to protect state, but am having trouble getting a clear picture of row locks, especially where nested transactions (I guess savepoints, since PG doesn't really do transactions within transactions?) are concerned.
The process should look like this:
Request N resources
Remove those resources from the pool to prevent other users from attempting to claim them
Perform action with those resources
Roll back the entire transaction and return resources to pool if an error occurs
(Assume happy path for all examples. Requests always result in products returned.)
One version could look like this:
def self.do_it(request_count)
  Product.transaction do
    locked_products = Product.where(state: 'available').lock('FOR UPDATE').limit(request_count).to_a
    Product.where(id: locked_products.map(&:id)).update_all(state: 'locked')
    do_something(locked_products)
  end
end
It seems to me that we could have a deadlock on that first line if two users request 2 and only 3 are available. So, to get around it, I'd like to do...
def self.do_it(request_count)
  Product.transaction do
    locked_products = []
    request_count.times do
      Product.transaction(requires_new: true) do
        locked_product = Product.where(state: 'available').lock('FOR UPDATE').limit(1).first
        locked_product.update!(state: 'locked')
        locked_products << locked_product
      end
    end
    do_something(locked_products)
  end
end
But from what I've managed to find online, that inner transaction's end will not release the row locks -- they'll only be released when the outermost transaction ends.
Finally, I considered this:
def self.do_it(request_count)
  locked_products = []
  request_count.times do
    Product.transaction do
      locked_product = Product.where(state: 'available').lock('FOR UPDATE').limit(1).first
      locked_product.update!(state: 'locked')
      locked_products << locked_product
    end
  end
  Product.transaction { do_something(locked_products) }
ensure
  evaluate_and_cleanup(locked_products)
end
This gives me two completely independent transactions followed by a third that performs the action, but I am forced to do a manual check (or I could rescue) if do_something fails, which makes things messier. It also could lead to deadlocks if someone were to call do_it from within a transaction, which is very possible.
So my big questions:
Is my understanding of the release of row locks correct? Will row locks within nested transactions only be released when the outermost transaction is closed?
Is there a command that will change the lock type without closing the transaction?
My smaller question:
Is there some established or totally obvious pattern here that's jumping out to someone to handle this more sanely?
As it turns out, it was pretty easy to answer these questions by diving into the PostgreSQL console and playing around with transactions.
To answer the big questions:
Yes, my understanding of row locks was correct. Exclusive locks acquired within savepoints are NOT released when the savepoint is released; they are released when the overall transaction is committed.
No, there is no command to change the lock type. What kind of sorcery would that be? Once you have an exclusive lock, all queries that would touch that row must wait for you to release the lock before they can proceed.
Other than committing the transaction, rolling back the savepoint or the transaction will also release the exclusive lock.
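A sketch against the Product model from the question (product_id is a placeholder) showing the same thing from Rails, since requires_new: true is implemented with a savepoint:

Product.transaction do                             # BEGIN
  Product.transaction(requires_new: true) do       # SAVEPOINT
    Product.lock("FOR UPDATE").find(product_id)    # exclusive row lock acquired
  end                                              # RELEASE SAVEPOINT -- the lock is still held
  # other sessions trying to lock the same row are still blocked here
end                                                # COMMIT -- the lock is finally released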
In the case of my app, I solved my problem by using multiple transactions and keeping track of state very carefully within the app. This presented a great opportunity for refactoring and the final version of the code is simpler, clearer, and easier to maintain, though it came at the expense of being a bit more spread out than the "throw-it-all-in-a-PG-transaction" approach.
I'm experiencing random errors (several per day) in my MVC + EF + Unity application under higher load (10+ requests per sec):
The connection was not closed / The connection's current state is connecting
deadlocks on Count queries (no explicit transaction)
An item with the same key has already been added. in System.Data.Entity.DbContext.Set<TEntity> while resolving DbContext
The remote host closed the connection. The error code is 0x80070057
There is already an open DataReader associated with this Command which must be closed first. - I turned on MARS to get rid of this (although I believe it should work correctly without MARS; there are no nested queries), which may have caused another random error:
The server will drop the connection, because the client driver has sent multiple requests while the session is in single-user mode.
I use this implementation of PerRequestLifetimeManager and tried Unity.Mvc3 too without any difference.
There are some hints that the DbContext is not being disposed correctly. I am not sure if per-request is the cause of the problems, because it seems to be common practice.
After further investigation I found out that the request-processing thread sometimes steals the DbContext from another thread, so Rashid's implementation of PerRequestLifetimeManager may not be thread safe. I moved to Unity.Mvc3 again and the errors disappeared; I must have made some mistake when I tried it last time.
The only errors not related to this were the deadlocks. They were caused by a collision of
SELECT ... FROM X JOIN Y ... JOIN Z ...
and
BEGIN TRAN
UPDATE Z ...
UPDATE Y ...
COMMIT TRAN
The SELECT locked Y and wanted Z; the transaction locked Z and wanted Y.
I'm looking for a good library for processing tasks (or 'operations', as we call them in our domain model) for Java or .NET. We save each operation to perform in the db, and then we need some mechanism for fetching unprocessed tasks from the db, processing them, and updating the db record with the proper status ('processed OK' / 'process error').
The trick is that operations can depend on one another. For example, when 'Operation Payment' is being processed, the system might discover that we need to perform 'Operation Check Payment Data' first, so it should create a new operation row in the db, pause 'Operation Payment', process 'Operation Check Payment Data' in the next turn, and after it completes go back to processing 'Operation Payment'.
I'll show you how we manage this at the moment.
We've got a db table 'operations'. A cron-like mechanism runs every minute, fetches the first 100 unprocessed operations from the db, and processes them. If (while processing) the system finds that some other operation (B) is needed to perform the current operation (A), then a new operation (B) record is created and processing of the current operation (A) is halted. The next minute, cron fetches operations A and B. Operation A is fetched because it is not yet processed, but the system sees that the dependent operation B has already been created, so it does not create it again. Operation B is processed and the status 'processed OK' is saved in the proper row in the db. The next minute, cron fetches operation A from the db and can finally perform it, because the dependent task is completed.
We are looking for ways to make it simpler, better and more elegant.
There is a list of open-source Java workflow engines.