Random errors occur with per-request DbContext - asp.net-mvc

I'm experiencing random errors (several per day) in my mvc+ef+unity application under higher load (10+ requests per sec):
The connection was not closed / The connection's current state is connecting
deadlocks on Count queries (no explicit transaction)
An item with the same key has already been added. in System.Data.Entity.DbContext.SetTEntity while resolving DbContext
The remote host closed the connection. The error code is 0x80070057
There is already an open DataReader associated with this Command which must be closed first. - I turned on MARS to get rid of this (although I believe it should work correctly without MARS, as there are no nested queries), which may have caused another random error:
The server will drop the connection, because the client driver has sent multiple requests while the session is in single-user mode.
I use this implementation of PerRequestLifetimeManager and tried Unity.Mvc3 too without any difference.
There are some hints that DbContext is not being disposed correctly. I am not sure if per-request is the cause of the problems, because it seems to be common practice.

After further investigation I found out that a request-processing thread sometimes steals the DbContext from another thread, so Rashid's implementation of PerRequestLifetimeManager may not be thread safe. I moved to Unity.Mvc3 again and the errors disappeared; I must have made some mistake when I tried it last time.
The only unrelated errors were the deadlocks. They were caused by a collision between
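To make the "stolen context" failure mode concrete, here is a minimal Python sketch of a per-request store that cannot leak between concurrent requests. It is an analogy only, not the Unity or Unity.Mvc3 implementation: DbContext, get_context and end_request are hypothetical names, and the slot is tied to the thread purely for illustration (in ASP.NET the slot would need to be tied to the request itself).

```python
import threading


class DbContext:
    """Hypothetical stand-in for a non-thread-safe unit-of-work object."""

    def __init__(self):
        self.owner = threading.get_ident()
        self.disposed = False

    def dispose(self):
        self.disposed = True


# One slot per thread; a value stored by one thread is invisible to the others.
_request_store = threading.local()


def get_context():
    """Return this request's context, creating it on first use."""
    ctx = getattr(_request_store, "ctx", None)
    if ctx is None:
        ctx = DbContext()
        _request_store.ctx = ctx
    return ctx


def end_request():
    """Dispose the context at the end of the request and clear the slot."""
    ctx = getattr(_request_store, "ctx", None)
    if ctx is not None:
        ctx.dispose()
        del _request_store.ctx


def handle_request(results, index):
    ctx = get_context()
    same_instance = get_context() is ctx          # resolved twice, same object
    owned_by_me = ctx.owner == threading.get_ident()
    results[index] = (ctx, same_instance, owned_by_me)
    end_request()


def run_requests(count=8):
    results = [None] * count
    threads = [
        threading.Thread(target=handle_request, args=(results, i))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each simulated request resolves its own context, sees the same instance on repeated resolution, and disposes it when the request ends.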
SELECT ... FROM X JOIN Y ... JOIN Z ...
and
BEGIN TRAN
UPDATE Z ...
UPDATE Y ...
COMMIT TRAN
SELECT locked Y and wanted Z, TRAN locked Z and wanted Y
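The cycle above (one side holds Y and wants Z, the other holds Z and wants Y) is a lock-ordering problem. The post does not say how it was fixed; one common remedy is to make both code paths touch the resources in the same order. A minimal Python sketch of that idea, with two plain locks standing in for the locks on Y and Z (all names are hypothetical):

```python
import threading

lock_y = threading.Lock()  # stands in for the lock on table Y
lock_z = threading.Lock()  # stands in for the lock on table Z


def reader(log):
    # Like the SELECT: takes Y first, then Z.
    with lock_y:
        with lock_z:
            log.append("read")


def writer(log):
    # The original transaction updated Z and then Y. Taking the locks in the
    # same order as the reader (Y, then Z) removes the cycle.
    with lock_y:
        with lock_z:
            log.append("write")


def run(rounds=50):
    log = []
    threads = []
    for _ in range(rounds):
        threads.append(threading.Thread(target=reader, args=(log,)))
        threads.append(threading.Thread(target=writer, args=(log,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

With a single global order no thread can hold one lock while waiting for a lock held by a thread that is waiting on it, so every reader and writer finishes.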

Related

Remote neo4j gremlin inconsistent results

I set up a neo4j server, gremlin server and gremlin console. I am connecting gremlin server to neo4j with SteelBridgeLabs/neo4j-gremlin-bolt. When I add several nodes and try to fetch all nodes afterwards from gremlin console, I get inconsistent results. It doesn't return all nodes.
neo4j.properties
gremlin.graph=com.steelbridgelabs.oss.neo4j.structure.Neo4JGraph
#neo4j.graph.name=graph.db
neo4j.identifier=dummy
neo4j.url=bolt://localhost:7687
neo4j.username=neo4j
neo4j.password=pass
neo4j.readonly=false
neo4j.vertexIdProvider=com.steelbridgelabs.oss.neo4j.structure.providers.Neo4JNativeElementIdProvider
neo4j.edgeIdProvider=com.steelbridgelabs.oss.neo4j.structure.providers.Neo4JNativeElementIdProvider
This is how I add nodes, along with the results:
gremlin> g.addV('cat').property("name","sylvester")
==>v[null]
gremlin> g.addV('cat').property("name","tom")
==>v[null]
gremlin> g.addV('cat').property("name","garfield")
==>v[null]
gremlin> g.addV('mice').property("name","jerry")
==>v[null]
Neo4j Browser shows these nodes without a problem. But when I query from gremlin-console I get different results, as follows:
gremlin> g.V().valueMap()
==>{name=[garfield]}
==>{name=[sylvester]}
==>{name=[tom]}
==>{name=[jerry]}
gremlin> g.V().valueMap()
==>{name=[garfield]}
The comments above apparently led to the answer here, but I can't say that I'm clear on exactly why the fix mentioned actually resolves the problem.
For those who come across this question, I will summarize how this sort of connectivity is expected to work. Connecting from Gremlin Console to Gremlin Server with:
:remote connect tinkerpop.server conf/remote.yaml
opens a sessionless connection, in which the server manages transactions for you: each time you submit a request to the server, a transaction is opened, the traversal is executed, and the transaction is closed. It also means that any mutations to the graph should automatically be committed (or rolled back on failure) at the end of each request. With that model, there should be no situation where you get stale or inconsistent data.
To build on that notion, performing a submission as:
gremlin> g.tx().rollback();g.V().valueMap()
under this (and any) connection model should explicitly start a fresh transaction and thus never produce a stale result set.
The following method of connection yields a session such that the user manages the transaction:
:remote connect tinkerpop.server conf/remote.yaml session
and therefore may yield an inconsistent state. You must explicitly commit and roll back transactions as needed, as the transaction can extend over multiple requests. In other words, expect that it will be necessary to call g.tx().rollback() before you can see the latest changes made to the graph in a different thread of execution.
The final connection option is as follows and it blends the two concepts:
:remote connect tinkerpop.server conf/remote.yaml session-managed
You get a session in the sense that variables are preserved between requests, but each request represents a single transaction which is committed or rolled back at the end of the request. As with a sessionless connection, you should not expect an inconsistent state or stale data and, as mentioned earlier, calling g.tx().rollback() prior to the query should force the start of a new transaction even if the managed transaction somehow failed to behave as expected.
If these things are not working as described here, I would likely wonder about the graph provider itself and whether or not its transaction semantics are completely compliant with the TinkerPop model.
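To illustrate the difference between the connection models, here is a toy Python model. It is not TinkerPop or the neo4j-gremlin-bolt provider: ToyGraph and ToyConnection are hypothetical, and the sketch assumes that an open transaction reads from a snapshot taken when it began, which a real provider may or may not do depending on its isolation level.

```python
class ToyGraph:
    """Committed state shared by all connections (toy model only)."""

    def __init__(self):
        self.committed = []


class ToyConnection:
    def __init__(self, graph, managed):
        self.graph = graph
        self.managed = managed   # True: one transaction per request
        self._snapshot = None    # view of the graph held by the open transaction
        self._pending = []       # uncommitted additions

    def _begin(self):
        if self._snapshot is None:
            self._snapshot = list(self.graph.committed)

    def commit(self):
        self.graph.committed.extend(self._pending)
        self._pending = []
        self._snapshot = None

    def rollback(self):
        self._pending = []
        self._snapshot = None

    def add_vertex(self, name):
        self._begin()
        self._pending.append(name)
        if self.managed:
            self.commit()        # sessionless / session-managed behaviour

    def vertices(self):
        self._begin()
        result = self._snapshot + self._pending
        if self.managed:
            self.commit()        # the transaction ends with the request
        return result


def demo():
    graph = ToyGraph()
    writer = ToyConnection(graph, managed=True)
    session = ToyConnection(graph, managed=False)
    before = session.vertices()          # opens a long-lived transaction
    writer.add_vertex("tom")             # committed by another connection
    stale = session.vertices()           # still reads the old snapshot
    session.rollback()                   # discard the old transaction
    fresh = session.vertices()           # a new transaction sees the change
    managed_view = writer.vertices()     # managed requests are always fresh
    return before, stale, fresh, managed_view
```

In this model the unmanaged session keeps returning its old view until rollback() is called, while the managed connection never holds a transaction across requests.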

Deadlock on concurrent update, but I can see no concurrency

What could trigger a deadlock-message on Firebird when there is only a single transaction writing to the DB?
I am building a webapp with a backend written in Delphi2010 on top of a Firebird 2.1 database. I am getting a concurrent-update error that I cannot make sense of. Maybe someone can help me debug the issue or explain scenarios that may lead to the message.
I am trying an UPDATE to a single field on a single record.
UPDATE USERS SET passwdhash=? WHERE (RECID=?)
The message I am seeing is the standard:
deadlock
update conflicts with concurrent update
concurrent transaction number is 659718
deadlock
Error Code: 16
I understand what it tells me but I do not understand why I am seeing it here as there are no concurrent updates I know of.
Here is what I did to investigate.
I started my application server and checked the result of this query:
SELECT
A.MON$ATTACHMENT_ID,
A.MON$USER,
A.MON$REMOTE_ADDRESS,
A.MON$REMOTE_PROCESS,
T.MON$STATE,
T.MON$TIMESTAMP,
T.MON$TOP_TRANSACTION,
T.MON$OLDEST_TRANSACTION,
T.MON$OLDEST_ACTIVE,
T.MON$ISOLATION_MODE
FROM MON$ATTACHMENTS A
LEFT OUTER JOIN MON$TRANSACTIONS T
ON (T.MON$ATTACHMENT_ID = A.MON$ATTACHMENT_ID)
The result indicates a number of connections, but only one of them has non-NULLs in the MON$TRANSACTIONS fields. This connection is the one I am using from IBExpert to query the monitoring tables.
Am I right to think that connection with no active transaction can be disregarded as not contributing to a deadlock-situation?
Next I put a breakpoint on the line submitting the UPDATE-Statement in my application server and executed the request that triggers it. When the breakpoint stopped the application I then reran the Monitor-query above.
This time I could see another active transaction, just as I would expect.
Then I let my appserver execute the UPDATE and reap the error-message as shown above.
What can trigger the deadlock-message when there is only one writing transaction? Or are there more and I am misinterpreting the output? Any other suggestions on how to debug this?
Firebird uses MVCC (Multiversion Concurrency Control) for its transaction model. One of its features is that, depending on the transaction isolation, you will only see the versions that were committed when your transaction started (consistency and concurrency isolation levels) or that were committed when your statement started (read committed). A change to a record creates a new version of the record, which only becomes visible to other active transactions once it has been committed (and then only to read committed transactions).
As a basic rule, there can only be one uncommitted version of a record, so attempts by two transactions to update the same record will fail for one of those transactions. For historical reasons these types of errors are grouped under the deadlock error family, even though this is not actually a deadlock in the normal concurrency vernacular.
This rule is actually a bit more restrictive depending on your transaction isolation: for the consistency and concurrency levels there can also be no newer committed versions of a record that are not visible to your transaction.
My guess is that for you something like this happened:
1. Transaction 1 started
2. Transaction 2 started with concurrency or consistency isolation
3. Transaction 1 modifies record (new version created)
4. Transaction 1 commits
5. Transaction 2 attempts to modify same record
(Note: steps 1+3 and 2 could occur in a different order, e.g. 1,3,2 or 2,1,3.)
Step 5 fails, because the new version created in step 3 is not visible to transaction 2. If instead read committed had been used then step 5 would succeed as the new version would be visible to the transaction at that point.
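The sequence above can be replayed in a toy Python model of the rule. This is only a sketch of the visibility check as described in this answer, not Firebird's actual implementation; ToyStore, Transaction and UpdateConflict are hypothetical names.

```python
from dataclasses import dataclass


class UpdateConflict(Exception):
    """Toy counterpart of 'update conflicts with concurrent update'."""


@dataclass
class Transaction:
    tx_id: int
    snapshot: frozenset      # transactions already committed when this one started
    read_committed: bool


class ToyStore:
    def __init__(self):
        self._next_id = 1
        self._committed = set()
        self._versions = {}  # key -> list of (writer tx id, value), newest last

    def begin(self, read_committed=False):
        tx = Transaction(self._next_id, frozenset(self._committed), read_committed)
        self._next_id += 1
        return tx

    def commit(self, tx):
        self._committed.add(tx.tx_id)

    def _visible(self, tx, writer_id):
        if writer_id == tx.tx_id:
            return True
        if tx.read_committed:
            return writer_id in self._committed   # committed by now
        return writer_id in tx.snapshot           # committed before tx started

    def update(self, tx, key, value):
        versions = self._versions.setdefault(key, [])
        if versions:
            newest_writer = versions[-1][0]
            # The newest version must be visible to the updating transaction;
            # an uncommitted or too-new version is reported as a conflict.
            if not self._visible(tx, newest_writer):
                raise UpdateConflict(
                    f"concurrent transaction number is {newest_writer}"
                )
        versions.append((tx.tx_id, value))


def replay(read_committed):
    store = ToyStore()
    t1 = store.begin()                                # step 1
    t2 = store.begin(read_committed=read_committed)   # step 2
    store.update(t1, "user-1", "hash-a")              # step 3
    store.commit(t1)                                  # step 4
    try:
        store.update(t2, "user-1", "hash-b")          # step 5
        return "ok"
    except UpdateConflict:
        return "conflict"
```

replay(False) models transaction 2 running with snapshot-style isolation and ends in a conflict, while replay(True) models read committed and succeeds, even though no two transactions were ever writing at the same moment.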

Rails 4 Multithreaded App - ActiveRecord::ConnectionTimeoutError

I have a simple rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data point under an associated model of the first. Let's call this middle model B and the data point model C. There's also a front end that lets users browse this data graphically/visually.
Thus the hierarchy is A has many -> B which has many -> C. I scrape a URL for each A which returns a few instances of B with new Cs that have data for the respective B.
While attempting to test/scale this app I have encountered a problem where rails will stop processing, hang for a while, and finally throw an "ActiveRecord::ConnectionTimeoutError could not obtain a database connection within 5.000 seconds". Obviously the 5 is just the default.
I can't understand why this is happening when 1) there are no DB calls being made explicitly, 2) the log doesn't show any under the hood DB calls happening when it does work 3) it works sometimes and not others.
What's going on with rails 4 AR and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create in memory new instances of model C, save all the C's in one transaction at the end.
Sometimes this works, other times it doesn't, and I can't figure out what causes it to fail. However, once it fails it seems to fail more and more.
I eager load all the model A's and B's to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use resque and resque scheduler to do this work but I highly doubt they are the source of the problem as it persists even if I just do "rails runner Class.do_work"
Any suggestions and or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
model.association.each do |a|
  # work here
end
Rails does some behind the scenes work that "uses" a DB connection. I put uses in quotes because in my case I think the result is actually returned from memory. I eager loaded the association and thus the DB is never actually hit.
Preliminary testing of wrapping my block in a
ActiveRecord::Base.connection_pool.with_connection do
  # the thread's work goes here
end
seems to have resolved the issue.
I uncovered this by adding a backtrace to my thread's error message that was printing out.
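Why returning the connection matters can be shown with a toy pool in Python. This is not ActiveRecord's pool: ToyPool and PoolTimeout are hypothetical, and the sketch assumes, as this answer suggests, that a thread which touches the database implicitly keeps its connection checked out unless it is returned explicitly.

```python
import queue
import threading
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Toy counterpart of ActiveRecord::ConnectionTimeoutError."""


class ToyPool:
    def __init__(self, size, timeout):
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"conn-{n}")
        self._timeout = timeout

    def checkout(self):
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout("could not obtain a connection") from None

    def checkin(self, conn):
        self._free.put(conn)

    @contextmanager
    def with_connection(self):
        conn = self.checkout()
        try:
            yield conn
        finally:
            self.checkin(conn)   # always handed back, even on error


def run_workers(pool, count, leak):
    """Run `count` threads and return how many timed out waiting for a connection."""
    errors = []

    def work():
        try:
            if leak:
                pool.checkout()              # taken implicitly, never returned
            else:
                with pool.with_connection():
                    pass                     # the thread's work would go here
        except PoolTimeout as exc:
            errors.append(exc)

    threads = [threading.Thread(target=work) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(errors)
```

With a pool of 2 and 6 leaking workers, 4 of them time out; with the same pool and a with_connection-style block, none do.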
-----For those using resque----
I also had to add a bit in my resque.rake file to get this fully working as intended.
task 'resque:setup' => :environment do
  Resque.after_fork do |job|
    ActiveRecord::Base.establish_connection
  end
end
If you are using
ActiveRecord::Base.transaction do
  ... code
end
to accomplish faster transactions in a thread, note that this locks the database. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, but it will lock your database.

How does Rails 4 Russian doll caching prevent stampedes?

I am looking for information on how the caching mechanism in Rails 4 protects against multiple users trying to regenerate cache keys at once, aka a cache stampede: http://en.wikipedia.org/wiki/Cache_stampede
I've not been able to find out much information via Googling. If I look at other systems (such as Drupal) cache stampede prevention is implemented via a semaphores table in the database.
Rails does not have a built-in mechanism to prevent cache stampedes.
According to the README for atomic_mem_cache_store (a replacement for ActiveSupport::Cache::MemCacheStore that mitigates cache stampedes):
Rails (and any framework relying on active support cache store) does not offer any built-in solution to this problem
Unfortunately, I'm guessing that this gem won't solve your problem either. It supports fragment caching, but it only works with time-based expiration.
Read more about it here:
https://github.com/nel/atomic_mem_cache_store
Update and possible solution:
I thought about this a bit more and came up with what seems to me to be a plausible solution. I haven't verified that this works, and there are probably better ways to do it, but I was trying to think of the smallest change that would mitigate the majority of the problem.
I assume you're doing something like cache model do in your templates as described by DHH (http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works). The problem is that when the model's updated_at column changes, the cache_key likewise changes, and all your servers try to re-create the template at the same time. In order to prevent the servers from stampeding, you would need to retain the old cache_key for a brief time.
You might be able to do this by (dum da dum) caching the cache_key of the object with a short expiration (say, 1 second) and a race_condition_ttl.
You could create a module like this and include it in your models:
module StampedeAvoider
  def cache_key
    orig_cache_key = super
    Rails.cache.fetch("/cache-keys/#{self.class.table_name}/#{self.id}", expires_in: 1, race_condition_ttl: 2) { orig_cache_key }
  end
end
Let's review what would happen. There are a bunch of servers calling cache model. If your model includes StampedeAvoider, then its cache_key will now be fetching /cache-keys/models/1, and returning something like /models/1-111 (where 111 is the timestamp), which cache will use to fetch the compiled template fragment.
When you update the model, model.cache_key will begin returning /models/1-222 (assuming 222 is the new timestamp), but for the first second after that, cache will keep seeing /models/1-111, since that is what is returned by cache_key. Once 1 second passes, all of the servers will get a cache-miss on /cache-keys/models/1 and will try to regenerate it. If they all recreated it immediately, it would defeat the point of overriding cache_key. But because we set race_condition_ttl to 2, all of the servers except for the first will be delayed for 2 seconds, during which time they will continue to fetch the old cached template based on the old cache key. Once the 2 seconds have passed, fetch will begin returning the new cache key (which will have been updated by the first thread which tried to read/update /cache-keys/models/1) and they will get a cache hit, returning the template compiled by that first thread.
Ta-da! Stampede averted.
Note that if you did this, you would be doing twice as many cache reads, but depending on how common stampedes are, it could be worth it.
I haven't tested this. If you try it, please let me know how it goes :)
The :race_condition_ttl setting in ActiveSupport::Cache::Store#fetch should help avoid this problem. As the documentation says:
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. If a cache expires and due to heavy load seven different processes will try to read data natively and then they all will try to write to cache. To avoid that case the first process to find an expired cache entry will bump the cache expiration time by the value set in :race_condition_ttl. Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for a just a bit longer. In the meantime that first process will go ahead and will write into cache the new value. After that all the processes will start getting new value. The key is to keep :race_condition_ttl small.
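The behaviour described in that documentation excerpt can be sketched in a few lines of Python. This is a simplified, single-threaded toy model and not the ActiveSupport implementation: ToyCache and its injected clock are hypothetical, and a second caller is simulated by calling fetch again from inside the slow compute step.

```python
class ToyCache:
    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # key -> (value, expires_at)

    def fetch(self, key, expires_in, race_condition_ttl, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if now < expires_at:
                return value
            # Expired: the first caller to notice extends the stale entry so
            # that everyone else keeps reading it while the value is recomputed.
            self._entries[key] = (value, now + race_condition_ttl)
        fresh = compute()
        self._entries[key] = (fresh, self._clock() + expires_in)
        return fresh


def demo():
    now = [0.0]
    cache = ToyCache(lambda: now[0])
    events = []

    def slow_compute():
        events.append("compute")
        # While the first caller is still computing, a second caller arrives.
        seen = cache.fetch("k", 10, 2, lambda: events.append("stampede") or "other")
        events.append(seen)
        return "v2"

    cache.fetch("k", 10, 2, lambda: "v1")          # warm the cache
    now[0] = 11.0                                  # the entry has now expired
    result = cache.fetch("k", 10, 2, slow_compute)
    after = cache.fetch("k", 10, 2, lambda: "unused")
    return result, events, after
```

In the demo the second caller receives the stale "v1" instead of recomputing, and once the first caller finishes everyone reads "v2".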
Great question. A partial answer that applies to single multi-threaded Rails servers but not multiprocess(or) environments (thanks to Nick Urban for drawing this distinction) is that the ActionView template compilation code blocks on a mutex that is per template. See line 230 in template.rb here. Notice there is a check for completed compilation both before grabbing the lock and after.
The effect is to serialize attempts to compile the same template, where only the first will actually do the compilation and the rest will get the already completed result.
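The check-lock-check pattern described here is usually called double-checked locking. A minimal Python sketch of the idea follows; it is not the ActionView code, and Template, render and the upper-casing "compilation" are placeholders.

```python
import threading


class Template:
    def __init__(self, source):
        self.source = source
        self.compiled = None
        self.compile_count = 0
        self._lock = threading.Lock()        # one lock per template

    def _compile(self):
        self.compile_count += 1
        return self.source.upper()           # placeholder for real compilation

    def render(self):
        if self.compiled is None:            # first check, without the lock
            with self._lock:
                if self.compiled is None:    # second check: another thread may have finished
                    self.compiled = self._compile()
        return self.compiled


def render_concurrently(template, count=16):
    results = [None] * count

    def work(i):
        results[i] = template.render()

    threads = [threading.Thread(target=work, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

However many threads call render at once, the template is compiled exactly once and every caller gets the same result. As the answer notes, this only helps within one process.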
Very interesting question. I searched on Google (you get more results if you search for "dog pile" instead of "stampede") but, like you, I did not get any answers, except for this one blog post: protecting from dogpile using memcache.
Basically, it stores your fragment under two keys: key:timestamp (where timestamp would be updated_at for Active Record objects) and key:last.
def custom_write_dogpile(key, timestamp, fragment, options)
  Rails.cache.write(key + ':' + timestamp.to_s, fragment)
  Rails.cache.write(key + ':last', fragment)
  Rails.cache.delete(key + ':refresh-thread')
  fragment
end
Now, when reading from the cache and trying to fetch a non-existent entry, it will try to fetch the key:last fragment instead:
def custom_read_dogpile(key, timestamp, options)
  result = Rails.cache.read(key + ':' + timestamp.to_s)
  if result.blank?
    Rails.cache.write(key + ':refresh-thread', 0, raw: true, unless_exist: true, expires_in: 5.seconds)
    if Rails.cache.increment(key + ':refresh-thread') == 1
      # This caller is first: report a miss so that it regenerates the fragment
      result = nil
    else
      # Another caller is already regenerating: serve the last known fragment
      result = Rails.cache.read(key + ':last')
    end
  end
  result
end
This is a simplified summary of the blog post by Moshe Bergman that I linked to above.
There is no protection against memcache stampedes. This is a real problem when multiple machines are involved and multiple processes on those multiple machines. -Ouch-.
The problem is compounded when one of the key processes has "died" leaving any "locking" ... locked.
In order to prevent stampedes you have to re-compute the data before it expires. So, if your data is valid for 10 minutes, you need to regenerate again at the 5th minute and re-set the data with a new expiration for 10 more minutes. Thus you don't wait until the data expires to set it again.
You should also not allow your data to expire at the 10 minute mark, but re-compute it every 5 minutes, so that it never expires. :)
You can use wget & cron to periodically call the code.
I recommend using redis, which will allow you to save the data and reload it in the event of a crash.
-daniel
A reasonable strategy would be to:
use a :race_condition_ttl with at least the expected time it takes to refresh the resource. Setting it to less time than expected to perform a refresh is not advisable as the angry mob will end up trying to refresh it, resulting in a stampede.
use an :expires_in time calculated as the maximum acceptable expiry time minus the :race_condition_ttl to allow for refreshing the resource by a single worker and avoiding a stampede.
Using the above strategy will ensure that you don't exceed your expiry/staleness deadline and also avoid a stampede. It works because only one worker gets through to refresh, whilst the angry mob are held off using the cache value with the race_condition_ttl extension time right up to the originally intended expiry time.
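The arithmetic behind this strategy is small enough to write down. A Python sketch follows; cache_settings and its parameter names are hypothetical, and the numbers in the usage note are illustrative only.

```python
def cache_settings(max_staleness_s, expected_refresh_s):
    """Derive fetch options from a staleness deadline and an expected refresh time."""
    if expected_refresh_s >= max_staleness_s:
        raise ValueError("refresh takes longer than the staleness deadline allows")
    # Give the single refreshing worker at least as long as a refresh takes.
    race_condition_ttl = expected_refresh_s
    # Expire early so that expiry plus the extension still meets the deadline.
    expires_in = max_staleness_s - race_condition_ttl
    return {"expires_in": expires_in, "race_condition_ttl": race_condition_ttl}
```

For data that may be at most 600 seconds old and takes about 30 seconds to refresh, this gives expires_in of 570 and race_condition_ttl of 30.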

Single user database connection best practices

With MS Access single user,
Is it good practice or okay to maintain a persistent connection throughout?
pseudocode:
app.start();
access.connect();
domanymanystuff();
access.disconnect();
app.exit();
--- OR ----
app.start();
access.connect();
doonetask();
access.disconnect();
...
access.connect();
doanothertask();
access.disconnect();
...
app.exit();
?
Honestly, it won't matter, since most data connections are pooled and will hang around for reuse after you have closed them. You do want to make sure that your transactions are performed in a 'per unit of work' fashion.
Otherwise, even with a single user DB you could find your application locking itself out.
So, try this:
Open connection
Start transaction
Perform unit of work
Commit transaction
...
Start transaction
Perform unit of work
Commit transaction
...
Start transaction
Perform unit of work
Commit transaction
...
Close connection
You can maintain a persistent connection throughout with a single-user database.
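As a rough illustration of one persistent connection with one transaction per unit of work, here is a Python sketch. It uses an in-memory SQLite database from the standard library as a stand-in, since MS Access is not available there; the table and task names are made up.

```python
import sqlite3


def run_units_of_work(units):
    """Run (task, should_fail) pairs, each in its own transaction, on one connection."""
    conn = sqlite3.connect(":memory:")   # opened once, kept for the whole run
    conn.execute("CREATE TABLE log (task TEXT)")
    try:
        for task, should_fail in units:
            try:
                with conn:               # commits on success, rolls back on an exception
                    conn.execute("INSERT INTO log (task) VALUES (?)", (task,))
                    if should_fail:
                        raise RuntimeError(f"{task} failed")
            except RuntimeError:
                pass                     # that unit was rolled back; carry on with the next
        rows = conn.execute("SELECT task FROM log ORDER BY rowid")
        return [row[0] for row in rows]
    finally:
        conn.close()                     # closed once, at the end
```

Running the units ("a", ok), ("b", fails), ("c", ok) leaves only "a" and "c" in the table: the failed unit is rolled back without affecting the others or the connection.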
