Database resiliency - database-connection

I'm designing an application that relies heavily on a database. I need my application to be resilient to short losses of connectivity to the database (network going down for a few seconds for example). What is the usual patterns that people use for these kind of problems. Is there something that I can do on the database access layer to handle gracefully a small glitch in the network connection to the db (i'm using hibernate + oracle jdbc + dbcp pool).

I'll assume you have hidden every database access behind a DAO or something similiar.
Now create wrappers around these DAOs, that try to call them, and in case of an exception wait a second and retry. Of course this will cause 'hanging' of the application during db-outage, but it will come back to live when the database becomes available.
If this is not acceptable you'll have to move the cut up closer to the ui layer. Consider the following approach.
User causes a
request.
wrap all the request information in a message and put it in the queue.
return to the user, telling him that his request will get processed in a short time.
A worker registered on the queue will process the request, retrying when database problems exist.
Note that you are now deep in concurrency land. So you must handle things like requests referencing an entity which already got deleted.
Read up on 'eventual consistency'
Since you are using hibernate, you'll have to deal with lazy loading. An interruption in connectivity will kill your session, so for you it might be best not to use lazy loading at all, but work with detached objects.

Related

How to handle split-brain?

I have read in Orleans FAQ when split-brain could happen but I don't understand what bad can happen and how to handle it properly.
FAQ says something vague like:
You just need to consider the rare possibility of having two instances of an actor while writing your application.
But how actually should I consider this and what can happen if I won't?
Orleans Paper (http://research.microsoft.com/pubs/210931/Orleans-MSR-TR-2014-41.pdf) says this:
application can rely on external persistent
storage to provide stronger data consistency
But I don't understand what this means.
Suppose split brain happened. Now I have two instances of one grain. When I'll send a few messages they could be received by these two (or there can be even more?) different instances. Suppose each instance prior to receiving these messages had same state. Now, after processing these messages they have different states.
How they should persist their states? There could be a conflict.
When another instances will be destroyed and only one will remain what will happen to the states of destroyed instances? It'll be like messages processed by them has never been processed? Then client state and server state could be desyncronized IIUC.
I see this (split-brain) as a big problem and I don't understand why there is so little attention to it.
Orleans leverages the consistency guarantees of the storage provider. When you call this.WriteStateAsync() from a grain, the storage provider ensures that the grain has seen all previous writes. If it has not, an exception is thrown. You can catch that exception and call DeactivateOnIdle() and rethrow the exception or call ReadStateAsync() and retry. So if you have 2 grains during a split-brain scenario, which ever one calls WriteStateAsync() first prevents the other one from writing state without first having read the most up-to-date state.
Update: Starting in Orleans v1.5.0, a grain which allows an InconsistentStateException to be thrown back to the caller will automatically be deactivated when the currently executing calls complete. A grain can catch and handle the exception to avoid automatic deactivation.

Core data with REST api [duplicate]

Hey, I'm working on the model layer for our app here.
Some of the requirements are like this:
It should work on iPhone OS 3.0+.
The source of our data is a RESTful Rails application.
We should cache the data locally using Core Data.
The client code (our UI controllers) should have as little knowledge about any network stuff as possible and should query/update the model with the Core Data API.
I've checked out the WWDC10 Session 117 on Building a Server-driven User Experience, spent some time checking out the Objective Resource, Core Resource, and RestfulCoreData frameworks.
The Objective Resource framework doesn't talk to Core Data on its own and is merely a REST client implementation. The Core Resource and RestfulCoreData all assume you talk to Core Data in your code and they solve all the nuts and bolts in the background on the model layer.
All looks okay so far and initially I though either Core Resource or RestfulCoreData will cover all of the above requirements, but... There's a couple of things none of them seemingly happen to solve correctly:
The main thread should not be blocked while saving local updates to the server.
If the saving operation fails the error should be propagated to the UI and no changes should be saved to the local Core Data storage.
Core Resource happens to issue all of its requests to the server when you call - (BOOL)save:(NSError **)error on your Managed Object Context and therefore is able to provide a correct NSError instance of the underlying requests to the server fail somehow. But it blocks the calling thread until the save operation finishes. FAIL.
RestfulCoreData keeps your -save: calls intact and doesn't introduce any additional waiting time for the client thread. It merely watches out for the NSManagedObjectContextDidSaveNotification and then issues the corresponding requests to the server in the notification handler. But this way the -save: call always completes successfully (well, given Core Data is okay with the saved changes) and the client code that actually called it has no way to know the save might have failed to propagate to the server because of some 404 or 421 or whatever server-side error occurred. And even more, the local storage becomes to have the data updated, but the server never knows about the changes. FAIL.
So, I'm looking for a possible solution / common practices in dealing with all these problems:
I don't want the calling thread to block on each -save: call while the network requests happen.
I want to somehow get notifications in the UI that some sync operation went wrong.
I want the actual Core Data save fail as well if the server requests fail.
Any ideas?
You should really take a look at RestKit (http://restkit.org) for this use case. It is designed to solve the problems of modeling and syncing remote JSON resources to a local Core Data backed cache. It supports an offline mode for working entirely from the cache when there is no network available. All syncing occurs on a background thread (network access, payload parsing, and managed object context merging) and there is a rich set of delegate methods so you can tell what is going on.
There are three basic components:
The UI Action and persisting the change to CoreData
Persisting that change up to the server
Refreshing the UI with the response of the server
An NSOperation + NSOperationQueue will help keep the network requests orderly. A delegate protocol will help your UI classes understand what state the network requests are in, something like:
#protocol NetworkOperationDelegate
- (void)operation:(NSOperation *)op willSendRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entity;
- (void)operation:(NSOperation *)op didSuccessfullySendRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entity;
- (void)operation:(NSOperation *)op encounteredAnError:(NSError *)error afterSendingRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entity;
#end
The protocol format will of course depend on your specific use case but essentially what you're creating is a mechanism by which changes can be "pushed" up to your server.
Next there's the UI loop to consider, to keep your code clean it would be nice to call save: and have the changes automatically pushed up to the server. You can use NSManagedObjectContextDidSave notifications for this.
- (void)managedObjectContextDidSave:(NSNotification *)saveNotification {
NSArray *inserted = [[saveNotification userInfo] valueForKey:NSInsertedObjects];
for (NSManagedObject *obj in inserted) {
//create a new NSOperation for this entity which will invoke the appropraite rest api
//add to operation queue
}
//do the same thing for deleted and updated objects
}
The computational overhead for inserting the network operations should be rather low, however if it creates a noticeable lag on the UI you could simply grab the entity ids out of the save notification and create the operations on a background thread.
If your REST API supports batching, you could even send the entire array across at once and then notify you UI that multiple entities were synchronized.
The only issue I foresee, and for which there is no "real" solution is that the user will not want to wait for their changes to be pushed to the server to be allowed to make more changes. The only good paradigm I have come across is that you allow the user to keep editing objects, and batch their edits together when appropriate, i.e. you do not push on every save notification.
This becomes a sync problem and not one easy to solve. Here's what I'd do: In your iPhone UI use one context and then using another context (and another thread) download the data from your web service. Once it's all there go through the sync/importing processes recommended below and then refresh your UI after everything has imported properly. If things go bad while accessing the network, just roll back the changes in the non UI context. It's a bunch of work, but I think it's the best way to approach it.
Core Data: Efficiently Importing Data
Core Data: Change Management
Core Data: Multi-Threading with Core Data
You need a callback function that's going to run on the other thread (the one where actual server interaction happens) and then put the result code/error info a semi-global data which will be periodically checked by UI thread. Make sure that the wirting of the number that serves as the flag is atomic or you are going to have a race condition - say if your error response is 32 bytes you need an int (whihc should have atomic acces) and then you keep that int in the off/false/not-ready state till your larger data block has been written and only then write "true" to flip the switch so to speak.
For the correlated saving on the client side you have to either just keep that data and not save it till you get OK from the server of make sure that you have a kinnf of rollback option - say a way to delete is server failed.
Beware that it's never going to be 100% safe unless you do full 2-phase commit procedure (client save or delete can fail after the signal from the server server) but that's going to cost you 2 trips to the server at the very least (might cost you 4 if your sole rollback option is delete).
Ideally, you'd do the whole blocking version of the operation on a separate thread but you'd need 4.0 for that.

db4o Client See Changes from Another Client

I'm running a db4o server with multiple clients accessing it. I just ran into the issue of one client not seeing the changes from another client. From my research on the web, it looks like there are basically two ways to solve it.
1: Call Refresh() on the object (from http://www.gamlor.info/wordpress/2009/11/db4o-client-server-and-concurrency/):
const int activationDeph = 4;
client2.Ext().Refresh(objFromClient2, activationDeph);
2: Instead of caching the IObjectContainer, open a new IObjectContainer for every DB request.
Is that right?
Yes, #1 is more efficient, but is that really realistic to specify which objects to refresh? I mean, when a DB is involved, every time a client accesses it, it should get the latest information. That's why I'm leaning towards #2. Plus, I don't have major efficiency concerns.
So, am I right that those are the two approaches? Or is there another?
And, wait a sec... what happens when your object goes out of scope? On a timer, I call a method that gets an object from the DB server. That method instantiates the object. Since the object went out of scope, it's not there to refresh. And when I call the DB, I don't see the changes from the client. In this case, it seems like the only option is to open a new IObjectContainer. No?
** Edit **
I thought I'd post some code using the solution I finally decided to use. Since there were some serious complexities with using a new IObjectContainer for every call, I'm simply going to do a Refresh() in every method that accesses the DB (see Refresh() line below). Since I've encapsulated my DB access into logic classes, I can make sure to do the Refresh() there, every time. I just tested this and it seems to be working.
Note: The Database variable below is the the db4o IObjectContainer.
public static ApplicationServer GetByName(string serverName)
{
ApplicationServer appServer = (from ApplicationServer server in Database
where server.Name.ToUpperInvariant() == serverName.ToUpperInvariant()
select server).FirstOrDefault();
Database.Ext().Refresh(appServer, 10);
return appServer;
}
1) As you said, the major problem with this that you usually really don't know what objects to refresh.
You can use the committed event to refresh objects as soon as any client has committed. db4o will distribute that event. Note that this also consumes some network traffic & time to send the events. And there will be a time frame where your objects have a stale state.
2) It actually the cleanest method, but not for every db request. Use a object container for every logical unit of work. Any operation which is one 'atomic' unit of work in your business-operations.
Anyway in general. db4o was never build with the client server scenario as first priority, and it shows in the concurrent scenarios. You cannot avoid working with stale (and even inconsistent) object state and there is no concurrency control options (except the low level semaphores).
My recommendation: Use a client container per unit of work. Be aware that even then you might get stale data, which then might lead to a inconsistent view & update. When there are rarely any contentions & races in your application scenario and you can tolerate a mistake once in a while, then this is fine. However if you really need to ensure correctness, then I recommend to use a database which has a better concurrency store =(

How to handle this concurrency scenario with NHibernate + asp.net mvc?

The context: a web application written in asp.net MVC + NHibernate. This is a card game where players play at the same time so their actions might modify the same field of an entity at the same time (they all do x = x + 1 on a field). Seems pretty classical but I don't know how to handle it.
Needless to say I can't present a popup to the user saying "The entity has been modified by another player. Merge or cancel ?". When you think that this is related to an action to a card, I can't interfere like this. My application has to internally handle this scenario. Since the field is in an entity class and each session has it own instance of the entity, I can't simply take a CLR lock. Does it mean I should use pessimistic concurrency so that each web request acting on this entity is queued until a player finished his action? In practical terms in means that each PlayCard request should use a lock?
Please, don't send me to NH doc about concurrency or alike. I'm after the technique that should be used in this case, not how to implement it in NH.
Thanks
It may make sense depending on your business logic to try second level caching. This may be a good depending on the length of the game and how it is played. Since the second level cache exists on the session factory level, the session factory will have to be managed according to the life time of the game. An Nh session can be created per request, but being spawned by a session factory configured for second level cache means data of interest is cached across all sessions. The advantage of using second level cache is that you can configure this on a class by class basis - caching only the entities your require. It also provides a variety of concurrency strategies depending on the cache provider. Even though this may shift the concurrency issue from the DB level to the NH session, this may give you a better option for dealing with your situation. There are gotchas to using this but it's suitability all depends on your business logic.
You can try to apply optimistic locking in this way:
DB entity will have a column tracking entity version (nhibernate.info link).
If you get "stale version" exception while saving entity ( = modified by another user) - reload the entity and try again. Then send the updated value to the client.
As I understand your back-end receives request from the client, then opens session, does some changes and updates entities closing session. In this case no thread will hold one entity in memory for too long and optimistic locking conflicts shouldn't happen too often.
This way you can avoid having many locked threads waiting for operation to complete.
On the other hand, if you expect retries to happen too often you can try SELECT FOR UPDATE locking when loading your entity (using LockMode.Upgrade in NH Get method). Although I found the thread that discourages me from using this with SQL Server: SO link.
In general the solution depends on the logic of the game and whether you can resolve concurrency conflicts in your code without showing messages to users. I'd also made UI updating itself with the latest data often enough to avoid players acting on obsolete game situation and then be surprised with the outcome.

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on an web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request task ~ 15 / 20 secs to complete and a couple secs to process. Once there, the user can manipulate vaules on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the intial data into memory and using Ajax calls back to the server so that I can reduce processing time to handle the recalculation that occurs w/ this user's updates.
Here is my question, with performance in mind, what would be the best way to storing this data, assuming that only 1 user will be working w/ this data at any given moment.
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event which will meet my failsafe requirements. Cons: Slowest perf of the my current options. The Session End event is sometimes tricky to ensure it fires properly.
Caching - Pros: Good Perf. Has access to dependencies which could be a bonus later down the line but not really useful in current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best Perf. Easies to maintain as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on what I option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method of for storing the data across any page loads. You can name the cache you want to store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save changes back to DB. If you name the files based on the user/account or some other session-unique indicator then there's no issue with conflict and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the write out more: save changes to Session, then write that to file at intervals, then sweep the file at larger intervals. you can tune it to performance and choose what level of possible user-change loss will be possible.
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You can not rely on the session simply because it is (typically) tied to the users browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
perhaps a bit off topic. I'd recommend putting those long processing calls in asynchronous (not to be confused with AJAX's asynchronous) pages.
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest to create a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large, and the code that manipulates it can be handled client side. You could store the data as an XML data island or JSON object. This data could then be manipulated/processed and handled all client side with no round trips to the server. If you need to persist this data back to the server the end resulting data could be posted via an AJAX or standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.

Resources