Having issues with deadlocks on a real-time, transactional, multi-user Neo4j embedded system. Can you point me to documentation that spells out which locks are acquired for each graph action? I'm especially concerned with adding and deleting relationships, as those seem to cause most of the deadlocks.
e.g.
Add relationship: write locks placed on both end nodes (is it true that write locks are also placed on all relationships that exist for both end nodes?)
Delete relationship: write locks placed on the relationship and both end nodes (is it true that write locks are also placed on all relationships for both end nodes?).
Why do the end nodes need to be locked during a relationship deletion?
Thanks
When you add a relationship, the graph locks the nodes involved. You can get a deadlock if transactions lock items in an unpredictable order. In my case I was creating one-to-many relationships, so we could sort the "many" nodes by node ID before writing, and locking in that consistent order prevented deadlocks for us.
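A minimal sketch of that ordering trick, assuming the Neo4j 3.x embedded API (the RELATES_TO type and the method name are just illustrative):

    import java.util.Comparator;
    import java.util.List;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.Transaction;

    public final class OrderedCreate {
        // Connects `source` to every node in `targets`, locking in a
        // predictable order by sorting the targets by internal ID first.
        static void connectInOrder(GraphDatabaseService db, Node source, List<Node> targets) {
            // Every transaction touching these nodes must use the same
            // order, otherwise the ordering buys you nothing.
            targets.sort(Comparator.comparingLong(Node::getId));
            try (Transaction tx = db.beginTx()) {
                for (Node target : targets) {
                    source.createRelationshipTo(target, RelationshipType.withName("RELATES_TO"));
                }
                tx.success();
            }
        }
    }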
When you delete, it's more complicated. It locks the nodes involved, but under the covers Neo4j stores each node's relationships as a doubly linked list, so when you remove a relationship it has to lock the previous and next entries in that list in order to splice them back together. This is something you cannot predict, since the API gives you no way to get at those IDs.
Your best bet is a deadlock retry policy: wrap the operation in a try{}catch(DeadlockDetectedException){}, and if you catch the deadlock exception, retry (I did this by putting the entire operation in a while loop that wouldn't break until the operation completed free of deadlocks).
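A minimal retry sketch, assuming the Neo4j 3.x embedded API (the package of DeadlockDetectedException has moved between versions, so check yours):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.kernel.DeadlockDetectedException;

    public final class RetryOnDeadlock {
        private static final int MAX_RETRIES = 10;

        // Runs `work` in a transaction, retrying when a deadlock is detected.
        static void run(GraphDatabaseService db, Runnable work) {
            for (int attempt = 0; ; attempt++) {
                try (Transaction tx = db.beginTx()) {
                    work.run();
                    tx.success();
                    return; // committed without deadlocking
                } catch (DeadlockDetectedException e) {
                    if (attempt >= MAX_RETRIES) {
                        throw e; // give up after too many attempts
                    }
                    // Back off briefly so the competing transaction
                    // has a chance to finish.
                    try {
                        Thread.sleep(50L * (attempt + 1));
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
    }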
Adding and removing relationships also requires updating both nodes' references to the relationships they have. In other words, adding or removing a relationship implies writes to the nodes at both ends. Therefore Neo4j needs to take write locks on all three entities.
The documentation is unfortunately out of date, it seems. There is more to locking in Neo4j than that page reveals, especially now that it has support for things like unique constraints.
Nicholas's advice about ordering the entities you want to write to by their IDs is worth trying. You can also try to split things out in the graph so that transactions that would otherwise conflict are less likely to touch the same data.
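A minimal sketch of that ordering idea, assuming the Neo4j 3.x embedded API, where Transaction.acquireWriteLock() lets you take the locks explicitly up front:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public final class OrderedLocking {
        // Locks both end nodes in a consistent order, then runs the change.
        static void withBothEndsLocked(GraphDatabaseService db,
                                       Node a, Node b, Runnable change) {
            Node first = a.getId() <= b.getId() ? a : b;
            Node second = a.getId() <= b.getId() ? b : a;
            try (Transaction tx = db.beginTx()) {
                tx.acquireWriteLock(first);   // lower ID first, every time
                tx.acquireWriteLock(second);
                change.run();                 // add/delete the relationship here
                tx.success();
            }
        }
    }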
Found this: http://docs.neo4j.org/chunked/stable/transactions-locking.html. It covers some basic info but does not mention the linked lists of relationships.
Related
In Neo4j I have an application where an API endpoint does CRUD operations on the graph; I then materialize reachable parts of the graph starting at known nodes, and finally I send the materialized subgraphs out to a bunch of other machines that don't know how to query Neo4j directly. However, the materialized views are moderately large, and within a given minute only small parts of each one will change, so I'd like to be able to query "what has changed since the last time I checked" so that I only have to send the deltas. What's the best way to do that? I'm not sure if it helps, but my data doesn't contain arbitrary-length paths; if needed, I can explicitly write each node and edge type into my query.
One possibility I imagined was adding a "last updated" timestamp as a property on every node and edge, and, instead of deleting things directly, just adding a "deleted" boolean property and updating the timestamp, then using a background process to actually delete a few minutes later (after the deltas have been sent out). Then, in my query, select all reachable nodes and edges and filter them on the timestamp property. However:
If there's clock drift between two different Neo4j write servers and the Raft leader changes from one to the other, can the timestamps go back in time? Or, even worse, will two concurrent writes always give me a transaction time that is in commit order, or can they be reordered within a single box? I would rather use a graph-wide monotonically increasing integer like the write commit ID, but I can't find a function that gives me that. Or, theoretically, I could use the cookie used for causal consistency, but since you only get that after the transaction is complete, it'd be messy to have to do every write as two separate transactions.
Also, it just sucks to use deletion markers, because then you have to explicitly filter out deleted edges/nodes in every other query you run.
Are there other better patterns here?
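For concreteness, the mark-then-reap write described above might look something like this with the official Java driver (org.neo4j:neo4j-java-driver, 4.x); the Item label, the id/deleted/lastUpdated properties, and the credentials are all assumptions. Note that Cypher's timestamp() is evaluated on the server, which is exactly where the clock-drift worry applies:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.Values;

    public class SoftDeleteSketch {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                    AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {
                // Mark instead of deleting; a background job deletes for
                // real once the delta has been shipped.
                session.run("MATCH (n:Item {id: $id}) "
                        + "SET n.deleted = true, n.lastUpdated = timestamp()",
                        Values.parameters("id", 42));
            }
        }
    }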
I know that you're not supposed to rely on IDs as identifiers for nodes over the long term, because when you delete nodes the IDs may be reassigned to new nodes (ref).
Neo4j reuses its internal ids when nodes and relationships are deleted. This means that applications using, and relying on internal Neo4j ids, are brittle or at risk of making mistakes. It is therefore recommended to rather use application-generated ids.
If I'm understanding this correctly, then looking up a node/relationship by its ID only puts you at risk when you can't guarantee that it hasn't been deleted since you obtained the ID.
If, through my application design, I can guarantee that the node with a certain ID hasn't been deleted since the time the ID was queried, am I alright to use the IDs? Or is there still some problem I might run into?
My use case is that I wish to perform a complex operation which spans multiple transactions. And I need to know if the ID I obtained for a node during the first transaction of that operation is a valid way of identifying the node during the last transaction of the operation.
As long as you are certain that a node/relationship with a given ID won't be deleted, you can use its native ID indefinitely.
However, over time you may want to add support for other use cases that need to delete that entity. Once that happens, your existing queries could start producing intermittent errors (and they may not be obvious ones).
So, it is still generally advisable to use your own identification properties.
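A minimal sketch of the application-generated-id approach, assuming the Neo4j 3.x embedded API (the Person label and uuid property are illustrative):

    import java.util.UUID;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public final class AppIds {
        // Creates a node with its own stable identifier and returns it.
        static String createPerson(GraphDatabaseService db) {
            String uuid = UUID.randomUUID().toString();
            try (Transaction tx = db.beginTx()) {
                Node node = db.createNode(Label.label("Person"));
                node.setProperty("uuid", uuid); // never reused, unlike node.getId()
                tx.success();
            }
            return uuid; // safe to hold across transactions
        }
    }

Pair it with a unique constraint on the uuid property so lookups stay fast and duplicates are rejected.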
I'm currently building a Core Data app and I've hit a snag. I guess here's some context on the schema:
The app is to keep track of a therapist's session with her clients. So the schema is organized thus: there's a table of clients, clients have sessions, sessions have activities, and activities have metrics. In the app these metrics translate to simple counters, timers, and NSSliders.
The crux is that the client wants to be able to insert previously made activities into new sessions for new clients. So, I've tried just doing a simple fetch request and then moved on to an NSFetchedResultsController. I keep running into the issue that since Core Data is an object graph, I get a ton of activity entries with virtually the same data. The only differentiating property would be whatever the session is (and if you want to go further back, the client itself).
I'm not sure if this is something I need to change in the schema itself, or if there's some kind of workaround I can do within Core Data. I've already tried doing distinct fetch results with the NSFetchedResultsController by using the result type NSDictionaryResultType. It kind of accomplishes what I want, but I only get the associated properties of the entity, and not any child entities associated with it (I need those metrics, you see).
Any help is appreciated, and I can post code if desired even though I don't really have a specific coding error.
I don't see the problem. If you modeled things with Client, Session, Activity, and Metric entities, each having a to-many relationship to the one to its right and a to-one/to-many inverse relationship to the one to its left (in the order I listed the entities), there is nothing stopping you from adding a particular activity to another session (of another client), is there?
Maybe I'm misunderstanding the question.
Just use a simple NSFetchRequest and set the predicate for exactly what you are looking for. You can set the fetch limit if you are getting too many results, but your question doesn't exactly sound like a question, IMO.
I believe what you are looking for is an NSPredicate to narrow your results down. Once you fetch a specific object, you can assign any relationship or attribute to that object easily with dot notation, then save the context.
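For example, a rough Swift sketch of that approach; the Activity entity, its name attribute, and a to-many sessions relationship are assumptions about the questioner's model:

    import CoreData

    // Hypothetical: fetch one existing Activity by name and attach it to
    // another session, so the activity is reused rather than duplicated.
    func reuse(activityNamed name: String,
               in session: NSManagedObject,
               context: NSManagedObjectContext) throws {
        let request = NSFetchRequest<NSManagedObject>(entityName: "Activity")
        request.predicate = NSPredicate(format: "name == %@", name)
        request.fetchLimit = 1 // one match is enough

        if let activity = try context.fetch(request).first {
            // If Activity.sessions is a to-many relationship, add via KVC.
            activity.mutableSetValue(forKey: "sessions").add(session)
            try context.save()
        }
    }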
Is there a way to ensure an ordered atomic change set from Simperium?
I have a data model with complex relationships. Looking things over, it seems possible for the object graph to enter an invalid state if the communication pipe is severed. Is there a way to indicate to Simperium that a group of changes belong together? This would be helpful, as the client or server could then refuse to apply those changes unless all the data from a "transaction" is present, keeping the object graph in a valid state.
Presently it's expected that your relationships are marked as optional, which allows objects to be synced and stored in any order without technically violating your model structure. Relationships are lazily re-established by Simperium at first opportunity, even if the connection is severed and later restored.
But this approach does pass some burden to your application logic. The code is open source, and suggestions for changes in this regard are welcome.
I have an object (A) that has a To-Many relationship to another object (B).
Also, B holds an inverse relationship to A.
When I delete B, it still shows up in A's relationship count unless I manually clear B's inverse relationship before deleting it.
I want this to happen synchronously, so I can update a UITableView and delete B's row instead of waiting for the MOC's save action to complete.
Is there any way to handle that without manually clearing B's inverse?
(I have tons of these relationships, and doing it by hand would be bad practice and hard to maintain.)
Thanks!
That should work automatically if you set the "Delete Rule" for the inverse relationship from B to A to "Nullify" in the Core Data Model inspector in Xcode.
See Relationship Delete Rules in the "Core Data Programming Guide" for more information.
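If you ever build the model in code rather than in the Xcode editor, the equivalent knob is the deleteRule property on NSRelationshipDescription; a minimal, hypothetical fragment:

    import CoreData

    // Hypothetical fragment: the same "Nullify" rule, set programmatically
    // when constructing an NSManagedObjectModel in code.
    let inverse = NSRelationshipDescription()
    inverse.name = "a" // the inverse relationship from B back to A
    inverse.deleteRule = .nullifyDeleteRule // what the Xcode inspector sets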
Almost 5 years later (at iOS 10 now) and I stumbled upon the same issue.
The app started crashing after I decided to 'optimise' things by removing saveContext() from almost everywhere, thinking that the in-memory representation was guaranteed to be correct (since includesPendingChanges is true by default).
However, I was getting the issue described above (and because a UITableView later needed to be updated, the app was crashing).
Here are four separate approaches, any one of which solved the issue in my case (ordered from best to worst, as far as I can judge); a minimal sketch of the first approach follows the list:
calling processPendingChanges() (on the NSManagedObjectContext) after the deletion
calling refreshAllObjects() (on the NSManagedObjectContext) after the deletion
calling saveContext() after the deletion (as others have pointed out in the comments)
just waiting: for example, delaying the execution of the code that depends on the correct data by 0.1 seconds or so (using DispatchQueue.main.asyncAfter). (This is of course by far the worst approach, and you should not implement it.)
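For reference, here is a rough sketch of the first approach; the entity, the deletion flow, and the table wiring are assumptions about the setup, not code from the question:

    import CoreData
    import UIKit

    // Hypothetical helper: delete an object, then force the context to
    // propagate delete rules (e.g. Nullify on inverse relationships) right
    // away, so a table reload sees consistent counts without saving first.
    func delete(_ object: NSManagedObject,
                in context: NSManagedObjectContext,
                reloading tableView: UITableView) {
        context.delete(object)
        // Without this, the other side of the relationship may still report
        // the deleted object until the context processes its pending changes
        // (normally at the end of the run loop event, or on save).
        context.processPendingChanges()
        tableView.reloadData()
    }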
Things I am still not sure about:
I tried replicating it in a separate, very small example project and did not get the issue. So what is the underlying problem?
Is this expected behaviour or a bug? Is it documented anywhere?