My last question, "Confused about MERGE sometimes creating duplicate relationship", was closed as a duplicate. However, I was unable to find a solution, and this question deals with duplicate relationships, not duplicate nodes.
I have a query that runs when a user visits another user's profile:
MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
MERGE (you)-[yvr:VISITED]->(youVisited)
SET yvr.seen = false, yvr.created_at = timestamp()
RETURN yvr.created_at as visited_at
I noticed that in rare cases, a duplicate [:VISITED] relationship is created. For (1057)-[:VISITED]->(630), both relationships have the same properties, and there is only ever supposed to be one [:VISITED] relationship between the same two User nodes (the next time the user visits, the query should simply MERGE the existing [:VISITED] and update its created_at and seen properties):
{
created_at: 1485800172734,
seen: false
}
I thought the point of MERGE was to prevent this. Clearly not, so why does this happen, and how can I ensure it doesn't?
I have looked up some other things, but I am not sure if the information is reliable or up to date. For example: http://neo4j.com/docs/developer-manual/current/cypher/clauses/create-unique/, am I supposed to be using CREATE UNIQUE instead? I thought MERGE was pretty much a better replacement for it.
I agree that in some cases, MERGE and CREATE UNIQUE can be used for the same purpose. MERGE does not replace CREATE UNIQUE, however.
For example, MERGE allows multiple matches, and its pattern has to fully match the graph to be considered a match - it will simply duplicate partial matches; CREATE UNIQUE, on the other hand, will error on multiple matches, and allows partial matches - it will attempt to re-use existing parts of your graph and add the missing parts.
As mentioned in the docs, there also seems to be a difference regarding uniqueness of relationships, i.e. what you are experiencing:
MERGE might be what you want to use instead of CREATE UNIQUE. Note however, that MERGE doesn’t give as strong guarantees for relationships being unique.
I'll leave it up to the developers of Neo4j to explain exactly what those guarantees are. I can only say that in your particular case, CREATE UNIQUE seems a better fit than MERGE anyway: if your intent is to only ever allow a single VISITED relationship from one user to another - his last visit - and multiple VISITED relationships are a violation of your data model, then by all means use CREATE UNIQUE to document this intent, and enforce it at the database level at the same time.
In this case, one could argue that the VISITED relationship is also not particularly well-named, since it implies that there could be more: one for each time that a user visited another user's profile.
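As a rough sketch of that suggestion, the question's query could be written with CREATE UNIQUE in place of MERGE. This is untested against the asker's data, keeps the question's own parameter names, and assumes a Neo4j version that still supports the CREATE UNIQUE clause:

```cypher
// Same lookup as in the question; { myId } and { id } are the asker's parameters.
MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
// Re-use an existing VISITED relationship if there is one, otherwise create it.
CREATE UNIQUE (you)-[yvr:VISITED]->(youVisited)
SET yvr.seen = false, yvr.created_at = timestamp()
RETURN yvr.created_at AS visited_at
```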
As mentioned in my comments, there was a locking bug with MERGE upon Neo4j switching to the COST planner.
As far as I can tell it works like this:
Due to the bug, double-checked locking wasn't occurring. After MERGE determines that the relationship doesn't exist, it locks the nodes in preparation to CREATE the relationship. However, there is a race condition between the existence check and the locking: a concurrent MERGE or CREATE could create the relationship just before the locks are acquired, resulting in duplicate relationships.
The fix will ensure MERGE checks for the existence of the relationship again after the locks are acquired. This should restore concurrency guarantees for MERGE.
This fix is not yet in current Neo4j releases as of 2/10/2017.
In the meantime, you can explicitly lock on the nodes in question before you MERGE to prevent the race condition.
You can do this by setting or removing a nonexistent property on the nodes in question, or by using the APOC locking procedures.
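A minimal sketch of the property-based workaround, applied to the question's query, might look like this. The `_lock` property name is hypothetical, and the sketch assumes that writing a property takes a write lock on the node that is held until the transaction ends:

```cypher
MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
// Writing a throwaway property is assumed to lock both nodes before MERGE runs.
SET you._lock = true, youVisited._lock = true
MERGE (you)-[yvr:VISITED]->(youVisited)
SET yvr.seen = false, yvr.created_at = timestamp()
// Clean up the hypothetical helper property before the transaction commits.
REMOVE you._lock, youVisited._lock
RETURN yvr.created_at AS visited_at
```

With APOC installed, the SET and REMOVE lines could presumably be replaced by a single `CALL apoc.lock.nodes([you, youVisited])` placed before the MERGE.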
It's a commonly known pitfall that MERGE creates either all nodes or none of them when you give it a complex pattern. If you don't have uniqueness constraints, then MERGE will create duplicate nodes, by design.
I was wondering if there is a way to ensure a pattern exists, while utilizing matches.
e.g. for this pattern:
(Person {name: "Bob"})-[:FRIEND_OF]->(Person {name: "Alice"})-[:FRIEND_OF]->(Person {name: "Chloe"})
The database could be in various states:
Bob and Chloe exist but Alice doesn't. Alice should be created.
All 3 exist but none of them are friends, relation should be created.
Alice and Chloe exist and are friends, Bob should be created.
and so on.
In the end result, I want that the pattern is somehow ensured. I understand that the MERGE operation perhaps wasn't built to ensure this. Is there a specific way of dealing with this issue, or am I supposed to just ensure this at application level, and in problem/domain-specific ways to ensure efficiency?
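One commonly used approach, sketched here under the assumption that `name` identifies a Person, is to split the pattern so that each node and each relationship gets its own MERGE. Every piece is then matched or created independently, whatever state the database starts in:

```cypher
// Each node is matched if present and created otherwise.
MERGE (bob:Person {name: "Bob"})
MERGE (alice:Person {name: "Alice"})
MERGE (chloe:Person {name: "Chloe"})
// Each relationship is then ensured between the already-bound nodes.
MERGE (bob)-[:FRIEND_OF]->(alice)
MERGE (alice)-[:FRIEND_OF]->(chloe)
```

Without a uniqueness constraint on the merged property, concurrent runs of this query could still create duplicate nodes.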
Hi, how can I translate this SQL query into a Cypher query?
SELECT n.enginetype, n.Rocket20, n.Yearlong, n.DistanceOn
FROM TIMETAB AS n
JOIN PLANEAIR AS p ON (n.tailnum = p.tailNum)
If I need to create any relationships or anything else before I can use that query, please explain how to do that too. Thanks.
Here's a good guide for comparing SQL with Cypher and showing the equivalent Cypher for some SQL queries.
If we were to translate this directly, we'd use :PLANEAIR and :TIMETAB node labels (though I'd recommend using better names for these), and we'll need a relationship between them. Let's call it :RELATION.
Joins in SQL tend to be replaced with relationships between nodes, so we'll need to create these patterns in your graph:
(:PLANEAIR)-[:RELATION]->(:TIMETAB)
There are several ways to get your data into the graph, usually through LOAD CSV. The general approach is to MERGE your :PLANEAIR and :TIMETAB nodes on some id or unique property (maybe tailNum?), use ON CREATE SET ... after the MERGE to add the rest of the properties to the node when it's created, and then MERGE the relationship between the nodes.
The MERGE section of the developers manual should be helpful here, though I'd recommend reading through the entire dev manual anyway.
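The import approach described above could be sketched roughly as follows. The file names and CSV column headers are hypothetical, the property names are taken from the SQL in the question, and tailNum is only assumed to be a suitable key:

```cypher
// Hypothetical file and column names; LOAD CSV reads every value as a string.
LOAD CSV WITH HEADERS FROM 'file:///planeair.csv' AS row
MERGE (p:PLANEAIR {tailNum: row.tailNum})
ON CREATE SET p.enginetype = row.enginetype;

// Second pass: create the :TIMETAB nodes and link each one to its plane.
LOAD CSV WITH HEADERS FROM 'file:///timetab.csv' AS row
MATCH (p:PLANEAIR {tailNum: row.tailnum})
MERGE (n:TIMETAB {tailNum: row.tailnum})
ON CREATE SET n.Rocket20 = row.Rocket20,
              n.Yearlong = row.Yearlong,
              n.DistanceOn = row.DistanceOn
MERGE (p)-[:RELATION]->(n);
```

If a plane can have several :TIMETAB rows, merging on tailNum alone would collapse them into one node, so a different key would be needed in that case.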
With this in place, the Cypher equivalent query is:
MATCH (p:PLANEAIR)-[:RELATION]->(n:TIMETAB)
RETURN p.enginetype, n.Rocket20, n.Yearlong, n.DistanceOn
Now this is just a literal translation of your SQL query. You may want to reconsider your model, however, as I'm not sure how much value there is in keeping time-related data for a plane separate from its node. You may just want to have all of the :TIMETAB properties on the :PLANEAIR node and do away with the :TIMETAB nodes completely. Of course your queries and use cases should guide how to model that data best.
EDIT
As for creating the relationship between :PLANEAIR and :TIMETAB nodes (and again, I recommend using better labels for these, and maybe even keeping all time-related properties on a :Plane node instead of a separate one): provided you already have those nodes created, you'll need to do a joining match. It will help to have unique constraints on :PLANEAIR(tailNum) and :TIMETAB(tailNum) (or indexes, if this isn't supposed to be a unique property):
CREATE CONSTRAINT ON (p:PLANEAIR)
ASSERT p.tailNum IS UNIQUE
CREATE CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE
Now we're ready to create the relationships:
MATCH (p:PLANEAIR)
MATCH (n:TIMETAB)
WHERE p.tailNum = n.tailNum
CREATE (p)-[:RELATION]->(n)
REMOVE n.tailNum
Now that the relationships are created, and :TIMETAB tailNum property removed, we can drop the unique constraint on :TIMETAB(tailNum), since the relationship to :PLANEAIR is all we need.
DROP CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE
I recently discovered that a race condition exists when executing concurrent MERGE statements. Specifically, duplicate nodes can be created in the scenario where a node is created after the MATCH step but before the CREATE step of a given MERGE.
This can be worked around in some instances using unique constraints on the merged nodes; however, this falls short in scenarios where:
There is no single unique property to enforce (e.g. pairs of properties need to be unique but individual ones don't).
Trying to merge relationships and paths.
Does using CREATE UNIQUE solve this problem (or do the same pitfalls exist)? If so, is it the only option? It feels like the usefulness of MERGE is fairly heavily diminished when it effectively can't guarantee the uniqueness of the path or node being merged...
When MERGE statements are executed concurrently, these situations may occur. Basically, each transaction gets a view of the graph at the first point of reading, and won't see updates made after that point (with some variations). The main exception to this is uniquely constrained nodes, where Neo4j will initialise a fresh reader from the index when reading, regardless of what was previously read in the transaction.
A workaround could be to create a 'dummy' property and a unique constraint on it and one of the node labels. In Neo4j 2.2.5, this should work to get around your problem.
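One way to read this suggestion, sketched under the assumption that the pair of properties can be concatenated into a single key, is shown below. The :Person label and the `firstName`, `lastName`, and `uniqueKey` property names are all hypothetical:

```cypher
// Unique constraint on the hypothetical 'dummy' property.
CREATE CONSTRAINT ON (p:Person) ASSERT p.uniqueKey IS UNIQUE;

// MERGE on the constrained property only; set the real properties on creation.
MERGE (p:Person {uniqueKey: 'Smith|Jane'})
ON CREATE SET p.lastName = 'Smith', p.firstName = 'Jane'
RETURN p;
```

The constraint has to be created in its own transaction, before any data statement that relies on it.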
Is using a tree with a counter on the root node, to be referenced and incremented when creating new nodes, a viable way of managing unique IDs in Neo4j? In a previous question on performance on this forum (Neo4j merge performance VS create/set), the approach was described, and it occurred to me it may suggest a methodology for unique ID management without having to extend the Neo4j database (and support that extension). However, I noticed this approach has not been mentioned in other discussions on best practice for unique ID management (Best practice for unique IDs in Neo4J and other databases?).
Can anyone help validate or reject this approach?
Thanks!
You can just create a singleton node (I'll give it the label IdCounter in my example) to hold the "next-valid ID counter" value. There is no need for it to be part of any "tree" or for it to have any relationships at all.
When you create the singleton, initialize it with the first id value that you want to use. For example:
CREATE (:IdCounter {nextId: 1});
Here is a simple example of how to use it when creating a new node.
MATCH (c:IdCounter)
CREATE (x {id: c.nextId})
SET c.nextId = c.nextId + 1
RETURN x;
Since all Cypher queries are transactional, if the node creation did not happen for any reason, then the nextId increment would also not be done, so you should never end up with any gaps in assigned id numbers.
However, to avoid re-using the same id number, you would have to write your queries carefully to ensure that the increment always happens whenever you create a new node (using CREATE, CREATE UNIQUE, or MERGE).
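For the MERGE case, a minimal sketch could tie the increment to ON CREATE, so that the counter only advances when a node is actually created. The :Person label and `name` property are hypothetical, and the sketch assumes ON CREATE SET may also update the already-bound counter node:

```cypher
MATCH (c:IdCounter)
MERGE (p:Person {name: 'Alice'})
// Runs only when the node is created: assign the id first, then advance the counter.
ON CREATE SET p.id = c.nextId, c.nextId = c.nextId + 1
RETURN p.id;
```

If the Person already exists, nothing is written and the counter is left untouched.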
To clarify, let's assume that we have nodes representing people and the following relationships: "BIOLOGICAL_MOTHER" and "BIOLOGICAL_FATHER".
Then, for any person node, said node can only have one "BIOLOGICAL_MOTHER" and one "BIOLOGICAL_FATHER". How can we ensure that this is the case?
No. Neo4J currently only supports uniqueness constraints.
I believe several people are working on different schema constructs for neo4j, that would permit you to constrain graphs in any number of different ways. What it seems you're asking for boils down to a database constraint that if there is a relationship of type BIOLOGICAL_FATHER from one person to another, that the DB may not accept any creation of new relationships of that same type. In other words, relationship cardinality constraints, by relationship type.
At the moment, I think the best you can do is verify in your application code that such a relationship doesn't exist before creating it, but the DB won't do this checking for you.
The particular constraint you're looking for sounds easy enough; hopefully a neo4j dev will jump in here and say, "Oh, no worries, that's planned for release XYZ", but I'm not sure about that.
More broadly, there are a number of issues with graphs that make constraints very tricky. In my personal graph domain, I'd like to make it impossible to create new relationships such that they would introduce cycles in the graph over a particular relationship type. (E.g. (a)-[:owns]->(b)-[:owns]->(a) is extremely undesirable for me). This would be a very costly constraint to actually enforce in the general case, since verifying whether a new relationship was OK could potentially involve traversing a huge graph.
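At the application level, a check of this kind could be sketched as a guarded CREATE. The :Thing label and `name` property are hypothetical, and the unbounded `[:owns*]` traversal is exactly the potentially expensive part described above:

```cypher
MATCH (a:Thing {name: 'a'}), (b:Thing {name: 'b'})
// Only create a -> b if a differs from b and b does not already reach a via :owns.
WHERE a <> b AND NOT (b)-[:owns*]->(a)
CREATE (a)-[:owns]->(b)
```

This is a query-side guard only: the database does not enforce it, and concurrent transactions could still introduce a cycle between the check and the write.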
Over the long run, it seems reasonable that neo4j might implement local constraints, but still shy away from anything that implied non-local constraint checking.
Steve,
In terms of Cypher, if I am given two names of people - say Sam and Dave, and wish to make Sam the father of Dave, but only if Dave doesn't yet have a father, I could do something like this:
MATCH (f {name : 'Sam'}), (s {name : 'Dave'})
WHERE NOT (s)<-[:FATHER]-()
CREATE (f)-[:FATHER]->(s)
If Dave already has a father the WHERE clause filters Dave out, which means no relationship will be created.
Grace and peace,
Jim