Where vs property specification - neo4j

Trying to understand when to use a property value vs a WHERE clause.
$Match (g:GROUP {GroupID: 1}) RETURN g
gives the expected response (all reported properties as expected).
And,
$match (a:ADDRESS {AddressID: 454}) return a
gives the expected response (all reported properties as expected).
However, the combo in a MERGE
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454}) Return g.ShortName, type(r), a.Line1;
creates two new nodes (with no properties, of course, except a redundant AddressID and GroupID. The AddressID and GroupID were created with toInt() and I tried putting the property values in toInt() also (same result):
Added 2 labels, created 2 nodes, set 2 properties, created 1 relationship, returned 1 row in 77 ms.
So, after DETACH DELETE the extraneous nodes, I try again with (which works)
Match (g:GROUP) WHERE g.GroupID = 1
Match (a:ADDRESS) WHERE a.AddressID = 454
MERGE (g)-[r:USES]->(a)
RETURN g.ShortName, type(r), a.Line1
Returned 1 row in 14 ms.
WHY does the separate MATCHing work while the property spec does not?

MERGE is one of the trickier clauses for exactly this behavior.
From the Cypher documentation for MERGE:
When using MERGE on full patterns, the behavior is that either the
whole pattern matches, or the whole pattern is created. MERGE will not
partially use existing patterns — it’s all or nothing. If partial
matches are needed, this can be accomplished by splitting a pattern up
into multiple MERGE clauses.
So when you're going to MERGE a pattern, and you aren't using variables bound to already existing nodes, then the entire pattern is matched, or if it doesn't exist, the entire pattern is created, which, in your case, creates duplicate nodes, as your intent is to use existing nodes in the MERGE.
In general, when you want to MERGE a relationship or pattern between nodes that already exist, it's best to MATCH or MERGE on the nodes which should already exist first, and then MERGE the pattern with the matched or merged variables.
EDIT
I think there's some confusion here about the reasons for the differences in the queries.
This doesn't have anything to do with whether the properties are defined in a WHERE clause, or inline on the nodes in the MATCH clauses.
In fact, you can do this just fine with your last query, and it will behave identically:
Match (g:GROUP {GroupID:1})
Match (a:ADDRESS {AddressID:454})
MERGE (g)-[r:USES]->(a)
RETURN g.ShortName, type(r), a.Line1
The reasons for the differences, again, the behavior of MERGE
Really the easiest way to grasp what's going on is to consider what the behavior would be if MERGE were substituted first with MATCH, and then if no match was found, with CREATE.
MATCH (g)-[r:USES]->(a)
and if there is no match, it does CREATE instead
CREATE (g)-[r:USES]->(a)
That should make sense...a CREATE with existing nodes will create the missing part, the relationship.
Contrast that with using MERGE on the entire pattern:
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
Return g.ShortName, type(r), a.Line1;
First this will attempt a MATCH:
MATCH (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
and then when no match is found, a CREATE
CREATE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
And given what we know of how CREATE works, it doesn't attempt to match parts of the pattern (and there are no variables that have already matched to existing elements of the graph), it creates the pattern as a whole, creating a brand new :GROUP and :ADDRESS node with the given properties, and the new :USES relationship.

MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454}) Return g.ShortName, type(r), a.Line1; most likely creates two nodes, because those properties (GroupID for the GROUP node / AddressID for ADDRESS node) are not the only properties on those nodes.
Matching the nodes first ensures that you're getting nodes with matching properties (which could have other properties, too) and merge those.
If you had an index with uniqueness constraint on both GroupID for GROUP nodes and AddressID for ADDRESS nodes, then the MERGE without matching first, should still make that connection.

Related

Enforce order of relations in multipath queries

I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let my give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, (besides not seeming to complete at all). This will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.
The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path you want to return, which requires either using the labelFilter in addition to the relationshipFilter, or just to use the combined sequence config property to specify the alternating labels/relationships expected, and making sure we use an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that use the same relationship twice.)

Test if relationship exists in neo4j / spring data

I'm trying to solve the simple question, if in a graph with the "knows" relationship, some person A knows some person B. Ideally I would answer this question with either true or false but I'm failing to solve this.
I found the following in another StackOverflow question which is almost what I want, except that apart from just answering my question, it also changes the graph:
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
MERGE (p)-[r:KNOWS]->(b)
ON CREATE SET r.alreadyExisted=false
ON MATCH SET r.alreadyExisted=true
RETURN r.alreadyExisted;
In the end I would like to put this in a Spring Neo4J repository like this
public interface PersonRepository extends GraphRepository<Person> {
boolean knows(final Long me, final Long other);
}
That means if there is a way to do it without cypher - using Springs Query and Finder methods, that would be fine too.
The Cypher query for this is a simple one, the key here is the EXISTS() function, which will return a boolean value if the pattern given to the function exists in the graph.
Here's the Cypher query.
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
RETURN EXISTS( (p)-[:KNOWS]-(b) )
You can even make it more concise:
RETURN EXISTS( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) )
As a complementary note to what #InverseFalcon said
// first
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
RETURN exists( (p)-[:KNOWS]-(b) )
// second
RETURN exists( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) )
There is a difference between the two examples that were provided:
the first one builds a Cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this will build a Cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH
It merely means if you have 5 persons in your database P={0,1,2,3,4};
the first query nearly checks existence |A|x|A| = 5x5 = 25 possible path between every two-person where the first Person node has id equals to` 0 and the second Person node has id equals to 1.
the second query checks existence of the path from a Person node with id 0 and the Person node with id 1.
also cause exists can be a function and keyword, convention suggests to write functions as lowercase and others as uppercase.
// Use an existential sub-query to filter.
WHERE EXISTS {
MATCH (n)-->(m) WHERE n.userId = m.userId
}
also, you can rename the return value to some new variable for example:
RETURN exists( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) ) as knows

What is the difference between using/(not using) MATCH in a MERGE query in cypher?

When creating a probably existing node in neo4j, i'm getting confused about the use of MATCH. What would be the difference between these two cases?
1.-
MERGE (n:Person { user_id: 1234 ,sex:'male'})
2.- MATCH (a:Person) WHERE a.user_id = 1234 AND a.sex = 'male' MERGE (n:Person { user_id: 1234 ,sex:'male'})
Actually even after reading the documentation I can't understand the usefullness of MATCH
MERGE is "get or create". If the pattern exists, get it (bind to the variables specified). If the pattern specified does not exist, then create it.
MATCH is just "get". If the pattern exists, bind to the variables specified. If the pattern does not exist, then nothing is bound to the variables.
Here are the differences between your 2 queries.
Query #1 will create a node with the specified label and properties if and only if such a node is not found.
Query #2 is basically a waste of time. It calls MATCH to look for a node with the specified label and properties, and only if it is found does it call MERGE (which would create the node if and only if it does not already exist). So, #2 ends up never making any changes. If the node already existed, it will continue to exist (since MERGE will discover it doesn't need to do anything). If it did not exist, it will remain nonexistent (since MERGE would not even be invoked).
Needless to say, one would never want to use MATCH and MERGE in the same way as #2. Here is a more typical example of MATCH and MERGE, which ensures that the SPOUSE_OF relationship exists between 2 people who are already in the DB:
MATCH (a:Person {user_id: 1234}), (b:Person {user_id: 5678})
MERGE (a)-[:SPOUSE_OF]->(b);

Cypher, create unique relationship for one node

I want to add a "created by" relationship on nodes in my database. Any node should be able of having this relationship but there can never be more than one.
Right now my query looks something like this:
MATCH (u:User {email: 'my#mail.com'})
MERGE (n:Node {name: 'Node name'})
ON CREATE SET n.name='Node name', n.attribute='value'
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(u)
RETURN n
As I have understood Cypher there is no way to achieve what I want, the current query will only make sure there are no unique relationships based on TWO nodes, not ONE. So, this will create more CREATED_BY relationships when run for another User and I want to limit the outgoing CREATED_BY relationship to just one for all nodes.
Is there a way to achieve this without running multiple queries involving program logic?
Thanks.
Update
I tried to simplyfy the query by removing implementation details, if it helps here's the updated query based on cybersams response.
MERGE (c:Company {name: 'Test Company'})
ON CREATE SET c.uuid='db764628-5695-40ee-92a7-6b750854ebfa', c.created_at='2015-02-23 23:08:15', c.updated_at='2015-02-23 23:08:15'
WITH c
OPTIONAL MATCH (c)
WHERE NOT (c)-[:CREATED_BY]-()
CREATE (c)-[:CREATED_BY {date: '2015-02-23 23:08:15'}]->(u:User {token: '32ba9d2a2367131cecc53c310cfcdd62413bf18e8048c496ea69257822c0ee53'})
RETURN c
Still not working as expected.
Update #2
I ended up splitting this into two queries.
The problem I found was that there was two possible outcomes as I noticed.
The CREATED_BY relationship was created and (n) was returned using OPTIONAL MATCH, this relationship would always be created if it didn't already exist between (n) and (u), so when changing the email attribute it would re-create the relationship.
The Node (n) was not found (because of not using OPTIONAL MATCH and the WHERE NOT (c)-[:CREATED_BY]-() clause), resulting in no relationship created (yay!) but without getting the (n) back the MERGE query looses all it's meaning I think.
My Solution was the following two queries:
MERGE (n:Node {name: 'Name'})
ON CREATE SET
SET n.attribute='value'
WITH n
OPTIONAL MATCH (n)-[r:CREATED_BY]-()
RETURN c, r
Then I had program logic check the value of r, if there was no relationship I would run the second query.
MATCH (n:Node {name: 'Name'})
MATCH (u:User {email: 'my#email.com'})
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(u)
RETURN n
Unfortunately I couldn't find any real solution to combining this in one single query with Cypher. Sam, thanks! I've selected your answer even though it didn't quite solve my problem, but it was very close.
This should work for you:
MERGE (n:Node {name: 'Node name'})
ON CREATE SET n.attribute='value'
WITH n
OPTIONAL MATCH (n)
WHERE NOT (n)-[:CREATED_BY]->()
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(:User {email: 'my#mail.com'})
RETURN n;
I've removed the starting MATCH clause (because I presume you want to create a CREATED_BY relationship even when that User does not yet exist in the DB), and simplified the ON CREATE to remove the redundant setting of the name property.
I have also added an OPTIONAL MATCH that will only match an n node that does not already have an outgoing CREATED_BY relationship, followed by a CREATE UNIQUE clause that fully specifies the User node.

Keep track of Changes in Neo4j - Achieve functionality like a "flag" variable in standard programming

Initial setup of the sample database is provided link to console
There are various cases and within each case, there are performers(with properties id and name). This is the continuation of problems defined problem statement and solution to unique node creation
The solution in the second link is (credits to Christophe Willemsen
)
MATCH (n:Performer)
WITH collect(DISTINCT (n.name)) AS names
UNWIND names as name
MERGE (nn:NewUniqueNode {name:name})
WITH names
MATCH (c:Case)
MATCH (p1)-[r:RELATES_TO]->(p2)<-[:RELATES]-(c)-[:RELATES]->(p1)
WITH r
ORDER BY r.length
MATCH (nn1:NewUniqueNode {name:startNode(r).name})
MATCH (nn2:NewUniqueNode {name:endNode(r).name})
MERGE (nn1)-[rf:FINAL_RESULT]->(nn2)
SET rf.strength = CASE WHEN rf.strength IS NULL THEN r.value ELSE rf.strength + r.value END
This solution achieved what was asked for.
But I need to achieve something like this.
foreach (Case.id in the database)
{
foreach(distinct value of r.Length)
{
//update value property of node normal
normal.value=normal.value+0.5^(r.Length-2)
//create new nodes and add the result as their relationship or merge it to existing one
MATCH (nn1:NewUniqueNode {name:startNode(r).name})
MATCH (nn2:NewUniqueNode {name:endNode(r).name})
MERGE (nn1)-[rf:FINAL_RESULT]->(nn2)
//
rf.strength=rf.strength + r.value*0.5^(r.Length-2);
}
}
The problem is to track the change in the case and then the r.Length property. How can it be achieved in Cypher?
I will not redo the last part, where setting strengths.
One thing though, in your console link, there is only one Normal node, so why do you need to iterate over each case, you can just match distinct relationships lengths.
By the way for the first part :
MATCH (n:Case)
MATCH (n)-[:RELATES]->()-[r:RELATES_TO]->()<-[:RELATES]-(n)
WITH collect(DISTINCT (r.length)) AS lengths
MATCH (normal:Normal)
UNWIND lengths AS l
SET normal.value = normal.value +(0.5^l)
RETURN normal.value
Explanations :
Match the cases
Foreach case, match the RELATES_TO relationships of Performers for that Case
collect distinct lengths
Match the Normal node, iterate the the distinct lengths collection and setting the proper value on the normal node

Resources