Misunderstanding of Cypher query in Neo4j

Look at the following example graph (from the Neo4j reference):
And the query is:
MATCH (david { name: 'David' })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson.name
The result is:
"Anders"
I can't understand why this result was returned. First of all, what does this mean:
MATCH (david { name: 'David' })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
In particular, Bossman also has two outgoing edges (like Anders) and is connected to David.
Can someone explain the semantics of this query?

So as you noted there are two nodes which look like they fit the pattern you described. Both Anders and Bossman are connected to David, and both have two outgoing relationships.
The thing you're missing is that within a single Cypher pattern, each relationship is matched at most once; it will not be reused. (This is actually very useful: for example, it prevents infinite loops when a variable-length pattern runs into a cycle.)
So in this MATCH pattern:
MATCH (david { name: 'David' })--(otherPerson)-->()
the relationship used to get from David to Bossman (the :BLOCKS relationship) will not be reused in the pattern (specifically the (otherPerson)-->() part), so you will only get a single result row for this, while for Anders you will get 2. Your WHERE clause then rules out the match for Bossman, since the count of foaf is 1.
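The row counts above can be reproduced with a small Python model of the pattern. This is only a sketch: the edge list is an assumption about the reference example graph (the figure is not reproduced here), and foaf_counts is a hypothetical helper, not part of Neo4j.

```python
from collections import Counter

# Assumed edge list (start, type, end) for the reference example graph.
EDGES = [
    ("Anders", "BLOCKS", "Bossman"),
    ("Anders", "KNOWS", "Caesar"),
    ("Bossman", "KNOWS", "George"),
    ("Bossman", "BLOCKS", "David"),
    ("Caesar", "KNOWS", "George"),
    ("David", "KNOWS", "Anders"),
]


def foaf_counts(start, unique_relationships=True):
    """Count result rows per otherPerson for (start)--(otherPerson)-->()."""
    counts = Counter()
    for i, (a, _, b) in enumerate(EDGES):
        # First hop is undirected: the edge may touch `start` on either end.
        if a == start:
            other = b
        elif b == start:
            other = a
        else:
            continue
        for j, (c, _, _) in enumerate(EDGES):
            # Second hop is directed: it must leave otherPerson.
            if c != other:
                continue
            # A relationship already used in this match is not reused.
            if unique_relationships and i == j:
                continue
            counts[other] += 1
    return counts


with_rule = foaf_counts("David")
without_rule = foaf_counts("David", unique_relationships=False)
print(dict(with_rule))
print(dict(without_rule))
```

With the uniqueness rule, Bossman contributes one row and Anders two, so only Anders passes foaf > 1. With the rule switched off in the model, both would have a count of 2, which is the result the question expected.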
One way you could alter this query to get the desired result is to check the node's relationship degree in the WHERE clause rather than in the MATCH pattern. This is also more efficient, since a degree check doesn't have to perform an expand operation: the degree data is stored on the node itself.
MATCH ({ name: 'David' })--(otherPerson)
WHERE size((otherPerson)-->()) > 1
RETURN otherPerson.name
(Also, it's a good idea to use node labels in your matches, at least for your intended starting nodes. Indexes, if present, will only be used when you explicitly use both the label and the indexed property in the match. They won't be used when you omit the label, or use a label that's not part of the index.)

Related

Neo4J Matching Nodes Based on Multiple Relationships

I had another thread about this where someone suggested to do
MATCH (p:Person {person_id: '123'})
WHERE ANY(x IN $names WHERE
EXISTS((p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(:Dias {group_name: x})))
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
This does what I need it to, returns nodes that fit the criteria without returning the relationships, but now I need to include another param that is a list.
....(:Dias {group_name: x, second_name: y}))
I'm unsure of the syntax.. here's what I tried
WHERE ANY(x IN $names and y IN $names_2 WHERE..
this gives me a syntax error :/
Since the ANY() function can only iterate over a single list, it would be difficult to continue to use that for iteration over 2 lists (but still possible, if you create a single list with all possible x/y combinations) AND also be efficient (since each combination would be tested separately).
However, the new existential subquery syntax introduced in Neo4j 4.0 will be very helpful for this use case (I assume the 2 lists are passed as the parameters names1 and names2):
MATCH (p:Person {person_id: '123'})
WHERE EXISTS {
MATCH (p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(d:Dias)
WHERE d.group_name IN $names1 AND d.second_name IN $names2
}
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
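To see why the two IN checks cover the same cases as an ANY() over a single list of x/y combinations, here is a rough Python sketch. The dictionaries stand in for :Dias nodes and the sample values are made up; Cypher's null handling is ignored.

```python
from itertools import product


def any_over_pairs(dias, names1, names2):
    # One combined list of every (x, y) pair, as ANY() over a single list would need.
    pairs = list(product(names1, names2))
    return any(
        dias["group_name"] == x and dias["second_name"] == y for x, y in pairs
    )


def two_membership_checks(dias, names1, names2):
    # Mirrors: d.group_name IN $names1 AND d.second_name IN $names2
    return dias["group_name"] in names1 and dias["second_name"] in names2


names1 = ["alpha", "beta"]
names2 = ["one", "two", "three"]
candidates = [
    {"group_name": "alpha", "second_name": "two"},
    {"group_name": "beta", "second_name": "four"},
    {"group_name": "gamma", "second_name": "one"},
]

results = [two_membership_checks(d, names1, names2) for d in candidates]
pair_count = len(list(product(names1, names2)))
print(results, pair_count)
```

Both functions agree on every candidate, but the combined list grows as the product of the two list sizes (6 pairs here), while the two membership checks only look at each list once.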
By the way, here are some more tips:
If it is possible to specify the direction of each relationship in your query, that would help to speed up the query.
If it is possible to remove any node labels from a (sub)query and still get the same results, that would also be faster. There is an exception, though: if the (sub)query has no variables that are already bound to a value, then you would normally want to specify the node label for the one node that would be used to kick off that (sub)query (you can do a PROFILE to see which node that would be).

Why is a single Neo4j relationship shown twice in Cypher query results?

Let's consider a trivial graph with a directed relationship:
CREATE
(`0` :Car {value:"Ford"})
, (`1` :Car {value:"Subaru"})
, (`0`)-[:`DOCUMENT` {value:"DOC-1"}]->(`1`);
The following query MATCH (n1:Car)-[r:DOCUMENT]-(n2:Car) RETURN * returns:
╒══════════════════╤══════════════════╤═════════════════╕
│"n1" │"n2" │"r" │
╞══════════════════╪══════════════════╪═════════════════╡
│{"value":"Subaru"}│{"value":"Ford"} │{"value":"DOC-1"}│
├──────────────────┼──────────────────┼─────────────────┤
│{"value":"Ford"} │{"value":"Subaru"}│{"value":"DOC-1"}│
└──────────────────┴──────────────────┴─────────────────┘
The graph defines only a Ford->Subaru relationship, so why are there two relationships in the result?
How should I interpret the reversed one (row 1; not specified in the CREATE statement)?
Note: This is a follow-up to Convert multiple relationships between 2 nodes to a single one with weight asked by me earlier. I solved my problem, but I'm not convinced my answer is the best solution.
Your MATCH statement here doesn't specify the direction, therefore there are two possible paths that will match the pattern (remember that the ordering of nodes in the path is important and distinguishes paths from each other), thus your two answers.
If you specify the direction of the relationship instead you'll find there is only one possible path that matches:
MATCH (n1:Car)-[r:DOCUMENT]->(n2:Car)
RETURN *
As for the question of why we get two paths back when we omit the direction, remember that paths are order-sensitive: two paths that have the same elements but with a different order of the elements are different paths.
To help put this into perspective, consider the following two queries:
# Query 1
MATCH (n1:Car)-[r:DOCUMENT]-(n2:Car)
WHERE n1.value = 'Ford'
RETURN *
╒══════════════════╤══════════════════╤═════════════════╕
│"n1" │"n2" │"r" │
╞══════════════════╪══════════════════╪═════════════════╡
│{"value":"Ford"} │{"value":"Subaru"}│{"value":"DOC-1"}│
└──────────────────┴──────────────────┴─────────────────┘
# Query 2
MATCH (n1:Car)-[r:DOCUMENT]-(n2:Car)
WHERE n1.value = 'Subaru'
RETURN *
╒══════════════════╤══════════════════╤═════════════════╕
│"n1" │"n2" │"r" │
╞══════════════════╪══════════════════╪═════════════════╡
│{"value":"Subaru"}│{"value":"Ford"} │{"value":"DOC-1"}│
└──────────────────┴──────────────────┴─────────────────┘
Conceptually (and also used by the planner, in absence of indexes), to get to each of the above results you start off with the results of the full match as in your description, then filter to the only one which meets the given criteria.
The results above would not be consistent with the original directionless match query if that original query only returned a single row instead of two.
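The same reasoning can be sketched in Python. This is a toy model of pattern matching over the single relationship from the question, not how Neo4j is implemented, and it does not try to model self-loops.

```python
# The one relationship created in the question: (Ford)-[:DOCUMENT]->(Subaru)
RELATIONSHIPS = [("Ford", "DOC-1", "Subaru")]


def match(directed):
    """Return one row per way the pattern (n1)-[r]-(n2) can be bound."""
    rows = []
    for start, value, end in RELATIONSHIPS:
        # Binding that follows the stored direction.
        rows.append({"n1": start, "n2": end, "r": value})
        if not directed:
            # Without a direction, the reversed binding matches as well.
            rows.append({"n1": end, "n2": start, "r": value})
    return rows


undirected = match(directed=False)
directed = match(directed=True)
query_1 = [row for row in undirected if row["n1"] == "Ford"]
query_2 = [row for row in undirected if row["n1"] == "Subaru"]
print(len(undirected), len(directed), query_1, query_2)
```

Filtering the two undirected rows by n1 gives exactly the single rows of Query 1 and Query 2. If the undirected match produced only one row, one of those filters would have nothing to return.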
Additional information from the OP
It will take a while to wrap my head around it, but it does work this way and here's a piece of documentation to confirm it's by design:
When your pattern contains a bound relationship, and that relationship pattern doesn’t specify direction, Cypher will try to match the relationship in both directions.
MATCH (a)-[r]-(b)
WHERE id(r)= 0
RETURN a,b
This returns the two connected nodes, once as the start node, and once as the end node.

Traverse both incoming and outgoing relationship in Cypher

I am new at Neo4j but not to graphs and I have a specific problem I did not manage to solve with Cypher.
With this type of data:
I would like to be able in a single query to follow some incoming and some outgoing flow.
Example:
Starting on "source"
Follow all "A" relationships in the outgoing way
Follow all "B" relationships in the incoming way
My problem is that Cypher only allows one single direction to be specified in the relationship pattern.
So I could do (source)-[:A|:B*]->() or (source)<-[:A|:B*]-().
But I have no possibility to tell Cypher that I want to follow -[:A]-> and <-[:B]-.
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
Thanks in advance for your help :)
As an alternative to @Gabor Szarnyas's answer, I think you can achieve your goal using the APOC procedure apoc.path.expand.
Using this sample data set:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
And calling apoc.path.expand:
match (source:Source)
call apoc.path.expand(source,"A>|<B","",0,5) yield path
return path
You will get this path as output:
The apoc.path.expand call starts from the source node following -[:A]-> and <-[:B]- relationships.
Remember to install APOC procedures according to the version of Neo4j you are using. Take a look in the version compatibility matrix.
Expressing this in a single query would require a regular path query, which has been proposed to and accepted by openCypher, but is not yet implemented.
I see two possible workarounds. I recreated your example with this command with a Source label for the source node:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
(1) Insert additional relationships that have the same direction:
MATCH (s)-[:B]->(t)
CREATE (s)<-[:B2]-(t)
And use this relationship for traversal:
MATCH p=(source)-[:A|:B2*]->()
RETURN p
(2) As you mentioned:
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
You could use this approach to first get potential path candidates and manually check the directions of the relationships afterwards. Of course, this is an expensive operation but you only have to calculate it on the candidates, a possibly small data set.
MATCH p=(source:Source)-[:A|:B*]-()
WITH p, nodes(p) AS nodes, relationships(p) AS rels
WHERE all(i IN range(0, size(rels) - 1) WHERE
CASE type(rels[i])
WHEN 'A' THEN startNode(rels[i]) = nodes[i]
ELSE /* B */ startNode(rels[i]) = nodes[i+1]
END)
RETURN p
Let's break down how this works:
We store path candidates in p and use the nodes and relationships functions to extract the lists of nodes/relationships from it.
We define a range of indexes for the relationships (e.g. from 0, 1, 2 if there are 3 relationships).
To determine the direction of relationships, we use the startNode function. For example, if there is a relationship r between nodes n1 and n2, the path will look like <n1, r, n2>. If r was traversed in the outgoing direction, startNode(r) will return n1; if it was traversed in the incoming direction, startNode(r) will return n2. The type of the relationship is checked with the type function, and a simple CASE expression is used to differentiate between types.
The WHERE clause uses the all predicate function to check whether all :A and :B relationships had the appropriate directions.
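For readers who find the index arithmetic hard to follow, here is the same check as a Python sketch. Paths are modelled as a node list plus a list of (start, type, end) tuples; the node names s, n1 to n4 are made up to mirror the sample data set.

```python
def directions_ok(nodes, rels):
    """True if every :A is traversed outgoing and every :B incoming."""
    return all(
        # For A, the relationship must start at the node we come from;
        # for B, it must start at the node we arrive at.
        start == (nodes[i] if rel_type == "A" else nodes[i + 1])
        for i, (start, rel_type, _end) in enumerate(rels)
    )


# (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
full_nodes = ["s", "n1", "n2", "n3", "n4"]
full_rels = [
    ("s", "A", "n1"),
    ("n1", "A", "n2"),
    ("n3", "B", "n2"),
    ("n3", "A", "n4"),
]

# A candidate that walks an :A relationship backwards (from n4 to n3).
bad_nodes = ["n4", "n3"]
bad_rels = [("n3", "A", "n4")]

print(directions_ok(full_nodes, full_rels), directions_ok(bad_nodes, bad_rels))
```

The first candidate is kept and the second is rejected, which is what the CASE expression inside all(...) decides for each position i.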

Cypher - Neo4j Query Profiling

I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it require 12 db hits although all nodes/relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
I have also tested different versions of the above query, but the same query profile is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your Cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.); queries are handled by the query planner and then turned into graph operations by Neo4j's internals. It makes sense that the query planner will look for what is likely to be the most efficient way to search the graph (hence why we love Neo), so just because a Cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the indexed property mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationship to a node m with the :Consumer label, and value y for its property mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;

Neo4j Tree structure with stop condition

I have Neo4j version 3.0.4 with tree based data inside and I am trying to find solution for following problem:
I want to start from the root and travel down, collecting all nodes. When I find a node of type (property) "female", I want to include it in the result and stop going down.
Here is my try to describe the problem and expected result
Notes:
There is a relationship between nodes called "relation". Every node has only 1 parent, so it is a tree structure.
So far I have:
match p=(root:User {isRoot:true})-[:RELATION*..]-(child:User) return p
This returns the tree structure, but without a stopping condition.
How can I achieve the result?
Update 1:
Maybe a better way to describe the desired outcome is - I want every node from a tree going in depth and starting from root (or specific node) that has no direct or indirect parents of type female. Does that make sense?
You have two options here, really. The easy one is to do as InverseFalcon suggests and get all results, then prune using a predicate:
MATCH (root:User {isRoot: true})
WITH root
MATCH p = (root) - [:RELATION*] -> (:User {type: 'female'} )
WHERE ALL(x IN NODES(p)[1..-1] WHERE x.type = 'male')
RETURN NODES(p)
The harder, but better one, especially if your data set is very large or you plan to run a very large number of queries, is to refactor your data model so that instead of a generic -[:RELATION]->, you have particular relationship types that you plan to query against (:DAUGHTER|:SON, e.g.). Relationships in neo4j are much faster to query on than node labels or especially node properties, so design your relationships to accommodate the analysis you'll want to perform.
[EDITED]
Does this work for you?
MATCH p=(root:User {isRoot:true})-[:RELATION*0..]-(:User {type: 'male'})-[:RELATION]-(:User {type: 'female'})
RETURN p;
This query should return all paths that start at the root node and end at a female node, but without going through any other female nodes. I have assumed that the non-female nodes have "male" as the type value. The variable-length relationship pattern specifies 0.. so that a path consisting of a female root node can be returned as well.
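The stop condition itself is easy to state outside Cypher. Below is a minimal Python sketch of the traversal the question describes; the tree, the node names and the collect_until_female helper are invented for illustration, and the type values 'male' and 'female' follow the assumption made in the answers.

```python
def collect_until_female(children, types, root):
    """Depth-first walk that includes female nodes but does not descend below them."""
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        result.append(node)
        if types[node] == "female":
            # Stop condition: keep the node, skip its subtree.
            continue
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(children.get(node, [])))
    return result


# Hypothetical tree: root -> a, b; a -> c; b -> d
children = {"root": ["a", "b"], "a": ["c"], "b": ["d"]}
types = {"root": "male", "a": "female", "b": "male", "c": "male", "d": "female"}

visited = collect_until_female(children, types, "root")
print(visited)
```

Node c is never reached because its parent a is female, which matches the restated requirement that no returned node may have a female ancestor.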

Resources