I am taking my first steps in Cypher and Neo4j and trying to understand how Cypher deals with "variables".
Specifically, I have a query
match (A {name: "A"})
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
match (c)-[:st]->(b)
return b
which does the job I want. Now, in the code I am using the same match clause twice (lines 2 and 3), so that the variables (c) and (b) basically contain the same nodes before the final match on line 4.
Can I write the query without having to repeat the second match clause? Using
match (A {name: "A"})
match (A)<-[:st*]-(B)-[:hp]->(b)
match (b)-[:st]->(b)
return b
seems to be something very different, returning nothing since there are no :st type relationships from a node in (b) to itself. My understanding so far is that, even if (b) and (c) contain the same nodes,
match (c)-[:st]->(b)
tries to find matches between ANY node of (c) and ANY node of (b), whereas
match (b)-[:st]->(b)
tries to find matches from a particular node of (b) onto itself? Or is it that one has to think of the 3 match clauses as a holistic pattern?
Thanks for any insight into the inner workings ...
When you write the 2 MATCH statements
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
they don't depend on each other's results (only on the result of the previous MATCH finding A). The Cypher engine could execute them independently and then return a cartesian product of their results, or it could execute the first MATCH and for each result, then execute the second MATCH, producing a series of pairs using the current result of the first MATCH and each result of the second MATCH (the actual implementation is a detail). Actually, it could also detect that the same pattern is matched twice, execute it only once and generate all possible pairs from the results.
To summarize, b and c are taken from the same collection of results, but independently, so you'll get pairs where b and c are the same node, but also all the other pairs where they are not.
If you do a single MATCH, each result row binds b to a single node, so using b twice in the final MATCH refers to that same node both times.
Supposing a MATCH returns 2 nodes 1 and 2, with the 2 intermediate MATCH the final MATCH will see all 4 pairs:
     1        2
1    (1, 1)   (1, 2)
2    (2, 1)   (2, 2)
whereas with a single intermediate MATCH and a final MATCH using b twice, it will only see:
     1        2
1    (1, 1)
2             (2, 2)
which are not the interesting pairs, if you don't have self-relationships.
Note that it's the same in a SQL database if you do a SELECT on 2 tables without a join: you also get a cartesian product of unrelated results.
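To avoid writing the intermediate pattern twice, one option is to collect its results once and reuse the list for both roles. This is only a sketch under a few assumptions: the names x and candidates are made up for illustration, the intermediate node is left anonymous, and the result is deduplicated with DISTINCT, so it returns the same set of b nodes as the original query but not necessarily the same number of rows.

```cypher
// Collect the candidate nodes once (x and candidates are illustrative names).
MATCH (A {name: "A"})<-[:st*]-()-[:hp]->(x)
WITH collect(DISTINCT x) AS candidates
// Take each candidate in the role of c ...
UNWIND candidates AS c
// ... and keep only :st targets that are themselves candidates (the role of b).
MATCH (c)-[:st]->(b)
WHERE b IN candidates
RETURN DISTINCT b
```

Here c and b are still two independent variables drawn from the same collection, which is exactly what the two repeated MATCH clauses achieved.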
I am using Neo4j (version 3.5.1) and Spring-data-neo4j (5.0.10.RELEASE) in my application. I am also using OGM.
I have the below relationship between my nodes:
A vehicle (V) has Part(s) (P1, P2 and P3). Parts can themselves be linked with other parts (for e.g P2 is linked with P6)
I am trying to write a Cypher query to get all the parts in a vehicle. However, I want to paginate the results and also order the parts by creation date (the most recently created part is returned first).
Below is my query:
MATCH (vehicle: Vehicle{id:{vehicleId}})
WITH vehicle MATCH p=(vehicle)-[:HAS_PART]-(part:Part)
WITH p, part SKIP 1 LIMIT 1 OPTIONAL MATCH m=(part)-[:IS_LINKED_WITH]->(:Part)
RETURN collect(nodes(p)), collect(relationships(p)), collect(nodes(m)), collect(relationships(m))
I sometimes get result size greater than 1. Also I am not sure how to order the returned Part by creation date (Part node has creationDate property set when it is created).
Any help would be highly appreciated. Thanks.
Data can be created as follows (each statement is run separately; every node in a pattern needs its own variable name):
merge (v:Vehicle {id:'V1'})-[:HAS_PART]->(p1:Part {id:'P1'})-[:IS_LINKED_WITH]->(p5:Part {id:'P5'});
match (v:Vehicle {id:'V1'})
merge (v)-[:HAS_PART]->(p3:Part {id:'P3'});
match (v:Vehicle {id:'V1'})
merge (v)-[:HAS_PART]->(p2:Part {id:'P2'})-[:IS_LINKED_WITH]->(p6:Part {id:'P6'});
You don't need to define paths in your patterns for this.
First match all parts of the vehicle (use :HAS_PART*.. if you have chained HAS_PART relationships):
MATCH (v:Vehicle {id:'V1'})-[:HAS_PART]-(part:Part)
I suppose not all parts have an IS_LINKED_WITH relationship, so use OPTIONAL MATCH for the linked parts (if you used only MATCH, you wouldn't get parts with no linked parts):
OPTIONAL MATCH (part)-[:IS_LINKED_WITH]-(linked:Part)
Then collect all of the parts and use UNWIND so they end up in a single variable:
WITH COLLECT(DISTINCT part) + COLLECT(DISTINCT linked) as allParts
UNWIND allParts as part
And use regular RETURN, ORDER BY, SKIP and LIMIT clauses:
RETURN DISTINCT part.id
ORDER BY part.id
SKIP 1 LIMIT 2
The whole query:
MATCH (v:Vehicle {id:'V1'})-[:HAS_PART]-(part:Part)
OPTIONAL MATCH (part)-[:IS_LINKED_WITH]-(linked:Part)
WITH COLLECT(DISTINCT part) + COLLECT(DISTINCT linked) as allParts
UNWIND allParts as part
RETURN DISTINCT part.id
ORDER BY part.id
SKIP 1 LIMIT 2
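The question asked for the newest parts first, so here is a variation of the whole query that orders by the creationDate property mentioned in the question instead of by id. It is a sketch, assuming every Part node actually carries a creationDate value that sorts sensibly (a timestamp or an ISO date string); the variable p is just an illustrative name.

```cypher
MATCH (v:Vehicle {id: 'V1'})-[:HAS_PART]-(part:Part)
OPTIONAL MATCH (part)-[:IS_LINKED_WITH]-(linked:Part)
WITH collect(DISTINCT part) + collect(DISTINCT linked) AS allParts
UNWIND allParts AS p
// Deduplicate first, then sort and paginate the distinct parts.
WITH DISTINCT p
ORDER BY p.creationDate DESC
SKIP 1 LIMIT 2
RETURN p
```

Paginating on the deduplicated rows keeps the page size stable: SKIP and LIMIT then count parts, not duplicate rows.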
I have a big Neo4j db with info about celebs. All of them have relations with many others: they are linked, dated, married to each other. I need to get a random path from one celeb with a defined count of relations (5). I don't care who will be in this chain, the only condition I have I shouldn't have repeated celebs in chain.
To be more clear: I need to get "new" chain after each query, for example:
I try to get a chain starting with Rita Ora
She has relations with Drake, Jay Z and Justin Bieber
The query takes a random one of these, for example Jay Z
Then the query takes the relations of Jay Z: Karrine Steffans, Rosario Dawson and Rita Ora
The query can't take Rita Ora because she is already in the chain, so it takes a random one of the other two, for example Rosario Dawson
...
And at the end we should have a chain Rita Ora - Jay Z - Rosario Dawson - other celeb - other celeb 2
Is it possible to do this with a query?
This is doable in Cypher, but it's quite tricky. You mention that
the only condition I have I shouldn't have repeated celebs in chain.
This condition could be captured by using node-isomorphic pattern matching, which requires all nodes in a path to be unique. Unfortunately, this is not yet supported in Cypher. It is proposed as part of the openCypher project, but is still work-in-progress. Currently, Cypher only supports relationship uniqueness, which is not enough for this use case as there are multiple relationship types (e.g. A is married to B, but B also collaborated with A, so we already have a duplicate with only two nodes).
APOC solution. If you can use the APOC library, take a look at the path expander, which supports various uniqueness constraints, including NODE_GLOBAL.
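A rough sketch of the APOC route, assuming the APOC plugin is installed and that a chain of five celebrities (four relationships) is wanted. It uses the NODE_PATH uniqueness setting, which forbids repeated nodes within a single path, rather than NODE_GLOBAL, which only allows each node to be reached once across the whole expansion. The ORDER BY rand() step is my own addition to pick a random chain; it enumerates every qualifying path first, so it may be slow on a large graph.

```cypher
MATCH (start:Celebrity {name: 'Rita Ora'})
// Expand exactly 4 hops; NODE_PATH means no node may repeat within one path.
CALL apoc.path.expandConfig(start, {
  minLevel: 4,
  maxLevel: 4,
  uniqueness: 'NODE_PATH'
}) YIELD path
RETURN path
ORDER BY rand()
LIMIT 1
```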
Plain Cypher solution. To work around this limitation, you can capture the node uniqueness constraint with a filtering operation:
MATCH p = (c1:Celebrity {name: 'Rita Ora'})-[*5]-(c2:Celebrity)
UNWIND nodes(p) AS node
WITH p, count(DISTINCT node) AS countNodes
// a path with 5 relationships contains 6 nodes, so compare against length(p) + 1
WHERE countNodes = length(p) + 1
RETURN p
LIMIT 1
Performance-wise this should be okay as long as you limit its results, because the query engine will basically keep enumerating new paths until one of them passes the filtering test.
The goal of the UNWIND nodes(p) AS node WITH p, count(DISTINCT node) ... construct is to count the unique nodes of the path by first UNWIND-ing the node list into separate rows, then aggregating them with DISTINCT. We then check whether the number of unique nodes equals the number of nodes in the path (length(p) + 1, i.e. 6 for 5 relationships). If so, the path contains no repeated node and we RETURN it. If you want a chain of five celebrities rather than five relationships, use [*4] in the pattern.
Note. Instead of UNWIND and count(DISTINCT ...), getting unique elements from a list could be expressed in other ways:
(1) Using a list comprehension and ranges:
WITH [1, 2, 2, 3, 2] AS l
RETURN [i IN range(0, size(l)-1) WHERE NOT l[i] IN l[0..i] | l[i]]
(2) Using reduce:
WITH [1, 2, 2, 3, 2] AS l
RETURN reduce(acc = [], i IN l | acc + CASE NOT i IN acc WHEN true THEN [i] ELSE [] END)
However, I believe both forms are less readable than the original one.
I have 3 labels, A, B, and Z. A & B both have a relationship to Z. I want to find all the A nodes that do not share any Z nodes with B.
Currently, doing a normal query where the relationship DOES exist, works.
MATCH (a:A)-[:rel1]->(z:Z)<-[:rel2]-(b:B { uuid: {<SOME ID>} })
RETURN DISTINCT a
But when I do
MATCH (a:A)
WHERE NOT (a)-[:rel1]->(z:Z)<-[:rel2]-(b:B { uuid: {<SOME ID>} })
RETURN DISTINCT a
It throws an error
Neo4j::Server::CypherResponse::ResponseError: z not defined
Not sure if the syntax for this is incorrect, I tried WHERE NOT EXIST() but no luck.
The query is part of a larger one called through a rails app using neo4jrb / (Neo4j::Session.query)
This is a problem to do with the scope of your query. When you describe a node in a MATCH clause like the below
MATCH (n:SomeLabel)
You're telling cypher to look for a node with the label SomeLabel, and assign it to the variable n in the rest of the query, and at the end of the query, you can return the values stored in this node using RETURN n (unless you drop n by not including it in a WITH clause).
Later on in your query, if you want to MATCH another node, you can do it in reference to n, so for example:
MATCH (m:SomeOtherLabel)-[:SOME_RELATIONSHIP]-(n)
This will match a node with the label SomeOtherLabel that is connected (in any direction) to the node n, and assign it to the variable m for the rest of the query.
You can only assign nodes to variables like this in MATCH, OPTIONAL MATCH, MERGE, CREATE and (sort of) in WITH and UNWIND clauses (someone correct me here if I've missed one, I suppose you also do this in list comprehensions and FOREACH clauses).
In your second query, you are trying to find a node with the label A, which is not connected to a node with the label Z. However, the way you have written the query means that you are actually saying find a node with label A which is not connected via a rel1 relationship to the node stored as z. This will fail (and as shown, neo complains that z is not defined), because you can't create a new variable like this in the WHERE clause.
To correct your error, you need to remove the reference to the variable z, and ensure the variable b is bound to your node before the WHERE clause. You can still keep the :Z label in the pattern without naming the node, like the below.
MATCH (a:A)
MATCH (b:B { uuid: {<SOME ID>} })
WHERE NOT (a)-[:rel1]->(:Z)<-[:rel2]-(b) // changed this line
RETURN DISTINCT a
And with a bit of luck, this will now work.
You get the error because z is an identifier that you use in the WHERE clause without having bound it earlier in the query.
Since you know b already I would match it first and then use it in your where clause. You don't need to assign :Z an identifier, simply using the node label will suffice.
MATCH (b:B { uuid: {<SOME ID>} })
WITH b
MATCH (a:A)
WHERE NOT (a)-[:rel1]->(:Z)<-[:rel2]-(b)
RETURN DISTINCT a
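Since the question mentions trying an EXISTS form: on newer servers the same filter could be written as an existential subquery. This is only a sketch, assuming Neo4j 4.0 or later (where the subquery form and the $param syntax are available) and a query parameter that I have called uuid for illustration.

```cypher
// $uuid is a hypothetical parameter name standing in for the B node's id.
MATCH (b:B {uuid: $uuid})
MATCH (a:A)
// Keep only the a nodes for which no shared :Z node exists.
WHERE NOT EXISTS {
  MATCH (a)-[:rel1]->(:Z)<-[:rel2]-(b)
}
RETURN DISTINCT a
```

As in the answers above, b is bound before the filter and the Z node stays anonymous, so no new variable is introduced inside WHERE.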
I have a graph of a tree structure (well no, more of a DAG because I can have multiple parents) and need to be able to write queries that return all results in a flat list, starting at one or more particular nodes and going down.
I've reduced one of my use cases to this simple example. In the ascii representation here, n's are my nodes and I've appended their id. p is a permission in my auth system, but all that is pertinent to the question is that it marks the spot from which I need to recurse downwards to collect nodes which should be returned by the query.
There can be multiple root nodes related to p's
The roots, such as n3 below, should be contained in the results, as well as the children
The relationship depth is unbounded.
Graph:
        n1
       ^  ^
      /    \
    n2      n3<--p
           ^  ^
          /    \
        n4      n5
        ^
       /
     n6
If it's helpful, here's the cypher I ran to throw together this quick example:
CREATE path=(n1:n{id:1})<-[:HAS_PARENT]-(n2:n{id:2}),
(n1)<-[:HAS_PARENT]-(n3:n{id:3})<-[:HAS_PARENT]-(n4:n{id:4}),
(n3)<-[:HAS_PARENT]-(n5:n{id:5}),
(n4)<-[:HAS_PARENT]-(n6:n{id:6});
MATCH (n{id:3})
CREATE (:p)-[:IN]->(n)
Here is the current best query I have:
MATCH (n:n)<--(:p)
WITH collect (n) as parents, (n) as n
OPTIONAL MATCH (c)-[:HAS_PARENT*]->(n)
WITH collect(c) as children, (parents) as parents
UNWIND (parents+children) as tree
RETURN tree
This returns the correct set of results, and unlike some previous attempts I made which did not use any collect/unwind, the results come back as a single column of data as desired.
Is this the best way of making this type of query? It is surprisingly more complex than I thought the simple scenario called for. I tried some queries where I combined the roots ("parents" in my query) with the "children" using a UNION clause, but I could not find a way to do so without repeating the query for the relationship with p. In my real world queries, that's a much more expensive operation which I've reduced down here for the example, so I cannot run it more than once.
This might suit your needs:
MATCH (c)-[:HAS_PARENT*0..]->(root:n)<--(:p)
RETURN root, COLLECT(c) AS tree
Each result row will contain a distinct root node and a collection of its tree nodes (including the root node).
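If a single flat column is preferred, as in the question, the same pattern can be returned without grouping by root. This is a sketch: the DISTINCT is my addition, so that a node reachable from several roots (or by several paths in a DAG) appears only once.

```cypher
// *0.. also matches the zero-length path, so each root is included as c.
MATCH (c)-[:HAS_PARENT*0..]->(:n)<--(:p)
RETURN DISTINCT c AS tree
```

The relationship to p appears only once in the pattern, so that lookup is not repeated the way it would be with a UNION.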
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't it be acceptable to make several queries? In your case you'd automatically generate 6 queries with all the possible orderings (the factorial of the number of variables).
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.