Skipping a node in a variable length relationship in cypher - neo4j

I am trying to figure out how to write a Cypher query for Neo4J. I have a linked list of nodes like this:
n-[FIRST_NODE]->n-[NEXT_NODE]->n-[NEXT_NODE]->.....
The FIRST_NODE relationship has a property that says how deep in the list we should retrieve nodes. I want to retrieve a list of nodes possibly skipping one based on a property in n and retrieve x amount of nodes where x is the depth we should traverse in the list. Does this make sense?
I have come up with the below query but it doesn't work!
MATCH (x)-[firstIssue:FIRST_NODE]->(y:Type1)
MATCH (z)-[:NEXT_NODE*1..{firstIssue.Count}]->(a:Type1)
RETURN x,y,z,a
Any help would be apprectiated!.

Cypher does not support dynamic bounds for variable length paths.
However, in neo4j 3.x, you can install the APOC plugin and use the apoc.path.expand procedure. For example:
MATCH (x)-[firstIssue:FIRST_NODE]->(y:Type1)
CALL apoc.path.expand(y, 'NEXT_NODE>', '+Type1', 1, firstIssue.Count) YIELD path
RETURN x, firstIssue, path;

Related

How optimised is this Cypher query?

[Edit] I'm using Neo4j 4.2.1
I have this need for a Cypher query that brings back a complete tree given its root node. All nodes and relationships must be fetched and present only once in the returned sets. Here's what I have come to:
MATCH p = (n)-[*..]->(m)
WHERE id(n) = 0
WITH relationships(p) AS r
WITH distinct last(r) as rel
WITH [node IN [startNode(rel), endNode(rel)] | node] AS tmp, rel
UNWIND tmp AS node
RETURN collect(DISTINCT node) AS nodes, collect(distinct rel) AS relationships;
Running the query on our database to get about 820 nodes makes the thing crash for lack of memory (5Gb allowed). Hard to believe. So I'm wondering : Is this query ill-born? Is there one technique I'm using that shouldn't be used for my purpose?
I strongly recommend that you come up with a node property that is guaranteed to be the same on all the nodes in a contiguous tree, if you don't have one already. I'll call that property same_prop. Here's what I do to run queries like the one you're running:
Index same_prop. If you have different node labels, then you need this index created for each different node label you expect to have in the tree.
CREATE INDEX samepropnode FOR (n:your_label) ON (n.same_prop)
is the kind of thing you need in Neo4j 4+. In Neo4j, indices are cheap, and can sometimes speed up queries quite a bit.
Collect all possible values of same_prop and store them in a text file (I use tab-separated values as safer than comma-separated values).
Use the Python driver, or your language of choice that has a Neo4j driver written (strongly recommend Neo4j-provided drivers, not third-party) to write wrapper code that executes a Cypher query something like this:
MATCH (p)-->(c)
USING INDEX p:your_label(same_prop)
WHERE p.same_prop IN [ same_prop_list ]
RETURN DISTINCT
p.datapiece1 AS `first_parent_datapiece`,
p.datapiecen AS `nth_parent_datapiece`,
c.datapiece1 AS `first_child_datapiece`,
c.datapiecen AS `nth_child_datapiece`
It's not a good idea, in general, to return nodes and relationships unless you're debugging.
Then in your Python (for example) code, you're simply going to read in all your same_prop values from the file you got in Step 2, chunk up the values in reasonable size chunks, maybe 1,000 or 10,000, and substitute them in for the [ same_prop_list ] in the Cypher query on-the-fly.

Neo4j cypher query - How to return path nodes but not include the nodes with identical specific properties

I’m trying to MATCH nodes from the Neo4j database and filter the result I get from the path value with no success.
I’m currently using a query like this:
MATCH path = (x:x)-[y:y]->(f:f) return DISTINCT nodes(path)
and the result I get is:
[{“property1”:“prop1"}, {“property2”:“b”}, {“property3”:“c”}]
[{“property1”:“prop2"}, {“property2”:“b”}, {“property3”:“c”}]
[{“property1”:“prop3"}, {“property2”:“b”}, {“property3”:“c”}]
but I want the returning result to be distinct by a pair of node properties (property2 and property3)
so the result should be only one of the three:
[{“property1”:“prop1"}, {“property2”:“b”}, {“property3”:“c”}]
I don’t mind what is the value I get in “property1”, it can be any of the three.
is there a way to accomplish this?
This query uses the aggregating function COLLECT to generate a list of all the paths that share each distinct pair of property2/property3 values, and assigns the first path from each list to the path variable. Therefore, the returned path values will all have a distinct pair of property2/property3 values.
MATCH p = (:x)-[y:y]->(f:f)
WITH y.property2 AS p2, f.property3 AS p3, COLLECT(p)[0] AS path
RETURN path

Neo4J Matching Nodes Based on Multiple Relationships

I had another thread about this where someone suggested to do
MATCH (p:Person {person_id: '123'})
WHERE ANY(x IN $names WHERE
EXISTS((p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(:Dias {group_name: x})))
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
This does what I need it to, returns nodes that fit the criteria without returning the relationships, but now I need to include another param that is a list.
....(:Dias {group_name: x, second_name: y}))
I'm unsure of the syntax.. here's what I tried
WHERE ANY(x IN $names and y IN $names_2 WHERE..
this gives me a syntax error :/
Since the ANY() function can only iterate over a single list, it would be difficult to continue to use that for iteration over 2 lists (but still possible, if you create a single list with all possible x/y combinations) AND also be efficient (since each combination would be tested separately).
However, the new existenial subquery synatx introduced in neo4j 4.0 will be very helpful for this use case (I assume the 2 lists are passed as the parameters names1 and names2):
MATCH (p:Person {person_id: '123'})
WHERE EXISTS {
MATCH (p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(d:Dias)
WHERE d.group_name IN $names1 AND d.second_name IN $names2
}
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
By the way, here are some more tips:
If it is possible to specify the direction of each relationship in your query, that would help to speed up the query.
If it is possible to remove any node labels from a (sub)query and still get the same results, that would also be faster. There is an exception, though: if the (sub)query has no variables that are already bound to a value, then you would normally want to specify the node label for the one node that would be used to kick off that (sub)query (you can do a PROFILE to see which node that would be).

Traverse both incoming and outgoing relationship in Cypher

I am new at Neo4j but not to graphs and I have a specific problem I did not manage to solve with Cypher.
With this type of data:
I would like to be able in a single query to follow some incoming and some outgoing flow.
Example:
Starting on "source"
Follow all "A" relationships in the outgoing way
Follow all "B" relationships in the incoming way
My problem is that Cypher only allows one single direction to be specified in the relationship pattern.
So I could do (source)-[:A|:B*]->() or (source)<-[:A|:B*]-().
But I have no possibility to tell Cypher that I want to follow -[:A]-> and <-[:B]-.
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
Thanks in advance for your help :)
Alternatively to #Gabor Szarnyas answer, I think you can achieve your goal using the APOC procedure apoc.path.expand.
Using this sample data set:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
And calling apoc.path.expand:
match (source:Source)
call apoc.path.expand(source,"A>|<B","",0,5) yield path
return path
You will get this path as output:
The apoc.path.expand call starts from the source node following -[:A]-> and <-[:B]- relationships.
Remember to install APOC procedures according to the version of Neo4j you are using. Take a look in the version compatibility matrix.
To express this in a single query would require a regular path query, which has been proposed to and accepted to openCypher, but it is not yet implemented.
I see two possible workarounds. I recreated your example with this command with a Source label for the source node:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
(1) Insert additional relationships that have the same direction:
MATCH (s)-[:B]->(t)
CREATE (s)<-[:B2]-(t)
And use this relationship for traversal:
MATCH p=(source)-[:A|:B2*]->()
RETURN p
(2) As you mentioned:
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
You could use this approach to first get potential path candidates and manually check the directions of the relationships afterwards. Of course, this is an expensive operation but you only have to calculate it on the candidates, a possibly small data set.
MATCH p=(source:Source)-[:A|:B*]-()
WITH p, nodes(p) AS nodes, relationships(p) AS rels
WHERE all(i IN range(0, size(rels) - 1) WHERE
CASE type(rels[i])
WHEN 'A' THEN startNode(rels[i]) = nodes[i]
ELSE /* B */ startNode(rels[i]) = nodes[i+1]
END)
RETURN p
Let's break down how this works:
We store path candidates in p and use the nodes and relationships functions to extract the lists of nodes/relationships from it.
We define a range of indexes for the relationships (e.g. from 0, 1, 2 if there are 3 relationships).
To determine the direction of relationships, we use the startNode function. For example, if there is a relationship r between nodes n1 to n2, the paths will like <n1, r, n2>. If r was traversed to in the outgoing direction, the startNode(r) will return n1, if it was traverse in the incoming direction, startNode(r) will return n2. The type of the relationship is checked with the type function and a simple CASE expression is used to differentiate between types.
The WHERE clause uses the all predicate function to check whether all :A and :B relationships had the appropriate directions.

Find all relations starting with a given node

In a graph where the following nodes
A,B,C,D
have a relationship with each nodes successor
(A->B)
and
(B->C)
etc.
How do i make a query that starts with A and gives me all nodes (and relationships) from that and outwards.
I do not know the end node (C).
All i know is to start from A, and traverse the whole connected graph (with conditions on relationship and node type)
I think, you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Have also a look at the path functions in the refcard to expand your cypher query (I don't know what exact conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all the graphs originating with node A / Label A where id = "id".
One caveat - if this graph is large the query will take a long time to run.

Resources