Cypher: Quantifying over zero or more node-then-relations - neo4j

I want to return all nodes a and b, where b is not downstream of a via any path that begins with relation rel. I keep finding myself having to write one condition for the case where a is linked directly to b via rel, and one for the indirect case, leading to something like this:
//Semi-pseudo-code.
match (a)-[*]->(b)
optional match dir=(a)-[:rel]->(b)
optional match indir=(a)-[:rel]-()-[*]->(b)
where length(dir)=0
and length(indir)=0
return a,b
Is there any easier way? Really I want something like this, where the bare quantifier means "zero or more nodes-then-relations":
match (a)-[*]->(b)
match not (a)-[:rel]-*->(b)
return a,b
Note: I suspect this may at root be the same as my last question: Cypher: Matching nodes at arbitrary depth via a strictly alternating set of relations

We can use WHERE NOT to formulate negative conditions, in a similar fashion to your second semi-pseudocode:
MATCH (a)-[*]->(b)
WHERE NOT ((a)-[:rel]->()-[*0..]->(b))
RETURN a, b
Of course, this will be anything but efficient, so you should at least try to restrict the labels of a and b and the relationships between them, e.g. (a:Label1)-[:rel1|rel2*]->(b:Label2)
An example:
CREATE
(n1:N {name: "n1"}),
(n2:N {name: "n2"}),
(n3:N {name: "n3"}),
(n4:N {name: "n4"}),
(n5:N {name: "n5"}),
(n1)-[:x]->(n2),
(n3)-[:rel]->(n4),
(n4)-[:x]->(n5)
The query results in:
╒══════════╤══════════╕
│a │b │
╞══════════╪══════════╡
│{name: n1}│{name: n2}│
├──────────┼──────────┤
│{name: n4}│{name: n5}│
└──────────┴──────────┘
As you can see, the result includes neither (n3, n4) nor (n3, n5), because every path from n3 begins with a :rel relationship.
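The restriction suggested above could look like the following sketch. Label1, Label2, rel1 and rel2 are the placeholder names from that suggestion, and the upper bound of 10 hops is an assumption added only to keep the search bounded; adjust all of them to your model. The *0.. in the negative pattern lets b be the node directly after the :rel relationship, so the direct and indirect cases are covered by one condition.

```cypher
// Sketch only: labels, relationship types and the hop limit are placeholders.
MATCH (a:Label1)-[:rel1|rel2*..10]->(b:Label2)
// Drop the pair if any path from a to b begins with a :rel relationship.
WHERE NOT (a)-[:rel]->()-[*0..]->(b)
// DISTINCT collapses pairs that are reachable via more than one path.
RETURN DISTINCT a, b
```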

This should work:
MATCH (a)-[rs*]->(b)
WHERE TYPE(rs[0]) <> 'rel'
RETURN a, b;
However, the query below should be much more performant, as it filters out all unwanted path beginnings before it does the very expensive variable-length path search. The *0.. syntax makes the variable-length search use a lower bound of 0 for the length (so x will also be returnable as b).
MATCH (a)-[r]->(x)
WHERE TYPE(r) <> 'rel'
MATCH (x)-[*0..]->(b)
RETURN a, b;

Related

neo4j cypher keep ordering imposed by path for later in the query

I am using a query like
MATCH p=((:Start)-[:NEXT*..100]->(n))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH DISTINCT n WHERE (n:RELEVANT)
...
RETURN n.someprop;
I want the results ordered by the natural ordering that arises from the direction of the -[:NEXT]-> relationships.
But the WITH in the third line scrambles that ordering. The problem is that I need the WITH to (1) filter for :RELEVANT nodes and (2) keep only distinct such nodes.
Is there some way to preserve the ordering? Maybe assign number ordering on the path and reuse it later with ORDER BY? No idea how to do it.
You're asking for distinct nodes, which indicates that the node might be reachable by multiple paths, and thus might be present at multiple distances from the start node.
Instead of using DISTINCT, you should use min() (or max(), depending on your requirements) on the path length for each n. Since those are aggregation functions, you will only ever get a single row for each n.
MATCH p=((:Start)-[:NEXT*..100]->(n:RELEVANT))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH n, min(length(p)) as distance
WITH n
ORDER BY distance
...
RETURN n.someprop;
What if you remove the WHERE clause from the WITH and put the label :RELEVANT in the MATCH instead? Maybe the WHERE is causing the problem. Try something like this:
MATCH p=((:Start)-[:NEXT*..100]->(n:RELEVANT))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH DISTINCT n
...
RETURN n.someprop;

match a branching path of variable length

I have a graph which looks like this:
Here is the link to the graph in the neo4j console:
http://console.neo4j.org/?id=av3001
Basically, you have two branching paths, of variable length. I want to match the two paths between orange node and yellow nodes. I want to return one row of data for each path, including all traversed nodes. I also want to be able to include different WHERE clauses on different intermediate nodes.
At the end, i need to have a table of data, like this:
a - b - c - d
neo - morpheus - null - leo
neo - morpheus - trinity - cypher
How could i do that?
I have tried using OPTIONAL MATCH, but i can't get the two rows separately.
I have tried using variable length path, which returns the two paths but doesn't allow me to access and filter intermediate nodes. Plus it returns a list, and not a table of data.
I've seen this question:
Cypher - matching two different possible paths and return both
It's on the same subject but the example is very complex, a more generic solution to this simpler problem is what i'm looking for.
You can define your end node with a WHERE clause. In your case, the end node is a node with no outgoing relationship. I'm not sure why you expect a null in the result, as in neo - morpheus - null - leo.
MATCH p=(n:Person{name:"Neo"})-[*]->(end) where not (end)-->()
RETURN extract(x IN nodes(p) | x.name)
Edit:
This may not be the best option, as I am not sure how to do it programmatically. If I use UNWIND I get back only one row. So this is a dummy solution:
MATCH p=(n{name:"Neo"})-[*]->(end) where not (end)-->()
with nodes(p) as list
return list[0].name,list[1].name,list[2].name,list[3].name
You can use Cypher to match a path like this: MATCH p=(:a)-[*]->(:d) RETURN p. Here p is a path, and nodes(p) and relationships(p) give its nodes and relationships as lists in the order they were traversed. You can apply WHERE to filter the path just like with node matching, and apply any list functions you need to those lists.
I will add these examples too
// Where on path
MATCH p=(:a)-[*]-(:d) WHERE NONE(n in NODES(p) WHERE n.name="Trinity") WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Split path into columns
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Match path, filter on label
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN FILTER(n in p WHERE "a" in LABELS(n)) as a, FILTER(n in p WHERE "b" in LABELS(n)) as b, FILTER(n in p WHERE "c" in LABELS(n)) as c, FILTER(n in p WHERE "d" in LABELS(n)) as d
Unfortunately, you HAVE to explicitly set some logic for each column. You can't make dynamic columns (that I know of). In your table example, what is the rule for which column gets 'null'? In the last example, I set each column to be the set of nodes of a label.
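For the specific table in the question, one way to spell out that per-column logic is sketched below. It assumes the start node is (:Person {name: "Neo"}) as in the earlier answer, that every matched path has three or four nodes, and that the rule is "the end node always goes in column d". None of these assumptions is confirmed by the question. The CASE expression has no ELSE branch, so it yields null for the shorter path.

```cypher
// Sketch only: assumes paths of 3 or 4 nodes and "end node goes in column d".
MATCH p = (:Person {name: "Neo"})-[*]->(end)
WHERE NOT (end)-->()
WITH nodes(p) AS ns
RETURN ns[0].name AS a,
       ns[1].name AS b,
       // Only the 4-node path has a separate third node; otherwise c is null.
       CASE WHEN size(ns) > 3 THEN ns[2].name END AS c,
       // A negative index counts from the end of the list.
       ns[-1].name AS d
```

A path with only two nodes would put the end node in both b and d, so a different rule would be needed for that case.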
I.m.o. you're asking for extensive post-processing of the results of a simple query (give me all the paths starting from Neo). I say this because:
You state you need to be able to specify specific WHERE clauses for each path (but you don't specify which clauses for which path ... indicating this might be a dynamic thing?)
You don't know the size of the longest path beforehand ... but you still want the result to be a same-size-for-all-results table. And would any null columns then always be just before the end node? Why (for that makes no real sense other than convenience)?
...
Therefore (and again i.m.o.) you need to process the results in a (Java or whatever you prefer) program. There you'll have full control over the resultset and be able to slice and dice as you wish. Cypher (exactly like SQL in fact) can only do so much and it seems that you're going beyond that.
Hope this helps,
Regards,
Tom
P.S. This may seem like an easy opt-out, but look at how simple your query is as compared to the constructs that have to be wrought trying to answer your logic. So ... separate the concerns.

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the expression nodes(path) + nodes(optionalPath) adds null to a list and I get no records. This is true even when the nodes(path) term does contain nodes.
What's the best way to get around this problem?
You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones), always perform the second MATCH, and only eliminate unwanted paths afterwards. (But it is actually not clear if you even need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.
It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (hospital in between), then I'd adjust it like this:
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but just says exactly what you want (and also adds the start node to the hospital check list)

Neo4J match and create relationship is very very slow with few millions records

I have about 3.5M nodes with label A and about 400 nodes with label B.
Nodes with label B already have directed relationships like (b1:B)-[c:CONNECTS]->(b2:B). Now I need to add another 3.5M relationships of a different type by comparing A node properties with :CONNECTS relationship properties.
My statement looks like this:
MATCH (a:A)
MATCH (c:C)
MATCH (b1:B {id: a.a1_id})-[rl:CONNECTS*1..21]->(b2:B {id: a.b2_id}) WHERE ALL(x in rl WHERE x.connect_id = c.connect_id)
MATCH (new_a:B)-[r:TO]->(new_b:B) WHERE r in rl
CREATE (new_a)-[:TICKET {ticket_id: ID(a)}]->(new_b)
This statement is extremely slow and just hangs up. I even tried to do some performance tuning mentioned here, especially I allocated heap size to 16GB.
I find it quite strange that it can't handle this size of data. What am I missing? I tried to model differently and reduce relationship queries and use more schema index, but I failed to do a lot differently because of type of data I have and type of query I want to perform after all data is there.
I also tried to use periodic commit while creating A nodes with csv import. It has same issues.
I hope I am clear enough. I would really appreciate some inputs. Thanks.
What are the labels A, B, C ? A CONNECTS relationship is also free of meaning.
Queries like this are meant to be comprehensible not the opposite!
// generates 3.5M rows
MATCH (a:A)
// generates x-times 3.5M rows
// you never use that C except for checking a connect id?
MATCH (c:C)
// many million times execute this variable length expand
MATCH (b1:B {id: a.a1_id})-[rl:CONNECTS*1..21]->(b2:B {id: b2_id})
WHERE ALL(x in rl WHERE x.connect_id = c.connect_id)
// lookup by relationship is very bad, esp. as you are looking over a cross product of all 400x400 B's
MATCH (new_a:B)-[r:TO]->(new_b:B) WHERE r in rl
// why do you store the id of a on this self!!-relationship?
CREATE (new_b)-[:TICKET {ticket_id: ID(a)}]->(new_b);
Where does b2_id come from?
Perhaps something like this:
MATCH (a:A)
MATCH (b1:B {id: a.a1_id})
MATCH (b2:B {id: $b2_id})
MATCH (b1)-[rels:CONNECTS*..21]->(b2)
WHERE ALL(x in tail(rels) WHERE x.connect_id = head(rels).connect_id)
UNWIND rels AS r
WITH a,startNode(r) as new_a, endNode(r) as new_b
CREATE (new_a)-[:TICKET {ticket_id: ID(a)}]->(new_b);

Cypher Optional Match

I have a graph that contains two types of nodes (objects and pieces) and two types of links (similarTo and contains). Some pieces are made of other pieces.
I would like to extract the path to each piece starting from a set of objects.
MATCH (o:Object)
WITH o
OPTIONAL MATCH path = (p:Piece) <-[:contains*]- (o) -[:similarTo]- (:Object)
RETURN path
The above query returns only some of the pieces. In the returned graph, some objects appear not to connect to any pieces: those pieces are not returned, although the objects actually do contain them!
I can change the query to:
MATCH (o:Object) -[:contains*]-> (p:Piece)
OPTIONAL MATCH (o) -[:similarTo]- (:Object)
However, I did not manage to return the whole path for that query, which I need in order to return a collection of nodes and links with:
WITH rels(path) as relations , nodes(path) as nodes
UNWIND relations as r unwind nodes as n
RETURN {nodes: collect(distinct n), links: collect(distinct {source: id(startNode(r)), target: id(endNode(r))})}
I'd be grateful to any recommendation.
Would something like this do the trick ?
I created a small graph representing objects and pieces here : http://console.neo4j.org/r/abztz4
Execute distinct queries with UNION ALL
Here you'll combine the two use cases in one set of paths :
MATCH (o:Object)
WITH o
OPTIONAL MATCH p=(o)-[:CONTAINS]->(piece)
RETURN p
UNION ALL
MATCH (o:Object)
WITH o
OPTIONAL MATCH p=(o)-[:SIMILAR_TO]-()-[:CONTAINS]->(piece)
RETURN p
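If the combined paths then need to be turned into the nodes/links structure from the question, a sketch along these lines may work. It assumes Neo4j 4.0 or later (for the CALL { ... } subquery) and reuses the CONTAINS and SIMILAR_TO relationship types from the sample graph above. Plain MATCH is used instead of OPTIONAL MATCH so that no null paths reach the aggregation.

```cypher
// Sketch only: assumes Neo4j 4.0+ and the sample graph's relationship types.
CALL {
  MATCH p = (:Object)-[:CONTAINS]->()
  RETURN p
  UNION ALL
  MATCH p = (:Object)-[:SIMILAR_TO]-()-[:CONTAINS]->()
  RETURN p
}
// First pass: gather the distinct nodes and keep the paths for a second pass.
UNWIND nodes(p) AS n
WITH collect(DISTINCT n) AS nodeList, collect(DISTINCT p) AS paths
// Second pass: gather the distinct relationships as source/target id pairs.
UNWIND paths AS p
UNWIND relationships(p) AS r
WITH nodeList,
     collect(DISTINCT {source: id(startNode(r)), target: id(endNode(r))}) AS links
RETURN {nodes: nodeList, links: links} AS graph
```

Unwinding nodes and relationships in two separate passes avoids the cross product that a single row with both UNWINDs would create.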

Resources