How to get all relationships between a set of nodes? - neo4j

In general, the problem is: there is a query with a variable number of node matches:
MATCH (a:`SOME.LABEL.1` {...})
MATCH (b:`SOME.LABEL.2` {...})
MATCH (c:`SOME.LABEL.3` {...})
...
MATCH (z:`SOME.LABEL.n` {...})
I need to get all relationships between this set of nodes. I started by thinking about matching every distinct pair from (a, b, c, ..., z):
WITH a,b,c, ..., z
MATCH (a) -[ab]-> (b)
MATCH (a) -[ac]-> (c)
...
MATCH (z) -[za]-> (a)
RETURN ab, ac, ..., za;
But I think it's too complex.
There is an APOC function, apoc.algo.cover, that does what I need, but unfortunately I need to do it in pure Cypher.

This will work:
WITH ['LAB1', 'LAB2', 'LAB3', 'LAB4'] AS labs
MATCH (n)-[r]->(m)
WHERE
ANY(l1 IN labs WHERE l1 IN LABELS(n)) AND
ANY(l2 IN labs WHERE l2 IN LABELS(m))
RETURN r
But if you are using Neo4j 5, the newer label expression syntax will likely be more performant:
MATCH (:LAB1|LAB2|LAB3|LAB4)-[r]->(:LAB1|LAB2|LAB3|LAB4)
RETURN r
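If the set is defined by the earlier MATCH clauses rather than by labels alone, one option is to collect the matched nodes into a single list and keep only the relationships whose two endpoints are both in that list. The following is a minimal sketch, not a tested solution: the labels LabelA and LabelB and the property key are placeholders for whatever the real MATCH clauses select.

```cypher
// Placeholder selection: replace with the real labels and property filters.
MATCH (n)
WHERE (n:LabelA AND n.key = 1) OR (n:LabelB AND n.key = 2)
// One list holding every node of the set.
WITH collect(n) AS ns
UNWIND ns AS src
// Directed pattern, so each relationship is matched once, from its start node.
MATCH (src)-[r]->(dst)
WHERE dst IN ns
RETURN r
```

Because membership is tested against the collected list, this does not depend on how many MATCH clauses define the set, though it may be slower than the label-based forms on large sets.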

Related

Preventing duplicates with multiple collects in cypher query - which is the more canonical approach?

I was initially caught off guard by Cypher returning a cross product when a query contains multiple collect statements.
To remove duplicates in the returned values, which is the more canonical (or otherwise preferred) approach out of the two examples below?
Option A:
MATCH (a)
MATCH (a)-->(b)
WITH a, collect(b) as bs
MATCH (a)-->(c)
RETURN a, bs, collect(c) as cs
Option B:
MATCH (a)
MATCH
(a)-->(b),
(a)-->(c)
RETURN
a,
collect(DISTINCT b) as bs,
collect(DISTINCT c) as cs
I'm assuming option A has better performance.
Since both your constructs only return results if the full pattern
(b)<--(a)-->(c)
exists, the shortest way would be:
MATCH (b)<--(a)-->(c)
RETURN
a,
collect(DISTINCT b) as bs,
collect(DISTINCT c) as cs
If one or both of the two edges are optional, and you can use APOC, you can also do something like:
MATCH (a)
RETURN
a,
apoc.coll.toSet([(a)-->(b) | b]) as bs,
apoc.coll.toSet([(a)-->(c) | c]) as cs
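Without APOC, the optional case can also be handled in the style of Option A by switching to OPTIONAL MATCH and aggregating one branch at a time. This is a sketch under the assumption that an a with no matching b or c should still be returned with empty lists; collect() skips null values, so an unmatched OPTIONAL MATCH yields an empty list.

```cypher
MATCH (a)
OPTIONAL MATCH (a)-->(b)
// Aggregate b before matching c, so the two branches never form a cross product.
WITH a, collect(DISTINCT b) AS bs
OPTIONAL MATCH (a)-->(c)
RETURN a, bs, collect(DISTINCT c) AS cs
```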

Neo4j Query - Find all nodes which satisfy a property condition and have relationship

I have a query which runs successfully:
match (n:A {tag_no:"N2203"})<-[:rel_a]-(v:B)-[:rel_b]->(r)<-[:rel_c]-(n) WHERE (v.invoice_date>="2012-08-01" AND v.invoice_date<"2016-02-01") with v, collect(r) as rs where all (x in rs where x.date<"2016-08-01") return count(v) as count;
I need to further filter this query.
I need the related r nodes with max(r.date) for each v, and then to find out how many of those nodes have a relationship with another node of type D.
I'm trying this query, but it throws a syntax error:
match (n:A {tag_no:"N2203"})<-[:rel_a]-(v:B)-[:rel_b]->(r)<-[:rel_c]-(n) WHERE (v.invoice_date>="2012-08-01" AND v.invoice_date<"2016-02-01") with v, max(r.date) as date collect(r) as rs where (all (x in rs where x.date<"2016-08-01") AND filter(x in rs where x.date=date)[0]<-[:rel_c]-(d:D)) return count(v) as count;
I also tried many other combinations, but all throw some syntax error. All help is appreciated.
The main issue is that the query parser cannot know whether filter(x in rs where x.date=date)[0] really is a node, so this is not allowed in the syntax. Fortunately, it's possible to work around this at the cost of some verbosity:
Use another WITH clause to introduce an alias (n) to the node variable. That will allow you to use the WHERE <pattern> syntax.
Also, you cannot introduce new variables in the WHERE clause, so just use (:D) instead of (d:D).
So the query will look like this (obviously, I have not tested it):
MATCH (n:A {tag_no:"N2203"})<-[:rel_a]-(v:B)-[:rel_b]->(r)<-[:rel_c]-(n)
WHERE v.invoice_date>="2012-08-01"
AND v.invoice_date<"2016-02-01"
WITH
v,
max(r.date) AS date,
collect(r) AS rs
WHERE all(x IN rs WHERE x.date<"2016-08-01")
WITH filter(x in rs where x.date=date)[0] AS n, v
WHERE (n)<-[:rel_c]-(:D)
RETURN count(v) AS count;
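Note that filter() was removed in Neo4j 4.0; on newer versions a list comprehension is the usual replacement. The sketch below is the same query with that substitution. It is equally untested, and the names max_date and latest are chosen here for illustration.

```cypher
MATCH (n:A {tag_no: "N2203"})<-[:rel_a]-(v:B)-[:rel_b]->(r)<-[:rel_c]-(n)
WHERE v.invoice_date >= "2012-08-01"
  AND v.invoice_date < "2016-02-01"
WITH v, max(r.date) AS max_date, collect(r) AS rs
WHERE all(x IN rs WHERE x.date < "2016-08-01")
// List comprehension in place of filter(); [0] takes the first r with the latest date.
WITH v, [x IN rs WHERE x.date = max_date][0] AS latest
// latest is now a plain variable, so it can be used in a pattern predicate.
WHERE (latest)<-[:rel_c]-(:D)
RETURN count(v) AS count;
```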

Cypher : Return Nodes that matched along with Nodes that didn't match

Given labels A, B, and Z, where A and B each have their own relationships to Z, I have the query:
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z
RETURN DISTINCT a, matched_z
It returns the A nodes and all the Z nodes that have a relationship to both A and B.
I'm stuck on trying to ALSO return a separate array of the Z nodes that B is related to but A is not (i.e. missing_z). I am attempting to do an initial query to return all the relationships between B and Z, then use its result in a second query:
results = MATCH (b:B { uuid: {id} })
MATCH (b)-[:rel2]->(z:Z)
RETURN DISTINCT COLLECT(z.uuid) AS z
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z, z
RETURN DISTINCT a, matched_z, filter(skill IN z.array WHERE NOT z.uuid IN {results}) AS missing_z
The results seem to have nil for missing_z where one would assume it should be populated. Not sure if filter is the correct way to go with a WHERE NOT / IN scenario. Can the above 2 queries be combined into 1?
The hard part here, in my opinion, is that any failed match will drop everything you have matched so far. But your starting point seems to be "all Z related to the B with the given uuid", so start by collecting that and filtering/copying from there.
Use WITH + aggregation functions to copy+filter columns
Use OPTIONAL MATCH if a failure to match shouldn't drop already collected rows.
If I understand what you are trying to do well enough, this Cypher should do the job; adjust it as needed (let me know if you need help understanding or adapting any part of it):
// Match base set
MATCH (z:Z)<-[:rel2]-(b:B { uuid: {id} })
// Collect into single list
WITH COLLECT(z) as zs
// Match all A (ignore relation to Zs)
MATCH (a:A)
// For each a, return a, the sub-list of Zs related to a, and the sub-list of Zs not related to a
RETURN a as a,
FILTER(n in zs WHERE (a)-[:rel1]->(n)) as matched,
FILTER(n in zs WHERE NOT (a)-[:rel1]->(n)) as unmatched
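On Neo4j 4.0 and later, where FILTER() is no longer available, the same idea can be written with list comprehensions. This is a sketch of that substitution, also using the $id parameter syntax in place of {id}; it has not been run against the asker's data.

```cypher
// Base set: every Z related to the given B.
MATCH (z:Z)<-[:rel2]-(b:B {uuid: $id})
WITH collect(z) AS zs
// All A nodes, whether or not they are related to any of the Zs.
MATCH (a:A)
RETURN a,
       [n IN zs WHERE (a)-[:rel1]->(n)] AS matched,
       [n IN zs WHERE NOT (a)-[:rel1]->(n)] AS unmatched
```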
This query might do what you want:
MATCH (z:Z)<-[:rel2]-(b:B { uuid: {id} })
WITH COLLECT(z) as all_zs
UNWIND all_zs AS z
MATCH (a:A)-[:rel1]->(z)
WITH all_zs, COLLECT(DISTINCT z) AS matched_zs
RETURN matched_zs, apoc.coll.subtract(all_zs, matched_zs) AS missing_zs;
It first stores in the all_zs variable all the Z nodes that have a rel2 relationship from b. This collection's contents remain unaffected even if the second MATCH clause matches a subset of those Z nodes.
It then stores in matched_zs the distinct all_zs nodes that have a rel1 relationship from any A node.
Finally, it returns:
the matched_zs collection, and
the unique nodes from all_zs that are not also in matched_zs, as missing_zs.
The query uses the convenient APOC function apoc.coll.subtract to generate the latter return value.
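If APOC is not available, the subtraction can be approximated with a list comprehension. The sketch below also swaps MATCH for OPTIONAL MATCH, so that a row is still returned when no Z node has a rel1 relationship, and uses the $id parameter syntax. Both are changes from the query above, and it assumes at least one Z node is related to b (otherwise UNWIND produces no rows).

```cypher
MATCH (z:Z)<-[:rel2]-(b:B {uuid: $id})
WITH collect(DISTINCT z) AS all_zs
UNWIND all_zs AS z
// Keep z even when no A node points at it; a is null in that case.
OPTIONAL MATCH (a:A)-[:rel1]->(z)
// CASE without ELSE yields null, which collect() skips.
WITH all_zs, collect(DISTINCT CASE WHEN a IS NOT NULL THEN z END) AS matched_zs
RETURN matched_zs,
       [z IN all_zs WHERE NOT z IN matched_zs] AS missing_zs;
```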

Cypher: Graph a node's connections 3 deep

Given a node, I want to use D3 to graph it and its neighborhood 3 deep.
The best strategy I can come up with is:
Query1: MATCH (n)-[r]-(m) WHERE id(n) IN [501] RETURN n, r, m
Then, from the results, in my app, collect all of the m ids, put those new ids in the IN clause (removing the ones I've already done), and repeat the query.
Query2: MATCH (n)-[r]-(m) WHERE id(n) IN [502,511,1111] RETURN n, r, m
Query3: MATCH (n)-[r]-(m) WHERE id(n) IN [512,519,1116,1130] RETURN n, r, m
Note: we don't know the ids for the 2nd query until after the 1st, etc.
But this means running 3 queries, and lots of I/O shuffling.
Is there a better way to do this? I feel like I'm doing too much work in my app, when it should be done in Cypher. I looked in the D3 examples, but didn't see this kind of query.
Thanks!
Mike
You can run the following query or something similar; the concept should be easy to understand:
MATCH (n)-[r*1..3]-(m) WHERE id(n) IN [501] RETURN n, r, m
Here *1..3 means match from 1 to 3 relationships away from the n node.
This way it is only a single query, and it should be significantly faster than running three separate queries.
Does this work?
MATCH (n)-[r]-(m)-[s]-(o) WHERE id(n) IN [501] RETURN n, r, m, s, o
Does this work for you?
MATCH (n)-[r1]-(m1)-[r2]-(m2)-[r3]-(m3)
WHERE ID(n) = 501 AND
ID(m2) IN [502,511,1111] AND
ID(m3) IN [512,519,1116,1130]
RETURN n, r1, m1, r2, m2, r3, m3;
Why not just combine all 3:
MATCH (n)-[r]-(m)
WHERE id(n) IN [501]
WITH m
MATCH (m)-[s]-(o)
WITH o
MATCH (o)-[t]-(p)
RETURN o,t,p
The difference between this and the other answers is that this will revisit relationships, especially because there is no direction specified, if that's what you want.
How about this: with "resultDataContents":["graph"]
MATCH path = (n)-[*..3]-(m)
WHERE id(n) IN [501]
RETURN path
or if you want to save bandwidth
MATCH path = (n)-[*..3]-(m)
WHERE id(n) IN [501]
RETURN [x in nodes(path) | id(x)] as node_ids, [x in rels(path) | id(x)] as rel_ids
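For D3 specifically, a flat list of distinct links may be easier to consume than paths. A possible variant, sketched here under the assumption that the client only needs ids and relationship types, unwinds each path and deduplicates the relationships. The column names source, target, and type are chosen for illustration; relationships() is used because rels() is not available in newer Neo4j versions.

```cypher
MATCH path = (n)-[*..3]-(m)
WHERE id(n) IN [501]
// A relationship can appear on many paths; keep each one once.
UNWIND relationships(path) AS r
WITH DISTINCT r
RETURN id(startNode(r)) AS source,
       id(endNode(r)) AS target,
       type(r) AS type
```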
Thanks for the answers, but they didn't work out for me.
I did this:
Set idsTodo to the initial node.
Then ran this:
MATCH (n)-[r]->(m) WHERE id(n) IN {idsTodo} RETURN n, r, m
Added idsTodo to idsDone.
Then added the ids of the m nodes to idsTodo and subtracted idsDone.
Then ran the query again, repeating 2 more times.
By the end, I had every node and its relationships.

neo4j cypher: stacking results with UNION and WITH

I'm doing a query like
MATCH (a)
WHERE id(a) = {id}
WITH a
MATCH (a)-->(x:x)-->(b:b)
WITH a, x, b
MATCH (a)-->(y:y)-->(b:b)
WITH a, x, y, b
MATCH (b)-->(c:c)
RETURN collect(a), collect(x), collect(y), collect(b), collect(c)
What I want here is for the b from MATCH (a)-->(y:y)-->(b:b) to be composed of the ones from that line and the ones from the previous MATCH (a)-->(x:x)-->(b:b). The problem I'm having with UNION is that it's picky about the number and kind of nodes to be passed on to the next query, and I'm having trouble understanding how to make it all fit together.
What other solution could I use to merge these nodes during the query or just before returning them? (Or, if I should do it with UNION, how would I do it that way?)
(Of course the query up there could be done in other better ways. My real one can't. That is just meant to give a visual example of what I'm looking to do.)
Much obliged!
This simplified query might suit your needs.
I took out all the collect() function calls, as it is not clear that you really need to aggregate anything. For example, there will only be a single 'a' node, so aggregating the 'a's does not make sense.
Please be aware that every row of the result will be for a node labelled either 'x' or 'y'. But, since every row has to have both the x and y values -- every row will have a null value for one of them.
MATCH (a)-->(x:x)-->(b:b)-->(c:c)
WHERE id(a) = {id}
RETURN a, x, null AS y, b, c
UNION
MATCH (a)-->(y:y)-->(b:b)-->(c:c)
WHERE id(a) = {id}
RETURN a, null AS x, y, b, c
The best solution I could come up with in the end was something like this:
MATCH (a)-->(x:x)-->(b1:b)-->(c1:c)
WHERE id(a) = {id} AND NOT (a)-->(:y)-->(b1)
WITH a, collect(x) as xs, collect(DISTINCT b1) as b1s, collect(c1) as c1s
MATCH (a)-->(y:y)-->(b2:b)-->(c2:c)
RETURN a, xs, collect(y), (b1s + collect(b2)), c1s + collect(c2)
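On newer Neo4j versions (4.1 or later, as far as I know), another option is to put the UNION inside a CALL subquery, so that the combined b rows can be processed further by the outer query. This is a sketch, not the approach either answer above used: it assumes subqueries with an importing WITH are available, and it uses the $id parameter syntax.

```cypher
MATCH (a)
WHERE id(a) = $id
CALL {
  // Each branch imports a and returns the same single column, as UNION requires.
  WITH a
  MATCH (a)-->(:x)-->(b:b)
  RETURN b
  UNION
  WITH a
  MATCH (a)-->(:y)-->(b:b)
  RETURN b
}
// b now holds nodes reached through either x or y, without duplicates.
MATCH (b)-->(c:c)
RETURN a, collect(DISTINCT b) AS bs, collect(DISTINCT c) AS cs
```

The x and y nodes are not returned here; if they are needed, each branch would have to return them under a shared column name as well.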
