I want to find all paths given a start node
MATCH path=(n)-[rels*1..10]-(m)
with the following 2 conditions on path inlcusion:
true if relationship between subsequent nodes in path has property PROP='true'
if type(relationship)=SENDS then true if direction of the relationship is outgoing (from one path node to the next node in the path)
Another way of phrasing this is that direction doesn't matter unless the relationship name is SENDS
I can do condition 1 with WHERE ALL (r IN rels WHERE r.PROP='true') however ive no idea how to do condition 2.
The only way I can think of to filter on relationship direction without declaring direction in the match pattern is by comparing the start node of each relationship in the path with the node at the corresponding index of the nodes() collection from the path. For this you need the relationship and node collections from the path, an index counter and some boolean evaluation equivalent to ALL(). One way to do it is to use REDUCE with a collection for the accumulator, so you can accumulate index and maintain a true/false value for the path at the same time. Here's an example, the accumulator starts at [0,1] where the 0 is the index for testing that startNode(r) equals the node at the corresponding index in the node collection (i.e. it's an outgoing relationship) and the 1 represents true, which signifies that the path has not yet failed your conditions. For each relationship the index value is incremented and the CASE/WHEN clause multiplies the 'boolean' with 1 if your conditions are satisfied, with 0 if not. The evaluation of the path is then the evaluation of the second value in the collection returned by REDUCE -- if 1 then yay, if 0 then boo.
MATCH path = (n)-[*1..10]-(m)
WITH path, nodes(path) as ns, relationships(path) as rs
WHERE REDUCE(acc = [0,1], r IN rs |
[acc[0]+1, CASE WHEN
r.PROP='true' AND
(type(r) <> "SENDS" OR startNode(r) = ns[acc[0]]) THEN acc[1]*1 ELSE acc[1]*0 END]
)[1] = 1
RETURN path
or maybe this is more readable
WHERE REDUCE(acc = [0,1], r IN rs |
CASE WHEN
r.PROP=true AND
(type(r) <> "SENDS" OR startNode(r) = ns[acc[0]])
THEN [acc[0]+1, acc[1]*1]
ELSE [acc[0]+1, acc[1]*0]
END
)[1] = 1
Here's a console: http://console.neo4j.org/?id=v3kgz9
For completeness I've answered the question using jjaderberg correct solution plus a condition to fix the start node and ensure that no zero length paths are included
MATCH p = (n)-[*1..10]-(m)
WHERE ALL(n in nodes(p) WHERE 1=length(filter(m in nodes(p) WHERE m=n)))
AND (id(n)=1)
WITH p, nodes(p) as ns, relationships(p) as rs
WHERE REDUCE(acc = [0,1], r IN rs | [acc[0]+1,
CASE WHEN r.PROP='true' AND (type(r) <> "SEND" OR startNode(r) = ns[acc[0]])
THEN acc[1]*1
ELSE acc[1]*0
END])[1] = 1
RETURN nodes(p);
Or my alternative answer based on jjaderbags answer but does not use accumulator but which is slightly slower
MATCH p=(n)-[rels*1..10]-(m)
WHERE ALL(n in nodes(p) WHERE 1=length(filter(m in nodes(p) WHERE m=n)))
AND( ALL (r IN rels WHERE r.PROP='true')
AND id(n)=1)
WITH p, range(0,length(p)-1) AS idx, nodes(p) as ns, relationships(p) as rs
WHERE ALL (i in idx WHERE
CASE type(rs[i])='SEND'
WHEN TRUE THEN startnode(rs[i])=ns[i]
ELSE TRUE
END)
RETURN nodes(p);
Related
I'm working on neo4j and I'm trying to put a condition on a series of relationships, where I need to sum a property of said relationship.
When I do the basic textbook part of filtering for the relationship without the sum, it all works.
MATCH c= (a)-[:VENDE*2..3]->(b)
WHERE ALL (r IN relationships(c)
WHERE r.ammontare = 25000)
RETURN c
When I try to put the condition on the sum of said property, I can't find a way.
I tried REDUCE but it's stuck because the this error I didn't manage to work around: "Type mismatch: accumulator is Integer but expression has type Boolean"
MATCH c= (a)-[:VENDE*2..3]->(b)
WHERE ALL (r IN relationships(c)
WHERE REDUCE( t = 0, n in relationships(c)|t= t+ n.ammontare) = 25000)
RETURN c
I tried with apoc but this doesn't work either. What am I getting wrong?
MATCH c= (a)-[:VENDE*2..3]->(b)
WHERE all(r in relationships(r)
WHERE APOC.COLL.SUM( [r IN relationships(c)| r.ammontare]) = 25000)
return c
The last method runs but it's too heavy and goes into timeout.
MATCH c= (a)-[:VENDE*2]->(b)
WITH c, relationships(c) as rels
UNWIND (rels) as rel
With c, sum(rel.ammontare) as re
where re > 25000
return c
I'm not sure but this will work.
MATCH c= (a)-[:VENDE*2..3]->(b)
WHERE ALL (r IN relationships(c)
WHERE REDUCE( t = 0, n in relationships(c)| t+ n.ammontare) = 25000)
RETURN c
I have vertice v which has a lot of (millions) edges. Each time I query for a small subset of these edges (1K). I would like these edges to be different in each query. Is it possible?
Here is one way to do that (using Cypher):
MATCH (v:Foo)
WHERE v.id = 'node_with_millions_of_BAR_relationships'
WITH v, [(v)-[r:BAR]->() | r] AS rels
WITH v, rels, CASE WHEN $limit >= SIZE(rels)
THEN 1.0
ELSE TOFLOAT($limit) / SIZE(rels) END AS pct
RETURN v, CASE WHEN pct >= 1.0
THEN rels
ELSE REDUCE(s = [], x IN rels | CASE
WHEN SIZE(s) < $limit AND rand() < pct THEN s + x
ELSE s END)
END AS randomRels
Assumptions made by this example:
The (maximum) number of random relationships to return is specified by the limit parameter.
The node of interest has the Foo label, and the relationships of interest are outgoing and have the BAR type.
How could I imply restrictions on variable length path?
I have all possible paths from some start node query:
CREATE INDEX ON :NODE(id)
MATCH all_paths_from_Start = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.id = 128 AND start.country <> "Uganda"
RETURN paths;
No I want filter out all paths which have at least two persons with the same country. How could I do that?
1) Get an array of countries to the path with possible duplicates: REDUCE
2) Remove duplicates and compare the sizes of arrays: UNWIND + COLLECT(DISTINCT...)
MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.id = 128 AND start.country <> "Uganda"
WITH path,
REDUCE(acc=[], n IN NODES(path) | acc + n.country) AS countries
UNWIND countries AS country
WITH path,
countries, COLLECT(DISTINCT country) AS distinctCountries
WHERE SIZE(countries) = SIZE(distinctCountries)
RETURN path
P.S. REDUCE can be replaced by EXTRACT (thanks to Gabor Szarnyas):
MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.id = 128 AND start.country <> "Uganda"
WITH path,
EXTRACT(n IN NODES(path) | n.country) AS countries
UNWIND countries AS country
WITH path,
countries, COLLECT(DISTINCT country) AS distinctCountries
WHERE SIZE(countries) = SIZE(distinctCountries)
RETURN path
P.P.S. Thanks again to Gabor Szarnyas for another idea for simplifying the query:
MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.id = 128 AND start.country <> "Uganda"
WITH path
UNWIND NODES(path) AS person
WITH path,
COLLECT(DISTINCT person.country) as distinctCountries
WHERE LENGTH(path) + 1 = SIZE(distinctCountries)
RETURN path
One solution that I can think of is to get the nodes of the path, and for each person on the path, extract the value of the number of persons from the same country (which we determine by filtering for the same country. A path has persons from unique countries if it has zero persons from the same country, i.e. for all persons, there is only a single person (the person himself/herself) from that country.
MATCH p = (start:Person {id: 128})-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.country <> "Uganda"
WITH p, nodes(p) AS persons
WITH p, extract(p1 IN persons | size(filter(p2 IN persons WHERE p1.country = p2.country))) AS personsFromSameCountry
WHERE length(filter(p3 IN personsFromSameCountry WHERE p3 > 1)) = 0
RETURN p
The query is syntactically correct but I didn't test it on any data.
Note that I moved the id = 128 condition to the pattern and shortened the all_paths_from_Start variable as p.
I have a tree-like graph as shown below
Now let's say I start from the root node R and want to find all the paths from 1 to the nearest type B node. In the example graph, the result should be
path-1: 1,2
path-2: 1,3,6,10,13
path-3: 1,3,7,10,13
How can I do this?
Keep the node type in the label - (:A) and (:B), relationships between nodes are of type 'connect'.
// Find all paths from Root to all B-nodes
MATCH (A:A {name:1}), p = (A)-[:connect*]->(B:B)
// Get all node labels for each path
WITH A, p, extract( n in nodes(p) | labels(n) ) as pathLabels
// We find the number of occurrences of B-node in each path
WITH A, p, reduce( bCount = 0, Labels in pathLabels |
CASE WHEN 'B' IN Labels THEN 1 ELSE 0 END + bCount
) as bCount
// Return only the path in which the B-node is in the end of the path
WHERE bCount = 1
RETURN p
Example data query:
MERGE (A1:A {name:1})-[:connect]-(B2:B {name:2}) MERGE (A1)-[:connect]-(A3:A {name:3}) MERGE (B2)-[:connect]-(A4:A {name:4}) MERGE (B2)-[:connect]-(A5:A {name:5}) MERGE (A4)-[:connect]-(B8:B {name:8}) MERGE (B8)-[:connect]-(A11:A {name:11}) MERGE (B8)-[:connect]-(A12:A {name:12}) MERGE (A5)-[:connect]-(A9:A {name:9}) MERGE (A3)-[:connect]-(A6:A {name:6}) MERGE (A3)-[:connect]-(A7:A {name:7}) MERGE (A6)-[:connect]-(A10:A {name:10}) MERGE (A7)-[:connect]-(A10) MERGE (A10)-[:connect]-(B13:B {name:13}) RETURN *
Update (searching not A-type nodes):
// Find all paths from Root to all not A-nodes
MATCH (A:A {name:1}), p = (A)-[:connect*]->(B) WHERE NOT 'A' IN labels(B)
// Get all node labels for each path
WITH A, p, extract( n in nodes(p) | labels(n) ) as pathLabels
// We find the number of occurrences of A-node in each path
WITH A, p, reduce( aCount = 0, Labels in pathLabels |
CASE WHEN 'A' IN Labels THEN 1 ELSE 0 END + aCount
) as aCount
// Return only the path in which the count of A-node
// is 1 less the total number of nodes in the path.
WHERE aCount = length(p)
RETURN p
Let's say, I have a path A->B->C->D and the relationships have a property val.
Now, I have to pick any two nodes from the path and if the rel.val>0.8
and if it is true for all the pair of nodes, then return the path
Ex:
P = A-->B-->C-->D
All nodes = [A,B,C,D]
return p if{
rel.val of (A,B) >0.8
rel.val of (A,C) >0.8
rel.val of (A,D) >0.8
rel.val of (B,C) >0.8
rel.val of (B,D) >0.8
rel.val of (C,D) >0.8
}
Here is my query, (of course the query is wrong):
MATCH p=(a{word:"quality"})-[r*1..2]->(b)
WHERE NONE (n IN nodes(p) WHERE size(filter(x IN nodes(p) WHERE n = x))> 1)
MATCH q = (a)-[r:coocr]->(b) where a in nodes(p) AND b in nodes(p) AND NOT b = a AND None(rel IN rels(q) WHERE rel.val < 0.8 )
RETURN p
In summary, you want to MATCH a path and then make sure that all pairs of nodes in your path are connected by a relationship which fullfills a certain criterion (rel.val > 0.8).
Interesting question, I think this is not really straightforward. Maybe I am overlooking something obvious?
Here is an idea how to approach the problem. You first MATCH your path, then MATCH between all nodes in the path and count the number of relationships with rel.val > 0.8. This number has to be the size of the factorial of the number of nodes (num relationships == (num nodes)!, number of possible combinations of 2).
The following query returns the number of relationships, but I don't know how to compare this to the factorial of the number of nodes:
// match your path like before
MATCH p=(a:Uselabel {word:"quality"})-[r:USETYPE*1..2]->(b)
// use unwind to get the nodes from the path
UNWIND nodes(path) AS x
// do this twice to match the nodes onto themselves
UNWIND nodes(path) AS y
// match your relationship
MATCH (x)-[rel:USETYPE]-(y)
// criterion for your relationship
WHERE rel.val > 0.8
// only if two different nodes
WHERE x <> y
// get the count of pairs
WITH p, count(DISTINCT rel) AS num_pairs
// now I don't know how to get/compare the factorial of the number of nodes :)
RETURN num_pairs
I didn't find a built-in function for the factorial, so you have to look into this.