Say I have the following graph:
(a1:A) -> (b1:B) -> (c1:C) -> (d1:D)
      \                           /
       --> (x1:X) --> (y1:Y) -->
(a2:A) -> (b2:B) -> (c2:C) -> (d2:D)
(a3:A) -> (x3:X) -> (y3:Y) -> (d3:D)
The actual graph also contains other relationships between nodes labelled A and D, but I am only interested in these particular ones, so I have to impose some rules on the path. See the queries below:
match p1=((:A)-->(:B)-->(:C)-->(:D))
return p1
match p2=((:A)-->(:X)-->(:Y)-->(:D))
return p2
This returns four rows:
a1-b1-c1-d1
a1-x1-y1-d1
a2-b2-c2-d2
a3-x3-y3-d3
But I would like to return an array of subgraphs and merge paths based on the common node attribute D.name. I.e., d1, d2, d3 are differentiated by their node attribute called "name". So the output I would like to have is
// query logic
// return subgraphs, each subgraph is a row. each subgraph contains the unique node and relationships from A to D. If there are multiple paths to the same D, then merge these paths to the same subgraph
More concretely, the return of the above example becomes
row1: nodes(a1, b1, c1, d1, x1, y1), relationships(a1, b1, c1, d1, x1, y1)
row2: nodes(a2, b2, c2, d2), relationships(a2, b2, c2, d2)
row3: nodes(a3, x3, y3, d3), relationships(a3, x3, y3, d3)
What would be the query?
UPDATE:
Test graph
merge (a1:A{name: 'a1'})
merge (b1:B{name: 'b1'})
merge (c1:C{name: 'c1'})
merge (d1:D{name: 'd1'})
merge (x1:X{name: 'x1'})
merge (y1:Y{name: 'y1'})
merge (a2:A{name: 'a2'})
merge (b2:B{name: 'b2'})
merge (c2:C{name: 'c2'})
merge (d2:D{name: 'd2'})
merge (a3:A{name: 'a3'})
merge (x3:X{name: 'x3'})
merge (y3:Y{name: 'y3'})
merge (d3:D{name: 'd3'})
merge(a1)-[:TESTS]->(b1)
merge(b1)-[:TESTS]->(c1)
merge(c1)-[:TESTS]->(d1)
merge(a1)-[:TESTS]->(x1)
merge(x1)-[:TESTS]->(y1)
merge(y1)-[:TESTS]->(d1)
merge(a2)-[:TESTS]->(b2)
merge(b2)-[:TESTS]->(c2)
merge(c2)-[:TESTS]->(d2)
merge(a3)-[:TESTS]->(x3)
merge(x3)-[:TESTS]->(y3)
merge(y3)-[:TESTS]->(d3)
It seems that not just the :D nodes differ, but also the :A nodes. Assuming that you want to group by d.name, I think this does it:
MATCH p=((:A)-[*]->(d:D))
RETURN d.name AS dName,
apoc.coll.toSet(
apoc.coll.flatten(
COLLECT(nodes(p))
)
) AS nodes,
apoc.coll.toSet(
apoc.coll.flatten(
COLLECT(relationships(p))
)
) AS relationships
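To make the grouping step concrete, here is a minimal Python sketch of what the query above computes on the test graph: paths are grouped by their final node, and the nodes and relationships of each group are merged without duplicates. Representing a path as a list of names and a relationship as a (start, end) pair is an assumption made only for this illustration.

```python
from collections import defaultdict

# Each path is a list of node names, standing in for the rows of MATCH p=...
# Relationships are modelled as (start, end) pairs (an assumption of this sketch).
paths = [
    ["a1", "b1", "c1", "d1"],
    ["a1", "x1", "y1", "d1"],
    ["a2", "b2", "c2", "d2"],
    ["a3", "x3", "y3", "d3"],
]

def merge_by_end_node(paths):
    """Group paths by their last node and union their nodes and relationships."""
    nodes = defaultdict(list)
    rels = defaultdict(list)
    for path in paths:
        key = path[-1]  # plays the role of d.name
        for node in path:
            if node not in nodes[key]:
                nodes[key].append(node)
        for rel in zip(path, path[1:]):
            if rel not in rels[key]:
                rels[key].append(rel)
    return {key: (nodes[key], rels[key]) for key in nodes}

subgraphs = merge_by_end_node(paths)
for d_name, (ns, rs) in subgraphs.items():
    print(d_name, ns, rs)
```

Each dictionary entry corresponds to one returned row: the two paths ending in d1 collapse into a single subgraph, while d2 and d3 each keep their single path.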
In case you want to filter for specific paths, you can do something like:
MATCH (a:A), (d:D)
OPTIONAL MATCH p1=((a)-->(:B)-->(:C)-->(d))
OPTIONAL MATCH p2=((a)-->(:X)-->(:Y)-->(d))
WITH d.name AS dName,
apoc.coll.toSet(
apoc.coll.flatten(
COLLECT(COALESCE(nodes(p1),[]) + COALESCE(nodes(p2),[]))
)
) AS nodes,
apoc.coll.toSet(
apoc.coll.flatten(
COLLECT(COALESCE(relationships(p1),[]) + COALESCE(relationships(p2),[]))
)
) AS relationships
WHERE nodes <> []
RETURN nodes, relationships
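The role of COALESCE and of the final WHERE in the filtered query can be sketched in Python as follows. The rows are hypothetical and only mimic what the Cartesian product of (a:A) and (d:D) with two OPTIONAL MATCH clauses might produce; the node "d9" is invented to show a D node that no allowed path reaches.

```python
# Hypothetical rows: every (a, d) combination appears, and a path that did
# not match is None, as with OPTIONAL MATCH.
rows = [
    {"d": "d1", "p1": ["a1", "b1", "c1", "d1"], "p2": ["a1", "x1", "y1", "d1"]},
    {"d": "d1", "p1": None, "p2": None},  # e.g. a2 paired with d1
    {"d": "d2", "p1": ["a2", "b2", "c2", "d2"], "p2": None},
    {"d": "d3", "p1": None, "p2": ["a3", "x3", "y3", "d3"]},
    {"d": "d9", "p1": None, "p2": None},  # invented: no allowed path ends here
]

def unique(items):
    """Order-preserving de-duplication, standing in for apoc.coll.toSet."""
    return list(dict.fromkeys(items))

merged = {}
for row in rows:
    # `or []` plays the role of COALESCE(nodes(p), [])
    found = (row["p1"] or []) + (row["p2"] or [])
    merged.setdefault(row["d"], []).extend(found)

# Dropping empty groups corresponds to WHERE nodes <> []
result = {d: unique(ns) for d, ns in merged.items() if ns}
print(result)
```

Without the null handling, concatenating a missing path would fail or yield null, and without the final filter the combinations with no matching path would show up as empty rows.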
Related
I want to get the list of all distinct nodes and relationships returned by this query.
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
Return a,b,c,d,r
limit 10
This should work:
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
WITH * LIMIT 10
RETURN
COLLECT(DISTINCT a) AS aList,
COLLECT(DISTINCT b) AS bList,
COLLECT(DISTINCT c) AS cList,
COLLECT(DISTINCT r) AS rList,
COLLECT(DISTINCT d) AS dList
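As a rough illustration of what the query does, the Python sketch below limits the rows first and then gathers the distinct values of each column. The row contents are made up for the example and do not come from the original data.

```python
from itertools import islice

# Hypothetical result rows for the (a)-[r]-(b)-[d]-(c) pattern; all values are invented.
rows = [
    {"a": "P0", "r": "r1", "b": "P1", "d": "d1", "c": "P2"},
    {"a": "P0", "r": "r1", "b": "P1", "d": "d2", "c": "P3"},
    {"a": "P0", "r": "r2", "b": "P4", "d": "d3", "c": "P2"},
]

def collect_distinct(rows, limit=10):
    """Keep the first `limit` rows, then gather the distinct values per column."""
    limited = list(islice(rows, limit))      # WITH * LIMIT 10
    columns = {}
    for row in limited:
        for name, value in row.items():
            seen = columns.setdefault(name, [])
            if value not in seen:            # COLLECT(DISTINCT ...)
                seen.append(value)
    return columns

lists = collect_distinct(rows)
print(lists)
```

Note that the limit is applied before aggregation, so the lists describe only the first ten matched rows, not the whole result.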
The scenario is the following:
I have a set of nodes of type x that are linked to nodes of type y.
I want to match all x nodes except those that are linked to a y node that has an attribute equal to a particular value.
Example input:
CREATE (a:x {name: 'a'}), (b:x {name: 'b'}), (c:x {name: 'c'});
CREATE (d:y {name: 'd', attrib: 1}), (e:y {name: 'e', attrib: 2}),
(f:y {name: 'f', attrib: 3}), (g:y {name: 'g', attrib: 4}),
(h:y {name: 'h', attrib: 5}), (i:y {name: 'i', attrib: 6});
MATCH (a), (d), (e) WHERE a.name = 'a' AND d.name = 'd' AND e.name = 'e'
CREATE (a)-[r:z]->(d), (a)-[s:z]->(e) RETURN *;
MATCH (b), (f), (g) WHERE b.name = 'b' AND f.name = 'f' AND g.name = 'g'
CREATE (b)-[r:z]->(f), (b)-[s:z]->(g) RETURN *;
MATCH (c), (h), (i) WHERE c.name = 'c' AND h.name = 'h' AND i.name = 'i'
CREATE (c)-[r:z]->(h), (c)-[s:z]->(i) RETURN *;
Here I want to return all the x nodes except those that are linked to a y node that has attrib = 5.
Here's what I tried:
MATCH (n:x)-[]-(m:y) WHERE NOT m.attrib = 5 RETURN n
From this query I get all x nodes, that is: a, b and c. I would like to exclude c, because it's linked to h, which has h.attrib = 5.
Edit:
I found a query that does the job:
MATCH (n:x), (m:x)-[]-(o:y)
WHERE o.attrib = 5
WITH collect(n) as all_x_nodes, collect(m) as bad_x_nodes
RETURN [n IN all_x_nodes WHERE NOT n IN bad_x_nodes]
The problem is that it's not efficient. Any better alternative?
This simple query should do exactly what you asked for: "return all the x nodes except those that are linked to a y node that has attrib = 5."
MATCH (n:x)
WHERE NOT (n)--(:y {attrib: 5})
RETURN n;
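To see why the original attempt still returned c while the pattern predicate does not, it may help to model both in Python. The data mirrors the example input; the variable names are chosen for this sketch only.

```python
# Edges from the example input: each x node is linked to two y nodes.
edges = [("a", "d"), ("a", "e"), ("b", "f"), ("b", "g"), ("c", "h"), ("c", "i")]
attrib = {"d": 1, "e": 2, "f": 3, "g": 4, "h": 5, "i": 6}
x_nodes = ["a", "b", "c"]

# Row-wise filter, like MATCH (n:x)-[]-(m:y) WHERE NOT m.attrib = 5:
# the (c, h) row is dropped, but the (c, i) row survives, so c is still returned.
row_wise = sorted({n for n, m in edges if attrib[m] != 5})

# Existence check, like WHERE NOT (n)--(:y {attrib: 5}):
# a node is dropped as soon as any one of its neighbours has attrib 5.
neighbours = {n: [m for src, m in edges if src == n] for n in x_nodes}
by_existence = [
    n for n in x_nodes
    if not any(attrib[m] == 5 for m in neighbours[n])
]

print(row_wise, by_existence)
```

The first filter is evaluated once per matched (n, m) pair, whereas the second is evaluated once per x node, which is what the question asks for.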
A better approach is to find all :x nodes that you want to exclude (that are connected to the :y node with the specific attribute), collect those x nodes, then match to all :x nodes that aren't in the collection:
MATCH (exclude:x)--(:y{attrib:5})
WITH collect(distinct exclude) as excluded
MATCH (n:x)
WHERE NOT n in excluded
RETURN collect(n) as result
An alternate approach using APOC Procedures is to get both collections, and subtract the excluded collection from the other:
MATCH (exclude:x)--(:y{attrib:5})
WITH collect(distinct exclude) as excluded
MATCH (n:x)
WITH excluded, collect(n) as nodes
RETURN apoc.coll.subtract(nodes, excluded) as result
In either case, it would help to have an index on :y(attrib). In this data set it doesn't matter. On much larger sets it will.
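The collect-then-exclude idea, together with the benefit of an index, can be sketched in Python like this. The dictionary keyed by attrib is only an analogy for an index on :y(attrib), and the function name is hypothetical.

```python
from collections import defaultdict

# Same example data as in the question.
edges = [("a", "d"), ("a", "e"), ("b", "f"), ("b", "g"), ("c", "h"), ("c", "i")]
attrib = {"d": 1, "e": 2, "f": 3, "g": 4, "h": 5, "i": 6}
x_nodes = ["a", "b", "c"]

# A dictionary keyed by attrib stands in for an index on :y(attrib):
# finding the y nodes with a given value becomes a lookup instead of a scan.
y_by_attrib = defaultdict(set)
for y, value in attrib.items():
    y_by_attrib[value].add(y)

def x_without(value):
    """Return the x nodes that are not linked to any y node with the given attrib."""
    bad_y = y_by_attrib.get(value, set())
    excluded = {x for x, y in edges if y in bad_y}     # collect(distinct exclude)
    return [x for x in x_nodes if x not in excluded]   # WHERE NOT n IN excluded

result = x_without(5)
print(result)
```

The excluded set is built once and is usually small, so the final pass over all x nodes only has to do a cheap membership test.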
I have a tree-like graph as shown below
Now let's say I start from the root node R and want to find all the paths from 1 to the nearest type B node. In the example graph, the result should be
path-1: 1,2
path-2: 1,3,6,10,13
path-3: 1,3,7,10,13
How can I do this?
Keep the node type in the label, (:A) and (:B); relationships between nodes are of type 'connect'.
// Find all paths from Root to all B-nodes
MATCH (A:A {name:1}), p = (A)-[:connect*]->(B:B)
// Get all node labels for each path
WITH A, p, extract( n in nodes(p) | labels(n) ) as pathLabels
// We find the number of occurrences of B-node in each path
WITH A, p, reduce( bCount = 0, Labels in pathLabels |
CASE WHEN 'B' IN Labels THEN 1 ELSE 0 END + bCount
) as bCount
// Return only the path in which the B-node is in the end of the path
WHERE bCount = 1
RETURN p
Example data query:
MERGE (A1:A {name:1})-[:connect]-(B2:B {name:2})
MERGE (A1)-[:connect]-(A3:A {name:3})
MERGE (B2)-[:connect]-(A4:A {name:4})
MERGE (B2)-[:connect]-(A5:A {name:5})
MERGE (A4)-[:connect]-(B8:B {name:8})
MERGE (B8)-[:connect]-(A11:A {name:11})
MERGE (B8)-[:connect]-(A12:A {name:12})
MERGE (A5)-[:connect]-(A9:A {name:9})
MERGE (A3)-[:connect]-(A6:A {name:6})
MERGE (A3)-[:connect]-(A7:A {name:7})
MERGE (A6)-[:connect]-(A10:A {name:10})
MERGE (A7)-[:connect]-(A10)
MERGE (A10)-[:connect]-(B13:B {name:13})
RETURN *
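The same "stop at the first B node" rule can be checked outside the database with a small Python sketch. The edge list is read off the example data query, assuming each relationship points in the direction in which its MERGE pattern is written, and assuming the start node is not itself a B node.

```python
# Edges and B-labelled nodes taken from the example data query
# (direction assumed to follow the order in which each pattern is written).
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (4, 8), (8, 11), (8, 12),
         (5, 9), (3, 6), (3, 7), (6, 10), (7, 10), (10, 13)]
b_nodes = {2, 8, 13}

children = {}
for src, dst in edges:
    children.setdefault(src, []).append(dst)

def paths_to_nearest_b(start):
    """Depth-first walk that stops each branch at the first B node it reaches."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        for nxt in children.get(path[-1], []):
            if nxt in b_nodes:
                found.append(path + [nxt])   # stop: nearest B on this branch
            else:
                stack.append(path + [nxt])   # keep walking through A nodes
    return sorted(found)

result = paths_to_nearest_b(1)
for path in result:
    print(path)
```

On this data the walk yields the three paths listed in the question: node 8 is never reached because the branch through node 2 already ended at a B node.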
Update (searching not A-type nodes):
// Find all paths from Root to all not A-nodes
MATCH (A:A {name:1}), p = (A)-[:connect*]->(B) WHERE NOT 'A' IN labels(B)
// Get all node labels for each path
WITH A, p, extract( n in nodes(p) | labels(n) ) as pathLabels
// We find the number of occurrences of A-node in each path
WITH A, p, reduce( aCount = 0, Labels in pathLabels |
CASE WHEN 'A' IN Labels THEN 1 ELSE 0 END + aCount
) as aCount
// Return only the paths in which the number of A-nodes
// is 1 less than the total number of nodes in the path.
WHERE aCount = length(p)
RETURN p
Let's say, I have a path A->B->C->D and the relationships have a property val.
Now, for every pair of nodes picked from the path, I have to check whether the relationship between them has rel.val > 0.8,
and return the path only if this is true for all pairs of nodes.
Ex:
P = A-->B-->C-->D
All nodes = [A,B,C,D]
return p if{
rel.val of (A,B) >0.8
rel.val of (A,C) >0.8
rel.val of (A,D) >0.8
rel.val of (B,C) >0.8
rel.val of (B,D) >0.8
rel.val of (C,D) >0.8
}
Here is my query, (of course the query is wrong):
MATCH p=(a{word:"quality"})-[r*1..2]->(b)
WHERE NONE (n IN nodes(p) WHERE size(filter(x IN nodes(p) WHERE n = x))> 1)
MATCH q = (a)-[r:coocr]->(b) where a in nodes(p) AND b in nodes(p) AND NOT b = a AND None(rel IN rels(q) WHERE rel.val < 0.8 )
RETURN p
In summary, you want to MATCH a path and then make sure that all pairs of nodes in your path are connected by a relationship which fulfills a certain criterion (rel.val > 0.8).
Interesting question, I think this is not really straightforward. Maybe I am overlooking something obvious?
Here is an idea how to approach the problem. You first MATCH your path, then MATCH between all nodes in the path and count the number of relationships with rel.val > 0.8. For the path to qualify, this number has to equal the number of unordered pairs of nodes, which for n nodes is n * (n - 1) / 2 (the number of possible combinations of 2, not the factorial).
The following query counts the qualifying relationships per path and compares that count to the number of pairs:
// match your path like before
MATCH p=(a:Uselabel {word:"quality"})-[r:USETYPE*1..2]->(b)
// use unwind to get the nodes from the path
UNWIND nodes(p) AS x
// do this twice to pair every node with every other node
UNWIND nodes(p) AS y
// match your relationship, in either direction
MATCH (x)-[rel:USETYPE]-(y)
// criterion for your relationship, and only between two different nodes
WHERE rel.val > 0.8 AND x <> y
// get the count of connected pairs per path
WITH p, count(DISTINCT rel) AS num_pairs
WITH p, num_pairs, size(nodes(p)) AS n
// keep the path only if every pair of nodes is connected
WHERE num_pairs = n * (n - 1) / 2
RETURN p
No factorial function is needed for this. Note that the comparison assumes there is at most one USETYPE relationship between any two nodes; with parallel relationships you would have to count distinct node pairs instead.
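The pairwise condition itself can be sketched in Python. For n nodes there are n * (n - 1) / 2 unordered pairs (6 for the four nodes in the example), and the path qualifies only if every one of them has a value above the threshold. The dictionary representation of rel.val and the sample values are assumptions made for this illustration.

```python
from itertools import combinations

def all_pairs_strong(nodes, val, threshold=0.8):
    """True if every unordered pair of nodes has a relationship above the threshold.

    `val` maps frozenset({x, y}) to the rel.val between x and y; a missing
    entry means the two nodes are not connected (representation assumed here).
    """
    pairs = list(combinations(nodes, 2))
    # Sanity check: n nodes give n * (n - 1) / 2 unordered pairs.
    assert len(pairs) == len(nodes) * (len(nodes) - 1) // 2
    return all(val.get(frozenset(pair), 0.0) > threshold for pair in pairs)

# Hypothetical values for the path A-->B-->C-->D: every pair starts at 0.9.
val = {frozenset(p): 0.9 for p in combinations("ABCD", 2)}
ok = all_pairs_strong(list("ABCD"), val)

val[frozenset("BD")] = 0.5   # one weak pair is enough to reject the path
rejected = all_pairs_strong(list("ABCD"), val)

print(ok, rejected)
```

Using a frozenset as the key makes the lookup independent of direction, which matches an undirected relationship pattern.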
I am running the following query, which is meant to compare two collections of nodes, set1 and set2. All nodes in set2 are in set1, and I would like to identify all the nodes in set1 that are NOT in set2. However, the query returns a set of nodes that includes some of the nodes in set1. I am running this query on v2.1.7. Suggestions?
Query:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with nodes(p) as set1, p
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with nodes(q) as set2,set1, p
WHERE ALL(x in set2 WHERE NOT x in set1)
with nodes(p) as pneumo
UNWIND pneumo AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
Alternative query, same result:
Query:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with nodes(p) as set1, p
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with nodes(q) as set2,set1, p
WHERE NONE(x in set2 WHERE x in set1)
with nodes(p) as pneumo
UNWIND pneumo AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
Your matches don't return just one row as you might expect but many rows,
and your comparison is done between the cross product of those many row combinations. You probably want to create a set for each of your two subtrees first with a combination of unwind + collect(distinct)
The code below will not be as fast, as cypher internally doesn't have a Set concept yet.
Try this (set1 has to be carried through the second WITH, and the filter is applied to each node of set1):
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
unwind nodes(p) as n
with collect(distinct n) as set1
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
unwind nodes(q) as m
with set1, collect(distinct m) as set2
UNWIND set1 AS pneumolist
WITH pneumolist, set2
WHERE NOT pneumolist IN set2
RETURN distinct pneumolist.FSN,pneumolist.sctid
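The cross-product effect can be reproduced with a few lines of Python. The paths below are invented; the important detail is that n2 is reachable both under n1 and under n3, which is possible when a concept has more than one parent.

```python
from itertools import product

# Hypothetical paths: each MATCH row carries the node list of one path.
paths_p = [["root", "n1"], ["root", "n1", "n2"], ["root", "n3"],
           ["root", "n3", "n2"], ["root", "n1", "n5"]]
paths_q = [["n1", "n2"], ["n1", "n5"]]

# Per-row comparison over the cross product of the two MATCHes:
# a p row is kept whenever at least one q row happens to be disjoint from it.
leaky = []
for p, q in product(paths_p, paths_q):
    if not any(x in p for x in q):       # NONE(x IN set2 WHERE x IN set1)
        for node in p:
            if node not in leaky:
                leaky.append(node)

def distinct_nodes(paths):
    """UNWIND nodes(path) followed by collect(DISTINCT ...)."""
    return list(dict.fromkeys(node for path in paths for node in path))

# Building one set per subtree first gives the intended difference.
set1 = distinct_nodes(paths_p)
set2 = distinct_nodes(paths_q)
only_in_set1 = [n for n in set1 if n not in set2]

print(leaky, only_in_set1)
```

In the per-row version n2 slips through, because the path root, n3, n2 shares no node with the path n1, n5. Once both subtrees are collapsed into sets, n2 is excluded as intended.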
The following query was successful, and addresses Michael's discussion regarding cross products (above).
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with distinct nodes(p) as set1
UNWIND set1 as x1
with collect(DISTINCT x1) as set11
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with distinct nodes(q) as set2,set11
UNWIND set2 as x2
with collect(distinct x2) as set22,set11
with REDUCE(pneumo=[],x in set11|case when x in set22 then pneumo else pneumo + [x] END) AS pneumo
return pneumo