I want to return the count of the union statement, but I'm having a little trouble with my return statement.
Given a Venn diagram, the union is the sum of the "areas" of the two circles minus the intersection between them. I'm trying to emulate this, but I ran into trouble because Booleans don't convert into ints.
I'm trying to return something like this:
COUNT(DISTINCT a.name) + COUNT(DISTINCT b.name) - (a.name == b.name)
You can use CASE WHEN a.name = b.name THEN 1 ELSE 0 END and SUM over it. However, you may still count duplicates, because the other two terms use DISTINCT and this one does not. You may need to adjust the rest of your query to avoid them; more detail about the query would help.
If we assume your original query looked like the first UNION example in the neo4j 2.1.5 cheat sheet:
MATCH (a)-[:KNOWS]->(b)
RETURN b.name
UNION
MATCH (a)-[:LOVES]->(b)
RETURN b.name
Then you can get the count of the number of distinct names in the UNION this way:
OPTIONAL MATCH (a)-[:KNOWS]->(b)
WITH COLLECT(DISTINCT b.name) AS n1
OPTIONAL MATCH (c)-[:LOVES]->(d)
WITH COLLECT(DISTINCT d.name) AS n2, n1
RETURN LENGTH(filter(x IN n2 WHERE NOT (x IN n1))) + LENGTH(n1)
I don't see a way to use an actual UNION statement to calculate the answer.
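The counting in the query above can be checked outside the database. Here is a minimal Python sketch, with made-up name lists standing in for the two collections n1 and n2 (the sample data is hypothetical). It suggests that "names in n2 that are not in n1, plus all of n1" gives the same number as the Venn diagram formula from the question.

```python
# Hypothetical sample data standing in for COLLECT(DISTINCT b.name)
# and COLLECT(DISTINCT d.name) in the query above.
n1 = ["Alice", "Bob", "Carol"]
n2 = ["Bob", "Dave"]

# Mirrors LENGTH(filter(x IN n2 WHERE NOT (x IN n1))) + LENGTH(n1).
only_in_n2 = [x for x in n2 if x not in n1]
union_count = len(only_in_n2) + len(n1)

# Venn diagram formula: |A| + |B| - |A intersect B|.
intersection = [x for x in n1 if x in n2]
venn_count = len(n1) + len(n2) - len(intersection)

print(union_count, venn_count)
```

Both counts agree only because each list is already free of duplicates, which is what the DISTINCT in the query is for.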
This may be a bit more Cypher than you planned to write, but I was recently in a similar situation and ended up putting the sets and their intersection in collections and working out the difference from those.
I am sure there is a better way, but this is what I came up with. Essentially, I found set 1 and set 2 and put each in a collection. Then I found the intersection by collecting everything that appears in both. Then I filtered set1 and set2 against the intersection, which leaves two sets that contain only the names outside the intersection. Note that the sizes of those two sets alone count only the names that are in exactly one set; for the count of the union, the size of the intersection has to be added as well.
match (things_in_set_1)
where <things in set 1 criteria>
with collect(distinct things_in_set_1.name) as set1
match (things_in_set_2)
where <things in set 2 criteria>
with collect(distinct things_in_set_2.name) as set2, set1
// names present in both sets (computed from the collections, so the set criteria still apply)
with filter( name IN set1 WHERE name IN set2 ) as intersection, set1, set2
with filter( name IN set1 WHERE not(name IN intersection) ) as set_unique_nodes1, set2, intersection
with filter( name IN set2 WHERE not(name IN intersection) ) as set_unique_nodes2, set_unique_nodes1, intersection
// names only in set 1 + names only in set 2 + names in both = size of the union
return length(set_unique_nodes1) + length(set_unique_nodes2) + length(intersection)
Related
I'm trying to make a cypher query to make nodes list which is using "multi match" as follows:
MATCH (N1:Node)-[r1:write]->(P1:Node),
(P1:Node)-[r2:access]->(P2:Node)<-[r3:create]-(P1)
WHERE r1.Time <r3.Time and r1.ID = r2.ID and r3.Time < r2.Time
return nodes(*)
I expect the output of the Cypher to be all nodes of the result, but Cypher doesn't support nodes(*).
I know that there is a way to resolve this, like this:
MATCH G1=(N1:Node)-[r1:write]->(P1:Node),
G2=(P1:Node)-[r2:access]->(P2:Node)<-[r3:create]-(P1)
WHERE r1.Time <r3.Time and r1.ID = r2.ID and r3.Time < r2.Time
return nodes(G1), nodes(G2)
But the MATCH part may change frequently, so I would like a way to get the nodes from multiple patterns without declaring a path variable for each one.
Is it possible?
Your specific example is easy to consolidate into a single path, as in:
MATCH p=(:Node)-[r1:write]->(p1:Node)-[r2:access]->(:Node)<-[r3:create]-(p1)
WHERE r1.Time < r3.Time < r2.Time AND r1.ID = r2.ID
RETURN NODES(p)
If you want to get distinct nodes, you can replace RETURN NODES(p) with UNWIND NODES(p) AS n RETURN DISTINCT n.
But, in general, you might be able to use the UNION clause to join the results of multiple disjoint statements, as in:
MATCH p=(:Node)-[r1:write]->(p1:Node)-[r2:access]->(:Node)<-[r3:create]-(p1)
WHERE r1.Time < r3.Time < r2.Time AND r1.ID = r2.ID
UNWIND NODES(p) AS n
RETURN n
UNION
MATCH p=(q0:Node)-[s1:create]->(q1:Node)-[s2:read]->(q0)
WHERE s2.Time > s1.Time AND s1.ID = s2.ID
UNWIND NODES(p) AS n
RETURN n
The UNION clause would combine the 2 results (after removing duplicates). UNION ALL can be used instead to keep all the duplicate results. The drawback of using UNION (ALL) is that variables cannot be shared between the sub-statements.
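As a rough illustration of the difference between UNION and UNION ALL, here is a small Python model. The function names and the sample rows are made up for this sketch, and Cypher itself does not promise any particular row order, so only the duplicate handling is meant to carry over.

```python
def union_all(*results):
    """Concatenate result rows and keep duplicates (models UNION ALL)."""
    return [row for result in results for row in result]


def union(*results):
    """Concatenate result rows and drop repeated rows (models UNION)."""
    # dict.fromkeys keeps the first occurrence of each row.
    return list(dict.fromkeys(union_all(*results)))


# Hypothetical node names returned by the two sub-statements.
first = ["n1", "n2", "n3"]
second = ["n3", "n4"]

print(union_all(first, second))
print(union(first, second))
```

Each input list is built on its own before the two are combined, which mirrors the point that the sub-statements cannot share variables.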
I have the following records in my neo4j database
(:A)-[:B]->(:C)-[:D]->(:E)
(:C)-[:D]->(:E)
I want to get all the C Nodes and all the relations and related Nodes. If I do the query
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
I get the first one to match. If I do
Match (i:C)-[u:D]->(y:E)
Return i,u,y
I get the second one to match.
But I want both of them in one query. How do I do that?
The easiest way is to UNION the queries and pad the unused variables with null, because all queries combined with UNION must return the same columns:
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
UNION
Match (i:C)-[u:D]->(y:E)
Return NULL as p, NULL as o,i,u,y
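The padding idea can be sketched in plain Python, with dictionaries standing in for result rows. The row values ("A1", "C2", and so on) and the helper name pad are hypothetical and only serve to show the shape of the combined result.

```python
# Columns every branch of the UNION has to return, in the same order.
COLUMNS = ("p", "o", "i", "u", "y")


def pad(row):
    """Return the row with every expected column, using None where a value is missing."""
    return {col: row.get(col) for col in COLUMNS}


# Hypothetical rows from the two MATCH queries.
first_query = [{"p": "A1", "o": "B1", "i": "C1", "u": "D1", "y": "E1"}]
second_query = [
    {"i": "C1", "u": "D1", "y": "E1"},
    {"i": "C2", "u": "D2", "y": "E2"},
]

combined = [pad(r) for r in first_query] + [pad(r) for r in second_query]
for row in combined:
    print(row)
```

In this made-up data the C1 chain shows up twice, once with p and o filled in and once padded, because the shorter pattern also matches the tail of the longer one.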
In your example though, the second match actually matches the last half of the first chain as well, so maybe you actually want something more direct like...
MATCH (c:C)
OPTIONAL MATCH (connected)
WHERE (c)-[*..20]-(connected)
RETURN c, COLLECT(connected) as connected
It looks like you're being a bit too specific in your query. If you just need, for all :C nodes, the connected nodes and relationships, then this should work:
MATCH (c:C)-[r]-(n)
RETURN c, r, n
I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know the label a priori. To do this I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?
Here is one way to use both indices. result will be a collection of matching nodes.
OPTIONAL MATCH (a:B {id:"42"})
OPTIONAL MATCH (b:A {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.
You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check whether the indexes are used, put PROFILE or EXPLAIN in front of your query.
Indexes are created and used via a node label and property, and to use them you need to form your query the same way. That means a query without a label will scan all nodes, which gives the results you got.
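To make the label-plus-property point concrete, here is a toy Python model. It is not how Neo4j stores indexes; the node list, the index dictionary, and both function names are assumptions made up for this sketch. It contrasts a scan over all nodes with one lookup per labelled index.

```python
# Hypothetical nodes, each with one label and an id property.
nodes = [
    {"label": "A", "id": "41"},
    {"label": "A", "id": "42"},
    {"label": "B", "id": "42"},
    {"label": "B", "id": "43"},
]

# One lookup table per (label, property), loosely like
# CREATE INDEX ON :A(id) and CREATE INDEX ON :B(id).
index = {}
for node in nodes:
    by_value = index.setdefault((node["label"], "id"), {})
    by_value.setdefault(node["id"], []).append(node)


def scan_all(node_id):
    """No label in the pattern: look at every node, then filter on the label."""
    return [n for n in nodes if n["id"] == node_id and n["label"] in ("A", "B")]


def lookup_per_label(node_id, labels=("A", "B")):
    """One index lookup per label, combined afterwards (like the UNION above)."""
    found = []
    for label in labels:
        found.extend(index.get((label, "id"), {}).get(node_id, []))
    return found


print(scan_all("42"))
print(lookup_per_label("42"))
```

Both functions return the same nodes; the second one only touches the entries stored under the requested id instead of every node.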
I am attempting to query an ontology of health represented as an acyclic, directed graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept and caused by a particular bacteria or any of the bacteria subtypes as follows:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in < 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept)
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to cypher coding, but I have to imagine there is a better way to write this query to be more efficient. How would Collections improve this?
Thanks
I already answered that on the google group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name) ?
I am working with an acyclic, directed graph (an ontology) that models
human health and am needing to identify certain diseases (example:
Pneumonia) that are infectious but NOT caused by certain bacteria
(staph or streptococcus). All concepts are Nodes defined as
ObjectConcepts. ObjectConcepts are connected by relationships such as
[ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
This already returns a number of paths, potentially millions. Can you check that with:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concept Genus Staph and Genus Strep) as follows. Note:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
This is then the cross product of the paths from "p", "q" and "h". For example, if each of the 3 returns 1000 paths, you're at 1 billion paths!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
You don't need the WITH or even the MATCH (b),(c),(d),(e),(f).
What connections are there between b and the other nodes? Do you have concrete ones? The first pattern is also missing a direction.
The WHERE clause can be a problem. In general, this query is probably better expressed as a UNION of simpler matches,
e.g.
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:Strep}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:Staph}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Strep}) return b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Staph}) return b.name
Another option would be to use the shortestPath() function to find one or all shortest path(s) between Pneumonia and the bacteria, restricted to certain relationship types and directions.
Perhaps you can share the dataset and the expected result.
The query was rewritten successfully using UNION as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
The query runs in under 20 seconds instead of 20 minutes, because each part of the UNION deals with a much smaller cardinality.
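One thing that may be worth checking in the results: a UNION of two "NOT" branches keeps a row if either branch accepts it, which is not necessarily the same as requiring all the NOT conditions at once, as the single-query version did. The following Python sketch uses made-up disease names and made-up causation sets (none of them come from the actual ontology) to show how the two can differ.

```python
# Hypothetical concepts subsumed by the disease concept.
diseases = {"viral pneumonia", "staph pneumonia", "strep pneumonia"}

# Hypothetical sets of concepts linked to each bacterium or its subtypes.
caused_by_strep = {"strep pneumonia"}
caused_by_staph = {"staph pneumonia"}

# Each UNION branch on its own: "not linked to this one bacterium".
not_strep = diseases - caused_by_strep
not_staph = diseases - caused_by_staph

# UNION of the branches: accepted by at least one branch.
union_of_branches = not_strep | not_staph

# All NOT conditions together: accepted by every branch.
neither = not_strep & not_staph

print(sorted(union_of_branches))
print(sorted(neither))
```

With this sample data the union contains all three diseases, while only "viral pneumonia" satisfies both conditions. Whether the real data behaves this way depends on the graph.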
I wish to use the results of a UNION (n) as a filter for a subsequent match.
MATCH (n:Thing)-<<Insert valid match filters here>>
RETURN n
UNION
MATCH (n:Thing)-<<Insert a different set of match filters here>>
RETURN n;
n feeds into:
MATCH (n)-[:RELTYPE1]->(a:Artifact)
RETURN a;
I would expect to use a WITH statement, but I've struggled to figure out how to structure the statement.
MATCH (n:Thing)-<<Insert valid match filters here>>
RETURN n
UNION
MATCH (n:Thing)-<<Insert a different set of match filters here>>
WITH n
MATCH (n)-[:RELTYPE1]->(a:Artifact)
RETURN a;
This was my original attempt, but the WITH is interpreted as part of the UNION's second subquery (which makes sense).
I can see a few inelegant ways to make this work, but what is the proper approach?
I have been looking at your union example and it makes sense to me but I cannot see how I could make it work. But I am certainly not the guy with all of the answers. Is there a reason you couldn't do something like this though...
MATCH (n:Thing)
WHERE n.name = 'A'
WITH collect(n) as n1
MATCH (n:Thing)
WHERE n.name = 'B'
WITH n1 + collect(n) AS both
UNWIND both AS n
MATCH (n)-[:RELTYPE1]->(a:Artifact)
RETURN a;