Perform MATCH on collection / Break apart collection - neo4j

I am trying to mimc the functionality of the neo4j browser to display my graph in my front end. The neo4j browser issues two calls for every query - the first call performs the query that the user types into the query box and the second call uses find the relationships between every node returned in the first user-entered query.
{
"statements":[{
"statement":"START a = node(1,2,3,4), b = node(1,2,3,4)
MATCH a -[r]-> b RETURN r;",
"resultDataContents":["row","graph"],
"includeStats":true}]
}
In my application I would like to be more efficient so I would like to be able to get all of my nodes and relationships in a single query. The query that I have at present is:
START person = node({personId})
MATCH person-[:RELATIONSHIP*]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts
This query runs well but it doesn't give me the "d" and "details" nodes which were linked to the original "person".
I have tried to join the "p" and "person" results in a collection:
collect(p) + collect(person) AS people
But this does not allow me to perform a MATCH on the resulting collection. As far as I can figure out there is no way of breaking apart a collection.
The only option I see at the moment is to split the query into two; return the "collect(p) + collect(person) AS people" collection and then use the node values in a second query. Is there a more efficient way of performing this query?

If you use the quantifier *0.. RELATIONSHIP is also match at a depth of 0 making person the same as p in this case. The * without specified limits defaults to 1..infinity
START person = node({personId})
MATCH person-[:RELATIONSHIP*0..]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts

Related

Remove automorphisms of a cypher query output

When doing a Cypher query to retrieve a specific subgraph with automorphisms, let's say
MATCH (a)-[:X]-(b)-[:X]-(c),
RETURN a, b, c
It seems that the default behaviour is to return every retrieved subgraph and all their automorphisms.
In that exemple, if (u)-[:X]-(v)-[:X]-(w) is a graph matching the pattern, the output will be u,v,w but also w,v,u, which consist in the same graph.
Is there a way to retrieve each subgraph only once ?
EDIT: It would be great if Cypher have a feature to do that in the search, using some kind of symmetry breaking condition as it would reduce the computing time. If that is not the case, how would you post-process to find the desired output ?
In the query you are making, (a)-[r:X]-(b) and (a)-[t:X]-(c) refer to a similar pattern. Since (b) and (c) can be interchanged. What is the need to repeat matching twice? MATCH (a)-[r:X]-(b) RETURN a, r, b returns all the subgraphs you are looking for.
EDIT
You can do something as follows to find the nodes, which are having two relations of type X.
MATCH (a)-[r:X]-(b) WHERE size((a)-[:X]-()) = 2 RETURN a, r, b
For these kind of mirrored patterns, we can add a restriction on the internal graph ids so only one of the two paths is kept:
MATCH (a)-[:X]-(b)-[:X]-(c)
WHERE id(a) < id(c)
RETURN a, b, c
This will also prevent the case where a = c.

Cypher query that will return only 1 relation of each type between two nodes

How can I craft a query that will return only one relation of a certain type between two nodes?
For example:
MATCH (a)-[r:InteractsWith*..5]->(b) RETURN a,r,b
Because (a) may have interacted with (b) many times, the result will contain many relations between the two. However, the relations are not identical. They have different properties because they occurred at different points in time.
But what if you're only interested in the fact that they have interacted at least once?
Instead of the result as it appears currently I'd like to receive a result that has either:
Only one random relation from the set of relations between (a) and (b)
Only those relations that fit to some criteria (e.g. "newest" or one of each type, ...)
One approach I have thought of is creating new relations of the type "hasEverInteractedWith". But there should be another way, right?
Use shortestPath() to get the quickest single result.
MATCH (a)-[:InteractsWith*..5]->(b)
WITH DISTINCT a, b
MATCH p = shortestPath((a)-[:InteractsWith*..5]->(b))
RETURN a, b, RELATIONSHIPS(p) AS r
If you want to get a specific one, you'll have to get all of the r and then filter them down, which will be slower (but provide more context).
MATCH (a)-[r:InteractsWith*..5]->(b)
WITH a, b, COLLECT(r) AS rs
RETURN a, b, REDUCE(s = HEAD(rs), r IN TAIL(rs)|CASE WHEN s.date > r.date THEN s ELSE r END)

Cypher query doesn't return all the expected nodes

I have this graph:
A<-B->C
B is the root of a tiny tree. There is exactly one relation between A and B, and one between B and C.
When I run the following, one node is returned. Why does this Cypher query not return the A and C nodes?
MATCH(a {name:"A"})<-[]-(rewt)-[]->(c) RETURN c
It would seem to be that the first half of that query would find the root, and the second half would find both child nodes.
Until a few minutes ago, I would have thought it logically identical to the following query which works. What's the difference?
MATCH (a {name:"A"})<-[]-(rewt)
MATCH (rewt)-[]->(c)
RETURN c
EDIT for cybersam
I have abstracted my database so we could discuss my specific issue. Now, we still have a tiny tree, but there are 4 nodes that are children of the root.(Sorry this is different, but I'm developing and don't want to change my environment too much.)
This query returns all 4:
match(a)<-[]-(b:ROOT)-[]->(c) return c
One of them has a name of "dddd"...
match(a {name"dddd"})<-[]-(b:ROOT)-[]->(c) return c
This query only returns three of them. "dddd" is not included. omg.
To answer cybersam's specific question, this query:
MATCH (a {name:"dddd"})<--(rewt:CODE_ROOT)
MATCH (rewt)-->(c)
RETURN a = c;
Returns four rows. The values are true, false, false, false
[UPDATED]
There is a difference between your 2 queries. A MATCH clause will filter out all duplicate relationships.
Therefore, your first query would filter out all matches where the left-side relationship is the same as the right-side relationship:
MATCH(a {name:"A"})<--(rewt)-->(c)
RETURN c;
Your second query would allow the 2 relationships to be the same, since the relationships are found by 2 separate MATCH clauses:
MATCH (a {name:"A"})<--(rewt)
MATCH (rewt)-->(c)
RETURN c;
If I am right, then the following query should return N rows (where N is the number of outgoing relationships from rewt) and only one value should be true:
MATCH (a {name:"A"})<--(rewt)
MATCH (rewt)-->(c)
RETURN a = c;
Both work just fine for me. I've tried on 2.3.0 Community.
Do you mind posting your CREATE command ?
In each MATCH clause, each relationship will be matched only once. See http://neo4j.com/docs/stable/cypherdoc-uniqueness.html for reference.
See this related question as well: What does a comma in a Cypher query do?

Neo4j cypher query efficiency and syntax

I am attempting to query an ontology of health represented as an acyclic, directed graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept and caused by a particular bacteria or any of the bacteria subtypes as follows:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in < 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept),
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to cypher coding, but I have to imagine there is a better way to write this query to be more efficient. How would Collections improve this?
Thanks
I already answered that on the google group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name) ?
I am working with an acyclic, directed graph (an ontology) that models
human health and am needing to identify certain diseases (example:
Pneumonia) that are infectious but NOT caused by certain bacteria
(staph or streptococcus). All concepts are Nodes defined as
ObjectConcepts. ObjectConcepts are connected by relationships such as
[ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
this already returns a number of paths, potentially millions, can you check that with
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concept Genus Staph and Genus Strep) as follows. Note:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
this is then the cross product of the paths from "p", "q" and "h", e.g. if all 3 of them return 1000 paths, you're at 1bn paths !!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
you don't need the WITH or even the MATCH (b),(c),(d),(e),(f)
what connections are there between b and the other nodes ? do you have concrete ones? for the first there is also missing one direction.
the where clause can be a problem, in general you want to show that perhaps this query is better reproduced by a UNION of simpler matches
e.g
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:Strep}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:Staph}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Strep}) return b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Staph}) return b.name
another option would be to utilize the shortestPath() function to find one or all shortest path(s) between Pneumonia and the bacteria with certain rel-types and direction.
Perhaps you can share the dataset and the expected result.
The query was successfully accomplished using UNION functions as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
The query runs in sub 20 seconds vs. 20 minutes by reducing the cardinality of the objects being queried.

Querying multiple indexes not working if one condition fails in Neo4j

I am trying to search for a key word on all the indexes. I have in my graph database.
Below is the query:
start n=node:Users(Name="Hello"),
m=node:Location(LocationName="Hello")
return n,m
I am getting the nodes and if keyword "Hello" is present in both the indexes (Users and Location), and I do not get any results if keyword Hello is not present in any one of index.
Could you please let me know how to modify this cypher query so that I get results if "Hello" is present in any of the index keys (Name or LocationName).
In 2.0 you can use UNION and have two separate queries like so:
start n=node:Users(Name="Hello")
return n
UNION
start n=node:Location(LocationName="Hello")
return n;
The problem with the way you have the query written is the way it calculates a cartesian product of pairs between n and m, so if n or m aren't found, no results are found. If one n is found, and two ms are found, then you get 2 results (with a repeating n). Similar to how the FROM clause works in SQL. If you have an empty table called empty, and you do select * from x, empty; then you'll get 0 results, unless you do an outer join of some sort.
Unfortunately, it's somewhat difficult to do this in 1.9. I've tried many iterations of things like WITH collect(n) as n, etc., but it boils down to the cartesian product thing at some point, no matter what.

Resources