Excluding "symmetric" results in Neo4j - neo4j

I want to query a Neo4j graph for a structure that includes two interchangeable nodes, but I don't want two unique responses for each of the "symmetric" responses.
How do I express in Cypher that two nodes are interchangeable?
An example:
I want to look for the following structure in the graph with the following query:
MATCH (c:Customer)-[]->(p:Purchase)
MATCH (c:Customer)-[]->(q:Purchase)
MATCH (p)-[]->(m:Company)
MATCH (q)-[]->(m:Company)
RETURN DISTINCT c, p, q, m
The default behavior would be for Neo4j to return the following two graphs:
(i.e. The assignment of p and q to Purchase1 and Purchase2 are reversed)
How do I express that the elements p and q in my query are interchangeable, and I only need one of the above responses?

To prevent those kinds of results, you would typically have an inequality based on the node ids:
WHERE id(p) < id(q)
That said, you may be able to form this query a little cleaner like this (provided you want all purchases between a customer and a company with at least two purchases made from that customer to the company):
MATCH (c:Customer)-->(p:Purchase)-->(m:Company)
WITH c, m, collect(p) as purchases, count(p) as purchaseCount
WHERE purchaseCount >= 2
RETURN c, m, purchases

Related

Separating matching nodes in a query result

I defined the directed relation Know on person nodes. For example, if Sara knows Alice then Sara-> Alice. I wrote this Cypher query to find all the people who know both the right and left side of the directed relation.
match ((n:Person)-[:Know]-> (m:Person)),(p:Person)
where EXISTS ((m)<-[:Know]-(p)-[:Know]->(n))
RETURN m,n,p
I need to get subgraphs with 3 nodes in the query's result but the result I get is a graph with many nodes. Is there any method to change the query to generate subgraphs with just 3 nodes (for example, a subgraph of Alex-> Sara, Alex-> Alice, Sara-> Alice and if Sara has the same condition on two other people it is shown in another subgraph). This requires repeating some nodes in the output.
MATCH clauses are more flexible than that. Try this:
MATCH (n:Person)-[:Know]->(m:Person)<-[:Know]-(p:Person)-[:Know]->(n)
WHERE NOT EXISTS (()-[:Know]->(p))
AND NOT EXISTS {
WITH m, n, p
MATCH (q:Person)-[:Know]->(m)
WHERE q <> n
AND q <> p
}
AND NOT EXISTS {
WITH m, n, p
MATCH (q:Person)-[:Know]->(n)
WHERE q <> p
}
RETURN m, n, p
You might have to use a unique ID property, and I'm not sure if the WITH clause will work here as I've gotten it; but with subqueries, you are generally able to import variables from above using WITH.

Optimize Cypher query on multiple unique labels

In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?
To illustrate, I want all nodes that are labeled with one of the labels in label_list:
label_list = ['label_1', 'label_2', ...]
Which would be faster in an application (this is pseudo-code):
for label in label_list:
run.query("MATCH (n:{label}) return n")
or
run.query("MATCH (n) WHERE (n:label_1 or n:label_2 or ...)")
EDIT:
Actually, I just realized that the best option might be to run multiple NodeByLabelScan in a single query, with something looking like this:
MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b ..] as foo
RETURN foo
Could someone speak to it?
Yes, it would be better to run multiple NodeByLabelScans in a single query.
For example:
OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n
Notes on the query:
It uses OPTIONAL MATCH so that the query can proceed even if a wanted label is not found in the DB.
It uses multiple aggregation steps to avoid cartesian products (also see this).
And it uses UNWIND so that it can useDISTINCT to return distinct nodes (since a node can have multiple labels).

How to express constraints on consecutive relations of the Neo4j shortestpath query algoritm?

I would like to query a shortestpath in Neo4j but expressing conditions between consecutive relations.
Suppose I have nodes labelled type and relations labelled rel. Such relations have the attributes start_time, end_time, exec_time (for the moment they are of type string, but if you prefer you can consider them as integer). I would like to find the shortest path between two nodes b1 and b2 subject to the constraint that:
the relation outgoing from b1 should have the attribute starting_time bigger than a given value (let me call it th;
if there are more than one of such relations, starting_time of the next relation should be bigger than ending_time of the previous.
Between two nodes I can have multiple realations.
I started from this query limiting the relations with starting_time bigger than th.
MATCH (b1:type{id:"0247"}), (b2:type{id:"0222"}),
p=shortestPath((b1)−[t:rel*]−>(b2))
WHERE ALL (r in relationships(p) WHERE r.starting_time>"14:56:00" )
RETURN p;
I was trying something like this:
MATCH (b1:type{id:"0247"}), (b2:type{id:"0222"}),
p=shortestPath((b1)−[t:rel*]−>(b2))
WITH "14:56:00" as th
WHERE ALL (r in relationships(p) WHERE r.starting_tme>th WITH r.end_time as th )
RETURN p;
but it does not work and I am not sure the shortestPath algorithm in Neo4j accesses the relations of the shortest path sequentially.
How can I express such a condition in Neo4j cypher query language?
If it is not possible is there a suitable way to model such a time condition in a graph database (I mean how can I change the DB?)
This query may do what you want:
MATCH p = shortestPath((b1:type{id:"0247"})−[t:rel*]−>(b2:type{id:"0222"}))
WHERE REDUCE(s = {ok: true, last: "14:56:00"}, r IN RELATIONSHIPS(p) |
{ok: s.ok AND r.starting_time > s.last, last: r.end_time}
).ok
RETURN p;
This query uses REDUCE to iteratively test the relationships while updating the current state s at every step.

Aggregating relationships via cypher

I am fairly certain I have seen it somewhere but all keywords I have tried came up empty.
I have a graph that connects persons and companies via documents:
(:Person/:Company)-[ ]-(:Document)-[ ]-(:Person/:Company)
What I would like to do is return a graph that shows the connection between persons and companies directly with the relationship strength based on the number of connections between them.
I get the data with
MATCH (p)-[]-(d:Document)-[]-(c)
WHERE p:Person or p:Company and c:Person or c:Company
WITH p,c, count(d) as rel
RETURN p,rel,c
However in the Neo4J-Browser, the nodes appear without any relationships. Is there a way to achieve this or do I have to create some kind of meta relationship?
If you install APOC Procedures, you'll be able to create virtual relationships which are used for visualization but aren't actually stored in the db.
MATCH (p)-[]-(d:Document)-[]-(c)
WHERE (p:Person or p:Company AND c:Person or c:Company)
AND id(p) < id(c)
WITH p,c, count(d) as relStrength
CALL apoc.create.vRelationship(p,'REL',{strength:relStrength}, c) YIELD rel
RETURN p,rel,c
I also added a predicate on ids of p and c so you don't repeat the same two nodes with p and c switched.

Neo4j cypher query efficiency and syntax

I am attempting to query an ontology of health represented as an acyclic, directed graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept and caused by a particular bacteria or any of the bacteria subtypes as follows:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in < 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept),
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to cypher coding, but I have to imagine there is a better way to write this query to be more efficient. How would Collections improve this?
Thanks
I already answered that on the google group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name) ?
I am working with an acyclic, directed graph (an ontology) that models
human health and am needing to identify certain diseases (example:
Pneumonia) that are infectious but NOT caused by certain bacteria
(staph or streptococcus). All concepts are Nodes defined as
ObjectConcepts. ObjectConcepts are connected by relationships such as
[ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
this already returns a number of paths, potentially millions, can you check that with
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concept Genus Staph and Genus Strep) as follows. Note:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
this is then the cross product of the paths from "p", "q" and "h", e.g. if all 3 of them return 1000 paths, you're at 1bn paths !!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
you don't need the WITH or even the MATCH (b),(c),(d),(e),(f)
what connections are there between b and the other nodes ? do you have concrete ones? for the first there is also missing one direction.
the where clause can be a problem, in general you want to show that perhaps this query is better reproduced by a UNION of simpler matches
e.g
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:Strep}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:Staph}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Strep}) return b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Staph}) return b.name
another option would be to utilize the shortestPath() function to find one or all shortest path(s) between Pneumonia and the bacteria with certain rel-types and direction.
Perhaps you can share the dataset and the expected result.
The query was successfully accomplished using UNION functions as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
The query runs in sub 20 seconds vs. 20 minutes by reducing the cardinality of the objects being queried.

Resources