Compare a sum with the sum under a different condition - post-UNION processing?


Hi, I have a relationship (Artist)-[:Collaborated]->(Writer) and would like to find the artists who mainly write their own songs. That is, the weight of the edge between an artist and the writer with the same name should be larger than the sum of all the other weights.
I managed to do this:
MATCH (n:Artist)-[r:Collaborated]-(m:Writer)
WITH n, m, sum(r.weight) as wrote
WHERE n.name = toLower(m.name)
RETURN n.name as Node, wrote ORDER BY wrote descending;
But I am not sure how to incorporate the second condition. Do I have to use post-UNION processing? Any help, please?
To combine the two WHERE conditions, I tried something like the following, comparing the first sum to the second, but it doesn't work:
MATCH (o:Artist)-[q:Collaborated]-(p:Writer)
WITH o, p, sum(q.weight) as wrote1
WHERE o.name <> toLower(p.name)
MATCH (n:Artist)-[r:Collaborated]-(m:Writer)
WITH n, m, sum(r.weight) as wrote2
WHERE n.name = toLower(m.name) and wrote2>wrote1
RETURN n.name as Node, wrote2;
This is an example of what my graph looks like:
I would like to know whether the weight between Eminem (Artist) and Eminem (Writer) is bigger than the sum of all the other weights.

Firstly, your model is a little odd: you have two Eminem nodes, one labelled Artist and the other labelled Writer.
From my point of view, you should have a single Eminem node with both labels.
To answer your question, I think this query can help:
MATCH (o:Artist)-[r:Collaborated]->(p:Writer)
WITH o, CASE WHEN o.name = p.name THEN r.weight ELSE -1*r.weight END AS score
RETURN o, sum(score) AS score
If the score is greater than 0, then you know that the weight between Eminem and Eminem is bigger than the sum of all the other weights.
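To sanity-check the signed-sum idea outside Neo4j, here is a minimal plain-Python model of what the query computes; it is not Neo4j driver code. The edge-list input, the function name `self_written_artists`, and the case-insensitive name comparison (borrowed from the question's `toLower`) are assumptions for illustration.

```python
from collections import defaultdict


def self_written_artists(edges):
    """edges: iterable of (artist, writer, weight) tuples, a hypothetical
    in-memory stand-in for the (Artist)-[:Collaborated]->(Writer) graph."""
    score = defaultdict(int)
    for artist, writer, weight in edges:
        if artist.lower() == writer.lower():
            # Same name (case-insensitive, an assumption): add the weight
            score[artist] += weight
        else:
            # Any other writer: subtract the weight
            score[artist] -= weight
    # A positive total means the self-edge outweighs all other edges combined
    return {artist for artist, s in score.items() if s > 0}


edges = [("eminem", "Eminem", 10), ("eminem", "Dr. Dre", 3), ("eminem", "Jay", 4),
         ("drake", "Drake", 2), ("drake", "40", 5)]
print(self_written_artists(edges))  # {'eminem'}
```

A score of exactly 0 (self-weight equal to the sum of the others) is excluded, which matches the "bigger than" condition in the question.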

Related

Separating matching nodes in a query result

I defined a directed relationship Know between Person nodes. For example, if Sara knows Alice, then Sara->Alice. I wrote this Cypher query to find all the people who know both ends of a Know relationship.
match ((n:Person)-[:Know]-> (m:Person)),(p:Person)
where EXISTS ((m)<-[:Know]-(p)-[:Know]->(n))
RETURN m,n,p
I need the query result to consist of 3-node subgraphs, but what I get is one graph with many nodes. Is there a way to change the query so that it produces subgraphs of just 3 nodes (for example, one subgraph of Alex->Sara, Alex->Alice, Sara->Alice; and if Sara satisfies the same condition with two other people, that is shown in a separate subgraph)? This requires repeating some nodes in the output.

MATCH clauses are more flexible than that. Try this:
MATCH (n:Person)-[:Know]->(m:Person)<-[:Know]-(p:Person)-[:Know]->(n)
WHERE NOT EXISTS { MATCH ()-[:Know]->(p) }
AND NOT EXISTS {
MATCH (q:Person)-[:Know]->(m)
WHERE q <> n
AND q <> p
}
AND NOT EXISTS {
MATCH (q:Person)-[:Know]->(n)
WHERE q <> p
}
RETURN m, n, p
You might have to use a unique ID property. Note that m, n and p from the outer MATCH are already in scope inside each EXISTS { } subquery, so they do not need to be imported with WITH (importing with WITH is the syntax for CALL { } subqueries).
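The key point for the question is that each match of the pattern is its own row, so the same person can appear in several 3-node results. A minimal plain-Python model of the base pattern (without the extra NOT EXISTS filters) illustrates this; the set-of-pairs input and the name `know_triangles` are hypothetical.

```python
def know_triangles(knows):
    """Return one (p, n, m) row per match of
    (n)-[:Know]->(m)<-[:Know]-(p)-[:Know]->(n).

    knows: set of (src, dst) pairs, assumed to contain no self-loops
    and at most one Know relationship per ordered pair.
    """
    rows = []
    for n, m in sorted(knows):
        for p, target in sorted(knows):
            # p must also know m, and p must know n
            if target == m and p not in (n, m) and (p, n) in knows:
                rows.append((p, n, m))
    return rows


knows = {("Alex", "Sara"), ("Alex", "Alice"), ("Sara", "Alice"),
         ("Sara", "Bob"), ("Sara", "Carl"), ("Bob", "Carl")}
for row in know_triangles(knows):
    print(row)  # Sara shows up in both rows, as in the desired output
```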

Nodes with relationship to multiple nodes

I want to get the Persons who know everyone in a group of persons that know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but for the second part it does a full NodeByLabelScan before applying the WHERE clause, which is extremely slow: in a bigger database it takes 8-9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2 ms; however, it returns all persons who know any person in group b, instead of those who know everyone.
So my question is: is there an efficient, fast query to get what I want?

We have a knowledge base article for this kind of query that shows a few approaches.
One of these is to match the :Persons who know members of the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count equals the size of the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
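The counting idea is a form of relational division, and it can be checked with a small plain-Python model. The function name `knows_everyone` and the set-based inputs are hypothetical; using a set of pairs enforces the "no duplicate :knows relationships" assumption from the answer.

```python
from collections import Counter


def knows_everyone(group, knows):
    """Persons a that know every member of group.

    group: set of person ids (the b's matched via the two places).
    knows: set of (a, b) pairs for (a)-[:knows]->(b); being a set, it holds
    at most one relationship per pair, matching the answer's assumption.
    """
    total = len(group)
    # How many distinct group members each person knows
    known_count = Counter(a for a, b in knows if b in group)
    return {a for a, c in known_count.items() if c == total}


group = {"b1", "b2"}
knows = {("x", "b1"), ("x", "b2"), ("y", "b1"), ("z", "b2"), ("z", "q")}
print(knows_everyone(group, knows))  # {'x'}
```

With an empty group this returns nothing, just as the Cypher version does (UNWIND of an empty list produces no rows), whereas the original ALL() query would return every Person.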

Here is a simpler Cypher query that also compares counts, the same basic idea used by @InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

How do I find pairs of nodes that are not connected in n hops?

As the title says, I have a graph of nodes that are interconnected by a relationship N. I now want to find all pairs of nodes that are more than 20 hops away from each other.
A naive approach with the following cypher query is far too slow:
MATCH (n:CELL)
WITH n
MATCH (k:CELL)
WHERE NOT (n)-[:N*1..20]->(k)
RETURN n, k
I could create a second relationship K with a "distance" property and then match that, but doing so for every pair of nodes doesn't scale well (I've got 18k nodes, so I would need more than 160 million new relationships).
Is there any other way to solve this in neo4j?

You could try shortestPath(), which is more efficient.
MATCH (n:CELL), (k:CELL)
WHERE n <> k
OPTIONAL MATCH p = shortestPath((n)-[:N*..20]->(k))
WITH n, k, p
WHERE p IS NULL
RETURN n, k
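For comparison, here is a plain-Python sketch of a different technique: one depth-limited breadth-first search per node, collecting every node the search does not reach. This is not something Neo4j runs for you; the adjacency-dict input and the name `far_pairs` are hypothetical. It costs roughly O(V·(V+E)) time, and with 18k nodes the output alone can reach about 324 million ordered pairs, so in practice you would stream or count the pairs rather than store them.

```python
from collections import deque


def far_pairs(adj, max_hops=20):
    """Return (n, k) pairs, n != k, where k is NOT reachable from n
    within 1..max_hops directed hops.

    adj: dict mapping every node to a list of its :N successors
    (a hypothetical in-memory copy of the graph; every node must be a key).
    """
    nodes = list(adj)
    pairs = []
    for n in nodes:
        # Depth-limited breadth-first search from n
        dist = {n: 0}
        queue = deque([n])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue  # do not expand past the hop limit
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Anything the search did not reach is more than max_hops away (or unreachable)
        pairs.extend((n, k) for k in nodes if k != n and k not in dist)
    return pairs


chain = {0: [1], 1: [2], 2: [3], 3: []}
print(far_pairs(chain, max_hops=2))
```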

What about something like this:
MATCH p = (n:CELL)-[:N*..20]->(k:CELL)
WITH n, k, min(length(p)) AS minDistance
WHERE minDistance > 20/2 AND n <> k
RETURN DISTINCT n, k, minDistance

Neo4j cypher query efficiency and syntax

I am attempting to query a health ontology represented as a directed acyclic graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept and caused by a particular bacterium or any of its subtypes:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in < 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept)
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to Cypher, but I imagine there is a more efficient way to write this query. How would collections improve this?
Thanks

I already answered that on the Google Group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name)?
I am working with an acyclic, directed graph (an ontology) that models human health and need to identify certain diseases (for example, Pneumonia) that are infectious but NOT caused by certain bacteria (staph or streptococcus). All concepts are nodes defined as ObjectConcepts. ObjectConcepts are connected by relationships such as [ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
This already returns a number of paths, potentially millions; can you check that with:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concepts Genus Staph and Genus Strep themselves), as follows:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
This is then the cross product of the paths from "p", "q" and "h": e.g., if each of the three returns 1,000 paths, you're at 1 billion paths!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
You don't need the WITH, or even the MATCH (b),(c),(d),(e),(f).
What connections are there between b and the other nodes? Do you have concrete ones? For the first pattern, a direction is also missing.
The WHERE clause can be a problem; in general, this query is perhaps better expressed as a UNION of simpler matches,
e.g.:
MATCH (a:ObjectConcept{name:'Pneumonia'}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:'Strep'}) RETURN b.name
UNION
MATCH (a:ObjectConcept{name:'Pneumonia'}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:'Staph'}) RETURN b.name
UNION
MATCH (a:ObjectConcept{name:'Pneumonia'}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:'Strep'}) RETURN b.name
UNION
MATCH (a:ObjectConcept{name:'Pneumonia'}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:'Staph'}) RETURN b.name
Another option would be to use the shortestPath() (or allShortestPaths()) function to find one or all shortest paths between Pneumonia and the bacteria, restricted to certain relationship types and directions.
Perhaps you can share the dataset and the expected result.
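The cardinality argument in this answer can be illustrated with a toy plain-Python model: unconnected patterns in a single MATCH multiply their rows, while separate UNION branches only add them (and UNION removes duplicates). The function names and sample data are hypothetical, and the model says nothing about how Neo4j actually plans the query.

```python
from itertools import product


def single_match(p_rows, q_rows, h_rows):
    # One MATCH with three unconnected, comma-separated patterns:
    # every combination of rows becomes a result row (Cartesian product).
    return list(product(p_rows, q_rows, h_rows))


def union_of_branches(*branches):
    # UNION (without ALL) concatenates the branch results and drops duplicates.
    # Rows are kept in first-seen order here; Neo4j does not promise an order.
    seen, out = set(), []
    for branch in branches:
        for row in branch:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out


p = [f"p{i}" for i in range(10)]
q = [f"q{i}" for i in range(10)]
h = [f"h{i}" for i in range(10)]
print(len(single_match(p, q, h)))  # 1000
print(union_of_branches(["b1", "b2"], ["b2", "b3"]))  # ['b1', 'b2', 'b3']
```

Scaling each pattern to 1,000 paths turns the 10 x 10 x 10 product into 1,000 x 1,000 x 1,000, i.e. the billion rows mentioned in the answer, while the UNION approach grows only with the sum of the branch sizes.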

The query was successfully accomplished using UNION as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
The query now runs in under 20 seconds instead of 20 minutes, because it reduces the cardinality of the objects being queried.

Create relationship between nodes having same property value in common, using one Cypher query

Using Neo4j 1.9.2 and the Cypher query language, I would like to create relationships between nodes that have a specific property value in common.
I have a set of nodes G with a property H, and there are currently no relationships between G nodes.
In a single Cypher statement, is it possible to group G nodes by their H value and create an HR relationship between the nodes belonging to the same group? Each group has between 2 and 10 members, and I have more than 15k such groups (15k distinct H values) for about 50k G nodes.
I've tried hard to write such a query without finding the correct syntax. Below is a small sample dataset:
create
(G1 {name:'G1', H:'1'}),
(G2 {name:'G2', H:'1'}),
(G3 {name:'G3', H:'1'}),
(G4 {name:'G4', H:'2'}),
(G5 {name:'G5', H:'2'}),
(G6 {name:'G6', H:'2'}),
(G7 {name:'G7', H:'2'})
return * ;
In the end, I'd like relationships like these:
G1-[:HR]-G2-[:HR]-G3-[:HR]-G1
And:
G4-[:HR]-G5-[:HR]-G6-[:HR]-G7-[:HR]-G4
In another case, I may want to update relationships in bulk by comparing some of the nodes' properties. Imagine nodes of type N and nodes of type M, with N nodes related to M by a relationship named :IS_LOCATED_ON. The order of the location can be stored as a property of the N nodes (N.relativePosition, a Long from 1 to MAX_POSITION), but later we may need to update the graph model as follows: link the N nodes to each other with a new :PRECEDES relationship, so that we can find the next N node in a given set more easily and quickly.
I'd expect such a language to allow updating large sets of nodes/relationships by manipulating their properties.
Is this not possible?
If not, is it planned, or might it be planned?
Any help would be greatly appreciated.

Since there's nothing in the data you supplied to derive a rank from, I've played with collections to get one as follows:
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, LENGTH(FILTER(x IN others : x.name < n.name)) as rank
RETURN n.name, n.H, rank ORDER BY n.H, n.name;
Building on that, you can then start determining the relationships:
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, LENGTH(FILTER(x IN others : x.name < n.name)) as rank
WITH n, others, rank, COALESCE(
HEAD(FILTER(x IN others : x.name > n.name)),
HEAD(others)
) as next
RETURN n.name, n.H, rank, next ORDER BY n.H, n.name;
Finally (and slightly more condensed):
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, COALESCE(
HEAD(FILTER(x IN others : x.name > n.name)),
HEAD(others)
) as next
CREATE n-[:HR]->next
RETURN n, next;
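The COALESCE(HEAD(FILTER(...)), HEAD(others)) trick links each node to the next name in its group and wraps the last one back to the first. A minimal plain-Python model of that ring-building logic (hypothetical function name `ring_edges`, input as (name, H) pairs) shows the edges it should produce for the sample data:

```python
from collections import defaultdict


def ring_edges(nodes):
    """nodes: iterable of (name, h) pairs.
    Returns (from, to) HR edges that link each H group into a ring by name order."""
    groups = defaultdict(list)
    for name, h in nodes:
        groups[h].append(name)
    edges = []
    for h in sorted(groups):
        members = sorted(groups[h])
        for name in members:
            # Like COALESCE(HEAD(FILTER(x > n)), HEAD(others)):
            # the next larger name, otherwise wrap around to the first member
            later = [x for x in members if x > name]
            edges.append((name, later[0] if later else members[0]))
    return edges


sample = [("G1", "1"), ("G2", "1"), ("G3", "1"),
          ("G4", "2"), ("G5", "2"), ("G6", "2"), ("G7", "2")]
print(ring_edges(sample))
```

As in the Cypher, names are compared as strings (so "G10" sorts before "G2"), and a group with a single member would get a self-loop; the question's groups have 2 to 10 members, so that case should not arise.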

You can just do it like this; perhaps indicate a direction in your relationships:
CREATE
(G1 { name:'G1', H:'1' }),
(G2 { name:'G2', H:'1' }),
(G3 { name:'G3', H:'1' }),
(G4 { name:'G4', H:'2' }),
(G5 { name:'G5', H:'2' }),
(G6 { name:'G6', H:'2' }),
(G7 { name:'G7', H:'2' }),
G1-[:HR]->G2-[:HR]->G3-[:HR]->G1,
G4-[:HR]->G5-[:HR]->G6-[:HR]->G7-[:HR]->G4
See http://console.neo4j.org/?id=ujns0x for an example.
