How do I optimize a multi-node Neo4J query? - neo4j

I am new to Cypher querying. The query below has been running for the past few hours. I am trying to infer relationships between 'T' nodes through the 'D' and 'R' nodes in between, and I want to know whether there is a better way to write it.
MATCH (t:T)-[r1:T_OF]->(d:D)<-[r2:R_OF]-(m:R)-[r3:R_OF]->(e:D)<-[r4:T_OF]-(u:T)
WHERE t.name <> u.name AND d.name <> e.name
RETURN t.name, u.name, count(*) as degree
ORDER BY degree desc
Here are the counts for each node label and relationship type:
Nodes
T: 4,657
D: 2,458,733
R: 4,822
Relationships
T_OF: 4,915,004
R_OF: 284,548

You could add a clause to avoid computing both (t, u) and (u, t), which would cut the size of the cartesian product in half:
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or maybe
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE t.name < u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
which won't cost an extra read for the id.
It probably doesn't make a difference, but you can also avoid binding variables that you don't use (r1, r2, r3, r4, m).
It's hard to optimize a query without the matching data and without being able to PROFILE it. However, you have many more T_OF relationships than R_OF relationships, so changing the traversal order to start from the R nodes might prune branches faster:
MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or even
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
You could also try to reduce the size of the first cartesian product with the same id() trick (or by ordering the names), but then you need to reassemble the pairs at the end, because each (t, u) pair can now show up in either order:
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE id(d) < id(e)
AND d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE t.name <> u.name
WITH t.name AS name1, u.name AS name2, count(*) AS degree
WITH CASE WHEN name1 < name2 THEN name1 ELSE name2 END AS name1,
CASE WHEN name1 < name2 THEN name2 ELSE name1 END AS name2,
degree
RETURN name1, name2, sum(degree) AS degree
ORDER BY degree DESC
All of these variants need to be profiled to see whether they lead anywhere. Run PROFILE on a smaller dataset, or use EXPLAIN to get just the plan. The plan is only the theory, though, and the actual profile is much more informative.
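As a rough sketch of that workflow, the same query can be prefixed with EXPLAIN (plan only, nothing is executed) or PROFILE (the query runs and each operator reports its rows and db hits). The query body is the reordered variant shown earlier. Running it on a reduced copy of the data is an assumption here, not something the original poster confirmed is possible.

```cypher
// Plan only: cheap, does not execute the query.
EXPLAIN
MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
  AND t.name <> u.name
  AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC;

// Full profile: executes the query, so try it on a smaller dataset first.
PROFILE
MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
  AND t.name <> u.name
  AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC;
```

Comparing the db hits of the expand operators across the variants should show which traversal order actually prunes earlier.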

Related

Improve performance of neo4j query with multiple OPTIONAL MATCH

I am new to Neo4j and have a query that I want to improve. Any help or recommendations would be appreciated.
MATCH (s:Source),
(s)-[:SourceContext]->(c:Context),
(c)-[:ContextFunction]->(f:Function)
OPTIONAL MATCH (c)<-[*1..2]-(e:Entity)
OPTIONAL MATCH (c)<-[*1..2]-(au:Author)
OPTIONAL MATCH (c)<-[*1..2]-(p:Period)
OPTIONAL MATCH (c)<-[*1..2]-(u:Unit)
OPTIONAL MATCH (c)<-[*1..2]-(a:AttributeSet)
OPTIONAL MATCH (c)<-[*1..2]-(t:Timeseries)
WITH e,t,p,u,s,a,f,au
WHERE
(t.id in [3450] or t.id is null) AND
(e.id in [16260] or e.id is null)
AND s.id = 16
AND (a.id = 0 or a.id is NULL)
return {SourceID: s.id, EntityID: e.id, TimeSeriesID: t.id, PeriodID: p.id, UnitID: u.id, FunctionID: f.id, AttributeSetID: a.id}
See the Neo4j profile plan here.
Instead of traversing the whole graph, move the filtering earlier in the Cypher query.
MATCH (s:Source),
(s)-[:SourceContext]->(c:Context),
(c)-[:ContextFunction]->(f:Function)
WHERE s.id = 16
OPTIONAL MATCH (c)<-[*1..2]-(e:Entity)
WHERE e.id = 16260 or e.id is null
OPTIONAL MATCH (c)<-[*1..2]-(au:Author)
OPTIONAL MATCH (c)<-[*1..2]-(p:Period)
OPTIONAL MATCH (c)<-[*1..2]-(u:Unit)
OPTIONAL MATCH (c)<-[*1..2]-(a:AttributeSet)
WHERE a.id = 0 or a.id is NULL
OPTIONAL MATCH (c)<-[*1..2]-(t:Timeseries)
WHERE t.id = 3450 or t.id is null
WITH e,t,p,u,s,a,f,au
return {SourceID: s.id, EntityID: e.id, TimeSeriesID: t.id, PeriodID: p.id, UnitID: u.id, FunctionID: f.id, AttributeSetID: a.id}
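This should greatly improve the query performance, as you will be starting from a single Source node instead of traversing all the Source nodes in your graph. Note that a WHERE attached to an OPTIONAL MATCH filters the optional pattern rather than the rows, so a Context whose entities all fail the filter is kept with e set to null instead of being dropped as in the original query.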
It would also help if your graph model supports adding relationship types to the OPTIONAL MATCHes.
For example:
OPTIONAL MATCH (c)<-[:HAS_AUTHOR*1..2]-(au:Author)
This way you avoid traversing all the relationship types in each OPTIONAL MATCH.
If not, there are still some improvements you could make.
You could run a single untyped match:
OPTIONAL MATCH (c)<-[*1..2]-(node)
and then filter the results by node label:
CASE WHEN node:Author THEN ... ELSE ... END
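A minimal sketch of that single-traversal idea, assuming it is acceptable to return the ids of each label as a list per context rather than as separate rows (this changes the output shape compared to the original query). The column names such as EntityIDs are made up for illustration, and the id filters on Entity, AttributeSet and Timeseries are left out for brevity.

```cypher
MATCH (s:Source)-[:SourceContext]->(c:Context)-[:ContextFunction]->(f:Function)
WHERE s.id = 16
// One variable-length traversal instead of six.
OPTIONAL MATCH (c)<-[*1..2]-(node)
WITH s, c, f, collect(DISTINCT node) AS nodes
// Split the collected neighbours by label with list comprehensions.
RETURN s.id AS SourceID,
       f.id AS FunctionID,
       [n IN nodes WHERE n:Entity       | n.id] AS EntityIDs,
       [n IN nodes WHERE n:Author       | n.id] AS AuthorIDs,
       [n IN nodes WHERE n:Period       | n.id] AS PeriodIDs,
       [n IN nodes WHERE n:Unit         | n.id] AS UnitIDs,
       [n IN nodes WHERE n:AttributeSet | n.id] AS AttributeSetIDs,
       [n IN nodes WHERE n:Timeseries   | n.id] AS TimeSeriesIDs
```

Whether this is faster than six typed OPTIONAL MATCH clauses depends on the data, so it is worth profiling both.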

Neo4j Match with multiple relationships

I need a MATCH where either relationship is true. I understand the (person1)-[:r1|:r2]-(person2) syntax. The problem I am having is that one of the patterns traverses through another node, i.e.:
(p1:person)-[:FRIEND]-(p2:person)-[:FRIEND]-(p3:person)
So I want this kind of logic: the enemy of my enemy is my friend, and my friend is my friend. The output should list the names of everyone who is my friend. I also limit the relationships to a particular property value.
Something like:
MATCH (p1:Person)-[:ENEMY{type:'human'}]-(myEnemy:Person)-[enemy2:ENEMY{type:'human'}]-(myFriend:Person)
OR (p1:Person)-[friend:FRIEND{type:'human'}]-(myFriend:Person)
RETURN p1.name, myFriend.name
I need one list that I can then do aggregation on.
This is my first posting....so if my question is a mess...hit me with your feedback and I will clarify :)
You can use the UNION clause to combine 2 queries and also remove duplicate results:
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
The ID(p) < ID(f) filter avoids returning the same pair of Person names twice (in reverse order).
[UPDATE]
To get a count of how many friends each Person has, you can take advantage of the new CALL subquery syntax (in neo4j 4.0) to do post-union processing:
CALL {
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f
}
RETURN pName, COUNT(f) AS friendCount
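One caveat with the count above: because of the ID(p) < ID(f) filter, each pair is attributed to only one of its two members, so a person is credited only with friends that have a higher id. If a full per-person count is wanted, a possible variant (a sketch, not tested against the asker's data) drops that filter and lets UNION remove duplicate (p, f) rows. Grouping by the node p rather than by name is an assumption made here so that two people with the same name are not merged.

```cypher
CALL {
  // Enemy of my enemy.
  MATCH (p:Person)-[:ENEMY {type: 'human'}]-(:Person)-[:ENEMY {type: 'human'}]-(f:Person)
  WHERE p <> f
  RETURN p, f
  UNION
  // Direct friends.
  MATCH (p:Person)-[:FRIEND {type: 'human'}]-(f:Person)
  WHERE p <> f
  RETURN p, f
}
// UNION (without ALL) has already de-duplicated the (p, f) rows.
WITH p, count(f) AS friendCount
RETURN p.name AS pName, friendCount
ORDER BY friendCount DESC
```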

(Neo4j, Cypher) How to set incremental number to relationships?

I'm using Neo4j. I'd like to create a root node for a search result and create relationships from that root node to the search result nodes. I'd also like to set an incremental number as a property on each relationship, if possible with a single query.
Sorry for not explaining enough.
This is what I'd like to do.
Is there a more concise way?
// create test data
WITH RANGE(0, 99) AS indexes,
['Paul', 'Bley', 'Bill', 'Evans', 'Robert', 'Glasper', 'Chihiro', 'Yamanaka', 'Fred', 'Hersch'] AS names
UNWIND indexes AS index
CREATE (p:Person { index: index, name: (names[index%10] + toString(index)) });
// create 'Results' node with relationships to search result 'Person' nodes.
// 'SEARCH_RESULT' relationships have 'order' and 'orderBy' properties.
CREATE(x:Results{ts: TIMESTAMP()})
WITH x
MATCH(p:Person)
WHERE p.name contains '1'
MERGE(x)-[r:SEARCH_RESULT]->(p)
WITH x, r, p
MATCH (x)-[r]->(p)
WITH x, r, p
ORDER BY p.name desc
WITH RANGE(0, COUNT(r)-1) AS indexes, COLLECT(r) AS rels
UNWIND indexes AS i
SET (rels[i]).order = i
SET (rels[i]).orderBy = 'name'
RETURN rels;
// validate
MATCH(x:Results)-[r:SEARCH_RESULT]->(p:Person)
RETURN r, p.name ORDER BY r.order;
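One possibly shorter form, offered as a sketch rather than a verified answer: sort and collect the matching Person nodes first, then create each relationship with its properties already set, so no second MATCH or separate SET is needed. It assumes the same test data as above and that a fresh Results node is wanted on every run.

```cypher
CREATE (x:Results {ts: timestamp()})
WITH x
MATCH (p:Person)
WHERE p.name CONTAINS '1'
WITH x, p
ORDER BY p.name DESC
// Collect in sorted order, then walk the list by index.
WITH x, collect(p) AS people
UNWIND range(0, size(people) - 1) AS i
WITH x, people[i] AS p, i
CREATE (x)-[:SEARCH_RESULT {order: i, orderBy: 'name'}]->(p)
RETURN count(*) AS created;
```

The validation query above should work unchanged against the result.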

Neo4j - Cypher node and relationship relation

I have the following query:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
MATCH (filterCharacteristic1:Characteristic)
WHERE filterCharacteristic1.id = 1
WITH dg, childD, filterCharacteristic1
CALL apoc.index.between(childD,'HAS_VALUE_ON',filterCharacteristic1,'(value:(10))') YIELD rel
WITH DISTINCT rel, childD, dg
MATCH (childD)-(rel) // here I need to go further only with 'childD' nodes that have relationship with 'rel'(match `apoc.index.between` predicate)
As you can see from the query above, at the end I'm trying to keep only the childD nodes that have a relationship matching rel, but I don't know how to express that in Cypher. Something like (childD)-(rel) or (childD)-[rel] doesn't work and raises an error. Please help.
You need to look for a match pattern and compare the relationship:
...
WITH DISTINCT rel, childD, dg
MATCH (childD)-[tmp]-() WHERE tmp = rel
RETURN rel, childD, dg
Or you can compare directly:
...
WITH DISTINCT rel, childD, dg, startNode(rel) AS sRel, endNode(rel) AS eRel
WHERE (childD)--(sRel) OR (childD)--(eRel)
RETURN rel, childD, dg

match in clause in cypher

How can I do a match with an IN clause in Cypher?
e.g. I'd like to find movies with ids 1, 2, or 3.
match (m:movie {movie_id:("1","2","3")}) return m
If you were querying an auto index, the syntax was:
START n=node:node_auto_index('movie_id:("123", "456", "789")')
How is this different with a MATCH clause?
The idea is that you can do:
MATCH (m:movie)
WHERE m.movie_id IN ["1", "2", "3"]
RETURN m
However, this will not use the index as of 2.0.1. This is a missing feature in the new label indexes that I hope will be resolved soon. https://github.com/neo4j/neo4j/issues/861
I've found a (somewhat ugly) temporary workaround for this.
The following query doesn't make use of an index on Person(name):
match (p:Person)... where p.name in ['JOHN', 'BOB'] return ...;
So one option is to repeat the entire query n times:
match (p:Person)... where p.name = 'JOHN' return ...
union
match (p:Person)... where p.name = 'BOB' return ...
If this is undesirable then another option is to repeat just a small query for the id n times:
match (p:Person) where p.name ='JOHN' return id(p)
union
match (p:Person) where p.name ='BOB' return id(p);
and then perform a second query using the results of the first:
match (p:Person)... where id(p) in [8,16,75,7] return ...;
Is there a way to combine these into a single query? Can a union be nested inside another query?
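On the nesting question: the CALL subquery syntax mentioned earlier on this page (Neo4j 4.0 and later) does allow a UNION inside a larger query. A hedged sketch using the same two names, which would not run on the 2.0.x release discussed in this answer:

```cypher
CALL {
  // Each branch is a simple equality lookup on Person(name).
  MATCH (p:Person) WHERE p.name = 'JOHN' RETURN p
  UNION
  MATCH (p:Person) WHERE p.name = 'BOB' RETURN p
}
// Continue the outer query with the combined rows.
RETURN p.name AS name, id(p) AS nodeId
```

Prefixing the plain IN version with EXPLAIN on the version in use will show whether the planner already picks an index seek, in which case the workaround is unnecessary.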
