Improve performance of neo4j query with multiple OPTIONAL MATCH - neo4j

I am new to Neo4j and have a query that I want to improve. Any help or recommendations would be appreciated.
MATCH (s:Source),
(s)-[:SourceContext]->(c:Context),
(c)-[:ContextFunction]->(f:Function)
OPTIONAL MATCH (c)<-[*1..2]-(e:Entity)
OPTIONAL MATCH (c)<-[*1..2]-(au:Author)
OPTIONAL MATCH (c)<-[*1..2]-(p:Period)
OPTIONAL MATCH (c)<-[*1..2]-(u:Unit)
OPTIONAL MATCH (c)<-[*1..2]-(a:AttributeSet)
OPTIONAL MATCH (c)<-[*1..2]-(t:Timeseries)
WITH e,t,p,u,s,a,f,au
WHERE
(t.id in [3450] or t.id is null) AND
(e.id in [16260] or e.id is null)
AND s.id = 16
AND (a.id = 0 or a.id is NULL)
return {SourceID: s.id, EntityID: e.id, TimeSeriesID: t.id, PeriodID: p.id, UnitID: u.id, FunctionID: f.id, AttributeSetID: a.id}
See the Neo4j profile plan here.

Instead of traversing the whole graph and filtering at the end, move each filter earlier in the query, next to the pattern it applies to:
MATCH (s:Source),
(s)-[:SourceContext]->(c:Context),
(c)-[:ContextFunction]->(f:Function)
WHERE s.id = 16
OPTIONAL MATCH (c)<-[*1..2]-(e:Entity)
WHERE e.id = 16260 or e.id is null
OPTIONAL MATCH (c)<-[*1..2]-(au:Author)
OPTIONAL MATCH (c)<-[*1..2]-(p:Period)
OPTIONAL MATCH (c)<-[*1..2]-(u:Unit)
OPTIONAL MATCH (c)<-[*1..2]-(a:AttributeSet)
WHERE a.id = 0 or a.id is NULL
OPTIONAL MATCH (c)<-[*1..2]-(t:Timeseries)
WHERE t.id = 3450 or t.id is null
WITH e,t,p,u,s,a,f,au
return {SourceID: s.id, EntityID: e.id, TimeSeriesID: t.id, PeriodID: p.id, UnitID: u.id, FunctionID: f.id, AttributeSetID: a.id}
This should greatly improve the query performance as you will be starting from only a single Source node instead of traversing all the source nodes in your graph.
It would also help if your graph model supports adding relationship types to the OPTIONAL MATCHes.
For example:
OPTIONAL MATCH (c)<-[:HAS_AUTHOR*1..2]-(au:Author)
This way you avoid traversing all the relationship types in each OPTIONAL MATCH.
If not, there are still some improvements you could make.
You could run a single generic pattern:
OPTIONAL MATCH (c)<-[*1..2]-(node)
and then branch on the label of each matched node:
CASE WHEN node:Author THEN ... ELSE ... END
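As a rough sketch of that last idea, the single generic traversal can be collected once and then split by label with list comprehensions. This is an illustration under assumptions, not the asker's exact query: it returns one list of ids per label instead of one row per combination, and it leaves out the id filters on Entity, Timeseries and AttributeSet.

```cypher
// Sketch only: one generic traversal, then split the matched nodes by label.
// Assumes a list of ids per label is acceptable in place of one row per combination.
MATCH (s:Source {id: 16})-[:SourceContext]->(c:Context)-[:ContextFunction]->(f:Function)
OPTIONAL MATCH (c)<-[*1..2]-(node)
// collect() skips nulls, so a context with no neighbours yields an empty list
WITH s, c, f, collect(DISTINCT node) AS nodes
RETURN s.id AS SourceID,
       f.id AS FunctionID,
       [n IN nodes WHERE n:Entity | n.id] AS EntityIDs,
       [n IN nodes WHERE n:Timeseries | n.id] AS TimeSeriesIDs,
       [n IN nodes WHERE n:Period | n.id] AS PeriodIDs,
       [n IN nodes WHERE n:Unit | n.id] AS UnitIDs,
       [n IN nodes WHERE n:AttributeSet | n.id] AS AttributeSetIDs
```

Whether this beats six separate OPTIONAL MATCH clauses depends on the data, so it is worth comparing both with PROFILE.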

Related

Neo4j Match with multiple relationships

I need a MATCH where either relationship is true. I understand the (person1)-[:r1|:r2]-(person2) syntax. The problem I am having is that one of the MATCH patterns traverses through another node, i.e.:
(p1:person)-[:FRIEND]-(p2:person)-[:FRIEND]-(p3:person)
So I want this kind of logic. The enemy of my enemy is my friend. And my friend is my friend. Output list of all the names who are my friend. I also limit the relationship to a particular value.
Something like:
MATCH (p1:Person)-[:ENEMY{type:'human'}]-(myEnemy:Person)-[enemy2:ENEMY{type:'human'}]-(myFriend:Person)
OR (p1:Person)-[friend:FRIEND{type:'human'}]-(myFriend:Person)
RETURN p1.name, myFriend.name
I need one list that I can then do aggregation on.
This is my first posting....so if my question is a mess...hit me with your feedback and I will clarify :)
You can use the UNION clause to combine 2 queries and also remove duplicate results:
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
The ID(p) < ID(f) filtering is done to avoid having the same pair of Person names being returned twice (in reverse order).
[UPDATE]
To get a count of how many friends each Person has, you can take advantage of the CALL subquery syntax (new in Neo4j 4.0) to do post-union processing. The ID(p) < ID(f) filter is dropped here, because it would count each pair for only one of the two people; p <> f still keeps a Person from being counted as their own friend:
CALL {
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE p <> f
RETURN p.name AS pName, f
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE p <> f
RETURN p.name AS pName, f
}
RETURN pName, COUNT(f) AS friendCount

Neo4j Cypher relationship exists and collection of IDs

I have the following Neo4j Cypher query that checks whether a relationship exists between a User and an entity and returns a boolean result:
MATCH (u:User) WHERE u.id = {userId} MATCH (entity) WHERE id(entity) = {entityGraphId} RETURN EXISTS( (u)<-[:OWNED_BY]-(entity) )
Please help me rewrite this query so that it accepts a collection of {entityGraphIds} instead of a single {entityGraphId} and checks whether a relationship exists between the User and any of the entities with these ids.
For example, I have user1, entity1 and entity2, and user1 has a relationship with entity2. If I pass user1's id as {userId} and the ids of entity1 and entity2 as {entityGraphIds}, the query should return true.
I believe you can simply use the IN operator. Considering these parameters:
:params {userId: 1, entityGraphIds : [2,3,4]}
Then, the query:
MATCH (u:User) WHERE u.id = {userId}
MATCH (entity) WHERE id(entity) IN ({entityGraphIds})
RETURN EXISTS( (u)<-[:OWNED_BY]-(entity) )
EDIT:
If you are trying to return true when :User is connected to at least 1 entity, then you can simplify your query to:
OPTIONAL MATCH (u:User)<-[:OWNED_BY]-(entity:Entity)
WHERE u.id = {userId} AND id(entity) IN ({entityGraphIds})
RETURN u IS NOT NULL
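On Neo4j 4.0 and later the {param} syntax is no longer accepted and parameters are written as $param. A minimal sketch of the same check under that assumption, aggregating so that a single boolean comes back (the alias ownsAny is made up for illustration):

```cypher
// Sketch assuming $param syntax; ownsAny is a hypothetical alias.
MATCH (u:User {id: $userId})
OPTIONAL MATCH (u)<-[:OWNED_BY]-(entity)
WHERE id(entity) IN $entityGraphIds
// count() ignores nulls, so the result is false when nothing matched
RETURN count(entity) > 0 AS ownsAny
```

Because the RETURN has no grouping key, the aggregation produces exactly one row, including when the user owns none of the listed entities.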

How do I optimize a multi-node Neo4J query?

I am new to Cypher querying. The query below has been running for the past few hours. I am trying to infer relationships between 'T' nodes by using the 'D' and 'R' nodes in the middle, and want to understand whether there is a better way to write it.
MATCH (t:T)-[r1:T_OF]->(d:D)<-[r2:R_OF]-(m:R)-[r3:R_OF]->(e:D)<-[r4:T_OF]-(u:T)
WHERE t.name <> u.name AND d.name <> e.name
RETURN t.name, u.name, count(*) as degree
ORDER BY degree desc
Here's the count of each node and relationship type -
Nodes
T: 4,657
D: 2,458,733
R: 4,822
Relationships
T_OF: 4,915,004
R_OF: 284,548
You could add a clause to avoid computing both (t, u) and (u, t), which would halve the size of the cartesian product:
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or maybe
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE t.name < u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
which won't cost an extra read for the id.
It probably doesn't make a difference, but you can also avoid binding variables that you don't use (r1, r2, r3, r4, m).
It's hard to optimize a query when you don't have the matching data and can't PROFILE it. However, I see that you have many more T_OF relationships than R_OF relationships, so changing the traversal order may prune branches faster:
MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or even
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
You could also try to reduce the size of the first cartesian product with the same id() trick (or by ordering the names), but then you need to reassemble the pairs at the end:
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE id(d) < id(e)
AND d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE t.name <> u.name
WITH t.name AS name1, u.name AS name2, count(*) AS degree
WITH CASE WHEN name1 < name2 THEN name1 ELSE name2 END AS name1,
CASE WHEN name1 < name2 THEN name2 ELSE name1 END AS name2,
degree
RETURN name1, name2, sum(degree) AS degree
ORDER BY degree DESC
All of these variants would need to be profiled to see whether they lead anywhere. Running PROFILE on a smaller data set is the most informative option; EXPLAIN only gives you the plan, which is just the theory.
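One possible way to profile on a smaller set, offered as a sketch: cap the number of starting R nodes before expanding. The limit of 100 is an arbitrary value chosen for illustration, so the resulting counts are only useful for comparing plans, not as real results.

```cypher
// Sketch: profile one of the variants above on a sample of R nodes.
// The LIMIT of 100 is an assumption; adjust it to your data.
PROFILE
MATCH (m:R)
WITH m LIMIT 100
MATCH (m)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
  AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
```

Replacing PROFILE with EXPLAIN shows the plan without executing the query.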

Get property of neo4j node with optional relationship like left join

I have two types of node, say 'Student' and 'Class'.
Student has {id, name}.
Class has {id, name}.
A Student can have an optional 'ATTENDS' relationship with a Class node:
(s:Student)-[r:ATTENDS]->(c:Class)
[r:ATTENDS] is optional (it may or may not be present).
I want the student record with all its properties. If the relationship is present, class_name should come from the related Class node; otherwise class_name should be null.
{student_id,student_name,class_name}
I tried this Cypher query, but it does not give the result I want. Please help.
OPTIONAL MATCH (s:Student)-[:ATTENDS]->(c:Class) WHERE s.id =1
RETURN s.id AS student_id , s.name as student_name, c.name as class_name
With this query, if the relationship exists I get all the values, but if no relationship exists all the values are null.
If you don't care about the type of relation, you could run
MATCH (student:Student {id :1})
OPTIONAL MATCH (student)-->(class:Class)
RETURN student.id, student.name, class.name
and you'll have no need to set aliases.
I got a solution to this problem by trying different queries:
MATCH (s:Student {id :1})
OPTIONAL MATCH (s)-[:ATTENDS]->(c:Class)
RETURN s.id AS student_id , s.name as student_name, c.name as class_name
You need to MATCH the required criteria first and then OPTIONAL MATCH the rest. If anyone has a simpler solution, please post it.
Wrote a graph-gist for this at http://gist.neo4j.org/?11110772
The short answer is:
MATCH (s:Student) OPTIONAL MATCH (s)-->(c:Course)
RETURN s.name, c.name
Read the gist for more details. http://gist.neo4j.org/?11110772
Note that you cannot leave out the first MATCH. If the entire query is optional, nothing will be retrieved. In SQL you likewise have a non-optional query on one table and then a left join to the second, optional, table.

match in clause in cypher

How can I do a match with an IN clause in Cypher?
E.g. I'd like to find movies with ids 1, 2, or 3.
match (m:movie {movie_id:("1","2","3")}) return m
If you were going against an auto index, the syntax was:
START n=node:node_auto_index('movie_id:("123", "456", "789")')
How is this different for a MATCH clause?
The idea is that you can do:
MATCH (m:movie)
WHERE m.movie_id in ["1", "2", "3"]
RETURN m
However, this will not use the index as of 2.0.1. This is a missing feature in the new label indexes that I hope will be resolved soon. https://github.com/neo4j/neo4j/issues/861
I've found a (somewhat ugly) temporary workaround for this.
The following query doesn't make use of an index on Person(name):
match (p:Person)... where p.name in ['JOHN', 'BOB'] return ...;
So one option is to repeat the entire query n times:
match (p:Person)... where p.name = 'JOHN' return ...
union
match (p:Person)... where p.name = 'BOB' return ...
If this is undesirable then another option is to repeat just a small query for the id n times:
match (p:Person) where p.name ='JOHN' return id(p)
union
match (p:Person) where p.name ='BOB' return id(p);
and then perform a second query using the results of the first:
match (p:Person)... where id(p) in [8,16,75,7] return ...;
Is there a way to combine these into a single query? Can a union be nested inside another query?
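For what it's worth, later versions do allow this kind of nesting: the CALL subquery syntax introduced in Neo4j 4.0 can wrap a UNION and hand its result to the enclosing query. A minimal sketch, assuming such a version; the elided part of the original pattern is not reproduced here, and the alias name is only illustrative.

```cypher
// Sketch assuming Neo4j 4.0+, where CALL { } allows post-union processing.
CALL {
  MATCH (p:Person) WHERE p.name = 'JOHN' RETURN p
  UNION
  MATCH (p:Person) WHERE p.name = 'BOB' RETURN p
}
// p is now available to the outer query; further MATCH clauses can follow here.
RETURN p.name AS name
```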
