Neo4j Cypher query null or IN - neo4j

I have the following Cypher query:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
RETURN ru, u, childD
SKIP 0 LIMIT 100
A Decision entity can belong to 0..N Tenant objects:
@NodeEntity
public class Decision {
private final static String BELONGS_TO = "BELONGS_TO";
@Relationship(type = BELONGS_TO, direction = Relationship.OUTGOING)
private Set<Tenant> tenants = new HashSet<>();
....
}
I need to extend the Cypher query above so that it returns all childD nodes where both parentD and childD either do not belong to any Tenant or belong to a Tenant whose ID is in the provided {tenantIds} set. Please help me with this query.

Cypher is a very expressive language; just follow your textual requirements:
MATCH (t:Tenant) WHERE ID(t) in {tenantIds}
WITH COLLECT(t) as tenants
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE
id(parentD) = {decisionId}
AND
// parentD belongs to no Tenant, or to at least one of the given tenants
(not (parentD)-[:BELONGS_TO]-(:Tenant) OR any(t in tenants WHERE (parentD)-[:BELONGS_TO]-(t)))
AND
// childD belongs to no Tenant, or to at least one of the given tenants
(not (childD)-[:BELONGS_TO]-(:Tenant) OR any(t in tenants WHERE (childD)-[:BELONGS_TO]-(t)))
RETURN ru, u, childD
SKIP 0 LIMIT 100

Use OPTIONAL MATCH to collect the tenants that are not in {tenantIds}, and keep only the nodes for which that collection is empty:
MATCH (parentD) WHERE id(parentD) = {decisionId}
OPTIONAL MATCH (parentD)-[:BELONGS_TO]->(T:Tenant)
WHERE NOT id(T) IN {tenantIds}
WITH parentD, collect(T) AS TC
WHERE size(TC) <= 0
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
OPTIONAL MATCH (childD)-[:BELONGS_TO]->(T:Tenant)
WHERE NOT id(T) IN {tenantIds}
WITH childD, ru, u, collect(T) AS TC
WHERE size(TC) <= 0
RETURN ru, u, childD
SKIP 0 LIMIT 100
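The two answers above do not apply quite the same rule when a Decision belongs to several tenants, so it may help to see both predicates side by side. This is only a sketch in plain Python, not Cypher: the function names are made up for illustration, and a node's tenants are modelled as a set of IDs.

```python
def visible_any(node_tenants, allowed):
    """Rule of the first answer: the node has no tenants at all,
    or at least one of its tenants is in `allowed`."""
    return not node_tenants or any(t in allowed for t in node_tenants)


def visible_all(node_tenants, allowed):
    """Rule of the second answer: the node has no tenant outside `allowed`
    (vacuously true when the node has no tenants)."""
    return all(t in allowed for t in node_tenants)


if __name__ == "__main__":
    allowed = {1, 2}
    # A node that belongs to tenant 1 (allowed) and tenant 9 (not allowed)
    # passes the first rule but fails the second.
    print(visible_any({1, 9}, allowed), visible_all({1, 9}, allowed))
```

Both rules agree for nodes with no tenants and for nodes whose tenants are all in the set; they differ only for nodes with a mix of allowed and other tenants, so pick the query that matches the behaviour you want in that case.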

Related

Improve performance of neo4j query with multiple OPTIONAL MATCH

I am new to Neo4j and have a query that I want to improve. Any help or recommendations would be appreciated.
MATCH (s:Source),
(s)-[:SourceContext]->(c:Context),
(c)-[:ContextFunction]->(f:Function)
OPTIONAL MATCH (c)<-[*1..2]-(e:Entity)
OPTIONAL MATCH (c)<-[*1..2]-(au:Author)
OPTIONAL MATCH (c)<-[*1..2]-(p:Period)
OPTIONAL MATCH (c)<-[*1..2]-(u:Unit)
OPTIONAL MATCH (c)<-[*1..2]-(a:AttributeSet)
OPTIONAL MATCH (c)<-[*1..2]-(t:Timeseries)
WITH e,t,p,u,s,a,f,au
WHERE
(t.id in [3450] or t.id is null) AND
(e.id in [16260] or e.id is null)
AND s.id = 16
AND (a.id = 0 or a.id is NULL)
return {SourceID: s.id, EntityID: e.id, TimeSeriesID: t.id, PeriodID: p.id, UnitID: u.id, FunctionID: f.id, AttributeSetID: a.id}
See the Neo4j profile plan here.
Instead of traversing the whole graph and filtering at the end, move the filters as early in the query as possible:
MATCH (s:Source),
(s)-[:SourceContext]->(c:Context),
(c)-[:ContextFunction]->(f:Function)
WHERE s.id = 16
OPTIONAL MATCH (c)<-[*1..2]-(e:Entity)
WHERE e.id = 16260 or e.id is null
OPTIONAL MATCH (c)<-[*1..2]-(au:Author)
OPTIONAL MATCH (c)<-[*1..2]-(p:Period)
OPTIONAL MATCH (c)<-[*1..2]-(u:Unit)
OPTIONAL MATCH (c)<-[*1..2]-(a:AttributeSet)
WHERE a.id = 0 or a.id is NULL
OPTIONAL MATCH (c)<-[*1..2]-(t:Timeseries)
WHERE t.id = 3450 or t.id is null
WITH e,t,p,u,s,a,f,au
return {SourceID: s.id, EntityID: e.id, TimeSeriesID: t.id, PeriodID: p.id, UnitID: u.id, FunctionID: f.id, AttributeSetID: a.id}
This should greatly improve performance, because the query now starts from a single Source node instead of traversing every Source node in your graph.
It would also help if your graph model supports adding relationship types to the OPTIONAL MATCHes.
For example:
OPTIONAL MATCH (c)<-[:HAS_AUTHOR*1..2]-(au:Author)
This way you avoid traversing all the relationship types in each OPTIONAL MATCH.
If not, there are still some improvements you could make.
You could run a single generic OPTIONAL MATCH:
OPTIONAL MATCH (c)<-[*1..2]-(node)
and then filter the results by node label:
CASE WHEN node:Author THEN ... ELSE ... END
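To picture what that last step does, here is a rough sketch in plain Python rather than Cypher. It assumes each matched node is represented as a dict with an "id" and a list of "labels"; that shape and the function name are illustrative only, not something a Neo4j driver returns in exactly this form.

```python
def bucket_by_label(nodes, wanted):
    """Group node ids by label, keeping only the labels listed in `wanted`."""
    buckets = {label: [] for label in wanted}
    for node in nodes:
        for label in node["labels"]:
            if label in buckets:
                buckets[label].append(node["id"])
    return buckets


if __name__ == "__main__":
    matched = [
        {"id": 10, "labels": ["Author"]},
        {"id": 11, "labels": ["Period"]},
        {"id": 12, "labels": ["Other"]},  # label not asked for, so it is dropped
    ]
    print(bucket_by_label(matched, ["Author", "Period", "Unit"]))
```

One traversal produces all candidate nodes, and the label test then sorts them into the columns you need, instead of repeating the same variable-length traversal once per label.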

Neo4j Match with multiple relationships

I need a MATCH where either relationship is true. I understand the (person1)-[:r1|:r2]-(person2) syntax. The problem I am having is that one of the patterns traverses through another node, i.e.:
(p1:person)-[:FRIEND]-(p2:person)-[:FRIEND]-(p3:person)
So I want this kind of logic: the enemy of my enemy is my friend, and my friend is my friend. The output should be a list of the names of everyone who is my friend. I also limit the relationships to a particular property value.
Something like:
MATCH (p1:Person)-[:ENEMY{type:'human'}]-(myEnemy:Person)-[enemy2:ENEMY{type:'human'}]-(myFriend:Person)
OR (p1:Person)-[friend:FRIEND{type:'human'}]-(myFriend:Person)
RETURN p1.name, myFriend.name
I need one list that I can then do aggregation on.
This is my first posting....so if my question is a mess...hit me with your feedback and I will clarify :)
You can use the UNION clause to combine 2 queries and also remove duplicate results:
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
The ID(p) < ID(f) filtering is done to avoid having the same pair of Person names being returned twice (in reverse order).
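The effect of that filter can be seen on a toy list of ID pairs. This is just an illustration in plain Python, with a made-up function name; an undirected pattern produces every pair twice, once in each direction, and the comparison keeps one of the two.

```python
def unique_pairs(pairs):
    """Keep each unordered pair once by requiring first id < second id,
    and drop exact duplicates (as UNION does for identical rows)."""
    return sorted({(p, f) for p, f in pairs if p < f})


if __name__ == "__main__":
    # (1, 2) and (2, 1) describe the same friendship seen from both ends.
    rows = [(1, 2), (2, 1), (1, 3), (3, 1), (1, 2)]
    print(unique_pairs(rows))
```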
[UPDATE]
To get a count of how many friends each Person has, you can take advantage of the CALL subquery syntax (new in Neo4j 4.0) to do post-union processing:
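// The ID(p) < ID(f) filter is dropped here: for per-person counts, each pair
// must appear in both directions, otherwise only one side gets the friend counted.
CALL {
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE p <> f
RETURN p.name AS pName, f
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE p <> f
RETURN p.name AS pName, f
}
RETURN pName, COUNT(f) AS friendCount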

Neo4j - Cypher node and relationship relation

I have the following query:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
MATCH (filterCharacteristic1:Characteristic)
WHERE filterCharacteristic1.id = 1
WITH dg, childD, filterCharacteristic1
CALL apoc.index.between(childD,'HAS_VALUE_ON',filterCharacteristic1,'(value:(10))') YIELD rel
WITH DISTINCT rel, childD, dg
MATCH (childD)-(rel) // here I need to go further only with 'childD' nodes that have relationship with 'rel'(match `apoc.index.between` predicate)
As you can see from the query above, at the end I'm trying to keep only the childD nodes that have the relationship rel, but I don't know how to express that in Cypher. Something like (childD)-(rel) or (childD)-[rel] doesn't work and leads to an error. Please help.
You need to look for a match pattern and compare the relationship:
...
WITH DISTINCT rel, childD, dg
MATCH (childD)-[tmp]-() WHERE tmp = rel
RETURN rel, childD, dg
Or you can compare directly:
...
WITH DISTINCT rel, childD, dg, startNode(rel) AS sRel, endNode(rel) AS eRel
WHERE childD = sRel OR childD = eRel
RETURN rel, childD, dg

Neo4j Cypher relationship exists and collection of IDs

I have the following Neo4j Cypher query that checks whether a relationship exists between a User and an entity and returns a boolean result:
MATCH (u:User) WHERE u.id = {userId} MATCH (entity) WHERE id(entity) = {entityGraphId} RETURN EXISTS( (u)<-[:OWNED_BY]-(entity) )
Please help me rewrite this query so that it accepts a collection of {entityGraphIds} instead of a single {entityGraphId} and checks whether a relationship exists between the User and any of the entities with these IDs.
For example, I have user1, entity1 and entity2, and user1 has a relationship with entity2. I'll pass user1's ID as {userId} and the IDs of entity1 and entity2 as {entityGraphIds}, and the query should return true.
I believe you can simply use the IN operator. Considering these parameters:
:params {userId: 1, entityGraphIds : [2,3,4]}
Then, the query:
MATCH (u:User) WHERE u.id = {userId}
MATCH (entity) WHERE id(entity) IN ({entityGraphIds})
RETURN EXISTS( (u)<-[:OWNED_BY]-(entity) )
EDIT:
If you are trying to return true when :User is connected to at least 1 entity, then you can simplify your query to:
OPTIONAL MATCH (u:User)<-[:OWNED_BY]-(entity:Entity)
WHERE u.id = {userId} AND id(entity) IN ({entityGraphIds})
RETURN u IS NOT NULL

How do I optimize a multi-node Neo4J query?

I am new to Cypher querying. The query below has been running for the past few hours. I am trying to infer relationships between 'T' nodes by going through the 'D' and 'R' nodes in the middle, and I would like to know whether there is a better way to write it.
MATCH (t:T)-[r1:T_OF]->(d:D)<-[r2:R_OF]-(m:R)-[r3:R_OF]->(e:D)<-[r4:T_OF]-(u:T)
WHERE t.name <> u.name AND d.name <> e.name
RETURN t.name, u.name, count(*) as degree
ORDER BY degree desc
Here's the count of each node and relationship type -
Nodes
T: 4,657
D: 2,458,733
R: 4,822
Relationships
T_OF: 4,915,004
R_OF: 284,548
You could add a clause to avoid computing both (t, u) and (u, t), which would reduce the size of the cartesian product by half:
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or maybe
MATCH (t:T)-[:T_OF]->(d:D)<-[:R_OF]-(:R)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE t.name < u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
which won't cost an extra read for the id.
It probably doesn't make a difference, but you can also avoid binding variables that you don't use (r1, r2, r3, r4, m).
It's hard to optimize a query when you don't have the matching data and can't PROFILE it. However, I see that you have many more T_OF relationships than R_OF relationships, so changing the traversal order to start from the R nodes may prune branches faster:
MATCH (m:R)-[:R_OF]->(d:D)<-[:T_OF]-(t:T)
MATCH (m)-[:R_OF]->(e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
AND d.name <> e.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
or even
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE id(t) < id(u)
AND t.name <> u.name
RETURN t.name, u.name, count(*) AS degree
ORDER BY degree DESC
You could also try to reduce the size of the first cartesian product with the same id() trick (or ordering the names), but then you need to reassemble the couples at the end:
MATCH (m:R)-[:R_OF]->(d:D)
MATCH (m)-[:R_OF]->(e:D)
WHERE id(d) < id(e)
AND d.name <> e.name
MATCH (d:D)<-[:T_OF]-(t:T), (e:D)<-[:T_OF]-(u:T)
WHERE t.name <> u.name
WITH t.name AS name1, u.name AS name2, count(*) AS degree
WITH CASE WHEN name1 < name2 THEN name1 ELSE name2 END AS name1,
CASE WHEN name1 < name2 THEN name2 ELSE name1 END AS name2,
degree
RETURN name1, name2, sum(degree) AS degree
ORDER BY degree DESC
All of these possibilities would need to be profiled to see whether they lead anywhere. Run PROFILE on a smaller data set, or use EXPLAIN to get just the plan; the plan alone is only the theory, though, and the actual profile is much more informative.
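The "reassemble the couples" step at the end of the last query can be sketched outside Cypher as well. This is a plain Python illustration with a hypothetical function name; it assumes the intermediate rows are (name1, name2, degree) tuples in which the same couple may show up in either order.

```python
from collections import Counter


def merge_degrees(rows):
    """Put each couple into a canonical (smaller name, larger name) order
    and sum the degrees of rows that describe the same couple."""
    totals = Counter()
    for name1, name2, degree in rows:
        key = (name1, name2) if name1 < name2 else (name2, name1)
        totals[key] += degree
    # Highest degree first, like ORDER BY degree DESC.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rows = [("b", "a", 2), ("a", "b", 3), ("a", "c", 1)]
    print(merge_degrees(rows))
```

The two CASE expressions in the query play the role of the `key` line here, and `sum(degree)` plays the role of the counter.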
