How can I make this Neo4j query faster?

I have a database that contains these four node labels:
Store, Guitar, GuitarModel, Accessory
(Guitar refers to a specific guitar that a person can own/play.)
optional match (a:Store), (b:Guitar), (c:GuitarModel), (d:Accessory)
where a.StoreNumber ="1234" and (a)-[:ContainsGuitar]->(b) and
(b)-[:IS_OF_MODEL]->(c) and
((d)-[:COMES_STANDARD]-(c) OR (d)-[:COMES_OPTIONAL]-(c) OR (d)-[:COMES_OPTION_UPGRADE]-(c) OR (d)-[:COMES_UPGRADE]-(c))
return b.name, collect(d.name)
My issue is that this query is pretty slow: it takes about 120,000 ms to run.
I have 67,000 nodes and 131,000 relationships.
Am I doing something wrong that is making it slow?

Do you have an index/constraint on :Store(StoreNumber)?
Why are you only using an OPTIONAL MATCH? You can combine MATCH and OPTIONAL MATCH.
Why are you putting your pattern in the WHERE clause? You should put it directly in a MATCH.
I think your query creates a cartesian product between the nodes, which is why it's so slow.
Can you try this query:
MATCH
(:Store { StoreNumber:"1234" })-[:ContainsGuitar]->(b)
RETURN
b.name,
[(b)-[:IS_OF_MODEL]->(:GuitarModel)-[:COMES_STANDARD|COMES_OPTIONAL|COMES_OPTION_UPGRADE|COMES_UPGRADE]-(d:Accessory) | d.name]
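For comparison, here is a minimal sketch of the other two suggestions in this answer: an index on the lookup property, and a MATCH combined with OPTIONAL MATCH instead of a pattern comprehension. It assumes Neo4j 4.x or later index syntax. The index name store_number is hypothetical, and the two statements are meant to be run separately.

```cypher
// Assumption: Neo4j 4.x+ syntax. On 3.x the equivalent is
//   CREATE INDEX ON :Store(StoreNumber)
// The index name store_number is made up for this example.
CREATE INDEX store_number FOR (s:Store) ON (s.StoreNumber);

// Anchor on the one store first, then walk outwards along relationships,
// so no cartesian product between unrelated node sets is built.
MATCH (:Store {StoreNumber: "1234"})-[:ContainsGuitar]->(b:Guitar)
// OPTIONAL MATCH keeps guitars whose model has no accessories;
// collect() skips nulls, so those guitars get an empty list.
OPTIONAL MATCH (b)-[:IS_OF_MODEL]->(:GuitarModel)
      -[:COMES_STANDARD|COMES_OPTIONAL|COMES_OPTION_UPGRADE|COMES_UPGRADE]-(d:Accessory)
RETURN b.name, collect(d.name) AS accessories;
```

Running the second statement with PROFILE in front should show whether the index is actually used for the :Store lookup.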

Related

Multiple Match queries in one query

I have the following records in my neo4j database
(:A)-[:B]->(:C)-[:D]->(:E)
(:C)-[:D]->(:E)
I want to get all the C nodes together with all their relationships and related nodes. If I run the query
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
I get the first record matched. If I run
Match (i:C)-[u:D]->(y:E)
Return i,u,y
I get the second one matched.
But I want both of them in one query. How do I do that?
The easiest way is to UNION the queries and pad unused variables with null (because all queries combined with UNION must have the same return columns):
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
UNION
Match (i:C)-[u:D]->(y:E)
Return NULL as p, NULL as o,i,u,y
In your example though, the second match actually matches the last half of the first chain as well, so maybe you actually want something more direct like...
MATCH (c:C)
OPTIONAL MATCH (connected)
WHERE (c)-[*..20]-(connected)
RETURN c, COLLECT(connected) as connected
It looks like you're being a bit too specific in your query. If you just need, for all :C nodes, the connected nodes and relationships, then this should work:
MATCH (c:C)-[r]-(n)
RETURN c, r, n
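Another way to get both shapes in one result, as a sketch rather than the only option, is to match the part both records share and make the leading (:A)-[:B]-> part optional. The variable names are the ones from the question.

```cypher
// Every C-[:D]->E pair is required...
MATCH (i:C)-[u:D]->(y:E)
// ...while the A-[:B]->C prefix is optional: p and o are null
// for C nodes that have no incoming :B relationship from an :A node.
OPTIONAL MATCH (p:A)-[o:B]->(i)
RETURN p, o, i, u, y
```

Unlike the UNION version, this returns each C-[:D]->E pair once per matching :A parent, instead of once with the parent and once more padded with nulls.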

Cypher - Neo4j Query Profiling

I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it require 12 db hits although all nodes and relationships have already been retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your Cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.); queries are handled by the query planner and then turned into graph operations by Neo4j's internals. It makes sense that the query planner will look for what is likely to be the most efficient way to search the graph (hence why we love Neo), so just because a Cypher query is written one way, it won't necessarily search the graph the way we imagine it will.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label and value x for the indexed property mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationship to a node m with the :Consumer label and value y for its property mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use the Expand Into operator, make the relationship match optional, or use explicitly iterative execution. Both queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;

Cypher Query not returning nonexistent relationships

I have a graph database where there are user and interest nodes which are connected by IS_INTERESTED relationship. I want to find interests which are not selected by a user. I wrote this query and it is not working
OPTIONAL MATCH (u:User{userId : 1})-[r:IS_INTERESTED] -(i:Interest)
WHERE r is NULL
Return i.name as interest
According to answers to similar questions on SO (like this one), the above query is supposed to work. However, in this case it returns null. But when I run the following query, it works as expected:
MATCH (u:User{userId : 1}), (i:Interest)
WHERE NOT (u) -[:IS_INTERESTED] -(i)
return i.name as interest
The reason I don't want to run the above query is because Neo4j gives a warning:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (i))
What am I doing wrong in the first query where I use OPTIONAL MATCH to find nonexistent relationships?
1) MATCH looks for the pattern as a whole, and if it cannot find it in its entirety, it does not return anything.
2) I think that this query will be effective:
// Take all user interests
MATCH (u:User{userId: 1})-[r:IS_INTERESTED]-(i:Interest)
WITH collect(i) as interests
// Check what interests are not included
MATCH (ni:Interest) WHERE NOT ni IN interests
RETURN ni.name
When your OPTIONAL MATCH query does not find a match, then both r AND i must be NULL. After all, since there is no relationship, there is no way to get the nodes that it points to.
A WHERE directly after the OPTIONAL MATCH is pulled into the evaluation.
If you want to post-filter you have to use a WITH in between.
MATCH (u:User{userId : 1})
OPTIONAL MATCH (u)-[r:IS_INTERESTED] -(i:Interest)
WITH r,i
WHERE r is NULL
Return i.name as interest
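As noted above, when the OPTIONAL MATCH finds nothing, i is NULL together with r, so filtering on r IS NULL can only ever give back a NULL interest. One way around this, offered as a sketch and not as the only approach, is to bind every :Interest node first and make only the relationship optional:

```cypher
// Bind the single user and every interest up front.
MATCH (u:User {userId: 1})
MATCH (i:Interest)
// Only the relationship is optional now: i is always bound,
// and r is NULL exactly for the interests the user has not selected.
OPTIONAL MATCH (u)-[r:IS_INTERESTED]-(i)
WITH i, r
WHERE r IS NULL
RETURN i.name AS interest
```

The WITH before the WHERE matters for the reason given above: a WHERE placed directly after the OPTIONAL MATCH would become part of the optional pattern instead of filtering the rows afterwards.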

How to query for multiple OR'ed Neo4j paths?

Does anyone know of a fast way to query multiple paths in Neo4j?
Let's say I have movie nodes that can have a type that I want to match (this is pseudo-code):
MATCH
(m:Movie)<-[:TYPE]-(g:Genre { name:'action' })
OR
(m:Movie)<-[:TYPE]-(x:Genre)<-[:G_TYPE*1..3]-(g:Genre { name:'action' })
(m)-[:SUBGENRE]->(sg:SubGenre {name: 'comedy'})
OR
(m)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
The problem is that the first "m:Movie" nodes to be matched must match one of the paths specified, and the second SubGenre match is dependent on the first match.
I can make a query that works using MATCH and WHERE, but it's really slow (30 seconds with a small 20 MB dataset).
The problem is that I don't know how to OR matches in Neo4j with other OR matches hanging off of the first results.
If I use WHERE, then I have to declare all the nodes used in any of the statements in the initial MATCH, which makes the query slow (since you cannot introduce new nodes in a WHERE).
Does anyone know an elegant way to solve this? Thanks!
You can try a variable length path with a minimal length of 0:
MATCH
(m:Movie)<-[:TYPE|:SUBGENRE*0..4]-(g)
WHERE g:Genre and g.name = 'action' OR g:SubGenre and g.name='comedy'
For the query to use an index to find your genre / subgenre I recommend a UNION query though.
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' })
RETURN distinct m
UNION
MATCH
(m:Movie)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
RETURN distinct m
Perhaps the OPTIONAL MATCH clause might help here. OPTIONAL MATCH behaves like MATCH, except that when its pattern cannot be matched the row is not dropped; instead, the variables that the pattern would have introduced are bound to null. Note that this applies to the pattern as a whole, so the required part and the optional part should go into separate clauses.
For example, to match on a movie, its genre and a possible sub-genre:
MATCH (m:Movie)-[:IS_GENRE]->(g:Genre)
WHERE m.title = "The Matrix"
OPTIONAL MATCH (g)<-[:IS_SUBGENRE]-(sub:Genre)
RETURN m, g, sub
This will return the movie node, the genre node and if it exists, the sub-genre. If there is no sub-genre then it will return null for sub. You can use variable length paths as you have above as well with OPTIONAL MATCH.
[EDITED]
The following MATCH clause should be equivalent to your pseudocode. There is also a USING INDEX clause that assumes you have first created an index on :SubGenre(name), for efficiency. (You could use an index on :Genre(name) instead, if Genre nodes are more numerous than SubGenre nodes.)
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' }),
(m)-[:SUBGENRE]->()<-[:SUB_TYPE*0..3]-(sg:SubGenre { name: 'comedy' })
USING INDEX sg:SubGenre(name)
Here is a console that shows the results for some sample data.

What is the difference between multiple MATCH clauses and a comma in a Cypher query?

In the Cypher query language for Neo4j, what is the difference between one MATCH clause immediately following another like this:
MATCH (d:Document{document_ID:2})
MATCH (d)--(s:Sentence)
RETURN d,s
Versus the comma-separated patterns in the same MATCH clause? E.g.:
MATCH (d:Document{document_ID:2}),(d)--(s:Sentence)
RETURN d,s
In this simple example the result is the same. But are there any "gotchas"?
There is a difference: comma-separated patterns are considered part of the same pattern. So, for instance, the guarantee that each relationship appears only once in the resulting path is upheld here.
Separate MATCH clauses are separate operations whose paths don't form a single pattern, and they don't carry this guarantee.
I think it's better to explain with an example where there is a difference.
Let's say we have the "Movie" database provided by the official Neo4j tutorials.
There are 10 :WROTE relationships in total between :Person and :Movie nodes:
MATCH (:Person)-[r:WROTE]->(:Movie) RETURN count(r); // returns 10
1) Let's try the next query with two MATCH clauses:
MATCH (p:Person)-[:WROTE]->(m:Movie) MATCH (p2:Person)-[:WROTE]->(m2:Movie)
RETURN p.name, m.title, p2.name, m2.title;
You will see 10*10 = 100 records in the result.
2) Let's try the query with one MATCH clause and two patterns:
MATCH (p:Person)-[:WROTE]->(m:Movie), (p2:Person)-[:WROTE]->(m2:Movie)
RETURN p.name, m.title, p2.name, m2.title;
Now you will see that 90 records are returned.
That's because in this case the records where p = p2 and m = m2, with the same :WROTE relationship between them, are excluded.
For example, there IS a record in the first case (two MATCH clauses)
p.name m.title p2.name m2.title
"Aaron Sorkin" "A Few Good Men" "Aaron Sorkin" "A Few Good Men"
while there is NO such record in the second case (one MATCH, two patterns).
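The difference can also be made explicit. The sketch below uses two separate MATCH clauses and adds the uniqueness rule by hand, so under the stated assumption of 10 :WROTE relationships it should drop the same 10 rows (100 - 10 = 90) that the comma-separated form drops implicitly. The variable names r1 and r2 are introduced here for illustration.

```cypher
// Two separate MATCH clauses: no relationship uniqueness between them.
MATCH (p:Person)-[r1:WROTE]->(m:Movie)
MATCH (p2:Person)-[r2:WROTE]->(m2:Movie)
// Re-create by hand what a single comma-separated MATCH enforces:
// the same relationship may not be bound twice in one row.
WHERE r1 <> r2
RETURN count(*) AS rows
```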
There are no differences between these provided that the clauses are not linked to one another.
If you did this:
MATCH (a:Thing), (b:Thing) RETURN a, b;
That's the same as:
MATCH (a:Thing) MATCH (b:Thing) RETURN a, b;
Because (and only because) a and b are independent. If a and b were linked by a relationship, then the meaning of the query could change.
In a more generic way, "The same relationship cannot be returned more than once in the same result record." [see 1.5. Cypher Result Uniqueness in the Cypher manual]
Both MATCH-after-MATCH, and single MATCH with comma-separated pattern should logically return a Cartesian product. Except, for comma-separated pattern, we must exclude those records for which we already added the relationship(s).
In Andy's answer, this is why repetitions of the same movie were excluded in the second case: the second pattern in the single MATCH was using the same :WROTE relationship as the first pattern.
If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (a)) .
In short, there is no difference between these two queries, but use them carefully.
In a more generic way, "The same relationship cannot be returned more than once in the same result record." [see 1.5. Cypher Result Uniqueness in the Cypher manual]
How about this statement?
MATCH p1=(v:player)-[e1]->(n)
MATCH p2=(n:team)<-[e2]-(m)
WHERE e1=e2
RETURN e1,e2,p1,p2

Resources