I have the following Neo4J Cypher query:
MATCH (u:User{uid:'1819228'}),
(ctx:Context)-[:BY]->(u)
WITH DISTINCT ctx, u
MATCH (s:Statement)-[:IN]->(ctx),
(s)-[:BY]->(u)
RETURN DISTINCT s, ctx
ORDER BY s.timestamp ASC;
I have a feeling that something here is not efficient, because it runs pretty slow. Is it a case of creating a product with several query results?
What would be the best way to optimize this query to get the results in the same form?
This may be faster (and you may not need the DISTINCT):
MATCH (u:User)<-[:BY]-(s:Statement)-[:IN]->(ctx:Context)-[:BY]->(u)
WHERE u.uid = '1819228'
RETURN DISTINCT s, ctx
ORDER BY s.timestamp;
You may also want to create an index (or uniqueness constraint, which automatically creates an index) on :User(uuid), which will quickly find the desired User to start off the query.
Related
I have the following Neo4J DB model:
I need to be able to find a node of the type Concept called "idea" made by the user infranodus and see which nodes of the type Context it is connected to.
I use the following query, but it's taking too long and doesn't even load...
MATCH (ctx:Context)-[:BY]->(u:User{name:'infranodus'})
WITH DISTINCT ctx
MATCH (c:Concept{name:'idea'})-[:AT]->(ctx)<-[:AT]-(cn:Concept)
RETURN DISINCT c, ctx, count(cn);
How to fix it?
adding indexes to the neo4j database would be helpful ( i didnt see any there ) then start your match with the indexed node and traverse from there. i think that having a
MATCH (c:Concept{name:'idea'})-[:AT]->(ctx)<-[:AT]-(cn:Concept)
WHERE (ctx)-[:BY]->(u:User{name:'infranodus'})
....
would also avoid querying twice.
Create an index on the name property of nodes with the User label - a unique index would be great for this.
An index for the name property on concept would be good, too (probably without uniqueness constraint - even though you're looking for distinct nodes explicitly).
Start your query from there.
MATCH (u:User {name:'infranodus'})
MATCH (c:Concept {name: 'idea'})
MATCH (u)<-[:BY]-(ctx:Context)<-[:AT]-(c)
MATCH (c:Concept{name:'idea'})-[:AT]->(ctx)<-[:AT]-(cn:Concept)
RETURN c, ctx, size((c)-[:AT]->(ctx)<-[:AT]-(:Concept))
Doesn't look as nice, but you're asking for query tuning. Make sure to PROFILE your existing query and this. Plus, you had a typo in your query after your return keyword.
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.
I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN
> fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it requires 12 db hits although all nodes/ relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.), queries are handled by the query planner and then turned into graph operations by neo4j's internals. It make sense that the query planner will find what is likely to be the most efficient way to search the graph (hence why we love neo), and so just because a cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationshop to a node m with the :Consumer label, and value y for its proprerty mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
I'm very new to Neo4J and I can't get this simple query work.
The data I have looks like this:
(a)-[:likes]->(b)
(a)-[:likes]->(c)
Now I'd like to extract a list with everyone who likes someone else.
Tried
match (u)-[:likes]->(p) return u order by p.id desc;
This gives me a duplicate of (a).
I tried using distinct:
match (u)-[:likes]->(p) return distinct u order by p.id desc;
This gives me 'variable p undefined'.
I know that if I drop the ordering, distinct works and gives me (a) once.
But how can I work with distinct and order by in the same time?
Consider why your query isn't working:
Without the distinct, you have rows with each pairing of u and p. When you use DISTINCT, how is it supposed to order when there are multiple lines for the same u, matching to multiple p's? That's an impossible task.
If you change it to order by u.id instead, then it works just fine.
I do encourage you to use labels, by the way, to restrict your query only to relevant nodes. You can also rework your query to prevent it from emitting duplicates and avoid the need for DISTINCT completely.
If we assume the nodes you're interested in are labeled with :Person, your query might be:
MATCH (p:Person)
WHERE EXISTS( (p)-[:likes]-() )
RETURN p ORDER BY p.id DESC
I have a query that takes too long to execute:
MATCH (s:Person{id:"103"}), s-[rel]-a WITH rel, s
MATCH c1-[:friend]->s<-[:friend]-c2, c1-[fol:follows]->c2
RETURN DISTINCT c1,c2;
However, when I split it in two:
MATCH (s:Person{id:"103"}), s-[rel]-a
RETURN rel, s;
and
MATCH (s:Person{id:"103"}),
c1-[:friend]->s<-[:friend]-c2, c1-[fol:follows]->c2
RETURN DISTINCT c1,c2;
they are much faster.
Why is it that passing rel and s to the next query makes it so much slower?
(I'm asking because that sample query is only a part of a bigger one and I pass on rel and s with the WITH instead of RETURN to the next part of the query)
Thank you
The first cycles for each node and relation found in the first MATCH:
MATCH (s:Person{id:"103"}), s-[rel]-a WITH rel, s
One row for each relation involving that node. I would use the third query, since rel is never used.
Maybe you try to "profile" the query in Neo4J console, it will give you some clues of how the query actually executed in server.
btw, why would you need to pass the on the "rel" since it never been used