I have a query that takes too long to execute:
MATCH (s:Person{id:"103"}), s-[rel]-a WITH rel, s
MATCH c1-[:friend]->s<-[:friend]-c2, c1-[fol:follows]->c2
RETURN DISTINCT c1,c2;
However, when I split it in two:
MATCH (s:Person{id:"103"}), s-[rel]-a
RETURN rel, s;
and
MATCH (s:Person{id:"103"}),
c1-[:friend]->s<-[:friend]-c2, c1-[fol:follows]->c2
RETURN DISTINCT c1,c2;
they are much faster.
Why is it that passing rel and s to the next query makes it so much slower?
(I'm asking because that sample query is only a part of a bigger one and I pass on rel and s with the WITH instead of RETURN to the next part of the query)
Thank you
The first query repeats the second MATCH once for every row produced by the first MATCH:
MATCH (s:Person{id:"103"}), s-[rel]-a WITH rel, s
That produces one row for each relationship involving that node, so the second MATCH runs once per relationship. I would use the third query, since rel is never used.
Try running the query with PROFILE in the Neo4j console; it will give you some clues about how the query is actually executed on the server.
By the way, why do you need to pass "rel" on, since it is never used?
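If rel really is needed later in the bigger query, one option is to collapse the relationships into a single list before the second MATCH, so that the expensive pattern runs once instead of once per relationship. This is only a sketch using the names from the question; the list name rels is my own and I have not run it against your data:

```cypher
// Anchor the start node once.
MATCH (s:Person {id: "103"})
// Gather every relationship of s into one list, leaving a single row for s.
MATCH (s)-[rel]-()
WITH s, collect(rel) AS rels
// The friend/follows pattern is now evaluated once, not once per relationship.
MATCH (c1)-[:friend]->(s)<-[:friend]-(c2), (c1)-[:follows]->(c2)
// rels is still in scope here if a later part of the query needs it.
RETURN DISTINCT c1, c2;
```

Prefixing the query with PROFILE should show whether the row count going into the second MATCH actually drops to one.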
How does the WHERE condition in Neo4j work?
I have simple data set with following relationship =>
Client -[CONTAINS {created:"yesterday or today"}]-> Transaction -[INCLUDES]-> Item
I would like to filter above to get the items for a transaction which were created yesterday, and I use the following query -
Match
(c:Client) -[r:CONTAINS]-> (t:Transaction),
(t) -[:INCLUDES]-> (i:Item)
where r.created="yesterday"
return c,t,i
But it still returns the dataset without filtering. What is wrong? And how does filtering work in Neo4j across multiple MATCH statements, say when I want to run my query on the filtered dataset from previous steps?
Thank you very much in advance.
Your query seems fine to me. However, there are 2 things I would like to point out here:
In this case, the WHERE clause can be removed and replaced by matching on the property directly in the pattern.
The two MATCH patterns can be combined into one.
So, the query would be:
MATCH (c:Client) -[r:CONTAINS {created: "yesterday"}]-> (t:Transaction) -[:INCLUDES]-> (i:Item)
RETURN c, t, i
Regarding your second question, when you want to run another query on the filtered dataset from the previous step, use the WITH clause. Instead of returning the result, WITH pipes it to the next part of the query.
For example, with your query, we can do something like this to order the result by client name and return only the client:
MATCH (c:Client) -[r:CONTAINS {created: "yesterday"}]-> (t:Transaction) -[:INCLUDES]-> (i:Item)
WITH c, t, i
ORDER BY c.name DESC
RETURN c
There does not seem to be anything wrong with the cypher statement.
Subsequent MATCH statements can be applied with the WITH clause, which is well documented here: https://neo4j.com/docs/cypher-manual/current/clauses/with/
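To illustrate running a second MATCH on an already filtered result, here is a minimal sketch using the labels and property from the question. I am assuming the created property really holds the string "yesterday"; if it holds a date or a differently cased string, the filter will not match.

```cypher
// Step 1: keep only the transactions created yesterday.
MATCH (c:Client)-[r:CONTAINS]->(t:Transaction)
WHERE r.created = "yesterday"
// Step 2: pass only the filtered rows on to the next part of the query.
WITH c, t
// Step 3: this MATCH only expands from the transactions that survived the filter.
MATCH (t)-[:INCLUDES]->(i:Item)
RETURN c, t, i
```

Only the variables listed in WITH stay in scope, so r is no longer available after step 2.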
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
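As a rough sketch of these suggestions combined: the relationship types RUNS_ON and HOSTS and their directions below are made up for illustration, since the question does not say what the real ones are, so substitute your own. I have also dropped the master_node label on the assumption that the more specific labels are enough.

```cypher
// PROFILE runs the query and reports DB hits per operator.
PROFILE
// RUNS_ON and HOSTS are hypothetical relationship types; the directions are guesses too.
MATCH (m:Application)-[:RUNS_ON]->(:Server)-[:HOSTS]->(n)
WHERE m.name IS NOT NULL AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN count(*) AS cnt
```

Comparing the total DB hits of this plan with the plan for the original query shows whether the changes actually help on your data.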
You can find examples of how to use count in the Neo4j docs.
In your case, the first example, where count(*) is used to return a count of each returned item, should work.
I have a simple query
MATCH (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE) RETURN m
and when executing the query "manually" (i.e. using the browser interface to follow edges) I only get a single node as a result as there are no further connections. Checking this with the query
MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:TYPE)<-[:CONNECTION]-(o:TYPE) RETURN m,o
shows no results and
MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:TYPE) RETURN m
shows a single node so I have made no mistake doing the query manually.
However, the issue is that the first query takes ages to finish and I do not understand why.
Consequently: What is the reason such a trivial query takes so long even though the maximum result would be one?
Bonus: How to fix this issue?
As Tezra mentioned, the variable-length pattern match isn't in the same category as the other two queries you listed, because there are no restrictions on any of the nodes in between n and m; they can be of any type. Given that your query is taking a long time, you likely have a fairly dense graph of :CONNECTION relationships between nodes of different types.
If you want to make sure all nodes in your path are of the same label, you need to add that yourself:
MATCH path = (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE)
WHERE all(node in nodes(path) WHERE node:TYPE)
RETURN m
Alternately you can use APOC Procedures, which has a fairly efficient means of finding connected nodes (and restricting nodes in the path by label):
MATCH (n:TYPE {id:123})
CALL apoc.path.subgraphNodes(n, {labelFilter:'TYPE', relationshipFilter:'<CONNECTION'}) YIELD node
RETURN node
SKIP 1 // to avoid returning `n`
MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:TYPE)<-[:CONNECTION]-(o:TYPE) RETURN m,o is not a fair test of MATCH (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE) RETURN m, because it excludes the possibility of MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:ANYTHING_ELSE)<-[:CONNECTION]-(o:TYPE) RETURN m,o.
For your main query, you should be returning DISTINCT results: MATCH (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE) RETURN DISTINCT m.
This is for 2 main reasons.
Without DISTINCT, each node is returned once for every possible path to it.
Because of the previous point, that is a lot of extra work for no additional meaningful information.
If you use RETURN DISTINCT, it gives the cypher planner the choice to do a pruning search instead of an exhaustive search.
You can also limit the depth of the exhaustive search with an upper bound (*..N), so that it doesn't kill your query if you run against a much older version of Neo4j where the Cypher planner hasn't learned pruning search yet. Example: MATCH (n:TYPE {id:123})<-[:CONNECTION*..10]-(m:TYPE) RETURN m
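Putting both suggestions together, a sketch might look like the following. The bound of 10 is arbitrary and only there for illustration; pick whatever depth makes sense for your graph.

```cypher
// PROFILE shows which expand operator the planner picked.
PROFILE
// The upper bound of 10 hops is an arbitrary example value.
MATCH (n:TYPE {id: 123})<-[:CONNECTION*..10]-(m:TYPE)
// DISTINCT lets the planner skip enumerating every path to the same node.
RETURN DISTINCT m
```

In the resulting plan, look for a variable-length expand operator with "Pruning" in its name, for example VarLengthExpand(Pruning). Whether the planner chooses it depends on your Neo4j version, so treat its absence as a hint rather than proof that something is wrong.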
I need 2 lists of nodes for the call of my procedure. The following query doesn't work because the first list is not defined (overwritten by the second collect, I guess). I have already tried a lot of queries but somehow I'm missing the right one. I think this one shows what I actually want to achieve.
MATCH (n:NODE)
WHERE n.NODE_ELID='BLOCK1' OR n.NODE_ELID='BLOCK2'
WITH COLLECT(n) AS blockNodes
MATCH (m:NODE)
WHERE m.NODE_ELID='MUST1' OR m.NODE_ELID='MUST2'
WITH COLLECT(m) AS mustNodes
MATCH (from:NODE{NODE_ELID:'START'}),(to:NODE{NODE_ELID:'END'})
CALL example.aStar(from,to,'CONNECTED_TO','DISTANCE','COORD_X','COORD_Y',blockNodes,mustNodes) yield path as path, weight as weight
RETURN path, weight
Thanks in advance.
Pass along blockNodes in line 6:
WITH blockNodes, COLLECT(m) AS mustNodes
The point here is that WITH does many things: it performs projection, aggregation, filtering (as WITH clauses can have their own WHERE clause) and ordering/limiting. See the docs on WITH for more details.
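For reference, this is how the whole query from the question would look with that one change applied. I also swapped the OR conditions for IN lists, which is purely cosmetic; example.aStar is the asker's own procedure and I am assuming its signature is as shown in the question.

```cypher
MATCH (n:NODE)
WHERE n.NODE_ELID IN ['BLOCK1', 'BLOCK2']
WITH collect(n) AS blockNodes
MATCH (m:NODE)
WHERE m.NODE_ELID IN ['MUST1', 'MUST2']
// Carry blockNodes forward; anything not listed in WITH goes out of scope.
WITH blockNodes, collect(m) AS mustNodes
MATCH (from:NODE {NODE_ELID: 'START'}), (to:NODE {NODE_ELID: 'END'})
CALL example.aStar(from, to, 'CONNECTED_TO', 'DISTANCE', 'COORD_X', 'COORD_Y', blockNodes, mustNodes)
YIELD path, weight
RETURN path, weight
```

One thing to watch for: if the second MATCH finds no nodes at all, the query returns no rows, because blockNodes acts as a grouping key and there is nothing to group. Using OPTIONAL MATCH for that step would keep the row and give an empty mustNodes list instead.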
I'm trying to find the number of nodes of a certain kind in my database that are connected to more than one other node of another kind. In my case, it's place nodes connected to several name nodes. I have a query that works:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p,count(n) as counts
WHERE counts > 1
RETURN p;
However, that only returns the place nodes, and ideally I'd like it to return all the nodes and edges involved. I've found a question on returning variables from before the WITH, but if I include any of the other variables I've defined, the query returns no responses, i.e. this query returns nothing:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p, count(n) as counts, rels
WHERE counts > 1
RETURN p;
I don't know how to return the information that I want without changing the results of the query. Any help would be much appreciated
The reason your second query returns nothing is that its WITH clause specifies both p and rels as aggregation "grouping keys". Since each rels path has only a single n value, counts would always be 1, and the WHERE counts > 1 filter then removes every row.
Something like this might work for you:
MATCH path=(p:Place)-[:Called]->(:Name)
WITH p, COLLECT(path) as paths
WHERE SIZE(paths) > 1
RETURN p, paths;
This returns each matching Place node and all its paths.
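If you would rather get one path per row (which some visualisation tools handle better than a list of paths), a possible variation is to unwind the collected list again after filtering. This is a sketch along the same lines and untested against your data:

```cypher
MATCH path = (p:Place)-[:Called]->(:Name)
// Group by place so the paths can be counted per place.
WITH p, collect(path) AS paths
WHERE size(paths) > 1
// Turn the list back into one row per path, now only for the qualifying places.
UNWIND paths AS path
RETURN path
```

Each returned path contains the Place node, the Called relationship and the Name node, so all the nodes and edges involved come back.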
Try this:
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE size((p)-[:Called]->(:Name)) > 1
WITH p,count(n) as counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts ORDER BY counts DESC;
This query makes use of Cypher's collect() function to create lists of the names and Called relationships for each place that has more than one Called relationship with a Name node.