I have a Cypher query that is performing extremely poorly (~ 30 sec):
START foo=node:foos('Name:*')
MATCH foo<-[:HasMember]-()<-[:PartOf]-()<-[:Connected]-bar
WHERE foo.Name IN ["name1", "name2"] AND bar.Enabled = true
RETURN DISTINCT bar.Guid AS Guid, foo.Name AS Name
What I think is happening is that the Lucene index is used to pull out all values, and then graph search is used to match against the names in the set. If I change the query to the one below, it is orders of magnitude faster (16 ms):
START foo=node:foos('Name:"name1" OR Name:"name2"')
MATCH foo<-[:HasMember]-()<-[:PartOf]-()<-[:Connected]-bar
WHERE bar.Enabled = true
RETURN DISTINCT bar.Guid AS Guid, foo.Name AS Name
Is there a way to get the first query to execute as fast as the second without resorting to manually building a Lucene query out of the name set?
The other option would be to use a traversal but I prefer to stay in Cypher-land if possible.
If you don't want to build the Lucene query yourself, try filtering earlier, before the MATCH, so that the pattern is expanded only from the foo nodes you actually want:
START foo=node:foos('Name:*')
WHERE foo.Name IN ["name1", "name2"]
WITH foo
MATCH foo<-[:HasMember]-()<-[:PartOf]-()<-[:Connected]-bar
WHERE bar.Enabled = true
RETURN DISTINCT bar.Guid AS Guid, foo.Name AS Name
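If building the Lucene query turns out to be the only fast option, a small helper can generate it from the name set so it never has to be written by hand. This is a minimal sketch in Python; the function name build_lucene_name_query is hypothetical, and it assumes that only backslashes and double quotes need escaping inside a quoted Lucene phrase.

```python
def build_lucene_name_query(names, field="Name"):
    """Return a Lucene query string that matches any of the given names exactly."""
    def quote(value):
        # Assumption: inside a quoted phrase, only backslash and double quote
        # need escaping.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'

    # Join the per-name clauses with OR, as in the fast query above.
    return " OR ".join(quote(name) for name in names)


query = build_lucene_name_query(["name1", "name2"])
print(query)  # Name:"name1" OR Name:"name2"
```

The resulting string could then be spliced into the START clause or, if your Neo4j version supports it, passed as a query parameter; that part is an assumption to check against the documentation for your version.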
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function so that it returns the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
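The difference between the two queries can be mimicked on a plain list of rows. The sketch below uses made-up (m.name, n.name) pairs, which are purely hypothetical sample data, to show what each form of counting produces.

```python
from collections import Counter

# Hypothetical rows as the MATCH might produce them: one (m.name, n.name) per path.
rows = [
    ("app1", "schemaA"),
    ("app1", "schemaA"),  # the same pair reached via a second path
    ("app1", "unitB"),
    ("app2", "schemaA"),
]

# Like RETURN m.name, n.name, COUNT(*): one count per grouping key.
per_pair = Counter(rows)

# Like WITH DISTINCT m.name AS n1, n.name AS n2 RETURN COUNT(*): number of distinct pairs.
distinct_pairs = len(set(rows))

print(per_pair[("app1", "schemaA")])  # 2
print(distinct_pairs)                 # 3
```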
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Prefix each candidate query with PROFILE to compare the number of DB hits the different Cypher queries need.
You can find examples of how to use count in the Neo4j docs.
In your case, the first example, where count(*) is used to return a count of each returned item, should work.
I have the following Neo4J Cypher query:
MATCH (u:User{uid:'1819228'}),
(ctx:Context)-[:BY]->(u)
WITH DISTINCT ctx, u
MATCH (s:Statement)-[:IN]->(ctx),
(s)-[:BY]->(u)
RETURN DISTINCT s, ctx
ORDER BY s.timestamp ASC;
I have a feeling that something here is not efficient, because it runs pretty slow. Is it a case of creating a product with several query results?
What would be the best way to optimize this query to get the results in the same form?
This may be faster (and you may not need the DISTINCT):
MATCH (u:User)<-[:BY]-(s:Statement)-[:IN]->(ctx:Context)-[:BY]->(u)
WHERE u.uid = '1819228'
RETURN DISTINCT s, ctx
ORDER BY s.timestamp;
You may also want to create an index (or uniqueness constraint, which automatically creates an index) on :User(uid), which will quickly find the desired User to start off the query.
I have the following Cypher Neo4J query, where I'm trying to find 3 nodes with a certain property and rename them:
MATCH (u:User{uid:"418938923891"}),
(ctx:Context{name:"seo_191220T1718"}), (ctx)-[:BY]->(u),
(ctxk:Context{name:"kwrds_191220T1718"}), (ctxk)-[:BY]->(u),
(ctxs:Context{name:"serp_191220T1718"}), (ctxs)-[:BY]->(u)
WITH DISTINCT ctx, ctxk, ctxs
SET ctx.name = "seo_storyn",
ctxk.name = "kwrds_storyn",
ctxs.name = "serp_storyn";
It all works; however, when the query runs it reports 24 properties changed, probably because the WITH results are multiplied.
Is there a more elegant and efficient way to do that?
This streamlined version of the query should update the same nodes that yours does. Is it possible that you have more than one node matching the name or uid values you searched for?
MATCH
(ctx:Context {name:"seo_191220T1718"})-[:BY]->(u:User {uid:"418938923891"}),
(ctxk:Context {name:"kwrds_191220T1718"})-[:BY]->(u),
(ctxs:Context {name:"serp_191220T1718"})-[:BY]->(u)
SET ctx.name = "seo_storyn",
ctxk.name = "kwrds_storyn",
ctxs.name = "serp_storyn"
RETURN u, ctx, ctxk, ctxs
I added a RETURN clause so that you can see which patterns the query is matching. Try running it in Neo4j Browser and then look at the results in table view. That might give you an idea of why you are getting multiple results.
You could also break the query into individual parts and see whether multiple results are returned.
MATCH (ctx:Context {name:"seo_191220T1718"})-[:BY]->(u:User {uid:"418938923891"}) RETURN *
MATCH (ctxk:Context {name:"kwrds_191220T1718"})-[:BY]->(u:User {uid:"418938923891"}) RETURN *
MATCH (ctxs:Context {name:"serp_191220T1718"})-[:BY]->(u:User {uid:"418938923891"}) RETURN *
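To see how duplicates could lead to the reported number, here is a small Python sketch of one possible scenario. It assumes, purely for illustration, that each of the three names matches two Context nodes and that every SET assignment is counted once per matched row; other combinations could also add up to 24.

```python
from itertools import product

# Hypothetical duplicates: two nodes per name, all linked to the same user.
ctx = ["seo#1", "seo#2"]
ctxk = ["kwrds#1", "kwrds#2"]
ctxs = ["serp#1", "serp#2"]

# A MATCH over three comma-separated patterns yields every combination of matches.
rows = list(product(ctx, ctxk, ctxs))

# Three properties are assigned for each row.
property_writes = len(rows) * 3

print(len(rows), property_writes)  # 8 24
```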
I want a query that, starting from a given node, counts the possible end nodes for a given relationship type.
For example this query:
MATCH (start:typeA{my_id:"abc"})-[:rel]->(l:typeB) return count(l)
works great and returns a proper number, i.e., 500. The same happens with:
MATCH p=(start:BusStop{StopCode:"0247"})-[:CAN_BOARD]->(:Leg) return count(p)
However if I do:
MATCH (start:typeA{my_id:"abc"}) return count((start)-[:rel]->(:typeB))
returns 1.
What is the difference between this query and the previous ones?
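The result of a path expression (as used in your last query) is a list of paths. This is different from the result when the same path pattern is used in a MATCH clause, which produces one row per path. Your last query matches a single start node, so there is only one row, and COUNT() sees one non-null value (the list) and returns 1.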
You would have gotten 500 if you changed your last query to use SIZE() instead of COUNT():
MATCH (start:typeA{my_id:"abc"}) return SIZE((start)-[:rel]->(:typeB))
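The distinction can be modelled with ordinary lists. In this sketch the row structure and path names are hypothetical stand-ins: one row for the single matched start node, holding a list of 500 paths.

```python
# One row, because MATCH (start ...) finds a single node; the pattern
# expression evaluates to a list of paths within that row.
paths = [f"path{i}" for i in range(500)]
rows = [{"start": "abc", "paths": paths}]

# COUNT(expr) counts rows where the expression is non-null.
count_result = sum(1 for row in rows if row["paths"] is not None)

# SIZE(expr) returns the length of the list in each row.
size_result = [len(row["paths"]) for row in rows]

print(count_result)  # 1
print(size_result)   # [500]
```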
I have a query that I am trying to execute. The query works, but there isn't an option to see this data in graph format. Instead the data is returned in table/text format.
When I simplify the query, the output is displayed in graph format, and I have no idea why.
This is the query that is giving me the issue:
MATCH (p:Person)-[hi:hasIdentity]->(i:Identity)
MATCH (j:Person)-[hi2:hasIdentity]->(i2:Identity)
MATCH (i)-[bl:Linked]->(i2)
WHERE NOT p=j
return DISTINCT(p.id), COUNT(DISTINCT(j))
LIMIT 5
Does anyone have any idea why that might be the case?
You'll need to return variables associated with nodes and/or relationships for it to display as a graph. As it is now you're returning properties of nodes (p.id), probably integers or strings. Try this return instead:
...
RETURN p, COUNT(DISTINCT j)
LIMIT 5
By the way, DISTINCT isn't a function, so there is no need for parentheses. Also, when a RETURN or WITH contains an aggregation, you don't need DISTINCT on that line: the non-aggregated variables act as the grouping key for the aggregation, so they are already distinct.
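The grouping behaviour can be sketched in Python. The rows below are hypothetical (p.id, j) pairs; grouping by p.id already yields one entry per distinct id, which is why an extra DISTINCT on that column adds nothing.

```python
from collections import defaultdict

# Hypothetical matched rows: (p.id, j) with repeats.
rows = [(1, "a"), (1, "b"), (1, "a"), (2, "c")]

# p.id is the grouping key; each group collects its distinct j values.
groups = defaultdict(set)
for p_id, j in rows:
    groups[p_id].add(j)

# Like RETURN p.id, COUNT(DISTINCT j): one result row per grouping key.
result = {p_id: len(js) for p_id, js in groups.items()}

print(result)  # {1: 2, 2: 1}
```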