Inconsistent Cypher query results in Neo4j

To illustrate this issue, create a thousand nodes labeled z with an incrementing numeric property zid.
FOREACH (i IN range(1, 1000)| CREATE (z:z { zid: i }));
Now find a node using a random zid value between 1 and 1000.
MATCH (n:z { zid: round(rand()*1000)})
RETURN n;
The above query returns inconsistent results: sometimes no nodes are returned, and sometimes multiple nodes are returned.
Tweaking the query as follows yields consistent results:
WITH round(rand()*1000) AS x
MATCH (n:z { zid: x })
RETURN x, n;
What is wrong with the first Cypher query?

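The reason you are getting inconsistent results with the first query has to do with how Neo4j evaluates Cypher queries. When round(rand()*1000) appears in a WHERE clause or in the concise inline property syntax, it is evaluated once for each candidate node produced by the label scan for z. Every node is therefore compared against a different random number, so zero, one, or several nodes can match. When you use the WITH clause, the expression is evaluated once and the same value is used for every node.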
That being said, this looks like a bug that is specific to the rand() function.
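The per-row evaluation described above can be observed directly. The sketch below assumes Neo4j 3.1 or later (older versions spell toInteger() as toInt()) and the :z nodes created in the question; the last query shows one way to draw a single zid from exactly 1..1000.

```cypher
// rand() is re-evaluated for every row, so each row gets a different value.
UNWIND range(1, 3) AS i
RETURN i, rand() AS r;

// Bound once with WITH, the same value is reused on every row.
WITH rand() AS r
UNWIND range(1, 3) AS i
RETURN i, r;

// rand() is in [0, 1), so toInteger(rand() * 1000) is in 0..999 and
// adding 1 gives 1..1000. (round(rand()*1000) can also produce 0,
// which matches no node.)
WITH toInteger(rand() * 1000) + 1 AS x
MATCH (n:z {zid: x})
RETURN x, n;
```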

Related

Neo4j count Query

match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function so that it returns the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
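On Neo4j 5.x, the exists() property check used above is no longer available (it was removed in favor of IS NOT NULL, which also works on older versions). A sketch of the same grouped count for newer versions, assuming the labels from the question, would be:

```cypher
// m.name IS NOT NULL replaces EXISTS(m.name) on Neo4j 5.x.
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE m.name IS NOT NULL AND (n:DeploymentUnit OR n:Schema)
// Each distinct (m.name, n.name) pair is a grouping key for count(*).
RETURN m.name, n.name, count(*) AS cnt;
```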
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Prefix the query with PROFILE to compare the number of DB hits needed by different Cypher queries.
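The tips above can be combined into a sketch like the following. The relationship types RUNS_ON and DEPLOYED_ON and their directions are hypothetical placeholders, since the question does not show the real ones, and dropping the master_node label assumes it is redundant in your data. Substitute your own model and compare the DB hits reported by PROFILE against the original query.

```cypher
// RUNS_ON and DEPLOYED_ON are hypothetical relationship types; replace them
// (and the arrow directions) with the ones in your graph.
PROFILE
MATCH (m:Application)-[:RUNS_ON]->(:Server)<-[:DEPLOYED_ON]-(n)
WHERE m.name IS NOT NULL AND (n:DeploymentUnit OR n:Schema)
// Keep only distinct name pairs, then count them.
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN count(*) AS cnt;
```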
You can find examples of how to use count in the Neo4j docs.
In your case, the first example, where
count(*)
is used to return a count of each returned item, should work.

Filter results in query before they are returned with Neo4j Cypher

I have a query that can return a potentially huge number of rows. It is critical that it be as fast as possible.
Is there a way to change what is RETURNED based on what is matched? Here is a simplified version of the query:
MATCH p=(x)-[*]->(y) WHERE x.n="a"
WITH p,y OPTIONAL MATCH (y)<-[]-(z)
RETURN DISTINCT p,z
If the OPTIONAL MATCH finds results, then there is no need to return the large set 'p'.
What I would like to do with the results is:
IF z has results THEN RETURN z ELSE RETURN p
Thanks!
This query should work (it essentially produces 2 kinds of results):
MATCH p=(x)-[*]->(y) WHERE x.n="a"
OPTIONAL MATCH (y)<--(z)
RETURN DISTINCT
CASE WHEN z IS NULL THEN {p: p} ELSE {z: z} END AS res;
Note that unbounded patterns like [*] can take a long time (or never finish, or run out of memory) if you have a lot of data.
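As a sketch of one mitigation, assuming paths longer than a few hops are not needed, the variable-length pattern can be given an upper bound. The limit of 5 below is an arbitrary example. If x has a label that carries an index on n, adding that label to the pattern would also avoid scanning all nodes.

```cypher
// [*..5] stops expansion after 5 hops; 5 is an arbitrary example bound.
MATCH p = (x)-[*..5]->(y)
WHERE x.n = "a"
OPTIONAL MATCH (y)<--(z)
// Return the neighbor z when one exists, otherwise the path p.
RETURN DISTINCT
  CASE WHEN z IS NULL THEN {p: p} ELSE {z: z} END AS res;
```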
You can use the coalesce() function. It will return the first non-null value in the list of expressions passed to it. Try:
MATCH p=(x)-[*]->(y) WHERE x.n="a"
WITH p,y
OPTIONAL MATCH (y)<-[]-(z)
WITH coalesce(z, p) as result
RETURN DISTINCT result
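A quick way to see how coalesce() picks its result, independent of the graph, is to call it on literals; the values below are arbitrary examples:

```cypher
RETURN coalesce(null, 'fallback') AS a,  // 'fallback': first non-null argument
       coalesce('first', 'second') AS b, // 'first': later arguments are ignored
       coalesce(null, null) AS c;        // null: no non-null argument
```

In the query above, this means each row yields z when the OPTIONAL MATCH found a neighbor, and falls back to the path p only when z is null.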

Neo4j indices slow when querying across 2 labels

I've got a graph where each node has either label A or label B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know the label a priori. To do this, I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
takes only ~10 ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?
Here is one way to use both indices; result will be a list of the matching nodes.
OPTIONAL MATCH (a:A {id:"42"})
OPTIONAL MATCH (b:B {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your Neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give Cypher a hint.
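The hint can be sketched as follows, assuming the indexes on :A(id) and :B(id) from the question. The list comprehension in RETURN is an alternative to the two CASE expressions: it builds [a, b] and keeps only the non-null entries.

```cypher
PROFILE
OPTIONAL MATCH (a:A {id: "42"})
USING INDEX a:A(id)   // ask the planner to seek the :A(id) index
OPTIONAL MATCH (b:B {id: "42"})
USING INDEX b:B(id)   // ask the planner to seek the :B(id) index
// Drop whichever variable stayed null because nothing matched.
RETURN [x IN [a, b] WHERE x IS NOT NULL] AS result;
```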
You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check your query, prefix it with PROFILE or EXPLAIN and see whether the indexes are used.
Indexes are defined on a node label and a property, and to use them your query must specify the same label and property. Queries without a label therefore scan all nodes, which explains the slow result you got.
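Newer Neo4j versions (4.x and later) create indexes with a named syntax instead of CREATE INDEX ON :A(id); the old form was deprecated in 4.x and removed in 5.0. A sketch, with the index names a_id and b_id chosen arbitrarily, followed by EXPLAIN to check the plan without running the query:

```cypher
// a_id and b_id are arbitrary example index names.
CREATE INDEX a_id FOR (n:A) ON (n.id);
CREATE INDEX b_id FOR (n:B) ON (n.id);

// EXPLAIN shows the plan without executing the query;
// look for NodeIndexSeek in both branches of the UNION.
EXPLAIN
MATCH (n:A {id: "42"}) RETURN n
UNION
MATCH (n:B {id: "42"}) RETURN n;
```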

How does count(nodes(p)) work in Cypher, Neo4j

I'm looking for an explanation of how this works and why it doesn't return the number of nodes in a path. Suppose I matched a path p. Now:
WITH p, count(nodes(p)) AS L1 RETURN L1
returns 1.
Once that is clear, how do I count a path's nodes properly?
count() is an aggregate function. When you use any aggregate function, result rows are grouped by every expression in the same WITH or RETURN clause that is not itself an aggregate. In this case, rows are grouped by p, and the aggregated value is count(nodes(p)).
nodes(p) returns a list of nodes, so count(nodes(p)) counts the non-null lists in each group, which is 1 when each path appears in a single row, as here.
To return the number of nodes in the path, use size(nodes(p)) instead.
If you're only interested in the length of a path and not in the nodes it contains, I would encourage you to use length(p). This returns the length of the path in relationships, without having to access the nodes.
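A sketch contrasting these functions, assuming an arbitrary pattern of outgoing paths of one to three hops (an unlabeled start node scans the whole graph, so add a label in practice). The second query shows when count() is the right tool: after UNWIND turns each node of each path into its own row, count(DISTINCT n) counts the distinct nodes across all matched paths.

```cypher
// Per path: node count and relationship count (nodeCount = relCount + 1).
MATCH p = (a)-[*1..3]->(b)
RETURN size(nodes(p)) AS nodeCount,
       length(p) AS relCount
LIMIT 5;

// Across all paths: one row per node occurrence, then an aggregate.
MATCH p = (a)-[*1..3]->(b)
UNWIND nodes(p) AS n
RETURN count(DISTINCT n) AS distinctNodes;
```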

Querying multiple indexes not working if one condition fails in Neo4j

I am trying to search for a keyword in all the indexes I have in my graph database.
Below is the query:
start n=node:Users(Name="Hello"),
m=node:Location(LocationName="Hello")
return n,m
I get the nodes if the keyword "Hello" is present in both indexes (Users and Location), but I do not get any results if "Hello" is missing from either index.
Could you please let me know how to modify this Cypher query so that I get results if "Hello" is present in either index key (Name or LocationName)?
In Neo4j 2.0 you can use UNION with two separate queries, like so:
start n=node:Users(Name="Hello")
return n
UNION
start n=node:Location(LocationName="Hello")
return n;
The problem with your query as written is that it computes the Cartesian product of n and m, so if either n or m is not found, there are no results. If one n and two m nodes are found, you get 2 results (with n repeated). This is similar to how the FROM clause works in SQL: if you have an empty table called empty and run select * from x, empty; you get 0 results unless you use some kind of outer join.
Unfortunately, it's somewhat difficult to do this in 1.9. I've tried many variations, such as WITH collect(n) as n, but no matter what, it comes down to the Cartesian product problem at some point.
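From Neo4j 2.0 on, OPTIONAL MATCH behaves like the SQL outer join mentioned above: when nothing matches, it produces null instead of removing the row. A sketch using labels instead of the legacy indexes, where the labels User and Location are hypothetical stand-ins for the Users and Location indexes:

```cypher
// User and Location are hypothetical labels standing in for the legacy
// Users and Location indexes from the question.
OPTIONAL MATCH (n:User {Name: "Hello"})
// collect() skips nulls, so no match yields an empty list, not zero rows.
WITH collect(n) AS users
OPTIONAL MATCH (m:Location {LocationName: "Hello"})
RETURN users, collect(m) AS locations;
```

This always returns exactly one row, with an empty list on whichever side found nothing.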
