Neo4j indices slow when querying across 2 labels - neo4j

I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know a-priori the label. To do this I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?

Here is one way to use both indices. result will be a collection of matching nodes.
OPTIONAL MATCH (a:B {id:"42"})
OPTIONAL MATCH (b:A {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.

You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check your query use profile or explain before your query statement to check if the indexes are used .

Indexes are formed and and used via a node label and property, and to use them you need to form your query the same way. That means queries w/out a label will scan all nodes with the results you got.

Related

Neo4j count Query

match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.

Collection in WITH clause gets expanded to one element per row

I want to create a map projection with node properties and some additional information.
Also I want to collect some ids in a collection and use this later in the query to filter out nodes (where ID(n) in ids...).
The map projection is created in an apoc call which includes several union matches.
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
MATCH (n)-[:has]->({"Vacation"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u, nodeWithInfo
This does not return anything because, when the where part is evaluated it doesn´t check "nodesWithAdditionalInfosIds" as a flat list but instead only gets one id per row.
The problem only exists because I am passing the ids (nodesWithAdditionalInfosIds) AND the nodeProjection (nodeWithInfo) on in the WITH clause.
If I instead only use the id collection and don´t use the nodeWithInfo projection the following adjustement works and returns my only the nodes which ids are in the id collection:
...
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
MATCH (n)-[:has]->({"Urlaub"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u
If I just return the collection "nodesWithAdditionalInfosIds" directly after the WITH clause in both examples this gets obvious. Since the first one generates a flat list in one result row and the second one gives me one id per row.
I have the feeling that I am missing a crucial knowledge about neo4js With clause.
Is there a way I can pass on my listOfIds and use it as a flat list without the need to have an exclusive WITH clause for the collection?
edit:
Right now I am using the following workaround:
After I do the check on the ID of "n" and "u" I don´t return but instead keep the filtered "n" and "u" nodes and start a second apoc call that returns "nodeWithInfo" like before.
WITH n, u
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo, n, u
WHERE nodeWithInfo.id = ID(n) OR nodeWithInfo.id = ID(u)
RETURN nodeWithInfo, n, u
This way I can return the nodes n, u and the additional information (to one of the nodes) per row. But I am sure there must be a better way.
I know ids in neo4j have to be used with care, if at all. In this case I only need them to be valid inside this query, so it doesn´t matter if the next time the same node has another id.
The problem is stripped down to the core problem (in my opinion), the original query is a little bigger with several UNION MATCH inside apoc and the actual match on nodes which ids are contained in my collection is checking for some more restrictions instead of asking for any node.
Aggregating functions, like COLLECT(), aggregate over a set of "grouping keys".
In the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
the grouping key is nodeWithInfo. Therefore, each nodesWithAdditionalInfosIds would always be a list containing one value.
And in the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
there is no grouping key. Therefore, in this situation, nodesWithAdditionalInfosIds will contain all the nodeWithInfo.id values.

Multiple Match queries in one query

I have the following records in my neo4j database
(:A)-[:B]->(:C)-[:D]->(:E)
(:C)-[:D]->(:E)
I want to get all the C Nodes and all the relations and related Nodes. If I do the query
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
I get the first to match if I do
Match (i:C)-[u:D]->(y:E)
Return i,u,y
I get the second to match.
But I want both of them in one query. How do I do that?
The easiest way is to UNION the queries, and pad unused variables with null (because all cyphers UNION'ed must have the same return columns
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
UNION
Match (i:C)-[u:D]->(y:E)
Return NULL as p, NULL as o,i,u,y
In your example though, the second match actually matches the last half of the first chain as well, so maybe you actually want something more direct like...
MATCH (c:C)
OPTIONAL MATCH (connected)
WHERE (c)-[*..20]-(connected)
RETURN c, COLLECT(connected) as connected
It looks like you're being a bit too specific in your query. If you just need, for all :C nodes, the connected nodes and relationships, then this should work:
MATCH (c:C)-[r]-(n)
RETURN c, r, n

cypher - relationship traversal vs ordering by property

graph snippet
I have a Shape with a list of Points.
I have the following requirements:
1) retrieve an ordered set of Points;
2) insert/remove a Point and preserve the order of the rest of the Points
I can achieve this by:
A) Point has a sequence integer property that could be used to order;
B) Add a :NEXT relationship between each Point to create a linked list.
I'm new to Neo4j so not sure which approach is preferable to satisfy the requirements?
For the first requirement, I wrote the following queries and found the performance for the traversal to be poor but Im sure its a badly constructed query:
//A) 146 ms
Match (s:Shape {id: "1-700-y11-1.1.I"})-[:POINTS]->(p:Point)
return p
order by p.sequence;
//B) Timeout! Bad query I know, but dont know the right way to go about it!
Match path=(s:Shape {id: "1-700-y11-1.1.I"})-[:POINTS]->(p1:Point)-[:NEXT*]->(p2:Point)
return collect(p1, p2);
To get ordered list of points use a slightly modified version of the second query:
Match path=(s:Shape {id: "1-700-y11-1.1.I"})-[:POINTS]->(P1:Point)
-[:NEXT*]-> (P2:Point)
WHERE NOT (:Point)-[:NEXT]->(P1) AND
NOT (P2)-[:NEXT]->(:Point)
RETURN TAIL( NODES( path) )
And for example query to delete:
WITH "id" as pointToDelete
MATCH (P:Point {id: pointToDelete})
OPTIONAL MATCH (Prev:Point)-[:NEXT]->(P)
OPTIONAL MATCH (P)-[:NEXT]->(Next:Point)
FOREACH (x in CASE WHEN Prev IS NOT NULL THEN [1] ELSE [] END |
MERGE (Prev)-[:NEXT]->(Next)
)
DETACH DELETE P

Including vars in Neo4j WITH statement changes query output

I'm trying to find the number of nodes of a certain kind in my database that are connected to more than one other node of another kind. In my case, it's place nodes connected to several name nodes. I have a query that works:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p,count(n) as counts
WHERE counts > 1
RETURN p;`
However, that only returns the place nodes, and ideally I'd like it to return all the nodes and edges involved. I've found a question on returning variables from before the WITH, but if I include any of the other variables I've defined, the query returns no responses, i.e. this query returns nothing:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p, count(n) as counts, rels
WHERE counts > 1
RETURN p;
I don't know how to return the information that I want without changing the results of the query. Any help would be much appreciated
The reason your second query returns nothing is because its WITH clause specifies as aggregation "grouping keys" both p and rels. Since each rels path has only a single n value, counts would always be 1.
Something like this might work for you:
MATCH path=(p:Place)-[:Called]->(:Name)
WITH p, COLLECT(path) as paths
WHERE SIZE(paths) > 1
RETURN p, paths;
This returns each matching Place node and all its paths.
Try this:
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE size((p)-[:Called]->(:Name)) > 1
WITH p,count(n) as counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts ORDER BY counts DESC;
This query makes use of Cypher's collect() function to create lists of the names and called relationships for each place that has more than Called relationship with a Name node.

Resources