How to make WITH Neo4J Cypher 2 query more efficient? - neo4j

I had a query of this kind, which would basically find a specific node "Statement", find all the nodes connected to it with an :OF relation, find all their connections to one another, as well all the relations of the node "Statement" to other nodes and the node "Statement" itself:
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2
WITH DISTINCT to, c1, c2, s
MATCH c1-[by:BY]->u, c2-[at:AT]->ctx
WHERE to.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
AND by.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
AND at.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
DELETE s,rel,to,by,at;
This worked OK for when there was 3 nodes connected to the node "Statement", but when there's a 100, it crashes the database.
I tried playing around passing different nodes and relationships with a WITH, but it didn't help.
The closest to a solution that I could get was to set up automatic indexing on relationship properties and then execute the deletion with two queries:
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
s-[by:BY]->u, s-[in:IN]->ctx, c-[of:OF]->s DELETE by,in,of,s;
START rel=relationship:relationship_auto_index
(statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878")
DELETE rel;
2 Questions:
1) I know that the first query took too long because there were too many iterations. How to avoid that?
2) Do you know how to combine the two faster queries above into one so that it works fast and preferably without using the relationship index and START clause?
Thank you!

For this statement
You must not separate the condition on to from the match. Then Cypher will find all matches first and only filter after it is done with that.
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2
WHERE to.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
WITH DISTINCT to, c1, c2, s
MATCH c1-[by:BY]->u, c2-[at:AT]->ctx
WHERE by.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
AND at.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
DELETE s,rel,to,by,at;
Also I'm not sure if this c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2 doesn't span up a cross-product.
Just do this:
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2
WHERE to.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
RETURN count(*),count(distinct c1), count(distinct c2), count(distinct to)
to see some numbers.
You also don't seem to use (u) and (ctx) in the result? So might be an option to convert that into a condition. (Have to try), then you might even be able to leave of the with (if the cardinality with distinct is not much smaller than without.
....
WHERE c2-[:AT {at.statement:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}]->()
AND c1-[:BY {statement:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}]->()
DELETE s,rel,to,b
HTH
Would love to get the dataset to try it out.

Related

2 lists of nodes for a procedure call

I need 2 lists of nodes for the call of my procedure. The following query doesnt work because the first list is not defined (overwritten with the second collect I guess). I already tried a lot of queries but somehow im missing the right one. I think this one is showing what I actually want to achieve.
MATCH (n:NODE)
WHERE n.NODE_ELID='BLOCK1' OR n.NODE_ELID='BLOCK2'
WITH COLLECT(n) AS blockNodes
MATCH (m:NODE)
WHERE m.NODE_ELID='MUST1' OR m.NODE_ELID='MUST2'
WITH COLLECT(m) AS mustNodes
MATCH (from:NODE{NODE_ELID:'START'}),(to:NODE{NODE_ELID:'END'})
CALL example.aStar(from,to,'CONNECTED_TO','DISTANCE','COORD_X','COORD_Y',blockNodes,mustNodes) yield path as path, weight as weight
RETURN path, weight
Thanks in advance.
Pass along blockNodes in line 6:
WITH blockNodes, COLLECT(m) AS mustNodes
The point here is that WITH does many things: it performs projection, aggregation, filtering (as WITH clauses can have their own WHERE clause) and ordering/limiting. See the docs on WITH for more details.

Traverse both incoming and outgoing relationship in Cypher

I am new at Neo4j but not to graphs and I have a specific problem I did not manage to solve with Cypher.
With this type of data:
I would like to be able in a single query to follow some incoming and some outgoing flow.
Example:
Starting on "source"
Follow all "A" relationships in the outgoing way
Follow all "B" relationships in the incoming way
My problem is that Cypher only allows one single direction to be specified in the relationship pattern.
So I could do (source)-[:A|:B*]->() or (source)<-[:A|:B*]-().
But I have no possibility to tell Cypher that I want to follow -[:A]-> and <-[:B]-.
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
Thanks in advance for your help :)
Alternatively to #Gabor Szarnyas answer, I think you can achieve your goal using the APOC procedure apoc.path.expand.
Using this sample data set:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
And calling apoc.path.expand:
match (source:Source)
call apoc.path.expand(source,"A>|<B","",0,5) yield path
return path
You will get this path as output:
The apoc.path.expand call starts from the source node following -[:A]-> and <-[:B]- relationships.
Remember to install APOC procedures according to the version of Neo4j you are using. Take a look in the version compatibility matrix.
To express this in a single query would require a regular path query, which has been proposed to and accepted to openCypher, but it is not yet implemented.
I see two possible workarounds. I recreated your example with this command with a Source label for the source node:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
(1) Insert additional relationships that have the same direction:
MATCH (s)-[:B]->(t)
CREATE (s)<-[:B2]-(t)
And use this relationship for traversal:
MATCH p=(source)-[:A|:B2*]->()
RETURN p
(2) As you mentioned:
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
You could use this approach to first get potential path candidates and manually check the directions of the relationships afterwards. Of course, this is an expensive operation but you only have to calculate it on the candidates, a possibly small data set.
MATCH p=(source:Source)-[:A|:B*]-()
WITH p, nodes(p) AS nodes, relationships(p) AS rels
WHERE all(i IN range(0, size(rels) - 1) WHERE
CASE type(rels[i])
WHEN 'A' THEN startNode(rels[i]) = nodes[i]
ELSE /* B */ startNode(rels[i]) = nodes[i+1]
END)
RETURN p
Let's break down how this works:
We store path candidates in p and use the nodes and relationships functions to extract the lists of nodes/relationships from it.
We define a range of indexes for the relationships (e.g. from 0, 1, 2 if there are 3 relationships).
To determine the direction of relationships, we use the startNode function. For example, if there is a relationship r between nodes n1 to n2, the paths will like <n1, r, n2>. If r was traversed to in the outgoing direction, the startNode(r) will return n1, if it was traverse in the incoming direction, startNode(r) will return n2. The type of the relationship is checked with the type function and a simple CASE expression is used to differentiate between types.
The WHERE clause uses the all predicate function to check whether all :A and :B relationships had the appropriate directions.

In cypher, how do I return a list of connected nodes and their position in the string?

I'm sure this is an easy cypher query, but I'm relatively new to cypher, so apologies ahead of time, but I can't find a previously asked question.
If I have a bunch of nodes connected like this:
(:Start)-[:NEXT]->(step1)-[:NEXT]->(step2)-[:NEXT]->(step3)-[:NEXT]->etc.
And I want to return all the nodes in this group, I can write this:
match (s:Start)-[:NEXT*]->(steps)
return s, steps
But what if I want to order them by their distance from the starting node? Is there a characteristic I an apply order by to or is it more complicated than that?
Thanks
You can enforce the ordering by introducing a variable on the collection of :NEXT relationships, and ordering by their size (how many :NEXTs to get to the node).
MATCH (s:Start)-[rels:NEXT*]->(steps)
RETURN s, steps
ORDER BY SIZE(rels)
Nodes of paths are returned in their sequenced order, so you can use the nodes collection as starting point :
MATCH (s:Start)-[rels:NEXT*]->(steps)
UNWIND range(1, size(nodes(p))-1) AS i
RETURN nodes(p)[i] as node, i
ORDER BY i
Example of this query against the console example : http://console.neo4j.org/r/7nzgov

NEO4J nodes under a node filter by relationship

How can we add a relationship to the query.
Say A-[C01]-B-[C02]-D and A-[C01]-B-[C03]-E
C01 C02 C03 are relationship codes I want to get output
B E
because I want only nodes that can be reached unbroken by C01 or C03
How can I get this result in Cypher?
You may want to clarify, what you're asking for seems like a very simple case of matching. You may want to provide some more info, such as node labels and how you're matching to your start nodes, since without these we have to make things up for example code.
MATCH (a:Thing)
WHERE a.ID = 123
WITH a
MATCH (a)-[:C01|C03*]->(b:Thing)
RETURN b
The key here is specifying multiple relationship types to traverse, using * for multiplicity, so it will match on all nodes that can be reached by any chain of those relationships.

In neo4j is there a way to get path between more than 2 random nodes whose direction of relation is not known

I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't be acceptable to make several queries ? In your case you'd automatically generate 6 queries with all the possible combinations (factorial on the number of variables)
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.

Resources