Neo4J exclude nodes after set property - neo4j

I have only just started with Neo4j so I suspect I am missing something very basic here. Given I have the following graph.
And starting on the highlighted node (ID 7937) I need to get all the connected nodes but nothing passed any of the "off" nodes.
Using this
match (n:TestNode)-[:LINK*]-(m)
where ID(n) = 7937
return *
Gives me everything of course which I would suspect due to no filter.
I need the end result to be:
This seems to give me the result I need. Is this the correct way or something better:
match p=n-[:LINK*..]-m
where ID(n) = 7937 and all(x in nodes(p) WHERE x.status = 'on')
return p;

Your query does not give you the results you said you wanted. That is, it does not include the nearest off nodes in the results.
Here is a query that does include the nearest off nodes. However, the results will contain partial paths as well as full paths; this is because your data has no consistent directionality to the LINK relationships, so it is hard to determine when we have reached the end of a path (it can probably be done, but it would make the query more complex).
MATCH p=n-[:LINK*..]-m
WHERE ID(n) = 7937
RETURN DISTINCT REDUCE(s =[n], x IN NODES(p)[1..] |
CASE (s[-1]).status
WHEN 'on' THEN s + x
ELSE s
END) AS res;
Here is a console that shows sample results.

Related

How to avoid specific paths in allShortestPath function?

The graph is supposed to represent a system similar to github, with commits (commit1, commit2, commit3 and commit4), documents (d1, d2) and changes on those documents (green nodes).
I am trying to use CYPHER to get all the documents values at a specific commit. In other words, I am trying to find the shortest path between the specific commit and each of the documents on my graph, but avoiding some paths.
Imagine if I am on commit4, d1 should be equal to foo2 and d2 should be equal to spain. This could be solved with the following CYPHER query:
MATCH (c:Commit {id: 'commit4'})-[:FOLLOWS|CHANGED*]->(:Value)<-[:EQUALS]-(d:Document), p = allShortestPaths((c)-[*]-(d))
RETURN p
This would give the following response:
Now, imagine that I want to be get the values on commit3. The request should not return any changes from the commit4. However, if I use the allShortestPaths function the way I do, it will go through commit4 since it is actually the shortest path to d1 and return the exact same response than if my starting node was commit4.
MATCH (c:Commit {id: 'commit3'})-[:FOLLOWS|CHANGED*]->(:Value)<-[:EQUALS]-(d:Document), p = allShortestPaths((c)-[*]-(d))
RETURN p
How could I avoid the allShortestPath function to go through a [:FOLLOWS]->(c) relationship and solve my problem?
From what you explained, I understand that you don't want to traverse the FOLLOWS edge in the opposite direction of the edge. To do so you can use cypher projection in algo.shortestPath:
MATCH (start:Commit {name:'commit4'})
MATCH (end:Document)
CALL algo.shortestPath.stream(start, end, null,{
nodeQuery:'MATCH (n) RETURN id(n) AS id',
relationshipQuery:'MATCH (n)-[r:FOLLOWS|CHANGED]->(p) RETURN id(n) AS source, id(p) AS target UNION MATCH (n)-[r:EQUALS]-(p) RETURN id(n) AS source, id(p) AS target',
graph:'cypher'})
YIELD nodeId, cost
WITH end as document, algo.asNode(nodeId) AS value WHERE "Value" in labels(value)
return document, value
Replace "commit4" with any other commit name.

Cypher: WHERE ALL Not working as expected

So my query is for a "superpath finding problem".
The relevant nodes here are;
route: The overall path object
tlroutesegment: The logical link between the route and the different segments (which compose the full path) (ps: I know this could be better represented using a relationship, however the database is just made this way :S)
oms: The PHYSICAL path segments itself
validochpath: More or less irrelevant for this question; Top level entity of routes
So on to the actual problem I am having; below is a WORKING solution to the above, HOWEVER, I wanted to optimize the query a bit by reducing the # of routes we have to search through in the 4th line here.
MATCH (vp:validochpath {"some ID HERE"})-->(ort:route)<--
(rs:tlroutesegment)-->(oms:oms)
WITH collect(oms) AS omsNodes
MATCH (ort:route)
WHERE ALL(x in omsNodes WHERE (ort)<--(:tlroutesegment)-->(x))
WITH ort
MATCH (ort)--(vp:validochpath)
RETURN *
This is what the new query looks like, as you can see I use the relation to filter out much of the route nodes.
MATCH (vp:validochpath {onepID:"some ID HERE"})-->(ort:route)<--
(rs:tlroutesegment)-->(oms:oms)<--(rs2:tlroutesegment)
WITH rs2, collect(oms) AS omsNodes
MATCH (rs2)-->(ort2:route)
WHERE ALL(x in omsNodes WHERE (x)<--(:tlroutesegment)-->(ort2))
MATCH (ort2)--(vp:validochpath)
RETURN *
The problem is, this query does not seem to filter out any nodes with the WHERE ALL and just returns everything.
In your second query, the WHERE clause accepts all matches.
From the first MATCH clause, we know that rs2 is a tlroutesegment and that all the nodes in omsNodes are related to rs2. From the second MATCH clause, we also know that ort2 is related to rs2. Your WHERE clause is checking that all the nodes in omsNodes are related to a tlroutesegment that is also related to ort2. Since rs2 is a tlroutesegment, this test always succeeds.
If you want to test for the existence of paths with tlroutesegment nodes that are different than rs2, try this WHERE clause:
WHERE ALL(x in omsNodes WHERE
SIZE([(x)<--(y:tlroutesegment)-->(ort2) WHERE y <> rs2 | y]) > 0)

Neo4j indices slow when querying across 2 labels

I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know a-priori the label. To do this I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?
Here is one way to use both indices. result will be a collection of matching nodes.
OPTIONAL MATCH (a:B {id:"42"})
OPTIONAL MATCH (b:A {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.
You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check your query use profile or explain before your query statement to check if the indexes are used .
Indexes are formed and and used via a node label and property, and to use them you need to form your query the same way. That means queries w/out a label will scan all nodes with the results you got.

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk edges and find all vertexes that is connected to any of start vertex.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I write the algorithm by myself it would be a simple depth / breadth first search, and for every edge visited if the filter condition is true, output this edge. The complexity is O(V+E)
But I can't express this algorithm in Cypher since it's very different language.
Then i wrote this query:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not good as I thought. There are index on :Col(schema)
I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexSet1 is not empty)
{
Vertex0 = VertexQueue1.pop();
VisitedVertexSet.add(Vertex0);
foreach (Edge0 starting from Vertex0)
{
Vertex1 = endingVertex(Edge0);
if (Vertex1.schema <> Vertex0.schema)
{
EdgeSet1.put(Edge0);
}
if (VisitedVertexSet.notContains(Vertex1)
and VertexQueue1.notContains(Vertex1))
{
VertexQueue1.push(Vertex1);
}
}
}
return EdgeSet1;
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row number, it seems that Cypher exec engine returns all paths but I want distint edge list only.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is browsing a particular path it stops which it finds that it's looped back onto some node (EDIT relationship, not node) in the path matched so far, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First limiting to DISTINCT path isn't going to help because every path that it finds is distinct. What you want is the distinct set of pairs, I think, which I think could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
you might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;

Neo4j cypher - search for nodes with no path between them

I'm trying to find a generic way to search for a node or set of nodes which does not have a link to a another node or set of nodes.
As an example, I was able to find all the nodes of a specific type (e.g. :Style) which are connected somehow to a specific set of nodes (e.g. :MetadataRoot), with the following:
match (root:MetadataRoot),
(n:Style),
p=shortestPath((root)-[*]-(n))
return p
Using this, I was able to subtract the set of all :Style nodes from the nodes returned by the above query, but that doesn't seem like the best way to go about this.
If you know the label of the start nodes you can use the EXISTS function :
MATCH (n:Style)
WHERE NOT EXISTS((n)-[]-())
RETURN n
If you know the end node :
MATCH (n:Style)
WHERE NOT EXISTS ((n)-[*]-(:MetadataRoot))
RETURN n
EDIT :
Not sure, but regarding the performance issues in your comment, a workaround could be something like this :
MATCH p=allShortestPaths((n:Style)-[*]-(:MetadataRoot))
WITH nodes(p) as nodesRelated
MATCH (s:Style) WHERE NOT s IN nodesRelated
This should be way faster and it should need less resources to execute:
MATCH (n:Style),
OPTIONAL MATCH p=shortestPath((:MetadataRoot)-[*0..40]-(n))
WITH n, p
WHERE p IS NULL
RETURN n ```

Resources