Return nodes of a connected node - neo4j

For a given node n, I want to get its related nodes and all nodes connected to the related nodes.
For example:
MATCH (n)-[:IN]->(x)
WHERE n.myid='myid'
RETURN n, x
How do I return all the x's connected nodes as well?

Assuming the second hop does not have to be an IN relationship, have you tried the following?
MATCH (n)-[:IN]->(x)-[]-(y)
WHERE n.myid='myid'
AND y<>n
RETURN n,x,collect(y)
I collect y because otherwise you get one row for each (x, y) pair, but that is totally up to you.
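To illustrate what collecting y changes, here is a small Python sketch that groups un-aggregated rows the way an aggregation keyed on n and x would. The row values are made up for illustration and do not come from the question's data.

```python
from collections import defaultdict

# Hypothetical rows, as a query ending in "RETURN n, x, y" might produce them:
# one row per (x, y) pair.
rows = [("n1", "x1", "y1"), ("n1", "x1", "y2"), ("n1", "x2", "y3")]


def collect_by_key(rows):
    """Group the y values by (n, x), similar in spirit to RETURN n, x, collect(y)."""
    grouped = defaultdict(list)
    for n, x, y in rows:
        grouped[(n, x)].append(y)
    return [(n, x, ys) for (n, x), ys in grouped.items()]


collected = collect_by_key(rows)
```

Three input rows become two output rows, one per distinct (n, x) pair, each carrying a list of its y values.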
Try playing on:
http://console.neo4j.org
Console with above example
http://console.neo4j.org/r/yaczrx
Also, you may want to look into how deep you want that second lookup to go.
BTW: if you want to see the path in the console (you can see how the nodes are interconnected): http://console.neo4j.org/r/39mz9a
MATCH path = (n:Crew)-[:KNOWS]-(m)-[rr]-(x)
WHERE n.name='Neo' AND x<>n
RETURN n AS Initial_Node, m AS Linking_Node,
collect(x) AS Nodes_connected_to_Linking_Node, path

MATCH (n)-[:IN]->(x)--(y)
WHERE n.myid='myid'
RETURN n, x,y

Related

How to filter components according to node values in Neo4j

I created my network in Neo4j, in particular, it's composed of many "chains" ( every node can have at most one incoming edge and at most one outgoing edge). How can I make a query in order to return all those chains composed by only nodes having a value in range <x,y>? (you can consider every node has Identifier|date|value)
example: >7
3-->10-->9-->4 IGNORED
8-->10-->9-->12 TAKEN
PS: I tried to use libraries such as gds, and it seems very helpful, but I still can't figure it out.
Thank you
I would try the following:
MATCH p= (n)-[*]->(m)
WHERE NOT (n)<--() AND NOT (m)-->()
WITH p
WHERE all(node in nodes(p) WHERE x < node.value < y)
RETURN p
First, you filter all the paths from start to end and then apply the range filter on all nodes in the path.
Edit: based on comment, to consider also chains composed of one node you can do:
MATCH p= (n)-[*0..]->(m)
WHERE NOT (n)<--() AND NOT (m)-->()
WITH p
WHERE all(node in nodes(p) WHERE x < node.value < y)
RETURN p
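The same two-step idea (find each complete chain, then keep it only if every node passes the range check) can be sketched in plain Python. This is a minimal illustration, not a replacement for the query: the node ids, the `values` and `next_node` dictionaries, and the function names are all hypothetical, and it assumes the chains contain no cycles.

```python
def maximal_chains(values, next_node):
    """Return every start-to-end chain as a list of node ids.

    values:    node id -> numeric value
    next_node: node id -> its single successor (absent for chain ends)
    """
    targets = set(next_node.values())
    chains = []
    for node in values:
        if node in targets:
            continue  # has an incoming edge, so it is not a chain start
        chain = [node]
        while chain[-1] in next_node:
            chain.append(next_node[chain[-1]])
        chains.append(chain)  # single-node chains are included too
    return chains


def chains_in_range(values, next_node, low, high):
    """Keep only chains whose nodes all satisfy low < value < high."""
    return [
        chain
        for chain in maximal_chains(values, next_node)
        if all(low < values[n] < high for n in chain)
    ]


# The two example chains from the question, with the filter "> 7".
values = {"a1": 3, "a2": 10, "a3": 9, "a4": 4,
          "b1": 8, "b2": 10, "b3": 9, "b4": 12}
next_node = {"a1": "a2", "a2": "a3", "a3": "a4",
             "b1": "b2", "b2": "b3", "b3": "b4"}
kept = chains_in_range(values, next_node, 7, float("inf"))
```

Here `kept` contains only the chain 8-->10-->9-->12, because the other chain includes the values 3 and 4.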

Get nodes which are not connected to specific node in Neo4j

I want to get all the nodes which are not connected to a given set of nodes. Suppose I have five nodes: A, B, C, D and E. A->B->C are connected with the :Is_Friend_Of relationship. I want all the nodes which are not connected to A (i.e. D and E).
I tried this query, but it's not working:
MATCH (a:Friend{name:"A"})-[:Is_Friend_Of*]->(b:Friend)
MATCH (c:Friend)
WHERE NOT (c)-[:Is_Friend_Of]->(b)
RETURN c
This query should do what you want. However, I would caution that, depending on the number of unmatched friends in your database, you could get a lot of matches.
// match the single longest chain of friends in a :Is_Friend_Of relationship
// starting with 'A' that is possible
MATCH path=(a:Friend {name:"A"})-[:Is_Friend_Of*]->(b:Friend)
WHERE NOT (b)-[:Is_Friend_Of*]->()
WITH path
// then find the other friends that aren't in that path
MATCH (c:Friend)
WHERE NOT c IN nodes(path)
RETURN c
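As a cross-check of the intended result, the question's five-node example can be reproduced in Python: compute everything reachable from A by following the directed edges, then keep the nodes outside that set. This is only a sketch of the logic under the assumption that "connected" means reachable along outgoing edges; the function name is made up.

```python
from collections import deque


def reachable_from(start, edges):
    """Return the set of nodes reachable from start via directed edges (start included)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# The example from the question: A->B->C, with D and E unconnected.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C")]
connected = reachable_from("A", edges)
not_connected = [n for n in nodes if n not in connected]
```

For this data `not_connected` is D and E, which is the result the question asks for.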

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk the edges and find all vertexes that are connected to any of the start vertexes.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I wrote the algorithm myself, it would be a simple depth-first or breadth-first search: for every edge visited, output the edge if the filter condition is true. The complexity is O(V+E).
But I can't express this algorithm in Cypher, since it's a very different language.
So I wrote this query instead:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not as good as I expected. There is an index on :Col(schema).
I am running the Neo4j 2.3.0 Docker image from Docker Hub on my Windows laptop; it actually runs in a Linux virtual machine on the laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexQueue1 is not empty)
{
    Vertex0 = VertexQueue1.pop();
    VisitedVertexSet.add(Vertex0);
    foreach (Edge0 starting from Vertex0)
    {
        Vertex1 = endingVertex(Edge0);
        if (Vertex1.schema <> Vertex0.schema)
        {
            EdgeSet1.put(Edge0);
        }
        if (VisitedVertexSet.notContains(Vertex1)
            and VertexQueue1.notContains(Vertex1))
        {
            VertexQueue1.push(Vertex1);
        }
    }
}
return EdgeSet1;
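For reference, the pseudocode above translates fairly directly into runnable Python. This is a sketch only: the adjacency dictionary `out_edges`, the `schema` lookup and the function name are assumptions made for illustration, and a single `queued` set replaces the separate visited-set and queue membership checks.

```python
from collections import deque


def cross_schema_edges(start_nodes, out_edges, schema):
    """Breadth-first walk that outputs each edge whose endpoints differ in schema.

    start_nodes: iterable of starting vertex ids
    out_edges:   vertex id -> list of vertex ids it points to
    schema:      vertex id -> schema name
    """
    queue = deque(dict.fromkeys(start_nodes))  # de-duplicated, order preserved
    queued = set(queue)  # every vertex ever queued, so each is expanded once
    result = []
    while queue:
        v0 = queue.popleft()
        for v1 in out_edges.get(v0, []):
            if schema[v1] != schema[v0]:
                result.append((v0, v1))
            if v1 not in queued:
                queued.add(v1)
                queue.append(v1)
    return result


# A small graph with a cycle (a -> b -> c -> a) plus c -> d.
out_edges = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
schema = {"a": "S1", "b": "S1", "c": "S2", "d": "S2"}
found = cross_schema_edges(["a"], out_edges, schema)
```

Each vertex is expanded at most once and each outgoing edge is inspected once, so the cost stays at O(V+E) even when the graph contains cycles.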
EDIT:
The profile result shows that expanding all paths has a high cost. Judging by the row counts, it seems that the Cypher execution engine returns all paths, but I only want the distinct edge list.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"}),
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is traversing a particular path, it stops when it would reuse a relationship that is already in the path matched so far (EDIT: relationship, not node), so cycles aren't really a problem.
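The effect of that rule can be imitated in a few lines of Python. The sketch below enumerates every path from a start node while never reusing an edge within one path. It is an illustration of the rule only, not of how Neo4j actually executes a query, and the function name is made up.

```python
def trails(start, edges):
    """List every path from start that uses each edge at most once.

    edges is a list of (src, dst) pairs; the list index identifies the edge.
    """
    results = []

    def extend(node, used, path):
        results.append(path)
        for i, (src, dst) in enumerate(edges):
            if src == node and i not in used:
                extend(dst, used | {i}, path + [dst])

    extend(start, frozenset(), [start])
    return results


# A two-node cycle: a -> b -> a.
cycle_paths = trails("a", [("a", "b"), ("b", "a")])
```

The enumeration terminates even though the graph is cyclic, because the third path has used both edges. It also hints at why open-ended patterns get slow: the same node can show up in many different paths.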
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First, limiting to DISTINCT paths isn't going to help, because every path that it finds is distinct. What you want, I think, is the distinct set of pairs, which could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"}),
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINCT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
You might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;

Neo4j cypher - Counting immediate children of root nodes

I'm struggling with a problem despite having read a lot of documentation. I'm trying to find my graph's root node (or nodes; there may be several top nodes) and count their immediate children (all relationships are typed :BELONGS_TO).
My graph looks like this (cf. attached screenshot). I have been trying the following query, which works as long as the root node has only ONE incoming relationship, but not when it has more than one. (I'm not really familiar with the Cypher language yet.)
MATCH (n:Somelabel) WHERE NOT (()-[:BELONGS_TO]->(n:Somelabel)) RETURN n
Any help would be much appreciated! (I haven't even tried to count the root nodes' immediate children yet, which would be "2" according to my graph.)
Correct query was given by cybersam
MATCH (n:Somelabel) WHERE NOT (n)-[:BELONGS_TO]->() RETURN n;
MATCH (n:Somelabel)<-[:BELONGS_TO]-(c:Somelabel)
WHERE NOT (n)-[:BELONGS_TO]->() RETURN n, count(c);
Based on your diagram, it looks like you are actually looking for "leaf" nodes. This query will search for all Somelabel nodes that have no outgoing relationships, and return each such node along with a count of the number of distinct nodes that have a relationship pointing to that node.
MATCH (n:Somelabel)
WHERE NOT (n)-[:BELONGS_TO]->()
OPTIONAL MATCH (m)-[:BELONGS_TO]->(n)
RETURN n, COUNT(DISTINCT m);
If you are actually looking for all "root" nodes, your original query would have worked.
As a sanity check, if you have a specific node that you believe is a "leaf" node (let's say it has an id value of 123), this query should return a single row with null values for r and m. If you get non-null results, then you actually have outgoing relationships.
MATCH (n {id:123})
OPTIONAL MATCH (n)-[r]->(m)
RETURN r, m
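The counting logic can be checked against a tiny in-memory graph in Python. The node names and the `belongs_to` edge list below are invented for illustration (the question's screenshot is not available), and the function name is hypothetical. Each pair is read as (child, parent), meaning child -[:BELONGS_TO]-> parent.

```python
def top_nodes_with_child_counts(nodes, belongs_to):
    """Map each node without an outgoing BELONGS_TO edge to its number of distinct children."""
    has_outgoing = {child for child, _parent in belongs_to}
    counts = {}
    for node in nodes:
        if node in has_outgoing:
            continue  # it belongs to something, so it is not a top node
        children = {child for child, parent in belongs_to if parent == node}
        counts[node] = len(children)
    return counts


# Hypothetical graph: a and b belong to root, a1 belongs to a.
nodes = ["root", "a", "b", "a1"]
belongs_to = [("a", "root"), ("b", "root"), ("a1", "a")]
counts = top_nodes_with_child_counts(nodes, belongs_to)
```

Only `root` has no outgoing relationship, and it has two immediate children. A node with neither incoming nor outgoing relationships would be reported with a count of 0, which is what the OPTIONAL MATCH variant also returns.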

Neo4j cypher - search for nodes with no path between them

I'm trying to find a generic way to search for a node or set of nodes which does not have a link to another node or set of nodes.
As an example, I was able to find all the nodes of a specific type (e.g. :Style) which are connected somehow to a specific set of nodes (e.g. :MetadataRoot), with the following:
match (root:MetadataRoot),
(n:Style),
p=shortestPath((root)-[*]-(n))
return p
Using this, I was able to subtract the set of all :Style nodes from the nodes returned by the above query, but that doesn't seem like the best way to go about this.
If you know the label of the start nodes you can use the EXISTS function :
MATCH (n:Style)
WHERE NOT EXISTS((n)-[]-())
RETURN n
If you know the end node :
MATCH (n:Style)
WHERE NOT EXISTS ((n)-[*]-(:MetadataRoot))
RETURN n
EDIT :
Not sure, but regarding the performance issues in your comment, a workaround could be something like this :
MATCH p=allShortestPaths((n:Style)-[*]-(:MetadataRoot))
UNWIND nodes(p) AS related
WITH collect(DISTINCT related) AS nodesRelated
MATCH (s:Style) WHERE NOT s IN nodesRelated
RETURN s
This should be way faster and it should need less resources to execute:
MATCH (n:Style)
OPTIONAL MATCH p=shortestPath((:MetadataRoot)-[*0..40]-(n))
WITH n, p
WHERE p IS NULL
RETURN n
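The underlying idea (keep the candidates that no root can reach when edge direction is ignored) can be sketched in Python as a multi-source breadth-first search. This is an illustration under assumptions: the node names and the function name are hypothetical, and unlike the query above it applies no 40-hop limit.

```python
from collections import deque


def unlinked(candidates, roots, edges):
    """Return the candidates with no undirected path to any root."""
    neighbours = {}
    for a, b in edges:
        # Direction is ignored, so record the edge both ways.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return [c for c in candidates if c not in seen]


# Hypothetical data: s1 and s2 are linked to the root (s2 through x), s3 is isolated.
edges = [("root", "s1"), ("s1", "x"), ("s2", "x")]
orphans = unlinked(["s1", "s2", "s3"], ["root"], edges)
```

Searching outward once from all roots visits each edge a bounded number of times, rather than running a separate path search per candidate node.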
