Imagine that I have a graph in which, for every pair of nodes m, n of type Nod1, there can be a node k of type Nod2 that connects them through relationships of type Rel; that is, there can be multiple patterns of the kind p=(m:Nod1)-[r:Rel]-(k:Nod2)-[s:Rel]-(n:Nod1). For a given node m (satisfying, for example, m.key="whatever"), how can I find the node n that maximizes the number of nodes k connecting m to n? For example, suppose there are 3 nodes k that connect m to n1 (with n1.key="hello") and 10 nodes k that connect m to n2 (with n2.key="world"); how do I build a query that retrieves the node n2? :)
The question is titled "count duplicated" because I think the problem is solved if I can count all the "duplicated" patterns for each node n (that is, all patterns that have n as their end node). :)
Start by matching your m, then match the pattern you want. Returning n together with count(k) groups the rows by distinct n, so you get the number of k nodes connecting m to each n; order by that count and keep the first row, and you should be there.
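MATCH (m:Nod1 { key: "whatever" })
MATCH (m)-[:Rel]-(k:Nod2)-[:Rel]-(n:Nod1)
WHERE n <> m                       // skip paths that loop back to m
RETURN n, count(DISTINCT k) AS x   // n is the grouping key
ORDER BY x DESC
LIMIT 1;                           // keep only the best n (n2 in the example)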
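To see what the aggregation computes, here is a plain-Python sketch of the same counting over a small in-memory edge list. It is not Neo4j code: the function name best_partner and the sample data (3 connectors to n1, 10 to n2, mirroring the question's example) are hypothetical.

```python
from collections import defaultdict

def best_partner(edges, m):
    """Return (n, count) for the Nod1 node n sharing the most distinct Nod2
    connectors k with m. `edges` is a list of undirected (nod1, nod2) pairs."""
    # Map each Nod2 node k to the set of Nod1 nodes it touches (sets collapse duplicates).
    touches = defaultdict(set)
    for nod1, nod2 in edges:
        touches[nod2].add(nod1)
    counts = defaultdict(int)
    for nod1s in touches.values():
        if m in nod1s:
            for n in nod1s - {m}:  # n <> m, as in the Cypher WHERE clause
                counts[n] += 1     # each k counted once per n, like count(DISTINCT k)
    if not counts:
        return None
    # Ties are broken arbitrarily, just as LIMIT 1 would do in Cypher.
    return max(counts.items(), key=lambda item: item[1])

# Hypothetical data: 3 connectors between m and n1, 10 between m and n2.
edges = [("m", f"a{i}") for i in range(3)] + [("n1", f"a{i}") for i in range(3)]
edges += [("m", f"b{i}") for i in range(10)] + [("n2", f"b{i}") for i in range(10)]
print(best_partner(edges, "m"))  # ('n2', 10)
```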
I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset.
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then, for each connected node (b), return the SUM of the weights of the edges that connect it to the query subset.
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. In the end, I need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important, as I have 30,000 nodes and 1.5M edges; if you have any suggestions regarding this, please throw them into the mix.
Many thanks
Matt
Why do you need so many MATCH statements? You can specify the a nodes and the b nodes in a single MATCH statement and select only the pairs that have a relationship between them.
After that, just return the b nodes and the sum of the weights. Because b is returned alongside an aggregating function such as sum(), it automatically acts as the grouping key (like GROUP BY in SQL).
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE a.name IN ['query list'] AND NOT b.name IN ['query list']
RETURN b.name, sum(r.weight) AS weightSum
ORDER BY weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, the non-aggregated variables (here b) become the grouping key, so the sum is computed per b node; then we order the results (switch to DESC if needed).
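The grouping described above can be mirrored in plain Python to check expected totals on a toy graph. This is a sketch only: the function name weights_to_subset and the edge triples are hypothetical, and it sorts in descending order (the DESC variant).

```python
from collections import defaultdict

def weights_to_subset(edges, subset):
    """Sum, per node b outside `subset`, the weights of its edges into `subset`.
    `edges` holds undirected (u, v, weight) triples."""
    subset = set(subset)
    totals = defaultdict(float)  # b is the grouping key, as in the Cypher RETURN
    for u, v, w in edges:
        # Check both orientations, since the pattern (a)-[r]-(b) is undirected.
        if u in subset and v not in subset:
            totals[v] += w
        elif v in subset and u not in subset:
            totals[u] += w
        # Edges with both ends inside, or both outside, the subset are ignored.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

edges = [("q1", "b1", 2.0), ("q2", "b1", 1.5), ("b2", "q1", 0.5),
         ("b1", "b2", 9.0), ("q1", "q2", 4.0)]
print(weights_to_subset(edges, ["q1", "q2"]))  # [('b1', 3.5), ('b2', 0.5)]
```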
(Using Neo4j 3.x and neo4j.v1 Python driver)
I have two tracks T1 and T2 to the same target. Somewhere before reaching the target, the two tracks meet at node X and become one until the target is reached.
Track T1: T----------X-----------A
Track T2:            '-----Q
I use the following Cypher query to generate each one of the tracks:
UNWIND {coords} AS coordinates
UNWIND {pax} AS pax
CREATE (n:Node)
SET n = coordinates
SET n.pax = pax
RETURN n
using the parameter list, e.g. {'pax': 'A', 'coords': [{'id': 0, 'lon': '8.553095', 'lat': '47.373146'}, etc.]}
and then link the nodes using the id only for the purpose of keeping the sequence of the trackpoints:
UNWIND {pax} AS pax
MATCH (n:Node {pax: pax})
WITH n
ORDER BY n.id
WITH COLLECT(n) AS nodes
UNWIND RANGE(0, SIZE(nodes) - 2) AS idx
WITH nodes[idx] AS n1, nodes[idx+1] AS n2
MERGE (n1)-[:NEXT]->(n2)
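The ORDER BY / COLLECT / UNWIND RANGE(0, SIZE(nodes) - 2) combination pairs each trackpoint with its successor. A minimal Python equivalent, assuming trackpoints are dicts with an id key like the parameter map above (the function name next_links is hypothetical):

```python
def next_links(trackpoints):
    """Return (id1, id2) pairs for consecutive trackpoints ordered by id,
    i.e. the pairs the Cypher query would link with :NEXT."""
    ordered = sorted(trackpoints, key=lambda p: p["id"])  # ORDER BY n.id
    # zip with the shifted list yields len - 1 pairs, like RANGE(0, SIZE - 2).
    return [(a["id"], b["id"]) for a, b in zip(ordered, ordered[1:])]

points = [{"id": 2, "lat": "47.2"}, {"id": 0, "lat": "47.0"}, {"id": 1, "lat": "47.1"}]
print(next_links(points))  # [(0, 1), (1, 2)]
```

A track with a single point produces no pairs, which matches RANGE(0, -1) being empty in Cypher.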
From the (unknown) point X (CS1 in the picture above) on, both tracks have identical trackpoints. I can match those using:
MATCH (n:Node), (m:Node)
WHERE m <> n AND n.id < m.id AND n.lat = m.lat AND n.lon = m.lon
MERGE (n)-[:IS]->(m)
with lat and lon being the (identical) coordinates. This is just my clumsy way of determining the first joint trackpoint. What I really need is a single (linked) track from point X onward, with the pax property updated, e.g. to ['A', 'B'].
Question 1 (generalized):
How can I merge two nodes with a relationship into one node with an updated property? C3 and S3 merge into a new node CS3.
Question 2:
How can I do this if I have two linked lists with a set of pairwise identical properties?
(Ax)-[:NEXT]-> (A1)-[:NEXT]->(A2)-[:NEXT]->(A3)
(Bx)-[:NEXT]-> (B1)-[:NEXT]->(B2)-[:NEXT]->(B3)
where Ax.x <> Bx.x but A1.x = B1.x, A2.x = B2.x, etc.
Thank you all for your hints and helpful ideas.
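Since both tracks are identical from X to the target, X is the first point of their longest common suffix. The following Python sketch (not Cypher, and not a Neo4j merge) finds that suffix and assigns the combined pax to the shared points. It assumes each track is a list of coordinate tuples ordered from start to target and compares coordinates exactly, as the n.lat = m.lat test does; split_tracks and the sample coordinates are hypothetical.

```python
def split_tracks(track_a, track_b, pax_a="A", pax_b="B"):
    """Split two tracks into their private prefixes and the shared tail that
    starts at the joint point X. Returns (prefix_a, prefix_b, merged_tail)."""
    shared = 0
    # Walk backwards from the common target while the coordinates agree.
    while (shared < min(len(track_a), len(track_b))
           and track_a[-1 - shared] == track_b[-1 - shared]):
        shared += 1
    split_a, split_b = len(track_a) - shared, len(track_b) - shared
    prefix_a = [{"coords": c, "pax": [pax_a]} for c in track_a[:split_a]]
    prefix_b = [{"coords": c, "pax": [pax_b]} for c in track_b[:split_b]]
    # One node per shared point, carrying both pax values (e.g. ['A', 'B']).
    merged_tail = [{"coords": c, "pax": [pax_a, pax_b]} for c in track_a[split_a:]]
    return prefix_a, prefix_b, merged_tail

track_a = [(1, 1), (2, 2), (5, 5), (6, 6)]
track_b = [(9, 9), (5, 5), (6, 6)]
prefix_a, prefix_b, tail = split_tracks(track_a, track_b)
print(tail[0]["coords"])  # (5, 5), i.e. the joint point X
```

In the graph, prefix_a[-1] and prefix_b[-1] would then both get a :NEXT relationship to the first node of the merged tail.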
I have a graph of nodes that are interconnected by a relationship N. I now want to find all pairs of nodes that are more than 20 hops away from each other.
A naive approach with the following cypher query is far too slow:
MATCH (n:CELL)
WITH n
MATCH (k:CELL)
WHERE NOT (n)-[:N*1..20]->(k)
RETURN n, k
I could create a second relationship K with a "distance" property and then match on that, but doing so for every node doesn't scale well (I've got 18k nodes, so I would need more than 160 million new relationships).
Is there any other way to solve this in neo4j?
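The 160 million figure follows from counting one relationship per unordered pair of distinct nodes, n(n - 1)/2; if direction matters (the pattern uses ->), the count doubles. A quick check:

```python
nodes = 18_000
unordered_pairs = nodes * (nodes - 1) // 2  # one K relationship per pair of distinct nodes
ordered_pairs = nodes * (nodes - 1)         # one per direction, since -[:N*]-> is directed
print(f"{unordered_pairs:,}")  # 161,991,000
print(f"{ordered_pairs:,}")    # 323,982,000
```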
You could try using shortestPath, which is more efficient than enumerating every variable-length path.
MATCH (n:CELL), (k:CELL)
WHERE n <> k
OPTIONAL MATCH p = shortestPath((n)-[:N*..20]->(k))
WITH n, k, p WHERE p IS NULL  // no path of at most 20 hops from n to k
RETURN n, k
What about something like this:
MATCH p = (n:CELL)-[:N*..20]->(k:CELL)
WHERE n <> k
WITH n, k, min(length(p)) AS minDistance
WHERE minDistance > 20/2
RETURN n, k, minDistance
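If the computation can run outside Cypher (for example over edges exported through a driver), one alternative to materializing 160 million relationships is a depth-limited breadth-first search from each node: anything not reached within 20 hops is a result. This is a swapped-in technique, not a Neo4j feature. The sketch assumes the whole graph fits in memory as a dict mapping every node, including sinks, to its successors; far_pairs and the chain graph are hypothetical.

```python
from collections import deque

def far_pairs(adj, max_hops=20):
    """Yield (n, k) pairs where k is not reachable from n within max_hops
    directed hops. `adj` maps every node to a list of successor nodes."""
    nodes = list(adj)
    for n in nodes:
        seen = {n}
        frontier = deque([(n, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in adj[node]:
                if nxt not in seen:  # BFS reaches each node first by a shortest path
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        for k in nodes:
            if k not in seen:  # unreachable, or only reachable in more than max_hops hops
                yield n, k

adj = {i: [i + 1] for i in range(4)}  # chain 0 -> 1 -> 2 -> 3 -> 4
adj[4] = []
print(sorted(far_pairs(adj, max_hops=2))[:4])  # [(0, 3), (0, 4), (1, 0), (1, 4)]
```

Each search touches at most the 20-hop neighbourhood of its start node, so memory stays proportional to one BFS at a time; the output itself can still be close to n² pairs.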
I have 4 types of nodes: S, G, R and C
S nodes have an idStr property that identifies them.
Every node of type G uses just a S node: (:G)-[:USES]->(:S)
Every node of type C may be connected to multiple R or G nodes: (:C)-[:CONNECTED_TO]->(:R|:G)
Every node of type R may be connected to multiple R or G nodes: (:R)-[:CONNECTED_TO]->(:R|:G)
Question:
Given an idStr range, I want to get all R and C nodes that are connected (directly or indirectly) only to G nodes that use S nodes with an idStr in that range.
The closest approach I have achieved is:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs
MATCH p=(n)-[:CONNECTED_TO*]->(c:G)
WITH FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
but still some nodes that are connected to G nodes that use S nodes not in the range are being returned. [Neo4j Console Test]
What am I trying to do?
The first MATCH is used to get two things: the G nodes that use S nodes with an idStr in the given range (GroupGs), and the C nodes that are connected to those G nodes.
Once we get that, we have to check if those C nodes are connected to more G nodes (directly or through R nodes). That is the second match.
Now we have to check, for each C node, whether all the G nodes connected to it (directly or through R nodes) are in GroupGs. If so, that C node (and the R nodes on the paths to the G nodes) is a match, and that is what I am trying to get.
Second approach (suggested by #FrobberOfBits)
Trying to use just one match, so we are sure the n node is the same in the matching:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C), p=(n)-[:CONNECTED_TO*]->(c:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs, FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
The result is the same. [Neo4j Console Test]
Third approach (suggested by #FrobberOfBits)
Giving semantics to the problem, C may be an endpoint in a network, R a repeater, G a gateway and S a Sim card.
Sim nodes have an iccid property that identifies them.
Every node of type Gateway uses just a Sim node: (:Gateway)-[:USES]->(:Sim)
Every node of type Endpoint may be connected to multiple Repeater or Gateway nodes: (:Endpoint)-[:CONNECTED_TO]->(:Repeater|:Gateway)
Every node of type Repeater may be connected to multiple Repeater or Gateway nodes: (:Repeater)-[:CONNECTED_TO]->(:Repeater|:Gateway)
I am trying to get all the Repeater and Endpoint nodes that are only connected to Gateway nodes using Sim nodes whose iccid is in a given range.
Any idea what I am doing wrong?
Your query is really confusing things with the variables you choose -- binding "a" to label S's, and "b" to label G's? Later binding "c" to "G's" in the second match clause? This query is going to be hard to debug in the future, and makes it hard to see what's going on; consider binding label "G" to "g", or "gs", or similar, and so on.
I think your problem is the second match clause. The (c:G) in the second match clause doesn't relate to anything in the first (which is (b:G)). This means that the path via a set of CONNECTED_TO* relationships from some node to some (c:G) has nothing to do with the complex match on the first line of the query. This second match matches anything labeled G, not just the things you specify in the first match.
That second match is bad because of the requirement you stated:
only to G nodes that use S nodes with an idStr in that range
I don't have your test data, so I can't verify that this works. But here's something to try instead:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C),
      p=(n)-[:CONNECTED_TO*]->(b)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs,
     FILTER(x IN NODES(p) WHERE NOT x:G) AS cs, COLLECT(b) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
Apologies if the syntax edited here isn't perfect; this is a complex query and is going to take some fiddling, but I think the placement and mis-labeling of that second MATCH is your issue. My solution may not be perfect and may require tinkering, but should get you there.
I think I finally got it:
MATCH (a:S)<-[:USES]-(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(b) AS GroupGs
// OPTIONAL keeps one row (an empty badCandidates list) when no node is bad
OPTIONAL MATCH (c)-[:CONNECTED_TO*]->(d:G)
WHERE NOT d IN GroupGs
WITH COLLECT(c) AS badCandidates, GroupGs
MATCH (e)-[:CONNECTED_TO*]->(f:G)
WHERE NOT e IN badCandidates AND f IN GroupGs
RETURN DISTINCT e
First I get GroupGs: all the G nodes that use a S node with an idStr property in the given range.
Now I collect all the C and R nodes that are connected to a G node not in the GroupGs and I call them badCandidates.
Finally, I get all the C and R nodes that are not in the badCandidates collection and are connected to a G node in the GroupGs.
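The three steps above amount to set logic: a node qualifies when it reaches at least one G node and every G node it reaches is in GroupGs. A plain-Python sketch, assuming an in-memory graph where uses maps each G node directly to its S node's idStr (S nodes are folded away) and connected_to maps each node to its CONNECTED_TO successors; all names and data are hypothetical.

```python
def only_good_nodes(uses, connected_to, id_range):
    """Return the C/R nodes whose reachable G nodes are all in GroupGs."""
    group_gs = {g for g, s_id in uses.items() if s_id in id_range}  # GroupGs

    def reachable_gs(start):
        # Depth-first walk along CONNECTED_TO, collecting every G node reached.
        stack, seen, found = [start], {start}, set()
        while stack:
            node = stack.pop()
            for nxt in connected_to.get(node, ()):
                if nxt in uses:
                    found.add(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return found

    result = set()
    for node in connected_to:
        if node in uses:
            continue  # G nodes themselves are not candidates
        gs = reachable_gs(node)
        # Non-empty and subset of GroupGs: reaches a good G and is not a badCandidate.
        if gs and gs <= group_gs:
            result.add(node)
    return result

uses = {"g1": "1a", "g2": "zz"}
connected_to = {"c1": ["r1"], "r1": ["g1"], "c2": ["r2"], "r2": ["g1", "g2"], "c3": ["g2"]}
print(sorted(only_good_nodes(uses, connected_to, {"1a", "b2", "something"})))  # ['c1', 'r1']
```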
Here you have an example: [Neo4j Console Test]
I hope this helps someone.
I've got a Cypher query that gets a set of nodes 'n' of type 't', say (it works its way through a number of different node types in the graph to reach this point).
If we assume the following:
The rest of the type t nodes form the set 'm', so there is no intersection between m and n.
Type t nodes have multiple types of relationships between them.
I have a specific relationship 'r' that I'm interested in. In this specific case I know the following to be true:
Type t nodes can have 0 or more of these r relationships, incoming/outgoing.
The nodes within set n have no outgoing r relationships to set m
The nodes within set m may have outgoing r relationships to set m or n.
I have set n, and I'm trying to determine the nodes from set m that meet the following conditions:
Have 0 r relationships
OR
Only have r relationships to set n, but not to any node in set m.
Some example data:
Type t nodes:
n1, n2, n3
m1, m2, m3
Type r relationships
m1 (no r relationships)
m2->n1, m2->n2
m3->n3, m3->m2
The results should return m1 and m2, but not m3.
I'm quite new to Cypher, so feel free to point to relevant documentation as required. Also, if you can explain the process you go through to determine the answer, I'd appreciate that as I suspect I'm just not quite understanding something simple here.
Your example is more model than data. You may know how to tell the m's and n's apart, but I can't write a query on the identifiers alone; there must be some actual data or structure to discriminate between them. For instance, assume all nodes in the graph are of type t, let the sets n and m be distinguished by the labels :N and :M, let the identifiers you use be values of a property uid (so the query results map onto your question), and let the type r relationship be [:R]. Then create your graph with
CREATE
(n1:N{uid:"n1"}), (n2:N{uid:"n2"}), (n3:N{uid:"n3"})
,(m1:M{uid:"m1"}), (m2:M{uid:"m2"}), (m3:M{uid:"m3"})
, (m2)-[:R]->(n1), (m2)-[:R]->(n2)
, (m3)-[:R]->(n3), (m3)-[:R]->(m2)
The query could then look something like
MATCH (n:N) // bind each node in the set n
WITH collect(n) AS nn // collect and treat them as a set nn
MATCH (m:M) // grab each node in the set m
OPTIONAL MATCH (m)-[:R]->(x) // optionally expand from m to unknown by r
WITH nn, m, collect(x) AS xx // collect unknown per m as xx where
WHERE ALL (x IN xx // all unknown nodes are in the nn set
WHERE x IN nn) // (if m has no -[:R]-> then the set xx is empty
// and the condition is true–i.e.
// either m has no outgoing r or
// the other node is in nn)
RETURN m
Result
m
(3:M {uid:"m1"})
(4:M {uid:"m2"})
You can try the query here.
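The key step is that ALL() over an empty list is true, so an m without outgoing r relationships passes the filter automatically. Python's all() behaves the same way, which makes for a quick sanity check of the expected result. This sketch uses the question's example data; the function name qualifying_m is hypothetical.

```python
def qualifying_m(n_set, m_set, r_edges):
    """Return the nodes of m_set whose outgoing r edges all point into n_set.
    Nodes with no outgoing r edges qualify too, because all() of an empty
    sequence is True, like Cypher's ALL() over an empty list."""
    n_set = set(n_set)
    result = []
    for m in m_set:
        targets = [dst for src, dst in r_edges if src == m]  # collect(x) per m
        if all(x in n_set for x in targets):
            result.append(m)
    return result

n_nodes = ["n1", "n2", "n3"]
m_nodes = ["m1", "m2", "m3"]
r_edges = [("m2", "n1"), ("m2", "n2"), ("m3", "n3"), ("m3", "m2")]
print(qualifying_m(n_nodes, m_nodes, r_edges))  # ['m1', 'm2']
```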