Neo4j - Intersect two node lists using Cypher - neo4j

Having the following graphs:
node g1 with child nodes (a, b)
node g2 with child nodes (b, c)
using the query
MATCH (n)-[]-(m) WHERE ID(m) = id RETURN n
being id the id of the node g1, I get a and b, and vice-versa when using the id of g2. What I would like to understand is how can I get the intersection of those two results, in this case having the first return (a, b) and the second return (b, c) getting as final result (b).
I tried using the WITH cause but I wasn't able to achieve the desired result. Keep in mind that I'm new to Neo4j and only came here after a few failed attempts, research on Neo4j Documentation, general google search and
Stackoverflow.
Edit1 (one of my tries):
MATCH (n)-[]->(m)
WHERE ID(m) = 750
WITH n
MATCH (o)-[]->(b)
WHERE ID(b) = 684 and o = n
RETURN o
Edit2:
The node (b), that I represented as being the same on both graphs are in fact two different nodes on the db, each one relating to a different graph (g1 and g2). Representatively they are the same as they have the exactly same info (labels and attributes), but on the database thy are not. I'm sorry since it was my fault for not being more explicit on this matter :(
Edit3:
Why I don't using a single node (b) for both graphs
Using the graphs above as example, imagine that I have yet another layer so: on g1 the child node (b) as a child (e), while on g2 the child node (b) as a child (f). If I had (b) as a single node, when I create (e) and (f) I only could add it to (b) loosing the hierarchy, becoming impossible to distinguish which of them, (e) or (f), belonged to g1 ou g2.

This should work (assuming you pass id1 and id2 as parameters):
MATCH (a)--(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN n;
[UPDATED, based on new info from comments]
If you have multiple "clones" of the "same" node and you want to quickly determine which clones are related without having to perform a lot of (slow) property comparisons, you can add a relationship (say, typed ":CLONE") between clones. That way, a query like this would work:
MATCH (a)--(m)-[:CLONE]-(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN m, n;

You can find the duplicity of the node, by using this query -
[1]
Duplicity with single node -
MATCH pathx =(n)-[:Relationship]-(find) WHERE find.name = "action" RETURN pathx;
[2]
or for two nodes giving only immediate parent node
MATCH pathx =(n)-[:Relationship]-(find), pathy= (p)-[:Relationship]
-(seek) WHERE find.name = "action" AND seek.name="requestID" RETURN pathx,
pathy;
[3]
or to find the entire network i.e. all the nodes connected -
MATCH pathx =(n)--()-[:Relationship]-(find), pathy= (p)--()-[:Relationship]-
(seek) WHERE find.name = "action"
AND seek.name="requestID" RETURN pathx, pathy;

Related

Neo4J Cypher - Unwinding nodes from a path variable

I have seen examples of unwinding lists, but not of unwinding a list of paths. How can I find, for example, all of the shortest paths between one type of node and another, and return or get the nodes that are found, in this example the nodes specifically being b.
MATCH p = allShortestPaths((a: person)-[:PARENT_OF]-(b: person))
UNWIND nodes(p) ... //get all of the b nodes
RETURN b
Note: I would like to use b within the query for another purpose (omitted), and therefore need to unwind the path into a list of b nodes.
After matching all shortest paths, if you just want the b nodes as a result, you may simply RETURN b. I don't believe you need to UNWIND it, since b is clearly identified in your MATCH
Edit:
MATCH p = allShortestPaths((a: person)-[:PARENT_OF]-(b: person))
WITH collect(b) as bees
UNWIND bees as b
//do something
return b
It seems like you just want to see all person nodes that have an incoming PARENT_OF node from another person node. If so, this should work:
MATCH ()-[:PARENT_OF]->(b:person)
RETURN DISTINCT b;
MATCH p = allShortestPaths((a: person)-[:PARENT_OF]-(b: person))
with nodes(p) as nodes
with nodes[size(nodes)-1] as b
return b

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
Then for each connected node (b) I want to return the SUM of the weights that connect to the query subset
For example. If node b1 has 4 edges that connect to nodes in the query subset I would like to RETURN SUM(r2.weight) AS totalWeight for b2. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important as I have 30,000 nodes and 1.5M edges if you have any suggestions regarding this please throw them into the mix.
Many thanks
Matt
Why do you need so many Match statements? You can specify a nodes and b nodes in single Match statement and select only those who have a relationship between them.
After that just return b nodes and sum of the weights. b nodes will automatically be acting as a group by if it is returned along with aggregation function such as sum.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).

Filtering out nodes on two cypher paths

I have a simplified Neo4j graph (old version 2.x) as the image with 'defines' and 'same' edges. Assume the number on the define edge is a property on the edge
The queries I would like to run are:
1) Find nodes defined by both A and B -- Requried result: C, C, D
START A=node(885), B=node(996) MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
Above works and returns C and D. But I want C twice since its defined twice. But without the distinct on x, it returns all the paths from A to B.
2)Find nodes that are NOT (defined by both A,B OR are defined by both A,B but connected via a same edge) -- Required result: G
Something like:
R1: MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
R2: MATCH (A-[:define]->(e)-(:similar)-(f)<-[:define]-B) RETURN e,f
(Nodes defined by A - (R1+R2) )
3) Find 'middle' nodes that do not have matching calls from both A and B --Required result: C,G
I want to output C due to the 1 define(either 45/46) that does not have a matching define from B.
Also output G because there's no define to G from B.
Appreciate any help on this!
Your syntax is a bit strange to me, so I'm going to assume you're using an older version of Neo4j. We should be able to use the same approaches, though.
For #1, Your proposed match without distinct really should be working. The only thing I can see is adding missing parenthesis around A and B node variables.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
RETURN x
Also, I'm not sure what you mean by "returns all paths from A to B." Can you clarify that, and provide an example of the output?
As for #2, we'll need several several parts to this query, separating them with WITH accordingly.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
WITH A, B, COLLECT(DISTINCT x) as exceptions
OPTIONAL MATCH (A)-[:define]->(x)-[:same]-(y)<-[:define]-(B)
WHERE x NOT IN exceptions AND y NOT IN exceptions
WITH A, B, exceptions + COLLECT(DISTINCT x) + COLLECT(DISTINCT y) as allExceptions
MATCH (aNode)
WHERE aNode NOT IN allExceptions AND aNode <> A AND aNode <> B
RETURN aNode
Also, you should really be using labels on your nodes. The final match will match all nodes in your graph and will have to filter down otherwise.
EDIT
Regarding your #3 requirement, the SIZE() function will be very helpful here, as you can get the size of a pattern match, and it will tell you the number of occurrences of that pattern.
The approach on this query is to first get the collection of nodes defined by A or B, then filter down to the nodes where the number of :defines relationships from A are not equal to the number of :defines relationships from B.
While we would like to use something like a UNION WITH in order to get the union of nodes defined by A and union it with the nodes defined by B, Neo4j's UNION support is weak right now, as it doesn't let you do any additional operations after the UNION happens, so instead we have to resort to adding both sets of nodes into the same collection then unwinding them back into rows.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)
WITH A, B, COLLECT(x) as middleNodes
MATCH (B)-[:define]->(x)
WITH A, B, middleNodes + COLLECT(x) as allMiddles
UNWIND allMiddles as middle
WITH DISTINCT A, B, middle
WHERE SIZE((A)-[:define]->(middle)) <> SIZE((B)-[:define]->(middle))
RETURN middle

Cypher query that will return only 1 relation of each type between two nodes

How can I craft a query that will return only one relation of a certain type between two nodes?
For example:
MATCH (a)-[r:InteractsWith*..5]->(b) RETURN a,r,b
Because (a) may have interacted with (b) many times, the result will contain many relations between the two. However, the relations are not identical. They have different properties because they occurred at different points in time.
But what if you're only interested in the fact that they have interacted at least once?
Instead of the result as it appears currently I'd like to receive a result that has either:
Only one random relation from the set of relations between (a) and (b)
Only those relations that fit to some criteria (e.g. "newest" or one of each type, ...)
One approach I have thought of is creating new relations of the type "hasEverInteractedWith". But there should be another way, right?
Use shortestPath() to get the quickest single result.
MATCH (a)-[:InteractsWith*..5]->(b)
WITH DISTINCT a, b
MATCH p = shortestPath((a)-[:InteractsWith*..5]->(b))
RETURN a, b, RELATIONSHIPS(p) AS r
If you want to get a specific one, you'll have to get all of the r and then filter them down, which will be slower (but provide more context).
MATCH (a)-[r:InteractsWith*..5]->(b)
WITH a, b, COLLECT(r) AS rs
RETURN a, b, REDUCE(s = HEAD(rs), r IN TAIL(rs)|CASE WHEN s.date > r.date THEN s ELSE r END)

Get nodes connected only to other nodes that have a property in a range

I have 4 types of nodes: S, G, R and C
S nodes have an idStr property that identifies them.
Every node of type G uses just a S node: (:G)-[:USES]->(:S)
Every node of type C may be connected to multiple R or G nodes: (:C)-[:CONNECTED_TO]->(:R|:G)
Every node of type R may be connected to multiple R or G nodes: (:R)-[:CONNECTED_TO]->(:R|:G)
Question:
Given an idStr range, I want to get all R and C nodes that are connected (directly or indirectly) only to G nodes that use S nodes with an idStr in that range.
The closest approach I have achieved is:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs
MATCH p=(n)-[:CONNECTED_TO*]->(c:G)
WITH FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
but still some nodes that are connected to G nodes that use S nodes not in the range are being returned. [Neo4j Console Test]
What am I trying to do?
First match is used to get two things: G nodes that use S nodes with idStr in the given range (GroupGs) and the C nodes that are connected to those G nodes.
Once we get that, we have to check if those C nodes are connected to more G nodes (directly or through R nodes). That is the second match.
Now we have to check for each C node if all the G nodes connected to it (directly or through R nodes) are in the GroupGs range. If it is so, that C node (and the R nodes in the paths to the G nodes) are a match, and that is what I am trying to get.
Second approach (suggested by #FrobberOfBits)
Trying to use just one match, so we are sure the n node is the same in the matching:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C), p=(n)-[:CONNECTED_TO*]->(c:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs, FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
The result is the same. [Neo4j Console Test]
Third approach (suggested by #FrobberOfBits)
Giving semantics to the problem, C may be an endpoint in a network, R a repeater, G a gateway and S a Sim card.
Sim nodes have an iccid property that identifies them.
Every node of type Gateway uses just a Sim node: (:Gateway)-[:USES]->(:Sim)
Every node of type Endpoint may be connected to multiple Repeater or Gateway nodes: (:Endpoint)-[:CONNECTED_TO]->(:Repeater|:Gateway)
Every node of type Repeater may be connected to multiple Repeater or Gateway nodes: (:Repeater)-[:CONNECTED_TO]->(:Repeater|:Gateway)
I am trying to get all the Repeater and Endpoint nodes that are just connected to Gateway nodes that are using Sim nodes whose iccid are in a range.
Any idea about what am I doing wrong?
Your query is really confusing things with the variables you choose -- binding "a" to label S's, and "b" to label G's? Later binding "c" to "G's" in the second match clause? This query is going to be hard to debug in the future, and makes it hard to see what's going on; consider binding label "G" to "g", or "gs", or similar, and so on.
I think your problem is the second match clause. The (c:G) in the second match clause doesn't relate to anything in the first (which is (b:G)). This means that the path via a set of CONNECTED_TO* relationships from some node to some (c:G) has nothing to do with the complex match on the first line of the query. This second match matches anything labeled G, not just the things you specify in the first match.
That second match is bad because of the requirement you stated:
only to G nodes that use S nodes with an idStr in that range
I don't have your test data, so I can't verify that this works. But here's something to try instead:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C),
p=(n)-[:CONNECTED_TO*]->(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs,
FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
Apologies if the syntax edited here isn't perfect; this is a complex query and is going to take some fiddling, but I think the placement and mis-labeling of that second MATCH is your issue. My solution may not be perfect and may require tinkering, but should get you there.
I think I finally got it:
MATCH (a:S)<-[:USES]-(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(b) AS GroupGs
MATCH (c)-[:CONNECTED_TO*]->(d:G)
WHERE NOT d IN GroupGs
WITH COLLECT(c) AS badCandidates,GroupGs
MATCH (e)-[:CONNECTED_TO*]->(f:G)
WHERE NOT e IN badCandidates AND f IN GroupGs
RETURN e
First I get GroupGs: all the G nodes that use a S node with an idStr property in the given range.
Now I collect all the C and R nodes that are connected to a G node not in the GroupGs and I call them badCandidates.
Finally, I get all the C and R nodes that are not in the badCandidates collection and are connected to a G node in the GroupGs.
Here you have an example: [Neo4j Console Test]
I hope this helps someone.

Resources