Cypher: extract only nodes with independent connections to two other nodes? - neo4j

I have the following query:
MATCH (rebecca:Person)-[r1*1..3]-(robert:Person)
WHERE rebecca.name='Rebecca' AND robert.name='Robert'
RETURN rebecca, robert,
extract(x IN r1 | {rel: x, start: startNode(x), end: endNode(x)})
This returns all the nodes and edges within 3 hops of both Rebecca and Robert. So it includes some nodes that are 2 hops from Rebecca and 3 hops from Robert, where his connection is via Rebecca.
Is there a way I can exclude the nodes for which Robert's only connection is via Rebecca, and vice versa?
I'm interested in connections that they truly share independently, not where the only connection is via each other.

With 3 hops, the only possible way to go through the other is if (from Rebecca for this example):
rebecca->robert->node->robert
So robert would have to appear twice (or rebecca, from the other direction).
It seems to me that you just need a restriction that rebecca and robert must appear only once each in the path:
MATCH p=(rebecca:Person)-[r1*1..3]-(robert:Person)
WHERE rebecca.name='Rebecca' AND robert.name='Robert'
AND single(n IN nodes(p) WHERE n = rebecca) AND single(n IN nodes(p) WHERE n = robert)
RETURN rebecca, robert,
extract(x IN r1 | {rel: x, start: startNode(x), end: endNode(x)})
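To see what the "appear only once" restriction filters out, here is a minimal plain-Python sketch. The graph, the names other than Rebecca and Robert, and the helper `paths` are made up for illustration; the enumeration only mimics Cypher's rule that a relationship may be used once per path while nodes may repeat.

```python
# Hypothetical undirected relationships; Robert and Bob are linked twice.
EDGES = [
    ("Rebecca", "Robert"),
    ("Rebecca", "Alice"),
    ("Alice", "Robert"),
    ("Robert", "Bob"),
    ("Bob", "Robert"),
]

def paths(start, end, max_hops):
    """Enumerate paths of 1..max_hops edges from start to end.
    Each edge is used at most once per path, but nodes may repeat."""
    found = []

    def walk(node, used, trail):
        if used and node == end:
            found.append(trail)
        if len(used) == max_hops:
            return
        for i, (a, b) in enumerate(EDGES):
            if i in used or node not in (a, b):
                continue
            nxt = b if node == a else a
            walk(nxt, used | {i}, trail + [nxt])

    walk(start, frozenset(), [start])
    return found

all_paths = paths("Rebecca", "Robert", 3)
# Keep only paths in which each endpoint occurs exactly once.
independent = [p for p in all_paths
               if p.count("Rebecca") == 1 and p.count("Robert") == 1]
print(independent)
```

In this toy graph the path Rebecca, Robert, Bob, Robert is enumerated but dropped by the filter, because Robert occurs twice in it.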

Related

How can I find the set of lowest parents (or ancestors) nodes of P1 that cut all paths between P1 and P2

Suppose that I have the following graph. How can I find the lowest set of nodes that cut all (oriented) paths between nodes [1,2,3,4] and node [66]? In my case I want to find nodes [11, 5, 9, 6] (node 7 must be excluded because nodes 6 and 9 are lower ancestors of nodes 3 and 4). Thank you for your help.
I have an answer to a similar question here.
Here are the steps I would take:
1. Match to the starting nodes [1,2,3,4] and collect them for later.
2. Match to and expand out from your end node (66) to all connected nodes (using the directed pattern), blacklisting the starting nodes from step 1 (so we don't include paths to those nodes or beyond them). Collect these nodes as descendents.
3. Expand from your starting nodes, terminating at descendents (so we get paths to the first descendent encountered, but don't continue expanding past any of them).
Cypher doesn't have great support for terminating an expansion at specific nodes, which step 3 requires, so we need the path expander procedures from APOC Procedures for that.
Let's say that these are nodes of type :Node with id properties for the numeric values, with :PARENT relationships between them pointing toward parents/ancestors. Let's say we have an index on :Node(id) for quick lookup. Using APOC our query would look something like:
MATCH (n:Node)
WHERE n.id IN [1,2,3,4]
WITH collect(n) as startNodes
MATCH path = (end:Node {id:66})<-[:PARENT*]-(descendent)
WHERE none(node in nodes(path) WHERE node in startNodes)
WITH startNodes, end, collect(DISTINCT descendent) as descendents
CALL apoc.path.subgraphNodes(startNodes, {terminatorNodes:descendents}) YIELD node as mostRecentDescendents
RETURN mostRecentDescendents
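The three steps can also be modelled without a database. The following plain-Python sketch uses a small hypothetical child-to-parents map (not the graph from the question) and follows :PARENT edges only in the child-to-parent direction, which is an assumption of this sketch rather than something the APOC call above enforces.

```python
from collections import deque

# Hypothetical child -> parents map (a :PARENT edge points toward the parent).
PARENTS = {1: [5], 2: [5], 3: [6], 4: [6, 8],
           5: [7], 6: [7], 7: [66], 8: [], 66: []}

def nodes_reaching(end, blocked):
    """Step 2: every node with a directed path to `end` that avoids `blocked`."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen, queue = set(), deque([end])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen and child not in blocked:
                seen.add(child)
                queue.append(child)
    return seen

def first_hits(starts, terminators):
    """Step 3: expand upward from `starts`, stopping at the first
    terminator met on each branch."""
    hits, seen, queue = set(), set(starts), deque(starts)
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent in terminators:
                hits.add(parent)          # do not expand past a terminator
            elif parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return hits

start_nodes = [1, 2, 3, 4]                                      # step 1
descendents = nodes_reaching(66, blocked=set(start_nodes))      # step 2
print(sorted(first_hits(start_nodes, descendents)))             # prints [5, 6]
```

Node 7 also leads to 66 in this toy graph, but it is never reported because every branch stops at 5 or 6 first. Node 8 is visited but ignored since it has no path to 66.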

neo4j get random path from known node

I have a big neo4j db with info about celebs, all of them have relations with many others, they are linked, dated, married to each other. So I need to get random path from one celeb with defined count of relations (5). I don't care who will be in this chain, the only condition I have I shouldn't have repeated celebs in chain.
To be more clear: I need to get "new" chain after each query, for example:
I try to get chain started with Rita Ora
She has relations with Drake, Jay Z and Justin Bieber
Query takes random from these guys, for example Jay Z
Then Query takes relations of Jay Z: Karrine Steffans, Rosario Dawson and Rita Ora
Query can't take Rita Ora cuz she is already in chain, so it takes random from others two, for example Rosario Dawson
...
And at the end we should have a chain Rita Ora - Jay Z - Rosario Dawson - other celeb - other celeb 2
Is it possible to do this with a query?
This is doable in Cypher, but it's quite tricky. You mention that
the only condition I have I shouldn't have repeated celebs in chain.
This condition could be captured by using node-isomorphic pattern matching, which requires all nodes in a path to be unique. Unfortunately, this is not yet supported in Cypher. It is proposed as part of the openCypher project, but is still work-in-progress. Currently, Cypher only supports relationship uniqueness, which is not enough for this use case as there are multiple relationship types (e.g. A is married to B, but B also collaborated with A, so we already have a duplicate with only two nodes).
APOC solution. If you can use the APOC library, take a look at the path expander, which supports various uniqueness constraints, including NODE_GLOBAL.
Plain Cypher solution. To work around this limitation, you can capture the node uniqueness constraint with a filtering operation:
MATCH p = (c1:Celebrity {name: 'Rita Ora'})-[*5]-(c2:Celebrity)
UNWIND nodes(p) AS node
WITH p, count(DISTINCT node) AS countNodes
WHERE countNodes = length(p) + 1
RETURN p
LIMIT 1
Performance-wise this should be okay as long as you limit its results, because the query engine will basically keep enumerating new paths until one of them passes the filtering test.
The goal of the UNWIND nodes(p) AS node WITH count(DISTINCT node) ... construct is to remove duplicates from the list of nodes by first UNWIND-ing it to separate rows, then counting the unique nodes using DISTINCT. A path with 5 relationships contains 6 nodes, that is length(p) + 1, so we check whether the number of unique nodes still equals length(p) + 1. If so, the original list was also unique and we RETURN the results.
Note. Instead of UNWIND and count(DISTINCT ...), getting unique elements from a list could be expressed in other ways:
(1) Using a list comprehension and ranges:
WITH [1, 2, 2, 3, 2] AS l
RETURN [i IN range(0, size(l)-1) WHERE NOT l[i] IN l[0..i] | l[i]]
(2) Using reduce:
WITH [1, 2, 2, 3, 2] AS l
RETURN reduce(acc = [], i IN l | acc + CASE NOT i IN acc WHEN true THEN [i] ELSE [] END)
However, I believe both forms are less readable than the original one.
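The node-uniqueness filter itself is easy to model in plain Python: a path has no repeated nodes exactly when the set of its nodes is as large as its node list. The relationships and the helper `paths_of_length` below are hypothetical, and the enumeration only imitates relationship uniqueness (each relationship used at most once per path).

```python
# Made-up relationships; two celebrities can be linked more than once.
RELS = [
    ("Rita Ora", "Jay Z"),
    ("Rita Ora", "Jay Z"),          # a second relationship between the same pair
    ("Jay Z", "Rosario Dawson"),
]

def paths_of_length(start, hops):
    """All paths with exactly `hops` relationships,
    each relationship used at most once per path."""
    results = []

    def walk(node, used, trail):
        if len(used) == hops:
            results.append(trail)
            return
        for i, (a, b) in enumerate(RELS):
            if i not in used and node in (a, b):
                nxt = b if node == a else a
                walk(nxt, used | {i}, trail + [nxt])

    walk(start, frozenset(), [start])
    return results

all_paths = paths_of_length("Rita Ora", 2)
# Same idea as comparing count(DISTINCT node) with the number of nodes.
unique = [p for p in all_paths if len(set(p)) == len(p)]
print(unique)
```

Here relationship uniqueness alone still lets through Rita Ora, Jay Z, Rita Ora (via the two parallel relationships), and the set-size check removes it.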

Find the distance in a path between each node and the last node of the path

I am very new to Cypher and I need help to solve a problem I am facing.
In my graph I have a path representing a data stream and I need to know, for each node in the path, the distance from the last node of the path.
For example, if I have the following path:
(a)->(b)->(c)->(d)
the distance must be 3 for a, 2 for b, 1 for c and 0 for d.
Is there an efficient way to obtain this result in Cypher?
Thanks a lot!
Mauro
If it is just hops between nodes then I think this will fit the bill.
match p=(a:Test {name: 'A'})-[r*3]->(d:Test {name: 'D'})
with p, range(length(p),0,-1) as idx
unwind idx as elem
return (nodes(p)[elem]).name as Node
, length(p) - elem as Distance
order by Node
In this answer, I define a path to be "complete" if its start node has no incoming relationship and its end node has no outgoing relationship.
This query returns, for each "complete" path, a collection of objects containing each node's neo4j-generated ID and the number of hops to the end of that path:
MATCH p=(x)-[*]->(y)
WHERE (NOT ()-->(x)) AND (NOT (y)-->())
WITH NODES(p) AS np, LENGTH(p) AS lp
RETURN EXTRACT(i IN RANGE(0, lp, 1) | {id: ID(np[i]), hops: lp - i})
NOTE: Matching with [*] will be costly with large graphs, so you may need to limit the maximum hop value. For example, use [*..4] instead to limit the max hop value to 4.
Also, qualifying the query with appropriate node labels and relationship types may speed it up.
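Both queries rely on the same arithmetic: a node at index i in a path of length lp is lp - i hops from the end. A minimal Python sketch of that calculation, with a hypothetical helper name:

```python
def hops_to_end(path_nodes):
    """Pair each node with its number of hops to the last node of the path."""
    last = len(path_nodes) - 1      # number of relationships, like length(p)
    return [(node, last - i) for i, node in enumerate(path_nodes)]

print(hops_to_end(["a", "b", "c", "d"]))
# prints [('a', 3), ('b', 2), ('c', 1), ('d', 0)]
```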

Multiple relationships in Match Cypher

I am trying to find similar movies on the basis of tags. But I also need all the tags for the given movie and for each of its similar movies (to do some calculations). Surprisingly, collect(h.w) gives repeated values of h.w (where w is a property of h).
Here is the cypher query. Please help.
MATCH (m:Movie{id:1})-[h1:Has]->(t:Tag)<-[h2:Has]-(sm:Movie),
(m)-[h:Has]->(t0:Tag),
(sm)-[H:Has]->(t1:Tag)
WHERE m <> sm
RETURN distinct(sm), collect(h.w)
Basically a query like
MATCH (x)-[h]->(y), (a)-[H]->(b)
RETURN h
is returning each result for h n times where n is the number of results for H. Any way around this?
I replicated the data model for this question to help answer it.
I then set up a sample dataset using Neo4j's online console: http://console.neo4j.org/?id=dakmi3
Running the following query from your question:
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
RETURN DISTINCT sm, collect(h.weight)
Which results in:
(1:Movie {title:"The Matrix: Reloaded"}) [0.31, 0.12, 0.31, 0.12, 0.31, 0.01, 0.31, 0.01]
The issue is that there are duplicate relationships being returned, which results in duplicated weight in the collection. The solution is to use WITH to limit relationships to distinct records and then return the collection of weights of those relationships.
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH DISTINCT sm, h
RETURN sm, collect(h.weight)
(1:Movie {title:"The Matrix: Reloaded"}) [0.31, 0.12, 0.01]
I'm afraid I still don't quite get your intention, but about the general question of duplicate results, that is just the way a disconnected pattern works. Cypher must consider something like
(:A), (:B)
as one pattern, not two. That means that any satisfying graph structure is considered a distinct match. Suppose you have the graph resulting from
CREATE (:A), (:B), (:B)
and query it for the pattern above, you get two results, namely
neo4j-sh (?)$ MATCH (a:A),(b:B) RETURN *;
==> +-------------------------------+
==> | a | b |
==> +-------------------------------+
==> | Node[15204]{} | Node[15207]{} |
==> | Node[15204]{} | Node[15208]{} |
==> +-------------------------------+
==> 2 rows
==> 53 ms
Similarly, when matching your pattern (x)-[h]->(y), (a)-[H]->(b), Cypher considers each combination of the two pattern parts to make up a unique match for the one whole pattern, so the results for h are compounded by the results for H.
This is the way the pattern matching works. To achieve what you want you could first consider if you really need to query for a disconnected pattern. If you do, or if a connected pattern also generates redundant matches, then aggregate one or more of the pattern parts. A simple case might be
CREATE (a:A), (b1:B), (b2:B)
, (c1:C), (c2:C), (c3:C)
, (a)-[:X]->(b1), (a)-[:X]->(b2)
, (a)-[:Y]->(c1), (a)-[:Y]->(c2), (a)-[:Y]->(c3)
queried with
MATCH (b:B)<-[:X]-(a:A)-[:Y]->(c:C) // with 1 (a), 2 (b) and 3 (c) you get 6 matched paths
RETURN a, collect(DISTINCT b) as bb, collect(DISTINCT c) as cc // after aggregation by (a) there is one row
Sometimes it makes sense to do the aggregation as an intermediate step
MATCH (b)<-[:X]-(a:A) // 2 paths
WITH a, collect(b) as bb // 1 path
MATCH (a)-[:Y]->(c) // 3 paths
RETURN a, bb, collect(c) as cc // 1 path
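The multiplication of rows can be reproduced in a few lines of Python. This is only a model of the matching behaviour, using the same 1 (a), 2 (b) and 3 (c) layout as the example; the variable names are illustrative.

```python
from itertools import product

x_rels = [("a", "b1"), ("a", "b2")]                 # a-[:X]->b
y_rels = [("a", "c1"), ("a", "c2"), ("a", "c3")]    # a-[:Y]->c

# One row per combination, as for MATCH (b)<-[:X]-(a)-[:Y]->(c)
rows = [(a, b, c) for (a, b), (a2, c) in product(x_rels, y_rels) if a == a2]

# Collecting b straight from the rows repeats each b once per matching c
bb_raw = [b for _, b, _ in rows]

# De-duplicating, or aggregating b before matching c, removes the repeats
bb = sorted(set(bb_raw))

print(len(rows), bb_raw, bb)
```

Each b shows up three times in `bb_raw` because it is paired with every c, which is the same effect as the repeated weights in the question.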

finding the farthest node using Neo4j (node without any incoming relation)

I have created a graph db in Neo4j and want to use it for generalization purposes.
There are about 500,000 nodes (20 distinct labels) and 2.5 million relations (50 distinct types) between them.
In an example path : a -> b -> c-> d -> e
I want to find out the node without any incoming relations (which is 'a').
And I should do this for all the nodes (finding the nodes at the beginning of all possible paths that have no incoming relations).
I have tried several Cypher queries without any success:
match (a:type_A)-[r:is_a]->(b:type_A)
with a,count (r) as count
where count = 0
set a.isFirst = 'true'
or
match (a:type_A), (b:type_A)
where not (a)<-[:is_a*..]-(b)
set a.isFirst = 'true'
Where is the problem?!
Also, I have to create this code in neo4jClient, too.
Your first query will only match paths where there is a relationship [r:is_a], so counting r can never be 0. Your second query will match any arbitrary pair of nodes labeled :type_A that aren't transitively related by [:is_a]. What you want is to filter on a path predicate. For the general case try
MATCH (a)
WHERE NOT ()-->(a)
RETURN a
This translates roughly to "any node that does not have incoming relationships". You can specify the pattern with types, properties or labels as needed, for instance
MATCH (a:type_A)
WHERE NOT ()-[:is_a]->(a)
RETURN a
If you want to find all nodes that have no incoming relationships, you can find them using OPTIONAL MATCH:
MATCH (n)
OPTIONAL MATCH (n)<-[r]-()
WITH n, r
WHERE r IS NULL
RETURN n
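The underlying idea is a set difference: take all nodes and remove every node that is the target of some relationship. A minimal Python sketch on the example path from the question (the edge list is illustrative):

```python
# Directed edges of the example path a -> b -> c -> d -> e
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

nodes = {n for edge in EDGES for n in edge}
has_incoming = {target for _, target in EDGES}

# Nodes that are never the target of an edge have no incoming relationships
roots = sorted(nodes - has_incoming)
print(roots)  # prints ['a']
```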

Resources