How to avoid cycles in Neo4j Cypher queries

I have a friend-friend data model in which any two friend nodes can have two relationships between them, depending on how each friend defines the other.
For example, user "A" can define user "B" as 'FRIEND', and "B" can define "A" as 'BUDDY'.
The problem is that when I try to get the 3rd-degree relationships of user "A", the query returns user "B", whereas the actual result should be "D" only.
MATCH(a:Users {first_name : "A"}) -[:BUDDY|FRIEND*3] -> (b)
RETURN a,b
OR
MATCH (a)-[]-(b)-[]-(c)-[]-(d)
WHERE a.first_name="A"
RETURN a,d

Alternatively, you can do this:
MATCH p=((a:Users {first_name : "A"})-[:BUDDY|FRIEND*3]->(b))
WITH DISTINCT a, b, nodes(p) as nodes
UNWIND nodes AS node
WITH a, b, nodes, COLLECT(DISTINCT node) as distinct_nodes
WHERE SIZE(nodes)=SIZE(distinct_nodes)
RETURN a, b
or a bit easier with an APOC call:
MATCH p=((a:Users {first_name : "A"})-[:BUDDY|FRIEND*3]->(b))
WHERE SIZE(nodes(p)) = SIZE(apoc.coll.toSet(nodes(p)))
RETURN DISTINCT a, b

I'd suggest the APOC path expander procedures. They use a means of expansion that only ever considers a single path to a node, let you specify the min and max depth, take relationship filters, and let you set whether visiting a node more than once is permitted. Specifically, the apoc.path.expandConfig() procedure should meet your needs.
MATCH (a:Users {first_name: "A"})
CALL apoc.path.expandConfig(a, {relationshipFilter:"BUDDY|FRIEND",minLevel:3,maxLevel:3, bfs:true,uniqueness:"NODE_GLOBAL"}) YIELD path
RETURN a, path
The uniqueness:"NODE_GLOBAL" parameter makes sure no node is visited more than once.
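To make the node-uniqueness idea concrete, here is a minimal sketch in plain Python rather than Cypher, so it runs without a database. The four-node graph is a hypothetical stand-in for the question's data, and the sketch only models the "no repeated node in a path" check; it does not model Cypher's own rule that a relationship cannot be reused within one path, and it is not how APOC is implemented.

```python
from typing import Dict, List

# Hypothetical directed graph: A -> B (FRIEND), B -> A (BUDDY), B -> C, C -> D.
GRAPH: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["D"],
    "D": [],
}


def paths_of_length(graph: Dict[str, List[str]], start: str, hops: int) -> List[List[str]]:
    """Return every directed path from `start` with exactly `hops` relationships."""
    paths = [[start]]
    for _ in range(hops):
        paths = [path + [nxt] for path in paths for nxt in graph.get(path[-1], [])]
    return paths


def acyclic_end_nodes(graph: Dict[str, List[str]], start: str, hops: int) -> List[str]:
    """Keep only paths whose nodes are all distinct and return their end nodes."""
    ends = {
        path[-1]
        for path in paths_of_length(graph, start, hops)
        if len(path) == len(set(path))  # same test as SIZE(nodes) = SIZE(distinct nodes)
    }
    return sorted(ends)


print(paths_of_length(GRAPH, "A", 3))    # includes the cyclic path A, B, A, B
print(acyclic_end_nodes(GRAPH, "A", 3))  # ['D']
```

The length comparison is the same trick as the SIZE(...) = SIZE(...) filter in the Cypher answers: a path that revisits a node has fewer distinct nodes than total nodes.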

Related

neo4j Cypher -- traverse variable length path, but stop when a label is found

Suppose my graph database has a 'flow' of foo nodes. In between each pair of foo nodes there might be any number of bar, bar1, bar2, ...barN nodes, which ultimately connect to the next foo node.
So, all of these are possible
(a:foo)-->(:bar)-->(b:foo)
(b:foo)-->(:bar)-->(c:foo)-->(:bar1)-->(d:foo)
(a:foo)-->(:bar)-->(:bar1)-->(:bar2)-->(:barN)-->(c:foo)
etc.
I'd like to return each distinct pair of foo nodes which do NOT have any other foo nodes between them.
For the above examples, the solution should return:
a, b
b, c
c, d
a, c
Solution should NOT include the following, which have foo nodes between them:
b, d
a, d
What I've tried: this returns all foo pairs that connect, regardless of what's in between.
MATCH x=(a:foo)-[:RELTYPE*1..]->(b:foo)
RETURN a,b
This should work:
MATCH x = (a:foo)-[:RELTYPE*..]->(b:foo)
WHERE NONE(n IN NODES(x)[1..-1] WHERE ANY(l IN LABELS(n) WHERE l = 'foo'))
RETURN a, b
[UPDATE]
Or even better:
MATCH x = (a:foo)-[:RELTYPE*..]->(b:foo)
WHERE NONE(n IN NODES(x)[1..-1] WHERE n:foo)
RETURN a, b
You can also use the path expander procedures in APOC Procedures, which can handle this kind of use case.
Using this graph:
CREATE (a:foo {name:'a'}), (b:foo {name:'b'}), (c:foo {name:'c'}), (d:foo {name:'d'}),
(a)-[:RELTYPE]->(:bar)-[:RELTYPE]->(b),
(b)-[:RELTYPE]->(:bar)-[:RELTYPE]->(c)-[:RELTYPE]->(:bar1)-[:RELTYPE]->(d),
(a)-[:RELTYPE]->(:bar)-[:RELTYPE]->(:bar1)-[:RELTYPE]->(:bar2)-[:RELTYPE]->(:barN)-[:RELTYPE]->(c)
We can apply this query:
MATCH (start:foo)
CALL apoc.path.subgraphNodes(start, {relationshipFilter:'RELTYPE>', labelFilter:'/foo'}) YIELD node as end
RETURN start, end
This starts at every :foo node, traverses only outgoing :RELTYPE relationships, and will terminate expansion when reaching a :foo labeled node (the / before the 'foo' in the label filter indicates that this is a termination filter on the label).
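As a rough illustration of what a termination filter does, here is a plain-Python sketch (not Cypher, and not APOC's actual implementation) over the same graph as the CREATE statement above. The names x1 to x7 for the unnamed bar nodes are made up for this sketch.

```python
from typing import Dict, List, Tuple

# Labels and outgoing RELTYPE edges mirroring the CREATE statement;
# x1..x7 are hypothetical names for the anonymous bar/bar1/bar2/barN nodes.
LABELS: Dict[str, str] = {
    "a": "foo", "b": "foo", "c": "foo", "d": "foo",
    "x1": "bar", "x2": "bar", "x3": "bar1",
    "x4": "bar", "x5": "bar1", "x6": "bar2", "x7": "barN",
}
EDGES: Dict[str, List[str]] = {
    "a": ["x1", "x4"], "x1": ["b"],
    "b": ["x2"], "x2": ["c"],
    "c": ["x3"], "x3": ["d"],
    "x4": ["x5"], "x5": ["x6"], "x6": ["x7"], "x7": ["c"],
}


def adjacent_foo_pairs(labels: Dict[str, str], edges: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pairs of foo nodes connected by a path with no other foo node in between."""
    pairs = set()
    for start in (n for n, label in labels.items() if label == "foo"):
        stack = list(edges.get(start, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if labels[node] == "foo":
                pairs.add((start, node))  # terminate: never expand past a foo node
            else:
                stack.extend(edges.get(node, []))
    return sorted(pairs)


print(adjacent_foo_pairs(LABELS, EDGES))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
```

Because expansion stops at the first foo node on every branch, pairs such as (b, d) and (a, d) never appear.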

Neo4J Cypher - Unwinding nodes from a path variable

I have seen examples of unwinding lists, but not of unwinding a list of paths. How can I find, for example, all of the shortest paths between one type of node and another, and return or get the nodes that are found, in this example the nodes specifically being b.
MATCH p = allShortestPaths((a: person)-[:PARENT_OF]-(b: person))
UNWIND nodes(p) ... //get all of the b nodes
RETURN b
Note: I would like to use b within the query for another purpose (omitted), and therefore need to unwind the path into a list of b nodes.
After matching all shortest paths, if you just want the b nodes as a result, you may simply RETURN b. I don't believe you need to UNWIND it, since b is clearly identified in your MATCH
Edit:
MATCH p = allShortestPaths((a: person)-[:PARENT_OF]-(b: person))
WITH collect(b) as bees
UNWIND bees as b
//do something
return b
It seems like you just want to see all person nodes that have an incoming PARENT_OF relationship from another person node. If so, this should work:
MATCH ()-[:PARENT_OF]->(b:person)
RETURN DISTINCT b;
MATCH p = allShortestPaths((a: person)-[:PARENT_OF]-(b: person))
with nodes(p) as nodes
with nodes[size(nodes)-1] as b
return b

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then for each connected node (b) I want to return the SUM of the weights that connect to the query subset
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important, as I have 30,000 nodes and 1.5M edges. If you have any suggestions regarding this, please throw them into the mix.
Many thanks
Matt
Why do you need so many MATCH statements? You can specify the a nodes and the b nodes in a single MATCH statement and select only those that have a relationship between them.
After that, just return the b nodes and the sum of the weights. When b is returned alongside an aggregation function such as sum(), it automatically acts as the grouping key.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
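The grouping behaviour can be mimicked in plain Python to check the expected totals on a small case. This is only a sketch with a made-up edge list (the names q1, q2, b1, b2 and the weights are hypothetical), not a replacement for the Cypher query.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical undirected weighted edges: (node, node, weight).
EDGES: List[Tuple[str, str, float]] = [
    ("q1", "b1", 2.0),
    ("q2", "b1", 3.0),
    ("q1", "b2", 1.0),
    ("q1", "q2", 9.0),  # both ends inside the subset: ignored
    ("b1", "b2", 4.0),  # both ends outside the subset: ignored
]


def total_weight_to_subset(
    edges: Iterable[Tuple[str, str, float]], subset: Iterable[str]
) -> List[Tuple[str, float]]:
    """Sum edge weights per outside node, ordered by ascending total."""
    inside = set(subset)
    totals: Dict[str, float] = defaultdict(float)
    for u, v, weight in edges:
        if (u in inside) != (v in inside):  # exactly one end is in the subset
            outside = v if u in inside else u
            totals[outside] += weight       # the outside node is the grouping key
    return sorted(totals.items(), key=lambda item: item[1])


print(total_weight_to_subset(EDGES, ["q1", "q2"]))
# [('b2', 1.0), ('b1', 5.0)]
```

The dictionary key plays the role that b plays in RETURN b, sum(r.weight): one running total per non-aggregated value.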

Filtering out nodes on two cypher paths

I have a simplified Neo4j graph (old version 2.x), shown in the image, with 'define' and 'same' edges. Assume the number on each define edge is a property on that edge.
The queries I would like to run are:
1) Find nodes defined by both A and B -- Required result: C, C, D
START A=node(885), B=node(996) MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
The above works and returns C and D. But I want C twice, since it's defined twice. Without the DISTINCT on x, however, it returns all the paths from A to B.
2) Find nodes that are NOT (defined by both A,B OR are defined by both A,B but connected via a same edge) -- Required result: G
Something like:
R1: MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
R2: MATCH (A-[:define]->(e)-(:similar)-(f)<-[:define]-B) RETURN e,f
(Nodes defined by A - (R1+R2) )
3) Find 'middle' nodes that do not have matching calls from both A and B --Required result: C,G
I want to output C due to the 1 define(either 45/46) that does not have a matching define from B.
Also output G because there's no define to G from B.
Appreciate any help on this!
Your syntax is a bit strange to me, so I'm going to assume you're using an older version of Neo4j. We should be able to use the same approaches, though.
For #1, your proposed match without DISTINCT really should be working. The only thing I can see is that the parentheses around the A and B node variables are missing.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
RETURN x
Also, I'm not sure what you mean by "returns all paths from A to B." Can you clarify that, and provide an example of the output?
As for #2, we'll need several parts to this query, separated with WITH accordingly.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
WITH A, B, COLLECT(DISTINCT x) as exceptions
OPTIONAL MATCH (A)-[:define]->(x)-[:same]-(y)<-[:define]-(B)
WHERE NOT x IN exceptions AND NOT y IN exceptions
WITH A, B, exceptions + COLLECT(DISTINCT x) + COLLECT(DISTINCT y) as allExceptions
MATCH (aNode)
WHERE NOT aNode IN allExceptions AND aNode <> A AND aNode <> B
RETURN aNode
Also, you should really be using labels on your nodes. The final match will match all nodes in your graph and will have to filter down otherwise.
EDIT
Regarding your #3 requirement, the SIZE() function will be very helpful here, as you can get the size of a pattern match, and it will tell you the number of occurrences of that pattern.
The approach in this query is to first get the collection of nodes defined by A or B, then filter down to the nodes where the number of :define relationships from A is not equal to the number of :define relationships from B.
Ideally we would take the nodes defined by A and UNION them with the nodes defined by B, then keep working on the result. Neo4j's UNION support is weak right now, though, as it doesn't let you do any additional operations after the UNION happens. Instead we have to resort to adding both sets of nodes into the same collection and then unwinding them back into rows.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)
WITH A, B, COLLECT(x) as middleNodes
MATCH (B)-[:define]->(x)
WITH A, B, middleNodes + COLLECT(x) as allMiddles
UNWIND allMiddles as middle
WITH DISTINCT A, B, middle
WHERE SIZE((A)-[:define]->(middle)) <> SIZE((B)-[:define]->(middle))
RETURN middle
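The count comparison in that last WHERE clause can be checked on paper with a small Python sketch. The edge list below is hypothetical, since the question's image is not available: it assumes A defines C twice, B defines C once, both define D once, and only A defines G.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical :define edges as (source, target) pairs; assumed, not taken from the image.
DEFINES: List[Tuple[str, str]] = [
    ("A", "C"), ("A", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"),
    ("A", "G"),
]


def unmatched_middles(defines: Iterable[Tuple[str, str]], left: str, right: str) -> List[str]:
    """Middle nodes whose number of define edges from `left` and `right` differ."""
    defines = list(defines)
    left_counts = Counter(target for source, target in defines if source == left)
    right_counts = Counter(target for source, target in defines if source == right)
    middles = set(left_counts) | set(right_counts)  # union of both sets of middle nodes
    # Counter returns 0 for missing keys, so a node defined by only one side also differs.
    return sorted(m for m in middles if left_counts[m] != right_counts[m])


print(unmatched_middles(DEFINES, "A", "B"))  # ['C', 'G']
```

Under those assumed edges, C is reported because of the extra define from A, and G because B has no define to it, which matches the result the question asks for in #3.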

cypher query to return or keep only the final sequence when variable length relationship identifiers are used

When variable length identifiers are used, is there a way to keep or return only the final full sequences of nodes, instead of all subpaths, so that further operations can be done on each final full sequence path?
MATCH path = (S:Person)-[rels:NEXT*]->(E:Person)................
E.g.: find all sequences of nodes whose names are in a given list, say ['graph','server','db'], where the relationships in between all have the same 'seqid' property.
i.e.
(graph)->(server)->(db) with same seqid :1
(graph)->(db)->(server) with same seqid :1 //there can be another matching sequence with same seqid
(graph)->(db)->(server) with same seqid :2
Is there a way to keep only the final sequence of nodes, say '(graph)->(server)->(db)', for each sequence, instead of every subpath of a larger sequence such as (graph)->(server) or (server)->(db)?
Please help me solve this.
(I am using neo4j 2.3.6 community edition via java api in embedded mode..)
What we could really use here is a longestSequences() function that would do exactly what you want: expand the pattern so that a and b are always matched to the start and end points of a sequence, and the matched pattern is not a subset of any other matched pattern.
I created a feature request on neo4j for exactly this: https://github.com/neo4j/neo4j/issues/7760
Until that gets implemented, we'll have to make do with some alternate approach. I think what we'll have to do is add additional matching to restrict a and b to the start and end nodes of full sequences.
Here's my proposed query:
WITH ['graph', 'server' ,'db'] as names
MATCH p=(a)-[rels:NEXT*]->(b)
WHERE ALL(n in nodes(p) WHERE n.name in names)
AND ALL( r in rels WHERE rels[0]['seqid'] = r.seqid )
WITH names, p, a, rels, b
// check if b is a subsequence node instead of an end node
OPTIONAL MATCH (b)-[rel:NEXT]->(c)
WHERE c.name in names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where b is a subsequence node
WITH names, p, a, rels, b, c
WHERE c IS NULL
WITH names, p, a, rels, b
// check if a is a subsequence node instead of a start node
OPTIONAL MATCH (d)-[rel:NEXT]->(a)
WHERE d.name in names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where a is a subsequence node
WITH p, a, b, d
WHERE d IS NULL
RETURN p, a as startNode, b as endNode
MATCH (S:Person)-[r:NEXT]->(:Person)
// Possible starting node
WHERE NOT ( (:Person)-[:NEXT {seqid: r.seqid}]->(S) )
WITH S,
// Collect all possible values of `seqid`
collect (distinct r.seqid) as seqids
UNWIND seqids as seqid
// Possible terminal node
MATCH (:Person)-[r:NEXT {seqid: seqid}]->(E:Person)
WHERE NOT ( (E)-[:NEXT {seqid: seqid}]->(:Person) )
WITH S,
seqid,
collect(distinct E) as ES
UNWIND ES as E
MATCH path = (S)-[rels:NEXT* {seqid: seqid}]->(E)
RETURN S,
seqid,
path
[EDITED]
This query might do what you want:
MATCH (p1:Person)-[rel:NEXT]->(:Person)
WHERE NOT (:Person)-[:NEXT {seqid: rel.seqid}]->(p1)
WITH DISTINCT p1, rel.seqid AS seqid
MATCH path = (p1)-[:NEXT* {seqid: seqid}]->(p2:Person)
WHERE NOT (p2)-[:NEXT {seqid: seqid}]->(:Person)
RETURN path;
It first identifies all Person nodes (p1) with at least one outgoing NEXT relationship that have no incoming NEXT relationships (with the same seqid), and their distinct outgoing seqid values. Then it finds all "complete" paths (i.e., paths whose start and end nodes have no incoming or outgoing NEXT relationships with the desired seqid, respectively) starting at each p1 node and having relationships all sharing the same seqid. Finally, it returns each complete path.
If you just want to get the name property of all the Person nodes in each path, try this query (with a different RETURN clause):
MATCH (p1:Person)-[rel:NEXT]->(:Person)
WHERE NOT (:Person)-[:NEXT {seqid: rel.seqid}]->(p1)
WITH DISTINCT p1, rel.seqid AS seqid
MATCH path = (p1)-[:NEXT* {seqid: seqid}]->(p2:Person)
WHERE NOT (p2)-[:NEXT {seqid: seqid}]->(:Person)
RETURN EXTRACT(n IN NODES(path) | n.name);
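The idea behind these queries (start only at nodes with no incoming NEXT for a given seqid, and stop only at nodes with no outgoing NEXT for it) can be sketched in plain Python. The relationship list is hypothetical, and the sketch guards against cycles by refusing to revisit a node, which is stricter than Cypher's relationship uniqueness.

```python
from typing import List, Set, Tuple

# Hypothetical NEXT relationships: (from_name, to_name, seqid).
NEXT: List[Tuple[str, str, int]] = [
    ("graph", "server", 1), ("server", "db", 1),
    ("graph", "db", 2), ("db", "server", 2),
]


def _walk(path: List[str], edges: List[Tuple[str, str]], ends: Set[str], out: List[List[str]]) -> None:
    """Extend `path` along `edges` until it reaches a node with no outgoing edge."""
    last = path[-1]
    if last in ends:
        out.append(path)
        return
    for source, target in edges:
        if source == last and target not in path:
            _walk(path + [target], edges, ends, out)


def complete_sequences(rels: List[Tuple[str, str, int]]) -> List[Tuple[int, List[str]]]:
    """Return (seqid, node names) for each full sequence, skipping all subpaths."""
    results: List[Tuple[int, List[str]]] = []
    for seqid in sorted({s for _, _, s in rels}):
        edges = [(u, v) for u, v, s in rels if s == seqid]
        has_incoming = {v for _, v in edges}
        has_outgoing = {u for u, _ in edges}
        starts = sorted(has_outgoing - has_incoming)  # no incoming NEXT with this seqid
        ends = has_incoming - has_outgoing            # no outgoing NEXT with this seqid
        for start in starts:
            found: List[List[str]] = []
            _walk([start], edges, ends, found)
            results.extend((seqid, path) for path in found)
    return results


print(complete_sequences(NEXT))
# [(1, ['graph', 'server', 'db']), (2, ['graph', 'db', 'server'])]
```

Subpaths such as graph to server never show up, because server is neither a valid start (it has an incoming NEXT) nor a valid end (it has an outgoing NEXT) for seqid 1.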
