How do I make Neo4J continue after conditional create - neo4j

After having created a collection of nodes, some of the nodes should also have a relation attached based on a condition. In the example below the condition is simulated with WHERE n.number > 3 and the nodes are simple numbers:
WITH [2, 3, 4] as numbers
UNWIND numbers AS num
CREATE(n:Number {number: num})
WITH collect(n) AS nodes
UNWIND nodes AS n
WITH nodes, n WHERE n.number > 3
CREATE (n)-[:IM_SPECIAL]->(n)
RETURN nodes
Which returns:
╒════════════════════════════════════════╕
│"nodes" │
╞════════════════════════════════════════╡
│[{"number":2},{"number":3},{"number":4}]│
└────────────────────────────────────────┘
Added 3 labels, created 3 nodes, set 3 properties, created 1 relationship, started streaming 1 records in less than 1 ms and completed after 1 ms.
My problem is that nothing is returned unless I have at least one of these "special" nodes that is caught by the filter. The problem can be simulated by changing the input numbers to [1, 2, 3] which returns an empty result (no nodes) even though the nodes are created (as they should):
<empty result>
Added 3 labels, created 3 nodes, set 3 properties, completed after 2 ms.
I might be approaching the problem totally wrong but I've exhausted my Google skills... what Neo4J Cypher magic am I missing?

The documentation section "Conditional Cypher Execution - Using correlated subqueries in 4.1+" describes how to solve this without APOC:
WITH [2, 3, 4] AS numbers
UNWIND numbers AS num
CREATE(n:Number {number: num})
WITH n
CALL {
WITH n
WITH n WHERE n.number > 3
CREATE (n)-[:IM_SPECIAL]->(n)
RETURN count(n) AS specialCount
}
RETURN collect(n) AS nodes
Thanks to Sanjay Singh and Jose Bacoy for putting me on the right track.

WITH nodes, n WHERE n.number > 3
Each clause of a Cypher query must yield at least one row for the subsequent clauses to consume. The line above yields no rows if you start with [1,2,3], so the CREATE and RETURN that follow it have nothing to operate on.
For your purpose, this will work.
WITH [1,2,3,4] as numbers
UNWIND numbers AS num
CREATE(n:Number {number: num})
WITH n
CALL apoc.do.when(n.number>3,
'CREATE (n)-[:IM_SPECIAL]->(n) RETURN n',
'RETURN n',
{n:n}
)
YIELD value
WITH collect(value.n) AS nodes
RETURN nodes
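If neither APOC nor CALL subqueries are available, a commonly used plain-Cypher idiom is to wrap the conditional write in a FOREACH over a list that is either empty or has exactly one element. A minimal sketch, reusing the label and relationship type from the question (the loop variable name ignored is arbitrary):

```cypher
WITH [1, 2, 3] AS numbers
UNWIND numbers AS num
CREATE (n:Number {number: num})
// FOREACH runs its body once when the condition holds and zero times otherwise.
// Unlike WHERE, it never removes rows from the query.
FOREACH (ignored IN CASE WHEN n.number > 3 THEN [1] ELSE [] END |
  CREATE (n)-[:IM_SPECIAL]->(n)
)
RETURN collect(n) AS nodes
```

Because no row is filtered out, the final RETURN still yields one row with all created nodes, even when none of them is "special".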

Related

Difficulty using UNWIND in Neo4j

I am very new to Neo4j, so this is probably a simple question.
I have several hundred nodes with a property "seq" (for sequence). This number basically represents the day of the month, so all of these several hundred nodes have a seq property between 1 and 31. I want to combine all the nodes with the same seq into a single node, so that all the nodes with seq = 1 are combined into a "January 1" node, all nodes with seq = 2 are combined into a "January 2" node, etc. I have a property "pat_id" whose values should be combined into an array from all the merged nodes for a day.
Here is my code:
WITH range(1,31) as counts
UNWIND counts AS cnt
MATCH (n:OUTPT {seq:cnt})
WITH collect(n) AS nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {
pat_id:'combine',
seq:'discard'}, mergeRels:true})
YIELD node
RETURN node
I initially tried to do this with a FOREACH loop, but I can't do a MATCH inside a FOREACH.
I have been doing an UNWIND, but it is only merging the nodes with the first value (seq = 1). I assume this is because the RETURN statement ends the loop. But when I remove the RETURN statement, I get this error:
Query cannot conclude with CALL (must be RETURN or an update clause) (line 5, column 1 (offset: 99))
"CALL apoc.refactor.mergeNodes(nodes, {properties: {"
Any help would be appreciated.
The problem is with this line:
WITH collect(n) AS nodes
You've matched to all :OUTPT nodes with a sequence number within 1-31, but then you aggregate them into a single large collection, then merge them into a single node.
If you want to collect the nodes according to the sequence number, then the sequence number (in your case, cnt) needs to be the grouping key of the aggregation:
WITH cnt, collect(n) AS nodes
That will get you a row per distinct cnt value, with the list of nodes that share that cnt value on the associated row.
Because Cypher operations execute per row, your APOC refactor call will execute per row. Because each row is associated with a different cnt value, and each has a different list, you will be performing the refactoring for each list separately.
The output will be one row per cnt value, with a single node per row (as a result of merging all the nodes in that row's list into a single node).
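Putting the fix into the original query gives something like the following. This is a sketch that only combines the question's query with the grouping key suggested above; returning cnt alongside the merged node is an addition for readability.

```cypher
WITH range(1, 31) AS counts
UNWIND counts AS cnt
MATCH (n:OUTPT {seq: cnt})
// cnt is the grouping key, so collect() builds one list per sequence number.
WITH cnt, collect(n) AS nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {
  pat_id: 'combine',
  seq: 'discard'}, mergeRels: true})
YIELD node
RETURN cnt, node
```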

How to check if nodes are connected at all in Neo4j

I have a graph in Neo4j (first time using it) of about 10 different nodes that are connected in various ways. Not all nodes are connected to each other, as some have up to 6 or 7 neighbors, while some have only 1. What query would I write/use to check if a path exists from NodeA to NodeB? It doesn't have to be the shortest path, just if a path exists.
Along with this, is there a way to count who has the most or least neighbors? Thanks everyone for help in advance.
Return Foo nodes a and b if there is at least one path between them. (This variable-length path query with unbounded length could take a very long time or run out of memory if there are a lot of paths or very long paths).
MATCH (a:Foo {id: 'a'}), (b:Foo {id: 'b'})
WHERE (a)-[*]-(b)
RETURN a, b;
Return all paths between a and b. (This query could require even more time and memory than the previous query, since it will attempt to return all matching paths).
MATCH path=(a:Foo {id: 'a'})-[*]-(b:Foo {id: 'b'})
RETURN path;
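When only a yes/no answer is needed, one option is to ask for a single shortest path instead of enumerating paths. The sketch below assumes the same hypothetical Foo label and id property as above, that a and b are different nodes, and an arbitrary upper bound of 15 hops that should be adjusted to the graph.

```cypher
MATCH (a:Foo {id: 'a'}), (b:Foo {id: 'b'})
// OPTIONAL MATCH keeps the row when no path is found; p is null in that case.
OPTIONAL MATCH p = shortestPath((a)-[*..15]-(b))
RETURN p IS NOT NULL AS connected;
```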
Return the 10 nodes with the most neighbors, in descending order:
MATCH (n)--()
WITH n, COUNT(*) AS c
RETURN n
ORDER BY c DESC
LIMIT 10;
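For the nodes with the fewest neighbors, the MATCH (n)--() pattern is not enough on its own, because it skips nodes that have no relationships at all, and COUNT(*) counts relationships rather than distinct neighbors. A sketch that handles both points, under the assumption that a "neighbor" means any distinct node one hop away:

```cypher
MATCH (n)
// OPTIONAL MATCH keeps isolated nodes; count(DISTINCT m) ignores nulls,
// so those nodes end up with a neighbor count of 0.
OPTIONAL MATCH (n)--(m)
WITH n, count(DISTINCT m) AS neighbors
RETURN n, neighbors
ORDER BY neighbors ASC
LIMIT 10;
```

Changing ASC to DESC gives the nodes with the most distinct neighbors.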

neo4j get random path from known node

I have a big neo4j db with info about celebs, all of them have relations with many others, they are linked, dated, married to each other. So I need to get random path from one celeb with defined count of relations (5). I don't care who will be in this chain, the only condition I have I shouldn't have repeated celebs in chain.
To be more clear: I need to get "new" chain after each query, for example:
I try to get a chain starting with Rita Ora.
She has relations with Drake, Jay Z and Justin Bieber.
The query takes a random one of these, for example Jay Z.
Then the query takes the relations of Jay Z: Karrine Steffans, Rosario Dawson and Rita Ora.
The query can't take Rita Ora because she is already in the chain, so it takes a random one of the other two, for example Rosario Dawson.
...
At the end we should have a chain: Rita Ora - Jay Z - Rosario Dawson - other celeb - other celeb 2
Is it possible to do this with a query?
This is doable in Cypher, but it's quite tricky. You mention that
the only condition I have I shouldn't have repeated celebs in chain.
This condition could be captured by using node-isomorphic pattern matching, which requires all nodes in a path to be unique. Unfortunately, this is not yet supported in Cypher. It is proposed as part of the openCypher project, but is still work-in-progress. Currently, Cypher only supports relationship uniqueness, which is not enough for this use case as there are multiple relationship types (e.g. A is married to B, but B also collaborated with A, so we already have a duplicate with only two nodes).
APOC solution. If you can use the APOC library, take a look at the path expander, which supports various uniqueness constraints, including NODE_GLOBAL.
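As a rough sketch of the APOC route, assuming the APOC library is installed and the Celebrity label and name property used elsewhere in this answer: apoc.path.expandConfig takes a start node and a config map. The NODE_PATH uniqueness setting (one of the documented options) requires nodes to be unique within each returned path, and the level of 5 relationships is taken from the question.

```cypher
MATCH (c:Celebrity {name: 'Rita Ora'})
CALL apoc.path.expandConfig(c, {
  minLevel: 5,
  maxLevel: 5,
  labelFilter: '+Celebrity',
  uniqueness: 'NODE_PATH'
})
YIELD path
// ORDER BY rand() picks a different path per run, but it has to
// enumerate all candidate paths first, which can be slow on a big graph.
RETURN path
ORDER BY rand()
LIMIT 1
```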
Plain Cypher solution. To work around this limitation, you can capture the node uniqueness constraint with a filtering operation:
MATCH p = (c1:Celebrity {name: 'Rita Ora'})-[*5]-(c2:Celebrity)
UNWIND nodes(p) AS node
WITH p, count(DISTINCT node) AS countNodes
WHERE countNodes = length(p) + 1
RETURN p
LIMIT 1
Performance-wise this should be okay as long as you limit its results, because the query engine will basically keep enumerating new paths until one of them passes the filtering test.
The goal of the UNWIND nodes(p) AS node WITH p, count(DISTINCT node) ... construct is to count the distinct nodes on the path by first UNWIND-ing the node list to separate rows, then aggregating them with DISTINCT. A path with 5 relationships has 6 nodes (length(p) + 1), so we check whether the number of distinct nodes equals that. If so, the original list had no repeats and we RETURN the path. For a chain of 5 celebs, as in the example, use [*4] instead of [*5].
Note. Instead of UNWIND and count(DISTINCT ...), getting unique elements from a list could be expressed in other ways:
(1) Using a list comprehension and ranges:
WITH [1, 2, 2, 3, 2] AS l
RETURN [i IN range(0, size(l)-1) WHERE NOT l[i] IN l[0..i] | l[i]]
(2) Using reduce:
WITH [1, 2, 2, 3, 2] AS l
RETURN reduce(acc = [], i IN l | acc + CASE NOT i IN acc WHEN true THEN [i] ELSE [] END)
However, I believe both forms are less readable than the original one.

Role of variables in cypher match query

I am taking my first steps in Cypher and Neo4j and trying to understand how Cypher deals with "variables".
Specifically, I have a query
match (A {name: "A"})
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
match (c)-[:st]->(b)
return b
which does the job I want. Now, in the code I am using essentially the same match clause two times (lines 2 and 3), so that the variables (c) and (b) basically contain the same nodes before the final match on line 4.
Can I write the query without having to repeat the second match clause? Using
match (A {name: "A"})
match (A)<-[:st*]-(B)-[:hp]->(b)
match (b)-[:st]->(b)
return b
seems to be something very different, returning nothing since there are no :st type relationships from a node in (b) to itself. My understanding so far is that, even if (b) and (c) contain the same nodes,
match (c)-[:st]->(b)
tries to find matches between ANY node of (c) and ANY node of (b), whereas
match (b)-[:st]->(b)
tries to find matches from a particular node of (b) onto itself? Or is it that one has to think of the 3 match clauses as a holistic pattern?
Thanx for any insight into the inner working ...
When you write the 2 MATCH statements
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
they don't depend on each other's results (only on the result of the previous MATCH finding A). The Cypher engine could execute them independently and then return a cartesian product of their results, or it could execute the first MATCH and for each result, then execute the second MATCH, producing a series of pairs using the current result of the first MATCH and each result of the second MATCH (the actual implementation is a detail). Actually, it could also detect that the same pattern is matched twice, execute it only once and generate all possible pairs from the results.
To summarize, b and c are taken from the same collection of results, but independently, so you'll get pairs where b and c are the same node, but also all the other pairs where they are not.
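If you use a single intermediate MATCH, you have only one variable, so b is bound to one node per row and the final MATCH can only compare that node with itself.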
Supposing a MATCH returns 2 nodes 1 and 2, with the 2 intermediate MATCH the final MATCH will see all 4 pairs:
     1       2
1  (1, 1)  (1, 2)
2  (2, 1)  (2, 2)
whereas with a single intermediate MATCH and a final MATCH using b twice, it will only see:
     1       2
1  (1, 1)
2          (2, 2)
which are not the interesting pairs, if you don't have self-relationships.
Note that it's the same in a SQL database if you do a SELECT on 2 tables without a join: you also get a cartesian product of unrelated results.
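If the goal is simply to avoid writing the pattern twice, one possible rewrite is to match it once, collect the results into a list, and then test both ends of the final relationship against that list. This is a sketch rather than an exact equivalent: the variable names x and candidates are made up here, and the DISTINCT steps remove duplicate rows that the original three-MATCH query may return.

```cypher
MATCH (A {name: "A"})<-[:st*]-()-[:hp]->(x)
// Match the shared pattern once and keep every reachable node in a list.
WITH collect(DISTINCT x) AS candidates
UNWIND candidates AS c
// Both ends of the :st relationship must come from the same candidate list.
MATCH (c)-[:st]->(b)
WHERE b IN candidates
RETURN DISTINCT b
```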

Neo4j Cypher query: How to get unique nodes for two depth with their depth value

I am using a Cypher query in Neo4j.
My requirement is:
I need to get the unique friends within two levels, each with its shortest depth value.
Graph looks like,
a-[:frnd]->b, b-[:frnd]->a
b-[:frnd]->c, c-[:frnd]->b
c-[:frnd]->d, d-[:frnd]->c
a-[:frnd]->c, c-[:frnd]->a
I tried as,
START n=node(8) match p=n-[:frnd*1..2]->(x) return x.email, length(p)
My output is,
b 1 <--length(p)
a 2
c 2
c 1
d 2
a 2 and so on.
My required output,
My parent node (a) should not be listed.
I need (c) only once, with its shortest length 1;
c with length 2 should not be repeated.
Please help me solve this.
(EDITED. Finding n via START n=node(8) causes problems with other variables later on. So, below we find n in the MATCH statement.)
MATCH p = shortestPath((n {email:"a"})-[:frnd*..2]->(x))
WHERE n <> x AND length(p) > 0
RETURN x.email, length(p)
ORDER BY length(p)
LIMIT 1
If there are multiple "closest friends", this returns one of them.
Also, the shortestPath() function does not support a minimal path length, so "1..2" had to become "..2", and the WHERE clause needed to specify length(p) > 0.
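If the aim is instead to list every friend within two levels once, together with its smallest depth, a possible alternative is to aggregate over all matching paths. This is a sketch that assumes the email property identifies each person, as in the query above:

```cypher
MATCH p = (n {email: "a"})-[:frnd*1..2]->(x)
WHERE x <> n
// Grouping by x.email and taking min() leaves one row per friend.
RETURN x.email AS email, min(length(p)) AS depth
ORDER BY depth
```

On the sample graph from the question this lists b and c at depth 1 and d at depth 2, and leaves out a.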
