Neo4j mixing nodes with and without extended relationships - neo4j

I have a simple social stream where a few nodes and relationships look like the following:
row 1: (a) -> (stream) -> (d)
row 2: (a) -> (stream) -> (d) -> (source)
Basically I want to pull both rows but put some restrictions on the type of source, for example I currently use this:
MATCH (a)-[]->(stream)-[]->(d)
OPTIONAL MATCH (d)-[]->(source)
WHERE source.x = 3
RETURN stream, d, source
This works well. Almost. When source.x = 3, I get both the rows. Which is perfect. But when source.x != 3, I want the query to ignore the second row, but because of optional match the row still appears.
When source.x = 3
row 1. stream, d, null
row 2. stream, d, source
When source.x != 3
row 1. stream, d, null
row 2. stream, d, null
When source.x != 3, I want the query to ignore the second row. Because it contains the (source) node but not the one we want. The output should look like:
row 1. stream, d, null
Basically: if (source) does not exist, show the row; if it does exist, show the row only if source.x = 3.
EDIT: As recommended in the comment, I have attached a simplified example.

You can pipe the results through WITH and filter them with a WHERE clause before returning them:
MATCH (me:User {name:'me'})-[:LIKES]->(transport)
OPTIONAL MATCH (transport)-[:HAS_DRIVER]->(driver:Driver {name:'Anne'})
WITH me, transport, driver
WHERE (NOT (transport)-[:HAS_DRIVER]->()) OR (NOT driver IS NULL)
RETURN me, transport, driver
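The effect of that WHERE can be checked outside Neo4j with a small Python model. This is only a sketch: the sample rows and the keep_row helper are made up for illustration, and the has_driver flag stands in for the pattern check (transport)-[:HAS_DRIVER]->().

```python
# Hypothetical rows as they look after the OPTIONAL MATCH:
# (transport, transport_has_any_driver, matched_driver_or_None)
rows = [
    ("bike", False, None),   # no HAS_DRIVER relationship at all
    ("taxi", True, "Anne"),  # has a driver, and it is the one we asked for
    ("bus", True, None),     # has a driver, but not Anne, so driver is null
]

def keep_row(has_driver, driver):
    # Keep the row if there is no driver relationship at all,
    # or if the wanted driver was actually matched.
    return (not has_driver) or (driver is not None)

kept = [t for (t, has_driver, driver) in rows if keep_row(has_driver, driver)]
print(kept)  # ['bike', 'taxi']
```

The "bus" row is the one the plain OPTIONAL MATCH would have returned with a null driver; the extra condition is what drops it.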

Related

A method to sum all values in a returned column using Cypher in Neo4j

I have written the following Cypher query to get the frequency of a certain item from a set of orders.
MATCH (t:Trans)-[r:CONTAINS]->(i:Item)
WITH i,COUNT(*) AS CNT,size(collect(t)) as NumTransactions
RETURN i.ITEM_ID as item, NumTransactions, NumTransactions/CNT as support
I get a table like this as my output
Item NumTransactions Support
A 2 1
B 1132 1
C 2049 1
And so on. What I want is to divide each NumTransactions value by the total number of transactions, i.e. the sum of the entire NumTransactions column, to get the support. Instead, the query divides NumTransactions by itself. Can someone point me to the correct function, if it exists, or to an approach for doing this?
This should work:
MATCH (:Trans)-[:CONTAINS]->(i:Item)
WITH i, COUNT(*) as c
WITH COLLECT({i: i, c: c}) AS data
WITH data, REDUCE(s = 0.0, n IN data | s + n.c) AS total
UNWIND data AS d
RETURN d.i.ITEM_ID as item, d.c AS NumTransactions, d.c/total as support
By the way, SIZE(COLLECT(t)) is inefficient, as it first creates a new collection of t nodes, gets its size, and then discards the collection. COUNT(t) would have been more efficient.
Also, given your MATCH clause, as long as every t has at most a single CONTAINS relationship to a given i, COUNT(*) (which counts the number of result rows) would give you the same result as COUNT(t).
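The collect, sum, and unwind steps of that query can be followed in plain Python. This is a sketch with made-up counts, not data from the question; as in the query, total is the sum of the per-item counts.

```python
# Hypothetical per-item counts, playing the role of the (i, c) rows
counts = {"A": 2, "B": 3, "C": 5}

# COLLECT + REDUCE: gather every row, then sum the counts starting from 0.0
# so that the later division is a floating-point division
total = sum(counts.values(), 0.0)

# UNWIND: back to one row per item, now with the total available on each row
support = {item: c / total for item, c in counts.items()}
print(support)  # {'A': 0.2, 'B': 0.3, 'C': 0.5}
```

Starting the sum at 0.0 plays the same role as s = 0.0 in the REDUCE: with an integer total, Cypher would do integer division and every support value would come out as 0.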

Optional match extends query

When I add an optional match to my already working query the selection is expanded.
I have a structure regarding players in games as follows
(player)-[got]->(result)-[in]->(game)
And as players oppose each other things will look like this in the final data
(player_1)-[got]->(result_1)-[in]->(game)<-[in]-(result_2)<-[got]-(player_2)
Given a list of result_1 ids I try to find corresponding result_2
The basic query
MATCH (r:Result)-[:In]->(g:Game)<-[:In]-(or:Result)
WHERE r.id IN [30,32]
RETURN r, or, g
returns exactly what I expect:
(30)-(g1)-(or1)
(32)-(g2)-(or2)
But games can also be in an (optional) match and this query
MATCH (r:Result)-[:In]->(g:Game)<-[:In]-(or:Result)
OPTIONAL MATCH (g)<-[:Contains]-(m:Match)
WHERE r.id IN [30,32]
RETURN r, or, g, m
returns
(30)-(g1)-(or1)
(32)-(g2)-(or2)
(33)-(g3)-(or3)
(n)-(gn)-(orn)
Whatever else happens to match the structure p-r-g-r-p but with no regard to the list [30,32]
I do suspect that it has something to do with the mirrored nature of the data because if I remove r from the returned values I still get (30) and (32) back as or but I cannot figure out why and thus how to stop it.
I've tried to add a With before the optional but it makes no difference.
The WHERE clause modifies the immediately preceding [OPTIONAL] MATCH or WITH clause.
You need to move your WHERE clause so it is right after the initial MATCH, so that it will limit r as you intended. Like this:
MATCH (r:Result)-[:In]->(g:Game)<-[:In]-(or:Result)
WHERE r.id IN [30,32]
OPTIONAL MATCH (g)<-[:Contains]-(m:Match)
RETURN r, or, g, m
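The difference between the two placements of WHERE can be modelled in a few lines of Python. This is a simplified sketch: the ids, games, and matches are invented, and the opposing result is left out to keep the rows short.

```python
# Hypothetical data: result id -> game, and game -> match (most games have none)
game_of = {30: "g1", 32: "g2", 33: "g3"}
match_of = {"g1": "m1"}
wanted = [30, 32]

# WHERE attached to the OPTIONAL MATCH: it only decides whether m gets bound,
# so every row produced by the first MATCH survives.
where_on_optional = [
    (r, g, match_of.get(g) if r in wanted else None)
    for r, g in game_of.items()
]

# WHERE attached to the first MATCH: rows are filtered before the OPTIONAL MATCH.
where_on_match = [
    (r, g, match_of.get(g))
    for r, g in game_of.items()
    if r in wanted
]

print(where_on_optional)  # [(30, 'g1', 'm1'), (32, 'g2', None), (33, 'g3', None)]
print(where_on_match)     # [(30, 'g1', 'm1'), (32, 'g2', None)]
```

In the first list, result 33 is still present with a null match, which is the "expanded selection" described in the question.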

How to log when a relationship already exists?

I have created a hierarchical tree to represent the organization chart of a company on Neo4j, which is like the picture below.
When I insert a lot of relationships with LOAD CSV, I use this query:
LOAD CSV WITH HEADERS FROM "file:///newRelation.csv" AS row
MERGE (a:Person {name:row.person1Name})
MERGE (b:Person {name:row.person2Name})
FOREACH (t in CASE WHEN NOT EXISTS((a)-[*]->(b)) THEN [1] ELSE [] END |
MERGE (a)-[pr:Manage]->(b) )
With this query, I only create the relationship if the two people do not already have a hierarchical relationship.
How can I save (log) the list of relationships that are not created because the test below fails?
CASE WHEN NOT EXISTS((a)-[*]->(b))
You need to move the existence check to a level above the foreach:
LOAD CSV WITH HEADERS FROM "file:///newRelation.csv" AS row
MERGE (a:Person {name:row.person1Name})
MERGE (b:Person {name:row.person2Name})
WITH a, b, row,
CASE WHEN NOT exists((a)-[*]->(b)) THEN [1] ELSE [] END AS check
FOREACH (t IN check |
MERGE (a)-[pr:Manage]->(b)
)
WITH a, b, row, check WHERE size(check) = 0
RETURN a, b, row

Unexpected behavior combining collections in Cypher

Using http://console.neo4j.org as a sandbox, I have come across the following unexpected behavior:
Statement 1 - Returns 1 row with a collection containing Neo Node
MATCH (n:Crew)
WHERE n.name="Neo"
WITH COLLECT(n) AS c1
WITH c1+[] AS c2
RETURN c2
Statement 2 - Returns 0 rows (unexpected)
MATCH (n:Crew)
WHERE n.name="Neo"
WITH COLLECT(n) AS c1
MATCH (n:Crew)
WHERE n.name="NoOne"
WITH c1+COLLECT(n) AS c2
RETURN c2
Statement 3 - Returns 1 row containing an empty collection
MATCH (n:Crew)
WHERE n.name="NoOne"
WITH COLLECT(n) AS c1
RETURN c1
I fail to see why Statement 2 is not returning the same result as Statement 1, because it should return a collection containing the Neo node, just like in Statement 1.
Statement 3 shows that the second MATCH in Statement 2 should be resulting in an empty collection.
Is this behavior expected in Cypher? If that's the case, I'd be happy about a small explanation to help me understand this behavior.
I've run into this exact behavior before, and it is very frustrating. The issue is with the second MATCH clause in Statement 2: if an existing result row (in this case, your single row with c1) finds no matches in a MATCH, that row is dropped completely after that MATCH clause, even though the same MATCH run on its own (without the pre-existing result row) and followed by COLLECT gives an empty collection. If you convert it to an OPTIONAL MATCH, you'll be able to keep your result row when there are no matches.
UPDATE: See below for a more thorough analysis, but the tl;dr is that the second COLLECT(n) in Statement 2 does return an empty list, just like in Statement 3; however, the whole clause WITH c1+COLLECT(n) AS c2 returns no rows, because there are no rows with a c1 value after the second MATCH.
I can't quite think of the right explanation for why the 2nd query doesn't do what you expect, but if you want to chain together several matches that may each come back empty, you can use OPTIONAL MATCH for them:
OPTIONAL MATCH (n:Crew)
WHERE n.name="Neo"
WITH COLLECT(n) AS c1
OPTIONAL MATCH (n:Crew)
WHERE n.name="NoOne"
WITH c1+COLLECT(n) AS c2
RETURN c2
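The row-dropping behaviour described in these answers can be reproduced with a toy Python model of the pipeline. It is a sketch only: the crew list and the three helper functions are hypothetical stand-ins for the graph and for the MATCH, OPTIONAL MATCH, and WITH c1+COLLECT(n) steps.

```python
crew = ["Neo", "Trinity", "Morpheus"]  # hypothetical stand-in for the :Crew nodes

def match(rows, name):
    # MATCH: each incoming row is extended once per hit; rows with no hit vanish
    return [(c1, n) for c1 in rows for n in crew if n == name]

def optional_match(rows, name):
    # OPTIONAL MATCH: a row with no hit is kept, with n bound to None (null)
    out = []
    for c1 in rows:
        hits = [n for n in crew if n == name]
        out.extend((c1, n) for n in (hits or [None]))
    return out

def with_c1_plus_collect(rows):
    # WITH c1 + COLLECT(n): c1 is the grouping key, so no rows in means
    # no rows out; COLLECT skips nulls
    groups = {}
    for c1, n in rows:
        groups.setdefault(c1, [])
        if n is not None:
            groups[c1].append(n)
    return [list(c1) + ns for c1, ns in groups.items()]

c1_rows = [("Neo",)]  # the single row left after the first WITH COLLECT(n) AS c1
print(with_c1_plus_collect(match(c1_rows, "NoOne")))           # []
print(with_c1_plus_collect(optional_match(c1_rows, "NoOne")))  # [['Neo']]
```

With MATCH, the c1 row is gone before the aggregation runs, so there is no group to produce a result for. With OPTIONAL MATCH, the row survives with a null n, and the aggregation yields the collection containing Neo.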

Role of variables in cypher match query

I am taking some first steps in Cypher and Neo4j and trying to understand how Cypher deals with "variables".
Specifically, I have a query
match (A {name: "A"})
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
match (c)-[:st]->(b)
return b
which does the job I want. Now, in the code I am using the same match clause two times (lines 2 and 3), so that the variables (c) and (b) basically contain the same nodes before the final match on line 4.
Can I write the query without having to repeat the second match clause? Using
match (A {name: "A"})
match (A)<-[:st*]-(B)-[:hp]->(b)
match (b)-[:st]->(b)
return b
seems to be something very different, returning nothing since there are no :st type relationships from a node in (b) to itself. My understanding so far is that, even if (b) and (c) contain the same nodes,
match (c)-[:st]->(b)
tries to find matches between ANY node of (c) and ANY node of (b), whereas
match (b)-[:st]->(b)
tries to find matches from a particular node of (b) onto itself? Or is it that one has to think of the 3 match clauses as a holistic pattern?
Thanx for any insight into the inner working ...
When you write the 2 MATCH statements
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
they don't depend on each other's results (only on the result of the previous MATCH finding A). The Cypher engine could execute them independently and then return a cartesian product of their results, or it could execute the first MATCH and for each result, then execute the second MATCH, producing a series of pairs using the current result of the first MATCH and each result of the second MATCH (the actual implementation is a detail). Actually, it could also detect that the same pattern is matched twice, execute it only once and generate all possible pairs from the results.
To summarize, b and c are taken from the same collection of results, but independently, so you'll get pairs where b and c are the same node, but also all the other pairs where they are not.
If you do a single MATCH and reuse b in the final pattern, both positions are bound to the same single node in each row.
Supposing a MATCH returns 2 nodes 1 and 2, with the 2 intermediate MATCH the final MATCH will see all 4 pairs:
     1       2
1  (1, 1)  (1, 2)
2  (2, 1)  (2, 2)
whereas with a single intermediate MATCH and a final MATCH using b twice, it will only see:
     1       2
1  (1, 1)
2          (2, 2)
which are not the interesting pairs, if you don't have self-relationships.
Note that it's the same in a SQL database if you do a SELECT on 2 tables without a join: you also get a cartesian product of unrelated results.
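The two cases can be written out in Python with itertools.product. This is a sketch using the same two hypothetical nodes 1 and 2 as in the tables above.

```python
from itertools import product

hits = [1, 2]  # hypothetical nodes returned by the repeated pattern

# Two separate MATCH clauses: the two variables range over the hits independently
two_matches = list(product(hits, hits))

# One MATCH with the same variable used twice: both positions hold the same node
one_match = [(b, b) for b in hits]

print(two_matches)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(one_match)    # [(1, 1), (2, 2)]
```

Only the first list contains the mixed pairs (1, 2) and (2, 1), which are the ones the final match on :st can actually succeed on when there are no self-relationships.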
