Neo4j Cypher - merge columns and get distinct values from all

Neo4j Cypher - merge columns and get distinct values from all - neo4j

I've got a complex query that I'm trying to do with OPTIONAL MATCH statements. It looks like this:
MATCH (p:Person {name:'Victoria'})
OPTIONAL MATCH (p)-[:MANAGES]->(:Office)<-[MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(target1)
OPTIONAL MATCH (p)-[:SUPPORTS]->(:Office)<-[MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(target2)
OPTIONAL MATCH (p)-[:ASSISTS]->(:Person)-[*0..1]->(:Group)<--(target3)
OPTIONAL MATCH (p)-->(:Group)<--(target4)
RETURN DISTINCT target1,target2,target3,target4
What I want to do is get the results as if they were a single column called target instead of getting target1, target2, target3, and target4 back as separate columns.
Is there a way to collect/unwind the four potential target columns to return them as a single column result set?
I know I could get the desired result using a UNION of four separate queries that return a value called target, but I was wondering if there was a better way.
Thanks.

Yes, this is possible, and you're on the right track. Here's an example using collect and unwind to do this:
MATCH (p:Person {name:'Victoria'})
OPTIONAL MATCH (p)-[:MANAGES]->(:Office)<-[:MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(target1)
WITH p, COLLECT(target1) as target1s
OPTIONAL MATCH (p)-[:SUPPORTS]->(:Office)<-[:MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(target2)
WITH p, target1s, COLLECT(target2) as target2s
OPTIONAL MATCH (p)-[:ASSISTS]->(:Person)-[*0..1]->(:Group)<--(target3)
WITH p, target1s, target2s, COLLECT(target3) as target3s
OPTIONAL MATCH (p)-->(:Group)<--(target4)
WITH target1s + target2s + target3s + COLLECT(target4) AS targets
UNWIND targets AS target
RETURN DISTINCT target
Note that you should be able to collapse the :MANAGES and :SUPPORTS optional matches into a single [:MANAGES|SUPPORTS] relationship, as the rest of the path is the same.

Related

Neo4j Cypher query - collect elements from 2 different variables to single list

I have a following part of Cypher query:
MATCH (ch:Characteristic) WHERE id(ch) = {characteristicId} WITH ch OPTIONAL MATCH (ch)<-[:SET_ON]-(v:Value)...
first of all I'm looking for (ch:Characteristic) by characteristicId and then applying required logic for this variable at the rest of my query.
my Characteristic can also have(or not) a child Characteristic nodes, like:
(ch:Characteristic)-[:CONTAINS]->(childCh)
Please help to extend my query in order to collect ch and childCh into a list of Characteristic thus I'll be able at the rest of my query to apply required logic to all Characteristic at this list.
UPDATED - possible solution #2
This is my current working query:
MATCH (chparent:Characteristic)
WHERE id(chparent) = {characteristicId}
OPTIONAL MATCH (chparent)-[:CONTAINS*]->(chchild:Characteristic)
WITH chparent, collect(distinct(chchild)) as childs
WITH childs + chparent as nodes
UNWIND nodes as ch
OPTIONAL MATCH (ch)<-[:SET_ON]-(v:Value)-[:SET_FOR]->(Decision)
OPTIONAL MATCH (v)-[:CONTAINS]->(vE) OPTIONAL MATCH (vE)-[:CONTAINS]->(vEE)
OPTIONAL MATCH (ch)-[:CONTAINS]->(cho:CharacteristicOption)
OPTIONAL MATCH (cho)-[:CONTAINS]->(choE) OPTIONAL MATCH (ch)-[:CONTAINS]->(chE)
DETACH DELETE choE, cho, ch, vEE, vE, v, chE
This is an attempt to simplify the query above:
MATCH (ch:Characteristic)
WHERE (:Characteristic {id: {characteristicId}})-[:CONTAINS*]->(ch)
OPTIONAL MATCH (ch)<-[:SET_ON]-(v:Value)-[:SET_FOR]->(Decision)
OPTIONAL MATCH (v)-[:CONTAINS]->(vE)
OPTIONAL MATCH (vE)-[:CONTAINS]->(vEE)
OPTIONAL MATCH (ch)-[:CONTAINS]->(cho:CharacteristicOption)
OPTIONAL MATCH (cho)-[:CONTAINS]->(choE)
OPTIONAL MATCH (ch)-[:CONTAINS]->(chE)
DETACH DELETE choE, cho, ch, vEE, vE, v, chE
but this query doesn't delete required Characteristic nodes and my tests fail. What am I doing wrong at the last query ?

You can try something like this with apoc:
MATCH (chparent:Characteristic {characteristicId: <someid>})
OPTIONAL MATCH (chparent)-[:CONTAINS]->(chchild:Characteristic)
WITH apoc.coll.union(chparent,chchild) as distinctList
...
With pure cypher you can try something like this:
MATCH (chparent:Characteristic {characteristicId: <someid>})
OPTIONAL MATCH (chparent)-[:CONTAINS]->(chchild:Characteristic)
WITH chparent,collect(distinct(chchild)) as childs
WITH chparent + childs as list
....
Not really sure if you need distinct in collect, but I added just so you know you can do this to filter out duplicates.

You can actually do this easily by using a variable-length relationship match of 0..1, as it will let you match on your root :Characteristic node and any of its children.
MATCH (chparent:Characteristic)-[:CONTAINS*0..1]->(ch:Characteristic)
WHERE id(chparent) = {characteristicId}
// ch contains both the parent and children, no need for a list
...

A more simplified query.
MATCH (c:Characteristics) WHERE (:Characteristics {id: 123})-[:CONTAINS*0..1]->(c) return c;
Matches all Characteristics including the root node that (optionally) have incoming relationships of type CONTAINS from the node specified with id 123.

I assume all the children of Characteristic will also have the label Characteristic. Another assumption I made is, you need characteristicId which is defined by you, not the internal id defined by neo4J. id(ch) fetching the internal id instead of user defined ID. You might want to pass the characteristicId variable like I gave here.
MATCH (chparent:Characteristic {characteristicId: <someid>})
WITH chparent
OPTIONAL MATCH (chparent)-[:CONTAINS]->(chchild:Characteristic)
WITH chchild
<your operation>

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the statement nodes(path) + nodes(optionalPath) is an operation adding null and I get no records. This is true even the nodes(path) term does contain nodes.
What's the best way to get around this problem?

You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones) and always the perform the second MATCH query, and only eliminate unwanted paths afterwards. (But, it is actually not clear if you even need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.

It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (hospital in between), Than I'd adjust like this
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but just says exactly what you want (and also adds the start node to the hospital check list)

Reusing path in multiple MATCH UNION queries cypher neo4j

I'd like to pull and combine data from several different paths that share a path at the beginning, not all of which might exist. For example, I'd like to do something like this:
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:FETCHING]->(data)
RETURN data.attribute
UNION ALL
MATCH (s)-[:OPTIONAL]->(o:OtherData)
RETURN o.attribute;
so that it doesn't retrace the path up to s. I can't actually do this, though, because UNION separates queries and the (s)-[:OPTIONAL] in the second part will match anything with an outgoing OPTIONAL relation; the s is a loose handle.
Is there a better way of doing this than repeating the path:
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:FETCHING]->(data)
RETURN data.attribute
UNION ALL
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:OPTIONAL]->(o:OtherData)
RETURN o.attribute;
I made a few attempts using WITH, but they all either caused the query to return nothing if any part failed, or I could not get them to line up into a single column and instead got rows with redundant data, or (with multiple, nested WITHs, which I'm not sure about the scoping of) just fetching everything.

Have you looked at the semantics of an optional match? So you can match to s, beyond s and your optional component. Something like:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
RETURN data.attribute, otherData.attribute
Sorry I missed the importance of a single column, is it really important?
You can gather the vaues into a single collection :
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
RETURN [data.attribute] + COLLECT(otherData.attribute)
But doesn't this work for a single column:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
WITH [data.attribute] + COLLECT(otherData.attribute) as col
RETURN UNWIND col AS val

Difference between a single (OPTIONAL) MATCH clause and multiple

What is the difference between
OPTIONAL MATCH clauseA, clauseB
and
OPTIONAL MATCH clauseA
OPTIONAL MATCH clauseB
I get different behavior depending on which form I use.
For example:
START n=node(111)
OPTIONAL MATCH n<-[links_n_in]-(n_from),n-[links_n_out]->(n_to)
RETURN n,COLLECT(n_from) AS n_from,COLLECT(links_n_in) AS links_n_in,COLLECT(n_to) AS n_to,COLLECT(links_n_out) AS links_n_out
which is designed to return a node; it's incoming relationships and from nodes; it's outgoing relationship and to nodes.
I have a test graph consisting of Node 111 which has 4 outgoing relationships each of which points to the same Node (I have other test cases in which 111 points to different Nodes). Executing the query as above returns only Node 111 in column 'n'. The columns for 'n_from', 'links_n_in', 'n_to', 'links_n_out' are empty.
If I modify the query to:
START n=node(111)
OPTIONAL MATCH n<-[links_n_in]-(n_from)
OPTIONAL MATCH n-[links_n_out]->(n_to)
RETURN n,COLLECT(n_from) AS n_from,COLLECT(links_n_in) AS links_n_in,COLLECT(n_to) AS n_to,COLLECT(links_n_out) AS links_n_out
then the n_to and link_n_out columns are populated as expected.

The first form treats it as a single extended pattern that must match entirely.
The second form treats them as distinct optional patterns, and can match the two separately.
So your results make sense, when you think about what it's doing--if the whole OPTIONAL MATCH pattern isn't found, it doesn't match any of the OPTIONAL MATCH pattern.

Difference between START n = node(*) and MATCH (n)

In Neo4j 2.0 this query:
MATCH (n) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
returns different results than this query:
START n = node(*) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
The first query works as expected returning a single Node for 'blevine' and the associated Nodes mentioned in the OPTIONAL MATCH clauses. The second query returns many more Nodes which do not even have a username property. I realize that start n = node(*) is not recommended and that START is not even required in 2.0. But the second form (with OPTIONAL MATCH replaced with question marks on the relationship type) worked prior to 2.0. In the second form, why is 'n' not being constrained to the single 'blevine' node by the first WHERE clause?

To run the second query as expected you would just need to add WITH n. In your query you would need to filter the result and pass it for optional match which is to be done using WITH
START n = node(*) WHERE n.username = 'blevine'
WITH n
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
From the documentation
WHERE defines the MATCH patterns in more detail. The predicates are part of the
pattern description, not a filter applied after the matching is done.
This means that WHERE should always be put together with the MATCH clause it belongs to.
when you do start n=node(*) where n.name="xyz" you need to pass the result explicitly into your next optional matches. But when you do MATCH (n) WHERE n.name="xyz" this tells graph specifically what node to start looking into.
EDIT
Here is the thing. The documentation says Optional Match returns null if a pattern is not found so in your first case, it includes all those results too where n.username property is null or cases where n doesnt even have a relationship suggested in the OPTIONAL MATCH pattern. So when you do a WITH n , the graph is explicitly told to use only n.
Excerpt from the documentation (link : here)
OPTIONAL MATCH matches patterns against your graph database, just like MATCH does.
The difference is that if no matches are found, OPTIONAL MATCH will use NULLs for
missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher
equivalent of the outer join in SQL.
Either the whole pattern is matched, or nothing is matched. Remember that
WHERE is part of the pattern description, and the predicates will be
considered while looking for matches, not after. This matters especially
in the case of multiple (OPTIONAL) MATCH clauses, where it is crucial to
put WHERE together with the MATCH it belongs to.
Also few more things to note about the behaviour of WHERE clause: here
Excerpts:
WHERE is not a clause in it’s own right — rather, it’s part of MATCH,
OPTIONAL MATCH, START and WITH.
In the case of WITH and START, WHERE simply filters the results.
For MATCH and OPTIONAL MATCH on the other hand, WHERE adds constraints
to the patterns described. It should not be seen as a filter after the
matching is finished.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart