Neo4j Matching Variable number of relationships from previous query - neo4j

The code on the Neo4j docs shows the following for matching against a variable number of relationships.
MATCH (charlie { name: 'Charlie Sheen' })-[:ACTED_IN*1..3]-(movie:Movie)
RETURN movie.title
I am trying to do this based on a previous query using a WITH statement but cannot find the right syntax. Everything I try yields an error. I am aiming for something of the form:
MATCH p = ...
...
WITH length(p) AS len_p
MATCH (charlie { name: 'Charlie Sheen' })-[:ACTED_IN*len_p..len_p]-(movie:Movie)
RETURN movie.title
However, this syntax yields Neo.ClientError.Statement.SyntaxError.
What is the recommended way to do this?

[EDITED]
You can use the apoc.path.expandConfig procedure to perform variable-length relationship queries with dynamic bounds.
For example (to get paths of length len_p consisting of ACTED_IN relationships ending in a Movie node):
...
WITH length(p) AS len_p
MATCH (charlie:Person {name: 'Charlie Sheen'})
CALL apoc.path.expandConfig(charlie, {
relationshipFilter: 'ACTED_IN',
labelFilter: '>Movie',
minLevel: len_p,
maxLevel: len_p
}) YIELD path
RETURN LAST(NODES(path)).title AS title

Related

Return non-matching nodes while creating relationships Neo4j Cypher

I wrote a script to to batch create a bunch of relationship in neo4j. Here is the cypher:
:param batch => [{startId: 'abc123', endId: 'abc321'}, {startId: 'abc456', endId: 'abc654']
UNWIND $batch as row
MATCH (from {id: row.startId}
MATCH (to {id: row.endId}
CREATE (from)-[rel:HAS]->(to)
RETURN rel
The problem that there might be some startId/endId entries that don't match any nodes and are silently ignore. Is there a way to return the list of rows that don't match any nodes and create the relationship for the nodes that do match?
I tried OPTIONAL MATCH to fail-fast as soon an id doesn't find a startId/endId however, the query execution was really slow.
First of all, you should always try to specify a label for the node that is used to kick off a MATCH (unless the MATCH pattern uses any already-bound nodes). Otherwise, every single node in the DB must be scanned. In addition, you should consider using indexes to speed up your MATCHs (but, again, you'd need to specify the labels).
Here is a query that uses the APOC procedure apoc.do.when to create a new relationship when appropriate. It returns each row and the corresponding new relationship (or NULL if either node is not found):
UNWIND $batch as row
OPTIONAL MATCH (from:Foo {id: row.startId})
OPTIONAL MATCH (to:Foo {id: row.endId})
CALL apoc.do.when(
from IS NOT NULL AND to IS NOT NULL,
'CREATE (from)-[rel:HAS]->(to) RETURN rel',
'RETURN NULL AS rel',
{from: from, to: to}) YIELD value
RETURN row, value.rel AS rel

Cypher merge nodes with same property and collected the other property

I have nodes with this structure
(g:Giocatore { nome, match, nazionale})
(nome:'Del Piero', match:'45343', nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
(nome:'Del Piero', match:'18235', nazionale:'ITA')
The property 'match' is unique (ID's of match) while there are several 'nome' with the same name.
I want to merge all the nodes with the same 'nome' and create a collection of different 'match' like this
(nome:'Del Piero', match:[45343,18235], nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
I tried with apoc library too but nothing works.
Any idea?
Can you try this query :
MATCH (n:Giocatore)
WITH n.nome AS nome, collect(n) AS node2Merge
WITH node2Merge, extract(x IN node2Merge | x.match) AS matches
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches
Here I'm using APOC to merge the nodes, but then I do a map transformation on the node list to have an array of match, and I set it on the merged node.
I don't know if you have a lot of Giocatore nodes, so perhaps this query will do an OutOfMemory exception, so you will have to batch your query. You can for example replace the first line by MATCH (n:Giocatore) WHERE n.nome STARTS WITH 'A' and repeat it for each letter or you can also use the apoc.periodic.iterate procedure :
CALL apoc.periodic.iterate(
'MATCH (n:Giocatore) WITH n.nome AS nome, collect(n) AS node2Merge RETURN node2Merge, extract(x IN node2Merge | x.match) AS matches',
'CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches',
{batchSize:1000,parallel:true,retries:3,iterateList:true}
) YIELD batches, total

Cypher query to find nodes that are not related to other node by property

Consider the following DB structure:
For your convenience, you can create it using:
create (p1:Person {name: "p1"}),(p2:Person {name: "p2"}),(p3:Person {name: "p3"}),(e1:Expertise {title: "Exp1"}),(e2:Expertise {title: "Exp2"}),(e3:Expertise {title: "Exp3"}),(p1)-[r1:Expert]->(e1),(p1)-[r2:Expert]->(e2),(p2)-[r3:Expert]->(e2),(p3)-[r4:Expert]->(e3),(p2)-[r5:Expert]->(e3)
I want to be able to find all Person nodes that are not related to a specific Expertise node, e.g. "Exp2"
I tried
MATCH (p:Person)--(e:Expertise)
WHERE NOT (e.title = "Exp2")
RETURN p
But it returns all the Person nodes (while I expected it to return only p3).
Logically, this result makes sense because each of these nodes is related to at least one Expertise that is not Exp2.
But what I want is to find all the Person nodes that are not related to Exp2, even if they are related to other nodes as well.
How can this be done?
Edit
It appears that I wasn't clear on the requirements. This is a (very) simplified way of presenting my problem with a much more complicated DB.
Consider the possibility that Expertise has more properties which I would like to use in the same query (not necessarily with negation). For example:
MATCH (p)--(e)
WHERE e.someProp > 5 AND e.anotherProp = "cookie" AND NOT e.title = "Exp2"
UPDATE
You need to restrict it a bit more, meaning to only the person
MATCH (p:Person), (e:Expertise {title="Exp2"})
WHERE NOT (p)-[]->(e)
RETURN p
I think you will be just fine with the <> operator :
MATCH (p:Person)--(e:Expertise)
WHERE e.title <> "Exp2"
RETURN p
Or you can express it in a pattern :
MATCH (p:Person)
WHERE NOT EXISTS((p)--(e:Expertise {title:"Exp2"}))
RETURN p
Little change query from #ChristopheWillemsen:
MATCH (e:Expertise) WHERE e.someProperty > 5 AND NOT e.title = someValue
WITH collect(e) as es
MATCH (p:Person) WHERE all(e in es WHERE NOT Exists( (p)--(e) ) )
RETURN p
UPDATE:
// Collect the `Expertise` for which the following conditions:
MATCH (e:Expertise) WHERE e.num > 3 AND e.title = 'Exp2'
WITH collect(e) as es
// Select the users who do not connect with any of of expertise from `es` set:
OPTIONAL MATCH (p:Person) WHERE all(e in es WHERE NOT Exists( (p)--(e) ) )
RETURN es, collect(p)
Another query with some optimization:
// Get the set of `Expertise-node` for which the following conditions:
MATCH (e:Expertise) WHERE e.num > 3 AND e.title = 'Exp2'
// Collect all `Person-node` connected to node from the `Expertise-node` set:
OPTIONAL MATCH (e)--(p:Person)
WITH collect(e) as es, collect(distinct id(p)) as eps
//Get all `Person-node` not in `eps` set:
OPTIONAL MATCH (p:Person) WHERE NOT id(p) IN eps
RETURN es, collect(p)

How to query for multiple OR'ed Neo4j paths?

Anyone know of a fast way to query multiple paths in Neo4j ?
Lets say I have movie nodes that can have a type that I want to match (this is psuedo-code)
MATCH
(m:Movie)<-[:TYPE]-(g:Genre { name:'action' })
OR
(m:Movie)<-[:TYPE]-(x:Genre)<-[:G_TYPE*1..3]-(g:Genre { name:'action' })
(m)-[:SUBGENRE]->(sg:SubGenre {name: 'comedy'})
OR
(m)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
The problem is, the first "m:Movie" nodes to be matched must match one of the paths specified, and the second SubGenre is depenedent on the first match.
I can make a query that works using MATCH and WHERE, but its really slow (30 seconds with a small 20MB dataset).
The problem is, I don't know how to OR match in Neo4j with other OR matches hanging off of the first results.
If I use WHERE, then I have to declare all the nodes used in any of the statements, in the initial MATCH which makes the query slow (since you cannot introduce new nodes in a WHERE)
Anyone know an elegant way to solve this ?? Thanks !
You can try a variable length path with a minimal length of 0:
MATCH
(m:Movie)<-[:TYPE|:SUBGENRE*0..4]-(g)
WHERE g:Genre and g.name = 'action' OR g:SubGenre and g.name='comedy'
For the query to use an index to find your genre / subgenre I recommend a UNION query though.
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' })
RETURN distinct m
UNION
(m:Movie)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
RETURN distinct m
Perhaps the OPTIONAL MATCH clause might help here. OPTIONAL MATCH beavior is similar to the MATCH statement, except that instead of an all-or-none pattern matching approach, any elements of the pattern that do not match the pattern specific in the statement are bound to null.
For example, to match on a movie, its genre and a possible sub-genre:
OPTIONAL MATCH (m:Movie)-[:IS_GENRE]->(g:Genre)<-[:IS_SUBGENRE]-(sub:Genre)
WHERE m.title = "The Matrix"
RETURN m, g, sub
This will return the movie node, the genre node and if it exists, the sub-genre. If there is no sub-genre then it will return null for sub. You can use variable length paths as you have above as well with OPTIONAL MATCH.
[EDITED]
The following MATCH clause should be equivalent to your pseudocode. There is also a USING INDEX clause that assumes you have first created an index on :SubGenre(name), for efficiency. (You could use an index on :Genre(name) instead, if Genre nodes are more numerous than SubGenre nodes.)
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' }),
(m)-[:SUBGENRE]->()<-[:SUB_TYPE*0..3]-(sg:SubGenre { name: 'comedy' })
USING INDEX sg:SubGenre(name)
Here is a console that shows the results for some sample data.

Get nodes from LENGTH(r)

This code
MATCH (n { name: 'Create node' })<-[r*]-(s { name: ';' })
WITH n,s, LENGTH(r) AS depth
RETURN n,s, depth
will return the number of relationships between first and last nodes. Is it possible to get the nodes that are in between those relationships?
Bonus question: is it possible to get them in order?
http://console.neo4j.org/r/z1iafh
(this code does not work in console, only on localhost. query to create nodes
create
(_0 {name:"CREATE"}),
(_1 {name:"("}),
(_2 {name:"node_name"}),
(_3 {name:")"}),
(_4 {name:";"}),
_1-[:CREATE_NODE_COMMAND]->_0,
_2-[:CREATE_NODE_COMMAND]->_1,
_3-[:CREATE_NODE_COMMAND]->_2,
_4-[:CREATE_NODE_COMMAND]->_3
)
You could augment your match statement to match the entire path from n to s and then you could use the nodes function on the path to return the collection of nodes in order (from n to s). If you want just the nodes between the start and end nodes you could return the collection form the second to the second last only.
MATCH p=(n { name: 'Create node' })<-[r*]-(s { name: ';' })
WITH n,s, size(r) as depth, length(p) as depth2, nodes(p) as nodes
RETURN n,s, depth, depth2, nodes[1..length(nodes)-1]
size() can be used to return the number of elements in a collection whereas length() should only be used to return the length of a path or a string. Its use on other objects (collections and patterns) may be deprecated in future neo4j versions; currently supported for backwards compatibility.

Resources