Neo4j: Get all relationships between a set of nodes - neo4j

Let's say I have nodes A and B. I want to find all the paths that connect these two nodes. How would I do it?
Here's an illustration I made. Sorry, I know it sucks lol

You can use a variable length relationship to return all such paths.
For your example (which has relationships pointing in both directions), this query using an undirected variable length relationship should work:
MATCH p=(:Foo {id: 'A'})-[*]-(:Foo {id: 'B'})
RETURN p
Note, however, that variable length relationship without a reasonable upper bound can take virtually forever or run out of memory. So, depending on your DB characteristics, you should determine and use a reasonable upper bound. For example:
MATCH p=(:Foo {id: 'A'})-[*..7]-(:Foo {id: 'B'})
RETURN p
To improve performance, it is also generally helpful to specify the possible relationship types along the path, to avoid going down inappropriate paths, as in:
MATCH p=(:Foo {id: 'A'})-[:TYPE_A|TYPE_B|TYPE_C*..7]-(:Foo {id: 'B'})
RETURN p

Related

Neo4J Matching Nodes Based on Multiple Relationships

I had another thread about this where someone suggested to do
MATCH (p:Person {person_id: '123'})
WHERE ANY(x IN $names WHERE
EXISTS((p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(:Dias {group_name: x})))
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
This does what I need it to, returns nodes that fit the criteria without returning the relationships, but now I need to include another param that is a list.
....(:Dias {group_name: x, second_name: y}))
I'm unsure of the syntax.. here's what I tried
WHERE ANY(x IN $names and y IN $names_2 WHERE..
this gives me a syntax error :/
Since the ANY() function can only iterate over a single list, it would be difficult to continue to use that for iteration over 2 lists (but still possible, if you create a single list with all possible x/y combinations) AND also be efficient (since each combination would be tested separately).
However, the new existenial subquery synatx introduced in neo4j 4.0 will be very helpful for this use case (I assume the 2 lists are passed as the parameters names1 and names2):
MATCH (p:Person {person_id: '123'})
WHERE EXISTS {
MATCH (p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(d:Dias)
WHERE d.group_name IN $names1 AND d.second_name IN $names2
}
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
By the way, here are some more tips:
If it is possible to specify the direction of each relationship in your query, that would help to speed up the query.
If it is possible to remove any node labels from a (sub)query and still get the same results, that would also be faster. There is an exception, though: if the (sub)query has no variables that are already bound to a value, then you would normally want to specify the node label for the one node that would be used to kick off that (sub)query (you can do a PROFILE to see which node that would be).

Enforce order of relations in multipath queries

I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let my give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, (besides not seeming to complete at all). This will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.
The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path you want to return, which requires either using the labelFilter in addition to the relationshipFilter, or just to use the combined sequence config property to specify the alternating labels/relationships expected, and making sure we use an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that use the same relationship twice.)

Two simple Cypher queries fail when combined

I'm stumped on this one, and I think the answer will be straightforward, so let me cut right to it.
Given a graph that looks like this:
Created by a query that looks like this:
CREATE (simpsons:Family {name: "Simpson"})
CREATE (homer:Father {name: "Homer"})
CREATE (lisa:Daughter {name: "Lisa"})
CREATE (snowball:Pet {name: "snowball"})
CREATE (lisa)-[:owns]->(snowball)-[:has]->(:Item {name: "catnip"})
CREATE (homer)-[:has]->(:Item {name: "beer"})
CREATE (lisa)-[:has]->(:Item {name: "saxophone"})
CREATE (lisa)<-[:memberOf]-(simpsons)-[:memberOf]->(homer)
Why would a query that looks like this fail?
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
Taken separately, its two components both find matches.
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"})
RETURN f;
and
MATCH (f:Family),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
But when pieced together there are no matches.
I have tried PROFILEing the query and it seems like Cypher works backwards from Snowball. It can make that first connection between the family and Snowball.
After that it does a VarLengthExpand(All)
snowball, f, lisa
(f)-[ UNNAMED22:*..10]-(lisa)
Which yields 6 rows. We then drop to 0 rows with this Filter:
snowball, f, lisa
lisa: Daughter
I can get the match to work if I declare a connection between the family and a daughter in the first line of the match statement, but for reasons having to do w/ my particular application this is not a useful workaround.
MATCH (f:Family)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(lisa)-[*1..10]-(snowball:Pet {name: "snowball"})-[*1..10]-(:Item {name: "catnip"})
RETURN f;
I think I'm missing something about how Cypher searches for these patterns. Does anyone have insight into what that might be? Thank you for your time!
This isn't a Cypher bug, this is a side-effect of relationship uniqueness within a given MATCH pattern.
From the uniqueness section of the docs:
While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
This type of uniqueness is usually correct, and is great for preventing infinite loops when using variable-length relationships which traverse a cycle.
Relationship uniqueness is preserved for patterns from a MATCH or an OPTIONAL MATCH, even when these include multiple comma separated paths, as in your case.
You have all of the paths within the pattern of a single MATCH, so relationships must be unique; if used in one path, they will not be reused for another path.
The real problem is here: (f)-[*1..10]-(snowball:Pet) because you've already traversed the same relationship (<memberOf between the Simpsons and Lisa) when you did (f)-[*1..10]-(lisa:Daughter) earlier. Since the relationship cannot be reused, one of those two paths will not be able to be matched, so the entire MATCH fails...no such pattern exists with unique relationships.
Note that when you break up the single MATCH into multiple MATCHes, as in stdob--'s answer, the query succeeds. There is no uniqueness in play here between separate MATCH clauses.

How can branches get pruned in a query?

suppose I have a tree where certain nodes have a relationship of a give type, how would I return all nodes in the tree except for those with the given relationship and its descendants.
I've gotten half way there with something like this (the tree is built on has links):
match (root: {Name: 'Root'})-[:has*]->(n) where not (n)-[:Exemption]-() return n
but, of course, this excludes nodes that have a relationship of type Exemption but not its descendants, so the descendants show up as unconnected nodes
how do I structure the query?
This query should work:
MATCH p=({Name: 'Root'})-[:has*]->(n)
WHERE NONE(x IN NODES(p) WHERE (x)-[:Exemption]-())
RETURN n;
It filters out any :has relationship paths with a node that (also) has an :Exemption relationship.
I would probably split the path up into two parts:
MATCH p=(:Label {Name: 'Root'})-[:has*]->(n) WHERE NOT EXISTS ((n)-[:Exemption]->())
MATCH p2 = (n)-[:has*]->(m) WHERE NOT (m)-[:has]->()
RETURN p,p2;
after some experimentation, this worked:
match (root: {Name: 'Root'})-[:has*]->(n)
match (m)-[:has*]->(n)
where not (m)-[:Exemption]-()
return n
which essentially says: find a path, for any node on that path, traverse back up to make sure that none of the ancestors show an exception. so what I'm negating here is the subpath created with the second match

Neo4J match and create relationship is very very slow with few millions records

I have about 3.5M nodes with label A and about 400 nodes with label B.
Nodes with label B already have directed relation like (b1:B)-(c:CONNECTS)->(b2:B) now I need to add 3.5M another type of relationships by comparing A node properties with :CONNECTS relationship properties.
My statement looks like this:
MATCH (a:A)
MATCH (c:C)
MATCH (b1:B {id: a.a1_id})-[rl:CONNECTS*1..21]->(b2:B {id: a.b2_id}) WHERE ALL(x in rl WHERE x.connect_id = c.connect_id)
MATCH (new_a:B)-[r:TO]->(new_b:B) WHERE r in rl
CREATE (new_a)-[:TICKET {ticket_id: ID(a)}]->(new_b)
This statement is extremely slow and just hangs up. I even tried to do some performance tuning mentioned here, especially I allocated heap size to 16GB.
I find it quite strange that it can't handle this size of data. What am I missing? I tried to model differently and reduce relationship queries and use more schema index, but I failed to do a lot differently because of type of data I have and type of query I want to perform after all data is there.
I also tried to use periodic commit while creating A nodes with csv import. It has same issues.
I hope I am clear enough. I would really appreciate some inputs. Thanks.
What are the labels A, B, C ? A CONNECTS relationship is also free of meaning.
Queries like this are meant to be comprehensible not the opposite!
// generates 3.5M rows
MATCH (a:A)
// generates x-times 3.5M rows
// you never use that C except for checking an connect id?
MATCH (c:C)
// many million times execute this variable length expand
MATCH (b1:B {id: a.a1_id})-[rl:CONNECTS*1..21]->(b2:B {id: b2_id})
WHERE ALL(x in rl WHERE x.connect_id = c.connect_id)
// lookup by relationship is very bad esp. as you looking over a cross product of all 400x400 B's
MATCH (new_a:B)-[r:TO]->(new_b:B) WHERE r in rl
// why do you store the id of a on this self!!-relationship?
CREATE (new_b)-[:TICKET {ticket_id: ID(a)}]->(new_b);
Where does b2_id come from?
Perhaps something like this:
MATCH (a:A)
MATCH (b1:B {id: a.a1_id})
MATCH (b2:B {id: {b2_id}})
MATCH (b1)-[rels:CONNECTS*..21]->(b2)
WHERE ALL(x in tail(rels) WHERE x.connect_id = head(rels).connect_id)
UNWIND rels AS r
WITH a,startNode(r) as new_a, endNode(r) as new_b
CREATE (new_a)-[:TICKET {ticket_id: ID(a)}]->(new_b);

Resources