Neo4j: Exclude certain nodes in variable path relationship - neo4j

I've a graph database consisting of two types of nodes - persons and businesses, and one type of relationship - payment.
A person may pay either another person or a business. Likewise, a business may pay a person or another business. That is, all four of these path types are possible:
(person)-[:PAYS]->(person)
(person)-[:PAYS]->(business)
(business)-[:PAYS]->(person)
(business)-[:PAYS]->(business)
In a use case of detecting possible money laundering, I would like to extract cases where payment made by a person went through several businesses before reaching another person. That is (omitting the relationship for convenience):
(person)-(business)-(business)-(business)-(person)
My cypher query should therefore look something like this:
(person)-[:PAYS*0..3]-(person)
However, this will also return me the following relationship, which isn't what I want:
(person)-(business)-(person)-(business)-(person)
What can I do to exclude (person) from the variable length relationship [:PAYS*0..3]?
I've followed the solution given here and tried this:
MATCH path=((person)-[:PAYS*0..3]-(person))
WHERE NONE(n IN nodes(path) WHERE n:person)
RETURN path
However, this query ran for a long time before returning zero results (which isn't correct). Another obvious solution is to change my relationships to distinguish between [:PAYS_BUSINESS] and [:PAYS_PERSON], but I would like to find out whether a solution is possible without changing my graph schema.

The reason that
MATCH path=((person)-[:PAYS*0..3]-(person))
WHERE NONE(n IN nodes(path) WHERE n:person)
RETURN path
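does not return anything seems to be that the first and the last nodes of the path are persons, so the NONE predicate fails for every path.
If you want to find the paths from :person to :person with only :business nodes in between, you could do this (labels are case-sensitive, so use the spelling that matches your data):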
MATCH path=((p1:Person)-[:PAYS*1..3]-(p2:Person))
WHERE ALL(n IN nodes(path)[1..-1] WHERE n:Business)
RETURN path
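The interior-node check in that WHERE clause can be modelled outside Neo4j. The following is only a sketch in plain Python, assuming a path is represented as a list of hypothetical (name, label) pairs; Python's path[1:-1] drops the first and last elements just as nodes(path)[1..-1] does in Cypher.

```python
# Hypothetical stand-in for a Neo4j path: a list of (name, label) pairs.
def only_businesses_between(path):
    """True when both end nodes are Persons and every interior node is a Business."""
    interior = path[1:-1]  # same slice as nodes(path)[1..-1] in Cypher
    ends_are_persons = path[0][1] == "Person" and path[-1][1] == "Person"
    return ends_are_persons and all(label == "Business" for _, label in interior)

wanted = [("alice", "Person"), ("shop1", "Business"),
          ("shop2", "Business"), ("bob", "Person")]
unwanted = [("alice", "Person"), ("shop1", "Business"), ("carol", "Person"),
            ("shop2", "Business"), ("bob", "Person")]
direct = [("alice", "Person"), ("bob", "Person")]

print(only_businesses_between(wanted))    # interior is all businesses
print(only_businesses_between(unwanted))  # a person sits in the interior
print(only_businesses_between(direct))    # empty interior also passes
```

Note that an empty interior satisfies the check, so a direct person-to-person payment passes as well. If at least one business must sit in between, the lower bound of the variable-length pattern would need to be 2 rather than 1.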
You may also want to look at the apoc.path.expand and apoc.path.expandConfig procedures (https://neo4j.com/labs/apoc/4.1/overview/apoc.path/). They are powerful, but they introduce a dependency on the APOC library.

5 minutes after I posted this question, I thought of and tried a possible solution that seems to work. Not sure if this is against the rules, but here's a possible way out of my own problem (in case someone else is facing the same problem):
MATCH x=(p1:person)-[:PAYS]-(b1:business)
WITH *
MATCH y=(b1:business)-[:PAYS*..3]-(b2:business)-[:PAYS]-(p2:person)
RETURN x, y

You might want to look at how I handled this with X-linked inheritance. In that use case you aggregate the sex of each parent (M or F) along the path and can then exclude MM from the aggregated string, since a man never passes an X to his son.
http://stumpf.org/genealogy-blog/graph-databases-in-genealogy
The query excludes every path whose concatenated string contains MM and accepts anything else:
match p=(n:Person{RN:32})<-[:father|mother*..99]-(m)
with m,
  reduce(status = '', q IN nodes(p) | status + q.sex) AS c,
  reduce(srt2 = '|', q IN nodes(p) | srt2 + q.RN + '|') AS PathOrder
where c = replace(c, 'MM', '')
return distinct m.fullname as Fullname
In your case it's P and B (person or business).
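Applied to the payment graph, the string-aggregation idea might look like the sketch below. It is plain Python rather than Cypher, and the one-letter codes and function names are assumptions made for illustration. Each node contributes P or B, and the resulting string is accepted only if it is a P, one or more Bs, and a final P.

```python
import re

# Assumed one-letter codes per label; not part of the original data model.
LABEL_CODE = {"person": "P", "business": "B"}

def path_signature(labels):
    """Concatenate one letter per node, like reduce(... status + q.sex) above."""
    return "".join(LABEL_CODE[label] for label in labels)

def is_layered_payment(labels):
    """Accept only person -> one or more businesses -> person."""
    return re.fullmatch(r"PB+P", path_signature(labels)) is not None

print(path_signature(["person", "business", "business", "person"]))  # PBBP
print(is_layered_payment(["person", "business", "business", "business", "person"]))
print(is_layered_payment(["person", "business", "person", "business", "person"]))
```

Inside Neo4j the same test could presumably be written with reduce() to build the string and the =~ regular-expression operator to check it.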

Related

In cypher, return only the nodes with most recent relationship

Related to this other question, but approaching from the other side of the relationship.
Here is the scenario. I am modeling a person that lives or has lived at one or more different locations. Included in the relationship is the start date (represented as ms since epoch) when they moved in.
(:Person{name:'bill'}) -[:livesAt {since:1111000}]->(:Place{name:'apartmentA'})
(:Person{name:'bill'}) -[:livesAt {since:2222000}]->(:Place{name:'apartmentB'})
(:Person{name:'john'}) -[:livesAt {since:3333000}]->(:Place{name:'apartmentA'})
(:Person{name:'chris'}) -[:livesAt {since:1100000}]->(:Place{name:'apartmentC'})
(:Person{name:'chris'}) -[:livesAt {since:1122000}]->(:Place{name:'apartmentA'})
I want to write a query that returns the person nodes that are still living at a given location. They are still living at a location if the livesAt relationship has the largest "since" of all relationships from that person.
I was trying something like this:
MATCH (:Place {name: 'apartmentA'})<-[r]-(p:Person)
WITH max(r.since) as most_recent, p.name as pname
MATCH (t:Person {name:pname}) -[e]->(l:Place)
WITH t,l
ORDER BY e.since DESC
return t,l
If my query worked with the example above, given place 'apartmentA' I would expect to get john and chris.
To find what you want, you have to make sure that you filter to people who don't have a :livesAt relationship with a larger since property (indicating that they now live elsewhere) to a different location. That's important because it's possible that they lived at that location, moved elsewhere, then moved back later.
We can use existential subqueries from Neo4j 4.x to give us finer control for describing the pattern that we don't want to exist.
MATCH (loc:Place {name: 'apartmentA'})<-[r:livesAt]-(p:Person)
WITH loc, max(r.since) as most_recent, p
WHERE NOT EXISTS {
MATCH (p) -[r:livesAt]->(other)
WHERE r.since > most_recent AND other <> loc
}
RETURN p.name
You might also consider remodeling this, keeping a :currentResidence relationship to their current residence, updating that (deleting the old, creating the new) when they move. That's in addition to the :livesAt relationships you already have (I assume you use those for other queries). That lets you very quickly perform checks and matches based on current residence without needing to do any additional filtering at all.
EDIT:
If you don't want to use an existential subquery, we can use an OPTIONAL MATCH of the pattern instead, and only filter to the results where the other node is null, meaning that no such pattern exists:
MATCH (loc:Place {name: 'apartmentA'})<-[r:livesAt]-(p:Person)
WITH loc, max(r.since) as most_recent, p
OPTIONAL MATCH (p) -[r:livesAt]->(other)
WHERE r.since > most_recent AND other <> loc
WITH p, other
WHERE other IS NULL
RETURN p.name
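The filter that both queries in this answer express can be checked against the sample data from the question. This is only a sketch in plain Python, with the relationships flattened into hypothetical (person, place, since) tuples: a person counts as a current resident if no livesAt relationship to a different place has a larger since than their most recent one to the given place.

```python
# Sample data from the question, flattened to (person, place, since) tuples.
lives_at = [
    ("bill", "apartmentA", 1111000),
    ("bill", "apartmentB", 2222000),
    ("john", "apartmentA", 3333000),
    ("chris", "apartmentC", 1100000),
    ("chris", "apartmentA", 1122000),
]

def current_residents(rows, place):
    """People whose latest move-in to `place` is not followed by a move elsewhere."""
    residents = []
    for person in sorted({p for p, _, _ in rows}):
        at_place = [since for p, loc, since in rows if p == person and loc == place]
        if not at_place:
            continue  # never lived at this place
        most_recent = max(at_place)
        moved_on = any(
            p == person and loc != place and since > most_recent
            for p, loc, since in rows
        )
        if not moved_on:
            residents.append(person)
    return residents

print(current_residents(lives_at, "apartmentA"))  # ['chris', 'john']
```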

Traverse both incoming and outgoing relationship in Cypher

I am new to Neo4j but not to graphs, and I have a specific problem that I have not managed to solve with Cypher.
With this type of data:
I would like to be able in a single query to follow some incoming and some outgoing flow.
Example:
Starting on "source"
Follow all "A" relationships in the outgoing way
Follow all "B" relationships in the incoming way
My problem is that Cypher only allows one single direction to be specified in the relationship pattern.
So I could do (source)-[:A|:B*]->() or (source)<-[:A|:B*]-().
But I have no possibility to tell Cypher that I want to follow -[:A]-> and <-[:B]-.
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
Thanks in advance for your help :)
As an alternative to @Gabor Szarnyas's answer, I think you can achieve your goal using the APOC procedure apoc.path.expand.
Using this sample data set:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
And calling apoc.path.expand:
match (source:Source)
call apoc.path.expand(source,"A>|<B","",0,5) yield path
return path
You will get this path as output:
The apoc.path.expand call starts from the source node following -[:A]-> and <-[:B]- relationships.
Remember to install APOC procedures according to the version of Neo4j you are using. Take a look in the version compatibility matrix.
Expressing this in a single query would require a regular path query, which has been proposed to and accepted by openCypher but is not yet implemented.
I see two possible workarounds. I recreated your example with this command with a Source label for the source node:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
(1) Insert additional relationships that have the same direction:
MATCH (s)-[:B]->(t)
CREATE (s)<-[:B2]-(t)
And use this relationship for traversal:
MATCH p=(source)-[:A|:B2*]->()
RETURN p
(2) As you mentioned:
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
You could use this approach to first get potential path candidates and manually check the directions of the relationships afterwards. Of course, this is an expensive operation but you only have to calculate it on the candidates, a possibly small data set.
MATCH p=(source:Source)-[:A|:B*]-()
WITH p, nodes(p) AS nodes, relationships(p) AS rels
WHERE all(i IN range(0, size(rels) - 1) WHERE
CASE type(rels[i])
WHEN 'A' THEN startNode(rels[i]) = nodes[i]
ELSE /* B */ startNode(rels[i]) = nodes[i+1]
END)
RETURN p
Let's break down how this works:
We store path candidates in p and use the nodes and relationships functions to extract the lists of nodes/relationships from it.
We define a range of indexes for the relationships (e.g. 0, 1, 2 if there are 3 relationships).
To determine the direction of a relationship, we use the startNode function. For example, if there is a relationship r between nodes n1 and n2, the path will look like <n1, r, n2>. If r was traversed in the outgoing direction, startNode(r) will return n1; if it was traversed in the incoming direction, startNode(r) will return n2. The type of the relationship is checked with the type function, and a simple CASE expression is used to differentiate between types.
The WHERE clause uses the all predicate function to check whether all :A and :B relationships had the appropriate directions.
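The direction check described in these steps can be sketched outside Neo4j as well. The version below is plain Python under the assumption that each relationship is a hypothetical (type, start_node, end_node) tuple in stored direction, and that nodes lists the path's nodes in traversal order.

```python
# Each relationship is (type, start_node, end_node) in its stored direction.
def follows_allowed_directions(nodes, rels):
    """A must be traversed outgoing, anything else (B) incoming, as in the CASE above."""
    for i, (rel_type, start, _end) in enumerate(rels):
        # Outgoing traversal starts at nodes[i]; incoming starts at nodes[i + 1].
        expected_start = nodes[i] if rel_type == "A" else nodes[i + 1]
        if start != expected_start:
            return False
    return True

# The sample graph: (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
nodes = ["s", "n1", "n2", "n3", "n4"]
rels = [("A", "s", "n1"), ("A", "n1", "n2"), ("B", "n3", "n2"), ("A", "n3", "n4")]
print(follows_allowed_directions(nodes, rels))

# Walking an A relationship backwards (from n2 to n1) is rejected.
print(follows_allowed_directions(["n2", "n1"], [("A", "n1", "n2")]))
```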

Cypher Query not returning nonexistent relationships

I have a graph database where there are user and interest nodes which are connected by IS_INTERESTED relationship. I want to find interests which are not selected by a user. I wrote this query and it is not working
OPTIONAL MATCH (u:User{userId : 1})-[r:IS_INTERESTED] -(i:Interest)
WHERE r is NULL
Return i.name as interest
According to answers to similar questions on SO (like this one), the above query is supposed to work. However, in this case it returns null. But when running the following query it works as expected:
MATCH (u:User{userId : 1}), (i:Interest)
WHERE NOT (u) -[:IS_INTERESTED] -(i)
return i.name as interest
The reason I don't want to run the above query is because Neo4j gives a warning:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (i))
What am I doing wrong in the first query where I use OPTIONAL MATCH to find nonexistent relationships?
1) MATCH looks for the pattern as a whole; if it cannot find the pattern in its entirety, it returns nothing.
2) I think that this query will be effective:
// Take all user interests
MATCH (u:User{userId: 1})-[r:IS_INTERESTED]-(i:Interest)
WITH collect(i) as interests
// Check what interests are not included
MATCH (ni:Interest) WHERE NOT ni IN interests
RETURN ni.name
When your OPTIONAL MATCH query does not find a match, then both r AND i must be NULL. After all, since there is no relationship, there is no way to get the nodes that it points to.
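The point above can be illustrated with a rough model of the two queries from the question. This is a sketch in plain Python over made-up sample data, not how Neo4j evaluates anything internally: the optional match either yields real (r, i) pairs or a single all-null row, so filtering on r IS NULL can never surface an actual interest, whereas the anti-join (WHERE NOT pattern) can.

```python
# Hypothetical sample data: all interests, and (userId, interest) pairs for IS_INTERESTED.
interests = ["music", "sport", "chess"]
is_interested = {(1, "music")}

def optional_match_where_r_is_null(user_id):
    """Model of: OPTIONAL MATCH (u)-[r]-(i) WHERE r IS NULL."""
    matches = [(("rel", user_id, i), i) for i in interests
               if (user_id, i) in is_interested]
    # A relationship that was found is never null, so the predicate rejects every match.
    kept = [(r, i) for r, i in matches if r is None]
    # OPTIONAL MATCH falls back to one row of nulls when nothing is kept.
    return kept or [(None, None)]

def not_interested(user_id):
    """Model of: MATCH (u), (i) WHERE NOT (u)-[:IS_INTERESTED]-(i)."""
    return [i for i in interests if (user_id, i) not in is_interested]

print(optional_match_where_r_is_null(1))  # [(None, None)]
print(not_interested(1))                  # ['sport', 'chess']
```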
A WHERE placed directly after an OPTIONAL MATCH is evaluated as part of that optional pattern match.
If you want to filter the results afterwards, you have to put a WITH in between.
MATCH (u:User{userId : 1})
OPTIONAL MATCH (u)-[r:IS_INTERESTED] -(i:Interest)
WITH r,i
WHERE r is NULL
Return i.name as interest

Cypher path querying (using Neo4j)

I have a graph database that contains a pattern like this one:
(n1)-[:a]->(n2),
(n1)-[:b]->(n2),
(n1)-[:c]->(n2),
(n1)-[:e]->(n2),
(n1)-[:d]->(n3),
(n2)-[:b]->(n4)
I want to retrieve every subgraph that matches this pattern:
MATCH p={
(n3)<-[:d]-(n1)-[:a]->(n2)-[:b]->(n4),
(n1)-[:b]->(n2)<-[:c]-(n1),
(n1)-[:e]->(n2)
}
RETURN p
Is it possible? I've searched a little but I haven't found how to do it.
I know we can use "|" for a type like this
()-[:a|b]->()
but there is no "&" and the path assigning only works on pattern which are written without ",".
Thanks
EDIT:
If it could help, here is another example of what I'm seeking:
In a database with movies, person and relations like ACTED_IN, KNOWS, FRIEND and HATE
I want all the graphs containing an actor "Actor1" (who ACTED_IN a movie "M") who KNOWS "Person1", FRIEND "Person2" and HATE "Person3" which ACTED_IN the same movie "M".
An UNION like the one in the answer of "Michael Hunger" does not work because we have multiple subgraphs and not graphs. Moreover, some subgraph might not be correct answers for the bigger pattern.
Your query will be very inefficient, as you don't restrict your search to a set of start nodes, either with labels or with label+property combinations.
You can use UNION for that:
MATCH p=(n3)<-[:d]-(n1)-[:a]->(n2)-[:b]->(n4) RETURN p
UNION
MATCH p=(n1)-[:b]->(n2)<-[:c]-(n1) RETURN p
UNION
MATCH p=(n1)-[:e]->(n2) RETURN p

Cypher: preventing results from duplicating on WITH / sequential querying

In a query like this
MATCH (a)
WHERE id(a) = {x}
MATCH (a)-->(b:x)
WITH a, collect(DISTINCT id(b)) AS Bs
MATCH (a)-->(c:y)
RETURN collect(c) + Bs
what I'm trying to do is to gather two sets of nodes that came from different queries, but with this kind of procedure all the b rows get to be returned multiplied by the number of a rows.
How should I deal with this kind of problem that arises from sequential queries?
[Note that the reported query is only a conceptual representation of what I mean. Please don't try to solve the code (that would be trivial) but only the presented problem.]
Your query shouldn't return any cross product, since you aggregate in the WITH clause, so there is only one result item/row (the disconnected path a, collect(b)) when the second match begins. It's therefore not clear what problem you want solved; cross products can be solved differently in different cases.
The way your query would work, conceptually speaking, is: match anything related from a, then filter that anything on having label :x. The second leg of the query does the same but filters on label :y. You can therefore combine your queries as
MATCH (a)-->(b)
WHERE id(a) = {x} AND (b:x OR b:y)
RETURN b
Other cases of 'path explosion' can't be solved as easily (sometimes UNION is good, sometimes you can reorder your pattern, sometimes you can do some aggregate-and-reduce to make it happen), but you'll have to ask about that separately.
How about using UNION for this? See http://docs.neo4j.org/chunked/milestone/query-union.html#union-combine-two-queries-and-remove-duplicates
-brian

Resources