Neo4j cypher query for X-chromosome ancestors - neo4j

In genetic genealogy X-chromosome data is useful linking to certain ancestors. This is well illustrated at: X-DNA Inheritance Chart
My Neo4j database has nodes for each Person and relationships connecting them of father and mother. Each node has a property sex (for the Person's gender; M or F). A female has two X-chromosomes, one from either parent. A male has one X-chromosome, always from the mother.
You can use reduce to see the genders involved in the inheritance from ancestors:
match p=(n:Person{RN:1})-[:father|mother*..20]->m
return m.fullname as FullName
,reduce(status ='', q IN nodes(p)| status + q.sex) AS c
order by length(p), c
So, starting with a male (RN:1), the result for c is MM for his father and MF for his mother, MMM for the paternal grandfather and MFM for the maternal grandfather, etc. This pattern shows that when c contains MM (two Ms together in sequence) that these are NOT contributing to the X-chromosome of the start Person.
I want to remove any node that has the MM pattern. It's easy to do this with external code, but I cannot figure out how to do it within the cypher query.

This should work for you:
MATCH p=(n:Person { RN:1 })-[:father|mother*..20]->m
WITH m, NODES(p) AS a
WITH m, REDUCE(c = "", i IN RANGE(0, SIZE(a)-1)| CASE
WHEN c IS NULL OR (i > 0 AND (a[i-1]).sex = "M" AND (a[i]).sex = "M") THEN
NULL
ELSE
c + (a[i]).sex
END ) AS c
WHERE c IS NOT NULL
RETURN m.fullName AS fullName, c
ORDER BY LENGTH(c);
And here is a console that demonstrates the results.

A little late to the party and same thought process as #cybersam's solution.
match p=(n:Person { RN: 1 })-[:father|mother*..20]->(m)
with p, m, extract( g in nodes(p) | g.sex ) as genders
with p, m, genders, range(0,size(genders) -1,1) as gender_index
unwind gender_index as idx
with p, m, genders, collect([genders[idx], genders[idx+1]]) as pairs
where not ['M','M'] in pairs
return m.fullName
,reduce(status ='', q IN nodes(p)| status + q.sex) AS c
order by length(p), c

This query gets me only ancestors contributing an X-chromosome:
match p=(n:Person{RN:1})-[:father|mother*..20]->(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return m.RN,m.fullname as Name, c
The collection of genders adds a gender for each generation and is filtered to exclude any MM since a male cannot transmit his X to another male (e.g., son).

Related

Pattern Matching in Neo4j

Assume that in an application, the user gives us a graph and we want to consider it as a pattern and find all occurrences of the pattern in the neo4j database. If we knew what the pattern is, we could write the pattern as a Cypher query and run it against our database. However, now we do not know what the pattern is beforehand and receive it from the user in the form of a graph. How can we perform a pattern matching on the database based on the given graph (pattern)? Is there any apoc for that? Any external library?
One way of doing this is to decompose your input graph into edges and create a dynamic cypher from it. I have worked on this quite some time ago, and the solution below is not perfect but indicates a possible direction.
For example, if you feed this graph:
and you take the id(node) from the graph, (i am not taking the rel ids, this is one of the imperfections)
this query
WITH $nodeids AS selection
UNWIND selection AS s
WITH COLLECT (DISTINCT s) AS selection
WITH selection,
SPLIT(left('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',SIZE(selection)*2-1),",") AS nodeletters
WITH selection,
nodeletters,
REDUCE (acc="", nl in nodeletters |
CASE acc
WHEN "" THEN acc+nl
ELSE acc+','+nl
END) AS rtnnodes
MATCH (n) WHERE id(n) IN selection
WITH COLLECT(n) AS nodes,selection,nodeletters,rtnnodes
UNWIND nodes AS n
UNWIND nodes AS m
MATCH (n)-[r]->(m)
WITH DISTINCT "("
+nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(n) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(n)| acc + ':'+ p))+")-[:"+type(r)+"]->("
+ nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(m) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(m)| acc + ':'+ p))+")" as z,rtnnodes
WITH COLLECT(z) AS parts,rtnnodes
WITH REDUCE(y=[], x in range(0, size(parts)-1) | y + replace(parts[x],"[","[r" + (x+1))) AS parts2,
REDUCE (acc="", x in range(0, size(parts)-1) | CASE acc WHEN "" THEN acc+"r"+(x+1) ELSE acc+",r"+(x+1) END) AS rtnrels,
rtnnodes
RETURN
REDUCE (acc="MATCH ",p in parts2 |
CASE acc
WHEN "MATCH " THEN acc+p
ELSE acc+','+p
END)+
" RETURN "+
rtnnodes+","+rtnrels+
" LIMIT "+{limit}
AS cypher
returns something like
cypher: "MATCH (a:Person)-[r1:DRIVES]->(b:Car),(a:Person)-[r2:KNOWS]->(c:Person) RETURN a,b,c,r1,r2 LIMIT 50"
which you can feed to the next query.
In Graphileon, you can just select the nodes, and the result will be visualized as well.
Disclosure : I work for Graphileon
I have used patterns in genealogy queries.
The X-chromosome is not transmitted from father to son. As you traverse a family tree you can use the reduce function to create a concatenated string of the sex of the ancestor. You can then accept results that lack MM (father-son). This query gives all the descendants inheriting the ancestor's (RN=32) X-chromosome.
match p=(n:Person{RN:32})<-[:father|mother*..99]-(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return distinct m.fullname as Fullname
I am developing other pattern specific queries as part of a Neo4j PlugIn for genealogy. These will include patterns of triangulation groups.
GitHub repository for Neo4j Genealogy PlugIn

Query to write hops and return all the properties from the middle nodes or a better way to do it and skip hops?

Node A is connected to Node E through different nodes B (B can be repeating), C and D etc as given below.
(A)--(C)--(D)--(E)
(A)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(B)--(C)--(D)--(E)
There could be up to 7 B nodes between A and C or no B node at all (like the first case above).
Question: How to get all the E1, E2, E3, E4 connected to A1 with a single query and return properties from all A, B, C, D and E nodes? I could not return the properties using the hops.
MATCH (A {Id:30})-[*1..6]-(E) RETURN DISTINCT A.Name, E.Name;
But we want to return B.Name (If there are multiple B nodes in the middle their names too), C.Name and D.Name too. Happy to skip hopping completely if required. Help please? Thanks in advance.
Try this
// in case there is always A,C and E, you can look for
// paths with length 3 to 6
MATCH path=(A)-[*3..6]-(E)
// return the name of each node in the same order
RETURN [n IN nodes(path) | n.name] AS nodeNames
Assuming A through E are node labels, this query should get all paths that match your pattern (with 0 to 7 B nodes between the A and C nodes), and return distinct lists of node Name values:
MATCH p=(:A)-[*..8]-(:C)--(:D)--(:E)
WHERE ALL(n IN NODES(p)[1..-3] WHERE 'B' IN LABELS(n))
RETURN DISTINCT [m IN NODES(p) | m.Name] AS names
In general, the query would be more efficient if you could also specify the relationship types and their directionality.

MATCH based on sum of common node property

I need help figuring out a MATCH statement.
My data model is as follows:
(a:musician {name}) //individual musicians
(b:jamSession {date, durationInHours}) //the date and length of jam sessions where 2 or more musicians participated together
and the relation
[r:PLAYED]
I've already figured out how to find all of the jam sessions a specific musician played at:
MATCH (a:musician {name:"Joe Smith"})-[r:PLAYED]->(b:jamSession) RETURN a.name, b.date
and all of the musicians a specific musician played with
MATCH (a:musician {name:"Joe Smith"})-[r:PLAYED]->(b:jamSession)<-[r2:PLAYED]-(c:musician) RETURN c.name
But how do I get only the musicians that Joe Smith has played with were the sum total time of their common jam sessions was >=100 hours and what date the pair of musicians meet the 100 hour milestone?
This may work for you (assuming b.date is suitable for sorting):
MATCH (a:musician)-[:PLAYED]->(b:jamSession)<-[:PLAYED]-(c:musician)
WHERE a.name = "Joe Smith"
WITH a, c, b ORDER BY b.date
WITH a, c,
REDUCE(s = {sum: 0}, x IN COLLECT(b) |
CASE WHEN s.date IS NULL AND x.durationInHours + s.sum >= 100
THEN {sum: s.sum + x.durationInHours, date: x.date}
ELSE s
END
) AS data
WHERE data.date IS NOT NULL
RETURN c.name AS name, data.date AS date
The aggregating function COLLECT is used to collect the date-ordered b nodes that are shared by the same a and c node pairs. And the REDUCE function is used to iterate through the ordered b nodes to find when the 100 threshold is met.

Neo4j Return nodes such that every link follows an specific pattern

I was practicing with the Movie Database from Neo4j in order to practice and I have done the next query:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
RETURN a
This query returns 3 rows but If I go to the graph view on the web editor and expand the "Tom Hanks" node I, of course, have one movie such that Tom Hanks directed and acted in that movie but the rest of the connected nodes only have the ACTED_IN relation. What I want to do is to, in this case, filter and remove Tom Hanks from the result since he has at least one connection such that it has only one relation (either ACTED_IN or DIRECTED)
PD: My expected result would be only the row representing node "Clint Eastwood"
So you only want results where the person acted in and directed the same movies, but never simply acted in, without directing, or directed, without acting.
You could use this approach:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN]->()) = actedDirectedCount AND size((a)-[:DIRECTED]->()) = actedDirectedCount
RETURN a
Though you can simplify this a bit by combining the relationship types in the pattern used in your WHERE clause like so:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN|DIRECTED]->()) = actedDirectedCount * 2
RETURN a
If the actedDirectedCount = 3 movies, then there must be at a minimum 3 :ACTED_IN relationships and 3 :DIRECTED relationships, so a minimum of 6 relationships using either relationship. If there are any more than this, then there are additional movies that they either acted in or directed, so we'd filter that out.
There options come to my mind:
1.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, collect(distinct m) as actedMovies
with a where all(x in directedMovies where x in actedMovies) and all(x in actedMovies where x in directedMovies)
return a
2.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with * order by id(m)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, m order by id (m)
with a, directedMovies, collect(distinct m) as actedMovies
with a where actedMovies=directedMovies
return a
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
with * where all(x in directedMovies where (a)-[:ACTED_IN]->(x))
MATCH (m:Movie)<-[:ACTED_IN]-(a)
with a, collect(distinct m) as actedMovies
with * where all(x in actedMovies where (a)-[:DIRECTED]->(x))
return a
The first two are equally expensive and the last one is a bit more expensive.

Create relationship between nodes having same property value in common, using one Cypher query

Beginning with Neo4j 1.9.2, and using Cypher query language, I would like to create relationships between nodes having a specific property value in common.
I have set of nodes G having a property H, without any relationship currently existing between G nodes.
In a Cypher statement, is it possible to group G nodes by H property value and create a relationship HR between each nodes becoming to same group? Knowing that each group have a size between 2 & 10 and I'm having more than 15k of such groups (15k different H values) for about 50k G nodes.
I've tried hard to manage such query without finding a correct syntax. Below is a small sample dataset:
create
(G1 {name:'G1', H:'1'}),
(G2 {name:'G2', H:'1'}),
(G3 {name:'G3', H:'1'}),
(G4 {name:'G4', H:'2'}),
(G5 {name:'G5', H:'2'}),
(G6 {name:'G6', H:'2'}),
(G7 {name:'G7', H:'2'})
return * ;
At the end, I'd like such relationships:
G1-[:HR]-G2-[:HR]-G3-[:HR]-G1
And:
G4-[:HR]-G5-[:HR]-G6-[:HR]-G7-[:HR]-G4
In another case, I may want to update massively the relationships between nodes using/comparing some of their properties. Imagine nodes of type N and nodes of type M, with N nodes related to M with a relationship named :IS_LOCATED_ON. The order of the location can be stored as a property of N nodes (N.relativePosition being Long from 1 to MAX_POSITION), but we may need later to update the graph model such a way: make N nodes linked between themselves by a new :PRECEDES relationship, so that we can find easier and faster next node N on the given set.
I'd expect such language may allow to update massive set of nodes/relationships manipulating their properties.
Is it not possible?
If not, is it planned or may be it planned?
Any help would be greatly appreciated.
Since there's nothing in the data you supplied to get rank, I've played with collections
to get one as follows:
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, LENGTH(FILTER(x IN others : x.name < n.name)) as rank
RETURN n.name, n.H, rank ORDER BY n.H, n.name;
Building off of that you can then start determining relationships
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, LENGTH(FILTER(x IN others : x.name < n.name)) as rank
WITH n, others, rank, COALESCE(
HEAD(FILTER(x IN others : x.name > n.name)),
HEAD(others)
) as next
RETURN n.name, n.H, rank, next ORDER BY n.H, n.name;
Finally ( and slightly more condensed )
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, COALESCE(
HEAD(FILTER(x IN others : x.name > n.name)),
HEAD(others)
) as next
CREATE n-[:HR]->next
RETURN n, next;
You can just do it like that, maybe indicate direction in your relationships:
CREATE
(G1 { name:'G1', H:'1' }),
(G2 { name:'G2', H:'1' }),
(G3 { name:'G3', H:'1' }),
(G4 { name:'G4', H:'2' }),
(G5 { name:'G5', H:'2' }),
(G6 { name:'G6', H:'2' }),
(G7 { name:'G7', H:'2' }),
G1-[:HR]->G2-[:HR]->G3-[:HR]->G1,
G4-[:HR]->G5-[:HR]->G6-[:HR]->G7-[:HR]->G1
See http://console.neo4j.org/?id=ujns0x for an example.

Resources