I have a subgraph in neo4j with multiple paths, generated via:
match p=((n:Actor)-[*1..3]->(m:film)) where n.surname='Craig' and m.name='Minions' and ALL(x in nodes(p)[1..length(p)-1] where labels(x)[0]='Director') return p
Now, from this subgraph I want a list of tuples, where each tuple is a pair of connected nodes in the subgraph:
node0, node1
node1, node3
node0, node2
node2, node26
I tried:
match p=((n:Actor)-[*1..3]->(m:film))
where n.surname='Craig' and m.name='Minions' and ALL(x in nodes(p)[1..length(p)-1] where labels(x)[0]='Director')
with nodes(p) as np
match p2=((nn)-[]-()) where nn IN np
return p2
but this just returned the nearest neighbour of every single node in p. Including to nodes not in the subgraph.
This seems to work
MATCH p=((n:Actor)-[*1..3]->(m:Film))
WHERE n.surname='Craig' AND m.name='minions' AND ALL(x in nodes(p)[1..length(p)-1] WHERE labels(x)[0]='Director')
MATCH p2=(n2)-[r]-(m2)
WHERE n2 IN nodes(p) AND m2 IN nodes(p)
RETURN
n2,r,m2
However is very slow, any speed up recommendations?
Rather than take all of the data from the paths and rematch it against the graph you could process the paths for pairs in memory. If you take the nodes from each path and process them two an a time, you can collect them in ordered pairs.
...
// for each path grab the nodes
// and an index for them less the last one
//
with nodes(p) as node_list, range(0, size(nodes(p)) - 2, 1) as idx
//
// put the tuples in ordered pairs
//
unwind idx as i
with node_list[i] as a , node_list[i+1] as b
with
case
when id(a) < id(b) then [id(a), id(b)]
else [id(b), id(a)]
end as tuple
return tuple, count(*)
order by count(*) desc
Related
I have a graph where a pair of nodes can have several relationships between them.
I would like to count this relationships between each pair of nodes, and set it as a parameter of each relationship.
I tried something like:
MATCH (s:LabeledExperience)-[r:NextExp]->(e:LabeledExperience)
with s, e, r, length(r) as cnt
MATCH (s2:LabeledExperience{name:s.name})-[r2:NextExp{name:r.name}]->(e2:LabeledExperience{name: e.name})
SET r2.weight = cnt
But this set the weight always to one.
I also tried:
MATCH ()-[r:NextExp]->()
with r, length(r) as cnt
MATCH ()-[r2:NextExp{name:r.name}]->()
SET r2.weight = cnt
But this takes too much time since there are more than 90k relationships and there is no index (since it is not possible to have them on edges).
They are always set to 1 because of the way you are counting.
When you group by s, e, r that is always going to result in a single row. But if you collect(r) for every s, e then you will get a collection of all of the :NextExp relationships between those two nodes.
Also, length() is for measuring the length (number of nodes) in a matched path and should not work directly on a relationship.
Match the relationship and put them in a collection for each pair of nodes. Iterate over each rel in the collection and set the size of the collection of rels.
MATCH (s:LabeledExperience)-[r:NextExp]->(e:LabeledExperience)
WITH s, e, collect(r) AS rels
UNWIND rels AS rel
SET rel.weight = size(rels)
I have Neo4j community 3.5.5, where I built a graph data model with rail station and line sections between stations. Stations and line sections are nodes and connect is the relationship linking them.
I would like to name a starting station and an ending station and sum up length of all rail section in between. I tried the Cypher query below but Neo4j does not recognize line_section as a node type.
match (n1:Station)-[:Connect]-(n2:Station)
where n1.Name='Station1' and n2.Name='Station3'
return sum(Line_Section.length)
I know it is possible to do the summing with Neo4j Traversal API. Is it possible to do this in Cypher?
First capture the path from a start node to the end node in a variable, then reduce the length property over it.
MATCH path=(n1: Station { name: 'Station1' })-[:Connect]->(n2: Station { name: 'Station2' })
RETURN REDUCE (totalLength = 0, node in nodes(path) | totalLength + node.length) as totalDistance
Assuming the line section nodes have the label Line_Section, you can use a variable-length relationship pattern to get the entire path, list comprehension to get a list of the line section nodes, UNWIND to get the individual nodes, and then use aggregation to SUM all the line section lengths:
MATCH p = (n1:Station)-[:Connect*]-(n2:Station)
WHERE n1.Name='Station1' AND n2.Name='Station3'
UNWIND [n IN NODES(p) WHERE 'Line_Section' in LABELS(n)] AS ls
RETURN p, SUM(ls.length) AS distance
Or, you could use REDUCE in place of UNWIND and SUM:
MATCH p = (n1:Station)-[:Connect*]-(n2:Station)
WHERE n1.Name='Station1' AND n2.Name='Station3'
RETURN p, REDUCE(s = 0, ls IN [n IN NODES(p) WHERE 'Line_Section' in LABELS(n)] |
s + ls.length) AS distance
[UPDATED]
Note: a variable-length relationship with an unbounded number of hops is expensive, and can take a long time or run out of memory.
If you only want the distance for the shortest path, then this should be faster:
MATCH
(n1:Station { name: 'Station1' }),
(n2:Station {name: 'Station3' }),
p = shortestPath((n1)-[*]-(n2))
WHERE ALL(r IN RELATIONSHIPS(p) WHERE TYPE(r) = 'Connect')
RETURN p, REDUCE(s = 0, ls IN [n IN NODES(p) WHERE 'Line_Section' in LABELS(n)] |
s + ls.length) AS distance
Currently I'm running personalized PageRank over a set of nodes. I want to take the top n nodes and RETURN all relationships between these nodes in addition to the resulting PageRank score for each node and a set of properties for each node.
Right now I'm managing to RETURN the start and end nodes of the relationships, but I'm unable to figure out how to also RETURN the scores and any additional node properties.
My query is as follows:
MATCH (p) WHERE p.paper_id = $paper_id
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
WITH p, nodeId, score ORDER BY score DESC
LIMIT 25
MATCH (n) WHERE id(n) = nodeId
WITH collect(n) as nodes
UNWIND nodes as n
MATCH (n)-[r:Cites]->(p) WHERE p in nodes
RETURN startNode(r).paper_id as start, collect(endNode(r).paper_id) as end
In the second block of code I collect the matched nodes n in order to then find all relationships between nodes in that collection later on. However, including score in the WITH collect(n) as nodes line results in score being used as the indexer when I want to somehow pass it in separately (is this allowed?).
The output format is not important since I can properly format it server side.
APOC can help here, you can use apoc.algo.cover() to get all the relationships between a set of nodes.
As for bringing the score along, it would probably be best to extract from the collected nodes a map projection which includes the score:
MATCH (p) WHERE p.paper_id = $paper_id
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
WITH p, nodeId, score ORDER BY score DESC
LIMIT 25
MATCH (n) WHERE id(n) = nodeId
WITH collect(nodeId) as ids, collect(n {.*, score}) as nodes
CALL apoc.algo.cover(ids) YIELD rel
RETURN nodes, collect(rel) as rels
With Labels A, B, and Z, A and B have their own relationships to Z. With the query
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z
RETURN DISTINCT a, matched_z
Which returns the nodes of A and all the Nodes Z that have a relationship to A and B
I'm stuck on trying to ALSO return a separate array of the Z Nodes that B has with Z but not with A (i.e. missing_z). I am attempting to do an initial query to return all the relationships between B & Z
results = MATCH (b:B { uuid: {id} })
MATCH (b)-[:rel2]->(z:Z)
RETURN DISTINCT COLLECT(z.uuid) AS z
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z, z
RETURN DISTINCT a, matched_z, filter(skill IN z.array WHERE NOT z.uuid IN {results}) AS missing_z
The results seem to have nil for missing_z where one would assume it should be populated. Not sure if filter is the correct way to go with a WHERE NOT / IN scenario. Can the above 2 queries be combined into 1?
The hard part here, in my opinion, is that any failed matches will drop everything you have matched so far. But your starting point seems to be "All Z related by B.uuid", So start by collecting that and filtering/copying from there.
Use WITH + aggregation functions to copy+filter columns
Use OPTIONAL MATCH if a failure to match shouldn't drop already collected rows.
If I understand what you are trying to do well enough, This cypher should do the job, and just adjust it as needed (let me know if you need help understanding any part of it/adapting it)
// Match base set
MATCH (z:Z)<-[:rel2]-(b:B { uuid: {id} })
// Collect into single list
WITH COLLECT(z) as zs
// Match all A (ignore relation to Zs)
MATCH (a:A)
// For each a, return a, the sub-list of Zs related to a, and the sub-list of Zs not related to a
RETURN a as a, FILTER(n in zs WHERE (a)-[:rel1]->(n)) as matched, FILTER(n in zs WHERE NOT (a)-[:rel1]->(n)) as unmatched
This query might do what you want:
MATCH (z:Z)<-[:rel2]-(b:B { uuid: {id} })
WITH COLLECT(z) as all_zs
UNWIND all_zs AS z
MATCH (a)-[:rel1]->(z)
WITH all_zs, COLLECT(DISTINCT z) AS matched_zs
RETURN matched_zs, apoc.coll.subtract(all_zs, matched_zs) AS missing_zs;
It first stores in the all_zs variable all the Z nodes that have a rel2 relationship from b. This collection's contents remain unaffected even if the second MATCH clause matches a subset of those Z nodes.
It then stores in matched_zs the distinct all_zs nodes that have a rel1 relationship from any A node.
Finally, it returns:
the matched_zs collection, and
the unique nodes from all_zs that are not also in matched_zs, as missing_zs.
The query uses the convenient APOC function apoc.coll.subtract to generate the latter return value.
Is there a way to keep or return only the final full sequences of nodes instead of all subpaths when variable length identifiers are used in order to do further operations on each of the final full sequence path.
MATCH path = (S:Person)-[rels:NEXT*]->(E:Person)................
eg: find all sequences of nodes with their names in the given list , say ['graph','server','db'] with same 'seqid' property exists in the relationship in between.
i.e.
(graph)->(server)-(db) with same seqid :1
(graph)->(db)->(server) with same seqid :1 //there can be another matching
sequence with same seqid
(graph)->(db)->(server) with same seqid :2
Is there a way to keep only the final sequence of nodes say ' (graph)->(server)->(db)' for each sequences instead of each of the subpath of a large sequence like (graph)->(server) or (server)->(db)
pls help me to solve this.........
(I am using neo4j 2.3.6 community edition via java api in embedded mode..)
What we could really use here is a longestSequences() function that would do exactly what you want it to do, expand the pattern such that a and b would always be matched to start and end points in the sequence such that the pattern is not a subset of any other matched pattern.
I created a feature request on neo4j for exactly this: https://github.com/neo4j/neo4j/issues/7760
And until that gets implemented, we'll have to make do with some alternate approach. I think what we'll have to do is add additional matching to restrict a and b to start and end nodes of full sequences.
Here's my proposed query:
WITH ['graph', 'server' ,'db'] as names
MATCH p=(a)-[rels:NEXT*]->(b)
WHERE ALL(n in nodes(p) WHERE n.name in names)
AND ALL( r in rels WHERE rels[0]['seqid'] = r.seqid )
WITH names, p, a, rels, b
// check if b is a subsequence node instead of an end node
OPTIONAL MATCH (b)-[rel:NEXT]->(c)
WHERE c.name in names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where b is a subsequence node
WITH names, p, a, rels, b, c
WHERE c IS NULL
WITH names, p, a, rels, b
// check if a is a subsequence node instead of a start node
OPTIONAL MATCH (d)-[rel:NEXT]->(a)
WHERE d.name in names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where a is a subsequence node
WITH p, a, b, d
WHERE d IS NULL
RETURN p, a as startNode, b as endNode
MATCH (S:Person)-[r:NEXT]->(:Person)
// Possible starting node
WHERE NOT ( (:Person)-[:NEXT {seqid: r.seqid}]->(S) )
WITH S,
// Collect all possible values of `seqid`
collect (distinct r.seqid) as seqids
UNWIND seqids as seqid
// Possible terminal node
MATCH (:Person)-[r:NEXT {seqid: seqid}]->(E:Person)
WHERE NOT ( (E)-[:NEXT {seqid: seqid}]->(:Person) )
WITH S,
seqid,
collect(distinct E) as ES
UNWIND ES as E
MATCH path = (S)-[rels:NEXT* {seqid: seqid}]->(E)
RETURN S,
seqid,
path
[EDITED]
This query might do what you want:
MATCH (p1:Person)-[rel:NEXT]->(:Person)
WHERE NOT (:Person)-[:NEXT {seqid: rel.seqid}]->(p1)
WITH DISTINCT p1, rel.seqid AS seqid
MATCH path = (p1)-[:NEXT* {seqid: seqid}]->(p2:Person)
WHERE NOT (p2)-[:NEXT {seqid: seqid}]->(:Person)
RETURN path;
It first identifies all Person nodes (p1) with at least one outgoing NEXT relationship that have no incoming NEXT relationships (with the same seqid), and their distinct outgoing seqid values. Then it finds all "complete" paths (i.e., paths whose start and end nodes have no incoming or outgoing NEXT relationships with the desired seqid, respectively) starting at each p1 node and having relationships all sharing the same seqid. Finally, it returns each complete path.
If you just want to get the name property of all the Person nodes in each path, try this query (with a different RETURN clause):
MATCH (p1:Person)-[rel:NEXT]->(:Person)
WHERE NOT (:Person)-[:NEXT {seqid: rel.seqid}]->(p1)
WITH DISTINCT p1, rel.seqid AS seqid
MATCH path = (p1)-[:NEXT* {seqid: seqid}]->(p2:Person)
WHERE NOT (p2)-[:NEXT {seqid: seqid}]->(:Person)
RETURN EXTRACT(n IN NODES(path) | n.name);