Pairs from a directed acyclic Neo4j graph - neo4j

I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B
     C
     D -> 1
          2 -> X
               Y
Then I would produce ArB, ArC, ArD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it).
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
The query would be great, doc pages that explain would be great. Any help is appreciated.
EDIT Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
Where in line 1, I match paths that together include all the nodes under the one I care about.
In line 2, I convert each path into a collection of its nodes.
In line 3, I start a new match that is intended to return my pairs.
In line 4, I accept only x and y nodes that were scooped up by the first match. I am not sure why I have to include y in the condition, but it seems to matter.
In line 5, I return the results. I do not know why I need a DISTINCT here. I thought the one on line 2 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.

Here's an approach to try - this query is modeled off of the sample matrix data you can find online so you can play with it before adapting it to your schema.
MATCH p=(n:Crew)-[r:KNOWS*]-(m)
WHERE n.name='Neo'
WITH p, size(nodes(p)) AS nCount, size(relationships(p)) AS rCount
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1]
ORDER BY length(p) ASC;
A couple of notes about what's going on here:
Consider the "Neo" node (n.name="Neo") to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we look at the nodes and relationships in the path and return the last pair: the second-to-last node (nodes(p)[nCount-2]), the last relationship in the path (relationships(p)[rCount-1]), and the last node (nodes(p)[nCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r*]->(otherStuff); from there it's just processing paths as you go.
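To make the target result concrete, here is a minimal Python sketch (not Cypher, and not part of the original answer) that models the example tree from the question as (start, type, end) triples and collects only the direct pairs reachable from an anchor. The `edges` data, the extra "cousin" edge, and the `pairs_from` helper are all hypothetical, and `r` is just a stand-in relationship type.

```python
from collections import deque

# Hypothetical in-memory copy of the example graph: (start, rel_type, end).
edges = [
    ("A", "r", "B"), ("A", "r", "C"), ("A", "r", "D"),
    ("D", "r", "1"), ("D", "r", "2"),
    ("2", "r", "X"), ("2", "r", "Y"),
    ("Q", "r", "Z"),  # a "cousin" branch that must not show up
]

def pairs_from(anchor, edges):
    """Return every direct (start, rel_type, end) pair reachable from anchor."""
    outgoing = {}
    for start, rel, end in edges:
        outgoing.setdefault(start, []).append((rel, end))
    seen, result, queue = {anchor}, [], deque([anchor])
    while queue:
        node = queue.popleft()
        for rel, end in outgoing.get(node, []):
            result.append((node, rel, end))  # one row per relationship
            if end not in seen:              # guards against cycles
                seen.add(end)
                queue.append(end)
    return result

pairs = pairs_from("A", edges)
```

Each relationship is reported once with its own two endpoints, so indirect pairs such as ArY never appear, which is the same effect as returning only the last hop of every path.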

Related

NEO4j Cypher query to return paths with a condition on the number of connected nodes

I would like to clean up my graph database a bit by removing unnecessary nodes. In one case, unnecessary nodes are nodes B between nodes A and C where B has the same name as node C and NO OTHER incoming relationships. I am having trouble coming up with a Cypher query that restricts the number of incoming edges.
The first part was easy:
MATCH (n1:TypeA)<-[r1:Inside]-(n2:TypeB)<-[r2:Inside]-(n3:TypeC)
WHERE n2.name = n3.name
Based on other SE questions (especially this one) I then tried doing something like:
WITH n2, collect(r2) as rr
WHERE length(rr) = 1
RETURN n2
but this also returned nodes with more than one incoming edge. It seems my WHERE clause on the length is not filtering the returned n2 nodes. I tried a few other things I found online, but they either returned nothing or were no longer syntactically correct in the current version.
After I find the n2 nodes that match the pattern, I'll want to connect n3 directly to n1 and DETACH DELETE n2. Again, I was easily able to do that part when I didn't need the restriction on the number of incoming edges to n2. That previous question has FOREACH (r IN rr | DELETE r), but I want to detach delete the n2 nodes, not just those edges. I don't know how to correctly adapt this to operating on the nodes attached to the rs and I certainly want to be sure it's finding the correct nodes before deleting anything since Neo4j lacks basic undo functionality (but you can't put a RETURN command inside a FOREACH for some crazy reason).
How do I filter nodes on a path by the number of incoming edges using Cypher?
I think I can do this in py2neo by first collecting all the n1,n2,n3 triples matching the pattern, then going through each returned record and add them to a list if n2 has only one incoming edge. Then go through that list and perform the trimming operation, but if this can be done in pure Cypher, then I'd like to know how because I have a number of similar adjustments to make.
You need to pass along path in your WITH statement.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH path, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
RETURN path
Or shorter like this:
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
AND size((n2)<-[:PARTOF]-()) = 1
RETURN path
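The filter in these queries can be pictured outside Cypher as well. The following Python sketch is only an illustration under stated assumptions: the `partof` edge list, the `names` map, and the `removable_middles` helper are hypothetical, and node labels are ignored. It keeps a chain n3 -> n2 -> n1 only when n2 shares its name with n3 and n2 has exactly one incoming edge.

```python
from collections import Counter

# Hypothetical PARTOF edges as (child, parent) pairs.
partof = [
    ("chome1", "unknown1"), ("unknown1", "ward1"),
    ("chome2", "unknown2"), ("other", "unknown2"), ("unknown2", "ward1"),
]
# Hypothetical name property for each node.
names = {
    "chome1": "X", "unknown1": "X",
    "chome2": "Y", "unknown2": "Y",
    "other": "Z", "ward1": "W",
}

def removable_middles(partof, names):
    """Return (n1, n2, n3) triples where n2 is a redundant middle node."""
    # Number of incoming PARTOF edges per node (the "degree" in the queries).
    in_degree = Counter(parent for _child, parent in partof)
    triples = []
    for n3, n2 in partof:
        for child, n1 in partof:
            if child == n2 and names[n2] == names[n3] and in_degree[n2] == 1:
                triples.append((n1, n2, n3))
    return triples

triples = removable_middles(partof, names)
```

Here `unknown2` is rejected because it has two incoming edges, while `unknown1` qualifies, so it would be reasonable to inspect a result like this before running any delete.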
Borrowing some insight from this answer I came up with a kludge that seems to work.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH n2, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
MATCH (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
RETURN n1,n2,n3
I expect that not all parts of that are needed and it isn't an efficient solution, but I don't know enough yet to improve upon it.
For example, I define the first line as path, but I can't use MATCH path in the penultimate line, and I have no idea why.
Also, if I write WITH size((n2)<-[:PARTOF]-()) as degree (dropping the n2, after WITH) it returns not only the n2 with degree > 1, but also all the nodes connected to them (even more than the n3 nodes). I don't know why it behaves like this and the Neo4j documentation for WITH has no examples or description to help me understand why the n2 is necessary here. Any improvements to my Cypher query or explanation of how or why it works would be greatly appreciated.

Find shortest path with complex predicate (check simultaneously both node properties and relation property)

I've built a graph with 40 mln nodes and 40 mln relations with Neo4j.
Mostly I search for different shortest paths and queries are to be very fast. Right now it usually takes a few milliseconds per query.
For speed I encode all parameters in relations property val, so ordinary query looks like this:
MATCH (one:Obj{oid:'1'})
with one
MATCH (two:Obj{oid:'2'}), path=shortestPath((one) -[*0..50]-(two))
WHERE ALL (x IN RELATIONSHIPS(path) WHERE ((x.val > 50 and x.val<109) ))
return path
But one filter cannot be done this way, as it should evaluate (on each step) property of starting node, property of relation, property of ending node, for example:
Path: n1(==1)-r1(==2)-n2(==1)-r2(==5)-n3(==3)
On step1: properties of n1 and n2 equal 1 and relation's property equals 2, that's OK, going further
On step2: property of n2 equals 1, but property of n3 equals 3, so we stop. If it was 1, we would stop anyway, because relation r2 is not 2, but 5.
I've used RELATIONSHIPS and NODES predicates, but they seem to work separately.
Also, I guess this can be done with traversal API, but I'll have to rewrite a lot of my other code, so it is not desirable.
Am I missing some fast solution?
It looks like your basic query is running quite fast. If you want to filter at additional steps, you probably have to add additional optional match and with statements to accommodate the filters. Undesired elements should drop out.
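To spell out the per-step rule from the question, here is a small Python sketch. It is not a Cypher solution and says nothing about how shortestPath would evaluate such a predicate; the helper names and the default expected values (1 for nodes, 2 for relationships) are assumptions taken from the worked example. Each step i is checked as a triple: start node i, relationship i, end node i+1.

```python
def valid_prefix_length(node_vals, rel_vals, node_ok=1, rel_ok=2):
    """Count how many leading steps satisfy the combined predicate."""
    steps = 0
    for i, rel in enumerate(rel_vals):
        # A step is valid only if both endpoints AND the relationship match.
        if node_vals[i] == node_ok and rel == rel_ok and node_vals[i + 1] == node_ok:
            steps += 1
        else:
            break
    return steps

def path_ok(node_vals, rel_vals, node_ok=1, rel_ok=2):
    """True if every step of the path satisfies the predicate."""
    return valid_prefix_length(node_vals, rel_vals, node_ok, rel_ok) == len(rel_vals)

# The example path: n1(==1)-r1(==2)-n2(==1)-r2(==5)-n3(==3)
example_steps = valid_prefix_length([1, 1, 3], [2, 5])
```

The point of the sketch is that the check is indexed by step, which is why separate predicates over NODES(path) and RELATIONSHIPS(path) cannot express it on their own.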

Generate table view from a graph in neo4j

I have the following graph slightly modified from this question.
This is the generated graph I have: (image not included)
But I want to get back the original table that was inserted, either as HTML or by regenerating the original Excel file: (image not included)
So which Neo4j query returns the result above?
You won't be able to for two reasons.
One is that direction will be impossible to determine. You have (using shorthand)
(1,2)->(6,7)<-(3,2)
(6,7)->(9,2)->(5,1)
(7,7)->(4,1)->(1,2)
(6,7)->(7,7)->(3,2)
Unless your graph has an error, you can see that the relationship between (6,7) and (3,2) isn't consistent with the rest, so my guess is you weren't consistent with the relationship direction when creating the graph, and if this is so this throws off any possible ordering when attempting to generate the table.
But if we assume that this was a mistake in merging to the graph, and it was supposed to be (6,7)->(3,2), then that fixes the ordering issue.
But now there's other ordering issues. You have no data in your graph to determine which node to start with.
The graphical result of this line alone: (7,7), (4,1), (1,2) can be expressed just as well with (4,1), (1,2), (7,7) or (1,2), (7,7), (4,1).
You may be able to generate rows back that will be logically equivalent, but you won't be able to get back the identical input.
But again, assuming that the graph was supposed to be (6,7)->(3,2), and assuming that a logically equivalent table is fine, then something like this may work:
MATCH p=(a)-->(b)-->(c)-->(a)
UNWIND nodes(p) as node
WITH p, node
ORDER BY id(node)
WITH p, collect(node) as sortedNodes
WITH head(collect(p)) as path, sortedNodes
WITH nodes(path)[0] as a, nodes(path)[1] as b, nodes(path)[2] as c
RETURN a.x as Xa, a.y as Ya, b.x as Xb, b.y as Yb, c.x as Xc, c.y as Yc
Why so complicated? Because for each triangle of three nodes, there are 3 ways to order those nodes (as mentioned above), and so there will be 3 paths for each set of the same nodes. To avoid redundant rows with the same points in different orders, we order the nodes in the path, collect the 3 paths, take only one of them, and use those when returning the x and y values.
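The deduplication idea can be sketched in Python as follows. This is only an illustration: the `unique_triangles` helper is hypothetical, and the (x, y) tuples stand in for node identity, whereas the query sorts by id(node). All rotations of a triangle share the same sorted key, so only the first path seen for each key is kept.

```python
def unique_triangles(cycles):
    """Keep one representative per set of nodes, whatever the starting point."""
    representative = {}
    for cycle in cycles:
        key = tuple(sorted(cycle))             # same key for all 3 rotations
        representative.setdefault(key, cycle)  # keep the first path seen
    return list(representative.values())

# The three equivalent orderings mentioned above.
rotations = [
    ((7, 7), (4, 1), (1, 2)),
    ((4, 1), (1, 2), (7, 7)),
    ((1, 2), (7, 7), (4, 1)),
]
rows = unique_triangles(rotations)
```

Which rotation survives depends on the order in which the paths arrive, which matches the caveat that the output is logically equivalent to the input table rather than identical to it.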

Traverse both incoming and outgoing relationship in Cypher

I am new at Neo4j but not to graphs and I have a specific problem I did not manage to solve with Cypher.
With this type of data:
I would like to be able in a single query to follow some incoming and some outgoing flow.
Example:
Starting on "source"
Follow all "A" relationships in the outgoing way
Follow all "B" relationships in the incoming way
My problem is that Cypher only allows one single direction to be specified in the relationship pattern.
So I could do (source)-[:A|:B*]->() or (source)<-[:A|:B*]-().
But I have no possibility to tell Cypher that I want to follow -[:A]-> and <-[:B]-.
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
Thanks in advance for your help :)
As an alternative to @Gabor Szarnyas's answer, I think you can achieve your goal using the APOC procedure apoc.path.expand.
Using this sample data set:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
And calling apoc.path.expand:
match (source:Source)
call apoc.path.expand(source,"A>|<B","",0,5) yield path
return path
You will get this path as output:
The apoc.path.expand call starts from the source node following -[:A]-> and <-[:B]- relationships.
Remember to install APOC procedures according to the version of Neo4j you are using. Take a look in the version compatibility matrix.
Expressing this in a single query would require a regular path query, which has been proposed to and accepted by openCypher, but it is not yet implemented.
I see two possible workarounds. I recreated your example with this command with a Source label for the source node:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
(1) Insert additional relationships that have the same direction:
MATCH (s)-[:B]->(t)
CREATE (s)<-[:B2]-(t)
And use this relationship for traversal:
MATCH p=(source:Source)-[:A|:B2*]->()
RETURN p
(2) As you mentioned:
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
You could use this approach to first get potential path candidates and manually check the directions of the relationships afterwards. Of course, this is an expensive operation but you only have to calculate it on the candidates, a possibly small data set.
MATCH p=(source:Source)-[:A|:B*]-()
WITH p, nodes(p) AS nodes, relationships(p) AS rels
WHERE all(i IN range(0, size(rels) - 1) WHERE
CASE type(rels[i])
WHEN 'A' THEN startNode(rels[i]) = nodes[i]
ELSE /* B */ startNode(rels[i]) = nodes[i+1]
END)
RETURN p
Let's break down how this works:
We store path candidates in p and use the nodes and relationships functions to extract the lists of nodes/relationships from it.
We define a range of indexes for the relationships (e.g. 0, 1, 2 if there are 3 relationships).
To determine the direction of relationships, we use the startNode function. For example, if there is a relationship r between nodes n1 and n2, the path will look like <n1, r, n2>. If r was traversed in the outgoing direction, startNode(r) will return n1; if it was traversed in the incoming direction, startNode(r) will return n2. The type of the relationship is checked with the type function, and a simple CASE expression is used to differentiate between types.
The WHERE clause uses the all predicate function to check whether all :A and :B relationships had the appropriate directions.
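The same check can be mirrored in Python to see the logic without a database. This is a sketch under assumptions: relationships are modelled as hypothetical (type, start_node, end_node) tuples listed in path order, and the node names are made up to match the sample CREATE statement.

```python
def directions_ok(nodes, rels):
    """True if every :A is traversed outgoing and every :B incoming."""
    for i, (rel_type, start, _end) in enumerate(rels):
        # Outgoing traversal: the relationship starts at the current node.
        # Incoming traversal: it starts at the next node on the path.
        expected_start = nodes[i] if rel_type == "A" else nodes[i + 1]
        if start != expected_start:
            return False
    return True

# (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
path_nodes = ["s", "n1", "n2", "n3", "n4"]
path_rels = [
    ("A", "s", "n1"),
    ("A", "n1", "n2"),
    ("B", "n3", "n2"),   # stored n3 -> n2, walked from n2 to n3 (incoming)
    ("A", "n3", "n4"),
]
good = directions_ok(path_nodes, path_rels)

# Walking the last :A backwards (n4 to n3) must be rejected.
bad = directions_ok(["n4", "n3"], [("A", "n3", "n4")])
```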

Neo4j Cypher: check attributes of not consecutive nodes in path

I have got a graph that represents several bus/train stops in different cities.
Lets assume I want to go from city A (with stops a1, a2, a3...) to city Z (with stops z1, z2...)
There are several routes (relations) between the nodes and I want to get all paths between the start and the end node. My cost vector would be complex (travel time and waiting time and price and and and...) in reality, therefore I cannot use shortestpaths etc. I managed to write a (quite complex) query that does what I want: In general it is looking for each match with start A and end Z that is available.
I try to avoid looping by filtering out results with special characteristics, e.g.
MATCH (from{name:'a1'}), (to{name:'z1'}),
path = (from)-[:CONNECTED_TO*0..8]->(to)
WHERE ALL(b IN NODES(path) WHERE SINGLE(c IN NODES(path) WHERE b = c))
Now I want to avoid the possibility of visiting one city more than once, e.g. instead of a1-->a2-->d2-->d4-->a3-->a4-->z1 I want to get a1-->a4-->z1.
Therefore I have to check all nodes in the path. If the value of n.city is the same for consecutive nodes, everything is fine. But if I get a path with nodes of the same city that are not consecutive, e.g. cityA-->cityB-->cityA, I want to throw away that path.
How can I do that? Is something possible?
I know that this is not really a beautiful approach, but I invested quite a lot of time in looking for a better one without throwing away the whole data structure, and I could not find one. It's just a prototype and Neo4j is not my focus. I want to test some tools and products to build some knowledge. I will go ahead with a better approach next time.
Interesting question. The important thing to observe here is that a path that never revisits a city (after leaving it) must have fewer transitions between cities than the number of distinct cities. For example:
AABBC (a "good" path) has 3 distinct cities and 2 transitions
ABBAC (a "bad" path) also has 3 distinct cities but 3 transitions
With this observation in mind, the following query should work (even if the start and end nodes are the same):
MATCH path = ({name:'a1'})-[:CONNECTED_TO*0..8]->({name:'z1'})
WITH path, NODES(path) as ns
WITH path, ns,
REDUCE(s = {cnt: 0, last: ns[0].city}, x IN ns[1..] |
CASE WHEN x.city = s.last THEN s ELSE {cnt: s.cnt+1, last: x.city} END).cnt AS nTransitions
UNWIND ns AS node
WITH path, nTransitions, COUNT(DISTINCT node.city) AS nCities
WHERE nTransitions < nCities
RETURN path;
The REDUCE function is used to calculate the number of transitions in a path.
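The counting argument is easy to check outside Cypher. In this Python sketch (the function name is hypothetical), `cities` is the city of each node along a path, in order; a path is kept when the number of city-to-city transitions is smaller than the number of distinct cities.

```python
def visits_each_city_once(cities):
    """True if no city is re-entered after the path has left it."""
    # A transition is any pair of consecutive nodes in different cities.
    transitions = sum(1 for prev, cur in zip(cities, cities[1:]) if prev != cur)
    return transitions < len(set(cities))

good = visits_each_city_once(list("AABBC"))  # 2 transitions, 3 cities
bad = visits_each_city_once(list("ABBAC"))   # 3 transitions, 3 cities
```

A path that visits k distinct cities without returning to any of them has exactly k - 1 transitions, and every return adds at least one more, which is why the strict comparison works.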

Resources