I can't seem to figure this out, and I'm not sure it's entirely possible. I have a graph like so:
a-[:granted]->b-[:granted]->...x-[:granted_source]->s
where b and x are of interest. I already know a and s, the end points, which are defined in the START clause.
Note that there could be one intermediate node ( a->b->s ) or more than one ( a->b->c->x->s ); the goal is to find the shortest path, returning only the nodes that are pointed to by a 'granted' relationship.
The closest I've got is:
start s=node(21), p=node(2)
match paths=shortestPath(p-[:granted|granted_source*]->s)
return NODES(paths)
This gives all the nodes, including the start (p) and end (s). I can't seem to filter those out or, better, avoid returning them at all: I want only the nodes that are pointed to by a granted relationship, in order from (s) if possible. I'm on Neo4j 2.0b and I'm wondering whether labels, which I have no issue using, would be the better way to go. Any help would be greatly appreciated.
So, you want to chop the head and tail off of a collection of nodes? (Am I understanding that right?) How about:
start s=node(21), p=node(2)
match paths=shortestPath(p-[:granted|granted_source*]->s)
return NODES(paths)[1..-1]
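To illustrate what the [1..-1] slice does, here is a small Python analogue (Python rather than Cypher, so it can be run without a database). The node names are made up for the example; the Python slice nodes[1:-1] drops the first and last element in the same way, and reversing the result gives the order starting from the s end.

```python
def inner_nodes(path_nodes):
    """Drop the first and last node of a path, keeping the original order."""
    return path_nodes[1:-1]


# Hypothetical path p -> b -> c -> x -> s (names are illustrative only).
path = ["p", "b", "c", "x", "s"]

middle = inner_nodes(path)
print(middle)                  # intermediate nodes, in order from p
print(list(reversed(middle)))  # the same nodes, in order from s
```

With only one intermediate node (p -> b -> s) the slice returns just that node.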
I think I resolved it using a WITH. This is probably the best-performing approach, given that first the p-... nodes are fetched, then all the ...->s nodes are fetched, and then shortestPath() is used to get the 'in between' nodes. The results appear correct.
start s=node(21), p=node(2)
match p-[:granted]->x, y-[:granted_source]->s
with x, y
match paths=shortestPath(x-[:granted*]->y)
return NODES(paths)
I would like to clean up my graph database a bit by removing unnecessary nodes. In one case, unnecessary nodes are nodes B between nodes A and C where B has the same name as node C and NO OTHER incoming relationships. I am having trouble coming up with a Cypher query that restricts the number of incoming edges.
The first part was easy:
MATCH (n1:TypeA)<-[r1:Inside]-(n2:TypeB)<-[r2:Inside]-(n3:TypeC)
WHERE n2.name = n3.name
Based on other SE questions (especially this one) I then tried doing something like:
WITH n2, collect(r2) as rr
WHERE length(rr) = 1
RETURN n2
but this also returned nodes with more than one incoming edge. It seems my WHERE clause on the length is not filtering the returned n2 nodes. I tried a few other things I found online, but they either returned nothing or were no longer syntactically correct in the current version.
After I find the n2 nodes that match the pattern, I'll want to connect n3 directly to n1 and DETACH DELETE n2. Again, I was easily able to do that part when I didn't need the restriction on the number of incoming edges to n2. That previous question has FOREACH (r IN rr | DELETE r), but I want to detach delete the n2 nodes, not just those edges. I don't know how to correctly adapt this to operating on the nodes attached to the rs and I certainly want to be sure it's finding the correct nodes before deleting anything since Neo4j lacks basic undo functionality (but you can't put a RETURN command inside a FOREACH for some crazy reason).
How do I filter nodes on a path by the number of incoming edges using Cypher?
I think I can do this in py2neo by first collecting all the n1,n2,n3 triples matching the pattern, then going through each returned record and adding it to a list if n2 has only one incoming edge, and finally going through that list and performing the trimming operation. But if this can be done in pure Cypher, I'd like to know how, because I have a number of similar adjustments to make.
You need to pass along path in your WITH statement.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH path, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
RETURN path
Or shorter like this:
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
AND size((n2)<-[:PARTOF]-()) = 1
RETURN path
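As a rough illustration of what the degree filter is doing, here is a plain-Python sketch over a toy edge list (not Cypher, and not py2neo). The node ids and names are invented, labels are ignored, and each (child, parent) pair stands in for (child)-[:PARTOF]->(parent). The point is that the in-degree is counted over all incoming edges of n2, not just the ones that happened to match the pattern.

```python
from collections import Counter

# Hypothetical toy data: (child, parent) means (child)-[:PARTOF]->(parent).
edges = [
    ("chome1", "unknown1"),
    ("unknown1", "ward1"),
    ("chome2", "unknown2"),
    ("chome3", "unknown2"),
    ("unknown2", "ward1"),
]
names = {
    "ward1": "W", "unknown1": "A", "chome1": "A",
    "unknown2": "B", "chome2": "B", "chome3": "C",
}


def removable_triples(edges, names):
    """Return (n1, n2, n3) where n3 -> n2 -> n1, n2 and n3 share a name,
    and n2 has exactly one incoming edge overall."""
    in_degree = Counter(parent for _child, parent in edges)
    triples = []
    for n3, n2 in edges:
        for child, n1 in edges:
            if child == n2 and names[n2] == names[n3] and in_degree[n2] == 1:
                triples.append((n1, n2, n3))
    return triples


print(removable_triples(edges, names))
```

Here unknown2 is skipped even though chome2 shares its name, because chome3 also points at it.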
Borrowing some insight from this answer I came up with a kludge that seems to work.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH n2, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
MATCH (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
RETURN n1,n2,n3
I expect that not all parts of that are needed and it isn't an efficient solution, but I don't know enough yet to improve upon it.
For example, I define the first line as path, but I can't use MATCH path in the penultimate line, and I have no idea why.
Also, if I write WITH size((n2)<-[:PARTOF]-()) as degree (dropping the n2, after WITH) it returns not only the n2 with degree > 1, but also all the nodes connected to them (even more than the n3 nodes). I don't know why it behaves like this and the Neo4j documentation for WITH has no examples or description to help me understand why the n2 is necessary here. Any improvements to my Cypher query or explanation of how or why it works would be greatly appreciated.
I want to find all loops that originate and terminate with a specific node in a Neo4j database. I tried:
START n=node:Event(time=",timestamp,")
MATCH p=(n)-[:LINKED_TO*1..5]->(n)
WHERE NONE (n IN nodes(p) WHERE size(filter(x IN nodes(p) WHERE n = x))> 2)
RETURN p, length(p)
This is the best I can mash up from what is on the web. There are two things I don't like about this:
1. it crashes
2. the count threshold must be ">2" to allow for the start+termination node. That means that loops that visit the same intermediate node twice will be included, which I wish was not the case.
I'm not interested in the shortest path. I want to know all loops that return to my starting node.
Thank you in advance!
This query should return all loops that start and end at the specified node and have no other repeated nodes:
START n=node:Event(time=",timestamp,")
MATCH p=(n)-[:LINKED_TO*1..5]->(n)
UNWIND TAIL(NODES(p)) AS m
WITH p, COUNT(DISTINCT m) AS cm
WHERE LENGTH(p)-1 = cm
RETURN p, LENGTH(p);
Thank you, cybersam! That was helpful. As typed, it gave a few errors and warned me that "START" is deprecated. I found the following modifications worked:
MATCH (n:Event{time:1458238060505007})
MATCH p=(n)-[:LINKED_TO*1..5]->(n)
UNWIND TAIL(NODES(p)) AS m WITH p RETURN p
The only problem with this is that it appears to give all paths that go through the desired start node, n. Is that true? If so, is there a way to correct this?
This is what finally worked for me. It is very close to what cybersam suggested. Apologies for doing this "the wrong way". I'm sure cybersam will yell at me, again, but adding code via Comment is not very easy to read.
MATCH p=(n:Event{time:",timestamp,"})-[:LINKED_TO*1..5]->(n)
UNWIND TAIL (NODES(p)) AS m
WITH p,COUNT(DISTINCT m) AS cm
WHERE LENGTH(p) = cm
RETURN p
As I noted earlier, one sticking point was the use of "START", which is deprecated and causes errors (for example, when using RNeo4j in R, which I'm using). The new way appears to be to use MATCH and specify your starting node in the path pattern. The other confusing thing for me was the use of "LENGTH(p)-1" instead of "LENGTH(p)". For one node connecting to another and back, the path has a length of 2, not 3, and there are only 2 distinct nodes after the first. For my application, "LENGTH(p)=cm" worked.
Finally, if you want the nodes in the paths, do NOT try to use "WITH m,..." because this messes up the "COUNT(DISTINCT(m))" computation for some reason that I do not understand.
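For anyone who wants to check the LENGTH(p) = COUNT(DISTINCT m) condition outside the database, here is a small Python sketch of the same idea. It is an analogue, not the Cypher query: the adjacency dict and node names are made up, and the search simply refuses to revisit a node, which is what the count comparison enforces after the fact.

```python
def is_simple_loop(nodes):
    """Mirror of LENGTH(p) = COUNT(DISTINCT m) with m taken from TAIL(NODES(p)):
    the number of edges must equal the number of distinct nodes after the first."""
    return len(nodes) - 1 == len(set(nodes[1:]))


def simple_loops(graph, start, max_len=5):
    """Return all loops start -> ... -> start with at most max_len edges
    in which no node other than start appears twice."""
    loops = []

    def walk(node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                loops.append(path + [nxt])
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return loops


# Hypothetical directed graph: node -> list of successors.
graph = {"n": ["a", "b"], "a": ["n", "b"], "b": ["n", "a"]}

for loop in simple_loops(graph, "n"):
    print(loop, is_simple_loop(loop))
```

A path such as n, a, b, a, n has 4 edges but only 3 distinct nodes after the first, so is_simple_loop rejects it; with "LENGTH(p)-1" on the left-hand side the two-edge loop n, a, n would be rejected as well.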
I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B
     C
     D -> 1
          2 -> X
               Y
Then I would produce ArB, ArC, ArD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it.)
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
The query would be great, doc pages that explain would be great. Any help is appreciated.
EDIT Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
In line 1, I make a path that includes all the nodes under the one I care about.
In line 2, I convert the path of nodes to a collection of nodes.
In line 3, I start a new match that is intended to return my pairs.
In line 4, I accept only x and y nodes that were scooped up by the first match. I am not sure why I have to include y in the condition, but it seems to matter.
In line 5, I return the results. I do not know why I need a DISTINCT here. I thought the one on line 2 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.
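The intended result (only the direct pairs under A, each once) can be sketched in plain Python as a breadth-first walk over an edge list. This is an analogue for checking expectations, not a Cypher replacement; the edge data mirrors the example tree above, and the extra Z -> Q edge is an invented "cousin" that should not show up.

```python
from collections import deque


def reachable_pairs(edges, root):
    """edges: list of (source, rel_type, target).
    Return every edge whose source is root or reachable from root, each once."""
    outgoing = {}
    for src, rel, dst in edges:
        outgoing.setdefault(src, []).append((rel, dst))

    seen = {root}
    queue = deque([root])
    pairs = []
    while queue:
        node = queue.popleft()
        for rel, dst in outgoing.get(node, []):
            pairs.append((node, rel, dst))
            if dst not in seen:  # expand each node only once
                seen.add(dst)
                queue.append(dst)
    return pairs


# Example tree from the question, plus a hypothetical unrelated edge Z -> Q.
edges = [
    ("A", "r", "B"), ("A", "r", "C"), ("A", "r", "D"),
    ("D", "r", "1"), ("D", "r", "2"),
    ("2", "r", "X"), ("2", "r", "Y"),
    ("Z", "r", "Q"),
]

for pair in reachable_pairs(edges, "A"):
    print(pair)
```

Because each node is expanded once, no indirect pair such as (A, r, Y) is produced and no separate de-duplication step is needed.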
Here's an approach to try - this query is modeled off of the sample matrix data you can find online so you can play with it before adapting it to your schema.
MATCH p=(n:Crew)-[r:KNOWS*]-(m)
WHERE n.name='Neo'
WITH p, length(nodes(p)) AS nCount, length(relationships(p)) AS rCount
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1]
ORDER BY length(p) ASC;
A couple of notes about what's going on here:
Consider the "Neo" node (n.name="Neo") to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we're looking at the nodes and relationships in the path, and then returning the last pair: the second-to-last node (nodes(p)[nCount-2]), the last relationship in the path (relationships(p)[rCount-1]), and the last node (nodes(p)[nCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r*]->(otherStuff); from there it's just processing paths as you go.
Using neo4j community edition 2.x. In Cypher, I need to MATCH nodes in (two) different ways, then combine these (two) sets of matched nodes into single set (one variable name). This set would then be used for further action.
naive graph example (I can't post images)
I would like to find all knowledge of the squirrel, including the knowledge shared by the groups she is member of. (example is fictional)
I imagine something like this:
MATCH (u:User{username:'squirrel'}), (:User{username:'squirrel'})<-[:MEMBER]-(g:Group)
WITH "COMBINATION OF u AND g" AS ug
MATCH (ug)-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
Outcome should be both "crack nuts" and "escape predators".
In the place of "COMBINATION OF u AND g" I tried variations on collect(u)+collect(g), EXTRACT, etc. Without success.
So far the simplest working way I found is using UNION.
MATCH (u:User{username:'squirrel'})-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
UNION
MATCH (u:User{username:'squirrel'})<-[:MEMBER]-(:Group)-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
This might solve this simple example, but it is not good for more complex queries. I seek a solution for the more general problem: MATCH several sets of nodes, glue them into a single set (single variable) and continue with this new set.
Any ideas, please? Am I missing something basic? Or is this impossible? Thanks!
Something possibly similar on grokbase.
edit:
With this hacky solution to a similar question I was able to solve the problem by extracting internal ids from a collection of nodes:
MATCH (u:User{username:'squirrel'}), (:User{username:'squirrel'})<-[:MEMBER]-(g:Group)
WITH [x in collect(u)+collect(g)|id(x)] as collectedIds
MATCH (ug) WHERE id(ug) in collectedIds
MATCH (ug)-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
Could it be done any better?
At least since Neo4j 3.0 you can use variable-length pattern matching to solve this issue. Simply set the minimum length explicitly to 0 and move the label test to a separate WHERE clause:
MATCH (:User {username:'squirrel'}) <-[:MEMBER*0..1]- (ug)
WHERE ug:User OR ug:Group
WITH ug
MATCH (ug)-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
Not sure about the general case, but for this specific case, you might try to combine the two patterns into one as follows:
MATCH (u:User{username:'squirrel'})<-[:MEMBER*0..1]-()-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
The only general solution I found so far:
1. match required starting points (with different names)
2. collect internal ids of starting points
3. match the starting points with collected ids (now the starting points have single name)
4. do whatever action you need to do with the starting points
Now the code itself:
MATCH (u:User{username:'squirrel'}), (:User{username:'squirrel'})<-[:MEMBER]-(g:Group)
WITH [x in collect(u)+collect(g)|id(x)] as collectedIds
MATCH (ug) WHERE id(ug) in collectedIds
MATCH (ug)-[:KNOW_HOW]->(k:Knowledge)
RETURN k.type
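The "glue two sets into one and continue" idea is easy to see in a plain-Python sketch (an analogue of the query, not Cypher). The group name and the assignment of each piece of knowledge to the user or the group are assumptions made up for the example; only the expected outcome comes from the question.

```python
def knowledge_of(user, member_of, know_how):
    """Union the user with the groups it belongs to, then collect what
    any of those starting points knows."""
    # Step 1-3: one combined set of starting points under a single name.
    starting_points = {user} | set(member_of.get(user, []))
    # Step 4: continue from the combined set.
    return sorted({k for sp in starting_points for k in know_how.get(sp, [])})


# Hypothetical data: "rodents" is an invented group name.
member_of = {"squirrel": ["rodents"]}
know_how = {
    "squirrel": ["crack nuts"],
    "rodents": ["escape predators"],
}

print(knowledge_of("squirrel", member_of, know_how))
```

The set union plays the role of the collected ids: once the starting points share one name, the rest of the query does not care where each one came from.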
I have a structure like so:
user-[:talking]->topic-[:categorized_in]->topic-[:categorized_in]->topic... etc
Starting at a user, how would I get the furthest-away topics they're talking about? Basically this represents the top-level categories they are talking about. This is the only way I know to go about doing this, and it returns all of the nodes along the way, not just the leaf nodes.
START user=node(1)
MATCH user-[:talking]->x<-[:categorized_in*0..]-y
RETURN distinct y.uuid
This is my latest attempt. It seems to work, though I don't know if this is the best way to go about it:
START user=node(1)
MATCH user-[:talking]->x<-[:categorized_in*0..]-y<-[?:pull]-z
WHERE z is null
RETURN distinct y.uuid
So this is how to do it for anybody interested:
START user=node(1)
MATCH user-[:talking]->x<-[:categorized_in*0..]-y<-[?:categorized_in]-z
WHERE z is null
RETURN distinct y.uuid
You can now filter against patterns in the WHERE.
So if you have a newer version of Neo4j, I think the query would look like
START user=node(1)
MATCH user-[:talking]->x<-[:categorized_in*0..]-y
WHERE NOT(y<-[:categorized_in]-())
RETURN DISTINCT y.uuid
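To make the "no incoming categorized_in" condition concrete, here is a plain-Python sketch of the same traversal (an analogue, not Cypher). The user and topic ids are invented, and each (source, target) pair stands in for (source)-[:categorized_in]->(target), following the direction used in the queries above.

```python
def top_level_topics(talking, categorized_in, user):
    """Walk categorized_in edges backwards from the topics the user talks
    about and keep only the nodes that have no incoming categorized_in."""
    incoming = {}
    for src, dst in categorized_in:
        incoming.setdefault(dst, []).append(src)

    result = set()
    stack = list(talking.get(user, []))
    seen = set(stack)
    while stack:
        node = stack.pop()
        sources = incoming.get(node, [])
        if not sources:  # same test as WHERE NOT(y<-[:categorized_in]-())
            result.add(node)
        for src in sources:
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return sorted(result)


# Hypothetical data: t3 -> t2 -> t1 and t4 -> t1; the user talks about t1.
talking = {"user1": ["t1"]}
categorized_in = [("t2", "t1"), ("t3", "t2"), ("t4", "t1")]

print(top_level_topics(talking, categorized_in, "user1"))
```

The intermediate nodes t1 and t2 are visited but not returned, which is the difference between the first query and the filtered ones.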