What I'm looking for
With variable length relationships (see here in the neo4j manual), it is possible to have a variable number of relationships with a certain label between two nodes.
# Cypher
match (g1:Group)-[:sub_group*]->(g2:Group) return g1, g2
I'm looking for the same thing with nodes, i.e. a way to query for two nodes with a variable number of nodes in between, but with a label condition on the nodes rather than the relationships:
# Looking for something like this in Cypher:
match (g1:Group)-->(:Group*)-->(g2:Group) return g1, g2
Example
I would use this mechanism, for example, to find all (direct or indirect) members of a group within a group structure.
# Looking for somthing like this in Cypher:
match (group:Group)-->(:Group*)-->(member:User) return member
Take, for example, this structure:
group1:Group
|-------> group2:Group -------> user1:User
|-------> group3:Group
|--------> page1:Page -----> group4:Group -----> user2:User
In this example, user1 is a member of group1 and group2, but user2 is only member of group4, not member of the other groups, because a non-Group labeled node is in between.
Abstraction
A more abstract pattern would be a kind of repeat operator |...|* in Cypher:
# Looking for repeat operator in Cypher:
match (g1:Group)|-[:is_subgroup_of]->(:Group)|*-[:is_member_of]->(member:User)
return member
Does anyone know of such a repeat operator? Thanks!
Possible Solution
One solution I've found, is to use a condition on the nodes using where, but I hope, there is a better (and shorter) soluation out there!
# Cypher
match path = (member:User)<-[*]-(g:Group{id:1})
where all(node in tail(nodes(path)) where ('Group' in labels(node)))
return member
Explanation
In the above query, all(node in tail(nodes(path)) where ('Group' in labels(node))) is one single where condition, which consists of the following key parts:
all: ALL(x in coll where pred): TRUE if pred is TRUE for all values in
coll
nodes(path): NODES(path): Returns the nodes in path
tail(): TAIL(coll): coll except first element–––I'm using this, because the first node is a User, not a Group.
Reference
See Cypher Cheat Sheet.
How about this:
MATCH (:Group {id:1})<-[:IS_SUBGROUP_OF|:IS_MEMBER_OF*]-(u:User)
RETURN DISTINCT u
This will:
find all subtrees of the group with ID 1
only traverse the relationships IS_GROUP_OF and IS_MEMBER_OF in incoming direction (meaning sub-groups or users that belong to group with ID or one of its sub-groups)
only return nodes which have a IS_MEMBER_OF relationship to a group in the subtree
and discard duplicate results (users who belong to more than one of the groups in the tree would otherwise appear multiple times)
I know this relies on relationships types rather than node labels, but IMHO this is a more graphy approach.
Let me know if this would work or not.
Related
I am a newbie who just started learning graph database and I have a problem querying the relationships between nodes.
My graph is like this:
There are multiple relationships from one node to another, and the IDs of these relationships are different.
How to find relationships where the number of relationships between two nodes is greater than 2,or is there a problem with the model of this graph?
Just like on the graph, I want to query node d and node a.
I tried to use the following statement, but the result is incorrect:
match (from)-[r:INVITE]->(to)
with from, count(r) as ref
where ref >2
return from
It seems to count the number of relations issued by all from, not the relationship between from-->to.
to return nodes who have more then 2 relationship between them you need to check the size of the collected rels. something like
MATCH (x:Person)-[r:INVITE]-(:Party)
WITH x, size(collect(r)) as inviteCount
WHERE inviteCount > 2
RETURN x
Aggregating functions like COLLECT and COUNT use non-aggregating terms in the same WITH (or RETURN) clause as "grouping keys".
So, here is one way to get pairs of nodes that have more than 2 INVITE relationships (in a specific direction) between them:
MATCH (from)-[r:INVITE]->(to)
WITH from, to, COUNT(r) AS ref
WHERE ref > 2
RETURN from, to
NOTE: Ideally (for clarity and efficiency), your nodes would have specific labels and the MATCH pattern would specify those labels.
How can we add a relationship to the query.
Say A-[C01]-B-[C02]-D and A-[C01]-B-[C03]-E
C01 C02 C03 are relationship codes I want to get output
B E
because I want only nodes that can be reached unbroken by C01 or C03
How can I get this result in Cypher?
You may want to clarify, what you're asking for seems like a very simple case of matching. You may want to provide some more info, such as node labels and how you're matching to your start nodes, since without these we have to make things up for example code.
MATCH (a:Thing)
WHERE a.ID = 123
WITH a
MATCH (a)-[:C01|C03*]->(b:Thing)
RETURN b
The key here is specifying multiple relationship types to traverse, using * for multiplicity, so it will match on all nodes that can be reached by any chain of those relationships.
I've built a simple graph of one node type and two relationship types: IS and ISNOT. IS relationships means that the node pair belongs to the same group, and obviouslly ISNOT represents the not belonging rel.
When I have to get the groups of related nodes I run the following query:
"MATCH (a:Item)-[r1:IS*1..20]-(b:Item) RETURN a,b"
So this returns a lot of a is b results and I added some code to group them after.
What I'd like is to group them modifying the query above, but given my rookie level I haven't yet figured it out. What I'd like is to get one row per group like:
(node1, node3, node5)
(node2,node4,node6)
(node7,node8)
I assume what you call groups are nodes present in a path where all these nodes are connected with a :IS relationship.
I think this query is what you want :
MATCH p=(a:Item)-[r1:IS*1..20]-(b:Item)
RETURN nodes(p) as nodes
Where p is a path describing your pattern, then you return all the nodes present in the path in a collection.
Note that, a simple graph (http://console.neo4j.org/r/ukblc0) :
(item1)-[:IS]-(item2)-[:IS]-(item3)
will return already 6 paths, because you use undericted relationships in the pattern, so there are two possible paths between item1 and item2 for eg.
I have created a graph db in Neo4j and want to use it for generalization purposes.
There are about 500,000 nodes (20 distinct labels) and 2.5 million relations (50 distinct types) between them.
In an example path : a -> b -> c-> d -> e
I want to find out the node without any incoming relations (which is 'a').
And I should do this for all the nodes (finding the nodes at the beginning of all possible paths that have no incoming relations).
I have tried several Cypher codes without any success:
match (a:type_A)-[r:is_a]->(b:type_A)
with a,count (r) as count
where count = 0
set a.isFirst = 'true'
or
match (a:type_A), (b:type_A)
where not (a)<-[:is_a*..]-(b)
set a.isFirst = 'true'
Where is the problem?!
Also, I have to create this code in neo4jClient, too.
Your first query will only match paths where there is a relationship [r:is_a], so counting r can never be 0. Your second query will return any arbitrary pair of nodes labeled :typeA that aren't transitively related by [:is_a]. What you want is to filter on a path predicate. For the general case try
MATCH (a)
WHERE NOT ()-->a
This translates roughly "any node that does not have incoming relationships". You can specify the pattern with types, properties or labels as needed, for instance
MATCH (a:type_A)
WHERE NOT ()-[:is_a]->a
If you want to find all nodes that have no incoming relationships, you can find them using OPTIONAL MATCH:
START n=node(*)
OPTIONAL MATCH n<-[r]-()
WITH n,r
WHERE r IS NULL
RETURN n
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't be acceptable to make several queries ? In your case you'd automatically generate 6 queries with all the possible combinations (factorial on the number of variables)
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.