I've got a sub graph, of node type c which has a relationship to t1 and t2. Node t1 has a relationship with w1 and w2. Node t2 has a relationship with w1.
What I want to query with cypher is from node c return the w nodes that have 2 or more t nodes related. ie w1 only.
Apparently you can't aggregate in the WHERE clause like
START c=node(7)
MATCH (c)-[:T_TO]-(t)-[:W_TO]-(w)
WHERE COUNT(t) >= 2
RETURN w.WName;
Maybe looking at it another way, this doesn't work either as I only want the w's that only relate to t1 and t2...?
START c=node(7), t1=node(10), t2=node(8)
MATCH (c)-[:T_TO]-(t)-[:W_TO]-(w)
WHERE t in [t1, t2]
RETURN t, w.WName;
Update
Anyone wanting something like the second one, this works:
START c=node(7), t1=node(8), t2=node(10)
MATCH (c)-[:T_TO]-(t1)-[:W_TO]-(w),(c)-[:T_TO]-(t2)-[:W_TO]-(w)
RETURN w.WName;
How about
START c=node(7)
MATCH (c)-[:T_TO]-(t)-[:W_TO]-(w)
WITH COUNT(t) as tCount,w
WHERE tCount >= 2
RETURN w.WName;
Related
Node A is connected to Node E through different nodes B (B can be repeating), C and D etc as given below.
(A)--(C)--(D)--(E)
(A)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(B)--(C)--(D)--(E)
There could be up to 7 B nodes between A and C or no B node at all (like the first case above).
Question: How to get all the E1, E2, E3, E4 connected to A1 with a single query and return properties from all A, B, C, D and E nodes? I could not return the properties using the hops.
MATCH (A {Id:30})-[*1..6]-(E) RETURN DISTINCT A.Name, E.Name;
But we want to return B.Name (If there are multiple B nodes in the middle their names too), C.Name and D.Name too. Happy to skip hopping completely if required. Help please? Thanks in advance.
Try this
// in case there is always A,C and E, you can look for
// paths with length 3 to 6
MATCH path=(A)-[*3..6]-(E)
// return the name of each node in the same order
RETURN [n IN nodes(path) | n.name] AS nodeNames
Assuming A through E are node labels, this query should get all paths that match your pattern (with 0 to 7 B nodes between the A and C nodes), and return distinct lists of node Name values:
MATCH p=(:A)-[*..8]-(:C)--(:D)--(:E)
WHERE ALL(n IN NODES(p)[1..-3] WHERE 'B' IN LABELS(n))
RETURN DISTINCT [m IN NODES(p) | m.Name] AS names
In general, the query would be more efficient if you could also specify the relationship types and their directionality.
I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
Then for each connected node (b) I want to return the SUM of the weights that connect to the query subset
For example. If node b1 has 4 edges that connect to nodes in the query subset I would like to RETURN SUM(r2.weight) AS totalWeight for b2. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important as I have 30,000 nodes and 1.5M edges if you have any suggestions regarding this please throw them into the mix.
Many thanks
Matt
Why do you need so many Match statements? You can specify a nodes and b nodes in single Match statement and select only those who have a relationship between them.
After that just return b nodes and sum of the weights. b nodes will automatically be acting as a group by if it is returned along with aggregation function such as sum.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
Having the following graphs:
node g1 with child nodes (a, b)
node g2 with child nodes (b, c)
using the query
MATCH (n)-[]-(m) WHERE ID(m) = id RETURN n
being id the id of the node g1, I get a and b, and vice-versa when using the id of g2. What I would like to understand is how can I get the intersection of those two results, in this case having the first return (a, b) and the second return (b, c) getting as final result (b).
I tried using the WITH cause but I wasn't able to achieve the desired result. Keep in mind that I'm new to Neo4j and only came here after a few failed attempts, research on Neo4j Documentation, general google search and
Stackoverflow.
Edit1 (one of my tries):
MATCH (n)-[]->(m)
WHERE ID(m) = 750
WITH n
MATCH (o)-[]->(b)
WHERE ID(b) = 684 and o = n
RETURN o
Edit2:
The node (b), that I represented as being the same on both graphs are in fact two different nodes on the db, each one relating to a different graph (g1 and g2). Representatively they are the same as they have the exactly same info (labels and attributes), but on the database thy are not. I'm sorry since it was my fault for not being more explicit on this matter :(
Edit3:
Why I don't using a single node (b) for both graphs
Using the graphs above as example, imagine that I have yet another layer so: on g1 the child node (b) as a child (e), while on g2 the child node (b) as a child (f). If I had (b) as a single node, when I create (e) and (f) I only could add it to (b) loosing the hierarchy, becoming impossible to distinguish which of them, (e) or (f), belonged to g1 ou g2.
This should work (assuming you pass id1 and id2 as parameters):
MATCH (a)--(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN n;
[UPDATED, based on new info from comments]
If you have multiple "clones" of the "same" node and you want to quickly determine which clones are related without having to perform a lot of (slow) property comparisons, you can add a relationship (say, typed ":CLONE") between clones. That way, a query like this would work:
MATCH (a)--(m)-[:CLONE]-(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN m, n;
You can find the duplicity of the node, by using this query -
[1]
Duplicity with single node -
MATCH pathx =(n)-[:Relationship]-(find) WHERE find.name = "action" RETURN pathx;
[2]
or for two nodes giving only immediate parent node
MATCH pathx =(n)-[:Relationship]-(find), pathy= (p)-[:Relationship]
-(seek) WHERE find.name = "action" AND seek.name="requestID" RETURN pathx,
pathy;
[3]
or to find the entire network i.e. all the nodes connected -
MATCH pathx =(n)--()-[:Relationship]-(find), pathy= (p)--()-[:Relationship]-
(seek) WHERE find.name = "action"
AND seek.name="requestID" RETURN pathx, pathy;
I have the following data in Neo4j:
CREATE (t1:T {start:1, end:8})
CREATE (t2:T {start:1, end:4})
CREATE (t3:T {start:1, end:2})
CREATE (t4:T {start:3, end:4})
CREATE (t5:T {start:5, end:6})
CREATE (t6:T {start:7, end:8})
CREATE (t2)-[r1:T_OF]->(t1)
CREATE (t3)-[r2:T_OF]->(t2)
CREATE (t4)-[r3:T_OF]->(t2)
CREATE (t5)-[r4:T_OF]->(t1)
CREATE (t6)-[r5:T_OF]->(t1)
This creates a tree with start and end values, which in my actual application are epoch dates. I want to be able to find the nodes that don't have shorter/smaller nodes attached to them in a given range.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
(MAGIC)
RETURN t
My goal is for this to only return t2 and t5, even though t3 and t4 fall in the range. Since they have a T_OF relationship to t2, they should be ignored.
I've tried a few different ways, but unfortunately I can't figure this one out.
Please let me know if I should explain better.
Does this work for you?
It collects all the T nodes with the right date range, and then filters out all the nodes that have a T_OF relationship to any node in the collection.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
WITH COLLECT(t) AS ct
RETURN FILTER(x IN ct
WHERE ALL (y IN ct
WHERE NOT ((x)-[:T_OF]->(y))))
AS result;
You can use paths as expressions and also negate them.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
AND NOT (t)<-[:T_OF]-()
RETURN t
Imagine that i have a graph in which for every pair of nodes m,n of type Nod1 there can be a node k of type Nod2 that connect them through relationships of type Rel, that is, there can be multiple patterns of the kind p=(m:Nod1)-[r:Rel]-(k:Nod2)-[s:Rel]-(n:Nod1). For a given node m (satisfying for example m.key="whatever") how can i find the node n that maximizes the number of nodes k that connect m to n? For example: imagine that there are 3 nodes k that connects m to n1 satisfying n1.key="hello" and 10 nodes k that connects m to n2 satisfying n2.key="world"; how to build a query that retrieves the node n2? :)
The title of the question is count duplicated, because i think that the problem is solved if i can count all "duplicated" patterns for each node n (that is, all patterns that has n as "endnode")!! :)
Start by matching your m; then match the pattern you want, then filter by distinct n nodes, and count the number of k nodes connected via that n node, and you should be there.
MATCH (m:Nod1 { key: "whatever" })
WITH m
MATCH (m)-[r:Rel]-(k:Nod2)-[s:Rel]-(n:Nod1)
RETURN distinct(n), count(k) as x
ORDER BY x DESC;