I have a dataset in neo4j that looks something like this:
(a)-[:similar_to]->(b)
Each node has a unique property called 'id'. In the following example dataset, each row is one 'similar_to' relationship from an 'a' node to a 'b' node:
a.id b.id
1 5
1 2
2 13
3 12
Here is what the topology looks like:
[image: graph topology]
What I would like to do is to retrieve a table of the two groups of nodes that are connected such that the result would look like:
1, 2, 5, 13
3, 12
The best I've been able to do with Cypher so far is:
MATCH (a)-[r:similar_to*]-(b)
RETURN collect(distinct a.id)
However, the output of this is to print all of the nodes on one row:
5, 1, 2, 3, 12, 13
I have tried various permutations of this query, but keep failing. I've searched the forums for 'subgraph' and 'neo4j', but was unable to find a suitable solution. Any direction/ideas would be appreciated.
Thanks!
My understanding is that you want every root node "a" together with the group of all nodes that are directly or indirectly connected to it through [:similar_to] relationships. If so, try this:
MATCH (a)-[r:similar_to*]->(b)
WHERE NOT (a)<-[:similar_to]-()
RETURN a, collect(distinct b.id) as group
The "WHERE" clause restricts the node "a" to be the root node of each group.
The "RETURN" clause groups all nodes on the matched paths by the root node "a".
If you want to include each root "a" in the group, just change the path to,
(a)-[r:similar_to*0..]->(b)
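To make the grouping logic concrete, here is a minimal plain-Python sketch of what the query above computes. It is only a model of the idea, not Neo4j code: the function name `groups_by_root` is made up for illustration, and the edge list is the example dataset from the question.

```python
from collections import defaultdict

def groups_by_root(edges):
    """For each node with no incoming edge, collect every node reachable from it."""
    children = defaultdict(list)
    has_incoming = set()
    nodes = set()
    for a, b in edges:
        children[a].append(b)
        has_incoming.add(b)
        nodes.update((a, b))

    result = {}
    # Roots play the role of WHERE NOT (a)<-[:similar_to]-()
    for root in sorted(nodes - has_incoming):
        seen = set()
        stack = [root]
        while stack:
            current = stack.pop()
            for child in children[current]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        # Mirrors collect(distinct b.id), grouped by the root
        result[root] = sorted(seen)
    return result

edges = [(1, 5), (1, 2), (2, 13), (3, 12)]
groups = groups_by_root(edges)
print(groups)
```

Note that the root itself is not part of its group here, which matches the `*` form of the pattern; the `*0..` form would add it.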
After having created a collection of nodes, some of the nodes should also have a relation attached based on a condition. In the example below the condition is simulated with WHERE n.number > 3 and the nodes are simple numbers:
WITH [2, 3, 4] as numbers
UNWIND numbers AS num
CREATE(n:Number {number: num})
WITH collect(n) AS nodes
UNWIND nodes AS n
WITH nodes, n WHERE n.number > 3
CREATE (n)-[:IM_SPECIAL]->(n)
RETURN nodes
Which returns:
╒════════════════════════════════════════╕
│"nodes" │
╞════════════════════════════════════════╡
│[{"number":2},{"number":3},{"number":4}]│
└────────────────────────────────────────┘
Added 3 labels, created 3 nodes, set 3 properties, created 1 relationship, started streaming 1 records in less than 1 ms and completed after 1 ms.
My problem is that nothing is returned unless I have at least one of these "special" nodes that is caught by the filter. The problem can be simulated by changing the input numbers to [1, 2, 3] which returns an empty result (no nodes) even though the nodes are created (as they should):
<empty result>
Added 3 labels, created 3 nodes, set 3 properties, completed after 2 ms.
I might be approaching the problem totally wrong but I've exhausted my Google skills... what Neo4J Cypher magic am I missing?
The documentation on Conditional Cypher Execution (Using correlated subqueries in 4.1+) describes how to solve this without the need for APOC:
WITH [2, 3, 4] AS numbers
UNWIND numbers AS num
CREATE(n:Number {number: num})
WITH n
CALL {
WITH n
WITH n WHERE n.number > 3
CREATE (n)-[:IM_SPECIAL]->(n)
RETURN count(n) AS specialCount
}
RETURN collect(n) AS nodes
Thanks to Sanjay Singh and Jose Bacoy for putting me on the right track.
WITH nodes, n WHERE n.number > 3
Each clause of a Cypher query must yield at least one row for the subsequent clauses to have anything to consume. The above line yields no rows if you start with [1,2,3], so the final RETURN has nothing to return.
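As a rough illustration of why the rows disappear, here is a plain-Python model of the row pipeline. It is a sketch only: rows are modelled as dicts, the function names are invented for this example, and the list `special` merely stands in for the CREATE of the :IM_SPECIAL relationship.

```python
def filter_then_return(numbers):
    """Models: collect -> UNWIND -> WITH ... WHERE -> RETURN nodes."""
    nodes = [{"number": n} for n in numbers]
    # UNWIND nodes AS n: one row per node, each row still carries `nodes`
    rows = [{"nodes": nodes, "n": n} for n in nodes]
    # WITH nodes, n WHERE n.number > 3: rows failing the predicate are dropped
    rows = [row for row in rows if row["n"]["number"] > 3]
    # RETURN nodes: one output row per surviving row
    return [row["nodes"] for row in rows]

def side_effect_per_row(numbers):
    """Models a conditional side effect that never removes rows."""
    rows = [{"number": n} for n in numbers]
    special = []
    for node in rows:
        if node["number"] > 3:
            special.append(node)  # stands in for CREATE (n)-[:IM_SPECIAL]->(n)
    # Every row survives, so the final collect still sees all nodes
    return rows, special

print(filter_then_return([1, 2, 3]))   # no rows survive the filter
print(side_effect_per_row([1, 2, 3]))  # all nodes are still returned
```

The second shape is what both answers aim for: the condition decides whether the side effect happens, not whether the row continues through the query.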
For your purpose, this will work.
WITH [1,2,3,4] as numbers
UNWIND numbers AS num
CREATE(n:Number {number: num})
WITH n
CALL apoc.do.when(n.number>3,
'CREATE (n)-[:IM_SPECIAL]->(n) RETURN n',
'RETURN n',
{n:n}
)
YIELD value as m
WITH collect(m) AS nodes
RETURN nodes
Suppose that I have the following graph. How can I find the lowest set of nodes that cuts all (directed) paths between nodes [1, 2, 3, 4] and node [66]? In my case I want to find nodes [11, 5, 9, 6] (node 7 must be excluded because nodes 6 and 9 are lower ancestors of nodes 3 and 4). Thank you for your help.
I have an answer to a similar question here.
Here are the steps I would take:
1. Match to the starting nodes [1,2,3,4] and collect them for later.
2. Match to and expand out from your end node (66) to all connected nodes (using the directed pattern), blacklisting the starting nodes from step 1 (so we don't include paths to those nodes or beyond them). Collect these nodes as descendents.
3. Expand from your starting nodes, terminating at descendents (so we get paths to the first descendent encountered, but don't continue expanding past any of them).
Cypher doesn't have great support for performing the termination during expansion behavior in step 3, so we need path expander procs from APOC Procedures for that.
Let's say that these are nodes of type :Node with id properties for the numeric values, with :PARENT relationships between them pointing toward parents/ancestors. Let's say we have an index on :Node(id) for quick lookup. Using APOC our query would look something like:
MATCH (n:Node)
WHERE n.id IN [1,2,3,4]
WITH collect(n) as startNodes
MATCH path = (end:Node {id:66})<-[:PARENT*]-(descendent)
WHERE none(node in nodes(path) WHERE node in startNodes)
WITH startNodes, end, collect(DISTINCT descendent) as descendents
CALL apoc.path.subgraphNodes(startNodes, {terminatorNodes:descendents}) YIELD node as mostRecentDescendents
RETURN mostRecentDescendents
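The steps above can also be sketched in plain Python, which may help when checking the logic outside the database. This is a model under assumptions: the original graph image is not available, so the edge list below is a hypothetical graph chosen to be consistent with the expected answer [11, 5, 9, 6], and `lowest_cut` is an illustrative name, not an APOC function.

```python
from collections import defaultdict, deque

def lowest_cut(parent_edges, start_ids, end_id):
    """parent_edges are (child, parent) pairs, like (child)-[:PARENT]->(parent)."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for child, parent in parent_edges:
        parents[child].add(parent)
        children[parent].add(child)
    starts = set(start_ids)

    # Step 2: walk down from the end node, never entering a start node.
    descendants = set()
    queue = deque([end_id])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in starts and child not in descendants:
                descendants.add(child)
                queue.append(child)

    # Step 3: walk up from the start nodes and stop at the first
    # descendant met on each path (the terminator behaviour).
    cut = set()
    visited = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent in visited:
                continue
            visited.add(parent)
            if parent in descendants:
                cut.add(parent)       # terminate here, do not expand further
            else:
                queue.append(parent)  # keep climbing
    return cut

# Hypothetical graph: 7 sits above 6 and 9, so it must not appear in the cut.
edges = [(1, 11), (2, 5), (3, 9), (4, 6), (6, 7), (9, 7),
         (7, 66), (11, 66), (5, 66)]
cut = lowest_cut(edges, [1, 2, 3, 4], 66)
print(sorted(cut))
```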
I have an application where nodes and relations are shown. After a result is shown, nodes and relations can be added through the GUI. When the user is done, I would like to get all the data from the database again (because I don't have all the data in the front-end at this point) based on the Neo4j IDs of all nodes and links. The difficult part for me is that there are "floating" nodes that don't have a relation in the GUI result (they will have relations in the database, but I don't want these). It is worth mentioning that my relations carry the start and end node IDs. I was thinking of starting from there, but then I don't get these floating nodes.
Let's take a look at this poorly drawn example image:
As you can see:
node 1 is linked (no direction) to node 2.
node 2 is linked to node 3 (from 2 to 3)
node 3 is linked to node 4 (from 3 to 4)
node 3 is also linked to node 5 (no direction)
node 6 is a floating node, without relations
Let's assume that:
id(relation between 1 and 2) = 11
id(relation between 2 and 3) = 12
id(relation between 3 and 4) = 13
id(relation between 3 and 5) = 14
Keeping in mind that behind the real data, there are way more relations between all these nodes, how can I recreate this very image again via Neo4j? I have tried doing something like:
match path=(n)-[rels*]-(m)
where id(n) in [1, 2, 3, 4, 5]
and all(rel in rels where id in [11, 12, 13, 14])
and id(m) in [1, 2, 3, 4, 5]
return path
However, this doesn't work properly for multiple reasons. Also, just matching on all the nodes doesn't get me the relations. Do I need to union multiple queries? Can this be done in 1 query? Do I need to write my own plugin?
I'm using Neo4j 3.3.5.
You don't need to keep a list of node IDs. Every relationship points to its 2 end nodes. Since you always want both end nodes, you get them for free using just the relationship ID list.
This query will return every single-relationship path from a relationship ID list. If you are using the neo4j Browser, its visualization should knit together these short paths and display your original full paths.
MATCH p=()-[r]-()
WHERE ID(r) IN [11, 12, 13, 14]
RETURN p
By the way, all neo4j relationships have a direction. You may choose not to specify the direction when you create one (using MERGE) and/or query for one, but it still has a direction. And the neo4j Browser visualization will always show the direction.
[UPDATED]
If you also want to include "floating" nodes that are not attached to a relationship in your relationship list, then you could just use a separate floating node ID list. For example:
MATCH p=()-[r]-()
WHERE ID(r) IN [11, 12, 13, 14]
RETURN p
UNION
MATCH p=(n)
WHERE ID(n) IN [6]
RETURN p
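The reasoning behind the two lists can be modelled in a few lines of plain Python. This is only a sketch: the relationship IDs and node numbers come from the question, but the directions of relationships 11 and 14 and the extra relationship 15 are assumptions added for illustration, and `rebuild_view` is a made-up name.

```python
# Relationship ID -> (start node, end node).
relationships = {
    11: (1, 2),
    12: (2, 3),
    13: (3, 4),
    14: (3, 5),
    15: (1, 3),  # hypothetical: exists in the database but not in the GUI
}

def rebuild_view(relationships, rel_ids, floating_ids):
    """Select relationships by ID; their end nodes come along for free."""
    kept = {rid: relationships[rid] for rid in rel_ids}
    nodes = set(floating_ids)  # floating nodes need their own ID list
    for start, end in kept.values():
        nodes.update((start, end))
    return nodes, kept

nodes, kept = rebuild_view(relationships, [11, 12, 13, 14], [6])
print(sorted(nodes))
print(sorted(kept))
```

Relationship 15 is left out even though both of its end nodes are in the view, which is the reason for filtering on relationship IDs rather than node IDs.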
I am taking my first steps in Cypher and Neo4j and trying to understand how Cypher deals with "variables".
Specifically, I have a query
match (A {name: "A"})
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
match (c)-[:st]->(b)
return b
which does the job I want. Now, in the code I am using a match clause two times (lines 2 and 3), so that the variables (c) and (b) basically contain the same nodes before the final match on line 4.
Can I write the query without having to repeat the second match clause? Using
match (A {name: "A"})
match (A)<-[:st*]-(B)-[:hp]->(b)
match (b)-[:st]->(b)
return b
seems to be something very different, returning nothing since there are no :st type relationships from a node in (b) to itself. My understanding so far is that, even if (b) and (c) contain the same nodes,
match (c)-[:st]->(b)
tries to find matches between ANY node of (c) and ANY node of (b), whereas
match (b)-[:st]->(b)
tries to find matches from a particular node of (b) onto itself? Or is it that one has to think of the 3 match clauses as a holistic pattern?
Thanks for any insight into the inner workings.
When you write the 2 MATCH statements
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
they don't depend on each other's results (only on the result of the previous MATCH finding A). The Cypher engine could execute them independently and then return a cartesian product of their results, or it could execute the first MATCH and for each result, then execute the second MATCH, producing a series of pairs using the current result of the first MATCH and each result of the second MATCH (the actual implementation is a detail). Actually, it could also detect that the same pattern is matched twice, execute it only once and generate all possible pairs from the results.
To summarize, b and c are taken from the same collection of results, but independently, so you'll get pairs where b and c are the same node, but also all the other pairs where they are not.
If you do a single MATCH, you have a single variable, so each row carries only one node and the final MATCH can only compare that node with itself.
Supposing a MATCH returns 2 nodes 1 and 2, with the 2 intermediate MATCH the final MATCH will see all 4 pairs:
      1       2
1   (1, 1)  (1, 2)
2   (2, 1)  (2, 2)
whereas with a single intermediate MATCH and a final MATCH using b twice, it will only see:
      1       2
1   (1, 1)
2           (2, 2)
which are not the interesting pairs, if you don't have self-relationships.
Note that it's the same in a SQL database if you do a SELECT on 2 tables without a join: you also get a cartesian product of unrelated results.
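The difference between the two query shapes can be reproduced in plain Python with `itertools.product`. This is a sketch with made-up data: `matched` stands for the nodes found by the intermediate MATCH, and `st_edges` is a hypothetical set of :st relationships between them.

```python
from itertools import product

matched = [1, 2]       # nodes returned by the intermediate MATCH
st_edges = {(1, 2)}    # hypothetical (c)-[:st]->(b) relationships

# Two independent MATCH clauses: c and b range over all pairs.
two_variables = [(c, b) for c, b in product(matched, matched)
                 if (c, b) in st_edges]

# One MATCH and b used twice: only the diagonal pairs are ever tested.
one_variable = [(b, b) for b in matched if (b, b) in st_edges]

print(two_variables)
print(one_variable)
```

With no self-relationships in `st_edges`, the single-variable form can never match, which is what the question observed.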
I have a query that I'm not sure how to implement or if it's efficient to do in cypher. Anyway, here's what I'm trying to do.
I have basically this graph:
I want to get all the nodes/relationships from 1 to 3 (note: the empty node can be any number of nodes). I also want any incoming edges to the last two nodes, and only the last two nodes, that are not in the original path. In this case the edges in red should also be added to the result.
I already know the path that I want. So in this example I would have been given node ids 1, ..., 2, 3 and I think I know how to get the path of the first part.
MATCH (n)-->() WHERE n.nid IN ['1', '...', '2', '3'] RETURN n
I just can't figure out how to get the red edges for the last two nodes in the path. Also, I'm not given node ids 4 and 5. We can assume the edges connecting 1, ..., 2, 3 all have the same label and all the other edges have a different label.
I think I need to use merge but can't figure out how to do it yet.
Or if someone knows how to do this in Gremlin, I'm all ears.
Does this work for you?
MATCH ({nid: '1'})-[:t*]->(n2 {nid: '2'})-[:t]->(n3 {nid: '3'})
OPTIONAL MATCH ()-[t42]->(n2)
WHERE (TYPE(t42) <> 't')
OPTIONAL MATCH ()-[t53]->(n3)
WHERE (TYPE(t53) <> 't')
RETURN COLLECT(DISTINCT t42) AS c42, COLLECT(DISTINCT t53) AS c53;
I gave all the relationships on the left path (in your diagram) the type "t". (The term "label" is used for nodes, not relationships.) You said we can assume that the other relationships do not have that type, so this query takes advantage of that fact to filter out type "t" relationships from the result.
This query also makes the 4-2 and 5-3 relationships optional.
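For readers who want to check the filtering idea without a database, here is a small plain-Python model. Everything in it is an assumption for illustration: the edge triples, the relationship type "r" for the red edges, the node "x" standing in for the unnamed middle node, and the helper name `incoming_not_on_path`.

```python
# Hypothetical edges as (source nid, relationship type, target nid).
edges = [
    ("1", "t", "x"),
    ("x", "t", "2"),
    ("2", "t", "3"),
    ("4", "r", "2"),  # red edge into node 2
    ("5", "r", "3"),  # red edge into node 3
]

def incoming_not_on_path(edges, node, path_type="t"):
    """Incoming edges of `node` whose type differs from the path type."""
    return [e for e in edges if e[2] == node and e[1] != path_type]

c42 = incoming_not_on_path(edges, "2")
c53 = incoming_not_on_path(edges, "3")
print(c42)
print(c53)
```

A node with no such incoming edge simply yields an empty list, which is the same outcome OPTIONAL MATCH followed by COLLECT gives in the query.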