I have an application that displays nodes and relations. After a result is shown, nodes and relations can be added through the GUI. When the user is done, I would like to fetch all the data from the database again (because the front end doesn't have all of it at this point), based on the Neo4j IDs of all nodes and links. The difficult part for me is that there are "floating" nodes that have no relation in the GUI result (they do have relations in the database, but I don't want those). Worth mentioning is that each of my relations carries its start and end node ID. I was thinking of starting from there, but then I would miss these floating nodes.
Let's take a look at this poorly drawn example image:
As you can see:
node 1 is linked (no direction) to node 2.
node 2 is linked to node 3 (from 2 to 3)
node 3 is linked to node 4 (from 3 to 4)
node 3 is also linked to node 5 (no direction)
node 6 is a floating node, without relations
Let's assume that:
id(relation between 1 and 2) = 11
id(relation between 2 and 3) = 12
id(relation between 3 and 4) = 13
id(relation between 3 and 5) = 14
Keeping in mind that behind the real data, there are way more relations between all these nodes, how can I recreate this very image again via Neo4j? I have tried doing something like:
match path=(n)-[rels*]-(m)
where id(n) in [1, 2, 3, 4, 5]
and all(rel in rels where id in [11, 12, 13, 14])
and id(m) in [1, 2, 3, 4, 5]
return path
However, this doesn't work properly, for multiple reasons. Also, just matching on all the nodes doesn't get me the relations. Do I need to union multiple queries? Can this be done in one query? Do I need to write my own plugin?
I'm using Neo4j 3.3.5.
You don't need to keep a list of node IDs. Every relationship points to its 2 end nodes. Since you always want both end nodes, you get them for free using just the relationship ID list.
This query will return every single-relationship path from a relationship ID list. If you are using the neo4j Browser, its visualization should knit together these short paths and display your original full paths.
MATCH p=()-[r]-()
WHERE ID(r) IN [11, 12, 13, 14]
RETURN p
By the way, all neo4j relationships have a direction. You may choose not to specify the direction when you create one (using MERGE) and/or query for one, but it still has a direction. And the neo4j Browser visualization will always show the direction.
[UPDATED]
If you also want to include "floating" nodes that are not attached to a relationship in your relationship list, then you could just use a separate floating node ID list. For example:
MATCH p=()-[r]-()
WHERE ID(r) IN [11, 12, 13, 14]
RETURN p
UNION
MATCH p=(n)
WHERE ID(n) IN [6]
RETURN p
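To see why a relationship ID list plus a small floating-node list is enough, here is a minimal in-memory sketch in Python. It is not Neo4j code: the RELS dictionary and the rebuild_subgraph helper are hypothetical stand-ins for the database, filled with the IDs from the question, and relationship 99 is an invented example of a relationship that exists in the database but was never shown in the GUI.

```python
# Hypothetical stand-in for the database: relationship ID -> (start node ID, end node ID).
# Relationship 99 is an invented "extra" relationship that the GUI never displayed.
RELS = {
    11: (1, 2),
    12: (2, 3),
    13: (3, 4),
    14: (3, 5),
    99: (6, 1),
}


def rebuild_subgraph(rel_ids, floating_node_ids):
    """Return (nodes, edges) for the chosen relationships plus the floating nodes."""
    # Keep only the relationships whose IDs were requested.
    edges = {rid: RELS[rid] for rid in rel_ids}
    # Floating nodes have to be listed explicitly; no kept relationship points to them.
    nodes = set(floating_node_ids)
    # Every kept relationship contributes both of its end nodes "for free".
    for start, end in edges.values():
        nodes.update((start, end))
    return nodes, edges


nodes, edges = rebuild_subgraph([11, 12, 13, 14], [6])
print(sorted(nodes))
print(sorted(edges))
```

Node 6 shows up only because it was passed in the floating list, and relationship 99 stays out because its ID was never requested, which mirrors the two halves of the UNION query.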
Related
I have a big neo4j db with info about celebs, all of them have relations with many others, they are linked, dated, married to each other. So I need to get random path from one celeb with defined count of relations (5). I don't care who will be in this chain, the only condition I have I shouldn't have repeated celebs in chain.
To be more clear: I need to get "new" chain after each query, for example:
I try to get chain started with Rita Ora
She has relations with
Drake, Jay Z and Justin Bieber
Query takes random from these guys, for example Jay Z
Then Query takes relations of Jay Z: Karrine
Steffans, Rosario Dawson and Rita Ora
Query can't take Rita Ora cuz
she is already in chain, so it takes random from others two, for
example Rosario Dawson
...
And at the end we should have a chain Rita Ora - Jay Z - Rosario Dawson - other celeb - other celeb 2
Is that possible to do it by query?
This is doable in Cypher, but it's quite tricky. You mention that "the only condition I have I shouldn't have repeated celebs in chain."
This condition could be captured by using node-isomorphic pattern matching, which requires all nodes in a path to be unique. Unfortunately, this is not yet supported in Cypher. It is proposed as part of the openCypher project, but is still work-in-progress. Currently, Cypher only supports relationship uniqueness, which is not enough for this use case as there are multiple relationship types (e.g. A is married to B, but B also collaborated with A, so we already have a duplicate with only two nodes).
APOC solution. If you can use the APOC library, take a look at the path expander, which supports various uniqueness constraints, including NODE_GLOBAL.
Plain Cypher solution. To work around this limitation, you can capture the node uniqueness constraint with a filtering operation:
MATCH p = (c1:Celebrity {name: 'Rita Ora'})-[*5]-(c2:Celebrity)
UNWIND nodes(p) AS node
WITH p, count(DISTINCT node) AS countNodes
WHERE countNodes = length(p) + 1
RETURN p
LIMIT 1
Performance-wise this should be okay as long as you limit its results, because the query engine will basically keep enumerating new paths until one of them passes the filtering test.
The goal of the UNWIND nodes(p) AS node WITH count(DISTINCT node) ... construct is to count the unique nodes of the path: it first UNWINDs the node list into separate rows, then aggregates those rows with count(DISTINCT ...). A path with 5 relationships contains 6 nodes, and length(p) returns the number of relationships, so we check whether the number of distinct nodes equals length(p) + 1. If so, no node is repeated and we RETURN the path. If you want a chain of five celebrities, as in your example, use [*4] instead of [*5].
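The filtering idea can be sketched outside the database as well. The following Python snippet is only an illustration under assumptions: has_unique_nodes and first_simple_path are hypothetical helper names, and "Celeb A" and "Celeb B" are placeholder names that do not come from the question. A path passes when the number of distinct nodes equals the number of nodes in the path.

```python
def has_unique_nodes(path_nodes):
    """True when no node appears twice, i.e. distinct count equals total count."""
    return len(set(path_nodes)) == len(path_nodes)


def first_simple_path(candidate_paths):
    """Mimic 'keep enumerating paths until one passes the filter' (like LIMIT 1)."""
    for path_nodes in candidate_paths:
        if has_unique_nodes(path_nodes):
            return path_nodes
    return None


# Candidate chains as the engine might enumerate them (placeholder data).
candidates = [
    ["Rita Ora", "Jay Z", "Rita Ora", "Drake", "Jay Z"],               # repeats nodes
    ["Rita Ora", "Jay Z", "Rosario Dawson", "Celeb A", "Celeb B"],     # all unique
]

chosen = first_simple_path(candidates)
print(chosen)
```

The first candidate is rejected because Rita Ora and Jay Z each appear twice; the second is returned as soon as it is reached, so no further candidates need to be examined.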
Note. Instead of UNWIND and count(DISTINCT ...), getting unique elements from a list could be expressed in other ways:
(1) Using a list comprehension and ranges:
WITH [1, 2, 2, 3, 2] AS l
RETURN [i IN range(0, size(l)-1) WHERE NOT l[i] IN l[0..i] | l[i]]
(2) Using reduce:
WITH [1, 2, 2, 3, 2] AS l
RETURN reduce(acc = [], i IN l | acc + CASE NOT i IN acc WHEN true THEN [i] ELSE [] END)
However, I believe both forms are less readable than the original one.
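For readers who want to check what the two alternatives compute, here is a rough Python equivalent of each, written as a sketch rather than a translation tool. The variable names by_comprehension and by_reduce are made up for this illustration.

```python
from functools import reduce

l = [1, 2, 2, 3, 2]

# (1) Comprehension over index positions: keep l[i] only if it does not
#     already occur in the slice before position i.
by_comprehension = [l[i] for i in range(len(l)) if l[i] not in l[:i]]

# (2) Reduce: grow an accumulator, appending an element only when it is new.
by_reduce = reduce(lambda acc, i: acc + ([i] if i not in acc else []), l, [])

print(by_comprehension)
print(by_reduce)
```

Both variants keep the first occurrence of each element and preserve the original order.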
I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B
  -> C
  -> D -> 1
       -> 2 -> X
            -> Y
Then I would produce ArB, ArC, ArD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it).
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
The query would be great, doc pages that explain would be great. Any help is appreciated.
EDIT Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
Where in line 1, I make a path that includes all the nodes under the one I care about.
In line 2, I convert the nodes of the path into a collection of nodes.
In line 3, I start a new match that is intended to return my pairs.
Line 4, I accept only x and y nodes that were scooped up by the first match. I am not sure why I have to include y in the condition, but it seems to matter.
Line 5, I return the results. I do not know why I need a DISTINCT here. I thought the one on line 2 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.
Here's an approach to try - this query is modeled off of the sample matrix data you can find online so you can play with it before adapting it to your schema.
MATCH p=(n:Crew)-[r:KNOWS*]-(m)
WHERE n.name='Neo'
WITH p, size(nodes(p)) AS nCount, size(relationships(p)) AS rCount
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1]
ORDER BY length(p) ASC;
A couple of notes about what's going on here:
Consider the "Neo" node (n.name="Neo") to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we're looking at the nodes and relationships in the path, and then returning the last pair: the second-to-last node (nodes(p)[nCount-2]), the last relationship in the path (relationships(p)[rCount-1]), and the last node (nodes(p)[nCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r*]->(otherStuff); from there it's just processing paths as you go.
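The "take the last hop of every path rooted at A" idea can be tried on the tree from the question with a small Python sketch. Everything here is an assumption made for illustration: the EDGES list, the relationship type "r", the cousin edge ("Q", "r", "Z"), and the helper names paths_from and last_hop_pairs.

```python
# (start, relationship type, end); the last edge belongs to a "cousin" subtree
# that must not show up when the traversal is anchored at A.
EDGES = [
    ("A", "r", "B"), ("A", "r", "C"), ("A", "r", "D"),
    ("D", "r", "1"), ("D", "r", "2"),
    ("2", "r", "X"), ("2", "r", "Y"),
    ("Q", "r", "Z"),
]


def paths_from(root, edges):
    """Yield every directed path (a list of edges) starting at root.

    An edge is never reused within one path, so cycles cannot loop forever.
    """
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        for edge in edges:
            if edge[0] == node and edge not in path:
                new_path = path + [edge]
                yield new_path
                stack.append((edge[2], new_path))


def last_hop_pairs(root, edges):
    """Collect only the final (parent, type, child) triple of each path."""
    return {path[-1] for path in paths_from(root, edges)}


pairs = last_hop_pairs("A", EDGES)
for triple in sorted(pairs):
    print(triple)
```

Because only the last hop of each path is kept, a long path such as A to Y contributes the pair (2, r, Y) rather than an unhelpful (A, r, Y).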
I have a query that I'm not sure how to implement or if it's efficient to do in cypher. Anyway, here's what I'm trying to do.
I have basically this graph:
I want to get all the nodes/relationships from 1 to 3 (note: the empty node can be any number of nodes). I also want any incoming edges to the last two nodes, and only the last two nodes, that are not in the original path. In this case the edges that are in red should also be added to the result.
I already know the path that I want. So in this example I would have been given node ids 1, ..., 2, 3 and I think I know how to get the path of the first part.
MATCH (n)-->() WHERE n.nid IN ['1', '...', '2', '3'] RETURN n
I just can't figure out how to get the red edges for the last two nodes in the path. Also, I'm not given node ids 4 and 5. We can assume the edges connecting 1, ..., 2, 3 all have the same label and all the other edges have a different label.
I think I need to use merge but can't figure out how to do it yet.
Or if someone knows how to do this in gremlin, I'm all ears.
Does this work for you?
MATCH ({nid: '1'})-[:t*]->(n2 {nid: '2'})-[:t]->(n3 {nid: '3'})
OPTIONAL MATCH ()-[t42]->(n2)
WHERE TYPE(t42) <> 't'
OPTIONAL MATCH ()-[t53]->(n3)
WHERE TYPE(t53) <> 't'
RETURN COLLECT(DISTINCT t42) AS c42, COLLECT(DISTINCT t53) AS c53;
I give all the relationships on the left path (in your diagram) the type "t". (The term label is used for nodes, not relationships.) You said we can assume that the other relationships do not have that type, so this query takes advantage of that fact to filter out type "t" relationships from the result.
This query also makes the 4-2 and 5-3 relationships optional.
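The type-based filtering can be illustrated with a tiny in-memory model in Python. This is a sketch under assumptions: the EDGES list is invented to resemble the described diagram, "x" stands for the unnamed intermediate node, "other" is a made-up relationship type, and incoming_not_of_type is a hypothetical helper.

```python
# (start, relationship type, end). Path edges have type "t"; the "red" edges
# have some other type. "x" is a placeholder for the unnamed intermediate node.
EDGES = [
    ("1", "t", "x"), ("x", "t", "2"), ("2", "t", "3"),
    ("4", "other", "2"),
    ("5", "other", "3"),
    ("6", "other", "1"),   # incoming edge to a node that is NOT one of the last two
]


def incoming_not_of_type(node, excluded_type, edges):
    """Incoming edges of `node` whose type differs from `excluded_type`."""
    return [e for e in edges if e[2] == node and e[1] != excluded_type]


c42 = incoming_not_of_type("2", "t", EDGES)
c53 = incoming_not_of_type("3", "t", EDGES)
print(c42)
print(c53)
```

If a node had no qualifying incoming edge, the helper would return an empty list, which plays the same role as the OPTIONAL MATCH leaving the collected result empty instead of dropping the whole row.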
I have a query like this:
MATCH left, right
WHERE (ID(right) IN [1, 2, 3] AND ID(left) IN [4, 5, 6])
WITH left, right
LIMIT 1
RETURN left, right
UNION MATCH left, right
WHERE (ID(right) IN [1, 2, 3] AND ID(left) IN [4, 5, 6])
WITH left, right
SKIP 4 LIMIT 1
RETURN left, right
UNION MATCH left, right
WHERE (ID(right) IN [1, 2, 3] AND ID(left) IN [4, 5, 6])
WITH left, right
SKIP 8 LIMIT 1
RETURN left, right
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel;
In general, I'm just creating a dataset so that I can use it later in CREATE UNIQUE instruction.
Obviously, that doesn't work - query analyzer says that I can only use RETURN clause once.
My question is - how to compose a dataset in this case? I tried to assign an alias and use it in CREATE UNIQUE - can't get it to work either. What am I doing wrong? Is this scenario even possible?
I may misunderstand what you are after, but here's what occurs to me when I look at your query.
To begin with here's an adaptation of your query that uses SKIP and LIMIT without RETURN or UNION.
MATCH left, right
WHERE ID(left) IN [1,2,3] AND ID(right) IN [4,5,6]
WITH left, right
LIMIT 1
CREATE UNIQUE left-[rel:FRIEND]->right
WITH [rel] as rels //If you want to return the relationship later you can put it in a collection and bring it WITH
MATCH left, right
WHERE ID(left) IN [1,2,3] AND ID(right) IN [4,5,6]
WITH left, right, rels
SKIP 4 LIMIT 1
CREATE UNIQUE left-[rel:FRIEND]->right
WITH rels + [rel] as rels
MATCH left, right
WHERE ID(left) IN [1,2,3] AND ID(right) IN [4,5,6]
WITH left, right, rels
SKIP 8 LIMIT 1
CREATE UNIQUE left-[rel:FRIEND]->right
WITH rels + [rel] as rels
RETURN LENGTH(rels), rels // You can return the relationships here but SKIP/LIMIT does its job also if you don't return anything
But this query is a bit wild. It's really three queries, where two have been artificially squeezed in as sub queries of the first. It matches the same nodes anew in each sub query, and there really isn't anything gained by running the queries this way rather than separately (it's actually slower, because in each sub query you match also the nodes you know you will not use).
So my first suggestion is to use START instead of MATCH...WHERE when getting nodes by id. As it stands, the query binds every node in the database as "left", and then every node in the database as "right", and then it filters out all the nodes bound to "left" that don't fit the condition in the WHERE clause, and then the same for "right". Since this part of the query is repeated three times, all nodes in the database are bound a total of six times. That's expensive for creating three relationships. If you use START you can bind the nodes you want right away. This doesn't really answer your question, but it will be faster and the query will be cleaner. So, use START to get nodes by their internal id.
START left = node(1,2,3), right = node(4,5,6)
The second thing I think of is the difference between nodes and 'paths' or 'result items' when you match patterns. When you bind three nodes in "left" and three other nodes in "right", you don't have three result items, but nine. For each node bound in "left" you get three results, because there are three possible "right" nodes to combine it with. If you wanted to relate every "left" to every "right", great. But I think what you are looking for are the result items (1),(4), (2),(5), (3),(6), and though it seems convenient to bind the three "left" nodes and the three "right" nodes in one query with collections of node ids, you then have to do all that filtering to get rid of the 6 unwanted matches. The query gets complex and cumbersome, and it's actually slower than running the queries separately. Another way to say this is that (1)-[:FRIEND]->(4) is a distinct pattern, not (relevantly) connected to the other patterns you are creating. It would be different if you wanted to create (1)-[:FRIEND]->(2)<-[:FRIEND]-(3); then you would want to handle those three nodes together. Maybe you're just exploring fringe uses of cypher, but I thought I should point it out. By the way, using SKIP and LIMIT in this way is a bit off key; they're not really intended for pattern matching and filtering. It's also unpredictable unless you also use ORDER BY, since there is no guarantee that the results will be in a certain order. You don't know which result item it is that gets passed on. Anyway, in this case, I think it would be better to bind the nodes and create the relationship in three separate queries.
START left = node(1), right = node(4)
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel
START left = node(2), right = node(5)
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel
START left = node(3), right = node(6)
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel
Since you already know that you want those three pairs, and not, say, (1),(4),(1),(5),(1),(6) it would make sense to query for just those pairs, and the easiest way is to query separately.
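The nine-versus-three point is easy to verify with plain Python. This is only a sketch of the combinatorics, not of how the database evaluates the query; the variable names are made up for the illustration.

```python
from itertools import product

left_ids = [1, 2, 3]
right_ids = [4, 5, 6]

# Binding both lists independently yields every combination: 3 x 3 = 9 rows.
all_rows = list(product(left_ids, right_ids))

# What was actually wanted: the three position-wise pairs.
wanted = list(zip(left_ids, right_ids))

# The rows that all the filtering has to throw away again.
unwanted = [row for row in all_rows if row not in wanted]

# SKIP 0 / SKIP 4 / SKIP 8 with LIMIT 1 picks rows 0, 4 and 8. In THIS ordering
# they happen to be the wanted pairs, but the database does not guarantee a row
# order without ORDER BY.
picked = [all_rows[i] for i in (0, 4, 8)]

print(len(all_rows), wanted, len(unwanted), picked)
```

The SKIP offsets only hit the intended pairs when the rows happen to arrive in this particular order, which is why the separate (or parameterized) queries are the more dependable choice.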
But thirdly, since the three queries are structurally identical, differing only in property value (if id is to be considered a property) you can simplify the query by generalizing or anonymizing that which distinguishes them, i.e. use parameters.
START left = node({leftId}), right = node({rightId})
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel
parameters: {leftId:1, rightId:4}, {leftId:2, rightId:5}, {leftId:3, rightId:6}
Since the structure is identical, cypher can cache the execution plan. This makes for good performance, and the query is tidy, maintainable and can be easily extended if later you want to do the same operation on other pairs of nodes.
I have a dataset in neo4j that looks something like this:
(a)-[:similar_to]->(b)
Each node has a property called 'id' that is unique. In the following example dataset, each row is an 'a' node that has a 'similar_to' relationship to the 'b' node in the same row:
a.id b.id
1 5
1 2
2 13
3 12
Here is what the topology looks like:
graph topology image
What I would like to do is to retrieve a table of the two groups of nodes that are connected such that the result would look like:
1, 2, 5, 13
3, 12
The best I've been able to do with Cypher so far is:
MATCH (a)-[r:similar_to*]-(b)
RETURN collect(distinct a.id)
However, the output of this is to print all of the nodes on one row:
5, 1, 2, 3, 12, 13
I have tried various permutations of this query, but keep failing. I've searched the forums for 'subgraph' and 'neo4j', but was unable to find a suitable solution. Any direction/ideas would be appreciated.
Thanks!
My understanding is that you want every root node "a" together with the group of all nodes that are directly or indirectly related to "a" via [:similar_to]. If so, try this:
MATCH (a)-[r:similar_to*]->(b)
WHERE NOT (a)<-[:similar_to]-()
RETURN a, collect(distinct b.id) as group
The "WHERE" clause restricts the node "a" to be the root node of each group.
The "RETURN" clause groups all nodes on the matched paths by the root node "a".
If you want to include each root "a" in the group, just change the path to,
(a)-[r:similar_to*0..]->(b)
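The grouping logic can be checked against the sample table with a short Python sketch. It assumes the four rows from the question as a plain edge list; groups_by_root is a hypothetical helper name, and the code models the idea (roots have no incoming similar_to edge, groups are everything reachable from a root) rather than the way Neo4j executes the query.

```python
# (a.id, b.id) rows from the question's sample data.
EDGES = [(1, 5), (1, 2), (2, 13), (3, 12)]


def groups_by_root(edges):
    """Map each root (no incoming edge) to the set of nodes reachable from it."""
    targets = {b for _, b in edges}
    roots = sorted({a for a, _ in edges} - targets)
    groups = {}
    for root in roots:
        reached, frontier = set(), [root]
        while frontier:
            node = frontier.pop()
            for a, b in edges:
                if a == node and b not in reached:
                    reached.add(b)
                    frontier.append(b)
        groups[root] = reached
    return groups


groups = groups_by_root(EDGES)
for root, members in groups.items():
    # Including the root itself corresponds to the *0.. variant of the pattern.
    print(sorted({root} | members))
```

Node 2 is not treated as a root because it has an incoming edge from node 1, so it ends up inside node 1's group instead of starting a group of its own.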