Neo4j Return Top Level Nodes in Tree - neo4j

I have the following data in Neo4j:
CREATE (t1:T {start:1, end:8})
CREATE (t2:T {start:1, end:4})
CREATE (t3:T {start:1, end:2})
CREATE (t4:T {start:3, end:4})
CREATE (t5:T {start:5, end:6})
CREATE (t6:T {start:7, end:8})
CREATE (t2)-[r1:T_OF]->(t1)
CREATE (t3)-[r2:T_OF]->(t2)
CREATE (t4)-[r3:T_OF]->(t2)
CREATE (t5)-[r4:T_OF]->(t1)
CREATE (t6)-[r5:T_OF]->(t1)
This creates a tree with start and end values, which in my actual application are epoch dates. I want to be able to find the nodes that don't have shorter/smaller nodes attached to them in a given range.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
(MAGIC)
RETURN t
My goal is for this to only return t2 and t5, even though t3 and t4 fall in the range. Since they have a T_OF relationship to t2, they should be ignored.
I've tried a few different ways, but unfortunately I can't figure this one out.
Please let me know if I should explain better.

Does this work for you?
It collects all the T nodes with the right date range, and then filters out all the nodes that have a T_OF relationship to any node in the collection.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
WITH COLLECT(t) AS ct
RETURN FILTER(x IN ct
WHERE ALL (y IN ct
WHERE NOT ((x)-[:T_OF]->(y))))
AS result;

You can use paths as expressions and also negate them.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
AND NOT (t)<-[:T_OF]-()
RETURN t

Related

Create relationships in Neo4j

I have a graph with about 800k nodes and I want to create random relationships among them, using Cypher.
Examples like the following didn't work because the cartesian product is too big:
match (u),(p)
with u,p
create (u)-[:LINKS]->(p);
For example I want 1 relationship for each node (800k), or 10 relationships for each node (8M).
In short, I need a query Cypher in order to UNIFORMLY create relationships between nodes.
Does someone know the query to create relationships in this way?
So you want every node to have exactly x relationships? Try this in batches until no more relationships are updated:
MATCH (u),(p) WHERE size((u)-[:LINKS]->(p)) < {x}
WITH u,p LIMIT 10000 WHERE rand() < 0.2 // LIMIT to 10000 then sample
CREATE (u)-[:LINKS]->(p)
This should work (assuming your neo4j server has enough memory):
MATCH (n)
WITH COLLECT(n) AS ns, COUNT(n) AS len
FOREACH (i IN RANGE(1, {numLinks}) |
FOREACH (x IN ns |
FOREACH(y IN [ns[TOINT(RAND()*len)]] |
CREATE (x)-[:LINK]->(y) )));
This query collects all nodes, and uses nested loops to do the following {numLinks} times: create a LINK relationship between every node and a randomly chosen node.
The innermost FOREACH is used as a workaround for the current Cypher limitation that you cannot put an operation that returns a node inside a node pattern. To be specific, this is illegal: CREATE (x)-[:LINK]->(ns[TOINT(RAND()*len)]).

Traversing through all nodes and comparing each one with every other one

I am working on a little project and I have a dataset of about 60k nodes and 500k relationships between those nodes. The nodes are of two types. First type are are recipes and the second type are ingredients. Recipes are composed of ingredients like:
(ingredient)-[:IS_PART_OF]->(recipe)
My objective is to find how many common ingredients two recipes share. I have managed to obtain this information with the following query that compares one recipe to all others (the first one with all others):
MATCH (recipe:RECIPE{ ID: 1000000 }),(other)
WHERE (other.ID >= 1000001 AND other.ID <= 1057690)
OPTIONAL MATCH (recipe:RECIPE)<-[:IS_PART_OF]-(ingredient:INGREDIENT)- [:IS_PART_OF]->(other)
WITH ingredient, other
RETURN other.ID, count(distinct ingredient.name)
ORDER BY other.ID DESC
My first question: How can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)
My second question: is it possible to write a loop that would iterate through all the recipes and check for common ingredients? The objective is to compare each recipe with all others. I think this should return (n-1)*(n/2) rows.
I have tried the above and the problem remains. Even with LIMIT and SKIP I can not run the code on the whole set. I have changed my query so it allows me to partition my set accordingly:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient.name) AS MutualIngredients, recipe2.ID
ORDER BY recipe1.ID
Until I get my hands on a better machine this will suffice.
I still haven't solved my first question: how can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)
You'll need to play with this, but it's going to be something similar to this:
MATCH (recipe1:RECIPE)<-[:IS_PART_OF]-(ingred:INGREDIENT)-[:IS_PART_OF]->(recipe2:RECIPE)
WHERE ID(recipe1) < ID(recipe2)
RETURN recipe1, collect(ingred.name), recipe2
ORDER BY recipe1.ID
The match pattern gets you all of the common ingredients between two recipes. The WHERE clause ensures that you're not comparing a recipe to itself (because it would share all ingredients with itself). The return clause just gives you the two recipes you're comparing, and what they have in common.
This will be O(n^2) though, and will be very slow.
UPDATE took Nicole's suggestion, which is a good one. That should guarantee each pair is only considered once.
SOLVED: Just to share it if someone else will need it:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient1:INGREDIENT)
MATCH (recipe2)<-[:IS_PART_OF]-(ingredient2:INGREDIENT)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient1.name) + count(distinct ingredient2.name) - count(distinct ingredient.name) AS RecipesUnion, recipe2.ID
ORDER BY recipe1.ID

Create relationship between nodes having same property value in common, using one Cypher query

Beginning with Neo4j 1.9.2, and using Cypher query language, I would like to create relationships between nodes having a specific property value in common.
I have set of nodes G having a property H, without any relationship currently existing between G nodes.
In a Cypher statement, is it possible to group G nodes by H property value and create a relationship HR between each nodes becoming to same group? Knowing that each group have a size between 2 & 10 and I'm having more than 15k of such groups (15k different H values) for about 50k G nodes.
I've tried hard to manage such query without finding a correct syntax. Below is a small sample dataset:
create
(G1 {name:'G1', H:'1'}),
(G2 {name:'G2', H:'1'}),
(G3 {name:'G3', H:'1'}),
(G4 {name:'G4', H:'2'}),
(G5 {name:'G5', H:'2'}),
(G6 {name:'G6', H:'2'}),
(G7 {name:'G7', H:'2'})
return * ;
At the end, I'd like such relationships:
G1-[:HR]-G2-[:HR]-G3-[:HR]-G1
And:
G4-[:HR]-G5-[:HR]-G6-[:HR]-G7-[:HR]-G4
In another case, I may want to update massively the relationships between nodes using/comparing some of their properties. Imagine nodes of type N and nodes of type M, with N nodes related to M with a relationship named :IS_LOCATED_ON. The order of the location can be stored as a property of N nodes (N.relativePosition being Long from 1 to MAX_POSITION), but we may need later to update the graph model such a way: make N nodes linked between themselves by a new :PRECEDES relationship, so that we can find easier and faster next node N on the given set.
I'd expect such language may allow to update massive set of nodes/relationships manipulating their properties.
Is it not possible?
If not, is it planned or may be it planned?
Any help would be greatly appreciated.
Since there's nothing in the data you supplied to get rank, I've played with collections
to get one as follows:
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, LENGTH(FILTER(x IN others : x.name < n.name)) as rank
RETURN n.name, n.H, rank ORDER BY n.H, n.name;
Building off of that you can then start determining relationships
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, LENGTH(FILTER(x IN others : x.name < n.name)) as rank
WITH n, others, rank, COALESCE(
HEAD(FILTER(x IN others : x.name > n.name)),
HEAD(others)
) as next
RETURN n.name, n.H, rank, next ORDER BY n.H, n.name;
Finally ( and slightly more condensed )
START
n=node(*), n2=node(*)
WHERE
HAS(n.H) AND HAS(n2.H) AND n.H = n2.H
WITH n, n2 ORDER BY n2.name
WITH n, COLLECT(n2) as others
WITH n, others, COALESCE(
HEAD(FILTER(x IN others : x.name > n.name)),
HEAD(others)
) as next
CREATE n-[:HR]->next
RETURN n, next;
You can just do it like that, maybe indicate direction in your relationships:
CREATE
(G1 { name:'G1', H:'1' }),
(G2 { name:'G2', H:'1' }),
(G3 { name:'G3', H:'1' }),
(G4 { name:'G4', H:'2' }),
(G5 { name:'G5', H:'2' }),
(G6 { name:'G6', H:'2' }),
(G7 { name:'G7', H:'2' }),
G1-[:HR]->G2-[:HR]->G3-[:HR]->G1,
G4-[:HR]->G5-[:HR]->G6-[:HR]->G7-[:HR]->G1
See http://console.neo4j.org/?id=ujns0x for an example.

neo4j node aggregate filter

I've got a sub graph, of node type c which has a relationship to t1 and t2. Node t1 has a relationship with w1 and w2. Node t2 has a relationship with w1.
What I want to query with cypher is from node c return the w nodes that have 2 or more t nodes related. ie w1 only.
Apparently you can't aggregate in the WHERE clause like
START c=node(7)
MATCH (c)-[:T_TO]-(t)-[:W_TO]-(w)
WHERE COUNT(t) >= 2
RETURN w.WName;
Maybe looking at it another way, this doesn't work either as I only want the w's that only relate to t1 and t2...?
START c=node(7), t1=node(10), t2=node(8)
MATCH (c)-[:T_TO]-(t)-[:W_TO]-(w)
WHERE t in [t1, t2]
RETURN t, w.WName;
Update
Anyone wanting something like the second one, this works:
START c=node(7), t1=node(8), t2=node(10)
MATCH (c)-[:T_TO]-(t1)-[:W_TO]-(w),(c)-[:T_TO]-(t2)-[:W_TO]-(w)
RETURN w.WName;
How about
START c=node(7)
MATCH (c)-[:T_TO]-(t)-[:W_TO]-(w)
WITH COUNT(t) as tCount,w
WHERE tCount >= 2
RETURN w.WName;

Limiting a Neo4j cypher query results by sum of relationship property

Is there a way to limit a cypher query by the sum of a relationship property?
I'm trying to create a cypher query that returns nodes that are within a distance of 100 of the start node. All the relationships have a distance set, the sum of all the distances in a path is the total distance from the start node.
If the WHERE clause could handle aggregate functions what I'm looking for might look like this
START n=node(1)
MATCH path = n-[rel:street*]-x
WHERE SUM( rel.distance ) < 100
RETURN x
Is there a way that I can sum the distances of the relationships in the path for the where clause?
Sure, what you want to do is like a having in a SQL query.
In cypher you can chain query segments and use the results of previous parts in the next part by using WITH, see the manual.
For your example one would assume:
START n=node(1)
MATCH n-[rel:street*]-x
WITH SUM(rel.distance) as distance
WHERE distance < 100
RETURN x
Unfortunately sum doesn't work with collections yet
So I tried to do it differently (for variable length paths):
START n=node(1)
MATCH n-[rel:street*]-x
WITH collect(rel.distance) as distances
WITH head(distances) + head(tail(distances)) + head(tail(tail(distances))) as distance
WHERE distance < 100
RETURN x
Unfortunately head of an empty list doesn't return null which could be coalesced to 0 but just fails. So this approach would only work for fixed length paths, don't know if that's working for you.
I've come across the same problem recently. In more recent versions of neo4j this was solved by the extract and reduce clauses. You could write:
START n=node(1)
MATCH path = (n)-[rel:street*..100]-(x)
WITH extract(x in rel | x.distance) as distances, x
WITH reduce(res = 0, x in rs | res + x) as distance, x
WHERE distance <100
RETURN x
i dont know about a limitation in the WHERE clause, but you can simply specify it in the MATCH clause:
START n=node(1)
MATCH path = n-[rel:street*..100]-x
RETURN x
see http://docs.neo4j.org/chunked/milestone/query-match.html#match-variable-length-relationships

Resources