Create relationships in Neo4j - neo4j

I have a graph with about 800k nodes and I want to create random relationships among them, using Cypher.
Examples like the following didn't work because the cartesian product is too big:
match (u),(p)
with u,p
create (u)-[:LINKS]->(p);
For example I want 1 relationship for each node (800k), or 10 relationships for each node (8M).
In short, I need a Cypher query that creates relationships uniformly between nodes.
Does someone know the query to create relationships in this way?

So you want every node to have exactly x relationships? Try this in batches until no more relationships are updated:
MATCH (u),(p) WHERE size((u)-[:LINKS]->(p)) < {x}
WITH u,p LIMIT 10000 WHERE rand() < 0.2 // LIMIT to 10000 then sample
CREATE (u)-[:LINKS]->(p)

This should work (assuming your neo4j server has enough memory):
MATCH (n)
WITH COLLECT(n) AS ns, COUNT(n) AS len
FOREACH (i IN RANGE(1, {numLinks}) |
FOREACH (x IN ns |
FOREACH(y IN [ns[TOINT(RAND()*len)]] |
CREATE (x)-[:LINK]->(y) )));
This query collects all nodes, and uses nested loops to do the following {numLinks} times: create a LINK relationship between every node and a randomly chosen node.
The innermost FOREACH is used as a workaround for the current Cypher limitation that you cannot put an operation that returns a node inside a node pattern. To be specific, this is illegal: CREATE (x)-[:LINK]->(ns[TOINT(RAND()*len)]).
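On Neo4j versions that support UNWIND, the same idea can be sketched without the nested FOREACH workaround. This is only a sketch under assumptions: it uses the newer $numLinks parameter syntax and toInteger() instead of {numLinks} and TOINT(). Like the query above, it holds every node in memory and may create self-loops or duplicate links.

```cypher
// Assumption: $numLinks is a query parameter, e.g. 1 or 10.
MATCH (n)
WITH collect(n) AS ns, count(n) AS len
// One row per (node, repetition) pair.
UNWIND ns AS x
UNWIND range(1, $numLinks) AS i
// rand() is in [0, 1), so the index is always in 0..len-1.
WITH x, ns[toInteger(rand() * len)] AS y
CREATE (x)-[:LINK]->(y);
```

Because rand() is evaluated once per row, each of the $numLinks links for a node gets its own randomly chosen target.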

Related

Replacing relations from one node to another by one single relation

I have created the same relationship between two nodes several times in Neo4j.
It was a mistake as it makes the visualization less clear.
Now I would like to replace the duplicate relationships between two nodes with a single relationship. It would be great if the number of original relationships could be kept in a "count" property on the new relationship.
What would be an efficient way to solve this problem?
I have about 100,000 relationships and I am a bit worried about the time it would take.
Here is a quick example to make the problem clearer.
I have:
Node A -- R1 -- Node B
Node A -- R2 -- Node B
And I would like to have
Node A -- R {count : 2} -- Node B
Thanks!
I assume these relationships don't have any properties and that their direction doesn't matter.
You can combine these relationships with the following Cypher query:
MATCH (p:Node)-[r]-(c:Node)
WHERE ID(p) > ID(c)
DELETE r
WITH p, c, COUNT(r) as count
CREATE (p)-[:R{count:count}]->(c)
If you only want to merge relationships that have the same direction, you can use the following query:
MATCH (p:Node)-[r]->(c:Node)
DELETE r
WITH p, c, COUNT(r) as count
CREATE (p)-[newrel:R{count:count}]->(c)
If you want to merge the properties as well, you can use the APOC plugin's apoc.refactor.mergeRelationships procedure.
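If counting relationships after they have been deleted feels fragile, the directed merge can also be sketched so that the old relationships are collected and counted first, and only then removed. This is a variant under the same assumptions as above (label :Node, new type :R, no properties to preserve), not the answer's original query.

```cypher
// Group the existing relationships per ordered (p, c) pair.
MATCH (p:Node)-[r]->(c:Node)
WITH p, c, collect(r) AS rels
// Create the single replacement relationship with the count...
CREATE (p)-[:R {count: size(rels)}]->(c)
// ...then delete the originals that were collected before the CREATE.
FOREACH (old IN rels | DELETE old);
```

The newly created relationship is not part of rels, so it is not deleted by the FOREACH.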

Create association between nodes if one doesn't exist using cypher

Say there are two labels, P and M. M has nodes with names M1, M2, M3 ... M10. I need to associate 50 nodes of P with each node of M. Also, no node with label P should be associated with more than one node of M.
This is the Cypher query I came up with, but it doesn't seem to work.
MATCH (u:P), (r:M{Name:'M1'}),(s:M)
where not (s)-[:OWNS]->(u)
with u limit 50
CREATE (r)-[:OWNS]->(u);
I would then run it for each of the 10 nodes of M. Any help in correcting the query is appreciated.
You can use the apoc.periodic.* procedures for batching. More info is in the documentation.
call apoc.periodic.commit("
MATCH (u:P), (r:M{Name:'M1'}),(s:M) where not (s)-[:OWNS]->(u)
with u,r limit {limit}
CREATE (r)-[:OWNS]->(u)
RETURN count(*)
",{limit:10000})
If there will always be just one (r)-[:OWNS]->(u) relationship, I would change my first match to also exclude nodes that r already owns:
call apoc.periodic.commit("
MATCH (u:P), (r:M{Name:'M1'}),(s:M) where not (s)-[:OWNS]->(u) and not (r)-[:OWNS]->(u)
with u,r limit {limit}
CREATE (r)-[:OWNS]->(u)
RETURN count(*)
",{limit:10000})
That way the procedure cannot fall into an endless loop.
This query should be fast and easy to understand. It is fast because it avoids Cartesian products:
MATCH (u:P)
WHERE not (:M)-[:OWNS]->(u)
WITH u LIMIT 50
MATCH (r:M {Name:'M1'})
CREATE (r)-[:OWNS]->(u);
It first matches 50 unowned P nodes. It then finds the M node that is supposed to be the "owner", and creates an OWNS relationship between it and each of the 50 P nodes.
To make this query even faster, you can first create an index on :M(Name) so that the owning M node can be found quickly (without scanning all M nodes):
CREATE INDEX ON :M(Name);
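On newer Neo4j versions (4.x and later, as far as I know) the index statement is written differently. A sketch of the equivalent, where the index name m_name is a hypothetical choice and not something from the question:

```cypher
// Assumption: Neo4j 4.x or later; m_name is an illustrative index name.
CREATE INDEX m_name FOR (m:M) ON (m.Name);
```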
This worked for me.
MATCH (u:P), (r:M{Name:'M1'}),(s:M)
where not (s)-[:OWNS]->(u)
with u,r limit 50
CREATE (r)-[:OWNS]->(u);
Thanks to Thomas for mentioning the limit on u and r.
I think this is one way to connect all 10 :M nodes in one query:
MATCH (m:M)
WITH collect(m) as nodes
UNWIND nodes as node
MATCH (p:P) where not ()-[:OWNS]->(p)
WITH node,p limit 50
CREATE (node)-[:OWNS]->(p)
Although I am not sure we need to collect and unwind; it could be simplified to:
MATCH (m:M)
MATCH (p:P) where not ()-[:OWNS]->(p)
WITH m,p limit 50
CREATE (m)-[:OWNS]->(p)
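Note that a single LIMIT 50 caps the total number of rows rather than the number per M node. One possible way to give each M node its own 50 unowned P nodes in one query is to collect the unowned nodes once and hand out slices of 50. This is a sketch, assuming the unowned P nodes fit in memory; the names lo and hi are just illustrative variables.

```cypher
MATCH (m:M)
WITH collect(m) AS ms
// Collect every P node that nobody owns yet.
MATCH (p:P) WHERE NOT ()-[:OWNS]->(p)
WITH ms, collect(p) AS ps
// The i-th M node gets the slice ps[i*50 .. (i+1)*50].
UNWIND range(0, size(ms) - 1) AS i
WITH ms[i] AS m, ps, i * 50 AS lo, (i + 1) * 50 AS hi
UNWIND ps[lo..hi] AS p
CREATE (m)-[:OWNS]->(p);
```

Because the slices do not overlap, no P node ends up owned by two M nodes. If there are fewer unowned P nodes than needed, the later M nodes simply receive fewer (or none).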

Find all relations starting with a given node

In a graph where the following nodes
A,B,C,D
each have a relationship with their successor
(A->B)
and
(B->C)
etc.
How do I write a query that starts with A and gives me all nodes (and relationships) from there outwards?
I do not know the end node (C).
All I know is that I start from A and traverse the whole connected graph (with conditions on relationship and node type).
I think you need to use this pattern:
(n)-[*]->(m) - a variable-length path of any number of relationships from n to m (see the Refcard).
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Also have a look at the path functions in the Refcard to extend your Cypher query (I don't know exactly which conditions you'll need to apply).
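As an illustration of adding conditions on relationship and node type to a variable-length path, here is a minimal sketch. The relationship type NEXT, the label Item and the upper bound of 10 hops are hypothetical, since the question does not name them.

```cypher
// Assumptions: relationships are typed :NEXT and reachable nodes carry :Item.
MATCH path = (a:A)-[:NEXT*1..10]->(m)
// tail() skips the start node, so only the nodes after A are checked.
WHERE all(n IN tail(nodes(path)) WHERE n:Item)
RETURN path;
```

Bounding the path length (here 1..10) is a design choice that keeps the traversal from exploding on large graphs; drop the bound if the whole connected graph is really needed.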
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This returns every path that starts at the node with label A whose id property is "id", in either direction.
One caveat - if this graph is large the query will take a long time to run.

Finding the farthest node using Neo4j (node without any incoming relation)

I have created a graph db in Neo4j and want to use it for generalization purposes.
There are about 500,000 nodes (20 distinct labels) and 2.5 million relations (50 distinct types) between them.
In an example path : a -> b -> c-> d -> e
I want to find out the node without any incoming relations (which is 'a').
And I should do this for all the nodes (finding the nodes at the beginning of all possible paths that have no incoming relations).
I have tried several Cypher queries without any success:
match (a:type_A)-[r:is_a]->(b:type_A)
with a,count (r) as count
where count = 0
set a.isFirst = 'true'
or
match (a:type_A), (b:type_A)
where not (a)<-[:is_a*..]-(b)
set a.isFirst = 'true'
Where is the problem?!
Also, I have to create this code in neo4jClient, too.
Your first query only matches paths that contain an [r:is_a] relationship, so the count of r can never be 0. Your second query returns any arbitrary pair of nodes labeled :type_A that aren't transitively related by [:is_a]. What you want is to filter on a path predicate. For the general case, try:
MATCH (a)
WHERE NOT ()-->a
This translates roughly to "any node that does not have incoming relationships". You can refine the pattern with types, properties or labels as needed, for instance:
MATCH (a:type_A)
WHERE NOT ()-[:is_a]->a
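Put together with the SET from the question, a complete statement might look like the sketch below. It assumes a Neo4j version that requires parentheses around node variables, and it stores a boolean true rather than the string 'true' used in the question; the alias roots is only illustrative.

```cypher
// Mark every :type_A node that has no incoming :is_a relationship.
MATCH (a:type_A)
WHERE NOT ()-[:is_a]->(a)
SET a.isFirst = true
RETURN count(a) AS roots;
```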
If you want to find all nodes that have no incoming relationships, you can find them using OPTIONAL MATCH:
START n=node(*)
OPTIONAL MATCH n<-[r]-()
WITH n,r
WHERE r IS NULL
RETURN n

Cypher query to find all paths with same relationship type

I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of cypher by making a series of queries, the first to
retrieve the list of outgoing relationships from the start node, and then a single query (submitted together as a batch) for each relationship. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH p = n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient, is the following:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along each path, enforcing that every relationship in the path has the same type as the first one. This, however, is really just a filter operation on what is essentially a combinatorial explosion of all possible paths, which is much less efficient than traversing a relationship of a known, given type in the first place.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
Here's a minor optimization; at least the non-matching paths will fail fast:
START n=node(1)
MATCH n-[r]->()
WITH DISTINCT n, type(r) AS t // keep n in scope for the second MATCH
MATCH p = n-[r*]->()
WHERE type(r[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.
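For what it's worth, the two-step idea from the question (first find the distinct outgoing types, then follow each type) can be sketched as one statement by comparing every relationship in the path against the type found in the first step. This assumes a Cypher version with parenthesised node patterns and uses id(n) = 1 in place of START n=node(1). Whether the planner applies the predicate during expansion rather than afterwards depends on the Neo4j version, so it may or may not be faster in practice.

```cypher
// Step 1: the distinct relationship types leaving the start node.
MATCH (n)-[first]->()
WHERE id(n) = 1
WITH DISTINCT n, type(first) AS t
// Step 2: for each type t, keep only paths made entirely of t.
MATCH p = (n)-[rs*]->()
WHERE all(x IN rs WHERE type(x) = t)
RETURN t, p AS path;
```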
