I have a graph that is an acyclic tree of unbounded depth. I need to count the number of descendants of each node, including the node itself. The final result should look something like this:
9
|\
4 4
|\ \
2 1 3
| |\
1 1 1
So for each node, this number is the sum of the numbers of its children, plus 1.
How can it be done in one query?
I came up with something like this:
MATCH (n)
SET n.count = SIZE((n)<-[:PARENT*0..]-());
But that means a subquery for each node. With over 1,300,000 nodes, it takes ages.
A better way would be to set 1 on each leaf and ascend to the root, calculating each node on the way. Is it possible to do that in one query?
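To make the leaf-to-root idea concrete, here is a minimal in-memory sketch in Python rather than Cypher. It assumes the tree fits in memory as a child-to-parent mapping with a single root; the node names and the subtree_sizes helper are hypothetical and only mirror the example tree drawn above.

```python
from collections import defaultdict

def subtree_sizes(parent_of):
    """Return {node: number of descendants, including the node itself}.

    parent_of maps each node to its parent; the single root maps to None.
    """
    children = defaultdict(list)
    root = None
    for node, parent in parent_of.items():
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    # Depth-first walk: every parent is recorded before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    # Walk that order backwards so children are finished before their parent.
    sizes = {}
    for node in reversed(order):
        sizes[node] = 1 + sum(sizes[child] for child in children[node])
    return sizes

# The example tree from the question, with hypothetical node names.
parent_of = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B",
             "F": "C", "G": "D", "H": "F", "I": "F"}
sizes = subtree_sizes(parent_of)
print(sizes["A"], sizes["B"], sizes["C"])  # 9 4 4
```

Each node is visited a constant number of times, which is the property the per-node subquery lacks.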
I'd go for
MATCH (start)<-[:PARENT*0..]-(n)
RETURN id(start), count(n) as numberOfChildren
which counts how many nodes are reachable from start along PARENT paths, including start itself (because of the *0.. lower bound). But I don't know how it performs on really large graphs (my test graph has only a few hundred nodes).
You could already optimize your query by limiting the number of paths you are processing, for example like this:
MATCH (n)
WHERE EXISTS((n)<-[:PARENT]-())
MATCH path=(n)<-[:PARENT*0..]-(m)
WHERE NOT EXISTS((m)<-[:PARENT]-())
UNWIND nodes(path) AS node
WITH n, COUNT(DISTINCT node) AS count
SET n.count = count
I have the following query
MATCH (n:Mob)
WITH COUNT(n) as total, COLLECT(n) as nodes
UNWIND nodes as node
WITH total, node
WHERE 8000 < node.order < 8100
RETURN node, total
SKIP 10
LIMIT 1
Right now, this query is giving me this error.
If I remove the SKIP part it works.
So my overall question is, how do I SKIP some of the records?
This was mainly a misunderstanding on my part. If you want to filter before bunching them together, then perform the COLLECT at a later stage.
Working code:
MATCH (n:Mob)
// Count first: aggregating next to n would group by n and give total = 1 per row
WITH count(n) AS total
MATCH (node:Mob)
WHERE node.order > 1000
WITH total, node
SKIP 10
LIMIT 5
WITH collect(node) AS nodes, total
RETURN nodes, total
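The order of operations here (count everything, then filter, then skip and limit, then collect) can be sketched outside Cypher. The following Python is only an illustration under assumptions: the Mob dataclass and the page_with_total helper are hypothetical stand-ins for the nodes and the query.

```python
from dataclasses import dataclass

@dataclass
class Mob:
    order: int

def page_with_total(mobs, min_order, skip, limit):
    total = len(mobs)                                    # total over all Mob nodes
    matching = [m for m in mobs if m.order > min_order]  # WHERE node.order > ...
    page = matching[skip:skip + limit]                   # SKIP, then LIMIT
    return page, total                                   # collect at the very end

mobs = [Mob(order=i) for i in range(990, 1030)]
page, total = page_with_total(mobs, min_order=1000, skip=10, limit=5)
print(total, [m.order for m in page])  # 40 [1011, 1012, 1013, 1014, 1015]
```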
Background
I want to create a histogram of the relationships starting from a set of nodes.
Input is a set of node ids, for example set = [ id_0, id_1, id_2, id_3, ... id_n ].
The output is the relationship-type histogram for each node (e.g. Map<Long, Map<String, Long>>):
id_0:
- ACTED_IN: 14
- DIRECTED: 1
id_1:
- DIRECTED: 12
- WROTE: 5
- ACTED_IN: 2
id_2:
...
The current cypher query I've written is:
MATCH (n)-[r]-()
WHERE id(n) IN [ id_0, id_1, id_2, id_3, ... id_n ] // set
RETURN id(n) as id, type(r) as type, count(r) as count
It returns a count for each [ id, type ] pair, like:
id | rel type | count
id0 | ACTED_IN | 14
id0 | DIRECTED | 1
id1 | DIRECTED | 12
id1 | WROTE | 5
id1 | ACTED_IN | 2
...
The result is collected using Java and merged into the first structure (e.g. Map<Long, Map<String, Long>>).
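For illustration, the merge step can be sketched as follows. This is Python rather than the Java the question mentions, and the merge_rows helper and the integer ids are hypothetical; the rows mirror the sample output above.

```python
from collections import defaultdict

def merge_rows(rows):
    """Fold (id, rel_type, count) rows into {id: {rel_type: count}}."""
    histogram = defaultdict(dict)
    for node_id, rel_type, count in rows:
        histogram[node_id][rel_type] = count
    return dict(histogram)

rows = [
    (0, "ACTED_IN", 14),
    (0, "DIRECTED", 1),
    (1, "DIRECTED", 12),
    (1, "WROTE", 5),
    (1, "ACTED_IN", 2),
]
histogram = merge_rows(rows)
print(histogram[0])  # {'ACTED_IN': 14, 'DIRECTED': 1}
```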
Problem
Getting the relationship histogram is fast on smaller graphs but can be very slow on bigger datasets. For example, when I create the histogram for a set of about 100 ids/nodes, each of which has around 1000 relationships, the Cypher query takes about 5 minutes to execute.
Is there a more efficient way to collect the histogram for a set of nodes?
Could this query be parallelized? (With Java code or using UNION?)
Is something wrong with how I set up my Neo4j database? Should these queries be this slow?
There is no need for parallel queries, just the need to understand Cypher efficiency and how to use statistics.
A bit of background:
Using count executes an expandAll, which is as expensive as the number of relationships the node has:
PROFILE
MATCH (n) WHERE id(n) = 21
MATCH (n)-[r]-(x)
RETURN n, type(r), count(*)
Using size with a relationship type internally uses getDegree, a statistic the node stores locally, and is therefore very efficient:
PROFILE
MATCH (n) WHERE id(n) = 0
RETURN n, size((n)-[:SEARCH_RESULT]-())
Moral of the story: to use size, you need to know the relationship types a labeled node can have. So you need to know the schema of the database (in general you will want that anyway; it makes things easily predictable, and building efficient queries dynamically becomes a joy).
But let's assume you don't know the schema. You can use APOC Cypher procedures, which allow you to build dynamic queries.
The flow is:
Get all the relationship types from the database (fast)
Get the nodes from the id list (fast)
Build dynamic queries using size (fast)
CALL db.relationshipTypes() YIELD relationshipType
WITH collect(relationshipType) AS types
MATCH (n) WHERE id(n) IN [21, 0]
UNWIND types AS type
CALL apoc.cypher.run("RETURN size((n)-[:`" + type + "`]-()) AS count", {n: n})
YIELD value
RETURN id(n), type, value.count
I have a path that contains nodes with several labels, such as Shipped, Received, and ReadyToShip. I want to know whether a certain path has multiple occurrences of a node label. They may not be in order.
(:Shipped)-[:NEXT]->()-[:NEXT]->()-[:NEXT]->(:ReadyToShip)-[:NEXT]->()-[:NEXT]->(:ReadyToShip)-[:NEXT]->(:Received)
I have many paths, but I want to find all the paths that have 2 or more occurrences of the ReadyToShip node label, like the one above. How can I do this? I can extract all the possible paths between 2 types of nodes using this:
match path=(s:Shipped)-[:NEXT*]->(m:Received) return distinct extract(p in nodes(path) | labels(p))
But then I have to extract the labels and filter the paths myself. How can I do this in Cypher?
[UPDATED]
This query should return every path that has at least 2 ReadyToShip nodes, and the number of ReadyToShip nodes in that path:
MATCH p=(s:Shipped)-[:NEXT*]->(:ReadyToShip)-[:NEXT*]->(:ReadyToShip)-[:NEXT*]->(m:Received)
RETURN
p,
REDUCE(s = 0, n IN NODES(p) | CASE WHEN 'ReadyToShip' IN LABELS(n) THEN s + 1 ELSE s END) AS num;
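The counting done by REDUCE can be sketched in plain Python. This is only an in-memory illustration under assumptions: each path is modelled as a list of label sets (one set per node), and the helper names are hypothetical.

```python
def count_label(path_labels, label):
    """Count the nodes on a path that carry the given label."""
    return sum(1 for labels in path_labels if label in labels)

def paths_with_repeats(paths, label, minimum=2):
    """Return (path, count) pairs for paths with at least `minimum` hits."""
    result = []
    for path in paths:
        hits = count_label(path, label)
        if hits >= minimum:
            result.append((path, hits))
    return result

# First path mirrors the example in the question; the second has one hit only.
paths = [
    [{"Shipped"}, set(), set(), {"ReadyToShip"}, set(), {"ReadyToShip"}, {"Received"}],
    [{"Shipped"}, {"ReadyToShip"}, {"Received"}],
]
repeats = paths_with_repeats(paths, "ReadyToShip")
print(len(repeats), repeats[0][1])  # 1 2
```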
I am working on a project that uses Node.js, Cypher, and Neo4j. The project's front end occasionally needs to QUICKLY pull a random user. I have seen this query on the internet:
MATCH (n:User) WHERE rand() < 0.1 RETURN n LIMIT 21
but I have no idea what this does. It seems pretty fast, but I would like to understand it. A breakdown of what I know:
MATCH | Match some nodes
(n:User) | Let's call this node n, and it has to be of type User
WHERE | Specify conditions for node match
rand() | Return a random number from 0 to 0.9999...
< | Less than
0.1 | ??
RETURN | Give back the matched node(s)
n | Our node(s)
LIMIT 21 | Don't return more than 21 nodes
What do rand() and 0.1 do? Do they somehow limit the potential nodes to return?
If this helps, I have around 10,000 nodes.
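As your question already states, a WHERE clause specifies the conditions for a MATCH to succeed. rand() is evaluated separately for each candidate node, so WHERE rand() < 0.1 means each User node has a 10% probability of being matched. LIMIT 21 then returns at most 21 of the nodes that passed.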
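A small simulation may make the filter easier to picture. This is a toy model in Python, not how Neo4j executes the query: the sample_users helper and the integer "users" are hypothetical, and the scan order of a real database is not something this sketch can predict.

```python
import random

def sample_users(users, probability=0.1, limit=21, rng=None):
    """Keep each user with the given probability; stop once `limit` are kept."""
    rng = rng or random.Random()
    picked = []
    for user in users:                   # one pass, like scanning the User nodes
        if rng.random() < probability:   # WHERE rand() < 0.1
            picked.append(user)
            if len(picked) == limit:     # LIMIT 21 ends the scan early
                break
    return picked

users = list(range(10_000))
picked = sample_users(users, rng=random.Random(42))
print(len(picked))
```

In this model the scan stops as soon as 21 users have passed the filter, so the picks come from whichever users are scanned first. That is one reason the query is fast, and also a hint that the result is not a uniform sample over all users.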
I have a graph with about 800k nodes and I want to create random relationships among them, using Cypher.
Examples like the following didn't work because the Cartesian product is too big:
match (u),(p)
with u,p
create (u)-[:LINKS]->(p);
For example, I want 1 relationship for each node (800k in total), or 10 relationships for each node (8M in total).
In short, I need a Cypher query that creates relationships between nodes UNIFORMLY.
Does someone know a query that creates relationships in this way?
So you want every node to have exactly x relationships? Try this in batches until no more relationships are updated:
MATCH (u),(p) WHERE size((u)-[:LINKS]->(p)) < {x}
WITH u,p LIMIT 10000 WHERE rand() < 0.2 // LIMIT to 10000 then sample
CREATE (u)-[:LINKS]->(p)
This should work (assuming your Neo4j server has enough memory):
MATCH (n)
WITH COLLECT(n) AS ns, COUNT(n) AS len
FOREACH (i IN RANGE(1, {numLinks}) |
FOREACH (x IN ns |
FOREACH(y IN [ns[TOINT(RAND()*len)]] |
CREATE (x)-[:LINK]->(y) )));
This query collects all nodes, and uses nested loops to do the following {numLinks} times: create a LINK relationship between every node and a randomly chosen node.
The innermost FOREACH is used as a workaround for the current Cypher limitation that you cannot put an operation that returns a node inside a node pattern. To be specific, this is illegal: CREATE (x)-[:LINK]->(ns[TOINT(RAND()*len)]).
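The nested loops can be mirrored in a few lines of Python to show what the query produces. This is an in-memory sketch under assumptions: the random_links helper is hypothetical, nodes are plain integers, and links are returned as (source, target) pairs instead of being written to a database.

```python
import random

def random_links(nodes, num_links, rng=None):
    """Give every node `num_links` outgoing links to randomly chosen nodes."""
    rng = rng or random.Random()
    length = len(nodes)
    links = []
    for _ in range(num_links):                     # FOREACH (i IN RANGE(1, numLinks))
        for x in nodes:                            # FOREACH (x IN ns)
            y = nodes[int(rng.random() * length)]  # ns[TOINT(RAND()*len)]
            links.append((x, y))
    return links

nodes = list(range(100))
links = random_links(nodes, num_links=3, rng=random.Random(0))
print(len(links))  # 300
```

Every node gets exactly num_links outgoing links, but targets are drawn independently, so self-links and duplicate links are possible and incoming counts vary from node to node.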