Visualize connected components in Neo4j

I can find the largest connected component in the graph using the query below:
CALL algo.unionFind.stream('', ':pnHours', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
MATCH (node) where id(node) = nodeId
WITH setId, collect(node) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 1 // keep only the largest component
RETURN nodes;
But this does not help me visualize the largest connected component (subgraph), because the output consists of disjoint nodes. Is it possible to visualize the connected component? If so, how?
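To see what the query above computes, here is a minimal in-memory sketch in Python (not Neo4j). It assumes the graph is given as a list of node IDs plus (start, end) pairs for the pnHours relationships; `union_find_components` and `largest_component` are hypothetical helper names. Like algo.unionFind it treats relationships as undirected (weakly connected components), and like `ORDER BY size(nodes) DESC LIMIT 1` it keeps a single group, breaking ties arbitrarily. It yields node IDs only, with no relationships, which is consistent with the disjoint nodes described above.

```python
from collections import defaultdict


def union_find_components(node_ids, edges):
    """Map each node id to a set id; edges are treated as undirected."""
    parent = {n: n for n in node_ids}

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets
    return {n: find(n) for n in node_ids}


def largest_component(node_ids, edges):
    # WITH setId, collect(node) AS nodes: group node ids by their set id.
    groups = defaultdict(list)
    for node_id, set_id in union_find_components(node_ids, edges).items():
        groups[set_id].append(node_id)
    # ORDER BY size(nodes) DESC LIMIT 1: keep the biggest group.
    return max(groups.values(), key=len)
```

For example, with nodes 1 to 6 and relationships (1, 2), (2, 3), (4, 5), `largest_component` returns the IDs 1, 2 and 3.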

I tried this query but I am getting a different result.
I haven't used these algorithms and I don't know much about them, but I think you added an extra character (a colon) in the query.
Can you try pnHours instead of :pnHours?
When I remove the colon (:) from the query, I get the proper result (I also get the relationships, because the Neo4j browser fetches them even though they are not specified in the query).
If you still don't get them, try the following query:
CALL algo.unionFind.stream('', 'pnHours', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
MATCH (node) where id(node) = nodeId
WITH setId, collect(node) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 1 // keep only the largest component
WITH nodes
UNWIND nodes AS node
MATCH (node)-[r:pnHours]-()
RETURN node,r;

If you want to visualize it in the Neo4j browser, use:
CALL algo.unionFind.stream('', ':pnHours', {})
YIELD nodeId,setId
// group by setId, storing all paths that start in the same set in a list
MATCH p=(node)-->() where id(node) = nodeId
WITH setId, collect(p) as paths
// order by the number of paths (outgoing relationships) descending
ORDER BY size(paths) DESC
LIMIT 1 // keep only the set with the most paths
// You may need to UNWIND paths to visualize them in the Neo4j browser
RETURN paths;
It is not the most optimized query but should do just fine on small datasets.

The following query should return all the single-step paths in the largest pnHours-connected component (i.e., the one having the most nodes). It only gets the paths for the largest component.
CALL algo.unionFind.stream(null, 'pnHours', {}) YIELD nodeId, setId
WITH setId, COLLECT(nodeId) as nodeIds
ORDER BY SIZE(nodeIds) DESC
LIMIT 1
UNWIND nodeIds AS nodeId
MATCH path = (n)-[:pnHours]->()
WHERE ID(n) = nodeId
RETURN path
The Neo4j browser's graph visualization of the results will show all the nodes in the component and their relationships.
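The relationship step of this answer can be sketched in Python as well, again assuming relationships are (start, end) pairs and using a hypothetical helper name. Matching only outgoing relationships (`->`) from each component node lists every relationship exactly once. Because the component is connected, every node of a component with more than one node appears at one end of at least one pair, including nodes that only have incoming relationships.

```python
def component_relationships(component, edges):
    """Return the (start, end) pairs whose start node is in the component.

    Mirrors UNWIND nodeIds AS nodeId MATCH path = (n)-[:pnHours]->() WHERE ID(n) = nodeId:
    each relationship is matched only from its start node, so it appears once.
    """
    members = set(component)
    return [(start, end) for start, end in edges if start in members]


# The nodes to draw are every endpoint of the returned pairs.
pairs = component_relationships([1, 2, 3], [(1, 2), (3, 2), (4, 5)])
drawn_nodes = {node for pair in pairs for node in pair}
```

Here node 2 has only incoming relationships but is still drawn, because it is the end node of both pairs.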

Related

Find closest node that matches criteria but cannot have intermediary nodes that have certain properties

I have the following criteria:
From a starting node with an internal ID of X, I want to grab the closest WorkOrderNode that has an action_code of "INS".
However, between those two nodes, there can be an arbitrary number of nodes, and I need to make sure these intermediary nodes are not WorkOrderNodes that have action_codes other than "MV" or "SPT".
Here is my latest attempt:
MATCH p=(a)<-[*]-(b:WorkOrderNode {action_code: 'INS'})
WHERE ID(a)=105
RETURN b, size(relationships(p)) as distance
ORDER BY distance
LIMIT 1
This fulfills the first criterion, but I'm having problems implementing the second. I tried using AND NOT EXISTS((b)-[*]->(c:WorkOrderNode) WHERE c.action_code NOT IN ['MV', 'SPT']) in the parent WHERE clause, but Neo4j throws an error because I can't have a WHERE clause inside an EXISTS clause.
Might not be the most optimized, but I would try the following:
MATCH p=(a)<-[*]-(b:WorkOrderNode {action_code: 'INS'})
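WHERE ID(a)=105 AND
// skip the start and end nodes; reject paths through any other WorkOrderNode whose action_code is not MV or SPT
NONE(node IN nodes(p)[1..-1] WHERE node:WorkOrderNode AND NOT node.action_code IN ['MV', 'SPT'])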
RETURN b, length(p) as distance
ORDER BY distance
LIMIT 1
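The intended rule (the closest INS node, reached only through intermediate nodes that are either not WorkOrderNodes or are WorkOrderNodes with action_code MV or SPT) can be checked against a small in-memory breadth-first search in Python. This is a sketch of the rule, not of how Neo4j evaluates the query; the dictionaries `incoming`, `labels` and `action_codes` are an assumed graph representation and `closest_ins` is a hypothetical name. Because breadth-first search discovers nodes in order of hop count, the first INS node it reaches is at the minimum distance, which is what `ORDER BY distance LIMIT 1` selects.

```python
from collections import deque


def closest_ins(start, incoming, labels, action_codes, allowed=("MV", "SPT")):
    """Breadth-first search for the nearest WorkOrderNode with action_code 'INS'.

    incoming[n] lists the nodes b that have a relationship b -> n, so the search
    follows the same direction as (a)<-[*]-(b). Intermediate WorkOrderNodes must
    have an action_code in `allowed`; other nodes may always be passed through.
    Returns (node, distance) or None if no such node is reachable.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in incoming.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            is_work_order = "WorkOrderNode" in labels.get(nxt, set())
            code = action_codes.get(nxt)
            if is_work_order and code == "INS":
                return nxt, dist + 1  # first hit is the closest one
            if is_work_order and code not in allowed:
                continue  # not allowed as an intermediate node
            queue.append((nxt, dist + 1))
    return None
```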

How to connect result nodes in Neo4j

The Neo4j browser has a checkbox option, 'Connect result nodes', which I gather runs a second query to connect the nodes after your initial query.
For example:
MATCH (n:User)
where n.Verified = 'false'
return n
order by n.followers DESC
Limit 40
This query returns 40 nodes that are connected to each other. While this works in the Neo4j browser, I can't quite get the nodes to connect in Neo4j Bloom. So the question is: what is the second query that runs under the hood to connect the result nodes?
For anyone who runs into the same problem: the answer is a subquery that checks whether node IDs are in the original set. In the outer query, collect the nodes and also collect a list of their IDs using the built-in ID() function. In the subquery, unwind the nodes and, in its WHERE clause, filter the neighbouring nodes using that list of IDs.
Match (b:User)
where b.Verified = 'false' and b.followers > 60
with collect(b) as users, collect(ID(b)) as listUsers
CALL{
with users,listUsers
unwind users as x
match(x)-[r]-(c:User)
where ID(c) in listUsers
return x,r,c
}
return x,r,c
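The filtering idea (keep a relationship only if both of its endpoints are in the original result set) is small enough to sketch in Python. Relationships are assumed to be (rel_id, start_id, end_id) tuples, and `connect_result_nodes` is a hypothetical name. Unlike the Cypher above, whose undirected pattern (x)-[r]-(c:User) returns each connecting relationship once from each endpoint, this version lists each relationship once.

```python
def connect_result_nodes(result_ids, relationships):
    """Keep the relationships whose start and end nodes are both in the result set.

    relationships: iterable of (rel_id, start_id, end_id) tuples.
    """
    ids = set(result_ids)  # plays the role of listUsers
    return [rel for rel in relationships if rel[1] in ids and rel[2] in ids]
```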

Efficiently assigning UUIDs to connected components in Neo4j

I have partitioned my graph into ~400,000 connected components using the algo.unionFind function from the Neo4j Graph Algorithms library.
Each node n within the same connected component has the same n.partition value. However, I now want to assign each connected component a UUID, so that each node n in a connected component has n.uuid populated with the component's UUID. What is the most efficient way of doing this?
Currently I am getting a list of all n.partition values and then going through each partition and running a Cypher query to update all nodes of that partition to have a generated UUID. I'm using the Python wrapper py2neo and this process is quite slow.
Edit:
The Cypher queries I am currently using are:
MATCH (n)
RETURN DISTINCT n.partition AS partition
to get a list of partition IDs, and then I iteratively call:
MATCH (n)
WHERE n.partition = <PARTITION_ID>
SET n.uuid = <GENERATED_UUID>
for each partition ID.
Edit 2:
I am able to get through ~180k/400k of the connected components using the following query:
CALL apoc.periodic.iterate(
"MATCH (n)
WITH n.partition as partition, COLLECT(n) as nodes
RETURN partition, nodes, apoc.create.uuid() as uuid",
"FOREACH (n in nodes | SET n.uuid = uuid)",
{batchSize:1000, parallel:true}
)
before getting a heap error: "neo4j.exceptions.ClientError: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.OutOfMemoryError: Java heap space"
The best way would be to install the APOC plugin in Neo4j so that you can use the UUID function apoc.create.uuid() in Cypher, so that the UUID can be generated and assigned in the same transaction.
To create one UUID per partition, you need to use WITH to store the UUID in a temporary variable. The function runs once per row, so call it only after you have reduced the rows to one per partition:
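// Note: USING PERIODIC COMMIT is only accepted together with LOAD CSV, so it cannot be
// used here; to commit in batches, wrap the two halves in apoc.periodic.iterate()
MATCH (n)
WITH DISTINCT n.partition as p // a null partition becomes one row, but the WHERE below never matches it
WITH p, apoc.create.uuid() as uuid // create reusable uuid
// now just match and assign
MATCH (n)
WHERE n.partition = p
SET n.uuid = uuid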
Or, as InverseFalcon suggested:
MATCH (n)
WHERE exists(n.partition) // to filter out nulls
WITH n.partition as p, collect(n) as nodes // collect nodes so each row is 1 partition and its nodes
WITH p, nodes, apoc.create.uuid() as uuid // create reusable uuid
FOREACH (n in nodes | SET n.uuid = uuid) // assign uuid to each node in collection
The first query is more periodic-commit friendly, since it doesn't need to load everything into memory before it starts doing assignments. Without the periodic commit statement, though, it will eventually hold everything in memory, because it has to keep it for the transaction log. Once it hits a commit point, it can clear the transaction log to keep memory use down.
If your data set isn't too large though, the second query should be faster because by holding everything in memory after the first node scan, it doesn't need to run another node scan to find all the nodes. Periodic commit won't help here because if you blow the heap, it will almost certainly be during the initial scan/collect phase.
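The shape of the first query, one UUID per distinct partition value followed by a second pass that assigns it, can be sketched in Python with nodes as plain dictionaries. This is an in-memory illustration only; `assign_partition_uuids` is a hypothetical name, and `uuid.uuid4()` stands in for apoc.create.uuid().

```python
import uuid


def assign_partition_uuids(nodes):
    """Set node["uuid"] to one shared UUID per distinct partition value.

    nodes: list of dicts; nodes without a "partition" key are left untouched.
    Returns the partition -> UUID mapping.
    """
    # Pass 1, like WITH DISTINCT n.partition AS p, apoc.create.uuid() AS uuid:
    # one UUID per partition value, not one per node.
    partition_uuid = {}
    for n in nodes:
        p = n.get("partition")
        if p is not None and p not in partition_uuid:
            partition_uuid[p] = str(uuid.uuid4())
    # Pass 2, like MATCH (n) WHERE n.partition = p SET n.uuid = uuid.
    for n in nodes:
        p = n.get("partition")
        if p is not None:
            n["uuid"] = partition_uuid[p]
    return partition_uuid
```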
To do this, you'll need to collect nodes by their partition value, which gives you a single row per distinct partition. Then you create the UUID (it executes once per row) and use FOREACH to apply it to each node in the partition:
MATCH (n)
// WHERE exists(n.partition) // only if there are nodes in the graph without partitions
WITH n.partition as partition, collect(n) as nodes
WITH partition, nodes, randomUUID() as uuid
FOREACH (n in nodes | SET n.uuid = uuid)
Depending on the number of nodes in your graph, you may need to combine this with some batch processing, such as apoc.periodic.iterate(), to avoid heap issues.
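A rough Python sketch of the collect-then-FOREACH approach with batching follows. It assumes nodes are dictionaries, treats `commit` as a hypothetical callback standing in for one committed transaction, and uses `uuid.uuid4()` in place of randomUUID(). The grouping step still holds every node in memory, which mirrors the initial collect phase that, as noted above, is where the heap is most likely to run out; batching only bounds the size of each write.

```python
import uuid
from collections import defaultdict


def assign_uuids_in_batches(nodes, batch_size, commit):
    """Give every partition one UUID and hand the updates to `commit` in batches.

    Each batch is a list of (partition, member_nodes, uuid_string) rows, at most
    `batch_size` partitions long. `commit` is a hypothetical callback standing in
    for one committed transaction.
    """
    # WITH n.partition AS partition, collect(n) AS nodes (all held in memory)
    groups = defaultdict(list)
    for n in nodes:
        if n.get("partition") is not None:
            groups[n["partition"]].append(n)
    batch = []
    for partition, members in groups.items():
        batch.append((partition, members, str(uuid.uuid4())))  # one UUID per row
        if len(batch) == batch_size:
            commit(batch)
            batch = []
    if batch:  # flush the last, partially filled batch
        commit(batch)


def apply_batch(batch):
    # FOREACH (n IN nodes | SET n.uuid = uuid)
    for _partition, members, new_uuid in batch:
        for n in members:
            n["uuid"] = new_uuid
```

For example, assign_uuids_in_batches(nodes, 1000, apply_batch) applies the updates 1,000 partitions at a time.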

Neo4j: get all relations between queried nodes

I want to write a Cypher query that does the following:
1. Starting from a given start node, get all related nodes within 2 hops.
2. Sort the queried nodes by hop count (ascending) and limit them to a given number.
3. Get all relationships between the nodes returned in step 1.
I tried many queries and came up with the query below for steps 1 and 2:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10]
But when I try to get the relationships in the queried paths with the query below, it returns all relationships in each path:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10], relationships(path)
I think I have to match again using the result of the first match instead of getting the relationships from the path directly, but all of my attempts have failed.
How can I get all relationships between the queried nodes?
Any help is appreciated. Thanks a lot.
Something like this may work for you:
MATCH (start {eid:12018})-[rels:REAL_CALL*..2]-(end)
RETURN start, end, COLLECT(rels) AS rels_collection
ORDER BY
REDUCE(s = 2, rs in rels_collection | CASE WHEN SIZE(rs) < s THEN SIZE(rs) ELSE s END)
LIMIT 10;
The COLLECT aggregation function generates a collection (of relationship collections) for each distinct start/end pair. The LIMIT clause limits the returned results to the first 10 start/end pairs, based on the ORDER BY clause. The ORDER BY clause uses REDUCE to calculate the minimum length over all paths to a given end node.
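The grouping and ordering in this answer can be mimicked in Python to check what it returns. Relationships are assumed to be (rel_id, a, b) tuples traversed in both directions, and `paths_up_to` and `nearest_ends` are hypothetical names. As with Cypher's variable-length patterns, a relationship is not reused within one path, and the sort key is the shortest path length per end node, which is what the REDUCE expression computes.

```python
from collections import defaultdict


def paths_up_to(start, rels, max_hops):
    """Yield (end_node, rel_id_list) for every path of 1..max_hops hops from start.

    rels: list of (rel_id, a, b), traversed in both directions like -[:REAL_CALL*..2]-.
    A relationship is never reused within one path.
    """
    adj = defaultdict(list)
    for rid, a, b in rels:
        adj[a].append((rid, b))
        adj[b].append((rid, a))
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        for rid, nxt in adj[node]:
            if rid in path:
                continue
            new_path = path + [rid]
            yield nxt, new_path
            if len(new_path) < max_hops:
                stack.append((nxt, new_path))


def nearest_ends(start, rels, max_hops=2, limit=10):
    # COLLECT(rels) AS rels_collection, grouped per end node
    by_end = defaultdict(list)
    for end, path in paths_up_to(start, rels, max_hops):
        by_end[end].append(path)
    # ORDER BY the shortest path length per end node, then LIMIT
    ranked = sorted(by_end.items(), key=lambda kv: min(len(p) for p in kv[1]))
    return ranked[:limit]
```

With relationships s-a, a-b, s-b and b-c, the end nodes come back as a, b, c: a and b are one hop away, c is two.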

Neo4j: how to use count(distinct()) over the nodes of a path

I am searching for the longest path in my graph, and I want to count the number of distinct nodes in that path.
I want to use count(distinct()).
I tried two queries.
The first is:
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But if I try the query
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is:
count(distinct(primero))
2
How can I use count(distinct()) to count the distinct nodes of this path?
The primero node has a property called id.
You should bind at least one of those nodes, add a direction, and also consider a path-length limit; otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
// to get the number of distinct nodes instead: return count(distinct n);
return distinct n;
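A Python sketch can make the two counts in this question concrete. It assumes relationships are (rel_id, a, b) tuples, enumerates paths exhaustively (so it only suits small graphs), and uses hypothetical names. Taking one longest path and counting its distinct nodes corresponds to the answer's query. Counting distinct start nodes over all longest paths corresponds to count(distinct(primero)) in the question: an undirected pattern matches each path once in each direction, so on a simple chain that count is 2, which is consistent with the result shown above.

```python
from collections import defaultdict


def all_paths(rels, max_hops=30):
    """Yield the node list of every path with 1..max_hops relationships.

    rels: list of (rel_id, a, b), traversed in both directions like
    -[:ResponseTo*..30]-. A relationship may not repeat within one path,
    but a node may.
    """
    adj = defaultdict(list)
    for rid, a, b in rels:
        adj[a].append((rid, b))
        adj[b].append((rid, a))
    stack = [([n], frozenset()) for n in adj]
    while stack:
        nodes, used = stack.pop()
        if used:  # skip the zero-length starting paths
            yield nodes
        if len(used) < max_hops:
            for rid, nxt in adj[nodes[-1]]:
                if rid not in used:
                    stack.append((nodes + [nxt], used | {rid}))


def longest_path_stats(rels, max_hops=30):
    paths = list(all_paths(rels, max_hops))
    longest = max(len(p) for p in paths)
    top = [p for p in paths if len(p) == longest]
    # one longest path, like ORDER BY length(p) DESC LIMIT 1, then count(distinct n)
    distinct_nodes = len(set(top[0]))
    # what the question's second query counted: distinct start nodes of longest paths
    distinct_starts = len({p[0] for p in top})
    return distinct_nodes, distinct_starts
```

For a chain a-b-c-d, longest_path_stats returns (4, 2).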
