Difficulty using UNWIND in Neo4j

I am very new to Neo4j, so this is probably a simple question.
I have several hundred nodes with a property "seq" (for sequence). This number represents the day of the month, so every one of these nodes has a seq property between 1 and 31. I want to combine all the nodes with the same seq into a single node, so that all the nodes with seq = 1 are combined into a "January 1" node, all nodes with seq = 2 are combined into a "January 2" node, etc. I also have a property "pat_id" whose values, taken from all the nodes merged for a given day, should be combined into an array.
Here is my code:
WITH range(1,31) as counts
UNWIND counts AS cnt
MATCH (n:OUTPT {seq:cnt})
WITH collect(n) AS nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {
pat_id:'combine',
seq:'discard'}, mergeRels:true})
YIELD node
RETURN node
I initially tried to do this with a FOREACH loop, but I can't do a MATCH inside a FOREACH.
I have been doing an UNWIND, but it is only merging the nodes with the first value (seq = 1). I assume this is because the RETURN statement ends the loop. But when I remove the RETURN statement, I get this error:
Query cannot conclude with CALL (must be RETURN or an update clause) (line 5, column 1 (offset: 99))
"CALL apoc.refactor.mergeNodes(nodes, {properties: {"
Any help would be appreciated.

The problem is with this line:
WITH collect(n) AS nodes
You've matched to all :OUTPT nodes with a sequence number within 1-31, but then you aggregate them into a single large collection, then merge them into a single node.
If you want to collect the nodes according to the sequence number, then the sequence number (in your case, cnt) needs to be the grouping key of the aggregation:
WITH cnt, collect(n) AS nodes
That will get you a row per distinct cnt value, with the list of nodes with the same count on the associated row.
Because Cypher operations execute per row, your APOC refactor call will execute per row. Because each row is associated with a different cnt value, and each has a different list, you will be performing the refactoring for each list separately.
The output will be one row per cnt value, with a single node per row (as a result of merging all the nodes in that row's list into a single node).
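To make the effect of the grouping key concrete, here is a minimal Python sketch of the two aggregations. It only models the row semantics; the `rows` data is hypothetical and stands in for the (cnt, n) rows produced by the UNWIND and MATCH.

```python
from collections import defaultdict

# Hypothetical (cnt, node) rows, standing in for the output of UNWIND + MATCH.
rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]

# WITH collect(n) AS nodes
# No grouping key: everything is aggregated into one list on one row.
ungrouped = [node for _, node in rows]

# WITH cnt, collect(n) AS nodes
# cnt is the grouping key: one list per distinct cnt value.
grouped = defaultdict(list)
for cnt, node in rows:
    grouped[cnt].append(node)

print(ungrouped)      # ['a', 'b', 'c', 'd', 'e']
print(dict(grouped))  # {1: ['a', 'b'], 2: ['c'], 3: ['d', 'e']}
```

With the ungrouped form the merge procedure receives a single list containing every node, which is why everything ended up in one node.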

Related

Neo4j: Count the number of neighbours with a specific label for each node

As a sort of validation test, I want to count the number of neighbouring nodes with a given labelB for every node with labelA, and then return any labelA node for which the number of neighbours is not equal to 2.
Basically, parent:parent_name should always have 2 connected nodes that have the label child_name. How do I return the nodes for which this statement is False?
At the moment I am doing a very time consuming "match everything in neo4j" and then groupby and count in python Pandas.
Cypher:
MATCH (parent: parent_name)
MATCH (parent)-->(child: child_name)
RETURN parent.id, child.id, 'child_name' as child_label, 'parent_name' as parent_label
Pandas post-processing:
grouped = df.groupby(['child_label', 'child.id']).apply(len)
result = grouped[grouped != 2].index # returns pairs of child_label and child.id
I can't do this at scale. Finding neighbours is one of the main use-cases of graphs! There must be a way to do this!
Maybe using UNWIND for each parent_name? If I use count under an UNWIND it simply counts the total, not the number for that node...
A general approach could be along these lines, finding any node with label :LabelA that does not have exactly two neighbours with label :LabelB (regardless of the direction of the relationship):
MATCH (n:LabelA)
WITH n,
SIZE([(n)--(m:LabelB) | m ]) AS nodeCountLabelB
WHERE nodeCountLabelB <> 2
RETURN n
Here is one way to get the id of each parent node that has the wrong number of child nodes, along with a (possibly empty) list of its existing child ids:
MATCH (parent:parent_name)
WITH parent.id AS parentId, [(parent)-->(child:child_name) | child.id] AS childIds
WHERE SIZE(childIds) <> 2
RETURN parentId, childIds
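One property of both answers worth spelling out: a pattern comprehension yields an empty list when nothing matches, so parents with zero children are still reported, whereas a second MATCH would drop those rows. The sketch below models that counting logic in plain Python; the `parents` and `edges` data and the function name `invalid_parents` are hypothetical.

```python
# Hypothetical data: parent ids and directed (parent, child) edges.
parents = ["p1", "p2", "p3"]
edges = [("p1", "c1"), ("p1", "c2"), ("p2", "c3")]


def invalid_parents(parents, edges, expected=2):
    """Return {parent_id: child_ids} for parents whose child count != expected."""
    # Start from every parent so that parents without children keep an empty list,
    # mirroring what the pattern comprehension does per matched parent.
    children = {p: [] for p in parents}
    for parent, child in edges:
        if parent in children:
            children[parent].append(child)
    return {p: kids for p, kids in children.items() if len(kids) != expected}


result = invalid_parents(parents, edges)
print(result)  # {'p2': ['c3'], 'p3': []}
```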

Neo4J: How can I find if a path traversing multiple nodes given in a list exist?

I have a graph of nodes with a relationship NEXT with 2 properties sequence (s) and position (p). For example:
N1-[NEXT{s:1, p:2}]-> N2-[NEXT{s:1, p:3}]-> N3-[NEXT{s:1, p:4}]-> N4
A node N might have multiple outgoing Next relationships with different property values.
Given a list of node names, e.g. [N2,N3,N4] representing a sequential path, I want to check if the graph contains the nodes and that the nodes are connected with relationship Next in order.
For example, if the list contains [N2,N3,N4], then check if there is a relationship Next between nodes N2,N3 and between N3,N4.
In addition, I want to make sure that the nodes are part of the same sequence, thus the property s is the same for each relationship Next. To ensure that the order maintained, I need to verify if the property p is incremental. Meaning, the value of p in the relationship between N2 -> N3 is 3 and the value p between N3->N4 is (3+1) = 4 and so on.
I tried using APOC to retrieve the possible paths from an initial node N using python (library: neo4jrestclient) and then process the paths manually to check if a sequence exists using the following query:
q = "MATCH (n:Node) WHERE n.name = 'N' CALL apoc.path.expandConfig(n, {relationshipFilter:'NEXT>', maxLevel:4}) YIELD path RETURN path"
results = db.query(q,data_contents=True)
However, the query ran for so long that I eventually stopped it. Any ideas?
This one is a bit tough.
First, pre-match the nodes in the path. We can then use the collected nodes as a whitelist for the nodes allowed in the path.
Assuming the start node is included in the list, a query might go like:
UNWIND $names as name
MATCH (n:Node {name:name})
WITH collect(n) as nodes
WITH nodes, nodes[0] as start, tail(nodes) as tail, size(nodes)-1 as depth
CALL apoc.path.expandConfig(start, {whitelistNodes:nodes, minLevel:depth, maxLevel:depth, relationshipFilter:'NEXT>'}) YIELD path
WHERE all(index in range(0, size(nodes)-1) WHERE nodes[index] = nodes(path)[index])
// we now have only paths with the given nodes in order
WITH path, relationships(path)[0].s as sequence
WHERE all(rel in tail(relationships(path)) WHERE rel.s = sequence)
// now each path only has relationships of common sequence
WITH path, apoc.coll.pairsMin([rel in relationships(path) | rel.p]) as pairs
WHERE all(pair in pairs WHERE pair[0] + 1 = pair[1])
RETURN path
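The last two filters in the query (same s on every relationship, and p increasing by exactly 1) can be sketched in Python as follows. This is only an illustration of the checks, not part of the query: `pairs_min` mirrors the adjacent pairs that apoc.coll.pairsMin produces, and `is_ordered_sequence` is a hypothetical helper name. Returning False for an empty path is a choice made for this sketch.

```python
def pairs_min(values):
    """Adjacent pairs of a list, e.g. [2, 3, 4] -> [(2, 3), (3, 4)]."""
    return list(zip(values, values[1:]))


def is_ordered_sequence(rels):
    """rels: relationship property maps with keys 's' and 'p', in path order."""
    if not rels:
        return False  # assumption of this sketch: an empty path is not a match
    # Every relationship must carry the same sequence value as the first one.
    same_sequence = all(rel["s"] == rels[0]["s"] for rel in rels)
    # Each position must be exactly one more than the previous position.
    consecutive = all(a + 1 == b for a, b in pairs_min([rel["p"] for rel in rels]))
    return same_sequence and consecutive


path = [{"s": 1, "p": 2}, {"s": 1, "p": 3}, {"s": 1, "p": 4}]
print(is_ordered_sequence(path))  # True
```

A path whose relationships mix sequences, or whose positions skip a value, is rejected by the same function.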

Update nodes by a list of ids and values in one cypher query

I've got a list of id's and a list of values. I want to catch each node with the id and set a property by the value.
With just one Node that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But I have a list of ids ['node1', 'node2', 'node3'] and the same number of values ['value1', 'value2', 'value3'] (for simplicity I used a pattern here, but the real values and ids vary a lot). My first approach was to use the query above and call the database once per id. That is no longer appropriate, since I now have thousands of ids, which would result in thousands of requests.
I came up with this approach, in which I iterate over each entry in both lists and set the values. The first node from the node list has to get the first value from the value list, and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last of the value list). The reason is the last part, SET nodes.name=values[index]: I can't use the index on the left side, because nodes[index].name doesn't work and the database throws an error if I try. I tried it with nodeList, node and n, and nothing worked. I'm not sure this is the right way to achieve the goal; maybe there is a more elegant way.
Create pairs from the ids and values first, then use UNWIND and simple MATCH .. SET query:
// The first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids,['value1', 'value2', 'value3'] AS values
// range() is inclusive at both ends, so the last index is size(ids)-1
WITH [i in range(0, size(ids)-1) | {id:ids[i], value:values[i]}] as pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i in range(0, size(ids)-1) | {id:ids[i], value:values[i]}] as pairs
combines two concepts: list comprehensions and maps. The list comprehension (with the WHERE clause omitted) converts a list of indexes into a list of maps with the keys id and value.
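Cypher's range(start, end) includes both bounds, unlike Python's range, so the last valid index of a list is size(ids) - 1. The following Python sketch models the pairing under that rule; `cypher_range` and `build_pairs` are hypothetical helper names used only for illustration.

```python
def cypher_range(start, end):
    """Model of Cypher's range(start, end): both bounds are inclusive."""
    return list(range(start, end + 1))


def build_pairs(ids, values):
    """Build one {id, value} map per index, as the list comprehension does."""
    return [{"id": ids[i], "value": values[i]} for i in cypher_range(0, len(ids) - 1)]


ids = ["node1", "node2", "node3"]
values = ["value1", "value2", "value3"]
pairs = build_pairs(ids, values)

print(cypher_range(0, 3))  # [0, 1, 2, 3] (four indexes for a three-element list)
print(pairs[0])            # {'id': 'node1', 'value': 'value1'}
```

In Cypher, indexing a list past its end returns null rather than raising an error, so an upper bound of size(ids) would add one extra pair of nulls that the subsequent MATCH never matches.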

How to count the numbers of node types in the Neo4j graph?

MATCH (p:Product), (s:Student), (b:Boy), (a:Attribute)
RETURN count(distinct(p)), count(distinct(s)), count(distinct(b)), count(distinct(a))
I want to know how many nodes of each type there are in the graph using this query. However, the Neo4j Browser gives a warning saying that this query produces a cartesian product. Is there a better way to write the query?
Yes. You want to make sure your query uses the NodeCountFromCountStore operator (you can view this in the query plan if you EXPLAIN the query, so you can check before you actually execute).
The tricky part of this is that the only way for this plan to be used is if you match to all nodes of a label, then get the count (no other variables in your WITH or RETURN!).
You can try this approach, which unions queries together and keeps the NodeCountFromCountStore operator by adding the label column after you get the count:
match (n:Product)
with count(n) as count
return 'Product' as label, count
union all
match (n:Student)
with count(n) as count
return 'Student' as label, count
union all
match (n:Boy)
with count(n) as count
return 'Boy' as label, count
union all
match (n:Attribute)
with count(n) as count
return 'Attribute' as label, count
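To see why the Browser warns about the original query, here is a small Python model of what a comma-separated MATCH over four unrelated labels builds before the counts are taken. The node sets are hypothetical.

```python
from itertools import product

# Hypothetical node sets per label.
nodes = {
    "Product": ["p1", "p2", "p3"],
    "Student": ["s1", "s2"],
    "Boy": ["b1", "b2"],
    "Attribute": ["a1"],
}

# MATCH (p:Product), (s:Student), (b:Boy), (a:Attribute)
# produces one row per combination of nodes.
rows = list(product(*nodes.values()))

# count(DISTINCT x) per column recovers the right numbers, but only after
# every combination has been materialised.
distinct_counts = {
    label: len({row[i] for row in rows}) for i, label in enumerate(nodes)
}

# Counting each label on its own touches each node set once.
direct_counts = {label: len(members) for label, members in nodes.items()}

# If any label has no nodes, the cartesian product has no rows at all.
empty_rows = list(product(*dict(nodes, Attribute=[]).values()))

print(len(rows))         # 12
print(distinct_counts)   # {'Product': 3, 'Student': 2, 'Boy': 2, 'Attribute': 1}
print(len(empty_rows))   # 0
```

The row count grows as the product of the per-label counts, and a single empty label makes every count in the original query come back as 0. Counting per label avoids both problems.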
To get a variety of statistics for your DB, including a count of the number of nodes for every label, you can use the APOC procedure apoc.meta.stats.
The following query gets just the label node counts, returning a map of label names to node counts:
CALL apoc.meta.stats() YIELD labels
RETURN labels;

How to create relationship based on common Epochtime property

I am trying to model the state changes of a batch. I capture the various changes and I have an Epoch time column to track them. I managed to get this done with the code below:
MATCH(n:Batch), (n2:Batch)
WHERE n.BatchId = n2.Batch
WITH n, n2 ORDER BY n2.Name
WITH n, COLLECT(n2) as others
WITH n, others, COALESCE(
HEAD(FILTER(x IN others where x.EpochTime > n.EpochTime)),
HEAD(others)
) as next
CREATE (n)-[:NEXT]->(next)
RETURN n, next;
It makes my graph circular because of the HEAD(others) fallback, and it doesn't stop at the node with the maximum Epoch time. If I remove HEAD(others), I can't figure out how to stop the creation of a relationship for the last node. I'm not sure how to put a condition around the CREATE so that no relationship is created when the next node is null.
This might do what you want:
MATCH(n:Batch)
WITH n ORDER BY n.EpochTime
WITH n.BatchId AS id, COLLECT(n) AS ns
CALL apoc.nodes.link(ns, 'NEXT')
RETURN id, ns;
It orders all the Batch nodes by EpochTime, and then collects all the Batch nodes with the same BatchId value. For each collection, it calls the apoc procedure apoc.nodes.link to link all its nodes together (in chronological order) with NEXT relationships. Finally, it returns each distinct BatchId and its ordered collection of Batch nodes.
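The linking behaviour described above can be sketched in Python: sort by time, group by batch, then connect each node to its successor only. The last node of each chain gets no outgoing link, which is what avoids the cycle. The data and the function name `link_batches` are hypothetical, and this models the outcome rather than APOC's implementation.

```python
from itertools import groupby
from operator import itemgetter


def link_batches(nodes):
    """Return (from_name, to_name) pairs, one per NEXT relationship to create."""
    # Sort by BatchId first so groupby sees each batch as one contiguous run,
    # then by EpochTime so each run is in chronological order.
    ordered = sorted(nodes, key=itemgetter("BatchId", "EpochTime"))
    links = []
    for _, group in groupby(ordered, key=itemgetter("BatchId")):
        chain = list(group)
        # Pair each node with its successor; the final node has no successor.
        links.extend((a["Name"], b["Name"]) for a, b in zip(chain, chain[1:]))
    return links


batches = [
    {"BatchId": 1, "Name": "B", "EpochTime": 20},
    {"BatchId": 1, "Name": "A", "EpochTime": 10},
    {"BatchId": 2, "Name": "C", "EpochTime": 5},
    {"BatchId": 1, "Name": "D", "EpochTime": 30},
]
links = link_batches(batches)
print(links)  # [('A', 'B'), ('B', 'D')]
```

Batch 2 contains a single node, so it produces no links at all.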
