Cypher to sum dynamic property values

I have some dynamic data source property values on a node. There are around 5 data sources, and a node can have one or more of them. These properties are set via a @Properties annotation on a Map in Spring Data Neo4j. Once the node is saved, the values are set on the node as follows:
dataSources.ds1.count="10"
dataSources.ds2.count="20"
dataSources.ds3.count="30"
... etc
I want to be able to sum the values of these counts dynamically if they have been set and use this value to order the results.
Here is some of the starting Cypher I have so far, which lists the properties:
MATCH (n:Entity)
WITH n, [k in keys(n) where k =~ 'dataSources.(ds1|ds2).count'] as props
RETURN n.name, props
This returns:
╒═══════════╤═════════════════════════════════════════════════╕
│"n.name"   │"props"                                          │
╞═══════════╪═════════════════════════════════════════════════╡
│"ENT1"     │["dataSources.ds1.count","dataSources.ds2.count"]│
├───────────┼─────────────────────────────────────────────────┤
│"ENT2"     │["dataSources.ds1.count","dataSources.ds2.count"]│
├───────────┼─────────────────────────────────────────────────┤
│"ENT3"     │["dataSources.ds1.count"]                        │
└───────────┴─────────────────────────────────────────────────┘
How can I use the properties names to get the values and sum them?
I was thinking of something similar to this, where I could use apoc.coll.sum, but I am unsure how to iterate over the property names to get their values.
RETURN n.name, apoc.coll.sum([COALESCE(toInteger(n.`dataSources.ds1.count`), 0), etc]) as count
ORDER by count DESC
Any help would be much appreciated.

Here is one way:
MATCH (n:Entity)
RETURN
n.name,
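COALESCE(toInteger(n.`dataSources.ds1.count`), 0) + COALESCE(toInteger(n.`dataSources.ds2.count`), 0) AS count
ORDER BY count DESC
The COALESCE function returns its first non-NULL argument. The toInteger() call is needed because the counts are stored as strings; without it, + would concatenate the strings instead of adding them.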
Or, if you pass the list of property names in a props parameter:
MATCH (n:Entity)
RETURN n.name, REDUCE(s = 0, p IN $props | s + COALESCE(toInteger(n[p]), 0)) AS count
ORDER BY count DESC
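To check the arithmetic without a database, the same logic can be modelled in plain Python. This is a sketch under stated assumptions, not Neo4j code: each node is a dict of property name to string value, to_integer and coalesce are hypothetical stand-ins for Cypher's toInteger() and COALESCE, and the key filter reuses the question's regular expression (with the dots escaped), so the total covers whichever matching keys a node actually has.

```python
import re

# Hypothetical sample data mirroring the question: counts are stored as strings.
NODES = [
    {"name": "ENT1", "dataSources.ds1.count": "10", "dataSources.ds2.count": "20"},
    {"name": "ENT2", "dataSources.ds1.count": "5", "dataSources.ds2.count": "40"},
    {"name": "ENT3", "dataSources.ds1.count": "30"},
]

# The question's pattern with literal dots; Cypher's =~ must match the whole key.
KEY_PATTERN = re.compile(r"dataSources\.(ds1|ds2)\.count")


def to_integer(value):
    """Simplified stand-in for toInteger(): integer strings only, else None."""
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def coalesce(*args):
    """Return the first non-None argument, like COALESCE."""
    return next((a for a in args if a is not None), None)


def datasource_total(node):
    """Sum every matching count property, treating null or invalid values as 0."""
    keys = [k for k in node if KEY_PATTERN.fullmatch(k)]
    return sum(coalesce(to_integer(node[k]), 0) for k in keys)


def ranked(nodes):
    """Model of RETURN n.name, <total> AS count ORDER BY count DESC."""
    rows = [(n["name"], datasource_total(n)) for n in nodes]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, count in ranked(NODES):
        print(name, count)
```

Because the key list is computed per node, a node that lacks one of the data sources simply contributes fewer terms, which is the "dynamic" behaviour the question asks for.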

Related

Replacing a string on whole database

The string &amp has somehow gotten into the database through various imports, on many different node and relationship properties. How do I replace every &amp with a regular & character?
I don't know all the possible property names that I can filter on.

To make this efficient, you can use CALL {} IN TRANSACTIONS OF X ROWS.
The :auto prefix is needed if you want to run this query in the Neo4j Browser.
This line
WITH n, [x in keys(n) WHERE n[x] CONTAINS '&amp'] AS keys
is needed to avoid calling replace() on a property that is not a string, in which case Neo4j would throw an exception.
Full query
:auto MATCH (n)
CALL {
WITH n
WITH n, [x in keys(n) WHERE n[x] CONTAINS '&amp'] AS keys
CALL apoc.create.setProperties(n, keys, [k in keys | replace(n[k], '&amp', '&')])
YIELD node
RETURN node
} IN TRANSACTIONS OF 100 ROWS
RETURN count(*)
If you're using a Neo4j cluster, you will need to run this on the leader of the database over a direct bolt:// connection (not using the neo4j:// protocol).
The same query for relationships:
:auto MATCH ()-[r]->()
CALL {
WITH r
WITH r, [x in keys(r) WHERE r[x] CONTAINS '&amp'] AS keys
CALL apoc.create.setRelProperties(r, keys, [k in keys | replace(r[k], '&amp', '&')])
YIELD rel
RETURN rel
} IN TRANSACTIONS OF 100 ROWS
RETURN count(*)
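The effect of the CONTAINS filter can be illustrated with a plain-Python model. This is a sketch, not how APOC works internally: properties are a dict, keys_to_fix and fix_properties are hypothetical helpers, and the isinstance check plays the role of CONTAINS returning null for non-string properties, which drops them from the key list before replace() ever sees them.

```python
def keys_to_fix(props, needle="&amp"):
    """Model of [x IN keys(n) WHERE n[x] CONTAINS '&amp']: only strings qualify."""
    return [k for k, v in props.items() if isinstance(v, str) and needle in v]


def fix_properties(props, needle="&amp", replacement="&"):
    """Model of apoc.create.setProperties(n, keys, [k IN keys | replace(n[k], ...)]).

    Returns a new dict; only the selected keys are rewritten.
    """
    updated = dict(props)
    for k in keys_to_fix(props, needle):
        updated[k] = props[k].replace(needle, replacement)
    return updated


if __name__ == "__main__":
    sample = {"title": "Tom &amp Jerry", "year": 1940, "tags": ["a"], "note": "plain"}
    print(fix_properties(sample))
```

If the stored text is the full HTML entity &amp; (with a semicolon), the search string would need the semicolon too; replacing only &amp would leave a stray ; behind.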

You can find the answer in this documentation:
https://neo4j.com/labs/apoc/4.3/overview/apoc.create/apoc.create.setRelProperties/
For example, the query below replaces &amp with & in all properties of all nodes.
MATCH (p)
// collect keys (or properties) in node p and look for properties with &amp
WITH p, [k in keys(p) WHERE p[k] CONTAINS '&amp'] AS keys WHERE size(keys) > 0
// use apoc function to update the values in each prop key
CALL apoc.create.setProperties(p, keys, [k in keys | replace(p[k], '&amp', '&')])
YIELD node
RETURN node

Update nodes by a list of ids and values in one cypher query

I've got a list of ids and a list of values. I want to match each node by its id and set a property to the corresponding value.
With just one node, that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But I have a list of ids ['node1', 'node2', 'node3'] and the same number of values ['value1', 'value2', 'value3'] (for simplicity I used a pattern, but the values and ids vary a lot). My first approach was to use the query above and call the database once per id. But this is no longer appropriate, since I have thousands of ids, which would result in thousands of requests.
I came up with this approach, where I iterate over each entry in both lists and set the values. The first node from the node list has to get the first value from the value list, and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last one in the value list). The reason is the last part, SET nodes.name=values[index]: I can't use the index on the left side (nodes[index].name doesn't work, and the database throws an error if I try). I tried it with nodeList, nodes and n, and nothing worked. I'm not sure this is the right way to achieve the goal; maybe there is a more elegant way.

Create pairs from the ids and values first, then use UNWIND and a simple MATCH ... SET query:
// The first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids, ['value1', 'value2', 'value3'] AS values
WITH [i IN range(0, size(ids) - 1) | {id: ids[i], value: values[i]}] AS pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i IN range(0, size(ids) - 1) | {id: ids[i], value: values[i]}] AS pairs
combines two concepts: list comprehensions and map literals. The list comprehension (with its WHERE clause omitted) converts the list of indexes into a list of maps with id and value keys. The - 1 is there because Cypher's range() includes its upper bound.
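A plain-Python model of the pairing step makes the inclusive bound visible. It is only a sketch: cypher_range, make_pairs and apply_pairs are hypothetical helpers, and the node store is assumed to be a dict from id to property dict.

```python
def cypher_range(start, end):
    """Cypher's range() includes `end`; Python's range() does not."""
    return list(range(start, end + 1))


def make_pairs(ids, values):
    """Model of [i IN range(0, size(ids) - 1) | {id: ids[i], value: values[i]}]."""
    return [{"id": ids[i], "value": values[i]} for i in cypher_range(0, len(ids) - 1)]


def apply_pairs(nodes_by_id, pairs):
    """Model of UNWIND pairs AS pair MATCH (n) WHERE n.id = pair.id SET n.value = pair.value.

    A pair whose id matches no node is skipped, just as MATCH yields no row for it.
    """
    for pair in pairs:
        node = nodes_by_id.get(pair["id"])
        if node is not None:
            node["value"] = pair["value"]
    return nodes_by_id


if __name__ == "__main__":
    nodes = {"node1": {"id": "node1"}, "node2": {"id": "node2"}}
    print(apply_pairs(nodes, make_pairs(["node1", "node2"], ["value1", "value2"])))
```

Without the - 1, the model would try to read ids[len(ids)] and fail; in Cypher the out-of-range index yields null instead, producing an extra pair that silently matches nothing.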

Collection in WITH clause gets expanded to one element per row

I want to create a map projection with node properties and some additional information.
Also I want to collect some ids in a collection and use this later in the query to filter out nodes (where ID(n) in ids...).
The map projection is created in an apoc call which includes several union matches.
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
MATCH (n)-[:has]->({"Vacation"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u, nodeWithInfo
This does not return anything because, when the WHERE part is evaluated, it doesn't check nodesWithAdditionalInfosIds as a flat list but instead only gets one id per row.
The problem only exists because I am passing the ids (nodesWithAdditionalInfosIds) AND the nodeProjection (nodeWithInfo) on in the WITH clause.
If I instead only use the id collection and don't use the nodeWithInfo projection, the following adjustment works and returns only the nodes whose ids are in the id collection:
...
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
MATCH (n)-[:has]->({"Vacation"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u
This becomes obvious if I return nodesWithAdditionalInfosIds directly after the WITH clause in both examples: the version that also carries nodeWithInfo gives me one id per row, while the version with only the collection produces a flat list in a single result row.
I have the feeling that I am missing some crucial knowledge about Neo4j's WITH clause.
Is there a way I can pass on my listOfIds and use it as a flat list without the need to have an exclusive WITH clause for the collection?
edit:
Right now I am using the following workaround:
After I do the check on the IDs of n and u, I don't return, but instead keep the filtered n and u nodes and start a second APOC call that returns nodeWithInfo like before.
WITH n, u
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo, n, u
WHERE nodeWithInfo.id = ID(n) OR nodeWithInfo.id = ID(u)
RETURN nodeWithInfo, n, u
This way I can return the nodes n, u and the additional information (to one of the nodes) per row. But I am sure there must be a better way.
I know ids in Neo4j have to be used with care, if at all. In this case I only need them to be valid inside this query, so it doesn't matter if the same node has a different id next time.
The problem is stripped down to its core (in my opinion). The original query is a little bigger, with several UNION MATCH clauses inside the APOC call, and the actual match on nodes whose ids are contained in my collection checks some more restrictions instead of asking for any node.

Aggregating functions, like COLLECT(), aggregate over a set of "grouping keys".
In the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
the grouping key is nodeWithInfo. Therefore, each nodesWithAdditionalInfosIds would always be a list containing one value.
And in the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
there is no grouping key. Therefore, in this situation, nodesWithAdditionalInfosIds will contain all the nodeWithInfo.id values.
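A small Python model of implicit grouping can make this concrete. It is a sketch under simplifying assumptions: rows are dicts, each nodeWithInfo map is stood in for by a hashable tuple, and with_collect is a hypothetical helper that groups by the non-aggregated columns the way WITH does. It also models one common way to keep both the flat id list and the projections: collect both in the same WITH, with no grouping key, and unwind the projections afterwards (WITH collect(nodeWithInfo) AS infos, collect(nodeWithInfo.id) AS ids UNWIND infos AS nodeWithInfo).

```python
def with_collect(rows, collect_col, group_cols):
    """Model of WITH collect(<collect_col>) AS collected, <group_cols>.

    Rows are grouped by the values of group_cols (the grouping key); with no
    group_cols, every row falls into a single group.
    """
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups.setdefault(key, []).append(row[collect_col])
    return [dict(zip(group_cols, key), collected=vals) for key, vals in groups.items()]


def collect_then_unwind(rows):
    """Model of collecting ids and projections together, then UNWIND infos:
    every output row carries the full id list plus one projection."""
    ids = [r["id"] for r in rows]
    return [{"ids": ids, "nodeWithInfo": r["nodeWithInfo"]} for r in rows]


ROWS = [
    {"id": 1, "nodeWithInfo": ("User", 1)},
    {"id": 2, "nodeWithInfo": ("User", 2)},
    {"id": 3, "nodeWithInfo": ("Department", 3)},
]

if __name__ == "__main__":
    print(with_collect(ROWS, "id", ["nodeWithInfo"]))  # one single-element list per row
    print(with_collect(ROWS, "id", []))                # one row holding every id
    print(collect_then_unwind(ROWS))
```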

Neo4j indices slow when querying across 2 labels

I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know the label a priori. To do this, I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?

Here is one way to use both indexes. result will be a list of matching nodes.
OPTIONAL MATCH (a:A {id:"42"})
OPTIONAL MATCH (b:B {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.

You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check your query, prefix it with PROFILE or EXPLAIN and confirm that the indexes are used.

Indexes are built on and used via a node label and a property, and to use them you need to write your query the same way. That means a query without a label has to scan all nodes, which explains the slow results you got.
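To see why the label matters, here is a toy Python model (a deliberate simplification, not how Neo4j stores indexes). Index entries are keyed by (label, property, value), so a lookup that names a label goes straight to its entries, whereas a predicate with no label cannot use any index and must visit every node. ToyGraph and union_seek are hypothetical names; union_seek runs one seek per label and drops duplicates, roughly what the UNION answer asks the planner to do.

```python
from collections import defaultdict


class ToyGraph:
    """Nodes plus a lookup table for each indexed (label, property) pair."""

    def __init__(self, indexed):
        self.indexed = set(indexed)      # e.g. {("A", "id"), ("B", "id")}
        self.nodes = {}                  # node_id -> (labels, props)
        self.index = defaultdict(list)   # (label, prop, value) -> node ids

    def add_node(self, node_id, labels, props):
        self.nodes[node_id] = (frozenset(labels), props)
        for label in labels:
            for prop, value in props.items():
                if (label, prop) in self.indexed:
                    self.index[(label, prop, value)].append(node_id)

    def seek(self, label, prop, value):
        """Like NodeIndexSeek: touches only the entries for one label."""
        return list(self.index.get((label, prop, value), []))

    def scan(self, prop, value, any_of_labels):
        """Like a full scan plus filter: returns (matches, nodes visited)."""
        wanted = set(any_of_labels)
        matches, visited = [], 0
        for node_id, (labels, props) in self.nodes.items():
            visited += 1
            if props.get(prop) == value and labels & wanted:
                matches.append(node_id)
        return matches, visited


def union_seek(graph, labels, prop, value):
    """One seek per label; UNION removes duplicate rows."""
    result = []
    for label in labels:
        for node_id in graph.seek(label, prop, value):
            if node_id not in result:
                result.append(node_id)
    return result
```

Both approaches find the same node, but only the scan has to look at every node in the graph.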

Remove duplicates from Node array properties

I have a property A on my nodes that holds an array of string values:
n.A=["ABC","XYZ","123","ABC"]
During merges I frequently will write code similar to n.A = n.A + "New Value". The problem I've been running into is that I end up with duplicate values in my arrays; not insurmountable but I'd like to avoid it.
How can I write a cypher query that will remove all duplicate values from the array A? Some duplicates have already been inserted at this point, and I'd like to clean them up.
When adding new values to existing arrays how can I make sure I only save a copy of the array with distinct values? (may end up being the exact same logic as used to solve my first question)

The query to add a non-duplicate value can be done efficiently (in this example, I assume that the id and newValue parameters are provided):
OPTIONAL MATCH (n {id: $id})
WHERE NONE(x IN n.A WHERE x = $newValue)
SET n.A = n.A + $newValue;
This query does not create a temporary array, and will only alter the n.A array if it does not already contain the $newValue string.
[EDITED]
If you want to (a) create the n node if it does not already exist, and (b) append $newValue to n.A only if $newValue is not already in n.A, this should work:
OPTIONAL MATCH (n { id: $id })
FOREACH (x IN (
CASE WHEN n IS NULL THEN [1] ELSE [] END ) |
CREATE ({ id: $id, A: [$newValue]}))
WITH n, COALESCE(n.A, []) AS nA
WHERE NONE (x IN nA WHERE x = $newValue)
SET n.A = nA + $newValue;
If the OPTIONAL MATCH fails, then the FOREACH clause will create a new node (with the $id and an array containing $newValue), and the following SET clause will do nothing because n would be NULL.
If the OPTIONAL MATCH succeeds, then the FOREACH clause will do nothing, and the following SET clause will append $newValue to n.A only if that value does not already exist in n.A. If the SET should be performed, but the existing node did not already have the n.A property, then the query concatenates an empty array with $newValue (thus generating an array containing just $newValue) and sets that as the n.A value.
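The branching in that query can be mirrored in plain Python to check the edge cases (missing node, missing property, value already present) before touching the database. This is a model only: the graph is a hypothetical dict mapping each id to a property dict, and add_value is a made-up stand-in for the whole Cypher statement.

```python
def add_value(graph, node_id, new_value):
    """Model of the OPTIONAL MATCH / FOREACH / SET query.

    - node missing         -> create it with A = [new_value]
    - property A missing   -> treat it as [] and append
    - new_value present    -> leave A unchanged (the NONE(...) filter fails)
    """
    node = graph.get(node_id)
    if node is None:
        graph[node_id] = {"id": node_id, "A": [new_value]}
        return graph[node_id]
    current = node.get("A", [])
    if new_value not in current:
        node["A"] = current + [new_value]
    return node


if __name__ == "__main__":
    g = {}
    add_value(g, "n1", "x")
    add_value(g, "n1", "x")
    add_value(g, "n1", "y")
    print(g)
```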

I combined some of the information on UNWIND with other troubleshooting and came up with the following Cypher query for removing duplicates from existing array properties:
match (n)
unwind n.system as x
with distinct x, n
with collect(x) as set, n
set n.system = set
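What UNWIND, WITH DISTINCT and collect() do to one node's array can be modelled in Python as a de-duplication. This is a sketch: dedupe_property is a hypothetical helper working on a property dict. It keeps first occurrences in order, which the Cypher query does not formally promise. It also leaves a missing or empty array alone, because UNWIND of null or [] produces no rows, so the SET never runs for that node.

```python
def dedupe_property(node, prop):
    """Model of UNWIND n.prop AS x WITH DISTINCT x, n WITH collect(x) AS set, n SET n.prop = set."""
    values = node.get(prop)
    if not values:
        return node  # UNWIND of null or [] yields no rows: the node is untouched
    node[prop] = list(dict.fromkeys(values))  # drop repeats, keep first occurrences
    return node


if __name__ == "__main__":
    print(dedupe_property({"A": ["ABC", "XYZ", "123", "ABC"]}, "A"))
```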

Once you've cleaned up existing duplicates, you can use this when adding new values:
match (n)
set n.A = [x in n.A where x <> "newValue"] + "newValue"
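Unlike the NONE(...) approach, this filter-then-append form always rewrites the array, so an existing value is moved to the end rather than left in place. A minimal Python model shows the difference; replace_append is a hypothetical helper, and None stands in for a missing property (in Cypher, null + "newValue" is null, so such nodes end up unchanged).

```python
def replace_append(values, new_value):
    """Model of SET n.A = [x IN n.A WHERE x <> new_value] + new_value."""
    if values is None:
        return None  # missing property: the whole expression evaluates to null
    return [x for x in values if x != new_value] + [new_value]


if __name__ == "__main__":
    print(replace_append(["XYZ", "ABC"], "XYZ"))
```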
