Remove duplicates from Node array properties

I have a property A on my nodes that holds an array of string values:
n.A=["ABC","XYZ","123","ABC"]
During merges I frequently write code similar to n.A = n.A + "New Value". As a result, I end up with duplicate values in my arrays; that is not insurmountable, but I'd like to avoid it.
How can I write a Cypher query that will remove all duplicate values from the array A? Some duplicates have already been inserted at this point, and I'd like to clean them up.
When adding new values to existing arrays, how can I make sure I only save a copy of the array with distinct values? (This may end up being the same logic used to solve my first question.)

The query to add a non-duplicate value can be done efficiently (in this example, I assume that the id and newValue parameters are provided):
OPTIONAL MATCH (n {id: {id}})
WHERE NONE(x IN n.A WHERE x = {newValue})
SET n.A = n.A + {newValue};
This query does not create a temporary array, and will only alter the n.A array if it does not already contain the {newValue} string.
[EDITED]
If you want to (a) create the n node if it does not already exist, and (b) append {newValue} to n.A only if {newValue} is not already in n.A, this should work:
OPTIONAL MATCH (n { id: {id} })
FOREACH (x IN (
CASE WHEN n IS NULL THEN [1] ELSE [] END ) |
CREATE ({ id: {id}, A: [{newValue}]}))
WITH n, CASE WHEN EXISTS(n.A) THEN n.A ELSE [] END AS nA
WHERE NONE (x IN nA WHERE x = {newValue})
SET n.A = nA + {newValue};
If the OPTIONAL MATCH fails, then the FOREACH clause will create a new node (with the {id} and an array containing {newValue}), and the following SET clause will do nothing because n would be NULL.
If the OPTIONAL MATCH succeeds, then the FOREACH clause will do nothing, and the following SET clause will append {newValue} to n.A iff that value does not already exist in n.A. If the SET should be performed, but the existing node did not already have the n.A property, then the query would concatenate an empty array to {newValue} (thus generating an array containing just {newValue}) and set that as the n.A value.
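The three cases described above (node missing, node present without A, node present with A) can be modeled outside the database. This is only a plain-Python sketch of the decision logic, not driver code: the dict named store stands in for the graph, and the function name add_value is hypothetical.

```python
def add_value(store, node_id, new_value):
    """Model of the create-or-append logic; `store` maps id -> properties."""
    node = store.get(node_id)
    if node is None:
        # OPTIONAL MATCH found nothing: create the node with a one-element array.
        store[node_id] = {"id": node_id, "A": [new_value]}
        return
    # Treat a missing A property as an empty array.
    current = node.get("A", [])
    if new_value not in current:
        # Append only when the value is not already present.
        node["A"] = current + [new_value]


store = {1: {"id": 1, "A": ["ABC", "XYZ"]}, 2: {"id": 2}}
add_value(store, 1, "ABC")  # already present: array unchanged
add_value(store, 1, "123")  # appended
add_value(store, 2, "ABC")  # node exists but has no A yet
add_value(store, 3, "ABC")  # node does not exist yet
print(store)
```

Running it shows that the repeated "ABC" is not added a second time, while the other three calls each produce a new array.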

I combined some of the information on UNWIND with other troubleshooting and came up with the following Cypher query for removing duplicates from existing array properties (here the array property is named system rather than A):
match (n)
unwind n.system as x
with distinct x, n
with collect(x) as set, n
set n.system = set
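For comparison, here is the same clean-up modeled in plain Python. It is a sketch of the idea (unwind, keep distinct values, collect again), not of how Neo4j executes the query; the function name dedupe is made up. The sketch keeps the first occurrence of each value in its original position, and I would not assume the Cypher query guarantees any particular order.

```python
def dedupe(values):
    """Drop repeated values, keeping the first occurrence of each."""
    # A dict preserves insertion order, so fromkeys acts as an ordered set.
    return list(dict.fromkeys(values))


cleaned = dedupe(["ABC", "XYZ", "123", "ABC"])
print(cleaned)
```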

Once you've cleaned up existing duplicates, you can use this when adding new values:
match (n)
set n.A = filter (x in n.A where x<>"newValue") + "newValue"
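The filter-then-append pattern is easy to check in plain Python. This is a sketch under the assumption that the array is already free of duplicates; add_unique is a hypothetical helper name. One side effect worth knowing: if the value is already present, it is moved to the end of the array rather than left in place.

```python
def add_unique(values, new_value):
    """Remove any existing copy of new_value, then append it once."""
    return [x for x in values if x != new_value] + [new_value]


print(add_unique(["ABC", "XYZ", "123"], "NEW"))  # value was absent
print(add_unique(["ABC", "XYZ", "123"], "XYZ"))  # value was present
```

If the position of existing values matters, the check-before-append approach from the first answer avoids the reordering.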

Related

Iterate through Neo4j graph matching on node properties

I have for example the following graph in Neo4j
(startnode)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#the line below can repeat itself 0..n times
(node)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#up to the endnode
(endnode)
There is also an Interface property I need to match on. I do not want to follow all the paths, just the ones whose Interface nodes have the property value I am looking for, for example Interface.VlanList CONTAINS ",23,".
I have done the following in Cypher, but it assumes that I already know how many iterations I am going to find, which in reality is not the case.
match (n:StartNode {name:"device name"}) -[:BELONG_TO]- (i:Interface) -[:IS_CONNECTED]- (ii:Interface)-[:BELONG_TO]-(nn:Node) -[:BELONG_TO]- (iii:Interface) -[:IS_CONNECTED]- (iiii:Interface) -[:BELONG_TO]-(nnn:Node)
where i.VlanList CONTAINS ",841,"
AND ii.VlanList CONTAINS ",841,"
AND iii.VlanList CONTAINS ",841,"
return n, i,ii,nn,iii,iiii,nnn
I have been looking at the documentation but cannot work out how to solve this.

This should work:
// put the searchstring in a variable
WITH ',841,' AS searchstring
// look up the start and end nodes
MATCH (startNode: .... {...}), (endNode: .... {...})
// look for paths of variable length
// that have your search string in all nodes,
// except the first and the last one
WITH searchstring,startNode,endNode
MATCH path=(startNode)-[:BELONG_TO|IS_CONNECTED*]-(endNode)
WHERE ALL(i IN nodes(path)[1..-1] WHERE i.VlanList CONTAINS searchstring)
RETURN path
You can also look at https://neo4j.com/labs/apoc/4.1/graph-querying/path-expander/ for more ideas about how you can limit the pathfinding.
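To see what the ALL(...) predicate over nodes(path)[1..-1] accepts and rejects, here is a small plain-Python model. The paths are made-up lists of property dicts and interior_nodes_match is a hypothetical name. In this sketch a node without a VlanList fails the check, which is also how I would expect the Cypher predicate to behave for a missing property, so treat that as an assumption to verify against your data.

```python
def interior_nodes_match(path, search):
    """True when every node except the first and last has `search` in VlanList."""
    interior = path[1:-1]  # same slice as nodes(path)[1..-1]
    return all(search in node.get("VlanList", "") for node in interior)


good = [
    {"name": "start"},
    {"VlanList": ",23,841,"},
    {"VlanList": ",841,"},
    {"name": "end"},
]
bad = [
    {"name": "start"},
    {"VlanList": ",23,841,"},
    {"name": "hop"},  # no VlanList on this intermediate node
    {"VlanList": ",841,"},
    {"name": "end"},
]

print(interior_nodes_match(good, ",841,"))
print(interior_nodes_match(bad, ",841,"))
print(interior_nodes_match(good, ",84,"))
```

The last call also shows why the search string carries the surrounding commas: ",84," does not match inside ",841,", whereas a bare "84" would.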

This query should work for you (assuming that the relationship directions I chose are correct):
MATCH p = (sNode:StartNode)-[:BELONG_TO]->(i1:Interface)-[:IS_CONNECTED]->(i2:Interface)-[:BELONG_TO]->(n1)-[:BELONG_TO|IS_CONNECTED*0..]->(eNode:Node)
WHERE sNode.name = "device name" AND eNode.name = "foo" AND LENGTH(p)%3 = 0
WITH p, i1, i2, n1, eNode, RELATIONSHIPS(p) AS rels, NODES(p) AS ns
WHERE n1 = eNode OR (
ALL(j IN RANGE(3, SIZE(rels)-3, 3) WHERE
'BELONG_TO' = TYPE(rels[j]) = TYPE(rels[j+2]) AND
'IS_CONNECTED' = TYPE(rels[j+1])) AND
ALL(x IN ([i1, i2] + REDUCE(s = [], i IN RANGE(3, SIZE(ns)-2, 3) | CASE WHEN i%3 = 0 THEN s ELSE s +ns[i] END))
WHERE x:Interface AND x.VlanList CONTAINS $substring)
)
RETURN p
It checks that the returned paths have the required pattern of node labels, node property value, and relationship types. It takes advantage of the variable-length relationship syntax, using zero as the lower bound. Since there is no upper bound, the variable-length relationship query can take "forever" to finish (in such a situation, you should use a reasonable upper bound).

Cypher to sum dynamic properties values

I have some dynamic data source property values on a node. There are around 5 possible data sources, and a node can have values for one or more of them. These properties are set via a @Properties annotation using a Map in Spring Data Neo4j. Once the node is saved, the values are set on the node as follows:
dataSources.ds1.count="10"
dataSources.ds2.count="20"
dataSources.ds3.count="30"
... etc
I want to be able to sum the values of these counts dynamically if they have been set and use this value to order the results.
Here is some of the starting Cypher I have so far, which lists the properties:
MATCH (n:Entity)
WITH n, [k in keys(n) where k =~ 'dataSources.(ds1|ds2).count'] as props
RETURN n.name, props
This returns:
╒═══════════╤═════════════════════════════════════════════════╕
│"n.name" │"props" │
╞═══════════╪═════════════════════════════════════════════════╡
│"ENT1" │["dataSources.ds1.count","dataSources.ds2.count"]│
├───────────┼─────────────────────────────────────────────────┤
│"ENT2" │["dataSources.ds1.count","dataSources.ds2.count"]│
├───────────┼─────────────────────────────────────────────────┤
│"ENT3" │["dataSources.ds1.count"] │
└───────────┴─────────────────────────────────────────────────┘
How can I use the property names to get the values and sum them?
I was thinking of something similar to this, where I could use apoc.coll.sum, but I am unsure how to iterate over the property names to get their values.
RETURN n.name, apoc.coll.sum([COALESCE(toInteger(n.`dataSources.ds1.count`), 0), etc]) as count
ORDER by count DESC
Any help would be much appreciated.

Here is one way:
MATCH (n:Entity)
RETURN
n.name,
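COALESCE(toInteger(n.`dataSources.ds1.count`), 0) + COALESCE(toInteger(n.`dataSources.ds2.count`), 0) AS count
ORDER BY count
The COALESCE function returns its first non-NULL argument, and toInteger converts the string-valued counts shown in the question to integers, so that + adds the numbers instead of concatenating strings.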
Or, if you pass the list of property names in a props parameter:
MATCH (n:Entity)
RETURN n.name, REDUCE(s = 0, p IN $props | s + COALESCE(toInteger(n[p]), 0)) AS count
ORDER BY count
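The REDUCE-with-COALESCE idea can be checked in plain Python. This is a sketch with made-up sample values (the question does not list the counts per entity), and total_count is a hypothetical name. Note the int() conversion: the counts in the question are stored as strings, so they need converting before they can be added.

```python
def total_count(props, keys):
    """Sum the named count properties, treating missing ones as 0."""
    # int() plays the role of toInteger; a missing key falls back to 0.
    return sum(int(props.get(key, 0)) for key in keys)


# Illustrative data only; the values are assumptions.
entities = [
    {"name": "ENT3", "dataSources.ds1.count": "10"},
    {"name": "ENT1", "dataSources.ds1.count": "10", "dataSources.ds2.count": "20"},
]
keys = ["dataSources.ds1.count", "dataSources.ds2.count"]

# Order by the computed total, largest first.
ranked = sorted(entities, key=lambda e: total_count(e, keys), reverse=True)
print([(e["name"], total_count(e, keys)) for e in ranked])
```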

Update nodes by a list of ids and values in one cypher query

I've got a list of ids and a list of values. I want to match each node by its id and set a property to the corresponding value.
With just one Node that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But I have a list of ids ['node1', 'node2', 'node3'] and the same number of values ['value1', 'value2', 'value3'] (for simplicity I used a pattern, but the values and ids vary a lot). My first approach was to use the query above and just call the database each time. That is no longer appropriate, since I now have thousands of ids, which would result in thousands of requests.
I came up with this approach, in which I iterate over each entry in both lists and set the values. The first node from the node list has to get the first value from the value list, and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last of the value list). The reason is in the last part, SET nodes.name=values[index]: I can't use the index on the left side (nodes[index].name doesn't work, and the database throws an error if I do so). I tried it with nodeList, node and n, but nothing worked. I'm not sure if this is the right way to achieve the goal; maybe there is a more elegant way.

Create pairs from the ids and values first, then use UNWIND and a simple MATCH ... SET query:
// The first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids,['value1', 'value2', 'value3'] AS values
// range() includes its upper bound, so stop at size(ids)-1
WITH [i in range(0, size(ids)-1) | {id:ids[i], value:values[i]}] as pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i in range(0, size(ids)-1) | {id:ids[i], value:values[i]}] as pairs
combines two concepts: list comprehensions and maps. Using a list comprehension (with the WHERE clause omitted), it converts the list of indexes into a list of maps with id and value keys.
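The pairing step can be tried out in plain Python before running it against the database. This is a sketch only: the nodes dict stands in for the graph, and make_pairs is a hypothetical name. Keep in mind that Python's range() excludes its end value while Cypher's range() includes it, so in Cypher the last index you want is size(ids)-1.

```python
def make_pairs(ids, values):
    """Pair each id with the value at the same index."""
    if len(ids) != len(values):
        raise ValueError("ids and values must have the same length")
    return [{"id": ids[i], "value": values[i]} for i in range(len(ids))]


pairs = make_pairs(["node1", "node2", "node3"], ["value1", "value2", "value3"])

nodes = {"node1": {}, "node2": {}, "node3": {}}
for pair in pairs:                     # UNWIND pairs AS pair
    node = nodes.get(pair["id"])       # MATCH ... WHERE n.id = pair.id
    if node is not None:
        node["value"] = pair["value"]  # SET n.value = pair.value

print(nodes)
```

Building the list of maps on the client and passing it as a single parameter would let the query start directly at the UNWIND.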

Find Nodes NOT MATCHed by a Query

I run a Cypher query and update labels of the nodes matching certain criteria. I also want to update the nodes that do not meet those criteria in the same query, before I update the matched ones. Is there a construct in Cypher that can help me achieve this?
Here is a concrete formulation. I have a pool of labels from which I choose and assign to nodes. When I run a certain query, I assign one of those labels, l, to the nodes returned under the conditions specified by WHERE clause in the query. However, l could have been assigned to other nodes previously, and I want to rid all those nodes of l which are not the result of this query.
The conditions in WHERE clause could be arbitrary; hence simple negation would probably not work. An example code is as follows:
MATCH (v)
WHERE <some set of conditions>
// here I want to remove 'l' from the nodes
// not satisfied by the above condition
SET v:l
I have solved this problem by using a temporary label x through this process:
Assign x to v.
Remove l from all nodes.
Assign l to all nodes containing x.
Remove x from all nodes.
Is there a better way to achieve this in Cypher?

This seems like one reasonable solution:
MATCH (v)
WITH REDUCE(s = {a:[], d:[]}, x IN COLLECT(v) |
CASE
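WHEN <some set of conditions> AND NOT('l' IN LABELS(x)) THEN {a: s.a+x, d: s.d}
WHEN NOT(<some set of conditions>) AND 'l' IN LABELS(x) THEN {a: s.a, d: s.d+x}
// nodes that are already OK leave the accumulator unchanged
ELSE s
END) AS actions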
FOREACH (a IN actions.a | SET a:l)
FOREACH (d IN actions.d | REMOVE d:l)
The above query tests every node, and remembers in the actions.a list the nodes that need the l label but do not yet have it, and in the actions.d list the nodes that have the label but should not. Then it performs the appropriate action for each list, without updating any nodes that are already OK.
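The two-list bookkeeping described above can be modeled in plain Python to check which nodes end up in each list. This is a sketch, not Neo4j code: nodes are plain dicts, the condition is an arbitrary example (score > 3), and plan_label_changes is a hypothetical name.

```python
def plan_label_changes(nodes, should_have, label):
    """Split nodes into those needing the label added and those needing it removed."""
    to_add, to_remove = [], []
    for node in nodes:
        has_label = label in node["labels"]
        wanted = should_have(node)
        if wanted and not has_label:
            to_add.append(node)     # corresponds to actions.a
        elif has_label and not wanted:
            to_remove.append(node)  # corresponds to actions.d
        # Nodes that are already correct are left out of both lists.
    return to_add, to_remove


nodes = [
    {"name": "a", "labels": set(), "score": 5},  # needs the label
    {"name": "b", "labels": {"l"}, "score": 5},  # already correct
    {"name": "c", "labels": {"l"}, "score": 1},  # has it but should not
    {"name": "d", "labels": set(), "score": 1},  # already correct
]
to_add, to_remove = plan_label_changes(nodes, lambda n: n["score"] > 3, "l")
for n in to_add:
    n["labels"].add("l")
for n in to_remove:
    n["labels"].discard("l")

print([n["name"] for n in nodes if "l" in n["labels"]])
```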

Cannot access the property value after creation Neo4j

My problem arises after I insert several nodes with uri properties of the form:
"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Red"
Later, I want to extract the part after the hash sign (which is "Red" in this case), so I use the split function and create a name property with the value tail(records):
MATCH (n) WITH split (n.uri, '#') AS records, n
WHERE head(records) = 'http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine'
SET n.name = tail(records)
After successfully creating the name property with the appropriate value for a set of nodes, I check the keys (everything is fine so far):
Match (n) Return keys(n)
I create a label (concept) for all the nodes:
MATCH (n) SET n :concept RETURN n
Later, I try to access the nodes by the value of the name property:
Match (n{name: 'Red'}) RETURN n
or
Match (n:concept{name: 'Red'}) RETURN n
I get an empty response (evidently this is not connected to the label I created, as I couldn't access the value even before that). I would appreciate your help. Thank you!

The TAIL(x) function returns a list (of all values in x after the first), not a scalar value. Therefore, the name in your example would have the value ["Red"], not "Red".
Instead of:
SET n.name = tail(records)
your Cypher code should have used the following (assuming your uri values always embed a "#"):
SET n.name = records[1]
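The difference between the two expressions is the same as the difference between a slice and an index in Python. The snippet below is only an analogy in plain Python (records[1:] standing in for tail(records)), using the uri from the question.

```python
uri = "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Red"
records = uri.split("#")

as_tail = records[1:]  # like tail(records): a list of the remaining items
as_item = records[1]   # like records[1]: the single string

print(as_tail)
print(as_item)
print(as_tail == "Red", as_item == "Red")
```

A lookup for the string "Red" only matches the second form, which is why the query on name: 'Red' returned nothing.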
