I need an equivalent of Postgres string_agg, Oracle listagg, or MySQL group_concat in Cypher. My Cypher query is a stored procedure returning a stream of strings (replaced in the following minimized example by unwinding a collection). As a result, I want to get a single string that is the concatenation of all the strings.
Example:
UNWIND ['first','second','third'] as c
RETURN collect(c)
Actual result (of list type):
["first", "second", "third"]
Expected result (of String type):
"firstsecondthird"
or (nice-to-have):
"firstCUSTOMSEPARATORsecondCUSTOMSEPARATORthird"
(I just wanted to quickly build an ad hoc query to verify some postprocessing actually performed by a Neo4j consumer, and I thought it would be simple, but I cannot find any solution. Unlike this answer, I want a string, not a collection, since my issue concerns the length of the concatenated string.)
How about using APOC?
UNWIND ['first','second','third'] as c
RETURN apoc.text.join(collect(c), '-') AS concat
Where - is your custom separator.
Result:
╒════════════════════╕
│"concat"            │
╞════════════════════╡
│"first-second-third"│
└────────────────────┘
--
NO APOC
Take into account the case when the collection contains only one element:
UNWIND ['first','second','third'] as c
WITH collect(c) AS elts
RETURN CASE WHEN size(elts) > 1
THEN elts[0] + reduce(x = '', z IN tail(elts) | x + '-' + z)
ELSE elts[0]
END
You can simplify the query further as shown below. There is no need to worry about an empty list: the query returns null if l is empty.
WITH ['first','second','third'] as l
RETURN REDUCE(s = HEAD(l), n IN TAIL(l) | s + n) AS result
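If you also need a custom separator without APOC, the same reduce pattern can insert it between the elements (a sketch, using - as the separator):

WITH ['first','second','third'] AS l
RETURN reduce(s = head(l), n IN tail(l) | s + '-' + n) AS result
// returns "first-second-third"

Starting the accumulator at head(l) and iterating over tail(l) avoids a leading separator; for an empty list both head and the result are null.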
Related
The &amp;amp; string has somehow got through different imports into the db on many different node properties and relationship properties. How do I replace all &amp;amp; strings with the regular &amp; character?
I don't know all the possible property names that I can filter on.
If you want to make this efficient, you can use CALL{} in transactions of X rows
The :auto prefix is needed if you want to run this query in the Neo4j browser
This line
WITH n, [x in keys(n) WHERE n[x] CONTAINS '&amp;amp;'] AS keys
is needed to avoid calling the replace function on a property that is not of String type, in which case Neo4j would throw an exception.
Full query
:auto MATCH (n)
CALL {
  WITH n
  WITH n, [x in keys(n) WHERE n[x] CONTAINS '&amp;amp;'] AS keys
  CALL apoc.create.setProperties(n, keys, [k in keys | replace(n[k], '&amp;amp;', '&amp;')])
  YIELD node
  RETURN node
} IN TRANSACTIONS OF 100 ROWS
RETURN count(*)
If you're using a Neo4j cluster, you will need to run this on the leader of the database, with a bolt:// connection (not using the neo4j:// protocol).
Same query for the relationships now
:auto MATCH (n)-[r]->(x)
CALL {
  WITH r
  WITH r, [k in keys(r) WHERE r[k] CONTAINS '&amp;amp;'] AS keys
  CALL apoc.create.setRelProperties(r, keys, [k in keys | replace(r[k], '&amp;amp;', '&amp;')])
  YIELD rel
  RETURN rel
} IN TRANSACTIONS OF 100 ROWS
RETURN count(*)
You can find the answer in this documentation:
https://neo4j.com/labs/apoc/4.3/overview/apoc.create/apoc.create.setRelProperties/
For example, the query below will replace &amp;amp; with &amp; in all properties of all nodes.
MATCH (p)
// collect the keys (properties) of node p and keep those whose value contains &amp;amp;
WITH p, [k in keys(p) WHERE p[k] CONTAINS '&amp;amp;'] AS keys WHERE size(keys) > 0
// use the APOC procedure to update the value under each property key
CALL apoc.create.setProperties(p, keys, [k in keys | replace(p[k], '&amp;amp;', '&amp;')])
YIELD node
RETURN node
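After running one of these cleanups, you can verify that nothing was missed by counting the nodes that still contain the entity (a sketch; it scans every node, so expect it to be slow on large databases):

MATCH (n)
WHERE any(k IN keys(n) WHERE n[k] CONTAINS '&amp;amp;')
RETURN count(n) AS remaining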
Assume that in an application, the user gives us a graph and we want to consider it as a pattern and find all occurrences of the pattern in the neo4j database. If we knew what the pattern is, we could write the pattern as a Cypher query and run it against our database. However, now we do not know what the pattern is beforehand and receive it from the user in the form of a graph. How can we perform a pattern matching on the database based on the given graph (pattern)? Is there any apoc for that? Any external library?
One way of doing this is to decompose your input graph into edges and create a dynamic Cypher query from it. I worked on this quite some time ago, and the solution below is not perfect, but it indicates a possible direction.
For example, if you feed this graph:
and you take the id(node) values from the graph (I am not taking the relationship ids; this is one of the imperfections),
this query
WITH $nodeids AS selection
UNWIND selection AS s
WITH COLLECT (DISTINCT s) AS selection
WITH selection,
SPLIT(left('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',SIZE(selection)*2-1),",") AS nodeletters
WITH selection,
nodeletters,
REDUCE (acc="", nl in nodeletters |
CASE acc
WHEN "" THEN acc+nl
ELSE acc+','+nl
END) AS rtnnodes
MATCH (n) WHERE id(n) IN selection
WITH COLLECT(n) AS nodes,selection,nodeletters,rtnnodes
UNWIND nodes AS n
UNWIND nodes AS m
MATCH (n)-[r]->(m)
WITH DISTINCT "("
+nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(n) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(n)| acc + ':'+ p))+")-[:"+type(r)+"]->("
+ nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(m) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(m)| acc + ':'+ p))+")" as z,rtnnodes
WITH COLLECT(z) AS parts,rtnnodes
WITH REDUCE(y=[], x in range(0, size(parts)-1) | y + replace(parts[x],"[","[r" + (x+1))) AS parts2,
REDUCE (acc="", x in range(0, size(parts)-1) | CASE acc WHEN "" THEN acc+"r"+(x+1) ELSE acc+",r"+(x+1) END) AS rtnrels,
rtnnodes
RETURN
REDUCE (acc="MATCH ",p in parts2 |
CASE acc
WHEN "MATCH " THEN acc+p
ELSE acc+','+p
END)+
" RETURN "+
rtnnodes+","+rtnrels+
" LIMIT "+$limit
AS cypher
returns something like
cypher: "MATCH (a:Person)-[r1:DRIVES]->(b:Car),(a:Person)-[r2:KNOWS]->(c:Person) RETURN a,b,c,r1,r2 LIMIT 50"
which you can feed to the next query.
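For example, you can execute the generated string directly with the apoc.cypher.run procedure (a sketch; the empty map stands for the query parameters):

WITH 'MATCH (a:Person)-[r1:DRIVES]->(b:Car),(a:Person)-[r2:KNOWS]->(c:Person) RETURN a,b,c,r1,r2 LIMIT 50' AS cypher
CALL apoc.cypher.run(cypher, {}) YIELD value
RETURN value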
In Graphileon, you can just select the nodes, and the result will be visualized as well.
Disclosure : I work for Graphileon
I have used patterns in genealogy queries.
The X-chromosome is not transmitted from father to son. As you traverse a family tree, you can use the reduce function to create a concatenated string of the sexes of the ancestors. You can then accept only results that lack MM (father-son). This query gives all the descendants inheriting the X-chromosome of the ancestor with RN=32.
match p=(n:Person{RN:32})<-[:father|mother*..99]-(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return distinct m.fullname as Fullname
I am developing other pattern specific queries as part of a Neo4j PlugIn for genealogy. These will include patterns of triangulation groups.
GitHub repository for Neo4j Genealogy PlugIn
Context
I would like to read from a csv file into my database and create nodes and connections. For the Order nodes to be created, one of the fields to read is a stuffed list of Products (a relational key), i.e. it looks like "[123,456,789]", where the numbers are the product ids.
Reading the data into the db, I have no problem creating nodes for the Orders and the Products; in another iteration I now want to create the edges by unwinding the list of products in each Order and linking to the corresponding Products.
Best would be if I could convert the string containing the list into a list of ints at creation time of the Order nodes, so that a simple loop over these values, matching the Product nodes, would do the trick (this would also be better for storage efficiency).
Problem
However, I cannot figure out how to convert the said string into a list of ints. All my attempts at coming up with a Cypher query for this have failed. I will post some of them below, starting from the string l:
WITH '[123,456,789]' as l
WITH split(replace(replace(l,'[',''),']',''),',') as s
UNWIND s as ss
COLLECT(toInteger(ss) ) as k
return k
WITH '[123,456,789]' as l
WITH split(replace(replace(l,'[',''),']',''),',') as s, [] as k
FOREACH(ss IN s| SET k = k + toInteger(ss) )
return k
both statements failing.
EDIT
I have found a partial solution, which I am not quite satisfied with, as it applies only to my task at hand and is not a solution to the more general problem of this list conversion.
I found out that one can create an empty list as a property of a node, which can be successively updated:
CREATE (o:Order {k: []})
WITH o, '[123,456]' as l
WITH o, split(replace(replace(l,'[',''),']',''),',') as s
FOREACH(ss IN s | SET o.k= o.k + toInteger(ss) )
RETURN o.k
Strangely, this will only work on properties of nodes, but not on bound variables (see above).
Since the input string looks like a valid JSON array, you can simply use the apoc.convert.fromJsonList function from the APOC library:
WITH "[123,456,789]" AS l
RETURN apoc.convert.fromJsonList(l)
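Building on that, you can unwind the parsed list directly to link each Order to its Products (a sketch; the productList property, the Product id property, and the CONTAINS relationship type are assumptions based on the question):

MATCH (o:Order)
UNWIND apoc.convert.fromJsonList(o.productList) AS pid
MATCH (p:Product {id: pid})
MERGE (o)-[:CONTAINS]->(p)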
You can use substring() to trim out the brackets at the start and the end.
This approach will allow you to create a list of the ints:
WITH '[123,456,789]' as nums
WITH substring(nums, 1, size(nums)-2) as nums
WITH split(nums, ',') as numList
RETURN numList
You can of course perform all these operations at once, and then UNWIND the subsequent list, convert them to ints, and match them to products:
WITH '[123,456,789]' as nums
UNWIND split(substring(nums, 1, size(nums)-2), ',') as numString
WITH toInteger(numString) as num
MATCH (p:Product {id:num})
...
EDIT
If you just want to convert this to a list of integers, you can use list comprehension to do this once you have your list of strings:
WITH '[123,456,789]' as nums
WITH split(substring(nums, 1, size(nums)-2), ',') as numStrings
WITH [str in numStrings | toInteger(str)] as nums
...
MATCH (c:someNode) WHERE LOWER(c.erpId) CONTAINS LOWER("1")
OR LOWER(c.constructionYear) CONTAINS LOWER("1")
OR LOWER(c.label) CONTAINS LOWER("1")
OR LOWER(c.name) CONTAINS LOWER("1")
OR LOWER(c.description) CONTAINS LOWER("1")
WITH collect(DISTINCT c) AS rows, count(c) AS total
MATCH (c:someNode)-[adtype:OFFICIAL_someNode_ADDRESS]->(ad:anotherObject)
WHERE toString(ad.streetAddress) CONTAINS "1"
OR toString(ad.postalCity) CONTAINS "1"
WITH DISTINCT rows + collect(c) AS rows, count(c) + total AS total
UNWIND rows AS part
RETURN part ORDER BY part.name SKIP 20 LIMIT 20
When I run the above Cypher query, it returns duplicate results. Also, the SKIP does not seem to work. What am I doing wrong?
When you use WITH DISTINCT a, b, c (or RETURN DISTINCT a, b, c), that just means that you want each resulting record ({a: ..., b: ..., c: ...}) to be distinct -- it does not affect in any way the contents of any lists that may be part of a, b, or c.
Below is a simplified query that might work for you. It does not use the LOWER() and toString() functions at all, as they appear to be superfluous. It also uses only a single MATCH/WHERE pair to find all the nodes of interest. The pattern comprehension syntax is used as part of the WHERE clause to get a non-empty list of true value(s) iff there are any anotherObject node(s) of interest. Notice that DISTINCT is not needed.
MATCH (c:someNode)
WHERE
ANY(
x IN [c.erpId, c.constructionYear, c.label, c.name, c.description]
WHERE x CONTAINS "1") OR
[(c)-[:OFFICIAL_someNode_ADDRESS]->(ad:anotherObject)
WHERE ad.streetAddress CONTAINS "1" OR ad.postalCity CONTAINS "1"
| true][0]
RETURN c AS part
ORDER BY part.name SKIP 20 LIMIT 20;
I have a property A on my nodes that holds an array of string values:
n.A=["ABC","XYZ","123","ABC"]
During merges I will frequently write code similar to n.A = n.A + "New Value". The problem I've been running into is that I end up with duplicate values in my arrays; not insurmountable, but I'd like to avoid it.
How can I write a cypher query that will remove all duplicate values from the array A? Some duplicates have already been inserted at this point, and I'd like to clean them up.
When adding new values to existing arrays how can I make sure I only save a copy of the array with distinct values? (may end up being the exact same logic as used to solve my first question)
The query to add a non-duplicate value can be done efficiently (in this example, I assume that the id and newValue parameters are provided):
OPTIONAL MATCH (n {id: {id}})
WHERE NONE(x IN n.A WHERE x = {newValue})
SET n.A = n.A + {newValue};
This query does not create a temporary array, and will only alter the n.A array if it does not already contain the {newValue} string.
[EDITED]
If you want to (a) create the n node if it does not already exist, and (b) append {newValue} to n.A only if {newValue} is not already in n.A, this should work:
OPTIONAL MATCH (n { id: {id} })
FOREACH (x IN (
CASE WHEN n IS NULL THEN [1] ELSE [] END ) |
CREATE ({ id: {id}, A: [{newValue}]}))
WITH n, CASE WHEN EXISTS(n.A) THEN n.A ELSE [] END AS nA
WHERE NONE (x IN nA WHERE x = {newValue})
SET n.A = nA + {newValue};
If the OPTIONAL MATCH fails, then the FOREACH clause will create a new node (with the {id} and an array containing {newValue}), and the following SET clause will do nothing, because n would be NULL.
If the OPTIONAL MATCH succeeds, then the FOREACH clause will do nothing, and the following SET clause will append {newValue} to n.A iff that value does not already exist in n.A. If the SET should be performed, but the existing node did not already have the n.A property, then the query would concatenate an empty array to {newValue} (thus generating an array containing just {newValue}) and set that as the n.A value.
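With the $-parameter syntax of newer Neo4j versions, the same create-or-append logic can be sketched with MERGE instead of the FOREACH trick (the unlabeled MERGE is used here only for illustration; in practice you would add a label):

MERGE (n {id: $id})
ON CREATE SET n.A = [$newValue]
ON MATCH SET n.A = CASE
  WHEN $newValue IN coalesce(n.A, []) THEN n.A
  ELSE coalesce(n.A, []) + $newValue
END

The coalesce(n.A, []) handles a matched node that does not yet have the n.A property.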
I combined some of the information on UNWIND with other troubleshooting and came up with the following Cypher query for removing duplicates from existing array properties.
MATCH (n)
UNWIND n.system AS x
WITH DISTINCT x, n
WITH collect(x) AS deduped, n
SET n.system = deduped
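If APOC is available, the same cleanup can be done in a single SET with the apoc.coll.toSet function, which returns the list with its duplicates removed:

MATCH (n)
WHERE n.system IS NOT NULL
SET n.system = apoc.coll.toSet(n.system)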
Once you've cleaned up existing duplicates, you can use this when adding new values:
MATCH (n)
SET n.A = [x IN n.A WHERE x <> "newValue"] + "newValue"