neo4j: List of ints from plain string representation - neo4j

Context
I would like to read from a csv-file into my database and create nodes and connections. For the to be created order nodes, one of the fields to read is a stuffed list of Products (relational key), i.e. looks like this "[123,456,789]" where the numbers are the product ids.
Now reading the data into the db I have no problem to create nodes for the Orders and the Products; going over another iteration I now want to create the edges by kind of unwinding the list of products in the Order and linking to the according products.
Best would be if I could at creation time of the Order-nodes convert the string containing the list into a list of ints, so that a simple loop over these values and matching the Product-nodes would do the trick (also for storage efficiency this would be better).
Problem
However I cannot figure out how to convert the said string into the said format of a list containing ints. All my attempts with coming up with a cypher for this failed miserably. I will post some of them below, starting from the string l:
WITH '[123,456,789]' as l
WITH split(replace(replace(l,'[',''),']',''),',') as s
UNWIND s as ss
COLLECT(toInteger(ss) ) as k
return k
WITH '[123,456,789]' as l
WITH split(replace(replace(l,'[',''),']',''),',') as s, [] as k
FOREACH(ss IN s| SET k = k + toInteger(ss) )
return k
both statements failing.
EDIT
I have found a partial solution, I am however not quite satisfied with as it applied only to my task at hand, but is not a solution to the more general problem of this list conversion.
I found out that one can create an empty list as an property of a node, which can be successively updated:
CREATE (o:Order {k: []})
WITH o, '[123,456]' as l
WITH o, split(replace(replace(l,'[',''),']',''),',') as s
FOREACH(ss IN s | SET o.k= o.k + toInteger(ss) )
RETURN o.k
strangly this will only work on properties of nodes, but not on bound variables (see above)

Since the input string looks like a valid JSON object, you can simple use the apoc.convert.fromJsonList function from the APOC library:
WITH "[123,456,789]" AS l
RETURN apoc.convert.fromJsonList(l)

You can use substring() to trim out the brackets at the start and the end.
This approach will allow you to create a list of the ints:
WITH '[123,456,789]' as nums
WITH substring(nums, 1, size(nums)-2) as nums
WITH split(nums, ',') as numList
RETURN numList
You can of course perform all these operations at once, and then UNWIND the subsequent list, convert them to ints, and match them to products:
WITH '[123,456,789]' as nums
UNWIND split(substring(nums, 1, size(nums)-2), ',') as numString
WITH toInteger(numString) as num
MATCH (p:Product {id:num})
...
EDIT
If you just want to convert this to a list of integers, you can use list comprehension to do this once you have your list of strings:
WITH '[123,456,789]' as nums
WITH split(substring(nums, 1, size(nums)-2), ',') as numStrings
WITH [str in numStrings | toInteger(str)] as nums
...

Related

string aggregation in Cypher

I need an equivalent of Postgres string_agg, Oracle listagg or MySQL group_concat in Cypher. My Cypher query is a stored procedure returning stream of strings (in following minimized example replaced by collection unwind). As a result, I want to get single string with concatenation of all strings.
Example:
UNWIND ['first','second','third'] as c
RETURN collect(c)
Actual result (of list type):
["first", "second", "third"]
Expected result (of String type):
"firstsecondthird"
or (nice-to-have):
"firstCUSTOMSEPARATORsecondCUSTOMSEPARATORthird"
(I just wanted to quickly build ad hoc query to verify some postprocessing actually performed by Neo4j consumer and I thought it would be simple but I cannot find any solution. Unlike this answer I want string, not collection, since my issue is in something about length of concatenated string.)
How about using APOC ?
UNWIND ['first','second','third'] as c
RETURN apoc.text.join(collect(c), '-') AS concat
Where - is your custom separator.
Result :
╒════════════════════╕
│"concat" │
╞════════════════════╡
│"first-second-third"│
└────────────────────┘
--
NO APOC
Take into account when the collection is one element only
UNWIND ['first','second','third'] as c
WITH collect(c) AS elts
RETURN CASE WHEN size(elts) > 1
THEN elts[0] + reduce(x = '', z IN tail(elts) | x + '-' + z)
ELSE elts[0]
END
You can further simplify the query as below. And no need to worry if list is empty and it will return null if l is empty list
WITH ['first','second','third'] as l
RETURN REDUCE(s = HEAD(l), n IN TAIL(l) | s + n) AS result

Update nodes by a list of ids and values in one cypher query

I've got a list of id's and a list of values. I want to catch each node with the id and set a property by the value.
With just one Node that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But i have a list of id's ['node1', 'node2', 'node3'] and same amount of values ['value1', 'value2', 'value3'] (For simplicity i used a pattern but values and id's vary a lot). My first approach was to use the query above and just call the database each time. But nowadays this isn't appropriate since i got thousand of id's which would result in thousand of requests.
I came up with this approach that I iterate over each entry in both lists and set the values. The first node from the node list has to get the first value from the value list and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last of the value list). The reason is in the last part SET nodes.name=values[index] I can't use the index on the left side nodes[index].name - doesn't work and the database throws error if i would do so. I tried to do it with the nodeList, node and n. Nothing worked out well. I'm not sure if this is the right way to achieve the goal maybe there is a more elegant way.
Create pairs from the ids and values first, then use UNWIND and simple MATCH .. SET query:
// THe first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids,['value1', 'value2', 'value3'] AS values
WITH [i in range(0, size(ids)) | {id:ids[i], value:values[i]}] as pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i in range(0, size(ids)) | {id:ids[i], value:values[i]}] as pairs
combines two concepts - list comprehensions and maps. Using the list comprehension (with omitted WHERE clause) it converts list of indexes into a list of maps with id,value keys.

How to transform out-adjacency list to in-adajcency list?

For directed graphs G = (V,E). That representation maintains an array A[...] indexed by V , in which A[v] is a linked list. The linked list holds the names of all the nodes u to which v points, i.e., nodes u for which (v, u) ∈ E. (Technically, A[v] contains a pointer to the first item in the linked list).This is the default adjacency list format and can be thought of as an out- adjacency list representation
An in-adjacency list representation would be one in which A[v] is the list of nodes that point to v.
Can anyone help to give me a pseudocode for an O(|V | + |E|) algorithm that transforms the out-adjacency list representation into an in-adjacency list repre- sentation. And please explain why your algorithm is correct and why it runs in O(|V | + |E|) time.
You could do that in O(V + E) but for that you have to modify the insert operation in linked list to be done in constant time O(1).
That could be easily done, by keeping a separate pointer last, where last points to the last element inserted in a linked list. With help of last, insert operation in a linked list can be done in O(1),as opposed to usual O(N).
Now coming to the problem, lets say our new adjacency list is adj_new. We start by traversing our original adjacency list starting from the first linked list.
For each element x of linked list A[0], we do insert operation:
insert 0 in linked list adj_new[x]
We do the above for each of the linked lists. Since traversing the entire adjacency list takes time
O(V + E), and every insert operation takes O(1) time, the total time taken is O(V + E)
Following is the pseudocode :
For each linked list A[i]
{
for each element x of A[i]
{
append i to linked list adj_new[x]
}
}
adj_new[] is the in-adjacency list. If you look carefully its nothing but reversing the direction of each edge of your directed graph.

Cypher - How to query multiple Neo4j Node property fragments with "STARTS WITH"

I'm looking for a way to combine the Cypher "IN" and "STARTS WITH" query. In other words I'm looking for a way to look up nodes that start with specific string sequences that are provided as Array using IN.
The goal is to have the query run in as less as possible calls against the DB.
I browsed the documentation and played around with Neo4j a bit but wasn't able to combine the following two queries into one:
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WHERE a.prop_A IN [...Array of Strings]
RETURN a.prop_A, COLLECT ({result_b: b.prop_B})
and
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WHERE a.prop_A STARTS WITH 'String'
RETURN a.prop_A, b.prop_B
Is there a way to combine these two approaches?
Any help is greatly appreciated.
Krid
You'll want to make sure there is an index or unique constraint (whichever is appropriate) on your :Node_type_A(prop_A) to speed up your lookups.
If I'm reading your requirements right, this query may work for you, adding your input strings as appropriate (parameterize them if you can).
WITH [...] as inputs
UNWIND inputs as input
// each string in inputs is now on its own row
MATCH (a:Node_type_A)
WHERE a.prop_A STARTS WITH input
// should be an fast index lookup for each input string
WITH a
MATCH (a)-[]->(b:Node_type_B)
RETURN a.prop_A, COLLECT ({result_b: b.prop_B})
Something like this should work:
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WITH a.prop_A AS pa, b.prop_B AS pb
WITH pa, pb,
REDUCE(s = [], x IN ['a','b','c'] |
CASE WHEN pa STARTS WITH x THEN s + pb ELSE s END) AS pbs
RETURN pa, pbs;

Neo4j match nodes related to all nodes in collection

I have a graph of tags, which are related with each other. My goal is to create a Cypher query, which will return all tags that are related to an array of input tags via 1 or 2 hops.
I made a query, which doesn't work quite as intended.
MATCH (t:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t
MATCH (a:Tag)-[:RELATED*1..2]-(t)
RETURN DISTINCT a;
This query first finds the nodes A, B, C and then searches for tags, that are related to A, B or C via 1 node or less.
What I want to do though is to find tags, which are related to ALL three nodes (A, B and C).
I know I could concatenate MATCH and WITH statements, and do something like this:
MATCH (t:Tag)-[:RELATED*1..2]-(a:Tag)
WHERE t.name="A"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="B"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="C"
...
RETURN DISTINCT a;
But it runs painfully slow, when the number of input tags increase (in this case only 3 input tags: A, B, C).
So is there a way to make it in one query, similar to my first try?
Here is a solution that only requires a single MATCH clause.
MATCH (t:Tag)-[:RELATED*..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, COLLECT(DISTINCT other) AS others
WITH COLLECT(others) AS coll
RETURN FILTER(x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y)) AS res;
The query finds all the tags (other) that are "related" (by up to 2 steps) to each of your named tags (t).
It then uses aggregation to collect the distinct other nodes for each t. In this example, we end up with 3 others collections -- 1 for each t.
It then collects all the others collections into a single coll collection.
Finally, since the result set is supposed to be the intersection of every others collection, the query walks through the nodes in first others collection, and extracts the ones that are also in the remaining others collections. And, since each others collection already contains distinct nodes, the result must also have distinct nodes.
In addition, if you have a lot of tags, the above query can be made a bit faster by:
Creating an index (or uniqueness constraint, which automatically creates an index for you) on :Tag(name), and then
Specifying the use of that index in your query -- by inserting the following clause between the MATCH and WHERE clauses. Currently, the Cypher engine does not automatically use the index for this specific query.
USING INDEX t:Tag(name)
How about this one:
MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, collect(other.name) as others
WHERE ALL(x in ["A","B","C"] WHERE x in others)
RETURN t
The trick is put all the related nodes for t into a collection (others) and use the ALL predicate to make sure all of your A,B and C are part of that.
Here is an alternative:
MATCH shortestPath((t:Tag)<-[:RELATED*1..2]-(source:Tag)) //make sure there are no duplicate paths
WHERE source.name IN ["A","B","C"] AND NOT source.name = t.name //shortest path for identical node would throw an exception
WITH COLLECT(t) as tags //all tags that were reachable, with duplicates for reachable from multiple tags
UNWIND tags as tag //for each tag
WITH tag, tags //using with as match would be a drastic slowdown
WHERE size(filter(t IN tags WHERE ID(t) = ID(tag))) = 3 //if it is connected to all three, it must have been matched three times
RETURN DISTINCT m //since any match will still be in there 3 (or n) times
It first matches all reachable tags. All tags that were reachable from all tags in a list with the length n must have been matched n times if shortestPath is used. If you then filter by that criteria (present n times) the wanted tags can be retrieved with distinct.

Resources