How to identify a sorted list - neo4j

I want to know whether a list "A" has already been sorted by its values (in strictly ascending order). I thought about making a copy of the list (=> "B") and comparing it to "A" ordered by its values (using ASC). At the current state, I have no idea how to create a copy of a list. Maybe there is another easier way to solve this problem (using Cypher).

You can test the original list directly, which should be faster and use fewer resources than doing a new sort and/or creating a copy of the list.
For example, this will return true:
WITH [1,2,3,4] AS list
RETURN ALL(i IN RANGE(1, SIZE(list)-1) WHERE list[i-1] <= list[i]) AS inOrder
And if list was [4,1,2,3], then the above query would return false.
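Since the question asks for strictly ascending order, note that <= also accepts equal neighbours. A minimal sketch of the strict variant, assuming the same kind of literal list as above, swaps in <:

```cypher
// Strict check: every element must be greater than its predecessor
WITH [1,2,2,3] AS list
RETURN ALL(i IN RANGE(1, SIZE(list)-1) WHERE list[i-1] < list[i]) AS strictlyAscending
```

With the repeated 2 this returns false, whereas the <= version would return true for the same list.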

If you don't want to/can't use APOC you can sort the list by using
UNWIND list as item
WITH list,item ORDER BY item
It is a bit clunky because you need to carry forward all variables you need in all the WITH statements.
Example:
WITH [1,2,3,4] as list
UNWIND list as item
WITH list,item ORDER BY item
WITH list,collect(item) as sorted
return list,sorted,list=sorted
returns
╒═════════╤═════════╤═════════════╕
│"list"   │"sorted" │"list=sorted"│
╞═════════╪═════════╪═════════════╡
│[1,2,3,4]│[1,2,3,4]│true         │
└─────────┴─────────┴─────────────┘
However
WITH [1,2,3,1] as list
UNWIND list as item
WITH list,item ORDER BY item
WITH list,collect(item) as sorted
return list,sorted,list=sorted
returns
╒═════════╤═════════╤═════════════╕
│"list"   │"sorted" │"list=sorted"│
╞═════════╪═════════╪═════════════╡
│[1,2,3,1]│[1,1,2,3]│false        │
└─────────┴─────────┴─────────────┘
Or you could use the reduce function, which avoids sorting the list and iterates over it only once:
WITH [1,2,3,4] as list
WITH reduce(result = true, i in range(0,size(list)-2) | result AND list[i] <= list[i+1]) AS sorted
return sorted

There are collection functions in the APOC Procedures that can help you out, especially apoc.coll.sort() and apoc.coll.sortNodes() depending on whether they're primitive values or nodes.
As for creating a list copy, you can use a list comprehension to perform a projection of the elements of a list like so:
WITH [1,2,3,4,5] as list1
WITH list1, [n in list1 | n] as list2
...
As far as comparisons go, equality comparisons between lists do take ordering into account, so for the above, RETURN list1 = list2 would be true since they contain the same elements in the same order. But shuffling one of the lists like so: RETURN list1 = apoc.coll.shuffle(list2) would return false whenever the shuffle actually changes the order.
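The order sensitivity of list equality can also be seen without APOC. A small sketch using literal lists (the column names are just illustrative):

```cypher
// List equality compares element by element, in order
RETURN [1,2,3] = [1,2,3] AS sameOrder,
       [1,2,3] = [3,2,1] AS differentOrder
```

Here sameOrder is true and differentOrder is false, even though both lists contain the same elements.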

Related

Update nodes by a list of ids and values in one cypher query

I've got a list of ids and a list of values. I want to find each node by its id and set a property to the corresponding value.
With just one Node that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But I have a list of ids ['node1', 'node2', 'node3'] and the same number of values ['value1', 'value2', 'value3'] (for simplicity I used a pattern, but the values and ids vary a lot). My first approach was to use the query above and just call the database each time. But this is no longer appropriate, since I now have thousands of ids, which would result in thousands of requests.
I came up with this approach, in which I iterate over each entry in both lists and set the values. The first node from the node list has to get the first value from the value list, and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last one in the value list). The reason is in the last part, SET nodes.name=values[index]: I can't use the index on the left side (nodes[index].name doesn't work, and the database throws an error if I try). I tried it with nodeList, node and n, but nothing worked. I'm not sure whether this is the right way to achieve the goal; maybe there is a more elegant way.
Create pairs from the ids and values first, then use UNWIND and a simple MATCH .. SET query:
// The first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids, ['value1', 'value2', 'value3'] AS values
// range() is inclusive at both ends, so stop at size(ids)-1
WITH [i in range(0, size(ids)-1) | {id:ids[i], value:values[i]}] as pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i in range(0, size(ids)-1) | {id:ids[i], value:values[i]}] as pairs
combines two concepts: list comprehensions and maps. Using the list comprehension (with the WHERE clause omitted), it converts the list of indexes into a list of maps with the keys id and value.
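If the client can build the pairs itself, the zipping step can be skipped. The sketch below assumes a hypothetical query parameter named $pairs (not part of the original answer) holding a list of maps such as [{id: 'node1', value: 'value1'}, {id: 'node2', value: 'value2'}], and keeps the :Node label and value property used above:

```cypher
// $pairs is an assumed parameter: a list of {id, value} maps sent by the client
UNWIND $pairs AS pair
MATCH (n:Node {id: pair.id})
SET n.value = pair.value
RETURN count(n) AS updated
```

This keeps everything in a single request, and the returned count gives a quick check that every id matched a node.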

How to concatenate multiple lists in neo4j

I was wondering whether the following would be possible with Neo4j.
Suppose I have a class of nodes, say (event:Event), where every Event has a tags property ([String]).
Now I can return all those arrays just fine like:
MATCH (event:Event) RETURN event.tags
However, I don't yet understand how I could combine the output for the different node results into one list. Is such a thing possible with Cypher? Of course one could always solve this programmatically, but as far as I understand, Cypher offers reduce as well as native list addition.
If you can use the APOC library, use the flatten function for collections:
MATCH (event:Event)
RETURN apoc.coll.flatten(COLLECT(event.tags))
COLLECT(event.tags) will combine all results into a single list (a list of lists of tags).
apoc.coll.flatten(..) will flatten the list of lists into a single list.
If for some reason you can't use APOC, use reduce:
MATCH (event:Event)
RETURN REDUCE(s = [], tags IN COLLECT(event.tags) | s + tags)
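Another APOC-free option, if duplicate tags are not wanted, is to unwind each tag list and collect the distinct values. A minimal sketch, using a literal list of lists in place of the event.tags values so that it runs without any data:

```cypher
// Each inner list stands in for one event's tags property
UNWIND [['a','b'], ['b','c'], []] AS tags
UNWIND tags AS tag
RETURN collect(DISTINCT tag) AS uniqueTags
```

Against real data the first line would be replaced by MATCH (event:Event) followed by UNWIND event.tags AS tag. Events with an empty tags list simply contribute no rows.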
Map Projection may do most of what you're asking.
map projection documentation
You can start with a node and add to it.
MATCH (user:User)-[:TRIGGERED]->(event:Event)
WITH event {.*, user_id:user.user_id} as user_event
RETURN user_event
This would give you one map per event, containing all of the event's properties plus the added user_id key.

neo4j: List of ints from plain string representation

Context
I would like to read from a CSV file into my database and create nodes and connections. For the Order nodes to be created, one of the fields to read is a list of Products (relational key) packed into a string, i.e. it looks like "[123,456,789]", where the numbers are the product ids.
Reading the data into the db, I have no problem creating nodes for the Orders and the Products; in another iteration I now want to create the edges by unwinding the list of products in each Order and linking to the corresponding products.
Ideally I could, at creation time of the Order nodes, convert the string containing the list into a list of ints, so that a simple loop over these values, matching the Product nodes, would do the trick (this would also be better for storage efficiency).
Problem
However, I cannot figure out how to convert said string into a list of ints. All my attempts at coming up with a Cypher query for this failed miserably. I will post some of them below, starting from the string l:
WITH '[123,456,789]' as l
WITH split(replace(replace(l,'[',''),']',''),',') as s
UNWIND s as ss
COLLECT(toInteger(ss) ) as k
return k
WITH '[123,456,789]' as l
WITH split(replace(replace(l,'[',''),']',''),',') as s, [] as k
FOREACH(ss IN s| SET k = k + toInteger(ss) )
return k
Both statements fail.
EDIT
I have found a partial solution, but I am not quite satisfied with it, as it applies only to my task at hand and is not a solution to the more general problem of this list conversion.
I found out that one can create an empty list as a property of a node, which can then be updated successively:
CREATE (o:Order {k: []})
WITH o, '[123,456]' as l
WITH o, split(replace(replace(l,'[',''),']',''),',') as s
FOREACH(ss IN s | SET o.k= o.k + toInteger(ss) )
RETURN o.k
Strangely, this only works on properties of nodes, not on bound variables (see above).
Since the input string looks like a valid JSON array, you can simply use the apoc.convert.fromJsonList function from the APOC library:
WITH "[123,456,789]" AS l
RETURN apoc.convert.fromJsonList(l)
You can use substring() to trim off the brackets at the start and the end.
This approach gives you a list of the numbers, still as strings at this point:
WITH '[123,456,789]' as nums
WITH substring(nums, 1, size(nums)-2) as nums
WITH split(nums, ',') as numList
RETURN numList
You can of course perform all these operations at once, and then UNWIND the subsequent list, convert them to ints, and match them to products:
WITH '[123,456,789]' as nums
UNWIND split(substring(nums, 1, size(nums)-2), ',') as numString
WITH toInteger(numString) as num
MATCH (p:Product {id:num})
...
EDIT
If you just want to convert this to a list of integers, you can use list comprehension to do this once you have your list of strings:
WITH '[123,456,789]' as nums
WITH split(substring(nums, 1, size(nums)-2), ',') as numStrings
WITH [str in numStrings | toInteger(str)] as nums
...
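If the CSV field may contain spaces after the commas (an assumption; the original sample has none), trimming each piece before the conversion is a cheap safeguard. A minimal sketch combining the steps above:

```cypher
// Strip the brackets, split on commas, then trim and convert each piece
WITH '[123, 456, 789]' AS raw
WITH split(substring(raw, 1, size(raw) - 2), ',') AS parts
RETURN [p IN parts | toInteger(trim(p))] AS ids
```

The result is a list of integers that can be stored on the Order node or unwound to match Product nodes.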

Check if a sequence exists in collect in neo4j

Can somebody please tell me how we can check whether a sequence is present in a collection in Cypher / Neo4j?
For example, while collect() is collecting the elements on traversal, can we check whether the sequence [Element1, Element2, Element3] is present once the collection is done?
Depending on whether you allow gaps, you could either find the indexes of e1..e3 (apoc.coll.indexOf) and check that they are ascending (this allows gaps),
or you could extract 3-element sublists and compare them.
WITH [1,2,3,4,5] as coll, [2,3,4] as seq
WHERE any(idx IN range(0, size(coll)-size(seq)) WHERE coll[idx..idx+size(seq)] = seq) // size() is the list-length function; length() is meant for paths
RETURN coll, seq
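The same sliding-window comparison can be returned as a boolean instead of being used as a filter. A small sketch with a sequence that occurs only with a gap, to show that this variant requires contiguous elements:

```cypher
// [2,4] appears in coll only with a gap, so no contiguous window matches
WITH [1,2,3,4,5] AS coll, [2,4] AS seq
RETURN any(idx IN range(0, size(coll) - size(seq))
           WHERE coll[idx..idx + size(seq)] = seq) AS containsSeq
```

This returns false. With seq set to [2,3,4] it would return true.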

Neo4j match nodes related to all nodes in collection

I have a graph of tags, which are related with each other. My goal is to create a Cypher query, which will return all tags that are related to an array of input tags via 1 or 2 hops.
I made a query, which doesn't work quite as intended.
MATCH (t:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t
MATCH (a:Tag)-[:RELATED*1..2]-(t)
RETURN DISTINCT a;
This query first finds the nodes A, B, C and then searches for tags, that are related to A, B or C via 1 node or less.
What I want to do though is to find tags, which are related to ALL three nodes (A, B and C).
I know I could concatenate MATCH and WITH statements, and do something like this:
MATCH (t:Tag)-[:RELATED*1..2]-(a:Tag)
WHERE t.name="A"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="B"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="C"
...
RETURN DISTINCT a;
But it runs painfully slowly as the number of input tags increases (in this case there are only 3 input tags: A, B, C).
So is there a way to do it in one query, similar to my first try?
Here is a solution that only requires a single MATCH clause.
MATCH (t:Tag)-[:RELATED*..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, COLLECT(DISTINCT other) AS others
WITH COLLECT(others) AS coll
RETURN [x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y)] AS res;
The query finds all the tags (other) that are "related" (by up to 2 steps) to each of your named tags (t).
It then uses aggregation to collect the distinct other nodes for each t. In this example, we end up with 3 others collections -- 1 for each t.
It then collects all the others collections into a single coll collection.
Finally, since the result set is supposed to be the intersection of every others collection, the query walks through the nodes in first others collection, and extracts the ones that are also in the remaining others collections. And, since each others collection already contains distinct nodes, the result must also have distinct nodes.
In addition, if you have a lot of tags, the above query can be made a bit faster by:
Creating an index (or uniqueness constraint, which automatically creates an index for you) on :Tag(name), and then
Specifying the use of that index in your query -- by inserting the following clause between the MATCH and WHERE clauses. Currently, the Cypher engine does not automatically use the index for this specific query.
USING INDEX t:Tag(name)
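Creating the index mentioned above could look like the following. The exact syntax depends on the Neo4j version, and the index name tag_name is only an illustrative choice:

```cypher
// Neo4j 4.x and later; tag_name is a hypothetical index name
CREATE INDEX tag_name FOR (t:Tag) ON (t.name)

// Neo4j 3.x used the older form instead:
// CREATE INDEX ON :Tag(name)
```

Run only the form that matches your server version. Using PROFILE on the tag query afterwards shows whether the planner actually picks the index.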
How about this one:
MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, collect(other.name) as others
WHERE ALL(x in ["A","B","C"] WHERE x in others)
RETURN t
The trick is to put all the related nodes for t into a collection (others) and use the ALL predicate to make sure all of your A, B and C are part of that.
Here is an alternative:
MATCH shortestPath((t:Tag)<-[:RELATED*1..2]-(source:Tag)) //make sure there are no duplicate paths
WHERE source.name IN ["A","B","C"] AND NOT source.name = t.name //shortest path for identical node would throw an exception
WITH COLLECT(t) as tags //all tags that were reachable, with duplicates for reachable from multiple tags
UNWIND tags as tag //for each tag
WITH tag, tags //using with as match would be a drastic slowdown
WHERE size([t IN tags WHERE ID(t) = ID(tag)]) = 3 //if it is connected to all three, it must have been matched three times
RETURN DISTINCT tag //since any match will still be in there 3 (or n) times
It first matches all reachable tags. Any tag that is reachable from all n tags in the input list must have been matched n times if shortestPath is used. If you then filter by that criterion (present n times), the wanted tags can be retrieved with DISTINCT.
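The same counting idea can also be sketched with plain aggregation, counting how many distinct input tags reach each candidate. This is a variation rather than the answer's own query, and the variable names (names, other, hits) are illustrative:

```cypher
// Keep only tags reached from every one of the input tags
WITH ["A", "B", "C"] AS names
MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN names
WITH other, names, count(DISTINCT t) AS hits
WHERE hits = size(names)
RETURN other
```

Because count(DISTINCT t) ignores repeated paths from the same input tag, the comparison against size(names) works for any number of input tags without hard-coding 3.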
