Check if a sequence exists in a collection in Neo4j

Can somebody please tell me how to check whether a sequence is present in a collection in Cypher / Neo4j?
For example, while collect() gathers elements during a traversal, can we check whether the sequence [Element1, Element2, Element3] is present in the collected list?

Depending on whether you allow gaps, you could either find the index of each of e1..e3 (apoc.coll.indexOf) and check that the indexes are ascending, which permits gaps,
or you could extract 3-element sublists and compare them with the sequence.
WITH [1,2,3,4,5] AS coll, [2,3,4] AS seq
// size() measures lists; in Neo4j 4.0 and later, length() applies only to paths
WHERE any(idx IN range(0, size(coll)-size(seq)) WHERE coll[idx..idx+size(seq)] = seq)
RETURN coll, seq
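If gaps are allowed, one possible alternative to looking up indexes is a single greedy pass over the collection. The sketch below is not taken from the answer above; it uses plain Cypher reduce() with the illustrative names pos and matched, advancing pos each time the next expected element of seq is seen:

```cypher
WITH [1,2,3,4,5] AS coll, [2,4,5] AS seq
// pos counts how many leading elements of seq have been matched so far, in order
WITH coll, seq,
     reduce(pos = 0, x IN coll |
       CASE WHEN pos < size(seq) AND x = seq[pos] THEN pos + 1 ELSE pos END) AS matched
// the whole sequence was found (possibly with gaps) when every element was matched
RETURN matched = size(seq) AS containsWithGaps
```

For the sample lists this returns true, since 2, 4 and 5 occur in that order. Because it scans elements rather than first-occurrence indexes, repeated values in coll should not cause false negatives.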

Related

How to identify a sorted list

I want to know whether a list "A" has already been sorted by its values (in strictly ascending order). I thought about making a copy of the list (=> "B") and comparing it to "A" ordered by its values (using ASC). At the current state, I have no idea how to create a copy of a list. Maybe there is another easier way to solve this problem (using Cypher).
You can test the original list directly, which should be faster and use fewer resources than doing a new sort and/or creating a copy of the list.
For example, this will return true:
WITH [1,2,3,4] AS list
RETURN ALL(i IN RANGE(1, SIZE(list)-1) WHERE list[i-1] <= list[i]) AS inOrder
And if list was [4,1,2,3], then the above query would return false.
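The question asks for strictly ascending order, whereas <= also accepts equal neighbours. A minimal variant of the same check using < instead, shown on a sample list that contains a duplicate:

```cypher
WITH [1,2,2,3] AS list
// < rejects equal neighbours, so the repeated 2 makes this false
RETURN all(i IN range(1, size(list)-1) WHERE list[i-1] < list[i]) AS strictlyAscending
```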
If you don't want to/can't use APOC you can sort the list by using
UNWIND list as item
WITH list,item ORDER BY item
It is a bit clunky because you need to carry forward all variables you need in all the WITH statements.
Example:
WITH [1,2,3,4] as list
UNWIND list as item
WITH list,item ORDER BY item
WITH list,collect(item) as sorted
return list,sorted,list=sorted
returns
╒═════════╤═════════╤═════════════╕
│"list"   │"sorted" │"list=sorted"│
╞═════════╪═════════╪═════════════╡
│[1,2,3,4]│[1,2,3,4]│true         │
└─────────┴─────────┴─────────────┘
However
WITH [1,2,3,1] as list
UNWIND list as item
WITH list,item ORDER BY item
WITH list,collect(item) as sorted
return list,sorted,list=sorted
returns
╒═════════╤═════════╤═════════════╕
│"list"   │"sorted" │"list=sorted"│
╞═════════╪═════════╪═════════════╡
│[1,2,3,1]│[1,1,2,3]│false        │
└─────────┴─────────┴─────────────┘
Or you could use the reduce function, which avoids sorting and iterates over the list only once:
WITH [1,2,3,4] as list
WITH reduce(result = true, i in range(0,size(list)-2) | result AND list[i] <= list[i+1]) AS sorted
return sorted
There are collection functions in the APOC Procedures that can help you out, especially apoc.coll.sort() and apoc.coll.sortNodes() depending on whether they're primitive values or nodes.
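As a sketch of how that might look for a list of primitive values, assuming the APOC library is installed (the sorted and inOrder aliases are only illustrative):

```cypher
WITH [3,1,2] AS list
// apoc.coll.sort returns a sorted copy; the list is in order only if it equals that copy
RETURN list, apoc.coll.sort(list) AS sorted, list = apoc.coll.sort(list) AS inOrder
```

Like the ORDER BY approach, this comparison treats repeated values as sorted, so it tests non-strict ascending order.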
As for creating a list copy, you can use a list comprehension to perform a projection of the elements of a list like so:
WITH [1,2,3,4,5] as list1
WITH list1, [n in list1 | n] as list2
...
As far as comparisons go, equality comparisons between lists take ordering into account, so for the above, RETURN list1 = list2 would be true since they contain the same elements in the same order. Shuffling one of the lists, as in RETURN list1 = apoc.coll.shuffle(list2), would almost always return false, since the order would then differ unless the shuffle happens to reproduce the original order.
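A quick way to see the effect of ordering without APOC is to compare list literals directly (the column aliases are illustrative):

```cypher
// list equality is element-by-element and position-sensitive
RETURN [1,2,3] = [1,2,3] AS sameOrder,      // true
       [1,2,3] = [3,2,1] AS reversedOrder   // false
```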

Cypher - How to query multiple Neo4j Node property fragments with "STARTS WITH"

I'm looking for a way to combine the Cypher "IN" and "STARTS WITH" queries. In other words, I'm looking for a way to look up nodes whose property starts with any of several string prefixes, which are provided as an array as with IN.
The goal is to run the query with as few calls against the DB as possible.
I browsed the documentation and played around with Neo4j a bit but wasn't able to combine the following two queries into one:
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WHERE a.prop_A IN [...Array of Strings]
RETURN a.prop_A, COLLECT ({result_b: b.prop_B})
and
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WHERE a.prop_A STARTS WITH 'String'
RETURN a.prop_A, b.prop_B
Is there a way to combine these two approaches?
Any help is greatly appreciated.
Krid
You'll want to make sure there is an index or unique constraint (whichever is appropriate) on your :Node_type_A(prop_A) to speed up your lookups.
If I'm reading your requirements right, this query may work for you, adding your input strings as appropriate (parameterize them if you can).
WITH [...] as inputs
UNWIND inputs as input
// each string in inputs is now on its own row
MATCH (a:Node_type_A)
WHERE a.prop_A STARTS WITH input
// should be a fast index lookup for each input string
WITH a
MATCH (a)-[]->(b:Node_type_B)
RETURN a.prop_A, COLLECT ({result_b: b.prop_B})
Something like this should work:
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WITH a.prop_A AS pa, b.prop_B AS pb
WITH pa, pb,
REDUCE(s = [], x IN ['a','b','c'] |
CASE WHEN pa STARTS WITH x THEN s + pb ELSE s END) AS pbs
RETURN pa, pbs;
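Another way to express "IN combined with STARTS WITH" in a single query is the any() predicate. This is only a sketch: $prefixes is a hypothetical list parameter, and because the predicate is evaluated per node, the planner may scan the label rather than use the index, so it is worth comparing against the UNWIND version with PROFILE.

```cypher
// $prefixes is an assumed parameter, e.g. ['abc', 'xyz']
MATCH (a:Node_type_A)-[]->(b:Node_type_B)
WHERE any(prefix IN $prefixes WHERE a.prop_A STARTS WITH prefix)
RETURN a.prop_A, collect({result_b: b.prop_B})
```

Since nothing is unwound here, a node whose prop_A matches several prefixes still contributes each of its (a, b) rows only once.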

Query optimization for matching nodes with equal values

I want to collect all nodes that have the same property value
MATCH (rhs:DataValue),(lhs:DataValue) WHERE rhs.value = lhs.value RETURN rhs,lhs
I have created an index on the property
CREATE INDEX ON :DataValue(value)
the index is created:
Indexes
ON :DataValue(value) ONLINE
I have only 2570 DataValues.
match (n:DataValue) return count(n)
> 2570
However, the query takes ages/does not terminate within the timeout of my browser.
This surprises me, as I have an index and expected the query to run in O(n), with n being the number of nodes.
My train of thought is: if I implemented it myself, I could match all nodes in O(n), sort them by value in O(n log n), and then walk the sorted list and return all sublists longer than 1 in O(n). Thus, the time I could achieve is O(n log n). However, I expected the sorting to be covered by the index already.
How am I mistaken and how can I optimize this query?
Your complexity is actually O(n^2), since your match creates a cartesian product for rhs and lhs, and then does filtering for every single pairing to see if they are equal. The index doesn't apply in your query at all. You should be able to confirm that by running EXPLAIN or PROFILE on the query.
You'll want to tweak your query a little to get it to O(n). Hopefully in a future neo4j version query planning will be smarter so we don't have to be so explicit.
MATCH (lhs:DataValue)
WITH lhs
MATCH (rhs:DataValue)
WHERE rhs.value = lhs.value
RETURN rhs,lhs
Note that the results will include each pair in both orders ((A, B) and (B, A)), as well as every node paired with itself.
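If the goal is the groups of nodes sharing a value rather than every pair, as the sort-and-scan idea in the question suggests, aggregation avoids pairing altogether. A minimal sketch, where the aliases value and sameValueNodes are illustrative:

```cypher
MATCH (n:DataValue)
// collect() groups the nodes by the non-aggregated key, n.value
WITH n.value AS value, collect(n) AS sameValueNodes
// keep only values shared by more than one node
WHERE size(sameValueNodes) > 1
RETURN value, sameValueNodes
```

Each shared value then appears once with its nodes, with no mirrored or self pairs to filter out.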

cypher query to extract elements from property arrays

I have a question about extracting specific elements from array-valued properties in Neo4j. For example, suppose the nodes in our database each have a property 'Scores', an integer array of length 4. Is there a way to extract the first and fourth elements for every node in a path, i.e. can we do something along the lines of:
start src=node(1), end =node(7)
match path=src-[*..2]-end
return extract(n in nodes(path)| n.Scores[1], n.Scores[4]);
p.s. I am using Neo4j 2.0.0-RC1
Does this work for you?
START src=node(1), end=node(7)
MATCH path=src-[*..2]-end
RETURN extract(n in nodes(path)| [n.Scores[0], n.Scores[3]] )
Basically, that creates a collection for each node containing its 1st and 4th scores (indexes start at 0). See 8.2.1. Expressions in general
An expression in Cypher can be:
...
A collection of expressions:
["a", "b"], [1,2,3],["a", 2, n.property, {param}], [ ].
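The queries above use Cypher as it was in 2.0; extract() was removed in Neo4j 4.0 in favour of list comprehensions. The following sketch shows the same projection in that syntax, using literal maps (sampleNodes is an illustrative stand-in for nodes(path)) so that it runs without any data:

```cypher
WITH [{Scores: [10,20,30,40]}, {Scores: [1,2,3,4]}] AS sampleNodes
// for each element, build a two-item list of its first and fourth score (indexes 0 and 3)
RETURN [n IN sampleNodes | [n.Scores[0], n.Scores[3]]] AS firstAndFourth
```

This returns [[10,40],[1,4]]; against a real path, nodes(path) would take the place of sampleNodes.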

Querying multiple indexes not working if one condition fails in Neo4j

I am trying to search for a keyword on all the indexes I have in my graph database.
Below is the query:
start n=node:Users(Name="Hello"),
m=node:Location(LocationName="Hello")
return n,m
I get the nodes if the keyword "Hello" is present in both indexes (Users and Location), but I get no results if "Hello" is missing from either index.
Could you please let me know how to modify this cypher query so that I get results if "Hello" is present in any of the index keys (Name or LocationName).
In 2.0 you can use UNION and have two separate queries like so:
start n=node:Users(Name="Hello")
return n
UNION
start n=node:Location(LocationName="Hello")
return n;
The problem with the way you have the query written is the way it calculates a cartesian product of pairs between n and m, so if n or m aren't found, no results are found. If one n is found, and two ms are found, then you get 2 results (with a repeating n). Similar to how the FROM clause works in SQL. If you have an empty table called empty, and you do select * from x, empty; then you'll get 0 results, unless you do an outer join of some sort.
Unfortunately, it's somewhat difficult to do this in 1.9. I've tried many iterations of things like WITH collect(n) as n, etc., but it boils down to the cartesian product thing at some point, no matter what.
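On versions with labels (2.0 and later), the same UNION pattern can also be written with MATCH instead of legacy index lookups. A sketch, assuming the nodes carry the hypothetical labels User and Location and use the same property names as the index keys:

```cypher
// User and Location are assumed labels; adjust to the actual data model
MATCH (n:User {Name: "Hello"})
RETURN n
UNION
MATCH (n:Location {LocationName: "Hello"})
RETURN n
```

Each branch produces its rows independently, so an empty branch does not suppress the results of the other.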
