Neo4j - Cypher / strange output when dealing with an array as node property - neo4j

Using Cypher (Neo4j 2.1.2), it seems that array properties do not work well with aggregate functions.
For instance, I can have a RETURN clause like this:
RETURN meeting.title, count(participant) as number_part
Output: MyTitle 2
It well returns all the meetings's titles grouped by participants.
However, with an array as property rather than simple one like title, the output is strange:
RETURN meeting.arrayProperty, count(participant) as number_part
Output:
MyTitle [1,2,3] 1
MyTitle [1,2,3] 1 //not grouped by ...
Better than text, here's a graphgist I made to explain the issue, the workaround I found and what I really expect.
Does anyone know the reason? (maybe obvious...)

Just tried the following workaround: rebuild the array property as an collection:
RETURN extract(x in meeting.arrayProperty | x), count(participant) as number_part
Theory: the array property is handled as java native array whereas extract returns a collection (in Java sense). Comparing collections works based on comparing the elements whereas comparing a native array compares memory addresses which are different.

Related

(Neo4j 3.5) Executing query with two parts and the second one consists on a group of matches where at least one needs to be fulfilled

we are trying to run a cypher query but we are not able to get the results we want.
It is important to note that we cannot make this work with subqueries because we are using Neo4j 3.5 and in this version, they are still not available.
The problem is that we have two parts for a query, the first one will fix the variables for the second part, and the second part consists of multiple matches, and it has to get the results of the previous query and if at least one of the matches return a result for a row, this row is not filtered out, else if none return result it is discarded.
More specifically, the query we are trying to run is similar to the following one:
//First part of the query where we want to fix variables with the match and where
MATCH (u:User)-[:ASSIGNED_TO]->(t:Task)-[:PENDING]->(ob:Object)<-[:HAS_OPEN_OBJECT]-(do:DataObject)<-[:ASOCIATED]-(:Module)-[:CAN_LIST]->(view:WidgetObject)
WHERE u.uid = 'user_uid'
AND view.uid = 'view_uid'
AND view.object_name = do.object_type
with do, t, ob
// In this second part of the query we want to maintain the variables of the previous part and if at least one matches the value should be returned
// we have tried with UNION but we will need pagination, but even with union it's not working
MATCH (ac:Action)<-[:ASOCIATED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE ac.name CONTAINS 'body'
WITH COLLECT({data_object_uid: do.uid}) as act_filter
MATCH (c:Comment)<-[:COMMENTED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE c.body CONTAINS 'body'
WITH act_filter + COLLECT({data_object_uid: do.uid}) as comment_filter
MATCH (at:Attachment)<-[:HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE at.name CONTAINS 'body'
WITH comment_filter + COLLECT({data_object_uid: do.uid}) as attachment_filter
UNWIND attachment_filter as row
return row.data_object_uid
We are not sure if in the second part, the second and third matches are maintaining the same subset of results coming from the first part of the query.
A funny behavior we have found is that if we remove the last match we are getting results but if we add it, we are not getting any results. We do not understand this behavior because if the second match is returning results and they are stored in a variable after a collect, appending this to the next collected results should return something.
For example, if the second match returns as comment_filter [{data_object_uid: "343dienmd3-dasd"}] and the third match is not returning anything, after the concatenation in the WITH clause it should return the same thing, but the result is empty.
We need some light here, we don't know if we are close and we are making a stupid mistake or we are getting all wrong and we need to change the approach completely.
Since you do not know which of the three matches in the second part will yield results, I would try something along the lines below:
NOTE I used ASSOCIATED instead of ASOCIATED
MATCH (n)<-[:ASSOCIATED|COMMENTED|HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE
(n:Action AND n.name CONTAINS 'body')
OR
(n:Comment AND n.body CONTAINS 'body')
OR
(n:Attachment AND n.name CONTAINS 'body')
RETURN COLLECT(DISTINCT {data_object_uid: do.uid})

Cypher query with literal map syntax & dynamic keys

I'd like to make a cypher query that generates a specific json output. Part of this output includes an object with a dynamic amount of keys relative to the children of a parent node:
{
...
"parent_keystring" : {
child_node_one.name : child_node_one.foo
child_node_two.name : child_node_two.foo
child_node_three.name : child_node_three.foo
child_node_four.name : child_node_four.foo
child_node_five.name : child_node_five.foo
}
}
I've tried to create a cypher query but I do not believe I am close to achieving the desired output mentioned above:
MATCH (n)-[relone:SPECIFIC_RELATIONSHIP]->(child_node)
WHERE n.id='839930493049039430'
RETURN n.id AS id,
n.name AS name,
labels(n)[0] AS type,
{
COLLECT({
child.name : children.foo
}) AS rel_two_representation
} AS parent_keystring
I had planned for children.foo to be a count of how many occurrences of each particular relationship/child of the parent. Is there a way to make use of the reduce function? Where a report would generate based on analyzing the array proposed below? ie report would be a json object where each key is a distinct RELATIONSHIP and the property value would be the amount of times that relationship stems from the parent node?
Thank you greatly in advance for guidance you can offer.
I'm not sure that Cypher will let you use a variable to determine an object's key. Would using an Array work for you?
COLLECT([child.name, children.foo]) AS rel_two_representation
I think, Neo4j Server API output by itself should be considered as any database output (like MySQL). Even if it is possible to achieve, with default functionality, desired output - it is not natural way for database.
Probably you should look into creating your own server plugin. This allows you to implement any custom logic, with desired output.

.indexOf() equivalent in Neo4j Cypher

No matter how I swing it, I need some kind of function to find the index of a item in an array supplied as a parameter.
I am trying to simply update items in a collection based on the index of one of their properties in an array, and have been poring through Cypher docs for nearly 2 hours...
It would also be acceptable to order the items by that array, and then run a foreach on the ordered list...
Following #stefan-armbruster answer and great blog post, a slow but simple index_of can be done with:
reduce(x=[-1,0], i IN [1,2,7,5,21,5,1,435] |
CASE WHEN i = 21 THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END
)[0]
Here reduce function works with a two elements array: the position and the current index. If an element in your array matches the given condition, the first element of the reduced array will be replaced with the current index.
I put an example on neo4j console http://console.neo4j.org/?id=34byv
I've blogged about that recently. You can use the reduce function with an three element array as state variable containing
the index of the highest occupation so far
the current index (aka iteration number)
the value of highest occupation so far
As an example to find the index of max element in an array:
RETURN reduce(x=[0,0,0], i IN [1,2,2,5,2,1] |
CASE WHEN i>x[2] THEN [x[1],x[1]+1,o] ELSE [x[0], x[1]+1,x[2]] END
)[0]

Querying array element in node property

I have nodes within a the graph with property pathway storing an array of of values ranging from
path:ko00030
path:ko00010
.
.
path:koXXXXX
As an example, (i'm going to post in the batch import format:https://github.com/jexp/batch-import/tree/20)
ko:string:koid name definition l:label pathway:string_array pathway.name:string_array
ko:K00001 E1.1.1.1, adh alcohol dehydrogenase [EC:1.1.1.1] ko path:ko00010|path:ko00071|path:ko00350|path:ko00625|path:ko00626|path:ko00641|path:ko00830
the subsequent nodes might have a different combination of pathway values.
How do i query using CYPHER to retrieve all nodes with path:ko00010 in pathway
the closest i've gotten is using the solution provided for a different problem:
How to check array property in neo4j?
match (n:ko)--cpd
Where has(n.pathway) and all ( m in n.pathway where m in ["path:ko00010"])
return n,cpd;
but here only nodes with pathways matching exactly to the list provided are returned.
ie. if i were to query path:ko00010 like in the example above, I'll only be able to retrieve nodes holding path:ko00010 as the only element in the pathway property and not nodes containing path:ko00010 as well as other path:koXXXXX
In your query the extension of the predicate ALL is all the values in the property array, meaning that your query will only return those cases where every value in the pathway property array are found in the literal array ["path:ko00010"]. If I understand you right you want the opposite, you want to test that all values in the literal array ["path:ko00010"] are found in the property array pathway. If that's indeed what you want you can just switch their places, your WHERE clause will then be
WHERE HAS(n.pathway) AND ALL (m IN ["path:ko00010"] WHERE m IN n.pathway)
It is not strictly correct to say that your query only matches cases where the array you ask for and the property array are exactly the same. You could have had more than one value in the literal array, something like ["path:ko00010","path:ko00020"], and nodes with only one one of those values in their pathway array would also have matched–as long as all values in the property array could be found in the literal array. Conversely, with the altered WHERE filter that I've suggested, the query will match any node that has all of the values of the literal array in their pathway property.
If you want to filter the matched patterns with an array of values where all of them have to be present, this is good. In your example you only use one value, however, and for those queries there is no reason to use an array and the ALL predicate. You can simply do
WHERE HAS(n.pathway) and "path:ko00010" IN n.pathway
If in some context you want to include results where any of a set of values are found in the pathway property array you can just switch from ALL to ANY
WHERE HAS(n.pathway) AND ANY (m IN ["path:ko00010","path:ko00020"] WHERE m IN n.pathway)
Also, you probably don't need to check for the presence of the pathway property, unless you have some special use for it you should be fine without the HAS(n.pathway).
And once you've got the queries working right, try to switch out literal strings and arrays for parameters!
WHERE {value} IN n.pathway
// or
WHERE ALL (m IN {value_array} WHERE m IN n.pathway)
// or
WHERE ANY (m IN {value_array} WHERE m IN n.pathway)

Cypher Query combining results that can be ordered as a whole

I have a Cypher query that combines two result sets that I would like to then order as a combined result.
An example of what I am trying to do is here: http://console.neo4j.org/r/j2sotz
Which gives the error:
Cached(nf of type Collection) expected to be of type Map but it is of type Collection - maybe aggregation removed it?
Is there a way to collect multiple results into a single result that can be paged, ordered, etc?
There are many posts about combining results, but I can't find any that allow them to be treated as a map.
Thanks for any help.
You can collect into a single result like this:
Start n=node(1)match n-[r]->m
with m.name? as outf, n
match n<-[r]-m
with m.name? as inf, outf
return collect(outf) + collect(inf) as f
Unions are covered here: https://github.com/neo4j/neo4j/issues/125 (not available right now).
I haven't seen anything about specifically sorting a collection.

Resources