Issue with new Cypher 3.1 head() function - neo4j

We are using the new Cypher 3.1 head() function and are seeing some unusual (wrong?) behavior when it is used in a RETURN statement after an OPTIONAL MATCH command. It looks like values that the labels that don't match in the optional match negatively affect the data already collected from the head() perspective, while working as expected for the rest of the RETURN statement. Any ideas on what we're doing wrong or is this an unintended consequence in the Neo4j engine?
MATCH (productLine:ProductLine)-[:CHILD]->(product:Product)-[:CHILD]->(application:Application)-[:MATCHES]->(:Rule {name: 'Tier-0 Application'})
WITH productLine,
application
OPTIONAL MATCH (application)-[mr:MATCHES]->(:Rule {name: 'Multiple Regions'})
WITH application,
mr,
productLine
RETURN
productLine.name AS ProductLine,
head([(productLine)-[:PRODUCT_MANAGER]->(person:Person) | person.name]) AS ProductLineManager,
mr.numServers,
application.id AS AppId
Here's what we see:
ProductLine ProductLineManager mr.numServers AppId
PL1 null null IN000041
PL2 LAST,FIRST 6 AP010024
PL3 LAST,FIRST 6 AP107752
PL4 LAST,FIRST 11 AP106560
PL5 null null AP012190

Looks like this is a bug related to pattern comprehension, not the head() function, when there are rows generated by an an OPTIONAL MATCH.
I was able to reproduce the bug using the movies graph as a base.
I created a bug for this on the Neo4j issues tracker, complete with examples.
As a workaround, it looks like using DISTINCT on either the last WITH or on the RETURN may cause it to return expected results.

Related

How to match all possible paths through multiple relationship types with filtering (Neo4j, Cypher)

I am new to Neo4j and I have a relatively complex (but small) database which I have simplified to the following:
The first door has no key, all other doors have keys, the window doesn't require a key. The idea is that if a person has key:'A', I want to see all possible paths they could take.
Here is the code to generate the db
CREATE (r1:room {name:'room1'})-[:DOOR]->(r2:room {name:'room2'})-[:DOOR {key:'A'}]->(r3:room {name:'room3'})
CREATE (r2)-[:DOOR {key:'B'}]->(r4:room {name:'room4'})-[:DOOR {key:'A'}]->(r5:room {name:'room5'})
CREATE (r4)-[:DOOR {key:'C'}]->(r6:room {name:'room6'})
CREATE (r2)-[:WINDOW]->(r4)
Here is the query I have tried, expecting it to return everything except for room6, instead I have an error which means I really don't know how to construct the query.
with {key:'A'} as params
match (n:room {name:'room1'})-[r:DOOR*:WINDOW*]->(m)
where r.key=params.key or not exists(r.key)
return n,m
To be clear, I don't need my query debugged so much as help understanding how to write it correctly.
Thanks!
This should work for you:
WITH {key:'A'} AS params
MATCH p=(n:room {name:'room1'})-[:DOOR|WINDOW*]->(m)
WHERE ALL(r IN RELATIONSHIPS(p) WHERE NOT EXISTS(r.key) OR r.key=params.key)
RETURN n, m
With your sample data, the result is:
╒════════════════╤════════════════╕
│"n" │"m" │
╞════════════════╪════════════════╡
│{"name":"room1"}│{"name":"room2"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room3"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room4"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room5"}│
└────────────────┴────────────────┘

Query/double OPTIONAL MATCH issue

I think this is a long question for what's likely a simple answer. However I thought it wise to include the full context in case there's something wrong with my query logic (excuse the formatting if it's off - I've renamed the vars and it may be malformed, I need help with the theory and not the structure)
An organisation can have a sub office
(o:Organisation)-[:sub_office]->(an:Organisation)
Or a head office
(o)-[:head_office]->(ho:Organisation)
Persons in different sub offices can be employees or ex-employee
EX1
(o)-[:employee]->(p:Person{name:'person1'})<-[:ex_employee]-(an)
Persons can be related to other people through the management relationships. These management links can be variable length.
EX2
(o)-[:employee]->(p:Person{name:'person2'})-[:managed]->(p:Person{name:'person3'})<-[:ex_employee]-(an)
(o)-[:ex_employee]->(p:Person{name:'person4'})-[:managed]->(p:Person{name:'NOT_RETURNED1'})-[:managed]->(p:Person{name:'person5'})<-[:employee]-(an)
(o)-[:ex_employee]->(p:Person{name:'person6'})<-[:managed]-(p:Person{name:'NOT_RETURNED2'})<-[:managed]-(p:Person{name:'person8'})<-[:employee]-(an)
(o)-[:ex_employee]->(p:Person{name:'person9'})-[:managed]->(p:Person{name:'NOT_RETURNED4'})-[:managed]->(p:Person{name:'NOT_RETURNED5'})<-[:managed]-(p:Person{name:'person11'})<-[:employee]-(an)
....
I'm querying:
-organisation,
-sub office,
-how they're related
These are all working fine (I think...)
The issues I'm having is with returning Persons associated with the orgs (employees or ex employees) and their relationships to the organisation but only if they are connected to the other organisation directly (as in EX1) or through a managed chain (all of EX2 - I've tried to make it clearer by marking the Persons who won't be returned by the query as name 'NOT_RETURNED')
I've created the following:
MATCH (queryOrganisation:Organisation{name:'BigCorp'})-[orgRel]-(relatedOrganisation:Organisation)
WITH queryOrganisation, orgRel, relatedOrganisation
MATCH (queryOrganisation)-[employmentRel]->(queryPerson:Person)
OPTIONAL MATCH (queryPerson)<-[relatedOrgRel]-(relatedOrganisation)
OPTIONAL MATCH (queryPerson)-[:managed*1..]-(relatedPerson:Person)<-[relatedOrgRel]-(relatedOrganisation)
WITH queryOrganisation, orgRel, relatedOrganisation, employmentRel, queryPerson, relatedOrgRel, relatedPerson
WHERE NOT queryOrganisation.name = relatedOrganisation.name
RETURN ID(queryOrganisation) as queryOrganisationID,
ID(startNode(orgRel))as startNodeId, type(orgRel)as orgRel, ID(endNode(orgRel))as endNodeId,
ID(relatedOrganisation)as relatedOrganisationId, relatedOrganisation.name as relatedOrganisationName
COLLECT({
queryPerson:{endpoint:{ID:ID(queryPerson)}, endpointrelationship: type(employmentRel)},
relatedPerson:{endpoint:{ID:coalesce(ID(relatedPerson),ID(queryPerson))}, endpointrelationship:type(relatedOrgRel)}
}) as rels
I would have expected all the collected results to look like:
{
"startEmp":{
"ID":2715,
"startrelationship":"employee"
},
"relatedEmp":{
"ID":2722,
"endrelationship":"ex employee"
}
}
However the directly connected node results (same node ID) appear like:
{
"startEmp":{
"ID":2716,
"startrelationship":"employee"
},
"relatedEmp":{
"ID":2716,
"endrelationship":null
}
}
Why is that null appearing for type(relatedOrgRel)? Am I misunderstanding whats happening in the OPTIONAL MATCH and the relatedOrgRel gets overwritten by null during the second OPTIONAL MATCH? If so, how can I remedy?
Thanks
No, the OPTIONAL MATCHes cannot overwrite variables that are already defined.
I think the cause of the problems is when your second OPTIONAL MATCH doesn't match anything, but this is partially covered up by the COALESCE used in the collecting of persons in your return hides some of the conseque:
...
relatedPerson:{endpoint:{ID:coalesce(ID(relatedPerson),ID(queryPerson))}, endpointrelationship:type(relatedOrgRel)}
...
If relatedPerson is null, as it will be if your second OPTIONAL MATCH fails, then you're falling back to the id of queryPerson, but since you're not using a COALESCE for relatedOrgRel, this will still be null. You'll need a COALESCE here, or otherwise you'll need to figure out a better way to deal with the null variables in your OPTIONAL MATCHES in cases where they fail.

Cypher query - Optional Create

I am trying to create a social network-like structure.
I would like to create a timeline of posts which looks like this
(user:Person)-[:POSTED]->(p1:POST)-[:PREV]->[p2:POST]...
My problem is the following.
Assuming a post for a user already exists, I can create a new post by executing the following cypher query
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post:POST)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Assuming, the user has not created a post yet, this query fails. So I tried to somehow include both cases (user has no posts / user has at least one post) in one update query (I would like to insert a new post in the "post timeline")
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN rel IS NOT NULL THEN [rel] ELSE [] END |
DELETE rel
)
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
CREATE (post)-[:PREV]->(o)
)
MERGE (user)-[:POSTED]->(post)
Is there any kind of if-statement (or some type of CREATE IF NOT NULL) to avoid using a foreach loop two times (the query looks a litte bit complicated and I know that the loop will only run 1 time)?.
However, this was the only solution, I could come up with after studying this SO post. I read in an older post that there is no such thing as an if-statement.
EDIT: The question is: Is it even good to include both cases in one query since I know that the "no-post case" will only occur once and that all other cases are "at least one post"?
Cheers
I've seen a solution to cases like this in some articles. To use a single query for all cases, you could create a special terminating node for the list of posts. A person with no posts would be like:
(:Person)-[:POSTED]->(:PostListEnd)
Now in all cases you can run the query:
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Note that the no label is specified for prev_post, so it can match either (:POST) or (:PostListEnd).
After running the query, a person with 1 post will be like:
(:Person)-[:POSTED]->(:POST)-[:PREV]->(:PostListEnd)
Since the PostListEnd node has no info of its own, you can have the same one node for all your users.
I also do not see a better solution than using FOREACH.
However, I think I can make your query a bit more efficient. My solution essentially merges the 2 FOREACH tests into 1, since prev_postand rel must either be both NULL or both non-NULL. It also combines the CREATE and the MERGE (which should have been a CREATE, anyway).
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (user)-[:POSTED]->(post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
DELETE rel
CREATE (post)-[:PREV]->(o)
)
In the Neo4j v3.2 developer manual it specifies how you can create essentially a composite key made of multiple node properties at this link:
CREATE CONSTRAINT ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
However, this is only available for the Enterprise Edition, not Community.
"CASE" is as close to an if-statement as you're going to get, I think.
The FOREACH probably isn't so bad given that you're likely limited in scope. But I see no particular downside to separating the query into two, especially to keep it readable and given the operations are fairly small.
Just my two cents.

How to use START with Cypher / Neo4j 2.0

I am trying example provided in Graph Databases book (PDF page 51-52)with Neo4j 2.0.1 (latest). It appears that I cannot just copy paste the code sample from the book (I guess the syntax is no longer valid).
START bob=node:user(username='Bob'),
charlie=node:user(username='Charlie')
MATCH (bob)-[e:EMAILED]->(charlie)
RETURN e
Got #=> Index `user` does not exist.
So, I tried without 'user'
START bob=node(username='Bob'),
charlie=node(username='Charlie')
MATCH (bob)-[e:EMAILED]->(charlie)
RETURN e
Got #=> Invalid input 'u': expected whitespace, an unsigned integer, a parameter or '*'
Tried this but didn't work
START bob=node({username:'Bob'}),
(charlie=node({username:'Charlie'})
MATCH (bob)-[e:EMAILED]->(charlie)
RETURN e
Got #=> Invalid input ':': expected an identifier character, whitespace or '}'
I want to use START then MATCH to achieve this. Would appreciate little bit of direction to get started.
From version 2.0 syntax has changed.
http://docs.neo4j.org/chunked/stable/query-match.html
Your first query should look like this.
MATCH (bob {username:'Bob'})-[e:EMAILED]->(charlie {username:'Charlie'})
RETURN e
The query does not work out of the box because you'll need to create the user index first. This can't be done with Cypher though, see the documentation for more info. Your syntax is still valid, but Lucene indexes are considered legacy. Schema indexes replace them, but they are not fully mature yet (e.g. no wildcard searches, IN support, ...).
You'll want to use labels as well, in your case a User label. The query can be refactored to:
MATCH (b:User { username:'Bob' })-[e:EMAILED]->(c:User { username:'Charlie' })
RETURN e
For good performance, add a schema index on the username property as well:
CREATE INDEX ON :User(username)
Start is optional, as noted above. Given that it's listed under the "deprecated" section in the Cypher 2.0 refcard, I would try to avoid using it going forward just for safety purposes.
However, the refcard does state that you can prepend your Cypher query with "CYPHER 1.9" (without the quotes) in order to make explicit use of the older syntax.

How to check if an element is in a node.collection using Cypher?

I'm begining with Neo4j/Cypher, I have some nodes containing a property which is an array of integers. I want to check if a given number is in a node's collection and if so, append this node to the results. My query looks like this:
MATCH (a) WHERE has(a.user_ids) and (13 IN a.user_ids) RETURN a
where 13 is the given user_id. It throws a syntax error:
Type mismatch: a already defined with conflicting type Node (expected Collection<Any>)
Any idea how can I accomplish that?
Thanks in advance.
You can try the predicate ANY, which returns true if any member of a collection matches some criterion.
MATCH (a) WHERE has(a.user_ids) and ANY(user_id IN a.user_ids WHERE user_id = 13)
It looks a bit backwards now that I'm looking at it, but it should work.
Edit:
It was bugging me why your query didn't work and why my answer seemed backwards and indirect so I did a simple test. Basically, your original query works if you put the property reference in parentheses:
MATCH (a)
WHERE has(a.user_ids) and (13 IN (a.user_ids))
RETURN a
That's easier to read so that's what I should have answered. But I still couldn't see why the parentheses where necessary here, when they are not in other cases. They were not necessary inside the ANY() above, and if you 'detach' the collection from the node
MATCH (a)
WITH a.user_ids as user_ids, a
WHERE 13 IN user_ids
RETURN a
there's no problem. For some reason Cypher needs to be told to evaluate a.user_ids before IN, or it ignores user_ids and tries to evaluate 13 IN a. IN is listed as an operator in the documentation, but in this regard it woks differently than other operators. For example
MATCH (a) RETURN 13 + a.user_ids
returns fine and
MATCH (a) RETURN 13 * a.user_ids
MATCH (a) RETURN 13 < a.user_ids
fails but because a.user_ids is a collection, not because a is a node. It's probably not very important, it's easy enough to use parentheses, but it would be interesting to learn why they are necessary.
I also compared my answer to your original query with added parentheses to see if there were any performance drawback to the more indirect way. Turns out the execution plan is almost identical, 13 IN (a.user_ids) is refactored to use ANY() like in my answer.
My answer:
Filter(pred="any(user_id in Product(a,user_ids(6),true) where user_id == Literal(13))", _rows=1, _db_hits=8)
AllNodes(identifier="a", _rows=8, _db_hits=8)
Your query + ():
Filter(pred="any(-_-INNER-_- in Product(n,user_ids(6),true) where Literal(13) == -_-INNER-_-)", _rows=1, _db_hits=8)
AllNodes(identifier="n", _rows=8, _db_hits=8)
Finally, in your case you probably don't have to check for existence of property with has(). Absent properties and null are handled differently in 2.0 and if the property doesn't exist 13 IN (a.user_ids) will evaluate to false, so usually there is no reason to test for property existence before property evaluation for fear of the query breaking. The place to use has() would be when property existence is relevant in itself, and that would probably be a different property than the one evaluated, i.e. WHERE has(a.someProperty) AND 13 IN (a.someOtherProperty).
Since there is no performance difference, the more readable query is better, and since you, as far as I can see, don't really need to test for property existence, I think your query should be
MATCH (a)
WHERE 13 IN (a.user_ids)
RETURN a

Resources