Query/double OPTIONAL MATCH issue

Query/double OPTIONAL MATCH issue - neo4j

I think this is a long question for what's likely a simple answer. However I thought it wise to include the full context in case there's something wrong with my query logic (excuse the formatting if it's off - I've renamed the vars and it may be malformed, I need help with the theory and not the structure)
An organisation can have a sub office
(o:Organisation)-[:sub_office]->(an:Organisation)
Or a head office
(o)-[:head_office]->(ho:Organisation)
Persons in different sub offices can be employees or ex-employee
EX1
(o)-[:employee]->(p:Person{name:'person1'})<-[:ex_employee]-(an)
Persons can be related to other people through the management relationships. These management links can be variable length.
EX2
(o)-[:employee]->(p:Person{name:'person2'})-[:managed]->(p:Person{name:'person3'})<-[:ex_employee]-(an)
(o)-[:ex_employee]->(p:Person{name:'person4'})-[:managed]->(p:Person{name:'NOT_RETURNED1'})-[:managed]->(p:Person{name:'person5'})<-[:employee]-(an)
(o)-[:ex_employee]->(p:Person{name:'person6'})<-[:managed]-(p:Person{name:'NOT_RETURNED2'})<-[:managed]-(p:Person{name:'person8'})<-[:employee]-(an)
(o)-[:ex_employee]->(p:Person{name:'person9'})-[:managed]->(p:Person{name:'NOT_RETURNED4'})-[:managed]->(p:Person{name:'NOT_RETURNED5'})<-[:managed]-(p:Person{name:'person11'})<-[:employee]-(an)
....
I'm querying:
-organisation,
-sub office,
-how they're related
These are all working fine (I think...)
The issues I'm having is with returning Persons associated with the orgs (employees or ex employees) and their relationships to the organisation but only if they are connected to the other organisation directly (as in EX1) or through a managed chain (all of EX2 - I've tried to make it clearer by marking the Persons who won't be returned by the query as name 'NOT_RETURNED')
I've created the following:
MATCH (queryOrganisation:Organisation{name:'BigCorp'})-[orgRel]-(relatedOrganisation:Organisation)
WITH queryOrganisation, orgRel, relatedOrganisation
MATCH (queryOrganisation)-[employmentRel]->(queryPerson:Person)
OPTIONAL MATCH (queryPerson)<-[relatedOrgRel]-(relatedOrganisation)
OPTIONAL MATCH (queryPerson)-[:managed*1..]-(relatedPerson:Person)<-[relatedOrgRel]-(relatedOrganisation)
WITH queryOrganisation, orgRel, relatedOrganisation, employmentRel, queryPerson, relatedOrgRel, relatedPerson
WHERE NOT queryOrganisation.name = relatedOrganisation.name
RETURN ID(queryOrganisation) as queryOrganisationID,
ID(startNode(orgRel))as startNodeId, type(orgRel)as orgRel, ID(endNode(orgRel))as endNodeId,
ID(relatedOrganisation)as relatedOrganisationId, relatedOrganisation.name as relatedOrganisationName
COLLECT({
queryPerson:{endpoint:{ID:ID(queryPerson)}, endpointrelationship: type(employmentRel)},
relatedPerson:{endpoint:{ID:coalesce(ID(relatedPerson),ID(queryPerson))}, endpointrelationship:type(relatedOrgRel)}
}) as rels
I would have expected all the collected results to look like:
{
"startEmp":{
"ID":2715,
"startrelationship":"employee"
},
"relatedEmp":{
"ID":2722,
"endrelationship":"ex employee"
}
}
However the directly connected node results (same node ID) appear like:
{
"startEmp":{
"ID":2716,
"startrelationship":"employee"
},
"relatedEmp":{
"ID":2716,
"endrelationship":null
}
}
Why is that null appearing for type(relatedOrgRel)? Am I misunderstanding whats happening in the OPTIONAL MATCH and the relatedOrgRel gets overwritten by null during the second OPTIONAL MATCH? If so, how can I remedy?
Thanks

No, the OPTIONAL MATCHes cannot overwrite variables that are already defined.
I think the cause of the problems is when your second OPTIONAL MATCH doesn't match anything, but this is partially covered up by the COALESCE used in the collecting of persons in your return hides some of the conseque:
...
relatedPerson:{endpoint:{ID:coalesce(ID(relatedPerson),ID(queryPerson))}, endpointrelationship:type(relatedOrgRel)}
...
If relatedPerson is null, as it will be if your second OPTIONAL MATCH fails, then you're falling back to the id of queryPerson, but since you're not using a COALESCE for relatedOrgRel, this will still be null. You'll need a COALESCE here, or otherwise you'll need to figure out a better way to deal with the null variables in your OPTIONAL MATCHES in cases where they fail.

Related

(Neo4j 3.5) Executing query with two parts and the second one consists on a group of matches where at least one needs to be fulfilled

we are trying to run a cypher query but we are not able to get the results we want.
It is important to note that we cannot make this work with subqueries because we are using Neo4j 3.5 and in this version, they are still not available.
The problem is that we have two parts for a query, the first one will fix the variables for the second part, and the second part consists of multiple matches, and it has to get the results of the previous query and if at least one of the matches return a result for a row, this row is not filtered out, else if none return result it is discarded.
More specifically, the query we are trying to run is similar to the following one:
//First part of the query where we want to fix variables with the match and where
MATCH (u:User)-[:ASSIGNED_TO]->(t:Task)-[:PENDING]->(ob:Object)<-[:HAS_OPEN_OBJECT]-(do:DataObject)<-[:ASOCIATED]-(:Module)-[:CAN_LIST]->(view:WidgetObject)
WHERE u.uid = 'user_uid'
AND view.uid = 'view_uid'
AND view.object_name = do.object_type
with do, t, ob
// In this second part of the query we want to maintain the variables of the previous part and if at least one matches the value should be returned
// we have tried with UNION but we will need pagination, but even with union it's not working
MATCH (ac:Action)<-[:ASOCIATED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE ac.name CONTAINS 'body'
WITH COLLECT({data_object_uid: do.uid}) as act_filter
MATCH (c:Comment)<-[:COMMENTED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE c.body CONTAINS 'body'
WITH act_filter + COLLECT({data_object_uid: do.uid}) as comment_filter
MATCH (at:Attachment)<-[:HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE at.name CONTAINS 'body'
WITH comment_filter + COLLECT({data_object_uid: do.uid}) as attachment_filter
UNWIND attachment_filter as row
return row.data_object_uid
We are not sure if in the second part, the second and third matches are maintaining the same subset of results coming from the first part of the query.
A funny behavior we have found is that if we remove the last match we are getting results but if we add it, we are not getting any results. We do not understand this behavior because if the second match is returning results and they are stored in a variable after a collect, appending this to the next collected results should return something.
For example, if the second match returns as comment_filter [{data_object_uid: "343dienmd3-dasd"}] and the third match is not returning anything, after the concatenation in the WITH clause it should return the same thing, but the result is empty.
We need some light here, we don't know if we are close and we are making a stupid mistake or we are getting all wrong and we need to change the approach completely.

Since you do not know which of the three matches in the second part will yield results, I would try something along the lines below:
NOTE I used ASSOCIATED instead of ASOCIATED
MATCH (n)<-[:ASSOCIATED|COMMENTED|HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE
(n:Action AND n.name CONTAINS 'body')
OR
(n:Comment AND n.body CONTAINS 'body')
OR
(n:Attachment AND n.name CONTAINS 'body')
RETURN COLLECT(DISTINCT {data_object_uid: do.uid})

How to filter out non-null path between nodes in Neo4J/Cypher

My current graph monitors board members at a company through time.
However, I'm only interested in currently employed directors. This can be observed because director nodes connect to company nodes through an employment path which includes an end date (r.to) when the director is no longer employed at the firm. If he is currently employed, there will be no end date(null as per below picture). Therefore, I would like to filter the path not containing an end date. I am not sure if the value is an empty string, a null value, or other types so I've been trying different ways without much success. Thanks for any tips!
Current formula
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to Is null
RETURN c,d,c2

Unless the response from the Neo4j browser was edited, it looks like the value of r.to is not null or empty, but the string None.
This query will help verify if this is the case:
MATCH (d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
RETURN DISTINCT r.to ORDER by r.to DESC
Absence of the property will show a null in the tabular response. Any other value is a real value of that property. If None shows up, then your query would be
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to="None"
RETURN c,d,c2

Find path in Neo4j with directed edges

This is my first attempt at Neo4j, please excuse me if I am missing something very trivial.
Here is my problem:
Consider the graph as created in the following Neo4j console example:
http://console.neo4j.org/?id=y13kbv
We have following nodes in this example:
(Person {memberId, memberName, membershipDate})
(Email {value, badFlag})
(AccountNumber {value, badFlag})
We could potentially have more nodes capturing features related to a Person like creditCard, billAddress, shipAddress, etc.
All of these nodes will be the same as Email and AccountNumber nodes:
(creditCard {value, badFlag}), (billAddress {value, badFlag}),etc.
With the graph populated as seen in the Neo4j console example, assume that we add one more Person to the graph as follows:
(p7:Person {memberId:'18' , memberName:'John', membershipDate:'12/2/2015'}),
(email6:Email {value: 'john#gmail.com', badFlag:'false'}),
(a2)-[b13:BELONGS_TO]->(p7),
(email6)-[b14:BELONGS_TO]->(p7)
When we add this new person to the system, the use case is that we have to check if there exists a path from features of the new Person ("email6" and "a2" nodes) to any other node in the system where the "badFlag=true", in this case node (a1 {value:1234, badFlag:true}).
Here, the resultant path would be (email6)-[BELONGS_TO]->(p7)<-[BELONGS_TO]-(a2)-[BELONGS_TO]->(p6)<-[BELONGS_TO]-(email5)-[BELONGS_TO]->(p5)<-[BELONGS_TO]-(a1:{badFlag:true})
I tried something like this:
MATCH (newEmail:Email{value:'john#gmail.com'})-[:BELONGS_TO]->(p7)-[*]-(badPerson)<-[:BELONGS_TO]-(badFeature{badFlag:'true'}) RETURN badPerson, badFeature;
which seems to work when there is only one level of chaining, but it doesn't work when the path could be longer like in the case of Neo4j console example.
I need help with the Cypher query that will help me solve this problem.
I will eventually be doing this operation using Neo4j's Java API using my application. What could be the right way to go about doing this using Java API?

You had a typo in you query. PART_OF should be BELONGS_TO. This should work for you:
MATCH (newEmail:Email {value:'john#gmail.com'})-[:BELONGS_TO]->(p7)-[*]-(badPerson)<-[:BELONGS_TO]-(badFeature {badFlag:'true'})
RETURN badPerson, badFeature;
Aside: You seem to use string values for all properties. I'd replace the string values 'true' and 'false' with the boolean values true and false. Likewise, values that are always numeric should just use integer or float values.

Cypher query - Optional Create

I am trying to create a social network-like structure.
I would like to create a timeline of posts which looks like this
(user:Person)-[:POSTED]->(p1:POST)-[:PREV]->[p2:POST]...
My problem is the following.
Assuming a post for a user already exists, I can create a new post by executing the following cypher query
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post:POST)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Assuming, the user has not created a post yet, this query fails. So I tried to somehow include both cases (user has no posts / user has at least one post) in one update query (I would like to insert a new post in the "post timeline")
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN rel IS NOT NULL THEN [rel] ELSE [] END |
DELETE rel
)
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
CREATE (post)-[:PREV]->(o)
)
MERGE (user)-[:POSTED]->(post)
Is there any kind of if-statement (or some type of CREATE IF NOT NULL) to avoid using a foreach loop two times (the query looks a litte bit complicated and I know that the loop will only run 1 time)?.
However, this was the only solution, I could come up with after studying this SO post. I read in an older post that there is no such thing as an if-statement.
EDIT: The question is: Is it even good to include both cases in one query since I know that the "no-post case" will only occur once and that all other cases are "at least one post"?
Cheers

I've seen a solution to cases like this in some articles. To use a single query for all cases, you could create a special terminating node for the list of posts. A person with no posts would be like:
(:Person)-[:POSTED]->(:PostListEnd)
Now in all cases you can run the query:
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Note that the no label is specified for prev_post, so it can match either (:POST) or (:PostListEnd).
After running the query, a person with 1 post will be like:
(:Person)-[:POSTED]->(:POST)-[:PREV]->(:PostListEnd)
Since the PostListEnd node has no info of its own, you can have the same one node for all your users.

I also do not see a better solution than using FOREACH.
However, I think I can make your query a bit more efficient. My solution essentially merges the 2 FOREACH tests into 1, since prev_postand rel must either be both NULL or both non-NULL. It also combines the CREATE and the MERGE (which should have been a CREATE, anyway).
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (user)-[:POSTED]->(post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
DELETE rel
CREATE (post)-[:PREV]->(o)
)

In the Neo4j v3.2 developer manual it specifies how you can create essentially a composite key made of multiple node properties at this link:
CREATE CONSTRAINT ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
However, this is only available for the Enterprise Edition, not Community.

"CASE" is as close to an if-statement as you're going to get, I think.
The FOREACH probably isn't so bad given that you're likely limited in scope. But I see no particular downside to separating the query into two, especially to keep it readable and given the operations are fairly small.
Just my two cents.

How to check if an element is in a node.collection using Cypher?

I'm begining with Neo4j/Cypher, I have some nodes containing a property which is an array of integers. I want to check if a given number is in a node's collection and if so, append this node to the results. My query looks like this:
MATCH (a) WHERE has(a.user_ids) and (13 IN a.user_ids) RETURN a
where 13 is the given user_id. It throws a syntax error:
Type mismatch: a already defined with conflicting type Node (expected Collection<Any>)
Any idea how can I accomplish that?
Thanks in advance.

You can try the predicate ANY, which returns true if any member of a collection matches some criterion.
MATCH (a) WHERE has(a.user_ids) and ANY(user_id IN a.user_ids WHERE user_id = 13)
It looks a bit backwards now that I'm looking at it, but it should work.
Edit:
It was bugging me why your query didn't work and why my answer seemed backwards and indirect so I did a simple test. Basically, your original query works if you put the property reference in parentheses:
MATCH (a)
WHERE has(a.user_ids) and (13 IN (a.user_ids))
RETURN a
That's easier to read so that's what I should have answered. But I still couldn't see why the parentheses where necessary here, when they are not in other cases. They were not necessary inside the ANY() above, and if you 'detach' the collection from the node
MATCH (a)
WITH a.user_ids as user_ids, a
WHERE 13 IN user_ids
RETURN a
there's no problem. For some reason Cypher needs to be told to evaluate a.user_ids before IN, or it ignores user_ids and tries to evaluate 13 IN a. IN is listed as an operator in the documentation, but in this regard it woks differently than other operators. For example
MATCH (a) RETURN 13 + a.user_ids
returns fine and
MATCH (a) RETURN 13 * a.user_ids
MATCH (a) RETURN 13 < a.user_ids
fails but because a.user_ids is a collection, not because a is a node. It's probably not very important, it's easy enough to use parentheses, but it would be interesting to learn why they are necessary.
I also compared my answer to your original query with added parentheses to see if there were any performance drawback to the more indirect way. Turns out the execution plan is almost identical, 13 IN (a.user_ids) is refactored to use ANY() like in my answer.
My answer:
Filter(pred="any(user_id in Product(a,user_ids(6),true) where user_id == Literal(13))", _rows=1, _db_hits=8)
AllNodes(identifier="a", _rows=8, _db_hits=8)
Your query + ():
Filter(pred="any(-_-INNER-_- in Product(n,user_ids(6),true) where Literal(13) == -_-INNER-_-)", _rows=1, _db_hits=8)
AllNodes(identifier="n", _rows=8, _db_hits=8)
Finally, in your case you probably don't have to check for existence of property with has(). Absent properties and null are handled differently in 2.0 and if the property doesn't exist 13 IN (a.user_ids) will evaluate to false, so usually there is no reason to test for property existence before property evaluation for fear of the query breaking. The place to use has() would be when property existence is relevant in itself, and that would probably be a different property than the one evaluated, i.e. WHERE has(a.someProperty) AND 13 IN (a.someOtherProperty).
Since there is no performance difference, the more readable query is better, and since you, as far as I can see, don't really need to test for property existence, I think your query should be
MATCH (a)
WHERE 13 IN (a.user_ids)
RETURN a

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Query/double OPTIONAL MATCH issue - neo4j

Related

(Neo4j 3.5) Executing query with two parts and the second one consists on a group of matches where at least one needs to be fulfilled

How to filter out non-null path between nodes in Neo4J/Cypher

Find path in Neo4j with directed edges

Cypher query - Optional Create

How to check if an element is in a node.collection using Cypher?

Categories

Resources