I've read the questions about subqueries but still stuck with this use case.
I have Documents that contain one or more keywords and each document has linked user comments with a status property. I want to get just the most recent status (if it exists) returned for each document in the query. If I run a query like the following, I just get one row.
MATCH (d:Document)-[:Keys]->(k:Keywords)
WITH d,k
OPTIONAL MATCH (d)--(c:Comments)
ORDER BY c.created DESC LIMIT 1
RETURN d.Title as Title, k.Word as Keyword, c.Status as Status
I have hundreds of documents I want to return with the latest status like:
Title Keyword Status
War in the 19th Century WWI Reviewed
War in the 19th Century Weapons Reviewed
The Great War WWI Pending
World War I WWI <null>
I have tried multiple queries using WITH clause but no luck yet. Any suggestions would be appreciated.
This query should do what you probably intended:
MATCH (d:Document)-[:Keys]->(k:Keywords)
OPTIONAL MATCH (d)--(c:Comments)
WITH d, COLLECT(k.Word) AS Keywords, c
ORDER BY c.created DESC
WHERE c IS NOT NULL
RETURN d.Title as Title, Keywords, COLLECT(c)[0] AS Status
Since a comment is related to a document, and not a document/keyword pair, it makes more sense to return a collection of keywords for each Title/Status pair. Your original query, if it had worked, would have returned the same Title/Status pair multiple times, each time with a different keyword.
We've got a knowledge base article explaining how to limit results of a match per-row, that should give you a few good options.
EDIT
Here's a full example, using apoc.cypher.run() to perform the limited subquery.
MATCH (d:Document)-[:Keys]->(k:Keywords)
WITH d, COLLECT(k.Word) AS Keywords
// collect keywords first so each document on a row
CALL apoc.cypher.run('
OPTIONAL MATCH (d)--(c:Comments)
RETURN c
ORDER BY c.created DESC
LIMIT 1
', {d:d}) YIELD value
RETURN d.Title as Title, Keywords, value.c.status as Status
Related
Hi the below is within the context of Neo4j GraphQL (so I need to ensure I'm returning nodes here, rather than Maps or anything, I think).
I have nodes Users and Posts where users can write or repost posts. Essentially posts have created_date fields but when a user "reposts" that post, I would like to get the time of that repost.
The typical relationships look like this:
(u:User)-[r:WROTE]->(post:Post)
Where posts have created_date fields. When a user reposts it, the underlying post doesn't change, they just add a REPOSTED relationship to that post. Like this:
(u:User)-[r:REPOSTED]->(post2:Post)
And when I want to look at a user's posts, I want to grab any that they've written or reposted and sort them in the order of either the post.created_date if they WROTE the post or REPOSTED time if they reposted it.
I have no idea what I should be doing here, so I attempted something like this but it isn't editing the repost_date in time (it doesn't return the correct result).
MATCH (u:User)-[r:WROTE|REPOSTED]->(post:Post)
WITH (CASE WHEN r.created_date IS NOT NULL THEN r.created_date ELSE post.date END) as repost_date, post
SET post.repost_date = repost_date
RETURN post, repost_date
ORDER BY repost_date DESC
LIMIT 10
Is there another way to grab and return both dates (when both exist, i.e. it's a REPOST)?
Thank you in advance!
There are a couple ways to achieve what you want. Probably the simplest is:
MATCH (u:User)-[r:WROTE|REPOSTED]->(post:Post)
WITH post, coalesce(r.created_date, post.created_date) AS date
ORDER BY date DESC
LIMIT 10
RETURN post
You could also use a subquery UNION and then use post-filtering
MATCH (u:User)
CALL {
MATCH (u)-[:WROTE]->(p:Post)
RETURN p, p.created_date as date
UNION
MATCH (u)-[r:REPOSTED]->(p:Post)
RETURN p, r.created_date as date
}
WITH p, date
ORDER BY date DESC
LIMIT 10
RETURN p
Third option is to use CASE statement
MATCH (u:User)-[r:WROTE|REPOSTED]->(post:Post)
WITH post, CASE WHEN r:REPOSTED THEN r.created_date ELSE post.created_date END AS date
ORDER BY date DESC
LIMIT 10
RETURN post
You can use the PROFILE clause to see what is best
we are trying to run a cypher query but we are not able to get the results we want.
It is important to note that we cannot make this work with subqueries because we are using Neo4j 3.5 and in this version, they are still not available.
The problem is that we have two parts for a query, the first one will fix the variables for the second part, and the second part consists of multiple matches, and it has to get the results of the previous query and if at least one of the matches return a result for a row, this row is not filtered out, else if none return result it is discarded.
More specifically, the query we are trying to run is similar to the following one:
//First part of the query where we want to fix variables with the match and where
MATCH (u:User)-[:ASSIGNED_TO]->(t:Task)-[:PENDING]->(ob:Object)<-[:HAS_OPEN_OBJECT]-(do:DataObject)<-[:ASOCIATED]-(:Module)-[:CAN_LIST]->(view:WidgetObject)
WHERE u.uid = 'user_uid'
AND view.uid = 'view_uid'
AND view.object_name = do.object_type
with do, t, ob
// In this second part of the query we want to maintain the variables of the previous part and if at least one matches the value should be returned
// we have tried with UNION but we will need pagination, but even with union it's not working
MATCH (ac:Action)<-[:ASOCIATED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE ac.name CONTAINS 'body'
WITH COLLECT({data_object_uid: do.uid}) as act_filter
MATCH (c:Comment)<-[:COMMENTED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE c.body CONTAINS 'body'
WITH act_filter + COLLECT({data_object_uid: do.uid}) as comment_filter
MATCH (at:Attachment)<-[:HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE at.name CONTAINS 'body'
WITH comment_filter + COLLECT({data_object_uid: do.uid}) as attachment_filter
UNWIND attachment_filter as row
return row.data_object_uid
We are not sure if in the second part, the second and third matches are maintaining the same subset of results coming from the first part of the query.
A funny behavior we have found is that if we remove the last match we are getting results but if we add it, we are not getting any results. We do not understand this behavior because if the second match is returning results and they are stored in a variable after a collect, appending this to the next collected results should return something.
For example, if the second match returns as comment_filter [{data_object_uid: "343dienmd3-dasd"}] and the third match is not returning anything, after the concatenation in the WITH clause it should return the same thing, but the result is empty.
We need some light here, we don't know if we are close and we are making a stupid mistake or we are getting all wrong and we need to change the approach completely.
Since you do not know which of the three matches in the second part will yield results, I would try something along the lines below:
NOTE I used ASSOCIATED instead of ASOCIATED
MATCH (n)<-[:ASSOCIATED|COMMENTED|HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE
(n:Action AND n.name CONTAINS 'body')
OR
(n:Comment AND n.body CONTAINS 'body')
OR
(n:Attachment AND n.name CONTAINS 'body')
RETURN COLLECT(DISTINCT {data_object_uid: do.uid})
My current graph monitors board members at a company through time.
However, I'm only interested in currently employed directors. This can be observed because director nodes connect to company nodes through an employment path which includes an end date (r.to) when the director is no longer employed at the firm. If he is currently employed, there will be no end date(null as per below picture). Therefore, I would like to filter the path not containing an end date. I am not sure if the value is an empty string, a null value, or other types so I've been trying different ways without much success. Thanks for any tips!
Current formula
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to Is null
RETURN c,d,c2
Unless the response from the Neo4j browser was edited, it looks like the value of r.to is not null or empty, but the string None.
This query will help verify if this is the case:
MATCH (d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
RETURN DISTINCT r.to ORDER by r.to DESC
Absence of the property will show a null in the tabular response. Any other value is a real value of that property. If None shows up, then your query would be
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to="None"
RETURN c,d,c2
I can query using Cypher in Neo4j from the Panama database the countries of three types of identity holders (I define that term) namely Entities (companies), officers (shareholders) and Intermediaries (middle companies) as three attributes/columns. Each column has single or double entries separated by colon (eg: British Virgin Islands;Russia). We want to concatenate the countries in these columns into a unique set of countries and hence obtain the count of the number of countries as new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant part is the SET statement in the 7th line and the DISTINCT count in the last line. The code shows error which makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means to use COLLECT probably but we tried that as well and it shows the error vice-versa'd between 'u' and 'n'. Please help us obtain the output that we want, it makes our job hell lot easy. Thanks in advance!
EDIT: Considering I didn't define variable as suggested by #Cybersam, I tried the command CREATE as following but it shows the error "Invalid input 'R':" for the command RETURN. This is unfathomable for me. Help really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the ones new and to be in examination.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)- [:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.
I am trying to load large dataset into neo4j-3 and looking for the options. I found one neo4j-import but the problem with that is it is for initial load only. I have to load 2M records around every week.
I tried loading through shell but having some performance issue, I tried following.
1) Creating constraint upfront.
2) Creating Node and relationships in separate query.
3) Heap space 8G
4) dbms.memory.pagecache 4G
Many times the import just hangs and does nothing for hours.
Edit - CSV load being executed:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
OPTIONAL MATCH (per:Person {UID : "Person."+row.player_cardnum})
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.Creation Date = timestamp(), p.Modification Date = timestamp() ;
EDIT
On a second look, seems like you're trying to implement some kind of conditional logic to your insert.
It looks like what you're trying to do is figure out if a :Person exists with a UID (derived from some concatenation with row.player_cardnum), and in the case where that :Person doesn't exist and the match fails, MERGE a :Person with the CardNumber given by row.player_cardnum.
If this is your goal, you're ALMOST there with your query. The problem is with your WHERE clause.
Understand that WHERE clauses are linked with a preceding MATCH, OPTIONAL MATCH, or WITH, and only affects the linked clause.
With that WHERE on that OPTIONAL MATCH, per will always be null, but more importantly, your row will still exist, and the following MERGE will ALWAYS take place for all rows in the CSV. This is probably the source of your slowdown, as it's creating new :Person nodes for all rows.
If you're trying to null out the row completely when the OPTIONAL MATCH hits on an existing :Person (so the MERGE won't happen in that case), you'll need to add a WITH clause, and make sure your WHERE clause is applied to it instead of the OPTIONAL MATCH.
Additionally, make sure that you have either unique constraints or indexes on Person.UID and Person.CardNumber. As for the UID match, I've heard that indexes are not used when there's some kind of string concatenation of the thing you're matching upon, so you may need to assemble it first and pass it in with a WITH.
Your final query would look like this:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
// first build the UID so we can take advantage of the index
WITH row, "Person." + row.player_cardnum AS UID
OPTIONAL MATCH (per:Person {UID : UID})
// the WHERE now applies to the WITH, which will filter out and null out the row when an OPTIONAL MATCH is found
WITH row, per
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.Creation Date = timestamp(), p.Modification Date = timestamp() ;