Return top % of results in Neo4J - neo4j

I've a graph of students and the various books that they've read. I want to find out the top 10% of students who've read the most books. How can I do that? I've tried the following cypher syntax:
MATCH (s:Student)-[:READ]->(b:Book)
WITH s, COUNT(b) AS no_of_books
WHERE no_of_books > percentileCont(no_of_books, 0.9)
RETURN s.Name, no_of_books
The error 'invalid use of aggregating function' is returned. It seems that trying to use two aggregating functions on top of each other is an issue here. How can I tweak my syntax to make it work?
I'll be happy to use the LIMIT function instead if it can work with percentages as well.

Answering my own question (again), in case someone else is looking up the same issue
MATCH (s:Student)-[:READ]->(b:Book)
WITH s, COUNT(b) AS no_of_books
ORDER BY no_of_books DESC
WITH COLLECT ({Student_Name: s.Name, No_of_Books: no_of_books}) AS books_per_stu
WITH books_per_stu, toInteger(size(books_per_stu)/100) AS percentile
UNWIND book_per_stu[0..percentile] AS top_stu
RETURN top_stu
Seems like, surprisingly, there's no straightforward way to do this, like my pseudo-code in my first post. The above syntax will return the results as a list of dictionaries rather than in a tabular format. I still welcome any answer that's simpler than mine.

Related

How to match all possible paths through multiple relationship types with filtering (Neo4j, Cypher)

I am new to Neo4j and I have a relatively complex (but small) database which I have simplified to the following:
The first door has no key, all other doors have keys, the window doesn't require a key. The idea is that if a person has key:'A', I want to see all possible paths they could take.
Here is the code to generate the db
CREATE (r1:room {name:'room1'})-[:DOOR]->(r2:room {name:'room2'})-[:DOOR {key:'A'}]->(r3:room {name:'room3'})
CREATE (r2)-[:DOOR {key:'B'}]->(r4:room {name:'room4'})-[:DOOR {key:'A'}]->(r5:room {name:'room5'})
CREATE (r4)-[:DOOR {key:'C'}]->(r6:room {name:'room6'})
CREATE (r2)-[:WINDOW]->(r4)
Here is the query I have tried, expecting it to return everything except for room6, instead I have an error which means I really don't know how to construct the query.
with {key:'A'} as params
match (n:room {name:'room1'})-[r:DOOR*:WINDOW*]->(m)
where r.key=params.key or not exists(r.key)
return n,m
To be clear, I don't need my query debugged so much as help understanding how to write it correctly.
Thanks!
This should work for you:
WITH {key:'A'} AS params
MATCH p=(n:room {name:'room1'})-[:DOOR|WINDOW*]->(m)
WHERE ALL(r IN RELATIONSHIPS(p) WHERE NOT EXISTS(r.key) OR r.key=params.key)
RETURN n, m
With your sample data, the result is:
╒════════════════╤════════════════╕
│"n" │"m" │
╞════════════════╪════════════════╡
│{"name":"room1"}│{"name":"room2"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room3"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room4"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room5"}│
└────────────────┴────────────────┘

Using Neo4j fulltext indexing for Contains queries

I'm using neo4j 3.5, and have about 9 million user nodes. I was trying to implement the following query, but it was taking way too long:
MATCH (users:User) WHERE (users.username CONTAINS "joe" OR users.first_name CONTAINS "joe" OR users.last_name CONTAINS "joe")
RETURN users
LIMIT 30
I was hoping to take advantage of neo4j 3.5's newe fulltext indexing feature by creating the following index:
CALL db.index.fulltext.createNodeIndex('users', ['User'], ['username', 'first_name', 'last_name'])
and then querying the db like so
CALL db.index.fulltext.queryNodes('users', joe)
YIELD node
RETURN node.user_id
I thought this would work the same as contains and return users whose username, first_name or last_name contains joe (eg: myjoe12, joe12, 12joe, 44joeseph, etc.) but it seems to be returning users whose fields are joe exactly or contain joe separated by a whitespace (eg: Joe B, Joe y1), I tried using joe* in the query but that only returns everything starting with joe, I want to return everything containing joe or whatever search term. What would be the best way to go about this?
Speed issue / Index:
So far I know, Neo4j has a optimised index for STARTS WITH & ENDS WITH only for NOT composite indexes.
If I read this docs paragraph, my conclusion will be this: Your 9 million users will be searched one by one, neo4j doesn't use any index for your query. What makes this query really slow.
A answer to your question:
I want to return everything containing Joe or whatever search term.
You probably looking for a regex search (this is also slow and not a index search and not recommended):
Example query based on your query:
MATCH (users:User)
WHERE (users.username =~ "(?i).*joe.*" OR users.first_name =~ "(?i).*joe.*" OR users.last_name =~ "(?i).*joe.*")
RETURN users
LIMIT 30
Explanation for (?i) this means case-insensitive so Joe or joe will be matched. See regex operator docs and regex where docs
For the fulltext schema index, it looks like you'll need to use the fuzzy search operator ~ in your query, though you may need to do some filtering on the score to make sure you're looking at relevant results:
CALL db.index.fulltext.queryNodes('users', 'joe~')
YIELD node, score
WHERE score > .8
RETURN node.user_id

How to get max value in cypher subquery

I've read the questions about subqueries but still stuck with this use case.
I have Documents that contain one or more keywords and each document has linked user comments with a status property. I want to get just the most recent status (if it exists) returned for each document in the query. If I run a query like the following, I just get one row.
MATCH (d:Document)-[:Keys]->(k:Keywords)
WITH d,k
OPTIONAL MATCH (d)--(c:Comments)
ORDER BY c.created DESC LIMIT 1
RETURN d.Title as Title, k.Word as Keyword, c.Status as Status
I have hundreds of documents I want to return with the latest status like:
Title Keyword Status
War in the 19th Century WWI Reviewed
War in the 19th Century Weapons Reviewed
The Great War WWI Pending
World War I WWI <null>
I have tried multiple queries using WITH clause but no luck yet. Any suggestions would be appreciated.
This query should do what you probably intended:
MATCH (d:Document)-[:Keys]->(k:Keywords)
OPTIONAL MATCH (d)--(c:Comments)
WITH d, COLLECT(k.Word) AS Keywords, c
ORDER BY c.created DESC
WHERE c IS NOT NULL
RETURN d.Title as Title, Keywords, COLLECT(c)[0] AS Status
Since a comment is related to a document, and not a document/keyword pair, it makes more sense to return a collection of keywords for each Title/Status pair. Your original query, if it had worked, would have returned the same Title/Status pair multiple times, each time with a different keyword.
We've got a knowledge base article explaining how to limit results of a match per-row, that should give you a few good options.
EDIT
Here's a full example, using apoc.cypher.run() to perform the limited subquery.
MATCH (d:Document)-[:Keys]->(k:Keywords)
WITH d, COLLECT(k.Word) AS Keywords
// collect keywords first so each document on a row
CALL apoc.cypher.run('
OPTIONAL MATCH (d)--(c:Comments)
RETURN c
ORDER BY c.created DESC
LIMIT 1
', {d:d}) YIELD value
RETURN d.Title as Title, Keywords, value.c.status as Status

How to concatenate three columns into one and obtain count of unique entries among them using Cypher neo4j?

I can query using Cypher in Neo4j from the Panama database the countries of three types of identity holders (I define that term) namely Entities (companies), officers (shareholders) and Intermediaries (middle companies) as three attributes/columns. Each column has single or double entries separated by colon (eg: British Virgin Islands;Russia). We want to concatenate the countries in these columns into a unique set of countries and hence obtain the count of the number of countries as new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant part is the SET statement in the 7th line and the DISTINCT count in the last line. The code shows error which makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means to use COLLECT probably but we tried that as well and it shows the error vice-versa'd between 'u' and 'n'. Please help us obtain the output that we want, it makes our job hell lot easy. Thanks in advance!
EDIT: Considering I didn't define variable as suggested by #Cybersam, I tried the command CREATE as following but it shows the error "Invalid input 'R':" for the command RETURN. This is unfathomable for me. Help really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the ones new and to be in examination.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)- [:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.

Cypher Query combining results that can be ordered as a whole

I have a Cypher query that combines two result sets that I would like to then order as a combined result.
An example of what I am trying to do is here: http://console.neo4j.org/r/j2sotz
Which gives the error:
Cached(nf of type Collection) expected to be of type Map but it is of type Collection - maybe aggregation removed it?
Is there a way to collect multiple results into a single result that can be paged, ordered, etc?
There are many posts about combining results, but I can't find any that allow them to be treated as a map.
Thanks for any help.
You can collect into a single result like this:
Start n=node(1)match n-[r]->m
with m.name? as outf, n
match n<-[r]-m
with m.name? as inf, outf
return collect(outf) + collect(inf) as f
Unions are covered here: https://github.com/neo4j/neo4j/issues/125 (not available right now).
I haven't seen anything about specifically sorting a collection.

Resources