Generating N recommendations per person in Neo4j

I am following this tutorial on collaborative filtering in Neo4j.
In this tutorial, we first create a toy movie graph as follows:
LOAD CSV WITH HEADERS FROM "https://neo4j-contrib.github.io/developer-resources/cypher/movies_actors.csv" AS line
WITH line
WHERE line.job = "ACTED_IN"
MERGE (m:Movie {title:line.title}) ON CREATE SET m.released = toInt(line.released), m.tagline = line.tagline
MERGE (p:Person {name:line.name}) ON CREATE SET p.born = toInt(line.born)
MERGE (p)-[:ACTED_IN {roles:split(line.roles,";")}]->(m)
RETURN count(*);
Next, we propose five possible co-actors for Tom Hanks:
MATCH (tom:Person)-[:ACTED_IN]->(movie1)<-[:ACTED_IN]-(coActor:Person),
(coActor)-[:ACTED_IN]->(movie2)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks"
AND NOT (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name, count(distinct coCoActor) as frequency
ORDER BY frequency DESC
LIMIT 5
What if I want to perform such an operation on every person who acted in "Apollo 13"? In other words, my task is to propose 5 possible co-actors to every person who acted in "Apollo 13".
How do I do this in an effective way?

A few things here. The query you pasted doesn't really make any sense:
RETURN coCoActor.name, COUNT(DISTINCT coCoActor) AS frequency
This will always return a frequency of 1, so your ORDER BY is meaningless.
I think you meant this:
RETURN coCoActor.name, COUNT(DISTINCT coActor) AS frequency
Second, you don't need the variables movie1 and movie2; they're not used again in your query.
Finally, you need to assert that you're not recommending the same actor to him or herself:
WHERE actor <> coCoActor
To actually answer your question:
// Find the Apollo 13 actors.
MATCH (actor:Person)-[:ACTED_IN]->(:Movie {title:"Apollo 13"})
// Continue with query.
MATCH (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
(coActor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE NOT (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor) AND
actor <> coCoActor
// Group by actor and coCoActor, counting how many coActors they share as freq.
WITH actor, coCoActor, COUNT(DISTINCT coActor) AS freq
// Order by freq descending so that COLLECT()[..5] grabs the top 5 per row.
ORDER BY freq DESC
// Get the recommendations.
WITH actor, COLLECT({name: coCoActor.name, freq: freq})[..5] AS recos
RETURN actor.name, recos;
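To make the "order, then collect and slice" step concrete, here is a minimal Python sketch of what the last three clauses do. It only models the logic; the rows and names (actor_a, candidate_x, and so on) are invented for illustration and do not come from the movie dataset.

```python
from collections import defaultdict

# Hypothetical rows standing in for the output of
# WITH actor, coCoActor, COUNT(DISTINCT coActor) AS freq
rows = [
    ("actor_a", "candidate_x", 3),
    ("actor_a", "candidate_y", 5),
    ("actor_b", "candidate_x", 2),
    ("actor_a", "candidate_z", 1),
    ("actor_b", "candidate_y", 4),
]


def top_n_per_actor(rows, n):
    # ORDER BY freq DESC: sort every row, regardless of actor.
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    # COLLECT(...) grouped by actor: entries are appended in sorted order.
    collected = defaultdict(list)
    for actor, candidate, freq in ordered:
        collected[actor].append({"name": candidate, "freq": freq})
    # [..n]: keep only the first n entries of each collected list.
    return {actor: recos[:n] for actor, recos in collected.items()}


recos = top_n_per_actor(rows, 2)
for actor, top in recos.items():
    print(actor, top)
```

Because the rows are sorted before they are grouped, each actor's list is already in descending order of freq, so slicing it gives that actor's top N rather than a global top N.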

Related

Neo4J get N latest relationships per unique target node?

With the following graph:
How can I write a query that would return N latest relationships by the unique target node?
For example, this query: MATCH (p)-[r:RATED_IN]->(s) WHERE id(p)={person} RETURN p,s,r ORDER BY r.measurementDate DESC LIMIT {N} with N = 1 would return only the single latest relationship, whether it is RATED_IN Team Lead or Programming. Instead, I would like to get the N latest relationships for each skill node. With N = 2, for instance, I would like the 2 latest measurements per skill node.
I would like the latest relationship by a person for Team Lead and the latest one for Programming.
How can I write such a query?
-- EDIT --
MATCH (p:Person) WHERE id(p)=175
CALL apoc.cypher.run('
WITH {p} AS p
MATCH (p)-[r:RATED_IN]->(s)
RETURN DISTINCT s, r ORDER BY r.measurementDate DESC LIMIT 2',
{p:p}) YIELD value
RETURN p,value.r AS r, value.s AS s
Here's a Cypher knowledge base article on limiting MATCH results per row, with a few different suggestions on how to accomplish this given current limitations. Using APOC's apoc.cypher.run() to perform a subquery with a RETURN using a LIMIT will do the trick, as it gets executed per row (thus the LIMIT is per row).
Note that for the upcoming Neo4j 4.0 release at the end of the year we're going to be getting some nice Cypher goodies that will make this significantly easier. Stay tuned as we reveal more details as we approach its release!

Neo4J Get only the first relationship per node

I have a graph where the nodes are researchers, connected by a relationship named R1 that has a "value" property. How can I get the names of the researchers whose relationships have the greatest value? In other words, I want all the relationships ordered by r.value DESC, but only the first relationship per researcher, because I don't want duplicated researcher names in the result table. Also, is there a way to get the names of the researchers ordered by the mean of their relationship "values"? Sorry if the question is confusing, and thank you very much.
I've been trying things like the Cypher query below:
MATCH p=(n)-[r:R1]->(c)
WHERE id(n) < id(c) and r.coauthors = false
return DISTINCT n.name order by n.campus, r.value DESC
Correct me if I am wrong, but you want one result per "n" with the highest value from "r"?
MATCH (n)-[r:R1]->(c)
WHERE r.coauthors = false
WITH n, r ORDER BY r.value DESC
WITH n, head(collect(r)) AS highR
RETURN n.name, highR.value ORDER BY n.campus, highR.value DESC
This sorts all the relationships by value with ORDER BY and then, for each n, keeps only the first one with head(collect(r)). Then you just need to return the values you want. Check out Neo4j Aggregation Functions for some documentation on how aggregation functions work. Good luck!
As an aside, if all the "n" nodes share a label, you should add it to your MATCH, e.g. MATCH (n:Person) ...; it will help speed up your query!

Neo4j: Query to find the nodes with most relationships, and their connected nodes

I am using Neo4j CE 3.1.1 and I have a relationship WRITES between authors and books. I want to find the N (say N=10 for example) books with the largest number of authors. Following some examples I found, I came up with the query:
MATCH (a)-[r:WRITES]->(b)
RETURN r,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
When I execute this query in the Neo4j browser I get 10 books, but these do not look like the ones written by most authors, as they show only a few WRITES relationships to authors. If I change the query to
MATCH (a)-[r:WRITES]->(b)
RETURN b,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
Then I get the 10 books with the most authors, but I don't see their relationship to authors. To do so, I have to write additional queries explicitly stating the name of a book I found in the previous query:
MATCH ()-[r:WRITES]->(b)
WHERE b.title="Title of a book with many authors"
RETURN r
What am I doing wrong? Why isn't the first query working as expected?
Aggregations only have context based on the non-aggregation columns, and with your match, a unique relationship will only occur once in your results.
So your first query is asking for each relationship on a row, and the count of that particular relationship, which is 1.
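As a rough illustration of how the grouping key changes the result, here is a small Python model of the two RETURN clauses. The rows, relationship ids and book titles are made up for the example.

```python
from collections import Counter

# Hypothetical (author, relationship id, book) rows, one per match of
# MATCH (a)-[r:WRITES]->(b)
rows = [
    ("ann", "r1", "Book X"),
    ("bob", "r2", "Book X"),
    ("cat", "r3", "Book X"),
    ("ann", "r4", "Book Y"),
]

# RETURN r, COUNT(r): the grouping key is the relationship itself,
# and each relationship appears on exactly one row.
count_by_relationship = Counter(rel for _, rel, _ in rows)

# RETURN b, COUNT(r): the grouping key is the book,
# so the relationships pointing at the same book are counted together.
count_by_book = Counter(book for _, _, book in rows)

print(count_by_relationship)
print(count_by_book.most_common())
```

With the relationship as the key every count is 1, so ORDER BY ... LIMIT 10 just returns ten arbitrary relationships; with the book as the key the counts are the author counts you were after.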
You might rewrite this in a couple of different ways.
One is to collect the authors and order on the size of the author list:
MATCH (a)-[:WRITES]->(b)
RETURN b, COLLECT(a) as authors
ORDER BY SIZE(authors) DESC LIMIT 10
You can always collect the author and its relationship, if the relationship itself is interesting to you.
EDIT
If you happen to have labels on your nodes (you absolutely SHOULD have labels on your nodes), you can try a different approach by matching to all books, getting the size of the incoming :WRITES relationships to each book, ordering and limiting on that, and then performing the match to the authors:
MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) as authorCnt
ORDER BY authorCnt DESC LIMIT 10
MATCH (a)-[:WRITES]->(b)
RETURN b, a
You can collect on the authors and/or return the relationship as well, depending on what you need from the output.
You are very close: after ordering and limiting the books, you need to match their authors again. For example:
MATCH (a:Author)-[r:WRITES]->(b:Book)
WITH b,
COUNT(r) AS authorsCount
ORDER BY authorsCount DESC LIMIT 10
MATCH (b)<-[:WRITES]-(a:Author)
RETURN b,
COLLECT(a) AS authors
ORDER BY size(authors) DESC

Count nodes with a certain property

I'm working on a dataset describing legislative co-sponsorship. I'm trying to return a table with the name of the bill, the number of legislators who co-sponsored it and then the number of co-sponsors who are Republican and the number who are Democrat. I feel like this should be simple to do but I keep getting syntax errors. Here's what I have so far:
MATCH (b:Bill{Year:"2016"})-[r:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-(c:Legislators)
WHERE b.name CONTAINS "HB" OR b.name CONTAINS "SB"
RETURN b.name, b.Short_description, COUNT(r) AS TOTAL, COUNT(c.Party = "Republican"), COUNT(c.Party = "Democratic")
ORDER BY COUNT(r) desc
However, in the table this query produces, the count of Republican sponsors, the count of Democratic sponsors and the total count are all the same. Obviously, the number of Republican and Democratic sponsors should add up to the total.
What is the correct syntax for this query?
Collect the distinct legislators and use FILTER to count each party:
MATCH (b:Bill{Year:"2016"})
-[r:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-
(c:Legislators)
WHERE b.name CONTAINS "HB" OR b.name CONTAINS "SB"
WITH b, collect(distinct c) as Legislators
RETURN b.name,
b.Short_description,
SIZE(Legislators) AS TOTAL,
SIZE(FILTER(c in Legislators WHERE c.Party = "Republican")) as Republican,
SIZE(FILTER(c in Legislators WHERE c.Party = "Democratic")) as Democratic
ORDER BY TOTAL desc
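As for why the original query returned three identical numbers: COUNT(expression) counts the rows where the expression is not null, and c.Party = "Republican" evaluates to true or false, both of which are non-null. A small Python model of that behaviour (the party list is invented, and cypher_count is a hypothetical helper, not a Neo4j API):

```python
# Hypothetical Party values, one per matched legislator row.
parties = ["Republican", "Democratic", "Republican"]


def cypher_count(values):
    # Models Cypher's count(expr): every non-null value is counted,
    # including the boolean false.
    return sum(1 for value in values if value is not None)


total = cypher_count(parties)

# COUNT(c.Party = "Republican"): the comparison yields True or False,
# never null here, so every row is counted.
naive_republican = cypher_count([party == "Republican" for party in parties])

# Filtering first and then measuring the size gives the intended number.
filtered_republican = len([party for party in parties if party == "Republican"])

print(total, naive_republican, filtered_republican)
```

The filtered version is what SIZE(FILTER(...)) computes in the query above.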
Assuming that legislators can ONLY be Republican or Democratic (we'll need to make some adjustments if this isn't the case):
MATCH (b:Bill{Year:"2016"})
WHERE b.name CONTAINS "HB" OR b.name CONTAINS "SB"
WITH b
OPTIONAL MATCH (b)-[:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-(rep:Legislators)
WHERE rep.Party = "Republican"
OPTIONAL MATCH (b)-[:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-(dem:Legislators)
WHERE dem.Party = "Democratic"
WITH b, COUNT(DISTINCT rep) as reps, COUNT(DISTINCT dem) as dems
RETURN b.name, b.Short_description, reps + dems AS TOTAL, reps, dems
ORDER BY TOTAL desc
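The DISTINCT in both counts matters here: two OPTIONAL MATCH clauses in a row pair every Republican with every Democrat for the same bill, so a plain COUNT would be inflated. A minimal Python sketch of that row multiplication, using invented legislator ids and hypothetical helper functions:

```python
from itertools import product


def optional_match(matches):
    # OPTIONAL MATCH keeps the row and binds null when nothing matches.
    return matches or [None]


def count(values, distinct=False):
    # Cypher's count() skips nulls; DISTINCT also drops repeated values.
    non_null = [value for value in values if value is not None]
    return len(set(non_null)) if distinct else len(non_null)


republicans = ["rep_1", "rep_2"]
democrats = ["dem_1", "dem_2", "dem_3"]

# One bill, two OPTIONAL MATCHes: every (rep, dem) combination becomes a row.
rows = list(product(optional_match(republicans), optional_match(democrats)))
rep_column = [rep for rep, _ in rows]

plain_reps = count(rep_column)                     # inflated by the pairing
distinct_reps = count(rep_column, distinct=True)   # actual number of reps

# A bill with no Democratic sponsors still keeps its Republican rows.
rows_no_dems = list(product(optional_match(republicans), optional_match([])))
dems_when_none = count([dem for _, dem in rows_no_dems], distinct=True)

print(plain_reps, distinct_reps, dems_when_none)
```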
This is a graph model problem: you shouldn't be counting nodes by their properties. If several nodes can share the same property value and you want to count by that value, you need to create an intermediate node that carries the party:
(b:Bill)-[:SPONSORED_AUTHORED]->(i:Intermediate)-[:TARGET]->(c:Legislators)
and then you create a relation between your intermediate node and the party:
(i:Intermediate)-[:BELONGS_PARTY]->(p:Party{name:"Republican"})
The intermediate node represents the data you actually have in your relationship, but it allows you to create relationships between your operation and a party, making counting easier and way faster.
Keep in mind that this is just an example. Without knowing the context, I can't say what the real label and properties of the Intermediate node should be; it's only a demo of the concept.
I answered a question using this, feel free to check it (it's a real life example, maybe easier to understand): Neo4j can I make relations between relations?

Get the full graph of a query in Neo4j

Suppose that I have the default Movies database and I want to find the total number of people who have participated in each movie, no matter their role (i.e. including the actors, the producers, the directors, etc.).
I have already done that using the query:
MATCH (m:Movie)<-[r]-(n:Person)
WITH m, COUNT(n) as count_people
RETURN m, count_people
ORDER BY count_people DESC
LIMIT 3
Ok, I have included some extra options but that doesn't really matter in my actual question. From the above query, I will get 3 movies.
Q. How can I extend the above query so that I get a graph including all the relationships for these 3 movies (i.e. DIRECTED, ACTED_IN, PRODUCED, etc.)?
I know that I can expand all the relationships of each movie through the buttons on each movie node, but I would like to know whether I can do so through Cypher.
Use an additional OPTIONAL MATCH:
MATCH (m:Movie)<--(n:Person)
WITH m,
COUNT(n) as count_people
ORDER BY count_people DESC
LIMIT 3
OPTIONAL MATCH p = (m)-[r]-(RN) WHERE type(r) IN ['DIRECTED', 'ACTED_IN', 'PRODUCED']
RETURN m,
collect(p) as graphPaths,
count_people
ORDER BY count_people DESC

Resources