Neo4j pipe data - neo4j

Hi there I am on neo4j and I am having some trouble I have one query where I want to return a the a node (cuisine) with the highest percentage like so
// 1. Find the most_popular_cuisine
MATCH (n:restaurants)
WITH COUNT(n.cuisine) as total
MATCH (r:restaurants)
RETURN r.cuisine , 100 * count(*)/total as percentage
order by percentage desc
limit 1
I am trying to extend this even further by getting the top result and matching to that to get nodes with just that property like so
WITH COUNT(n.cuisine) as total
MATCH (r:restaurants)
WITH r.cuisine as cuisine , count(*) as cnt
MATCH (t:restaurants)
WHERE t.cuisine = cuisine AND count(*) = MAX(cnt)
RETURN t

I think you might be better off refactoring your model a little bit such that a :Cuisine is a label and each cuisine has its own node.
(:Restaurant)-[:OFFERS]->(:Cuisine)
or
(:Restaurant)-[:SPECIALIZES_IN]->(:Cuisine)
Then your query can look like this
MATCH (cuisine:Cuisine)
RETURN cuisine, size((cuisine)<-[:OFFERS]-()) AS number_of_restaurants
ORDER BY number_of_restaurants DESC

I wasn't able to use WITH r.cuisine as cuisine , count(*) as cnt in a WITH rather than a RETURN statement, so I had to resort to a slightly more long-winded approach.
There might be a more optimized way to do this, but this works too,
// Get all unique cuisines in a list
MATCH (n:Restaurants)
WITH COUNT(n.Cuisine) as total, COLLECT(DISTINCT(n.Cuisine)) as cuisineList
// Go through each cuisine and find the number of restaurants associated with each
UNWIND cuisineList as c
MATCH (r:Restaurants{Cuisine:c})
WITH total, r.Cuisine as c, count(r) as cnt
ORDER BY cnt DESC
WITH COLLECT({Cuisine: c, Count:cnt}) as list
// For the most popular cuisine, find all the restaurants offering it
MATCH (t:Restaurants{Cuisine:list[0].Cuisine})
RETURN t

Related

Get the count of multiple properties in neo4j

I am trying to combine 2 cyphers into one for performance but have not succeeded.
I need to get the count of multiple properties unique to eachother in the same cypher.
EX 1:
Match (n)
RETURN n.foo, count(*) AS count
EX 2:
Match (n)
RETURN n.bar, count(*) AS count
I was hoping I could just run both:
Match (n)
RETURN n.foo, count(*) AS fooCount, n.bar, count(*) AS barCount
But this returns the same count for both as it is finding where they both match. Not what I want.
So was looking for a way to group them to be unique like:
Match (n)
RETURN {n.foo, count(*) AS fooCount}, {n.bar, count(*) AS barCount}
Obviously this is not valid syntax but shows what I am trying to do.
Any assistance on this is of course appreciated.
It's best to do this back to back, all at once isn't a good idea for this kind of query, as aggregation won't work in your favor.
You could try this:
MATCH (n)
WITH n.bar as bar, count(*) AS count
WITH collect({bar:bar, count:count}) as barCounts
MATCH (n)
WITH barCounts, n.foo as foo, count(*) AS count
WITH barCounts, collect({foo:foo, count:count}) as fooCounts
RETURN barCounts, fooCounts
Since you are trying to aggregate separate query results, you can also use UNION as a quick and easy way to return both at the same time.
Match (n)
RETURN "foo" as type, n.foo as value, count(*) AS count
UNION ALL
Match (n)
RETURN "bar" as type, n.bar as value, count(*) AS count
Just a few notes, both returns for a UNION must have the same column names.
Also, the "type" column in the example isn't necessary, but it shows how you can add filler if both queries don't have the same number of return columns. (Or if you want to tell which query the result is from.) If there is a "foo" and a "bar" with the same value+count, UNION ALL will keep both, and UNION will drop the duplicate (if you remove the type column).
Maybe it's outdated, but just in case someone needs it, I've found another approach using an APOC function which avoids running multiple times the same MATCH (n). In your case, it could be something like:
MATCH (n)
WITH collect(n.bar) as bars, collect(n.foo) as foos
WITH apoc.coll.frequenciesAsMap(bars) as barCounts, apoc.coll.frequenciesAsMap(foos) as fooCounts
RETURN barCounts, fooCounts
Single MATCH, multiple Counts.
Wish it could help someone!

Neo4J Get only the first relationship per node

I have this graph where the nodes are researchers, and they are related by a relationship named R1, the relationship has a "value" property. How can I get the name of the researchers that are in the relationships with the greatest value? It's like get all the relationships order by r.value DESC but getting only the first relationship per researcher, because I don't want to see on the table duplicated researcher names. By the way, is there a way to get the name of the researchers order by the mean of their relationship "values"? Sorry about the confused topic, I don't speak English very well, thank you very much.
I've been trying things like the Cypher query bellow:
MATCH p=(n)-[r:R1]->(c)
WHERE id(n) < id(c) and r.coauthors = false
return DISTINCT n.name order by n.campus, r.value DESC
Correct me if I am wrong, but you want one result per "n" with the highest value from "r"?
MATCH (n)-[r:R1]->(c)
WHERE r.coauthors = false
WITH n, r ORDER BY r.value DESC
WITH n, head(collect(r)) AS highR
RETURN n.name, highR.value ORDER BY n.campus, highR.value DESC
This will get you all the r's in order and pick the first head(collect(r)) after first doing an ORDER BY. Then you just need to return the values you want. Check out Neo4j Aggregation Functions for some documentation on how aggregation functions work. Good luck!
As an aside, if there is a label that all "n" have, you should add that in your MATCH: MATCH (n:Person) .... it will help speed up your query!

Neo4j - Find movies that share most tags with a particular movie

This Cypher query I wrote to find the top 10 movies that share the most number of tags with "Explorers (1985)" is not returning the desired result. In fact, it runs for a very long time and then stops because there isn't enough memory to complete the computation.
I'm relatively new to Cypher. I would appreciate any help someone could offer.
MATCH (m1:Movie {title:"Explorers (1985)"})-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(m2:Movie)
WITH size((m2)-[:HAS_TAG]->(t)) as cnt,
m2
RETURN m2, cnt
ORDER BY in DESC LIMIT 10
You can simplify your match a bit, since we're traversing the same relationship type twice. As for the tag count, you can either count(t), or simply count the times m2 occurs in the results, since it will have a row for each tag in common.
Give this one a try:
MATCH (:Movie {title:"Explorers (1985)"})-[:HAS_TAG*2]-(m2:Movie)
WITH m2, count(m2) as cnt
RETURN m2, cnt
ORDER BY cnt DESC LIMIT 10
I can't confirm on a large data set, but I think you can minimize loaded nodes by breaking up the query into steps like this...
// Get tags on base movie
MATCH (m1:Movie {title:"Explorers (1985)"})-[:HAS_TAG]->(t:Tag)
// Reduce tags to 1 row
WITH m1, COLLECT(id(t)) as tags
// Find only valid Movie-HAS->Tag
MATCH (m2:Movie)-[:HAS_TAG]->(t:Tag)
WHERE id(t) in tags AND NOT m2.title = "Explorers (1985)"
RETURN m2, COUNT(t) as cnt
ORDER BY cnt DESC LIMIT 10

Equivalent of row_number in neo4j

I'm trying to run a query where I assign integers based on the order they appeared in the query. I'd like it to work to the effect of:
MATCH users RETURN users ORDER BY created_at SET user.number=ROW_NUMBER()
Is there a way to do this in a single query? Thanks!
You can do it by playing a bit with a collection :
MATCH (n:User)
WITH n
ORDER BY n.created_at
WITH collect(n) as users
UNWIND range(0, size(users)-1) as pos
SET (users[pos]).number = pos

Neo4j / Cypher query syntax feedback

I'm developing a kind of reddit service to learn Neo4j.
Everything works fine, I just want to get some feedback on the Cypher query to get the most recent news stories, the author and number of comments, likes and dislikes.
I'm using Neo4j 2.0.
MATCH comments = (n:news)-[:COMMENT]-(o)
MATCH likes = (n:news)-[:LIKES]-(p)
MATCH dislikes = (n:news)-[:DISLIKES]-(q)
MATCH (n:news)-[:POSTED_BY]-(r)
WITH n, r, count(comments) AS num_comments, count(likes) AS num_likes, count(dislikes) AS num_dislikes
ORDER BY n.post_date
LIMIT 20
RETURN *
o, p, q, r are all nodes with the label user. Should the label be added to the query to speed it up?
Is there anything else you see that I could optimize?
I think you're going to want to get rid of the multiple matches. Cypher will filter on each one, filtering through one another, rather than getting all the information.
I would also avoid the paths like comments, and rather do the count on the nodes you are saving. When you do MATCH xyz = (a)-[:COMMENT]-(b) then xyz is a path, which contains the source, relationship and destination node.
MATCH (news:news)-[:COMMENT]-(comment),(news:news)-[:LIKES]-(like),(news:news)-[:DISLIKES]-(dislike),(news:news)-[:POSTED_BY]-(posted_by)
WHERE news.post_date > 0
WITH news, posted_by, count(comment) AS num_comments, count(like) AS num_likes, count(dislike) AS num_dislikes
ORDER BY news.post_date
LIMIT 20
RETURN *
I would do something like this.
MATCH (n:news)-[:POSTED_BY]->(r)
WHERE n.post_date > {recent_start_time}
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes,
ORDER BY n.post_date DESC
LIMIT 20
To speed it up and have not neo search over all your posts, I would probably index the post-date field (assuming it doesn't contain time information). And then send this query in for today, yesterday etc. until you have your 20 posts.
MATCH (n:news {post_date: {day}})-[:POSTED_BY]->(r)
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes,
ORDER BY n.post_date DESC
LIMIT 20

Resources