Issue combining two cypher queries with UNION - neo4j

I ran into the following problem when combining two cypher queries on console.neo4j.org
The query:
MATCH (p1:Crew)-[r_pfq]->(fq:Crew)
WHERE fq.name IN ["Neo", "Morpheus"]
RETURN distinct(p1) AS person, count(r_pfq) AS friend_score, collect(fq.name) AS friends
ORDER BY friend_score DESC
LIMIT 10
works fine, as does
MATCH (f:Crew)<-[r_fqf]-(fq:Crew)
WHERE fq.name IN ["Neo", "Morpheus"]
WITH distinct(f), count(r_fqf) AS weight
ORDER BY weight DESC
LIMIT 10
MATCH f<--(p:Crew)
RETURN distinct(p) AS person, sum(weight) AS friend_score, collect(f.name) AS friends
ORDER BY friend_score DESC
LIMIT 10
Now when I try to combine the query results using the UNION command, i.e.
MATCH (p1:Crew)-[r_pfq]->(fq:Crew)
WHERE fq.name IN ["Neo", "Morpheus"]
RETURN distinct(p1) AS person, count(r_pfq) AS friend_score, collect(fq.name) AS friends
ORDER BY friend_score DESC
LIMIT 10
UNION
MATCH (f:Crew)<-[r_fqf]-(fq:Crew)
WHERE fq.name IN ["Neo", "Morpheus"]
WITH distinct(f), count(r_fqf) AS weight
ORDER BY weight DESC
LIMIT 10
MATCH f<--(p:Crew)
RETURN distinct(p) AS person, sum(weight) AS friend_score, collect(f.name) AS friends
ORDER BY friend_score DESC
LIMIT 10
I get the error
Error: org.neo4j.graphdb.NotFoundException: Unknown identifier `weight`.
Can anyone provide me with an explanation why these query results can not be combined and how to properly do so? Why is the identifier known when running both queries separately but unknown in a UNION-combined query?

EDIT
The following simpler query is basically equivalent, except that the second query in the UNION does not ORDER BY weight. This is because we are already ordering by the derived friend_score, so it seemed redundant. Also, in order for a variable to be included in the ORDER BY clause, it has to be in the RETURN clause -- but the first query in the UNION does not have a weight variable, which would have violated the requirements for a legal UNION statement.
In addition, there is a second WITH clause in the second query because you have to define the variables used in an ORDER BY clause (like friend_score) before the RETURN clause!
MATCH (p1:Crew)-[r_pfq]->(fq:Crew)
WHERE fq.name IN ["Neo", "Morpheus"]
RETURN DISTINCT (p1) AS person, count(r_pfq) AS friend_score, collect(fq.name) AS friends
ORDER BY friend_score DESC
LIMIT 10
UNION
MATCH (p:Crew)-->(f:Crew)<-[r_fqf]-(fq:Crew)
WHERE fq.name IN ["Neo", "Morpheus"]
WITH f, count(r_fqf) AS weight, p
WITH f, sum(weight) AS friend_score, p
RETURN DISTINCT (p) AS person, friend_score, collect(DISTINCT (f).name) AS friends
ORDER BY friend_score DESC
LIMIT 10

Related

cypher: not possible to access variables declared before the WITH/RETURN

I have a neo4j DB where I have the following relations:
(:journal)<-[:BELONGS_TO_JOURNAL]-(:article)
(:person)-[:WROTE]->(article)
I would like to perform a query to find, among the authors of articles belinging to the journal that has most articles, the ones having written the highest number of articles.
The following query gives the journal having the highest number of articles:
match (j:journal)-[:BELONGS_TO_JOURNAL]-()
return j.name,
count(*) as articlesCount
order by articlesCount desc limit 1
And I thought about this other query to find the request:
match (j:journal)-[:BELONGS_TO_JOURNAL]-()
with j as j, count(*) as articlesCount
match (j)<-[:BELONGS_TO_JOURNAL]-(a:article)<-[:WROTE]-(p:person)
return p, count(*) as authorsCount order by articlesCount, authorsCount limit 1
but it gives problems because articlesCount cannot be used in the return since count() is used.
Any suggestions?
Try this:
MATCH (j:journal)-[:BELONGS_TO_JOURNAL]-(a:article)<-[:WROTE]-(p:person)
WITH j, p, count(a) AS articlesCount
ORDER BY articlesCount DESC
WITH j, COLLECT({author: p, articlesCount: articlesCount})[0] AS authorWithMostArticlesForAJournal
RETURN authorWithMostArticlesForAJournal.author AS author, authorWithMostArticlesForAJournal.articlesCount AS articlesCount
In this, we first calculate the articlesCount for each distinct combination of journal and person nodes. Then we sort the records in descending order of count. Finally, for each journal we get the top author, be collecting all in the list and picking the first element of the list.

Difference between the results of two independent queries neo4j

I have 2 subqueries that returns 2 sets of users (each query return one set of users)
First query :
MATCH (:User {user_id: "69b3315a-ba4a-4021-94e1-0f494f9b957f"})-->(first_set_of_users)
RETURN first_set_of_users
Second query :
MATCH (:User {user_id: "69b3315a-ba4a-4021-94e1-0f494f9b957f"})<-[:LIKES]-(likers)-[:LIKES]->(v)
WITH DISTINCT v
MATCH (second_set_of_users)-[:LIKES]->(v)
RETURN second_set_of_users, COUNT(*) AS recoWeight
ORDER BY recoWeight DESC
What I want to finally return is all users from second_set_of_users minus the one in first_set_of_users and ORDER BY recoWeight DESC
How can I do that in just one query ? Everything I tried led to cartesian products of queries and took forever while each independent query takes less than a second.
MATCH (:User {user_id: "69b3315a-ba4a-4021-94e1-0f494f9b957f"})-->(first_set_of_users)
WITH collect(first_set_of_users) AS list_of_first_set_of_users
MATCH (:User {user_id: "69b3315a-ba4a-4021-94e1-0f494f9b957f"})<-[:LIKES]-(likers)-[:LIKES]->(v)
WITH DISTINCT v, list_of_first_set_of_users
MATCH (second_set_of_users)-[:LIKES]->(v)
WITH second_set_of_users, COUNT(*) AS recoWeight
WHERE NOT second_set_of_users IN list_of_first_set_of_users
RETURN second_set_of_users, recoWeight
ORDER BY recoWeight DESC
Explanation.
Using WITH clause we could pass the result of the first query into the second query.
And then using WHERE NOT IN we could filter the result of the second query.

Delete all but the top-k nodes of some query

I try to replicate the behaviour of the following SQL query in neo4j
DELETE FROM history
WHERE history.name = $modelName AND id NOT IN (
SELECT history.id
FROM history
JOIN model ON model.id = history.model_id
ORDER BY created DESC
LIMIT 10
)
I tried a lot of different queries, but basically I'm always struggling to incorporate finding the TOP-k elements. That's the closest I got to a solution.
MATCH (h:HISTORY)-[:HISTORY]-(m:MODEL)
WHERE h.name = $modelName
WITH h
MATCH (t:HISTORY)-[:HISTORY]-(m:MODEL)
WITH t ORDER BY t.created DESC LIMIT 10
WHERE NOT h IN t
DELETE h
With that query I get the error expected List<T> but was Node for the line WITH t ORDER BY t.created DESC LIMIT 10.
I tried changing it it COLLECT(t) AS t but then the error is expected Any, Map, Node or Relationship but was List<Node>.
So I'm pretty much stuck. Any idea how to write this query in Cypher?
Following that approach, you should reverse the order, matching to your top-k nodes, collecting them, and performing the match where the nodes matched aren't in the collection.
MATCH (t:HISTORY)-[:HISTORY]-(:MODEL)
WITH t ORDER BY t.created DESC LIMIT 10
WITH collect(t) as saved
MATCH (h:HISTORY)-[:HISTORY]-(:MODEL)
WHERE h.name = $modelName
AND NOT h in saved
DETACH DELETE h

neo4j indegree outdegree union

I want to compute Indegree and Outdegree and return a graph that has a connection between top 5 Indegree nodes and top 5 Outdegree nodes. I have written a code as
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get an error as
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What are the changes that I need to do to the code?
There's a few things we can improve.
The matches you're doing isn't the most efficient way to get incoming and outgoing degrees for relationships.
Also, UNION can only be used to combine query results with identical columns. In this case, we won't even need UNION, we can use WITH to pipe results from one part of a query to another, and COLLECT() the nodes you need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two NodeLabelScans of :Port1 nodes, and does a cross product of the top 5 of each, so there are 25 variable length path matches, which can be expenses, as this generates all possible paths from each NodeIn to each NodeOut.
If you only one the shortest connection between each, then you might try replacing your variable length match with a shortestPath() call, which only returns the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure your desired direction is correct, as you're matching nodes with the highest in degree and getting an outgoing path to nodes with the highest out degree, that seems like it might be backwards to me, but you know your requirements best.

Get the full graph of a query in Neo4j

Suppose tha I have the default database Movies and I want to find the total number of people that have participated in each movie, no matter their role (i.e. including the actors, the producers, the directors e.t.c.)
I have already done that using the query:
MATCH (m:Movie)<-[r]-(n:Person)
WITH m, COUNT(n) as count_people
RETURN m, count_people
ORDER BY count_people DESC
LIMIT 3
Ok, I have included some extra options but that doesn't really matter in my actual question. From the above query, I will get 3 movies.
Q. How can I enrich the above query, so I can get a graph including all the relationships regarding these 3 movies (i.e.DIRECTED, ACTED_IN,PRODUCED e.t.c)?
I know that I can deploy all the relationships regarding each movie through the buttons on each movie node, but I would like to know whether I can do so through cypher.
Use additional optional match:
MATCH (m:Movie)<--(n:Person)
WITH m,
COUNT(n) as count_people
ORDER BY count_people DESC
LIMIT 3
OPTIONAL MATCH p = (m)-[r]-(RN) WHERE type(r) IN ['DIRECTED', 'ACTED_IN', 'PRODUCED']
RETURN m,
collect(p) as graphPaths,
count_people
ORDER BY count_people DESC

Resources