I have a graph where the nodes are researchers, related by a relationship named R1 that has a "value" property. How can I get the names of the researchers that are in the relationships with the greatest value? It's like getting all the relationships ordered by r.value DESC, but keeping only the first relationship per researcher, because I don't want to see duplicated researcher names in the table. By the way, is there a way to get the names of the researchers ordered by the mean of their relationship "values"? Sorry if the question is confusing, I don't speak English very well. Thank you very much.
I've been trying things like the Cypher query below:
MATCH p=(n)-[r:R1]->(c)
WHERE id(n) < id(c) and r.coauthors = false
return DISTINCT n.name order by n.campus, r.value DESC
Correct me if I am wrong, but you want one result per "n" with the highest value from "r"?
MATCH (n)-[r:R1]->(c)
WHERE r.coauthors = false
WITH n, r ORDER BY r.value DESC
WITH n, head(collect(r)) AS highR
RETURN n.name, highR.value ORDER BY n.campus, highR.value DESC
This orders all the r's by value first, so that head(collect(r)) picks the highest-valued relationship for each n. Then you just need to return the values you want. Check out the Neo4j aggregation functions documentation to see how aggregation works. Good luck!
As an aside, if there is a label that all "n" have, you should add that in your MATCH: MATCH (n:Person) .... it will help speed up your query!
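For the second part of the question (ordering researchers by the mean of their relationship values), a minimal sketch could use the avg() aggregation function. It assumes the same R1 relationships and the same coauthors filter as the query above. The alias meanValue is just an illustrative name.

```cypher
// Group by researcher node and average the "value" of its R1 relationships.
MATCH (n)-[r:R1]->()
WHERE r.coauthors = false
WITH n, avg(r.value) AS meanValue
// Highest mean first; one row per researcher node.
RETURN n.name, meanValue
ORDER BY meanValue DESC
```

Grouping on n rather than n.name keeps two researchers who happen to share a name on separate rows.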
Related
I wonder if anyone can advise how to adjust the following query so that it returns one relationship with a count of the number of actual relationships rather than every relationship? I have some nodes with many relationships and it's killing the graph's performance.
MATCH (p:Provider{countorig: "XXXX"})-[r:supplied]-(i:Importer)
RETURN p, i limit 100
Many thanks
To return the relationship name along with a count, change your "return" statement, like this:
MATCH (p:Provider{countorig: "XXXX"})-[r:supplied]-(i:Importer)
RETURN type(r), count(r)
Using type(r) will return the type of the relationship, which looks to be "supplied" in your example. And then count(r) is just using the built-in function to count the number of occurrences of that relationship in the query.
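If what is needed is one row per provider/importer pair with the number of relationships between them, rather than a single overall total, a sketch along these lines might work. It reuses the labels and the countorig property from the question. The alias supplyCount is an assumed name.

```cypher
// p and i act as grouping keys, so count(r) is computed per pair.
MATCH (p:Provider {countorig: "XXXX"})-[r:supplied]-(i:Importer)
RETURN p, i, count(r) AS supplyCount
ORDER BY supplyCount DESC
LIMIT 100
```

Note that the Neo4j Browser graph view may still draw every relationship between the returned nodes if its "Connect result nodes" setting is enabled, so the table view is the safer place to read the counts.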
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
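As a sketch of the last tip, the distinct-pair count from above can be profiled by prefixing it with PROFILE. This version also swaps EXISTS(m.name) for m.name IS NOT NULL, because the property form of EXISTS() was deprecated in later 4.x releases and removed in Neo4j 5. The labels and property names are those from the question.

```cypher
// PROFILE runs the query and reports DB hits per operator in the plan.
PROFILE
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE m.name IS NOT NULL AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN count(*) AS cnt
```

Running the same PROFILE before and after adding relationship types or directions gives a direct comparison of the DB hits.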
You can find examples of how to use count in the Neo4j docs.
In your case, the first example there, which uses count(*) to return a count for each returned item, should work.
I am using Neo4j CE 3.1.1 and I have a relationship WRITES between authors and books. I want to find the N (say N=10 for example) books with the largest number of authors. Following some examples I found, I came up with the query:
MATCH (a)-[r:WRITES]->(b)
RETURN r,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
When I execute this query in the Neo4j browser I get 10 books, but these do not look like the ones written by most authors, as they show only a few WRITES relationships to authors. If I change the query to
MATCH (a)-[r:WRITES]->(b)
RETURN b,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
Then I get the 10 books with the most authors, but I don't see their relationship to authors. To do so, I have to write additional queries explicitly stating the name of a book I found in the previous query:
MATCH ()-[r:WRITES]->(b)
WHERE b.title="Title of a book with many authors"
RETURN r
What am I doing wrong? Why isn't the first query working as expected?
Aggregations only have context based on the non-aggregation columns, and with your match, a unique relationship will only occur once in your results.
So your first query is asking for each relationship on a row, and the count of that particular relationship, which is 1.
You might rewrite this in a couple different ways.
One is to collect the authors and order on the size of the author list:
MATCH (a)-[:WRITES]->(b)
RETURN b, COLLECT(a) as authors
ORDER BY SIZE(authors) DESC LIMIT 10
You can always collect the author and its relationship, if the relationship itself is interesting to you.
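A minimal sketch of that idea, assuming the same unlabeled pattern as above, collects each author together with its WRITES relationship in a map. The map keys author and rel and the alias authorships are illustrative names.

```cypher
// One row per book; each list entry pairs an author with its WRITES relationship.
MATCH (a)-[r:WRITES]->(b)
WITH b, collect({author: a, rel: r}) AS authorships
RETURN b, authorships
ORDER BY size(authorships) DESC
LIMIT 10
```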
EDIT
If you happen to have labels on your nodes (you absolutely SHOULD have labels on your nodes), you can try a different approach by matching to all books, getting the size of the incoming :WRITES relationships to each book, ordering and limiting on that, and then performing the match to the authors:
MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) as authorCnt
ORDER BY authorCnt DESC LIMIT 10
MATCH (a)-[:WRITES]->(b)
RETURN b, a
You can collect on the authors and/or return the relationship as well, depending on what you need from the output.
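The SIZE(pattern) form above fits the Neo4j 3.x version mentioned in the question. On Neo4j 5, where taking the size of a pattern expression is no longer supported, a rough equivalent would use a COUNT subquery expression. This is a sketch under that version assumption, with the same Book label and WRITES type.

```cypher
// Neo4j 5+ syntax: COUNT { ... } counts the matches of the inner pattern.
MATCH (b:Book)
WITH b, COUNT { ()-[:WRITES]->(b) } AS authorCnt
ORDER BY authorCnt DESC LIMIT 10
// Re-match the authors only for the ten selected books.
MATCH (a)-[:WRITES]->(b)
RETURN b, authorCnt, collect(a) AS authors
ORDER BY authorCnt DESC
```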
You are very close: after sorting and limiting, you need to match the authors again. For example:
MATCH (a:Author)-[r:WRITES]->(b:Book)
WITH b,
COUNT(r) AS authorsCount
ORDER BY authorsCount DESC LIMIT 10
MATCH (b)<-[:WRITES]-(a:Author)
RETURN b,
COLLECT(a) AS authors
ORDER BY size(authors) DESC
I am following this tutorial about collaborative filtering in Neo4j.
In this tutorial, we first create a toy movie graph, as follows:
LOAD CSV WITH HEADERS FROM "https://neo4j-contrib.github.io/developer-resources/cypher/movies_actors.csv" AS line
WITH line
WHERE line.job = "ACTED_IN"
MERGE (m:Movie {title:line.title}) ON CREATE SET m.released = toInt(line.released), m.tagline = line.tagline
MERGE (p:Person {name:line.name}) ON CREATE SET p.born = toInt(line.born)
MERGE (p)-[:ACTED_IN {roles:split(line.roles,";")}]->(m)
RETURN count(*);
Next, we propose five possible co-actors for Tom Hanks:
MATCH (tom:Person)-[:ACTED_IN]->(movie1)<-[:ACTED_IN]-(coActor:Person),
(coActor)-[:ACTED_IN]->(movie2)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks"
AND NOT (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name, count(distinct coCoActor) as frequency
ORDER BY frequency DESC
LIMIT 5
What if I want to perform such an operation on every person who acted in "Apollo 13"? In other words, my task is to propose 5 possible co-actors to every person who acted in "Apollo 13".
How do I do this in an effective way?
A few things here. The query you pasted doesn't really make any sense:
RETURN coCoActor.name, COUNT(DISTINCT coCoActor) AS frequency
This will always return a frequency of 1, so your ORDER BY is meaningless.
I think you meant this:
RETURN coCoActor.name, COUNT(DISTINCT coActor) AS frequency
Second thing is that you don't need the variables movie1 and movie2; they're not used again in your query.
Finally, you need to assert that you're not recommending the same actor to him or herself:
WHERE actor <> coCoActor
To actually answer your question:
// Find the Apollo 13 actors.
MATCH (actor:Person)-[:ACTED_IN]->(:Movie {title:"Apollo 13"})
// Continue with query.
MATCH (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
(coActor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE NOT (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor) AND
actor <> coCoActor
// Group by actor and coCoActor, counting how many coActors they share as freq.
WITH actor, coCoActor, COUNT(DISTINCT coActor) AS freq
// Order by freq descending so that COLLECT()[..5] grabs the top 5 per row.
ORDER BY freq DESC
// Get the recommendations.
WITH actor, COLLECT({name: coCoActor.name, freq: freq})[..5] AS recos
RETURN actor.name, recos;
Suppose I have two kinds of nodes, Person and Competency. They are related by a KNOWS relationship. For example:
(:Person {id: 'thiago'})-[:KNOWS]->(:Competency {id: 'neo4j'})
How do I query this schema to find every Person that knows all nodes in a set of Competency nodes?
Suppose that I need to find every Person that knows "java" and "haskell", and I'm only interested in the nodes that know all of the listed Competency nodes.
I've tried this query:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id;
But I get back a list of every Person that knows either "java" or "haskell", with duplicated entries for those who know both.
Adding a count(c) at the end of the query eliminates the duplicates:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id, count(c);
Then, in this particular case, I can iterate over the result and filter out the rows whose count is less than two to get the nodes I want.
I've found out that I could do it by appending consecutive match clauses that keep filtering the nodes until I get the result I want, in this case:
match (p:Person)-[:KNOWS]->(:Competency {id:'haskell'})
match (p)-[:KNOWS]->(:Competency {id:'java'})
return p.id;
Is this the only way to express this query? I mean, do I need to build the query by concatenating strings? I'm looking for a fixed query with parameters.
with ['java','haskell'] as skills
match (p:Person)-[:KNOWS]->(c:Competency)
where c.id in skills
with p, count(*) as c1, size(skills) as c2
where c1 = c2
return p.id
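Since the question asks for a fixed query with parameters, the same counting idea can be sketched with the list passed in as a parameter. The parameter name $skills is an assumption, and the sketch assumes the list contains no duplicate ids. The $ parameter syntax is available from Neo4j 3.0 on.

```cypher
// $skills is supplied by the caller, e.g. ['java', 'haskell'].
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.id IN $skills
// count(DISTINCT c) guards against duplicate KNOWS relationships to the same node.
WITH p, count(DISTINCT c) AS known
WHERE known = size($skills)
RETURN p.id
```

The query text never changes. Only the parameter value does, which also lets the database reuse the cached query plan.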
One thing you can do is count all the skills, then find the users whose number of skill relationships equals that count:
MATCH (n:Skill) WITH count(n) as skillMax
MATCH (u:Person)-[:HAS]->(s:Skill)
WITH u, count(s) as skillsCount, skillMax
WHERE skillsCount = skillMax
RETURN u, skillsCount
Chris
Untested, but this might do the trick:
match (p:Person)-[:KNOWS]->(c:Competency)
with p, collect(c.id) as cs
where all(x in ['java', 'haskell'] where x in cs)
return p.id;
How about this...
WITH ['java','haskell'] AS comp_col
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.id in comp_col
WITH comp_col
, p
, count(*) AS total
WHERE total = size(comp_col)
RETURN p.id, total
Put the competencies you want in a collection.
Match all the people that have any of those competencies.
Get the count of competencies per person, and keep only the people whose count equals the size of the competency collection from the start.
I think this will work for what you need, but if you are building these queries programmatically, the best performance might come from successive match clauses. Especially if you know which competencies are most and least common when building your queries, you could order the matches so that the least common come first and the most common come last. I think that would narrow down to your desired persons the fastest.
It would be interesting to see what the plan analyzer in the shell says about the different approaches.