I am having two tables.
Table 1: Consists of name of the players and their id's.
tournament=> select * from Players;
id | name
----+--------
1 | Rahul
2 | Rohit
3 | Ramesh
4 | Roshan
5 | Ryan
6 | Romelu
7 | Roman
8 | Rampu
(8 rows)
Table 2:Consist of both the opponents playing each other, column1 contains the name of the winner and column2 loser. So this means that both taking part in that Match.
tournament=> select * from Matches;
id | winner | loser
----+--------+-------
1 | 1 | 2
2 | 3 | 4
3 | 5 | 6
4 | 7 | 8
(4 rows)
Now I want to count number of matches played by different players, I have counted the number of matches won by players by following query.
SELECT Players.id, COUNT(Matches.winner) AS Points FROM Players LEFT JOIN (SELECT * from Matches) AS Matches ON Players.id = Matches.winner GROUP by Players.id Order by Points desc, Players.id;
id | points
----+---
1 | 1
3 | 1
5 | 1
7 | 1
2 | 0
4 | 0
6 | 0
8 | 0
(8 rows)
But I am not able to get the logic of how should I calculate the number of matches played by each player?
From above Matches table we can see that each players has played once but I am not able to write that in psql.
A simple nested subquery is needed, as follows:
SELECT pl.id,
(
SELECT COUNT(id)
FROM Matches
WHERE winner = pl.id OR loser = pl.id
) AS matches
FROM Players pl
The following query counts the winner and loser ID values in two separate queries which are then UNIONed together. There is no worry of double counting because a team cannot play against itself, rather it can only play against another unique team. The outer query sums the team counts.
SELECT t.id, COUNT(t.id) AS numMatches FROM (
SELECT winner AS id FROM Matches UNION ALL
SELECT loser AS id FROM Matches
) t GROUP BY t.id
Related
I got a lot of nodes, some with similar values in field X, I want to select by distinct X values and take all the popular nodes (order by some other field Y) with all their properties.
Example:
ID | X | Y | Name
1 | A | 100 | David
2 | A | 10 | Chris
3 | B | 5 | Brad
4 | B | 25 | Amber
Should return:
1 | A | 100 | David
4 | B | 25 | Amber
I managed to get the list by distinct X:
MATCH (u:NodeType)
RETURN DISTINCT u.X
I need to find the most popular (highest value of Y) nodes to join with my distinct nodes (which are now only a single property) and return whole nodes (with all the properties).
You are looking for an arg max-style query. I recently answered a similar problem using collect:
MATCH (u:NodeType)
WITH u
ORDER BY u.Y DESC
WITH u.X AS X, collect(u)[0] AS u
RETURN u
The idea is the following:
Order by the value of Y (descending).
Implicitly group by the values of X and for the aggregating function, use collect to gather other values to a list. The elements of the list are the nodes (which are still stored according to a descending order of Y).
For each collected list, select the first element with [0].
Maybe the query is a bit easier to read if you perform the last step in a separate clause (and not in the WITH clause that performs the collect):
MATCH (u:NodeType)
WITH u
ORDER BY u.Y DESC
WITH u.X AS X, collect(u) AS us
RETURN us[0] AS u
Hi I have a tables like this
exams
id | exam_name
1 | computer science
2 | Environment science
exam_students
id | exam_id | student_name
1 | 1 | Josh
2 | 1 | Michael
3 | 1 | John
I just need to join and count the total students of each exam and output something like this
exam_name | total_students |
computer science | 3 |
Environment science| 0 |
Thank you for your any help and suggestions
Try this
SELECT
a.exam_name, count(b.id) AS total_students
FROM
exams a
LEFT JOIN exam_students b ON a.id = b.exam_id
GROUP BY
a.id
Hope this help
I am struggling to get the proper cypher that is both efficient and allows pagination through skip and limit.
Here is the simple scenario: I have the related nodes (company)<-[A]-(set)<-[B]-(job) where there are multiple instances of (set) with distinct (job) instances related to them. The (job) nodes have a specific status property that can hold one of several states. We need to count the number of (job) nodes in a particular state per (set) and use skip and limit to paginate on the distinct (set) nodes.
So we can get a very efficient query for job.status counts using this.
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
return s.Description, j.Status, count(*) as StatusCount;
Which will give us a rows of the Set.Description, Job.Status, and JobStatus count. But we will get multiple rows for the Set based on the Job.Status. This is not conducive to paging over distinct sets though. Something like:
s.Description j.Status StatusCount
-------------------+--------------+----------------
Set 1 | Unassigned | 10
Set 1 | Completed | 2
Set 2 | Unassigned | 3
Set 1 | Reviewed | 10
Set 3 | Completed | 4
Set 2 | Reviewed | 7
What we are trying to achieve with the proper cypher is result rows based on distinct Sets. Something like this:
s.Description Unassigned Completed Reviewed
-------------------+--------------+-------------+----------
Set 1 | 10 | 2 | 10
Set 2 | 3 | 0 | 7
Set 3 | 0 | 4 | 0
This would then allow us to paginate over Sets using skip and limit properly.
I have tried many different approaches and cannot seem to find the right combination for this type of result. Anyone have any ideas? Thanks!
** EDIT - Using the answer provided by MIchael, here's how to get the status count values in java **
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
with s, j.Status as Status,count(*) as StatusCount
return s.Description, collect({Status:Status,StatusCount:StatusCount]) as StatusCounts;
List<Object> statusMaps = (List<Object>) row.get("StatusCounts");
for(Object statusEntry : statusMaps ) {
Map<String,Object> statusMap = (Map<String,Object>) statusEntry;
String status = (String) statusMap.get("Status");
Number count = statusMap.get("StatusCount");
}
You can use WITH and aggregation, and optionally a map result
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
with s, j.Status as Status,count(*) as StatusCount
return s.Description, collect([Status,StatusCount]);
or
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
with s, j.Status as Status,count(*) as StatusCount
return s.Description, collect({Status:Status,StatusCount:StatusCount]);
I have
50K Post nodes
40K Tag nodes
125K TAGGED relationships (meaning average 2,5 tags per post)
in my graph and below query causes a "Java heap space" error.
match (p1:Post)-[r1:TAGGED]->(t:Tag)<-[r2:TAGGED]-(p2:Post)
return p1.Title, count(r1), p2.Title, count(r2)
limit 10
What I expected was some repeated rows depending on number of shared tags. I was not sure how limit would work (stop after first 10 posts or tags). But, since I have limit 10 I did not expect this query to traverse all the graph. It seems like it does.
UPDATE 1
With a few changes, Christophe Willemsen's query returns 10 rows in 15 sec.
// I need label for the otherPost because Users are also TAGGED
MATCH (post:Post)-[:TAGGED]->(t)<-[:TAGGED]-(otherPost:Post)
RETURN post.Title, count(t) as cnt, otherPost.Title
// ORDER BY cnt DESC // for now I do not need this
LIMIT 10;
I thought "ORDER BY" clause may cause traversal of all possible paths so I removed the clause but it is still 15 sec. It is also 15 sec. when I make the limit value 1 or 1000 without sorting.
What I expect from Neo4j was: "Start from any Post node, then jump to its Tags and find otherPosts that are tagged with the same tag. When there are 10 found stop traversing and return the results." I am pretty sure it is not doing this.
To make my expectation clear, assume that graph is this small and we use Limit 3 in the cypher query.
p1 - [t1, t2, t3] // Post1 is tagged with t1, t2 and t3
p2 - [t2, t3, t4]
p3 - [t3, t4, t5]
What I expect is:
Start form p1 (or any Post node)
Jump to t1
No other posts are tagged with t1
Jump to t2
p2 is tagged with t2 (1 of 3)
No other posts are tagged with t2
Jump to t3
p2 is tagged with t3 (2 of 3)
p3 is tagged with t3 (3 of 3)
we reached the limit, break
But, it seems like Limit is applied after traversing all data.
So, my question is now: Did Neo4j found all the matches and returned 10 of them or did it stop searching after first 10 matches? And of course, Why?
UPDATE 2
After helpful answers I managed to decrease the scope of my question so I tried below queries.
// 3 sec.
MATCH (p:Post)-[:TAGGED]->(t:Tag)
RETURN p.Title, count(t)
LIMIT 1;
// 3 sec.
MATCH (p:Post)-[:TAGGED]->(t:Tag)
RETURN p.Title, count(t)
LIMIT 1000;
// 100 ms.
MATCH (p:Post)-[:TAGGED]->(t:Tag)
RETURN p.Title, t.Name
LIMIT 1;
// 150 ms.
MATCH (p:Post)-[:TAGGED]->(t:Tag)
RETURN p.Title, t.Name
LIMIT 1000;
So, I still do not know why but, using aggregation methods (I tried collect(t.Name) instead of count) breaks the expected (at least my expectations :) behaviour of limit functionality.
This query will result in a global graph lookup, at least for neo4j 2.1.7 and below.
I would first matching the nodes and then expanding the path
MATCH (post:Post)
MATCH (post)-[:TAGS]->(t)<-[:TAGS]-(otherPost)
RETURN post, count(t) as cnt, otherPost
ORDER BY cnt DESC
LIMIT 10;
And this is the execution plan, as you can see by matching first the post nodes only (so labels index) it costs you only retrieving those and following relationships
ColumnFilter
|
+Top
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByLabel
+----------------------+--------+--------+----------------------------------------------+------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+--------+--------+----------------------------------------------+------------------------------------------------------------------------------------------------+
| ColumnFilter | 10 | 0 | | keep columns post, cnt, otherPost |
| Top | 10 | 0 | | { AUTOINT0}; Cached( INTERNAL_AGGREGATEc24f01bf-69cc-4bd9-9aed-be257028194b of type Integer) |
| EagerAggregation | 9900 | 0 | | post, otherPost |
| Filter | 134234 | 0 | | NOT( UNNAMED30 == UNNAMED43) |
| SimplePatternMatcher | 134234 | 0 | t, UNNAMED43, UNNAMED30, post, otherPost | |
| NodeByLabel | 100 | 101 | post, post | :Post |
+----------------------+--------+--------+----------------------------------------------+------------------------------------------------------------------------------------------------+
Total database accesses: 101
And here a blog post explaining why I removed labels except for the first part of the query : http://graphaware.com/neo4j/2015/01/16/neo4j-graph-model-design-labels-versus-indexed-properties.html
What Christophe said and
Try to reduce the cardinality in between:
match (p1:Post)-[r1:TAGGED]->(t:Tag)
WITH tag, count(*) as freq, collect(distinct p1.Title) as posts
MATCH (tag)<-[r2:TAGGED]-(p2:Post)
return posts, freq, p2.Title, count(r2)
limit 10
I have two tables like the following:
players:
id | name | location
1 | john | australia
2 | anne | u.s.
3 | jen | germany
games:
player | points | player2 | points2
john | 2 | anne | 1
anne | 1 | jen | 3
jen | 3 | john | 4
jen | 4 | anne | 1
I want the SQL statement to return each player's name, his or her location, and the total points each has accumulated through all games he or she participated in.
I tried to do a left join but I couldn't get this to work.
SELECT name, location, SUM(total_points) AS total_points
FROM (
SELECT name, location, COALESCE(SUM(points), 0) AS total_points
FROM players p
LEFT JOIN games g ON p.name = g.player
GROUP BY name, location
UNION ALL
SELECT name, location, COALESCE(SUM(points2), 0)
FROM players p
LEFT JOIN games g ON p.name = g.player2
GROUP BY name, location
) x
GROUP BY name, location
DEMO