Getting Conditional Count in Join with Laravel Query Builder - laravel-5.1

I am trying to achieve the following with Laravel Query builder.
I have a table called deals. Below is the basic schema:
id
deal_id
merchant_id
status
deal_text
timestamps
I also have another table called merchants whose schema is
id
merchant_id
merchant_name
about
timestamps
Currently I am getting deals using the following query
$deals = DB::table('deals')
-> join ('merchants', 'deals.merchant_id', '=', 'merchants.merchant_id')
-> where ('merchant_url_text', $merchant_url_text)
-> get();
Since only 1 merchant is associated with a deal, I am getting deals and related merchant info with the query.
Now I have a 3rd table called tbl_deal_votes. Its schema looks like
id
deal_id
vote (1 if voted up, 0 if voted down)
timestamps
What I want to do is join this 3rd table (on deal_id) to my existing query and be able to also get the upvotes and down votes each deal has received.

To do this in a single query you'll probably need to use SQL subqueries, which don't seem to have good fluent query support in Laravel 4/5. Since you're not using Eloquent objects, the raw SQL is probably easiest to read. (Note that the example below ignores your deals.deal_id and merchants.merchant_id columns, which can likely be dropped. Instead it just uses your deals.id and merchants.id fields by convention.)
$deals = DB::select(
DB::raw('
SELECT
deals.id AS deal_id,
deals.status,
deals.deal_text,
merchants.id AS merchant_id,
merchants.merchant_name,
merchants.about,
COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
FROM
deals
JOIN merchants ON (merchants.id = deals.merchant_id)
LEFT JOIN (
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id
) tbl_upvotes ON (tbl_upvotes.deal_id = deals.id)
LEFT JOIN (
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id
) tbl_downvotes ON (tbl_downvotes.deal_id = deals.id)
')
);
If you'd prefer to use fluent, this should work:
$upvotes_subquery = '
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id';
$downvotes_subquery = '
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id';
$deals = DB::table('deals')
->select([
DB::raw('deals.id AS deal_id'),
'deals.status',
'deals.deal_text',
DB::raw('merchants.id AS merchant_id'),
'merchants.merchant_name',
'merchants.about',
DB::raw('COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count'),
DB::raw('COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count')
])
->join('merchants', 'merchants.id', '=', 'deals.merchant_id')
->leftJoin(DB::raw('(' . $upvotes_subquery . ') tbl_upvotes'), function($join) {
$join->on('tbl_upvotes.deal_id', '=', 'deals.id');
})
->leftJoin(DB::raw('(' . $downvotes_subquery . ') tbl_downvotes'), function($join) {
$join->on('tbl_downvotes.deal_id', '=', 'deals.id');
})
->get();
A few notes about the fluent query:
Used the DB::raw() method to rename a few selected columns. Otherwise, there would have been a conflict between deals.id and merchants.id in the results.
Used COALESCE to default null votes to 0.
Split the subqueries into separate PHP strings to improve readability.
Used left joins for the subqueries so deals with no upvotes/downvotes still show up.
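As an alternative to the two subqueries, the same counts can be produced with one LEFT JOIN and conditional aggregation (a SUM over a CASE expression). This is a different technique from the answer above, sketched here in Python against an in-memory SQLite database so that it runs standalone. The table and column names follow the question; the sample rows and the helper names make_demo_db and deal_vote_counts are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE merchants (id INTEGER PRIMARY KEY, merchant_name TEXT);
        CREATE TABLE deals (id INTEGER PRIMARY KEY, merchant_id INTEGER, deal_text TEXT);
        CREATE TABLE tbl_deal_votes (id INTEGER PRIMARY KEY, deal_id INTEGER, vote INTEGER);
        INSERT INTO merchants VALUES (1, 'Acme');
        INSERT INTO deals VALUES (1, 1, '10% off'), (2, 1, 'Free shipping');
        -- deal 1 gets two upvotes and one downvote, deal 2 gets no votes at all
        INSERT INTO tbl_deal_votes (deal_id, vote) VALUES (1, 1), (1, 1), (1, 0);
    """)
    return conn


def deal_vote_counts(conn):
    """Return (deal_id, merchant_name, upvotes_count, downvotes_count) per deal."""
    sql = """
        SELECT d.id AS deal_id,
               m.merchant_name,
               -- each CASE yields 1 for a matching vote and 0 otherwise,
               -- so deals without votes (vote IS NULL) sum to 0
               SUM(CASE WHEN v.vote = 1 THEN 1 ELSE 0 END) AS upvotes_count,
               SUM(CASE WHEN v.vote = 0 THEN 1 ELSE 0 END) AS downvotes_count
        FROM deals d
        JOIN merchants m ON m.id = d.merchant_id
        LEFT JOIN tbl_deal_votes v ON v.deal_id = d.id
        GROUP BY d.id, m.merchant_name
        ORDER BY d.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in deal_vote_counts(make_demo_db()):
        print(row)
```

The trade-off is that the votes table is read once instead of twice, at the cost of grouping the outer query by every non-aggregated column you select.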

Related

Find records with ID in array of IDS and keep the order of records matching that of IDs [duplicate]

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?
You can do it quite easily with a VALUES (), () list (introduced in PostgreSQL 8.2).
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
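When the id list comes from application code, the (id, position) pairs do not have to be typed by hand. The following is a minimal sketch in Python that generates them as bound parameters. It runs against SQLite rather than PostgreSQL, so the pairs are placed in a CTE (SQLite does not accept a column alias list on a derived table); the function name fetch_in_order and the body column are assumptions made for this example.

```python
import sqlite3


def fetch_in_order(conn, ids):
    """Return comments rows for `ids`, in the same order as `ids`."""
    if not ids:
        return []  # an empty VALUES list would be a syntax error
    # One "(?, ?)" pair per id: the id itself and its 1-based position.
    placeholders = ", ".join("(?, ?)" for _ in ids)
    params = [value
              for pos, id_ in enumerate(ids, start=1)
              for value in (id_, pos)]
    sql = f"""
        WITH x(id, ordering) AS (VALUES {placeholders})
        SELECT c.id, c.body
        FROM comments c
        JOIN x ON c.id = x.id
        ORDER BY x.ordering
    """
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO comments VALUES (?, ?)",
                     [(i, f"comment {i}") for i in range(1, 5)])
    print(fetch_in_order(conn, [1, 3, 2, 4]))
```

Only the placeholders are interpolated into the SQL text; the ids themselves are always passed as parameters.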
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
Just because it is so difficult to find and it has to be spread: in MySQL this can be done much more simply, but I don't know if it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.5 or later this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
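On the verbosity point: the CASE expression does not have to be written by hand if the query is built in application code. Below is a minimal sketch in Python, run against SQLite so it is self-contained; the helper names case_order_clause and fetch_by_case are hypothetical, and the column name is assumed to come from trusted code rather than user input. It says nothing about performance on large id lists.

```python
import sqlite3


def case_order_clause(column, ids):
    """Build 'CASE <column> WHEN ? THEN 1 WHEN ? THEN 2 ... END' plus its parameters.

    `column` is interpolated into the SQL text, so it must be a trusted
    identifier; the ids are returned as bound parameters.
    """
    whens = " ".join(f"WHEN ? THEN {pos}" for pos, _ in enumerate(ids, start=1))
    return f"CASE {column} {whens} END", list(ids)


def fetch_by_case(conn, ids):
    """Return the matching comment ids, ordered like `ids`."""
    if not ids:
        return []
    in_list = ", ".join("?" for _ in ids)
    order_sql, order_params = case_order_clause("id", ids)
    sql = f"SELECT id FROM comments WHERE id IN ({in_list}) ORDER BY {order_sql}"
    # The ids are bound twice: once for the IN list, once for the CASE branches.
    return [row[0] for row in conn.execute(sql, list(ids) + order_params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO comments VALUES (?, ?)",
                     [(i, f"comment {i}") for i in range(1, 5)])
    print(fetch_by_case(conn, [1, 3, 2, 4]))
```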
To do this, I think you should probably have an additional "ORDER" table which defines the mapping of IDs to order (effectively doing what your response to your own question said), which you can then use as an additional column on your select which you can then sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
Without a sequence (works only on 8.4 or later, since it relies on a window function):
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(',' || "comments"."id"::text || ',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built-in in 8.3, but you can create one yourself (the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any type:
select unnest(array['John','Paul','George','Ringo']) as beatle;
select unnest(array[1,3,2,4]) as id;
Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table, which has a field called ids; ids is the array that stores the comments.id values.
Tested on PostgreSQL 9.6.
Let's get a visual impression of what was already said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit: @user80168
I agree with all other posters that say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments, then add another integer column to one of your tables to hold your sort criteria and sort by that value, e.g. "ORDER BY comments.sort DESC". If you want to sort these in a different order every time, then SQL won't be for you in this case.

Optimizing SQL query using JOIN instead of NOT IN

I have a sql query that I'd like to optimize. I'm not the designer of the database, so I have no way of altering structure, indexes or stored procedures.
I have a table that consists of invoices (called faktura) and each invoice has a unique invoice id. If we have to cancel the invoice a secondary invoice is created in the same table but with a field ("modpartfakturaid") referring to the original invoice id.
Example of faktura table:
invoice 1: Id=152549, modpartfakturaid=null
invoice 2: Id=152592, modpartfakturaid=152549
We also have a table called "BHLFORLINIE" which consists of services rendered to the customer. Some of the services have already been invoiced and match a record in the invoice (FAKTURA) table.
What I'd like to do is get a list of all services that either do not have an invoice yet or have an invoice that has not been cancelled.
What I'm doing now is this:
SELECT
dbo.BHLFORLINIE.LeveringsDato AS treatmentDate,
dbo.PatientView.Navn AS patientName,
dbo.PatientView.CPRNR AS patientCPR
FROM
dbo.BHLFORLINIE
INNER JOIN dbo.BHLFORLOEB
ON dbo.BHLFORLOEB.BhlForloebID = dbo.BHLFORLINIE.BhlForloebID
INNER JOIN dbo.PatientView
ON dbo.PatientView.PersonID = dbo.BHLFORLOEB.PersonID
INNER JOIN dbo.HENVISNING
ON dbo.HENVISNING.BhlForloebID = dbo.BHLFORLOEB.BhlForloebID
LEFT JOIN dbo.FAKTURA
ON dbo.BHLFORLINIE.FakturaId = FAKTURA.FakturaId
WHERE
(dbo.BHLFORLINIE.LeveringsDato >= '2017-01-01' OR dbo.BHLFORLINIE.FakturaId IS NULL) AND
dbo.BHLFORLINIE.ProduktNr IN (110,111,112,113,8050,4001,4002,4003,4004,4005,4006,4007,4008,4009,6001,6002,6003,6004,6005,6006,6007,6008,7001,7002,7003,7004,7005,7006,7007,7008) AND
((dbo.FAKTURA.FakturaType = 0 AND
dbo.FAKTURA.FakturaID NOT IN (
SELECT FAKTURA.ModpartFakturaID FROM FAKTURA WHERE FAKTURA.ModpartFakturaID IS NOT NULL
)) OR
dbo.FAKTURA.FakturaType IS NULL)
GROUP BY
dbo.PatientView.CPRNR,
dbo.PatientView.Navn,
dbo.BHLFORLINIE.LeveringsDato
Is there a smarter way of doing this? Right now the query runs three times slower because of the added "not in" subquery.
Any help is much appreciated!
Peter
You can use an outer join and check for null values to find non-matches:
SELECT customer.name, i.invoiceId
FROM invoices i
INNER JOIN customer ON i.customerId = customer.customerId
LEFT OUTER JOIN invoices i2 ON i.invoiceId = i2.cancelInvoiceId
WHERE i2.invoiceId IS NULL
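To see that the outer-join form returns the same rows as NOT IN, here is a minimal sketch in Python against an in-memory SQLite table. It reuses the FakturaID and ModpartFakturaID column names and the two example invoices from the question; the third invoice (152600), the lower-case table name and the helper names are made up for the demo. It only checks equivalence of results; whether the rewrite is faster on the real SQL Server database has to be measured there.

```python
import sqlite3

# Original style: exclude every invoice that some other invoice points back to.
NOT_IN_SQL = """
    SELECT f.FakturaID
    FROM faktura f
    WHERE f.FakturaID NOT IN (
        SELECT ModpartFakturaID FROM faktura WHERE ModpartFakturaID IS NOT NULL
    )
    ORDER BY f.FakturaID
"""

# Anti-join: left join each invoice to its cancelling invoice (if any)
# and keep only the rows where no such match was found.
ANTI_JOIN_SQL = """
    SELECT f.FakturaID
    FROM faktura f
    LEFT JOIN faktura c ON c.ModpartFakturaID = f.FakturaID
    WHERE c.FakturaID IS NULL
    ORDER BY f.FakturaID
"""


def make_demo_db():
    """Invoice 152549 is cancelled by 152592; 152600 is an untouched invoice."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE faktura (FakturaID INTEGER PRIMARY KEY, ModpartFakturaID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO faktura VALUES (?, ?)",
        [(152549, None), (152592, 152549), (152600, None)],
    )
    return conn


def invoice_ids(conn, sql):
    """Run one of the queries above and return the invoice ids as a list."""
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_demo_db()
    print(invoice_ids(conn, NOT_IN_SQL))
    print(invoice_ids(conn, ANTI_JOIN_SQL))
```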

Linq Query Timing Out

I have this query that uses the DBContext entities I created.
var referral = entities.StudentReferrals.Where(x => x.ReferralID == p && x.SchoolYear == year).FirstOrDefault();
When I remove x.SchoolYear == year the query works fine, but with it my query times out. This is the opposite of what I would expect: the more you narrow a query down with Where clause constraints, the less likely it should be to time out.
SchoolYear is a field in the query and the query itself is valid; when I run it in SQL Server Management Studio it returns results in less than a second.
My confusion is, why would adding a constraint to the Where clause cause a query to time out?
x.SchoolYear and year are both strings.
The full query is...
SELECT [Extent1].[BirthDate] AS [BirthDate],
[Extent1].[LegalFirstName] AS [LegalFirstName],
[Extent1].[LegalLastName] AS [LegalLastName],
[Extent1].[PreferredFirstName] AS [PreferredFirstName],
[Extent1].[PreferredLastName] AS [PreferredLastName],
[Extent1].[StudentNumber] AS [StudentNumber],
[Extent1].[LegacyStudentNumber] AS [LegacyStudentNumber],
[Extent1].[TranscriptSchoolCode] AS [TranscriptSchoolCode],
[Extent1].[OEN] AS [OEN],
[Extent1].[StatusIndicator] AS [StatusIndicator],
[Extent1].[SchoolYear] AS [SchoolYear],
[Extent1].[ReferralID] AS [ReferralID],
[Extent1].[PersonID] AS [PersonID],
[Extent1].[Active] AS [Active],
[Extent1].[ServiceTypeID] AS [ServiceTypeID],
[Extent1].[IsSchoolActive] AS [IsSchoolActive],
[Extent1].[Principal] AS [Principal],
[Extent1].[SchoolName] AS [SchoolName],
[Extent1].[SchoolCode] AS [SchoolCode],
[Extent1].[NearNorthSchoolCode] AS [NearNorthSchoolCode],
[Extent1].[TranscriptSchoolPrincipal] AS [TranscriptSchoolPrincipal],
[Extent1].[TranscriptSchoolName] AS [TranscriptSchoolName],
[Extent1].[TranscriptNearNorthSchoolCode] AS [TranscriptNearNorthSchoolCode],
[Extent1].[GuardianFirstName] AS [GuardianFirstName],
[Extent1].[GuardianLastName] AS [GuardianLastName],
[Extent1].[AreaCode] AS [AreaCode],
[Extent1].[ContactNo] AS [ContactNo],
[Extent1].[ReferredByFirstName] AS [ReferredByFirstName],
[Extent1].[ReferredByLastName] AS [ReferredByLastName],
[Extent1].[ReferredDate] AS [ReferredDate],
[Extent1].[Reason] AS [Reason],
[Extent1].[gender] AS [gender],
[Extent1].[grade] AS [grade],
[Extent1].[HomeroomTeacher] AS [HomeroomTeacher],
[Extent1].[IntakeTeamMember] AS [IntakeTeamMember],
[Extent1].[IntakeMemberID] AS [IntakeMemberID]
FROM (SELECT [StudentReferrals].[BirthDate] AS [BirthDate],
[StudentReferrals].[LegalFirstName] AS [LegalFirstName],
[StudentReferrals].[LegalLastName] AS [LegalLastName],
[StudentReferrals].[PreferredFirstName] AS [PreferredFirstName],
[StudentReferrals].[PreferredLastName] AS [PreferredLastName],
[StudentReferrals].[gender] AS [gender],
[StudentReferrals].[StudentNumber] AS [StudentNumber],
[StudentReferrals].[LegacyStudentNumber] AS [LegacyStudentNumber],
[StudentReferrals].[TranscriptSchoolCode] AS [TranscriptSchoolCode],
[StudentReferrals].[OEN] AS [OEN],
[StudentReferrals].[StatusIndicator] AS [StatusIndicator],
[StudentReferrals].[SchoolYear] AS [SchoolYear],
[StudentReferrals].[grade] AS [grade],
[StudentReferrals].[ReferralID] AS [ReferralID],
[StudentReferrals].[PersonID] AS [PersonID],
[StudentReferrals].[Active] AS [Active],
[StudentReferrals].[ServiceTypeID] AS [ServiceTypeID],
[StudentReferrals].[IsSchoolActive] AS [IsSchoolActive],
[StudentReferrals].[Principal] AS [Principal],
[StudentReferrals].[SchoolName] AS [SchoolName],
[StudentReferrals].[SchoolCode] AS [SchoolCode],
[StudentReferrals].[NearNorthSchoolCode] AS [NearNorthSchoolCode],
[StudentReferrals].[TranscriptSchoolPrincipal] AS [TranscriptSchoolPrincipal],
[StudentReferrals].[TranscriptSchoolName] AS [TranscriptSchoolName],
[StudentReferrals].[TranscriptNearNorthSchoolCode] AS [TranscriptNearNorthSchoolCode],
[StudentReferrals].[GuardianFirstName] AS [GuardianFirstName],
[StudentReferrals].[GuardianLastName] AS [GuardianLastName],
[StudentReferrals].[AreaCode] AS [AreaCode],
[StudentReferrals].[ContactNo] AS [ContactNo],
[StudentReferrals].[ReferredByFirstName] AS [ReferredByFirstName],
[StudentReferrals].[ReferredByLastName] AS [ReferredByLastName],
[StudentReferrals].[ReferredDate] AS [ReferredDate],
[StudentReferrals].[IntakeTeamMember] AS [IntakeTeamMember],
[StudentReferrals].[IntakeMemberID] AS [IntakeMemberID],
[StudentReferrals].[Reason] AS [Reason],
[StudentReferrals].[HomeroomTeacher] AS [HomeroomTeacher]
FROM [dbo].[StudentReferrals] AS [StudentReferrals]) AS [Extent1]
WHERE ([Extent1].[ReferralID] = @p__linq__0) AND ([Extent1].[SchoolYear] = @p__linq__1)
Here is the StudentReferral definition...
SELECT TOP (100) PERCENT p.person_id AS PersonID, p.birth_date AS BirthDate, p.legal_first_name AS LegalFirstName, p.legal_surname AS LegalLastName, p.preferred_first_name AS PreferredFirstName,
p.preferred_surname AS PreferredLastName, p.gender, p.student_no AS StudentNumber, p.legacy_student_number AS LegacyStudentNumber, p.transcript_school_code AS TranscriptSchoolCode,
p.oen_number AS OEN, s.status_indicator_code AS StatusIndicator, s.school_year AS SchoolYear, s.grade, CAST(CASE WHEN PATINDEX('%[^A-Za-z]%', s.Grade) = 0 THEN 1 ELSE CASE WHEN CAST(s.Grade AS int)
< 9 THEN 1 ELSE 0 END END AS bit) AS IsElementary, t.SchoolName, t.SchoolCode, t.NearNorthSchoolCode, pg.person_id AS GuardianID, pg.legal_first_name AS GuardianFirstName,
pg.legal_surname AS GuardianLastName, pt.area_code AS AreaCode, pt.phone_no AS ContactNo, pt.email_account AS Email
FROM Trillium.dbo.persons AS p INNER JOIN
Trillium.dbo.student_registrations AS s ON s.person_id = p.person_id INNER JOIN
dbo.Schools AS t ON t.SchoolCode = s.school_code INNER JOIN
NNDSB_AD_Routines.dbo.Students_Trillium_Guardians AS g ON s.person_id = g.student_person_id INNER JOIN
Trillium.dbo.persons AS pg ON g.contact_person_id = pg.person_id INNER JOIN
Trillium.dbo.person_telecom AS pt ON pg.person_id = pt.person_id
WHERE (s.status_indicator_code IN ('Active', 'PreReg')) AND (pt.telecom_type_name = 'home')
GROUP BY p.person_id, p.birth_date, p.legal_first_name, p.legal_surname, p.preferred_first_name, p.preferred_surname, p.gender, p.student_no, p.legacy_student_number, p.transcript_school_code, p.oen_number,
s.status_indicator_code, s.school_year, s.grade, CAST(CASE WHEN PATINDEX('%[^A-Za-z]%', s.Grade) = 0 THEN 1 ELSE CASE WHEN CAST(s.Grade AS int) < 9 THEN 1 ELSE 0 END END AS bit), t.SchoolName,
t.SchoolCode, t.NearNorthSchoolCode, pg.person_id, pg.legal_first_name, pg.legal_surname, pt.area_code, pt.phone_no, pt.email_account, g.primary_contact_priority
ORDER BY g.primary_contact_priority
I can almost guarantee that the query that EF produces and the query you're executing in SSMS are not the exact same SELECT statement. You probably wrote something like what Stephen Byrne has in his answer, i.e.
SELECT * from StudentReferrals WHERE ReferralID=1 AND SchoolYear='2015'
Right off the bat this query doesn't have a TOP qualifier on it which your EF query probably will due to the presence of the FirstOrDefault call.
Your first step should be to use something like SQL Profiler and grab the actual query that EF is generating. It's possible that with that query the optimizer is choosing to do a table scan because of the type of query that is being generated.
This likely won't make any difference, but you could also try rewriting your query as:
var referral = entities.StudentReferrals.FirstOrDefault(x => x.ReferralID == p && x.SchoolYear == year);
As an example, when I write the following query against my database:
OrganizationalNodes.FirstOrDefault(on => on.Name == "Justice League")
EF generates the following SQL:
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[Id] AS [Id],
-- columns omitted for brevity
FROM ( SELECT TOP (1)
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
-- columns omitted for brevity
'0X0X' AS [C1]
FROM [dbo].[OrganizationalItems] AS [Extent1]
INNER JOIN [dbo].[OrganizationalNodes] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Id]
WHERE N'Justice League' = [Extent1].[Name]
) AS [Limit1]
Well, to answer the question
why would adding a constraint to the Where clause cause a query to time out
The most likely cause is that you have a lot of data in the table, but no index covers the SchoolYear column. Therefore, when you include it in a WHERE clause, this causes a Table Scan (because every row has to be checked to see whether it should be included in the result set).
If you use SQL Server Management Studio and write the query manually, e.g.
SELECT * from StudentReferrals WHERE ReferralID=1 AND SchoolYear='2015'
and then include the actual execution plan (Query -> Include Actual Execution Plan), you will get the execution breakdown, which will show you clearly whether a Table Scan is involved. If there is, create an index to "cover" the columns involved and it should fix your issue.
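The scan-versus-index difference can be observed on a small scale with any engine that exposes its plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python as a stand-in; it is not SQL Server, StudentReferrals is modelled here as a plain table rather than the view from the question, and the index name idx_referral_year is made up. The exact plan wording differs between SQLite versions, so the output is printed rather than assumed.

```python
import sqlite3

QUERY = "SELECT * FROM StudentReferrals WHERE ReferralID = ? AND SchoolYear = ?"


def make_demo_db():
    """A cut-down stand-in table with only the two filtered columns plus one more."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StudentReferrals (ReferralID INTEGER, SchoolYear TEXT, Reason TEXT)"
    )
    return conn


def query_plan(conn):
    """Return SQLite's plan for QUERY as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, "2015")).fetchall()
    return " | ".join(row[-1] for row in rows)


def add_index(conn):
    """Create a composite index covering both columns used in the WHERE clause."""
    conn.execute(
        "CREATE INDEX idx_referral_year ON StudentReferrals (ReferralID, SchoolYear)"
    )


if __name__ == "__main__":
    conn = make_demo_db()
    print("before:", query_plan(conn))  # expected to report a full scan
    add_index(conn)
    print("after: ", query_plan(conn))  # expected to report a search using the index
```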
Update
Another possible solution could be to run DBCC FREEPROCCACHE to clear out any cached execution plans just in case for some reason SQL Server has picked something insane for whatever query is generated by Entity Framework.

PSQL - Select size of tables for both partitioned and normal

Thanks in advance for any help with this, it is highly appreciated.
So, basically, I have a Greenplum database and I want to select the table size for the top 10 largest tables. This isn't a problem using the below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the above SQL these show up as all their 'child tables', split into small fragments (though I know they add up to the two largest tables). Is there a way of making a script that selects tables (partitioned or otherwise) and their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table name explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned tables are up there), and I cannot specify any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friend here is the pg_relation_size() function for getting a relation's size; select from pg_class, pg_namespace and pg_partitions, joined together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
select * from
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select schemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions )
union all
select schemaname,tablename,
sum(pg_relation_size(schemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= '0' order by 3 desc;
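Both queries rest on the same idea: map every child partition back to its parent table name, leave ordinary tables as they are, and then sum the sizes per name. A minimal sketch of that roll-up in plain Python, with made-up relation names and sizes and a hypothetical function top_tables, looks like this:

```python
from collections import defaultdict


def top_tables(relations, partitions, limit=10):
    """Sum sizes per logical table and return the `limit` largest.

    relations:  iterable of (schema, relname, size_bytes), one per physical relation
    partitions: dict mapping (schema, child_relname) -> parent table name
    """
    totals = defaultdict(int)
    for schema, relname, size in relations:
        # Same role as COALESCE(p.tablename, c.relname) in the first query:
        # children are credited to their parent, everything else to itself.
        logical_name = partitions.get((schema, relname), relname)
        totals[(schema, logical_name)] += size
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(schema, name, size) for (schema, name), size in ranked[:limit]]


if __name__ == "__main__":
    relations = [
        ("public", "sales", 0),              # the parent itself holds no rows
        ("public", "sales_1_prt_1", 300),
        ("public", "sales_1_prt_2", 500),
        ("public", "customers", 600),
        ("public", "tiny", 10),
    ]
    partitions = {
        ("public", "sales_1_prt_1"): "sales",
        ("public", "sales_1_prt_2"): "sales",
    }
    print(top_tables(relations, partitions, limit=2))
```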

Solving a PG::GroupingError: ERROR

The following code gets all the residences which have all the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: {amenity_id: id_list}).
references(:listed_amenities).
group(:residence_id).
having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: { amenity_id: id_list }).
group('residences.id').
having('COUNT(*) = ?', id_list.size)
The query that the Ruby code expands to selects all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual: when GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column.
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
);
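The "has every amenity in the list" check can be tried out end to end with a small script. This is a minimal sketch in Python against in-memory SQLite, using the table and column names from the question; the name column, the sample rows and the function residences_with_all are made up. Two details are my own additions rather than part of the answer above: it counts DISTINCT amenity ids so that duplicate rows in listed_amenities cannot inflate the count, and it returns every matching residence instead of applying LIMIT 1.

```python
import sqlite3


def make_demo_db():
    """Residence 1 has amenities 48 and 49, residence 2 only 48, residence 3 has 48, 49, 50."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE residences (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE listed_amenities (residence_id INTEGER, amenity_id INTEGER);
        INSERT INTO residences VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO listed_amenities VALUES
            (1, 48), (1, 49),
            (2, 48),
            (3, 48), (3, 49), (3, 50);
    """)
    return conn


def residences_with_all(conn, amenity_ids):
    """Return (id, name) of residences that have every amenity in `amenity_ids`."""
    wanted = sorted(set(amenity_ids))  # ignore duplicates in the input list
    if not wanted:
        return []
    marks = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT r.id, r.name
        FROM residences r
        WHERE r.id IN (
            SELECT residence_id
            FROM listed_amenities
            WHERE amenity_id IN ({marks})
            GROUP BY residence_id
            -- a residence qualifies only if it matched every wanted amenity
            HAVING COUNT(DISTINCT amenity_id) = ?
        )
        ORDER BY r.id
    """
    return conn.execute(sql, wanted + [len(wanted)]).fetchall()


if __name__ == "__main__":
    print(residences_with_all(make_demo_db(), [48, 49]))
```

Because the grouping happens inside the subquery, the outer query can select residences.* freely without running into the grouping error.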
