How to do union queries in mssql - ruby-on-rails

Hey guys, I have some queries that are generated dynamically by Active Record, and for performance reasons I need to merge them all together and send them to MSSQL in one go.
I tried the following and it works great in PostgreSQL, but I can't get it to work in MSSQL.
(SELECT [panels].* FROM [panels]
 WHERE [panels].[environment_id] = 14
   AND [panels].[agglo_code_id] = 23
   AND [panels].[advert_area_id] = 161
   AND [panels].[product_id] = 25
   AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels
                   WHERE campaign_search_panels.panel_id = panels.panel_id
                     AND campaign_search_panels.campaign_id = 65))
   AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails"
                   WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid
                     AND "AIDAAU_Avails"."TillDate" >= '08-21-2017'
                     AND "AIDAAU_Avails"."FromDate" <= '09-03-2017'))
 ORDER BY [panels].[random_order] ASC
 OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY)
UNION ALL
(SELECT [panels].* FROM [panels]
 WHERE [panels].[environment_id] = 14
   AND [panels].[agglo_code_id] = 23
   AND [panels].[advert_area_id] = 136
   AND [panels].[product_id] = 25
   AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels
                   WHERE campaign_search_panels.panel_id = panels.panel_id
                     AND campaign_search_panels.campaign_id = 65))
   AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails"
                   WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid
                     AND "AIDAAU_Avails"."TillDate" >= '08-21-2017'
                     AND "AIDAAU_Avails"."FromDate" <= '09-03-2017'))
 ORDER BY [panels].[random_order] ASC
 OFFSET 0 ROWS FETCH NEXT 2 ROWS ONLY)
Now I think there are two issues that I can spot already. If I remove the brackets surrounding each query then I get closer, but it still complains about the ORDER BY. I have a feeling you can only order the final result, but I don't have much control over how each individual SQL query is put together, only how I combine them. I would ideally like to keep the ability to both order and limit each subquery. Is there an easy way of putting these together so they will work in MSSQL and not just Postgres?
Thanks for your help!

In a UNION query you can only have one ORDER BY clause and it must go at the end:
SELECT * from <table1>
UNION ALL
SELECT * from <table2>
ORDER BY <col1>
You must remove that ORDER BY from your top query and it should work correctly.
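Applied to the queries from the question, with the per-branch ORDER BY clauses removed and a single ORDER BY at the end, it would look roughly like this (a sketch, untested; most predicates are omitted for brevity, and the OFFSET/FETCH now limits the combined result rather than each branch):
SELECT [panels].* FROM [panels]
WHERE [panels].[advert_area_id] = 161  -- plus the other predicates from the first query
UNION ALL
SELECT [panels].* FROM [panels]
WHERE [panels].[advert_area_id] = 136  -- plus the other predicates from the second query
ORDER BY [random_order] ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY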

If you want to order union results, you would have to throw the results into a CTE. Something like this:
with cte_name as
(
select col_1, col_2, etc
from table
union all
select col_1, col_2, etc
from table_2
)
select col_1, col_2
from cte_name
order by col_1
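If you need to keep the per-branch ORDER BY and row limits from the original queries, one option (an assumption about what T-SQL accepts, not tested here) is to wrap each branch in a derived table, since SQL Server allows ORDER BY inside a subquery when it is accompanied by OFFSET/FETCH (or TOP):
SELECT * FROM (
    SELECT [panels].* FROM [panels]
    WHERE [panels].[advert_area_id] = 161  -- plus the other predicates from the first query
    ORDER BY [panels].[random_order] ASC
    OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY
) AS q1
UNION ALL
SELECT * FROM (
    SELECT [panels].* FROM [panels]
    WHERE [panels].[advert_area_id] = 136  -- plus the other predicates from the second query
    ORDER BY [panels].[random_order] ASC
    OFFSET 0 ROWS FETCH NEXT 2 ROWS ONLY
) AS q2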

Related

How do I optimise this ClickHouse DB JOIN query?

I am playing around with ClickHouse DB and I am trying to figure out why the query below is giving me a DB::Exception: Memory limit (for query) exceeded, and could use some help...
SELECT * FROM
(
SELECT created_at, rates.car_id, MIN(rates.price) FROM rates
WHERE
pickup_location_id = 198
AND created_at = '2020-10-01'
GROUP BY created_at, car_id
) r
JOIN cars c2 ON r.car_id = c2.id
The inner query performs almost instantly (millions of records) and yields only 212 results. However, adding the JOIN causes the query to fail (memory exception, 45GB).
It looks like the JOIN happens on the whole of rates/cars, and not on the "result"?
ClickHouse uses a hash join and places the right-hand table into memory as a hash table.
In the case of an inner join you can swap the tables:
SELECT * FROM cars c2 JOIN
(
SELECT created_at, rates.car_id, MIN(rates.price) FROM rates
WHERE
pickup_location_id = 198
AND created_at = '2020-10-01'
GROUP BY created_at, car_id
) r
ON r.car_id = c2.id

Is the way to make this union of Postgres queries more efficient to use a materialized view?

So right now, I have the following Postgres query in a Rails 5.0 application. The first query basically sums viewership and groups it by domestic and international stations (radio_category) as well as FM and AM (radio_type). The second query totals viewership across all domestic and international stations and groups it by FM/AM.
To make it more efficient, is it better to put a raw SELECT statement that pulls only the numbers that will eventually need to be summed into a materialized view, and then write a SUM()/GROUP BY statement that pulls from the view?
Or is there some clever use of SUM() that only requires selecting the raw numbers once?
Let's say I have at least 1 million rows of data.
SELECT numbers.snapshot_id,
count(*) AS radio_count,
sum(numbers.view_count) AS view_count,
radios.category AS radio_category,
radios.type AS radio_type,
CASE
WHEN radios.type = 'AM' THEN 0
WHEN radios.type = 'FM' THEN 1
END as radio_enum_type
FROM (numbers
JOIN radios ON ((radios.id = numbers.radio_id)))
GROUP BY numbers.snapshot_id, radios.category, radios.type
UNION
SELECT numbers.snapshot_id,
count(*) AS radio_count,
sum(numbers.view_count) AS view_count,
3 AS radio_category,
radios.type AS radio_type,
CASE
WHEN radios.type = 'AM' THEN 0
WHEN radios.type = 'FM' THEN 1
END as radio_enum_type
FROM (numbers
JOIN radios ON ((radios.id = numbers.radio_id)))
GROUP BY numbers.snapshot_id, 3::integer, radios.type
You can't add a row without a UNION. So I'm not sure whether this is better, but you could precalculate the aggregation and then build the UNION from it. However, Postgres may optimize your query so that it ends up the same...
WITH aggregated_numbers AS (
SELECT numbers.snapshot_id,
count(*) AS radio_count,
sum(numbers.view_count) AS view_count,
radios.category AS radio_category,
radios.type AS radio_type,
CASE
WHEN radios.type = 'AM' THEN 0
WHEN radios.type = 'FM' THEN 1
END as radio_enum_type
FROM (numbers
JOIN radios ON ((radios.id = numbers.radio_id)))
GROUP BY numbers.snapshot_id, radios.category, radios.type)
SELECT * FROM aggregated_numbers
UNION
SELECT
snapshot_id,
sum(radio_count) as radio_count,
sum(view_count) as view_count,
3 as radio_category,
radio_type,
radio_enum_type
FROM aggregated_numbers
GROUP BY snapshot_id, radio_type, radio_enum_type
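If you do go the materialized view route from the question, a minimal sketch (the view name and the manual REFRESH are assumptions here, not part of the answer above) would be to materialize the per-category aggregation once and roll the station totals up from it:
CREATE MATERIALIZED VIEW aggregated_numbers AS
SELECT numbers.snapshot_id,
       count(*) AS radio_count,
       sum(numbers.view_count) AS view_count,
       radios.category AS radio_category,
       radios.type AS radio_type,
       CASE WHEN radios.type = 'AM' THEN 0
            WHEN radios.type = 'FM' THEN 1
       END AS radio_enum_type
FROM numbers
JOIN radios ON radios.id = numbers.radio_id
GROUP BY numbers.snapshot_id, radios.category, radios.type;

-- refresh whenever the underlying data changes
REFRESH MATERIALIZED VIEW aggregated_numbers;

-- the union then reads from the much smaller pre-aggregated view
SELECT * FROM aggregated_numbers
UNION
SELECT snapshot_id,
       sum(radio_count) AS radio_count,
       sum(view_count) AS view_count,
       3 AS radio_category,
       radio_type,
       radio_enum_type
FROM aggregated_numbers
GROUP BY snapshot_id, radio_type, radio_enum_type;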

PSQL - Select size of tables for both partitioned and normal

Thanks in advance for any help with this; it is highly appreciated.
So, basically, I have a Greenplum database and I want to select the table size for the top 10 largest tables. This isn't a problem using the below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the above SQL these show up as all their 'child tables' split into small fragments (though I know they accumulate to make up the largest two tables). Is there a way of making a script that selects tables (partitioned or otherwise) and their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table names explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned table(s) are up there), and I cannot specify any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friend here is the pg_relation_size() function for getting the relation size; you would select from pg_class, pg_namespace and pg_partitions, joining them together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
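Alternatively, you can sum pg_relation_size() over each table's partitions from pg_partitions and union that with the non-partitioned tables from pg_tables: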
select * from
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select schemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions )
union all
select schemaname,tablename,
sum(pg_relation_size(schemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= '0' order by 3 desc;

SQL lookup key defined by LAG function

I want to join two tables on a key based on the LAG function. My query doesn't work though; I get an error:
Msg 4108, Level 15, State 1, Line 13 Windowed functions can only appear in the SELECT or ORDER BY clauses.
I would appreciate any suggestions on how to tackle it.
**Table A**
Key
1
2
3
and so on...
**Table B**
MaxKey | Something
3 | A
5 | B
8 | C
**Expected Results**
Key | Something
1 | A
2 | A
3 | A
4 | B
5 | B
6 | C
SELECT
tabA.Key
,tabB.[Something]
,LAG (tabB.MaxKey,1,1) OVER (ORDER BY tabB.MaxKey) AS MinKey
,tabB.[MaxKey]
FROM TableA as tabA
LEFT JOIN TableB as tabB
ON tabA.Key > tabB.MinKey AND tabA.Key <= tabB.MaxKey
I think you can solve this using an outer apply like this:
select * from TableA a
outer apply (
select top 1 something
from TableB b
where b.maxkey >= a.[key]
order by b.maxkey -- needed so TOP 1 picks the smallest qualifying MaxKey
) oa
Another option is to modify your query to do the LAG in a derived table; I believe this might work too:
SELECT
tabA.[Key]
,tabB.[Something]
,MinKey
,tabB.[MaxKey]
FROM TableA as tabA
LEFT JOIN (
SELECT
[Something]
,LAG (MaxKey,1,0) OVER (ORDER BY MaxKey) AS MinKey -- default of 0 so the first range starts below the lowest key
,[MaxKey]
FROM TableB) tabB
ON tabA.[key] > tabB.MinKey AND tabA.[key] <= tabB.MaxKey -- strict lower bound so boundary keys match only one range
ORDER BY tabA.[key]
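For reference, a quick test setup (the table and column definitions are assumed from the question's sample data) to try either query against:
CREATE TABLE TableA ([Key] int);
CREATE TABLE TableB (MaxKey int, Something varchar(10));

INSERT INTO TableA ([Key]) VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO TableB (MaxKey, Something) VALUES (3, 'A'), (5, 'B'), (8, 'C');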

How to use joins and averages together in Hive queries

I have two tables in hive:
Table1: uid, txid, amt, vendor
Table2: uid, txid
Now I need to join the tables on txid, which basically confirms that a transaction was finally recorded. There will be some transactions which are present only in Table1 and not in Table2.
I need to find the average transaction match rate per user (uid) per vendor. Then I need to find the average of these averages by adding all the averages and dividing them by the number of unique users per vendor.
Let's say I have the data:
Table1:
u1,120,44,vend1
u1,199,33,vend1
u1,100,23,vend1
u1,101,24,vend1
u2,200,34,vend1
u2,202,32,vend2
Table2:
u1,100
u1,101
u2,200
u2,202
Example for vendor vend1:
u1 -> Avg transaction find rate = 2 (matches found in both Table1 and Table2) / 4 (total occurrences in Table1) = 0.5
u2 -> Avg transaction find rate = 1/1 = 1
Avg of avgs = (0.5 + 1) (sum of avgs) / 2 (total unique users) = 0.75
Required output:
vend1,0.75
vend2,1
I can't seem to get both the count of matches and the count of total occurrences in Table1 in one Hive query per user per vendor. I have got as far as this query and can't work out how to change it further.
SELECT A.vendor, A.uid, count(*) as totalmatchesperuser
FROM Table1 A
JOIN Table2 B ON A.uid = B.uid AND B.txid = A.txid
GROUP BY A.vendor, A.uid
Any help would be great.
I think you are running into trouble with your JOIN. When you JOIN on txid and uid, you lose the total number of uids per group. If I were you, I would add a column of 1s to table2, name the column something like success or transaction, and do a LEFT OUTER JOIN. Then in your new table you will have a column with the number 1 in it if there was a completed transaction and NULL otherwise. You can then use a CASE statement to convert these NULLs to 0.
Query:
select vendor
,(SUM(avg_uid) / COUNT(uid)) as avg_of_avgs
from (
select vendor
,uid
,AVG(complete) as avg_uid
from (
select uid
,txid
,amt
,vendor
,case when success is null then 0
else success
end as complete
from (
select A.*
,B.success
from table1 as A
LEFT OUTER JOIN table2 as B
ON B.txid = A.txid
) x
) y
group by vendor, uid
) z
group by vendor
Output:
vend1 0.75
vend2 1.0
B.success in line 17 is the column of 1s that I put into table2 before the JOIN. If you are curious about CASE statements in Hive, they are covered in the Hive documentation.
Amazing and precise answer by GoBrewers14!! Thank you so much. I was looking at it from the wrong perspective.
I made a few small changes to the query to finally get things done.
I didn't need to add a "success" column to table2. I checked B.txid in the above query instead of B.success. B.txid will be NULL when a match is not found and will have some value when a match is found, so it checks the success and failure conditions itself without adding a new column. Then I set NULL to 0 and non-NULL to 1 in the CASE above it. I also changed some column aliases, as Hive was finding the original names ambiguous.
The final query looks like :
select vendr
,(SUM(avg_uid) / COUNT(usrid)) as avg_of_avgs
from (
select vendr
,usrid
,AVG(complete) as avg_uid
from (
select usrid
,txnid
,amnt
,vendr
,case when success is null then 0
else 1
end as complete
from (
select A.uid as usrid,A.vendor as vendr,A.amt as amnt,A.txid as txnid
,B.txid as success
from Table1 as A
LEFT OUTER JOIN Table2 as B
ON B.txid = A.txid
) x
) y
group by vendr, usrid
) z
group by vendr;
