LEFT JOIN using _PARTITIONDATE

I'm currently using Standard SQL in BigQuery. I am trying to join two tables, one of which is partitioned by day on the _PARTITIONDATE pseudo-column (ingestion-time partitioning).
I tried the query below:
SELECT
DISTINCT DATE(create_time) AS date,
user_id,
city_name,
transaction_id,
price
FROM
table_1 a
LEFT JOIN (SELECT user_id, city_name FROM table_2) b
ON (a.user_id = b.user_id AND DATE(create_time) = _PARTITIONDATE)
I've used this kind of JOIN (on _PARTITIONDATE) before and it worked, but for this particular query I get the error message:
Unrecognized name: _PARTITIONDATE
Can anyone tell me why this happens and how I can solve it? Thanks in advance.

The issue is that the subquery on table_2 does not select the _PARTITIONDATE pseudo-column, so it is not visible outside the subquery:
SELECT user_id, city_name FROM table_2
To solve it, select it inside the subquery:
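SELECT
DISTINCT DATE(a.create_time) AS date,
a.user_id,
b.city_name,
a.transaction_id,
a.price
FROM
table_1 a
LEFT JOIN (SELECT _PARTITIONDATE AS pd, user_id, city_name FROM table_2) b
ON (a.user_id = b.user_id AND DATE(a.create_time) = b.pd)
Note that you need an alias such as pd, because BigQuery only lets you select a pseudo-column under an alias. The columns are also qualified with a. and b., since user_id exists on both sides of the join and an unqualified reference would be ambiguous.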
It probably worked in the past because you were joining the two tables directly, where _PARTITIONDATE is still in scope (you don't get the selectivity benefits of the subquery in that case):
FROM
table_1 a
LEFT JOIN table_2 b
ON (a.user_id = b.user_id AND DATE(create_time) = _PARTITIONDATE)
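The scoping rule behind the error is not specific to BigQuery: a derived table only exposes the columns its SELECT list projects. Below is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite as a stand-in for BigQuery and a hypothetical ordinary column part_date in place of the _PARTITIONDATE pseudo-column; the sample rows are made up.

```python
import sqlite3

# SQLite stands in for BigQuery; part_date is a hypothetical ordinary column
# playing the role of the _PARTITIONDATE pseudo-column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (user_id INTEGER, create_time TEXT,
                      transaction_id INTEGER, price REAL);
CREATE TABLE table_2 (user_id INTEGER, city_name TEXT, part_date TEXT);
INSERT INTO table_1 VALUES (1, '2020-01-01 10:00:00', 100, 9.5);
INSERT INTO table_2 VALUES (1, 'Jakarta', '2020-01-01');
""")


def run(subquery):
    """Join table_1 to a derived table b built from the given subquery."""
    sql = f"""
        SELECT DISTINCT DATE(a.create_time) AS date, a.user_id, b.city_name,
               a.transaction_id, a.price
        FROM table_1 a
        LEFT JOIN ({subquery}) b
          ON a.user_id = b.user_id AND DATE(a.create_time) = b.pd
    """
    return conn.execute(sql).fetchall()


# Without the partition column in its SELECT list, b has no pd column.
error = None
try:
    run("SELECT user_id, city_name FROM table_2")
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Projecting the partition column under an alias makes b.pd resolvable.
rows = run("SELECT part_date AS pd, user_id, city_name FROM table_2")
print(rows)
```

SQLite words the error differently from BigQuery's "Unrecognized name", but the cause is the same: the column was never projected by the subquery.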

Related

How can I join 2 tables?

I would like to join these two tables. Could you please help?
Select Number, OwnerId from DNIS.numbers
select ID,Name from DNIS.owners
Thank you.
SQL Server normally allows you to join tables from different databases as long as they are all on the same server. Here is an example (all you have to do is qualify each table with its database and schema names in the query):
SELECT N.Number, N.OwnerId, O.ID, O.Name
FROM DB1.DNIS.numbers N
JOIN DB2.DNIS.owners O ON O.ID = N.OwnerId
You can also use the following shorthand, which leaves out the schema name and falls back to the default schema (usually dbo):
SELECT N.Number, N.OwnerId, O.ID, O.Name
FROM DB1..numbers N
JOIN DB2..owners O ON O.ID = N.OwnerId
To do that, you have to qualify the table and column names with their database names in your join statement, like so:
SELECT db1.tablename.column, db2.tablename.column
FROM db1.tablename INNER JOIN db2.tablename
ON db1.tablename.id = db2.tablename.id;
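SQLite has no multi-database server, but its ATTACH DATABASE statement gives a rough analogue of the same idea: every table is qualified with the name of the database it lives in. A sketch using Python's sqlite3 module; the database name db2 and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # "main" database holds numbers
conn.execute("ATTACH DATABASE ':memory:' AS db2")   # second database holds owners
conn.executescript("""
CREATE TABLE main.numbers (Number TEXT, OwnerId INTEGER);
CREATE TABLE db2.owners (ID INTEGER, Name TEXT);
INSERT INTO main.numbers VALUES ('555-0100', 1), ('555-0101', 2);
INSERT INTO db2.owners VALUES (1, 'Alice'), (2, 'Bob');
""")

# Each table is prefixed with its database name, as in DB1.schema.table.
rows = conn.execute("""
    SELECT N.Number, N.OwnerId, O.ID, O.Name
    FROM main.numbers AS N
    JOIN db2.owners AS O ON O.ID = N.OwnerId
    ORDER BY N.Number
""").fetchall()
print(rows)
```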

SQL placement join with student, friend, package

Issue:
You are given three tables: Students, Friends and Packages.
Students contains two columns: ID and Name.
Friends contains two columns: ID and Friend_ID (ID of the ONLY best friend).
Packages contains two columns: ID and Salary (offered salary in $ thousands per month).
Write a query to output the names of those students whose best friends got offered a higher salary than them. Names must be ordered by the salary amount offered to the best friends. It is guaranteed that no two students got the same salary offer.
Code:
This is the code I came up with, but it does not produce correct results. Can anyone tell me why?
select TableA.name
from
(select s.id,s.name,p.salary from students s inner join packages p on s.id=p.id) TableA,
(select f.id,f.friend_id, p2.salary from friends f inner join packages p2 on f.friend_id=p2.id) TableB
where TableA.id=TableB.id And TableA.salary>TableB.salary
order by TableB.salary desc;
I think in your query you wrote AND TableA.salary > TableB.salary where you meant AND TableA.salary < TableB.salary.
Moreover, I think your query can be written more concisely.
On MS SQL Server (it also works on MySQL, as the query is very basic), you can try this one:
SELECT s.id
,s.NAME
,p.salary
, f.friend_id, p2.salary as friend_salary
FROM students s
INNER JOIN packages p ON s.id = p.id
LEFT JOIN friends f ON f.id = s.id
LEFT JOIN packages p2 ON f.friend_id = p2.id
WHERE p.salary < p2.salary
ORDER BY s.id;
Output:
id NAME salary friend_id friend_salary
1 John 1000 2 1200
3 Pete 800 1 1000
Sample data:
CREATE TABLE students (id int, NAME VARCHAR(30));
CREATE TABLE packages (id int, salary INT);
CREATE TABLE friends (id int, friend_id INT);
INSERT INTO students values (1,'John');
INSERT INTO students values (2,'Arthur');
INSERT INTO students values (3,'Pete');
INSERT INTO packages values (1,1000);
INSERT INTO packages values (2,1200);
INSERT INTO packages values (3,800);
INSERT INTO friends values (1,2);
INSERT INTO friends values (2,3);
INSERT INTO friends values (3,1);
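To check the fix suggested for the original query, here is a sketch that runs the asker's query, with only the salary comparison flipped, against the sample data above. It uses Python's sqlite3 module as a stand-in for SQL Server/MySQL, and runs the query with both sort directions, since the question's query sorts DESC while another answer sorts ASC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INT, name VARCHAR(30));
CREATE TABLE packages (id INT, salary INT);
CREATE TABLE friends (id INT, friend_id INT);
INSERT INTO students VALUES (1,'John'), (2,'Arthur'), (3,'Pete');
INSERT INTO packages VALUES (1,1000), (2,1200), (3,800);
INSERT INTO friends VALUES (1,2), (2,3), (3,1);
""")

# Keep a student when the best friend's salary (TableB) is higher
# than the student's own salary (TableA).
query = """
SELECT TableA.name
FROM (SELECT s.id, s.name, p.salary
      FROM students s INNER JOIN packages p ON s.id = p.id) TableA,
     (SELECT f.id, f.friend_id, p2.salary
      FROM friends f INNER JOIN packages p2 ON f.friend_id = p2.id) TableB
WHERE TableA.id = TableB.id AND TableA.salary < TableB.salary
ORDER BY TableB.salary {direction}
"""
desc_names = [r[0] for r in conn.execute(query.format(direction="DESC"))]
asc_names = [r[0] for r in conn.execute(query.format(direction="ASC"))]
print(desc_names, asc_names)
```

Arthur is excluded because his best friend Pete was offered less (800) than Arthur (1200).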
I used CTEs for readability. I am not sure whether it is fully optimized, but it yields the result the question expects.
with std_salary as (
SELECT s.id, s.name, p.salary
FROM Students s
JOIN Packages p
ON s.id=p.id),
friend_salary as (
SELECT f.id, p.salary
FROM Friends f
JOIN Packages p
ON f.friend_id=p.id
)
SELECT name
FROM
(SELECT std_salary.name, std_salary.salary as own, friend_salary.salary as friend
FROM std_salary
JOIN friend_salary
ON std_salary.id=friend_salary.id) as final
WHERE final.own<final.friend
ORDER BY final.friend;
This worked for me in MS SQL Server:
SELECT a.name
FROM (SELECT students.id as main_id, students.name, packages.salary
FROM students join packages on students.id = packages.id) a
JOIN (SELECT f.id as main_id1, p.salary
FROM friends f JOIN packages p ON f.friend_id = p.id) b
ON a.main_id = b.main_id1
WHERE b.salary>a.salary
ORDER BY b.salary ASC;
You have written 'where TableA.salary>TableB.salary', which finds rows where your salary is greater than your friend's. But the question asks for the opposite (names of students whose friend's salary is greater than their own), so you can change it to 'where TableB.salary>TableA.salary' and it should work.
select my_name
from (select s.id as my_id, s.name as my_name, p.salary as my_salary
      from students s
      inner join packages p on s.id = p.id) as my_tbl
inner join (select f.id as id, f.friend_id as frnd_id, p.salary as frnd_salary
            from friends f
            inner join packages p on f.friend_id = p.id) as frnd_tbl
  on my_id = id
where frnd_salary > my_salary
order by frnd_salary;

2 joins on same table while summing amounts

I've got two tables that I need to pull data out of efficiently.
TableA
id
full_name
TableB
table_a_id_1
table_a_id_2
amount1
amount2
Each TableB row references two different records in TableA (table_a_id_1 and table_a_id_2 are never the same).
I am trying to write one query (or the most efficient query, avoiding an n+2 query pattern) to get a list of all records in TableA with the summed totals from the corresponding TableB rows.
It would look something like this:
[{id: 1, full_name: 'bob', 1_sum1: 50.0, 1_sum2: 30.0, 2_sum1: 1.0, 2_sum2: 25.0}]
where
id and full_name are from TableA
1_sum1 and 1_sum2 are sums of amount1 and amount2 from TableB where table_a_id_1 = TableA.id
2_sum1 and 2_sum2 are sums of amount1 and amount2 from TableB where table_a_id_2 = TableA.id
I really hope this isn't confusing. I've got a query like this:
results = TableA.
joins('LEFT JOIN table_b t1_stats ON t1_stats.t1_aff_id = table_a.id').
joins('LEFT JOIN table_b t2_stats ON t2_stats.t2_aff_id = table_a.id').
select("table_a.*,
sum(t1_stats.amount1) AS 1_sum1,
sum(t2_stats.amount1) AS 2_sum1,
sum(t1_stats.amount2) AS 1_sum2,
sum(t2_stats.amount2) AS 2_sum2").
group('table_a.id')
I'm not getting correct results for the summed totals. I think it's because 1_sum1 should only sum the table_b records where table_a_id_1 = table_a.id, but instead it might be including all records and then summing them.
Do I need a subselect or something instead? This is not my strong point, so any help getting this query sorted out would be great!
Thanks
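Two LEFT JOINs on the same table multiply rows: for each table_a row, every matching t1_stats row is paired with every matching t2_stats row, so each sum is inflated by the number of matches on the other side. Computing each sum in its own correlated subquery avoids that. Try: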
results = TableA.
select("table_a.*,
(SELECT sum(table_b.amount1) FROM table_b WHERE table_b.t1_aff_id = table_a.id) AS 1_sum1,
(SELECT sum(table_b.amount2) FROM table_b WHERE table_b.t1_aff_id = table_a.id) AS 1_sum2,
(SELECT sum(table_b.amount1) FROM table_b WHERE table_b.t2_aff_id = table_a.id) AS 2_sum1,
(SELECT sum(table_b.amount2) FROM table_b WHERE table_b.t2_aff_id = table_a.id) AS 2_sum2")
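The fan-out effect is easy to reproduce. A sketch using Python's sqlite3 module with hypothetical sample rows; the aliases sum1_1 and sum2_1 are also made up, since identifiers starting with a digit are not portable across databases. The double LEFT JOIN inflates each sum by the number of matches on the other side, while the correlated subqueries return the true totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE table_b (t1_aff_id INTEGER, t2_aff_id INTEGER,
                      amount1 REAL, amount2 REAL);
INSERT INTO table_a VALUES (1, 'bob'), (2, 'alice');
-- bob is t1 on two rows and t2 on one row; alice is the reverse
INSERT INTO table_b VALUES (1, 2, 10, 5), (1, 2, 20, 5), (2, 1, 7, 3);
""")

# Both joins fan out: each t1 match is paired with each t2 match.
double_join = """
SELECT a.id, SUM(t1.amount1) AS sum1_1, SUM(t2.amount1) AS sum2_1
FROM table_a a
LEFT JOIN table_b t1 ON t1.t1_aff_id = a.id
LEFT JOIN table_b t2 ON t2.t2_aff_id = a.id
GROUP BY a.id
ORDER BY a.id
"""

# Each sum is computed independently, so no row multiplication happens.
subqueries = """
SELECT a.id,
       (SELECT SUM(b.amount1) FROM table_b b WHERE b.t1_aff_id = a.id) AS sum1_1,
       (SELECT SUM(b.amount1) FROM table_b b WHERE b.t2_aff_id = a.id) AS sum2_1
FROM table_a a
ORDER BY a.id
"""

inflated = conn.execute(double_join).fetchall()
correct = conn.execute(subqueries).fetchall()
print(inflated)
print(correct)
```

For bob, the double join reports 14 for sum2_1 instead of 7, because his single t2 row is repeated once for each of his two t1 rows.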

PSQL - Select size of tables for both partitioned and normal

Thanks in advance for any help with this; it is highly appreciated.
So, basically, I have a Greenplum database and I want to select the table sizes of the 10 largest tables. This isn't a problem using the query below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the above SQL these show up as all their 'child tables' split into small fragments (though I know that together they make up the largest 2 tables). Is there a way to write a script that selects tables (partitioned or otherwise) together with their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table names explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (I cannot assume the partitioned tables are among them), and I cannot list any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friends here are the pg_relation_size() function, for getting a relation's size, and the pg_class, pg_namespace and pg_partitions catalogs, joined together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(c.oid)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
where c.relkind = 'r' -- ordinary tables only; skips indexes, sequences and views
) as q
group by 1, 2
order by 3 desc
limit 10;
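The roll-up idea behind this query (map each child partition to its parent with COALESCE, then group and sum) can be tried outside Greenplum. A sketch using Python's sqlite3 module, where the hypothetical tables rel_sizes and partitions stand in for pg_class/pg_namespace plus pg_relation_size() and for the pg_partitions view; all names and sizes are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- rel_sizes stands in for pg_class/pg_namespace plus pg_relation_size()
CREATE TABLE rel_sizes (nspname TEXT, relname TEXT, size_bytes INTEGER);
-- partitions stands in for pg_partitions (same column names as used above)
CREATE TABLE partitions (schemaname TEXT, tablename TEXT,
                         partitionschemaname TEXT, partitiontablename TEXT);
INSERT INTO rel_sizes VALUES
  ('public', 'sales', 0),
  ('public', 'sales_1_prt_2019', 4000000),
  ('public', 'sales_1_prt_2020', 5000000),
  ('public', 'customers', 6000000),
  ('public', 'tiny', 1000);
INSERT INTO partitions VALUES
  ('public', 'sales', 'public', 'sales_1_prt_2019'),
  ('public', 'sales', 'public', 'sales_1_prt_2020');
""")

# Children are renamed to their parent via COALESCE, then summed per parent.
rows = conn.execute("""
SELECT schemaname, tablename,
       SUM(size_mb) AS size_mb,
       SUM(num_partitions) AS num_partitions
FROM (
  SELECT COALESCE(p.schemaname, r.nspname) AS schemaname,
         COALESCE(p.tablename, r.relname) AS tablename,
         1 AS num_partitions,
         r.size_bytes / 1000000.0 AS size_mb
  FROM rel_sizes AS r
  LEFT JOIN partitions AS p
    ON r.relname = p.partitiontablename
   AND r.nspname = p.partitionschemaname
) AS q
GROUP BY schemaname, tablename
ORDER BY 3 DESC
LIMIT 10
""").fetchall()
print(rows)
```

The parent table itself has no pg_partitions row as a child, so it maps to its own name and is counted once in num_partitions alongside its children.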
Alternatively, take the non-partitioned tables from pg_tables and union them with the partitioned tables, whose size is the sum of their partitions' sizes from pg_partitions:
select * from
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select partitionschemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions)
union all
select schemaname,tablename,
sum(pg_relation_size(partitionschemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= 0
order by 3 desc
limit 10;

Nested queries. Select as column

Is it possible to write this query in RoR (Ruby on Rails)?
SELECT column_1,
(SELECT name FROM table_2 WHERE table_2.column_1 = table_1.column_1) as name
FROM table_1;
Yes, it is possible:
Table_1.select("column_1, (SELECT name FROM table_2 WHERE table_2.column_1 = table_1.column_1) as name")
If you use Arel, it will look even more complicated than this.
But there are other ways to simplify this query:
split it into two queries and merge the results in Rails
use the joins method to join table_1 and table_2 and select the table_2.name field.
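As a sketch of the joins option, here is a check in Python's sqlite3 module (standing in for whatever database the Rails app uses, with hypothetical sample rows) that the correlated subquery and a LEFT JOIN return the same rows when each column_1 matches at most one table_2 row. Rails' joins method generates an INNER JOIN, which drops table_1 rows without a match; left_outer_joins (Rails 5 and later) keeps them, as the subquery does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (column_1 INTEGER);
CREATE TABLE table_2 (column_1 INTEGER, name TEXT);
INSERT INTO table_1 VALUES (1), (2), (3);
INSERT INTO table_2 VALUES (1, 'one'), (2, 'two');
""")

# The original query: a correlated scalar subquery in the SELECT list.
nested = conn.execute("""
    SELECT column_1,
           (SELECT name FROM table_2
            WHERE table_2.column_1 = table_1.column_1) AS name
    FROM table_1
    ORDER BY column_1
""").fetchall()

# LEFT JOIN keeps unmatched table_1 rows with a NULL name, like the subquery.
joined = conn.execute("""
    SELECT table_1.column_1, table_2.name
    FROM table_1
    LEFT JOIN table_2 ON table_2.column_1 = table_1.column_1
    ORDER BY table_1.column_1
""").fetchall()

# INNER JOIN (what a plain joins call produces) drops the unmatched row.
inner = conn.execute("""
    SELECT table_1.column_1, table_2.name
    FROM table_1
    JOIN table_2 ON table_2.column_1 = table_1.column_1
    ORDER BY table_1.column_1
""").fetchall()
print(nested, joined, inner)
```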
