LEFT JOIN sparse table onto large table

I have two tables that look like this:
table1:
Date ID
2017-01-01 1
2017-01-01 2
2017-01-01 3
2017-01-01 4
2017-01-02 1
2017-01-02 2
2017-01-02 3
2017-01-02 4
table2:
Date ID Value
2017-01-01 1 10.0
2017-01-01 2 15.0
2017-01-02 3 20.0
2017-01-02 4 50.0
I want to join them so that I get:
Date ID Value
2017-01-01 1 10.0
2017-01-01 2 15.0
2017-01-01 3 NULL
2017-01-01 4 NULL
2017-01-02 1 NULL
2017-01-02 2 NULL
2017-01-02 3 20.0
2017-01-02 4 50.0
I tried left joining T2 onto T1 on ID and Date, which always returns only the records that matched. If I join only on ID, I get multiple entries (one per Value) for each date.
SELECT
t1.Date,
t1.ID,
t2.Value
FROM table1 t1
left join table2 t2 using (Date,ID)

Here's another way of phrasing it, with an explicit ON clause instead of USING:
SELECT
t1.Date,
t1.ID,
t2.Value
FROM
table1 t1
LEFT OUTER JOIN table2 t2 ON
t1.Date = t2.Date
AND t1.ID = t2.ID
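A minimal sketch, assuming Python's built-in sqlite3 module as an in-memory stand-in for the asker's database (the question does not name one), that loads the sample rows and runs the LEFT JOIN on both key columns. Every table1 row survives, and Value comes back as None (NULL) wherever table2 has no matching (Date, ID) pair. If a real query still returns only matched rows, a common cause is a WHERE condition on a table2 column, which discards the NULL-extended rows; that is a general observation, not something visible in the question.

```python
import sqlite3

# In-memory stand-in database (assumption: SQLite instead of the asker's engine).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (Date TEXT, ID INTEGER);
CREATE TABLE table2 (Date TEXT, ID INTEGER, Value REAL);
""")

# table1: every (Date, ID) combination from the question.
con.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(d, i) for d in ("2017-01-01", "2017-01-02") for i in (1, 2, 3, 4)],
)
# table2: only the sparse rows that carry a Value.
con.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [("2017-01-01", 1, 10.0), ("2017-01-01", 2, 15.0),
     ("2017-01-02", 3, 20.0), ("2017-01-02", 4, 50.0)],
)

# LEFT JOIN on both key columns keeps all table1 rows;
# unmatched rows get NULL (None in Python) for Value.
rows = con.execute("""
    SELECT t1.Date, t1.ID, t2.Value
    FROM table1 t1
    LEFT JOIN table2 t2 USING (Date, ID)
    ORDER BY t1.Date, t1.ID
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY is added only so the printed rows come out in the same order as the desired result.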

Related

Trying to order rows by timestamps from two different tables (Snowflake)

I have two tables as follows:
TABLE_1
PERSON_ID | LDTS
45 | 2022-03-03 15:41:05.685
72 | 2022-03-03 15:42:08.203
15 | 2022-06-08 21:57:07.909
36 | 2022-06-28 21:58:43.558
TABLE_2
PERSON_ID | LDTS | CURRENCY
34 | 2022-03-03 15:00:21.814 | US
28 | 2022-03-03 15:02:05.963 | CA
52 | 2022-03-03 15:02:05.963 | US
10 | 2022-06-08 14:40:13.762 | US
11 | 2022-06-08 14:40:13.762 | CA
19 | 2022-06-14 16:10:19.005 | US
I am trying to join these tables and order by timestamp to get a result such as:
PERSON_ID | TABLE_1.LDTS | TABLE_2.LDTS | CURRENCY
34 | NULL | 2022-03-03 15:00:21.814 | US
28 | NULL | 2022-03-03 15:02:05.963 | CA
52 | NULL | 2022-03-03 15:02:05.963 | US
45 | 2022-03-03 15:41:05.685 | NULL | NULL
72 | 2022-03-03 15:42:08.203 | NULL | NULL
10 | NULL | 2022-06-08 14:40:13.762 | US
11 | NULL | 2022-06-08 14:40:13.762 | CA
15 | 2022-06-08 21:57:07.909 | NULL | NULL
19 | NULL | 2022-06-14 16:10:19.005 | US
36 | 2022-06-28 21:58:43.558 | NULL | NULL
Would this just be a left join on LDTS? I am not sure how to get a result where the timestamps are ordered this way and the columns that are not shared contain NULLs when their values are not in the other table. When I try a full outer join, rows appear duplicated for LDTS, LDTS collapses into a single column, and the values for the other columns are all null. Thanks!
Getting, in both directions, the rows whose key has no match in the other table can be handled as a set operation:
select T1.PERSON_ID, T1.LDTS as T1_LDTS, T2.LDTS as T2_LDTS, CURRENCY from TABLE_1 T1 left join TABLE_2 T2 on T1.LDTS = T2.LDTS
union
select T2.PERSON_ID, T1.LDTS as T1_LDTS, T2.LDTS as T2_LDTS, CURRENCY from TABLE_2 T2 left join TABLE_1 T1 on T1.LDTS = T2.LDTS
order by nvl(T1_LDTS, T2_LDTS)
;
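A minimal sketch of the UNION approach, using Python's sqlite3 as an in-memory stand-in for Snowflake (an assumption). SQLite has no NVL, so COALESCE is used instead, and because SQLite only accepts result-column names or identical expressions in the ORDER BY of a compound SELECT, the UNION is wrapped in a subquery before ordering. The extra PERSON_ID sort key is an addition that makes the order of rows with identical timestamps (28/52 and 10/11) deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TABLE_1 (PERSON_ID INTEGER, LDTS TEXT);
CREATE TABLE TABLE_2 (PERSON_ID INTEGER, LDTS TEXT, CURRENCY TEXT);
""")
con.executemany("INSERT INTO TABLE_1 VALUES (?, ?)", [
    (45, "2022-03-03 15:41:05.685"), (72, "2022-03-03 15:42:08.203"),
    (15, "2022-06-08 21:57:07.909"), (36, "2022-06-28 21:58:43.558"),
])
con.executemany("INSERT INTO TABLE_2 VALUES (?, ?, ?)", [
    (34, "2022-03-03 15:00:21.814", "US"), (28, "2022-03-03 15:02:05.963", "CA"),
    (52, "2022-03-03 15:02:05.963", "US"), (10, "2022-06-08 14:40:13.762", "US"),
    (11, "2022-06-08 14:40:13.762", "CA"), (19, "2022-06-14 16:10:19.005", "US"),
])

# Each branch keeps one table's rows and NULL-fills the other table's columns.
# ISO-formatted timestamp strings sort chronologically, so COALESCE yields one timeline.
rows = con.execute("""
    SELECT * FROM (
        SELECT T1.PERSON_ID AS PERSON_ID, T1.LDTS AS T1_LDTS, T2.LDTS AS T2_LDTS, CURRENCY
        FROM TABLE_1 T1 LEFT JOIN TABLE_2 T2 ON T1.LDTS = T2.LDTS
        UNION
        SELECT T2.PERSON_ID, T1.LDTS, T2.LDTS, CURRENCY
        FROM TABLE_2 T2 LEFT JOIN TABLE_1 T1 ON T1.LDTS = T2.LDTS
    )
    ORDER BY COALESCE(T1_LDTS, T2_LDTS), PERSON_ID
""").fetchall()

for row in rows:
    print(row)
```

The rows come back in the same order as the desired result in the question.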
In response to the question in the comments: if TABLE_2 does not have a PERSON_ID column, simply select NULL in its place:
select T1.PERSON_ID, T1.LDTS as T1_LDTS, T2.LDTS as T2_LDTS, CURRENCY from TABLE_1 T1 left join TABLE_2 T2 on T1.LDTS = T2.LDTS
union
select NULL as PERSON_ID, T1.LDTS as T1_LDTS, T2.LDTS as T2_LDTS, CURRENCY from TABLE_2 T2 left join TABLE_1 T1 on T1.LDTS = T2.LDTS
order by nvl(T1_LDTS, T2_LDTS)
;
Another approach, similar to Greg's:
with cte(person_id, ldts) as
(select person_id,ldts from table_1
union all
select person_id,ldts from table_2)
select t3.person_id, t1.ldts, t2.ldts, t2.currency from
cte t3 left join table_1 t1
on t3.person_id = t1.person_id
left join table_2 t2
on t3.person_id = t2.person_id
order by t3.ldts;
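The CTE approach can be checked the same way. This sketch assumes Python's sqlite3 as a stand-in for Snowflake and adds person_id as a tie-breaker for equal timestamps. Because it joins back on PERSON_ID rather than LDTS, it relies on each PERSON_ID appearing exactly once across the two tables; a PERSON_ID present in both tables would come out duplicated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_1 (person_id INTEGER, ldts TEXT);
CREATE TABLE table_2 (person_id INTEGER, ldts TEXT, currency TEXT);
INSERT INTO table_1 VALUES
  (45, '2022-03-03 15:41:05.685'), (72, '2022-03-03 15:42:08.203'),
  (15, '2022-06-08 21:57:07.909'), (36, '2022-06-28 21:58:43.558');
INSERT INTO table_2 VALUES
  (34, '2022-03-03 15:00:21.814', 'US'), (28, '2022-03-03 15:02:05.963', 'CA'),
  (52, '2022-03-03 15:02:05.963', 'US'), (10, '2022-06-08 14:40:13.762', 'US'),
  (11, '2022-06-08 14:40:13.762', 'CA'), (19, '2022-06-14 16:10:19.005', 'US');
""")

# The CTE stacks every (person_id, ldts) pair from both tables into one timeline;
# each table is then LEFT JOINed back by person_id to fill in its own columns.
rows = con.execute("""
    WITH cte(person_id, ldts) AS (
        SELECT person_id, ldts FROM table_1
        UNION ALL
        SELECT person_id, ldts FROM table_2
    )
    SELECT t3.person_id, t1.ldts, t2.ldts, t2.currency
    FROM cte t3
    LEFT JOIN table_1 t1 ON t3.person_id = t1.person_id
    LEFT JOIN table_2 t2 ON t3.person_id = t2.person_id
    ORDER BY t3.ldts, t3.person_id
""").fetchall()

for row in rows:
    print(row)
```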
PERSON_ID | LDTS | LDTS | CURRENCY
34 | NULL | 2022-03-03 15:00:21.814 | US
28 | NULL | 2022-03-03 15:02:05.963 | CA
52 | NULL | 2022-03-03 15:02:05.963 | US
45 | 2022-03-03 15:41:05.685 | NULL | NULL
72 | 2022-03-03 15:42:08.203 | NULL | NULL
10 | NULL | 2022-06-08 14:40:13.762 | US
11 | NULL | 2022-06-08 14:40:13.762 | CA
15 | 2022-06-08 21:57:07.909 | NULL | NULL
19 | NULL | 2022-06-14 16:10:19.005 | US
36 | 2022-06-28 21:58:43.558 | NULL | NULL
Or were you trying to do something like this?
SELECT COALESCE(T1.PERSON_ID,T2.PERSON_ID),
COALESCE(T1.LDTS,T2.LDTS) AS T1_T2_LDTS,
CURRENCY
FROM TABLE_1 AS T1
FULL OUTER JOIN TABLE_2 AS T2
ON T1.PERSON_ID = T2.PERSON_ID
ORDER BY T1_T2_LDTS;
If you really want the exact format you posted, you can also do this:
SELECT COALESCE(T1.PERSON_ID,T2.PERSON_ID),
T1.LDTS,T2.LDTS ,
CURRENCY
FROM TABLE_1 AS T1
FULL OUTER JOIN TABLE_2 AS T2
ON T1.PERSON_ID = T2.PERSON_ID
ORDER BY COALESCE(T1.LDTS,T2.LDTS);

In BigQuery: How to fetch a column value from a matching row in another table, but keep the original value if there is no match

Scenario:
I have two BigQuery tables with the same columns. I need to compare them on Category and Article:
i) if the same Category and Article are present in table_2, fetch the Flow column from table_2;
ii) otherwise, keep the Flow value from table_1.
Table_1:
Category Article Flow
AA 11 Apple
AA 12 Orange
BB 13 Lemon
CC 14
Table_2:
Category Article Flow
AA 11 Melon
BB 13 Pine
Resultant Table:
Category Article Flow
AA 11 Melon
AA 12 Orange
BB 13 Pine
CC 14
Query I tried:
select t1.Category, t1.Article, t2.Flow
from t1 left join t2
on t1.Category=t2.Category and t1.Article=t2.Article
Help me resolve this issue. Thanks in Advance!
Try a left join, falling back to table_1's Flow with IFNULL when there is no match:
with table_1 as (
select 'AA' as category, 11 as article, 'Apple' as flow UNION ALL
select 'AA', 12, 'Orange' UNION ALL
select 'BB', 13, 'Lemon' UNION ALL
select 'CC', 14, null
),
table_2 as (
select 'AA' as category, 11 as article, 'Melon' as flow UNION ALL
select 'BB', 13, 'Pine'
)
select
table_1.category,
table_1.article,
ifnull(table_2.flow, table_1.flow) as flow
from table_1 left join table_2 using(category, article)
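A minimal sketch of the same query run through Python's sqlite3 (an assumption; SQLite also supports IFNULL and USING, so the SELECT carries over essentially unchanged, with an ORDER BY added for a stable result).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_1 (category TEXT, article INTEGER, flow TEXT);
CREATE TABLE table_2 (category TEXT, article INTEGER, flow TEXT);
INSERT INTO table_1 VALUES
  ('AA', 11, 'Apple'), ('AA', 12, 'Orange'), ('BB', 13, 'Lemon'), ('CC', 14, NULL);
INSERT INTO table_2 VALUES ('AA', 11, 'Melon'), ('BB', 13, 'Pine');
""")

# Prefer table_2's flow when (category, article) matches; otherwise keep table_1's.
rows = con.execute("""
    SELECT table_1.category, table_1.article,
           IFNULL(table_2.flow, table_1.flow) AS flow
    FROM table_1 LEFT JOIN table_2 USING (category, article)
    ORDER BY table_1.category, table_1.article
""").fetchall()

for row in rows:
    print(row)
```

One subtlety of IFNULL: it also falls back to table_1 when table_2 has a matching row whose Flow is NULL. If a matched NULL should win, test whether the join found a row (for example, whether a table_2 key column is NULL) instead of testing Flow itself.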

How to join tables so that the final table only shows rows where two column values match (SQL)

Let's say I have
table 1:
ID | Date
1 | july 10
2 | aug 4
3 | feb 20
table 2:
ID | Date | Name | Address
1 | july 10 | joe | 123 Howard way
2 | aug 4 | kate | 456 king ave
3 | feb 20 | lisa | 789 giuldford way
4 | march 1 | jake | 145 smith street
5 | dec 16 | robert | 6784 apple street
I want the final table to pull all columns from table 2, but only the rows that have the same ID and Date as a row in table 1, so:
final table:
ID | Date | Name | Address
1 | july 10 | joe | 123 Howard way
2 | aug 4 | kate | 456 king ave
3 | feb 20 | lisa | 789 giuldford way
How would I do this?
I tried an INNER JOIN with ON and a WHERE clause, but that didn't work: I got duplicates of everything. I also tried a subquery. Please help. I am using Standard SQL.
Simple JOIN should work!
SELECT t2.*
FROM t1
JOIN t2
USING (ID, Date)
If Table 1 really has only the two columns that are in the USING clause, you can also use the query below (simply * instead of t2.*):
SELECT *
FROM t1
JOIN t2
USING (ID, Date)
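A minimal sketch, assuming Python's sqlite3 as a stand-in for BigQuery Standard SQL, that runs the JOIN ... USING (ID, Date) query on the sample data. A join returns one row per matching pair, so if t1 ever contains the same (ID, Date) twice, the matching t2 row is repeated. The WHERE EXISTS semi-join shown for comparison is a swapped-in alternative, not one of the answers here; it returns each t2 row at most once. The duplicate row inserted into t1 is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (ID INTEGER, Date TEXT);
CREATE TABLE t2 (ID INTEGER, Date TEXT, Name TEXT, Address TEXT);
INSERT INTO t1 VALUES (1, 'july 10'), (2, 'aug 4'), (3, 'feb 20');
INSERT INTO t2 VALUES
  (1, 'july 10', 'joe', '123 Howard way'),
  (2, 'aug 4', 'kate', '456 king ave'),
  (3, 'feb 20', 'lisa', '789 giuldford way'),
  (4, 'march 1', 'jake', '145 smith street'),
  (5, 'dec 16', 'robert', '6784 apple street');
""")

# Inner join: keep only t2 rows whose (ID, Date) pair also exists in t1.
JOIN_SQL = "SELECT t2.* FROM t1 JOIN t2 USING (ID, Date) ORDER BY t2.ID"

# Semi-join alternative (not from the answers): each t2 row appears at most once.
EXISTS_SQL = """
    SELECT * FROM t2
    WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.ID = t2.ID AND t1.Date = t2.Date)
    ORDER BY ID
"""

joined = con.execute(JOIN_SQL).fetchall()
print(joined)

# Hypothetical duplicate key row in t1: the JOIN now repeats the ID 1 match...
con.execute("INSERT INTO t1 VALUES (1, 'july 10')")
joined_dup = con.execute(JOIN_SQL).fetchall()
# ...while EXISTS still returns each t2 row once.
semi = con.execute(EXISTS_SQL).fetchall()
print(len(joined_dup), len(semi))
```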
Or try a LEFT JOIN, as in the query below:
SELECT t1.Id,t1.Date,t2.Name,t2.Address
FROM Table1 t1
LEFT JOIN TABLE2 t2 ON t2.Id=t1.Id AND t2.Date=t1.Date
Or try a RIGHT JOIN, as in the query below:
SELECT t2.* FROM t2
RIGHT JOIN t1 USING (ID, Date)

Using psql, how do you get the total sum for the last 3 days, on each day?

I have a table that contains all purchases made at each school. I’m able to get the total spent per school, per item, per day with the following query:
SELECT
date,
school_id,
item_id,
sum(price) as total_price
FROM purchases
GROUP BY school_id, item_id, date
ORDER BY school_id, date
It will return something like
date | school_id | item_id | total_price
2016-11-18 | 1 | 1 | 0.50
2016-11-17 | 1 | 2 | 1.00
2016-11-16 | 1 | 1 | 0.50
2016-11-18 | 2 | 2 | 1.00
2016-11-17 | 2 | 2 | 1.00
2016-11-16 | 2 | 2 | 1.00
I need a table that returns the total price for the last 3 days (including the day itself) on each day, so something like:
date | school_id | item_id | total_price
2016-11-18 | 1 | 1 | 1.00
2016-11-17 | 1 | 2 | 1.00
2016-11-16 | 1 | 1 | 0.50
2016-11-18 | 2 | 2 | 3.00
2016-11-17 | 2 | 2 | 2.00
2016-11-16 | 2 | 2 | 1.00
I know I can use lag() OVER (PARTITION BY ...), but I may need to do this for months at a time instead of 3 days, and lag would take forever to set up.
I’m not really sure what other method I can use. Any guidance?
A simple INNER JOIN would do.
You join the table to itself when the school and the item match and the date falls within a 3-day range.
Note that this gives a moving sum over the last 3 calendar days, which seems to be what your question asks for, since your days are consecutive, without gaps.
SELECT
p1.date,
p1.school_id,
p1.item_id,
SUM(p2.price) AS total_price_3_days
FROM purchases p1
INNER JOIN purchases p2 ON p1.school_id = p2.school_id AND p1.item_id = p2.item_id AND p2.date BETWEEN p1.date - INTERVAL '2 days' AND p1.date
GROUP BY p1.school_id, p1.item_id, p1.date
ORDER BY p1.school_id, p1.date
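A minimal sketch of the self-join in Python's sqlite3 (an assumption, not PostgreSQL): SQLite's date(p1.date, '-2 days') stands in for p1.date - INTERVAL '2 days', so the window covers the day itself plus the two before it. The daily CTE is an addition that is not in the answer. It collapses purchases to one row per school, item and day first, so several purchases on the same day cannot be multiplied by the self-join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE purchases (date TEXT, school_id INTEGER, item_id INTEGER, price REAL);
INSERT INTO purchases VALUES
  ('2016-11-18', 1, 1, 0.50), ('2016-11-17', 1, 2, 1.00), ('2016-11-16', 1, 1, 0.50),
  ('2016-11-18', 2, 2, 1.00), ('2016-11-17', 2, 2, 1.00), ('2016-11-16', 2, 2, 1.00);
""")

rows = con.execute("""
    WITH daily AS (                 -- one row per (date, school, item)
        SELECT date, school_id, item_id, SUM(price) AS total_price
        FROM purchases
        GROUP BY date, school_id, item_id
    )
    SELECT p1.date, p1.school_id, p1.item_id,
           SUM(p2.total_price) AS total_price_3_days
    FROM daily p1
    JOIN daily p2
      ON p2.school_id = p1.school_id
     AND p2.item_id = p1.item_id
     AND p2.date BETWEEN date(p1.date, '-2 days') AND p1.date  -- 3-day window
    GROUP BY p1.date, p1.school_id, p1.item_id
    ORDER BY p1.school_id, p1.date DESC
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this reproduces the expected table from the question.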
One approach would be to just use a correlated subquery in the select clause:
SELECT
date,
school_id,
item_id,
(SELECT SUM(p2.price) FROM purchases p2
WHERE p1.school_id = p2.school_id AND
p2.date BETWEEN p1.date - INTERVAL '2 DAY' AND p1.date) AS total_price
FROM purchases p1
GROUP BY school_id, item_id, date
ORDER BY school_id, date DESC;
Another approach would be to take advantage of Postgres' window functions:
SELECT
date,
school_id,
item_id,
SUM(SUM(price)) OVER (PARTITION BY school_id
ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS total_price
FROM purchases p1
GROUP BY school_id, item_id, date
ORDER BY school_id, date DESC;
Both generate the same output.
Note that my school_id=1 output does not agree with your expected output, but I think your expected data has a typo.
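A sketch of the window-function version, assuming Python's sqlite3 as a stand-in (RANGE frames with numeric offsets need SQLite 3.28 or later). It differs from the answer in two labeled ways: it partitions by school_id and item_id, and it uses a RANGE frame on julianday(date) so the window spans calendar days rather than rows. With those choices the query reproduces the question's expected table exactly, school_id = 1 rows included. In PostgreSQL 11 and later the corresponding frame is RANGE BETWEEN INTERVAL '2 days' PRECEDING AND CURRENT ROW.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE purchases (date TEXT, school_id INTEGER, item_id INTEGER, price REAL);
INSERT INTO purchases VALUES
  ('2016-11-18', 1, 1, 0.50), ('2016-11-17', 1, 2, 1.00), ('2016-11-16', 1, 1, 0.50),
  ('2016-11-18', 2, 2, 1.00), ('2016-11-17', 2, 2, 1.00), ('2016-11-16', 2, 2, 1.00);
""")

# Inner SUM: daily total per group. Outer SUM ... OVER: rolling total over
# rows whose julian day number is within 2 days of the current row.
rows = con.execute("""
    SELECT date, school_id, item_id,
           SUM(SUM(price)) OVER (
               PARTITION BY school_id, item_id
               ORDER BY julianday(date)
               RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS total_price
    FROM purchases
    GROUP BY date, school_id, item_id
    ORDER BY school_id, date DESC
""").fetchall()

for row in rows:
    print(row)
```

A RANGE frame keeps the window correct even when some days have no purchases, which a ROWS frame cannot do; for a window of months, only the offset changes.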

Clean cumulative sum alongside grouped sum

I am working in PostgreSQL 9.6.6
For the sake of reproducibility, I'll use create temporary table to create a "constant" table to play with:
create temporary table test_table as
select * from
(values
('2018-01-01', 2),
('2018-01-01', 3),
('2018-02-01', 1),
('2018-02-01', 2))
as t (month, count)
A select * from test_table returns the following:
month | count
------------+-------
2018-01-01 | 2
2018-01-01 | 3
2018-02-01 | 1
2018-02-01 | 2
The desired output is the following:
month | sum | cumulative_sum
------------+-----+----------------
2018-01-01 | 5 | 5
2018-02-01 | 3 | 8
In other words, the values have been summed, grouping by month, and then the cumulative sum is displayed in another column.
The issue is that the only way I know to achieve this is somewhat convoluted: the grouped sum must be computed first (with a subselect or a WITH statement), and then the running total is computed with a select statement against that result, like so:
with sums as
(select month,
sum(count) as sum
from test_table
group by 1)
select month,
sum,
sum(sum) over (order by month) as cumulative_sum
from sums
What I wish could work would be something more like...
select month,
sum(count) as sum,
sum(count) over (order by month) as cumulative_sum
from test_table
group by 1
But this returns
ERROR: column "test_table.count" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: sum(count) over (order by month) as cumulative_sum
No amount of fussing with the group by clause seems to satisfy PSQL.
TL;DR: is there a way in PSQL to compute both a sum over groups and the cumulative sum over groups using just a single select statement? More generally, is there a "preferred" way to accomplish this, beyond the method I use in this question?
Your hunch to use SUM as an analytic function was on the right track, but you need to apply the analytic SUM to the aggregate SUM:
SELECT month,
SUM(count) as sum,
SUM(SUM(count)) OVER (ORDER BY month) AS cumulative_sum
FROM test_table
GROUP BY 1;
This works because analytic (window) functions are applied after the GROUP BY clause, so the aggregate sum is already available when the rolling sum is computed.
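A quick check of the single-SELECT form, assuming Python's sqlite3 as a stand-in for PostgreSQL 9.6 (SQLite likewise evaluates window functions after GROUP BY). The sketch also runs the asker's two-step WITH version and compares the results. The column name count is double-quoted only to keep it visually distinct from the COUNT function.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test_table (month TEXT, "count" INTEGER);
INSERT INTO test_table VALUES
  ('2018-01-01', 2), ('2018-01-01', 3), ('2018-02-01', 1), ('2018-02-01', 2);
""")

# Single SELECT: the inner SUM is the per-month aggregate, and the outer
# SUM ... OVER runs across the already-grouped rows.
single = con.execute("""
    SELECT month,
           SUM("count") AS sum,
           SUM(SUM("count")) OVER (ORDER BY month) AS cumulative_sum
    FROM test_table
    GROUP BY month
    ORDER BY month
""").fetchall()

# The asker's two-step version, for comparison.
two_step = con.execute("""
    WITH sums AS (
        SELECT month, SUM("count") AS sum FROM test_table GROUP BY 1
    )
    SELECT month, sum, SUM(sum) OVER (ORDER BY month) AS cumulative_sum
    FROM sums
    ORDER BY month
""").fetchall()

print(single)
print(two_step)
```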
