BigQuery joining and filling gaps

I have the following data sets that I want to merge, and I am trying to fill in the empty rows:
WITH data1 as (
SELECT "2021-01-01" date, 'abc' company, 3 cumulative_count UNION ALL
SELECT "2021-01-02" date, 'abc' company, 4 cumulative_count UNION ALL
SELECT "2021-01-06" date, 'abc' company, 17 cumulative_count UNION ALL
SELECT "2021-01-02" date , 'xzy' company, 2 cumulative_count UNION ALL
SELECT "2021-01-04" date , 'xzy' company, 16 cumulative_count UNION ALL
SELECT "2021-01-08" date , 'xzy' company, 16 cumulative_count
),
data_dates as (
# SELECT *
# FROM UNNEST(GENERATE_DATE_ARRAY('2021-01-01', '2021-01-08',INTERVAL 1 DAY)) AS date
SELECT "2021-01-01" date UNION ALL
SELECT "2021-01-02" date UNION ALL
SELECT "2021-01-03" date UNION ALL
SELECT "2021-01-04" date UNION ALL
SELECT "2021-01-05" date UNION ALL
SELECT "2021-01-06" date UNION ALL
SELECT "2021-01-07" date UNION ALL
SELECT "2021-01-08" date
)
SELECT
a.date,
b.company,
b.cumulative_count
FROM data_dates as a
LEFT OUTER JOIN data1 as b
ON a.date = b.date
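This results in a row for every date, but company and cumulative_count are NULL on dates that have no match in data1 (2021-01-03, 2021-01-05 and 2021-01-07).
How would I go about replicating the values from 2021-01-02 into the 2021-01-03 row for both companies? Basically, I want to fill in the empty rows until new values come in.
Also, how would I get a row for company xzy on 2021-01-01?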
Thank you.

The query below should do it:
WITH data1 as (
SELECT "2021-01-01" date, 'abc' company, 3 cumulative_count UNION ALL
SELECT "2021-01-02" date, 'abc' company, 4 cumulative_count UNION ALL
SELECT "2021-01-06" date, 'abc' company, 17 cumulative_count UNION ALL
SELECT "2021-01-02" date , 'xzy' company, 2 cumulative_count UNION ALL
SELECT "2021-01-04" date , 'xzy' company, 16 cumulative_count UNION ALL
SELECT "2021-01-08" date , 'xzy' company, 16 cumulative_count
), data_dates as (
# SELECT *
# FROM UNNEST(GENERATE_DATE_ARRAY('2021-01-01', '2021-01-08',INTERVAL 1 DAY)) AS date
SELECT "2021-01-01" date UNION ALL
SELECT "2021-01-02" date UNION ALL
SELECT "2021-01-03" date UNION ALL
SELECT "2021-01-04" date UNION ALL
SELECT "2021-01-05" date UNION ALL
SELECT "2021-01-06" date UNION ALL
SELECT "2021-01-07" date UNION ALL
SELECT "2021-01-08" date
), data_companies as (
SELECT DISTINCT company
FROM data1
)
SELECT
a.date,
c.company,
IFNULL(cumulative_count, LAST_VALUE(cumulative_count IGNORE NULLS) OVER(PARTITION BY c.company ORDER BY a.date)) cumulative_count
FROM data_dates as a, data_companies as c
LEFT JOIN data1 as b
ON a.date = b.date
AND c.company = b.company
ORDER BY date, company
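The output has one row per date and company. Each gap repeats the last known cumulative_count for that company, and xzy on 2021-01-01 is NULL because it has no earlier value.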
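For readers who want to try the idea without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. It keeps the same two steps (cross join dates with companies, then carry the last value forward), but SQLite has no LAST_VALUE(... IGNORE NULLS), so the fill is done with a correlated subquery instead. The recursive CTE standing in for GENERATE_DATE_ARRAY and the column name d are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data1 (date TEXT, company TEXT, cumulative_count INTEGER);
INSERT INTO data1 VALUES
  ('2021-01-01', 'abc', 3), ('2021-01-02', 'abc', 4), ('2021-01-06', 'abc', 17),
  ('2021-01-02', 'xzy', 2), ('2021-01-04', 'xzy', 16), ('2021-01-08', 'xzy', 16);
""")

query = """
WITH RECURSIVE data_dates(d) AS (
  -- Stand-in for GENERATE_DATE_ARRAY: one row per day in the range.
  SELECT '2021-01-01'
  UNION ALL
  SELECT date(d, '+1 day') FROM data_dates WHERE d < '2021-01-08'
),
data_companies AS (
  SELECT DISTINCT company FROM data1
)
SELECT
  a.d AS date,
  c.company,
  -- Latest known value on or before this date for this company.
  (SELECT b.cumulative_count
     FROM data1 b
    WHERE b.company = c.company AND b.date <= a.d
    ORDER BY b.date DESC
    LIMIT 1) AS cumulative_count
FROM data_dates a
CROSS JOIN data_companies c
ORDER BY a.d, c.company
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The cross join is what produces the xzy row for 2021-01-01; its count stays NULL (None in Python) because no earlier value exists to carry forward.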

Related

Select rows with no match in join table with where condition

In a Rails app with Postgres I have users and jobs tables and a followings join table. I want to select the jobs that are not followed by a specific user, including jobs that have no rows in the join table at all.
Tables:
users:
id: bigint (pk)
jobs:
id: bigint (pk)
followings:
id: bigint (pk)
job_id: bigint (fk)
user_id: bigint (fk)
Data:
sandbox_development=# SELECT id FROM jobs;
 id
----
  1
  2
  3
(3 rows)
sandbox_development=# SELECT id FROM users;
 id
----
  1
  2
(2 rows)
sandbox_development=# SELECT id, user_id, job_id FROM followings;
 id | user_id | job_id
----+---------+--------
  1 |       1 |      1
  2 |       2 |      2
(2 rows)
Expected result
# jobs
id
----
2
3
(2 rows)
Can I create a join query that is the equivalent of this?
sandbox_development=#
SELECT j.id FROM jobs j
WHERE NOT EXISTS(
SELECT 1 FROM followings f
WHERE f.user_id = 1 AND f.job_id = j.id
);
id
----
2
3
(2 rows)
Which does the job but is a PITA to create with ActiveRecord.
So far I have:
Job.joins(:followings).where.not(followings: { user_id: 1 })
SELECT "jobs".* FROM "jobs"
INNER JOIN "followings"
ON "followings"."job_id" = "jobs"."id"
WHERE "followings"."user_id" != 1
But since it's an inner join, it does not include jobs with no followers (job id 3). I have also tried various attempts at outer joins that either give all the rows or no rows.
In Rails 5 you can use #left_outer_joins with where.not to get this result. The != condition on its own filters out the rows where followings.user_id is NULL (jobs with no followings), so you need to add an explicit nil condition to keep them.
Rails 5 Query:
Job.left_outer_joins(:followings).where.not(followings: {user_id: 1}).or(Job.left_outer_joins(:followings).where(followings: {user_id: nil}))
Alternate Query:
Job.left_outer_joins(:followings).where("followings.user_id != 1 OR followings.user_id is NULL")
Postgres Query:
SELECT "jobs".* FROM "jobs" LEFT OUTER JOIN "followings" ON "followings"."job_id" = "jobs"."id" WHERE "followings"."user_id" != 1 OR followings.user_id is NULL;
I'm not sure I understand, but this gives the output you want and uses an outer join:
SELECT j.*
FROM jobs j LEFT JOIN followings f ON f.job_id = j.id
LEFT JOIN users u ON u.id = f.user_id AND u.id = 1
WHERE u.id IS NULL;
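One caveat about the outer-join answers above: they filter join rows in WHERE, so a job followed by user 1 and also by another user would still come back through the other user's row. The sample data does not exercise that case. The sketch below, using Python's sqlite3 module, adds one extra followings row (an assumption, not in the question) and compares the WHERE filter with an anti-join that puts the user condition in the ON clause, which matches the NOT EXISTS query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY);
CREATE TABLE followings (id INTEGER PRIMARY KEY, user_id INTEGER, job_id INTEGER);
INSERT INTO jobs VALUES (1), (2), (3);
INSERT INTO followings VALUES (1, 1, 1), (2, 2, 2);
-- Extra row, not in the question: user 2 also follows job 1.
INSERT INTO followings VALUES (3, 2, 1);
""")

# Filter in WHERE: job 1 survives via the row that belongs to user 2.
where_filter = """
SELECT DISTINCT j.id
FROM jobs j
LEFT JOIN followings f ON f.job_id = j.id
WHERE f.user_id != 1 OR f.user_id IS NULL
ORDER BY j.id
"""

# Anti-join: only user 1's rows can match, then keep the jobs with no match.
anti_join = """
SELECT j.id
FROM jobs j
LEFT JOIN followings f ON f.job_id = j.id AND f.user_id = 1
WHERE f.id IS NULL
ORDER BY j.id
"""

where_ids = [r[0] for r in conn.execute(where_filter)]
anti_ids = [r[0] for r in conn.execute(anti_join)]
print(where_ids)  # [1, 2, 3]
print(anti_ids)   # [2, 3]
```

In ActiveRecord, one way to express the second query, assuming the table names from the question, is a string join: Job.joins("LEFT OUTER JOIN followings ON followings.job_id = jobs.id AND followings.user_id = 1").where(followings: { id: nil }).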

SQL: Rank by/Filter by total rank

I have a pretty standard "append only" table with created_at and group_name as columns using Amazon Redshift.
I want to produce a time series of the top N rows by group for the past [time range].
Currently I use this:
SELECT
date_trunc('day', created_at) AS timeseries,
my_table.group_name,
COUNT(*) AS count
FROM
my_table
JOIN (
SELECT
group_name,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
FROM
my_table
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
group_name
) ranking ON (ranking.group_name = my_table.group_name)
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
timeseries,
my_table.group_name,
ranking.rank
HAVING
ranking.rank <= 5
ORDER BY
timeseries DESC
This is error-prone because the created_at filter appears twice, so both copies have to be updated whenever the range changes.
Is there a way to make this query more elegant (ideally using the time filter only once)?
You can add a join condition on created_at. For example, calculate the min and max of created_at in the subquery and join all rows that fall between them:
SELECT
date_trunc('day', created_at) AS timeseries,
my_table.group_name,
COUNT(*) AS count
FROM
my_table
JOIN (
SELECT
group_name,
max(created_at) AS max_created,
min(created_at) AS min_created,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
FROM
my_table
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
group_name
) ranking ON (ranking.group_name = my_table.group_name)
AND my_table.created_at BETWEEN ranking.min_created AND ranking.max_created
GROUP BY
timeseries,
my_table.group_name,
ranking.rank
HAVING
ranking.rank <= 5
ORDER BY
timeseries DESC
Also, I believe there are more elegant ways to calculate this without reading the same table twice.
Try this one, which ranks the groups within each day; it should also run faster:
SELECT
ranking.date AS timeseries,
ranking.group_name,
ranking.count
FROM (
SELECT
group_name,
date(created_at) AS date,
COUNT(*) AS count,
ROW_NUMBER() OVER (PARTITION BY date(created_at) ORDER BY COUNT(*) DESC) AS rank
FROM
my_table
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
group_name,
date(created_at)
) ranking
WHERE ranking.rank <= 5
ORDER BY timeseries DESC
I don't think I understand your requirements completely but this query should give the top 5 groups per day.
select timeseries, group_name, count from (
select timeseries, group_name, count,
row_number() over (partition by timeseries order by count desc) as rank
from (
select date_trunc('day', created_at) AS timeseries,
group_name,
count(*) AS count
from my_table
where created_at > sysdate - '1 day'::interval
group by 1,2
)
) where rank <= 5
order by 1 desc
This query should give the counts per day for the overall top 5 groups:
with daily_counts as (
select date_trunc('day', created_at) AS timeseries,
group_name,
count(*) AS count
from my_table
where created_at > sysdate - '1 day'::interval
group by 1,2
)
select d.timeseries, d.group_name, d.count
from daily_counts d
join (
select group_name, sum(count) as total
from daily_counts
group by group_name order by total desc
limit 5
) r on d.group_name=r.group_name
order by 1,3 desc
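To make the "filter only once" point concrete, here is a small runnable sketch of the same CTE shape using Python's sqlite3 module rather than Redshift. The sample rows, the fixed cut-off date passed as :since (instead of sysdate) and the top_groups CTE name are assumptions made so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (created_at TEXT, group_name TEXT)")
events = [
    ("2021-03-01 09:00:00", "a"), ("2021-03-01 10:00:00", "a"),
    ("2021-03-01 11:00:00", "b"), ("2021-03-02 09:00:00", "a"),
    ("2021-03-02 10:00:00", "c"), ("2021-03-02 11:00:00", "c"),
    ("2021-03-02 12:00:00", "c"), ("2021-02-01 09:00:00", "b"),
]
conn.executemany("INSERT INTO my_table VALUES (?, ?)", events)

query = """
WITH daily_counts AS (
  SELECT date(created_at) AS timeseries, group_name, COUNT(*) AS n
  FROM my_table
  WHERE created_at >= :since   -- the only place the time range appears
  GROUP BY date(created_at), group_name
),
top_groups AS (
  -- Overall top N groups inside the filtered range.
  SELECT group_name
  FROM daily_counts
  GROUP BY group_name
  ORDER BY SUM(n) DESC, group_name
  LIMIT :top_n
)
SELECT d.timeseries, d.group_name, d.n
FROM daily_counts d
JOIN top_groups t ON t.group_name = d.group_name
ORDER BY d.timeseries DESC, d.n DESC
"""

rows = conn.execute(query, {"since": "2021-03-01", "top_n": 2}).fetchall()
for row in rows:
    print(row)
```

Both the ranking and the final time series read from daily_counts, so changing the range means editing one WHERE clause.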

RIGHT OUTER JOIN returns empty results with WHERE

I need to produce a report of all records (businesses) created by a particular user in each of the last few months. I wrote the following query and expected it to return a row for each month. However, this user didn't create any records (businesses) in those months, so I get an empty result [].
I'm still expecting to receive a row for each month, since I'm selecting a generate_series column using RIGHT OUTER JOIN but it doesn't happen.
start = 3.months.ago
stop = Time.now
new_businesses = Business.select(
"generate_series, count(id) as new").
joins("RIGHT OUTER JOIN ( SELECT
generate_series(#{start.month}, #{stop.month})) series
ON generate_series = date_part('month', created_at)
").
where(created_at: start.beginning_of_month .. stop.end_of_month).
where(author_id: creator.id).
group("generate_series").
order('generate_series ASC')
How can I change my query to get a row for each month instead of an empty result? I'm using PostgreSQL.
UPDATE
This code works:
new_businesses = Business.select(
"generate_series as month, count(id) as new").
joins("RIGHT OUTER JOIN ( SELECT
generate_series(#{start.month}, #{stop.month})) series
ON (generate_series = date_part('month', created_at)
AND author_id = #{creator.id}
AND created_at BETWEEN '#{start.beginning_of_month.to_formatted_s(:db)}' AND
'#{stop.end_of_month.to_formatted_s(:db)}'
)
").
group("generate_series").
order('generate_series ASC')
Your problem is the where part, which breaks the outer join. Consider this example:
select *
from a right outer join b on (a.id = b.id)
It returns all rows from b along with the linked values from a, but:
select *
from a right outer join b on (a.id = b.id)
where a.some_field = 1
drops all rows where a is not present, because a.some_field is NULL for those rows and the condition fails.
The right way to do such things is to place the filter in the join condition:
select *
from a right outer join b on (a.id = b.id and a.some_field = 1)
or use a subquery:
select *
from (select * from a where a.some_field = 1) as a right outer join b on (a.id = b.id)
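The difference is easy to see on a tiny data set. The sketch below uses Python's sqlite3 module with hypothetical months and businesses tables (the names, columns and rows are assumptions for illustration). It is written as a LEFT JOIN from the series side, which is the mirror image of the RIGHT OUTER JOIN above and also runs on SQLite versions that lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE months (month INTEGER);
CREATE TABLE businesses (id INTEGER PRIMARY KEY, author_id INTEGER, month INTEGER);
INSERT INTO months VALUES (1), (2), (3);
INSERT INTO businesses VALUES (1, 7, 1), (2, 8, 2), (3, 8, 2);
""")

# Filter in WHERE: months without a matching business have author_id NULL,
# fail the condition, and disappear from the result.
filter_in_where = """
SELECT m.month, COUNT(b.id) AS new_count
FROM months m
LEFT JOIN businesses b ON b.month = m.month
WHERE b.author_id = 7
GROUP BY m.month
ORDER BY m.month
"""

# Filter in ON: every month is kept and unmatched months count as 0.
filter_in_on = """
SELECT m.month, COUNT(b.id) AS new_count
FROM months m
LEFT JOIN businesses b ON b.month = m.month AND b.author_id = 7
GROUP BY m.month
ORDER BY m.month
"""

where_rows = conn.execute(filter_in_where).fetchall()
on_rows = conn.execute(filter_in_on).fetchall()
print(where_rows)  # [(1, 1)]
print(on_rows)     # [(1, 1), (2, 0), (3, 0)]
```

COUNT(b.id) rather than COUNT(*) matters here: it skips the NULLs that the outer join produces, so empty months report 0 instead of 1.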

Insert multiple records with join

I need some help figuring out how to insert more than one record into a table using a join (when the join returns more than one value). Here is the scenario:
Table A:
A_ID bigserial, Role Varchar(25), Description varchar(25)
Table B:
B_ID bigserial, Role Varchar(25), Code varchar(25)
Table A and B are connected with column Role.
Example Entries in Table_A:
1, A, Standard
2, B , Test
3, C, Test
4, D, Standard
Example Entries in Table_B:
1, A, ABC
2, B, XYZ
3, C, XYZ
4, D, ABC
Basically, I need to find the Roles where Description = Test, then insert an entry for each such Role into Table_B with Code = ABC (if the entry doesn't already exist).
The following query will give me all the Test description Roles which do not have any entry with Code = ABC in table B
Query1:
SELECT ROLE FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.ROLE=B.ROLE
WHERE A.Description ='Test'
AND B.CODE<>'ABC';
I have the following insert query:
insert into Table_B (Role , Code)
select (SELECT ROLE FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.ROLE=B.ROLE WHERE A.Description ='Test'AND B.CODE<>'ABC'), 'ABC';
The above insert query only works when Query1 returns one role. I am not sure how to insert into Table_B when Query1 returns more than one result.
Can someone please help? I am not looking to use stored procedures for this.
Thanks.
Edited:
Example Entries in Table_A:
1, A, Standard
2, B , Test
3, C, Test
4, D, Standard
5, E, TEST
Example Entries in Table_B:
1, A, ABC
2, B, XYZ
3, B, ABC
4, C, DEF
5, C, XYZ
6, D, ABC
7, E, XYZ
8, E, LLL
Query1 will not work here:
SELECT ROLE FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.ROLE=B.ROLE
WHERE A.Description ='Test'
AND B.CODE<>'ABC';
Using this query now:
SELECT distinct ROLE FROM TB where role not in (
SELECT B.ROLE FROM TA A
INNER JOIN TB B
ON A.ROLE=B.ROLE
WHERE A.Description = 'Test'
AND B.CODE = 'ABC')
and role in (select role from TA where Description = 'Test');
How will the insert work now?
You can select the constant 'ABC' as a second column named Code.
Something like:
insert into Table_B (Role , Code)
SELECT A.ROLE, 'ABC' CODE FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.ROLE=B.ROLE WHERE A.Description ='Test' AND B.CODE<>'ABC';
That way the number of selected columns matches the insert's column list.
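With the edited sample data, a role can have several rows in Table_B, so joining on CODE <> 'ABC' can return the same role more than once and can include roles that already have an ABC row. A hedged alternative is INSERT ... SELECT with NOT EXISTS, sketched below with Python's sqlite3 module. The case-insensitive match on Description (so that 'TEST' for role E counts as Test) is an assumption of this sketch; drop lower() if the comparison should be exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (a_id INTEGER PRIMARY KEY, role TEXT, description TEXT);
CREATE TABLE table_b (b_id INTEGER PRIMARY KEY, role TEXT, code TEXT);
INSERT INTO table_a (role, description) VALUES
  ('A', 'Standard'), ('B', 'Test'), ('C', 'Test'), ('D', 'Standard'), ('E', 'TEST');
INSERT INTO table_b (role, code) VALUES
  ('A', 'ABC'), ('B', 'XYZ'), ('B', 'ABC'), ('C', 'DEF'),
  ('C', 'XYZ'), ('D', 'ABC'), ('E', 'XYZ'), ('E', 'LLL');
""")

# One new row per Test role that has no ABC entry yet.
insert_missing = """
INSERT INTO table_b (role, code)
SELECT a.role, 'ABC'
FROM table_a a
WHERE lower(a.description) = 'test'
  AND NOT EXISTS (
    SELECT 1 FROM table_b b
    WHERE b.role = a.role AND b.code = 'ABC'
  )
"""

def count_b():
    return conn.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]

before = count_b()
conn.execute(insert_missing)
after_first = count_b()
conn.execute(insert_missing)  # running it again finds nothing left to insert
after_second = count_b()

added = conn.execute(
    "SELECT role, code FROM table_b WHERE b_id > 8 ORDER BY role"
).fetchall()
print(before, after_first, after_second)  # 8 10 10
print(added)  # [('C', 'ABC'), ('E', 'ABC')]
```

Role B is skipped because it already has an ABC row, and the statement is safe to re-run.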

How To Select Top Value

SELECT B.CustomerID, SUM(C.UnitPrice * C.Quantity) AS "Total Value"
FROM Orders B, Order_Det C
WHERE B.OrderID = C.OrderID AND "Total Value" = (SELECT MAX("Total Value") FROM Order_Det)
GROUP BY B.CustomerID
ORDER BY "Total Value";
The code above is what I tried.
A customer can place multiple orders, so I want to find the most valuable customer by summing quantity times unit price across all of their orders.
The problem is that I was unable to work out who the most valuable customer is. Please guide me. Thanks.
Here it is:
SELECT O.CustomerID, SUM(OD.Quantity*OD.UnitPrice) AS "Total Value"
FROM Orders O
INNER JOIN Order_Det OD ON O.OrderId = OD.OrderId
GROUP BY O.CustomerID
ORDER BY SUM(OD.Quantity*OD.UnitPrice)
or
SELECT O.CustomerID, SUM(OD.Quantity*OD.UnitPrice) AS "Total Value"
FROM Orders O, Order_Det OD
WHERE O.OrderId = OD.OrderId
GROUP BY O.CustomerID
ORDER BY SUM(OD.Quantity*OD.UnitPrice)
And for the most valuable customer (TOP 1 is SQL Server syntax; other databases typically use LIMIT 1 or FETCH FIRST 1 ROW ONLY):
SELECT TOP 1 O.CustomerID
FROM Orders O
INNER JOIN Order_Det OD ON O.OrderId = OD.OrderId
GROUP BY O.CustomerID
ORDER BY SUM(OD.Quantity*OD.UnitPrice) DESC
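As a quick check of the approach, here is a minimal sketch using Python's sqlite3 module, with LIMIT 1 in place of TOP 1 because SQLite does not support TOP. The sample orders and the total_value alias are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
CREATE TABLE Order_Det (OrderID INTEGER, UnitPrice REAL, Quantity INTEGER);
INSERT INTO Orders VALUES (1, 'C1'), (2, 'C2'), (3, 'C1');
INSERT INTO Order_Det VALUES (1, 10.0, 2), (2, 50.0, 1), (3, 20.0, 2), (3, 5.0, 1);
""")

# Sum each customer's line totals across all orders, then keep the largest.
query = """
SELECT O.CustomerID, SUM(OD.Quantity * OD.UnitPrice) AS total_value
FROM Orders O
INNER JOIN Order_Det OD ON O.OrderID = OD.OrderID
GROUP BY O.CustomerID
ORDER BY total_value DESC
LIMIT 1
"""

top = conn.execute(query).fetchone()
print(top)  # ('C1', 65.0)
```

C1 wins with 20 + 40 + 5 = 65 across two orders, ahead of C2's single order of 50. If two customers tie for first place, LIMIT 1 returns only one of them.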

Resources