Snowflake joins performance improvement - join

I need to create a view on top of a table that contains 1,300+ columns. New data is loaded into the table every quarter (millions of rows). The view has to join another table to the base table, and I also need to add a latest-row indicator to the view.
CREATE OR REPLACE SECURE VIEW VIEW_NAME AS
SELECT lkp_tbl.col1,base_tbl.col1,base_tbl.col2,base_tbl.col3,........,
base_tbl.col1334, 1 as Is_Latest_Quarter
FROM base_tbl full outer JOIN lkp_tbl
on base_tbl.CUST_ID = lkp_tbl.CUST_ID
where snapshot_dt=(select max(snapshot_dt) from base_tbl)
union all
SELECT lkp_tbl.col1,base_tbl.col1,base_tbl.col2,base_tbl.col3,........,
base_tbl.col1334,0 as Is_Latest_Quarter
FROM base_tbl full outer JOIN lkp_tbl
on base_tbl.CUST_ID = lkp_tbl.CUST_ID
where snapshot_dt!=(select max(snapshot_dt) from base_tbl);
After creating this view, queries against it are too slow, even when we fetch only 100 rows. Is there a more efficient way to create the view? If not, how can I improve performance?

Just use one SELECT statement and a CASE expression to calculate Is_Latest_Quarter.
UPDATED WITH (ALMOST) ACTUAL SQL
CREATE OR REPLACE SECURE VIEW VIEW_NAME AS
SELECT {list of columns you want to include}
,CASE WHEN snapshot_dt=(select max(snapshot_dt) from base_tbl) THEN 1
ELSE 0 END as Is_Latest_Quarter
FROM base_tbl
full outer JOIN lkp_tbl on base_tbl.CUST_ID = lkp_tbl.CUST_ID
Alternatively, if Snowflake doesn't like that inline subquery, you could use a CTE, something like:
CREATE OR REPLACE SECURE VIEW VIEW_NAME AS
WITH MAX_DATE AS (SELECT MAX(snapshot_dt) AS max_snapshot_dt FROM base_tbl)
SELECT {list of columns you want to include}
,CASE WHEN max_date.max_snapshot_dt is not null THEN 1
ELSE 0 END as Is_Latest_Quarter
FROM base_tbl
full outer JOIN lkp_tbl on base_tbl.CUST_ID = lkp_tbl.CUST_ID
LEFT OUTER JOIN MAX_DATE ON base_tbl.snapshot_dt = max_date.max_snapshot_dt
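The single-SELECT idea can be sanity-checked on a toy dataset. This is only a sketch in Python with an in-memory SQLite database, not Snowflake: the three-column base_tbl, the sample rows, and the variable names are made up for illustration, and a LEFT JOIN stands in for the FULL OUTER JOIN because older SQLite builds do not support the latter.

```python
import sqlite3

# Hypothetical miniature versions of base_tbl and lkp_tbl (the real table has 1300+ columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_tbl (cust_id INTEGER, snapshot_dt TEXT, col1 TEXT);
    CREATE TABLE lkp_tbl  (cust_id INTEGER, col1 TEXT);
    INSERT INTO base_tbl VALUES
        (1, '2023-03-31', 'a'),
        (1, '2023-06-30', 'b'),
        (2, '2023-06-30', 'c');
    INSERT INTO lkp_tbl VALUES (1, 'gold'), (2, 'silver');
""")

# One pass over base_tbl: the CASE expression flags rows from the latest snapshot,
# so no UNION ALL of two nearly identical SELECTs is needed.
flag_rows = conn.execute("""
    SELECT base_tbl.cust_id,
           base_tbl.snapshot_dt,
           lkp_tbl.col1 AS lkp_col1,
           CASE WHEN base_tbl.snapshot_dt = (SELECT MAX(snapshot_dt) FROM base_tbl)
                THEN 1 ELSE 0 END AS is_latest_quarter
    FROM base_tbl
    LEFT JOIN lkp_tbl ON base_tbl.cust_id = lkp_tbl.cust_id
    ORDER BY base_tbl.cust_id, base_tbl.snapshot_dt
""").fetchall()

for row in flag_rows:
    print(row)
```

Each base_tbl row appears exactly once, with the flag set to 1 only for the most recent snapshot_dt. Whether this is actually faster in Snowflake depends on the data and warehouse, so treat it as a shape to test rather than a guaranteed fix.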

Related

INNER JOIN on a table which does not have any data

I'm trying to use the INNER JOIN functionality in my phpMyAdmin.
My query looks like this:
SELECT * FROM ___Bookings INNER JOIN ___Origins ON ___Bookings.BOO_Origin=___Origins.ORI_Id WHERE BOO_Id=1
The problem is that at this step nothing has been populated in the ___Origins table, so my query returns 0 rows.
How can I change my query to return a row even if the joined table is not populated?
Also, what's the difference between JOIN and INNER JOIN?
Thanks so much.
Basically, you want to join the tables based on the data in the ___Bookings table. That's a job for LEFT JOIN, not INNER JOIN (which is the same as JOIN).
Refer here for more information on SQL Joins: http://www.sql-join.com/sql-join-types/
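To see the difference concretely, here is a small sketch using Python's built-in sqlite3 module rather than MySQL. The table names bookings and origins, their columns, and the sample row are simplified stand-ins for the tables in the question.

```python
import sqlite3

# Hypothetical stand-ins for ___Bookings and ___Origins; origins is left empty on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (boo_id INTEGER, boo_origin INTEGER);
    CREATE TABLE origins  (ori_id INTEGER, ori_name TEXT);
    INSERT INTO bookings VALUES (1, 7);
""")

join_sql = """
    SELECT * FROM bookings {kind} origins
    ON bookings.boo_origin = origins.ori_id
    WHERE boo_id = 1
"""

# INNER JOIN and plain JOIN need a matching row on both sides.
inner_rows = conn.execute(join_sql.format(kind="INNER JOIN")).fetchall()
plain_rows = conn.execute(join_sql.format(kind="JOIN")).fetchall()

# LEFT JOIN keeps every bookings row and fills the origins columns with NULL.
left_rows = conn.execute(join_sql.format(kind="LEFT JOIN")).fetchall()

print(inner_rows, plain_rows, left_rows)
```

With the origins table empty, both inner variants return nothing, while the LEFT JOIN still returns the booking with NULLs (None in Python) in the origins columns.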
There isn't a difference between Join and Inner Join (you can search for it also if this isn't enough).
For the other part - how can it return a row when there isn't one?
If you need a response from PHP you can set something like
if ($query->rowCount() > 0) {
echo $records;
}
else {echo "N";}
Then, in your other code, check whether the response is "N" (or whatever marker you choose) and act accordingly.

ActiveRecord using pluck with includes/left outer joins

When I use includes, it left joins the table I want to filter on, but when I add pluck that join disappears. Is there any way to mix pluck and a left join without manually typing the SQL for 'left join'?
Here's my case:
Select u.id
From users u
Left join profiles p on u.id=p.id
Left join admin_profiles a on u.id=a.uid
Where 2 in (p.prop, a.prop, u.prop)
Doing this is just loading all the values:
Users.includes(:AdminProfiles, :Profiles).where(...).map{ |a| a[:id] }
But when I do pluck instead of map, it doesn't left join the profile tables.
Your problem is that includes doesn't really do a join; instead it fires a second query after the first one to load the associations. In your case you want the tables to actually be joined, so replace includes(:something) with joins(:something) and everything should work fine.
Replying to your comment, I'm going to quote a few parts of the Rails guide on the Active Record query interface.
From the section "Solution to N + 1 queries problem":
clients = Client.includes(:address).limit(10)
clients.each do |client|
puts client.address.postcode
end
The above code will execute just 2 queries, as opposed to 11 queries in the previous case:
SELECT * FROM clients LIMIT 10
SELECT addresses.* FROM addresses WHERE (addresses.client_id IN (1,2,3,4,5,6,7,8,9,10))
As you can see, two queries, no joins at all.
From the section "Specifying Conditions on Eager Loaded Associations":
Even though Active Record lets you specify conditions on the eager loaded associations just like joins, the recommended way is to use joins instead.
Then an example:
Article.includes(:comments).where(comments: { visible: true })
This would generate a query which contains a LEFT OUTER JOIN whereas the joins method would generate one using the INNER JOIN function instead.
SELECT "articles"."id" AS t0_r0, ... "comments"."updated_at" AS t1_r5 FROM "articles" LEFT OUTER JOIN "comments" ON "comments"."article_id" = "articles"."id" WHERE (comments.visible = 1)
If there was no where condition, this would generate the normal set of two queries.

Left join with where clause not working

I was trying to get only selected rows from table A (not all rows) together with the rows from table B that match them, but the query returns only rows that match in both A and B and excludes the rest of the selected rows from table A.
I used this condition,
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and B.party_code=2
I have 2 tables:
table 1: A with 3 columns
CategoryName,CategoryCode(PK),ItemType
table 2: B with 3 columns
CategoryCode(FK),Discount,PartyCode(FK)(from another table)
NOTE: working in Access 2007
For non-matching rows from table B, party_code is NULL, so the condition B.party_code=2 in your WHERE clause is not true for them and those rows are not returned. So, you need to filter the "B" records before joining. Try
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode and B.party_code=2
WHERE A.itemtype='F'
[EDIT] That doesn't work in Access. Next try.
You can create a query to do your filter. Let's call it "B_filtered". This is just
SELECT * FROM B where party_code = 2
(You could make the "2" a parameter to make it more flexible).
Then, just use this query in your actual query.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
[EDIT]
Just Googled - I think you can do this directly with a subquery.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN (SELECT * FROM B where party_code = 2) AS B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
What mlinth proposed is correct, and would work in most other SQL dialects. The query below is the same basic concept but using a null condition.
Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and (B.party_code=2 OR B.party_code IS NULL)
If party_code is nullable, switch to using the PK or another non-nullable field.
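The three variants discussed in this thread can be compared side by side. The sketch below uses Python's sqlite3 module, not Access (SQLite accepts the extra condition in the ON clause); the sample categories and discounts are invented for illustration. One thing it shows is that the "OR ... IS NULL" form is not always equivalent to filtering B before the join: a category whose only B rows belong to another party drops out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (CategoryName TEXT, CategoryCode INTEGER PRIMARY KEY, ItemType TEXT);
    CREATE TABLE B (CategoryCode INTEGER, Discount INTEGER, party_code INTEGER);
    -- Hypothetical data: Fruit has a discount for party 2, Fish only for party 3,
    -- Flour has no discount rows at all, Tools is a different item type.
    INSERT INTO A VALUES ('Fruit', 1, 'F'), ('Fish', 2, 'F'), ('Flour', 3, 'F'), ('Tools', 4, 'T');
    INSERT INTO B VALUES (1, 10, 2), (2, 5, 3);
""")

# 1. Filter on B in WHERE: unmatched A rows are discarded after the join.
where_filter = conn.execute("""
    SELECT A.CategoryName, B.Discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
    WHERE A.ItemType = 'F' AND B.party_code = 2
    ORDER BY A.CategoryCode
""").fetchall()

# 2. Filter on B in ON: B is restricted before the join, every 'F' category survives.
on_filter = conn.execute("""
    SELECT A.CategoryName, B.Discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode AND B.party_code = 2
    WHERE A.ItemType = 'F'
    ORDER BY A.CategoryCode
""").fetchall()

# 3. WHERE with "OR ... IS NULL": keeps categories with no B rows,
#    but loses Fish, whose only B row belongs to party 3.
or_null_filter = conn.execute("""
    SELECT A.CategoryName, B.Discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
    WHERE A.ItemType = 'F' AND (B.party_code = 2 OR B.party_code IS NULL)
    ORDER BY A.CategoryCode
""").fetchall()

print(where_filter, on_filter, or_null_filter)
```

If every selected row of A must appear in the result, the subquery (or ON-clause) form is the safer choice; the null-condition form only matches it when no category has rows for other parties.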

select multiple columns from different tables and join in hive

I have a hive table A with 5 columns, the first column(A.key) is the key and I want to keep all 5 columns. I want to select 2 columns from B, say B.key1 and B.key2 and 2 columns from C, say C.key1 and C.key2. I want to join these columns with A.key = B.key1 and B.key2 = C.key1
What I want is a new external table D that has the following columns. B.key2 and C.key2 values should be given NULL if no matching happened.
A.key, A_col1, A_col2, A_col3, A_col4, B.key2, C.key2
What should be the correct hive query command? I got a max split error for my initial try.
Does this work?
create external table D as
select A.key, A.col1, A.col2, A.col3, A.col4, B.key2 as b_key2, C.key2 as c_key2
from A left outer join B on A.key = B.key1 left outer join C on B.key2 = C.key1;
If not, could you post more info about the "max split error" you mentioned? Copy+paste specific error message text is good.
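The join conditions stated in the question (A.key = B.key1 and B.key2 = C.key1, with NULLs where nothing matches) can be tried out on a tiny dataset. This sketch uses Python's sqlite3 module instead of Hive, trims A to two columns, and uses made-up rows and aliases (b_key2, c_key2), so it only illustrates the shape of the chained left outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (key INTEGER, col1 TEXT);
    CREATE TABLE B (key1 INTEGER, key2 INTEGER);
    CREATE TABLE C (key1 INTEGER, key2 TEXT);
    -- Hypothetical rows: key 1 matches all the way through, key 2 matches only B,
    -- key 3 matches nothing.
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (1, 100), (2, 200);
    INSERT INTO C VALUES (100, 'c-100');
""")

# A drives the result; B is attached via A.key = B.key1, then C via B.key2 = C.key1.
# The two key2 columns are aliased so the output column names stay unique.
chained_rows = conn.execute("""
    SELECT A.key, A.col1, B.key2 AS b_key2, C.key2 AS c_key2
    FROM A
    LEFT OUTER JOIN B ON A.key = B.key1
    LEFT OUTER JOIN C ON B.key2 = C.key1
    ORDER BY A.key
""").fetchall()

for row in chained_rows:
    print(row)
```

Every row of A is kept, and the B and C columns come back as NULL (None in Python) wherever the chain of matches breaks.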

Get incremental changes between Hive partitions

I have a nightly job that runs and computes some data in hive. It is partitioned by day.
Fields:
id bigint
rank bigint
Yesterday
output/dt=2013-10-31
Today
output/dt=2013-11-01
I am trying to figure out if there is an easy way to get incremental changes between today and yesterday.
I was thinking about doing a left outer join, but I'm not sure what that looks like since it's the same table.
This is what it might look like when there are different tables:
SELECT * FROM a LEFT OUTER JOIN b
ON (a.id=b.id AND a.dt='2013-11-01' and b.dt='2013-10-31' ) WHERE a.rank!=b.rank
But on the same table it is
SELECT * FROM a LEFT OUTER JOIN a
ON (a.id=a.id AND a.dt='2013-11-01' and a.dt='2013-10-31' ) WHERE a.rank!=a.rank
Suggestions?
This would work
SELECT a.*
FROM A a LEFT OUTER JOIN A b ON a.id = b.id
WHERE a.dt='2013-11-01' AND b.dt='2013-10-31' AND <your-rank-conditions>;
This should run as a single MapReduce job.
So I figured it out, using subqueries and joins:
select * from (select * from table where dt='2013-11-01') a
FULL OUTER JOIN
(select * from table where dt='2013-10-31') b
on (a.id=b.id)
where a.rank!=b.rank or a.rank is null or b.rank is null
The above will give you the diff.
You can take the diff and figure out what you need to ADD/UPDATE/DELETE:
UPDATE if a.rank is not null and b.rank is not null, i.e. the rank changed
DELETE if a.rank is null and b.rank is not null, i.e. the user is no longer ranked
ADD if a.rank is not null and b.rank is null, i.e. this is a new user
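The classification rules can also be written out in plain Python to check them against sample data. This is a sketch, not part of the Hive job: the function name classify_changes and the dictionaries are hypothetical, each partition is modelled as a dict from id to rank, and it assumes rank is never NULL inside a partition (so a missing id plays the role of the NULL side of the full outer join).

```python
def classify_changes(today, yesterday):
    """Return {id: action} for ids whose rank differs between two partitions.

    today / yesterday map id -> rank; an id missing from a dict corresponds
    to the NULL side of the full outer join.
    """
    actions = {}
    for row_id in today.keys() | yesterday.keys():
        new_rank = today.get(row_id)
        old_rank = yesterday.get(row_id)
        if old_rank is None:
            actions[row_id] = "ADD"      # present today only: new user
        elif new_rank is None:
            actions[row_id] = "DELETE"   # present yesterday only: no longer ranked
        elif new_rank != old_rank:
            actions[row_id] = "UPDATE"   # present on both days, rank changed
        # equal ranks on both days: unchanged, not part of the diff
    return actions


# Hypothetical contents of dt=2013-11-01 and dt=2013-10-31.
today_ranks = {1: 5, 2: 3, 4: 9}
yesterday_ranks = {1: 5, 2: 4, 3: 1}
changes = classify_changes(today_ranks, yesterday_ranks)
print(sorted(changes.items()))
```

Here id 1 is unchanged and does not appear in the result, id 2 is an UPDATE, id 3 a DELETE, and id 4 an ADD.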
