Joining and filtering out unnecessary data

Joining and filtering out unnecessary data - join

I need to build a query with the following requirements.
The two tables to use are
MASTER_ARCHIVE and
REP_PROFILE
As of now we are only interested in reps at the wirehouses: Wells Fargo, Morgan Stanley, UBS, Merrill Lynch
To get reps from only these firms, I need to filter the Rep Profile table by Firm ID (The Firm IDs can be found in Firm table), and can filter the Master Archive table on FIRM_CRD
What we need is 2 sets of data:
1) A list of wirehouse reps that are in the Master Archive table, but not in the Rep Profile table
2) A list of wirehouse reps that are in the Rep Profile Table, but not in the Master Archive table
Does anyone have an idea of what type of Joins and filter conditions that I would use to get the data that I'm looking for?
This is what I currently came up with!!!!
SELECT *
FROM MASTER_ARCHIVE E
Left JOIN REP_PROFILE R
ON E.REP_CRD = R.CRD_NUMBER
WHERE E.FIRM_ID IN ('F206','F443','F474','F458')
MINUS
SELECT *
FROM MASTER_ARCHIVE E
JOIN REP_PROFILE R
ON E.REP_CRD = R.CRD_NUMBER
WHERE E.FIRM_ID IN ('F206','F443','F474','F458')
--ORDER BY NAME Name

i don't understand so much, but try with this
SELECT *
FROM MASTER_ARCHIVE E
LEFT JOIN REP_PROFILE R
ON E.REP_CRD = R.CRD_NUMBER
WHERE E.FIRM_ID IN ('F206','F443','F474','F458')
AND R.CRD_NUMBER IS NULL

Related

Unusual Joins SQL

I am having to convert code written by a former employee to work in a new database. In doing so I came across some joins I have never seen and do not fully understand how they work or if there is a need for them to be done in this fashion.
The joins look like this:
From Table A
Join(Table B
Join Table C
on B.Field1 = C.Field1)
On A.Field1 = B.Field1
Does this code function differently from something like this:
From Table A
Join Table B
On A.Field1 = B.Field1
Join Table C
On B.Field1 = C.Field1
If there is a difference please explain the purpose of the first set of code.
All of this is done in SQL Server 2012. Thanks in advance for any help you can provide.

I could create a temp table and then join that. But why use up the cycles\RAM on additional storage and indexes if I can just do it on the fly?
I ran across this scenario today in SSRS - a user wanted to see all the Individuals granted access through an AD group. The user was using a cursor and some temp tables to get the users out of AD and then joining the user to each SSRS object (Folders, reports, linked reports) associated with the AD group. I simplified the whole thing with Cross Apply and a sub query.
GroupMembers table
GroupName
UserID
UserName
AccountType
AccountTypeDesc
SSRSOjbects_Permissions table
Path
PathType
RoleName
RoleDesc
Name (AD group name)
The query needs to return each individual in an AD group associated with each report. Basically a Cartesian product of users to reports within a subset of data. The easiest way to do this looks like this:
select
G.GroupName, G.UserID, G.Name, G.AccountType, G.AccountTypeDesc,
[Path], PathType, RoleName, RoleDesc
from
GroupMembers G
cross apply
(select
[Path], PathType, RoleName, RoleDesc
from
SSRSOjbects_Permissions
where
Name = G.GroupName) S;
You could achieve this with a temp table and some outer joins, but why waste system resources?

I saw this kind of joins - it's MS Access style for handling multi-table joins. In MS Access you need to nest each subsequent join statement into its level brackets. So, for example this T-SQL join:
SELECT a.columna, b.columnb, c.columnc
FROM tablea AS a
LEFT JOIN tableb AS b ON a.id = b.id
LEFT JOIN tablec AS c ON a.id = c.id
you should convert to this:
SELECT a.columna, b.columnb, c.columnc
FROM ((tablea AS a) LEFT JOIN tableb AS b ON a.id = b.id) LEFT JOIN tablec AS c ON a.id = c.id
So, yes, I believe you are right in your assumption

Ambiguous column error creating table in Aster Studio 6.0

I am new to databases and am posting a problem from work. I am creating a table in Aster Studio 6.0, but got an error about an ambiguous column. I ran the same query in Teradata SQL Assistant and did not get an error.
I have six tables with millions of rows named EDW.SWIFTIQ_TRANS_DTL, EDW.SWIFTIQ_STORE, EDW.SWIFTIQ_PROD, EDW.STORE_XREF, EDW.TDLNX_STR_OUTLT, and EDW.SURV_CWC.
EDW represents the original database, but the columns were labeled with aliases.
I did a trim() on the VARCHAR columns for saving spool space. For the error about TDLNX_RTL_OUTLT_NBR, I performed an INNER JOIN on similar columns from two different tables. Doing a preview in SQL Assistant, there was a temporary table with only one column called TDLNX_RTL_OUTLT_NBR.
Here’s the SQL query:
CREATE TABLE public.table_name
DISTRIBUTE BY HASH (SRC_SYS_PROD_ID) AS (
SELECT * FROM load_from_teradata(
ON public.load_from_teradata_dummy
TDPID(‘database_name')
USERNAME(’user_name')
PASSWORD(’ss')
QUERY ('SELECT e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
FROM EDW.SWIFTIQ_TRANS_DTL d
INNER JOIN EDW.SWIFTIQ_STORE a
ON trim( a.SRC_SYS_STORE_ID) = trim(d.SRC_SYS_STORE_ID)
INNER JOIN EDW.SWIFTIQ_PROD p
ON trim(p.SRC_SYS_PROD_ID) = trim(d.SRC_SYS_PROD_ID)
and p.SRC_SYS_NM = d.SRC_SYS_NM
INNER JOIN EDW.STORE_XREF f
ON trim(f.SRC_SYS_STORE_ID) = trim(a.SRC_SYS_STORE_ID)
INNER JOIN EDW.TDLNX_STR_OUTLT e
ON trim(e.TDLNX_RTL_OUTLT_NBR)= trim(f.TDLNX_RTL_OUTLT_NBR)
INNER JOIN EDW.SURV_CWC g
ON g.SURV_CWC_WSLR_CUST_PARTY_ID = e.WSLR_CUST_PARTY_ID
WHERE TRANS_DT between ''2015-01-01'' and ''2015-03-31''')
num_instances('4') ) );
ERROR: column reference 'TDLNX_RTL_OUTLT_NBR' is ambiguous.
EDIT: Forgot to include a description about the table aliases. a stands for EDW.SWIFTIQ_STORE, p for EDW.SWIFTIQ_PROD, f for EDW.STORE_XREF, e for EDW.TDLNX_STR_OUTLT, g for EDW.SURV_CWC, and d for EDW.SWIFTIQ_TRANS_DTL.

You will get the same error when you try CREATE TABLE AS SELECT in Teradata. There are three column names, SRC_SYS_NM & SRC_SYS_PROD_ID & SRC_SYS_STORE_ID, which are used multiple times (with different table aliases) within the SELECT.
Add column aliases to make those names unique, e.g. trans_SRC_SYS_NM instead of d.SRC_SYS_NM.
Additionally the TRIMs in the joins are a very bad idea. You will probably not save that much spool, but force the optimizer to redistribute all spools for join-preparation.

select multiple columns from different tables and join in hive

I have a hive table A with 5 columns, the first column(A.key) is the key and I want to keep all 5 columns. I want to select 2 columns from B, say B.key1 and B.key2 and 2 columns from C, say C.key1 and C.key2. I want to join these columns with A.key = B.key1 and B.key2 = C.key1
What I want is a new external table D that has the following columns. B.key2 and C.key2 values should be given NULL if no matching happened.
A.key, A_col1, A_col2, A_col3, A_col4, B.key2, C.key2
What should be the correct hive query command? I got a max split error for my initial try.

Does this work?
create external table D as
select A.key, A.col1, A.col2, A.col3, A.col4, B.key2, C.key2
from A left outer join B on A.key = B.key1 left outer join C on A.key = C.key2;
If not, could you post more info about the "max split error" you mentioned? Copy+paste specific error message text is good.

DB2 joins difficulties

I have the following situation (simplified):
2 BiTemp Tables
basicdata (id, btmp_tsd, name, prename)
extendeddata (id, btmp_tsd, basicid, codename, codevalue)
In extendeddata, there can be multible entries for one basicdata with each a different codename and value.
I have to create an SQL to select all rows which have changed since a specified time. For the basicdata table this is relatively simple:
SELECT ID, BTMP_TSD, NAME, PRENAME
FROM BASICDATA BD
WHERE BTMP_TSD =
(SELECT MAX(BTMP_TSD)
FROM BASICDATA BD2
WHERE BD2.ID = BD.PRTNR_ID
AND BD2.BTMP_TSD > :MINTSD
AND BD2.BTMP_TSD <= :MAXTSD
)
ORDER BY ID
WITH UR
Now I will need to Join on the second table to get the codevalue for the codename 'test'. The problem is, it may not exist, in this case, the row should be collected anyway. But if there is a row but not within the timerange, I should not get a result.
I hope I was able to explain my issue. Joins are one of the things I still don't see trough...
Edit:
Okay here's a sample
basicdata:
id,btmp_tsd,name,prename
1,2013-05-25,test,user
2,2013-06-26,user,two
3,2013-06-26,peter,hans
1,2013-06-20,test,us3r
2,2013-10-30,us3r,two
extendeddata:
id,btmp_tsd,basicid,codename,codevalue
1,2013-05-25,1,superadmin,1
2,2013-06-26,3,admin,1
3,2013-11-25,1,superadmin,0
Okay now having these entries and I want all userid's which have had any changes since 2013-10-01 I should get
User1 (Because the extendeddata superadmin had a change)
User2 (Had a Name change and I want him even tough he has no entry on the extendeddata table)
not User3 (He has an entries on both tables but it's not in the specified range)

The following query should do what you want.
select *
from basicdata b left outer join extendeddata e on b.id=e.basicid
where b.btmp_tsd >= '2013-10-01'
or e.btmp_tsd >= '2013-10-01'
DISCLAIMER: I didn't test the sql. So syntax might not be 100% perfect.

Get incremental changes between Hive partitions

I have a nightly job that runs and computes some data in hive. It is partitioned by day.
Fields:
id bigint
rank bigint
Yesterday
output/dt=2013-10-31
Today
output/dt=2013-11-01
I am trying to figure out if there is a easy way to get incremental changes between today and yesterday
I was thinking about doing a left outer join but not sure what that looks like since its the same table
This is what it might looks like when there are different tables
SELECT * FROM a LEFT OUTER JOIN b
ON (a.id=b.id AND a.dt='2013-11-01' and b.dt='2-13-10-31' ) WHERE a.rank!=B.rank
But on the same table it is
SELECT * FROM a LEFT OUTER JOIN a
ON (a.id=a.id AND a.dt='2013-11-01' and a.dt='2-13-10-31' ) WHERE a.rank!=a.rank
Suggestions?

This would work
SELECT a.*
FROM A a LEFT OUTER JOIN A b ON a.id = b.id
WHERE a.dt='2013-11-01' AND b.dt='2013-10-31' AND <your-rank-conditions>;
Efficiently, this would span 1 MapReduce job only.

So I figured it out... Using Subqueries and Joins
select * from (select * from table where dt='2013-11-01') a
FULL OUTER JOIN
(select * from table where dt='2013-10-31') b
on (a.id=b.id)
where a.rank!=b.rank or a.rank is null or b.rank is null
The above will give you the diff..
You can take the diff and figure out what you need to ADD/UPDATE/REMOVE
UPDATE If a.rank!=null and b.rank!=null i.e rank changed
DELETE IF a.rank=null and b.rank!=null i.e the user is no longer ranked
ADD if a.rank!=null and b.rank=null i.e this is a new user

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Joining and filtering out unnecessary data - join

i don't understand so much, but try with this SELECT * FROM MASTER_ARCHIVE E LEFT JOIN REP_PROFILE R ON E.REP_CRD = R.CRD_NUMBER WHERE E.FIRM_ID IN ('F206','F443','F474','F458') AND R.CRD_NUMBER IS NULL

Related

Unusual Joins SQL

Ambiguous column error creating table in Aster Studio 6.0

select multiple columns from different tables and join in hive

DB2 joins difficulties

Get incremental changes between Hive partitions

Categories

Resources