Qlikview - join two table - join

i need to join two table in Qlikview to get result.
Table:
I need to join this two table to get result table like this
Any idea? Can i use cross table and how?

For Table1 you can use CrossTable functionality to "rotate" the table but keeping the first column.
For example:
CrossTable(Location, Quantity)
Load
Reason,
LocA,
LocB
From
[Data.xlsx] (ooxml, embedded labels, table is Table1)
;
The result table after this will be:
Location Reason Quantity
LocA R1 5
LocA R2 4
LocA R3 5
LocA R4 3
LocB R1 2
LocB R2 2
LocB R3 3
LocB R4 5
(you can learn more about CrossTable at Qlik's help site - CrossTable)
After having Table1 in this format you can create composite key (as x3ja suggested). Composite key is basically two (or more) fields concatenated. In your case the join between the tables should be on two fields - Location and Reason.
// CrossTable the data to get it in correct format
Table1_Temp:
CrossTable(Location, Quantity)
Load
Reason,
LocA,
LocB
From
[Data.xlsx] (ooxml, embedded labels, table is Table1)
;
// Resident load to form the composite key
// based on Location and Reason fields
Table1:
Load
Location & '|' & Reason as Key,
Quantity
Resident
Table1_Temp
;
// We dont need Table1_Temp table anymore
Drop Table Table1_Temp;
//Load the second table and create the same composite key
Table2:
Load
Location & '|' & Reason as Key,
Location,
Reason,
Answer
From
[Data.xlsx] (ooxml, embedded labels, table is Table2)
;
After the reload your data model will look like:
And the data:
Notice that the values for Answer, Location, Reason are null in the bottom two rows. This is because the data in Table2 (based on your screenshots) don't contains combination for LocB and R2 and LocA and R4 but Table1 does.
If you want to keep only the combinations that are present in both tables then the approach is similar but with two differences:
Table2 should be loaded first
use keep function to exclude the non common records for being loaded in Table1
(keep at Qlik's help site - keep)
If you want to see the script in action just comment the first tab and uncomment the second one in the example qvw

There are a couple of ways you could do this.
Using association. Load Table 1 twice and concatenate, creating a composite key. So you'd end up with fields ReasonLocation and Quantity. Then load Table 2 creating the same composite key, giving you ReasonLocation, Location, Reason & Answer. Then the tables would associate on that composite key.
Using a join. Load Table1, left join in Table 1 based on Reason with an if statement like if [Location] = 'LocA' then [LocA] else [LocB]. That may need you to load it into a temp table first and do the if statement in a resident load.
You could also combine the two and join the tables in #1 based on the ReasonLocation field.
Hope that helps - sorry it's not fully worked through...

Related

Joining two tables based on matching two columns

I'm trying to join two tables:
Table A has three columns: State, County, and Count (of Farmer's Markets in said county)
Table B has several columns: State, County, and several data columns (like food access score)
I'm trying to combine them in such a way as to put the Count for each State/County combination (since there are multiple counties with the same name) together with the State and County and data columns from Table B.
I've been banging my head on SAS, trying to get a join to cooperate. I read a few other questions on here, but I can't find where the mistake is in my code.
PROC SQL;
CREATE TABLE WORK.QUERY1
AS
SELECT FMDV4.State, FMDV4.County, FMDV4.Count, CFSDV1.GROC14,
CFSDV1.SUPERC14, CFSDV1.CONVS14, CFSDV1.SPECS14, CFSDV1.FOODINSEC_13_15,
CFSDV1.PCT_LACCESS_POP15, CFSDV1.DIRSALES_FARMS12, CFSDV1.FMRKT16,
CFSDV1.FOODHUB16, CFSDV1.CSA12, CFSDV1.POVRATE15, CFSDV1.PERPOV10
FROM FNLPRJT.CFSDV1 AS CFSDV1
INNER JOIN FNLPRJT.FMDV4 AS FMDV4
ON (( CFSDV1.State = FMDV4.State ) AND ( CFSDV1.County =
FMDV4.County ));
QUIT;
I also tried a few variants, like:
PROC SQL;
CREATE TABLE WORK.QUERY1
AS
SELECT FMDV4.State, FMDV4.County, FMDV4.Count, CFSDV1.GROC14,
CFSDV1.SUPERC14, CFSDV1.CONVS14, CFSDV1.SPECS14, CFSDV1.FOODINSEC_13_15,
CFSDV1.PCT_LACCESS_POP15, CFSDV1.DIRSALES_FARMS12, CFSDV1.FMRKT16,
CFSDV1.FOODHUB16, CFSDV1.CSA12, CFSDV1.POVRATE15, CFSDV1.PERPOV10
FROM FNLPRJT.CFSDV1 AS CFSDV1
INNER JOIN FNLPRJT.FMDV4 AS FMDV4
ON CFSDV1.State = FMDV4.State
WHERE CFSDV1.County = FMDV4.County;
QUIT;
I get a table of 0 rows with the columns as they should be (State, County, Count, ). I'm just missing the dang data! Can anyone please help me find my mistake?
Can you try
propcase(CFSDV1.State) = propcase(FMDV4.State)
and
propcase(CFSDV1.County) = propcase(FMDV4.County);
If this doesn't work try character functions like trim and compress to remove any blanks that might be present in the data.

create new field based on multiple resident tables

Given multiple in-resident tables, I'd like to create a new field based on fields in different tables.
table1:
LOAD * INLINE [
id1,val1
a1,car1
a2,car1
];
table2:
LOAD * INLINE [
id2,id1,val2
b1,a1,type1
b2,a2,type2
];
table3:
LOAD * INLINE [
id3,id2,val3
c1,b1,mfr1
c2,b2,mfr2
];
For the sake of argument, assume table1 has ~1M rows, table2 ~1K rows, and table3 ~10 rows. I'd like to create a new field that is either added to table1 or perhaps in a new table linked by id1, resulting in:
id1 val1 newval
a1 car1 car1type1mfr1
a2 car2 car2type2mfr2
Efforts:
newtable:
load val1 & val2 & val3 as newval;
No errors but no newtable or newval.
newtable:
left join (table2)
load val1&val2 as newval resident table1;
Errs with Field not found - <val2>. (Obviously I want to extend this to include table3, but if I can't do it with 2 tables then 3 just won't work.
The real data includes seven tables for this new field (lots of foreign keys). The data is being loaded from QVDs (the data is shared across multiple QVWs), closely mimicking a SQL database; none of the tables are row-wise redundant, so combining db tables into a single QVD table may be inefficient. (Plus refreshing the data is incredibly easier one table at a time.) A colleague suggested I load-join each of the QVDs into one huge table, but that doesn't seem right (nor have I successfully chain-joined even a few tables).
Using QV 12.0 desktop on win10-x64 for deployment on QVS.
#TheBudac's was part of the way there, but it only merged two of the three. Most of the problems were stemming from incorrect multi-table joins. My confusion was in the "join" syntax in Qlik; the docs make sense to me now that I see what's happening, but it wasn't as obvious to me initially.
Here's what eventually worked best for me:
temptable:
load id1 as id1a, val1 as val1a
resident table1;
left join (temptable)
load id2 as id2a, id1 as id1a, val2 as val2a
resident table2;
left join (temptable)
load id2 as id2a, val3 as val3a
resident table3;
newtable:
load id1a as id1,
val1a & val2a & val3a as newval
resident temptable;
drop table temptable;
This produced these tables:
and this tree:
Quick walk-through:
Because I'm using left join, I start with the largest table; other joins would dictate different starting condition requirements. In my case, table1 was representing the largest, so I start with that:
temptable:
load id1 as id1a, val1 as val1a
resident table1;
Each join should be against the temporary table we're working on. Renaming variables is important so that Qlik doesn't create unnecessary synthetic keys.
left join (temptable)
load id2 as id2a, id1 as id1a, val2 as val2a
resident table2;
The use of resident is important in that it does not re-query (SQL) or re-load (QVD or other file).
Repeat with the third and further tables, always joining against temptable with the new table.
Now we use that temporary table to create our new table. You can choose to augment table1 with this data instead (certainly feasible), but for me since I'm generating several new calculated fields (not shown here), it made sense to keep them logically separated.
newtable:
load id1a as id1,
val1a & val2a & val3a as newval
resident temptable;
drop table temptable;
Note that I rename the relevant key back to its original value so that this table correctly links to table. Dropping the temporary table helps clean things up, but it does no harm to keep it around (and doing so helps in debugging/learning).
Your join is the wrong way round and QlikView can only work results after they have been joined,not in process, so you will have to do another resident load to get the values concatenated into Newval. The drop table commands are important or you will end up with massive unintentional syn tables
newtable:
left join (table1)
load * resident table2; drop table 2;
Resulttable:
load id1,
val1&val2 as NewVal
resident newtable; drop newtable;

Ambiguous column error creating table in Aster Studio 6.0

I am new to databases and am posting a problem from work. I am creating a table in Aster Studio 6.0, but got an error about an ambiguous column. I ran the same query in Teradata SQL Assistant and did not get an error.
I have six tables with millions of rows named EDW.SWIFTIQ_TRANS_DTL, EDW.SWIFTIQ_STORE, EDW.SWIFTIQ_PROD, EDW.STORE_XREF, EDW.TDLNX_STR_OUTLT, and EDW.SURV_CWC.
EDW represents the original database, but the columns were labeled with aliases.
I did a trim() on the VARCHAR columns for saving spool space. For the error about TDLNX_RTL_OUTLT_NBR, I performed an INNER JOIN on similar columns from two different tables. Doing a preview in SQL Assistant, there was a temporary table with only one column called TDLNX_RTL_OUTLT_NBR.
Here’s the SQL query:
CREATE TABLE public.table_name
DISTRIBUTE BY HASH (SRC_SYS_PROD_ID) AS (
SELECT * FROM load_from_teradata(
ON public.load_from_teradata_dummy
TDPID(‘database_name')
USERNAME(’user_name')
PASSWORD(’ss')
QUERY ('SELECT e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
FROM EDW.SWIFTIQ_TRANS_DTL d
INNER JOIN EDW.SWIFTIQ_STORE a
ON trim( a.SRC_SYS_STORE_ID) = trim(d.SRC_SYS_STORE_ID)
INNER JOIN EDW.SWIFTIQ_PROD p
ON trim(p.SRC_SYS_PROD_ID) = trim(d.SRC_SYS_PROD_ID)
and p.SRC_SYS_NM = d.SRC_SYS_NM
INNER JOIN EDW.STORE_XREF f
ON trim(f.SRC_SYS_STORE_ID) = trim(a.SRC_SYS_STORE_ID)
INNER JOIN EDW.TDLNX_STR_OUTLT e
ON trim(e.TDLNX_RTL_OUTLT_NBR)= trim(f.TDLNX_RTL_OUTLT_NBR)
INNER JOIN EDW.SURV_CWC g
ON g.SURV_CWC_WSLR_CUST_PARTY_ID = e.WSLR_CUST_PARTY_ID
WHERE TRANS_DT between ''2015-01-01'' and ''2015-03-31''')
num_instances('4') ) );
ERROR: column reference 'TDLNX_RTL_OUTLT_NBR' is ambiguous.
EDIT: Forgot to include a description about the table aliases. a stands for EDW.SWIFTIQ_STORE, p for EDW.SWIFTIQ_PROD, f for EDW.STORE_XREF, e for EDW.TDLNX_STR_OUTLT, g for EDW.SURV_CWC, and d for EDW.SWIFTIQ_TRANS_DTL.
You will get the same error when you try CREATE TABLE AS SELECT in Teradata. There are three column names, SRC_SYS_NM & SRC_SYS_PROD_ID & SRC_SYS_STORE_ID, which are used multiple times (with different table aliases) within the SELECT.
Add column aliases to make those names unique, e.g. trans_SRC_SYS_NM instead of d.SRC_SYS_NM.
Additionally the TRIMs in the joins are a very bad idea. You will probably not save that much spool, but force the optimizer to redistribute all spools for join-preparation.

select multiple columns from different tables and join in hive

I have a hive table A with 5 columns, the first column(A.key) is the key and I want to keep all 5 columns. I want to select 2 columns from B, say B.key1 and B.key2 and 2 columns from C, say C.key1 and C.key2. I want to join these columns with A.key = B.key1 and B.key2 = C.key1
What I want is a new external table D that has the following columns. B.key2 and C.key2 values should be given NULL if no matching happened.
A.key, A_col1, A_col2, A_col3, A_col4, B.key2, C.key2
What should be the correct hive query command? I got a max split error for my initial try.
Does this work?
create external table D as
select A.key, A.col1, A.col2, A.col3, A.col4, B.key2, C.key2
from A left outer join B on A.key = B.key1 left outer join C on A.key = C.key2;
If not, could you post more info about the "max split error" you mentioned? Copy+paste specific error message text is good.

Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only rows that have been changed in the source tables since the last extraction. Then I do the transformation only for this rows. The problem is when I have for example 2 tables which I want to join into one dimension, and only one of them has changed. For example I have table Countries and Towns as following:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now lets say a new row is added to Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not been changed, so CDC for these tables shows me only the row from Towns table. The problem is when I do the join between Countries and Towns, there is no row in Countries change set, so the join will result in empty set.
Do you have an idea how to solve it? Of course there might be more difficult cases, involving 3 and more tables, and consequential joins.
This is a typical problem found when doing Realtime Change-Data-Capture, or even Incremental-only daily changes.
There's multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
There is allot i could babble on for more information on but i will be specific to what is in your question. I would suggest the following to get the results...
1st Pass is where everything matches via the join...
Union All
2nd Pass Gets all towns where there isn't a country
(left outer join with a where condition that
requires the ID in the countries table to be null/missing).
You would default the Country ID value in that unmatched join to something designated as a "Unmatched Value" typically 0 or -1 is used or a series of standard -negative numbers that you could assign descriptions to later to identify why data is bad for your example -1 could be "Found Town Without Country".

Resources