I am joining tbl_A to tbl_B, on column CustomerID in tbl_A to column Output in tbl_B which contains customer ID. However, tbl_B has all other information in related rows that I do not want to lose when joining. I tried to join using like, but I lost rows that did not contain customer ID in the output column.
Here is my join query in Hive:
select a.*, b.Output from tbl_A a
left join tbl_B b
On b.Output like concat('%', a.CustomerID, '%')
However, I lose other rows from output.
You could also achieve the objective by a simple hive query like this :)
select a.*, b.Output
from tbl_A a, tbl_B b
where b.Output like concat('%', a.CustomerID, '%')
I would suggest first extract all ID's from free floating field which in your case is 'Output' column in table B into a separate table. Then join this table with ID's to Table B again to populate in each row the ID and then this second joined table which is table B with ID's to table A.
Hope this helps.
Related
I'm trying to join to tables using PROQ SQL. One of the columns I'm using for the join has a space in the column name. The query I'm using:
PROC SQL;
CREATE TABLE TEST AS
SELECT a.*, b.*
FROM TABLE_1 a
INNER JOIN TABLE_2 b
ON a.CONTNO = b."Contract Number";
RUN;
This is the error I'm getting:
ERROR 22-322: Syntax error, expecting one of the following: a name, *.
How do I fix this?
You just need to add square brackets around the Column name. For example:
b.[Contract Number]
Tips: Using alias (a, b) can be costly. When you only have one table to join, consider typing out the table rather than doing an alias.
I am trying to join two datasets based on a flag and id.
i.e
proc sql;
create table demo as
select a.*,b.b1,b.2
from table1 a
left join table2 on
(a.flag=b.flag and a.id=b.id) or (a.flag ne b.flag and a.id=b.id)
end;
This code runs into a loop and never produces a output.
I want to make sure that where there are flag values matching get the attributes; if not get the attributes at id level so that we do not have blank values.
This join condition cannot be optimized. It is not a good practice to use or in a join. If you check your log, you'll see this:
NOTE: The execution of this query involves performing one or more Cartesian product joins
that can not be optimized.
Instead, transform your query to do a union:
proc sql;
create table demo as
select a.*,
b.b1,
b.b2
from table1 as a
left join
table2 as b
on a.flag=b.flag and a.id=b.id
UNION
select a.*,
b.b1,
b.b2
from table1 as a
left join
table2 as b
on a.flag ne b.flag and a.id=b.id
;
quit;
I am writing a hive query to join two tables; table1 and table2. In the result I just need all columns from table1 and no columns from table2.
I know the solution where I can select all the columns manually by specifying table1.column1, table1.column2.. and so on in the select statement. But I have about 22 columns in table 1. Also, I have to do the same for multiple other tables ans its painful process.
I tried using "SELECT table1.*", but I get a parse exception.
Is there a better way to do it?
Hive 0.13 onwards the following query syntax works:
SELECT a.* FROM a JOIN b ON (a.id = b.id)
This query will select all columns from a. So instead of typing all the column names (making the query cumbersome), it is a better idea to use tablealias.*
I was trying to get only selected rows from table A(not all rows) and rows matching table A from table B, but it shows only matching rows from table A and table B, excluding rest of the selected rows from table A.
I used this condition,
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and B.party_code=2
i have 2 tables:
table 1: A with 3 columns
CategoryName,CategoryCode(PK),ItemType
table 2: B with 2 columns
CategoryCode(FK),Discount,PartyCode(FK)(from another table)
NOTE: working in access 2007
For non-matching rows from table B, party_code = NULL, so your where clause will evaluate to false and therefore the row won't be returned. So, you need to filter the "B" records before joining. Try
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode and B.party_code=2
WHERE A.itemtype='F'
[EDIT] That doesn't work in Access. next try.
You can create a query to do your filter. Let's call it "B_filtered". This is just
SELECT * FROM B where party_code = 2
(You could make the "2" a parameter to make it more flexible).
Then, just use this query in your actual query.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
[EDIT]
Just Googled - I think you can do this directly with a subquery.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN (SELECT * FROM B where party_code = 2) AS B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
What mlinth proposed is correct, and would work for most other SQL languages. The query below is the same basic concept but using a null condition.
Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and (B.party_code=2 OR B.party_code IS NULL)
If party_code is nullable, switch to using the PK or another non-nullable field.
I have a hive table A with 5 columns, the first column(A.key) is the key and I want to keep all 5 columns. I want to select 2 columns from B, say B.key1 and B.key2 and 2 columns from C, say C.key1 and C.key2. I want to join these columns with A.key = B.key1 and B.key2 = C.key1
What I want is a new external table D that has the following columns. B.key2 and C.key2 values should be given NULL if no matching happened.
A.key, A_col1, A_col2, A_col3, A_col4, B.key2, C.key2
What should be the correct hive query command? I got a max split error for my initial try.
Does this work?
create external table D as
select A.key, A.col1, A.col2, A.col3, A.col4, B.key2, C.key2
from A left outer join B on A.key = B.key1 left outer join C on A.key = C.key2;
If not, could you post more info about the "max split error" you mentioned? Copy+paste specific error message text is good.