Is it possible to join a column from Table B to Table A without creating a new table C in Hive SQL?

I'm working with very large tables in Hive so I'd like to avoid having to create a whole new table when joining a single column from Table 2 to Table 1.
My first pass using the INSERT and UPDATE statements with the following test data didn't work.
Is there a way to do this or is it simpler to just create Table 3 by joining Table 1 to Table 2 and then dropping Table 1?
DROP TABLE IF EXISTS table_1;
CREATE TABLE table_1 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64));
INSERT INTO TABLE table_1
VALUES ('A0001', 1000, 'A1'), ('A0001', 2000, 'B1'), ('A0001', 3000, 'B1'),
('B0001', 5000, 'A1'), ('B0001', 10000, 'B1'), ('B0001', 15000, 'C1'),
('C0001', 11000, 'B1'), ('C0001', 14000, 'C1'), ('C0001', 20000, 'C1');
DROP TABLE IF EXISTS table_2;
CREATE TABLE table_2 (id VARCHAR(64), prodt_cd VARCHAR(64));
INSERT INTO TABLE table_2
VALUES ('A0001', 'OAP'), ('B0001', 'OAPIN'), ('C0001', 'MOAPIN');
INSERT INTO TABLE table_1 prodt_cd VARCHAR(64);
UPDATE table_1 t1 SET t1.prodt_cd = t2.prodt_cd
INNER JOIN table_2 t2
ON t1.id = t2.id;

After some research and help from Mike67 I found a solution.
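It appears Hive does not support UPDATE with a JOIN, and plain UPDATE or MERGE statements only work on transactional (ACID) tables. A simple alternative is to build a new table directly from the join. Note that CREATE TABLE table_3 LIKE table_1 copies only the three columns of table_1, so inserting the four-column join result into it would normally fail; CREATE TABLE ... AS SELECT avoids the mismatch:
DROP TABLE IF EXISTS table_3;
CREATE TABLE table_3 AS
SELECT a.*, b.prodt_cd
FROM table_1 AS a
LEFT OUTER JOIN table_2 AS b
ON a.id = b.id;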
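
The idea of materialising the join result as a new table can be tried locally on the test data from the question. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption, since SQLite's types and dialect differ from HiveQL), with TEXT columns in place of VARCHAR(64).

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption: dialects differ).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id TEXT, cost INT, diag_cd TEXT);
INSERT INTO table_1 VALUES
  ('A0001', 1000, 'A1'), ('A0001', 2000, 'B1'), ('A0001', 3000, 'B1'),
  ('B0001', 5000, 'A1'), ('B0001', 10000, 'B1'), ('B0001', 15000, 'C1'),
  ('C0001', 11000, 'B1'), ('C0001', 14000, 'C1'), ('C0001', 20000, 'C1');

CREATE TABLE table_2 (id TEXT, prodt_cd TEXT);
INSERT INTO table_2 VALUES
  ('A0001', 'OAP'), ('B0001', 'OAPIN'), ('C0001', 'MOAPIN');

-- Build the widened table directly from the join result.
CREATE TABLE table_3 AS
SELECT a.*, b.prodt_cd
FROM table_1 AS a
LEFT OUTER JOIN table_2 AS b ON a.id = b.id;
""")

cur = conn.execute("SELECT * FROM table_3 ORDER BY id, cost")
table_3_columns = [d[0] for d in cur.description]
table_3_rows = cur.fetchall()

print(table_3_columns)
for row in table_3_rows:
    print(row)
```

Every row of table_1 is kept, and each one carries the prodt_cd looked up from table_2.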

Related

Join tables in Hive using LIKE

I am joining tbl_A to tbl_B, matching column CustomerID in tbl_A against column Output in tbl_B, which contains the customer ID somewhere in its text. However, tbl_B also has related rows with other information that I do not want to lose when joining. I tried to join using LIKE, but I lost the rows whose Output column did not contain a customer ID.
Here is my join query in Hive:
select a.*, b.Output from tbl_A a
left join tbl_B b
On b.Output like concat('%', a.CustomerID, '%')
However, I lose other rows from output.
You could also achieve the objective with a simple Hive query like this (note that it behaves as an inner join, so it returns only the rows where a match is found):
select a.*, b.Output
from tbl_A a, tbl_B b
where b.Output like concat('%', a.CustomerID, '%')
I would suggest first extracting all IDs from the free-text field (in your case the Output column of table B) into a separate table. Then join that ID table back to table B so that each row carries its ID, and finally join the result (table B with IDs) to table A.
Hope this helps.
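
If the goal is to keep every row of tbl_B, one option is to make tbl_B the preserved (left) side of the outer join. The sketch below illustrates that with Python's sqlite3 module as a stand-in for Hive; the sample rows are invented for illustration, and SQLite's || operator replaces concat(). Some Hive versions accept only equality conditions in ON, so whether a LIKE condition works there is an assumption to verify on your version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_A (CustomerID TEXT);
INSERT INTO tbl_A VALUES ('C100'), ('C200');

-- Hypothetical sample rows; one of them mentions no customer ID.
CREATE TABLE tbl_B (Output TEXT);
INSERT INTO tbl_B VALUES
  ('order for C100 shipped'),
  ('no customer here'),
  ('refund C200');
""")

# tbl_B is on the left, so its unmatched rows survive with a NULL CustomerID.
rows_keep_b = conn.execute("""
SELECT b.Output, a.CustomerID
FROM tbl_B AS b
LEFT JOIN tbl_A AS a
  ON b.Output LIKE '%' || a.CustomerID || '%'
ORDER BY b.Output
""").fetchall()

for row in rows_keep_b:
    print(row)
```

The row without a customer ID is still returned, with None in place of CustomerID.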

Hive join query to list columns from only one table

I am writing a hive query to join two tables; table1 and table2. In the result I just need all columns from table1 and no columns from table2.
I know the solution where I select all the columns manually by specifying table1.column1, table1.column2, and so on in the select statement. But I have about 22 columns in table1. Also, I have to do the same for multiple other tables, and it's a painful process.
I tried using "SELECT table1.*", but I get a parse exception.
Is there a better way to do it?
From Hive 0.13 onwards, the following query syntax works:
SELECT a.* FROM a JOIN b ON (a.id = b.id)
This query will select all columns from a. So instead of typing all the column names (making the query cumbersome), it is a better idea to use tablealias.*
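
As a quick illustration of what tablealias.* returns, here is a sketch using Python's sqlite3 module as a stand-in for Hive. The column names (name, extra) and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns; only id comes from the query above.
CREATE TABLE a (id INT, name TEXT);
CREATE TABLE b (id INT, extra TEXT);
INSERT INTO a VALUES (1, 'x'), (2, 'y');
INSERT INTO b VALUES (1, 'p'), (3, 'q');
""")

cur = conn.execute("SELECT a.* FROM a JOIN b ON (a.id = b.id)")
alias_star_columns = [d[0] for d in cur.description]
alias_star_rows = cur.fetchall()

print(alias_star_columns)
print(alias_star_rows)
```

Only the columns of a appear in the result, and b is used purely to decide which rows qualify.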

SQL Join based on three keys

Database is Teradata
I have two tables which I am trying to join; their structures are below. When I join them I expect two rows of output, but I get four. What is the reason for this behavior? A join on three keys should uniquely identify a row, yet I still get four rows. Any help is appreciated.
TableA
Weekkey|segment|type|users
201501|1|A|100
201501|1|B|100
TableB
Weekkey|segment|type|revenue
201501|1|A|200
201501|1|B|200
When I join these two tables using the following query, I get the following result:
select a.* ,b.user
from tablea a left join tableb b on a.weekkey=b.weekkey
and a.segment=b.segment
and a.type=b.type
Weekkey|segment|type|revenue|users
201501|1|A|200|100
201501|1|B|200|100
201501|1|A|200|100
201501|1|B|200|100
Using SQL Server, here is the DDL and sample data along with the query you posted. The output you say you are getting doesn't happen here.
create table #tablea
(
Weekkey int
, segment int
, type char(1)
, users int
)
insert #tablea
select 201501, 1, 'A', 100 union all
select 201501, 1, 'B', 100
create table #TableB
(
Weekkey int
, segment int
, type char(1)
, revenue int
)
insert #TableB
select 201501, 1, 'A', 200 union all
select 201501, 1, 'B', 200
select a.*
, b.revenue
from #tablea a
left join #tableb b on a.weekkey = b.weekkey
and a.segment = b.segment
and a.type = b.type
drop table #tablea
drop table #TableB
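
Since the posted sample data does not reproduce the four rows, one possible explanation (an assumption, not something confirmed in the question) is that the real tables contain more than one row per weekkey/segment/type combination. The sketch below uses Python's sqlite3 module rather than Teradata or SQL Server to show how duplicated keys in tableb double the join output, and how a GROUP BY ... HAVING query can reveal them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (weekkey INT, segment INT, type TEXT, users INT);
CREATE TABLE tableb (weekkey INT, segment INT, type TEXT, revenue INT);
INSERT INTO tablea VALUES (201501, 1, 'A', 100), (201501, 1, 'B', 100);
INSERT INTO tableb VALUES (201501, 1, 'A', 200), (201501, 1, 'B', 200);
""")

JOIN_SQL = """
SELECT a.*, b.revenue
FROM tablea a
LEFT JOIN tableb b
  ON a.weekkey = b.weekkey AND a.segment = b.segment AND a.type = b.type
"""

# Lists every key combination that occurs more than once in tableb.
DUP_SQL = """
SELECT weekkey, segment, type, COUNT(*)
FROM tableb
GROUP BY weekkey, segment, type
HAVING COUNT(*) > 1
ORDER BY type
"""

rows_unique = conn.execute(JOIN_SQL).fetchall()
dups_before = conn.execute(DUP_SQL).fetchall()

# Hypothetical scenario: the same rows were loaded into tableb a second time.
conn.execute(
    "INSERT INTO tableb VALUES (201501, 1, 'A', 200), (201501, 1, 'B', 200)"
)
rows_dup = conn.execute(JOIN_SQL).fetchall()
dups_after = conn.execute(DUP_SQL).fetchall()

print(len(rows_unique), dups_before)
print(len(rows_dup), dups_after)
```

Running the duplicate check against both real tables would confirm or rule out this cause.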

Left join with where clause not working

I was trying to get only selected rows from table A (not all rows), together with the matching rows from table B. Instead, the query returns only the rows that match in both tables and drops the rest of the selected rows from table A.
I used this query:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and B.party_code=2
I have 2 tables:
Table 1: A with 3 columns
CategoryName, CategoryCode (PK), ItemType
Table 2: B with 3 columns
CategoryCode (FK), Discount, PartyCode (FK, from another table)
NOTE: working in Access 2007
For non-matching rows from table B, party_code is NULL, so B.party_code=2 does not evaluate to true and the WHERE clause discards the row. So, you need to filter the "B" records before joining. Try
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode and B.party_code=2
WHERE A.itemtype='F'
[EDIT] That doesn't work in Access. Next try.
You can create a query to do your filter. Let's call it "B_filtered". This is just
SELECT * FROM B where party_code = 2
(You could make the "2" a parameter to make it more flexible).
Then, just use this query in your actual query.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
[EDIT]
Just Googled - I think you can do this directly with a subquery.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN (SELECT * FROM B where party_code = 2) AS B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
What mlinth proposed is correct, and would work in most other SQL dialects. The query below is the same basic concept but using a null condition.
Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and (B.party_code=2 OR B.party_code IS NULL)
If party_code is nullable, test for NULL on B's key or another non-nullable field instead (for example, B.CategoryCode IS NULL).
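
The variants discussed in this thread can be compared side by side. The sketch below uses Python's sqlite3 module rather than Access (so it says nothing about which forms Access accepts), and the sample rows are invented. It also suggests that the OR ... IS NULL form is not fully equivalent to filtering B before the join: a category whose only B rows belong to a different party is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (CategoryName TEXT, CategoryCode TEXT PRIMARY KEY, itemtype TEXT);
CREATE TABLE B (CategoryCode TEXT, discount INT, party_code INT);
-- Hypothetical rows: C2 has a discount only for party 3, C3 has none at all.
INSERT INTO A VALUES ('Food', 'C1', 'F'), ('Fruit', 'C2', 'F'),
                     ('Fish', 'C3', 'F'), ('Tools', 'C4', 'T');
INSERT INTO B VALUES ('C1', 10, 2), ('C2', 5, 3);
""")

def run(sql):
    return conn.execute(sql + " ORDER BY A.CategoryName").fetchall()

base = ("SELECT A.CategoryName, {b}.discount FROM A "
        "LEFT JOIN {src} ON A.CategoryCode = {b}.CategoryCode")

# 1. Filter on B in WHERE: the outer join collapses to matching rows only.
where_rows = run(base.format(b="B", src="B")
                 + " WHERE A.itemtype='F' AND B.party_code=2")

# 2. Filter on B inside ON: all selected A rows are kept.
on_rows = run(base.format(b="B", src="B")
              + " AND B.party_code=2 WHERE A.itemtype='F'")

# 3. Filter B in a subquery before joining: same result as variant 2.
sub_rows = run(base.format(
    b="B_filtered",
    src="(SELECT * FROM B WHERE party_code = 2) AS B_filtered")
    + " WHERE A.itemtype='F'")

# 4. OR ... IS NULL in WHERE: 'Fruit' is lost because its only B row has party_code 3.
null_rows = run(base.format(b="B", src="B")
                + " WHERE A.itemtype='F' AND (B.party_code=2 OR B.party_code IS NULL)")

for name, rows in [("where", where_rows), ("on", on_rows),
                   ("subquery", sub_rows), ("or-null", null_rows)]:
    print(name, rows)
```

If every selected row of A must appear, filtering B before or during the join (variants 2 and 3) looks like the safer choice.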

Linq query for joining three tables

I have three tables table1 (main table), table2, table3.
table1 contains table1Id
table2 and table3 contain table2Id, table2RoleId, table3Id, table3RoleId.
Also, for the same value of table1Id there is more than one record in table2 and table3, but the table2RoleId and table3RoleId values are different.
I want to join table1 with table2 and table3 so that Table2RoleId and Table3RoleId are displayed according to the Table1Id.
How can I achieve this?
Thanks
I'll set aside the details of your question and show a sample join in LINQ (note that this form is an inner join, not a left join):
var result = from x in table1
             join y in table2 on x.tableId1 equals y.tableId1
             join z in table3 on x.tableId1 equals z.tableId1
             select new { /* your return fields */ };

Resources