SQL Join :: Fetching records outside join condition - join

I have 2 tables A and B
A
B
The requirement is to join both tables on the id column and, in addition, if a fetched name value also appears in another record with a different id, to fetch that record as well, as in the screenshot below.
Output:
Requirements:
Table B is terabytes in size, so a single join of both tables is preferable.
The query needs to be executed on Hive.

I'm not familiar with HiveQL, but with regular SQL, you'll need to join table B to itself a second time as part of the query.
select
    b_name.id, b_name.name
from
    #table_A a
    join #table_B b       -- this copy supplies the "name" value for the lookup
        on (a.id = b.id)
    join #table_B b_name  -- this copy is the one you pull your output from
        on (b.name = b_name.name)
This query first finds the "name" values in table B for rows whose id matches a row in table A, and then looks up every row in table B that has one of those names.

You can join the same table multiple times. So in the query below, b1 will give you all the names for the ids in A, and b2 is joined by name, to get you all the extra ids that are not in A.
select
b2.*
from
A
inner join B b1 on b1.id = A.id
inner join B b2 on b2.name = b1.name
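To make the double join concrete, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows are invented for illustration, and SQLite only stands in for Hive here, so this demonstrates the join logic and says nothing about how the query performs on a multi-terabyte table.

```python
import sqlite3

# Hypothetical sample data; only the table names A and B and the
# columns id and name come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, name TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'alice'), (2, 'bob'), (3, 'alice'), (4, 'carol');
""")

# b1 resolves the names for the ids present in A;
# b2 then pulls every row of B that carries one of those names.
rows = conn.execute("""
    SELECT b2.id, b2.name
    FROM A
    INNER JOIN B b1 ON b1.id = A.id
    INNER JOIN B b2 ON b2.name = b1.name
    ORDER BY b2.id
""").fetchall()

print(rows)
```

With this data the result holds ids 1 and 2 (matched through A) plus id 3, which shares the name 'alice' with id 1. If several ids in A share a name, the matching rows of B come back more than once, so SELECT DISTINCT may be needed.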

Related

Can I join a table to a coalesce field?

I’m trying to join three tables. Two of these, N1 and N2, share an ID (PK) but they differ in the number of rows, so I have written two coalesce statements to clean up the PK and a related field:
SELECT
(SELECT COALESCE(N1.ID_Year, N2.ID_Year)) AS ID_Year,
(SELECT COALESCE(N1.ReceiverUniqueID, N2.ReceiverUniqueID)) AS AllNetworkRUID,
[…]
FROM N1
FULL OUTER JOIN N2
ON N1.ID_Year_RU = N2.ID_Year_RU
What I need to do now is join the third table, N3, to the second COALESCE field, i.e. to 'AllNetworkRUID', something like:
JOIN N3
ON N3.ID = AllNetworkRUID
I've not been able to work out how to make that happen, or if it's even possible.
Any suggestions?
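One common approach, sketched here under assumptions, is to wrap the COALESCE query in a derived table or CTE so that the alias becomes a real column that a later join can reference (a column alias cannot be used in the ON clause of the same SELECT). The sketch below uses Python with an in-memory SQLite database. The rows, the N3.Label column, and the CTE name merged are hypothetical. Because older SQLite versions lack FULL OUTER JOIN, it is emulated with two LEFT JOINs and a UNION; on a database that supports it, the original FULL OUTER JOIN could sit inside the CTE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; N3.Label is an invented column for illustration.
    CREATE TABLE N1 (ID_Year_RU TEXT, ID_Year TEXT, ReceiverUniqueID INTEGER);
    CREATE TABLE N2 (ID_Year_RU TEXT, ID_Year TEXT, ReceiverUniqueID INTEGER);
    CREATE TABLE N3 (ID INTEGER, Label TEXT);
    INSERT INTO N1 VALUES ('k1', '2019', 10);
    INSERT INTO N2 VALUES ('k1', '2019', 10), ('k2', '2020', 20);
    INSERT INTO N3 VALUES (10, 'ten'), (20, 'twenty');
""")

query = """
    WITH merged AS (
        -- Two LEFT JOINs plus UNION stand in for FULL OUTER JOIN.
        -- UNION also removes duplicate rows, which is fine for this sample.
        SELECT COALESCE(N1.ID_Year, N2.ID_Year) AS ID_Year,
               COALESCE(N1.ReceiverUniqueID, N2.ReceiverUniqueID) AS AllNetworkRUID
        FROM N1 LEFT JOIN N2 ON N1.ID_Year_RU = N2.ID_Year_RU
        UNION
        SELECT COALESCE(N1.ID_Year, N2.ID_Year),
               COALESCE(N1.ReceiverUniqueID, N2.ReceiverUniqueID)
        FROM N2 LEFT JOIN N1 ON N1.ID_Year_RU = N2.ID_Year_RU
    )
    -- The alias is now an ordinary column of merged, so N3 can join to it.
    SELECT merged.ID_Year, merged.AllNetworkRUID, N3.Label
    FROM merged
    JOIN N3 ON N3.ID = merged.AllNetworkRUID
    ORDER BY merged.AllNetworkRUID
"""
coalesce_rows = conn.execute(query).fetchall()
print(coalesce_rows)
```

An alternative that avoids the wrapper is to repeat the expression in the join condition, i.e. ON N3.ID = COALESCE(N1.ReceiverUniqueID, N2.ReceiverUniqueID).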

How bucketing helps in case of more than two tables, if at all it does (Hive Sort Merge Bucket Join)

We are aware of how map join and SMB map join work, reducing execution time by eliminating the reduce phase (i.e. eliminating the shuffle).
Ex: For a join between two tables
select a.col1, b.col2 from
a join b on a.col1 = b.col1
(both tables are bucketed on col1 into the same number of buckets)
But what about joining 3 or more tables on different columns?
Ex:
select a.col1, b.col3, c.col2, d.date from
a join b on a.id = b.id join c on a.state = c.state join d on c.date = d.date
In a scenario like this, how will bucketing help if we don't want to split the query into multiple smaller queries?

Solr outer join / not join query

I may be asking too much but I want to do a left outer join between two cores
and get data from A only where B does not have related data.
Following is exactly my equivalent SQL query (for simplicity I have removed other conditions),
1. SELECT A.* FROM A AS A
WHERE A.ID NOT IN (SELECT B.A_ID FROM B AS B WHERE B.STATUS_ID != 1)
I understand that a Solr join is actually a subquery, and I need data only from A.
It would be very easy if the NOT were not there in the WHERE condition for the subquery.
For example,
2. SELECT A.* FROM A AS A
WHERE A.ID IN (SELECT B.A_ID FROM B AS B WHERE B.STATUS_ID != 1)
For that I can use q={!join from=aId to=id fromIndex=b}(-statusId:1).
How can I negate this, i.e. what is the Solr query for 1?

Using self join and left outer join on the same table?

Newbie with SQL development. I have a strange scenario where I want to join 3 tables: A, B and C. The use case is to return column X, which is the primary key in table C. Column X is also a FK in tables A and B.
Now I want to create a view by left joining all 3 tables. The view has 4 columns: A.id, B.id, C.A_X, C.B_X.
The users can use either A.id or B.id to get the data. That's the scenario.
How should I join these tables so that I don't miss any values of C.X for every A.id and B.id?
Sample results:
A.id   B.id   C.A_X   C.B_X
1      null   ABC     null
null   2      null    XYZ
Cheers!!

oracle outer join query

I have 3 Oracle tables. A joins to B and B joins to C. I want all records from A irrespective of whether a corresponding record exists in B or C. I wrote a query like this:
select a.name from a,b,c where a.a_id = b.b_id(+) and b.b_id = c.c_id(+)
This query does not seem right to me, particularly with the second join. What will exactly happen if there is a record in A but correspondingly nothing in B and C? Will it still fetch the record?
For some reason the above query returns the same count of records as select a.name from a
So I am guessing that the query is right? Also, is there a better way to rewrite the query?
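I presume a better query would be
select a.name from A a left join B b on a.a_id = b.b_id left join C c on b.b_id = c.c_id
Both joins need to be outer joins: an inner join to C would discard the rows of A that have no match in B or C. This should give the result you expect.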
http://rajanmaharjan.com.np
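The difference between chaining two LEFT JOINs and following a LEFT JOIN with an INNER JOIN can be checked on a small data set. The following is a minimal sketch using Python with an in-memory SQLite database rather than Oracle. The rows are hypothetical; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows chosen to cover all three cases.
    CREATE TABLE a (a_id INTEGER, name TEXT);
    CREATE TABLE b (b_id INTEGER);
    CREATE TABLE c (c_id INTEGER);
    INSERT INTO a VALUES (1, 'has_b_and_c'), (2, 'has_b_only'), (3, 'has_neither');
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (1);
""")

# Outer joins all the way down: every row of a survives.
left_left = [r[0] for r in conn.execute("""
    SELECT a.name FROM a
    LEFT JOIN b ON a.a_id = b.b_id
    LEFT JOIN c ON b.b_id = c.c_id
    ORDER BY a.a_id
""")]

# An inner join to c filters out rows whose b.b_id is NULL or has no match in c.
left_inner = [r[0] for r in conn.execute("""
    SELECT a.name FROM a
    LEFT JOIN b ON a.a_id = b.b_id
    INNER JOIN c ON b.b_id = c.c_id
    ORDER BY a.a_id
""")]

print(left_left)
print(left_inner)
```

Here the two LEFT JOINs return all three names, while the LEFT JOIN followed by an INNER JOIN returns only the row that has matches in both b and c.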
