Pig: Outer join on more than 2 relations

I want to do an outer join involving 3 relations. I tried this:
features = JOIN group_event by group left outer, group_session by group, group_order by group;
I want all the rows of group_event to be present in the output, even if only one or neither of the other 2 relations has a match for them.
The command above does not work, and according to the documentation it is not supposed to (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#JOIN+%28outer%29):
Outer joins will only work for two-way joins; to perform a multi-way outer join, you will need to perform multiple two-way outer join statements.
Splitting it into two statements works:
features1 = JOIN group_event by group left outer, group_session by group;
features2 = JOIN features1 by group_event::group left outer, group_order by group;
Is there any way to do this in a single command? (It would be useful when joining an even larger number of relations.)

I think at some point we need to trust the documentation: don't try to do a multi-way outer join in a single command.
Why? How should the following line work?
JOIN a BY a1 LEFT OUTER, b BY b1, c BY c1
Does the LEFT OUTER apply to both tables, or just the first one? If both, should the LEFT OUTER between b and c remove all records not matched in b? Or in a? The more you look at it, the less sense it makes, doesn't it?
What you want to do is JOIN relation a with b into ab, and then JOIN ab with c. If you think about it, it is not natural to do this within a single command because of the intermediate state ab.
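The same two-step idea can be tried outside Pig. Below is a minimal sketch in Python using the standard sqlite3 module; the tables a, b, c and their contents are made up for illustration, and this is only an analogy for the join semantics, not Pig code.

```python
import sqlite3

# In-memory database with three tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a1 INTEGER);
    CREATE TABLE b (b1 INTEGER);
    CREATE TABLE c (c1 INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (1), (3);
""")

# Step 1: a LEFT OUTER JOIN b, materialized as the intermediate state "ab".
conn.execute(
    "CREATE TEMP TABLE ab AS "
    "SELECT a.a1 AS a1, b.b1 AS b1 FROM a LEFT JOIN b ON a.a1 = b.b1"
)

# Step 2: ab LEFT OUTER JOIN c, keyed on the column that came from a,
# so every row of a survives regardless of matches in b or c.
rows = conn.execute(
    "SELECT ab.a1, ab.b1, c.c1 FROM ab "
    "LEFT JOIN c ON ab.a1 = c.c1 ORDER BY ab.a1"
).fetchall()

for row in rows:
    print(row)
```

Joining the second step on the key from a (rather than on the possibly NULL key from b) mirrors the use of group_event::group in the two-statement Pig version.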

Related

Can I join a table to a coalesce field?

I’m trying to join three tables. Two of these, N1 and N2, share an ID (PK) but they differ in the number of rows, so I have written two coalesce statements to clean up the PK and a related field:
SELECT
(SELECT COALESCE(N1.ID_Year, N2.ID_Year)) AS ID_Year,
(SELECT COALESCE(N1.ReceiverUniqueID, N2.ReceiverUniqueID)) AS AllNetworkRUID,
[…]
FROM N1
FULL OUTER JOIN N2
ON N1.ID_Year_RU = N2.ID_Year_RU
What I need to do now is join the third table, N3, to the second COALESCE field, i.e. to ‘AllNetworkRUID’, something like:
JOIN N3
ON N3.ID = AllNetworkRUID
I’ve not been able to work out how to make that happen, or if it’s even possible.
Any suggestions?

How does bucketing help when joining more than two tables, if at all? (Hive Sort Merge Bucket Join)

We are aware of how map join and SMBM join reduce execution time (by eliminating the reduce phase, i.e. eliminating the shuffle).
Ex: for a join between two tables
select a.col1,b.col2 from
a join b on a.col1=b.col1
(both tables are bucketed on col1 into the same number of buckets)
But when joining 3 or more tables on different columns,
Ex:
select a.col1, b.col3, c.col2, d.date from
a join b on a.id=b.id join c on a.state=c.state join d on c.date=d.date
In a scenario like this, how will bucketing help if we don't want to split the query into multiple smaller queries?

Oracle Join precedence

When joining multiple tables using a mix of join types (inner, left, cross, full, etc.), is there a specific order in which the joins are evaluated, like BODMAS in mathematics, or are they always read from left to right in the order we specified?
I don't have the exact SQL but here is a sample SQL:
Select * from
A,
B
LEFT JOIN
C
ON B.ID = C.ID,
D
FULL JOIN
E
ON
D.ID = E.ID;
Will A be cross joined with B, and that result then left joined with C, and so on?
Or
will A be cross joined with the result of the left join of B and C?

oracle outer join query

I have 3 Oracle tables. A joins to B and B joins to C. I want all records from A irrespective of whether a corresponding record exists in B or C. I wrote a query like this:
select a.name from a,b,c where a.a_id = b.b_id(+) and b.b_id = c.c_id(+)
This query does not seem right to me, particularly with the second join. What will exactly happen if there is a record in A but correspondingly nothing in B and C? Will it still fetch the record?
The above query returns the same count of records as select a.name from a
So I am guessing that the query is right? Also, is there a better way to write it?
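I presume a better query would use ANSI joins. Note that both joins have to be left joins; an inner join to C would drop the rows of A that have no match in B or C:
Select a.name from A a left join B b on a.a_id=b.b_id left join C c on b.b_id=c.c_id
This should give the result you expect.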
http://rajanmaharjan.com.np
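Whether the second join is inner or left outer changes which rows of A survive. The sketch below, in Python with the standard sqlite3 module, compares the two forms; it uses SQLite rather than Oracle, and the table contents are made up for illustration.

```python
import sqlite3

# Hypothetical data: A has three rows, B matches two of them, C matches one.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE a (a_id INTEGER, name TEXT);
    CREATE TABLE b (b_id INTEGER);
    CREATE TABLE c (c_id INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (1);
""")

# LEFT JOIN followed by INNER JOIN: rows of A without a match in C are lost.
left_inner = [r[0] for r in db.execute(
    "SELECT a.name FROM a "
    "LEFT JOIN b ON a.a_id = b.b_id "
    "INNER JOIN c ON b.b_id = c.c_id "
    "ORDER BY a.name"
)]

# LEFT JOIN followed by LEFT JOIN: every row of A is kept.
left_left = [r[0] for r in db.execute(
    "SELECT a.name FROM a "
    "LEFT JOIN b ON a.a_id = b.b_id "
    "LEFT JOIN c ON b.b_id = c.c_id "
    "ORDER BY a.name"
)]

print("left + inner:", left_inner)
print("left + left: ", left_left)
```

With this data only the left + left form returns all three names from A, which is the behaviour the question asks for.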

successive Joins in Rhino.ETL

I have 3 tables I want to join into 1 table.
How can I do that with Rhino.ETL?
I know how to join 2 tables, but not 3...
Thanks
John
Use a nested join.
Let's say your tables are A, B and C.
First join A and B to get AB.
Then join AB to C.
