successive Joins in Rhino.ETL - rhino-etl

I have 3 tables that I want to join into 1 table.
How can I do that with Rhino.ETL?
I know how to join 2 tables, but not 3.
Thanks
John

Use a nested join.
Let's say your tables are A, B and C.
First join A and B to get AB.
Then join AB to C.
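
The two steps above can be sketched in plain Python to show the shape of a nested join. This is only an illustration of the idea, not Rhino.ETL code: the `inner_join` helper, the sample rows and the column names (`a_id`, `b_id`, `city`) are all made up for the example.

```python
def inner_join(left, right, left_key, right_key):
    """Hash-join two lists of dict rows; merged rows carry columns from both sides."""
    # Index the right-hand rows by their join key.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        # Rows without a match on the right are dropped (inner join).
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical sample tables A, B and C.
a = [{"a_id": 1, "name": "alice"}, {"a_id": 2, "name": "bob"}]
b = [{"a_id": 1, "b_id": 10}, {"a_id": 2, "b_id": 20}]
c = [{"b_id": 10, "city": "Oslo"}, {"b_id": 20, "city": "Lima"}]

ab = inner_join(a, b, "a_id", "a_id")    # first join: A with B gives AB
abc = inner_join(ab, c, "b_id", "b_id")  # second join: AB with C
```

The output of the first join is simply used as the left input of the second, which is the same pattern a pair of chained join operations would follow.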

Related

How does bucketing help when joining more than two tables, if at all? (Hive Sort Merge Bucket Join)

We are aware of how map joins and SMBM joins work: they reduce execution time by eliminating the reduce phase (i.e. eliminating the shuffle).
Example: a join between two tables
select a.col1,b.col2 from
a join b on a.col1=b.col1
(both tables are bucketed on col1 into the same number of buckets)
But what about joining 3 or more tables on different columns?
Example:
Select a.col1,b.col3,c.col2,d.date from
a join b on a.id=b.id join c on a.state=b.state join d on c.date=d.date
In a scenario like this, how will bucketing help if we don't want to split the query into multiple smaller queries?

Solr outer join / not join query

I may be asking too much, but I want to do a left outer join between two cores
and get data from A only where B does not have related data.
The following is my equivalent SQL query (other conditions removed for simplicity):
1. SELECT A.* FROM A AS A
WHERE A.ID NOT IN (SELECT B.A_ID FROM B AS B WHERE B.STATUS_ID != 1)
I understand that a Solr join is actually a subquery; I need data from A only.
It would be very easy if the NOT were not there in the WHERE condition for the subquery.
For example,
2. SELECT A.* FROM A AS A
WHERE A.ID IN (SELECT B.A_ID FROM B AS B WHERE B.STATUS_ID != 1)
For query 2, I can use q={!join from=aId to=id fromIndex=b}(-statusId:1).
How can I negate this, i.e. what is the Solr query for query 1?

Using self join and left outer join on the same table?

Newbie with SQL development. I have a strange scenario where I want to join 3 tables: A, B and C. The use case is to return column X, which is the primary key in table C. Column X is also a FK in tables A and B.
Now I want to create a view by left joining all 3 tables. The view has 4 columns: A.id, B.id, C.A_X, C.B_X.
Users can use either A.id or B.id to get the data. That's the scenario.
How should I join these tables so that I don't miss any values of C.X for every A.id and B.id?
Sample results:
A.id   B.id   C.A_X   C.B_X
1      null   ABC     null
null   2      null    XYZ
Cheers!!

Pig: Outer join on more than 2 relations

I want to do an outer join involving 3 tables. I tried with this:
features = JOIN group_event by group left outer, group_session by group, group_order by group;
I want all the rows of group_event to be present in the output even if only one or neither of the other 2 relations has a match for them.
The command above does not work, which is expected according to the documentation (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#JOIN+%28outer%29):
Outer joins will only work for two-way joins; to perform a multi-way outer join, you will need to perform multiple two-way outer join statements.
Splitting the join works and can be done like this:
features1 = JOIN group_event by group left outer, group_session by group;
features2 = JOIN features1 by group_event::group left outer, group_order by group;
Any ideas for doing this in a single command? (It would be useful if I were joining even more tables.)
I think at some point we need to trust the documentation: don't try a multiple outer join in a single command.
Why? How should the following line work?
JOIN a BY a1 LEFT OUTER, b BY b1, c BY c1
Is the LEFT OUTER working for both tables, or just the first one? If the former, then should the LEFT OUTER between b and c remove all records not matched in b? Or in a? The more you look for it, the less sense it makes, doesn't it?
What you want to do is JOIN relation a with b into ab, and then ab with c. If you think about it, it is not natural to do this within a single command because of the intermediate state ab.
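
To make the intermediate state concrete, here is a small Python sketch of two chained two-way left outer joins. It is not Pig code; the `left_outer_join` helper, the sample rows and the `events`, `sessions` and `orders` columns are assumptions made for illustration only.

```python
def left_outer_join(left, right, left_key, right_key, right_columns):
    """Keep every left row; fill right_columns with None when nothing matches."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        matches = index.get(row[left_key])
        if matches:
            for match in matches:
                joined.append({**row, **{col: match[col] for col in right_columns}})
        else:
            # No match on the right: the left row survives with null columns.
            joined.append({**row, **{col: None for col in right_columns}})
    return joined

# Hypothetical sample relations.
group_event = [
    {"group": "g1", "events": 5},
    {"group": "g2", "events": 3},
    {"group": "g3", "events": 1},
]
group_session = [{"group": "g1", "sessions": 7}]
group_order = [{"group": "g2", "orders": 2}]

# First two-way outer join, then a second one on the intermediate result.
features1 = left_outer_join(group_event, group_session, "group", "group", ["sessions"])
features2 = left_outer_join(features1, group_order, "group", "group", ["orders"])
```

Every row of `group_event` is still present in `features2`, because each step is an ordinary two-way left outer join whose left side is the previous result.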

oracle outer join query

I have 3 Oracle tables. A joins to B and B joins to C. I want all records from A irrespective of whether a corresponding record exists in B or C. I wrote a query like this:
select a.name from a,b,c where a.a_id = b.b_id(+) and b.b_id = c.c_id(+)
This query does not seem right to me, particularly with the second join. What will exactly happen if there is a record in A but correspondingly nothing in B and C? Will it still fetch the record?
For some reason the above query returns the same count of records as select a.name from a
So I am guessing that the query is right? Also is there a better way to rewrite the query?
I presume a better query would be
Select a.name from A a left join B b on a.a_id=b.b_id left join C c on b.b_id=c.c_id
Both joins need to be left joins; an inner join to C would drop the rows of A that have no match in B or C.
This should give the result you expect.
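
As a rough check of the requirement that every record of A is returned, the following Python sketch runs two chained LEFT JOINs against an in-memory SQLite database. SQLite stands in for Oracle here only because it is easy to run, and the sample rows are invented; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER, name TEXT);
    CREATE TABLE b (b_id INTEGER);
    CREATE TABLE c (c_id INTEGER);
    -- Hypothetical data: row 2 has no match in C, row 3 has no match in B or C.
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (1);
""")

# Two chained left joins: every row of A is kept at each step.
rows = conn.execute("""
    SELECT a.name, b.b_id, c.c_id
    FROM a
    LEFT JOIN b ON a.a_id = b.b_id
    LEFT JOIN c ON b.b_id = c.c_id
    ORDER BY a.a_id
""").fetchall()

total_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
```

With this data `rows` has one entry per row of A, with NULLs (Python `None`) in the B and C columns where no match exists, which matches the observation that the row count equals that of `select a.name from a`.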
