Evening All,
I have been chipping away at this one for a while, and for some reason I just can't get my logic to return what I expect.
I have 3 data tables as well as 3 business-concept linking tables:
Table1
Table2
Table3
Rules:
Table 1 can be linked to Table 2
Table 1 can be directly linked to Table 3
Table 1 can be indirectly linked to Table 3 via Table 2
I have tried a fair few variations; however, the joins seem to drop records:
SELECT
*
FROM
Table1 T1
INNER JOIN Table1_to_Table2_Link L1 on L1.T1_ID = T1.ID
RIGHT JOIN TABLE2 T2 ON L1.T2_ID = T2.ID
INNER JOIN Table2_to_Table3_Link L2 ON L2.T2_ID = T2.ID
Right JOIN Table3 T3 ON L2.T3_ID = T3.ID
INNER JOIN Table1_to_Table3_Link L3 on T1.ID = L3.T1_ID
It's a bit awkward to explain, but in summary:
I require all the data from Table 1,
and only the data in Tables 2 and 3 that is directly or indirectly related to Table 1. Tables 2 and 3 don't necessarily have to have a related business concept.
The expected return is:
Any assistance would be kindly appreciated
You were right; it was not that simple. However, I was able to get the desired output with the query below:
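SELECT
T1.*,
T2.*,
T3.*
FROM
Table1 T1
LEFT JOIN Table1_to_Table2_Link L1 ON T1.ID = L1.T1_ID
LEFT JOIN TABLE2 T2 ON T2.ID = L1.T2_ID
LEFT JOIN (
-- src records which link table the ID came from, so a Table1 ID is never matched against a Table2 ID that happens to have the same value
SELECT 'T1' AS src, T1_ID AS ID, T3_ID AS table3Id FROM dbo.Table1_to_Table3_Link
UNION ALL
SELECT 'T2' AS src, T2_ID AS ID, T3_ID AS table3Id FROM dbo.Table2_to_Table3_Link
) S
ON (S.src = 'T1' AND S.ID = T1.ID)
OR (S.src = 'T2' AND S.ID = T2.ID)
LEFT JOIN dbo.Table3 T3 ON S.table3Id = T3.ID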
Hope it helps.
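The reason for tagging the union's rows with their source can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (so no `dbo.` schema), and the sample rows are invented: Table1 row 3 has no links at all, but a Table2 row with the same ID 3 links to Table3 row 100. Matching on the ID alone attaches that Table3 row to the unrelated Table1 row, while matching on (src, ID) does not.

```python
import sqlite3

# Invented sample data: T1 3 is unlinked, but T2 3 -> T3 100.
SCHEMA = """
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY);
CREATE TABLE Table3 (ID INTEGER PRIMARY KEY);
CREATE TABLE Table1_to_Table2_Link (T1_ID INTEGER, T2_ID INTEGER);
CREATE TABLE Table1_to_Table3_Link (T1_ID INTEGER, T3_ID INTEGER);
CREATE TABLE Table2_to_Table3_Link (T2_ID INTEGER, T3_ID INTEGER);
INSERT INTO Table1 VALUES (1), (2), (3);
INSERT INTO Table2 VALUES (3), (10);
INSERT INTO Table3 VALUES (100), (200);
INSERT INTO Table1_to_Table2_Link VALUES (2, 10);
INSERT INTO Table1_to_Table3_Link VALUES (1, 100);
INSERT INTO Table2_to_Table3_Link VALUES (10, 200), (3, 100);
"""

QUERY = """
SELECT T1.ID, T2.ID, T3.ID
FROM Table1 T1
LEFT JOIN Table1_to_Table2_Link L1 ON T1.ID = L1.T1_ID
LEFT JOIN Table2 T2 ON T2.ID = L1.T2_ID
LEFT JOIN (
    SELECT 'T1' AS src, T1_ID AS ID, T3_ID AS table3Id FROM Table1_to_Table3_Link
    UNION ALL
    SELECT 'T2' AS src, T2_ID AS ID, T3_ID AS table3Id FROM Table2_to_Table3_Link
) S ON {on_clause}
LEFT JOIN Table3 T3 ON S.table3Id = T3.ID
ORDER BY T1.ID
"""

# Matching on the ID alone compares Table1 IDs with Table2 IDs.
BARE_ON = "T1.ID = S.ID OR T2.ID = S.ID"
# Matching on (src, ID) keeps the two ID spaces apart.
TAGGED_ON = "(S.src = 'T1' AND S.ID = T1.ID) OR (S.src = 'T2' AND S.ID = T2.ID)"


def linked_rows(on_clause):
    """Run the query with the given ON condition and return (T1, T2, T3) IDs."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY.format(on_clause=on_clause)).fetchall()
    con.close()
    return rows


print(linked_rows(BARE_ON))    # [(1, None, 100), (2, 10, 200), (3, None, 100)]
print(linked_rows(TAGGED_ON))  # [(1, None, 100), (2, 10, 200), (3, None, None)]
```

Row 1 picks up Table3 directly, row 2 indirectly through Table2, and row 3 is kept with NULLs, which is the "all of Table 1, related rows only" shape described in the question.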
I have the following problem. In my Oracle DB I have this query:
select * from table1 t1
inner join table2 t2 on
(t1.id_1= t2.id_1 or t1.id_2 = t2.id_2)
and it works perfectly.
Now I need to rewrite the query in Hive. I've seen that an OR condition doesn't work in JOINs in Hive (error: 'OR not supported in JOIN').
Is there any workaround for this other than splitting the query into two separate queries and taking their UNION?
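Another way is to union two joins. With UNION ALL, a pair that matches on both id_1 and id_2 would be returned twice, so the second branch skips pairs the first branch already returned (the IS NULL checks keep pairs whose id_1 is NULL, which the first branch can never match), e.g.,
select * from table1 t1
inner join table2 t2 on
(t1.id_1= t2.id_1)
union all
select * from table1 t1
inner join table2 t2 on
(t1.id_2 = t2.id_2)
where (t1.id_1 <> t2.id_1 or t1.id_1 is null or t2.id_1 is null)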
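A quick way to see why the union needs that extra filter is to run all three forms on a tiny dataset. This sketch uses Python's sqlite3 module in place of Oracle/Hive (the SQL used here behaves the same way in all of them), and the rows are invented: table2 row 'x' matches table1 row 'a' on both keys.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (id_1 INTEGER, id_2 INTEGER, name TEXT);
CREATE TABLE table2 (id_1 INTEGER, id_2 INTEGER, val TEXT);
INSERT INTO table1 VALUES (1, 10, 'a'), (2, 20, 'b');
INSERT INTO table2 VALUES (1, 10, 'x'), (2, 99, 'y'), (3, 20, 'z');
"""

# The original Oracle-style OR join.
OR_JOIN = """
SELECT t1.name, t2.val FROM table1 t1
INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1 OR t1.id_2 = t2.id_2)
"""

# Plain UNION ALL: 'a'/'x' comes back once from each branch.
NAIVE_UNION = """
SELECT t1.name, t2.val FROM table1 t1 INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1)
UNION ALL
SELECT t1.name, t2.val FROM table1 t1 INNER JOIN table2 t2 ON (t1.id_2 = t2.id_2)
"""

# The WHERE belongs to the second SELECT only and drops pairs branch 1 already has.
DEDUP_UNION = NAIVE_UNION + """
WHERE (t1.id_1 <> t2.id_1 OR t1.id_1 IS NULL OR t2.id_1 IS NULL)
"""


def run(sql):
    """Return the query's rows, sorted so results can be compared."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = sorted(con.execute(sql).fetchall())
    con.close()
    return rows


print(run(OR_JOIN))      # [('a', 'x'), ('b', 'y'), ('b', 'z')]
print(run(NAIVE_UNION))  # [('a', 'x'), ('a', 'x'), ('b', 'y'), ('b', 'z')]
print(run(DEDUP_UNION))  # [('a', 'x'), ('b', 'y'), ('b', 'z')]
```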
Hive (before 2.2.0) does not support non-equi joins. A common approach is to move the join ON condition to the WHERE clause. In the worst case this becomes a CROSS JOIN plus a WHERE filter, like this:
select *
from table1 t1
cross join table2 t2
where (t1.id_1= t2.id_1 or t1.id_2 = t2.id_2)
It may be slow, because the CROSS JOIN pairs every row of table1 with every row of table2 before the filter is applied.
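The CROSS JOIN + WHERE rewrite returns exactly what the OR join returns; the cost is the intermediate row count. A small sketch with Python's sqlite3 module standing in for Hive and invented rows makes both points visible: the results match, and the filter runs over len(table1) × len(table2) pairs.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (id_1 INTEGER, id_2 INTEGER, name TEXT);
CREATE TABLE table2 (id_1 INTEGER, id_2 INTEGER, val TEXT);
INSERT INTO table1 VALUES (1, 10, 'a'), (2, 20, 'b');
INSERT INTO table2 VALUES (1, 10, 'x'), (2, 99, 'y'), (3, 20, 'z');
"""

OR_JOIN = """
SELECT t1.name, t2.val FROM table1 t1
INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1 OR t1.id_2 = t2.id_2)
"""

# Hive-friendly form: no OR in the ON clause, the condition moves to WHERE.
CROSS_WHERE = """
SELECT t1.name, t2.val FROM table1 t1
CROSS JOIN table2 t2
WHERE (t1.id_1 = t2.id_1 OR t1.id_2 = t2.id_2)
"""

# How many pairs the WHERE filter has to inspect.
PAIRS = "SELECT COUNT(*) FROM table1 CROSS JOIN table2"


def run(sql):
    """Return the query's rows, sorted so results can be compared."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = sorted(con.execute(sql).fetchall())
    con.close()
    return rows


print(run(CROSS_WHERE) == run(OR_JOIN))  # True
print(run(PAIRS))                        # [(6,)]  (2 rows x 3 rows)
```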
You can try two LEFT JOINs instead of the CROSS JOIN and filter out the rows where both conditions are false (mimicking the INNER JOIN in your query). This may perform faster than the cross join because it does not multiply all the rows. The columns selected from the second table can then be calculated using NVL() or COALESCE().
select t1.*,
nvl(t2.col1, t3.col1) as t2_col1 --take from t2; if NULL, take from t3
-- , nvl(t2.col2, t3.col2) as t2_col2 ... calculate all other columns from the second table in the same way
from table1 t1
left join table2 t2 on t1.id_1= t2.id_1
left join table2 t3 on t1.id_2 = t3.id_2
where (t1.id_1= t2.id_1 OR t1.id_2 = t3.id_2) --only joined records allowed, like in your INNER JOIN
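As you asked, no UNION is necessary. Note, however, that this is only equivalent to the OR join when no t1 row matches table2 rows through both conditions at once (unless both hit the same single row); in that case NVL() keeps the t2 values and the OR join's other matches are lost.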
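Here is a sketch of the two-LEFT-JOIN version next to the OR join, using Python's sqlite3 module in place of Hive. SQLite has no NVL(), so COALESCE() stands in for it (for two arguments they behave the same). The rows are invented: 'a' matches only on id_1, 'b' only on id_2, 'c' matches nothing, and 'd' matches one table2 row on id_1 and a different one on id_2, which is where the two approaches diverge.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (id_1 INTEGER, id_2 INTEGER, name TEXT);
CREATE TABLE table2 (id_1 INTEGER, id_2 INTEGER, col1 TEXT);
INSERT INTO table1 VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 30, 'c'), (4, 40, 'd');
INSERT INTO table2 VALUES (1, 11, 'x'), (5, 20, 'z'), (4, 77, 'p'), (9, 40, 'q');
"""

OR_JOIN = """
SELECT t1.name, t2.col1 FROM table1 t1
INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1 OR t1.id_2 = t2.id_2)
"""

TWO_LEFT_JOINS = """
SELECT t1.name, COALESCE(t2.col1, t3.col1) AS t2_col1  -- NVL() in Hive/Oracle
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id_1 = t2.id_1
LEFT JOIN table2 t3 ON t1.id_2 = t3.id_2
WHERE (t1.id_1 = t2.id_1 OR t1.id_2 = t3.id_2)          -- drop rows with no match
"""


def run(sql):
    """Return the query's rows, sorted so results can be compared."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = sorted(con.execute(sql).fetchall())
    con.close()
    return rows


print(run(OR_JOIN))         # [('a', 'x'), ('b', 'z'), ('d', 'p'), ('d', 'q')]
print(run(TWO_LEFT_JOINS))  # [('a', 'x'), ('b', 'z'), ('d', 'p')]
```

If the data guarantees that a row never matches through both keys, the LEFT JOIN form is a drop-in replacement; otherwise the UNION form is the safer rewrite.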
Sorry for the late response.
For a key in table A, there may be two or more records in tables B and C; another column in those tables holds a date value that makes each record unique. I want to extract the record with the maximum date value, which is why I am using the MAX function. I know that the subquery I have coded should not be included in the ON clause, as it would do the filtering before the join. So ultimately I want to know how to express the MAX condition in the query.
Example:
Table A
Key - AAAAA
Table B:
Record 1
Key - AAAAA
Date - 2017-10-01
Record 2
Key - AAAAA
Date - 2017-10-05
I want only the record AAAAA/2017-10-05 to be selected from table B.
Basically, records from table A where A.C3 = 'Y' should be extracted first (assume this gives 500 records).
Then these 500 records are joined with tables B and C (left outer, so that all matching records are kept and non-matching records have NULLs in the columns from tables B and C).
In tables B and C, if more than one record is present with different dates, the record with the maximum date should be extracted.
Hence the final output should contain 500 records.
This is all you need for what you describe
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y'
These lines are causing your problem; they basically turn your outer joins into inner joins:
AND B.C3 = (SELECT MAX(B3) FROM TABLE2 T1
WHERE T1.B1 = B.B1)
AND C.C3 = (SELECT MAX(C3) FROM TABLE3 T1
WHERE T1.C1 = C.C1)
If there's no match in B or C, then B.C3 and/or C.C3 will be NULL, and NULL can't be = to anything (or <> to anything, for that matter).
What are you trying to accomplish with the above that you've not included in the question?
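For the requirement described in the follow-up (keep every A row with C3 = 'Y', and attach only the latest-dated row from B and from C), one common technique is to number the rows per key with ROW_NUMBER() and put the "row number = 1" test in the ON clause rather than the WHERE clause, so the outer join stays outer. This is a sketch, not the asker's schema: it assumes B3 and C3 are the date columns of B and C (as the MAX(B3)/MAX(C3) subqueries suggest), stores dates as ISO 'YYYY-MM-DD' text so they sort correctly, and runs on Python's sqlite3 module, which needs SQLite 3.25 or later for window functions.

```python
import sqlite3

SETUP = """
CREATE TABLE TABLE1 (A1 TEXT, A2 TEXT, C3 TEXT);  -- C3 holds the Y/N flag
CREATE TABLE TABLE2 (B1 TEXT, B2 TEXT, B3 TEXT);  -- B3 assumed to be the date
CREATE TABLE TABLE3 (C1 TEXT, C2 TEXT, C3 TEXT);  -- C3 assumed to be the date
INSERT INTO TABLE1 VALUES ('AAAAA', 'a2', 'Y'), ('BBBBB', 'b2', 'Y'), ('CCCCC', 'c2', 'N');
INSERT INTO TABLE2 VALUES ('AAAAA', 'old', '2017-10-01'), ('AAAAA', 'new', '2017-10-05');
INSERT INTO TABLE3 VALUES ('BBBBB', 'c-old', '2017-09-01'), ('BBBBB', 'c-new', '2017-09-30');
"""

LATEST_PER_KEY = """
SELECT A.A1, A.A2, B.B2, B.B3, C.C2, C.C3
FROM TABLE1 A
LEFT OUTER JOIN (
    SELECT B1, B2, B3,
           ROW_NUMBER() OVER (PARTITION BY B1 ORDER BY B3 DESC) AS rn
    FROM TABLE2
) B ON A.A1 = B.B1 AND B.rn = 1   -- rn test in ON, so unmatched A rows survive
LEFT OUTER JOIN (
    SELECT C1, C2, C3,
           ROW_NUMBER() OVER (PARTITION BY C1 ORDER BY C3 DESC) AS rn
    FROM TABLE3
) C ON A.A1 = C.C1 AND C.rn = 1
WHERE A.C3 = 'Y'
ORDER BY A.A1
"""


def latest_rows():
    """Return one row per A row with C3 = 'Y', joined to its latest B and C rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(LATEST_PER_KEY).fetchall()
    con.close()
    return rows


for row in latest_rows():
    print(row)
# ('AAAAA', 'a2', 'new', '2017-10-05', None, None)
# ('BBBBB', 'b2', None, None, 'c-new', '2017-09-30')
```

The number of output rows equals the number of A rows with C3 = 'Y' (2 here, 500 in the question), because each derived table contributes at most one row per key.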
Just do it?
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y' and (B.B1 is null or C.C1 is null)
I am unable to get the equality check to pass using the Hive query below.
I have 3 tables and I want to join them. I tried the following, but get this error:
FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN 'visit_date'
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 ON
( v3.AS_upc= t3.upc_no AND v3.start_dt <= t3.visit_date AND v3.end_dt >= t3.visit_date AND v3.adv_price <= t3.comp_price ) ) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
EDITED based on help from FuzzyTree:
1st:
We tried editing the above query to use BETWEEN in a WHERE clause, but the query returns no output.
But if we remove the BETWEEN date condition, we get some output based on "v3.adv_price <= t3.comp_price", just without the date filter:
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where v3.adv_price <= t3.comp_price
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
2nd :
Next we tried passing only one date condition:
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where v3.adv_price <= t3.comp_price and v3.start_dt <= t3.visit_date
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
Now it shows some results, but if we apply both the start and end date filters, it shows none.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Only equality joins, outer joins, and left semi joins are supported in
Hive. Hive does not support join conditions that are not equality
conditions as it is very difficult to express such conditions as a
map/reduce job.
Try moving your inequalities to the WHERE clause:
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where t3.visit_date between v3.start_dt and v3.end_dt
and v3.adv_price <= t3.comp_price
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
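The pattern is: keep only the equality in ON, and apply the range and price checks in WHERE after the equi-join. Here is a sketch of the inner subquery using Python's sqlite3 module in place of Hive, with invented rows and dates stored as text. Since the question doesn't show the column types, one thing worth checking is an assumption: if the dates are strings, BETWEEN compares them as text, so both sides need the same sortable format (such as 'YYYY-MM-DD'), otherwise the range test can silently reject every row.

```python
import sqlite3

SETUP = """
CREATE TABLE table2 (AS_upc TEXT, start_dt TEXT, end_dt TEXT, adv_price REAL);
CREATE TABLE table3 (upc_no TEXT, visit_date TEXT, comp_price REAL);
INSERT INTO table2 VALUES ('u1', '2015-01-01', '2015-01-31', 5.0),
                          ('u2', '2015-02-01', '2015-02-28', 9.0);
INSERT INTO table3 VALUES ('u1', '2015-01-15', 6.0),   -- in range, price ok
                          ('u1', '2015-02-15', 6.0),   -- outside the date range
                          ('u2', '2015-02-10', 8.0);   -- in range, but 9.0 > 8.0
"""

RANGE_JOIN = """
SELECT v3.AS_upc, t3.visit_date
FROM table2 v3
JOIN table3 t3 ON (v3.AS_upc = t3.upc_no)                -- equality only in ON
WHERE t3.visit_date BETWEEN v3.start_dt AND v3.end_dt   -- inequalities in WHERE
  AND v3.adv_price <= t3.comp_price
"""


def matches():
    """Rows that pass the equi-join plus the date-range and price filters."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(RANGE_JOIN).fetchall()
    con.close()
    return rows


def text_between(value, low, high):
    """Evaluate BETWEEN on three strings, the way it behaves for text dates."""
    con = sqlite3.connect(":memory:")
    result = con.execute("SELECT ? BETWEEN ? AND ?", (value, low, high)).fetchone()[0]
    con.close()
    return bool(result)


print(matches())                                                  # [('u1', '2015-01-15')]
print(text_between("2015-01-15", "2015-01-01", "2015-01-31"))     # True
print(text_between("01/15/2015", "2015-01-01", "2015-01-31"))     # False (mixed formats)
```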
I have the SQL query below, which takes too long to execute. Kindly check the query and optimise it for me. I need to count the number of files in the file_actions table, combining it with three other tables using inner joins.
SELECT count(*) as total
FROM (SELECT t1.cfid as cfid,MAX(t1.timestamp) d
FROM file_actions t1
INNER JOIN case_files t2 ON t2.cfid=t1.cfid
INNER JOIN case_file_allocations t3 ON t1.cfid=t3.cfid
INNER JOIN cbeta_user t4
WHERE t4.id=t1.user_id
AND t4.team_leader='$user' and t2.closed<>'yes' AND
t2.deleted<>1 AND
t3.reallocated<>'yes' GROUP BY t1.cfid) a
WHERE d < '$yesterday'
I think it is the inner joins that make the query take so long to execute, which slows the system down.
Try moving the WHERE d < '$yesterday' condition into subquery a as a HAVING condition on MAX(t1.timestamp), select only the cfid there, and give the join to cbeta_user a proper ON clause. If your tables aren't indexed on the columns you use in the join and filter conditions, try adding indexes.
SELECT count(*) as total
FROM (SELECT t1.cfid
FROM file_actions t1
INNER JOIN case_files t2
ON t2.cfid=t1.cfid
INNER JOIN case_file_allocations t3
ON t1.cfid=t3.cfid
INNER JOIN cbeta_user t4
ON t4.id=t1.user_id
WHERE t4.team_leader='$user' and t2.closed<>'yes'
AND t2.deleted<>1
AND t3.reallocated<>'yes'
GROUP BY t1.cfid
HAVING MAX(t1.timestamp) < '$yesterday') a
Recommended reading: http://www.code-fly.com/5-tips-to-make-your-sql-queries-faster/
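To confirm that the HAVING form counts the same files as the original nested query, both can be run side by side. This sketch uses Python's sqlite3 module instead of MySQL, invented rows, bound parameters (`?`) in place of the `'$user'` / `'$yesterday'` string interpolation, and a few illustrative indexes on the join columns (the index names are hypothetical).

```python
import sqlite3

SETUP = """
CREATE TABLE cbeta_user (id INTEGER, team_leader TEXT);
CREATE TABLE case_files (cfid INTEGER, closed TEXT, deleted INTEGER);
CREATE TABLE case_file_allocations (cfid INTEGER, reallocated TEXT);
CREATE TABLE file_actions (cfid INTEGER, user_id INTEGER, timestamp TEXT);
CREATE INDEX idx_fa_cfid ON file_actions (cfid);
CREATE INDEX idx_fa_user ON file_actions (user_id);
CREATE INDEX idx_cfa_cfid ON case_file_allocations (cfid);
INSERT INTO cbeta_user VALUES (1, 'lead'), (2, 'other');
INSERT INTO case_files VALUES (100, 'no', 0), (101, 'no', 0), (102, 'yes', 0),
                              (103, 'no', 0), (104, 'no', 0);
INSERT INTO case_file_allocations VALUES (100, 'no'), (101, 'no'), (102, 'no'),
                                         (103, 'no'), (104, 'no');
INSERT INTO file_actions VALUES
    (100, 1, '2020-01-01'), (100, 1, '2020-01-03'),  -- latest action before cutoff
    (101, 1, '2020-01-10'),                          -- latest action after cutoff
    (102, 1, '2020-01-01'),                          -- closed case
    (103, 2, '2020-01-01'),                          -- other team leader
    (104, 1, '2020-01-02');                          -- latest action before cutoff
"""

# The question's query, with its outer WHERE on the aggregate alias d.
ORIGINAL = """
SELECT count(*) AS total
FROM (SELECT t1.cfid AS cfid, MAX(t1.timestamp) d
      FROM file_actions t1
      INNER JOIN case_files t2 ON t2.cfid = t1.cfid
      INNER JOIN case_file_allocations t3 ON t1.cfid = t3.cfid
      INNER JOIN cbeta_user t4
      WHERE t4.id = t1.user_id
        AND t4.team_leader = ? AND t2.closed <> 'yes'
        AND t2.deleted <> 1 AND t3.reallocated <> 'yes'
      GROUP BY t1.cfid) a
WHERE d < ?
"""

# Same filter expressed as HAVING inside the subquery, with an ON for t4.
WITH_HAVING = """
SELECT count(*) AS total
FROM (SELECT t1.cfid
      FROM file_actions t1
      INNER JOIN case_files t2 ON t2.cfid = t1.cfid
      INNER JOIN case_file_allocations t3 ON t1.cfid = t3.cfid
      INNER JOIN cbeta_user t4 ON t4.id = t1.user_id
      WHERE t4.team_leader = ? AND t2.closed <> 'yes'
        AND t2.deleted <> 1 AND t3.reallocated <> 'yes'
      GROUP BY t1.cfid
      HAVING MAX(t1.timestamp) < ?) a
"""


def count_files(sql, user, yesterday):
    """Run a counting query with bound parameters and return the total."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    total = con.execute(sql, (user, yesterday)).fetchone()[0]
    con.close()
    return total


print(count_files(ORIGINAL, "lead", "2020-01-05"))     # 2 (cfid 100 and 104)
print(count_files(WITH_HAVING, "lead", "2020-01-05"))  # 2
```

Whether this is actually faster depends on the MySQL query plan; `EXPLAIN` on both versions, with and without the indexes, is the way to find out.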
I have three tables: table1 (the main table), table2, and table3.
table1 contains table1Id
table2 and table3 contain table2Id, table2RoleId, table3Id, and table3RoleId.
Also, for the same table1Id value there is more than one record in table2 and table3, but with different table2RoleId and table3RoleId values.
I want to join table1 with table2 and table3 so that table2RoleId and table3RoleId are displayed according to table1Id.
How can I achieve this?
Thanks
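I'll set aside the specifics of your question and show you a sample left join in LINQ:
var result = from x in table1
join y in table2 on x.tableId1 equals y.tableId1 into ys
from y in ys.DefaultIfEmpty() // keep x even when table2 has no match (y is null)
join z in table3 on x.tableId1 equals z.tableId1 into zs
from z in zs.DefaultIfEmpty() // same for table3
select new { /* your return fields */ };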
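Because a table1Id can have several rows in both table2 and table3, left joining both child tables directly multiplies rows (2 roles × 2 roles = 4 rows for one table1Id). The desired layout isn't shown in the question, so this is only one possible reading: aggregate each child table per table1Id first, then join. The sketch assumes table2 and table3 each carry a table1Id column (implied but not shown in the question), uses invented role IDs, and runs the SQL on Python's sqlite3 module; GROUP_CONCAT is SQLite/MySQL-specific.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (table1Id INTEGER);
CREATE TABLE table2 (table1Id INTEGER, table2Id INTEGER, table2RoleId INTEGER);
CREATE TABLE table3 (table1Id INTEGER, table3Id INTEGER, table3RoleId INTEGER);
INSERT INTO table1 VALUES (1), (2);
INSERT INTO table2 VALUES (1, 11, 201), (1, 12, 202);
INSERT INTO table3 VALUES (1, 21, 301), (1, 22, 302);
"""

# Plain LEFT JOINs: every table2 role is paired with every table3 role.
NAIVE = """
SELECT t1.table1Id, t2.table2RoleId, t3.table3RoleId
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1Id = t1.table1Id
LEFT JOIN table3 t3 ON t3.table1Id = t1.table1Id
"""

# Aggregate each child table per table1Id first: one row per table1Id.
AGGREGATED = """
SELECT t1.table1Id, r2.roles, r3.roles
FROM table1 t1
LEFT JOIN (SELECT table1Id, GROUP_CONCAT(table2RoleId) AS roles
           FROM table2 GROUP BY table1Id) r2 ON r2.table1Id = t1.table1Id
LEFT JOIN (SELECT table1Id, GROUP_CONCAT(table3RoleId) AS roles
           FROM table3 GROUP BY table1Id) r3 ON r3.table1Id = t1.table1Id
ORDER BY t1.table1Id
"""


def _split(csv):
    """Turn a GROUP_CONCAT result into a sorted list (order isn't guaranteed)."""
    return sorted(int(r) for r in csv.split(",")) if csv else []


def roles_by_id():
    """Return (naive row count, {table1Id: (table2 roles, table3 roles)})."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    naive_count = len(con.execute(NAIVE).fetchall())
    result = {}
    for table1_id, roles2, roles3 in con.execute(AGGREGATED).fetchall():
        result[table1_id] = (_split(roles2), _split(roles3))
    con.close()
    return naive_count, result


print(roles_by_id())
# (5, {1: ([201, 202], [301, 302]), 2: ([], [])})
```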