Left join with WHERE clause not working

I am trying to get only selected rows from table A (not all rows) together with the matching rows from table B, but the query returns only the rows that match in both tables and drops the rest of the selected rows from table A.
I used this query:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and B.party_code=2
I have 2 tables:
Table 1: A, with 3 columns
CategoryName, CategoryCode (PK), ItemType
Table 2: B, with 3 columns
CategoryCode (FK), Discount, PartyCode (FK, from another table)
NOTE: I am working in Access 2007.

For non-matching rows from table B, party_code is NULL, so the comparison B.party_code=2 in your WHERE clause is not true and the row is not returned. So you need to filter the B records before joining. Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode and B.party_code=2
WHERE A.itemtype='F'
[EDIT] That doesn't work in Access. Next try:
You can create a saved query to do the filtering. Let's call it "B_filtered". It is just
SELECT * FROM B where party_code = 2
(You could make the "2" a parameter to make it more flexible).
Then, just use this query in your actual query.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
[EDIT]
Just Googled - I think you can do this directly with a subquery.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN (SELECT * FROM B where party_code = 2) AS B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
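To see the difference on a small data set, here is a minimal sketch that runs both forms with Python's built-in sqlite3 module instead of Access. The sample rows are invented for illustration, and the schema only mirrors the column names used in the queries above.

```python
import sqlite3

# In-memory toy database; the rows are made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (CategoryName TEXT, CategoryCode INTEGER PRIMARY KEY, itemtype TEXT);
    CREATE TABLE B (CategoryCode INTEGER, discount REAL, party_code INTEGER);
    INSERT INTO A VALUES ('Fruit', 1, 'F'), ('Fish', 2, 'F'), ('Tools', 3, 'T');
    INSERT INTO B VALUES (1, 10.0, 2);
""")

# Filter on B in the WHERE clause: unmatched A rows have party_code NULL and are dropped.
where_filter = conn.execute("""
    SELECT A.CategoryName, B.discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
    WHERE A.itemtype = 'F' AND B.party_code = 2
    ORDER BY A.CategoryCode
""").fetchall()

# Filter B in a subquery first: every selected A row survives the join.
subquery_filter = conn.execute("""
    SELECT A.CategoryName, B_filtered.discount
    FROM A LEFT JOIN (SELECT * FROM B WHERE party_code = 2) AS B_filtered
         ON A.CategoryCode = B_filtered.CategoryCode
    WHERE A.itemtype = 'F'
    ORDER BY A.CategoryCode
""").fetchall()

print(where_filter)     # only 'Fruit'
print(subquery_filter)  # 'Fruit' with its discount, 'Fish' with NULL
```

SQLite is only a stand-in here; the exact syntax Access accepts may differ, so treat this as a check of the join logic rather than of Access itself.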

What mlinth proposed is correct, and would work in most other SQL dialects. The query below is the same basic concept but uses a null condition.
Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and (B.party_code=2 OR B.party_code IS NULL)
If party_code is nullable, test the PK or another non-nullable field of B for NULL instead.
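One thing worth checking before relying on the IS NULL variant: a row of A whose only matches in B belong to a different party is joined successfully, so its party_code is neither 2 nor NULL and the row is filtered out. The sketch below shows this with Python's sqlite3 module and invented sample data ('Fish' has a discount only for party 7); it is an illustration under those assumptions, not a statement about any particular data set.

```python
import sqlite3

# Toy data: 'Fruit' has a party-2 discount, 'Fish' only a party-7 discount,
# 'Dairy' has no discount rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (CategoryName TEXT, CategoryCode INTEGER PRIMARY KEY, itemtype TEXT);
    CREATE TABLE B (CategoryCode INTEGER, discount REAL, party_code INTEGER);
    INSERT INTO A VALUES ('Fruit', 1, 'F'), ('Fish', 2, 'F'), ('Dairy', 3, 'F');
    INSERT INTO B VALUES (1, 10.0, 2), (2, 5.0, 7);
""")

# WHERE ... OR IS NULL: 'Fish' joins to the party-7 row and is then filtered out.
null_check = conn.execute("""
    SELECT A.CategoryName, B.discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
    WHERE A.itemtype = 'F' AND (B.party_code = 2 OR B.party_code IS NULL)
    ORDER BY A.CategoryCode
""").fetchall()

# Filtering B before the join keeps 'Fish' with a NULL discount.
prefiltered = conn.execute("""
    SELECT A.CategoryName, Bf.discount
    FROM A LEFT JOIN (SELECT * FROM B WHERE party_code = 2) AS Bf
         ON A.CategoryCode = Bf.CategoryCode
    WHERE A.itemtype = 'F'
    ORDER BY A.CategoryCode
""").fetchall()

print(null_check)
print(prefiltered)
```

If every category has at most rows for the one party you query, the two forms return the same rows; they diverge only when other parties' rows exist.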

Related

Run 2 select statement in DB2 depending of the result of one of them

I'm trying to write a stored procedure in DB2 with 2 select statements. If the first select returns null, it should perform the second one.
For example
Select a, b, c from table A where...
--If first select returns null
Select a, from table B where...
I tried a lot of ideas but none of them worked.
Thanks
You can use this general pattern; of course, you will have to adapt your two result sets so that their columns match:
WITH first AS
(
SELECT ..result1.. FROM table1
WHERE ..clause1..
)
SELECT ..result1.. FROM first
UNION
SELECT ..result2.. FROM table2
WHERE 0=(SELECT COUNT(1) FROM first)
AND
..clause2..
Here is a simple way to write that:
Select a, from table B where...
and not exists (select * from table a where...)
union
select a,.. from table A where...
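The NOT EXISTS pattern above can be tried out on toy data. This is a minimal sketch using Python's sqlite3 module rather than DB2; the tables table_a and table_b, their columns and the threshold parameter are all hypothetical. It uses UNION ALL instead of UNION so that duplicate rows are not collapsed, and pads the second select with NULL so both halves have the same number of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (a INTEGER, b TEXT);
    CREATE TABLE table_b (a INTEGER);
    INSERT INTO table_a VALUES (1, 'x'), (2, 'y');
    INSERT INTO table_b VALUES (100);
""")

# The second branch contributes rows only when the first branch finds none.
FALLBACK_SQL = """
    SELECT a, b FROM table_a WHERE a > :threshold
    UNION ALL
    SELECT a, NULL FROM table_b
    WHERE NOT EXISTS (SELECT 1 FROM table_a WHERE a > :threshold)
    ORDER BY 1
"""

def fetch_with_fallback(threshold):
    """Return rows from table_a, or rows from table_b if table_a has no match."""
    return conn.execute(FALLBACK_SQL, {"threshold": threshold}).fetchall()

first_hit = fetch_with_fallback(0)  # table_a has matching rows
fallback = fetch_with_fallback(5)   # table_a has none, so table_b is used
print(first_hit, fallback)
```

The condition on table_a has to be repeated verbatim inside NOT EXISTS; the WITH form shown earlier avoids that repetition.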

Left outer join with 3 tables and subquery

Sorry for the late response.
For a key in table A, there may be 2 or more records in tables B and C. That is, another column in these tables holds a date value that makes the keys unique. I want to extract the record that has the maximum date value, which is why I am using the MAX function. I know that the subquery I have coded should not be included in the ON clause, where it would do the filtering before the join. So ultimately I want to know how to express the max condition in the query.
Example:
Table A
Key - AAAAA
Table B:
Record 1
Key - AAAAA
Date - 2017-10-01
Record 2
Key - AAAAA
Date - 2017-10-05
I want only the record AAAAA/2017-10-05 to be selected from table B.
Basically, records from table A where A.c3 = 'Y' should be extracted first (assume it gives 500 records).
Then these 500 records should be joined with tables B and C (left outer, to keep both the matching and the non-matching records; the non-matching records should have nulls in the columns from tables B and C).
In tables B and C, if more than one record is present for a key with different dates, the record with the maximum date should be extracted.
Hence the final output should contain 500 records.
This is all you need for what you describe:
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y'
These lines are causing your problem, basically turning your outer joins into inner joins:
AND B.C3 = (SELECT MAX(B3) FROM TABLE2 T1
WHERE T1.B1 = B.B1)
AND C.C3 = (SELECT MAX(C3) FROM TABLE3 T1
WHERE T1.C1 = C.C1)
If there's no match in B or C, then B.C3 and/or C.C3 will be NULL, and NULL is never equal to anything (or unequal to anything, for that matter).
What are you trying to accomplish with the above that you've not included in the question?
Just do it?
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y' and (B.B1 is null or C.C1 is null)
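For the "latest record per key" part of the question, one possible approach is to compute the maximum date per key in a derived table and left join through it, so no subquery is needed in the ON clause and unmatched rows of A are kept. The sketch below uses Python's sqlite3 module, not DB2, and the tables a, b, c with columns k, flag, dt and val are hypothetical stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k TEXT, flag TEXT);
    CREATE TABLE b (k TEXT, dt TEXT, val TEXT);
    CREATE TABLE c (k TEXT, dt TEXT, val TEXT);
    INSERT INTO a VALUES ('AAAAA', 'Y'), ('BBBBB', 'Y'), ('CCCCC', 'N');
    INSERT INTO b VALUES ('AAAAA', '2017-10-01', 'old'), ('AAAAA', '2017-10-05', 'new');
    INSERT INTO c VALUES ('BBBBB', '2017-09-01', 'only');
""")

# bm / cm hold the latest date per key; b / c are then joined on key AND that date.
latest_rows = conn.execute("""
    SELECT a.k, b.dt, b.val, c.dt, c.val
    FROM a
    LEFT JOIN (SELECT k, MAX(dt) AS max_dt FROM b GROUP BY k) AS bm ON bm.k = a.k
    LEFT JOIN b ON b.k = a.k AND b.dt = bm.max_dt
    LEFT JOIN (SELECT k, MAX(dt) AS max_dt FROM c GROUP BY k) AS cm ON cm.k = a.k
    LEFT JOIN c ON c.k = a.k AND c.dt = cm.max_dt
    WHERE a.flag = 'Y'
    ORDER BY a.k
""").fetchall()

for row in latest_rows:
    print(row)
```

With this data the result has one row per flagged key in a. If two records in b or c share the same maximum date for a key, the join would still return both, so a tie-breaker would be needed in that case.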

Join tables in Hive using LIKE

I am joining tbl_A to tbl_B, matching column CustomerID in tbl_A against column Output in tbl_B, which contains the customer ID. However, tbl_B has other information in related rows that I do not want to lose when joining. I tried to join using LIKE, but I lost the rows that did not contain a customer ID in the Output column.
Here is my join query in Hive:
select a.*, b.Output from tbl_A a
left join tbl_B b
On b.Output like concat('%', a.CustomerID, '%')
However, I lose other rows from output.
You could also achieve the objective with a simple Hive query like this:
select a.*, b.Output
from tbl_A a, tbl_B b
where b.Output like concat('%', a.CustomerID, '%')
I would suggest first extracting all IDs from the free-text field (in your case the Output column in table B) into a separate table. Then join that table of IDs back to table B so that each row carries its ID, and finally join this ID-enriched version of table B to table A.
Hope this helps.
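If the goal is to keep every row of tbl_B, including those whose Output contains no customer ID, one option is to make tbl_B the preserved (left) side of the join. The following is a minimal sketch of that join logic using Python's sqlite3 module and invented rows; it uses SQLite's || operator in place of concat, and Hive versions differ in whether they accept a non-equality condition in ON, so it illustrates the semantics only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_A (CustomerID TEXT, Name TEXT);
    CREATE TABLE tbl_B (Output TEXT);
    INSERT INTO tbl_A VALUES ('C100', 'Ann'), ('C200', 'Bob');
    INSERT INTO tbl_B VALUES ('order for C100 shipped'), ('system restart'), ('C200 refund');
""")

# tbl_B is on the left, so rows without any customer ID are kept with NULLs.
kept_rows = conn.execute("""
    SELECT b.Output, a.CustomerID
    FROM tbl_B b
    LEFT JOIN tbl_A a ON b.Output LIKE '%' || a.CustomerID || '%'
    ORDER BY b.rowid
""").fetchall()

for row in kept_rows:
    print(row)
```

The 'system restart' row comes back with a NULL CustomerID instead of disappearing, which is the behaviour the question asks for.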

Left outer join with Where Clause

I am experienced with Access and about 12 months into SQL Server SSMS.
I am not getting results I expect with a left outer join, and I don't know why. Maybe I don't understand something.
I have Table 1 (the left side) with 600k products
I have table 2 with 150,000 products (sub set of table 1).
When I do this
SELECT [Product_Code], [Product_Desc], Store
FROM [Product Range]
I get 600,000 records
When I do a left join like this
SELECT [Product_Code], [Product_Desc], r.store, soh.SOH
FROM [Product Range] as r
LEFT JOIN [dbo].SOH as soh on r.[Product_Code] = soh.PRODUCT_Code
AND r.store = soh.store
WHERE soh.CalYearWeek=1512
I get 500k records. But I am confused. I thought a left join was supposed to return me all records from my left table regardless of anything else.
I then tried this (and I don't know why I would need to add the Null condition anyway)
SELECT [Product_Code],[Product_Desc],r.store,soh.SOH
FROM [Product Range] as r
LEFT OUTER JOIN [dbo].SOH as soh on r.[Product_Code] = soh.PRODUCT_Code
AND r.store = soh.store
WHERE soh.CalYearWeek=1512 or soh.CalYearWeek is null
and I got 550,000 records - still not the full 600k.
I am completely confused and don't know what is wrong. Can anyone help me please :-)
Matt
The problem is that WHERE conditions are evaluated after the join is made, so soh.CalYearWeek=1512 can only be true for successful joins; rows with no match have NULLs in all soh columns, and the WHERE clause filters them out.
The solution is simple: Move the condition into the join:
SELECT [Product_Code], [Product_Desc], r.store, soh.SOH
FROM [Product Range] as r
LEFT JOIN [dbo].SOH as soh on r.[Product_Code] = soh.PRODUCT_Code
AND r.store = soh.store
AND soh.CalYearWeek=1512
Conditions on the join are executed as the join is being made, so you'll still get your left join, but only to rows in the right table that have that special condition.
Putting non-null conditions on the right table in the WHERE clause effectively turns a LEFT join into an INNER join, since the right table can only have a non-null value if the join was successful.
You're correct in that a basic left join with no WHERE clauses will return a row for all records in the LEFT table with either data for the RIGHT table when it exists, or NULL where it doesn't.
And that is what you're getting, but then you're adding a WHERE clause which will filter out certain rows. So if you just had :
SELECT [Product_Code] ,[Product_Desc] ,r.store ,soh.SOH
FROM [Product Range] as r left join [dbo].SOH as soh
on r.[Product_Code] = soh.PRODUCT_Code
and r.store = soh.store
Then you would be seeing 600k records returned.
But then you're removing the 100k records where soh.CalYearWeek is not 1512 with the line :
WHERE soh.CalYearWeek=1512
By adding the :
or soh.CalYearWeek is null
You are adding back the 50k records where that is true. So basically, the WHERE clause acts upon the whole set of records at that time (after the join has taken place) and filters out rows which don't match. The mention of RIGHTTABLE.COLUMN in a WHERE clause is really just because, by then, the column in the full row is described by that full identifier rather than by its column name alone.
In fact the problem is not in the WHERE clause. The problem, if you can call it that, is in the JOIN itself and how it behaves. You can get exactly 600K rows, no rows at all, fewer than 600K rows or even more than 600K rows. It depends on the data in those tables.
You should understand the difference between putting predicates in the JOIN condition and in the WHERE clause. There is a big difference. You should also understand how predicates work with NULLs.
If you have a row with code 'A' in the left table and no row with code 'A' in the right table, you will get one row from the left table with NULLs for the right table's columns. If the right table has one row with code 'A', you will get one row combining the left row and the right row. If you have N rows with code 'A' in the left table and M rows with code 'A' in the right one, you will get M*N rows in the result.
To summarize, here is the formula for the number of rows in the result set of a LEFT JOIN:
COUNT = (count of left-table rows with no corresponding row in the right table) + SUM(COUNT_left(code[i]) * COUNT_right(code[i])), i.e. for each distinct matching code, the product of its row counts in the two tables, summed over all such codes.
You get at least 600K rows after the left join. In the CalYearWeek column you can get NULLs in two ways: 1. there was no corresponding row for the code in the right table, 2. there was a corresponding row in the right table but its CalYearWeek column is itself NULL.
When you are further filtering resultset with soh.CalYearWeek=1512, rows with NULLs and different values are eliminated from result.
Consider example:
DECLARE @t1 TABLE(Code INT)
DECLARE @t2 TABLE(Code INT, Year INT)
INSERT INTO @t1 VALUES
(1), (2), (3)
SELECT * FROM @t1 t1
JOIN @t2 t2 ON t2.Code = t1.Code
WHERE t2.Year = 1512
And now different results depending on data in second table:
--count 1
INSERT INTO @t2 VALUES
(1, 1512)
--count 0
INSERT INTO @t2 VALUES
(1, NULL)
--count 3
INSERT INTO @t2 VALUES
(1, 1512), (1, 1512), (1, 1512)
--count 6
INSERT INTO @t2 VALUES
(1, 1512), (2, 1512), (2, 1512), (3, 1512), (3, 1512), (3, 1512)
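The four scenarios can be replayed outside SQL Server. This is a small sketch using Python's sqlite3 module with ordinary tables named t1 and t2 in place of table variables; the helper count_rows is hypothetical. For each data set it returns the count of the filtered query above and, for comparison with the row-count formula, the size of the plain LEFT JOIN with no WHERE clause.

```python
import sqlite3

def count_rows(t2_rows):
    """Return (rows of the filtered join, rows of the unfiltered LEFT JOIN)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (Code INT)")
    conn.execute("CREATE TABLE t2 (Code INT, Year INT)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO t2 VALUES (?, ?)", t2_rows)
    (filtered,) = conn.execute(
        "SELECT COUNT(*) FROM t1 JOIN t2 ON t2.Code = t1.Code WHERE t2.Year = 1512"
    ).fetchone()
    (left_total,) = conn.execute(
        "SELECT COUNT(*) FROM t1 LEFT JOIN t2 ON t2.Code = t1.Code"
    ).fetchone()
    conn.close()
    return filtered, left_total

scenarios = [
    [(1, 1512)],
    [(1, None)],
    [(1, 1512), (1, 1512), (1, 1512)],
    [(1, 1512), (2, 1512), (2, 1512), (3, 1512), (3, 1512), (3, 1512)],
]
results = [count_rows(rows) for rows in scenarios]
print(results)
```

In the third scenario, for example, the unfiltered LEFT JOIN has 5 rows: two unmatched left rows plus 1 * 3 rows for code 1, which is what the formula predicts.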

select multiple columns from different tables and join in hive

I have a hive table A with 5 columns, the first column(A.key) is the key and I want to keep all 5 columns. I want to select 2 columns from B, say B.key1 and B.key2 and 2 columns from C, say C.key1 and C.key2. I want to join these columns with A.key = B.key1 and B.key2 = C.key1
What I want is a new external table D that has the following columns. B.key2 and C.key2 values should be given NULL if no matching happened.
A.key, A_col1, A_col2, A_col3, A_col4, B.key2, C.key2
What should be the correct hive query command? I got a max split error for my initial try.
Does this work?
create external table D as
select A.key, A.col1, A.col2, A.col3, A.col4, B.key2, C.key2
from A left outer join B on A.key = B.key1 left outer join C on B.key2 = C.key1;
If not, could you post more info about the "max split error" you mentioned? Copying and pasting the exact error message text would help.
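The chained join described in the question (A.key = B.key1, then B.key2 = C.key1) can be checked on toy data. This sketch uses Python's sqlite3 module rather than Hive; the rows are invented, table A is cut down to two columns, its key column is renamed a_key, and the aliases b_key2 and c_key2 are added only to keep the output column names distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a_key INTEGER, col1 TEXT);
    CREATE TABLE B (key1 INTEGER, key2 INTEGER);
    CREATE TABLE C (key1 INTEGER, key2 TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (1, 10), (2, 20);
    INSERT INTO C VALUES (10, 'c-ten');
""")

# Every row of A is kept; B and C columns are NULL where the chain breaks.
joined = conn.execute("""
    SELECT A.a_key, A.col1, B.key2 AS b_key2, C.key2 AS c_key2
    FROM A
    LEFT OUTER JOIN B ON A.a_key = B.key1
    LEFT OUTER JOIN C ON B.key2 = C.key1
    ORDER BY A.a_key
""").fetchall()

for row in joined:
    print(row)
```

Row 2 matches B but not C, and row 3 matches neither, so both come back padded with NULLs as the question requires.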
