SQL Join based on three keys

The database is Teradata.
I have two tables that I am trying to join; their structures are shown below. When I join them I expect two rows of output but get four. What is the reason for this behavior? A join on three keys should uniquely identify a row, yet I still get four rows. Any help is appreciated.
TableA
Weekkey|segment|type|users
201501|1|A|100
201501|1|B|100
TableB
Weekkey|segment|type|revenue
201501|1|A|200
201501|1|B|200
When I join these two tables using the following query, I get the following result:
select a.*, b.revenue
from tablea a left join tableb b on a.weekkey=b.weekkey
and a.segment=b.segment
and a.type=b.type
Weekkey|segment|type|revenue|users
201501|1|A|200|100
201501|1|B|200|100
201501|1|A|200|100
201501|1|B|200|100

Using SQL Server, here are the DDL and sample data along with the query you posted. The output you describe does not occur here; with this data the query returns two rows.
create table #tablea
(
Weekkey int
, segment int
, type char(1)
, users int
)
insert #tablea
select 201501, 1, 'A', 100 union all
select 201501, 1, 'B', 100
create table #TableB
(
Weekkey int
, segment int
, type char(1)
, revenue int
)
insert #TableB
select 201501, 1, 'A', 200 union all
select 201501, 1, 'B', 200
select a.*
, b.revenue
from #tablea a
left join #tableb b on a.weekkey = b.weekkey
and a.segment = b.segment
and a.type = b.type
drop table #tablea
drop table #TableB
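One possible explanation for the four rows, which the question does not confirm, is that the real tables hold more than one row per (weekkey, segment, type) combination. The sketch below is a minimal illustration using Python's built-in sqlite3 module as a stand-in for Teradata; the second insert into tableb is a hypothetical duplicate load, not something shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (weekkey INT, segment INT, type TEXT, users INT);
    CREATE TABLE tableb (weekkey INT, segment INT, type TEXT, revenue INT);
    INSERT INTO tablea VALUES (201501, 1, 'A', 100), (201501, 1, 'B', 100);
    INSERT INTO tableb VALUES (201501, 1, 'A', 200), (201501, 1, 'B', 200);
""")

JOIN_SQL = """
    SELECT a.*, b.revenue
    FROM tablea a
    LEFT JOIN tableb b
      ON a.weekkey = b.weekkey AND a.segment = b.segment AND a.type = b.type
"""

# Lists every key combination that occurs more than once in tableb.
DUPLICATE_CHECK_SQL = """
    SELECT weekkey, segment, type, COUNT(*) AS n
    FROM tableb
    GROUP BY weekkey, segment, type
    HAVING COUNT(*) > 1
"""

# With one row per key on each side, the join yields one row per tablea row.
rows_unique = conn.execute(JOIN_SQL).fetchall()

# Hypothetical: the same two rows are loaded into tableb a second time.
conn.execute(
    "INSERT INTO tableb VALUES (201501, 1, 'A', 200), (201501, 1, 'B', 200)"
)
rows_duplicated = conn.execute(JOIN_SQL).fetchall()
duplicate_keys = conn.execute(DUPLICATE_CHECK_SQL).fetchall()

print(len(rows_unique), len(rows_duplicated))
print(duplicate_keys)
```

Running the duplicate check against each real table is a cheap way to test whether the three columns really are unique there.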

Related

sql join clause is not returning expected results with arabic text

I have been working on this problem for quite a while. I have 2 tables with the same structure, as follows:
registerationNumber int, CompanyName nvarchar, areaName nvarchar,phoneNumber int , email nvarchar, projectStatus nvarchar
All nvarchar columns contain Arabic text except the email column.
tableA contains 675 rows and tableB contains 397 rows, all of which also exist in tableA.
What I am trying to do is select the non-matching rows from tableA.
There should be 675 - 397 = 278 such rows.
Every time I run the query, all rows are returned.
The join I am writing looks like this:
select a.registerationNumber
from tableA a left outer join tableB b
on a.registerationNumber = b.registerationNumber
but I am not getting the expected result. I tried all types of joins and got the same outcome.
I created a sample database, inserted English data into the tables, and it worked fine with the following query:
select * from tblAllProjects a right join tblhalfProjects h
on a.registerationNumber = h.registerationNumber
which means that I am writing the correct syntax.
I know that I should use the following syntax when selecting Arabic text:
Select * from tableA where CompanyName like N'arabic_text'
Does anyone know what the problem might be?
I meant to say that you should do:
select a.registerationNumber
from tableA a left outer join tableB b
on a.registerationNumber = b.registerationNumber
where b.registerationNumber is NULL
This should select every registerationNumber in tableA that has no match in tableB.
Another way to write this (but probably slower) is:
select a.registerationNumber
from tableA a
where a.registerationNumber not in (
select b.registerationNumber
from tableB )
You can use this second approach if the subquery returns only a few records.
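The two forms above can be compared side by side. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the asker's database; the three sample rows and the NULL row added later are made up for illustration. Because the join compares an integer column, the Arabic text in the other columns plays no part in the matching. The sketch also shows a known caveat of NOT IN: once the subquery returns a NULL, NOT IN returns no rows at all, while the LEFT JOIN form is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (registerationNumber INT, CompanyName TEXT);
    CREATE TABLE tableB (registerationNumber INT, CompanyName TEXT);
    INSERT INTO tableA VALUES (1, 'شركة أ'), (2, 'شركة ب'), (3, 'شركة ج');
    INSERT INTO tableB VALUES (1, 'شركة أ'), (3, 'شركة ج');
""")

# Anti-join: keep tableA rows for which the outer join found no partner.
LEFT_JOIN_SQL = """
    SELECT a.registerationNumber
    FROM tableA a
    LEFT OUTER JOIN tableB b ON a.registerationNumber = b.registerationNumber
    WHERE b.registerationNumber IS NULL
"""

NOT_IN_SQL = """
    SELECT a.registerationNumber
    FROM tableA a
    WHERE a.registerationNumber NOT IN (
        SELECT b.registerationNumber FROM tableB b)
"""

unmatched_join = conn.execute(LEFT_JOIN_SQL).fetchall()
unmatched_not_in = conn.execute(NOT_IN_SQL).fetchall()

# Hypothetical: a row with a NULL key ends up in tableB.
conn.execute("INSERT INTO tableB VALUES (NULL, NULL)")
join_with_null = conn.execute(LEFT_JOIN_SQL).fetchall()
not_in_with_null = conn.execute(NOT_IN_SQL).fetchall()

print(unmatched_join, unmatched_not_in)
print(join_with_null, not_in_with_null)
```

If the key column can contain NULLs, the LEFT JOIN form (or NOT EXISTS) is the safer choice.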

Is it possible to join a column from Table B to Table A without creating a new table C in Hive SQL?

I'm working with very large tables in Hive so I'd like to avoid having to create a whole new table when joining a single column from Table 2 to Table 1.
My first pass using the INSERT and UPDATE statements with the following test data didn't work.
Is there a way to do this or is it simpler to just create Table 3 by joining Table 1 to Table 2 and then dropping Table 1?
DROP TABLE IF EXISTS table_1;
CREATE TABLE table_1 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64));
INSERT INTO TABLE table_1
VALUES ('A0001', 1000, 'A1'), ('A0001', 2000, 'B1'), ('A0001', 3000, 'B1'),
('B0001', 5000, 'A1'), ('B0001', 10000, 'B1'), ('B0001', 15000, 'C1'),
('C0001', 11000, 'B1'), ('C0001', 14000, 'C1'), ('C0001', 20000, 'C1');
DROP TABLE IF EXISTS table_2;
CREATE TABLE table_2 (id VARCHAR(64), prodt_cd VARCHAR(64));
INSERT INTO TABLE table_2
VALUES ('A0001', 'OAP'), ('B0001', 'OAPIN'), ('C0001', 'MOAPIN');
INSERT INTO TABLE table_1 prodt_cd VARCHAR(64);
UPDATE table_1 t1 SET t1.prodt_cd = t2.prodt_cd
INNER JOIN table_2 t2
ON t1.id = t2.id;
After some research and help from Mike67, I found a solution.
It appears Hive does not support UPDATE or MERGE statements on ordinary (non-transactional) tables, but a simple alternative is to create an empty table and then populate it with fields from a join:
DROP TABLE IF EXISTS table_3;
CREATE TABLE table_3 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64), prodt_cd VARCHAR(64));
INSERT INTO TABLE table_3
SELECT a.*, b.prodt_cd
FROM table_1 AS a
LEFT OUTER JOIN table_2 AS b
ON a.id = b.id;
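The create-then-populate approach can be checked on the question's test data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive, so it only illustrates the SQL shape, not Hive's behavior or performance. Note that table_3 is declared here with four columns so that it can hold prodt_cd in addition to the three columns of table_1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64));
    INSERT INTO table_1 VALUES
        ('A0001', 1000, 'A1'), ('A0001', 2000, 'B1'), ('A0001', 3000, 'B1'),
        ('B0001', 5000, 'A1'), ('B0001', 10000, 'B1'), ('B0001', 15000, 'C1'),
        ('C0001', 11000, 'B1'), ('C0001', 14000, 'C1'), ('C0001', 20000, 'C1');

    CREATE TABLE table_2 (id VARCHAR(64), prodt_cd VARCHAR(64));
    INSERT INTO table_2 VALUES
        ('A0001', 'OAP'), ('B0001', 'OAPIN'), ('C0001', 'MOAPIN');

    -- The target table carries the extra prodt_cd column.
    CREATE TABLE table_3 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64),
                          prodt_cd VARCHAR(64));
    INSERT INTO table_3
    SELECT a.*, b.prodt_cd
    FROM table_1 AS a
    LEFT OUTER JOIN table_2 AS b ON a.id = b.id;
""")

rows = conn.execute("SELECT id, cost, diag_cd, prodt_cd FROM table_3").fetchall()
# Distinct (id, prodt_cd) pairs, to confirm each id picked up its product code.
products = sorted({(row[0], row[3]) for row in rows})

print(len(rows))
print(products)
```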

Left outer join with 3 tables and subquery

Sorry for the late response.
For a key in table A, there may be two or more records in tables B and C. Another column in those tables holds a date value, which is what makes the records unique. I want to extract the record with the maximum date value, which is why I am using the MAX function. I know that the subquery I have coded should not be included in the ON clause, since it would do the filtering before the join. So what I ultimately want to know is how to express the MAX condition in the query.
Example:
Table A
Key - AAAAA
Table B:
Record 1
Key - AAAAA
Date - 2017-10-01
Record 2
Key - AAAAA
Date - 2017-10-05
I want only the record AAAAA/2017-10-05 to be selected from table B.
Basically, the records from table A where A.c3 = 'Y' should be extracted first (assume this gives 500 records).
Then these 500 records should be joined with tables B and C (left outer, so that all matching records are kept and the non-matching records have nulls in the columns from tables B and C).
In tables B and C, if more than one record is present with different dates, the one with the maximum date should be extracted.
Hence the final output should contain 500 records.
This is all you need for what you describe
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y'
These lines are causing your problem: they effectively turn your outer joins into inner joins.
AND B.C3 = (SELECT MAX(B3) FROM TABLE2 T1
WHERE T1.B1 = B.B1)
AND C.C3 = (SELECT MAX(C3) FROM TABLE3 T1
WHERE T1.C1 = C.C1)
If there is no match in B or C, then B.C3 and/or C.C3 will be NULL, and NULL cannot be equal to anything (or unequal to anything, for that matter).
What are you trying to accomplish with the above that you've not included in the question?
Just do it?
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y' and (B.B1 is null or C.C1 is null)
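One way to keep every flagged row from table A while still picking only the latest-dated row from B and C is to put the MAX condition in the ON clause instead of the WHERE clause. This is not taken from the answers above; it is a minimal sketch using Python's built-in sqlite3 module, with hypothetical table and column names (table_a, table_b, table_c, k, flag, d) chosen for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (k TEXT, flag TEXT);
    CREATE TABLE table_b (k TEXT, d TEXT);
    CREATE TABLE table_c (k TEXT, d TEXT);
    INSERT INTO table_a VALUES ('AAAAA', 'Y'), ('BBBBB', 'Y'), ('CCCCC', 'N');
    INSERT INTO table_b VALUES ('AAAAA', '2017-10-01'), ('AAAAA', '2017-10-05');
    INSERT INTO table_c VALUES ('BBBBB', '2017-09-01');
""")

# MAX condition in WHERE: NULLs from unmatched rows fail the comparison,
# so the outer joins behave like inner joins.
FILTER_IN_WHERE = """
    SELECT a.k, b.d, c.d
    FROM table_a a
    LEFT JOIN table_b b ON b.k = a.k
    LEFT JOIN table_c c ON c.k = a.k
    WHERE a.flag = 'Y'
      AND b.d = (SELECT MAX(d) FROM table_b t WHERE t.k = b.k)
      AND c.d = (SELECT MAX(d) FROM table_c t WHERE t.k = c.k)
    ORDER BY a.k
"""

# MAX condition in ON: it only decides which B/C row may match,
# so unmatched table_a rows survive with NULLs.
FILTER_IN_ON = """
    SELECT a.k, b.d, c.d
    FROM table_a a
    LEFT JOIN table_b b
      ON b.k = a.k
     AND b.d = (SELECT MAX(d) FROM table_b t WHERE t.k = a.k)
    LEFT JOIN table_c c
      ON c.k = a.k
     AND c.d = (SELECT MAX(d) FROM table_c t WHERE t.k = a.k)
    WHERE a.flag = 'Y'
    ORDER BY a.k
"""

rows_where = conn.execute(FILTER_IN_WHERE).fetchall()
rows_on = conn.execute(FILTER_IN_ON).fetchall()

print(rows_where)
print(rows_on)
```

If two rows for the same key share the maximum date, this form returns both of them, so a tie-breaker would be needed to guarantee exactly one output row per key.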

Informix one to many format issue

I am trying to fix the format of my Informix query results for a one-to-many relationship. My current query uses a JOIN, but it creates a new line every time there is a match on the JOIN ON condition. I should add that the data below is only an example; the real data has thousands of entries with about 100 unique "category" entries, so I can't hard-code WHERE clauses. The query needs to read each entry and add the category if there is a match. I tried GROUP_CONCAT, but it just returned an error, so I guess it is not an Informix function. I also tried reading this thread but have not yet been able to get it working: Show a one to many relationship as 2 columns - 1 unique row (ID & comma separated list)
Any help will be appreciated.
IBM/Informix-Connect Version 3.70.UC4
IBM/Informix LIBGLS LIBRARY Version 5.00.UC5
IBM Informix Dynamic Server Version 11.70.FC8W1
Tables
movie
name rating movie_id
rio g 1
horton g 2
blade r 3
lotr_1 pg13 4
lotr_2 pg13 5
paul_blart pg 6
category
cat_name id
kids 1
comedy 2
action 3
fantasy 4
category_member
movie_name cat_name catmem_id
lotr_1 action 1
lotr_1 fantasy 2
rio kids 3
rio comedy 4
When I use
#!/bin/bash
echo "SET isolation dirty read;
UNLOAD to /export/home/movie/movieDetail.unl DELIMITER ','
SELECT a.name, a.rating, b.cat_name
FROM movie a
LEFT JOIN category b ON b.movie_name = a.name
;" | dbaccess thedb;
What I get is
rio,g,kids
rio,g,comedy
lotr_1,pg13,action
lotr_1,pg13,fantasy
What I would like is
rio,g,kids,comedy
lotr_1,pg13,action,fantasy
Install the GROUP_CONCAT user-defined aggregate
You must install the GROUP_CONCAT user-defined aggregate from SO 715350 (referenced in your question) into your database. The GROUP_CONCAT aggregate is not defined by Informix, but can be added if you use the SQL from that question. One difference between that and a normal built-in function is that you need to install the aggregate in each database in the server where you need to use it. There might be a way to do a 'global install' (for all databases in a given server), but I've forgotten (or, more accurately, never learned) how to do it.
Writing your queries
With the sample database listed at the bottom:
The query in the question does not run:
SELECT a.name, a.rating, b.cat_name
FROM movie a
LEFT JOIN category b ON b.movie_name = a.name;
SQL -217: Column (movie_name) not found in any table in the query (or SLV is undefined).
This can be fixed by changing category to category_member. This produces:
SELECT a.name, a.rating, b.cat_name
FROM movie a
LEFT JOIN category_member b ON b.movie_name = a.name;
rio g kids
rio g comedy
horton g
blade r
lotr_1 pg13 action
lotr_1 pg13 fantasy
lotr_2 pg13
paul_blart pg
The LEFT JOIN appears to be unwanted. And using GROUP_CONCAT produces approximately the desired answer:
SELECT a.name, a.rating, GROUP_CONCAT(b.cat_name)
FROM movie a
JOIN category_member b ON b.movie_name = a.name
GROUP BY a.name, a.rating;
rio g kids,comedy
lotr_1 pg13 action,fantasy
If you unload with the delimiter specified as a comma (','), the commas in the data produced by the GROUP_CONCAT aggregate will be escaped to avoid ambiguity:
SELECT a.NAME, a.rating, GROUP_CONCAT(b.cat_name)
FROM movie a
JOIN category_member b ON b.movie_name = a.NAME
GROUP BY a.NAME, a.rating;
rio,g,kids\,comedy
lotr_1,pg13,action\,fantasy
Within standard Informix utilities, there isn't a way to avoid that; they don't leave the selected/unloaded data in an ambiguous format.
I'm not convinced that the database schema is very well organized. The Movie table is OK; the Category table is OK; but the Category_Member table would be more orthodox if it used the schema:
DROP TABLE IF EXISTS category_member;
CREATE TABLE category_member
(
movie_id INTEGER NOT NULL REFERENCES Movie(Movie_id),
category_id INTEGER NOT NULL REFERENCES Category(Id),
PRIMARY KEY(movie_id, category_id)
);
INSERT INTO category_member VALUES(4, 3);
INSERT INTO category_member VALUES(4, 4);
INSERT INTO category_member VALUES(1, 1);
INSERT INTO category_member VALUES(1, 2);
-- Use GROUP_CONCAT
SELECT a.NAME, a.rating, GROUP_CONCAT(c.cat_name)
FROM movie a
JOIN category_member b ON b.movie_id = a.movie_id
JOIN category c ON b.category_id = c.id
GROUP BY a.NAME, a.rating;
The output from this query is the same as from the previous one, but the joining is more orthodox.
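The query against the normalized schema can be tried without an Informix installation. The sketch below uses Python's built-in sqlite3 module, which happens to ship its own group_concat aggregate; that built-in is only a stand-in for the user-defined Informix aggregate discussed above and may differ from it in details. SQLite does not promise the order of the concatenated values, so the sketch sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (
        name TEXT NOT NULL UNIQUE,
        rating TEXT NOT NULL,
        movie_id INTEGER PRIMARY KEY);
    CREATE TABLE category (
        cat_name TEXT NOT NULL UNIQUE,
        id INTEGER PRIMARY KEY);
    CREATE TABLE category_member (
        movie_id INTEGER NOT NULL REFERENCES movie(movie_id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (movie_id, category_id));
    INSERT INTO movie VALUES
        ('rio', 'g', 1), ('horton', 'g', 2), ('blade', 'r', 3),
        ('lotr_1', 'pg13', 4), ('lotr_2', 'pg13', 5), ('paul_blart', 'pg', 6);
    INSERT INTO category VALUES
        ('kids', 1), ('comedy', 2), ('action', 3), ('fantasy', 4);
    INSERT INTO category_member VALUES (4, 3), (4, 4), (1, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT a.name, a.rating, GROUP_CONCAT(c.cat_name)
    FROM movie a
    JOIN category_member b ON b.movie_id = a.movie_id
    JOIN category c ON b.category_id = c.id
    GROUP BY a.name, a.rating
""").fetchall()

# Map each movie to its sorted category list, independent of concat order.
categories = {name: sorted(cats.split(",")) for name, rating, cats in rows}

print(categories)
```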
Sample database
DROP TABLE IF EXISTS movie;
CREATE TABLE movie
(
name VARCHAR(20) NOT NULL UNIQUE,
rating CHAR(4) NOT NULL,
movie_id SERIAL NOT NULL PRIMARY KEY
);
INSERT INTO movie VALUES("rio", "g", 1);
INSERT INTO movie VALUES("horton", "g", 2);
INSERT INTO movie VALUES("blade", "r", 3);
INSERT INTO movie VALUES("lotr_1", "pg13", 4);
INSERT INTO movie VALUES("lotr_2", "pg13", 5);
INSERT INTO movie VALUES("paul_blart", "pg", 6);
DROP TABLE IF EXISTS category;
CREATE TABLE category
(
cat_name VARCHAR(10) NOT NULL UNIQUE,
id SERIAL NOT NULL PRIMARY KEY
);
INSERT INTO category VALUES("kids", 1);
INSERT INTO category VALUES("comedy", 2);
INSERT INTO category VALUES("action", 3);
INSERT INTO category VALUES("fantasy", 4);
DROP TABLE IF EXISTS category_member;
CREATE TABLE category_member
(
movie_name VARCHAR(20) NOT NULL,
cat_name VARCHAR(10) NOT NULL,
catmem_id SERIAL NOT NULL PRIMARY KEY
);
INSERT INTO category_member VALUES("lotr_1", "action", 1);
INSERT INTO category_member VALUES("lotr_1", "fantasy", 2);
INSERT INTO category_member VALUES("rio", "kids", 3);
INSERT INTO category_member VALUES("rio", "comedy", 4);

Left join with where clause not working

I was trying to get only selected rows from table A (not all rows) together with the rows from table B that match them, but the query returns only the rows that match in both tables and excludes the rest of the selected rows from table A.
I used this query:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and B.party_code=2
I have 2 tables:
table 1: A with 3 columns
CategoryName,CategoryCode(PK),ItemType
table 2: B with 3 columns
CategoryCode(FK),Discount,PartyCode(FK)(from another table)
NOTE: working in Access 2007
For non-matching rows from table B, party_code is NULL, so the condition B.party_code=2 in your WHERE clause is not satisfied and the row is not returned. You therefore need to filter the B records before joining. Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode and B.party_code=2
WHERE A.itemtype='F'
[EDIT] That doesn't work in Access. Next try:
You can create a saved query to do the filtering. Let's call it "B_filtered". This is just
SELECT * FROM B where party_code = 2
(You could make the "2" a parameter to make it more flexible).
Then, just use this query in your actual query.
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
[EDIT]
I just Googled it, and I think you can do this directly with a subquery:
SELECT A.CategoryName,B_filtered.discount
from A LEFT JOIN (SELECT * FROM B where party_code = 2) AS B_filtered ON A.CategoryCode = B_filtered.CategoryCode
WHERE A.itemtype='F'
What mlinth proposed is correct and would work in most other SQL dialects. The query below applies the same basic concept but uses a null condition.
Try:
SELECT A.CategoryName,B.discount
from A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
WHERE A.itemtype='F' and (B.party_code=2 OR B.party_code IS NULL)
If party_code is nullable, use B's key or another non-nullable column of B in the IS NULL test instead.
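The three variants discussed in this thread do not all return the same rows. The sketch below compares them using Python's built-in sqlite3 module as a stand-in for Access 2007, so it says nothing about which syntax Access accepts; the sample rows are made up. It uses the column name party_code from the queries rather than PartyCode from the table description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (CategoryName TEXT, CategoryCode TEXT PRIMARY KEY,
                    itemtype TEXT);
    CREATE TABLE B (CategoryCode TEXT, discount INT, party_code INT);
    INSERT INTO A VALUES ('Food', 'C1', 'F'), ('Fruit', 'C2', 'F'),
                         ('Fish', 'C4', 'F'), ('Tools', 'C3', 'T');
    -- C1 has a discount for party 2, C2 only for party 3, C4 has none.
    INSERT INTO B VALUES ('C1', 10, 2), ('C2', 5, 3);
""")

# Original query: the WHERE test on B discards every unmatched A row.
FILTER_IN_WHERE = """
    SELECT A.CategoryName, B.discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
    WHERE A.itemtype = 'F' AND B.party_code = 2
    ORDER BY A.CategoryName
"""

# Filter B first, then outer join: every 'F' row of A is kept.
FILTER_IN_SUBQUERY = """
    SELECT A.CategoryName, B_filtered.discount
    FROM A LEFT JOIN (SELECT * FROM B WHERE party_code = 2) AS B_filtered
      ON A.CategoryCode = B_filtered.CategoryCode
    WHERE A.itemtype = 'F'
    ORDER BY A.CategoryName
"""

# Null-condition variant: keeps unmatched A rows, but drops A rows whose
# only matches in B belong to another party.
FILTER_OR_NULL = """
    SELECT A.CategoryName, B.discount
    FROM A LEFT JOIN B ON A.CategoryCode = B.CategoryCode
    WHERE A.itemtype = 'F' AND (B.party_code = 2 OR B.party_code IS NULL)
    ORDER BY A.CategoryName
"""

rows_where = conn.execute(FILTER_IN_WHERE).fetchall()
rows_subquery = conn.execute(FILTER_IN_SUBQUERY).fetchall()
rows_or_null = conn.execute(FILTER_OR_NULL).fetchall()

print(rows_where)
print(rows_subquery)
print(rows_or_null)
```

In this data, 'Fruit' appears only in the subquery variant, because its single B row belongs to party 3 and therefore passes neither test in the null-condition form.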
