Join on Subquery: Google BigQuery - join

I am trying to join two tables:
select *
from (
  select *, STRING(ID) as ID_string
  from Dataset1.Table1
  where create_date >= 1388514600
) as A
left join each Dataset2.Table1 as B
on A.ID_string = B.ID
On running the above query, I get the following error:
Field 'ID_string' not found in table 'Dataset1.Table1'
Why is the join not recognizing the newly created column "ID_string"?

Solution: Try the same query, but specifying each field by its name (instead of using *).
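As a rough illustration of the "name each field" form, here is a minimal sketch run against SQLite from Python. SQLite only stands in for BigQuery here, so CAST(... AS TEXT) replaces STRING() and a plain LEFT JOIN replaces LEFT JOIN EACH; the table names, the name and label columns, and the sample rows are all made up for the example.

```python
import sqlite3

# In-memory stand-ins for Dataset1.Table1 and Dataset2.Table1 (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, create_date INTEGER, name TEXT);
    CREATE TABLE table_b (id TEXT, label TEXT);
    INSERT INTO table_a VALUES (1, 1388514600, 'first'),
                               (2, 1388514500, 'too old'),
                               (3, 1388600000, 'third');
    INSERT INTO table_b VALUES ('1', 'one'), ('2', 'two');
""")

# Every field is listed by name, both in the subquery and in the outer SELECT,
# so the computed id_string column is explicitly part of the subquery's output.
query = """
    SELECT a.id, a.name, a.id_string, b.label
    FROM (
        SELECT id, name, CAST(id AS TEXT) AS id_string
        FROM table_a
        WHERE create_date >= 1388514600
    ) AS a
    LEFT JOIN table_b AS b ON a.id_string = b.id
    ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The row with id 3 has no partner in table_b, so its label comes back as NULL (None in Python), which is the usual left-join behaviour.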

Related

alias column names without mentioning column name

I'm trying to get all columns from each table with a prefix in the output, without mentioning all column names specifically in the select statement. Like:
SELECT *
FROM TABLE1 as T1
FULL JOIN TABLE2 as T2
ON T1.number=T2.number
I would like every column from table1 and table2 to come back prefixed with "T1" and "T2" respectively.
Many thanks in advance!
You can generate the prefixed column names from INFORMATION_SCHEMA.COLUMNS:
SELECT
CONCAT('T1', COLUMN_NAME), ORDINAL_POSITION
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'TABLE1'
UNION
SELECT CONCAT('T2', COLUMN_NAME), ORDINAL_POSITION
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'TABLE2'
ORDER BY 2
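The catalog query only produces the prefixed names; they still have to be turned into a SELECT list. A minimal sketch of that step, assuming SQLite from Python: PRAGMA table_info plays the role of INFORMATION_SCHEMA.COLUMNS, the helper prefixed_columns is hypothetical, and a LEFT JOIN replaces the FULL JOIN because older SQLite versions do not support it.

```python
import sqlite3

def prefixed_columns(conn, table, alias):
    """Return 'alias."col" AS "alias_col"' items for every column of a table.

    Hypothetical helper: the table name is interpolated into the PRAGMA,
    so only pass trusted identifiers.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [f'{alias}."{c}" AS "{alias}_{c}"' for c in cols]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (number INTEGER, name TEXT);
    CREATE TABLE table2 (number INTEGER, city TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'x');
""")

# Build the SELECT list from the catalog instead of typing each column.
select_list = ", ".join(
    prefixed_columns(conn, "table1", "T1") + prefixed_columns(conn, "table2", "T2")
)
sql = (
    f"SELECT {select_list} "
    "FROM table1 AS T1 LEFT JOIN table2 AS T2 ON T1.number = T2.number "
    "ORDER BY T1.number"
)

cur = conn.execute(sql)
headers = [d[0] for d in cur.description]
rows = cur.fetchall()
print(headers)
print(rows)
```

On other engines the same idea would read the column names from INFORMATION_SCHEMA.COLUMNS and run the generated statement as dynamic SQL.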

How to fix DF-JOIN-002 Error in Azure Data Factory (Only two Join conditions allowed)

I have a data flow with a Union on two tables then joining the results of the Union to another table. I keep receiving the following error when I try debugging the pipeline or previewing the data.
DF-JOIN-002 at Join 'Join1'(Line 40/Col 26): Only 2 join condition(s) allowed
I'm basically trying to build a pipeline to automate this query:
SELECT DISTINCT k.acct_id, s.Id, Email, FirstName, LastName FROM table_3 s
INNER JOIN
( (SELECT acct_id, event_date FROM table_1)
UNION (SELECT acct_id, event_date FROM table_2)) k
ON k.acct_id = s.Archtics_acct_id__c
WHERE event_date = 'xxxx-xx-xx'
I figured it out after some time. I had to delete the Join activity and add it again. It was still linked to another source, even though only two sources were selected in the join settings.

LEFT JOIN using _PARTITIONDATE

I'm currently using Standard SQL in BigQuery. I tried to join two tables, one of which is partitioned by day and exposes the _PARTITIONDATE pseudo-column.
I tried to use this query below:
SELECT
DISTINCT DATE(create_time) AS date,
user_id,
city_name,
transaction_id,
price
FROM
table_1 a
LEFT JOIN (SELECT user_id, city_name FROM table_2) b
ON (a.user_id = b.user_id AND DATE(create_time) = _PARTITIONDATE)
I've used this kind of JOIN (on _PARTITIONDATE) before and it worked, but for this particular query I got an error message:
Unrecognized name: _PARTITIONDATE
Can anyone tell me why this happened, and how could I solve this? Thanks in advance.
The issue is that the subquery on table_2 does not select the _PARTITIONDATE field, so the outer query can't see it:
SELECT user_id, city_name FROM table_2
In order to solve it you can add it as follows:
SELECT
DISTINCT DATE(create_time) AS date,
user_id,
city_name,
transaction_id,
price
FROM
table_1 a
LEFT JOIN (SELECT _PARTITIONDATE AS pd, user_id, city_name FROM table_2) b
ON (a.user_id = b.user_id AND DATE(create_time) = pd)
Note that you'll need an alias such as pd, because _PARTITIONDATE is a pseudo-column.
It probably worked in the past because you were joining the two tables directly, as below (you don't get selectivity benefits in that case):
FROM
table_1 a
LEFT JOIN table_2 b
ON (a.user_id = b.user_id AND DATE(create_time) = _PARTITIONDATE)

Left outer join with 3 tables and subquery

Sorry for the late response.
For a key in table A, there may be two or more records in tables B and C. Another column in those tables holds a date value, which is what makes the records unique. I want to extract the record that has the maximum date value, and that's why I am using the MAX function. I know that the subquery I have coded should not be included in the ON clause, and that it would do the filtering before the join. So ultimately I want to know how to express the MAX condition in the query.
Example:
Table A
Key - AAAAA
Table B:
Record 1
Key - AAAAA
Date - 2017-10-01
Record 2
Key - AAAAA
Date - 2017-10-05
I want only the record AAAAA/2017-10-05 to be selected from table B.
Basically records from table A where A.c3 = 'Y' should be extracted first (assume it gives 500 records)
Then join these 500 records with tables B and C (left outer, to have all the matching records and the non-matching records should have nulls in the columns from the tables B and C)
In tables B and C, if more than 1 record present with different dates, the maximum date field should be extracted.
Hence final output should contain 500 records.
This is all you need for what you describe
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y'
These lines are causing your problem; they effectively turn your outer joins into inner joins.
AND B.C3 = (SELECT MAX(B3) FROM TABLE2 T1
WHERE T1.B1 = B.B1)
AND C.C3 = (SELECT MAX(C3) FROM TABLE3 T1
WHERE T1.C1 = C.C1)
If there's no match in B or C, then B.C3 and/or C.C3 will be NULL, and NULL can't be equal to anything (or not equal to anything, for that matter), so those rows fail the WHERE condition and are dropped.
What are you trying to accomplish with the above that you've not included in the question?
Just do it?
SELECT A.A1, A.A2, B.B1, B.B2, C.C1, C.C2
FROM TABLE1 A
LEFT OUTER JOIN TABLE2 B
ON A.A1 = B.B1
LEFT OUTER JOIN TABLE3 C
ON A.A1 = C.C1
WHERE A.C3 = 'Y' and (B.B1 is null or C.C1 is null)
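Neither answer shows how to keep only the latest record per key while still preserving unmatched rows from A. One common way, sketched here under assumptions, is to put the MAX condition in the ON clause of the outer join rather than in WHERE. The sketch runs on SQLite from Python, uses only tables A and B (C would follow the same pattern), and the column names k, flag and d are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k TEXT, flag TEXT);
    CREATE TABLE b (k TEXT, d TEXT);
    INSERT INTO a VALUES ('AAAAA', 'Y'), ('BBBBB', 'Y'), ('CCCCC', 'N');
    INSERT INTO b VALUES ('AAAAA', '2017-10-01'), ('AAAAA', '2017-10-05');
""")

# The "latest date" condition lives in the ON clause, so it only decides
# which b rows match; rows of a without a match are still kept with NULLs.
query = """
    SELECT a.k, b.d
    FROM a
    LEFT OUTER JOIN b
      ON a.k = b.k
     AND b.d = (SELECT MAX(t.d) FROM b AS t WHERE t.k = b.k)
    WHERE a.flag = 'Y'
    ORDER BY a.k
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the filter on A stays in WHERE. Key AAAAA returns its 2017-10-05 record and key BBBBB, which has no record in b, still appears with a NULL date.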

Error in Hive Query while joining tables

I am unable to get past the equality check with the Hive query below.
I have 3 tables that I want to join. I tried the query below, but I get this error:
FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN 'visit_date'
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 ON
( v3.AS_upc= t3.upc_no AND v3.start_dt <= t3.visit_date AND v3.end_dt >= t3.visit_date AND v3.adv_price <= t3.comp_price ) ) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
EDITED based on help from FuzzyTree:
1st:
We tried to edit the above query to use BETWEEN in a WHERE clause, but the query returned no output.
When we removed the BETWEEN condition on the dates, we got some output based on "v3.adv_price <= t3.comp_price", but without the date filter.
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where v3.adv_price <= t3.comp_price
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
2nd :
Next we tried to pass only one date as :
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where v3.adv_price <= t3.comp_price and v3.start_dt <= t3.visit_date
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
So now it shows some results, but if we pass both the start and end date filters, it does not show any results.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Only equality joins, outer joins, and left semi joins are supported in
Hive. Hive does not support join conditions that are not equality
conditions as it is very difficult to express such conditions as a
map/reduce job.
Try moving your inequalities to the where clause
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where t3.visit_date between v3.start_dt and v3.end_dt
and v3.adv_price <= t3.comp_price
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
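For an inner join, moving the inequalities from ON to WHERE should not change the result, only where the filter is written. A small sketch of that equivalence, using SQLite from Python as a stand-in (SQLite accepts both forms, so they can be compared side by side); the promo and visit tables and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promo (upc TEXT, start_dt TEXT, end_dt TEXT, adv_price REAL);
    CREATE TABLE visit (upc_no TEXT, visit_date TEXT, comp_price REAL);
    INSERT INTO promo VALUES ('u1', '2014-01-01', '2014-01-31', 5.0),
                             ('u2', '2014-02-01', '2014-02-28', 9.0);
    INSERT INTO visit VALUES ('u1', '2014-01-15', 6.0),
                             ('u1', '2014-03-01', 6.0),
                             ('u2', '2014-02-10', 8.0);
""")

# Form 1: equality and inequalities all in the ON clause
# (the shape the original Hive query used).
on_form = """
    SELECT p.upc, v.visit_date
    FROM promo p JOIN visit v
      ON p.upc = v.upc_no
     AND p.start_dt <= v.visit_date
     AND p.end_dt >= v.visit_date
     AND p.adv_price <= v.comp_price
    ORDER BY 1, 2
"""

# Form 2: only the equality stays in ON; the inequalities move to WHERE.
where_form = """
    SELECT p.upc, v.visit_date
    FROM promo p JOIN visit v
      ON p.upc = v.upc_no
    WHERE v.visit_date BETWEEN p.start_dt AND p.end_dt
      AND p.adv_price <= v.comp_price
    ORDER BY 1, 2
"""

on_rows = conn.execute(on_form).fetchall()
where_rows = conn.execute(where_form).fetchall()
print(on_rows)
print(where_rows)
```

The equivalence holds for inner joins only. With an outer join, a WHERE filter on the optional side would also discard the unmatched rows. The dates here are ISO-formatted strings, which compare correctly as text; other date formats may not.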
