KSQLDB multiple-column JOIN as struct

Question about a stream-table join over a multi-column key (struct).
My table:
CREATE TABLE table1 WITH (KEY_FORMAT='JSON', VALUE_FORMAT='JSON')
AS SELECT
STRUCT(x := s.col1, y:= s.col2) as K,
s.some_column
FROM mystream s
group by STRUCT(x := s.col1, y:= s.col2);
My stream:
CREATE STREAM joined WITH (KAFKA_TOPIC='joined_topic', KEY_FORMAT='kafka', VALUE_FORMAT='kafka', WRAP_SINGLE_VALUE=false)
AS SELECT
T.ID,
T.Doc
FROM `CDC_TOPIC` T
INNER JOIN table1 S ON S.K = STRUCT(x:= s.col1, y:= t.col2)
PARTITION BY T.ID;
I got the following error:
Invalid comparison expression 'STRUCT(x:=s.col1, y:=t.col2)' in join '(S.K = STRUCT(x:=col`, y:=col2))'. Each side of the join comparision must contain references from exactly one source
Is there a workaround to achieve a multi-column join across different sources?

Related

Join two datasets based on a flag and id

I am trying to join two datasets based on a flag and an id, for example:
proc sql;
create table demo as
select a.*,b.b1,b.b2
from table1 a
left join table2 b on
(a.flag=b.flag and a.id=b.id) or (a.flag ne b.flag and a.id=b.id);
quit;
This code seems to run forever and never produces any output.
Where the flag values match, I want to get the attributes from that row; where they do not, I want the attributes at the id level, so that we do not end up with blank values.
This join condition cannot be optimized. It is not good practice to use OR in a join condition. If you check your log, you'll see this:
NOTE: The execution of this query involves performing one or more Cartesian product joins
that can not be optimized.
Instead, transform your query to do a union:
proc sql;
create table demo as
select a.*,
b.b1,
b.b2
from table1 as a
left join
table2 as b
on a.flag=b.flag and a.id=b.id
UNION
select a.*,
b.b1,
b.b2
from table1 as a
left join
table2 as b
on a.flag ne b.flag and a.id=b.id
;
quit;
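As a rough check of this rewrite, here is a minimal sketch in Python, with an in-memory SQLite database standing in for SAS. The sample rows are made up, `<>` replaces `ne`, and no flag is missing. On this data the union of the two left joins returns every row of the OR join plus some extra rows with empty b1/b2, because each half keeps the table1 rows it could not match. If blank values are unwanted, those rows may need to be filtered out afterwards.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, flag TEXT);
    CREATE TABLE table2 (id INTEGER, flag TEXT, b1 TEXT, b2 TEXT);
    INSERT INTO table1 VALUES (1, 'Y'), (2, 'N'), (3, 'Y');
    INSERT INTO table2 VALUES (1, 'Y', 'p', 'q'), (2, 'Y', 'r', 's');
""")

# The original join with OR in the ON clause.
or_rows = set(conn.execute("""
    SELECT a.id, a.flag, b.b1, b.b2
    FROM table1 AS a
    LEFT JOIN table2 AS b
      ON (a.flag = b.flag AND a.id = b.id)
      OR (a.flag <> b.flag AND a.id = b.id)
""").fetchall())

# The rewrite as a UNION of two left joins.
union_rows = set(conn.execute("""
    SELECT a.id, a.flag, b.b1, b.b2
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.flag = b.flag AND a.id = b.id
    UNION
    SELECT a.id, a.flag, b.b1, b.b2
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.flag <> b.flag AND a.id = b.id
""").fetchall())

# Rows that only the UNION version produces (unmatched halves).
extra_rows = union_rows - or_rows
print(len(or_rows), len(union_rows), extra_rows)
```

Comparing the two result sets on a small sample like this is a cheap way to confirm the rewrite does what you expect before running it on the full tables.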

How to get the table name from a column in a join query with SQL Server and FIREDAC?

I am looking to get metadata from a TFDQuery (FireDAC).
I have this query:
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.Code = t2.code
I would like to get the column information (table name, the real column name in the table, ...).
I found this post: How to get the table name from a field in a join query with MSSQL? (mysql_field_table equivalent), but FireDAC does not expose the same structure.
As RBA already mentioned you have to enable ExtendedMetaData in the connection first. When done you can get the field column description via query.GetFieldColumn(field) and access the table and column name with its ActualOriginTabName and ActualOriginColName properties.
column := query.GetFieldColumn(field);
orgTableName := column.ActualOriginTabName;
orgColumnName := column.ActualOriginColName;

Ambiguous column error creating table in Aster Studio 6.0

I am new to databases and am posting a problem from work. I am creating a table in Aster Studio 6.0, but got an error about an ambiguous column. I ran the same query in Teradata SQL Assistant and did not get an error.
I have six tables with millions of rows named EDW.SWIFTIQ_TRANS_DTL, EDW.SWIFTIQ_STORE, EDW.SWIFTIQ_PROD, EDW.STORE_XREF, EDW.TDLNX_STR_OUTLT, and EDW.SURV_CWC.
EDW represents the original database, but the columns were labeled with aliases.
I did a trim() on the VARCHAR columns for saving spool space. For the error about TDLNX_RTL_OUTLT_NBR, I performed an INNER JOIN on similar columns from two different tables. Doing a preview in SQL Assistant, there was a temporary table with only one column called TDLNX_RTL_OUTLT_NBR.
Here’s the SQL query:
CREATE TABLE public.table_name
DISTRIBUTE BY HASH (SRC_SYS_PROD_ID) AS (
SELECT * FROM load_from_teradata(
ON public.load_from_teradata_dummy
TDPID('database_name')
USERNAME('user_name')
PASSWORD('ss')
QUERY ('SELECT e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
FROM EDW.SWIFTIQ_TRANS_DTL d
INNER JOIN EDW.SWIFTIQ_STORE a
ON trim( a.SRC_SYS_STORE_ID) = trim(d.SRC_SYS_STORE_ID)
INNER JOIN EDW.SWIFTIQ_PROD p
ON trim(p.SRC_SYS_PROD_ID) = trim(d.SRC_SYS_PROD_ID)
and p.SRC_SYS_NM = d.SRC_SYS_NM
INNER JOIN EDW.STORE_XREF f
ON trim(f.SRC_SYS_STORE_ID) = trim(a.SRC_SYS_STORE_ID)
INNER JOIN EDW.TDLNX_STR_OUTLT e
ON trim(e.TDLNX_RTL_OUTLT_NBR)= trim(f.TDLNX_RTL_OUTLT_NBR)
INNER JOIN EDW.SURV_CWC g
ON g.SURV_CWC_WSLR_CUST_PARTY_ID = e.WSLR_CUST_PARTY_ID
WHERE TRANS_DT between ''2015-01-01'' and ''2015-03-31''')
num_instances('4') ) );
ERROR: column reference 'TDLNX_RTL_OUTLT_NBR' is ambiguous.
EDIT: Forgot to include a description about the table aliases. a stands for EDW.SWIFTIQ_STORE, p for EDW.SWIFTIQ_PROD, f for EDW.STORE_XREF, e for EDW.TDLNX_STR_OUTLT, g for EDW.SURV_CWC, and d for EDW.SWIFTIQ_TRANS_DTL.
You will get the same error when you try CREATE TABLE AS SELECT in Teradata. Several column names, SRC_SYS_NM, SRC_SYS_PROD_ID, SRC_SYS_STORE_ID and TDLNX_RTL_OUTLT_NBR (the one named in the error), are used multiple times (with different table aliases) within the SELECT.
Add column aliases to make those names unique, e.g. trans_SRC_SYS_NM instead of d.SRC_SYS_NM.
Additionally, the TRIMs in the joins are a very bad idea. You will probably not save much spool, but you will force the optimizer to redistribute all spools for join preparation.
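To illustrate why the table alias alone does not help, here is a small sketch in Python using SQLite as a stand-in (not Aster or Teradata). The two tables are cut-down, hypothetical versions of the ones in the question, and `store_SRC_SYS_NM` is an alias invented for the example. The table qualifier is not part of the output column name, so the unaliased SELECT produces two result columns with the same name, which a CREATE TABLE AS cannot turn into table columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trans_dtl (SRC_SYS_STORE_ID TEXT, SRC_SYS_NM TEXT);
    CREATE TABLE store (SRC_SYS_STORE_ID TEXT, SRC_SYS_NM TEXT);
""")


def output_names(sql):
    """Return the result-set column names of a SELECT statement."""
    return [col[0] for col in conn.execute(sql).description]


# Qualified with d. and a., but no column aliases.
unaliased = output_names("""
    SELECT d.SRC_SYS_NM, a.SRC_SYS_NM
    FROM trans_dtl d
    JOIN store a ON a.SRC_SYS_STORE_ID = d.SRC_SYS_STORE_ID
""")

# Same query with explicit column aliases.
aliased = output_names("""
    SELECT d.SRC_SYS_NM AS trans_SRC_SYS_NM,
           a.SRC_SYS_NM AS store_SRC_SYS_NM
    FROM trans_dtl d
    JOIN store a ON a.SRC_SYS_STORE_ID = d.SRC_SYS_STORE_ID
""")

print(unaliased)  # both output columns share one name
print(aliased)    # names are now unique
```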

How to get return rowcount from stored procedure - SQL Server 2012

I am writing a stored procedure in SQL Server 2012 and am having trouble reading the number of rows it returns after all the conditions and join criteria are applied.
My stored procedure is:
SELECT DISTINCT
COUNT(crs.CourseId) OVER() AS Recordcounts,
crs.CourseId,
crs.CourseName,
crs.CourseDescription,
(SELECT CourseGroupName FROM CourseGroup cgrp
WHERE cgrp.CourseGroupId = crs.CourseGroupId) AS Category
FROM
Courses crs
INNER JOIN
CourseRequests creq ON crs.CourseId = creq.CourseId
WHERE
crs.Coursename <> ''''
It returns 16 as "Recordcounts" for one of the conditions, but the actual result contains only 3 rows.
Can anybody help me with this?
Thanks
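The window function is evaluated before DISTINCT is applied, so COUNT(...) OVER() counts every joined row rather than the distinct rows. Apply DISTINCT first in a CTE and count afterwards. Try this: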
;with cte as(
SELECT distinct
crs.CourseId,
crs.CourseName,
crs.CourseDescription,
(SELECT CourseGroupName FROM CourseGroup cgrp
WHERE cgrp.CourseGroupId = crs.CourseGroupId) AS Category
FROM
Courses crs
INNER JOIN
CourseRequests creq ON crs.CourseId = creq.CourseId
WHERE
crs.Coursename <> '''')
Select *, (select COUNT(CourseId) from cte) AS Recordcounts
from cte
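The difference can be reproduced on a small scale. The following is a minimal sketch in Python using an in-memory SQLite database rather than SQL Server (it assumes SQLite 3.25 or later for window function support). The tables are trimmed to two columns and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses (CourseId INTEGER, CourseName TEXT);
    CREATE TABLE CourseRequests (CourseId INTEGER);
    INSERT INTO Courses VALUES (1, 'Math'), (2, 'Art');
    -- Five requests spread over two courses, so the join yields five rows.
    INSERT INTO CourseRequests VALUES (1), (1), (1), (2), (2);
""")

# Window count combined with DISTINCT: the count sees all joined rows.
windowed = conn.execute("""
    SELECT DISTINCT COUNT(crs.CourseId) OVER () AS Recordcounts,
           crs.CourseId, crs.CourseName
    FROM Courses crs
    INNER JOIN CourseRequests creq ON crs.CourseId = creq.CourseId
""").fetchall()

# DISTINCT first inside a CTE, then count the rows of the CTE.
with_cte = conn.execute("""
    WITH cte AS (
        SELECT DISTINCT crs.CourseId, crs.CourseName
        FROM Courses crs
        INNER JOIN CourseRequests creq ON crs.CourseId = creq.CourseId
    )
    SELECT (SELECT COUNT(CourseId) FROM cte) AS Recordcounts, cte.*
    FROM cte
""").fetchall()

print(windowed)
print(with_cte)
```

Both queries return two rows here, but only the CTE version reports a count that matches the number of rows returned.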

Hive Join returning zero records

I have two Hive tables and I am trying to join both of them. The tables are not clustered or partitioned by any field. Though the tables contain records for common key fields, the join query always returns 0 records. All the data types are 'string' data types.
The join query is simple and looks something like below
select count(*) cnt
from
fsr.xref_1 A join
fsr.ipfile_1 B
on
(
A.co_no = B.co_no
)
;
Any idea what could be going wrong? I have just one record (same value) in both the tables.
Below are my table definitions
CREATE TABLE xref_1
(
co_no string
)
clustered by (co_no) sorted by (co_no asc) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
CREATE TABLE ipfile_1
(
co_no string
)
clustered by (co_no) sorted by (co_no asc) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Hi, you are using a star schema join. Please write your query like this:
SELECT COUNT(*) cnt FROM A a JOIN B b ON (a.key1 = b.key1);
If you still have the issue, then use MAPJOIN:
set hive.auto.convert.join=true;
select count(*) from A join B on (key1 = key2);
