How to define a cluster key based on an expression in Snowflake?

According to the Snowflake documentation (https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html), a cluster key can be defined as one or more table columns or expressions.
The example it gives is:
-- cluster by expressions
create or replace table t2 (c1 timestamp, c2 string, c3 number) cluster by (to_date(c1), substring(c2, 0, 10));
I want to extract the year, month and day from a date column and create a cluster key based on those expressions, but I have not found a way that works.
This is what I have tried so far:
CREATE TABLE TBL_DATECREATED (DATECREATED_UTC)
CLUSTER BY (
TO_DATE(DATECREATED_UTC)
)
AS
SELECT DATECREATED_UTC FROM BASETABLE_CONTACTS
Result:
SQL compilation error: invalid type [TO_DATE(TBL_DATECREATED.DATECREATED_UTC)] for parameter 'TO_DATE'
Note: SELECT TO_DATE(DATECREATED_UTC) FROM BASETABLE_CONTACTS works fine!
CREATE MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC, EMAILADDRESS)
CLUSTER BY ( year(DATECREATED_UTC)
-- extract(year from DATECREATED_UTC)
,EMAILADDRESS
)
AS
SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS
Result:
SQL compilation error: Function EXTRACT does not support UNKNOWN argument type
(For the commented-out expression I received the same error message.)
CREATE MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC, EMAILADDRESS)
CLUSTER BY ( DATECREATED_UTC
,substring(EMAILADDRESS, 1, 3)
)
AS
SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS
Result:
SQL compilation error: error line 3 at position 14 Invalid argument types for function 'SUBSTRING': (UNKNOWN, NUMBER(1,0), NUMBER(1,0))
Thanks in advance for any suggestions or solutions!

Try the following. When you define the clustering key at the same time as you create the table, Snowflake may not be able to determine the data type of the column properly.
CREATE MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC timestamp, EMAILADDRESS varchar)
CLUSTER BY ( year(DATECREATED_UTC)
-- extract(year from DATECREATED_UTC)
,EMAILADDRESS
)
AS
SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS

For the first error, try adding a datatype to the create table. For example:
CREATE TABLE TBL_DATECREATED (DATECREATED_UTC timestamptz)
For the second and third issues, check that the data types are what you're expecting.

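Use the statements below to define a cluster key based on an expression for a materialized view. Either apply the expression in the SELECT and cluster by the resulting column (first statement), or create the view with the plain columns and then add the expression-based cluster key with ALTER (second and third statements).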
CREATE or replace MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC,EMAILADDRESS) cluster by(DATECREATED_UTC,EMAILADDRESS ) AS SELECT to_date(DATECREATED_UTC), EMAILADDRESS FROM BASETABLE_CONTACTS;
CREATE or replace MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC,EMAILADDRESS) cluster by(DATECREATED_UTC,EMAILADDRESS ) AS SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS;
alter materialized view MV_DATECREATED cluster by(TO_DATE(DATECREATED_UTC),EMAILADDRESS );

Related

How to pass a table name as a parameter in BigQuery procedure?

I am trying to build a BigQuery stored procedure to which I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions and tried writing an EXECUTE IMMEDIATE version of the above, but I was not able to work out the right syntax.
I think the issue is that the SELECT statement in my code selects multiple columns (XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT) while the EXECUTE IMMEDIATE setup is meant to work for only one column, but I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.

I hope the following helps.
Pass the parameter as a string (single quotes), not as a backtick-quoted identifier:
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
Use EXECUTE IMMEDIATE, since a table name can't be parameterized as a variable in a query:
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
Escape each literal % in a format string with an additional %:
LIKE 'U%'
-->
LIKE 'U%%'
See also the question "PARSE_DATE not working in FORMAT() in BigQuery".
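The two string-level fixes in this answer (substituting the table name through %s and doubling the literal % of the LIKE pattern) can be checked outside BigQuery. The following is only a sketch in Python, whose printf-style % operator treats %s and %% the same way for this template. The helper name build_query and the identifier check are illustrative assumptions, not part of the original answer.

```python
import re

# Template mirroring the EXECUTE IMMEDIATE FORMAT(...) string from the answer.
TEMPLATE = """
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN (
  SELECT DISTINCT HH_ID, ECR_PRTY_ID
  FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
  WHERE HH_ID NOT LIKE 'U%%'
    AND ECR_PRTY_ID IS NOT NULL
) A
ON XX.HH_ID = A.HH_ID
"""

# Hypothetical safety check (an assumption of this sketch): accept only
# project.dataset.table built from letters, digits, underscores and hyphens,
# so the substituted value cannot break out of the surrounding backticks.
TABLE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_-]+")


def build_query(table_name: str) -> str:
    """Return the SQL text that would be handed to EXECUTE IMMEDIATE."""
    if not TABLE_NAME.fullmatch(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    # %s receives the table name; %% collapses to a single literal %.
    return TEMPLATE % table_name


if __name__ == "__main__":
    print(build_query(
        "analytics-mkt-cleanroom.MKT_DS."
        "Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample"
    ))
```

Printing the result shows the table name inside the backticks and LIKE 'U%' with a single percent sign, which is the text BigQuery would receive after FORMAT has run.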

bigquery sql table function with string interpolation

I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a table function (TVF) and use SET to start building the SQL to execute dynamically, I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I add those, I get various validation errors, basically because I would then need to remove the WITH clauses that define the CTEs (common table expressions), the semicolons, and so on.
But what I am really trying to do is using a table function:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);

This may not be fully satisfying, but since procedural language and DDL statements are currently not allowed inside table functions, one possible workaround is simply to use a PROCEDURE, as below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM %s ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');

how to get the integer column to integer list in informix

I want to read an integer column into an integer list:
create function somedum()
returning int
define s LIST(INTEGER NOT NULL);
select id into s from informix.emptest;
end function
create table emptest(id int)
insert into emptest(id)values(7)
When I execute the above function, I get an error (shown in the attached image).

The Informix documentation on inserting elements into a LIST seems misleading to me, as my first impression of the example ( Insert into a LIST ) also led me to use the SELECT INTO syntax and get the same error:
-9634 No cast from integer to set(integer not null)
The SELECT INTO syntax can be used to copy/insert an entire LIST, not elements into the LIST.
To insert an element into the LIST (or, more generally, to manipulate its elements) we need to use the Informix virtual table interface. In this case the TABLE syntax creates a virtual table from the LIST (a collection-derived table), which can then be used for the usual insert/select/update/delete operations.
CREATE FUNCTION somedum()
RETURNING LIST( INTEGER NOT NULL ) AS alist;
DEFINE my_list LIST( INTEGER NOT NULL );
INSERT INTO TABLE( my_list ) SELECT id FROM emptest;
RETURN my_list;
END FUNCTION;
Executing the function we get:
EXECUTE FUNCTION somedum();
alist LIST{7 }

esper how to use table data created by epl

I'm new to Esper and want to read data stored in tbl_config. Here are my Esper EPL files:
config.epl
module rms.config;
create table tbl_config(
id java.math.BigDecimal primary key,
time java.math.BigDecimal
);
create schema ConfigListEvent as (
id java.math.BigDecimal,
time java.math.BigDecimal
);
@Audit
@Name("LoadConfigDataFromDBRule")
insert into ConfigListEvent
select tbl.ID as id, tbl.time as time
from ImportDataEvent,
sql: rms ['select * from T_CONFIG'] as tbl;
@Audit
@Priority(1)
@Name("DeleteConfigDataRule")
on ConfigListEvent as evt
delete from tbl_config as tbl where evt.id = tbl.id;
@Audit
@Name("InsertConfigDataRule")
on ConfigListEvent
insert into tbl_config select *;
stat.epl
module rms.stat;
uses rms.config;
@Name("Create-PaymentContext")
create window PaymentWindow.win:time(2 hour) as PaymentRequest;
@Audit
@Name("insertPaymentRequest ")
@Priority(1)
insert into PaymentWindow select * from PaymentRequest;
rule.epl
module rms.rule;
uses rms.config;
uses rms.stat;
@Audit
@Name("xxx")
@Description("check max times per IntervalTime")
on PaymentRequest as pay
select CustomUtil.getEndTime(pay.createTime,tbl_config["time"]) as startTime from PaymentWindow as payWindow;
The system then fails at startup with this error:
com.espertech.esper.epl.expression.core.ExprValidationException: Failed to validate method-chain parameter expression 'tbl_config["time"]': Incompatible type returned by a key expression for use with table 'tbl_config', the key expression '"time"' returns 'java.lang.String' but the table expects 'java.math.BigDecimal'
This has confused me for a few days. Thanks for any help!

The table has a key field "id" of type BigDecimal.
The expression tbl_config["time"], however, passes the string value "time" as the key rather than a BigDecimal value. Try tbl_config[id], assuming PaymentRequest has a field named 'id' of type BigDecimal.
The on-delete and on-insert statements in config.epl look a little awkward; on-merge would combine them into one easy-to-read statement.

Ambiguous column error creating table in Aster Studio 6.0

I am new to databases and am posting a problem from work. I am creating a table in Aster Studio 6.0, but got an error about an ambiguous column. I ran the same query in Teradata SQL Assistant and did not get an error.
I have six tables with millions of rows named EDW.SWIFTIQ_TRANS_DTL, EDW.SWIFTIQ_STORE, EDW.SWIFTIQ_PROD, EDW.STORE_XREF, EDW.TDLNX_STR_OUTLT, and EDW.SURV_CWC.
EDW represents the original database, but the columns were labeled with aliases.
I did a trim() on the VARCHAR columns for saving spool space. For the error about TDLNX_RTL_OUTLT_NBR, I performed an INNER JOIN on similar columns from two different tables. Doing a preview in SQL Assistant, there was a temporary table with only one column called TDLNX_RTL_OUTLT_NBR.
Here’s the SQL query:
CREATE TABLE public.table_name
DISTRIBUTE BY HASH (SRC_SYS_PROD_ID) AS (
SELECT * FROM load_from_teradata(
ON public.load_from_teradata_dummy
TDPID('database_name')
USERNAME('user_name')
PASSWORD('ss')
QUERY ('SELECT e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
FROM EDW.SWIFTIQ_TRANS_DTL d
INNER JOIN EDW.SWIFTIQ_STORE a
ON trim( a.SRC_SYS_STORE_ID) = trim(d.SRC_SYS_STORE_ID)
INNER JOIN EDW.SWIFTIQ_PROD p
ON trim(p.SRC_SYS_PROD_ID) = trim(d.SRC_SYS_PROD_ID)
and p.SRC_SYS_NM = d.SRC_SYS_NM
INNER JOIN EDW.STORE_XREF f
ON trim(f.SRC_SYS_STORE_ID) = trim(a.SRC_SYS_STORE_ID)
INNER JOIN EDW.TDLNX_STR_OUTLT e
ON trim(e.TDLNX_RTL_OUTLT_NBR)= trim(f.TDLNX_RTL_OUTLT_NBR)
INNER JOIN EDW.SURV_CWC g
ON g.SURV_CWC_WSLR_CUST_PARTY_ID = e.WSLR_CUST_PARTY_ID
WHERE TRANS_DT between ''2015-01-01'' and ''2015-03-31''')
num_instances('4') ) );
ERROR: column reference 'TDLNX_RTL_OUTLT_NBR' is ambiguous.
EDIT: Forgot to include a description about the table aliases. a stands for EDW.SWIFTIQ_STORE, p for EDW.SWIFTIQ_PROD, f for EDW.STORE_XREF, e for EDW.TDLNX_STR_OUTLT, g for EDW.SURV_CWC, and d for EDW.SWIFTIQ_TRANS_DTL.

You will get the same error when you try CREATE TABLE AS SELECT in Teradata. Several column names are used more than once (with different table aliases) within the SELECT: SRC_SYS_NM, SRC_SYS_PROD_ID and SRC_SYS_STORE_ID, as well as TDLNX_RTL_OUTLT_NBR (selected from both e and f), which is the one named in the error.
Add column aliases to make those names unique, e.g. trans_SRC_SYS_NM instead of d.SRC_SYS_NM.
Additionally, the TRIMs in the join conditions are a very bad idea. You will probably not save much spool, and you force the optimizer to redistribute all spools for join preparation.
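To see which output names collide, the select list from the question can be checked mechanically. The following is a rough sketch in Python. The helper names (parse, duplicate_names, with_aliases) and the alias scheme of prefixing the table alias are illustrative assumptions; the answer's own example, trans_SRC_SYS_NM, uses a descriptive prefix instead.

```python
from collections import Counter

# Select list copied from the query in the question.
SELECT_LIST = """
e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
"""


def parse(select_list):
    """Split 'alias.column' items into (alias, column) pairs."""
    items = [item.strip() for item in select_list.split(",") if item.strip()]
    return [tuple(item.split(".", 1)) for item in items]


def duplicate_names(columns):
    """Return the column names that occur more than once, ignoring case."""
    counts = Counter(name.upper() for _, name in columns)
    return sorted(name for name, n in counts.items() if n > 1)


def with_aliases(columns):
    """Give every duplicated column a unique alias; leave the others unchanged."""
    dupes = set(duplicate_names(columns))
    return [
        f"{t}.{c} AS {t}_{c}" if c.upper() in dupes else f"{t}.{c}"
        for t, c in columns
    ]


columns = parse(SELECT_LIST)

if __name__ == "__main__":
    print(duplicate_names(columns))
    print(",\n".join(with_aliases(columns)))
```

The second print produces a select list in which each repeated name carries an alias such as d_SRC_SYS_NM, so the target table no longer receives two columns with the same name.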
