I'm trying to define a DB2 Stored Proc that would (ideally), CREATE VIEW, then do a SELECT against that VIEW to build another piece of SQL, then execute that SQL using a CURSOR and return a result set.
I've 2 problems:
DB2 doesn't appear to like the mix of CREATE, SELECT and DECLARE CURSOR within a single SP,
and I can't figure out what syntax to use to declare a cursor based on SQL that is stored as a string in the declared VARCHAR that is the output from the SELECT statement.
Has anyone done anything similar and/or able to give me some syntax examples?
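Sure, you can do that in DB2. To execute a 'create view' inside a stored procedure you need to use dynamic SQL, and the same goes for the select on the view, because the view does not exist yet when the procedure is compiled. Finally, for the cursor, you declare a generic cursor over a statement name, prepare that statement dynamically, and then open the cursor.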
DECLARE SENTENCE VARCHAR(256);
DECLARE TABNAME VARCHAR(128);
DECLARE STMT STATEMENT;
DECLARE TABLES_CURSOR CURSOR
FOR TABLES_RS;
SET SENTENCE = 'CREATE VIEW TABS AS SELECT TABNAME FROM SYSCAT.TABLES';
PREPARE STMT FROM SENTENCE;
EXECUTE STMT;
SET SENTENCE = 'SELECT TABNAME FROM TABS';
PREPARE TABLES_RS FROM SENTENCE;
OPEN TABLES_CURSOR;
FETCH TABLES_CURSOR INTO TABNAME;
You can see real SQL PL examples in my project db2unit https://github.com/angoca/db2unit/blob/master/src/main/sql-pl/04-Body.sql
I am trying to build bigquery stored procedure where I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions here and here, and tried writing an EXECUTE IMMEDIATE version of the above, but I was not able to work out the right syntax.
I think the issue is that the SELECT statement in my code selects multiple columns (XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT) and the EXECUTE IMMEDIATE setup is meant to work only for one column. But I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.
Hope below is helpful.
Pass the table name as a string (single quotes) instead of a backtick-quoted identifier:
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
Use EXECUTE IMMEDIATE, since a table name can't be passed into a query as a variable:
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
Escape each literal % in the format string by doubling it:
LIKE 'U%'
-->
LIKE 'U%%'
see PARSE_DATE not working in FORMAT() in BigQuery
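The `%%` escape is not peculiar to BigQuery's FORMAT(); printf-style formatting in general treats a lone `%` as the start of a placeholder. Below is a minimal Python sketch of the same idea, assuming the statement is assembled on the client side. The helper name `build_step1_sql`, the shortened query and the regular expression for acceptable table names are my own assumptions for illustration, not BigQuery rules.

```python
import re

# Assumption: a qualified name looks like project.dataset.table (or
# dataset.table) and uses only letters, digits, underscores and hyphens.
_TABLE_NAME = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+){1,2}$")

# printf-style template: %s is the placeholder, %% is a literal percent sign.
TEMPLATE = (
    "SELECT DISTINCT HH_ID FROM `%s` "
    "WHERE HH_ID NOT LIKE 'U%%'"
)


def build_step1_sql(table_name: str) -> str:
    """Return the SQL text with the table name substituted in."""
    # Reject anything that does not look like a plain table name, because
    # the value is spliced into the SQL text rather than bound.
    if not _TABLE_NAME.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return TEMPLATE % (table_name,)


if __name__ == "__main__":
    print(build_step1_sql("my-project.MKT_DS.sample_table"))
```

Without the doubled `%%`, the formatter would try to read `%'` as a placeholder, which is the same kind of failure FORMAT() reports.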
I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
a INT64 filter for the WHERE clause,
a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a table function (TVF) and use SET to start dynamically building the SQL to execute, I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I try to add those, then I get various validation errors basically because I need to remove the WITH using CTEs (Common Table Expression), and semicolons ; etc.
But what I am really trying to do is using a table function:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);
This might not be satisfying, but since procedural language statements and DDL are currently not allowed inside table functions, one possible workaround is simply to use a PROCEDURE, like below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM `%s` ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
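The split between `%s` (table name) and `%d` (filter value) in the procedure reflects a general rule: identifiers have to be spliced into the SQL text, while plain values can usually be bound as parameters. Here is a rough, runnable illustration using Python's built-in sqlite3 module rather than BigQuery. The table `names_2002` and the helper `count_by_year` are made up for the demo.

```python
import re
import sqlite3


def count_by_year(conn: sqlite3.Connection, table_name: str, year: int):
    """Count rows for one year in a table whose name is only known at runtime."""
    # Identifiers cannot be bound with '?', so validate before splicing.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    sql = (
        f'SELECT year, COUNT(1) AS record_count FROM "{table_name}" '
        "WHERE year = ? GROUP BY year"
    )
    # The filter value is bound as a parameter instead of formatted in.
    return conn.execute(sql, (year,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names_2002 (name TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO names_2002 VALUES (?, ?)",
        [("Ann", 2002), ("Bob", 2002), ("Cy", 2001)],
    )
    return count_by_year(conn, "names_2002", 2002)


if __name__ == "__main__":
    print(demo())
```

In BigQuery the value side can be handled the same way with the USING clause of EXECUTE IMMEDIATE; the table name still has to go through FORMAT or CONCAT.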
Consider an enterprise that captures sensor data for different production facilities. Per facility, we create an aggregation query that averages the values into 5-minute timeslots. This query consists of a long list of WITH clauses and writes data to a table (called aggregation_table).
Now my problem: currently we have n queries running that apply exactly the same logic; the only things that differ are the table names (and sometimes column names, but let's ignore that for now).
Instead of managing n different scripts that are basically the same, I would like to put it in a stored procedure that is able to work like this:
CALL aggregation_query(facility_name) -> resolve the different tables for that facility and then use them in the different with clauses
On top of that, instead of having this long set of clauses that gives me the end result, I would like to chunk them up into logical blocks that can be parameterized. For example, if I call the aforementioned stored procedure for facility A, I want to be able to pass this table name to these different functions, where the output of each can be reused in the next statement (as you would with WITH clauses).
Another reason I want to chunk this up into reusable blocks is that we have many "derivatives" of this aggregation query, for example to manage historical data, to correct data, or to have the sensor data at another aggregation level. As these become overly complex, it is much easier to manage them without having to copy, paste and adjust them every time.
In the current setup, it is useful to know that I can only use plain BigQuery, as my team is not allowed to access the CI/CD, scheduling and repository tooling (meaning that I cannot solve the issue with CI/CD that deploys the n different versions of the procedure and functions).
So in the end, I would like to end up with something like this using only bigquery:
CREATE OR REPLACE PROCEDURE
`aggregation_function`()
BEGIN
DECLARE tablename STRING;
DECLARE active_table_name STRING;
## get list of tables
CREATE TEMP TABLE tableNames AS
SELECT
table_catalog,
table_schema,
table_name
FROM
`catalog.schema.INFORMATION_SCHEMA.TABLES`
WHERE
table_name = tablename;
WHILE (SELECT COUNT(*) FROM tableNames) >= 1 DO
## build dataset + table name
SET active_table_name = CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`');
## use concat to build string and execute
EXECUTE IMMEDIATE '''
INSERT INTO
`aggregation_table_for_facility` (timeslot, sensor_name, AVG_VALUE )
WITH
STEP_1 AS (
SELECT
*
FROM
my_table_function_step_1(active_table_name,
parameter1,
parameter2) ),
STEP_2 AS (
SELECT
*
FROM
my_table_function_step_2(STEP_1,
parameter1,
parameter2) )
SELECT * FROM STEP_2
'''
USING active_table_name as active_table_name;
DELETE
FROM
tableNames
WHERE
table_name = tablename;
END WHILE
;
END
;
I was hoping someone could make a snippet on how I can do this in Standard SQL / Bigquery, so basically:
stored procedure that takes in a string variable and is able to use that as a table (partly solved in the approach above, but not sure if there are better ways)
(table) function that is able to take this table_name parameter as well and return back a table that can be used in the next with clause (or alternatively writes to a temp table)
I think the code snippets below should give you some insight into working with procedures, inserts and EXECUTE IMMEDIATE statements.
Here I create a procedure that inserts values into a table looked up in the information schema. The OUT parameter active_table_name returns the value assigned inside the procedure.
CREATE OR REPLACE PROCEDURE `project-id.dataset`.custom_function(tablename STRING,OUT active_table_name STRING)
BEGIN
DECLARE query STRING;
SET active_table_name= (SELECT CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`')
FROM `project-id.dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename);
# a multiline query can be written using ''' or """
Set query =
'''
insert into %s (string_field_0,string_field_1,string_field_2,string_field_3,string_field_4,int64_field_5)
with custom_query as (
select string_field_0,string_field_2,'169 BestCity',string_field_3,string_field_4,55677 from %s limit 1
)
select * from custom_query;
''';
# queries must perform operations and must be the last thing to run
# pass parameters using format
execute immediate (format(query,active_table_name,active_table_name));
END
You can also use a loop to iterate through the records of a working table, so that it executes the procedure for each record and picks up the value returned by the procedure for use somewhere else, e.g. in a second procedure that performs a delete operation.
DECLARE tablename STRING;
DECLARE out_value STRING;
FOR record IN
(SELECT tablename from `my-project-id.dataset.table`)
DO
SET tablename = record.tablename;
-- call the procedure once per record; an inner LOOP without LEAVE would never terminate
call `project-id.dataset`.custom_function(tablename,out_value);
select out_value;
END FOR;
To recap, there are some restrictions, such as on calling procedures inside an EXECUTE IMMEDIATE or on using an EXECUTE IMMEDIATE inside another EXECUTE IMMEDIATE, to name a few. I think these snippets should help you deal with your current situation.
For this sample I use the following documentation:
Data Manipulation Language
Dealing with outputs
Information Schema Tables
Execute Immediate
For...In
Loops
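The overall pattern in this answer (look the tables up in the catalog, build one statement per table, run it in a loop) can be tried out locally. Below is a small sketch with Python's built-in sqlite3 module, where `sqlite_master` stands in for INFORMATION_SCHEMA.TABLES. The table names, the `sensor_` prefix and the `aggregate_all` helper are invented for illustration and are not part of the original setup.

```python
import sqlite3


def aggregate_all(conn: sqlite3.Connection, prefix: str) -> int:
    """Insert per-sensor averages from every table whose name starts with prefix."""
    # Look up matching tables in the catalog. substr() is used instead of
    # LIKE because '_' is a wildcard in LIKE patterns.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND substr(name, 1, ?) = ? ORDER BY name",
            (len(prefix), prefix),
        )
    ]
    for table in tables:
        # Table names must be spliced into the SQL text; quote them first.
        quoted = '"' + table.replace('"', '""') + '"'
        conn.execute(
            "INSERT INTO aggregation_table (source, sensor_name, avg_value) "
            f"SELECT ?, sensor_name, AVG(value) FROM {quoted} GROUP BY sensor_name",
            (table,),
        )
    return len(tables)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE aggregation_table (source TEXT, sensor_name TEXT, avg_value REAL);
        CREATE TABLE sensor_a (sensor_name TEXT, value REAL);
        CREATE TABLE sensor_b (sensor_name TEXT, value REAL);
        INSERT INTO sensor_a VALUES ('t1', 1.0), ('t1', 3.0);
        INSERT INTO sensor_b VALUES ('t2', 5.0);
        """
    )
    handled = aggregate_all(conn, "sensor_")
    rows = conn.execute(
        "SELECT source, sensor_name, avg_value FROM aggregation_table ORDER BY source"
    ).fetchall()
    return handled, rows


if __name__ == "__main__":
    print(demo())
```

The list of tables is materialized before the loop starts, so the inserts do not interfere with the catalog query.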
I need to write a procedure in Redshift that will write to a table, but the table name comes from the input string. Then I declare a variable that puts together the table name.
CREATE OR REPLACE PROCEDURE my_schema.data_test(current "varchar")
LANGUAGE plpgsql
AS $$
declare new_table varchar(50) = 'new_tab' || '_' || current;
BEGIN
select 'somestring' as colname into new_table;
commit;
END;
$$
This code runs but it doesn't create a new table, no errors. If I remove the declare statement then it works, creating a table called "new_table". It's just not using the declared variable name.
It's hard to find good examples because Redshift is postgresql and all the postgresql pages say that it only has functions, not procedures. But Redshift procedures were introduced last year and I don't see many examples.
Well, when you declare a variable "new_table" and then perform a SELECT ... INTO "new_table", the value is assigned to the variable "new_table" instead of creating a table. You will see that if you return your variable using an OUT parameter.
And when you remove the declaration, the statement simply works as the SELECT INTO syntax of Redshift SQL and creates a table.
Now to the solution:
Create a table using the CREATE TABLE AS...syntax.
Also you need to pass the value of declared variable, so use the EXECUTE command.
CREATE OR REPLACE PROCEDURE public.ct_tab (vname varchar)
AS $$
DECLARE tname VARCHAR(50):='public.swap_'||vname;
BEGIN
execute 'create table ' || tname || ' as select ''name''';
END;
$$ LANGUAGE plpgsql
;
Now if you call the procedure passing 'abc', a table named "swap_abc" will be created in public schema.
call public.ct_tab('abc');
Let me know if it helps :)
I have a stored procedure in Informix that uses external tables to unload data to a disk file from a select statement. Is it possible to give the DISK file name as a parameter to the stored procedure? My stored procedure is as follows:
create procedure spUnloadData(file_name_param varchar(64))
create temp table temp_1(
col_11 smallint
) with no log;
INSERT INTO temp_1 select col1 from data_table;
CREATE EXTERNAL TABLE temp1_ext
SAMEAS temp_1
USING (
--DATAFILES ("DISK:/home/informix/temp.dat")
DATAFILES("DISK:" || file_name_param )
);
INSERT INTO temp1_ext SELECT * FROM temp_1;
DROP TABLE temp1_ext ;
DROP TABLE temp_1;
END PROCEDURE;
I am trying to pass in the DISK filename as a parameter(from my shell script, timestamped).
Any help is appreciated.
NH
You would have to use Dynamic SQL in the stored procedure — for example, the EXECUTE IMMEDIATE statement.
You create a string containing the text of the SQL and then execute it. Adapting your code:
CREATE PROCEDURE spUnloadData(file_name_param VARCHAR(64))
DEFINE stmt VARCHAR(255); -- LVARCHAR might be safer
CREATE TEMP TABLE temp_1(
col_11 SMALLINT
) WITH NO LOG;
INSERT INTO temp_1 select col1 from data_table;
LET stmt = 'CREATE EXTERNAL TABLE temp1_ext ' ||
'SAMEAS temp_1 USING (DATAFILES("DISK:' ||
TRIM(file_name_param) ||
'"))';
EXECUTE IMMEDIATE stmt;
INSERT INTO temp1_ext SELECT * FROM temp_1;
DROP TABLE temp1_ext;
DROP TABLE temp_1;
END PROCEDURE;
Untested code — the concept should be sound, though.
This assumes you are using a reasonably current version of Informix; the necessary feature is in 12.10, and in some 11.70 versions too, I believe.
I made slight changes to my code to unload the data (as Informix default '|'-separated fields). Instead of using a temp table, I was able to select columns directly into an external table dynamically.