BigQuery SQL table function with string interpolation - stored procedures

I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING) given as a fully qualified name, e.g. project_id.dataset_name.table_name
The idea is to resolve the table name dynamically and apply the filter to slice the data, returning the result as a table.
However, if I try to write a table function (TVF) and use SET to start building the SQL to execute dynamically, then I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I try to add those, then I get various validation errors, basically because I would need to remove the WITH clause (the CTEs), the semicolons, and so on.
But what I am really trying to do, using a table function, is:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I have dynamically built the SQL string, I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as the final step of the table function to return the "dynamic table".
The thing is that:
I don't know the name of this table before runtime.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-yet-working SQL I am trying to get right. Note what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);

This might not be satisfying, but since procedural language statements and DDL are currently not allowed inside table functions, one possible workaround is to simply use a PROCEDURE, like below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM %s ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
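Applying the same pattern to the pivot query from the question might look roughly like the sketch below (untested; the procedure name is invented here, and the table and column names are the ones used in the question). Because the last statement the procedure executes is a SELECT, the pivoted rows come back as the result of the CALL, just as in the example above:
-- Sketch only: the same EXECUTE IMMEDIATE + FORMAT approach, applied to the question's pivot query.
CREATE OR REPLACE PROCEDURE `my_dataset.my_pivot_proc`(num_days INT64, fqn_org_table STRING)
BEGIN
  EXECUTE IMMEDIATE FORMAT("""
    WITH report_cst_t AS (
      SELECT
        DATE(start) AS day,
        entity_id,
        conn_sub_type
      FROM `%s` AS oa
      CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
      WHERE DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL %d DAY)
        AND oa.entity_id IN ('my-very-long-id')
    )
    SELECT *
    FROM (SELECT day, entity_id, conn_sub_type FROM report_cst_t)
    PIVOT (COUNT(*) AS connection_sub_type FOR conn_sub_type IN ('cat1', 'cat2', 'cat3'))
    ORDER BY 1, 2
    """, fqn_org_table, num_days);
END;

CALL `my_dataset.my_pivot_proc`(30, 'project_id.dataset_name.table_name');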

Related

How to pass a table name as a parameter in BigQuery procedure?

I am trying to build a BigQuery stored procedure where I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions here and here, and tried writing an EXECUTE IMMEDIATE version of the above, but I was not able to work out the right syntax.
I think the issue is that the SELECT statement in my code selects multiple columns XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT, and the EXECUTE IMMEDIATE setup is meant to work only for one column. But I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.
Hope the below is helpful.
Pass the parameter as a string:
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
Use EXECUTE IMMEDIATE, since a table name can't be parameterized as a variable in a query:
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
Escape % in a format string with an additional %:
LIKE 'U%'
-->
LIKE 'U%%'
see PARSE_DATE not working in FORMAT() in BigQuery
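For instance, FORMAT with no extra arguments shows the escaping behavior directly:
SELECT FORMAT("WHERE HH_ID NOT LIKE 'U%%'");
-- returns the string: WHERE HH_ID NOT LIKE 'U%'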

Bigquery - parametrize tables and columns in a stored procedure

Consider an enterprise that captures sensor data for different production facilities. Per facility, we create an aggregation query that averages the values into 5-minute timeslots. This query consists of a long list of WITH clauses and writes data to a table (called aggregation_table).
Now my problem: currently we have n queries running that run exactly the same logic; the only thing that differs is the table names (and sometimes the column names, but let's ignore that for now).
Instead of managing n different scripts that are basically the same, I would like to put it in a stored procedure that is able to work like this:
CALL aggregation_query(facility_name) -> resolve the different tables for that facility and then use them in the different with clauses
On top of that, instead of having this long set of clauses that gives me the end result, I would like to chunk it up into logical blocks that are parametrizable. So for example, if I call the aforementioned stored procedure for facility A, I want to be able to pass / use this table name in these different functions, where the output can be reused in the next statement (like you would do with WITH clauses).
Another reason I want to chunk this up into reusable blocks is that we have many "derivatives" of this aggregation query, for example to manage historical data, to correct data, or to have the sensor data at another aggregation level. As these become overly complex, it is much easier to manage them without having to copy, paste, and adjust them every time.
For the current set-up, it is useful to know that I am only entitled to use plain BigQuery, as my team is not allowed to access the CI/CD / scheduling and repository (meaning that I cannot solve the issue by having CI/CD deploy the n different versions of the procedure and functions).
So in the end, I would like to end up with something like this using only bigquery:
CREATE OR REPLACE PROCEDURE `aggregation_function`()
BEGIN
DECLARE tablename STRING;
DECLARE active_table_name STRING;

## get list of tables
CREATE TEMP TABLE tableNames AS
SELECT table_catalog, table_schema, table_name
FROM `catalog.schema.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename;

WHILE (SELECT COUNT(*) FROM tableNames) >= 1 DO
  ## build dataset + table name
  SET active_table_name = CONCAT('`', table_catalog, '.', table_schema, '.', table_name, '`');

  ## use concat to build string and execute
  EXECUTE IMMEDIATE '''
    INSERT INTO `aggregation_table_for_facility` (timeslot, sensor_name, AVG_VALUE)
    WITH
    STEP_1 AS (
      SELECT * FROM my_table_function_step_1(active_table_name, parameter1, parameter2)
    ),
    STEP_2 AS (
      SELECT * FROM my_table_function_step_2(STEP_1, parameter1, parameter2)
    )
    SELECT * FROM STEP_2
  '''
  USING active_table_name AS active_table_name;

  DELETE FROM tableNames WHERE table_name = tablename;
END WHILE;
END;
I was hoping someone could provide a snippet of how I can do this in standard SQL / BigQuery, so basically:
a stored procedure that takes in a string variable and is able to use it as a table (partly solved in the approach above, but I'm not sure if there are better ways),
a (table) function that is able to take this table_name parameter as well and return a table that can be used in the next WITH clause (or alternatively writes to a temp table).
I think the code snippets below should give you some insight into dealing with procedures, inserts, and EXECUTE IMMEDIATE statements.
Here I'm creating a procedure which inserts values into a table looked up via the information schema. Also, since I want to return a value, I use an OUT parameter (active_table_name) to return the value assigned inside the procedure.
CREATE OR REPLACE PROCEDURE `project-id.dataset`.custom_function(tablename STRING,OUT active_table_name STRING)
BEGIN
DECLARE query STRING;
SET active_table_name= (SELECT CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`')
FROM `project-id.dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename);
# a multiline query can be handled by using ''' or """
SET query =
'''
insert into %s (string_field_0,string_field_1,string_field_2,string_field_3,string_field_4,int64_field_5)
with custom_query as (
select string_field_0,string_field_2,'169 BestCity',string_field_3,string_field_4,55677 from %s limit 1
)
select * from custom_query;
''';
# the dynamic query performs the work and is the last thing the procedure runs
# pass parameters using format
execute immediate (format(query,active_table_name,active_table_name));
END
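For completeness, a direct call of this procedure from a script might look something like the sketch below ('my_table' is just a placeholder for a table name that exists in that dataset):
DECLARE active_table_name STRING;
-- the second argument is a script variable that receives the OUT parameter
CALL `project-id.dataset`.custom_function('my_table', active_table_name);
SELECT active_table_name;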
You can also use a loop to iterate through records from a working table, so that the procedure is executed for each record and the value it returns can be used somewhere else, e.g. in a second procedure that performs a delete operation.
DECLARE tablename STRING;
DECLARE out_value STRING;
FOR record IN
  (SELECT tablename FROM `my-project-id.dataset.table`)
DO
  SET tablename = record.tablename;
  CALL `project-id.dataset`.custom_function(tablename, out_value);
  SELECT out_value;
END FOR;
To recap, there are some restrictions, such as not being able to CALL a procedure inside an EXECUTE IMMEDIATE, or to nest an EXECUTE IMMEDIATE inside another EXECUTE IMMEDIATE, to name a few. I think these snippets should help you deal with your current situation.
For this sample I use the following documentation:
Data Manipulation Language
Dealing with outputs
Information Schema Tables
Execute Immediate
For...In
Loops

Use parameter in UDF or Stored Procedure to return table

I have a use case involving the Snowflake API. We have a bunch of materialized views, multiple for each customer of ours. When using the API, we would like to use functions or stored procedures to produce a result at runtime, taking the customer ID as a parameter. This parameter would be used to fetch data from the correct views, to avoid having separate functions for each client. However, I'm running into some issues:
A SQL UD(T)F isn't working, since it seems that you can't use a parameter in the FROM clause. Neither can you use variables or run multiple statements, if I understood it correctly.
A JavaScript UD(T)F isn't working, since you're not allowed to execute statements.
Stored procedures work for single values (which is one use case), but not for returning a table. A procedure can return a VARIANT, but that would add work in the service consuming the API, which we would like to avoid.
Do you have any suggestions on how to achieve the result we're after? We could create pre-calculated views with the results we're after, but that would result in a ton of views, since we need other parameters as well in the API for a dynamic (filtered) result. Therefore a function-based approach seems much neater and easier to maintain.
Thanks for your help and support!
If the view structure is the same for each client, you can create a UDTF with UNION ALL that merges all the views and filters for only the one you need.
In practice, you will always read only one view depending on the parameter.
Something like that:
create function t(Client varchar)
returns table(col1 varchar, col2 varchar)
as
$$
SELECT col1, col2
FROM ViewClient1
WHERE Client = 'Client1'
UNION ALL
SELECT col1, col2
FROM ViewClient2
WHERE Client = 'Client2'
UNION ALL
SELECT col1, col2
FROM ViewClient3
WHERE Client = 'Client3'
$$;
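Calling the function then works like querying a table, for example:
SELECT col1, col2
FROM TABLE(t('Client2'));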
A Snowflake Scripting stored procedure could also be used:
create or replace procedure test_sp_dynamic(table_name string)
returns table(col varchar, col2 varchar)
language sql
as
$$
declare
res RESULTSET;
query VARCHAR DEFAULT 'SELECT Y, Z FROM TABLE(?)';
begin
res := (execute immediate :query using (TABLE_NAME));
return table(res);
end;
$$;
Test data:
CREATE OR REPLACE TABLE view_1(Y VARCHAR, Z VARCHAR) AS SELECT 1, 2;
CREATE OR REPLACE TABLE view_2(Y VARCHAR, Z VARCHAR) AS SELECT 3, 4 ;
CALL test_sp_dynamic('VIEW_1');
-- 1 2
CALL test_sp_dynamic('VIEW_2');
-- 3 4

Return 2 resultset from cursor based on one query (nested cursor)

I'm trying to obtain 2 different resultsets from a stored procedure, based on a single query. What I'm trying to do is:
1.) return the query result into an OUT cursor;
2.) from this cursor's results, get the longest value in each column and return that as a second OUT resultset.
I'm trying to avoid doing the same thing twice: getting the data, and after that getting the longest column values of that same data. I'm not sure if this is even possible, but if it is, can somebody show me how?
This is an example of what I want to do (just for illustration):
CREATE OR REPLACE PROCEDURE MySchema.Test(RESULT OUT SYS_REFCURSOR,MAX_RESULT OUT SYS_REFCURSOR)
AS
BEGIN
OPEN RESULT FOR SELECT Name,Surname FROM MyTable;
OPEN MAX_RESULT FOR SELECT Max(length(Name)),Max(length(Surname)) FROM RESULT; --error here
END Test;
This example fails with "ORA-00942: table or view does not exist".
I know it's a silly example, but I've been investigating and testing all sorts of things (implicit cursors, fetching cursors, nested cursors, etc.) and found nothing that would help me, especially when working with stored procedures returning multiple resultsets.
My overall goal with this is to shorten data export time for Excel. Currently I have to run the same query twice - once to calculate data sizes so the Excel columns can be autofitted, and once to write the data into Excel.
I believe that manipulating the first resultset to obtain the second one would be much faster, with fewer DB cycles.
I'm using Oracle 11g. Any help much appreciated.
Each row of data from a cursor can be read exactly once; once the next row (or set of rows) is read from the cursor, the previous row (or set of rows) cannot be returned to, and the cursor cannot be re-used. So what you are asking is impossible: if you read the cursor to find the maximum values (ignoring that you can't use a cursor as a source in a SELECT statement, though you could read it using a PL/SQL loop), the cursor's rows would have been "used up" and the cursor closed, so it could not be read from when it is returned from the procedure.
You would need to use two separate queries:
CREATE PROCEDURE MySchema.Test(
RESULT OUT SYS_REFCURSOR,
MAX_RESULT OUT SYS_REFCURSOR
)
AS
BEGIN
OPEN RESULT FOR
SELECT Name,
Surname
FROM MyTable;
OPEN MAX_RESULT FOR
SELECT MAX(LENGTH(Name)) AS max_name_length,
MAX(LENGTH(Surname)) AS max_surname_length
FROM MyTable;
END Test;
/
Just for theoretical purposes, it is possible to read from the table only once if you bulk collect the data into a collection and then select from a table-collection expression. However, it is going to be more complicated to code and maintain, it requires the rows from the table to be held in memory (which your DBA might not appreciate if the table is large), and it may not perform any better than just querying the table twice, since you end up with three SELECT statements instead of two.
Something like:
CREATE TYPE test_obj IS OBJECT(
name VARCHAR2(50),
surname VARCHAR2(50)
);
CREATE TYPE test_obj_table IS TABLE OF test_obj;
CREATE PROCEDURE MySchema.Test(
RESULT OUT SYS_REFCURSOR,
MAX_RESULT OUT SYS_REFCURSOR
)
AS
t_names test_obj_table;
BEGIN
SELECT Name,
Surname
BULK COLLECT INTO t_names
FROM MyTable;
OPEN RESULT FOR
SELECT * FROM TABLE( t_names );
OPEN MAX_RESULT FOR
SELECT MAX(LENGTH(Name)) AS max_name_length,
MAX(LENGTH(Surname)) AS max_surname_length
FROM TABLE( t_names );
END Test;
/
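For reference, one way to exercise the procedure and see both result sets from SQL*Plus might be (a sketch):
VARIABLE result     REFCURSOR
VARIABLE max_result REFCURSOR

EXECUTE MySchema.Test(:result, :max_result);

PRINT result
PRINT max_result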

Union between queries with substitution variables in SQL*Plus

I have 14 similar fields, and I search for the string 'A' in each of them. After that, I would like to order by the "position" field.
-- some SET commands in order to remove a lot of useless output
def col='col01'
select '&col' "Fieldname",
&col "value",
position
from oneTable
where &col like '%A%'
/
-- then for the second field, I only have to type two lines
def col='col02'
/
...
def col='col14'
/
This writes all the fields which contain 'A'. The problem is that those fields are not ordered by position.
If I use a UNION between the queries, I cannot take advantage of the substitution variables (&col), and I would have to write a Unix shell script (ksh) to do the replacement instead. The problem is of course that the database connection details would have to be hard-coded in that script (and connecting is not easy stuff).
If I use a REFCURSOR with OPEN, I cannot group the result sets together. I have only one query at a time and cannot make a UNION of them (print refcursor1 union refcursor2 and print refcursor1+refcursor2 raise an exception, and select * from refcursor1 union select * from refcursor2 does not work either).
How can I concatenate the results into one big "REFCURSOR"? Or use a union between two distinct runs ('/') of my query, something like holding the query while typing new definitions of the variables?
Thank you for any advice.
Does this answer your question?
CREATE TABLE #containingAValueTable
(
FieldName VARCHAR(10),
FieldValue VARCHAR(1000),
position int
)
def col='col01'
INSERT INTO #containingAValueTable
(
FieldName , FieldValue, position
)
SELECT '&col' "Fieldname",
&col "value",
position
FROM yourTable
WHERE &col LIKE '%A%'
/
-- then for the second field, I only have to type two lines
def col='col02'
INSERT INTO...
/
def col='col14'
/
select * from #containingAValueTable order by position
DROP TABLE #containingAValueTable
But I'm not totally sure about your use of the 'substitution variable' called "col" (and I only have SQL Server to test my query on, so I used explicit field names).
Edit: Sorry for the '#' character; we use it so often in SQL Server for temporary tables that I didn't even know it was SQL Server specific (moreover, I think it's mandatory in SQL Server when creating a temporary table). Anyway, I'm happy I could be useful to you.
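For what it's worth, the same accumulate-then-order idea in Oracle syntax might look roughly like this (an untested sketch: the scratch table name is invented, and the INSERT is re-run with '/' after each new def, exactly as in the original approach):
-- scratch table (hypothetical name) to collect matches from each column
CREATE TABLE containing_a_values (
  fieldname  VARCHAR2(30),
  fieldvalue VARCHAR2(1000),
  position   NUMBER
);

def col='col01'
INSERT INTO containing_a_values (fieldname, fieldvalue, position)
SELECT '&col', &col, position
FROM oneTable
WHERE &col LIKE '%A%'
/
def col='col02'
/
-- ...repeat def / run for each remaining column up to col14...
def col='col14'
/
SELECT * FROM containing_a_values ORDER BY position;
DROP TABLE containing_a_values;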
