DB2: Procedure with a simple select return more than 15x slow compared to the same select in a query - stored-procedures

I have a select that return a single column, using as where clause 4 columns, and I created a specific index for this query. When I do this select using Dbeaver, the result return in 30~50 milliseconds.
SELECT
column1
FROM
myTable
WHERE
column2 = a
AND column3 = b
AND (column4 = c AND column5 = d )
FOR READ ONLY WITH UR;
Now I created a procedure with this same simple select. The proc declare a P1, a 'cursor with return', do the select, open the cursor and close the P1. When I use the same dbeaver connection the result is returning between 1.2 ~ 1.6 seconds.
CREATE PROCEDURE myProc (
IN a DECIMAL(10),
IN b DECIMAL(10),
IN c DECIMAL(10),
IN d DECIMAL(10)
)
LANGUAGE SQL
DYNAMIC RESULT SETS 1
SPECIFIC myProc
P1: BEGIN
DECLARE cursor1 CURSOR WITH RETURN for
SELECT
column1
FROM
myTable
WHERE
column2 = a
AND column3 = b
AND (column4 = c AND column5 = d )
FOR READ ONLY WITH UR;
OPEN cursor1;
END P1
Is this huge return difference correct? If wrong, Is there something wrong in my procedure that justifies this return time difference? Or could be something wrong in the DB configuration or the server that justifies this difference, something like few resources in the DB server or something like the proc not using the index( I don't know if procs in the DB2 use the index by default like the queries )?
I am new in DB2 and new in procedures creation. Thank you in advance.
Best regards,
Luis

I don't know if is the best way, but I solved the problem using a solution that I read for SQL Server. The solution is create local variables that will receive the parameters values, and use the variable in the queries, to guarantee always the best execution plan. I didn't knew if this was valid to DB2, but my proc now has almost the same response time compared to query. Worked!
Link of SQL Server post: SQL Server: Query fast, but slow from procedure
In this link above a user called #Jake give this explanation:
"The reason this happens is because the procedures query plan is being cached, along with the parameters that were passed to it. On subsequent calls, this query plan generated will be reused with new parameters. This can cause problems because if the data is unevenly distributed, one parameter can generate a sub-optimal plan vs. another. Using local variables essentially does the same as OPTIMIZE FOR UNKNOWN because local variables cannot be sniffed."
I think that is the same for DB2, because worked. After I change these old procedures to use local variables my execution plan begun to use the indexes recently created

Related

How to pass a table name as a parameter in BigQuery procedure?

I am trying to build bigquery stored procedure where I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions here and here, tried writing an EXECUTE IMMEDIATE version of the above but not able to work out the right syntax.
I think issue is; the SELECT statement in my code is selecting multiple columns XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT and the EXECUTIVE IMMEDIATE setup is meant to work only for one column. But I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.
Hope below is helpful.
pass a parameter as a string.
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
use EXECUTE IMMEDIATE since a table name can't be parameterized as a variable in a query.
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
escape % in a format string with additional %
LIKE 'U%'
-->
LIKE 'U%%'
see PARSE_DATE not working in FORMAT() in BigQuery

Bigquery - parametrize tables and columns in a stored procedure

Consider an enterprise that captures sensor data for different production facilities. per facility, we create an aggregation query that averages the values to 5min timeslots. This query exists out of a long list of with-clauses and writes data to a table (called aggregation_table).
Now my problem: currently we have n queries running that exactly run the same logic, the only thing that differs are table names (and sometimes column names but let's ignore that for now).
Instead of managing n different scripts that are basically the same, I would like to put it in a stored procedure that is able to work like this:
CALL aggregation_query(facility_name) -> resolve the different tables for that facility and then use them in the different with clauses
On top of that, instead of having this long set of clauses that give me the end-result, I would like to chunk them up in logical blocks that are parametrizable, So for example, if I call the aforementioned stored_procedure for facility A, I want to be able to pass / use this table name in these different functions, where the output can be re-used in the next statement (like you would do with with clauses).
Another argument of why I want to chunk this up in re-usable blocks is because we have many "derivatives" on this aggregation query, for example to manage historical data, to correct data or to have the sensor data on another aggregation level. As these become overly complex, it is much easier to manage them without having to copy paste and adjust these every time.
In the current set-up, it could be useful to know that I am only entitled to use plain BigQuery, As my team is not allowed to access the CI/CD / scheduling and repository. (meaning that I cannot solve the issue by having CI/CD that deploys the n different versions of the procedure and functions)
So in the end, I would like to end up with something like this using only bigquery:
CREATE OR REPLACE PROCEDURE
`aggregation_function`()
BEGIN
DECLARE
tablename STRING;
DECLARE
active_table_name STRING; ##get list OF tables CREATE TEMP TABLE tableNames AS
SELECT
table_catalog,
table_schema,
table_name
FROM
`catalog.schema.INFORMATION_SCHEMA.TABLES`
WHERE
table_name = tablename;
WHILE
(
SELECT
COUNT(*)
FROM
tableNames) >= 1 DO ##build dataset + TABLE name
SET
active_table_name = CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`'); ##use concat TO build string AND execute
EXECUTE IMMEDIATE '''
INSERT INTO
`aggregation_table_for_facility` (timeslot, sensor_name, AVG_VALUE )
WITH
STEP_1 AS (
SELECT
*
FROM
my_table_function_step_1(active_table_name,
parameter1,
parameter2) ),
STEP_2 AS (
SELECT
*
FROM
my_table_function_step_2(STEP_1,
parameter1,
parameter2) )
SELECT * FROM STEP_2
'''
USING active_table_name as active_table_name;
DELETE
FROM
tableNames
WHERE
table_name = tablename;
END WHILE
;
END
;
I was hoping someone could make a snippet on how I can do this in Standard SQL / Bigquery, so basically:
stored procedure that takes in a string variable and is able to use that as a table (partly solved in the approach above, but not sure if there are better ways)
(table) function that is able to take this table_name parameter as well and return back a table that can be used in the next with clause (or alternatively writes to a temp table)
I think below code snippets should provide you with some insights when dealing with procedures, inserts and execute immediate statements.
Here I'm creating a procedure which will insert values into a table that exists on the information schema. Also, as a value I want to return I use OUT active_table_name to return the value I assigned inside the procedure.
CREATE OR REPLACE PROCEDURE `project-id.dataset`.custom_function(tablename STRING,OUT active_table_name STRING)
BEGIN
DECLARE query STRING;
SET active_table_name= (SELECT CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`')
FROM `project-id.dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename);
#multine query can be handled by using ''' or """
Set query =
'''
insert into %s (string_field_0,string_field_1,string_field_2,string_field_3,string_field_4,int64_field_5)
with custom_query as (
select string_field_0,string_field_2,'169 BestCity',string_field_3,string_field_4,55677 from %s limit 1
)
select * from custom_query;
''';
# querys must perform operations and must be the last thing to perform
# pass parameters using format
execute immediate (format(query,active_table_name,active_table_name));
END
You can also use a loop to iterate trough records from a working table so it will execute the procedure and also be able to get the value from the procedure to use somewhere else.ie:A second procedure to perform a delete operation.
DECLARE tablename STRING;
DECLARE out_value STRING;
FOR record IN
(SELECT tablename from `my-project-id.dataset.table`)
DO
SET tablename = record.tablename;
LOOP
call `project-id.dataset`.custom_function(tablename,out_value);
select out_value;
END LOOP;
END FOR;
To recap, there are some restrictions such as the possibility to call procedures inside a execute immediate or to use execute immediate inside an execute immediate, to count a few. I think these snippets should help you dealing with your current situation.
For this sample I use the following documentation:
Data Manipulation Language
Dealing with outputs
Information Schema Tables
Execute Immediate
For...In
Loops

Left join table on multiple tables in SAS

I've got multiple master tables in the same format with the same variables. I now want to left join another variable but I can't combine the master tables due to limited storage on my computer. Is there a way that I can left join a variable onto multiple master tables within one PROC SQL? Maybe with the help of a macro?
The LEFT JOIN code looks like this for one join but I'm looking for an alternative than to copy and paste this 5 times:
PROC SQL;
CREATE TABLE New AS
SELECT a.*, b.Value
FROM Old a LEFT JOIN Additional b
ON a.ID = b.ID;
QUIT;
You can't do it in one create table statement, as it only creates one table at a time. But you can do a few things, depending on what your actual limiting factor is (you mention a few).
If you simply want to avoid writing the same code five times, but otherwise don't care how it executes, then just write the code in a macro, as you reference.
%macro update_table(old=, new=);
PROC SQL;
CREATE TABLE &new. AS
SELECT a.*, b.Value
FROM &old. a LEFT JOIN Additional b
ON a.ID = b.ID;
QUIT;
%mend update_table;
%update_table(old=old1, new=new1)
%update_table(old=old2, new=new2)
%update_table(old=old3, new=new3)
Of course, if the names of the five tables are in a pattern, you can perhaps automate this further based on that pattern, but you don't give sufficient information to figure that out.
If you on the other hand need to do this more efficiently in terms of processing than running the SQL query five times, it can be done a number of ways, depending on the specifics of your additional table and your specific limitations. It looks to me that you have a good use case for a format lookup here, for example; see for example Jenine Eason's paper, Proc Format, a Speedy Alternative to Sort/Merge. If you're just merging on the ID, this is very easy.
data for_format;
set additional;
start = ID;
label = value;
fmtname='AdditionalF'; *or '$AdditionalF' if ID is character-valued;
output;
if _n_=1 then do; *creating an "other" option so it returns missing if not found;
hlo='o';
label = ' ';
output;
end;
run;
And then you just have five data steps with a PUT statement adding the value, or even you could simply format the ID variable with that format and it would have that value whenever you did most PROCs (if this is something like a classifier that you don't truly need "in" the data).
You can do this in a single pass through the data in a Data Step using a hash table to lookup values.
data new1 new2 new3;
set old1(in=a) old2(in=b) old3(in=c);
format value best.;
if _n_=1 then do;
%create_hash(lk,id,value,"Additional");
end;
value = .;
rc = lk.find();
drop rc;
if a then
output new1;
else if b then
output new2;
else if c then
output new3;
run;
%create_hash() macro available here.
You could, alternatively, use Joe's format with the same Data Step syntax.

How can I use call a stored procedure within SAS without using ProcSQL?

At my Co-Op my manager is asking me to take my SAS output tables that I've gathered and then to execute a stored procedure that will upload and update any data that has changed into an online KPI(Key Performance Indicators) excel sheet.
Apparently my boss isn't too sure of how to do this even and he's been programming for quite some time.
In laymen's terms this is what I need to do:
create a table of gathered KPI's (Done)
Send the table to the Stored Procedure (I don't want to use ProcSQL in SAS 9.3 because I would be hardcoding in too many fields)
Have the Stored procedure read into the online datasheet (done)
Replace KPI's if they have changed (done)
Here is the ProcSQL that I have figured: Ambiguous names have been given to keep anominity:
%let id = 'HorseRaddish';
%let pwd = 'ABC321';
proc sql;
connect to odbc (dsn='JerrySeinfeld' uid=&id pwd=&pwd);
execute (spKPIInsertUpdateKPIData '411', '7.2', '8808', 'M', 'NANANA', 'WorkStation', 'Testing1212', '1', '8/3/2013 10:42AM') by odbc;
disconnect from odbc;
quit;
run;
Above code works fine, but like I said it's a pain to hard code in KPI calues for hundreds of fields.
If it were me, and I had flexibility to do so, I would rewrite the SP to pull the parameters from a table and upload the table, then call the SP. That's got to be faster.
If it's not, you can script that SP line fairly easily. You will still run it in PROC SQL, but you don't have to write it out by hand.
Something like:
proc sql;
select cats("execute(spKPIInsertUdateKPIData '",var1,"''",var2,"','",var3,<... more ...>,"') by odbc") into :execlist separated by ';';
quit;
That creates a macro variable &execlist that contains the calls to the SP. Then you just do
proc sql;
connect to odbc ... ;
&execlist.
disconnect from odbc;
quit;
That does have some length limits, you might have to do it a bit differently (either cut it up or use %include) if you are over ~20k characters.
But again, this is probably not a very good way to do this - better is load to table and have the SP update from that table. Something like:
libname sqldb oledb init_string=whatever;
proc sql;
drop table sqldb._tempSP_KPI;
create table sqldb._tempSP_KPI as select * from <dataset containing values>;
connect to oledb (init_string=whatever);
<exec SP that uses the _tempSP_KPI table)>
quit;
quit;

DB2 iSeries v6.1 vs Linux v9.7 functionality on xmltable, xml doc

Techies--
I'm getting sql0204 XML in *LIBL type *SQLUDT not found on an i db2 6.1 install when I try to deploy a stored proc I know works on Linux v9.7. The reason I am attempting to get this to work is because I really need to pass a table (or array) variable. I couldn't find a way to send a multi-dim array to a sproc on the 6.1 v of i, so I thought I'd try getting around that with an xml doc. But that failed too... Does anyone have any advice for me on how to solve this issue?
Here's the sproc that works on v9.7,Linux:
CREATE PROCEDURE HCMDEV.EMP_MULTIPLE_XML (IN DOC XML)
DYNAMIC RESULT SETS 1
READS SQL DATA
LANGUAGE SQL SPECIFIC EMP_MULTIPLE_XML
P1: BEGIN
DECLARE CSR1 CURSOR WITH RETURN FOR
SELECT emp.EMPID,
emp.FIRSTNAME,
emp.LASTNAME,
emp.DIVISION,
emp.DISTRICT,
emp.LOCATION,
emp.OPERATIONALAREA,
emp.TERMDATE,
emp.REHIREDATE,
emp.HIREDATE,
emp.ADDRESSLINE1,
emp.ADDRESSLINE2,
emp.CITY,
emp.STATE,
emp.ZIPCODE,
emp.TELEPHONE1,
emp.POSITIONCODE,
emp.POSITIONTITLE,
emp.HIRECODE
FROM HCMDEV.EMPLOYEE emp
WHERE EMP.EMPID IN
(SELECT X.EMPID
FROM XMLTABLE('$d/EMPLOYEE/EMPID' PASSING DOC AS "d" COLUMNS EMPID CHAR(9) PATH '.') AS X);
OPEN CSR1;
END P1
For those following this thread, xmltable is NOT available until V7R1. The workaround is to create a stored proc that accepts as the IN paramater a CLOB datatype. In my case, I pass a long string with commas to separate the values for the empid. That works just fine.
Here's a sampling:
CREATE PROCEDURE HCMDEV.EMP_MULT (IN emp_array CLOB(2M))
DYNAMIC RESULT SETS 1
READS SQL DATA
LANGUAGE SQL SPECIFIC EMP_MULT
P1: BEGIN
-- #######################################################################
-- # Returns specific employees based on incoming clob array
-- #######################################################################
DECLARE v_dyn varchar(10000);
DECLARE v_sql varchar(10000);
DECLARE cursor1 CURSOR WITH RETURN FOR v_dyn;
SET v_sql =
'SELECT
emp.EMPID,
emp.FIRSTNAME,
emp.LASTNAME
FROM HCMDEV.EMPLOYEE emp
WHERE emp.EMPID IN (' || emp_array || ')';
PREPARE v_dyn FROM v_sql;
OPEN cursor1;
END P1

Resources