How should I create and populate a temp table, and how can I then print the data retrieved from it? I saw an example like this, but with no further explanation, so I need some help.
SELECT DISTINCT * FROM Table INTO TEMP Temp1;
You select the data just like you would from any other table:
SELECT * INTO r_temp1.* FROM Temp1
or defining cursors for such a statement, and using a FOREACH loop, or ...
The trick is that you need to know at compile time what the columns in Temp1 are going to be. In this case, you could use:
DEFINE r_temp1 RECORD LIKE Table.*
In more general cases, you'll probably assemble a record by hand from the relevant bits and pieces:
DEFINE r_temp1 RECORD
col1 LIKE Table1.ColumnA,
col2 LIKE Table2.ColumnB,
...
colN LIKE TableZ.ColumnZ
END RECORD
I created the following function that can be called passing in the temp-table name:
function unload_temp(l_table)
define
l_table char(20),
l_file char(20),
str char(512)
let l_file = l_table clipped, ".out"
let str = "select * from ",l_table clipped
whenever error continue
prepare stmt_table_exists from str
whenever error stop
#Return if table does not exist
if sqlca.sqlcode = -206
then
return
end if
unload to l_file delimiter "|" str
end function #unload_temp()
This will create a file called tableName.out.
This is extremely helpful when debugging, when you want to see what a temp table looks like in the middle of a running program.
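The same debugging idea can be sketched outside 4GL. The following is a minimal Python sketch, assuming a SQLite connection rather than Informix; `unload_table` and the `temp1` table are hypothetical names chosen for illustration. It dumps a table to a pipe-delimited file and returns quietly if the table cannot be selected from, loosely mirroring the -206 check above.

```python
import csv
import os
import sqlite3
import tempfile


def unload_table(con, table, directory="."):
    """Write all rows of `table` to <directory>/<table>.out, pipe-delimited.

    Returns the file path, or None if the SELECT cannot be prepared
    (for example, because the table does not exist).
    """
    try:
        # Table names cannot be bound as parameters, so the name is quoted
        # and spliced into the statement text.
        cur = con.execute('SELECT * FROM "%s"' % table.replace('"', '""'))
    except sqlite3.OperationalError:
        return None
    path = os.path.join(directory, table + ".out")
    with open(path, "w", newline="") as fh:
        csv.writer(fh, delimiter="|", lineterminator="\n").writerows(cur)
    return path


# Hypothetical temp table used only for this demonstration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TEMP TABLE temp1 (id INTEGER, name TEXT)")
con.executemany("INSERT INTO temp1 VALUES (?, ?)", [(1, "a"), (2, "b")])

out_dir = tempfile.mkdtemp()
out_path = unload_table(con, "temp1", out_dir)
with open(out_path) as fh:
    dumped = fh.read()
missing = unload_table(con, "no_such_table", out_dir)
```

Calling the helper at several points in a program leaves a snapshot of the temp table on disk each time, which can then be inspected or diffed.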
I am trying to build a BigQuery stored procedure where I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions here and here and tried writing an EXECUTE IMMEDIATE version of the above, but I was not able to work out the right syntax.
I think the issue is that the SELECT statement in my code selects multiple columns (XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT), while the EXECUTE IMMEDIATE setup is meant to work for only one column. But I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.
Hope the below is helpful.
Pass the parameter as a string.
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
Use EXECUTE IMMEDIATE, since a table name can't be parameterized as a variable in a query.
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
Escape % in a format string with an additional %.
LIKE 'U%'
-->
LIKE 'U%%'
see PARSE_DATE not working in FORMAT() in BigQuery
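Both points have close analogues outside BigQuery. As a hedged illustration only (Python with SQLite, not BigQuery; `households` and `build_query` are hypothetical names), the sketch below shows that a bound parameter cannot stand in for a table name, and that a printf-style template needs `%%` wherever a literal `%` should survive formatting.

```python
import re
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE households (hh_id TEXT)")
con.executemany("INSERT INTO households VALUES (?)", [("U1",), ("A1",), ("B2",)])

# 1) A placeholder can only carry a value, never an identifier.
try:
    con.execute("SELECT hh_id FROM ?", ("households",))
    bind_failed = False
except sqlite3.OperationalError:
    bind_failed = True


def build_query(table_name):
    """Splice a validated table name into the statement text."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError("unexpected table name: %r" % table_name)
    # 2) %s receives the table name; %% becomes a single literal %.
    template = "SELECT hh_id FROM %s WHERE hh_id NOT LIKE 'U%%' ORDER BY hh_id"
    return template % table_name


query = build_query("households")
rows = con.execute(query).fetchall()
```

Because the table name ends up in the statement text itself, the sketch checks it against a strict pattern first; the pattern is an assumption for this example and would need adjusting for fully qualified BigQuery names.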
I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a Table Function (TVF) and use SET to start dynamically building the SQL to execute, I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I try to add those, then I get various validation errors basically because I need to remove the WITH using CTEs (Common Table Expression), and semicolons ; etc.
But what I am really trying to do is using a table function:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);
This might not be satisfying, but since procedural language and DDL statements are currently not allowed inside table functions, one possible workaround is simply to use a PROCEDURE, as below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM %s ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
Consider an enterprise that captures sensor data for different production facilities. Per facility, we create an aggregation query that averages the values into 5-minute timeslots. This query consists of a long list of WITH clauses and writes data to a table (called aggregation_table).
Now my problem: currently we have n queries running exactly the same logic; the only things that differ are the table names (and sometimes column names, but let's ignore that for now).
Instead of managing n different scripts that are basically the same, I would like to put it in a stored procedure that is able to work like this:
CALL aggregation_query(facility_name) -> resolve the different tables for that facility and then use them in the different with clauses
On top of that, instead of having this long set of clauses that gives me the end result, I would like to chunk them up into logical blocks that are parametrizable. So, for example, if I call the aforementioned stored procedure for facility A, I want to be able to pass this table name to these different functions, where the output can be re-used in the next statement (as you would do with WITH clauses).
Another reason why I want to chunk this up into re-usable blocks is that we have many "derivatives" of this aggregation query, for example to manage historical data, to correct data, or to have the sensor data at another aggregation level. As these become overly complex, it is much easier to manage them without having to copy, paste and adjust them every time.
In the current set-up, it could be useful to know that I am only entitled to use plain BigQuery, as my team is not allowed to access the CI/CD, scheduling and repository (meaning that I cannot solve the issue by having CI/CD deploy the n different versions of the procedure and functions).
So in the end, I would like to end up with something like this using only bigquery:
CREATE OR REPLACE PROCEDURE
`aggregation_function`()
BEGIN
DECLARE
tablename STRING;
DECLARE
active_table_name STRING;
##get list OF tables
CREATE TEMP TABLE tableNames AS
SELECT
table_catalog,
table_schema,
table_name
FROM
`catalog.schema.INFORMATION_SCHEMA.TABLES`
WHERE
table_name = tablename;
WHILE
(
SELECT
COUNT(*)
FROM
tableNames) >= 1 DO ##build dataset + TABLE name
SET
active_table_name = CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`'); ##use concat TO build string AND execute
EXECUTE IMMEDIATE '''
INSERT INTO
`aggregation_table_for_facility` (timeslot, sensor_name, AVG_VALUE )
WITH
STEP_1 AS (
SELECT
*
FROM
my_table_function_step_1(active_table_name,
parameter1,
parameter2) ),
STEP_2 AS (
SELECT
*
FROM
my_table_function_step_2(STEP_1,
parameter1,
parameter2) )
SELECT * FROM STEP_2
'''
USING active_table_name as active_table_name;
DELETE
FROM
tableNames
WHERE
table_name = tablename;
END WHILE
;
END
;
I was hoping someone could make a snippet on how I can do this in Standard SQL / Bigquery, so basically:
stored procedure that takes in a string variable and is able to use that as a table (partly solved in the approach above, but not sure if there are better ways)
(table) function that is able to take this table_name parameter as well and return back a table that can be used in the next with clause (or alternatively writes to a temp table)
I think the code snippets below should give you some insight into dealing with procedures, inserts and EXECUTE IMMEDIATE statements.
Here I'm creating a procedure that inserts values into a table that exists in the information schema. To return a value, I use OUT active_table_name, which hands back the value assigned inside the procedure.
CREATE OR REPLACE PROCEDURE `project-id.dataset`.custom_function(tablename STRING,OUT active_table_name STRING)
BEGIN
DECLARE query STRING;
SET active_table_name= (SELECT CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`')
FROM `project-id.dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename);
# a multiline query can be handled by using ''' or """
Set query =
'''
insert into %s (string_field_0,string_field_1,string_field_2,string_field_3,string_field_4,int64_field_5)
with custom_query as (
select string_field_0,string_field_2,'169 BestCity',string_field_3,string_field_4,55677 from %s limit 1
)
select * from custom_query;
''';
# queries must perform operations and must be the last thing to perform
# pass parameters using format
execute immediate (format(query,active_table_name,active_table_name));
END
You can also use a loop to iterate through the records of a working table, so that it executes the procedure for each one and captures the value returned by the procedure for use somewhere else, e.g. in a second procedure that performs a delete operation.
DECLARE tablename STRING;
DECLARE out_value STRING;
FOR record IN
(SELECT tablename from `my-project-id.dataset.table`)
DO
SET tablename = record.tablename;
# no inner LOOP here: a LOOP without BREAK/LEAVE would never terminate
call `project-id.dataset`.custom_function(tablename,out_value);
select out_value;
END FOR;
To recap, there are some restrictions, such as on calling procedures inside an EXECUTE IMMEDIATE or using EXECUTE IMMEDIATE inside another EXECUTE IMMEDIATE, to name a few. I think these snippets should help you deal with your current situation.
For this sample I use the following documentation:
Data Manipulation Language
Dealing with outputs
Information Schema Tables
Execute Immediate
For...In
Loops
Would you please provide an example of a Redshift procedure where you have used a cursor and an UPDATE statement in conjunction? Is that even feasible? I couldn't find an example. I'm looking for simple template code to learn how to use these two together in a single procedure on Redshift.
Here is an example use case:
I have a table like this:
CREATE TABLE test_tbl
(
Contactid VARCHAR(500),
sfdc_OppId_01 VARCHAR(500),
sfdc_OppId_02 VARCHAR(500),
sfdc_OppId_03 VARCHAR(500),
sfdc_OppId_04 VARCHAR(500),
sfdc_OppId_05 VARCHAR(500),
sfdc_OppId_06 VARCHAR(500)
)
I want to update each sfdc_OppId_xx with the corresponding value from another table, sfdc_tbl. Here is what sfdc_tbl looks like:
sfdc_contactId | sfdc_Opp_Id
AA123hgt       | 999999
AA123hgt       | 888888
AA123hgt       | 777777
AA123hgt       | 432567
AA123hgt       | 098765
AA123hgt       | 112789
So as you see, there are duplicate sfdc_contactid in the sfdc_tbl. My final goal is to list all the sfdc_Opp_Id for given contactid horizontally in the test_tbl. I shall not have duplicate contactid in the test_tbl.
INSERT INTO test_tbl (Contactid)
SELECT sfdc_contactId
FROM sfdc_tbl
GROUP BY sfdc_contactId
And here is what I'm trying to do:
CREATE OR REPLACE PROCEDURE testing_procedure (results INOUT refcursor)
AS
$$
BEGIN
OPEN cursor_testing FOR
SELECT
Ops.sfdc_Opp.id,
ROW_NUMBER () OVER(PARTITION BY Ops.sfdc_contactId ORDER BY sfdc_Opp_Id ) RWN
FROM sfdc_tbl Ops
INNER JOIN test_tbl tbl
ON Ops.sfdc_contactId = tbl.contactid;
UPDATE test_tbl
SET sfdc_Opp_01 = CASE WHEN cursor_testing.RWN = 1 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
sfdc_Opp_02 = CASE WHEN cursor_testing.RWN = 2 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
sfdc_Opp_03 = CASE WHEN cursor_testing.RWN = 3 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
sfdc_Opp_04 = CASE WHEN cursor_testing.RWN = 4 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
sfdc_Opp_05 = CASE WHEN cursor_testing.RWN = 5 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
sfdc_Opp_06 = CASE WHEN cursor_testing.RWN = 6 THEN cursor_testing.sfdc_Ops_id ELSE NULL END
;
END;
$$
LANGUAGE plpgsql;
I keep getting an error
incorrect syntax at or near "cursor_testing"
I've answered a question with a similar solution. The SQL uses a cursor's data to INSERT into a table, and the same approach should work for UPDATE: How to join System tables or Information Schema tables with User defined tables in Redshift
That being said, and looking at your code, I really think you would be better off using a temp table rather than a cursor. The first thing to note is that a cursor is not a table, so your usage pattern won't work AFAIK. You read a cursor row by row (or in batches), which is contrary to Redshift's columnar table storage. So you would need to loop over the rows from the cursor and perform N updates. This will be extremely slow! You would be querying columnar data, storing the results in a cursor as rows, reading these rows one by one, and then performing a new query (UPDATE) for each row. Ick! Stay in "columnar land" and use a temp table.
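The temp-table route can be sketched as follows. This is a minimal illustration in Python with SQLite standing in for Redshift (an assumption made so the example is runnable; it needs SQLite 3.25 or later for ROW_NUMBER). The table and column names come from the question; the temp table name `pivoted` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sfdc_tbl (sfdc_contactId TEXT, sfdc_Opp_Id TEXT);
INSERT INTO sfdc_tbl VALUES
  ('AA123hgt', '999999'), ('AA123hgt', '888888'), ('AA123hgt', '777777'),
  ('AA123hgt', '432567'), ('AA123hgt', '098765'), ('AA123hgt', '112789');
CREATE TABLE test_tbl (
  Contactid TEXT,
  sfdc_OppId_01 TEXT, sfdc_OppId_02 TEXT, sfdc_OppId_03 TEXT,
  sfdc_OppId_04 TEXT, sfdc_OppId_05 TEXT, sfdc_OppId_06 TEXT
);
INSERT INTO test_tbl (Contactid)
SELECT sfdc_contactId FROM sfdc_tbl GROUP BY sfdc_contactId;
""")

cols = ["sfdc_OppId_%02d" % i for i in range(1, 7)]
pivot_cols = ",\n  ".join(
    "MAX(CASE WHEN rwn = %d THEN sfdc_Opp_Id END) AS %s" % (i, col)
    for i, col in enumerate(cols, start=1)
)

# Step 1: number each contact's opportunities and pivot them into one row
# per contact, in a single set-based statement.
con.execute("""
CREATE TEMP TABLE pivoted AS
SELECT sfdc_contactId,
  %s
FROM (
  SELECT sfdc_contactId, sfdc_Opp_Id,
         ROW_NUMBER() OVER (PARTITION BY sfdc_contactId
                            ORDER BY sfdc_Opp_Id) AS rwn
  FROM sfdc_tbl
) AS ranked
GROUP BY sfdc_contactId
""" % pivot_cols)

# Step 2: one UPDATE fills all six columns from the temp table.
con.execute("""
UPDATE test_tbl
SET (%s) = (SELECT %s FROM pivoted
            WHERE pivoted.sfdc_contactId = test_tbl.Contactid)
""" % (", ".join(cols), ", ".join("pivoted." + c for c in cols)))

result = con.execute("SELECT * FROM test_tbl").fetchall()
```

No row-by-row loop is involved: the numbering, the pivot and the update are each a single statement. On Redshift the second step would more likely be written as UPDATE ... FROM pivoted with a join condition; that variant is not tested here.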
I want to get the number of selected rows as well as the selected data. At present I have to use two SQL statements:
one is
select * from XXX where XXX;
the other is
select count(*) from XXX where XXX;
Can it be realised with a single sql string?
I've checked the source code of sqlite3, and I found the function of sqlite3_changes(). But the function is only useful when the database is changed (after insert, delete or update).
Can anyone help me with this problem? Thank you very much!
SQL can't mix single-row (counting) and multi-row results (selecting data from your tables). This is a common problem with returning huge amounts of data. Here are some tips how to handle this:
Read the first N rows and tell the user "more than N rows available". Not very precise but often good enough. If you keep the cursor open, you can fetch more data when the user hits the bottom of the view (Google Reader does this)
Instead of selecting the data directly, first copy it into a temporary table. The INSERT statement will return the number of rows copied. Later, you can use the data in the temporary table to display the data. You can add a "row number" to this temporary table to make paging more simple.
Fetch the data in a background thread. This allows the user to use your application while the data grid or table fills with more data.
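The temporary-table tip can be sketched like this. It is a minimal Python example using the standard sqlite3 module; the table `xxx`, its column `b`, the filter and the temp table `result` are assumptions made for illustration. The cursor's `rowcount` after the INSERT gives the number of rows copied, and the integer primary key doubles as a row number for paging.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE xxx (b INTEGER)")
con.executemany("INSERT INTO xxx VALUES (?)", [(n,) for n in (5, 12, 7, 30, 18)])

# row_no is assigned automatically as rows are inserted.
con.execute("CREATE TEMP TABLE result (row_no INTEGER PRIMARY KEY, b INTEGER)")

# Copy the selected rows; the INSERT reports how many rows it wrote.
cur = con.execute("INSERT INTO result (b) SELECT b FROM xxx WHERE b > 6 ORDER BY b")
row_count = cur.rowcount

# Later, read the data back one page at a time.
first_page = con.execute(
    "SELECT row_no, b FROM result WHERE row_no BETWEEN 1 AND 2 ORDER BY row_no"
).fetchall()
```

The count and the data come from the same copy, so they cannot disagree even if the source table changes afterwards.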
try this way
select (select count() from XXX) as count, *
from XXX;
select (select COUNT(0)
from xxx t1
where t1.b <= t2.b
) as 'Row Number', b from xxx t2 ORDER BY b;
just try this.
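You could combine them into a single statement, but note that in SQLite count(*) turns this into an aggregate query, so it returns a single row (the count plus the columns of one arbitrary row) rather than all the selected rows: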
select count(*), * from XXX where XXX
or
select count(*) as MYCOUNT, * from XXX where XXX
To get the number of unique titles, you need to pass the DISTINCT clause to the COUNT function as the following statement:
SELECT
COUNT(DISTINCT column_name)
FROM
'table_name';
Source: http://www.sqlitetutorial.net/sqlite-count-function/
For those who are still looking for another method, the most elegant one I found to get the total row count is to use a CTE.
This ensures that the count is only calculated once:
WITH cnt(total) as (SELECT COUNT(*) from xxx) select * from xxx,cnt
The only drawback is that if a WHERE clause is needed, it must be applied in both the main query and the CTE query.
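A minimal sketch of the CTE approach with a WHERE clause, using Python's sqlite3 module; the table contents and the `min_b` filter are made up for illustration. The same named parameter is used in both the CTE and the main query so the two filters cannot drift apart.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE xxx (b INTEGER)")
con.executemany("INSERT INTO xxx VALUES (?)", [(n,) for n in (5, 12, 7, 30, 18)])

sql = """
WITH cnt(total) AS (SELECT COUNT(*) FROM xxx WHERE b > :min_b)
SELECT xxx.b, cnt.total
FROM xxx, cnt
WHERE xxx.b > :min_b
ORDER BY xxx.b
"""
# Every returned row carries the total as its last column.
rows = con.execute(sql, {"min_b": 6}).fetchall()
total = rows[0][1] if rows else 0
```

When no rows match, the result set is empty, so the total has to be defaulted on the application side, as the last line does.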
In the first comment, Alttag said that there is no issue with running 2 queries. I don't agree with that unless both are part of a single transaction. If not, the source table can be altered between the 2 queries by an INSERT or DELETE from another thread or process. In that case, the count value might be wrong.
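If two queries are used anyway, they can be wrapped in one explicit transaction. The following is a sketch with Python's sqlite3 module, assuming the connection is opened with `isolation_level=None` so that BEGIN and COMMIT are issued by hand; the table and filter are illustrative.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE xxx (b INTEGER)")
con.executemany("INSERT INTO xxx VALUES (?)", [(n,) for n in (5, 12, 7, 30, 18)])

con.execute("BEGIN")
try:
    # Both reads run inside the same transaction.
    total = con.execute("SELECT COUNT(*) FROM xxx WHERE b > 6").fetchone()[0]
    rows = con.execute("SELECT b FROM xxx WHERE b > 6 ORDER BY b").fetchall()
finally:
    con.execute("COMMIT")
```

With an in-memory database there is no concurrent writer, so this only shows the shape of the code; the benefit appears when other connections modify the same database file.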
Once you already have the select * from XXX results, you can just find the array length in your program right?
If you use sqlite3_get_table instead of prepare/step/finalize you will get all the results at once in an array ("result table"), including the numbers and names of columns, and the number of rows. Then you should free the result with sqlite3_free_table
int rows_count = 0;
while (sqlite3_step(stmt) == SQLITE_ROW)
{
rows_count++;
}
// The rows_count is available for use
sqlite3_reset(stmt); // reset the statement so it can be stepped through again
while (sqlite3_step(stmt) == SQLITE_ROW)
{
// your code in the query result
}