Is it possible to only capture Inserts and Updates in Snowflake streams?

I have created a Snowflake stream on table TEST1 to capture changes to the data in TEST1, as below. I only want inserts and updates to be captured in STREAM_TEST, so that when I delete records from TEST1, the deletes don't show up in STREAM_TEST. Is there a way to do that?
create or replace stream STREAM_TEST
on table TEST1
APPEND_ONLY = FALSE
comment = 'stream on stage table';
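One workaround, not from the original post, is to keep the standard stream and filter out deletes on the consuming side using the METADATA$ACTION column: in a standard stream, an update appears as a DELETE/INSERT pair with METADATA$ISUPDATE = TRUE, so keeping only the INSERT rows returns inserts plus the new image of updates while skipping plain deletes. A minimal sketch:
// Inserts and the post-update image of updated rows; plain deletes are filtered out
SELECT *
FROM STREAM_TEST
WHERE METADATA$ACTION = 'INSERT';
Note that a plain SELECT against a stream does not advance its offset; only consuming the stream in a DML statement does.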

Related

How to PURGE Stage or delete files from Stage when using SNOWPIPE?

Snowflake provides Snowpipe to copy data into a table as soon as it is available in a stage, but it lacks a PURGE option.
Is there another way to achieve this?
There's no direct way to purge with Snowpipe alone, but it can be achieved with a combination of Snowpipe, a stream, and a task.
Let's assume the data files to be loaded reside in a GCS bucket.
Step 1: Create a Snowpipe on Snowflake with an External stage
Refer to this Documentation
// Create a Staging Table
CREATE TABLE SNOWPIPE_DB.PUBLIC.GCP_STAGE_TABLE (COL1 STRING);
// Create Destination Table
CREATE TABLE SNOWPIPE_DB.PUBLIC.GCP_DESTINATION_TABLE (COL1 STRING);
// Create an External Stage
CREATE STAGE SNOWPIPE_DB.PUBLIC.GCP_STAGE
URL='gcs://bucket/files/'
STORAGE_INTEGRATION = <STORAGE_INTEGRATION>; // the storage integration name is an identifier, not a quoted string
// Create Snowpipe
CREATE PIPE SNOWPIPE_DB.PUBLIC.GCP_Pipe
AUTO_INGEST = true
INTEGRATION = '<NOTIFICATION_INTEGRATION>'
AS
COPY INTO SNOWPIPE_DB.PUBLIC.GCP_STAGE_TABLE
FROM @SNOWPIPE_DB.PUBLIC.GCP_STAGE;
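Not part of the original steps, but a quick sanity check after creating the pipe: SYSTEM$PIPE_STATUS reports whether the pipe is running.
// Check the pipe's execution state
SELECT SYSTEM$PIPE_STATUS('SNOWPIPE_DB.PUBLIC.GCP_Pipe');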
Step 2: Create Stream on Table GCP_STAGE_TABLE
A stream records data manipulation language (DML) changes made to a table, including information about inserts, updates, and deletes.
Refer to this Documentation
// Create Stream in APPEND_ONLY Mode since we are concerned with INSERTS only
CREATE OR REPLACE STREAM SNOWPIPE_DB.PUBLIC.RESPONSES_STREAM
ON TABLE SNOWPIPE_DB.PUBLIC.GCP_STAGE_TABLE
APPEND_ONLY = TRUE;
Now, whenever data is uploaded to the GCS bucket, GCP_STAGE_TABLE is populated by Snowpipe, and so is our stream RESPONSES_STREAM.
RESPONSES_STREAM would then look like this:
COL1     | METADATA$ACTION | METADATA$ISUPDATE | METADATA$ROW_ID
MOHAMMED | INSERT          | FALSE             | kjee941e66d4ca4hhh1e2b8ddba12c9c905a829
TURKY    | INSERT          | FALSE             | b7c5uytba6c1jhhfb6e9d85e3d3cfd7249192b0d8
Since the Stream has APPEND_ONLY Mode, we will only see INSERT in METADATA$ACTION
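A quick way to confirm this (not part of the original steps) is SHOW STREAMS, which reports the stream's mode:
// The mode column should show APPEND_ONLY
SHOW STREAMS LIKE 'RESPONSES_STREAM' IN SCHEMA SNOWPIPE_DB.PUBLIC;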
Step 3: Create a Procedure to PURGE the stage and Populate GCP_DESTINATION_TABLE
// Create a Procedure
CREATE OR REPLACE PROCEDURE Load_Data()
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
var purgeStage = `REMOVE @SNOWPIPE_DB.PUBLIC.GCP_STAGE`;
// Select only COL1: the stream also exposes METADATA$ columns, which the destination table doesn't have
var populateTable = `INSERT INTO SNOWPIPE_DB.PUBLIC.GCP_DESTINATION_TABLE
                     SELECT COL1 FROM SNOWPIPE_DB.PUBLIC.RESPONSES_STREAM`;
try {
    snowflake.execute( {sqlText: purgeStage} );
    snowflake.execute( {sqlText: populateTable} );
    return "Succeeded.";
}
catch (err) {
    return "Failed: " + err;
}
$$;
The above procedure uses the REMOVE command to purge the stage and populates the table GCP_DESTINATION_TABLE.
Consuming the stream RESPONSES_STREAM in that INSERT advances the stream's offset, which clears it.
Step 4: Create a Task to call the Procedure Load_Data()
Refer to this Documentation
We create a task with a 5-minute schedule that first checks the stream RESPONSES_STREAM for any data loaded into GCP_STAGE_TABLE and, if there is any, executes the procedure Load_Data().
// Task DDL
CREATE OR REPLACE TASK MASTER_TASK
WAREHOUSE = LOAD_WH
SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('SNOWPIPE_DB.PUBLIC.RESPONSES_STREAM') //Checks the stream for Data
AS
CALL Load_Data();
SYSTEM$STREAM_HAS_DATA('RESPONSES_STREAM') evaluates to true when data has been loaded into GCP_STAGE_TABLE, which then makes the task execute the procedure call.
Even when the procedure is not called, the WHEN SYSTEM$STREAM_HAS_DATA('RESPONSES_STREAM') check still consumes a small amount of compute every 5 minutes; to reduce this, the schedule can be changed from 5 minutes to a longer interval.
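One detail worth adding: tasks are created in a suspended state, so the task has to be resumed before the schedule takes effect.
// Tasks are created suspended; resume it so the schedule runs
ALTER TASK MASTER_TASK RESUME;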
To make this an ELT pipeline, the procedure can include transformation logic and a tree of tasks can be built, as sketched below.
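As a sketch of such a tree (the child task name and its transformation procedure are made up for illustration), a child task can be chained with the AFTER clause; child tasks also need to be resumed:
// Hypothetical child task that runs after MASTER_TASK finishes
CREATE OR REPLACE TASK TRANSFORM_TASK
WAREHOUSE = LOAD_WH
AFTER MASTER_TASK
AS
CALL Transform_Data(); // hypothetical transformation procedure
ALTER TASK TRANSFORM_TASK RESUME;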
Note:
REMOVE is not officially supported for external stages, but it still worked against a GCS bucket for me.
Let me know if it works for you with AWS S3 and Azure.

DB2 stored procedure to load data in different database

I am creating a stored procedure to load data from the source database MYDB into the target database NEWDB.
I will load the data into the tables SCHEMA1.EMPLOYEE1, SCHEMA1.EMPLOYEE2, ...
Edit 1:
CREATE or replace PROCEDURE SCHEMA1.PROC_LOAD ()
SPECIFIC PROC_LOAD
LANGUAGE SQL
BEGIN
DECLARE v_table varchar(100);--
DECLARE truncate_stmt varchar(1000);--
DECLARE load_stmt varchar(1000);--
for v_table as select rtrim(tabname) as tabname from syscat.tables where tabschema='SCHEMA1' and tabname like '%EMPLOYEE%'
do
-- Truncate the table first
set truncate_stmt = 'ALTER TABLE SCHEMA1.'||v_table.tabname||' ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE';--
prepare s1 from truncate_stmt;--
execute s1;--
-- Load the data
set load_stmt = 'LOAD FROM (DATABASE MYDB SELECT * FROM SCHEMA1.'||v_table.tabname||'_HIST) OF CURSOR MESSAGES ON SERVER INSERT INTO SCHEMA1.'||v_table.tabname||' NONRECOVERABLE';--
CALL SYSPROC.ADMIN_CMD (load_stmt);--
end for;--
END;
Above is the code of my DB2 stored procedure. I created it successfully, but when I call it, it returns this error:
ERROR [24501] [IBM][DB2/NT64] SQL0501N The cursor specified in a FETCH statement or CLOSE statement is not open or a cursor variable in a cursor scalar function reference is not open.
In the target database, selecting from SCHEMA1.EMPLOYEE1 shows that its data was loaded successfully, but for EMPLOYEE2, 3, ... the old data is still there; it seems that only the first table in the loop is loaded.
Any idea? My DB2 platform is DB2 11.1 on LUW. Thanks in advance.
LOAD commits implicitly, so you have to use the WITH HOLD clause of the FOR statement so that the cursor is not closed by such a commit.
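A sketch of the adjusted FOR statement (the cursor name c_tabs is arbitrary; the loop body is unchanged):
for v_table as c_tabs cursor with hold for
select rtrim(tabname) as tabname from syscat.tables where tabschema='SCHEMA1' and tabname like '%EMPLOYEE%'
do
-- truncate and LOAD exactly as in the original loop body
set truncate_stmt = 'ALTER TABLE SCHEMA1.'||v_table.tabname||' ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE';--
prepare s1 from truncate_stmt;--
execute s1;--
set load_stmt = 'LOAD FROM (DATABASE MYDB SELECT * FROM SCHEMA1.'||v_table.tabname||'_HIST) OF CURSOR MESSAGES ON SERVER INSERT INTO SCHEMA1.'||v_table.tabname||' NONRECOVERABLE';--
CALL SYSPROC.ADMIN_CMD (load_stmt);--
end for;--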

Shared Table not printing on Client

Basically I have a SHARED Lua file where I define the table.
I did this because I thought that if we define the table in a shared file, we can use it on both the client and the server.
SHARED.lua:
TableA = {}
Then I edit it on a SERVER lua file.
SERVER.lua:
function UpdateTable()
    -- Clean the table first
    for k in pairs(TableA) do
        TableA[k] = nil
    end
    -- ... not worth showing the rest ...
    -- Insert new values
    for i = 1, 10 do
        table.insert(TableA, result[i])
    end
    -- Debug print
    print(table.ToString(TableA)) -- it prints every value correctly
end
Now when I try to print it client side, it says the Table exists but it's empty.
CLIENT.lua:
print(table.ToString(TableA)) -- prints "{}" and it shouldn't be empty
Note: UpdateTable() runs every 5 minutes.
Apparently, defining a table in a shared file doesn't mean its values are shared between the server and the client; it only means the code runs on both.
You have to network the values yourself for the client to see what the server put in the table.
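A minimal sketch of that networking using Garry's Mod's net library (the message name "SyncTableA" is made up here):
SERVER.lua:
util.AddNetworkString("SyncTableA") -- register the message name on the server
local function BroadcastTableA()
    net.Start("SyncTableA")
    net.WriteTable(TableA) -- fine for small tables
    net.Broadcast() -- send to every connected client
end
-- call BroadcastTableA() at the end of UpdateTable()
CLIENT.lua:
net.Receive("SyncTableA", function()
    TableA = net.ReadTable()
    print(table.ToString(TableA)) -- now prints the server's values
end)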

How to load table from file

I know how to write table content to a text file and restore it. But what is the best practice for writing a custom table type to a file?
Here is my situation:
I have a list of tables, e.g. objective1 = {}, objective2 = {}, objective3 = {}, ...
Each objective has its own execute function, which checks some conditions and executes some commands depending on those conditions.
Each new game I pick a random set of objectives and store them in an array: Objectives = { [1] = { objective1, objective3 } }
Now I want to save the array of random objectives and load it later from the file.
Is there a way to save the names of the tables in the array and to restore them by name?
It would be great to have a solution without using a factory pattern or indices for saving.
As long as none of the data in the table is a userdata (e.g. a SQL or socket connection, or other library objects), you can use a pickle module to write the data to a file and later load it back from there.
I personally use a different pickle library (written by Walter Doekes). The usage is as follows:
local TableToStore = { 'a', 'b', 1, {'nested', 'content', {c = 'inside'}} }
pickle.store( 'backupFileName.bak', {TableToStore = TableToStore} )
and to restore the data, I can simply use
dofile "backupFileName.bak"
and I'll get back a table named TableToStore after this.

BEFORE INSERT trigger with stored procedure call (DB2 LUW 9.5)

I am trying to create a BEFORE INSERT trigger that checks the incoming value of a field and, if that field is null, replaces it with the same field from another row. However, when I add the CALL statement to my trigger, an error is returned: "The trigger "ORGSTRUCT.CSTCNTR_IN" is defined with an unsupported triggered SQL statement". I checked the documentation and saw that cursors aren't supported in a BEFORE trigger (part of the reason for making the stored procedure in the first place), but even when I remove the cursor declaration from the stored procedure, the call still generates the same error.
Trigger:
CREATE TRIGGER orgstruct.cstcntr_IN
NO CASCADE
BEFORE INSERT ON orgstruct.tOrgs
REFERENCING NEW AS r
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
DECLARE prnt_temp BIGINT;
DECLARE cstcntr_temp CHAR(11);
SET prnt_temp = r.prnt;
SET cstcntr_temp = r.cstcntr;
CALL orgstruct.trspGetPrntCstCntr(prnt_temp,cstcntr_temp);
SET r.cstcntr = cstcntr_temp;
END
Stored procedure:
CREATE PROCEDURE orgstruct.trspGetPrntCstCntr (
IN p_prnt BIGINT,
OUT p_cstcntr CHAR(11)
)
SPECIFIC trGetPrntCstCntr
BEGIN
IF p_prnt IS NULL THEN
RETURN;
END IF;
BEGIN
DECLARE c1 CURSOR
FOR
SELECT cstcntr
FROM orgstruct.tOrgs
WHERE id = p_prnt
FOR READ ONLY;
OPEN c1;
FETCH FROM c1 INTO p_cstcntr;
CLOSE c1;
END;
END
According to the documentation, CALL is allowed in a BEFORE trigger, so I don't understand what the problem is.
A BEFORE trigger can call a stored procedure, but the stored proc can't do anything that isn't allowed in the trigger.
In your case, the default data-access level for an SQL stored procedure is MODIFIES SQL DATA, which is not allowed in the trigger. You could recreate your stored procedure with the data-access level changed to READS SQL DATA; this would allow you to create the trigger.
However, there is no reason to call a stored procedure for something this simple; you can do it with a simple inline trigger:
create trigger orgstruct.cstcntr_IN
no cascade
before insert on orgstruct.tOrgs
referencing new as r
for each row
mode db2sql
set r.cstcntr = case
    when r.prnt is not null
    then (select cstcntr from orgstruct.tOrgs where id = r.prnt fetch first 1 row only)
    else r.cstcntr
end;
This will be a LOT more efficient because it eliminates both the stored procedure call and the cursor processing inside the stored proc. Even if you wanted to use the stored proc, you could eliminate the cursor inside the stored proc and improve performance.
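For completeness, here is a sketch of that cursor-less version with an explicit READS SQL DATA access level (DB2 9.5 has no CREATE OR REPLACE PROCEDURE, so drop and recreate it):
CREATE PROCEDURE orgstruct.trspGetPrntCstCntr (
IN p_prnt BIGINT,
OUT p_cstcntr CHAR(11)
)
SPECIFIC trGetPrntCstCntr
LANGUAGE SQL
READS SQL DATA
BEGIN
-- no cursor needed: a scalar subselect does the lookup
IF p_prnt IS NOT NULL THEN
SET p_cstcntr = (SELECT cstcntr
FROM orgstruct.tOrgs
WHERE id = p_prnt
FETCH FIRST 1 ROW ONLY);
END IF;
END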
FYI: the logic that you posted contains an error and will always set CSTCNTR to NULL. The trigger posted in this answer does not do this. :-)
