I am currently loading data from one Snowflake table into another, with some data type conversions along the way.
When any row raises an error, the whole load fails. I need to capture the failing rows in a table and let the load continue even when errors occur.
Please let me know if there is any way to achieve this in Snowflake.
I tried a stored procedure, shown below, but it only captures the error information, not the rows themselves:
CREATE OR REPLACE PROCEDURE LOAD_TABLE_A()
RETURNS varchar
NOT NULL
LANGUAGE javascript
AS
$$
var result;
var sql_command = "insert into TABLE A"
sql_command += " select"
sql_command += " migration_status,to_date(status_date,'ddmmyyyy') as status_date,"
sql_command += " to_time(status_time,'HH24MISS') as status_time,unique_unit_of_migration_number,reason,"
sql_command += " to_timestamp_ntz(current_timestamp) as insert_date_time"
sql_command += " from TABLE B"
sql_command += " where insert_date_time>(select max(insert_date_time) from TABLE A);"
try {
snowflake.execute({ sqlText: sql_command});
result = "Succeeded";
}
catch (err) {
result = "Failed";
snowflake.execute({
sqlText: `insert into mcs_error_log VALUES (?,?,?,?)`
,binds: [err.code, err.state, err.message, err.stackTraceTxt]
});
}
return result;
$$;
I worked through an example of how to send good rows from one table to another while sending bad ones to a separate table. It should be on the Snowflake blog shortly. The key is to use a multi-table insert like so:
-- Create a staging table with all columns defined as strings.
-- This will hold all raw values from the load files.
create or replace table SALES_RAW
( -- Actual Data Type
SALE_TIMESTAMP string, -- timestamp
ITEM_SKU string, -- int
PRICE string, -- number(10,2)
IS_TAXABLE string, -- boolean
COMMENTS string -- string
);
-- Create the production table with actual data types.
create or replace table SALES_STAGE
(
SALE_TIMESTAMP timestamp,
ITEM_SKU int,
PRICE number(10,2),
IS_TAXABLE boolean,
COMMENTS string
);
-- Simulate adding some rows from a load file. Two rows are good.
-- Four rows generate errors when converting to the data types.
insert into SALES_RAW
(SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS)
values
('2020-03-17 18:21:34', '23289', '3.42', 'TRUE', 'Good row.'),
('2020-17-03 18:21:56', '91832', '1.41', 'FALSE', 'Bad row: SALE_TIMESTAMP has the month and day transposed.'),
('2020-03-17 18:22:03', '7O242', '2.99', 'T', 'Bad row: ITEM_SKU has a capital "O" instead of a zero.'),
('2020-03-17 18:22:10', '53921', '$6.25', 'F', 'Bad row: PRICE should not have a dollar sign.'),
('2020-03-17 18:22:17', '90210', '2.49', 'Foo', 'Bad row: IS_TAXABLE cannot be converted to true or false'),
('2020-03-17 18:22:24', '80386', '1.89', '1', 'Good row.');
-- Make sure the rows inserted okay.
select * from SALES_RAW;
-- Create a table to hold the bad rows.
create or replace table SALES_BAD_ROWS like SALES_RAW;
-- Insert good rows into SALES_STAGE and
-- bad rows into SALES_BAD_ROWS
insert first
when SALE_TIMESTAMP_X is null and SALE_TIMESTAMP is not null or
ITEM_SKU_X is null and ITEM_SKU is not null or
PRICE_X is null and PRICE is not null or
IS_TAXABLE_X is null and IS_TAXABLE is not null
then
into SALES_BAD_ROWS
(SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS)
values
(SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS)
else
into SALES_STAGE
(SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS)
values
(SALE_TIMESTAMP_X, ITEM_SKU_X, PRICE_X, IS_TAXABLE_X, COMMENTS)
select try_to_timestamp (SALE_TIMESTAMP) as SALE_TIMESTAMP_X,
try_to_number (ITEM_SKU, 10, 0) as ITEM_SKU_X,
try_to_number (PRICE, 10, 2) as PRICE_X,
try_to_boolean (IS_TAXABLE) as IS_TAXABLE_X,
COMMENTS,
SALE_TIMESTAMP,
ITEM_SKU,
PRICE,
IS_TAXABLE
from SALES_RAW;
-- Examine the two good rows
select * from SALES_STAGE;
-- Examine the four bad rows
select * from SALES_BAD_ROWS;
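The routing rule used by the INSERT FIRST statement (a conversion that yields NULL for a non-NULL input marks the row as bad) can also be sketched outside Snowflake. The Python below is only an illustration: the helper names (try_timestamp, try_int, try_decimal, try_bool, split_rows) are hypothetical, and their parsing rules are simplified stand-ins for Snowflake's TRY_TO_* functions, not a faithful reimplementation of them.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

TRUE_WORDS = {"true", "t", "yes", "y", "on", "1"}
FALSE_WORDS = {"false", "f", "no", "n", "off", "0"}


def try_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS'; return None instead of raising."""
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        return None


def try_int(text):
    """Parse a whole number; return None instead of raising."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def try_decimal(text):
    """Parse a decimal number; return None instead of raising."""
    try:
        return Decimal(text)
    except (TypeError, ValueError, InvalidOperation):
        return None


def try_bool(text):
    """Map common true/false spellings; return None for anything else."""
    word = str(text).strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None


# One converter per typed column; COMMENTS stays a string.
CONVERTERS = {
    "SALE_TIMESTAMP": try_timestamp,
    "ITEM_SKU": try_int,
    "PRICE": try_decimal,
    "IS_TAXABLE": try_bool,
}


def split_rows(raw_rows):
    """Return (good, bad): converted rows and untouched raw rows."""
    good, bad = [], []
    for raw in raw_rows:
        converted = dict(raw)
        failed = False
        for column, convert in CONVERTERS.items():
            converted[column] = convert(raw[column])
            # Same test as the WHEN clause: the conversion produced
            # None although the raw input was not None.
            if converted[column] is None and raw[column] is not None:
                failed = True
        if failed:
            bad.append(raw)
        else:
            good.append(converted)
    return good, bad


COLUMNS = ("SALE_TIMESTAMP", "ITEM_SKU", "PRICE", "IS_TAXABLE", "COMMENTS")
SAMPLE = [
    ("2020-03-17 18:21:34", "23289", "3.42", "TRUE", "Good row."),
    ("2020-17-03 18:21:56", "91832", "1.41", "FALSE", "Bad timestamp."),
    ("2020-03-17 18:22:03", "7O242", "2.99", "T", "Bad SKU."),
    ("2020-03-17 18:22:10", "53921", "$6.25", "F", "Bad price."),
    ("2020-03-17 18:22:17", "90210", "2.49", "Foo", "Bad boolean."),
    ("2020-03-17 18:22:24", "80386", "1.89", "1", "Good row."),
]

good_rows, bad_rows = split_rows([dict(zip(COLUMNS, r)) for r in SAMPLE])
print(len(good_rows), len(bad_rows))  # prints "2 4"
```

Bad rows keep their raw string values, as in SALES_BAD_ROWS, so they can be inspected and reloaded after correction.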
Load error information is captured by Snowflake and can be accessed by querying the COPY_HISTORY table function.
https://docs.snowflake.net/manuals/sql-reference/functions/copy_history.html
Within the COPY INTO command you can decide how to proceed with a file if one or more rows fail the load process by using the ON_ERROR parameter.
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html#copy-options-copyoptions
I recommend you check out try_cast.
https://docs.snowflake.net/manuals/sql-reference/functions/try_cast.html
Also, for your query, I would just use a view and, if performance is an issue, a materialized view.
I think a nice solution is to wrap your SQL calls in a helper method.
For example, let's say that instead of calling snowflake.execute({}) directly ...
You use something like:
EXEC("select * from table1 where x > ?", [param1]);
Inside the EXEC method you can have a try/catch, and you can easily add things like a continue handler or an exit_handler where you put the logic to log your errors to a table.
I have assembled a repo with some tools and snippets. Maybe take a look at: https://github.com/orellabac/SnowJS-Helpers
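The wrapper idea can be sketched as follows. This is not the code from the repo above and not Snowflake's API: it is a minimal Python illustration that uses the standard-library sqlite3 module as a stand-in for the database connection. The function name exec_logged, the on_error parameter, and the error_log table are all hypothetical.

```python
import sqlite3


def exec_logged(conn, sql, params=(), on_error="continue"):
    """Run one statement; on failure, record the error in error_log.

    on_error="continue" swallows the error and returns None.
    on_error="exit" logs the error and then re-raises it.
    """
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error as err:
        # Log the failing statement and the driver's message.
        conn.execute(
            "insert into error_log (statement, message) values (?, ?)",
            (sql, str(err)),
        )
        if on_error == "exit":
            raise
        return None


# Hypothetical in-memory setup for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("create table error_log (statement text, message text)")
conn.execute("create table table1 (x integer)")
conn.execute("insert into table1 values (1), (5), (9)")

# A statement that succeeds returns its rows.
rows = exec_logged(conn, "select x from table1 where x > ? order by x", [4])

# A statement that fails is logged and the caller carries on.
failed = exec_logged(conn, "select x from missing_table")
logged = conn.execute("select count(*) from error_log").fetchone()[0]
```

Keeping the logging in one place means every call site gets the same behaviour, and switching a call from "continue" to "exit" is a one-argument change.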
I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a table function (TVF) and use SET to start building the SQL dynamically, I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I add those, I get various validation errors, basically because I would need to remove the WITH clause and its CTEs (Common Table Expressions), the semicolons, and so on.
But what I am really trying to do is use a table function:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);
This might not be satisfying, but since procedural language statements and DDL are currently not allowed inside table functions, one possible workaround is simply to use a PROCEDURE, as below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM %s ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
I need to find the rows that are present in table A and missing from table B (using a LEFT JOIN), where table A and table B have the same structure but live in different schemas.
The query has to be constructed using dynamic SQL, and the columns to join on are stored in a string. How can I extract the column names from the string and use them to dynamically construct the query below?
The database is Azure SQL Server.
E.g.:
DECLARE @ColNames NVARCHAR(150) = 'col1,col2'
Query to be constructed based on the columns defined in @ColNames:
SELECT *
FROM Table A
Left Join
Table B
ON A.col1 = B.col1
AND A.col2 = B.col2
AND B.col1 IS NULL AND B.col2 IS NULL
If @ColNames contains more columns, the SELECT statement needs to cater for all of them.
Without knowing the full context, try this:
DECLARE @ColNames NVARCHAR(150) = 'col1,col2'
DECLARE @JoinCondition NVARCHAR(MAX) = ''
DECLARE @WhereCondition NVARCHAR(MAX) = ''
-- Rows missing from B have NULLs on the B side, so the filter tests [b].
SELECT @JoinCondition += CONCAT('[a].', QUOTENAME(Value), ' = ', '[b].', QUOTENAME(Value), (CASE WHEN LEAD(Value) OVER(ORDER BY Value) IS NOT NULL THEN ' AND ' ELSE '' END))
,@WhereCondition += CONCAT('[b].', QUOTENAME(Value), ' IS NULL', (CASE WHEN LEAD(Value) OVER(ORDER BY Value) IS NOT NULL THEN ' AND ' ELSE '' END))
FROM STRING_SPLIT(@ColNames,N',')
SELECT @JoinCondition, @WhereCondition
STRING_SPLIT: splits the input string into one row per column name.
LEAD: determines whether an AND keyword is needed, i.e. when the current row is not the last one.
Be aware that NOT EXISTS is probably a better solution than LEFT JOIN.
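To make the string assembly easier to follow, here is the same idea as a small Python sketch. It is an illustration only, not T-SQL: the function name build_anti_join and the default table names TableA and TableB are hypothetical, and the bracket quoting imitates what QUOTENAME does (wrap in square brackets, double any closing bracket). The WHERE clause tests the B-side columns, since those are the ones that are NULL for rows missing from B.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_anti_join(col_names, left="TableA", right="TableB"):
    """Build a LEFT JOIN query returning rows of `left` missing from `right`.

    col_names is a comma-separated string such as 'col1,col2'.
    """
    cols = [c.strip() for c in col_names.split(",") if c.strip()]
    if not cols:
        raise ValueError("at least one join column is required")

    quoted = [quote_name(c) for c in cols]
    # One equality per column, joined with AND (no trailing AND to trim).
    join_condition = " AND ".join(f"a.{q} = b.{q}" for q in quoted)
    # Unmatched rows have NULL in every B-side join column.
    where_condition = " AND ".join(f"b.{q} IS NULL" for q in quoted)

    return (
        f"SELECT a.* FROM {left} AS a "
        f"LEFT JOIN {right} AS b ON {join_condition} "
        f"WHERE {where_condition}"
    )


query = build_anti_join("col1,col2")
print(query)
```

Joining a list with " AND " avoids the need to detect the last element, which is the job LEAD does in the T-SQL version.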
I want to read an integer column into an integer list:
create function somedum()
returning int
define s LIST(INTEGER NOT NULL);
select id into s from informix.emptest;
end function
create table emptest(id int)
insert into emptest(id)values(7)
When I execute the above function, I get the error shown in the attached image.
The Informix documentation on inserting elements into a LIST seems misleading to me, as my first impression of the example ( Insert into a LIST ) also led me to use the SELECT INTO syntax and get the same error:
-9634 No cast from integer to set(integer not null)
The SELECT INTO syntax can be used to copy an entire LIST into the variable, not to insert individual elements into it.
To insert an element into the LIST (or, more generally, to manipulate its elements) we need to use the Informix virtual table interface: the TABLE syntax creates a virtual table from the LIST (a collection-derived table), which can then be used for the usual insert/select/update/delete operations.
CREATE FUNCTION somedum()
RETURNING LIST( INTEGER NOT NULL ) AS alist;
DEFINE my_list LIST( INTEGER NOT NULL );
INSERT INTO TABLE( my_list ) SELECT id FROM emptest;
RETURN my_list;
END FUNCTION;
Executing the function we get:
EXECUTE FUNCTION somedum();
alist LIST{7 }
Is there a way to create a PostgreSQL stored function (using PL/pgSQL, to be able to set input parameters) that returns a custom data set?
I've tried to do something like this, following the official manual:
CREATE FUNCTION extended_sales(p_itemno int)
RETURNS TABLE(quantity int, total numeric) AS $$
BEGIN
RETURN QUERY SELECT s.quantity, s.quantity * s.price FROM sales AS s
WHERE s.itemno = p_itemno;
END;
$$ LANGUAGE plpgsql;
but the result has only one column, which contains the composite value (quantity, total). I need a result with two separate columns, 'quantity' and 'total'.
At a guess you're running:
SELECT extended_sales(1);
This will return a composite type column. If you want it expanded, you must instead run:
SELECT * FROM extended_sales(1);
Also, as @a_horse_with_no_name notes, a PL/pgSQL function is completely unnecessary here. Presumably this is a simplified example?
In future please include:
Your PostgreSQL version; and
The exact SQL you ran and the exact output you got
I am trying to find the minimum value of an int primary key column in a particular table.
A portion of my stored procedure code:
IF NOT EXISTS
(
SELECT *
FROM Table
)
BEGIN
SELECT *
FROM Table
END
ELSE
BEGIN
SELECT Min(ColumnOne)
FROM Table
END
This is my main code after reading:
if (!reader.Read())
return "EMPTY TABLE";
else
return reader.GetInt32(0).ToString();
ExecuteReader runs without a problem, but I get an exception at the statement
reader.GetInt32(0).ToString()
I believe I am extracting the information incorrectly when my table has more than one row. What is the correct reader method to call to get the number?
I did not quite understand your question.
You mention the min() value in the question, but you wrote the max() function in the T-SQL script.
You can try the query below if you want to retrieve the next value of a column:
Select isnull(max(ColumnOne),0)+1 FROM Table
The query above returns 1 if the table is empty; otherwise it returns the current maximum value + 1 (the next available value) from the table.
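The behaviour described above can be checked with a small sketch. It uses Python's standard-library sqlite3 module as a stand-in for SQL Server, so it is an illustration under assumptions rather than T-SQL: SQLite's IFNULL plays the role of ISNULL, and the table name t and the helper names next_value and min_value are hypothetical. It also shows that an aggregate without GROUP BY returns exactly one row even for an empty table, with a NULL value, which is worth keeping in mind before reading the result as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (ColumnOne integer primary key)")


def next_value(conn):
    """Next available key: 1 for an empty table, otherwise max + 1."""
    sql = "select ifnull(max(ColumnOne), 0) + 1 from t"
    return conn.execute(sql).fetchone()[0]


def min_value(conn):
    """Smallest key, or None (SQL NULL) when the table is empty."""
    return conn.execute("select min(ColumnOne) from t").fetchone()[0]


# Empty table: the aggregate still yields one row, holding NULL.
empty_next = next_value(conn)
empty_min = min_value(conn)

conn.executemany("insert into t values (?)", [(3,), (7,)])

# Populated table: ordinary min and max + 1.
filled_next = next_value(conn)
filled_min = min_value(conn)

print(empty_next, empty_min, filled_next, filled_min)  # prints "1 None 8 3"
```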