Snowflake Stored Procedure - Loop through CSV files in AWS S3 and COPY INTO tables with the same name

I was wondering if someone could help me with an error message I am getting from Snowflake. I am trying to create a stored procedure that loops through 125 files in S3 and copies each one into the corresponding table in Snowflake. The tables have the same names as the CSV files. In the example I have set up only 2 file names (if someone knows a better way than listing all 125, that would be extremely helpful).
The error message I am getting is the following:
syntax error line 5 at position 11 unexpected '1'.
syntax error line 6 at position 22 unexpected '='. (line 4)
CREATE OR REPLACE PROCEDURE load_data_S3(file_name VARCHAR,table_name VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
FOR i IN 1 to 2 LOOP
CASE i
WHEN 1 THEN
SET file_name = 'file1.csv';
SET table_name = 'FILE1';
WHEN 2 THEN
SET file_name = 'file2.csv';
SET table_name = 'FILE2';
--WILL LIST THE REMAINING 123 WHEN STATEMENTS
ELSE
-- Do nothing
END CASE;
COPY INTO table_name
FROM @externalstg/file_name
FILE_FORMAT = (type='csv');
END LOOP;
RETURN 'Data loaded successfully';
END;
$$;

There are various ways to list the files in a stage (see the post here), for example the LIST command. You can loop through the result set and run COPY INTO for each file it returns.
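Rather than hard-coding 125 WHEN branches, the file list itself can drive the statements. The sketch below is written in Python outside Snowflake and only illustrates the idea. It assumes that the file names come from the name column of LIST @externalstg, that every file sits directly under the stage root, and that each table is named after its file, upper-cased and without the .csv extension. The bucket name my-bucket and both helper functions are hypothetical. Inside a Snowflake Scripting procedure, the same strings could be run one by one with EXECUTE IMMEDIATE while looping over the LIST result.

```python
import posixpath


def table_name_for(file_path: str) -> str:
    """Map a stage file path to its target table name.

    Assumed convention (from the question): the table is named after the
    CSV file, upper-cased, e.g. 's3://my-bucket/file1.csv' -> 'FILE1'.
    """
    stem, _ext = posixpath.splitext(posixpath.basename(file_path))
    return stem.upper()


def copy_statements(stage: str, file_paths: list[str]) -> list[str]:
    """Build one COPY INTO statement per CSV file returned by LIST @stage.

    Assumes every file sits directly under the stage root, so the basename
    is enough to address it as @stage/<file>.
    """
    statements = []
    for path in file_paths:
        if not path.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        file_name = posixpath.basename(path)
        statements.append(
            f"COPY INTO {table_name_for(path)} "
            f"FROM @{stage}/{file_name} "
            "FILE_FORMAT = (TYPE = 'CSV');"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical 'name' values, shaped like LIST output for an S3 stage.
    listed = [
        "s3://my-bucket/file1.csv",
        "s3://my-bucket/file2.csv",
        "s3://my-bucket/notes.txt",
    ]
    for stmt in copy_statements("externalstg", listed):
        print(stmt)
```

Adding file 126 then needs no code change, because the loop only depends on what the stage contains.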

Related

How to read the values from a flat file and use them in a loop in a PostgreSQL script?

The requirement is to read the values from a flat file and use them in a FOR loop to modify the pgSQL queries.
I'm new to PostgreSQL, so any suggestions are welcome.
CREATE FUNCTION file_vals()
RETURNS varchar[] AS
$BODY$
return (readLines('my_file.txt'))
$BODY$
language plpgsql;
DO $ANONYMOUS_BLOCK$ declare
/* data */
begin
for c, l in user_c, file_vals()
loop
perform x.y(user => c.user, password => l);
RAISE NOTICE '%', c.user;
end loop;
end $ANONYMOUS_BLOCK$;
I'm trying to read the values from my_file.txt and use them in the FOR loop to insert values into the database (x.y is the psql function).
my_file.txt contains passwords:
a
b
c
d
user_c is a cursor that returns the user values.
I need to use both sets of values, i.e. user_c and file_vals(), in one single FOR loop statement.
Error:
psql:test.psql:6: ERROR: syntax error at or near "return"
LINE 4: return (readLines('my_file.txt'))
^
SET
psql:test.psql:20: ERROR: "c" is not a known variable
LINE 7: for c, l in user_c, file_vals()

INFORMIX: How can I store a query result in a variable and echo it to the system as a string?

Is it possible to do something like this inside an Informix stored procedure:
DEFINE my_data VARCHAR(255);
LET meta = (select count(*), something from tab11);
SYSTEM 'echo '|| meta;

You capture the output using an INTO clause and wrap the SELECT in a FOREACH. This fetches one row of data at a time, and you need a separate variable for each column that you select. You can then combine those into a bigger string.
You can then pass that string to SYSTEM.
However, the output of echo will be sent to /dev/null (or NUL:). If that's what you want, fine — but why? If not, you'll need to organize redirection to somewhere else for yourself.
CREATE PROCEDURE echo(str VARCHAR(200) DEFAULT 'hello world');
DEFINE cmd VARCHAR(255);
LET cmd = "echo " || str || " >>/Users/jleffler/tmp/arcana.out";
SYSTEM cmd;
END PROCEDURE;
EXECUTE PROCEDURE echo();
EXECUTE PROCEDURE echo("The world is your oyster");
DROP PROCEDURE echo;
You'll need to adjust the file name to suit your purposes — the chances are high that you don't have my home directory on your machine.
Example output file:
hello world
The world is your oyster
Permissions on file and directories leading to file:
2 drwxr-xr-x root wheel 2017-05-24 17:17:16 /
169236 drwxr-xr-x root admin 2016-09-20 12:46:37 /Users
609973 drwxr-xr-x jleffler staff 2017-05-24 17:18:45 /Users/jleffler
1670154 drwxr-xr-x jleffler staff 2017-05-24 17:19:02 /Users/jleffler/tmp
63140467 -rw-r--r-- jleffler staff 2017-05-24 17:19:02 /Users/jleffler/tmp/arcana.out
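The two steps described above (fetch row by row into one variable per column, fold the rows into a string, then hand an echo command with explicit redirection to the shell) can be mirrored in a short Python sketch. Python's sqlite3 module stands in for Informix here, and the columns name and qty of tab11 are hypothetical. The shlex.quote call is an extra safety step that the SPL examples do not have: it keeps spaces and shell metacharacters in the data from breaking the echo command.

```python
import shlex
import sqlite3


def rows_to_string(conn: sqlite3.Connection) -> str:
    """Fetch one row at a time, one variable per column, and fold the rows
    into a single string, mirroring FOREACH SELECT ... INTO in Informix SPL."""
    pieces = []
    for name, qty in conn.execute("SELECT name, qty FROM tab11 ORDER BY name"):
        pieces.append(f"{name}={qty}")
    return ", ".join(pieces)


def echo_command(text: str, outfile: str) -> str:
    """Build an echo command whose output is appended to outfile instead of
    being discarded; shlex.quote protects spaces and shell metacharacters."""
    return f"echo {shlex.quote(text)} >> {shlex.quote(outfile)}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab11 (name TEXT, qty INTEGER)")  # hypothetical schema
    conn.executemany("INSERT INTO tab11 VALUES (?, ?)", [("b", 2), ("a", 1)])
    print(echo_command(rows_to_string(conn), "/tmp/arcana.out"))
```

With the two sample rows this prints echo 'a=1, b=2' >> /tmp/arcana.out, which is the kind of string the SPL procedure would pass to SYSTEM.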

I agree with everything @Jonathan Leffler mentioned above. Here is another example, where the SELECT statement returns a single integer value, similar to what you have shown in your question.
create procedure test();
DEFINE my_data int;
LET my_data = (select count(*) from systables);
system 'echo ' || my_data || ' > /tmp/my_data';
end procedure;
execute procedure test();
On my test system, the output of the SELECT statement is:
select count(*) from systables;
(count(*))
113
1 row(s) retrieved.
When I execute the procedure, the SYSTEM statement writes the result to the file /tmp/my_data:
cat /tmp/my_data
113
In short, it is certainly possible to achieve what you are looking for. However, depending on the result set of the SELECT statement, you may need more complex handling inside the stored procedure.

Shift in the columns of a spool file

I am using a shell script to extract the data from the extr table. It is a very large table with 410 columns and 61047 rows of data, and one record is around 5 KB.
The script is as follows:
#!/usr/bin/ksh
sqlplus -s \/ << rbb
set pages 0
set head on
set feed off
set num 20
set linesize 32767
set colsep |
set trimspool on
spool extr.csv
select * from extr;
/
spool off
rbb
#-------- END ---------
One day the extr.csv file had 2 records with an incorrect number of columns (one record with more columns and the other with fewer). On investigation I found that two records had been repeated in the file. The primary key should be unique within the file, but in this case 2 records appeared twice. Also, the shift in the columns was abrupt.
A small example of the output file:
5001|A1A|AAB|190.00|105|A
5002|A2A|ABB|180.00|200|F
5003|A3A|AAB|153.33|205|R
5004|A4A|ABB|261.50|269|F
5005|A5A|AAB|243.00|258|G
5006|A6A|ABB|147.89|154|H
5003|A7A|AAB|249.67|AAB|153.33|205|R
5004|A8A|269|F
5009|A9A|AAB|368.00|358|S
5010|AAA|ABB|245.71|215|F
Here the records with primary keys 5003 and 5004 have reappeared in place of 5007 and 5008. The duplicate records have also shifted the columns of those rows, adding columns to one and cutting columns from the other.
I need help analysing why this happened: why were 2 rows extracted twice, why were the other 2 rows missing from the file, and why were the records shifted?
Note: This script has been working fine for the last two years and has failed only this once (mentioned above). It ran successfully on the next run. Recently we added one more program that accesses the extr table with a cursor (SELECT only).

I reproduced a similar behaviour.
;-> cat input
5001|A1A|AAB|190.00|105|A
5002|A2A|ABB|180.00|200|F
5003|A3A|AAB|153.33|205|R
5004|A4A|ABB|261.50|269|F
5005|A5A|AAB|243.00|258|G
5006|A6A|ABB|147.89|154|H
5009|A9A|AAB|368.00|358|S
5010|AAA|ABB|245.71|215|F
Think of the input file as your database.
Now I write a script that reads "the database" and pauses for a random time in the middle of each line.
;-> cat writeout.sh
#!/usr/bin/ksh
# Start this script twice
while IFS=\| read a b c d e f; do
  # Print the first four fields without a trailing newline (tr -d removes it)
  echo "$a|$b|$c|$d|" | tr -d "\n"
  (( sleeptime = RANDOM % 5 ))
  sleep ${sleeptime}
  echo "$e|$f"
done < input >> output
EDIT: Removed cat input | in script above, replaced by < input
Start this script twice in the background
;-> ./writeout.sh &
;-> ./writeout.sh &
Wait until both jobs are finished and see the result
;-> cat output
5001|A1A|AAB|190.00|105|A
5002|A2A|ABB|180.00|200|F
5003|A3A|AAB|153.33|5001|A1A|AAB|190.00|105|A
5002|A2A|ABB|180.00|205|R
5004|A4A|ABB|261.50|269|F
5005|A5A|AAB|243.00|200|F
5003|A3A|AAB|153.33|258|G
5006|A6A|ABB|147.89|154|H
5009|A9A|AAB|368.00|358|S
5010|AAA|ABB|245.71|205|R
5004|A4A|ABB|261.50|269|F
5005|A5A|AAB|243.00|258|G
5006|A6A|ABB|147.89|215|F
154|H
5009|A9A|AAB|368.00|358|S
5010|AAA|ABB|245.71|215|F
When I change the last line of writeout.sh to "done > output", I do not see the problem, but that might be due to buffering and the small amount of data.
I still don't know exactly what happened in your case, but it really looks like two programs writing simultaneously to the same file.
For example, a job in TWS could have been restarted manually, or two scripts in your master script might write to the same file.
You can prevent this in future with locking or checks (for example, if the output file already exists, quit and return an error code to TWS).
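A minimal sketch of the "if the output file exists, quit and return an error code" check, written in Python rather than ksh. Opening with os.O_CREAT | os.O_EXCL makes the existence check and the file creation one atomic step, so two runs started at the same moment cannot both pass the check. The function name write_extract and the return code 1 are illustrative assumptions, and the sketch assumes that a later step moves or deletes the file once it has been consumed; otherwise the next scheduled run would also stop.

```python
import os
import sys


def write_extract(path: str, rows) -> int:
    """Write rows as pipe-separated lines, refusing to run if path exists.

    Returns 0 on success and 1 if the file already exists (for example
    because another run created it), so a scheduler such as TWS can flag
    the job as failed.
    """
    try:
        # O_EXCL: fail instead of opening a file that already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        print(f"{path} already exists; is another extract running?", file=sys.stderr)
        return 1
    with os.fdopen(fd, "w") as out:
        for row in rows:
            out.write("|".join(row) + "\n")
    return 0
```

A second concurrent run gets return code 1 and leaves the first run's file untouched, instead of interleaving its lines with it.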

DB2 Stored Procedure using a Cursor

DB2 V9 z/OS
CREATE PROCEDURE SERDB.I21MMSNOUPD ()
RESULT SETS 1
LANGUAGE SQL
FENCED
COLLID SER
WLM ENVIRONMENT DDSNSPENV
RUN OPTIONS 'NOTEST(NONE,*,*,*)'
P1: BEGIN
--Declare variables
DECLARE CONSUMER INTEGER;
DECLARE NEW_MMS_NO INTEGER;
DECLARE END_TABLE INT DEFAULT 0;
DECLARE C1 CURSOR FOR
SELECT I20_CONSUMER_ID,
NEW_MMS_NO
FROM SERDB.I20_TEMP
-- WHERE I20_CONSUMER_ID = 164921;
ORDER BY I20_CONSUMER_ID;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET END_TABLE = 1;
OPEN C1;
FETCH C1 INTO CONSUMER,
NEW_MMS_NO;
WHILE END_TABLE = 0 DO
UPDATE SERDB.I20_CONSUMER_T
SET I20_MMS_NO = NEW_MMS_NO
WHERE I20_CONSUMER_ID = CONSUMER;
END WHILE;
CLOSE C1;
END P1
The above stored procedure builds with condition code 0, but fails to execute, even for a specific consumer_id. Does anyone see something wrong?
Individual sql statements run exactly as they're supposed to.
I've followed the examples for Cursors in SQL Procedures from IBM.
Thank you

I agree 100% with @X-Zero: this seems like a huge amount of work defining cursors and whatnot, when you could do a simple set-based operation (likely with better performance). Here are two examples of how you can do it with a single operation:
Normal UPDATE:
UPDATE SESSION.I20_CONSUMER_T A
SET I20_MMS_NO = (
SELECT NEW_MMS_NO
FROM SESSION.I20_TEMP B
WHERE A.I20_CONSUMER_ID = B.CONSUMER
)
WHERE EXISTS (
SELECT 1
FROM SESSION.I20_TEMP C
WHERE A.I20_CONSUMER_ID = C.CONSUMER
)
New MERGE hotness:
MERGE INTO SESSION.I20_CONSUMER_T AS T
USING SESSION.I20_TEMP AS M
ON T.I20_CONSUMER_ID = M.CONSUMER
WHEN MATCHED THEN
UPDATE SET T.I20_MMS_NO = M.NEW_MMS_NO
ELSE IGNORE
These were tested on DB2 for Linux/Unix/Windows v9.7, but should work on any version of DB2 newer than 9.1 (DB2 for iSeries is a wildcard, I never remember what that platform does or doesn't support :) )
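The UPDATE ... WHERE EXISTS form can be tried out quickly with Python's built-in sqlite3 module standing in for DB2. SQLite has no MERGE, so only the first statement is mirrored. The simplified schemas and sample rows below are assumptions, with column names borrowed from the question. The WHERE EXISTS clause matters: without it, consumers that have no row in i20_temp would have i20_mms_no overwritten with NULL by the subquery.

```python
import sqlite3

# Hypothetical, simplified versions of the question's tables.
SETUP = """
CREATE TABLE i20_consumer_t (i20_consumer_id INTEGER PRIMARY KEY, i20_mms_no INTEGER);
CREATE TABLE i20_temp (i20_consumer_id INTEGER PRIMARY KEY, new_mms_no INTEGER);
INSERT INTO i20_consumer_t VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO i20_temp VALUES (1, 100), (3, 300);
"""


def set_based_update(conn: sqlite3.Connection) -> None:
    """One correlated UPDATE instead of a cursor loop."""
    conn.execute(
        """
        UPDATE i20_consumer_t
        SET i20_mms_no = (
            SELECT t.new_mms_no FROM i20_temp AS t
            WHERE t.i20_consumer_id = i20_consumer_t.i20_consumer_id
        )
        WHERE EXISTS (
            SELECT 1 FROM i20_temp AS t
            WHERE t.i20_consumer_id = i20_consumer_t.i20_consumer_id
        )
        """
    )


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    set_based_update(conn)
    return conn.execute(
        "SELECT i20_consumer_id, i20_mms_no FROM i20_consumer_t ORDER BY i20_consumer_id"
    ).fetchall()
```

demo() updates consumers 1 and 3 from i20_temp and leaves consumer 2 unchanged.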

The FETCH command must be inside the WHILE, so that each time it is invoked, it fetches a row.
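The same cursor pattern, sketched in Python with the standard sqlite3 module standing in for DB2 (the tables and rows are simplified assumptions): one priming fetch before the loop, as in the question, plus a fetch at the end of every iteration. If the second fetch is removed, row never changes, so the loop keeps updating the same consumer forever.

```python
import sqlite3

# Hypothetical, simplified versions of the question's tables.
SETUP = """
CREATE TABLE i20_consumer_t (i20_consumer_id INTEGER PRIMARY KEY, i20_mms_no INTEGER);
CREATE TABLE i20_temp (i20_consumer_id INTEGER PRIMARY KEY, new_mms_no INTEGER);
INSERT INTO i20_consumer_t VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO i20_temp VALUES (1, 100), (3, 300);
"""


def cursor_update(conn: sqlite3.Connection) -> None:
    """Row-by-row update with the fetch inside the loop."""
    read = conn.execute(
        "SELECT i20_consumer_id, new_mms_no FROM i20_temp ORDER BY i20_consumer_id"
    )
    row = read.fetchone()  # priming fetch, like the FETCH before WHILE
    while row is not None:  # plays the role of WHILE END_TABLE = 0
        consumer, new_mms_no = row
        conn.execute(
            "UPDATE i20_consumer_t SET i20_mms_no = ? WHERE i20_consumer_id = ?",
            (new_mms_no, consumer),
        )
        row = read.fetchone()  # the FETCH inside the loop advances the cursor


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    cursor_update(conn)
    return conn.execute(
        "SELECT i20_consumer_id, i20_mms_no FROM i20_consumer_t ORDER BY i20_consumer_id"
    ).fetchall()
```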

MySQL stored procedure: using declared variables in a LIMIT clause returns an error

I have the following code:
delimiter ;
DROP PROCEDURE IF EXISTS ufk_test;
delimiter //
CREATE PROCEDURE ufk_test(IN highscoreChallengeId INT UNSIGNED)
BEGIN
DECLARE vLoopOrder INT UNSIGNED DEFAULT 5;
DECLARE vLoopLimit INT UNSIGNED DEFAULT 10;
select * from fb_user LIMIT vLoopOrder,vLoopLimit;
END//
delimiter ;
MySQL returns the following error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'vLoopOrder,vLoopLimit;
END' at line 11
It seems that I cannot use declared variables in a LIMIT clause. Is there another way to get around this?
Of course this is a simple example, and here I could just use static numbers, but I need to know whether it's possible in any way to use any kind of variable with LIMIT.
Thanks

I use something like:
SET @s = CONCAT('SELECT * FROM fb_user LIMIT ', vLoopOrder, ', ', vLoopLimit);
PREPARE stmt1 FROM @s;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
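An alternative to concatenating the numbers into the SQL text is to bind them as statement parameters. MySQL's prepared statements accept ? placeholders in LIMIT, with the values supplied through EXECUTE ... USING. The sketch below shows the binding idea with Python's sqlite3 module standing in for MySQL, so it is not MySQL syntax itself. It also makes the semantics explicit: MySQL's LIMIT 5, 10 means skip 5 rows and return at most 10, which SQLite writes as LIMIT 10 OFFSET 5. The single-column fb_user table is a simplified assumption.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, offset: int, count: int) -> list:
    """Equivalent of MySQL's LIMIT offset, count, with both values bound
    as parameters rather than pasted into the SQL string."""
    rows = conn.execute(
        "SELECT id FROM fb_user ORDER BY id LIMIT ? OFFSET ?", (count, offset)
    ).fetchall()
    return [r[0] for r in rows]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fb_user (id INTEGER PRIMARY KEY)")  # simplified table
    conn.executemany("INSERT INTO fb_user (id) VALUES (?)", [(i,) for i in range(1, 21)])
    # vLoopOrder = 5 and vLoopLimit = 10, as in the question
    return fetch_page(conn, offset=5, count=10)
```

With ids 1 to 20, demo() skips ids 1 to 5 and returns ids 6 to 15.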
