I am struggling to dynamically build a CREATE or ALTER statement from a list of keys(n) results, to be executed via the apoc.load.jdbcUpdate function. I have this function working very well for simple CREATE and INSERT statements.
The challenge I have is the need to export numerous lists of keys(n) in order to build a table in SQL/MS Access. Is there a way to "loop" through this list or collection to dynamically build this statement? This would be followed by a similar dynamically built INSERT statement to insert the records themselves.
My actual data comes from several Neo4j queries that get keys(n) of various nodes and combine them into a long list. I'll likely have 50-200 columns for each export, and every column will be unique or different, with only about 10 columns common to every export. At present, I can make every column TEXT(12), but a next step will be getting the associated type for each of the keys(n).
Another option I tried was ALTERing an existing "template" table; however, the same issue with "looping" through columns persists.
Many thanks in advance!
:param exportUrl => 'jdbc:ucanaccess:///artifacts/export/sample.accdb'
WITH ['colA', 'colB', 'colC', 'colD', 'colE', 'colF', 'colG', 'colH', 'colI', 'colJ'] AS listCOLS
UNWIND listCOLS as columnNAMES
WITH columnNAMES,
'CREATE Table TBL_ATTR ([colA] TEXT(12), [colB] Double, [colC] TEXT(36), [colD] TEXT(38), [colE] Integer, [colF] TEXT(24), ' +
'[colG] TEXT(20), [colH] TEXT(2), [colI] TEXT(4), [colJ] TEXT(50))' as statement
CALL apoc.load.jdbcUpdate($exportUrl, statement) YIELD row
return COUNT(row);
The code above seems to run correctly and CREATE the table; however, I do receive the following exception:
Failed to invoke procedure `apoc.load.jdbcUpdate`: Caused by: org.hsqldb.HsqlException: object name already exists: TBL_NEW
It appears I have something that works now, using a CREATE statement and then adding an ALTER TABLE statement for each element in the rows (after UNWIND).
Effectively, I unwind the list into a table format, the apoc function creates the table with a default column, and then ALTER TABLE commands run to add each column; this also lets me set the data type for each column (as in the example below).
It does seem redundant to send each line through the apoc.load.jdbcUpdate() function, but it gives me some control and doesn't really impact speed (it creates ~100 columns in 54 ms for a larger data set, which doesn't seem too bad).
:param exportUrl => 'jdbc:ucanaccess:///artifacts/export/sample.accdb'
WITH ['colA', 'colB', 'colC', 'colD', 'colE', 'colF', 'colG', 'colH', 'colI', 'colJ'] AS listCOLS
UNWIND listCOLS as columnNAMES
WITH columnNAMES,
'CREATE Table TBL_ATTR ([DEFAULT] TEXT(12))' as statement
CALL apoc.load.jdbcUpdate($exportUrl, statement) YIELD row
WITH columnNAMES,
CASE columnNAMES //modify data types for various data
WHEN "colB" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' DOUBLE'
WHEN "colC" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' INTEGER'
WHEN "colH" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' DATE'
WHEN "colJ" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' TEXT(12)'
ELSE 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' TEXT(25)'
END as statement
WITH columnNAMES, statement
CALL apoc.load.jdbcUpdate($exportUrl, statement) YIELD row
return count(row);
While the function executes successfully and I can see the columns correctly reflected in the export file, I still receive an error, so I am not sure what is throwing the exception. (The likely culprit is that the CREATE statement sits after the UNWIND, so it runs once per list element, and every run after the first fails because TBL_ATTR already exists.)
Failed to invoke procedure `apoc.load.jdbcUpdate`: Caused by: org.hsqldb.HsqlException: object name already exists: TBL_ATTR
Related
I am trying to build a BigQuery stored procedure where I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions here and here, and tried writing an EXECUTE IMMEDIATE version of the above, but I was not able to work out the right syntax.
I think the issue is that the SELECT statement in my code selects multiple columns (XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT) and the EXECUTE IMMEDIATE setup is meant to work only for one column. But I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.
I hope the below is helpful.
Pass the parameter as a string:
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
Use EXECUTE IMMEDIATE, since a table name can't be parameterized as a variable in a query:
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
Escape % in a format string with an additional %:
LIKE 'U%'
-->
LIKE 'U%%'
see PARSE_DATE not working in FORMAT() in BigQuery
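Putting both points together, the procedure from the question might look roughly like this (a sketch assembled from the pieces above; only step 1 is shown, matching the snippet):
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1: splice the table name in with FORMAT, since it can't be a query parameter
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID ----one (HH) to many (ecr)
""", table_name);
END;
----called with the table name as a plain string:
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');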
I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING type) as a fully qualified name, e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a table function (TVF) and use SET to start dynamically building the SQL to execute, then I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I try to add those, then I get various validation errors, basically because I need to remove the WITH clauses (CTEs, Common Table Expressions), semicolons, etc.
But what I am really trying to do with a table function is:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);
This might not be satisfying, but since procedural language and DDL are currently not allowed inside table functions, one possible way around it is simply using a PROCEDURE, like below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM %s ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
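One refinement worth noting: only the table name has to be spliced in with FORMAT; plain values such as filter_value can instead be bound as query parameters with USING, which avoids formatting them into the string. A sketch (the procedure name temp_procedure_using is made up for illustration):
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure_using(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT("""
SELECT year, COUNT(1) AS record_count
FROM `%s`
WHERE year = ?  -- bound via USING rather than spliced into the string
GROUP BY year
""", table_name)
USING filter_value;
END;
CALL my_dataset.temp_procedure_using(2002, 'bigquery-public-data.usa_names.usa_1910_current');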
Consider an enterprise that captures sensor data for different production facilities. Per facility, we create an aggregation query that averages the values into 5-minute timeslots. This query consists of a long list of WITH clauses and writes data to a table (called aggregation_table).
Now my problem: we currently have n queries running that execute exactly the same logic; the only thing that differs is the table names (and sometimes the column names, but let's ignore that for now).
Instead of managing n different scripts that are basically the same, I would like to put it in a stored procedure that is able to work like this:
CALL aggregation_query(facility_name) -> resolves the different tables for that facility and then uses them in the different WITH clauses
On top of that, instead of having one long set of clauses that produces the end result, I would like to chunk them up into logical blocks that are parametrizable. So, for example, if I call the aforementioned stored procedure for facility A, I want to be able to pass/use its table names in these different functions, where the output of one can be reused in the next statement (like you would do with WITH clauses).
Another argument for chunking this up into reusable blocks is that we have many "derivatives" of this aggregation query, for example to manage historical data, to correct data, or to produce the sensor data at another aggregation level. As these become overly complex, it is much easier to manage them without having to copy, paste, and adjust them every time.
About the current set-up, it is useful to know that I am only entitled to use plain BigQuery, as my team is not allowed to access the CI/CD / scheduling and repository (meaning that I cannot solve the issue by having CI/CD deploy the n different versions of the procedures and functions).
So in the end, I would like to end up with something like this, using only BigQuery:
CREATE OR REPLACE PROCEDURE `aggregation_function`()
BEGIN
DECLARE tablename STRING;
DECLARE active_table_name STRING;
## get list of tables
CREATE TEMP TABLE tableNames AS
SELECT table_catalog, table_schema, table_name
FROM `catalog.schema.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename;
WHILE (SELECT COUNT(*) FROM tableNames) >= 1 DO
  ## build dataset + table name
  SET active_table_name = CONCAT('`', table_catalog, '.', table_schema, '.', table_name, '`');
  ## use concat to build string and execute
  EXECUTE IMMEDIATE '''
  INSERT INTO `aggregation_table_for_facility` (timeslot, sensor_name, AVG_VALUE)
  WITH
  STEP_1 AS (
    SELECT * FROM my_table_function_step_1(active_table_name, parameter1, parameter2)
  ),
  STEP_2 AS (
    SELECT * FROM my_table_function_step_2(STEP_1, parameter1, parameter2)
  )
  SELECT * FROM STEP_2
  '''
  USING active_table_name AS active_table_name;
  DELETE FROM tableNames WHERE table_name = tablename;
END WHILE;
END;
I was hoping someone could put together a snippet on how to do this in Standard SQL / BigQuery, so basically:
a stored procedure that takes in a string variable and is able to use it as a table (partly solved in the approach above, but I'm not sure whether there are better ways)
a (table) function that can take this table_name parameter as well and return a table that can be used in the next WITH clause (or that alternatively writes to a temp table)
I think the code snippets below should provide you with some insight into dealing with procedures, inserts, and EXECUTE IMMEDIATE statements.
Here I'm creating a procedure that inserts values into a table looked up via the INFORMATION_SCHEMA. Also, because I want to return a value, I use OUT active_table_name to hand back the value I assigned inside the procedure.
CREATE OR REPLACE PROCEDURE `project-id.dataset`.custom_function(tablename STRING,OUT active_table_name STRING)
BEGIN
DECLARE query STRING;
SET active_table_name= (SELECT CONCAT('`',table_catalog,'.',table_schema,'.' ,table_name,'`')
FROM `project-id.dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = tablename);
# a multiline query can be handled by using ''' or """
SET query =
'''
insert into %s (string_field_0,string_field_1,string_field_2,string_field_3,string_field_4,int64_field_5)
with custom_query as (
select string_field_0,string_field_2,'169 BestCity',string_field_3,string_field_4,55677 from %s limit 1
)
select * from custom_query;
''';
# the dynamic query must be the last thing the procedure performs
# pass parameters using FORMAT
EXECUTE IMMEDIATE FORMAT(query, active_table_name, active_table_name);
END;
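For a single table, a minimal calling script for this procedure might look like this ('my_table' is a placeholder):
DECLARE active_table_name STRING;
CALL `project-id.dataset`.custom_function('my_table', active_table_name);
# the OUT parameter now holds the fully qualified name assembled inside the procedure
SELECT active_table_name;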
You can also use a loop to iterate through records from a working table, so it executes the procedure per record and can get the value back from the procedure to use somewhere else, e.g. in a second procedure that performs a delete operation.
DECLARE tablename STRING;
DECLARE out_value STRING;
FOR record IN
(SELECT tablename from `my-project-id.dataset.table`)
DO
SET tablename = record.tablename;
# one call per record; a bare LOOP here would never terminate, so call directly
call `project-id.dataset`.custom_function(tablename, out_value);
select out_value;
END FOR;
To recap, there are some restrictions, such as not being able to call a procedure inside an EXECUTE IMMEDIATE, or to nest an EXECUTE IMMEDIATE inside another EXECUTE IMMEDIATE, to name a few. I think these snippets should help you deal with your current situation.
For this sample I use the following documentation:
Data Manipulation Language
Dealing with outputs
Information Schema Tables
Execute Immediate
For...In
Loops
I have an old application I am supporting that uses a Microsoft Access database. The original table design did not add primary keys to every table. I am working on a migration program that among other things is adding and filling in a new primary key field (GUID) when needed.
This is happening in three steps:
1. Add a new GUID field with no constraints
2. Fill the field with new unique GUIDs
3. Add the primary key constraint
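For reference, steps 1 and 3 in Jet/Access DDL might look roughly like this (table, field, and constraint names are placeholders; step 2 is done from the Delphi code below):
-- step 1: add the new field with no constraints
ALTER TABLE MyTable ADD COLUMN NewPrimaryKeyField GUID;
-- step 3: once every row has a unique value, add the constraint
ALTER TABLE MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY (NewPrimaryKeyField);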
My problem is setting the unique guids when the table has duplicate rows. Here is my code to set the guids.
Query.SQL.Add('SELECT * FROM ' + TableName);
Query.Open;
while not Query.Eof do
begin
  Query.Edit;
  Query.FieldByName(NewPrimaryKeyFieldName).AsGuid := TGuid.NewGuid;
  Query.Post;
  Query.Next;
end;
FireDAC generates an UPDATE statement containing a WHERE clause with all the original fields/values of the row (since there is no unique field for it to use). However, because the rows are complete duplicates, the statement still updates two rows.
FireDAC correctly fails with this message:
Update command updated [2] instead of [1] record.
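Conceptually, the generated UPDATE looks something like this (illustrative field names, not the actual generated SQL):
UPDATE MyTable
SET NewPrimaryKeyField = :NewGuid
WHERE Field1 = :OldField1
  AND Field2 = :OldField2
  -- ...and so on for every original column; two identical rows both
  -- satisfy this WHERE clause, hence "[2] instead of [1]"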
I can open up the database in Access and delete the duplicate records or assign them a unique guid by editing the table. I would like my conversion tool to automatically do this.
Is there some way to work with these duplicate rows in FireDac? Either to update just one at a time, or to delete just one of them?
In my opinion there is no way to do it with just one SQL statement.
I would do this:
1. Copy the whole table without duplicates into a new temp table:
SELECT DISTINCT * FROM <TABLENAME>
2. Add the keys.
3. Delete the old table content and copy the de-duplicated content back from the temp table.
Notes:
1. The DB should be unavailable to everyone else during that operation.
2. Make a BACKUP before.
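In Access SQL, the copy step can be done with a make-table query, which creates the new table with matching field types. A minimal sketch (table names are placeholders; run each statement separately):
-- 1. copy the de-duplicated rows into a new table (make-table query)
SELECT DISTINCT * INTO TempCopy FROM MyTable;
-- 3. replace the old content with the unique rows
DELETE FROM MyTable;
INSERT INTO MyTable SELECT * FROM TempCopy;
After this, every row is unique, and the GUID-assignment loop from the question can update rows one at a time.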
I'm searching for a way to simulate "create table as select" in Firebird from an SP. We use this statement frequently in another product, because it makes it very easy to create smaller, indexable sets that provide very fast results server-side.
create temp table a select * from xxx where ...
create indexes on a ...
create temp table b select * from xxx where ...
create indexes on b ...
select * from a
union
select * from b
Or to avoid three or more levels of subqueries:
select *
from a where id in (select id
from b
where ... and id in (select id from c where))
The "create table as select" is very good cos it's provide correct field types and names so I don't need to predefine them.
I can simulate "create table as" in Firebird with Delphi as follows:
Run the select with no rows, get the table field types, convert them to a CREATE TABLE SQL statement, run it, and then run "insert into temp table " + selectsql to load the rows (without the order by).
That works fine.
But can I do the same thing in a common stored procedure that receives a select SQL statement and creates a new temp table with the result?
In other words: can I get the query result's field types so that I can build the field-creation SQL from them?
I'm just asking whether there is a way or not (otherwise I MUST specify the columns).
Executing DDL inside a stored procedure is not supported by Firebird. You could do it using EXECUTE STATEMENT, but it is not recommended (see the warning at the end of the "No data returned" topic).
One way to have your "temporary sets" would be to use (transaction-level) Global Temporary Tables. Create the GTT as part of the database, with the correct datatypes but without constraints (those would probably get in the way when you fill only some columns, not all); each transaction then only sees its own version of the table and data.
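A minimal sketch of such a transaction-level GTT, declared once as part of the database (the column list is illustrative, since it must be known up front):
CREATE GLOBAL TEMPORARY TABLE TMP_A (
  ID INTEGER,
  NAME VARCHAR(100)
) ON COMMIT DELETE ROWS;  -- transaction-level: rows disappear on commit/rollback
CREATE INDEX IDX_TMP_A_ID ON TMP_A (ID);  -- regular indexes work on GTTs
A stored procedure can then fill it with INSERT INTO TMP_A SELECT ... and query it like any other table; the data stays private to the transaction.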