Correctly use the SQL Function REGEXP_SUBSTR in Informatica - data-warehouse

--Specific Informatica PowerCenter Qs--
I have an incoming data field like the one below and need to extract the substrings on either side of the hyphens into individual fields of the target table. I get the correct results from the database, but the same logic is not working in Informatica: the Expression transformation says the code parsed successfully, but nothing gets loaded.
It would be great if someone could assist me with all 8 REGEXP code lines, as the pattern seems to differ quite a bit as I traverse deeper into the string.
select replace(regexp_substr('ABC-10000-DEF-200-*-*-XYZ-*' ,'[^-]*(-|$)',1,1), '-', '' ) from dual;
select replace(regexp_substr('ABC-10000-DEF-200-*-*-XYZ-*' ,'[^-]*(-|$)',1,2), '-', '' ) from dual;
select replace(regexp_substr('ABC-10000-DEF-200-*-*-XYZ-*' ,'[^-]*(-|$)',1,3), '-', '' ) from dual;
select regexp_substr('ABC-10000-DEF-200-*-*-XYZ-*','[^-]+',1,1) from dual;
select regexp_substr('ABC-10000-DEF-200-*-*-XYZ-*','[^-]+',1,2) from dual;
select regexp_substr('ABC-10000-DEF-200-*-*-XYZ-*','[^-]+',1,3) from dual;
INFA Case 1: With the expression below, the first occurrence succeeds, but the other 7 substring extracts come back as NULL.
REG_EXTRACT(String_Input,'([^-]*),?([^-]*),?([^-]*).*',1) --> Succeeds
REG_EXTRACT(String_Input,'([^-]*),?([^-]*),?([^-]*).*',2) --> Null
REG_EXTRACT(String_Input,'([^-]*),?([^-]*),?([^-]*).*',3) --> Null, and so on up to 8.
Case 2: When I use the below, I get all Nulls.
REG_EXTRACT('String_Input','[^-]+',1,1) --> Null
REG_EXTRACT('String_Input','[^-]+',1,2) --> Null
REG_EXTRACT('String_Input','[^-]+',1,3) --> Null
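For cross-checking the expected results, the per-occurrence extraction can be sketched with Python's re module (a stand-in for the database REGEXP_SUBSTR calls above, assuming hyphen-delimited input; nth_token is a hypothetical helper name, not an Informatica function):

```python
import re

def nth_token(s: str, n: int):
    """Return the n-th (1-based) hyphen-delimited token, mirroring
    REGEXP_SUBSTR(s, '[^-]+', 1, n). Note that [^-]+ skips empty
    tokens, whereas splitting on '-' would preserve them."""
    matches = re.findall(r'[^-]+', s)
    return matches[n - 1] if n <= len(matches) else None

s = 'ABC-10000-DEF-200-*-*-XYZ-*'
print([nth_token(s, i) for i in range(1, 9)])
# → ['ABC', '10000', 'DEF', '200', '*', '*', 'XYZ', '*']
```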

Related

bigquery sql table function with string interpolation

I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a table function (TVF) and use SET to start dynamically building the SQL to execute, then I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I add those, then I get various validation errors, essentially because I need to remove the WITH used for the CTEs (Common Table Expressions), the semicolons, and so on.
But what I am really trying to do with a table function is:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);
This might not be satisfying, but since procedural language and DDL are currently not allowed inside table functions, one possible workaround is simply to use a PROCEDURE, like below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) AS record_count ",
"FROM `%s` ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
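The FORMAT/CONCAT templating in the procedure can be sanity-checked outside BigQuery; this Python sketch only builds the query text that EXECUTE IMMEDIATE would receive (the table name and filter are the example's own, and build_query is a hypothetical helper, not a BigQuery API):

```python
def build_query(table_name: str, filter_value: int) -> str:
    # mirror of FORMAT("SELECT ... FROM `%s` WHERE year = %d ...", ...)
    return (
        "SELECT year, COUNT(1) AS record_count "
        f"FROM `{table_name}` "          # back-quote the fully qualified name
        f"WHERE year = {filter_value} "
        "GROUP BY year;"
    )

sql = build_query('bigquery-public-data.usa_names.usa_1910_current', 2002)
print(sql)
```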

Is there a simpler way to get the argument type list for a snowflake procedure?

I need to transfer ownership of snowflake procedures post clone to a new Role.
To do this I'm using a procedure which works through all objects from the database.information_schema.xxxx views.
The procedures are problematic, though: SHOW PROCEDURES has a column which shows the procedure signature as just the argument types, but the information_schema.procedures view shows the actual parameter name as well as its argument type. Passing the latter into a GRANT command does not work, because the grant expects the argument-type signature only, not the parameter names :/
SHOW PROCEDURE ARGUMENTS => PROCEDURE_NAME(VARCHAR) RETURN VARCHAR
INFORMATION_SCHEMA.PROCEDURES.ARGUMENT_SIGNATURE => PROCEDURE_NAME(P_PARAM1 VARCHAR)
I eventually came up with this, which was fun, but it feels rather complicated. The question is: have I missed a simpler approach?
SELECT procedure_name
, concat('(',listagg(argtype, '') within group (order by argindex)) cleanArgTypes
FROM (SELECT procedure_name
, argument_signature
, lf.index argindex
, lf.value argtype
FROM rock_dev_test_1.information_schema.procedures
, lateral flatten(input=>split(decode(argument_signature
,'()','( )'
,argument_signature
),' ')
,outer=>true) lf
WHERE lf.index/2 != round(lf.index/2)
)
GROUP BY procedure_name
, argument_signature
ORDER by 1,2;
cleanArgTypes => (VARCHAR)
This takes the overly specific argument_signature, splits it into an array using space as a delimiter, laterally flattens the result set into rows, discards the parameter names (always at an even index), then groups by procedure name and argument signature and uses LISTAGG to put the argument types back into a string.
One small wrinkle: '()' doesn't work, so it has to be shifted to '( )'.
Whilst I enjoyed dabbling with some of Snowflake's semi-structured capabilities, if there were a simpler approach I'd rather use it!
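For sanity-checking the expected output, the same strip-the-parameter-names idea can be sketched in plain Python (clean_signature is a hypothetical helper, assuming the signature format shown above):

```python
def clean_signature(argument_signature: str) -> str:
    """Strip parameter names from a signature such as
    '(P_PARAM1 VARCHAR)' -> '(VARCHAR)'; an empty '()' is kept as-is."""
    inner = argument_signature.strip('()').strip()
    if not inner:
        return '()'
    # each argument is 'NAME TYPE'; keep only the type (the last word)
    types = [arg.split()[-1] for arg in inner.split(',')]
    return '(' + ', '.join(types) + ')'

print(clean_signature('(P_PARAM1 VARCHAR)'))   # → (VARCHAR)
print(clean_signature('()'))                   # → ()
```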
Mostly the same code, but it doesn't need to be nested. I swapped from the arg_sig (the input) to using the SEQ of the split, but it's mostly the same still:
SELECT p.procedure_name
,'( '|| listagg(split_part(trim(t.value),' ',2), ', ') within group (order by t.index) || ')' as out
FROM information_schema.procedures as p
,table(split_to_table(substring(p.argument_signature, 2,length(p.argument_signature)-2), ',')) t
group by 1, t.seq;
for the toy procedures in my stack overflow schema I get:
PROCEDURE_NAME            OUT
DATE_HANDLER              ( DATE)
TODAYS_DELIVERY_AMOUNT    ( VARCHAR)
ABC                       ( TIMESTAMP_NTZ, TIMESTAMP_NTZ, VARCHAR)
ABC_DAILY                 ( )
STRING_HANDLER            ( VARCHAR)
I don't think there's a built-in way to do this. Here's an alternate way:
with A as
(
select PROCEDURE_NAME, split(replace(trim(ARGUMENT_SIGNATURE, '()'), ','), ' ') ARGS
,ARGUMENT_SIGNATURE
from test.information_schema.procedures P
)
select PROCEDURE_NAME
,listagg(VALUE::string, ',') as ARGS
from A, table(flatten(ARGS))
where index % 2 = 1
group by PROCEDURE_NAME
;
You could also use a result scan after the show command to get the name of the procedure and argument signature in a single string:
show procedures;
select split("arguments", ')')[0] || ')' as SIGNATURE from table(result_scan(last_query_id()));
I wrote this to pull the list of procedures from the information schema with a properly formatted argument signature, using a combination of splitting the string up with SPLIT, putting each value on a separate row with LATERAL FLATTEN, filtering to only the data types using the INDEX, and then re-grouping with LISTAGG. No subquery needed, either.
SELECT PROCEDURE_SCHEMA || '.' || PROCEDURE_NAME || '(' ||
REPLACE(LISTAGG(C.Value,' ') WITHIN GROUP (ORDER BY C.INDEX),'(','') AS "Procedure"
FROM INFORMATION_SCHEMA.PROCEDURES,
LATERAL FLATTEN (INPUT => SPLIT(ARGUMENT_SIGNATURE,' ')) C
WHERE INDEX % 2 = 1 OR ARGUMENT_SIGNATURE = '()'
GROUP BY PROCEDURE_SCHEMA, PROCEDURE_NAME, ARGUMENT_SIGNATURE

Solve the syntax error with Redshift operator does not exist and add explicit casts

I am a newbie in the area of Redshift data modeling and got myself into trouble with the error below:
ERROR: operator does not exist: text | record
Hint: No operator matches the given name and argument type(s). You may need to add explicit type casts.
Where: SQL statement "SELECT 'create temp table ' || $1 || ' as select * from' | $2 |"
PL/pgSQL function "egen" line 36 at execute statement
[ErrorId: 1-61dc32bf-0a451f5e2c2639235abb8876]
I am trying to do a simple transformation that gets returned as output when the procedure is called. (From the documentation I found that we have to use either a temp table or cursors to achieve this.)
Pseudocode:
1. Restrict the data to its latest year (2019).
2. Get the list of managers.
3. Create a column flagging whether a person is a manager, based on that list.
4. Return it as a result.
The data looks as follows (Employee Data).
My SELECT query works fine outside of the procedure; please find my complete code below.
CREATE OR REPLACE PROCEDURE EGEN(tmp_name INOUT varchar(256) )
AS $$
DECLARE
--As i have less data managed to create it as an array or please use temp or table and join it with the actual query to perform transformation
MGR_RECORD RECORD;
DATAS RECORD;
item_cnt int := 0;
V_DATE_YEAR int := 0;
BEGIN
--EXECUTE (select cast(extract(year from current_date) as integer)-3) INTO V_DATE_YEAR;
--Manager Records are stored here below
SELECT DISTINCT managerid from "dev"."public"."emp_salary" INTO MGR_RECORD;
SELECT employeeid,
managerid,
promotion,
q_bonus,
d_salary,
case when contractor = 'x'
then 'TemporaryEmployee'
else 'PermanentEmployee'
END as EmployeeType,
-- IFstatement not supported under select query
case when employeeid in (select distinct managerid FROM "dev"."public"."emp_salary" )
then 'Manager'
else 'Ordinary FTE'
END as FTETYPE
FROM "dev"."public"."emp_salary" where cast(extract(year from promotion) as int ) >= 2019 into DATAS;
--COMMIT;
tmp_name := 'ManagerUpdatedTable';
EXECUTE 'drop table if exists ' || tmp_name;
EXECUTE 'create temp table ' || 'ManagerUpdatedTable' || ' as select * from' |DATAS| ;
END;
$$ LANGUAGE plpgsql;
-- Call tests
CALL EGEN('myresult');
SELECT * from myresult;
Also, an additional question: can we replace
case when employeeid in (select distinct managerid FROM "dev"."public"."emp_salary" )
then 'Manager'
else 'Ordinary FTE'
END as FTETYPE
this CASE expression in the query with an IF? If possible, please provide details.
Thanks and Regards,
Gabby
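For what it's worth, the quoted error points at two separate issues in the EXECUTE line: | is not the SQL concatenation operator (that is ||), and DATAS is a RECORD, which cannot be spliced into the SQL text; the dynamic statement must be assembled from strings only. A minimal Python sketch of the string that needs to be built (build_temp_table_sql is a hypothetical helper; the names follow the example):

```python
def build_temp_table_sql(tmp_name: str, source_query: str) -> str:
    # In PL/pgSQL this would be:
    #   'create temp table ' || tmp_name || ' as ' || source_query
    # i.e. || (concatenation), not |, with a string -- not a RECORD --
    # on the right-hand side.
    return 'create temp table ' + tmp_name + ' as ' + source_query

sql = build_temp_table_sql(
    'ManagerUpdatedTable',
    "select * from emp_salary where extract(year from promotion) >= 2019",
)
print(sql)
```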

ruby on rails' alphabetical order method doesn't place word boundary before "a"? [duplicate]

I use PostgreSQL 9.3.3 and I have a table with one column named title (character varying(50)).
When I have executed the following query:
select * from test
order by title asc
I got the following results:
#
A
#Example
Why is "#Example" in the last position? In my opinion, "#Example" should be in the second position.
Sort behaviour for text (including char and varchar as well as the text type) depends on the current collation of your locale.
See previous closely related questions:
PostgreSQL Sort
https://stackoverflow.com/q/21006868/398670
If you want to do a simplistic sort by ASCII value, rather than a properly localized sort following your local language rules, you can use the COLLATE clause
select *
from test
order by title COLLATE "C" ASC
or change the database collation globally (requires dump and reload, or full reindex). On my Fedora 19 Linux system, I get the following results:
regress=> SHOW lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#'))
SELECT title FROM v ORDER BY title ASC;
title
-------
#
a
#a
a#
a#a
(5 rows)
regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#'))
SELECT title FROM v ORDER BY title COLLATE "C" ASC;
title
-------
#
#a
a
a#
a#a
(5 rows)
PostgreSQL uses your operating system's collation support, so it's possible for results to vary slightly from host OS to host OS. In particular, at least some versions of Mac OS X have significantly broken unicode collation handling.
It seems that, when sorting, Oracle as well as Postgres simply ignores non-alphanumeric chars, e.g.
select '*'
union all
select '#'
union all
select 'A'
union all
select '*E'
union all
select '*B'
union all
select '#C'
union all
select '#D'
order by 1 asc
returns (look: the DBMS doesn't pay any attention to the prefix before 'A'..'E'):
*
#
A
*B
#C
#D
*E
In your case, what Postgres actually sorts is
'', 'A' and 'Example'
If you put '#' in the middle of the string, the behaviour will be the same:
select 'A#B'
union all
select 'AC'
union all
select 'A#D'
union all
select 'AE'
order by 1 asc
returns ('#' is ignored, and so 'AB', 'AC', 'AD' and 'AE' are actually compared):
A#B
AC
A#D
AE
To change the comparison rules you should use collation, e.g.
select '#' collate "POSIX"
union all
select 'A' collate "POSIX"
union all
select '#Example' collate "POSIX"
order by 1 asc
returns (as required in your case):
#
#Example
A
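The two orderings can be mimicked in Python as a rough sketch (not the actual glibc/ICU algorithm): a plain sort compares code points, like COLLATE "C", while sorting on a key that strips non-alphanumeric characters approximates the punctuation-ignoring behaviour shown above:

```python
import re

titles = ['#', 'A', '#Example']

# Code-point order, like COLLATE "C" / "POSIX"
print(sorted(titles))                    # → ['#', '#Example', 'A']

def collate_key(s: str) -> str:
    # crude approximation: ignore non-alphanumeric characters entirely
    return re.sub(r'[^0-9A-Za-z]', '', s)

# Punctuation-ignoring order, approximating the en_US.UTF-8 result above
print(sorted(titles, key=collate_key))   # → ['#', 'A', '#Example']
```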

union between requests with replacement variables in sqlplus

I have 14 fields which are similar, and I search for the string 'A' in each of them. After that I would like to order by the "position" field.
-- some set in order to remove a lot of useless text
def col='col01'
select '&col' "Fieldname",
&col "value",
position
from oneTable
where &col like '%A%'
/
-- then for the second field, I only have to type two lines
def col='col02'
/
...
def col='col14'
/
This writes all the fields which contain 'A'. The problem is that those rows are not ordered by position.
If I use UNION between the queries, I cannot take advantage of the substitution variables (&col), and I would have to write a bash/ksh script in Unix to do the replacement. The problem is of course that database credentials would have to be hard-coded in that script (connection is not easy stuff).
If I use a REFCURSOR with OPEN, I cannot group the result sets together. I have only one query and cannot make a UNION of them (print refcursor1 union refcursor2 and print refcursor1+refcursor2 raise an exception; select * from refcursor1 union select * from refcursor2 does not work either).
How can I concatenate the results into one big REFCURSOR? Or use a union between two distinct runs ('/') of my query, something like holding the request while typing new definitions of the variable?
Thank you for any advice.
Does this answer your question?
CREATE TABLE #containingAValueTable
(
FieldName VARCHAR(10),
FieldValue VARCHAR(1000),
position int
)
def col='col01'
INSERT INTO #containingAValueTable
(
FieldName , FieldValue, position
)
SELECT '&col' "Fieldname",
&col "value",
position
FROM yourTable
WHERE &col LIKE '%A%'
/
-- then for the second field, I only have to type two lines
def col='col02'
INSERT INTO...
/
def col='col14'
/
select * from #containingAValueTable order by position
DROP TABLE #containingAValueTable
But I'm not totally sure about your use of the substitution variable called "col" (and I only have SQL Server to test my query, so I used explicit field names).
Edit: sorry for the '#' character; we use it so often in SQL Server for temporary tables that I didn't even know it was SQL Server-specific (moreover, I think it's mandatory in SQL Server when creating a temporary table). Whatever, I'm happy I could be useful to you.
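Since the repetition is purely mechanical, another way around the substitution-variable limitation is to generate the 14 INSERT statements with a small script and feed the resulting file to SQL*Plus. A sketch under the example's names (make_insert is a hypothetical helper):

```python
def make_insert(col: str) -> str:
    # mirrors the SQL*Plus template: one INSERT ... SELECT per column
    return (
        "INSERT INTO containingAValueTable (FieldName, FieldValue, position)\n"
        f"SELECT '{col}', {col}, position FROM oneTable WHERE {col} LIKE '%A%';"
    )

# col01 .. col14, zero-padded like the def col='col01' lines above
script = "\n".join(make_insert(f"col{i:02d}") for i in range(1, 15))
print(script)
```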
