I am trying out Scriptella to see if it will meet my needs. So far, it seems like a great tool. I've spent several hours studying sample scripts, searching forums, and trying to get the hang of nested queries/scripts.
This is an example of my ETL file, slightly cleaned up for brevity. Lines beginning with # are annotations I added and are not part of the actual ETL file. I am trying to insert/retrieve IDs and then pass them on to later script blocks. The most promising way to do this appears to be global variables, but I'm getting null when trying to retrieve the values. Later, I will be adding code to the script blocks that parses and significantly transforms fields before adding them to the DB.
There are no errors. I'm just not getting the OS ID and Category IDs that I'd expect. Thank you in advance.
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
  <connection id="in" driver="csv" url="mycsvfile.csv"/>
  <connection id="dest" url="jdbc:mysql://localhost:3306/pvm3" user="user" password="password"/>
  <connection id="js" driver="script"/>
  <query connection-id="in">
    <!-- all columns are selected, notably: OPERATINGSYSTEM, CATEGORY, QID, TITLE -->
    <query connection-id="dest">
      # Check whether the OS already exists, and get its ID if it does
      select max(os_id) as os_id, count(*) as os_cnt from etl_os where os = ?OPERATINGSYSTEM;
      # If it doesn't exist, add it and get the auto_increment value
      <script if="os_cnt==0">
        insert into etl_os(os) values(?OPERATINGSYSTEM);
        <query connection-id="dest">
          select last_insert_id() as os_id;
          # Store in a global so it can be accessed in later script blocks
          <script connection-id="js">
            etl.globals.put('os_id', os_id);
          </script>
        </query>
      </script>
      # Same select/insert pattern as above for category_id (omitted for brevity)
      # See if the KB record exists by qid; if not, add it with the OS ID and category ID we got earlier
      <query connection-id="dest">
        select max(qid) as existing_qid, count(*) as kb_cnt from etl_qids where qid = ?QID
        <script if="kb_cnt==0">
          insert into etl_qids(qid, category_id, os_id) values (?QID, ?{etl.globals.get('category_id')}, ?{etl.globals.get('os_id')});
        </script>
      </query>
    </query>
  </query>
</etl>
Found out how to do it: essentially, just nest queries to modify the data before passing it to a script. Below is a quick write-up of the solution. I did not understand at first that queries can be nested directly to transform the row before passing it on for processing; my impression was also that only scripts could manipulate the data.
(Query) raw data -> (Query) manipulate data -> (Script) write new data.
.. in is a CSV file ..
.. js is a driver="script" block ..
<query connection-id="in">
  <query connection-id="js">
    // transform data as needed here
    if (BASE_TYPE == '-') BASE_TYPE = '0';
    if (SECONDARY_TYPE == '-') SECONDARY_TYPE = '0';
    SIZES = SIZES.toLowerCase();
    query.next(); // call the nested scripts with the modified row
    <script connection-id="db">
      INSERT IGNORE INTO sizes(size) VALUES (?SIZE);
      INSERT IGNORE INTO test(base_type,secondary_type) VALUES (?BASE_TYPE, ?SECONDARY_TYPE);
    </script>
  </query>
</query>
I have this schema:
schema embeddings {
    document embeddings {
        field id type int {}
        field text_embedding type tensor<double>(d0[960]) {
            indexing: attribute | index
            attribute {
                distance-metric: euclidean
            }
        }
    }
    rank-profile closeness {
        num-threads-per-search: 1
        inputs {
            query(query_embedding) tensor<double>(d0[960])
        }
        first-phase {
            expression: closeness(field, text_embedding)
        }
    }
}
And these services:
...
<container id="query" version="1.0">
    <search/>
    <nodes>
        <node hostalias="query"/>
    </nodes>
</container>
<content id="mind" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="embeddings" mode="index"/>
    </documents>
    <nodes>
        <node hostalias="content1" distribution-key="0"/>
    </nodes>
</content>
...
Then I have a number of queries, all of the same format:
{
    'yql': 'select * from embeddings where ({approximate:false, targetHits:100} nearestNeighbor(text_embedding, query_embedding));',
    'timeout': 5,
    'hits': 100,
    'input': {
        'query(query_embedding)': [...],
    },
    'ranking': {
        'profile': 'closeness',
    },
}
which are then run via app.query_batch(test_queries)
The problem is that some responses look like this (and contain the id field as an integer, just as I inserted it):
{'id': 'id:embeddings:embeddings::786559', 'relevance': 0.5703559830732123, 'source': 'mind', 'fields': {'sddocname': 'embeddings', 'documentid': 'id:embeddings:embeddings::786559'}}
and others look like this (neither containing the int id I inserted, nor keeping the format of the previous example):
{'id': 'index:mind/0/b0dde169c545ce11e8fd1a17', 'relevance': 0.49024561522459087, 'source': 'mind'}
How can I make all responses look like the first one? Why are they different at all?
Some of them are filled with content and some are not, presumably because the fill timed out. Check the coverage info, and run with traceLevel=3 to see more details.
Some more background on what's going on:
Searches are executed in two phases. First, minimal information on each hit is returned from each content node up to the issuing container. These partial lists are then merged to produce the final list of matches (of length hits). For those we execute phase two, which is to fill in the content of the final hits; this involves another request to each of the content nodes to get the relevant content.
If there's little time left, or lots of data, or expensive summary features to compute, or a slow disk subsystem or network, or a node in some kind of trouble, this fill may time out with only some hits filled, which is what you are seeing.
Why are the ids not the true document ids in these cases? The text string id is stored in the on-disk document blob but not in memory as an attribute, so it needs to be fetched in the fill phase too. If it is not filled, an internally generated unique id is used instead.
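To see this from the client side, here is a minimal sketch (assuming pyvespa's app.query(body=...) interface on the app object from the question; adjust to your client) that re-runs one of the queries with tracing enabled and prints the coverage block:
# Hedged sketch: re-run a single query with tracing on and inspect coverage.
# Assumes pyvespa's app.query(body=...); the coverage/trace fields below
# follow the default Vespa result JSON.
body = dict(test_queries[0])       # one of the existing query bodies
body['traceLevel'] = 3             # ask Vespa to include execution traces
response = app.query(body=body)
root = response.json['root']
print(root.get('coverage'))        # e.g. {'coverage': 100, 'full': True, ...}
print(response.json.get('trace'))  # per-phase trace messages
If the coverage block reports full: False (or includes a degraded section), the fill phase did not complete in time for some hits.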
First of all, I'm still a newbie in Odoo, so this may be explained wrong, but I will try.
In an inherited invoice_report XML document I have a conditional field that needs to be shown if one column (field) in the DB is equal to another column. To be more precise: if invoice_origin from account_move is equal to name in sale_order.
This is its code:
<t t-foreach="request.env['sale.order'].search([('name', '=', o.invoice_origin)])" t-as="obj">
For example, in the database this invoice_origin is [{'invoice_origin': 'S00151-2022'}].
On invoices created from more than one sales order it is [{'invoice_origin': 'S00123-2022, S00066-2022'}].
How can I split this data so the foreach can use the part 'S00123-2022' and the part 'S00066-2022' separately?
Thank you.
You can try to split up the invoice origin and use the result for your existing code:
<t t-set="origin_list" t-value="o.invoice_origin and o.invoice_origin.split(', ') or []" />
<t t-foreach="request.env['sale.order'].search([('name', 'in', origin_list)])" t-as="obj">
<!-- do something -->
</t>
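For illustration, this is what that t-value expression evaluates to in plain Python (sample value taken from the question):
# Plain-Python illustration of the t-value expression above.
invoice_origin = 'S00123-2022, S00066-2022'
origin_list = invoice_origin and invoice_origin.split(', ') or []
print(origin_list)  # ['S00123-2022', 'S00066-2022']
# A False/empty invoice_origin falls through to [] instead of raising an error.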
I am creating a SAPUI5 application that is connected to a backend SAP system via OData. In the SAPUI5 application I use a smart chart control. Out of the box, the smart chart lets the user create filters for the underlying data. This works fine, except if you try to use multiple 'not equals' filters for one property. Is there a way to accomplish this?
I found out that all properties within an and_expression (including nested or_expressions) must have unique names.
The reason why two parameters with the same property don't get parsed into the select options:
/IWCOR/CL_ODATA_EXPR_UTILS=>GET_FILTER_SELECT_OPTIONS takes the expression you pass and parses it into a table of select options.
The select option table returned is of type /IWCOR/IF_ODATA_TYPES=>EDM_SELECT_OPTION_T which is a HASHED TABLE .. WITH UNIQUE KEY property.
From: https://archive.sap.com/discussions/thread/3170195
The problem is that you cannot combine NE terms with OR, because both values after the NE should be excluded from the result set.
So in the end it_filter_select_options is empty and only iv_filter_string is filled.
Is there a manual way of working around this problem (evaluating iv_filter_string) to handle multiple NE terms?
This would be an example request:
XYZ/SmartChartSet?$filter=(Category%20ne%20%27Smartphone%27%20and%20Category%20ne%20%27Notebook%27)%20and%20Purchaser%20eq%20%27CompanyABC%27%20and%20BuyDate%20eq%20datetime%272018-10-12T02%3a00%3a00%27&$inlinecount=allpages
Decoded, the filter reads: (Category ne 'Smartphone' and Category ne 'Notebook') and Purchaser eq 'CompanyABC' and BuyDate eq datetime'2018-10-12T02:00:00'. Normally I want this to exclude items with the categories 'Notebook' and 'Smartphone' from the result set that I retrieve from the backend.
If there is a bug inside /iwcor/cl_odata_expr_utils=>get_filter_select_options that makes it unable to handle multiple NE filters on the same component, and you cannot wait for an OSS note, I would suggest wrapping it in a new static method that implements the following logic (a sketch of the idea follows the list; if you get stuck with the ABAP implementation, I would try to at least partially implement it when I get time):
1. Get all instances of <COMPONENT> ne '<VALUE>' inside a ( ) (using a regex).
2. Replace each <COMPONENT> with <COMPONENT>_<i>, so there will be ( <COMPONENT>_1 ne '<VALUE_1>' and <COMPONENT>_2 ne '<VALUE_2>' and ... <COMPONENT>_<n> ne '<VALUE_n>' ).
3. Call /iwcor/cl_odata_expr_utils=>get_filter_select_options with the modified query.
4. Modify the rt_select_options result by changing <COMPONENT>_<i> back to <COMPONENT>.
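For illustration, a minimal Python sketch of the rename/restore idea (the real implementation would be an ABAP wrapper; the function name here is hypothetical):
import re

# Steps 1-2: make each "<COMPONENT> ne '<VALUE>'" property name unique so the
# hashed select-option table can hold all of them.
def uniquify_ne_terms(filter_string):
    counters = {}
    def rename(match):
        component, value = match.group(1), match.group(2)
        counters[component] = counters.get(component, 0) + 1
        return "%s_%d ne %s" % (component, counters[component], value)
    return re.sub(r"(\w+) ne ('[^']*')", rename, filter_string), counters

original = "(Category ne 'Smartphone' and Category ne 'Notebook') and Purchaser eq 'CompanyABC'"
renamed, seen = uniquify_ne_terms(original)
# renamed: "(Category_1 ne 'Smartphone' and Category_2 ne 'Notebook') and Purchaser eq 'CompanyABC'"
# Steps 3-4: pass the renamed string to get_filter_select_options, then strip
# the "_<i>" suffix from each property in the returned select options.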
I can't find the source, but I recall that multiple "ne" terms aren't supported. Isn't that the same thing that happens when you use multiple negative selections in SE16, where a warning is displayed?
I found this extract for Business ByDesign:
Excluding two values using the OR operator (for example: $filter=CACCDOCTYPE ne '1000' or CACCDOCTYPE ne '4000') is not possible.
The workaround I see is to select the categories you actively want in the UI5 app, not the ones you don't.
I can also confirm that the code snippet I've used for a long time for filtering has the same problem...
* <SIGNATURE>---------------------------------------------------------------------------------------+
* | Instance Public Method ZCL_MGW_ABS_DATA->FILTERING
* +-------------------------------------------------------------------------------------------------+
* | [--->] IO_TECH_REQUEST_CONTEXT TYPE REF TO /IWBEP/IF_MGW_REQ_ENTITYSET
* | [<-->] CR_ENTITYSET TYPE REF TO DATA
* | [!CX!] /IWBEP/CX_MGW_BUSI_EXCEPTION
* | [!CX!] /IWBEP/CX_MGW_TECH_EXCEPTION
* +--------------------------------------------------------------------------------------</SIGNATURE>
METHOD FILTERING.
FIELD-SYMBOLS <lt_entityset> TYPE STANDARD TABLE.
ASSIGN cr_entityset->* TO <lt_entityset>.
CHECK: cr_entityset IS BOUND,
<lt_entityset> IS ASSIGNED.
DATA(lo_filter) = io_tech_request_context->get_filter( ).
/iwbep/cl_mgw_data_util=>filtering(
exporting it_select_options = lo_filter->get_filter_select_options( )
changing ct_data = <lt_entityset> ).
ENDMETHOD.
I have posted this question on the Logi Analytics DevNet, but it is a graveyard there, so I am hoping I can get an answer here. I am using Logi Info v12.2.116.
I am using a stored procedure to get data to fill my reports. The stored procedure calls an RPG program. I have done this many times, but today, after creating a new stored procedure and a new report, the parameters for some reason do not match up. I have restarted Logi as well (sometimes it doesn't pick up new stored procedures until it reboots), but that didn't work either. I have never seen this error before:
The number of parameter values set or registered does not match the number of parameters
Here is the code for my parameters:
<DataLayer Type="SP" Command="myStoredProcedure">
    <SPParameters NullValue="'">
        <SPParameter SPParamDirection="Input" ID="GAct" SPParamSize="2" SPParamType="dt-129" Value="RE" />
        <SPParameter SPParamDirection="Input" ID="rsDetail" SPParamType="dt-129" Value="N" SPParamSize="1" />
        <SPParameter SPParamDirection="Input" ID="rsFromDate" SPParamSize="10" SPParamType="dt-7" Value="@Request.paramFromDate~" />
        <SPParameter SPParamDirection="Input" ID="rsToDate" SPParamSize="10" SPParamType="dt-7" Value="@Request.paramToDate~" />
        <SPParameter SPParamDirection="Input" ID="rsDepts" SPParamSize="256" SPParamType="dt-129" Value="@Request.paramAllDepartments~" />
    </SPParameters>
</DataLayer>
Here is the stored procedure definition:
1 IN GACT CHARACTER 2 No default
2 IN RSDETAIL CHARACTER 1 No default
3 IN RSFROMDATE DATE No default
4 IN RSTODATE DATE No default
5 IN RSDEPTS CHARACTER 256 No default
What in the world is causing this problem? Yes, I am connecting to the same partition/library. Yes, the program and stored procedure work perfectly fine when called outside of Logi. Yes, I am calling the right stored procedure.
Change:
<SPParameter SPParamDirection="Input" ID="GAct" SPParamSize="2" SPParamType="dt-129" Value="RE" />
To:
<SPParameter SPParamDirection="Input" ID="GAct" SPParamSize="2" SPParamType="dt-200" Value="RE" />
Char is generally considered to be a single character, which is likely what is causing your issue. Also (unrelated), I typically just use size 0 on all my SPParams to avoid truncation issues.
I have a stored procedure that I need to call using MyBatis, and I have managed to call it. The procedure has multiple OUT parameters, one of which is an Oracle cursor. I need to iterate over the Oracle cursor, but without any fine-tuning of the JDBC driver via the fetchSize attribute it goes row by row, which is very slow.
I am able to set the fetchSize attribute on the procedure call:
<select id="getEvents" statementType="CALLABLE" parameterMap="eventInputMap" fetchSize="1000">
{call myProc(?, ?, ?, ?, ?)}
</select>
But this doesn't help at all. I think it doesn't work because of the multiple OUT parameters, so the program doesn't know which OUT parameter the fetch size should be applied to. Is there any way to set the fetch size on the ResultSet (Oracle cursor)? When I use CallableStatement from the java.sql package, I am able to set the fetch size on the ResultSet.
Here are mapping files and procedure call from program:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper
    PUBLIC "-//ibatis.apache.org//DTD Mapper 3.0//EN"
    "http://ibatis.apache.org/dtd/ibatis-3-mapper.dtd">
<mapper namespace="mypackage.EventDao">
    <resultMap id="eventResult" type="Event">
        <result property="id" column="event_id" />
        <result property="name" column="event_name" />
    </resultMap>
    <parameterMap id="eventInputMap" type="map">
        <parameter property="pnNetworkId" jdbcType="NUMERIC" javaType="java.lang.Integer" mode="IN"/>
        <parameter property="pvUserIdentityId" jdbcType="VARCHAR" javaType="java.lang.String" mode="IN"/>
        <parameter property="result" resultMap="eventResult" jdbcType="CURSOR" javaType="java.sql.ResultSet" mode="OUT" />
        <parameter property="success" jdbcType="INTEGER" javaType="java.lang.Integer" mode="OUT"/>
        <parameter property="message" jdbcType="VARCHAR" javaType="java.lang.String" mode="OUT"/>
    </parameterMap>
    <select id="getEvents" statementType="CALLABLE" parameterMap="eventInputMap" fetchSize="1000">
        {call myProc(?, ?, ?, ?, ?)}
    </select>
</mapper>
And the call from the program:
SqlSession session = sqlSessionFactory.openSession();
Map<String, Object> eventInputMap = new HashMap<String, Object>();
try {
    EventDao ed = session.getMapper(EventDao.class);
    eventInputMap.put("pnNetworkId", networkId);
    eventInputMap.put("pvUserIdentityId", identityId);
    eventInputMap.put("success", 0);
    eventInputMap.put("message", null);
    // Two equivalent ways to invoke the statement; either one alone
    // executes the procedure (as written, it runs twice).
    ed.getEvents(eventInputMap);
    session.selectList("EventDao.getEvents", eventInputMap);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    session.close();
}
Thanks in advance!
The provided code works. I have even checked three ways to write it: like here with a parameterMap, without a parameterMap (mapping directly in the statement), and through annotations; everything works.
I used to think the fetchSize setting was not propagated from the main statement to the OUT parameter ResultSet, until I actually tested it recently.
To see whether the fetch size is used and how much effect it has, the result must contain a large enough number of rows. And of course, the poorer the latency from app to DB, the more noticeable the effect.
For my test, the cursor used by the procedure returned 5400 rows of 120 columns (but the most important factor is the row count).
To give an order of magnitude, I measured fetching times, i.e. from the stored procedure's return to the statement's return, with a result list filled with the data fetched from the cursor. I log the instantiation of the first mapped object; this occurs near the beginning of the overall fetch, probably right after the first batch:
public static boolean firstInstance = true;

public Item() {
    // Log only once, when the first row is mapped (LOGGER is the class's logger).
    if (firstInstance) {
        LOGGER.debug("Item first instance");
        firstInstance = false;
    }
}
And I log again just at the end, after the session.selectList returns.
This is for test purposes only. Do not leave that in your code; find a cleaner way to do it.
Here are some timings depending on the configured fetch size:
- fetchSize=1 => 13000 ms
- fetchSize=10 => 5300 ms
- fetchSize=100 => 3800 ms
- fetchSize=300 => 3700 ms
- fetchSize=500 => 3650 ms
- fetchSize=1000 => 3600 ms
The Oracle JDBC driver's default fetchSize is 10.
Testing with fetchSize=1 proves the supplied setting is actually used.
With 100, about 30% is saved here; beyond that, the gain is negligible (for this use case and environment).
Anyway, it would be interesting to be able to know when procedure execution finishes and when result fetch starts.
Unfortunately, MyBatis logs very little. I thought a custom result handler could help, but looking at the source code of the class org.apache.ibatis.executor.resultset.DefaultResultSetHandler, I notice that, unlike the method handleResultSet (used for simple select statements), which allows using a custom result handler, the method handleRefCursorOutputParameter (used here for the procedure's OUT cursor) does not. So there is no point in trying to pass a custom result handler: it will be ignored.
I am interested in a solution if anyone has one, but it seems a feature request will be required.