KSQL: EXPLODE in WHERE expression - ksqldb

I have an avro input_stream something like this:
VERSION STRING,
ID STRUCT<
NAME STRING,
STATUS STRING
>,
ROLES ARRAY<
STRUCT<
DESC STRING,
TYPE STRING
>
>
and I want to flatten it and do some filtering as well.
This is working:
select version, explode(roles)->type from input_stream where version='1.0' emit changes;
but this fails:
select version, explode(roles)->type from input_stream where explode(roles)->type='master' emit changes;
Invalid Predicate: Can't find any functions with the name 'EXPLODE'. expression:(explode(roles)->type='master')
How can I filter the array elements directly based on type, without creating a new stream that queries and filters from this exploded one?
Thanks!
Example of how to reproduce:
>create stream test_for_robin_moffatt (VERSION STRING,ID STRUCT<NAME STRING,STATUS STRING>,ROLES ARRAY<STRUCT<DESC STRING,TYPE STRING>>) with (kafka_topic='test_for_robin_moffatt', value_format='json', partitions=1);
>insert into TEST_FOR_ROBIN_MOFFATT (VERSION,ID, ROLES) values ('1.1', STRUCT(name:='zoltan', status:='active'), ARRAY[STRUCT(desc:='blabla', type:='master')]);
>SET 'auto.offset.reset' = 'earliest';
>select version, id->name, id->status, explode(roles)->desc, explode(roles)->type from TEST_FOR_ROBIN_MOFFATT emit changes;
>select version, id->name, id->status, explode(roles)->desc, explode(roles)->type from TEST_FOR_ROBIN_MOFFATT where version='1.1' emit changes;
>select version, id->name, id->status, explode(roles)->desc, explode(roles)->type from TEST_FOR_ROBIN_MOFFATT where id->status='active' emit changes;
The statements above work fine:
+---------------------------------------------+---------------------------------------------+---------------------------------------------+---------------------------------------------+---------------------------------------------+
|VERSION |NAME |STATUS |DESC |TYPE |
+---------------------------------------------+---------------------------------------------+---------------------------------------------+---------------------------------------------+---------------------------------------------+
|1.1 |zoltan |active |blabla |master |
But, this fails:
>select version, id->name, id->status, explode(roles)->desc, explode(roles)->type from TEST_FOR_ROBIN_MOFFATT where explode(roles)->type='master' emit changes;
Invalid Predicate: Can't find any functions with the name 'EXPLODE'. expression:(EXPLODE(ROLES)->TYPE = 'master'), schema:`VERSION` STRING, `ID` STRUCT<`NAME` STRING, `STATUS` STRING>, `ROLES` ARRAY<STRUCT<`DESC` STRING, `TYPE` STRING>>, `ROWTIME` BIGINT

Related

Programmatically extract column descriptions from Informix

Is there a system table where the Informix database stores column descriptions?
I know how to do it in SQL Server and Oracle, but not in Informix...
There are several methods to obtain information regarding Informix database objects (like tables or columns) using the Informix catalog tables.
You can get the info directly from the catalog using a basic SQL SELECT statement, for example:
SELECT TRIM(c.colname) colname,
CASE
WHEN MOD(coltype,256)=0 THEN 'CHAR'
WHEN MOD(coltype,256)=1 THEN 'SMALLINT'
WHEN MOD(coltype,256)=2 THEN 'INTEGER'
WHEN MOD(coltype,256)=3 THEN 'FLOAT'
WHEN MOD(coltype,256)=4 THEN 'SMALLFLOAT'
WHEN MOD(coltype,256)=5 THEN 'DECIMAL'
WHEN MOD(coltype,256)=6 THEN 'SERIAL'
WHEN MOD(coltype,256)=7 THEN 'DATE'
WHEN MOD(coltype,256)=8 THEN 'MONEY'
-- needs more entries --
ELSE TO_CHAR(coltype)
END AS Type,
BITAND(coltype,256)=256 AS NotNull
FROM systables AS t
JOIN syscolumns AS c ON t.tabid = c.tabid
WHERE t.tabtype = 'T'
AND t.tabname = 'customer'
ORDER by c.colno;
Note that the CASE section will need an entry for each Informix data type.
Description of all catalog tables can be found at:
https://www.ibm.com/docs/en/informix-servers/14.10?topic=reference-system-catalog-tables
https://www.ibm.com/docs/en/informix-servers/14.10?topic=tables-systables
https://www.ibm.com/docs/en/informix-servers/14.10?topic=tables-syscolumns
Another method is to use the INFO SQL statement as described here:
https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-info-statement
Something like:
informix#DESKTOP:~/IDS$ dbaccess stores7 -
Database selected.
> info columns for customer;
Column name Type Nulls
customer_num serial no
fname char(15) yes
lname char(15) yes
company char(20) yes
address1 char(20) yes
address2 char(20) yes
city char(15) yes
state char(2) yes
zipcode char(5) yes
phone char(18) yes
>
Alternatively, you can use any metadata method provided by your API (like JDBC's DatabaseMetaData.getColumns()).
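For completeness, here is a minimal JDBC sketch of that last option; the connection URL, credentials, database name and class name are illustrative placeholders you would adjust for your environment:
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListCustomerColumns {
    public static void main(String[] args) throws Exception {
        // Placeholder Informix connection details -- adjust host, port,
        // database and INFORMIXSERVER for your environment.
        String url = "jdbc:informix-sqli://localhost:9088/stores7:INFORMIXSERVER=informix";
        try (Connection con = DriverManager.getConnection(url, "informix", "password")) {
            DatabaseMetaData md = con.getMetaData();
            // Arguments: catalog, schemaPattern, tableNamePattern, columnNamePattern
            try (ResultSet rs = md.getColumns(null, null, "customer", "%")) {
                while (rs.next()) {
                    System.out.printf("%-20s %-15s %s%n",
                            rs.getString("COLUMN_NAME"),
                            rs.getString("TYPE_NAME"),
                            rs.getString("IS_NULLABLE"));
                }
            }
        }
    }
}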

KSQLDB: Using CREATE STREAM AS SELECT with Differing KEY SCHEMAS

Here is the description of the problem statement:
STREAM_SUMMARY: A stream with one of the value columns as an ARRAY-of-STRUCTS.
Name : STREAM_SUMMARY
Field | Type
------------------------------------------------------------------------------------------------------------------------------------------------
ROWKEY | STRUCT<asessment_id VARCHAR(STRING), institution_id INTEGER> (key)
assessment_id | VARCHAR(STRING)
institution_id | INTEGER
responses | ARRAY<STRUCT<student_id INTEGER, question_id INTEGER, response VARCHAR(STRING)>>
------------------------------------------------------------------------------------------------------------------------------------------------
STREAM_DETAIL: This is a stream to be created from STREAM_SUMMARY, by "exploding" the array-of-structs into separate rows. Note that the KEY schema is also different.
Below is the Key and Value schema I want to achieve (end state)...
Name : STREAM_DETAIL
Field | Type
-------------------------------------------------------------------------------------------------------
ROWKEY | STRUCT<asessment_id VARCHAR(STRING), student_id INTEGER, question_id INTEGER> (key)
assessment_id | VARCHAR(STRING)
institution_id | INTEGER
student_id | INTEGER
question_id | INTEGER
response | VARCHAR(STRING)
My objective is to create the STREAM_DETAIL from the STREAM_SUMMARY.
I tried the below:
CREATE STREAM STREAM_DETAIL WITH (
KAFKA_TOPIC = 'stream_detail'
) AS
SELECT
STRUCT (
`assessment_id` := "assessment_id",
`student_id` := EXPLODE("responses")->"student_id",
`question_id` := EXPLODE("responses")->"question_id"
)
, "assessment_id"
, "institution_id"
, EXPLODE("responses")->"student_id"
, EXPLODE("responses")->"question_id"
, EXPLODE("responses")->"response"
FROM STREAM_SUMMARY
EMIT CHANGES;
While the SELECT query works fine, the CREATE STREAM returned the following error:
"Key missing from projection."
If I add the ROWKEY column to the SELECT clause in the above statement, things work; however, the KEY schema of the resulting STREAM is the same as the original STREAM's key.
The "Key" schema that I want in the new STREAM is : STRUCT<asessment_id VARCHAR(STRING), student_id INTEGER, question_id INTEGER> (key)
Alternatively, I tried creating the STREAM_DETAIL by hand (using a plain CREATE STREAM statement, providing key and value SCHEMA_IDs). Later I tried the INSERT INTO approach...
INSERT INTO STREAM_DETAIL
SELECT ....
FROM STREAM_SUMMARY
EMIT CHANGES;
The errors were the same.
Can you please guide me on how I can enrich a STREAM while giving it a different key schema? Note that a new/different key schema is important for me, since the underlying topic is synced to a database via a Kafka sink connector. The sink connector requires the key schema in this form for me to be able to do an UPSERT.
I am not able to get past this. Appreciate your help.
You can't change the key of a stream when it is created from another stream.
But there is a different approach to the problem.
What you want is a re-key, and to do that you need to use a ksqlDB table. It can be solved like this:
CREATE STREAM IF NOT EXISTS INTERMEDIATE_STREAM_SUMMARY_FLATTNED AS
SELECT
ROWKEY,
EXPLODE(responses) as response
FROM STREAM_SUMMARY;
CREATE TABLE IF NOT EXISTS STREAM_DETAIL AS -- This also creates a underlying topic
SELECT
ROWKEY -> `assessment_id` as `assessment_id`,
response -> `student_id` as `student_id`,
response -> `question_id` as `question_id`,
LATEST_BY_OFFSET(ROWKEY -> `institution_id`, false) as `institution_id`,
LATEST_BY_OFFSET(response -> `response`, false) as `response`
FROM INTERMEDIATE_STREAM_SUMMARY_FLATTNED
GROUP BY ROWKEY -> `assessment_id`, response -> `student_id`, response -> `question_id`;
The key schema will be STRUCT<asessment_id VARCHAR(STRING), student_id INTEGER, question_id INTEGER>; you can check the schema registry or print the topic to validate that. In ksqlDB, DESCRIBE will show the key as flat columns, but don't panic.
I have used a similar approach and synced the final topic to a database.
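For example, to spot-check the key actually written to the backing topic (assuming the CREATE TABLE above produced a topic named STREAM_DETAIL, the default for that table name), you can print it from the ksqlDB CLI:
ksql> PRINT 'STREAM_DETAIL' FROM BEGINNING LIMIT 1;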

How to delete a value from ksqldb table or insert a tombstone value?

How is it possible to mark a row in a ksqlDB table for deletion via the REST API, or at least as a statement in the ksqlDB CLI?
CREATE TABLE movies (
title VARCHAR PRIMARY KEY,
id INT,
release_year INT
) WITH (
KAFKA_TOPIC='movies',
PARTITIONS=1,
VALUE_FORMAT = 'JSON'
);
INSERT INTO MOVIES (ID, TITLE, RELEASE_YEAR) VALUES (48, 'Aliens', 1986);
This doesn't work for obvious reasons, and a DELETE statement doesn't exist in ksqlDB:
INSERT INTO MOVIES (ID, TITLE, RELEASE_YEAR) VALUES (48, null, null);
Is there a recommended way to produce a tombstone (null) value, or do I need to write it directly to the underlying topic?
There is a way to do this that's a bit of a workaround. The trick is to use the KAFKA value format to write a tombstone to the underlying topic.
Here's an example, using your original DDL.
-- Insert a second row of data
INSERT INTO MOVIES (ID, TITLE, RELEASE_YEAR) VALUES (42, 'Life of Brian', 1986);
-- Query table
ksql> SET 'auto.offset.reset' = 'earliest';
ksql> select * from movies emit changes limit 2;
+--------------------------------+--------------------------------+--------------------------------+
|TITLE |ID |RELEASE_YEAR |
+--------------------------------+--------------------------------+--------------------------------+
|Life of Brian |42 |1986 |
|Aliens |48 |1986 |
Limit Reached
Query terminated
Now declare a new stream that will write to the same Kafka topic using the same key:
CREATE STREAM MOVIES_DELETED (title VARCHAR KEY, DUMMY VARCHAR)
WITH (KAFKA_TOPIC='movies',
VALUE_FORMAT='KAFKA');
Insert a tombstone message:
INSERT INTO MOVIES_DELETED (TITLE,DUMMY) VALUES ('Aliens',CAST(NULL AS VARCHAR));
Query the table again:
ksql> select * from movies emit changes limit 2;
+--------------------------------+--------------------------------+--------------------------------+
|TITLE |ID |RELEASE_YEAR |
+--------------------------------+--------------------------------+--------------------------------+
|Life of Brian |42 |1986 |
Examine the underlying topic:
ksql> print movies;
Key format: KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 2021/02/22 11:01:05.966 Z, key: Aliens, value: {"ID":48,"RELEASE_YEAR":1986}, partition: 0
rowtime: 2021/02/22 11:02:00.194 Z, key: Life of Brian, value: {"ID":42,"RELEASE_YEAR":1986}, partition: 0
rowtime: 2021/02/22 11:04:52.569 Z, key: Aliens, value: <null>, partition: 0

How to select a set of fields from input data as an array of repeated fields in beam SQL

Problem Statement:
I have an input PCollection with the following fields:
{
firstname_1,
lastname_1,
dob,
firstname_2,
lastname_2,
firstname_3,
lastname_3,
}
then I execute a Beam SQL operation such that the output of the resulting PCollection should look like this:
----------------------------------------------
name.firstname | name.lastname | dob
----------------------------------------------
firstname_1 | lastname_1 | 202009
firstname_2 | lastname_2 |
firstname_3 | lastname_3 |
-----------------------------------------------
To be precise:
array[
(firstname_1,lastname_1,dob),
(firstname_2,lastname_2,dob),
(firstname_3,lastname_3,dob)
]
Here is the code snippet where I execute Beam SQL:
PCollectionTuple tuple=
PCollectionTuple.of(new TupleTag<>("testPcollection"), testPcollection);
PCollection<Row> result = tuple
.apply(SqlTransform.query(
"SELECT array[(firstname_1,lastname_1,dob), (firstname_2,lastname_2,dob), (firstname_3,lastname_3,dob)]"));
I am not getting proper results.
Can someone guide me on how to query an array of repeated fields in Beam SQL?
You can take a look at this example on how to access arrays in Beam SQL - https://github.com/apache/beam/blob/d110f6b7610b26edc1eb9a4b698840b21c151847/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslNestedRowsTest.java#L234
Your SQL query has a few errors.
You have named the input to the SQL query testPcollection, but your SQL query does not select FROM testPcollection. Let us assume you meant it to include FROM testPcollection.
You use the syntax (firstname_1, lastname_1, dob) in both your expected output and your query. This is not a valid SQL expression.
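A corrected query might look roughly like the sketch below; this assumes the Calcite-style ROW constructor shown in the linked test is available in your Beam SQL version, and "names" is just an illustrative alias:
SELECT ARRAY[
  ROW(firstname_1, lastname_1, dob),
  ROW(firstname_2, lastname_2, dob),
  ROW(firstname_3, lastname_3, dob)
] AS names
FROM testPcollection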

esper how to use table data created by epl

I'm new to Esper and I want to get the data stored in tbl_config. Here are my Esper EPL files:
config.epl
module rms.config;
create table tbl_config(
id java.math.BigDecimal primary key,
time java.math.BigDecimal
);
create schema ConfigListEvent as (
id java.math.BigDecimal,
time java.math.BigDecimal
);
@Audit
@Name("LoadConfigDataFromDBRule")
insert into ConfigListEvent
select tbl.ID as id, tbl.time as time
from ImportDataEvent,
sql: rms ['select * from T_CONFIG'] as tbl;
@Audit
@Priority(1)
@Name("DeleteConfigDataRule")
on ConfigListEvent as evt
delete from tbl_config as tbl where evt.id = tbl.id;
@Audit
@Name("InsertConfigDataRule")
on ConfigListEvent
insert into tbl_config select *;
stat.epl
module rms.stat;
uses rms.config;
#Name("Create-PaymentContext")
create window PaymentWindow.win:time(2 hour) as PaymentRequest;
@Audit
@Name("insertPaymentRequest")
@Priority(1)
insert into PaymentWindow select * from PaymentRequest;
rule.epl
module rms.rule;
uses rms.config;
uses rms.stat;
@Audit
@Name("xxx")
@Description("check max times per IntervalTime")
on PaymentRequest as pay
select CustomUtil.getEndTime(pay.createTime,tbl_config["time"]) as startTime from PaymentWindow as payWindow;
Then the system launches with this error:
com.espertech.esper.epl.expression.core.ExprValidationException: Failed to validate method-chain parameter expression 'tbl_config["time"]': Incompatible type returned by a key expression for use with table 'tbl_config', the key expression '"time"' returns 'java.lang.String' but the table expects 'java.math.BigDecimal'
This has confused me for a few days. Thanks for any help!
The table has a key field "id" that is of type BigDecimal.
The expression tbl_config["time"], however, provides the string value "time" as the key and not a BigDecimal value. Try tbl_config[id], assuming there is a field named 'id' in the payment request that is of type BigDecimal.
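As a minimal sketch of that suggestion (assuming the PaymentRequest event carries a BigDecimal id that matches the table key, and that you want the time column of the matched row), the statement in rule.epl might become:
on PaymentRequest as pay
select CustomUtil.getEndTime(pay.createTime, tbl_config[pay.id].time) as startTime
from PaymentWindow as payWindow;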
The on-delete and on-insert in config.epl look a little awkward; on-merge would make this one easy-to-read statement.
