KSQL queries have strange keys in front of the key value - ksqldb

I have created a stream like this:
CREATE STREAM TEST1 WITH (KAFKA_TOPIC='TEST_1',VALUE_FORMAT='AVRO');
I then query the stream like this via CLI:
SELECT * FROM TEST1;
The results are looking like this:
1571225518167 | \u0000\u0000\u0000\u0000\u0001\u0006key | 7 | 7 | blue
I wonder why the key is formatted like this. Is my query somehow wrong? The output should look like this:
1571225518167 | key | 7 | 7 | blue

Your key is in Avro format, which KSQL doesn't support yet.
If you have control over the data producer, write the key in string format (e.g. with Kafka Connect, use org.apache.kafka.connect.storage.StringConverter). If not, and you need to use the key, e.g. for driving a KSQL table, you'd need to re-key the data using KSQL:
CREATE STREAM TEST1_REKEY AS SELECT * FROM TEST1 PARTITION BY my_key_col;
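If you do control the producer side and it happens to be a Kafka Connect source, the relevant converter settings would look something like this (the Schema Registry URL is a placeholder for your own):
# keys as plain strings, values still as Avro
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081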

Related

KSQLDB: Using CREATE STREAM AS SELECT with Differing KEY SCHEMAS

Here is a description of the problem:
STREAM_SUMMARY: A stream with one of the value columns as an ARRAY-of-STRUCTS.
Name : STREAM_SUMMARY
Field | Type
------------------------------------------------------------------------------------------------------------------------------------------------
ROWKEY | STRUCT<asessment_id VARCHAR(STRING), institution_id INTEGER> (key)
assessment_id | VARCHAR(STRING)
institution_id | INTEGER
responses | ARRAY<STRUCT<student_id INTEGER, question_id INTEGER, response VARCHAR(STRING)>>
------------------------------------------------------------------------------------------------------------------------------------------------
STREAM_DETAIL: This is a stream to be created from STREAM_SUMMARY by "exploding" the array-of-structs into separate rows. Note that the KEY schema is also different.
Below is the Key and Value schema I want to achieve (end state)...
Name : STREAM_DETAIL
Field | Type
-------------------------------------------------------------------------------------------------------
ROWKEY | STRUCT<asessment_id VARCHAR(STRING), student_id INTEGER, question_id INTEGER> (key)
assessment_id | VARCHAR(STRING)
institution_id | INTEGER
student_id | INTEGER
question_id | INTEGER
response | VARCHAR(STRING)
My objective is to create the STREAM_DETAIL from the STREAM_SUMMARY.
I tried the below:
CREATE STREAM STREAM_DETAIL WITH (
    KAFKA_TOPIC = 'stream_detail'
) AS
SELECT
    STRUCT (
        `assessment_id` := "assessment_id",
        `student_id` := EXPLODE("responses")->"student_id",
        `question_id` := EXPLODE("responses")->"question_id"
    )
    , "assessment_id"
    , "institution_id"
    , EXPLODE("responses")->"student_id"
    , EXPLODE("responses")->"question_id"
    , EXPLODE("responses")->"response"
FROM STREAM_SUMMARY
EMIT CHANGES;
While the SELECT query works fine, the CREATE STREAM returned the following error:
"Key missing from projection."
If I add the ROWKEY column to the SELECT clause in the above statement, things work; however, the KEY schema of the resultant STREAM is the same as the original STREAM's key.
The "Key" schema that I want in the new STREAM is : STRUCT<asessment_id VARCHAR(STRING), student_id INTEGER, question_id INTEGER> (key)
Alternatively, I tried creating STREAM_DETAIL by hand (using a plain CREATE STREAM statement, providing key and value SCHEMA_IDs). Later I tried the INSERT INTO approach...
INSERT INTO STREAM_DETAIL
SELECT ....
FROM STREAM_SUMMARY
EMIT CHANGES;
The errors were the same.
Can you please guide me on how to enrich a STREAM while giving it a different key schema? Note that a new/different key schema is important for me, since I sync the underlying topic to a database via a Kafka sink connector. The sink connector requires the key schema in this form for me to be able to do an UPSERT.
I am not able to get past this. I appreciate your help.
You can't change the key of a stream when it is created from another stream.
But there is a different approach to the problem.
What you want is a re-key, and for that you need a ksqlDB table. It can be done like this:
CREATE STREAM IF NOT EXISTS INTERMEDIATE_STREAM_SUMMARY_FLATTNED AS
    SELECT
        ROWKEY,
        EXPLODE(responses) AS response
    FROM STREAM_SUMMARY;

CREATE TABLE IF NOT EXISTS STREAM_DETAIL AS -- this also creates an underlying topic
    SELECT
        ROWKEY -> `assessment_id` AS `assessment_id`,
        response -> `student_id` AS `student_id`,
        response -> `question_id` AS `question_id`,
        LATEST_BY_OFFSET(ROWKEY -> `institution_id`, false) AS `institution_id`,
        LATEST_BY_OFFSET(response -> `response`, false) AS `response`
    FROM INTERMEDIATE_STREAM_SUMMARY_FLATTNED
    GROUP BY ROWKEY -> `assessment_id`, response -> `student_id`, response -> `question_id`;
The key schema will be STRUCT<asessment_id VARCHAR(STRING), student_id INTEGER, question_id INTEGER>; you can check the schema registry or print the topic to validate that. In ksqlDB, DESCRIBE will show you a flat key, but don't panic.
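For example, a quick way to eyeball the serialized keys from the ksqlDB CLI (the topic name defaults to the table name here, since no KAFKA_TOPIC property was given):
PRINT 'STREAM_DETAIL' FROM BEGINNING LIMIT 5;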
I have used a similar approach and synced the final topic to a database.

Is it possible to use literal data as stream source in Sumologic?

Is it possible for a Sumo Logic user to define data source values inside a query and use them in a subquery condition?
For example, in SQL one can use literal data as a source table.
-- example in MySQL
SELECT * FROM (
    SELECT 1 as `id`, 'Alice' as `name`
    UNION ALL
    SELECT 2 as `id`, 'Bob' as `name`
    -- ...
) as literal_table
I wonder if Sumo Logic also has this kind of functionality.
I believe combining such literals with subqueries would make users' lives easier.
I believe the equivalent in a Sumo Logic query would be using the save operator to create a lookup table in a subquery: https://help.sumologic.com/05Search/Subqueries#Reference_data_from_child_query_using_save_and_lookup
Basically something like this:
_sourceCategory=katta
[subquery:(_sourceCategory=stream explainJSONPlan.ETT) error
| where !(statusmessage="Finished successfully" or statusmessage="Query canceled" or isNull(statusMessage))
| count by sessionId, statusMessage
| fields -_count
| save /explainPlan/neededSessions
| compose sessionId keywords]
| parse "[sessionId=*]" as sessionId
| lookup statusMessage from /explainPlan/neededSessions on sessionid=sessionid
Where /explainPlan/neededSessions is your literal data table that you select from later on in the query (using lookup).
You can define a lookup table with some static map/dictionary that you update infrequently (you can even point it at a file on the internet in case you change the mapping often).
Then you can use the | lookup operator; there is nothing special about it with respect to subqueries.
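For instance, a sketch of the hosted-file variant, assuming a CSV with columns id and name at a URL you control (the source category, field names, and URL are all illustrative):
_sourceCategory=myapp
| parse "[userId=*]" as userId
| lookup name from https://example.com/static/users.csv on userId=id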
Disclaimer: I am currently employed by Sumo Logic.

ksql table adds extra characters to rowkey

I have some Kafka topics in Avro format. I created a stream and a table to be able to join them with KSQL, but the result of the join always comes back as null.
While troubleshooting, I found that the key is prepended with some character, which depends on the length of the string. I suppose it has something to do with Avro, but I can't find where the problem is.
CREATE TABLE entity_table (Id VARCHAR, Info info)
WITH (
    KAFKA_TOPIC = 'pisos',
    VALUE_FORMAT = 'avro',
    KEY = 'Id');
select * from entity_table;
1562839624583 | $99999999999.999999 | 99999999999.510136 | 1
1562839631250 | &999999999990.999999 | 99999999999.510136 | 2
How are you populating the Kafka topic? KSQL currently only supports string keys. If you can't change how the topic is populated, you could do:
CREATE STREAM entity_src WITH (KAFKA_TOPIC = 'pisos', VALUE_FORMAT='avro');
CREATE STREAM entity_rekey AS SELECT * FROM entity_src PARTITION BY ID;
CREATE TABLE entity_table with (KAFKA_TOPIC='entity_rekey', VALUE_FORMAT='AVRO');
BTW you don't need to specify the schema if you are using Avro.
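With the table now keyed by the plain string ID, a stream-table join along these lines should stop returning nulls (the EVENTS stream and its columns are hypothetical stand-ins for whatever you join against):
SELECT e.Id, e.SomeField, t.Info
FROM EVENTS e
LEFT JOIN ENTITY_TABLE t ON e.Id = t.Id;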

How to show same column in dbgrid with different criteria

I need your help to finish my Delphi homework.
I use an MS Access database and show all the data in one DBGrid using SQL. I want to show the same columns twice, side by side, with 50 records per column pair.
I want a SELECT query to produce output like:
No   | Name | No     | Name |
1    | A    | 51     | AA   |
2    | B    | 52     | BB   |
3~50 |      | 53~100 |      |
Is it possible ?
I can foresee issues if you choose to return a dataset with duplicate column names. To fix this, change your query to enforce strictly unique column names, using as. For example...
select A.No as No, A.Name as Name, B.No as No2, B.Name as Name2
from TableA A
join TableB B on B.Something = A.Something
Just as a note, if you're using a TDBGrid, you can customize the column titles. Right-click on the grid control in design-time and select Columns Editor... and a Collection window will appear. When adding a column, link it to a FieldName and then assign a value to Title.Caption. This will also require that you set up all columns. When you don't define any columns here, it automatically returns all columns in the query.
On the other hand, a SQL query may contain duplicate field names in the output, depending on how you structure the query. I know this is possible in SQL Server, but I'm not sure about MS Access. In any case, I recommend always returning a dataset with unique column names and then customizing the DBGrid's column titles. After all, it is also possible to connect to an Excel spreadsheet, which can very likely have identical column names. The problem arises when you try to read from one of those columns for another use.
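As for producing the 50-records-per-column layout itself, a self-join that pairs row N with row N+50 should work in Access SQL. A sketch, assuming a hypothetical table Students whose SeqNo column is numbered sequentially from 1 (both names are placeholders for your own schema):
SELECT A.SeqNo AS [No], A.Name AS [Name], B.SeqNo AS No2, B.Name AS Name2
FROM Students AS A, Students AS B
WHERE B.SeqNo = A.SeqNo + 50 AND A.SeqNo <= 50
ORDER BY A.SeqNo;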

How to get output of sql queries in FitNesse + DbFit?

I am trying to get SQL query output in DbFit, e.g. !|Execute|select * from abc|, but I don't know how the result will be displayed in DbFit.
I think that you are looking for the Inspect Query table (see the DbFit reference documentation for details).
!|Inspect Query|select * from abc|
When executed, this will print the result set of the query.
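For illustration, assuming abc has columns ID and NAME, the executed page renders the rows as a table along these lines (the data is made up):
ID | NAME
1  | Alice
2  | Bob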
First, the execute fixture is typically used for actions that do not return data, e.g.:
!|Execute|insert into tablename values (…)|
or
!|Execute|update tablename st... where...|
However, even some non-data actions have more specific commands. The above update can be done, for example, with:
!|Update|tablename |
|field_to_change=|field_to_select|
|new value |matching value |
For returning data, use the query fixture:
!|query|select Id, BatchNum from tablename|
|Id |BatchNum? |
|1 |>>Bat1 |
|2 |<<Bat1 |
As shown, just put your field names in the row below the fixture, then your data rows below that. (The >>Bat1 cell stores the value returned in that row into a variable named Bat1, and <<Bat1 checks that the value in the later row matches it.)
