Consumer_failed_message in Kafka stream: records not pushed from topic - ksqlDB

I have a flow where I send records from IBM mainframe IIDR to a Kafka topic. The value format of the messages arriving on the topic is AVRO and the key is in AVRO format too. The records are pushed into the Kafka topic, and I have a stream associated with that topic, but the records are not passed into the stream.
Example record from the test_iidr topic:
rowtime: 5/30/20 7:06:34 PM UTC, key: {"col1": "A", "col2": 1}, value: {"col1": "A", "col2": 11, "col3": 2, "iidr_tran_type": "QQ", "iidr_a_ccid": "0", "iidr_a_user": " ", "iidr_src_upd_ts": "2020-05-30 07:06:33.262931000", "iidr_a_member": " "}
The value_format in the stream is AVRO and the column names have all been checked.
The stream creation query:
CREATE STREAM test_iidr (
col1 STRING,
col2 DECIMAL(2,0),
col3 DECIMAL(1,0),
iidr_tran_type STRING,
iidr_a_ccid STRING,
iidr_a_user STRING,
iidr_src_upd_ts STRING,
iidr_a_member STRING)
WITH (KAFKA_TOPIC='test_iidr', PARTITIONS=1, REPLICAS=3, VALUE_FORMAT='AVRO');
Is it failing to load the records into the stream because the KEY is not mentioned in the WITH clause?
The schema registry has the test_iidr-value and test_iidr-key subjects registered in it.
The key.converter and value.converter in the Kafka Connect Docker container are set to org.apache.kafka.connect.json.JsonConverter. Is this JsonConverter causing the issue?
I created a completely different pipeline with a different stream and inserted the same data manually using INSERT INTO statements. It worked. Only the IIDR flow is not working and the records are not pushed into the stream from the topic.
I am using Confluent Kafka version 5.5.0.

The JsonConverter in the Connect config could well be writing your data to the topic as JSON rather than Avro.
To determine the key and value serialization formats you can use the PRINT command (which I can see you've already run). PRINT outputs the key and value formats when it runs. For example:
ksql> PRINT some_topic FROM BEGINNING LIMIT 1;
Key format: JSON or KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 5/30/20 7:06:34 PM UTC, key: {"col1": "A", "col2": 1}, value: {"col1": "A", "col2": 11, "col3": 2, "iidr_tran_type": "QQ", "iidr_a_ccid": "0", "iidr_a_user": " ", "iidr_src_upd_ts": "2020-05-30 07:06:33.262931000", "iidr_a_member": " "}
So the first thing to check is the key and value formats that PRINT reports, and then update your CREATE statement accordingly.
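For example, if PRINT reports the value format as JSON rather than AVRO, a sketch of the adjusted statement (same columns, only the value format changed) would be:
CREATE STREAM test_iidr (
col1 STRING,
col2 DECIMAL(2,0),
col3 DECIMAL(1,0),
iidr_tran_type STRING,
iidr_a_ccid STRING,
iidr_a_user STRING,
iidr_src_upd_ts STRING,
iidr_a_member STRING)
WITH (KAFKA_TOPIC='test_iidr', VALUE_FORMAT='JSON');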
Note, ksqlDB does not yet support Avro/JSON keys, so you may want or need to repartition your data; see: https://docs.ksqldb.io/en/latest/developer-guide/syntax-reference/#what-to-do-if-your-key-is-not-set-or-is-in-a-different-format
Side note: if the schema for the value is stored in the Schema Registry, then you don't need to define the columns in your CREATE statement, as ksqlDB will load them from the Schema Registry.
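For example, a sketch assuming the value schema really is registered as Avro under the test_iidr-value subject and ksqlDB is pointed at that Schema Registry:
CREATE STREAM test_iidr WITH (KAFKA_TOPIC='test_iidr', VALUE_FORMAT='AVRO');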
Side note: you don't need PARTITIONS=1, REPLICAS=3 in the WITH clause for existing topics; they are only needed if you want ksqlDB to create the topic for you.

Related

Telegraf MQTT input data flatten

How can I use Telegraf to extract timestamp and sensor value from an MQTT message and insert it into a PostgreSQL database with separate timestamp and sensor value columns?
I am receiving this JSON object from MQTT:
{"sensor": "current", "data": [[1614945972418042880, 1614945972418042880], [1614945972418294528, 0.010058338362502514], [1614945972418545920, 0.010058338362502514]]}
It contains two fields: "sensor" and "data". The "sensor" field contains a string that identifies the type of sensor, and the "data" field contains an array of arrays, where each sub-array contains a timestamp and a sensor value. I am using Telegraf to output this data to a PostgreSQL database. I would like to flatten the list into separate timestamp and sensor-value columns, using the sensor name as the value column's name. How can I configure Telegraf to do this?
So my table would look like this:
timestamp           | current
1614945972418042880 | 1614945972418042880
1614945972418294528 | 0.010058338362502514
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]
  topics = ["your_topic"]
  # parse the whole JSON object; the original json_query = "data.*" is dropped (see the processor sketch below)
  data_format = "json"
  # keep the string "sensor" field as a tag; "timestamp" is not a top-level field, so it was removed
  tag_keys = ["sensor"]
  # "measurement" is not a valid mqtt_consumer option; name_override sets the measurement name
  name_override = "sensors"
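A hedged sketch of one way to flatten this is a Starlark processor that emits one metric per (timestamp, value) pair. It assumes the JSON parser flattens the nested data array into fields named data_0_0, data_0_1, data_1_0, ... (check the actual field names with telegraf --test first), and note that the JSON parser stores numbers as float64, so nanosecond timestamps this large lose precision:
[[processors.starlark]]
  source = '''
def apply(metric):
    # "sensor" was kept as a tag via tag_keys; fall back to "value" if it is missing
    sensor = metric.tags["sensor"] if "sensor" in metric.tags else "value"
    out = []
    for k, ts in metric.fields.items():
        # assumed naming from flattening: data_<i>_0 = timestamp, data_<i>_1 = reading
        if k.startswith("data_") and k.endswith("_0"):
            val_key = k[:-1] + "1"
            if val_key in metric.fields:
                m = Metric("sensors")
                m.time = int(ts)  # use the payload timestamp (float64 precision caveat above)
                m.fields[sensor] = metric.fields[val_key]  # column named after the sensor
                out.append(m)
    return out
'''
Each emitted metric then maps to one row (a timestamp plus a column named after the sensor, e.g. current) in whatever PostgreSQL output you are using.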

Dataflow streaming job early results

Related: Early results from GroupByKey transform
The pipeline:
1. Read Avro source files from GCS in streaming mode.
2. Filter experiment events and output key-value pairs:
   Key -> "experimentId": "aa", "experimentVariant": 2, "uuid": abbcd
   Value -> eventDate
3. Fixed window (60-120 seconds) to buffer elements, with discarding panes.
4. Combine per key to collect distinct dates. Output example:
   One window result:
   Key -> "experimentId": "aa", "experimentVariant": 2, "uuid": abbcd
   Value -> Set("2020-06-01","2020-06-02")
   Next window result:
   Key -> "experimentId": "aa", "experimentVariant": 2, "uuid": abbcd
   Value -> Set("2020-06-03")
5. Write to GCS.
The issue is that the Combine step does not produce output for a long time, even though the window is only 60 seconds.
Aggregating steps such as Combine do not emit output until the watermark reaches the end of the window (unless non-default triggering is set up). For file-based sources, records may not be sorted by timestamp, so the entire file must be read before it is safe to advance the watermark. On Dataflow, you can see the watermark for each step in the UI.
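If you want speculative results before the watermark passes, configure early firings on the window's trigger. A minimal Beam (Python) sketch, assuming a PCollection named events of (key, eventDate) pairs as in the pipeline above; the 30-second early-firing delay and the DistinctDates combiner are illustrative, not taken from the original job:
import apache_beam as beam
from apache_beam.transforms import trigger, window

class DistinctDates(beam.CombineFn):
    # Collects the distinct date strings seen for a key within the window.
    def create_accumulator(self):
        return set()

    def add_input(self, acc, date):
        acc.add(date)
        return acc

    def merge_accumulators(self, accs):
        merged = set()
        for a in accs:
            merged |= a
        return merged

    def extract_output(self, acc):
        return acc

def window_and_combine(events):
    # 60s fixed windows that also emit early (speculative) panes every 30s of
    # processing time, keeping the discarding mode from the original pipeline.
    return (
        events
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),
            trigger=trigger.AfterWatermark(early=trigger.AfterProcessingTime(30)),
            accumulation_mode=trigger.AccumulationMode.DISCARDING)
        | "DistinctDatesPerKey" >> beam.CombinePerKey(DistinctDates()))
With discarding mode, each early pane and the final on-time pane arrive as separate outputs, so the GCS write step has to tolerate multiple partial results per window.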

How to add labels when ingesting data into Neo4j from Kafka with neo4j-streams?

I have a stream of Kafka messages that look like this:
{
"ts": 1574487125808,
"uid": "Cxxzpx3A12ai2Ckn4f",
"id_orig_h": "10.0.1.19",
"id_orig_p": 53312,
"id_resp_h": "10.0.1.16",
"id_resp_p": 8080,
"proto": "tcp",
"service": "http",
"duration": 9.636139154434204,
"orig_bytes": 760,
"resp_bytes": 220,
"conn_state": "SF",
"local_orig": true,
"local_resp": true,
"missed_bytes": 0,
"history": "ShADadfF",
"orig_pkts": 6,
"orig_ip_bytes": 1080,
"resp_pkts": 5,
"resp_ip_bytes": 488
}
I'm using the neo4j-streams (v.3.5.4) consumer to build a graph of the messages. Neo4j version is v.3.5.12. I added this to neo4j.conf:
streams.sink.enabled=true
streams.sink.topic.cypher.conn=MERGE (origin:Origin {id: event.id_orig_h}) MERGE (response:Response {id: event.id_resp_h}) MERGE (origin)-[:CONNECTED_TO]->(response)
... and it builds the graph. Great!
I'd like to add the IP addresses as labels on the nodes, and am struggling to get that bit working. I've tried adding SET commands, e.g.
streams.sink.topic.cypher.conn=MERGE (origin:Origin {id: event.id_orig_h}) SET origin:event.id_orig_h MERGE (response:Response {id: event.id_resp_h}) SET response:event.id_resp_h MERGE (origin)-[:CONNECTED_TO]->(response)
Which resulted in the following error:
ErrorData(originalTopic=conn, timestamp=1574493915864, partition=0, offset=617872, exception=org.neo4j.graphdb.QueryExecutionException: Invalid input '.': expected an identifier character, whitespace, NodeLabel, a property map, a relationship pattern, ',', FROM GRAPH, CONSTRUCT, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE UNIQUE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, CALL, RETURN, UNION, ';' or end of input (line 1, column 86 (offset: 85))
"UNWIND {events} AS event MERGE (origin:Origin {id: event.id_orig_h}) SET origin:event.id_orig_h MERGE (response:Response {id: event.id_resp_h}) SET response:event.id_resp_h MERGE (origin)-[:CONNECTED_TO]->(response) "
^, key=null, value={"ts":1574493904862,"uid":"C04y9u1KAZfAIHVVu","id_orig_h":"10.0.1.61","id_orig_p":53790,"id_resp_h":"10.0.1.1","id_resp_p":53,"proto":"udp","service":"dns","duration":0.0001399517059326172,"orig_bytes, executingClass=class streams.kafka.KafkaAutoCommitEventConsumer)
at streams.service.errors.KafkaErrorService.report(KafkaErrorService.kt:37)
at streams.kafka.KafkaAutoCommitEventConsumer.executeAction(KafkaAutoCommitEventConsumer.kt:95)
at streams.kafka.KafkaAutoCommitEventConsumer.readSimple(KafkaAutoCommitEventConsumer.kt:85)
at streams.kafka.KafkaAutoCommitEventConsumer.read(KafkaAutoCommitEventConsumer.kt:132)
at streams.kafka.KafkaEventSink$createJob$1.invokeSuspend(KafkaEventSink.kt:95)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:32)
at kotlinx.coroutines.DispatchedTask$DefaultImpls.run(Dispatched.kt:235)
at kotlinx.coroutines.AbstractContinuation.run(AbstractContinuation.kt:19)
at kotlinx.coroutines.scheduling.Task.run(Tasks.kt:94)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:732)
It looks like I've butchered the SET command. I've tried various permutations in the SET section, e.g. no event prefix, escaping the period after event, and removing the event. prefix entirely.
Can you see what I'm doing wrong?
Labels cannot be set from a variable in plain Cypher; you would have to use APOC for that.
I suggest reading the Neo4j naming rules and recommendations: https://neo4j.com/docs/cypher-manual/current/syntax/naming/
Secondly, can you describe why you would like to set the IP as a label? This doesn't make a lot of sense, to be honest.
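If you do still want the IP as a label, a hedged sketch of what the sink template could look like with APOC (assuming the APOC plugin is installed; apoc.create.addLabels accepts the label name as a string value, which plain Cypher SET cannot do):
streams.sink.topic.cypher.conn=MERGE (origin:Origin {id: event.id_orig_h}) WITH origin, event CALL apoc.create.addLabels(origin, [event.id_orig_h]) YIELD node AS o MERGE (response:Response {id: event.id_resp_h}) WITH o, response, event CALL apoc.create.addLabels(response, [event.id_resp_h]) YIELD node AS r MERGE (o)-[:CONNECTED_TO]->(r)
That said, an IP address is usually better kept as an (indexed) property than as a label.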

KSQL create table from stream for latest data

I have a topic called customers and I have created a stream for it
CREATE STREAM customers_stream (customerId INT, isActive BOOLEAN)
WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='json');
My producer for the customers topic is generating an integer key and a JSON value. But when I print the topic, the row key is set to some binary value:
ksql> print 'customers';
Format:JSON
{"ROWTIME":1570305904984,"ROWKEY":"\u0000\u0000\u0003�","customerId":1001,"isActive":true}
{"ROWTIME":1570307584257,"ROWKEY":"\u0000\u0000\u0003�","customerId":1002,"isActive":true}
Now if I create a table, it results in a single row (maybe because the row key is the same?):
CREATE TABLE customers (customerId INT, isActive BOOLEAN)
WITH (KAFKA_TOPIC='customers', KEY='customerId',VALUE_FORMAT='json');
After searching the web I came across this article https://www.confluent.io/stream-processing-cookbook/ksql-recipes/setting-kafka-message-key and created a new stream by repartitioning on the key:
CREATE STREAM customers_stream2 AS \
SELECT * FROM customers_stream \
PARTITION BY customerId;
So how do I create a table which has the latest values of the customers data?
Creating a table from the stream results in an error:
CREATE TABLE customers_2_table_active AS
SELECT CUSTOMERID,ISACTIVE
FROM customers_stream2;
Invalid result type. Your SELECT query produces a STREAM. Please use CREATE STREAM AS SELECT statement instead.
I need the latest value of the various rows so that another microservice can query the new table.
Thank you in advance
Rekeying seems to be the right approach; however, you cannot convert a STREAM into a TABLE directly.
Note that your rekeyed stream customers_stream2 is written into a corresponding topic. Hence, you should be able to create a new TABLE from the stream's topic to get the latest value per key.
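For example, a sketch of such a table, assuming the rekeyed stream wrote to a topic with the default name CUSTOMERS_STREAM2 (confirm the actual topic with SHOW TOPICS; or DESCRIBE EXTENDED customers_stream2;); the table name customers_latest is just illustrative:
CREATE TABLE customers_latest (customerId INT, isActive BOOLEAN)
WITH (KAFKA_TOPIC='CUSTOMERS_STREAM2', VALUE_FORMAT='json');
Because that topic is keyed by customerId, the table holds the latest value per customer, which your other microservice can then query.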

Merged Google Fusion Tables - Select Query with WHERE clause on Merge Key Error

I have merged two Fusion Tables together on the key "PID". Now I would like to do a SELECT query WHERE PID = 'value'. The error comes back that no column with the name PID exists in the table. A query for another column gives this result:
"kind": "fusiontables#sqlresponse",
"columns": [
"\ufeffPID",
"Address",
"City",
"Zoning"
],
"rows": [
[
"001-374-079",
"# LOT 15 MYSTERY BEACH RD",
"No_City_Value",
"R-1"
],
It appears that the column name has been changed from "PID" to "\ufeffPID", and no matter how I adjust the syntax of the GET URL, I keep getting an error.
Is there any limitation on querying the key of a merged table? Since I cannot get the column name right, a workaround would be to use the column ID, but that does not seem to be an option either. Here is the URL:
https://www.googleapis.com/fusiontables/v1/query?sql=SELECT 'PID','Address','City','Zoning' FROM 1JanYNl3T45kFFxqAmGS0BRgkopj4AS207qnLVQI WHERE '\ufeffPID' = 001-493-078&key=myKey
Cheers
I have no explanation for the \ufeff in there; that's the Unicode character ZERO WIDTH NO-BREAK SPACE, so it's conceivable that it's actually part of the column name, because it would be invisible in the UI. So, first off, I would recommend changing the name in the base tables and seeing if that works.
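If renaming isn't an option, a hedged workaround is to percent-encode the BOM in the request, since U+FEFF is EF BB BF in UTF-8; for example (same query as yours with only the column name changed, and only if the BOM really is part of the stored name):
https://www.googleapis.com/fusiontables/v1/query?sql=SELECT '%EF%BB%BFPID','Address','City','Zoning' FROM 1JanYNl3T45kFFxqAmGS0BRgkopj4AS207qnLVQI WHERE '%EF%BB%BFPID' = 001-493-078&key=myKey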
Column IDs for merge tables have a different form than for base tables. An easy way to get them is to add the filters of interest to one of your tabs (any type will do) and then do Tools > Publish. The top text ("Send a link in email or IM") has a query URL that has what you need. Run it through a URL decoder such as http://meyerweb.com/eric/tools/dencoder/ and you'll see the column ID for PID is col0>>0.
Rod
