ksqlDB to MinIO

I'm investigating the feasibility of using ksqlDB to filter out empty struct fields from Kafka records in Avro format, then dump the records to S3 in Parquet format; removing all empty fields from the Avro records is required for the conversion to Parquet.
ksqlDB can handle the filtering part in a very straightforward way. My question is how to connect ksqlDB to MinIO.
Would using a Kafka connector require ksqlDB to write the filtered records back to Kafka under a new topic?
I've also heard that ksqlDB has built-in connectors, but I'm not sure how that works.
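For what it's worth, here is a minimal sketch of how the pieces could fit together, driving ksqlDB's REST API from Python. Every name in it (stream, topic, bucket, endpoints, credentials) is a placeholder, and it assumes the Confluent S3 sink connector is installed on the Connect worker; that connector's ParquetFormat needs schema'd input, which your Avro records provide.

    import json
    import requests

    KSQL = "http://localhost:8088/ksql"  # assumed ksqlDB server address

    # 1) Filtering: a persistent query. Its result is always materialized as a
    #    new Kafka topic (FILTERED_EVENTS here), so the data does go back
    #    through Kafka before any connector picks it up.
    filter_stmt = """
        CREATE STREAM filtered_events
          WITH (KAFKA_TOPIC='FILTERED_EVENTS', VALUE_FORMAT='AVRO') AS
          SELECT * FROM raw_events
          WHERE payload IS NOT NULL;
    """

    # 2) The "built-in connectors" are Kafka Connect workers that ksqlDB can
    #    embed or talk to; CREATE SINK CONNECTOR manages them from SQL. The
    #    Confluent S3 sink accepts a custom endpoint via store.url, which is
    #    how it reaches MinIO instead of AWS. Credentials come from the
    #    worker's usual AWS provider chain (or the connector's
    #    aws.access.key.id / aws.secret.access.key properties).
    sink_stmt = """
        CREATE SINK CONNECTOR minio_sink WITH (
          'connector.class' = 'io.confluent.connect.s3.S3SinkConnector',
          'topics'          = 'FILTERED_EVENTS',
          'store.url'       = 'http://minio:9000',
          's3.bucket.name'  = 'events',
          's3.region'       = 'us-east-1',
          'storage.class'   = 'io.confluent.connect.s3.storage.S3Storage',
          'format.class'    = 'io.confluent.connect.s3.format.parquet.ParquetFormat',
          'flush.size'      = '1000'
        );
    """

    for stmt in (filter_stmt, sink_stmt):
        r = requests.post(KSQL, json={"ksql": stmt, "streamsProperties": {}})
        r.raise_for_status()
        print(json.dumps(r.json(), indent=2))

So the answer to the first question is yes: a ksqlDB persistent query always writes its result back to Kafka under a new topic, and the sink connector (embedded in ksqlDB or running on a separate Connect cluster) is what moves that topic into MinIO; ksqlDB never writes to object storage directly.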

Related

Preprocess MQTT messages before ingesting into InfluxDB

I'm just researching InfluxDB at this stage and have a question: is it possible to do anything with an MQTT message before ingesting it into InfluxDB?
I receive some sensor data through MQTT in JSON format, but the main parameters arrive encrypted as a field of the JSON, so before ingesting I need to preprocess them and extract the real values. What would be the best practice for this?
I understand that I can write my own service that subscribes to MQTT, decodes the data, and then sends it to InfluxDB, but is that the only option? Can I extend Telegraf somehow to do this?
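The bridge-service option is small enough to sketch. The following assumes the paho-mqtt and influxdb-client packages, and decrypt_payload is a made-up stand-in for whatever decryption your sensors actually need; topic, bucket, and credential names are placeholders.

    import json

    import paho.mqtt.client as mqtt
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    influx = InfluxDBClient(url="http://localhost:8086",
                            token="my-token", org="my-org")
    write_api = influx.write_api(write_options=SYNCHRONOUS)

    def decrypt_payload(blob: str) -> dict:
        # Hypothetical placeholder: substitute the real decryption for your
        # sensors; it should return the decoded parameter dict.
        raise NotImplementedError

    def on_message(client, userdata, msg):
        doc = json.loads(msg.payload)
        values = decrypt_payload(doc["data"])   # extract the real values
        point = Point("sensors").tag("device", doc["device_id"])
        for field, value in values.items():
            point = point.field(field, value)
        write_api.write(bucket="telemetry", record=point)

    # paho-mqtt 1.x style; with paho-mqtt 2.x use
    # mqtt.Client(mqtt.CallbackAPIVersion.VERSION1).
    client = mqtt.Client()
    client.on_message = on_message
    client.connect("localhost", 1883)
    client.subscribe("sensors/#")
    client.loop_forever()

As for Telegraf: it is not the only option to write your own service. Telegraf's processors.execd plugin can pipe each metric through an external program of your choice between the mqtt_consumer input and the InfluxDB output, so the decryption step could live in a small script that Telegraf manages for you.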

In ThingsBoard, how to save time-series data in InfluxDB or TDengine?

In ThingsBoard, I want to save time-series data in InfluxDB or TDengine.
Please help me.
As of 3.3.4.1, ThingsBoard supports only PostgreSQL, Cassandra, and Timescale databases. But you can use a REST rule node to duplicate your data to InfluxDB via its Cloud API.
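To make that concrete, the call the REST rule node would issue is a POST of line protocol to InfluxDB's v2 write endpoint. A sketch in Python (the URL, org, bucket, and token are placeholders for your InfluxDB Cloud account):

    import requests

    # Placeholders: substitute your InfluxDB Cloud URL, org, bucket, and token.
    url = "https://us-east-1-1.aws.cloud2.influxdata.com/api/v2/write"
    params = {"org": "my-org", "bucket": "thingsboard", "precision": "ms"}
    headers = {"Authorization": "Token my-token"}

    # One line of line protocol: measurement,tags fields timestamp
    body = "telemetry,deviceName=sensor-1 temperature=21.5 1700000000000"

    r = requests.post(url, params=params, headers=headers, data=body)
    r.raise_for_status()  # InfluxDB answers 204 No Content on success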

Change Data Capture PostgreSQL to Neo4j

In my project I have a PostgreSQL database as the main DB, and I need to keep my Neo4j DB synchronized with it. To do so I want to use Debezium for CDC, Kafka, and the Neo4j Streams plugin. One of the reasons I prefer Debezium over the JDBC connector is that it's real-time. So at this point I want to get
PostgreSQL -> Debezium -> Kafka -> Confluent -> Neo4j
From the documentation I found a Neo4j CDC sink strategy, but only for events coming from another Neo4j DB:
Sink ingestion strategies
Change Data Capture Event
This method allows to ingest CDC events coming from another Neo4j Instance.
Why only from another Neo4j instance? I am confused because I don't understand exactly how I should implement Change Data Capture from PostgreSQL to Neo4j.
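The "Change Data Capture Event" strategy is Neo4j-only because it parses messages in Neo4j's own CDC event schema, which only another Neo4j instance produces; Debezium's PostgreSQL events have a different shape. For those, the sink's Cypher template strategy applies instead: you bind a Cypher statement to the Debezium topic and decide yourself how each event maps onto the graph. A sketch of registering the sink via the Kafka Connect REST API (topic, URIs, credentials, labels, and field paths below are placeholders for your setup):

    import json
    import requests

    config = {
        "name": "neo4j-sink",
        "config": {
            "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
            "topics": "pg.public.customers",
            "neo4j.server.uri": "bolt://neo4j:7687",
            "neo4j.authentication.basic.username": "neo4j",
            "neo4j.authentication.basic.password": "secret",
            # Cypher template strategy: each Kafka message is exposed to the
            # statement as `event`. The exact paths (event.after.id, ...)
            # depend on your Debezium envelope and any SMTs you apply.
            "neo4j.topic.cypher.pg.public.customers": (
                "MERGE (c:Customer {id: event.after.id}) "
                "SET c.name = event.after.name"
            ),
        },
    }

    r = requests.post("http://connect:8083/connectors",
                      headers={"Content-Type": "application/json"},
                      data=json.dumps(config))
    r.raise_for_status()

With that in place the pipeline is exactly the one you listed: Debezium publishes PostgreSQL changes to Kafka, and the Neo4j sink connector replays each change into the graph through your Cypher template.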

Querying Kafka Stream Store

Does the KSQL Editor in the UI query the state stores of all of an application's KSQL threads at once, or just one of them? What about the KSQL CLI?

Web log parsing for Spark Streaming

I plan to create a system where I can read web logs in real time and use Apache Spark to process them. I am planning to use Kafka to pass the logs to Spark Streaming to aggregate statistics. I am not sure if I should do some data parsing (raw to JSON ...), and if so, where the appropriate place to do it is (Spark script, Kafka, somewhere else...). I would be grateful if someone can guide me; this is all quite new to me. Cheers
Apache Kafka is a distributed pub-sub messaging system. It does not provide any way to parse or transform data; that is not what it is for. But any Kafka consumer can process, parse, or transform the data published to Kafka and republish the transformed data to another topic, or store it in a database or file system.
There are many ways to consume data from Kafka; one is the way you suggested, real-time stream processors (Apache Flume, Apache Spark, Apache Storm, ...).
So the answer is no, Kafka does not provide any way to parse the raw data. You can transform/parse the raw data with Spark, but you can also write your own consumer, since Kafka clients exist for many languages, or use any other prebuilt consumer such as Apache Flume or Apache Storm.
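To make the "parse it in Spark" option concrete, here is a minimal sketch using Spark Structured Streaming with a Kafka source. It assumes the spark-sql-kafka connector is on the classpath; the topic name and the Common Log Format regex are illustrative, so adjust them to your actual log layout.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, regexp_extract

    spark = SparkSession.builder.appName("weblog-parser").getOrCreate()

    # Raw log lines arrive on the Kafka topic as the record value.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "weblogs")
           .load())

    lines = raw.selectExpr("CAST(value AS STRING) AS line")

    # Parse Common Log Format fields out of each raw line.
    clf = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
    parsed = lines.select(
        regexp_extract(col("line"), clf, 1).alias("host"),
        regexp_extract(col("line"), clf, 2).alias("timestamp"),
        regexp_extract(col("line"), clf, 3).alias("method"),
        regexp_extract(col("line"), clf, 4).alias("path"),
        regexp_extract(col("line"), clf, 5).cast("int").alias("status"),
    )

    # Example aggregation: request counts per status code, to the console.
    counts = parsed.groupBy("status").count()
    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()

Keeping the parsing in the Spark job like this means the Kafka topic stays raw (so other consumers can reuse it), and the parse and the aggregation live in one place.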
