Preprocess MQTT messages before ingesting into InfluxDB

I'm just researching InfluxDB at this stage and have a question: is it possible to do anything with an MQTT message before ingesting it into InfluxDB?
I receive sensor data through MQTT in JSON format, but the main parameters arrive encrypted as a field of the JSON, and before ingesting them I need to preprocess the messages and extract the real values. What would be the best practice for doing this?
I understand that I can write my own service that subscribes to MQTT, decodes the data and then sends it to InfluxDB, but is that the only option? Can I extend Telegraf somehow to do this?
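If you do go the custom-service route, a minimal sketch in Python could look like the following. It assumes the paho-mqtt and influxdb-client packages; the topic name, the "encrypted_value"/"device_id" keys, and decrypt_field() are placeholders for your actual message layout and encryption scheme. (Telegraf can also be extended, e.g. by pairing its mqtt_consumer input with an execd processor that runs an external script, but the standalone-service variant is easier to show in a few lines.)

```python
# Minimal sketch (untested): subscribe to MQTT, decrypt one field of the JSON
# payload, and write the extracted value to InfluxDB.
# Assumptions: paho-mqtt 1.x client API, influxdb-client (v2 API); the topic,
# the "encrypted_value"/"device_id" keys and decrypt_field() are hypothetical.
import json

import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

influx = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = influx.write_api(write_options=SYNCHRONOUS)


def decrypt_field(ciphertext: str) -> float:
    """Placeholder: apply whatever decryption your sensors actually use."""
    raise NotImplementedError


def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    value = decrypt_field(payload["encrypted_value"])
    point = (
        Point("sensors")
        .tag("device", payload.get("device_id", "unknown"))
        .field("value", value)
    )
    write_api.write(bucket="iot", record=point)


client = mqtt.Client()          # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("sensors/#")
client.loop_forever()
```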

Related

ksqlDB to MinIO

I'm investigating the feasibility of using ksqlDB to filter out empty struct fields in Kafka records in Avro format and then dump the records to S3 in Parquet format; removing all empty records from the Avro is required for the transformation to Parquet.
ksqlDB can handle the filtering part in a very straightforward way. My question is how I can connect ksqlDB to MinIO.
Would using a Kafka connector require ksqlDB to write back to Kafka under a new topic?
I also heard ksqlDB has built-in connectors, but I'm not sure how that works.

Which is the faster or better way of sending large amounts of sensor data into QuestDB?

I have a large amount of streaming IoT sensor data and I'm using QuestDB as a time series database. I'm curious whether it's better to send this over Postgres or via InfluxDB line protocol if I have a lot of measurements. I would like to use Python for this, if possible, but it depends on which performs better.
InfluxDB line protocol is designed for high-throughput ingestion. This interface is for write operations only, so you will still need to use the PostgreSQL wire protocol or the REST API for querying.
For your IoT / high-throughput scenario it's therefore recommended to use InfluxDB line protocol, as this is the most efficient option. There are Python examples of simple socket writes over TCP and UDP in the official documentation.
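For reference, here is a minimal sketch of the raw-socket approach in Python, assuming QuestDB's line protocol listener is on the default TCP port 9009; the table and column names are made up for illustration.

```python
# Minimal sketch: send InfluxDB line protocol to QuestDB over a plain TCP socket.
# Assumes the default ILP TCP port 9009; the "sensors" table and its columns
# are illustrative only.
import socket
import time


def send_lines(lines, host="localhost", port=9009):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect((host, port))
        # One point per line: table,tag_set field_set timestamp_ns
        sock.sendall(("\n".join(lines) + "\n").encode())


now_ns = time.time_ns()
rows = [
    f"sensors,device=dev{i} temperature={20 + i * 0.1} {now_ns}"
    for i in range(1000)
]
send_lines(rows)
```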

Streaming InfluxDB data

I've used InfluxDB for a while now, but never had to continuously stream data from it. A simple GET /query was sufficient. But now I need a way to stream data to the frontend to draw pretty graphs and such.
So far we've been running GET /query periodically from the frontend, but this is highly inefficient. I would much rather keep the connection open and receive the data when it's written to the DB. Searching the web, there doesn't seem to be support for either WebSockets or HTTP/2 in InfluxDB right now.
So, a question to others who have possibly hit this issue: how did you solve it?
InfluxDB v1.x supports subscriptions. As data is written to InfluxDB, writes are duplicated to subscriber endpoints via HTTP, HTTPS, or UDP in line protocol.
https://docs.influxdata.com/influxdb/v1.7/administration/subscription-management/
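For illustration, a subscription is created with InfluxQL and pointed at any HTTP endpoint you control; the sketch below (untested, database name and port are placeholders) just prints each duplicated write, but that handler is where you could push points to the frontend, e.g. over a websocket.

```python
# Minimal sketch of an HTTP subscriber: InfluxDB 1.x duplicates every write to
# the destination as an HTTP POST with a line-protocol body.
# Create the subscription first, e.g. from the influx CLI:
#   CREATE SUBSCRIPTION "mysub" ON "mydb"."autogen" DESTINATIONS ALL 'http://localhost:9090'
from http.server import BaseHTTPRequestHandler, HTTPServer


class WriteReceiver(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Line-protocol points arrive here; forward them to your frontend
        # (e.g. over a websocket) instead of printing.
        print(self.path, body.decode(errors="replace"))
        self.send_response(204)
        self.end_headers()


HTTPServer(("0.0.0.0", 9090), WriteReceiver).serve_forever()
```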

How does Kapacitor get a stream in the TICK architecture

As far as I know, Kapacitor can work on streams or batches. In the case of batches, it fetches data from InfluxDB and operates on that.
But how does it work with streams? Does it subscribe to InfluxDB or to Telegraf? I hope it subscribes to InfluxDB, so that when any client writes data to InfluxDB, Kapacitor also receives that data. Is this understanding correct? Or does it subscribe directly to Telegraf?
The reason this question is important to us is that we want to use Azure IoT Hub in place of Telegraf. We will read the data from Azure IoT Hub and write it to InfluxDB, and we hope that we can use Kapacitor streams there.
Thanks In Advance
Kapacitor subscribes to InfluxDB. By default, if you do not specify which databases to subscribe to, it subscribes to all of them. In Kapacitor's config file you can list the InfluxDB databases you want to subscribe to.
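For reference, the relevant part of kapacitor.conf looks roughly like this; the database and retention-policy names are placeholders, and omitting the subscriptions table gives the subscribe-to-everything default described above.

```toml
# Excerpt from kapacitor.conf (illustrative values)
[[influxdb]]
  enabled = true
  name = "localhost"
  urls = ["http://localhost:8086"]

  [influxdb.subscriptions]
    # database name = list of retention policies to subscribe to
    mydb = ["autogen"]
```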

Web log parsing for Spark Streaming

I plan to create a system where I can read web logs in real time and use Apache Spark to process them. I am planning to use Kafka to pass the logs to Spark Streaming to aggregate statistics. I am not sure if I should do some data parsing (raw to JSON, ...), and if so, where the appropriate place to do it is (the Spark script, Kafka, somewhere else...). I would be grateful if someone can guide me. It's all kind of new to me. Cheers
Apache Kafka is a distributed pub-sub messaging system. It does not provide any way to parse or transform data; that is not what it is for. But any Kafka consumer can process, parse or transform the data published to Kafka and republish the transformed data to another topic, or store it in a database or file system.
There are many ways to consume data from Kafka; one way is the one you suggested, real-time stream processors (Apache Flume, Apache Spark, Apache Storm, ...).
So the answer is no, Kafka does not provide any way to parse the raw data. You can transform/parse the raw data with Spark, but you can also write your own consumer, since Kafka clients exist for many languages, or use another existing consumer such as Apache Flume or Apache Storm.
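As a rough illustration of the "write your own consumer" option, here is a sketch using the kafka-python client; the topic names and the (Apache-style) log regex are assumptions about your setup, and in practice you might do the same parsing inside your Spark job instead.

```python
# Rough sketch (kafka-python): consume raw web-log lines, parse them into
# structured records, and republish them as JSON to a second topic that
# Spark Streaming can consume. Topic names and the log format are assumptions.
import json
import re

from kafka import KafkaConsumer, KafkaProducer

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

consumer = KafkaConsumer("raw-weblogs", bootstrap_servers="localhost:9092")
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode(),
)

for msg in consumer:
    match = LOG_RE.match(msg.value.decode(errors="replace"))
    if match:  # drop lines that don't match the expected format
        producer.send("parsed-weblogs", match.groupdict())
```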
