Telegraf MQTT input data flatten

How can I use Telegraf to extract timestamp and sensor value from an MQTT message and insert it into a PostgreSQL database with separate timestamp and sensor value columns?
I am receiving this JSON object from MQTT:
{"sensor": "current", "data": [[1614945972418042880, 1614945972418042880], [1614945972418294528, 0.010058338362502514], [1614945972418545920, 0.010058338362502514]]}
It contains two fields: "sensor" and "data". The "sensor" field contains a string value that identifies the type of sensor, and the "data" field contains an array of arrays, where each sub-array contains a timestamp and a sensor value. I am using Telegraf to output this data to a PostgreSQL database. I would like to separate the timestamp and sensor value, flatten them out of the list, and use the sensor name as the column name. How can I configure Telegraf to do this?
So my table would look like this:

timestamp           | current
1614945972418042880 | 1614945972418042880
1614945972418294528 | 0.010058338362502514
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]
  topics = ["your_topic"]
  data_format = "json"
  json_query = "data.*"
  tag_keys = ["sensor", "timestamp"]
  name_override = "sensors"
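The json data format on its own cannot split the nested "data" array into separate rows that each keep their own timestamp. One possible approach, shown below as a sketch and not verified against this exact setup, is to ingest the payload as a raw string and break it apart with a Starlark processor. The broker address and topic placeholder are carried over from the snippet above, the measurement name "sensors" is an assumption, and the json.star/Metric helpers are the ones documented for Telegraf's Starlark processor.

# Sketch only, not verified against this exact setup: read the whole JSON
# document as a single string field, then split it into one point per
# [timestamp, value] pair with a Starlark processor.
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]
  topics = ["your_topic"]
  data_format = "value"    # hand the raw payload to the processor as a string
  data_type = "string"

[[processors.starlark]]
  namepass = ["mqtt_consumer"]
  source = '''
load("json.star", "json")

def apply(metric):
    doc = json.decode(metric.fields["value"])
    sensor = doc["sensor"]            # e.g. "current"
    out = []
    for pair in doc["data"]:          # each pair is [timestamp_ns, reading]
        m = Metric("sensors")
        m.fields[sensor] = float(pair[1])
        m.time = int(pair[0])         # timestamps are already in nanoseconds
        out.append(m)
    return out
'''

Each resulting point then carries a single field named after the sensor (here "current") plus its own timestamp, which is the shape that should map to separate timestamp and current columns on the PostgreSQL side.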

Related

Consumer_failed_message in kafka stream: Records not pushed from topic

I have a flow where, from IBM mainframe IIDR, I am sending records to a Kafka topic. The value_format of the messages coming to the Kafka topic is AVRO, and the key is in AVRO format too. The records are pushed into the Kafka topic, and I have a stream associated with that topic, but the records are not passed into the stream.
Example of the test_iidr topic -
rowtime: 5/30/20 7:06:34 PM UTC, key: {"col1": "A", "col2": 1}, value: {"col1": "A", "col2": 11, "col3": 2, "iidr_tran_type": "QQ", "iidr_a_ccid": "0", "iidr_a_user": " ", "iidr_src_upd_ts": "2020-05-30 07:06:33.262931000", "iidr_a_member": " "}
The value_format in the stream is AVRO and the column names are all checked.
The stream creation query -
CREATE STREAM test_iidr (
    col1 STRING,
    col2 DECIMAL(2,0),
    col3 DECIMAL(1,0),
    iidr_tran_type STRING,
    iidr_a_ccid STRING,
    iidr_a_user STRING,
    iidr_src_upd_ts STRING,
    iidr_a_member STRING)
  WITH (KAFKA_TOPIC='test_iidr', PARTITIONS=1, REPLICAS=3, VALUE_FORMAT='AVRO');
Is it failing to load into the stream from the topic because the KEY is not mentioned in the WITH statement?
The schema registry has the test_iidr-value and test_iidr-key subjects registered in it.
The key.converter and value.converter in the Kafka Connect Docker setup are set to org.apache.kafka.connect.json.JsonConverter. Is this JsonConverter creating the issue?
I created a completely different pipeline with a different stream and inserted the same data manually using INSERT INTO statements. It worked. Only the IIDR flow is not working, and the records are not pushed into the stream from the topic.
I am using Confluent Kafka version 5.5.0.
The JsonConverter in the connect config could well be converting your Avro data to JSON.
To determine the key and value serialization formats you can use the PRINT command (which I can see you've already run). PRINT will output the key and value formats when it runs. For example:
ksql> PRINT some_topic FROM BEGINNING LIMIT 1;
Key format: JSON or KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 5/30/20 7:06:34 PM UTC, key: {"col1": "A", "col2": 1}, value: {"col1": "A", "col2": 11, "col3": 2, "iidr_tran_type": "QQ", "iidr_a_ccid": "0", "iidr_a_user": " ", "iidr_src_upd_ts": "2020-05-30 07:06:33.262931000", "iidr_a_member": " "}
So the first thing to check is the formats output for the key and value by PRINT and then update your CREATE statement accordingly.
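For example, if PRINT reports JSON for the value, the stream would be declared with VALUE_FORMAT='JSON' and the same column list (a sketch only; match it to whatever PRINT actually reports for your topic):

CREATE STREAM test_iidr (
    col1 STRING,
    col2 DECIMAL(2,0),
    col3 DECIMAL(1,0),
    iidr_tran_type STRING,
    iidr_a_ccid STRING,
    iidr_a_user STRING,
    iidr_src_upd_ts STRING,
    iidr_a_member STRING)
  WITH (KAFKA_TOPIC='test_iidr', VALUE_FORMAT='JSON');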
Note, ksqlDB does not yet support Avro/JSON keys, so you may want/need to repartition your data; see: https://docs.ksqldb.io/en/latest/developer-guide/syntax-reference/#what-to-do-if-your-key-is-not-set-or-is-in-a-different-format
Side note: if the schema for the value is stored in the Schema Registry, then you don't need to define the columns in your CREATE statement, as ksqlDB will load the columns from the Schema Registry.
Side note: you don't need PARTITIONS=1, REPLICAS=3 in the WITH clause for existing topics, only if you want ksqlDB to create the topic for you.

Telegraf + MQTT + InfluxDB: Modifying input of MQTT

Let us assume I have various MQTT clients that send data within some topic, for instance for temperature sensors tele/temp/%devicename%/SENSOR, in a JSON format such as
{"Time":"2020-03-24T20:17:04","DS18S20":{"Temperature":22.8},"TempUnit":"C"}
My basic telegraf.conf looks as follows:
# InfluxDB output
[[outputs.influxdb]]
  database = "telegraf"

# Sensors
[[inputs.mqtt_consumer]]
  name_override = "sensor"
  topics = ["tele/temp/+/SENSOR"]
  data_format = "json"
My problem is now that I fail to do basic operations on that JSON data.
I do not want to save the host and the topic. How can I drop fields?
The topic contains the %devicename%. How can I add it as a tag?
I cannot use json_query, since the %devicename% is there and there is only one field.
How can I rename the field %devicename%_Temperature to "temperature"?
In general, I would like an easy way to keep only the measurements of a multitude of sensors in the following format:
timestamp | temperature | device
2020-03-24T20:17:04 | 22.8 | DS18S20
Thanks a lot!
If you don't want to save the topic in InfluxDB, set the following as part of your [[inputs.mqtt_consumer]] configuration:
topic_tag = ""
Adding the device name from the topic as a tag can be done using processors.regex:
[[processors.regex]]
  [[processors.regex.tags]]
    key = "topic"
    pattern = ".*/(.*)/.*"
    replacement = "${1}"
    result_key = "devicename"
In this case the second-to-last segment of the topic (the %devicename% part) would become the devicename tag.
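Putting the pieces together, a possible telegraf.conf is sketched below. Two caveats: if topic_tag = "" is set on the input there is no topic tag left for the regex processor to read, so this sketch keeps the topic at the input and drops it at the output instead; and the omit_hostname agent option and the field_rename block are assumptions on my part (field_rename needs a reasonably recent Telegraf release), so check both against the documentation for your version.

# Sketch only, combining the suggestions above; options marked as assumptions
# should be verified against your Telegraf version.
[agent]
  omit_hostname = true            # assumption: you do not need the "host" tag

[[inputs.mqtt_consumer]]
  name_override = "sensor"
  topics = ["tele/temp/+/SENSOR"]
  data_format = "json"

[[processors.regex]]
  # derive the devicename tag from the second-to-last topic segment
  [[processors.regex.tags]]
    key = "topic"
    pattern = ".*/(.*)/.*"
    replacement = "${1}"
    result_key = "devicename"

  # rename fields such as "DS18S20_Temperature" to "temperature"
  # (assumption: field_rename is available in your Telegraf release)
  [[processors.regex.field_rename]]
    pattern = "^.*_Temperature$"
    replacement = "temperature"

[[outputs.influxdb]]
  database = "telegraf"
  tagexclude = ["topic"]          # drop the raw topic once devicename exists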

How to transform to Entity Attribute Value (EAV) using Spoon Normalise

I am trying to use Spoon (Pentaho Data Integration) to change data that is in typical row format to Entity Attribute Value format.
My source data is as follows:
My Normaliser is set up as follows:
And here are the results:
Why is the value for the CONDITION_START_DATE and CONDITION_STOP_DATE in the string_value column instead of the date_value column?
According to this documentation
Fieldname: Name of the fields to normalize
Type: Give a string to classify the field.
New field: You can give one or more fields where the new value should be transferred to.
Please check the "Normalizing multiple rows in a single step" section in http://wiki.pentaho.com/display/EAI/Row+Normaliser. According to this, you should have a group of fields with the same Type (pr_sl -> Product1, pr1_nr -> Product1); only in this case can you get multiple fields in the output (pr_sl -> Product Sales, pr1_nr -> Product Number).
In your case you can convert the dates to strings, use the Row Normaliser with a single new field, and then use a formula, for example:
And then convert date_value back to a date.

InfluxDB design issue

I am using InfluxDB and using line protocol to insert a large set of data into the database. The data I am getting is in the form of key-value pairs, where the key is a long string containing hierarchical data and the value is a simple integer.
Sample key/value data:
/path/units/unit/subunits/subunit[name\='NAME1']/memory/chip/application/filter/allocations
value = 500
/path/units/unit/subunits/subunit[name\='NAME2']/memory/chip/application/filter/allocations
value = 100
(Note: NAME2 instead of NAME1)
/path/units/unit/subunits/subunit[name\='NAME1']/memory/chip/application/filter/free
value = 700
(Note: "free" instead of "allocations" at the leaf)
/path/units/unit/subunits/subunit[name\='NAME2']/memory/graphics/application/filter/swap
value = 600
(Note: "graphics" instead of "chip" in the path)
/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/size
value = 400
(Note: a different path, but the same up to the subunit)
/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/free
value = 100
(Note: the same path, but the last element is different)
Below is the line protocol I am using to insert the data:
interface,Key=/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/free valueData=500
I am using one measurement, namely interface, with one tag and one field set. But this DB design is causing issues when querying the data.
How can I design the database so that I can run queries like "get all records for the subunit where name = NAME1" or "get all size data for every hard disk"?
Thanks in advance.
The Schema I'd recommend would be the following:
interface,filename=/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/free value=500
Where filename is a tag and value is the field.
Given that the cardinality of filename is in the thousands, this schema should work well.
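With that schema, queries on the path become regular-expression matches against the filename tag. A couple of hedged InfluxQL examples (untested; adjust the patterns to your actual paths):

-- all records for the subunit named NAME1
SELECT * FROM "interface" WHERE "filename" =~ /NAME1/

-- all hard-disk size readings
SELECT "value" FROM "interface" WHERE "filename" =~ /harddisk\/data\/size$/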

CQL3: How does one avoid a string being translated to ASCII?

I store my message and message ID in a Cassandra database. I use the https://github.com/matehat/cqerl client to work with an ejabberd server for storing messages in Cassandra. I fetch records from the database with a select query using the cqerl client:
cqerl:run_query(Pid,"SELECT * FROM CONV_DETAILS;").
I get a list of binaries as output, as follows:
[<<0,0,0,9,52,57,49,56,52,48,52,57,55>>,
<<0,0,0,8,0,0,0,14,204,123,132,9>>,
<<0,0,0,2,110,111>>]
for the string "what are you doing now?".
How can I avoid the original string being translated into ASCII numbers as above?
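The output above looks like the raw, length-prefixed wire values rather than decoded columns. A minimal sketch, assuming the result-inspection helpers (all_rows/1, head/1) documented in the matehat/cqerl README are available in your version:

%% Sketch only: assumes cqerl's documented result helpers; verify against
%% the cqerl version you are running.
{ok, Result} = cqerl:run_query(Pid, "SELECT * FROM CONV_DETAILS;"),
Rows = cqerl:all_rows(Result),   %% each row is a proplist of {Column, Value}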
