export telemetry data of a device in ThingsBoard - iot

im using thingsboard community edition.
i want to know if there is a way to export all time series data of a device into csv or any other file format. i need all the data to analyse it.
thingsboard Professional edition has this feature. but how about Community Edition?

Default csv/xls export is only available in the professional version.
But you could use the REST api to acquire the historical data.
My reference below states:
You can also fetch list of historical values for particular entity type and entity id using GET request to the following URL
http(s)://host:port/api/plugins/telemetry/{entityType}/{entityId}/values/timeseries?keys=key1,key2,key3&startTs=1479735870785&endTs=1479735871858&interval=60000&limit=100&agg=AVG
The supported parameters are described below:
keys - comma separated list of telemetry keys to fetch.
startTs - unix timestamp that identifies start of the interval in milliseconds.
endTs - unix timestamp that identifies end of the interval in milliseconds.
interval - the aggregation interval, in milliseconds.
agg - the aggregation function. One of MIN, MAX, AVG, SUM, COUNT, NONE.
limit - the max amount of data points to return or intervals to process.
ThingsBoard will use startTs, endTs and interval to identify aggregation partitions or sub-queries and execute asynchronous queries to DB that leverage built-in aggregation functions."
Reference:Thingsboard docs: ts data values api

Related

How to send non aggregated metric to Influx from Springboot application?

I have a SpringBoot application that is under moderate load. I want to collect metric data for a few of the operations of my app. I am majorly interested in Counters and Timers.
I want to count the number of times a method was invoked (# of invocation over a window, for example, #invocation over last 1 day, 1 week, or 1 month)
If the method produces any unexpected result increase failure count and publish a few tags with that metric
I want to time a couple of expensive methods, i.e. I want to see how much time did that method took, and also I want to publish a few tags with metrics to get more context
I have tried StatsD-SignalFx and Micrometer-InfluxDB, but both these solutions have some issues I could not solve
StatsD aggregates the data over flush window and due to aggregation metric tags get messed up. For example, if I send 10 events in a flush window with different tag values, and the StatsD agent aggregates those events and publishes only one event with counter = 10, then I am not sure what tag values it's sending with aggregated data
Micrometer-InfluxDB setup has its own problems, one of them being micrometer sending 0 values for counters if no new metric is produced and in that fake ( 0 value counter) it uses same tag values from last valid (non zero counter)
I am not sure how, but Micrometer also does some sort of aggregation at the client-side in MeterRegistry I believe, because I was getting a few counters with a value of 0.5 in InfluxDB
Next, I am planning to explore Micrometer/StatsD + Telegraf + Influx + Grafana to see if it suits my use case.
Questions:
How to avoid metric aggregation till it reaches the data store (InfluxDB). I can do the required aggregation in Grafana
Is there any standard solution to the problem that I am trying to solve?
Any other suggestion or direction for my use case?

Can InfluxDB have Continuous Queries with same source & target measurements but with different/new tags?

Below is the scenario against which I have this question.
Requirement:
Pre-aggregate time series data within influxDb with granularity of seconds, minutes, hours, days & weeks for each sensor in a device.
Current Proposal:
Create five Continuous Queries (one for each granularity level i.e. Seconds, minutes ...) for each sensor of a device in a different retention policy as that of the raw time series data, when the device is onboarded.
Limitation with Current Proposal:
With increased number of device/sensor (time series data source), the influx will get bloated with too many Continuous Queries (which is not recommended) and will take a toll on the influxDb instance itself.
Question:
To avoid the above problems, is there a possibility to create Continuous Queries on the same source measurement (i.e. raw timeseries measurement) but the aggregates can be differentiated within the measurement using new tags introduced to differentiate the results from Continuous Queries from that of the raw time series data in the measurement.
Example:
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."raw"."strain" FROM "database"."raw"."strain" GROUP BY time(1s),*
END
As far as I know, and have seen from the docs, it's not possible to apply new tags in continuous queries.
If I've understood the requirements correctly this is one way you could approach it.
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."raw"."strain" FROM "database"."strain_seconds_retention_policy"."strain" GROUP BY time(1s),*
END
This would save the data in the same measurement but a different retention policy - strain_seconds_retention_policy. When you do a select you specify the corresponding retention policy from which to select.
Note that, it is not possible to perform a select from several retention policies at the same time. If you don't specify one, the default one is used (and not all of them). If it is something you need then another approach could be used.
I don't quite get why you'd need to define a continuous query per device and per sensor. You only need to define five (1 per seconds, minutes, hours, days, weeks) and do a group by * (all) which you already do. As long as the source datapoint has a tag with the id for the corresponding device and sensor, the resampled datapoint will have it too. Any newly added devices (data) will just be processed automatically by those 5 queries and saved into the corresponding retention policies.
If you do want to apply additional tags, you could process the data outside the database in a custom script and write it back with any additional tags you need, instead of using continuous queries

InfluxDB design choices and query

I am currently writing a program that does anomaly detection on system calls by creating a database of “normal behaviour”. The goal now is to visualize the data being collected. There are about 200 occurences a minute where a process does a systemcall. Currently, I am writing these system calls to an InfluxDB where the name of the process that does the calls is a tag and the system calls it does are fields.
A datapoint now is a timestamp, measurement, the processname, a call it did
I want to visualize the occurences of certain calls a process did in a specific amount of time, for example:
How many times did process x call ‘fopen’ in the last 5 minutes?
Is my schema sufficient for this? How would a query for this look like?
Thank you in advance.

Create sum of multiple queries with influxdb

I have four singlestat panels which show my used space on different hosts (every host has also different type_instances):
The query for one of this singlestats is the following:
Question: Is there a way to create a fifth singlestat panel which sows the sum of the other 4 singlestats ? (The sum of all "storj_value" where type=shared)
The influx query language does not currently support aggregations across metrics (eg, JOINs). It is possible with Kapacitor but that requires that new aggregated values for all the measurements are written to the DB, by writing code to do it, which will need to be queried separately.
Only option currently is to use an API that does have cross-metric function support, for example Graphite with an InfluxDB storage back-end, InfluxGraph.
The two APIs are quite different - Influx's is query language based, Graphite is not - and tagged InfluxDB data will need to be configured as a Graphite metric path via templates, see configuration examples.
After that, Graphite functions that act across series can be used, in particular for the above question, sumSeries.

InfluxDB performance

For my case, I need to capture 15 performance metrics for devices and save it to InfluxDB. Each device has a unique device id.
Metrics are written into InfluxDB in the following way. Here I only show one as an example
new Serie.Builder("perfmetric1")
.columns("time", "value", "id", "type")
.values(getTime(), getPerf1(), getId(), getType())
.build()
Writing data is fast and easy. But I saw bad performance when I run query. I'm trying to get all 15 metric values for the last one hour.
select value from perfmetric1, perfmetric2, ..., permetric15
where id='testdeviceid' and time > now() - 1h
For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.
I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?
As #valentin answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.
In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:
select * from perfmetric1 into perfmetric1.[id];
Later you would do:
select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., permetric15.testdeviceid where time > now() - 1h
This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid.
Build an index on id column. Seems that he engine uses full scan on table to retrieve data. By splitting your query in 15 threads, the engine will use 15 full scans and the performance will be much worse.

Resources