Derivative not working in batch tick script kapacitor - influxdb

I am using the derivative function to calculate bandwidth and send alerts via Kapacitor. Below is the query:
|query(''' SELECT derivative(mean("bandwidth_in"), 1s) *8 as "value" FROM "router"."autogen"."cisco_router" where host = '10.1.11.1' and ( interface_name = 'XXX' or interface_name = 'XXXXX') AND time < now() GROUP BY time(1s) ''')
.cluster('network')
.period(7m)
.every(6m)
.groupBy(time(10s), *)
When I try to save this TICKscript in Chronograf, I get the following error:
failed to parse InfluxQL query: derivative aggregate requires a GROUP BY interval
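It helps to spell out what derivative(mean(...), 1s) * 8 computes to see why InfluxQL insists on a GROUP BY interval here. It takes the mean of each time window, then the per-second change between consecutive window means, scaled by 8 to turn bytes into bits. Without windows there is only one mean and nothing to take a difference of. The Python below is a minimal sketch of that arithmetic under simplifying assumptions: timestamps are whole seconds, windows are aligned to multiples of the interval, and there is no fill handling. derivative_of_means is a hypothetical helper, not an InfluxDB or Kapacitor API.

```python
from statistics import mean


def derivative_of_means(points, interval_s, unit_s=1, scale=8):
    """Hypothetical helper mimicking derivative(mean(x), unit) * scale
    over GROUP BY time(interval_s).

    points: iterable of (timestamp_seconds, value) pairs.
    Returns (window_start, rate) pairs, one per pair of adjacent windows.
    """
    # 1. Bucket raw points into windows aligned to multiples of interval_s.
    windows = {}
    for ts, value in points:
        start = ts - (ts % interval_s)
        windows.setdefault(start, []).append(value)

    # 2. Aggregate each window with mean(), in time order.
    means = [(start, mean(vals)) for start, vals in sorted(windows.items())]

    # 3. Rate of change per unit between consecutive window means.
    #    With a single window (no GROUP BY interval) this list stays empty.
    rates = []
    for (t0, v0), (t1, v1) in zip(means, means[1:]):
        elapsed_units = (t1 - t0) / unit_s
        rates.append((t1, (v1 - v0) / elapsed_units * scale))
    return rates


if __name__ == "__main__":
    # A byte counter sampled every 5 s that grows by 100 bytes/s.
    samples = [(t, 1000 + 100 * t) for t in range(0, 30, 5)]
    print(derivative_of_means(samples, interval_s=10))
```

With 10 s windows the sketch reports 800 bits/s between each pair of windows. If you collapse the same samples into one 30 s window, you get a single mean and an empty result, which is the situation the parser refuses to evaluate.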

Related

Workaround for a nested query in Kapacitor TICKscript

As far as I know, there is no way to execute a nested query in a Kapacitor TICKscript, so I'm looking for another way to achieve the same result as this InfluxQL query:
select count(*) from (SELECT sum("value") FROM "measurement"."autogen"."consumption" WHERE (time > now() -5d -1h AND time <= now() - 5d) GROUP BY time(60m, 1ms), "param1", "param2", "param3")
The result of that query is a single point with one value: the total number of rows returned by the nested query, for example 50.
I wrote something similar in TICKscript:
var cron = '0 59 * * * * *'
var size_1 = batch
|query('''
SELECT sum(value) FROM "measurement"."autogen"."consumption"
''')
.period(1h)
.offset(5d - 1m - 1ms)
.groupBy(time(60m, 1ms), 'param1', 'param2', 'param3')
.fill(1)
.cron(cron)
|count('sum')
|log()
size_1
|alert()
.kafka()
.kafkaTopic('influx')
But I don't get a single value in the output. Instead, I get multiple points that are still grouped by the three parameters from the query ('param1', 'param2', 'param3'), and each count only covers the points for one set of params. Here is a fragment of the Kapacitor log() output:
2021-04-21T03:51:00.220+02:00
Kapacitor Point
3 Tags
param1: 2.8.0
param2: 0015474_7
param3: SUPPLEMENTARY
1 Fields
count: 2
2021-04-21T03:51:00.221+02:00
Kapacitor Point
3 Tags
param1: 2.8.0
param2: PW0001_1
param3: SUPPLEMENTARY
1 Fields
count: 2
etc.
How can I get the same output in a Kapacitor TICKscript as with the InfluxQL query, that is, a single count() result?
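The mismatch comes down to grouping. The outer count(*) in the InfluxQL query counts every row the inner query returns, across all param1/param2/param3 combinations. A count applied to data that is still grouped by those tags yields one count per combination, as in the log above. The Python below is a minimal sketch of the two behaviours on made-up rows. The row values and the helper names count_per_group and count_all are hypothetical.

```python
from collections import Counter

# Hypothetical rows from the inner query:
# (hour bucket, param1, param2, param3, sum_value)
INNER_ROWS = [
    ("02:00", "2.8.0", "0015474_7", "SUPPLEMENTARY", 4.0),
    ("03:00", "2.8.0", "0015474_7", "SUPPLEMENTARY", 6.0),
    ("02:00", "2.8.0", "PW0001_1", "SUPPLEMENTARY", 1.5),
    ("03:00", "2.8.0", "PW0001_1", "SUPPLEMENTARY", 2.5),
]


def count_per_group(rows):
    """Count rows separately for each (param1, param2, param3) combination,
    which is what a count over still-grouped data produces."""
    return Counter((p1, p2, p3) for _, p1, p2, p3, _ in rows)


def count_all(rows):
    """Count every row regardless of tags, like the outer count(*)."""
    return len(rows)


if __name__ == "__main__":
    for group, n in count_per_group(INNER_ROWS).items():
        print(group, n)  # one count per param combination
    print("total:", count_all(INNER_ROWS))  # a single value
```

To get the single value, the data needs to lose the param1/param2/param3 grouping before the count, not after it.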

InfluxDB 1.7.2 - Top X over time

I’m new to InfluxDB. I’m using it to store ntopng timeseries data.
ntopng writes a measurement called asn:traffic that stores how many bytes were sent and received for an ASN.
> show tag keys from "asn:traffic"
name: asn:traffic
tagKey
------
asn
ifid
> show field keys from "asn:traffic"
name: asn:traffic
fieldKey fieldType
-------- ---------
bytes_rcvd float
bytes_sent float
>
I can run a query to see the data rate in bps for a specific ASN:
> SELECT non_negative_derivative(mean("bytes_rcvd"), 1s) * 8 FROM "asn:traffic" WHERE "asn" = '2906' AND time >= now() - 12h GROUP BY time(30s) fill(none)
name: asn:traffic
time non_negative_derivative
---- -----------------------
1550294640000000000 30383200
1550294700000000000 35639600
...
...
...
>
However, what I would like to do is create a query that I can use to return the top N ASNs by data rate and plot that on a Grafana graph. Sort of like this example that is using ELK.
I've tried a few variants from posts here and elsewhere, but I haven't been able to get what I'm after. For example, this query I think gets me closer to where I want to be, but there are no values in asn:
> select top(bps,asn,10) from (SELECT non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
name: asn:traffic
time top asn
---- --- ---
1550299860000000000 853572800
1550301660000000000 1197327200
1550301720000000000 1666883866.6666667
1550310780000000000 674889600
1550329320000000000 20979431866.666668
1550332740000000000 707015600
1550335920000000000 2066646533.3333333
1550336820000000000 618554933.3333334
1550339280000000000 669084933.3333334
1550340300000000000 704147333.3333334
>
I then thought the subquery might also need to select asn, but that produces an error about mixing aggregate and non-aggregate queries:
> select top(bps,asn,10) from (SELECT asn, non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
ERR: mixing aggregate and non-aggregate queries is not supported
>
Anyone have any thoughts on a solution?
EDIT 1
Following George Shuklin's suggestion, I added asn to the GROUP BY. The ASN now shows up in the CLI output, but that doesn't carry over to Grafana. I'm expecting a stacked graph in which each layer is one of the top 10 ASN results.
Try making ASN a tag. Then you can use GROUP BY time(30s), "asn", and that tag will be available in the outer query.
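InfluxQL's top(field, tag, N) keeps the largest field value for each distinct tag value and then returns the N largest of those. This is why the tag has to reach the outer query. The Python below is a minimal sketch of that selection on hypothetical (time, asn, bps) rows. top_by_tag is an illustrative helper. Unlike InfluxDB, which returns the selected points in time order, it orders the result by value.

```python
def top_by_tag(rows, n):
    """Keep the largest bps per ASN, then return the n largest as
    (asn, bps, time) tuples sorted by bps, descending."""
    best = {}
    for time, asn, bps in rows:
        # Remember only the highest rate seen for each ASN.
        if asn not in best or bps > best[asn][0]:
            best[asn] = (bps, time)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(asn, bps, time) for asn, (bps, time) in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical per-ASN rates from an inner query grouped by time(30s), "asn".
    rows = [
        (0, "2906", 30_383_200.0),
        (30, "2906", 35_639_600.0),
        (0, "15169", 50_000_000.0),
        (30, "15169", 10_000_000.0),
        (0, "32934", 20_000_000.0),
    ]
    print(top_by_tag(rows, 2))
```

Because only one value per ASN survives, this picks "the N busiest ASNs by peak rate". It does not produce N full time series. A stacked graph still needs a second query that fetches the series for those ASNs.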

Create InfluxDB Continuous Query where the measurement name is based on tag values

I have a measurement called reading where all the rows are of the form:
time channel host value
2018-03-05T05:38:41.952057914Z "1" "4176433" 3.46
2018-03-05T05:39:26.113880408Z "0" "5222355" 120.23
2018-03-05T05:39:30.013558256Z "1" "5222355" 5.66
2018-03-05T05:40:13.827140492Z "0" "4176433" 3.45
2018-03-05T05:40:17.868363704Z "1" "4176433" 3.42
where channel and host are tags.
Is there a way I can automatically generate a continuous query such that:
The CQ measurement's name is of the form host_channel
Until now I have been creating them one by one, for example:
CREATE CONTINUOUS QUERY "4176433_1" ON database_name
BEGIN
SELECT mean(value) INTO "4176433_1"
FROM reading
WHERE host = '4176433' AND channel = '1'
GROUP BY time(1m)
END
But is there a way to automatically get 1m sampling per host and channel whenever a new host is added to the database? Thanks!
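There is no way of doing this in InfluxDB, for a number of reasons. Encoding tag values in measurement names contradicts InfluxDB's official best practices and is discouraged.
I suggest just going with a single CQ that keeps host and channel as tags:
CREATE CONTINUOUS QUERY reading_aggregator ON database_name
BEGIN
SELECT mean(value)
INTO mean_reading
FROM reading
GROUP BY time(1m), host, channel
END
InfluxQL rejects selecting tags next to an aggregate such as mean(), so the host + '_' + channel expression is dropped here. Because the CQ groups by host and channel, both are written to mean_reading as tags. You can read a single series back with WHERE host = '4176433' AND channel = '1'.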
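The single CQ still yields one 1-minute series per host/channel pair, because GROUP BY host, channel keeps them as separate tag sets. If you want a host_channel label, you can build it on the client when reading. The Python below is a minimal sketch of that grouping, using the sample rows from the question. downsample and series_name are hypothetical helpers. Timestamps are truncated to the minute by string slicing, which assumes the RFC3339 format shown.

```python
from collections import defaultdict

# Sample rows from the question: (time, channel, host, value)
READINGS = [
    ("2018-03-05T05:38:41.952057914Z", "1", "4176433", 3.46),
    ("2018-03-05T05:39:26.113880408Z", "0", "5222355", 120.23),
    ("2018-03-05T05:39:30.013558256Z", "1", "5222355", 5.66),
    ("2018-03-05T05:40:13.827140492Z", "0", "4176433", 3.45),
    ("2018-03-05T05:40:17.868363704Z", "1", "4176433", 3.42),
]


def downsample(rows):
    """Mean of value per (minute, host, channel), like
    GROUP BY time(1m), host, channel."""
    groups = defaultdict(list)
    for ts, channel, host, value in rows:
        minute = ts[:16]  # "YYYY-MM-DDTHH:MM", the start of the 1m window
        groups[(minute, host, channel)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def series_name(host, channel):
    """Client-side label in the host_channel form the question asked for."""
    return f"{host}_{channel}"


if __name__ == "__main__":
    for (minute, host, channel), avg in sorted(downsample(READINGS).items()):
        print(minute, series_name(host, channel), avg)
```

New hosts need no extra work in this model. Their readings simply form new (host, channel) groups, which mirrors why one grouped CQ is preferred over one CQ per series.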

There are no data points when I apply an aggregation function to my field in Grafana

I am using InfluxDB as the backend for Grafana. When I configure my metrics with the following query, everything works:
SELECT "puller.request"
FROM "api_puller"
WHERE "hostname" = 'xxxxxxx'
AND $timeFilter
This generates the following query to InfluxDB:
SELECT "puller.request"
FROM "api_puller"
WHERE "hostname" = 'xxxxxxx'
AND time > now() - 5m
The problem arises when I want to get the mean of my metrics
SELECT mean("puller.request")
FROM "api_puller"
WHERE "hostname" = 'xxxxxxx'
AND $timeFilter
The generated query for this is:
SELECT mean("puller.request")
FROM "api_puller"
WHERE "hostname" = 'xxxxxxx'
AND time > now() - 5m
No data points show up on my dashboard.
I copied that query and executed it directly in InfluxDB, and InfluxDB does return a value. But Grafana still doesn't show any data points.
Is there something wrong with my query, or how should I make the mean function work?
My grafana version is : Grafana v4.4.3 (commit: 54c79c5)
Thanks in advance.
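The question doesn't confirm the cause, but one plausible explanation is that the mean() query has no GROUP BY time(...) clause. In that case InfluxDB returns a single point, timestamped at the start of the queried range, instead of a series, and a line graph has nothing to connect. The Python below is a minimal sketch contrasting the two result shapes. aggregate is a hypothetical helper, and the sample points are made up.

```python
from statistics import mean


def aggregate(points, range_start, interval_s=None):
    """Mimic SELECT mean(...) over a time range.

    Without interval_s: one point, stamped with the range start.
    With interval_s: one point per GROUP BY time(interval_s) window.
    """
    if interval_s is None:
        return [(range_start, mean(v for _, v in points))]
    windows = {}
    for ts, value in points:
        start = ts - (ts % interval_s)  # window start, aligned to the interval
        windows.setdefault(start, []).append(value)
    return [(start, mean(vals)) for start, vals in sorted(windows.items())]


if __name__ == "__main__":
    # Hypothetical samples every 30 s over a 5-minute range starting at t=0.
    points = [(t, float(t)) for t in range(0, 300, 30)]
    print(aggregate(points, range_start=0))                 # a single point
    print(aggregate(points, range_start=0, interval_s=60))  # five points
```

If this is the cause, adding a GROUP BY time(...) clause to the Grafana query should turn the single value into a plottable series.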

Query the most recent timestamp (MAX/Last) for a specific key, in Influx

Using InfluxDB (v1.1), I need to get the timestamp of the last entry for a specific key, regardless of which measurement it is stored in and regardless of its value.
The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.
In pseudo-code, this would be something like:
# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';
The question: What would be the most efficient way of querying this?
The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.
I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.
Unfortunately, there won't be a single query that gets you what you're looking for. You'll have to do a bit of work client side.
The query that you'll want is
SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'
You'll need to run this query for each measurement.
SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'
From there, pick the one with the greatest timestamp:
> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value), time from usage where device_id = 'x';
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
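You can sketch the client-side step in Python: build one last() query per measurement, run them, and keep the measurement with the greatest timestamp. The sketch leaves out running the queries, since no client library is specified. build_last_queries and latest are hypothetical helpers. The field name value and the timestamps come from the CLI output above.

```python
from datetime import datetime, timezone


def build_last_queries(measurements, device_id, field="value"):
    """One InfluxQL statement per measurement, joined with ';' as in the
    CLI example. device_id is interpolated as a string literal for
    illustration only; escape or validate it properly in real code."""
    return "; ".join(
        f"SELECT last({field}), time FROM {m} WHERE device_id = '{device_id}'"
        for m in measurements
    )


def latest(last_times_ns):
    """Return (measurement, timestamp_ns) with the greatest timestamp."""
    return max(last_times_ns.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(build_last_queries(["location", "network", "usage"], "x"))
    # Epoch-nanosecond timestamps from the CLI output above.
    results = {
        "location": 1483640697584904775,
        "network": 1483640714335794796,
        "usage": 1483640783941353064,
    }
    name, ts_ns = latest(results)
    # Integer division keeps whole seconds without float rounding.
    print(name, datetime.fromtimestamp(ts_ns // 10**9, tz=timezone.utc))
```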
tl;dr;
The first() and last() selectors will NOT work consistently if the measurement has multiple fields and some fields have NULL values. The most efficient solution is to use these queries:
First:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
Last:
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
Explanation:
If your measurement has a single field, the suggested solutions will work. If it has more than one field and values can be NULL, the first() and last() selectors won't work consistently and may return a different timestamp for each field. For example, say you have the following data set:
time fieldKey_1 fieldKey_2 device
------------------------------------------------------------
2019-09-16T00:00:01Z NULL A 1
2019-09-16T00:00:02Z X B 1
2019-09-16T00:00:03Z Y C 2
2019-09-16T00:00:04Z Z NULL 2
In this case querying
SELECT first(fieldKey_1) FROM <measurement> WHERE device = '1'
will return
time fieldKey_1
---------------------------------
2019-09-16T00:00:02Z X
and the same query for first(fieldKey_2) will return a different time
time fieldKey_2
---------------------------------
2019-09-16T00:00:01Z A
A similar problem will happen when querying with last.
And in case you are wondering, querying first(*) won't work either, since you'll get an epoch-0 time in the results, such as:
time first_fieldKey_1 first_fieldKey_2
-------------------------------------------------------------
1970-01-01T00:00:00Z X A
So, the solution would be querying using combinations of LIMIT and ORDER BY.
For instance, for the first time value you can use:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
and for the last one you can use
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
It is safe and fast, as it relies on indexes.
Curiously, this simpler approach was mentioned in the thread linked in the opening post but was discarded. Maybe it was simply overlooked.
There is also a thread on the InfluxData blog about the subject that suggests this approach.
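The NULL problem is easy to reproduce outside InfluxDB. The Python below is a minimal sketch using the example data set above, with None standing in for NULL. field_first and field_last imitate per-field first() and last(), which skip NULLs. row_first and row_last imitate LIMIT 1 and ORDER BY time DESC LIMIT 1 on whole rows. All helper names are hypothetical, and rows are assumed to be in ascending time order.

```python
# The example data set: (time, fieldKey_1, fieldKey_2, device); None = NULL.
ROWS = [
    ("2019-09-16T00:00:01Z", None, "A", "1"),
    ("2019-09-16T00:00:02Z", "X", "B", "1"),
    ("2019-09-16T00:00:03Z", "Y", "C", "2"),
    ("2019-09-16T00:00:04Z", "Z", None, "2"),
]
FIELDS = {"fieldKey_1": 1, "fieldKey_2": 2}  # column index of each field


def _device_rows(rows, device):
    # Keep the rows for one device, preserving ascending time order.
    return [r for r in rows if r[3] == device]


def field_first(rows, field, device):
    """Like first(field): earliest row where this field is not NULL."""
    idx = FIELDS[field]
    hits = [r for r in _device_rows(rows, device) if r[idx] is not None]
    return (hits[0][0], hits[0][idx]) if hits else None


def field_last(rows, field, device):
    """Like last(field): latest row where this field is not NULL."""
    idx = FIELDS[field]
    hits = [r for r in _device_rows(rows, device) if r[idx] is not None]
    return (hits[-1][0], hits[-1][idx]) if hits else None


def row_first(rows, device):
    """Like SELECT * ... LIMIT 1: the earliest whole row."""
    hits = _device_rows(rows, device)
    return hits[0] if hits else None


def row_last(rows, device):
    """Like SELECT * ... ORDER BY time DESC LIMIT 1: the latest whole row."""
    hits = _device_rows(rows, device)
    return hits[-1] if hits else None


if __name__ == "__main__":
    print(field_first(ROWS, "fieldKey_1", "1"), field_first(ROWS, "fieldKey_2", "1"))
    print(row_first(ROWS, "1"))
    print(field_last(ROWS, "fieldKey_1", "2"), field_last(ROWS, "fieldKey_2", "2"))
    print(row_last(ROWS, "2"))
```

The per-field selectors disagree on the timestamp for the same device. The whole-row variants always return one row, with NULL fields left as None.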
I tried this, and it worked for me in a single command:
SELECT last(<field name>), time FROM location, network, usage WHERE device_id = 'x'
The result I got :
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
