Create a check. Schedule every 1m offset 8s. CRIT when value is above 20 -- which it is.
Create a notification endpoint: HTTP POST to nc running on http://127.0.0.1:8087 with no authentication.
while true; do cat 201.txt | nc -l -p 8087 -q 1 | dos2unix >> in.txt; echo -e "\n----" >> in.txt ; done
where 201.txt is:
HTTP/1.1 201 Created
Server: netcat
Content-Type: text/plain; charset=UTF-8

Created
Checked working from REST client.
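For anyone who would rather not depend on netcat behaving the same everywhere, a rough Python stand-in for the nc loop above might look like this. It is only a sketch: CaptureHandler and start_capture_server are made-up names, and it keeps request bodies in memory instead of appending them to in.txt.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CaptureHandler(BaseHTTPRequestHandler):
    """Answer every POST with 201 Created and remember the request body."""

    received = []  # request bodies, shared across requests (sketch only)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        CaptureHandler.received.append(body)
        payload = b"Created"
        self.send_response(201)
        self.send_header("Content-Type", "text/plain; charset=UTF-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; bodies are kept in `received`


def start_capture_server(host="127.0.0.1", port=8087):
    """Start the server on a background thread and return it."""
    server = HTTPServer((host, port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_capture_server() from an interactive session and leaving it running gives the notification endpoint something to POST to; whatever arrives ends up in CaptureHandler.received.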
Create a notification rule: run every 2m offset 21s when status is equal to CRIT and message to endpoint.
Nothing happens!
The check has a green tick next to it saying success.
The view history shows a graph with a red horizontal line at the threshold and a blue line for the data. There are two rows in the table below showing runs 90 and 100 minutes ago, and corresponding vertical red lines on the graph.
Suspiciously, when I click through to the task, the red threshold band is not shown on the chart. Clicking to edit shows this definition:
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"
data = from(bucket: "zigbee")
    |> range(start: -1m)
    |> filter(fn: (r) => r["room"] == "Outside")
    |> filter(fn: (r) => r["_field"] == "temperature")
    |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
option task = {name: "Test outside 20", every: 1m, offset: 8s}
check = {_check_id: "09b167b262e86000", _check_name: "Test outside 20", _type: "threshold", tags: {}}
crit = (r) => r["temperature"] > 20.0
messageFn = (r) => "Check: ${ r._check_name } is: ${ r._level }. Value is ${r._value}"
data |> v1["fieldsAsCols"]() |> monitor["check"](data: check, messageFn: messageFn, crit: crit)
(I tried lengthening the aggregate window in case it was too small to get any data points, but it hasn't made a difference.)
The notification endpoint history has two entries, also from 90 and 100 minutes ago (new data points have been received since then), both with an orange triangle in the SENT column.
The notification rule has a green tick next to it and the same two history entries with orange triangles.
What's going on?!
The orange triangles are probably a red herring because no new ones are appearing -- nothing is triggering. Irritatingly, there appears to be no way to find out what the orange triangles mean.
Just checking through it again there was an orange triangle next to the alert, but when I refreshed the page it went back to a green tick. There are no new rows in the history table.
Update
If I edit the task and then save it, the UI goes to a secret Task page that shows logs:
could not execute task run: runtime error #16:33-16:96: check: failed to evaluate map function: 14:71-14:82: interpolated expression produced a null value
That must refer to my message: apparently there is no such thing as ${r._value}?
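A rough sketch of what seems to be happening, written in Python rather than Flux so it can be run anywhere. As far as I can tell, v1.fieldsAsCols() is a pivot on _field, so after it the value lives in a column named after the field (temperature here, which is also what the crit function reads) and _value no longer exists. The helper names fields_as_cols and missing_keys are made up for this illustration.

```python
def fields_as_cols(rows):
    """Pivot rows keyed by _time so each _field becomes its own column."""
    pivoted = {}
    for row in rows:
        out = pivoted.setdefault(row["_time"], {"_time": row["_time"]})
        out[row["_field"]] = row["_value"]  # _value moves under the field name
    return [pivoted[t] for t in sorted(pivoted)]


def missing_keys(template_keys, row):
    """Return the keys a message would interpolate that are absent (null)."""
    return [key for key in template_keys if row.get(key) is None]


raw = [{"_time": 1, "_field": "temperature", "_value": 21.5}]
row = fields_as_cols(raw)[0]
row.update({"_check_name": "Test outside 20", "_level": "crit"})

print(missing_keys(["_check_name", "_level", "_value"], row))       # ['_value']
print(missing_keys(["_check_name", "_level", "temperature"], row))  # []
```

If that reading is right, interpolating r.temperature instead of r._value in messageFn should avoid the null, though I have not verified that against this exact InfluxDB version.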
I have 2 measurements as follows:
metrics,app=app1 cpu=10 1654150510
metrics,app=app1 cpu=12 1654150512
metrics,app=app1 cpu=13 1654150514
metrics,app=app1 cpu=14 1654150516
metrics,app=app1 cpu=15 1654150519
The "metrics" measurement arrives about once every 2-3 seconds.
And the second one is:
http_requests,app=app1 response_time=12 1654150509
http_requests,app=app1 response_time=11 1654150510
http_requests,app=app1 response_time=15 1654150511
http_requests,app=app1 response_time=14 1654150512
http_requests,app=app1 response_time=13 1654150513
http_requests,app=app1 response_time=10 1654150514
http_requests,app=app1 response_time=12 1654150515
http_requests,app=app1 response_time=11 1654150516
http_requests,app=app1 response_time=13 1654150517
http_requests,app=app1 response_time=12 1654150518
The frequency for http_requests is about 1 second.
I want to combine the 2 metrics into a single table.
_time,value_cpu,value_response_time
1654150509,10,12
1654150510,10,11
1654150511,12,15
As the timestamps may differ, is there a way to combine them in Flux? Is fill the way to go? I'm not sure whether timeShift would help here, although I didn't understand it completely. I assume some sort of downsampling is needed (I'm not sure how to do that in Flux either). Is there a way to match the measurements based on the closest time difference?
For example:
if response measurements came at time instances
1654150510,app=app1 response_time=10
1654150513,app=app1 response_time=12
1654150514,app=app1 response_time=11
1654150516,app=app1 response_time=13
and CPU came in at
1654150512,app=app1 cpu=20
1654150515,app=app1 cpu=30
Then resulting table is
_time,response_time,cpu
1654150510,10,
1654150513,12,20
1654150514,11,
1654150516,13,30
The CPU value combines to the point with the closest timestamp (+/- difference)
How can this be achieved in flux in influxdb?
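To pin down what "closest timestamp" means in the example above, here is the matching rule as a small Python sketch. It is not Flux and not a suggestion for how to run it inside InfluxDB. The name attach_nearest is made up, and the tie-breaking rule (a CPU sample equally close to two rows goes to the later one) is an assumption chosen because it reproduces the table above.

```python
def attach_nearest(responses, cpus):
    """Attach each CPU sample to the response row with the closest timestamp.

    responses and cpus are lists of (timestamp, value) tuples.
    Ties go to the later response row (an assumption, see the text above).
    If two CPU samples pick the same row, the later sample overwrites.
    """
    rows = {t: [t, value, None] for t, value in responses}
    for cpu_time, cpu_value in cpus:
        nearest = min(rows, key=lambda t: (abs(t - cpu_time), -t))
        rows[nearest][2] = cpu_value
    return [tuple(rows[t]) for t in sorted(rows)]


responses = [(1654150510, 10), (1654150513, 12), (1654150514, 11), (1654150516, 13)]
cpus = [(1654150512, 20), (1654150515, 30)]

for row in attach_nearest(responses, cpus):
    print(row)
```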
I guess downsampling with aggregateWindow and fill could work.
An alternative is to pivot and then fill missing values using the previous value. The advantage, at least from a performance point of view, is that when there is no record in either measurement at a given time, no new row filled with previous values is created.
With
from(bucket: "stackx")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
    |> drop(columns: ["_measurement"]) // or remove from group other way
    |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> sort(columns: ["_time"], desc: false)
    |> fill(column: "cpu", usePrevious: true)
    |> fill(column: "response_time", usePrevious: true)
the result would be
_time,app,cpu,response_time
2022-06-02T06:15:09Z,app1,,12
2022-06-02T06:15:10Z,app1,10,11
2022-06-02T06:15:11Z,app1,10,15
2022-06-02T06:15:12Z,app1,12,14
2022-06-02T06:15:13Z,app1,12,13
2022-06-02T06:15:14Z,app1,13,10
2022-06-02T06:15:15Z,app1,13,12
2022-06-02T06:15:16Z,app1,14,11
2022-06-02T06:15:17Z,app1,14,13
2022-06-02T06:15:18Z,app1,14,12
2022-06-02T06:15:19Z,app1,15,12
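The same pivot-then-fill idea can be checked outside InfluxDB with a few lines of Python. This is only an illustration of the logic, assuming the sample timestamps are in seconds; pivot_and_fill is a made-up name, not a Flux or client-library function.

```python
from datetime import datetime, timezone


def pivot_and_fill(points):
    """points: (timestamp, field, value) tuples -> rows with forward-filled fields."""
    fields = sorted({field for _, field, _ in points})
    by_time = {}
    for ts, field, value in points:
        by_time.setdefault(ts, {})[field] = value  # pivot: one row per timestamp
    last_seen = dict.fromkeys(fields)  # nothing seen yet, so every field is None
    rows = []
    for ts in sorted(by_time):
        last_seen.update(by_time[ts])  # fill: carry previous values forward
        rows.append((ts, *(last_seen[f] for f in fields)))
    return rows


cpu = [(1654150510, 10), (1654150512, 12), (1654150514, 13), (1654150516, 14), (1654150519, 15)]
rt = [(1654150509 + i, v) for i, v in enumerate([12, 11, 15, 14, 13, 10, 12, 11, 13, 12])]
points = [(t, "cpu", v) for t, v in cpu] + [(t, "response_time", v) for t, v in rt]

rows = pivot_and_fill(points)
for ts, cpu_value, response_time in rows:
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    print(stamp, cpu_value, response_time)
```

Rows are only produced for timestamps that exist in at least one of the two inputs, which is the property described above.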
I am trying to improve the response time of Grafana and InfluxDB.
Hopefully someone can give me some support.
My setup is:
OS: Windows 10
Docker version 20.10.5
Docker-Container: Grafana v7.5.5
Docker-Container: InfluxDB Version 2.0.5
There are several dashboards with different numbers of panels.
One Grafana dashboard, for example, has 24 graph panels
with a refresh rate of 500ms, 1s, or 2s, which should update the graph panels quickly.
I know that a 500ms refresh rate is high, but the same behavior is seen with 1s, 2s, and 5s.
The queries use this scheme:
from(bucket: "bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) =>
        r._measurement == "DEVICE1" and
        r.COMMON == "VALUE1"
    )
    |> set(key: "_field", value: "")
    |> set(key: "COMMON", value: "DEVICE 1")
    |> aggregateWindow(every: v.windowPeriod, fn: last)
What I observe is that some graphs update very slowly.
In Chrome I have observed the network traffic.
Some requests show states like "canceled" or "pending", which seem to slow down the response.
A second observation is that some queries take up to 500 ms to process.
So my main question is: how can I improve the response time when a Grafana dashboard requests a query from InfluxDB?
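A back-of-the-envelope estimate of the load those numbers imply, as a small Python sketch. It assumes one query per panel per refresh and that every query takes the full 500 ms mentioned above, which is the worst case I observed rather than a measured average; dashboard_load is a made-up helper.

```python
def dashboard_load(panels, refresh_s, query_s):
    """Return (queries per second, average queries in flight) for one dashboard."""
    rate = panels / refresh_s    # queries issued per second
    in_flight = rate * query_s   # Little's law: arrival rate * time in system
    return rate, in_flight


for refresh in (0.5, 1, 2, 5):
    rate, in_flight = dashboard_load(panels=24, refresh_s=refresh, query_s=0.5)
    print(f"refresh {refresh}s: {rate:.1f} queries/s, ~{in_flight:.1f} in flight")
```

At a 500 ms refresh this comes to 48 queries per second with about 24 in flight at any moment. If the browser or the server allows fewer concurrent requests than that, the remainder would have to wait, which might be what the "pending" state reflects.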
Update:
Adding these lines before a query enables different statistics (TotalDuration, CompileDuration, QueueDuration, ...) about its execution:
import "profiler"
option profiler.enabledProfilers = ["query", "operator"]
Is it possible to plot these statistics?
Thanks in advance
Source:
https://docs.influxdata.com/flux/v0.x/stdlib/profiler/
Question
I have an InfluxDB v1.8 running on a Raspberry Pi 2.
I have a measurement "gridElectricityMeter" that has (next to other fields not important here) a field "import" which contains the total reading of the energy meter in Watt-hours. The measurement receives a new value every 10 seconds.
I would like to create a Bar chart that shows the imported amount of energy by hour of day in some time frame to be specified (I am using Grafana to create the chart, so the time frame is set there).
E.g. if the raw data in InfluxDB looks like this:
time                  import
2021-01-01T00:00:00Z  0
2021-01-01T01:00:00Z  2
2021-01-01T01:20:00Z  8
2021-01-01T02:00:00Z  10
2021-01-02T00:00:00Z  10
2021-01-02T01:00:00Z  20
(In reality there are of course far more values, one value per 10 seconds. A slight lack of precision, because the timestamps are likely a few seconds off the full hour, is acceptable.)
Then I want the following result:
hour  sum  explanation
0     12   2 (first day) + 10 (second day)
1     8    8 (first day)
2     0    0
3     0    0
...   ...
Data should be filled with zero if unavailable (e.g. querying "today", of course there is no data for the time after "now").
What I have tried so far
As far as I understand, I won't be able to do this with InfluxQL, so here is what I tried using Flux. (I also think I understood that InfluxDB >= 2.0 is 64-bit only and thus won't run on a Raspberry Pi 2.)
Here is what I came up with:
import "date"
import "generate"
// generate a table with 24 entries (hours 0-23) and "sum=0":
initial = generate.from(
    count: 24,
    fn: (n) => n,
    // start and stop are actually irrelevant, but they are required args
    start: 2021-01-01T00:00:00Z,
    stop: 2021-01-02T00:00:00Z,
)
    |> map(fn: (r) => ({hour: r._value, sum: 0}))

// First group data by day and hour
data = from(bucket: "myDatabase")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
    |> map(fn: (r) => ({r with hour: date.hour(t: r._time), group: date.truncate(t: r._time, unit: 1h)}))
    |> group(columns: ["group"])

// In each group we get the first&last point so that we get the first and last reading from the energy meter. The difference between those values is the energy imported.
first = data
    |> first()
last = data
    |> last()

// Calculate energy used in each group and then regroup by hour (and not day):
byhour = join(tables: {first: first, last: last}, on: ["group"])
    |> map(fn: (r) => ({r with hour: r.hour_first, _value: r._value_last - r._value_first}))
    |> group(columns: ["hour"])
    // manual summing because of https://github.com/influxdata/flux/issues/2505
    |> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r._value}), identity: {sum: 0})

// Fill hours we have no data for with zero:
union(tables: [byhour, initial])
    |> group(columns: ["hour"])
    |> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r.sum}), identity: {sum: 0})
    |> group()
    |> sort(columns: ["hour"])
It does seem to work and actually does what I want. However, it seems ridiculously complicated, and it is also slow: querying data for a single day takes about 7 seconds. A single day contains 6 × 60 × 24 = 8640 data points in that measurement, which really does not sound like much to me, so it should not be that slow.
Is there a better way of doing it?
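To make the target computation concrete independently of Flux, here is the same result derived in plain Python from the example rows above. Note that this uses a different technique from the query: instead of joining the first and last reading per hour, it takes the difference between consecutive readings and credits it to the hour in which the interval starts. With one reading every 10 seconds the two approaches should agree closely. import_by_hour is a made-up name.

```python
from datetime import datetime


def import_by_hour(readings):
    """readings: (ISO timestamp, meter total in Wh) tuples, sorted by time.

    Returns a list of 24 sums indexed by hour of day (UTC).
    """
    sums = [0] * 24  # hours without data stay at zero
    parsed = [
        (datetime.fromisoformat(ts.replace("Z", "+00:00")), value)
        for ts, value in readings
    ]
    for (t0, v0), (_, v1) in zip(parsed, parsed[1:]):
        sums[t0.hour] += v1 - v0  # credit the increase to the hour the interval starts in
    return sums


readings = [
    ("2021-01-01T00:00:00Z", 0),
    ("2021-01-01T01:00:00Z", 2),
    ("2021-01-01T01:20:00Z", 8),
    ("2021-01-01T02:00:00Z", 10),
    ("2021-01-02T00:00:00Z", 10),
    ("2021-01-02T01:00:00Z", 20),
]

print(import_by_hour(readings)[:4])  # [12, 8, 0, 0]
```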
We have an InfluxDB 2.0 instance that is quickly growing in size (I suspect Docker metrics), so I would like to find out which measurements are "spamming".
TL;DR: how do we find the measurements that take up the most disk space on InfluxDB 2.0?
Long version:
I know older InfluxDB versions had an _internal database with some relevant metrics. I found _monitoring in 2.0, but mine is more or less empty; it only has a write_errors measurement.
I tried a bunch of Flux queries like this one, but I don't think they give what I'm looking for:
from(bucket: "telegraf")
|> range(start: -15m)
|> filter(fn: (r) => r._measurement == "docker_container_blkio")
|> count()
I also tried InfluxQL queries via the v1 API, like:
SELECT COUNT(system) FROM telegraf
but that gave:
{
    "results": [
        {
            "statement_id": 0
        }
    ]
}
I played a bit with influxd inspect export-index; that would also be useful if it gave me some stats about the measurements.
I saw InfluxDB 1.8 had this: https://docs.influxdata.com/influxdb/v1.8/tools/influx_inspect/#report-disk
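One fallback I can think of, sketched in Python: if the data can be obtained as line protocol text (how it gets exported is left open here), tallying points and bytes per measurement at least shows which measurements dominate. This is only a rough proxy, since line protocol size is not the compressed on-disk size. measurement_of and tally are made-up helper names.

```python
from collections import Counter


def measurement_of(line):
    """Return the measurement name: everything before the first unescaped comma or space."""
    name = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and line[i + 1:i + 2] in (",", " "):
            name.append(line[i + 1])  # escaped comma or space belongs to the name
            i += 2
            continue
        if ch in ", ":
            break
        name.append(ch)
        i += 1
    return "".join(name)


def tally(lines):
    """Count points and UTF-8 bytes of line protocol per measurement."""
    points, size = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = measurement_of(line)
        points[name] += 1
        size[name] += len(line.encode("utf-8"))
    return points, size


sample = [
    "docker_container_blkio,host=a value=1i 1654150510",
    "docker_container_blkio,host=a value=2i 1654150520",
    "system,host=a load1=0.5 1654150510",
]
points, size = tally(sample)
for name, count in points.most_common():
    print(name, count, size[name])
```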