influxdb query for grafana with grouped data - influxdb

I'm using InfluxDB to store some service metrics. These are simple metrics, such as read bytes or active connections. Then, with Grafana, I'm composing some visualizations on top of this.
Displaying something like 'read bytes' is quite simple: it's basically summing up values, grouped by a time interval.
SELECT sum("value") FROM "bytesReceived" WHERE $timeFilter GROUP BY time($__interval) fill(0)
It's the 'active connections' metric that I'm having trouble with. These are TCP sockets connected to a service, where the measurement is the number of connected sockets; this is updated whenever a socket connects or disconnects.
If I had only one instance of the service, this would be easy. I would just do something like
SELECT last("value") FROM "activeConnections" WHERE $timeFilter GROUP BY time($__interval) fill(0)
The thing is that there are multiple instances of the service, which are created dynamically. The measurement is written with the additional tag 'host', that is populated with an id for the runtime service.
So, let's get into the data points.
select * from activeConnections where time > '2018-05-16T16:00:00Z' and time < '2018-05-16T16:10:00Z'
This spits out something like
time host value
---- ---- -----
1526486436041433600 58e5bd04a313 5
1526486438158741000 58e5bd04a313 4
1526486438712713000 58e5bd04a313 3
1526486811218129000 29b39780fd7b 4
So as you can see, we end up with 3 connections on one host and 4 on another. The problem at hand is displaying that data merged as a whole, where that last line should be 7, for example.
I tried grouping data by host
select last(value) from activeConnections where time > '2018-05-16T16:00:00Z' and time < '2018-05-16T16:10:00Z' group by host
which gives me the last value for each group
name: activeConnections
tags: host=29b39780fd7b
time last
---- ----
1526486811218129000 4
name: activeConnections
tags: host=58e5bd04a313
time last
---- ----
1526486706993942700 3
Also tried using a subquery
select * from ( select last(value) from activeConnections where time > '2018-05-16T16:00:00Z' and time < '2018-05-16T16:10:00Z' group by host )
But I get the same problem: I don't know how to group things nicely for Grafana with a time interval.
Does anyone care to comment and help? It would be much appreciated.

OK, I seem to have found a solution. It's a shame that Grafana's query builder doesn't support sub-queries, so the query needs to be entered manually in the raw view. There's an issue open here.
So, what I needed was a way to group all the hosts' values into a single plot line. That can be achieved with the following query:
SELECT sum("value") FROM (SELECT last("value") as "value" FROM "activeConnections" WHERE $timeFilter GROUP BY time($__interval), "host") GROUP BY time($__interval) fill(previous)
I was close before, but failed to notice that in the inner select, if you don't give the selected field an alias, it comes out as "last" by default. So I was trying to sum up "value", but that field didn't exist outside the sub-query.
Hope this helps someone. Thank you, Yuri, for your comment. It pointed me in the right direction.
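To make the intended result concrete, here is a small client-side sketch in Python of "last value per host per interval, then summed across hosts", run on the four sample points from the question. The function name is made up for illustration, and the sketch assumes each host's last known value is carried forward into later intervals (so a host that disappears keeps contributing its last value); whether the InfluxQL query behaves exactly like this depends on where fill(previous) is applied.

```python
from collections import defaultdict

def sum_of_last_per_host(points, interval_ns):
    """points: iterable of (time_ns, host, value). Returns {bucket_start_ns: total}."""
    last_in_bucket = defaultdict(dict)  # bucket start -> {host: last value in bucket}
    for t, host, value in sorted(points):
        bucket = t - t % interval_ns
        last_in_bucket[bucket][host] = value  # later points overwrite earlier ones
    if not last_in_bucket:
        return {}
    totals = {}
    current = {}  # host -> most recent value seen so far (assumed carry-forward)
    first, final = min(last_in_bucket), max(last_in_bucket)
    for bucket in range(first, final + interval_ns, interval_ns):
        current.update(last_in_bucket.get(bucket, {}))
        totals[bucket] = sum(current.values())
    return totals

# The sample points from the question, grouped into 1-minute intervals.
points = [
    (1526486436041433600, "58e5bd04a313", 5),
    (1526486438158741000, "58e5bd04a313", 4),
    (1526486438712713000, "58e5bd04a313", 3),
    (1526486811218129000, "29b39780fd7b", 4),
]
totals = sum_of_last_per_host(points, 60 * 10**9)
```

The last interval in `totals` holds 3 + 4 = 7, which is the merged value the question asks for.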

Related

InfluxDB 1.7.2 - Top X over time

I’m new to InfluxDB. I’m using it to store ntopng timeseries data.
ntopng writes a measurement called asn:traffic that stores how many bytes were sent and received for an ASN.
> show tag keys from "asn:traffic"
name: asn:traffic
tagKey
------
asn
ifid
> show field keys from "asn:traffic"
name: asn:traffic
fieldKey fieldType
-------- ---------
bytes_rcvd float
bytes_sent float
>
I can run a query to see the data rate in bps for a specific ASN:
> SELECT non_negative_derivative(mean("bytes_rcvd"), 1s) * 8 FROM "asn:traffic" WHERE "asn" = '2906' AND time >= now() - 12h GROUP BY time(30s) fill(none)
name: asn:traffic
time non_negative_derivative
---- -----------------------
1550294640000000000 30383200
1550294700000000000 35639600
...
...
...
>
However, what I would like to do is create a query that I can use to return the top N ASNs by data rate and plot that on a Grafana graph. Sort of like this example that is using ELK.
I've tried a few variants from posts here and elsewhere, but I haven't been able to get what I'm after. For example, this query I think gets me closer to where I want to be, but there are no values in asn:
> select top(bps,asn,10) from (SELECT non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
name: asn:traffic
time top asn
---- --- ---
1550299860000000000 853572800
1550301660000000000 1197327200
1550301720000000000 1666883866.6666667
1550310780000000000 674889600
1550329320000000000 20979431866.666668
1550332740000000000 707015600
1550335920000000000 2066646533.3333333
1550336820000000000 618554933.3333334
1550339280000000000 669084933.3333334
1550340300000000000 704147333.3333334
>
I then thought that perhaps the subquery needs to select asn as well; however, that produces an error about mixing queries:
> select top(bps,asn,10) from (SELECT asn, non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
ERR: mixing aggregate and non-aggregate queries is not supported
>
Anyone have any thoughts on a solution?
EDIT 1
Per the suggestion by George Shuklin, modifying the query to include asn in GROUP BY displays ASN in the CLI output, but that doesn't translate in Grafana. I'm expecting a stacked graph with each layer of the stacked graph being one of the top 10 asn results.
Try to make ASN a tag; then you can use group by time(30s), "asn", and that tag will be available in the outer query.
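If the ranking cannot be expressed in a single query, it can also be done client-side once you have one series per ASN (which is what a stacked graph needs anyway). The Python sketch below is only an illustration under assumptions: the function names and sample counters are made up, the rate is computed directly between consecutive samples rather than on 30s means, and "top" is taken to mean highest peak rate.

```python
def non_negative_rate_bps(samples):
    """samples: list of (time_s, cumulative_bytes) sorted by time. Returns [(time_s, bps)]."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        delta = b1 - b0
        if delta < 0 or t1 <= t0:
            continue  # counter reset or bad ordering: skip, like a non-negative derivative
        rates.append((t1, delta * 8 / (t1 - t0)))  # bytes -> bits per second
    return rates

def top_n_by_peak(series_by_asn, n):
    """Keep the n ASNs with the highest peak rate, one rate series per ASN."""
    rates = {asn: non_negative_rate_bps(s) for asn, s in series_by_asn.items()}
    peak = {asn: max((bps for _, bps in r), default=0.0) for asn, r in rates.items()}
    keep = sorted(peak, key=peak.get, reverse=True)[:n]
    return {asn: rates[asn] for asn in keep}

# Hypothetical cumulative byte counters per ASN, sampled every 30 seconds.
series_by_asn = {
    "2906": [(0, 0), (30, 3000), (60, 9000)],
    "15169": [(0, 0), (30, 300), (60, 600)],
    "13335": [(0, 1000), (30, 500), (60, 4250)],  # counter reset at t=30
}
top = top_n_by_peak(series_by_asn, 2)
```

Each key in `top` is one ASN with its own rate series, so each can be drawn as one layer of a stacked graph.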

InfluxDB Group By Field (Not Tag) and Get Top 10

I am very new to InfluxDB.
I have a dataset like this (every row/point is a connection):
time dest_ip source_ip
---- ------- ---------
2018-08-10T11:42:38.848793088Z 211.158.223.252 10.10.10.227
2018-08-10T11:42:38.87115392Z 211.158.223.252 10.10.10.59
2018-08-10T11:42:38.875289088Z 244.181.55.139 10.10.10.59
2018-08-10T11:42:38.880222208Z 138.63.15.221 10.10.10.59
2018-08-10T11:42:38.886027008Z 229.108.28.201 10.10.10.227
2018-08-10T11:42:38.892329728Z 229.108.28.201 10.10.10.181
2018-08-10T11:42:38.896943104Z 229.108.28.201 10.10.10.59
2018-08-10T11:42:38.904005376Z 22.202.67.174 10.10.10.227
2018-08-10T11:42:38.908818688Z 138.63.15.221 10.10.10.181
2018-08-10T11:42:38.913192192Z 138.63.15.221 10.10.10.181
dest_ip and source_ip are fields, not tags.
Is it possible to group by dest_ip all connection records somehow and get top 10 records with counts?
Is it possible to group by dest_ip and source_ip together and get top 10 records with counts too?
Or any other solution to get top 10 source_ip to dest_ip relations according to connection counts?
Currently, InfluxDB only supports tags and time intervals in the GROUP BY clause, as you can see from the syntax of the clause (for more information, refer to the InfluxDB documentation):
SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(<time_interval>),[tag_key]
But if you insert dest_ip and source_ip as tags instead of fields, you can achieve everything you mentioned with InfluxQL.
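If re-writing the data with tags is not an option, the counting can also be done client-side after fetching the raw points. This is only a sketch in Python, using the ten sample rows from the question as (dest_ip, source_ip) pairs; it assumes the result set is small enough to pull into memory.

```python
from collections import Counter

# (dest_ip, source_ip) for each connection row shown in the question.
rows = [
    ("211.158.223.252", "10.10.10.227"),
    ("211.158.223.252", "10.10.10.59"),
    ("244.181.55.139", "10.10.10.59"),
    ("138.63.15.221", "10.10.10.59"),
    ("229.108.28.201", "10.10.10.227"),
    ("229.108.28.201", "10.10.10.181"),
    ("229.108.28.201", "10.10.10.59"),
    ("22.202.67.174", "10.10.10.227"),
    ("138.63.15.221", "10.10.10.181"),
    ("138.63.15.221", "10.10.10.181"),
]

# Connections per destination, and per (source, destination) relation.
by_dest = Counter(dest for dest, _ in rows)
by_pair = Counter((src, dest) for dest, src in rows)

top_dests = by_dest.most_common(10)  # [(dest_ip, count), ...] sorted by count
top_pairs = by_pair.most_common(10)  # [((source_ip, dest_ip), count), ...]
```

`most_common(10)` returns at most ten entries ordered by count, which covers both the per-destination and the per-relation variants of the question.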

Grafana + InfluxDB + telegraf

I'm using Grafana to monitor a network device. As you can see in screen1, I have many interfaces to monitor: 28 physical interfaces plus many virtual ones (VLANs).
The graph shows me all interfaces, but I want to be able to choose an interface from a drop-down list. I found that I can solve this problem with "variables".
I made one variable and I can choose the interface I want, but choosing a custom interface has no effect on the graph.
screen1
My variable:
Variable config
And my db query:
SELECT derivative(mean("ifHCInOctets"), 1s) *8 AS "Input", derivative(mean("ifHCOutOctets"), 1s) *8 AS "Output" FROM "autogen"."interface" WHERE $timeFilter GROUP BY time($__interval), "ifDescr" fill(null)
WHERE "interface" =~ /^$ifDescr$/
Lose the brackets around the query in the Grafana query when you make the dashboard. That should work. That's how I filter host names anyway, so my full query is
SELECT mean("usage_idle") * -1 + 100 FROM "cpu" WHERE "host" =~ /^$Server$/ AND "cpu" = 'cpu-total' AND $timeFilter GROUP BY time($Interval) fill(null)
That should help piece together the query you need. You could just use Grafana's query builder, and just change the where clause to use the regex value for the variable
Query Builder in Grafana
The brackets are right if you are writing a TICK script or querying the database directly from the CLI. Grafana uses slightly different query syntax.
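The ^ and $ anchors in a pattern like /^$Server$/ are what make the filter match only the selected value. A quick way to see the effect is the Python sketch below; the interface names are made up, and Python's re module is not the regex engine InfluxDB uses, although anchors behave the same way for this simple case.

```python
import re

# Hypothetical interface names, as they might appear in the ifDescr tag.
interfaces = ["eth1", "eth10", "eth1.100"]

def matching(selected, names, anchored=True):
    """Return the names a regex built from the selected value would match."""
    body = re.escape(selected)
    pattern = re.compile("^" + body + "$" if anchored else body)
    return [name for name in names if pattern.search(name)]

exact = matching("eth1", interfaces)                  # anchored, like /^$ifDescr$/
loose = matching("eth1", interfaces, anchored=False)  # unanchored, like /$ifDescr/
```

With anchors only the chosen interface is selected; without them every name containing the value would end up in the graph.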

How can you tell if an InfluxDB database contains data?

I'm currently trying to count the number of rows in an InfluxDB, but the following fails.
SELECT count(*) FROM "TempData_Quarantine_1519835017000_1519835137000"..:MEASUREMENT";
with the message
InfluxData API responded with status code=BadRequest, response={"error":"error parsing query: found :, expected ; at line 1, char 73"}
To my understanding this query should be checking all measurements and counting them?
(I inherited this code from someone else, so apologies for not understanding it better)
If you need a yes/no answer to the question of whether an InfluxDB database contains data, then just run
select count(*) from /.*/
If the current retention policy in the current database is empty (contains 0 rows), it will return nothing. Otherwise it will return something like this:
name: api_calls
time count_value
---- -----------
0 5
name: cpu
time count_value
---- -----------
0 1
You can also specify the retention policy explicitly:
SELECT count(*) FROM "TempData_Quarantine_1519835017000_1519835137000"./.*/
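If the check is done from code rather than the CLI, the yes/no answer can be derived from the parsed JSON body of the query response. The sketch below is in Python and assumes the InfluxDB 1.x response shape, where an empty result has no "series" entry; the helper name and the sample responses are made up for illustration.

```python
def has_data(response):
    """response: parsed JSON body of a count(*) query. True if any series has rows."""
    for result in response.get("results", []):
        for series in result.get("series", []):
            if series.get("values"):
                return True
    return False

# Assumed response shapes for an empty and a non-empty retention policy.
empty = {"results": [{"statement_id": 0}]}
non_empty = {
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "cpu",
                    "columns": ["time", "count_value"],
                    "values": [["1970-01-01T00:00:00Z", 1]],
                }
            ],
        }
    ]
}
```

Here `has_data(empty)` is False and `has_data(non_empty)` is True.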

Query the most recent timestamp (MAX/Last) for a specific key, in Influx

Using InfluxDB (v1.1), I have a requirement to get the timestamp of the last entry for a specific key, regardless of which measurement it is stored in and regardless of what the value was.
The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.
In pseudo-code, this would be something like:
# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';
The question: What would be the most efficient way of querying this?
The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.
I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.
Unfortunately, there won't be a single query that gets you what you're looking for. You'll have to do a bit of work client-side.
The query that you'll want is
SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'
You'll need to run this query for each measurement.
SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'
From there, you take the one with the greatest timestamp:
> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value), time from usage where device_id = 'x';
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
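The client-side step is then just a maximum over the per-measurement timestamps. A minimal Python sketch, using the three timestamps from the output above (the function name is made up, and None stands for a measurement that returned no rows for the device):

```python
def latest_measurement(last_points):
    """last_points: {measurement: time_ns or None}. Returns (measurement, time_ns) or None."""
    seen = {name: t for name, t in last_points.items() if t is not None}
    if not seen:
        return None  # the device has no points in any measurement
    name = max(seen, key=seen.get)
    return name, seen[name]

last_points = {
    "location": 1483640697584904775,
    "network": 1483640714335794796,
    "usage": 1483640783941353064,
}
latest = latest_measurement(last_points)
```

For these values `latest` is the usage entry, so 1483640783941353064 is the "last updated on" timestamp for the device.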
tl;dr;
The first() and last() selectors will NOT work consistently if the measurement has multiple fields and the fields can have NULL values. The most efficient solution is to use these queries:
First:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
Last:
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
Explanation:
If you have a single field in your measurement, then the suggested solutions will work, but if you have more than one field and values can be NULL then first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:
time fieldKey_1 fieldKey_2 device
------------------------------------------------------------
2019-09-16T00:00:01Z NULL A 1
2019-09-16T00:00:02Z X B 1
2019-09-16T00:00:03Z Y C 2
2019-09-16T00:00:04Z Z NULL 2
In this case querying
SELECT first(fieldKey_1) FROM <measurement> WHERE device = '1'
will return
time fieldKey_1
---------------------------------
2019-09-16T00:00:02Z X
and the same query for first(fieldKey_2) will return a different time
time fieldKey_2
---------------------------------
2019-09-16T00:00:01Z A
A similar problem will happen when querying with last.
And in case you are wondering, querying first(*) won't do either, since you'll get an 'epoch-0' time in the results, such as:
time first_fieldKey_1 first_fieldKey_2
-------------------------------------------------------------
1970-01-01T00:00:00Z X A
So, the solution would be querying using combinations of LIMIT and ORDER BY.
For instance, for the first time value you can use:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
and for the last one you can use
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
It is safe and fast, as it will rely on indexes.
It is curious that this simpler approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just overlooked.
There is also a thread on the InfluxData blog about the subject that suggests using this approach.
I tried this and it worked for me in a single command:
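The difference between the two approaches can be reproduced outside InfluxDB. The Python sketch below uses the four sample rows from the explanation, with None standing for NULL; the helper names are made up, and the code only mimics the behaviour described here, it does not call InfluxDB.

```python
# (time, fieldKey_1, fieldKey_2, device) rows from the example data set.
rows = [
    ("2019-09-16T00:00:01Z", None, "A", "1"),
    ("2019-09-16T00:00:02Z", "X", "B", "1"),
    ("2019-09-16T00:00:03Z", "Y", "C", "2"),
    ("2019-09-16T00:00:04Z", "Z", None, "2"),
]

def first_non_null(rows, device, field_index):
    """Mimics first(<field>): earliest point where that one field is not NULL."""
    for row in sorted(rows):
        if row[3] == device and row[field_index] is not None:
            return row[0], row[field_index]
    return None

def first_row(rows, device):
    """Mimics SELECT * ... LIMIT 1: earliest point, whatever fields are set."""
    for row in sorted(rows):
        if row[3] == device:
            return row
    return None

per_field_1 = first_non_null(rows, "1", 1)
per_field_2 = first_non_null(rows, "1", 2)
whole_row = first_row(rows, "1")
```

The two per-field results carry different timestamps for the same device, while the row-level lookup returns the single earliest point.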
SELECT last(<field name>), time FROM location, network, usage WHERE device_id = 'x'
The result I got :
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
