I'm currently trying to count the number of rows in an InfluxDB, but the following fails.
SELECT count(*) FROM "TempData_Quarantine_1519835017000_1519835137000"..:MEASUREMENT";
with the message
InfluxData API responded with status code=BadRequest, response={"error":"error parsing query: found :, expected ; at line 1, char 73"}
To my understanding this query should be checking all measurements and counting them?
(I inherited this code from someone else, so apologies for not understanding it better)
If you need a binary answer to the question "tell if a Influx Database contains data?" then just do
select count(*) from /.*/
In case if the current retention policy in the current database is empty (contains 0 rows) it will return just nothing. Otherwise it will return something like this:
name: api_calls
time count_value
---- -----------
0 5
name: cpu
time count_value
---- -----------
0 1
Also you can specify retention policy explicitly:
SELECT count(*) FROM "TempData_Quarantine_1519835017000_1519835137000"./.*/
Related
I am trying to count distinct sessionIds from a measurement. sessionId being a tag, I count the distinct entries in a "parent" query, since distinct() doesn't works on tags.
In the subquery, I use a group by sessionId limit 1 to still benefit from the index (if there is a more efficient technique, I have ears wide open but I'd still like to understand what's going on).
I have those two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1 already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds have multiple occurrences in the final result. Not always, since there is 17549 rows in total, but some are.
This is the sign that the limit 1 is somewhat working, but some sessionId still get multiple entries when re-selected. Maybe some kind of undefined behaviour?
I can confirm that I get the same result.
In my experience using nested queries does not always deliver what you expect/want.
Depending on how you use this you could retrieve a list of all values for a tag with:
SHOW TAG VALUES FROM UserSession WITH KEY=sessionId
Or to get the cardinality (number of distinct values for a tag):
SHOW TAG VALUES EXACT CARDINALITY FROM UserSession WITH KEY=sessionId.
Which will return a single row with a single column count, containing a number. You can remove the EXACT modifier if you don't need to be exact about the result: SHOW TAG VALUES CARDINALITY on Influx Documentation.
I’m new to InfluxDB. I’m using it to store ntopng timeseries data.
ntopng writes a measurement called asn:traffic that stores how many bytes were sent and received for an ASN.
> show tag keys from "asn:traffic"
name: asn:traffic
tagKey
------
asn
ifid
> show field keys from "asn:traffic"
name: asn:traffic
fieldKey fieldType
-------- ---------
bytes_rcvd float
bytes_sent float
>
I can run a query to see the data rate in bps for a specific ASN:
> SELECT non_negative_derivative(mean("bytes_rcvd"), 1s) * 8 FROM "asn:traffic" WHERE "asn" = '2906' AND time >= now() - 12h GROUP BY time(30s) fill(none)
name: asn:traffic
time non_negative_derivative
---- -----------------------
1550294640000000000 30383200
1550294700000000000 35639600
...
...
...
>
However, what I would like to do is create a query that I can use to return the top N ASNs by data rate and plot that on a Grafana graph. Sort of like this example that is using ELK.
I've tried a few variants from posts here and elsewhere, but I haven't been able to get what I'm after. For example, this query I think gets me closer to where I want to be, but there are no values in asn:
> select top(bps,asn,10) from (SELECT non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
name: asn:traffic
time top asn
---- --- ---
1550299860000000000 853572800
1550301660000000000 1197327200
1550301720000000000 1666883866.6666667
1550310780000000000 674889600
1550329320000000000 20979431866.666668
1550332740000000000 707015600
1550335920000000000 2066646533.3333333
1550336820000000000 618554933.3333334
1550339280000000000 669084933.3333334
1550340300000000000 704147333.3333334
>
Thinking then that perhaps the sub query needs to select asn also, however that proceeds an error about mixing queries:
> select top(bps,asn,10) from (SELECT asn, non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
ERR: mixing aggregate and non-aggregate queries is not supported
>
Anyone have any thoughts on a solution?
EDIT 1
Per the suggestion by George Shuklin, modifying the query to include asn in GROUP BY displays ASN in the CLI output, but that doesn't translate in Grafana. I'm expecting a stacked graph with each layer of the stacked graph being one of the top 10 asn results.
Try to make ASN as tag, than you can use group by time(30s), 'asn', and that tag will be available in the outer query.
I am trying to do a query against the INFLUX-DB to get unique values.Below is the query I use,
select host AS Host,(value/100) AS Load from metrics where time > now() - 1h and command='Check_load_current' and value>4000;
The output for the query is,
What I actually want is the unique "Host" values. For example I want "host-1" as a output repeated only once(latest value) eventhough the load values are different.How can I achieve this? Any help would be much helpful.
Q: I want the latest values from each unique "Host", how do I achieve it?
Given the following database:
time host value
---- ---- -----
1529508443000000000 host01 42.72
1529508609000000000 host05 53.94
1529508856000000000 host01 40.37
1529508913000000000 host02 41.02
1529508937000000000 host01 44.49
A: Consider breaking the problem down.
First you can group the "tag values" into their individual buckets using the "Groupby" operation.
Select * from number group by "host"
name: number
tags: host=host01
time value
---- -----
1529508443000000000 42.72
1529508856000000000 40.37
1529508937000000000 44.49
name: number
tags: host=host02
time value
---- -----
1529508913000000000 41.02
name: number
tags: host=host05
time value
---- -----
1529508609000000000 53.94
Next, you will want to order the data in each bucket to be in descending order and then tell influxdb to only return the top 1 row of each bucket.
Hence add the "Order by DESC" and the "limit 1" filter to the first query and it should yield you the desire result.
> select * from number group by "host" order by desc limit 1;
name: number
tags: host=host05
time value
---- -----
1529508609000000000 53.94
name: number
tags: host=host02
time value
---- -----
1529508913000000000 41.02
name: number
tags: host=host01
time value
---- -----
1529508937000000000 44.49
Reference:
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-group-by-clause
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#order-by-time-desc
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-limit-and-slimit-clauses
If you want to get only the latest value for each unique host tag do the following:
SELECT host AS Host, last(value)/100 AS Load
FROM metrics
GROUP BY host
I'm using influxdb to store some service metrics. These are simple metrics, such as read bytes or active connections. Then, with grafana, I'm composing some visualizations on top of this.
Displaying something as 'read bytes' is quite simple, it's basically summing up values, grouped by a time interval.
SELECT sum("value") FROM "bytesReceived" WHERE $timeFilter GROUP BY time($__interval) fill(0)
It's on the 'active connections' that I'm having some trouble figuring out. These are tcp sockets connected to a service, where the measurement is the number of connected sockets; this is updated whenever a socket connects or disconnects.
If I had only one instance of the service, this would be easy, I would just do something like
SELECT last("value") FROM "activeConnections" WHERE $timeFilter GROUP BY time($__interval) fill(0)
The thing is that there are multiple instances of the service, which are created dynamically. The measurement is written with the additional tag 'host', that is populated with an id for the runtime service.
So, let's get into the data points.
select * from activeConnections where time > '2018-05-16T16:00:00Z' and time < '2018-05-16T16:10:00Z'
This spits out something like
time host value
---- ---- -----
1526486436041433600 58e5bd04a313 5
1526486438158741000 58e5bd04a313 4
1526486438712713000 58e5bd04a313 3
1526486811218129000 29b39780fd7b 4
So as you can notice, we end up with 3 connections on one host and 4 on another. The problem in hand is... displaying that data merged as a whole, where that last line should be 7, for example.
I tried grouping data by host
select last(value) from activeConnections where time > '2018-05-16T16:00:00Z' and time < '2018-05-16T16:10:00Z' group by host
which gives me the last value for each group
name: activeConnections
tags: host=29b39780fd7b
time last
---- ----
1526486811218129000 4
name: activeConnections
tags: host=58e5bd04a313
time last
---- ----
1526486706993942700 3
Also tried using a subquery
select * from ( select last(value) from activeConnections where time > '2018-05-16T16:00:00Z' and time < '2018-05-16T16:10:00Z' group by host )
But I get the same problem, where I don't know how to group things nicely for grafana with a time interval.
Does any care to comment and help? Would be much appreciated.
Ok,
I seem to have found a solution. It's a shame that Grafana doesn't support sub-queries, so the query needs to be inserted manually with raw view. There's an issue open here.
So, what I needed was a way to group all the hosts value into a single plot line. That can be achieved with the following query:
SELECT sum("value") FROM (SELECT last("value") as "value" FROM "activeConnections" WHERE $timeFilter GROUP BY time($__interval), "host") GROUP BY time($__interval) fill(previous)
I was close before, but failed to notice that in the inner select, if you don't give a name to the resulting select, it comes out as "last" by default. So I was trying to sum up "value", but the field didn't exist out of the sub-query.
Hope this helps someone. Thank you Yuri, for your comment. It pointed me into the right direction.
Using InfluxDB (v1.1), I have the requirement where I want to get the last entry timestamp for a specific key. Regardless of which measurement this is stored and regardless of which value this was.
The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.
In pseudo-code, this would be something like:
# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';
The question: What would be the most efficient way of querying this?
The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.
I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.
Unfortunately there wont be single query that will get you what you're looking for. You'll have to do a bit of work client side.
The query that you'll want is
SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'
You'll need to run this query for each measurement.
SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'
From there you'll get the one with the greatest time stamp
> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value), time from usage where device_id = 'x';
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
tl;dr;
The first() and last() selectors will NOT work consistently if the measurement have multiple fields, and fields have NULL values. The most efficient solution is to use these queries
First:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
Last:
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
Explanation:
If you have a single field in your measurement, then the suggested solutions will work, but if you have more than one field and values can be NULL then first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:
time fieldKey_1 fieldKey_2 device
------------------------------------------------------------
2019-09-16T00:00:01Z NULL A 1
2019-09-16T00:00:02Z X B 1
2019-09-16T00:00:03Z Y C 2
2019-09-16T00:00:04Z Z NULL 2
In this case querying
SELECT first(fieldKey_1) FROM <measurement> WHERE device = "1"
will return
time fieldKey_1
---------------------------------
2019-09-16T00:00:02Z X
and the same query for first(fieldKey_2) will return a different time
time fieldKey_2
---------------------------------
2019-09-16T00:00:01Z A
A similar problem will happen when querying with last.
And in case you are wondering, it wouldn't do querying 'first(*)' since you'll get an 'epoch-0' time in the results, such as:
time first_fieldKey_1 first_fieldKey_2
-------------------------------------------------------------
1970-01-01T00:00:00Z X A
So, the solution would be querying using combinations of LIMIT and ORDER BY.
For instance, for the first time value you can use:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
and for the last one you can use
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
It is safe and fast as it will relay on indexes.
Is curious to mention that this more simple approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just lost overlooked.
Here there's a thread in InfluxData blogs about the subject also suggesting to use this approach.
I tried this and it worked for me in a single command :
SELECT last(<field name>), time FROM location, network, usage WHERE device_id = 'x'
The result I got :
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4