I run a shared web hosting using CloudLinux.
From it, I can get a bunch of performence metric
So, my influxDB is :
measurement : lve
fields : CPU,EP,IO,IOPS,MEM,MEMPHY,NETI,NETO,NPROC,fEP,fMEM,fMEMPHY,fNPROC,lCPU,lCPUW,lEP,lIO,lIOPS,lMEM,lMEMPHY,lNETI,lNETO,lNPROC,nCPU
tags : xpool, host, user (where : xpool is xen-pool uid, host is hostname of cloudLinux, user is username of shared hosting)
data is gathered each 5 seconds
How is the query sentence to :
Select records from specific xpool+host , and
get 5 unique username that produce TOP CPU usage in 5 minute periode from it ?.
There is hundreds usaername but I want got top-5 only.
Note: Samething like example 4 of TOP() from https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#top, unless that expected results is:
name: h2o_feet
time top location
---- --- --------
2015-08-18T00:00:00Z 8.12 coyote_creek
2015-08-18T00:54:00Z 2.054 santa_monica
Rather than :
name: h2o_feet
time top location
---- --- --------
2015-08-18T00:48:00Z 7.11 coyote_creek
2015-08-18T00:54:00Z 6.982 coyote_creek
2015-08-18T00:54:00Z 2.054 santa_monica
2015-08-18T00:24:00Z 7.635 coyote_creek
2015-08-18T00:30:00Z 7.5 coyote_creek
2015-08-18T00:36:00Z 7.372 coyote_creek
2015-08-18T00:00:00Z 8.12 coyote_creek
2015-08-18T00:06:00Z 8.005 coyote_creek
2015-08-18T00:12:00Z 7.887 coyote_creek
Since '8.12' is the highest value of 'coyote_creek' and '2.054' is the highest value of 'santa_monica'
Sincerely
-bino-
Probably a subquery could help, for example, this is from a database using telegraf:
SELECT top,host FROM (SELECT TOP(usage_user, 1) AS top, host from cpu WHERE time > now() -1m GROUP BY host)
It will output something like:
name: cpu
time top host
---- --- ----
1527489800000000000 1.4937106918238994 1.host.tld
1527489808000000000 0.3933910306845004 2.host.tld
1527489810000000000 4.17981072555205 3.host.tld
1527489810000000000 0.8654602675059009 4.host.tld
The first query is:
SELECT TOP(usage_user, 1) AS top, host from cpu WHERE time > now() -1m GROUP BY host
Is using TOP to get only 1 item and using the field usage_user
Then to "pretty print" A subquery is used:
SELECT top,host FROM (...)
Related
I’m new to InfluxDB. I’m using it to store ntopng timeseries data.
ntopng writes a measurement called asn:traffic that stores how many bytes were sent and received for an ASN.
> show tag keys from "asn:traffic"
name: asn:traffic
tagKey
------
asn
ifid
> show field keys from "asn:traffic"
name: asn:traffic
fieldKey fieldType
-------- ---------
bytes_rcvd float
bytes_sent float
>
I can run a query to see the data rate in bps for a specific ASN:
> SELECT non_negative_derivative(mean("bytes_rcvd"), 1s) * 8 FROM "asn:traffic" WHERE "asn" = '2906' AND time >= now() - 12h GROUP BY time(30s) fill(none)
name: asn:traffic
time non_negative_derivative
---- -----------------------
1550294640000000000 30383200
1550294700000000000 35639600
...
...
...
>
However, what I would like to do is create a query that I can use to return the top N ASNs by data rate and plot that on a Grafana graph. Sort of like this example that is using ELK.
I've tried a few variants from posts here and elsewhere, but I haven't been able to get what I'm after. For example, this query I think gets me closer to where I want to be, but there are no values in asn:
> select top(bps,asn,10) from (SELECT non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
name: asn:traffic
time top asn
---- --- ---
1550299860000000000 853572800
1550301660000000000 1197327200
1550301720000000000 1666883866.6666667
1550310780000000000 674889600
1550329320000000000 20979431866.666668
1550332740000000000 707015600
1550335920000000000 2066646533.3333333
1550336820000000000 618554933.3333334
1550339280000000000 669084933.3333334
1550340300000000000 704147333.3333334
>
Thinking then that perhaps the sub query needs to select asn also, however that proceeds an error about mixing queries:
> select top(bps,asn,10) from (SELECT asn, non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
ERR: mixing aggregate and non-aggregate queries is not supported
>
Anyone have any thoughts on a solution?
EDIT 1
Per the suggestion by George Shuklin, modifying the query to include asn in GROUP BY displays ASN in the CLI output, but that doesn't translate in Grafana. I'm expecting a stacked graph with each layer of the stacked graph being one of the top 10 asn results.
Try to make ASN as tag, than you can use group by time(30s), 'asn', and that tag will be available in the outer query.
I am trying to do a query against the INFLUX-DB to get unique values.Below is the query I use,
select host AS Host,(value/100) AS Load from metrics where time > now() - 1h and command='Check_load_current' and value>4000;
The output for the query is,
What I actually want is the unique "Host" values. For example I want "host-1" as a output repeated only once(latest value) eventhough the load values are different.How can I achieve this? Any help would be much helpful.
Q: I want the latest values from each unique "Host", how do I achieve it?
Given the following database:
time host value
---- ---- -----
1529508443000000000 host01 42.72
1529508609000000000 host05 53.94
1529508856000000000 host01 40.37
1529508913000000000 host02 41.02
1529508937000000000 host01 44.49
A: Consider breaking the problem down.
First you can group the "tag values" into their individual buckets using the "Groupby" operation.
Select * from number group by "host"
name: number
tags: host=host01
time value
---- -----
1529508443000000000 42.72
1529508856000000000 40.37
1529508937000000000 44.49
name: number
tags: host=host02
time value
---- -----
1529508913000000000 41.02
name: number
tags: host=host05
time value
---- -----
1529508609000000000 53.94
Next, you will want to order the data in each bucket to be in descending order and then tell influxdb to only return the top 1 row of each bucket.
Hence add the "Order by DESC" and the "limit 1" filter to the first query and it should yield you the desire result.
> select * from number group by "host" order by desc limit 1;
name: number
tags: host=host05
time value
---- -----
1529508609000000000 53.94
name: number
tags: host=host02
time value
---- -----
1529508913000000000 41.02
name: number
tags: host=host01
time value
---- -----
1529508937000000000 44.49
Reference:
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-group-by-clause
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#order-by-time-desc
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-limit-and-slimit-clauses
If you want to get only the latest value for each unique host tag do the following:
SELECT host AS Host, last(value)/100 AS Load
FROM metrics
GROUP BY host
I'm currently trying to count the number of rows in an InfluxDB, but the following fails.
SELECT count(*) FROM "TempData_Quarantine_1519835017000_1519835137000"..:MEASUREMENT";
with the message
InfluxData API responded with status code=BadRequest, response={"error":"error parsing query: found :, expected ; at line 1, char 73"}
To my understanding this query should be checking all measurements and counting them?
(I inherited this code from someone else, so apologies for not understanding it better)
If you need a binary answer to the question "tell if a Influx Database contains data?" then just do
select count(*) from /.*/
In case if the current retention policy in the current database is empty (contains 0 rows) it will return just nothing. Otherwise it will return something like this:
name: api_calls
time count_value
---- -----------
0 5
name: cpu
time count_value
---- -----------
0 1
Also you can specify retention policy explicitly:
SELECT count(*) FROM "TempData_Quarantine_1519835017000_1519835137000"./.*/
I would like to know if it is possible somehow from the CLI of the influx to select the data of a specific shard. I also would like to select the series within two timestamps but i haven't yet found how. Any input would be appreciated, thank you.
Q: I would like to know if it possible somehow from the CLI of the influx to select the data of a specific shard.
A: At influxdb 1.3 this is not possible. However you should be able to work out what data lives in there.
Query to get the shards information:
show shards
it should tell you the start and end date time of the data (across all series in the database) contained in that shard.
For instance
Given Shard info:
id database retention_policy shard_group start_time end_time expiry_time owners
-- -------- ---------------- ----------- ---------- -------- ----------- ------
123 mydb autogen 123 2012-11-26T00:00:00Z 2012-12-03T00:00:00Z 2012-12-03T00:00:00Z
124 mydb autogen 124 2013-01-14T00:00:00Z 2013-01-21T00:00:00Z 2013-01-21T00:00:00Z
125 mydb autogen 125 2013-04-29T00:00:00Z 2013-05-06T00:00:00Z 2013-05-06T00:00:00Z
Given Measurements:
name: measurements
name
----
measurement_abc
measurement_def
measurement_123
Shard 123 will contain all of the data across the noted measurements above that fall in the start time of 2012-11-26T00:00:00Z and end time of 2012-12-03T00:00:00Z. That is, running a drop shard 123 would see data in that range disappearing across the measurements.
Using InfluxDB (v1.1), I have the requirement where I want to get the last entry timestamp for a specific key. Regardless of which measurement this is stored and regardless of which value this was.
The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.
In pseudo-code, this would be something like:
# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';
The question: What would be the most efficient way of querying this?
The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.
I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.
Unfortunately there wont be single query that will get you what you're looking for. You'll have to do a bit of work client side.
The query that you'll want is
SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'
You'll need to run this query for each measurement.
SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'
From there you'll get the one with the greatest time stamp
> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value), time from usage where device_id = 'x';
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
tl;dr;
The first() and last() selectors will NOT work consistently if the measurement have multiple fields, and fields have NULL values. The most efficient solution is to use these queries
First:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
Last:
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
Explanation:
If you have a single field in your measurement, then the suggested solutions will work, but if you have more than one field and values can be NULL then first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:
time fieldKey_1 fieldKey_2 device
------------------------------------------------------------
2019-09-16T00:00:01Z NULL A 1
2019-09-16T00:00:02Z X B 1
2019-09-16T00:00:03Z Y C 2
2019-09-16T00:00:04Z Z NULL 2
In this case querying
SELECT first(fieldKey_1) FROM <measurement> WHERE device = "1"
will return
time fieldKey_1
---------------------------------
2019-09-16T00:00:02Z X
and the same query for first(fieldKey_2) will return a different time
time fieldKey_2
---------------------------------
2019-09-16T00:00:01Z A
A similar problem will happen when querying with last.
And in case you are wondering, it wouldn't do querying 'first(*)' since you'll get an 'epoch-0' time in the results, such as:
time first_fieldKey_1 first_fieldKey_2
-------------------------------------------------------------
1970-01-01T00:00:00Z X A
So, the solution would be querying using combinations of LIMIT and ORDER BY.
For instance, for the first time value you can use:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
and for the last one you can use
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
It is safe and fast as it will relay on indexes.
Is curious to mention that this more simple approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just lost overlooked.
Here there's a thread in InfluxData blogs about the subject also suggesting to use this approach.
I tried this and it worked for me in a single command :
SELECT last(<field name>), time FROM location, network, usage WHERE device_id = 'x'
The result I got :
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4