I have a measurement gathered by telegraf. It has the following structure:
name: smart_device
fieldKey fieldType
-------- ---------
exit_status integer
health_ok boolean
read_error_rate integer
seek_error_rate integer
temp_c integer
udma_crc_errors integer
When I query this database I can do this:
> select * from smart_device where "health_ok" = true limit 1
name: smart_device
time capacity device enabled exit_status health_ok host model read_error_rate seek_error_rate serial_no temp_c udma_crc_errors wwn
---- -------- ------ ------- ----------- --------- ---- ----- --------------- --------------- --------- ------ --------------- ---
15337409500 2000398934016 sda Enabled 0 true osd21 Hitachi HDS722020ALA330 0 0 JK11A4B8JR2EGW 38 0 5000cca222e6384f
and this:
> select * from smart_device limit 1
name: smart_device
time capacity device enabled exit_status health_ok host model read_error_rate seek_error_rate serial_no temp_c udma_crc_errors wwn
---- -------- ------ ------- ----------- --------- ---- ----- --------------- --------------- --------- ------ --------------- ---
1533046990 sda 0 osd21
But when I try to filter out records with empty health_ok, I get empty output:
> select * from smart_device where "health_ok"!= true
>
How can I select measurements with empty (no? null?) health_ok?
Unfortunately there is currently no way to do this using InfluxQL. InfluxDB is a form of document-oriented database, which means rows of a measurement can have different schemas. Therefore there is no concept of null for a field of a row; such a row simply does not have the field. For example, suppose there are 4 rows in the measurement cost:
> select * from cost
name: cost
time isok type value
---- ---- ---- -----
1533970927859614000 true 1 100
1533970938243629700 true 2 101
1533970949371761100 3 103
1533970961571703900 2 104
As you can see, there are two rows with isok=true and two rows which have no field named isok; so the only way to select the times of the rows that do have the isok field is this query:
> select isok from cost
name: cost
time isok
---- ----
1533970927859614000 true
1533970938243629700 true
Since InfluxQL currently does not support subqueries in the WHERE clause, there is no way to query for rows with no isok field. (If InfluxDB supported this type of query, you could write something like SELECT * FROM cost WHERE time NOT IN (SELECT time FROM (SELECT isok FROM cost)))
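When the results are fetched through a client library, one workaround is to do this filtering client-side; a minimal Python sketch (rows modeled as plain dicts, matching the cost example above — a row without the isok field simply has no such key):

```python
# Rows as a client library might return them; a row that lacks the
# "isok" field simply has no such key.
rows = [
    {"time": 1533970927859614000, "isok": True, "type": 1, "value": 100},
    {"time": 1533970938243629700, "isok": True, "type": 2, "value": 101},
    {"time": 1533970949371761100, "type": 3, "value": 103},
    {"time": 1533970961571703900, "type": 2, "value": 104},
]

# Select rows that do NOT have the "isok" field -- the query that
# InfluxQL itself cannot express.
missing_isok = [r for r in rows if "isok" not in r]
print([r["value"] for r in missing_isok])  # -> [103, 104]
```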
It's not exactly an answer to the original question, but I found a special trick for Kapacitor.
If this query is executed by Kapacitor, it has a special default node which allows adding missing fields/tags with some value.
For the health_ok query it will look like this (tickscript):
var data = stream
|from()
.measurement('smart_device')
|default()
.field('health_ok', FALSE)
This lets you assume that if health_ok is missing, it is FALSE.
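For readers not using Kapacitor, the same defaulting can be reproduced client-side; a minimal Python sketch of what the default node effectively does (rows modeled as plain dicts, purely for illustration):

```python
def apply_default(row, field, default):
    """Mimic Kapacitor's |default() node: add the field if it is missing."""
    if field not in row:
        row = {**row, field: default}
    return row

rows = [
    {"time": 1, "health_ok": True, "temp_c": 38},
    {"time": 2, "temp_c": 41},  # health_ok missing on this row
]
rows = [apply_default(r, "health_ok", False) for r in rows]
print([r["health_ok"] for r in rows])  # -> [True, False]
```

After the defaulting step, a normal `health_ok != true` filter works, because every row now carries the field.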
Related
I cannot query the following measurements from the _internal database of InfluxDB using InfluxQL:
database
write
shard
See results for following commands:
> show databases
name: databases
name
----
_internal
> use _internal
> show measurements
name: measurements
name
----
cq
database
httpd
queryExecutor
runtime
shard
subscriber
tsm1_cache
tsm1_engine
tsm1_filestore
tsm1_wal
write
> select * from database limit 1;
ERR: error parsing query: found DATABASE, expected identifier at line 1, char 15
> select * from write limit 1;
ERR: error parsing query: found WRITE, expected identifier at line 1, char 15
> select * from shard limit 1;
ERR: error parsing query: found SHARD, expected identifier at line 1, char 15
But I can successfully query some other measurements
> select * from queryExecutor limit 1;
name: queryExecutor
time hostname queriesActive queriesExecuted queriesFinished queryDurationNs recoveredPanics
---- -------- ------------- --------------- --------------- --------------- ---------------
1559923260000000000 localhost.localdomain 0 0 0 0 0
How can I query/extract data from the _internal database of InfluxDB across all available measurements?
It's a bit late, but I found a way.
A parsing error occurs when one of the query's identifiers is also an InfluxQL keyword. To successfully query an identifier that is also a keyword, wrap the identifier in double quotes:
select * from "database"
I’m new to InfluxDB. I’m using it to store ntopng timeseries data.
ntopng writes a measurement called asn:traffic that stores how many bytes were sent and received for an ASN.
> show tag keys from "asn:traffic"
name: asn:traffic
tagKey
------
asn
ifid
> show field keys from "asn:traffic"
name: asn:traffic
fieldKey fieldType
-------- ---------
bytes_rcvd float
bytes_sent float
>
I can run a query to see the data rate in bps for a specific ASN:
> SELECT non_negative_derivative(mean("bytes_rcvd"), 1s) * 8 FROM "asn:traffic" WHERE "asn" = '2906' AND time >= now() - 12h GROUP BY time(30s) fill(none)
name: asn:traffic
time non_negative_derivative
---- -----------------------
1550294640000000000 30383200
1550294700000000000 35639600
...
...
...
>
However, what I would like to do is create a query that I can use to return the top N ASNs by data rate and plot that on a Grafana graph. Sort of like this example that is using ELK.
I've tried a few variants from posts here and elsewhere, but I haven't been able to get what I'm after. For example, I think this query gets me closer to where I want to be, but there are no values in asn:
> select top(bps,asn,10) from (SELECT non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
name: asn:traffic
time top asn
---- --- ---
1550299860000000000 853572800
1550301660000000000 1197327200
1550301720000000000 1666883866.6666667
1550310780000000000 674889600
1550329320000000000 20979431866.666668
1550332740000000000 707015600
1550335920000000000 2066646533.3333333
1550336820000000000 618554933.3333334
1550339280000000000 669084933.3333334
1550340300000000000 704147333.3333334
>
Thinking then that perhaps the subquery needs to select asn also; however, that produces an error about mixing queries:
> select top(bps,asn,10) from (SELECT asn, non_negative_derivative(mean(bytes_rcvd), 1s) * 8 as bps FROM "asn:traffic" WHERE time >= now() - 12h GROUP BY time(30s) fill(none))
ERR: mixing aggregate and non-aggregate queries is not supported
>
Anyone have any thoughts on a solution?
EDIT 1
Per the suggestion by George Shuklin, modifying the query to include asn in GROUP BY displays the ASN in the CLI output, but that doesn't translate to Grafana. I'm expecting a stacked graph with each layer of the stacked graph being one of the top 10 asn results.
Try making ASN a tag; then you can use GROUP BY time(30s), "asn", and that tag will be available in the outer query.
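Once asn survives into the outer query, the per-bucket top-N itself is just a sort-and-slice; a minimal Python sketch with made-up (asn, bps) samples for a single 30s bucket (the ASN numbers and rates below are illustrative, not real asn:traffic data):

```python
# Synthetic (asn, bps) samples for one 30s time bucket -- illustrative only.
bucket = [
    ("2906", 30_383_200),
    ("15169", 35_639_600),
    ("13335", 12_000_000),
    ("32934", 28_500_000),
]

# Top 3 ASNs by rate, descending -- the selection TOP(bps, asn, N)
# performs per bucket once "asn" is available as a tag.
top3 = sorted(bucket, key=lambda x: x[1], reverse=True)[:3]
print([asn for asn, _ in top3])  # -> ['15169', '2906', '32934']
```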
I am trying to run a query against InfluxDB to get unique values. Below is the query I use:
select host AS Host,(value/100) AS Load from metrics where time > now() - 1h and command='Check_load_current' and value>4000;
The output for the query is,
What I actually want is the unique "Host" values. For example, I want "host-1" as output, repeated only once (latest value), even though the load values are different. How can I achieve this? Any help would be much appreciated.
Q: I want the latest values from each unique "Host", how do I achieve it?
Given the following database:
time host value
---- ---- -----
1529508443000000000 host01 42.72
1529508609000000000 host05 53.94
1529508856000000000 host01 40.37
1529508913000000000 host02 41.02
1529508937000000000 host01 44.49
A: Consider breaking the problem down.
First you can group the "tag values" into their individual buckets using the "Groupby" operation.
Select * from number group by "host"
name: number
tags: host=host01
time value
---- -----
1529508443000000000 42.72
1529508856000000000 40.37
1529508937000000000 44.49
name: number
tags: host=host02
time value
---- -----
1529508913000000000 41.02
name: number
tags: host=host05
time value
---- -----
1529508609000000000 53.94
Next, you will want to order the data in each bucket in descending order and tell InfluxDB to return only the top row of each bucket.
Hence, add ORDER BY DESC and LIMIT 1 to the first query and it should yield the desired result.
> select * from number group by "host" order by desc limit 1;
name: number
tags: host=host05
time value
---- -----
1529508609000000000 53.94
name: number
tags: host=host02
time value
---- -----
1529508913000000000 41.02
name: number
tags: host=host01
time value
---- -----
1529508937000000000 44.49
Reference:
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-group-by-clause
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#order-by-time-desc
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-limit-and-slimit-clauses
If you want to get only the latest value for each unique host tag do the following:
SELECT host AS Host, last(value)/100 AS Load
FROM metrics
GROUP BY host
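The grouping logic behind GROUP BY host with last(value) can be sketched in Python, using the rows from the example table above:

```python
# (time, host, value) rows from the example table above.
rows = [
    (1529508443000000000, "host01", 42.72),
    (1529508609000000000, "host05", 53.94),
    (1529508856000000000, "host01", 40.37),
    (1529508913000000000, "host02", 41.02),
    (1529508937000000000, "host01", 44.49),
]

# Keep only the newest row per host -- what GROUP BY host + last(value) returns.
latest = {}
for t, host, value in rows:
    if host not in latest or t > latest[host][0]:
        latest[host] = (t, value)

print({h: v for h, (_, v) in sorted(latest.items())})
# -> {'host01': 44.49, 'host02': 41.02, 'host05': 53.94}
```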
I run shared web hosting using CloudLinux.
From it, I can get a bunch of performance metrics.
So, my influxDB is :
measurement : lve
fields : CPU,EP,IO,IOPS,MEM,MEMPHY,NETI,NETO,NPROC,fEP,fMEM,fMEMPHY,fNPROC,lCPU,lCPUW,lEP,lIO,lIOPS,lMEM,lMEMPHY,lNETI,lNETO,lNPROC,nCPU
tags : xpool, host, user (where : xpool is xen-pool uid, host is hostname of cloudLinux, user is username of shared hosting)
Data is gathered every 5 seconds.
What would the query be to:
select records for a specific xpool + host, and
get the 5 unique usernames that produce the top CPU usage in a 5-minute period?
There are hundreds of usernames, but I only want the top 5.
Note: Something like example 4 of TOP() from https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#top, except that the expected result is:
name: h2o_feet
time top location
---- --- --------
2015-08-18T00:00:00Z 8.12 coyote_creek
2015-08-18T00:54:00Z 2.054 santa_monica
Rather than :
name: h2o_feet
time top location
---- --- --------
2015-08-18T00:48:00Z 7.11 coyote_creek
2015-08-18T00:54:00Z 6.982 coyote_creek
2015-08-18T00:54:00Z 2.054 santa_monica
2015-08-18T00:24:00Z 7.635 coyote_creek
2015-08-18T00:30:00Z 7.5 coyote_creek
2015-08-18T00:36:00Z 7.372 coyote_creek
2015-08-18T00:00:00Z 8.12 coyote_creek
2015-08-18T00:06:00Z 8.005 coyote_creek
2015-08-18T00:12:00Z 7.887 coyote_creek
Since '8.12' is the highest value of 'coyote_creek' and '2.054' is the highest value of 'santa_monica'
Sincerely
-bino-
Probably a subquery could help, for example, this is from a database using telegraf:
SELECT top,host FROM (SELECT TOP(usage_user, 1) AS top, host from cpu WHERE time > now() -1m GROUP BY host)
It will output something like:
name: cpu
time top host
---- --- ----
1527489800000000000 1.4937106918238994 1.host.tld
1527489808000000000 0.3933910306845004 2.host.tld
1527489810000000000 4.17981072555205 3.host.tld
1527489810000000000 0.8654602675059009 4.host.tld
The first query is:
SELECT TOP(usage_user, 1) AS top, host from cpu WHERE time > now() -1m GROUP BY host
It uses TOP to return only one item per host, based on the field usage_user.
Then, to "pretty print" it, a subquery is used:
SELECT top,host FROM (...)
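Conceptually, TOP(usage_user, 1) with GROUP BY host is just a per-group maximum; a minimal Python sketch with synthetic cpu samples (the host names and values are made up for illustration):

```python
# Synthetic (host, usage_user) samples -- several readings per host.
cpu = [
    ("1.host.tld", 1.49), ("1.host.tld", 0.80),
    ("2.host.tld", 0.39), ("3.host.tld", 4.18),
    ("3.host.tld", 2.10), ("4.host.tld", 0.87),
]

# TOP(usage_user, 1) ... GROUP BY host == the maximum usage per host.
per_host_max = {}
for host, usage in cpu:
    per_host_max[host] = max(usage, per_host_max.get(host, float("-inf")))

print(sorted(per_host_max.items()))
```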
I'm new to InfluxDB and I want to implement a retention policy (RP) for my logs.
I loaded static data using telegraf and created an RP for it:
CREATE DATABASE test WITH DURATION 60m
but it is not deleting my previous logs.
I have observed that InfluxDB stores data in UTC format whereas my telegraf server uses system time. Could that be an issue?
I would check two things using the Influx CLI. First, check the retention policies on your DB.
> SHOW RETENTION POLICIES
name duration shardGroupDuration replicaN default
---- -------- ------------------ -------- -------
autogen 1h0m0s 1h0m0s 1 true
For example, I can see my autogen policy has a duration of 1 hour and a shardGroupDuration of 1 hour.
Second, check the shards.
> SHOW SHARDS
name: tester
id database retention_policy shard_group start_time end_time expiry_time owners
-- -------- ---------------- ----------- ---------- -------- ----------- ------
130 tester autogen 130 2018-02-20T21:00:00Z 2018-02-20T22:00:00Z 2018-02-20T23:00:00Z
131 tester autogen 131 2018-02-20T22:00:00Z 2018-02-20T23:00:00Z 2018-02-21T00:00:00Z
132 tester autogen 132 2018-02-20T23:00:00Z 2018-02-21T00:00:00Z 2018-02-21T01:00:00Z
Data is removed only when a shard's expiry_time has passed; as the table shows, expiry_time is the shard group's end_time plus the RP duration, so points can persist for up to one shardGroupDuration beyond the retention policy's duration.
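The expiry arithmetic is easy to check against the SHOW SHARDS output above; a small Python sketch using the boundaries of shard 130:

```python
from datetime import datetime, timedelta

# Shard group boundaries taken from the SHOW SHARDS example above.
start = datetime(2018, 2, 20, 21, 0)
shard_group_duration = timedelta(hours=1)
rp_duration = timedelta(hours=1)  # the "WITH DURATION 60m" retention policy

end = start + shard_group_duration
# A shard is dropped only once its whole group is older than the RP
# duration, so expiry_time = end_time + RP duration.
expiry = end + rp_duration
print(expiry.isoformat())  # -> '2018-02-20T23:00:00'
```

This is why points written just after a shard group opens are kept for nearly twice the RP duration before the whole shard is dropped.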