InfluxDB: High cardinality for specific shards

I'm querying data from different shards and used EXPLAIN to check how many series are fetched for each date range.
> SHOW SHARDS
.
.
658 mydb autogen 658 2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen 676 2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
Running EXPLAIN for a date range in shard 658 gives the expected number of series. SensorId is the only tag key, and since the date range falls into a single shard, it reports NUMBER OF SERIES: 1.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
But when I run the same query on a date range that falls into shard 676, the number of series is 13140 instead of 1.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00';
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642
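To check that each query really touches only one shard, the following Python sketch tests which shard's time range a query range overlaps. It assumes each shard covers the half-open interval [start, end) in UTC and hard-codes the two shards from the SHOW SHARDS output; `shards_for_range` is a hypothetical helper, not part of any InfluxDB client.

```python
from datetime import datetime, timezone
from typing import List

# Shard time ranges copied from the SHOW SHARDS output: (id, start, end).
# Assumption: each shard covers the half-open interval [start, end) in UTC.
SHARDS = [
    (658, "2019-07-22T00:00:00Z", "2019-07-29T00:00:00Z"),
    (676, "2019-07-29T00:00:00Z", "2019-08-05T00:00:00Z"),
]


def parse_ts(s: str) -> datetime:
    """Parse '2019-07-22T00:00:00Z' or '2019-07-27 00:00:00' as a UTC datetime."""
    dt = datetime.fromisoformat(s.replace("Z", "+00:00").replace(" ", "T"))
    # Timestamps without an offset are treated as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def shards_for_range(lo: str, hi: str, hi_inclusive: bool = True) -> List[int]:
    """Return IDs of shards whose [start, end) range overlaps the query's time range.

    The lower bound models `time >= lo`; `hi_inclusive=True` models `time <= hi`,
    False models `time < hi`.
    """
    lo_t, hi_t = parse_ts(lo), parse_ts(hi)
    hits = []
    for shard_id, start, end in SHARDS:
        s, e = parse_ts(start), parse_ts(end)
        reaches_shard = hi_t >= s if hi_inclusive else hi_t > s
        if lo_t < e and reaches_shard:
            hits.append(shard_id)
    return hits


if __name__ == "__main__":
    # First EXPLAIN: time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00'
    print(shards_for_range("2019-07-27 00:00:00", "2019-07-28 00:00:00"))  # [658]
    # Second EXPLAIN: time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00'
    print(shards_for_range("2019-07-29 00:00:00", "2019-07-30 00:00:00", hi_inclusive=False))  # [676]
```

This only confirms that each query hits a single shard; the series count that EXPLAIN reports depends on that shard's index and is not modeled here.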
Environment info:
System info: Linux 4.4.0-1087-aws x86_64
InfluxDB version: InfluxDB v1.7.6 (git: 1.7 01c8dd4)
Update - 1
While checking field key cardinality, I observed a spike in RAM usage.
> SHOW FIELD KEY CARDINALITY
Update - 2
I've rebuilt the indexes, but the cardinality is still high.
Update - 3
I found that the shard has "SensorId" as both a tag and a field, which causes the high cardinality when querying with the "SensorId" filter.
> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time count
---- -----
1970-01-01T00:00:00Z 40
But when I check the tag values for key "SensorId", the empty value that appears in the above query is not listed.
> show tag values with key = "SensorId"
name: Reading
key value
--- -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
Update - 4
I inspected the data using influx_inspect dumptsm and confirmed that points with a null (missing) SensorId tag are present:
$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm
Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
2 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 108 275 Reading 2001
3 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 383 248 Reading 2002
4 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 631 278 Reading 2003
5 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 909 278 Reading 2004
6 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1187 184 Reading 2005
7 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1371 103 Reading 2006
8 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1474 250 Reading 2007
9 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1724 103 Reading 2008
10 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1827 275 Reading 2012
11 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2102 416 Reading 2101
12 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2518 103 Reading 2692
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
15 2019-07-29T05:31:26Z 2019-07-29T11:03:54Z 4291 1467 Reading,SensorId=10034 2005
16 2019-07-29T11:04:14Z 2019-07-29T17:10:16Z 5758 1785 Reading,SensorId=10034 2005
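For a larger dump, the index rows can be grouped by series key to list the series that lack the SensorId tag and the series that store SensorId as a field. The Python sketch below is a hypothetical helper, not an InfluxDB tool: it assumes the whitespace-separated column layout shown above and series keys without escaped spaces, and it uses a few rows copied from that output as sample input.

```python
from collections import defaultdict
from typing import Dict, List, Set

# A few rows copied from the dumptsm -index output above.
SAMPLE = """Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
"""


def series_fields(dump: str) -> Dict[str, Set[str]]:
    """Map each series key in `dumptsm -index` text to the field keys stored under it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in dump.splitlines():
        parts = line.split()
        # Data rows: Pos, Min Time, Max Time, Ofs, Size, Key, Field (7 columns,
        # assuming the series key contains no escaped spaces).
        if len(parts) != 7 or not parts[0].isdigit():
            continue
        index[parts[5]].add(parts[6])
    return dict(index)


def series_without_tag(index: Dict[str, Set[str]], tag: str) -> List[str]:
    """Series keys with no `tag` at all, e.g. 'Reading' rather than 'Reading,SensorId=...'."""
    return sorted(k for k in index if f",{tag}=" not in k)


def series_with_field(index: Dict[str, Set[str]], field: str) -> List[str]:
    """Series keys that store `field` as a field key."""
    return sorted(k for k, fields in index.items() if field in fields)


if __name__ == "__main__":
    idx = series_fields(SAMPLE)
    print(series_without_tag(idx, "SensorId"))  # ['Reading']
    print(series_with_field(idx, "SensorId"))   # ['Reading']
```

To use it on a real shard, save the dumptsm -index output to a text file and pass its contents to `series_fields()`.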

Related

INFLUXDB : SELECT DATA FROM MEASUREMENT ORDERED BY TIME WHERE TIME = MAX(TIME)

I have just started with Influx and need help getting the data sorted by timestamp and retrieving only the latest data.
select DB,AREA,sptotal,spfree,pctfree from ORA_SIZE GROUP BY DB order by time ;
name: ORA_SIZE
tags: DB=DB43B
time DB AREA sptotal spfree pctfree
---- -- ---- ------- ------ -------
1587919100011225116 DB43B DATA 442 303 68
1587919100011225116 DB43B SYSTEM 40 35 87
1587919088732608975 DB43B DATA 442 303 68
1587919088732608975 DB43B SYSTEM 40 35 87
Here, I want to retrieve only
1587919088732608975 DB43B DATA 442 303 68
1587919088732608975 DB43B SYSTEM 40 35 87
as they are the latest data based on time.
Is there a query for this? Please advise.
Thanks,

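You can use the LAST() function to get the latest value of a field. LAST() takes a single argument (a field key, a regular expression, or *) and works on fields, not tags, so it cannot be called on several columns at once or on a tag such as DB.
Example:
SELECT LAST(*) FROM ORA_SIZE [WHERE_clause] [GROUP_BY_clause]
Grouping by your tag keys (for example GROUP BY "DB") returns the latest values for each tag combination.
See the InfluxQL functions documentation for LAST() for details.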

Group data by consecutive points in Influxdb

Suppose I have this data:
time value
---- ----
0 28
1 27
2 26
3 25
4 26
5 27
I want to get values greater than 25 separated by consecutive points, as follow:
Group1
time value
---- ----
0 28
1 27
2 26
Group2
time value
---- ----
4 26
5 27
Is there any way to do that with one query, or should I post-process the data?
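If post-processing is acceptable, the grouping is a single pass over the time-ordered points. The Python sketch below assumes integer timestamps spaced `step` apart (1 in the example); `consecutive_groups` is a hypothetical name. A run ends when a value fails the filter or when the time gap to the previous kept point exceeds `step`.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (time, value)


def consecutive_groups(points: List[Point], threshold: float, step: int = 1) -> List[List[Point]]:
    """Split points whose value exceeds `threshold` into runs of consecutive points.

    A run ends when a value fails the filter or when the gap to the previous
    kept point is larger than `step` (e.g. missing samples).
    """
    groups: List[List[Point]] = []
    current: List[Point] = []
    for t, v in sorted(points):
        if v <= threshold:
            # Value fails the filter: close the open run, if any.
            if current:
                groups.append(current)
                current = []
            continue
        if current and t - current[-1][0] > step:
            # Time gap: close the open run before starting a new one.
            groups.append(current)
            current = []
        current.append((t, v))
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    data = [(0, 28), (1, 27), (2, 26), (3, 25), (4, 26), (5, 27)]
    for i, group in enumerate(consecutive_groups(data, 25), start=1):
        print(f"Group{i}", group)
```

With the sample data this yields the two groups from the question: times 0 to 2, then times 4 to 5.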

how to get column names in table in sqlplus 12c

I am executing the following command with these results:
SQL> select * from employee;
12 sachin 48000 23
13 raja 49000 23
35 vikas 40000 26
11 sau 22000 24
23 viru 40000 26
87 raju 4500
I would also like to get the name of the column. How may I do this?

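Try SET HEADING ON, which turns column headings on (SET HEADING OFF turns them off). Note that SET PAGESIZE 0 also suppresses headings, so if your page size is 0, set it to a nonzero value, e.g. SET PAGESIZE 50.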

Tableau running count reset

I have a list of sporting matches by time with result and margin. I want Tableau to keep a running count of the number of matches since the last x (say, since the last draw, where margin = 0).
This means that on every record the running count increases by one unless that match is a draw, in which case it drops back to zero.
I have not found a method of achieving this. The only way I can see to restart counts is via dates (e.g. a new year).
As an aside, I can easily achieve this by creating a running count tally outside of Tableau.
However, Tableau then does not handle this well when there is more than one result on the same day.
For example, if the structure is:
GameID Date Margin Running count
...
48 01-01-15 54 122
49 08-01-15 12 123
50 08-01-15 0 124
51 08-01-15 17 0
52 08-01-15 23 1
53 15-01-15 9 2
...
Then when trying to plot running count against date, Tableau rearranges the data to show:
GameID Date Margin Running count
...
48 01-01-15 54 122
51 08-01-15 17 0
52 08-01-15 23 1
49 08-01-15 12 123
50 08-01-15 0 124
53 15-01-15 9 2
...
I assume it is doing this because by default it sorts the running count data in ascending order when dates are identical.
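Since the tally is computed outside Tableau anyway, the ordering problem comes down to a tie-breaker: within a date, order by GameID. The Python sketch below assumes that convention and reproduces the running count in the first table above, where the draw row itself keeps the incremented value and the following match shows 0. `Match` and `running_count_since_draw` are hypothetical names, the dates are assumed to be DD-MM-YY and are rewritten as ISO strings, and `start=122` stands in for the elided earlier rows.

```python
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    game_id: int
    date: str    # ISO 'YYYY-MM-DD', so string order equals date order
    margin: int


def running_count_since_draw(matches: List[Match], start: int = 0) -> List[Tuple[int, int]]:
    """Return (game_id, count) pairs, where count is the number of earlier matches
    since the last draw. A draw (margin == 0) resets the count for the next match.

    Sorting by (date, game_id) keeps several same-day matches in GameID order.
    """
    result = []
    count = start
    for m in sorted(matches, key=lambda x: (x.date, x.game_id)):
        result.append((m.game_id, count))
        count = 0 if m.margin == 0 else count + 1
    return result


if __name__ == "__main__":
    # Rows 48-53 from the example (dates assumed DD-MM-YY, rewritten as ISO).
    games = [
        Match(48, "2015-01-01", 54),
        Match(49, "2015-01-08", 12),
        Match(50, "2015-01-08", 0),
        Match(51, "2015-01-08", 17),
        Match(52, "2015-01-08", 23),
        Match(53, "2015-01-15", 9),
    ]
    for game_id, count in running_count_since_draw(games, start=122):
        print(game_id, count)
```

Because the sort key includes GameID, the input order does not matter: shuffling the rows produces the same counts.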

Fill blank variable with subsequent values

I have a dataset structured like below;
id contracthours13 contracthours14 contracthours13u contracthours14u
12 . 13 . 13
13 30 30 . .
14 . . 15 16
15 . 5 6 7
If contracthours13 is missing, I want the value in contracthours14 to move across. If that is also missing, I want contracthours13u to move across, and then contracthours14u if the previous three are all missing. I know this is fairly simple syntax, but I can't get my head around how to do it without having to run simpler syntax three times. Any help would be greatly appreciated.
Edit: below is what I would like my dataset to look like afterwards.
id contracthours13
12 13
13 30
14 15
15 5

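Look up VECTOR / LOOP examples. In the code below the variables are abbreviated to CH13, CH14, CH13U and CH14U, and -1 marks a missing value in the inline data.
* Read the example data; -1 stands in for a missing value.
DATA LIST FREE / ID CH13 CH14 CH13U CH14U.
BEGIN DATA.
1 -1 13 -1 -1
2 30 30 -1 -1
3 -1 -1 15 16
4 -1 5 6 7
END DATA.
DATASET NAME DSRaw.
* Convert the -1 placeholders to system-missing.
RECODE ALL (-1=SYSMIS).
* While CH13 is missing, copy in CH14, CH13U and CH14U in turn.
* Stop as soon as the copied value is valid.
VECTOR V= CH14 TO CH14U.
LOOP #i = 1 TO 3 IF (NVALID(CH13)=0).
COMPUTE CH13=V(#i).
END LOOP IF NVALID(V(#i))=1.
LIST.
EXE.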
**List**
ID CH13 CH14 CH13U CH14U
1.00 13.00 13.00 . .
2.00 30.00 30.00 . .
3.00 15.00 . 15.00 16.00
4.00 5.00 5.00 6.00 7.00
Number of cases read: 4 Number of cases listed: 4
