InfluxDB: Wrong start time when grouping by more than 1 day

I'm stuck on an issue with GROUP BY when the time interval is more than 1 day: the first bucket has the wrong start time whenever the grain is longer than 1 day.
Grain = 1day
ExpectedStartTime=2019-01-01
ActualStartTime=2019-01-01
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(1d), "host" LIMIT 2;
time mean_messages
---- ----
2019-01-01T00:00:00Z 181232
2019-01-02T00:00:00Z 179728
Grain = 2day
ExpectedStartTime=2019-01-01
ActualStartTime=2018-12-31
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(2d), "host" LIMIT 2;
time mean_messages
---- ----
2018-12-31T00:00:00Z 181232
2019-01-02T00:00:00Z 347824
Grain = 5day
ExpectedStartTime=2019-01-01
ActualStartTime=2018-12-30
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(5d), "host" LIMIT 2;
time mean_messages
---- ----
2018-12-30T00:00:00Z 529056
2019-01-04T00:00:00Z 826694.3999999762
I read in the documentation that InfluxDB uses preset time boundaries, but it doesn't say how those boundaries are calculated. Are they based on the start of a month, the start of a week, the time the first data was received, or the time the shard started?
If I knew how these preset time boundaries are calculated, I could specify an offset in the GROUP BY clause to make the first bucket start at 2019-01-01.

InfluxDB calculates the preset time boundaries relative to the Unix epoch (1970-01-01T00:00:00Z): GROUP BY time() buckets are aligned to whole multiples of the interval counted from the epoch.
To make the first bucket start at the query's start time, pass an offset as the second argument to time().
Here is a simple Python function that calculates that offset from the start time and the GROUP BY interval (in minutes):
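import datetime

def get_offset(start_dt, interval_m):
    # Seconds between the Unix epoch and start_dt, modulo the interval length in seconds.
    epoch = datetime.datetime(1970, 1, 1)  # naive UTC, same as start_dt
    offset = (start_dt - epoch).total_seconds() % (interval_m * 60)
    return int(offset)

start_dt = datetime.datetime(2019, 1, 1, 0, 0)  # naive datetime in UTC
interval_m = 1440 * 3  # 3 days, in minutes
offset_s = get_offset(start_dt, interval_m)  # 172800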
With a 3-day GROUP BY interval, the query with the offset looks like this:
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(3d, 172800s), "host" LIMIT 2;
time mean_messages
---- ----
2019-01-01T00:00:00+05:30 539232
2019-01-04T00:00:00+05:30 464640
https://github.com/influxdata/influxdb/issues/8010
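To check the epoch-based explanation against the outputs in the question, here is a minimal sketch of the bucket arithmetic. It assumes buckets start at `epoch + offset + n * interval` for integer n; `bucket_start` is a hypothetical helper for illustration, not part of InfluxDB or any client library. Under that assumption it reproduces the 2018-12-31 and 2018-12-30 start times seen for the 2d and 5d queries.

```python
from datetime import datetime, timedelta, timezone

# Assumed model: GROUP BY time() buckets are aligned to the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, interval, offset=timedelta(0)):
    """Return the start of the bucket containing ts.

    Buckets are [EPOCH + offset + n*interval, EPOCH + offset + (n+1)*interval)
    for integer n. Hypothetical helper for illustration only.
    """
    n = (ts - EPOCH - offset) // interval  # timedelta // timedelta -> int (floor)
    return EPOCH + offset + n * interval


start = datetime(2019, 1, 1, tzinfo=timezone.utc)
for days in (1, 2, 5):
    print(days, bucket_start(start, timedelta(days=days)).date())
# prints: 1 2019-01-01, then 2 2018-12-31, then 5 2018-12-30

# With the 172800 s (2-day) offset from get_offset, a 3-day bucket starts on 2019-01-01.
print(3, bucket_start(start, timedelta(days=3), timedelta(seconds=172800)).date())
```

2019-01-01 is day 17897 after the epoch, so 2-day buckets start on even day numbers (17896, i.e. 2018-12-31) and 5-day buckets on multiples of 5 (17895, i.e. 2018-12-30).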

Related

InfluxQL time calculations return no records

I'd like to query InfluxDB using InfluxQL and exclude any rows from 0 to 5 minutes after the hour.
It seems easy to do with the time field (the number of nanoseconds since the epoch) and a little modulus math. The problem is that any WHERE clause with even the simplest calculation on time returns zero records.
How can I get what I need if I can't perform calculations on time? How can I exclude any rows from 0 to 5 minutes after the hour?
# Returns 10 records
SELECT * FROM "telegraf"."autogen"."processes" WHERE time > 0 LIMIT 10
# Returns 0 records
SELECT * FROM "telegraf"."autogen"."processes" WHERE (time/1) > 0 LIMIT 10
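The modulus math the question has in mind can be sketched in Python over the raw nanosecond timestamps, for example to filter rows after fetching them. This is a minimal sketch, assuming "0 to 5 minutes after the hour" means the half-open range [hh:00, hh:05) in UTC; because the epoch is UTC-aligned, the same check also works for time zones with whole-hour offsets. The helper names are hypothetical.

```python
NS_PER_MINUTE = 60 * 1_000_000_000
NS_PER_HOUR = 60 * NS_PER_MINUTE


def minutes_into_hour(ts_ns):
    """Whole minutes elapsed since the top of the (UTC) hour for a nanosecond epoch timestamp."""
    return (ts_ns % NS_PER_HOUR) // NS_PER_MINUTE


def exclude_first_minutes(rows, minutes=5):
    """Keep rows whose timestamp is NOT in [hh:00, hh:minutes). rows: (ts_ns, value) pairs."""
    return [row for row in rows if minutes_into_hour(row[0]) >= minutes]


rows = [
    (1578182460000000000, "00:01"),  # 2020-01-05T00:01:00Z, excluded
    (1578182760000000000, "00:06"),  # 2020-01-05T00:06:00Z, kept
]
print(exclude_first_minutes(rows))
```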

How to obtain time interval value reports from InfluxDB

Using InfluxDB, is there a way to build a time-bucketed report of a field value that represents a state persisting over time, ideally in InfluxQL?
More specifically as an example: Say a measurement contains points that report changes in the light bulb state (On / Off). They could be 0s and 1s as in the example below, or any other value. For example:
time light
---- -----
2022-03-18T00:00:00Z 1
2022-03-18T01:05:00Z 0
2022-03-18T01:55:00Z 0
2022-03-18T02:30:00Z 1
2022-03-18T04:06:00Z 0
The result should be a listing of intervals indicating if this light was on or off during each time interval (e.g. hours), or what percentage of that time it was on. For the given example, the result if grouping hourly should be:
Hour              Value
----              -----
2022-03-18 00:00  1.00
2022-03-18 01:00  0.08
2022-03-18 02:00  0.50
2022-03-18 03:00  1.00
2022-03-18 04:00  0.10
Note that:
for the 1am bucket, even though the light starts the hour On, it is On for only 5 of the 60 minutes (01:00 to 01:05), so the value is low (5/60 ≈ 0.08)
and, more importantly, the 3am to 4am bucket has value 1 because the light has been On since the previous period, even though nothing changed during this hour. This rules out a simple aggregation (e.g. MEAN) over GROUP BY time(), because there is no way to tell whether an empty bucket corresponds to an On or Off state; that depends only on the last value reported before the bucket.
Is there a way to implement it in pure InfluxQL, without retrieving potentially big data sets (points) and iterating through them in a client?
Assume the raw data can be obtained with this query:
SELECT "light" FROM "test3" WHERE $timeFilter
where "test3" is your measurement name and $timeFilter is the query's time range (from ... to ...).
We need a subquery that fills in the data between points. Let's use a grouping (resolution) time of 1s:
SELECT last("light") as "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)
This query returns a 1 or 0 value every second; we will use it as a subquery.
NOTE: This approach does not take into account whether the light was on or off at the start of the $timeFilter range. It provides no data for the period before the first point within $timeFilter.
Next, apply the integral() function to the data from the subquery:
SELECT integral("filled_light",1h) from (SELECT last("light") as "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)) group by time(1h)
This is not a perfect solution, but I hope it solves your problem.
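To sanity-check what the integral-over-fill(previous) query approximates, here is a minimal client-side sketch of the time-weighted "fraction of each hour the light was on". It assumes the points are sorted state changes, that each value holds until the next point (the fill(previous) idea), and that the state before the first point is unknown and counted as off. `hourly_on_fraction` is a hypothetical helper, not an InfluxDB function.

```python
from datetime import datetime, timedelta


def hourly_on_fraction(points, start, end):
    """Fraction of each hour in [start, end) during which the state was truthy.

    points: sorted (datetime, value) state changes; a value holds until the next point.
    Before the first point the state is unknown and counted as off.
    """
    results = []
    hour = start
    while hour < end:
        hour_end = hour + timedelta(hours=1)
        # State carried into this hour: last value at or before its start.
        state = None
        for t, v in points:
            if t <= hour:
                state = v
        on_seconds = 0.0
        cursor = hour
        for t, v in points:
            if hour < t < hour_end:
                if state:
                    on_seconds += (t - cursor).total_seconds()
                state, cursor = v, t
        if state:
            on_seconds += (hour_end - cursor).total_seconds()
        results.append((hour, on_seconds / 3600))
        hour = hour_end
    return results


points = [
    (datetime(2022, 3, 18, 0, 0), 1),
    (datetime(2022, 3, 18, 1, 5), 0),
    (datetime(2022, 3, 18, 1, 55), 0),
    (datetime(2022, 3, 18, 2, 30), 1),
    (datetime(2022, 3, 18, 4, 6), 0),
]
for hour, frac in hourly_on_fraction(points, datetime(2022, 3, 18, 0), datetime(2022, 3, 18, 5)):
    print(hour.strftime("%Y-%m-%d %H:%M"), f"{frac:.2f}")
```

This iterates over points in the client, which the question wanted to avoid for large data sets; it is meant only for checking the query's results on a small sample.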

InfluxDB sum returned values with same time

I'm trying to retrieve the sum of the values that have the same timestamp.
My query is
SELECT value FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z'
My returned values are
time value
---- -----
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
How can I change my query so that the result is:
time sum
---- -----
2021-03-07T00:00:00Z 3
2021-03-08T00:00:00Z 6
2021-03-09T00:00:00Z 9
SELECT SUM(value) FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z' GROUP BY time(1h) FILL(none)
GROUP BY time(1h) - groups the results into 1-hour intervals based on the time column
FILL(none) - omits intervals that contain no data
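As a rough illustration of what SUM() with GROUP BY time(1h) FILL(none) does to these rows, here is a minimal Python sketch: floor each timestamp to the hour and add up the values per bucket, keeping only buckets that received points. `sum_by_hour` is a hypothetical helper, and the timestamps are assumed to be UTC.

```python
from collections import defaultdict
from datetime import datetime


def sum_by_hour(points):
    """Sum values per 1-hour bucket; only buckets with points appear (like FILL(none))."""
    totals = defaultdict(float)
    for ts, value in points:
        bucket = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        totals[bucket] += value
    return dict(sorted(totals.items()))


# Three rows per day at midnight, with values 1, 2 and 3, as in the question.
points = [(datetime(2021, 3, d), v) for d, v in ((7, 1), (8, 2), (9, 3)) for _ in range(3)]
for bucket, total in sum_by_hour(points).items():
    print(bucket.isoformat(), total)
```

Since every point here falls exactly at midnight, GROUP BY time(1d) would produce the same rows.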

InfluxDB: query returns wrong time values (everything is +1 hour)

I am struggling with some fairly simple queries on my InfluxDB.
Every minute, a measurement from a sensor is written into the DB.
If I query a certain day, e.g. from 00:00:00 to 23:59:59, the first result is not at 00:01:00 as I would expect; unfortunately, it is at 01:01:00.
The epoch time value is, e.g.:
1578182460000000000
If I convert that value into a human-readable format (epochconverter.com, assuming the timestamp is in nanoseconds), I get:
GMT: Sunday, 5 January 2020 00:01:00
Your time zone: Sunday, 5 January 2020 01:01:00 GMT+01:00
What is wrong?
The program that writes values into the DB and the system that reads them out are both in the same time zone (Europe/Vienna, GMT+1).
The query is:
> SELECT * FROM generalhistory WHERE time > '2020-01-05T00:00:00Z' and time < '2020-01-06' and DPName = 'Aussenbereich.Sensor.Hum' order by time asc limit 1;
name: generalhistory
time DPName ID Manager Timestamp Value_Numeric Value_String
---- ------ -- ------- --------- ------------- ------------
1578182460000000000 Aussenbereich.Sensor.Hum 30104823 IPDriver_4 2020-01-05\ 01:01:00 99.9
Looking forward
BR
Dieter
The stored time is correct: InfluxDB timestamps are in UTC, and the problem only appears when local time (GMT+1) is used for the filter arguments.
To get the UTC time (GMT+0) that corresponds to local midnight (GMT+1), subtract one hour from the time filter.
The query would then be:
SELECT * FROM generalhistory
WHERE time >= '2020-01-05T00:00:00Z' -1h
and time < '2020-01-06T00:00:00Z' -1h
and DPName = 'Aussenbereich.Sensor.Hum'
ORDER BY time ASC
LIMIT 1;
I hope this helps.
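Instead of subtracting the offset by hand, the UTC bounds of a local day can be computed in client code before building the query. This is a minimal Python sketch assuming a fixed UTC offset: +1 matches Vienna in winter, but during daylight saving time Vienna is at +2, so a DST-aware zone such as `zoneinfo.ZoneInfo("Europe/Vienna")` would be needed then. `local_day_bounds_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def local_day_bounds_utc(year, month, day, utc_offset_hours):
    """Return RFC3339 UTC strings for [local midnight, next local midnight).

    Assumes a fixed UTC offset (no daylight saving time handling).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    start_local = datetime(year, month, day, tzinfo=tz)
    end_local = start_local + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (start_local.astimezone(timezone.utc).strftime(fmt),
            end_local.astimezone(timezone.utc).strftime(fmt))


start, end = local_day_bounds_utc(2020, 1, 5, utc_offset_hours=1)
print(f"SELECT * FROM generalhistory WHERE time >= '{start}' AND time < '{end}' "
      "AND DPName = 'Aussenbereich.Sensor.Hum' ORDER BY time ASC LIMIT 1")
```

For 5 January 2020 at GMT+1 this gives the same lower bound as '2020-01-05T00:00:00Z' - 1h, namely 2020-01-04T23:00:00Z.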

InfluxDB average of distinct count over time

Using InfluxDB v0.9, say I have this simple query:
select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(1m)
Which gives results like:
08:00 5
08:01 10
08:02 5
08:03 10
08:04 5
Now I want a query that produces points with an average of those values over 5 minutes. So the points are now 5 minutes apart, instead of 1 minute, but are an average of the 1 minute values. So the above 5 points would be 1 point with a value of the result of (5+10+5+10+5)/5.
To be clear, this does not produce the result I'm after, since it just counts over 5 minutes and I want the average:
select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
This doesn't work (gives errors):
select mean(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
Also doesn't work (gives error):
select mean(count(distinct("id"))) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
In my actual usage, "id" is a string field, not a tag, because COUNT(DISTINCT()) is not supported on tags in my version of InfluxDB.
To clarify a few points for readers: in InfluxQL, functions like COUNT() and DISTINCT() accept only fields, not tags. While COUNT() supports nesting DISTINCT(), most other nested functions are not yet supported, and neither are nested queries (subqueries) or stored procedures. (Subqueries were added later, in InfluxDB 1.2.)
However, you can meet your need with continuous queries (CQs), which automate processing data and writing the results back to the database.
First, turn your original query into a CQ:
CREATE CONTINUOUS QUERY count_foo ON my_database_name BEGIN
SELECT COUNT(DISTINCT("id")) AS "1m_count" INTO main_1m_count FROM "main" GROUP BY time(1m)
END
There are other options for the CQ, but that basic one will wake up every minute, calculate the COUNT(DISTINCT("id")) for the prior minute, and then store that result in a new measurement, main_1m_count.
Now, you can easily calculate your 5 minute mean COUNT from the pre-calculated 1 minute COUNT results in main_1m_count:
SELECT MEAN("1m_count") FROM main_1m_count WHERE time > now() - 30m GROUP BY time(5m)
(Note that by default, InfluxDB uses epoch 0 and now() as the lower and upper time range boundaries, so it is redundant to include and time < now() in the WHERE clause.)
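The CQ approach is a two-stage aggregation: count distinct ids per minute, then average those per-minute counts over 5-minute windows. A small Python sketch of the same arithmetic, using integer minute indexes for simplicity (an assumption; the helper names are hypothetical). This sketch only averages minutes that have at least one event.

```python
from statistics import mean


def distinct_per_minute(events):
    """events: (minute_index, id) pairs -> {minute: number of distinct ids}."""
    ids_by_minute = {}
    for minute, ident in events:
        ids_by_minute.setdefault(minute, set()).add(ident)
    return {m: len(ids) for m, ids in ids_by_minute.items()}


def mean_per_window(counts, window=5):
    """Average per-minute counts within aligned windows of `window` minutes."""
    grouped = {}
    for minute, count in counts.items():
        grouped.setdefault(minute // window * window, []).append(count)
    return {start: mean(values) for start, values in sorted(grouped.items())}


# Per-minute counts from the question: 5, 10, 5, 10, 5 for minutes 0..4.
counts = {0: 5, 1: 10, 2: 5, 3: 10, 4: 5}
print(mean_per_window(counts))
```

For the question's five points this gives (5+10+5+10+5)/5 = 7 for the single 5-minute window.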
