InfluxDB sum returned values with same time - influxdb

I'm trying to retrieve the sum of the values that share the same timestamp.
My query is
SELECT value FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z'
My returned values are
time value
---- -----
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
How can I change my query so that the result is:
time sum
---- -----
2021-03-07T00:00:00Z 3
2021-03-08T00:00:00Z 6
2021-03-09T00:00:00Z 9

SELECT SUM(value) FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z' GROUP BY time(1h) FILL(none)
GROUP BY time(1h) - groups the results into 1-hour intervals based on the time column
FILL(none) - omits intervals that contain no data
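To illustrate what the aggregation does, here is a minimal Python sketch that sums the rows from the question by timestamp. It groups on the exact timestamp rather than on 1-hour buckets, which gives the same result for this data because every point sits on an hour boundary. The function name `sum_by_timestamp` is made up for illustration.

```python
from collections import defaultdict

# Rows as returned by the original query: (timestamp, value).
rows = [
    ("2021-03-07T00:00:00Z", 1),
    ("2021-03-07T00:00:00Z", 1),
    ("2021-03-07T00:00:00Z", 1),
    ("2021-03-08T00:00:00Z", 2),
    ("2021-03-08T00:00:00Z", 2),
    ("2021-03-08T00:00:00Z", 2),
    ("2021-03-09T00:00:00Z", 3),
    ("2021-03-09T00:00:00Z", 3),
    ("2021-03-09T00:00:00Z", 3),
]

def sum_by_timestamp(rows):
    """Add up the values of all rows that share a timestamp."""
    totals = defaultdict(int)
    for ts, value in rows:
        totals[ts] += value
    # UTC timestamps in this fixed ISO 8601 format sort chronologically as strings.
    return sorted(totals.items())

result = sum_by_timestamp(rows)
for ts, total in result:
    print(ts, total)
```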

Related

InfluxQL time calculations return no records

I'd like to query InfluxDB using InfluxQL and exclude any rows from 0 to 5 minutes after the hour.
Seems pretty easy to do using the time field (the number of nanoseconds since the epoch) and a little modulus math. But the problem is that any WHERE clause with even the simplest calculation on time returns zero records.
How can I get what I need if I can't perform calculations on time? How can I exclude any rows from 0 to 5 minutes after the hour?
# Returns 10 records
SELECT * FROM "telegraf"."autogen"."processes" WHERE time > 0 LIMIT 10
# Returns 0 records
SELECT * FROM "telegraf"."autogen"."processes" WHERE (time/1) > 0 LIMIT 10
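If the modulus cannot be evaluated in the WHERE clause, one possible workaround is to fetch the rows and apply the filter client-side. The sketch below assumes the rows arrive as dicts whose "time" key holds nanoseconds since the epoch, and it treats "0 to 5 minutes after the hour" as the half-open range from hh:00 up to but not including hh:05. The row layout and the `total` field are hypothetical.

```python
NS_PER_MINUTE = 60 * 1_000_000_000
NS_PER_HOUR = 60 * NS_PER_MINUTE

def in_first_five_minutes(ts_ns):
    """True if the nanosecond epoch timestamp falls in [hh:00, hh:05)."""
    return ts_ns % NS_PER_HOUR < 5 * NS_PER_MINUTE

def drop_first_five_minutes(rows):
    """Keep only rows outside the first five minutes of each hour."""
    return [row for row in rows if not in_first_five_minutes(row["time"])]

# Hypothetical rows, with "time" in nanoseconds since the epoch.
rows = [
    {"time": 0 * NS_PER_MINUTE, "total": 10},   # 00:00 -> dropped
    {"time": 4 * NS_PER_MINUTE, "total": 11},   # 00:04 -> dropped
    {"time": 5 * NS_PER_MINUTE, "total": 12},   # 00:05 -> kept
    {"time": 65 * NS_PER_MINUTE, "total": 13},  # 01:05 -> kept
    {"time": 62 * NS_PER_MINUTE, "total": 14},  # 01:02 -> dropped
]
kept = drop_first_five_minutes(rows)
print([row["total"] for row in kept])
```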

InfluxDB: Wrong start time when group by more than 1 day

I'm stuck on an issue with GROUP BY when the time interval is longer than 1 day: the first slot starts at the wrong time for any grain larger than 1 day.
Grain = 1day
ExpectedStartTime=2019-01-01
ActualStartTime=2019-01-01
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(1d), "host" LIMIT 2;
time mean_messages
---- ----
2019-01-01T00:00:00Z 181232
2019-01-02T00:00:00Z 179728
Grain = 2day
ExpectedStartTime=2019-01-01
ActualStartTime=2018-12-31
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(2d), "host" LIMIT 2;
time mean_messages
---- ----
2018-12-31T00:00:00Z 181232
2019-01-02T00:00:00Z 347824
Grain = 5day
ExpectedStartTime=2019-01-01
ActualStartTime=2018-12-30
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(5d), "host" LIMIT 2;
time mean_messages
---- ----
2018-12-30T00:00:00Z 529056
2019-01-04T00:00:00Z 826694.3999999762
I read in the documentation that InfluxDB uses preset time boundaries, but it doesn't say how those boundaries are calculated. Are they based on the start of a month, the start of a week, the time the first data was received, or the start of the shard?
If I knew how the preset time boundaries are calculated, I could specify an offset in GROUP BY so that the first slot starts at 2019-01-01.
InfluxDB calculates its preset time boundaries from epoch time: GROUP BY slots are aligned to multiples of the interval counted from the epoch (1970-01-01T00:00:00Z).
To keep the first slot at the query's start time, I need to pass an offset to GROUP BY time().
Here is a simple Python function that calculates the offset from the start time and the GROUP BY interval (in minutes).
import datetime

def get_offset(start_dt, interval_m):
    # Seconds from the epoch to start_dt, modulo the interval length in seconds
    epoch = datetime.datetime(1970, 1, 1)
    offset = (start_dt - epoch).total_seconds() % (interval_m * 60)
    return int(offset)

start_dt = datetime.datetime(2019, 1, 1, 0, 0)
interval_m = 1440 * 3  # 3 days
offset_s = get_offset(start_dt, interval_m)  # 172800
With a GROUP BY interval of 3 days and this offset, the query looks like this:
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(3d, 172800s), "host" LIMIT 2;
time mean_messages
---- ----
2019-01-01T00:00:00+05:30 539232
2019-01-04T00:00:00+05:30 464640
https://github.com/influxdata/influxdb/issues/8010
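The epoch alignment can be checked against the start times reported in the question. The sketch below is a model of the behaviour described above, not InfluxDB's actual implementation; it treats naive datetimes as UTC, and the helper name `first_bucket_start` is made up for illustration.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def first_bucket_start(start_dt, interval_days, offset_s=0):
    """Start of the epoch-aligned bucket that contains start_dt."""
    interval_s = interval_days * 86400
    elapsed_s = int((start_dt - EPOCH).total_seconds())
    # Buckets begin at offset_s + k * interval_s seconds after the epoch.
    bucket_s = elapsed_s - (elapsed_s - offset_s) % interval_s
    return EPOCH + datetime.timedelta(seconds=bucket_s)

start = datetime.datetime(2019, 1, 1)
for days in (1, 2, 5):
    print(days, first_bucket_start(start, days).date())

# With the 172800 s offset, the first 3-day bucket starts at the query start.
print(3, first_bucket_start(start, 3, offset_s=172800).date())
```

Under this model the 1-day, 2-day, and 5-day grains start on 2019-01-01, 2018-12-31, and 2018-12-30, which matches the actual start times listed in the question.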

Influxdb + grafana : daily count on timeseries

I have a timeseries with a value column.
I want to get a cumulative count of values per day.
For example, if day 1 has the following points:
(timestamp1, value1) (timestamp2, value2) (timestamp3, value3)
I want to have a graph with :
zero value displayed for day 1 00:00 to timestamp1
1 value displayed for timestamp1 to timestamp2
2 value displayed for timestamp2 to timestamp3
3 value displayed for timestamp3 to day2 00:00
zero value displayed for day 2 00:00 to first value of day2
and so on
I could use a query like this:
select count(value) from series where time = today group by time($interval)
But that doesn't give the expected result, because "group by" produces a count of values per $interval rather than a cumulative count of values per day.
And if I do:
select count(value) from series where time = today group by time(today)
I only get one count value per day.
How can I do it?
Not sure I understand, but could this be a solution?
select count(value) from series where time = today group by time(24h)
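For reference, the running count described in the question can be sketched client-side in Python. This is only an illustration of the desired output, computed after fetching the raw points, and not something InfluxDB or Grafana does here; the sample timestamps and the name `daily_running_count` are hypothetical.

```python
from datetime import datetime

def daily_running_count(timestamps):
    """Return (timestamp, count) pairs; the count restarts on each new day."""
    result = []
    current_day = None
    count = 0
    for ts in sorted(timestamps):
        if ts.date() != current_day:
            # First point of a new day: reset the counter.
            current_day = ts.date()
            count = 0
        count += 1
        result.append((ts, count))
    return result

# Hypothetical points: three on day 1, one on day 2.
points = [
    datetime(2021, 3, 7, 8, 0),
    datetime(2021, 3, 7, 12, 30),
    datetime(2021, 3, 7, 19, 45),
    datetime(2021, 3, 8, 9, 15),
]
series = daily_running_count(points)
for ts, count in series:
    print(ts.isoformat(), count)
```

Drawn as a step graph that starts at zero each midnight, these pairs would give the shape described in the question.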

InfluxDB average of distinct count over time

Using Influx DB v0.9, say I have this simple query:
select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(1m)
Which gives results like:
08:00 5
08:01 10
08:02 5
08:03 10
08:04 5
Now I want a query that produces points with an average of those values over 5 minutes. So the points are now 5 minutes apart, instead of 1 minute, but are an average of the 1 minute values. So the above 5 points would be 1 point with a value of the result of (5+10+5+10+5)/5.
To be clear, the following does not produce the result I'm after, since it is just a count over 5 minutes and I want the average:
select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
This doesn't work (gives errors):
select mean(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
Also doesn't work (gives error):
select mean(count(distinct("id"))) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
In my actual usage, "id" is a string (a field, not a tag, because count distinct is not supported for tags in my version of InfluxDB).
To clarify a few points for readers: in InfluxQL, functions like COUNT() and DISTINCT() accept only fields, not tags. COUNT() can wrap DISTINCT(), but most other nested functions are not yet supported, and neither are nested queries, subqueries, or stored procedures.
However, there is a way to address your need using continuous queries, which are a way to automate the processing of data and writing those results back to the database.
First take your original query and make it a continuous query (CQ).
CREATE CONTINUOUS QUERY count_foo ON my_database_name BEGIN
SELECT COUNT(DISTINCT("id")) AS "1m_count" INTO main_1m_count FROM "main" GROUP BY time(1m)
END
There are other options for the CQ, but that basic one will wake up every minute, calculate the COUNT(DISTINCT("id")) for the prior minute, and then store that result in a new measurement, main_1m_count.
Now, you can easily calculate your 5 minute mean COUNT from the pre-calculated 1 minute COUNT results in main_1m_count:
SELECT MEAN("1m_count") FROM main_1m_count WHERE time > now() - 30m GROUP BY time(5m)
(Note that by default InfluxDB uses epoch 0 and now() as the lower and upper bounds of the time range, so including "and time < now()" in the WHERE clause is redundant.)
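The second step, averaging the stored 1-minute counts over 5-minute windows, amounts to the following arithmetic. This is a minimal Python sketch using the sample values from the question; `mean_per_window` is a made-up helper, and it assumes the counts are already ordered by time with no gaps.

```python
def mean_per_window(per_minute_counts, window=5):
    """Average consecutive per-minute counts in windows of `window` minutes."""
    means = []
    for i in range(0, len(per_minute_counts), window):
        chunk = per_minute_counts[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means

# The five 1-minute distinct counts from 08:00 to 08:04.
counts_1m = [5, 10, 5, 10, 5]
window_means = mean_per_window(counts_1m)
print(window_means)  # (5 + 10 + 5 + 10 + 5) / 5 = 7.0
```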

SPSS dataset restructuring involving variable for survey completion date

I'm using SPSS and have a dataset comprised of individuals' responses to a survey question. This is longitudinal data, so the subjects have taken the survey at least twice and some as many as four or five times.
My variables are ID (scale), date of survey completion (date - dd-mmm-yyyy), and response to survey question (scale).
The dataset is sorted by ID then date (ascending). Each date corresponds to survey time 1, time 2, etc. What I would like to do is compute a new variable time that corresponds to the survey completion dates for a particular participant. I would then like to use that variable to complete a long-to-wide restructuring of the dataset.
So, I'd like to accomplish the following and am not sure how to go about doing it:
1) I have something like this:
ID Date Assessment_Answer
----------------------------------
1 01-Jan-2009 4
1 01-Jan-2010 1
1 01-Jan-2011 5
2 15-Oct-2012 6
2 15-Oct-2012 0
2) Want to compute another variable that would give me this:
ID Date Assessment_Answer Time
-----------------------------------------
1 01-Jan-2009 4 Time1
1 01-Jan-2010 1 Time2
1 01-Jan-2011 5 Time3
2 15-Oct-2012 6 Time1
2 15-Oct-2013 0 Time2
3) And restructure so that I have something like this:
ID Time1 Time2 Time3 Time4
--------------------------
1 4 1 5
2 6 0
You can use sequential case processing to create a variable that is a counter within each ID. So for example:
*Making fake data.
DATA LIST FREE / ID (F1.0) Date (DATE10) Assessment_Answer (F1.0).
BEGIN DATA
1 01-Jan-2009 4
1 01-Jan-2010 1
1 01-Jan-2011 5
2 15-Oct-2012 6
2 15-Oct-2012 0
END DATA.
*Making counter within ID.
SORT CASES BY Id Date.
DO IF ($casenum = 1) OR (Id <> LAG(ID)).
COMPUTE Time = 1.
ELSE.
COMPUTE Time = LAG(Time) + 1.
END IF.
FORMATS Time (F2.0).
EXECUTE.
Now you can use CASESTOVARS to reshape the data like you requested.
CASESTOVARS
/ID = Id
/INDEX = Time
/DROP Date.
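For readers who want to check the logic outside SPSS, the same two steps (a counter within each ID, then a long-to-wide pivot) can be sketched in plain Python. The rows below follow the question's second table with dates rewritten as ISO strings; the function name `long_to_wide` is hypothetical.

```python
from itertools import groupby

# (ID, date as ISO string, answer); ISO dates sort chronologically as strings.
rows = [
    (1, "2009-01-01", 4),
    (1, "2010-01-01", 1),
    (1, "2011-01-01", 5),
    (2, "2012-10-15", 6),
    (2, "2013-10-15", 0),
]

def long_to_wide(rows):
    """Number each ID's answers in date order and pivot to one row per ID."""
    wide = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for subject_id, group in groupby(ordered, key=lambda r: r[0]):
        # enumerate(..., start=1) plays the role of the Time counter.
        wide[subject_id] = {
            f"Time{n}": answer
            for n, (_, _, answer) in enumerate(group, start=1)
        }
    return wide

wide = long_to_wide(rows)
for subject_id, answers in wide.items():
    print(subject_id, answers)
```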
