Writing to InfluxDB results in "points beyond retention policy dropped=1" - influxdb

I am trying to test continuous queries as per the Influx example.
I am trying to create the measurement with the following data:
insert bus_data,passengers=5 complaints=8 1557187200
insert bus_data,passengers=8 complaints=8 1557188100
insert bus_data,passengers=8 complaints=8 1557189000
insert bus_data,passengers=7 complaints=8 1557189900
insert bus_data,passengers=8 complaints=8 1557190800
insert bus_data,passengers=15 complaints=8 1557191700
insert bus_data,passengers=15 complaints=8 1557192600
insert bus_data,passengers=17 complaints=8 1557193500
insert bus_data,passengers=20 complaints=8 155719440
But this results in an error:
ERR: {"error":"partial write: points beyond retention policy dropped=1"}
Here are my retention policies; oneday is the default:
name      duration   shardGroupDuration  replicaN  default
----      --------   ------------------  --------  -------
autogen   0s         168h0m0s            1         false
oneday    24h0m0s    1h0m0s              1         true
onemonth  720h0m0s   24h0m0s             1         false
MONTH     720h0m0s   24h0m0s             1         false
YEARs     8736h0m0s  168h0m0s            1         false

Your default retention policy keeps data for 24 hours, but the timestamps of your points are older than 24 hours, so InfluxDB drops them on write: data that falls outside the retention window is never stored.
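If you actually want these May 2019 points stored, a rough sketch (the database name "mydb" is a placeholder; per the table above, autogen has a 0s, i.e. infinite, duration) is to write into a policy that covers their age, or to extend the default policy. Note also that the influx 1.x CLI interprets bare timestamps as nanoseconds unless a precision is set:
precision s
INSERT INTO autogen bus_data,passengers=5 complaints=8 1557187200
or, alternatively, make the default policy keep data indefinitely:
ALTER RETENTION POLICY "oneday" ON "mydb" DURATION 0s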

Related

Google Sheets query output changes upon closing and reopening workbook

I have a workbook where I track game stats for my local community. I added a chart that changes based on a few selections, and I use FILTER to get the desired result. The data comes from a sheet where I use QUERY to calculate month-to-month differences (since I could not find an easy way to do this with Google's built-in pivot options). One of the queries looks like this:
=query('Response Edits'!1:1112,"select A,B,C WHERE A IS NOT NULL AND NOT H matches '"&textjoin("|",TRUE,query('Response Edits'!1:1112,"select min(H) WHERE A IS NOT NULL group by D",0))&"' order by D, C ASC",0)
A converts the month value in the timestamp to the correct survey month (e.g. a 2020-07-01 would be for the 06 survey and 2020-07-29 would be for the 07 survey)
B converts the year value in the timestamp to the correct survey year
C is the timestamp of the survey submission
D is the player name
H is the player XP of the survey submission (I use this as a lazy solution since it only increases and because I could not figure out a way to include the key phrase date using multiple datetimes, e.g. NOT C matches date textjoin("|",TRUE,"select min(C)...") did not work)
The textjoin is just there to remove the earliest date submitted, because it would not have a month-to-month value. Here is a portion of the output of the query above and of another query, which I believe is correct:
7 2020 2020-07-31 23:18:48 ... 6873449 198 11610
8 2020 2020-08-31 22:15:53 ... 7789713 175 8732
9 2020 2020-09-30 23:03:12 ... 5994347 139 8932
When I close the sheet and reopen it, I notice that my chart has only 0 values because the sheet with the query functions is only outputting 0. The above query and my other query now also give a different output, a portion of which is below:
6 2020 2020-06-30 22:04:02 ... 0 0 0
7 2020 2020-07-31 23:18:48 ... 0 0 0
8 2020 2020-08-31 22:15:53 ... 0 0 0
9 2020 2020-09-30 23:03:12 ... 0 0 0
I am new to using QUERY, but the formula seems correct, because if I change the last 0 in the formula (which is the headers option) to 1 and then back to 0, I get the desired result.
Tl;dr: Why does the queried data not output correctly when I close and reopen the workbook? And why does it output correctly after the formula is changed and changed back (including via undo)? Could textjoin or matches be causing the problem in the query?
Try running this:
=QUERY('Response Edits'!A1:H1112,
"select A,B,C
where A is not null
and not H matches '"&TEXTJOIN("|", 1,
QUERY('Response Edits'!A1:H1112,
"select min(H)
where A is not null group by D", 0))&"'
order by D, C", 0)

InfluxDB sum returned values with same time

I'm trying to retrieve the sum of values that share the same timestamp.
My query is
SELECT value FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z'
My returned values are
time value
---- -----
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
How can I change my query so the result will be
time sum
---- -----
2021-03-07T00:00:00Z 3
2021-03-08T00:00:00Z 6
2021-03-09T00:00:00Z 9
SELECT SUM(value) FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z' GROUP BY time(1h) FILL(none)
GROUP BY time(1h) - groups results by the time column in 1-hour intervals
FILL(none) - omits intervals that contain no points
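Since the sample points all fall exactly on day boundaries, a 1-day window (just a minor variant of the query above, not required for the result) yields the same three rows:
SELECT SUM(value) FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z' GROUP BY time(1d) FILL(none)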

InfluxDB get list of changes

I would like to get a result such as the following:
name from_value to_value at
tag A 10 15 2019-02-11 16:00
tag B 1 2 2019-02-11 16:00
tag A 15 20 2019-02-11 16:05
tag B 2 3 2019-02-11 16:05
tag A 20 25 2019-02-11 16:10
tag B 3 4 2019-02-11 16:10
basically a column "from_value" (the previous value relative to the current point) and a column "to_value" (the value of the current point).
To select only the current point value I do:
SELECT value FROM data WHERE "name"='tag A'
What if I wanted to also select the previous value?
SELECT prev(value) AS "from_value", value AS "to_value" FROM data WHERE "name"='tag A'
Can I do something like the above, or do I need to save the previous value myself every time a new point is written?
With GROUP BY time you can use the last() and difference() functions to get the value changes per time interval: difference(last(value)) is the change since the previous interval, so last(value) minus that difference recovers the previous value.
SELECT LAST(value)-DIFFERENCE(LAST(value)) as FromValue, LAST(value) as ToValue
FROM demo where time > 1549983975150000000
GROUP BY time(10ms),tagA FILL(none)
name: demo
tags: tagA=1
time                 FromValue  ToValue
----                 ---------  -------
1549984410470000000             10
1549984421820000000  10         15
1549984431180000000  15         17
1549984436350000000  17         10
1549984753810000000  10         10
SELECT * FROM demo
name: demo
time tagA value
---- ---- -----
1549984410475859753 1 10
1549984421827992234 1 15
1549984431180379398 1 17
1549984436356232522 1 10
1549984753817094214 1 10

Delete points between hours in InfluxDB

I have a measurement that stores prices every 10 seconds (the seconds part of each timestamp ends in 0, 10, 20, 30, 40 or 50).
I would like to delete old points (older than 1 year) so that only one price per hour is kept.
How can I find these candidate points?
You can achieve this with a retention policy plus a continuous query:
CREATE RETENTION POLICY "one_year" ON "database_name" DURATION 52w REPLICATION 1 DEFAULT
The autogen RP has an infinite retention duration, so the hourly aggregates written into it are kept:
CREATE CONTINUOUS QUERY "aggregate_prices" ON "database_name"
BEGIN
SELECT mean("value")
INTO "autogen"."prices"
FROM "prices"
GROUP BY time(1h)
END
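Note that a continuous query only downsamples points written after it is created; for the 10-second data that already exists, one possibility (a sketch reusing the measurement and field names from the CQ above; the 104w lower bound is a placeholder for however far back your raw prices go) is a one-off backfill into the autogen RP before the new retention policy expires the raw points:
SELECT mean("value") AS "value"
INTO "autogen"."prices"
FROM "prices"
WHERE time > now() - 104w AND time < now()
GROUP BY time(1h)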

How to load Bucketed HIVE table using LOAD DATA LOCAL INPATH

Can we load a bucketed Hive table using the LOAD DATA LOCAL INPATH ... command? I have executed it for a sample file, but the data values are inserted as NULL.
hduser@ubuntu:~$ cat /home/hduser/Desktop/hive_external/hive_external/emp2.csv
101,EName1,110.1
102,EName2,120.1
103,EName3,130.1
hive (default)> load data local inpath '/home/hduser/Desktop/hive_external/hive_external' overwrite into table emp_bucket;
Loading data to table default.emp_bucket
Table default.emp_bucket stats: [numFiles=1, numRows=0, totalSize=51, rawDataSize=0]
OK
Time taken: 1.437 seconds
hive (default)> select * from emp_bucket;
OK
emp_bucket.emp_id emp_bucket.emp_name emp_bucket.emp_salary
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
Time taken: 0.354 seconds, Fetched: 3 row(s)
hive (default)> show create table emp_bucket;
OK
createtab_stmt
CREATE TABLE `emp_bucket`(
`emp_id` int,
`emp_name` string,
`emp_salary` float)
CLUSTERED BY (
emp_id)
INTO 3 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://localhost:54310/user/hive/warehouse/emp_bucket'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='51',
'transient_lastDdlTime'='1457967994')
Time taken: 0.801 seconds, Fetched: 22 row(s)
But when the data was inserted using the INSERT command, it got loaded successfully.
hive (default)> select * from koushik.emp2;
OK
emp2.id emp2.name emp2.salary
101 EName1 110.1
102 EName2 120.1
103 EName3 130.1
Time taken: 0.266 seconds, Fetched: 3 row(s)
hive (default)> insert overwrite table emp_bucket select * from koushik.emp2;
Query ID = hduser_20160314080808_ae88f1c8-3db6-4a5c-99d2-e9a5312c597d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1457951378402_0002, Tracking URL = http://localhost:8088/proxy/application_1457951378402_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1457951378402_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2016-03-14 08:09:33,203 Stage-1 map = 0%, reduce = 0%
2016-03-14 08:09:48,243 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.24 sec
2016-03-14 08:09:59,130 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 6.39 sec
2016-03-14 08:10:02,382 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 8.8 sec
2016-03-14 08:10:03,442 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 11.03 sec
MapReduce Total cumulative CPU time: 11 seconds 30 msec
Ended Job = job_1457951378402_0002
Loading data to table default.emp_bucket
Table default.emp_bucket stats: [numFiles=3, numRows=3, totalSize=51, rawDataSize=48]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 11.03 sec HDFS Read: 12596 HDFS Write: 273 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 30 msec
OK
emp2.id emp2.name emp2.salary
Time taken: 103.027 seconds
hive (default)> select * from emp_bucket;
OK
emp_bucket.emp_id emp_bucket.emp_name emp_bucket.emp_salary
102 EName2 120.1
103 EName3 130.1
101 EName1 110.1
Time taken: 0.08 seconds, Fetched: 3 row(s)
The question is: can't a Hive bucketed table be loaded from a file?
You may have to enable bucketing before loading a file into a bucketed table.
Use this to set the bucketing attribute first, and then load your file:
set hive.enforce.bucketing = true;
Comment here if it doesn't work.
Apparently Hive does not apply bucketing to data loaded as plain files (e.g. external tables or LOAD DATA). Thus, instead of the LOAD DATA INPATH route, you apparently have to use INSERT OVERWRITE TABLE ..., cf. the Hadoop tutorial.
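A common workaround, sketched below, is to load the raw file into a plain delimited staging table first and then INSERT OVERWRITE the bucketed table from it (the staging table name emp_stage and the comma delimiter are assumptions based on the CSV shown above):
set hive.enforce.bucketing = true;
CREATE TABLE emp_stage (
  emp_id int,
  emp_name string,
  emp_salary float)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/hive_external/hive_external/emp2.csv'
OVERWRITE INTO TABLE emp_stage;
INSERT OVERWRITE TABLE emp_bucket SELECT * FROM emp_stage;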
