I had data from the last 7 days in InfluxDB. I applied a retention policy and suddenly all the data got deleted. I have a single instance of InfluxDB running.
CREATE RETENTION POLICY stats_30_day ON server_stats DURATION 30d REPLICATION 1
ALTER RETENTION POLICY stats_30_day ON server_stats DURATION 30d REPLICATION 1 DEFAULT
Any idea what went wrong?
You changed which retention policy is the default. The data you wrote earlier still lives under the previous default retention policy, so when you query you'll have to specify that retention policy explicitly. See:
https://docs.influxdata.com/influxdb/v0.13/troubleshooting/frequently_encountered_issues/#missing-data-after-creating-a-new-default-retention-policy
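For example, you can still reach the old data by fully qualifying the retention policy in the query. A minimal sketch, assuming a hypothetical measurement "cpu" and the out-of-the-box RP name "default" used by InfluxDB 0.x (adjust both to your setup):

-- "default" was the previous default RP name; "cpu" is a hypothetical measurement.
SELECT * FROM "server_stats"."default"."cpu" LIMIT 10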
Below is the scenario behind my question.
Requirement:
Pre-aggregate time series data within InfluxDB at granularities of seconds, minutes, hours, days, and weeks for each sensor in a device.
Current Proposal:
Create five continuous queries (one for each granularity level, i.e. seconds, minutes, ...) per sensor of a device when the device is onboarded, writing into a different retention policy than that of the raw time series data.
Limitation with Current Proposal:
As the number of devices/sensors (time series data sources) grows, InfluxDB gets bloated with too many continuous queries (which is not recommended), and this takes a toll on the InfluxDB instance itself.
Question:
To avoid the above problem, is it possible to create continuous queries on the same source measurement (i.e. the raw time series measurement), differentiating the aggregates within the measurement by introducing new tags that distinguish the continuous query results from the raw time series data?
Example:
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."raw"."strain" FROM "database"."raw"."strain" GROUP BY time(1s),*
END
As far as I know, and have seen from the docs, it's not possible to apply new tags in continuous queries.
If I've understood the requirements correctly, this is one way you could approach it.
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."strain_seconds_retention_policy"."strain" FROM "database"."raw"."strain" GROUP BY time(1s),*
END
This saves the data in the same measurement but under a different retention policy, strain_seconds_retention_policy. When you do a select, you specify the corresponding retention policy to select from.
Note that it is not possible to select from several retention policies at the same time. If you don't specify one, the default is used (not all of them). If that is something you need, another approach is required.
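For example, to read the 1s aggregates produced by the continuous query above, you would qualify the retention policy explicitly; the raw points are reached through the "raw" policy instead. A sketch using the names from this thread:

-- Aggregated data lives under the dedicated retention policy:
SELECT "STRAIN_TOP_MEAN" FROM "database"."strain_seconds_retention_policy"."strain" WHERE time > now() - 1h
-- The raw points stay under the "raw" policy:
SELECT "strain_top" FROM "database"."raw"."strain" WHERE time > now() - 1h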
I don't quite get why you'd need to define a continuous query per device and per sensor. You only need five (one each for seconds, minutes, hours, days, and weeks) with a GROUP BY * (all), which you already do. As long as the source data point has a tag with the ID of the corresponding device and sensor, the resampled data point will have it too. Any newly added devices (data) will automatically be processed by those five queries and saved into the corresponding retention policies.
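As a sketch, the minutes-level query could look like the following; the retention policy name "strain_minutes_retention_policy" is illustrative, and the GROUP BY time(1m),* keeps every tag (device ID, sensor ID, ...) of the source points:

CREATE CONTINUOUS QUERY "strain_minutes" ON "database"
RESAMPLE EVERY 1m FOR 5m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."strain_minutes_retention_policy"."strain" FROM "database"."raw"."strain" GROUP BY time(1m),*
END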
If you do want to apply additional tags, you could process the data outside the database in a custom script and write it back with any additional tags you need, instead of using continuous queries.
Could you please help resolve the below error?
INSERT blocks,color=blue number=90 1547095860
ERR: {"error":"partial write: points beyond retention policy dropped=1"}
Note: 1547095860 is 1/10/2019, 10:21:00 AM Standard Time.
I am trying to insert this data today, i.e. 25th Jan 2019.
My DB settings are as in the screenshot below.
[screenshot: database settings]
Thanks!
Based on the screenshot provided, I suppose you're attempting to insert data into one of the databases named LINSERVER?
By default, if no retention policy (RP) is specified for a database, the system uses the autogen RP, which has an infinite retention period. In other words, you can insert data with any timestamp from the start of the epoch onward and it will be preserved.
If a retention policy of, for example, 7d is defined for a database, you can only insert data within a window of one week. Data points that fall outside the retention period are considered expired and will be "garbage collected" by the database.
This also means that you cannot backfill data outside the retention period. My guess is that your database has a relatively short retention period, which caused the data you just inserted (timestamped 10 Jan but written on 25 Jan, 15 days later) to be considered expired. Hence the error message "points beyond retention policy dropped=1".
Solution: either use the default autogen RP or make the duration of your database's retention policy longer, depending on your use case.
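A minimal sketch of the second option (the RP name "rp_7d" is an assumption; check your actual names first):

SHOW RETENTION POLICIES ON "LINSERVER"
-- "rp_7d" is a hypothetical name; lengthening the window keeps points
-- from 10 Jan from being dropped on a 25 Jan write:
ALTER RETENTION POLICY "rp_7d" ON "LINSERVER" DURATION 30d REPLICATION 1 DEFAULT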
I'm new to InfluxDB and Grafana and want to know, as percentages over a 24-hour period, whether the machine is off, idle, or on. This is an IoT project and we're recording the state (off, idle, on) inferred from the power used. The data ends up being stored in InfluxDB under state as 0 = off, 1 = idle, and 2 = on. How can I achieve this in an Influx query or in Grafana? Any pointers or help are appreciated.
I would use the pie chart plugin to show the three states and their respective percentages over time. You can do something like:
SELECT count("value") FROM "state" WHERE "value" = [0|1|2] GROUP BY time($__interval) fill(null)
once for each slice of the chart, substituting 0, 1, or 2 for the state (count() is used rather than mean(), since the mean of a constant state value says nothing about how often that state occurs).
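Spelled out, the three queries might look like this (assuming, as in the question, a measurement "state" with a field "value" holding 0, 1, or 2):

SELECT count("value") AS "off" FROM "state" WHERE "value" = 0 GROUP BY time($__interval) fill(null)
SELECT count("value") AS "idle" FROM "state" WHERE "value" = 1 GROUP BY time($__interval) fill(null)
SELECT count("value") AS "on" FROM "state" WHERE "value" = 2 GROUP BY time($__interval) fill(null)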
Have a look at grafana-discrete-panel (also available in the official Grafana plugin repository). It is designed specifically to visualize state changes over time.
I have created a one-year retention policy in InfluxDB, and the shard group duration was automatically set to 168h.
This is how my retention policies look now:
[screenshot: retention policies]
This is how my shards look now:
[screenshot: shard list]
What does it mean for my data that the shard's end time is set one week ahead?
It means that all data written to database st_test and retention policy a_year with a timestamp between 2016-10-03 and 2016-10-10 will be stored in shard 16.
A retention policy is a container for shards. Each shard in the retention policy will hold 1w worth of data, and after 1y that shard will expire and we will remove it.
See the shard documentation for more information.
In order to understand shard group durations, you need to understand their relation to the retention policy duration.
The retention policy DURATION determines how long InfluxDB keeps the data, while the SHARD DURATION clause determines the time range covered by a shard group.
A single shard group covers a specific time interval; InfluxDB determines that time interval by looking at the DURATION of the relevant retention policy (RP). The table below outlines the default relationship between the DURATION of an RP and the time interval of a shard group:
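RP DURATION                   Default shard group duration
< 2 days                      1 hour
>= 2 days and <= 6 months     1 day
> 6 months                    7 days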
When you create a retention policy, you can modify that shard duration:
CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]
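For instance, recreating the setup from this thread but with a larger shard group (the 4w value is purely illustrative):

CREATE RETENTION POLICY "a_year" ON "st_test" DURATION 52w REPLICATION 1 SHARD DURATION 4w DEFAULT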
By default, rollup tables like rollups60, rollups300, rollups7200, and rollups86400 have the value 0 for default_time_to_live, which means the data never expires. But as per the OpsCenter Metrics blog post "Using Cassandra's built in ttl support", OpsCenter expires the columns in the rollups60 column family after 7 days, the rollups300 column family after 4 weeks, the rollups7200 column family after 1 year, and the data in the rollups86400 column family never expires.
What is the reason behind this setting, and where do we set the TTL for these tables?
Since OpsCenter data is growing, shouldn't we have TTLs defined for rollups tables at the table level?
But in opscenterd.conf, the default values are listed below.
[cassandra_metrics]
1min_ttl = 86400
5min_ttl = 604800
2hr_ttl = 2419200
Which setting takes precedence over the other?
There are defaults, defined in the documentation, that apply if these are not set anywhere:
1min_ttl: Sets the time in seconds to expire 1-minute data points. The default value is 604800 (7 days).
5min_ttl: Sets the time in seconds to expire 5-minute data points. The default value is 2419200 (28 days).
2hr_ttl: Sets the time in seconds to expire 2-hour data points. The default value is 31536000 (365 days).
24hr_ttl: Sets the time to expire 24-hour data points. The default value is 0, or never.
If you don't set them, the defaults are used; if you override them in the [cassandra_metrics] section of opscenterd.conf, your values take precedence. When the agent on the node stores a rollup for a period, it will include whatever TTL it is associated with, i.e. (not exactly how OpsCenter does it, but for demonstration purposes):
INSERT INTO rollups60 (key, timestamp, value) VALUES (...) USING TTL 604800;
In your example you lowered the TTLs, which would decrease the amount of data stored. So:
1) You set a lower TTL to decrease the amount of data stored on disk. You can configure it as you mentioned in your ticket, although the compaction strategy can affect this significantly.
2) There is a default TTL setting on the tables, but there really isn't much difference between setting it per query and having it on the table. Doing an ALTER TABLE is pretty expensive if you need to change it, compared to just changing the TTL value on the inserts. If you have issues with obsolete data in tables, try switching to LeveledCompactionStrategy (note: this increases I/O on compactions, but probably not noticeably).
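If you did want a table-level default anyway, it can be set with CQL; a minimal sketch, assuming OpsCenter's keyspace is named "OpsCenter". Note that a table-level default_time_to_live only applies to writes that don't carry their own TTL, and OpsCenter's agents set one per insert:

-- Hypothetical: 7-day default TTL on the 1-minute rollups table.
ALTER TABLE "OpsCenter".rollups60 WITH default_time_to_live = 604800;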
According to:
https://docs.datastax.com/en/latest-opsc/opsc/configure/opscChangingPerformanceDataExpiration_t.html
"Edit the cluster_name.conf file."
Chris, you suggested editing opscenterd.conf.