Unable to insert data in InfluxDB

Could you please help to resolve the below error?
INSERT blocks,color=blue number=90 1547095860
ERR: {"error":"partial write: points beyond retention policy dropped=1"}
Note: 1547095860 corresponds to 1/10/2019, 10:21:00 AM Standard Time.
I am trying to insert this data today, i.e. 25th Jan 2019.
My DB settings are as shown in the attached image.
Thanks!

Based on the screenshot provided, I suppose you're attempting to insert data into one of the databases named LINSERVER?
By default, if no retention policy (RP) is specified for a database, the system uses the autogen RP, which has an infinite retention period. In other words, you can insert data with any timestamp from the start of epoch time onward and it will be preserved.
If a retention policy of, say, 7d is defined for a database, you can only insert data within a limited period of one week. A data point that falls outside the retention period is considered expired and will be "garbage collected" by the database.
This also means that you cannot backfill data outside the retention period. My guess is that your database has a relatively small retention period, which caused the data you just inserted to be treated as expired. Hence the error message "points beyond retention policy dropped=1".
Solution: either use the default autogen RP or make the duration of your database's retention policy longer, depending on your use case.
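For instance, assuming the database from the screenshot is named LINSERVER (the RP name and duration below are illustrative), the retention period could be extended like this:

```sql
-- Illustrative sketch: create a longer default retention policy so that
-- points from ~2 weeks ago are no longer considered expired.
CREATE RETENTION POLICY "one_year" ON "LINSERVER" DURATION 52w REPLICATION 1 DEFAULT
```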

Related

Can InfluxDB have Continuous Queries with same source & target measurements but with different/new tags?

Below is the scenario against which I have this question.
Requirement:
Pre-aggregate time series data within influxDb with granularity of seconds, minutes, hours, days & weeks for each sensor in a device.
Current Proposal:
Create five Continuous Queries (one for each granularity level, i.e. seconds, minutes, ...) for each sensor of a device, writing into a different retention policy than that of the raw time-series data, when the device is onboarded.
Limitation with Current Proposal:
With an increasing number of devices/sensors (time-series data sources), Influx will get bloated with too many Continuous Queries (which is not recommended), and this will take a toll on the InfluxDB instance itself.
Question:
To avoid the above problems, is it possible to create Continuous Queries on the same source measurement (i.e. the raw time-series measurement), differentiating the aggregates within the measurement by introducing new tags, so that the Continuous Query results can be distinguished from the raw time-series data?
Example:
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."raw"."strain" FROM "database"."raw"."strain" GROUP BY time(1s),*
END
As far as I know, and have seen from the docs, it's not possible to apply new tags in continuous queries.
If I've understood the requirements correctly this is one way you could approach it.
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."raw"."strain" FROM "database"."strain_seconds_retention_policy"."strain" GROUP BY time(1s),*
END
This would save the data in the same measurement but in a different retention policy, strain_seconds_retention_policy. When you select, you specify the retention policy from which to read.
Note that it is not possible to select from several retention policies at the same time. If you don't specify one, the default is used (not all of them). If that is something you need, a different approach is required.
I don't quite get why you'd need to define a continuous query per device and per sensor. You only need to define five (one per granularity: seconds, minutes, hours, days, weeks) and group by * (all), which you already do. As long as the source data point has a tag with the id of the corresponding device and sensor, the resampled data point will have it too. Any newly added devices (data) will just be processed automatically by those five queries and saved into the corresponding retention policies.
If you do want to apply additional tags, you could process the data outside the database in a custom script and write it back with any additional tags you need, instead of using continuous queries.
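A rough sketch of such a script follows. The tag key, field name, and window size are all assumptions, and the InfluxDB read/write calls are omitted; only the aggregation-plus-tagging step is shown.

```python
# Sketch of a client-side downsampling step that attaches extra tags
# so aggregated points can be told apart from raw data.
from collections import defaultdict

def aggregate_with_tag(points, window):
    """points: list of (timestamp, value); returns per-window means,
    each carrying an extra tag aggregate=mean."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window].append(value)  # bucket by window start
    return [
        {"time": start,
         "fields": {"STRAIN_TOP_MEAN": sum(vs) / len(vs)},
         "tags": {"aggregate": "mean"}}
        for start, vs in sorted(buckets.items())
    ]

raw = [(0, 1.0), (4, 3.0), (12, 5.0)]
print(aggregate_with_tag(raw, 10))
```

The resulting dictionaries could then be written back with any InfluxDB client, with the extra tag making the aggregates queryable separately from the raw series.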

InfluxDB - what's shard group duration

I have created one year policy in InfluxDB and shard group duration was automatically set to 168h.
This is how my retention policies look now:
This is how my shards look now:
What does it mean for my data that shard's end time is set one week ahead?
It means that all data written to database st_test and retention policy a_year with a timestamp between 2016-10-03 and 2016-10-10 will be stored in shard 16.
A retention policy is a container for shards. Each shard in the retention policy will have 1w worth of data. And after 1y that shard will expire and we will remove it.
See the shard documentation for more information.
In order to understand shard group durations, you need to understand their relation to the retention policy duration.
The retention policy DURATION determines how long InfluxDB keeps the data, while the SHARD DURATION clause determines the time range covered by a shard group.
A single shard group covers a specific time interval; InfluxDB determines that interval from the DURATION of the relevant retention policy (RP). By default the relationship is:
RP duration less than 2 days → shard group duration 1h
RP duration between 2 days and 6 months → shard group duration 1d
RP duration greater than 6 months → shard group duration 7d
This is why your one-year policy was automatically given a 168h (7-day) shard group duration.
When you create a retention policy, you can override that shard duration:
CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]
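Using the names from the question above (the explicit shard duration simply matches what InfluxDB would pick by default for a one-year RP), that would look like:

```sql
-- One-year retention with an explicit one-week (168h) shard group duration
CREATE RETENTION POLICY "a_year" ON "st_test" DURATION 52w REPLICATION 1 SHARD DURATION 1w DEFAULT
```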

Is it possible keep only value changes in influxdb?

Is it possible to downsample older data using influxdb in a way that it only keeps change of values?
My example is the following:
I have a binary sensor sending data every 10 min, so naturally the consecutive values look something like this: 0,0,0,0,0,1,1,0,0,0,0...
My goal is to keep this kind of raw data over a certain period of time using retention policies, and to downsample the data for longer storage. I want to delete all successive values with the same number, so that only the data points with the timestamps at which the value actually changed remain. The downsampled data should look like this: 0,1,0,1,0,1,0..., but with the correct timestamps of when the events actually occurred.
Currently this isn't possible with InfluxDB, though the plan is to eventually support this kind of use case.
I would encourage you to open a feature request on the InfluxDB repo asking for this.
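Until that is supported, one workaround (sketched below; the InfluxDB read/write calls are omitted, only the filtering step is shown) is to do the change-only filtering client-side and write the result into the longer-retention policy yourself:

```python
# Keep only the points where the value changes from the previous point,
# preserving the timestamp at which each change occurred.

def keep_changes(points):
    """points: list of (timestamp, value), sorted by timestamp."""
    kept = []
    previous = object()  # sentinel that compares unequal to any value
    for ts, value in points:
        if value != previous:
            kept.append((ts, value))
            previous = value
    return kept

# Binary sensor reporting every 10 minutes (timestamps in seconds)
raw = [(0, 0), (600, 0), (1200, 0), (1800, 1), (2400, 1), (3000, 0)]
print(keep_changes(raw))  # [(0, 0), (1800, 1), (3000, 0)]
```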

influx: all old data got deleted after applying retention policy

I had data from the last 7 days in Influx. I applied a retention policy and suddenly all the data got deleted. I have a single instance of Influx running.
CREATE RETENTION POLICY stats_30_day ON server_stats DURATION 30d REPLICATION 1
ALTER RETENTION POLICY stats_30_day ON server_stats DURATION 30d REPLICATION 1 default
Any idea what went wrong ?
You changed your default retention policy, so when you query you'll have to specify the other retention policy explicitly. See:
https://docs.influxdata.com/influxdb/v0.13/troubleshooting/frequently_encountered_issues/#missing-data-after-creating-a-new-default-retention-policy
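For example, to read back the data that still lives under the old autogen policy (the measurement name cpu is hypothetical):

```sql
-- Query the old, now non-default retention policy explicitly
SELECT * FROM "server_stats"."autogen"."cpu" WHERE time > now() - 7d
```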

iOS and Mysql Events

I'm working on an app that connects to a MySQL backend. It's a little similar to Snapchat in that once the current user gets the pics from the users they follow and sees them, they can never again see these pics. However, I can't just delete the pics from the database; the user who uploaded a pic still needs to see it. So I've come up with an interesting design and I want to know if it's good or not.
When uploading a pic, I would also create a MySQL event that would run exactly one day after the pic was uploaded, deleting it. If people are uploading pics all the time, events would be created all the time. How does this affect the MySQL database? Is this even scalable?
No, not scalable. Deleting single records is quick, but as your volume increases you run into trouble. You do, however, have a classic case for using partitioning:
CREATE TABLE your_images (insert_date DATE, some_image BLOB, some_owner INT)
ENGINE=InnoDB /* row_format=compressed key_block_size=4 */
PARTITION BY RANGE COLUMNS (insert_date) (
PARTITION p01 VALUES LESS THAN ('2015-07-03'),
PARTITION p02 VALUES LESS THAN ('2015-07-12'),
PARTITION p0x VALUES LESS THAN (ETC),
PARTITION p0n VALUES LESS THAN (MAXVALUE));
You can then insert just as you are used to, drop the partitions once per day (using 1 event for all your data), and create new partitions also once per day (using the same event which is dropping your old partitions).
To make certain a photo lives for 24 hours (minimum), the partition cleanup has to occur with a one-day delay (so clean up the day before yesterday, not yesterday itself).
A date filter in the query that fetches the image from the database is still needed, to prevent images older than a day from being displayed.
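The daily rotation event mentioned above could be sketched like this; rotate_partitions is a hypothetical stored procedure (not a built-in) that would drop the expired partition and add tomorrow's, typically via PREPARE/EXECUTE on dynamically built ALTER TABLE statements:

```sql
-- Hypothetical daily housekeeping: one event handles all the data.
-- rotate_partitions is an assumed stored procedure, not a MySQL built-in.
CREATE EVENT rotate_image_partitions
  ON SCHEDULE EVERY 1 DAY
  DO CALL rotate_partitions('your_images');
```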
