DB folder utilising a lot of space, creating a space issue - InfluxDB

I have a Grafana Windows server where we have integrated Hyper-V snapshot-related information as well as CPU and memory usage of the HVs, etc. I can see the below folder on our Grafana Windows server:
C:\InfluxDB\data\telegraf\autogen
Under this autogen folder, I can see multiple subfolders with .tsm files. A new file is created every 7 days and each folder is around 4 to 5 GB in size. There are many files in this autogen folder from 2nd Feb 2017 to 14 Mar 2018, which together use around 225 GB of space.

What you see:
autogen is a default Retention Policy (RP) auto-created by InfluxDB, and it has an infinite data retention duration. All datapoints in Influx are logically stored in shards. Physically, shard data is compressed and stored in .tsm files. Shards are grouped into shard groups. Each shard group covers a specific time range defined by the so-called shard duration and stores the datapoints belonging to that time interval. By default, for an RP with a retention duration greater than 6 months, the shard group duration is set to 7 days.
For more info see docs on storage engine.
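To see this on your own instance, a rough InfluxQL sketch (assuming the database is the telegraf one from the folder path above) is:

-- Show the retention policies of the "telegraf" database; the default
-- "autogen" policy has duration 0s, i.e. data is kept forever.
SHOW RETENTION POLICIES ON "telegraf"

-- List the shards (with their start and end times) that back the .tsm files on disk.
SHOW SHARDS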
Regarding your questions:
"Is there anyway we can shrink the size of autogen file?"
Probably not. The only thing you can do is rely on InfluxDB's internal compression. Here they say that it may improve if you increase the shard duration.
*Note, however, that because InfluxDB drops whole shards rather than individual datapoints, increasing the shard duration means your data is kept until the entire shard falls outside the current retention duration, and only then is it dropped. If you have an infinite retention duration this doesn't matter anyway. This leads us to the second question.
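For illustration only, the shard duration of the existing policy could be raised with something like the following InfluxQL (the 4-week value is an arbitrary example):

-- Increase the shard group duration of the default policy from 7 days to 4 weeks;
-- only newly created shards are affected, existing .tsm files are not rewritten.
ALTER RETENTION POLICY "autogen" ON "telegraf" SHARD DURATION 4w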
"Is it possible to delete the old file under autogen folder?"
If you can afford losing old data, or can't afford too much storage space, InfluxDB lets you specify a data Retention Policy (RP), already mentioned above. Basically, all your measurements are associated with a specific RP, and the data is deleted as soon as the retention duration comes to an end. So if you specify an RP of 1 year, InfluxDB will automatically delete all datapoints older than now() - 1 year. An RP is the standard (and pretty obvious) way of dealing with storage issues. A logical continuation of the RP idea is to group and aggregate your data over longer discrete time intervals (downsampling). In Influx this can be achieved with Continuous Queries (CQ). You can read more about data retention and downsampling here.
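As a rough sketch (the measurement "cpu" and field "usage_idle" are assumed Telegraf defaults, not taken from the question), a one-year RP plus an hourly downsampling CQ could look like this:

-- Keep raw data for one year; expired shards are then dropped automatically.
ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 52w SHARD DURATION 1w

-- A longer-lived policy to hold the downsampled data.
CREATE RETENTION POLICY "five_years" ON "telegraf" DURATION 260w REPLICATION 1

-- Continuously aggregate raw points into hourly means stored under the new policy.
CREATE CONTINUOUS QUERY "cq_cpu_hourly" ON "telegraf"
BEGIN
  SELECT mean("usage_idle") AS "usage_idle"
  INTO "telegraf"."five_years"."cpu_hourly"
  FROM "cpu"
  GROUP BY time(1h), *
END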
In conclusion, storage limitations are inevitable, and properly configured retention policies are the way to go.

Related

How to downsample data in AWS TimeStream

I understand AWS TimeStream allows data to be moved to different types of storage based on retention period but we also need data to be downsampled based on retention period.
For e.g.
48 hours, one second granularity
30 days, one minute granularity
10 years, one hour granularity
How can this be achieved?
I don't think Timestream currently supports that in storage. The nature of time-series databases is that you write once and change very, very seldom. So, in keeping with that intention, this kind of granularity change is something you'd do in the query, for example with the bin() function.
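As an illustrative sketch only (the database, table and measure names below are hypothetical), a Timestream query downsampling to one-minute granularity with bin() could look like:

-- Downsample to one-minute buckets at query time using bin().
SELECT bin(time, 1m) AS binned_time,
       avg(measure_value::double) AS avg_value
FROM "exampleDatabase"."exampleTable"
WHERE measure_name = 'cpu_utilization'
  AND time BETWEEN ago(30d) AND now()
GROUP BY bin(time, 1m)
ORDER BY binned_time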

How to release the deleted shard storage in InfluxDB?

I changed the retention policy of a database to keep data for only one day, and after that I dropped all shards created from the autogen RP, but InfluxDB still takes up a huge part of the storage in the /var/lib/influxdb/data/<DB NAME>/_series folder, and it keeps increasing.
How can I release the storage related to the deleted points?

Parse 100GB File Storage Limit

Hey guys, so I developed a social network on iOS and used Parse for the back end. Our app has taken off and over 50,000 images have been posted in ten days. Aside from hitting the 600 req/sec API limit soon, it appears we might fill up the 100GB storage limit even sooner. Does this limit (file storage) reset monthly, or once you hit 100GB are you done? It seems like a tiny amount of storage for a PaaS company.
According to the Parse.com website, you receive 2TB of file storage with any package, not 100GB. If you're asking whether they give you an additional 2TB each month, the answer is no. At the beginning of the next month you are still using the space; it does not reset (unlike, for example, bandwidth). This is the case with (probably) all cloud (SaaS, IaaS, PaaS, etc.) providers. You can increase the amount of file storage for 10c/GB per month.
As for database storage, it seems that 100GB is the hard limit. Again, being storage, you do not get an extra 100GB per month.
If your database is larger than 100GB and you are hitting more than 600 req/sec (averaged over a minute, i.e. 36,000 req/min), then you may want to consider building your own infrastructure, perhaps in AWS or similar, so you can scale it properly. You may also want to consider moving your uploaded images outside of the database if they are not there already - DB storage is considerably more expensive than file storage, both in terms of cost and performance.
Parse.com has larger plans available - up to a point.
HOWEVER - if you are going to be doing 600 requests a second (wow, since that's 50 MILLION requests a day) you'll need to look at two possibilities:
You can keep your requests under this limit by using local caching, streamlined calls, etc.
You will eventually need to migrate off of the Parse ecosystem.
If memory serves, there used to be an option to get a custom plan with more requests/second. It seems to have disappeared from the Pricing page, to be replaced with this:
What is the cost for an app with a burst limit above 600 requests per second? What happens if I require more than 600 requests/second?
We do not provide custom plans for apps that require more than 600 requests per second.
UPDATE: It looks like there is also a hard limit of 100GB of file storage...
The overage rate for database size is $10/GB but we only allow increases in increments of 20GB. When you exceed 20GB of database size we will increase your soft limit to 40GB and begin charging you an incremental $200/month. When you hit your soft limit of 40GB we will increase your soft limit to 60GB... and so on up to a hard limit of 100GB.

Automatically clear old data

Is it possible to automatically clear old data in InfluxDB? Let's say some configuration option to keep records for 1 month only? I store quite a lot of statistics on my server, so to prevent running out of free storage I would like to have such a feature.
Yes, it's simple: just add a retention policy, e.g. with a duration of 7 days (whole shards are dropped automatically once they fall outside it).
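A minimal InfluxQL sketch, assuming a database called "mydb" and the one-month retention asked for above:

-- Keep only the last 30 days of data; older shards are dropped automatically.
CREATE RETENTION POLICY "one_month" ON "mydb" DURATION 30d REPLICATION 1 DEFAULT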

Retention policy to aggregate several metrics with regular expression in graphite

We are storing metrics that have the build number in the metric name. Here is the format of the metric in graphite.
latency.<host>.<request>.<buildNumber>.average
The issue with the above format is that buildNumber is an ever-changing value, and in our case it changes every week because of the release cycle. This results in a new storage file (.wsp) every week, and since Whisper allocates space upfront, we never fully utilise the space because of the changing build number.
I know disk space is a cheap resource, but still, at some point I think we will have a lot of unused space.
For example, if each metric file is 10MB and we are sending 5000 different metrics for latency, then for a particular build number we will use up 50GB. Now if we send a new build number every week, 1TB of disk space will be filled in 20 weeks, which is roughly 5 months: (1TB = 1000GB) / (50GB per week) = 20 weeks.
The above problem could be solved if we could aggregate multiple metrics into one for anything older than, say, the last month. Is there any way of specifying a retention policy where multiple metrics are merged into one using some aggregation method?
Or is there any way for tackling this kind of problem in graphite?
If you use the Ceres storage engine for Graphite instead of using Whisper, you will avoid the problems of pre-allocation of space. https://github.com/graphite-project/ceres
I don't believe you can, during downsampling, merge multiple metrics with a specified aggregation. However, you can do this at the point of ingestion via aggregation-rules.conf. Documentation can be found here: http://graphite.readthedocs.org/en/latest/config-carbon.html#aggregation-rules-conf
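As a rough sketch only (this assumes the carbon-aggregator sits in front of carbon-cache, and the "all" segment in the output name is just an illustrative placeholder), a rule matching the naming scheme from the question might look like:

# aggregation-rules.conf format: output_template (frequency) = method input_pattern
# Average every build's latency series into one build-independent series every 60 seconds.
latency.<host>.<request>.all.average (60) = avg latency.<host>.<request>.*.average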