Is it possible in InfluxDB to merge older shard groups

I'm using InfluxDB as a database for my home automation. It seems that I did something wrong a year ago with my shard group duration, so I ended up with a shard group for every hour. I have changed the duration now, but I have thousands of files covering the last year.
I'm looking for a way to merge older shard groups so that I keep my data but reduce the number of files. I have searched but haven't found a solution so far.
I have now set the shard group duration to one week, but the old shard groups are still there, each covering only one hour.
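For context, the statement I ran to fix it going forward looked roughly like this (the database and retention policy names here are placeholders, not my real setup):

ALTER RETENTION POLICY "autogen" ON "home_automation" SHARD DURATION 1w

As far as I understand, this only applies to shard groups created from now on, which is why the old one-hour shard groups are still lying around.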

Related

Scheduling a Cognos report based on Dates from a data warehouse table

We have a report already written for Student Services, but we need to schedule it for specific times in the term; these times come from the date table in our data warehouse. For example, we need it on the first day of the term (one of the MANY dates defined in our date table), and two weeks prior to the first day of the term. If the current date is either of these dates, the report should run; otherwise not. Should I use trigger-based Cognos reporting? Is there a way to do it in regular Cognos scheduling? Should I schedule it out of an external (Oracle) stored procedure?
We were able to set up Event Studio to run a daily check for whether it is 14 days before the start of term (we had to add that to the date table) or 2 weeks after the start of term (also in our date table). We set up the run condition, set up tasks for the required reports, then set up the email. We could not set up Run Agent in Event Studio (IBM was singularly unhelpful here), so we scheduled it in Cognos. It runs like a charm.

Youtube Data API reduced to Zero Queries per day (Audit / Form)

I have a project on Google Cloud Platform using the YouTube Data v3 API. Everything was going well; earlier this month, after receiving several emails alerting me that I had to complete an audit, the queries for the day stopped completely. They were zeroed out.
I followed the link to perform the audit and successfully completed all the changes that were requested for my application, strictly following the regulations. The audit went well; no further changes were required from me.
But the issue is that the queries per day remain at zero, and I can't edit the limit. It occurred to me that maybe using the Google Cloud trial would change something. Negative. I'm still unable to increase the limits, not even using the credit they give you as a gift.
The project used approximately 25,000 to 300,000 queries/day. I have requested 500,000 queries/day by filling in the quota extension form, to have a little more margin.
Meanwhile, the project has been stopped for almost a month. If anyone knows anything about this or how I should proceed, please let me know.
Thank you very much.
Have a nice day,

DB folder utilising lot of space creating space issue

I have a Grafana Windows server where we have integrated Hyper-V snapshot related information as well as CPU and memory usage of the HVs, etc. I can see the folder below on our Grafana Windows server:
C:\InfluxDB\data\telegraf\autogen
Under this autogen folder, I can see multiple subfolders with .tsm files. A new one is created every 7 days, and each is around 4 to 5 GB in size. There are many files in this autogen folder from 2 Feb 2017 to 14 Mar 2018, using around 225 GB of space.
What you see:
autogen is a default Retention Policy (RP) auto-created by InfluxDB, and it has an infinite retention duration. All datapoints in InfluxDB are logically stored in shards. Physically, shard data is compressed and stored in .tsm files. Shards are grouped into shard groups. Each shard group covers a specific time range defined by the so-called shard duration and stores the datapoints belonging to that time interval. By default, for an RP with a retention duration greater than 6 months, the shard group duration is set to 7 days.
For more info see the docs on the storage engine.
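If you want to see what is currently there, InfluxQL lets you list retention policies and shards; assuming the database is the telegraf one from the path above:

SHOW RETENTION POLICIES ON "telegraf"
SHOW SHARDS

The shard listing shows each shard's retention policy and start/end time, which makes the 7-day grouping visible.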
Regarding your questions:
"Is there anyway we can shrink the size of autogen file?"
Probably not. The only thing you can do is rely on InfluxDB's internal compression. Here they say that it may improve if you increase the shard duration.
Note, however, that because InfluxDB drops whole shards rather than individual datapoints, increasing the shard duration means your data will be stored until the whole shard falls outside the current retention duration, and only then will it be dropped. If you have an infinite retention duration this doesn't matter anyway. This leads us to the second question.
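If you do decide to experiment with a larger shard duration, the change is a one-liner (a sketch; the 4-week value is just an example, and it only affects shard groups created after the change):

ALTER RETENTION POLICY "autogen" ON "telegraf" SHARD DURATION 4w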
"Is it possible to delete the old file under autogen folder?"
If you can afford losing old data, or cannot afford that much storage space, InfluxDB lets you specify a data Retention Policy (RP), already mentioned above. Basically, all your measurements are associated with a specific RP, and data is deleted as soon as its retention duration ends. So if you specify an RP of 1 year, InfluxDB will automatically delete all datapoints older than now() - 1 year. An RP is the standard (and pretty obvious) way of dealing with storage issues. A logical continuation of the RP idea is to group and aggregate your data over longer discrete time intervals (downsampling). In InfluxDB this can be achieved with Continuous Queries (CQ). You can read more about data retention and downsampling here.
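As a rough sketch of both ideas combined (the 52-week duration, the long_term RP name and the Telegraf cpu measurement with its usage_idle field are assumptions to adapt to your data):

ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 52w SHARD DURATION 7d
CREATE RETENTION POLICY "long_term" ON "telegraf" DURATION INF REPLICATION 1
CREATE CONTINUOUS QUERY "cq_cpu_hourly" ON "telegraf" BEGIN SELECT mean("usage_idle") AS "usage_idle" INTO "telegraf"."long_term"."cpu_hourly" FROM "telegraf"."autogen"."cpu" GROUP BY time(1h), * END

Raw data is then dropped after a year, while the hourly rollups in long_term are kept forever.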
In conclusion, storage limitations are inevitable, and properly configured retention policies are the way to go.

Automatically clear old data

Is it possible to automatically clear old data in InfluxDB? Say, some configuration option to keep records for 1 month only? I store quite a lot of statistics on my server, so to keep from running out of free storage I would like to have such a feature.
Yes, it's simple: just define a retention policy with the duration you want, for example 7 days.
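For the 1 month mentioned in the question, a minimal sketch would be (the database name mydb is a placeholder):

CREATE RETENTION POLICY "one_month" ON "mydb" DURATION 30d REPLICATION 1 DEFAULT

Data written under this policy is then dropped automatically once it is older than 30 days.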

Retention policy to aggregate several metrics with regular expression in graphite

We are storing metrics having build number in the metric name. Here is the format of the metric in graphite.
latency.<host>.<request>.<buildNumber>.average
The issue with the above format is that buildNumber is an ever-changing value, and in our case it changes every week because of the release cycle. This results in a new storage file (.wsp) every week, and since Whisper allocates space up front, we never fully utilize the space because of the changing build number.
I know disk space is a cheap resource, but still, at some point I think we will have a lot of unused space.
For example, if each metric file is 10 MB and we are sending 5000 different latency metrics, then for a particular build number we will use up 50 GB. Now, if every week we send a new build number, then 1 TB of disk space will be filled in 20 weeks, which is roughly 5 months: (1 TB = 1000 GB) / (50 GB per week) = 20 weeks.
The above problem could be solved if we could aggregate multiple metrics into one after the last month. Is there any way to specify a retention policy where multiple metrics are merged into one using some aggregation method?
Or is there any other way to tackle this kind of problem in Graphite?
If you use the Ceres storage engine for Graphite instead of using Whisper, you will avoid the problems of pre-allocation of space. https://github.com/graphite-project/ceres
I don't believe you can, during downsampling, merge multiple metrics with a specified aggregation. However, you can do this at the point of ingestion via aggregation-rules.conf. Documentation can be found here: http://graphite.readthedocs.org/en/latest/config-carbon.html#aggregation-rules-conf
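For this case, a rule along the following lines would collapse the per-build series into one series per host and request at ingestion time (the 60-second frequency and the literal "all" segment are illustrative choices, and carbon-aggregator has to sit in front of carbon-cache for the rule to apply):

latency.<host>.<request>.all.average (60) = avg latency.<host>.<request>.*.average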
