Is there a limit on the number of time series in InfluxDB? - influxdb

Hello, does anybody know, or have experience with, whether there is a limit on the number of time series in InfluxDB? I intend to create one time series per day with the same schema, for example timeserie_2014_12_19_wd5 (wd5 means weekday 5), because I see that InfluxDB can query with wildcards.
Is this a problem for scalability, management, or performance?
I intend to later build some continuous queries following the same pattern. I will probably end up with around 1000 time series per year, but I will be able to compact them.
Thanks.

InfluxDB v0.8.x doesn't have a hard limit on the number of series, and I've seen it run with more than 100K. v0.9.0 is in development now, with goals of significant increases in scalability and throughput.
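As for the wildcard part of the question, v0.8 lets you select from multiple series by putting a regex in place of the series name; for example, a sketch using the naming scheme above:

    select * from /^timeserie_2014_.*_wd5$/ limit 10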

Related

How to downsample data in AWS TimeStream

I understand AWS Timestream allows data to be moved to different types of storage based on retention period, but we also need data to be downsampled based on retention period.
For example:
48 hours, one second granularity
30 days, one minute granularity
10 years, one hour granularity
How can this be achieved?
I don't think Timestream currently supports that in storage. The nature of time-series databases is that you write once and change very seldom, so by design this kind of granularity change is done at query time, for example with the bin() function.
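For instance, a query-time downsampling sketch in Timestream's SQL (the database, table, and measure names here are placeholders):

    SELECT bin(time, 1m) AS minute, avg(measure_value::double) AS avg_value
    FROM "mydb"."metrics"
    WHERE time > ago(30d)
    GROUP BY bin(time, 1m)
    ORDER BY minute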

Does InfluxDB have a limit on the number of databases you can have?

We are perf testing a tool and generate a bunch of metrics per test run, but we want to keep each test run separate. A database per run would allow us to do that, and at the same time let us give the tools we create to customers, who would have only one install and thus need only one database. But we are talking hundreds of databases... granted, they should be small, as most would only hold a set of metrics covering a couple of hours. Will InfluxDB limit us, or will performance suffer significantly?
Memory/virtual memory appears to be the limiting factor.
On two 16 GB boxes I added 500 databases with data to one, and 500 sets of data to a single database on the other. The data was actually pretty small; the individual databases were 440 KB after being loaded.
Memory use with 500 databases was far higher:
500 dbs: 19.9 GB virtual, 3.3 GB resident
1 db: 2.5 GB virtual, 0.9 GB resident

Should I store a global counter or an aggregated value in a TSDB

This question is really about the data schema. I have a program which has a bunch of discrete events, and I want to get beautiful graphs out.
From what I understand, I should keep a cumulative counter of the number of events that have occurred and, on a regular interval, transfer that counter to the TSDB (as part of a cron job or similar).
What I currently have, though, is a system where the monitor, on a regular interval (a fixed, hard-coded value!), tells the TSDB how many events occurred during that interval.
Which of these two design patterns is better? What factors affect that decision? Do I have a counter value here, or is it just a measurement?
I have various concerns, including but not limited to the efficiency of the monitoring tool.
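To make the two patterns concrete, here is a minimal sketch (all names are made up for illustration):

    # Pattern A: cumulative counter. The stored value only ever grows; the
    # per-interval rate is derived at query/graph time (e.g. via a derivative).
    events_total = 0

    # Pattern B: per-interval count. The counter resets after each report, so
    # each stored point is already "number of events during this interval".
    events_this_interval = 0

    def on_event():
        global events_total, events_this_interval
        events_total += 1
        events_this_interval += 1

    def report(write_point):  # invoked every interval by a cron-like scheduler
        global events_this_interval
        write_point("events_total", events_total)                  # pattern A
        write_point("events_per_interval", events_this_interval)   # pattern B
        events_this_interval = 0

    if __name__ == "__main__":
        for _ in range(5):
            on_event()
        report(lambda name, value: print(name, value))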
You tagged the question with InfluxDB but it seems like what you are really asking about is the collection agent. For that I would look at Telegraf.
StatsD is also a great, lightweight API, now available for most major languages, with which you can efficiently emit different types of stats (counters, timings, etc.), either for every event or at a sample rate you define.
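For instance, a minimal sketch with the Python statsd client (the host, port, and metric names are placeholders):

    import statsd

    # UDP client pointed at a local StatsD (or Telegraf statsd) listener
    client = statsd.StatsClient("localhost", 8125)

    client.incr("app.events")              # count every event
    client.incr("app.events", rate=0.1)    # or sample 10% of them
    client.timing("app.request_ms", 42)    # emit a timing in milliseconds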
I implemented a solution that gathers metrics emitted from my app using StatsD, metrics that were pulled (JMX queries), and the basic host-level stats you get for free with Telegraf. Every host (30+) runs a single Telegraf instance which delivers its stats to a centralized InfluxDB server on some interval (e.g. 30 seconds).
So with an approach like that you get a good balance of performance and data precision.
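As a rough sketch, the Telegraf side of that setup looks something like this (the URL and database name are placeholders; see the Telegraf docs for the full set of options):

    [agent]
      interval = "30s"             # collect and report every 30 seconds

    [[inputs.statsd]]
      service_address = ":8125"    # listen for StatsD packets from the app

    [[outputs.influxdb]]
      urls = ["http://influxdb.example.com:8086"]
      database = "telegraf"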

Retention policy to aggregate several metrics with regular expression in graphite

We are storing metrics that have the build number in the metric name. Here is the format of a metric in Graphite:
latency.<host>.<request>.<buildNumber>.average
The issue with the above format is that buildNumber is an ever-changing value; in our case it changes every week because of the release cycle. This results in a new storage file (.wsp) every week, and since Whisper allocates space up front, we never fully utilize that space because of the changing build number.
I know disk space is a cheap resource, but at some point we will still have a lot of unused space.
For example, if each metric file is 10 MB and we send 5000 different latency metrics, then a single build number uses up 50 GB. If every week brings a new build number, then 1 TB of disk space fills up in (1 TB = 1000 GB) / (50 GB per week) = 20 weeks, which is roughly 5 months.
The above problem could be solved if we could merge multiple metrics into one after, say, the last month. Is there any way of specifying a retention policy where multiple metrics are merged into one using some aggregation method?
Or is there any other way of tackling this kind of problem in Graphite?
If you use the Ceres storage engine for Graphite instead of Whisper, you will avoid the problems of pre-allocating space: https://github.com/graphite-project/ceres
I don't believe you can, during downsampling, merge multiple metrics with a specified aggregation. However, you can do this at the point of ingestion via aggregation-rules.conf. Documentation can be found here: http://graphite.readthedocs.org/en/latest/config-carbon.html#aggregation-rules-conf
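For example, a rule along these lines (a sketch based on the metric format in the question; the "all" output name and 60-second frequency are assumptions) would fold every build number into a single series at ingestion time:

    latency.<host>.<request>.all.average (60) = avg latency.<host>.<request>.*.average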

How is one I/O measured in Google App Engine?

I am making my first app, and I am new to both SQL and GAE. Google Cloud SQL has tier "D0", which has an "included I/O per day" of 200k. I have an example; could you please explain how many I/Os this example uses?
Suppose I have a Cloud SQL table with 10 rows and 3 columns: "article name", "author", and "date of publishing", so there are 30 fields in total. When a user starts my app and requests the latest information, I want to send the user all 30 fields, which I can do with a single SQL query.
Is the execution of that query counted as thirty I/Os because 30 fields were transferred, or as one I/O because one SQL query was run?
Appreciate your help.
The pricing guide has this to say:
The number of I/O requests to storage made by your database instance depends on your queries, workload and data set. Cloud SQL will cache data in memory to serve your queries efficiently and to minimise the number of I/O requests.
In other words, it is neither of your two options: some queries may be served entirely from memory, generating no I/O, while others may generate many I/O requests. Optimising the database well with indexes will make your queries cheaper; running table scans over large tables will cost more.
In short, the same good-practice rules apply as for keeping a database fast on a local machine, except that skipping the optimisation won't just make your queries slower, it will also make them cost more.
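For instance (a sketch, with table and column names loosely based on the question), an index on the publish date lets a "latest articles" query avoid a full table scan:

    CREATE INDEX idx_published ON articles (date_of_publishing);

    SELECT article_name, author, date_of_publishing
    FROM articles
    ORDER BY date_of_publishing DESC
    LIMIT 10;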
The # of I/Os refers to disk operations. So that really depends on the query and the cached data.
