I like some of InfluxDB's functions, which is why I would like to use it instead of just MySQL etc.
In my case I need to pull from the DB exactly the same time series I pushed into it; any difference between what I put in and what I get back is considered data corruption.
Is it possible to disable downsampling in InfluxDB?
As per the documentation, there are features called Continuous Queries (CQ) and Retention Policies (RP), but they appear to be mandatory rather than optional. Am I right, or is there a way to turn these things off?
Is there any other time series database that supports statistical functions and works with Grafana but does not have downsampling (or makes it optional)?
Continuous Queries (CQ) and Retention Policies (RP) are optional. You don't need to use them. You can use the default retention policy named autogen, which has infinite retention, so you can keep data at its original granularity forever (or at least until you reach some resource limit - disk/memory/response times/...).
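For example, a minimal sketch with the influxdb Python client (the database name mydb and measurement sensor_temp are just placeholders): points written to autogen stay at full resolution and can be read back unchanged, since no CQ or extra RP is defined.

```python
from influxdb import InfluxDBClient  # pip install influxdb (1.x client)

# Assumes a local InfluxDB 1.x instance; "mydb" is a placeholder name.
client = InfluxDBClient(host="localhost", port=8086, database="mydb")
client.create_database("mydb")

points = [{
    "measurement": "sensor_temp",
    "tags": {"sensor": "s1"},
    "time": "2023-01-01T00:00:00Z",
    "fields": {"value": 21.5},
}]
# Written into the default autogen retention policy (infinite duration).
client.write_points(points)

# Read back exactly what was written - no downsampling happens unless you
# define Continuous Queries or shorter Retention Policies yourself.
result = client.query('SELECT * FROM "sensor_temp"')
print(list(result.get_points()))
```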
We know that Prometheus has three phases of data storage:
In-memory: this is where the most recent data is stored. It allows for fast queries using PromQL since it is held in RAM. [Am I wrong?]
After a few hours the in-memory data is persisted to disk in the form of blocks.
After the local retention period is over, data is stored in remote storage.
I wanted to ask whether it is efficient to query the data stored in remote storage. If I need to monitor a lot of metrics for my org, do I need Grafana Mimir, which handles up to 1 billion active series?
Also, as a side question, how many MBs/GBs of metrics can Prometheus store before the retention period is over?
Sparingly, yes. Prometheus won't like it if you try to query over a few years, for example, since it will have to go to storage for everything, but fetching an hour of metrics from storage is easy and won't be a problem.
How many MBs/GBs of metrics can Prometheus store? It's irrelevant. The retention period is independent of the amount of data stored. You can store 100MB in a day or 100GB in a day; it doesn't matter. What will matter is cardinality.
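If you want to keep an eye on cardinality, one option is to ask Prometheus itself for its prometheus_tsdb_head_series self-monitoring metric over the HTTP API. A minimal sketch, assuming Prometheus is reachable on localhost:9090:

```python
import requests

# Ask Prometheus how many active (in-head) series it is currently tracking.
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "prometheus_tsdb_head_series"},
)
resp.raise_for_status()

# Print each returned sample: labels plus [timestamp, value].
for sample in resp.json()["data"]["result"]:
    print(sample["metric"], sample["value"])
```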
I am wondering what the minimum time is for Prometheus' scrape_interval parameter. According to the Prometheus documentation, the value for this parameter needs to follow a regex, which seems to me to allow only intervals equal to or greater than 1 second, since, e.g., "1ms" or "0.01s" do not match this regex. In my application, however, I would like to scrape at millisecond intervals, so I am interested in whether this is possible with Prometheus.
Many thanks in advance!
According to the Prometheus documentation, the minimum value you can give for scrape_interval seems to be 0 (based on the regex given in the docs).
Regex - ((([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?|0)
According to this regex, you can specify scrape_interval in milliseconds as well, but you may need to write it as 0s1ms. This is because if you specify the time as 1ms, the 1m part could be matched as minutes and the remaining s would cause an error (I didn't really test this scenario, but it looks like the expected outcome from reading the regex).
While Prometheus supports scrape intervals smaller than one second as described in this answer, it isn't recommended to use scrape_interval values smaller than one second because of the following issues:
Non-zero network delays between Prometheus and scrape target. These delays are usually in the range 0.1ms - 100ms depending on the distance between Prometheus and scrape target.
Non-zero delays in scrape target's response handler, which generates the response for Prometheus.
These non-deterministic delays can introduce large relative errors in scrape timings when scrape_interval is smaller than one second.
Too small scrape_interval values may also result in scrape errors if the target cannot be scraped within the configured scrape interval. In this case Prometheus stores an up=0 metric for every unsuccessful scrape. See these docs about the up metric.
P.S. If you need to store high-frequency samples in time series, then it would be better to push these samples directly to a monitoring system that supports push protocols for data ingestion. For example, VictoriaMetrics supports popular push protocols such as Influx, Graphite, OpenTSDB, CSV, DataDog etc. - see these docs for details. It supports timestamps with millisecond precision. If you need even higher precision for timestamps, then take a look at InfluxDB - it supports timestamps with nanosecond precision. Note that very high timestamp precision usually leads to increased resource usage - disk space, RAM, CPU.
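As a rough illustration, here is a minimal sketch of pushing a millisecond-precision sample over the Influx line protocol with Python. The URL assumes a local single-node VictoriaMetrics instance on its default port 8428 (for InfluxDB 1.x the endpoint would be /write?db=... on port 8086), and the measurement/tag names are placeholders:

```python
import time
import requests

# One sample in Influx line protocol: measurement,tags fields timestamp.
# Line-protocol timestamps default to nanoseconds, so a millisecond-precision
# timestamp is multiplied by 1_000_000.
now_ms = int(time.time() * 1000)
line = f"cpu_usage,host=server01 value=42.5 {now_ms * 1_000_000}"

# Assumed endpoint: VictoriaMetrics accepts Influx line protocol on /write.
resp = requests.post("http://localhost:8428/write", data=line)
resp.raise_for_status()
```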
Disclosure: I work on VictoriaMetrics.
I'm looking into our company using Prometheus to gather stats from our experiments which run on Kubernetes. There's a plan to use labels to mark the name of specific experiments in our cloud / cluster. This means we will generate a lot of labels which will hog storage over time. When the associated time series have expired, will the labels also be deleted?
tldr; From an operational perspective, Prometheus does not differentiate between time-series names and labels; by deleting your experiment data you will effectively recover the labels you created.
What follows is only relevant to Prometheus >= 2.0
Prometheus stores a times series for each unique combination of metric name, label, and label value. So my_metric{my_tag="a"}, my_metric{my_tag="b"}, and your_metric{} are all just different time series; there is nothing special about labels or label values vs. metrics names.
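To make that concrete, here is a small sketch using the prometheus_client Python library (the metric and label names mirror the illustrative ones above); every distinct label value becomes its own series on the /metrics endpoint:

```python
from prometheus_client import Counter, start_http_server

# Illustrative metric and label names, matching the examples above.
my_metric = Counter("my_metric", "Example counter", ["my_tag"])

# Each distinct label value creates a separate time series, exposed as
# my_metric_total{my_tag="a"} and my_metric_total{my_tag="b"}.
my_metric.labels(my_tag="a").inc()
my_metric.labels(my_tag="b").inc()

# Expose them for Prometheus to scrape on http://localhost:8000/metrics.
start_http_server(8000)
```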
Furthermore, Prometheus stores data in 2-hour blocks on disk. So any labels you've created do not affect the operation of your database after two hours, except for on-disk storage size, and query performance if you actually access that older data. Both of these concerns go away once your data is purged. Experiment away!
Hello, I really want to use Bosun/tsdbrelay/OpenTSDB with the Telegraf collector, as it gets all the metrics we want to monitor out of the box.
I already have a small setup that pushes metrics from 5 servers to Bosun for indexing and OpenTSDB for storage.
I used the HAProxy configs from Kyle Brandt's Bosun infrastructure blog to make the TSDBs HA-ready,
but Bosun shows that it cannot use the auto type for metrics, and the primary stats view also does not show any graphs for CPU/memory etc.
What can I provide so that the graphs show up?
Kind regards.
Both of these features are mostly scollector-specific. The "host" view (I've considered ripping that out; it was done in the early days, and it's better to use something like Grafana) depends on scollector-specific metrics such as os.cpu.
As for "Auto" for rate vs. gauge, that is also metadata that comes from scollector and is sent to Bosun. If you want to try to mimic the behavior, see https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L30 and https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L195 - you would need to create at least the "rate" key for each metric you are getting from Telegraf.
This is in the context of a small data-center setup where the number of servers to be monitored is only in the double digits and may grow only slowly to a few hundred (if at all). I am a Ganglia newbie and have just finished setting up a small Ganglia test bed (and have been reading and playing with it). A couple of things I realise:
gmetad supports interactive queries on port 8652, with which I can get subsets of the metric data - say, the data of a particular metric family in a specific cluster
gmond seems to always return the whole dump of data for all metrics from all nodes in a cluster (when doing 'netcat host 8649')
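To illustrate the second point, a minimal sketch of pulling that dump programmatically and filtering it on the client side (the host name and metric name are placeholders; 8649 is gmond's default XML port):

```python
import socket
import xml.etree.ElementTree as ET

# gmond dumps its full XML state to anyone who connects to its TCP port.
def fetch_gmond_dump(host="gmond-host", port=8649):
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

# Client-side filtering: keep only one metric per host from the full dump.
root = ET.fromstring(fetch_gmond_dump())
for host in root.iter("HOST"):
    for metric in host.iter("METRIC"):
        if metric.get("NAME") == "cpu_user":
            print(host.get("NAME"), metric.get("NAME"), metric.get("VAL"))
```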
In my setup, I don't want to use gmetad or RRD. I want to fetch data directly from the multiple gmond clusters and store it in a single data store. There are a couple of reasons not to use gmetad and RRD:
I don't want multiple data stores in the whole setup. I can have one dedicated machine fetch data from the few clusters and store it.
I don't plan to use gweb as the data front end. The data from Ganglia will be fed into a different monitoring tool altogether. With this setup, I want to eliminate the latency that another layer of gmetad could add. That is, if gmetad polls say every minute and my management tool polls gmetad every minute, that adds up to 2 minutes of delay, which I feel is unnecessary for a relatively small/medium-sized setup.
There are a couple of problems with this approach for which I need help:
I cannot get filtered data from gmond. Is there some plugin that can help me fetch individual metric/metric-group information from gmond (since different metrics are collected at different intervals)?
gmond output is very verbose text. Is there some other (hopefully binary) format that I can configure for export?
Is my idea of eliminating gmetad/RRD completely a very bad idea? Has anyone tried this approach before? What should I be careful of in doing so, from a data collection standpoint?
Thanks in advance.