I have data from two releases recorded over different time intervals, but I want to plot both releases in Grafana over the same interval. Is it possible to fake the time interval and plot the graph that way? By default the x-axis is the time series, so I can't use any other parameter there.
Any suggestions?
I've created graphs like this, although it took a bit of work.
To start out, InfluxDB doesn't support timeShift yet as detailed here:
https://github.com/influxdata/influxdb/issues/142
So I used a separate HTTP server called the influxdb-timeshift proxy:
https://github.com/maxsivanov/influxdb-timeshift-proxy
My stack looked like this:
Grafana Dashboard --> influxdb-timeshift-proxy --> InfluxDB
Here are descriptions of the two "-->" in the above schematic:
The --> on the left: I created a Grafana datasource pointing to the TCP port of the influxdb-timeshift-proxy.
The --> on the right: the influxdb-timeshift-proxy startup configuration points to the InfluxDB server.
With this in place, to get the time shifting to happen, the SQL-like statements to InfluxDB need a carefully formatted field 'alias' like this:
"SELECT mean( "meanAT" ) AS shift_855296_seconds" blah blah sql blah.
See the influxdb-timeshift-proxy github page above for syntax details.
With a Grafana dashboard, to get two lines (aka series) on a time-series graph, I configure two SQL statements. The one above selects a test from 9-10 days ago; for the second series I'd SELECT a different test (my baseline, which I ran today) with a timeshift of 0:
"SELECT mean( "meanAT" ) AS shift_0_seconds" blah blah sql blah.
So that answers your question, but it is of limited use -- because some poor human has to calculate the difference between the test times and then dial the result (shift_855296_seconds) into the SQL in the dashboard.
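For illustration, here is that arithmetic as a minimal Python sketch. The two start times are hypothetical stand-ins for your real test runs; the alias format is the proxy's shift_<seconds>_seconds convention:

from datetime import datetime

# Hypothetical start times of the two test runs.
baseline_start = datetime(2017, 6, 20, 14, 0, 0)   # baseline run (today)
earlier_start  = datetime(2017, 6, 10, 16, 25, 4)  # run from ~10 days ago

# The difference in whole seconds becomes the shift alias.
shift = int((baseline_start - earlier_start).total_seconds())
alias = "shift_%d_seconds" % shift                 # -> shift_855296_seconds
print('SELECT mean("meanAT") AS %s ...' % alias)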
Why? Because, out of the box, Grafana dashboards execute SQL statements that are (mostly) hard-coded into the dashboard.
To get Grafana to execute SQL where the shift alias is dynamically generated, I wrote a Grafana scripted dashboard in JavaScript. Here are the high-level instructions for scripted dashboards:
http://docs.grafana.org/reference/scripting/
FYI, Grafana scripted dashboards are poorly documented, the 'development environment' for debugging is primitive at best, and I was unable to get the JavaScript 'require' mechanism (which pulls in third-party libraries) to work. But there is limited help on the Grafana discussion board, and it does actually work -- creating a very nice time-shifting dashboard on the fly is possible.
The HTTP URLs that launch/display the scripted dashboard can easily be embedded in some other dashboard you create. Just add your scripted dashboard URL to a Text Panel using markdown:
http://docs.grafana.org/features/panels/text/
Ultimately, the influxdb-timeshift-proxy is a stop-gap solution.
I have not tried it, but it looks like Kapacitor can also be used to provide the timeshifting, as described here:
https://docs.influxdata.com/kapacitor/v1.3/nodes/shift_node/
--Erik
Do you mean the X-Axis Mode option on the Graph panel?
Not sure if I understand your question correctly.
If you want to just mark your release - you can use Annotations - http://docs.grafana.org/reference/annotations/#influxdb-annotations.
If you want to show the dashboard only for a specific timeframe - you can encode that in the URL with the 'from' and 'to' parameters (epoch milliseconds) - https://[your-dashboard-url]?from=1488868669245&to=1488873058626
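If you generate such links, here is a small Python sketch for building one (the dashboard URL and the release window are hypothetical):

from datetime import datetime, timezone

def epoch_ms(dt):
    # Grafana's from/to URL parameters are epoch milliseconds (UTC).
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

start = datetime(2017, 3, 7, 6, 37, 49)   # hypothetical window start
end   = datetime(2017, 3, 7, 7, 50, 58)   # hypothetical window end
print("https://your-grafana/dashboard/db/releases?from=%d&to=%d"
      % (epoch_ms(start), epoch_ms(end)))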
But yes, there is currently no way to put anything other than time on the X-axis in Grafana.
Related
If I send a gauge to Prometheus, the payload has a value and a timestamp, like:
metric_name{label="value"} 2.0 16239938546837
If I query it in Prometheus, I can see a continuous line. When I stop sending payloads for that metric, the line stops. Sending the same metric again after some minutes, I get another continuous line, but it is not connected to the old one.
Is it fixed in Prometheus how long a time series lasts without getting an update?
I think the first answer by Marc is in a different context.
Any time series in Prometheus goes stale after 5 minutes by default if collection stops - https://www.robustperception.io/staleness-and-promql. In other words, the line stops on the graph (or in Grafana).
So if you resume metric collection within 5 minutes, the line will be connected by default. But if there is no collection for more than 5 minutes, the graph will show a gap. You can tweak Grafana to ignore the gaps, but that is not ideal in some cases, as you usually do want to see when collection stopped instead of getting the false impression that it was continuous. Alternatively, you can bridge the gaps with functions like avg_over_time(metric_name[10m]) as needed.
There are two questions here:
1. How long does Prometheus keep the data?
This depends on your storage configuration. By default, on local storage, Prometheus has a retention of 15 days. You can find out more in the documentation, and you can change this value with the --storage.tsdb.retention.time option (e.g. --storage.tsdb.retention.time=30d).
2. When will I have a "hole" in my graph?
The line you see on a graph is made by joining the points from each scrape. Those scrapes are done regularly, based on the scrape_interval value in your scrape_config. So basically, if you have no data for one scrape, you'll have a hole.
So there is no definitive answer; it depends essentially on your scrape_interval.
Note that if you're using a function that evaluates metrics over a period of time, missing one scrape will not alter your graph. For example, using rate(metric[5m]) will not alter your graph if you scrape every 1m, as you'll still have 4 other samples to compute the rate from.
I have an API that fetches data packets from different servers and formats them into small JSON units. I wrote an algorithm that sends these to Graphite with the json2graphite command.
The sending works very well, and the incoming data doesn't look bad either.
Now the problem:
The data displayed in Graphite shows that each entry is followed by a null, so data points that should be connected appear as isolated dots.
I am aware that the points can be connected using a function provided by the Graphite interface, but this doesn't help, because the Grafana panels still jump back and forth between value and null.
Is there a way to tell Grafana that it should only go to null if there was no data for more than 1 minute or so?
I already tried to fix the problem with the data from "storage-schemas.conf" and "storage-aggregation.conf". Unfortunately without success.
storage-schemas.conf:
[default_1min_for_1day]
pattern = .*
retentions = 10s:6h,30s:8d,1m:31d,10m:1y,1h:5y
storage-aggregation.conf:
[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average
If you want to know any more, ask me. : )
Grafana has an option to connect datapoints that are separated by nulls (the Graph panel's "Null value: connected" setting). You can see how to enable it in the screenshot under the Display Styles settings in Grafana's documentation.
In the Graphite composer you can do the same by selecting the connected line mode under Graph options.
Additionally, you could use Graphite's keepLastValue function to carry the last received value over gaps where there are nulls, e.g. keepLastValue(your.metric.path, 6); the optional second argument caps how many consecutive missing points are bridged.
I haven't found a direct solution, but I will now try to minimize the interval between the entries. I noticed that the requests take much too long: 2-5 minutes.
There are probably too many servers, so the requests block the port for too long.
The problem is not solved yet, but I think I will mark it as solved if nothing better comes up within 5 days.
I'm trying to figure out the best, or at least a reasonable, approach to defining alerts in InfluxDB. For example, I might use the CPU batch TICKscript that comes with Telegraf. This could be set up as a global monitor/alert for all hosts monitored by Telegraf.
What is the approach when you want to deviate from that setup for one host, i.e. instead of X% for a specific server we want to alert on Y%?
I'm happy that a distinct TICKscript could be created for the custom values, but how do I go about excluding that host from the original 'global' one?
This is a simple scenario, but the approach needs to scale to 10,000 hosts with hundreds of exceptions, across tens or hundreds of global alert definitions.
I'm struggling to see how you could use the platform as the primary source of monitoring/alerting.
As said in the comments, you can use the sideload node to achieve that.
Say you want to ensure that your InfluxDB servers are not overloaded. You may want to allow 100 measurements by default; only on one server, which happens to receive a massive number of datapoints, do you want to limit it to 10 (a value the _internal database easily exceeds, but good enough for this example).
Given the following excerpt from a TICKscript:
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    // Expose the numMeasurements field under the name 'value'.
    |eval(lambda: "numMeasurements")
        .as('value')

var customized = data
    // Load per-host overrides from YAML files; the file name is derived
    // from the hostname tag. Hosts without a file keep the default of 100.
    |sideload()
        .source('file:///etc/kapacitor/customizations/demo/')
        .order('hosts/host-{{.hostname}}.yaml')
        .field('maxNumMeasurements', 100)
    |log()

var trigger = customized
    // Alert when the value exceeds the (possibly overridden) limit.
    |alert()
        .crit(lambda: "value" > "maxNumMeasurements")
and with the exceptional server being named influxdb and the file /etc/kapacitor/customizations/demo/hosts/host-influxdb.yaml looking as follows:
maxNumMeasurements: 10
A critical alert will be triggered if value (and hence numMeasurements) exceeds 10 AND the hostname tag equals influxdb, OR if value exceeds 100 on any other host.
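To make the lookup concrete, the sideload source directory is laid out like this (the second file is hypothetical, shown only to illustrate the per-host pattern from the .order() line):

/etc/kapacitor/customizations/demo/
    hosts/
        host-influxdb.yaml      (contains: maxNumMeasurements: 10)
        host-<hostname>.yaml    (one optional override file per host)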
There is an example in the documentation of handling scheduled downtimes using sideload.
Furthermore, I have created an example, available on GitHub, using docker-compose.
Note that there is a caveat with the example: the alert flaps because of a second, dynamically generated database. But it should be sufficient to show how to approach the problem.
What is the cost of using sideload nodes in terms of performance and computation if you have over 10 thousand servers?
Managing alerts manually in Chronograf/Kapacitor is not feasible for a large number of custom alerts.
At AMMP Technologies we need to manage alerts per database, customer, and customer object; the number can go into the thousands. We've opted for a custom solution where we keep a standard set of template tickscripts (not to be confused with Kapacitor templates) and provide an interface that exposes only the relevant variables to the user. A service (written in Python) then combines the values of those variables with a tickscript and, using the Kapacitor API, deploys (updates, or deletes) the task on the Kapacitor server. This is automated, so that data for new customers/objects is combined with the templates and deployed to Kapacitor automatically; a sketch of such a deployment call follows.
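A minimal Python sketch of that deployment step, assuming Kapacitor's default address and its v1 task API; the task id, database names, and the rendered script are hypothetical:

import requests

KAPACITOR = "http://localhost:9092"  # assumption: default Kapacitor address

def deploy_task(task_id, script, db="telegraf", rp="autogen"):
    # Create or update a Kapacitor task from a rendered tickscript.
    body = {
        "id": task_id,
        "type": "stream",
        "dbrps": [{"db": db, "rp": rp}],
        "script": script,
        "status": "enabled",
    }
    # Try to update an existing task first; fall back to creating it.
    r = requests.patch("%s/kapacitor/v1/tasks/%s" % (KAPACITOR, task_id), json=body)
    if r.status_code == 404:
        r = requests.post("%s/kapacitor/v1/tasks" % KAPACITOR, json=body)
    r.raise_for_status()

# Hypothetical rendered template with a single per-customer threshold:
script = 'stream|from().measurement("cpu")|alert().crit(lambda: "usage_idle" < 10)'
deploy_task("cpu-alert-customer-42", script)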
You obviously need to design your tasks to be specific enough so that they don't overlap and generic enough so that it's not too much work to create tasks for every little thing.
I have an elasticsearch instance that receives logs from multiple backup routines. I'd like to query ES for these logs from Grafana and set up a panel that shows the last time for the different backups. Ideally I would also like to be able to show this in color if the time is longer than a certain threshold.
Basically the idea is to have a display that shows, for instance, green if a certain backup has been completed in the last 24 hours, and red if it hasn't.
How would I do this in Grafana with ES as the datasource?
The exact implementation depends on the panel used.
Example for singlestat: write your ES query and then select Stat: Time of last point; you may need to select a suitable unit/format.
Unfortunately, Grafana doesn't understand thresholds in your requested time format (older than 24 hours). You will need to return the value as a metric (for example, the age of the last backup in seconds), which means writing a query for that. You would then have two stats to show (last time + age), so you won't be able to use singlestat. The table panel will probably be better - there you can apply thresholds based on the age metric.
In addition to the great answer by Jan Garaj, it looks like there is work being done to make this type of thing much easier in the future.
Check out this issue to track progress.
I have the following use case:
We have a system that computes different response time metrics for messages that we want to insert in InfluxDB. This system writes JSON entries to a file.
We use Telegraf with the JSON plugin to extract the fields we want and insert them into InfluxDB.
So far so good.
But we have an issue with one particular piece of information.
The system will emit messages where mId is the unique identifier; in the examples below there are two, uuidXXXX and uuidYYYY:
{"meta1":"value", "mId":"uuidXXXX", "resTime1":1232332233, "timeWeEnterBus":startTimestamp}
{"meta1":"value2", "mId":"uuidYYYY", "resTime1":1232331111, "timeWeEnterBus":startTimestamp}
{"meta1":"value", "mId":"uuidXXXX", "resTime1":1232332233, "timeWeExitBus":endTimestamp}
{"meta1":"value2", "mId":"uuidYYYY", "resTime1":1232331111, "timeWeExitBus":endTimestamp}
And what we want here is to graph timeInBus, which is equal to timeWeExitBus - timeWeEnterBus, for each unique mId.
So my questions are:
IMU, uuid would be a field, not a tag, as it is unbounded; the same goes for timeWeExitBus and timeWeEnterBus, which would be numeric fields since we want to use functions on them. And timeInBus would be the measurement. Am I right?
Is this a good use case for Influx/Telegraf, or are we misusing them? IMU, it doesn't look like a good idea to compute this on the Telegraf side, but I don't see how to do it in InfluxDB either; I initially thought the ELAPSED function could help, but I ended up thinking it doesn't work here.
If it's a good use case, could you point me to documentation that would help implement this?
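(For reference -- and not as an InfluxDB answer -- here is the computation being described, sketched in plain Python; the input file name is hypothetical and the field names come from the JSON examples above:)

import json

# Pair the "enter" and "exit" records by their unique mId and take the
# difference to get timeInBus per message.
enter_ts, exit_ts = {}, {}
with open("messages.json") as f:           # hypothetical input file
    for line in f:
        rec = json.loads(line)
        if "timeWeEnterBus" in rec:
            enter_ts[rec["mId"]] = rec["timeWeEnterBus"]
        elif "timeWeExitBus" in rec:
            exit_ts[rec["mId"]] = rec["timeWeExitBus"]

# timeInBus = timeWeExitBus - timeWeEnterBus for every mId seen on both sides.
time_in_bus = {mid: exit_ts[mid] - enter_ts[mid]
               for mid in enter_ts.keys() & exit_ts.keys()}
print(time_in_bus)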