I'm using the InfluxData stack (InfluxDB and Kapacitor) for anomaly detection in time series data.
I collected some open-source sample data and wrote the following TICKscript to detect anomalies:
batch
    .query('select mean(value) from "nycTaxi"."default"."nycTaxi"')
        .period(1h)
        .every(2h)
        .groupBy(time(1h))
    .mapReduce(influxql.percentile('mean', 90.0))
    .eval(lambda: sigma("percentile"))
        .as('sigma')
        .keep('percentile', 'sigma')
    .alert()
        .warn(lambda: "sigma" > 2.0)
        .log('/path/alerts.log')
        .crit(lambda: "sigma" > 3.0)
        .log('/path/alerts.log')
It produces alerts like the following:
{
  "id": "nycTaxi:nil",
  "message": "nycTaxi:nil is WARNING",
  "time": "2016-09-13T14:43:21.892057062Z",
  "level": "WARNING",
  "data": {
    "series": [
      {
        "name": "nycTaxi",
        "columns": ["time", "percentile", "sigma"],
        "values": [
          ["2016-09-13T14:43:21.892057062Z", 1279, 2.002345963142575]
        ]
      }
    ]
  }
}
To record the data I used this command:
kapacitor record batch -start 2014-07-01T00:00:00Z -stop 2015-02-31T00:00:00Z -name nyc
For some reason Kapacitor reports the time as a 2016 date, even though the most recent date in the DB is 2015-01-31. Why does this happen?
InfluxDB feeds Kapacitor with data more or less in real time. Kapacitor is meant as a live analysis and alerting tool, not as a way to walk backwards through all your historical data.
Your current query just looks at the most recent data (the last 1h), which is why you're seeing 2016 in there. That is by design. If you want to check for anomalies in your historical data, you will have to write a small program (for example using an InfluxDB client library for the language of your choice) that goes through your old data hour by hour, fetches it, and analyzes it. You could also perhaps use backfills for this.
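As a rough illustration of the hour-by-hour approach, here is a minimal Python sketch that only generates the hourly windows and the matching InfluxQL query strings. The helper names (hourly_windows, build_query) are made up for this example, and actually sending each query to InfluxDB is left to whichever client library you use.

```python
from datetime import datetime, timedelta, timezone


def hourly_windows(start, stop):
    """Yield (window_start, window_end) pairs covering [start, stop) hour by hour."""
    step = timedelta(hours=1)
    current = start
    while current < stop:
        yield current, min(current + step, stop)
        current += step


def build_query(window_start, window_end):
    """Build an InfluxQL query for the mean value inside one window."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        'SELECT mean("value") FROM "nycTaxi"."default"."nycTaxi" '
        f"WHERE time >= '{window_start.strftime(fmt)}' "
        f"AND time < '{window_end.strftime(fmt)}'"
    )


# Example range: the first three hours of the recorded data.
start = datetime(2014, 7, 1, tzinfo=timezone.utc)
stop = datetime(2014, 7, 1, 3, tzinfo=timezone.utc)
queries = [build_query(a, b) for a, b in hourly_windows(start, stop)]
for q in queries:
    print(q)  # each query would be sent to InfluxDB with a client library
```

The result of each query can then be fed to whatever anomaly check you prefer, one window at a time.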
I posted an issue in the Kapacitor repo, and the solution to my problem was to replay the data with the following command:
kapacitor replay -id RECORDING_ID -name nyc -fast -rec-time
The key here is the -rec-time flag, which solved the issue.
Kudos to nathanielc, who solved the issue.
If I send a gauge to Prometheus then the payload has a timestamp and a value like:
metric_name {label="value"} 2.0 16239938546837
If I query it in Prometheus I can see a continuous line. If I stop sending a payload for the same metric, the line stops. If I send the same metric again after some minutes, I get another continuous line, but it is not connected to the old line.
Is there a fixed limit in Prometheus on how long a time series lasts without getting an update?
I think the first answer by Marc is in a different context.
Any time series in Prometheus goes stale after 5m by default if the collection stops - https://www.robustperception.io/staleness-and-promql. In other words, the line stops on the graph (in Prometheus or in Grafana).
So if you resume the metrics collection within 5 minutes, the line will be connected by default. But if there is no collection for more than 5 minutes, the graph will show a disconnect. You can tweak Grafana to ignore drops, but that is not ideal in some cases, as you do want to see when the collection stopped instead of giving the false impression that there was continuous collection. Alternatively, you can avoid the disconnect using functions like avg_over_time(metric_name[10m]) as needed.
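To make the 5-minute rule concrete, here is a simplified Python sketch that groups sample timestamps into the line segments a graph would show. It only models the gap check; the function name and the sample data are invented for the example, and real Prometheus staleness handling has more details (such as explicit staleness markers).

```python
STALENESS_SECONDS = 5 * 60  # default staleness window mentioned above


def split_into_segments(timestamps, max_gap=STALENESS_SECONDS):
    """Group sample timestamps (in seconds) into runs with no gap larger than max_gap."""
    segments = []
    for ts in sorted(timestamps):
        if segments and ts - segments[-1][-1] <= max_gap:
            segments[-1].append(ts)  # close enough: the line continues
        else:
            segments.append([ts])  # gap too large: a new, disconnected line starts
    return segments


# Samples every minute, then a 12-minute pause, then two more samples.
samples = [0, 60, 120, 180, 900, 960]
segments = split_into_segments(samples)
print(segments)
```

With these samples the pause is longer than five minutes, so the sketch returns two separate segments, which corresponds to the two disconnected lines on the graph.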
There are two questions here:
1. How long does Prometheus keep the data?
This depends on your storage configuration. By default, on local storage, Prometheus has a retention of 15 days. You can find out more in the documentation. You can also change this value with the option --storage.tsdb.retention.time.
2. When will I have a "hole" in my graph?
The line you see on a graph is made by joining the points from each scrape. Those scrapes are done regularly, based on the scrape_interval value in your scrape_config. So basically, if you have no data during one scrape, you'll have a hole.
So there is no definitive answer; it depends essentially on your scrape_interval.
Note that if you're using a function that evaluates metrics over a certain amount of time, then missing one scrape will not alter your graph. For example, a rate over a [5m] range will not alter your graph if you scrape every 1m (as you'll have 4 other samples to compute the rate from).
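The point about range functions can be illustrated with a deliberately simplified rate calculation in Python. This is not how Prometheus implements rate() (it ignores counter resets and extrapolation); simple_rate and the sample values are assumptions made for the example.

```python
def simple_rate(samples, eval_time, window):
    """Per-second increase between the first and last sample in the window.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples fall inside the window.
    """
    in_window = [(t, v) for t, v in samples if eval_time - window < t <= eval_time]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Counter scraped every 60 s, growing by 6 per scrape; the scrape at t=180 is missing.
samples = [(0, 0), (60, 6), (120, 12), (240, 24), (300, 30)]
rate = simple_rate(samples, eval_time=300, window=300)
print(rate)
```

Even with one scrape missing, the remaining samples in the 5-minute window still yield a value, so the graphed rate has no hole at that point.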
I have the following use case:
We have a system that computes different response time metrics for messages, and we want to insert these metrics into InfluxDB. This system writes JSON entries to a file.
We use Telegraf with the JSON plugin to extract the fields we want and insert them into InfluxDB.
So far so good.
But we have an issue with one particular piece of information.
The system emits messages where mId is the unique identifier. In the examples below we have two of them, uuidXXXX and uuidYYYY:
{"meta1":"value", "mId":"uuidXXXX", "resTime1":1232332233, "timeWeEnterBus":startTimestamp}
{"meta1":"value2", "mId":"uuidYYYY", "resTime1":1232331111, "timeWeEnterBus":startTimestamp}
{"meta1":"value", "mId":"uuidXXXX", "resTime1":1232332233, "timeWeExitBus":endTimestamp}
{"meta1":"value2", "mId":"uuidYYYY", "resTime1":1232331111, "timeWeEnterBus":startTimestamp}
What we want is to graph timeInBus, which is equal to "timeWeExitBus - timeWeEnterBus", for each unique mId.
So my questions are:
As I understand it, the uuid would be a field rather than a tag, since its set of values is unbounded. The same goes for timeWeExitBus and timeWeEnterBus, which would be numeric fields since we want to use functions on them. And timeInBus would be the measurement. Am I right?
Is this a good use case for InfluxDB / Telegraf, or are we misusing them? As I understand it, computing this on the Telegraf side doesn't look like a good fit, but I don't see how to do it in InfluxDB. I initially thought the ELAPSED function could help, but I ended up thinking it doesn't work here.
If it's a good use case, could you point me to documentation that would help with implementing this?
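For reference, the computation being asked for can be sketched in a few lines of Python, assuming the events are paired by mId before anything is written to InfluxDB. This is only an illustration of the pairing logic, not a Telegraf or InfluxDB feature; the function name and the numeric timestamps are hypothetical.

```python
import json


def time_in_bus(lines):
    """Pair enter/exit events by mId and return {mId: exit - enter}."""
    enters, exits, result = {}, {}, {}
    for line in lines:
        event = json.loads(line)
        mid = event["mId"]
        if "timeWeEnterBus" in event:
            enters[mid] = event["timeWeEnterBus"]
        if "timeWeExitBus" in event:
            exits[mid] = event["timeWeExitBus"]
        if mid in enters and mid in exits:
            result[mid] = exits[mid] - enters[mid]
    return result


# Hypothetical numeric timestamps stand in for startTimestamp / endTimestamp.
events = [
    '{"meta1": "value", "mId": "uuidXXXX", "timeWeEnterBus": 1000}',
    '{"meta1": "value2", "mId": "uuidYYYY", "timeWeEnterBus": 1005}',
    '{"meta1": "value", "mId": "uuidXXXX", "timeWeExitBus": 1042}',
]
durations = time_in_bus(events)
print(durations)
```

Only uuidXXXX has both an enter and an exit event here, so it is the only mId with a computed duration; each resulting value could then be written as a single timeInBus field.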
I'm bootstrapping a brand new TICK stack and really loving the whole system overall. However, there's one bit about Kapacitor which is puzzling me.
If you look at the diagram here: https://www.influxdata.com/time-series-platform/kapacitor/, there's one arrow connecting Telegraf to Kapacitor. Telegraf can send metric data directly to Influx which makes me wonder what the use case is for forwarding through Kapacitor to Influx.
The only use case which comes to mind is that you can move processing logic out of agent plugins into Kapacitor and thereby minimize the agent's footprint.
Long story short, am I missing something here, is that the use case implied by the arrow from Kapacitor to Influx?
Kapacitor gives you the ability to process data streams (or read from an existing InfluxDB instance) and write the results out to InfluxDB. The beauty of this is that your data processing is handled by a process that is entirely separate from the primary backend.
A classic example is downsampling. If you wanted to do this in InfluxDB directly, you'd need to set up a continuous query to do it for you, but those are somewhat of a pain to manage. Kapacitor can make this easier, as follows:
stream
    |from()
        .database('telegraf')
        .measurement('cpu')
        .groupBy(*)
    |window()
        .period(5m)
        .every(5m)
        .align()
    |mean('usage_idle')
        .as('usage_idle')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('mean_cpu_idle')
        .precision('s')
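To show what the window and mean steps compute, here is a small Python sketch of the same 5-minute downsampling on in-memory points. It is an illustration only: the function name and sample values are invented, and each bucket is keyed by its window start, which may differ from the timestamp Kapacitor assigns to the output point.

```python
from collections import defaultdict


def downsample_mean(points, period=300):
    """Average values in aligned, non-overlapping windows.

    points: iterable of (timestamp_seconds, value).
    Returns {window_start_seconds: mean_value}.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)  # align to a multiple of the period
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical usage_idle readings: three in the first window, two in the second.
points = [(0, 90.0), (60, 80.0), (299, 70.0), (300, 50.0), (420, 60.0)]
means = downsample_mean(points)
print(means)
```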
Hope that helps!
I have data for two releases, collected over different time intervals, but I want to plot the two releases in Grafana on the same time interval. Is it possible to fake the time interval and plot the graph? The x-axis is a time series by default, so I can't use other parameters.
Please advise.
I've created graphs like this, although it took a bit of work.
To start out, InfluxDB doesn't support timeShift yet as detailed here:
https://github.com/influxdata/influxdb/issues/142
So I used a separate HTTP server called the influxdb-timeshift proxy:
https://github.com/maxsivanov/influxdb-timeshift-proxy
My stack looked like this:
Grafana Dashboard --> influxdb-timeshift-proxy --> InfluxDB
Here are descriptions of the two "-->" in the above schematic:
The --> on the left: I created a Grafana Datasource to point to the tcp port of the influxdb-timeshift-proxy.
The --> on the right: The "influxdb-timeshift-proxy" startup configuration points to the InfluxDB server.
With this in place, to get the time shifting to happen, the SQL-like statements to InfluxDB need a carefully formatted field 'alias' like this:
"SELECT mean( "meanAT" ) AS shift_855296_seconds" blah blah sql blah.
See the influxdb-timeshift-proxy github page above for syntax details.
With a Grafana dashboard, to get two lines (aka series) on a time series graph, I configure two sql statements. The above represents one SQL from a test 9-10 days ago, then I'd SELECT a different test (my baseline that I ran today) with timeshift of 0:
"SELECT mean( "meanAT" ) AS shift_0_seconds" blah blah sql blah.
So that answers your question, but it is of limited use -- because some poor human has gotta calculate the difference between the test times then dial the result (shift_855296_seconds) into the SQL in the Dashboard.
Why? Because out of the box, Grafana Dashboards execute SQL statements that are (mostly) hard-configured into a dashboard.
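The manual calculation is just the difference between the two test start times in seconds. A minimal Python sketch, assuming made-up start times and a hypothetical helper name, and assuming the alias format shown above (check the proxy's README for the exact syntax and the direction of the shift):

```python
from datetime import datetime, timezone


def shift_alias(older_test_start, baseline_start):
    """Build a 'shift_<n>_seconds' alias from the gap between two test start times."""
    seconds = int((baseline_start - older_test_start).total_seconds())
    return f"shift_{seconds}_seconds"


# Hypothetical start times, a bit under ten days apart.
older = datetime(2017, 3, 1, 8, 0, 0, tzinfo=timezone.utc)
baseline = datetime(2017, 3, 11, 5, 34, 56, tzinfo=timezone.utc)
alias = shift_alias(older, baseline)
print(alias)
```

The baseline query uses the same helper with identical start times, which gives the zero shift.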
To get Grafana to execute SQL where the shift alias is dynamically generated, I wrote a Grafana Scripted dashboard in JavaScript. Here are the high-level instructions for scripted dashboards:
http://docs.grafana.org/reference/scripting/
FYI, Grafana scripted dashboards are poorly documented and the 'development environment' for debugging is primitive, at best, and I was unable to get the javascript 'require' thingy (that includes 3rd party libraries) to work. But there is limited help on the Grafana discussion board and it does actually work -- creating a very nice time shifting dashboard on the fly is possible.
The http URLs to launch/display the 'scripted dashboard' can easily be embedded in some other dashboard you create. Just add your scripted dashboard URL to a "Text Panel" using markdown:
http://docs.grafana.org/features/panels/text/
Ultimately, the influxdb-timeshift-proxy is a stop-gap solution.
I have not tried it, but it looks like Kapacitor can also be used to provide the timeshifting, as described here:
https://docs.influxdata.com/kapacitor/v1.3/nodes/shift_node/
--Erik
Do you mean the X-Axis Mode option on the Graph panel?
Not sure if I understand your question correctly.
If you just want to mark your release, you can use Annotations - http://docs.grafana.org/reference/annotations/#influxdb-annotations.
If you want to show the dashboard only for a specific timeframe, you can encode that in the URL with the 'from' and 'to' parameters - https://[your-dashboard-url]?from=1488868669245&to=1488873058626
But yes, there's currently no way to put a parameter other than time on the X-axis in Grafana.
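The 'from' and 'to' values are epoch timestamps in milliseconds. A short Python sketch for building such a URL; the base URL and the function names are placeholders chosen for this example:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return round(dt.timestamp() * 1000)


def dashboard_url(base_url, start, end):
    """Append 'from' and 'to' query parameters to a dashboard URL."""
    params = urlencode({"from": to_epoch_ms(start), "to": to_epoch_ms(end)})
    return f"{base_url}?{params}"


# Placeholder dashboard URL; the time range matches the example values above.
start = datetime(2017, 3, 7, 6, 37, 49, 245000, tzinfo=timezone.utc)
end = datetime(2017, 3, 7, 7, 50, 58, 626000, tzinfo=timezone.utc)
url = dashboard_url("https://grafana.example.com/dashboard/db/releases", start, end)
print(url)
```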
After a mailing at t0, I will have several "delivered" (and "open" and "click") events. Schema and example:
mailing_name, timestamp, email_id, event_type
niceattack, 2016-07-14 12:11:00, 42, open
niceattack, 2016-07-14 12:11:08, 842, open
niceattack, 2016-07-14 12:11:34, 847, open
For a given mailing, I would like to see how long it takes to reach half of the recipients. Say I'm sending an email to 1000 addresses now: the first open event comes after 2 minutes and the last one after a week (min/max and first/last seem easy to find), but what I'd like to see is that, for example, half of the recipients opened it in the first 2 hours after it was sent.
The goal is to be able to compare whether sending now vs. on Saturday morning makes a difference in how fast the mailing is opened on average, or whether one specific mailing gets quicker exposure, and to correlate that with other events (how many recipients click on a link, take a specific action on our site...).
I tried to use a cumulative function (how many open events per mailing at each point), but it seems that the cumulative function isn't implemented yet: https://github.com/influxdata/influxdb/issues/813
How do you solve that problem with influxdb?
Solving this problem with InfluxDB alone is not currently possible, however if you're willing to add Kapacitor into the mix, then it should be possible. In particular you'll need to write a User Defined Function (UDF) for that cumulative function in Kapacitor.
The general process will look like the following:
1. Install and configure Kapacitor.
2. Create a UDF for the cumulative function you're looking for.
3. Enable that UDF inside of Kapacitor.
4. Write a TICKscript that uses the UDF and writes the results back to InfluxDB.
5. Enable a task defined by the TICKscript you've written.
6. Query the InfluxDB instance to get the results of the cumulative function.
My apologies for being so high level on this. This is a fairly involved process, but it should give you the result you're looking for.
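To give an idea of the logic such a cumulative function would need, here is a plain Python sketch that counts opens in time order and reports when the count reaches half of the recipients. It does not use the Kapacitor UDF API; the function name, the send time, and the recipient count are assumptions for the example, and it assumes each recipient appears at most once in the open events.

```python
import math
from datetime import datetime, timedelta


def time_to_fraction(sent_at, open_times, recipients, fraction=0.5):
    """Return the delay until cumulative opens reach the given fraction of recipients.

    Returns None if that many opens never happen.
    """
    needed = math.ceil(recipients * fraction)
    for count, opened_at in enumerate(sorted(open_times), start=1):
        if count >= needed:
            return opened_at - sent_at
    return None


# Open events from the example above; send time and recipient count are hypothetical.
sent_at = datetime(2016, 7, 14, 12, 10, 0)
opens = [
    datetime(2016, 7, 14, 12, 11, 0),
    datetime(2016, 7, 14, 12, 11, 8),
    datetime(2016, 7, 14, 12, 11, 34),
]
delay = time_to_fraction(sent_at, opens, recipients=4)
print(delay)
```

With four recipients, half of them have opened the mailing at the second open event, so the delay is measured from the send time to that event.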