I'm having issues with Prometheus alerting rules. I have various cAdvisor specific alerts set up, for example:
- alert: ContainerCpuUsage
  expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    title: 'Container CPU usage (instance {{ $labels.instance }})'
    description: 'Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
When the condition is met, I can see the alert in the "Alerts" tab in Prometheus; however, some labels are missing, which prevents Alertmanager from sending a notification via Slack. To be specific, I attach a custom "env" label to each target:
{
  "targets": [
    "localhost:8080"
  ],
  "labels": {
    "job": "cadvisor",
    "env": "production",
    "__metrics_path__": "/metrics"
  }
}
But when the alert based on cadvisor metrics is firing, the labels are: alertname, instance and severity - no job label, no env label.
All the other alerts from other exporters (e.g. node-exporter) work just fine and the label is present.
This is due to the sum function that you use: it takes all the matching time series and adds them together, grouping BY (instance, name). If you run the same query in Prometheus, you will see that sum keeps only the grouping labels:
{instance="foo", name="bar"} 135.38819037447163
Other aggregation methods like avg, max, min, etc, work in the same fashion. To bring the label back simply add env to the grouping list: by (instance, name, env).
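To make the label-dropping behaviour concrete, here is a minimal Python sketch that models sum ... by (...) over an in-memory list of series. It is not how Prometheus is implemented; the sum_by helper, the cpu label, and the sample values are assumptions made up for illustration.

```python
from collections import defaultdict

def sum_by(series, by):
    """Model of `sum by (...)`: only the labels listed in `by` survive."""
    totals = defaultdict(float)
    for labels, value in series:
        # The output key is built from the grouping labels alone,
        # so every other label (job, env, cpu, ...) is discarded.
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return dict(totals)

# Two hypothetical per-CPU series for the same container.
series = [
    ({"instance": "foo", "name": "bar", "job": "cadvisor",
      "env": "production", "cpu": "cpu00"}, 0.5),
    ({"instance": "foo", "name": "bar", "job": "cadvisor",
      "env": "production", "cpu": "cpu01"}, 0.25),
]

without_env = sum_by(series, ["instance", "name"])
with_env = sum_by(series, ["instance", "name", "env"])

print(without_env)  # env and job are gone
print(with_env)     # env is kept because it is a grouping label
```

Adding job to the grouping list in the same way would bring that label back as well.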
I need to verify if a user logging into the website has sufficient permissions to modify my bot's behavior on a guild. I used passport-discord to get information on the user, and this is what I got:
{
...
guilds: [
{
id: 'guild id',
name: 'guild name',
icon: 'guild icon',
owner: false,
permissions: 104189504,
features: [Array],
permissions_new: '1037338791488'
}
...
]
}
For confidentiality purposes, I replaced the guild information above "owner". Now my question: how do I convert the "permissions" value into an array of the user's permissions?
I don't know if there is an API where you can do it (probably not), but it's not that complicated to implement this. Discord provides two community-created calculators for permissions, which do the opposite thing - you select which permissions you need and you get a calculated permissions result:
https://discordapi.com/permissions.html#104189504
https://finitereality.github.io/permissions-calculator/?v=104189504
You could quite easily create a function that will do the reverse - convert the decimal permissions number into a hex number, then map the hexes to named permissions.
E.g. the permission value from your example, 104189504, is 635CE40 in hex. Each permission is a single bit, so a lone permission only ever produces the hex digits 1, 2, 4, or 8; any other digit is a sum of several permissions (6 = 4 + 2, for instance). Thus, your permission int contains the following permissions:
4000000
2000000
200000
100000
40000
10000
8000
4000
800
400
200
40
Which you can then map to names, e.g. 4000000 is "Change nickname", and so on.
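The decomposition above can be sketched in a few lines of Python. The function name split_permission_bits and the KNOWN_NAMES table are hypothetical; the table holds only the one name mentioned above, and the remaining names would have to be filled in from Discord's permission documentation.

```python
def split_permission_bits(permissions):
    """Return the individual bit flags that are set in a permissions integer."""
    flags = []
    bit = 1
    while bit <= permissions:
        if permissions & bit:
            flags.append(bit)
        bit <<= 1
    return flags

# Deliberately partial: only the mapping given in the text above.
KNOWN_NAMES = {0x4000000: "Change nickname"}

flags = split_permission_bits(104189504)
for flag in sorted(flags, reverse=True):
    # Unknown flags are printed as hex so they can be looked up later.
    print(format(flag, "X"), KNOWN_NAMES.get(flag, "(name not mapped here)"))
```

The permissions_new field is delivered as a string, so it would need int(...) before being passed to the function.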
If you want to keep your bot's permission checks simple, you might find it sufficient to check if the member executing the command has a specific role.
I am using the consul exporter to ingest the health and status of my services into Prometheus. I'd like to fire alerts when the status of services and nodes in Consul is critical and then use tags extracted from Consul when routing those alerts.
I understand from this discussion that service tags are likely to be exported as a separate metric, but I'm not sure how to join one series with another so I can leverage the tags with the health status.
For example, the following query:
max(consul_health_service_status{status="critical"}) by (service_name, status,node) == 1
could return:
{node="app-server-02",service_name="app-server",status="critical"} 1
but I'd also like 'env' from this series:
consul_service_tags{node="app-server-02",service_name="app-server",env="prod"} 1
to get joined along node and service_name to pass the following to the Alertmanager as a single series:
{node="app-server-02",service_name="app-server",status="critical",env="prod"} 1
I could then match 'env' in my routing.
Is there any way to do this? It doesn't look to me like any operations or functions give me the ability to group or join like this. As far as I can see, the tags would already need to be labels on the consul_health_service_status metric.
You can use the argument list of group_left to include extra labels from the right operand (parentheses and indents for clarity):
(
max(consul_health_service_status{status="critical"})
by (service_name,status,node) == 1
)
+ on(service_name,node) group_left(env)
(
0 * consul_service_tags
)
The important part here is the operation + on(service_name,node) group_left(env):
the + is "abused" as a join operator (fine since 0 * consul_service_tags always has the value 0)
group_left(env) is the modifier that includes the extra label env from the right (consul_service_tags)
The answer to this question is accurate. I also want to share an explanation of how to combine two metrics that carry the SAME labels (this might not directly answer the question). Both metrics have the following label:
name (e.g. aaa, bbb, ccc)
I have a metric named metric_a, and if it returns no data for some of the label values, I wish to fetch the data from metric_b instead, i.e.:
metric_a has values for {name="aaa"} and {name="bbb"}
metric_b has values for {name="ccc"}
I want the output to be for all three name labels. The solution is to use or in Prometheus.
sum by (name) (increase(metric_a[1w]))
or
sum by (name) (increase(metric_b[1w]))
The result of this will have values for {name="aaa"}, {name="bbb"} and {name="ccc"}.
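As a rough model of what or does here, the following Python sketch treats each side as a mapping from label set to value. The promql_or helper and the numbers are made up for illustration; the rule it models is that everything from the left side is kept, and a right-side series is added only if no left-side series has the same label set.

```python
def promql_or(left, right):
    """Model of `left or right` on instant vectors keyed by label set."""
    result = dict(left)
    for labels, value in right.items():
        # A right-hand series is used only as a fallback.
        if labels not in result:
            result[labels] = value
    return result

# Hypothetical results of the two sum by (name) (...) expressions.
metric_a = {(("name", "aaa"),): 10, (("name", "bbb"),): 20}
metric_b = {(("name", "bbb"),): 99, (("name", "ccc"),): 30}

combined = promql_or(metric_a, metric_b)
print(combined)
```

In this sketch metric_b also has a value for name="bbb", and it is ignored because metric_a already covers that label set.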
It is good practice in the Prometheus ecosystem to expose additional labels, which can be joined to multiple metrics, via a separate info-like metric, as explained in this article. For example, the consul_service_tags metric exposes a set of tags, which can be joined to other metrics via the (service_name, node) labels.
The join is usually performed via on() and group_left() modifiers applied to * operation. The * doesn't modify values for time series on the left side because info-like metrics usually have constant 1 values. The on() modifier is used for limiting the labels used for finding matching time series on the left and the right side of *. The group_left() modifier is used for adding additional labels from time series on the right side of *. See these docs for details.
For example, the following PromQL query adds env label from consul_service_tags metric to consul_health_service_status metric with the same set of (service_name, node) labels:
consul_health_service_status
* on(service_name, node) group_left(env)
consul_service_tags
Additional label filters can be added to consul_health_service_status if needed. For example, the following query returns only time series with status="critical" label:
consul_health_service_status{status="critical"}
* on(service_name, node) group_left(env)
consul_service_tags
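The matching rules behind * on(...) group_left(...) can be modelled with a short Python sketch. This is only an illustration of the semantics, not Prometheus code: the join_group_left helper is hypothetical, and it assumes at most one right-hand series per (service_name, node) pair, which is the case Prometheus requires for this kind of join.

```python
def join_group_left(left, right, on, extra):
    """Model of `left * on(on...) group_left(extra...) right`."""
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name) for name in on)
        index[key] = (labels, value)  # assumes one right-hand series per key

    result = []
    for labels, value in left:
        key = tuple(labels.get(name) for name in on)
        if key not in index:
            continue  # no partner on the right: the series is dropped
        right_labels, right_value = index[key]
        merged = dict(labels)  # all left-hand labels are kept
        for name in extra:
            if name in right_labels:
                merged[name] = right_labels[name]  # copied from the right
        result.append((merged, value * right_value))
    return result

health = [({"node": "app-server-02", "service_name": "app-server",
            "status": "critical"}, 1)]
tags = [({"node": "app-server-02", "service_name": "app-server",
          "env": "prod"}, 1)]

joined = join_group_left(health, tags, on=["service_name", "node"], extra=["env"])
print(joined)
```

Because the info-like metric has the value 1, the multiplication leaves the left-hand value unchanged and only the env label is added.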
I've been setting up Grafana to pull some annotations from an InfluxDB database.
It seems that when multiple annotations exist within the same millisecond, Grafana will only display the last one.
Is there a way to display multiple annotations that occurred within the same millisecond? This is for a high-time-precision project, so I would prefer to avoid hacking it by modifying event timestamps.
Here's an example InfluxDB database:
> select * from events;
name: events
time                key      name   title
----                ---      ----   -----
1515664469946000001 as_start event1 test
1515664469946999999 as_start event4 test
1515664469947000000 as_start event3 test
1515664469956000000 as_start event2 test
I use the following query in Grafana:
select "name","title","key" from events WHERE $timeFilter
Which yields this:
graph screenshot
"event1" is not visible and was instead "overwritten" by "event4". "event3" and "event2" are visible however.
Thanks!
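To check which events in the sample data share a millisecond, the nanosecond timestamps can be truncated and grouped, as in this small Python sketch. It only reproduces the collision described in the question; the assumption that Grafana keys annotations by millisecond comes from the observed behaviour and is not confirmed here.

```python
from collections import defaultdict

# (timestamp in nanoseconds, event name) taken from the sample table.
events = [
    (1515664469946000001, "event1"),
    (1515664469946999999, "event4"),
    (1515664469947000000, "event3"),
    (1515664469956000000, "event2"),
]

def group_by_millisecond(rows):
    """Bucket events by their timestamp truncated to milliseconds."""
    buckets = defaultdict(list)
    for ts_ns, name in rows:
        buckets[ts_ns // 1_000_000].append(name)
    return dict(buckets)

buckets = group_by_millisecond(events)
collisions = {ms: names for ms, names in buckets.items() if len(names) > 1}
print(collisions)
```

Only event1 and event4 land in the same bucket, which matches the pair where one annotation replaces the other.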
select SUM(value)
from /measurment1|measurment2/
where time > now() - 60m and host = 'hostname' limit 2;
Name: measurment1
time sum
---- ---
1505749307008583382 4680247
name: measurment2
time sum
---- ---
1505749307008583382 3004489
But is it possible to get the value of SUM(measurment1+measurment2), so that I see only a single output?
Not possible in influx query language. It does not support functions across measurements.
If this is something you require, you may be interested in layering another API on top of influx that does this, like Graphite via Influxgraph.
For the above, something like this.
/etc/graphite-api.yaml:
finders:
  - influxgraph.InfluxDBFinder
influxdb:
  db: <your database>
  templates:
    # Produces metric paths like 'measurement1.hostname.value'
    - measurement.host.field*
Start the graphite-api/influxgraph webapp.
A query /render?from=-60min&target=sum(*.hostname.value) then produces the sum of value on tag host='hostname' for all measurements.
{measurement1,measurement2}.hostname.value can be used instead to limit it to specific measurements.
NB - Performance wise (of influx), best to have multiple values in the same measurement rather than same value field name in multiple measurements.
I have a list of classes that extracts info from the web. Every time each one of them saves something, it sends a different counter to graphite. So, every one of them is a different metric.
How do I know how many of them satisfy a certain condition?
For example, let:
movingAverage(summarize(groupByNode(counters.crawlers.*.saved, 2, "sumSeries"), "1hour"), 24)
be the average of content downloaded in the past 24 hours. How can I know, at a moment "t", how many of my metrics have this value above 0?
In the rendering endpoint, add format=json. This would return the data-points with corresponding epochs in JSON, which is a breeze to parse. The time-stamps wherein your script sent nothing will be NULL.
[{
"target": "carbon.agents.ip-10-0-0-228-a.metricsReceived",
"datapoints":
[
[912, 1383888170],
[789, 1383888180],
[800, 1383888190],
[null, 1383888200],
[503, 1383888210],
[899, 1383888220]
]
}]
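Once the JSON is available, counting the series that satisfy a condition takes only a few lines. The sketch below is a minimal Python example that parses the sample response shown above; the function name count_series_above is hypothetical, and "value at moment t" is interpreted here as the last non-null datapoint of each series.

```python
import json

def count_series_above(render_json, threshold=0):
    """Count series whose last non-null datapoint is above `threshold`."""
    count = 0
    for series in json.loads(render_json):
        # Each datapoint is a [value, epoch] pair; value may be null.
        values = [value for value, _epoch in series["datapoints"]
                  if value is not None]
        if values and values[-1] > threshold:
            count += 1
    return count

sample = """
[{
  "target": "carbon.agents.ip-10-0-0-228-a.metricsReceived",
  "datapoints": [
    [912, 1383888170], [789, 1383888180], [800, 1383888190],
    [null, 1383888200], [503, 1383888210], [899, 1383888220]
  ]
}]
"""

print(count_series_above(sample))
```

In practice the string would come from the render endpoint with format=json rather than from an inline sample.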
You can use the currentAbove function. Check it out. For example:
currentAbove(stats.route.*.servertime.*, 5)
The above example returns every metric matched by the pattern whose most recent value is above 5.
You could then count the number of targets returned, and while Graphite doesn't provide a way to count the "buckets" you should be able to capture it fairly easily.
For example, a quick way to get a count is to pipe the curl output to grep and count the occurrences of the word "target":
> curl --silent 'http://test.net/render?target=currentAbove(stats.cronica.dragnet.messages.*,5)&format=json' \
> | grep -Po "target" | grep -c "target"