Prometheus AlertManager - Send Alerts to different clients based on routes - monitoring

I have 2 services A and B which I want to monitor. Also I have 2 different notification channels X and Y in the form of receivers in the AlertManager config file.
I want to notify X if service A goes down, and notify Y if service B goes down. How can I achieve this with my configuration?
My AlertManager YAML file is:
route:
  receiver: X
receivers:
  - name: X
    email_configs:
  - name: Y
    email_configs:
And my alert.rules file is:
groups:
  - name: A
    rules:
      - alert: A_down
        expr: expression
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "A is down"
  - name: B
    rules:
      - alert: B_down
        expr: expression
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "B is down"

The config should roughly look like this (not tested):
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'default-receiver'
  routes:
    - match:
        alertname: A_down
      receiver: X
    - match:
        alertname: B_down
      receiver: Y
The idea is that each route can have a routes field of child routes, each with a different config; a child route is used when the alert's labels match its match condition.
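For completeness, a minimal end-to-end sketch of such an alertmanager.yml (untested; the addresses and SMTP settings are placeholders, assuming X and Y are email receivers):

global:
  smtp_smarthost: 'localhost:25'        # placeholder SMTP relay
  smtp_from: 'alertmanager@example.org' # placeholder sender
route:
  receiver: X              # fallback receiver when no child route matches
  routes:
    - match:
        alertname: A_down
      receiver: X
    - match:
        alertname: B_down
      receiver: Y
receivers:
  - name: X
    email_configs:
      - to: 'team-x@example.org'   # placeholder address
  - name: Y
    email_configs:
      - to: 'team-y@example.org'   # placeholder address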

To clarify - the general flow for handling an alert in Prometheus (Alertmanager and Prometheus integration) is like this:
SomeErrorHappensInYourConfiguredRule(Rule) -> RouteToDestination(Route) -> TriggeringAnEvent(Receiver) -> GetAMessageInSlack/PagerDuty/Mail/etc...
For example:
if my AWS machine cluster production-a1 is down, I want to trigger an event that notifies my team via PagerDuty and Slack with the relevant error.
There are 3 files important for configuring alerts on your Prometheus system:
alertmanager.yml - configuration of your routes (receiving the triggered errors) and receivers (how to handle these errors)
rules.yml - contains all the thresholds and rules you define in your system
prometheus.yml - global configuration that ties your rules into routes and receivers (the two files above)
I'm attaching a dummy example to demonstrate the idea; in this example I'll watch for overload on my machine (using the node exporter installed on it):
On /var/data/prometheus-stack/alertmanager/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'JohnDoe@gmail.com'
route:
  receiver: defaultTrigger
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 6h
  routes:
    - match_re:
        service: service_overload
        owner: ATeam
      receiver: pagerDutyTrigger
receivers:
  - name: 'pagerDutyTrigger'
    pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
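Note that the route above names defaultTrigger as its default receiver, but the snippet only defines pagerDutyTrigger. A hypothetical definition of the missing receiver (an email receiver is just one option, reusing the global smtp_* settings):

receivers:
  - name: 'pagerDutyTrigger'
    pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
  - name: 'defaultTrigger'
    email_configs:
      - to: 'oncall@example.org'   # placeholder address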
Add some rule on /var/data/prometheus-stack/prometheus/yourRuleFile.yml:
groups:
  - name: alerts
    rules:
      - alert: service_overload_more_than_5000
        expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
        for: 10m
        labels:
          service: service_overload
          severity: pager
          dev_team: myteam
        annotations:
          dev_team: myteam
          priority: Blocker
          identifier: '{{ $labels.name }}'
          description: 'service overflow'
          value: '{{ humanize $value }}%'
On /var/data/prometheus-stack/prometheus/prometheus.yml add this snippet to integrate alertmanager:
global:
  ...
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"
rule_files:
  - "yourRuleFile.yml"
...
Pay attention that the key point of this example is the service: service_overload label, which connects and binds the rule to the right receiver.
Reload the config (restart the services, or stop and start your Docker containers) and test it; if everything is configured well, you can watch the alerts at http://your-prometheus-url:9090/alerts
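For reference, a hypothetical docker-compose fragment matching the /var/data/prometheus-stack paths used above (image names, mount points and ports are the usual defaults; treat this as a sketch, not the exact stack from this answer):

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - /var/data/prometheus-stack/prometheus:/etc/prometheus
    ports:
      - "9090:9090"
  alertmanager:
    image: prom/alertmanager
    volumes:
      - /var/data/prometheus-stack/alertmanager:/etc/alertmanager
    ports:
      - "9093:9093"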


Cypher queries fail with Neo4jError: Unknown function 'apoc.convert.fromJsonMap' but apoc should be installed

I deployed Neo4j in my AKS cluster using the standalone Helm chart.
It all gets deployed and my Node.js server connects to Neo4j correctly.
However queries throw the Neo4jError: Unknown function 'apoc.convert.fromJsonMap' error, so apoc is clearly missing.
I followed the procedure described here https://neo4j.com/docs/operations-manual/current/kubernetes/configuration/#operations-installing-plugins and my Values are here below.
The only difference I find is that in the guide apoc core is actually enabled afterwards by upgrading the helm chart, while I'm installing it with the option enabled already.
Looking at https://neo4j.com/docs/apoc/current/config/ I saw
As of Neo4j v.5.0, APOC config settings are no longer supported in the neo4j.conf file. Please move all apoc.* settings to apoc.conf. It is also possible to set the config settings using environment variables.
so as neo4j-standalone is using version 4.4.16, I moved the apoc configurations from apoc.conf to neo4j.conf, but still the apoc procedures are not found by the queries.
Is there something I'm missing out to configure in order to enable apoc?
Thank you very much.
neo4j-db:
  # neo4j-standalone:
  nameOverride: "neo4j"
  fullnameOverride: 'neo4j'
  neo4j:
    # Name of your cluster
    name: "fixit-neo4j" # this will be the label: app: value for the service selector
    password: "password"
    ##
    passwordFromSecret: ""
    passwordFromSecretLookup: false
    edition: "community"
    acceptLicenseAgreement: "yes"
    offlineMaintenanceModeEnabled: false
    resources:
      cpu: "1000m"
      memory: "2Gi"
  volumes:
    data:
      mode: 'volumeClaimTemplate'
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        storageClassName: neo4j-sc-data
        resources:
          requests:
            storage: 4Gi
    backups:
      mode: 'share' # share an existing volume (e.g. the data volume)
      share:
        name: 'logs'
    logs:
      mode: 'volumeClaimTemplate'
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        storageClassName: neo4j-sc-logs
        resources:
          requests:
            storage: 4Gi
  services:
    # A ClusterIP service with the same name as the Helm Release name should be used
    # for Neo4j Driver connections originating inside the Kubernetes cluster.
    default:
      # Annotations for the K8s Service object
      annotations: { }
    # A LoadBalancer Service for external Neo4j driver applications and Neo4j Browser
    neo4j:
      ### this would create cluster-neo4j svc
      enabled: false
  # env:
  #   NEO4J_PLUGINS: '["graph-data-science"]'
  config:
    server.bolt.enabled: "true"
    server.bolt.tls_level: "REQUIRED"
    server.bolt.listen_address: "0.0.0.0:7687"
    dbms.ssl.policy.bolt.client_auth: "NONE"
    dbms.ssl.policy.bolt.enabled: "true"
    server.directories.plugins: "/var/lib/neo4j/labs"
    dbms.security.procedures.unrestricted: "apoc.*"
    server.config.strict_validation.enabled: "false"
    dbms.security.procedures.allowlist: "gds.*,apoc.*"
  apoc_config:
    apoc.trigger.enabled: "true"
    apoc.jdbc.neo4j.url: "jdbc:foo:bar"
    apoc.import.file.enabled: "true"
  startupProbe:
    failureThreshold: 1000
    periodSeconds: 50
  ssl:
    # setting per "connector" matching neo4j config
    bolt:
      privateKey:
        secretName: tls-secret
        subPath: tls.key
      publicCertificate:
        secretName: tls-secret
        subPath: tls.crt
      trustedCerts:
        sources: [ ]
      revokedCerts:
        sources: [ ]
OK, after looking at quite a few issues on the same subject, I found that some solutions to this problem were to add dbms.directories.plugins: "/var/lib/neo4j/labs" and dbms.config.strict_validation: "false" in the config section, which, as I understand it, mirrors these settings for both the server and the dbms. It indeed worked, but it's weird that the official guide doesn't mention it.
I mean, these mirrored settings make sense - tell both the server and the dbms where to look for plugins - but it should still be mentioned. I see so many posts about this, which means the documentation is not clear enough. It's easy to take things for granted, and because the need to mirror the plugin location for both the server AND the dbms is just not stated anywhere in the docs, I (like many others) thought the dbms was already configured with the same location as server.directories.plugins: "/var/lib/neo4j/labs" (which the docs say to configure) and didn't add it. But hey, nobody's perfect I guess. Hope they change the docs for future devs' sake; meanwhile this answer could be helpful.
So the correct configuration is
env:
  NEO4J_PLUGINS: '["graph-data-science"]'
config:
  server.bolt.enabled: 'true'
  server.bolt.tls_level: 'REQUIRED'
  server.bolt.listen_address: '0.0.0.0:7687'
  dbms.ssl.policy.bolt.client_auth: 'NONE'
  dbms.ssl.policy.bolt.enabled: 'true'
  ## apoc
  server.directories.plugins: '/var/lib/neo4j/labs'
  server.config.strict_validation.enabled: 'false'
  dbms.security.procedures.unrestricted: 'apoc.*'
  dbms.security.procedures.allowlist: 'gds.*,apoc.*'
  ### additional needed dbms config mirroring server config
  dbms.directories.plugins: "/var/lib/neo4j/labs"
  dbms.config.strict_validation: "false"
apoc_config:
  apoc.trigger.enabled: "true"
  apoc.jdbc.neo4j.url: "jdbc:foo:bar"
  apoc.import.file.enabled: "true"
It seems the docs are missing the step of installing the APOC plugin itself. Change the following line to include APOC as well:
NEO4J_PLUGINS: '["graph-data-science", "apoc"]'
and you should be good
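In context, the (previously commented-out) env section of the values file would then read:

env:
  NEO4J_PLUGINS: '["graph-data-science", "apoc"]'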

How to disable apikey for local serverless development?

I created a simple api (using serverless) which is protected by an apikey (when deployed via $ serverless deploy). However, for local development ($ serverless offline) I do not want to use an api key. How can I disable this for local only?
This is my serverless.yml:
service: my-service
frameworkVersion: "3"

provider:
  name: aws
  runtime: nodejs16.x
  region: eu-central-1
  apiGateway:
    apiKeys:
      - name: my-apikey
        value: ${ssm:my-apikey}

functions:
  myfunc:
    handler: src/v1/myfunc/index.get
    events:
      - http:
          path: /v1/myfunc
          method: get
          private: true

plugins:
  - serverless-esbuild
  - serverless-offline
  - serverless-dotenv-plugin
Note: I am aware that I could simply set private: false when doing local development but this is quite tedious when there is a long list of functions.
The solution was to use the --noAuth option:
serverless offline --noAuth
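If you later want per-stage control instead of a CLI flag, one common pattern (a sketch, not tested against this exact setup; apiPrivate is a made-up key) is to resolve private from a custom map keyed by stage:

custom:
  apiPrivate:
    dev: false    # open locally / in dev
    prod: true    # protected when deployed

functions:
  myfunc:
    handler: src/v1/myfunc/index.get
    events:
      - http:
          path: /v1/myfunc
          method: get
          private: ${self:custom.apiPrivate.${sls:stage}, true}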

How to parse JSON into labels and a timestamp with Promtail

I have a problem parsing a JSON log with Promtail; please, can somebody help me? I've tried many configurations, but it doesn't parse the timestamp or the other labels.
log entry:
{timestamp=2019-10-25T15:25:41.041-03, level=WARN, thread=http-nio-0.0.0.0-8080-exec-2, mdc={handler=MediaController, ctxCli=127.0.0.1, ctxId=FdD3FVqBAb0}, logger=br.com.brainyit.cdn.vbox.
controller.MediaController, message=[http://localhost:8080/media/sdf],c[500],t[4],l[null], context=default}
promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: vbox-main
    static_configs:
      - targets:
          - localhost
        labels:
          job: vbox
          appender: main
          __path__: /var/log/vbox/main.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            message: message
            context: context
            level: level
      - timestamp:
          source: timestamp
          format: RFC3339Nano
      - labels:
          context:
          level:
      - output:
          source: message
I've tried this Promtail setup with Java Spring Boot applications (which generate logs to a file in JSON format via the Logstash Logback encoder) and it works.
The example log line generated by application:
{"timestamp":"2020-06-06T01:00:30.840+02:00","version":1,"message":"Started ApiApplication in 1.431 seconds (JVM running for 6.824)","logger_name":"com.github.pnowy.spring.api.ApiApplication","thread_name":"main","level":"INFO","level_value":20000}
The Promtail config:
# Promtail Server Config
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Positions
positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: springboot
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            timestamp: timestamp
            logger_name: logger_name
            stack_trace: stack_trace
            thread_name: thread_name
      - labels:
          level:
      - template:
          source: new_key
          template: 'logger={{ .logger_name }} threadName={{ .thread_name }} | {{ or .message .stack_trace }}'
      - output:
          source: new_key
    static_configs:
      - targets:
          - localhost
        labels:
          job: applogs
          __path__: /Users/przemek/tools/promtail/*.log
Please notice that the output (the log text) is first built as new_key by Go templating and later set as the output source. The logger={{ .logger_name }} prefix helps to recognise the field as parsed on the Loki view (but how exactly you configure it is an individual matter for your application).
Here you will find quite nice documentation about the entire process: https://grafana.com/docs/loki/latest/clients/promtail/pipelines/
The example was run on release v1.5.0 of Loki and Promtail (Update 2020-04-25: I've updated links to current version - 2.2 as old links stopped working).
The section about the timestamp stage is here: https://grafana.com/docs/loki/latest/clients/promtail/stages/timestamp/ with examples - I've tested it and also didn't notice any problem. Hope that helps a little bit.
The JSON configuration part: https://grafana.com/docs/loki/latest/clients/promtail/stages/json/
Result on Loki: (screenshot of the parsed log lines omitted)
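One detail worth checking against the original question: the sample line shown there is not valid JSON (key=value instead of "key": "value"), so the json stage has nothing to extract, and the timestamp 2019-10-25T15:25:41.041-03 is not RFC3339Nano (the offset has no minutes part). Assuming the application is switched to real JSON output, a timestamp stage with an explicit Go reference layout should accept that offset style - a sketch:

pipeline_stages:
  - json:
      expressions:
        timestamp: timestamp
  - timestamp:
      source: timestamp
      # Go reference time; "-07" matches an hour-only offset such as "-03"
      format: "2006-01-02T15:04:05.000-07"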

Grafana Provisioning Notification Channels not working

I am trying to build a Docker container with existing datasources, dashboards and notification channels. The provisioning of datasources and dashboards is working, but not the provisioning of notification channels. I am using Grafana v6.3.5 (commit: 67bad72).
I am using the example config from the Grafana provisioning documentation. I have added it to the /etc/grafana/provisioning/notifiers directory in a file called AlertNotificationChannel.yaml.
I can see it is processing the file, because the log shows "Deleting alert notification logger=provisioning.notifiers name=notification-channel-1 uid=notifier1". However, there are no messages about inserting or updating an alert notification, and nothing appears in the UI.
Contents of the YAML file:
notifiers:
  - name: notification-channel-1
    type: slack
    uid: notifier1
    # either
    org_id: 2
    # or
    org_name: Main Org.
    is_default: true
    send_reminder: true
    frequency: 1h
    disable_resolve_message: false
    # See `Supported Settings` section for settings supported for each
    # alert notification type.
    settings:
      recipient: "XXX"
      token: "xoxb"
      uploadImage: true
      url: https://slack.com

delete_notifiers:
  - name: notification-channel-1
    uid: notifier1
    # either
    org_id: 2
    # or
    org_name: Main Org.
I believe this functionality was added after v5 of Grafana. I am trying to follow the documentation, but it is not working.
So I was having the same issue for a little bit today and I was able to make it work. I'd guess you ended up finding a solution, but I find it useful to post an example that works, for future people going through this issue. The reason nothing was appearing in the UI is probably because there was a mistake somewhere; note, for instance, that the snippet above sets both org_id: 2 and org_name even though the comments say to pick one, which is a candidate for such a mistake.
This is an example of my docker-compose:
grafana:
  image: grafana/grafana
  container_name: grafana
  restart: always
  user: "0"
  ports:
    - "3000:3000"
  volumes:
    - type: bind
      source: "/root/Docker/grafana/grafana"
      target: "/var/lib/grafana"
    - type: bind
      source: "/root/Docker/grafana/provisioning"
      target: "/etc/grafana/provisioning"
This is an example of my "/grafana/provisioning/notifiers/slack.yml"
notifiers:
  - name: slack-alarming
    type: slack
    username: Grafa_Alert
    is_default: true
    send_reminder: true
    org_name: LML
    settings:
      uploadImage: true
      url: POSTHOOKURL from slack
Note that the org_name is the name of my company and the username is random.
Thanks,
Wassim

SNMP Exporter (prometheus) + Extend OID

I'm new to Prometheus and I'm trying to monitor some extend OIDs I created with the snmp_exporter, but it doesn't work as expected.
My script just does an "echo $VALUE" (the value is an integer or a string).
I have this snmpd.conf:
extend value-return-test /usr/local/bin/script.sh
I generated its OID:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendResult.\"value-return-test\" -On
Now I'm able to see all the SNMP extend objects linked to my configuration:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendObjects |grep value-return-test
Now, here is my Prometheus configuration, prometheus.yml:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'snmp'
    metrics_path: /snmp
    params:
      module: [tests]
    static_configs:
      - targets:
          - 127.0.0.1 # SNMP device - add your IPs here
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116 # SNMP exporter.
and my snmp.yaml:
tests:
  walk:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
With that configuration I'm not able to get my value on the page http://localhost:9116/snmp?target=127.0.0.1&module=tests:
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 0.004676028
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 0
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 0.004477656
However, if I put my configuration into another block like the if_mib one, I'm able to get the values BUT they are put in the wrong place: I get the value "1" instead of "6" (screenshot omitted).
I also tried the snmp_exporter generator, but I'm not able to build it:
$ go build
# github.com/prometheus/snmp_exporter/generator
./net_snmp.go:6:38: fatal error: net-snmp/net-snmp-config.h: No such file or directory
compilation terminated.
Thanks for your help
If you are able to change snmpd.conf that implies that you have enough control over the machine to run the node exporter. I'd suggest using the textfile collector of the node exporter to expose this data, rather than spending time figuring out the intricacies of how SNMP and MIBs work.
In general you should prefer the Node/WMI exporters where possible over using SNMP.
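For illustration, a hypothetical Prometheus scrape job for that approach, assuming node_exporter runs on the SNMP host with --collector.textfile.directory pointing at a directory where script.sh writes a .prom file (e.g. a single line like value_return_test 42):

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 127.0.0.1:9100   # node exporter default port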
Using the get parameter instead of walk worked for me.
tests:
  get:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
