I'm new to Prometheus and I'm trying to monitor some extend OIDs I created with the snmp_exporter, but it doesn't work as expected.
My script just does an "echo $VALUE" (the value is an integer or a string).
I have this in snmpd.conf:
extend value-return-test /usr/local/bin/script.sh
I generated its OID:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendResult.\"value-return-test\" -On
Now I'm able to see all the SNMP extend objects linked to my configuration:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendObjects |grep value-return-test
Now, here is my Prometheus configuration, prometheus.yml:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'snmp'
    metrics_path: /snmp
    params:
      module: [tests]
    static_configs:
      - targets:
          - 127.0.0.1  # SNMP device - add your IPs here
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116  # SNMP exporter.
and my snmp.yaml :
tests:
  walk:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
With that configuration I'm not able to get my values on the page http://localhost:9116/snmp?target=127.0.0.1&module=tests:
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 0.004676028
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 0
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 0.004477656
However, if I put my configuration into another block like if_mib, I'm able to get the values, BUT they are put in the wrong place:
As you can see, I get the value "1" instead of "6".
I also tried the snmp_exporter generator, but I'm not able to build it:
$ go build
# github.com/prometheus/snmp_exporter/generator
./net_snmp.go:6:38: fatal error: net-snmp/net-snmp-config.h: No such file or directory
compilation terminated.
Thanks for your help
If you are able to change snmpd.conf that implies that you have enough control over the machine to run the node exporter. I'd suggest using the textfile collector of the node exporter to expose this data, rather than spending time figuring out the intricacies of how SNMP and MIBs work.
In general you should prefer the Node/WMI exporters where possible over using SNMP.
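To illustrate that suggestion, here is a minimal textfile-collector sketch. The metric name my_script_value and the directory are placeholders of my own; node_exporter must be started with --collector.textfile.directory pointing at the same directory:

```shell
#!/bin/sh
# Expose the script's value through node_exporter's textfile collector.
# TEXTFILE_DIR must match the directory node_exporter is started with:
#   --collector.textfile.directory=...
# (it defaults to the current directory here so the sketch is self-contained)
TEXTFILE_DIR="${TEXTFILE_DIR:-.}"
VALUE=42   # in practice: VALUE=$(/usr/local/bin/script.sh)

# Write to a temp file in the same directory, then rename: the rename is
# atomic, so a scrape never sees a half-written file.
TMP="$TEXTFILE_DIR/.my_script.prom.tmp"
cat > "$TMP" <<EOF
# HELP my_script_value Value returned by the extend script
# TYPE my_script_value gauge
my_script_value $VALUE
EOF
mv "$TMP" "$TEXTFILE_DIR/my_script.prom"
```

A cron job or systemd timer can then refresh the file on whatever interval the value needs.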
Using the get parameter instead of walk worked for me.
tests:
  get:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
I am using GrafanaCloud. I have a dashboard variable to create a list of nodes:
node -> label_values(agent_hostname)
Then I have my query to graph CPU for the selected hostname:
instance:node_cpu_utilisation:rate5m{agent_hostname="$node"}
This works fine for a single host but I would like to be able to have lines for several servers on one graph. I have 'Include All' and 'Multi-value' switched on for the variable. At the moment, when I choose a second or third server from my nodes variable the graph shows 'No data'. Do I need to amend my variable so that it parses with a pipe (OR) symbol at the end? And if so how would I do that?
I still haven't solved this, but I managed the following workaround. I added the following to the grafana-agent.yaml file so that Prometheus would add an environment and role label for each machine:
prometheus_remote_write:
  - url: {{ prometheus_url }}
    write_relabel_configs:
      - source_labels: [__address__]
        regex: '.*'
        target_label: instance
        replacement: {{ ansible_hostname }}
      - source_labels: [__address__]
        regex: '.*'
        target_label: environment
        replacement: {{ env | default('legacy') }}
      - source_labels: [__address__]
        regex: '.*'
        target_label: role
        replacement: {{ role | default('none') }}
Then I added matching role and environment variables in the dashboard.
Then in the query for each metric I added the role and environment, for example for load:
node_load15{role="$role", environment="$environment" }
This allows me to show load for several machines on the one graph and also allows me to easily switch between environments and clusters using the variables drop-down at the top.
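As a side note on the original question: when 'Multi-value' is enabled, Grafana interpolates a Prometheus variable as a regex alternation such as (host1|host2), so the query needs the regex matcher =~ rather than =, e.g.:

```
instance:node_cpu_utilisation:rate5m{agent_hostname=~"$node"}
```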
I have a problem parsing a JSON log with Promtail. Please, can somebody help me? I've tried many configurations, but it doesn't parse the timestamp or the other labels.
log entry:
{timestamp=2019-10-25T15:25:41.041-03, level=WARN, thread=http-nio-0.0.0.0-8080-exec-2, mdc={handler=MediaController, ctxCli=127.0.0.1, ctxId=FdD3FVqBAb0}, logger=br.com.brainyit.cdn.vbox.
controller.MediaController, message=[http://localhost:8080/media/sdf],c[500],t[4],l[null], context=default}
promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: vbox-main
    static_configs:
      - targets:
          - localhost
        labels:
          job: vbox
          appender: main
          __path__: /var/log/vbox/main.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            message: message
            context: context
            level: level
      - timestamp:
          source: timestamp
          format: RFC3339Nano
      - labels:
          context:
          level:
      - output:
          source: message
I've tried the setup of Promtail with Java Spring Boot applications (which generate logs to a file in JSON format via the Logstash logback encoder) and it works.
The example log line generated by application:
{"timestamp":"2020-06-06T01:00:30.840+02:00","version":1,"message":"Started ApiApplication in 1.431 seconds (JVM running for 6.824)","logger_name":"com.github.pnowy.spring.api.ApiApplication","thread_name":"main","level":"INFO","level_value":20000}
The Promtail config:
# Promtail Server Config
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Positions
positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: springboot
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            timestamp: timestamp
            logger_name: logger_name
            stack_trace: stack_trace
            thread_name: thread_name
      - labels:
          level:
      - template:
          source: new_key
          template: 'logger={{ .logger_name }} threadName={{ .thread_name }} | {{ or .message .stack_trace }}'
      - output:
          source: new_key
    static_configs:
      - targets:
          - localhost
        labels:
          job: applogs
          __path__: /Users/przemek/tools/promtail/*.log
Please notice that the output (the log text) is first assembled as new_key by Go templating and later set as the output source. The logger={{ .logger_name }} prefix helps to recognise the parsed fields in the Loki view (but how you configure it is an individual matter for your application).
Here you will find quite good documentation about the entire process: https://grafana.com/docs/loki/latest/clients/promtail/pipelines/
The example was run on release v1.5.0 of Loki and Promtail. (Update 2020-04-25: I've updated the links to the current version, 2.2, as the old links stopped working.)
The section about the timestamp stage is here: https://grafana.com/docs/loki/latest/clients/promtail/stages/timestamp/ with examples - I've tested it and didn't notice any problems. Hope that helps a little bit.
The JSON configuration part: https://grafana.com/docs/loki/latest/clients/promtail/stages/json/
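Note that the timestamp in the original question, 2019-10-25T15:25:41.041-03, is not RFC3339Nano (the zone offset has no minutes), so that format string cannot parse it. Once the field is extracted, a timestamp stage with a custom Go reference layout should match it; a sketch, with the layout string being my assumption from the single sample line:

```yaml
- timestamp:
    source: timestamp
    format: 2006-01-02T15:04:05.000-07
```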
Result on Loki:
Say my OpenAPI definition has two servers. Both share the same variables, so I want to reference these variables to avoid duplicating them.
Currently I split my OpenAPI definition into files and combine them with swagger-cli bundle.
This is what it creates:
openapi: 3.0.2
info:
  title: My API
  description: 'some description'
  version: 1.0.0
servers:
  - url: 'https://stage-api.domain.com/foo/{v1}/{v2}/{v3}'
    description: Staging API server for QA
    variables:
      v1:
        description: 'variable 1'
        default: 'something'
        enum:
          - 'foo1'
          - 'foo2'
      v2:
        description: 'variable 2'
        default: 'something'
        enum:
          - 'foo1'
          - 'foo2'
      v3:
        description: 'variable 3'
        default: 'something'
        enum:
          - 'foo1'
          - 'foo2'
  - url: 'https://api.domain.com/foo/{v1}/{v2}/{v3}'
    description: PRODUCTION API server
    variables:
      region:
        $ref: '#/servers/0/variables/v1'
      brand:
        $ref: '#/servers/0/variables/v2'
      locale:
        $ref: '#/servers/0/variables/v3'
paths: {}
Trying to validate this in Swagger Editor I get the following error:
Structural error at servers.1.variables.v1 should NOT have
additional properties additionalProperty: $ref Jump to line xx
Structural error at servers.1.variables.v1 should have required
property 'default' missingProperty: default Jump to line xx
Is it possible to reference the server variables or reuse them in another way?
Of course I could run swagger-cli bundle -r, but I'd rather avoid that.
No, this is not supported. You can request changes to the OpenAPI Specification at
https://github.com/OAI/OpenAPI-Specification/issues
In your example, the server paths are almost the same except for the subdomain, so you can use a single server definition and make the subdomain a variable:
servers:
- url: 'https://{env}.domain.com/foo/{v1}/{v2}/{v3}'
variables:
env:
description: Environment - staging or production
default: stage-api
enum:
- stage-api
- api
# other variables
# ...
I have two services, A and B, which I want to monitor, and two different notification channels, X and Y, in the form of receivers in the Alertmanager config file.
I want to notify X if service A goes down and notify Y if service B goes down. How can I achieve this with my configuration?
My Alertmanager YAML file is:
route:
  receiver: X

receivers:
  - name: X
    email_configs:
  - name: Y
    email_configs:
And my alert.rules file is:
groups:
  - name: A
    rules:
      - alert: A_down
        expr: expression
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "A is down"
  - name: B
    rules:
      - alert: B_down
        expr: expression
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "B is down"
The config should roughly look like this (not tested):
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'default-receiver'
  routes:
    - match:
        alertname: A_down
      receiver: X
    - match:
        alertname: B_down
      receiver: Y
The idea is that each route node can have a routes field containing child routes; a child route's config takes effect when the alert's labels match its match condition.
To clarify, the general flow for handling an alert in Prometheus (the Alertmanager and Prometheus integration) is:
an error happens in your configured rule (Rule) -> it is routed to a destination (Route)
-> an event is triggered (Receiver) -> you get a message in Slack/PagerDuty/mail/etc.
For example:
if my AWS machine cluster production-a1 is down, I want to trigger an event sending a PagerDuty alert and a Slack message to my team with the relevant error.
There are three files that matter when configuring alerts on your Prometheus system:
alertmanager.yml - configuration of your routes (receiving the triggered errors) and receivers (how to handle these errors)
rules.yml - the rules containing all the thresholds and conditions you define for your system
prometheus.yml - the global configuration that integrates your rules, routes, and receivers (the two above)
I'm attaching a dummy example to demonstrate the idea. In this example I'll watch for overload on my machine (using the node exporter installed on it):
On /var/data/prometheus-stack/alertmanager/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'JohnDoe@gmail.com'

route:
  receiver: defaultTrigger
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 6h
  routes:
    - match_re:
        service: service_overload
        owner: ATeam
      receiver: pagerDutyTrigger

receivers:
  - name: 'pagerDutyTrigger'
    pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
Add a rule in /var/data/prometheus-stack/prometheus/yourRuleFile.yml:
groups:
  - name: alerts
    rules:
      - alert: service_overload_more_than_5000
        expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
        for: 10m
        labels:
          service: service_overload
          severity: pager
          dev_team: myteam
        annotations:
          dev_team: myteam
          priority: Blocker
          identifier: '{{ $labels.name }}'
          description: 'service overflow'
          value: '{{ humanize $value }}%'
In /var/data/prometheus-stack/prometheus/prometheus.yml, add this snippet to integrate Alertmanager:
global:
  ...

alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"

rule_files:
  - "yourRuleFile.yml"
...
Pay attention that the key point of this example is the service: service_overload label, which connects and binds the rule to the right receiver.
Reload the config (restart the service, or stop and start your Docker containers) and test it. If it's configured correctly, you can watch the alerts at http://your-prometheus-url:9090/alerts
Is there a way to pass a boolean value for spec.container.env.value?
I want to override, with Helm, a boolean env variable in a Docker parent image (https://github.com/APSL/docker-thumbor): UPLOAD_ENABLED
I made a simpler test. If you try the following YAML:
apiVersion: v1
kind: Pod
metadata:
  name: envar-demo
  labels:
    purpose: demonstrate-envars
spec:
  containers:
    - name: envar-demo-container
      image: gcr.io/google-samples/node-hello:1.0
      env:
        - name: DEMO_GREETING
          value: true
and try to create it with Kubernetes:
kubectl create -f envars.yaml
you get the following error:
error: error validating "envars.yaml": error validating data: expected type string, for field spec.containers[0].env[0].value, got bool; if you choose to ignore these errors, turn validation off with --validate=false
With --validate=false:
Error from server (BadRequest): error when creating "envars.yaml": Pod in version "v1" cannot be handled as a Pod: [pos 192]: json: expect char '"' but got char 't'
It doesn't work with integer values either.
spec.container.env.value is defined as a string; see here:
https://kubernetes.io/docs/api-reference/v1.6/#envvar-v1-core
You'd have to cast/convert/coerce to a boolean in your container when using this value.
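For instance, a minimal sketch of such a coercion in a container entrypoint script; the to_bool helper and the set of accepted truthy spellings are my own choices, not anything Kubernetes defines:

```shell
#!/bin/sh
# Kubernetes delivers the env value as a string, so coerce it back to a
# boolean-ish flag here before using it.
to_bool() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    1|true|yes|on) echo 1 ;;
    *)             echo 0 ;;
  esac
}

DEMO_GREETING="true"   # stand-in for the value injected from the Pod spec
GREETING_ENABLED=$(to_bool "$DEMO_GREETING")
```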
Try escaping the value. The below worked for me:
- name: DEMO_GREETING
  value: "'true'"
This works for me.
In my example, one is hardcoded, and the other comes from an env var.
env:
  - name: MY_BOOLEAN
    value: 'true'
  - name: MY_BOOLEAN2
    value: '${MY_BOOLEAN2_ENV_VAR}'
So basically, I wrap single quotes around everything, just in case.
WARNING: Don't use hyphens in your env var names; that will not work.
If you are the Helm chart implementer, just quote it:
data:
  # VNC_ONLY: {{ .Values.vncOnly }}    <-- Wrong
  VNC_ONLY: "{{ .Values.vncOnly }}"  # <-- Correct
From the command line you can also use --set-string instead of --set, and you will be able to pass the value without escaping. For instance:
--set-string "env.my-setting=False"