I am new to Prometheus, and I have been trying to set up the blackbox exporter to monitor my server with the http_post_2xx module rather than the http_2xx module. Despite a lot of research on the internet, I still cannot figure it out.
Here is the background of my situation: I used to check whether my website was available by sending a POST request from Postman and manually looking for a 200 OK (or no response at all). This is ineffective and irresponsible, as I should not be hearing about an outage from a website visitor rather than noticing it myself. Therefore, I turned to Prometheus.
Blackbox exporter seems like my solution. I built the blackbox exporter on my server, and its configuration file looks like this:
modules:
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{text: "hi"}'
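One detail worth double-checking in this module: {text: "hi"} is not valid JSON (the key is unquoted), so an endpoint that strictly parses JSON may reject the POST before ever returning a 2xx. A sketch of the same module with a valid JSON body, assuming the endpoint expects JSON:
modules:
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      # Quoted key so the payload is valid JSON
      body: '{"text": "hi"}'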
I configured prometheus.yml this way:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - 10.0.100.130:2001
          - 10.0.100.130:2002 # The IP address I want to monitor
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.100.130:9115 # Turn on this port for sending metrics
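Since the relabeling above routes every probe through the exporter at 10.0.100.130:9115, the probe can also be tested in isolation before involving Prometheus; a sketch of the manual check (the target value mirrors one of the static_configs entries):
# Ask the blackbox exporter to probe one target directly, then inspect
# probe_success and probe_http_status_code in the returned metrics
curl 'http://10.0.100.130:9115/probe?module=http_post_2xx&target=10.0.100.130:2001'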
The dashboard I use is 5345, but its HTTP Status Code panel shows N/A or No, while the same request from Postman returns 200 OK. Is there anything wrong with my configuration?
Context
I am working on a POC for a client which involves a Citrix Netscaler. My entire demo is a docker-compose.yml with:
different DBMS
some web services
my monitoring solution (grafana, prometheus, telegraf)
I would like to use this image as a reverse proxy for the web services and monitor this service with prometheus.
Need
I would like to set things up so that no manual action is required to run the demo. In the context of nginx, I would simply mount the relevant conf file somewhere in /etc/nginx/conf.d. Using a Citrix NetScaler, I am not sure
whether it is even possible
how to proceed (the only doc I could find describes a very graphical/complicated process)
In a nutshell, I would like to be able to route http requests to the different web services by overriding some configuration file, like so:
netscaler:
  image: store/citrix/netscalercpx:12.0-56.20
  container_name: ws-netscaler
  ports:
    - 444:443
    - 81:80
  expose:
    - 161
  volumes:
    - ./netscaler/some.conf:/nsconfig/some.conf:ro # what I am trying to achieve
  environment:
    - EULA=yes
  cap_add:
    - NET_ADMIN
  ulimits:
    nproc: 1
About this specific image
It appears that all NetScaler-related files are here
root@61baa67a839f:/# ls /netscaler
cli_script.sh nitro ns_service_stop nscli_linux nsconmsg nsnetsvc nssslgen pitboss
docker_startup.sh ns_reboot nsaggregatord nsconfigaudit nslinuxtimer nsppe nstraceaggregator showtechsupport.pl
netscaler.conf ns_service_start nsapimgr nsconfigd nslped nssetup_linux nstracemergenclean.sh snmpd
and here
root@61baa67a839f:/# ls -R nsconfig
nsconfig:
dns monitors nsboot.conf snmpd.conf ssl
nsconfig/dns:
nsconfig/monitors:
nsconfig/ssl:
ns-root.cert ns-root.req ns-server.cert ns-server.req ns-sftrust-root.key ns-sftrust-root.srl ns-sftrust.der ns-sftrust.req
ns-root.key ns-root.srl ns-server.key ns-sftrust-root.cert ns-sftrust-root.req ns-sftrust.cert ns-sftrust.key ns-sftrust.sig
Based on nsboot.conf's content
root@61baa67a839f:/# cat /nsconfig/nsboot.conf
add route 0 0 172.18.0.1
set rnat 192.0.0.0 255.255.255.0 -natip 172.18.0.2
add ssl certkey ns-server-certificate -cert ns-server.cert -key ns-server.key
set tcpprofile nstcp_default_profile mss 1460
set ns hostname 61baa67a839f
and this documentation, I am assuming that this would be the place. Am I right in assuming so?
Edit
Overriding nsboot.conf did not work as expected, as this file is quite probably written by entrypoint.sh; I ended up with multiple definitions. It seems that the correct way to do it is by injecting /etc/cpx.conf (source).
# /etc/cpx.conf
WS_ADDRESS=$(getent hosts some_web_service | awk '{ print $1 }')
add cs vserver some_ws HTTP $WS_ADDRESS 5000
But I can't access the resource through the NetScaler (mainly because I do not understand the NetScaler CLI yet):
$ curl http://localhost:5000/hello
Hello, World!%
$ curl http://localhost:81/some_ws/hello
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /some_ws/hello was not found on this server.</p>
</body></html>
I have a problem parsing a JSON log with Promtail; can somebody help me, please? I have tried many configurations, but the timestamp and the other labels don't get parsed.
log entry:
{timestamp=2019-10-25T15:25:41.041-03, level=WARN, thread=http-nio-0.0.0.0-8080-exec-2, mdc={handler=MediaController, ctxCli=127.0.0.1, ctxId=FdD3FVqBAb0}, logger=br.com.brainyit.cdn.vbox.controller.MediaController, message=[http://localhost:8080/media/sdf],c[500],t[4],l[null], context=default}
promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: vbox-main
    static_configs:
      - targets:
          - localhost
        labels:
          job: vbox
          appender: main
          __path__: /var/log/vbox/main.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            message: message
            context: context
            level: level
      - timestamp:
          source: timestamp
          format: RFC3339Nano
      - labels:
          context:
          level:
      - output:
          source: message
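A side note on the log format itself: the line shown above is not valid JSON (the keys and most values are unquoted), so a json stage has nothing it can extract, and 2019-10-25T15:25:41.041-03 is not RFC3339Nano. A hedged sketch of a regex-based pipeline for that key=value layout (the pattern is illustrative and only covers the fields shown above):
pipeline_stages:
  - regex:
      # Illustrative pattern for the key=value line shown in the question
      expression: '\{timestamp=(?P<timestamp>[^,]+), level=(?P<level>\w+), .*message=(?P<message>.*), context=(?P<context>[^}]+)\}'
  - timestamp:
      source: timestamp
      # Go reference-time layout matching 2019-10-25T15:25:41.041-03
      format: "2006-01-02T15:04:05.000-07"
  - labels:
      level:
      context:
  - output:
      source: message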
I've tried the setup of Promtail with a Java Spring Boot application (which writes its logs to a file in JSON format via the Logstash Logback encoder) and it works.
The example log line generated by the application:
{"timestamp":"2020-06-06T01:00:30.840+02:00","version":1,"message":"Started ApiApplication in 1.431 seconds (JVM running for 6.824)","logger_name":"com.github.pnowy.spring.api.ApiApplication","thread_name":"main","level":"INFO","level_value":20000}
The Promtail config:
# Promtail Server Config
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Positions
positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: springboot
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            timestamp: timestamp
            logger_name: logger_name
            stack_trace: stack_trace
            thread_name: thread_name
      - labels:
          level:
      - template:
          source: new_key
          template: 'logger={{ .logger_name }} threadName={{ .thread_name }} | {{ or .message .stack_trace }}'
      - output:
          source: new_key
    static_configs:
      - targets:
          - localhost
        labels:
          job: applogs
          __path__: /Users/przemek/tools/promtail/*.log
Please notice that the output (the log text) is first assembled as new_key by Go templating and later set as the output source. The logger={{ .logger_name }} prefix helps to recognise the fields as parsed in the Loki view (but it's an individual matter of how you want to configure it for your application).
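For the example log line above, that template renders the entry pushed to Loki roughly as:
logger=com.github.pnowy.spring.api.ApiApplication threadName=main | Started ApiApplication in 1.431 seconds (JVM running for 6.824)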
Here you will find quite nice documentation about the entire process: https://grafana.com/docs/loki/latest/clients/promtail/pipelines/
The example was run on release v1.5.0 of Loki and Promtail. (Update 2020-04-25: I've updated the links to the current version, 2.2, as the old links stopped working.)
The section about the timestamp stage is here: https://grafana.com/docs/loki/latest/clients/promtail/stages/timestamp/, with examples - I've tested it and didn't notice any problems. Hope that helps a little bit.
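For the timestamp format in the log line above (2020-06-06T01:00:30.840+02:00), the named format RFC3339 parses fine; an equivalent stage spelled out with an explicit Go reference-time layout would be:
- timestamp:
    source: timestamp
    # Matches e.g. 2020-06-06T01:00:30.840+02:00; the named format RFC3339 works too
    format: "2006-01-02T15:04:05.000-07:00"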
The JSON configuration part: https://grafana.com/docs/loki/latest/clients/promtail/stages/json/
Result on Loki:
I have 2 services, A and B, which I want to monitor. I also have 2 different notification channels, X and Y, in the form of receivers in the Alertmanager config file.
I want to notify X if service A goes down and notify Y if service B goes down. How can I achieve this with my configuration?
My AlertManager YAML file is:
route:
  receiver: X
receivers:
  - name: X
    email_configs:
  - name: Y
    email_configs:
And the alert.rules file is:
groups:
  - name: A
    rules:
      - alert: A_down
        expr: expression
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "A is down"
  - name: B
    rules:
      - alert: B_down
        expr: expression
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "B is down"
The config should roughly look like this (not tested):
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'default-receiver'
  routes:
    - match:
        alertname: A_down
      receiver: X
    - match:
        alertname: B_down
      receiver: Y
The idea is that each route can itself have a routes field, where you can put a different config that gets applied if the labels of an alert match the match condition.
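Because routes nest, matching on a shared label often scales better than matching individual alert names as more services are added; a rough sketch, assuming your rules attach a team label (the label values and the X-pager receiver are hypothetical):
route:
  receiver: default-receiver
  routes:
    - match:
        team: backend          # hypothetical label set in your alert rules
      receiver: X
      routes:
        - match:
            severity: critical # escalate this team's critical alerts
          receiver: X-pager    # hypothetical escalation receiver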
For clarification - the general flow for handling an alert in Prometheus (Alertmanager and Prometheus integration) is like this:
SomeErrorHappensInYourConfiguredRule(Rule) -> RouteToDestination(Route)
-> TriggeringAnEvent(Receiver) -> GetAMessageInSlack/PagerDuty/Mail/etc...
For example:
if my AWS machine cluster production-a1 is down, I want to trigger an event that notifies my team via PagerDuty and Slack with the relevant error.
There are 3 files important for configuring alerts on your Prometheus system:
alertmanager.yml - configuration of your routes (receiving the triggered errors) and receivers (how to handle these errors)
rules.yml - these rules contain all the thresholds and rules you'll define in your system
prometheus.yml - global configuration that integrates your rules into routes and receivers (the two above)
I'm attaching a dummy example in order to demonstrate the idea; in this example I'll watch for overload on my machine (using the node exporter installed on it):
In /var/data/prometheus-stack/alertmanager/alertmanager.yml:
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'JohnDoe@gmail.com'

route:
  receiver: defaultTrigger
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 6h
  routes:
    - match_re:
        service: service_overload
        owner: ATeam
      receiver: pagerDutyTrigger

receivers:
  - name: 'pagerDutyTrigger'
    pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
Add a rule in /var/data/prometheus-stack/prometheus/yourRuleFile.yml:
groups:
  - name: alerts
    rules:
      - alert: service_overload_more_than_5000
        expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
        for: 10m
        labels:
          service: service_overload
          severity: pager
          dev_team: myteam
        annotations:
          dev_team: myteam
          priority: Blocker
          identifier: '{{ $labels.name }}'
          description: 'service overflow'
          value: '{{ humanize $value }}%'
In /var/data/prometheus-stack/prometheus/prometheus.yml, add this snippet to integrate Alertmanager:
global:
  ...

alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"

rule_files:
  - "yourRuleFile.yml"

...
Pay attention that the key point of this example is service_overload, which connects and binds the rule to the right receiver.
Reload the config (restart the service, or stop and start your Docker containers) and test it; if it's configured well, you can watch the alerts at http://your-prometheus-url:9090/alerts
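Before restarting anything, it can save a round trip to validate the files first; a sketch, assuming promtool and amtool (shipped with Prometheus and Alertmanager respectively) are on the PATH:
# Validate the rule file and both configs before reloading
promtool check rules /var/data/prometheus-stack/prometheus/yourRuleFile.yml
promtool check config /var/data/prometheus-stack/prometheus/prometheus.yml
amtool check-config /var/data/prometheus-stack/alertmanager/alertmanager.yml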
I'm new to Prometheus and I'm trying to monitor some extend OIDs I created with the snmp_exporter, but it doesn't work as expected.
My script just does an "echo $VALUE" (value is an integer or a string).
I have this snmpd.conf:
extend value-return-test /usr/local/bin/script.sh
I generated its OID:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendResult.\"value-return-test\" -On
Now I'm able to list all the SNMP extend entries linked to my configuration:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendObjects |grep value-return-test
Now, here is my Prometheus configuration, prometheus.yml:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'snmp'
    metrics_path: /snmp
    params:
      module: [tests]
    static_configs:
      - targets:
          - 127.0.0.1 # SNMP device - add your IPs here
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116 # SNMP exporter.
and my snmp.yaml:
tests:
  walk:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
With that configuration I'm not able to get my values on the page http://localhost:9116/snmp?target=127.0.0.1&module=tests:
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 0.004676028
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 0
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 0.004477656
However, if I put my configuration into another block like if_mib, I'm able to get the values BUT they are put in the wrong place:
As you can see, I got the value "1" instead of "6".
I also tried the snmp_exporter generator, but I'm not able to build it:
$ go build
# github.com/prometheus/snmp_exporter/generator
./net_snmp.go:6:38: fatal error: net-snmp/net-snmp-config.h: No such file or directory
compilation terminated.
Thanks for your help
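A note on that last error: the compile failure when building the generator usually means the Net-SNMP development headers are missing rather than anything Go-specific; on Debian/Ubuntu the package is libsnmp-dev (on RHEL-family systems, net-snmp-devel):
# Install the Net-SNMP headers, then rebuild the generator
sudo apt-get install build-essential libsnmp-dev
go build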
If you are able to change snmpd.conf, that implies you have enough control over the machine to run the node exporter. I'd suggest using the textfile collector of the node exporter to expose this data, rather than spending time figuring out the intricacies of how SNMP and MIBs work.
In general you should prefer the Node/WMI exporters where possible over using SNMP.
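For instance, a cron-driven sketch of the textfile approach, assuming the node exporter runs with --collector.textfile.directory=/var/lib/node_exporter/textfile_collector (the directory and metric name are illustrative, and the value must be numeric; a string value would have to go into a label instead):
#!/bin/sh
# Publish the script's output as a Prometheus gauge via the textfile collector.
VALUE=$(/usr/local/bin/script.sh)
DIR=/var/lib/node_exporter/textfile_collector   # assumed collector directory
cat > "$DIR/value_return_test.prom.$$" <<EOF
# HELP value_return_test Value reported by script.sh
# TYPE value_return_test gauge
value_return_test $VALUE
EOF
# Rename atomically so the exporter never reads a half-written file.
mv "$DIR/value_return_test.prom.$$" "$DIR/value_return_test.prom"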
Using the get parameter instead of walk worked for me.
tests:
  get:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
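This fits the symptom above: walk issues GETNEXT/GETBULK requests underneath a subtree, and these OIDs are exact leaf instances, so walking them returns no PDUs (hence snmp_scrape_pdus_returned 0), while get fetches the exact OIDs directly. The result can be checked against the exporter straight away, using the probe URL from the question:
# Fetch the module's metrics directly from the SNMP exporter
curl 'http://localhost:9116/snmp?target=127.0.0.1&module=tests'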
I have two simple services: transactions-core-service and transactions-api-service.
transactions-api-service invokes transactions-core-service to return a list of transactions. transactions-api-service is enabled with a Hystrix command.
Both are registered in the Eureka server with the below service ids:
TRANSACTIONS-API-SERVICE n/a (1) (1) UP (1) - 192.168.2.12:transactions-api-service:8083
TRANSACTIONS-CORE-SERVICE n/a (1) (1) UP (1) - 192.168.2.12:transactions-core-service:8087
Below is my Zuul server:
@SpringBootApplication
@Controller
@EnableZuulProxy
public class ZuulApplication {
    public static void main(String[] args) {
        new SpringApplicationBuilder(ZuulApplication.class).web(true).run(args);
    }
}
Zuul Configurations:
===============================================
info:
  component: Zuul Server

server:
  port: 8765

endpoints:
  restart:
    enabled: true
  shutdown:
    enabled: true
  health:
    sensitive: false

zuul:
  ignoredServices: "*"
routes:
  transactions-api-service:
    path: transactions/accounts/**
    serviceId: transactions-api-service

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/

logging:
  level:
    ROOT: INFO
    org.springframework.web: DEBUG
===============================================
When I try to invoke transactions-api-service with the URL http://localhost:8765/transactions/accounts/123/transactions/786, I get a ZuulException:
2016-02-13 11:29:29.050 WARN 4936 --- [nio-8765-exec-1]
o.s.c.n.z.filters.post.SendErrorFilter : Error during filtering
com.netflix.zuul.exception.ZuulException: Forwarding error
    at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward(RibbonRoutingFilter.java:131) ~[spring-cloud-netflix-core-1.1.0.M3.jar:1.1.0.M3]
    at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.run(RibbonRoutingFilter.java:76) ~[spring-cloud-netflix-core-1.1.0.M3.jar:1.1.0.M3] ......
If I invoke transactions-api-service individually (on localhost with /accounts/123/transactions/786), it works fine.
Am I missing any configuration on Zuul?
You need to change the Zuul execution timeout by adding this property in the application.yml of the Zuul server:
# Increase the Hystrix timeout to 60s (globally)
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 60000
Please refer to this thread in the spring-cloud-netflix issues: https://github.com/spring-cloud/spring-cloud-netflix/issues/321
Faced the same issue. In my case, Zuul was using service discovery. As a solution, the configuration below worked like a charm.
ribbon.ReadTimeout=60000
Reference to the property usage is here.
You have incorrect indentation. Instead of:
zuul:
  ignoredServices: "*"
routes:
  transactions-api-service:
    path: transactions/accounts/**
    serviceId: transactions-api-service
It should be:
zuul:
  ignoredServices: "*"
  routes:
    transactions-api-service:
      path: transactions/accounts/**
      serviceId: transactions-api-service
You can use this to avoid the 500 error:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=1000000
zuul.host.connect-timeout-millis=10000
zuul.host.socket-timeout-millis=1000000
In case your Zuul gateway uses a discovery service for service lookup, you can disable the Hystrix timeout or increase it as below:
# Disable Hystrix timeout globally (for all services)
hystrix.command.default.execution.timeout.enabled: false
# To disable the timeout for a particular service,
hystrix.command.<serviceName>.execution.timeout.enabled: false
# Increase the Hystrix timeout to 60s (globally)
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 60000
# Increase the Hystrix timeout to 60s (per service)
hystrix.command.<serviceName>.execution.isolation.thread.timeoutInMilliseconds: 60000
I was having the same issue with the Zuul server; it got resolved with the properties below.
Let's say you have 2 clients, clientA and clientB.
For clientA, spring.application.name=clientA and server.port=1111; for clientB, spring.application.name=clientB and server.port=2222, in their respective application.properties files.
You want to connect these 2 services to a Zuul server which is running on port 8087.
Add the properties below in your Zuul server's application.properties file:
spring.application.name=gateway-service
eureka.client.serviceUrl.defaultZone=http://localhost:8761/eureka
eureka.client.register-with-eureka=true
eureka.client.fetch-registry=true
clientA.ribbon.listOfServers=http://localhost:1111
clientB.ribbon.listOfServers=http://localhost:2222
server.port=8087
Note: I am using a Eureka client with my Zuul server; you can skip that part. Adding this solution in case it's helpful for someone.