I am trying to scrape Traefik metrics with Prometheus.
Traefik (latest) is hosted as a service on a Swarm cluster, and its Prometheus metrics are enabled.
The matching endpoint is 10.200.1.1:8088/metrics
When I hit the endpoint from a browser, I see the expected metrics:
...
# HELP traefik_config_last_reload_failure Last config reload failure
# TYPE traefik_config_last_reload_failure gauge
traefik_config_last_reload_failure 0
# HELP traefik_config_last_reload_success Last config reload success
# TYPE traefik_config_last_reload_success gauge
traefik_config_last_reload_success 1.53633684e+09
# HELP traefik_config_reloads_failure_total Config failure reloads
# TYPE traefik_config_reloads_failure_total counter
traefik_config_reloads_failure_total 0
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 76
...
So, from my point of view, editing the following prometheus.yml (and POSTing to the /-/reload endpoint) should add these metrics:
global:
  scrape_interval: 15s
rule_files:
  - "targets.rules"
  - "host.rules"
  - "containers.rules"
scrape_configs:
  ...
  - job_name: 'traefik'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['10.200.1.2:8088']
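(To be precise, by "POSTing to the /-/reload" I mean something along these lines — assuming Prometheus 2.x listening on localhost:9090 and started with --web.enable-lifecycle:)
# validate the edited config, then ask Prometheus to reload it
promtool check config prometheus.yml
curl -X POST http://localhost:9090/-/reload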
But unfortunately, none of these metrics appear in the drop-down list of the Prometheus UI.
Since I am new to Traefik and Prometheus, I am quite sure I have misunderstood something.
I tried to follow a few guides (such as this one), but could not manage to make it work (they may have worked with a previous version).
So... does anyone have an idea of what I am doing wrong, and/or what the correct way is?
After a while, many attempts, and some pertinent questions later: I ended up thinking it was not about my configuration...
Since I had also observed some randomly odd behavior (such as 503 errors on my remote /providers calls), I started thinking the problem was related to access to my machine.
So I tried to demote the manager and promote another node of the swarm instead.
... And it worked!
My traefik metrics now appear in prometheus!
I still have to understand what is wrong with my former manager, but at least, I am stepping forward!
Thanks @AlinSînpălean & @AndreasJägle for your help!
I am using K6 for Load Testing.
I have cloned the K6, Grafana, InfluxDB docker-compose set up from here:
https://github.com/loadimpact/k6
Each time I start Grafana, I have to manually import the dashboard I want to use ('Import' → dashboard ID 2587 → 'Load').
I am new to Docker (and Grafana!)... is there any way to have this dashboard preloaded in the container so I don't have to manually add it each time?
Mount your dashboard and datasources into the Grafana container when running docker-compose up -d influxdb grafana.
Refer to the docker-compose file and grafana folder here.
And make sure the datasource in your dashboard.json is updated with the name of the datasource defined in datasource.yml.
I have created a small tutorial in the k6 community. Hope this solves your case.
A few small improvements which I think can help the docker-compose setup be awesome to use:
Use the awesome 'k6 Load Testing Results - by dcadwallader' dashboard:
https://grafana.com/grafana/dashboards/2587
Map a local dashboards directory, as well as provisioning files for the dashboard and datasource with all of the org IDs and settings pre-configured, e.g.:
volumes:
  - ./dashboards:/var/lib/grafana/dashboards
  - ./grafana-dashboard.yaml:/etc/grafana/provisioning/dashboards/dashboard.yaml
  - ./grafana-datasource.yaml:/etc/grafana/provisioning/datasources/datasource.yaml
https://github.com/luketn/docker-k6-grafana-influxdb/blob/master/docker-compose.yml#L32-L35
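For reference, a minimal grafana-dashboard.yaml could look like the sketch below (the provider name is arbitrary, and the path must match the dashboards volume mount above):
apiVersion: 1
providers:
  - name: 'k6'                # arbitrary provider name
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    options:
      path: /var/lib/grafana/dashboards   # must match the volume mount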
Set the uid in the dashboard JSON file for consistent links, e.g.:
{
  "uid": "k6",
https://github.com/luketn/docker-k6-grafana-influxdb/blob/master/dashboards/k6-load-testing-results_rev3.json#L53
Ref: https://medium.com/swlh/beautiful-load-testing-with-k6-and-docker-compose-4454edb3a2e3
And: https://github.com/luketn/docker-k6-grafana-influxdb
I am new to Prometheus/Alertmanager.
I have created a cron job which executes a shell script every minute. This shell script generates a "test.prom" file (with a gauge metric in it) in the directory assigned to the --textfile.collector.directory argument of node-exporter. I verified (using curl http://localhost:9100/metrics) that node-exporter exposes that custom metric correctly.
When I try to run a query against that custom metric in the Prometheus dashboard, it does not show any results (it says "no data found").
I could not figure out why the query against the metric exposed via the node-exporter textfile collector fails. Any clues what I missed? Also, please let me know how to check and ensure that Prometheus scraped my custom metric `test_metric`.
My query in the Prometheus dashboard is test_metric != 0, which did not give any results, even though I exposed test_metric via the node-exporter textfile collector.
Any help is appreciated!
BTW, the node-exporter is running as a Docker container in a Kubernetes environment.
I had a similar situation, but it was not a configuration problem.
Instead, my data included timestamps:
# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 53.87 1541426242
network_connectivity_rtt{host="hop_1"} 58.8 1541426242
network_connectivity_rtt{host="hop_2"} 21.93 1541426242
network_connectivity_rtt{host="hop_3"} 71.69 1541426242
The Prometheus node exporter (PNE) was picking them up without any problem once I reloaded it. As Prometheus runs under systemd, I had to check its logs like this:
journalctl --system -u prometheus.service --follow
There I read this line:
msg="Error on ingesting samples that are too old or are too far into the future"
Once I removed the timestamps, the values started appearing. This led me to read about the timestamps in more detail, and I found out they have to be in milliseconds. So this format is OK:
# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 50.47 1541429581376
network_connectivity_rtt{host="hop_1"} 3.38 1541429581376
network_connectivity_rtt{host="hop_2"} 11.2 1541429581376
network_connectivity_rtt{host="hop_3"} 20.72 1541429581376
I hope it helps someone else.
It's my bad. I had not included scrape instructions for node-exporter in the prometheus.yaml file. It worked after including them.
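For anyone hitting the same thing, the missing piece was a scrape job along these lines (a sketch — the job name and target address are illustrative and must match where your node-exporter actually listens):
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']   # node-exporter's host:port
Once it is in place, you can also confirm the metric is really being scraped by querying the Prometheus HTTP API, e.g.:
curl -s 'http://localhost:9090/api/v1/query?query=test_metric'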
This issue can also happen because of stale metrics.
Let's say you have written your metric to the file at 13:00.
By default, after 5 minutes Prometheus will consider your metric stale, and it might have disappeared by the time you make your query.
My scenario is that in blackbox.yml I have an ssh_banner module which checks for SSH, like below.
ssh_banner:
  prober: tcp
  tcp:
    query_response:
      - expect: "^SSH-2.0-"
Below is the relevant part of prometheus.yml:
- job_name: 'ssh_test'
  scrape_interval: 20s
  metrics_path: /probe
  params:
    module: ["ssh_banner"]
    target: ["node1:22", "node2:22"]
  static_configs:
    - targets:
        - 'blackbox:9115'
I can see it is only doing the SSH test for node1, not for node2. Is there any way to put the targets in a single place? I know creating a separate job per node would solve this problem, but the number of servers can be large, so creating a separate job for every node does not look like a good idea.
You need to follow the documentation and add relabelling rules for all this to work. There is a guide for this exact use case, too.
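Roughly, that looks like the sketch below (node names and the blackbox:9115 address are taken from the question): the probed hosts go in the targets list, and relabelling moves each one into the ?target= parameter while pointing the actual scrape at the exporter.
- job_name: 'ssh_test'
  scrape_interval: 20s
  metrics_path: /probe
  params:
    module: ["ssh_banner"]
  static_configs:
    - targets:
        - 'node1:22'
        - 'node2:22'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target     # probed host becomes the ?target= parameter
    - source_labels: [__param_target]
      target_label: instance           # keep the probed host as the instance label
    - target_label: __address__
      replacement: 'blackbox:9115'     # the actual scrape goes to the blackbox exporter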
Although Prometheus says that the alerts are firing, my Alertmanager does not receive any alerts. It says "No Alerts".
This is just for testing purposes in my local machine. Here is my prometheus.yml
---
rule_files:
  - ~/Documents/prometheus-data/alert.rules
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    static_configs:
      - targets:
          - "127.0.0.1:9100"
I use the following command to start Prometheus.
./prometheus -config.file=prometheus.yml -alertmanager.url=http://127.0.0.1:9093
Am I missing anything?
I believe the issue is the path to your rules file at ~/Documents/prometheus-data/alert.rules, notably the ~ character.
Moving the rules file to the same directory as Prometheus and referencing it as just alert.rules worked for me when I tested your setup. I also tested removing the ~ and using the absolute path to the alert.rules file, which worked as well.
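In other words, a rule_files entry along these lines should work (a sketch — "youruser" is a placeholder for your actual home directory):
rule_files:
  - /home/youruser/Documents/prometheus-data/alert.rules   # absolute path, no ~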
I am new to Kubernetes. I want to know: what does the 'v' in the following mean?
spec:
  containers:
    - args:
        - -v=9
It seems to me that it denotes verbose logging. Is there any documentation regarding the various logging levels available, i.e., what values that -v arg can take?
Kubernetes uses glog. The conventional values are 1-4, as denoted in this doc, though higher verbosity levels (such as the -v=9 in your example) are also accepted and produce increasingly detailed debug output.
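As a quick illustration, you can see the effect of a high verbosity level with kubectl, where -v=9 dumps the underlying HTTP requests and responses in full:
kubectl get pods --v=9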