Prometheus does not push alerts to AlertManager

Although Prometheus shows the alerts as firing, my Alertmanager does not receive any of them. It says "No Alerts".
This is just for testing on my local machine. Here is my prometheus.yml:
---
rule_files:
  - ~/Documents/prometheus-data/alert.rules
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    static_configs:
      - targets:
          - "127.0.0.1:9100"
I use the following command to start prometheus.
./prometheus -config.file=prometheus.yml -alertmanager.url=http://127.0.0.1:9093
Am I missing anything?

I believe the issue is the path to your rules file, ~/Documents/prometheus-data/alert.rules, and specifically the ~ character. The ~ shorthand is expanded by the shell, not by Prometheus, so a path read from prometheus.yml is most likely taken literally.
Moving the rules file to the same directory as Prometheus and referencing it as just alert.rules worked for me when I tested your setup. Replacing the ~ with the absolute path to alert.rules also worked.
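As a minimal sketch of the absolute-path fix, the rule_files entry could look like the following. The /home/youruser prefix is a placeholder and not something taken from the question; substitute your real home directory.

```yaml
# prometheus.yml (sketch; /home/youruser is a hypothetical home directory)
rule_files:
  # Absolute path instead of ~/Documents/..., since ~ is a shell shorthand
  - /home/youruser/Documents/prometheus-data/alert.rules
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    static_configs:
      - targets:
          - "127.0.0.1:9100"
```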

Related

Error: endorsement failure during invoke. response: status:500 message:"error in simulation: failed to execute transaction [duplicate]

I just reinstalled Fabric Samples v2.2.0 from the Hyperledger Fabric repository, following the documentation.
When I run the asset-transfer-basic application in the fabric-samples/asset-transfer-basic/application-javascript directory with node app.js, the wallet is created and an admin and a user are registered. But when it then tries to invoke the function given in app.js, it shows this error:
error: [Transaction]: Error: No valid responses from any peers. Errors:
peer=peer0.org1.example.com:7051, status=500, message=error in simulation: failed to execute transaction
aa705c10403cb65cecbd360c13337d03aac97a8f233a466975773586fe1086f6: could not launch chaincode basic_1.0:b359a077730d7
f44d6a437ad49d1da951f6a01c6d1eed4f85b8b1f5a08617fe7: error starting container: error starting container:
API error (404): network _test not found
This error never occurred before. But after reinstalling Docker and the Hyperledger Fabric fabric-samples, it can no longer find the network _test.
N.B.: Before reinstalling, the name of the network was net_test. Now docker network ls shows a network called docker_test instead. I am using Windows Subsystem for Linux (WSL) version 1.
NETWORK ID     NAME          DRIVER    SCOPE
b7ac05456f46   bridge        bridge    local
acaa5856b871   docker_test   bridge    local
866f58b9078d   host          host      local
4812f94efb15   none          null      local
How can I fix the issue occurring when I try to run the application?

In my opinion, the CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE setting is wrong.
You can check it in docker-compose.yaml or core.yaml.
1. docker-compose.yaml
I will use fabric-samples/test-network as the example, since that matches your situation.
Look for CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE in the peer's environment section.
In your case, ${COMPOSE_PROJECT_NAME} was probably not set, so it expanded to an empty string and the value became _test.
Make sure the variable is set correctly, or change the value to your actual network name.
# hyperledger/fabric-samples/test-network/docker/docker-compose-test-net.yaml
# based v2.2
...
  peer0.org1.example.com:
    container_name: peer0.org1.example.com
    image: hyperledger/fabric-peer:2.2
    environment:
      - CORE_VM_ENDPOINT=unix:///host/var/run/docker.sock
      # - CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=${COMPOSE_PROJECT_NAME}_test
      - CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=docker_test
...
2. core.yaml
If the value is not set for the peer in docker-compose.yaml, check the core.yaml that the peer references.
The parameter to look for there is NetworkMode.
# core.yaml
...
vm:
  docker:
    hostConfig:
      # NetworkMode: host
      NetworkMode: docker_test
...
If neither is set, the default value is used. However, since _test appears in your log, a wrong value has been set in one of the two places, and you need to correct it to the network name you intended.
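One way to guard against an unset variable is Compose's ${VAR:-default} interpolation, sketched below. The fallback value docker is an assumption: it matches the docker_test network shown in the question, which suggests Compose derived the project name from the docker directory that holds the compose file. Adjust it if your network is named differently.

```yaml
# docker-compose-test-net.yaml (sketch, peer section only)
services:
  peer0.org1.example.com:
    container_name: peer0.org1.example.com
    image: hyperledger/fabric-peer:2.2
    environment:
      - CORE_VM_ENDPOINT=unix:///host/var/run/docker.sock
      # Falls back to "docker" when COMPOSE_PROJECT_NAME is unset or empty,
      # so the value can no longer collapse to "_test"
      - CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=${COMPOSE_PROJECT_NAME:-docker}_test
```

After changing the file, the network has to be brought down and up again for the peer to pick up the new value.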

This issue is related to Docker networking. To complement @nezuko's response:
Create a file named ".env" in the same directory as your docker-compose file.
Add the following line to it:
COMPOSE_PROJECT_NAME=net
Use docker-compose up to update the containers with the new configuration.
Or bring the Fabric network down (./network.sh down) and up again (./network.sh up) to restart the test-network.
Otherwise you will still get the same error, even after creating the ".env" file.
More explanation about docker networking

Run ./network.sh down
then
export COMPOSE_PROJECT_NAME=net
afterwards
./network.sh up

I copied this from someone else, and it worked for me:
Create a file named ".env" in the same directory as your docker-compose file and add the following line to it:
COMPOSE_PROJECT_NAME=net

This worked for me
export COMPOSE_PROJECT_NAME=net

Traefik 2.0 - Path router rule not working with docker labels

I set up a GraphQL playground listening on port 4000.
So I added the following Traefik labels:
graphql:
  restart: unless-stopped
  labels:
    - traefik.enable=true
    - "traefik.http.routers.${CI_PROJECT_PATH_SLUG}-${CI_ENVIRONMENT_SLUG}-graphql.rule=Host(`graphql.${CI_ENVIRONMENT_HOST}`)"
    - traefik.http.routers.${CI_PROJECT_PATH_SLUG}-${CI_ENVIRONMENT_SLUG}-graphql.tls.certresolver=letsencrypt
    - traefik.http.services.${CI_PROJECT_PATH_SLUG}-${CI_ENVIRONMENT_SLUG}-graphql.loadbalancer.server.port=4000
This is working when I try to get graphql.site.com.
Now I want it to match site.com/graphql, so I changed the router label to this:
"traefik.http.routers.${CI_PROJECT_PATH_SLUG}-${CI_ENVIRONMENT_SLUG}-graphql.rule=Host(`${CI_ENVIRONMENT_HOST}`) && Path(`/graphql`)"
And with this configuration, I have a 404 error on site.com/graphql.
What did I miss?

In my opinion, no backend application is listening on the path /graphql.
Solution 1:
Make the backend application (GraphQL) listen on the path /graphql.
You should probably also use PathPrefix(`/graphql`) instead of Path(`/graphql`), since Path only matches that exact path.
Solution 2:
Use the Traefik StripPrefix middleware, which removes the prefix from the path before forwarding the request.
Use these labels:
- "traefik.http.routers.${CI_PROJECT_PATH_SLUG}-${CI_ENVIRONMENT_SLUG}-graphql.rule=Host(`${CI_ENVIRONMENT_HOST}`)"
- "traefik.http.middlewares.stripprefix-graphql.stripprefix.prefixes=/graphql"
- "traefik.http.routers.${CI_PROJECT_PATH_SLUG}-${CI_ENVIRONMENT_SLUG}-graphql.middlewares=stripprefix-graphql@docker"
If the backend serves assets (e.g., images or JavaScript files), you will need to make additional changes on the backend.
More info here: https://docs.traefik.io/middlewares/stripprefix/.
Hope this helps.
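The two solutions can also be combined, as in the sketch below. It assumes a fixed router name graphql and the host site.com in place of the CI variables from the question. The rule uses PathPrefix so that only /graphql traffic reaches this router, and the StripPrefix middleware removes the prefix before the request reaches the backend on port 4000.

```yaml
# docker-compose service (sketch; router name and host are simplified placeholders)
services:
  graphql:
    restart: unless-stopped
    labels:
      - traefik.enable=true
      # PathPrefix matches /graphql and anything below it; Path would match only /graphql itself
      - "traefik.http.routers.graphql.rule=Host(`site.com`) && PathPrefix(`/graphql`)"
      - traefik.http.routers.graphql.tls.certresolver=letsencrypt
      # Strip /graphql before forwarding, for a backend that serves from /
      - "traefik.http.middlewares.stripprefix-graphql.stripprefix.prefixes=/graphql"
      - "traefik.http.routers.graphql.middlewares=stripprefix-graphql@docker"
      - traefik.http.services.graphql.loadbalancer.server.port=4000
```

If the backend already serves from /graphql, the two middleware labels can simply be dropped.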

How does Path('') work?
(The Traefik docs don't explain it.)
I'd like to create a Traefik ingress rule that matches a substring in a URL, e.g. I would like to match api in all 3 of these examples:
mysite.com/subroute1/api/get
mysite.com/subroute2/api/post
mysite.com/subroute3/api/post
PathPrefix will not work because the prefix is different for subroute1, subroute2 and subroute3.
Can I use Path('/api')? Will it work for all 3 subroutes? Or is there something like PathContains('/api')?

Travis CI Build to deploy on Cloud Foundry fails

I am trying to deploy a Python Flask application on Cloud Foundry, but it fails.
It shows this output:
The app cannot be mapped to route hello.cfapps.io because the route exists in a different space.
Here is what my travis.yml looks like:
stages:
  - test
  - deploy
language: python
python:
  - '3.6'
env:
  - PORT=8080
cache: pip
script: python hello.py &
jobs:
  include:
    - stage: test
      install:
        - pip install -r requirements.txt
        - pip install -r tests/requirements_test.txt
      script:
        - python hello.py &
        - python tests/test.py
    - stage: deploy
      deploy:
        provider: cloudfoundry
        username: vaibhavgupta0702@gmail.com
        password:
          secure: myencrytedpassword
        api: https://api.run.pivotal.io
        organization: Hello_Flask
        space: development
        on:
          repo: vaibhavgupta0702/flask_helloWorld
Here is what my manifest.yml file looks like
---
applications:
  - name: hello
    memory: 128M
    buildpacks:
      - https://github.com/vaibhavgupta0702/flask_helloWorld.git
    command: python hello.py &
    timeout: 60
    env:
      PORT: 8080
I do not understand why this error occurs. Any help would be highly appreciated.

The app cannot be mapped to route hello.cfapps.io because the route exists in a different space.
This means exactly what it says. The domain cfapps.io is a shared domain which can be used by many people on the platform. When you see this error, it is telling you that someone else using the platform has already pushed an app which is utilizing that route.
There are a couple of possibilities here:
1. Routes are scoped to a space. If you have multiple spaces, it's possible that the route in question is used by an app in one of your other spaces. What you can do is run cf routes --orglevel. This will list all the routes in all the spaces under your organization. If you see the route hello listed under one of your spaces, simply run cf delete-route cfapps.io --hostname hello in the space where the route exists. That will delete it. Then deploy again.
2. Someone else is using the route. This means it would be in another org & space where you can't see it being used. In this case, there's not much you can do. You just need to pick another route or use a custom, private domain (note that custom, private domains require you to register a domain name & configure DNS as described here).
You can pick another route in a couple of ways:
1. Use a random route. This works OK for testing, but not for anything where you want a consistent address. To use it, just add random-route: true to your manifest.
2. Change your app name. By default, the route assigned to your app will be <app-name>.<default-domain>. Thus you get hello.cfapps.io because hello is your app name and cfapps.io is the default domain on PWS. If you change your app name to something unique, that'll result in a unique route that no one else is using.
3. Specifically define one or more routes. You can do this in your manifest.yml file. You need to add a routes: block and then add one or more routes.
Example:
---
  ...
  routes:
    - route: route1.example.com
    - route: route2.example.com
    - route: route3.example.com
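Applied to the manifest from the question, the first and third options might look like the sketch below. The hostname hello-flask-demo is made up for illustration and may itself already be taken; the other attributes from the original manifest are omitted for brevity.

```yaml
---
applications:
  - name: hello
    memory: 128M
    # Option A: let Cloud Foundry generate a route that is unlikely to collide
    random-route: true
    # Option B (use instead of A): claim an explicit route; this hostname is hypothetical
    # routes:
    #   - route: hello-flask-demo.cfapps.io
```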

Scraping traefik metrics from prometheus

I am trying to scrape Traefik metrics with Prometheus.
Traefik (latest) runs as a service on a swarm cluster, and its Prometheus metrics are enabled.
The matching endpoint is 10.200.1.1:8088/metrics
When I open the endpoint in a browser, I see the expected metrics:
...
# HELP traefik_config_last_reload_failure Last config reload failure
# TYPE traefik_config_last_reload_failure gauge
traefik_config_last_reload_failure 0
# HELP traefik_config_last_reload_success Last config reload success
# TYPE traefik_config_last_reload_success gauge
traefik_config_last_reload_success 1.53633684e+09
# HELP traefik_config_reloads_failure_total Config failure reloads
# TYPE traefik_config_reloads_failure_total counter
traefik_config_reloads_failure_total 0
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 76
...
So, as I understand it, editing prometheus.yml as follows (and POSTing to /-/reload) should add these metrics:
global:
  scrape_interval: 15s
rule_files:
  - "targets.rules"
  - "host.rules"
  - "containers.rules"
scrape_configs:
  ...
  - job_name: 'traefik'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['10.200.1.2:8088']
But unfortunately, none of them appear in Prometheus's metric drop-down list.
Since I am new to Traefik and Prometheus, I am quite sure I have misunderstood something.
I tried to follow a few guides (such as this one), but could not get it to work (they may have worked with a previous version).
So, does anyone have an idea what I am doing wrong and/or what the correct way is?

After a while, many attempts and some pertinent questions later, I ended up thinking it was not about my configuration.
Since I had also observed some randomly odd behavior (such as 503 errors on my remote /providers call), I started to suspect the problem was related to access to my machine.
So I tried demoting the manager and promoting another node of the swarm instead.
... And it worked!
My Traefik metrics now appear in Prometheus!
I still have to understand what is wrong with my former manager, but at least I am making progress.
Thanks @AlinSînpălean & @AndreasJägle for your help!

Is it possible to give 2 targets for ssh monitoring in single prometheus job?

My scenario is that in blackbox.yml I have an ssh_banner module which checks SSH, as shown below.
ssh_banner:
  prober: tcp
  tcp:
    query_response:
      - expect: "^SSH-2.0-"
Below is the relevant part of prometheus.yml:
- job_name: 'ssh_test'
  scrape_interval: 20s
  metrics_path: /probe
  params:
    module: ["ssh_banner"]
    target: [ "node1:22", "node2:22"]
  static_configs:
    - targets:
        - 'blackbox:9115'
I can see it only runs the SSH test for node1, not for node2. Is there any way to define both targets in a single place? I know creating a separate job for each node would solve this, but there can be many servers, so a job per node doesn't look like a good idea.

You need to follow the documentation and add relabelling rules for this to work. There is a guide for this exact use case too.
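A minimal sketch of such relabelling, adapting the multi-target pattern from the blackbox exporter documentation to the job in the question. It assumes the exporter is reachable at blackbox:9115, as in the original config. The probed hosts move into static_configs, and the relabel rules turn each one into the target URL parameter while pointing the actual scrape at the exporter.

```yaml
scrape_configs:
  - job_name: 'ssh_test'
    scrape_interval: 20s
    metrics_path: /probe
    params:
      module: ["ssh_banner"]
    static_configs:
      - targets:            # the hosts to probe, one entry per node
          - 'node1:22'
          - 'node2:22'
    relabel_configs:
      # Pass each listed host to the exporter as ?target=<host>
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed host as the instance label on the resulting series
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the blackbox exporter
      - target_label: __address__
        replacement: 'blackbox:9115'
```

With this layout, adding another server only requires one more line under targets.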
