Icinga2 client host cluster-zone check command not going DOWN (RED) when the connection is lost - icinga

I have set up a single master with 2 client endpoints in my Icinga2 monitoring system, using Director in Top-Down mode.
I have also set up the 2 client nodes to accept both configs and commands
(hopefully this means I'm running Top Down Command Endpoint mode).
The service checks (disk/mem/load) for the 3 hosts return correct results. My problem is this:
in the Top Down Command Endpoint example from the documentation,
the host icinga2-client1 uses "hostalive" as the host check_command,
e.g.:
object Host "icinga2-client1.localdomain" {
check_command = "hostalive" //check is executed on the master
address = "192.168.56.111"
vars.client_endpoint = name //follows the convention that host name == endpoint name
}
The issue is that
if the Icinga process on client1 is not running,
the host status stays GREEN and all of the service statuses (disk/mem/load) stay GREEN as well,
because the master is not receiving any service check updates and the hostalive check command can still ping the node.
The Best Practice - Health Check section
recommends using the "cluster-zone" check command.
I expected that with "cluster-zone"
the host status would turn RED
when the Icinga process on the client node is stopped,
but this is not happening.
Does anyone have any idea?
My zone/host/endpoint configurations are as follows:
object Zone "icinga-master" {
endpoints = [ "icinga-master" ]
}
object Host "icinga-master" {
import "Master-Template"
display_name = "icinga-master [192.168.100.71]"
address = "192.168.100.71"
groups = [ "Servers" ]
}
object Endpoint "icinga-master" {
host = "192.168.100.71"
port = "5665"
}
object Zone "rick-tftp" {
parent = "icinga-master"
endpoints = [ "rick-tftp" ]
}
object Endpoint "rick-tftp" {
host = "172.16.181.216"
}
object Host "rick-tftp" {
import "Host-Template"
display_name = "rick-tftp [172.16.181.216]"
address = "172.16.181.216"
groups = [ "Servers" ]
vars.cluster_zone = "icinga-master"
}
object Zone "tftp-server" {
parent = "icinga-master"
endpoints = [ "tftp-server" ]
}
object Endpoint "tftp-server" {
host = "192.168.100.221"
}
object Host "tftp-server" {
import "Host-Template"
display_name = "tftp-server [192.168.100.221]"
address = "192.168.100.221"
groups = [ "Servers" ]
vars.cluster_zone = "icinga-master"
}
template Host "Host-Template" {
import "pnp4nagios-host"
check_command = "cluster-zone"
max_check_attempts = "5"
check_interval = 1m
retry_interval = 30s
enable_notifications = true
enable_active_checks = true
enable_passive_checks = true
enable_event_handler = true
enable_perfdata = true
}
Thanks,
Rick

Related

Uploading file to ECS task

I'm trying to upload a simple .yml file when creating an ECS task via Terraform, here is the code ./main.tf:
resource "aws_ecs_task_definition" "grafana" {
family = "grafana"
cpu = "256"
memory = "512"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
container_definitions = jsonencode([
{
name = "grafana"
image = "grafana/grafana:latest"
portMappings = [
{
containerPort = 3000,
hostPort = 3000,
protocol = "tcp"
}
]
}
])
}
How do I go about adding ./datasource.yml (located on my host machine) to the container within the task definition so that when the task runs it can use it? I wasn't sure if volume { } could be used?
I think you have two alternatives here:
Rebuild the Docker image so that it includes your modified datasource.yaml:
COPY datasource.yaml /usr/share/grafana/conf/provisioning/datasource.yaml
or
mount a volume that you can attach easily and push files to programmatically (EFS turns out to be a bit complicated for this):
mount_points = [ {
sourceVolume = "grafana"
containerPath = "/var/lib/grafana/conf/provisioning"
readOnly = false
}
]
volumes = [
{
name = "grafana"
host_path = "/ecs/grafana-provisioning"}
]
I wasn't sure if volume { } could be used?
As a matter of fact you can; check the docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_task_definition#example-usage
volume {
name = "grafana-volume"
host_path = "./datasource.yml"
}
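A caveat on the snippets above: host_path refers to a path on the ECS container instance, not on the machine that runs Terraform, and host paths are not supported for Fargate tasks. For the Fargate task in the question, an EFS-backed volume is one option. The following is only a sketch under assumptions: the variable efs_file_system_id is hypothetical, the container path assumes Grafana's default provisioning directory, and datasource.yml still has to be copied onto the EFS file system separately.

```hcl
# Hypothetical input: ID of an existing EFS file system that already
# contains datasource.yml at its root.
variable "efs_file_system_id" {
  type = string
}

resource "aws_ecs_task_definition" "grafana" {
  family                   = "grafana"
  cpu                      = "256"
  memory                   = "512"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]

  # Task-level volume backed by EFS instead of a host path.
  volume {
    name = "grafana-provisioning"

    efs_volume_configuration {
      file_system_id = var.efs_file_system_id
      root_directory = "/"
    }
  }

  container_definitions = jsonencode([
    {
      name  = "grafana"
      image = "grafana/grafana:latest"
      portMappings = [
        {
          containerPort = 3000,
          hostPort      = 3000,
          protocol      = "tcp"
        }
      ]
      # Inside container_definitions the keys follow the ECS JSON schema,
      # so the mount is declared as mountPoints (camelCase).
      mountPoints = [
        {
          sourceVolume  = "grafana-provisioning"
          containerPath = "/etc/grafana/provisioning/datasources" # assumed default path
          readOnly      = true
        }
      ]
    }
  ])
}
```

If the file rarely changes, rebuilding the image with the COPY line is usually the simpler of the two routes.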

cannot access nomad docker task via local ip:port

With the following job config, curl NOMAD_IP_http:NOMAD_PORT_http cannot reach the http-echo service.
There is no listening port on localhost for incoming requests.
Why is that, and how can I access the http-echo service?
job "job" {
datacenters = ["dc1"]
group "group" {
count = 2
network {
port "http" {}
}
service {
name = "http-echo"
port = "http"
tags = [
"http-echo",
]
check {
type = "http"
path = "/health"
interval = "30s"
timeout = "2s"
}
}
task "task" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello and welcome to ${NOMAD_IP_http} running on port ${NOMAD_PORT_http}",
]
}
resources {}
}
}
}
UPDATE
After setting network_mode in the Docker driver config, curl succeeds:
network_mode = "host"
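For context, network_mode is an option of the Docker driver, so it belongs in the task's config block. A minimal sketch of the task from the job above with that setting applied (everything else unchanged):

```hcl
task "task" {
  driver = "docker"

  config {
    image = "hashicorp/http-echo:latest"

    # Share the host's network namespace, so the process listens
    # directly on the host at the port Nomad allocated.
    network_mode = "host"

    args = [
      "-listen", ":${NOMAD_PORT_http}",
      "-text", "Hello and welcome to ${NOMAD_IP_http} running on port ${NOMAD_PORT_http}",
    ]
  }

  resources {}
}
```

With host networking there is no port mapping involved, which is why the service becomes reachable at NOMAD_IP_http:NOMAD_PORT_http.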
You forgot to add ports at job -> group -> task -> config -> ports.
With that added, it works on the latest Nomad (v1.1.3+).
job "job" {
datacenters = ["dc1"]
group "group" {
count = 2
network {
port "http" {}
# or maps to container's default port
# port "http" {
# to = 5678
# }
#
}
service {
name = "http-echo"
port = "http"
tags = [
"http-echo",
]
check {
type = "http"
path = "/health"
interval = "30s"
timeout = "2s"
}
}
task "task" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello and welcome to ${NOMAD_IP_http} running on port ${NOMAD_PORT_http}",
]
ports = ["http"]
}
resources {}
}
}
}
Then run docker ps to see the mapped port; curl against that port works.

Running a nomad job for a docker container that Traefik can find

I'm currently running a docker container with Traefik as the load balancer using the following docker-compose file:
services:
loris:
image: bdlss/loris-grok-docker
labels:
- traefik.http.routers.loris.rule=Host(`loris.my_domain`)
- traefik.http.routers.loris.tls=true
- traefik.http.routers.loris.tls.certresolver=lets-encrypt
- traefik.port=80
networks:
- web
It is working fairly well. As one of my first attempts at using Nomad, I simply want to start this container with a Nomad job, loris.nomad, instead of the docker-compose file.
The Docker container labels and the network identification are quite important for Traefik to do the dynamic routing.
My question is: where can I put this "label" and "network" information in the loris.nomad file so that it starts the container in the same way the docker-compose file currently does?
I've tried putting this information in the task.config stanza, but this doesn't work, and I'm having trouble following the documentation. I've seen examples where an additional "service" stanza has been added, but I'm still not sure.
Here's the basics of that nomad file I want to modify.
# loris.nomad
job "loris" {
datacenters = ["dc1"]
group "loris" {
network {
port "http" {
to = 5004
}
}
task "loris" {
driver = "docker"
config {
image = "bdlss/loris-openjpeg-docker"
ports = ["http"]
}
resources {
cpu = 500
memory = 512
}
}
}
}
Any advice is much appreciated.
Well, the most appropriate option for running Traefik in Nomad and load-balancing between containers is the Consul Catalog provider (Consul is required for service discovery).
For this to work you have to configure the Consul connection when you start Nomad. If you want to test things out locally, you can simply run sudo nomad agent -dev-connect. Consul can be started with consul agent -dev -client="0.0.0.0".
Now you can provide your Traefik configuration using service tags, as shown here.
If you really need to run Traefik in Nomad with the Docker provider (which will certainly cause issues in a clustered setup), you can do the following:
First you need to enable host path mounting in the Docker plugin. See this and this. You can place the configuration in an extra file such as extra.hcl, which looks like this:
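As a rough sketch of the tag-based approach applied to the loris job from the question: the service name "loris" is an assumption, and the tags simply mirror the labels from the docker-compose file. It also assumes Traefik is running with the Consul Catalog provider enabled and a certificate resolver named lets-encrypt.

```hcl
# loris.nomad (sketch): routing rules move from Docker labels to
# Consul service tags that Traefik reads through the Consul Catalog provider.
job "loris" {
  datacenters = ["dc1"]

  group "loris" {
    network {
      port "http" {
        to = 5004
      }
    }

    # Registers the allocation in Consul; Traefik picks up the tags.
    service {
      name = "loris" # assumed service name
      port = "http"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.loris.rule=Host(`loris.my_domain`)",
        "traefik.http.routers.loris.tls=true",
        "traefik.http.routers.loris.tls.certresolver=lets-encrypt",
      ]
    }

    task "loris" {
      driver = "docker"

      config {
        image = "bdlss/loris-openjpeg-docker"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}
```

With this provider Traefik routes to the address and port registered in Consul, so the traefik.port label and the shared Docker network from the compose file should not be needed.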
plugin "docker" {
config {
volumes {
enabled = true
}
}
}
Now you can start Nomad with this extra setting: sudo nomad agent -dev-connect -config=extra.hcl. After that you can provide your Traefik settings in the config/labels block, like this (full job):
job "traefik" {
region = "global"
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
task "traefik" {
driver = "docker"
config {
image = "traefik:v2.3"
//network_mode = "host"
volumes = [
"local/traefik.yaml:/etc/traefik/traefik.yaml",
"/var/run/docker.sock:/var/run/docker.sock"
]
labels {
traefik.enable = true
traefik.http.routers.from-docker.rule = "Host(`docker.loris.mydomain`)"
traefik.http.routers.from-docker.entrypoints = "web"
traefik.http.routers.from-docker.service = "api#internal"
}
}
template {
data = <<EOF
log:
level: DEBUG
entryPoints:
traefik:
address: ":8080"
web:
address: ":80"
api:
dashboard: true
insecure: true
accessLog: {}
providers:
docker:
exposedByDefault: false
consulCatalog:
prefix: "traefik"
exposedByDefault: false
endpoint:
address: "10.0.0.20:8500"
scheme: "http"
datacenter: "dc1"
EOF
destination = "local/traefik.yaml"
}
resources {
cpu = 100
memory = 128
network {
mbits = 10
port "http" {
static = 80
}
port "traefik" {
static = 8080
}
}
}
service {
name = "traefik"
tags = [
"traefik.enable=true",
"traefik.http.routers.from-consul.rule=Host(`consul.loris.mydomain`)",
"traefik.http.routers.from-consul.entrypoints=web",
"traefik.http.routers.from-consul.service=api#internal"
]
check {
name = "alive"
type = "tcp"
port = "http"
interval = "10s"
timeout = "2s"
}
}
}
}
}
(There might be a setting to bind to 0.0.0.0; I defined those domains in my /etc/hosts to point to my main interface IP.)
You can test it with this modified webapp spec. I didn't figure out how to map ports correctly, like container:80 -> host:<random>, but I think it is enough to show how complicated it gets :)
job "demo-webapp" {
datacenters = ["dc1"]
group "demo" {
count = 3
task "server" {
env {
// "${NOMAD_PORT_http}"
PORT = "80"
NODE_IP = "${NOMAD_IP_http}"
}
driver = "docker"
config {
image = "hashicorp/demo-webapp-lb-guide"
labels {
traefik.enable = true
traefik.http.routers.webapp-docker.rule = "Host(`docker.loris.mydomain`) && Path(`/myapp`)"
traefik.http.services.webapp-docker.loadbalancer.server.port = 80
}
}
resources {
network {
// Used for docker provider
mode ="bridge"
mbits = 10
port "http"{
// Used for docker provider
to = 80
}
}
}
service {
name = "demo-webapp"
port = "http"
tags = [
"traefik.enable=true",
"traefik.http.routers.webapp-consul.rule=Host(`consul.loris.mydomain`) && Path(`/myapp`)",
]
check {
type = "http"
path = "/"
interval = "2s"
timeout = "2s"
}
}
}
}
}
I hope this somehow answers your question.

Terraform docker cannot authenticate with container registry for remote host

I am on a Windows machine using Terraform 0.13.4 and trying to spin up some containers on a remote host using Terraform and the Docker provider:
provider "docker" {
host = "tcp://myvm:2376/"
registry_auth {
address = "myregistry:443"
username = "myusername"
password = "mypassword"
}
ca_material = file(pathexpand(".docker/ca.pem"))
cert_material = file(pathexpand(".docker/cert.pem"))
key_material = file(pathexpand(".docker/key.pem"))
}
data "docker_registry_image" "mycontainer" {
name = "myregistry:443/lvl1/lvl2/myimage:latest"
}
I am having a hard time with this as it cannot authenticate with my private registry. Always getting 401 Unauthorized.
If I don't do this to grab the sha256_digest and just use the docker_container resource, everything works but it forces replacements of the running containers.
Hello Angelos, if you don't want to force replacement of the running container, you should try this:
provider "docker" {
host = "tcp://myvm:2376/"
registry_auth {
address = "myregistry:443"
username = "myusername"
password = "mypassword"
}
ca_material = file(pathexpand(".docker/ca.pem"))
cert_material = file(pathexpand(".docker/cert.pem"))
key_material = file(pathexpand(".docker/key.pem"))
}
data "docker_registry_image" "mycontainer" {
name = "myregistry:443/lvl1/lvl2/myimage:latest"
}
resource "docker_image" "example" {
name = data.docker_registry_image.mycontainer.name
pull_triggers = [data.docker_registry_image.mycontainer.sha256_digest]
keep_locally = true
}
Then, in the container, use:
resource "docker_container" "example" {
image = docker_image.example.latest
name = "container_name"
}
You should use
docker_image.example.latest
If you reference the docker_image resource itself, Terraform won't pull the image again when it already exists and won't restart the container; if you pass the image name as a string, the container is replaced every time.
https://www.terraform.io/docs/providers/docker/r/container.html
Turns out that the code is correct and that the container service I am using (older version of ProGet) is not replying correctly for the auth calls. I tested the code using another registry and it all works as expected.

Nomad+Docker: Using the local Docker image, avoiding cleanup

My problem
I use nomad to schedule and deploy Docker images across several nodes. I am using a pretty stable image, so I want that image to be loaded locally rather than fetched from Dockerhub each time.
The docker.cleanup.image argument should do just that:
docker.cleanup.image Defaults to true. Changing this to false will prevent Nomad from removing images from stopped tasks, which is exactly what I want.
The documentation example is:
client {
options {
"docker.cleanup.image" = "false"
}
}
However, I don't know where this stanza goes. I tried placing it in the job or task sections of the fairly simple configuration file, with no success.
Code (configuration file)
job "example" {
datacenters = ["dc1"]
type = "service"
update {
max_parallel = 30
min_healthy_time = "10s"
healthy_deadline = "3m"
auto_revert = false
canary = 0
}
group "cache" {
count = 30
restart {
attempts = 10
interval = "5m"
delay = "25s"
mode = "delay"
}
ephemeral_disk {
size = 300
}
task "redis" {
driver = "docker"
config {
image = "whatever/whatever:v1"
port_map {
db = 80
}
}
env {
"LOGGER" = "ec2-52-58-216-66.eu-central-1.compute.amazonaws.com"
}
resources {
network {
mbits = 10
port "db" {}
}
}
service {
name = "global-redis-check"
tags = ["global", "cache"]
port = "db"
}
}
}
}
My question
Where do I place the client stanza in the nomad configuration file?
This doesn't go in your job file; it goes in the configuration of the Nomad agents (the clients where Nomad jobs are deployed).
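A minimal sketch of what that might look like on each client node. The file name client.hcl is an assumption; the stanza itself is the one from the documentation example quoted in the question.

```hcl
# client.hcl: Nomad agent configuration for a client node (not a job file).
# The file name is hypothetical; pass it to the agent with -config.
client {
  enabled = true

  options {
    # Keep Docker images on the node after their tasks stop.
    "docker.cleanup.image" = "false"
  }
}
```

The agent would then be started with something like nomad agent -config=client.hcl and restarted after any change. Newer Nomad releases configure the Docker driver through a plugin "docker" block (with a gc setting for images) rather than client options, so it is worth checking the documentation for the version in use.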
