Nomad+Docker: Using the local Docker image, avoiding cleanup

My problem
I use nomad to schedule and deploy Docker images across several nodes. I am using a pretty stable image, so I want that image to be loaded locally rather than fetched from Dockerhub each time.
The docker.cleanup.image option should do just that. According to the documentation:
docker.cleanup.image: Defaults to true. Changing this to false will prevent Nomad from removing images from stopped tasks.
That is exactly what I want.
The documentation example is:
client {
  options {
    "docker.cleanup.image" = "false"
  }
}
However, I don't know where this stanza goes. I tried placing it in the job or task sections of the fairly simple configuration file, with no success.
Code (configuration file)
job "example" {
  datacenters = ["dc1"]
  type = "service"
  update {
    max_parallel = 30
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert = false
    canary = 0
  }
  group "cache" {
    count = 30
    restart {
      attempts = 10
      interval = "5m"
      delay = "25s"
      mode = "delay"
    }
    ephemeral_disk {
      size = 300
    }
    task "redis" {
      driver = "docker"
      config {
        image = "whatever/whatever:v1"
        port_map {
          db = 80
        }
      }
      env {
        "LOGGER" = "ec2-52-58-216-66.eu-central-1.compute.amazonaws.com"
      }
      resources {
        network {
          mbits = 10
          port "db" {}
        }
      }
      service {
        name = "global-redis-check"
        tags = ["global", "cache"]
        port = "db"
      }
    }
  }
}
My question
Where do I place the client stanza in the nomad configuration file?

This doesn't go in your job file. It goes in the configuration of the Nomad agents (the clients where Nomad jobs are deployed).
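A minimal sketch of what that looks like, assuming the agent reads its configuration from a file such as /etc/nomad.d/client.hcl (the path is an assumption; use whatever file or directory you pass to nomad agent -config). The client stanza is the one from the documentation, placed at the top level of the agent configuration on every client node:

```hcl
# Agent configuration on each Nomad client node (not the job file).
# Hypothetical location: /etc/nomad.d/client.hcl
client {
  enabled = true

  options {
    # Keep images of stopped tasks so they are not pulled again.
    "docker.cleanup.image" = "false"
  }
}
```

The agent most likely has to be restarted to pick up the change, for example with nomad agent -config=/etc/nomad.d/client.hcl. As far as I know, newer Nomad releases configure this through the Docker plugin block (gc { image = false }) rather than through client options, so check the documentation for your version.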

Related

Uploading file to ECS task

I'm trying to upload a simple .yml file when creating an ECS task via Terraform, here is the code ./main.tf:
resource "aws_ecs_task_definition" "grafana" {
  family = "grafana"
  cpu = "256"
  memory = "512"
  network_mode = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  container_definitions = jsonencode([
    {
      name = "grafana"
      image = "grafana/grafana:latest"
      portMappings = [
        {
          containerPort = 3000,
          hostPort = 3000,
          protocol = "tcp"
        }
      ]
    }
  ])
}
How do I go about adding ./datasource.yml (located on my host machine) to the container within the task definition so that when the task runs it can use it? I wasn't sure if volume { } could be used?
I think you have two alternatives here:
Rebuild the Docker image so that it includes your modified datasource.yaml:
COPY datasource.yaml /usr/share/grafana/conf/provisioning/datasource.yaml
or
Mount a volume to which you can easily push files programmatically (EFS turns out to be a bit complicated for this):
mount_points = [
  {
    sourceVolume = "grafana"
    containerPath = "/var/lib/grafana/conf/provisioning"
    readOnly = false
  }
]
volumes = [
  {
    name = "grafana"
    host_path = "/ecs/grafana-provisioning"
  }
]
I wasn't sure if volume { } could be used?
As a matter of fact you can. Check the docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_task_definition#example-usage
volume {
  name = "grafana-volume"
  host_path = "./datasource.yml"
}
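A minimal sketch of how the volume block and the container's mountPoints fit together. It assumes the EC2 launch type (to my knowledge, host_path bind mounts are not supported on Fargate) and assumes the file has already been placed on the container instance under /ecs/grafana-provisioning, the path used in the answer above. Note that host_path refers to the ECS host, not to the machine running Terraform. The volume name is a placeholder.

```hcl
resource "aws_ecs_task_definition" "grafana" {
  family                   = "grafana"
  cpu                      = "256"
  memory                   = "512"
  network_mode             = "awsvpc"
  requires_compatibilities = ["EC2"] # assumption: host_path volumes need the EC2 launch type

  # Declares a bind-mount volume backed by a directory on the container instance.
  volume {
    name      = "grafana-provisioning" # hypothetical name
    host_path = "/ecs/grafana-provisioning"
  }

  container_definitions = jsonencode([
    {
      name  = "grafana"
      image = "grafana/grafana:latest"
      portMappings = [
        { containerPort = 3000, hostPort = 3000, protocol = "tcp" }
      ]
      # Keys inside container definitions are camelCase because they map to the ECS API.
      mountPoints = [
        {
          sourceVolume  = "grafana-provisioning"
          containerPath = "/var/lib/grafana/conf/provisioning" # path taken from the answer above
          readOnly      = true
        }
      ]
    }
  ])
}
```

The sourceVolume value must match the name of the volume block. If the task has to stay on Fargate, rebuilding the image with the file baked in is probably the simpler of the two alternatives.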

Running a nomad job for a docker container that Traefik can find

I'm currently running a docker container with Traefik as the load balancer using the following docker-compose file:
services:
  loris:
    image: bdlss/loris-grok-docker
    labels:
      - traefik.http.routers.loris.rule=Host(`loris.my_domain`)
      - traefik.http.routers.loris.tls=true
      - traefik.http.routers.loris.tls.certresolver=lets-encrypt
      - traefik.port=80
    networks:
      - web
It is working fairly well. As one of my first attempts at using Nomad, I simply want to be able to start this container using a Nomad job, loris.nomad, instead of using the docker-compose file.
The Docker container 'Labels' and the 'Network' identification are quite important for Traefik to do the dynamic routing.
My question is: where can I put this "label" information and "network" information in the loris.nomad file so that it starts the container in the same way that the docker-compose file currently does?
I've tried putting this information in the task.config stanza, but this doesn't work and I'm having trouble following the documentation. I've seen examples where an additional "service" stanza has been added, but I'm still not sure.
Here are the basics of the Nomad file I want to modify.
# loris.nomad
job "loris" {
  datacenters = ["dc1"]
  group "loris" {
    network {
      port "http" {
        to = 5004
      }
    }
    task "loris" {
      driver = "docker"
      config {
        image = "bdlss/loris-openjpeg-docker"
        ports = ["http"]
      }
      resources {
        cpu = 500
        memory = 512
      }
    }
  }
}
Any advice is much appreciated.
Well, the most appropriate option for running Traefik in Nomad and load-balancing between containers is to use the Consul catalog (Consul is required for service discovery).
For this to work you have to configure the Consul connection when you start Nomad. If you'd like to test things out locally, you can do this by simply running sudo nomad agent -dev-connect. Consul can be started with consul agent -dev -client="0.0.0.0".
Now you can simply provide your traefik configuration using tags as it is shown here.
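As a rough sketch of that approach applied to the job from the question: the labels from the docker-compose file become tags on a service stanza, which Nomad registers in Consul and which Traefik's consulCatalog provider then reads. This assumes Traefik is configured with the consulCatalog provider and the default "traefik" tag prefix, and that a certificate resolver named lets-encrypt exists as in the compose file. The service name is a placeholder.

```hcl
job "loris" {
  datacenters = ["dc1"]

  group "loris" {
    network {
      port "http" {
        to = 5004
      }
    }

    # Registered in Consul; Traefik reads the tags as router configuration.
    service {
      name = "loris" # hypothetical service name
      port = "http"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.loris.rule=Host(`loris.my_domain`)",
        "traefik.http.routers.loris.tls=true",
        "traefik.http.routers.loris.tls.certresolver=lets-encrypt",
      ]
    }

    task "loris" {
      driver = "docker"

      config {
        image = "bdlss/loris-openjpeg-docker"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}
```

With this setup the compose file's "networks" entry has no direct counterpart: Traefik should reach the container through the address and port that Nomad registers in Consul, so no shared Docker network is declared in the job.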
If you really need (which will cause issues in a clustered setup for sure) to run traefik in nomad with docker provider you can do the following:
First you need to enable host path mounting in the docker plugin. See this and this. You can place your configuration in an extra file like extra.hcl which looks like this:
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
Now you can start nomad with this extra setting sudo nomad agent -dev-connect -config=extra.hcl. Now you can provide your traefik settings in the config/labels block, like (full):
job "traefik" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"
  group "traefik" {
    count = 1
    task "traefik" {
      driver = "docker"
      config {
        image = "traefik:v2.3"
        //network_mode = "host"
        volumes = [
          "local/traefik.yaml:/etc/traefik/traefik.yaml",
          "/var/run/docker.sock:/var/run/docker.sock"
        ]
        labels {
          traefik.enable = true
          traefik.http.routers.from-docker.rule = "Host(`docker.loris.mydomain`)"
          traefik.http.routers.from-docker.entrypoints = "web"
          traefik.http.routers.from-docker.service = "api#internal"
        }
      }
      template {
        data = <<EOF
log:
  level: DEBUG
entryPoints:
  traefik:
    address: ":8080"
  web:
    address: ":80"
api:
  dashboard: true
  insecure: true
accessLog: {}
providers:
  docker:
    exposedByDefault: false
  consulCatalog:
    prefix: "traefik"
    exposedByDefault: false
    endpoint:
      address: "10.0.0.20:8500"
      scheme: "http"
      datacenter: "dc1"
EOF
        destination = "local/traefik.yaml"
      }
      resources {
        cpu = 100
        memory = 128
        network {
          mbits = 10
          port "http" {
            static = 80
          }
          port "traefik" {
            static = 8080
          }
        }
      }
      service {
        name = "traefik"
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.from-consul.rule=Host(`consul.loris.mydomain`)",
          "traefik.http.routers.from-consul.entrypoints=web",
          "traefik.http.routers.from-consul.service=api#internal"
        ]
        check {
          name = "alive"
          type = "tcp"
          port = "http"
          interval = "10s"
          timeout = "2s"
        }
      }
    }
  }
}
(There might be a setting to bind to 0.0.0.0; I defined those domains in my /etc/hosts to point to my main interface IP.)
You can test it with this modified webapp spec (I didn't figure out how to map ports correctly, like container:80 -> host:<random>, but I think it is enough to show how complicated it gets :)):
job "demo-webapp" {
  datacenters = ["dc1"]
  group "demo" {
    count = 3
    task "server" {
      env {
        // "${NOMAD_PORT_http}"
        PORT = "80"
        NODE_IP = "${NOMAD_IP_http}"
      }
      driver = "docker"
      config {
        image = "hashicorp/demo-webapp-lb-guide"
        labels {
          traefik.enable = true
          traefik.http.routers.webapp-docker.rule = "Host(`docker.loris.mydomain`) && Path(`/myapp`)"
          traefik.http.services.webapp-docker.loadbalancer.server.port = 80
        }
      }
      resources {
        network {
          // Used for docker provider
          mode = "bridge"
          mbits = 10
          port "http" {
            // Used for docker provider
            to = 80
          }
        }
      }
      service {
        name = "demo-webapp"
        port = "http"
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.webapp-consul.rule=Host(`consul.loris.mydomain`) && Path(`/myapp`)",
        ]
        check {
          type = "http"
          path = "/"
          interval = "2s"
          timeout = "2s"
        }
      }
    }
  }
}
I hope this somehow answers your question.

How to pull docker image from public registry with nomad job?

I'm using Nomad on GCE and I cannot pull Docker images from the public registry.
I can do a pull from the command line with docker pull gerlacdt/helloapp:v0.1.0
But when trying to run a nomad job with a public registry image, I have this error:
Failed to find docker auth for repo "gerlacdt/helloapp": docker-credential-gcr
Relevant files :
The /root/.docker/config.json file:
{
  "auths": {
    "https://index.docker.io/v1/": {}
  },
  "credHelpers": {
    "asia.gcr.io": "gcr",
    "eu.gcr.io": "gcr",
    "gcr.io": "gcr",
    "staging-k8s.gcr.io": "gcr",
    "us.gcr.io": "gcr"
  }
}
The nomad client config:
datacenter = "europe-west1-c"
name = "consul-clients-092s"
region = "europe-west1"
bind_addr = "0.0.0.0"
advertise {
  http = "172.27.3.132"
  rpc = "172.27.3.132"
  serf = "172.27.3.132"
}
client {
  enabled = true
  options = {
    "docker.auth.config" = "/root/.docker/config.json"
    "docker.auth.helper" = "gcr"
  }
}
consul {
  address = "127.0.0.1:8500"
}
The job file:
job "helloapp" {
  datacenters = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
  constraint {
    attribute = "${attr.kernel.name}"
    value = "linux"
  }
  # Configure the job to do rolling updates
  update {
    stagger = "10s"
    max_parallel = 1
  }
  group "hello" {
    count = 1
    restart {
      attempts = 2
      interval = "1m"
      delay = "10s"
      mode = "fail"
    }
    # Define a task to run
    task "hello" {
      driver = "docker"
      config {
        image = "gerlacdt/helloapp:v0.1.0"
        port_map {
          http = 8080
        }
      }
      service {
        name = "${TASKGROUP}-service"
        tags = [
          # "traefik.tags=public",
          "traefik.frontend.rule=Host:bla.zapto.org",
          "traefik.frontend.entryPoints=http",
          "traefik.tags=exposed"
        ]
        port = "http"
        check {
          name = "alive"
          type = "http"
          interval = "10s"
          timeout = "3s"
          path = "/health"
        }
      }
      resources {
        cpu = 500 # 500 MHz
        memory = 128 # 128MB
        network {
          mbits = 1
          port "http" {
          }
        }
      }
      logs {
        max_files = 10
        max_file_size = 15
      }
      kill_timeout = "10s"
    }
  }
}
The complete error message from nomad client logs:
failed to initialize task "hello" for alloc "c845bdb9-500a-dc40-0f17-2b79fe4866f1": Failed to find docker auth for repo "gerlacdt/helloapp": docker-credential-gcr with input "gerlacdt/helloapp" failed with stderr:

Icinga2 client Host cluster-zone check command not going down (RED) when connection is lost

I have set up a single master with 2 client endpoints in my Icinga 2 monitoring system, using Director in top-down mode.
I have also set up the 2 client nodes with both "accept configs" and "accept commands" enabled (hopefully this means I'm running Top Down Command Endpoint mode).
The service checks (disk/mem/load) for the 3 hosts are returning correct results. My problem is this: in the Top Down Command Endpoint example, host icinga2-client1 uses "hostalive" as the host check_command, e.g.:
object Host "icinga2-client1.localdomain" {
  check_command = "hostalive" //check is executed on the master
  address = "192.168.56.111"
  vars.client_endpoint = name //follows the convention that host name == endpoint name
}
However, if the Icinga process on client1 is not running, the host status stays GREEN, and all of the service statuses (disk/mem/load) stay GREEN as well, because the master is not getting any service check updates and the hostalive check command is still able to ping the node.
The Best Practice - Health Check section mentions using the "cluster-zone" check command. I was expecting that, with "cluster-zone", the host status would turn RED when the Icinga process on the client node is stopped, but somehow this is not happening.
Does anyone have any idea?
My zone/host/endpoint configurations are as follows:
object Zone "icinga-master" {
  endpoints = [ "icinga-master" ]
}
object Host "icinga-master" {
  import "Master-Template"
  display_name = "icinga-master [192.168.100.71]"
  address = "192.168.100.71"
  groups = [ "Servers" ]
}
object Endpoint "icinga-master" {
  host = "192.168.100.71"
  port = "5665"
}
object Zone "rick-tftp" {
  parent = "icinga-master"
  endpoints = [ "rick-tftp" ]
}
object Endpoint "rick-tftp" {
  host = "172.16.181.216"
}
object Host "rick-tftp" {
  import "Host-Template"
  display_name = "rick-tftp [172.16.181.216]"
  address = "172.16.181.216"
  groups = [ "Servers" ]
  vars.cluster_zone = "icinga-master"
}
object Zone "tftp-server" {
  parent = "icinga-master"
  endpoints = [ "tftp-server" ]
}
object Endpoint "tftp-server" {
  host = "192.168.100.221"
}
object Host "tftp-server" {
  import "Host-Template"
  display_name = "tftp-server [192.168.100.221]"
  address = "192.168.100.221"
  groups = [ "Servers" ]
  vars.cluster_zone = "icinga-master"
}
template Host "Host-Template" {
  import "pnp4nagios-host"
  check_command = "cluster-zone"
  max_check_attempts = "5"
  check_interval = 1m
  retry_interval = 30s
  enable_notifications = true
  enable_active_checks = true
  enable_passive_checks = true
  enable_event_handler = true
  enable_perfdata = true
}
Thanks,
Rick

How to setup Nomad via Terraform

I am a beginner and am having trouble finding a solution for Terraform and Nomad. I need to run Nomad, plus hashi-ui for web management of Nomad. I am trying to set up and run the Nomad server via Terraform; hashi-ui runs as a Nomad job, in Docker. Both the Nomad server and hashi-ui run fine on their own. Now I need to create a Terraform file to automate the initial setup and orchestrate Nomad. My server runs Debian 8.
My terraform file nomad.tf:
# Configure the Nomad provider
provider "nomad" {
  address = "http://localhost:4646"
  region = "global"
  # group = "server"
}
variable "version" {
  default = "latest"
}
data "template_file" "job" {
  template = "${file("./hashi-ui.nomad")}"
  vars {
    version = "${var.version}"
  }
}
# Register a job
resource "nomad_job" "hashi-ui" {
  jobspec = "${data.template_file.job.rendered}"
}
And nomad job hashi-ui.nomad:
job "hashi-ui" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"
  group "server" {
    count = 1
    task "hashi-ui" {
      driver = "docker"
      config {
        image = "jippi/hashi-ui"
        network_mode = "host"
      }
      service {
        port = "http"
        check {
          type = "http"
          path = "/"
          interval = "10s"
          timeout = "2s"
        }
      }
      env {
        NOMAD_ENABLE = 1
        NOMAD_ADDR = "http://0.0.0.0:4646"
      }
      resources {
        cpu = 500
        memory = 512
        network {
          mbits = 5
          port "http" {
            static = 3000
          }
        }
      }
    }
  }
}
Terraform plan shows changes, but terraform apply throws this error:
Error applying plan:
1 error(s) occurred:
nomad_job.hashi-ui: 1 error(s) occurred:
nomad_job.hashi-ui: error applying jobspec: Put http://localhost:4646/v1/jobs?region=global: dial tcp [::1]:4646: getsockopt: connection refused
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
If I run the Nomad server alongside, the error is then:
1 error(s) occurred:
nomad_job.hashi-ui: 1 error(s) occurred:
nomad_job.hashi-ui: error applying jobspec: Unexpected response code: 500 (1 error(s) occurred:
Task group server validation failed: 1 error(s) occurred:
2 error(s) occurred:
Max parallel can not be less than one: 0 < 1
Stagger must be greater than zero: 0s)
Can you help me please?
You're missing max_parallel and stagger in your Nomad job spec. Add an update stanza:
job "hashi-ui" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"
  update {
    stagger = "30s"
    max_parallel = 2
  }
  group "server" {
    count = 1
    task "hashi-ui" {
      driver = "docker"
      config {
        image = "jippi/hashi-ui"
        network_mode = "host"
      }
      ...
