How to keep nomad task from exiting? - devops

In Docker we have the -t flag to keep containers from exiting. How can I achieve the same thing in Nomad?
I want to check whether I can ping one service from another, so I just want a container with curl in it. However, if I deploy the ubuntu image as specified below, it exits and keeps restarting. What can I do so it just keeps running?
task "testubuntu" {
driver = "docker"
config {
image = "ubuntu:latest"
}
resources {
cpu = 500
memory = 256
network {
mbits = 10
}
}
}

One solution is to set a "dummy" entrypoint such as tail -f /dev/null:
task "testubuntu" {
driver = "docker"
config {
image = "ubuntu:latest"
entrypoint = [
"tail", "-f", "/dev/null",
]
}
resources {
cpu = 500
memory = 256
}
}
This is particularly useful when a task errors at container startup but there is not much useful information in the logs. The "dummy" entrypoint keeps the container alive, allowing you to get inside it and run the real startup command by hand, for example with a debugger attached.
Apart from tail -f /dev/null, you can also simply use yes as the entrypoint. However, it will pollute stdout and affect your logging solution if one is set up.
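Once the task is kept alive this way, you can open a shell inside it with nomad alloc exec. A rough sketch, assuming a single allocation of the job (the job name and allocation ID below are placeholders):
# find the allocation ID for the job
nomad job status <job-name>
# open an interactive shell in the testubuntu task's container
nomad alloc exec -i -t -task testubuntu <alloc-id> /bin/bash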

Alternatively, set tty = true (and optionally interactive = true) in the config stanza; these correspond to docker's -t and -i flags:
task "testubuntu" {
  driver = "docker"

  config {
    image       = "ubuntu:latest"
    tty         = true
    interactive = true
  }

  resources {
    cpu    = 500
    memory = 256

    network {
      mbits = 10
    }
  }
}

Related

Nomad Connect Two docker Containers

I am having trouble establishing communication between two Docker containers via Nomad. The containers are in the same task group but are still unable to reach each other, even when using the NOMAD_ADDR_ environment variables. Can anyone help in this regard? I tried both host and bridge network mode.
My Nomad config is given below. The images are pulled and the Redis container and application container start, but then the app container crashes with a Redis "Connection refused" error.
The second issue, as you might have guessed, is prettifying the code with proper indentation and so on, just like JavaScript, HTML, or YAML is automatically formatted in VS Code. I am unable to find a code formatter for the HCL language.
job "app-deployment" {
datacenters = ["dc1"]
group "app" {
network {
mode = "bridge"
port "web-ui" { to = 5000 }
port "redis" { to = 6379 }
}
service {
name = "web-ui"
port = "web-ui"
// check {
// type = "http"
// path = "/health"
// interval = "2s"
// timeout = "2s"
// }
}
task "myapp" {
driver = "docker"
config {
image_pull_timeout = "10m"
image = "https://docker.com"
ports = ["web-ui"]
}
env {
REDIS_URL="redis://${NOMAD_ADDR_redis}"
// REDIS_URL="redis://$NOMAD_IP_redis:$NOMAD_PORT_redis"
NODE_ENV="production"
}
}
task "redis" {
driver = "docker"
config {
image = "redis"
ports = ["redis"]
}
}
}
}
So I was able to resolve it. Basically, when you start the Nomad agent in dev mode, it binds to the loopback interface by default, which is why you get 127.0.0.1 as the IP and node port in the NOMAD_* environment variables. 127.0.0.1 resolves to localhost inside the container, and hence it is unable to reach the Redis server.
To fix the issue, simply run
ip a
and identify the primary network interface (for me it was my Wi-Fi interface). Then start Nomad like below.
nomad agent -dev -network-interface="en0"
# where en0 is the primary network interface
That way you will still be able to access the Nomad UI on localhost:4646, but your containers will get the host IP from your network rather than 127.0.0.1.
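You can confirm the fix by checking the environment the app task actually sees; after restarting the agent this way, NOMAD_ADDR_redis should contain the host IP and a dynamic port rather than 127.0.0.1. A rough sketch (the allocation ID and resulting address below are placeholders):
nomad job status app-deployment
nomad alloc exec -task myapp <alloc-id> env | grep NOMAD_ADDR_redis
# expect something like NOMAD_ADDR_redis=192.168.1.23:25467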

Kubernetes pod keeps crashing with no error in logs

I am trying to deploy an Apache Docker image on a Kubernetes cluster using Terraform.
I tried the following command and was able to hit the URL localhost:8082 from a browser successfully:
docker run -it --rm -d -p 8082:80 webservice
I then created a kubernetes_deployment using Terraform, but the pod keeps crashing and there is nothing in the logs.
resource "kubernetes_deployment" "api" {
metadata {
name = "ex-api"
labels = {
app = "EX"
component = "api"
}
}
spec {
replicas = 1
selector {
match_labels = {
app = "EX"
}
}
template {
metadata {
name = "ex-api"
labels = {
app = "EX"
component = "api"
}
}
spec {
container {
image = "${var.web_service_image}:${var.web_service_image_tag}"
image_pull_policy = "IfNotPresent"
name = "api-image"
# All the other configuration options should be here too.
port {
container_port = 80
name = "web"
}
} # end of container block
} # end of spec block
} # end of template block
} # end of spec out-block
}
Pod's output
kubectl get pod
NAME READY STATUS RESTARTS AGE
ex-api-5458586bd8-ex6sp 0/1 CrashLoopBackOff 19 72m
I assume I should either add some command or daemonize it (e.g. -itd when using docker) so that it keeps running, but I may be wrong here.
Kindly let me know what I should do to overcome this.
No logs and no events when you run the describe command generally suggest that there is an issue with invoking the entrypoint of your Dockerfile, so you may have to override the command in your deployment.
In your case, the deployment probably needs to specify explicitly the command you have (or intended to have) in your Dockerfile; apparently the Kubernetes pod is unable to use what is defined in the Dockerfile.
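A rough sketch of overriding the command in the Terraform container block; the Kubernetes provider's container block accepts a command list, and the value below is a placeholder that assumes your image is based on the official httpd image (adjust it to whatever foreground process your webservice image should run):
container {
  image             = "${var.web_service_image}:${var.web_service_image_tag}"
  image_pull_policy = "IfNotPresent"
  name              = "api-image"

  # Override the image's entrypoint/cmd with a long-running foreground process.
  command = ["httpd-foreground"]

  port {
    container_port = 80
    name           = "web"
  }
}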

Increasing the disk size that docker can access in Container Optimized OS

I am attempting to run a simple daily batch script that can run for some hours, after which it will send the data it generated and shut down the instance. To achieve that, I have put the following into user-data:
users:
- name: cloudservice
  uid: 2000
runcmd:
- sudo HOME=/home/root docker-credential-gcr configure-docker
- |
  sudo HOME=/home/root docker run \
    --rm -u 2000 --name={service_name} {image_name} {command}
- shutdown
final_message: "machine took $UPTIME seconds to start"
I am creating the instance using a python script to generate the configuration for the API like so:
def build_machine_configuration(
    compute, name: str, project: str, zone: str, image: str
) -> Dict:
    image_response = (
        compute.images()
        .getFromFamily(project="cos-cloud", family="cos-stable")
        .execute()
    )
    source_disk_image = image_response["selfLink"]
    machine_type = f"zones/{zone}/machineTypes/n1-standard-1"

    # returns the cloud init from above
    cloud_config = build_cloud_config(image)

    config = {
        "name": f"{name}",
        "machineType": machine_type,
        # Specify the boot disk and the image to use as a source.
        "disks": [
            {
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {"sourceImage": source_disk_image},
            }
        ],
        # Specify a network interface with NAT to access the public
        # internet.
        "networkInterfaces": [
            {
                "network": "global/networks/default",
                "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
            }
        ],
        # Allow the instance to access cloud storage and logging.
        "serviceAccounts": [
            {
                "email": "default",
                "scopes": [
                    "https://www.googleapis.com/auth/devstorage.read_write",
                    "https://www.googleapis.com/auth/logging.write",
                    "https://www.googleapis.com/auth/datastore",
                    "https://www.googleapis.com/auth/bigquery",
                ],
            }
        ],
        # Metadata is readable from the instance and allows you to
        # pass configuration from deployment scripts to instances.
        "metadata": {
            "items": [
                {
                    # Startup script is automatically executed by the
                    # instance upon startup.
                    "key": "user-data",
                    "value": cloud_config,
                },
                {"key": "google-monitoring-enabled", "value": True},
            ]
        },
    }
    return config
However, I am running out of disk space inside the Docker engine.
Any ideas on how to increase the size of the volume available to docker services?
The Docker engine uses the disk of the instance, so if the container runs out of space it is because the instance's disk is full.
The first thing you can try is creating an instance with a bigger disk. The documentation says:
disks[ ].initializeParams.diskSizeGb string (int64 format)
Specifies the size of the disk in base-2 GB. The size must be at least
10 GB. If you specify a sourceImage, which is required for boot disks,
the default size is the size of the sourceImage. If you do not specify
a sourceImage, the default disk size is 500 GB.
You can increase the size by adding the diskSizeGb field in the deployment:
"disks": [
{
[...]
"initializeParams": {
"diskSizeGb": 50,
[...]
Another thing you can try is executing the following command on the instance to see whether the disk is full and which partition is full:
$ df -h
Similarly, you can execute the following command to see the disk usage of the Docker engine:
$ docker system df
The client and daemon API must both be at least 1.25 to use this command. Use the docker version command on the client to check your client and daemon API versions.
If you want more information, you can use the -v flag:
$ docker system df -v
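If the instance disk is simply filling up with leftover images, stopped containers, or build cache, pruning may reclaim space without resizing the disk. This is only a suggestion for that case; note that the -a flag also removes all unused images, so use it with care:
$ docker system prune
$ docker system prune -a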

Running gitlab-runner with multiple docker daemons

I'm trying to have several gitlab runners using different docker daemons on the same host
Currently using gitlab-runner 10.7.0 and docker 19.03.3. The goal is to maximize the usage of resources. Since I have two SSD disks on the machine, I want the runners to use both of them. The only way I found to have some runners use one disk while some others use the other disk is to have two docker daemons, one running on each disk.
I have one docker daemon running on unix:///var/run/docker-1.sock and one on unix:///var/run/docker-2.sock. They use each a dedicated bridge created manually. The (systemd) startup command line looks like /usr/bin/dockerd --host unix:///var/run/docker_socket/docker-%i.sock --containerd=/run/containerd/containerd.sock --pidfile /var/run/docker-%i.pid --data-root /data/local%i/docker/ --exec-root /data/local%i/docker_run/ --bridge docker-%i --fixed-cidr 172.%i0.0.1/17
The gitlab_runner mounts /var/run/docker_socket/ and runs on docker-1.sock.
I tried having one runner per docker daemon, but then two jobs run on the same runner although the limit is set to 1 (and also errors sometimes appear, like ERROR: Job failed (system failure): Error: No such container: ...).
After registration the config.toml looks like:
concurrent = 20
check_interval = 0

[[runners]]
  name = "[...]-large"
  limit = 1
  output_limit = 32768
  url = "[...]"
  token = "[...]"
  executor = "docker"
  [runners.docker]
    host = "unix:///var/run/docker-1.sock"
    tls_verify = false
    image = "debian:jessie"
    memory = "24g"
    cpuset_cpus = "1-15"
    privileged = false
    security_opt = ["seccomp=unconfined"]
    disable_cache = false
    volumes = ["/var/run/docker-1.sock:/var/run/docker.sock"]
    shm_size = 0
  [runners.cache]

[[runners]]
  name = "[...]-medium-1"
  limit = 1
  output_limit = 32768
  url = "[...]"
  token = "[...]"
  executor = "docker"
  [runners.docker]
    host = "unix:///var/run/docker-2.sock"
    tls_verify = false
    image = "debian:jessie"
    memory = "12g"
    cpuset_cpus = "20-29"
    privileged = false
    security_opt = ["seccomp=unconfined"]
    disable_cache = false
    volumes = ["/var/run/docker-2.sock:/var/run/docker.sock"]
    shm_size = 0
  [runners.cache]
The two docker daemons are working fine. Tested with docker --host unix:///var/run/docker-<id>.sock ps
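Equivalently, any docker client can be pointed at a specific daemon via the DOCKER_HOST environment variable, which is what the host setting in each [runners.docker] section does for the runner; for example:
DOCKER_HOST=unix:///var/run/docker-2.sock docker ps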
The current solution seems to be kind of OK but there are random errors in the gitlab_runner logs:
ERROR: Appending trace to coordinator... error couldn't execute PATCH against http://[...]/api/v4/jobs/223116/trace: Patch http://[...]/api/v4/jobs/223116/trace: read tcp [...] read: connection reset by peer runner=0ec8a845
Other people have tried this, apparently with some success:
This one seems to list the whole set of options needed to properly run each instance of dockerd: Is it possible to start multiple docker daemons on the same machine. What are yours?
This other one, https://www.jujens.eu/posts/en/2018/Feb/25/multiple-docker/, does not mention the possible extra bridge configuration.
NB: Docker documentation says the feature is experimental: https://docs.docker.com/engine/reference/commandline/dockerd/#run-multiple-daemons

Docker Container from Terraform will not start

I launched a Docker container with Terraform, simple code.
> cat main.tf
provider "docker"{
}
resource "docker_image" "ubuntu"{
name = "ubuntu:latest"
}
resource "docker_container" "webserver" {
image = "${docker_image.ubuntu.latest}"
name = "dev-web-p01"
#start = true
must_run = true
publish_all_ports = true
}
I can see the container spun up but not running.
> docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
63c770e28ad2 47b19964fb50 "/bin/bash" 10 minutes ago Exited (0) 3 minutes ago dev-web-p01
My attempt to start and connect to the container fails, and I am not sure why.
> docker container start 63c
63c
> docker container exec -it 63c /bin/bash
Error response from daemon: Container 63c770e28ad256e77442cb2fb8b9b8bbc14b8f37b99296bc63f2d249209e0399 is not running
I have tried this a couple of times but it doesn't work. Sorry, bit of a noob here.
Exited (0) means the program completed successfully. With Docker you need the container's main process to be a long-running command, otherwise the container exits as soon as it finishes.
The easiest way to test changes with Docker is to run a command that simply waits forever. Try this:
resource "docker_image" "ubuntu" {
name = "ubuntu:latest"
}
resource "docker_container" "webserver" {
image = "${docker_image.ubuntu.latest}"
name = "terraform-docker-test"
must_run = true
publish_all_ports = true
command = [
"tail",
"-f",
"/dev/null"
]
}
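After terraform apply, the container should stay in the running state and you can exec into it. A quick check, using the container name from the example above:
docker ps --filter name=terraform-docker-test
docker exec -it terraform-docker-test /bin/bash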
