Gitlab DIND Runner TLS Failure - docker

I'm trying to set up a GitLab Runner with dind to build Docker images in GitLab CI pipelines, but I'm getting the following errors on each build:
*** WARNING: Service runner-project-2-concurrentdocker-0 probably didn't start properly.
Health check error:
service "runner-project-2-concurrentdocker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2021-10-12T14:12:26.652132966Z time="2021-10-12T14:12:26.651911909Z" level=info msg="Starting up"
2021-10-12T14:12:26.653211174Z time="2021-10-12T14:12:26.653132075Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2021-10-12T14:12:26.653320513Z time="2021-10-12T14:12:26.653240584Z" level=warning msg="Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network." host="tcp://0.0.0.0:2375"
2021-10-12T14:12:27.653863417Z time="2021-10-12T14:12:27.653434622Z" level=warning msg="Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message" host="tcp://0.0.0.0:2375"
My Gitlab Runner config.toml looks like this:
concurrent = 10
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-docker"
url = "https://gitlab.my.host/"
token = "MYTOKEN"
executor = "docker"
environment = ["DOCKER_DRIVER=overlay2", "DOCKER_HOST=tcp://docker:2375/", "DOCKER_TLS_CERTDIR="]
[runners.docker]
tls_verify = false
image = "docker:dind"
privileged = true
[[runners.docker.services]]
name = "docker:dind"
command = ["--registry-mirror", "http://192.168.1.21"]
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/certs/client", "/cache"]
shm_size = 0
There are various bug reports and user questions about those errors, but none of them seem to fix my problem. I tried removing the tls_verify setting and setting the DOCKER_TLS_CERTDIR: "" CI pipeline variable, among other things. Is there any way to get those runners booting up fast again, with or without TLS verification?
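As a quick sanity check of the variables above, here is a minimal Python sketch. It is not part of GitLab Runner, and the function name `check_dind_env` is made up for illustration. It assumes the usual convention that the Docker daemon listens on port 2375 without TLS and on 2376 with TLS, and that the docker:dind image only skips certificate generation when DOCKER_TLS_CERTDIR is set to an empty string.

```python
from urllib.parse import urlparse


def check_dind_env(environment):
    """Return a list of mismatches between DOCKER_HOST and DOCKER_TLS_CERTDIR.

    `environment` is a list of "KEY=value" strings, as in the runner's
    `environment` setting. The port convention (2375 plain, 2376 TLS) is an
    assumption of this sketch, not something the runner enforces.
    """
    env = {}
    for item in environment:
        key, _, value = item.partition("=")
        env[key] = value

    host = env.get("DOCKER_HOST", "")
    port = urlparse(host).port if host else None
    # Only an explicitly empty value counts as "TLS disabled" here.
    tls_disabled = env.get("DOCKER_TLS_CERTDIR") == ""

    problems = []
    if port == 2375 and not tls_disabled:
        problems.append("port 2375 is plain TCP, but DOCKER_TLS_CERTDIR is not empty")
    if port == 2376 and tls_disabled:
        problems.append("port 2376 expects TLS, but DOCKER_TLS_CERTDIR is empty")
    return problems


runner_environment = [
    "DOCKER_DRIVER=overlay2",
    "DOCKER_HOST=tcp://docker:2375/",
    "DOCKER_TLS_CERTDIR=",
]
print(check_dind_env(runner_environment))
```

Run against the environment list from the config above, the check reports no mismatch. That suggests the variables themselves are consistent, and that the delay comes from the daemon's own startup warning shown in the log ("Startup is intentionally being slowed down to show this message").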

Related

Running gitlab-runner with multiple docker daemons

I'm trying to have several GitLab runners use different docker daemons on the same host.
Currently using gitlab-runner 10.7.0 and docker 19.03.3. The goal is to maximize the usage of resources. Since I have two SSD disks on the machine, I want the runners to use both of them. The only way I found to have some runners use one disk while some others use the other disk is to have two docker daemons, one running on each disk.
I have one docker daemon running on unix:///var/run/docker-1.sock and one on unix:///var/run/docker-2.sock. Each uses a dedicated bridge created manually. The (systemd) startup command line looks like /usr/bin/dockerd --host unix:///var/run/docker_socket/docker-%i.sock --containerd=/run/containerd/containerd.sock --pidfile /var/run/docker-%i.pid --data-root /data/local%i/docker/ --exec-root /data/local%i/docker_run/ --bridge docker-%i --fixed-cidr 172.%i0.0.1/17
The gitlab_runner mounts /var/run/docker_socket/ and runs on docker-1.sock.
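To make the per-instance settings easier to compare, here is a small Python sketch that expands the command line above for instances 1 and 2. It only imitates what systemd's %i specifier does for simple instance names; the helper name `expand_instance` is hypothetical.

```python
# Command line copied from the systemd unit described above.
DOCKERD_TEMPLATE = (
    "/usr/bin/dockerd"
    " --host unix:///var/run/docker_socket/docker-%i.sock"
    " --containerd=/run/containerd/containerd.sock"
    " --pidfile /var/run/docker-%i.pid"
    " --data-root /data/local%i/docker/"
    " --exec-root /data/local%i/docker_run/"
    " --bridge docker-%i"
    " --fixed-cidr 172.%i0.0.1/17"
)


def expand_instance(template, instance):
    """Substitute the instance name for every %i, as systemd would for a
    plain name such as "1" or "2" (escaping rules are ignored here)."""
    return template.replace("%i", instance)


for instance in ("1", "2"):
    print(expand_instance(DOCKERD_TEMPLATE, instance))
```

The expanded lines show that the socket, pidfile, data root, exec root, bridge, and CIDR differ per instance, while both instances point at the same containerd socket. They also show that the sockets live under /var/run/docker_socket/, so whether a path like /var/run/docker-1.sock resolves to the same file depends on how that directory is mounted for the runner.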
I tried having one gitlab_runner per docker daemon, but then two jobs run on the same runner although the limit is set to 1 (and errors such as ERROR: Job failed (system failure): Error: No such container: ... sometimes appear).
After registration the config.toml looks like:
concurrent = 20
check_interval = 0
[[runners]]
name = "[...]-large"
limit = 1
output_limit = 32768
url = "[...]"
token = "[...]"
executor = "docker"
[runners.docker]
host = "unix:///var/run/docker-1.sock"
tls_verify = false
image = "debian:jessie"
memory = "24g"
cpuset_cpus = "1-15"
privileged = false
security_opt = ["seccomp=unconfined"]
disable_cache = false
volumes = ["/var/run/docker-1.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
[[runners]]
name = "[...]-medium-1"
limit = 1
output_limit = 32768
url = "[...]"
token = "[...]"
executor = "docker"
[runners.docker]
host = "unix:///var/run/docker-2.sock"
tls_verify = false
image = "debian:jessie"
memory = "12g"
cpuset_cpus = "20-29"
privileged = false
security_opt = ["seccomp=unconfined"]
disable_cache = false
volumes = ["/var/run/docker-2.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
The two docker daemons are working fine. Tested with docker --host unix:///var/run/docker-<id>.sock ps
The current solution seems to be kind of OK but there are random errors in the gitlab_runner logs:
ERROR: Appending trace to coordinator... error couldn't execute PATCH against http://[...]/api/v4/jobs/223116/trace: Patch http://[...]/api/v4/jobs/223116/trace: read tcp [...] read: connection reset by peer runner=0ec8a845
Other people tried this, apparently with some success:
This question seems to list the whole set of options needed to properly run each instance of dockerd: "Is it possible to start multiple docker daemons on the same machine". Which options do you use?
This other post, https://www.jujens.eu/posts/en/2018/Feb/25/multiple-docker/, does not mention the possible extra bridge config.
NB: Docker documentation says the feature is experimental: https://docs.docker.com/engine/reference/commandline/dockerd/#run-multiple-daemons

gitlab runner - network_mode = "host"

I want to set up CI/CD in GitLab.
So I installed docker and the gitlab-runner on Linux, created a config for a runner, and started everything. So far so good.
The runner works, and docker works.
But I am using the Windows Subsystem for Linux, so I need to run docker containers with the parameter "--network host", otherwise they will not work.
So right now I am trying to configure the gitlab-runner to use the host network via the "network_mode" parameter, but it does not work. I get the same error as if I ran a docker container directly without "--network host".
The error:
WARNING: Preparation failed: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"process_linux.go:351: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: time=\\\\"2019-04-12T18:42:33+02:00\\\\" level=fatal msg=\\\\"failed to add interface vethfc7c8d1 to sandbox: failed to get link by name \\\\\\\\"vethfc7c8d1\\\\\\\\": Link not found\\\\" \\n\\"\"" (executor_docker.go:423:16s) job=123project=123 runner=123
This is my config:
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "MyHostName"
url = "https://my.gitlab.url/"
token = "SoMeFaNcYcOdE-e"
executor = "docker"
[runners.docker]
tls_verify = false
image = "beevelop/ionic:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
network_mode = "host"
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
My question is: how do I force the gitlab-runner to create containers that use the host network, as with the docker parameter "--network host"?
I was unable to solve the problem directly, but I found an alternative that works a lot better.
I configured the GitLab Container Registry of the repository to upload and whitelist a custom docker image, and then enabled my company's Shared Runners. The custom image I uploaded was built from a Dockerfile using Docker for Windows, avoiding the struggle with the buggy docker in the Windows Subsystem for Linux. Now I can execute my CI pipeline flawlessly, have full control over the image used, and do not have to keep my local machine running.

Unable to query docker version: Get https://<ip>/v1.15/version: remote error: tls: bad certificate

I am using GitLab Runner to run my tests on DigitalOcean servers.
After I re-installed GitLab Runner I started to get the following errors:
I have the following config for my runner:
concurrent = 10
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "Builds coordinator"
url = "https://gitlab.com/"
token = "<token>"
executor = "docker+machine"
limit = 10
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 0
OffPeakTimezone = ""
OffPeakIdleCount = 0
OffPeakIdleTime = 0
MachineDriver = "digitalocean"
MachineName = "gitlab-runner-autoscale-%s"
MachineOptions = [
"digitalocean-image=coreos-stable",
"digitalocean-ssh-user=core",
"digitalocean-access-token=<token>",
"digitalocean-region=lon1",
"digitalocean-size=4gb",
"digitalocean-private-networking"
]
[runners.cache]
Type = "s3"
Path = "cache_for_builds"
Shared = false
[runners.cache.s3]
ServerAddress = "ams3.digitaloceanspaces.com"
AccessKey = "<key>"
SecretKey = "<secret>"
BucketName = "cache-for-builds"
BucketLocation = "ams3"
Insecure = false
I tried to do docker-machine ls when the build was running and I saw the following output:
... ERRORS
... Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
or
... ERRORS
... Unable to query docker version: Get https://<ip>:2376/v1.15/version: remote error: tls: bad certificate
I searched for certs on my server with find -name "*cert*" and found them in /root/.docker/machine/certs. I added the path to the certs to the [runners.docker] section as described in the docs, but it didn't help:
[runners.docker]
...
tls_cert_path = "/root/.docker/machine/certs"
When I view gitlab runner logs with journalctl -u gitlab-runner I see
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: ERROR: Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "<ip>:2376": remote error: tls: bad certificate driver=digitalocean name=r
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: ERROR: You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'. driver=digitalocean name=runner-gitlab-runner-autoscale-<id> operation=create
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: ERROR: Be advised that this will trigger a Docker daemon restart which might stop running containers. driver=digitalocean name=runner-gitlab-runner-autoscale-<id> operation=create
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag. driver=digitalocean name=runner-gitlab-runner-autoscale-<id> operation=create
Jan 13 21:07:00 builds-coordinator gitlab-runner[3360]: WARNING: Machine creation failed, trying to provision error=exit status 1 name=runner-gitlab-runner-autoscale-<id>
Jan 13 21:07:01 builds-coordinator gitlab-runner[3360]: Waiting for SSH to be available... name=runner-gitlab-runner-autoscale-<id> operation=provision
Previously I was using GitLab Runner version 11.5.1.
How can I fix these errors and get GitLab Runner to run the builds?

Docker-ssh non-root path/getsockopt: connection refused

I’m trying to use the gitlab-runner with docker-ssh. Here is what my config.toml looks like:
[[runners]]
name = "CI/CD docker-ssh alfa"
url = "https://gitlab.com/"
token = "<SOME_TOKEN>"
executor = "docker-ssh"
[runners.ssh]
user = "myuser"
password = "my password"
[runners.docker]
tls_verify = false
image = "ubuntu:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
But I got this error:
Running with gitlab-runner 11.3.0 (d78e9e67)
on CI/CD docker-ssh alfa 1f147b76
Using Docker executor with image ubuntu:latest …
ERROR: Preparation failed: build directory needs to be absolute and non-root path
Will be retried in 3s …
Using Docker executor with image ubuntu:latest …
ERROR: Preparation failed: build directory needs to be absolute and non-root path
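The error message names two conditions for the build directory: it must be absolute, and it must not be the root path. The following Python sketch only mirrors that wording; it is not the runner's actual validation code, and the function name `is_valid_builds_dir` is hypothetical. It treats paths as POSIX paths, as they would appear inside a Linux container.

```python
import posixpath


def is_valid_builds_dir(path):
    """Return True when `path` is absolute and does not reduce to "/".

    This follows the wording of the error message only; the real check in
    gitlab-runner may differ.
    """
    if not posixpath.isabs(path):
        return False
    # normpath drops trailing slashes and "." segments; stripping "/" then
    # leaves an empty string for the root directory (including "//").
    return posixpath.normpath(path).strip("/") != ""


for candidate in ("/home/myuser/", "/", "builds"):
    print(candidate, is_valid_builds_dir(candidate))
```

Under these assumptions, an explicit value such as /home/myuser/ satisfies both conditions, while / or a relative path does not.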
So I tried to change the build directory, and here is what my config.toml file looks like now:
[[runners]]
name = "CI/CD docker-ssh alfa"
url = "https://gitlab.com/"
token = "<SOME_TOKEN>"
executor = "docker-ssh"
builds_dir = "/home/myuser/"
[runners.ssh]
user = "myuser"
password = "my password"
[runners.docker]
tls_verify = false
image = "ubuntu:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
But I got this new error:
Running with gitlab-runner 11.3.0 (d78e9e67)
on CI/CD docker-ssh alfa 1f147b76
Using Docker executor with image ubuntu:latest …
WARNING: Since GitLab Runner 10.0 docker-ssh and docker-ssh+machine executors are marked as DEPRECATED and will be removed in one of the upcoming releases
Pulling docker image ubuntu:latest …
Using docker image sha256:cd6d8154f1e16e38493c3c2798977c5e142be5e5d41403ca89883840c6d51762 for ubuntu:latest …
ERROR: Preparation failed: dial tcp 172.17.0.2:22: getsockopt: connection refused
Will be retried in 3s …
Any idea what I am doing wrong?
Stick with an HTTPS URL, and try instead to fix the error:
build directory needs to be absolute and non-root path
See this thread
I was running my CI on an old gitlab-ci-multi-runner 9.5.1.
I updated to gitlab-runner 10.8.0 and now it’s OK.
Or this thread:
Set builds_dir="C:\\gitlab-runner\\builds" in the config.toml.

Gitlab-runner docker container is using the Gitlab container_id as the clone url

I am trying to configure a simple Gitlab-ci build pipeline and am running all of the components in docker containers. I followed the general guides on docs.gitlab.com and got a runner registered with gitlab. But when a build kicks off, the runner tries to clone the repository in question and seems to use the gitlab instance's container-id in place of the url, and I get an unreachable-host error:
Cloning repository...
Cloning into '/builds/root/ci-demo'...
fatal: unable to access 'http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@cdfd596f2bc4/root/ci-demo.git/': Could not resolve host: cdfd596f2bc4
ERROR: Job failed: exit code 1
Is there something obvious that I've overlooked? There are quite a few similar questions on SO and the internet in general, but none seem to have a problem with the target container-id being substituted for the url.
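To see which host the job is actually trying to reach, the clone URL from the log can be taken apart with a few lines of Python. This is only a diagnostic sketch: the URL is written with `@` separating the credentials from the host, as in a standard HTTP URL, and `192.0.2.10:8080` is a hypothetical stand-in for the `[ipaddr]:[port]` placeholder in the runner config.

```python
from urllib.parse import urlsplit


def url_host(url):
    """Return the host part of a URL, ignoring credentials, port, and path."""
    return urlsplit(url).hostname


def clone_host_mismatch(runner_url, job_clone_url):
    """True when the job would clone from a different host than the one the
    runner was registered against."""
    return url_host(runner_url) != url_host(job_clone_url)


# Hypothetical address standing in for http://[ipaddr]:[port]/.
runner_url = "http://192.0.2.10:8080/"
# Clone URL as reported in the job log.
job_clone_url = "http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@cdfd596f2bc4/root/ci-demo.git/"

print(url_host(job_clone_url))
print(clone_host_mismatch(runner_url, job_clone_url))
```

A mismatch like this suggests that the clone address comes from what the GitLab instance believes its own URL to be, rather than from the `url` the runner registered with. If that is the case, the runner's `clone_url` setting in the `[[runners]]` section, or a reachable external URL on the GitLab side, would be the places to look.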
gitlab-runner's config.toml:
concurrent = 1
check_interval = 0
[[runners]]
name = "runner_name"
url = "http://[ipaddr]:[port]/"
token = "xxxxxxx"
executor = "docker"
[runners.docker]
tls_verify = false
image = "maven:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]

Resources