Running gitlab-runner with multiple docker daemons - docker

I'm trying to have several gitlab runners using different docker daemons on the same host
Currently using gitlab-runner 10.7.0 and docker 19.03.3. The goal is to maximize the usage of resources. Since I have two SSD disks on the machine, I want the runners to use both of them. The only way I found to have some runners use one disk while some others use the other disk is to have two docker daemons, one running on each disk.
I have one docker daemon running on unix:///var/run/docker-1.sock and one on unix:///var/run/docker-2.sock. They use each a dedicated bridge created manually. The (systemd) startup command line looks like /usr/bin/dockerd --host unix:///var/run/docker_socket/docker-%i.sock --containerd=/run/containerd/containerd.sock --pidfile /var/run/docker-%i.pid --data-root /data/local%i/docker/ --exec-root /data/local%i/docker_run/ --bridge docker-%i --fixed-cidr 172.%i0.0.1/17
The gitlab_runner mounts /var/run/docker_socket/ and runs on docker-1.sock.
I tried having one per docker daemon but then two jobs runs on the same runner although the limit is set to 1 (and also there are some sometimes errors appearing like ERROR: Job failed (system failure): Error: No such container: ...)
After registration the config.toml looks like:
concurrent = 20
check_interval = 0
[[runners]]
name = "[...]-large"
limit = 1
output_limit = 32768
url = "[...]"
token = "[...]"
executor = "docker"
[runners.docker]
host = "unix:///var/run/docker-1.sock"
tls_verify = false
image = "debian:jessie"
memory = "24g"
cpuset_cpus = "1-15"
privileged = false
security_opt = ["seccomp=unconfined"]
disable_cache = false
volumes = ["/var/run/docker-1.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
[[runners]]
name = "[...]-medium-1"
limit = 1
output_limit = 32768
url = "[...]"
token = "[...]"
executor = "docker"
[runners.docker]
host = "unix:///var/run/docker-2.sock"
tls_verify = false
image = "debian:jessie"
memory = "12g"
cpuset_cpus = "20-29"
privileged = false
security_opt = ["seccomp=unconfined"]
disable_cache = false
volumes = ["/var/run/docker-2.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
The two docker daemons are working fine. Tested with docker --host unix:///var/run/docker-<id>.sock ps
The current solution seems to be kind of OK but there are random errors in the gitlab_runner logs:
ERROR: Appending trace to coordinator... error couldn't execute PATCH against http://[...]/api/v4/jobs/223116/trace: Patch http://[...]/api/v4/jobs/223116/trace: read tcp [...] read: connection reset by peer runner=0ec8a845

Other people tried this, apparently with some success:
This one seems to list the whole set of options needed to properly run each instance of dockerd : Is it possible to start multiple docker daemons on the same machine. What are yours?
This other https://www.jujens.eu/posts/en/2018/Feb/25/multiple-docker/, does not speak about the possible extra bridge config.
NB: Docker documentation says the feature is experimental: https://docs.docker.com/engine/reference/commandline/dockerd/#run-multiple-daemons

Related

GitLab pipeline exiting with error code 137 when running Cypress

I'm creating a Docker image based on alpine:3.13, which is used for my test stage, running in a pipeline on GitLab.
There I install all the dependencies. The app consists of two components, which I will call front and back.
I run the following command to set up front and back and finally execute cypress in headless mode.
"e2e:run": "concurrently -n front, back \"yarn front\" \"yarn back\" \"yarn front:wait && yarn back:wait && yarn cypress:run\""
It builds front and back fine, but then the job log doesn't show any progress for a few minutes until I finally get this exit code:
ERROR: Job failed: command terminated with exit code 137
From my research so far, I concluded it seems to be related to a lack of memory.
Is there any other reasonable option?
What could I do to provide more memory/reduce memory consumption?
As #SamBob mentioned, this issue is likely due to low memory within the running docker container, and the shm_size parameter can increase it. However, since you're not directly running your image in the job (ie, doing docker run...) but rather the gitlab-runner process is, you'll have to set the shm_size parameter within the Runner's configuration for the Docker executor. To do this, you'll also have to run your own runners if you aren't already.
When running your own runners, each will have a config.toml file in /etc/gitlab-runner that looks like this by default:
listen_address = ":9252"
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "runner-1"
url = "https://gitlab.example.com"
token = "TOKEN"
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.docker]
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
As you can see, by default the shm_size parameter is set to 0 bytes. You can edit this file to increase the shm_size, then restart the gitlab-runner service to reload the new config.
One other thing I do with my runners is to add a shm-increased tag on those runners I've increased since only a couple jobs in my pipelines need more shared memory.
To see more information on running your own Gitlab Runners, see here.
To see more on the shm_size parameter for Gitlab Runners, and other advanced runner configuration options, see here.
To see information on tagging runners and jobs, see here.

Gitlab DIND Runner TLS Failure

Im trying to setup a gitlab runner with dind, to build docker images in Gitlab CI Pipelines, but im getting the following errors each build:
*** WARNING: Service runner-project-2-concurrentdocker-0 probably didn't start properly.
Health check error:
service "runner-project-2-concurrentdocker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2021-10-12T14:12:26.652132966Z time="2021-10-12T14:12:26.651911909Z" level=info msg="Starting up"
2021-10-12T14:12:26.653211174Z time="2021-10-12T14:12:26.653132075Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2021-10-12T14:12:26.653320513Z time="2021-10-12T14:12:26.653240584Z" level=warning msg="Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network." host="tcp://0.0.0.0:2375"
2021-10-12T14:12:27.653863417Z time="2021-10-12T14:12:27.653434622Z" level=warning msg="Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message" host="tcp://0.0.0.0:2375"
My Gitlab Runner config.toml looks like this:
concurrent = 10
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-docker"
url = "https://gitlab.my.host/"
token = "MYTOKEN"
executor = "docker"
environment = ["DOCKER_DRIVER=overlay2", "DOCKER_HOST=tcp://docker:2375/", "DOCKER_TLS_CERTDIR="]
[runners.docker]
tls_verify = false
image = "docker:dind"
privileged = true
[[runners.docker.services]]
name = "docker:dind"
command = ["--registry-mirror", "http://192.168.1.21"]
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/certs/client", "/cache"]
shm_size = 0
There are mysterious bug reports and user questions on those errors, but non of them seem to fix my problem. I tried removing the tls_verify settings, and setting the DOCKER_TLS_CERTDIR: "" CI Pipeline variable and what not. Is there any chance to get those runners booting up fast again with or without tls verification?

gitlab runner - network_mode = "host"

I want to setup CI/CD in GitLab.
So i installed docker and the gitlab-runner on linux, created a config for a runner and started everything. So far so good.
The runner works, and docker works.
But i am using the linux subsystem from windows, so i need to run the docker container with parameter "--network host" otherwise they not gonna work.
So right now i try to configure the gitlab-runner to use the host network via the "network_mode" parameter. But it does not work. I get the same error as if i would run a docker container directly and without the "--network host".
The error:
WARNING: Preparation failed: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"process_linux.go:351: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: time=\\\\"2019-04-12T18:42:33+02:00\\\\" level=fatal msg=\\\\"failed to add interface vethfc7c8d1 to sandbox: failed to get link by name \\\\\\\\"vethfc7c8d1\\\\\\\\": Link not found\\\\" \\n\\"\"" (executor_docker.go:423:16s) job=123project=123 runner=123
This is my config:
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "MyHostName"
url = "https://my.gitlab.url/"
token = "SoMeFaNcYcOdE-e"
executor = "docker"
[runners.docker]
tls_verify = false
image = "beevelop/ionic:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
network_mode = "host"
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
My question is how do i force the gitlab runner to create the containers to use the host network like with the docker parameter: "--network host"
I was unable to solve the problem directly, but i found an alternative way which is a lot better.
I configured the GitLab Container Registry
of the repository to upload and white list a custom docker image and then enabled the Shared Runners of my company. The custom image i uploaded was created via a Dockerfile using docker for windows, avoiding the struggle of the buggy docker in the linux subsystem of windows. Now i can execute my CI pipeline flawlessly and have full control over the used image and do not have to keep my local machine running.

Docker-ssh non-root path/getsockopt: connection refused

I’m trying to use the gitlab-runner with docker-ssh. Here is how my config.toml looks like:
[[runners]]
name = “CI/CD docker-ssh alfa”
url = “https://gitlab.com/”
token = “<SOME_TOKEN>“
executor = “docker-ssh”
[runners.ssh]
user = “myuser”
password = “my password”
[runners.docker]
tls_verify = false
image = “ubuntu:latest”
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
But I got this error:
Running with gitlab-runner 11.3.0 (d78e9e67)
on CI/CD docker-ssh alfa 1f147b76
Using Docker executor with image ubuntu:latest …
ERROR: Preparation failed: build directory needs to be absolute and non-root path
Will be retried in 3s …
Using Docker executor with image ubuntu:latest …
ERROR: Preparation failed: build directory needs to be absolute and non-root path
So I tried to change the build directory and here hows my config.toml file looks like now:
[[runners]]
name = “CI/CD docker-ssh alfa”
url = “https://gitlab.com/”
token = “<SOME_TOKEN>“
executor = “docker-ssh”
builds_dir = “/home/myuser/“
[runners.ssh]
user = “myuser”
password = “my password”
[runners.docker]
tls_verify = false
image = “ubuntu:latest”
privileged = false
disable_cache = false
volumes = [”/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
But I got this new error:
Running with gitlab-runner 11.3.0 (d78e9e67)
on CI/CD docker-ssh alfa 1f147b76
Using Docker executor with image ubuntu:latest …
WARNING: Since GitLab Runner 10.0 docker-ssh and docker-ssh+machine executors are marked as DEPRECATED and will be removed in one of the upcoming releases
Pulling docker image ubuntu:latest …
Using docker image sha256:cd6d8154f1e16e38493c3c2798977c5e142be5e5d41403ca89883840c6d51762 for ubuntu:latest …
ERROR: Preparation failed: dial tcp 172.17.0.2:22: getsockopt: connection refused
Will be retried in 3s …
Any idea what am I doing wrong?
Stick with an HTTPS URL, and try fixing instead the error:
build directory needs to be absolute and non-root path
See this thread
I was running my CI on an old gitlab-ci-multi-runner 9.5.1.
I update to gitlab-runner 10.8.0 and now it’s ok.
Or this thread:
Set build_dir="C:\\gitlab-runner\\builds" in the config.toml.

Gitlab-runner docker container is using the Gitlab container_id as the clone url

I am trying to configure a simple Gitlab-ci build pipeline and am running all of the components in docker containers. I followed the general guides on docs.gitlab.com and got a runner registered with gitlab. But when a build kicks off, the runner tries to clone the repository in question and seems to use the gitlab instance's container-id in place of the url, and I get an unreachable-host error:
Cloning repository...
Cloning into '/builds/root/ci-demo'...
fatal: unable to access 'http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx#cdfd596f2bc4/root/ci-demo.git/': Could not resolve host: cdfd596f2bc4
ERROR: Job failed: exit code 1
Is there something obvious that I've overlooked? There are quite a few similar questions on SO and the internet in general, but none seem to have a problem with the target container-id being substituted for the url.
gitlab-runner's config.toml:
concurrent = 1
check_interval = 0
[[runners]]
name = "runner_name"
url = "http://[ipaddr]:[port]/"
token = "xxxxxxx"
executor = "docker"
[runners.docker]
tls_verify = false
image = "maven:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]

Resources