GitLab pipeline exiting with error code 137 when running Cypress - docker

I'm creating a Docker image based on alpine:3.13, which is used for my test stage, running in a pipeline on GitLab.
There I install all the dependencies. The app consists of two components, which I will call front and back.
I run the following command to set up front and back and finally execute cypress in headless mode.
"e2e:run": "concurrently -n front, back \"yarn front\" \"yarn back\" \"yarn front:wait && yarn back:wait && yarn cypress:run\""
It builds front and back fine, but then the job log doesn't show any progress for a few minutes until I finally get this exit code:
ERROR: Job failed: command terminated with exit code 137
From my research so far, I concluded it seems to be related to a lack of memory.
Is there any other reasonable option?
What could I do to provide more memory/reduce memory consumption?

As @SamBob mentioned, this issue is likely due to low shared memory inside the running Docker container, and the shm_size parameter can increase it. However, since you're not running your image directly in the job (i.e., doing docker run ...) but rather the gitlab-runner process is, you'll have to set the shm_size parameter within the runner's configuration for the Docker executor. To do this, you'll also need to be running your own runners if you aren't already.
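For comparison, when you do control docker run yourself, the bump is just a flag on the run command; a quick sketch with a placeholder image and size:

# --shm-size raises the container's /dev/shm (Docker's default is only 64 MB)
docker run --rm --shm-size=2g <your-test-image> yarn e2e:run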
When running your own runners, each will have a config.toml file in /etc/gitlab-runner that looks like this by default:
listen_address = ":9252"
concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "runner-1"
  url = "https://gitlab.example.com"
  token = "TOKEN"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
  [runners.docker]
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
As you can see, by default the shm_size parameter is set to 0 bytes. You can edit this file to increase the shm_size, then restart the gitlab-runner service to reload the new config.
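For example, a minimal sketch of the change (the 2 GB value is only an illustration; size it to what your Cypress jobs actually need):

[runners.docker]
  # ...existing settings unchanged...
  shm_size = 2147483648   # in bytes; roughly 2 GB of /dev/shm per job container

Then restart the service so the new value is picked up, e.g. sudo gitlab-runner restart (or sudo systemctl restart gitlab-runner).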
One other thing I do with my runners is to add a shm-increased tag to the runners I've given a larger shm_size, since only a couple of jobs in my pipelines need more shared memory; a job that needs it can then request that tag (see the sketch below).
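A minimal .gitlab-ci.yml sketch of such a job (job name and script are placeholders; the tag name is the one mentioned above):

e2e:
  tags:
    - shm-increased     # runs only on the runners configured with the larger shm_size
  script:
    - yarn e2e:run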
To see more information on running your own Gitlab Runners, see here.
To see more on the shm_size parameter for Gitlab Runners, and other advanced runner configuration options, see here.
To see information on tagging runners and jobs, see here.

Related

Can I change the port a GitLab runner uses to pull from my self-hosted GitLab instance?

I have two virtual machines both running docker. On one, I am hosting a GitLab instance according to https://docs.gitlab.com/ee/install/docker.html. On the other, I have a GitLab runner running inside a container https://docs.gitlab.com/runner/install/docker.html.
Due to some port constraints, I am running the GitLab instance on non-standard ports (4443 instead of 443, for instance).
I am able to successfully register the runner, and GitLab can send a job that the runner will pick up. However, when that runner pulls the git repo, it apparently looks at the wrong port for that git pull.
My GitLab is on port 4443 not port 443. The GitLab runner config has the correct port in the url field and is, again, able to connect and receive jobs.
concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "runner1"
  url = "https://vm-pr01:4443/"
  token = "egZUzK44hYVrhy6DTfey"
  tls-ca-file = "/etc/gitlab-runner/certs/cert.crt"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker:20.10.16"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
Finally, I did try to change the GitLab ssh port according to this question:
Gitlab with non-standard SSH port (on VM with Iptable forwarding)
But that wasn't effective after restarting GitLab.
Are there other angles to try? My last thought was to host my own docker image with the correct ports changed in SSH config but I imagine there's a better way.
Use the clone_url configuration for the runner to change this.
[[runners]]
# ...
clone_url = "https://hostname:port"
Make sure the scheme matches your instance (http or https).
For whatever reason, the default clone URL does not fully respect the setting from url (the scheme and port are assumed), so in your scenario you must provide both url and clone_url in the configuration.
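Applied to the setup in the question (host and port taken from the question's config), that would look roughly like this; restart gitlab-runner afterwards so the change is picked up:

[[runners]]
  name = "runner1"
  url = "https://vm-pr01:4443/"
  clone_url = "https://vm-pr01:4443"
  # ...everything else unchanged...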

After docker prune, the Gitlab-Runner continues to reference the deleted images in its cache

I have a Gitlab-Runner (version: 14.4.0) in a VM (Ubuntu). The docker version is 20.10.10.
Everything was working as expected.
Then I wanted to delete the installed images in the folder "/var/lib/docker/vfs".
I have done the following steps.
systemctl stop docker
cd /usr/share/gitlab-runner
./clear-docker-cache prune
docker system prune -f --all
ls -la /var/lib/docker/vfs/dir/
# returns an empty dir which is what I want
systemctl daemon-reload
systemctl start docker
systemctl stop gitlab-runner
systemctl start gitlab-runner
After that I tried to start a new build job using this gitlab-runner. Unfortunately, the GitLab runner continues to reference the images I've deleted.
The following error messages occur when I want to build something with the runner.
Using Docker executor with image my-alpine:0.1.6 ...
ERROR: Preparation failed: adding cache volume: set volume permissions: create permission container for volume "runner-o19hepv1-project-133520-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": Error response from daemon: 48ac0f992674b920004317b8b6fc91dbc72f01327ca96005f7b19693f3c128ca: stat /var/lib/docker/vfs/dir/48ac0f992674b920004317b8b6fc91dbc72f01327ca96005f7b19693f3c128ca: no such file or directory (linux_set.go:95:0s)
How do I get rid of these error messages?
What did I do wrong with my approach? Eventually, I would also like the images to be deleted automatically once a week.
The gitlab-runner systemd service is started with
/usr/bin/gitlab-runner "run" "--working-directory" "/home/gitlab-runner" "--config" "/etc/gitlab-runner/config.toml" "--service" "gitlab-runner" "--user" "gitlab-runner"
and the configuration (config.toml) is
concurrent = 5
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "my-gitlabrunner"
  url = "https://git.tech.rz.db.de/"
  token = "mytoken"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "alpine"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
I had a similar problem; in my case it was caused when the runner tried to create a "permissions container" using a faulty image. Deleting that image so that it would be re-downloaded sorted it for me. The image in question was registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-8925d9a0:
$ docker image rm registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-8925d9a0
Error response from daemon: exit status 1: "/usr/bin/zfs fs destroy -r system/docker/418e78d27d51c2e2628534aaf9f84c5d76748d62e548a4de356328e0fb3a0c31" => cannot open 'system/docker/418e78d27d51c2e2628534aaf9f84c5d76748d62e548a4de356328e0fb3a0c31': dataset does not exist
Despite the error message the image was deleted. When I then retried a CI job it was downloaded again and everything has worked fine since.
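For the "once a week" cleanup mentioned in the question, one possible sketch is a cron entry that reuses the same commands (the file name and schedule are assumptions, and note that docker system prune --all will also remove images that idle runners may want to reuse):

# /etc/cron.d/gitlab-runner-prune  (hypothetical file)
# Sundays at 03:00: clear the runner's cache containers/volumes, then prune unused Docker data
0 3 * * 0  root  /usr/share/gitlab-runner/clear-docker-cache prune && /usr/bin/docker system prune -f --all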

Running gitlab-runner with multiple docker daemons

I'm trying to have several GitLab runners use different Docker daemons on the same host.
Currently using gitlab-runner 10.7.0 and docker 19.03.3. The goal is to maximize the usage of resources. Since I have two SSD disks on the machine, I want the runners to use both of them. The only way I found to have some runners use one disk while some others use the other disk is to have two docker daemons, one running on each disk.
I have one docker daemon running on unix:///var/run/docker-1.sock and one on unix:///var/run/docker-2.sock. They use each a dedicated bridge created manually. The (systemd) startup command line looks like /usr/bin/dockerd --host unix:///var/run/docker_socket/docker-%i.sock --containerd=/run/containerd/containerd.sock --pidfile /var/run/docker-%i.pid --data-root /data/local%i/docker/ --exec-root /data/local%i/docker_run/ --bridge docker-%i --fixed-cidr 172.%i0.0.1/17
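A rough sketch of how those %i placeholders could be wired up as a systemd template unit, enabled as instances 1 and 2 (the unit name and the dependency/restart lines are assumptions; the dockerd flags are the ones above):

# /etc/systemd/system/docker@.service  (assumed name and location)
[Unit]
Description=Docker daemon instance %i
After=network-online.target containerd.service
Requires=containerd.service

[Service]
Type=notify
ExecStart=/usr/bin/dockerd --host unix:///var/run/docker_socket/docker-%i.sock \
  --containerd=/run/containerd/containerd.sock \
  --pidfile /var/run/docker-%i.pid \
  --data-root /data/local%i/docker/ \
  --exec-root /data/local%i/docker_run/ \
  --bridge docker-%i \
  --fixed-cidr 172.%i0.0.1/17
Restart=always

[Install]
WantedBy=multi-user.target

# enable with: systemctl enable --now docker@1 docker@2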
The gitlab_runner mounts /var/run/docker_socket/ and runs on docker-1.sock.
I tried having one runner per docker daemon, but then two jobs run on the same runner even though the limit is set to 1 (and sometimes errors appear, such as ERROR: Job failed (system failure): Error: No such container: ...).
After registration the config.toml looks like:
concurrent = 20
check_interval = 0

[[runners]]
  name = "[...]-large"
  limit = 1
  output_limit = 32768
  url = "[...]"
  token = "[...]"
  executor = "docker"
  [runners.docker]
    host = "unix:///var/run/docker-1.sock"
    tls_verify = false
    image = "debian:jessie"
    memory = "24g"
    cpuset_cpus = "1-15"
    privileged = false
    security_opt = ["seccomp=unconfined"]
    disable_cache = false
    volumes = ["/var/run/docker-1.sock:/var/run/docker.sock"]
    shm_size = 0
  [runners.cache]

[[runners]]
  name = "[...]-medium-1"
  limit = 1
  output_limit = 32768
  url = "[...]"
  token = "[...]"
  executor = "docker"
  [runners.docker]
    host = "unix:///var/run/docker-2.sock"
    tls_verify = false
    image = "debian:jessie"
    memory = "12g"
    cpuset_cpus = "20-29"
    privileged = false
    security_opt = ["seccomp=unconfined"]
    disable_cache = false
    volumes = ["/var/run/docker-2.sock:/var/run/docker.sock"]
    shm_size = 0
  [runners.cache]
The two docker daemons are working fine. Tested with docker --host unix:///var/run/docker-<id>.sock ps
The current solution seems to be kind of OK but there are random errors in the gitlab_runner logs:
ERROR: Appending trace to coordinator... error couldn't execute PATCH against http://[...]/api/v4/jobs/223116/trace: Patch http://[...]/api/v4/jobs/223116/trace: read tcp [...] read: connection reset by peer runner=0ec8a845
Other people tried this, apparently with some success:
This one seems to list the whole set of options needed to properly run each instance of dockerd: Is it possible to start multiple docker daemons on the same machine. What are yours?
This other https://www.jujens.eu/posts/en/2018/Feb/25/multiple-docker/, does not speak about the possible extra bridge config.
NB: Docker documentation says the feature is experimental: https://docs.docker.com/engine/reference/commandline/dockerd/#run-multiple-daemons

gitlab runner - network_mode = "host"

I want to setup CI/CD in GitLab.
So I installed Docker and the gitlab-runner on Linux, created a config for a runner and started everything. So far so good.
The runner works, and docker works.
But I am using the Linux subsystem from Windows, so I need to run Docker containers with the parameter "--network host", otherwise they will not work.
So right now I am trying to configure the gitlab-runner to use the host network via the "network_mode" parameter, but it does not work. I get the same error as if I had run a docker container directly without "--network host".
The error:
WARNING: Preparation failed: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"process_linux.go:351: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: time=\\\\"2019-04-12T18:42:33+02:00\\\\" level=fatal msg=\\\\"failed to add interface vethfc7c8d1 to sandbox: failed to get link by name \\\\\\\\"vethfc7c8d1\\\\\\\\": Link not found\\\\" \\n\\"\"" (executor_docker.go:423:16s) job=123project=123 runner=123
This is my config:
concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "MyHostName"
  url = "https://my.gitlab.url/"
  token = "SoMeFaNcYcOdE-e"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "beevelop/ionic:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mode = "host"
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
My question is: how do I force the gitlab-runner to create its containers on the host network, like the docker parameter "--network host" does?
I was unable to solve the problem directly, but I found an alternative way that works a lot better.
I configured the GitLab Container Registry of the repository to upload and whitelist a custom Docker image, and then enabled my company's Shared Runners. The custom image I uploaded was built from a Dockerfile using Docker for Windows, avoiding the struggle with the buggy Docker inside the Linux subsystem of Windows. Now I can execute my CI pipeline flawlessly, have full control over the used image, and do not have to keep my local machine running.
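A minimal sketch of how such a custom registry image might then be used in .gitlab-ci.yml (registry path, tag, job name, runner tag and script are all placeholders):

image: registry.example.com/my-group/my-project/my-ionic-image:latest

build_app:
  tags:
    - shared            # assumed tag so the job lands on the company's shared runners
  script:
    - ionic build       # placeholder build command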

Gitlab-runner docker container is using the Gitlab container_id as the clone url

I am trying to configure a simple Gitlab-ci build pipeline and am running all of the components in docker containers. I followed the general guides on docs.gitlab.com and got a runner registered with gitlab. But when a build kicks off, the runner tries to clone the repository in question and seems to use the gitlab instance's container-id in place of the url, and I get an unreachable-host error:
Cloning repository...
Cloning into '/builds/root/ci-demo'...
fatal: unable to access 'http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@cdfd596f2bc4/root/ci-demo.git/': Could not resolve host: cdfd596f2bc4
ERROR: Job failed: exit code 1
Is there something obvious that I've overlooked? There are quite a few similar questions on SO and the internet in general, but none seem to have a problem with the target container-id being substituted for the url.
gitlab-runner's config.toml:
concurrent = 1
check_interval = 0

[[runners]]
  name = "runner_name"
  url = "http://[ipaddr]:[port]/"
  token = "xxxxxxx"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "maven:latest"
    privileged = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
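No answer is included for this question on the page, but one thing worth trying (not a confirmed fix) is the same clone_url override described in the earlier answer above, pointing the runner at an address the job containers can actually resolve instead of the GitLab container's ID (placeholders kept from the question's config):

[[runners]]
  name = "runner_name"
  url = "http://[ipaddr]:[port]/"
  clone_url = "http://[ipaddr]:[port]"
  executor = "docker"
  # ...rest unchanged...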
