Unable to query docker version: Get https://<ip>/v1.15/version: remote error: tls: bad certificate - docker

I am using GitLab Runner to run my tests on DigitalOcean servers.
After I reinstalled GitLab Runner, I started to get the errors shown below.
I have the following config for my runner:
concurrent = 10
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "Builds coordinator"
url = "https://gitlab.com/"
token = "<token>"
executor = "docker+machine"
limit = 10
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 0
OffPeakTimezone = ""
OffPeakIdleCount = 0
OffPeakIdleTime = 0
MachineDriver = "digitalocean"
MachineName = "gitlab-runner-autoscale-%s"
MachineOptions = [
"digitalocean-image=coreos-stable",
"digitalocean-ssh-user=core",
"digitalocean-access-token=<token>",
"digitalocean-region=lon1",
"digitalocean-size=4gb",
"digitalocean-private-networking"
]
[runners.cache]
Type = "s3"
Path = "cache_for_builds"
Shared = false
[runners.cache.s3]
ServerAddress = "ams3.digitaloceanspaces.com"
AccessKey = "<key>"
SecretKey = "<secret>"
BucketName = "cache-for-builds"
BucketLocation = "ams3"
Insecure = false
I ran docker-machine ls while a build was running and saw the following output:
... ERRORS
... Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
or
... ERRORS
... Unable to query docker version: Get https://<ip>:2376/v1.15/version: remote error: tls: bad certificate
I searched for certs on my server with find -name "*cert*" and found them in /root/.docker/machine/certs. I added the path to the certs to the [runners.docker] section as described in the docs, but it didn't help:
[runners.docker]
...
tls_cert_path = "/root/.docker/machine/certs"
When I view the GitLab Runner logs with journalctl -u gitlab-runner, I see:
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: ERROR: Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "<ip>:2376": remote error: tls: bad certificate driver=digitalocean name=r
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: ERROR: You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'. driver=digitalocean name=runner-gitlab-runner-autoscale-<id> operation=create
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: ERROR: Be advised that this will trigger a Docker daemon restart which might stop running containers. driver=digitalocean name=runner-gitlab-runner-autoscale-<id> operation=create
Jan 13 21:06:59 builds-coordinator gitlab-runner[3360]: The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag. driver=digitalocean name=runner-gitlab-runner-autoscale-<id> operation=create
Jan 13 21:07:00 builds-coordinator gitlab-runner[3360]: WARNING: Machine creation failed, trying to provision error=exit status 1 name=runner-gitlab-runner-autoscale-<id>
Jan 13 21:07:01 builds-coordinator gitlab-runner[3360]: Waiting for SSH to be available... name=runner-gitlab-runner-autoscale-<id> operation=provision
Before the reinstall I was using GitLab Runner version 11.5.1.
How can I fix these errors and get GitLab Runner to run the builds?

Related

Gitlab DIND Runner TLS Failure

I'm trying to set up a GitLab runner with dind to build Docker images in GitLab CI pipelines, but I'm getting the following errors on each build:
*** WARNING: Service runner-project-2-concurrentdocker-0 probably didn't start properly.
Health check error:
service "runner-project-2-concurrentdocker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2021-10-12T14:12:26.652132966Z time="2021-10-12T14:12:26.651911909Z" level=info msg="Starting up"
2021-10-12T14:12:26.653211174Z time="2021-10-12T14:12:26.653132075Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2021-10-12T14:12:26.653320513Z time="2021-10-12T14:12:26.653240584Z" level=warning msg="Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network." host="tcp://0.0.0.0:2375"
2021-10-12T14:12:27.653863417Z time="2021-10-12T14:12:27.653434622Z" level=warning msg="Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message" host="tcp://0.0.0.0:2375"
My Gitlab Runner config.toml looks like this:
concurrent = 10
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-docker"
url = "https://gitlab.my.host/"
token = "MYTOKEN"
executor = "docker"
environment = ["DOCKER_DRIVER=overlay2", "DOCKER_HOST=tcp://docker:2375/", "DOCKER_TLS_CERTDIR="]
[runners.docker]
tls_verify = false
image = "docker:dind"
privileged = true
[[runners.docker.services]]
name = "docker:dind"
command = ["--registry-mirror", "http://192.168.1.21"]
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/certs/client", "/cache"]
shm_size = 0
There are mysterious bug reports and user questions about those errors, but none of them seems to fix my problem. I tried removing the tls_verify settings, setting the DOCKER_TLS_CERTDIR: "" CI pipeline variable, and so on. Is there any way to get those runners booting up fast again, with or without TLS verification?
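For comparison, here is a minimal sketch of a TLS-enabled variant of this setup, following the pattern in GitLab's Docker-in-Docker documentation. It assumes the docker:dind service is allowed to generate its certificates under /certs (DOCKER_TLS_CERTDIR=/certs) and that the job container reaches the daemon on port 2376, which avoids the plain-TCP bind that the warnings above complain about. The runner name, URL, token, and registry mirror are copied from the config above. One more thing worth checking: in TOML, every key that follows a [[runners.docker.services]] header belongs to that service table, so the sketch places the services table last. Whether the ordering in the posted file is a real problem or just lost indentation is not clear from the post.

```toml
concurrent = 10
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gitlab-runner-docker"
  url = "https://gitlab.my.host/"
  token = "MYTOKEN"
  executor = "docker"
  # Assumption: TLS stays enabled, so the Docker client talks to dockerd on
  # port 2376 and reads the client certificates that dind writes to
  # /certs/client (shared through the volume declared below).
  environment = [
    "DOCKER_DRIVER=overlay2",
    "DOCKER_HOST=tcp://docker:2376",
    "DOCKER_TLS_CERTDIR=/certs",
    "DOCKER_TLS_VERIFY=1",
    "DOCKER_CERT_PATH=/certs/client"
  ]
  [runners.docker]
    tls_verify = false
    image = "docker:dind"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/certs/client", "/cache"]
    shm_size = 0
    # Kept last so that the keys above stay in [runners.docker].
    [[runners.docker.services]]
      name = "docker:dind"
      command = ["--registry-mirror", "http://192.168.1.21"]
```

If the plain-TCP setup is preferred instead, the same ordering concern applies; only the environment values would differ.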

Running gitlab-runner with multiple docker daemons

I'm trying to have several GitLab runners use different docker daemons on the same host.
Currently using gitlab-runner 10.7.0 and docker 19.03.3. The goal is to maximize the usage of resources. Since I have two SSD disks on the machine, I want the runners to use both of them. The only way I found to have some runners use one disk while some others use the other disk is to have two docker daemons, one running on each disk.
I have one docker daemon running on unix:///var/run/docker-1.sock and one on unix:///var/run/docker-2.sock. Each uses a dedicated bridge created manually. The (systemd) startup command line looks like:
/usr/bin/dockerd --host unix:///var/run/docker_socket/docker-%i.sock --containerd=/run/containerd/containerd.sock --pidfile /var/run/docker-%i.pid --data-root /data/local%i/docker/ --exec-root /data/local%i/docker_run/ --bridge docker-%i --fixed-cidr 172.%i0.0.1/17
The gitlab_runner mounts /var/run/docker_socket/ and runs on docker-1.sock.
I tried having one runner per docker daemon, but then two jobs run on the same runner although the limit is set to 1 (and errors sometimes appear, such as ERROR: Job failed (system failure): Error: No such container: ...)
After registration the config.toml looks like:
concurrent = 20
check_interval = 0
[[runners]]
name = "[...]-large"
limit = 1
output_limit = 32768
url = "[...]"
token = "[...]"
executor = "docker"
[runners.docker]
host = "unix:///var/run/docker-1.sock"
tls_verify = false
image = "debian:jessie"
memory = "24g"
cpuset_cpus = "1-15"
privileged = false
security_opt = ["seccomp=unconfined"]
disable_cache = false
volumes = ["/var/run/docker-1.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
[[runners]]
name = "[...]-medium-1"
limit = 1
output_limit = 32768
url = "[...]"
token = "[...]"
executor = "docker"
[runners.docker]
host = "unix:///var/run/docker-2.sock"
tls_verify = false
image = "debian:jessie"
memory = "12g"
cpuset_cpus = "20-29"
privileged = false
security_opt = ["seccomp=unconfined"]
disable_cache = false
volumes = ["/var/run/docker-2.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
The two docker daemons are working fine (tested with docker --host unix:///var/run/docker-<id>.sock ps).
The current solution seems to be kind of OK but there are random errors in the gitlab_runner logs:
ERROR: Appending trace to coordinator... error couldn't execute PATCH against http://[...]/api/v4/jobs/223116/trace: Patch http://[...]/api/v4/jobs/223116/trace: read tcp [...] read: connection reset by peer runner=0ec8a845
Other people have tried this, apparently with some success:
This one seems to list the whole set of options needed to properly run each instance of dockerd: Is it possible to start multiple docker daemons on the same machine. Which options do you use?
This other one, https://www.jujens.eu/posts/en/2018/Feb/25/multiple-docker/, does not mention the possible extra bridge config.
NB: Docker documentation says the feature is experimental: https://docs.docker.com/engine/reference/commandline/dockerd/#run-multiple-daemons

Docker-ssh non-root path/getsockopt: connection refused

I'm trying to use gitlab-runner with docker-ssh. Here is what my config.toml looks like:
[[runners]]
name = "CI/CD docker-ssh alfa"
url = "https://gitlab.com/"
token = "<SOME_TOKEN>"
executor = "docker-ssh"
[runners.ssh]
user = "myuser"
password = "my password"
[runners.docker]
tls_verify = false
image = "ubuntu:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
But I got this error:
Running with gitlab-runner 11.3.0 (d78e9e67)
on CI/CD docker-ssh alfa 1f147b76
Using Docker executor with image ubuntu:latest …
ERROR: Preparation failed: build directory needs to be absolute and non-root path
Will be retried in 3s …
Using Docker executor with image ubuntu:latest …
ERROR: Preparation failed: build directory needs to be absolute and non-root path
So I tried to change the build directory, and here is what my config.toml file looks like now:
[[runners]]
name = "CI/CD docker-ssh alfa"
url = "https://gitlab.com/"
token = "<SOME_TOKEN>"
executor = "docker-ssh"
builds_dir = "/home/myuser/"
[runners.ssh]
user = "myuser"
password = "my password"
[runners.docker]
tls_verify = false
image = "ubuntu:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
But I got this new error:
Running with gitlab-runner 11.3.0 (d78e9e67)
on CI/CD docker-ssh alfa 1f147b76
Using Docker executor with image ubuntu:latest …
WARNING: Since GitLab Runner 10.0 docker-ssh and docker-ssh+machine executors are marked as DEPRECATED and will be removed in one of the upcoming releases
Pulling docker image ubuntu:latest …
Using docker image sha256:cd6d8154f1e16e38493c3c2798977c5e142be5e5d41403ca89883840c6d51762 for ubuntu:latest …
ERROR: Preparation failed: dial tcp 172.17.0.2:22: getsockopt: connection refused
Will be retried in 3s …
Any idea what I am doing wrong?
Stick with an HTTPS URL, and instead try fixing the error:
build directory needs to be absolute and non-root path
See this thread:
I was running my CI on an old gitlab-ci-multi-runner 9.5.1.
I updated to gitlab-runner 10.8.0 and now it's OK.
Or this thread:
Set builds_dir = "C:\\gitlab-runner\\builds" in the config.toml.
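The second error (dial tcp 172.17.0.2:22: connection refused) is what one would expect when nothing is listening on port 22 inside the container: the docker-ssh executor connects to the job container over SSH, and a stock ubuntu:latest image does not run an SSH server. Since the job log itself marks docker-ssh as deprecated, one option is the plain docker executor. A minimal sketch, assuming SSH access to the container is not actually required; all values are copied from the question:

```toml
[[runners]]
  name = "CI/CD docker-ssh alfa"
  url = "https://gitlab.com/"
  token = "<SOME_TOKEN>"
  # Assumption: the plain docker executor is acceptable here. It needs no
  # [runners.ssh] section and no SSH server inside the image.
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "ubuntu:latest"
    privileged = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
```

With this executor the builds_dir override from the second attempt is left out, so the runner falls back to its default build directory inside the container.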

ECS images are failing in gitlab-runner-autoscaling?

I am a newbie to gitlab-runner. I have tried to set up gitlab-runner autoscaling, but I am unable to download ECR images in a build. When I SSH into the docker-machine VM I am able to download images; I even tried pulling ECR images on the VM as root and as the ubuntu user (Ubuntu 16.04 AMI). It only fails while running a build.
Please let me know how I can troubleshoot this.
1. How can I find the command gitlab-runner is using to pull the ECR image?
2. How can I find the user that runs the docker command?
Runner config:
[[runners]]
name = "registry-test4"
limit = 1
url = "http://gitlab.xxxxxxxx.com/"
token = "xxxxxxxxxxxxxxx"
executor = "docker+machine"
[runners.docker]
tls_verify = false
image = "ruby:2.1"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.machine]
IdleCount = 1
MachineDriver = "amazonec2"
MachineName = "gitlab-runner-ci-%s"
MachineOptions = ["amazonec2-iam-instance-profile=xxxxxxxxxxx", "amazonec2-ssh-user=ubuntu", "amazonec2-region=us-east-1", "amazonec2-instance-type=t2.large", "amazonec2-ami=ami-xxxxx", "amazonec2-vpc-id=vpc-xxxxx", "amazonec2-subnet-id=subnet-xxxxx", "amazonec2-zone=a", "amazonec2-root-size=32", "amazonec2-keypair-name=spot", "amazonec2-ssh-keypath=/root/.ssh/spot", "amazonec2-userdata=/etc/gitlab-runner/bootstrap.sh", "amazonec2-request-spot-instance=true", "amazonec2-security-group=docker_machine_git_as_prod", "amazonec2-security-group=consul-agent-prod", "amazonec2-private-address-only", "amazonec2-spot-price=x.xx"]
OffPeakPeriods = ["* * 5-11 * * mon-fri *", "* * * * * sat,sun *"]
OffPeakTimezone = ""
OffPeakIdleCount = 1
OffPeakIdleTime = 1200
Error:
Running with gitlab-runner 10.2.0 (0a75cdd1)
on registry-test4 (31b91ac3)
Using Docker executor with image xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/dev/sbt:latest ...
Using docker image sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx for predefined container...
Pulling docker image xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/dev/sbt:latest ...
ERROR: Preparation failed: Error response from daemon: Get https://xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/v2/dev/sbt/manifests/latest: no basic auth credentials
Will be retried in 3s ...
.gitlab-ci.yml
---
main:
image: xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/dev/sbt:latest
script: sbt +runCI
Solved this issue by installing the ECR credential helper binary
https://github.com/awslabs/amazon-ecr-credential-helper
on the gitlab-runner server and adding the following to /root/.docker/config.json. (Earlier, the helper was installed only on the VMs that docker-machine was provisioning.)
{
"credsStore": "ecr-login"
}

Gitlab-runner docker container is using the Gitlab container_id as the clone url

I am trying to configure a simple Gitlab-ci build pipeline and am running all of the components in docker containers. I followed the general guides on docs.gitlab.com and got a runner registered with gitlab. But when a build kicks off, the runner tries to clone the repository in question and seems to use the gitlab instance's container-id in place of the url, and I get an unreachable-host error:
Cloning repository...
Cloning into '/builds/root/ci-demo'...
fatal: unable to access 'http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@cdfd596f2bc4/root/ci-demo.git/': Could not resolve host: cdfd596f2bc4
ERROR: Job failed: exit code 1
Is there something obvious that I've overlooked? There are quite a few similar questions on SO and the internet in general, but none seem to have a problem with the target container-id being substituted for the url.
gitlab-runner's config.toml:
concurrent = 1
check_interval = 0
[[runners]]
name = "runner_name"
url = "http://[ipaddr]:[port]/"
token = "xxxxxxx"
executor = "docker"
[runners.docker]
tls_verify = false
image = "maven:latest"
privileged = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
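The hostname in the failing clone URL (cdfd596f2bc4) most likely comes from GitLab itself: the repository URL sent with each job is built from GitLab's own configured external URL, which inside a container can default to the container hostname. One runner-side workaround is the clone_url setting, which overrides the GitLab URL used for cloning. A minimal sketch, assuming a runner version that supports clone_url and that http://[ipaddr]:[port]/ is reachable from the job containers; the placeholders are the ones used in the config above:

```toml
concurrent = 1
check_interval = 0

[[runners]]
  name = "runner_name"
  url = "http://[ipaddr]:[port]/"
  # Assumption: clone_url is available in this runner version. It replaces
  # the base URL of the repository address that GitLab sends with the job.
  clone_url = "http://[ipaddr]:[port]/"
  token = "xxxxxxx"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "maven:latest"
    privileged = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
```

The server-side alternative would be to set GitLab's external URL to an address the job containers can resolve, so that the generated clone URLs are correct in the first place.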
