Rancher: Failed to find rancher-agent container - docker

I'm trying to create a new host (DigitalOcean) using the Rancher UI. Everything was fine, but at the end I got this error:
"Failed to find rancher-agent container"
Logs:
time="2017-07-20T09:55:57Z" level=info msg="stdout: Running pre-create checks..." resourceId: =1ph86 service=gms
time="2017-07-20T09:55:58Z" level=info msg="stdout: Creating machine..." resourceId: =1ph86 service=gms
time="2017-07-20T09:55:58Z" level=info msg="stdout: (oo) Creating SSH key..." resourceId: =1ph86 service=gms
time="2017-07-20T09:55:59Z" level=info msg="stdout: (oo) Creating Digital Ocean droplet..." resourceId: =1ph86 service=gms
time="2017-07-20T09:56:00Z" level=info msg="stdout: (oo) Waiting for IP address to be assigned to the Droplet..." resourceId: =1ph86 service=gms
...
time="2017-07-20T09:57:31Z" level=info msg="pulling rancher/agent:v1.2.5 image." service=gms
time="2017-07-20T09:57:43Z" level=info msg="Container created for machine" containerId=5bef89f75de6fc256f0adbe1cc9c7138292aaa4bd7d8446546d208823cd8b22f machineId=1ph86 resourceId=1ph86 service=gms
time="2017-07-20T09:58:46Z" level=error msg="Failed to find rancher-agent container" machineId=1ph86 resourceId=1ph86 service=gms
time="2017-07-20T09:58:46Z" level=error msg="Error processing event" err="Failed to find rancher-agent container" eventId=08649e06-ddcd-445d-b120-91c0e7498835 eventName="physicalhost.bootstrap;handler=goMachineService" resourceId=1ph86
Any idea?

You should make sure that the new host has the Docker image rancher/agent:v1.2.5. Check with the command below:
# sudo docker images|grep rancher/agent
rancher/agent v1.2.2 6777bc8a1147 3 months ago 233.7 MB
If the host does not have the image, pull it with sudo docker pull rancher/agent:v1.2.5.
Then check the container logs of the rancher-agent, using the commands below:
# sudo docker ps -a |grep rancher/agent
1c03d064165c rancher/agent:v1.2.2 "/run.sh run" 5 days ago Up 5 days rancher-agent
# sudo docker logs 1c03d064165c
If you find that container, even if its status is Exited/Created/Dead, read its logs carefully and search Google/GitHub for a solution to that bug.
If you can NOT find that container, read docker-compose.yml and rancher-compose.yml and make sure you are using the right Docker image, including the right image version.
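Putting the checks together, a short sequence you can run on the new host (the --filter name is an assumption; adjust it if your agent container is named differently):
sudo docker pull rancher/agent:v1.2.5          # make sure the expected agent image is present
sudo docker ps -a | grep rancher/agent         # the container should exist, whatever its state
sudo docker logs $(sudo docker ps -aq --filter name=rancher-agent)   # read why it exited, if it did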

Related

Unable to fix error Cannot connect to the Docker daemon at tcp://localhost:2375/. Is the docker daemon running? for remote GitLab runner

I am struggling to resolve the issue
Cannot connect to the Docker daemon at tcp://localhost:2375/. Is the docker daemon running?
I am using our company's GitLab EE instance, which comes with a bunch of shared group runners. However, I would like to be able to use my own runners, especially since I will be able to employ the GPU for some machine learning tasks. I have the following .gitlab-ci.yml:
run_tests:
  image: python:3.9-slim-buster
  before_script:
    - apt-get update
    - apt-get install -y make
  script:
    - python --version
    - pip --version
    - make test

build_image:
  image: docker:20.10.23
  services:
    - docker:20.10.23-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_HOST: tcp://localhost:2375/
  before_script:
    - echo "User $REGISTRY_USER"
    - echo "Token $ACCESS_TOKEN"
    - echo "Host $REGISTRY_HOST_ALL"
    - echo "$ACCESS_TOKEN" | docker login --username $REGISTRY_USER --password-stdin $REGISTRY_HOST_ALL
  script:
    - docker build --tag $REGISTRY_HOST_ALL/<PATH_TO_USER>/python-demoapp .
    - docker push $REGISTRY_HOST_ALL/<PATH_TO_USER>/python-demoapp
The application is currently a demo and it's used in the following tutorial. Note that <PATH_TO_USER> in the above URLs is just a placeholder (I cannot reveal the original one since it contains internal information) and points at my account space, where the project python-demoapp is located. With untagged jobs enabled, I am hoping to have the following workflow:
1. Push application code change
2. GitLab pipeline triggered
   2.1 Execute tests
   2.2 Build image
   2.3 Push image to container repository
3. Re-use image with application inside (e.g. run locally)
I have set up the variables accordingly to contain my username, an access token (generated in GitLab) and the registry host. All of these are correct, and I am able to execute everything up to the docker build ... section.
Now, as for the runner, I followed the instructions provided in GitLab to set it up. I chose to create a VM (QEMU+KVM+libvirt) with a standard minimal installation of Debian 11, with everything set to default (including the NAT network, which appears to be working since I can access the Internet through it), where the runner currently resides. I am doing this in order to save the setup and later transfer it onto a server and run multiple instances of the VM with slight modifications (e.g. GPU passthrough for an Nvidia CUDA Docker/Podman setup).
Besides the runner (the binary was downloaded from our GitLab instance), I installed Docker CE (in the future to be replaced with Podman due to licensing and pricing) following the official instructions. Docker runs as a systemd service (docker.service, docker.socket), that is, I need sudo to interact with it. The runner has its own user (also part of the sudo group), as the official documentation instructs.
The GitLab runner's configuration file gitlab-runner-config.toml contains the following information:
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "Test runner (Debian 11 VM, Docker CE, personal computer)"
  url = "<COMPANY_GITLAB_INSTANCE_URL>"
  id = <RUNNER_ID>
  token = "<ACCESS_TOKEN>"
  token_obtained_at = 2023-01-24T09:18:33Z
  token_expires_at = 2023-02-01T00:00:00Z
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "python:3.9-slim-buster"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    cache_dir = "/cache"
    volumes = ["/cache", "/certs/client", "/var/run/docker.sock"]
    shm_size = 0
The configuration file was generated by running
sudo gitlab-runner register --url <COMPANY_GITLAB_INSTANCE_URL> --registration-token <ACCESS_TOKEN>
I added the extra volumes beside /cache, set the cache_dir, and changed privileged to true. All of this was based on various posts (including Docker's own issue tracker) from people having the same issue.
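To sanity-check that the runner itself is registered correctly and can reach the instance, gitlab-runner ships a verify subcommand:
sudo gitlab-runner verify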
I have made sure that dockerd is listening on the respective port:
$ sudo ss -nltp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=601,fd=3))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=601,fd=4))
LISTEN 0 4096 *:2375 *:* users:(("dockerd",pid=618,fd=9))
In addition, I have added export DOCKER_HOST=tcp://0.0.0.0:2375 to the .bashrc of every user out there (except root - perhaps that's the problem?), including the gitlab-runner user.
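A quick way to confirm the daemon is actually reachable over TCP from inside the VM, independent of those environment variables (plain docker CLI flags):
docker -H tcp://localhost:2375 info --format '{{.ServerVersion}}'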
The Dockerfile within the repository contains the following:
FROM python:3.9-slim-buster
RUN apt-get update && apt-get install -y make
The log file from the CI/CD pipeline for this job is (trimmed down) as follows:
Running with gitlab-runner 15.8.0 (12335144)
on Test runner (Debian 11 VM, Docker CE, personal computer) <IDENTIFIER>, system ID: <SYSTEM_ID>
Preparing the "docker" executor 02:34
Using Docker executor with image docker:20.10.23 ...
Starting service docker:20.10.23-dind ...
Pulling docker image docker:20.10.23-dind ...
Using docker image sha256:70ae571e74c1d711d3d5bf6f47eaaf6a51dd260fe0036c7d6894c008e7d24297 for docker:20.10.23-dind with digest docker@sha256:85a1b877d0f59fd6c7eebaff67436e26f460347a79229cf054dbbe8d5ae9f936 ...
Waiting for services to be up and running (timeout 30 seconds)...
*** WARNING: Service runner-dbms-tss-project-42787-concurrent-0-b0bbcfd1a821fc06-docker-0 probably didn't start properly.
Health check error:
service "runner-dbms-tss-project-42787-concurrent-0-b0bbcfd1a821fc06-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2023-01-26T10:09:30.933962365Z Certificate request self-signature ok
2023-01-26T10:09:30.933981575Z subject=CN = docker:dind server
2023-01-26T10:09:30.943472545Z /certs/server/cert.pem: OK
2023-01-26T10:09:32.607191653Z Certificate request self-signature ok
2023-01-26T10:09:32.607205915Z subject=CN = docker:dind client
2023-01-26T10:09:32.616426179Z /certs/client/cert.pem: OK
2023-01-26T10:09:32.705354066Z time="2023-01-26T10:09:32.705227099Z" level=info msg="Starting up"
2023-01-26T10:09:32.706355355Z time="2023-01-26T10:09:32.706298649Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2023-01-26T10:09:32.707357671Z time="2023-01-26T10:09:32.707318325Z" level=info msg="libcontainerd: started new containerd process" pid=72
2023-01-26T10:09:32.707460567Z time="2023-01-26T10:09:32.707425103Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2023-01-26T10:09:32.707466043Z time="2023-01-26T10:09:32.707433214Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2023-01-26T10:09:32.707468621Z time="2023-01-26T10:09:32.707445818Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
2023-01-26T10:09:32.707491420Z time="2023-01-26T10:09:32.707459517Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2023-01-26T10:09:32.768123834Z time="2023-01-26T10:09:32Z" level=warning msg="containerd config version `1` has been deprecated and will be removed in containerd v2.0, please switch to version `2`, see https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header"
2023-01-26T10:09:32.768761837Z time="2023-01-26T10:09:32.768714616Z" level=info msg="starting containerd" revision=5b842e528e99d4d4c1686467debf2bd4b88ecd86 version=v1.6.15
2023-01-26T10:09:32.775684382Z time="2023-01-26T10:09:32.775633270Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
2023-01-26T10:09:32.775764839Z time="2023-01-26T10:09:32.775729470Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.779824244Z time="2023-01-26T10:09:32.779733556Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.779836825Z time="2023-01-26T10:09:32.779790644Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.779932891Z time="2023-01-26T10:09:32.779904447Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.779944348Z time="2023-01-26T10:09:32.779929392Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.779958443Z time="2023-01-26T10:09:32.779940747Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
2023-01-26T10:09:32.779963141Z time="2023-01-26T10:09:32.779951447Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.780022382Z time="2023-01-26T10:09:32.780000266Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.780134525Z time="2023-01-26T10:09:32.780107812Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.780499276Z time="2023-01-26T10:09:32.780466045Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
2023-01-26T10:09:32.780507315Z time="2023-01-26T10:09:32.780489797Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
2023-01-26T10:09:32.780548237Z time="2023-01-26T10:09:32.780529316Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
2023-01-26T10:09:32.780552144Z time="2023-01-26T10:09:32.780544232Z" level=info msg="metadata content store policy set" policy=shared
2023-01-26T10:09:32.795982271Z time="2023-01-26T10:09:32.795854170Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
2023-01-26T10:09:32.795991535Z time="2023-01-26T10:09:32.795882407Z" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
2023-01-26T10:09:32.795993243Z time="2023-01-26T10:09:32.795894367Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
2023-01-26T10:09:32.795994639Z time="2023-01-26T10:09:32.795932065Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.795996061Z time="2023-01-26T10:09:32.795949931Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.795997456Z time="2023-01-26T10:09:32.795963627Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.796001074Z time="2023-01-26T10:09:32.795983562Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.796219139Z time="2023-01-26T10:09:32.796194319Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.796231068Z time="2023-01-26T10:09:32.796216520Z" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.796240878Z time="2023-01-26T10:09:32.796228403Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.796254974Z time="2023-01-26T10:09:32.796239993Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.796261567Z time="2023-01-26T10:09:32.796252251Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
2023-01-26T10:09:32.796385360Z time="2023-01-26T10:09:32.796360610Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
2023-01-26T10:09:32.796451372Z time="2023-01-26T10:09:32.796435082Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
2023-01-26T10:09:32.797042788Z time="2023-01-26T10:09:32.796984264Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
2023-01-26T10:09:32.797093357Z time="2023-01-26T10:09:32.797073997Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797100437Z time="2023-01-26T10:09:32.797091084Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
2023-01-26T10:09:32.797148696Z time="2023-01-26T10:09:32.797138286Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797164876Z time="2023-01-26T10:09:32.797153186Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797176732Z time="2023-01-26T10:09:32.797165488Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797187328Z time="2023-01-26T10:09:32.797176464Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797208889Z time="2023-01-26T10:09:32.797196407Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797220812Z time="2023-01-26T10:09:32.797209290Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797232031Z time="2023-01-26T10:09:32.797221051Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797242686Z time="2023-01-26T10:09:32.797231676Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797254415Z time="2023-01-26T10:09:32.797243815Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
2023-01-26T10:09:32.797484534Z time="2023-01-26T10:09:32.797456547Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797500729Z time="2023-01-26T10:09:32.797487444Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797524336Z time="2023-01-26T10:09:32.797502098Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
2023-01-26T10:09:32.797535447Z time="2023-01-26T10:09:32.797526933Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
2023-01-26T10:09:32.797562995Z time="2023-01-26T10:09:32.797539848Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
2023-01-26T10:09:32.797570791Z time="2023-01-26T10:09:32.797558864Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
2023-01-26T10:09:32.797589770Z time="2023-01-26T10:09:32.797579849Z" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
2023-01-26T10:09:32.797766243Z time="2023-01-26T10:09:32.797741256Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
2023-01-26T10:09:32.797805542Z time="2023-01-26T10:09:32.797792483Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
2023-01-26T10:09:32.797836935Z time="2023-01-26T10:09:32.797820296Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
2023-01-26T10:09:32.797854712Z time="2023-01-26T10:09:32.797842891Z" level=info msg="containerd successfully booted in 0.029983s"
2023-01-26T10:09:32.802286356Z time="2023-01-26T10:09:32.802232926Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2023-01-26T10:09:32.802291484Z time="2023-01-26T10:09:32.802269035Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2023-01-26T10:09:32.802322916Z time="2023-01-26T10:09:32.802306355Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
2023-01-26T10:09:32.802369464Z time="2023-01-26T10:09:32.802323232Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2023-01-26T10:09:32.803417318Z time="2023-01-26T10:09:32.803366010Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2023-01-26T10:09:32.803424723Z time="2023-01-26T10:09:32.803376046Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2023-01-26T10:09:32.803426453Z time="2023-01-26T10:09:32.803384392Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
2023-01-26T10:09:32.803428210Z time="2023-01-26T10:09:32.803389450Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2023-01-26T10:09:32.837720263Z time="2023-01-26T10:09:32.837658881Z" level=info msg="Loading containers: start."
2023-01-26T10:09:32.886897024Z time="2023-01-26T10:09:32.886828923Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
2023-01-26T10:09:32.920867085Z time="2023-01-26T10:09:32.920800006Z" level=info msg="Loading containers: done."
2023-01-26T10:09:32.944768798Z time="2023-01-26T10:09:32.944696558Z" level=info msg="Docker daemon" commit=6051f14 graphdriver(s)=overlay2 version=20.10.23
2023-01-26T10:09:32.944804324Z time="2023-01-26T10:09:32.944774928Z" level=info msg="Daemon has completed initialization"
2023-01-26T10:09:32.973804146Z time="2023-01-26T10:09:32.973688991Z" level=info msg="API listen on /var/run/docker.sock"
2023-01-26T10:09:32.976059008Z time="2023-01-26T10:09:32.975992051Z" level=info msg="API listen on [::]:2376"
*********
Pulling docker image docker:20.10.23 ...
Using docker image sha256:25deb61ef2709b05249ad4e66f949fd572fb43d67805d5ea66fe3f86766b5cef for docker:20.10.23 with digest docker@sha256:2655039c6abfc8a1d75978c5258fccd5c5cedf880b6cfc72077f076d0672c70a ...
Preparing environment 00:00
Running on runner-dbms-tss-project-42787-concurrent-0 via debian...
Getting source from Git repository 00:02
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/<PATH_TO_USER>/python-demoapp/.git/
Checking out 93e494ea as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:25deb61ef2709b05249ad4e66f949fd572fb43d67805d5ea66fe3f86766b5cef for docker:20.10.23 with digest docker@sha256:2655039c6abfc8a1d75978c5258fccd5c5cedf880b6cfc72077f076d0672c70a ...
$ echo "User $REGISTRY_USER"
User [MASKED]
$ echo "Token $ACCESS_TOKEN"
Token [MASKED]
$ echo "Host $REGISTRY_HOST_ALL"
Host ..............
$ echo "$ACCESS_TOKEN" | docker login --username $REGISTRY_USER --password-stdin $REGISTRY_HOST_ALL
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
$ docker build --tag $REGISTRY_HOST_ALL/<PATH_TO_USER>/python-demoapp .
Cannot connect to the Docker daemon at tcp://localhost:2375/. Is the docker daemon running?
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1
From my understanding, I need two images here:
1. The Python-capable one - here the official Python image from Docker Hub, which is used to run the tests as well as for the image that is added to the container registry.
2. The Docker DinD one - this is the Docker-in-Docker setup, which allows building a Docker image inside a running Docker container.
The second one is way above my head, and it's the (for me) obvious culprit for my headaches.
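For comparison, the DinD example in the GitLab documentation wires the job to the service over TLS roughly like this (a sketch adapted to the versions above, not my working config; note the service hostname docker and port 2376 rather than localhost:2375):
build_image:
  image: docker:20.10.23
  services:
    - docker:20.10.23-dind
  variables:
    # With TLS enabled (the dind default when /certs is shared),
    # the daemon listens on tcp://docker:2376, not tcp://localhost:2375.
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: "/certs/client"
  script:
    - docker info   # fails fast if the daemon is not reachable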
Perhaps important additional information: my computer is outside our company's network. The GitLab instance is accessible externally through user authentication (username + password for the WebUI, access tokens and SSH keys otherwise).
Do I need two separate runners? I have seen a lot of examples where people use a single runner for multiple jobs, including testing and image building (even packaging), so I don't believe I do. I am not really a Docker expert, as you can probably tell. :D If more information is required, please let me know in the comments below, especially if I am overdoing it and there is a much easier way to accomplish what I am trying to do.
DISCUSSION
Health check error regarding Docker volume
I can see the following error in the log posted above:
Health check error:
service "runner-dbms-tss-project-42787-concurrent-0-b0bbcfd1a821fc06-docker-0-wait-for-service" timeout
The footprint looked familiar, so I went back through some old commands I had executed: apparently this is a Docker volume. However, on my host
$ docker volume ls
DRIVER    VOLUME NAME
local     runner-...415a70
local     runner-...66cea8
neither volume has that name. So I am guessing that this is a volume created by Docker in Docker.
Adding hosts to JSON configuration file for Docker daemon
I added the following configuration at /etc/systemd/system/docker.service.d/90-docker.conf:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --config-file /etc/docker/daemon.json
with daemon.json containing the following:
{
  "hosts": [
    "tcp://0.0.0.0:2375",
    "unix:///var/run/docker.sock"
  ]
}
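After a change like this, it is worth confirming that the daemon actually picked up the TCP listener; /_ping is a standard Docker Engine API endpoint that a healthy daemon answers with OK (assuming curl is installed on the host):
sudo systemctl daemon-reload
sudo systemctl restart docker
curl http://localhost:2375/_ping   # a healthy daemon replies: OK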
Now I am noticing an additional error in the job's log:
failed to load listeners: can't create unix socket /var/run/docker.sock: is a directory
On my host I checked, and the path is an actual socket file (verified by running the file command on the path). This means that the issue is again inside the Docker container that is part of the DinD setup. I have read online that Docker apparently creates the path automatically, and for some reason it ends up as a directory.
In addition, the error mentioned in the original question has now changed to
unable to resolve docker endpoint: Invalid bind address format: http://localhost:2375/
even though I cannot find any http://localhost:2375 entry on my host, which again leads me to conclude that something in the DinD setup went wrong.

Cannot connect to the Docker daemon after failed pull

When I try to pull a certain docker image, the pull fails and then prevents me from connecting to the docker daemon again until I reboot my laptop. The image in question is an official Jupyter image which works fine on my other machine. Restarting the daemon does not help, but rebooting my laptop does.
I already tried docker system prune -a; that's why there are no images on my laptop anymore. Does somebody have an idea how to fix this problem?
I think the problem might be connected to one of the images not finishing its extraction.
EDIT
I have the same problem with an alpine image; see below.
me@mylaptop $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
me@mylaptop $ docker pull jupyter/datascience-notebook
Using default tag: latest
latest: Pulling from jupyter/datascience-notebook
e6ca3592b144: Extracting [==================================================>] 28.56MB/28.56MB
534a5505201d: Download complete
990916bd23bb: Download complete
979cd14ae800: Download complete
5e8b9f8fa9e0: Download complete
6f224ed88dc4: Download complete
6ee9ec4a62a8: Download complete
7a1ae22ba760: Download complete
a1602338a8d7: Download complete
fce5135a7ea1: Download complete
e62a1c9017ef: Download complete
a5049ad1c512: Download complete
ec06c1612b0a: Download complete
acceda87b341: Download complete
939052532b6f: Download complete
d2dee4cc07fe: Download complete
4fe5e9dd4fad: Download complete
8fd08517e0c6: Download complete
7105a3ca8c38: Download complete
66c0798f609e: Download complete
94f3fc35ed38: Download complete
aa68263474a3: Download complete
6e7d1433394b: Download complete
f5902e69d9b7: Download complete
490bb991b4de: Download complete
fab6e92b04fa: Download complete
failed to register layer: Error processing tar file(exit status 1): Error cleaning up after pivot: remove /.pivot_root297865553: device or resource busy
me@mylaptop $ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@mylaptop $ sudo systemctl start docker
me@mylaptop $ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-09-30 08:11:12 CEST; 15min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 908 (dockerd)
Tasks: 10
Memory: 140.8M
CGroup: /system.slice/docker.service
└─908 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992016198+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992433459+02:00" level=info msg="Loading containers: start."
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.227615723+02:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can b>
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.296603004+02:00" level=info msg="Loading containers: done."
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.486944893+02:00" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: >
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.487273874+02:00" level=info msg="Docker daemon" commit=48a66213fe graphdriver(s)=overlay2 version=19.03.12-ce
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.491959213+02:00" level=info msg="Daemon has completed initialization"
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.530816090+02:00" level=info msg="API listen on /run/docker.sock"
sep 30 08:11:12 mylaptop systemd[1]: Started Docker Application Container Engine.
sep 30 08:23:36 mylaptop dockerd[908]: time="2020-09-30T08:23:36.941202710+02:00" level=info msg="Attempting next endpoint for pull after error: failed to register layer: Error processing tar fi>
me@mylaptop $ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@mylaptop $ docker pull alpine:3.12.0
3.12.0: Pulling from library/alpine
df20fa9351a1: Extracting [==================================================>] 2.798MB/2.798MB
failed to register layer: Error processing tar file(exit status 1): Error cleaning up after pivot: remove /.pivot_root517304538: device or resource busy
Solved it. The problem was that my kernel was/became too old.
The warning below from systemctl made me find this post on forums.docker.com:
me@mylaptop $ systemctl status docker
...
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992016198+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
...
I'm running Manjaro, so I upgraded my kernel with this command:
sudo mhwd-kernel -i linux54
After which docker worked again.
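To confirm the reboot actually switched to the new kernel, a quick check:
$ uname -r   # should now report the newly installed 5.4-series kernel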

Running Docker in Ubuntu Bionic Container

I am installing Docker CE on the latest Ubuntu Docker image and getting the following error. I followed the installation instructions carefully; maybe installing Docker inside a Docker container is not the way to go about this? I'm working with Jenkins Pipelines and have Jenkins installed on the Ubuntu container; the next piece is to get Docker running.
time="2018-10-26T13:25:09.920187300Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2018-10-26T13:25:09.920228600Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}]" module=grpc
time="2018-10-26T13:25:09.920250500Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2018-10-26T13:25:09.920286200Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420047e60, CONNECTING" module=grpc
time="2018-10-26T13:25:09.920480100Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420047e60, READY" module=grpc
time="2018-10-26T13:25:09.920501400Z" level=info msg="Loading containers: start."
time="2018-10-26T13:25:09.920666400Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: , error: exec: \"modprobe\": executable file not found in $PATH"
time="2018-10-26T13:25:09.920704800Z" level=warning msg="Running modprobe nf_nat failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
time="2018-10-26T13:25:09.920733300Z" level=warning msg="Running modprobe xt_conntrack failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
Error starting daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.6.1: can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
(exit status 3)
A typical Docker container is run with a restricted set of permissions. Even if you are root in the container, you cannot modify the network configuration, nor can you mount filesystems. So the error you are seeing...
Error starting daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.6.1: can't initialize iptables table `nat': Permission denied (you must be root)
...is happening because of that restriction. You can create an unrestricted container by running your container with:
docker run --privileged ...
You may be able to use something slightly more granular and grant the NET_ADMIN capability, as in:
docker run --cap-add NET_ADMIN ...
This will work as long as the only "special" privilege required by the container is network configuration.
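For example, the official docker:dind image is normally started exactly this way (the container name is illustrative):
docker run --privileged --name some-dind -d docker:dind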

How to fix docker daemon that will not restart due to hns error

Docker for Windows Server
Windows Server version 1709, with containers
Docker version 17.06.2-ee-6, build e75fdb8
Swarm mode (worker node, part of swarm with ubuntu masters)
After containers connected to an overlay network started intermittently losing their network adapters, I restarted the machine. Now the daemon will not start. Below are the last lines of output from running docker -D.
Please let me know how to fix this.
time="2018-05-15T15:10:06.731160000Z" level=debug msg="Option Experimental: false"
time="2018-05-15T15:10:06.731160000Z" level=debug msg="Option DefaultDriver: nat"
time="2018-05-15T15:10:06.731160000Z" level=debug msg="Option DefaultNetwork: nat"
time="2018-05-15T15:10:06.734183700Z" level=info msg="Restoring existing overlay networks from HNS into docker"
time="2018-05-15T15:10:06.735174400Z" level=debug msg="[GET]=>[/networks/] Request : "
time="2018-05-15T15:12:06.789120400Z" level=debug msg="Network (d4d37ce) restored"
time="2018-05-15T15:12:06.796122200Z" level=debug msg="Endpoint (4114b6e) restored to network (d4d37ce)"
time="2018-05-15T15:12:06.796122200Z" level=debug msg="Endpoint (819eb70) restored to network (d4d37ce)"
time="2018-05-15T15:12:06.797124900Z" level=debug msg="Endpoint (ade55ea) restored to network (d4d37ce)"
time="2018-05-15T15:12:06.798125600Z" level=debug msg="Endpoint (d0054fc) restored to network (d4d37ce)"
time="2018-05-15T15:12:06.798125600Z" level=debug msg="Endpoint (e2af8d8) restored to network (d4d37ce)"
time="2018-05-15T15:12:06.854118500Z" level=debug msg="[GET]=>[/networks/] Request : "
time="2018-05-15T15:14:06.860654000Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
Error starting daemon: Error initializing network controller: hnsCall failed in Win32: Server execution failed (0x80080005)
Here is the complete set of steps to completely rebuild the docker state within a swarm host. Sometimes only some of the steps are sufficient (specifically the HNS part), so you can try those first.
1. Remove all docker services and user-defined networks (so all docker networks except `nat` and `none`)
2. Leave the swarm cluster (docker swarm leave --force)
3. Stop the docker service (PS C:\> stop-service docker)
4. Stop the HNS service (PS C:\> stop-service hns)
5. In regedit, delete all of the registry keys under these paths:
HKLM:\SYSTEM\CurrentControlSet\Services\vmsmp\parameters\SwitchList
HKLM:\SYSTEM\CurrentControlSet\Services\vmsmp\parameters\NicList
6. Now go to Device Manager, and disable then remove all network adapters that are "Hyper-V Virtual Ethernet…" adapters
7. Now rename your HNS.data file (the goal is to effectively "delete" it by renaming it):
C:\ProgramData\Microsoft\Windows\HNS\HNS.data
8. Also rename the C:\ProgramData\docker folder (again, effectively "deleting" it by renaming):
C:\ProgramData\docker
9. Now reboot your machine
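For convenience, steps 3-5 and 7-8 can be scripted in PowerShell roughly as follows (run elevated; this is a sketch that deliberately wipes HNS state, so use it only as part of the full rebuild above):
stop-service docker
stop-service hns
# Delete the registry keys from step 5
Remove-Item "HKLM:\SYSTEM\CurrentControlSet\Services\vmsmp\parameters\SwitchList\*" -Recurse
Remove-Item "HKLM:\SYSTEM\CurrentControlSet\Services\vmsmp\parameters\NicList\*" -Recurse
# "Delete" the HNS state and docker data root by renaming (steps 7 and 8)
Rename-Item "C:\ProgramData\Microsoft\Windows\HNS\HNS.data" "HNS.data.bak"
Rename-Item "C:\ProgramData\docker" "docker.bak"
Restart-Computer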

Docker fails to start after install with "loopback attach failed"

I have installed docker-ce from the repository, following the instructions at:
https://docs.docker.com/install/linux/docker-ce/centos/
I receive an error attempting to start docker:
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
journalctl has the following:
...
dockerd[3647]: time="2018-02-05T14:47:05-08:00" level=info msg="containerd successfully booted in 0.002946s" module=containerd
dockerd[3647]: time="2018-02-05T14:47:05.456552594-08:00" level=error msg="There are no more loopback devices available."
dockerd[3647]: time="2018-02-05T14:47:05.456585240-08:00" level=error msg="[graphdriver] prior storage driver devicemapper failed: loopback attach failed"
dockerd[3647]: Error starting daemon: error initializing graphdriver: loopback attach failed
systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Docker Application Container Engine.
I have seen articles about using something other than loopback devices, but as far as I can tell, those describe an optimization to be made and do not imply that the initial startup should fail.
CentOS Linux release 7.4.1708 (Core)
If you run Linux in a VM on Xen, you need to install a kernel inside the VM, boot it with pygrub (see https://wiki.debian.org/PyGrub), and update to docker version 19.03.0.
Install pygrub
1. In your VM execute:
mkdir /boot/grub
apt-get install -y linux-image-amd64
cat > /boot/grub/menu.lst << EOF
default 0
timeout 2
title Debian GNU/Linux
root (hd0,0)
kernel /vmlinuz root=/dev/xvda2 ro
initrd /initrd.img
title Debian GNU/Linux (recovery mode)
root (hd0,0)
kernel /vmlinuz root=/dev/xvda2 ro single
initrd /initrd.img
EOF
2. Halt your VM, for example:
xl destroy vm01
3. Edit your Xen config.
For example, for your VM edit /etc/xen/vm01.cfg in your dom0 (comment out the first two lines and add the last three):
#kernel = '/boot/vmlinuz-4.9.0-9-amd64'
#ramdisk = '/boot/initrd.img-4.9.0-9-amd64'
extra = 'elevator=noop'
bootloader = '/usr/lib/xen-4.8/bin/pygrub'
bootloader_args = [ '--kernel=/vmlinuz', '--ramdisk=/initrd.img', ]
4. Start your VM:
xl create /etc/xen/vm01.cfg
I have the same problem in a Debian 9 VM and in a Debian 8 VM, both on the same Debian Xen 4.8 host.
The loopback device seems not to exist:
# losetup -f
losetup: cannot find an unused loop device: No such device
You can create those with
#!/bin/bash
ensure_loop(){
num="$1"
dev="/dev/loop$num"
if test -b "$dev"; then
echo "$dev is a usable loop device."
return 0
fi
echo "Attempting to create $dev for docker ..."
if ! mknod -m660 $dev b 7 $num; then
echo "Failed to create $dev!" 1>&2
return 3
fi
return 0
}
ensure_loop 0
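With the device node created, losetup should now report a free loop device instead of erroring out (expected output shown):
# losetup -f
/dev/loop0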
But this is just a tip towards the right solution; it didn't solve the problem completely. Now that /dev/loop0 exists, I get the error:
Error opening loopback device: open /dev/loop0: no such device or address
[graphdriver] prior storage driver devicemapper failed: loopback attach failed
Update:
I installed docker-ce, docker-ce-cli and containerd.io with apt-get, as described in the latest docs, and am now on the latest version:
$ docker --version
Docker version 19.03.0, build aeac9490dc
still the same issue:
failed to start daemon: error initializing graphdriver: loopback attach failed
This is the full log:
level=info msg="Starting up"
level=warning msg="failed to rename /var/lib/docker/tmp for background deletion: rename /var/lib/docker/tmp
/var/lib/docker/tmp-old: file exists. Deleting synchronously"
level=info msg="parsed scheme: \"unix\"" module=grpc
level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}
] }" module=grpc
level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0005e8660, CONNECTING" module=grpc
level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0005e8660, READY" module=grpc
level=info msg="parsed scheme: \"unix\"" module=grpc
level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}
] }" module=grpc
level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0007f5b10, CONNECTING" module=grpc
level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0007f5b10, READY" module=grpc
level=error msg="There are no more loopback devices available."
level=error msg="[graphdriver] prior storage driver devicemapper failed: loopback attach failed"
failed to start daemon: error initializing graphdriver: loopback attach failed
Update 2:
In the end I found out that pygrub was missing for the VM, which seems to be a new dependency since some version.
This answer was a dead end; I added another answer, but I leave this one here to give hints to users who have a different problem.
I have met this issue too, and I resolved it!
In my VMware Workstation, the VM had TWO virtual network interfaces.
I removed one of the virtual network interfaces and kept only one.
After starting VMware Workstation and the docker service again, it works successfully!
I installed docker on CentOS 7.6 (1810), but when I start docker:
# systemctl start docker
docker fails to start, and
# journalctl -xe
shows messages like "start daemon: error initializing graphdriver: loopback attach failed".
