Docker container exits immediately

I'm having trouble with Docker EE. Last week I successfully started three containers I had built; now I need to run a simple Node.js container. I ran docker run -d node, but it exits immediately and I get the following error:
time="2020-11-16T11:25:05+01:00" level=error msg="Error waiting for container: failed to shutdown container: container 8a5e6905d6432a9e0ab4dc46b50654e6afe4a6f297dd478d4b07b0dd69e00009 encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110): subsequent terminate failed container 8a5e6905d6432a9e0ab4dc46b50654e6afe4a6f297dd478d4b07b0dd69e00009 encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110)"
I'm running Windows Server 2019 Standard. Where should I start looking?
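A couple of things may be worth trying first (the container name below is arbitrary). One common cause that is unrelated to the hcsshim error itself is that the node image's default command is the interactive Node.js REPL, which exits immediately when started detached with no TTY attached:

```shell
# Run the image in the foreground with a TTY; the node image's
# default command is an interactive REPL that exits at once
# when it is started detached without one
docker run -it --name node-test node

# If it still exits, check the recorded exit code and any output
docker inspect --format '{{.State.ExitCode}}' node-test
docker logs node-test
```

On Windows, the Docker daemon log (available through the Event Viewer) may also show more context around the hcsshim failure than the CLI error alone.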

Related

podman wsl communication issue

I am currently switching from Docker to Podman. Usually that works just fine. However, on one of my many company laptops I ran into the following error:
PS C:\WINDOWS\system32> podman pull quay.io/podman/hello
Trying to pull quay.io/podman/hello:latest...
Error: initializing source docker://quay.io/podman/hello:latest: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp 54.163.152.191:443: i/o timeout
I get the same error with other container registries. I have tried:
podman machine set --rootful
removing Hyper-V and WSL
changing resolv.conf and adding a nameserver (also tried 8.8.8.8)
looking into the Symantec Endpoint Protection logs (the connection is not blocked)
switching between WSL 1 and WSL 2
some suggestions from this thread (cf. No internet connection on WSL Ubuntu (Windows Subsystem for Linux))
I also do not get any internet access inside, for example, an Ubuntu WSL VM. In PowerShell, running e.g. curl google.com works just fine.
For completeness' sake: with the resolv.conf changes from the third item applied, I get:
podman pull quay.io/podman/hello
Trying to pull quay.io/podman/hello:latest...
Error: initializing source docker://quay.io/podman/hello:latest: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io: Temporary failure in name resolution
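Since name resolution fails both for podman and inside a plain Ubuntu WSL distribution, one common fix is to stop WSL from regenerating /etc/resolv.conf and pin a resolver yourself. A sketch, assuming 8.8.8.8 is reachable from the network (a corporate environment may require the company's DNS server instead):

```shell
# Inside the WSL distribution: tell WSL not to overwrite /etc/resolv.conf
sudo tee /etc/wsl.conf > /dev/null <<'EOF'
[network]
generateResolvConf = false
EOF

# Replace the generated resolv.conf with a fixed nameserver
sudo rm -f /etc/resolv.conf
echo 'nameserver 8.8.8.8' | sudo tee /etc/resolv.conf

# Then, from PowerShell, restart WSL so the settings take effect:
#   wsl --shutdown
```

If this works inside WSL but podman still times out, the remaining suspect is usually the Windows-side firewall or endpoint protection rewriting traffic from the WSL network adapter.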
Update:
I reinstalled Docker and get a similar issue:
docker container run hello-world
Unable to find image 'hello-world:latest' locally
docker: initializing source docker://hello-world:latest: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io: Temporary failure in name resolution.
See 'docker run --help'.

Why does dockerd on a node go bad?

After a few days of running dockerd on a Kubernetes host, where pods are scheduled by kubelet, dockerd goes bad, consuming a lot of resources (50% of memory, roughly 4 GB).
When it gets into this state, it is unable to act on commands for containers that appear to be running according to docker ps. Checking ps -ef on the host, these containers don't map to any underlying host processes.
Running docker exec yields errors such as:
level=error msg="Error running exec in container: rpc error: code = 2 desc = containerd: container not found"
Cannot kill container 6a8d4....8: rpc error: code = 14 desc = grpc: the connection is unavailable"
level=fatal msg="open /var/run/docker/libcontainerd/containerd/7657...4/65...6/process.json: no such file or directory"
Looking through the process tree on the host, there seem to be a lot of defunct processes whose parent is dockerd. Any pointers on what the issue might be or where to look further?
I have enabled debug on dockerd to see whether the issue re-occurs; a dockerd restart fixes it.
Sounds like you have a misbehaving container that Docker is unable to reap. I would take a look at what has been scheduled on the nodes where you see the problem. The errors you are seeing suggest the Docker daemon is not responding to API requests issued by the Docker CLI. Some pointers:
Has the container exited successfully or with an error?
Did the containers get killed for some reason?
Check the kubelet logs.
Check the kube-scheduler logs.
Follow the logs of the containers on your node: docker logs -f <containerid>
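The pointers above can be collected into a quick triage pass on an affected node (container IDs are placeholders):

```shell
# Kubelet's recent view of the node
journalctl -u kubelet --since "1 hour ago" | tail -n 50

# Docker's view: containers the daemon still believes are running
docker ps --format '{{.ID}} {{.Status}} {{.Names}}'

# Defunct (zombie) processes and their parents; if many of these
# list dockerd's PID as the PPID, the daemon is failing to reap
# its exited children
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# Follow the logs of a suspect container:
#   docker logs -f <containerid>
```

Comparing the container list from docker ps against the actual host process table is the quickest way to confirm the daemon's state has diverged from reality.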

Docker Swarm node fails to join after reboot. State 'pending'

I've been able to successfully set up a Docker Swarm cluster with one manager and two worker nodes by running swarm init and swarm join respectively. docker node ls then shows that all nodes are active. However, if one of the worker nodes restarts, it is not able to join back in: running docker node ls on the manager now shows the newly restarted node in state pending. I've enabled debugging, and running systemctl status docker-latest -l on the failing worker node shows lots of these:
level=error msg="agent: session failed" error="rpc error: code = 13 desc = connection error: desc = \"transport: tls: oversized record received with length 20527\"" module="node/agent"
OS: Red Hat Enterprise Linux Server release 7.5
Docker version 1.13.1, build 8633870/1.13.1 (installed from the docker-latest repository package). I also tried the regular docker package, with no difference.
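A "tls: oversized record received" error can indicate that the worker is receiving something other than the manager's TLS traffic on the swarm port, for example because a proxy or other middlebox is intercepting the connection, so it is worth confirming direct connectivity to the manager on port 2377 first. Beyond that, forcing the worker to re-join with a fresh token is a common remediation sketch (the manager address and token below are placeholders):

```shell
# On the failing worker: leave the swarm entirely
docker swarm leave --force

# On the manager: print a fresh worker join command
docker swarm join-token worker

# On the worker: re-join using the command printed above, e.g.
#   docker swarm join --token SWMTKN-1-<token> <manager-ip>:2377
```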

IBM Cloud Private node appears to be running but services are unresponsive

One of my ICP nodes appears to be running, but the services on that node are unresponsive and at times return a 504 Gateway Timeout.
When I SSH into the unresponsive node and run journalctl -u kubelet -f I am seeing error messages such as transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused
Furthermore, when I run top, I see dockerd using an unusually high percentage of my CPU.
What is causing this behavior and how can I return my node to its normal working condition?
These errors might be due to a known issue in Docker where a stale containerd reference is used even after the containerd daemon has been restarted. This defect causes the Docker daemon to enter an internal error loop that consumes a high amount of CPU and logs a large number of errors. For more information, see the "Refresh containerd remotes on containerd restarted" pull request against the Moby project.
To work around this issue, use the host operating system's service manager to restart the docker service on the node. After some time, the services should resume.
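On a systemd-based node, the workaround looks like the following sketch (the service name may differ by distribution):

```shell
# Restart the docker service so it re-establishes its containerd connection
sudo systemctl restart docker

# Confirm the containerd socket is back and the daemon responds
ls -l /var/run/docker/containerd/docker-containerd.sock
docker info > /dev/null && echo "daemon responding"
```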

DevOps: automatically restarting a failed container

What is a lightweight approach to restarting a failed Docker container automatically, that is, without having to install and set up tools like Swarm or Kubernetes?
I am asking because I need some resilience for a running container in case it stops as a result of a failure of the process it is running.
Check first whether you can add a restart policy to your docker run command.
Restart policies are the built-in Docker mechanism for restarting containers when they exit.
If set, they are also applied when the Docker daemon starts up, as typically happens after a system boot.
For instance:
on-failure[:max-retries]
Restart only if the container exits with a non-zero exit status.
Optionally, limit the number of restart retries the Docker daemon attempts.
If not, see "Automatically start containers".
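As a sketch, running a container with an on-failure policy and then verifying it (the image name, container name, and retry count are arbitrary):

```shell
# Restart the container automatically, up to 5 times, but only
# when its process exits with a non-zero status
docker run -d --name my-app --restart on-failure:5 my-image

# Inspect the configured policy and how many restarts have occurred
docker inspect --format \
  '{{.HostConfig.RestartPolicy.Name}}:{{.HostConfig.RestartPolicy.MaximumRetryCount}} restarts={{.RestartCount}}' \
  my-app
```

The other built-in policies are always and unless-stopped; docker update --restart can change the policy on an existing container without recreating it.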
