IBM Cloud Private node appears to be running but services are unresponsive - docker

One of my ICP nodes appears to be running, but the services on that node are unresponsive and will at times return a 504 Gateway Timeout.
When I SSH into the unresponsive node and run journalctl -u kubelet -f I am seeing error messages such as transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused
Furthermore, when I run top I'm seeing dockerd using an unusually high percentage of my CPU.
What is causing this behavior and how can I return my node to its normal working condition?

These errors might be due to a known issue with Docker where an old containerd reference is used even after the containerd daemon was restarted. This defect causes the Docker daemon to go into an internal error loop that uses a high amount of CPU resources and logs a high number of errors. For more information about this error, please see the Refresh containerd remotes on containerd restarted pull request against the Moby project.
To work around this issue, use the host operating system command to restart the docker service on the node. After some time, the services should resume.
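As a rough sketch of that workaround, assuming a systemd-based host where the service is named docker: the helper below counts kubelet log lines that report the refused docker-containerd socket connection, so the count can be compared before and after the restart. The function name count_containerd_sock_errors is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): read log text on stdin
# and print how many lines report the refused docker-containerd socket.
count_containerd_sock_errors() {
    # -F treats the pattern as a fixed string; "|| true" keeps the exit
    # status at 0 when grep finds no matching lines (it still prints 0).
    grep -cF 'docker-containerd.sock: connect: connection refused' || true
}

# Example usage on the affected node (left commented so that sourcing this
# file has no side effects):
#   journalctl -u kubelet --since "10 minutes ago" --no-pager | count_containerd_sock_errors
#   sudo systemctl restart docker
#   systemctl is-active docker
```

If the count keeps growing after the restart, the node is probably affected by something other than the issue described above.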

Related

Hitting docker rate limit without pulling at all

I have a computer that is running docker. Now I get the error toomanyrequests when I try to pull an image. The twist is, I get this error if Docker is just running and I do not pull anything. So by waiting I never get to pull anything, except if I change my IP. If I get a fresh IP, I can pull without a problem. But after a few hours, I cannot pull anymore from the IP that the computer that is running Docker is using.
To my knowledge, I do not have any other software running that should provoke a pull. Is there anything in Docker itself that contacts Docker Hub and causes the rate limit to kick in? I just have 3 simple services running in Docker: a web proxy, a database and Keycloak. This is on a VM running Ubuntu 22.04.
There are no other machines on my network that are running Docker. If I start other machines and start Docker there, this problem does not occur. For example, I can start Docker Desktop on another machine, pull lots of stuff and leave it running, and I do not get toomanyrequests.
Can anyone explain what is causing this? How can I fix it?

Unable to make Docker container use OpenConnect VPN connection

I have a VM running Ubuntu 16.04, on which I want to deploy an application packaged as a Docker container. The application needs to be able to perform an HTTP request towards a server under VPN (e.g. server1.vpn-remote.com)
I successfully configured the host VM to connect to the VPN through openconnect, and I can turn this connection on and off using a systemd service.
Unfortunately, when I run docker run mycontainer, neither the host nor the container is able to reach server1.vpn-remote.com. Weirdly enough, no error is displayed in the VPN connection service logs, which show only the openconnect messages confirming a successful connection.
If I restart the VPN connection after starting mycontainer, the host machine is able to access server1.vpn-remote.com, but not the container. Moreover, if I issue any command like docker run/start/stop/restart on mycontainer or any other container, the connection gets broken again even for the host machine.
NOTE: I already checked on the ip routes and there seems to be no conflict between Docker and VPN subnets.
NOTE: running the container with --net="host" results in both host and container being able to access the VPN but I would like to avoid this option as I will eventually make a docker compose deployment which requires all containers to run in bridge mode.
Thanks in advance for your help
EDIT: I figured out it is a DNS issue, as I'm able to ping the IP corresponding to server1.vpn-remote.com even after the VPN connection seemed to be failing. I'm going through documentation regarding DNS management with Docker and Docker Compose and their usage of the host's /etc/resolv.conf file.
I hope you don't still need help six months later! Some of the details are different, but this sounds a bit like a problem I had. In my case the solution was a bit disappointing: after you've connected to your VPN, restart the docker daemon:
sudo systemctl restart docker
I'm making some inferences here, but it seems that, when the daemon starts, it makes some decisions/configs based on the state of the network at that time. In my case, the daemon starts when I boot up. Unsurprisingly, when I boot up, I haven't had a chance to connect to the VPN yet. As a result, my container traffic, including DNS lookups, goes through my network directly.
Hat tip to this answer for guiding me down the correct path.
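One way to check whether the DNS settings actually differ is to compare the nameservers the host uses with the ones a container sees, before and after restarting the daemon. This is only a sketch: list_nameservers is a hypothetical helper, and the busybox image is an assumption (any image that ships cat would do).

```shell
#!/bin/sh
# Hypothetical helper: read resolv.conf-style text on stdin and print the
# address from each "nameserver" line, one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Example usage (left commented so that sourcing this file has no side effects):
#   list_nameservers < /etc/resolv.conf                                  # host view
#   docker run --rm busybox cat /etc/resolv.conf | list_nameservers      # container view
#   sudo systemctl restart docker                                        # after the VPN is up
```

If the container still lists nameservers that cannot resolve the VPN hostname after the restart, the cause is likely something other than stale daemon state.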

Why does dockerd on a node get bad?

After a few days of running dockerd on a kubernetes host, where pods are scheduled by kubelet, dockerd goes bad - consuming a lot of resources (50% memory - ~4gigs).
When it gets to this state, it is unable to act on commands for containers that appear to be running according to $ docker ps. Also, checking ps -ef on the host shows that these containers don't map to any underlying host processes.
$ docker exec yields -
level=error msg="Error running exec in container: rpc error: code = 2 desc = containerd: container not found"
Cannot kill container 6a8d4....8: rpc error: code = 14 desc = grpc: the connection is unavailable"
level=fatal msg="open /var/run/docker/libcontainerd/containerd/7657...4/65...6/process.json: no such file or directory"
Looking through the process tree on the host there seem to be a lot of defunct processes which point to dockerd as the parent id. Any pointers on what the issue might be or where to look further?
I have enabled debug logging on dockerd to see whether the issue recurs. Restarting dockerd fixes the issue.
Sounds like you have a container misbehaving and docker is not able to reap it. I would take a look at what has been scheduled on the nodes where you see the problem. The error you are seeing seems like the docker daemon not responding to API requests issued by the docker CLI. Some pointers:
Has the container exited successfully or with an error?
Did the containers get killed for some reason?
Check the kubelet logs
Check the kube-scheduler logs?
Follow the logs of the containers on your node: docker logs -f <containerid>
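To see which processes dockerd has failed to reap, the process table can be filtered for zombies whose parent is dockerd. The sketch below assumes the procps ps found on most Linux hosts; defunct_children_of is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -eo pid,ppid,stat,comm` output on stdin and
# print the PID of every zombie (STAT starting with Z) whose parent PID
# equals the first argument.
defunct_children_of() {
    awk -v ppid="$1" 'NR > 1 && $2 == ppid && $3 ~ /^Z/ { print $1 }'
}

# Example usage (left commented so that sourcing this file has no side effects):
#   ps -eo pid,ppid,stat,comm | defunct_children_of "$(pgrep -o -x dockerd)"
```

A steadily growing list between runs would support the idea that containers are exiting without being reaped.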

how to do the ibmmq replication in docker swarm or kubernetes?

I am running the MQ container on Docker, following the link, and the container status is up, but I am unable to reach the web UI. The logs show:
2018-09-17T20:19:59.364Z AMQ9207E: The data received from host '10.10.10.10' on channel '????' is not valid.
2018-09-17T20:19:59.364Z AMQ9492E: The TCP/IP responder program encountered an error.
Could anybody suggest how to run an IIB/MQ cluster using Docker and Kubernetes in order to achieve auto scaling and high availability?

Kubernetes pod random timeout

I have a Kubernetes deployment containing a very simple Spring Boot web application. I am experiencing random timeouts trying to connect to this application externally.
Some requests return instantly whereas others hang for minutes.
I am unable to see any issues in the logs.
When connecting to the pod directly, I am able to curl the application and get a response immediately so it feels more like a networking issue.
I also have other applications with the identical configuration running in the same cluster which are experiencing no problems.
I am still quite new to Kubernetes so my question would be:
Where and how should I go about diagnosing network issues?
Can provide more information if it helps.
You have narrowed the issue down to networking, which suggests that cluster components such as kubelet and kube-proxy are healthy.
You can check their status with the systemctl utility. For example:
systemctl status kubelet
systemctl status kube-proxy
You can get more detail with the journalctl utility. For example:
journalctl -xeu kubelet
journalctl -f -u docker
If you want to know what happens to the packets, use the iptables utility. Its rules decide the forwarding, routing, and verdict of packets (incoming or outgoing).
My plan of action is: do not make any assumptions. I use the following utilities to clear up any doubts:
kubectl
kubectl describe pod <podName> or kubectl describe svc <svcName>
systemctl
journalctl
etcdctl
curl
iptables
If I still could not solve the issue it means I have made an assumption.
Please let me know of any other tools; I would love to add them to my utility set.
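Since curl is on the list, one low-effort way to put numbers on "random timeouts" is to time a batch of requests and count how many exceed a threshold. This is only a sketch: count_slow is a hypothetical helper, and the URL in the usage comment is a placeholder for the externally exposed address of the service.

```shell
#!/bin/sh
# Hypothetical helper: read one request duration (in seconds) per line on
# stdin and print how many exceed the limit given as the first argument.
count_slow() {
    awk -v limit="$1" '$1 + 0 > limit + 0 { n++ } END { print n + 0 }'
}

# Example usage (left commented so that sourcing this file has no side effects).
# curl's -w '%{time_total}' prints the total time of each request:
#   for i in $(seq 1 50); do
#       curl -s -o /dev/null --max-time 30 -w '%{time_total}\n' http://my-service.example/
#   done | count_slow 1
```

Running the same loop against the pod IP, the service's cluster IP, and the external address can help show at which hop the slow requests start to appear.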
