Kubernetes container creation with Flannel gets stuck in "ContainerCreating" state

Context
I installed Docker following this instruction on my Ubuntu 18.04 LTS (Server) and later installed Kubernetes via kubeadm. After initializing (kubeadm init --pod-network-cidr=10.10.10.10/24) and joining a second node (a two-node cluster to start with), I cannot get coredns or the later applied Web UI (Dashboard) to actually go into status Running.
As pod network I tried both Flannel (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml) and Weave Net, but nothing changed. The pods still show status ContainerCreating, even after hours of waiting.
Question
Why doesn't the container creation work as expected and what might be the root cause for this? And most importantly: How do I solve this?
Edit
Summing up my answer below, here are the reasons why:
Docker used the cgroupfs cgroup driver instead of systemd
I did not configure iptables correctly
I ran kubeadm init with the wrong CIDR, since Flannel's standard YAML requires --pod-network-cidr to be 10.244.0.0/16

Since answering this question took me a lot of time, I wanted to share what got me out of this. There might be some more code than necessary, but I also want this to be in one place if I or someone else has to redo all steps.
First it all started with Docker...
I figured out that it presumably all started with the way I installed Docker. Following the linked online instructions, I installed Docker with sudo apt-get install docker.io and added my user to the docker group with sudo usermod -aG docker $USER. That left Docker on its default cgroupfs cgroup driver.
Well, taking a look at the official instructions from Kubernetes, this was a mistake: the systemd cgroup driver is the recommended way to go!
So I completely purged everything I ever did with Docker by following these great instructions from Mayur Bhandare:
sudo apt-get purge -y docker-engine docker docker.io docker-ce
sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce
sudo rm -rf /var/lib/docker /etc/docker
sudo rm /etc/apparmor.d/docker
sudo groupdel docker
sudo rm -rf /var/run/docker.sock
# Reboot to be sure
Afterwards I reinstalled Docker the official way (keep in mind that this might change in the future):
# Install Docker CE
## Set up the repository:
### Install packages to allow apt to use a repository over HTTPS
apt-get update && apt-get install -y \
apt-transport-https ca-certificates curl software-properties-common gnupg2
### Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
### Add Docker apt repository.
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
## Install Docker CE.
apt-get update && apt-get install -y \
containerd.io=1.2.10-3 \
docker-ce=5:19.03.4~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.4~3-0~ubuntu-$(lsb_release -cs)
# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart docker.
systemctl daemon-reload
systemctl restart docker
Note that this explicitly uses systemd!
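To double-check that the setting really took effect, something like the following could be used. This is only a sketch: the function names are made up for illustration, and the first function simply reads the driver back from a daemon.json laid out like the one above. The second asks the running daemon via docker info, if the docker CLI is available.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of Docker.

# Print the cgroup driver configured via exec-opts in a daemon.json file,
# or nothing if none is set. Defaults to the path used above.
configured_cgroup_driver() {
  local file="${1:-/etc/docker/daemon.json}"
  sed -n 's/.*native\.cgroupdriver=\([a-z]*\).*/\1/p' "$file" | head -n 1
}

# Ask the running daemon which cgroup driver it actually uses.
# Does nothing (and returns non-zero) if the docker CLI is not installed.
running_cgroup_driver() {
  command -v docker >/dev/null 2>&1 && docker info --format '{{.CgroupDriver}}'
}

# Usage:
#   configured_cgroup_driver            # what the config file asks for
#   running_cgroup_driver               # what the daemon reports
```

If the two disagree, the daemon most likely has not been restarted since the file was written.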
... and then it went on with Flannel...
Above I wrote my sudo kubeadm init was done with --pod-network-cidr=10.10.10.10/24 since the latter was the IP of my master.
Well, as pointed out here, not using the officially recommended --pod-network-cidr=10.244.0.0/16 results in errors, for example when using kubectl proxy or during container creation, if you apply the provided manifest with kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml.
This is because 10.244.0.0/16 is hard-coded in the .yaml and is therefore mandatory, unless you change it in the .yaml yourself.
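A rough way to catch this mismatch before running kubeadm init is to read the network out of a local copy of the manifest and compare it with the CIDR you plan to pass. The sketch below assumes the manifest has been downloaded to a local file and contains a net-conf.json entry of the form "Network": "10.244.0.0/16". The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the local file path are illustrative.

# Print the first "Network" value found in the manifest's net-conf.json.
flannel_network() {
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Return 0 if the manifest's network equals the given CIDR, 1 otherwise.
cidr_matches_manifest() {
  local manifest="$1" cidr="$2"
  [ "$(flannel_network "$manifest")" = "$cidr" ]
}

# Usage (assumes the manifest was saved as ./kube-flannel.yml):
#   cidr_matches_manifest kube-flannel.yml 10.244.0.0/16 || echo "CIDR mismatch"
```

With the CIDR from my first attempt (10.10.10.10/24) the check fails, which is exactly the misconfiguration described above.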
In order to get rid of the false configuration I did a full reset.
This can be achieved using sudo kubeadm reset and by deleting the config with sudo rm -r ~/.kube/config.
Anyhow, since I had messed things up so badly, I did a full reset by uninstalling and reinstalling kubeadm and making sure it used iptables this time (which I had also forgotten to do before...).
Here is a nice link on how to fully uninstall all kubeadm parts.
sudo kubeadm reset
# quote the glob so the shell does not expand it against local files
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo apt-get autoremove
sudo rm -rf ~/.kube
For the sake of completeness, here is the reinstall as well:
# ensure legacy binaries are installed
sudo apt-get install -y iptables arptables ebtables
# switch to legacy versions
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
# Install Kubernetes with kubeadm
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
#reboot
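To see which backend is active after the switch, the version string can be inspected. This is a sketch under the assumption that iptables 1.8 or newer is installed, which prints the backend in parentheses (for example "iptables v1.8.4 (legacy)"). Older versions print no backend, and the sketch then reports unknown. The function name is made up.

```shell
#!/usr/bin/env bash
# Sketch only: classify an `iptables --version` string.

# Print "legacy", "nf_tables" or "unknown" for the given version string.
iptables_backend() {
  case "$1" in
    *"(legacy)"*)    echo legacy ;;
    *"(nf_tables)"*) echo nf_tables ;;
    *)               echo unknown ;;
  esac
}

# Usage:
#   iptables_backend "$(iptables --version)"
```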
... and finally it worked!
After the clean reinstallation I did the following:
# Initialize with correct cidr
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
And then be astounded by the result:
kubectl get pods --all-namespaces
On a side note: This also resolved the /run/flannel/subnet.env: no such file or directory error I encountered prior to these steps when describing the uncreated coredns pod.

So I had the same issue as stated above. The answer above was the perfect fix for me, but other pods were also stuck on either Pending or ContainerCreating.
In addition to the fix above, my Flannel deployment had hit an error I had not noticed, so I needed to rerun the Flannel apply:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Related

Is it possible to nest docker/podman containers

I run Fedora 35 and need to run an app in Docker on Ubuntu.
I was able to get and run Ubuntu via podman
podman pull ubuntu:20.04
and set up Docker there, but I can't make it run, probably because I didn't enter podman properly. I used:
podman run -it ubuntu:20.04
where I ran:
su -
apt update; apt upgrade
apt install inetutils-ping nano sudo npm
apt install apt-transport-https ca-certificates curl gnupg
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install docker-ce docker-ce-cli containerd.io
Starting Docker via systemctl is not possible in the container, and the dockerd command gives many errors, mostly that it can't access overlay, and probably the network (iptables):
ERRO[2022-05-07T23:14:18.803335993+02:00] failed to mount overlay: operation not permitted storage-driver=overlay2
ERRO[2022-05-07T23:14:18.803397023+02:00] exec: "fuse-overlayfs": executable file not found in $PATH storage-driver=fuse-overlayfs
ERRO[2022-05-07T23:14:18.803500924+02:00] AUFS was not found in /proc/filesystems storage-driver=aufs
ERRO[2022-05-07T23:14:18.803887884+02:00] failed to mount overlay: operation not permitted storage-driver=overlay
Is it possible at all to run an app with a service whose port is open to the outside of Docker and podman, given that there are 2 layers of nested containers?
It is not possible to use the default overlay storage driver inside another container; you need to change the storage driver to vfs. Maybe https://docs.docker.com/storage/storagedriver/vfs-driver/ helps.
Disclaimer: This definitely works when running podman in Docker, but I have not tested the other way around.
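As a sketch of what switching to vfs could look like inside the nested Ubuntu container: the storage driver is selected in daemon.json, and dockerd is then started by hand because systemctl is not available there. The function name is made up, /etc/docker is assumed to be the config directory, and the function overwrites any existing daemon.json. The iptables errors from the question are a separate matter that this does not address.

```shell
#!/usr/bin/env bash
# Sketch only: write a daemon.json that selects the vfs storage driver.
# Overwrites any existing daemon.json in the target directory.

# $1 is the config directory (assumed to be /etc/docker in the container).
write_vfs_daemon_json() {
  local dir="${1:-/etc/docker}"
  mkdir -p "$dir"
  cat > "$dir/daemon.json" <<'EOF'
{
  "storage-driver": "vfs"
}
EOF
}

# Usage, as root inside the container:
#   write_vfs_daemon_json
#   dockerd &
```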

Orchestrating TFX Pipelines with Kubeflow locally

Hey, I am working on a package which generates TFX pipelines for training GPT-2 (see https://github.com/steven-mi/tfx-gpt2).
I was wondering how I can deploy my pipeline to Kubeflow locally. Is there an in-depth guide for doing so?
I was working on this a couple of months ago but got pulled off with other stuff. I was using the recipe below (not quite a script) to get KFP, TFX, and JupyterLab running on a Google Cloud VM, and IIRC I was able to deploy the TFX pipeline and run it. I'm using microk8s for the Kubernetes cluster. So work in progress, but for what it's worth here it is, maybe it will help:
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo groupadd docker
sudo usermod -aG docker ${USER}
# K8s 1.14 is currently recommended for KFP
sudo snap install microk8s --channel=1.14 --classic
sudo snap alias microk8s.kubectl kubectl
sudo usermod -a -G microk8s $USER
(exit and log back in)
docker run -d -p 5000:5000 --restart=always --name registry registry:2
microk8s.enable dns dashboard storage
microk8s.enable kubeflow
export PIPELINE_VERSION=0.2.5
kubectl apply -k github.com/kubeflow/pipelines/manifests/kustomize/base/crds?ref=$PIPELINE_VERSION
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION
sudo apt-get install python3-pip
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 1
sudo update-alternatives --set python /usr/bin/python3.6
sudo update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
sudo update-alternatives --set pip /usr/bin/pip3
pip install --upgrade pip
export PATH=$PATH:~/.local/bin
pip install notebook
pip install jupyterlab
<Make public IP address static>
jupyter notebook --generate-config
Set a password (Optional):
python
from notebook.auth import passwd; passwd()
(remember the password, and save the generated password)
vi ~/.jupyter/jupyter_notebook_config.py
Enable:
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 3389 # for Pantheon (normally 8888)
c.NotebookApp.password = 'sha:generated password above'
pip install --no-cache-dir --upgrade tfx
git clone https://github.com/tensorflow/tfx.git
mkdir AIHub
cp tfx/docs/tutorials/tfx/template.ipynb AIHub
cd AIHub
(wait about 5-15 minutes)
kubectl describe configmap inverse-proxy-config -n kubeflow | grep googleusercontent.com
jupyter lab &

Installing Docker in Ubuntu on Windows 10 : Failed to Setup IP tables: Unable to enable NAT rule

I am trying to install Docker in Ubuntu on Windows 10 using the script below, but when I try to run Docker as a service (service docker start), Docker does not start and I find an error in docker.log. I used the same installation instructions on a plain Ubuntu machine and had no problem running Docker.
failed to start daemon: Error initializing network controller: Error creating default "bridge" network: Failed to Setup IP tables: Unable to enable NAT rule: (iptables failed: iptables --wait -t nat -I POSTROUTING -s 172.18.0.0/16 ! -o docker0 -j MASQUERADE: iptables: Invalid argument. Run `dmesg' for more information.
(exit status 1))
Installation script
# Update the apt package list.
sudo apt-get update -y
# Install Docker's package dependencies.
sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
# Download and add Docker's official public PGP key.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Verify the fingerprint.
sudo apt-key fingerprint 0EBFCD88
# Add the `stable` channel's Docker upstream repository.
#
# If you want to live on the edge, you can change "stable" below to "test" or
# "nightly". I highly recommend sticking with stable!
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
# Update the apt package list (for the new apt repo).
sudo apt-get update -y
# Install the latest version of Docker CE.
sudo apt-get install -y docker-ce
# Allow your user to access the Docker CLI without needing root access.
sudo usermod -aG docker $USER
I encountered the same problem and here is what I found out.
It currently isn't possible to run the Docker daemon in WSL. The workaround is:
Update the apt package with:
sudo apt-get update
Install packages to allow apt to use a repository over HTTPS with:
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
Add docker's GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Set up a stable repository with:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
Update the apt package again:
sudo apt-get update
Install Docker CE:
sudo apt-get install docker-ce
Then add this line, which tells the Docker client which host to talk to:
echo "export DOCKER_HOST=localhost:2375" >> ~/.bash_profile
Restart your vscode
Install docker desktop and go to your settings and check the "Expose daemon tcp://localhost:2375 without TLS".
With this, I was able to run docker in WSL(ubuntu). Hope it helps.
credit: https://medium.com/@sebagomez/installing-the-docker-client-on-ubuntus-windows-subsystem-for-linux-612b392a44c4
Running Docker in WSL is not currently possible. You will have to install Docker Desktop in Windows. Then you can install the Docker CLI in WSL and use docker from there.
If you have enabled the WSL 2 preview feature, you can install Docker Desktop in WSL 2 mode, which will give much better performance.

Can't connect to docker from WSL

I have checked the option Expose the daemon on tcp... in Docker on Windows, and am now trying to connect from WSL. I have run all of these commands:
# Update the apt package list.
sudo apt-get update -y
# Install Docker's package dependencies.
sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
# Download and add Docker's official public PGP key.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Verify the fingerprint.
sudo apt-key fingerprint 0EBFCD88
# Add the `stable` channel's Docker upstream repository.
#
# If you want to live on the edge, you can change "stable" below to "test" or
# "nightly". I highly recommend sticking with stable!
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
# Update the apt package list (for the new apt repo).
sudo apt-get update -y
# Install the latest version of Docker CE.
sudo apt-get install -y docker-ce
# Allow your user to access the Docker CLI without needing root access.
sudo usermod -aG docker $USER
echo "export DOCKER_HOST=tcp://localhost:2375" >> ~/.bashrc && source ~/.bashrc
However, docker info only gives me:
Client:
Debug Mode: false
Server:
ERROR: Cannot connect to the Docker daemon at tcp://localhost:2375. Is the docker daemon running?
errors pretty printing info
What might be wrong? I have been trying this all day. I am running WSL and Ubuntu 18.04, not WSL 2, as the update that brings WSL 2 doesn't seem to be available yet without an insider build.
If you run env | sort, do you see the DOCKER_HOST=tcp://localhost:2375 variable? If not, you may need to run source ~/.bashrc to load the new environment variable into the current console.
Alternatively, start a new terminal instance.
Also, and this may seem like a daft question, but I have to ask: is Docker running correctly? If you run docker info from PowerShell, does it return the status of the Docker server?
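The two checks above (is the variable set, and is anything actually listening) can be roughed out from inside WSL as follows. This is a sketch with made-up function names. It relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative.

# Split a DOCKER_HOST value such as tcp://localhost:2375 into "host port".
split_docker_host() {
  local value="${1#tcp://}"        # drop the scheme if present
  echo "${value%%:*} ${value##*:}"
}

# Return 0 if something accepts TCP connections on host $1, port $2.
# Runs in a subshell so the test file descriptor is closed again afterwards.
port_is_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Usage:
#   read -r host port <<< "$(split_docker_host "$DOCKER_HOST")"
#   port_is_open "$host" "$port" || echo "nothing is listening on $host:$port"
```

If the port is closed even though DOCKER_HOST is set, the problem is on the Windows side (Docker not running, or the expose option not applied) rather than in WSL.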

How to install docker on linode

I have a KVM Linode with Ubuntu 16.04.
I am trying to install Docker and the following command fails:
sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
with error:
E: Unable to locate package linux-image-extra-4.8.6-x86_64-linode78
E: Couldn't find any package by glob 'linux-image-extra-4.8.6-x86_64-linode78'
E: Couldn't find any package by regex 'linux-image-extra-4.8.6-x86_64-linode78'
Any idea how to fix it and finish the installation?
I have also tried Linode's official documentation, but after executing curl -sSL https://get.docker.com/ | sh all activity stops after the message Setting up docker-engine (1.12.5-0~ubuntu-xenial) ...
No more errors, no more messages.
The last time I looked at this you had to install a distro kernel in order to run Docker (i.e. you can't use the Linode kernels) due to the AUFS requirement. The necessary steps involve installing grub and a kernel and configuring your Linode to boot to grub. More information available here:
https://www.linode.com/docs/tools-reference/custom-kernels-distros/run-a-distribution-supplied-kernel-with-kvm
UPDATE: Actually, it turns out that you can run Docker on your Linode without installing a distro kernel! You just have to use OverlayFS instead of AUFS. This will become the default behavior in Docker 1.13. Here are the instructions:
1. Set up device-mapper so the initial Docker install doesn’t hang:
sudo apt-get update
sudo apt-get install dmsetup
sudo dmsetup mknodes
2. Follow the instructions here to install Docker, which as of the time of this writing are as follows:
sudo apt-get install apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://ha.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
source /etc/lsb-release
echo "deb https://apt.dockerproject.org/repo ubuntu-$DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install docker-engine
3. Modify the service unit for Docker to pass the storage driver argument to dockerd:
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// -s overlay
EOF
4. Reload systemd so it sees the new override.conf, and restart the daemon:
sudo systemctl daemon-reload
sudo systemctl restart docker
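Before relying on the overlay driver, it may be worth checking that the running kernel offers the filesystem at all. The sketch below looks for it in /proc/filesystems, the same file the Docker daemon consults for AUFS. The function name is made up. Note that overlay only shows up there once the module is built in or loaded, so sudo modprobe overlay may be needed first.

```shell
#!/usr/bin/env bash
# Sketch only: function name is illustrative.

# Return 0 if filesystem $1 is listed in a /proc/filesystems-style file.
# Lines there look like "nodev<TAB>overlay" or "<TAB>ext4", so the
# filesystem name is always the last field.
filesystem_supported() {
  local name="$1" file="${2:-/proc/filesystems}"
  awk -v fs="$name" '$NF == fs { found = 1 } END { exit !found }' "$file"
}

# Usage:
#   filesystem_supported overlay && echo "overlay is available"
#   docker info --format '{{.Driver}}'   # storage driver the daemon picked
```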
Here's an updated #2 for docker-ce, which replaces docker-engine as of March 2017:
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" |
sudo tee /etc/apt/sources.list.d/docker.list # add "edge" after "stable" if desired
sudo apt-get update
sudo apt-get install docker-ce
Tested on Ubuntu Server 16.04 LTS and Docker 1.12, 1.13, and 17.03. Performance has been good and I'm actually running it in production. For more information:
http://blog.thestateofme.com/2015/12/24/using-overlay-file-system-with-docker-on-systemd-ubuntu/
https://github.com/docker/docker/issues/23347
https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/
@mvp's answer helped me get through the installation.
Here is history of all commands from linode creation to docker installation:
uname -a
apt-get install linux-image-virtual grub2
apt-get update
apt-get install linux-image-virtual grub2
vi /etc/default/grub
update-grub
uname -a
apt-get update && apt-get upgrade
curl -sSL https://get.docker.com/ | sh
history
I have put this here for reference for those who eventually find themselves in the same situation.
