Kubernetes kube-dns pause container in crashloop with "Error adding network: failed to Statfs /proc/54226/ns/net" - docker

I have a Kubernetes one-box deployment with the following (containerized) components, all running with --net=host, and with kubelet running as a privileged Docker container with the Kubernetes flag --allow-privileged set to true.
gcr.io/google_containers/hyperkube-amd64:v1.7.9 "/bin/bash -c './hype" kubelet
gcr.io/google_containers/hyperkube-amd64:v1.7.9 "/bin/bash -c './hype" kube-proxy
gcr.io/google_containers/hyperkube-amd64:v1.7.9 "/bin/bash -c './hype" kube-scheduler
gcr.io/google_containers/hyperkube-amd64:v1.7.9 "/bin/bash -c './hype" kube-controller-manager
gcr.io/google_containers/hyperkube-amd64:v1.7.9 "/bin/bash -c './hype" kube-apiserver
quay.io/coreos/etcd:v3.1.0 "/usr/local/bin/etcd " etcd
On top of this, I enabled the addon manager with kubectl create -f https://github.com/kubernetes/kubernetes/blob/master/test/kubemark/resources/manifests/kube-addon-manager.yaml,
with the default YAML manifests for Calico 2.6.1 and kube-dns 1.14.5 mounted to /etc/kubernetes/addons/. The Calico pod comes up with its two containers (install-cni and calico-node) as expected.
However, kube-dns gets stuck in ContainerCreating or ContainerCannotRun, with the following error while trying to start the Kubernetes pause container:
{"log":"I1111 00:35:19.549318 1 manager.go:913] Added container: \"/kubepods/burstable/pod3173eef3-c678-11e7-ac4b-e41d2d59689e/1dd57d6f6c996d7abe061f6236fc8a0150cf6f95d16d5c3c462c9ed7158d3c54\" (aliases: [k8s_POD_kube-dns-v20-141138543-pmdww_kube-system_3173eef3-c678-11e7-ac4b-e41d2d59689e_0 1dd57d6f6c996d7abe061f6236fc8a0150cf6f95d16d5c3c462c9ed7158d3c54], namespace: \"docker\")\n","stream":"stderr","time":"2017-11-11T00:35:19.5526284Z"}
{"log":"I1111 00:35:19.549433 1 cni.go:291] About to add CNI network cni-loopback (type=loopback)\n","stream":"stderr","time":"2017-11-11T00:35:19.5526748Z"}
{"log":"I1111 00:35:19.549504 1 handler.go:325] Added event \u0026{/kubepods/burstable/pod3173eef3-c678-11e7-ac4b-e41d2d59689e/1dd57d6f6c996d7abe061f6236fc8a0150cf6f95d16d5c3c462c9ed7158d3c54 2017-11-11 00:35:19.3931718 +0000 UTC containerCreation {\u003cnil\u003e}}\n","stream":"stderr","time":"2017-11-11T00:35:19.5527217Z"}
{"log":"I1111 00:35:19.551134 1 container.go:407] Start housekeeping for container \"/kubepods/burstable/pod3173eef3-c678-11e7-ac4b-e41d2d59689e/1dd57d6f6c996d7abe061f6236fc8a0150cf6f95d16d5c3c462c9ed7158d3c54\"\n","stream":"stderr","time":"2017-11-11T00:35:19.5527441Z"}
{"log":"E1111 00:35:19.555099 1 cni.go:294] Error adding network: failed to Statfs \"/proc/54226/ns/net\": no such file or directory\n","stream":"stderr","time":"2017-11-11T00:35:19.5553606Z"}
{"log":"E1111 00:35:19.555122 1 cni.go:237] Error while adding to cni lo network: failed to Statfs \"/proc/54226/ns/net\": no such file or directory\n","stream":"stderr","time":"2017-11-11T00:35:19.5553887Z"}
{"log":"I1111 00:35:19.600281 1 manager.go:970] Destroyed container: \"/kubepods/burstable/pod3173eef3-c678-11e7-ac4b-e41d2d59689e/1dd57d6f6c996d7abe061f6236fc8a0150cf6f95d16d5c3c462c9ed7158d3c54\" (aliases: [k8s_POD_kube-dns-v20-141138543-pmdww_kube-system_3173eef3-c678-11e7-ac4b-e41d2d59689e_0 1dd57d6f6c996d7abe061f6236fc8a0150cf6f95d16d5c3c462c9ed7158d3c54], namespace: \"docker\")\n","stream":"stderr","time":"2017-11-11T00:35:19.6005722Z"}
I see pause containers keep coming up just to exit a second later, with an innocuous error message (this one is old; I stopped the cluster so it wouldn't keep spawning more containers):
ubuntu@r172-16-6-39:~$ docker ps -a | grep 216e39defa36
216e39defa36 gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Exited (0) About an hour ago k8s_POD_kube-dns-v20-141138543-xvdmv_kube-system_0594732f-c688-11e7-9da5-e41d2d59689e_17
ubuntu@r172-16-6-39:~$ docker logs 216e39defa36
shutting down, got signal: Terminated
The directory /proc/54226 doesn't exist on my host, which I assume is why CNI is complaining. But the pause containers for Calico are fine, running the same image, so something must either be failing only in the kube-dns case or not be attempted at all in the Calico case (a quick way to reproduce the check CNI performs is sketched after the output below). I found some references to a similar SELinux-related error on OpenShift, but I'm running a bare Ubuntu 14.04 VM without SELinux even installed.
ubuntu@r172-16-6-39:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty
ubuntu@r172-16-6-39:~$ setenforce
The program 'setenforce' is currently not installed. You can install it by typing:
sudo apt-get install selinux-utils
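One way to reproduce the check CNI performs is to compare the PID Docker recorded for the pause container with what actually exists under /proc (a diagnostic sketch; 216e39defa36 is the exited pause container from above, and <pid> is whatever the first command prints):
ubuntu@r172-16-6-39:~$ docker inspect -f '{{.State.Pid}} {{.State.Status}}' 216e39defa36   # the PID kubelet would hand to CNI; it is 0 once the container has exited
ubuntu@r172-16-6-39:~$ ls -l /proc/<pid>/ns/net                                            # fails with "No such file or directory" if the process is already gone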
My CNI conf is also pretty simple, generated by the install-cni Calico container:
ubuntu@r172-16-6-39:~$ cat /etc/cni/net.d/10-calico.conf
{
  "name": "k8s-pod-network",
  "cniVersion": "0.1.0",
  "type": "calico",
  "log_level": "debug",
  "datastore_type": "kubernetes",
  "nodename": "172.16.6.39",
  "mtu": 1500,
  "ipam": {
    "type": "host-local",
    "subnet": "usePodCidr"
  },
  "policy": {
    "type": "k8s",
    "k8s_auth_token": "****"
  },
  "kubernetes": {
    "k8s_api_root": "https://168.16.0.1:443",
    "kubeconfig": "/etc/kubernetes/kubeconfig"
  }
}
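As a sanity check (a sketch; the paths assume the default locations the Calico manifests install into), the plugin binaries that install-cni drops onto the host can also be verified:
ubuntu@r172-16-6-39:~$ ls -l /opt/cni/bin/ | grep -E 'calico|loopback'   # both the calico and loopback plugins should be present
ubuntu@r172-16-6-39:~$ ls /etc/cni/net.d/                                # only the expected 10-calico.conf should be here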
Has anyone hit something similar?

Related

Kubernetes GPU Pod error: validating toolkit installation: exec: "nvidia-smi": executable file not found in $PATH

When trying to create Pods that can use the GPU, I get the error: exec: "nvidia-smi": executable file not found in $PATH.
To explain the error from the beginning, my main goal was to create JupyterHub environments that can use the GPU. I installed Zero to JupyterHub for Kubernetes and followed these steps to be able to use the GPU. When I check my node, the GPU seems schedulable by Kubernetes. So far everything seemed fine.
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME GPUs
arge-server 1
However, when I logged in to JupyterHub and tried to open the profile using the GPU, I got an error: [Warning] 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. So, I checked the Pods and found that they were all in the "Waiting: PodInitializing" state.
kubectl get pods -n gpu-operator-resources
NAME READY STATUS RESTARTS AGE
nvidia-dcgm-x5rqs 0/1 Init:0/1 2 6d20h
nvidia-device-plugin-daemonset-jhjhb 0/1 Init:0/1 0 6d20h
gpu-feature-discovery-pd4xv 0/1 Init:0/1 2 6d20h
nvidia-dcgm-exporter-7mjgt 0/1 Init:0/1 2 6d20h
nvidia-operator-validator-9xjmv 0/1 Init:Error 10 26m
After that, I took a closer look at the Pod nvidia-operator-validator-9xjmv, which was the beginning of the error, and I saw that the toolkit-validation container was throwing a CrashLoopBackOff error. Here is the relevant part of the log:
kubectl describe pod nvidia-operator-validator-9xjmv -n gpu-operator-resources
Name: nvidia-operator-validator-9xjmv
Namespace: gpu-operator-resources
.
.
.
Controlled By: DaemonSet/nvidia-operator-validator
Init Containers:
.
.
.
toolkit-validation:
Container ID: containerd://e7d004f0809cbefdae5407ea42eb659972ea7eefa5dd6e45e968cbf3ed22bf2e
Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2
Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:a07fd1c74e3e469ac316d17cf79635173764fdab3b681dbc282027a23dbbe227
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
nvidia-validator
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 18 Nov 2021 12:55:00 +0300
Finished: Thu, 18 Nov 2021 12:55:00 +0300
Ready: False
Restart Count: 16
Environment:
WITH_WAIT: false
COMPONENT: toolkit
Mounts:
/run/nvidia/validations from run-nvidia-validations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hx7ls (ro)
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 58m default-scheduler Successfully assigned gpu-operator-resources/nvidia-operator-validator-9xjmv to arge-server
Normal Pulled 58m kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 58m kubelet Created container driver-validation
Normal Started 58m kubelet Started container driver-validation
Normal Pulled 56m (x5 over 58m) kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 56m (x5 over 58m) kubelet Created container toolkit-validation
Normal Started 56m (x5 over 58m) kubelet Started container toolkit-validation
Warning BackOff 3m7s (x255 over 58m) kubelet Back-off restarting failed container
Then, I looked at the logs of the container and I got the following error.
kubectl logs -n gpu-operator-resources -f nvidia-operator-validator-9xjmv -c toolkit-validation
time="2021-11-18T09:29:24Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
toolkit is not ready
For similar issues, it was suggested to delete the failed Pod and deployment. However, doing these did not fix my problem. Do you have any suggestions?
I have:
Ubuntu 20.04
Kubernetes v1.21.6
Docker 20.10.10
NVIDIA-SMI 470.82.01
CUDA 11.4
CPU: Intel Xeon E5-2683 v4 (32) @ 2.097GHz
GPU: NVIDIA GeForce RTX 2080 Ti
Memory: 13815MiB / 48280MiB
Thanks in advance.
In case you are still having the issue: we just had the same problem on our cluster, and the "dirty" fix is to do this:
rm /run/nvidia/driver
ln -s / /run/nvidia/driver
kubectl delete pod -n gpu-operator nvidia-operator-validator-xxxxx
The reason is that the init container of nvidia-operator-validator tries to execute nvidia-smi within a chroot from /run/nvidia/driver, which is a tmpfs (so it doesn't persist across reboots) and is not populated when the drivers were installed manually.
I do hope for a better fix from Nvidia.
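To check that the workaround took effect (a sketch under the same assumptions as the commands above, i.e. the driver and nvidia-smi are installed on the host, using the namespace from the question), the chroot the validator performs can be reproduced by hand:
chroot /run/nvidia/driver nvidia-smi        # after the symlink this resolves to the real root, so the host's nvidia-smi should run
kubectl get pods -n gpu-operator-resources  # the validator pod should leave Init:Error / CrashLoopBackOff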

Cannot initialize Kubernetes cluster on Ubuntu 18.04 (Virtual Box)

I am struggling to initialize a simple Kubernetes cluster using Ubuntu on VirtualBox. I tried both the server and desktop versions, following the official documentation:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
I also tried to follow some other guides, thinking the issue was caused by using VirtualBox VMs, like this one:
https://medium.com/@gunjangarge/create-kubernetes-cluster-using-kubeadm-on-ubuntu-virtualbox-step-by-step-68a3eeb1f74c
But every time I have the same issue with port 6443 not being exposed. Sometimes the process starts correctly, giving me the join command:
kubeadm init --pod-network-cidr=192.168.0.0/16
W1029 08:47:53.841460 11540 configset.go:348] WARNING: kubeadm cannot validate component configs
for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.192:6443 --token ztnoww.t8ng5a3jo2kx5cb2 \
--discovery-token-ca-cert-hash
sha256:907dde6cc6d72ed4cd7fe7e7f252e2cf657dd3256fba6ee5ec92027132a9c5af
Sometimes it does not start at all and times out:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Anyway, even when it starts, port 6443 is never exposed and kubelet is not happy with it (a quick way to check the port is sketched after this output):
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2020-10-29 08:48:15 CET; 20s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 13262 (kubelet)
Tasks: 14 (limit: 4666)
CGroup: /system.slice/kubelet.service
└─13262 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-contai
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.588386 13262 controller.go:136] failed to ensure node lease exists, will retry in 800ms, error: Get
"https://192.168.1.192:6443/apis/coordination.k8s.io/v1/names
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.785951 13262 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.192:644
Okt 29 08:48:23 master kubelet[13262]: I1029 08:48:23.022354 13262 kubelet_node_status.go:70] Attempting to register node master
Okt 29 08:48:24 master kubelet[13262]: I1029 08:48:24.188510 13262 request.go:645] Throttling request took 1.097264312s, request: POST:https://192.168.1.192:6443/api/v1/namespaces/kube-system/pods
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.678880 13262 kubelet_node_status.go:108] Node master was previously registered
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.679004 13262 kubelet_node_status.go:73] Successfully registered node master
Okt 29 08:48:25 master kubelet[13262]: W1029 08:48:25.765981 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:27 master kubelet[13262]: E1029 08:48:27.148246 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
Okt 29 08:48:30 master kubelet[13262]: W1029 08:48:30.767511 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:32 master kubelet[13262]: E1029 08:48:32.164211 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
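For reference, one way to see whether the API server ever binds the port (a sketch, assuming Docker is the container runtime, as the kubeadm hints above suggest):
sudo ss -tlnp | grep 6443                 # is anything listening on the API server port?
sudo docker ps -a | grep kube-apiserver   # did the static pod's container start at all?
sudo docker logs <container-id>           # substitute the ID from the previous command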
I have to say I don't know what to do now. I tried for hours with different Ubuntu versions, trying to find solutions on the Internet, but I didn't find any. I also went through the logs and found that maybe the config file is not created correctly for some reason:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml
but I found nothing about it, except "try to init the cluster again", which I did several times...
Thank you in advance for your help!
OK, I think I finally found the problem. I tried the same process on another PC and everything worked smoothly, so for any of you having a similar issue: just don't try to use VirtualBox and WSL at the same time (even if WSL is shut off).
I just did what's explained here: https://stackoverflow.com/a/63229718/2428805 and now everything's fine...

Kubernetes Installation process guidance [closed]

During the installation of Kubernetes, an error is reported when I initialize the master node. I am using an ARM platform server and the operating system is CentOS 7.6 (aarch64). Does Kubernetes support deploying master nodes on the ARM platform?
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
Jun 30 22:53:04 master kubelet[54238]: W0630 22:53:04.188966 54238 pod_container_deletor.go:75] Container "51615bc1d926dcc56606bca9f452c178398bc08c78a2418a346209df28b95854" not found in pod's containers
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.189353 54238 kubelet.go:2248] node "master" not found
Jun 30 22:53:04 master kubelet[54238]: I0630 22:53:04.218672 54238 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.236484 54238 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://192.168.1.112:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: dial tcp 192.168.1.112:6443: connect: connection refused
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.238898 54238 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Post https://192.168.1.112:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: dial tcp 192.168.1.112:6443: connect: connection refused
Jun 30 22:53:04 master kubelet[54238]: I0630 22:53:04.260520 54238 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.289516 54238 kubelet.go:2248] node "master" not found
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.389666 54238 kubelet.go:2248] node "master" not found
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.436810 54238 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168.1.112:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.1.112:6443: connect: connection refused
Jun 30 22:53:04 master kubelet[54238]: E0630 22:53:04.489847 54238 kubelet.go:2248] node "master" not found
To start a Kubernetes cluster, make sure you meet the minimum requirements for the Kubernetes platform. (If you want a Kubernetes cluster on lower-spec hardware, that is a separate discussion.)
You need:
Docker
A compute node with at least 4 GB memory and 2 CPUs.
I will write the answer based on your node.
Docker
On each of your machines, install Docker. Version 19.03.11 is recommended, but 1.13.1, 17.03, 17.06, 17.09, 18.06 and 18.09 are known to work as well. Keep track of the latest verified Docker version in the Kubernetes release notes.
Use the following commands to install Docker on your system:
Install required packages
yum install -y yum-utils device-mapper-persistent-data lvm2
Add the Docker repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
Install Docker CE
yum update -y && yum install -y \
containerd.io-1.2.13 \
docker-ce-19.03.11 \
docker-ce-cli-19.03.11
Create /etc/docker
mkdir /etc/docker
Set up the Docker daemon
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
Restart Docker
mkdir -p /etc/systemd/system/docker.service.d
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
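Optionally, confirm Docker picked up the systemd cgroup driver set in daemon.json (a quick sanity check; kubelet and Docker must agree on the driver):
docker info | grep -i "cgroup driver"   # expected: Cgroup Driver: systemd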
Kubernetes
As a requirement for your Linux Node's iptables to correctly see bridged traffic, you should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
Make sure that the br_netfilter module is loaded before this step. This can be done by running lsmod | grep br_netfilter. To load it explicitly call sudo modprobe br_netfilter.
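For example:
lsmod | grep br_netfilter    # prints a line if the module is already loaded
sudo modprobe br_netfilter   # load it explicitly if the check prints nothing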
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
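You can verify the change took effect (a quick check):
getenforce   # should now print: Permissive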
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
systemctl daemon-reload
systemctl restart kubelet
Initializing your control-plane node
The control-plane node is the machine where the control plane components run, including etcd (the cluster database) and the API Server (which the kubectl command line tool communicates with).
Master
Init the Kubernetes cluster (run this on the master node)
kubeadm init --pod-network-cidr 192.168.0.0/16
Note: I will use Calico here, so the CIDR is 192.168.0.0/16.
Move the kube config to the user directory (assuming root)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Worker Node
Join other nodes (run the command below from each worker node)
kubeadm join <IP_PUBLIC>:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
Note: you will get this command in the output when you successfully init the master.
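If you lose that output or the token expires, the join command can be regenerated on the master:
kubeadm token create --print-join-command   # prints a fresh 'kubeadm join ...' line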
Master Node
Apply Calico
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Verify cluster
kubectl get nodes
Reference : https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Trouble connecting to my docker app via VM IP

Solved at bottom
But why do I have to append :4000?
I'm following the docker get-started Guide here, https://docs.docker.com/get-started/part4/
I'm fairly certain I've done everything correctly, but am wondering why I can't connect to view the app after deploying it.
I've set my docker-machine env to my VM, myvm1, for reference in the following commands.
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
099e16249604 beresj/getting-started:part2 "python app.py" 12 seconds ago Up 12 seconds 80/tcp getstartedlab_web.5.y0e2k1r1ev47u24e5iufkyn3i
6f9a24b343a7 beresj/getting-started:part2 "python app.py" 12 seconds ago Up 12 seconds 80/tcp getstartedlab_web.3.1pls3osj3uhsb5dyqtt4ts8j6
docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
beresj/getting-started <none> e290b6208c21 22 hours ago 131MB
docker stack ls
NAME SERVICES ORCHESTRATOR
getstartedlab 1 Swarm
docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
myvm1 * virtualbox Running tcp://192.168.99.100:2376 v18.09.6
myvm2 - virtualbox Running tcp://192.168.99.101:2376 v18.09.6
docker stack ps getstartedlab
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
vkxx79fh3h85 getstartedlab_web.1 beresj/getting-started:part2 myvm2 Running Running 3 minutes ago
qexbaa3wz0pd getstartedlab_web.2 beresj/getting-started:part2 myvm2 Running Running 3 minutes ago
1pls3osj3uhs getstartedlab_web.3 beresj/getting-started:part2 myvm1 Running Running 3 minutes ago
ucuwen1jrncf getstartedlab_web.4 beresj/getting-started:part2 myvm2 Running Running 3 minutes ago
y0e2k1r1ev47 getstartedlab_web.5 beresj/getting-started:part2 myvm1 Running Running 3 minutes ago
curl 192.168.99.100
curl: (7) Failed to connect to 192.168.99.100 port 80: Connection refused
docker info
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 1
Server Version: 18.09.6
...
Swarm: active
NodeID: 0p9qrax9h3by0fupat8ufkfbq
Is Manager: true
ClusterID: 7vnqdk85n8jx6fqck9k7dv2ka
Managers: 1
Nodes: 2
Default Address Pool: 10.0.0.0/8
...
Node Address: 192.168.99.100
Manager Addresses:
192.168.99.100:2377
...
Kernel Version: 4.14.116-boot2docker
Operating System: Boot2Docker 18.09.6 (TCL 8.2.1)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 989.4MiB
Name: myvm1
I would expect to see what I was able to see when I just ran it on my local machine instead of on a VM in a swarm (I think I have the lingo correct?)
Not sure how to check open ports.
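(For reference, one way to check which ports the swarm publishes, a sketch assuming the docker-machine env still points at the manager node:)
docker service ls                         # the PORTS column shows the published mapping, e.g. *:4000->80/tcp
docker service inspect getstartedlab_web  # Endpoint.Ports lists the PublishedPort and TargetPort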
Again: this works if I simply remove the stack, unset the docker-machine environment, and just run:
docker stack deploy -c docker-compose.yml getstartedlab
not on the vm.
Thank you in advance. (Also, I'm new hence the get-started guide so I appreciate any help)
Edit
It works if I append :4000 to the VM IP in my URL, e.g. 192.168.99.100:4000 or 192.168.99.101:4000. It shows the two container IDs listed in 'docker container ls' for myvm1, and the other three are from myvm2. Could anyone tell me why I have to append 4000? Is it because I have ports: "4000:80" in my docker-compose.yml?
Not sure if this will help, but if you use docker inspect <instance_id_here>, you can see what ports are exposed.
Exposed ports aren't open (published) ports. You would need to bind a host port to a container port in the docker-compose.yml in order for it to be open.
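For reference, the mapping the asker mentions looks roughly like this in the compose file (a sketch of the relevant fragment only; the service and image names are taken from the question's output):
services:
  web:
    image: beresj/getting-started:part2
    ports:
      - "4000:80"   # publish host port 4000 -> container port 80, which is why the app answers on :4000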

query.js crash on Hyperledger fabric fabcar example

The fabcar example of the hyperledger tutorial crashes for me at the step of attempting to run query.js.
I have removed all Hyperledger-related Docker images (with docker rmi), so all required content was downloaded automatically when running startFabric.sh. The output on startup looks slightly "clouded" but not very suspicious (I skipped the lengthy output about the images being downloaded):
# wait for Hyperledger Fabric to start
# incase of errors when running later commands, issue export FABRIC_START_TIMEOUT=<larger number>
export FABRIC_START_TIMEOUT=10
#echo ${FABRIC_START_TIMEOUT}
sleep ${FABRIC_START_TIMEOUT}
# Create the channel
docker exec -e "CORE_PEER_LOCALMSPID=Org1MSP" -e "CORE_PEER_MSPCONFIGPATH=/etc/hyperledger/msp/users/Admin#org1.example.com/msp" peer0.org1.example.com peer channel create -o orderer.example.com:7050 -c mychannel -f /etc/hyperledger/configtx/channel.tx
flag provided but not defined: -e
See 'docker exec --help'.
The next step as asked is
npm install
It also delivers mostly OK output, just one warning:
npm WARN fabcar@1.0.0 No repository field.
I have verified the images are running (this also shows that the user is authorized to use Docker, even though the user is otherwise not root):
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9acf0dd8a2e2 hyperledger/fabric-peer:x86_64-1.0.0 "peer node start" 20 seconds ago Up 19 seconds 0.0.0.0:7051->7051/tcp, 0.0.0.0:7053->7053/tcp peer0.org1.example.com
da42dca3cbda hyperledger/fabric-orderer:x86_64-1.0.0 "orderer" 20 seconds ago Up 19 seconds 0.0.0.0:7050->7050/tcp orderer.example.com
0265c3cd86f2 hyperledger/fabric-ca:x86_64-1.0.0 "sh -c 'fabric-ca-ser" 20 seconds ago Up 20 seconds 0.0.0.0:7054->7054/tcp ca.example.com
4f71895a78c0 hyperledger/fabric-couchdb:x86_64-1.0.0 "tini -- /docker-entr" 20 seconds ago Up 19 seconds 4369/tcp, 9100/tcp, 0.0.0.0:5984->5984/tcp couchdb
When I finally try to run
node query.js
I observe the following errors:
Create a client and set the wallet location
Set wallet path, and associate user PeerAdmin with application
Check user is enrolled, and set a query URL in the network
Make query
Assigning transaction_id: eb03c5e69259b880433861daf57a5ac2d33e41d93cebe80a7a478a1aa2cba711
error: [client-utils.js]: sendPeersProposal - Promise is rejected: Error: Endpoint read failed
at /home/hla/fabric-samples/fabcar/node_modules/grpc/src/node/src/client.js:434:17
returned from query
Query result count = 1
error from query = { Error: Endpoint read failed
at /home/hla/fabric-samples/fabcar/node_modules/grpc/src/node/src/client.js:434:17 code: 14, metadata: Metadata { _internal_repr: {} } }
Response is Error: Endpoint read failed
My OS:
uname -a
Linux uhost 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
I have checked this, but my Node.js version is correct:
node --version
v6.11.2
npm -- version
{ fabcar: '1.0.0',
npm: '3.10.10',
ares: '1.10.1-DEV',
http_parser: '2.7.0',
icu: '56.1',
modules: '48',
node: '6.11.2',
openssl: '1.0.2l',
uv: '1.11.0',
v8: '5.1.281.103',
zlib: '1.2.11' }
Also, the error message is completely different. The machine has ports 8080 and 8443 in use, but shutting down the applications using them did not help.
That's because you didn't follow all the steps. As the tutorial says, before running query.js you should enroll the admin and register a user; then it works properly. And don't pay attention to "npm WARN fabcar@1.0.0 No repository field", it works well. Try the following:
$ docker stop $(docker ps -a -q)
$ docker ps -qa | xargs docker rm
$ ./startFabric.sh
$ cd fabric-samples/fabcar/javascript
$ node enrollAdmin.js
$ npm install
$ node registerUser.js
$ node query.js
