Kubernetes v1.2.2 api-server doesn't start - docker

I am attempting to deploy Pachyderm (a Docker big-data platform) on Kubernetes. Pachyderm requires Kubernetes v1.2.2, an old version, so I have to install that. I followed the guide at http://kubernetes.io/docs/getting-started-guides/docker/ to deploy Kubernetes on a local server via Docker. The guide works with Kubernetes >= 1.3.0, but when I use it to deploy Kubernetes 1.2.2, I run into problems.
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec38ae951f09 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube apiserver" 8 seconds ago Exited (255) 7 seconds ago k8s_apiserver.78ec1de_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_d26fc24e
55c1b13bb610 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 8 seconds ago Up 8 seconds k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_1cb4c220
b9f0e5b3a7a9 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube scheduler" 9 seconds ago Up 8 seconds k8s_scheduler.fc12fcbe_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_e5065506
9cd613d272bc gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube apiserver" 9 seconds ago Exited (255) 8 seconds ago k8s_apiserver.78ec1de_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_c04426af
49fe2c409386 gcr.io/google_containers/etcd:2.2.1 "/usr/local/bin/etcd " 10 seconds ago Up 9 seconds k8s_etcd.7e452b0b_k8s-etcd-127.0.0.1_default_1df6a8b4d6e129d5ed8840e370203c11_a6f11fdb
5b208be18c71 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube controlle" 10 seconds ago Up 9 seconds k8s_controller-manager.70414b65_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_c377c5e9
df194f3cf663 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube proxy --m" 10 seconds ago Up 9 seconds k8s_kube-proxy.9a9f4853_k8s-proxy-127.0.0.1_default_5e5303a9d49035e9fad52bfc4c88edc8_63ec0b04
58b53ec28fbe gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-etcd-127.0.0.1_default_1df6a8b4d6e129d5ed8840e370203c11_21034b2e
df48fe4cdf0a gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_4867dbbc
fe6b74c2a881 gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-proxy-127.0.0.1_default_5e5303a9d49035e9fad52bfc4c88edc8_fad2c558
4c00ad498916 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube kubelet -" 25 seconds ago Up 24 seconds kubelet
From the container table above, it can be seen that the apiserver goes down when deploying Kubernetes 1.2.2. It is restarted with an exponential-backoff interval, but it never comes up. Its logs show:
sv: batch/v1
mv: extensions/__internal
I0727 06:06:27.593708 1 genericapiserver.go:82] Adding storage destination for group batch
W0727 06:06:27.593745 1 server.go:383] No RSA key provided, service account token authentication disabled
F0727 06:06:27.593767 1 server.go:410] Invalid Authentication Config: open /srv/kubernetes/basic_auth.csv: no such file or directory
Those are the docker logs of the Kubernetes apiserver. Note the authentication error: it seems Kubernetes does not have the key required for authentication. Also see the controller-manager log below; the controller manager waits for the apiserver, but since the apiserver never came up, the controller manager eventually dies as well.
E0727 06:07:10.604801 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:11.604832 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:12.604752 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:13.604803 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604332 1 nodecontroller.go:229] Error monitoring node status: Get http://127.0.0.1:8080/api/v1/nodes: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604619 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604861 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
F0727 06:07:14.604957 1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
So my question is: how do I solve this problem? It has troubled me for a long time.
====================================================================
Update:
With the help of Goblin and Lukie, I found that the key problem is that the setup pod is not triggered.
See the controller-manager part of the Kubernetes master manifest:
{
    "name": "controller-manager",
    "command": [
        "/hyperkube",
        "controller-manager",
        "--master=127.0.0.1:8080",
        "--service-account-private-key-file=/srv/kubernetes/server.key",
        "--root-ca-file=/srv/kubernetes/ca.crt",
        "--min-resync-period=3m",
        "--v=2"
    ],
    "volumeMounts": [
        {
            "name": "data",
            "mountPath": "/srv/kubernetes"
        }
    ]
}
The option --service-account-private-key-file=/srv/kubernetes/server.key is already present in the manifest file, but it doesn't work: the controller-manager cannot find this file in its file system. This assumption is supported by the following command:
docker exec a82d7f6e4d7d ls -l /srv/kubernetes
ls: cannot access /srv/kubernetes: No such file or directory
Next, we checked whether the setup pod put the files into the Docker volume. Unfortunately, the setup pod never triggered and never completed its work, so no cert files were written to the file system:
docker ps -a | grep setup
54afdd81349e gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" About a minute ago Up About a minute k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_a2edddca
6f714e034098 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 4 minutes ago Exited (7) 2 minutes ago k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_0d7dab5b
8358f6644d94 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 6 minutes ago Exited (7) 4 minutes ago k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_41e4c686
Is there any way to debug this further? Or is it a bug in Kubernetes 1.2?

F0727 06:06:27.593767 1 server.go:410] Invalid Authentication Config: open /srv/kubernetes/basic_auth.csv: no such file or directory
You are missing the basic auth file /srv/kubernetes/basic_auth.csv. Either create a basic auth file or remove the configuration flag. See:
Kubernetes authentication
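For reference, a minimal sketch of the "create the file" option, assuming the usual static-password format (password,user,uid) and the --basic-auth-file flag that the log implies:
echo "secretpassword,admin,admin" > /srv/kubernetes/basic_auth.csv
# and make sure the apiserver is started with:
# /hyperkube apiserver --basic-auth-file=/srv/kubernetes/basic_auth.csv ...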

In fact, it is W0727 06:06:27.593745 1 server.go:383] No RSA key provided, service account token authentication disabled that is more important, in my opinion.
It seems --service-account-private-key-file is missing on the controller-manager, so service account tokens cannot be generated properly.
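A minimal sketch of what that looks like, assuming the standard flag names of that Kubernetes generation (the same key is shared between the two components):
openssl genrsa -out /srv/kubernetes/server.key 2048
# apiserver verifies service account tokens:
/hyperkube apiserver --service-account-key-file=/srv/kubernetes/server.key ...
# controller-manager signs service account tokens:
/hyperkube controller-manager --service-account-private-key-file=/srv/kubernetes/server.key ...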

Related

Nodes cannot pull image from docker registry

I followed the hobby-kube/guide on how to set up your own cluster, but I am stuck. I created an issue on the repo as well, but maybe there are more people here who can help me with it.
I am trying to set up my cluster on Scaleway. I followed the instructions one by one, and I am at the point where I have installed Weave Net as the CNI, and I've got:
kube-system weave-net-dtwbj 2/2 Running 1 9d
kube-system weave-net-kmxq7 0/2 Init:ImagePullBackOff 0 9d
kube-system weave-net-pzfcj 0/2 Init:ImagePullBackOff 0 9d
So the issue is on my nodes but not on the master.
I found suggestions in one of the issues and applied them this time, but the output is the same.
UFW / Firewall
I skipped the firewall part; on every VPS I get:
> ufw status
Status: inactive
In the Scaleway config, all my VPSes have the same security policy applied. Only outbound traffic on ports [25, 465, 587] is dropped.
Internet connection
On both of my nodes I have trouble downloading images from Docker's registry, and I believe this is the real issue here:
> docker pull hello-world
Using default tag: latest
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
On the master, hello-world was pulled successfully.
The nodes do have an internet connection to the outside:
--- google.com ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8011ms
rtt min/avg/max/mdev = 1.008/1.139/1.258/0.073 ms
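To separate a DNS problem from a pure connectivity problem, I could also hit the registry endpoint directly from a node; a hypothetical check I have not run yet:
nslookup registry-1.docker.io
curl -m 15 -v https://registry-1.docker.io/v2/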
WireGuard
From the output of wg show, I assume that the VPN between my VPSes is set up correctly:
peer: 3
endpoint: 3-priv IP:51820
allowed ips: 10.0.1.3/32
latest handshake: 1 minute, 17 seconds ago
transfer: 7.50 GiB received, 6.50 GiB sent
peer: 2
endpoint: 2-priv IP:51820
allowed ips: 10.0.1.2/32
latest handshake: 1 minute, 41 seconds ago
transfer: 4.96 GiB received, 6.11 GiB sent
Could anybody help me track down the issue and fix it? I can provide any logs you wish; just tell me how to get them.

Unable to join peers to channel in Hyperledger First Network setup

I am following a tutorial on the Hyperledger Fabric site, and after installing all the prerequisites (latest versions) on a Linux 18.04 installation I run into an error.
I am trying to run the provided ./byfn script to "Build Your First Network". After a fresh install I run the commands as follows:
./byfn generate
./byfn up
At that point everything performs as expected until the following error occurs 5 times in a row (after which the run exits with an error):
+ peer channel join -b mychannel.block
+ res=1
+ set +x
Error: error getting endorser client for channel: endorser client failed to connect to peer0.org1.example.com:7051: failed to create new connection: context deadline exceeded
peer0.org1 failed to join the channel, Retry after 3 seconds
I have tried various things, such as:
Increasing the timeout to allow for longer connection times (see the example after this list)
Bringing the network down and up again
Fully re-installing the required packages and the fabric-samples
Removing all Docker volumes/images/containers
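For the timeout, the fabric-samples script accepts a timeout flag, so "increasing the timeout" above means something along these lines (assuming the -t flag of byfn.sh; the exact value varied between attempts):
./byfn up -t 60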
I came across some sources mentioning that it might have to do with the peers not being able to connect to each other, which I tried to fix with a manual docker connect of each peer to the byfn Docker network; no success there. I can see the orderer running, but the peers that attempted to join the network exited with an error:
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
99570e191542 hyperledger/fabric-tools:latest "/bin/bash" 24 seconds ago Up 23 seconds cli
340d1225a913 hyperledger/fabric-peer:latest "peer node start" 30 seconds ago Exited (2) 24 seconds ago peer0.org1.example.com
fabe017751a0 hyperledger/fabric-peer:latest "peer node start" 30 seconds ago Exited (2) 25 seconds ago peer1.org2.example.com
f81a639f29f6 hyperledger/fabric-peer:latest "peer node start" 30 seconds ago Exited (2) 26 seconds ago peer1.org1.example.com
0f91080db681 hyperledger/fabric-peer:latest "peer node start" 30 seconds ago Exited (2) 27 seconds ago peer0.org2.example.com
c491adc91320 hyperledger/fabric-orderer:latest "orderer" 30 seconds ago Up 28 seconds 0.0.0.0:7050->7050/tcp orderer.example.com
This shows that the nodes exited with an error code; they all look the same. See below for the docker logs of a peer node.
So my final question is: How do I get the "First Network" Hyperledger sample peers to successfully join the channel?
Thanks in advance!
Update 1
I chose a bad code dump! Please use these links for logs/outputs.
Full ./byfn up output
Docker log output for peer0
Update 2
So I have been trying various things; it seems to not be a Go-related error but simply a "connection" error, where Go crashes upon trying to connect a peer to the channel. So the main question at hand is: why are my Docker instances not properly connecting to the channel?
Update 3
I have used Amazon Web Services to launch a Linux instance and re-created all my installation steps on this "fresh" instance. Everything worked on the first go (pun intended). Therefore I must conclude that it had to do with either my network settings or personal setup as these are the only parameters that changed.
As this works for me for now I will work with that. I am still open to suggestions and will keep an eye on this post!
Package versions
Hyperledger Fabric 1.4.0
Docker version 18.09.2, build 6247962
docker-compose version 1.13.0, build 1719ceb
go version go1.11 linux/amd64
npm: '6.4.1',
node -v: v8.15.0
I suggest you check two things: the available memory and the permissions on the "first-network" directory.
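For example (a rough sketch; adjust the path to wherever your fabric-samples checkout lives):
free -h                              # is there enough free memory for all the containers?
ls -ld fabric-samples/first-network  # can your user read and write this directory?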

Kubernetes deployment with Docker image

I am running Kubernetes (Minikube) on my local Mac.
I am trying to set up a deployment with a Docker image and I am getting the error below. However, the hello-world deployment with the Docker image "gcr.io/google-samples/node-hello:1.0" works as expected.
I am able to pull the image from a console on my local machine. Am I missing a setting here?
"Failed to pull image
"docker.XYZ.com/dpace/dev/docker-service": rpc error:
code = Unknown desc = Error response from daemon: Get
https:/docker.XYZ.com/v2/: dial tcp: lookup
docker.XYZ.com on 10.0.2.3:53: read udp
10.0.2.15:59292->10.0.2.3:53: i/o timeout"
I am able to pull the image using docker pull docker.XYZ.com/dpace/dev/docker-service on my local machine without any auth issue; it doesn't need auth for pulling images.
I tried logging into the Minikube VM, and docker images returns the following:
$ docker images
REPOSITORY                                    TAG      IMAGE ID       CREATED         SIZE
k8s.gcr.io/kubernetes-dashboard-amd64         v1.8.1   e94d2f21bc0c   3 months ago    121MB
gcr.io/google-containers/kube-addon-manager   v6.5     d166ffa9201a   4 months ago    79.5MB
gcr.io/k8s-minikube/storage-provisioner       v1.8.0   4689081edb10   4 months ago    80.8MB
gcr.io/k8s-minikube/storage-provisioner       v1.8.1   4689081edb10   4 months ago    80.8MB
k8s.gcr.io/k8s-dns-sidecar-amd64              1.14.5   fed89e8b4248   5 months ago    41.8MB
k8s.gcr.io/k8s-dns-kube-dns-amd64             1.14.5   512cd7425a73   5 months ago    49.4MB
k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64        1.14.5   459944ce8cc4   5 months ago    41.4MB
k8s.gcr.io/echoserver                         1.4      a90209bb39e3   21 months ago   140MB
gcr.io/google_containers/pause-amd64          3.0      99e59f495ffa   22 months ago   747kB
k8s.gcr.io/pause-amd64                        3.0      99e59f495ffa   22 months ago   747kB
gcr.io/google-samples/node-hello              1.0      4c7ea8709739   23 months ago   644MB
Though the images are there, when I try to pull one of these existing images, it fails with the error below:
$ docker pull gcr.io/google-samples/node-hello:1.0
Error response from daemon: Get https://gcr.io/v2/: dial tcp: lookup gcr.io on 10.0.2.3:53: read udp 10.0.2.15:44023->10.0.2.3:53: i/o timeout
When I try "docker login docker.XYZ.com", it prompts me to enter the credential. It throws the below error after entering the password. Same error while trying to pull the image also.
"Error response from daemon: Get https://docker.XYZ.com/v2/: dial tcp:
lookup docker.XYZ.com on 10.0.2.3:53: read udp
10.0.2.15:41849->10.0.2.3:53: i/o timeout"
The command "curl google.com" also not working. "Could not resolve
host: google.com"
Any setting to be done inside Minikube VM. I use VirtualBox.
Looks like DNS in your Minikube is broken; that's why you cannot pull anything.
Here is an issue on GitHub with a similar problem.
Try updating your Minikube and your hypervisor (in most cases it is VirtualBox) to the latest version (check here) and recreating the cluster; that should help.
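A minimal sketch of the recreate step (assuming the VirtualBox driver and that losing the current cluster state is acceptable):
minikube delete
minikube start --vm-driver=virtualbox
minikube ssh    # then, inside the VM, retry e.g. nslookup google.com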

Cannot restart container, how to restart Docker container?

When I print the container list
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f15a180315d3 influxdb "/entrypoint.sh infl…" 2 hours ago Exited (128) 2 hours ago influxdb
7b753ba600df influxdb "/entrypoint.sh infl…" 3 hours ago Exited (0) 2 hours ago nervous_fermi
2ddc5d9af400 influxdb "/entrypoint.sh infl…" 3 hours ago Exited (0) 3 hours ago nostalgic_varahamihira
2e174a82d38d influxdb "/entrypoint.sh infl…" 3 hours ago Exited (0) 3 hours ago modest_mestorf
But if I try to restart it
docker container restart influxdb
I get
Error response from daemon: Cannot restart container influxdb: driver failed programming external connectivity on endpoint influxdb (06ee4d738dffecd1a202840699a899286f4bbb88392e4eb227d65670108687a6): Error starting userland proxy: listen tcp 0.0.0.0:8086: bind: address already in use
netstat -nl -p tcp | grep 8086
tcp6 0 0 :::8086 :::* LISTEN 1985/influxd
How can I restart the Docker container?
If I go for
docker kill influxdb
Error response from daemon: Cannot kill container: influxdb: Container f15a180315d38c2f5fac929b2d0b9be3e8ca2a09033648b5c5174c15a64c4d71 is not running
Problem
As indicated by the error message:
Error response from daemon: Cannot restart container influxdb: driver failed programming external connectivity on endpoint influxdb (06ee4d738dffecd1a202840699a899286f4bbb88392e4eb227d65670108687a6): Error starting userland proxy: listen tcp 0.0.0.0:8086: bind: address already in use
The port 8086 was already in use (hence the "address already in use" part) by another process, so the container was not able to run: it tried to start influxdb but failed because the port was already bound.
Additionally, the netstat output provides a hint about which process occupies the port:
netstat -nl -p tcp | grep 8086
tcp6 0 0 :::8086 :::* LISTEN 1985/influxd
(see the last part: 1985/influxd)
Solution
Kill the other process (first check whether the process is busy and whether you should save its data before stopping it), e.g. using the kill command:
kill 1985
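If influxd is managed as a host service rather than started by hand, it may be cleaner to stop it via the service manager before restarting the container (assuming a systemd-managed install with the usual unit name):
sudo systemctl stop influxdb
docker container restart influxdb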

Hyperledger Fabric marbles v3 demo - Failed when instantiate chaincode

I was testing the marbles v3 demo from http://fabric-rtd.readthedocs.io/en/latest/marbles.html and ran into an error while instantiating chaincode. I have double-checked that blockchain_creds1.json, chaincode_id, and chaincode_version are edited correctly.
Log:
error: [Peer.js]: GRPC client got an error response from the peer
"grpc://localhost:7051". Error: Timeout expired while starting chaincode end2end:v0(networkid:dev,peerid:peer0,tx:ec4161b7f14893d1142a836fb552e0a8eb4b5653ad4191e946e11ba4a7191993)
at /home/eric/blockchain-demo/fabric-sdk-node/node_modules/grpc/src/node/src/client.js:434:17
error: [Chain.js]: Chain-sendPeersProposal - Promise is rejected: Error: Timeout expired while starting chaincode end2end:v0(networkid:dev,peerid:peer0,tx:ec4161b7f14893d1142a836fb552e0a8eb4b5653ad4191e946e11ba4a7191993)
at /home/eric/blockchain-demo/fabric-sdk-node/node_modules/grpc/src/node/src/client.js:434:17
error: [Peer.js]: GRPC client got an error response from the peer "grpc://localhost:8051". Error: Timeout expired while starting chaincode end2end:v0(networkid:dev,peerid:peer2,tx:ec4161b7f14893d1142a836fb552e0a8eb4b5653ad4191e946e11ba4a7191993)
at /home/eric/blockchain-demo/fabric-sdk-node/node_modules/grpc/src/node/src/client.js:434:17
error: [Chain.js]: Chain-sendPeersProposal - Promise is rejected: Error: Timeout expired while starting chaincode end2end:v0(networkid:dev,peerid:peer2,tx:ec4161b7f14893d1142a836fb552e0a8eb4b5653ad4191e946e11ba4a7191993)
at /home/eric/blockchain-demo/fabric-sdk-node/node_modules/grpc/src/node/src/client.js:434:17
error: [install-chaincode]: instantiate proposal was bad
error: [install-chaincode]: instantiate proposal was bad
not ok 3 Failed to send instantiate Proposal or receive valid response. Response null or status is not 200. exiting...
Not sure why it can't connect to the peer, since the container is running:
[root@DEV1 fabric-sdk-node]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3e79a4a80992 hyperledger/fabric-peer "peer node start -..." 36 minutes ago Up 36 minutes 0.0.0.0:8056->7051/tcp, 0.0.0.0:8058->7053/tcp peer3
71fe64571dc6 hyperledger/fabric-peer "peer node start -..." 36 minutes ago Up 36 minutes 0.0.0.0:7056->7051/tcp, 0.0.0.0:7058->7053/tcp peer1
08cecbc1cd94 hyperledger/fabric-peer "peer node start -..." 36 minutes ago Up 36 minutes 0.0.0.0:7051->7051/tcp, 0.0.0.0:7053->7053/tcp peer0
41e7f50fe897 hyperledger/fabric-peer "peer node start -..." 36 minutes ago Up 36 minutes 0.0.0.0:8051->7051/tcp, 0.0.0.0:8053->7053/tcp peer2
7182a3c2ad7d hyperledger/fabric-ca "sh -c 'fabric-ca-..." 37 minutes ago Up 36 minutes 0.0.0.0:7054->7054/tcp ca_peerOrg1
f8b529fdd7ec hyperledger/fabric-orderer "orderer" 37 minutes ago Up 36 minutes 0.0.0.0:7050->7050/tcp orderer0
ca83ab5db256 hyperledger/fabric-ca "sh -c 'fabric-ca-..." 37 minutes ago Up 36 minutes 0.0.0.0:8054->7054/tcp ca_peerOrg2
b68cf9ee6725 couchdb "tini -- /docker-e..." 37 minutes ago Up 36 minutes 0.0.0.0:5984->5984/tcp couchdb
Any help would be appreciated Thx All =)
I recommend checking the peer's logs to see why it is failing.
For example, since it timed out talking to peer0, check peer0's logs as follows:
docker logs peer0
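If the peer log itself is not conclusive, the chaincode launch container is another place to look; its name usually starts with dev-<peer id> (the grep pattern and ID below are placeholders):
docker ps -a | grep dev-peer0
docker logs <container id from the line above>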
