Creating new docker-machine instance always fails validating certs using openstack driver - docker

Everytime I try to create a new instance via docker-machine on open stack, I always get this error for validating the certs. I have to end up regenerating the certs right after I create the instance for me to be able to use the instances.
$ docker-machine create --driver openstack --openstack-ssh-user root --openstack-keypair-name "KeyName" --openstack-private-key-file ~/.ssh/id_rsa --openstack-flavor-id 50 --openstack-image-name "Ubuntu-16.04" manager1
Running pre-create checks...
Creating machine...
(staging-worker1) Creating machine...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "xxx.xxx.xxx.xxx:2376": dial tcp xxx.xxx.xxx.xxx:2376: i/o timeout
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
$ docker-machine regenerate-certs manager1
Regenerate TLS machine certs? Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Then it seems to work
$ docker-machine ssh manager1 pwd
/home/ubuntu
But when I try to do env
$ docker-machine env manager1
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "xxx.xxx.xxx.xx:2376": dial tcp xxx.xxx.xxx.xx:2376: i/o timeout
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
Any ideas on what might be causing this?
I've documented it further in github https://github.com/docker/machine/issues/3829

It turns out my hosting service locked down everything other than 22, 80, and 443 on the Open Stack Security Group Rules. I had to add 2376 TCP Ingress for docker-machine's commands to work.
It helps explain why docker-machine ssh worked but not docker-machine env

On Ubuntu you will need to SSH to your machine and cd into following directory:
cd /etc/systemd/system/docker.service.d/
list all files in it with:
ls -l
you will probably have something like this:
-rw-r--r-- 1 root root 274 Jul 2 17:47 10-machine.conf
-rw-r--r-- 1 root root 101 Jul 2 17:46 override.conf
you will need to delete all files except 10-machine.conf with sudo rm. After that remove existing machine which is failing with:
docker-machine rm machine1
and try to create it one more time like this:
docker-machine create -d generic --generic-ip-address ip --generic-ssh-key ~/.ssh/key --generic-ssh-user username --generic-ssh-port 22 machine1
please change ip, key, username and machine1 with you actual values. It should work now. I hope this helps.

Related

docker-machine create keeps asking for password

I am attempting to docker-machine create to a Ubuntu 16.04 host like this:
ssh-keygen -R ${remote_host}
ssh-copy-id -i ~/.ssh/id_host_rsa.pub root#${remote_host}
docker-machine create \
--driver generic \
--generic-ip-address=${remote_host} \
--generic-ssh-key ~/.ssh/id_host_rsa \
--generic-ssh-user=root ${machine_name}
Version information:
docker --version
Docker version 19.03.6, build 369ce74a3c
docker-machine --version
docker-machine version 0.16.2, build bd45ab13
I am repeatedly asked for a password .. Why is this?
Here is the output:
...
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: Received disconnect from 77.68.21.66 port 22:2: Too many authentication failures
ERROR: Disconnected from 77.68.21.66 port 22
Running pre-create checks...
Creating machine...
(production) Importing SSH key...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Password:
Detecting the provisioner...
Password:
Provisioning with ubuntu(systemd)...
Password:
.. etc
The reason for this problem was the ordering of ~/.ssh/config.
I had a Host * entry in config first, before that of my corresponding specific server Host XX.XX.XX.XX entry.
I moved the wildcard entry at the end of ~/.ssh/config and now the password is no longer constantly asked for and the problem is now fixed.
I how this helps someone.

Docker (Spotify) API - cannot connect to Docker

In my Docker (Spring Boot) application I would like to execute Docker commands. I use the docker-spotify-api (client).
I get different connection errors. I start the application as part of a docker-compose.yml.
This is what I tried so far on an EC2 AWS VPS:
docker = DefaultDockerClient.builder()
.uri(URI.create("tcp://localhost:2376"))
.build();
=> TCP protocol not supported.
docker = DefaultDockerClient.builder()
.uri(URI.create("tcp://localhost:2375"))
.build();
=> TCP protocol not supported.
docker = new DefaultDockerClient("unix:///var/run/docker.sock");
==> No such file
docker = DefaultDockerClient.builder()
.uri("unix:///var/run/docker.sock")
.build();
==> No such file
docker = DefaultDockerClient.builder()
.uri(URI.create("http://localhost:2375")).build();
or
docker = DefaultDockerClient.builder()
.uri(URI.create("http://localhost:2376")).build();
or
docker = DefaultDockerClient.builder()
.uri(URI.create("https://localhost:2376"))
.build();
==> Connect to localhost:2376 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
Wthat is my environment on EC2 VPS:
$ ls -l /var/run
lrwxrwxrwx 1 root root 6 Nov 14 07:23 /var/run -> ../run
$ groups ec2-user
ec2-user : ec2-user adm wheel systemd-journal docker
$ ls -l /run/docker.sock
srw-rw---- 1 root docker 0 Feb 14 17:16 /run/docker.sock
echo $DOCKER_HOST $DOCKER_CERT_PATH
(empty)
This situation is similar to https://github.com/spotify/docker-client/issues/838#issuecomment-318261710.
You use docker-compose on the host to start up your application; Within the container, the Spring Boot application is using docker-spotify-api.
What you can try is to mount /var/run/docker.sock:/var/run/docker.sock in you compose file.
As #Benjah1 indicated, /var/run/docker.sock had to be mounted first.
To do so in a docker-compose / Docker Swarm environment you can do:
volumes:
- /var/run/docker.sock:/var/run/docker.sock
Furthermore, the other options resulted in errors because the default setting of Docker is that it won't open up to tcp/http connections. You can change this, of course, taking a small risk.
What is your DOCKER_HOST and DOCKER_CERT_PATH env vars value.
Try below as docker-client communicates with your local Docker daemon using the HTTP Remote API
final DockerClient docker = DefaultDockerClient.builder()
.uri(URI.create("https://localhost:2376"))
.build();
please also verify the privileges of docker.sock is it visible to your app and check weather your docker service is running or not as from above screenshot your docker.sock looks empty but if service is running it should contain pid in it
It took me some time to figure out, but I was running https://hub.docker.com/r/alpine/socat/ locally, and also wanted to connect to my Docker daemon and couldn't (same errors). Then it struck me: the solution on that webpage uses 127.0.0.1 as the ip address to bind to. Instead, start that container with 0.0.0.0, and then inside your container, you can do this: DockerClient dockerClient = new DefaultDockerClient("http://192.168.1.215:2376"); (use your own ip-address of course).
This worked for me.

Using kubeadm to init kubernetes 1.12.0 falied:node "xxx" not found

My environment:
CentOS7 linux
/etc/hosts:
192.168.0.106 master01
192.168.0.107 node02
192.168.0.108 node01
On master01 machine:
/etc/hostname:
master01
On master01 machine I execute commands as follows:
1)yum install docker-ce kubelet kubeadm kubectl
2)systemctl start docker.service
3)vim /etc/sysconfig/kubelet
EDIT the file:
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
4)systemctl enable docker kubelet
5)kubeadm init --kubernetes-version=v1.12.0 --pod-network-cidr=10.244.0.0/16 servicecidr=10.96.0.0/12 --ignore-preflight-errors=all
THEN
The first error message:
unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
kubelet.go:2236] node "master01" not found
kubelet_node_status.go:70] Attempting to register node master01
Oct 2 23:32:35 master01 kubelet: E1002 23:32:35.974275 49157
kubelet_node_status.go:92] Unable to register node "master01" with API server: Post https://192.168.0.106:6443/api/v1/nodes: dial tcp 192.168.0.106:6443: connect: connection refused
l don't know why node master01 not found?
l have tried a lot of ways but can't solve the problem.
Thank you!
Your issue also might be caused by firewall rules, restricting tcp connection to 6443 port.
So you can temporary disable firewall on master node to validate this:
systemctl stop firewalld
and then try to perform kubeadm init once again.
Hope it helps.
I'm not sure that the problem is always related to firewall rules.
Consider the steps below:
1 ) Try to resolve your error related to: /etc/kubernetes/pki/ca.crt.
Run kubeadm init again without the --ignore-preflight-errors=all and see if all actions in the [certs] phase took place - for example:
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
.
.
2 ) Try running re-installing kubeadm or running kubeadm init with other version (use the --kubernetes-version=X.Y.Z flag).
3 ) Try restarting kubelet with systemctl restart kubelet.
4 ) Check if the docker version need to be updated.
5 ) Open specific ports which are needed (if you decide to stop firewalld remember to start it right after you resolve the issue).

docker-machine create with digitalocean driver and Ubuntu 16.04 x64 fails

I am attempting to create a docker machine on Digital Ocean, but with the 16.04 LTS instead of the default 15.10. The do-access-token file contains my token.
Here's the script (create-do):
#!/usr/bin/env bash
# Creates a digital-ocean server with Ubuntu 16.04 instead of the default
if [ "$1" != "" ]; then
echo "Creating: " $1
docker-machine \
create \
--driver digitalocean \
--digitalocean-access-token=`cat do-access-token` \
--digitalocean-image=ubuntu-16-04-x64 \
--digitalocean-ipv6=true \
$1
else
echo "Must have server name!"
fi
When I run the script like this:
$ ./create-do ps-server
It successfully allocates the machine at Digital Ocean, then craps out with this:
Creating: ps-server
Running pre-create checks...
Creating machine...
(ps-server) Creating SSH key...
(ps-server) Creating Digital Ocean droplet...
(ps-server) Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Error creating machine: Error running provisioning: Something went wrong
running an SSH command!
command : sudo apt-get update
err : exit status 100
output : Reading package lists...
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
The machine is running, but I can't get to it since the SSH key was apparently not set before things started going wrong.
Anyone seen this before and/or have a work-around?
Update: May 21, 2016
Broken again with same error this morning. Tried 4 times, failed same way each time.
Update: May 20, 2016
This was, according do Digital Ocean's support, due to an issue with their Ubuntu 16.04 image which has now been corrected and I have confirmed that this now works.
Related GitHub issue (not yet closed):
https://github.com/docker/machine/issues/3358
this worked for me:
docker-machine provision your-node
I've taken this solution from here: https://github.com/docker/machine/issues/3358
I hope this helps!

Docker-machine create with generic driver, Certificates not working but SSH does

Im trying to get a docker-machine up and running on a Ubuntu 14.04TSL server in our network. I have installed docker+docker-machine on the server and im able to create the docker-machine on the server with this command from my computer:
docker-machine create --driver generic --generic-ip-address 10.10.3.76 --generic-ssh-key "/Users/username/Documents/keys/mysshkey.pem" --generic-ssh-user ubuntuuser dockermachinename
The command above creates the docker-machine and im able to list it with
docker-machine ls
Im able to SSH to it by running
docker-machine ssh dockermachinename
but when i try to connect the server with (-D for debug information)
docker-machine -D env dockermachinename
I get the following message
Docker Machine Version: 0.5.2 ( 0456b9f )
Found binary path at /usr/local/bin/docker-machine-driver-generic
Launching plugin server for driver generic
Plugin server listening at address 127.0.0.1:54213
() Calling .GetVersion
Using API Version 1
() Calling .SetConfigRaw
() Calling .GetMachineName
(dockermachinename) Calling .GetState
(dockermachinename) Calling .GetURL
Reading CA certificate from /Users/username/.docker/machine/certs/ca.pem
Reading server certificate from /Users/username/.docker/machine/machines/dockermachinename/server.pem
Reading server key from /Users/username/.docker/machine/machines/dockermachinename/server-key.pem
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "10.10.3.76:2376": dial tcp 10.10.3.76:2376: i/o timeout
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
I really need to solve this so all help is appreciated!
On Ubuntu you will need to do following steps:
1. Create user which don't require password
sudo visudo
at the end of file add following line (make sure to specify your username):
username ALL=(ALL:ALL) NOPASSWD: ALL
and then save and exit. And after that add your username to docker group like this (change username with your actual username):
usermod -aG docker username
2. Edit docker config to open 2375 and 2376 ports
sudo systemctl edit docker.service
add following snippet to that file:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// -H tcp://0.0.0.0:2376 -H tcp://0.0.0.0:2375
then save and exit. After that reload config and restart docker deamon with:
sudo systemctl daemon-reload
sudo systemctl restart docker.service
3. Create docker-machine
Remove existing machine which is failing with:
docker-machine rm machine1
and try to create it one more time like this:
docker-machine create -d generic --generic-ip-address ip --generic-ssh-key ~/.ssh/key --generic-ssh-user username --generic-ssh-port 22 machine1
please change ip, key, username and machine1 with you actual values.
If this produce error like this:
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.0.26:2376": tls: oversized record received with length 20527
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
then SSH to your machine and cd into following directory:
cd /etc/systemd/system/docker.service.d/
list all files in it with:
ls -l
you will probably have something like this:
-rw-r--r-- 1 root root 274 Jul 2 17:47 10-machine.conf
-rw-r--r-- 1 root root 101 Jul 2 17:46 override.conf
you will need to delete all files except 10-machine.conf with sudo rm.
After that remove machine you created and create it again. It should now work. I hope this helps. Maybe you already steps 1 and 2 if so then skip them and just try to remove override.conf file or any file in that dir which is not 10-machine.conf.

Resources