I am trying to install a local docker.io registry on a CentOS 7
machine following the instructions here:
https://github.com/docker/docker-registry#quick-start
I ran (EDITED, just to show docker is running):
# service docker restart && cd && docker run -p 5000:5000 registry
After a few minutes looking at the prompt, I got a bunch of errors like this:
[...]
OSError: [Errno 2] No such file or directory: './registry._setup_database.lock'
[2015-03-06 16:39:11 +0000] [13] [INFO] Worker exiting (pid: 13)
[2015-03-06 16:39:11 +0000] [14] [INFO] Worker exiting (pid: 14)
Traceback (most recent call last):
File "/usr/local/bin/gunicorn", line 11, in <module>
sys.exit(run())
File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 74, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 185, in run
super(Application, self).run()
File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 71, in run
Arbiter(self).run()
File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 196, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 292, in halt
self.stop()
File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 343, in stop
time.sleep(0.1)
File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 209, in handle_chld
self.reap_workers()
File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 459, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
EDITED:
Details of the system:
docker --version
Docker version 1.3.2, build 39fa2fa/1.3.2
System:
cat /etc/centos-release
CentOS Linux release 7.0.1406 (Core)
uname -a
Linux denis1 3.10.0-123.20.1.el7.x86_64 #1 SMP Thu Jan 29 18:05:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Any ideas what I may be doing wrong?
Where should this file be? './registry._setup_database.lock'
EDITED2:
If I try it on my Ubuntu 14.10 laptop, where I installed a newer version of Docker via a PPA, it works:
# Upgraded to docker 1.5 via a ppa package in my Ubuntu laptop:
sudo add-apt-repository ppa:docker-maint/testing
sudo apt-get update
sudo apt-get install docker.io
# pull registry latest
sudo docker pull registry:latest
latest: Pulling from registry
fa4fd76b09ce: Pull complete
1c8294cc5160: Download complete
117ee323aaa9: Download complete
2d24f826cb16: Download complete
777c3edddace: Download complete
f06997673ad7: Download complete
7eafad9a1f16: Download complete
daa8104aee86: Download complete
418dcd975ba2: Download complete
30bff528d188: Download complete
a4f468439f7f: Download complete
e5a8e33139de: Download complete
024a18254446: Download complete
a68f5599e08a: Download complete
511136ea3c5a: Download complete
Status: Downloaded newer image for registry:latest
Any ideas what I should do to get the same result on my CentOS server?
Is there a more recent Docker client I can install on CentOS 7 via yum?
Disable SELinux and firewalld.
SELinux can prevent execution of certain commands run via sudo, which is what inhibits the registry here.
Check firewalld as well.
Neither upgrading the Docker version nor using the "latest" tag will solve this problem; I tried them all. It comes down to SELinux and/or firewalld, so disable both if you can.
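As a sketch, disabling both on CentOS 7 might look like the following (note this reduces the machine's security posture; re-enable them or write proper SELinux policies once things work):

```shell
# Put SELinux into permissive mode for the current boot
sudo setenforce 0

# Make it persistent across reboots
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config

# Stop firewalld and keep it from starting at boot
sudo systemctl stop firewalld
sudo systemctl disable firewalld
```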
I upgraded Docker to 1.5.0 by adding this repo file under /etc/yum.repos.d:
[virt7-testing]
name=virt7-testing
baseurl=http://cbs.centos.org/repos/virt7-testing/x86_64/os/
enabled=1
gpgcheck=0
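With that repo file in place (saved, say, as /etc/yum.repos.d/virt7-testing.repo — the filename is my choice), the upgrade itself is just:

```shell
# Refresh metadata and pull docker from the virt7-testing repo
sudo yum clean all
sudo yum --enablerepo=virt7-testing update docker
docker --version
```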
I feel like we can summarize things like this:
The documentation for the 'registry' image says it is officially supported only on Docker 1.5.0. You're running 1.3, and there's a pretty big jump between these.
Supported Docker versions
This image is officially supported on Docker version 1.5.0.
Support for older versions (down to 1.0) is provided on a best-effort
basis.
On CentOS 6.5, you can yum install docker from EPEL (Docker instructions) and get Docker 1.5. Earlier than 6.5, you must yum install docker-io and it appears that 1.4 is the latest available version from EPEL.
In my experience, Docker's support on RedHat-family systems has been poorer than on Debian-family systems, but this gap has closed in the most recent versions of RH and Docker.
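For reference, a sketch of the install commands by release (package names as of the EPEL packages mentioned above; untested here):

```shell
# CentOS 6.5 and later: the package is named "docker"
sudo yum install -y epel-release
sudo yum install -y docker

# Before CentOS 6.5: the package is named "docker-io" instead
# sudo yum install -y docker-io
```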
Related
I am trying to set up a small Kubernetes cluster on my Ubuntu 18.04 LTS server. Every step is done, but checking the GPU status fails. The container keeps reporting errors:
1. Issue Description
I have followed the steps in the Quick-Start, but when I run the test case, it reports an error.
2. Steps to reproduce the issue
Run this shell command:
docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it \
  -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
  nvidia/k8s-device-plugin:1.9
Check the errors:
2020/02/09 00:20:15 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2020/02/09 00:20:15 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2020/02/09 00:20:15 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2020/02/09 00:20:15 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2020/02/09 00:20:15 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
3. Environment Information
- outputs of nvidia-docker run --rm dlws/cuda nvidia-smi
NVIDIA-SMI 440.48.02 Driver Version: 440.48.02 CUDA Version: 10.2
- contents of /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
docker version: 19.03.2
kubernetes version: 1.15.2
Finally I found the answer; I hope this post is helpful for others who encounter the same issue:
For kubernetes 1.15, use k8s-device-plugin:1.11 instead. The version 1.9 is not able to communicate with kubelet.
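The same test command from the question, with only the plugin tag changed to 1.11, would look like this (the other flags are unchanged; this needs a GPU node with kubelet running, so it is not verified here):

```shell
docker run --security-opt=no-new-privileges --cap-drop=ALL \
  --network=none -it \
  -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
  nvidia/k8s-device-plugin:1.11
```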
I have the following prerequisites: node v10.16.3, go v1.8.1 linux/amd64, Docker (Podman) v1.0.5, and Docker Compose 1.24.1, all on the PATH. When I execute "sudo ./byfn.sh up" I am able to generate the crypto certificates without issues. However, the error "docker-compose: command not found" appears at line 175, where docker-compose is executed in NetworkUp. I tried uninstalling and reinstalling Docker, and couldn't find anything relevant on Stack Overflow either. Can someone please bail me out?
error
./byfn.sh: line 175: docker-compose: command not found
Emulate the Docker CLI using Podman, and create /etc/containers/nodocker to quiet the emulation message.
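On RHEL 8 (matching the environment info below), one way to do this — assuming the podman-docker shim package is available in your repos — is:

```shell
# Install the shim that provides a "docker" command backed by podman
sudo yum install -y podman-docker

# Silence the "Emulate Docker CLI using podman" notice
sudo touch /etc/containers/nodocker

# docker-compose is a separate tool; if it is missing, one common
# route is installing it via pip (assumption: python3-pip is present)
sudo pip3 install docker-compose
```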
Additional info
host:
BuildahVersion: 1.6-dev
Conmon:
package: podman-1.0.5-1.gitf604175.module+el8.0.0+4017+bbba319f.x86_64
path: /usr/libexec/podman/conmon
version: 'conmon version 1.14.0-dev, commit: 6ee5ad0285ca12e6c8d0b663d7a8db5323812ef6-dirty'
Distribution:
distribution: '"rhel"'
version: "8.0"
kernel: 4.18.0-80.4.2.el8_0.x86_64
os: linux
I have a cloud init script
#cloud-config
package_upgrade: true
packages:
  - openjdk-8-jdk
  - apt-transport-https
  - git
  - jq
groups:
  - docker
users:
  - default
  - name: jenkins
    groups: docker
    homedir: /var/lib/jenkins
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-rsa xyz
This is given to the Jenkins EC2 plugin when starting an Ubuntu 18.04 AMI.
When jenkins tries to connect to the instance the logs show:
INFO: Verifying: java -fullversion
sh: 1: java: not found
Nov 01, 2018 8:22:10 PM null
INFO: Installing: sudo yum install -y java-1.8.0-openjdk.x86_64
sudo: no tty present and no askpass program specified
Nov 01, 2018 8:22:10 PM null
WARNING: Failed to install: sudo yum install -y java-1.8.0-openjdk.x86_64
sh: 1: java: not found
ERROR: Unable to launch the agent for Ubuntu 18.04 (i-xxx)
java.io.EOFException: unexpected stream termination
If I try to connect to the agent manually again after some time has elapsed (2/3 mins) all is fine:
Agent successfully connected and online
Should the cloud-init script have run before the SSH connection?
I have never had this trouble when using Amazon Linux AMIs, where I install Java 8 in the same way (via a cloud-init script). Is this something specific to the way Amazon Linux runs cloud-init scripts vs Ubuntu?
In the end I decided it was easier to install java and create a new AMI to fully avoid this issue.
I think that perhaps my expectations that cloud init would run fully before connecting might be incorrect, mainly because of this comment in the documentation
Allow enough time for the instance to launch and execute the directives in your user data, and then check to see that your directives have completed the tasks you intended.
Perhaps one approach to solving this would be to stop sshd while things install and start it again when everything is done; Jenkins would then connect only once the instance is ready.
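A sketch of that idea (untested; the service name and module ordering are my assumptions — cloud-init runs bootcmd early on each boot and runcmd near the end, after the packages module):

```yaml
#cloud-config
bootcmd:
  # Stop sshd early so Jenkins cannot connect mid-provisioning
  - systemctl stop ssh
package_upgrade: true
packages:
  - openjdk-8-jdk
runcmd:
  # Everything is installed by this point; let Jenkins in
  - systemctl start ssh
```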
I am using the Docker installation of TensorFlow.
I initiate the container using
nvidia-docker run -it -p 8888:8888 -v /*/Data/docker:/docker --name TensorFlow gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash
This allows me to link a folder named "docker" on my secondary local drive with a folder inside the Docker container.
The issue is that whenever my computer (Ubuntu - GTX 1070 - 6700k Intel CPU) goes to sleep, the GPU becomes unavailable and code runs only on CPU. When I run the code in ipython notebook session inside docker I get:
failed call to cuInit: CUDA_ERROR_UNKNOWN.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: 123456c234ds
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: 123456c234ds
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.9.3 (Ubuntu 4.9.3-13ubuntu2)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 367.57.0
When I restart the computer, the GPU becomes available again without the UNKNOWN message.
I have searched the Internet, and solutions such as sudo apt-get install nvidia-modprobe do not solve the issue.
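One workaround often suggested for CUDA_ERROR_UNKNOWN after suspend (an assumption here, not verified on this exact setup) is to reload the nvidia_uvm kernel module after resume instead of rebooting:

```shell
# Unload and reload the unified-memory kernel module after resume;
# fails if a process still holds the GPU, so stop containers first
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
```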
I'm using Ansible 1.7 (devel) and Docker 0.9.1 build 3600720, and I'm stuck with this error:
failed: [myapp.com] => {"failed": true, "item": "", "parsed": false}
invalid output was: Traceback (most recent call last):
File "/root/.ansible/tmp/ansible-tmp-1400951250.7-173380463612813/docker", line 1959, in <module>
main()
File "/root/.ansible/tmp/ansible-tmp-1400951250.7-173380463612813/docker", line 693, in main
containers = manager.create_containers(1)
File "/root/.ansible/tmp/ansible-tmp-1400951250.7-173380463612813/docker", line 548, in create_containers
if docker.utils.compare_version('1.10', self.client.version()['ApiVersion']) < 0:
KeyError: 'ApiVersion'
Any ideas? Is there any combination of versions that works? I needed Ansible 1.7 because of the 'running' state that was added for docker containers.
I ran into this issue today and decided to fix it. The gist of the problem is that older versions of Docker don't have an ApiVersion specified (or the docker-py package doesn't return it).
I've submitted a pull request to fix this issue in the ansible docker module here: https://github.com/ansible/ansible/pull/7619
Alternatively you could upgrade your docker version to get around it.
I got the same error and this fixed it
$ sudo apt-get install -y python-pip
$ sudo pip install docker-py
As said in a more general answer, use the docker_api_version: auto argument:
- name: Mongo data container
  docker:
    docker_api_version: auto
    name: mongo-primary-dc
    image: debian:wheezy
    state: present
    volumes:
      - /data