I use custom images (AMIs) configured for machine learning on GPU-enabled EC2 instances.
This means CUDA, libcudnn6, nvidia-docker, etc. are all properly set up on them.
However, when Kops starts new nodes from these AMIs (I use cluster-autoscaler), it overrides my properly set-up Docker.
How can I prevent that?
For now I run a custom script on startup that re-installs nvidia-docker properly, but that's obviously not ideal.
Kops will only install Docker if there is a difference between the version it expects to use and the version already installed on the node.
Note that Kops will downgrade Docker if the installed version is higher than what it expects!
So the solution to my problem was to bake a Docker version into the AMI that matches spec.docker.version.
For this we had to downgrade Docker to 17.03.2 and nvidia-docker to 2.0.3+docker17.03.2-1.
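For reference, a rough sketch of that fix (the cluster name is a placeholder, and the version shown must match whatever is baked into your AMI):

kops edit cluster my-cluster.example.com
# In the editor, pin the Docker version to the one pre-installed in the AMI:
#   spec:
#     docker:
#       version: "17.03.2"
kops update cluster my-cluster.example.com --yes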
When I run sudo apt install linux-modules-extra-$(uname -r) in a Docker container based on Ubuntu 20.04, on a single-board computer running Ubuntu 18.04, I get the following errors:
E: Unable to locate package linux-modules-extra-4.15.0-143-generic
E: Couldn't find any package by glob 'linux-modules-extra-4.15.0-143-generic'
E: Couldn't find any package by regex 'linux-modules-extra-4.15.0-143-generic'
This makes me wonder whether it is even possible to install linux-modules-extra-4.15.0-143-generic in Ubuntu 20.04. Maybe it is only compatible with Ubuntu 18.04?
Could anyone clarify this for me please?
In general, if you're building a kernel module, it has to match exactly the kernel that's running on the host system. If you're using a native Debian or Ubuntu system (without Docker), there is a mechanism (DKMS) by which kernel modules are rebuilt or reinstalled whenever the host kernel is updated. See for example the Debian wiki KernelDKMS page.
In contrast, a Docker image is generally supposed to be portable across hosts. If you upgrade the host's kernel, or if you run a FROM ubuntu:18.04 image on an Ubuntu 20.04 host, the image isn't really supposed to be aware of this.
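You can see this directly: a container reports the host's kernel, whatever release the image is based on. A quick check, assuming Docker is available:

# Prints the HOST's kernel release (e.g. 4.15.0-143-generic on an
# Ubuntu 18.04 host), even though the image is Ubuntu 20.04:
docker run --rm ubuntu:20.04 uname -r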
In your particular case, you can't get the package you need from inside the container, because modules for the host's 4.15 kernel aren't part of the Ubuntu 20.04 distribution the image is based on. For this particular case it might be possible to fetch the package from the Ubuntu 18.04 archive instead, but that won't work in the general case: maybe the system is actually running plain Debian or RHEL and the kernel build is different, or maybe the operator built their own kernel.
Since a Linux kernel module is so specific to the host it runs on, and since it can bypass any and all security concerns, it's not appropriate to try to install one in a container. Do it directly on the host instead.
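As a minimal sketch of the host-side approach (the module name at the end is a placeholder):

# On the HOST, not in a container: install the modules package that
# matches the running kernel, then load the module. Containers share
# the host kernel, so the loaded module is visible inside them too.
sudo apt-get update
sudo apt-get install "linux-modules-extra-$(uname -r)"
sudo modprobe some_module    # placeholder module name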
I brought an older MacBook back into use. It previously had boot2docker installed, from before the native Docker for Mac existed. That might be the root cause of my issue.
I've installed the new Docker for Mac, but when I run docker-compose I get the following error:
docker.errors.TLSParameterError: Path to a certificate and key files must be provided through the client_config param. TLS configurations should map the Docker CLI client configurations. See https://docs.docker.com/engine/articles/https/ for API details.
I don't want to set up a docker-machine with VirtualBox or anything like that. I just want to run Docker natively, like a fresh Docker for Mac installation. All the solutions I've found so far require me to use a docker-machine.
Fixed it by unsetting all the legacy docker-machine environment variables, so that the CLI talks to the native daemon:
unset ${!DOCKER_*}
I found the solution on the Docker troubleshooting page.
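To verify that the cleanup took effect:

# No legacy variables should remain, and the CLI should now reach
# the native Docker for Mac daemon:
env | grep '^DOCKER_'    # should print nothing
docker version           # should show both Client and Server sections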
I have downloaded the Docker binary, version 1.8.2, and copied it to my backup server (a CentOS server without internet connectivity). I marked it as executable and started the Docker daemon as described in https://docs.docker.com/engine/installation/binaries/. But it doesn't seem to get installed as a service: for every command, I have to run sudo ./docker-1.8.2 {command}, and sudo docker version shows command not found. Is there a way to install docker-engine as a service? I'm a newbie to Docker setup. Please advise.
Why not download the RPM package (there are also CentOS 6 packages), copy it to a USB stick and then to your server, and simply install it with the rpm command? That way you'd get the same installation as if you had run yum.
Of course some dependencies may be missing, but you can download those as well.
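Roughly, assuming you grab the era-appropriate docker-engine RPM for your CentOS release (the file name below is illustrative):

# On the offline server, after copying the RPM(s) over via USB:
sudo rpm -ivh docker-engine-1.8.2-1.el7.centos.x86_64.rpm
sudo service docker start    # CentOS 6; use systemctl on CentOS 7
sudo docker version          # Docker now runs as a regular service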
Firstly, if you're downloading bare binaries on an enterprise Linux, you're probably doing things in a very bad way. You're immediately breaking updates and consistency, and leaving your system in a risky, messy state.
Try using yumdownloader --resolve to get the Docker package and anything it needs.
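A sketch of that route, run on an internet-connected machine with the same CentOS release and the Docker repository configured (the package name is era-appropriate but an assumption):

sudo yum install -y yum-utils          # provides yumdownloader
yumdownloader --resolve docker-engine  # fetches the package plus its dependencies
# Copy the downloaded *.rpm files to the offline server, then:
sudo rpm -ivh *.rpm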
A better option may be to mirror the installation artifacts and pull them from the local mirror, but that's beyond the scope of this answer if you aren't doing that already.
I want to install Docker 1.3.1 on my CentOS 6.5 environment, but I have no idea how to find it in EPEL. I'm quite new to Docker. Can anyone help me out? Thanks.
It is clearly stated in the Docker documentation:
Docker runs on CentOS 7.X.
CentOS 6.5 is not CentOS 7.X. Docker is not available for your old operating system.
Furthermore, you didn't give any details about your computer, but you should remain aware that Docker only works on 64-bit systems.
By the way, you should take better care of your computer; in CentOS, the minor version number is updated automatically by the package manager. So the fact that you are two versions behind (CentOS 6 is currently on 6.7) indicates that you are not performing updates to your packages, and could have various security vulnerabilities. You should update your system regularly, by simply running yum update.
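Checking and fixing that takes two commands:

cat /etc/centos-release    # shows the current minor release
sudo yum update -y         # brings all packages, and with them the minor version, current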
I have been working with LXC containers, following the basic tutorials and some networking setup, and it seems to me that it's a very straightforward and simple way to run a pure distribution on top of my host.
The current list of available templates does not, however, include an RHEL x.x distribution. There is CentOS.
I see that Red Hat has supported some LXC efforts via the libvirt driver, but that shows as deprecated on their site, and everything points to their Atomic Host, which I am experimenting with anyway; that, however, seems more of a Docker way. There might be some variation of Docker that would ultimately give me a bare-minimum container running a full distro.
I am OK with getting more into Docker, but what I want at this moment is a simple LXC container running an RHEL 6.x distro. Is there no way to run an RHEL LXC container?
It is indeed unfortunate that Red Hat plans to discontinue libvirt support for LXC, even within RHEL 7; that means RHEL 6 may be the last version where it is supported for the lifetime of the release.
As an alternative, there are packages for LXC in EPEL: https://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/lxc.html
They are even easier to use than libvirt-lxc.
As for the template, in either case you should be able to use the CentOS template with little modification: all the packages are the same, and really only the repo sources should point to Red Hat instead of CentOS. A rough sketch follows.
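This assumes the EPEL packages; the container name is a placeholder, and repointing the repos at Red Hat requires a valid RHEL subscription or a local RHEL mirror:

sudo yum install -y lxc lxc-templates
sudo lxc-create -n rhel6 -t centos -- --release 6
# Before installing anything else, edit the container's repo
# definitions to point at Red Hat instead of CentOS:
#   /var/lib/lxc/rhel6/rootfs/etc/yum.repos.d/*.repo
sudo lxc-start -n rhel6
sudo lxc-console -n rhel6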