Install Docker on a compute cluster with a shared file-system

I have a compute cluster of 16 nodes running CentOS 6.7. Each node has a local disk plus shared, FhGFS-based storage mounted on all nodes. The shared path is '/cluster'.
How can I install Docker so that the image repository lives on /cluster and any node can run containers from that repo? Is there a way to place the image repo in the shared area while installing only the Docker engine on each node? Or, even better, can I install both the image repo and the engine in the shared area and make that single installation usable by all nodes?

You can just modify your Docker daemon configuration so that the runtime root is /cluster:
docker daemon --graph="/cluster"
or
docker daemon -g "/cluster"
If you are using CentOS or RHEL, you would add these options under
/etc/sysconfig/docker
If you are using Debian or Ubuntu, you would change:
/etc/default/docker
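For example, on CentOS the config file might look like this (a minimal sketch; depending on the package version the variable may be named OPTIONS or other_args):
# /etc/sysconfig/docker
OPTIONS="--graph=/cluster"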
This way, all the images you pull will be stored under /cluster, and all your container runtimes will live under /cluster as well. So if you mount /cluster on all your machines, all of them will be able to see the same images.
If you want to share the binary too, just put it under, say, /cluster/bin and add that directory to your $PATH.
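A minimal sketch of sharing the binary this way (paths are illustrative):
# copy the engine binary to the shared area once
cp /usr/bin/docker /cluster/bin/
# then, on every node (e.g. in /etc/profile.d/docker.sh):
export PATH="/cluster/bin:$PATH"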
You might also want to look at Docker Swarm, which is Docker's native clustering support. Although not ready for primetime as of today, it's worth looking at.

Related

How to retrieve docker images of an older docker?

I had Docker 18 when I pulled some Docker images. Then I upgraded to Docker 20, but it seems there are no images left; docker images lists nothing. Can I somehow retrieve them, or should I pull them again?
A Docker container consists of network settings, volumes, and images. The location of Docker files depends on your operating system. Here is an overview for the most used operating systems:
Linux: /var/lib/docker/
Windows: C:\ProgramData\DockerDesktop
MacOS: ~/Library/Containers/com.docker.docker/Data/vms/0/
If you use the default storage driver on Linux, overlay2, then your Docker images are stored in /var/lib/docker/overlay2. There you can find different files that represent the read-only layers of a Docker image and a layer on top of them that contains your changes.
If the update overwrote that folder, you'll have to pull again.
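If you're unsure where your installation keeps its data, you can ask the daemon directly (a quick sketch using Docker's built-in formatting):
docker info --format '{{.DockerRootDir}}'   # typically /var/lib/docker on Linux
sudo ls /var/lib/docker/overlay2            # the stored image layers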

Can a docker image use hadoop?

Can a Docker image access Hadoop resources, e.g. submit YARN jobs and access HDFS? Something like MapR's Datasci. Refinery, but for Hortonworks HDP 3.1. (You may assume that the image will be launched on a Hadoop cluster node.)
I saw the Hadoop docs for launching Docker applications from Hadoop nodes, but I'm interested in whether one can go the other way: starting a Docker image with the conventional docker -ti ... command and having that application be able to run hadoop jars etc. (assuming that the Docker host is itself a Hadoop node). I understand that MapR Hadoop has Docker images for doing this, but I'm interested in using Hortonworks HDP 3.1. Ultimately I'm trying to run h2o Hadoop in a Docker container.
Does anyone know if this is possible, or can confirm that it is not?
Yes. As long as your container has the client JARs and appropriate configs, similar to an edge node, it should work.
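For illustration, a hedged sketch of what that might look like on an HDP node, mounting the node's client configs and libraries into the container (the image name my-hadoop-client is hypothetical; the mounted paths are typical HDP locations and may differ on your cluster):
docker run -ti \
  -v /etc/hadoop/conf:/etc/hadoop/conf:ro \
  -v /usr/hdp:/usr/hdp:ro \
  -e HADOOP_CONF_DIR=/etc/hadoop/conf \
  my-hadoop-client bash
# inside the container, hdfs/yarn commands and hadoop jar should then
# resolve the cluster from the mounted configs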

How are Packer and Docker different? Which one should I prefer when provisioning images?

How are Packer and Docker different? Which one is easier/quickest to provision/maintain, and why? What are the pros and cons of having a Dockerfile?
Docker is a system for building, distributing and running OCI images as containers. Containers can be run on Linux and Windows.
Packer is an automated build system to manage the creation of images for containers and virtual machines. It outputs an image that you can then take and run on the platform you require.
For v1.8 this includes: Alicloud ECS, Amazon EC2, Azure, CloudStack, DigitalOcean, Docker, Google Cloud, Hetzner, Hyper-V, Libvirt, LXC, LXD, 1&1, OpenStack, Oracle OCI, Parallels, ProfitBricks, Proxmox, QEMU, Scaleway, Triton, Vagrant, VirtualBox, VMware, Vultr.
Docker's Dockerfile
Docker uses a Dockerfile to manage builds; it defines a specific set of instructions and rules for how you build a container.
Images are built in layers. Each FROM, RUN, ADD, or COPY command modifies the layers included in an OCI image. These layers can be cached, which helps speed up builds. Each layer can also be addressed individually, which helps with disk and download usage when multiple images share layers.
Dockerfiles have a bit of a learning curve; it's best to look at some of the official Docker images for practices to follow.
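To see the layering for yourself, docker history lists each layer of an image along with the instruction that created it and its size (the image name is just an example):
docker history ubuntu:20.04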
Packer's Docker builder
Packer does not require a Dockerfile to build a container image. The docker plugin takes an HCL or JSON config file, which starts the image build from a specified base image (like FROM).
Packer then allows you to run standard system config tools, called "provisioners", on top of that image: tools like Ansible, Chef, Salt, shell scripts, etc.
This image will then be exported as a single layer, so you lose the layer caching/addressing benefits compared to a Dockerfile build.
Packer allows some modifications to the build container environment, like running as --privileged or mounting a volume at build time, that Docker builds will not allow.
You might want to use Packer if you want to build images for multiple platforms from the same setup. It also makes it easy to reuse existing build scripts if there is a provisioner for them. A minimal sketch of such a config is shown below.
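As a hedged sketch of what a Packer Docker-builder config might look like (the base image, package, and file name are illustrative; newer Packer releases may also need a required_plugins block and packer init):
cat > docker-ubuntu.pkr.hcl <<'EOF'
source "docker" "ubuntu" {
  image  = "ubuntu:20.04"
  commit = true
}

build {
  sources = ["source.docker.ubuntu"]

  provisioner "shell" {
    inline = ["apt-get update && apt-get install -y nginx"]
  }
}
EOF
packer build docker-ubuntu.pkr.hcl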
Expanding on "Which one is easier/quickest to provision/maintain and why? What are the pros and cons of having a Dockerfile?":
From personal experience learning and using both, I found: (YMMV)
docker configuration was easier to learn than packer
docker configuration was harder to coerce into doing what I wanted than packer
speed difference in creating the image was negligible, after development
docker was faster during development, because of the caching
the docker daemon consumed some system resources even when not using docker
there are a handful of processes running as the daemon
I did my development on Windows, though I was targeting Linux servers for running the images.
That isn't an issue during development, except for a foible of running Docker on Windows.
The docker daemon reserves various TCP port ranges for itself
The ranges might change every time you reboot your system or restart the daemon
The only error message is to the effect of "can't use that port!", without saying why it can't be used.
BTW, the workaround (see the sketch after these steps) is to:
turn off the hypervisor
reboot
reserve the public ports you want your host system to see
turn on the hypervisor
reboot
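The port reservation in step 3 is commonly done with netsh; a sketch assuming you want to protect port 8080 (adjust the port and count to taste; the second command lists the currently excluded ranges):
netsh int ipv4 add excludedportrange protocol=tcp startport=8080 numberofports=1
netsh int ipv4 show excludedportrange protocol=tcp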
Running Packer on Windows, however, the issue I found is that the provisioner I wanted to use, Ansible, doesn't run on Windows.
Sigh.
So I ended up having to run Packer on a Linux system after all.
Just because I was feeling perverse, I wrote a Dockerfile so I could run both Packer and Ansible from my Windows station in a Docker container using that image.
Docker builds images using a Dockerfile.
These can then be run as Docker containers.
Packer also builds images, but you don't need a Dockerfile, and you get the option of using provisioners such as Ansible, which let you create vastly more customisable images. Packer isn't used for running these images.

Which Docker images will run on Kubernetes?

How can I find out if a given Docker image can be run using Kubernetes?
What should I do to help ensure that my images will run well in any Kubernetes-managed environment?
All Docker images can be run on Kubernetes -- it uses Docker to run the images.
You can expose ports from containers just like when using Docker directly, pass in environment variables, mount storage volumes from the host into the container, and more.
If you have anything particular in mind, I'd be interested in hearing about any image you find that can't be run using Kubernetes.
It depends on the processor architecture of the machine. If the image is compatible with the underlying hardware architecture, the K8s master node should be able to deploy the container. I had this problem when I tried to deploy a Docker container on a Raspberry Pi 3 (an ARM machine) using a Docker image built for x86-64.
As a practical test, try to deploy a container from the following image on an x86-64 machine:
docker pull arifch2009/hello
The following error will be shown:
standard_init_linux.go:178: exec user process caused "exec format error"
This is a simple application that prints "Hello World". However, the program inside the image is compiled for the ARM architecture, so the binary cannot be executed on anything other than an ARM machine.
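You can check an image's target architecture before trying to run it (a small sketch using Docker's formatting syntax; it prints the compiled architecture, e.g. arm vs. amd64):
docker pull arifch2009/hello
docker image inspect --format '{{.Architecture}}' arifch2009/hello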

How do I run Docker on Google Compute Engine?

What's the procedure for installing and running Docker on Google Compute Engine?
Until the recent GA release of Compute Engine, running Docker was not supported on GCE (due to kernel restrictions), but with the newly announced ability to deploy and use custom kernels, that restriction no longer applies and Docker now works great on GCE.
Thanks to proppy, the instructions for running Docker on Google Compute Engine are now documented for you here: http://docs.docker.io/en/master/installation/google/. Enjoy!
They now have a VM image with Docker pre-installed:
$ gcloud compute instances create instance-name \
    --image projects/google-containers/global/images/container-vm-v20140522 \
    --zone us-central1-a \
    --machine-type f1-micro
https://developers.google.com/compute/docs/containers/container_vms
A little late, but I wanted to add an answer with a more detailed workflow and links, since answers are still rather scattered:
Create a Docker image
a. Locally
b. Using Google Container Builder
Push the local Docker image to Google Container Registry
docker tag <current name>:<current tag> gcr.io/<project name>/<new name>
gcloud docker -- push gcr.io/<project name>/<new name>
UPDATE
If you have upgraded to Docker client versions above 18.03, gcloud docker commands are no longer supported. Instead of the above push, use:
docker push gcr.io/<project name>/<new name>
If you have issues after upgrading, see more here.
Create a compute instance.
This process actually hides a number of steps. It creates a virtual machine (VM) instance using Google Compute Engine, which uses a Google-provided, container-optimized OS image. The image includes Docker and additional software responsible for starting our Docker container. Our container image is then pulled from the Container Registry and run using docker run when the VM starts. Note: you still need to use docker attach even though the container is running. It's worth pointing out that only one container can be run per VM instance. Use Kubernetes to deploy multiple containers per VM (the steps are similar). Find more details on all the options in the links at the bottom of this post.
gcloud beta compute instances create-with-container <desired instance name> \
--zone <google zone> \
--container-stdin \
--container-tty \
--container-image <google repository path>:<tag> \
--container-command <command (in quotes)> \
--service-account <e-mail>
Tip: you can view available gcloud projects with gcloud projects list.
SSH into the compute instance.
gcloud beta compute ssh <instance name> \
--zone <zone>
Stop or delete the instance. If an instance is stopped, you will still be billed for resources such as static IPs and persistent disks. To avoid being billed at all, delete the instance.
a. Stop
gcloud compute instances stop <instance name>
b. Delete
gcloud compute instances delete <instance name>
Related Links:
More on deploying containers on VMs
More on zones
More create-with-container options
As of now, for just Docker, the Container-optimized OS is certainly the way to go:
gcloud compute images list --project=cos-cloud --no-standard-images
It comes with Docker and Kubernetes preinstalled. The only thing it lacks is the Cloud SDK command-line tools. (It also lacks python3, despite Google's announcement of the Python 2 sunset on 2020-01-01. Well, there are still 27 days to go...)
As an additional piece of information: I was searching for a standard image that would offer both docker and gcloud/gsutil preinstalled (and found none, oops). I don't think I'm alone in this boat, as gcloud is the thing you can hardly get by without on GCE¹.
My best find so far was the Ubuntu 18.04 image, which comes with its own (non-Debian) package manager, snap. The image has the Cloud SDK preinstalled, and Docker installs literally in a snap: 11 seconds on an F1 instance in an initial test, about 6s on an n1-standard-1. The only snag I hit was an error message that the Docker authorization helper was not available; an attempt to add it with gcloud components install failed because the SDK was installed as a snap, too. However, the helper is actually there, only not in the PATH. The following got me both tools available in a single transient builder VM with the least amount of setup-script runtime, starting from the supported Ubuntu 18.04 LTS image²:
snap install docker
ln -s /snap/google-cloud-sdk/current/bin/docker-credential-gcloud /usr/bin
gcloud -q auth configure-docker
¹ I needed both for a Daisy workflow imaging a disk with artifacts from GS buckets as well as a couple of huge, 2GB+ library images from the local gcr.io registry that were shared between the build (as cloud builder layers) and the runtime (where I had to create and extract containers onto the newly built image). But that's beside the point; one may need both tools for a multitude of possible reasons.
² Use gcloud compute images list --uri | grep ubuntu-1804 to get the most current one.
Google's GitHub site now offers a GCE image that includes Docker: https://github.com/GoogleCloudPlatform/cloud-sdk-docker-image
It's as easy as:
creating a Compute Engine instance
curl https://get.docker.io | bash
Using docker-machine is another way to provision a Google Compute instance with Docker.
docker-machine create \
--driver google \
--google-project $PROJECT \
--google-zone asia-east1-c \
--google-machine-type f1-micro $YOUR_INSTANCE
If you want to log in to this machine on Google Cloud, just use docker-machine ssh $YOUR_INSTANCE.
Refer to the docker-machine GCE driver documentation.
There is now improved support for containers on GCE:
Google Compute Engine is extending its support for Docker containers. This release is an Open Preview of a container-optimized OS image that includes Docker and an open source agent to manage containers. Below, you'll find links to interact with the community interested in Docker on Google, open source repositories, and examples to get started. We look forward to hearing your feedback and seeing what you build.
Note that this is currently (as of 27 May 2014) in Open Preview:
This is an Open Preview release of containers on Virtual Machines. As a result, we may make backward-incompatible changes and it is not covered by any SLA or deprecation policy. Customers should take this into account when using this Open Preview release.
Running Docker on a plain GCE instance is not supported; the instance goes down and you are not able to log in again. We can instead use the Docker image provided by GCE to create an instance.
If your Google Cloud virtual machine is based on Ubuntu, use the following command to install Docker:
sudo apt install docker.io
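A quick sanity check after installing (standard Docker/systemd commands; the hello-world image is Docker's stock test image):
sudo systemctl enable --now docker   # ensure the daemon is running
sudo docker run hello-world          # verify the engine can pull and run a container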
You may use this link: https://cloud.google.com/cloud-build/docs/quickstart-docker#top_of_page.
That link explains how to use Cloud Build to build a Docker image and push it to Container Registry. You first build the image using a Dockerfile and then build the same image using Cloud Build's build configuration file.
It's better to set this up while creating the compute instance:
Go to the VM instances page.
Click the Create instance button to create a new instance.
Under the Container section, check Deploy container image.
Specify a container image name under Container image and configure options to run the container if desired. For example, you can specify gcr.io/cloud-marketplace/google/nginx1:1.12 for the container image.
Click Create.
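The equivalent from the command line is the create-with-container command shown earlier; a minimal sketch using the example image above (the instance name is illustrative):
gcloud compute instances create-with-container nginx-vm \
    --container-image gcr.io/cloud-marketplace/google/nginx1:1.12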
Installing Docker on GCP Compute Engine VMs:
This is the link to GCP documentation on the topic:
https://cloud.google.com/compute/docs/containers#installing
It links to the Docker install guide; follow the instructions there for whichever Linux distribution is running in the VM.
