Moving Docker Containers Around

I would like to use this Docker container:
https://registry.hub.docker.com/u/cptactionhank/atlassian-confluence/dockerfile/
My concern is that if I have to wind up moving this Docker container to another machine (or it quits for some reason and needs to be restarted), all the data (server config and other items stored on the file system) will be lost. How do I ensure that this data isn't lost?
Thanks!

The first rule of Docker containers is don't locate your data inside your application container. Data that needs to persist beyond the lifetime of the container should be stored in a Docker "volume", either mounted from a host directory or from a data-only container.
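The two volume styles above can be sketched in Python that only builds and prints the `docker` command lines (nothing is executed). The container names, the host directory `/srv/confluence-home`, and the in-container data path `/var/atlassian/confluence` are assumptions for illustration; check the image's Dockerfile for the directory it actually uses.

```python
import shlex

# Assumptions for illustration only: names, host path and in-container path.
IMAGE = "cptactionhank/atlassian-confluence"
APP_DATA = "/var/atlassian/confluence"   # assumed data directory inside the image
HOST_DIR = "/srv/confluence-home"        # hypothetical directory on the Docker host
DATA_CONTAINER = "confluence-data"       # hypothetical data-only container name


def host_dir_run_cmd():
    """Option 1: bind-mount a host directory over the application's data path."""
    return ["docker", "run", "-d", "--name", "confluence",
            "-v", f"{HOST_DIR}:{APP_DATA}", IMAGE]


def data_only_container_cmds():
    """Option 2: a data-only container owns the volume; the app borrows it."""
    # `docker create` never starts the container; it only exists to hold the volume.
    create = ["docker", "create", "-v", APP_DATA,
              "--name", DATA_CONTAINER, IMAGE, "/bin/true"]
    run = ["docker", "run", "-d", "--name", "confluence",
           "--volumes-from", DATA_CONTAINER, IMAGE]
    return [create, run]


if __name__ == "__main__":
    print(shlex.join(host_dir_run_cmd()))
    for cmd in data_only_container_cmds():
        print(shlex.join(cmd))
```

Either way, removing and re-creating the `confluence` container leaves the data in place, but the data still lives on one host; getting it onto another host is the shared-storage problem discussed next.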
If you want to be able to start containers on different hosts and still have access to your data, you need to make sure your data is available on those hosts. This problem isn't unique to Docker; it's the same problem you would have if you wanted to scale an application across hosts without using Docker.
Solutions include:
A network filesystem like NFS.
A cluster filesystem like Gluster.
A non-filesystem based data store, like a database, or something like Amazon S3.
This is not necessarily an exhaustive list but hopefully gives you some ideas.

Related

Bind mount volume between host and pod containers in Kubernetes

I have a legacy application that stores some config/stats in a directory on the OS partition (e.g. /config/), and I am trying to run it as a stateful container in a Kubernetes cluster.
I am able to run it as a container, but due to the inherently ephemeral nature of containers, whatever data my container writes to the OS partition directory /config/ is lost when the container goes down or is destroyed.
I have written the Kubernetes deployment file so that the container is brought back to life, albeit as a new instance either on the same host or on another host, but this new container has no access to the data written by the previous instance.
If it were a plain Docker container, I could get this working using bind mounts, so that whatever data the container writes to its OS partition directory is saved in a host directory and any new instance has access to the data written by the previous instance.
But I could not find any alternative for this in Kubernetes.
I could use hostPath provisioning, but hostPath provisioning right now works only for a single-node Kubernetes cluster.
Is there a way to get this working in a multi-node Kubernetes cluster? Is there any option other than hostPath provisioning? I can get the containers to talk to each other and sync the data between nodes, but how do we bind-mount a host directory into a container?
Thanks for your help in advance!
This is what Volumes and VolumeMounts in your Pod definition are for. Your lead about hostPath points in the right direction, but, as you have seen yourself, you need a different volume type when the data has to be available across a cluster.
Take a look at https://kubernetes.io/docs/concepts/storage/volumes/ for a list of supported storage backends. Depending on your infrastructure, you might find one that suits your needs, or you might need to actually set up a backing service for one (e.g. an NFS server, Gluster, Ceph and so on).
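As a rough illustration of Volumes and VolumeMounts with a cluster-friendly backend, the Python below builds a Pod manifest that mounts an NFS export at the /config directory from the question. The server name, export path, and image are hypothetical; in practice the same `volumes`/`volumeMounts` pair would go into the pod template of your Deployment.

```python
import json

# Hypothetical values: replace with your NFS server, export path and image.
NFS_SERVER = "nfs.example.internal"
NFS_PATH = "/exports/legacy-config"
IMAGE = "legacy-app:1.0"


def pod_with_nfs_volume():
    """Pod whose /config directory lives on an NFS export, not on the node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "legacy-app"},
        "spec": {
            "containers": [{
                "name": "legacy-app",
                "image": IMAGE,
                # Mount the volume declared below at the path the app writes to.
                "volumeMounts": [{"name": "config", "mountPath": "/config"}],
            }],
            "volumes": [{
                "name": "config",
                # Any node can mount this export, so a rescheduled pod sees the same files.
                "nfs": {"server": NFS_SERVER, "path": NFS_PATH},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests: python pod.py > pod.json && kubectl apply -f pod.json
    print(json.dumps(pod_with_nfs_volume(), indent=2))
```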
If you want to add another abstraction layer to make a universal manifest that works in different environments (e.g. with storage provided by a cloud provider, or provisioned manually depending on particular needs), you will want to get familiar with PV and PVC (https://kubernetes.io/docs/concepts/storage/persistent-volumes/). But, as I said, they are essentially an abstraction over the basic volumes, so you need to solve that first issue anyway.
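A minimal sketch of the PV/PVC layer, again as Python emitting JSON for kubectl: the Pod refers only to a claim name, and the cluster binds that claim to a PersistentVolume. It assumes the cluster has a default StorageClass (or a pre-created PV) that supports ReadWriteMany; the claim name, size, and image are hypothetical.

```python
import json

CLAIM = "legacy-config"   # hypothetical claim name


def pvc():
    """Ask the cluster for 1Gi of storage that several nodes can mount read-write."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": CLAIM},
        "spec": {
            # ReadWriteMany needs a backend that supports it (e.g. NFS or CephFS).
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": "1Gi"}},
        },
    }


def pod_using_claim():
    """The pod references the claim by name; it no longer cares what backs it."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "legacy-app"},
        "spec": {
            "containers": [{
                "name": "legacy-app",
                "image": "legacy-app:1.0",  # hypothetical image
                "volumeMounts": [{"name": "config", "mountPath": "/config"}],
            }],
            "volumes": [{"name": "config",
                         "persistentVolumeClaim": {"claimName": CLAIM}}],
        },
    }


if __name__ == "__main__":
    # A v1 List lets both objects live in one file for `kubectl apply -f`.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [pvc(), pod_using_claim()]}, indent=2))
```

Swapping NFS for a cloud disk or Ceph then only changes the PV or StorageClass, not the Pod.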

Persistent storage solution for Docker on AWS EC2

I want to deploy a Node-RED server on my AWS EC2 cluster. I got the Docker image up and running without problems. Node-RED stores the user flows in a folder named /data, so when the container is destroyed the data is lost. I have read about several solutions where you mount a local folder into a volume. What is a good way to deal with persistent data in AWS EC2?
My initial thoughts are to use an S3 volume or to mount a volume in the task definition.
It is possible to use a volume driver plugin with docker that supports mapping EBS volumes.
Flocker was one of the first volume managers, it supports EBS and has evolved to support a lot of different back ends.
Cloudstor is Docker's volume plugin (it comes with Docker for AWS/Azure).
Blocker is an EBS only volume driver.
S3 doesn't work well for all file system operations, because you can't update part of an object: updating 1 byte of a file means writing the entire object again. At the time of writing, it was also not immediately consistent, so a write followed by a read might give you odd or stale results (AWS has since made S3 strongly read-after-write consistent).
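To make the whole-object point concrete, the toy model below compares patching one byte of a regular file with patching one byte of an object in a store that only supports whole-object put/get. `ToyObjectStore` is a hypothetical in-memory stand-in, not the S3 API.

```python
import os
import tempfile


class ToyObjectStore:
    """In-memory stand-in for an object store: whole objects in, whole objects out.

    This is NOT the S3 API; it only models the 'no partial update' property.
    """

    def __init__(self):
        self._objects = {}
        self.bytes_uploaded = 0

    def put(self, key, data):
        self._objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key):
        return self._objects[key]


def update_byte_in_object(store, key, offset, value):
    data = bytearray(store.get(key))  # download the whole object
    data[offset] = value              # change one byte locally
    store.put(key, data)              # upload the whole object again


def update_byte_in_file(path, offset, value):
    # A regular file can be patched in place: seek, then write a single byte.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([value]))


if __name__ == "__main__":
    payload = b"x" * 1_000_000
    store = ToyObjectStore()
    store.put("example-object", payload)
    before = store.bytes_uploaded
    update_byte_in_object(store, "example-object", 10, ord("y"))
    print("object store re-uploaded", store.bytes_uploaded - before, "bytes")

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example-file")
        with open(path, "wb") as f:
            f.write(payload)
        update_byte_in_file(path, 10, ord("y"))
        print("file patched in place; size still", os.path.getsize(path))
```

For the 1,000,000-byte payload, the one-byte change costs a full 1,000,000-byte upload in the object store, while the file is modified in place.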
An EBS volume can only be attached to one instance, which means you can only run your Docker containers on one EC2 instance. Assuming you would like to scale your solution in the future with many containers running in an ECS cluster, you need to look into EFS, a shared file system from AWS. The main trade-off is that EFS generally performs worse than EBS.
The easiest (and most common) approach is to run your container with the -v /path/to/host_folder:/path/to/container_folder option, so the container refers to the host folder and the data survives when the container is restarted or recreated. The Docker volumes documentation has more detail.
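The `-v` approach for the Node-RED case looks roughly like this (Python that builds the command without running it). The image name `nodered/node-red` and the host folder `/srv/node-red-data` are assumptions; `/data` is the folder named in the question, and 1880 is Node-RED's default port.

```python
import shlex

IMAGE = "nodered/node-red"        # assumed image name; use the one you already run
HOST_DIR = "/srv/node-red-data"   # hypothetical folder on the EC2 host


def node_red_run_cmd(host_dir=HOST_DIR):
    return [
        "docker", "run", "-d",
        "--name", "node-red",
        "-p", "1880:1880",            # Node-RED's default editor/HTTP port
        "-v", f"{host_dir}:/data",    # flows written to /data now live on the host
        IMAGE,
    ]


if __name__ == "__main__":
    print(shlex.join(node_red_run_cmd()))
```

The flows now survive container removal, but they still live on that one instance's disk, which is the limitation the EBS/EFS answers address.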
I would use AWS EFS. It is like a NAS in that you can have it mounted to multiple instances at the same time.
If you are using ECS for your docker host the following guide may be helpful http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_efs.html
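Current ECS versions can also mount EFS directly from a task definition through `efsVolumeConfiguration` (the linked guide may describe the older approach of mounting EFS on each container instance first). A sketch of such a task definition, built as a Python dict; the file system ID, memory value, and image name are placeholders, and the EFS mount targets must accept NFS traffic (TCP 2049) from the instances.

```python
import json

FILE_SYSTEM_ID = "fs-0123456789abcdef0"  # hypothetical EFS file system ID


def task_definition():
    return {
        "family": "node-red",
        "containerDefinitions": [{
            "name": "node-red",
            "image": "nodered/node-red",   # assumed image name
            "memory": 512,                 # placeholder hard memory limit (MiB)
            "portMappings": [{"containerPort": 1880}],
            # Mount the task-level volume below at Node-RED's /data folder.
            "mountPoints": [{"sourceVolume": "node-red-data",
                             "containerPath": "/data",
                             "readOnly": False}],
        }],
        "volumes": [{
            "name": "node-red-data",
            # The same EFS file system can be mounted by tasks on any instance.
            "efsVolumeConfiguration": {"fileSystemId": FILE_SYSTEM_ID,
                                       "rootDirectory": "/"},
        }],
    }


if __name__ == "__main__":
    # e.g. save as taskdef.json, then:
    # aws ecs register-task-definition --cli-input-json file://taskdef.json
    print(json.dumps(task_definition(), indent=2))
```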

Docker Volume Containers for database, logs and metrics

I have an application that uses an embedded DB and also generates logs and raw metrics to the following directory structure:
/opt/myapp/data/
    database/
    logs/
    raw_metrics/
I am in the process of learning Docker and am trying to "Dockerize" this app and am trying to find a mounting/volume solution that accomplishes the following goals for me:
The embedded database is stored in the same mounted volume regardless of how many container instances of myapp that I have running. In other words, all container instances write their data to the shared database/ volume; and
I'd also prefer the same for my logs and raw metrics (that is: all container instances write logs/metrics to the same shared volume), except here I need to be able to distinguish log and metrics data for each container. In other words, I need to know that container X generated a particular log message, or that container Y responded to a request in 7 seconds, etc.
I'm wondering what the standard procedure is here in Docker-land. After reading the official Docker docs as well as this article on Docker Volumes my tentative approach is to:
Create a Data Volume Container and mount it to, say, /opt/myapp on the host machine
I can then configure my embedded database to read DB contents from/write them to /opt/myapp/database, and I believe (if I understand what I've read correctly), all container instances will be sharing the same DB
Somehow inject the container ID or some other unique identifier into each container instance, and refactor my logging and metrics code to include that injected ID when generating logs or raw metrics, so that I might have, say, an /opt/myapp/logs/containerX.log file, an /opt/myapp/logs/containerY.log file, etc. But I'm very interested in what the standard practice is here for log aggregation amongst Docker containers!
Also, and arguably much more importantly, is the fact that I'm not sure that this solution would work in a multi-host scenario where I have a Swarm/cluster running dozens of myapp containers on multiple hosts. Would my Data Volume Container magically synchronize the /opt/myapp volume across all of the hosts? If not, what's the solution for mounting shared volumes for containers, regardless of whatever host they're running on? Thanks in advance!
There are multiple good questions. Following are some of my answers.
The default logging driver used by Docker is json-file, which captures stdout and stderr in JSON format. There are other logging drivers (like syslog, fluentd, Logentries etc.) that can send logs to a central log server. Using central logging also avoids having to maintain log volumes ourselves. All Docker logging drivers are listed here (https://docs.docker.com/engine/admin/logging/overview/#supported-logging-drivers).
If you use Swarm mode with services, there is the concept of service logs, which contain the logs of all containers associated with the service (https://docs.docker.com/engine/reference/commandline/service_logs/).
By default, the logging driver tags Docker log messages with the container ID. We can customize this tag using log options (https://docs.docker.com/engine/admin/logging/log_tags/).
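If you also want the application itself to stamp each line with an instance identifier, as the question suggests, one option is a logging filter that adds the ID to every record while the app logs to stdout for Docker's logging driver to collect. This sketch assumes the container's hostname is still the default (the short container ID); `make_logger` and `InstanceIdFilter` are hypothetical names.

```python
import logging
import socket
import sys


class InstanceIdFilter(logging.Filter):
    """Attach an instance identifier to every log record."""

    def __init__(self, instance_id):
        super().__init__()
        self.instance_id = instance_id

    def filter(self, record):
        record.instance_id = self.instance_id
        return True


def make_logger(name="myapp", stream=sys.stdout, instance_id=None):
    # Inside a container the hostname defaults to the short container ID,
    # unless it was overridden with --hostname.
    if instance_id is None:
        instance_id = socket.gethostname()
    handler = logging.StreamHandler(stream)  # stdout is what Docker's log driver captures
    handler.addFilter(InstanceIdFilter(instance_id))
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(instance_id)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers.clear()      # avoid duplicate handlers if called twice
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("responded to request in %.1f s", 7.0)
```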
For sharing data such as a database across containers on the same host, we can use host-based volumes. This will not work across nodes, as there is no automatic sync. For sharing container data across nodes, we can use either a shared filesystem (like NFS, Ceph, Gluster) or Docker volume plugins (EBS, GCE).
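For the shared-filesystem route, Docker's built-in `local` volume driver can mount an NFS export without a plugin. The sketch below only builds the commands; the server address, export path, volume name, and image are hypothetical. Whether an embedded database tolerates several containers writing to the same files over NFS depends on the database, and many do not, so check that before sharing `database/` this way.

```python
import shlex

# Hypothetical NFS server address, export path and volume name.
NFS_ADDR = "10.0.0.10"
NFS_EXPORT = "/exports/myapp"
VOLUME = "myapp-data"


def nfs_volume_create_cmd():
    # Docker's built-in "local" driver can mount NFS; run this on *each* host,
    # because Docker volumes themselves are per-host objects.
    return ["docker", "volume", "create", "--driver", "local",
            "--opt", "type=nfs",
            "--opt", f"o=addr={NFS_ADDR},rw",
            "--opt", f"device=:{NFS_EXPORT}",
            VOLUME]


def app_run_cmd(image="myapp:latest"):  # hypothetical image name
    # Every container on every host mounts the same NFS-backed volume.
    return ["docker", "run", "-d", "-v", f"{VOLUME}:/opt/myapp/data", image]


if __name__ == "__main__":
    print(shlex.join(nfs_volume_create_cmd()))
    print(shlex.join(app_run_cmd()))
```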

Docker swarm NFS volumes

I am playing with Docker 1.12's swarm mode orchestration, but there is one issue I have not been able to find an answer to.
If you're running a service like nginx or redis, you don't need to worry about data persistence.
But if you're running a service like a database, you need data persistence: if something happens to your Docker instance, the manager will reschedule the container onto one of the available nodes, and by default Docker doesn't move data volumes to other nodes. To address this problem we can use third-party plugins like Flocker (https://github.com/ClusterHQ/flocker) or Rexray (https://github.com/emccode/rexray).
But the problem with this is that when a node fails, you lose the data; Flocker or Rexray do not deal with this.
We can solve this with something like NFS: if I mount the same volume across all my nodes, we don't have to move the data between nodes. If one of the nodes fails, the new container only needs to remember the Docker mount location. Can we do this? If so, can we achieve this with Docker swarm's built-in orchestration?
With Rexray, the data is stored outside the Docker swarm nodes (in Amazon S3, OpenStack Cinder, ...). So if you lose a node, you won't lose your persistent data. If your scheduler starts a new container that needs the data on another host, it will retrieve the external volume using the Rexray plugin and you're good to go.
Note: your external provider needs to allow you to perform forced detach of the volume from the now unavailable old nodes.
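On the NFS idea in the question: swarm services can declare an NFS-backed volume inline with `--mount`, so whichever node runs the task mounts the same export and nothing has to be moved when a node fails (the NFS server then becomes the component that needs to be highly available). A sketch that builds the command; every name, address, and path is hypothetical.

```python
import shlex


def swarm_nfs_service_cmd(name="db", image="mydb:latest",            # hypothetical
                          nfs_addr="10.0.0.10", export="/exports/db",  # hypothetical
                          target="/var/lib/mydb"):                     # hypothetical
    # Each node that runs a task creates the volume locally from this spec,
    # so a rescheduled task re-mounts the same NFS export on its new node.
    # Values containing commas (e.g. "addr=...,rw") would need CSV quoting here.
    mount = ",".join([
        "type=volume",
        f"source={name}-data",
        f"target={target}",
        "volume-driver=local",
        "volume-opt=type=nfs",
        f"volume-opt=device=:{export}",
        f"volume-opt=o=addr={nfs_addr}",
    ])
    return ["docker", "service", "create", "--name", name,
            "--replicas", "1", "--mount", mount, image]


if __name__ == "__main__":
    print(shlex.join(swarm_nfs_service_cmd()))
```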

How do I do docker clustering or hot copy a docker container?

Is it possible to hot-copy a Docker container? Or to set up some sort of clustering with Docker for HA purposes?
Can someone simplify this?
How to scale Docker containers in production
Docker containers are not designed to be VMs and are not really meant for hot-copies. Instead you should define your container such that it has a well-known start state. If the container goes down the alternate should start from the well-known start state. If you need to keep track of state that the container generates at run time this has to be done externally to docker.
One option is to use volumes to mount the state (files) onto the host filesystem. Then use NFS or another shared or replicated storage system to make that file system available to other physical nodes. Then you can mount the same files into a second Docker container on a second host with the same state.
Depending on what you are running in your containers, you can also handle state sharing inside your containers, for example using MongoDB replica sets. To reiterate, though, containers are not yet designed to be migrated with runtime state.
There is a variety of technologies around Docker that could help, depending on what you need HA-wise.
If you simply wish to start a stateless service container on a different host, you need a network overlay, such as Weave.
If you wish to replicate data across for something like database failover, you need a storage solution, such as Flocker.
If you want to run multiple services with load balancing, without caring which host each container runs on as long as X instances are up, then Kubernetes is the kind of tool you need.
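As a rough sketch of the Kubernetes case, a Deployment states how many instances should be up and a Service load-balances across them wherever they run. The Python below emits both objects as one JSON `List` for `kubectl apply -f`; the names, labels, ports, and image are hypothetical.

```python
import json

LABELS = {"app": "web"}   # hypothetical label


def deployment(replicas=3):
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web"},
        "spec": {
            "replicas": replicas,                 # Kubernetes keeps this many pods running
            "selector": {"matchLabels": LABELS},  # must match the template labels
            "template": {
                "metadata": {"labels": LABELS},
                "spec": {"containers": [{
                    "name": "web",
                    "image": "stateless-web:1.0",  # hypothetical image
                    "ports": [{"containerPort": 8080}],
                }]},
            },
        },
    }


def service():
    # A Service load-balances across whichever pods carry the labels, on any node.
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "web"},
        "spec": {"selector": LABELS,
                 "ports": [{"port": 80, "targetPort": 8080}]},
    }


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [deployment(), service()]}, indent=2))
```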
It is possible to make many Docker-related tools work together, we have a few stories on our blog already.
