Stateful applications on Docker

I am learning Docker. A Docker book I am reading says "it is not recommended to run stateful applications (i.e. database engines) on Docker". Yet I also heard from a friend of mine that he runs MySQL on Docker with no problems.
Is it good practice to run stateful applications on Docker? What are the scenarios where Docker fits best?

The problem with stateful Docker applications is that, by default, they store their state (data) in the container's filesystem. Once you update your software version or want to move to another machine, it is hard to retrieve the data from there.
What you need to do is bind a volume to the container and store any data in the volume. This volume might live on the host running the container or somewhere else.
If you run your container with docker run -v hostFolder:/containerFolder, any changes to /containerFolder will be persisted in hostFolder. Something similar can be done with an NFS drive: then you can run your application on any host machine and the state will be saved on the NFS drive.
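As a minimal sketch of both approaches (the host path, MySQL password, NFS server address, and export path are illustrative assumptions):

# bind-mount a host directory into the container
docker run -d -e MYSQL_ROOT_PASSWORD=secret -v /srv/mysql-data:/var/lib/mysql mysql:8

# or create a named volume backed by an NFS export, usable from any host that can reach the server
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/exports/mysql-data \
  mysql-nfs
docker run -d -e MYSQL_ROOT_PASSWORD=secret -v mysql-nfs:/var/lib/mysql mysql:8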

Related

I'm still confused by Docker containers and images

I know that containers are a form of isolation between the app and the host (the managed running process). I also know that container images are basically the package for the runtime environment (hopefully I got that correct). What's confusing to me is when they say that a Docker image doesn't retain state. So if I create a Docker image with a database (like PostgreSQL), wouldn't all the data get wiped out when I stop the container and restart? Why would I use a database in a Docker container?
It's also difficult for me to grasp LXC. On another question page I see:
LinuX Containers (LXC) is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (LXC host).
What does that exactly mean? Does it mean I can have multiple versions of Linux running on the same host as long as the host supports LXC? What else is there to it?
LXC and Docker are completely different, but both are container technologies.
There are two types of containers:
1. Application containers: their main purpose is to provide an application with its dependencies. These are Docker containers (lightweight containers). They run as processes on your host and get everything done that you want; they literally don't need an OS image or a boot-up phase, so they come and go in a matter of seconds. You can run multiple processes/services inside a Docker container if you want, but it is laborious and not the intended use. Here, resources (CPU, disk, memory) are shared with the host.
2. System containers: these are fat containers, meaning they are heavy and need OS images to launch themselves. At the same time, they are not as heavy as virtual machines; they are very similar to VMs but differ a bit in architecture.
For example, take Ubuntu as the host machine. If you have LXC installed and configured on your Ubuntu host, you can run a CentOS container, an Ubuntu container (with a different version), a RHEL, a Fedora, or any other Linux flavour on top of that Ubuntu host. You can also run multiple processes inside an LXC container. Here, too, resources are shared.
So if you have a huge application running in one LXC container that requires more resources, while another application running in a second LXC container requires fewer, the container with the smaller requirement will share its resources with the container that needs more.
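As an illustration, with the LXC userland tools installed on an Ubuntu host, something like the following creates and starts a CentOS container (the container name, distribution, release, and architecture are illustrative):

lxc-create -n centos7 -t download -- -d centos -r 7 -a amd64
lxc-start -n centos7
lxc-attach -n centos7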
Answering Your Question:
So if I create a Docker image with a database (like PostgreSQL), wouldn't all the data get wiped out when I stop the container and restart?
You shouldn't build a database Docker image with data baked into it (this is not recommended).
You run/create a container from an image and you attach/mount data to it.
So when you stop/restart a container, the data is never lost as long as you attach it to a volume, because the volume resides somewhere other than the container's filesystem (maybe an NFS server, or the host itself).
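As a minimal sketch with the official PostgreSQL image (the volume and container names are illustrative; POSTGRES_PASSWORD is set only so the container starts):

docker volume create pgdata
docker run -d --name pg -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:16
docker stop pg && docker rm pg
# the data survives in the pgdata volume; a new container picks it up where the old one left off
docker run -d --name pg2 -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:16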
Does it mean I can have multiple versions of Linux running on the same host as long as the host supports LXC? What else is there to it?
Yes, you can do this. We are running LXC containers in our production environment.

Are Docker Volumes machine-specific

I'm new to Docker Swarm. As I understand it, Docker Swarm lets you abstract away the cluster, meaning you don't care on which hardware a container is deployed.
On the other hand, the standard way to handle a database in Docker is to write data outside the Docker container (to avoid copy-on-write behaviour). That's achieved by mounting a volume and writing db-related data to it. The important question here: are volumes machine-specific? Are Docker and Docker Swarm clever enough to mount a volume on the machine where it's needed?
Example:
I have 3 machines and 3 microservices/containers. All of them are deployed through Docker Swarm. Only one microservice/container must connect to a database, so I need to mount a volume on only one machine. But on which?
Databases and similar stateful applications are still a hard thing to deal with when it comes to Docker swarm and other orchestration frameworks. Ideally, containers should be able to run on any node in the swarm, but the problem comes when you need to persist data beyond the container's lifecycle.
Mounting a volume is the Docker way to persist data; however, this ties the container to a specific node, since volumes are created on particular nodes. There are many projects that try to solve this problem by providing some sort of distributed storage.
There was a project called Flocker that dealt with this problem (it is no longer maintained). There is also a newer project called REX-Ray.
Are Docker & Docker Swarm clever enough to mount a Volume on the machine it's needed?
By default, no. Docker swarm will choose one of the nodes and deploy the container on it. However, you can work around this problem:
First, you need to define a named volume in your stack file/Compose file under the service definition.
Second, you need to use node placement constraints to restrict where the database container should run.
If you do not use a distributed storage tool, then for databases and similar stateful containers that need volumes, you have to restrict the container to a specific node, as in the sketch below.
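A minimal stack-file sketch combining both steps (the service name, volume name, node hostname, and password are illustrative):

version: "3.8"
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret          # illustrative only
    volumes:
      - pgdata:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.hostname == node-1      # pin the DB to the node that holds the volume
volumes:
  pgdata: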

Putting databases in their own Docker containers?

I run a complex app with a database backend and many other things all in one container. I notice that Docker images for different database systems are available. When would I want to move something like a DB server to its own container, instead of running everything in the same container? The advantage I have now is that I can deploy everything at once, and I don't have to configure more than one container to get things talking.
Docker (the container manager) uses Linux container technology, and its best abstraction is one process per container. Running multiple processes in a single Docker container is a bad idea; use a container to isolate one process, and use a Docker volume (or a data-volume container) to store the database data, since Docker state is not persistent by default.
Use docker-compose (formerly fig) to wire the two containers, db and web app, together; it will ease your management in the future!
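A minimal docker-compose sketch of that split (the application image and password are illustrative assumptions):

services:
  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: secret   # illustrative only
    volumes:
      - dbdata:/var/lib/mysql
  web:
    image: my-web-app:latest        # hypothetical application image
    depends_on:
      - db
    ports:
      - "8080:8080"
volumes:
  dbdata: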

How do I do docker clustering or hot copy a docker container?

Is it possible to hot-copy a Docker container, or to do some sort of clustering with Docker for HA purposes?
Can someone simplify this?
How to scale Docker containers in production
Docker containers are not designed to be VMs and are not really meant for hot copies. Instead, you should define your container such that it has a well-known start state. If the container goes down, its replacement should start from that well-known start state. If you need to keep track of state that the container generates at run time, this has to be done externally to Docker.
One option is to use volumes to mount the state (files) onto the host filesystem. Then use RAID, NFS, or any other means to share that filesystem with other physical nodes. You can then mount the same files into a second Docker container on a second host, with the same state.
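A rough sketch of that idea (the NFS server name, export path, mount point, and application image are all assumptions):

# on each physical node, mount the same NFS export
mount -t nfs nfs-server:/exports/app-state /mnt/app-state
# then start the container on whichever host should be active, with the shared state bind-mounted
docker run -d -v /mnt/app-state:/var/lib/app myapp:latest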
Depending on what you are running in your containers, you can also handle state sharing inside the containers themselves, for example using MongoDB replica sets. To reiterate, though: containers are not, as of yet, designed to be migrated with their runtime state.
There is a variety of technologies around Docker that could help, depending on what you need HA-wise.
If you simply wish to start a stateless service container on a different host, you need a network overlay, such as Weave.
If you wish to replicate data for something like database failover, you need a storage solution, such as Flocker.
If you want to run multiple services with load balancing and forget which host each container runs on, as long as X instances are up, then Kubernetes is the kind of tool you need.
It is possible to make many Docker-related tools work together; we have a few stories on our blog already.
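For the last case, a minimal Kubernetes sketch (the Deployment name and image are hypothetical) that keeps three instances running and lets the scheduler decide which hosts they land on:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                      # keep three instances up at all times
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: my-web-app:latest   # hypothetical application image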

How to build stateful docker services architecture with CoreOS's fleet?

CoreOS used with fleet enables one to build services that run Docker applications.
But is there any way to run Docker services whose state must be kept between restarts, i.e. remain persistent? For instance, databases, or services that must store files to be shared later on.
Because as far as I know, the service can be launched on the core-1 machine (for example) and on restart will be launched on another machine at random, so the Docker volume can be lost.
The simplest way to maintain a database service is to always schedule its fleet unit to the same machine. You can do this by adding an [X-Fleet] section to the fleet unit file and either pinning the unit to a particular X-ConditionMachineID or using X-ConditionMachineMetadata. See the CoreOS documentation.
You can then persist data outside your Docker containers by mounting a volume from the host machine. The recommended way of doing this is to wrap this data in a separate data container via Docker:
docker run --name mongodb-volume -v /home/core/mongodb-data:/data/db busybox
docker run -d -p 27017:27017 --volumes-from mongodb-volume mongo:latest
Since /home/core/mongodb-data on a particular machine will store the persistent MongoDB state, and the unit will always be scheduled to that same machine, this solves your problem.
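Putting it together, a sketch of the corresponding fleet unit file (the description and the role=database metadata key/value are illustrative assumptions):

[Unit]
Description=MongoDB service

[Service]
ExecStartPre=-/usr/bin/docker rm -f mongodb
ExecStart=/usr/bin/docker run --name mongodb -p 27017:27017 --volumes-from mongodb-volume mongo:latest
ExecStop=/usr/bin/docker stop mongodb

[X-Fleet]
X-ConditionMachineMetadata=role=database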
You can also consider running a distributed filesystem over the CoreOS cluster. That way, whichever machine your database service container ends up on, it will always be able to use the database, mounted from the DFS.
