Why a docker-container or a kubernetes-pod is considered disposable? - docker

I read in many books and documentations[1][2] that a docker-container or a pod are considered to be disposable and have a short lifetime. why are they considered so ephemeral? In such case how can one run a containerized application in production ?
Besides, do the two terms disposable-container and immutable-container mean the same ?
[1] https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/
[2] https://developers.redhat.com/blog/2016/02/24/10-things-to-avoid-in-docker-containers/

Besides, are the two terms disposable-container and immutable-container mean the same ?
Immutable means once it's created, it cannot be changed. Disposable, in the context of your question, means it can be destroyed and replaced with little consequence.
These things are not the same, but they operate together in a typical containerized application. You will be running an immutable container, and when you want to change the behavior of the container you would replace it with a new container instead of trying to change the existing container.
This is different than something like a VM, where you would deploy code updates and reload the service or similar if you wanted to change the behavior of your app.
why are they considered so ephemeral?
A container is a process. A process is ephemeral. Containers are ephemeral.
Containers are able to persist data separately though.
how can one run a containerized application in production
If your hangup with using containers in production can be rephrased "how can you run a containerized application in production with no state?", then I'd first say not all applications are stateful. A basic web server, or a great many well-designed micro services, for example.
For stateful applications, nothing stops you from using a common database to back your containerized applications. You can also use volumes, as described above. You could combine the two and run a containerized database, using volumes underneath the database container. State isn't really a problem.

Related

What purpose to ephemeral volumes serve in Kubernetes?

I'm starting to learn Kubernetes recently and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services - but they usually end there with some example microservice app built using a combination of those four. When it comes to databases they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");
fs.writeFileSync("test.txt", "blah");
const value = fs.readFileSync("test.txt", "utf8");
console.log(value);
I suspect this would properly output "blah" and not crash due to an inability to write to disk (note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images in my cluster yet -- I've only loaded images already on Docker Hub so far)
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the types of ephemeral volumes highlight two major things:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from, and where it gets mounted, and it hides any content that was in the underlying image. The underlying storage happens to not be persistent if a pod is deleted and recreated, unlike persistent volumes.
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The main need for this concept is driven from microservices and orchestration processes, and also guided by 12 factor app. But why?
Because, one major use case is when you are deploying a number of microservices (and their replicas) using containers across multiple machines in a cluster you don't want a container to be reliant on its own storage. If containers rely on their on storage, shutting them down and starting new ones affects the way your app behaves, and this is something everyone wants to avoid. Everyone wants to be able to start and stop containers whenever they want, because this allows easy scaling, updates, etc.
When you actually need a service with persistent data (like DB) you need a special type of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume, just to be sure that your data will persist even after container is stopped. But if you want to just load balance across hundreds of stateless API services, ephemeral containers is what you actually want.

Docker Swarm - Deploying stack with shared code base across hosts

I have a question related with the best practices for deploying applications to the production based on the docker swarm.
In order to simplify discussion related with this question/issue lets consider following scenario:
Our swarm contains:
6 servers (different hosts)
on each of these servers, we will have one service
each service will have only one task/replica docker running
Memcached1 and Memcached2 uses public images from docker hub
"Recycle data 1" and "Recycle data 2" uses custom image from private repository
"Client 1" and "Client 2" uses custom image from private repository
So at the end, for our example application, we have 6 dockers running across 6 different servers. 2 dockers are memcached, and 4 of them are clients which are communicating with memcached.
"Client 1" and "Client 2" are going to insert data in the memcached based on the some kind of rules. "Recycle data 1" and "Recycle data 2" are going to update or delete data from memcached based on some kind of rules. Simple as that.
Our applications which are communicating with memcached are custom ones, and they are written by us. The code for these application reside on github (or any other repository). What is the best way to deploy this application to the production:
Build images which will contain copied code within the image which you can use to deploy things to the swarm
Build image which will use volume where code reside outside of the image.
Having in mind that I am deploying swarm to the production for the first time, I can see a lot of issues with way number 1. Having a code incorporate to the images seems non logical to me, having in mind that in 99% of the time, the updates which are going to happen are going to be code based. This will require building image every time when you want to update the code which runs on specific docker (no matter how small that change is).
Way number 2. seems much more logical to me. But at this specific moment I am not sure is this possible? So there are a number of questions here:
What is the best approach in case where we are going to host multiple dockers which will run the same code in the background?
Is it possible on docker swarm, to have one central host,server (manager, anywhere) where we can clone our repositories and share those repositores as volumes across the docker swarm? (in our example, all 4 customer services will mount volume where we have our code hosted)
If this is possible, what is the docker-compose.yml implementation for it?
After digging more deeper and working with docker and docker swarm mode for last 3 months, these are the answers on questions above:
Answer 1: In general, you should consider your docker image as "compiled" version of your program. Your image should contain either code base, or compiled version of the program (depends which programming language you are using), and that specific image represents your version of the app. Every single time when you want to deploy your next version, you will generate the new image.
This is probably best approach for 99% of the apps which are going to be hosted with the docker (exceptions are development environments and apps where you really want to bash and control things directly from the docker container by itself).
Answer 2: It is possible but it is extremely bad approach. As mentioned in answer one, the best one is to copy the app code directly into the image and "consider" your image (running container) as "app by itself".
I was not able to wrap my head around this concept at the begging, because this concept will not allow you to simply go to the server (or where ever you are hosting your docker) and change the app and restart docker (obviously because container will be at the same beginning again after restart using the same image, same base of code you deployed with that image). Any kind of change SHOULD and NEEDS to be deployed as different image with different version. That is what docker is all about.
Additionally, initial idea for sharing same code base across multiple swarm services is possible, but it totally ruins purpose of the versioning across docker swarm.
Consider having 3 services which are used as redundant services (failover), and you want to use new version on one of them as beta test. This will not be possible with the shared code base.

Backup-friendly Docker volumes

I want to take a holistic approach backing up multiple machines running multiple Docker containers. Some might run, for example, Postgres databases. I want to back up this system, without having to have specific backup commands for different types of volumes.
It is fine to have a custom external script that sends e.g. signals to containers or runs Docker commands, but I strongly want to avoid anything specific to a certain image or type of image. In the example of Postgres, the documentation suggests running postgres-specific commands to backup databases, which goes against the design goals for the backup solution I am trying to create.
It is OK if I have to impose restrictions on the Docker images, as long as it is reasonably easy to implement by starting from existing Docker images and extending.
Any thoughts on how to solve this?
I just want to stress that I am not looking for a solution for how to back up Postgres databases under Docker, there are already many answers explaining how to do so. I am specifically looking for a way to back up any volume, without having to know what it is or having to run specific commands for its data.
(I considered whether this question belonged on SO or Serverfault, but I believe this is a problem to be solved by developers, hence it belongs here. Happy to move it if consensus is otherwise)
EDIT: To clarify, I want do something similar to what is explained in this question
How to deal with persistent storage (e.g. databases) in docker
but using the approach in the accepted answer is not going to work with Postgres (and I am sure other database containers) according to documentation.
I'm skeptical that there is a custom solution, holistic, multi machine, multi container, application/container agnostic approach. From my point of view there is a lot of orchestration activities necessary in the first place. And I wonder if you wouldn't use something like Kubernetes anyways that - supposedly - comes with its own backup solution.
For single machine, multi container setup I suggest to store your container's data, configuration, and eventual build scripts within one directory tree (e.g. /docker/) and use a standard file based backup program to backup the root directory.
Use docker-compose to managed your containers. This lets you store the configuration and even build options in a file(s). I have an individual compose file for each service, but a single one would also work.
Have a subdirectory for each service. Mount bind-mount directories aka volumes of the container there. If you need to adapt the build process more thoroughly you can easily store scripts, sources, Dockerfiles, etc. in there as well.
Since containers are supposed to be ephemeral, all persistent data should be in bind-mount and therefore in the main docker directory.

Container with app + DB

I've been making some tests with docker and so far I'm wondering why it's considered a good practice to separate the DB and the app in two containers.
Having two containers seems to be cumbersome to manage and I don't really see the value in it.
Whereas I like the idea of having a self sustainable container per app.
One reason is the separation of data storage and application. If you you put both in their own container, you can update them independently. In my experience this is a common process, because usually the application will evolve faster than the underlying database.
It also frees you to run the containers in different places, which might be a constraint in your operations. Or to run multiple containers from the same database image with different applications.
Often it is also a good thing to be able to scale the UI from one instance to multiple instance, all connected to the same database (or cache instance or HTTP backend). This is mentioned briefly in the docker best practices.
I also understand the urge to run multiple processes in one container. That's why so many minimalist init systems/supervisors like s6 came up lately. I prefer this for demos of applications which require a couple things, like an nginx for frontend, a database and maybe a redis instance. But you could also write a basic docker-compose file and run the demo with multiple containers.
It depends on what you consider your "DB", is it the database application or the content.
The latter is easy, the content needs to be persisted outside the lifetime of the application. The convention used to be to have a "data" container, which simplified linking it with the application (e.g. using the Docker Engine create command --volumes-from parameter). With Docker 1.9 there is a new volume API which has superceded the concept of "data" containers. But you should never store your data in the overlay filesystem (if not only for persistence, but for performance).
If you are referring to a database application, you really enter a semi-religious debate with the microservices crowd. Docker is built to run single process. It is built for 12-factor apps. It is built for microservices. It is definitely possible to run more than one process in a container, but with it you have to consider the additional complexity of managing/monitoring these processes (e.g. using an init process like supervisord), dealing with logging, etc.
I've delivered both. If you are managing the container deployment (e.g. you are hosting the app), it is actually less work to use multiple containers. This allows you to use Docker's abstraction layers for networking and persistent storage. It also provides maximum portability as you scale the application (perhaps you may consider using convoy or flocker volume drivers or an overlay network for hosting containers across multiple servers). If you are developing a product for distribution, it is more convenient to deliver a single Docker Repository (with one Image). This minimizes the support costs as you guide customers through deployment.

Docker application deployment DEV vs TEST

I'm really struggling to grasp the workflow of Docker. The issue is: where exactly are the deliverables?
One would expect the developers image to be the same one as the one used for testing, production.
But how can one develop use auto-reload and such(probably by some shared volumes) without building the image again and again?
The image for testers should be just fire and you are ready to go. How are the images split?
I heard something about data-container which holds probably the app deliverables. So does it mean that I will have one container for DB, one for App. Server and one versioned image for my code itself?
The issue is ,where exactly are the deliverables.
static deliverables (which never changes) are directly copied in the image.
dynamic deliverables (which are generated during a docker run session, or which are updated) are in volumes (either host mounted volume or data container volume), in order to have persistence across container life-cycle.
does it mean that I will have one container for DB, one for App.
Yes, in addition of your application container (which is what docker primarily is: it puts applications in container), you would have data container in order to isolate the data that needs persistence.

Resources