I have implemented the LAMP stack for a 3rd party forum application on its own dedicated virtual server. One of my aims here was to use a composed docker project (under Git) to encapsulate the application fully. I wanted to keep this as simple to understand as possible for the other sysAdmins supporting the forum, so this really ruled out using S6 etc., and this in turn meant that I had to stick to the standard of one container per daemon service using the docker runtime to do implement the daemon functionality.
I had one particular design challenge that doesn't seem to be addressed cleanly through the Docker runtime system, and that is I need to run periodic housekeeping activities that need to interact across various docker containers, for example:
The forum application requires a per-minute PHP housekeeping task to be run using php-cli, and I only have php-cli and php-fpm (which runs as the foreground deamon process) installed in the php container.
Letsencrypt certificate renewal need a weekly certbot script to be run in the apache container's file hierarchy.
I use conventional /var/log based logging for high-volume Apache access logs as these generate Gb access files that I want to retain for ~7 days in the event of needing to do hack analysis, but that are otherwise ignored.
Yes I could use the hosts crontab to run docker exec commands but this involves exposing application internals to the host system and IMO this is breaking one of my design rules. What follows is my approach to address this. My Q is really to ask for comments and better alternative approaches, and if not then this can perhaps serve as a template for others searching for an approach to this challenge.
All of my containers contain two special to container scripts: docker-entrypoint.sh which is a well documented convention; docker-service-callback.sh which is the action mechanism to implement the tasking system.
I have one application agnostic host service systemctl: docker-callback-reader.service which uses this bash script, docker-callback-reader. This services requests on a /run pipe that is volume-mapped into any container that need to request such event processes.
In practice I have only one such housekeeping container see here that implements Alpine crond and runs all of the cron-based events. So for example the following entry does the per-minute PHP tasking call:
- * * * * echo ${VHOST} php task >/run/host-callback.pipe
In this case the env variable VHOST identifies the relevant docker stack, as I can have multiple instances (forum and test) running on the server; the next parameter (php in this case) identifies the destination service container; the final parameter (task) plus any optional parameters are passed as arguments to a docker exec of php containers docker-service-callback.sh and magic happens as required.
I feel that the strengths of the system are that:
Everything is suitably encapsulated. The host knows nothing of the internals of the app other than any receiving container must have a docker-service-callback.sh on its execution path. The details of each request are implemented internally in the executing container, and are hidden from the tasking container.
The whole implementation is simple, robust and has minimal overhead.
Anyone is free to browse my Git repo and cherry-pick whatever of this they wish.
Comments?
Related
I am in the early stage of developing an image segmentation service. Currently, I have a simple Flask server that is responsible for receiving data and running a docker container with an AI model in the local GPU server. But I also think about something asynchronous like FastAPI or Nodejs to implement some scheduler for prediction tasks. What is better: a) when the server calls the docker container by ssh and the docker container run only when it is called, predicted images, saved results, and stopped, or b) running an API server inside the AI container? Each container is around 5-10GB. Running all containers looks more expensive, but I am not sure what practice is better.
I tried to call the container each time and stop it after work was done.
You should avoid approaches based on dynamically starting containers and approaches based on ssh. I'd recommend a long-running process that accepts some network input, like your existing Flask server, and either always has the ML model running or launches it as a subprocess.
If you can use a subprocess that could be a good match here. When the subprocess exits, all of its memory resources will be automatically cleaned up, so you won't have the cost of the subprocess when it's not being used. If the container happens to exit, the subprocess will get cleaned up with it. Subprocesses are also basic Unix functionality, so you can locally develop your service without needing any particular complex setup.
Dynamically launching containers comes with many challenges. It ties your application to the Docker API, which will make it harder to run, even in local development. Using that API grants unrestricted root-level access to the host system (you can very easily run a container that compromises the host). You need to remember to clean up after your own containers. The setup may not work in other container systems like Kubernetes that don't make a Docker socket available.
An ssh-based system presents different complexities. You need to distribute credentials to various places. If you're trying to run an ssh daemon inside a Docker container, that is difficult to configure securely (what creates the host keys? how do you provision users and private keys?). You also need to think about various failure cases around the ssh transport that might not be present in a purely-local system.
I'm running a program that uses several docker images and containers and it's all spawned and managed by the code. At the same time, I need to enter into the docker exec -it cli bash and execute some commands. These commands however cant be manual and must be made into an api. After extensive searching the closest thing I found is docker remote api [https://blog.trifork.com/2013/12/24/docker-from-a-distance-the-remote-api]. However, I'm a bit scared messing with the internals of docker. I want the spawning and management to remain controlled by the program. I only need to run a limited number of commands to docker cli. Is docker remote api the right way to go? Will it handle scale- my application may see ~27000 mobile and webapps use/call the apis from different parts of the world. Tried and tested solutions would be preferred.
Any advice would be highly appreciated.
There’s not an easy answer to this. Since you include “safest” in the question title, I will suggest you probably need to do some redesign of your application architecture.
The first critical detail is this: being able to run any Docker command, or access the Docker API, implies unrestricted root access on the host. You can trivially docker run an image with writeable root-level access to the host’s filesystem and steal public keys, user passwords, give yourself sudo access, and so on. Using it as a core part of your workflow is incredibly dangerous. Turning on the Docker remote API at all is incredibly dangerous.
As a corollary to this, while docker exec is handy as a debugging tool, you can’t really use it as part of your core workflow. As you note running commands by hand as a trusted administrator doesn’t scale. There are also dangers in shell quoting: you need to make sure an argument doesn’t look like foo; docker run -v/:/host ... and inadvertently gain access to the host system.
In my mind your only real option here is to do this “properly”. Take whatever administrative commands you need to do and wrap them in some API, probably HTTP-based. Build a new service (or several) and add it to your Docker deployment. Maybe under the hood that launches a shell script as a subprocess, but the API wrapper has control over the arguments and can double-check things. The plus side is that this approach probably won’t be a choke point if your application does need to scale out.
currently my web application is running on a server, where all the services (nginx, php, etc.) are installed directly in the host system. Now I wanted to use docker to separate these different services into specific containers. Nginx and php-fpm are working fine. But in the web application pdfs can be generated, which is done using wkhtmltopdf and as I want to follow the single-service-per-container pattern, I want to add an additional container which houses wkhtmltopdf and takes care of this specific service.
The problem is: how can I do that? How can I call the wkhtmltopdf binary from the php-fpm container?
One solution is to share the docker.socket, but that is a big security flaw, so I really don‘t like to it.
So, is there any other way to achieve this? And isn‘t this "microservice separation" one of the main purposes/goals of docker?
Thanks for your help!
You can't directly call binaries from one container to another. ("Filesystem isolation" is also a main goal of Docker.)
In this particular case, you might consider "generate a PDF" as an action your service takes and not a separate service in itself, and so executing the binary as a subprocess is a means to an end. This doesn't even raise any complications since presumably mkhtmltopdf isn't a long-running process, you'll launch it once per request and not respond until the subprocess runs to completion. I'd install or include it in the Dockerfile that packages your PHP application, and be architecturally content with that.
Otherwise the main communication between containers is via network I/O and so you'd have to wrap this process in a simple network protocol, probably a minimal HTTP service in your choice of language/framework. That's probably not worth it for this, but it's how you'd turn this binary into "a separate service" that you'd package and run as a separate container.
I watched this YouTube video on Docker and at 22:00 the speaker (a Docker product manager) says:
"You're probably thinking 'Docker does not support multi-tenancy'...and you are right!"
But never is any explanation of why actually given. So I'm wondering: what did he mean by that? Why Docker doesn't support multi-tenancy?! If you Google "Docker multi-tenancy" you surprisingly get nothing!
One of the key features most assume with a multi-tenancy tool is isolation between each of the tenants. They should not be able to see or administer each others containers and/or data.
The docker-ce engine is a sysadmin level tool out of the box. Anyone that can start containers with arbitrary options has root access on the host. There are 3rd party tools like twistlock that connect with an authz plugin interface, but they only provide coarse access controls, each person is either allowed or disallowed from an entire class of activities, like starting containers, or viewing logs. Giving users access to either the TLS port or docker socket results in the users being lumped into a single category, there's no concept of groups or namespaces for the users connecting to a docker engine.
For multi-tenancy, docker would need to add a way to define users, and place them in a namespace that is only allowed to act on specific containers and volumes, and restrict options that allow breaking out of the container like changing capabilities or mounting arbitrary filesystems from the host. Docker's enterprise offering, UCP, does begin to add these features by using labels on objects, but I haven't had the time to evaluate whether this would provide a full multi-tenancy solution.
Tough question that others might know how to answer better than me. But here it goes.
Let's take this definition of multi tenancy (source):
Multi-tenancy is an architecture in which a single instance of a software application serves multiple customers.
It's really hard to place Docker in this definition. It can be argued that it's both the instance and the application. And that's where the confusion comes from.
Let's break Docker up into three different parts: the daemon, the container and the application.
The daemon is installed on a host and runs Docker containers. The daemon does actually support multi tenancy, as it can be used my many users on the same system, each of which has their own configuration in ~/.docker.
Docker containers run a single process, which we'll refer to as the application.
The application can be anything. For this example, let's assume the Docker container runs a web application like a forum or something. The forum allows users to sign in and post under their name. It's a single instance that serves multiple customers. Thus it supports multi tenancy.
What we skipped over is the container and the question whether or not it supports multi tenancy. And this is where I think the answer to your question lies.
It is important to remember that Docker containers are not virtual machines. When using docker run [IMAGE], you are creating a new container instance. These instances are ephemeral and immutable. They run a single process, and exit as soon as the process exists. But they are not designed to have multiple users connect to them and run commands simultaneously. This is what multi tenancy would be. Instead, Docker containers are just isolated execution environments for processes.
Conceptually, echo Hello and docker run echo Hello are the same thing in this example. They both execute a command in a new execution environment (process vs. container), neither of which supports multi tenancy.
I hope this answers is readable and answers your question. Let me know if there is any part that I should clarify.
Do I need use separate Docker container for my complex web application or I can put all required services in one container?
Could anyone explain me why I should divide my app to many containers (for example php-fpm container, mysql container, mongo container) when I have ability to install and launch all stuff in one container?
Something to think about when working with Docker is how it works inside. Docker replaces your PID 1 with the command you specify in the CMD (and ENTRYPOINT, which is slightly more complex) directive in your Dockerfile. PID 1 is normally where your init system lives (sysvinit, runit, systemd, whatever). Your container lives and dies by whatever process is started there. When the process dies, your container dies. Stdout and stderr for that process in the container is what you are given on the host machine when you type docker logs myContainer. Incidentally, this is why you need to jump through hoops to start services and run cronjobs (things normally done by your init system). This is very important in understanding the motivation for doing things a certain way.
Now, you can do whatever you want. There are many opinions about the "right" way to do this, but you can throw all that away and do what you want. So you COULD figure out how to run all of those services in one container. But now that you know how docker replaces PID 1 with whatever command you specify in CMD (and ENTRYPOINT) in your Dockerfiles, you might think it prudent to try and keep your apps running each in their own containers, and let them work with each other via container linking. (Update -- 27 April 2017: Container linking has been deprecated in favor of regular ole container networking, which is much more robust, the idea being that you simply join your separate application containers to the same network so they can talk to one another).
If you want a little help deciding, I can tell you from my own experience that it ends up being much cleaner and easier to maintain when you separate your apps into individual containers and then link them together. Just now I am building a Wordpress installation from HHVM, and I am installing Nginx and HHVM/php-fpm with the Wordpress installation in one container, and the MariaDB stuff in another container. In the future, this will let me drop in a replacement Wordpress installation directly in front of my MariaDB data with almost no hassle. It is worth it to containerize per app. Good luck!
When you divide your web application to many containers, you don't need to restart all the services when you deploy your application. Like traditionally you don't restart your mysql server when you update your web layer.
Also if you want to scale your application, it is easier if your application is divided separate containers. Then you can just scale those parts of your application that are needed to solve your bottlenecks.
Some will tell you that you should run only 1 process per container. Others will say 1 application per container. Those advices are based on principles of microservices.
I don't believe microservices is the right solution for all cases, so I would not follow those advices blindly just for that reason. If it makes sense to have multiples processes in one container for your case, then do so. (See Supervisor and Phusion baseimage for that matter)
But there is also another reason to separate containers: In most cases, it is less work for you to do.
On the Docker Hub, there are plenty of ready to use Docker images. Just pull the ones you need.
What's remaining for you to do is then:
read the doc for those docker images (what environnement variable to set, etc)
create a docker-compose.yml file to ease operating those containers
It is probably better to have your webapp in a single container and your supporting services like databases etc. in a separate containers. By doing this if you need to do rolling updates or restarts you can keep your database online while your application nodes are doing individual restarts so you wont experience downtime. If you have caching with something like Redis etc this is also useful for the same reason. It will also allow you to more easily add nodes to scale in a loosely coupled fashion. It will also allow you to manage the containers in a manner more suitable to a specific purpose. For the type of application you are describing I see very few arguments for running all services on a single container.
It depends on the vision and road map you have for your application. Putting all components of an application in one tier in this case docker container is like putting all eggs in one basket.
Whenever your application would require security, performance related issues then separating those three components in their own containers would be an ideal solution. It's needless to mention that this division of labor across containers would come at some cost and which would be related to wiring up those containers together for communication and security etc.