I posted this question originally on the Docker forums, but didn't get any response there.
I'm wondering what the best way would be to model a set of services; let's call them db, web, and batch. db is simply a running database server instance (think MySQL). web is a web application that needs to connect to the database. batch is a batch application that needs to connect to that same database (it can and will run in parallel with web). db needs to be running for either web or batch to run, but web and batch can run independently of each other (one or both can be running at the same time). If both are running at once, they need to be talking to the same database instance (so db actually uses volumes_from a separate data volume container). If the use case were simpler (think just db and web, which always run together), then both would simply be defined as services in the same compose file, with web having a link to db.
As far as I understand it, these can't all be defined in the same Docker Compose configuration. Instead, I would need three different configurations: one for db, which is launched first; one for web (which uses external_links to find db); and a third for batch (which also uses external_links to find db). Is that correct, or is there some mechanism available that I'm not considering? Assuming a multi-configuration setup is needed, is there a way to "lazily" initialize the db composition if it's not running when either the web or batch composition is launched?
If web has a link defined to db in a docker-compose file, db will always start first.
As far as I know, Docker will never know when the database is up. It will be your web container's responsibility to start properly and retry until the database is up (with a timeout).
For your batch service, assuming that you don't want to start it every time you start your web and db containers (using a docker-compose up or run), you can try extending your service. See the docs for more information on this.
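To make the extends suggestion concrete, here is a rough sketch assuming compose v1-style files; the file names, build context, batch command, and the myproject_db_1 container name are all assumptions, not part of the original question:

# common.yml: shared service definition
app-base:
  build: ./app
  environment:
    DB_HOST: db

# docker-compose.batch.yml: batch reuses the shared definition and attaches to the running db
batch:
  extends:
    file: common.yml
    service: app-base
  external_links:
    - myproject_db_1:db   # the already-running db container, aliased as "db"
  command: run-batch      # hypothetical batch entry command

The web configuration would look the same, with its own command (or its image's default CMD).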
Either your applications in the web and batch images know how to handle database downtime and are able to wait for the db service to come up and reconnect automatically, or you have to write a shell script that runs when the container starts and waits for the db to be available before starting the app.
Depending on the docker images you are using for the web and batch services, you would have to override CMD, ENTRYPOINT or both.
This question has examples of shell scripts that wait for a MySQL service to be up.
And here are other techniques for testing whether a network port is open.
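As a rough illustration of that kind of override, the wait can even be done inline in a compose file; the db host name, MySQL port, and /app/start command below are assumptions, and nc must be available inside the web image:

web:
  build: ./web
  links:
    - db
  # Poll the MySQL port before starting the real application.
  entrypoint: ["sh", "-c", "until nc -z db 3306; do echo waiting for db; sleep 2; done; exec /app/start"]
db:
  image: mysql:5.7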
I have implemented the LAMP stack for a 3rd-party forum application on its own dedicated virtual server. One of my aims here was to use a composed Docker project (under Git) to encapsulate the application fully. I wanted to keep this as simple to understand as possible for the other sysadmins supporting the forum, so that really ruled out using S6 etc., and this in turn meant that I had to stick to the standard of one container per daemon service, using the Docker runtime to implement the daemon functionality.
I had one particular design challenge that doesn't seem to be addressed cleanly by the Docker runtime system, which is that I need to run periodic housekeeping activities that interact across various Docker containers, for example:
The forum application requires a per-minute PHP housekeeping task to be run using php-cli, and I only have php-cli and php-fpm (which runs as the foreground daemon process) installed in the php container.
Let's Encrypt certificate renewal needs a weekly certbot script to be run in the apache container's file hierarchy.
I use conventional /var/log based logging for the high-volume Apache access logs, as these generate GB-sized access files that I want to retain for ~7 days in case I need to do hack analysis, but that are otherwise ignored.
Yes, I could use the host's crontab to run docker exec commands, but this involves exposing application internals to the host system, and IMO this breaks one of my design rules. What follows is my approach to addressing this. My question is really to ask for comments and better alternative approaches; if there are none, then this can perhaps serve as a template for others searching for an approach to this challenge.
All of my containers contain two container-specific scripts: docker-entrypoint.sh, which is a well documented convention, and docker-service-callback.sh, which is the action mechanism that implements the tasking system.
I have one application-agnostic host systemd service, docker-callback-reader.service, which uses this bash script, docker-callback-reader. It services requests on a /run pipe that is volume-mapped into any container that needs to request such event processing.
In practice I have only one such housekeeping container (see here) that implements Alpine crond and runs all of the cron-based events. So, for example, the following entry does the per-minute PHP tasking call:
* * * * *   echo ${VHOST} php task >/run/host-callback.pipe
In this case the env variable VHOST identifies the relevant Docker stack, as I can have multiple instances (forum and test) running on the server; the next parameter (php in this case) identifies the destination service container; and the final parameter (task), plus any optional parameters, is passed as arguments to a docker exec of the php container's docker-service-callback.sh, and the magic happens as required.
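For context, a compose fragment for such a housekeeping container might look roughly like this; the build context, the VHOST value, and the pipe path are assumptions drawn from the description above:

version: "3.7"
services:
  housekeeping:
    build: ./housekeeping   # Alpine image whose crond runs entries like the one above
    environment:
      VHOST: forum          # identifies which stack (forum or test) this instance serves
    volumes:
      - /run/host-callback.pipe:/run/host-callback.pipe   # host pipe serviced by docker-callback-reader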
I feel that the strengths of the system are that:
Everything is suitably encapsulated. The host knows nothing of the internals of the app other than any receiving container must have a docker-service-callback.sh on its execution path. The details of each request are implemented internally in the executing container, and are hidden from the tasking container.
The whole implementation is simple, robust and has minimal overhead.
Anyone is free to browse my Git repo and cherry-pick whatever of this they wish.
Comments?
Can somebody explain it with some examples? Why are multi-container Docker apps built, when you could contain your app in a single Docker container?
When you make a multi-container app you have to do networking. Isn't it easier to run a single image as a single container rather than two images as two containers?
There are several good reasons for this:
It's easier to reuse prebuilt images. If you need MySQL, or Redis, or an Nginx reverse proxy, these all exist as standard images on Docker Hub, and you can just include them in a multi-container Docker Compose setup. If you tried to put them into a single image, you'd have to install and configure them yourself.
The Docker tooling is built for single-purpose containers. If you want the logs of a multi-process container, docker logs will generally print out the supervisord logs, which aren't what you want; if you want to restart one of those containers, the docker stop; docker rm; docker run sequence will delete the whole thing. Instead with a multi-process container you need to use debugging tools like docker exec to do anything, which is harder to manage.
You can upgrade one part without affecting the rest. Upgrading the code in a container usually involves building a new image, stopping and deleting the old container, and running a new container from the new image. The "deleting the old container" part is important, and routine; if you need to delete your database to upgrade your application, you risk losing data.
You can scale one part without affecting the rest. More applicable in a cluster environment like Docker Swarm or Kubernetes. If your application is overloaded (especially in production) you'd like to run multiple copies of it, but it's hard to run multiple copies of a standard relational database. That essentially requires you to run these separately, so you can run one proxy, five application servers, and one database.
Setting up a multi-container application shouldn't be especially difficult; the easiest way is to use Docker Compose, which will deal with things like creating a network for you.
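As a purely hypothetical illustration of that split (image tags, build context, and credentials are placeholders):

version: "3.8"
services:
  proxy:
    image: nginx:1.25            # prebuilt reverse proxy image from Docker Hub
    ports:
      - "80:80"
    depends_on:
      - app
  app:
    build: .                     # your own application image
    environment:
      DB_HOST: db                # service names resolve as host names on the Compose network
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      - dbdata:/var/lib/mysql    # data survives when the db container is deleted and recreated
volumes:
  dbdata:

Here you can rebuild and recreate app without ever touching proxy or db, which is exactly the upgrade-one-part property described above.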
For the sake of simplicity, I would say you can run only one application with a public entry point (like an API) in a single container. This approach is actually recommended by the official Docker documentation.
Microservices
Because of this single constraint, you cannot run microservices that require their own entry points in a single docker container.
It could be more of a discussion of the advantages of a monolith application vs. microservices.
Database
Even if you decide to run only the monolith application, you still need to connect it to some database. As you noticed, Docker has an additional network-configuration layer, so if you want to run the database and the application locally, the easiest way is to use docker-compose to run both images (the database and your application) inside one automatically configured network:
# Application definition
application: <your app definition>

# Database definition
database:
  image: mysql:5.7
In my example, you can just connect to your DB from your main app at database:<port> (plus credentials where needed) and it will work; the service name database resolves as a host name on the shared network.
Scalability
However, why should we split the database image from the application image? In one word: scalability. For development purposes, you want to have a local DB, maybe with Docker because it is handy. For production purposes, you will put the application image to run somewhere (Kubernetes, Docker Swarm, Azure App Services, etc.). To handle multiple requests at the same time, you want to run multiple instances of your application. But what about the database? You cannot connect to an internal DB instance hosted in the same container, because the other instances of your app, in their own containers, would each have a completely different set of data (with no synchronization).
Most often you will elect to use a separate database server, whether running in a container or as a fully managed database (like Azure CosmosDB or Mongo Atlas), but with configuration, scaling, and synchronization dedicated to the DB only. Your app just needs to know the proper URL for it. Most cloud providers expose such services out of the box, so you don't have to worry about the configuration yourself.
Easy to change
The last, but not least, argument is about changing the initial setup over time. You might change the database provider or upgrade the version of an image in the future (such things are required from time to time). When you separate the images, you can modify one without touching the others. This decreases the cost of maintenance significantly.
Also, you can add additional services very easily. A different logging aggregator? No problem. An additional microservice running out of the box? Easy.
I have a website in Laravel where you can click a button that sends a message to a Python daemon which is isolated in Docker. This works for an easy MVP to prove a concept, but it's not viable in production, because a user would most likely also want to pause, resume, and stop that process; the service is otherwise designed to never stop, since it's a scanner running in a loop.
I have thought about a couple of solutions for this, such as handling it in the software layer, but that would add complexity to the program. I have googled around and found that it is actually possible to do what I want with Docker itself, via the commands pause, unpause, run, and kill.
It would be optimal to have a service that interacts with the Docker containers according to the criteria above and can take commands over HTTP. Is Docker Swarm the right solution for this problem, or is there an easier way?
There are both significant security and complexity concerns to using Docker this way and I would not recommend it.
The core rule of Docker security has always been: if you can run any docker command, then you can easily take over the entire host. (You cannot prevent someone from using docker run to start a container, as container-root, that bind-mounts any part of the host filesystem; they can then reset host-root's password in the /etc/shadow file to something they know, allow remote root ssh access, and reboot the host, as one example.) I'd be extremely careful about connecting this ability to my web tier. Strongly coupling your application to Docker will also make it more difficult to develop and test.
Instead of launching a process per crawling job, a better approach might be to set up some sort of job queue (perhaps RabbitMQ), and have a multi-user worker that pulls jobs from the queue to do work. You could have a queue per user, and a separate control queue that receives the stop/start/cancel messages.
If you do this:
You can run your whole application without needing Docker: you need the front-end, the message queue system, and a worker, but these can all run on your local development system
If you need more crawlers, you can launch more workers (works well with Kubernetes deployments)
If you're generating too many crawl requests, you can launch fewer workers
If a worker dies unexpectedly, you can just restart it, and its jobs will still be in the queue
Nothing needs to keep track of which process or container belongs to a specific end user
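A rough compose sketch of that layout, purely as an illustration; the service names, build contexts, and worker image are assumptions:

version: "3.8"
services:
  web:
    build: ./laravel-app          # the Laravel front end that enqueues crawl jobs
    ports:
      - "80:80"
  queue:
    image: rabbitmq:3-management  # standard RabbitMQ image from Docker Hub
  worker:
    build: ./scanner-worker       # hypothetical image wrapping the Python scanner as a queue consumer
    environment:
      AMQP_URL: amqp://queue      # workers reach RabbitMQ by its service name
    depends_on:
      - queue

Adding or removing crawler capacity is then just docker-compose up --scale worker=N (or a replica count in Kubernetes), with no docker pause/unpause/kill calls from the web tier.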
Currently my web application is running on a server where all the services (nginx, php, etc.) are installed directly on the host system. Now I want to use Docker to separate these different services into specific containers. Nginx and php-fpm are working fine. But the web application can generate PDFs, which is done using wkhtmltopdf, and as I want to follow the single-service-per-container pattern, I want to add an additional container which houses wkhtmltopdf and takes care of this specific service.
The problem is: how can I do that? How can I call the wkhtmltopdf binary from the php-fpm container?
One solution is to share the Docker socket, but that is a big security flaw, so I really don't want to do it.
So, is there any other way to achieve this? And isn't this "microservice separation" one of the main purposes/goals of Docker?
Thanks for your help!
You can't directly call binaries from one container to another. ("Filesystem isolation" is also a main goal of Docker.)
In this particular case, you might consider "generate a PDF" as an action your service takes and not a separate service in itself, so executing the binary as a subprocess is a means to an end. This doesn't even raise any complications, since presumably wkhtmltopdf isn't a long-running process: you'll launch it once per request and not respond until the subprocess runs to completion. I'd install or include it in the Dockerfile that packages your PHP application, and be architecturally content with that.
Otherwise the main communication between containers is via network I/O and so you'd have to wrap this process in a simple network protocol, probably a minimal HTTP service in your choice of language/framework. That's probably not worth it for this, but it's how you'd turn this binary into "a separate service" that you'd package and run as a separate container.
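If you did go the separate-service route, the wiring might look something like this; the pdf-service image and its port are hypothetical stand-ins for a small HTTP wrapper around wkhtmltopdf that you would have to write:

version: "3.8"
services:
  php-fpm:
    build: ./php                         # your existing PHP application image
    environment:
      PDF_SERVICE_URL: http://pdf:8080   # the app POSTs HTML here and receives a PDF back
  pdf:
    build: ./pdf-service                 # hypothetical wrapper exposing wkhtmltopdf over HTTP
    expose:
      - "8080"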
Suppose I have a web server and a database server installed in the same common Docker image. Is it possible to run them simultaneously, as if they were running inside the same virtual machine?
Is running docker run <args> twice the best practice for this use case?
You should not use a single image for your web server and the database. You should use one image for the web server and one for the database.
To run this, you would run your database server and then run your webserver and link it to your database server.
There are many examples on the internet. I'll just leave this one here: https://github.com/saada/docker-compose-php-mysql
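A stripped-down sketch of that kind of setup, with image versions, ports, and credentials as placeholders:

version: "2"
services:
  web:
    build: ./web                   # your web server image
    links:
      - db                         # makes the database reachable under the host name "db"
    ports:
      - "8080:80"
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      - dbdata:/var/lib/mysql
volumes:
  dbdata: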
According to this Stack Overflow answer, it is perfectly possible to do that via a script that takes charge of starting each of these services:
Can I run multiple programs in a Docker container?
Although most people will just tell you to microservice everything into multiple different containers, it might well be much more manageable in some cases to have containers that launch more than one process. Think about cloud deployment, where you might want to run multiple web apps, each corresponding to a different system test.
So you would have your small, isolated HSQLDB running in server mode, followed by your WildFly or Spring Boot app, and finally your system test run by mvn.
If you have all three in one container, then it is just a matter of choosing which Jenkins node your all-in-one container runs on. Since it packs everything within itself, irrespective of any other container, and the image size is not monstrous, you stay really agile. That's just one example.
So you have to see what is best for you.
With big DBs like MySQL you are often better off running them in an isolated container as a base platform for all other Docker containers. With DBs like HSQLDB you can easily afford a DB per container.