Understanding Docker best practice and running webservers - docker

I am in the situation of running a simple PHP7.0, Redis and NGINX server in a single container.
This means I run php7.0-fpm, nginx and redis as services.
But in the best practices I am reading:
# Run only one process per container
In almost all cases, you should only run a single process in a single container.
Decoupling applications into multiple containers makes it much easier to scale horizontally and reuse containers.
If that service depends on another service, make use of container linking.
Does this mean that it would be best to run one container with PHP7.0 and the application and another with nginx and another with redis?

@nwinkler is right in the comments: the recommendation is a good one. A couple of advantages of decoupling applications into multiple containers are:
Build time
It is true that Docker checks hashes and does not rebuild the layers of an image when nothing has changed, but this caching is limited by the layer structure (if layer X changes, all layers above X are rebuilt). This means builds get painful as your images get bigger.
Containers are isolated
When you are attached to your nginx container, you can be pretty sure that any changes you make are not going to cause changes in your php container, and that's always good practice.
Scalability
Need ten more Redis instances? Fine, just run ten more Redis containers.
In general I would write one Dockerfile for a base image, which in your case contains whatever all three of your containers (php, redis & nginx) share (third-party libs, tools etc.). Then three Dockerfiles, one for building each image. Then a bash script or a docker-compose.yml file for running the images as containers.
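To make the build-time point concrete, here is a rough Python model of layer caching. It is a simplification for illustration, not Docker's actual implementation (real cache keys also take the contents of copied files into account), and the Dockerfile instructions and the function names layer_keys and rebuilt_layers are made up for the example. Each layer's key depends on its parent's key, so changing one instruction invalidates every layer after it:

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction; each key also depends on the parent key."""
    keys = []
    parent = ""
    for instruction in instructions:
        # Chaining the parent key into the hash is what makes a change "ripple" upward.
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(old, new):
    """List the instructions in `new` that cannot reuse a cached layer from `old`."""
    old_keys = layer_keys(old)
    new_keys = layer_keys(new)
    rebuilt = []
    for index, (instruction, key) in enumerate(zip(new, new_keys)):
        if index >= len(old_keys) or old_keys[index] != key:
            rebuilt.append(instruction)
    return rebuilt


# Hypothetical single-image Dockerfile holding all three services.
old_build = [
    "FROM php:7.0-fpm",
    "RUN apt-get update && apt-get install -y nginx",
    "RUN apt-get install -y redis-server",
    "COPY . /var/www/html",
]
# Only the nginx line changes, yet everything after it is rebuilt as well.
new_build = list(old_build)
new_build[1] = "RUN apt-get update && apt-get install -y nginx-extras"

for instruction in rebuilt_layers(old_build, new_build):
    print("rebuild:", instruction)
```

With three separate images, a change to the nginx image would not touch the php or redis builds at all.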

Related

Running FastAPI on Google Cloud Run (Docker Image)

I'm looking to build a Docker image to run FastAPI on Google Cloud Run. FastAPI uses Uvicorn as an ASGI server, and Uvicorn recommends using Gunicorn with the Uvicorn worker class for production deployments. FastAPI also has some excellent documentation on using Gunicorn with Uvicorn. I even see that FastAPI provides an official image combining the two (uvicorn-gunicorn-fastapi-docker), but this comes with a warning:
You are probably using Kubernetes or similar tools. In that case, you
probably don't need this image (or any other similar base image). You
are probably better off building a Docker image from scratch
This warning basically explains that replication would be handled at cluster-level and doesn't need to be handled at process-level. This makes sense. I am however not quite sure if Cloud Run falls into this category? Essentially it is an abstracted and managed Knative service which therefore runs on Kubernetes.
My question is, should I be installing Gunicorn along with Uvicorn in my Dockerfile and handling replication at process-level? Along the lines of:
CMD ["gunicorn", "app.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:80"]
Or should I stick with Uvicorn, a single process, and let Cloud Run (Kubernetes) handle replication at cluster-level? E.g.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
A. Let's look at the big scale first.
At the time of writing, Cloud Run instances can be configured for a maximum of 1000 concurrent requests and up to 4 vCPUs.
Going back to basics a bit: Cloud Run will spawn many instances, each works independently, and each can handle up to the maximum allowed concurrency, so each could handle 1000 requests. If you assign multiple CPUs, you need multiple processes to make use of them.
With numbers this large we need to be cautious. If your container is big enough in CPU/memory terms to handle this traffic, you may want to use a process manager (gunicorn) to start several Uvicorn worker processes, as the image you referenced does. So in this case you can use that Docker image.
B. Staying at small scale.
On the other hand, if you set 1 vCPU and stay single-process, you don't need gunicorn as a process manager. You can still have concurrency enabled, just not at the maximum; pick a lower level that fits your 1 vCPU model, something like 80 concurrent requests. In this situation, under heavy traffic Cloud Run will start many instances, and you rely on Cloud Run to spawn as many as needed, which it does really well. It's like having Kubernetes on top of your simple container.
I would suggest starting with a single process, building a container that doesn't use the referenced image, and only switching from B to A once you know there are benefits (cost-wise) to having larger instances.
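As a rough sketch of the A/B choice, the helper below returns the container command for a given vCPU count. The function name server_command is hypothetical, and using one Gunicorn worker per vCPU is an assumption for illustration rather than a rule from Cloud Run or Gunicorn; the two commands themselves are the ones from the question:

```python
def server_command(vcpus, app="app.main:app", port=80):
    """Build the container CMD as an argv list for the given vCPU count."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    host = "0.0.0.0"
    if vcpus == 1:
        # Option B: a single Uvicorn process; Cloud Run adds instances under load.
        return ["uvicorn", app, "--host", host, "--port", str(port)]
    # Option A: Gunicorn supervises several Uvicorn worker processes.
    # One worker per vCPU is an assumed starting point, not a measured optimum.
    return [
        "gunicorn", app,
        "-w", str(vcpus),
        "-k", "uvicorn.workers.UvicornWorker",
        "--bind", f"{host}:{port}",
    ]


print(server_command(1))
print(server_command(4))
```

Whether the larger configuration actually pays off is something to measure against your own traffic and billing.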

Should I create a docker container or docker start a stopped container?

From the point of view of the Docker philosophy, which is more advisable:
create a container every time we need to use a certain environment and then remove it after use (docker run <image> all the time); or
create a container for a specific environment (docker run <image>), stop it when it is not needed, and start it again whenever it is needed (docker start <container>)?
If you docker rm the old container and docker run a new one, you will always get a clean filesystem that starts from exactly what's in the original image (plus any volume mounts). You will also fairly routinely need to delete and recreate a container to change basic options: if you need to change a port mapping or an environment variable, or if you need to update the image to have a newer version of the software, you'll be forced to delete the container.
This is enough reason for me to make my standard process be to always delete and recreate the container.
# docker build -t the-image . # can be done first if needed
docker stop the-container # so it can cleanly shut down and be removed
docker rm the-container
docker run --name the-container ... the-image
Other orchestrators like Docker Compose and Kubernetes are also set up to automatically delete and recreate the container (or Kubernetes pod) if there's a change; their standard workflows do not generally involve restarting containers in-place.
I almost never use docker start. In a Compose-based workflow I generally use only docker-compose up -d, letting it restart things if needed; I use docker-compose down only when I need the CPU/memory resources the container stack was using, not in routine work.
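If you want to script the delete-and-recreate routine, a minimal Python sketch could look like the following. The helper names recreate_commands and recreate are hypothetical, the container and image names are the placeholders used above, and the extra run options are whatever you would normally pass to docker run:

```python
import subprocess


def recreate_commands(name, image, run_args=()):
    """Return the docker commands (as argv lists) that replace a container."""
    return [
        ["docker", "stop", name],  # let the old container shut down cleanly
        ["docker", "rm", name],    # remove it so the name can be reused
        ["docker", "run", "--name", name, *run_args, image],
    ]


def recreate(name, image, run_args=()):
    """Stop and remove the old container if present, then start a fresh one."""
    stop, remove, run = recreate_commands(name, image, run_args)
    # stop/rm fail harmlessly when the container does not exist yet.
    subprocess.run(stop, check=False)
    subprocess.run(remove, check=False)
    # A failure to start the new container should be reported.
    subprocess.run(run, check=True)


for command in recreate_commands("the-container", "the-image", ["-d", "-p", "8080:80"]):
    print(" ".join(command))
```

Calling recreate("the-container", "the-image", ["-d"]) would actually run the three commands; the loop above only prints them.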
I'm talking with regards to my experience in the industry so take my answer with a grain of salt, because there might be no hard evidence or reference to the theory.
Here's the answer:
TL;DR:
In short, you never need docker stop and docker start, because this approach is unreliable: you might lose the container and all the data inside it if you have not taken proper precautions beforehand.
Long answer:
You should only work with images and not with containers. Whenever you need some specific data or need the image to have some customization, you had better use docker save to keep the image for future use.
If you're just testing out on your local machine, or in your dev virtual machine on a remote host, you're free to use either one you like. I personally take each of the approaches on different scenarios.
But if you're talking about a production environment, you'd better use some orchestration tool; it could be as simple and easy to work with as docker-compose or docker swarm or even Kubernetes on more complex environments.
You'd better not take the second approach (docker run, docker stop & docker start) in those environments, because at any moment you might lose that container, and if you depend solely on that specific container or its data, then you're going to have a bad weekend.

docker run/start: Is there a significant impact on disk space by using the "run" command over and over?

I'm wondering what is the best practice for launching many containers (on the order of thousands per day) in terms of using docker container run or docker container start. I realize that start is used on a stopped container and run would be used to create a new container, but does it matter which one is used if the same underlying image is used across all the containers?
My guess is that since all the containers use the same image there would be very little overhead for creating many thousands of containers. In other words, just use docker container run over and over again.
Should I instead try to search for an existing container before starting a new one?
The easiest solution is to pass --rm to docker run. This will cause the container to be deleted as soon as it's done running, so repeated calls to it won't keep using more and more space.

Duplication among Dockerfiles

I am installing OpenStack Keystone now.
A standalone Keystone needs three components: mysql, python, and apache2.
Obviously I can't use all of them as the base image, so I made python the base image and added RUN statements to install mysql and apache2.
I think the RUN statements are duplication, because all three components already exist on the Docker public registry.
Is there a good solution or a proper way to reuse the existing external Dockerfiles?
There seems to be some confusion here about what a Dockerfile does: it defines a single Docker image. In general, the recommended way to run applications in Docker is to have a container for each service and have them connect to other services in other containers as needed (more on this later).
In your case, it sounds like your application consists of OpenStack Keystone (which requires Python and Apache to run) and MySQL. So I would install Python & Apache in your Dockerfile, and set up MySQL (possibly just using the image from the public repository) as a separate container that the OpenStack container connects to over the network.
As mentioned above, this scenario is the recommended way to run Docker applications - it follows the Unix paradigm of "each app does only one thing, but does it very well". Each container does one thing only and connects to any other services in other containers. But it is possible to run multiple services in the same container - eg. Keystone running on Apache/python AND MySQL in the same container. If this is your goal, you would write a Dockerfile that installs everything and gets everything running together. This Dockerfile is likely to be pretty complicated and will require an ENTRYPOINT that gets both MySQL and Apache working together. You're likely to wind up duplicating a lot of the work that has already gone into the standard MySQL and Apache images.
You can make use of Docker Compose to run your application consisting of mysql, python, and apache2.
Using Docker Compose will allow you to control the application setup with a single command. You just need to write a docker-compose.yml file and the Dockerfiles for the containers you will set up.
In your case you can have one Dockerfile for setting up a python and apache2 container, and another Dockerfile with mysql as the base image for setting up the database container.

What is best practice for sharing database between containers in docker?

Does anyone know the best practice for sharing a database between containers in Docker?
What I mean is that I want to create multiple containers in Docker, and these containers will all execute CRUD operations on the same database with the same identity.
So far, I have two ideas. One is to create a separate container that only runs the database. The other is to install the database directly on the host machine where Docker is installed.
Which one is better? Or, is there any other best practice for this requirement?
Thanks
It is hard to answer a 'best practice' question, because it's a matter of opinion. And opinions are off topic on Stack Overflow.
So I will give a specific example of what I have done in a serious deployment.
I'm running ELK (Elasticsearch, Logstash, Kibana). It's containerised.
For my data stores, I have storage containers. These storage containers contain a local filesystem pass-through:
docker create -v /elasticsearch_data:/elasticsearch_data --name ${HOST}-es-data base_image /bin/true
I'm also using etcd and confd, to dynamically reconfigure my services that point at the databases. etcd lets me store key-values, so at a simplistic level:
CONTAINER_ID=`docker run -d --volumes-from ${HOST}-es-data elasticsearch-thing`
ES_IP=`docker inspect $CONTAINER_ID | jq -r '.[0].NetworkSettings.Networks.dockernet.IPAddress'`
etcdctl set /mynet/elasticsearch/${HOST}-es-0 $ES_IP
Because we register it in etcd, we can then use confd to watch the key-value store, monitor it for changes, and rewrite and restart our other container services.
I'm using haproxy for this sometimes, and nginx when I need something a bit more complicated. Both these let you specify sets of hosts to 'send' traffic to, and have some basic availability/load balance mechanisms.
That means I can be pretty lazy about restarting/moving/adding elasticsearch nodes, because the registration process updates the whole environment. A mechanism similar to this is what's used for openshift.
So to specifically answer your question:
DB is packaged in a container, for all the same reasons the other elements are.
Volumes for DB storage are storage containers passing through local filesystems.
'finding' the database is done via etcd on the parent host, but otherwise I've minimised my install footprint. (I have a common 'install' template for docker hosts, and try and avoid adding anything extra to it wherever possible)
It is my opinion that the advantages of docker are largely diminished if you're reliant on the local host having a (particular) database instance, because you've no longer got the ability to package-test-deploy, or 'spin up' a new system in minutes.
(The above example - I have literally rebuilt the whole thing in 10 minutes, and most of that was the docker pull transferring the images)
It depends. A useful thing to do is to keep the database URL and password in an environment variable and provide that to Docker when running the containers. That way you will be free to connect to a database wherever it may be located. E.g. running in a container during testing and on a dedicated server in production.
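As a minimal sketch of that idea in Python: the variable names DATABASE_URL and DATABASE_PASSWORD and the helper database_settings are assumptions chosen for this example, not a Docker convention. The application reads its connection details from the environment at startup, so the same image can point at a test container or a production server:

```python
import os
from urllib.parse import urlsplit


def database_settings(environ=None):
    """Read connection settings from environment variables (names are assumed)."""
    env = os.environ if environ is None else environ
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        # A separate password variable wins over one embedded in the URL.
        "password": env.get("DATABASE_PASSWORD", parts.password),
        "database": parts.path.lstrip("/"),
    }


# Example values only; in a container they would come from the real environment.
example_env = {
    "DATABASE_URL": "postgresql://app@db:5432/shop",
    "DATABASE_PASSWORD": "s3cret",
}
print(database_settings(example_env)["host"])
```

The values can then be supplied when the container is started, for example with docker run -e DATABASE_URL=... -e DATABASE_PASSWORD=... followed by the image name.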
The best practice is to use Docker Volumes.
Official doc: Manage data in containers. This doc details how to deal with a DB and containers. The usual way of doing so is to put the DB data into a data container (which is really a volume rather than a running container); the other containers can then access this DB container (the volume) to perform CRUD operations (or more) on the data.
Random article on "Understanding Docker Volumes"
Edit: I won't go into further detail, as the other answer is well written.

Resources