Can docker be used to run a web site container that first downloads latest version prior to running web server? - docker

I've been experimenting with docker recently but can't get my head around what I think is a fairly important/useful requirement:
The ability to download a NEW copy of a web site for running, when a container is run. NOT at build time, but at run time.
I have seen countless examples of Dockerfiles where java, tomcat, a copy of a WAR is installed and added to an image during build time, but none where that WAR is downloaded fresh each time "docker run -d me/myimage" is executed on the command line.
I think it might involve adding a CMD statement at the end of the Dockerfile but I wonder if people out there more experienced than me with docker have some advice? Perhaps I shouldn't even be attempting this and should re-build my images each time my web app has a new release? But that would mean I would have to distribute my new image via a private dockerhub or something right? I am not willing to stick my source in a public github repo and have the Dockerfile pull it and build it during an image build.
Thanks.

As Mark O'Connor said in his comment, it's certainly possible. A Docker container is just a process tree running on your Linux host, and with a few exceptions (generally involving privileged access to the kernel) can do anything you can do outside of a container.
So sure, you could put together an image that, when run, would download the most recent of an application and run it.
The reason this is considered a bad idea is that it suddenly becomes difficult if you want to run an older version of the application (or more generally a specific version). What if you redeploy your container and end up with a new version of the application that requires manual database schema upgrades before it will operate? Now instead of an application you have a brick.
Similarly, what if the newest version of the application is simply buggy? If you were performing the download and install at build time, you would simply deploy an image with an older version of the application.
Performing the application and download at run time makes the container unpredictable and less manageable.

Related

How to use docker in the development phase of a devops life cycle?

I have a couple of questions related to the usage of Docker in a development phase.
I am going to propose three different scenarios of how I think Docker could be used in a development environment. Let's imagine that we are creating a REST API in Java and Spring Boot. For this I will need a MySQL database.
The first scenario is to have a docker-compose for development with the MySQL container and a production docker-compose with MySQL and the Java application (jar) in another container. To develop I launch the docker-compose-dev.yml to start only the database. The application is launched and debugged using the IDE, for example, IntelliJ Idea. Any changes made to the code, the IDE will recognize and relaunch the application by applying the changes.
The second scenario is to have, for both the development and production environment, a docker-compose with the database and application containers. That way, every time I make a change in the code, I have to rebuild the image so that the changes are loaded in the image and the containers are lauched again. This scenario may be the most typical and used for development with Docker, but it seems very slow due to the need to rebuild the image every time there is a change.
The third scenario consists of the mixture of the previous two. Two docker-compose. The development docker-compose contains both containers, but with mechanisms that allow a live reload of the application, mapping volumes and using, for example, Spring Dev Tools. In this way, the containers are launched and, in case of any change in the files, the application container will detect that there is a change and will be relaunched. For production, a docker-compose would be created simply with both containers, but without the functionality of live reload. This would be the ideal scenario, in my opinion, but I think it is very dependent on the technologies used since not all allow live reload.
The questions are as follows.
Which of these scenarios is the most typical when using Docker for phase?
Is scenario 1 well raised? That is, dockerize only external services, such as databases, queues, etc. and perform the development and debugging of the application with the IDE without using Docker for it.
The doubts and the scenarios that I raise came up after I raised the problem that scenario 2 has. With each change in the code, having to rebuild the image and start the containers again is a significant waste of time. In short, a question would be: How to avoid this?
Thanks in advance for your time.
NOTE: It may be a question subject to opinion, but it would be nice to know how developers usually deal with these problems.
Disclaimer: this is my own opinion on the subject as asked by Mr. Mars. Even though I did my best to back my answer with actual sources, it's mostly based on my own experience and a bit of common sense
Which of these scenarios is the most typical when using Docker for development?
I have seen all 3 scenarios iin several projects, each of them with their advantages and drawbacks. However I think scenario 3 with a Docker Compose allowing for dynamic code reload is the most advantageous in term of flexibility and consistency:
Dev and Prod Docker Compose are close matches, meaning Dev environment is as close as possible to Prod environment
You do not have to rebuild the image constantly when developping, but it's easy to do when you need to
Lots of technologies support such scenario, such as Spring Dev Tools as you mentionned, but also Python Flask, etc.
You can easily leverage Docker Compose extends a.k.a configuration sharing mechanism (also possible with scenario 2)
Is scenario 1 well raised? That is, dockerize only external services, such as databases, queues, etc. and perform the development and debugging of the application with the IDE without using Docker for it.
Scenario 1 is quite common, but the IDE environment would probably be different than the one from the Docker container (and it would be difficult to maintain a match of version for each libs, dependencies, etc. from IDE environment to Docker environment). It would also probably require to go through an intermediate step between Dev and Production to actually test the Docker image built after Dev is working before going to Production.
In my own experience doing this is great when you do not want to deal too much with Docker when actually doing dev and/or the language or technology you use is not adapted for dynamic reload as described in scenario 3. But in the end it only adds a drift between your environments and more complexity between Dev and Prod deployment method.
having to rebuild the image and start the containers again is a significant waste of time. In short, a question would be: How to avoid this?
Beside the scenarios you describe, you have ways to decently (even drastically) reduce image build time by leveraging Docker build cache and designing your Dockerfile. For example, a Python application would typically copy code as the last (or almost last) step of the build to avoid invalidating the cache, and for Java app it would be possible to split code so as to avoid compiling the entire application everytime a bit of code changes - that would depend on your actual setup.
I personally use a workflow roughly matching scenario 3 such as:
a docker-compose.yml file corresponding to my Production environment
a docker-compose.dev.yml which will override some aspect of my main Docker Compose file such as mouting code from my machine, adding dev specific flags to commands, etc. - it would be run such as
docker-compose -f docker-compose.yml -f docker-compose.dev.yml
but it would also be possible to have a docker-compose.override.yml as Docker Compose uses by default for override
in some situation I would have to use other overrides for specific situations such as docker-compose.ci.yml on my CI, but usually the main Docker Compose file is enough to describe my Prod environment (and if that's not the case, docker-compose.prod.yml does the trick)
I've seen them all used in different scenarios. There are some gotchas to avoid:
Applications inside of a container shouldn't depend on something running outside of a container on the host. So all of your dependencies should be containerized first.
File permissions with host volumes can be complicated depending on your version of docker. Some of the newer Docker Desktop installs automatically handle uid mappings, but if you develop directly on Linux you'll need to ensure the containers run as the same uid as your host user.
Avoid making changing inside the container if that isn't mapped into a host volume, since those changes will be lost when the container is recreated.
Looking at each of the options, here's my assessment of each:
Containerizing just the DB: This works well when developers already have a development environment for the language of choice, and there's no risk of external dependencies creeping in, e.g. a developer upgrading their JDK install to a newer version than the image is built with. It follows the idea of containerizing the dependencies first, while also giving developers the familiar IDE integration with their application.
Rebuilding the Image for Every Change: This tends to be the least ideal for developer workflow, but the quickest to implement when you're not familiar with the tooling. I'll give a 4th option that I consider an improvement to this.
Everything in a container, volume mounts, and live reloading: This is the most complicated to implement, and requires the language itself to support things like live reloading. However, when they do, it is nearly seamless for the developers and gets them up to speed on a new project quickly without needing to install any other tooling to get started.
Rebuild the app in the container with volume mounts: This is a halfway point between 2 and 3. When you don't have live reloading, you likely need to recompile or restart the interpreter to see any change. Rather than rebuilding the image, I put the recompile step in the entrypoint of a development image. I'll mount the code into the container, and run a full JDK instead of just a JRE (or whatever compiler is needed). I use named volumes for any dependency caches so they don't need to download on every restart. Then the method to see the changes is to restart that one container. The steps are identical to a compiled binary outside of a container, stop the old service, recompile, and restart the service, but now it happens inside of a container that should have the same tools used when building the production image.
For option 4, I tend to use a multi-stage build that has stages for build, develop, and release. The build stage pulls in the code and compiles it, the develop stage is the same base image as build but with an entrypoint that does the compile/run, and the release stage copies the result of the build stage into a minimal runtime. Developers then have a compose file for development that creates the development image and runs that with volume mounts and any debugging ports opened.
First of all, docker-compose is just for development and also testing phase, not for production. Example:
With a minimal and basic docker-compose, all your containers will run in the same machine? For development purposes it is ok, but in production, put all the apps in just one machine is a risk
Official link https://docs.docker.com/compose/production/
We will assume
01 java api
01 mysql database
01 web application that needs the api
all of these applications are already in production
Quick Answer
If you need to fix or add new feature to the java api, I advice you to use an ide like eclipse or IntelliJ Idea. Why?
Because java needs compilation.
Compile inside a docker container will take more time due to maven dependencies
IDE has code auto completion
etc
In this development phase, Docker helps you with one of its most powerful features: "Bring the production containers to your localhost". Yeah, in this case, docker-compose.yml is the best option because with one file, you could start everything you need : mysql database and web app but not your java api. Open your java api with your favorite ide.
Anyway if you want to use docker to "develop", you just need the Dockerfile and perform a docker build ... when you need to run your source code in your localhost
Basic Devops Life cycle with docker
Developer push source code changes using git
Your continuous integration (C.I) platform detect this change and perform
docker build ... (In this step, unit test are triggered)
docker push to your private hub. Container is uploaded in this step and will be used to deployments on other servers.
docker run or container deploy to the next environment : testing
Human testers ,selenium or another automation start their work
If no errors are detected, your C.I perform perform a final deploy of the uploaded container to your production environment. No docker build are required, just deploy or docker run.
Some Tips
Docker features are awesome but sometimes add too much complexity. So stop using volumes , hard disk dependency, logs, or complex configurations. If you use volumes, what will happen when your containers are in different hosts?
Java and Nodejs are a stable languages and your rest api or web apps does not need crazy configurations. Just maven compilation and java -jar ... or npm install and npm run start.
For logs you could use https://www.graylog.org/, google stasckdriver or another log management.
And like Heroku, stop using hard disk dependency as much as possible. In heroku platform disk are disposable, it means disappear when app is restarted. So instead of local file storage, you could use another file storage service with a lot of functionalities.
With this approaches, your containers can be deployed anywhere in a simple way
I'm using something similar to your 3rd scenario for my web dev, but it is Node-based. So I have 3 docker-compose files (actually 4, one is base and having all common stuff for others) for dev, staging and production environments.
Staging docker-compose config is similar to production config excluding SSL, ports and other things that may not allow to use it locally.
I have a separate container for each service (like DB, queue), and for dev, I also have additional dev DB and queue containers mostly for running auto-tests. In dev environment, all source are mounted into containers, so it allows to use IDE/editor of choice outside the container, and see changes inside.
I use supervisor to manage my workers inside a container with workers and have some commands to restart my workers manually when I need this. Maybe you can have something similar to recompile/restart your Java app. Or if you have an idea of how to organize app source code changes detection and your app auto-reloading, than could be the best variant. By the way, you gave me an idea to research something similar suitable for my case.
For staging and production environment, my source code is included inside the corresponding container using production Dockerfile. And I have some commands to restart all stuff using an environment I need, and this typically includes rebuilding containers, but because of Docker cache, it doesn't take much time (about 20 seconds). And taking into account that switching between environments is not a too frequent operation I feel quite comfortable with this.
Production docker-compose config is used only during deployment because it enables SSL, proper ports and has some additional production stuff.
Update for details on backend app restarting using Supervisor:
This is how I use it in my projects:
A part of my Dockerfile with installing Supervisor:
FROM node:10.15.2-stretch-slim
RUN apt-get update && apt-get install -y \
# Supervisor
supervisor \
...
...
# Configs for services/workers managed by supervisor
COPY some/path/worker-configs/*.conf /etc/supervisor/conf.d/
This is an example of one of Supervisor configs for a worker:
[program:myWorkerName]
command=/usr/local/bin/node /app/workers/my-worker.js
user=root
numprocs=1
stopsignal=INT
autostart=true
autorestart=true
startretries=10
In this example in your case command should run your Java app.
And this is an example of command aliases for convenient managing Supervisor from outside of containers. I'm using Makefile as a universal runner of all commands, but this can be something else.
# Used to run all workers
su-start:
#docker exec -t MY-WORKERS-CONTAINER-NAME supervisorctl start all
# Used to stop all workers
su-stop:
#docker exec -t MY-WORKERS-CONTAINER-NAME supervisorctl stop all
# Used to restart all workers
su-restart:
#docker exec -t MY-WORKERS-CONTAINER-NAME supervisorctl restart all
# Used to check status of all workers
su-status:
#docker exec -t MY-WORKERS-CONTAINER-NAME supervisorctl status
As I described above these Supervisor commands need to be run manually, but I think it is possible to implement maybe another Node-based worker or some watcher outside of a container with workers that will detect file system changes for sources directory and run these commands automatically. I think it is possible to implement something like this using Java as well like this or this.
On the other hand, it needs to be done carefully to avoid constant restarting workers on every little change.

How to deploy an application running in docker - best practice?

We are discussing how we should deploy our application running in a docker container. At the moment, we build our application image in the pipeline containing the application code. Which means we have to build the docker image every time the application updates.
Another approach we consider is putting the application code in a volume on the server. We then pull the latest release with git on the server. So the image has not to be rebuilt.
So our discussed options are:
Build the image containing the application code
Use a volume and store the application code on the server
What is best practice to do and why?
While the other answers here have explained the point of building code into your image, I'd like to go one step further and show you how to get the benefits of both worlds while following this best practice.
Docker best practices call for building source code into your image before deployment, rather than deploying an image with dependencies installed and then source code mounted in as a volume.
This gives you a self-contained, portable container that is straightforward to test, deploy, or rollback.
May I take a stab at why you are considering hot-mounting code?
Hot-mounting code is appealing for several reasons — and they're all easy to achieve without sacrificing this best practice of building a self-contained image:
Building Docker images can be slow, so why rebuild for a minor change when you can just hot-mount the code?
A complementary best practice is to use a "base image" that installs all dependencies -- usually the slow part of building a docker image. The key insight is that this base image won't change often!
But the image that derives from it -- your application image, which installs source code -- will change with every commit you want to deploy. That derived image will be very fast to build. The Dockerfile could be as simple as:
FROM myapp/base . # all dependencies installed in base image
ADD code.tar.gz /src # automatic untaring!
CMD [...] # whatever it takes to run your app
Hot-mounting enables faster development cycles, because a developer won't need to flush their docker container, rebuild, and run a new container just to see a change.
This is a fair point. I recommend making a "dev" image (which will also derive from your base image) that enables code mounting via a volume rather than the source code installation steps you'd have in your testing and deployment images.
When you build image every time with new application you have easy way to deploy it later on to the customer or on your production server. When the docker image is ready you can keep it in the repository. Additionally you have full control on that that your docker is working with current application.
In case of keeping the application in mounted volume you have to keep in mind following problems:
life cycle of application - what to do with container when you have to update the application - gently stop, overwrite and run again
how do you deploy your application - you have to do it manually over SSH, or you want just to run simple command docker run, and it runs your latest version from your repository
The mounted volumes are rather for following casses:
you want to have externally exposed settings for container - what is also not a good idea
you want to have externally access to the data produced by the application like logs, db, etc
To automate it totally, you can:
build image for each application and push to the repository
use for example watchtower to automatic update of the system on your production servers
I believe you should follow the first approach i.e. rebuilding the docker image every time there are changes in code. Reasons are-
Firstly, if you are using volume, every time you have to manage the clean closing and removing of the previous version of the application and check whether the new version of the application is running correctly. Your new application might get affected dependencies of your previous version of the application. That need to be taken care too.
Secondly, there might be some version updates of the frameworks installed and some new frameworks are to be installed with the current application. In this case, the first approach seems to be the only option.
Thirdly, As when you are using docker volume you will be removing the most important feature of docker i.e. abstraction from outside environment. Also, the image might become machine dependent because of it, which might affect if you want to publish the app in multiple environments.
My suggestion would be creating a pipeline using some continuous integration tool and fully automate the process starting from code building, building of docker image and deploying it to your environment.

Handling software updates in Docker images

Let's say I create a docker image called foo that contains the apt package foo. foo is a long running service inside the image, so the image isn't restarted very often. What's the best way to go about updating the package inside the container?
I could tag my images with the version of foo that they're running and install a specific version of the package inside the container (i.e. apt-get install foo=0.1.0 and tag my container foo:0.1.0) but this means keeping track of the version of the package and creating a new image/tag every time the package updates. I would be perfectly happy with this if there was some way to automate it but I haven't seen anything like this yet.
The alternative is to install (and update) the package on container startup, however that means a varying delay on container startup depending on whether it's a new container from the image or we're starting up an existing container. I'm currently using this method but the delay can be rather annoying for bigger packages.
What's the (objectively) best way to go about handling this? Having to wait for a container to start up and update itself is not really ideal.
If you need to update something in your container, you need to build a new container. Think of the container as a statically compiled binary, just like you would with C or Java. Everything inside your container is a dependency. If you have to update a dependency, you recompile and release a new version.
If you tamper with the contents of the container at startup time you lose all the benefits of Docker: That you have a traceable build process and each container is verifiably bit-for-bit identical everywhere and every time you copy it.
Now let's address why you need to update foo. The only reason you should have to update a dependency outside of the normal application delivery cycle is to patch a security vulnerability. If you have a CVE notice that ubuntu just released a security patch then, yep, you have to rebuild every container based on ubuntu.
There are several services that scan and tell you when your containers are vulnerable to published CVEs. For example, Quay.io and Docker Hub scan containers in your registry. You can also do this yourself using Clair, which Quay uses under the hood.
For any other type of update, just don't do it. Docker is a 100% fossilization strategy for your application and the OS it runs on.
Because of this your Docker container will work even if you copy it to 1000 hosts with slightly different library versions installed, or run it alongside other containers with different library versions installed. You container will continue to work 2 years from now, even if the dependencies can no longer be downloaded from the internet.
If for some reason you can't rebuild the container from scratch (e.g. it's already 2 years old and all the dependencies went missing) then yes, you can download the container, run it interactively, and update dependencies. Do this in a shell and then publish a new version of your container back into your registry and redeploy. Don't do this at startup time.

Docker procedure - treat local copy as staging server?

I've read a bit about docker and have read only about the following procedure:
Build container using Dockerfile
Push container to repo/server
Run container
new version, go to step 1)
The claim is that this is quick, which it sometimes is. However, often in these recipes, I'll see a 'git pull' or 'add ' step followed by a bundle install or some other preparation step. If you always do this, you throw away a fair amount of progress and start the processes as if you had never installed your app in the first place, though it won't have to reinstall any prerequisites. Not to mention, you have to upload a bunch of big images to your server - much of which is duplicate stuff anyway.
It occurs to me that a better procedure might be to treat your local Docker instance as a staging server, ssh-ing in (to get magical SSH user agent forwarding to work more reliably), updating code, testing, then committing the changes and pushing it up to whatever cloud service runs your docker instances.
Am I missing something? Is this what everyone actually does, but doesn't really write about (because it's more complex)? Or have I just not stumbled on to the article that talks about this?
Docker containers generally follow the "golden image" rule - if you need to update it, you create a new image.
Because Docker caches the intermediate results of building Dockerfiles, subsequent builds tend to be fast. Also, the building of images tends to be automated (e.g. see automated builds on the Docker Hub), so as long as it doesn't take a really long time, it's not a problem.
You really don't want to be making changes to to containers manually as you will lose track of the changes and become unable to recreate them.

A Docker workflow for a developers team

In our team we currently use vagrant as a development environment. Now I want to replace it with docker, but I can't understand the team workflow with it.
This is what confuses me: with vagrant I create a project repo with a Vagrantfile in it, and every developer pulls a repo and runs vagrant up. If the project needs some changes in environment, I edit Vagrantfile, chef recipe or requirements-file, and developers must run vagrant provision to get an updated environment.
But with docker I see at least two options:
create a Dockerfile and put it in repo, every developer builds an image from it. On every change they rebuild their own image.
build an image, put it on server, every developer pulls it and run. On every change rebuild and image on server (maybe some auto-rebuilds on server and auto-pull scripts).
Docker phylosophy is 'build once, run anywhere', but the same time we have a Dockerfile in repo... What do you think about it? How do you do this in your team?
Docker is more for production as for development
Docker is a deployment tool to package apps for production (or tests environments). It is not so much a tool for development. It is meant to create an isolated environment to run your already developed application somewhere on a server, the cloud or your laptop.
Use a Dockerfile to document the packaging
I think it is nice to have a Dockerfile in your project. Similar to a Vagrant file, it is a kind of executable documentation which describes how your production environment should look like. Somebody who is new to your project could just run the file and will get a packaged and ready-to-run container. Cool!
Use a registry to integrate Docker
I think you should provide a (private) Docker registry if you integrate Docker into your (CI) workflow (e.g. into test and build systems). A single repository to store validated and tested images of all your products will definitely speed-up your time to create new test or production systems (e.g. to scale your app or to setup an installation for a demo or a customer). If your product is open source, consider the public Docker index so people could find your stuff there. You can configure your build system to create a new Docker image after each (successful) build and to push it to the registry. Since the images are layered (and those layers are shared), this will be fast and will not take to much disk space.
If you want to integrate Docker in your development, I don't see so much possibilities:
You can create a repository with final images (as described before)
Or you can use Docker images to develop against them (e.g. to run a MongoDB)
Maybe you have a team A which programs against the API of team B and always needs a running instance of team B's product. Then you could package this product into a Docker image and share it with team A. In this case, team B should provide the image in a repository (and team A shouldn't take care how to build it and use it as a blackbox).
Edit: If you depend on many external apps
To make this "team A and team B" thing more clear: If you develop an app against many other tools, e.g. an app from another team, a MongoDB or an Elasticsearch, you can package those apps into Docker images, run them (locally) and develop against them. You will also have a good chance to find popular apps (such as MongoDB) in the public Docker Index. So instead of installing them manually, you can just pull and start them. But to put together an environment like this, you will need Vagrant again.
You could also use Docker for test environments (build and run an image and test against it). But this wouldn't be a replacement for Vagrant in development.
Vagrant + Docker
I would suggest to use both. Provide a Vagrantfile to build the development environment and provide a Dockerfile to build the production environment.
Also take a look at http://docs.vagrantup.com/v2/provisioning/docker.html. Vagrant has a Docker integration since a while, so you can create Docker containers/environments with Vagrant.
I work for Docker.
Think of Docker (and the Docker index/registry) as equivalent to Git. You don't have to make this very hard. If you change a Dockerfile, it is a cheap and quick operation to update an image. If you use "Trusted Builds" in our registry, then you can have it build automatically off of any branch at any time you want.
These are basic building blocks, but it works great for development. Docker itself is built and developed inside of Docker containers, so we know it works fine.

Resources