How exactly does docker work? (Theory) - docker

I am venturing into using docker and trying to get a firm grasp of the product.
While I love everything it promises it is a big change from doing things manually.
Right now I understand how to build a container, attach your code, commit and push it to your repo.
But what I am really wondering is how I update my code once it's deployed. For example, I have some minor bug fixes but no change to dependencies, yet I also run a database in the same container.
Container:
Node & NPM
Nginx
MySQL
PHP
Right now the only way I understand you can do it is to stop the container, pull the new image, and run it again, but I am thinking you will lose database data.
I have been reading into https://docs.docker.com/engine/tutorials/dockervolumes/
and thinking maybe the container mounts a data file that persists between containers.
What I am trying to do is run a web app/website with the above container layout and just change code with latest bugfixes/features.

You're quite correct. Docker images are something you should be rebuilding and discarding with each update - avoid commit wherever possible (outside your build scripts anyway).
Persistent state should be managed via data containers (or volumes) that you then mount into the container running your image. Thus your "data" is decoupled from that specific version and instance of the application.
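For example, a minimal sketch of that data-container pattern (the image names and mount paths here are assumptions; a named volume is the modern equivalent):
docker create -v /var/lib/mysql --name app-data mysql:5.7 /bin/true
docker run -d --name app-db --volumes-from app-data -e MYSQL_ROOT_PASSWORD=secret mysql:5.7
# Later, replace the database container while app-data (and its data) stays put
docker stop app-db && docker rm app-db
docker run -d --name app-db --volumes-from app-data mysql:5.7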

Related

Why shouldn't our work inside the container modify the content of the container itself?

I am reading an article related to docker images and containers.
It says that a container is an instance of an image. Fair enough. It also says that whenever you make some changes to a container, you should create an image of it which can be used later.
But at the same time it says:
Your work inside a container shouldn’t modify the container. Like
previously mentioned, files that you need to save past the end of a
container’s life should be kept in a shared folder. Modifying the
contents of a running container eliminates the benefits Docker
provides. Because one container might be different from another,
suddenly your guarantee that every container will work in every
situation is gone.
What I want to know is: what is the problem with modifying a container's contents? Isn't this what containers are for, where we make our own changes and then create an image which will work every time? Even if we are talking about modifying the container's content itself and not just adding additional packages, how will it harm anything, since the image created from this container will also have these changes, and other containers created from that image will inherit those changes too?
Treat the container filesystem as ephemeral. You can modify it all you want, but when you delete it, the changes you have made are gone.
This is based on a union filesystem, the most popular/recommended being overlay2 in current releases. The overlay filesystem merges together multiple lower layers of the image with an upper layer of the container. Reads will be performed through those layers until a match is found, either in the container or in the image filesystem. Writes and deletes are only performed in the container layer.
So if you install packages, and make other changes, when the container is deleted and recreated from the same image, you are back to the original image state without any of your changes, including a new/empty container layer in the overlay filesystem.
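As a quick illustration of that ephemerality (using the public alpine image; the package choice is arbitrary):
docker run --name demo alpine sh -c 'apk add --no-cache curl && which curl'
docker rm demo
docker run --rm alpine sh -c 'which curl || echo "curl is gone: the change lived only in the deleted container layer"'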
From a software development workflow, you want to package and release your changes to the application binaries and dependencies as new images, and those images should be created with a Dockerfile. Persistent data should be stored in a volume. Configuration should be injected as either a file, environment variable, or CLI parameter. And temp files should ideally be written to a tmpfs unless those files are large. When done this way, it's even possible to make the root FS of a container read-only, eliminating a large portion of attacks that rely on injecting code to run inside of the container filesystem.
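Put together, a run command following those guidelines might look roughly like this (the image name, paths, and environment variable are hypothetical):
# Root filesystem immutable, scratch space in a tmpfs, persistent data in a
# named volume, and configuration injected as an environment variable
docker run -d --name app \
  --read-only \
  --tmpfs /tmp \
  -v app-data:/var/lib/app \
  -e APP_ENV=production \
  myapp:1.2.3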
The standard Docker workflow has two parts.
First you build an image:
Check out the relevant source tree from your source control system of choice.
If necessary, run some sort of ahead-of-time build process (compile static assets, build a Java .jar file, run Webpack, ...).
Run docker build, which uses the instructions in a Dockerfile and the content of the local source tree to produce an image.
Optionally docker push the resulting image to a Docker repository (Docker Hub, something cloud-hosted, something privately-run).
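In command form, the build phase might look like this (the image and registry names are hypothetical):
# Run from the source checkout
docker build -t registry.example.com/myapp:1.2.3 .
docker push registry.example.com/myapp:1.2.3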
Then you run a container based off that image:
docker run the image name from the build phase. If it's not already on the local system, Docker will pull it from the repository for you.
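And the run phase, on any machine that can reach that (hypothetical) registry:
docker run -d --name myapp registry.example.com/myapp:1.2.3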
Note that you don't need the local source tree just to run the image; having the image (or its name in a repository you can reach) is enough. Similarly, there's no "get a shell" or "start the service" step in this workflow; docker run on its own should bring everything up.
(It's helpful in this sense to think of an image the same way you think of a Web browser. You don't download the Chrome source to run it, and you never "get a shell in" your Web browser; it's almost always precompiled and you don't need access to its source, or if you do, you have a real development environment to work on it.)
Now: imagine there's some critical widespread security vulnerability in some core piece of software that your application is using (OpenSSL has had a couple, for example). It's prominent enough that all of the Docker base images have already updated. If you're using this workflow, updating your application is very easy: check out the source tree, update the FROM line in the Dockerfile to something newer, rebuild, and you're done.
Note that none of this workflow is "make arbitrary changes in a container and commit it". When you're forced to rebuild the image on a new base, you really don't want to be in a position where the binary you're running in production is something somebody produced by manually editing a container, but they've since left the company and there's no record of what they actually did.
In short: never run docker commit. While docker exec is a useful debugging tool it shouldn't be part of your core Docker workflow, and if you're routinely running it to set up containers or are thinking of scripting it, it's better to try to move that setup into the ordinary container startup instead.
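As a sketch of that last point (the script and image names are hypothetical): rather than exec-ing in after the fact, pass the setup as part of the normal startup command.
# Preferred: setup runs as part of the container's own startup, not via docker exec
docker run -d --name app myapp:1.2.3 sh -c './setup.sh && exec ./run-server'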

Avoiding a constant "stop, build, up" loop when developing with Docker locally

I've recently delved into the wonder that is Docker and have set up my containers using docker-compose (an Express app and a MySQL DB).
It's been great for sharing the project with the team and pushing to a VPS, but one thing that's fast becoming tedious is the need to stop the running app, docker-compose build then docker-compose up any time there are changes to the code (which I believe is also creating numerous unnecessary images?).
I've scoured about but haven't found a clear-cut way to get around this, barring ditching Docker-compose locally and using docker run to run the Express app pointing to a local DB (which would do away with a lot of the easy set up perks that come with Docker, such as building the DB from scratch).
Is there a Nodemon-style way of working with Docker (images/containers get updates automatically when code changes)? Is there something obvious I'm missing? Or is my approach the necessary "evil" that comes with working on a Dockerised app?
You can mount a volume to your source directory for local development. Any changes on the host will be reflected in the container. https://docs.docker.com/storage/volumes/
You might consider separate services for deployment/development. I usually have a separate service which mounts the source directory and installs test dependencies inside the container.
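For instance, a rough sketch of a development run with the source bind-mounted (the image, port, and entrypoint are assumptions based on your Express setup, and nodemon is assumed to be in your project's devDependencies):
docker run --rm -it \
  -v "$(pwd)":/usr/src/app \
  -w /usr/src/app \
  -p 3000:3000 \
  node:18 \
  npx nodemon server.js
With docker-compose you get the same effect by adding a volumes: entry such as ./:/usr/src/app to the app service, so only the database service needs rebuilding when its definition changes.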

Recreating Docker images instead of reusing - for microservices

One microservice stays in one docker container. Now, let's say that I want to upgrade the microservice - for example, some configuration is changed, and I need to re-run it.
I have two options:
I can try to reuse the existing image by having a script that runs on container startup and updates the microservice by reading the new config (if there is one) from some shared volume. After the update, the script runs the microservice.
I can simply drop the existing image and container and create a new image (with a new name) and a new container with the updated configuration/code.
Solution #2 seems more robust to me. There is no 'update' procedure, just single container creation.
However, what bothers me is whether this re-creation of the image has some bad side effects, like a lot of dangling images or something similar. Imagine that this may happen very often while the user plays with the app - for example, if a developer is trying something out, he wants to play with different configurations of the microservice, and he will restart it often. But once it is configured, this will not change. Also, when I say configuration I don't mean just config files, but also user code etc.
For production changes you'll want to deploy a new image for changes to the file. This ensures your process is repeatable.
However, developing by making a new image every time you write a new line of code would be a nightmare. The best option is to run your Docker container and bind mount your source directory from the file system into the container. That way, when you make changes in your editor, the code in the container updates too.
You can achieve this like so:
docker run -v /Users/me/myapp:/src myapp_image
That way you only have to build myapp_image once and can easily make changes thereafter.
Now, if you have a running container that was not mounted and you want to make changes to the files, you can do that too. It's not recommended, but it's easy to see why you might want to.
If you run:
docker exec -it <my-container-id> bash
This will put you into the container and you can make changes in vim/nano/editor of your choice while you're inside.
Your option #2 is definitely preferable for a production environment. Ideally you should have some automation around this process, typically to perform something like a blue-green deploy where you replace containers based on the old image one by one with those from the new, testing as you go and then only when you are satisfied with the new deployment do you clean up the containers from the previous version and remove the image. That way you can quickly roll-back to the previous version if needed.
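A very rough manual version of that swap might look like this (the names and tags are hypothetical; real setups usually script this behind a load balancer):
docker pull myapp:2.0
docker run -d --name myapp-green myapp:2.0
# ...smoke-test myapp-green here...
docker stop myapp-blue && docker rm myapp-blue
docker image rm myapp:1.0   # only once you are happy with 2.0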
In a development environment you may want to take a different approach where you bind mount the application in to the container at runtime allowing you to make updates dynamically without having to rebuild the image. There is a nice example in the Docker Compose docs that illustrates how you can have a common base compose YML and then extend it so that you get different behavior in development and production scenarios.

Is it "safe" to commit a running container in docker?

As the title goes, safe means... the proper way?
Safe = consistent, no data loss, professional, legit way.
Hope to share some experiences with pro docker users.
Q. Is commit safe for running Docker containers (with the exception of rapidly changing realtime stuff and database stuff)? Your own commentary is appreciated.
A Yes or No answer is accepted with a comment. Thanks.
All of a container's memory and hard-disk state is kept inside the container instance. As long as you don't use any external mounts/Docker volumes or external servers (externally connected DBs?), you should never get into trouble stopping, restarting, and committing containers. Please read on to go more in depth on this topic.
A question that you might want to ask yourself initially is: how does Docker store the changes a container makes to its disk at runtime? It is worth checking out how Docker actually manages to get this working. The original state of the container's filesystem is what is given to it by the image. The container can NOT write to this image. Instead of writing to the image, a diff is kept of what has changed in the container's internal state compared to what is in the Docker image.
Docker uses a technology called "Union Filesystem", which creates a diff layer on top of the initial state of the docker image.
This "diff" (referenced as the writable container in the image below) is stored in memory and disappears when you delete your container. When you use docker commit, the writable container that is retained in the temporary "state" of the container is stored inside a new image, however: I don't recommend this. The state of your new docker image is not represented in a dockerfile and can not easily be regenerated from a rebuild. Making a new dockerfile should not be hard. So that is alway the way-to-go for me personally.
When your container is working with mounted volumes or external servers/DBs, you might want to make sure you don't get out of sync, and temporarily stop the services inside the container before committing. If you use a Dockerfile instead, you can start a bootstrap shell script inside your container that sets up connections, performs checks, and initializes the running process to get your application durably set up. Again, running a committed container makes it harder to do something like this.

What would be a good docker webdev workflow?

I have a hunch that docker could greatly improve my webdev workflow - but I haven't quite managed to wrap my head around how to approach a project adding docker to the stack.
The basic software stack would look like this:
Software
Docker image(s) providing custom LAMP stack
Apache with several modules
MySQL
PHP
Some CMS, e.g. Silverstripe
Git
Workflow
I could imagine the workflow to look somewhat like the following:
Development
Write a Dockerfile that defines a LAMP-container meeting the requirements stated above
REQ: The machine should start apache/mysql right after booting
Build the docker image
Copy the files required to run the CMS into e.g. ~/dev/cmsdir
Put ~/dev/cmsdir/ under version control
Run the docker container, and somehow mount ~/dev/cmsdir to /var/www/ on the container
Populate the database
Do work in /dev/cmsdir/
Commit & shut down docker container
Deployment
Set up remote host (e.g. with ansible)
Push container image to remote host
Fetch cmsdir-project via git
Run the docker container, pull in the database and mount cmsdir into /var/www
Now, this looks all quite nice on paper, BUT I am not quite sure whether this would be the right approach at all.
Questions:
While developing locally, how would I get the database to persist between reboots of the container instance? Or would I need to run sql-dump every time before spinning down the container?
Should I have separate container instances for the db and the apache server? Or would it be sufficient to have a single container for above use case?
If using separate containers for database and server, how could I automate spinning them up and down at the same time?
How would I actually mount /dev/cmsdir/ into the containers /var/www/-directory? Should I utilize data-volumes for this?
Did I miss any pitfalls? Anything that could be simplified?
If you need database persistence independent of your CMS container, you can use one container for MySQL and one container for your CMS. In that case, you can keep your MySQL container running and redeploy your CMS as often as you want, independently.
For development, another option is to map the MySQL data directories from your host/development machine using data volumes. This way you can manage the data files for MySQL (in Docker) using Git (on the host) and "reload" the initial state any time you want (before starting the MySQL container).
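For example, a minimal sketch of that host-mapped approach (the host path and password are assumptions):
docker run -d --name dev-mysql \
  -v /home/user/dev/mysql-data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=devpassword \
  mysql:5.7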
Yes, I think you should have a separate container for db.
I am using just a basic script:
#!/bin/bash
JOB1=$(docker run ... /usr/sbin/mysqld)
JOB2=$(docker run ... /usr/sbin/apache2)
echo "MySql=$JOB1, Apache=$JOB2"
Yes, you can use data volumes (the -v switch). I would use this for development. You can use read-only mounting, so no changes will be made to this directory if you want (your app should store data somewhere else anyway).
docker run -v=/home/user/dev/cmsdir:/var/www/cmsdir:ro image /usr/sbin/apache2
Anyway, for final deployment, I would build an image using a Dockerfile with ADD /home/user/dev/cmsdir /var/www/cmsdir
I don't know :-)
You want to use docker-compose. Follow the tutorial here. Very simple. Seems to tick all your boxes.
https://docs.docker.com/compose/
I understand this post is over a year old at this time, but I have recently asked myself very similar questions and have several great answers to your questions.
You can set up a MySQL Docker instance and have the data persist in a data-only container, i.e. the data container does not need to be actively running.
Yes, I would recommend having separate instances for your web server and database. This is the power of Docker.
Check out this repo I have been building. Basically it is as simple as make build & make run and you can have a web server and database container running locally.
You use the -v argument when running the container for the first time; this will link a specific folder on the host to a path inside the container.
I think your ideas are great and it is currently possible to achieve all that you are asking.
Here is a turn-key solution achieving all of the needs you have listed.
I've put together an easy-to-use Docker Compose setup that should match your development workflow requirements.
https://github.com/ehyland/docker-silverstripe-dev
Main Features
Persistent DB
Your choice of HHVM + NGINX or Apache2 + PHP5
Debug and set breakpoints with xDebug
The README.md should be clear enough to get you started.
