I'm looking for a way to share files/folders between Docker containers. Sharing a single file in particular gives me issues. I want to use Docker in production with docker-compose and use a deployment technique that gives me zero downtime (like blue/green or something else).
What I have so far is to deploy the new source code by checking out the git source first. I keep the old container running until the new one is up, then I stop the old one and remove it.
The problem I'm running into with shared files is that Docker doesn't lock files. So when two containers running the same application are up and writing to the same file, shared_database.db, this causes data corruption.
Folder structure from root looks like this:
/packages (git source)
/www (git source)
/shared_database.db (file I want to share across different deployments)
/www/public/files (folder I want to share across different deployments)
I've tried:
symlinks; unfortunately Docker doesn't support symlinks
mounting shared files/folders within the docker-compose file under volumes section, but since Docker doesn't lock files this causes data corruption
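Roughly, the docker-compose mapping I tried looks like this (the /app paths inside the container are just an example):

services:
  app:
    build: .
    volumes:
      - ./shared_database.db:/app/shared_database.db   # single file shared between deployments
      - ./www/public/files:/app/www/public/files       # folder shared between deployments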
If I need to make myself more clear or need to provide more info, I'd be happy to. Let me know.
I am reading an article related to docker images and containers.
It says that a container is an instance of an image. Fair enough. It also says that whenever you make some changes to a container, you should create an image of it which can be used later.
But at the same time it says:
Your work inside a container shouldn't modify the container. Like previously mentioned, files that you need to save past the end of a container's life should be kept in a shared folder. Modifying the contents of a running container eliminates the benefits Docker provides. Because one container might be different from another, suddenly your guarantee that every container will work in every situation is gone.
What I want to know is: what is the problem with modifying a container's contents? Isn't this what containers are for, a place where we make our own changes and then create an image that will work every time? Even if we are talking about modifying the container's contents themselves and not just adding extra packages, how would that harm anything, since the image created from this container will also have these changes, and other containers created from that image will inherit those changes too?
Treat the container filesystem as ephemeral. You can modify it all you want, but when you delete it, the changes you have made are gone.
This is based on a union filesystem, the most popular/recommended being overlay2 in current releases. The overlay filesystem merges together multiple lower layers of the image with an upper layer of the container. Reads will be performed through those layers until a match is found, either in the container or in the image filesystem. Writes and deletes are only performed in the container layer.
So if you install packages, and make other changes, when the container is deleted and recreated from the same image, you are back to the original image state without any of your changes, including a new/empty container layer in the overlay filesystem.
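A quick way to see this behaviour (the container name and file here are arbitrary examples):

docker run -d --name demo alpine sleep 3600
docker exec demo touch /i-was-here      # written only to the container's upper layer
docker diff demo                        # lists /i-was-here as an addition to that layer
docker rm -f demo
docker run --rm alpine ls /i-was-here   # new container, fresh layer: the file is gone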
From a software development workflow, you want to package and release your changes to the application binaries and dependencies as new images, and those images should be created with a Dockerfile. Persistent data should be stored in a volume. Configuration should be injected as either a file, environment variable, or CLI parameter. And temp files should ideally be written to a tmpfs unless those files are large. When done this way, it's even possible to make the root FS of a container read-only, eliminating a large portion of attacks that rely on injecting code to run inside of the container filesystem.
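A rough sketch of running a container that way (the image name, volume name, mount path, and environment variable are all hypothetical):

# read-only root FS, memory-backed /tmp, persistent data in a named volume,
# configuration injected through the environment
docker run -d --read-only --tmpfs /tmp \
  -v app-data:/var/lib/myapp \
  -e APP_ENV=production \
  myorg/myapp:1.2.3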
The standard Docker workflow has two parts.
First you build an image:
Check out the relevant source tree from your source control system of choice.
If necessary, run some sort of ahead-of-time build process (compile static assets, build a Java .jar file, run Webpack, ...).
Run docker build, which uses the instructions in a Dockerfile and the content of the local source tree to produce an image.
Optionally docker push the resulting image to a Docker repository (Docker Hub, something cloud-hosted, something privately-run).
Then you run a container based off that image:
docker run the image name from the build phase. If it's not already on the local system, Docker will pull it from the repository for you.
Note that you don't need the local source tree just to run the image; having the image (or its name in a repository you can reach) is enough. Similarly, there's no "get a shell" or "start the service" step in this workflow; docker run on its own should bring everything up.
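Concretely, the whole two-phase cycle is only a handful of commands; the repository and image names below are made up:

# build phase: source tree + Dockerfile -> image
git clone https://example.com/myorg/myapp.git
cd myapp
docker build -t registry.example.com/myorg/myapp:1.0.0 .
docker push registry.example.com/myorg/myapp:1.0.0

# run phase, on any host that can reach the registry; no source tree required
docker run -d -p 8080:8080 registry.example.com/myorg/myapp:1.0.0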
(It's helpful in this sense to think of an image the same way you think of a Web browser. You don't download the Chrome source to run it, and you never "get a shell in" your Web browser; it's almost always precompiled and you don't need access to its source, or if you do, you have a real development environment to work on it.)
Now: imagine there's some critical widespread security vulnerability in some core piece of software that your application is using (OpenSSL has had a couple, for example). It's prominent enough that all of the Docker base images have already updated. If you're using this workflow, updating your application is very easy: check out the source tree, update the FROM line in the Dockerfile to something newer, rebuild, and you're done.
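After bumping the FROM line, the rest of the Dockerfile stays exactly the same; a minimal sketch with hypothetical application details:

# base image updated to a tag that includes the security fix
FROM node:18.20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]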
Note that none of this workflow is "make arbitrary changes in a container and commit it". When you're forced to rebuild the image on a new base, you really don't want to be in a position where the binary you're running in production is something somebody produced by manually editing a container, but they've since left the company and there's no record of what they actually did.
In short: never run docker commit. While docker exec is a useful debugging tool it shouldn't be part of your core Docker workflow, and if you're routinely running it to set up containers or are thinking of scripting it, it's better to try to move that setup into the ordinary container startup instead.
I am using VS Code, Remote Development Containers, and Docker to create development environments within containers. Everything works fine, but I did notice that when working with different projects, doing things such as yarn install means having to download the npm modules each time. Of course, once a container has downloaded them they are stored in the cache, specifically /usr/local/share/.cache/yarn/v6.
When I attempted to mount that folder to the host machine, yarn install would start to fail far too often, stating that it was having trouble downloading packages due to a bad network connection (the connection was just fine). So I created a volume instead, and everything worked just fine.
The problem I am running into is that I also want to share other folders in the volume so that multiple containers use the same cache for things such as NuGet packages. I was hoping to somehow have my volume look like this:
mysharedvolume/yarn => /usr/local/share/.cache/yarn/v6
mysharedvolume/nuget => /wherever/nuget/packages/are/cached
mysharedvolume/somefile.config => /wherever/somefile.config
This does not seem to be the way volumes work in Docker; all of the files end up mixed together at the root of the volume (there are no subdirectories). Of course, I can't simply map the entire /usr folder or anything like that; that's crazy.
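The fallback would be a separate named volume per cache, which in docker-compose would look roughly like this (the NuGet path is just an example):

volumes:
  yarn-cache:
  nuget-cache:

services:
  devcontainer:
    build: .
    volumes:
      - yarn-cache:/usr/local/share/.cache/yarn/v6
      - nuget-cache:/root/.nuget/packages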
Before I go off and create different volumes for each cache and config files, is there a way to do this with a single shared volume?
I have a Dockerfile which builds an image that provides a complicated tool-chain environment for compiling a project on a volume mounted from the host machine's file system. Another reason for the mount is that I don't have a lot of space in the image.
The Dockerfile builds my tool-chain in the OS image and then prepares the source by downloading packages to be placed on the host's shared volume. Normally from there I'd log into the image and execute commands to build. And this is the problem: I can download the source in the Dockerfile, but how would I then get it onto the shared volume?
Basically I have ...
ADD http://.../file mydir
VOLUME /home/me/mydir
But then of course, I get the error "cannot mount volume over existing file ...".
Am I going about this wrong?
You're going about it wrong, but you already suspected that.
If you want the source files to reside on the host filesystem, get rid of the VOLUME directive in your Dockerfile, and don't try to download the source files at build time. This is something you want to do at run time. You probably want to provision your image with a pair of scripts:
One that downloads the files to a specific location, say /build.
Another that actually runs the build process.
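A rough sketch of what those two scripts might look like, and how they could be baked into the image; the download URL and build command are only placeholders:

#!/bin/sh
# fetch-sources: download and unpack the source into /build
set -e
curl -fL http://example.com/source.tar.gz | tar -xz -C /build

#!/bin/sh
# build-sources: run the actual build inside /build
set -e
cd /build
make

In the Dockerfile, these replace the ADD and VOLUME lines:

COPY fetch-sources build-sources /usr/local/bin/
RUN chmod +x /usr/local/bin/fetch-sources /usr/local/bin/build-sources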
With these in place, you could first download the source files to a location on the host filesystem, as in:
docker run -v /path/on/my/host:/build myimage fetch-sources
And then you can build them by running:
docker run -v /path/on/my/host:/build myimage build-sources
With this model:
You avoid trying to muck about with volumes during the image build process. That is almost never what you want, since data stored in a volume is explicitly excluded from the image, and the build process doesn't permit you to conveniently mount host directories inside the container.
You are able to download the files into a persistent location on the host, where they will be available to you for editing, or re-building, or whatever.
You can run the build process multiple times without needing to re-download the source files every time.
I think this will do pretty much what you want, but if it doesn't meet your needs, or if something is unclear, let me know.
Recently I was trying to figure out what a Docker workflow looks like.
What I thought was that devs should build and push images locally, and in other environments the servers should just pull that image directly and run it.
But I can see that a lot of public images allow people to put configuration outside the container.
For example, with the official elasticsearch image there is a command as follows:
docker run -d -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch
So what is the point of putting configuration outside the container, instead of baking it into a custom image so containers can just be run quickly?
My argument is:
If I put configuration inside a custom image, then in the testing or production environment the server just needs to pull that same image, which is already built.
If I put configuration outside the image, then in other environments there has to be another process to fetch that configuration from somewhere. Sure, we could keep it under git source control, but isn't that a tedious and unnecessary extra effort to manage? And installing third-party libraries is also required.
Further question:
Should I put the application file (for example, a war file) inside the web server container or outside it?
When you are doing development, configuration files may change often; so rather than keep rebuilding the containers, you may use a volume instead.
If you are in production and need dozens or hundreds of the same container, all with slightly different configuration files, it is easy to have one single image and have diverse configuration files living outside (e.g. use consul, etcd, zookeeper, ... or VOLUME).
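For example, the same elasticsearch image from the question can be started with two different configuration directories mounted in; the directory names here are arbitrary:

docker run -d --name es-dev -v "$PWD/config-dev":/usr/share/elasticsearch/config elasticsearch
docker run -d --name es-prod -v "$PWD/config-prod":/usr/share/elasticsearch/config elasticsearch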
I have an angular project I'm working on containerizing. It currently has enough build tooling that a front-end developer could (and this is how it currently works) just run gulp in the project root, edit source files in src/, and the build tooling handles running traceur, templates and libsass and so forth, spitting content into a build directory. That build directory is served with a minimal server in development, and handled by nginx in production.
My goal is to build a Docker-based environment that replicates this workflow. My users are developers who are on different kinds of boxes, so having build dependencies frozen in a Dockerfile seems to make sense.
I've gotten close enough to this: Docker mounts the host volume, the developer edits the file on the local disk, and the gulp build watcher running in the Docker container instance rebuilds the site (and triggers livereload, etc.).
The issue I have is with wrapping my head around how filesystem layers work. Is this process of rebuilding files in the container's build/frontend directory generating a ton of extraneous saved layers? That's not something I'd really like, because I'm not wild about monotonically growing this instance as developers tweak and rebuild and tweak and rebuild. It'd only grow locally, but having to go through the "okay, time to clean up and start over" process seems tedious.
Is this process of rebuilding files in the container's build/frontend directory generating a ton of extraneous saved layers?
Nope. The only way to stack up extra layers is to commit a container with its changes to a new image, then use that new image to create the next container. Rinse, repeat.
Filesystem layers are saved when a container is committed to a new image (docker commit ...). When a container is running there will be a single read/write layer on top that contains all of the changes made to the container since it was created.
having to go through the "okay, time to clean up and start over" process seems tedious.
If you run the build container with docker run --rm ... then you'll get the cleanup for free. The build container will be created fresh from the image each time.
Also, data volumes bypass the union filesystem so there's a good chance you won't write to the container's filesystem at all.
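A sketch of that kind of throwaway build container, assuming the tool-chain image is called frontend-build (the image name and paths are hypothetical):

# source and build output live on the bind-mounted host directory;
# the container's writable layer is discarded on exit thanks to --rm
docker run --rm \
  -v "$PWD":/app \
  -w /app \
  frontend-build \
  gulp watch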