How do I save changes to a docker container and image - docker

I ran a container and it was missing a command alias like ll. So I typed alias ll="ls -lta" in the terminal while I was inside the container. After that, I ran docker commit to commit the changes, got a new image, deleted the old one, and ran a new container from the image I had committed. But I was not able to use the ll alias. What am I missing here?

Container state is only persisted through files.
alias ll="ls -lta" made no file changes, so no state change was persisted by docker commit.
You can achieve the result you intend by editing one of the files the shell reads to set up its environment when it starts, e.g. ~/.bashrc or ~/.bash_profile. You'll need to determine which one applies to your environment/OS.
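A minimal sketch of that workflow, assuming a bash-based container (the container ID and image tag are placeholders):
# inside the container: persist the alias in a file bash reads at startup
echo 'alias ll="ls -lta"' >> ~/.bashrc
exit
# back on the host: commit the container, which now includes the file change
docker commit <container_id> my-image:with-alias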

Related

My docker container keeps instantly closing when trying to run an image for bigcode-tools

I'm new to Docker, and I'm not quite sure how to deal with this situation.
So I'm trying to run a docker container in order to replicate some results from a research paper, specifically from here: https://github.com/danhper/bigcode-tools/blob/master/doc/tutorial.md
(image link: https://hub.docker.com/r/tuvistavie/bigcode-tools/).
I'm using a Windows machine, and every time I try to run the docker image (via docker run -p 80:80 tuvistavie/bigcode-tools), it instantly closes. I've tried running other images, such as getting-started, but that image doesn't close instantly.
I've looked at some other potential workarounds, like using -dit, but since the instructions require setting an alias/doskey for a docker run command, using the alias and chaining it with other commands multiple times creates a queue of docker containers, because the port is tied to the alias.
Like in the instructions from the GitHub link, I'm trying to set an alias/doskey to make API calls to pull data, but I am not getting any data, nor am I getting any errors when performing the calls at the command prompt.
Sorry for the long question, and thank you for your time!
Going in order of the instructions:
0. I can run this; it added the image to my Docker Desktop.
1.
Since I'm using a Windows machine, I had to use set instead of export.
I'm not exactly sure what the $ is meant for in UNIX, or whether it has significant meaning, but from my understanding the whole purpose of this step is to create a directory named bigcode-workspace.
Instead of alias, I needed to use doskey.
Since -dit prevented my image from instantly closing, I added that in as well, but I'm not 100% sure what it means. Running docker run (...) on its own resulted in the docker image instantly closing.
When it came to using the doskey alias + another command, I've tried:
(doskey macro) (another command)
(doskey macro) ^& (another command)
(doskey macro) $T (another command)
This also seemed to be using a GitHub API call, so I added --token=(github_token) as well, but that didn't change anything either.
Because the later steps require the expected data pulled from here, I am unable to progress any further.
It looks like this image is designed to be used as a command-line utility, so it should not run continuously; instead, you invoke it via the docker-bigcode alias for each task.
$BIGCODE_WORKSPACE is an environment variable expansion here, so on a Windows machine it's %BIGCODE_WORKSPACE%. You might want to set this variable in Settings -> System -> About -> Advanced System Settings, because variables set with the SET command apply to the current command prompt session only. Or you can specify the path directly, without an environment variable.
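Alternatively, setx persists a variable from the command line for future sessions (note it does not affect the session it is run in; the path here is just an example):
setx BIGCODE_WORKSPACE "C:\bigcode-workspace"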
As for the alias, I would just create a batch file with the following content:
docker run -p 6006:6006 -v %BIGCODE_WORKSPACE%:/bigcode-tools/workspace tuvistavie/bigcode-tools %*
This will run the specified command, appending the batch file's parameters at the end. You might need to add double quotes if the BIGCODE_WORKSPACE path contains spaces.
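A full sketch of such a batch file, e.g. docker-bigcode.bat (the file name is just a suggestion, and --rm is an addition that cleans up the container after each run):
@echo off
REM run the image as a throwaway container, forwarding all batch-file arguments
docker run --rm -p 6006:6006 -v "%BIGCODE_WORKSPACE%":/bigcode-tools/workspace tuvistavie/bigcode-tools %*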

Modifying volume data inherited from parent image

Say there is an image A described by the following Dockerfile:
FROM bash
RUN mkdir "/data" && echo "FOO" > "/data/test"
VOLUME "/data"
I want to specify an image B that inherits from A and modifies /data/test. I don't want to mount the volume; I want it to have some default data that I specify in B:
FROM A
RUN echo "BAR" > "/data/test"
The thing is that the test file keeps the content it had at the moment of the VOLUME instruction in A's Dockerfile. The test file in image B will contain FOO instead of the BAR I would expect.
The following Dockerfile demonstrates the behavior:
FROM bash
# overwriting volume file
RUN mkdir "/volume-data" && echo "FOO" > "/volume-data/test"
VOLUME "/volume-data"
RUN echo "BAR" > "/volume-data/test"
RUN cat "/volume-data/test" # prints "FOO"
# overwriting non-volume file
RUN mkdir "/regular-data" && echo "FOO" > "/regular-data/test"
RUN echo "BAR" > "/regular-data/test"
RUN cat "/regular-data/test" # prints "BAR"
Building the Dockerfile will print FOO and BAR.
Is it possible to modify the file /data/test in B's Dockerfile?
It seems that this is intended behavior.
Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
VOLUMEs are not part of your image, so what is the use case for seeding data into them? When you move the image to another location, it starts with an empty volume, and the Dockerfile behaviour reminds you of that.
So basically, if you want to keep the data together with the app code, you should not use VOLUMEs. If the volume declaration exists in the parent image, then you need to remove the volume before starting your own image build (for example with docker-copyedit).
There are a few non-obvious ways to do this, and all of them have their obvious flaws.
Hijack the parent docker file
Perhaps the simplest, but least reusable, way is to simply take the parent Dockerfile and modify that. A quick Google of docker <image-name:version> source should find the GitHub repository hosting the parent image's Dockerfile. This is good for optimizing the final image, but it defeats the point of using layers.
Use an on start script
While a Dockerfile can't make further modifications to a volume, a running container can. Add a script to the image, and change the ENTRYPOINT to call it (and have that script call the original entrypoint). This is what you will HAVE to do if you are using a singleton-type container and need to partially 'reset' a volume on startup. Of course, since volumes are persisted outside the container, just remember that 1) your changes may already be made, and 2) another container started at the same time may be making those changes right now.
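A sketch of this approach, assuming image A from the question; the script name, the marker file, and the final CMD are all illustrative:
FROM A
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["bash"]
And entrypoint.sh itself:
#!/bin/sh
# seed the volume only on first start; a marker file guards against re-seeding
if [ ! -f /data/.seeded ]; then
    echo "BAR" > /data/test
    touch /data/.seeded
fi
exec "$@"   # hand off to the original command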
Since volumes are (virtually) forever, I just use one-time setup scripts after starting the containers for the first time. That way I can easily control when the default data is set up or reset. (You can use docker inspect <volume-name> to get the host location of the volume if you need to.)
The common middle ground on this one seems to be to have a one-off image whose only purpose is to run once to do the volume configurations, and then clean it up.
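For example, a throwaway container can seed a named volume before the real service starts (the volume name mydata is illustrative):
docker run --rm -v mydata:/data bash sh -c 'echo "BAR" > /data/test'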
Bind to a new volume
Copy the contents of the old volume to a new one, and configure everything to use the new volume instead.
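A sketch of the copy step, assuming the volumes are named old-data and new-data:
# copy everything from the old volume into the new one
docker run --rm -v old-data:/from -v new-data:/to bash cp -a /from/. /to/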
And finally... reconsider if Docker is right for you
You've probably already spent more time on this than it was worth. (In my experience, the maintenance pain of Docker has always far outweighed the benefits. But Docker is a tool, and as with any tool, you sometimes need to take a moment to reflect on whether you are using it right, and whether there are better tools for the job.)

How do I update docker images?

I've read that Docker works with layers: when creating a container with a Dockerfile, you start with the base image, and each subsequent command adds a layer, so if you save the state of that new container, you have a new image. There are a couple of things I'm wondering about this.
1.) If I start from an Ubuntu image, which is pretty big and bulky since it's a complete OS, then I add a few tools to it and save this as a new image which I upload to the hub. If someone downloads my image and they already have an Ubuntu image saved in their images folder, does this mean they can skip downloading Ubuntu, since they already have the image? If so, how does this work when I modify parts of the original image: does Docker use its cached data to selectively apply those changes to the Ubuntu image after it loads it?
2.) How do I update an image that I built by modifying the Dockerfile? I set up a simple Django project with this Dockerfile:
FROM python:3.5
ENV PYTHONUNBUFFERED 1
ENV APPLICATION_ROOT /app
ENV APP_ENVIRONMENT L
RUN mkdir -p $APPLICATION_ROOT
WORKDIR $APPLICATION_ROOT
ADD requirements.txt $APPLICATION_ROOT
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
ADD . $APPLICATION_ROOT
and used this to create the image in the beginning. So every time I create a box, it loads all these environment variables; if I rebuild the box completely, it reinstalls the packages and all the extras. I need to add a new environment variable, so I added it to the bottom of the Dockerfile, along with a test variable:
ENV COMPOSE_CONVERT_WINDOWS_PATHS 1
ENV TEST_ENV_VAR TEST
When I delete the container and the image and rebuild, it all seems to go accordingly; it tells me that it creates the new step:
Step 4 : ENV COMPOSE_CONVERT_WINDOWS_PATHS 1
 ---> Running in 75551ea311b2
 ---> b25b60e29f18
Removing intermediate container 75551ea311b2
So it's like something gets lost in some of these intermediate container transitions. Is this how the caching system works: is every new layer an intermediate container? With that in mind, how do you add a new layer? Do you always have to add the new data at the bottom of the Dockerfile? Or would it be better to leave the Dockerfile alone once the image is built, and just modify the container and build a new image?
EDIT: I just tried pulling an image, a package called bwawrik/bioinformatics, which is a CentOS-based image that has a wide range of tools installed.
It froze halfway through, so I exited and then ran it again to see if everything was installed:
$ docker pull bwawrik/bioinformatics
Using default tag: latest
latest: Pulling from bwawrik/bioinformatics
a3ed95caeb02: Already exists
a3ed95caeb02: Already exists
7e78dbe53fdd: Already exists
ebcc98113eaa: Already exists
598d3c8fd678: Already exists
12520d1e1960: Already exists
9b4912d2bc7b: Already exists
c64f941884ae: Already exists
24371a4298bf: Already exists
993de48846f3: Already exists
2231b3c00b9e: Already exists
2d67c793630d: Already exists
d43673e70e8e: Already exists
fe4f50dda611: Already exists
33300f752b24: Already exists
b4eec31201d8: Already exists
f34092f697e8: Already exists
e49521d8fb4f: Already exists
8349c93680fe: Already exists
929d44a7a5a1: Already exists
09a30957f0fb: Already exists
4611e742e0b5: Already exists
25aacf0148db: Already exists
74da82504b6c: Already exists
3e0aac083b86: Already exists
f52c7e0ac000: Already exists
35eee92aaf2f: Already exists
5f6d8eb70885: Already exists
536920bfe266: Already exists
98638e678c51: Already exists
9123956b991d: Already exists
1c4c8a29cd65: Already exists
1804bf352a97: Already exists
aa6fe9359956: Already exists
e7e38d1250a9: Already exists
05e935c831dc: Already exists
b7dfc22c26f3: Already exists
1514d4797ffd: Already exists
Digest: sha256:0391808e21b7b5cc0eb44fc2dad0d7f5415115bdaafb4534c0b6a12efd47a88b
Status: Image is up to date for bwawrik/bioinformatics:latest
So it definitely installed the package in pieces, not all in one go. Are these pieces different images?
image vs. container
First, let me clarify some terminology.
image: A static, immutable object. This is the thing you build when you run docker build using a Dockerfile. An image is not a thing that runs.
Images are composed of layers. An image might have only one layer, or it might have many.
container: A running thing. It uses an image as its starting template.
This is similar to a binary program and a process. You have a binary program on disk (such as /bin/sh), and when you run it, it becomes a process on your system. The relationship between images and containers works the same way.
Adding layers to a base image
You can build your own image from a base image (such as ubuntu in your example). Some commands in your Dockerfile will create a new layer in the ultimate image. Some of those are RUN, COPY, and ADD.
The very first layer has no parent layer. But every other layer will have a parent layer. In this way they link to one another, stacking up like pancakes.
Each layer has a unique ID (the long hexadecimal hashes you have already seen). They can also have human-friendly names, known as tags (e.g. ubuntu:16.04).
What is a layer vs. an image?
Technically, each layer is also an image. If you build a new image and it has 5 layers, you can use that image, and it will contain all 5 layers. If you run a container using the third layer in the stack as your image ID, you can do that too, but it would only contain 3 layers: the one you specify and the two that are its ancestors.
But as a matter of convention, the term "image" generally means the layer that has a tag associated. When you run docker images, it will show you all of the top-level images, and hide the layers beneath (but you can show them all with -a).
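For example:
docker images        # shows only the tagged, top-level images
docker images -a     # also shows the untagged layers beneath them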
What is an intermediate container?
When docker build runs, it does all of its work inside of containers (naturally!) So if it encounters a RUN step, it will create a container from the current top layer, run the specified commands in there, and then save the result as a new layer. Then it will create a container from this new layer, run the next thing... etc.
The intermediate containers are only used for the build process, and are discarded after the build.
How layer filesystems work
You asked whether someone downloading your ubuntu-based image is only doing a partial download if they already have the ubuntu image locally.
Yes! That's exactly right.
Every layer uses the layer beneath it as a base. The new layer is basically a diff between that layer and the new state. It's not a diff in the same way as a git commit might work, though: it works at the file level, not at the line level.
Say you started from ubuntu, and you ran this Dockerfile.
FROM ubuntu:16.04
RUN groupadd dan && useradd -m -g dan dan
This would result in a two-layer image. The first layer would be the ubuntu image. The second would probably contain only a handful of changes:
A newer copy of /etc/passwd with user "dan"
A newer copy of /etc/group with group "dan"
A new directory /home/dan
A couple of default files like /home/dan/.bashrc
And that's it. If you start a container from this image, those few files would be in the topmost layer, and everything else would come from the filesystem in the ubuntu image.
The top-most read-write layer in a container
One other point. When you run a container, you can write files in the filesystem. But if you stop the container and run another container from the same image, everything is reset. So where are the files written?
Images are immutable, so once they are created, they can't be changed. You can build a new version, but that's a new image. It would have a different ID and would not be the same image.
A container has a top-level read-write layer which is put on top of the image layers. Any writes happen in that layer. It works just like the other layers. If you need to modify a file (or add one, or delete one), that is done in the top layer, and doesn't affect the lower layers. If the file exists already, it is copied into the read-write layer, and then modified. This is known as copy-on-write (CoW).
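A quick demonstration of the throwaway read-write layer (the image tag is just an example):
docker run --name first ubuntu:16.04 bash -c 'echo hello > /file1'
# a second container from the same image starts with a fresh read-write layer
docker run --name second ubuntu:16.04 cat /file1    # fails: no such file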
Where to add changes
Do you have to add new things at the bottom of the Dockerfile? No, you can add anything anywhere (or change anything).
However, how you do things does affect your build times because of how the build caching works.
Docker will try to cache results during builds. If, as it reads through the Dockerfile, it finds that the FROM is the same, the first RUN is the same, the second RUN is the same... it will assume it has already done those steps and will use the cached results. If it encounters something that differs from the last build, it will invalidate the cache, and everything from that point on will be re-run fresh.
Some steps look at more than the command text. With ADD and COPY, Docker also checksums the files being copied and compares them against the previous build; if the file contents have changed, the cache is invalidated from that step onward even though the command itself is unchanged.
So it is common practice to start with FROM, then put very static things next, like RUN commands that install packages with e.g. apt-get. Those things tend not to change much after your Dockerfile has initially been written. Things that change more often belong later in the file.
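For example, a cache-friendly layout might look like this (package and path names are illustrative):
FROM ubuntu:16.04
# stable steps first: these stay cached across most rebuilds
RUN apt-get update && apt-get install -y python3 curl
# frequently changing steps last, so a cache miss here is cheap
COPY . /app
WORKDIR /app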
It's hard to concisely give good advice on this, because it really depends on the project in question. But it pays to learn how the build caching works and try to take advantage of it.

Docker diff isn't showing changes to new files

Let's say I start a new ubuntu container and add a file to it, like so:
touch /file1
I run docker diff <container_id>, and I see that /file1 has been added to the system. Awesome. However, now when I do something like this:
echo "hello" > /file1
I do not see that change when I run docker diff again. However, if I create a file under, say, /etc, docker diff will show that /etc has changed and that /etc/new_file has been added (so I know docker diff can show changed files).
Does anyone know of a way to show changes to files that are added to the container during runtime?
You can have a look here: https://github.com/docker/docker/issues/29328.
As explained in that issue, docker diff compares the container's rootfs against the image it was created from.
Since the file does not exist in the image, it can only ever show up as added; the diff has no concept of what happened inside the file afterwards.
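A sketch that reproduces the behaviour (the container name and file are illustrative):
docker run -d --name demo ubuntu sleep 300
docker exec demo touch /file1
docker diff demo                               # shows: A /file1
docker exec demo sh -c 'echo hello > /file1'
docker diff demo                               # still shows only: A /file1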

docker-compose caches run results

I'm having an issue with docker-compose where I pass a file into the container when it's run. The problem is that it doesn't seem to recognize when the file has been changed, and it serves the saved result back indefinitely, until I change the name of the file.
An example (modified names for brevity):
jono@macbook:~/myProj% docker-compose run vpn conf.opvn
Options error: Unrecognized option or missing parameter(s) in conf.opvn:71: AXswRE+
5aN64mYiPSatOACC6+bISv8RcDPX/lMYdLwe8zQY6qWtbrjFXrp2 (2.3.8)
Then I change the file, save it, and run the command again - exact same output.
Then without changing anything I do this:
jono@macbook:~/myProj% cp conf.opvn newconf.opvn
And when I run $ docker-compose run vpn newconf.opvn it works. Seems really silly.
I'm working with tmux on a Mac, in case that affects it. Is this the expected behaviour? I couldn't find anything documenting this on the docker-compose homepage.
EDIT:
Specifically I'm using this repo from the amazing Jess.
The image you are using mounts your current directory as a volume; that is how the file conf.opvn becomes visible inside the docker container.
When you change the file, the container doesn't see that change, but it does pick up the rename (which the container sees as a new file). This is most probably due to the user rights on the file and on the folder in the docker container where it is mounted. Try changing the file's permissions to 777 before beginning the process and check again.
You can find a discussion about this in the official Docker forum.
