I keep doing a pull, run, <UPLOAD FILE>, commit, tag, push cycle only to be dismayed that my file is gone when I pull the pushed image. My goal is to include an ipynb file with my image that serves as a README/tutorial for my users.
Reading other posts, I see conflicting advice about whether commit is the way to add a file. What causes commit to persist or disregard a file? Am I supposed to use docker cp to add the file before committing?
If you need to publish your notebook file in a docker image, use a Dockerfile, something like this:
FROM jupyter/datascience-notebook
COPY mynotebook.ipynb /home/jovyan/work
Then, once you have your notebook the way you want it, just run docker build and docker push. To help you a bit more: the reason you are having this problem is that the jupyter images store the notebooks in a volume. Data in a volume is not part of the image; it lives on the filesystem of the host machine. That means a commit isn't going to save anything in the work folder.
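For reference, the build-and-push step might look something like this (the repository name and tag are placeholders):
docker build -t yourname/notebook-tutorial:latest .
docker push yourname/notebook-tutorial:latest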
Really, an ipynb is a data file, not an application. The right way to do this is probably to just upload the ipynb file to a file store somewhere and tell your users to download it, since they could use one docker image to run many data files. If you really want a prebuilt image using the workflow you described, you could just put the file somewhere else that isn't in a volume so that it gets captured in your commit.
For those of you looking for a place to start with docker build, below is the Dockerfile that I build with docker build -t your-image-name:your-new-tag .
Dockerfile
FROM jupyter/datascience-notebook:latest
MAINTAINER name <email>
# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes
# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
# commenting out our public repo
ENV R_HOME=/opt/conda/lib/R
# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY python_sdk.ipynb /usr/local/share/man
COPY r_sdk.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man
# ====== SUDO ======
USER root
# Spark requires Java 8.
RUN sudo apt-get update && sudo apt-get install openjdk-8-jdk -y
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks
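As a rough sketch of the runtime mapping described in the comments above (the host path is a placeholder), mounting a local folder only over user_notebooks/ leaves the notebooks COPY'd into /usr/local/share/man intact:
docker run -p 8888:8888 \
  -v /path/on/host/notebooks:/usr/local/share/man/user_notebooks \
  your-image-name:your-new-tag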
Related
I've copied a file into a docker image with:
COPY dbconfig.xml /var/app/dbconfig.xml
After that I tried to replace some values in the file with:
RUN sed -i "s/PASSWD/$dbpasswd/" /var/app/dbconfig.xml
Note that $dbpassword is an ENV Variable.
When I check the contents of dbconfig.xml, by starting a container from that image and running bash inside it, nothing has changed in the dbconfig.xml.
Now I think I'm misunderstanding some fundamentals of docker images.
I even tested to create a simple file:
RUN echo "test" > newfile.txt
which seems to be deleted after the call.
I know that each RUN statement creates a new layer, and that after the statement it gets removed(?).
I'm confused. Why does something like installing software with
RUN apt-get install -y some-package
not get removed, while creating a simple file does?
So.. how can I change files inside docker images at image-build-time?
Dockerfile:
FROM dchevell/jira-software:8.0
COPY dbconfig.xml /var/atlassian/application-data/jira/dbconfig.xml
WORKDIR /var/atlassian/application-data/jira
# set default password to admin
ENV dbpasswd=admin
RUN sed -i "s/PASSWD/$dbpasswd/" dbconfig.xml \
&& cat dbconfig.xml
RUN echo "test" > newfile.txt
dbconfig.xml
<?xml version="1.0" encoding="UTF-8"?>
<jira-database-config>
<name>defaultDS</name>
<delegator-name>default</delegator-name>
<database-type>postgres72</database-type>
<schema-name>public</schema-name>
<jdbc-datasource>
<url>jdbc:postgresql://docker-postgres:5432/jiradb</url>
<driver-class>org.postgresql.Driver</driver-class>
<username>atlasdb</username>
<password>PASSWD</password>
<pool-test-while-idle>true</pool-test-while-idle>
</jdbc-datasource>
</jira-database-config>
Update 1
Confusingly, when I COPY something into the WORKDIR folder, it persists, but when I try to modify it afterwards with sed, those changes do not persist! I think there is some really dark magic happening in the background...
Maybe I'll try to bind mount my preconfigured dbconfig.xml within docker-compose and see if that helps.
Update 2
From the Docker Documentation:
Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
I totally missed that! Thanks David for pointing me there :) So creating and writing files DOES work as expected, but be careful with VOLUME directories: RUN statements do not work there.
So to address this issue, the best practice would be to bind mount the file into that volume.
If you look at the Dockerfile for that base image, it says in part
ENV JIRA_HOME /var/atlassian/application-data/jira
VOLUME ["${JIRA_HOME}"]
Once you execute a VOLUME statement in a Dockerfile, later Dockerfile statements can't make any more changes in that specific directory.
Given that the sorts of things you're trying to change are very installation-specific settings (admin password, database settings) I wouldn't try to build an image out of these. Instead I'd use the docker run -v option to inject the configuration file at runtime.
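A minimal sketch of that runtime injection, assuming a dbconfig.xml prepared on the host (the host path is illustrative):
docker run -d \
  -v /path/on/host/dbconfig.xml:/var/atlassian/application-data/jira/dbconfig.xml \
  dchevell/jira-software:8.0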
Each RUN statement does not create a persistent intermediate container; it creates a new layer on the union file system, which is read-only. When you run an image, a special writable layer is created for the container, and all the changes you make in that container are written to this layer (except for volumes, which are a different concept). That is why docker is able to share the same image (or even layers) between containers safely, without them affecting each other. You can check the docker documentation for more information.
As for your question: you should see every change you make at build time in a running instance of this image, unless you somehow delete or overwrite it.
See this question.
The commands you are running are correct and they should create the files. What I suspect is that when you run your container, the jira application is overwriting the WORKDIR you have specified.
Try this Dockerfile (keeping the FROM and COPY lines from your original):
FROM dchevell/jira-software:8.0
COPY dbconfig.xml /var/atlassian/application-data/jira/dbconfig.xml
WORKDIR /var/atlassian/application-data/jira
# set default password to admin
ENV dbpasswd=admin
RUN sed -i "s/PASSWD/$dbpasswd/" dbconfig.xml \
&& cat dbconfig.xml
WORKDIR /testtest
RUN touch test.txt
RUN echo "test" > newfile.txt
WORKDIR /var/atlassian/application-data/jira
Now if you start the container, you can see that the files are being created inside the /testtest folder.
If you want your changes to the dbconfig.xml file to persist, you should try using volumes to bind your local dbconfig.xml into the jira folder.
Thanks for this interesting question :)
I have recently started working with docker. I have downloaded a docker image and I want to change it so that I can copy a folder (with its contents) from my local machine into the image, or maybe edit a file in the image.
I thought I could extract the image somehow, make the changes, and then create a new image. I'm not sure it works like that, though. I tried looking for options but couldn't find a promising solution.
The current Dockerfile for the image is somewhat like this:
FROM abc/def
MAINTAINER Humpty Dumpty <#hd>
RUN sudo apt-get install -y vim
ADD . /home/humpty-dumpty
WORKDIR /home/humpty-dumpty
RUN cd lib && make
CMD ["bash"]
Note: I am looking for an easy and clean way to change the existing image only, not to create a new image with the changes.
Since an existing docker image cannot be changed, what I did was create a Dockerfile for a new docker image based on my original docker image (for its contents) and modify it to include the test folder from my local machine in the new image.
This link was helpful Build your own image - Docker Documentation
FROM abc/def:latest
The above line in the Dockerfile tells Docker which image your image is based on, so the contents of the parent image are carried over into the new image.
Finally, to include the test folder from my local drive, I added the command below to my Dockerfile:
COPY test /home/humpty-dumpty/test
and the test folder was added to the new image.
Below is the dockerfile used to create the new image from the existing one.
FROM abc/def:latest
# Extras
RUN sudo apt-get install -y vim
# copies local folder into the image
COPY test /home/humpty-dumpty/test
Update: For editing a file in the running docker container, we can open that file using the vim editor installed through the above Dockerfile:
vim <filename>
Now, the vim commands can be used to edit and save the file.
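For example, assuming the container is already running (the container name and file path are placeholders):
docker exec -it my-running-container bash
vim /home/humpty-dumpty/test/notes.txt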
You don't change existing images; images are marked with a checksum and are considered read-only. Containers that use an image point to the same files on the filesystem, adding their own RW layer on top for the container, and therefore depend on the image being unchanged. Layer caching also adds to this dependency.
Because of the layered filesystem and caching, creating a new image with just your one folder addition will only add a layer with that addition, not a full copy of a new image. Therefore, the easy/clean/correct way is to create a new image using a Dockerfile.
First of all, I would not recommend messing with the other image; it would be better to create your own. That said, you can use the COPY command to add a folder from the host machine to the docker image.
COPY <src> <dest>
The only caveat is that the <src> path must be inside the context of the build; you cannot COPY ../something /something, because the first step of a docker build is to send the context directory (and its subdirectories) to the docker daemon.
FROM abc/def
MAINTAINER Humpty Dumpty <#hd>
RUN sudo apt-get install -y vim
# Make sure you already have the /home/humpty-dumpty directory;
# if not, create one
RUN mkdir -p /home/humpty-dumpty
# This adds the test directory to /home/humpty-dumpty
COPY test /home/humpty-dumpty/
WORKDIR /home/humpty-dumpty
RUN cd lib && make
CMD ["bash"]
I think you can use the docker cp command to make changes to a container which is built from your docker image, and then commit the changes.
Here is a reference,
Guide for docker cp: https://docs.docker.com/engine/reference/commandline/cp/
Guide for docker commit: https://docs.docker.com/engine/reference/commandline/container_commit/
Remember, a docker image is read-only, so you cannot make any changes to it. The only way is to modify your Dockerfile and recreate the image, but in that case you lose the data (if it is not mounted on a docker volume). You can, however, make changes to a container, which is not read-only.
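A minimal sketch of that docker cp / docker commit workflow, using placeholder container and tag names (abc/def is the image from the question):
docker create --name temp-container abc/def:latest
docker cp ./test temp-container:/home/humpty-dumpty/test
docker commit temp-container abc/def:with-test
docker rm temp-container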
We have application repositories that depend on common/shared repos. Our application repos contain Dockerfiles, and what we are trying to do is this: whenever the common/shared repos change and depend on other libraries or env vars, we want those common/shared repos to have Dockerfiles as well, and the Dockerfile in the application repo would include them so that any dependency/env changes are pulled in from the common/shared repos.
After googling for "docker include another dockerfile" I found the github issue https://github.com/docker/docker/issues/735, which is exactly what we are looking for, but the issue doesn't provide a clear solution. Is there a best way to achieve this as of now? Thanks
The simple answer is to create your own base image. Any common code goes in your base image, and the other Dockerfiles then refer to this image in their FROM line.
Dockerfile for base image:
# pick an image to work off of
FROM debian:latest
# do your common stuff here
Then run docker build -t mybase:latest .
Now in the other images you want to create, they have the following Dockerfile:
FROM mybase:latest
# do your non-common stuff here
Here's a way, but it might not cover everything you're looking for. If the common repo has a shell script that installs dependencies and sets environment variables then you can invoke the shell script while building the image, by copying the shell script to the image and running it.
Say, for example, you have a file called env_script.sh in your common_repo that looks like this:
#!/bin/bash
apt install -y libpng-dev libfreetype6-dev pkg-config
pip install flask
export PYTHONPATH="${PYTHONPATH}:/usr/local/my_"
then the Dockerfile of your application would use it as below:
# Copy the shell script into the image
COPY ./common_repo/env_script.sh /tmp/
# Run the shell script
RUN bash /tmp/env_script.sh
# Remove the temporary file when done
RUN rm /tmp/env_script.sh
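One caveat worth noting: variables exported inside a RUN step only exist for that step, so anything like PYTHONPATH that must be set in the final image still needs an ENV instruction in the application's Dockerfile, for example (the path here is illustrative):
ENV PYTHONPATH="${PYTHONPATH}:/usr/local/my_lib"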
I am using windows and have boot2docker installed. I've downloaded images from docker hub and run basic commands. BUT
How do I take an existing application sitting on my local machine (let's just say it has one file, index.php, for simplicity), put it into a docker image, and run it?
Imagine you have an existing python2 application, "hello.py", with the following content:
print "hello"
You have to do the following things to dockerize this application:
Create a folder where you'd like to store your Dockerfile in.
Create a file named "Dockerfile"
The Dockerfile consists of several parts which you have to define as described below:
Like a VM, an image has an operating system. In this example, I use ubuntu 16.04. Thus, the first part of the Dockerfile is:
FROM ubuntu:16.04
Imagine you have a fresh Ubuntu VM; now you have to install some things to get your application working, right? This is done by the next part of the Dockerfile:
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y python
For Docker, you have to create a working directory now in the image. The commands that you want to execute later on to start your application will search for files (like in our case the python file) in this directory. Thus, the next part of the Dockerfile creates a directory and defines this as the working directory:
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
As a next step, you copy the contents of the folder where the Dockerfile is stored into the image. In our example, the hello.py file is copied to the directory we created in the step above.
COPY . /usr/src/app
Finally, the following line sets "python hello.py" as the command to run when a container is started from your image:
CMD [ "python", "hello.py" ]
The complete Dockerfile looks like this:
FROM ubuntu:16.04
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y python
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . /usr/src/app
CMD [ "python", "hello.py" ]
Save the file and build the image by typing in the terminal:
$ docker build -t hello .
This will take some time. Afterwards, check whether the image "hello", as we named it in the build command, has been built successfully:
$ docker images
Run the image:
docker run hello
The output should be "hello" in the terminal.
This is a first start. When you use Docker for web applications, you have to configure ports etc.
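For example, if the container serves HTTP on port 80, a run command along these lines maps it to a port on the host (image name and port numbers are illustrative):
docker run -p 8080:80 my-web-image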
Your index.php is not really an application. The application is your Apache or nginx or even PHP's own server.
Because Docker uses features not available in the Windows core, you are running it inside an actual virtual machine. The only purpose for that would be training or preparing images for your real server environment.
There are two main concepts you need to understand for Docker: Images and Containers.
An image is a template composed of layers. Each layer contains only the differences from the previous layer, plus some offline system information. Each layer is in fact an image. You should always make your image from an existing base, using the FROM directive in the Dockerfile (Reference docs at time of edit; Jan Vladimir Mostert's link is now a 404).
A container is an instance of an image that has run or is currently running. When creating a container (a.k.a. running an image), you can map an internal directory from it to the outside. If there are files in both locations, the external directory overrides the one inside the image, but those files are not lost. To recover them, you can commit a container to an image (preferably after stopping it), then launch a new container from the new image without mapping that directory.
You'll need to build a docker image first using a Dockerfile. You'd probably set up apache on it, tell the Dockerfile to copy your index.php file into apache's web root, and expose a port.
See http://docs.docker.com/reference/builder/
See my other question for an example of a docker file:
Switching users inside Docker image to a non-root user (this is for copying over a .war file into tomcat, similar to copying a .php file into apache)
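As a rough sketch of that idea, assuming the official php image with Apache is acceptable as a base (the tag is illustrative):
FROM php:7.4-apache
# Apache's document root in this image is /var/www/html
COPY index.php /var/www/html/
EXPOSE 80
Build it with docker build -t my-php-app . and run it with docker run -p 8080:80 my-php-app, then browse to port 8080 on your Docker host.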
First off, you need to choose a platform to run your application (for instance, Ubuntu). Then install all the system tools/libraries necessary to run your application; this can be achieved with a Dockerfile. Then push the Dockerfile and the app to git or Bitbucket. Later, you can set up automated builds on the docker hub from github or Bitbucket. The later part of this tutorial here has more on that. If you know the basics, just fast forward to 50:00.
What I am trying to do is set up a docker container for ghost where I can easily modify the theme and other content. So I am making /opt/ghost/content a volume and mounting that on the host.
It looks like I will have to manually copy the theme into the host directory because when I mount it, it is an empty directory. So my content directory is totally empty. I am pretty sure I am doing something wrong.
I have tried a few different variations, including using ADD with the default themes folder and putting VOLUME at the end of the Dockerfile. I keep ending up with an empty content directory.
Does anyone have a Dockerfile doing something similar that is already working that I can look at?
Or maybe I can use the docker cp command somehow to populate the volume?
I may be missing something obvious or have made a silly mistake in my attempts to achieve this. But the basic thing is I want to be able to upload a new set of files into the ghost themes directory using a host-mounted volume and also have the casper theme in there by default.
This is what I have in my Dockerfile right now:
FROM ubuntu:12.04
MAINTAINER Jason Livesay "ithkuil#gmail.com"
RUN apt-get install -y python-software-properties
RUN add-apt-repository ppa:chris-lea/node.js
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get -qq update
RUN apt-get install -y sudo curl unzip nodejs=0.10.20-1chl1~precise1
RUN curl -L https://en.ghost.org/zip/ghost-0.3.2.zip > /tmp/ghost.zip
RUN useradd ghost
RUN mkdir -p /opt/ghost
WORKDIR /opt/ghost
RUN unzip /tmp/ghost.zip
RUN npm install --production
# Volumes
RUN mkdir /data
ADD run /usr/local/bin/run
ADD config.js /opt/ghost/config.js
ADD content /opt/ghost/content/
RUN chown -R ghost:ghost /opt/ghost
ENV NODE_ENV production
ENV GHOST_URL http://my-ghost-blog.com
EXPOSE 2368
CMD ["/usr/local/bin/run"]
VOLUME ["/data", "/opt/ghost/content"]
As far as I know, empty host-mounted (bound) volumes still will not receive contents of directories set up during the build, BUT data containers referenced with --volumes-from WILL.
So now I think the answer is, rather than writing code to work around non-initialized host-mounted volumes, forget host-mounted volumes and instead use data containers.
Data containers use the same image as the one you are trying to persist data for (so they have the same directories etc.).
docker run -d --name myapp_data mystuff/myapp echo Data container for myapp
Note that it will run and then exit, so your data containers for volumes won't stay running. If you want to keep them running you can use something like sleep infinity instead of echo, although this will obviously take more resources and isn't necessary or useful unless you have some specific reason -- like assuming that all of your relevant containers are still running.
You then use --volumes-from to use the directories from the data container:
docker run -d --name myapp --volumes-from myapp_data mystuff/myapp
https://docs.docker.com/userguide/dockervolumes/
You need to place the VOLUME directive before actually adding content to it.
My answer is completely wrong! Look here: it seems there is actually a bug. If the VOLUME command happens after the directory already exists in the container, then changes are not persisted.
The Dockerfile should always end with a CMD or an ENTRYPOINT.
UPDATE
My solution would be to ADD the files into the container's home directory, then use a shell script as an entrypoint, in which I'll copy the files into the shared volume and do all the other tasks.
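A rough sketch of that entrypoint approach, assuming the default theme is ADDed to a staging directory such as /opt/ghost/content-dist during the build (the staging path and script name are illustrative):
#!/bin/bash
# entrypoint.sh: seed the mounted (possibly empty) content volume, then start ghost
cp -rn /opt/ghost/content-dist/. /opt/ghost/content/
exec /usr/local/bin/run
The Dockerfile would then point ENTRYPOINT (or CMD) at this script instead of calling /usr/local/bin/run directly.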
I've been looking into the same thing. The problem I encountered was that I was using a relative local mount path, something like:
docker run -i -t -v ../data:/opt/data image
Switching to an absolute local path fixed this up for me:
docker run -i -t -v /path/to/my/data:/opt/data image
Can you confirm whether you were doing a relative path, and whether this helps?
Docker V1.8.1 preserves data in a volume if you mount it with the run command. From the docker docs:
Volumes are initialized when a container is created. If the container's base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Example: An image defines the
/var/www/html
as a volume and populates it with the data of a web application. Your docker host provides a mount directory
/my/host/dir
You start the image by
docker run -v /my/host/dir:/var/www/html image
then you will get all the data from /var/www/html in the host's /my/host/dir
This data will persist even if you delete the container or the image.