How to write files in Docker images?

I've copied a file into a docker image with:
COPY dbconfig.xml /var/app/dbconfig.xml
After that I tried to replace some values in the file with:
RUN sed -i "s/PASSWD/$dbpasswd/" /var/app/dbconfig.xml
Note that $dbpasswd is an ENV variable.
When I check the contents of dbconfig.xml, by starting a container from that image and running a bash inside it, nothing has changed in the file.
Now I think I'm misunderstanding some fundamentals of Docker images..
I even tested to create a simple file:
RUN echo "test" > newfile.txt
which seems to be gone after the build..
I know that each RUN statement creates a new layer, and that after the statement it gets removed(?).
I'm confused. Why does something like installing software with
RUN apt-get install -y some-package
not get removed, while creating a simple file does?
So, how can I change files inside Docker images at image build time?
Dockerfile:
FROM dchevell/jira-software:8.0
COPY dbconfig.xml /var/atlassian/application-data/jira/dbconfig.xml
WORKDIR /var/atlassian/application-data/jira
# set default password to admin
ENV dbpasswd=admin
RUN sed -i "s/PASSWD/$dbpasswd/" dbconfig.xml \
&& cat dbconfig.xml
RUN echo "test" > newfile.txt
dbconfig.xml
<?xml version="1.0" encoding="UTF-8"?>
<jira-database-config>
  <name>defaultDS</name>
  <delegator-name>default</delegator-name>
  <database-type>postgres72</database-type>
  <schema-name>public</schema-name>
  <jdbc-datasource>
    <url>jdbc:postgresql://docker-postgres:5432/jiradb</url>
    <driver-class>org.postgresql.Driver</driver-class>
    <username>atlasdb</username>
    <password>PASSWD</password>
    <pool-test-while-idle>true</pool-test-while-idle>
  </jdbc-datasource>
</jira-database-config>
Update 1
Confusingly, when I COPY something into the WORKDIR folder, it persists, but when I try to modify it afterwards with sed, those changes do not persist! I think there is some really dark magic happening in the background..
Maybe I'll try to bind mount my preconfigured dbconfig.xml within docker-compose and see if that helps..
Update 2
From the Docker Documentation:
Changing the volume from within the Dockerfile: If any build steps
change the data within the volume after it has been declared, those
changes will be discarded.
I totally missed that! Thanks, David, for pointing me there :) So creating and writing files DOES work as expected, but be careful with VOLUME directories: changes made there by RUN statements are discarded.
So to address this issue, the best practice is to bind mount the file into that volume.
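For example, a minimal docker-compose sketch of that bind mount, assuming the preconfigured dbconfig.xml sits next to the compose file (the service name is illustrative):
services:
  jira:
    image: dchevell/jira-software:8.0
    volumes:
      - ./dbconfig.xml:/var/atlassian/application-data/jira/dbconfig.xml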

If you look at the Dockerfile for that base image, it says in part
ENV JIRA_HOME /var/atlassian/application-data/jira
VOLUME ["${JIRA_HOME}"]
Once you execute a VOLUME statement in a Dockerfile, later Dockerfile statements can't make any more changes in that specific directory.
Given that the sorts of things you're trying to change are very installation-specific settings (admin password, database settings) I wouldn't try to build an image out of these. Instead I'd use the docker run -v option to inject the configuration file at runtime.
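A minimal sketch of that (the host path is whatever directory holds your preconfigured file):
docker run -v "$PWD/dbconfig.xml:/var/atlassian/application-data/jira/dbconfig.xml" dchevell/jira-software:8.0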

Each RUN statement does not create an intermediate container but creates a new layer on the union file system, which is read-only. When you run an image, a special writable layer is created for the container, and all the changes you make in that container are written to this layer (except for volumes, which are a different concept). That is why Docker is able to share the same image (or even layers) between containers safely, without them affecting each other. You can check the Docker documentation for more information.
For your question: you should see every change you make at build time in a running instance of this image, unless you somehow delete or overwrite it.
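To see this outside of any volume directory, a throwaway image is enough (a minimal sketch; the image tag and file name are arbitrary):
# Dockerfile
FROM alpine
RUN echo "test" > /newfile.txt
And then:
docker build -t layer-test .
docker run --rm layer-test cat /newfile.txt   # prints "test"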

See this question.
The commands you are running are correct and they should create the files. What I suspect is that when you run your container, the jira application is overwriting the WORKDIR you have specified.
Try this Dockerfile (with the same base image and COPY as in your question, so it builds on its own):
FROM dchevell/jira-software:8.0
COPY dbconfig.xml /var/atlassian/application-data/jira/dbconfig.xml
WORKDIR /var/atlassian/application-data/jira
# set default password to admin
ENV dbpasswd=admin
RUN sed -i "s/PASSWD/$dbpasswd/" dbconfig.xml \
&& cat dbconfig.xml
WORKDIR /testtest
RUN touch test.txt
RUN echo "test" > newfile.txt
WORKDIR /var/atlassian/application-data/jira
Now if you build the image and start a container, you can see that the files have been created inside the /testtest folder.
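For example, you can check this without opening a shell (the image tag is arbitrary; --entrypoint overrides whatever entrypoint the base image defines):
docker build -t jira-test .
docker run --rm --entrypoint ls jira-test /testtest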
If you want your changes to the dbconfig.xml file to persist, you should try using volumes to bind mount your local dbconfig.xml into the Jira folder.
Thanks for this interesting question :)

Related

How to download and extract .tar files in a container from a Dockerfile?

Here is my use case: I want to download and extract files from a particular website, and allow users to specify which workweek to pull from. Imagine using one docker command and specifying only a variable that tells it where to go, then having it download and extract the files.
The problem is that I want to allow a user to set a variable that refers to a particular workweek.
This is only my idea so far; I am not sure whether I am thinking about it the right way before I start to design my Dockerfile.
Dockerfile:
...
ENV TARGET="$WW_DIR"
...
Now you can imagine that the first user wants to download files from WW17 so he can type:
docker container run -e TARGET=WW17 <image_name>
The second one wants to download files from WW25:
docker container run -e TARGET=WW25 <image_name>
Etc.
Under the hood, the container knows that it must go to the WW17 directory (in the first scenario) or WW25 (in the second scenario). My idea is that a new container is created, then the files are downloaded from an external server (for example with curl) and extracted.
Can you recommend the best methods, with some examples, of how to solve this? Should I use a bash script inside the container?
Thanks.
There is no Dockerfile involved at docker container run time; it just runs the command. So write a command that does what you want, or add the data to the image when building it with a Dockerfile.
# Dockerfile
FROM your_favourite_image
COPY your_script /
RUN chmod +x /your_script
CMD /your_script
# your_script
#!/usr/bin/env your_favourite_language_like_python_or_bash_or_perl
# download the $TARGET or whatever you want to do
And then
docker build -t image .
docker run -e TARGET=WW1 image
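A concrete sketch of what your_script could look like in bash, assuming the base image provides bash, curl and tar, and that the archives sit under a predictable URL (the server URL and target directory here are made up):
#!/usr/bin/env bash
# your_script: download and extract the archive for the requested workweek
set -eu
: "${TARGET:?Set TARGET to a workweek, e.g. WW17}"
curl -fsSL "http://example.com/archives/${TARGET}/files.tar.gz" -o /tmp/files.tar.gz
mkdir -p "/data/${TARGET}"
tar -xzf /tmp/files.tar.gz -C "/data/${TARGET}"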
Reading: https://docs.docker.com/engine/reference/builder/#cmd https://docs.docker.com/engine/reference/builder/#entrypoint https://docs.docker.com/get-started/overview/ https://btholt.github.io/complete-intro-to-containers/dockerfile

Copy files to a Docker image using entrypoint instead of dockerfile for GitHub actions

I created a Dockerfile for my website development (Jekyll in this case, but I do not think that matters much).
In case this information is helpful, I code locally using Visual Studio Code and the Remote Containers extension. This extension allows me to manage my code locally while keeping it in sync with the container.
To publish my website, I run a GitHub Action that creates a container from my Dockerfile and then runs all the build code from an entrypoint.sh file. Here is the pertinent code from my Dockerfile:
FROM ruby:alpine as jekyll
ENV env_workspace_directory=$workspace_directory
... more irrelevant code ...
RUN echo "#################################################"
RUN echo "Copy the GitHub repo to the Docker container"
RUN echo "COPY . ${env_workspace_directory}"
COPY . ${env_workspace_directory}
RUN echo "#################################################"
RUN echo "Run the entrypoint "
ENTRYPOINT ["/entrypoint.sh"]
Because I am using the Remote Containers VS Code extension, I do not want the Dockerfile to contain the COPY . ${env_workspace_directory} code. Instead, I only want that code to run when used as a GitHub Action.
So I guess I have two questions, with the first being ideal:
Is it possible to write similar code that will copy the contents of the currently open GitHub branch (or at least the main branch), including all files and subfolders, into the Docker container using the entrypoint.sh file instead? If so, what would that entrypoint.sh code look like?
Is it possible to leave the COPY command in the Dockerfile and make it conditional? For example, "only run the COPY command if running from a GitHub Action"?
For #1 above, I reviewed this Stack Overflow article that says you can use the docker cp command, but I am unsure (a) whether that is correct and (b) how to be sure I am using the $workspace_directory.
I am very new to Dockerfiles, writing shell commands, and GitHub Actions, so apologies if this question is an easy one or if more clarifications are required.
Here is the Development repo if that is useful.
A Docker volume mount happens after the image is built but before the main container process is run. So if you do something like
docker run -v "$PWD/site:/site" your-image
then the entrypoint.sh script will see the host content in the container's /site directory, even if something different had been COPYed in the Dockerfile.
There are no conditionals or flow control in Dockerfiles, beyond shell syntax within individual RUN instructions. You could in principle access a Git repository in your container process, but managing the repository location, ssh credentials, branches, uncommitted files, etc. can get complex.
Depending on how you're using the image, I could suggest two approaches here.
If you have a deploy-time action that uses this image in its entirety to build the site, then just leave your Dockerfile as-is. In development use a bind mount to inject your host content; don't especially worry about skipping the image COPY here.
Another approach is to build the image containing the Jekyll tool, but to treat the site itself as data. In that case you'd always run the image with a docker run -v or Compose volumes: option to inject the data. In the Dockerfile you might create an empty directory to be safe
RUN mkdir "${env_workspace_directory}" # consider a fixed path
and in your entrypoint script you can verify the site exists
if [ ! -f "$env_workspace_directory/_site.yml" ]; then
cat >&2 <<EOF
There does not seem to be a Jekyll site in $env_workspace_directory.
Please re-run this container with the site mounted.
EOF
exit 1
fi
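If the image is only consumed by a deploy-time GitHub Action (the first approach), the workflow itself can stay very small. A rough sketch, assuming the entrypoint script takes care of building and publishing the site (the file, job, and image names are placeholders):
# .github/workflows/publish.yml
name: publish-site
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t site-builder .
      - run: docker run --rm site-builder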

Docker - how to ensure commit will persist a file?

I keep doing a pull, run, <UPLOAD FILE>, commit, tag, push cycle, only to be dismayed that my file is gone when I pull the pushed image. My goal is to include an ipynb file with my image that serves as a README/tutorial for my users.
Reading other posts, I see conflicting advice on whether commit is the way to add a file. What causes commit to persist or disregard a file? Am I supposed to use docker cp to add the file before committing?
If you need to publish your notebook file in a docker image, use a Dockerfile, something like this:
FROM jupyter/datascience-notebook
COPY mynotebook.ipynb /home/jovyan/work
Then, once you have your notebook the way you want it, just run docker build and docker push. To help a bit more with your underlying problem: the jupyter images store the notebooks in a volume. Data in a volume is not part of the image; it lives on the filesystem of the host machine. That means that a commit isn't going to save anything in the work folder.
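For example (the image name is a placeholder for your own repository):
docker build -t yourrepo/notebook-tutorial .
docker push yourrepo/notebook-tutorial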
Really, an ipynb is a data file, not an application. The right way to do this is probably to just upload the ipynb file to a file store somewhere and tell your users to download it, since they could use one docker image to run many data files. If you really want a prebuilt image using the workflow you described, you could just put the file somewhere else that isn't in a volume so that it gets captured in your commit.
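If you do stick with the commit workflow, here is a sketch of that last idea, assuming /opt is not declared as a VOLUME in the image (the image and file names are placeholders):
docker run -d --name notebook jupyter/datascience-notebook
docker cp mynotebook.ipynb notebook:/opt/mynotebook.ipynb
docker commit notebook yourrepo/notebook-tutorial
docker push yourrepo/notebook-tutorial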
For those of you looking for some place to start with docker build, below is the Dockerfile I built with docker build -t your-image-name:your-new-tag
Dockerfile
FROM jupyter/datascience-notebook:latest
MAINTAINER name <email>
# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes
# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
# commenting out our public repo
ENV R_HOME=/opt/conda/lib/R
# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY python_sdk.ipynb /usr/local/share/man
COPY r_sdk.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man
# ====== SUDO ======
USER root
# Spark requires Java 8.
RUN sudo apt-get update && sudo apt-get install openjdk-8-jdk -y
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks

Preventing access to code inside of a docker container

I want to build a production-ready image for clients to use, and I am wondering if there is a way to prevent access to my code within the image?
My current approach is storing my code in /root/ and creating a "customer" user that only has a startup script in their home dir.
My Dockerfile looks like this
FROM node:8.11.3-alpine
# Tools
RUN apk update && apk add alpine-sdk
# Create customer user
RUN adduser -s /bin/ash -D customer
# Add code
COPY ./code /root/code
COPY ./start.sh /home/customer/
# Set execution permissions
RUN chown root:root /home/customer/start.sh
RUN chmod 4755 /home/customer/start.sh
# Allow customer to execute start.sh
RUN echo 'customer ALL=(ALL) NOPASSWD: /home/customer/start.sh' | EDITOR='tee -a' visudo
# Default to use customer
USER customer
ENTRYPOINT ["sudo","/home/customer/start.sh"]
This approach works as expected: if I enter the container I am not able to see the codebase, but I can still start up the services.
The final step in my Dockerfile would be to either set a password for the root user or remove it entirely.
I am wondering if this is a correct production flow, or am I attempting to use Docker for something it is not meant for?
If this is correct, what other things should I lock down?
Any tips appreciated!
Anybody who has your image can always do
docker run -u root imagename sh
Anybody who can run Docker commands at all has root access to their system (or can trivially give it to themselves via docker run -v /etc:/hostetc ...) and so can freely poke around in /var/lib/docker to see what's there. It will have all of the contents of all of the images, albeit scattered across directories in a system-specific way.
If your source code is actually secret, you should make sure you're using a compiled language (C, Go, Java kind of) and that your build process doesn't accidentally leak the source code into the built image, and it will be as secure as anything else where you're distributing binaries to end users. If you're using a scripting language (Python, JavaScript, Ruby) then intrinsically the end user has to have the code to be able to run the program.
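One common way to keep source out of the shipped image is a multi-stage build: compile in one stage and copy only the binary into the final stage. A minimal sketch for a Go program, assuming a Go module at the build-context root (paths and image names are illustrative; a Node application, as in your Dockerfile, needs a different strategy as described above):
# build stage: the source code only ever exists in this stage
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .
# final image: contains only the compiled binary
FROM alpine
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]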
Something else to consider is the use of docker container export. This would allow anyone to export the container's file system, and therefore have access to the code files.
I believe this bypasses removing sh/bash and any user permission changes that others have mentioned.
You can protect your source code, even when it can't go through a compiled build stage, by removing bash and sh from your base image.
This approach restricts users from entering your container or image through commands like
docker (exec or run) -it (container id) bash or sh.
To do this, add the following command at the end of your build stage:
RUN rm -rf /bin/bash /bin/sh
You can also read up on Google's distroless images, which follow the same approach.
You can also remove users from the docker group and set up sudo rules for docker start and docker stop.

How can I make a host directory mount with the container directory's contents?

What I am trying to do is set up a docker container for ghost where I can easily modify the theme and other content. So I am making /opt/ghost/content a volume and mounting that on the host.
It looks like I will have to manually copy the theme into the host directory because when I mount it, it is an empty directory. So my content directory is totally empty. I am pretty sure I am doing something wrong.
I have tried a few different variations including using ADD with default themes folder, putting VOLUME at the end of the Dockerfile. I keep ending up with an empty content directory.
Does anyone have a Dockerfile doing something similar that is already working that I can look at?
Or maybe I can use the docker cp command somehow to populate the volume?
I may be missing something obvious or have made a silly mistake in my attempts to achieve this. But the basic thing is I want to be able to upload a new set of files into the ghost themes directory using a host-mounted volume and also have the casper theme in there by default.
This is what I have in my Dockerfile right now:
FROM ubuntu:12.04
MAINTAINER Jason Livesay "ithkuil#gmail.com"
RUN apt-get install -y python-software-properties
RUN add-apt-repository ppa:chris-lea/node.js
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get -qq update
RUN apt-get install -y sudo curl unzip nodejs=0.10.20-1chl1~precise1
RUN curl -L https://en.ghost.org/zip/ghost-0.3.2.zip > /tmp/ghost.zip
RUN useradd ghost
RUN mkdir -p /opt/ghost
WORKDIR /opt/ghost
RUN unzip /tmp/ghost.zip
RUN npm install --production
# Volumes
RUN mkdir /data
ADD run /usr/local/bin/run
ADD config.js /opt/ghost/config.js
ADD content /opt/ghost/content/
RUN chown -R ghost:ghost /opt/ghost
ENV NODE_ENV production
ENV GHOST_URL http://my-ghost-blog.com
EXPOSE 2368
CMD ["/usr/local/bin/run"]
VOLUME ["/data", "/opt/ghost/content"]
As far as I know, empty host-mounted (bound) volumes still will not receive contents of directories set up during the build, BUT data containers referenced with --volumes-from WILL.
So now I think the answer is, rather than writing code to work around non-initialized host-mounted volumes, forget host-mounted volumes and instead use data containers.
Data containers use the same image as the one you are trying to persist data for (so they have the same directories etc.).
docker run -d --name myapp_data mystuff/myapp echo Data container for myapp
Note that it will run and then exit, so your data containers for volumes won't stay running. If you want to keep them running you can use something like sleep infinity instead of echo, although this will obviously take more resources and isn't necessary or useful unless you have some specific reason -- like assuming that all of your relevant containers are still running.
You then use --volumes-from to use the directories from the data container:
docker run -d --name myapp --volumes-from myapp_data mystuff/myapp
https://docs.docker.com/userguide/dockervolumes/
You need to place the VOLUME directive before actually adding content to it.
My answer is completely wrong! Look here: it seems there is actually a bug. If the VOLUME command happens after the directory already exists in the container, then changes are not persisted.
The Dockerfile should always end with a CMD or an ENTRYPOINT.
UPDATE
My solution would be to ADD the files into a directory in the container, then use a shell script as the entrypoint, which copies the files into the shared volume and does all the other tasks.
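For instance, a rough sketch of such an entrypoint, assuming the default content was ADDed to a staging directory such as /opt/ghost/content-default at build time (that staging path is hypothetical):
#!/bin/sh
# Seed the mounted content volume with the defaults if it is empty,
# then hand off to the original run script from the Dockerfile above.
if [ -z "$(ls -A /opt/ghost/content)" ]; then
  cp -r /opt/ghost/content-default/. /opt/ghost/content/
fi
exec /usr/local/bin/run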
I've been looking into the same thing. The problem I encountered was that I was using a relative local mount path, something like:
docker run -i -t -v ../data:/opt/data image
Switching to an absolute local path fixed this up for me:
docker run -i -t -v /path/to/my/data:/opt/data image
Can you confirm whether you were doing a relative path, and whether this helps?
Docker V1.8.1 preserves data in a volume if you mount it with the run command. From the docker docs:
Volumes are initialized when a container is created. If the container’s
base image contains data at the specified mount point, that existing
data is copied into the new volume upon volume initialization.
Example: An image defines
/var/www/html
as a volume and populates it with the data of a web application. Your Docker host provides a mount directory
/my/host/dir
You start the image with
docker run -v /my/host/dir:/var/www/html image
and you will get all the data from /var/www/html in the host's /my/host/dir.
This data will persist even if you delete the container or the image.
