Docker containers retaining data - docker

I'm learning Docker and I've been crafting a Dockerfile for an Ubuntu container.
My problem is that data keeps persisting between different containers. I have exited and removed the container, and then removed its image. After making changes to my Dockerfile, I executed docker build -t playbuntu . with the following Dockerfile:
FROM ubuntu:latest
## for apt to be noninteractive
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true
## preseed tzdata, update package index, upgrade packages and install needed software
RUN echo "tzdata tzdata/Areas select Europe" > /tmp/preseed.txt; \
echo "tzdata tzdata/Zones/Europe select London" >> /tmp/preseed.txt; \
debconf-set-selections /tmp/preseed.txt && \
apt-get update && \
apt-get install -y tzdata
RUN apt-get update -y && apt-get upgrade -y && apt-get install tree nano vim -y && apt-get install less -y && apt-get install lamp-server^ -y
RUN echo "ServerName localhost" >> /etc/apache2/apache2.conf
EXPOSE 80
WORKDIR /var/www
COPY ./index.php /var/www
COPY ./000-default.conf /etc/apache2/sites-available
CMD [ "apache2ctl", "start", "&&", "apache2ctl", "restart" ]
Once I execute winpty docker run -it -p 80:80 playbuntu bash, my problem is, instead of my index.php file outputting the following:
<?php
print "<center><h3>Sisko's LAMP activated!</h3><center>";
phpinfo();
I get the following debug code I experimented with hours ago:
<?php
print "...responding";
phpinfo();
Is there a caching system Docker might be using? I pruned all Docker volumes just in case that is how Docker caches, all except two volumes which are being used by other containers unrelated to my project.

I suspect that, after you changed index.php on your host machine, you did not rerun the docker build ... command. You will need to rebuild the image any time any of the image's content changes.
Please confirm whether running the docker build ... command again solves the issue.
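For reference, a minimal rebuild-and-run sequence, reusing the commands from your question (assuming the Dockerfile and index.php are in the current directory), would be:
# rebuild the image so the current index.php is copied in
docker build -t playbuntu .
# then start a fresh container from the rebuilt image
winpty docker run -it -p 80:80 playbuntu bash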
Background
Docker images comprise one or more layers.
Image (layers) and Volumes are distinct.
Since your docker run ... command contains no --volume= bind mounts (or equivalent), I suspect Docker volumes are not relevant to this question.
There is a possibility that rebuilding images does not replace image layers and thus caching does occur. However, in your case, I think this is not the issue. See the Docker documentation (link) for an overview of the Dockerfile commands that add layers.
Because of the way that layers work, if the preceding layers in your Dockerfile were unchanged and index.php were unchanged, Docker would not rebuild those layers. However, because your Dockerfile includes a layer that runs apt-get update && apt-get install ..., that layer will be invalidated and recreated, and so will all subsequent layers.
If you change index.php on the host and rebuild the image, this layer will always be rebuilt.
I built your Dockerfile twice. Here's the beginning of the second (!) build. Note the Using cache messages for the unchanged layers preceding the RUN apt-get update... layer, which is rebuilt.
docker build --rm -f "Dockerfile" -t 59582820:0939 "."
Sending build context to Docker daemon 3.584kB
Step 1/10 : FROM ubuntu:latest
---> 549b9b86cb8d
Step 2/10 : ENV DEBIAN_FRONTEND noninteractive
---> Using cache
---> 1529d0e293f3
Step 3/10 : ENV DEBCONF_NONINTERACTIVE_SEEN true
---> Using cache
---> 1ba10410d06a
Step 4/10 : RUN echo "tzdata tzdata/Areas select Europe" > /tmp/preseed.txt; echo "tzdata tzdata/Zones/Europe select London" >> /tmp/preseed.txt; debconf-set-selections /tmp/preseed.txt && apt-get update && apt-get install -y tzdata
---> Using cache
---> afb861da52e4
Step 5/10 : RUN apt-get update -y && apt-get upgrade -y && apt-get install tree nano vim -y && apt-get install less -y && apt-get install lamp-server^ -y
---> Running in 6f05bbb8e80a
The evidence suggests to me that you didn't rebuild after changing the index.php.
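If you want further evidence, you can inspect what actually got copied into the image without starting Apache at all. A quick diagnostic sketch, using the playbuntu tag from your question:
# print the index.php baked into the image from a throwaway container
docker run --rm playbuntu cat /var/www/index.php
If that still prints the old "...responding" version, the image was not rebuilt after you edited the file.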

You can use volumes to persist folders such as configs or properties. See the Docker Storage documentation.
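For example, a bind mount lets you edit index.php on the host without rebuilding the image at all. A sketch, where /path/to/site is a hypothetical host directory holding your web files:
# mount the host directory over /var/www inside the container
docker run -it -p 80:80 -v /path/to/site:/var/www playbuntu bash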

Related

How to create a docker image tar directly from the build step?

I want to create a shareable image which can be docker loaded into another docker installation directly from docker build
e.g.
Dockerfile:
FROM my-app/env as builder
COPY /my-app /my-app
RUN /my-app/build.sh
FROM ubuntu:20.04
COPY --from=builder /my-app/build/my-app.exe /bin
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
&& apt-get install -y --no-install-recommends \
***some stuff required at runtime*** \
&& apt-get autoclean \
&& apt-get autoremove \
&& ldconfig
ENTRYPOINT ["my-app.exe"]
CMD ["--help"]
Running something like:
/my-app$ docker build -t my-app/exe .
will produce an image which I can use with
$ docker run my-app/exe
If I want to now share this, I need to do:
$ docker save -o my-app.tar my-app/exe
which creates the archive my-app.tar which can be shared with another system running the same arch/OS and used by:
/my-othersystem/my-app$ docker load < my-app.tar
And
$ docker run my-app/exe
now works on the other system.
HOWEVER
This requires building the image into the build system's Docker repository and then saving the image to a file. I don't plan to run the executable on the build system, so I don't want it taking up space.
You can do:
/my-app$ DOCKER_BUILDKIT=1 docker build --output type=tar,dest=out.tar .
But this creates a filesystem, not an image. I want it exported directly as a Docker image compatible with docker load. Is this possible?
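For comparison, BuildKit's exporters also include a docker type that writes a docker load-compatible tarball rather than a plain filesystem. A sketch, assuming you have buildx available (whether the image also lands in the local image store depends on the builder driver):
# export the build result as a loadable image tarball
docker buildx build --output type=docker,dest=my-app.tar -t my-app/exe .
# on the other system
docker load < my-app.tar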

What is the lightest docker image to be used for automation tests?

I need to create a Docker image to run UI automation tests in headless mode.
It should contain:
Node.js, JDK, Chrome browser.
I have created the one below, which is 1.6 GB. Is there a better way to make it lighter and more optimized?
FROM node:slim
ENV DEBIAN_FRONTEND noninteractive
WORKDIR /project
#=============================
# Install Dependencies
#=============================
SHELL ["/bin/bash", "-c"]
RUN apt update && apt install -y wget bzip2 openjdk-11-jre xvfb libnotify-dev
#==============================
# install chrome
#==============================
RUN wget https://dl.google.com/linux/direct/${CHROME_PACKAGE} && \
dpkg-deb -x ${CHROME_PACKAGE} / && \
apt-get install -f -y
#=========================
# Copying Scripts to root
#=========================
COPY . /project
RUN chmod a+x ./execute_test.sh
#=======================
# framework entry point
#=======================
CMD [ "/bin/bash" ]
Will your build fail if you don't use SHELL ["/bin/bash", "-c"]? If not, eliminating that line saves you a layer. You can combine the two RUN instructions into one, which saves you a few more. Then try the --squash flag when building your image; note that your Docker daemon needs experimental features enabled to use this flag. You should get a smaller image using these steps.
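A sketch of what the combined, single-RUN version might look like. Note that CHROME_PACKAGE is not defined in the Dockerfile you posted, so the ARG value below is an assumption:
FROM node:slim
ENV DEBIAN_FRONTEND noninteractive
# assumed value; not defined in the original Dockerfile
ARG CHROME_PACKAGE=google-chrome-stable_current_amd64.deb
WORKDIR /project
# install dependencies, fetch Chrome and clean the apt cache in one layer
RUN apt-get update && \
    apt-get install -y wget bzip2 openjdk-11-jre xvfb libnotify-dev && \
    wget https://dl.google.com/linux/direct/${CHROME_PACKAGE} && \
    dpkg-deb -x ${CHROME_PACKAGE} / && \
    apt-get install -f -y && \
    rm -f ${CHROME_PACKAGE} && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
COPY . /project
RUN chmod a+x ./execute_test.sh
CMD [ "/bin/bash" ]
Deleting the downloaded .deb and the apt lists inside the same RUN keeps those files out of the layer entirely.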

Why does docker rebuild all layers every time I change build args

I have a Dockerfile which has a lot of layers. At the top of the file I have some args like
FROM ubuntu:18.04
ARG USER=test-user
ARG UID=1000
#ARG PW=test-user
# Then several Layers which does not use any ARGS. Example
LABEL version="1.0"
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN mkdir ~/mapped-volume
RUN apt-get update && apt-get install -y wget bzip2 ca-certificates build-essential curl git-core htop pkg-config unzip unrar tree freetds-dev vim \
sudo nodejs npm net-tools flex perl automake bison libtool byacc
# And so on
# And finally towards the end
# Setup User
RUN useradd -m -d /home/${USER} --uid ${UID} -G sudo -s /bin/bash ${USER}
# && echo "${USER}:${PW}" | chpasswd
# Couple of more commands to change dir, entry point etc. Example
When I build this Dockerfile with any arg value different from the last build, and/or after small changes in the last two layers, the build rebuilds everything. It does not use the cached layers. The command I use to build is something like this:
docker build --build-arg USER=new-user --build-arg UID=$UID -t my-image:1.0 .
And every time I change the values, the build runs all the way through again, with a truncated top like below:
UID -t my-image:1.0 .
Sending build context to Docker daemon 44.54kB
Step 1/23 : FROM ubuntu:18.04
---> ccc6e87d482b
Step 2/23 : ARG USER=ml-user
---> Using cache
---> 6c0c5d5c5056
Step 3/23 : ARG UID=1000
---> Using cache
---> b25867c282c7
Step 4/23 : LABEL version="1.0"
---> Running in 1ffff70d56c1
Removing intermediate container 1ffff70d56c1
---> 0f1277def3ca
Step 5/23 : ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
---> Running in 49d08c41b233
Removing intermediate container 49d08c41b233
---> f5b345573c1f
Step 6/23 : RUN mkdir ~/mapped-volume
---> Running in e4f8a5956450
Removing intermediate container e4f8a5956450
---> 1b22731d9051
Step 7/23 : RUN apt-get update && apt-get install -y wget bzip2 ca-certificates build-essential curl git-core htop pkg-config unzip unrar tree freetds-dev vim sudo nodejs npm net-tools flex perl automake bison libtool byacc
---> Running in ffc297de6234
Get:1 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
So from step 7 onward it redoes all the steps without using the cache of that layer, which installs a bunch of packages.
Why? How can I stop this? Previously, when I did not have args, this layer and the other layers used to be picked up from the cache.
Move your args to just before you need them. Docker does not replace args in the RUN commands before running them. Instead, the args are passed as environment variables and expanded by the shell within the temporary container. Because of that, a change to an arg is a change to the environment, and a miss of the build cache for that step. Once one step misses the cache, all following steps must be rebuilt.
FROM ubuntu:18.04
# Then several Layers which does not use any ARGS. Example
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN mkdir ~/mapped-volume
RUN apt-get update && apt-get install -y wget bzip2 ca-certificates build-essential curl git-core htop pkg-config unzip unrar tree freetds-dev vim \
sudo nodejs npm net-tools flex perl automake bison libtool byacc
# And so on
# And finally towards the end
# Setup User
ARG USER=test-user
ARG UID=1000
RUN useradd -m -d /home/${USER} --uid ${UID} -G sudo -s /bin/bash ${USER}
# && echo "${USER}:${PW}" | chpasswd
# Couple of more commands to change dir, entry point etc. Example
LABEL version="1.0"
Also, labels, environment variables that aren't needed at build time, exposed ports, and any other metadata are often best left to the end of the Dockerfile, since they have minimal impact on build time and there's no need to miss the cache when they change.
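With that ordering, a rebuild that only changes the arg values should reuse the cache for every layer up to the useradd step, e.g.:
# only the ARG/useradd steps near the end of the Dockerfile miss the cache now
docker build --build-arg USER=new-user --build-arg UID=$UID -t my-image:1.0 .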

Add tcpdump to a docker image with base image node:10.0.0

How can I add the tcpdump package in the Dockerfile if the base image is node:10.0.0?
Dockerfile:
FROM node:10.0.0
EXPOSE $SERVICE_PORT
USER node
RUN mkdir -p /home/node/
WORKDIR /home/node/
COPY package.json /home/node/
RUN npm install
COPY . /home/node/
CMD ["npm", "run", "staging"]
I want to trace the traffic in this container.
It is unnecessary to modify your image to access the network of the container. You can run a second container in the same network namespace:
docker run -it --net container:${container_to_debug} nicolaka/netshoot
From there, you can run tcpdump and a variety of other network debugging tools and see the traffic going to your other container. To see all the tools included in netshoot, see the github repo: https://github.com/nicolaka/netshoot
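For example, once inside the netshoot container you can capture the target container's traffic directly; the interface name and port below are assumptions:
# capture HTTP traffic in the shared network namespace
tcpdump -i eth0 -n port 80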
Your base image is Debian-based, therefore use apt-get as your package manager. Add the following instructions to your Dockerfile:
USER root
RUN apt-get update -y; exit 0
RUN apt-get install tcpdump -y
Explanation:
USER root - apt-get requires root permissions.
RUN apt-get update -y; exit 0 - I am adding exit 0 to tell Docker to keep the build even if apt-get couldn't fetch all of its mirror files.
RUN apt-get install tcpdump -y - installation of the package.
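After rebuilding, you can run tcpdump inside the running container. Since the original Dockerfile switches to USER node, exec in as root; the container name and port here are placeholders:
# run tcpdump as root inside the existing container
docker exec -u root -it my-node-container tcpdump -i any -n port 3000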

Dockerfile - Hide --build-args from showing up at build time

I have the following Dockerfile:
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y \
git \
make \
python-pip \
python2.7 \
python2.7-dev \
ssh \
&& apt-get autoremove \
&& apt-get clean
ARG password
ARG username
ENV password $password
ENV username $username
RUN pip install git+http://$username:$password@org.bitbucket.com/scm/do/repo.git
I use the following commands to build the image from this Dockerfile:
docker build -t myimage:v1 --build-arg password="somepassoword" --build-arg username="someuser" .
However, in the build log the username and password that I pass as --build-arg are visible.
Step 8/8 : RUN pip install git+http://$username:$password@org.bitbucket.com/scm/do/repo.git
---> Running in 650d9423b549
Collecting git+http://someuser:somepassword@org.bitbucket.com/scm/do/repo.git
How to hide them? Or is there a different way of passing the credentials in the Dockerfile?
Update
You know, I was focusing on the wrong part of your question. You shouldn't be using a username and password at all. You should be using access keys, which permit read-only access to private repositories.
Once you've created an ssh key and added the public component to your repository, you can then drop the private key into your image:
RUN mkdir -m 700 -p /root/.ssh
COPY my_access_key /root/.ssh/id_rsa
RUN chmod 700 /root/.ssh/id_rsa
And now you can use that key when installing your Python project:
RUN pip install git+ssh://git@bitbucket.org/you/yourproject.repo
(Original answer follows)
You would generally not bake credentials into an image like this. In addition to the problem you've already discovered, it makes your image less useful because you would need to rebuild it every time your credentials changed, or if more than one person wanted to be able to use it.
Credentials are more generally provided at runtime via one of various mechanisms:
Environment variables: you can place your credentials in a file, e.g.:
USERNAME=myname
PASSWORD=secret
And then include that on the docker run command line:
docker run --env-file myenvfile.env ...
The USERNAME and PASSWORD environment variables will be available to processes in your container.
Bind mounts: you can place your credentials in a file, and then expose that file inside your container as a bind mount using the -v option to docker run:
docker run -v /path/to/myfile:/path/inside/container ...
This would expose the file as /path/inside/container inside your container.
Docker secrets: If you're running Docker in swarm mode, you can expose your credentials as docker secrets.
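A minimal sketch of the secrets route in swarm mode; the secret and service names are hypothetical:
# create the secret from a local file, then attach it to a service
docker secret create repo_password ./password.txt
docker service create --name myapp --secret repo_password myimage:v1
# inside the container the value is readable from /run/secrets/repo_password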
It's worse than that: they're in docker history in perpetuity.
I've done two things here in the past that work:
You can configure pip to use local packages, or to download dependencies ahead of time into "wheel" files. Outside of Docker you can download the package from the private repository, giving the credentials there, and then you can COPY in the resulting .whl file.
pip install wheel
pip wheel --wheel-dir ./wheels git+http://$username:$password@org.bitbucket.com/scm/do/repo.git
docker build .
COPY ./wheels/ ./wheels/
RUN pip install wheels/*.whl
The second is to use a multi-stage Dockerfile where the first stage does all of the installation, and the second doesn't need the credentials. This might look something like
FROM ubuntu:16.04 AS build
RUN apt-get update && ...
...
RUN pip install git+http://$username:$password@org.bitbucket.com/scm/do/repo.git
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install \
python2.7
COPY --from=build /usr/lib/python2.7/site-packages/ /usr/lib/python2.7/site-packages/
COPY ...
CMD ["./app.py"]
It's worth double-checking in the second case that nothing has gotten leaked into your final image, because the ARG values are still available to the second stage.
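One way to do that double-check, using the image tag from your build command:
# look for leaked values in the layer history and the image environment
docker history --no-trunc myimage:v1
docker inspect -f '{{.Config.Env}}' myimage:v1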
For me, I created a bash file called set-up-cred.sh.
Inside set-up-cred.sh
echo $CRED > cred.txt;
Then, in Dockerfile,
RUN bash set-up-cred.sh;
...
RUN rm cred.txt;
This is for hiding the echoing of credential variables.
