This is a very similar question to: Docker build: use http cache
I would like to set up a docker container with a custom conda environment.
The corresponding dockerfile is:
FROM continuumio/miniconda3
WORKDIR /app
COPY . /app
RUN conda update conda
RUN conda env create -f environment.yml
RUN echo "source activate my_env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
My environment is rather large, a minimal version could look like this:
name: my_env
channels:
- defaults
dependencies:
- python=3.6.8=h0371630_0
prefix: /opt/conda
Every time that I make changes to the dependencies, I have to rebuild the image. And that means re-downloading all the packages.
Is it possible to set up a cache somehow?
Interfacing the containerized conda with a cache outside the container probably breaks the idea of containerizing it in the first place.
But maybe this is still possible somehow?
With Docker BuildKit there is now a feature for just this, called cache mounts. For the precise syntax, see the BuildKit documentation. To use this feature, change:
RUN conda env create -f environment.yml
to
RUN --mount=type=cache,target=/opt/conda/pkgs conda env create -f environment.yml
and make sure that BuildKit is enabled (e.g. via export DOCKER_BUILDKIT=1). The cache will persist between runs and will be shared between concurrent builds.
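A minimal sketch of the complete Dockerfile with the cache mount (the # syntax line selects a Dockerfile frontend that understands --mount; recent Docker versions support it out of the box):
# syntax=docker/dockerfile:1
FROM continuumio/miniconda3
WORKDIR /app
COPY environment.yml /app
RUN --mount=type=cache,target=/opt/conda/pkgs conda env create -f environment.yml
With this, changing environment.yml still re-runs the step, but already-downloaded packages are reused from /opt/conda/pkgs instead of being fetched again.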
This is a very indirect answer to the question, but it works like a charm for me.
Out of the many dependencies, there is a large subset which never changes. I always need python 3.6, numpy, pandas, torch, ...
So, instead of caching conda packages, you can lean on Docker's own layer caching and reuse a base image with those dependencies already installed:
FROM continuumio/miniconda3
WORKDIR /app
COPY environment.yml /app
# install package dependencies
RUN conda update conda
RUN conda env create -f environment.yml
RUN echo "source activate api_neural" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
Then you can add additional config on top of this, in a second dockerfile:
FROM base_deps
# add additional things on top, here I'm running some python in the conda env
RUN /bin/bash -c 'echo $(which python);\
source activate api_neural;\
python -c "import nltk; nltk.download(\"wordnet\"); nltk.download(\"words\")";\
python -m spacy download en;\
python -c "from fastai import untar_data, URLs; model_path = untar_data(URLs.WT103, data=False)"'
Another option is to bind-mount your dev directory into the container. Your changes then update the container automatically, and you only have to rebuild the image if you actually update any Python packages.
docker run -it --mount "type=bind,source=/local/path,target=/container/path" container_name bash
E.g. for debugging in VS Code, my task looks like this:
{
"type": "docker-run",
"label": "docker-run: debug",
// No need to build as we bind our dev environment
// "dependsOn": ["docker-build"],
"python": {
"file": "example.py"
},
"dockerRun": {
"image": "image_name",
"containerName": "container_name",
// first part allows the container to use the host's display
// second part binds our local dev folder to the container
"customOptions":
"--rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --mount \"type=bind,source=/local/path,target=/container/path\""
}
},
Related
I just started learning docker. To teach myself, I managed to containerize bandit (a python code scanner) but I'm not able to see the output of the scan before the container destroys itself. How can I copy the output file from inside the container to the host, or otherwise save it?
Right now I'm just using bandit to scan itself, basically :)
Dockerfile
FROM python:3-alpine
WORKDIR /
RUN pip install bandit
RUN apk update && apk upgrade
RUN apk add git
RUN git clone https://github.com/PyCQA/bandit.git ./code-to-scan
CMD [ "python -m bandit -r ./code-to-scan -o bandit.txt" ]
You can mount a volume on your host where you can share the output of bandit.
For example, you can run your container with:
docker run -v $(pwd)/output:/tmp/output -t your_awesome_container:latest
And in your Dockerfile:
...
CMD [ "python -m bandit -r ./code-to-scan -o /tmp/bandit.txt" ]
This way the bandit.txt file will be found in the output folder.
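After the container exits, the report can be read from the mounted folder on the host, e.g.:
cat ./output/bandit.txt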
Better to place the code in your image, not in the root directory.
I did some adjustments to your Dockerfile.
FROM python:3-alpine
WORKDIR /usr/myapp
RUN pip install bandit
RUN apk update && apk upgrade
RUN apk add git
RUN git clone https://github.com/PyCQA/bandit.git .
CMD [ "bandit","-r",".","-o","bandit.txt" ]`
This clones the git repository into your WORKDIR.
Note the CMD: it is an array, so just divide the command and its args as in the Dockerfile above.
I put the Dockerfile in my D:\test directory (Windows).
docker build -t test .
docker run -v D:/test/:/usr/myapp test
It will generate the bandit.txt file in the test folder.
After the code is executed, the container exits, as there is nothing else to do.
You can also pass --rm to remove the container once it finishes.
docker run --rm -v D:/test/:/usr/myapp test
I'm working on creating a container to hold my running Django app. During development and manual deployment I've been setting environment variables by sourcing a secrets.sh file in my repo. This has worked fine until now that I'm trying to automate my server's configuration environment in a Dockerfile.
So far it looks like this:
FROM python:3.7-alpine
RUN pip install --upgrade pip
RUN pip install pipenv
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /home/appuser/site
COPY . /home/appuser/site
RUN /bin/sh -c "source secrets.sh"
RUN env
I'd expect this to set the environment variables properly but it doesn't. I've also tried adding the variables to my appuser's bashrc, but this doesn't work either.
Am I missing something here? Is there another best practice to set env variables to be accessible by django, without having to check them into the Dockerfile in my repo?
Each RUN step launches a totally new container with a totally new shell; only its filesystem is persisted afterwards. RUN commands that try to start processes or set environment variables are no-ops. (RUN export or RUN service start do absolutely nothing.)
In your setup you need the environment variables to be set at container startup time based on information that isn't available at build time. (You don't want to persist secrets in an image: they can be easily read out by anyone who gets the image later on.) The usual way to do this is with an entrypoint script; this could look like
#!/bin/sh
# If the secrets file exists, read it in.
if [ -f /secrets.sh ]; then
# (Prefer POSIX "." to bash-specific "source".)
. /secrets.sh
fi
# Now run the main container CMD, replacing this script.
exec "$#"
A typical Dockerfile built around this would look like:
FROM python:3.7-alpine
RUN pip install --upgrade pip
WORKDIR /app
# Install Python dependencies, as an early step to support
# Docker layer caching.
COPY requirements.txt ./
RUN pip install -r requirements.txt
# Install the main application.
COPY . ./
# Create a non-root user. It doesn't own the source files,
# and so can't modify the application.
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
# Startup-time metadata.
ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["/app/app.py"]
And then when you go to run the container, you'd inject the secrets file
docker run -p 8080:8080 -v $PWD/secrets-prod.sh:/secrets.sh myimage
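One detail that's easy to miss: the entrypoint script has to carry the executable bit, since COPY preserves the permissions from the build context. A one-liner before building (or an equivalent RUN chmod step placed before the USER line) takes care of it:
chmod +x entrypoint.sh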
(As a matter of style, I reserve ENTRYPOINT for this pattern and for single-binary FROM scratch containers, and always use CMD for whatever the container's main process is.)
I developed a few ROS packages and I want to put the packages in a Docker container because installing all the ROS packages all the time is tedious. Therefore I created a Dockerfile that uses a base ROS image, installed all the necessary dependencies, copied my workspace, built the workspace in the Docker container, and sourced everything afterward. The Dockerfile is:
FROM ros:kinetic-ros-base
RUN apt-get update && apt-get install -y locales
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
RUN apt-get update && apt-get install -y \
    && rm -rf /var/lib/apt/lists/*
COPY . /catkin_ws/src/
WORKDIR /catkin_ws
RUN /bin/bash -c '. /opt/ros/kinetic/setup.bash; catkin_make'
RUN /bin/bash -c '. /opt/ros/kinetic/setup.bash; source devel/setup.bash'
CMD ["roslaunch", "master_launch sim_perception.launch"]
The problem is: when I run the Docker container with the "run" command, Docker doesn't seem to know that I sourced my new ROS workspace, and therefore it cannot automatically launch my launch script. If I run the container interactively with "run -it bash" I can source my workspace again and then roslaunch my .launch file.
So can someone tell me how to write my dockerfile correctly so I launch my .launch file automatically when I run the container? Thanks!
From Docker Docs
Each RUN instruction runs independently and won't affect the next instruction, so when the last line runs, none of the PATH changes from ROS are preserved.
You need to source .bashrc, or whichever environment file you need, using source first.
You can wrap everything you want (the source commands and the roslaunch command) inside an sh file and then just run that file at the end, as sketched below.
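A minimal sketch of that wrapper approach (the file name start.sh is an assumption; the paths and launch arguments are taken from the question):
#!/bin/bash
# start.sh: source the ROS setup files, then hand control over to roslaunch
source /opt/ros/kinetic/setup.bash
source /catkin_ws/devel/setup.bash
exec roslaunch master_launch sim_perception.launch
And in the Dockerfile, assuming start.sh sits in the build context:
COPY start.sh /start.sh
RUN chmod +x /start.sh
CMD ["/start.sh"]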
If you review the convention of ros_entrypoint.sh you can see how best to source the workspace you would like in the container. We're all so busy learning how to make docker and ros do the real things, it's easy to skip over some of the nuance of this interplay. This sucked forever for me; hope this is helpful for you.
I looked forever and found what seemed like only bad advice, and in the absence of an explicit standard or clear guidance I've settled into what seems like a sane approach that also allows you to control what launches at runtime with environment variables. I now consider this as the right solution for my needs.
In the Dockerfile for the image, you set the start/launch behavior towards the end: use a COPY (or ADD) line to insert your own ros_entrypoint.sh (example included), set it as the ENTRYPOINT, and then add a CMD to run something by default when the container starts.
Note: you'll (obviously?) need to re-run the docker build process for these changes to take effect.
Dockerfile looks like this:
# all your other Dockerfile instructions ^^
# .....
# towards the end
COPY ./ros_entrypoint.sh /
ENTRYPOINT ["/ros_entrypoint.sh"]
CMD ["bash"]
Example ros_entrypoint.sh:
#!/bin/bash
set -e
# setup ros environment
if [ -z "${SETUP}" ]; then
# basic ros environment
source "/opt/ros/$ROS_DISTRO/setup.bash"
else
# from an environment variable; should be an absolute path to the appropriate workspace's setup.bash
source "$SETUP"
fi
exec "$#"
Used in this way, the container will automatically source either the basic ROS bits, or, if you provide another workspace's setup.bash path in the $SETUP environment variable, that setup file will be used in the container instead.
So a few ways to work with this:
From the command line prior to running docker
export SETUP=/absolute/path/to/the/setup.bash
docker run -it your-docker-image
From the command line (inline)
docker run --env SETUP=/absolute/path/to/the/setup.bash your-docker-image
From docker-compose
service-name:
network_mode: host
environment:
- SETUP=/absolute/path/to/the_workspace/devel/setup.bash #or whatever
command: roslaunch package_name launchfile_that_needed_to_be_sourced.launch
#command: /bin/bash # wake up and do something else
I use Docker for development and in production for a Laravel project. I have slightly different Dockerfiles for development and production. For example, I am mounting a local directory into the container in the development environment so that I don't need to run docker build for every change in the code.
As the mounted directory will only be available when running the container, I can't put commands like "composer install" or "npm install" in the Dockerfile for development.
Currently I am managing two Dockerfiles. Is there any way to do this with a single Dockerfile and decide which commands to run when doing docker build by passing parameters?
What I am trying to achieve is
In docker file
...
IF PROD THEN RUN composer install
...
During docker build
docker build [PROD] -t mytag .
As a best practice you should aim to use one Dockerfile, to avoid unexpected errors between different environments. However, you may have a use case where you cannot do that.
The Dockerfile syntax is not rich enough to support such a scenario; however, you can use shell scripts to achieve it.
Create a shell script, called install.sh that does something like:
if [ ${ENV} = "DEV" ]; then
composer install
else
npm install
fi
In your Dockerfile add this script and then execute it when building
...
ARG ENV
COPY install.sh install.sh
RUN chmod u+x install.sh && ./install.sh
...
When building, pass a build arg to specify the environment, for example:
docker build --build-arg "ENV=PROD" ...
UPDATE (2020):
Since this was written 3 years ago, many things have changed (including my opinion about this topic). My suggested way of doing this is to use one Dockerfile and scripts. Please see #yamenk's answer.
ORIGINAL:
You can use two different Dockerfiles.
# ./Dockerfile (non production)
FROM foo/bar
MAINTAINER ...
# ....
And a second one:
# ./Dockerfile.production
FROM foo/bar
MAINTAINER ...
RUN composer install
While calling the build command, you can tell which file it should use:
$> docker build -t mytag .
$> docker build -t mytag-production -f Dockerfile.production .
You can use build args directly without providing an additional sh script. It might look a little messy, but it works.
The Dockerfile would look like this:
FROM alpine
ARG mode
RUN if [ "x$mode" = "xdev" ] ; then echo "Development" ; else echo "Production" ; fi
And commands to check are:
docker build -t app --build-arg mode=dev .
docker build -t app --build-arg mode=prod .
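As a hedged sketch of how this maps back to the original question (the base image, paths, and tag are assumptions, not taken from the question):
FROM composer:latest
WORKDIR /app
COPY . /app
ARG mode=dev
RUN if [ "$mode" = "prod" ] ; then composer install --no-dev ; fi
and built for production with:
docker build -t myapp --build-arg mode=prod .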
I have tried several approaches to this, including using docker-compose, a multi-stage build, passing an argument through a file and the approaches used in other answers. My company needed a good way to do this and after trying these, here is my opinion.
The best method is to pass the arg on the command line. You can also pass it from VS Code by right-clicking the Dockerfile and choosing Build Image.
[Screenshot: the "Build Image" option in the Visual Studio Code context menu]
using this code:
ARG BuildMode
RUN echo $BuildMode
RUN if [ "$BuildMode" = "debug" ] ; then apt-get update \
&& apt-get install -y --no-install-recommends \
unzip \
&& rm -rf /var/lib/apt/lists/* \
&& curl -sSL https://aka.ms/getvsdbgsh | bash /dev/stdin -v latest -l /vsdbg ; fi
and in the build stage of the Dockerfile:
ARG BuildMode
ENV Environment=${BuildMode:-debug}
RUN dotnet build "debugging.csproj" -c $Environment -o /app
FROM build AS publish
RUN dotnet publish "debugging.csproj" -c $Environment -o /app
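Passed from the command line, that could look like this (the image tag is an assumption):
docker build -t myapp --build-arg BuildMode=debug .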
The best way to do it is with a .env file in your project.
You can define two variables, CONTEXTDIRECTORY and DOCKERFILENAME,
and create Dockerfile-dev and Dockerfile-prod.
This is an example of using it:
docker compose file:
services:
serviceA:
build:
context: ${CONTEXTDIRECTORY:-./prod_context}
dockerfile: ${DOCKERFILENAME:-./nginx/Dockerfile-prod}
.env file in the root of project:
CONTEXTDIRECTORY=./
DOCKERFILENAME=Dockerfile-dev
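With that .env file sitting next to the compose file, a plain build picks the values up automatically:
docker compose build serviceA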
Be careful with the context: the dockerfile path is resolved relative to the context directory you specified, not relative to the docker-compose directory.
In the default values I use prod, because if you forget to specify the env variables, you won't accidentally build a dev version in production.
The solution with different Dockerfiles is more convenient than scripts; it's easier to change and maintain.
Scenario
I'm trying to set up a simple Docker image (I'm quite new to Docker, so please correct my possible misconceptions) based on the public continuumio/anaconda3 container.
The Dockerfile:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda env list \
&& conda create -n testenv pip -y \
&& source activate testenv \
&& conda env list
Building an image from this with docker build -t test . ends with the error:
/bin/sh: 1: source: not found
when activating the new virtual environment.
Suggestion 1:
Following this answer I tried:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda env list \
&& conda create -y -n testenv pip \
&& /bin/bash -c "source activate testenv" \
&& conda env list
This seems to work at first, as it outputs: prepending /opt/conda/envs/testenv/bin to PATH, but conda env list as well as echo $PATH clearly show that it doesn't:
[...]
# conda environments:
#
testenv /opt/conda/envs/testenv
root * /opt/conda
---> 80a77e55a11f
Removing intermediate container 33982c006f94
Step 3 : RUN echo $PATH
---> Running in a30bb3706731
/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
The Dockerfiles work out of the box as an MWE.
I appreciate any ideas. Thanks!
Using the Docker ENV instruction it is possible to add the virtual environment path persistently to PATH, although this does not change the selected environment shown by conda env list.
See the MWE:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda create -y -n testenv pip
ENV PATH /opt/conda/envs/testenv/bin:$PATH
RUN echo $PATH
RUN conda env list
Method 1: use SHELL with a custom entrypoint script
EDIT: I have developed a new, improved approach which is better than the "conda", "run" syntax.
Sample dockerfile available at this gist. It works by leveraging a custom entrypoint script to set up the environment before execing the arguments of the RUN stanza.
Why does this work?
A shell is (put very simply) a process which can act as an entrypoint for arbitrary programs. exec "$#" allows us to launch a new process, inheriting all of the environment of the parent process. In this case, this means we activate conda (which basically mangles a bunch of environment variables), then run /bin/bash -c CONTENTS_OF_DOCKER_RUN.
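Since the gist isn't reproduced here, a hedged sketch of what such an entrypoint-style shell script and the matching SHELL line can look like (the file name and the environment name testenv are assumptions):
#!/bin/bash
# conda_shell.sh: activate the conda environment, then exec whatever was passed in
set -e
source /opt/conda/etc/profile.d/conda.sh
conda activate testenv
exec "$@"
And in the Dockerfile:
COPY conda_shell.sh /usr/local/bin/conda_shell.sh
RUN chmod +x /usr/local/bin/conda_shell.sh
SHELL ["/usr/local/bin/conda_shell.sh", "/bin/bash", "-c"]
Every subsequent RUN line is then passed to /bin/bash -c with the environment already activated.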
Method 2: SHELL with arguments
Here is my previous approach, courtesy of Itamar Turner-Trauring; many thanks to them!
# Create the environment:
COPY environment.yml .
RUN conda env create -f environment.yml
# Set the default docker build shell to run as the conda wrapped process
SHELL ["conda", "run", "-n", "vigilant_detect", "/bin/bash", "-c"]
# Set your entrypoint to use the conda environment as well
ENTRYPOINT ["conda", "run", "-n", "myenv", "python", "run.py"]
Modifying ENV may not be the best approach since conda likes to take control of environment variables itself. Additionally, your custom conda env may activate other scripts to further modulate the environment.
Why does this work?
This leverages conda run to "add entries to PATH for the environment and run any activation scripts that the environment may contain" before starting the new bash shell.
Combining conda with Docker can be a frustrating experience, since both tools effectively want to monopolize the environment, and theoretically you shouldn't ever need conda inside a container. But deadlines and technical debt being a thing, sometimes you just gotta get it done, and sometimes conda is the easiest way to provision dependencies (looking at you, GDAL).
Piggybacking on ccauet's answer (which I couldn't get to work), and Charles Duffey's comment about there being more to it than just PATH, here's what will take care of the issue.
When activating an environment, conda sets the following variables, as well as a few that back up default values so they can be restored when deactivating the environment. These variables have been omitted from the Dockerfile, as the root conda environment need never be used again. For reference, these are CONDA_PATH_BACKUP, CONDA_PS1_BACKUP, and _CONDA_SET_PROJ_LIB. It also sets PS1 in order to show (testenv) at the left of the terminal prompt line, which was also omitted. The following statements will do what you want.
ENV PATH /opt/conda/envs/testenv/bin:$PATH
ENV CONDA_DEFAULT_ENV testenv
ENV CONDA_PREFIX /opt/conda/envs/testenv
In order to shrink the number of layers created, you can combine these commands into a single ENV command setting all the variables at once as well.
There may be some other variables that need to be set, based on the package. For example,
ENV GDAL_DATA /opt/conda/envs/testenv/share/gdal
ENV CPL_ZIP_ENCODING UTF-8
ENV PROJ_LIB /opt/conda/envs/testenv/share/proj
The easy way to get this information is to call printenv > root_env.txt in the root environment, activate testenv, then call printenv > test_env.txt, and examine diff root_env.txt test_env.txt.
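A sketch of that procedure inside the running container (sorting just makes the diff easier to read):
printenv | sort > root_env.txt
conda activate testenv   # "source activate testenv" on older conda releases
printenv | sort > test_env.txt
diff root_env.txt test_env.txt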