Scenario
I'm trying to set up a simple Docker image (I'm quite new to Docker, so please correct any misconceptions) based on the public continuumio/anaconda3 container.
The Dockerfile:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda env list \
&& conda create -n testenv pip -y \
&& source activate testenv \
&& conda env list
Building an image from this with docker build -t test . ends with the error:
/bin/sh: 1: source: not found
when activating the new virtual environment.
Suggestion 1:
Following this answer I tried:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda env list \
&& conda create -y -n testenv pip \
&& /bin/bash -c "source activate testenv" \
&& conda env list
This seems to work at first, as it outputs prepending /opt/conda/envs/testenv/bin to PATH, but conda env list as well as echo $PATH clearly show that it doesn't:
[...]
# conda environments:
#
testenv /opt/conda/envs/testenv
root * /opt/conda
---> 80a77e55a11f
Removing intermediate container 33982c006f94
Step 3 : RUN echo $PATH
---> Running in a30bb3706731
/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
The Dockerfiles work out of the box as an MWE.
I appreciate any ideas. Thanks!
Using the Docker ENV instruction, it is possible to add the virtual environment path persistently to PATH. This does not, however, change which environment is shown as selected under conda env list.
See the MWE:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda create -y -n testenv pip
ENV PATH /opt/conda/envs/testenv/bin:$PATH
RUN echo $PATH
RUN conda env list
Method 1: use SHELL with a custom entrypoint script
EDIT: I have developed a new, improved approach which works better than the "conda", "run" syntax.
Sample dockerfile available at this gist. It works by leveraging a custom entrypoint script to set up the environment before execing the arguments of the RUN stanza.
Why does this work?
A shell is (put very simply) a process which can act as an entrypoint for arbitrary programs. exec "$@" allows us to launch a new process, inheriting all of the environment of the parent process. In this case, this means we activate conda (which essentially mangles a bunch of environment variables), then run /bin/bash -c CONTENTS_OF_DOCKER_RUN.
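The gist's script isn't reproduced here, but the mechanism can be sketched without conda at all. In this stand-in, MY_ENV_MARKER and the /tmp paths are illustrative: exporting the marker plays the role of "conda activate testenv", and exec "$@" hands control to whatever command the wrapper was given, which inherits the modified environment.

```shell
# Write a tiny entrypoint-style wrapper: set up the environment,
# then replace this process with the requested command via exec "$@".
cat > /tmp/entrypoint_demo.sh <<'EOF'
#!/bin/sh
export MY_ENV_MARKER=active   # stand-in for "conda activate testenv"
exec "$@"                     # the command inherits MY_ENV_MARKER
EOF
chmod +x /tmp/entrypoint_demo.sh

# Run an arbitrary command through the wrapper, as SHELL would for RUN stanzas.
/tmp/entrypoint_demo.sh /bin/sh -c 'echo "marker=$MY_ENV_MARKER"' > /tmp/entrypoint_out.txt
cat /tmp/entrypoint_out.txt
```

In a Dockerfile, the real script would activate the conda environment instead of setting a marker, and would be wired in with SHELL so that every RUN line executes inside the activated environment.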
Method 2: SHELL with arguments
Here is my previous approach, courtesy of Itamar Turner-Trauring; many thanks to them!
# Create the environment:
COPY environment.yml .
RUN conda env create -f environment.yml
# Set the default docker build shell to run as the conda wrapped process
SHELL ["conda", "run", "-n", "vigilant_detect", "/bin/bash", "-c"]
# Set your entrypoint to use the conda environment as well
ENTRYPOINT ["conda", "run", "-n", "myenv", "python", "run.py"]
Modifying ENV may not be the best approach since conda likes to take control of environment variables itself. Additionally, your custom conda env may activate other scripts to further modulate the environment.
Why does this work?
This leverages conda run to "add entries to PATH for the environment and run any activation scripts that the environment may contain" before starting the new bash shell.
Using conda can be a frustrating experience, since both tools effectively want to monopolize the environment, and theoretically, you shouldn't ever need conda inside a container. But deadlines and technical debt being a thing, sometimes you just gotta get it done, and sometimes conda is the easiest way to provision dependencies (looking at you, GDAL).
Piggybacking on ccauet's answer (which I couldn't get to work), and Charles Duffey's comment about there being more to it than just PATH, here's what will take care of the issue.
When activating an environment, conda sets the following variables, as well as a few that back up default values so they can be restored when the environment is deactivated. The backup variables (CONDA_PATH_BACKUP, CONDA_PS1_BACKUP, and _CONDA_SET_PROJ_LIB) have been omitted from the Dockerfile, as the root conda environment never needs to be used again. conda also sets PS1 in order to show (testenv) at the left of the terminal prompt line; that was omitted as well. The following statements will do what you want.
ENV PATH /opt/conda/envs/testenv/bin:$PATH
ENV CONDA_DEFAULT_ENV testenv
ENV CONDA_PREFIX /opt/conda/envs/testenv
In order to shrink the number of layers created, you can combine these commands into a single ENV command setting all the variables at once.
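For example, the three statements above can be collapsed into one layer (same paths and environment name as above):

```dockerfile
ENV PATH=/opt/conda/envs/testenv/bin:$PATH \
    CONDA_DEFAULT_ENV=testenv \
    CONDA_PREFIX=/opt/conda/envs/testenv
```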
There may be some other variables that need to be set, based on the package. For example,
ENV GDAL_DATA /opt/conda/envs/testenv/share/gdal
ENV CPL_ZIP_ENCODING UTF-8
ENV PROJ_LIB /opt/conda/envs/testenv/share/proj
The easy way to get this information is to call printenv > root_env.txt in the root environment, activate testenv, then call printenv > test_env.txt, and examine diff root_env.txt test_env.txt.
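The same snapshot-and-diff pattern can be tried outside Docker. In this illustrative version, EXTRA_CONDA_VAR stands in for the variables that activating the environment would add, and the /tmp paths are arbitrary:

```shell
# Snapshot the environment, change it, snapshot again, and diff
# to see exactly which variables differ between the two states.
printenv | sort > /tmp/root_env.txt
EXTRA_CONDA_VAR=testenv printenv | sort > /tmp/test_env.txt
diff /tmp/root_env.txt /tmp/test_env.txt > /tmp/env_diff.txt || true
cat /tmp/env_diff.txt
```

Each `>` line in the diff output is a variable that the "activation" added or changed, which is exactly the set you would need to replicate with ENV statements.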
Related
I'm preparing a Docker image with Ubuntu 18.04 for software development. I'm including Miniconda to manage the development environment, which is all golang. I create the environment with a YAML file:
RUN conda env create --file goDev.yml
I'd also like the conda environment to be activated when the container is started. That's a little tricky to do because conda has to be initialized first, but JaegerP provides a nice workaround here that involves updating .bashrc (thanks).
Unfortunately, I also need to install a third-party YAML package for golang. I have to activate the environment to install the package, which brings me back to the original problem JaegerP helped me overcome: I can't activate the environment until it's initialized, and I can't initialize during the docker build because I have to restart the shell.
In other words, this works nicely:
RUN conda env create --file goDev.yml \
 && rm goDev.yml \
 && echo "source /opt/conda/etc/profile.d/conda.sh" >> ${HOME}/.bashrc \
 && echo "conda activate go_dev" >> ${HOME}/.bashrc
The desired conda environment is activated when the container is started, but the external YAML package is not installed. The following does not work, because the conda environment can't be activated until it's initialized, and initialization requires the shell to be restarted:
RUN conda env create --file goDev.yml \
 && rm goDev.yml \
 && conda init bash \
 && conda activate go_dev \
 && go get gopkg.in/yaml.v2 \
 && echo "source /opt/conda/etc/profile.d/conda.sh" >> ${HOME}/.bashrc \
 && echo "conda activate go_dev" >> ${HOME}/.bashrc
I could update .bashrc further to install the YAML package if this file doesn’t exist:
/root/go/pkg/mod/cache/download/gopkg.in/yaml.v2
Is there a more elegant solution that enables me to install a 3rd party golang package during the docker build instead of checking for it each time the image is run?
Try Conda Run
The conda run function provides a clean way to run code within an environment context without having to manually activate. Try something like
RUN conda env create --file goDev.yml \
 && rm goDev.yml \
 && conda run -n go_dev go get gopkg.in/yaml.v2 \
 && echo '. /opt/conda/etc/profile.d/conda.sh && conda activate go_dev' >> ${HOME}/.bashrc
Also, you probably want to end all RUN chains involving Conda with conda clean -qafy to help minimize Docker image layer size.
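Putting the two suggestions together (environment name and YAML file taken from the question; this is a sketch, not a verbatim recipe):

```dockerfile
RUN conda env create --file goDev.yml \
 && conda run -n go_dev go get gopkg.in/yaml.v2 \
 && conda clean -qafy   # drop package caches before the layer is committed
```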
This is a very similar question to: Docker build: use http cache
I would like to set up a docker container with a custom conda environment.
The corresponding dockerfile is:
FROM continuumio/miniconda3
WORKDIR /app
COPY . /app
RUN conda update conda
RUN conda env create -f environment.yml
RUN echo "source activate my_env" > ~/.bashrc
ENV PATH /opt/conda/envs/my_env/bin:$PATH
My environment is rather large, a minimal version could look like this:
name: my_env
channels:
- defaults
dependencies:
- python=3.6.8=h0371630_0
prefix: /opt/conda
Every time that I make changes to the dependencies, I have to rebuild the image. And that means re-downloading all the packages.
Is it possible to set up a cache somehow?
Interfacing the containerized conda with a cache outside the container probably breaks the idea of containerizing it in the first place.
But maybe this is still possible somehow ?
With Docker BuildKit there is now a feature for just this, called cache mounts. For the precise syntax see here. To use this feature, change:
RUN conda env create -f environment.yml
to
RUN --mount=type=cache,target=/opt/conda/pkgs conda env create -f environment.yml
and make sure that BuildKit is enabled (e.g. via export DOCKER_BUILDKIT=1). The cache will persist between runs and will be shared between concurrent builds.
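A minimal full sketch, using the image and file names from the question, could look like this. Note the # syntax directive at the top, which selects a BuildKit frontend that understands RUN --mount:

```dockerfile
# syntax=docker/dockerfile:1
FROM continuumio/miniconda3
WORKDIR /app
COPY environment.yml .
# Cache conda's package directory between builds, so packages that
# haven't changed in environment.yml are not re-downloaded.
RUN --mount=type=cache,target=/opt/conda/pkgs conda env create -f environment.yml
```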
This is a very indirect answer to the question, but it works like a charm for me.
Out of the many dependencies, there is a large subset which never changes. I always need python 3.6, numpy, pandas, torch, ...
So, instead of caching conda, you can cache docker and reuse a base image with those dependencies already installed:
FROM continuumio/miniconda3
WORKDIR /app
COPY environment.yml /app
# install package dependencies
RUN conda update conda
RUN conda env create -f environment.yml
RUN echo "source activate api_neural" > ~/.bashrc
ENV PATH /opt/conda/envs/api_neural/bin:$PATH
Then you can add additional config on top of this, in a second dockerfile:
FROM base_deps
# add additional things on top, here I'm running some python in the conda env
RUN /bin/bash -c 'echo $(which python);\
source activate api_neural;\
python -c "import nltk; nltk.download(\"wordnet\"); nltk.download(\"words\")";\
python -m spacy download en;\
python -c "from fastai import untar_data, URLs; model_path = untar_data(URLs.WT103, data=False)"'
Another option is to bind your dev directory to the one in the Docker container. Your changes will automatically update the container in this case. You only have to rebuild the image if you actually update any Python packages.
docker run -it --mount "type=bind,source=/local/path,target=/container/path" container_name bash
E.g. for debugging in VS Code, my task looks like this:
{
"type": "docker-run",
"label": "docker-run: debug",
// No need to build as we bind our dev environment
// "dependsOn": ["docker-build"],
"python": {
"file": "example.py"
},
"dockerRun": {
"image": "image_name",
"containerName": "container_name",
// first part allows the container to use the hosts display
// second part binds our local dev folder to the container
"customOptions":
"--rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --mount \"type=bind,source=/local/path,target=/container/path\""
}
},
I am trying to build an image using a Dockerfile.
The commands in the Dockerfile look something like this:
FROM ubuntu:16.04
:
:
RUN pip3 install virtualenvwrapper
RUN echo '# Python virtual environment wrapper' >> ~/.bashrc
RUN echo 'export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3' >> ~/.bashrc
RUN echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
RUN echo 'source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bashrc
After these commands, I will use virtualenvwrapper commands to make some virtualenvs.
If I had only environment variables to deal with in ~/.bashrc, I would have used ARG or ENV to set them up.
But now I also have other shell script files, like virtualenvwrapper.sh, that will be setting some of their own variables.
Also, RUN source ~/.bashrc is not working (source not found).
What should I do?
You shouldn't try to edit shell dotfiles like .bash_profile in a Dockerfile. There are many common paths that don't go via a shell (e.g., CMD ["python", "myapp.py"] won't launch any sort of shell and won't read a .bash_profile). If you need to globally set an environment variable in an image, use the Dockerfile ENV directive.
For a Python application, you should just install your application into the image's "global" Python using pip install. You don't specifically need a virtual environment; Docker provides a lot of the same isolation capabilities (something you pip install in a Dockerfile won't affect your host system's globally installed packages).
A typical Python application Dockerfile (copied from https://hub.docker.com/_/python/) might look like
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "./your-daemon-or-script.py"]
On your last question, source is a vendor extension that only some shells provide; the POSIX standard doesn't require it and the default /bin/sh in Debian and Ubuntu doesn't provide it. In any case since environment variables get reset on every RUN command, RUN source ... (or more portably RUN . ...) is a no-op if nothing else happens in that RUN line.
avoid using ~ => put your bashrc in a specific path
put source bashrc and your command on the same RUN line, joined with ;
the RUN lines are totally independent from each other as far as the environment is concerned
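The source vs . distinction is easy to demonstrate: the POSIX-standard spelling is ., which works in a plain /bin/sh (e.g. dash on Debian/Ubuntu) where the bash-ism source does not. The file path here is illustrative:

```shell
# Write a small file of shell variables, then load it in plain /bin/sh
# with the POSIX '.' built-in; GREETING becomes visible in the same shell.
cat > /tmp/vars.sh <<'EOF'
GREETING=hello
EOF
/bin/sh -c '. /tmp/vars.sh && echo "$GREETING"' > /tmp/dot_out.txt
cat /tmp/dot_out.txt
```

Note that the . and the echo share one RUN-style invocation; splitting them across two /bin/sh -c calls (as two RUN lines would) loses the variable, which is exactly the point about RUN lines being independent.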
I am trying to run a shell script env_var.sh inside the Docker container. The contents of the script are shown below. It essentially tries to get the access keys from a specific profile.
echo "# Environment for AWS profile dev"
echo export AWS_PROFILE=dev
echo export AWS_ACCESS_KEY_ID=(aws configure get aws_access_key_id --profile dev)
echo export AWS_SECRET_ACCESS_KEY=(aws configure get aws_secret_access_key --profile dev)
echo export AWS_DEFAULT_REGION=(aws configure get region --profile dev)
echo "dev environment variables exported"
and this is my dockerfile
FROM docker:17.04.0-ce
RUN apk update && apk add python && apk add py-pip && apk add bash
RUN pip install pip --upgrade && pip install setuptools --upgrade && pip install awscli && pip install cfdeployment==0.2.3 --extra-index-url https://dn2h7gel4xith.cloudfront.net
VOLUME /tmp/work
VOLUME /root/.aws
ADD test.sh /root/test.sh
ADD aws_env.sh /root/env_var.sh
ADD config /root/.aws/config
ADD credentials /root/.aws/credentials
RUN /root/env_var.sh
ENTRYPOINT ["/root/test.sh", "cfdeployment"]
CMD ["--version"]
The output I am seeing for RUN /root/env_var.sh is shown below. I don't see the access key substituted from the profile. Any idea what might be happening?
Step 9/11 : RUN /root/aws_env.sh
---> Running in ca46f4c516eb
# Environment for AWS profile ''
export AWS_PROFILE=dev
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=
dev environment variables exported
Or is there a different way to set these env variables, which picks up the keys from the profile, using the docker run command?
Firstly, I think the reason env_var.sh isn't working is that it's missing the dollar signs for command substitution, $( ). The lines should look like this:
echo export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile dev)
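The difference is easy to see in isolation. Here get_key is a hypothetical stand-in for aws configure get aws_access_key_id --profile dev, and the key value is made up:

```shell
# Without the '$', (command) is just literal text inside the echo;
# with '$( )' the command actually runs and its output is substituted.
get_key() { echo "AKIAEXAMPLEKEY"; }   # stand-in for 'aws configure get ...'
echo "export AWS_ACCESS_KEY_ID=$(get_key)" > /tmp/subst_out.txt
cat /tmp/subst_out.txt
```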
But regardless of that, the better way to supply environment variables to Docker containers is at run time, not baking them into the image. This way the image is decoupled from configuration: you can change the environment variables without rebuilding, and you can remove all that mess in the Dockerfile where you're copying your local AWS configuration into the image. So you would use the docker run -e option:
docker run -e AWS_PROFILE=dev -e "AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile dev)" my-image
See https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file
You can use ENV in your dockerfile to create these variables, setting them individually, e.g.:
ENV AWS_PROFILE=dev
There is another command called ARG, which you can use to set variables that need to be available only on build stage.
When I try to use an environment variable ($HOME) that I set in the Dockerfile in the script that runs at startup, $HOME is not set. If I run printenv in the container, $HOME is set. So I am confused, and not sure what is going on.
I am using the phusion/passenger-customizable image, so that I can run a custom node server via pm2. I need a different version of Node then what is bundled in the node specific passenger image.
Dockerfile
# Simplified
FROM phusion/passenger-customizable:0.9.27
RUN apt-get update && apt-get upgrade -y -o Dpkg::Options::="--force-confold"
# Set environment variables needed for the docker image.
ARG HOME=/opt/var/app
ENV HOME $HOME
# Use baseimage-docker's init process.
CMD ["/sbin/my_init"]
RUN mkdir /etc/service/app
ADD start.sh /etc/service/app/run
RUN chmod a+x /etc/service/app/run
start.sh
echo $HOME
# run some scripts that reference the $HOME directory
What do I need to do to be able to reference a environment variable, set in the Dockerfile, in my start up scripts? Or do I just need to hardcode the paths in that start up script and call it a day?
$HOME is reserved, in some fashion. When running printenv, per @Sebastian's suggestion, all my other variables were there but not $HOME. I prepended it with the initials of my company and it is working as intended.