How should I persistently save Julia packages in a Docker container?

I'm running Julia on a Raspberry Pi 4. For what I'm doing I need Julia 1.5, and thankfully there is a Docker image of it here: https://github.com/Julia-Embedded/jlcross
My challenge is that, because this is work-in-progress development, I find myself adding packages here and there as I work. What is the best way to persistently save the updated environment?
Here are my problems:
I'm having a hard time wrapping my mind around volumes that will save packages from Julia's package manager and keep them around the next time I run the container.
It seems kludgy to commit my Docker container somehow every time I install a package.
Is there a consensus on the best way or maybe there's another way to do what I'm trying to do?

You can persist the state of downloaded & precompiled packages by mounting a dedicated volume into /home/your_user/.julia inside the container:
$ docker run --mount source=dot-julia,target=/home/your_user/.julia [OTHER_OPTIONS]
Depending on how (and by which user) julia is run inside the container, you might have to adjust the target path above to point to the first entry in Julia's DEPOT_PATH.
You can control this path by setting it yourself via the JULIA_DEPOT_PATH environment variable. Alternatively, you can check whether it is in a nonstandard location by running the following command in a Julia REPL in the container:
julia> println(first(DEPOT_PATH))
/home/francois/.julia
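For example, a minimal sketch (the image name my-julia-image and the /depot path are placeholders, not part of the original answer) that pins the depot location and mounts the volume there:

$ docker run \
    --env JULIA_DEPOT_PATH=/depot \
    --mount source=dot-julia,target=/depot \
    my-julia-image julia

With the depot pinned like this, every package you add from inside the container ends up on the dot-julia volume and survives container restarts.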

You can manage the packages and their versions via a Julia Project.toml file.
This file keeps both the list of your dependencies and, via its [compat] section, their allowed versions.
Here is a sample Julia session:
julia> using Pkg
julia> pkg"generate MyProject"
Generating project MyProject:
MyProject\Project.toml
MyProject\src/MyProject.jl
julia> cd("MyProject")
julia> pkg"activate ."
Activating environment at `C:\Users\pszufe\myp\MyProject\Project.toml`
julia> pkg"add DataFrames"
Now the last step is to provide package version information in your Project.toml file. We start by checking the version number that works for us:
julia> pkg"st DataFrames"
Project MyProject v0.1.0
Status `C:\Users\pszufe\myp\MyProject\Project.toml`
[a93c6f00] DataFrames v0.21.7
Now edit the [compat] section of the Project.toml file to pin that version number to always be v0.21.7:
name = "MyProject"
uuid = "5fe874ab-e862-465c-89f9-b6882972cba7"
authors = ["pszufe <pszufe#******.com>"]
version = "0.1.0"
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
[compat]
DataFrames = "= 0.21.7"
Note that in the last line there are two equality signs: the second one, inside the string, pins the exact version number (see also https://julialang.github.io/Pkg.jl/v1/compatibility/).
Now, in order to reuse that structure (e.g. in a different Docker container, or when moving between systems), all you do is:
cd("MyProject")
using Pkg
pkg"activate ."
pkg"instantiate"
Additional note
Also have a look at the JULIA_DEPOT_PATH variable (https://docs.julialang.org/en/v1/manual/environment-variables/).
When moving installations between Docker containers, it can also be convenient to control where your packages are actually installed. For example, you might want to copy the JULIA_DEPOT_PATH folder between two containers that have the same Julia installation, to avoid the time spent installing packages, or because you are building the Docker image without an internet connection, etc.
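As a small sketch (the /opt/julia-depot path is arbitrary), pinning the depot in the Dockerfile makes that folder the single thing you need to copy or mount:

# Sketch: fix the package depot at a known location (path chosen arbitrarily).
ENV JULIA_DEPOT_PATH=/opt/julia-depot
# Everything Pkg downloads and precompiles now lands under /opt/julia-depot,
# so that one folder can be copied into another image (e.g. with COPY --from)
# or mounted as a volume, instead of reinstalling packages over the network.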

In my Dockerfile I simply install the packages just like you would do with pip:
FROM jupyter/datascience-notebook
RUN julia -e 'using Pkg; Pkg.add.(["CSV", "DataFrames", "DataFramesMeta", "Gadfly"])'
Here I start with a base data science notebook image which includes Julia, and then call Julia from the command line, instructing it to execute the code needed to install the packages. The only downside for now is that package precompilation is triggered each time I load the container in VS Code.
If I need new packages, I simply add them to the list.
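One way to reduce that startup precompilation, sketched here on the assumption that a one-off Pkg.precompile() at build time covers the packages being loaded, is to precompile while the image is built:

FROM jupyter/datascience-notebook
# Install the packages and precompile them once while the image is built,
# so the container does not have to precompile them again on every start.
RUN julia -e 'using Pkg; Pkg.add.(["CSV", "DataFrames", "DataFramesMeta", "Gadfly"]); Pkg.precompile()'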

Related

Why are dependencies installed via an ENTRYPOINT and tini?

I have a question regarding an implementation of a Dockerfile on dask-docker.
FROM continuumio/miniconda3:4.8.2
RUN conda install --yes \
-c conda-forge \
python==3.8 \
[...]
&& rm -rf /opt/conda/pkgs
COPY prepare.sh /usr/bin/prepare.sh
RUN mkdir /opt/app
ENTRYPOINT ["tini", "-g", "--", "/usr/bin/prepare.sh"]
prepare.sh just facilitates the installation of additional packages via conda, pip, and apt.
There are two things I don't get about that:
Why not just place those instructions in the Dockerfile? Possibly indirectly (modularized) by COPYing dedicated files (requirements.txt, environment.yaml, ...)
Why execute this via tini? At the end it does exec "$@", where one can start a scheduler or worker - that's more what I associate with tini.
This way, every time you run the container from the built image, you have to repeat the installation process!?
Maybe I'm overthinking it, but it seems rather unusual - perhaps it's a Dockerfile pattern with good reasons behind it.
Optional bonus questions for Dask insiders:
Why copy prepare.sh to /usr/bin (instead of, e.g., /tmp)?
What purpose does the created directory /opt/app serve?
It really depends on the nature and usage of the files being installed by the entry point script. In general, I like to break this down into a few categories:
1. Local files that are subject to frequent changes on the host system and will be rolled into the final image for production release. This is for things like the source code of an application that is under development and needs to be tested in the container. You want these to be copied into the runtime every time the image is rebuilt. Use a COPY in the Dockerfile.
2. Files from other places that change frequently and/or are specific to the deployment environment. This is stuff like secrets from a HashiCorp vault, network settings, server configurations, etc. that will probably be downloaded into the container all the time, even when it goes into production. The entry point script should download these, and it should decide which files to get and from where based on environment variables injected by the host.
3. Libraries, executable programs (under /bin, /usr/local/bin, etc.), and things that specifically should not change except during a planned upgrade. Usually anything that is installed using pip, maven or some other program that does dependency management, and anything installed with apt-get or equivalent. These files should not be installed from the Dockerfile or from the entrypoint script. Much, much better is to build your base image with all of the dependencies already installed, and then use that image as the FROM source for further development. This has a number of advantages: it ensures a stable, centrally located starting platform that everyone can use for development and testing (it forces uniformity where it counts); it prevents you from hammering the servers that host those libraries (constantly re-downloading all of those libraries from pypi.org is really bad form... someone has to pay for that bandwidth); it makes the build faster; and if you have a separate security team, it might reduce the number of files they need to scan.
You are probably looking at #3, but I'm including all three since I think it's a helpful way to categorize things.
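As a hedged illustration of category 3 (the image names, tags, and package list below are made up, not taken from the dask-docker repo), the heavy dependencies would live in a rarely rebuilt base image that the application image then starts FROM:

# Dockerfile.base -- rebuilt only on planned upgrades (names/tags are hypothetical)
FROM continuumio/miniconda3:4.8.2
RUN conda install --yes -c conda-forge python==3.8 dask distributed \
    && conda clean -afy
# build and push once, e.g.: docker build -f Dockerfile.base -t myregistry/dask-base:latest .

# Dockerfile -- rebuilt on every code change, starts from the prebuilt base
FROM myregistry/dask-base:latest
COPY . /opt/app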

Manage apt-get based dependencies on Docker

We have a large C++ code base in my company and we are trying to move to a microservice infrastructure based on Docker.
We have a couple of in-house libraries that provide helper functions and utilities we regularly use in our code. Our idea was to create a base image with these libraries already installed and make it available to developers as our "base" image. This gives us the benefit of all our software always using the latest version of our own libraries.
My question is related to Docker's cache system in relation to CI and external dependencies. Let's say we have a Dockerfile like this:
FROM ubuntu:latest
# Install External dependencys
RUN apt update && apt install -y \
boost-libs \
etc...
# Copy our software
...
# Build it
...
# Install it
...
If our code changes, we can trigger the CI and Docker will understand that it can use the cached image that was created before, up to the point where it copies our software. But what happens if one of our external dependencies offers a newer version? Will the cache automatically be invalidated? How can we trigger a CI build in case any of our packages receives a new version?
In essence, how do we make sure we are always using the latest packages available for our external dependencies?
Please keep in mind the above Dockerfile is just an example to illustrate; we are also trying to use other tricks in the playbook, such as a lighter base image (not Ubuntu) and multi-stage builds to avoid dev packages in our production containers.
Docker's caching algorithm is fairly simple. It looks at the previous state of the image build, and the string of the command you are running. If you are performing a COPY or ADD, it also looks at a hash of those files being copied. If a previous build is found on the server with the same previous state and command being run, it will reuse the cache.
That means an external change, e.g. pulling packages from an external repository, will not be detected and the cache will be reused instead of rerunning that line. There are two solutions that I've seen to this:
Option 1: Change your command by adding versions to the dependencies. When one of those dependencies changes, you'll need to update your build. This is added work, but it also guarantees that you only pull in the versions that you are ready for. That would look like this (pinning boost-libs to version 1.5):
# Install External dependencys
RUN apt update && apt install -y \
boost-libs=1.5 \
etc...
Option 2: change a build arg. These are injected as environment variables into the RUN commands and docker sees a change in the environment as a different command to run. That would look like:
# Install External dependencys
ARG UNIQUE_VAR
RUN apt update && apt install -y \
boost-libs \
etc...
And then you could build the above with the following to trigger the cache to be recreated daily on that line:
docker build --build-arg "UNIQUE_VAR=$(date +%Y%m%d)" ...
Note, there's also the option to build without the cache any time you wish with:
docker build --no-cache ...
This would cause the cache to be ignored for all steps (except the FROM line).

Copying an exe and composing it as a docker image and making it platform independent

I need to create a Docker image which, when run, installs an exe in the directory specified in my Dockerfile.
Basically, I need the ImageMagick application. The Dockerfile should be platform independent: if I run it on Windows it should use the Windows distribution, and on Linux the Linux distribution. It would be great if it also added an environment variable to the system. I browsed for a solution, but I couldn't find an appropriate one.
I know it's a bit late but maybe someone (like me) was still searching.
I ended up using a java-imagemagick docker version from https://hub.docker.com/r/cpaitsupport/java-imagemagick/dockerfile
You can run docker pull cpaitsupport/java-imagemagick to get this docker image to your docker machine.
Now comes the tricky part: I needed to run ImageMagick inside a Docker container for my main app. You can COPY the files from cpaitsupport/java-imagemagick into your custom container. Example:
COPY --from=cpaitsupport/java-imagemagick:latest . ./some/dir/imagemagick
Now you should have the file structure for your custom app, and also, under some/dir/imagemagick/, the file structure for ImageMagick. All the ImageMagick-related files live there (convert, magick, the libraries, etc.).
Now, if you want to use ImageMagick in your code, you need to set some ENV variables in your Docker container with the "new" path to the ImageMagick directory. Example:
IM4JAVA_TOOLPATH=/some/dir/imagemagick/usr/bin \
LD_LIBRARY_PATH=/usr/lib:/some/dir/imagemagick/usr/lib \
MAGICK_CONFIGURE_PATH=/some/dir/imagemagick/etc/ImageMagick-7 \
MAGICK_CODER_MODULE_PATH=/some/dir/imagemagick/usr/lib/ImageMagick-7.0.5/modules-Q16HDRI/coders \
MAGICK_HOME=/some/dir/imagemagick/usr
Now, in your Java code, delete the ProcessStarter.setGlobalSearchPath(imPath); part if it is set, so that the IM4JAVA_TOOLPATH is used instead.
Now ConvertCmd cmd = new ConvertCmd(); and cmd.run(op); should work.
Maybe it's not the best way, but it worked for me and I was struggling a lot.
Hope this helps!
Feel free to correct or add additional info.
You can install (extract) files onto the external host system using Docker mounts or volumes; however, you cannot change the host system's settings by updating environment variables from inside a container.
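For example (a sketch; the container, image, and path names are placeholders), you can copy installed files out of a container or share a host directory with it, but you cannot export environment variables to the host:

# copy installed files out of a container to the host
docker cp my-imagemagick-container:/usr/bin/convert ./convert

# or share a host directory with the container at run time
docker run -v /host/tools:/some/dir/imagemagick my-image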

dockerfile: vim (compiled python), vim-ipython, and ipython notebook

I would like to build a Dockerfile on Linux which:
1. compiles vim with python
2. installs python stack (such as numpy, scipy, ipython, etc)
3. creates ssl certificate for ipython-notebook, to view the notebooks on host machine
It seemed straightforward enough, but I have run into problems despite a variety of approaches, such as linking separate containers, using Anaconda, building a single unified image vs. separate layers, and creating a user vs. running everything as root.
In order to run vim, simply installing as root does not activate the pathogen bundle/vim-ipython. Creating a user allows pathogen bundles to install (i.e. NERDTree works), but :IPython throws an error:
:IPython failed
^-- failed '' not found .
I've tried the above with no layers (one large Dockerfile), and with different layers for the Python stack, vim, and the IPython notebook.
Dockerfile
What am I not seeing here?
What does the ^-- failed '' not found error refer to?
I've tried running the IPython notebook with --no-browser & and then running vim, or running two shells in the same container... but I can't get past this error.
Here is a working Dockerfile for anyone trying to get vim-ipython working in Docker.
issues:
a user/shared home is needed for vim, despite runtimepath pointing to pathogen/bundle in .vimrc
%connect_info is required with containers
I am running as root; not sure why vim required a USER to install packages, but changing to USER would throw errors with CMD
--best

How to build the full df command from coreutils in OpenWrt?

I need the full df command in OpenWrt, and I know it is in coreutils. When I run make in OpenWrt to build coreutils, it seems to build everything except df. How could I modify the Makefile to build df? Many thanks!
Generally speaking, it goes like this:
Check out the feeds repository to your local machine; see this page for URLs: http://wiki.openwrt.org/doc/devel/feeds
Modify whichever packages you want.
Edit feeds.conf or feeds.conf.default to source the feeds from the local copy.
Use ./scripts/feeds update|install to update and then install such a package.
Run make menuconfig to select it for building; a rough command sketch follows below.
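As a rough sketch (the package and option names are illustrative, not verified against a particular OpenWrt release), the command sequence might look like:

# fetch the feed definitions and make the package visible to the build system
./scripts/feeds update -a
./scripts/feeds install coreutils
# select the df applet (e.g. coreutils-df) in its menu, then build the package
make menuconfig
make package/coreutils/compile V=s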
