I use mlflow in a docker environment as described in this example, and I start my runs with mlflow run ..
I get output like this:
2019/07/17 16:08:16 INFO mlflow.projects: === Building docker image mlflow-myproject-ab8e0e4 ===
2019/07/17 16:08:18 INFO mlflow.projects: === Created directory /var/folders/93/xt2vz36s7jd1fh9bkhkk9sgc0000gn/T/tmp1lxyqqw9 for downloading remote URIs passed to arguments of type 'path' ===
2019/07/17 16:08:18 INFO mlflow.projects: === Running command 'docker run
--rm -v /Users/foo/bar/mlruns:/mlflow/tmp/mlruns -e
MLFLOW_RUN_ID=ef21de61d8a6436b97b643e5cee64ae1 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 mlflow-myproject-ab8e0e4 python train.py' in run with ID 'ef21de61d8a6436b97b643e5cee64ae1' ===
I would like to mount a docker volume named my_docker_volume to the container at the path /data. So instead of the docker run shown above, I would like to use
docker run --rm --mount source=my_docker_volume,target=/data -v /Users/foo/bar/mlruns:/mlflow/tmp/mlruns -e MLFLOW_RUN_ID=ef21de61d8a6436b97b643e5cee64ae1 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 mlflow-myproject-ab8e0e4 python train.py
I see that I could in principle run it once without the mounted volume, then copy the docker run ... command and add --mount source=my_docker_volume,target=/data, but I'd rather use something like
mlflow run --mount source=my_docker_volume,target=/data .
but this obviously doesn't work, because --mount is not a parameter of mlflow run.
What's the recommended way of mounting a docker volume then?
A similar issue has been brought up on the mlflow issue tracker, see "Access large data from within a Docker environment". An excerpt from it says:
However, MLFlow Docker environments currently only have access to data baked into the repository or image or must download a large dataset for each run.
...
A potential solution is to enable the user to mount a volume (e.g. local directory containing the data) into the Docker container.
Looks like this is a feature others would benefit from too. The best course of action here would be to contribute support for mounts, or to keep track of the issue until someone else implements it.
Why do you need to mount the /data folder in the first place? There's another issue, with a PR containing a fix related to storing artifacts in a custom location on the host machine; could that be what you're looking for?
Finally, to avoid the above problem and facilitate volume mounting, I now run my experiments using three interacting docker containers: one that runs the machine learning code, one that runs an mlflow server and one that runs a postgresql server. I closely followed this walk-through article to set things up. It works nicely, and docker-compose makes volume mounting easy. Metrics, parameters and metadata are stored in a database that is mounted to a local persistent volume. Artifacts are logged in the directory /mlflow or, if you prefer, in a docker volume.
Note: There's a typo in the cited walk-through article
In docker-compose.yml it shouldn't be
volumes:
- ./postgres-store:/var/lib/postgresql/data
which would bind-mount a local folder named postgres-store rather than a named volume.
Instead, to mount the docker volume postgres-store, you should use
volumes:
- postgres-store:/var/lib/postgresql/data
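Note that for postgres-store to be resolved as a named volume, docker-compose also expects it to be declared under a top-level volumes: key. A minimal sketch of the relevant parts of docker-compose.yml (the service name db and the plain postgres image are placeholders, not taken from the article):
version: "3"
services:
  db:
    image: postgres
    volumes:
      - postgres-store:/var/lib/postgresql/data
volumes:
  postgres-store: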
Related
I'm working on a project using Node-RED deployed with docker, and I would like to save the state of my deployment, including flows, settings and newly added modules, so that I can save the image and load it on another host, replicating exactly the same Node-RED instance.
I created the container using:
docker run -itd --name my-nodered node-red
After implementing the flows and installing some custom modules, with the container running I used this command:
docker commit my-nodered my-project-nodered/my-nodered:version1
docker save my-project-nodered/my-nodered:version1 > tar-archive.tar.gz
And on another machine I imported the image using:
docker load < tar-archive.tar.gz
And run it using:
docker run -itd my-project-nodered/my-nodered:version1
And I obtain a vanilla Node-RED docker container with a default /data directory containing only the default files.
What am I missing? Could it be that my /data directory is overwritten, as well as my settings.js file in the home directory? And in that case, what is the best practice to achieve my goal?
Thanks a lot in advance
commit will not work here, as you can see there is a volume defined in the Dockerfile.
# User configuration directory volume
VOLUME ["/data"]
That makes it impossible to create a derived image with any different content in that directory tree. (This is the same reason you can't create a mysql or postgresql image with prepopulated data.)
docker commit doesn't consider volumes at all, so you'll get an unchanged image with nothing preloaded in it.
You can see the official documentation:
Managing User Data
Once you have Node-RED running with Docker, we need to ensure any added nodes or flows are not lost if the container is destroyed. This user data can be persisted by mounting a data directory to a volume outside the container. This can either be done using a bind mount or a named data volume.
Node-RED uses the /data directory inside the container to store user configuration data.
nodered-user-data-in-docker
One way is to restore your config files on another machine, for example from a backup-config folder, then
docker run -it -p 1880:1880 -v $PWD/backup-config/:/data --name mynodered nodered/node-red-docker
or, if you want to pull the flows from some repo, you can try
docker run -it --rm -v "$PWD/$(wget https://raw.githubusercontent.com/openenergymonitor/oem_node-red/master/flows_emonpi.json)":/data/ nodered/node-red-docker
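For completeness, a sketch of how to get the current contents of /data off the original machine in the first place (my-nodered is the container name from the question; backup-config is just an arbitrary folder name):
docker cp my-nodered:/data ./backup-config
Transfer backup-config to the other machine and start Node-RED there with the -v $PWD/backup-config/:/data bind mount shown above.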
I am using Docker to deploy my ASP.NET Core Web API microservices, and am looking at the options for injecting configuration into each container. The standard way of using an appsettings.json file in the application root directory is not ideal, because as far as I can see, that means building the file into my docker images, which would then limit which environment the image could run in.
I want to build an image once which can then be provided configuration at runtime and rolled through dev, test, UAT and into production without creating an image for each environment.
Options seem to be:
Providing config via environment variables. Seems a bit tedious.
Somehow mapping a path in the container to a standard location on the host server where appsettings.json sits, and getting the service to pick this up (how?)
Maybe it's possible to provide values on the docker run command line?
Does anyone have experience with this? Could you provide code samples/directions, particularly on option 2) which seems the best at the moment?
It's possible to create data volumes in the docker image/container, and also to mount a host directory into a container. The host directory will then be accessible inside the container.
Adding a data volume
You can add a data volume to a container using the -v flag with the docker create and docker run command.
$ docker run -d -P --name web -v /webapp training/webapp python app.py
This will create a new volume inside a container at /webapp.
Mount a host directory as a data volume
In addition to creating a volume using the -v flag you can also mount a directory from your Docker engine’s host into a container.
$ docker run -d -P --name web -v /src/webapp:/webapp training/webapp python app.py
This command mounts the host directory, /src/webapp, into the container at /webapp.
Refer to the Docker Data Volumes
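Applied to option 2 from the question, a sketch could look like this (the host path /etc/myapp, the container path /app and the image name my-api-image are assumptions; use wherever your host keeps the config and wherever the app is published inside the image):
docker run -d -p 5000:80 --name my-api \
-v /etc/myapp/appsettings.Production.json:/app/appsettings.json \
my-api-image
The image is built once, and each environment bind-mounts its own settings file over the one baked into the image.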
We are using another packaging system for now (not docker itself), but we still have the same issue - the package can be deployed in any environment.
So, the way we are doing it now:
Use an external configuration management system to hold and manage configuration per environment
Inject into our package the basic environment variables that hold the configuration management system connection details
This way we not only allow the package to run in almost any "known" environment, but we also get run-time configuration management.
When you are running docker, you can use the environment variable options of the run command:
$ docker run -e "deep=purple" ...
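For ASP.NET Core in particular, environment variables map onto configuration keys using a double underscore as the section separator, so nested appsettings values can be overridden at run time. For example (the key names and image name are only illustrative):
docker run -d --name my-api \
-e "Logging__LogLevel__Default=Warning" \
-e "ConnectionStrings__Default=Server=db;Database=app" \
my-api-image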
I have a Docker container which is running some code and creating some HTML reports. I want these reports to be published into a specific directory on the host machine, i.e. at /usr/share/nginx/reports
The way I have gone about doing this is to mount this host directory as a data volume, i.e. docker run -v /usr/share/nginx/reports --name my-container com.containers/my-container
However, when I ssh into the host machine, and check the contents of the directory /usr/share/nginx/reports, I don't see any of the report data there.
Am I doing something wrong?
The host machine is an Ubuntu server, and the Docker container is also Ubuntu, no boot2docker weirdness going on here.
From "Managing data in containers", mounting a host folder to a container would be:
docker run -v /Users/<path>:/<container path>
(see "Use volume")
Using only -v /usr/share/nginx/reports would declare the internal container path /usr/share/nginx/reports as a volume, but would have nothing to do with the host folder.
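So a corrected command for this question would pair the host directory with the path the container actually writes its reports to (the container-side path /reports below is an assumption; use whatever path your report generator writes to):
docker run -v /usr/share/nginx/reports:/reports --name my-container com.containers/my-container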
A bind mount like this is just one of the mount types Docker supports; the others are named volumes and tmpfs mounts.
The answer to this question is problematic because it varies depending on your operating system and your full requirements. The answer by VonC makes some assumptions that should be addressed and is therefore only correct in some contexts. Other answers on this topic generally ignore the fact that some people are running linux, others windows, and still others are on OSX or other weird OS's.
As VonC mentioned in his answer, in a lot of cases it is possible to bind-mount a host directory straight into the container, using a -v host-path:container-path argument to the docker command (you can also use --volume for added readability or --mount for rocket-science).
One of the biggest problems (in 2020) is the use of the Windows Subsystem for Linux (WSL), where bind-mounting a host volume is fraught with error and may or may not work as expected depending on whether the path mounted is in the linux filesystem or the windows filesystem. VonC's answer was written before WSL became a big problem, but it still makes assumptions about the local filesystem being real rather than mounted into a virtual-machine of some kind.
I have found that a lot of engineers prefer to bypass this unnecessary confusion through the use of docker volumes. A docker volume can be created with the command:
docker volume create <name>
Listed with
docker volume ls
and removed with
docker volume rm <name>
You can mount this by specifying the name of the volume on the left-hand-side of the --volume argument. If your volume was called, for example, 'logs', you could use something like --volume logs:/usr/share/nginx/reports to bind it to the log dir you're interested in. You can view the contents of the directory with something like this:
docker run -it --rm --volume logs:/logs alpine ls -AlF /logs/
This should list the files in that directory. If you have a file called 'nginx.log' for example, you could view it like this:
docker run -it --rm --volume logs:/logs alpine less /logs/nginx.log
And the contents would be paged to your terminal.
You can bind this volume to multiple containers simultaneously if needed. This is useful if, for example, you're writing to your logs with one container, and paging them to a console with another.
If you want to copy the example log file from above into a tmp directory on your local filesystem you can achieve that with:
docker run -it --rm --volume logs:/logs --volume /tmp:/local_tmp alpine cp /logs/nginx.log /local_tmp/
I am using Docker Toolbox on Windows. I am working on a Spring Boot application using Docker. My application writes logs to
users/path/service.log
So when I started my application from the host terminal, the log file was successfully updated.
But when I did the same in Docker, no file was created or updated.
So I changed my log file location to match the container's directories:
var/log/service.log
I started my container again and my file was updated again.
You can choose any location as long as it matches a directory inside the container. Just bash into the container and see what suits you.
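For example, to get a shell inside the running container and look around (the container name is whatever you passed to --name on docker run):
docker exec -it my-spring-boot-container bash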
Next step is to copy log files from container to host.
So in order to copy those logs to your host, you can use one of the two ways I know of:
1- use volumes in docker (see the sketch after the copy command below)
2- use the following Docker command to copy a file from the docker container to the host:
docker cp <containerId>:/file/path/within/container /host/path/target
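For option 1, a sketch with a bind mount (the host path is an assumption; note that Docker Toolbox on Windows only shares paths under C:\Users into the VM by default) would be:
docker run -v /c/Users/me/logs:/var/log --name my-spring-boot-container my-spring-boot-image
The service then writes its service.log inside the container and the file shows up in the mounted host folder as it is written.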
First, you need to create a directory where you want to share the data
mkdir -p /abc/def/
Now, you need to create a docker volume using the command below. As we see here, we are specifying the device as '/abc/def/':
docker volume create --driver local \
--opt type=none \
--opt device=/abc/def/ \
--opt o=bind \
spark-volume
Now, start your container with the command below:
docker run -d \
--mount type=volume,source=spark-volume,dst=/abc/def/ \
--network host \
img:tag
Now the docker container will use /abc/def/ on the local filesystem as its storage, and all contents of /abc/def/ in the docker container will be available on the local filesystem.
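You can double-check how the volume is wired up with:
docker volume inspect spark-volume
The output includes the driver options (type, device, o) and the mountpoint.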
In your application, if you set a working directory for your php code (the report path), the path must be the one inside the container. Docker will then copy it automatically to your host directory. It wasn't a docker misconfiguration, but my application that was writing to the wrong place. Weird at first, but it did work in my case.
So, I'm trying to package my WordPress image in a way that all files except the uploads are persisted. In order to do so, I have created my Dockerfile which uses the official WordPress image as its base, and adds the files from an archive (containing all the WordPress files, themes, plugins, etc.), like so:
FROM wordpress
ADD archive.tar.gz /var/www/html/
Since I want the uploads to be persisted, I have created a separate data volume container, e.g. test2.com-wp-data:
docker create -v /var/www/html/wp-content/uploads --name test2.com-wp-data wordpress
Then I simply mount it via the --volumes-from flag:
docker run --name test2.com --volumes-from test2.com-wp-data -d --link test2.com-mysql:mysql myimage
However, when I inspect my newly created container, I cannot find /var/www/html/wp-content/uploads:
# docker inspect -f '{{.HostConfig.VolumesFrom}}' test2.com
[test2.com-wp-data]
# docker inspect -f '{{.Volumes}}' test2.com
map[/var/www/html:/var/lib/docker/vfs/dir/4fff1d36d5aacd0b2c73977acf8fe680bda6fd891f2c4410a90f6c2dca4aaedf]
I can see that both /var/www/html and /var/www/html/wp-content/uploads are set up as volumes in my test2.com-wp-data data container:
# docker inspect -f '{{.Config.Volumes}}' test2.com-wp-data
map[/var/www/html:map[] /var/www/html/wp-content/uploads:map[]]
I know that the wordpress image by default creates a /var/www/html volume, for which I don't really mind, but does that mean that anything that is below that folder is ignored if mounted separately? Will I need to build my own WordPress image in order to have /var/www/html/wp-content/uploads set as a volume in my WordPress container?
Thank you very much for your time!
EDIT: I've tested a different setup with a folder that has nothing to do with /var/www/html, and the result is the same: --volumes-from is ignored.
Version 1.4+ of docker should be what you need to get this working. Older versions of docker don't seem to play nicely with data-only containers instantiated with "create" rather than "run".
Well, after some further testing I've realised that despite what the documentation indicates, docker create alone is not enough to get a working data volume. I've only managed to get working data volumes by instantiating them with the docker run command, as follows:
docker run --name data -v /var/www/html/wp-content/uploads mysql true
This way the container exits immediately, but if I use it to attach the data volume to another container it works as expected.
If anyone knows any specific reason behind this behaviour, I'd be glad to learn more, especially since the documentation seems to be misleading.
Thanks!
EDIT: It turns out I was using Docker 1.3.x, which hadn't implemented this feature yet, hence why the documentation was misleading for me!
I'm slowly working my way through understanding current Docker practices. I'm on a Mac, and I'm using boot2docker.
I've been able to use the docker run -v local/directory:container/directory method to link a container directory to my local file system. Great, now I can easily edit things like site code in my local Mac file system and have the changes immediately available to my container (e.g. /var/www/html).
I'm now trying to separate my containers into discrete concerns. For example, a Web, Database, and File (e.g. busybox) container would be useful for a Wordpress site. Thing is, I don't know how to make my file container define volumes that I can then link to my local OS (similar to the -v local/directory:container/directory used by boot2docker).
This is probably not the most eloquent question, as I'm still fumbling through learning Docker, but if you can understand what I'm trying to achieve, I'd really appreciate any guidance provided.
Thanks!
Docker Volumes User Guide
I will use two docker containers for my simple example
marginalized_liskov and plagiarized_engelbart
Mount a Host Directory as a Data Volume (at runtime)
docker run -d -P --name marginalized_liskov -v /host/directory/context:/container/directory/context poop python server.py
marginalized_liskov is the name of the container.
poop is not only my favorite palindrome, but also the name of the image we're running.
"/host/directory/context" is the location on the host that you want to mount
"/container/directory/context" is the location you want your new volume to be created in your container
python is of course the application to run
server.py is the argument provided to "python" for this sample.
Create a Named Volume in a container and mount another container to that volume
docker create -v /poop --name marginalized_liskov training/postgres
docker run -d --volumes-from marginalized_liskov --name plagiarized_engelbart ubuntu
This creates two containers.
marginalized_liskov gets a volume created at /poop. I built it from the training/postgres image because that's what was used in the User Guide. Since we're just setting up a container to hold a data volume and not host applications, using the training/postgres image provides our functionality while remaining lean.
plagiarized_engelbart mounts the volumes from marginalized_liskov with the --volumes-from flag.
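Applied to the question above (a sketch; the image names and the Mac-side path are assumptions, and note that boot2docker only shares /Users from the Mac into the VM by default), a busybox file container could bind-mount your local site code and the web container could then attach to it:
docker create -v /Users/me/site:/var/www/html --name wordpress-files busybox
docker run -d --volumes-from wordpress-files --name web my-web-image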