Not able to dockerize MLflow - Docker

While dockerizing MLflow, only .trash is getting created. Because of that, the MLflow UI shows the error "no experiments exists".
Dockerfile
FROM python:3.7.0
RUN pip install mlflow==1.0.0
WORKDIR /data
EXPOSE 5000
CMD mlflow server \
    --backend-store-uri /data/ \
    --default-artifact-root /data/ \
    --host 0.0.0.0
docker-compose:
mlflow:
  # builds track_ml Dockerfile
  build:
    context: ./mlflow_dockerfile
  expose:
    - "5000"
  ports:
    - "5000:5000"
  volumes:
    - ./data:/data

You can use this Dockerfile, taken from mlflow-workshop, which is more generic and supports different ENV variables for debugging and working with different versions.
By default it will store the artifacts and files inside /opt/mlflow. It's possible to define the following variables:
MLFLOW_HOME (/opt/mlflow)
MLFLOW_VERSION (0.7.0)
SERVER_PORT (5000)
SERVER_HOST (0.0.0.0)
FILE_STORE (${MLFLOW_HOME}/fileStore)
ARTIFACT_STORE (${MLFLOW_HOME}/artifactStore)
Dockerfile
FROM python:3.7.0
LABEL maintainer="Albert Franzi"
ENV MLFLOW_HOME /opt/mlflow
ENV MLFLOW_VERSION 0.7.0
ENV SERVER_PORT 5000
ENV SERVER_HOST 0.0.0.0
ENV FILE_STORE ${MLFLOW_HOME}/fileStore
ENV ARTIFACT_STORE ${MLFLOW_HOME}/artifactStore
RUN pip install mlflow==${MLFLOW_VERSION} && \
    mkdir -p ${MLFLOW_HOME}/scripts && \
    mkdir -p ${FILE_STORE} && \
    mkdir -p ${ARTIFACT_STORE}
COPY scripts/run.sh ${MLFLOW_HOME}/scripts/run.sh
RUN chmod +x ${MLFLOW_HOME}/scripts/run.sh
EXPOSE ${SERVER_PORT}/tcp
VOLUME ["${MLFLOW_HOME}/scripts/", "${FILE_STORE}", "${ARTIFACT_STORE}"]
WORKDIR ${MLFLOW_HOME}
ENTRYPOINT ["./scripts/run.sh"]
scripts/run.sh
#!/bin/sh
mlflow server \
    --file-store $FILE_STORE \
    --default-artifact-root $ARTIFACT_STORE \
    --host $SERVER_HOST \
    --port $SERVER_PORT
Launch MLFlow Tracking Docker
docker build -t my_mflow_image .
docker run -d -p 5000:5000 --name mlflow-tracking my_mflow_image
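To verify that the server actually came up, you can tail the container logs (the container name comes from the run command above):
docker logs -f mlflow-tracking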
Run trainings
Since we have our MLflow Tracking container exposed on port 5000, we can log executions by setting the environment variable MLFLOW_TRACKING_URI.
MLFLOW_TRACKING_URI=http://localhost:5000 python example.py
Also, it is better to remove - ./data:/data on the first run and debug without the mount; with the suggested Dockerfile you may need to mount a different path, one of those defined in the ENV variables above, depending on your needs.
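For example, a sketch (host paths are illustrative; the container paths match the ENV defaults above) that persists both stores on the host:
docker run -d -p 5000:5000 \
    -v $(pwd)/fileStore:/opt/mlflow/fileStore \
    -v $(pwd)/artifactStore:/opt/mlflow/artifactStore \
    --name mlflow-tracking my_mflow_image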

Here is a link to GitHub where I put MLflow in a Docker setup that uses Azurite in the background, so the models can also be pulled from it later.
As a short note, you need to tell your script, however you execute it, the address where it should save the artifacts. You can do this with .env files or set these things manually.
set MLFLOW_TRACKING_URI=http://localhost:5000
It is important to pass this information not only to your Docker container but also to the script that does the model training ;)
Here you can also find a complete tutorial on how to use MLflow and SKlearn together in different theoretical scenarios, since it is also a bit tricky later on.
I hope this gives you enough inspiration for how to use it.

Related

How to check in Docker whether a volume exists and is not empty, and run a different docker-compose.yml depending on this?

I'm deploying an application with a Dockerfile and docker-compose. It loads a model from an AWS bucket to run the application. When the containers get restarted (not intentionally, but because of the cloud provider), it loads the model from AWS again. What I would like to achieve is storing the model on a persistent volume. In case of a restart, I would like to check whether the volume exists and is not empty, and if so run a different docker-compose file which has a different bash command that does not load the model from AWS again.
This is part of my docker-compose.yml:
rasa-server:
  image: rasa-bot:latest
  working_dir: /app
  build:
    context: ./
    dockerfile: Dockerfile
  volumes:
    - ./models:/app/models
  command: bash -c "rasa run --model model.tar.gz --remote-storage aws --endpoints endpoints.yml --credentials credentials.yml --enable-api --cors \"*\" --debug --port 5006"
In case of a restart the command would look like this:
command: bash -c "rasa run --model model.tar.gz --endpoints endpoints.yml --credentials credentials.yml --enable-api --cors \"*\" --debug --port 5006"
Note that this
--remote-storage aws
was removed.
This is my Dockerfile:
FROM python:3.7.7-stretch AS BASE
RUN apt-get update \
    && apt-get --assume-yes --no-install-recommends install \
        build-essential \
        curl \
        git \
        jq \
        libgomp1 \
        vim
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip
RUN pip install rasa==3.1
RUN pip3 install boto3
ADD . .
I know that I can use this:
docker volume ls
to list volumes. But I do not know how to wrap this in an if condition to check whether
- ./models:/app/models
exists and is not empty, and if it is not empty run a second docker-compose.yml which contains the second, modified bash command.
I would accomplish this by making the main container command actually be a script that looks to see if the file exists and optionally fills in the command line argument.
#!/bin/sh
# MODEL_MISSING is "yes" only when the model file is not already present
MODEL_MISSING=$(test -f /app/models/model.tar.gz || echo yes)
exec rasa run \
    --model model.tar.gz \
    ${MODEL_MISSING:+--remote-storage aws} \
    --endpoints endpoints.yml \
    ...
The MODEL_MISSING line uses the test(1) shell command to see if the file already exists, and sets the variable to yes only when it does not. Then in the command there is a shell parameter expansion: if the variable MODEL_MISSING is set and non-empty, :+ expands and splits the text --remote-storage aws; if it is unset or empty, nothing at all is added. (This approach was inspired by BashFAQ/050.)
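If the parameter-expansion forms are unfamiliar, a quick check in any POSIX shell (FLAG is just an illustrative variable, no Docker involved) shows the difference between :- and :+, and why :+ is the one wanted here:
FLAG=""
echo "empty: [${FLAG:---remote-storage aws}] [${FLAG:+--remote-storage aws}]"
# prints: empty: [--remote-storage aws] []
FLAG=yes
echo "set:   [${FLAG:---remote-storage aws}] [${FLAG:+--remote-storage aws}]"
# prints: set:   [yes] [--remote-storage aws]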
In your Dockerfile, COPY this script into your image and make it be the default CMD. It needs to be executable like any other script (run chmod +x on your host and commit that change to source control); since it is executable and begins with a "shebang" line, you do not need to explicitly name the shell when you run it.
...
COPY rasa-run ./
CMD ["./rasa-run"]
In your Compose file, you do not need to override the command:, change the working_dir: from what the Dockerfile sets, or change from a couple of Compose-provided defaults. You should be able to reduce this to
version: '3.8'
services:
  rasa-server:
    build: .
    volumes:
      - ./models:/app/models
More generally for this class of question, I might suggest:
Prefer setting a default CMD in the Dockerfile over a command: override in Compose; and
Write out non-trivial logic in a script and run that script as the main container command; don't write complicated conditionals in an inline CMD or command:.
You could have an if statement in your bash command to use AWS or not, depending on the result you get from docker volume ls.
Using -f name= you can filter on the volume name, then check whether the output is empty and run a different command. (Note that this assumes the docker CLI, and access to the Docker socket, are available inside the container.)
Note that this command is just an example and I have no idea if it works or not, as I don't use bash every day.
# $$ keeps Compose from interpolating the $ itself; the shell still sees a single $
command: bash -c '
  VOLUME=$$(docker volume ls -q -f name=FOO);
  if [ -z "$$VOLUME" ]; then
    rasa run --model model.tar.gz --remote-storage aws --endpoints endpoints.yml --credentials credentials.yml --enable-api --cors "*" --debug --port 5006;
  else
    rasa run --model model.tar.gz --endpoints endpoints.yml --credentials credentials.yml --enable-api --cors "*" --debug --port 5006;
  fi'

Exposing Docker Volumes to Nginx

I'm trying to connect a JSON file, which resides in a Docker volume of the following container, to my main Docker container, which is running a Django project.
Since I am using CapRover, my Docker Compose options are very limited, so Docker Compose is not really an option. I want to instead just expose the JSON file over the web with a link, something like domain.com/folder/jsonfile.json.
Can somebody tell me if this is possible inside this Dockerfile? The image I am using is crucial to the container, so can I just add an nginx image, or do I need other changes to make this work? Or is nginx not even necessary?
FROM ubuntu:devel
ENV TZ=Etc/UTC
ARG APP_HOME=/app
WORKDIR ${APP_HOME}
ENV DEBIAN_FRONTEND=noninteractive
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime
RUN echo $TZ > /etc/timezone
RUN apt-get update && apt-get upgrade -y
RUN apt-get install gnumeric -y
RUN mkdir -p /etc/importer/data
RUN mkdir /voldata
COPY config.toml /etc/importer/
COPY datasets/* /etc/importer/data/
VOLUME /voldata
COPY importer /usr/bin/
RUN chmod +x /usr/bin/importer
COPY . ${APP_HOME}
CMD sleep 999d
Using the same volume in 2 containers
docker-compose:
volumes:
  shared_vol:

services:
  service1:
    volumes:
      - 'shared_vol:/path/to/file'
  service2:
    volumes:
      - 'shared_vol:/path/to/file'
the mechanism above replaces the volumes_from since v3, but this works for v2 as well:
volumes:
  shared_vol:

services:
  service1:
    volumes:
      - 'shared_vol:/path/to/file'
  service2:
    volumes_from:
      - service1
If you want to avoid unintentional changes, add :ro (read-only) to the target service:
service1:
  volumes:
    - 'shared_vol:/path/to/file'
service2:
  volumes:
    - 'shared_vol:/path/to/file:ro'
http-server
Surely you can provide the file via HTTP (or another protocol). There are two opportunities:
Include an HTTP service in your container (quite easy, depending on what is already available in the container). For example, with Node.js you can use https://www.npmjs.com/package/http-server very easily. If size doesn't matter, just install:
RUN apt-get install -y nodejs npm
RUN npm install -g http-server
EXPOSE 8080
CMD ["http-server", "--cors", "-p8080", "/path/to/your/json"]
docker-compose (it runs on port 8080 by default, so publish that):
existing_service:
  ports:
    - '8080:8080'
Run a stand-alone HTTP server (nginx, Apache httpd, ...) in another container, but then you again depend on using the same volume for two services, so for local solutions this is rather overkill.
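A minimal sketch of that stand-alone variant (volume and container names here are illustrative): the official nginx image serves /usr/share/nginx/html by default, so mounting the shared volume there read-only is enough:
docker run -d --name json-server -p 8080:80 \
    -v shared_vol:/usr/share/nginx/html:ro \
    nginx:alpine
# the file would then be reachable at http://localhost:8080/jsonfile.json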
Base image
If you don't have good reasons, I would never use something like :devel, :rolling or :latest as a base image. Stick to an LTS version instead, like ubuntu:22.04.
Testing for http-server
Dockerfile
FROM ubuntu:20.04
ENV TZ=Etc/UTC
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update
RUN apt-get install -y nodejs npm
RUN npm install -g http-server@13.1.0 # Issue with JSON files in v14: https://github.com/http-party/http-server/issues/634
COPY ./test.json ./usr/wwwhttp/test.json
EXPOSE 8080
CMD ["http-server", "--cors", "-p8080", "/usr/wwwhttp/"]
# docker build -t test/httpserver:latest .
# docker run -p 8080:8080 test/httpserver:latest
Disclaimer:
I am not that familiar with Node Docker images; this is just meant to give a quick working solution to go on from there. I'm not using Node.js in production, but I'm sure the image can be optimized from being fat to.. well.. being rather fat. For quick prototyping, size doesn't matter.
If you want to just have two containers access the same file, just use a volume with --mount.
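A short sketch of the --mount form (the volume name, target path, and image names are placeholders):
# both containers attach the same named volume
docker run -d --mount type=volume,source=shared_vol,target=/voldata image-one
docker run -d --mount type=volume,source=shared_vol,target=/voldata,readonly image-two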

How to update source code without rebuilding image each time?

Is there a way to avoid rebuilding my Docker image each time I make a change in my source code?
I think I have already optimized my Dockerfile enough to decrease build time, but it's still two commands and some waiting time for what is sometimes just one added line of code. That is longer than a simple CTRL + S and checking the results.
The commands I have to do for each little update in my code:
docker-compose down
docker-compose build
docker-compose up
Here's my Dockerfile:
FROM python:3-slim as development
ENV PYTHONUNBUFFERED=1
COPY ./requirements.txt /requirements.txt
COPY ./scripts /scripts
EXPOSE 80
RUN apt-get update && \
    apt-get install -y \
        bash \
        build-essential \
        gcc \
        libffi-dev \
        musl-dev \
        openssl \
        wget \
        postgresql \
        postgresql-client \
        libglib2.0-0 \
        libnss3 \
        libgconf-2-4 \
        libfontconfig1 \
        libpq-dev && \
    pip install -r /requirements.txt && \
    mkdir -p /vol/web/static && \
    chmod -R 755 /vol && \
    chmod -R +x /scripts
COPY ./files /files
WORKDIR /files
ENV PATH="/scripts:/py/bin:$PATH"
CMD ["run.sh"]
Here's my docker-compose.yml file:
version: '3.9'

x-database-variables: &database-variables
  POSTGRES_DB: ${POSTGRES_DB}
  POSTGRES_USER: ${POSTGRES_USER}
  POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  ALLOWED_HOSTS: ${ALLOWED_HOSTS}

x-app-variables: &app-variables
  <<: *database-variables
  POSTGRES_HOST: ${POSTGRES_HOST}
  SPOTIPY_CLIENT_ID: ${SPOTIPY_CLIENT_ID}
  SPOTIPY_CLIENT_SECRET: ${SPOTIPY_CLIENT_SECRET}
  SECRET_KEY: ${SECRET_KEY}
  CLUSTER_HOST: ${CLUSTER_HOST}
  DEBUG: 0

services:
  website:
    build:
      context: .
    restart: always
    volumes:
      - static-data:/vol/web
    environment: *app-variables
    depends_on:
      - postgres
  postgres:
    image: postgres
    restart: always
    environment: *database-variables
    volumes:
      - db-data:/var/lib/postgresql/data
  proxy:
    build:
      context: ./proxy
    restart: always
    depends_on:
      - website
    ports:
      - 80:80
      - 443:443
    volumes:
      - static-data:/vol/static
      - ./files/templates:/var/www/html
      - ./proxy/default.conf:/etc/nginx/conf.d/default.conf
      - ./etc/letsencrypt:/etc/letsencrypt

volumes:
  static-data:
  db-data:
Mount your script files directly in the container via docker-compose.yml:
volumes:
  - ./scripts:/scripts
  - ./files:/files
Keep in mind that if you use a WORKDIR in your Dockerfile, you have to prefix the container paths accordingly.
Quick answer
Is there a way to avoid rebuilding my Docker image each time I make a change in my source code?
If your app needs a build step, you cannot skip it.
In your case, you can set up the requirements before the Python app, so on each source code modification you only need to re-run your Python app, not the entire stack: postgres, proxy, etc.
Docker purpose
The main Docker goal or feature is to enable developers to package applications into containers which are easy to deploy anywhere, simplifying your infrastructure.
So, in this sense, Docker is not strictly for the development stage. At the development stage, the programmer should use a specialized IDE (Eclipse, IntelliJ, Visual Studio, etc.) to create and update the source code. Also, some languages like Java and C#, and frameworks like React/Angular, need a build stage.
These IDEs have features like hot reload (automatic application updates when the source code changes), variable and method auto-completion, etc. These features help reduce development time.
Docker for source code changes by developer
It is not the main goal, but if you don't have a specialized IDE, or you are in a very limited developer workspace (no admin permission, network restrictions, Windows, ports, etc.), Docker can rescue you.
If you are a Java developer (for instance), you need to install Java on your machine and some IDE like Eclipse, configure Maven, etc. With Docker, you could create an image with all the required technologies and then establish a kind of connection between your source code and the Docker container. This connection in Docker is called volumes:
docker run --name my_job -p 9000:8080 \
    -v /my/python/microservice:/src \
    python-workspace-all-in-one
In the previous example, you could code directly in /my/python/microservice and you would only need to enter my_job and run python /src/main.py. It will work without Python or any other requirement on your host machine; everything is inside python-workspace-all-in-one.
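For instance, a sketch using the names from the command above: once the container is running, you can enter it and start the app with docker exec:
docker exec -it my_job python /src/main.py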
For technologies that need a build process, like Java and C#, there is a time penalty because the developer has to perform a build on every source code change. This is not required when using a specialized IDE, as explained above.
For technologies that do not require a build process, like PHP, where only the libraries/dependencies need installing, Docker works almost the same as the specialized IDE.
Docker for local development with hot-reload
In your case, your app is based on Python. Python doesn't require a build process, just the installation of libraries, so if you want to develop with Python using Docker instead of the classic way (install Python, execute python app.py, etc.), you should follow these steps:
Don't copy your source code to the container
Just pass the requirements.txt to the container
Execute the pip install inside of container
Run you app inside of container
Create a docker volume : your source code -> internal folder on container
Here is an example of a Python framework (mkdocs) with hot-reload:
FROM python:3
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app
RUN pip install -r requirements.txt
CMD [ "mkdocs", "serve", "--dev-addr=0.0.0.0:8000" ]
and how to build it as a dev version:
docker build -t myapp-dev .
and how to run it with a volume to sync your developer changes into the container:
docker run --name myapp-dev -it --rm -p 8000:8000 -v $(pwd):/usr/src/app myapp-dev
As a summary, this would be the flow to run your apps with Docker at the development stage:
start the requirements before the app (database, APIs, etc.)
create a special Dockerfile for the development stage
build the Docker image for development purposes
run the app, syncing the source code with the container (-v)
the developer modifies the source code
if possible, use some kind of hot-reload library for Python
the app is ready to be opened from a browser
Docker for local development without hot-reload
If you cannot use a hot-reload library, you will need to build and run whenever you want to test your source code modifications. In this case, you should copy the source code into the container instead of synchronizing it with volumes as in the previous approach:
FROM python:3
RUN mkdir -p /usr/src/app
COPY . /usr/src/app
WORKDIR /usr/src/app
RUN pip install -r requirements.txt
RUN mkdocs build
WORKDIR /usr/src/app/site
CMD ["python", "-m", "http.server", "8000" ]
Steps should be:
start the requirements before the app (database, APIs, etc.)
create a special Dockerfile for the development stage
the developer modifies the source code
build
docker build -t myapp-dev .
run
docker run --name myapp-dev -it --rm -p 8000:8000 myapp-dev

Docker compose mapping local directory to dockerfile volume

I'm using an Apache / MySQL docker-compose setup, which is all good. However, the issue comes when, as this is for local development, the web container points to a local folder, for which I need Apache to have permissions.
Using
RUN mkdir /www \
    && chown -R apache:apache /www
VOLUME ["/www"]
is fine if I run the Apache Dockerfile by itself, or if I run it in docker-compose without specifying a volume. But this means that I can't point that volume at a local directory; in this scenario "www" exists inside the container but doesn't map to the host machine. If I specify a volume inside the docker-compose file, then it maps as expected but doesn't allow me to chown the folder / files (even if I exec into the container).
Below is a proof of concept, I'm running on Windows 10 / Docker Desktop Community Version 2.0.0.0-win81 (29211)
EDIT (commented exposing the port, built the dockerfile from docker-compose and changed the port to 80 from 81)
EDIT (I've updated the following files, see bottom, I'm leaving these for posterity)
docker-compose.yml
version: '3.2'
services:
  web:
    restart: always
    build:
      context: .
    ports:
      - 80:80
    volumes:
      - ./:/www
Dockerfile
FROM centos:centos6 as stage1
RUN yum -y update && yum clean all \
    && yum --setopt=tsflags=nodocs install -y yum-utils \
        httpd \
        php
FROM stage1 as stage2
RUN mkdir /www \
    && chown -R apache:apache /www
#VOLUME ["/www"]
#EXPOSE 80
ENTRYPOINT ["/usr/sbin/httpd", "-D", "FOREGROUND"]
UPDATED Proof of concept files
Docker-compose.yml
version: '3.2'
services:
  web:
    build:
      context: .
    ports:
      - 80:80
    volumes:
      - ./:/www
Dockerfile
FROM centos:centos6
RUN yum -y update && yum clean all \
    && yum --setopt=tsflags=nodocs install -y yum-utils \
        httpd \
        php
COPY ./entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
entrypoint.sh
#!/bin/bash
set -e #exit straight away if there's an issue
chown -R apache:apache /www
# Apache
/usr/sbin/httpd -D FOREGROUND
Docker for Windows uses a CIFS/Samba network file share to bind-mount host files into the Linux VM running Docker. That is always done as root:root, so all bind-mounted files/dirs will always show that owner when seen from inside the container. This is a known limitation of the way Docker shares these files between the two operating systems.
Workarounds:
In many cases, this isn't an issue. The host files are shared into the container world-readable, so local app development while running in the container is fine. For cache files, user uploads, etc., just be sure they are written into a container path that isn't the host bind-mount, so they stay in Linux where you can control the perms.
If needed, for development only, run the app in the container as root if it needs write permissions to host OS files. You can override this at runtime: e.g. docker run -u root or user:root in docker-compose.yml
For working with database files, don't bind-mount them, but use named volumes to keep the files in the Linux VM. You can always use docker cp to copy files in and out of volumes for a quick backup.
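A sketch of such a quick backup with docker cp (container name and paths are illustrative):
# copy the data out of the volume-backed path inside the container
docker cp my-db-container:/var/lib/postgresql/data ./db-backup
# and copy the contents back in the other direction
docker cp ./db-backup/. my-db-container:/var/lib/postgresql/data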
You're using
RUN mkdir /www \
    && chown -R apache:apache /www
prior to docker-compose mapping the local . directory to /www.
You need to create a file entrypoint.sh or similar. Give it a shebang, and inside it run chown -R apache:apache /www. You do not need the mkdir, as that directory is created by the docker-compose volume config ./:/www.
After that command in your entrypoint.sh file you should add in what you currently have for your entrypoint /usr/sbin/httpd -D FOREGROUND.
Then finally you of course need to set your new entrypoint to use the entrypoint.sh file ENTRYPOINT ["/entrypoint.sh"]
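To confirm that the ownership change took effect, you can check from the host (the service name web comes from the compose file above):
docker-compose exec web ls -ld /www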

Docker fedora hbase JAVA_HOME issue

My dockerfile on fedora 22
FROM java:latest
ENV HBASE_VERSION=1.1.0.1
RUN groupadd -r hbase && useradd -m -r -g hbase hbase
USER hbase
ENV HOME=/home/hbase
# Download'n extract hbase
RUN cd /home/hbase && \
    wget -O - -q \
        http://apache.mesi.com.ar/hbase/${HBASE_VERSION}/hbase-${HBASE_VERSION}-bin.tar.gz \
    | tar --strip-components=1 -zxf -
# Upload local configuration
ADD ./conf/ /home/hbase/conf/
USER root
RUN chown -R hbase:hbase /home/hbase/conf
USER hbase
# Prepare data volumes
RUN mkdir /home/hbase/data
RUN mkdir /home/hbase/logs
VOLUME /home/hbase/data
VOLUME /home/hbase/logs
# zookeeper
EXPOSE 2181
# HBase Master API port
EXPOSE 60000
# HBase Master Web UI
EXPOSE 60010
# Regionserver API port
EXPOSE 60020
# HBase Regionserver web UI
EXPOSE 60030
WORKDIR /home/hbase
CMD /home/hbase/bin/hbase master start
As I understand it, when I set FROM java:latest my Dockerfile is layered on top of that one, so JAVA_HOME should be set as it is in java:latest. Am I right? The Dockerfile builds, but when I docker run the image, it shows a "JAVA_HOME not found" error. How can I set it up properly?
Use the ENV directive, something like ENV JAVA_HOME /abc/def. See the docs: https://docs.docker.com/reference/builder/#env
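If you are unsure what the base image already defines, one way to check before hard-coding a path (the image tag is taken from the question):
docker run --rm java:latest printenv JAVA_HOME
docker run --rm java:latest sh -c 'readlink -f "$(which java)"'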
Add to ~/.bashrc (or, for global effect, /etc/bashrc):
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
