How to serve a custom MLflow model with Docker?

We have a project that essentially follows this
docker example, the only difference being that we created a custom model (similar to this one) whose code lives in a directory called forecast. We succeeded in running the model with mlflow run. The problem arises when we try to serve the model. After doing
mlflow models build-docker -m "runs:/my-run-id/my-model" -n "my-image-name"
running the container with
docker run -p 5001:8080 "my-image-name"
fails with the following error:
ModuleNotFoundError: No module named 'forecast'
It seems that the Docker image is not aware of the source code defining our custom model class.
With a Conda environment the problem does not arise, thanks to the code_path argument in mlflow.pyfunc.log_model.
Our Dockerfile is very basic, containing just FROM continuumio/miniconda3:4.7.12 and RUN pip install {model_dependencies}.
How can we make the Docker image aware of the source code needed to deserialise and run the model?

You can specify source code dependencies by setting the
code_paths argument when logging the model. So in your case, you can do something like:
mlflow.pyfunc.log_model(..., code_paths=[<path to your forecast.py file>])
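For completeness, here is a rough sketch of what the full logging call could look like; the ForecastModel class and the forecast/ path are assumptions based on the question, not details from the original answer:

import mlflow
import mlflow.pyfunc

# Hypothetical wrapper class; the real implementation lives in the forecast/ package.
class ForecastModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Delegate to whatever logic the forecast package provides.
        return model_input

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="my-model",
        python_model=ForecastModel(),
        # Bundle the local source tree so that the image produced by
        # `mlflow models build-docker` can import `forecast` when loading the model.
        # Depending on your MLflow version this argument may be named code_path.
        code_paths=["forecast/"],
    )

With the source bundled this way, the container built from the logged model should no longer raise ModuleNotFoundError for forecast.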

Related

Import an already downloaded SpaCy language model to docker container without new download

I'd like to run multiple spacy language models on various docker containers. I don't want the docker image to contain the line RUN python -m spacy download en_core_web_lg, as other processes might have different language models.
My question is: is it possible to download multiple spacy language models onto local disk (e.g. en_core_web_lg, en_core_web_md, ...), and then load these models into the Python spacy environment when the docker container spawns?
This process might have the following steps:
Spawn docker container and bind a volume "language_models/" to the container which contains a number of spacy models.
Run some spacy command such as python -m spacy download --local ./language_models/en_core_web_lg that points at the language model you want the environment to have.
The hope is that, since the language model already exists on the shared volume, the download/import time is significantly reduced for each new container. Each container also would not have unnecessary language models on it, and the Docker image would not be specific to any language models at all.
There are two ways to do this.
The easier one is to mount a volume in Docker with the model directory and specify it as a path. spaCy lets you call spacy.load("some/path"), so no pip install is required.
If you really need to use pip to install something, you can also download the zipped models and pip install that file. However, by default that might involve making a copy of it, which reduces the benefit. If you unzip the model download and mount that, you can use pip install -e (an editable install, usually used for development). I wouldn't recommend this, but if you are using import en_core_web_sm or something and have difficulty refactoring, it might be what you want.
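As a minimal sketch of the first approach (the directory layout and model name here are assumptions, not taken from the original answer):

import spacy

# /language_models is assumed to be a bind-mounted volume containing a pipeline
# previously saved with nlp.to_disk(); adjust the path to match your mount.
nlp = spacy.load("/language_models/en_core_web_lg")

doc = nlp("Loading the pipeline from a mounted directory avoids re-downloading it.")
print([token.text for token in doc])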
Thanks for the comment @polm23! I had an additional layer of complexity since the SpaCy model was ultimately used to train a Rasa model. The solution I've opted for is to save models locally using:
nlp = spacy.load(model)
nlp.to_disk(f'language_models/{model}')
And then make the specific model directory visible to the docker container using a mounted volume. Then, in Rasa anyway, you can import the language model using a local path:
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "../../language_models/MODEL_NAME"
recipe: default.v1
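The volume mount itself is passed when starting the container, along these lines (the image name is hypothetical, and the container-side path just has to line up with the relative path used in the config above):
docker run -v "$(pwd)/language_models:/language_models" my-rasa-image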

Run !docker build from Managed Notebook cell in GCP Vertex AI Workbench

I am trying to push a Docker image to the Google Cloud Platform Container Registry in order to define a custom training job directly inside a notebook.
After preparing the correct Dockerfile and the URI to which to push the image containing my train.py script, I try to push the image directly from a notebook cell.
The exact command I try to execute is: !docker build ./ -t $IMAGE_URI, where IMAGE_URI is the environment variable previously defined. However I try to run this command, I get the error: /bin/bash: docker: command not found. I also tried executing it with the %%bash cell magic, importing the subprocess library, and executing the command stored in a .sh file.
Unfortunately, none of these approaches works; they all return the same command not found error with exit code 127.
If instead I run the command from a bash shell opened in JupyterLab, it works as expected.
Is there any workaround to make the push execute inside the Jupyter notebook? I was trying to keep the whole custom training process inside the same notebook.
If you follow this guide to create a user-managed notebook from Vertex AI Workbench and select Python 3, it comes with Docker available.
So you will be able to use Docker commands such as ! docker build . inside the user-managed notebook.
Example:
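(The cell below is a sketch reconstructing the example, reusing the IMAGE_URI variable from the question and assuming Docker is already authenticated against the registry; the push step is an assumption about the intended workflow rather than part of the original answer.)
! docker build ./ -t $IMAGE_URI
! docker push $IMAGE_URI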

Docker issues (could not load or find main class)

I am trying to perform a docker run, however I keep getting an error in the terminal which states Error: Could not find or load main class Main.
My Dockerfile is correctly named, the build did run, and I can see the image when running docker run.
The Dockerfile is below:
FROM openjdk:8
COPY . /src/
WORKDIR /src/
RUN ["javac", "Main.java"]
ENTRYPOINT ["java", "Main"]
Can someone please advise me on the best approach to take at this point, or what I should be looking out for?
Thanks
It sounds like your main class name is not "Main".
After compiling with javac, the compiler creates a .class file named exactly after the class declared in the source file, i.e. the class that contains the main method. The class name you pass to java in the ENTRYPOINT has to match that name.
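As a minimal illustration (a hypothetical sketch, not taken from the original question), the Dockerfile above only works if Main.java declares a class called Main that contains the main method:

// Main.java -- the class name must match the name passed to java in the ENTRYPOINT
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello from the container");
    }
}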

OAuth - "No module named authlib"

I'm running Superset on macOS in Docker and I'm trying to get OAuth working.
I’ve edited the config file /docker/pythonpath_dev/superset_config.py and added the OAuth configuration.
One of the lines I added was
AUTH_TYPE = AUTH_OAUTH
This required me to import the auth types as below:
from flask_appbuilder.security.manager import (
AUTH_OID,
AUTH_REMOTE_USER,
AUTH_DB,
AUTH_LDAP,
AUTH_OAUTH,
)
When I try to start up superset with the following command: docker-compose -f docker-compose-non-dev.yml up
I get the following error:
File "/usr/local/lib/python3.7/site-packages/flask_appbuilder/security/manager.py", line 250, in __init__
from authlib.integrations.flask_client import OAuth
ModuleNotFoundError: No module named 'authlib'
I'm fairly new to docker itself. How do I go about resolving this?
In case anybody else comes across this, the solution was to add the Authlib module to the Python environment on the Docker image.
The process for adding a new python module to the docker image is documented here: https://github.com/apache/superset/blob/master/docker/README.md#local-packages
Quoted below in case that file changes:
If you want to add python packages in order to test things like DBs locally, you can simply add a local requirements.txt (./docker/requirements-local.txt) and rebuild your docker stack.
Steps:
1. Create ./docker/requirements-local.txt
2. Add your new packages
3. Rebuild docker-compose
a. docker-compose down -v
b. docker-compose up
Importantly, run docker-compose up and not docker-compose -f docker-compose-non-dev.yml up; the latter does not seem to rebuild the Docker image.
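Put together, the fix looks roughly like this (a sketch, assuming you are in the root of the Superset checkout):

echo "authlib" >> ./docker/requirements-local.txt
docker-compose down -v
docker-compose up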

Selecting different code branches when using a shared base image in Docker

I am containerising a codebase that serves multiple applications. I have created three images:
app-base:
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
...
app-foo:
FROM app-base:latest
RUN foo-specific-setup.sh
and app-buzz, which is very similar to app-foo.
This works currently, except that I want to be able to build versions of app-foo and app-buzz for specific code branches and versions. It's easy to do that for app-base and tag it appropriately, but app-foo and app-buzz can't dynamically select that tag; they are always pinned to app-base:latest.
Ultimately I want this build process automated by Jenkins. I could just dynamically re-write the Dockerfile, or drop the three-image approach and keep two nearly-but-not-quite identical Dockerfiles, one per app, that would need to be kept in sync manually (later increasing to 4 or 5). Each of those solutions has obvious drawbacks, however.
I've seen lots of discussions in the past about things such as an INCLUDE statement, or dynamic tags. None seemed to come to anything.
Does anyone have a working, clean(ish) solution to this problem? As long as it means Dockerfile code can be shared across images, I'd be happy. If it also means that the shared layers of images don't need to be rebuilt for each app, then even better.
You could still use build args to do this.
Dockerfile:
FROM ubuntu
ARG APP_NAME
RUN echo $APP_NAME-specific-setup.sh >> /root/test
ENTRYPOINT cat /root/test
Build:
docker build --build-arg APP_NAME=foo -t foo .
Run:
$ docker run --rm foo
foo-specific-setup.sh
In your case you could run the correct script in the RUN instruction using the argument set just before it. You would have one Dockerfile per app-base variant and run the correct setup based on the build argument.
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
ARG APP_NAME
RUN $APP_NAME-specific-setup.sh
Any layers before setting the ARG would not need to be rebuilt when creating other versions.
You can then push the built images to separate docker repositories for each app.
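For example, Jenkins could build, tag, and push both variants from the same Dockerfile (the registry and tag names here are hypothetical):

docker build --build-arg APP_NAME=foo -t my-registry/app-foo:my-branch .
docker build --build-arg APP_NAME=buzz -t my-registry/app-buzz:my-branch .
docker push my-registry/app-foo:my-branch
docker push my-registry/app-buzz:my-branch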
If your apps need different ENTRYPOINT instructions, you can have an APP_NAME-entrypoint.sh per app and rename it to entrypoint.sh within your APP_NAME-specific-setup.sh (or pass it through as an argument to run).
