SageMaker deploy error "serve" executable file not found in $PATH


In Amazon SageMaker, I'm trying to deploy a custom created Docker container with a Scikit-Learn model, but deploying keeps giving errors.
These are my steps:
On my local machine, I created a script (script.py) and split the data into training and test sets. The script contains a main section, accepts the parameters 'output-train-dir', 'model-dir', 'train' and 'test', and contains the functions model_fn, input_fn, output_fn and predict_fn
Tested the script locally, which worked
python script.py --train . --test . --model-dir .
Created a Docker image based on the default Python image (Python 3.9) and pushed it to Amazon ECR; below are the commands I used
> docker pull python
create Dockerfile, containing
FROM python:3.9
RUN pip3 install --no-cache scikit-learn numpy pandas joblib sagemaker-training
> docker build -t mymodel .
> aws ecr create-repository --repository-name mymodel
> docker tag 123456789012 123456789123.dkr.ecr.eu-central-1.amazonaws.com/mymodel
> docker push 123456789123.dkr.ecr.eu-central-1.amazonaws.com/mymodel
Uploaded the training and test data to s3 (mybucket)
Trained the script in local mode
aws_sklearn = SKLearn(entry_point='script.py',
framework_version='0.23-1',
image_uri='123456789123.dkr.ecr.eu-central-1.amazonaws.com/mymodel',
instance_type='local',
role=role)
aws_sklearn.fit({'train': mybucket_train_path, 'test': mybucket_test_path, 'model-dir': mybucket_model_path})
which was successful
Next I trained on AWS
aws_sklearn = SKLearn(entry_point='script.py',
framework_version='0.23-1',
image_uri='123456789123.dkr.ecr.eu-central-1.amazonaws.com/mymodel',
instance_type='ml.m4.xlarge',
role=role)
aws_sklearn.fit({'train': mybucket_train_path, 'test': mybucket_test_path})
which was also successful (however, providing the model-dir parameter gave errors, so I omitted it)
deploying however gave an error:
aws_sklearn_predictor = aws_sklearn.deploy(instance_type='ml.t2.medium',
initial_instance_count=1)
Error message:
UnexpectedStatusException: Error hosting endpoint
mymodel-2021-01-24-12-52-02-790: Failed. Reason: The primary
container for production variant AllTraffic did not pass the ping
health check. Please check CloudWatch logs for this endpoint..
And Cloudwatch said:
AWS sagemaker exec: "serve": executable file not found in $PATH
I read somewhere that I should add RUN chmod +x /opt/program/serve to the Dockerfile, but in my local image there is no serve file present. This is something that SageMaker creates, right?
How or where should I add serve to the $PATH environment variable, or grant execute rights to the serve script?

The serve file isn't something SageMaker creates automatically; it has to be part of the Docker container. This is technically true for the Estimator job too (there should be a similar train file as well; however, you override this by manually specifying an entry_point).
This page should help explain what SageMaker is actually trying to run when you run training and batch_transform jobs. That page references this repo, which you can use as an example.
In short, if you want to continue to use your custom Docker container, you'll have to build in functionality for the serve command (see the additional scripts in the repo for launching the gunicorn server, which runs multiple instances of the Flask app) and add those files to your Dockerfile.
The RUN chmod +x /opt/program/serve command will also make more sense once you've added that serve command functionality.
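To make the contract concrete: SageMaker hosting starts the container with the argument serve, then sends GET /ping health checks and POST /invocations requests to port 8080. Below is a minimal sketch of such a program, using only the Python standard library rather than the gunicorn/Flask stack from the example repo. The predict function is a hypothetical placeholder, not part of the original script; a real container would load the trained model there.

```python
"""Minimal sketch of a SageMaker-style `serve` program (stdlib only)."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    # Hypothetical placeholder: a real container would load the model
    # produced by training (under /opt/ml/model) and call it here.
    return [sum(row) for row in rows]


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: the endpoint only becomes healthy if /ping returns 200.
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        # Inference requests arrive at /invocations.
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        rows = json.loads(self.rfile.read(length) or b"[]")
        self._send(200, json.dumps(predict(rows)).encode())

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=8080):
    # SageMaker sends its requests to port 8080 inside the container.
    return HTTPServer(("0.0.0.0", port), Handler)


def main():
    make_server().serve_forever()
```

To use something like this as the serve executable, the file would need a Python shebang line, a call to main() at the bottom, a location on $PATH inside the image, and the execute bit set, which is where the chmod +x step comes in.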

Related

Cloud Run error: Container failed to start. Running a background task without exposing a PORT or URL

I am facing the issue
(gcloud.run.deploy) Cloud Run error: Container failed to start. Failed
to start and then listen on the port defined by the PORT environment
variable. Logs for this revision might contain more information.
There are a few posts about this error, but I couldn't find my particular case.
I am running a background task with nothing to expose: it connects to Firebase, processes some data and stores it back. I wanted this process to run in a container on Cloud Run, so I made it a container, which runs perfectly locally, but when I upload it to Cloud Run it fails with the above error.
I tried to expose 8080 in the Dockerfile and a few more things, but if you try to connect to the container, it has no server running to connect to. It is a batch task.
Can anyone tell me whether it is possible at all to deploy this type of task to Cloud Run? I do not know how to solve the issue. I can't believe Google requires a server running in the container to allow it. I saw some posts where developers pull nginx into the image so they can expose the port, but this would be totally unnecessary in my case.
Thanks for your advice
UPDATE
Cloud Logging: The error simply says the container failed to start, which is funny because the container starts and also shows some logs as if it were working, but then it stops.
Built on a Mac: yes.
DockerFile is pretty simple.
FROM openjdk:11
ENV NOTIFIER_HOME /opt/app/
ENV NOTIFIER_LOGS /opt/notifications/logs/
RUN mkdir -p $NOTIFIER_HOME
RUN mkdir -p $NOTIFIER_LOGS
RUN apt update
#RUN apt install curl
COPY docker/* $NOTIFIER_HOME
EXPOSE 8080
ENV TMP_OPTS -Djava.io.tmpdir=/tmp
ENV LOG4j_OPTS -Dlog4j.configurationFile=$NOTIFIER_HOME/logback.xml
ENV NOTIFIER_OPTS $TMP_OPTS $LOG4j_OPTS
ENV JAVA_GC_OPTS -Xms1g -Xmx1g
WORKDIR $NOTIFIER_HOME
ENTRYPOINT ["sh", "-c", "/opt/app/entrypoint.sh"]

You can't run background jobs on Cloud Run. Wrap it in a web server, as proposed by MBHA, if the process takes less than 1h.
Otherwise, you can use GKE Autopilot to run your container for a while. You pay only while your container runs, and the first cluster is free, so you can give it a try.
As a hack, you can also run your container in Cloud Build or in Vertex AI custom container training.

I've run into a similar issue when building a custom image on a Mac and deploying it to Cloud Run. In my case, the Docker platform turned out to be the problem. I isolated this by building the same image in Cloud Shell, and that image worked perfectly fine in Cloud Run.
If you need to build it locally on a Mac, go ahead and test it by changing the Docker platform:
export DOCKER_DEFAULT_PLATFORM=linux/amd64
docker build -t mytag:myver .
Once the image has been built, you can inspect the architecture:
docker image inspect mytag:myver | grep -i Architecture
Then deploy it to Cloud Run.

The explanation is in your question:
I am running a background task, nothing to expose
A Cloud Run application, that is, your container, must listen for incoming HTTP requests, as stated in the Container runtime contract. That's why all Cloud Run examples for Java, as in your case, use Spring Boot with @RestController. Another explanation can be found in this answer.
Update:
So the solution is either to
add a webserver to your code and wrap it with spring boot and controller logic
use Cloud Function rather than Cloud Run and get rid of the Dockerfile and in the same time have simpler code and less configuration
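The first option can be sketched as follows. The question's application is Java and the answer suggests Spring Boot; this Python version only illustrates the shape of the wrapper, under the assumption that each HTTP request should trigger one run of the task. run_batch_task is a hypothetical placeholder for the real Firebase job.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_batch_task():
    # Hypothetical placeholder for the real work (read from Firebase,
    # process the data, write it back).
    return "done"


class TaskHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Each incoming request triggers one run of the task.
        body = run_batch_task().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server():
    # Cloud Run passes the port to listen on in the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), TaskHandler)
```

The container's entrypoint would then call make_server().serve_forever(), so that the process is listening on the expected port when Cloud Run checks it.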

How to properly set up jenkins with docker?

I'm new to Docker and am learning how to use Docker with Jenkins. I was able to successfully bind a Docker volume to my host machine directory with the following command
docker run --name jenkinsci -p 8080:8080 -p 50000:50000 -v ~/Jenkins:/var/jenkins_home/ jenkins/jenkins:lts
Now that basic Jenkins is set up and bound to my host, there are a few things I'm not sure how to handle.
(1) This is only accessible through localhost:8080. How do I make this accessible to other computers? I've read that I can change the URL to my company's public IP address? Is this the right approach?
(2) I want to automate the installation of select plugins and setting the paths in the Global Tools Configuration. There were some tips on github https://github.com/jenkinsci/docker/blob/master/README.md but I wasn't clear on where this Dockerfile is placed. For example, if I wanted the plugins MSBuild and Green Balls to be installed, what would that look like?
FROM jenkins/jenkins:lts
COPY plugins.txt /usr/share/jenkins/ref/plugins.txt
RUN /usr/local/bin/install-plugins.sh < /usr/share/jenkins/ref/plugins.txt
Would I have to create a text file called plugins.txt where it contains a list of plugins I want downloaded? Where will this Dockerfile be stored?
(3) I also want a Dockerfile that installs all the dependencies to run my .NET Windows project (nuget, msbuild, wix, nunit, etc). I believe this Dockerfile will be placed in my git repository.
Basically, I'm getting overwhelmed with all this Docker information and am trying to piece together how Docker interacts with Jenkins. I would appreciate any advice and guidance on these problems.

It's OK to get overwhelmed by Docker and Kubernetes. It's a lot of information and an overall shift in how we have been handling applications and services.
To make Jenkins available on all interfaces, use the following command.
docker run --name jenkinsci -p "0.0.0.0:8080:8080" -p "0.0.0.0:50000:50000" -v ~/Jenkins:/var/jenkins_home/ jenkins/jenkins:lts
Yes, you have to provide the plugins.txt file, and create a new jenkins image containing all the required plugins. After that you can use this new image instead of jenkins/jenkins:lts.
The new image, suited for your workload should contain all the dependencies required for your environment.

Best practice to connect my own code into a standard docker image in kubernetes

I have a lot of standard runtime Docker images, such as python3 with TensorFlow 1.7 installed, and I want to use these standard images to run customers' code that lives outside of them. The scenario seems quite similar to serverless. So what is the best way to put the code into the runtime containers?
Right now I am trying to use a persistent volume to mount the code into the runtime, but that takes a lot of work. Is there an easier solution?
UPDATE
What is the workflow for google machine learning engine or floydhub. I think what I want is similar. They have a command line tool to make the local code combine with a standard env.

Following cloud native practices, code should be immutable, and releases and their dependencies should be uniquely identifiable for repeatability, replicability, etc. In short: you should really create images with your source code.
In your case, that would mean basing your Dockerfile on upstream python3 or TF images. There are a couple of projects that may help with the workflow above (code + build-release-run):
https://github.com/Azure/draft -- looks like better suited for your case
https://github.com/GoogleContainerTools/skaffold -- more golang friendly afaics
Hope it helps --jjo

One of the best practices is NOT to mount the code from a volume into it, but create a client-specific image that uses your TensorFlow image as a base image:
# Your base image comes in here.
FROM aisensiy/tensorflow:1
# Copy the client code into your image.
COPY src /src
# As Kubernetes will run your containers with an
# arbitrary UID and with GID 0, we need to change
# the group to 0 and make your stuff accessible to GID 0.
# This has to happen while still root, so it comes before USER.
RUN \
chgrp -R 0 /src && \
chmod -R g=u /src && \
true
# Run as an unprivileged user by default.
USER nobody
CMD ["/usr/bin/python", ...]
Some more best practices:
Always log to stdout instead of log files.
One process per container. If you need multiple local processes, co-locate them in a single pod.
Even more best practices are provided in the OpenShift documentation: https://docs.openshift.org/latest/creating_images/guidelines.html

The code file can be passed from stdin when the container is being started. This way you can run arbitrary code when starting the container.
Please see below for example:
root@node-1:~# cat hello.py
print("This line will be printed.")
root@node-1:~#
root@node-1:~# docker run --rm -i python python < hello.py
This line will be printed.
root@node-1:~#

If this is your case,
You have a docker image with code in it.
Aim: To update the code inside docker image.
Solution:
Run a bash session with the docker image with a directory in your file system mounted as volume.
Place the updated code in the volume directory.
From the docker bash session replace the real code with updated code from the volume.
Save the current state of container as new docker image.
Sample Commands
Assume ~/my-dir in your file system has the new code updated-code.py
$ docker run -it --volume ~/my-dir:/workspace --workdir /workspace my-docker-image bash
Now a new bash session will start inside docker container.
Assuming you have the code in '/code/code.py' inside docker container,
You can simply update the code by
$ cp /workspace/updated-code.py /code/code.py
Or you can create new directory and place the code.
$ cp /workspace/updated-code.py /my-new-dir/code.py
Now the docker container contains updated code. But changes will be reset if you close the container and again run the image. To create a docker image with latest code, save this state of container using docker commit.
Open a new tab in the terminal.
$ docker ps
Will list all running docker containers.
Find CONTAINER ID of your docker container and save it.
$ docker commit id-of-your-container new-docker-image-name
Now run the docker image with latest code
$ docker run -it new-docker-image-name
Note: It is recommended to remove the old docker image using docker rmi command as docker images are heavy.

We're dealing with a similar challenge also. Our approach is to build a static docker image where Tensorflow, Python, etc are built once and maintained.
Each user has a PVC (persistent volume claim) where large files that may change such as datasets and workspaces live.
Then we have a bash shell that launches the cluster resources and syncs the workspace using ksync (like rsync for a kubernetes cluster).

Path interpretation in a Dockerfile

I want to run a container, by mounting on the fly my ~/.ssh path (so as to be able to clone some private gitlab repositories).
The
COPY ~/.ssh/ /root/.ssh/
directive did not work out, because the Dockerfile interpreted paths relative to a tmp dir it creates for the builds, e.g.
/var/lib/docker/tmp/docker-builder435303036/
So my next shot was to try and take advantage of the ARG directive as follows:
ARG CURRENTUSER
COPY /home/$CURRENTUSER/.ssh/ /root/.ssh/
and run the build as:
docker build --build-arg CURRENTUSER=pkaramol <whatever follows ...>
However, I am still faced with the same issue:
COPY failed: stat /var/lib/docker/tmp/docker-builder435303036/home/pkaramol/.ssh: no such file or directory
1: How to make Dockerfile access a specific path inside my host?
2: Is there a better pattern for accessing private git repos from within ephemeral running containers, than copying my .ssh dir? (I just need it for the build process)

Docker Build Context
A build for a Dockerfile can't access paths outside the "build context" directory. This is the last argument to docker build, normally "." (the current directory). The docker build command tars up the build context and sends it to the Docker daemon to build the image from. Only files within the build context can be referenced in the build, and a source path that starts with / is looked up relative to the context, not the host's root. To include a user's .ssh directory, you would need to base the build either in the .ssh directory or in a parent directory like /home/$USER.
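The lookup rule can be illustrated with a small sketch. This is a simplified model of the behaviour described above and visible in the question's error message, not Docker's actual implementation; resolve_in_context is a hypothetical helper name.

```python
from pathlib import Path


def resolve_in_context(context_dir, source):
    """Return the path a COPY source maps to, or raise if it leaves the context."""
    context = Path(context_dir).resolve()
    # A leading "/" is treated as the context root, which is why
    # /home/<user>/.ssh was looked up under the temporary build directory.
    candidate = (context / source.lstrip("/")).resolve()
    if candidate != context and context not in candidate.parents:
        raise ValueError(f"{source!r} is outside the build context {context}")
    return candidate
```

With a context of /home/pkaramol/project, the source /home/pkaramol/.ssh maps to /home/pkaramol/project/home/pkaramol/.ssh, which does not exist, and ../.ssh is rejected outright. With a context of /home/pkaramol, the source .ssh resolves to the intended directory.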
Build Secrets
COPYing or ADDing credentials at build time is a bad idea, as the credentials are saved in the image for anyone with access to the image to see. There are a couple of exceptions: you can flatten the image layers after removing the sensitive files during the build, or use a multi-stage build (17.05+) that copies only non-sensitive artefacts into the final image.
Using ENV or ARG is also bad, as the secrets will end up in the image history.
There is a long and involved GitHub issue about secrets that covers most of the variations on the idea. It's long, but the comments are worth reading through.
The two main solutions are to obtain secrets via the network or a volume.
Volumes are not available in standard builds, so that makes them tricky.
Docker has added secrets functionality, but this is only available at container run time for swarm-based containers.
Network Secrets
Custom
The secrets github issue has a neat little net cat example.
nc -lp 10.8.8.8 8080 < $HOME/.ssh/id_rsa &
Then use curl to collect the key in the Dockerfile, use it, and remove it, all in a single RUN step.
RUN set -uex; \
curl -s http://10.8.8.8:8080 > /root/.ssh/id_rsa; \
ssh -i /root/.ssh/id_rsa root@wherever priv-command; \
rm /root/.ssh/id_rsa;
To make unsecured network services accessible, you might want to add an alias IP address to your loopback interface so your build container or local services can access it, but no one external can.
HTTP
Simply running a web server with your keys mounted could suffice.
docker run -d \
-p 10.8.8.8:80:80 \
-v /home/me/.ssh:/usr/share/nginx/html:ro \
nginx
You may want to add TLS or authentication depending on your setup and security requirements.
Hashicorp Vault
Vault is a tool built specifically for managing secrets. It goes beyond the requirements for a Docker build. It's written in Go and is also distributed as a container.
Build Volumes
Rocker
Rocker is a custom Docker image builder that extends Dockerfiles to support some new functionality. The MOUNT command they added allows you to mount a volume at build time.
Packer
The Packer Docker Builder also allows you to mount arbitrary volumes at build time.

Why do the changes I make in my working directory not show up in my Docker container?

I would like to test parse-dashboard via Docker, as documented in the readme.
I am getting the error message, "Parse Dashboard can only be remotely accessed via HTTPS." Normally, you can bypass this by adding the line "allowInsecureHTTP": true to your parse-dashboard-config.json file. But even though I have added this option to my config file, the same message is displayed.
I tried to edit the config file in the Docker container, whereupon I discovered that none of my local file changes were present in the container. It appeared as though my project was an unmodified version of the code from the GitHub repository.
Why do the changes that I make to the files in my working directory on the host machine not show up in the Docker container?

But what is uploaded to my docker is in fact the config file of my master branch.
It depends:
what that "docker" is: the official DockerHub or a private docker registry?
how it is uploaded: do you build an image and then use docker push, or do you simply do a git push back to your GitHub repo?
Basically, if you want to see the right files in the Docker container that you run, you must be sure to run an image you have built (docker build) from a Dockerfile that COPYs files from your current workspace.
If you do a docker build from a folder where your Git repo is checked out at the right branch, you will get an image with the right files.

The Dockerfile from the parse-dashboard repository you linked uses ADD . /src. This is a bad practice (because of the problems you're running into). Here are two different approaches you could take to work around it:
Rebuild the Image Each Time
Any time you change anything in the working directory (which the Dockerfile ADDs to /src), you need to rebuild for the change to take effect. The exception to this is src/Parse-Dashboard/parse-dashboard-config.json, which we'll mount in with a volume. The workflow would be nearly identical to the one in the readme:
$ docker build -t parse-dashboard .
$ docker run -d -p 8080:4040 -v ./src/Parse-Dashboard/parse-dashboard-config.json:/src/Parse-Dashboard/parse-dashboard-config.json parse-dashboard
Use a Volume
If we're going to use a volume to do this, we don't even need the custom Dockerfile shipped with the project. We'll just use the official Node image, upon which the Dockerfile is based.
In this case, Docker will not run the build process for you, so you should do it yourself on the host machine before starting Docker:
$ npm install
$ npm run build
Now, we can start the generic Node Docker image and ask it to serve our project directory. The command is passed through sh -c because cd and && need a shell to interpret them.
$ docker run -d -p 8080:4040 -v ./:/src node:4.7.2 sh -c "cd /src && npm run dashboard"
Changes will take effect immediately because you mount ./ into the container as a volume. Because it's not done with ADD, you don't need to rebuild the image each time. We can use the generic node image because, if we're not ADDing a directory and running the build commands, there's nothing our image would do differently from the official one.
