How to build an image for a Kubeflow pipeline?

I recently found out about Kubeflow and Kubeflow Pipelines, but it is not clear to me how to build an image from my Python program.
Let's assume that I have a simple python function that crops images:
class Image_Proc:
    def crop_image(self, image, start_pixel, end_pixel):
        # crop
        return cropped_image
How shall I containerize this and use it in a Kubeflow pipeline? Do I need to wrap it in an API (with Flask, for example), or do I need to connect to some media/data broker?
How does the Kubeflow pipeline send input to this code and pass its output on to the next step?

Basically, you can follow the steps provided by Docker here to create a Docker image and publish it to Docker Hub (you could also run your own private Docker registry, but I think that may be too much work for a beginner). Roughly, the steps are:
Create a Dockerfile. In your Dockerfile, specify a few things: the base image (for your case, just use a Python image from Docker Hub), the working directory, and the command to execute when the image runs; see the sketch after these steps.
Run your image locally to make sure it works as expected (install Docker first if you haven't), then push it to Docker Hub.
Once published to Docker Hub, you will have an image URL; use that URL when you create pipelines in Kubeflow.
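As a rough illustration, a minimal Dockerfile for the cropping program might look like this (the file name crop.py and the Pillow dependency are assumptions for the example, not something from the question):

FROM python:3.8-slim
WORKDIR /app
# Copy the cropping script into the image
COPY crop.py .
# Install whatever imaging library the script needs
RUN pip install pillow
ENTRYPOINT ["python", "crop.py"]

Then build and push it (image name hypothetical):

docker build -t yourname/crop-image:latest .
docker push yourname/crop-image:latest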
Also, you can read this doc to learn how to create pipelines (a Kubeflow pipeline is just an Argo workflow). For your case, just fill in the inputs and/or outputs sections of the step you want in the pipeline YAML file.
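For instance, a component YAML wrapping the image above might look roughly like this (following the KFP v1 component spec; all names and command-line flags here are assumptions for illustration):

name: Crop image
inputs:
- {name: image_path, type: String}
- {name: start_pixel, type: Integer}
- {name: end_pixel, type: Integer}
outputs:
- {name: cropped_image_path, type: String}
implementation:
  container:
    image: yourname/crop-image:latest
    command: [
      python, crop.py,
      --image, {inputPath: image_path},
      --start, {inputValue: start_pixel},
      --end, {inputValue: end_pixel},
      --output, {outputPath: cropped_image_path},
    ]

The inputPath/inputValue/outputPath placeholders are how KFP injects data into the container and collects its output for the next step.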

You do not need to build images. For small to medium-sized components you can work on top of existing images. Check the lightweight components sample.
For Python, see Data passing in python components.
For non-Python, see Creating components from command-line programs.
KFP SDK has some support for building container images. See the container_build sample.
Read the official component authoring documentation.
Let's assume that I have a simple python function that crops images:
You can just create a component from a Python function like this:
import kfp
from kfp.components import InputPath, OutputPath, create_component_from_func

# Declare function (with annotations)
def crop_image(
    image_path: InputPath(),
    start_pixel: int,
    end_pixel: int,
    cropped_image_path: OutputPath(),
):
    import some_image_lib
    some_image_lib.crop(image_path, start_pixel, end_pixel, cropped_image_path)

# Create component
crop_image_op = create_component_from_func(
    crop_image,
    # base_image=...,  # Optional. Base image that has most of the packages that you need. E.g. tensorflow/tensorflow:2.2.0
    packages_to_install=['some_image_lib==1.2.3'],
    output_component_file='component.yaml',  # Optional. Use this to share the component between pipelines, teams or people in the world
)

# Create pipeline
def my_pipeline():
    download_image_task = download_image_op(...)
    crop_image_task = crop_image_op(
        image=download_image_task.output,
        start_pixel=10,
        end_pixel=200,
    )

# Submit pipeline
kfp.Client(host=...).create_run_from_pipeline_func(my_pipeline, arguments={})

Related

How to use Docker BuildKit extensions in GitHub Actions

I'm trying to do something a bit slick, and of course running into issues.
Specifically, I am using a Docker-based environment build system (Lando). I'm feeding it a Dockerfile that it then builds a cluster around, using docker-compose. Locally, this works fine. I also have it working great inside a GitHub Action to spin up a local-dev-identical environment for testing. Quite nice.
However, now I'm trying to expand the Dockerfile using the Dockerfile Plus extension. My Dockerfile looks like this:
# syntax = edrevo/dockerfile-plus
INCLUDE+ ./docker/prod/Dockerfile
COPY docker/dev/php.ini /usr/local/etc/php/conf.d/zzz-docker-custom.ini
# Other stuff here.
This works great for me locally, and I get the contents of docker/prod/Dockerfile included into my docker build.
When I run the same configuration inside a GitHub Actions workflow, I get a syntax error on the INCLUDE+ line, indicating that the extension is not being loaded. According to the project page, the extension relies on BuildKit, which should be enabled by default on any recent Docker version. Yet whatever Docker is running on GitHub is not using BuildKit. I've tried enabling it explicitly by setting the environment variables (as specified on the Dockerfile+ project page), but it still doesn't seem to work.
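For reference, the explicit attempt looks roughly like this in the workflow file (these are the standard BuildKit/Compose variable names; the placement at the top level of the workflow is just illustrative):

env:
  # Ask the Docker CLI to use BuildKit for builds
  DOCKER_BUILDKIT: 1
  # Ask docker-compose to delegate builds to the Docker CLI (and thus BuildKit)
  COMPOSE_DOCKER_CLI_BUILD: 1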
How can I get Dockerfile+ working in GitHub Actions? Of note, I do not run the docker build command myself (it's run by docker-compose, using files generated on the fly by Lando), so I cannot modify that specific command. But I didn't need to locally anyway, so I don't know what the issue is.

Using Gradle's shadowJar in AWS Lambda Container Image

A year ago AWS announced support for Container Images. The idea is that you can run your Docker image in Lambda. The image can be created from one of the provided base images, or completely from scratch. With the first approach, one just has to implement a handler, as with a regular Lambda function, while images built with the second approach need to implement the Lambda Runtime API.
Building images based on the provided base images is simpler, and the documentation gives an example for Gradle:
FROM public.ecr.aws/lambda/java:11
# Copy function code and runtime dependencies from Gradle layout
COPY build/classes/java/main ${LAMBDA_TASK_ROOT}
COPY build/dependency/* ${LAMBDA_TASK_ROOT}/lib/
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "com.example.LambdaHandler::handleRequest" ]
This is paired with a Gradle task that populates build/dependency:
task copyRuntimeDependencies(type: Copy) {
    from configurations.runtimeClasspath
    into 'build/dependency'
}
build.dependsOn copyRuntimeDependencies
But I'm already using Gradle Shadow Plugin and I don't like the idea of introducing a task just to copy dependencies. It looks like a hack.
Can I use that shadow JAR to build Java AWS Lambda Container image?
It turns out you can! Skip copying the classes to $LAMBDA_TASK_ROOT completely, and copy only a single fat shadow jar to $LAMBDA_TASK_ROOT/lib:
FROM amazon/aws-lambda-java:latest
COPY build/libs/build-all.jar ${LAMBDA_TASK_ROOT}/lib/
CMD [ "com.example.LambdaHandler::handleRequest" ]

Which is the best way to create an ECR repository using CDK?

I've been using CDK for a while, and I'm still not sure of the best way to launch a stack that creates an ECR repository and builds and pushes a Docker image to it.
My last try was something like this:
taskDefinition.addContainer("container", {
  image: new AssetImage('./', {
    repositoryName: "name"
  })
});
But there is an issue with this approach: repositoryName is deprecated in the AssetImage class, and it looks deprecated everywhere else too.
Can someone tell me how to launch this kind of stack?
The code you are showing is not creating an ECR repository; it is deploying an ECS task definition.
If you are trying to build the Docker image at deployment time, CDK has handy functionality to do both at the same time with assets:
image: ecs.ContainerImage.fromAsset('./image') // build and upload an image directly from a Dockerfile in your source directory.
Otherwise, please check the Image options available in the docs: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-readme.html#images
The answer from Pedreiro shows the right way to build and upload an image; CDK will automatically create an ECR repository for you in that case.
I highly recommend using the aws-ecs-patterns module to get started with ECS on CDK. It already covers some common use cases, letting you deploy an ECS service with very little code. The overview page is a good starting point: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-patterns-readme.html
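For illustration, a minimal pattern-based service might look like this (construct and property names follow the aws-ecs-patterns docs; the surrounding Stack class and the image path are assumptions):

import * as ecs from '@aws-cdk/aws-ecs';
import * as ecsPatterns from '@aws-cdk/aws-ecs-patterns';

new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'Service', {
  taskImageOptions: {
    // Builds the image from a local Dockerfile, pushes it to an
    // auto-created ECR repository, and wires it into the service.
    image: ecs.ContainerImage.fromAsset('./image'),
  },
});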
But in the unlikely case you wanted to just create an ECR repository using CDK you can do it like this:
const repository = new ecr.Repository(this, 'Repository');
Check out the corresponding docs: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecr-readme.html
Notice there's an issue with
image: ecs.ContainerImage.fromAsset('./image')
(https://github.com/aws/aws-cdk/issues/2663): you cannot specify a tag, so the ECS task defaults to latest and the image pull will fail.

Swap out a layer of a container image

I am wondering if it is possible to swap out a layer of a container image for another. Here is my scenario:
I have a docker file that does the following:
It pulls the .NET Core 3.1 Debian slim (buster) image
Builds the source code of my application and adds it as a layer on top of the .NET Core image
When a new version of the .NET Core 3.1 runtime image comes out, I would like to make a new image that has that new version, but with the same application layer on top of it.
I do NOT want to have to find the exact version of the code I used to build the application and re-build.
The idea is to replicate upgrading the machine and runtime, but not have any alteration to the application (less to test with the upgrade).
Is there a docker command that I can use to swap out a layer of an image?
This answer is broken into two distinct sections:
Part I: Build From Sources Instead: Explains why this idea should not be done.
Part II: A Proof-of-concept Implemented CLI Tool: A CLI tool I implemented specifically for this question, with a demonstration.
The first section can be skipped if potentially adverse side effects are not a concern.
Part I: Build From Sources Instead
No, there is probably not a docker command for this, and there likely never will be, due to a whole slew of technical issues. Even by design, this is not meant to be trivially possible; images and layers are meant to be immutable after being built. It is strongly recommended that the application image instead be rebuilt from the original sources, but with a new base image, by modifying the FROM command.
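For example, the rebuild amounts to nothing more than changing the first line of the Dockerfile and building again (the tags here are illustrative):

# Before
FROM mcr.microsoft.com/dotnet/core/runtime:3.1.8-buster-slim
# After: rebuild the same sources on the newer runtime
FROM mcr.microsoft.com/dotnet/core/runtime:3.1.9-buster-slim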
There are numerous consistency issues that make this idea ill-advised, of which a few are listed below:
Certain Dockerfile commands do not actually create a layer; they update the final layer's internal manifest and the image's manifest. Commands that state Removing intermediate container XXXXXXXXXXXX are actually updating these aforementioned manifests, not creating new layers. This requires correctly carrying over only the relevant changes when swapping from the old base image to the new base image; e.g. reconciling changes from ENV/LABEL/MAINTAINER/EXPOSE/CMD/ENTRYPOINT commands.
ENV commands that alter the application image's configuration using variables inherited from the previous base image may not be updated correctly. For example, the application image might contain the following ENV command:
ENV PATH="/path:${PATH}"
If the application image's old base image layers are swapped out, and the new base image contains a different ${PATH} variable, it is ambiguous how to reconcile the differences without a manual decision by a developer.
If apt/apt-get/apk/yum is used to install Linux packages, these packages are installed in the application image as subsequent layers. The installed packages are not guaranteed to be compatible with the new base image if the underlying OS or executables change in the new base image layers.
Part II: A Proof-of-concept Implemented CLI Tool
"Upgrading" an image's base is technically possible by doing direct manipulation on the image archive of the already-built Docker application image. However, I cannot reiterate this enough - you should not even be attempting to edit the bottom layers of an existing image. Seriously - stop it, get some help. I am 90% sure this is a war crime.
For the sake of humoring this question thoroughly though, I have developed a proof-of-concept CLI tool written in Java over on my GitHub project concision/docker-base-image-swapper that is designed to swap base images (but not arbitrary layers). The design choices I made to resolve various consistency issues are a "best guess" on what action should be taken.
Included is a demonstration that swaps the base image of an already-built Java application image from JDK 8 to JDK 11, implemented in the demo/demo.sh script. All core code is run in isolated Docker containers, so Bash and Docker are the only dependencies needed on the host to run the demonstration. The demo application image is built only once, on JDK 8, but is run twice: once on the original JDK 8 image, and again on a swapped-base JDK 11 image.
If you experience some technical difficulty with the tool, I may potentially be able to fix the issue. This project was quickly hacked together and has pretty poor code quality; furthermore, it likely suffers from various unaccounted edge cases. I might thoroughly rewrite this in Rust within the next month or two with a focus on maintainability and handling all edge cases.
Warning: I still do not advise trying to edit images; the use of the tool is at your own risk.
The Concept
There are three images relevant in this process, all of which are already-built (i.e. no original sources are needed):
application image: the application image that is currently based off of the old base image, but needs to be swapped to a newer base image.
old base image: the older base image that the application image is based off using a FROM command in the original Dockerfile.
new base image: the newer base image that should replace the application image's layers that were inherited from the old base image via the FROM command.
By knowing which layers and configurations of the application image are inherited from the old base image, they can be replaced with the layers and configurations from the new base image. All image layers and manifests can be obtained as tar archives using the docker save command. With archives of all three relevant images, a tool can analyze the differences between the old base image and the application image, and splice the new base image's layers and configuration in place of the old ones.
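For instance, obtaining the three archives looks like this (image names hypothetical, matching the JDK demo above):

docker save my-app:latest > application.tar
docker save openjdk:8 > old-base.tar
docker save openjdk:11 > new-base.tar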
Warnings on Alternatives
Beware of simply doing a COPY --from=... from the old application image, as the original application image's configuration (set through commands such as CMD, ENTRYPOINT, ENV, EXPOSE, LABEL, USER, VOLUME, WORKDIR) will not be properly replicated.
Two tools I have found that provide this type of functionality are listed below.
Crane rebase https://github.com/google/go-containerregistry/blob/main/cmd/crane/rebase.md#using-crane-rebase
Buildpack (rebase) https://buildpacks.io/docs/concepts/operations/rebase/
NOTE: You do have to watch that you don't have items needing updates in higher layers, and that higher layers do NOT contain changes that would conflict with the new base. For example, suppose the image above the base is a Tomcat container: if layer 4 of the original image updated a Tomcat package, that layer would shadow the same files in the new base, which could result in a non-functional system. So, as always, your mileage may vary.
Is it possible? Yes. But it's rarely done because of how error-prone it is. For example, if a library already exists in the original base image, it won't be included in the application's own layers. If the new base image removes that library, then the resulting merged image will be missing it entirely, since it's neither in the new base layers nor in the old application layers.
There are a few ways I can think of to do this.
Option 1: If you know the specific files in one image, then use the COPY --from syntax to copy those files between images. This is the least error prone method, but requires that you know every file you want to include. The resulting Dockerfile looks like:
FROM new_base
# if every file is in /usr/local/bin:
COPY --from=old_image /usr/local/bin/ /usr/local/bin/
Option 2: you can export the images and create your own new image by combining the layers between the two. For this, there's docker save and docker load. That would look like:
docker save old_image >old_image.tar
docker save new_base >new_base.tar
mkdir old_image new_base new_image
tar -xvf old_image.tar -C old_image
tar -xvf new_base.tar -C new_base
cp old_image/*.json new_image/
# manually: copy each layer directory you want to save from old_image, you can view the nested tar files to find the proper ones
# manually: copy each layer directory from new_base into new_image
# manually: modify the new_image/manifest.json to have the new tag for your image, and adjust the layer references
tar -cvf new_image.tar -C new_image .
docker load <new_image.tar
Option 3: this could be done directly against the registry with API calls. You would pull the old manifest, adjust it with the new layers, then push any new layers and finally the new manifest. This would require a fair bit of coding (see regclient/regclient for how I've implemented some of these APIs in Go).
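For a flavor of what Option 3 involves, here is a deliberately simplified Python sketch (the endpoints are from the OCI Distribution Spec / Docker Registry v2 API; auth, config-blob rewriting, and blob mounting are all omitted, and every registry/repo/tag name is hypothetical):

import json
import requests

REGISTRY = 'https://registry.example.com'
MT = 'application/vnd.docker.distribution.manifest.v2+json'

def get_manifest(repo, ref):
    # GET /v2/<name>/manifests/<reference> with the v2 Accept header
    r = requests.get(f'{REGISTRY}/v2/{repo}/manifests/{ref}',
                     headers={'Accept': MT})
    r.raise_for_status()
    return r.json()

app = get_manifest('myapp', 'v1')       # old application image
old_base = get_manifest('mybase', '1')  # base it was built on
new_base = get_manifest('mybase', '2')  # base to swap in

# Replace the app's base layers with the new base's layers, keeping the
# application layers that sit on top of the old base.
n = len(old_base['layers'])
app['layers'] = new_base['layers'] + app['layers'][n:]

# PUT the adjusted manifest under a new tag. A real implementation must
# also rewrite the config blob (diff_ids/history) and push any blobs
# missing from the target repository first.
r = requests.put(f'{REGISTRY}/v2/myapp/manifests/v2',
                 headers={'Content-Type': MT}, data=json.dumps(app))
r.raise_for_status()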
Option 4: I know I've seen a tool that does this in a specific scenario but the name of it escapes me. I believe it required that you use their base images that were curated to reduce the risk of incompatibilities between versions, and limited what base images could be swapped, so I don't think it was a general purpose tool.
Note that options 2 and 3 both require either manual steps or some code if you want to automate them. Because of how error-prone this is (as described above), I don't think you'll find anyone maintaining and supporting a tool that implements it. The vast majority rebuild from a Dockerfile using a CI tool.
I have developed a small utility script using Python which lets you append a tarball to an existing container image in a container registry (without having to pull existing image data), available at https://github.com/malthe/appendlayer.
The following illustrates how it works:
[ base layer ] => image1
[ base layer ] [ appended layer ] => image2
The script supports any registry that implements the OCI Distribution Spec.

Docker Image which uses another image for running test cases

I have a BDD framework in Java which I am planning to dockerize. I am able to build and run that image as a whole. But what I want is:
I want to build two images. Image 1: the entire project (without feature files). Image 2: the feature files.
The reason is that my feature files change often, and I don't want to rebuild the image, reinstalling the JDK and Maven, every time only a feature file has changed.
What I expect is: Image 1 always runs as a container in the background, and when a feature file changes, I build Image 2 and start it as a container. This should trigger the tests using the already-running container, which has all the dependencies.
The reason is that my feature files change often, and I don't want to rebuild the image, reinstalling the JDK and Maven, every time only a feature file has changed.
If you just want to meet the above requirement, what you need is image inheritance, like this:
base/Dockerfile:
FROM ubuntu:16.04
# install JDK/MAVEN here
RUN xxx
Build a base image now:
$ docker build -t mybase:1 .
Then, for your application, use this base image:
app/Dockerfile:
FROM mybase:1
# add new feature files here
ADD ... ...
Every time your feature files change, you can rebuild your app Dockerfile and run a container based on the newly built image. But since the JDK and Maven live in the separate base image (mybase:1), which was already built, they won't be built again.
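For example (tag names hypothetical):
$ docker build -t myapp:1 app/
$ docker run myapp:1
Only the app layers are rebuilt; the mybase:1 layers are reused from the local cache.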
