How to update Azure ML workspace service image ID?

I have a workspace service created in a Machine Learning workspace.
How can I update the ACR repository name and tag in the running service?
https://docs.azure.cn/zh-cn/cli/ext/azure-cli-ml/ml/service?view=azure-cli-latest#ext_azure_cli_ml_az_ml_service_update
This does not show any arguments related to updating the image ID:
az ml service update --name

To update the service to use a new entry script or environment, create an inference configuration file and specify it with the ic parameter.
Using the CLI
You can also update a web service by using the ML CLI. The following example demonstrates registering a new model and then updating a web service to use the new model:
az ml model register -n sklearn_mnist --asset-path outputs/sklearn_mnist_model.pkl --experiment-name myexperiment --output-metadata-file modelinfo.json
az ml service update -n myservice --model-metadata-file modelinfo.json
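Following the same pattern, an update that swaps in a new entry script or environment would pass an inference configuration file. This is only a sketch: inferenceconfig.json is a placeholder file name, and the ic flag comes from the azure-cli-ml (v1) extension, so verify it against your installed version:
az ml service update -n myservice --ic inferenceconfig.json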

Related

Deploying an Azure durable function using a Docker image in VS Code

I have created a durable function in VS Code. It works perfectly fine locally, but when I deploy it to Azure it is missing some dependencies which cannot be included in the Python environment (Playwright). I created a Dockerfile and a Docker image in a private Docker Hub repository, which I want to use to deploy the function app, but I don't know how to deploy the function app using this image.
I have already tried commands such as:
az functionapp config container set --docker-custom-image-name <docker-id>/<image>:latest --name <function> --resource-group <rg>
Then when I deploy, nothing happens, and I simply get The service is unavailable. I also tried adding the environment variables DOCKER_REGISTRY_SERVER_USERNAME, DOCKER_REGISTRY_SERVER_PASSWORD and DOCKER_REGISTRY_SERVER_URL. However, it is unclear whether the URL should be <docker-id>/<image>:latest, docker.io/<image>:latest, https://docker.io/<image>:latest, etc. Still, the deployment gets stuck on The service is unavailable, not a very useful error message.
So I basically have the function app project ready along with the Dockerfile/image. How can it be so difficult to simply deploy using the given image? The documentation here is very elaborate, but I am missing the details for a private repository. It is also very different from my usual VS Code deployment, making it tough to follow and execute.
Created the Python 3.9 Azure Durable Functions project in VS Code.
Created a Container Registry in Azure and pushed the function image to ACR using docker push (see the sketch after these steps).
az functionapp config container set --docker-custom-image-name customcontainer4funapp --docker-registry-server-password <login-server-pswd> --docker-registry-server-url https://customcontainer4funapp.azurecr.io --docker-registry-server-user customcontainer4funapp --name krisdockerfunapp --resource-group AzureFunctionsContainers-rg
Following the same MS Doc, I pushed the function app image to the Docker custom container (made private) and deployed it to the Azure Function App. It is working as expected.
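For reference, a minimal sketch of the tag-and-push step (the registry, image, and tag names are placeholders, not the exact ones from this deployment):
az acr login --name customcontainer4funapp
docker tag myfunctionapp:latest customcontainer4funapp.azurecr.io/myfunctionapp:v1
docker push customcontainer4funapp.azurecr.io/myfunctionapp:v1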
Refer to this similar issue resolution regarding the error The service is unavailable that appears after deployment of the Azure Functions project, as there are several possible causes that need to be diagnosed step by step.

Docker Desktop: how to create a new k8s cluster context

I have Docker Desktop and I want to create multiple clusters so I can work on different projects, for example cluster 1 named hello and cluster 2 named world.
I currently have one cluster, with the context docker-desktop, that is actually working.
To clarify, I am posting a Community Wiki answer.
The tool kind meets your expectations in this case.
kind is a tool for running local Kubernetes clusters using Docker container “nodes”. kind was primarily designed for testing Kubernetes itself, but may be used for local development or CI.
Here one can find the User Guide to this tool.
One can install it in 5 ways:
With A Package Manager
From Release Binaries
From Source
With make
With go get / go install
To create a cluster with this tool, run:
kind create cluster
To specify another image, use the --image flag:
kind create cluster --image=xyz
In kind, the node image is built off the base image, which installs all the dependencies required for Docker and Kubernetes to run in a container.
To assign the cluster a different name than kind, use the --name flag:
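For example, to create the two clusters from the question and switch between them (a minimal sketch; kind prefixes its kubeconfig contexts with kind-):
kind create cluster --name hello
kind create cluster --name world
kubectl config use-context kind-hello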
More options can be found with:
kind create cluster --help

Unable to create a version in Cloud AI Platform using custom containers for prediction

Because of certain VPC restrictions, I am forced to use custom containers for prediction for a model trained on TensorFlow. Following the documentation requirements, I have created an HTTP server using TensorFlow Serving. The Dockerfile used to build the image is as follows:
FROM tensorflow/serving:2.3.0-gpu
# copy the model file
ENV MODEL_NAME=my_model
COPY my_model /models/my_model
Where my_model contains the saved_model inside a folder named 1/.
I then pushed the container image to Artifact Registry and created a Model. To create a Version, I selected Custom Container on the Cloud Console UI and added the path to the container image. I set both the Prediction route and the Health route to /v1/models/my_model:predict and changed the Port to 8501. I also selected the machine type to be a single compute node of type n1-standard-16 with 1 P100 GPU and kept scaling set to Auto scaling.
After clicking Save, I can see the TensorFlow server starting, and in the logs the following messages appear:
Successfully loaded servable version {name: my_model version: 1}
Running gRPC ModelServer at 0.0.0.0:8500
Exporting HTTP/REST API at:localhost:8501
NET_LOG: Entering the event loop
However, after about 20-25 minutes the version creation just stops, throwing the following error:
Error: model server never became ready. Please validate that your model file or container configuration are valid.
I am unable to figure out why this is happening. I am able to run the same Docker image on my local machine and successfully get predictions by hitting the endpoint that is created: http://localhost:8501/v1/models/my_model:predict
Any help in this regard will be appreciated.
Answering this myself after working with the Google Cloud Support Team to figure out the error.
It turned out the port I was creating a Version on was conflicting with the Kubernetes deployment on Cloud AI Platform's side. So I changed the Dockerfile to the following and was able to successfully run Online Predictions on both Classic AI Platform and Unified AI Platform:
FROM tensorflow/serving:2.3.0-gpu
# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models
RUN mkdir -p ${MODEL_BASE_PATH}
# copy the model file
ENV MODEL_NAME=my_model
COPY my_model /models/my_model
EXPOSE 5000
EXPOSE 8080
CMD ["tensorflow_model_server", "--rest_api_port=8080", "--port=5000", "--model_name=my_model", "--model_base_path=/models/my_model"]
Have you tried using a different health path?
I believe /v1/models/my_model:predict uses HTTP POST, but health checks usually use HTTP GET.
You might need a GET endpoint for your health check path.
Edit: From the docs https://www.tensorflow.org/tfx/serving/api_rest you might be able to test just using /v1/models/my_model as your health endpoint
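A quick way to verify both routes against the local container (a sketch; the image name and the request body are placeholders):
docker run -d -p 8501:8501 my-serving-image
# health check: a plain GET should return the model version status
curl http://localhost:8501/v1/models/my_model
# prediction: a POST with a JSON body of instances
curl -X POST -d '{"instances": [[1.0, 2.0]]}' http://localhost:8501/v1/models/my_model:predict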

When you assign a service account to a Cloud Run service, what exactly happens?

I am trying to understand what assigning a service account to a Cloud Run service actually does, in order to improve the security of the containers. I have multiple processes running within a Cloud Run service, and not all of them need to access the project resources.
A more specific question I have in mind is:
Would I be able to create multiple users and run some processes as a user that does not have access to the service account, or does every user have access to the service account?
I ran a small experiment on a VM instance (I guess this will be a similar case as with Cloud Run) where I created a new user; after creation, it wasn't authorized to use the service account of the instance. However, I am not sure whether there is a way to authorize it, which would make my method insecure.
Thank you.
EDIT
To perform the test, I created a new OS user and ran "gcloud auth list" from the new user account. However, I should have made a curl request, which would have retrieved credentials, as pointed out by an answer below.
Your question is not very clear, but I will try to provide you with several inputs.
When you run a service on Cloud Run, you have 2 choices for defining its identity:
Either it's the Compute Engine default service account, which is used by default if you specify nothing
Or it's the service account that you specify at deployment
This service account is valid for the Cloud Run service (you can have up to 1000 different services per project).
Now, when you run your container, the service account is not really loaded into the container (it's the same thing with Compute Engine), but there is an API available for requesting the authentication data of this service account. It's named the metadata server.
It's not restricted to users (I don't know how you performed your test on Compute Engine!); a simple curl is enough for getting the data.
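A sketch of that curl (this metadata endpoint and header are standard on GCP; it must be run from inside the VM or container):
curl -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"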
This metadata server is what your client libraries query when you use the "default credentials", for example; the gcloud SDK also uses it.
I hope you have a better view now. If not, add details in your question or in the comments.
The keyword that's missing from guillaume’s answer is "permissions".
Specifically, if you don't assign a service account, Cloud Run will use the Compute Engine default service account.
This default account has the Editor role on your project (in other words, it can do nearly anything on your GCP project, short of creating new service accounts and giving them access, and maybe deleting the GCP project). If you use the default service account and your container is compromised, you're probably in trouble. ⚠️
However, if you specify a new --service-account, by default it has no permissions. You have to grant it the roles or permissions (e.g. GCS Object Reader, Pub/Sub Publisher...) that your application needs.
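A minimal sketch of that flow (the service account, project, image, and role below are placeholders):
gcloud iam service-accounts create my-run-sa
gcloud projects add-iam-policy-binding my-project \
  --member=serviceAccount:my-run-sa@my-project.iam.gserviceaccount.com \
  --role=roles/storage.objectViewer
gcloud run deploy my-service \
  --image=gcr.io/my-project/my-image \
  --service-account=my-run-sa@my-project.iam.gserviceaccount.com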
Just to add to the previous answers: if you are using something like Cloud Build, here is how you can implement it:
steps:
  - name: gcr.io/cloud-builders/gcloud
    entrypoint: bash
    args:
      - '-c'
      - "gcloud secrets versions access latest --secret=[SECRET_NAME] --format='get(payload.data)' | tr '_-' '/+' | base64 -d > Dockerfile"
  - name: gcr.io/cloud-builders/gcloud
    entrypoint: /bin/bash
    args:
      - '-c'
      - gcloud run deploy [SERVICE_NAME] --source . --region=[REGION_NAME] --service-account=[SERVICE_ACCOUNT]@[PROJECT_ID].iam.gserviceaccount.com --max-instances=[SPECIFY_REQUIRED_VALUE]
options:
  logging: CLOUD_LOGGING_ONLY
I am using this in a personal project, but I will explain what is happening here. The first step pulls data from Secret Manager, where I am storing a Dockerfile with the secret environment variables. This step is optional; if you are not storing any API keys or secrets, you can skip it. But if you have a different folder structure (i.e. one that isn't flat), you may need to adjust where the Dockerfile is written.
The second step deploys Cloud Run from the source code. The documentation for that can be found here:
https://cloud.google.com/run/docs/deploying-source-code

Create service or container from another container, on Google Cloud Run or Cloud Run on GKE

Can I create a service or container from another container on Google Cloud Run or Cloud Run on GKE?
I basically want to manage my containers/services dynamically from another container, and I am not sure how to go about this.
Adding more details:
One of my microservices needs to create new isolated containers that will run some user-land code. I would like to have full life-cycle control of these containers: run the code, then destroy them as needed.
I also looked at the Cloud Run APIs, but I am not sure how to run something like 'kubectl create ...' through the APIs. Is that the right approach?
Yes, you should be able to deploy Cloud Run services from Cloud Run services.
on Cloud Run (hosted): services by default run with Editor permissions, so this should be possible without any extra configuration
note that if you deploy apps with --allow-unauthenticated, which requires setting IAM permissions, the Editor role will not be enough: you need the Owner role on the GCP project for that.
on Cloud Run on GKE: services by default run with limited scopes (as they by default inherit GKE node's permissions/scopes). You should add a service account to the Kubernetes Pod and use it to authenticate.
From there, you have several options:
Use the REST API directly: Since run.googleapis.com behaves like a Kubernetes API server, you can directly apply JSON objects of Knative Services. (You can use gcloud ... --log-http to learn how deployments are made using REST API requests).
Use gcloud: you can ship the gcloud CLI inside your container image and invoke it from your process (see the sketch after this list).
Use Google Cloud Client Libraries: You can use the client libraries that are available for Cloud Run (for example this Go library) to construct in-memory Service objects and send them to the API using a higher level client library (recommended approach)
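For the gcloud option, a minimal sketch of what the managing process might invoke (the service name, image, and region are placeholders):
gcloud run deploy user-code-service \
  --image=gcr.io/my-project/user-code:latest \
  --region=us-central1 \
  --no-allow-unauthenticated
When the work is done, the same process can tear the service down with gcloud run services delete user-code-service --region=us-central1, which gives the full life-cycle control asked about in the question.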
