Azure model deployment (Real-time Endpoints vs Compute Inference Cluster)

Reaching out for some help here.
ManagedOnlineDeployment vs KubernetesOnlineDeployment
Goal:
Host a large number of distinct models on Azure ML.
Description:
After a thorough investigation, I found that there are two ways to host a pre-trained real-time model (i.e., run inference) on Azure ML.
Real-time Endpoints - Managed Online Deployment
Compute Inference cluster - kubernetes-online-endpoints
The differences between the two options are detailed here.
I want to host a large number of distinct models (i.e., endpoints) while having the best price/performance/ease-of-deployment ratio.
Details:
What I tried
I have 4 running VMs as a result of creating 4 real-time endpoints. Those endpoints use curated environments provided by Microsoft.
(Screenshots: the running VMs and the deployed real-time endpoints.)
Issues
When I want to create a custom environment from a Dockerfile and then use it as the base image for a given endpoint, it is a long process:
Build Image > Push Image to CR > Create Custom Environment in Azure ML > Create and Deploy Endpoint
If something goes wrong, it only shows up once the whole pipeline has finished. It just doesn't feel like the correct way of deploying a model.
This process is needed whenever I cannot use one of the curated environments because I need a dependency that cannot be installed through the conda.yml file.
For example:
RUN apt-get update -y && apt-get install build-essential cmake pkg-config -y
RUN python setup.py build_ext --inplace
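For reference, the last two steps of that pipeline (register the pushed image as a custom environment, then deploy an endpoint that uses it) look roughly like this with the Azure ML Python SDK v2 (azure-ai-ml). This is only a sketch: the registry, image, model path, endpoint and instance names below are placeholders, and the image is assumed to already be built and pushed to the container registry.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Step 3: register the already-pushed image as a custom environment.
env = ml_client.environments.create_or_update(
    Environment(
        name="my-cython-env",                             # placeholder name
        image="myregistry.azurecr.io/my-cython-image:1",  # image pushed in step 2
    )
)

# Step 4: create the endpoint and a deployment that uses the custom environment.
endpoint = ml_client.online_endpoints.begin_create_or_update(
    ManagedOnlineEndpoint(name="my-endpoint", auth_mode="key")
).result()

ml_client.online_deployments.begin_create_or_update(
    ManagedOnlineDeployment(
        name="blue",
        endpoint_name=endpoint.name,
        model=Model(path="./model"),  # local model folder, registered on the fly
        environment=env,
        code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
        instance_type="Standard_DS3_v2",
        instance_count=1,
    )
).result()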
Although I'm using 1 instance per endpoint (instance count = 1), each endpoint creates its own dedicated VM, which will cost me a lot in the long run (i.e., once I have lots of endpoints); right now it is costing me around $20 per day.
Note: Each endpoint has a distinct set of dependencies/versions...
Questions:
1- Am I following best practice, or do I need to drastically change my deployment strategy (move from ManagedOnlineDeployment to KubernetesOnlineDeployment, sketched below, or even another option that I don't know of)?
2- Is there a way to host all the endpoints on a single VM, rather than creating a VM for each endpoint, to make it affordable?
3- Is there a way to host the endpoints and get charged per transaction?
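For context on questions 1 and 2, the Kubernetes route would look roughly like the following, continuing the sketch above. It assumes an AKS cluster has already been attached to the workspace as a compute target (placeholder name k8s-compute), so several endpoints can share that cluster's nodes instead of each provisioning its own VM.

from azure.ai.ml.entities import KubernetesOnlineDeployment, KubernetesOnlineEndpoint

# ml_client, env, Model and CodeConfiguration are the same as in the sketch above.
endpoint = ml_client.online_endpoints.begin_create_or_update(
    KubernetesOnlineEndpoint(
        name="my-k8s-endpoint",
        compute="k8s-compute",  # pre-attached AKS/Arc compute, shared by many endpoints
        auth_mode="key",
    )
).result()

ml_client.online_deployments.begin_create_or_update(
    KubernetesOnlineDeployment(
        name="blue",
        endpoint_name=endpoint.name,
        model=Model(path="./model"),
        environment=env,  # the same custom Docker-based environment
        code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
        instance_count=1,
    )
).result()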
General recommendations and clarification questions are more than welcome.
Thank you!

For this requirement, the standard approach is to use batch endpoints.
A batch endpoint lets you reuse the same procedure for any number of models running in your current environment. You can call the models that are registered (and even without registering a model, you can get its artifacts): fetch the artifacts, download the model, and deploy it using a procedure like the one outlined below.
Use a custom environment and pack as much as possible (features/dependencies) into a single batch endpoint. There is no way to create and use an environment for this approach other than a Docker image or a conda environment. If a predefined Docker configuration is available, use it; otherwise create one with Docker itself, and avoid conda specifications.
From the available options, choose the one most useful for the application, create it, and then deploy.
This procedure will decrease the configuration burden and improve the functionality.
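A rough sketch of that flow with the Python SDK v2 (azure-ai-ml); the compute cluster, registry, model, environment and endpoint names are placeholders, and the Docker image is assumed to be already pushed to the registry:

from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import (
    BatchDeployment,
    BatchEndpoint,
    CodeConfiguration,
    Environment,
    Model,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace>"
)

# Docker-based custom environment, no conda specification.
env = Environment(name="batch-docker-env", image="myregistry.azurecr.io/batch-image:1")

endpoint = ml_client.batch_endpoints.begin_create_or_update(
    BatchEndpoint(name="models-batch-endpoint")
).result()

ml_client.batch_deployments.begin_create_or_update(
    BatchDeployment(
        name="default",
        endpoint_name=endpoint.name,
        model=Model(path="./downloaded_model"),  # artifacts fetched and downloaded earlier
        environment=env,
        code_configuration=CodeConfiguration(code="./src", scoring_script="batch_score.py"),
        compute="cpu-cluster",  # existing AmlCompute cluster; it can scale to zero between jobs
        instance_count=1,
        max_concurrency_per_instance=2,
        mini_batch_size=10,
    )
).result()

# Scoring runs as a job per invocation, so the cluster only scales up while jobs run.
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=Input(type=AssetTypes.URI_FOLDER, path="azureml://datastores/workspaceblobstore/paths/inputs/"),
)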

Related

How can I set up a Helm chart that creates a "rank" system similar to MPI?

I'm migrating an HPC app originally written with MPI to a Kubernetes cluster. We have removed MPI and are sort of "rolling our own" way of managing the app layout using Helm.
In MPI, you essentially build an "appschema" that looks an awful lot like a Helm chart, but I'm trying to replicate some of the features from MPI in the chart and am unsure about the best approach.
In an MPI application, you launch several copies of the same binary, but each binary is given a unique number, a rank, that identifies it in the application group. This rank is used for determining what part of the problem the binary should work on, and as a way for it to send and receive messages from other binaries in the group. Our approach would use service discovery and something like ZMQ to allow ranks to communicate with each other, but we still need a way of uniquely identifying each rank.
My plan for replicating this behavior is to pass in an environment variable to each pod specifying the rank for its container app, such that in the Docker image, I get the following command:
CMD ["sh", "-c", "/apphome/workerApp $RANKNUM"]
The only thing is, I don't know how best to represent this in a Helm chart. My current line of thinking is to set replicaCount in the values.yaml to the number of desired ranks, but then how can I pass in a unique number for $RANKNUM to each replica? Is this even the best approach or should I use something other than replicaCount?
How can I pass in a unique numerical identifier as an environment variable to each replica in a Kubernetes Helm chart, and is replicaCount the appropriate way to represent MPI rank-like behavior in an HPC app?
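The closest thing I have sketched so far (not sure it is idiomatic) is to stop injecting the number from the chart at all and instead derive it inside the container, assuming the workers run as a StatefulSet rather than a plain Deployment, since StatefulSet pods get stable hostnames ending in an ordinal (workerapp-0, workerapp-1, ...). Everything here is hypothetical, including the RANKNUM/WORLD_SIZE variable names:

# rank_from_hostname.py -- derive an MPI-style rank from a StatefulSet pod's ordinal.
import os
import re
import socket

def rank_from_hostname() -> int:
    # Return the trailing ordinal of this pod's hostname, e.g. "workerapp-3" -> 3.
    hostname = socket.gethostname()
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise RuntimeError(f"hostname {hostname!r} has no trailing ordinal")
    return int(match.group(1))

if __name__ == "__main__":
    # Prefer an explicitly injected RANKNUM; otherwise fall back to the hostname ordinal.
    rank_env = os.environ.get("RANKNUM")
    rank = int(rank_env) if rank_env is not None else rank_from_hostname()
    world_size = int(os.environ.get("WORLD_SIZE", "1"))  # e.g. set from replicaCount
    print(f"worker starting as rank {rank} of {world_size}")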

Building platform-as-a-service

I have been assigned a problem statement which goes as follows:
I am building a platform-as-a-service from scratch, which has pipelined execution. Here, pipelined execution means that the output of one service can be the input of another service. The platform can offer a number of services, which can be pipelined together. The output of a service can be the input of multiple services.
I am new to this field, so how to go about this task is not very intuitive to me.
After researching a bit, I found out that I can use Docker to deploy services in containers. So I installed Docker on Ubuntu, pulled a few images, and ran them as services (for example, MongoDB). What I am thinking is that I need to run the services in containers and define a way of passing input and output to these services. But how exactly do I do this using Docker containers? As an example, I want to send a query as input to MongoDB (running as a service) and get an output, which I want to feed into another service.
Am I thinking in the right direction? If not, in what direction should I be thinking of going about implementing this task?
Is there a standard way of exchanging data between services? (For example, the output of one service as input to another.)
Is there something that Docker offers that I can leverage?
NOTE: I cannot use any high level API which does this for me. I need to implement it myself.
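To make the idea concrete, here is a bare-bones sketch (no orchestration framework, just the containers themselves) of "the output of one service becomes the input of the next". It assumes both containers sit on the same user-defined Docker network so they can resolve each other by container name; the names mongo and transformer, the database/collection, and the downstream URL are all hypothetical.

# pipeline_step.py -- hypothetical glue between two containerized services.
import requests                   # pip install requests
from pymongo import MongoClient   # pip install pymongo

MONGO_URL = "mongodb://mongo:27017"              # container name used as hostname
NEXT_SERVICE_URL = "http://transformer:8080/in"  # hypothetical downstream service

def run_step() -> None:
    # Step 1: query MongoDB (the first service); its result is this step's output.
    client = MongoClient(MONGO_URL)
    docs = list(client.pipeline_db.events.find({}, {"_id": 0}))

    # Step 2: that output becomes the next service's input, handed over via HTTP.
    # Any transport would do here: HTTP, a message queue, or a shared volume.
    resp = requests.post(NEXT_SERVICE_URL, json=docs, timeout=10)
    resp.raise_for_status()
    print(f"forwarded {len(docs)} documents, downstream answered {resp.status_code}")

if __name__ == "__main__":
    run_step()

A user-defined bridge network (docker network create pipeline, then docker run --network pipeline ...) is what makes mongo and transformer resolvable by name; once one output has to feed multiple services, a message broker between the containers is the usual next step.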

Rationale behind appending versions to Service/Deployment names on k8s with Spring Cloud Skipper

I am kind of new to the Spring Cloud Data Flow world, and while playing around with the framework I see that if I have a stream 'test-stream' with 1 application called 'app', then when I deploy it using Skipper to Kubernetes, it creates the pod/deployment & service on Kubernetes with the name
test-stream-app-v1.
My question is why do we need to have v1 in service/deployment names on k8s? What role does it play in the overall workflow using spring cloud dataflow?
------Follow up -----------
Just wanted to confirm a few points to make sure I am on the right track to understanding the flow.
My understanding is that with a traditional stream (bound through Kafka topics), the service (object on Kubernetes) does not play a significant role.
The rolling-update (red/black) pattern is implemented in the following way in Skipper, and the versioning in deployment/service names plays a role as follows:
Let's assume that the app-v1 deployment already exists and an upgrade is requested. Skipper creates the app-v2 deployment and waits for it to be ready. Once it is ready, it destroys app-v1.
If my above understanding is right, I have the following follow-up questions...
I see that Skipper can deploy any package (and it does not have to be a traditional stream). Is that the longer-term plan, or is Skipper only intended to work with spring-cloud-dataflow streams?
In the case of a non-traditional stream package, where a package has multiple apps (REST microservices) in a group, how will this model of versioning work? I mean, when I want to call one microservice from another microservice, I cannot possibly know (or it is less than ideal to have to know) the release version of the app.
#Anand. Congrats on the 1st post!
The naming convention follows the idea that each of the stream applications is "versioned" when Skipper is used with SCDF. The version gets bumped when, as a user, you rolling-upgrade or rolling-downgrade the streaming-application versions or the application-specific properties, either on demand or via CI/CD automation.
It is very relevant for continuous-delivery and continuous-deployment workflows, and we provide native options in SCDF through commands such as stream update .. and stream rollback .. respectively. For any of these operations, the applications will be rolling-updated in K8s, and each action will bump the number in the application name. In your example, you'd see them as test-stream-app-v1, test-stream-app-v2, etc.
With all the historical versions in a central place (i.e., Skipper's database), you'd be able to interact with them via the stream history .. and stream manifest .. commands in SCDF.
To learn more about all this, watch this demo webinar (starts at ~41:25), and also have a look at the samples in the reference guide.
I hope this helps.

Where should I put shared services for multiple kubernetes-clusters?

Our company is developing an application which runs in 3 separate Kubernetes clusters with different versions (production, staging, testing).
We need to monitor our clusters and the applications over time (metrics and logs). We also need to run a mailserver.
So basically we have 3 different environments with different versions of our application. And we have some shared services that just need to run and we do not care much about them:
Monitoring: We need to install InfluxDB and Grafana. In every cluster there's a pre-installed Heapster that needs to send data to our tools.
Logging: We didn't decide yet.
Mailserver (https://github.com/tomav/docker-mailserver)
Independent services: Sentry, GitLab
I am not sure where to run these external shared services. I found these options:
1. Inside each cluster
We need to install the tools 3 times for the 3 environments.
Con:
We don't have one central point to analyze our systems.
If the whole cluster is down, we cannot look at anything.
Installing the same tools multiple times does not feel right.
2. Create an additional cluster
We install the shared tools in an additional kubernetes-cluster.
Con:
Cost for an additional cluster
It's probably harder to send ongoing data to an external cluster (networking, security, firewall, etc.).
3. Use an additional root-server
We run docker-containers on an oldschool-root-server.
Con:
Feels contradictory to use a root server instead of cutting-edge k8s.
Single point of failure.
We need to control the docker-containers manually (or attach the machine to rancher).
I tried to Google the problem but cannot find anything on the topic. Can anyone give me a hint or some links on this topic?
Or is it just not a relevant problem that a cluster might go down?
To me, the second option sounds less evil, but I cannot yet estimate whether it is hard to transfer data from one cluster to another.
The important questions are:
Is it a problem to have monitoring-data in a cluster because one cannot see the monitoring-data if the cluster is offline?
Is it common practice to have an additional cluster for shared services that should not have an impact on other parts of the application?
Is it (easily) possible to send metrics and logs from one kubernetes-cluster to another (we are running kubernetes in OpenTelekomCloud which is basically OpenStack)?
Thanks for your hints,
Marius
That is a very complex and philosophical topic, but I will give you my view on it and some facts to support it.
I think the best way is the second one - create an additional cluster - and here is why:
You need a point that is accessible from any of your environments. With a separate cluster, you can set the same firewall rules, routes, etc. in all your environments, and it doesn't affect your current workload.
Yes, you need to pay a bit more. However, you need resources to run your shared applications anyway, and the overhead of a Kubernetes infrastructure is not high in comparison with the applications.
With a separate cluster, you can set up a real HA solution, which you might not need for the staging and development clusters, so you will not pay for that multiple times.
Technically, it is also OK. You can use Heapster to collect data from multiple clusters; almost any logging solution can also work with multiple clusters. All other applications can simply run on the separate cluster, and that's all you need to do with them.
Now, about your questions:
Is it a problem to have monitoring-data in a cluster because one cannot see the monitoring-data if the cluster is offline?
No, it is not a problem with a separate cluster.
Is it common practice to have an additional cluster for shared services that should not have an impact on other parts of the application?
I think so, yes. At least I have done it several times, and I know some other projects with a similar architecture.
Is it (easily) possible to send metrics and logs from one kubernetes-cluster to another (we are running kubernetes in OpenTelekomCloud which is basically OpenStack)?
Yes, nothing complex there. Usually, it does not depend on the platform.

docker-compose per microservice in local and production environments?

I want to be able to develop microservices locally, but also to 'push' them into production with minimal configuration changes. I used to put all microservices into one docker-compose locally, but I am starting to see that this might not be practical.
The new idea is to have a single docker-compose per service. This does not mean it will run with only one container; it might have more inside (like a datastore behind it, etc.).
From that new point of view, let's take a look at the well-known docker voting app example, that consist of 5 components:
(P) Python webapp which lets you vote between two options
(R) Redis queue which collects new votes
(J) Java worker which consumes votes and stores them in…
(S) Postgres database backed by a Docker volume
(N) Node.js webapp which shows the results of the voting in real time
Let's say you want to push this example into production (so having just one docker-compose is not an option :). Don't forget that more infrastructure-related components may be added on top of it (like Kibana, Prometheus...). And we want to be able to scale what we need; we use e.g. Swarm.
The question is:
How should this example be organized: into a single docker-compose or many?
What microservices do we have here? In other words, which components would you combine into a single docker-compose? Example: J and S?
If the services are not in a single docker-compose, do we add them to the same overlay network to use the Swarm DNS feature?
and so on...
(I don't need details on how to install stuff, this question is about top-level organization)
Docker Compose is mostly for defining different containers, configuring them, and making them available with a single command (it also handles sequencing). So it is best suited for local development, integration testing, and use as part of your Continuous Integration process.
While not ruling out that Docker Compose can be used in a production environment, I think this would be a good case for using Kubernetes, which gives more control over scaling and managing multiple containers.
This blog has some example scenarios to try out (and many other resources which can be helpful)
https://renzedevries.wordpress.com/2016/05/31/deploying-a-docker-container-to-kubernetes-on-amazon-aws/comment-page-1/#comment-10
