Debugging AML Model Deployment - machine-learning

I have an ML model (trained locally) in Python. The model was previously deployed to a Windows IIS server and works fine there.
Now I am trying to deploy it as a service on an Azure Container Instance (ACI) with 1 core and 1 GB of memory. I took reference from two Microsoft docs ("one" and "two"). The docs use the SDK for every step, but I am using the GUI in the Azure portal.
After registering the model, I created an entry script and a conda environment YAML file (see below), and uploaded both under "Custom deployment asset" (in the Deploy model area).
Unfortunately, after hitting deploy, the deployment state is stuck at "Transitioning". Even after 4 hours the state remains the same, and there are no deployment logs either, so I am unable to find out what I am doing wrong here.
NOTE: below is just an excerpt of the entry script
import os  # needed for os.path.join / os.getenv in init()
import pandas as pd
import pickle
import re, json
import numpy as np
import sklearn

def init():
    global model
    global classes
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'randomForest50.pkl')
    model = pickle.load(open(model_path, "rb"))
    classes = lambda x: ["F", "M"][x]

def run(data):
    try:
        namesList = json.loads(data)["data"]["names"]
        # preprocessing() is defined elsewhere in the full entry script (this is an excerpt)
        pred = list(map(classes, model.predict(preprocessing(namesList))))
        return str(pred[0])
    except Exception as e:
        error = str(e)
        return error
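For anyone debugging a similar setup, a quick local smoke test can rule out scoring-code problems before deploying. The snippet below is only a sketch: it assumes the full entry script (including the preprocessing() helper) is saved as score.py, that randomForest50.pkl sits in the current directory, and the example name is made up; the payload shape follows what run() parses.

import json
import os

# Hypothetical local check: point AZUREML_MODEL_DIR at the folder holding randomForest50.pkl
os.environ["AZUREML_MODEL_DIR"] = "."

from score import init, run  # score.py = the entry script above (assumed file name)

init()  # loads the model exactly as the service would
sample = json.dumps({"data": {"names": ["Alex"]}})  # "Alex" is just an example name
print(run(sample))  # should print "F" or "M"

The conda environment YAML file I uploaded: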
name: gender_prediction
dependencies:
  - python
  - numpy
  - scikit-learn
  - pip:
    - pandas
    - pickle
    - re
    - json

The issue was in the YAML file: the dependencies/libraries listed there must be actual conda/pip packages appropriate for the conda environment (standard-library modules such as pickle, re, and json are not installable packages). So I changed everything accordingly, and it worked.
Modified YAML file:
name: gender_prediction
dependencies:
  - python=3.7
  - numpy
  - scikit-learn
  - pip:
    - azureml-defaults
    - pandas
    - pickle4
    - regex
    - inference-schema[numpy-support]
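Once the deployment reaches a Healthy state, the service can be exercised with a plain HTTP call. The following is only a sketch: the scoring URI is a placeholder for whatever ACI assigns, and the payload shape matches what run() expects.

import json
import requests

scoring_uri = "http://<your-aci-service>.azurecontainer.io/score"  # placeholder URI
headers = {"Content-Type": "application/json"}
payload = json.dumps({"data": {"names": ["Alex"]}})  # example name, made up

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.status_code, response.text)  # expected body: "F" or "M"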

Related

Install Custom Dependency for KFP Op

I'm trying to set up a simple KubeFlow pipeline, and I'm having trouble packaging up dependencies in a way that works for KubeFlow.
The code simply downloads a config file and parses it, then passes back the parsed configuration.
However, in order to parse the config file, it needs access to another internal Python package.
I have a .tar.gz archive of the package hosted on a bucket in the same project, and I added the URL of the package as a dependency, but I get an error message saying tarfile.ReadError: not a gzip file.
I know the file is good, so it must be some intermediate issue with hosting on a bucket or the way KubeFlow installs dependencies.
Here is a minimal example:
import os

from kfp import compiler
from kfp import dsl
from kfp.components import func_to_container_op
from google.protobuf import text_format
from google.cloud import storage
import training_reader

def get_training_config(working_bucket: str,
                        working_directory: str,
                        config_file: str) -> training_reader.TrainEvalPipelineConfig:
    # download_file is a helper defined elsewhere in the project
    download_file(working_bucket, os.path.join(working_directory, config_file), "ssd.config")
    pipeline_config = training_reader.TrainEvalPipelineConfig()
    with open("ssd.config", 'r') as f:
        text_format.Merge(f.read(), pipeline_config)
    return pipeline_config

config_op_packages = ["https://storage.cloud.google.com/my_bucket/packages/training-reader-0.1.tar.gz",
                      "google-cloud-storage",
                      "protobuf"]

training_config_op = func_to_container_op(get_training_config,
                                          base_image="tensorflow/tensorflow:1.15.2-py3",
                                          packages_to_install=config_op_packages)

def output_config(config: training_reader.TrainEvalPipelineConfig) -> None:
    print(config)

output_config_op = func_to_container_op(output_config)

@dsl.pipeline(
    name='Post Training Processing',
    description='Building the post-processing pipeline'
)
def ssd_postprocessing_pipeline(
        working_bucket: str,
        working_directory: str,
        config_file: str):
    config = training_config_op(working_bucket, working_directory, config_file)
    output_config_op(config.output)

pipeline_name = ssd_postprocessing_pipeline.__name__ + '.zip'
compiler.Compiler().compile(ssd_postprocessing_pipeline, pipeline_name)
The URL https://storage.cloud.google.com/my_bucket/packages/training-reader-0.1.tar.gz requires authentication. Try to download it in Incognito mode and you'll see the login page instead of the file.
Changing the URL to https://storage.googleapis.com/my_bucket/packages/training-reader-0.1.tar.gz works for public objects, but your object is not public.
The only thing you can do (if you cannot make the package public) is to use the google.cloud.storage library or the gsutil program to download the file from the bucket and then install it manually using subprocess.run([sys.executable, '-m', 'pip', 'install', ...]).
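A minimal sketch of that workaround (the bucket and object names are placeholders, and it assumes the component runs with credentials that can read the bucket):

import subprocess
import sys

from google.cloud import storage

def install_private_package(bucket_name: str, blob_path: str,
                            local_path: str = "/tmp/training_reader.tar.gz") -> None:
    # Download the sdist from the private bucket using the authenticated storage client
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_path).download_to_filename(local_path)
    # Install it into the current interpreter's environment
    subprocess.run([sys.executable, "-m", "pip", "install", local_path], check=True)

# Placeholder names, for illustration only
install_private_package("my_bucket", "packages/training-reader-0.1.tar.gz")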
Where are you downloading the data from?
What's the purpose of
pipeline_config = training_reader.TrainEvalPipelineConfig()
with open("ssd.config", 'r') as f:
    text_format.Merge(f.read(), pipeline_config)
return pipeline_config
Why not just do the following:
def get_training_config(
    working_bucket: str,
    working_directory: str,
    config_file: str,
    output_config_path: OutputFile('TrainEvalPipelineConfig'),
):
    download_file(working_bucket, os.path.join(working_directory, config_file), output_config_path)
Regarding "the way kubeflow installs dependencies": export your component to a loadable component.yaml and you'll see how KFP lightweight components install dependencies:
training_config_op = func_to_container_op(
    get_training_config,
    base_image="tensorflow/tensorflow:1.15.2-py3",
    packages_to_install=config_op_packages,
    output_component_file='component.yaml',
)
P.S. Some small pieces of info:
@dsl.pipeline(
Not required unless you want to use the dsl-compile command-line program.
pipeline_name = ssd_postprocessing_pipeline.__name__ + '.zip'
compiler.Compiler().compile(ssd_postprocessing_pipeline, pipeline_name)
Did you know that you can just call kfp.Client(host=...).create_run_from_pipeline_func(ssd_postprocessing_pipeline, arguments={}) to run the pipeline right away?
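As an illustration, running the pipeline directly (rather than compiling it to a .zip) might look like this; the host URL and argument values are placeholders:

import kfp

# Placeholder endpoint; point it at your Kubeflow Pipelines API server
client = kfp.Client(host="http://my-kfp-endpoint.example.com")

client.create_run_from_pipeline_func(
    ssd_postprocessing_pipeline,
    arguments={
        "working_bucket": "my_bucket",      # placeholder values
        "working_directory": "configs",
        "config_file": "ssd.config",
    },
)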

ImportError: No module named cv2 when running Batch transform jobs in SageMaker

When I tried to run a Batch transform job in AWS SageMaker, I got the error below:
ImportError: No module named cv2
Please note that I am able to run import cv2 in the notebook instance; Jupyter runs it there without problems. But it fails at inference time in the batch transform job. I have tried the "env" method described in the link AWS Sagemaker - Install External Library and Make it Persist,
but it still does not work.
Does anyone have a good way to solve this? Thanks!
My code is:
# MXNetModel comes from the SageMaker Python SDK; model_data, role, output_path and
# batch_input are defined earlier in the notebook
from sagemaker.mxnet import MXNetModel

env = {
    'SAGEMAKER_REQUIREMENTS': 'requirements.txt',  # path relative to `source_dir` below
}

image_embed_model = MXNetModel(model_data=model_data,
                               entry_point='sagemaker_entrypoint.py',
                               role=role,
                               source_dir='src',
                               env=env,
                               py_version='py3',
                               framework_version='1.6.0')

transformer = image_embed_model.transformer(instance_count=1,  # Please pay attention here!!!
                                            instance_type='ml.m4.xlarge',
                                            output_path=output_path,
                                            assemble_with='Line',
                                            accept='text/csv')

transformer.transform(batch_input,
                      content_type='text/csv',
                      split_type='Line',
                      input_filter='$[0:]',
                      join_source='Input',
                      wait=False)
You can follow https://github.com/aws/sagemaker-python-sdk/blob/master/doc/using_mxnet.rst#use-third-party-libraries to bring third-party libraries into your batch transform instances. Make sure the requirements.txt file is saved under the right directory before packaging the model data.
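As a sketch of what that layout could look like for the MXNetModel above (source_dir='src'); opencv-python-headless is only a suggestion for headless batch instances, not something the linked doc prescribes:

src/
    sagemaker_entrypoint.py
    requirements.txt        # e.g. a single line: opencv-python-headless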

Dependency parse large text file with python

I am trying to dependency-parse a large txt file (about 2,000 sentences). When I try to set the model_path, I get this message:
NLTK was unable to find stanford-parser.jar! Set the CLASSPATH
environment variable.
And when I set the CLASSPATH to this file, another message comes up:
NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar!
Set the CLASSPATH environment variable.
Would you help me solve this?
This is my code:
import nltk
from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser( model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")
===========================================================================
NLTK was unable to find stanford-parser.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-parser.jar, see:
https://nlp.stanford.edu/software/lex-parser.shtml
import os
os.environ['CLASSPATH'] = "stanford-corenlp-full-2018-10-05/*"
dependency_parser = StanfordDependencyParser( model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")
===========================================================================
NLTK was unable to find stanford-parser.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-parser.jar, see:
https://nlp.stanford.edu/software/lex-parser.shtml
os.environ['CLASSPATH'] = "stanford-corenlp-full-2018-10-05/stanford-parser-full-2018-10-17/stanford-parser.jar"
>>> dependency_parser = StanfordDependencyParser( model_path="stanford-corenlp-full-2018-10-05/stanford-parser-full-2018-10-17/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar!
Set the CLASSPATH environment variable.
For more information, on stanford-parser-(\d+)(.(\d+))+-models.jar, see:
https://nlp.stanford.edu/software/lex-parser.shtml
You should get the new stanfordnlp dependency parser, which is native to Python!
It will run more slowly on a CPU than on a GPU, but it should still run reasonably fast.
Just run pip install stanfordnlp to install it.
import stanfordnlp
stanfordnlp.download('en') # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
doc.sentences[0].print_dependencies()
There is also a helpful command line tool:
python -m stanfordnlp.run_pipeline -l en example.txt
Full details here: https://stanfordnlp.github.io/stanfordnlp/
GitHub: https://github.com/stanfordnlp/stanfordnlp
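Applied to the original problem (a txt file of roughly 2,000 sentences), a minimal sketch might look like the following; the file name is a placeholder, and only the Pipeline/print_dependencies API shown above is used:

import stanfordnlp

stanfordnlp.download('en')      # one-time download of the English models
nlp = stanfordnlp.Pipeline()    # default English neural pipeline

# Placeholder file name; the pipeline performs its own sentence splitting
with open("large_text.txt", encoding="utf-8") as f:
    doc = nlp(f.read())

for sentence in doc.sentences:
    sentence.print_dependencies()  # one dependency parse per sentence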

How to change the Python version in Azure Machine Learning sdk ContainerImage with CondaDependencies

I am trying to get my Faster R-CNN model into a Container Instance on ACI. For that I need my Docker image to have Python version 3.5.*. I specify that in my conda YAML file, but every time I spin up an instance and docker run -it *** /bin/bash into it, I see that it only has Python 3.6.7.
https://user-images.githubusercontent.com/21140767/50680590-82b20b80-1008-11e9-9bfe-4a0e71084ce0.png
How can I get my Docker image to have Python version 3.5.*? I already tried conda installing Python version 3.5.2, but that didn't work: in the end the image didn't have 3.5.2, only 3.6.7. (dfimage lets you see the Dockerfile from which the image was created, https://hub.docker.com/r/chenzj/dfimage/.)
https://user-images.githubusercontent.com/21140767/50680673-d6245980-1008-11e9-9d48-71a7c150d925.png
My yaml:
name: project_environment
dependencies:
  - python=3.5.2
  - pip:
    - matplotlib
    - opencv-python==3.4.3.18
    - azureml-core==1.0.6
    - numpy
    - cntk
    - cython
channels:
  - anaconda
Notebook cell:
from azureml.core.conda_dependencies import CondaDependencies

svmandss = CondaDependencies.create(python_version="3.5.2", pip_packages=[
    "matplotlib",
    "opencv-python==3.4.3.18",
    "azureml-core",
    "numpy",
    "cntk",
    "cython"], )

svmandss.add_channel('anaconda')

with open("fasterrcnn.yml", "w") as f:
    f.write(svmandss.serialize_to_string())
Another notebook cell with ContainerImage specifications.
image_config = ContainerImage.image_configuration(execution_script="score_fasterrcnn.py",
                                                  runtime="python",
                                                  conda_file="./fasterrcnn.yml",
                                                  dependencies=listdir("utils"),
                                                  docker_file="./Dockerfile")

service = Webservice.deploy_from_model(workspace=ws,
                                       name='faster-rcnn',
                                       deployment_config=aciconfig,
                                       models=[Model(workspace=ws, name='Faster-RCNN')],
                                       image_config=image_config)

service.wait_for_deployment(show_output=True)
Note
For better readability see my GitHub issue: (https://github.com/Azure/MachineLearningNotebooks/issues/163).
Currently, the version of Python is fixed to what's in Azure ML's base image when deploying the web service. We're investigating removing this limitation in the future.
Since this is one of the top Google results when searching for "azureml python version", I'm posting the answer here. The documentation is not very clear on this point, but the following will work:
from azureml.core import Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# This is the important part
conda_dep = CondaDependencies(conda_dependencies_file_path="pipeline/environment.yml")
aml_run_config = RunConfiguration(conda_dependencies=conda_dep)

# Define compute target - must be preconfigured in the workspace
compute_target = ws.compute_targets['my-azureml-target']
aml_run_config.target = compute_target

from azureml.pipeline.steps import PythonScriptStep

script_source_dir = "./pipeline"
step_1_script = "test.py"
step_1 = PythonScriptStep(
    script_name=step_1_script,
    source_directory=script_source_dir,
    compute_target=compute_target,
    runconfig=aml_run_config,
    allow_reuse=True
)

from azureml.pipeline.core import Pipeline

# Build the pipeline
pipeline1 = Pipeline(workspace=ws, steps=[step_1])

from azureml.core import Experiment

# Submit the pipeline to be run
pipeline_run1 = Experiment(ws, 'Test-pipeline').submit(pipeline1)
pipeline_run1.wait_for_completion(show_output=True)
This assumes the following directory structure:
root/
    create_pipeline.py
    pipeline/
        test.py
        environment.yml
where create_pipeline.py is the file above, test.py is the script you would like to run, and environment.yml is the conda environment file, including the Python version.
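For illustration, a minimal environment.yml of that kind might look like the following; the pinned Python version and the packages are placeholders for whatever the project needs:

name: project_environment
dependencies:
  - python=3.5.2
  - pip:
    - azureml-defaults
    - numpy
channels:
  - anaconda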
I was able to change the Python version by registering the environment in Azure ML Workspace:
from azureml.core.environment import Environment, Workspace
environment = Environment.from_conda_specification(name='myenv', file_path='environment.yml')
environment.python.user_managed_dependencies = False
workspace = Workspace.from_config()
environment = environment.register(workspace=workspace)
env_build = environment.build(workspace=workspace)
Then, configure the endpoint for publishing as follows:
from azureml.core.model import InferenceConfig

environment = Environment.get(workspace=workspace, name='myenv')
inference_config = InferenceConfig(
    entry_script='inference.py',
    source_directory='.',
    environment=environment
)
This is using Azure ML SDK 1.29.0. Perhaps this has already been fixed and the original method works as well, but I didn't test that.
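If the goal is the same ACI deployment as in the question, the registered environment can then be used along these lines; this is only a sketch against SDK v1, and the service name, model name, and sizing are placeholders:

from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# Placeholder ACI sizing
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

model = Model(workspace, name='Faster-RCNN')   # placeholder registered-model name
service = Model.deploy(
    workspace=workspace,
    name='faster-rcnn-service',                # placeholder service name
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)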
EDIT:
This is no longer an issue for me; I found another way to get my code to work with Python 3.6.7.
However, this is still an issue in my opinion: if I do need Python 3.5 in the future, there is currently no solution.
You can still post an answer if you would like.

Kubernetes/Spring Cloud Dataflow stream > spring.cloud.stream.bindings.output.destination is ignored by producer

I'm trying to run a "Hello, world" Spring Cloud Data Flow stream based on the very simple example explained at http://cloud.spring.io/spring-cloud-dataflow/. I'm able to create a simple source and sink and run it on my local SCDF server using Kafka, so until here everything is correct and messages are produced and consumed in the topic specified by SCDF.
Now, I'm trying to deploy it in my private cloud based on the instructions listed at http://docs.spring.io/spring-cloud-dataflow-server-kubernetes/docs/current-SNAPSHOT/reference/htmlsingle/#_getting_started. Using this deployment I'm able to deploy a simple "time | log" out-of-the-box stream with no problems, but my example fails since the producer is not writing in the topic specified when the pod is created (for instance, spring.cloud.stream.bindings.output.destination=ntest33.nites-source9) but in the topic "output". I have a similar problem with the sink component, which creates and expect messages in the topic "input".
I created the stream definition using the dashboard:
nsource1 | log
And container args for the source are:
--spring.cloud.stream.bindings.output.producer.requiredGroups=ntest34
--spring.cloud.stream.bindings.output.destination=ntest34.nsource1
The code snippet for the source component is:
package xxxx;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.core.MessageSource;
import org.springframework.messaging.support.GenericMessage;

@SpringBootApplication
@EnableBinding(Source.class)
public class HelloNitesApplication
{
    public static void main(String[] args)
    {
        SpringApplication.run(HelloNitesApplication.class, args);
    }

    @Bean
    @InboundChannelAdapter(value = Source.OUTPUT)
    public MessageSource<String> timerMessageSource()
    {
        return () -> new GenericMessage<>("Hello " + new SimpleDateFormat().format(new Date()));
    }
}
And in the logs I can see clearly
2017-04-07T09:44:34.596842965Z 2017-04-07 09:44:34,593 INFO main o.s.i.c.DirectChannel:81 - Channel 'application.output' has 1 subscriber(s).
The question is: how do I properly override the topic where messages must be produced/consumed, or which attributes and values do I need to make this work on k8s?
UPDATE: I have a similar problem using RabbitMQ
2017-04-07T12:56:40.435405177Z 2017-04-07 12:56:40.435 INFO 7 --- [ main] o.s.integration.channel.DirectChannel : Channel 'application.output' has 1 subscriber(s).
The problem was with my Docker image. I still don't know the details, but using the Dockerfile indicated at https://spring.io/guides/gs/spring-boot-docker/ started 2 processes in the container: one with the parameters and one without, and the latter was the one that stayed up and was therefore being used.
The solution was to replace
ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar" ]
With
ENTRYPOINT [ "java", "-jar", "/app.jar" ]
And it started working. There must be a good reason why the example indicated the first entrypoint and why 2 processes were created, but the reason is still beyond my understanding.
Can you provide more details on how you set that configuration property? That feature is pretty basic, so this should work. If you are using a stream definition to set it, please update your question with the stream definition.
The channel name remains 'output' because that's what the application uses internally.
