Spacy-Transformers: Access GPT-2?

I'm using Spacy-Transformers to build some NLP models.
The Spacy-Transformers docs say:
spacy-transformers
spaCy pipelines for pretrained BERT, XLNet and GPT-2
The sample code on that page shows:
import spacy
nlp = spacy.load("en_core_web_trf")
doc = nlp("Apple shares rose on the news. Apple pie is delicious.")
Based on what I've learned from this video, "en_core_web_trf" appears to be the package to pass to spacy.load() in order to use a BERT model. I've searched the Spacy-Transformers docs and haven't yet found an equivalent package for GPT-2. Is there a specific package to pass to spacy.load() in order to use a GPT-2 model?

The en_core_web_trf pipeline uses a specific transformer model, but you can specify arbitrary ones using the TransformerModel wrapper class from spacy-transformers. See the docs for details. An example config:
[model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-base"  # this can be the name of any hub model
tokenizer_config = {"use_fast": true}
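For GPT-2 specifically, there is no prepackaged "en_core_web_*" pipeline; you point the same wrapper at the "gpt2" hub model instead. A minimal sketch, assuming spacy-transformers v1.x is installed (untested here):
import spacy

# Build a blank English pipeline and back its transformer component
# with GPT-2 from the Hugging Face hub; the partial config is merged
# with the component's defaults.
nlp = spacy.blank("en")
nlp.add_pipe("transformer", config={"model": {"name": "gpt2"}})
nlp.initialize()

doc = nlp("Apple shares rose on the news.")
print(doc._.trf_data)  # raw transformer output for the Doc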

Related

How can I start using MLFlow on databricks with an existing trained model?

I have an existing model that was trained on Azure. I want to fully integrate and start using the model on Databricks. What's the best way to do this? How can I successfully load the model into the Databricks model workflow? I have the model in a pickle file.
I have read almost all the documentation on Databricks, but 99% of it is about new models trained on Databricks and never about importing existing models.
Since MLflow has a standardized model storage format, you just need to bring over the model files and start using them with the MLflow package. In addition, you can register the model in the workspace's model registry using mlflow.register_model() and then use it from there. These would be the steps:
On the AzureML side, I assume that you have an MLflow model saved to disk (using mlflow.sklearn.save_model() or mlflow.sklearn.autolog() -- or some other mlflow.<flavor>). That should give you a folder that contains an MLmodel file and, depending on the flavor of the model, a few more files, like the below:
mlflow-model
├── MLmodel
├── conda.yaml
├── model.pkl
└── requirements.txt
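For context, here is a minimal sketch (with a hypothetical model) of how such a folder is produced on the AzureML side:
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Writes the MLmodel file, conda.yaml, model.pkl, and requirements.txt
# into ./mlflow-model
mlflow.sklearn.save_model(model, "mlflow-model")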
Note: You can download the model from the AzureML workspace using the v2 CLI like so: az ml model download --name <model_name> --version <model_version>
Open a Databricks Notebook and make sure it has mlflow installed
%pip install mlflow
Upload the MLflow model files to the DBFS connected to the cluster.
In the notebook, register the model using MLflow (adjust the dbfs: path to the location where the model was uploaded).
import mlflow
model_version = mlflow.register_model("dbfs:/FileStore/shared_uploads/mlflow-model/", "AzureMLModel")
Now your model is registered in the Workspace's model registry like any model that was created from a Databricks session. So, you can access it from the registry like so:
model = mlflow.pyfunc.load_model(f"models:/AzureMLModel/{model_version.version}")
input_example = {
    "sepal_length": [5.1, 4.8],
    "sepal_width": [3.5, 4.4],
    "petal_length": [1.4, 2.0],
    "petal_width": [0.2, 0.1],
}
model.predict(input_example)
Or use the model as a spark_udf:
import pandas as pd
model_udf = mlflow.pyfunc.spark_udf(spark=spark, model_uri=f"models:/AzureMLModel/{model_version.version}", result_type='string')
spark_df = spark.createDataFrame(pd.DataFrame(input_example))
spark_df = spark_df.withColumn('foo', model_udf(*spark_df.columns))  # pass the feature columns to the UDF
display(spark_df)
Note that I am using mlflow.pyfunc to load the model, since every MLflow model needs to support the pyfunc flavor. That way, you don't need to worry about the native flavor of the model.
If your source model is already in an MLflow tracking server, see https://github.com/mlflow/mlflow-export-import
If your source model was not trained with MLflow, see "How do I create an MLflow run from a model I have trained elsewhere?": https://github.com/amesar/mlflow-resources/blob/master/MLflow_FAQ.md#how-do-i-create-an-mlflow-run-from-a-model-i-have-trained-elsewhere

Neo compilation job failed on YOLOv5/v7 model

I was trying to use AWS SageMaker Neo compilation to convert a YOLO model (trained with our custom data) to CoreML format, but got an error on the input config:
ClientError: InputConfiguration: Unable to determine the type of the model, i.e. the source framework. Please provide the value of argument "source", from one of ["tensorflow", "pytorch", "mil"]. Note that model conversion requires the source package that generates the model. Please make sure you have the appropriate version of source package installed.
It seems Neo cannot recognize the YOLO model. Are there any special requirements for the model in AWS SageMaker Neo?
I've tried both the latest yolov7 model and a yolov5 model, with both .pt and .pth file extensions, but I still get the same error. I also tried downgrading the PyTorch version to 1.8; still not working.
But when I use the yolov4 model from this tutorial post, it works fine: https://aws.amazon.com/de/blogs/machine-learning/speed-up-yolov4-inference-to-twice-as-fast-on-amazon-sagemaker/
Any idea whether Neo compilation can work with YOLOv7/v5 models?
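One possible angle, as a hedged sketch: for the PyTorch framework, Neo expects a traced TorchScript artifact packed into a model.tar.gz, and a plain .pt/.pth checkpoint can trigger exactly this "unable to determine the source framework" message from the underlying converter. The yolov5 hub load below is only illustrative; you would trace your custom-trained checkpoint instead:
import torch

# Load an example model (substitute your own custom-trained weights).
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.eval()

# Trace with a dummy input matching the DataInputConfig of the Neo job.
example = torch.rand(1, 3, 640, 640)
traced = torch.jit.trace(model, example, strict=False)
traced.save("model.pth")
# Then: tar -czf model.tar.gz model.pth, upload to S3, and point the
# Neo compilation job at that archive.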

Save a TensorFlow 2.0 model and use it in OpenCV 4

I currently build my models with TensorFlow 2.0 and I want to run them with OpenCV 4 (I want to compare performance). But I can't find a way to convert my TensorFlow model for OpenCV.
For running in opencv I want to use:
cv2.dnn.readNetFromTensorflow('saved_model.pb', 'saved_model.pbtxt')
but when I save my model with:
model.save('./')
I obtain these files:
saved_model.pb
variables/variables.index
variables/variables.data-00000-of-00002
variables/variables.data-00001-of-00002
I have my .pb but not my .pbtxt. How is it possible to write this file? According to the OpenCV documentation, this file is the text graph definition. I already tried to write a .pbtxt with
model.to_json()
but it didn't work :/
Do you have any ideas?
Thanks in advance!
Tanguy
Additionally, OpenCV requires an extra configuration file based on the .pb: the .pbtxt. It is possible to import your own models and generate your own .pbtxt files by using one of the helper scripts from the OpenCV GitHub repository.
Here is a link to a tutorial: https://jeanvitor.com/tensorflow-object-detecion-opencv/
I haven't tried it myself, but it seems legit.
For example, tf_text_graph_ssd.py gets the job done.
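As a hedged sketch of that workflow (file names are placeholders; tf_text_graph_ssd.py lives under samples/dnn/ in the OpenCV repository and expects a frozen detection graph plus its pipeline.config):
# First, generate the text graph from the shell:
#   python tf_text_graph_ssd.py --input frozen_inference_graph.pb \
#       --config pipeline.config --output graph.pbtxt
# Then load both files in OpenCV:
import cv2

net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")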

Where to get the Model for EdgeBoxes in OpenCV Python

The link below provides the python implementation for edgeboxes:
https://github.com/opencv/opencv_contrib/blob/96ea9a0d8a2dee4ec97ebdec8f79f3c8a24de3b0/modules/ximgproc/samples/edgeboxes_demo.py
However, I do not understand this part:
model = sys.argv[1]
I want to know from where can I get this model?
model = sys.argv[1]
means that the first argument passed when you call this script from the shell is the model.
Usage:
edgeboxes_demo.py [<model>] [<input_image>]
You can use the example model provided in the opencv_extra repository.
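For reference, a hedged sketch of how the demo uses that argument (the model file is the structured edge detection model, e.g. model.yml.gz under testdata/cv/ximgproc in opencv_extra; paths are placeholders):
# Run the demo from the shell, passing the model path first:
#   python edgeboxes_demo.py model.yml.gz input.jpg
import cv2

# The model feeds the structured edge detector that EdgeBoxes builds on.
edge_detector = cv2.ximgproc.createStructuredEdgeDetection("model.yml.gz")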

How can I deploy a scikit learn python model with Watson Studio and Machine Learning?

Suppose that I already have a scikit-learn model and I want to save it to my Watson Machine Learning instance and deploy it using the Python client.
The Python client docs: http://wml-api-pyclient.mybluemix.net
I have something like:
from sklearn import svm

clf = svm.SVC(kernel='rbf')
clf.fit(train_data, train_labels)
# Evaluate your model.
predicted = clf.predict(test_data)
What I want to do is to deploy this model as a web service accessible via REST API.
I read in the Watson Machine Learning Documentation here: https://dataplatform.cloud.ibm.com/docs/content/analyze-data/wml-ai.html?audience=wdp&context=analytics
but I'm having trouble when deploying the model.
You can also deploy it as a Python function. What you need to do is wrap all of your functionality into a single deployable function (see Python closures).
The way you use the credentials is the same with this method.
Step 1: Define the function
Step 2: Store the function in the repository
After that, you can deploy it and access it in two ways:
using the Python client
using the REST API
This has been explained in detail in this post; a minimal sketch of the pattern is shown below.
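A sketch of that closure pattern (hypothetical names; the exact meta_props and store_function() signature depend on your client version):
# Step 1: define the function. The outer function captures whatever the
# inner score() needs (credentials, model id, preprocessing).
def deployable_function_generator():
    def score(payload):
        # payload arrives as {"values": [[...], ...]}; apply your
        # preprocessing and model call here, then return predictions.
        return {"predictions": payload["values"]}  # placeholder echo
    return score

# Step 2: store it in the repository (hedged; meta names vary by version):
function_details = client.repository.store_function(
    function=deployable_function_generator,
    meta_props={client.repository.FunctionMetaNames.NAME: "my function"},
)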
With a scikit-learn model, Watson Machine Learning expects a pipeline object instead of just a fitted model object. This is so that you also deploy the data transformation and preprocessing logic to the same endpoint. For example, try changing your code to:
from sklearn import preprocessing, svm
from sklearn.pipeline import Pipeline

scaler = preprocessing.StandardScaler()
clf = svm.SVC(kernel='rbf')
pipeline = Pipeline([('scaler', scaler), ('svc', clf)])
model = pipeline.fit(train_data, train_labels)
Then you will be able to deploy the model by following the docs here: http://wml-api-pyclient.mybluemix.net/#deployments
From your Notebook in Watson Studio, you can just
from watson_machine_learning_client import WatsonMachineLearningAPIClient
wml_credentials = {
    "url": "https://ibm-watson-ml.mybluemix.net",
    "username": "*****",
    "password": "*****",
    "instance_id": "*****"
}
client = WatsonMachineLearningAPIClient(wml_credentials)
and then use the client to deploy the model, after first saving it to the repository.
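A hedged sketch of those two steps (API names follow the v1 watson_machine_learning_client; newer ibm-watson-machine-learning clients differ):
# Save the fitted Pipeline from above to the repository; passing the
# training data lets WML infer the input schema.
meta_props = {client.repository.ModelMetaNames.NAME: "svc-pipeline"}
model_details = client.repository.store_model(
    model=model,
    meta_props=meta_props,
    training_data=train_data,
    training_target=train_labels,
)
model_uid = client.repository.get_model_uid(model_details)

# Deploy it as a web service and grab the REST scoring endpoint.
deployment = client.deployments.create(model_uid, name="svc-deployment")
scoring_url = client.deployments.get_scoring_url(deployment)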
You can see how to accomplish all of this in this tutorial notebook: https://dataplatform.cloud.ibm.com/exchange/public/entry/view/168e65a9e8d2e6174a4e2e2765aa4df1
