I am creating an API for my machine learning model using Flasgger and Flask in Python.
After running my API file I am getting the error below:
Failed to load API documents.
Fetch error
Internal Server Error /apispec_1.json
Below is my code:
import pickle
from flask import Flask, abort, jsonify, request
import numpy as np
import pandas as pd
from flasgger import Swagger

with open('./im.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

app = Flask(__name__)
swagger = Swagger(app)

@app.route('/predict')
def predict1():
    """Example
    ---
    parameters:
      -name: Days
        in: query
        type= number
        required: true
    --
    --
    --
    """
    Days = request.args.json('Days')
    prediction = model.predict(np.array([[Days]]))
    return str(prediction)

if __name__ == '__main__':
    app.run(port=5000, debug=True)
You have an error in the docstring description:
@app.route('/predict')
def predict1():
    """Example
    ---
    parameters:
      - name: Days
        in: query
        type: integer
        required: true
    """
Just replace type= number with type: integer.
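For completeness, a runnable version of the endpoint could look like the sketch below. This builds on the question's code and is an assumption, not part of the original answer: besides the docstring fix, the query parameter has to be read with request.args.get (request.args has no json method) and cast to a number, and a responses section is added for a cleaner spec.

import pickle

import numpy as np
from flask import Flask, request
from flasgger import Swagger

# Assumes the same pickled model as in the question.
with open('./im.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

app = Flask(__name__)
swagger = Swagger(app)

@app.route('/predict')
def predict1():
    """Example
    ---
    parameters:
      - name: Days
        in: query
        type: integer
        required: true
    responses:
      200:
        description: The model prediction
    """
    # request.args.get parses the query string; type=int casts the value.
    days = request.args.get('Days', type=int)
    prediction = model.predict(np.array([[days]]))
    return str(prediction)

if __name__ == '__main__':
    app.run(port=5000, debug=True)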
I have created a Python-based pipeline that contains a ParDo that leverages the Python base64 package. When I run the pipeline locally with DirectRunner, all is well. When I run the same pipeline with Dataflow on Google Cloud, it fails with an error of:
NameError: name 'base64' is not defined [while running 'ParDo(WriteToSeparateFiles)-ptransform-47']
It seems to be missing the base64 package, but I believe that to be part of the Python standard library and always present.
Here is my complete pipeline code:
import base64 as base64
import argparse
import apache_beam as beam
import apache_beam.io.fileio as fileio
import apache_beam.io.filesystems as filesystems
from apache_beam.options.pipeline_options import PipelineOptions

class WriteToSeparateFiles(beam.DoFn):
    def __init__(self, outdir):
        self.outdir = outdir

    def process(self, element):
        writer = filesystems.FileSystems.create(self.outdir + str(element) + '.txt')
        message = "This is the content of my file"
        message_bytes = message.encode('ascii')
        base64_bytes = base64.b64encode(message_bytes)  ### Error here
        writer.write(base64_bytes)
        writer.close()

argv = None
parser = argparse.ArgumentParser()
known_args, pipeline_args = parser.parse_known_args(argv)
pipeline_options = PipelineOptions(pipeline_args)

with beam.Pipeline(options=pipeline_options) as pipeline:
    outputs = (
        pipeline
        | beam.Create(range(10))  # Change range here to be millions if needed
        | beam.ParDo(WriteToSeparateFiles('gs://kolban-edi/'))
    )
    outputs | beam.Map(print)
    #print(outputs)
Solution
The solution has been found and is documented here:
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors
After much study, a section in the Google Dataflow Frequently Asked Questions was found, titled "How do I handle NameErrors?".
Reading that documentation, the suggestion was to add an extra pipeline option called --save_main_session. As soon as I added that to the execution, the problem was resolved.
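As a minimal sketch of how that fits the pipeline above (assuming the same pipeline_args variable), --save_main_session can be passed on the command line, or set programmatically through SetupOptions:

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

pipeline_options = PipelineOptions(pipeline_args)
# Pickle the main session so module-level imports (such as base64)
# are available on the Dataflow workers.
pipeline_options.view_as(SetupOptions).save_main_session = True

With the main session saved, names defined at module level in the launching script are shipped to the workers, which is what resolves the NameError.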
I have an ML model (trained locally) in Python. Previously, the model was deployed to a Windows IIS server and it works fine.
Now I am trying to deploy it as a service on an Azure Container Instance (ACI) with 1 core and 1 GB of memory. I took references from two Microsoft docs (one and two). The docs use the SDK for all the steps, but I am using the GUI feature of the Azure portal.
After registering the model, I created an entry script and a conda environment YAML file (see below), and uploaded both to "Custom deployment asset" (in the Deploy model area).
Unfortunately, after hitting deploy, the deployment state is stuck at Transitioning. Even after 4 hours the state remains the same, and there are no deployment logs either, so I am unable to find what I am doing wrong here.
NOTE: below is just an excerpt of the entry script
import os
import pandas as pd
import pickle
import re, json
import numpy as np
import sklearn

def init():
    global model
    global classes
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'randomForest50.pkl')
    model = pickle.load(open(model_path, "rb"))
    classes = lambda x: ["F", "M"][x]

def run(data):
    try:
        namesList = json.loads(data)["data"]["names"]
        pred = list(map(classes, model.predict(preprocessing(namesList))))
        return str(pred[0])
    except Exception as e:
        error = str(e)
        return error
name: gender_prediction
dependencies:
  - python
  - numpy
  - scikit-learn
  - pip:
      - pandas
      - pickle
      - re
      - json
The issue was in the YAML file. The dependencies/libraries listed there have to be installable conda/pip packages (standard-library modules such as pickle, re and json cannot be pip-installed). So I changed everything accordingly, and it worked.
Modified YAML file:
name: gender_prediction
dependencies:
  - python=3.7
  - numpy
  - scikit-learn
  - pip:
      - azureml-defaults
      - pandas
      - pickle4
      - regex
      - inference-schema[numpy-support]
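For reference, a rough SDK equivalent of that portal flow is sketched below. This is an assumption based on the Microsoft docs mentioned in the question, not the original post: the file names environment.yml and score.py, the registered model name randomForest50, and the service name are placeholders.

from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Build the environment from the conda YAML shown above.
env = Environment.from_conda_specification(name="gender_prediction",
                                           file_path="environment.yml")

inference_config = InferenceConfig(entry_script="score.py", environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

model = Model(ws, name="randomForest50")
service = Model.deploy(ws, "gender-prediction", [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

# get_logs() surfaces the container logs that the portal was not showing.
print(service.get_logs())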
I have a simple Dask-YARN script that does only one task: load a file from HDFS, as shown below. However, I'm running into a bug in the code, so I added a print statement in the function, but I don't see the output of that statement in the worker logs that I obtain using yarn logs -applicationId {application_id}. I even tried the method Client.get_worker_logs(); however, that doesn't display the stdout either, it just shows some INFO about the worker(s). How does one obtain worker logs after the execution of the code has completed?
import sys
import numpy as np
import scipy.signal
import json
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster

@dask.delayed
def load(input_file):
    print("In call of Load...")
    with open(input_file, "r") as fo:
        data = json.load(fo)
    return data

# Process input args
(_, filename) = sys.argv

dag_1 = {
    'load-1': (load, filename)
}

print("Building tasks...")
tasks = dask.get(dag_1, 'load-1')

print("Creating YARN cluster now...")
cluster = YarnCluster()

print("Scaling YARN cluster now...")
cluster.scale(1)

print("Creating Client now...")
client = Client(cluster)

print("Getting logs..1")
print(client.get_worker_logs())

print("Doing Dask computations now...")
dask.compute(tasks)

print("Getting logs..2")
print(client.get_worker_logs())

print("Shutting down cluster now...")
cluster.shutdown()
I'm not sure what's going on here; print statements should (and usually do) end up in the log files stored by YARN.
If you want your debug statements to appear in the worker logs from get_worker_logs, you can use the worker logger directly:
from distributed.worker import logger
logger.info("This will show up in the worker logs")
I am writing a Dataflow pipeline that processes videos from a Google Cloud bucket. My pipeline downloads each work item to the local system and then re-uploads the results back to a GCP bucket. Following a previous question.
The pipeline works on the local DirectRunner; I'm having trouble debugging on the DataflowRunner.
The error reads
File "run_clouddataflow.py", line 41, in process
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/blob.py", line 464, in download_to_file self._do_download(transport, file_obj, download_url, headers)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/blob.py", line 418, in _do_download download.consume(transport) File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/download.py", line 101, in consume self._write_to_stream(result)
File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/download.py", line 62, in _write_to_stream with response: AttributeError: __exit__ [while running 'Run DeepMeerkat']
When trying to execute blob.download_to_file(file_obj) within:
storage_client = storage.Client()
bucket = storage_client.get_bucket(parsed.hostname)
blob = storage.Blob(parsed.path[1:], bucket)

# store local path
local_path = "/tmp/" + parsed.path.split("/")[-1]
print('local path: ' + local_path)

with open(local_path, 'wb') as file_obj:
    blob.download_to_file(file_obj)
print("Downloaded" + local_path)
I'm guessing that the workers don't have permission to write locally? Or perhaps there is no /tmp folder in the Dataflow container. Where should I write objects? It's hard to debug without access to the environment. Is it possible to access stdout from the workers for debugging purposes (serial console)?
EDIT #1
I've tried explicitly passing credentials:
try:
    credentials, project = google.auth.default()
except:
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = known_args.authtoken
    credentials, project = google.auth.default()
as well as writing to cwd() instead of /tmp/:
local_path = parsed.path.split("/")[-1]
print('local path: ' + local_path)
with open(local_path, 'wb') as file_obj:
    blob.download_to_file(file_obj)
I'm still getting the cryptic error on blob downloads from GCP.
The full pipeline script is below; setup.py is here.
import logging
import argparse
import json
import logging
import os
import csv

import apache_beam as beam
from urlparse import urlparse
from google.cloud import storage

## The namespaces inside of Cloud Dataflow workers are not inherited;
## please see https://cloud.google.com/dataflow/faq#how-do-i-handle-nameerrors --
## better to write ugly import statements than to miss a namespace.

class PredictDoFn(beam.DoFn):
    def process(self, element):
        import csv
        from google.cloud import storage
        from DeepMeerkat import DeepMeerkat
        from urlparse import urlparse
        import os
        import google.auth

        DM = DeepMeerkat.DeepMeerkat()

        print(os.getcwd())
        print(element)

        # try adding credentials?
        # set credentials, inherit from worker
        credentials, project = google.auth.default()

        # download element locally
        parsed = urlparse(element[0])

        # parse gcp path
        storage_client = storage.Client(credentials=credentials)
        bucket = storage_client.get_bucket(parsed.hostname)
        blob = storage.Blob(parsed.path[1:], bucket)

        # store local path
        local_path = parsed.path.split("/")[-1]
        print('local path: ' + local_path)

        with open(local_path, 'wb') as file_obj:
            blob.download_to_file(file_obj)
        print("Downloaded" + local_path)

        # Assign input from DataFlow/manifest
        DM.process_args(video=local_path)
        DM.args.output = "Frames"

        # Run DeepMeerkat
        DM.run()

        # upload back to GCS
        found_frames = []
        for (root, dirs, files) in os.walk("Frames/"):
            for files in files:
                fileupper = files.upper()
                if fileupper.endswith((".JPG")):
                    found_frames.append(os.path.join(root, files))

        for frame in found_frames:
            # create GCS path
            path = "DeepMeerkat/" + parsed.path.split("/")[-1] + "/" + frame.split("/")[-1]
            blob = storage.Blob(path, bucket)
            blob.upload_from_filename(frame)

def run():
    import argparse
    import os
    import apache_beam as beam
    import csv
    import logging
    import google.auth

    parser = argparse.ArgumentParser()
    parser.add_argument('--input', dest='input',
                        default="gs://api-project-773889352370-testing/DataFlow/manifest.csv",
                        help='Input file to process.')
    parser.add_argument('--authtoken',
                        default="/Users/Ben/Dropbox/Google/MeerkatReader-9fbf10d1e30c.json",
                        help='Input file to process.')
    known_args, pipeline_args = parser.parse_known_args()

    # set credentials, inherit from worker
    try:
        credentials, project = google.auth.default()
    except:
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = known_args.authtoken
        credentials, project = google.auth.default()

    p = beam.Pipeline(argv=pipeline_args)

    vids = (p | 'Read input' >> beam.io.ReadFromText(known_args.input)
              | 'Parse input' >> beam.Map(lambda line: csv.reader([line]).next())
              | 'Run DeepMeerkat' >> beam.ParDo(PredictDoFn()))

    logging.getLogger().setLevel(logging.INFO)
    p.run()

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
I spoke to the google-cloud-storage package maintainer; this was a known issue. Updating to specific versions in my setup.py with
REQUIRED_PACKAGES = ["google-cloud-storage==1.3.2", "google-auth", "requests>=2.18.0"]
fixed the issue.
https://github.com/GoogleCloudPlatform/google-cloud-python/issues/3836
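The linked setup.py is not reproduced here, but a minimal sketch of one that carries the pinned packages to the workers (assuming the job is launched with --setup_file=./setup.py) would look like this; the package name is a placeholder:

import setuptools

REQUIRED_PACKAGES = ["google-cloud-storage==1.3.2", "google-auth", "requests>=2.18.0"]

setuptools.setup(
    name="deepmeerkat-pipeline",  # placeholder package name
    version="0.0.1",
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)

Dataflow installs these requirements on each worker at startup, which is what picks up the fixed google-cloud-storage release.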
import httplib2
import json
import apiclient.discovery # pip install google-api-python-client
# XXX: Enter any person's name
Q = "Vamsi"
# XXX: Enter in your API key from https://code.google.com/apis/console
API_KEY = ''
service = apiclient.discovery.build('plus', 'v1', http=httplib2.Http(),
                                    developerKey=API_KEY)
people_feed = service.people().search(query=Q).execute()
print json.dumps(people_feed['items'], indent=1)