Running Google Dataflow locally for Image Recognition - machine-learning

I am currently following this tutorial on transfer learning using TensorFlow and Google Cloud Platform:
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
It works perfectly in the cloud with my own data when I use their sample code:
# Preprocess the eval set.
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
--output_path "${GCS_PATH}/preproc/eval" \
--cloud
I get all the preprocessing, training and deployment done.
However, I would like to be able to run it locally so that I can make changes in the code and debug it more efficiently.
In the code it states:
To run this pipeline locally run the above command without --cloud.
So the command would read:
# Preprocess the eval set.
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
--output_path "${GCS_PATH}/preproc/eval"
I tried running this command with input_dict, input_path and output_path set to Cloud Storage paths, as well as to paths of files on my local machine.
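A fully local attempt looked roughly like this (the local file names below are placeholders, not from the original post):
python trainer/preprocess.py \
--input_dict "input_dict.txt" \
--input_path "eval_set.csv" \
--output_path "preproc/eval"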
However, I get the following error:
tensorflow/core/platform/cloud/google_auth_provider.cc:151] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Unavailable: libcurl failed with error code 23: Failed writing body (91 != 196)". Retrieving token from GCE failed with "Unavailable: Unexpected response code 0".
So it seems to be an authentication issue.
The odd thing is that I do not have any authentication problems when copying files from Google Cloud Storage manually.
I already tried:
$ gcloud auth application-default login
but it doesn't change anything.
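For reference, another thing worth trying is handing the local process an explicit service-account key via the standard GOOGLE_APPLICATION_CREDENTIALS variable (the key path below is a placeholder; whether it helps in this setup is an assumption):
# Assumption: point the standard GOOGLE_APPLICATION_CREDENTIALS variable at a
# downloaded service-account key so local Google client libraries can authenticate
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
--output_path "${GCS_PATH}/preproc/eval"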
Does anyone have a solution for that?

Related

Twilio: An error occurs when deploying with a plugin

When I deploy using the command below,
twilio flex:plugins:deploy --changelog='first deploy'
the following error occurs. I don't understand what the resource path in the error message refers to.
Error code 20404 from Twilio: The requested resource /Services/ZSXXXXXXXXXXXXXXXXXXXXXXX/Environments was not found. See https://www.twilio.com/docs/errors/20404 for more info.
This is the first deployment; nothing has been deployed yet.
What should I do?
twilio serverless:deploy
I had previously used the above command to deploy Functions and Assets on the serverless side. At that time, I deleted the Services for Functions and Assets that existed by default.
Is this default Service relevant for plugins?
And if it is, where do I reset it for the plugin?
When I contacted support, they told me to run the following reset command:
curl https://flex-api.twilio.com/v1/Configuration \
-H "Content-Type: application/json" \
-d '{"account_sid":"ACCOUNT_SID", "serverless_service_sids": []}' \
-u ACCOUNT_SID:AUTH_TOKEN
After executing the above command, I was able to deploy again without problems.
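That is, after the reset the plugin deploy from above can simply be rerun (the changelog message below is arbitrary):
twilio flex:plugins:deploy --changelog='redeploy after resetting serverless_service_sids'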

How to fail an AppCenter build on a script exit code?

I've got a bunch of scripts that get called when the appcenter-pre-build.sh is called. For example, one of them is a simple check to see if the current branch tag already exists on the repository.
#!/usr/bin/env bash
set -e # Exit immediately if a command exits with a non-zero status (failure)
# 1. Fetch tags from the remote
git fetch --tags
# 2. See if the tag already exists
if git tag --list | grep -qE "^$VERSION_TAG$"
then
  echo "Error: Found tag. Exiting."
  exit 1
else
  git tag "$VERSION_TAG"
  git push origin "$VERSION_TAG"
fi
If the tag is found, I want to abort the build in AppCenter and fail it. This worked perfectly fine when I was running everything through Xcode Server but for some reason, I cannot figure out how to abort the build upon failure of my script. I'm not seeing much documentation on this particular subject and the AppCenter folk over at Microsoft are taking their sweet time getting back to me.
Anyone have experience with this and/or know how to fail an AppCenter build from their scripts? Thanks in advance for your thoughts!
Okay, figured it out. It looks like sending a curl request to cancel the build, using the environment variable $APPCENTER_BUILD_ID, takes care of the issue. Exiting your script with a non-zero status does NOT fail the build inside AppCenter.
Here's a sample of what to do. I just put it in a special "cancelAppCenterBuild.sh" script and called it in place of my exits.
API_TOKEN="<YourAppToken>"
OWNER_NAME="<YourOwnerOrOrganizationName>"
APP_NAME="<YourAppName>"
curl -iv "https://appcenter.ms/api/v0.1/apps/$OWNER_NAME/$APP_NAME/builds/$APPCENTER_BUILD_ID" \
-X PATCH \
-d "{\"status\":\"cancelling\"}" \
--header 'Content-Type: application/json' \
--header "X-API-Token: $API_TOKEN"
Pro tip: If you've ever renamed your app, AppCenter's servers can have issues referencing the new name. I was getting a 403 with a forbidden message. You might have to change your app name back to whatever the original name was, or just rebuild the app from scratch within AppCenter.
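For context, here is a sketch of how the cancel script can replace the exit 1 in the original tag check (the exact wiring is an assumption, not from the original post):
# appcenter-pre-build.sh (sketch)
git fetch --tags
if git tag --list | grep -qE "^$VERSION_TAG$"
then
  echo "Error: Found tag. Cancelling the AppCenter build."
  ./cancelAppCenterBuild.sh   # sends the PATCH request shown above
else
  git tag "$VERSION_TAG"
  git push origin "$VERSION_TAG"
fi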

Google Endpoints YAML file update: Is there a simpler method?

When using Google Endpoints with Cloud Run to provide the container service, one creates a YAML file (Swagger 2.0 format) to specify the paths with all configurations. For EVERY change, this is what I do (based on the documentation at https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-functions):
Step 1: Deploying the Endpoints configuration
gcloud endpoints services deploy openapi-functions.yaml \
--project ESP_PROJECT_ID
This gives me the following output:
Service Configuration [CONFIG_ID] uploaded for service [CLOUD_RUN_HOSTNAME]
Then,
Step 2: Download the gcloud_build_image script to the local machine and run it:
chmod +x gcloud_build_image
./gcloud_build_image -s CLOUD_RUN_HOSTNAME \
-c CONFIG_ID -p ESP_PROJECT_ID
Then,
Step 3: Redeploy the Cloud Run service
gcloud run deploy CLOUD_RUN_SERVICE_NAME \
--image="gcr.io/ESP_PROJECT_ID/endpoints-runtime-serverless:CLOUD_RUN_HOSTNAME-CONFIG_ID" \
--allow-unauthenticated \
--platform managed \
--project=ESP_PROJECT_ID
Is this the process for every API path change? Or is there a simpler direct method of updating the YAML file and uploading it somewhere?
Thanks.
Based on the documentation, yes, this would be the process for every API path change. However, this may change in the future, as the feature is currently in beta, as stated in the documentation you shared.
You may want to look over here in order to create a feature request to GCP so they can improve this feature in the future.
In the meantime, I would advise creating a script for this process, since it is always the same steps; a small bash script that runs these commands would help you automate the task (see the sketch below).
Hope you find this useful.
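Such a wrapper could look roughly like the following, reusing the placeholder names from the question; extracting CONFIG_ID via gcloud endpoints configs list is an assumption and may need adjusting:
#!/usr/bin/env bash
set -euo pipefail

PROJECT_ID="ESP_PROJECT_ID"
RUN_SERVICE="CLOUD_RUN_SERVICE_NAME"
RUN_HOSTNAME="CLOUD_RUN_HOSTNAME"

# Step 1: deploy the Endpoints configuration
gcloud endpoints services deploy openapi-functions.yaml --project "$PROJECT_ID"

# Grab the newest CONFIG_ID for the service (assumes sorting by id yields the latest)
CONFIG_ID=$(gcloud endpoints configs list --service "$RUN_HOSTNAME" \
  --sort-by="~id" --limit=1 --format="value(id)")

# Step 2: build the serverless ESP image for that configuration
./gcloud_build_image -s "$RUN_HOSTNAME" -c "$CONFIG_ID" -p "$PROJECT_ID"

# Step 3: redeploy the Cloud Run service with the freshly built image
gcloud run deploy "$RUN_SERVICE" \
  --image="gcr.io/$PROJECT_ID/endpoints-runtime-serverless:$RUN_HOSTNAME-$CONFIG_ID" \
  --allow-unauthenticated \
  --platform managed \
  --project="$PROJECT_ID"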
When you use the default Cloud Endpoints image as described in the documentation, the parameter --rollout_strategy=managed is automatically set.
You then have to wait up to a minute or so before the new configuration is in use; at least that is what I observe in my own deployments. Give it a try!

Getting DataflowRunner with --experiments=upload_graph to work

I have a pipeline that produces a Dataflow graph (serialized JSON representation) exceeding the allowable limit for the API, and thus it cannot be launched via the Dataflow runner for Apache Beam as one normally would. Running the Dataflow runner with the suggested parameter --experiments=upload_graph does not work either; it fails saying there are no steps specified.
When the size problem is reported, the error provides the following information:
the size of the serialized JSON representation of the pipeline exceeds the allowable limit for the API.
Use experiment 'upload_graph' (--experiments=upload_graph)
to direct the runner to upload the JSON to your
GCS staging bucket instead of embedding in the API request.
Using this parameter does indeed result in the Dataflow runner uploading an additional dataflow_graph.pb file to the staging location beside the usual pipeline.pb file, and I verified that it actually exists in GCP storage.
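For example, this can be checked with gsutil (the staging path below is a placeholder):
# Both graph files should be visible under the job's staging location
gsutil ls "gs://<your-gcs-bucket>/tmp/staging/" | grep -E "pipeline\.pb|dataflow_graph\.pb"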
However, the job in GCP Dataflow then immediately fails after starting with the following error:
Runnable workflow has no steps specified.
I've tried this flag with various pipelines, even the Apache Beam example pipelines, and I see the same behaviour.
This can be reproduced using the word count example:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=2.11.0 \
-DgroupId=org.example \
-DartifactId=word-count-beam \
-Dversion="0.1" \
-Dpackage=org.apache.beam.examples \
-DinteractiveMode=false
cd word-count-beam/
Running it without the --experiments=upload_graph parameter works (make sure to specify your project and bucket if you want to run this):
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
--gcpTempLocation=gs://<your-gcs-bucket>/tmp \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
-Pdataflow-runner
Running it with --experiments=upload_graph results in the pipeline failing with the message 'Runnable workflow has no steps specified':
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
--gcpTempLocation=gs://<your-gcs-bucket>/tmp \
--experiments=upload_graph \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
-Pdataflow-runner
Now I would expect the Dataflow runner to direct GCP Dataflow to read the steps from the specified bucket, as seen in the source code:
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L881
However this seems not to be the case. Has anyone gotten this to work, or has found some documentation regarding this feature that can point me in the right direction?
The experiment has since been reverted and the messaging will be corrected in Beam 2.13.0.
Revert PR
I recently ran into this issue, and the solution was quite silly. I had a fairly complex Dataflow streaming job developed; it was working fine, and the next day it stopped working with the error "Runnable workflow has no steps specified." In my case, someone had specified pipeline.run().waitUntilFinish() twice after creating the options, and because of that I was getting this error. Removing the duplicate pipeline run resolved the issue. I still think there should be some useful error trace from Beam/DataflowRunner in this scenario.

Error while running model training in Google Cloud ML

I want to run model training in the cloud. I am following this link, which runs sample code to train a model on the flower dataset. The tutorial consists of 4 stages:
Set up your Cloud Storage bucket
Preprocessing training and evaluation data in the cloud
Run model training in the cloud
Deploying and using the model for prediction
I was able to complete steps 1 and 2; however, in step 3 the job is submitted successfully but an error occurs and the task exits with a non-zero status of 1. (The task log and a screenshot of the expanded log were attached as images in the original post.)
I used the following command:
gcloud ml-engine jobs submit training test${JOB_ID} \
--stream-logs \
--module-name trainer.task \
--package-path trainer \
--staging-bucket ${BUCKET_NAME} \
--region us-central1 \
--runtime-version=1.2 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"
Thanks in advance!
Can you please confirm that the input files (eval_data_paths and train_data_paths) are not empty? Additionally, if you are still having issues, please file an issue at https://github.com/GoogleCloudPlatform/cloudml-samples, since it is easier to handle there on GitHub.
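For example, a quick way to check this, using the same paths as in the training command above:
# Empty output here means the preprocessing step produced no files
gsutil ls "${GCS_PATH}/preproc/eval*"
gsutil ls "${GCS_PATH}/preproc/train*"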
I ran into the same issue and couldn't figure it out; then I followed this, did it again from git clone, and there was no error after running on GCS.
It is clear from your error message
The replica worker 1 exited with a non-zero status of 1. Termination reason: Error
that you have some programming error (syntax, undefined variable, etc.).
For more information, check the return code and its meaning:
Return code   Meaning                  Cloud ML Engine response
0             Successful completion    Shuts down and releases job resources.
1-128         Unrecoverable error      Ends the job and logs the error.
You need to find your bug first and fix it, then try again.
I recommend running your task locally (if your configuration supports it) before submitting it to the cloud. If you find a bug, you can fix it easily on your local machine.
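For example, a local run with the same trainer package could look roughly like this (the local output path is a placeholder; the data paths mirror the cloud command above):
# Local training run via the gcloud ML Engine local emulation
gcloud ml-engine local train \
--module-name trainer.task \
--package-path trainer \
-- \
--output_path "output/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"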
