Set maximum retries in Google Cloud Run

I am trying to set a custom number of retries for when a task fails in Google Cloud Run. According to the documentation, I should use --max-retries to set the number of tries. I tried to set it with the following command:
gcloud beta run deploy ${SAMPLE} \
--set-env-vars GOOGLE_CLOUD_PROJECT=${GOOGLE_CLOUD_PROJECT} \
--image gcr.io/${GOOGLE_CLOUD_PROJECT}/${SAMPLE} --timeout=30m --cpu 4 --memory 4Gi --concurrency 1 --execution-environment gen2 --max-retries 2
But I got an error:
unrecognized arguments:
--max-retries
The documentation also mentions that the value can be modified in the console ("Click Container, Variables, Connections, Security to expand the job properties page."), but I cannot find this in the console either.

Sometimes flags are only available via the CLI and not via the GUI. In this case, --max-retries is in beta, which is why it's not in the GUI yet. It's also possible that the GUI features haven't rolled out everywhere yet.
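If the retries in question are the per-task retries of a Cloud Run job (the "job properties page" quoted from the documentation suggests so), note that `gcloud run deploy` deploys services, while `--max-retries` is an option of `gcloud run jobs create` (under `gcloud beta run jobs` on older SDKs). A minimal sketch, assuming a hypothetical job name and reusing the image and resources from the question; the `DRY_RUN` switch is an illustrative addition so the command can be inspected first. Check `gcloud run jobs create --help` for the flags your SDK version offers.

```shell
#!/usr/bin/env bash
# Sketch: create a Cloud Run *job* whose failed tasks are retried up to 2 times.
# Assumptions: the job name is hypothetical, and jobs live under `gcloud run jobs`
# (`gcloud beta run jobs` on older SDKs). Check `gcloud run jobs create --help`.
set -euo pipefail

# Print the command instead of running it when DRY_RUN is set (illustrative helper).
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: job name, project ID, image name (the question's ${SAMPLE}).
create_job() {
  local job="$1" project="$2" sample="$3"
  # Jobs use --task-timeout instead of the service flag --timeout and have no
  # --concurrency setting, because job tasks do not serve requests.
  run gcloud run jobs create "$job" \
    --image "gcr.io/${project}/${sample}" \
    --set-env-vars "GOOGLE_CLOUD_PROJECT=${project}" \
    --task-timeout 30m \
    --cpu 4 \
    --memory 4Gi \
    --max-retries 2
}
```

For example, `DRY_RUN=1 create_job my-job "$GOOGLE_CLOUD_PROJECT" "$SAMPLE"` only prints the command; without `DRY_RUN` it creates the job, which can then be started with `gcloud run jobs execute my-job`.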

Related

Google Endpoints YAML file update: Is there a simpler method?

When using Google Endpoints with Cloud Run to provide the container service, one creates a YAML file (Swagger 2.0 format) to specify the paths with all configurations. For every change, this is what I do, based on the documentation (https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-functions):
Step 1: Deploying the Endpoints configuration
gcloud endpoints services deploy openapi-functions.yaml \
--project ESP_PROJECT_ID
This gives me the following output:
Service Configuration [CONFIG_ID] uploaded for service [CLOUD_RUN_HOSTNAME]
Then,
Step 2: Download the gcloud_build_image script to the local machine and build the ESP image
chmod +x gcloud_build_image
./gcloud_build_image -s CLOUD_RUN_HOSTNAME \
-c CONFIG_ID -p ESP_PROJECT_ID
Then,
Step 3: Redeploy the Cloud Run service
gcloud run deploy CLOUD_RUN_SERVICE_NAME \
--image="gcr.io/ESP_PROJECT_ID/endpoints-runtime-serverless:CLOUD_RUN_HOSTNAME-CONFIG_ID" \
--allow-unauthenticated \
--platform managed \
--project=ESP_PROJECT_ID
Is this the process for every API path change? Or is there a simpler direct method of updating the YAML file and uploading it somewhere?
Thanks.
Based on the documentation, yes, this is the process for every API path change. However, this may change in the future, as this feature is currently in beta, as stated in the documentation you shared.
You may want to create a feature request with GCP so they can improve this feature in the future.
In the meantime, I would advise creating a script for this process: the steps are always the same, so a bash script that runs these commands would help you automate the task.
Hope you find this useful.
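Following that advice, a minimal bash sketch of such a script. It assumes `gcloud_build_image` has already been downloaded to the current directory and that step 1 prints the `Service Configuration [CONFIG_ID] uploaded for service [...]` line quoted in the question; both stdout and stderr of step 1 are captured because gcloud may write status messages to stderr. The function names and the `BUILD_IMAGE_SCRIPT` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: automates the three documented steps for redeploying ESP on Cloud Run.
# Assumptions: gcloud_build_image is already downloaded, and step 1 prints
# "Service Configuration [CONFIG_ID] uploaded for service [...]".
set -euo pipefail

# Path to the downloaded build script (hypothetical variable, overridable).
BUILD_IMAGE_SCRIPT="${BUILD_IMAGE_SCRIPT:-./gcloud_build_image}"

# Read step 1 output on stdin and print only the CONFIG_ID between the first brackets.
parse_config_id() {
  sed -n 's/^Service Configuration \[\([^]]*\)\] uploaded for service.*/\1/p'
}

# Arguments: YAML file, ESP_PROJECT_ID, CLOUD_RUN_HOSTNAME, CLOUD_RUN_SERVICE_NAME.
redeploy_esp() {
  local yaml="$1" project="$2" host="$3" service="$4"
  local output config_id

  # Step 1: upload the Endpoints configuration; merge stderr so the message is captured.
  output=$(gcloud endpoints services deploy "$yaml" --project "$project" 2>&1)
  config_id=$(printf '%s\n' "$output" | parse_config_id)
  if [[ -z "$config_id" ]]; then
    echo "could not find CONFIG_ID in the deploy output" >&2
    return 1
  fi

  # Step 2: build the ESP image for this configuration.
  "$BUILD_IMAGE_SCRIPT" -s "$host" -c "$config_id" -p "$project"

  # Step 3: redeploy the Cloud Run service with the newly built image.
  gcloud run deploy "$service" \
    --image="gcr.io/${project}/endpoints-runtime-serverless:${host}-${config_id}" \
    --allow-unauthenticated \
    --platform managed \
    --project="$project"
}
```

After `source`-ing the script, each API change becomes one call: `redeploy_esp openapi-functions.yaml ESP_PROJECT_ID CLOUD_RUN_HOSTNAME CLOUD_RUN_SERVICE_NAME`.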
When you use the default Cloud Endpoints image as described in the documentation, the parameter --rollout_strategy=managed is set automatically, so ESP picks up a newly deployed configuration by itself.
You may have to wait up to 1 minute before the new configuration is used; that is what I observe in my own deployments. Give it a try!

kubectl set image throws "error: the server doesn't have a resource type deployment"

Environment: Windows 10 Home, Google Cloud SDK v240.0 (kubectl added as a gcloud SDK component), Jenkins 2.169
I am running a Jenkins pipeline in which I call a Windows batch file as a post-build action.
In that batch file, I am running:
kubectl set image deployment/py-gmicro py-gmicro=%IMAGE_NAME%
I get this:
error: the server doesn't have a resource type deployment
However, if I run the batch file directly from the command prompt, it works fine. It looks like the problem only occurs when I run it from Jenkins.
I looked at a similar thread on Stack Overflow, but that user was using Bitbucket (instead of Jenkins).
Also, there was no accepted answer on that thread. I cannot continue on that thread since I am not allowed to comment (50 reputation required).
This was just answered on this thread:
I fixed this error by explicitly setting the namespace as an argument, e.g.:
kubectl set image -n foonamespace deployment/ms-userservice.....
Reference:
https://www.mankier.com/1/kubectl-set-image#--namespace
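A possible reason for the Jenkins-only failure (an assumption, not confirmed in the thread) is that the Jenkins service runs as a different Windows user than your command prompt, so kubectl may resolve a different kubeconfig, context, or default namespace. Spelling out the namespace, and optionally the kubeconfig file, removes that dependency. The sketch below is in shell for consistency with the other examples; the same kubectl flags work unchanged in the batch file. The `NAMESPACE` and `KUBECONFIG_FILE` values are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: update the container image with an explicit namespace and kubeconfig.
# NAMESPACE and KUBECONFIG_FILE are hypothetical; set them to match your cluster.
set -euo pipefail

set_image() {
  local image="$1"
  local namespace="${NAMESPACE:-default}"
  local kubeconfig="${KUBECONFIG_FILE:-$HOME/.kube/config}"

  # Print the context kubectl resolves to; compare it between Jenkins and a local shell.
  kubectl --kubeconfig "$kubeconfig" config current-context

  # Same command as in the question, with the namespace passed explicitly.
  kubectl --kubeconfig "$kubeconfig" set image -n "$namespace" \
    deployment/py-gmicro "py-gmicro=${image}"
}
```

If the context printed under Jenkins differs from the one in your own shell, that mismatch is the thing to fix, for example by pointing `--kubeconfig` at a file the Jenkins user can read.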

Error while running model training in google cloud ml

I want to run model training in the cloud. I am following this tutorial, which runs sample code to train a model on the flowers dataset. The tutorial consists of 4 stages:
1. Set up your Cloud Storage bucket
2. Preprocessing training and evaluation data in the cloud
3. Run model training in the cloud
4. Deploying and using the model for prediction
I was able to complete steps 1 and 2. In step 3, however, the job is submitted successfully, but then an error occurs and the task exits with non-zero exit status 1. Here is the log of the task:
Here is a screenshot of the expanded log:
I used the following command:
gcloud ml-engine jobs submit training test${JOB_ID} \
--stream-logs \
--module-name trainer.task \
--package-path trainer \
--staging-bucket ${BUCKET_NAME} \
--region us-central1 \
--runtime-version=1.2 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"
Thanks in advance!
Can you please confirm that the input files (eval_data_paths and train_data_paths) are not empty? Additionally, if you are still having issues, please file an issue at https://github.com/GoogleCloudPlatform/cloudml-samples, since it's easier to handle the issue on GitHub.
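One way to run that check: a small sketch using `gsutil ls -l`, which lists each matching object with its size and exits with an error when a wildcard matches nothing. The function name `check_inputs` is hypothetical; pass it the same patterns as in the training command.

```shell
#!/usr/bin/env bash
# Sketch: verify that each GCS wildcard matches at least one non-empty object.
# check_inputs is a hypothetical helper name; it relies only on `gsutil ls -l`.
set -euo pipefail

check_inputs() {
  local pattern listing nonempty
  for pattern in "$@"; do
    # gsutil exits non-zero when the wildcard matches no objects.
    if ! listing=$(gsutil ls -l "$pattern"); then
      echo "no objects match: $pattern" >&2
      return 1
    fi
    # Count objects whose size (first column) is > 0; the TOTAL line is skipped.
    nonempty=$(printf '%s\n' "$listing" | awk '$1 ~ /^[0-9]+$/ && $1 > 0 {n++} END {print n+0}')
    if (( nonempty == 0 )); then
      echo "only empty objects match: $pattern" >&2
      return 1
    fi
    echo "$pattern: $nonempty non-empty object(s)"
  done
}
```

For example: `check_inputs "${GCS_PATH}/preproc/eval*" "${GCS_PATH}/preproc/train*"`.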
I had the same issue and couldn't figure it out. Then I followed this, did everything again from a fresh git clone, and there was no error after running on GCS.
It is clear from your error message
The replica worker 1 exited with a non-zero status of 1. Termination reason: Error
that your code has a programming error (a syntax error, an undefined name, etc.).
For more information, check the return code and its meaning:
Return code | Meaning | Cloud ML Engine response
0 | Successful completion | Shuts down and releases job resources.
1-128 | Unrecoverable error | Ends the job and logs the error.
You need to find and fix your bug first, then try again.
I recommend running your task locally (if your configuration supports it) before you submit it to the cloud. If you find a bug, you can fix it easily on your local machine.
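Following that recommendation, a sketch of running the same trainer locally with `gcloud ml-engine local train` (called `gcloud ai-platform local train` in newer Cloud SDK releases). It assumes your local credentials can read `${GCS_PATH}` and that the trainer accepts a local directory as `--output_path`; the function name and the default output directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run trainer.task locally with the same arguments as the cloud job.
# Assumptions: local credentials can read the GCS inputs, and a local output
# directory is acceptable to the trainer. Names here are hypothetical.
set -euo pipefail

train_locally() {
  local gcs_path="$1" output_dir="${2:-./local-training-output}"
  # Arguments after `--` are passed through to trainer.task unchanged.
  gcloud ml-engine local train \
    --module-name trainer.task \
    --package-path trainer \
    -- \
    --output_path "$output_dir" \
    --eval_data_paths "${gcs_path}/preproc/eval*" \
    --train_data_paths "${gcs_path}/preproc/train*"
}
```

Running `train_locally "$GCS_PATH"` shows the Python traceback directly in the terminal, which is usually quicker to read than the job log.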

Calling Snapshot in Jenkins results in a timeout for the Simulator

I am using Snapshot from the fastlane suite.
For my purposes, I call the various tools from scripts and pass in the environment variables I use as inputs.
I am having trouble when I call my script from Jenkins rather than from the command line. When I call the script within a Build Step in Jenkins, the result is a message from Snapshot saying the process has timed out after waiting 120 seconds for the simulator to boot. If I run the same script from the terminal, Snapshot runs as expected without error.
Example:
snapshot \
--workspace "MyWorkspace.xcworkspace" \
--scheme "MyScheme" \
--output_directory "MyOutputDirectory" \
--clear_previous_screenshots \
--stop_after_first_error
(--devices and --languages are set in ./Snapfile)
Snapfile:
devices([
"iPhone 4s"
])
languages([
"en-US"
])
Am I missing something here?
Configuring Jenkins to work for iOS testing and automation is not a simple task; there are a lot of gotchas.
"...the result is a message from Snapshot saying the process has timed out after waiting 120 seconds for the simulator to boot."
This suggests that your Jenkins machine is not able to run the Simulator. This can happen if the Jenkins user is not able to start a UI session.
These two posts have useful information on how to configure Jenkins for iOS development:
https://blog.pivotal.io/labs/labs/ios-ci-jenkins
http://staxmanade.com/2015/01/setting-jenkins-up-to-run-xctool-and-xcode-simulator-tests/
The second in particular addresses the issue of Jenkins not running as a GUI user.
Good luck.

Icinga check_jboss "NRPE: unable to read output"

I'm using Icinga to monitor some servers and services. Most of them run fine, but now I would like to monitor a JBoss AS on one server via NRPE. For this I'm using the check_jboss plugin from MonitoringExchange. However, each time I run a test command from my Icinga server via NRPE, I get an "NRPE: Unable to read output" error. When I execute the command directly on the monitored server, it runs fine. Strangely, the execution on the monitored server takes around 5 seconds to return an acceptable result, but the NRPE execution returns the error immediately. Adjusting the NRPE timeout didn't solve the problem. I also checked the permissions of the check_jboss plugin and set them to 777, so permissions should not be the cause.
I don't think there's a general issue with NRPE, because other checks (e.g. check_load, check_disk, ...) also run via NRPE and they all work fine. The permissions of those plugins are the same as those of my check_jboss plugin.
Here is a sample execution on the monitored server, which runs fine:
/usr/lib64/nagios/plugins/check_jboss.pl -T ServerInfo -J jboss.system -a MaxMemory -w 3000: -c 2000: -f
JBOSS OK - MaxMemory is 4049076224 | MaxMemory=4049076224
Here are two command executions via NRPE from my Icinga server. Both commands are correctly
./check_nrpe -H xxx.xxx.xxx.xxx -c check_hda1
DISK OK - free space: / 47452 MB (76% inode=97%);| /=14505MB;52218;58745;0;65273
./check_nrpe -H xxx.xxx.xxx.xxx -c jboss_MaxMemory
NRPE: Unable to read output
Does anyone have a hint for me? If further configuration information is needed, please ask :)
Try to rule out SELinux, either by disabling it globally or by changing the SELinux type of the check_jboss plugin to nagios_unconfined_plugin_exec_t.
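A sketch of both options on the monitored server, run as root. `setenforce 0` switches SELinux to permissive mode until the next reboot (`setenforce 1` switches back), which is enough to rule it out quickly: rerun check_nrpe and see whether the error disappears. If it does, the function below labels the plugin persistently instead of leaving SELinux off. `label_plugin` is a hypothetical name; the plugin path comes from the question.

```shell
#!/usr/bin/env bash
# Sketch: if SELinux is active, give check_jboss.pl the nagios_unconfined_plugin_exec_t
# type persistently. Run as root; label_plugin is a hypothetical helper name.
set -euo pipefail

PLUGIN=/usr/lib64/nagios/plugins/check_jboss.pl

label_plugin() {
  local plugin="${1:-$PLUGIN}"
  local mode
  mode=$(getenforce)
  if [[ "$mode" == "Disabled" ]]; then
    echo "SELinux is disabled; it is not the cause"
    return 0
  fi
  # Record the file context in the local policy so it survives a relabel...
  semanage fcontext -a -t nagios_unconfined_plugin_exec_t "$plugin"
  # ...then apply it to the file on disk.
  restorecon -v "$plugin"
}
```

To confirm the diagnosis first, `ausearch -m avc -ts recent` (from the audit package) lists recent SELinux denials; a denial naming check_jboss.pl would explain why NRPE fails immediately while the same command works from a root shell.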
