"unrecognized arguments" error while executing a Dataflow job with gcloud cli - google-cloud-dataflow

I have created a job in Dataflow UI and it works fine. Now I want to automate it from the command line with a small bash script:
#GLOBAL VARIABLES
export PROJECT="cf-businessintelligence"
export GCS_LOCATION="gs://dataflow-templates/latest/Jdbc_to_BigQuery"
export MAX_WORKERS="15"
export NETWORK="businessintelligence"
export REGION_ID="us-central1"
export STAGING_LOCATION="gs://dataflow_temporary_directory/temp_dir"
export SUBNETWORK="bidw-dataflow-usc1"
export WORKER_MACHINE_TYPE="n1-standard-96"
export ZONE="us-central1-a"
export JOBNAME="test"
#COMMAND
gcloud dataflow jobs run $JOBNAME --project=$PROJECT --gcs-location=$GCS_LOCATION \
--max-workers=$MAX_WORKERS \
--network=$NETWORK \
--parameters ^:^query="select current_date":connectionURL="jdbc:mysql://mysqldbhost:3306/bidw":user="xyz",password="abc":driverClassName="com.mysql.jdbc.Driver":driverJars="gs://jdbc_drivers/mysql-connector-java-8.0.16.jar":outputTable="cf-businessintelligence:bidw.mytest":tempLocation="gs://dataflow_temporary_directory/tmp" \
--region=$REGION_ID \
--staging-location=$STAGING_LOCATION \
--subnetwork=$SUBNETWORK \
--worker-machine-type=$WORKER_MACHINE_TYPE \
--zone=$ZONE
When I run it, it fails with the following error:
ERROR: (gcloud.dataflow.jobs.run) unrecognized arguments:
--network=businessintelligence
Following the instructions in gcloud topic escaping, I believe I escaped my parameters correctly, so I am really confused. Why is it failing on the NETWORK parameter?

Try getting help for your command to see which options it currently accepts:
gcloud dataflow jobs run --help
For me, this displays a number of options, but not the --network option.
I then checked the beta channel:
gcloud beta dataflow jobs run --help
And it does display the --network option. So you'll want to launch your job with gcloud beta dataflow....
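For reference, a minimal sketch of the same launch on the beta track, reusing the variables from the question (the --parameters flag and the remaining options stay exactly as in the original command):
gcloud beta dataflow jobs run $JOBNAME \
--project=$PROJECT \
--gcs-location=$GCS_LOCATION \
--network=$NETWORK \
--subnetwork=$SUBNETWORK \
--region=$REGION_ID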

Both the network and subnetwork arguments need to be the complete URL.
Source: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
Example for the subnetwork flag:
https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION_NAME/subnetworks/SUBNETWORK_NAME
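Applied to the values in the question, the two flags would look roughly like this (a sketch; it assumes the network and subnetwork live in the same project, cf-businessintelligence, rather than a separate host project):
--network=https://www.googleapis.com/compute/v1/projects/cf-businessintelligence/global/networks/businessintelligence \
--subnetwork=https://www.googleapis.com/compute/v1/projects/cf-businessintelligence/regions/us-central1/subnetworks/bidw-dataflow-usc1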

Related

gcloud run deploy keeps erroring when I add args: '--args: expected one argument'

I am trying to run gcloud run deploy with the following parameters:
gcloud run deploy "$SERVICE_NAME" \
--quiet \
--region "$RUN_REGION" \
--image "gcr.io/$PROJECT_ID/$SERVICE_NAME:$GITHUB_SHA" \
--platform "managed" \
--allow-unauthenticated \
--args "--privileged"
but I keep getting the following error when I add anything to args whatsoever:
ERROR: (gcloud.run.deploy) argument --args: expected one argument
I am obviously using the args parameter incorrectly but for the life of me I can't figure out why. The example in the docs uses it exactly as I have done.
What am I missing?
EDIT:
Even the example from the docs doesn't work, and returns the same error:
gcloud run deploy \
--args "--repo-allowlist=github.com/example/example_demo" \
--args "--gh-webhook-secret=XX" \
So, I finally got it working. I'm not sure why I needed to add an = as that wasn't specified in the docs, but here's the solution:
--args="--privileged"
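Putting that fix into the original command gives something like this (same variables as above):
gcloud run deploy "$SERVICE_NAME" \
--quiet \
--region "$RUN_REGION" \
--image "gcr.io/$PROJECT_ID/$SERVICE_NAME:$GITHUB_SHA" \
--platform "managed" \
--allow-unauthenticated \
--args="--privileged"
The likely reason the = is needed: with a space, the parser sees the next token starting with -- and treats it as a new flag rather than the value, whereas the = form binds the value to --args unambiguously.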

Mozilla Sops fails to decrypt when triggered from Jenkins

I'm trying to use SOPS to decrypt a file using Jenkins, using this command:
sops -k -d mysecret.yaml > out
But then I get this output and it waits till forever:
Vim: Warning: Output is not to a terminal
I've tried to export some env. vars, but I ended up with the same result
export TERM=xterm-256color
export EDITOR="/usr/bin/vim"
Can anyone please explain to me why that happens?
Update:
Using sops -k -d mysecret.yaml --output OUT with the above env vars, I can now see the file being decrypted, but the vim process still doesn't finish and the task hangs forever.
It turned out I shouldn't pass -k on the command line at all; I removed the KMS ARN from the command and supplied it via export SOPS_KMS_ARN="arn:aws:kms:us-east-1:xxxxxx:key/xxx-xxxx-xxxxx" instead. (The likely cause of the hang: -k expects the KMS ARN as its value, so it swallowed the -d flag, leaving sops with no decrypt operation and dropping it into its default edit mode, which launches vim.)
Correct command is:
export SOPS_KMS_ARN="arn:aws:kms:us-east-1:xxxxxx:key/xxx-xxxx-xxxxx"
sops -d rsi-tls-cert.yaml | kubectl apply -f -
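For the original goal of writing the decrypted content to a file rather than piping it to kubectl, the same pattern should work (a sketch reusing the placeholder ARN from above):
export SOPS_KMS_ARN="arn:aws:kms:us-east-1:xxxxxx:key/xxx-xxxx-xxxxx"
sops -d mysecret.yaml > out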

Auto-create Rundeck jobs on startup (Rundeck in Docker container)

I'm trying to setup Rundeck inside a Docker container. I want to use Rundeck to provision and manage my Docker fleet. I found an image which ships an ansible-plugin as well. So far running simple playbooks and auto-discovering my Pi nodes work.
Docker script:
echo "[INFO] prepare rundeck-home directory"
mkdir ../../target/work/home
mkdir ../../target/work/home/rundeck
mkdir ../../target/work/home/rundeck/data
echo -e "[INFO] copy host inventory to rundeck-home"
cp resources/inventory/hosts.ini ../../target/work/home/rundeck/data/inventory.ini
echo -e "[INFO] pull image"
docker pull batix/rundeck-ansible
echo -e "[INFO] start rundeck container"
docker run -d \
--name rundeck-raspi \
-p 4440:4440 \
-v "/home/sebastian/work/workspace/workspace-github/raspi/target/work/home/rundeck/data:/home/rundeck/data" \
batix/rundeck-ansible
Now I want to feed the container with playbooks which should become jobs to run in Rundeck. Can anyone give me a hint on how I can create Rundeck jobs (which should invoke an Ansible playbook) from the outside? Via the API?
One way I can think of is creating the jobs manually once and exporting them as XML or YAML. When the container and Rundeck is up and running I could import the jobs automatically. Is there a certain folder in rundeck-home or somewhere where I can put those files for automatic import? Or is there an API call or something?
Could Jenkins be more suited for this task than Rundeck?
EDIT: just changed to a Dockerfile
FROM batix/rundeck-ansible:latest
COPY resources/inventory/hosts.ini /home/rundeck/data/inventory.ini
COPY resources/realms.properties /home/rundeck/etc/realms.properties
COPY resources/tokens.properties /home/rundeck/etc/tokens.properties
# import jobs
ENV RD_URL="http://localhost:4440"
ENV RD_TOKEN="yJhbGciOiJIUzI1NiIs"
ENV rd_api="36"
ENV rd_project="Test-Project"
ENV rd_job_path="/home/rundeck/data/jobs"
ENV rd_job_file="Ping_Nodes.yaml"
# copy job definitions and script
COPY resources/jobs-definitions/Ping_Nodes.yaml /home/rundeck/data/jobs/Ping_Nodes.yaml
RUN curl -kSsv --header "X-Rundeck-Auth-Token:$RD_TOKEN" \
-F yamlBatch=@"$rd_job_path/$rd_job_file" "$RD_URL/api/$rd_api/project/$rd_project/jobs/import?fileformat=yaml&dupeOption=update"
Do you know how I can delay the curl at the end until after the rundeck service is up and running?
That's right, you can write a script that deploys your instance and then imports the jobs via an API call with cURL pointing at your Docker instance. Here are basic examples for both job definition formats.
For XML job definition format:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="localhost"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="qNcao2e75iMf1PmxYfUJaGEzuVOIW3Xz"
# specific api call info
rdeck_project="ProjectEXAMPLE"
rdeck_xml_file="HelloWorld.xml"
# api call
curl -kSsv --header "X-Rundeck-Auth-Token:$rdeck_token" \
-F xmlBatch=@"$rdeck_xml_file" "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/project/$rdeck_project/jobs/import?fileformat=xml&dupeOption=update"
For YAML job definition format:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="localhost"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="qNcao2e75iMf1PmxYfUJaGEzuVOIW3Xz"
# specific api call info
rdeck_project="ProjectEXAMPLE"
rdeck_yml_file="HelloWorldYML.yaml"
# api call
curl -kSsv --header "X-Rundeck-Auth-Token:$rdeck_token" \
-F xmlBatch=@"$rdeck_yml_file" "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/project/$rdeck_project/jobs/import?fileformat=yaml&dupeOption=update"
The details of the call are in the Rundeck API documentation (importing job definitions).
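Regarding the question about delaying the import until Rundeck is actually up: one common approach (a sketch, not part of the original answer; reuse the variables from the scripts above and adjust the sleep interval as needed) is to poll the instance before firing the import call:
# block until the Rundeck web interface responds, then run the import
until curl -ksf -o /dev/null "$protocol://$rdeck_host:$rdeck_port"; do
  echo "waiting for Rundeck to start..."
  sleep 5
done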

Getting authentication error while creating a Dataflow template pipeline with beam 2.0

Please refer to this link on how to run Java-based Cloud Dataflow: https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven
I created the project from the Maven archetype using the command below:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=2.16.0 \
-DgroupId=org.example \
-DartifactId=word-count-beam \
-Dversion="0.1" \
-Dpackage=org.apache.beam.examples \
-DinteractiveMode=false
and then, to run the job using the DataflowRunner, executed the command below:
mvn -Pdataflow-runner compile exec:java \
-Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--project=<PROJECT_ID> \
--stagingLocation=gs://<STORAGE_BUCKET>/staging/ \
--output=gs://<STORAGE_BUCKET>/output \
--runner=DataflowRunner"
But when trying to run the above command I get the following error:
java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
As was said in a comment, to get past the java.lang.RuntimeException error you need to complete the "Before you begin" steps for Java and Apache Maven before running Dataflow jobs. The steps include:
Setting up authentication and pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON file that contains your service account key (see the sketch after this list)
Creating a Cloud Storage bucket
Installing the Java Development Kit (JDK) and Apache Maven, and verifying that the JAVA_HOME environment variable is set and points to your JDK installation.
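For the authentication step in particular, a minimal sketch (the key path and project ID are placeholders, not values from the question):
# point the Beam/Dataflow client libraries at the service account key
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
# authenticate gcloud with the same key and select the project
gcloud auth activate-service-account --key-file="$GOOGLE_APPLICATION_CREDENTIALS"
gcloud config set project <PROJECT_ID>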

Error: "argument --job-dir: expected one argument" while training model using AI Platform on GCP

Running macOS Mojave.
I am following the official getting started documentation to run a model using AI platform.
So far I managed to train my model locally using:
# This is similar to `python -m trainer.task --job-dir local-training-output`
# but it better replicates the AI Platform environment, especially
# for distributed training (not applicable here).
gcloud ai-platform local train \
--package-path trainer \
--module-name trainer.task \
--job-dir local-training-output
I then proceed to train the model using AI platform by going through the following steps:
Setting environment variables export JOB_NAME="my_first_keras_job" and export JOB_DIR="gs://$BUCKET_NAME/keras-job-dir".
Running the following command, as indicated in the docs, to package the trainer/ directory and submit the job:
gcloud ai-platform jobs submit training $JOB_NAME \
--package-path trainer/ \
--module-name trainer.task \
--region $REGION \
--python-version 3.5 \
--runtime-version 1.13 \
--job-dir $JOB_DIR \
--stream-logs
I get the error:
ERROR: (gcloud.ai-platform.jobs.submit.training) argument --job-dir: expected one argument
Usage: gcloud ai-platform jobs submit training JOB [optional flags] [-- USER_ARGS ...]
optional flags may be --async | --config | --help | --job-dir | --labels | ...
As far as I understand, --job-dir does indeed have one argument.
I am not sure what I'm doing wrong. I am running the above command from the trainer/ directory as is shown in the documentation. I tried removing all spaces as described here but the error persists.
Are you running this command locally, or on an AI notebook VM in Jupyter? Based on your details I assume you're running it locally; I am not a Mac expert, but hopefully this is helpful.
I just worked through the same error on an AI notebook VM and my issue was that even though I assigned it a value in a previous Jupyter cell, the $JOB_NAME variable was passing along an empty string in the gcloud command. Try running the following to make sure your code is actually passing a value for $JOB_DIR when you are making the gcloud ai-platform call.
echo $JOB_DIR
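If it does turn out to be empty, one defensive option (a sketch using standard bash parameter expansion, not something from the docs) is to fail fast with a clear message before submitting:
# abort the script immediately if either variable is unset or empty
: "${JOB_NAME:?JOB_NAME is empty - re-export it before submitting}"
: "${JOB_DIR:?JOB_DIR is empty - check BUCKET_NAME and re-export JOB_DIR}"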
