How to change default network with Google-Provided Dataflow Template

I am trying to set up a Dataflow job using the Google-provided template PubSub to BigQuery. However, I am getting this error on start-up:
Message: The resource 'projects/my-project/global/networks/default' was not found
I think the Google-provided template is hardcoded to use the default network. The error goes away if I create a default network in auto mode, but we can't have a default network in production.
The documentation here mentions a network parameter. I tried adding an additional parameter called network from the GCP console UI, passing in our custom network name, but I am getting this error:
The template parameters are invalid.
Is there any way I can tell the Google-provided Dataflow template to use my custom network (created in manual mode) instead of the default? What are my options here?
Appreciate all the help!

This is not currently supported for Dataflow pipelines created from a template. For now, you can either run the template in the default VPC network, or submit a Dataflow pipeline using the Java or Python SDK and specify the network pipeline option.
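For illustration, with the Python SDK the network can be passed as an ordinary pipeline option on the command line (a minimal sketch; the script name, project, and bucket are placeholders):
python my_pipeline.py \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my_bucket/temp \
  --network my-custom-network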

You can use the gcloud beta command gcloud beta dataflow jobs run, as explained in the gcloud beta dataflow jobs run reference.
It supports additional parameters such as [--network=NETWORK] and [--subnetwork=SUBNETWORK], which are useful for your use case.
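For example, running the Google-provided PubSub to BigQuery template on a custom network and subnetwork could look roughly like this (a sketch; the job name, network names, and template parameters are placeholders, so check the template's documentation for the exact parameter names):
gcloud beta dataflow jobs run my-job \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --network my-custom-network \
  --subnetwork regions/us-central1/subnetworks/my-subnet \
  --parameters inputTopic=projects/my-project/topics/my-topic,outputTableSpec=my-project:my_dataset.my_table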

Related

Call an API in a Groovy script with sandbox activated

I'm trying to load a list that is returned from an API into a Jenkins parameter. For this I'm using the uno-choice plugin and executing a Groovy script, because I want the parameter to appear depending on other existing parameters of the Jenkins job.
I have seen other plugins like http-request and rest-list-parameter, but they don't work for me because the API response depends on the other parameters in the Jenkins job.
I have tested the solutions provided here and was able to achieve what I wanted with the sandbox deactivated when testing locally, but the Jenkins instance my team uses has the sandbox activated, and when I try to run the script there it either fails with no information or fails with:
Failed to evaluate script: Scripts not permitted to use staticMethod
Requesting that my company allow me to run the script is not a solution either.
Is it possible to call an API from groovy scripts with the sandbox activated?

Is there a way to update a Dataflow job using the gcloud command?

I am trying to write a script to automate the deployment of a Java Dataflow job. The script creates a template and then uses the command
gcloud dataflow jobs run my-job --gcs-location=gs://my_bucket/template
The issue is, I want to update the job if it already exists and is running. I can do the update if I run the job via Maven, but I need to do this via gcloud so I can have one service account for deployment and another one for running the job. I tried different things (adding --parameters update to the command line), but I always get an error. Is there a way to update a Dataflow job exclusively via gcloud dataflow jobs run?
Referring to the official documentation, which describes gcloud beta dataflow jobs (a group of subcommands for working with Dataflow jobs), there is no way to use gcloud to update a job.
For now, the Apache Beam SDKs provide a way to update an ongoing streaming job on the Dataflow managed service with new pipeline code; you can find more information here. Another way of updating an existing Dataflow job is through the REST API, where you can find a Java example.
Additionally, please follow the feature request regarding recreating a job with gcloud.
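For reference, when the job is launched directly with the Java SDK (for example via Maven, as the question mentions), an in-place update of the running streaming job can be requested with the --update pipeline option while keeping the same --jobName (a sketch; the main class, project, and region are placeholders):
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --jobName=my-job \
    --update"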

Starting already existing VM with Jenkins on Google Cloud

I am trying to start a VM that already exists in Google Cloud with my Jenkins, to use it as a slave. The reason is that if I start from the template of this VM, I need to do a few things before I can use my Jenkins code.
Does anyone know how to start VMs that already exist in my VM pool in Google Cloud via Jenkins?
There are two possible approaches, depending on which operations you need to run beforehand on the machine that prevent you from simply recreating it.
The first, and possibly the most straightforward given the restriction that the machine already exists, is to talk directly to the GCE API in order to list and start the machine from Jenkins (using a build step).
Basically you can make requests to the GCE API to do operations with your instances. I suggest doing this using gcloud from within the Jenkins master node as it'll save you having to write your own client. It's straightforward as you only have to "install" it in your master and you can make it work safely using a service account.
Below is the outline of this approach:
Download the cloud-sdk to your master node following these release instructions.
You can do this once outside of Jenkins or directly in the build step; it doesn't matter, as long as Jenkins and its user are able to access the binary.
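For example, using the generic installer from those instructions (on a Debian/Ubuntu master you could instead install the google-cloud-sdk package from Google's apt repository):
curl https://sdk.cloud.google.com | bash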
Create the service account, generate authentication keys and give it permissions to interact with GCE.
Using a service account is the way to go as you can restrict its permissions to the operations that are relevant for you.
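A sketch of that setup with gcloud (the account name, project ID, and key path are placeholders; roles/compute.instanceAdmin.v1 is one role that allows starting instances, but you can pick something narrower):
# Create the service account
gcloud iam service-accounts create jenkins-gce-starter --display-name "Jenkins GCE starter"
# Grant it permission to operate on instances in the project
gcloud projects add-iam-policy-binding my-project \
  --member serviceAccount:jenkins-gce-starter@my-project.iam.gserviceaccount.com \
  --role roles/compute.instanceAdmin.v1
# Generate the JSON key that Jenkins will use to authenticate
gcloud iam service-accounts keys create /var/lib/jenkins/key.json \
  --iam-account jenkins-gce-starter@my-project.iam.gserviceaccount.com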
Once you get the service account that will be bound to your gcloud client, you'll need to set it up in Jenkins. You might want to do this in a build step (I'm using Groovy here but it should be easy to translate it to the UI):
stage('Start_machine') {
    steps {
        // Assumes gcloud is already installed on this node, but you can also fetch it from here if that's convenient
        sh """
            # GOOGLE_PROJECT_ID can be a scoped env var in Jenkins or just hard-coded
            gcloud config set project ${GOOGLE_PROJECT_ID}
            # GOOGLE_SERVICE_ACCOUNT_KEY needs to be a JSON key file location accessible by Jenkins, like: /var/lib/jenkins/..key.json
            gcloud auth activate-service-account --key-file ${GOOGLE_SERVICE_ACCOUNT_KEY}
            # Check the reference on this command: https://cloud.google.com/sdk/gcloud/reference/compute/instances/start
            gcloud compute instances start my_existing_instance
            echo "Instance started"
        """
    }
    post {
        always {
            println "Result : ${currentBuild.result}"
        }
    }
}
Wrapping up: you basically create a service account that has permission to start your instances, download a client that can interact with the GCE API (gcloud), authenticate it, and start the instance, all from within your pipeline.
The second approach would be easier if there were no constraint that the machine must already exist.
Jenkins has a plugin for Compute Engine that will automatically spin up new workers whenever needed.
I know that you need to run some operations before Jenkins sends work to these slave machines. However, I want to bring to your attention that this plugin also supports startup scripts.
So there's always the option to preload your operations there before the machine takes off, and by the time it's ready, you might have everything done.
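For instance, the prep work could live in the instance's startup script so the worker is ready by the time the plugin hands it a build (a sketch; the packages installed here are purely hypothetical, use whatever your jobs actually need):
#!/bin/bash
# Hypothetical prep: install the tooling the Jenkins jobs expect
apt-get update
apt-get install -y openjdk-11-jdk git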
Hope this helps.

Google Cloud Dataflow failing with compute.requireShieldedVm enabled

Our company policy requires the policy constraint "compute.requireShieldedVm" to be enabled. However, when running a Cloud Dataflow job, it fails to create a worker with the error:
Constraint constraints/compute.requireShieldedVm violated for project projects/********. The boot disk's 'initialize_params.source_image' field specifies a non-Shielded image: projects/dataflow-service-producer-prod/global/images/dataflow-dataflow-owned-resource-20200216-22-rc00. See https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints for more information."
Is there any way, when running a Dataflow job, to request that a Shielded VM be used for the worker compute?
It is not possible to provide a custom image, as there is no such parameter that one can pass during job submission, as can be seen in the Job Submission Parameters documentation.
Alternatively, if you are running a Python-based Dataflow job you can set up the environment through setup files. An example can be found in Dataflow - Custom Python Package Environment.
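For that route, the environment customization is passed at launch time with the --setup_file pipeline option (a sketch; the file, project, and bucket names are placeholders):
python my_pipeline.py \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my_bucket/temp \
  --setup_file ./setup.py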

How to get jenkins pipeline test results into ReportPortal.io instance?

I have an automated Jenkins workflow that runs and tests a Java project. I need to get all the data and results that are output by Jenkins into Report Portal (RP).
Initially, I was under the impression that you have to install the ReportPortal.io Jenkins plugin to be able to configure Jenkins to communicate with RP.
However, it appears that the plugin will eventually be deprecated.
According to one of the RP devs, there are APIs that can be used, but investigating them on our RP server does not give very clear instructions on what every API does or if it is what is required to get test data from Jenkins to RP.
How then do I get Jenkins to send all generated data to RP?
I am very familiar with Jenkins, but I am extremely new to Report Portal.
ReportPortal is intended for collecting test execution results, not for gathering Jenkins logs.
In short, you need to find the reporting agent in their GitHub organization that matches your testing framework (e.g. JUnit, TestNG, JBehave) and integrate it into your project.
Here is example for TestNG framework:
https://github.com/reportportal/example-java-TestNG/
