I have only one topic, and it was created in the production project. I want to run my Dataflow job in the dev environment, where it needs to consume the production Pub/Sub topic. When I submit the Dataflow job in the dev project it does not work: it always shows as running in the Dataflow UI but never reads any elements from Pub/Sub. If I submit it to the production project it works perfectly.
Why is it not reading messages from the topic in the other project? I'm using Java SDK 2.1 and the runner is "DataflowRunner".
PCollection<String> streamData = p.apply("Read pubsub message", PubsubIO.readStrings().fromSubscription(options.getInputPubSub()));
I use mvn to submit the Dataflow job:
mvn compile exec:java -Dexec.mainClass=dataflow.streaming.SampleStream -Dexec.args="--project=project-dev-1276 --stagingLocation=gs://project-dev/dataflow/staging --tempLocation=gs://project-dev/dataflow/bq_temp --zone=europe-west1-c --bigQueryDataset=stream_events --bigQueryTable=events_sample --inputPubSub=projects/project-prod/subscriptions/stream-events --streaming=true --runner=DataflowRunner"
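For context, options.getInputPubSub() comes from a custom pipeline options interface along these lines (a minimal sketch; the interface name SampleStreamOptions and the exact set of accessors are assumptions derived from the flags passed above):

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.Description;

public interface SampleStreamOptions extends DataflowPipelineOptions {
    // Bound from --inputPubSub=projects/<project>/subscriptions/<name>
    @Description("Pub/Sub subscription to read from")
    String getInputPubSub();
    void setInputPubSub(String value);

    // Bound from --bigQueryDataset / --bigQueryTable
    @Description("BigQuery dataset for the output table")
    String getBigQueryDataset();
    void setBigQueryDataset(String value);

    @Description("BigQuery table for the streamed events")
    String getBigQueryTable();
    void setBigQueryTable(String value);
}

The main class then creates the options with PipelineOptionsFactory.fromArgs(args).withValidation().as(SampleStreamOptions.class).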
Note: with the DirectRunner it works and consumes messages from the other project's Pub/Sub topic.
No elements are added to the queue and there is no estimated size.
You need to grant the Pub/Sub Subscriber role in your production project to the service account that your job uses. By default, Dataflow workers use your project's Compute Engine default service account as the controller service account. This service account (<project-number>-compute@developer.gserviceaccount.com) should be given the Pub/Sub Subscriber role.
Read more here https://cloud.google.com/dataflow/docs/concepts/security-and-permissions and here https://cloud.google.com/pubsub/docs/access-control
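For illustration, the binding can also be added programmatically with the Pub/Sub admin client; this is only a sketch, assuming the production subscription from the question (projects/project-prod/subscriptions/stream-events) and a caller that is allowed to change its IAM policy:

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.iam.v1.Binding;
import com.google.iam.v1.GetIamPolicyRequest;
import com.google.iam.v1.Policy;
import com.google.iam.v1.SetIamPolicyRequest;
import com.google.pubsub.v1.ProjectSubscriptionName;

public class GrantSubscriberRole {
    public static void main(String[] args) throws Exception {
        // The production subscription the dev job reads from.
        String subscription = ProjectSubscriptionName.format("project-prod", "stream-events");
        // The dev project's controller service account (replace <project-number> with the dev project number).
        String member = "serviceAccount:<project-number>-compute@developer.gserviceaccount.com";

        try (SubscriptionAdminClient client = SubscriptionAdminClient.create()) {
            // Read the current IAM policy of the production subscription.
            Policy current = client.getIamPolicy(
                GetIamPolicyRequest.newBuilder().setResource(subscription).build());
            // Add a pubsub.subscriber binding for the dev controller service account.
            Policy updated = current.toBuilder()
                .addBindings(Binding.newBuilder()
                    .setRole("roles/pubsub.subscriber")
                    .addMembers(member)
                    .build())
                .build();
            client.setIamPolicy(
                SetIamPolicyRequest.newBuilder()
                    .setResource(subscription)
                    .setPolicy(updated)
                    .build());
        }
    }
}

The same binding can of course be added in the Cloud Console or with gcloud; the point is simply that the dev project's controller service account must be a Pub/Sub Subscriber on the production subscription (or topic).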
I am trying to write a script to automate the deployment of a Java Dataflow job. The script creates a template and then uses the command
gcloud dataflow jobs run my-job --gcs-location=gs://my_bucket/template
The issue is that I want to update the job if it already exists and is running. I can do the update if I run the job via Maven, but I need to do it via gcloud so I can use one service account for deployment and another for running the job. I tried different things (such as adding --parameters update to the command line), but I always get an error. Is there a way to update a Dataflow job exclusively via gcloud dataflow jobs run?
Referring to the official documentation, which describes gcloud beta dataflow jobs (a group of subcommands for working with Dataflow jobs), there is currently no way to update a job with gcloud.
For now, the Apache Beam SDKs provide a way to update an ongoing streaming job on the Dataflow managed service with new pipeline code; you can find more information here. Another way of updating an existing Dataflow job is the REST API, for which you can find a Java example.
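For reference, the SDK route mentioned above boils down to resubmitting the pipeline with the update flag set; a minimal sketch (the job name is illustrative and must match the job that is already running):

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class UpdateStreamingJob {
    public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setJobName("my-job"); // must match the name of the running job to be updated
        options.setUpdate(true);      // equivalent to passing --update on the command line

        Pipeline p = Pipeline.create(options);
        // ... rebuild the (compatible) pipeline here ...
        p.run();
    }
}

This is the Beam SDK path; it does not change the fact that gcloud dataflow jobs run itself offers no update option.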
Additionally, please follow the feature request regarding re-creating a job with gcloud.
Our company policy requires the organization policy constraint "compute.requireShieldedVm" to be enabled. However, when running a Cloud Dataflow job, it fails to create a worker with the error:
Constraint constraints/compute.requireShieldedVm violated for project projects/********. The boot disk's 'initialize_params.source_image' field specifies a non-Shielded image: projects/dataflow-service-producer-prod/global/images/dataflow-dataflow-owned-resource-20200216-22-rc00. See https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints for more information.
Is there any way, when running a Dataflow job, to request that a Shielded VM be used for the worker compute?
It is not possible to provide a custom image, as there is no such parameter that can be supplied during job submission, as can be seen here: Job Submission Parameters.
Alternatively, if you are running a Python-based Dataflow job, you can set up the environment through setup files; an example can be found here: Dataflow - Custom Python Package Environment.
I have an automated Jenkins workflow that runs and tests a Java project. I need to get all the data and results that Jenkins outputs into Report Portal (RP).
Initially, I was under the impression that you have to install the ReportPortal.io Jenkins plugin to be able to configure Jenkins to communicate with RP.
However, it appears that the plugin will eventually be deprecated.
According to one of the RP devs, there are APIs that can be used, but investigating them on our RP server does not give very clear instructions on what every API does or if it is what is required to get test data from Jenkins to RP.
How then do I get Jenkins to send all generated data to RP?
I am very familiar with Jenkins, but I am extremely new to Report Portal.
ReportPortal is intended for collecting test execution results, not for gathering Jenkins logs.
In short, you need to find the reporting agent in their GitHub organization that matches your testing framework (e.g. JUnit, TestNG, JBehave) and integrate it into your project.
Here is an example for the TestNG framework:
https://github.com/reportportal/example-java-TestNG/
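As an illustration, the TestNG integration is mostly a matter of putting the agent on the test classpath and registering its listener; a sketch, assuming the listener class name from the agent-java-testng project and a reportportal.properties file (endpoint, project, API token) on the test classpath:

import com.epam.reportportal.testng.ReportPortalTestNGListener;
import org.testng.Assert;
import org.testng.annotations.Listeners;
import org.testng.annotations.Test;

// The listener reports each suite/test/step of this class to ReportPortal,
// using the connection settings read from reportportal.properties.
@Listeners({ReportPortalTestNGListener.class})
public class SampleReportedTest {

    @Test
    public void additionWorks() {
        Assert.assertEquals(2 + 2, 4);
    }
}

Jenkins then only needs to run the build (mvn test, for example) as usual; the agent inside the build pushes the results to ReportPortal, so no Jenkins-side plugin is required.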
I'm working with Jenkins that runs on a server.
I have a pipeline which is triggered by a user that pushes something on a GitHub repository.
It performs a script which makes sure the GitHub repository is deployed to the SAP Cloud Platform.
It uses the MTA Archive Builder for building the MTA application which creates a .mtar file.
The MTA application has a HTML5 module.
After building the .mtar file with the MTA Archive Builder, I deploy it using the NEO Java Web SDK (the library you need to perform neo deploy-mta).
"neo deploy-mta" is a command that performs the actual request for deploying your html5 application.
This works fine and the project is successfully deployed on the SAP Cloud Platform.
The problem is: if a user pushes to GitHub twice in quick succession, my Jenkins pipeline is triggered twice and runs "neo deploy-mta" twice.
Normally the SAP Cloud Platform should deploy two versions, but when I check, only the first deployment request was carried out; the second request for deployment was skipped.
My question is: how can I make sure two versions are deployed on the SAP Cloud Platform when two pushes happen?
The Jenkins instance is already waiting until there is no build running.
The problem was that the SAP Cloud Platform didn't deploy two versions when there were two requests for deployment.
The solution is to add the "--synchronous" parameter to the "neo deploy-mta" command. With this flag, the script waits until there is no deployment (for this application) still running on the SAP Cloud Platform.
Most likely it happens because the SAP MTA deployer detects that you have another deploy in progress and thus stops the second deployment.
One way to go about it is to ensure from Jenkins that you don't run the second build until the first one has finished. You can do this with the help of a lock/semaphore-like mechanism. There are several ways to do this via Jenkins plugins:
Lockable Resources
Exclusion Plugin
Build Blocker
Also look at How can I prevent two Jenkins projects/builds from running concurrently?
I am struggling with this, and initially thought it could be the result of switching the pipeline data source from Cloud Datastore to Firebase Firestore, which required a new project. But I've since found the same error in separate pipelines. All pipelines run successfully on the local DirectRunner, and the permissions appear to be the same as in the old project.
It looks like none of the VMs are booting and the pipeline never scales above 0 workers. "The Dataflow appears to be stuck" is the only error message I could find, and there is nothing in Stackdriver. I've tried every dependency-management variation I could find in the docs, but that doesn't seem to be the problem.
My last Dataflow job-id is 2017-10-11_11_12_01-15165703816317931044.
I tried elevating the access roles of all service accounts and still had no luck.
Without any logging information, it's hard to pinpoint the cause. But this can happen if you have changed the permissions or roles of the Dataflow service account or the Compute Engine service account, so that the service account does not have enough permissions to get the images for the Dataflow workers.