No template files appearing when running a DataFlow pipeline - google-cloud-dataflow

I am trying to create a Dataflow template and run it via the Dataflow Cloud UI. After executing the pipeline via the command line with the Dataflow runner, it works correctly (i.e. the right data appears in the right places), but no "pre-compiled" template/staging files appear in the Google Cloud Storage bucket.
I did see this, but the post never mentions a resolution and I did include the parameter mentioned therein.
My command to run is:

    python apache_beam_test.py ^
        --runner DataflowRunner ^
        --project prototypes-project ^
        --staging_location gs://dataflow_templates/staging ^
        --temp_location gs://dataflow_templates/temp ^
        --template_location gs://dataflow_templates/
I do get warnings regarding the options, however:
    C:\Python38\lib\site-packages\apache_beam\io\gcp\bigquery.py:1677: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
      experiments = p.options.view_as(DebugOptions).experiments or []
    C:\Python38\lib\site-packages\apache_beam\io\gcp\bigquery_file_loads.py:900: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
      temp_location = p.options.view_as(GoogleCloudOptions).temp_location
Does that mean my command-line arguments will not be interpreted, and if so, how do I get the Dataflow/Beam templates into my GCS bucket so I can reference them from the Dataflow UI and run them again later?
Help much appreciated!

The problem was indeed that the CLI flags needed to be explicitly passed into the pipeline options.
As I did not have any custom flags in my project, I wrongly assumed Beam would handle the standard flags automatically, but this is not the case.
Basically, you have to follow this even if you have no new parameters to add.
I assumed that step was optional (which it technically is, if you only want to execute a pipeline once without any runtime parameters), but in order to reuse and monitor the pipelines in the Dataflow UI you have to stage them first, and that in turn requires passing a staging location into the pipeline.
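In outline, the fix is to forward the parsed argv into PipelineOptions instead of constructing the Pipeline without options. A minimal sketch (the transforms are placeholders; only the option-forwarding matters here):

    # apache_beam_test.py - minimal sketch; the transforms are placeholders.
    # The key line is PipelineOptions(beam_args): without it, flags such as
    # --template_location never reach the runner and nothing is staged.
    import argparse

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run(argv=None):
        parser = argparse.ArgumentParser()
        # No custom flags are defined here; parse_known_args still splits
        # off the standard Beam/Dataflow flags so they reach the runner.
        _, beam_args = parser.parse_known_args(argv)
        options = PipelineOptions(beam_args)
        with beam.Pipeline(options=options) as p:
            (p
             | "Create" >> beam.Create(["some", "records"])
             | "Print" >> beam.Map(print))

    if __name__ == "__main__":
        run()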
Also, as far as I understand, executing the pipeline requires a service account, while uploading the staging files requires Google Cloud SDK authentication.
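For what it's worth, on Windows that separation can be set up roughly like this (a sketch; the key file path is a placeholder, and GOOGLE_APPLICATION_CREDENTIALS is the standard variable the Beam SDK picks up):

    rem Google Cloud SDK authentication, used for uploading the staging files
    gcloud auth login

    rem Service account credentials used when the pipeline itself executes
    set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\service-account-key.json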

Related

Generating Dataflow Template Using Python

I have a Python script that creates a Dataflow template in the specified GCS path. I have tested the script using my GCP free trial and it works perfectly.
My question: using the same code in a production environment, I want to generate a template, but I cannot use Cloud Shell because of restrictions, and I also cannot directly run the Python script that uses the SA keys.
I also cannot create a VM and use it to generate a template in GCS.
Considering the above restrictions, is there any option to generate the Dataflow template?
Using Dataflow Flex Templates should obviate the need to generate templates programmatically; instead, you can create a single template that can be parameterized arbitrarily.
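For illustration, a hedged sketch of the Flex Template flow with a prebuilt container image (the bucket, image, and parameter names are placeholders):

    # Package the template spec once, pointing at a prebuilt container image
    gcloud dataflow flex-template build gs://my-bucket/templates/my-template.json \
        --image "gcr.io/my-project/my-template-image:latest" \
        --sdk-language "PYTHON"

    # Launch it later with arbitrary parameters; no regeneration needed
    gcloud dataflow flex-template run "my-job" \
        --template-file-gcs-location gs://my-bucket/templates/my-template.json \
        --region us-central1 \
        --parameters input=gs://my-bucket/input.csv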
Using Composer, I triggered the Dataflow DAGs, which created jobs in Dataflow. I also managed to generate a Dataflow template, and then executed the job from the Dataflow console using that template.
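For reference, launching a job from a staged template out of Composer can be sketched like this (assumes the apache-airflow-providers-google package; the template path, region, and parameters are placeholders):

    # Sketch of a Composer/Airflow DAG that starts a Dataflow job from a
    # staged template; paths, region, and parameters are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataflow import (
        DataflowTemplatedJobStartOperator,
    )

    with DAG(
        dag_id="run_dataflow_template",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        run_template = DataflowTemplatedJobStartOperator(
            task_id="run_template",
            template="gs://dataflow_templates/my_template",
            location="us-central1",
            parameters={"input": "gs://my-bucket/input.csv"},
        )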

How to get files produced during a Travis-CI build?

I am using Travis-CI to test code in a repository. Quite a few files are produced during testing, and I would like to keep them in a persistent place. How can I do that within the context of Travis-CI?
As an artificial example, suppose my Travis-CI build runs a C program that stores a large number of integers in a specific file. The file can be found on the Travis-CI server after the build, but how can I retrieve it? In my use case this file is large, and it would not make sense to read it from the Travis-CI console; in other words, I would not consider using "cat ..." in .travis.yml.
After some search, here is what I got:
The most convenient way seems to be deploying the generated files to GitHub Pages. The process is explained here: https://docs.travis-ci.com/user/deployment/pages/. In short:
first, create a GitHub Pages site from the repository under test. This can be done through the repository's GitHub web UI. The outcome is an additional remote branch called gh-pages.
then, in .travis.yml, use the deploy section to specify the conditions for the deployment, as sketched below.
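A minimal deploy section for this might look like the following (a sketch; the token variable and output directory are placeholders):

    # .travis.yml - deploy generated files to the gh-pages branch
    deploy:
      provider: pages
      skip_cleanup: true
      github_token: $GITHUB_TOKEN   # personal access token, set in the repo's Travis-CI settings
      local_dir: test-output        # directory holding the files produced by the build
      keep_history: true
      on:
        branch: main                # only deploy builds of this branch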

Working with versions on Jenkins Pipeline Shared Libraries

I'm trying to figure out how to work with a specific version of a Shared Library.
The Jenkins documentation about this isn't quite clear, so I've been experimenting, but with no success.
They basically say:
But how should I configure somelib in the 'Global Pipeline Libraries' section (under Manage Jenkins > System Config) so I can use any of the available stable versions?
The thing is:
Imagine that I've my somelib Project under version control and, currently, I've released 2 stable versions of it: v0.1 and v0.2 (so I have 2 tags named v0.1 and v0.2).
And in some Pipeline I want to use somelib's version v0.1 and on another Pipeline I need to use v0.2 version.
How can I do this using the @Library annotation provided by Jenkins?
In the Global Pipeline Libraries section under Manage Jenkins > System Config you only set the default library version, used when nothing else is specified inside the Jenkinsfile. It might look like this (ignore the "Failed to connect to repo" error here):
Inside the Jenkinsfile you can explicitly specify which version you want to use if you do not want the default:

    @Library('somelib@<tag/branch/commitRef>') _
That way you can freely choose at any time which library version to use for your project.
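For the v0.1/v0.2 scenario from the question, the two Jenkinsfiles would simply pin different tags (a sketch):

    // Jenkinsfile of the first pipeline: pin tag v0.1 of the shared library
    @Library('somelib@v0.1') _

    // Jenkinsfile of the second pipeline: pin tag v0.2
    @Library('somelib@v0.2') _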
Following @fishi's response, I just want to leave an important note.
During library configuration on Global Pipeline Libraries you must select Modern SCM option so things can work seamlessly.
If you select Legacy Mode instead you'll not be able to use the library as desired.
If for some reason Modern SCM does not appear in the Retrieval Mode option, it means that you need to upgrade the Global Pipeline Libraries plugin or even Jenkins itself.
Basically "Version" is the branch name for the repo which stores the shared library codes. If you don't have any branch other than main or master, make sure to fill it in Default Version in your Global Pipeline Library configuration

Post Deployment JVM log validation for JBOSS and WAS Application

We use Jenkins and UrbanCode Deploy to do our builds and deployments, respectively. After the deployment we manually validate the JVM logs. Most of the applications we deploy run on JBoss and WAS 8.5. I wanted suggestions on automating this post-deployment validation task. Is there any tool or plugin that can be integrated with UrbanCode Deploy to parse the logs for certain keywords?
I have "Log parser" plugin which is an open source plugin in Jenkins. Are there any better ideas?
In UrbanCode Deploy you can use the step called "Monitor File Contents" to check if a regular expression is contained in a file.
Another way would be to output the log file content in a shell step, like cat logfile, and then use a post-processing script to check whether an expression occurs in the output. In that case you can use JavaScript syntax, as in the sketch below.
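A post-processing script along those lines might look like this (a sketch assuming UrbanCode Deploy's scanner and properties objects; the regex and property value are placeholders):

    // Scan each line of the step's output for error keywords and mark
    // the step as failed when any line matches.
    scanner.register("ERROR|FATAL|Exception", function(lineNumber, line) {
        properties.put("Status", "Failure");
    });
    scanner.scan();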

Jenkins drops a letter from file paths

We have a Code Composer Studio (Eclipse) project that uses CMake to generate makefiles and build. The project compiles as expected when it is manually imported onto the Jenkins slave (Win10 x64) and executed from the command line, but it fails when the build is handled by Jenkins. The failure always follows the same pattern: a single letter is dropped from the path of an object file. For example, [Repo directory]/Cockpit_Scaling_and_Exceedance_data.dir becomes [Repo directory]/Cockpit_Scaling_and_Exceedance_ata.dir, and linking fails because the referenced object file cannot be found.
I made sure that there are no differences between the account environment variables and the system environment variables, and I also configured the Jenkins service to use the admin account on the slave instead of SYSTEM, to eliminate as many differences between Jenkins and the command line as possible.
The project will build successfully using one of our other Jenkins slaves (also Win10 x64), so we know that it's not a Windows 10 issue or a problem with our Jenkins configuration. Since I can't find any differences between the configuration of the two slave machines, I was hoping that someone might be able to suggest somewhere to look for this path issue.
I never found out why the paths to object files were being mangled, but I did get the project to build successfully on the slave via Jenkins. All I did was change all of my system environment variables into user environment variables. I copy-pasted, so I know that the variables themselves did not change.
I have no idea why this corrected this issue as I had inserted a whoami call at the beginning of the build to confirm that Jenkins is indeed running as a user and not System. I guess from this point on all of my environment variables will be specific to a user and not SYSTEM...
EDIT: The problem has returned. I have made no further progress in tracking down the cause behind this issue, but I have found that I do not see this symptom when running the scripts in a bash environment instead of a Windows command prompt. Fortunately for me the scripts have all been written in such a way that they can be run in both environments, so I have had my coworkers use bash instead for them.
