Flex Template Python dependency management without outbound network connectivity - google-cloud-dataflow

We are running Dataflow Python Flex Templates in a VPC without outbound network connectivity and without an artifact repository. Hence, for Dataflow Python jobs, we provide dependencies as described here: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#local-or-nonpypi.
What is the best practice for providing dependencies to Python Flex Templates when the build process that builds the Docker image has access to PyPI, but the Dataflow workers don't?
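For context, the pattern we are considering (a sketch only; the project, bucket, and image names are placeholders) is to bake every dependency into the Flex Template image at build time, while PyPI is still reachable, and have the workers reuse that image as their SDK container so nothing is fetched at runtime:

# Launch-time pipeline options (illustrative values) that point Dataflow
# workers at a container image with every dependency pre-installed.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",                # placeholder
    "--region=europe-west1",               # placeholder
    "--temp_location=gs://my-bucket/tmp",  # placeholder
    # Reuse the Flex Template image (built with PyPI access) as the SDK
    # container, so workers load dependencies from the image itself.
    "--sdk_container_image=gcr.io/my-project/my-flex-template:latest",
    "--sdk_location=container",  # take the Beam SDK from the image as well
    "--no_use_public_ips",       # workers have no external IPs
])

With these options the workers never contact PyPI; the image itself is pulled from the container registry, which works without external IPs when Private Google Access is enabled on the subnetwork.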

Related

Can Apache Beam Pipeline be used for batch orchestration?

I am a newbie in the Apache Beam environment.
I am trying to fit an Apache Beam pipeline for batch orchestration.
My definition of a batch is as follows:
Batch ==> a set of jobs,
Job ==> can have one or more sub-jobs.
There can be dependencies between jobs/sub-jobs.
Can an Apache Beam pipeline be mapped to my custom batch?
Apache Beam is a unified model for developing both batch and streaming pipelines, which can be run on Dataflow. You can create and deploy your pipeline using Dataflow. Beam pipelines are portable, so you can use any of the available runners according to your requirements.
Cloud Composer can be used for batch orchestration as per your requirement. Cloud Composer is built on Apache Airflow. Apache Beam and Apache Airflow can be used together, since Airflow can trigger Beam jobs. Since you have custom jobs running, you can configure Beam and Airflow for batch orchestration.
Airflow is meant for orchestration and pipeline dependency management, while Beam is used to build data pipelines that are executed by data processing systems.
I believe Composer might be better suited for what you're trying to build. From there, you can launch Dataflow jobs from your environment using Airflow operators (for example, if you're using Python, you can use the DataflowCreatePythonJobOperator).
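A minimal DAG sketch of that idea follows; the DAG id, schedule, bucket paths, and job names are all illustrative, and newer google providers offer BeamRunPythonPipelineOperator as the successor to this operator:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowCreatePythonJobOperator,
)

with DAG(
    dag_id="nightly_batch",          # placeholder
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",   # e.g. run the batch nightly at 02:00
    catchup=False,
) as dag:
    job_a = DataflowCreatePythonJobOperator(
        task_id="job_a",
        py_file="gs://my-bucket/pipelines/job_a.py",  # placeholder path
        job_name="job-a",
        location="us-central1",
        options={"temp_location": "gs://my-bucket/tmp"},
    )
    job_b = DataflowCreatePythonJobOperator(
        task_id="job_b",
        py_file="gs://my-bucket/pipelines/job_b.py",  # placeholder path
        job_name="job-b",
        location="us-central1",
        options={"temp_location": "gs://my-bucket/tmp"},
    )
    # Airflow models the job/sub-job dependencies from the question:
    job_a >> job_b  # job_b runs only after job_a succeeds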

Karate tests execution

Our requirements for API testing are:
To deploy the test-automation module (Karate feature files, custom Java classes) into an AWS ECS Fargate cluster.
To trigger the tests via a Jenkins pipeline after every build of the actual microservice.
In addition to the above, the test-automation module should be triggered to run the test suite on demand and/or at scheduled intervals (say, nightly) and send reports.
I have gone through the Karate Distributed Testing and stand-alone executable JAR options, but they don't seem suitable for my case. Is Distributed Testing supported only for Web-UI automation testing?
Any thoughts would be helpful.
For this use case, just use a Maven + JUnit project; then there is no difference between Karate and any other Java Jenkins pipeline.
It should be Jenkins' responsibility to do a scheduled build. It is up to you how to get all this into Fargate; maybe building a Docker container is part of the answer, but I would recommend trying to keep it simple.
Here is some Docker related discussion that may help: https://github.com/intuit/karate/issues/396
Open a new question with specifics next time.

How to scale down OpenShift/Kubernetes pods automatically on a schedule?

I have a requirement to scale down OpenShift pods at the end of each business day automatically.
How might I schedule this automatically?
OpenShift, like Kubernetes, is an API-driven application. Essentially all application functionality is exposed over the control-plane API running on the master hosts.
You can use any orchestration tool that is capable of making API calls to perform this activity. Information on calling the OpenShift API directly can be found in the official documentation in the REST API Reference Overview section.
Many orchestration tools have plugins that allow you to interact with the OpenShift/Kubernetes API more natively than making raw network calls. In the case of Jenkins, for example, there is the OpenShift Pipeline Jenkins plugin, which allows you to perform OpenShift activities directly from Jenkins pipelines. In the case of Ansible, there is the k8s module.
If you were to combine this with Jenkins' capability to run jobs on a schedule, you would have something that meets your requirements.
For something much simpler, you could just schedule Ansible or bash scripts on a server via cron to execute the appropriate commands against the OpenShift API.
Executing these commands from within OpenShift is also possible via the CronJob object; a sketch of such a script follows.
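For illustration, a small Python sketch using the official kubernetes client (the deployment name and namespace are placeholders; OpenShift DeploymentConfigs would need the OpenShift API group instead of apps/v1):

# Scale a Deployment to zero replicas via the Kubernetes/OpenShift API.
# Run it from cron, or package it in an image and schedule it as a CronJob.
from kubernetes import client, config

def scale_down(name: str, namespace: str, replicas: int = 0) -> None:
    # Use the in-cluster service account when running inside the cluster,
    # otherwise fall back to the local kubeconfig (e.g. from `oc login`).
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()

    client.AppsV1Api().patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

if __name__ == "__main__":
    scale_down("my-app", "my-namespace")  # end-of-business-day scale-down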

Implement JMeter/Taurus with OpenShift

I am implementing JMeter/Taurus for performance testing of microservices. We are using the OpenShift PaaS solution to run all microservices. I am able to deploy JMeter/Taurus inside OpenShift using a Jenkins pipeline, and I have generated the Taurus report from the JMX in the container. My requirement is to publish the Taurus report to Jenkins rather than storing it in cloud storage or Nexus. Can someone advise on the best approach to publishing the performance report for developers on Jenkins, or any other optimal way to publish it?
I found something by googling where a Jenkins agent was deployed inside OpenShift and the test-suite Git repo was checked out into the agent's workspace; I just want to make sure this is the best approach for my scenario. Our Jenkins master is running on Google Cloud Platform VMs with some dynamic slaves.
Thanks in Advance!
According to the Dump Summary for Jenkins Plugins chapter of the Taurus User Manual, you just need to add a reporting module definition to your YAML configuration file, like:
reporting:
- module: final-stats
  dump-xml: stats.xml
And "feed" this stats.xml file to Jenkins Performance Plugin
That's it, you should get Performance Report added to your build dashboard. Check out How to Run Taurus with the Jenkins Performance Plugin article for more information if needed.

Continuous Deployment using Jenkins and Docker

We are building a Java-based high-availability service for a financial application. I am part of the team managing continuous integration using Jenkins.
Lately we introduced continuous deployment too, and we opted for Docker containers.
Here is the infrastructure:
The production cluster will have 3 RHEL machines running the following docker containers on each of them:
3 instances of Wildfly
Cassandra
Nginx
The application IDE is NetBeans, and the source code is in Git.
Currently we are doing manual deployment on this infrastructure.
Please suggest some tools that I can use with Jenkins to complete the continuous deployment process.
You might want Jenkins to trigger on each push to your Git repository. There are plugins that help you do that with a webhook; the Gitlab plugin is one solution, and similar solutions exist for GitHub and other Git hosts.
Instead of relying heavily on bash and Jenkins configuration, you might want to set up a Jenkins pipeline with the Jenkins Pipeline plugin or even the Pipeline: Multibranch plugin. With those you can automate your build in Groovy code (a Jenkinsfile) kept in a repository, with the possibility to add functionality via other plugins that build on them.
You can then use the Docker Pipeline plugin to easily build Docker images, push them, and run code inside Docker containers.
I would suggest building your services inside Docker so that your Jenkins machine does not need all the different dependencies installed (and therefore possibly conflicting versions). Use Docker containers with all the dependencies and run your build code in there with the Docker Pipeline plugin from Groovy.
Install a registry solution to push and pull your Docker images to and from.
Use the Pipeline: Shared Groovy Libraries plugin to extract libraries from your Jenkinsfiles so that they can be reused. Those library files should live in their own repository, which your Jenkins knows about and keeps up to date. You could even have an entire pipeline process shared between multiple projects that simply add parameters in their Jenkinsfiles.
A lot of text and no examples. If you think something is interesting and you want to see some code, just ask. I am currently setting all this up.
