Testing strategy for SCDF based batch pipelines / flows - spring-cloud-dataflow

Could you please share any artifacts that talk about the testing strategy for SCDF-based batch pipelines/flows? They can cover manual as well as automated ways of testing.

A similar question was recently answered in SCDF's Gitter channel — see spring-cloud/spring-cloud-dataflow?at=5ff0c5b7de608143155ac081
As long as you're using the Task Java DSL, you can programmatically define and launch tasks/batch jobs from SCDF. We use this in the product for our end-to-end acceptance tests as well. The links to the docs and the test suite are shared in the Gitter thread.
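Not the actual test-suite code linked from the Gitter thread, but a minimal sketch of the Task Java DSL approach. It assumes a locally running SCDF server at http://localhost:9393, a registered timestamp task application, and the DSL classes from the spring-cloud-dataflow-rest-client module (class names and constructors may differ across SCDF versions):

    import java.net.URI;

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.springframework.cloud.dataflow.rest.client.DataFlowOperations;
    import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;
    import org.springframework.cloud.dataflow.rest.client.dsl.task.Task;
    import org.springframework.cloud.dataflow.rest.client.dsl.task.TaskExecutionStatus;

    public class TaskPipelineAcceptanceTest {

        // Assumed SCDF server location; point this at your own environment.
        private final DataFlowOperations dataFlow =
                new DataFlowTemplate(URI.create("http://localhost:9393"));

        @Test
        void timestampTaskCompletes() throws Exception {
            // Define the task on the server via the Java DSL.
            try (Task task = Task.builder(dataFlow)
                    .name("e2e-timestamp-task")      // illustrative task name
                    .definition("timestamp")         // assumes the timestamp app is registered
                    .description("end-to-end acceptance test")
                    .build()) {

                long executionId = task.launch();

                // Poll (with a simple timeout) until the launched execution completes.
                long deadline = System.currentTimeMillis() + 60_000;
                while (task.executionStatus(executionId) != TaskExecutionStatus.COMPLETE
                        && System.currentTimeMillis() < deadline) {
                    Thread.sleep(1000);
                }
                assertEquals(TaskExecutionStatus.COMPLETE, task.executionStatus(executionId));
            }
        }
    }

Closing the Task at the end of the try-with-resources block destroys the task definition on the server, so repeated test runs don't collide.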

Related

GitHub Actions cloud-hosted runner, container within container?

I am currently working for an enterprise and have been asked to use GitHub Actions instead of ADO/Azure Pipelines or Jenkins.
My objective is to create a self-service model where we have a basic CI/CD framework that teams can use as a starting place for their pipelines. It has all the security, quality, and governance rolled in, making life easier for devs and reducing duplication of effort. Thus reusable workflows are a must, as is the ability to launch containers from a build agent/runner/worker.
Let me lay out my understanding of the situation with GitHub Actions:
GitHub Actions can do two things:
Launch a Container
Run JavaScript
Running a container within a container is considered a bad idea, and in fact, support for it is going away in the near future.
GitHub Hosted Runners run in a container.
GitHub Actions won't support Reusable Workflows until Q3 of 2021.
If my understanding is correct, then I'm dead in the water:
A GitHub-hosted runner for Actions is basically useless in my case unless I want to write JavaScript.
It looks like I'm back in the VM business, self-hosting a runner so that I can use it to host containers instead of running Docker within Docker.
My ability to create a generalized framework for my dev teams is somewhat undermined until GitHub gets around to implementing Reusable Workflows. (I think this is the biggest assumption, most likely to be disproved with a workaround)
Before I push back for a different CI tool, somebody please tell me what I'm missing here or what workarounds make this do-able.
After additional research and some testing, my hypotheses were confirmed:
Using a self-hosted runner on a VM is the most straightforward way to solve the "container-in-a-container" problem. It also solves the problem of consuming private/self-hosted package feeds from the runner without whitelisting every IP range used by GitHub-hosted runners, which is something most enterprise InfoSec teams would be reluctant to do.
This question had a second one rolled in, "How to create a reusable CI/CD framework for an enterprise in GitHub Actions," which was bad form on my part. The most straightforward options are a) wait until Reusable Workflows are fully implemented and worked out, or b) use a more mature orchestration tool like Jenkins, TeamCity, or Azure DevOps if you can't wait.
A couple of things to look at:
You might be able to create your own GitHub Actions to share behavior; info on how to call them is here.
For self-hosted runners, docker-github-actions-runner is a great starting point.

A basic question about continuous integration

This is not a programming question, but I don't know a more active forum, and besides, programmers are the best people to answer it.
I am trying to understand the rationale behind continuous integration. On one hand, I understand that it is good practice to commit your code daily before heading home, whether or not the coding and testing are complete. On the other hand, there is the continuous integration concept, where the minute something is committed it triggers a build and all the test cases are run. Aren't the two things contradictory? If we commit whatever code is done daily, it will cause daily failed builds. Why don't we manually trigger builds once the coding and testing are complete?
Usually, committing your code daily is about making sure your work will not be lost.
CI, or Continuous Integration, on the other hand, is about testing whether what you produced is OK. In the majority of projects, CI isn't applied to individual branches (feature, bugfix); it's applied to major branches (master, develop, releases, etc.). These branches aren't updated daily, as they need a pull request to be updated and someone to approve that pull request.
The use case for having CI on individual branches (feature, bugfix) is to check, before merging a pull request into a major branch, that the tests pass and the code builds.
So, summing up: yes, you should commit your code daily, but you don't need to apply CI to it daily.
I suggest you check out the Gitflow workflow: https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow
The answer is obvious.
1. Committing code: In general, code is committed only after testing it locally against the environment.
Consider Developer_A working on Component_A: they can commit with minimal verification, since their scope is just to develop Component_A.
Now imagine a complex system with 50 developers developing Component_B through Component_Z and beyond.
If someone commits code without even minimal testing, it will most probably give you a failed result.
Alternatively, the developer might commit it on a development branch; that altogether depends on the SCM strategy adopted in the project.
2. Continuous integration test scope:
The integrator, on the other hand, principally collects and combines the different pieces of code (software components) into one container and performs different tests.
Most importantly, the integrator needs to ensure that all the components developed by the different developers fit together and that, in the end, the software works as expected. To ensure that, the integrator has acceptance criteria, and to proactively prevent things that can go wrong, it is important to have these criteria automated with the help of continuous integration.
Among all factors, it is important to give developers feedback on the quality of the software. It is best for the project (economically) to know about a bug earlier; hence continuous integration and DevOps.
In a complex system it is worth having an automated watcher to catch the mistakes that sneak in from developers.
3. Tools and automation:
To create a human-independent system, automation tools like Jenkins are helpful.
Based on the testing strategy, different testing levels can be performed with the help of automation tools.

Web-based complex data-center automation tool

After evaluating existing tools like Ansible Tower, Rundeck, and others, it seems that no tool can fulfill the needed requirements.
We have a complex data center: clusters of DB and web servers, a lot of client systems (100+), and other tools like Solr, Redis, Kafka... deployed across the physical servers, not to mention that the same data-center servers have different accounts and Linux users (QA, staging, production, etc.). For now, the metadata about these environments, along with their web apps, the source code to be used, and the servers of each cluster, is all defined in XML, and there is a bash script that reads from that XML and is run manually to perform any operation/task (like checking out the source, building, deploying, starting, stopping, and other customized operations).
This system should be built by a developer and DevOps engineers together, but what I want to know is: is there a preferable framework (or frameworks) that could be used for this system? Are workflow frameworks usable in this case, e.g. Activiti BPMN? Ant is an option, but we need more than just an automation tool: a scheduler, logging, and a lot of other services.
And if this is not the right place, can you please point out where I can ask such a question?
What's required is to create a web-based automation system with:
A UI to define the specific operations to be done, like building, deploying a specific web app across the cluster on a specific environment, starting/stopping a specific web app on a specific machine, or any other customized operation, with multiple selections and a flexible, dynamic way of choosing options.
The front end should show the executing workflows and the operations within them.
A dynamic way to create a set of operations as a single workflow, with the ability to set the dependencies among them.
An interface between this system's back-end code and the already existing bash scripts that will do the actual tasks across the DC servers.
A scheduler able to organize these operations with respect to the defined, complicated dependencies between the workflows.
Authentication and authorization services for the users, since there will be a lot of customized roles over the operations, environments, products, etc.
A logging system to save the operations' outputs.
Why not use a combination of Ansible/Docker and Jenkins? Jenkins can do most of the stuff you described using Pipeline projects/multi-projects, with Ansible for your UI and role-related details.
The task you're describing is basic network orchestration, and there are a bunch of orchestration engines/software out there. However, the fact that you are asking this question means you're just starting out and not ready to invest in a full-fledged management product yet.
I strongly suggest that you take a look at Chef for your purposes.
Hope this helps.
I would recommend taking a look at jenkinsx.io if you are targeting Kubernetes and containers (Docker). As part of the Activiti BPM team, we are trying to align with such technologies to make it easy for people to integrate more complex workflows with the DevOps and operations side of their projects.

Any hosted CI service that natively supports JUnit XML reports?

Does anybody know a good, solid CI service that provides the common features of build parallelization BUT also support for JUnit reports?
The current ones that we have looked at (semaphoreapp, circleCI, travisCI,...) are good but relatively useless as we have to manually investigate what tests failed, since when, and how often, thus negating a lot of the benefits of a hosted service.
Things that we're looking to know (and are all provided by JUnit / Jenkins):
If the build failed, because of what test cases?
Total Number of Failures / Total Number of Tests (trends to better analyze things)
Individual Track record of any test (so we know exactly when it was broken, whether it's intermittent,...)
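For context, by JUnit reports I mean the standard per-test XML files that build tools emit. A plain, purely illustrative test like the one below, run through Maven Surefire, ends up as a TEST-*.xml file under target/surefire-reports/ with one testcase entry per method, which is exactly the data a service would need to answer the three questions above:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Purely illustrative test class. Running it via `mvn test` makes Maven Surefire
    // write a JUnit XML report (target/surefire-reports/TEST-CheckoutTest.xml) with one
    // <testcase> element per method and failure details attached; that XML is what a CI
    // server such as Jenkins parses to show which test broke the build and its history.
    public class CheckoutTest {

        @Test
        public void totalIncludesTax() {
            double total = 100.0 * 1.20;   // hypothetical 20% tax
            assertEquals(120.0, total, 0.001);
        }
    }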
You mentioned the most famous CI services, but there are alternatives where you can get a higher level of customization, like installing plugins, fine-grained configuration, etc.
CloudBees and ClinkerHQ are both based on Jenkins offered as a service. You can also get very useful metrics (coverage, failures, graphs, execution times, etc.) thanks to Jenkins Plugins and SonarQube. I think Jenkins and SonarQube are a perfect couple for you.
Notifications are very important too. You want to be notified when something is wrong. This feature is available on both.
Regards,
Antonio.
DISCLAIMER: I'm deeply involved in ClinkerHQ

Can BuildForge do what Hudson CI is currently doing?

I am looking for a comparison between IBM Build Forge (Rational) and Hudson CI.
At work we have full licenses for BuildForge but recently we started using Hudson for doing continuous integration and automating other tasks.
I used BuildForge very little and I would like to see if there are any special advantages of BuildForge over Hudson.
Also it would be very helpful to see a list of specific advantages of Hudson over BuildForge.
I'm not sure if it's important or not, but I found it interesting that Build Forge is not listed under continuous integration tools on Wikipedia.
Thanks for bringing attention to the fact that it was not on the Wikipedia list of continuous integration applications. I have now added it. Build Forge has been a leader in providing continuous integration capabilities through its SCM adapters for many, many years. Build Forge has a strength in supporting many platforms through its use of agents. These agents can run on Windows, Linux, AIX, Solaris, System z, and many more -- they even give you the source code for the agents for free so you can compile it on just about any platform. The interface allows you to easily automate tasks that run sequentially or in parallel on one or multiple boxes. Selectors allow you to select a specific build server by host name or by criteria such as "any Windows machine with 2 GB of RAM" from a pool of available agents. The entire process is fully auditable, utilizes role-based permissions, and is stored in a central enterprise database such as DB2, Oracle, SQL Server, and others.
One of the most compelling reasons to use Build Forge is its Rational Automation Framework for WebSphere. It allows full integration into WebSphere environments to automate deployments and configuration of WebSphere through out-of-the-box libraries. The full installation, patching, deployment of apps, and configuration of WAS and Portal can be performed using these libraries. To find out more, it is best to contact your IBM Rational representative.
You can use RAFW (IBM Rational Automation Framework for WebSphere) with BuildForge. It does not make sense to use RAFW with other CI servers, since RAFW requires BuildForge.
You have support for BuildForge, and it integrates with other IBM software like ClearCase. Theoretically you only have to deal with one vendor if something in the chain does not work, but IBM has different support teams for their products and you might become their ping-pong ball. :(
Hudson is open source (if you like that), which means you can get the source and modify it to serve you better. But the release cycle is very short (about one week, agile development). There is a more stable version with support available now (for cash, of course) from the company of the main author of Hudson.
Hudson is currently mainstream and is actively developed. I don't know how the usability of BuildForge is, but Hudson's is good (not always perfect). The plugin concept of Hudson is a great plus; I'm not sure if BuildForge has one as well.
Currently, we are using Hudson, but BuildForge was not looked at in detail.
You need to define what you would need continuous integration for (e.g. building, testing). Having used Hudson, I can vouch for its usefulness and effectiveness. There are many plugins that extend Hudson that can suit various needs. And you can't beat the price point (free).
You need to inquire as to why a BuildForge license was obtained at your place of employment. Perhaps someone on your team knows why this was done. If it isn't necessary for your needs, don't renew your BuildForge license and simply continue using Hudson.
Being a BuildForge/RAFW user, I have to object to one point stated above. It is perfectly possible to use RAFW without BuildForge. It is driven by a command line script, and you could use for example Hudson and RAFW together just fine.
A sample command would look like:
rafw.sh -e env -c cell -t was_common_configure_start_dmgr
The primary differentiators IMO:
Hudson/Jenkins is more readily extensible with the many existing plugins. It has a large, active community and plenty of information and documentation.
BuildForge can be configured with agents running on multiple machines and tasks can be assigned to run on a target agent. Reliable vendor support.
