Evaluation of Activiti as Continuous Integration orchestrator

I'm working in the DevOps space and currently supporting an overly complicated CI system. Its purpose is to test & certify multiple Java artifacts: every single artifact is run against its tests, and artifacts are also tested against each other. We have multiple Jenkins instances and complicated custom workflows, but they share the same limitation: lack of resource control. We ended up with a bunch of purely technical Jenkins jobs to deal with those limitations, but they aren't perfect and the initial workflow has become too bloated.
Here I'm asking for your expertise on the applicability of the Activiti BPM engine to a CI process.
We have the following issues with the current process:
Cloud nodes can be handed over from one Jenkins job to another. If the workflow terminates in the middle (say, functional tests fail on a newly built artifact), we have to free those nodes.
Jobs can also consume multiple resources of their own - databases, multi-node environments, etc. Those resources must be freed when the workflow finishes.
Ideally, we should be able to define workflow steps in some DSL and bind resources to those steps. During workflow execution, the engine could then determine when each resource is first required and request it just before that step (according to the resource type) from the appropriate pool / provider.
After each step finishes, the workflow engine would run what I call "garbage collection" over resources. Based on the provided DSL, it could calculate the set of steps still reachable from the current state and the set of resources bound to those steps. From that it could construct a list (currently allocated resources MINUS resources still required in the future), and that list would be garbage collected.
With such "garbage collection" I'm trying to avoid the overly complicated manual resource-lifecycle logic that would otherwise be embedded into the workflow definition and bloat it. I want clear, well-understandable (and easily supported) workflows.
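To make the idea concrete, here is a minimal Groovy sketch of the calculation I have in mind (the step graph, resource names and data structures are made up; a real engine would derive them from the workflow DSL):
```groovy
// Illustrative sketch only: the step graph, resource names and data structures
// are invented here; a real engine would derive them from the workflow DSL.

// Which steps can follow which (a tiny linear workflow for the example).
def successors = [
    build   : ['unitTest'],
    unitTest: ['funcTest'],
    funcTest: ['certify'],
    certify : []
]

// Which resources each step needs.
def boundResources = [
    build   : ['buildNode'],
    unitTest: ['buildNode', 'testDb'],
    funcTest: ['testEnv', 'testDb'],
    certify : ['certifyEnv']
]

// All steps still reachable from the current step (including itself).
Set reachableSteps(String current, Map successors) {
    def seen = [] as Set
    def queue = [current]
    while (queue) {
        def step = queue.pop()
        if (seen.add(step)) {
            queue.addAll(successors[step] ?: [])
        }
    }
    return seen
}

// Resources still needed by any reachable step.
def stillNeeded = reachableSteps('funcTest', successors)
        .collectMany { boundResources[it] ?: [] } as Set

// Currently allocated MINUS future-required = candidates for "garbage collection".
def allocated = ['buildNode', 'testDb', 'testEnv'] as Set
def garbage   = allocated - stillNeeded

println garbage   // [buildNode] can be released back to its pool
```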
Do you think that it can be done easily with Activiti or any other BPM engine?

Andrev,
this can be implemented with limited effort. We have created similar workflows for QA environments using the open source BPMS Eclipse Stardust http://www.eclipse.org/stardust/
Best regards
Rob

Related

Restrict selectable credentials when using Git plugin options

I have a Jenkins instance that has users on different projects. I also have a Bitbucket instance where the users store their code. Push/pull access to the code repositories is authenticated by SSH public/private keys. Not all users in Bitbucket have access to all repositories.
I want users to be able to configure their jobs to use their own private keys to pull source from Bitbucket, but users should not be able to access other users' private keys. Is this possible with a combination of the Jenkins Credentials plugin and the Git plugin? If so, how do I configure this?
If this is not possible with the Jenkins Credentials plugin/Git plugin, how can I implement this in Jenkins? I figure enterprises that use Jenkins must have resolved this problem but I can't find a solution anywhere.
Jenkins has a lot of inherent security issues in this respect and allowing users to have configuration permissions in a multi-tenant environment is very difficult to lock down.
Even if you were able to set permissions on a per-credentials basis, a user who has configuration permissions to set up their own freestyle job can easily run processes on the nodes that could scrape passwords from the environment of other tenants. This can even be done with background processes if you limit nodes to one executor slot.
From what I've seen in industry and leveraged myself at enterprise scale, there are two high-level recommendations I would suggest:
Break up the single instance into multiple instances so each set of users or teams can have their own instance to work with.
Evaluate what users' functional needs are and provide a capability to request jobs, so that users only need build/read permissions rather than configuration permissions.
For Item 1:
Breaking up the instances not only helps Jenkins management from a security perspective, but from a scaling perspective as well, since there are several issues you can run into with Jenkins once it reaches a certain size (e.g. users that are resource hogs, unstashing bottlenecks, archiving bottlenecks, poorly written pipelines, etc.). These scaling problems typically lead to a need to vertically scale the Jenkins master.
However, this approach has its own set of issues to solve, since you now have multiple instances to maintain, but that is typically a bit easier to manage and there are some off-the-shelf solutions available if you're willing to pay the price (e.g. CloudBees CI). Managing multiple instances can also be handled in-house if you're willing to write some scripts or set up a service for it. Personally I'm a bit more of a fan of the in-house solution than the paid solution, since I lean towards the ability to control one's own destiny, and off-the-shelf solutions aren't always the one-size-fits-all they claim to be.
For Item 2:
If you really want to keep a single instance, the best way to secure it is to not let users have configuration permissions. As mentioned above, Jenkins has a lot of inherent security issues that make it ill-suited for users to configure jobs in a multi-tenant setting. By evaluating the users' needs, you often find a lot of common requirements that could be served from shared job templates without having to give users those permissions.
Leveraging the Job DSL plugin to parameterize job creation is one way to do this. Parameters could then be provided either through a custom service or through configuration files committed to a git repo. Another approach is to leverage the Jenkins REST API directly with a custom service that posts new job configurations generated from custom job templates.
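As a rough illustration of the Job DSL approach, a seed script along these lines could generate templated jobs from a simple config list (the job names, repository URLs and structure here are assumptions, not a prescription):
```groovy
// Job DSL seed script: generates one templated pipeline job per entry in a
// config list. Names and repo URLs are illustrative only.
def teams = [
    [name: 'team-a', repo: 'https://git.example.com/team-a/app.git'],
    [name: 'team-b', repo: 'https://git.example.com/team-b/app.git']
]

teams.each { team ->
    pipelineJob("${team.name}-build") {
        description("Templated build job for ${team.name}; users get build/read only")
        definition {
            cpsScm {
                scm {
                    git {
                        remote { url(team.repo) }
                        branch('*/master')
                    }
                }
                scriptPath('Jenkinsfile')
            }
        }
    }
}
```
Users (or a lightweight request service) would only edit the config list, while the job template itself stays under the admins' control.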
However, this approach could still run into scaling problems in the long term if utilization of the Jenkins instance is expected to increase. These scaling problems are not insurmountable and can be mitigated with vertical scaling or by offloading some stashing/archiving activity, but at a certain point it might make sense to re-evaluate going with Item 1, or even a combination of Item 1 and Item 2.
Conclusion:
I know this is not likely the answer you were hoping for, but if security is a major concern, then a multi-tenant Jenkins instance that allows users configuration permissions is not the way to go.

How do I structure Jobs in Jenkins?

I have been tasked with setting up automated deployment and, after some research, settled on Jenkins to get the job done. Prior to this I had approximately zero knowledge of Jenkins, beyond hearing the name. I have no real knowledge of Devops beyond what I have learnt in the last couple of weeks; no formal training, no actual books, just Google searches.
We are not running a full blown/classic CI/CD process; this is a business decision. The basic requirements are:
Source code will be stored in GitHub.
Pull requests must be peer approved.
Pull requests must pass build/unit/db deploy tests.
Commits to specific branches must trigger a deployment to a related specific environment (Production, Staging or Development).
The basic functionality that I am attempting to support covers (what I currently see as) two separate processes:
On creation of a pull request, the application is built, unit tests are run, and the DB deploy is tested. Status info must be passed back to GitHub.
On commit to one of three specific branches (master, staging and dev) the application should be built, and deployed to one of three environments (production, staging and dev).
I have managed to cobble together a pipeline that does the first task rather well. I am using the generic web hook trigger, and manually handling all steps using a declarative pipeline stored in source control. This works rather well so far and, after much hacking, I am quite happy with the shape of it.
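For context, a stripped-down sketch of what that pipeline looks like (the stage contents and commands below are placeholders, not my actual configuration):
```groovy
// Jenkinsfile (declarative) -- simplified illustration of the pull request pipeline.
// Triggered by the Generic Webhook Trigger; trigger configuration omitted here.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps { sh './gradlew assemble' }            // placeholder build command
        }
        stage('Unit tests') {
            steps { sh './gradlew test' }                // placeholder test command
        }
        stage('DB deploy test') {
            steps { sh './scripts/db-deploy-test.sh' }   // placeholder script
        }
    }
    post {
        // Report the result back to GitHub, e.g. via the GitHub plugin
        // or a call to the commit status API.
        success { echo 'Report SUCCESS status to GitHub here' }
        failure { echo 'Report FAILURE status to GitHub here' }
    }
}
```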
I am now starting work on the next bit, automated deployment.
On to my actual question(s).
In short, how do I split this up into Jobs in Jenkins?
To my mind, there are 1, 2 or 4 Jobs to be created:
One Job to Rule them All
This seems sub-optimal to me, as the pipeline will include relatively complex conditional logic and, depending on whether the Job is triggered by a Pull Request or a Commit, different stages will be run. The historical data will be so polluted as to be near useless.
OR
One job for handling pull requests
One job for handling commits
Historical data for deployments across all environments will be intermixed. I am a little concerned that I will end up with >1 Jenkinsfile in my repository. Although I see no technical reason why I can't have >1 Jenkinsfile, every example I see uses a single file. Is it OK to have >1 Jenkinsfile (Jenkinsfile_Test & Jenkinsfile_Deploy) in the repository?
OR
One job for handling pull requests
One job for handling commits to Development
One job for handling commits to Staging
One job for handling commits to Production
This seems to have some benefit over the previous option, because historical data for deployments into each environment will not be cross polluting each other. But now we're well over the >1 Jenkinsfile (perceived) limit, and I will end up with (Jenkinsfile_Test, Jenkinsfile_Deploy_Development, Jenkinsfile_Deploy_Staging and Jenkinsfile_Deploy_Production). This method also brings either extra complexity (common code in a shared library) or copy/paste code reuse, which I certainly want to avoid.
My primary objective is for this to be maintainable by someone other than myself, because Bus Factor. A real Devops/Jenkins person will have to update/manage all of this one day, and I would strongly prefer them not to suffer from my ignorance.
I have done countless searches, but I haven't found anything that provides the direction I need here. Searches for best practices make no mention of handling >1 Jenkinsfile, instead focusing on the contents of a single pipeline.
After further research, I have found an answer to my core question. This might not be the absolute correct answer, but it makes sense to me, and serves my needs.
While it is technically possible to have >1 Jenkinsfile in a project, that does not appear to align with best practices.
The best practice appears to be to create a separate repository for each Jenkinsfile, which maps 1:1 with a Job in Jenkins.
To support my specific use case I have removed the Jenkinsfile from my main source code repository and created 4 new repositories:
Project_Jenkinsfile_Test
Project_Jenkinsfile_Deploy_Development
Project_Jenkinsfile_Deploy_Staging
Project_Jenkinsfile_Deploy_Production
Each repository contains a single Jenkinsfile and a readme.md that, in theory, contains useful information.
This separation gives me a nice view of the historical success/failure of the Test runs as a whole, and Deployments to each environment separately.
It is highly likely that I will later create a fifth repository:
Project_Jenkinsfile_Deploy_SharedLibrary
This last repository would contain pipeline code that is shared amongst the four 'core' pipelines. Once I have the 'core' pipelines up and running properly, I will consider refactoring what I can into this shared library.
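As a rough illustration, that shared library might end up holding a custom step along these lines (the step name, parameters and script are hypothetical):
```groovy
// vars/deployTo.groovy in the shared library repository (name is hypothetical).
// Each Jenkinsfile_Deploy_* pipeline can call this instead of duplicating steps.
def call(Map config) {
    // config.environment: 'development', 'staging' or 'production'
    // config.artifact:    path to the artifact being deployed
    echo "Deploying ${config.artifact} to ${config.environment}"
    sh "./scripts/deploy.sh --env ${config.environment} --artifact ${config.artifact}"
}
```
Each of the four deploy pipelines could then reduce to loading the library (e.g. with an @Library annotation) and calling something like deployTo(environment: 'staging', artifact: 'build/app.war').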
I will not accept my own answer at this point, in the hope that more answers are forthcoming.
Here's a proposal I would try for your requirements based on the experience at my last job.
Job1: builds and runs unit tests on every commit on master or whatever your main dev branch is (checks every 20 minutes or whatever suits you); this job usually finds compile and unit test issues very fast
Job2 (optional): run integration tests and various static code checks (e.g. clang-tidy, valgrind, cppcheck, etc.) every night, if the last run of Job1 was successful; this job usually finds lots of things, but probably takes a lot of time, so let it run only at night
Job3: builds and tests every pull request for release branches; that way you get some info in your pull requests about whether they're mature enough to be merged into the release branches
Job4: deploys to the appropriate environment on every commit on a release branch; on dev and staging you could probably trigger some more tests, if you have them
So Job1, Job2 and Job3 should run all the time. If pull requests to your release branches are approved by QA (i.e. reviews OK and tests successful) and merged to release branches, the deployment is done by Job4 automatically.
Whether you want to trigger Job4 only manually instead depends on your requirements and your dev process.
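If it helps, here is a minimal declarative-pipeline sketch of how Job4 could map release branches to environments (branch names, environment names and the deploy command are assumptions):
```groovy
// Jenkinsfile for Job4 -- deploy on commits to release branches.
// Assumes a multibranch pipeline so the branch name is known to the build.
pipeline {
    agent any
    stages {
        stage('Deploy to dev') {
            when { branch 'dev' }
            steps { sh './deploy.sh dev' }        // placeholder deploy command
        }
        stage('Deploy to staging') {
            when { branch 'staging' }
            steps { sh './deploy.sh staging' }
        }
        stage('Deploy to production') {
            when { branch 'master' }
            steps {
                // Optionally require a manual approval before production.
                input message: 'Deploy to production?'
                sh './deploy.sh production'
            }
        }
    }
}
```
The input step is one way to get the manual trigger for production mentioned above, while keeping dev and staging fully automatic.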

Jenkins - Multiple instances

We have around 30 Jenkins installs across our organization, both Windows and Linux. They are all used for different tasks and by different teams (e.g. managing Azure, manipulating data, testing applications etc.)
I have been tasked with looking at whether we could bring these all into one 'Jenkins Farm' but as far as I can see such a thing doesn't exist? Ultimately 'we' want some control and to minimize the footprint of Jenkins. The articles I have found don't recommend using a single Master server (with multiple nodes) because of the following:
No role-based access for projects (affecting other teams code)
Plugins can affect all projects
Single point of failure as there is only one master server
Is it best to leave these on separate servers? Are there any other options?
I believe role-based access for projects is possible using the Role Strategy Plugin: https://wiki.jenkins.io/display/JENKINS/Role+Strategy+Plugin
However, a single master isn't ideal, as you pointed out, because plugins can affect all projects. It's probably best to have separate Jenkins masters but configure agents so that they can be shared across teams/projects.

CI / CD: Principles for deployment of environments

I am not a developer, but I am reading about CI/CD at the moment, and I am wondering about good practices for automated code deployment. So far I have read a lot about deploying code to a pre-existing environment.
My question now is whether it is also good practice to use e.g. a Jenkins workflow to deploy an environment from scratch whenever a new build is created, for example to test the newly created build and then delete the environment again after testing.
I know that there are various plugins to interact with AWS, Azure etc. that could be used to develop a job for deployment of a virtual machine.
There are also plugins to trigger Puppet to deploy infra (as code), and plugins to invoke infrastructure orchestration.
So everything is available to be able to deploy the infrastructure and middleware before deploying code (with some extra effort of course).
Is this something that is used in real life? How is it done?
The background of my question is my interest in full automation of development with as few clicks as possible, and cost saving in a pay-per-use model by not having idle machines.
My question now is whether it is also good practice to use e.g. a Jenkins workflow to deploy an environment from scratch when a new build is created
Yes, it is good practice to deploy an environment from scratch. Like you say, Jenkins and Jenkins pipelines can certainly help with kicking off and orchestrating that process, depending on your specific requirements. Deploying a full environment from scratch is one of the hardest things to automate, and if that is automated, it implies that a lot of other things are also automated: infrastructure, application deployments, application configuration, and so on.
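For example, a pipeline could provision a throwaway environment, run tests against it, and always tear it down afterwards. A rough sketch, using Terraform purely as one possible provisioner (the question mentions Puppet and cloud plugins; any infrastructure-as-code tool would do), with placeholder commands:
```groovy
// Declarative Jenkinsfile sketch: build, create an ephemeral environment,
// test against it, and destroy it no matter what happened.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps { sh './build.sh' }                      // placeholder
        }
        stage('Provision environment') {
            steps {
                sh 'terraform init'
                sh 'terraform apply -auto-approve'         // spin up from scratch
            }
        }
        stage('Deploy and test') {
            steps {
                sh './deploy.sh ephemeral-env'             // placeholder
                sh './run-tests.sh ephemeral-env'
            }
        }
    }
    post {
        always {
            // Tear the environment down even if the tests failed,
            // so nothing sits idle in a pay-per-use model.
            sh 'terraform destroy -auto-approve'
        }
    }
}
```
The post/always block is what keeps environments from idling (and costing money) even when a test stage fails.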
Is this something that is used in real life?
Yes, definitely. A lot of shops do this. The simpler your environments, the easier it is, and therefore, a startup with one backend app would have relatively little trouble achieving this valhalla state. But even the creation of the most complex environments--with hundreds of interdependent applications--can be fully automated; it just takes more time and effort.
The background of my question is my interest in full automation of development with as few clicks as possible, and cost saving in a pay-per-use model by not having idle machines
Yes, definitely. The "spin up and destroy" strategy benefits all hosting models (since, after full automation, no one ever has to wait for someone to manually provision an environment), but those using public clouds see even larger benefits in terms of cost (vs always leaving AWS environments running, for example).
I appreciate your thoughts.
Not a problem. I will advise that this question doesn't fit stackoverflow's question and answer sweet spot super well, since it is quite general. In the future, I would recommend chatting with your developers, finding folks who are excited about this sort of thing, and formulating more specific questions when you all get stuck in the weeds on something. Welcome to stackoverflow!
All of this is used in various combinations; the objective is to deliver continuous value to the end user. My two cents:
Build & Release
It depends on what you are using. I personally recommend using what is available with your toolset. For example, VSTS (Visual Studio Team Services) offers a complete CI/CD pipeline. But if you have a unique need that can only be served by Jenkins, then you must use that, and VSTS offers that integration out of the box.
IAC (Infrastructure as code)
In addition to Puppet etc., you can take advantage of Azure ARM (Azure Resource Manager) templates to build and destroy an environment. Again, see what is available out of the box with the toolset you have.
Pay-per-use
What I have personally used is Azure DevTest Labs, with the code deployed to it via a CI/CD pipeline. I then set up a shutdown policy on the VMs so they auto-start and auto-shutdown at the times provided. This is a great feature that lets you save cost on the resources being used and replicate environments.
For example, a UAT environment might not be needed until QA has signed off. But using IaC you can quickly spin up the environment automatically and then have a one-click deployment set up to push code to UAT.

Significance of creating a single pipeline with multiple input sources over multiple pipelines each with its own input source?

I am working on a project that receives requests from multiple clients through Pub/Sub, which Dataflow pipelines will process in streaming mode to produce the responses. Each flow has some logic in common and also reads from / writes to BigTable/BigQuery.
What are the pros and cons (on both the development and maintenance side) of using one single pipeline that receives input from different clients versus a separate pipeline for each input?
In terms of development, these have about the same amount of complexity: you probably still have the common code written in one place, or perhaps even the entire pipeline code is identical but you're launching it with different parameters for different clients.
Maintenance-wise, there are pros and cons to both approaches.
One pipeline is likely to be cheaper. E.g. if traffic is overall very low and processing all the clients could fit on 1 machine, then it will actually happen on 1 machine - but with separate pipelines, none of them can use less than 1 machine, so you'll be using at least N machines all the time.
One pipeline might be easier to observe and monitor in the UI, and easier to deploy. That, though, depends on the structure of the pipeline: are you going to pipe all clients' data through the same transforms, or, say, have 1 read transform per client (say, if each client is reading from a different PubSub topic and writing to a different BigQuery table)? If it's all the same transforms, then you'll get the benefit of launching the pipeline once and not having to do anything at all when a client is added or removed (otherwise, you'll need to update the pipeline).
With several pipelines (one per client), it's easier to isolate the issues with different clients. E.g. you could stop processing individual clients one by one, or update them one by one (say, if you're testing out some experimental code and don't want to break all the clients at the same time if it's wrong). It becomes unlikely that a bug in the pipeline will cause one client's data to mix up with another client's data.
