Building a AWS serverless solution (lambda, s3, cloudformation etc) I need an automated build solution. The application should be stored in a Git repository (pref. Bitbucket or Codecommit). I looked at BitBucket pipelines, AWS CodePipeline, CodeDeploy , hosted CI/CD solutions but it seems that all of these do something static as in receiving a dumb signal that something changed to rebuild the whole environment.... like it is 1 app, not a distributed application.
I want to define ordered steps of what to do depending on the filetype per change.
E.g.
1. every updated .js file containing lambda code should first be used to update the existing lambda
2. after that, every new or changed cloudformation file/stack shoud be used to update or create existing ones, there may be a needed order (importing values from each other)
3. after that, code for new lambda's in .js files should be used to update the created lambda's (prev step) code.
Non updated resources should NOT be updated or recreated!
It seems that my pipelines should be ordered AND have the ability to filter input (e.g. only .js files from a certain path) and receive as input also what the name of the changed resource(s) is(are).
I dont seem to find this functionality withing AWS or hosted git solutions like BitBucket or CI/CD pipelines like CircleCI or Codeship, aws CodePipeline, CodeDeploy etc.
How come? Doesn't anyone need this? Seems like a basic requirement in my eyes....
I looked again at available AWS tooling and got to the following conclusion:
When coupling CodePipeline to CodeCommit repositry, every commit puts a whole package of the repositry on S3 as input for CodeCommit. So not only the changes but everything.
In CodePipeline there is the orchestration functionality i was looking for. You can have actions for every component like create-change-set for SAM component and execute-chage-set etc and have control over the order of all.
But:
Since all code is given as input I assume all actions in CodeCommit will be triggered even for a small change in code which does not affect 99% of the resources. Underwater SAM or CF will determine themself what did or did not change. But it is not very efficient. See my post here.
I cannot see in the pipeline overview which one was run the last time and its status...
I cannot temporary disable a pipeline or trigger with custom input
In the end I think to make a main pipeline with custom lambda code determining what actually changed using CodeCommit API and splitting all actions in sub pipelines. From the main pipeline I will push their needed input to S3 and execute them.
(i'm not allow to comment, so i'll try and provide an answer instead - probably not the one you were hoping for :) )
There is definitely a need and at Codeship we're looking into how best to support FaaS/Serverless workflows. It's been a bit of a moving target over the last years, but more common practices etc. are starting to emerge/mature to a point where it makes more sense to start codifying them.
For now, it seems most people working in this space have resorted to scripting (either the Serverless framework, or directly against the FaaS providers) but everyone's struggling with the issue of just deploying what's changes vs. deploying everything as you point to. Adding further complexity with sequencing is obviously just making things harder.
Most services (Codeship included) will allow you some form of sequenced/stepped approach to deploying, but you'll have to do all the heavy lifting of working out what has changed etc.
As to your question of How come? i think it's purely down to how fast the tooling has been changing lately combined with how few are really doing it. There's a huge push for larger companies to move to K8s and i think they've basically just drowned out the FaaS adopters. Not that it should be like that, or that we at Codeship don't want to change that; it's just how i personally see things.
Related
I have a Jenkins instance that has users on different projects. I also have a Bitbucket instance where the users store their code. Push/pull access to the code repositories is authenticated by SSH public/private keys. Not all users in Bitbucket have access to all repositories.
I want users to be able to configure their jobs to use their own private keys to pull source from Bitbucket, but users should not be able to access other users' private keys. Is this possible with combination of the Jenkins Credentials plugin and the Git plugin? If so, how do I configure this?
If this is not possible with the Jenkins Credentials plugin/Git plugin, how can I implement this in Jenkins? I figure enterprises that use Jenkins must have resolved this problem but I can't find a solution anywhere.
Jenkins has a lot of inherent security issues in this respect and allowing users to have configuration permissions in a multi-tenant environment is very difficult to lock down.
Even if you were able to set permissions on a per credentials basis, a user that has configuration permissions to setup their own freestyle job can easily run processes on the nodes that could scrape passwords from the environment of other tenants. This can even be done with background processes if you limit one executor slot per node.
From what I've seen in industry and leveraged myself for an enterprise scale there are two high-level recommendations I would suggest:
Breakup the single instance into multiple instances so each set of
users or teams can have their own instance to work with.
Evaluate what users' functional needs are and provide a capability to request jobs where configuration permissions do not need to be given to users, rather only build/read permissions.
For Item 1:
Breaking up the instances not only helps Jenkins management from a security perspective, but from a scaling perspective as well since there are several issues you can run into with Jenkins once reaching a certain size (e.g. users that are resource hogs, unstashing bottlenecks, archiving bottlenecks, poorly written pipelines, etc.). These scaling problems typically lead to a need of vertically scaling the Jenkins master.
However, this approach has its own set of issues to solve since you now have multiple instances to maintain, but that is typically a bit easier to manage and there are some off-the-shelf solutions available if you're willing to pay the price (e.g. Cloudbees CI). Managing multiple instances can be solved in-house as well if you're willing to write some scripts or setup a service to handle this. Personally I'm a bit more of a fan of the in-house solution than the paid solution since I lean towards the ability to control one's own destiny and off-the-shelf solutions aren't always the one-size-fits-all they claim to be.
For Item 2:
If you really want to keep a single instance, the best way to secure it is to not let users have configuration permissions. As mentioned above, Jenkins has a lot of inherent security issues that does not make it well-suited for users to configure jobs in a multi-tenant setting. By evaluating the users' needs, you often find that there is a lot of common requirements users have that could be provided from common job templates without having to give them permissions.
Leveraging the Job DSL plugin to parameterize job creation is one way to do this. Parameters could then either be provided through a custom service or configuration files that are committed to a git repo. Another approach is leverage Jenkins REST API directly with a custom service that posts new job configurations from a customs job templates.
However, this approach could still run into scaling problem in the long term if the utilization of the Jenkins instance is expected to increase. These scaling problem are not insurmountable and can be mitigated with vertical scaling or offloading some stashing/archiving activity, but eventually at a certain point it might make sense to re-evaluate going with Item 1, or even a combination of Item 1 with Item 2.
Conclusion:
I know this is not likely the answer you were hoping for, but if security is a major concern, then a multi-tenant Jenkins instance that allow users' configuration permissions is not they way to go.
I have been tasked with setting up automated deployment and, after some research, settled on Jenkins to get the job done. Prior to this I had approximately zero knowledge of Jenkins, beyond hearing the name. I have no real knowledge of Devops beyond what I have learnt in the last couple of weeks; no formal training, no actual books, just Google searches.
We are not running a full blown/classic CI/CD process; this is a business decision. The basic requirements are:
Source code will be stored in GitHub.
Pull requests must be peer approved.
Pull requests must pass build/unit/db deploy tests.
Commits to specific branches must trigger a deployment to a related specific environment (Production, Staging or Development).
The basic functionality that I am attempting to support covers (what I currently see as) two separate processes:
On creation of a pull request, application is built, unit tests run, and db deploy tested. Status info must be passed to GitHub.
On commit to one of three specific branches (master, staging and dev) the application should be built, and deployed to one of three environments (production, staging and dev).
I have managed to cobble together a pipeline that does the first task rather well. I am using the generic web hook trigger, and manually handling all steps using a declarative pipeline stored in source control. This works rather well so far and, after much hacking, I am quite happy with the shape of it.
I am now starting work on the next bit, automated deployment.
On to my actual question(s).
In short, how do I split this up into Jobs in Jenkins?
To my mind, there are 1, 2 or 4 Jobs to be created:
One Job to Rule them All
This seems sub-optimal to me, as the pipeline will include relatively complex conditional logic and, depending on whether the Job is triggered by a Pull Request or a Commit, different stages will be run. The historical data will be so polluted as to be near useless.
OR
One job for handling pull requests
One job for handling commits
Historical data for deployments across all environments will be intermixed. I am a little concerned that I will end up with >1 Jenkinsfile in my repository. Although I see no technical reason why I can't have >1 Jenkinsfile, every example I see uses a single file. Is it OK to have >1 Jenkinsfile (Jenkinsfile_Test & Jenkinsfile_Deploy) in the repository?
OR
One job for handling pull requests
One job for handling commits to Development
One job for handling commits to Staging
One job for handling commits to Production
This seems to have some benefit over the previous option, because historical data for deployments into each environment will not be cross polluting each other. But now we're well over the >1 Jenkinsfile (perceived) limit, and I will end up with (Jenkinsfile_Test, Jenkinsfile_Deploy_Development, Jenkinsfile_Deploy_Staging and Jenkinsfile_Deploy_Production). This method also brings either extra complexity (common code in a shared library) or copy/paste code reuse, which I certainly want to avoid.
My primary objective is for this to be maintainable by someone other than myself, because Bus Factor. A real Devops/Jenkins person will have to update/manage all of this one day, and I would strongly prefer them not to suffer from my ignorance.
I have done countless searches, but I haven't found anything that provides the direction I need here. Searches for best practices make no mention on handling >1 Jenkinsfile, instead focusing on the contents of a single pipeline.
After further research, I have found an answer to my core question. This might not be the absolute correct answer, but it makes sense to me, and serves my needs.
While it is technically possible to have >1 Jenkinsfile in a project, that does not appear to align with best practices.
The best practice appears to be to create a separate repository for each Jenkinsfile, which maps 1:1 with a Job in Jenkins.
To support my specific use case I have removed the Jenkinsfile from my main source code repository. I then create 4 new repositories:
Project_Jenkinsfile_Test
Project_Jenkinsfile_Deploy_Development
Project_Jenkinsfile_Deploy_Staging
Project_Jenkinsfile_Deploy_Production
Each repository contains a single Jenkinsfile and a readme.md that, in theory, contains useful information.
This separation gives me a nice view of the historical success/failure of the Test runs as a whole, and Deployments to each environment separately.
It is highly likely that I will later create a fifth repository:
Project_Jenkinsfile_Deploy_SharedLibrary
This last repository would contain pipeline code that is shared amongst the four 'core' pipelines. Once I have the 'core' pipelines up and running properly, I will consider refactoring what I can into this shared library.
I will not accept my own answer at this point, in the hope that more answers are forthcoming.
Here's a proposal I would try for your requirements based on the experience at my last job.
Job1: builds and runs unit tests on every commit on master or whatever your main dev branch is (checks every 20 minutes or whatever suits you); this job usually finds compile and unit test issues very fast
Job2 (optional): run integration tests and various static code checks (e.g. clang-tidy, valgrind, cppcheck, etc.) every night, if the last run of Job1 was successful; this job finds usually lots of things, but probably takes lots of time, so let it run only at night
Job3: builds and tests every pull request for release branches; so you get some info in your pull requests, if its mature enough to be merged into the release branches
Job4: deploys to the appropriate environment on every commit on a release branch; on dev and staging you could probably trigger some more tests, if you have them
So Job1, Job2 and Job3 should run all the time. If pull requests to your release branches are approved by QA (i.e. reviews OK and tests successful) and merged to release branches, the deployment is done by Job4 automatically.
It depends on your requirements and your dev process, if you want to trigger Job4 only manually instead.
I'm looking for a way to make multiple declaratively written Jenkinsfiles only run exclusively and block each other. They consume test instances who will be terminated after they run which causes problems when PRs are being tested as they come in.
I cannot find an option to make the BuildBlocker plugin do this, all the jenkinsfiles that use this plugin are not running in our Plugin/Jenkins version schema and it seems as if these [$class: <some.java.expression>] strings being exported from the syntax generator don't work here anyways.
I cannot find a way to run these Locks over all the steps involved in the pipeline.
I could hack a file-lock but this won't help me with multi-node builds.
This plugin could perhaps help you as it allows to lock resources you've declared previously so that if a resource is currently locked, any other job that requires the same resource will wait until it is released.
https://plugins.jenkins.io/lockable-resources/
Since you say you want declarative, probably wait for the currently-in-review Allow locking multiple stages in declarative pipeline jira issue to be completed. You can also vote for it and watch it.
And if you can't wait, this is your opportunity to learn golang (or whatever language you want to learn) by implementing a microservice that holds these locks that you call from your pipeline scripts. :D
The JobDSL plugin is for configuring Jenkins execution policies including blocking another and calling pipeline code.
https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.blockOn describes the method, which also the blocker plugin uses.
This is the tutorial for usage https://github.com/jenkinsci/job-dsl-plugin/wiki/Tutorial---Using-the-Jenkins-Job-DSL, the api https://github.com/jenkinsci/job-dsl-plugin/wiki/Job-DSL-Commands.
Taken from https://www.digitalocean.com/community/tutorials/how-to-automate-jenkins-job-configuration-using-job-dsl:
It should be possible to use https://github.com/jenkinsci/job-dsl-plugin/wiki/Dynamic-DSL, but I found no good usage example yet.
I am not a developer, but reading about CI/CD at the moment. Now I am wondering about good practices for automated code deployment. I read a lot about the deployment of code to a pre-existing environment so far.
My question now is whether it is also good-practice to use e.g. a Jenkins workflow to deploy an environment from scratch when a new build is created. For example for testing of a newly created build, deleting the environment again after testing.
I know that there are various plugins to interact with AWS, Azure etc. that could be used to develop a job for deployment of a virtual machine.
There are also plugins to trigger Puppet to deploy infra (as code) and there are plugins to invoke an infrastructure orchestration.
So everything is available to be able to deploy the infrastructure and middleware before deploying code (with some extra effort of course).
Is this something that is used in real life? How is it done?
The background of my question is my interest in full automation of development with as few clicks as possible, and cost saving in a pay-per-use model by not having idle machines.
My question now is whether it is also good-practice to use e.g. a Jenkins workflow to deploy an environment from scratch when a new build is created
Yes it is good practice to deploy an environment from scratch. Like you say, Jenkins and Jenkins pipelines can certainly help with kicking off and orchestrating that process depending on your specific requirements. Deploying a full environment from scratch is one of the hardest things to automate, and if that is automated, it implies that a lot of other things are also automated, such as infrastructure, application deployments, application configuration, and so on.
Is this something that is used in real life?
Yes, definitely. A lot of shops do this. The simpler your environments, the easier it is, and therefore, a startup with one backend app would have relatively little trouble achieving this valhalla state. But even the creation of the most complex environments--with hundreds of interdependent applications--can be fully automated; it just takes more time and effort.
The background of my question is my interest in full automation of development with as less clicks as possible and cost saving in a pay-per-use model by not having idling machines.
Yes, definitely. The "spin up and destroy" strategy benefits all hosting models (since, after full automation, no one ever has to wait for someone to manually provision an environment), but those using public clouds see even larger benefits in terms of cost (vs always leaving AWS environments running, for example).
I appreciate your thoughts.
Not a problem. I will advise that this question doesn't fit stackoverflow's question and answer sweet spot super well, since it is quite general. In the future, I would recommend chatting with your developers, finding folks who are excited about this sort of thing, and formulating more specific questions when you all get stuck in the weeds on something. Welcome to stackoverflow!
All is being used in various combinations; the objective is to deliver continuous value to end user. My two cents:
Build & Release
It depends on what you are using. I personally recommend to use what is available with the tool. For example, VSTS (Visual Studio Team Services) offers complete CI/CD pipeline. But if you have a unique need which can only be served by Jenkins then you must use that and VSTS offers that out of the box.
IAC (Infrastructure as code)
In addition to Puppet etc. You can take benefits of AZURE ARM (Azure Resource Manager) Template to Build and destroy an environment. Again, see what is available out of the box with the tool set you have.
Pay-per-use
What I have personally used is Azure Dev/Test Labs and have the code deployed to that via CI/CD pipeline. Later setup Shutdown policy on the VM so it will auto-start and auto-shutdown based on time provided. This is a great feature to let you save cost on the resources being used and replicate environments.
For example, UAT environment might not needed until QA is signed off. But using IAC you can quickly spin up the environment automatically and then have one-click deployment setup to deploy code to UAT.
Like every commit has a reason and purpose, I think each deploy has a purpose and reason. Source code commits have a comment. But deploying doesn't have any.
How do I record a reason and purpose for each deploy automatically?
I need to keep a record of:
Who deployed to where and what time.
Why deployed? Bug fixes? Feature update? Emergency fix not on iteration plan?
Which git or svn ref was used?
Have anybody felt the need for this kind of system? How do you feel about my approach?
How can I achieve my goal? I'm currently using Capistrano for deployment.
A bounty added. I'd like to hear more stories from different developers who are doing "continuous deployment".
I found two services that do deploy tracking:
Codebase
Hoptoad
Webistrano - https://github.com/peritor/webistrano/wiki - is a web interface to capistrano, that also tracks who's deployed what and when, so that could be worth investigating.
My current project uses a modified version of the apinsein's git-deployment recipe, which (when you tell cap to do a deploy) will tag current HEAD with a Git tag (which gives you all the benefits of normal Git commits).
I've built a web service for this exact problem, http://deploytracking.com, it hooks into capistrano and records the time, user, branch, ref, environment and repo that was involved in the deployment.
Strano - The Github backed Capistrano deployment management UI.
Regarding continuous deployment, I also submitted pull request there, which Introduce automatic deployments for GitHub projects, for now it simply triggers deploy task when somebody push to the master branch.
I don't know if it is still relevant but I would like to come up with a different solution. I am building a new deployment tool that does just what you are looking for.
I do not intend to spam my stuff here but since I am building something that could help you...
Anyway, have a look here https://alessiosantocs.github.io/Captain. I'm gathering feedback so if you have any please let me know.
Update
As suggested, I'm giving an explanation :)
I have also felt this need. I work in a digital startup and we're constantly deploying stuff 5 days a week on different Ruby on Rails application with Capistrano.
What we noticed was that for every single deployment, we should have done several things:
Keep track of which pull requests and commits went online that exact moment
Give some sort of a name to the deploy so we could recognize it
Alert our team members so that everyone could have been on the same page (without asking us of deployment's news)
Keep track of every deployments for future bugs and errors we might find at some point in time (which happened often)
So for this reason we started developing this custom solution that would integrate with Capistrano and our SCM (bitbucket) and keep track of every change we made to our master branch. This is what it does right now.
We are currently tracking deployment environment, repo source, deployment branch and revision. Mainly we manage pull requests, because we found that pull requests, better than commits, did solve an organizational issue in our team (it was difficult to approve other team member's code without a rigid system like PRs)
I would like to explain more about Captain and about our personal dev management strategy with you guys if you want.
Thanks #thirumalaimurugan for asking for clarification!
Update 2
We tried git tagging too. It was good and fun at the beginning but we couldn't manage them very well.
A tag is basically a bookmark to a specific revision. So we're talking about commits. A tag keeps no track of pull requests. It was quite a mess for us.
I don't think they're bad at what you're trying to achieve, but I think there must be some other solutions that could fit exactly your (and our too) problem.