How to Sandbox Ant Builds within Hudson

I am evaluating the Hudson build system for use as a centralized, "sterile" build environment for a large company with very distributed development (from both a geographical and managerial perspective). One goal is to ensure that builds are only a function of the contents of a source control tree and a build script (also part of that tree). This way, we can be certain that the code placed into a production environment actually originated from our source control system.
Hudson seems to provide an ant script with the full set of rights assigned to the user invoking the Hudson server itself. Because we want to allow individual development groups to modify their build scripts without administrator intervention, we would like a way to sandbox the build process to (1) limit the potential harm caused by an errant build script, and (2) avoid all the games one might play to insert malicious code into a build.
Here's what I think I want (at least for Ant, we aren't using Maven/Ivy right now):
The Ant build script only has access to its workspace directory
It can only read from the source tree (so that svn updates can be trusted and no other code is inserted).
It could perhaps be allowed read access to certain directories (Ant distribution, JDK, etc.) that are required for the build classpath.
I can think of three ways to implement this:
Write an ant wrapper that uses the Java security model to constrain access
Create a user for each build and assign the rights described above. Launch builds in this user space.
(Updated) Use Linux "Jails" to avoid the burden of creating a new user account for each build process. I know little about these, but we will be running our builds on a Linux box with a recent RedHat EL distro.
Am I thinking about this problem correctly? What have other people done?
Update: This guy considered the chroot jail idea:
https://www.thebedells.org/blog/2008/02/29/l33t-iphone-c0d1ng-ski1lz
Update 2: Trust is an interesting word. Do we think that any developers might attempt anything malicious? Nope. However, I'd bet that, with 30 projects building over the course of a year with developer-updated build scripts, there will be several instances of (1) accidental clobbering of filesystem areas outside of the project workspace, and (2) build corruptions that take a lot of time to figure out. Do we trust all our developers to not mess up? Nope. I don't trust myself to that level, that's for sure.
With respect to malicious code insertion, the real goal is to be able to eliminate the possibility from consideration if someone thinks that such a thing might have happened.
Also, with controls in place, developers can modify their own build scripts and test them without fear of catastrophe. This will lead to more build "innovation" and higher levels of quality enforced by the build process (unit test execution, etc.)

This may not be something you can change, but if you can't trust the developers then you have a larger problem than what they can or cannot do to your build machine.
You could go about this a different way: if you can't trust what is going to be run, you may need a dedicated person (or persons) to act as build master to verify not only changes to your SCM, but also to execute the builds.
Then you have a clear path of responsibility for builds not being modified after the build and only coming from that build system.
Another option is to firewall off outbound requests from the build machine to only allow certain resources like your SCM server, and your other operational network resources like e-mail, OS updates etc.
This would prevent people from making requests in Ant to resources off the build system that are not in source control.
When using Hudson you can setup a Master/Slave configuration and then not allow builds to be performed on the Master. If you configure the Slaves to be in a virtual machine, that can be easily snapshotted and restored, then you don't have to worry about a person messing up the build environment. If you apply a firewall to these Slaves, then it should solve your isolation needs.

I suggest you have 1 Hudson master instance, which is an entry point for everyone to see/configure/build the projects. Then you can set up multiple Hudson slaves, which might very well be virtual machines or (not 100% sure if this is possible) simply unprivileged users on the same machine.
Once you have this set up, you can tie builds to specific nodes, which are not allowed - either by virtual machine boundaries or by Linux filesystem permissions - to modify other workspaces.

How many projects will Hudson be building? Perhaps one Hudson instance would be too big, given the security concerns you are expressing. Have you considered distributing the Hudson instances out - one per team. This avoids the permission issue entirely.
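The question lists three ways to confine a build (Java security model, a dedicated OS account, a jail) and the answers add an egress firewall on the slave. As an added example that is not from the question or any of the answers, here is a POSIX shell wrapper showing how the pieces fit: it canonicalizes the workspace and build file and refuses a build file that escapes the workspace (including through a symlink planted inside it), refuses a workspace that is world-writable or not owned by the build account, writes a Java security policy that grants read/write only under the workspace and read-only access to the JDK and Ant installations, and prints an outbound iptables allow-list for the slave. The script name sandboxed-ant.sh, the BUILD_UID variable, the default JAVA_HOME/ANT_HOME paths and the policy grants are assumptions; GNU readlink -f and stat -c are assumed (both ship with RHEL).

```shell
#!/bin/sh
# sandboxed-ant.sh (hypothetical name): run an Ant build confined to one workspace.
# Added example, not part of Hudson or of the original post.
# Assumes GNU coreutils (readlink -f, stat -c), as shipped with RHEL.

# Canonicalize: resolve symlinks and "..", so prefix checks cannot be fooled.
canon() {
    readlink -f -- "$1"
}

# 0 if $2 lies inside workspace $1 after canonicalization, else 1. Rejects both
# "../../etc/passwd" and a symlink inside the workspace pointing elsewhere.
check_in_workspace() {
    ws=$(canon "$1") || return 1
    target=$(canon "$2") || return 1
    case "$target" in
        "$ws"|"$ws"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# 0 if the workspace is owned by uid $2 (default: caller) and not world-writable,
# otherwise another local account could plant files that the build picks up.
check_workspace_perms() {
    ws=$1
    owner=${2:-$(id -u)}
    [ -d "$ws" ] || return 1
    [ "$(stat -c '%u' -- "$ws")" = "$owner" ] || return 1
    case "$(stat -c '%A' -- "$ws")" in
        ????????w?) return 1 ;;      # 9th character: write bit for "others"
    esac
    return 0
}

# Java policy: read/write/delete inside the workspace, read-only on the JDK and
# Ant installation, no other filesystem access. Real builds need more grants
# (threads, reflection, exec); add them from -Djava.security.debug=access,failure.
write_policy() {
    ws=$(canon "$1")
    cat > "$2" <<EOF
grant {
  permission java.io.FilePermission "${ws}", "read";
  permission java.io.FilePermission "${ws}/-", "read,write,delete";
  permission java.io.FilePermission "${JAVA_HOME:-/usr/lib/jvm/java}/-", "read";
  permission java.io.FilePermission "${ANT_HOME:-/usr/share/ant}/-", "read";
  permission java.util.PropertyPermission "*", "read,write";
};
EOF
}

# Egress allow-list for a slave: SCM over HTTPS and DNS only, so a build script
# cannot pull code from anywhere but source control. Printed, not applied; the
# output is meant to be run as root on the slave.
egress_rules() {
    cat <<EOF
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -d $1 -p tcp --dport 443 -j ACCEPT
EOF
}

# run_build WORKSPACE BUILDFILE [targets...]: refuse unsafe setups, then run Ant
# under the policy with resource limits. <exec>/<java fork="true"> escape the JVM
# policy, so this still belongs under a dedicated unprivileged OS account.
run_build() {
    ws=$1; buildfile=$2; shift 2
    check_workspace_perms "$ws" "${BUILD_UID:-$(id -u)}" ||
        { echo "refusing: workspace ownership/permissions" >&2; return 2; }
    check_in_workspace "$ws" "$buildfile" ||
        { echo "refusing: build file outside workspace" >&2; return 2; }
    policy=$(mktemp) && write_policy "$ws" "$policy"
    umask 077
    ulimit -t 3600 -f 4194304       # 1 h CPU, 4 GiB per output file
    ANT_OPTS="-Djava.security.manager -Djava.security.policy==$policy" \
        ant -f "$buildfile" -Dbasedir="$(canon "$ws")" "$@"
}

case "${1-}" in
    run) shift; run_build "$@" ;;
esac
```

Invoke it as sandboxed-ant.sh run /var/hudson/ws/projectA /var/hudson/ws/projectA/build.xml dist. Two caveats a practitioner should keep in mind: the double "==" in -Djava.security.policy makes the written policy the only one in force, and the Java SecurityManager has been deprecated for removal since Java 17 (JEP 411), so the JVM policy is a belt on top of the OS-level account separation and firewalling the answers recommend, not a replacement for them.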

Related

How I can run single Jenkins job by previously defined rules

I'd like to get a hint about how (with which plugin) it is possible to run a SINGLE Jenkins job in a user-chosen way. The user MUST be able to choose the job he/she wants to run and choose the rule of execution:
E.g:
Create only jar files;
Create jars and send them over ssh
Create jars, generate documentation, etc...
I've found a few plugins (Artifactory, Release plugin) but it seems they don't support such logic.
I know that such thing can be implemented by creating several jobs, but this would require additional disk space.
Many Thanks!
In order to solve my issue, I've decided to create a few Jenkins jobs with the same custom workspace. So, when an IT engineer runs any of these "connected" jobs (which have the same workspace), the workspace is updated (have a look at the CVS rules for your job), and that's how we avoid wasting space.
Additionally, each job's behaviour can be configured easily => the set of rules (shell scripts, Gradle, batch, etc.) and their sequence in order to achieve the desired result.
The last advantage, but not the least one, is that the security (access control) is still very easy to configure.
I think, that is the correct way.

Continuous TFS local deployment

I have a configured CI with TFS. What are the best ways to organize post-build (or even better post-test) deployment. My binaries are some libraries with single executable file.
Here is what I need:
Build on each commit. (This is configured and done)
When the build is successful (or the tests pass), grab the binaries and drop them into some specific folder on the same build machine, with full replacement of the previous files and folders. (I'd like to be able to configure the folder location somehow.)
Launch the application with some parameters, and I need to have standard output redirection. For example: App.exe param=paramValue > log.txt
And before starting the application I need to kill the previous instance of it. (This is some kind of server instance that is alive all the time)
The most obvious solution that I tried was to do this with post-build script. But this try failed. See here
Use Release Management in conjunction with PowerShell (or better still, Desired State Configuration) scripts. Depending on your MSDN licensing, it could be free for you, and it's specifically designed from the ground up to handle managing releases.
Overextending the build process to also do deployment is an awful idea. The build tools were designed to build, and they're good at it! They're not good at the types of considerations you have when you're trying to do deployments.
The problem is that most CI solutions (TFS included) would get you to the point where you had binaries, then say "Welp, you're on your own! Have fun figuring out how to deploy this stuff!" This never ends well -- you end up with something inflexible and very difficult to troubleshoot and maintain.
The modern "devops" approach here is to have your application's requirements in source control, treated as code (in this case, as a DSC script or scripts).
One other consideration: It sounds like you're trying to treat a console application as a service. This is going to be a big, big pain for you, since most software that handles releases will not run in an interactive session. Turn it into a true Windows service and your life will be easier.

Power tradeoff between buildscript and CI server

Although this question specifically involves Gradle and Bamboo, it really is a question about any build system (Ant/Maven/Gradle/etc.) and any CI tool (Bamboo/Jenkins/Hudson/etc.).
I was always under the impression that the purpose of a CI build is to:
Check out code from VCS
Run a buildscript (Gradle, etc.)
Deploy a binary (WAR, etc.) to an environment
Hence, all the guts and heavy-lifting (running automated tests, code analysis, test coverage, compiling, Javadocs, packaging, etc.) was all to be done from inside the buildscript.
But Bamboo seems to allow you to break this heavy-lifting out of the buildscript and into Bamboo itself. In Bamboo, you can add build stages and decompose the stages into tasks. Each task is something just as atomic/fundamental as an Ant task.
So it got me thinking: how much should one empower the CI tool? What typical buildscript functionality should be transferred over to Bamboo/CI? For instance, should I be compiling from a Gradle task, or from a Bamboo task? Same goes for all tasks/stages.
For some reason, I view this as the same problem as to whether or not to use stored procedures or put the data processing all at the application layer. What are the pros/cons of each approach?
TL;DR at the bottom
My experience is with Jenkins, so examples will relate to that.
One thing with any build system (be it CI server or a buildscript), is that it should be stable, simple and self-contained so that an untrained receptionist (with printed instructions and proper credentials) could do it.
Ease of use and re-use
Based on the above, one would think that a buildscript wins. Not always. As with the receptionist example, it's about ease of use and ease of reproducibility.
If a buildscript has interdependent build targets that only work in correct order, dependence on pre-supplied property files that have to be adjusted for the correct branch ahead of build, reliance on environment variables that no-one remembers who created in the first place, and a supply of SCM revision numbers that have to be obtained by looking at the log of the commits for the last month... This is in no way better than a Jenkins job that can be triggered with a single button.
Likewise, a Jenkins workflow could be reliant on multiple dependent jobs, each being manually pre-configured before the build, and need artifacts uploaded from one place to another... which no receptionist will do.
So, at this point, a self-contained good buildscript that only requires ant build command to do everything from beginning to end, is just as good as a Jenkins job that only required build now... button to be pressed.
Self-contained
It is easy to think that since Jenkins will (at some point) end up calling at least a portion of a buildscript (say ant compile), that Jenkins is "compartmentalizing" the buildscript into multiple steps, thus breaking away from being self-contained.
However, instead you should zoom out by one level, and treat the whole Jenkins job configuration as a single XML file (which, by the way, can be stored and versioned through an SCM just like the buildscript)
So, at this point, it doesn't matter if the whole build logic is inside a single buildfile, or a single XML job configuration file. Both can be self-contained when done right.
The devil you know
In majority of cases, it comes down to what you know.
Some people find it easier to use the Jenkins UI to visually arrange their build workflow, reporting, emailing, and archiving (and for anything that doesn't fit as wanted, find a plugin). For them, figuring out a build script language is more time consuming than simply trying it in the UI.
Others prefer to know exactly what every single line of their build script does, and don't like giving control to some piece of foreign code obfuscated by UI.
Both points have merits from all sides of the Quality-Time-Budget triangle.
The presentation
So far, things have been more or less balanced. However:
My Jenkins will email a detailed HTML report with a link to a job page and send it straight up to the (non tech-savvy) CEO. He can look at the list of latest builds, along with SCM changes for each build, linking him to JIRA issues fixed for each build (all hyperlinks to relevant places). He can select the build with the set of changes that he wants, and click "install iOS package" right off his iPad that he just used to view all this information. Meanwhile I can go to the same job page, and review the build logs and artifacts of each log, check the build time trends and compare the parameters that were used between the failing and succeeding jobs (and I didn't have to write any echos to display that, it's just all there, cause Jenkins does that for you)
With a buildscript, even if you piped the output to a file, would you send that to your (non tech-savvy) CEO? Unlikely. But wait, you know this devil very well. A few quick changes and hacks, couple Red Bulls... and months of thankless work (mostly after-hours) later... you've created a buildscript that will create and start a webserver, prepare HTML reports, collect statistics and history, email all the relevant people, and publish everything on a webpage, just like Jenkins did. (Ohh, if people could only see all the magic you did escaping and sanitizing all that HTML content in a buildscript). But wait... this only works for a single project.
So, a full case of Red Bulls later, you've managed to make it general enough to build any project, and you've created...
Another Jenkins/Bamboo/CI-server
Congratulations. Come up with a name, market it, and make some cash off it, because this ultimate buildscript just became another CI solution a la Jenkins.
TL;DR:
Provided the CI-server can be configured simply and intuitively so that a receptionist could run the build, and provided the configuration can be self-contained (through whatever storage method the CI-server uses) and versioned in SCM, it all comes down to the Quality-Time-Budget triangle.
If you have little time and budget to learn the CI server, you can still greatly increase the quality (at least of the presentation) by embracing the CI-server's way of organizing stuff.
If you have unlimited time and budget, by all means, make your own Jenkins with the buildscript.
But considering the "unlimited" part is rather unrealistic, I would embrace the CI-server as much as possible. Yes, it's a change. However, a little time invested in learning the CI-server and how it compartmentalizes (or breaks into tasks) the different parts of the build flow can go a long way to increasing the quality.
Likewise, if you have no time and/or budget, figuring out the quirks of all the plugins/tasks/etc and how it all comes together will only bring your overall quality down, or even drag the time/budget down with it. In such cases, use the CI-server for bare minimum needed to trigger your existing buildscripts. However, in some cases, the "bare minimum" is no better than not using the CI-server in the first place. And when you are at this place... ask yourself:
Why do you want a CI-server in the first place?
Personally (and with today's tools), I'd take a pragmatic approach. I'd do as much as feasible on the build side (clearly better from an automation perspective), and the rest (e.g. distribution of work across machines) on the CI server. Anything that a developer might want to do on his own machine should definitely be automated on the build level. As to the concrete steps you gave, I'd generally check out code from the CI server, and deploy binaries from the build. I'd try to make every CI job look the same, invoking the build tool in the same way (e.g. gradlew ciBuild).
In Bamboo, you can add build stages and decompose the stages into tasks. Each task is something just as atomic/fundamental as an Ant task.
To some extent, this overlap in functionality is natural, as neither build tool nor CI server can assume existence of the other, and both want to provide as complete a solution as possible.
For some reason, I view this as the same problem as to whether or not to use stored procedures or put the data processing all at the application layer.
It's not an unfair comparison, and hence opinions will be as diverse, contextual, and nuanced.
Disclaimer: I'm a Gradle(ware) developer.

TFS and storing binary files

Our project group stored binary files of the project that we are working on in an SVN repository for over a year; in the end our repository grew out of control, and taking backups of the SVN repo became impossible at one point, since each binary that is checked in is around 20 MB.
Now we switched to TFS. We are not responsible for backing the repository up, our IT team takes care of it, and we have more network and storage capacity for backups because of that, but we want to decide what to do with the binaries. As far as I know TFS stores deltas, but for binary files the deltas will be huge, and we might end up reaching our disk space quota one day, so I would like to plan things better from the start; I don't want to get caught in a bad situation when it's too late to fix the problem.
I would prefer not keeping builds in the source control but our project group insists to keep a copy of every binary for reproducing the problems that we see in the production system, I can't get them to get the source code from TFS, build it and create the binary, because it is not straightforward according to them.
Does TFS offer a better build versioning method? If someone can share some insight I'd really be grateful.
As a general rule you should not be storing build output in TFS. Occasionally you may want to store binaries for common libraries used by many applications but tools such as nuget get around that.
Build output has a few phases of its life and each phase should be stored in a separate place. e.g.
Build output: When code is built (by TFS / Jenkins / Hudson etc.) the output is stored in a drop location. This storage should be considered volatile as you'll be producing a lot of builds, many of which will be discarded.
Builds that have been passed to testers: These are builds that have passed some very basic QA e.g. it compiles, static code analysis tools are happy, unit tests pass. Once a build has been deemed good enough to be given to test it should be moved from the drop location to another area. This could be a network share (non production as the build can be reproduced) there may be a number of builds that get promoted during the lifetime of a project and you will want to keep track of what versions the testers are using in each environment.
Builds that have passed test and are in production: Your test team deem the build to be of a high enough quality to ship. As part of your go live process, you should take the build that has been signed off by test and store it in a 3rd location. In ITIL speak this is a Definitive Media Library. This can be a simple file share, but it should be considered to be "production" and have the same backup and resilience criteria as any other production system.
The DML is the place where you store the binaries that are in production (and associated configuration items such as install instructions, symbol files etc.) The tool producing the build should also have labelled the source in TFS so that you can work out what code was used to produce the binary. Your branching strategy will also help with being able to connect the binary to the code.
It's also a good idea to have a "live like" environment; this should be separate from your regular dev and test environments. As the name suggests, it contains only the code that has been released to production. This enables you to quickly reproduce bugs seen in production.
Two methods that may help you:
Use Team Foundation Build System. One of the advantages is that you can set up retention periods for finished builds. For example, you can order TFS to store the 10 latest successful builds, and the two latest failed ones. You can also tell TFS to store certain builds (e.g. "production builds"/final releases) indefinitely. These binaries folders can of course also be backed up externally, if needed.
Use a different collection for your binaries, with another (less frequent) backup schedule. TFS needs to back up whole collections, but by separating data that doesn't change as frequently as the source you can lower the backup cost. This of course depends on how frequently you are required to have the binaries backed up.
You might want to look into creating build definitions in TFS to give your project group an easy 'one button' push to grab the source code from a particular branch and then build it and drop it to a location. That way they get to have their binaries, and you don't have to source control them.
If you are using a branching strategy where you create Release or RTM branches when you push something to production, then you can point your build definitions at those branches and they can manually trigger them from the TFS portal or from within Visual Studio.

Can I specify the OS-level build user on a per-job basis?

Our team is sharing a Jenkins server with other teams, and this currently means that we are sharing the same OS-level build-user account. The different teams' OS-level build-user settings (Maven settings, bash settings, user-level Ant libraries, etc...) have collided a few times--"fixing" the settings for one team's jobs inadvertently "breaks" another team's jobs. The easiest sol'n that occurs to me is giving each team its own OS-level build-user account with which to execute its Jenkins jobs--but I cannot find a way to do this.
I have checked with Google, and also here
https://wiki.jenkins-ci.org/display/JENKINS/Use+Jenkins
and here
https://wiki.jenkins-ci.org/display/JENKINS/Plugins
to no avail.
Is there a way to do this? If not, can you recommend any best practices for segregating sets of builds from one another?
Maven Specific
You have two options that come to mind,
Add additional installations of Maven into your Jenkins global configuration, each using its own home directory, and thus its own settings files. This will allow you to use totally different versions of Maven, selected based on job requirements (you are given the option to select which "version" of Maven you wish to use on the job itself).
Similar to (1), but specify specific settings configurations using Maven command line arguments. It's a little less "obvious" but may be quicker to implement.
Multi-slave
You could possibly make use of multiple slaves on each machine. It increases the overheads of the builds quite significantly, and the implementation is such that you'd have multiple user accounts on a machine, each set up as needed, and then one slave instance for each user.
I'm not sure these solutions will totally answer your problem, I'll have a think and see if anything else pops into mind, but it might give some starting points
Key builds to a specific team directory that contains that team's settings. For example, provide a parameter 'TEAM' to every build, set its default value to the appropriate team name, and use that parameter as a key to a directory that contains the team's settings (so instead of using ${HOME} as in what you want to do, you'll use something like ${TEAM_SETTINGS}/${TEAM}).
You can set per-job users (who has access to/can build a particular job).
Under "Manage Jenkins" > "Configure System" >
Click on Enable Security
Check Project-based Matrix Authorization Strategy
However, I do not think there is a "per-build" option for a single job.
If you have the same project that you are sharing between teams, you could (and probably should) create two jobs for this project, and have different libraries/scripts be used in each.
You could also parametrize the build (On the Job Page, "Configure" > This build is parametrized) and supply the library versions, etc via string parameters.
You could also use a parameter to be the team's name, and in your build script change libraries based on the parameter:
For example, have a parameter called "TEAM", with choices: TEAM_A and TEAM_B, and in your script, have
# TEAM is the job's choice parameter; quote it so an empty value cannot break the test
if [ "$TEAM" = "TEAM_A" ]
then
    ANT_HOME=/opt/ant/libA
else
    ANT_HOME=/opt/ant/libB
fi
export ANT_HOME   # the ant launcher script only sees it if it is exported
Have you considered sourcing your settings? In Linux, you could do this by saving your OS settings in a script file (for example paths, etc), and using source /path/to/settings/file, in Windows it would be call /path/to/settings/batch/file.
Can you give examples of OS level settings that you would require and per-build user for?
Your problem is a common one.
Whenever something nonstandard is installed on a build server, something will break for someone.
The only solutions I know are
Set up a separate build slave for each team or product. Then they can install whatever they want on the build slave and any mess they create is all their own fault.
Any dependencies required by a job need to come with the job. This is my preferred way of working. For example: If a job needs a library or a tool, the library or tool is not installed on the build server but in the source tree and the build uses it from the source tree.
Sometimes the latter way is more work. You need to set up the tools or library so it works when it is installed in the source tree. Some tools have hard-coded paths and they do not work. In that case you can install the source of the tool and compile the tool during the build.
An even better solution is to set up separate Jenkins jobs for all the tools and libraries and the jobs that need a library or tool will download them from the Jenkins jobs.
This way you can control all your dependencies and different jobs do not conflict when e.g. one needs an older version of a library and one a newer version. And if someone upgrades the library, it is immediately visible in the version control who did what.
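The TEAM-parameter answers above and the "source your settings" suggestion combine naturally into one build step. As an added example that is not from any of the answers, here is a shell step that validates the TEAM parameter against a fixed allow-list (so a crafted value such as ../other-team or an absolute path cannot select another team's directory), derives the settings directory from it, refuses that directory if other local users can write to it, sources the team's environment file, and then points Maven and Ant at per-team settings so the shared OS account's ~/.m2 and ~/.ant never come into play. The root directory /opt/build-settings, the team list, the env.sh/settings.xml/ant-lib/m2-repo names and the script name team-build.sh are assumptions.

```shell
#!/bin/sh
# team-build.sh (hypothetical): select per-team build settings from the Jenkins
# TEAM parameter without sharing the OS account's home directory.
# Added example, not from the original answers. Assumes GNU stat -c.

TEAM_SETTINGS_ROOT=${TEAM_SETTINGS_ROOT:-/opt/build-settings}   # assumed layout
KNOWN_TEAMS="TEAM_A TEAM_B"                                      # assumed team names

# 0 only if $1 is exactly one of the known team names. A plain "directory
# exists" check would accept "../other-team" or an absolute path.
validate_team() {
    for t in $KNOWN_TEAMS; do
        [ "$1" = "$t" ] && return 0
    done
    return 1
}

# Print the settings directory for a validated team, or fail.
team_settings_dir() {
    validate_team "$1" || return 1
    printf '%s/%s\n' "$TEAM_SETTINGS_ROOT" "$1"
}

# Check that nobody but the owner/group can edit the team directory, source its
# env.sh (PATH additions, JAVA_HOME, ...) and export per-tool settings that keep
# ~/.m2 and ~/.ant out of the build.
load_team_env() {
    dir=$(team_settings_dir "$1") || return 1
    [ -d "$dir" ] || return 1
    case "$(stat -c '%A' -- "$dir")" in
        ????????w?) return 1 ;;          # world-writable: refuse to source from it
    esac
    [ -f "$dir/env.sh" ] && . "$dir/env.sh"
    MAVEN_SETTINGS="$dir/settings.xml"
    ANT_ARGS="-lib $dir/ant-lib"         # the ant launcher appends $ANT_ARGS
    ANT_OPTS="-Duser.home=$dir"          # Ant resolves ~/.ant/lib under the team dir
    export MAVEN_SETTINGS ANT_ARGS ANT_OPTS
}

# Maven command line for the team: settings file and local repository per team,
# so one team's repository settings cannot leak into another team's build.
mvn_cmd() {
    dir=$(team_settings_dir "$1") || return 1
    printf 'mvn -s %s/settings.xml -Dmaven.repo.local=%s/m2-repo %s\n' \
        "$dir" "$dir" "${2:-install}"
}

# Jenkins "Execute shell" entry point: TEAM comes from the job parameter.
if [ -n "${TEAM-}" ]; then
    load_team_env "$TEAM" || { echo "unknown or unsafe TEAM: $TEAM" >&2; exit 2; }
    eval "$(mvn_cmd "$TEAM" "${MVN_GOALS:-install}")"
fi
```

In a Jenkins job this runs with TEAM set from a Choice parameter limited to the same names as KNOWN_TEAMS; keeping the allow-list in the script as well means a job whose parameter definition is later loosened still cannot be steered outside /opt/build-settings.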
