Running a Jenkins build on multiple CPU architectures

I'm looking for a way to use Jenkins to build a single code base for multiple CPU architectures. At the moment these are amd64 and armhf, although the list may expand in the future. The ideal situation would be to run the build over a number of different Jenkins slaves with different CPU architectures.
These build jobs are not compiler based (Maven, Gradle, etc.) but system-independent shell scripts (bash and Python) which auto-detect their CPU architecture and produce build artifacts to match the CPU.
I may be missing something really obvious, but I don't see a way to automatically run a build a number of times over different architectures or bind a specific build to a specific architecture.
Could anyone point me in the right direction?

Funny you should ask this question now. Published last Friday (2019-11-22) ...
You should review the Jenkins blog post: Welcome to the Matrix
I often find myself needing to run the same actions on a bunch of
different configurations. Up to now, that meant I had to make multiple
copies of the same stages in my pipelines. When I needed to make
changes, I had to make the same changes in multiple places throughout
my pipeline. Maintaining even a small number of configurations was
difficult for larger pipelines.
The post walks through a single-configuration pipeline, a pipeline for multiple platforms and browsers, and excluding invalid combinations.
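As a rough sketch of how that maps onto the original question, assuming the agents are labelled after their CPU architecture (the amd64/armhf labels, build.sh script and dist/ path here are made up), a declarative matrix fans the same steps out across one agent per architecture:

// Sketch only: one matrix cell per CPU architecture, each pinned to an
// agent carrying that label. Labels, script name and artifact path are assumptions.
pipeline {
    agent none
    stages {
        stage('Build all architectures') {
            matrix {
                agent {
                    // run each cell on an agent labelled with that architecture
                    label "${ARCH}"
                }
                axes {
                    axis {
                        name 'ARCH'
                        values 'amd64', 'armhf'
                    }
                }
                stages {
                    stage('Build') {
                        steps {
                            // the script auto-detects the CPU, so the call is identical everywhere
                            sh './build.sh'
                        }
                    }
                    stage('Archive') {
                        steps {
                            archiveArtifacts artifacts: 'dist/**', fingerprint: true
                        }
                    }
                }
            }
        }
    }
}

Adding a new architecture later is then just another value on the ARCH axis, provided an agent with that label exists.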

Related

Distributed Builds by Bazel

How to build a large project by Bazel on two servers to speed up the building process?
When you have a large codebase, chains of dependencies can become very deep. Even simple binaries can often depend on tens of thousands of build targets.
https://bazel.build/basics/dependencies
The documentation for remote builds starts here:
https://bazel.build/remote/rbe
In particular you might look into the self-service and commercial remote execution services here:
https://bazel.build/community/remote-execution-services
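If you end up driving Bazel from a CI job (a Jenkins-style pipeline is used below only because that is the theme of this page), remote execution is switched on entirely through flags, so the job itself stays small. The grpcs:// endpoint, the 'bazel' agent label and the --jobs value are placeholders for whatever your chosen service requires:

// Sketch only: hand the build actions to a remote execution cluster.
pipeline {
    agent { label 'bazel' }   // assumed label for a machine with Bazel installed
    stages {
        stage('Remote build') {
            steps {
                // --remote_executor sends build actions to the remote cluster;
                // --jobs raises how many actions Bazel runs concurrently.
                sh 'bazel build //... --remote_executor=grpcs://remote.example.com:443 --jobs=64'
            }
        }
    }
}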

Understanding Jenkins: do Jenkins agent nodes require the same environment as the controller node? How exactly does an agent work?

(master / slave terminology is now controller / agent)
I am learning about Jenkins build automation and I have trouble understanding how the controller/agent structure works.
In my setup, there is a controller Jenkins node on the same computer as the source to be built. The source consists of gigabytes of data (including the source code, the compiler and some resource data). To be precise, the "compiler" is the Unity3D editor (which itself consists of several gigabytes of data) and the "source code" is the scripts and assets of a Unity project.
The Jenkins job is supposed to run the "compiler" on the "source code" and generate some outputs. In a controller/agent configuration, the process is supposed to be parallelized over multiple agent nodes (Jenkins agents).
Now here comes the part I struggle to understand: if I were to setup multiple agent nodes (Jenkins agents) on multiple machines, do I need all those agent machines to have the exact same "compiler" and "source code" on them for Jenkins to work correctly?
On one hand, based on my reading it seems there is no need to do so, and the agent machine only needs a "Jenkins agent" to be set up for everything to work properly. The agent just "magically" knows about the "compiler" and the "source code" on the remote controller machine, without the need to either install the "compiler" or synchronize the "source code".
However, on the other hand, I just can't imagine how this could be done with gigabytes of data involved, to the extent that I am confused about whether my understanding of the reading is correct. Even if the "source code" can be transferred in pieces, the "compiler" must be fully executable and cannot be transferred piecemeal, right?
Any explanation would be appreciated.
I have also been learning Jenkins for game development and, as far as I understand, the first use of agents is in separating scheduling from building. The idea is that you have one controller, which can run anywhere, and one or more agents that run on different hardware. This way, when an agent is working, the controller doesn't lose resources for doing its own job. You might have just one agent, or a Windows machine and a Mac that each build for their respective platforms, or any number of agents. The point is not bogging down the controller.
AFAIK, Jenkins pipelines added parallelism, but it is optional and there isn't any magic. For Unreal there is a product, Incredibuild, that does distributed compilation, but Jenkins wouldn't handle that itself. A cursory search suggests that Unity does not have built-in support for distributed compilation.
Note: I haven't done any of this, but I think parallel agents could be useful if you had some preprocessing step like down-sampling all your textures. You could have parallel actions process different folders across multiple agents, and when they are all done, it goes back to a non-parallel build. Again, this is not a great example, but the point is that I think it's up to you to figure out whether there are pieces of your process that could be made parallel.
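To the original question: each agent checks out the source into its own workspace, and whatever the build needs (the Unity editor, SDKs, licenses) has to be installed or provisioned on that agent; Jenkins does not ship the "compiler" over from the controller. A minimal sketch of the parallel idea above, with made-up labels and script names:

// Sketch only: two agents build in parallel, each in its own workspace.
// The 'win-unity' / 'mac-unity' labels and the build scripts are assumptions.
pipeline {
    agent none
    stages {
        stage('Build per platform') {
            parallel {
                stage('Windows') {
                    agent { label 'win-unity' }   // Unity editor installed locally
                    steps {
                        checkout scm              // the agent pulls the source itself
                        bat 'build_windows.bat'
                    }
                }
                stage('macOS') {
                    agent { label 'mac-unity' }
                    steps {
                        checkout scm
                        sh './build_mac.sh'
                    }
                }
            }
        }
    }
}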

Distribute large build across many build agents

I have a large solution, with many projects and many files, and only one build configuration, Release. I am using TFS, and the complete rebuild takes like 2 hours.
Is it possible to distribute the build across several agents, so that they will compile different projects or, even better, different files? Something like distcc? I can distribute the build across 10+ different machines, but the build currently runs on only one.
For now, my impression is that agents can only take on specialized whole jobs (build, run tests, etc.), but cannot split and distribute a single build.
I have already tried optimizing the build, but the project is still big and could benefit from a parallel build.
You can, but you must roll up your sleeves: there is no built-in template that helps, but Jim explains how to make one.
Do not forget that you can also leverage multiple CPUs/cores, as explained in Building Multiple Projects in Parallel with MSBuild.
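For the multi-core part, the key switch is MSBuild's /m (alias /maxcpucount), which lets MSBuild build several projects at once on a single machine. A minimal sketch, wrapped in a Jenkins-style command step purely for illustration; the solution name, node count and agent label are placeholders, and msbuild is assumed to be on the PATH of the build machine:

// Illustration only: the msbuild switches are the part that matters here.
pipeline {
    agent { label 'windows' }
    stages {
        stage('Parallel MSBuild') {
            steps {
                // /m:4 lets MSBuild use up to four concurrent build nodes
                bat 'msbuild BigSolution.sln /m:4 /p:Configuration=Release'
            }
        }
    }
}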
Your best option would be to break your solution down into distinct components that can be built separately.
If you separate each piece, then build and test it before publishing it as a NuGet package, you can easily distribute the work across build servers and even build only the bits that have changed.
This process will also work in the new build system coming in TFS 2015 that does not use XAML.

Power tradeoff between buildscript and CI server

Although this question specifically involves Gradle and Bamboo, it really is a question about any build system (Ant/Maven/Gradle/etc.) and any CI tool (Bamboo/Jenkins/Hudson/etc.).
I was always under the impression that the purpose of a CI build is to:
Check out code from VCS
Run a buildscript (Gradle, etc.)
Deploy a binary (WAR, etc.) to an environment
Hence, all the guts and heavy lifting (running automated tests, code analysis, test coverage, compiling, Javadocs, packaging, etc.) were to be done from inside the buildscript.
But Bamboo seems to allow you to break this heavy-lifting out of the buildscript and into Bamboo itself. In Bamboo, you can add build stages and decompose the stages into tasks. Each task is something just as atomic/fundamental as an Ant task.
So it got me thinking: how much should one empower the CI tool? What typical buildscript functionality should be transferred over to Bamboo/CI? For instance, should I be compiling from a Gradle task, or from a Bamboo task? The same goes for all tasks/stages.
For some reason, I view this as the same problem as to whether or not to use stored procedures or put the data processing all at the application layer. What are the pros/cons of each approach?
TL;DR at the bottom
My experience is with Jenkins, so examples will relate to that.
One thing about any build system (be it a CI server or a buildscript) is that it should be stable, simple and self-contained, so that an untrained receptionist (with printed instructions and proper credentials) could run it.
Ease of use and re-use
Based on the above, one would think that a buildscript wins. Not always. As with the receptionist example, it's about ease of use and ease of reproducibility.
If a buildscript has interdependent build targets that only work in the correct order, depends on pre-supplied property files that have to be adjusted for the correct branch ahead of the build, relies on environment variables that no one remembers who created in the first place, and needs SCM revision numbers that have to be obtained by looking at the commit log for the last month... this is in no way better than a Jenkins job that can be triggered with a single button.
Likewise, a Jenkins workflow could rely on multiple dependent jobs, each manually pre-configured before the build, and need artifacts uploaded from one place to another... which no receptionist will do.
So, at this point, a self-contained, good buildscript that only requires a single ant build command to do everything from beginning to end is just as good as a Jenkins job that only requires the Build Now button to be pressed.
Self-contained
It is easy to think that, since Jenkins will (at some point) end up calling at least a portion of a buildscript (say, ant compile), Jenkins is "compartmentalizing" the buildscript into multiple steps and thus breaking away from being self-contained.
However, you should instead zoom out by one level and treat the whole Jenkins job configuration as a single XML file (which, by the way, can be stored and versioned through an SCM just like the buildscript).
So, at this point, it doesn't matter if the whole build logic is inside a single buildfile, or a single XML job configuration file. Both can be self-contained when done right.
The devil you know
In the majority of cases, it comes down to what you know.
Some people find it easier to use the Jenkins UI to visually arrange their build workflow, reporting, emailing, and archiving (and, for anything that doesn't fit as wanted, find a plugin). For them, figuring out a build script language is more time-consuming than simply trying it in the UI.
Others prefer to know exactly what every single line of their build script does, and don't like giving control to some piece of foreign code obfuscated by UI.
Both points have merit from all sides of the Quality-Time-Budget triangle.
The presentation
So far, things have been more or less balanced. However:
My Jenkins will email a detailed HTML report with a link to the job page and send it straight to the (non tech-savvy) CEO. He can look at the list of the latest builds, along with the SCM changes for each build, linking him to the JIRA issues fixed in each build (all hyperlinks to the relevant places). He can select the build with the set of changes that he wants, and click "install iOS package" right off the iPad he just used to view all this information. Meanwhile I can go to the same job page, review the build logs and artifacts of each build, check the build-time trends and compare the parameters used between the failing and succeeding jobs (and I didn't have to write any echos to display that; it's all just there, because Jenkins does that for you).
With a buildscript, even if you piped the output to a file, would you send that to your (non tech-savvy) CEO? Unlikely. But wait, you know this devil very well. A few quick changes and hacks, a couple of Red Bulls... and months of thankless work (mostly after hours) later... you've created a buildscript that will create and start a webserver, prepare HTML reports, collect statistics and history, email all the relevant people, and publish everything on a webpage, just like Jenkins did. (Oh, if people could only see all the magic you did escaping and sanitizing all that HTML content in a buildscript.) But wait... this only works for a single project.
So, a full case of Red Bulls later, you've managed to make it general enough to build any project, and you've created...
Another Jenkins/Bamboo/CI-server
Congratulations. Come up with a name, market it, and make some cash off it, because this ultimate buildscript has just become another CI solution a la Jenkins.
TL;DR:
Provided the CI-server can be configured simply and intuitively so that a receptionist could run the build, and provided the configuration can be self-contained (through whatever storage method the CI-server uses) and versioned in SCM, it all comes down to the Quality-Time-Budget triangle.
If you have little time and budget to learn the CI server, you can still greatly increase the quality (at least of the presentation) by embracing the CI-server's way of organizing stuff.
If you have unlimited time and budget, by all means, make your own Jenkins with the buildscript.
But considering the "unlimited" part is rather unrealistic, I would embrace the CI-server as much as possible. Yes, it's a change. However, a little time invested in learning the CI-server, and how it compartmentalizes the different parts of the build flow or breaks them into tasks, can go a long way toward increasing the quality.
Likewise, if you have no time and/or budget, figuring out the quirks of all the plugins/tasks/etc. and how it all comes together will only bring your overall quality down, or even drag the time/budget down with it. In such cases, use the CI-server for the bare minimum needed to trigger your existing buildscripts. However, in some cases, the "bare minimum" is no better than not using the CI-server in the first place. And when you are at this place... ask yourself:
Why do you want a CI-server in the first place?
Personally (and with today's tools), I'd take a pragmatic approach. I'd do as much as feasible on the build side (clearly better from an automation perspective), and the rest (e.g. distribution of work across machines) on the CI server. Anything that a developer might want to do on his own machine should definitely be automated on the build level. As to the concrete steps you gave, I'd generally check out code from the CI server, and deploy binaries from the build. I'd try to make every CI job look the same, invoking the build tool in the same way (e.g. gradlew ciBuild).
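A sketch of that shape, assuming a Gradle project whose ciBuild task (the task name comes from the example above) does all the heavy lifting; the agent label and artifact path are placeholders:

// Thin CI layer: check out, call one build entry point, publish the result.
// All compilation, testing and packaging lives inside the Gradle build itself.
pipeline {
    agent { label 'linux' }
    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }
        stage('Build') {
            steps {
                // the same command a developer would run locally
                sh './gradlew ciBuild'
            }
        }
        stage('Publish') {
            steps {
                // CI-side concern: keep the produced WAR; a deploy step would hang off here
                archiveArtifacts artifacts: 'build/libs/*.war'
            }
        }
    }
}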
In Bamboo, you can add build stages and decompose the stages into tasks. Each task is something just as atomic/fundamental as an Ant task.
To some extent, this overlap in functionality is natural, as neither build tool nor CI server can assume existence of the other, and both want to provide as complete a solution as possible.
For some reason, I view this as the same problem as to whether or not to use stored procedures or put the data processing all at the application layer.
It's not an unfair comparison, and hence opinions will be as diverse, contextual, and nuanced.
Disclaimer: I'm a Gradle(ware) developer.

Jenkins and multi-configuration (matrix) jobs

Why are there two kinds of jobs for Jenkins, the multi-configuration project and the free-style project? I read somewhere that once you choose one of them, you can't (easily) convert to the other. Why wouldn't I always pick the multi-configuration project in order to be safe for future changes?
I would like to set up a build for a project building both on Windows and Unix (and other platforms as well). I found this question, which asks the same thing, but I don't really get the answer. Why would I need three matrix projects (and not three free-style projects), one for each platform? Why can't I keep them all in one matrix, with platforms AND (for example) gcc version on one axis and (my) software versions on the other?
I also read this blog post, but that builds everything on the same machine, with just different Python versions.
So, in short: how do most people configure a multi-configuration project targeting many different platforms?
The two types of jobs have separate functions:
Free-style jobs: these allow you to build your project on a single computer or label (group of computers, e.g. "Windows-XP-32").
Multi-configuration jobs: these allow you to build your project on multiple computers or labels, or a mix of the two, e.g. Windows-XP, Windows-Vista, Windows-7 and RedHat - useful for checking compatibility or building for multiple platforms (Qt programs?).
If you have a project which you want to build on Windows & Unix, you have two options:
Create a separate free-style job for each configuration, in which case you have to maintain each one individually
Create one multi-configuration job and select two (or more) labels/computers/slaves, one for Windows and one for Unix. In this case, you only have to maintain one job for the build.
You can keep your gcc versions on one axis, and your software versions on another. There is no reason you should not be able to.
The question that you link to has a fair point, but one that does not relate to your question directly: in his case, he had a multi-configuration job A which, on success, triggered another job B. Now, in a multi-configuration job, if one of the configurations fails, the entire job fails (obviously, since you want your project to build successfully on all your configurations).
IMHO, for building the same project on multiple platforms, the better way to go is to use a multi-configuration style job.
Another option is to use a Python build step to check the current OS and then call the appropriate setup or build script. In the Python script, you can save the updated environment to a file and inject it again using the EnvInject plugin for subsequent build steps. Depending on the size of your build environment, you could also use a multi-platform build tool like SCons.
You could create a script (e.g. build) and a batch file (e.g. build.bat) that get checked in with your source code. In Jenkins, in your build step, you can then call $WORKSPACE/build; Windows will execute build.bat, whereas Linux will run build.
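In a pipeline job, a hedged equivalent of this trick is to branch on isUnix() instead of relying on the shell's lookup rules; the script names below are the ones from this answer:

// Sketch: run the checked-in build script that matches the node's OS.
node {
    checkout scm
    if (isUnix()) {
        sh "${env.WORKSPACE}/build"
    } else {
        bat "${env.WORKSPACE}\\build.bat"
    }
}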
Another option is to use a user-defined axis combined with a slave axis (windows, linux, ...); you then add a filter for each combination and use the Conditional BuildStep plugin to set a platform-specific build step for each one (Execute shell, Execute Windows batch command, ...).
This link has a tutorial; it is in Portuguese, but it is easy to work out from the images:
http://manhadalasanha.wordpress.com/2013/06/20/projeto-de-multiplas-configuracoes-matrix-no-jenkins/
You could use the variable that Jenkins creates when you define a configuration matrix axis. For example:
You create a slave axis with name OSTYPE and check the two slaves (Windows and Linux). Then you create two separate build steps and check for the OSTYPE environment variable.
You could instead use a higher-level scripting language like Python, which is multi-platform and can achieve the same functionality independently of the slaves' names, and in just one build step.
If you go the matrix route with Windows and something else, you'll want the XShell plugin. You just create your two build scripts such as "build.bat" for cmd and "build" for bash, and tell XShell to run "build". The right one will be run in each case.
A hack to have batch files run only on Windows and shell scripts only on Unix:
On Unix, make batch build steps succeed as no-ops (exit status 0):
ln -s /bin/true /bin/cmd
On Windows, either find a true.exe, name it sh.exe and place it somewhere in the PATH (so shell steps become no-ops), or, if you have an sh.exe installed on Windows (from Cygwin, Git, or another source), add this to the top of the shell script in Jenkins:
[ -n "$WINDIR" ] && exit 0
Why wouldn't you always pick the multi-configuration job type?
Some reasons come to mind:
Because jobs should be easy to create and configure. If it is hard to configure any job in your environment, you are probably doing something wrong outside the scope of the Jenkins job. If you are happy that you managed to create that one job and it finally runs, and you are reluctant to do all of this work again, that's where you should try to improve.
Because multi-configuration jobs are more complex. They usually require you to think about both the main job and the different sub-job variables, and they tend to grow in complexity to a level beyond being manageable. So in a single-configuration scenario, you would probably waste thought on complexity you are not using, and when extending the build variables, things might grow in the wrong direction. I'd suggest using simple jobs as the default, and multi-configuration jobs only if there is a need for multiple configurations.
Because executing multi-configuration jobs might need more job slots on the slaves than single jobs. There will always be a master job that is executed on a special, invisible slot (that's no problem by itself) and triggers the sub-jobs, but if these sub-jobs themselves trigger further sub-jobs, you might easily end up in a deadlock if there are more sub-jobs than slots and some sub-jobs trigger sub-jobs that then cannot execute because there are no more open slots. This problem can be circumvented with some configuration setup on the slaves, but it is there and tends to show up only when several multi-configuration jobs run concurrently.
So in essence: The multi configuration job is a more complex thing, and because complexity should be avoided unless necessary, the regular freestyle job is a better default.
If you want to select which slave the job runs on, you need to use a multi-configuration project; otherwise you won't be able to select/limit the slaves it runs on. There are three ways to do it, and I've tried them all: the Tie plugin works only for the master job, and the Restrict option in Advanced Project Options is not a rock-solid trigger either, so you want to use the Slave axis, which is proven to work correctly today.
