Cleanup Jenkins home directory

We started using Jenkins a few months ago, and the home directory has now grown to about 50 GB. I noticed that the jobs and workspace directories are about 20 GB each. How can I clean them up? What kind of strategy should I use?

Consider the various Jenkins areas that tend to grow excessively. The key areas are system logs, job logs, artifact storage, and job workspaces. Below are options to best manage each of these.
System logs
System logs may be found in <JENKINS_HOME>/logs or /var/log/jenkins/jenkins.log, depending on your installation. By default, Jenkins does not always include log rotation (logrotate), especially if running straight from the war. The solution is to add logrotate; this Cloudbees post and my S/O response add details.
You can also set the Jenkins system property hudson.triggers.SafeTimerTask.logsTargetDir to relocate the logs outside <JENKINS_HOME>. The reason why is answered later.
Job Logs
Each job has an option to [ X ] Discard old builds. As of LTS 2.222.1, Jenkins introduced a Global Build discarder (pull #4368) with similar options and default actions. This is a global setting; prior to that, job logs (and artifacts) were retained forever by default (not good).
Advanced options can manage artifact retention (from the post-build action "Archive the artifacts") separately.
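For pipeline jobs, the same retention settings can be applied per job in the Jenkinsfile. A minimal declarative sketch (the retention numbers are arbitrary examples, not recommendations):

    pipeline {
        agent any
        options {
            // Keep at most 10 builds for 30 days; prune archived artifacts sooner.
            buildDiscarder(logRotator(
                daysToKeepStr: '30',
                numToKeepStr: '10',
                artifactDaysToKeepStr: '7',
                artifactNumToKeepStr: '3'))
        }
        stages {
            stage('Build') {
                steps {
                    echo 'build steps here'
                }
            }
        }
    }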
What's in Jobs directory?
The Jobs directory contains a directory for every job (and folders, if you use them). Inside each job directory is the job's config.xml (a few KB in size), plus a builds directory. builds has a numbered directory holding the build logs for each retained build, a copy of the config.xml as it was at runtime, and possibly some additional files of record (changelog.xml, injectedEnvVars.txt). If you chose the Archive the artifacts option, there is also an archive directory, which contains the artifacts from that build.
The Jenkins system property jenkins.model.Jenkins.buildsDir lets you relocate the builds directories outside <JENKINS_HOME>.
Why relocate logs outside <JENKINS_HOME>?
I would strongly recommend relocating both the system logs and the job/build logs (and artifacts). By moving the system logs and build logs (and artifacts, if archiving is enabled) outside of <JENKINS_HOME>, what's left is the really important stuff for backing up and restoring Jenkins and jobs in the event of disaster or migration. Carefully read and understand the steps "to support migration of existing build records" to avoid build-related errors. It also makes it much easier to analyze which job logs are consuming all the space and why (i.e., logs vs. artifacts).
Workspaces
Workspaces are where the source code is checked out and the job (build) is executed. Workspaces should be ephemeral. Best practice is to start with an empty workspace and clean up when you are done, using the Workspace Cleanup plugin's cleanWs() step, unless there is a reason not to.
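A minimal declarative sketch of that practice, wiping the workspace whatever the build result (the build command is a placeholder):

    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    sh 'make'   // placeholder build step
                }
            }
        }
        post {
            always {
                cleanWs()   // leave nothing behind, pass or fail
            }
        }
    }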
The OP's mention of workspaces on the Jenkins controller suggests jobs are being run on the master. Suffice to say, that's not a good (or secure) practice, although the lightweight checkout of the Jenkinsfile always executes on the master. Misconfigured pipelines will also fall back to the master (I will try to find the reference). You can set up a node physically running on the same server as the master for better security.
You can use cleanWs() EXCLUDE and INCLUDE patterns to selectively clean the workspace if deleting everything is not viable.
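A sketch of selective cleanup; the pattern values here are assumptions for illustration:

    // Delete matching temp files but spare the dependency cache.
    cleanWs(
        deleteDirs: true,
        patterns: [
            [pattern: '**/*.tmp',        type: 'INCLUDE'],   // delete these
            [pattern: 'node_modules/**', type: 'EXCLUDE']    // keep these
        ])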
There are two Jenkins system properties to control the location of the workspace directories. For the master: jenkins.model.Jenkins.workspacesDir, and for the nodes/agents: hudson.model.Slave.workspaceRoot. Again, as these are ephemeral, get them out of <JENKINS_HOME> so you can better manage and monitor them.
Finally, one more space consideration...
Both maven and npm cache artifacts in a local repository. Typically that is located in the user's $HOME directory. If you increment versions often, that content will become stale and bloated. It's a cache, so take the time hit every once in a while and purge it, or otherwise manage the content.
However, it's possible to relocate the cache elsewhere through maven and npm settings. Also, if running a maven step, every step has an Advanced option to use a private repository, which is located within the job's workspace. The benefit is that you know exactly what your build is using; no contamination. The downside, though, is massive duplication and wasted space if all jobs have private repos that you never clean out or delete the workspaces of (or longer build times on every run if you do clean). Consider using cleanWs() or a separate job to purge them as needed.
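As an illustration, a maven step can be pointed at a workspace-local repository from a pipeline sh step (the paths are assumptions), so the cache is removed together with the workspace:

    stage('Build') {
        steps {
            // Workspace-local Maven repo: isolated per job, cleaned with cleanWs()
            sh 'mvn -Dmaven.repo.local="$WORKSPACE/.m2/repository" clean verify'
        }
    }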

Workspaces can be cleaned before and/or after any execution. I recommend doing it both before and after an execution. After the build, do it only on successful builds; in case of errors, you can enter the workspace and look there for clues. In a pipeline you do it with the cleanWs() step.
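A minimal declarative sketch of that pattern (the build script is a placeholder):

    pipeline {
        agent any
        options { skipDefaultCheckout() }   // we check out manually after cleaning
        stages {
            stage('Build') {
                steps {
                    cleanWs()          // clean BEFORE the execution
                    checkout scm
                    sh './build.sh'    // placeholder build script
                }
            }
        }
        post {
            success {
                cleanWs()   // clean AFTER, but only on success;
                            // failed builds keep their workspace for inspection
            }
        }
    }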
For the jobs directory you can configure, per job, how long and how many executions to keep. This is more complicated because it depends on what you want to save. For example, if there are a lot of builds and you don't mind deleting that information, you could keep 10 builds for 30 days. That configuration is in the job configuration under job properties; look for "Discard old builds" and its "Days to keep builds" and "Max # of builds to keep" fields.
My suggestion is to use larger numbers at first; you can then observe how it behaves and tighten them.


In Jenkins, how do I set SCM behavior for the master node rather than the build nodes?

I'm aware I'm lacking basic Jenkins concepts, but with my current knowledge it's hard for me to research this successfully. Maybe you can give me some hints I can use to re-word my question if needed.
Currently I'm facing a situation in which, in a setup with several build nodes, the Jenkins master machine is running out of disk space because Jenkins clones git repositories on both the master and the build nodes (and the master only has limited space). This question explains why.
Note: the master node itself does not build anything - it just clones the repo to a local workspace folder (I guess it just needs the Jenkinsfiles).
Going through the job configurations and googling this issue, I find options regarding shallow and sparse clones, or cleaning up the workspace before or after the build using the Cleanup Plugin. But those settings and plugins only affect the checkout done with checkout(scm) on the build nodes, not the master.
But in case I want to leave the situation as is on the build nodes but keep the workspace folders on the Jenkins master machine slim, how do I approach this? What do I have to search for?
And as a side question - isn't it possible to have something like "git exports"? I.e. having the .git folders removed after checking out the commit I need?
In case it depends on the kind of job I use, I'm using scripted pipeline jobs.
I've got a similar setup: A master node, multiple build nodes.
I simply set the number of executors to 0 on the master node (from Manage Jenkins -> Manage Nodes), so every job will land on the build nodes.
The only repo cloned on the master is the shared library.
Running Jenkins builds in the master node is discouraged for two main reasons:
First of all, the usability of the Jenkins platform might be affected by many ongoing builds, for example showing delays in certain operations.
It is a well-known security problem, as pointed out by the documentation:
Any builds running on the built-in node have the same level of access to the controller file system as the Jenkins process.
It is therefore highly advisable to not run any builds on the built-in node, instead using agents (statically configured or provided by clouds) to run builds.
That same wiki page gives details on these security problems, such as what an attacker can do, and an alternative that lets you use the master node to build while patching some of the listed security problems. The solution is based on a plugin called the Job Restrictions Plugin.
That said, the most popular choice is to let the agent nodes do the builds:
To prevent builds from running on the built-in node directly, navigate to Manage Jenkins » Manage Nodes and Clouds. Select master in the list, then select Configure in the menu. Set the number of executors to 0 and save. Make sure to also set up clouds or build agents to run builds on, otherwise builds won’t be able to start.
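For reference, the same setting can also be applied from Manage Jenkins -> Script Console; a sketch, assuming admin access:

    import jenkins.model.Jenkins

    // Equivalent to setting "# of executors" to 0 on the built-in node.
    Jenkins.get().setNumExecutors(0)
    Jenkins.get().save()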
If you really have strong reasons to build on the master node, you can always apply a different git clone strategy based on the value of the env.NODE_NAME environment variable. It is set to master if the pipeline job is run on the master node; otherwise it is filled with the node name, of course. Nonetheless, I have never seen anyone customize the git clone command based on the node used, so... don't do it 😉
About the sparse checkout and the sparse/shallow clone:
The former creates an incomplete working directory, checking out not all the trees and blobs present in the current commit but only those you specify. Do you save that much space? Or rather, is your project tree so heavy that you would need to do something like this? The sparse checkout is generally used when you want a clean working tree, without unnecessary files.
A sparse/shallow clone can be useful sometimes to reduce the download time, especially when you have a huge history. The most common option is --depth=1, which instructs git to retrieve only the most recent commit. As far as I know, Jenkins already applies some optimizations to speed up the clone process, but it generally keeps the entire history. Again, I am not sure you would gain much more space.
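If you do want to try it, a shallow clone can be requested explicitly in a pipeline checkout; a sketch with placeholder URL and branch:

    checkout([
        $class: 'GitSCM',
        branches: [[name: '*/main']],                                // placeholder branch
        userRemoteConfigs: [[url: 'https://example.com/repo.git']],  // placeholder URL
        extensions: [[$class: 'CloneOption', shallow: true, depth: 1, noTags: true]]
    ])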
A valid (at least for me) alternative to space optimizations on git files is to build in Docker containers. Jenkins has reached a good level of integration with Docker, and there are a lot of advantages to using it, among them the disposal of the workspace after the job finishes.
I haven't used the pipeline feature myself so far, but conceptually it is clear that the master requires initial access to the Jenkinsfile. It will therefore be difficult to avoid this step entirely.
If Jenkins itself does not provide an option to fine-tune the clone/checkout behavior on the master side, then I'd see these options:
Create a custom version of Jenkins (or of the corresponding plugin) which hard-codes the behavior that you need (like, shallow/sparse clone). Modifying and building both Jenkins and its plugins is surprisingly simple; often, the most difficult part is to locate the code that you need to touch.
Tune the master's clone in place. Shallowness and sparse-checkout properties can be set on existing clones. If you set these properties after the initial clone (possibly in the Jenkinsfile itself, or in a post-build step), then Jenkins may well maintain those properties.
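A rough sketch of that idea, run from the Jenkinsfile against the existing clone (requires git 2.25+ for sparse-checkout; the path is a placeholder, and note the caveat below about interfering with Jenkins' own checkout):

    node('master') {
        sh '''
            git sparse-checkout init --cone
            git sparse-checkout set src     # keep only the paths you need
            git fetch --depth=1 origin      # shallow-ize the existing clone
        '''
    }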
Constantly re-cloning and deleting the repo on master side increases the load both on the Jenkins master and on your Git server, so better be careful with that (especially since your repository has a size where disk space matters already). If you really want to go that way, you could try to force-remove the clone on the master in a post-build step -- this should be relatively easy to implement. You need to check that this hack will not interfere with Jenkins' access to the Jenkinsfile.

Jenkins - Copy Artifacts from upstream job built in different node

There is a job controlled by the development team which builds on a different node. I am on the testing team and want to take the artifacts and deploy them onto a test device.
I can see those artifacts from dev are stored in some path on dev's node. Does that mean they must first be archived on the Jenkins master before I can copy them into my job?
I am using the Copy Artifact plugin and constantly getting the error
Failed to copy artifacts from <dev-job> with filter: <path-in-dev-node>
(Somewhat of a newbie question, since I just moved from TeamCity.)
You probably want to use: Copy Artifact plugin.
Adds a build step to copy artifacts from another project.
Consider also the Jenkins post-build step "Archive the artifacts".
If you copy from the other job's workspace, what happens if another build is in progress or the workspace is wiped? The archive step copies the artifacts from the node to the master and stores a copy along with the build logs, etc. That makes them available via the UI for as long as the build logs remain. It can take up space, though.
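In pipeline terms, the round trip looks roughly like this (job name and filter are placeholders):

    // In the upstream (dev) job: preserve the outputs with the build record.
    archiveArtifacts artifacts: 'build/output/**', fingerprint: true

    // In the downstream (test) job: pull them from the archived build.
    copyArtifacts(
        projectName: 'dev-job',
        filter: 'build/output/**',
        selector: lastSuccessful(),
        fingerprintArtifacts: true)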
If you do use archived artifacts, consider using the system property jenkins.model.Jenkins.buildsDir to store all the build logs (and artifacts) outside of the job's config directory. Some downtime and work are required to separate the two (config / logs).
You may also want to consider using a proper repository manager (Nexus / Artifactory).
Finally, you may want to learn about using a Jenkins pipeline rather than relying on chained jobs, triggers, users and so forth. Why? Because it's much more controlled and easier to maintain.
ps: I'm not a huge fan of artifactDeployer, but it may work for you.
pps: you may want to review this in-depth answer: Jenkins downstream job fails to find upstream artifacts

Jenkins Multibranch Pipeline - Issues with deleting jobs

Use case: Using Jenkinsfile to auto create builds for branches
Summary:
For a variety of reasons, sometimes the Jenkins master fails to connect to the SCM server. When this occurs, Jenkins deletes the job directory on the master because it no longer sees the branches. However, the slaves are not cleaned up, so they still have the old workspace paths (which in my setup are uniquely named based on the build #). When the Jenkins master reconnects to the SCM server, it recreates a new job folder on the master, and the build counter is reset to #1.
This creates the following issues:
When a build starts, it executes on a slave. Since the master has a new counter, the job is #1. But that path may already exist on the slave from a previous build, so the artifact is built with content that was checked out for the original, old build (i.e. maven uses the /target directory inside the workspace, which already existed from the previous build). The end result is an artifact that potentially contains the wrong code.
This can create build storms. After the connection issues are resolved, Jenkins will see all the repositories and branches with Jenkinsfiles and start to build them. So in a setup of let's say 20 repositories with 10 branches each, this will create 200 new builds. This increases with additional repositories and branches. This is obviously not desired.
Solutions:
One quick solution I can think of is to update the Jenkinsfile to delete the workspace, if it exists, before running the job inside it (see the sketch after these options). But this is just a workaround. I would not want to mask the connection issues, and I would like to retain the actual build history of a pipeline (not have it keep erasing itself).
Minimize connection issues. This obviously cannot always be guaranteed, though. Plus, sometimes maintenance must force servers offline. While I can structure maintenance in a way that limits or works around such issues, there will still be rare cases where downtime is required across the board. It would be best if Jenkins could handle this use case.
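For the first option, the workaround would look roughly like this in a scripted pipeline (sketch only):

    node {
        deleteDir()      // remove anything left over from a previous build number
        checkout scm
        // ... the actual build steps ...
    }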
I'm curious whether anyone has run into this issue and what the thoughts are on this problem.

Access Jenkins host drive, beside the job workspace

I would like to share byproducts of one Jenkins job with another one that runs afterwards.
I am aware that I can set "use custom workspace", but that would merge the jobs together, which is not what I want. I just need to move a few files to a location where they can be read by the next job.
So far I can't find out how to actually tell Jenkins jobs to look in a specific folder; it does not seem to have a concept of the file system beyond what is going on in the job workspace folder.
Is there a way to access the host file system, or declare a shared folder inside Jenkins (like in the main workspace folder, which contains all the other jobs?), so I can copy and read files in it from different jobs?
Where possible I would like to avoid plugins and extras; I would like to use what is included with Jenkins base.
I realize you want to avoid plugins, but the Jenkins-y way to accomplish this is to use the Copy Artifacts plugin, which does exactly what you want.
There are a variety of problems that you may run into when trying to manage the filesystem yourself. (How do you publish to a common location when running on different build nodes? How do you handle unsuccessful builds?) This solution uses Jenkins to track builds and artifacts. In the absence of a separate artifact repository, it's a lot better than trying to manage it yourself.
To use Copy Artifacts:
As a Post-Build step, choose "Archive Artifacts" in the first job and enter the path(s) to the generated files.
Then in the second job, add a "Copy Artifacts from another project" build step to grab some or all files marked as artifacts in your first job. (By default, Jenkins will re-create the paths of the generated files in the second job's workspace, which may or may not be what you want, but you can change this behavior.)
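For completeness, the pipeline equivalent of that second step; target and flatten (both optional) control the path-recreation behavior mentioned above, and the job name is a placeholder:

    copyArtifacts(
        projectName: 'first-job',       // the job that archived the files
        selector: lastSuccessful(),
        target: 'incoming',             // copy into ./incoming instead of the workspace root
        flatten: true)                  // drop the archived directory structure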
Configure Jenkins to run a Maven build and deploy your artifacts with mvn clean deploy. This will push them to an "artifact server", which you probably have or, if not, need to add/configure.
Then you configure your downstream job, also a Maven job, to depend on the same artifact that was published by the upstream job. This will trigger a download of the artifact from the artifact server and make it available to the build.

Jenkins downstream job fails to find upstream artifacts

The setup is used to build and deploy to Adobe AEM.
The master build job pulls from a git repository, builds and packages, runs the tests, and then fires downstream jobs that should use the packages built by the upstream job.
The issue is that the downstream jobs fail with the message:
Unable to access upstream artifacts area /var/lib/jenkins/jobs/PROJECTNAME-Master-Branch/builds/2014-10-22_11-33-46/archive. Does source project archive artifacts?
It seems to me that somehow the Copy Artifacts plugin, triggered by the downstream job, is looking for the artifacts in the wrong location. The correct location would be
/var/lib/jenkins/jobs/PROJECTNAME-Master-Branch/workspace/PROJECTNAME-*/**/*.jar,/var/lib/jenkins/jobs/PROJECTNAME-Master-Branch/workspace/PROJECTNAME-*/**/*.zip
But then, it complains about
java.io.IOException: Expecting Ant GLOB pattern, but saw '/var/lib/jenkins/jobs/PROJECTNAME-Master-Branch/workspace/PROJECTNAME-*/**/*.jar,/var/lib/jenkins/jobs/PROJECTNAME-Master-Branch/workspace/PROJECTNAME-*/**/*.zip'. See http://ant.apache.org/manual/Types/fileset.html for syntax
The downstream job copies artifacts from another project, and the build to copy from was either "Upstream build that triggered this job" or "Copy from workspace of latest completed build". Neither works.
Any ideas?
TL;DR
You are trying to use artifacts without archiving them first.
You are trying to use absolute paths, but they should be relative to $WORKSPACE and/or "archive location".
Full Answer
You are misunderstanding the concept of "Artifacts" as it relates to Jenkins.
What are Jenkins Artifacts
Artifacts are files that are specifically preserved after the build with the help of the Archive the Artifacts post-build action.
When the build runs, it runs within $WORKSPACE, which on the filesystem usually resides within $JENKINS_HOME/jobs/$JOB_NAME/workspace.
Inside there, you can have your SCM checkout folders, temporary build files, final built files, binaries, etc.
The contents of $WORKSPACE are volatile; you should never rely on them outside of the build timeframe (and downstream jobs are outside of the build timeframe). The contents of $WORKSPACE could differ between master/slave nodes, and could be deleted at any time by an admin, or by an SCM update/cleanup/checkout.
It's also important to understand that there is only one $WORKSPACE for the whole Job.
But now pay attention to your Build History: there are several entries in that list, referenced by build number (#) and date timestamp.
These are stored under:
$JENKINS_HOME/jobs/$JOB_NAME/builds/$BUILD_ID
with $BUILD_ID being the date-timestamp of the build, like 2014-10-22_11-33-46
The $WORKSPACE contains the information relevant to the current or last build (and the problem is: you can never be sure whether it's "current" or "last");
The builds folder contains a record of all past (retained) build executions (this is what makes up the Build History list on your left), per build.
By default, it contains only what Jenkins itself needs: a copy of build.xml, changelog information, and the console log. When you go to the URL http://$JENKINS_URL/job/$JOB_NAME/[nn]/, where [nn] is a numeric build/run number (#), Jenkins reads this information from the builds folder on the filesystem.
To preserve the artifacts of a build (to avoid them being overwritten by the next build or a wiped-out workspace, or just to access older builds), you need to Archive the Artifacts (with the post-build action of that same title). When you archive the artifacts, you indicate which files within $WORKSPACE you want to preserve. When Jenkins does the archiving, it places those files (with paths preserved, relative to $WORKSPACE) into:
$JENKINS_HOME/jobs/$JOB_NAME/builds/$BUILD_ID/archive/.
This way, you can have multiple sets of artifacts preserved for previous builds, not just "latest/last" from $WORKSPACE.
For the sake of completeness, I will mention that Jenkins's "permalinks", such as http://$JENKINS_URL/job/$JOB_NAME/lastSuccessfulBuild and /lastFailedBuild, etc are in fact symlinks on the filesystem to one of the preserved builds/$BUILD_ID folders.
Lastly, you control how many build runs and how many sets of artifacts are retained (these can be configured separately) through the "Discard old builds" checkbox in the job configuration. By default, all are retained, but if you start retaining artifacts, you need to think about hard-disk space capacity.
Solutions to your problem
So with the information above, and looking at your error messages, you should now see that the Copy Artifacts plugin is correctly looking for artifacts under the /archive/ section of a build.
You should also notice that Copy Artifacts plugin does not let you pick "current build" when selecting which build to copy from. It has permalinks (like "last successful" or "last build"), and specific build numbers, all of which translate to preserved builds under $JENKINS_HOME/jobs/$JOB_NAME/builds/$BUILD_ID/archive/
Even "Upstream Build that triggered this job" will link to a specific $BUILD_ID.
In either of the options below:
Configuration for Archiving Artifacts is relative to $WORKSPACE.
Configuration for Copy Artifacts is relative to "archive location", that is $JENKINS_HOME/jobs/$JOB_NAME/builds/$BUILD_ID/archive/.
Since "Copy Artifacts" is relative to "archive location", and "archive location" is relative to $WORKSPACE, then for all intensive purposes, the relative paths of both configurations can be same and relative to $WORKSPACE
Option 1
First Archive the Artifacts with the post-build action, otherwise you have nothing to copy from.
If you have your files in the root of $WORKSPACE, it should be:
PROJECTNAME-*/**/*.jar,PROJECTNAME-*/**/*.zip
(Note: no full paths here.)
Then use Upstream Build that triggered this job for Copy Artifacts selection.
For Artifacts to copy field use either:
** or blank to copy all archived artifacts, or
PROJECTNAME-*/**/*.jar,PROJECTNAME-*/**/*.zip (same as the archiving section)
Option 2
If you don't want to archive, you can use $WORKSPACE directly, with Copy from workspace of latest completed build; however, you must ensure that no second upstream build can run while the downstream build is executing, or else you risk getting a partial file from a partial build, because, as previously explained, $WORKSPACE is volatile.
Again, for the Copy Artifacts step, under Artifacts to copy field, use path relative to $WORKSPACE, that is:
PROJECTNAME-*/**/*.jar,PROJECTNAME-*/**/*.zip
Option 3
If you really want to copy the whole WORKSPACE between different jobs, use either
Clone Workspace SCM plugin or
Shared Workspace plugin
The fix may be this simple: disable or remove the Compress Artifacts plugin and restart Jenkins.
This workaround was deduced from a long-standing bug report: "Copy Artifacts Plugin" should support ArtifactManager.
The solution lies in the configuration of the downstream job's builder.
The root cause sits in the configuration of the downstream job. Once "Copy from workspace of latest completed build" is chosen for the build to be copied, and the path of artifacts to copy is set to a relative path, such as projectname-*/**/*.jar,projectname-*/**/*.zip, the build succeeds.
Furthermore, in the parent job configuration, the downstream job needs to be allowed to copy artifacts: the "Projects to allow copy artifacts" field should specify the downstream job.
Edit: Now I see that you responded in the meantime. Great answer, and it basically clears up some of the questions I had.
The one unclear thing about option 1 is that archiving of the files happens after the parent job completes.
Waiting for the completion of projectname-Deploy
projectname-Deploy #19 completed. Result was SUCCESS
Waiting for the completion of projectname-Deploy
projectname-Deploy #20 completed. Result was SUCCESS
Build step 'Trigger/call builds on other projects' changed build result to SUCCESS
Strings match run condition: string 1=[lab2b], string 2=[both]
Run condition [Strings match] preventing perform for step [BuilderChain]
Archiving artifacts
Once I changed the approach to option two, it worked for me, but I would like to understand the first option as well.
