CircleCI downloading workspace layers multiple times due to parallelism setting

My CircleCI workflow involves, at a high level, a build job whose output is persisted to a workspace, followed by a build_and_push_image job that attaches that workspace.
I have parallelism (6 instances) set on the build job. However, the downstream build_and_push_image job ends up downloading workspace layers from each instance (seemingly duplicates of the same content), causing it to take upwards of 3 minutes.
In comparison, if there is no parallelism on the build job, build_and_push_image completes in under a minute, because "Downloading workspace layers" involves ~570 MB of content (100 MB + 470 MB) instead of ~2.7 GB.
Is this expected? Is there a way to prevent the build_and_push_image job from downloading the workspace layers once for each parallel instance?
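For reference, a minimal sketch of the kind of config described (image names, paths, and scripts are assumptions, not the asker's actual setup). Each of the six parallel instances runs persist_to_workspace, so attach_workspace in the downstream job downloads six layers; if every instance produces identical output, one workaround is to halt all but one instance before the persist step:

version: 2.1
jobs:
  build:
    docker:
      - image: cimg/base:stable   # assumed image
    parallelism: 6
    steps:
      - checkout
      - run: ./build.sh           # assumed build script
      # Each instance that reaches persist_to_workspace uploads its own
      # workspace layer, so the downstream attach_workspace downloads all
      # six of them (~2.7 GB instead of ~570 MB).
      - run:
          name: Halt all but the first parallel instance
          command: |
            # Only valid if every instance produces identical output.
            if [ "$CIRCLE_NODE_INDEX" != "0" ]; then
              circleci-agent step halt
            fi
      - persist_to_workspace:
          root: .
          paths:
            - output              # assumed output directory
  build_and_push_image:
    docker:
      - image: cimg/base:stable
    steps:
      - attach_workspace:
          at: .
      - run: ./push_image.sh      # assumed push script
workflows:
  main:
    jobs:
      - build
      - build_and_push_image:
          requires:
            - build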

Related

Jenkins trying to copyArtifacts from a build that I trigger

I have installed the Copy Artifact plugin and created two freestyle jobs: experiment-main and experiment-1.
experiment-1 just creates a file called artifact.txt with the build number in it, and archives it.
experiment-main triggers experiment-1 and then tries to copy the artifact, but this is the result:
Running as SYSTEM
Building on master in workspace /var/lib/jenkins/workspace/experiment-main
Waiting for the completion of experiment-1
experiment-1 #4 started.
experiment-1 #4 completed. Result was SUCCESS
Build step 'Trigger/call builds on other projects' changed build result to SUCCESS
ERROR: Unable to find a build for artifact copy from: experiment-1
Finished: FAILURE
which isn't what I expected (or at least what I was hoping for).
I had hoped it would find the experiment-1 build that was triggered downstream from the current build.
Any ideas?
I figured out that there are environment variables holding the numbers of the triggered builds that I can use. To find the right variable, I just printed all the environment variables with env and looked through the list.
Then I configured the Copy Artifact plugin to use that build number.
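For example, with the Parameterized Trigger plugin's "Trigger/call builds on other projects" step (as in the console output above), the discovery and configuration look roughly like this (the exact variable name depends on the job name, so verify it in your own env output):

# Execute-shell build step in experiment-main, after the trigger step:
env | sort | grep -i TRIGGERED

# typically prints something like:
#   TRIGGERED_BUILD_NUMBER_experiment_1=4

# Copy Artifact configuration:
#   Which build:  Specific build
#   Build number: ${TRIGGERED_BUILD_NUMBER_experiment_1}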
I couldn't do it the way @alex-o suggested, just getting the last build of the sub-job, because I might have more than one job using the sub-job at once; but if you don't have that problem, that might work for you.
Yes, this is unexpected behavior indeed.
The reason why this won't work is hidden in the help text of the "Upstream Project Name" input field:
Downstream builds are found using fingerprints of files. That is, a build that is triggered from a build isn't always considered downstream, but you need to fingerprint files used in builds to let Jenkins track them.
So, the Copy Artifact plugin relies on fingerprint data to determine job ancestry. For that reason, you cannot use the "Downstream build of..." feature with the current job as the parent: fingerprints are recorded in a post-build step, so an ongoing build of experiment-main does not have any fingerprints associated with it by the time it looks for a matching build of experiment-1.
It is possible to modify fingerprint information at build run-time (e.g., via Groovy), but at that point it's probably best to avoid the Copy Artifact plugin entirely and to implement the whole procedure in Groovy right away.
Your best bet is probably to refer to experiment-1 via "Last successful build" and to ensure that this is the build you triggered before (usually it will be, but depending on your setup there can be race conditions).
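As an aside, if you can use a Pipeline job instead of freestyle jobs, this race disappears, because the build step returns a handle to the exact child build. A sketch, assuming the Copy Artifact plugin's copyArtifacts pipeline step is available:

// Trigger the child job and wait for it; `child` is the exact build that ran.
def child = build job: 'experiment-1'
// Copy from precisely that build -- no fingerprint-based ancestry lookup needed.
copyArtifacts projectName: 'experiment-1',
              selector: specific("${child.number}")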

Jenkins: stash vs archiveArtifacts

What are the use cases and pros/cons for using stash vs archiveArtifacts?
The documentation mentions each, e.g.
https://jenkins.io/doc/pipeline/steps/workflow-basic-steps/#stash-stash-some-files-to-be-used-later-in-the-build
and
https://jenkins.io/doc/pipeline/tour/tests-and-artifacts/
but doesn't compare them.
stash is used to "save" some files in a pipeline stage and reuse them on a different slave via unstash. Stash is only useful for a small set of files; it becomes very slow when you want to stash a large amount of data. If you need to stash a lot of files, it's recommended to use a shared filesystem between your slaves instead, so the content of your workspace can be used by multiple slaves.
archiveArtifacts saves artifacts on the master. You can specify whether you want to keep only the artifacts generated by the last build or by more builds. This is useful when you have a deploy job on your master that deploys the artifacts after a successful run, or when you want to make them available in your Jenkins console.
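A minimal declarative pipeline showing both side by side (the node labels, build command, and deploy script are placeholders):

pipeline {
    agent none
    stages {
        stage('Build') {
            agent { label 'builder' }
            steps {
                sh 'mvn -B -DskipTests package'
                // Small file set, reused by a later stage on another node.
                stash name: 'war', includes: 'target/*.war'
                // Kept on the master and shown on the build page.
                archiveArtifacts artifacts: 'target/*.war', fingerprint: true
            }
        }
        stage('Deploy') {
            agent { label 'deployer' }
            steps {
                unstash 'war'                    // restores target/*.war into this workspace
                sh './deploy.sh target/*.war'    // placeholder deploy script
            }
        }
    }
}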
From the latest Pipeline Syntax documentation and Options directive:
https://jenkins.io/doc/book/pipeline/syntax/#options
preserveStashes
Preserve stashes from completed builds, for use with stage restarting. For example: options { preserveStashes() } to preserve the stashes from the most recent completed build, or options { preserveStashes(buildCount: 5) } to preserve the stashes from the five most recent completed builds.
In theory this seems much the same as using archiveArtifacts with the buildDiscarder option to apply an artifact-retention policy.
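For example, a sketch of the two retention settings side by side in a declarative options block:

options {
    // keep stashes from the five most recent completed builds
    preserveStashes(buildCount: 5)
    // keep archived artifacts from only the five most recent builds
    buildDiscarder(logRotator(artifactNumToKeepStr: '5'))
}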

Disable concurrent builds

Background: we are looking for a solution for how to optimize our pipeline (formerly workflow).
Currently we run quite a few parallel deployments and tests, which are spread over 2 builders with 4 executors each.
The pipeline is triggered by a Git push, so subsequent pushes will trigger multiple builds. We have experimented with the stage concurrency: 1 option, which nicely blocks a stage against a subsequent build, but that build kicks off as soon as the specific stage is done.
Question(s):
I am not sure this is best practice, but it seems to me it would be better not to execute the new build until the previous one is done (reasoning from the fact that we have committed resources to it, and it should be allowed to finish, even if it isn't building the latest and greatest commit).
Q1: Is this even best practice?
Q2: How do we hold back the newly triggered build while the previous one is still running? (I can imagine iterating through the builds of this job and stopping the new one...)
See the config of the first stage [1].
[1] First stage:
stage name: 'Checkout and build WAR'
node {
    def mvnHome = tool 'Maven 3.2.x'
    checkout([$class: 'GitSCM',
              poll: true,
              branches: [[name: '*/master']],
              doGenerateSubmoduleConfigurations: false,
              extensions: [[$class: 'RelativeTargetDirectory',
                            relativeTargetDir: 'checkout-directory']],
              submoduleCfg: [],
              userRemoteConfigs: [[url: 'https://some.repo/repo.git']]])
    // Stash the sources of the cloned repo for later stages.
    stash name: 'src', includes: 'checkout-directory/war/src/, checkout-directory/war/pom.xml'
    // Build without tests; the unit and integration tests run in a separate stage.
    sh "${mvnHome}/bin/mvn -f checkout-directory clean install -DskipTests"
    // Stash the application build.
    stash name: 'war', includes: 'checkout-directory/war/target/*.war'
}
From the job's configuration you can set:
Execute concurrent builds if necessary
Quiet period
If set, a newly scheduled build waits for this many seconds before actually being built. This is useful for:
Collapsing multiple CVS change notification e-mails into one (some CVS changelog e-mail generation scripts generate multiple e-mails in quick succession when a commit spans across directories).
If your coding style is such that you commit one logical change in a few cvs/svn operations, then setting a longer quiet period would prevent Jenkins from building it prematurely and reporting a failure.
Throttling builds. If your Jenkins installation is too busy with too many builds, setting a longer quiet period can reduce the number of builds.
If not explicitly set at project-level, the system-wide default value is used.
As for the Jenkins Pipeline DSL, this article answers your question:
By default, Pipeline builds can run concurrently. The stage command lets you mark certain sections of a build as being constrained by limited concurrency (or, later, unconstrained). Newer builds are always given priority when entering such a throttled stage; older builds will simply exit early if they are preempted.
Using the job's configuration through properties seems to be the cleanest way to go; see https://stackoverflow.com/a/43963315/1756183.
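For example, at the top of a scripted Jenkinsfile (declarative syntax offers the equivalent options { disableConcurrentBuilds() }):

// Queue newly triggered builds instead of running them alongside an
// in-progress build of this job; the queued build starts once the
// previous one finishes.
properties([disableConcurrentBuilds()])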

Add Multiple Workspace Cleanup for Jenkins

Can I have multiple "Delete workspace when build is done" executions in a single job?
On failure status: clear the whole workspace.
On success status: clear only the distribution package directories (**/target/dist).
We break our builds into compilation and test jobs, with the build-stalker plugin providing the link between the two. The compilation job doesn't clean up after itself, since the test job will do so, but we don't run a test job for every compilation job (only for the latest one in a 4-hour period), which leaves orphaned workspaces.
I'd like a way to make the orphaned workspaces have less impact, and a selective, status-based cleanup is one way to do this.
I'm not aware of a Jenkins plugin that supplies such a feature.
I'd establish the following:
Let each compile build append a line with its workspace path, e.g.:
.../jenkins/workspace/<...compile job...>/
to a file like:
.../jenkins/workspace/toBeWipedOut
using a script in a language of your choice (bash, Groovy, cmd, ...).
Then create a job that periodically runs a script (again in a language of your choice) that:
reads the file,
deletes the workspaces mentioned therein except the last, and
removes all lines therein except the last
(a sketch follows below).
Or: let each compile build write a line with its job name instead, and see Wipe out workspaces of all jobs for how to wipe out workspaces by job name.
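A minimal Groovy sketch of that periodic cleanup script (the file location is an assumption; point it at wherever your compile builds actually write the list):

// Read the list of compile-job workspaces, delete all but the most recent
// entry, then shrink the file back down to that last line.
// NOTE: the path below is assumed; adjust it to your Jenkins home.
def listFile = new File('/var/lib/jenkins/workspace/toBeWipedOut')
def paths = listFile.readLines().findAll { it.trim() }
if (paths.size() > 1) {
    paths[0..-2].each { path ->
        new File(path).deleteDir()   // recursively removes that workspace
    }
    listFile.text = paths[-1] + '\n'
}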

Copying artifacts from multiple upstream jobs at join in Jenkins

Is it possible to have a Jenkins job that has been triggered by the Join plugin copy artifacts from multiple upstream jobs?
I'm trying to set up a Jenkins configuration with a "diamond" of jobs: my-trigger runs and spawns two jobs, my-fork1 and my-fork2, that can run concurrently and take varying amounts of time, and the Join plugin sets off the job my-join once both forks have completed.
Each of my-trigger, my-fork1 and my-fork2 creates and fingerprints artifacts (say, text files).
I want to copy the artifacts from each of the upstream jobs in my-join using the "Copy artifacts from another project" tool, with the "Which build" parameter set to "Upstream build that triggered this job". However, I see output like this in the console of my-join:
Building remotely on build-machine in workspace /path/to/workspace/my-join
Copied 1 artifact from "my-trigger" build number 63
Copied 1 artifact from "my-fork1" build number 63
Unable to find a build for artifact copy from: my-fork2
and the job fails. In this case, my-fork2 finished first, so my-fork1 triggered the join step. I believe that means my-join only has a record of my-fork1 and my-trigger as being upstream. If my-fork1 finishes first, then my-fork2 kicks off the join, and the job fails when trying to copy from my-fork1.
If I change the configuration to copy the artifact from the build "Latest successful build" then the build succeeds, but my-trigger may run many times in succession so there would be no guarantee that my-join is joining related artifacts.
How can I get the join step to copy artifacts from multiple forks upstream?
Note: the second point of this question seems to be asking the same thing, but the only answer there doesn't address it, even though it has been accepted.
Thanks
tensorproduct
If your builds are parameterized with a unique parameter for each run of the join-diamond, you can use that parameter in the Copy Artifact plugin to determine which build to copy from. You would want to specify "Latest successful build" and qualify it with the parameter and value.
We have a similar situation where I work: multiple simultaneous runs of a join-diamond. The parameter in the build allows the downstream jobs to get the correct artifacts from the upstream jobs.
Step-by-step settings for the solution provided by Jason Swager:
Project dependencies: diamond -> fork -> diamond_ready
Project "fork":
  String parameter: UNIQUE_ID (only a dummy; not used inside)
  (Creates an artifact and archives the artifacts)
Project "diamond_ready":
  String parameter: UNIQUE_ID
  Copy artifacts from another project:
    Project name: fork
    Parameter filters: UNIQUE_ID=${UNIQUE_ID}
Project "diamond":
  Trigger parameterized build on other projects:
    Projects to build: fork
    Predefined parameters: UNIQUE_ID=${BUILD_TAG}
  Join Trigger:
    Post-Join Actions:
      Trigger parameterized build on other projects:
        Projects to build: diamond_ready
        Predefined Generator parameters: UNIQUE_ID=${BUILD_TAG}
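This works because BUILD_TAG expands to jenkins-${JOB_NAME}-${BUILD_NUMBER}, so every run of diamond hands its forks (and, via the generator parameter, diamond_ready) a value unique to that run, which keeps concurrent runs of the diamond from mixing up their artifacts.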
