I am playing around with implementing a BuildEventService server so that I can have Bazel export its BuildEventProtocol messages to it. I'm trying to figure out how to read the logs for a test run without race conditions. This seems particularly difficult because Bazel reuses the same local path across runs and BES uploads are asynchronous by default.
Example:
As part of the event stream I get the following:
EventStream event:
stream_id {
  build_id: "a4a34ca2-fc4b-483d-b4ab-b4546bdb2c4e"
  component: TOOL
  invocation_id: "b09c0b08-b096-4673-9521-4980506207f7"
}
sequence_number: 11
event {
  event_time {
    seconds: 1504560960
    nanos: 778000000
  }
  bazel_event {
    [type.googleapis.com/build_event_stream.BuildEvent] {
      id {
        test_summary {
          label: "//libraries:types-test"
          configuration {
            id: "fe35dfece8e09ba054305e51187b3316"
          }
        }
      }
      test_summary {
        total_run_count: 1
        failed {
          uri: "file:///private/var/tmp/_bazel_endobson/f851d7f6c7010ae7d7a3db153bed36de/execroot/yaspl/bazel-out/darwin_x86_64-fastbuild/testlogs/libraries/types-test/test.log"
        }
        overall_status: FAILED
      }
    }
  }
}
I would like to read the file at the URI:
file:///private/var/tmp/_bazel_endobson/f851d7f6c7010ae7d7a3db153bed36de/execroot/yaspl/bazel-out/darwin_x86_64-fastbuild/testlogs/libraries/types-test/test.log
but it seems that every time I run the test I get the same URI. Thus I want to read it before the next test run recreates it. However, Bazel uploads asynchronously by default, so it seems there is nothing preventing another Bazel run from starting up and recreating the file even before the BES server receives this stream message.
How can I avoid this race and still read these files?
It depends on whether you are in control of the Bazel client. If so, then yes, you can avoid the race; otherwise you can't.
You can specify a different --output_base on each invocation of Bazel (the output base is the path prefix /private/var/tmp/_bazel_endobson/f851d7f6c7010ae7d7a3db153bed36de in your example). However, --output_base is a startup option and thus requires a Bazel server restart whenever it changes. That would work, but it's slow, and you need to specify the different --output_base before the invocation, which might be fine if you invoke Bazel programmatically.
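For example, a minimal sketch (the directory naming and the BES endpoint are placeholders, not taken from the question):

# Give each invocation its own output base so the testlogs paths never collide.
# --output_base is a startup option, so it goes before the command.
OUTPUT_BASE=$(mktemp -d /tmp/bazel-output-base.XXXXXX)
bazel --output_base="$OUTPUT_BASE" test //libraries:types-test \
    --bes_backend=localhost:1985   # placeholder BES endpoint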
You can specify --bes_best_effort=false, in which case the BES upload is synchronous, i.e. Bazel waits for the upload to finish. If the upload fails, the build also fails.
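For instance (the backend address is a placeholder):

bazel test //libraries:types-test \
    --bes_backend=localhost:1985 \
    --bes_best_effort=false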
You could wrap the Bazel client in a shell script and, in addition to uploading to your BES service, also write the BEP to a file; at the end of the invocation, parse that file for test.log files and upload these before giving control back to the user.
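A rough sketch of such a wrapper, assuming --build_event_json_file is used so the wrapper can find the test.log URIs; it copies the logs to a per-invocation directory instead of uploading them, the grep/sed extraction and the destination directory are assumptions, and name collisions between different test targets are not handled:

#!/usr/bin/env bash
# Wrapper around the bazel client: run the test as usual, also dump the BEP to a
# JSON file, then salvage every referenced test.log before returning control.
set -u

BEP_FILE=$(mktemp)
bazel test "$@" --build_event_json_file="$BEP_FILE"
status=$?

# Copy the test logs referenced in the BEP somewhere stable before a later
# bazel run can overwrite them in place.
DEST=/tmp/testlogs/$(date +%s)
mkdir -p "$DEST"
grep -o 'file://[^"]*test\.log' "$BEP_FILE" | sed 's|^file://||' | sort -u | while read -r log; do
    cp "$log" "$DEST/"
done

exit "$status"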
Related
I'm using a shared library via @Library('shared-lib') _. The pipeline script implements post actions, e.g.
post {
    always {
        script {
            // Do stuff
        }
    }
}
When there's an error with the shared lib, Jenkins just fails the entire build and the post action block apparently isn't executed (tested with a wrong repository URL and a non-existing branch). In case GitHub is down, I want Jenkins to run the post actions to notify the issuer of the build that it failed. Is there a way to do this without having the issuer make API calls of some kind for verification?
Thanks for any suggestion!
One way you can control the loading of the shared library is by loading libraries dynamically. When you do so, you can wrap the loading phase in a try/catch block and handle the failure.
However, when using this technique the error is handled outside the pipeline execution, so to avoid duplicating the error handler function (the one that sends notifications) you can define the error handling in a separate method (or in a shared library) and call it from the catch block and from the post block.
Something like:
try {
    library "shared-lib"
}
catch (Exception ex) {
    // handle the exception
    handleError(ex.getMessage())
}

pipeline {
    agent any
    stages {
        stage('Hello') {
            steps {
                ...
            }
        }
    }
    post {
        always {
            script {
                handleError(message)
            }
        }
    }
}

def handleError(message) {
    emailext ...
}
You can also try to load the library inside a pipeline step, thus utilizing the post directive on failure, but this can cause issues with the context of the loaded library, and therefore it is not recommended.
You can also, of course, handle each failure type separately and avoid the need for an external function.
One last thing: shared library failures are usually not handled, because if the job fails to load the library from the SCM, it will probably also fail to load the pipeline itself from the SCM. So, assuming you host them both on the same SCM platform, this scenario is relatively rare.
tl;dr: I'd like to delete builds from within their execution, or rather, in a post statement (though the specifics shouldn't matter).
Background: In a project I'm working on, there is a "gateway" job of sorts that aggregates all new job triggers into one launch as long as certain other jobs are still running. For this purpose, this job aborts itself such that there is only ever one instance running (which is often not the latest build).
Unfortunately, this means that in the job preview, the job is often shown as aborted, which is undesirable (ending the job as "successful" or some other status wouldn't improve anything). Thus, I have two options:
Change the abort logic so the newest build survives and older ones are aborted. This is technically possible, but it has other drawbacks due to some internal logic, which is why I'd like to avoid this solution.
Delete the aborted builds once they're finished
However, this is apparently not as easy as just calling the "doDelete" REST API inside the job, and the build discarder can't be set to store 0 builds (it needs to be a positive integer). This is what I tried code-wise (MWE):
steps {
    script {
        currentBuild.result = 'ABORTED'
        error("abort")
    }
}
post {
    always {
        withCredentials([string(credentialsId: 'x', variable: 'TOKEN')]) {
            sh "curl -X POST \"https://jenkins.example.com/etc/job/jobname/${env.BUILD_NUMBER}/doDelete\" -i -H 'Authorization: Bearer $TOKEN'"
        }
    }
}
This code deletes some job information (for instance, the console log is empty), but not the build itself. Thus, my question remains:
How can I make a job delete itself?
I have Groovy code that reads from the current build's consoleText and does some processing. When I run the code from the IDE it works perfectly, but when I run it as part of a step in Jenkins it only reads 10,000 lines out of a total of approximately 2.8 million. The code that reads from the console is:
url.withReader { bufferedReader ->
    while ((line = bufferedReader.readLine()) != null) {
        // do something
    }
}
The url is
${BUILD_URL}/consoleText
The .../consoleText URL will not "grow" automatically -- it just provides a "snapshot" of console data that's available at query time.
So, if you GET that URL for a build while that build is still running, then you will only see part of the console log. The amount that you see will depend on the time when you issue the GET -- and possibly it will also depend on the status of some buffers.
If this used to work better in the past, then you probably moved the point in time when you tried to read the console.
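If you need the complete log, one option is to fetch consoleText only after the build has finished, for example from a script outside the build itself. A rough sketch (the build URL, the use of curl and jq, and the polling interval are all assumptions):

# Wait until the build reports building=false, then fetch the full console log.
BUILD_URL="https://jenkins.example.com/job/myjob/123/"   # placeholder
while [ "$(curl -sf "${BUILD_URL}api/json?tree=building" | jq -r '.building')" = "true" ]; do
    sleep 10
done
curl -sf "${BUILD_URL}consoleText" -o full-console.log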
I'd like to run some code after my pipeline finishes all processing, so I'm using BlockingDataflowPipelineRunner and placing code after pipeline.run() in main.
This works properly when I run the job from the command line using BlockingDataflowPipelineRunner. The code under pipeline.run() runs after the pipeline has finished processing.
However, it does not work when I try to run the job as a template. I deployed the job as a template (with TemplatingDataflowPipelineRunner), and then tried to run the template in a Cloud Function like this:
dataflow.projects.templates.create({
  projectId: 'PROJECT ID HERE',
  resource: {
    parameters: {
      runner: 'BlockingDataflowPipelineRunner'
    },
    jobName: `JOB NAME HERE`,
    gcsPath: 'GCS TEMPLATE PATH HERE'
  }
}, function(err, response) {
  if (err) {
    // etc
  }
  callback();
});
The runner parameter does not seem to take effect; I can put gibberish under runner and the job still runs.
The code I had under pipeline.run() does not run when each job runs -- it runs only when I deploy the template.
Is it expected that the code under pipeline.run() in main would not run each time the job runs? Is there a solution for executing code after the pipeline is finished?
(For context, the code after pipeline.run() moves a file from one Cloud Storage bucket to another. It's archiving the file that was just processed by the job.)
Yes, this is expected behavior. A template represents the pipeline itself and allows (re-)executing the pipeline by launching the template. Since the template doesn't include any of the code from the main() method, it doesn't allow doing anything after the pipeline execution.
Similarly, the dataflow.projects.templates.create API is just the API to launch the template.
The way the blocking runner accomplished this was to get the job ID from the created pipeline and periodically poll to observe when it has completed. For your use case, you'll need to do the same:
Execute dataflow.projects.templates.create(...) to create the Dataflow job. This should return the job ID.
Periodically (every 5-10s, for instance) poll dataflow.projects.jobs.get(...) to retrieve the job with the given ID, and check what state it is in.
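A rough sketch of that polling loop against the Dataflow REST API, written as a shell script for brevity; the project, template path, use of jq, and the final gsutil step are placeholders rather than anything from the question:

#!/usr/bin/env bash
set -eu

PROJECT="my-project"                                   # placeholder
GCS_TEMPLATE="gs://my-bucket/templates/my-template"    # placeholder

# templates.create returns the created Job resource, including its id.
JOB_ID=$(curl -sf -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"jobName\": \"archive-run-$(date +%s)\", \"gcsPath\": \"$GCS_TEMPLATE\"}" \
    "https://dataflow.googleapis.com/v1b3/projects/$PROJECT/templates" | jq -r '.id')

# Poll jobs.get every 10 seconds until the job reaches a terminal state.
while true; do
    STATE=$(curl -sf -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://dataflow.googleapis.com/v1b3/projects/$PROJECT/jobs/$JOB_ID" | jq -r '.currentState')
    case "$STATE" in
        JOB_STATE_DONE) break ;;
        JOB_STATE_FAILED|JOB_STATE_CANCELLED) echo "job ended in $STATE" >&2; exit 1 ;;
        *) sleep 10 ;;
    esac
done

# The work that used to live after pipeline.run(): archive the processed file.
gsutil mv gs://my-input-bucket/processed-file gs://my-archive-bucket/    # placeholder paths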
I read How can I set the job timeout using the Jenkins DSL. That sets the timeout for one job; I want to set it for all jobs, and with slightly different settings: 150%, averaged over the last 10 builds, with a maximum of 30 minutes.
According to the relevant job-dsl-plugin documentation I should use this syntax:
job('example-3') {
    wrappers {
        timeout {
            elastic(150, 10, 30)
            failBuild()
            writeDescription('Build failed due to timeout after {0} minutes')
        }
    }
}
I tested in http://job-dsl.herokuapp.com/ and this is the relevant XML part:
<buildWrappers>
  <hudson.plugins.build__timeout.BuildTimeoutWrapper>
    <strategy class='hudson.plugins.build_timeout.impl.ElasticTimeOutStrategy'>
      <timeoutPercentage>150</timeoutPercentage>
      <numberOfBuilds>10</numberOfBuilds>
      <timeoutMinutesElasticDefault>30</timeoutMinutesElasticDefault>
    </strategy>
    <operationList>
      <hudson.plugins.build__timeout.operations.FailOperation></hudson.plugins.build__timeout.operations.FailOperation>
      <hudson.plugins.build__timeout.operations.WriteDescriptionOperation>
        <description>Build failed due to timeout after {0} minutes</description>
      </hudson.plugins.build__timeout.operations.WriteDescriptionOperation>
    </operationList>
  </hudson.plugins.build__timeout.BuildTimeoutWrapper>
</buildWrappers>
I verified with a job I edited manually before, and the XML is correct. So I know that the Jenkins DSL syntax up to here is correct.
Now I want to apply this to all jobs. First I tried to list all the job names:
import jenkins.model.*
jenkins.model.Jenkins.instance.items.findAll().each {
    println("Job: " + it.name)
}
This works too, all job names are printed to console.
Now I want to plug it all together. This is the full code I use:
import jenkins.model.*

jenkins.model.Jenkins.instance.items.findAll().each {
    job(it.name) {
        wrappers {
            timeout {
                elastic(150, 10, 30)
                failBuild()
                writeDescription('Build failed due to timeout after {0} minutes')
            }
        }
    }
}
When I push this code and Jenkins runs the DSL seed job, I get this error:
ERROR: Type of item "jobname" does not match existing type, item type can not be changed
What am I doing wrong here?
The Job-DSL plugin can only be used to maintain jobs that have been created by that plugin before. You're trying to modify the configuration of jobs that have been created in some other way -- this will not work.
For mass-modification of existing jobs (like, in your case, adding the timeout) the most straightforward way is to change the job's XML specification directly,
either by changing the config.xml file on disk, or
using the REST or CLI API
xmlstarlet is a powerful tool for performing such tasks directly at the shell level.
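For example, a rough sketch of the REST + xmlstarlet route for freestyle jobs; the Jenkins URL, credentials, and use of jq are placeholders, the element names mirror the XML shown above, and the operationList (failBuild, writeDescription) is omitted for brevity:

#!/usr/bin/env bash
set -eu

JENKINS_URL="https://jenkins.example.com"   # placeholder
AUTH="user:apitoken"                        # placeholder credentials

# Iterate over all job names via the JSON API.
for job in $(curl -sf -u "$AUTH" "$JENKINS_URL/api/json?tree=jobs[name]" | jq -r '.jobs[].name'); do
    cfg=$(mktemp)
    curl -sf -u "$AUTH" "$JENKINS_URL/job/$job/config.xml" -o "$cfg"

    # Skip jobs that already have the timeout wrapper. For jobs without a
    # /project/buildWrappers node (e.g. pipeline jobs) the edit below is a no-op.
    if [ "$(xmlstarlet sel -t -v 'count(//hudson.plugins.build__timeout.BuildTimeoutWrapper)' "$cfg")" != "0" ]; then
        continue
    fi

    # Insert the wrapper with the elastic strategy, matching the generated XML above.
    xmlstarlet ed -L \
        -s '/project/buildWrappers' -t elem -n 'hudson.plugins.build__timeout.BuildTimeoutWrapper' \
        -s '//hudson.plugins.build__timeout.BuildTimeoutWrapper' -t elem -n 'strategy' \
        -i '//hudson.plugins.build__timeout.BuildTimeoutWrapper/strategy' -t attr -n 'class' \
            -v 'hudson.plugins.build_timeout.impl.ElasticTimeOutStrategy' \
        -s '//hudson.plugins.build__timeout.BuildTimeoutWrapper/strategy' -t elem -n 'timeoutPercentage' -v 150 \
        -s '//hudson.plugins.build__timeout.BuildTimeoutWrapper/strategy' -t elem -n 'numberOfBuilds' -v 10 \
        -s '//hudson.plugins.build__timeout.BuildTimeoutWrapper/strategy' -t elem -n 'timeoutMinutesElasticDefault' -v 30 \
        "$cfg"

    # Push the modified configuration back to Jenkins.
    curl -sf -u "$AUTH" -X POST -H 'Content-Type: application/xml' \
        --data-binary "@$cfg" "$JENKINS_URL/job/$job/config.xml"
done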
Alternatively, it is possible to perform the change via a Groovy script from the "Script Console" -- but for that you need some understanding of Jenkins' internal workings and data structures.