Jenkins remote API - wait for build to finish and get output?

When using Jenkins CLI, I can use the build command with options -v and -s to run a build, waiting for it to finish and printing its output.
Is there any way I can achieve the same result (wait for execution and get job output) with a single call to the REST API? I know this can be done by polling for build status until it finishes and then requesting its output, but I want to know if there is a straightforward option for short-running jobs.

You could do something like that, but it wouldn't generalize well to other jobs: the build may sit in the queue waiting for the next available executor, or hit similar race conditions, and holding a REST API request open for that long is not a good option and is generally discouraged.
So instead of looking for a single REST API call, improve the polling itself. Rather than polling every second, take the durations of previous builds, use them to predict roughly when the current build should finish, and poll around that time; you can also use Jenkins' own estimated remaining time for the build. Hope this makes sense.
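For illustration, here is a minimal Python sketch of that polling approach. The Jenkins URL, job name, and credentials are placeholders; the endpoints used (/lastCompletedBuild/api/json, /<build>/api/json, /<build>/consoleText) are standard Jenkins remote API paths.

# Minimal polling sketch (not a single REST call): estimate the poll interval
# from the previous build's duration, then fetch the console output when done.
# JENKINS, JOB, and AUTH below are placeholders for your own setup.
import time
import requests

JENKINS = "https://jenkins.example.com"
JOB = "my-job"
AUTH = ("login", "api-token")

def wait_and_get_output(build_number):
    # Use the last completed build's duration (milliseconds) as a first guess.
    last = requests.get(
        f"{JENKINS}/job/{JOB}/lastCompletedBuild/api/json?tree=duration",
        auth=AUTH).json()
    poll_every = max(5, last.get("duration", 0) / 1000 / 10)  # seconds

    build_url = f"{JENKINS}/job/{JOB}/{build_number}"
    while True:
        info = requests.get(f"{build_url}/api/json?tree=building,result",
                            auth=AUTH).json()
        if not info["building"]:
            break
        time.sleep(poll_every)

    # Once the build has finished, grab its console output.
    console = requests.get(f"{build_url}/consoleText", auth=AUTH)
    return info["result"], console.text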

Related

Get Jenkins build job duration once finished

I am using Jenkins integration server for my CI/CD.
I am working with freestyle projects.
I want to get build duration once finished (in seconds) using RESTFUL API (JSON).
This is what I tried:
duration=$(curl -g -u login:token --silent "$BUILD_URL/api/json?pretty=true&tree=duration" | jq -r '.duration')
Duration is always equal to 0 even though I ran this shell script in a post build task.
There are a few reasons why this may not work for you, but the most probable reason is that the build hasn't finished when you make the API call.
I tried it on our instance and for finished jobs it works fine and for running jobs it always returns 0. If your post build task is executed as part of the job, then the job has probably not finished executing yet, which is why you are always getting 0.
The API response for the build will not contain the duration attribute as long as the build is running, so you cannot use that mechanism while the build is still in progress.
However, you have a nice alternative for achieving what you want with freestyle jobs.
The solution, which still uses the API method, is to create a separate generic job for updating your database with the results. This job will receive the project name and the build number as parameters, run the curl command to retrieve the duration, update your database, and run any other logic you need.
This job can now be called from any freestyle job using the Parameterized Trigger plugin post task with the relevant build environment parameters.
This has the additional benefit that the duration update mechanism is controlled in a single job and if updates are needed they can be made in a single location avoiding the need to update all separate jobs.
Assuming your job is called Update-Duration and it receives two parameters, Project and Build, the post-build trigger on the calling job simply starts Update-Duration and passes the calling job's name and build number as those parameters.
And that's it: just add this trigger to any job that needs it, and in the future you can update the logic without changing the calling jobs.
One small thing: to avoid a race condition when the caller job has not yet finished executing, you can increase the quiet period of your database-updater job to allow enough time for the caller jobs to finish, so that the duration is populated.
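As a rough illustration, the build step of such an Update-Duration job could be a small script like the one below. The Jenkins URL, credentials, and the update_database() call are placeholders for your environment, and it assumes the Project and Build parameters are exposed to the build step as environment variables.

# Sketch of the generic Update-Duration job's build step. It reads the PROJECT
# and BUILD values passed by the Parameterized Trigger plugin, fetches the
# finished build's duration from the remote API, and hands it to whatever
# persistence logic you use (update_database is a placeholder).
import os
import requests

JENKINS = "https://jenkins.example.com"   # placeholder
AUTH = ("login", "api-token")             # placeholder

project = os.environ["Project"]   # job parameter, exposed as an env variable
build = os.environ["Build"]

resp = requests.get(
    f"{JENKINS}/job/{project}/{build}/api/json?tree=duration",
    auth=AUTH)
duration_ms = resp.json()["duration"]

print(f"{project} #{build} took {duration_ms / 1000.0:.1f}s")
# update_database(project, build, duration_ms)   # your own persistence logic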

How can I debug why my Dataflow job is stuck?

I have a Dataflow job that is not making progress - or it is making very slow progress, and I do not know why. How can I start looking into why the job is slow / stuck?
The first resource you should check is the Dataflow documentation. In particular, these pages should be useful:
Troubleshooting your Pipeline
Common error guidance
If these resources don't help, I'll try to summarize some reasons why your job may be stuck, and how you can debug it. I'll separate these issues depending on which part of the system is causing the trouble:
Job stuck at startup
A job can get stuck being received by the Dataflow service, or starting up new Dataflow workers. Some risk factors for this are:
Did you add a custom setup.py file?
Do you have any dependencies that require a special setup on worker startup?
Are you manipulating the worker container?
To debug this sort of issue I usually open StackDriver logging and look for the worker-startup logs. These logs are written by the worker as it starts up a Docker container with your code and your dependencies. If you see any problem here, it would indicate an issue with your setup.py, your job submission, staged artifacts, etc.
Another thing you can do is to keep the same setup, and run a very small pipeline that stages everything:
import logging
import apache_beam as beam

with beam.Pipeline(...) as p:  # same pipeline options as your real job
    (p
     | beam.Create(['test element'])
     | beam.Map(lambda x: logging.info(x)))
If you don't see your logs in StackDriver, then you can continue to debug your setup. If you do see the log in StackDriver, then your job may be stuck somewhere else.
Job seems stuck in user code
Something else that could happen is that your job is performing some operation in user code that is stuck or slow. Some risk factors for this are:
Is your job performing operations that require you to wait for them? (e.g. loading data to an external service, waiting for promises/futures)
Note that some of the builtin transforms of Beam do exactly this (e.g. the Beam IOs like BigQueryIO, FileIO, etc).
Is your job loading very large side inputs into memory? This may happen if you are using View.AsList for a side input.
Is your job loading very large iterables after GroupByKey operations?
A symptom of this kind of issue can be that the pipeline's throughput is lower than you would expect. Another symptom is seeing the following line in the logs:
Processing stuck in step <STEP_NAME>/<...>/<...> for at least <TIME> without outputting or completing in state <STATE>
.... <a stacktrace> ....
In cases like these it makes sense to look at which step is consuming the most time in your pipeline, and inspect the code for that step, to see what may be the problem.
Some tips:
Very large side inputs can be troublesome, so if your pipeline relies on accessing a very large side input, you may need to redesign it to avoid that bottleneck.
It is possible to have asynchronous requests to external services, but I recommend that you commit / finalize work on startBundle and finishBundle calls.
If your pipeline's throughput is not what you would normally expect, it may be because you don't have enough parallelism. This can be fixed by a Reshuffle, by sharding your existing keys into subkeys (Beam often does processing per key, so if you have too few keys your parallelism will be low), or by using a Combiner instead of GroupByKey + ParDo. See the sketch after these tips for the Reshuffle option.
Another reason that your throughput is low may be that your job is waiting too long on external calls. You can try addressing this by trying out batching strategies, or async IO.
In general, there's no silver bullet for improving your pipeline's throughput, and you'll need to experiment.
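As a rough sketch of the Reshuffle idea in a Python pipeline: the read paths and the ExpensiveWork step below are made up, and the point is only that a Reshuffle between the source and a slow step breaks fusion and redistributes elements across workers.

import apache_beam as beam

def expensive_fn(line):
    # Placeholder for slow per-element work (external calls, heavy parsing, ...).
    return line.upper()

# A Reshuffle between the read and the expensive step means the slow work is
# no longer limited to the parallelism of the source. Bucket paths are placeholders.
with beam.Pipeline() as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input*.txt")
     | "Reshuffle" >> beam.Reshuffle()
     | "ExpensiveWork" >> beam.Map(expensive_fn)
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output"))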
The data freshness or system lag are increasing
First of all, I'd recommend you check out this presentation on watermarks.
For streaming, the advance of the watermarks is what drives the pipeline to make progress, thus, it is important to be watchful of things that could cause the watermark to be held back, and stall your pipeline downstream. Some reasons why the watermark may become stuck:
One possibility is that your pipeline is hitting an unresolvable error condition. When a bundle fails processing, your pipeline will continue to attempt to execute that bundle indefinitely, and this will hold the watermark back.
When this happens, you will see errors in your Dataflow console, and the error count will keep climbing as the bundle is retried.
You may have a bug when assigning timestamps to your data. Make sure that the resolution of your timestamps is correct (see the sketch at the end of this answer)!
Although unlikely, it is possible that you've hit a bug in Dataflow. If neither of the other tips helps, please open a support ticket.
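As an illustration of the timestamp-resolution point above, here is a hedged Beam Python sketch; the ts_millis field name is made up, and the point is only the seconds-vs-milliseconds conversion.

import apache_beam as beam
from apache_beam import window

# Beam Python expects element timestamps in seconds since the epoch. If your
# source carries milliseconds (the "ts_millis" field here is made up), passing
# them through unconverted puts events far in the future and can leave the
# watermark stuck.
def with_event_time(record):
    return window.TimestampedValue(record, record["ts_millis"] / 1000.0)

# ... | beam.Map(with_event_time) | beam.WindowInto(window.FixedWindows(60)) | ...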

How to send warning email when build queue exceeds a particular length?

I manage a Jenkins server with a few hundred projects in the whole ecosystem. Many of the projects rely on upstream servers, that, unfortunately, are not always responsive. When I have a lag on these servers, my build queue can get to 10 or more. Is there a plugin or setting to send a warning email when the build queue exceeds a particular length?
I have been unable to find a plugin that does this, but you can query Jenkins for the information as detailed here: Jenkins command to get number of builds in queue.
If you have a Jenkins slave available you could set up a job that runs every 15 minutes and just hit each of the other Jenkins servers with the API call to get build queue counts (this is easy if you have just one master and many slaves.)
If you wanted to stay completely outside of Jenkins (not add another job to the mix) you could write a script to poll the Jenkins API for the information. You could then run that script under, say, a 15 minute (or some other relevant time step) timer using cron (or windows scheduled task). Admittedly then you have to dedicate some resources to running this job.
It looks like you could use Python to get the build queue and check the length of the returned list, for example with the python-jenkins library's get_queue_info().
I haven't mucked about with the Jenkins API much myself so I'm not sure offhand exactly what the script would need, but it should be simple enough once you dig into it.
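For example, here is a hedged sketch using python-jenkins; the server URL, credentials, mail settings, and the threshold of 10 are all placeholders to adjust for your environment.

# Sketch of a cron-driven check: warn by email when the Jenkins build queue
# grows beyond a threshold. URL, credentials, addresses, and threshold are
# placeholders.
import smtplib
from email.message import EmailMessage

import jenkins  # the python-jenkins package

THRESHOLD = 10

server = jenkins.Jenkins("https://jenkins.example.com",
                         username="login", password="api-token")
queue_length = len(server.get_queue_info())

if queue_length > THRESHOLD:
    msg = EmailMessage()
    msg["Subject"] = f"Jenkins build queue is at {queue_length}"
    msg["From"] = "jenkins-monitor@example.com"
    msg["To"] = "team@example.com"
    msg.set_content(f"The build queue has {queue_length} items (threshold {THRESHOLD}).")
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)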

Jenkins - monitoring the estimated time of builds

I would like to monitor the estimated time of all of my builds to catch the cases where this value is shown as 'N/A'.
In these cases the build gets stuck (probably due to network issues in my environment) and it won't start new builds for that job until killed manually.
What I am missing is how to get that data for each job, either from the API or some other source.
I would appreciate any suggestions.
Thanks.
For each job, you can click "Trend" on the job run history table, and it will show you the currently executing progress along with a graph of "usual" execution times.
Using the API, you can go to http://jenkins/job/<your_job_name>/<build_number>/api/xml (or /json) and the information is under <duration> and <estimatedDuration> fields.
Finally, there is a Jenkins Timeout Plugin that you can use to automatically take care of "stuck" builds.
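If you want to automate that check, here is a rough sketch against the JSON API. The Jenkins URL and credentials are placeholders, and treating a missing or -1 estimatedDuration as "N/A" is an assumption you should verify on your own instance.

# Sketch: list jobs whose currently running build has no usable estimated
# duration. URL/credentials are placeholders; interpreting a missing or -1
# estimatedDuration as "N/A" is an assumption to verify.
import requests

JENKINS = "https://jenkins.example.com"
AUTH = ("login", "api-token")

jobs = requests.get(
    f"{JENKINS}/api/json?tree=jobs[name,lastBuild[number,building,estimatedDuration]]",
    auth=AUTH).json()["jobs"]

for job in jobs:
    last = job.get("lastBuild") or {}
    if last.get("building") and last.get("estimatedDuration", -1) <= 0:
        print(f"{job['name']} #{last.get('number')}: estimated duration N/A")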

Trigger a build asynchronously in Jenkins

I have a job A running in Jenkins, which kicks off a process A on a VM, waits for it to finish, picks up the report generated by it and sends it as an attachment to the build notification. The problem is that this process A takes too long to finish and job A keeps waiting on it. Is there any way I can start this process A, stop job A, and when process A is done, trigger a new job B which would pick up the report generated by process A and send it out with the build success/failure status?
Any help is appreciated.
Thanks
Jenkins provides an API for kicking off jobs via simple HTTP requests. You kick off job B using curl or something like that, as the final step in process A on the VM.
The docs are on the Jenkins site. You can use your own Jenkins instance to find the specific URLs for kicking off particular jobs; there's a link in the bottom right-hand corner of the Jenkins page.
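A hedged sketch of that final step using Python's requests (curl works just as well): the URL, job name, credentials, and parameter name are placeholders, and depending on your Jenkins security settings you may also need a CSRF crumb.

# Sketch: kick off job B over HTTP as the last step of process A on the VM.
# URL, credentials, and the REPORT_PATH parameter are placeholders.
import requests

JENKINS = "https://jenkins.example.com"
AUTH = ("login", "api-token")

resp = requests.post(
    f"{JENKINS}/job/job-B/buildWithParameters",
    auth=AUTH,
    params={"REPORT_PATH": "/var/reports/processA.html"},  # made-up parameter
)
resp.raise_for_status()  # 201 Created means the build was queued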
Perhaps an even better match for your use case would be a job of type "Monitor an external job". I have not used it myself, but from the documentation it sounds like a useful tool. The docs are at: https://wiki.jenkins-ci.org/display/JENKINS/Monitoring+external+jobs
