TFS 2015 "Build Job Timeout" results in no logs - tfs

We have a max execution time set for tests, but frankly this option is about as much use as a chocolate teapot.
When the execution time exceeds this limit, the whole build fails and all subsequent steps are aborted, so the "Publish Test Results" step never executes, so you get absolutely no information whatsoever to help you work out WHY it exceeded the timeout period.
Can anyone suggest an alternative?
I was thinking of maybe trying to implement the timeout as part of the test code itself - does anyone know if this is possible? If I launch a thread that monitors for the timeout, and if it is hit, then...?
Could I just have the test terminate its own process?

Build job timeout in minutes
Specify the maximum time a build job is allowed to execute on an agent before being canceled by the server. Leave it empty or at zero if you want the job to never be canceled by the server.
This timeout covers all tasks in your build definition. If the test step runs over the limit, the server cancels the whole build definition, so the build fails and all subsequent steps are aborted.
For your requirement, I suggest leaving this value at 0 and enabling "Continue on error" for the test step, as the comment suggested. With that set, if the step fails, all following steps still run and the step and the overall build are marked as partially succeeded. You can then get the information you need to troubleshoot the failed task.
If your tests cannot judge for themselves whether they have timed out, another option is to create a custom build task that collects the execution time of the test task (by reading the build log) and then passes or fails that custom step using Logging Commands.
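As a rough illustration of the Logging Commands idea, here is a minimal Python sketch of a wrapper script that a build step could run in place of the raw test command. It enforces its own timeout and, when the limit is hit, prints `##vso[task.logissue ...]` and `##vso[task.complete ...]` lines for the agent to interpret. The function names, the wrapper approach, and the message wording are assumptions for illustration, not something from the original answer.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run the test command; return (timed_out, exit_code)."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
        return False, completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        # Grandchild processes are NOT killed by this call.
        return True, None


def logging_commands(timed_out, exit_code, timeout_seconds):
    """Build the logging-command lines to print on stdout (empty on success)."""
    if timed_out:
        return [
            f"##vso[task.logissue type=error;]Tests exceeded {timeout_seconds} s and were terminated",
            "##vso[task.complete result=Failed;]Test run timed out",
        ]
    if exit_code != 0:
        return [
            f"##vso[task.logissue type=error;]Test runner exited with code {exit_code}",
            "##vso[task.complete result=Failed;]Test run failed",
        ]
    return []


def main(command, timeout_seconds):
    """Hypothetical entry point: run the tests, then report to the agent."""
    timed_out, exit_code = run_with_timeout(command, timeout_seconds)
    for line in logging_commands(timed_out, exit_code, timeout_seconds):
        print(line)
    # Non-zero exit marks the step as failed; "Continue on error" lets later steps run.
    sys.exit(1 if timed_out or exit_code != 0 else 0)
```

With "Continue on error" enabled on this step, a timeout fails only the step, so a later "Publish Test Results" step still gets a chance to run.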

Related

Jenkins job gets stuck in queue. How to make it fail automatically if the worker is offline

So on my Jenkins, my worker "slave02" sometimes goes offline and has to be manually unstuck. I will not go into details, because that's not the point of this question.
The scenario so far:
I've configured a job intentionally to get processed on that exact worker. But obviously it would not start since the worker is offline. I want to get notified when that job gets stuck in queue. I've tried to use Build Timeout Jenkins Plugin and I've configured it to fail the build if it waits for longer than 5 minutes to complete the job.
The problem with this is that the plugin makes sure the job fails 5 minutes after the build gets started... which does not help in my case. Because the job doesn't start, rather it sits in queue waiting to get processed but that never happens. So my question is - is there a way to make the job check if that worker is down to just automatically fail the build and send notification?
I am pretty sure that can be done but I could not find a thread where this type of scenario is being discussed.

Jenkins remote API - wait for build to finish and get output?

When using Jenkins CLI, I can use the build command with options -v and -s to run a build, waiting for it to finish and printing its output.
Is there any way I can achieve the same result (wait for execution and get job output) with a single call to the REST API? I know this can be done by polling for build status until it finishes and then requesting its output, but I want to know if there is a straightforward option for short-running jobs.
You can do something like that, but even if you do, you can't reuse the same code for other jobs. The build may have to wait for the next available executor, or you may run into race conditions. Holding a REST API call open for that long is not a good option, and nobody recommends it.
So instead of looking for a single REST API call, stick with polling but make it smarter. Rather than polling every second, use the results of previous builds to predict roughly when the build will finish, and start polling around then. Alternatively, you can use the estimated remaining build time that Jenkins reports. Hope this makes sense.
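A minimal Python sketch of the polling approach, assuming the build URL is already known and ends with a slash (for example `http://jenkins/job/<your_job_name>/<build_number>/`). It reads the `building` and `result` fields from the build's `api/json` and then fetches `consoleText`. Authentication is left out, and the function names, the fixed poll interval, and the `fetch`/`sleep` parameters are illustrative assumptions.

```python
import json
import time
import urllib.request


def fetch_text(url):
    """Fetch a URL and return the body as text (no authentication handling)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def wait_for_build(build_url, fetch=fetch_text, poll_seconds=5.0,
                   max_polls=120, sleep=time.sleep):
    """Poll a build until it finishes; return (result, console_output)."""
    for _ in range(max_polls):
        info = json.loads(fetch(build_url + "api/json"))
        # A finished build has building == false and a non-null result.
        if not info.get("building") and info.get("result") is not None:
            return info["result"], fetch(build_url + "consoleText")
        sleep(poll_seconds)
    raise TimeoutError("build did not finish within the polling budget")
```

A fixed interval is the simplest choice here. The prediction idea above could be layered on top by sleeping for most of the expected build time before the first poll.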

Jenkins - monitoring the estimated time of builds

I would like to monitor the estimated time of all of my builds to catch the cases where this value is shown as 'N/A'.
In these cases the build gets stuck (probably due to network issues in my environment) and it won't start new builds for that job until killed manually.
What I am missing is how to get that data for each job, either from api or other source.
I would appreciate any suggestions.
Thanks.
For each job, you can click "Trend" on the job run history table, and it will show you the currently executing progress along with a graph of "usual" execution times.
Using the API, you can go to http://jenkins/job/<your_job_name>/<build_number>/api/xml (or /json) and the information is under <duration> and <estimatedDuration> fields.
Finally, there is a Jenkins Timeout Plugin that you can use to automatically take care of "stuck" builds.
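To show how the API fields could be used for monitoring, here is a small Python sketch that classifies one build from its `api/json` data. It assumes that an `estimatedDuration` of -1 (or any non-positive value) is what the UI shows as 'N/A', and that `timestamp` is the build start time in milliseconds since the epoch. The function name, the category labels, and the overrun factor are hypothetical.

```python
def classify_build(info, now_ms, overrun_factor=3.0):
    """Classify a build dict parsed from <build_url>/api/json.

    Returns "finished", "no-estimate", "overrun" or "ok".
    """
    if not info.get("building"):
        return "finished"
    estimated = info.get("estimatedDuration", -1)
    # Assumption: a non-positive estimate is what the UI renders as 'N/A'.
    if estimated <= 0:
        return "no-estimate"
    # 'duration' stays 0 while a build is running, so derive elapsed time
    # from the start timestamp instead.
    elapsed = now_ms - info["timestamp"]
    if elapsed > overrun_factor * estimated:
        return "overrun"
    return "ok"
```

A monitoring script could call this for the latest build of each job and alert on "no-estimate" or "overrun".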

Stop Jenkins schedule build if it was failed more than 10 times?

I set my Jenkins job to build automatically many times a day using the scheduler.
If the build fails, it sends mail to my team.
However, I don't want to spam the mailbox. How can I set a condition to stop the build scheduler if the build has failed more than 10 times?
Rather than scheduling the job continuously, try the continuous integration paradigm, like this:
Unconditionally schedule the job only rarely. Perhaps once per day, just to ensure that any external factors (missing resources, changed interfaces, etc.) haven't come into play.
Trigger the job when any known source or dependency changes (e.g. source code, jar in your artifact repository, DB schema change, etc.)
Use a suitable plugin to retry failures.
I recommend the Naginator plugin for this. It can nag a limited number of times, and it auto-throttles: it nags frequently to begin with, then less frequently after a protracted period of failure.
Even if you don't change how the job is triggered, Naginator is probably a good solution for you. Use it to send your emails, instead of using an unconditional on-failure step.
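Separately from Naginator, the "more than 10 failures" condition from the question can be checked with a small script. The Python sketch below is not part of any plugin. It assumes a list of build records shaped like the `builds` array from the job's `api/json` (newest first, each with a `result` field), and the function names are hypothetical.

```python
def consecutive_failures(builds):
    """Count failed builds at the head of a newest-first build list."""
    count = 0
    for build in builds:
        result = build.get("result")
        if result is None:
            continue  # build still running, ignore it
        if result != "FAILURE":
            break  # streak ends at the first non-failed build
        count += 1
    return count


def should_stop_scheduling(builds, limit=10):
    """True once the job has failed more than `limit` times in a row."""
    return consecutive_failures(builds) > limit
```

What to do once the check returns true (disable the job, mute the mail step, and so on) is left open here.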

How to disable a CodedUI Test Agent from code?

We have a service to pick up custom tests in XML and convert those to CodedUI tests. We then start a process for MSTest to load the tests into the Test Controller, which then distributes the tests across various Agents. We run regression tests at night, so no one is around to fix a system if something goes wrong. When certain exceptions occur in the test program, it pops open an error window and no more tests can run on the system. Subsequent tests are loaded into the agent and fail immediately because they cannot perform their assigned tasks. Thousands of tests that should take all night on multiple systems now fail in minutes.
We can detect that an error occurred by how quickly a test is returned, but we don't know how to disable the agent so as not to allow it to pick up any more tests.
addendum:
If the test has failed so miserably that no more tests can attempt a successful run (as noted, we may not have an action to handle some, likely new, popup), then we want to disable that agent as no more tests need to run on it: they will all fail. As we have many agents running concurrently, if one fails (and gets disabled), the load can still be distributed without a long string of failures. These other regression tests can still have a chance to succeed (everything works) or fail (did we miss another popup, or is this an actual regression failure).
2000 failures in 20 seconds doesn't say anything except that 1 system had a problem that no one realized it would have, and now we have wasted a whole night of tests. 2 failures (1 natural, 1 caused by an issue from the previous failure) and 1 system down means the total night's run might be extended by an hour or two and we have useful data on how to start the day: fix 1 test and rerun both failures.
You would need to abort the test run in that case. If you are running MSTest yourself, you would need to inject a Ctrl+C (^C) into the command-line process. But if no one is around to fix it, why does it matter that the subsequent tests fail? If it's just to see quickly which test caused the error, why not add a Coded UI check to see whether the message box is present and mark the test inconclusive with Assert.Inconclusive? The test that caused the problem would then stand out like a flag.
If you can detect the point at which you want to disable the agent, then you can disable it by running "TestAgentConfig.exe delete", which will reset the agent to an unconfigured state.
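The detection described in the question (tests coming back suspiciously fast) could be sketched in Python as below. The thresholds, function names, and the shape of the result records are assumptions for illustration. The only command used is the "TestAgentConfig.exe delete" call mentioned above, which is assumed to be on the PATH of the agent machine where the script runs.

```python
import subprocess


def agent_looks_broken(results, max_seconds=2.0, streak=3):
    """Return True if `streak` consecutive tests failed within `max_seconds` each.

    results: list of (passed, duration_seconds) tuples in execution order.
    """
    run = 0
    for passed, duration in results:
        if not passed and duration <= max_seconds:
            run += 1
            if run >= streak:
                return True
        else:
            run = 0  # a pass or a slow failure breaks the streak
    return False


def disable_agent():
    """Reset the local agent to the unconfigured state (command from the answer)."""
    # Assumption: TestAgentConfig.exe is on PATH on this agent machine.
    subprocess.run(["TestAgentConfig.exe", "delete"], check=True)
```

A caller would feed in recent results for one agent and call `disable_agent()` only when `agent_looks_broken(...)` returns true, so a single genuine regression failure does not take the agent offline.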
