TFS: not all agent machines run (any) tests during build

We have a set of approximately 1000 (currently) tests, written in C#, that run twice a week. We are using TFS 2017 update 1 (on-premise), and the system is configured to run tests on 6 VMs that are set up on another machine. 4 of those VMs are running Windows Server 2012R2, and the other 2 are Windows Server 2016. All are fully up-to-date with updates. Development is done using VS 2017. All tests are straight, non-UI functional tests that use instances of SQL server on each machine, with no cross-machine communication or anything like that during tests. Each machine is its own self-contained environment.
All the tests run fine locally (or if they fail we know why and that's fine). When we launch a build, the actual build and other preliminary steps seem to go fine. When it gets to the "Run Functional Tests" step, though, only SOME of the agent machines begin running them, typically 2 of the 6, sometimes 3, and not usually the same ones. Looking at Task Manager I can see that the other VMs have no processor activity to speak of. I have no idea why this should be, or why it seems to be different machines that successfully fire up and begin running tests from run to run.
The kicker is that because not all machines are running, this puts us over the 6-hour test run limit, which I have seen dealt with in other threads. I have tried everything suggested in those threads (including the setting in the .runsettings file) and can't get it to let go of that timeout, so our whole run cancels after 6 hours with a lot of the tests aborted. For some reason last week we got a clean run of all the tests. Nothing had changed since the prior run, and it reverted to this behavior with the next run.
Any insight on either the "lazy" machines not running tests OR removing the 6 hour timeout would be much appreciated...
Additional note: this behavior started "out of the blue", on or about August 9. Until then, all machines performed as desired. No software upgrades or anything else suspect happened anywhere near that time. Though the 6-hour time limit thing has been bugging me for a year or so.

There is an execution option called "Distribute tests by number of machines":
Instead of distributing tests at the assembly level, enabling this setting will distribute tests based on the number of machines irrespective of the container assemblies given to the task.
Source Link
Note that tests within a single dll might also be distributed to multiple machines.
To bypass the 6-hour limitation, you could set runTimeout in the testsettings file to the duration you need (the value is in milliseconds, so the 36000000 below is 10 hours). Test runs should then be able to continue past 6 hours.
<Execution>
  <Timeouts runTimeout="36000000" testTimeout="5400000" />
  <AgentRule name="Execution Agents">
  </AgentRule>
</Execution>
For more details, please refer to the answer in this thread: TFS 2015 vNext : Test Run always abort after 6 hours

Okay, after going back through everything thoroughly, I discovered that there were actually TWO problems in my setup. The elusive one was that my runsettings file did NOT have the ForcedLegacyMode=True, which is apparently necessary for it to even look for the testsettings file. Once I corrected that, I discovered that it was not correctly pointing to my testsettings file. There is a further problem after that in that it's not discovering any tests now, but I think that might be due to a different problem, so I suspect the problem as originally stated is now resolved. Thanks for the suggestions!
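For anyone hitting the same two problems, here is a minimal sketch of a runsettings file that both enables legacy mode and points at a testsettings file. The file name my.testsettings is a placeholder, not the asker's actual file; the path needs to match wherever the testsettings file really lives.

```xml
<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
  <MSTest>
    <!-- Placeholder file name; point this at your own testsettings file. -->
    <SettingsFile>my.testsettings</SettingsFile>
    <!-- Without this, the testsettings file (and its runTimeout) is not picked up. -->
    <ForcedLegacyMode>true</ForcedLegacyMode>
  </MSTest>
</RunSettings>
```

If the timeout still appears to be ignored after this, a wrong or unresolved SettingsFile path is worth checking first, since that was the second problem described above.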

Related

TFS 2017 gets stuck when the Visual Studio Test task tries to publish results

We have a TFS 2017 build agent executing a Visual Studio Test task to execute our unit tests. This has worked fine for several years, but all of a sudden - without any code changes - the task gets stuck.
All the tests have finished running, we see summary information, and it will sit at what appears to be the place where it would normally publish the results... but then nothing happens. We've waited 12+ hours for it to finish. This step normally takes about 90 minutes.
I've confirmed that the TRX file is being created. It's about 4MB in size. We're running a bit over 3000 unit tests.
I've also tried disabling code coverage and attachments upload inside the test task, but it doesn't appear to make a difference.
Below is a screen cap of the log output when the step is stuck.
Lastly, we have lots of other projects on this server whose tests run / publish fine, as well as TFS Releases for this same build that also run tests (integration/system tests) which work without issue.
UPDATE: We ran this build on a different build server, and it published tests correctly. So this means there is something wrong with this specific build server...
UPDATE 2: So I'm no longer sure what is happening here. The original build server we were having issues on is now working fine with no changes whatsoever. It just started working again. The other build server was working, and then stopped with the same issue. I broke up the 3000+ tests into two steps, roughly 50/50, and that worked a couple of times, but now does not. So this does not appear to be server specific, nor does it appear to be related to the quantity of tests. Debug logging offers nothing useful, as everything seems fine right up until it just stops doing anything after generating the TRX file.
UPDATE 3: Well, it's happening again. I'm not sure how to proceed. I even tried Fiddler on the build box to see if I could catch funky looking traffic, but most of the traffic I'd expect to see I don't. It's like a good chunk of the work isn't being captured (such as source downloads, reporting progress, or test result publishing) by Fiddler. Is it not over HTTP/HTTPS?
This was difficult to figure out due to the quantity of tests we're running, but I was able to narrow it down to a test that launched ping.exe:
// The test passes only if Execute throws a TimeoutException.
[ExpectedException(typeof(TimeoutException))]
[TestMethod]
public void ProcessWillTimeout()
{
    // "ping /t" keeps pinging until it is stopped, so the 500 ms timeout always fires.
    const string command = "cmd";
    const string args = "/C ping 127.0.0.1 /t";
    var externalProcessService = new ExternalProcessService();
    externalProcessService.Execute(command, args, TimeSpan.FromMilliseconds(500));
}
For whatever reason, this test was leaving both conhost.exe and ping.exe "orphaned". The fact that these processes were not terminating was, for an unknown reason, preventing the tests from publishing their results back to TFS. There is probably something somewhere that waits for a process to finish, and that was never happening.
Indeed, we would see a bunch of conhost.exe and ping.exe processes in both Task Manager and Process Explorer:
You'll notice the tool tip there... "[Error opening process]". I couldn't even use Process Explorer to kill these processes - although Task Manager could. Sure enough, when I killed them, the TFS build task would immediately resume and finish publishing results.
So there is clearly some kind of bug in that ExternalProcessService code we were testing (despite carefully having a finally block that terminated the process), but we are at least able to have our build tests run again without issue.
I suggest you abandon this build and trigger it again, to narrow down whether the issue can be reproduced consistently.
According to your description, all other builds work properly, and this one worked fine for several years. All tests pass and the test report is written, but the task hangs. Please double check whether some other processes are not shutting down properly.
Also try running the build on another build agent. In addition, try creating a new build definition with the same settings and triggering that definition; this may do the trick.
Moreover, you could enable verbose logging for troubleshooting. To do this, simply add a build variable named system.debug and set its value to 'true'; the log will then contain more detailed information.

Getting the name of the development machine at compile time?

I'm building an iOS app that communicates with a server. We have a test / staging server, a production server and each dev has a local instance of the server for development.
I've added some simple logic which configures the address of the server depending on whether we're running a TestFlight build, an App Store build or a debug build (for development). For the development build, the app tries to hit localhost, which is all well and good if we're running on the Simulator, but not so great if we're running on device.
I'm aware of ngrok, which is a possible solution, but since the exposed URL is partially randomly generated (for the free version at least), it's not a great fit. I was thinking that a workable approach for development could be to check the name of the development machine at compile time and insert this value. But I'm not sure how to achieve this, if it's possible at all. I remember doing compile time variable filtering using ant / maven and environment property files back in my Java days, but I'm wondering if there's a fairly straightforward way to achieve this in Xcode.
Can anybody shed any light on this?
So I carried on digging, and went with the following solution. Elements of this have been touched upon in numerous other posts here.
I added a new header file called HostNameMacroHeader.h to my project.
I added a 'Run Script' phase to my build, before the 'Compile Sources' phase. The script contains the following:
echo "//***AUTOGENERATED FILE***" > "${SRCROOT}/MyAppName/HostNameMacroHeader.h"
echo "#define BUILD_HOST_NAME @\"`hostname`\"" >> "${SRCROOT}/MyAppName/HostNameMacroHeader.h"
Then in my implementation, where I want to use the server address, I use the generated BUILD_HOST_NAME macro.
It's a somewhat hacky solution, but it does the job for now. Suggestions and cleaner versions are welcome.
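As one possible cleaner version, the two echo lines could be wrapped in a small function, roughly as sketched below. The function name write_host_header is made up for illustration, the uname -n fallback is an addition not in the original script, and the sketch assumes the macro should expand to an Objective-C string literal (hence the @).

```shell
#!/bin/sh
# Sketch of a Run Script phase body. Xcode sets SRCROOT; MyAppName is the
# project folder used in the post. write_host_header is a hypothetical helper.
set -eu

write_host_header() {
    # $1: path of the header file to (re)generate
    out="$1"
    # Fall back to uname -n if the hostname command is unavailable.
    host="$(hostname 2>/dev/null || uname -n)"
    {
        printf '%s\n' '// ***AUTOGENERATED FILE***'
        # Assumption: the macro is consumed as an Objective-C NSString literal.
        printf '#define BUILD_HOST_NAME @"%s"\n' "$host"
    } > "$out"
}

# Only write into the project when running inside an Xcode build.
if [ -n "${SRCROOT:-}" ]; then
    write_host_header "${SRCROOT}/MyAppName/HostNameMacroHeader.h"
fi
```

Quoting the path keeps the script working when SRCROOT contains spaces, and writing both lines in a single redirect means the header is never left half-written between the two commands.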

Why does my build hang when using Jenkins?

I have a build which hangs in Jenkins. I have deconstructed the build down to a single Windows command call, a directory change and a couple of echoes in an effort to isolate the problem. It would appear that the problem lies with a single call to a program executable (now the only call in my build). The build calls the program and then hangs for 30+ minutes (I cancel the build after this time) when it should take less than 1 second. Ordinarily I would be inclined to blame the executable or my misuse thereof, but for the fact that the same call (quite literally copied and pasted) in an ordinary command prompt works perfectly fine. Further muddying the waters, I know that the build I'm trying to implement is working just fine on another Jenkins server, executable and all, and has never had an issue. I'm sorry I can't provide details on the executable in question, but it's sensitive information. It may very well be the case that the executable is to blame, but the exact same call in three environments and only one hangs? What do you think?
More info on request.
So there I was, waiting for a reply when I decided to click on one of the "Related" topics on the right hand side. Lo and behold, there it was, a solution to my problem in answer to a different question. In short it goes like this:
Jenkins is a service. Services (on Windows) have a "Log On" account, of which the default appears to be "Local System". This had the effect of forcing (though I don't know why) my program out of quiet mode and thus hanging the build. Changing the "Log On" account to my own seems to have changed the behavioural relationship between Jenkins and the executable so that it now runs smooth and silent.

One build definition won't generate fakes assemblies, another one does

Introduction
I have a problem with Team Foundation Server Express 2013 on my machine. I have two build definitions on the same controller and agent, both of which run on the same server and the same environment as well.
It should be noted that I already looked at the "similar questions" without any luck. This is clearly not related to the same root cause, and the symptoms are slightly different too.
One of them is a gated check-in build definition, which just compiles everything when committing to the development branch.
Another is a scheduled build definition, which runs every Saturday at 3 AM, building any changes that may have been committed to the main branch since last time.
The gated build definition has a process (which only has minor changes for not running tests and just compiling the code) based on the TfvcTemplate.12.xaml template.
The scheduled build definition process is based on some Azure build definition template that might come from an older version of Visual Studio, possibly based on some Azure template, or maybe the TfvcContinuousDeployment.12.xaml template.
The issue
My gated build definition runs just as expected, without issues. It compiles the full solution, and only passes if the compilation succeeds.
The scheduled build definition, however, fails to compile (even before it reaches the point where it runs the unit tests). The error I see is as follows.
Obviously this is due to missing fakes assemblies. I tried taking the assemblies and checking them in (which I would rather avoid), only to find that this build definition runs just fine, but not the gated one which ran just fine before.
I thought about just running fakes.exe in the build template to just generate them manually before compiling, but in my initial tests (to see if this theory would even work), it won't even run in the commandline, and outputs some errors and warnings that I don't understand (but are probably not relevant anyway, since I might be running fakes.exe with improper arguments).
Updates
Update #1
It should be noted that I have Visual Studio 2013 Ultimate installed on my build server as well. Both this (and TFS 2013 Express) have Update 3 installed, and the server is fully updated.
I ended up abandoning fakes altogether, and implementing Moq instead. Works a lot better, and forces me to abandon shims or moles, which are often considered bad practice anyway.

TFS2010: Publishing test results takes forever

We have a TFS2010 instance that is hosted overseas. We have a proxy locally so pulling source code is really no problem. Even full gets are quite speedy. One issue that we are experiencing is that when an automated build completes it takes forever for the test results to be published back to the server. The entire build and test process takes about 5 minutes. The publishing of the test results takes an additional 10 minutes. Twice the time it took to get the source, build it, and run all 1500 unit tests!
Is there any way I can speed this up?
Try setting up a build agent at your location and set a name filter in the build definition to always select that build agent.
