So I was messing around with Jenkins for a laugh, trying to see what a slave's tolerance was for network interruptions. In other words, how long a network outage could last without disconnecting the slave from the master.
What I found was that if I brought down the interface on the slave using ifdown, it stayed connected to the master. The master didn't complain even when I kept the interface down for 15 seconds, i.e. 15 seconds during which neither ping nor SSH was possible.
However, when I rebooted the slave, the master instantly reported that the slave was offline/disconnected/gone (I can't remember the exact terminology).
So why doesn't bringing down the interface this way bother Jenkins? How is the connection between master and slave retained? Is there something special about the ifdown command?
I'm running a Jenkins server and some slaves on a Docker swarm that's hosted on preemptible Google Compute Engine instances (akin to AWS spot instances). I've got everything set up so that at any given moment there is a Jenkins master running on a single server and slaves running on every other server in the swarm. When one server gets terminated, another is spun up to replace it; eventually Jenkins comes back up on another machine even if its server was stopped, and slaves get replaced as they die.
I'm facing two problems:
The first is that when the Jenkins master dies and comes back online, it tries to resume the jobs that were previously running, and they end up stuck mid-build. Is there any way to have Jenkins automatically restart interrupted jobs instead of trying to resume them?
The second is that when a slave dies, I'd like any jobs that were running on it to be automatically restarted elsewhere. Is there any way to do that?
Currently I'm dealing with both situations by having an external application retry the failed build jobs, but that's not really optimal.
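For what it's worth, the core of an external retry application like the one mentioned above can be sketched roughly like this. The field names (`number`, `building`, `result`) match what Jenkins' JSON API (`/job/<name>/api/json?tree=builds[number,building,result]`) actually returns; the retry policy itself is an assumption, not anything Jenkins prescribes:

```python
# Hypothetical helper for an external retry service: given the build list
# from a job's JSON API, pick the build numbers worth re-triggering.
def builds_to_retry(builds):
    retry = []
    for b in builds:
        # A build still flagged "building" after a master restart is usually
        # a resumed-but-stuck build; ABORTED/FAILURE often means a dead slave.
        if b["building"] or b["result"] in ("ABORTED", "FAILURE"):
            retry.append(b["number"])
    return sorted(retry)
```

The service would then POST to each selected build's job to re-trigger it; that part (authentication, crumbs, parameters) is omitted here.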
Thanks!
I have a Jenkins master setup with 2 Linux slaves and a Windows slave, configured so that all boxes are switched off at night and restarted in the morning. In the morning the Jenkins master shows the 2 Linux nodes, but not the Windows slave (it just disappears; it isn't even shown as offline). The Jenkins version I am using is 2.73.
The problem was related to the swarm configuration. It was resolved by putting together the correct configuration files and enabling the swarm client on machine startup (to handle the situation where the machine goes down).
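As a sketch of the "enable on machine startup" part for a Windows slave: a scheduled task can launch the swarm client at boot. The jar path, master URL, and node name below are placeholders, not values from the original setup:

```bat
schtasks /Create /TN "JenkinsSwarmClient" /SC ONSTART /RU SYSTEM ^
  /TR "java -jar C:\jenkins\swarm-client.jar -master http://ci.example.com/jenkins -name windows-slave"
```

Running it under SYSTEM at ONSTART means the node re-registers with the master every time the machine comes back up, which is what makes it reappear after the overnight shutdown.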
The on-demand slaves are created successfully from Jenkins. The first build on a slave succeeds, but subsequent builds fail. Restarting the slave, or restarting the WinRM service, allows builds to proceed again.
A tcpdump shows no errors, and I can't figure out what the issue is. It looks like a problem with Jenkins communicating with the on-demand slaves over WinRM.
Has anybody faced a similar issue?
The on-demand slaves are Windows slaves.
The issue was with the "MaxMemoryPerShellMB" parameter of WinRM, which was set too low. When npm or git did a checkout, it ran out of memory in the WinRM shell.
I increased it to 1 GB and it's working fine now.
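For reference, one way to raise that limit to match the 1 GB mentioned above (run in an elevated command prompt on the slave):

```bat
winrm set winrm/config/winrs @{MaxMemoryPerShellMB="1024"}
```

The equivalent from PowerShell is `Set-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB 1024`. Either way, the WinRM service picks up the new per-shell memory quota for subsequent remote shells.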
We have a large number of Jenkins slaves set up via the jenkins-swarm plugin (which auto-discovers the master).
Recently we started to see slaves going offline and existing jobs getting stuck. We fixed it by restarting the master, but it has been happening too frequently. Everything else seems fine: no network issues, no GC issues.
Anyone?
On the Jenkins slaves, we see this repeating once the node becomes unavailable:
Retrying in 10 seconds
Attempting to connect to http://ci.foobar.com/jenkins/ 46185efa-0009-4281-89d2-c4018fa9240c with ID 5a0f1773
Could not obtain CSRF crumb. Response code: 404
Failed to create a slave on Jenkins CODE: 409
A slave called '10.9.201.89-5a0f1773' is already created and on-line
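As context for the "Could not obtain CSRF crumb" line: the swarm client fetches a crumb from Jenkins' crumb issuer (`/crumbIssuer/api/json`, a real Jenkins endpoint) and sends it as a header on its registration POST; a 404 there, followed by the 409 "already created" error, suggests the client is retrying a registration the master thinks already succeeded. A minimal sketch of the crumb-to-header step (the HTTP fetching and retry logic are omitted; only the response field names are the real API):

```python
import json

def crumb_header(crumb_response_body):
    """Turn the crumb issuer's JSON body into the header dict for the next POST.

    The fields crumbRequestField and crumb are what /crumbIssuer/api/json
    actually returns; everything around this function is left as an exercise.
    """
    data = json.loads(crumb_response_body)
    return {data["crumbRequestField"]: data["crumb"]}
```

For example, a body of `{"crumbRequestField": "Jenkins-Crumb", "crumb": "abc123"}` yields the header `{"Jenkins-Crumb": "abc123"}`.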
Does the server maintain a constant/continuous connection with the client throughout the build process, or is the interaction between server and client connectionless?
That is, does it open a connection, distribute the build, and close the connection, with the client opening a new connection to the master after the build is over to report back?
A connection is maintained between the master and slave, if only so the console output can be displayed on the master in real time (and perhaps for other status reporting as well).
Apart from that, the build (i.e., the process executing the build) is self-contained and executes independently on the slave machine.
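Because that channel is an ordinary TCP socket, this also explains the ifdown observation in the first question: a downed interface sends no FIN or RST, so TCP just retransmits silently until the link returns, and the peer only notices a truly dead endpoint via retransmission timeouts or keepalive probes. A reboot, by contrast, tears the socket down and actively closes (or resets) the connection. A minimal Python sketch of the kind of keepalive tuning involved (the values are illustrative, TCP_KEEPIDLE is Linux-specific, and this is not Jenkins' actual code):

```python
import socket

def keepalive_socket(idle_secs=60):
    """Return a TCP socket that probes an idle peer instead of waiting forever."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Enable keepalive probing: the OS will periodically check the peer is alive.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-only knob: send the first probe after idle_secs of silence.
    if hasattr(socket, "TCP_KEEPIDLE"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_secs)
    return s
```

Without keepalive enabled (the default), an established connection with no traffic can sit through an interface outage indefinitely, which is consistent with the master not noticing a 15-second ifdown.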