Multiple slave processes on Jenkins slave machine

I have a master and multiple slave machines. The slaves are configured to start via ssh from the master. Initially, there is a single java process running slave.jar. Occasionally, I'll log in to a slave and find that there are two or even sometimes three java processes running slave.jar. This is while no jobs are running.
How many slave processes should be running when the slave is idle?
tomcat 54054 53913 0 Sep02 ? 00:00:00 bash -c cd "/var/hudson" && java -jar slave.jar
tomcat 54055 53914 0 Sep02 ? 00:00:00 bash -c cd "/var/hudson" && java -jar slave.jar
tomcat 54080 54054 1 Sep02 ? 01:11:45 java -jar slave.jar
tomcat 54081 54055 2 Sep02 ? 01:44:17 java -jar slave.jar

I had the same problem... and realized after hours(!) of investigation that our backup system (a second master) was online and connected to the same slaves (a backup had been installed there for testing).
Each time-triggered build was triggered at nearly the same time and failed completely at random in about a third of the build jobs executed.
So perhaps there's really another master connecting to your slaves?
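To check whether that is happening, look at where each agent process on the slave actually comes from. These are generic Linux commands (nothing Jenkins-specific), so adjust to taste:
ps -ef --forest | grep -B2 "[s]lave.jar"   # shows which sshd session spawned each slave.jar
sudo ss -tnp | grep sshd                   # shows which remote (master) IPs hold SSH sessions
If a second master is connected, its IP will show up among the established SSH sessions.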

Related

Jenkins installation issue through Ansible playbook

I have written an Ansible playbook to install CloudBees Jenkins on our production VMs. On the first run the playbook completes successfully, but Jenkins is not up and running; on the second run of the same playbook, Jenkins comes up fine.
I manually downloaded jenkins.war (2.138.x) and moved it to the VM. After the first playbook run I logged into the VM and checked the Jenkins home directory: it contained only one folder, war. After the second run, Jenkins was up, and the home directory contained all the configuration files. Checking the logs, I noticed the first run shows no JVM lines at all, while the second run does show some JVM-related lines. Java is installed and the environment variables are set.
- hosts: all
  become: true
  become_user: user1
  tasks:
    - name: installing the jenkins
      shell: nohup java -DJENKINS_HOME=/user1/jenkins -jar /user1/jenkins.war &
I expect the CloudBees Jenkins installation to be up and running after the first run of the Ansible playbook.
- hosts: all
  become: true
  become_user: user1
  tasks:
    - name: installing the jenkins
      shell: cd /user1 ; nohup java -DJENKINS_HOME=/user1/jenkins -jar jenkins.war &
First change the current directory to the directory that contains the jenkins.war file, and then run the nohup command on the remote server.
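If you also want the playbook run to fail when Jenkins never actually comes up (rather than only checking that the java process was launched), you can add a follow-up check that polls the web UI. This is only a sketch, assuming Jenkins listens on the default port 8080 on the same host:
# Wait up to ~2 minutes for Jenkins to answer; fail otherwise.
for i in $(seq 1 60); do
  curl -sf -o /dev/null http://localhost:8080/login && exit 0
  sleep 2
done
echo "Jenkins did not come up in time" >&2
exit 1
You could run this as another shell task right after the one that starts jenkins.war.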

Create Jenkins slave from command line

I have a script that creates a VM, and then I need to add the new VM as a slave to an existing master.
How can I do that from the command line on Windows, and from the new VM?
Create a VM (OS = WINDOWS)
Create a node on the Jenkins server from the command line using the Jenkins CLI:
ssh -l fmud -p %JNLP_PORT% %jenkins_IPADDRESS% create-node %TARGET_WIN_NODE_NAME% < config.xml
Where config.xml can be exported from an existing node (substitute its name for %EXISTING_NODE_NAME%) with the command:
ssh -l fmud -p %JNLP_PORT% %jenkins_IPADDRESS% get-node %EXISTING_NODE_NAME% > config.xml
Note: the JNLP port is needed for the Jenkins CLI and can be configured on the Jenkins administration page.
Also, you will have to edit the config file to change the name, labels, etc. (a scripted version of this export/edit/import is sketched after these steps).
Connect the Windows slave as a service:
Download winsw.exe
Rename it to jenkins_slave.exe
Download the agent jar, slave.jar (http://yourserver:port/jnlpJars/agent.jar)
Download and configure jenkins_slave.xml (the base name must match the exe for winsw)
Download and configure jenkins_slave.exe.config
In cmd: jenkins_slave.exe install
jenkins_slave.exe start
If it is a Java slave, this is simple to automate; ask a more specific question if you need more detail.
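If you want to script the node cloning end to end from a shell (for example Git Bash on the new VM), the export / edit / import can be piped together. A rough sketch, with the host, port, user and node names as placeholders:
JENKINS_HOST=jenkins.example.com      # placeholder: your master
CLI_PORT=12345                        # placeholder: the JNLP/CLI port
EXISTING_NODE=existing-win-slave      # placeholder: a node to copy
NEW_NODE=new-win-slave                # placeholder: the node to create
# Export the existing node's definition, rename it, and feed it to create-node.
ssh -l fmud -p "$CLI_PORT" "$JENKINS_HOST" get-node "$EXISTING_NODE" \
  | sed "s|<name>$EXISTING_NODE</name>|<name>$NEW_NODE</name>|" \
  | ssh -l fmud -p "$CLI_PORT" "$JENKINS_HOST" create-node "$NEW_NODE"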

Jenkins Slave Offline Node Connection timed out / closed - Docker container - Relaunch step - Configuration is picking Old Port

Jenkins version: 1.643.2
Docker Plugin version: 0.16.0
In my Jenkins environment, I have a Jenkins master with 2-5 slave node servers (slave1, slave2, slave3).
Each of these slaves is configured in the Jenkins global configuration using the Docker Plugin.
Everything is working at this minute.
Our monitoring system threw some alerts for high swap usage on slave3 (for example, IP 11.22.33.44), so I SSHed to that machine and ran sudo docker ps, which listed the Docker containers currently running on slave3.
By running ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -10 on the target slave machine (where 4 containers were running), I found that the top processes eating all the RAM were the java -jar slave.jar processes running inside each container. So I thought, why not restart one of them and recoup some memory. The output below shows the state of sudo docker ps before and after the docker restart <container_instance> step. Scroll right and you'll notice that in the second line, for the container ID ending in ...0a02, the port published on the host (slave3) machine (listed under PORTS) was 1053, mapped to the container's port 22 for SSH. What this means is that when you Relaunch a slave's container from the Jenkins Manage Nodes section, Jenkins connects to the host IP 11.22.33.44:1053 and does whatever it needs to bring the slave up. So Jenkins is holding that port (1053) somewhere.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae3eb02a278d docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 26 hours ago Up 26 hours 0.0.0.0:1048->22/tcp lonely_lalande
d4745b720a02 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1053->22/tcp cocky_yonath
bd9e451265a6 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1050->22/tcp stoic_bell
0e905a6c3851 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1051->22/tcp serene_tesla
sudo docker restart d4745b720a02; echo $?
d4745b720a02
0
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae3eb02a278d docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 26 hours ago Up 26 hours 0.0.0.0:1048->22/tcp lonely_lalande
d4745b720a02 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up 4 seconds 0.0.0.0:1054->22/tcp cocky_yonath
bd9e451265a6 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1050->22/tcp stoic_bell
0e905a6c3851 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1051->22/tcp serene_tesla
After running sudo docker restart <instanceIDofContainer>, I ran free -h and grep -i swap /proc/meminfo and found that RAM (which had been almost fully used, with only 230MB free) now had 1GB free, and swap (1G total, 1G used; I tried swappiness of both 60 and 10) now had 450MB free. So the alert got resolved. Cool.
BUT, as you can see from the sudo docker ps output above, after the restart step that container (ID ending in ...0a02) got a new port: 1054!
When I went to Manage Nodes, took this node offline, stopped it, and relaunched it, Jenkins did NOT pick up the new port (1054). It is still somehow using the old port 1053 while trying to make an SSH connection to 11.22.33.44 (the host's IP) on port 1053 (which used to be mapped to the container's port 22 for SSH).
How can I change this port or configuration in Jenkins for this slave container so that Jenkins will see the new PORT and can successfully relaunch?
PS: Clicking "Configure" on the node to see its configuration shows nothing but the Name field. A regular slave's configuration page has many more fields (labels, remote root directory, launch method, environment variables, tool locations for the slave environment), but for these Docker containers I only see Name. Clicking Test Connection in the Jenkins global configuration (under the Docker Plugin section) successfully reports Docker version 1.8.3.
Right now, since port 1053 no longer answers (it is 1054 for this container instance after the restart step), the Jenkins relaunch step fails at the SSH connection step (the first thing it does when launching via the SSH method).
[07/27/17 17:17:19] [SSH] Opening SSH connection to 11.22.33.44:1053.
Connection timed out
ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Connection is not established!
at com.trilead.ssh2.Connection.getRemainingAuthMethods(Connection.java:1030)
at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPasswordAuthenticator.canAuthenticate(TrileadSSHPasswordAuthenticator.java:82)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:207)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:169)
at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1212)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[07/27/17 17:19:26] Launch failed - cleaning up connection
[07/27/17 17:19:26] [SSH] Connection closed.
OK. Zeeesus!
In JENKINS_HOME (on the MASTER server), I searched for the config file holding the OLD port number for the container node(s) that were now showing as OFFLINE.
I changed into the nodes folder inside $JENKINS_HOME and found that there is a config.xml file for each node.
For ex:
$JENKINS_HOME/nodes/<slave3_node_IP>-d4745b720a02/config.xml
Resolution Steps:
Edited the file in vim to replace the old port with the new one.
Manage Jenkins > Reload configuration from Disk.
Manage Nodes > Selected the particular node which was OFFLINE.
Relaunched the slave, and this time Jenkins picked up the new port and started the container slave as expected (the SSH connection to the new port succeeded after the configuration change).
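If you have to do this more than once, the same fix can be scripted on the master. This is only a sketch: the node directory and port numbers are placeholders, and it assumes the ssh-slaves launcher stores the port in a <port> element, so check your own config.xml first:
NODE_DIR="$JENKINS_HOME/nodes/11.22.33.44-d4745b720a02"   # placeholder node directory
OLD_PORT=1053
NEW_PORT=1054
# Swap the port in the node's config.xml, keeping a backup.
cp "$NODE_DIR/config.xml" "$NODE_DIR/config.xml.bak"
sed -i "s|<port>$OLD_PORT</port>|<port>$NEW_PORT</port>|" "$NODE_DIR/config.xml"
# Reload the configuration from disk (same as the Manage Jenkins button);
# add whatever credentials your CLI setup needs.
java -jar jenkins-cli.jar -s http://localhost:8080/ reload-configuration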
The docker-plugin page at https://my.company.jenkins.instance.com/projectInstance/docker-plugin/server/<slave3_IP>/, which lists all the containers running on a given slave machine in a table, has a button (last column) to STOP a given slave's container, but not to START or RESTART it.
A START or RESTART button there would do in one click roughly what I did above.
Better solution:
What was happening is that all 4 long-lived container nodes running on slave3 were competing for all the available RAM (11-12GB), and over time the JVM process for each container (the java -jar slave.jar that the Relaunch step starts inside the container on the slave3 server) tried to take as much RAM as it could. That led to low free memory, so swap got used, and eventually used up to the point where the monitoring tool started sending notifications.
To fix this situation:
1) Under the Jenkins global configuration (Manage Jenkins > Configure System > Docker Plugin section), for that slave server's image / Docker template, under the Advanced Settings section, you can add JVM options telling the container's agent not to compete for all the RAM. Adding heap-limiting JVM options helped; they keep the heap of each container in a smaller box so it does not starve out the rest of the system (an illustrative sketch follows this list).
You can start with 3-4GB, depending on how much total RAM the machine running the container-based slave nodes has.
2) Look for a more recent version of slave.jar; it may have performance / maintenance enhancements that will help.
3) Integrate your monitoring solution (Icinga or whatever you have) so that it automatically launches a Jenkins job that runs some remediation (a Bash one-liner, a bit of Python or Groovy, an Ansible playbook, etc.) for any such alert.
4) Automatically relaunch long-lived container slave nodes (take the slave offline, bring it online, then Relaunch) to bring them back to a fresh state. All you have to do is find an idle slave (one not running any job), then take it offline > online > Relaunch via the Jenkins REST API from a small Groovy script, and put all of that in a Jenkins job (a sketch follows this list).
5) Or spin up container-based slaves on the fly - a use-and-throw-away model where a fresh slave is created each time Jenkins queues a job.
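For step 1, the original answer does not reproduce the exact JVM options, so treat the following only as an illustrative sketch of the kind of heap-capping flags meant, sized for roughly 11-12GB of RAM shared by four containers; tune the numbers for your hosts:
# Example value for the Docker template's JVM options field (numbers are assumptions):
# -Xms256m -Xmx2g
# With that, each container's agent is effectively started as:
java -Xms256m -Xmx2g -jar slave.jar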
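For step 4, the offline / online / relaunch cycle can also be driven over HTTP; the answer mentions a small Groovy script, but here is a rough shell equivalent. The URL, credentials and node name are placeholders, and toggleOffline / launchSlaveAgent are the endpoints behind the UI buttons - verify them (and whether you need a CSRF crumb) on your Jenkins version:
JENKINS_URL=https://jenkins.example.com        # placeholder
NODE=11.22.33.44-d4745b720a02                  # placeholder: node name as under Manage Nodes
AUTH=user:apitoken                             # placeholder: user and API token
# Only touch the node if it is idle.
curl -s -u "$AUTH" "$JENKINS_URL/computer/$NODE/api/json" | grep -q '"idle":true' || exit 0
# Take it offline, bring it back online, then relaunch the agent.
curl -s -u "$AUTH" -X POST "$JENKINS_URL/computer/$NODE/toggleOffline" --data-urlencode "offlineMessage=scheduled relaunch"
curl -s -u "$AUTH" -X POST "$JENKINS_URL/computer/$NODE/toggleOffline"
curl -s -u "$AUTH" -X POST "$JENKINS_URL/computer/$NODE/launchSlaveAgent"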

How to have all Jenkins slave tasks executed with nice?

We have a number of Jenkins jobs which may get executed over Jenkins slaves. Is it possible to globally set the nice level of Jenkins tasks to make sure that all Jenkins tasks get executed with a higher nice level?
Yes, that's possible. The "trick" is to start the slave agent with the proper nice level already; all Jenkins processes running on that slave will inherit that.
Jenkins starts the slave agent via ssh, effectively running a command like
cd /path/to/slave/root/dir && java -jar slave.jar
On the Jenkins node config page, you can define a "Prefix Start Slave Command" and a "Suffix Start Slave Command" to have this nice-d. Set as follows:
Prefix Start Slave Command: nice -n 10 sh -c '
Suffix Start Slave Command: '
With that, the slave startup command becomes
nice -n 10 sh -c 'cd "/path/to/slave/root/dir" && java -jar slave.jar'
(A positive nice value lowers the priority, which is the "higher nice level" asked for; a negative value such as -10 would raise the priority and requires root.)
This assumes that your login shell is a bourne shell. For csh, you will need a different syntax. Also note that this may fail if your slave root path contains blanks.
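To confirm that the agent and the builds it spawns really inherit the niceness, check the NI column on the slave (plain ps, nothing Jenkins-specific):
ps -o pid,ni,user,args -C java   # the slave.jar agent and any forked build JVMs should show NI 10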
I usually prefer to "Launch slave via execution of command on the Master", and invoke ssh myself from within a shell wrapper. Then you can select cipher and client of choice, and also setting niceness can be done without Prefix/Suffix kludges and without whitespace pitfalls.
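As an illustration of that wrapper approach, here is a minimal sketch of a launch command for "Launch slave via execution of command on the master"; the user and host are placeholders:
#!/bin/sh
# Open the SSH connection ourselves and start the agent under nice on the slave.
exec ssh -o BatchMode=yes jenkins@slave.example.com \
  'cd /path/to/slave/root/dir && exec nice -n 10 java -jar slave.jar'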

First execution of Docker on a new EC2 Jenkins Slave does not work

I'm using the EC2 Plugin in Jenkins to spin up slave instances when we need them. Recently I've wanted to play around with Docker so I installed it on the AMI we use as a slave - but the first run on the slave never seems to work.
+ docker ps
time="2015-04-17T15:38:20Z" level="fatal" msg="Get http:///var/run/docker.sock/v1.16/containers/json: dial unix /var/run/docker.sock: no such file or directory. Are you trying to connect to a TLS-enabled daemon without TLS?"
Any runs after this seem to work - why won't the slave work on the first job? I've tried using sudo, executing docker ps before docker build but nothing seems to fix the problem.
The problem is that Jenkins only waits for the slave to respond to an SSH connection; it does not check that Docker is running.
To prevent the slave from coming "online" too quickly, put a check in the "Init Script" field of the EC2 plugin's slave configuration. Here's an example of the one I use against the base AMI.
while [[ -z $(/sbin/service docker status | grep " is running...") && $sleep_counter -lt 300 ]]; do sleep 1; ((sleep_counter++)); echo "Waiting for docker $sleep_counter seconds - $(/sbin/service docker status)"; done
Amazingly, it can take up to 60 seconds between the slave coming up and the Docker service starting, so I've set the timeout to be 5 minutes.
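As an aside: on newer, systemd-based AMIs the /sbin/service output may not contain the " is running..." string, so the grep never matches. A rough equivalent for such images (an assumption, not tested against the AMI in the question):
sleep_counter=0
until systemctl is-active --quiet docker || [ $sleep_counter -ge 300 ]; do
  sleep 1
  sleep_counter=$((sleep_counter + 1))
  echo "Waiting for docker: $sleep_counter seconds"
done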
