Jenkins job stuck on '[EnvInject] - Loading node environment variables'

I've had a problem where I send a job to one of my nodes and the job sometimes gets stuck on
[EnvInject] - Loading node environment variables.
The issue started when I moved my Jenkins machine from Amazon EC2 to an Azure VM. Every time it occurred, I stopped and relaunched the agent on the node, only for it to freeze again within a few minutes or hours. Usually I would stop the job after a few minutes, but if I let it run, after ~15 minutes I would receive the following:
FATAL: java.io.IOException: Unexpected termination of the channel
I tried the following solutions, but none of them helped:
Rename the job
Create a new job (which runs just a basic command)
Uncheck the 'Use TCP_NODELAY flag on the SSH connection'
Adding '-Dhudson.slaves.ChannelPinger.pingInterval=2' to the JVM options
My architecture at the time was:
Jenkins as docker container on Azure VM (Ubuntu 18.04)
Jenkins ver 2.202
SSH Build Agents plugin version 1.31.1
Node on MacOS Mojave

The solution that mostly solved the problem (it now occurs once every few days at most) was adding ClientAliveInterval 120 to /etc/ssh/sshd_config and restarting com.openssh.sshd:
sudo launchctl stop com.openssh.sshd
sudo launchctl start com.openssh.sshd
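
For anyone who wants to script the sshd_config change instead of editing the file by hand, here is a minimal sketch. The function name set_client_alive_interval is made up for illustration. It removes any active ClientAliveInterval line and writes the new one at the top of the file, on the assumption that sshd uses the first value it finds for a keyword and that a line appended at the end could land inside a Match block.

```shell
# Hypothetical helper: set ClientAliveInterval in an sshd_config-style file.
# Commented-out defaults (e.g. "#ClientAliveInterval 0") are left untouched.
set_client_alive_interval() {
    config_file=$1
    seconds=$2
    tmp_file=$(mktemp) || return 1
    {
        # New directive first, so it is in global scope and seen first
        printf 'ClientAliveInterval %s\n' "$seconds"
        # Copy everything except existing active ClientAliveInterval lines
        grep -Evi '^[[:space:]]*ClientAliveInterval[[:space:]]' "$config_file" || true
    } > "$tmp_file"
    # Overwrite in place so the original file keeps its owner and mode
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}
```

It would be run as root, e.g. set_client_alive_interval /etc/ssh/sshd_config 120, followed by the launchctl stop/start shown above.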

Related

Trying to run a backup on another Jenkins server

I recently had some issues with a version upgrade of my Jenkins server. Before upgrading the server, the first step I took was to create a backup:
sudo tar -zcvf /tmp/jenkins.tgz /var/lib/jenkins
Then I copied the archive from server A and untarred it on another server, server B. I can see all the files of server A [workspace, config.xml, jobs] on server B in /var/lib/jenkins.
When I logged into the Jenkins box, it showed:
Jenkins detected that you appear to be running more than one instance of Jenkins that share the same home directory '/var/lib/jenkins'. This greatly confuses Jenkins and you will likely experience strange behaviors, so please correct the situation.
This Jenkins:
490566619 contextPath="" at 8779@jm1597185631ybr.cloud.phx3.gdg
Other Jenkins:
1998724099 contextPath="" at 20292@jm1584048540yxl.cloud.phx3.gdg
So, I stopped the jenkins service using:
sudo service jenkins stop
Then I restarted the service using
sudo service jenkins restart
All the jobs suddenly started to appear. I have the following questions:
Why did the jobs start to show up without the error about running multiple instances?
If the version is the only issue, why can't the newly provisioned server have the updated version? Is it that when I copy the files from server A, the files on server B get overwritten, and hence it shows the same version upgrade error?
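
A rough sketch of how the archive step could be done so that it is easier to restore on a second server. This is a guess, not a confirmed fix: it assumes the multiple-instance warning is tied to the .owner marker file that Jenkins keeps in its home directory, and so leaves that file out of the archive. The function name archive_jenkins_home is made up for illustration.

```shell
# Hypothetical helper: archive a Jenkins home directory for another server.
# Assumption: .owner is the marker Jenkins uses to notice a second instance
# on the same home directory, so it is excluded from the copy.
archive_jenkins_home() {
    jenkins_home=$1      # e.g. /var/lib/jenkins
    archive=$2           # e.g. /tmp/jenkins.tgz
    parent=$(dirname "$jenkins_home")
    name=$(basename "$jenkins_home")
    # -C stores paths relative to the parent directory, so the archive
    # can be unpacked under any directory on the target server
    tar -czf "$archive" --exclude='.owner' -C "$parent" "$name"
}
```

Usage would be something like sudo archive_jenkins_home /var/lib/jenkins /tmp/jenkins.tgz on server A, then tar -xzf /tmp/jenkins.tgz -C /var/lib on server B. It is probably safer to stop Jenkins on both servers first, so that files do not change while they are being archived or overwritten.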

Running jenkins on ubuntu subsystem of Windows

I was able to install and run Jenkins on the Linux subsystem in Windows 10.
It listens on port 8082.
Unfortunately, for an unknown reason, it hangs indefinitely after a few minutes (or, to be precise, after I've made a change in a job config and executed a build).
Then, I checked in the terminal:
root@jup1t3r /h/navds# service jenkins status
Correct java version found
2 instances of jenkins are running at the moment
but the pidfile /var/run/jenkins/jenkins.pid is missing
root@jup1t3r /h/navds# service jenkins stop
Correct java version found
* Stopping Jenkins Automation Server jenkins
...done.
root@jup1t3r /h/navds# service jenkins status
Correct java version found
2 instances of jenkins are running at the moment
but the pidfile /var/run/jenkins/jenkins.pid is missing
So there seems to be no way to stop Jenkins through the service script. How can I restart it?
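
When the pidfile is missing, one workaround is to stop the processes directly by PID. The sketch below is untested on the Windows subsystem. The function name stop_pid is made up for illustration, and the pgrep pattern in the comment assumes the Jenkins command line contains jenkins.war. It sends TERM first and only escalates to KILL if the process is still there after a timeout.

```shell
# Hypothetical helper: stop one process, politely first, then forcibly.
stop_pid() {
    pid=$1
    timeout=${2:-10}     # seconds to wait after TERM before sending KILL
    # If TERM cannot be delivered, the process is gone or not ours to signal
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 only checks whether the PID can still be signalled
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Possible usage, assuming Jenkins was started from jenkins.war:
#   for pid in $(pgrep -f 'jenkins\.war'); do stop_pid "$pid"; done
```

Once no Jenkins process is left, service jenkins start should be able to create a fresh pidfile.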

Session logging in /etc/profile

The session logging we have in /etc/profile can interfere with services that launch sub-shells as new users. Specifically, it always launches an interactive terminal, regardless of the context, which can prevent certain key processes (e.g. Jenkins) from performing critical tasks.
We had a Jenkins version upgrade, and after the upgrade Jenkins seems unable to restart. Here's what's happening:
```
ubuntu@hostname:~$ sudo service jenkins status
Correct java version found
Jenkins Automation Server is not running
ubuntu@hostname:~$ sudo service jenkins start
Correct java version found
Starting Jenkins Automation Server jenkins jenkins@hostname:~$
jenkins@hostname:~$
jenkins@hostname:~$ sudo service jenkins status
[sudo] password for jenkins:
jenkins@hostname:~$ exit
exit
[fail]
ubuntu@hostname:~$
```
Essentially, it seems that "service jenkins start" is somehow causing a session to be created, which drops into an interactive shell instead of starting the daemon. I suspect this is because /etc/profile contains a script-based session logger, and that Jenkins is executing this script when it su's into its own jenkins user.
What should I do to alleviate this?
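
One possible approach is to guard the logging stanza so it only runs for interactive shells. This is a sketch under assumptions: it presumes the init script starts Jenkins through a login shell for the jenkins user, which reads /etc/profile but is not interactive. The function name should_log_session and the variable SESSION_LOG_ACTIVE are made up for illustration.

```shell
# Hypothetical guard for the session-logging stanza in /etc/profile.
# The shell's option flags ($-) are passed in as an argument so the check
# can be exercised outside a real login.
should_log_session() {
    shell_flags=$1
    # Interactive shells have "i" in their option flags
    case $shell_flags in
        *i*) ;;
        *) return 1 ;;
    esac
    # Do not start a second recorder inside an already recorded session
    [ -z "${SESSION_LOG_ACTIVE:-}" ] || return 1
    return 0
}

if should_log_session "$-"; then
    SESSION_LOG_ACTIVE=1
    export SESSION_LOG_ACTIVE
    # Start the existing session recorder from /etc/profile here.
fi
```

With a guard like this, a non-interactive login shell such as the one used to start a service would skip the recorder and go on to run its command.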

jenkins on demand slaves windows

The on-demand slaves are being created successfully from Jenkins. The first build on a slave succeeds, but subsequent builds fail. Restarting the slave or restarting the WinRM service allows builds to proceed again.
The tcpdump shows no errors. I can't figure out what the issue is. It looks like Jenkins has problems communicating with the on-demand slaves over WinRM.
Has anybody faced a similar issue?
The on-demand slaves are Windows slaves.
The issue was with the "MaxMemoryPerShellMB" parameter of WinRM. It was set too low, so when npm or git was doing a checkout, it ran out of memory in the WinRM shell.
I have increased it to 1GB and it's working fine now.

Is this a supported Jenkins Master-slave configuration?

We have a master Jenkins running on a Linux system. The same machine is attached as a node using "Launch slave via execution of a command on the master". It has the same FS root as the JENKINS_HOME. The command is ssh "machine_name" "shell_script"
The shell script gets the latest slave.jar and runs it.
The master has 0 executors. The node has been given 7. I'm seeing weird behavior in the builds, like workspaces being deleted once a day, etc. I'm not sure if this is related to the way the Jenkins Master-slave is configured.
Any ideas if this is a supported configuration?

Resources