How can I call a shell script to start a backend Java process? - jenkins

After a Jenkins job completes, I execute a Linux shell script from Jenkins' post-build configuration section.
This shell script needs to launch a standby service in the background and must NOT cause Jenkins to hang.
I tried using "nohup ... &" and the like, but it does not work.
Is there a good way to do it?

Jenkins is probably waiting for some pipes to close. Your background process has inherited some file descriptors and is keeping them open for as long as it runs.
If you are lucky, the only file descriptors are 0, 1 and 2 (the standard ones). You might want to check the file descriptors of the background process using lsof -p PID, where PID is its process id.
You should make sure all of those file descriptors (both inputs and outputs) are redirected for the background process, so start it with something like:
nohup daemon </dev/null >/dev/null 2>&1 &
Feel free to direct the output to a file other than /dev/null, but make sure you keep the order of the redirections; the order is important.
If you plan to start background processes from a Jenkins job, be advised that Jenkins will kill background processes when the build ends. See https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller on how to prevent that.

I had a similar problem with running a shell script from Jenkins as a background process. I fixed it by using the below command:
BUILD_ID=dontKillMe nohup ./start-fitnesse.sh &
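Combining the two answers, a post-build shell step could look like this sketch (the script name is the one from this answer; the redirections follow the first answer above):
# detach stdin/stdout/stderr and tell the ProcessTreeKiller to leave the process alone
BUILD_ID=dontKillMe nohup ./start-fitnesse.sh </dev/null >/dev/null 2>&1 &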

Related

Using SLURM to run TCP client, server

I have a Docker image that needs to be run in an environment where I have no admin privileges, using Slurm 17.11.8 in RHEL. I am using udocker to run the container.
In this container, there are two applications that need to run:
[1] ROS simulation (there is a rosnode that is a TCP client talking to [2])
[2] An executable (TCP server)
So [1] and [2] need to run together, and they share some common files as well. Usually, I run them in separate terminals. But I have no idea how to do this with SLURM.
Possible Solution:
(A) Use two containers of the same image, but their files will be stored locally; I could use volumes instead. But this requires me to change my code significantly and might break compatibility when I am not running it in containers (e.g. in Eclipse).
(B) Use a bash script to launch two terminals and run [1] and [2]. Then srun this script.
I am looking at (B) but have no idea how to approach it. I looked into other approaches but they address sequential executions of multiple processes. I need these to be concurrent.
If it helps, I am using xfce-terminal though I can switch to other terminals such as Gnome, Konsole.
This is a shot in the dark since I don't work with udocker.
In your slurm submit script, to be submitted with sbatch, you could allocate enough resources for both jobs to run on the same node (so you just need to reference localhost for your client/server). Start your first process in the background with something like:
udocker container_name container_args &
The & should start the first container in the background.
You would then start the second container:
udocker 2nd_container_name more_args
This would run without & to keep the process in the foreground. Ideally, when the second container completes, the script would complete and Slurm cleanup would kill the first container. If both containers come to an end cleanly, you can put a wait at the end of the script (a sketch of a full submit script follows the caveats below).
Caveats:
Depending on how Slurm is configured, processes may not be properly cleaned up at the end. You may need to capture the PID of the first udocker as a variable and kill it before you exit.
The first container may still be processing when the second completes. You may need to add a sleep command at the end of your submission script to give it time to finish.
Any number of other gotchas may exist that you will need to find and hopefully work around.
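Putting the pieces together, a submit script might look roughly like the sketch below. The container names, commands, sleep time and resource values are hypothetical placeholders, and the udocker run invocations are only an assumption about how your containers are started:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2

# start the TCP server container [2] in the background and remember its PID
udocker run server_container ./tcp_server &
SERVER_PID=$!

# give the server a moment to start listening before the client connects
sleep 10

# run the ROS simulation / TCP client [1] in the foreground; the script continues when it exits
udocker run client_container ./run_ros_client.sh

# clean up the background server before the job ends
kill "$SERVER_PID" 2>/dev/null
wait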

jenkins kills ssh session when supervisord restarts

I'm using Jenkins to do a few actions on a remote server.
I have an Execute Shell command in which I do the following:
sudo ssh <remote server> 'sudo service supervisor restart'
sleep 30
When Jenkins reaches the first line I can see 'Restarting Supervisor', but after a moment Jenkins closes the ssh connection and moves on to the second line.
I tried adding a 'sleep 30' after the restart command but it still doesn't work.
It seems Jenkins doesn't wait for the supervisor restart command to complete.
The problem is that it doesn't happen every time, just sometimes, but it causes a lot of problems when it fails.
I think you can never be certain all processes started by supervisord are in a 'ready' state after a restart. Even if the restart action waited for the processes to be started, it wouldn't know whether they are 'ready'.
In docker-compose setups that need to know whether a certain service is available, I've used an extra 'really ready' check for this, optionally in a loop with a sleep/wait. If the process that you are starting opens a port, you can use one of the variations of 'wait-for' for this.
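For example, a crude 'really ready' loop along these lines could follow the restart (the host and port are placeholders for whatever the supervised service actually listens on):
# poll for up to ~60 seconds until the supervised service accepts TCP connections
for i in $(seq 1 60); do
    if nc -z remote-server 8080; then
        break
    fi
    sleep 1
done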

Hooking subprocess logs into main log output in Docker

I'm running SGE (Sun Grid Engine) in a Docker container in order to replicate our live SGE cluster. If you haven't run across it, SGE is basically a program that runs other programs (while managing resources across a cluster - i.e. a grid scheduler). That is of course in conflict with the docker "one process per container" philosophy (and if you follow down this path of reasoning far enough you'll think "why use a grid scheduler rather than just sticking docker containers on Swarm or Kubernetes or something", and you'd be right, only I can't change our whole scheduling infrastructure to fix this problem, sadly).
So, I'm trying to get the logs out of those programs run by SGE and into the general docker log. The qsub command (which submits jobs to the SGE queue for running) takes arguments which allow you to specify the location of STDOUT and STDERR.
The best attempt that I have managed so far is to start the two main processes (sge_execd and sge_qmaster) via a script which never terminates (good old tail -f /dev/null), and then do something like the following:
qsub -o /proc/1/fd/1 -e /proc/1/fd/2 my_script
This horrific hack is hooking into the file descriptors of process 1 (i.e. our tail -f-ing, sge-invoking process), which happens to have its STDOUT and STDERR connected to the docker logs, as you might expect.
This feels pretty nasty, though. Can someone suggest a better way of achieving this?
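One alternative, still a sketch and with hypothetical paths, is to let qsub write to ordinary files and have the container's long-running foreground process stream them, so the output reaches the docker logs without touching /proc:
# have SGE write job output to known files
mkdir -p /var/log/sge
touch /var/log/sge/jobs.out /var/log/sge/jobs.err
qsub -o /var/log/sge/jobs.out -e /var/log/sge/jobs.err my_script

# in the container entrypoint, replace 'tail -f /dev/null' with
tail -F /var/log/sge/jobs.out /var/log/sge/jobs.err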

Using SSH (Scripts, Plugins, etc) to start processes

I'm trying to finish a remote deployment by restarting the two processes that make my Python App work. Like so
process-one &
process-two &
I've tried to "Execute a Shell Script" by doing this
ssh -i ~/.ssh/id_... user@xxx.xxx ./startup.sh
I've tried using the Jenkins SSH Plugin and the Publish Over SSH Plugin to do the same thing. All of the previous steps (stopping the processes, restarting other services, pulling in new code) work fine. But when I get to the part where I start the services, it executes those two lines, and none of the plugins or the default script execution can get off of the server. They all either hang until I restart Jenkins, or time out in the case of the Publish Over SSH plugin. So my build either requires a restart of Jenkins, or is marked unstable.
Has anyone had any success doing something similar? I've tried
nohup process-one &
But the same thing has happened. It's not that the services are messing up either, because they actually start properly; it's just that Jenkins doesn't seem to understand that.
Any help would be greatly appreciated. Thank you.
What probably happens is that the process, when spawned (even with the &), is consuming the same input and output as your ssh connection. Jenkins is waiting for these pipes to be emptied before the job closes, and thus waits for the processes to exit. You can verify that by killing your processes: you will see that the Jenkins job terminates.
Dissociating outputs and starting the process remotely
There are multiple solutions to your problem:
(preferred) Use proper daemon control tools. Your target platform probably has a standard way to manage those services, e.g. init.d scripts. Note: when writing init.d scripts, make sure you detach the process into the background AND ensure the input/output of the daemon are detached from the shell that starts them. There are several techniques and tools, such as start-stop-daemon (http://www.unix.com/man-page/Linux/8/start-stop-daemon/), daemonize, daemontools, or something like the shell script described under https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins+as+a+Unix+daemon (take note of the su -s bin/sh jenkins -c "YOUR COMMAND; ...disown" etc). A minimal start-stop-daemon sketch follows this list, and I also list some Python-specific techniques below.
ssh server 'program < /dev/null > /dev/null 2>&1 &'
or
ssh server 'program < /dev/null >> logfile.log 2>&1 &' if you want crude output management (no log rotation, etc.)
potentially using setsid (I haven't tried) https://superuser.com/questions/172043/how-do-i-fork-a-process-that-doesnt-die-when-shell-exits . In my quick tests I wasn't able to get it to work though...
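For the first (preferred) option, a minimal start-stop-daemon sketch could look like this; the pidfile path and program path are placeholders:
# start the service detached from the calling shell, writing a pidfile for later
start-stop-daemon --start --background \
    --make-pidfile --pidfile /var/run/process-one.pid \
    --exec /opt/myapp/process-one

# the matching stop, using the same pidfile
start-stop-daemon --stop --pidfile /var/run/process-one.pid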
Python daemons
The initial question was focused on SSH, so I didn't fully describe how to run the Python process as a daemon. This is mostly covered by other techniques:
with start-stop-daemon: start-stop-daemon and python
with upstart on ubuntu: Run python script as daemon at boot time (Ubuntu)
some more Python-oriented approaches:
How to make a Python script run like a service or daemon in Linux
Can I run a Python script as a service?

How many threads should Jenkins run?

I have a Jenkins server that keeps running out of memory and cannot create native threads. I've upped the memory and installed the Monitoring plugin.
There are about 150 projects on the server, and I've been watching the thread count creep up all day. It is now around 990. I expect when it hits 1024, which is the user limit for threads, Jenkins will run out of memory again.
[edit]: I have hit 1016 threads and am now getting the out of memory error
Is this an appropriate number of threads for Jenkins to be running? How can I tell Jenkins to destroy threads when it is finished with them?
tl;dr:
There was a post-build action running a bash script that didn't return anything via stderr or stdout to Jenkins. Therefore, every time the build ran, threads would be created and get stuck waiting. I resolved this issue by having the bash script return an exit status.
long answer
I am running Jenkins on CentOS and installed it via the RPM. In terms of modifying the Winstone servlet container, you can change that in Jenkins' init script configuration in /etc/sysconfig/jenkins. However, the Winstone options listed in the other answer only control the number of HTTP threads that are created, not the number of threads overall.
That would be a solution if my threads were hanging on accessing an HTTP API of Jenkins as part of a post-commit action. However, using the ever-handy Monitoring plugin mentioned in my question, I inspected the stuck threads.
The threads were stuck on something in the com.trilead.ssh2.channel package. The getChannelData method has a while(true) loop that looks for output on the stderr or stdout of an ssh stream. The thread was getting stuck in that loop because nothing was coming through. I learned this on GrepCode.
This was because the post-build action was to SSH onto a server and execute a bash script that would inspect a git repo. However, the git repo was misconfigured and the git command would error, but the exit 1 status did not bubble up through the bash script (partially due to a malformed if-elif-else statement).
The script completed and the build was considered a success, but somehow the thread handling the SSH connection from Jenkins was left hanging due to this git error.
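For reference, the shape of the fix was simply to let the git failure propagate as the script's exit status. A minimal sketch (not the actual script, and with a placeholder repo path) looks like this:
# fail the post-build script if the git command fails, instead of swallowing the error
if ! git -C /srv/myrepo fetch origin; then
    echo "git fetch failed" >&2
    exit 1
fi
exit 0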
But thank you for your help on this question!
If you run Jenkins "out of the box" it uses the Winstone servlet container. You can pass command-line arguments to it as described here. Some of those parameters can limit the number of threads:
--handlerCountStartup = set the no of worker threads to spawn at startup. Default is 5
--handlerCountMax = set the max no of worker threads to allow. Default is 300
--handlerCountMaxIdle = set the max no of idle worker threads to allow. Default is 50
Now, I tried this some time ago and was not 100% convinced that it worked, so no guarantees, but it is worth a try.
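If you installed from the RPM as described in the other answer, those flags would go into the Jenkins startup arguments, roughly like this (the variable name can differ between package versions, and the values are only illustrative):
# /etc/sysconfig/jenkins
JENKINS_ARGS="--handlerCountMax=100 --handlerCountMaxIdle=20"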
