Telegraf - inputs.procstat pgrep plugin issue - grep

Telegraf v1.0.1
I'm not able to see telegraf[._] (tree) metric anymore after I enabled [[inputs.procstat]] plugin.
Telegraf is installed successfully. Process is running. I'm pretty much using the normal settings for inputs plugins and output plugin.
This is what I got:
ubuntu#jenkins:/tmp/giga_aks_testing/ansible$ grep -C 2 jenkins /etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf; echo ; ps -eAf|grep jenkins; echo; pgrep -f jenkins; echo; cat -n /var/log/telegraf/telegraf.log; echo date; echo; ps -eAf|grep telegraf; echo ; sudo service telegraf status
[[inputs.procstat]]
exe = "jenkins"
prefix = "pgrep_serviceprocess"
root 2875 3685 0 2016 pts/3 00:00:00 sudo su jenkins
root 2876 2875 0 2016 pts/3 00:00:00 su jenkins
jenkins 2877 2876 0 2016 pts/3 00:00:00 bash
jenkins 11645 1 0 2016 ? 00:00:01 /usr/bin/daemon --name=jenkins --inherit --env=JENKINS_HOME=/var/lib/jenkins --output=/var/log/jenkins/jenkins.log --pidfile=/var/run/jenkins/jenkins.pid -- /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
jenkins 11647 11645 0 2016 ? 05:33:22 /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
ubuntu 21973 26885 0 06:57 pts/0 00:00:00 grep --color=auto jenkins
2875
2876
11645
11647
1 2017-01-07T06:54:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
2 2017-01-07T06:55:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
3 2017-01-07T06:56:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
4 2017-01-07T06:57:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
date
telegraf 19336 1 0 05:45 pts/0 00:00:04 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraftelegraf.d
ubuntu 21977 26885 0 06:57 pts/0 00:00:00 grep --color=auto telegraf
telegraf Process is running [ OK ]
ubuntu#jenkins:/tmp/giga_aks_testing/ansible$
Why, the log file is showing an error when the jenkins process is running and pgrep -f jenkins is returning valid result.
PS: [[inputs.procstat]] plugin uses pgrep -f <exe_value_pattern> for it's logic if pattern = method is used, and pgrep <executable> if exe = method is used.
The full /etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf file is:
[[inputs.procstat]]
exe = "jenkins"
prefix = "pgrep_serviceprocess"
[[inputs.procstat]]
exe = "telegraf"
prefix = "pgrep_serviceprocess"
[[inputs.procstat]]
exe = "sshd"
prefix = "pgrep_serviceprocess"

OK. Seems like this is an OPEN bug.
Telegraf with [[inputs.procstat]] plugin entry won't barf if there's only one plugin in one file.
If you specify multiple entries, even if those exe = <executables_processes> are running, Telegraf will start spitting those errors out (PS: It won't stop Telegraf service from working though).
To fix the errors, this is what I did:
[[inputs.procstat]]
exe = "telegraf|.*"
prefix = "pgrep_serviceprocess"
Now, as pgrep is used for Telegraf's [[inputs.procstat]] plugin, it'll do this at OS level: pgrep "telegraf|.*".
Now, you can also just give exe = "." (simplest) or like exe = ".*" but practically those will not be easy to find out who actually is trying to do a grep on all processes running on the system.
NOTE: .* (will find every single processes running on the machine), so use it until we get a proper fix for this.
Related Source code Github file: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/procstat/procstat.go
Related issue: https://github.com/influxdata/telegraf/issues/586
I still couldn't find, why "telegraf.x.x" metrics are not available after I enabled [[inputs.procstat]] input. Is that due to a separate file? I'm not sure. But, I can see procstat.x.x metric tree but telegraf.x.x metric tree is not visible now.
OR better,
One can also use:
[[inputs.procstat]]
pattern = "."
prefix = "pgrep_serviceprocess"
The above will do: pgrep -f "." where pattern is . (to catch everything aka every processs/cmd/service running on a machine).
OR (but the following is not scalable solution as you have to know for which user. In some boxes, Jenkins may be running using a user other than jenkins).
[[inputs.procstat]]
user = "jenkins"
prefix = "pgrep_serviceprocess"
The above will do: pgrep -u "jenkins" where user is jenkins (to catch everything aka every processs/cmd/service running on a machine).
To check whether jenkins is running or not or if enhanceio is running or not, you can use [[inputs.exec]] plugin as well. I simply used: [[inputs.filestat]] plugin and it worked when I looked for the pid file for both tools. https://github.com/influxdata/telegraf/tree/master/plugins/inputs/filestat

Related

jmxterm: "Unable to create a system terminal" inside Docker container

I have a Docker image which contains JRE, some Java web application and jmxterm. The latter is used for running some ad-hoc administrative tasks. The image is used on the CentOS 7 server with Docker 1.13 (which is pretty old but is the latest version which is supplied via the distro's repository) to run the web application itself.
All works well, but after updating jmxterm from 1.0.0 to the latest version (1.0.2), I get the following warning when entering the running container and starting jmxterm:
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
After this, jmxterm does not react to arrow keys (when trying to navigate through the command history), nor does it provide autocompletion.
Some quick investigation shows that the problem may be reproduced in the clean environment with CentOS 7. Say, this is how I could bootstrap the system and the container with all stuff I need:
$ vagrant init centos/7
$ vagrant up
$ vagrant ssh
[vagrant#localhost ~]$ sudo yum install docker
[vagrant#localhost ~]$ sudo systemctl start docker
[vagrant#localhost ~]$ sudo docker run -it --entrypoint bash openjdk:11
root#0c4c614de0ee:/# wget https://github.com/jiaqi/jmxterm/releases/download/v1.0.2/jmxterm-1.0.2-uber.jar
And this is how I enter the container and run jmxterm:
[vagrant#localhost ~]$ sudo docker exec -it 0c4c614de0ee sh
root#0c4c614de0ee:/# java -jar jmxterm-1.0.2-uber.jar
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
root#0c4c614de0ee:/# bea<TAB>
<Nothing happens, but autocompletion had to appear>
Few observations:
the problem does not appear with older jmxterm no matter which image do I use;
the problem arises with new jmxterm no matter which image do I use;
the problem is not reproducible on my laptop (which has newer kernel and Docker);
the problem is not reproducible if I use latest Docker (from the external repo) on the CentOS 7 server instead of CentOS 7's native version 1.13.
What happens, and why the error is reproducible only in specific environments? Is there any workaround for this?
TLDR: running new jmxterm versions as java -jar jmxterm-1.0.2-uber.jar < /dev/tty is a quick, dirty and hacky workaround for having the autocompletion and other stuff work inside the interactive container session.
A quick check shows that jmxterm tries to determine the terminal device used by the process — probably to obtain the terminal capabilities later — by running the tty utility:
root#0c4c614de0ee:/# strace -f -e 'trace=execve,wait4' java -jar jmxterm-1.0.2-uber.jar
execve("/opt/java/openjdk/bin/java", ["java", "-jar", "jmxterm-1.0.2-uber.jar"], 0x7ffed3a53210 /* 36 vars */) = 0
...
[pid 432] execve("/usr/bin/tty", ["tty"], 0x7fff8ea39608 /* 36 vars */) = 0
[pid 433] wait4(432, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 432
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
The utility fails with the status of 1, which is likely the reason for the error message. Why?
root#0c4c614de0ee:/# strace -y tty
...
readlink("/proc/self/fd/0", "/dev/pts/3", 4095) = 10
stat("/dev/pts/3", 0x7ffe966f2160) = -1 ENOENT (No such file or directory)
...
write(1</dev/pts/3>, "not a tty\n", 10not a tty
) = 10
The utility says "not a tty" while we definitely have one. A quick check shows that... There is really no PTY device in the container though the standard streams of the shell are connected to one!
root#0c4c614de0ee:/# ls -l /proc/self/fd
total 0
lrwx------. 1 root root 64 Jun 3 21:26 0 -> /dev/pts/3
lrwx------. 1 root root 64 Jun 3 21:26 1 -> /dev/pts/3
lrwx------. 1 root root 64 Jun 3 21:26 2 -> /dev/pts/3
lr-x------. 1 root root 64 Jun 3 21:26 3 -> /proc/61/fd
root#0c4c614de0ee:/# ls -l /dev/pts
total 0
crw-rw-rw-. 1 root root 5, 2 Jun 3 21:26 ptmx
What if we check the same with latest Docker?
root#c0ebd608f79a:/# ls -l /proc/self/fd
total 0
lrwx------ 1 root root 64 Jun 3 21:45 0 -> /dev/pts/1
lrwx------ 1 root root 64 Jun 3 21:45 1 -> /dev/pts/1
lrwx------ 1 root root 64 Jun 3 21:45 2 -> /dev/pts/1
lr-x------ 1 root root 64 Jun 3 21:45 3 -> /proc/16/fd
root#c0ebd608f79a:/# ls -l /dev/pts
total 0
crw--w---- 1 root tty 136, 0 Jun 3 21:44 0
crw--w---- 1 root tty 136, 1 Jun 3 21:45 1
crw-rw-rw- 1 root root 5, 2 Jun 3 21:45 ptmx
Bingo! Now we have our PTYs where they should be, so jmxterm works well with latest Docker.
It seems pretty weird that with older Docker the processes are connected to some PTYs while there are no devices for them in /dev/pts, but tracing the Docker process explains why this happens. Older Docker allocates the PTY for the container before setting other things up (including entering the new mount namespace and mounting devpts into it or just entering the mount namespace in case of docker exec -it):
[vagrant#localhost ~]$ sudo strace -p $(pidof docker-containerd-current) -f -e trace='execve,mount,unshare,openat,ioctl'
...
[pid 3885] openat(AT_FDCWD, "/dev/ptmx", O_RDWR|O_NOCTTY|O_CLOEXEC) = 9
[pid 3885] ioctl(9, TIOCGPTN, [1]) = 0
[pid 3885] ioctl(9, TIOCSPTLCK, [0]) = 0
...
[pid 3898] unshare(CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|CLONE_NEWPID) = 0
...
[pid 3899] mount("devpts", "/var/lib/docker/overlay2/3af250a9f118d637bfba5701f5b0dfc09ed154c6f9d0240ae12523bf252e350c/merged/dev/pts", "devpts", MS_NOSUID|MS_NOEXEC, "newinstance,ptmxmode=0666,mode=0"...) = 0
...
[pid 3899] execve("/bin/bash", ["bash"], 0xc4201626c0 /* 7 vars */ <unfinished ...>
Note the newinstance mount option which ensures that the devpts mount owns its PTYs exclusively and does not share them with other mounts. This leads to the interesting effect: the PTY device for the container stays on the host and belongs to the host's devpts mount, while the containerized process still has access to it, as it obtained the already-open file descriptors just in the beginning of its life!
The latest Docker first mounts devpts for the container and then allocates the PTY, so the PTY belongs to container's devpts mount and is visible inside the container's filesystem:
$ sudo strace -p $(pidof containerd) -f -e trace='execve,mount,unshare,openat,ioctl'
...
[pid 14043] unshare(CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWNET) = 0
...
[pid 14044] mount("devpts", "/var/lib/docker/overlay2/b743cf16ab954b9a4b4005bca0aeaa019c4836c7d58d6073044e5b48446c3d62/merged/dev/pts", "devpts",
MS_NOSUID|MS_NOEXEC, "newinstance,ptmxmode=0666,mode=0"...) = 0
...
[pid 14044] openat(AT_FDCWD, "/dev/ptmx", O_RDWR|O_NOCTTY|O_CLOEXEC) = 7
[pid 14044] ioctl(7, TIOCGPTN, [0]) = 0
[pid 14044] ioctl(7, TIOCSPTLCK, [0]) = 0
...
[pid 14044] execve("/bin/bash", ["/bin/bash"], 0xc000203530 /* 4 vars */ <unfinished ...>
Well, the problem is caused by inappropriate Docker behavior, but how comes that older jmxterm worked well in the same environment? Let's check (note, that Java 8 image is used here, as older jmxterm does not play well with Java 11):
root#504a7757e310:/# wget https://github.com/jiaqi/jmxterm/releases/download/v1.0.0/jmxterm-1.0.0-uber.jar
root#504a7757e310:/# strace -f -e 'trace=execve,wait4' java -jar jmxterm-1.0.0-uber.jar
execve("/usr/local/openjdk-8/bin/java", ["java", "-jar", "jmxterm-1.0.0-uber.jar"], 0x7fffdcaebdd0 /* 10 vars */) = 0
...
[pid 310] execve("/bin/sh", ["sh", "-c", "stty -a < /dev/tty"], 0x7fff1f2a1cc8 /* 10 vars */) = 0
So, older jmxterm just uses /dev/tty instead of asking tty for the device name, and this works, as this device is present inside the container:
root#504a7757e310:/# ls -l /dev/tty
crw-rw-rw-. 1 root root 5, 0 Jun 3 21:36 /dev/tty
The huge difference between these versions of jmxterm is that newer tool version uses higher major version of jline, which is the library responsible for interaction with the terminal (akin to the readline in the C world). The difference between major jline versions leads to the difference in jmxterm's behavior, and current versions just rely on tty.
This observation leads us to the quick and dirty workaround which does not require neither updating Docker nor patching the jline/jmxterm tandem: we may just attach jmxterm's stdin to /dev/tty forcibly and thus make jline use this device (which is now referenced by /proc/self/fd/0) instead of the /dev/pts entry (which, formally, is not always correct, but is still enough for ad-hoc use):
root#0c4c614de0ee:/# java -jar jmxterm-1.0.2-uber.jar < /dev/tty
Welcome to JMX terminal. Type "help" for available commands.
$>bea<TAB>
bean beans
Now we have the autocompletion, history and other cool things we need to have.
If you are trying to run an interactive application (that needs tty) inside a docker container or a pod in kubernetes, then the following should work.
For docker-compose use:
image: image-name:2.0
container_name: container-name
restart: always
stdin_open: true
tty: true
For kubernetes use:
spec:
containers:
- name: web
image: web:latest
tty: true
stdin: true

Where exactly do the logs of kubernetes pods come from (at the container level)?

I'm looking to redirect some logs from a command run with kubectl exec to that pod's logs, so that they can be read with kubectl logs <pod-name> (or really, /var/log/containers/<pod-name>.log). I can see the logs I need as output when running the command, and they're stored inside a separate log directory inside the running container.
Redirecting the output (i.e. >> logfile.log) to the file which I thought was mirroring what is in kubectl logs <pod-name> does not update that container's logs, and neither does redirecting to stdout.
When calling kubectl logs <pod-name>, my understanding is that kubelet gets them from it's internal /var/log/containers/ directory. But what determines which logs are stored there? Is it the same process as the way logs get stored inside any other docker container?
Is there a way to examine/trace the logging process, or determine where these logs are coming from?
Logs from the STDOUT and STDERR of containers in the pod are captured and stored inside files in /var/log/containers. This is what is presented when kubectl log is run.
In order to understand why output from commands run by kubectl exec is not shown when running kubectl log, let's have a look how it all works with an example:
First launch a pod running ubuntu that are sleeping forever:
$> kubectl run test --image=ubuntu --restart=Never -- sleep infinity
Exec into it
$> kubectl exec -it test bash
Seen from inside the container it is the STDOUT and STDERR of PID 1 that are being captured. When you do a kubectl exec into the container a new process is created living alongside PID 1:
root#test:/# ps -auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7 0.0 0.0 18504 3400 pts/0 Ss 20:04 0:00 bash
root 19 0.0 0.0 34396 2908 pts/0 R+ 20:07 0:00 \_ ps -auxf
root 1 0.0 0.0 4528 836 ? Ss 20:03 0:00 sleep infinity
Redirecting to STDOUT is not working because /dev/stdout is a symlink to the process accessing it (/proc/self/fd/1 rather than /proc/1/fd/1).
root#test:/# ls -lrt /dev/stdout
lrwxrwxrwx 1 root root 15 Nov 5 20:03 /dev/stdout -> /proc/self/fd/1
In order to see the logs from commands run with kubectl exec the logs need to be redirected to the streams that are captured by the kubelet (STDOUT and STDERR of pid 1). This can be done by redirecting output to /proc/1/fd/1.
root#test:/# echo "Hello" > /proc/1/fd/1
Exiting the interactive shell and checking the logs using kubectl logs should now show the output
$> kubectl logs test
Hello

Monit bundle exec rails s

I have the following shell script that allows me to start my rails app, let's say it's called start-app.sh:
#!/bin/bash
cd /var/www/project/current
. /home/user/.rvm/environments/ruby-2.3.3
RAILS_SERVE_STATIC_FILES=true RAILS_ENV=production nohup bundle exec rails s -e production -p 4445 > /var/www/project/log/production.log 2>&1 &
the file above have permissions of:
-rwxr-xr-x 1 user user 410 Mar 21 10:00 start-app.sh*
if i want to check the process I do the following:
ps aux | grep -v grep | grep ":4445"
it'd give me the following output:
user 2960 0.0 7.0 975160 144408 ? Sl 10:37 0:07 puma 3.12.0 (tcp://0.0.0.0:4445) [20180809094218]
P.S: the reason i grep ":4445" is because i have few processes running on different ports. (for different projects)
now coming to monit, i used apt-get to install it, and the latest version from repo is 5.16, as i'm running on Ubuntu 16.04, also note that monit is running as root, that's why i specified the gid uid in the following. (because the start script is used to be executed from "user" and not "root")
Here's the configuration for monit:
set daemon 20 # check services at 20 seconds interval
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
basedir /var/lib/monit/events # set the base directory where events will be stored
slots 100 # optionally limit the queue size
set mailserver xx.com port xxx
username "xx#xx.com" password "xxxxxx"
using tlsv12
with timeout 20 seconds
set alert xx#xx.com
set mail-format {
from: xx#xx.com
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION
}
set limits {
programOutput: 51200 B
sendExpectBuffer: 25600 B
fileContentBuffer: 51200 B
networktimeout: 10 s
}
check system $HOST
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if cpu usage > 90% for 10 cycles then alert
if memory usage > 85% then alert
if swap usage > 35% then alert
check process nginx with pidfile /var/run/nginx.pid
start program = "/bin/systemctl start nginx"
stop program = "/bin/systemctl stop nginx"
check process redis
matching "redis"
start program = "/bin/systemctl start redis"
stop program = "/bin/systemctl stop redis"
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
include /etc/monit/conf.d/*
include /etc/monit/conf-enabled/*
Now monit, is detecting and alerting me when the process goes down (if i kill it manually) and when it's manually recovered, but it won't start that shell script automatically.. and according to /var/log/monit.log, it's showing the following:
[UTC Aug 13 10:16:41] info : Starting Monit 5.16 daemon
[UTC Aug 13 10:16:41] info : 'production-server' Monit 5.16 started
[UTC Aug 13 10:16:43] error : 'myapp' process is not running
[UTC Aug 13 10:16:46] info : 'myapp' trying to restart
[UTC Aug 13 10:16:46] info : 'myapp' start: /bin/bash
[UTC Aug 13 10:17:17] error : 'myapp' failed to start (exit status 0) -- no output
So far what I see when monit tries to execute the script is that it tries to load it (i can see it for less than 3 seconds using ps aux | grep -v grep | grep ":4445", but this output is different from the above output i showed up, it shows the content of the shell script being executed and specifically this one:
blablalba... nohup bundle exec rails s -e production -p 4445
and then it disappears. then it tries to re-execute the shell.. again and again...
What am I missing, and what is wrong with my configuration? note that I can't change anything in the start-app.sh because it's on production and working 100%. (i just want to monitor it)
Edit: To my understanding and experience, it seems to be a Environment Variable issue or path issue, but i'm not sure how to solve it, it doesn't make any sense to put the env variables inside monit .. what if someone else wanted to edit that shell script or add something new? i hope you get my point
As i expected, it was user-environment issue and i solved it by editing monit configuration as below:
Before (not working)
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
After (working)
check process myapp
matching ":4445"
start program = "/bin/su -s /bin/bash -c '/home/user/start-app.sh' user"
stop program = "/bin/su -s /bin/bash -c '/home/user/stop-app.sh' user"
Explanation: i removed (uid and gid) as "user" from monit because it will only execute the shell script in the name of "user" but it won't get/import/use user's env path, or env variables.

Jenkins with Publish over SSH plugin, -1 exit status

I use Jenkins for build and plugin for deploy my artifacts to server. After deploying files I stopped service by calling eec in plugin
sudo service myservice stop
and I receive answer from Publish over SSH:
SSH: EXEC: channel open
SSH: EXEC: STDOUT/STDERR from command [sudo service myservice stop]...
SSH: EXEC: connected
Stopping script myservice
SSH: EXEC: completed after 200 ms
SSH: Disconnecting configuration [172.29.19.2] ...
ERROR: Exception when publishing, exception message [Exec exit status not zero. Status [-1]]
Build step 'Send build artifacts over SSH' changed build result to UNSTABLE
Finished: UNSTABLE
The build is failed but the service is stopped.
My /etc/init.d/myservice
#! /bin/sh
# /etc/init.d/myservice
#
# Some things that run always
# touch /var/lock/myservice
# Carry out specific functions when asked to by the system
case "$1" in
start)
echo "Starting myservice"
setsid /opt/myservice/bin/myservice --spring.config.location=/etc/ezd/application.properties --server.port=8082 >> /opt/myservice/app.log &
;;
stop)
echo "Stopping script myservice"
pkill -f myservice
#
;;
*)
echo "Usage: /etc/init.d/myservice {start|stop}"
exit 1
;;
esac
exit 0
Please say me why I get -1 exit status?
Well, the script is called /etc/init.d/myservice, so it matches the myservice pattern given to pkill -f. And because the script is waiting for the pkill to complete, it is still alive and gets killed and returns -1 for that reason (there is also the killing signal in the result of wait, but the Jenkins slave daemon isn't printing it).
Either:
come up with more specific pattern for pkill,
use proper pid-file or
switch to systemd, which can reliably kill exactly the process it started.
On this day and age, I recommend the last option. Systemd is simply lot more reliable than init scripts.
Yes, Jan Hudec is right. I call ps ax | grep myservice in Publish over SSH plugin:
83469 pts/5 Ss+ 0:00 bash -c ps ax | grep myservice service myservice stop
So pkill -f myservice will affect the process with PID 83469 which is parent for pkill. This is -1 status cause as I understand.
I changed pkill -f myservice to pkill -f "java.*myservice" and this solved my problem.

Not able to make Jenkins perforce plugin to work with ssh

I am not quite familiar with Jenkins but for some reason I am not able to make the perforce plugin to work. I will list down the problem and what I have tried so as to get a better understanding.
Jenkins Version - 1.561
Perforce Plugin Version - 1.3.27 (I have perforce path configured in Jenkins)
System - Ubuntu 10.04
Problem:
In the Source Code Management's Project Details section ( when you try to configure a new job ) I get "Unable to check workspace against depot" error.
P4PORT(hostname:port) - rsh:ssh -q -a -x -l p4ssh -q -x xxx.xxx.com /bin/true
Username - ialok
Password - N.A (Connection to SCM is via key authentication so left it blank)
Workspace(client) - ialok_jenkins
I let Jenkins create workspace and manage its view by checking the checkbox for both "Let Jenkins Create Workspace" and "Let Jenkins Manage Workspace View"
Client View Type is a View Map with the following mapping:
//sandbox/srkamise/... //ialok_jenkins/srkamise/...
I have the keys loaded prior to starting jenkins and the jenkins process runs as my user (ialok)
~$ ps aux | grep jenkins
ialok 16608 0.0 0.0 14132 552 ? Ss 11:08 0:00 /usr/bin/daemon --name=ialok --inherit --env=JENKINS_HOME=/var/lib/jenkins --output=/var/log/jenkins/jenkins.log --pidfile=/var/run/jenkins/jenkins.pid -- /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080 --ajp13Port=-1
ialok 16609 1.0 13.9 1448716 542156 ? Sl 11:08 1:04 /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080 --ajp13Port=-1
Additionally, I used envInject plugin and "Under Prepare an environment for the run" I added SSD_AGENT_PID, SSH_AUTH_SOCK, P4USER, P4PORT environment parameters. (I did try without envInject but faced the same issue)
It looks like some authentication problem as I double checked the path to p4 executable along with the project mapping and addition of keys to my environment.
Here is the log file indicating a failed run:
Started by user anonymous
[EnvInject] - Loading node environment variables.
[EnvInject] - Preparing an environment for the build.
[EnvInject] - Keeping Jenkins system variables.
[EnvInject] - Keeping Jenkins build variables.
[EnvInject] - Injecting as environment variables the properties content
P4CONFIG=.perforce
P4PORT=rsh:ssh -q -a -x -l p4ssh -q -x xxx.xxx.com /bin/true
P4USER=ialok
SSH_AGENT_PID=25752
SSH_AUTH_SOCK=/tmp/keyring-7GAS75/ssh
[EnvInject] - Variables injected successfully.
[EnvInject] - Injecting contributions.
Building in workspace /var/lib/jenkins/jobs/fin/workspace
Using master perforce client: ialok_jenkins
[workspace] $ /usr/bin/p4 workspace -o ialok_jenkins
Changing P4 Client Root to: /var/lib/jenkins/jobs/fin/workspace
Changing P4 Client View from:
Changing P4 Client View to:
//sandbox/srkamise/... //ialok_jenkins/srkamise/...
Saving new client ialok_jenkins
[workspace] $ /usr/bin/p4 -s client -i
Caught exception communicating with perforce. TCP receive failed. read: socket: Connection reset by peer
For Command: /usr/bin/p4 -s client -i
With Data:
===================
Client: ialok_jenkins
Description:
Root: /var/lib/jenkins/jobs/fin/workspace
Options: noallwrite clobber nocompress unlocked nomodtime rmdir
LineEnd: local
View:
//sandbox/srkamise/... //ialok_jenkins/srkamise/...
===================
com.tek42.perforce.PerforceException: TCP receive failed. read: socket: Connection reset by peer
For Command: /usr/bin/p4 -s client -i
With Data:
===================
Client: ialok_jenkins
Description:
Root: /var/lib/jenkins/jobs/fin/workspace
Options: noallwrite clobber nocompress unlocked nomodtime rmdir
LineEnd: local
View:
//sandbox/srkamise/... //ialok_jenkins/srkamise/...
===================
at com.tek42.perforce.parse.AbstractPerforceTemplate.saveToPerforce(AbstractPerforceTemplate.java:270)
at com.tek42.perforce.parse.Workspaces.saveWorkspace(Workspaces.java:77)
at hudson.plugins.perforce.PerforceSCM.saveWorkspaceIfDirty(PerforceSCM.java:1787)
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:895)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1251)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:513)
at hudson.model.Run.execute(Run.java:1709)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
ERROR: Unable to communicate with perforce. TCP receive failed. read: socket: Connection reset by peer
For Command: /usr/bin/p4 -s client -i
With Data:
===================
Client: ialok_jenkins
Description:
Root: /var/lib/jenkins/jobs/fin/workspace
Options: noallwrite clobber nocompress unlocked nomodtime rmdir
LineEnd: local
View:
//sandbox/srkamise/... //ialok_jenkins/srkamise/...
===================
Finished: FAILURE
The P4PORT typically is of the form 'hostname.port'. Examples would be:
workshop.perforce.com:1666
myserver.mycompany.net:2500
Here's some docs: http://www.perforce.com/perforce/doc.current/manuals/cmdref/P4PORT.html

Resources