I have the following shell script that allows me to start my rails app, let's say it's called start-app.sh:
#!/bin/bash
cd /var/www/project/current
. /home/user/.rvm/environments/ruby-2.3.3
RAILS_SERVE_STATIC_FILES=true RAILS_ENV=production nohup bundle exec rails s -e production -p 4445 > /var/www/project/log/production.log 2>&1 &
The file above has the following permissions:
-rwxr-xr-x 1 user user 410 Mar 21 10:00 start-app.sh*
To check the process, I run the following:
ps aux | grep -v grep | grep ":4445"
which gives me the following output:
user 2960 0.0 7.0 975160 144408 ? Sl 10:37 0:07 puma 3.12.0 (tcp://0.0.0.0:4445) [20180809094218]
P.S.: The reason I grep for ":4445" is that I have several processes running on different ports (for different projects).
Now, coming to Monit: I installed it with apt-get, and the latest version in the repo is 5.16, as I'm running Ubuntu 16.04. Note also that Monit runs as root, which is why I specified the uid and gid below (the start script is normally executed by "user", not by "root").
Here's the configuration for monit:
set daemon 20 # check services at 20 seconds interval
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
basedir /var/lib/monit/events # set the base directory where events will be stored
slots 100 # optionally limit the queue size
set mailserver xx.com port xxx
username "xx@xx.com" password "xxxxxx"
using tlsv12
with timeout 20 seconds
set alert xx@xx.com
set mail-format {
from: xx@xx.com
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION
}
set limits {
programOutput: 51200 B
sendExpectBuffer: 25600 B
fileContentBuffer: 51200 B
networktimeout: 10 s
}
check system $HOST
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if cpu usage > 90% for 10 cycles then alert
if memory usage > 85% then alert
if swap usage > 35% then alert
check process nginx with pidfile /var/run/nginx.pid
start program = "/bin/systemctl start nginx"
stop program = "/bin/systemctl stop nginx"
check process redis
matching "redis"
start program = "/bin/systemctl start redis"
stop program = "/bin/systemctl stop redis"
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
include /etc/monit/conf.d/*
include /etc/monit/conf-enabled/*
Monit now detects the failure and alerts me when the process goes down (if I kill it manually) and when it is manually recovered, but it won't run the start script automatically. According to /var/log/monit.log, this is what happens:
[UTC Aug 13 10:16:41] info : Starting Monit 5.16 daemon
[UTC Aug 13 10:16:41] info : 'production-server' Monit 5.16 started
[UTC Aug 13 10:16:43] error : 'myapp' process is not running
[UTC Aug 13 10:16:46] info : 'myapp' trying to restart
[UTC Aug 13 10:16:46] info : 'myapp' start: /bin/bash
[UTC Aug 13 10:17:17] error : 'myapp' failed to start (exit status 0) -- no output
So far, what I see when Monit tries to execute the script is that it does try to load it: for less than 3 seconds I can see it with ps aux | grep -v grep | grep ":4445". The output is different from the one I showed above, though. It shows the content of the shell script being executed, specifically this line:
blablalba... nohup bundle exec rails s -e production -p 4445
and then it disappears. Monit then tries to re-execute the script, again and again.
What am I missing, and what is wrong with my configuration? Note that I can't change anything in start-app.sh because it's in production and works 100%. I just want to monitor it.
Edit: To my understanding and experience, this looks like an environment variable or PATH issue, but I'm not sure how to solve it. It doesn't make sense to put the environment variables inside the Monit configuration: what if someone else wants to edit that shell script or add something new? I hope you get my point.
As I expected, it was a user-environment issue, and I solved it by editing the Monit configuration as below:
Before (not working)
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
After (working)
check process myapp
matching ":4445"
start program = "/bin/su -s /bin/bash -c '/home/user/start-app.sh' user"
stop program = "/bin/su -s /bin/bash -c '/home/user/stop-app.sh' user"
Explanation: I removed the uid and gid options ("user") from the Monit configuration, because with them Monit only executes the shell script as "user"; it does not load or use that user's PATH or other environment variables. Launching the script through su instead fixed this.
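The environment difference described above can be illustrated without Monit. The following is a minimal sketch, assuming that a daemon-spawned program starts with a nearly empty environment (check the Monit manual for the exact behaviour of your version). The temporary script and the variable APP_ENV_MARKER are hypothetical stand-ins for start-app.sh and for whatever the real script expects from the user's shell.

```shell
#!/bin/sh
# Hypothetical stand-in for start-app.sh: it only reports whether it can
# see a variable that normally comes from the user's shell environment.
script=$(mktemp)
cat > "$script" <<'EOF'
echo "APP_ENV_MARKER=${APP_ENV_MARKER:-unset}"
EOF

# Pretend this was set by the user's login environment.
export APP_ENV_MARKER=from-user-shell

# Run from the current shell: the variable is inherited.
inherited=$(/bin/sh "$script")

# Run with an emptied environment (env -i), roughly what a process
# launched by a daemon may see.
stripped=$(env -i /bin/sh "$script")

echo "$inherited"
echo "$stripped"
rm -f "$script"
```

Running the real start script under env -i from a root shell is one way to try to reproduce the failure outside Monit, although on a production box that is best done with care.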
I'm looking to redirect some logs from a command run with kubectl exec to that pod's logs, so that they can be read with kubectl logs <pod-name> (or really, /var/log/containers/<pod-name>.log). I can see the logs I need as output when running the command, and they're stored inside a separate log directory inside the running container.
Redirecting the output (i.e. >> logfile.log) to the file which I thought was mirroring what is in kubectl logs <pod-name> does not update that container's logs, and neither does redirecting to stdout.
When calling kubectl logs <pod-name>, my understanding is that the kubelet gets them from its internal /var/log/containers/ directory. But what determines which logs are stored there? Is it the same process by which logs get stored for any other Docker container?
Is there a way to examine/trace the logging process, or determine where these logs are coming from?
Logs from the STDOUT and STDERR of containers in the pod are captured and stored in files under /var/log/containers. This is what is presented when kubectl logs is run.
To understand why output from commands run by kubectl exec is not shown by kubectl logs, let's look at how it all works with an example.
First, launch a pod running Ubuntu that sleeps forever:
$> kubectl run test --image=ubuntu --restart=Never -- sleep infinity
Exec into it:
$> kubectl exec -it test -- bash
Seen from inside the container, it is the STDOUT and STDERR of PID 1 that are being captured. When you kubectl exec into the container, a new process is created that lives alongside PID 1:
root@test:/# ps -auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7 0.0 0.0 18504 3400 pts/0 Ss 20:04 0:00 bash
root 19 0.0 0.0 34396 2908 pts/0 R+ 20:07 0:00 \_ ps -auxf
root 1 0.0 0.0 4528 836 ? Ss 20:03 0:00 sleep infinity
Redirecting to STDOUT does not work because /dev/stdout is a symlink to the stdout of whichever process accesses it (/proc/self/fd/1), not to that of PID 1 (/proc/1/fd/1).
root@test:/# ls -lrt /dev/stdout
lrwxrwxrwx 1 root root 15 Nov 5 20:03 /dev/stdout -> /proc/self/fd/1
To see the logs from commands run with kubectl exec, the output needs to be redirected to the streams that are captured by the kubelet (the STDOUT and STDERR of PID 1). This can be done by redirecting output to /proc/1/fd/1.
root@test:/# echo "Hello" > /proc/1/fd/1
Exiting the interactive shell and checking the logs with kubectl logs should now show the output:
$> kubectl logs test
Hello
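The same mechanism can be tried on any Linux machine without a cluster. This is a minimal sketch under a few assumptions: a background sleep stands in for the container's PID 1, a temporary file stands in for the log that the kubelet captures, and the second process writes into the first one's stdout through /proc/<pid>/fd/1.

```shell
#!/bin/sh
# Temporary file that plays the role of the captured container log.
log=$(mktemp)

# Stand-in for PID 1: a long-running process whose stdout goes to the log.
sleep 30 > "$log" &
pid=$!

# Wait (at most about a second) until the child's fd 1 points at the log,
# so that we do not race with the redirection being set up.
for _ in 1 2 3 4 5 6 7 8 9 10; do
  [ "$(readlink "/proc/$pid/fd/1")" = "$log" ] && break
  sleep 0.1
done

# A different process writes to the first process's stdout, just like
# `echo "Hello" > /proc/1/fd/1` does inside the container.
echo "Hello" > "/proc/$pid/fd/1"

kill "$pid"
wait "$pid" 2>/dev/null || true

captured=$(cat "$log")
echo "log now contains: $captured"
rm -f "$log"
```

Writing to /dev/stdout in the second process would instead end up on that process's own stdout, which is why it never reaches the captured log.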
After starting a SCADA LTS Docker container as suggested on https://github.com/SCADA-LTS/Scada-LTS with the following command:
docker run -it -e DOCKER_HOST_IP=$(docker-machine ip) -p 81:8080 scadalts/scadalts /root/start.sh
The container works well for some time, and then suddenly an "HTTP Status 404" error is shown, like the following:
http://[IP]/ScadaBR/
HTTP Status 404 - /ScadaBR/
type Status report
message /ScadaBR/
description The requested resource is not available.
Apache Tomcat/7.0.85
Here [IP] is the default Docker IP address and port, most of the time localhost:81.
Any idea how to solve it?
Thank you in advance!
TL;DR
After running for some time, the MySQL service dies. It is necessary to restart it manually with:
docker exec scada service mysql restart
docker exec scada killall tail
DETAILED REPORT
When the error is shown, you can check if all the services are running on the container (in this case named 'scada'):
>docker exec scada ps -A
PID TTY TIME CMD
1 ? 00:00:00 start.sh
790 ? 01:00:22 java
791 ? 00:01:27 tail
858 ? 00:00:00 ps
As can be seen, no MySQL service is running. This explains why Tomcat is running but SCADA-LTS is not.
You can restart the MySQL service inside the container with:
docker exec scada service mysql restart
After that, SCADA-LTS is still down and you have to restart Tomcat, which can be done this way:
docker exec scada killall tail
After a minute or less, all the services are running:
>docker exec scada ps -A
PID TTY TIME CMD
1 ? 00:00:00 start.sh
43 ? 00:00:00 mysqld_safe
398 ? 00:00:00 mysqld
481 ? 00:00:31 java
482 ? 00:00:00 sleep
618 ? 00:00:00 ps
Now SCADA-LTS is running!
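The manual check and recovery above could be wrapped in a small script. This is only a sketch: the function names mysql_is_running and recover_scada are made up for illustration, and the container name 'scada' is the one assumed throughout this answer. The check looks for a process whose command name is exactly mysqld in the ps -A output.

```shell
#!/bin/sh
# Succeeds if the given `ps -A` output contains a process whose command
# name is exactly "mysqld" (so "mysqld_safe" alone does not count).
mysql_is_running() {
  printf '%s\n' "$1" | awk '$NF == "mysqld" { found = 1 } END { exit !found }'
}

# Hypothetical helper: applies the two manual recovery commands from this
# answer when MySQL is missing from the container's process list.
recover_scada() {
  if mysql_is_running "$(docker exec scada ps -A)"; then
    echo "MySQL is running, nothing to do"
  else
    docker exec scada service mysql restart
    docker exec scada killall tail
  fi
}
```

Calling recover_scada from cron or a similar scheduler would automate the workaround, though it does not address why MySQL dies in the first place.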
I am trying to bypass the login page on RStudio. We are running it in a Docker container, and this step is not necessary because we authenticate users before we let them launch the container.
I am using the Rocker implementation of RStudio for Docker. We are running on CentOS 7.
I'm fairly new to SO, so please let me know what information would be helpful for answering the question.
I figured it out.
When you start rserver, add the flag --auth-none=1, so my final CMD in my Dockerfile looked like:
USER rstudio
CMD ["/usr/lib/rstudio-server/bin/rserver","--server-daemonize=0","--auth-none=1"]
A word of caution, though: the first time I did it, I ran the command with sudo -E in front, and it logged into RStudio as root! (This was also because I had altered /etc/rstudio/rserver.conf with the setting auth-minimum-user-id=0 while trying to get an error to go away, which it did.)
The code above switches to the user 'rstudio' before running the command, which logs you straight in as rstudio.
Hope that helps someone out there, I know I spent the better portion of my day finding a work-around!
To bypass the login page, you also need to define the environment variable USER.
need to set system environmental variable USER=rstudio in order for --auth-none 1
-- https://github.com/rstudio/rstudio/issues/1663
Here is a Dockerfile snippet that runs the RStudio server and logs in as the user rstudio.
ENV USER="rstudio"
CMD ["/usr/lib/rstudio-server/bin/rserver", "--server-daemonize", "0", "--auth-none", "1"]
When it runs, the login page is not displayed, and we can check that the server and the session are running as the rstudio user.
# Run the container
docker run --name rstudio --rm -p 8787:8787 -d rstudio
# Check processes
docker exec -it rstudio ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
rstudio+ 1 0.1 0.3 210792 13844 ? Ssl 21:10 0:00 /usr/lib/rstudi
rstudio 49 0.7 2.3 555096 82312 ? Sl 21:10 0:03 /usr/lib/rstudi
root 570 0.0 0.1 45836 3744 pts/0 Rs+ 21:18 0:00 ps aux
Telegraf v1.0.1
I can no longer see the telegraf[._] metric tree after enabling the [[inputs.procstat]] plugin.
Telegraf is installed successfully and the process is running. I'm using pretty much the default settings for the input and output plugins.
This is what I got:
ubuntu@jenkins:/tmp/giga_aks_testing/ansible$ grep -C 2 jenkins /etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf; echo ; ps -eAf|grep jenkins; echo; pgrep -f jenkins; echo; cat -n /var/log/telegraf/telegraf.log; echo date; echo; ps -eAf|grep telegraf; echo ; sudo service telegraf status
[[inputs.procstat]]
exe = "jenkins"
prefix = "pgrep_serviceprocess"
root 2875 3685 0 2016 pts/3 00:00:00 sudo su jenkins
root 2876 2875 0 2016 pts/3 00:00:00 su jenkins
jenkins 2877 2876 0 2016 pts/3 00:00:00 bash
jenkins 11645 1 0 2016 ? 00:00:01 /usr/bin/daemon --name=jenkins --inherit --env=JENKINS_HOME=/var/lib/jenkins --output=/var/log/jenkins/jenkins.log --pidfile=/var/run/jenkins/jenkins.pid -- /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
jenkins 11647 11645 0 2016 ? 05:33:22 /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
ubuntu 21973 26885 0 06:57 pts/0 00:00:00 grep --color=auto jenkins
2875
2876
11645
11647
1 2017-01-07T06:54:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
2 2017-01-07T06:55:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
3 2017-01-07T06:56:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
4 2017-01-07T06:57:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
date
telegraf 19336 1 0 05:45 pts/0 00:00:04 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraftelegraf.d
ubuntu 21977 26885 0 06:57 pts/0 00:00:00 grep --color=auto telegraf
telegraf Process is running [ OK ]
ubuntu@jenkins:/tmp/giga_aks_testing/ansible$
Why is the log file showing an error when the jenkins process is running and pgrep -f jenkins returns a valid result?
PS: The [[inputs.procstat]] plugin uses pgrep -f <pattern> if the pattern = method is used, and pgrep <executable> if the exe = method is used.
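The difference between the two pgrep modes can be seen in isolation, and it may be relevant to the 'exit status 1' errors, since pgrep exits with status 1 when nothing matches. The sketch below assumes procps pgrep is installed. It starts a throwaway process whose name is sh but whose command line contains a unique marker (psdemo_ plus the shell's PID, a made-up name), then searches for the marker by name and by full command line.

```shell
#!/bin/sh
# A unique marker that appears only in the child's arguments, never in its
# process name (the process name of the child is "sh").
marker="psdemo_$$"
sh -c "while sleep 1; do :; done; : $marker" &
child=$!
sleep 0.5   # give the child time to exec before searching for it

# Name-only match, as with `exe = ...` (pgrep <executable>).
if pgrep "$marker" >/dev/null 2>&1; then by_name="match"; else by_name="no match"; fi

# Full command-line match, as with `pattern = ...` (pgrep -f <pattern>).
if pgrep -f "$marker" >/dev/null 2>&1; then by_cmdline="match"; else by_cmdline="no match"; fi

kill "$child"
echo "pgrep:    $by_name"
echo "pgrep -f: $by_cmdline"
```

In the ps output above, the Jenkins processes have the names sudo, su, bash, daemon and java, so "jenkins" only appears in their arguments, which is the situation this sketch imitates.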
The full /etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf file is:
[[inputs.procstat]]
exe = "jenkins"
prefix = "pgrep_serviceprocess"
[[inputs.procstat]]
exe = "telegraf"
prefix = "pgrep_serviceprocess"
[[inputs.procstat]]
exe = "sshd"
prefix = "pgrep_serviceprocess"
OK. It seems this is an open bug.
Telegraf with an [[inputs.procstat]] entry won't complain if there is only one such entry in a file.
If you specify multiple entries, Telegraf starts emitting those errors even if the exe = <executable> processes are running (PS: this doesn't stop the Telegraf service from working, though).
To fix the errors, this is what I did:
[[inputs.procstat]]
exe = "telegraf|.*"
prefix = "pgrep_serviceprocess"
Since pgrep is used by Telegraf's [[inputs.procstat]] plugin, at the OS level this runs: pgrep "telegraf|.*".
You can also just use exe = "." (simplest) or exe = ".*", but with those it is harder to tell from the process list who is actually running a grep over all processes on the system.
NOTE: .* matches every single process running on the machine, so use it only until there is a proper fix for this.
Related Source code Github file: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/procstat/procstat.go
Related issue: https://github.com/influxdata/telegraf/issues/586
I still couldn't find out why the "telegraf.x.x" metrics are not available after I enabled the [[inputs.procstat]] input. Is it because of the separate file? I'm not sure. I can see the procstat.x.x metric tree, but the telegraf.x.x metric tree is not visible now.
Or, better, one can also use:
[[inputs.procstat]]
pattern = "."
prefix = "pgrep_serviceprocess"
The above runs pgrep -f "." where the pattern is . (to catch everything, i.e. every process/cmd/service running on the machine).
Or (though the following is not a scalable solution, as you have to know the user; on some boxes Jenkins may run as a user other than jenkins):
[[inputs.procstat]]
user = "jenkins"
prefix = "pgrep_serviceprocess"
The above runs pgrep -u "jenkins" where the user is jenkins (to catch every process/cmd/service owned by that user).
To check whether jenkins or enhanceio is running, you can also use the [[inputs.exec]] plugin. I simply used the [[inputs.filestat]] plugin, and it worked when I pointed it at the pid file of both tools: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/filestat
I'm trying to use Monit to watch a Thin Rails application server process. Here is my Monit config:
check process thin-3000
with pidfile /var/www/apps/myapp/shared/pids/thin.3000.pid
start program = "/bin/su - deploy -c 'thin start -C /etc/thin/myapp.yml -o 3000'"
stop program = "/bin/su - deploy -c 'thin stop -C /etc/thin/myapp.yml -o 3000'"
if failed port 3000 then restart
group thin
This actually works. If I kill the Thin server process, Monit will faithfully restart it. However, if I watch the Monit log it keeps outputting the following over and over again:
[UTC Jan 14 23:01:04] error : 'thin-3000' process is not running
[UTC Jan 14 23:01:04] info : 'thin-3000' trying to restart
[UTC Jan 14 23:01:04] info : 'thin-3000' start: /bin/su
[UTC Jan 14 23:01:34] error : 'thin-3000' failed to start
It looks like whatever mechanism it's using to check if the process is running isn't working correctly. Do you know what I might be doing wrong?