Recently I come across a linux application design. The indent of the application is to log the ethernet frames via dumpcap <> api in linux. But they implemented as below:
Create a new process using fork()
Call dumpcap <> in execl() as shown below
a. execl("/bin/sh", "/bin/sh", "-c", dumpcap<>, NULL);
b. sudo dumpcap -i "eth0" -B 1 -b filesize:5 -w "/mnt/Test_1561890567.pcapng" -t -q
They send a SIGTERM to kill the process
The problem facing now is when ever we run the command from the process after 1184 or 1185 no:files then dumpcap stops logging. The process and thread is alive the command we can see in top command.
Related
I'm using pulumi to manage kubernetes deployments. One of the deployments runs an image which intercepts SIGINT and SIGTERM signals to perform a graceful shutdown like so (this example is running in my IDE):
{"level":"info","msg":"trying to activate gcloud service account","time":"2021-06-17T12:19:25-05:00"}
{"level":"info","msg":"env var not found","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Started Worker","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","Signal":"interrupt","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Worker has been stopped.","time":"2021-06-17T12:19:27-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Stopped Worker","time":"2021-06-17T12:19:27-05:00"}
Notice the "Signal":"interrupt" with a message of Worker has been stopped.
I find that when I alter the source code (which alters the docker image) and run pulumi up the pod doesn't gracefully terminate based on what's described in this blog post. Here's a screenshot of logs from GCP:
The highlighted log line in the image above is the first log line emitted by the app. Note that the shutdown messages aren't logged above the highlighted line which suggests to me that the pod isn't given a chance to perform a graceful shutdown.
Why might the pod not go through the graceful shutdown mechanisms that kubernetes offers? Could this be a bug with how pulumi performs updates to deployments?
EDIT: after doing more investigation I found that this problem is happening because starting a docker container with go run /path/to/main.go actually ends up created two processes like so (after execing into the container):
root#worker-ffzpxpdm-78b9797dcd-xsfwr:/gadic# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 0.3 2046200 30828 ? Ssl 18:04 0:12 go run src/cmd/worker/main.go --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --grpc-hos
root 3782 0.0 0.5 1640772 43232 ? Sl 18:06 0:00 /tmp/go-build2661472711/b001/exe/main --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --
root 3808 0.1 0.0 4244 3468 pts/0 Ss 19:07 0:00 /bin/bash
root 3817 0.0 0.0 5900 2792 pts/0 R+ 19:07 0:00 ps aux
If run kill -TERM 1 then the signal isn't forwarded to the underlying binary, /tmp/go-build2661472711/b001/exe/main, which means the graceful shutdown of the application isn't executed. However, if I run kill -TERM 3782 then the graceful shutdown logic is executed.
It seems the go run spawns a subprocess and this blog post suggests the signals are only forwarded to PID 1. On top of that, it's unfortunate that go run doesn't forward signals to the subprocess it spawns.
The solution I found is to add RUN go build -o worker /path/to/main.go in my dockerfile and then to start the docker container with ./worker --arg1 --arg2 instead of go run /path/to/main.go --arg1 --arg2.
Doing it this way ensures there aren't any subprocess spawns by go and that ensures signals are handled properly within the docker container.
I have the following shell script that allows me to start my rails app, let's say it's called start-app.sh:
#!/bin/bash
cd /var/www/project/current
. /home/user/.rvm/environments/ruby-2.3.3
RAILS_SERVE_STATIC_FILES=true RAILS_ENV=production nohup bundle exec rails s -e production -p 4445 > /var/www/project/log/production.log 2>&1 &
the file above have permissions of:
-rwxr-xr-x 1 user user 410 Mar 21 10:00 start-app.sh*
if i want to check the process I do the following:
ps aux | grep -v grep | grep ":4445"
it'd give me the following output:
user 2960 0.0 7.0 975160 144408 ? Sl 10:37 0:07 puma 3.12.0 (tcp://0.0.0.0:4445) [20180809094218]
P.S: the reason i grep ":4445" is because i have few processes running on different ports. (for different projects)
now coming to monit, i used apt-get to install it, and the latest version from repo is 5.16, as i'm running on Ubuntu 16.04, also note that monit is running as root, that's why i specified the gid uid in the following. (because the start script is used to be executed from "user" and not "root")
Here's the configuration for monit:
set daemon 20 # check services at 20 seconds interval
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
basedir /var/lib/monit/events # set the base directory where events will be stored
slots 100 # optionally limit the queue size
set mailserver xx.com port xxx
username "xx#xx.com" password "xxxxxx"
using tlsv12
with timeout 20 seconds
set alert xx#xx.com
set mail-format {
from: xx#xx.com
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION
}
set limits {
programOutput: 51200 B
sendExpectBuffer: 25600 B
fileContentBuffer: 51200 B
networktimeout: 10 s
}
check system $HOST
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if cpu usage > 90% for 10 cycles then alert
if memory usage > 85% then alert
if swap usage > 35% then alert
check process nginx with pidfile /var/run/nginx.pid
start program = "/bin/systemctl start nginx"
stop program = "/bin/systemctl stop nginx"
check process redis
matching "redis"
start program = "/bin/systemctl start redis"
stop program = "/bin/systemctl stop redis"
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
include /etc/monit/conf.d/*
include /etc/monit/conf-enabled/*
Now monit, is detecting and alerting me when the process goes down (if i kill it manually) and when it's manually recovered, but it won't start that shell script automatically.. and according to /var/log/monit.log, it's showing the following:
[UTC Aug 13 10:16:41] info : Starting Monit 5.16 daemon
[UTC Aug 13 10:16:41] info : 'production-server' Monit 5.16 started
[UTC Aug 13 10:16:43] error : 'myapp' process is not running
[UTC Aug 13 10:16:46] info : 'myapp' trying to restart
[UTC Aug 13 10:16:46] info : 'myapp' start: /bin/bash
[UTC Aug 13 10:17:17] error : 'myapp' failed to start (exit status 0) -- no output
So far what I see when monit tries to execute the script is that it tries to load it (i can see it for less than 3 seconds using ps aux | grep -v grep | grep ":4445", but this output is different from the above output i showed up, it shows the content of the shell script being executed and specifically this one:
blablalba... nohup bundle exec rails s -e production -p 4445
and then it disappears. then it tries to re-execute the shell.. again and again...
What am I missing, and what is wrong with my configuration? note that I can't change anything in the start-app.sh because it's on production and working 100%. (i just want to monitor it)
Edit: To my understanding and experience, it seems to be a Environment Variable issue or path issue, but i'm not sure how to solve it, it doesn't make any sense to put the env variables inside monit .. what if someone else wanted to edit that shell script or add something new? i hope you get my point
As i expected, it was user-environment issue and i solved it by editing monit configuration as below:
Before (not working)
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
After (working)
check process myapp
matching ":4445"
start program = "/bin/su -s /bin/bash -c '/home/user/start-app.sh' user"
stop program = "/bin/su -s /bin/bash -c '/home/user/stop-app.sh' user"
Explanation: i removed (uid and gid) as "user" from monit because it will only execute the shell script in the name of "user" but it won't get/import/use user's env path, or env variables.
Assume the following:
I have a program myprogram inside a docker container
I'm running the docker container with
docker run --privileged=true my-label/my-container
Inside the container - the program is being run with:
strace -f -e trace=desc ./myprogram
What I see is that the strace (despite having the -f on) doesn't follow all the child processes.
I see the following output from strace
[pid 10] 07:36:46.668931 write(2, "..\n"..., 454 <unfinished ...>
<stdout of ..>
<stdout other output - but I don't see the write commands - so probably from a child process>
[pid 10] 07:36:46.669684 write(2, "My final output\n", 24 <unfinished ...>
<stdout of My final output>
What I want to see is the other write commands.
Now I should see the the other write commands - because I'm using -f.
What I think is happening is that running inside docker makes the process handling and security different.
My question is: Does strace -f work differently when run inside a docker container?
Note that this application starts and stops in 2 seconds - so the tracing tool has to follow the application lifecycle - like strace does. Connecting to a server background process won't work.
It turns out strace truncates string output - you have to explicitly tell it that you want more than the first n (10?) string chars. You do this with -s 800.
strace -s 800 -ff ./myprogram
You can also get all the write commands by asking strace explicitly with -e write.
strace -s 800 -ff -e write ./myprogram
I have a python/Django project (myproject) running on nginx and uwsgi.
I am running uwsgi command via supervisord. This works perfectly, but on restarting supervisord it creates zombie process. what am i doing wrong? What am I overlooking to do this cleanly? any Advise?
Often times supervisor service takes too long. at that point I have found the following in supervisor.log file
INFO waiting for stage2_BB_wsgi, stage3_BB_wsgi, stage4_BB_wsgi to die
Point to Note: I am running multiple staging server in one machine, namely stage2 .. stageN
supervisor.conf file extract
[program:stage2_BB_wsgi]
command=uwsgi --close-on-exec -s /home/black/stage2/shared_locks/uwsgi_bb.sock --touch-reload=/home/black/stage2/shared_locks/reload_uwsgi --listen 10 --chdir /home/black/stage2/myproject/app/ --pp .. -w app.wsgi -C666 -H /home/black/stage2/myproject/venv/
user=black
numprocs=1
stdout_logfile=/home/black/stage2/logs/%(program_name)s.log
stderr_logfile=/home/black/stage2/logs/%(program_name)s.log
autostart=true
autorestart=true
startsecs=10
exitcodes=1
stopwaitsecs=600
killasgroup=true
priority=1000
thanks in advance.
You will want to set your stopsignal to INT or QUIT.
By default supervisord sends out a SIGTERM when restarting a program. This will not kill uwsgi, only reload it and its workers.
When I ping one site it returns "Request timed out". I want to make little program that will inform me (sound beep or something like that) when this server is online again. No matter in which language. I think it should be very simple script with a several lines of code. So how to write it?
Some implementations of ping allow you to specify conditions for exiting after receipt of packets:
On Mac OS X, use ping -a -o $the_host
ping will keep trying (by default)
-a means beep when a packet is received
-o means exit when a packet is received
On Linux (Ubuntu at least), use ping -a -c 1 -w inf $the_host
-a means beep when a packet is received
-c 1 specifies the number of packets to send before exit (in this case 1)
-w inf specifies the deadline for when ping exits no matter what (in this case Infinite)
when -c and -w are used together, -c becomes number of packets received before exit
Either can be chained to perform your next command, e.g. to ssh into the server as soon as it comes up (with a gap between to allow sshd to actually start up):
# ping -a -o $the_host && sleep 3 && ssh $the_host
Don't forget the notify sound like echo"^G"! Just to be different - here's Windows batch:
C:\> more pingnotify.bat
:AGAIN
ping -n 1 %1%
IF ERRORLEVEL 1 GOTO AGAIN
sndrec32 /play /close "C:\Windows\Media\Notify.wav"
C:\> pingnotify.bat localhost
:)
One way is to run ping is a loop, e.g.
while ! ping -c 1 host; do sleep 1; done
(You can redirect the output to /dev/null if you want to keep it quiet.)
On some systems, such as Mac OS X, ping may also have the options -a -o (as per another answer) available which will cause it to keep pinging until a response is received. However, the ping on many (most?) Linux systems does not have the -o option and the kind of equivalent -c 1 -w 0 still exits if the network returns an error.
Edit: If the host does not respond to ping or you need to check the availability of service on a certain port, you can use netcat in the zero I/O mode:
while ! nc -w 5 -z host port; do sleep 1; done
The -w 5 specifies a 5 second timeout for each individual attempt. Note that with netcat you can even list multiple ports (or port ranges) to scan when some of them becomes available.
Edit 2: The loops shown above keep trying until the host (or port) is reached. Add your alert command after them, e.g. beep or pop-up a window.