I'm doing some processing inside a job which ends up executing an external shell command. The command is executing a script that takes hours to finish.
The problem is that after I start the script using spawn and detach, the script stops executing when I shut down the Sidekiq process with a kill -15 signal. This behaviour only occurs when the spawn call is fired from Sidekiq - not when I do it in irb and close the console. So somehow the script still seems to be bound to Sidekiq - but why, and how do I avoid it?
test.sh
#!/bin/bash
for a in `seq 1000` ; do
  echo "$a "
  sleep 1
done
spawn_test_job.rb
module WorkerJobs
  class SpawnTestJob < CountrySpecificWorker
    sidekiq_options :queue => :my_jobs, :retry => false

    def perform version
      logfile = "/home/deployer/test_#{version}.log"
      pid = spawn(
        "cd /home/deployer &&
         ./test.sh
        ",
        [:out, :err] => logfile
      )
      Process.detach(pid)
    end
  end
end
I enqueue the job with WorkerJobs::SpawnTestJob.perform_async(1), and if I tail test_1.log I can see the counter ticking. However, when I send Sidekiq the kill -15, the counter stops and the script's pid disappears.
After hours of debugging I found that systemd was causing this. The process spawned inside Sidekiq ends up in the Sidekiq unit's cgroup, and because the default KillMode is control-group, stopping the unit kills every process in that cgroup.
deployer@srv-14:~$ ps -efj | grep test.sh
UID PID PPID PGID SID C STIME TTY TIME CMD
deployer 16679 8455 16678 8455 0 12:59 pts/0 00:00:00 grep --color=auto test.sh
deployer 24904 30861 24904 30861 0 12:52 ? 00:00:00 sh -c cd /home/deployer && ./test.sh
deployer 24906 24904 24904 30861 0 12:52 ? 00:00:00 /bin/bash ./test.sh
deployer 6382 1 6382 6382 38 12:53 ? 00:02:14 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 7787 1 7787 7787 30 12:46 ? 00:04:07 sidekiq 4.2.10 my_proj [6 of 8 busy]
deployer 13680 1 13680 13680 29 12:49 ? 00:03:08 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 14372 1 14372 14372 38 12:49 ? 00:03:48 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 16719 8455 16718 8455 0 12:59 pts/0 00:00:00 grep --color=auto sidekiq
deployer 17678 1 17678 17678 38 12:50 ? 00:03:22 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 18023 1 18023 18023 32 12:50 ? 00:02:49 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 18349 1 18349 18349 34 12:43 ? 00:05:32 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 18909 1 18909 18909 34 12:51 ? 00:02:53 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 22956 1 22956 22956 39 12:01 ? 00:22:42 sidekiq 4.2.10 my_proj [8 of 8 busy]
deployer 30861 1 30861 30861 46 12:00 ? 00:27:23 sidekiq 4.2.10 my_proj [8 of 8 busy]
and
cat /proc/24904/cgroup
11:perf_event:/
10:blkio:/
9:pids:/system.slice
8:devices:/system.slice/system-my_proj\x2dsidekiq.slice
7:cpuset:/
6:freezer:/
5:memory:/
4:cpu,cpuacct:/
3:net_cls,net_prio:/
2:hugetlb:/
1:name=systemd:/system.slice/system-my_proj\x2dsidekiq.slice/my_proj-sidekiq@9.service
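You can confirm the kill mode systemd will apply by querying the unit directly (the unit name is taken from the cgroup path above):
systemctl show -p KillMode my_proj-sidekiq@9.service
KillMode=control-group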
I fixed the problem by setting KillMode=process in my sidekiq systemd service.
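For reference, a minimal sketch of that change as a drop-in (the unit and file names here are just my setup); after adding it, run systemctl daemon-reload and restart the sidekiq service:
# /etc/systemd/system/my_proj-sidekiq@.service.d/override.conf
[Service]
KillMode=process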
References:
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
https://www.freedesktop.org/software/systemd/man/systemd.kill.html
Related
[root@k8s001 ~]# docker exec -it f72edf025141 /bin/bash
root@b33f3b7c705d:/var/lib/ghost# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1012 4 ? Ss 02:45 0:00 /pause
root 8 0.0 0.0 10648 3400 ? Ss 02:57 0:00 nginx: master process nginx -g daemon off;
101 37 0.0 0.0 11088 1964 ? S 02:57 0:00 nginx: worker process
node 38 0.9 0.0 2006968 116572 ? Ssl 02:58 0:06 node current/index.js
root 108 0.0 0.0 3960 2076 pts/0 Ss 03:09 0:00 /bin/bash
root 439 0.0 0.0 7628 1400 pts/0 R+ 03:10 0:00 ps aux
This output comes from the internet; it says the pause container is the parent process of the other containers in the pod, and that if you attach to the pod or the other containers and run ps aux, you will see that.
Is that correct? When I do it in my k8s cluster it's different - PID 1 is not /pause.
...Is that correct? When I do it in my k8s cluster it's different - PID 1 is not /pause.
This has changed; pause no longer holds PID 1 despite being the first container created by the container runtime to set up the pod (e.g. cgroups, namespaces, etc.). Pause is isolated (hidden) from the rest of the containers in the pod regardless of your ENTRYPOINT/CMD. See here for more background information.
By default, Docker will run your entrypoint (or the command, if there is no entrypoint) as PID 1. However, that is not necessarily always the case, since, depending on how you start the container, Docker (or your orchestrator) can also run its custom init process as PID 1:
$ docker run -d --init --name test alpine sleep infinity
849efe38ecec439550738e981065ec4aff55ef5607f03b9fed975e2d3146b9b0
$ docker exec -ti test ps
PID USER TIME COMMAND
1 root 0:00 /sbin/docker-init -- sleep infinity
7 root 0:00 sleep infinity
8 root 0:00 ps
For more information on why you would want your entrypoint not to be PID 1, you can check this explanation from a tini developer:
Now, unlike other processes, PID 1 has a unique responsibility, which is to reap zombie processes.
Zombie processes are processes that:
Have exited.
Were not waited on by their parent process (wait is the syscall parent processes use to retrieve the exit code of their children).
Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.
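To make that concrete, here is a rough sketch (not from the quoted answer; the container names are arbitrary, and if ps is missing from the image you would need to install procps first). It orphans a short-lived child so it gets reparented to PID 1, then checks whether PID 1 reaps it:
# without --init: PID 1 ends up being 'sleep 300', which never calls wait(),
# so the orphaned 'sleep 2' lingers as <defunct> after it exits
docker run -d --name zombie-demo ubuntu bash -c '(sleep 2 &); exec sleep 300'
sleep 5 && docker exec zombie-demo ps -ef

# with --init: /sbin/docker-init is PID 1 and reaps the orphan, so no zombie appears
docker run -d --init --name init-demo ubuntu bash -c '(sleep 2 &); exec sleep 300'
sleep 5 && docker exec init-demo ps -ef

docker rm -f zombie-demo init-demo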
I am able to run containers fine with this combination (Docker Desktop with the WSL2 backend).
But I noticed that there is no /etc/docker directory on the Linux side, and when I run ps -eF I get the output below. I was expecting dockerd and the container processes as children of dockerd.
rookie@MAIBENBEN-PC:/mnt/c/Users/rookie$ ps -eF
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 1 0 0 223 580 6 04:07 ? 00:00:00 /init
root 98 1 0 223 80 5 04:07 ? 00:00:00 /init
root 99 98 0 223 80 5 04:07 ? 00:00:00 /init
rookie 100 99 0 191067 43220 0 04:07 pts/0 00:00:00 docker serve --address unix:///home/rookie/.docker/run/d
root 101 98 0 0 0 1 04:07 ? 00:00:00 [init] <defunct>
root 103 98 0 223 80 7 04:07 ? 00:00:00 /init
root 104 103 0 384463 28888 0 04:07 pts/1 00:00:00 /mnt/wsl/docker-desktop/docker-desktop-proxy --distro-na
root 142 1 0 223 80 4 05:17 ? 00:00:00 /init
root 143 142 0 223 80 6 05:17 ? 00:00:00 /init
rookie 144 143 0 2509 5048 2 05:17 pts/2 00:00:00 -bash
rookie 221 144 0 2654 3264 7 05:21 pts/2 00:00:00 ps -eF
Your Ubuntu session (like all WSL2 sessions) is set up as a Docker client, but the actual Docker daemon runs in a separate WSL session named "docker-desktop".
I generally recommend leaving this instance alone, as it is auto-configured and managed by Docker Desktop, but if you really want to take a look, run:
wsl -d docker-desktop
... from PowerShell, CMD, or Windows Start/Run.
Note that this instance is running BusyBox, so some commands will be different than you expect. For instance, the -F argument is not valid for ps.
You'll see dockerd and the associated containerd processes here.
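For example, you can list its processes directly from PowerShell without opening a shell in it (plain ps works in BusyBox even though -F does not):
wsl -d docker-desktop ps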
There's also a separate image, docker-desktop-data, but it is not bootable (there is no init in it). If you want to see the filesystem, at least, you can wsl --export it and examine the tar file that is created. I wrote up an answer on Super User with details a few months ago.
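For example, from PowerShell (the output path is only an illustration):
wsl --export docker-desktop-data C:\docker-desktop-data.tar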
I have launched several Docker containers, and using docker stats I have verified that one of them keeps increasing its RAM consumption from the moment it starts until it is restarted.
My question is whether there is any way to verify where that consumption comes from inside the Docker container - something in the style of docker stats, but for the processes inside the container.
Thanks for your cooperation.
Not sure if it's what you are asking for, but here's an example:
(Before you start):
Run a test container docker run --rm -it ubuntu
Install stress by typing apt-get update and apt-get install stress
Run stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1 (it will start consuming memory)
1. with top
If you go to a new terminal you can type docker container exec -it <your container name> top and you will get something like the following:
(notice that the %MEM usage of PID 285 is 68.8%)
docker container exec -it dreamy_jang top
top - 12:46:04 up 22 min, 0 users, load average: 1.48, 1.55, 1.12
Tasks: 4 total, 2 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20.8 us, 0.8 sy, 0.0 ni, 78.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 6102828 total, 150212 free, 5396604 used, 556012 buff/cache
KiB Swap: 1942896 total, 1937508 free, 5388 used. 455368 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
285 root 20 0 4209376 4.007g 212 R 100.0 68.8 6:56.90 stress
1 root 20 0 18500 3148 2916 S 0.0 0.1 0:00.09 bash
274 root 20 0 36596 3072 2640 R 0.0 0.1 0:00.21 top
284 root 20 0 8240 1192 1116 S 0.0 0.0 0:00.00 stress
2. with ps aux
Again, from a new terminal you type docker container exec -it <your container name> ps aux
(notice that the %MEM usage of PID 285 is 68.8%)
docker container exec -it dreamy_jang ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18500 3148 pts/0 Ss 12:25 0:00 /bin/bash
root 284 0.0 0.0 8240 1192 pts/0 S+ 12:39 0:00 stress --vm-byt
root 285 99.8 68.8 4209376 4201300 pts/0 R+ 12:39 8:53 stress --vm-byt
root 286 0.0 0.0 34400 2904 pts/1 Rs+ 12:48 0:00 ps aux
My source for this stress thing is from this question: How to fill 90% of the free memory?
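If you prefer raw numbers over top/ps, the container's own cgroup files expose the same accounting that docker stats reads. A rough sketch, assuming a cgroup v1 layout (on cgroup v2 hosts the files are /sys/fs/cgroup/memory.current and /sys/fs/cgroup/memory.stat instead):
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/memory.stat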
I'm trying to get a bare-bones Rails app deployed under Apache, Passenger 3.0.0 and Rails 3.0.3. I'm getting all kinds of weird errors, mostly revolving around what I think is related to Bundler or RAILS_ENV.
The only non-default thing about the app is that the development & test environments use SQLite3 and production uses MySQL.
When hitting the app from a web browser, Passenger throws errors about gems (sqlite3) that are specifically declared (in the Gemfile AND in database.yml) as NOT part of the production environment.
How can I tell what user the server is trying to run my Rails app as? I would like to make sure the RAILS_ENV is set correctly for that user as I think Passenger is trying to run this app in development mode for some reason.
Edit: added results of ps aux | grep httpd
myserver:current elvis$ ps aux | grep httpd
elvis 4424 0.4 0.0 66152 192 s000 S+ 11:03AM 0:00.00 grep httpd
_www 1950 0.0 0.2 93024 2544 ?? S 11:40PM 0:01.23 /usr/sbin/httpd -D FOREGROUND
root 1918 0.0 1.0 93024 10244 ?? Ss 11:39PM 0:02.75 /usr/sbin/httpd -D FOREGROUND
_www 4084 0.0 0.2 93024 2536 ?? S 9:41AM 0:00.15 /usr/sbin/httpd -D FOREGROUND
and ls -l ...
myserver:current elvis$ ls -l config
total 48
-rw-rw-r-- 1 aaron admin 1923 Nov 19 21:40 application.rb
-rw-rw-r-- 1 aaron admin 326 Nov 19 21:40 boot.rb
-rw-rw-r-- 1 aaron admin 741 Nov 19 21:40 database.yml
-rw-rw-r-- 1 aaron admin 1257 Nov 19 21:40 deploy.rb
-rw-rw-r-- 1 aaron admin 149 Nov 19 21:40 environment.rb
drwxrwxr-x 5 aaron admin 170 Nov 19 21:40 environments
drwxrwxr-x 7 aaron admin 238 Nov 19 21:40 initializers
drwxrwxr-x 3 aaron admin 102 Nov 19 21:40 locales
-rw-rw-r-- 1 aaron admin 1808 Nov 19 21:40 routes.rb
By default, Passenger will run your app as the user who owns the config/environment.rb or config.ru file; see http://www.modrails.com/documentation/Users%20guide%20Apache.html#user_switching
Passenger will run in the production environment by default unless you tell it otherwise with the RailsEnv directive; see http://www.modrails.com/documentation/Users%20guide%20Apache.html#rails_env
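For example, in the Apache vhost (a sketch - the server name and paths are placeholders):
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/myapp/current/public
    # Passenger defaults to production; set RailsEnv explicitly only if you need a different environment
    RailsEnv production
</VirtualHost>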
You could run ps aux | grep httpd to see what user is running your apache process.
Here's what I use to detect which user should run the Rails app:
RAILS_USER=$(stat -c '%U' /YOUR_PATH/environment.rb)
echo "Detected rails user: $RAILS_USER"
I have set PassengerPoolIdleTime to 0, with the expectation that this means I can "warm" up a bunch of passenger processes on my server, and the next time I have a burst of traffic (even if it is days later), they will all be warmed up and ready to accept requests.
What I'm seeing instead is that every morning when I get up, passenger-status shows only a handful of processes and they have all only been up since midnight. The previous day I'd warmed up a bunch of processes and the last time I looked at passenger-status (before midnight) there were 50.
Here's the entire Passenger-related snippet from my httpd.conf (I'm on CentOS):
LoadModule passenger_module /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.11/ext/apache2/mod_passenger.so
PassengerRoot /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.11
PassengerRuby /usr/local/bin/ruby
PassengerMaxPoolSize 60
PassengerPoolIdleTime 0
I've checked the crontabs for root and apache, to see if there might be something triggering an apache restart, but I don't see it.
Here's a snippet of passenger-status, about 11 hours and 46 minutes after midnight:
----------- General information -----------
max = 60
count = 3
active = 0
inactive = 3
Waiting on global queue: 0
----------- Domains -----------
/var/www/myapp/current:
PID: 20704 Sessions: 0 Processed: 360 Uptime: 11h 44m 16s
PID: 20706 Sessions: 0 Processed: 4249 Uptime: 11h 44m 9s
PID: 20708 Sessions: 0 Processed: 14189 Uptime: 11h 44m 9s
And here's what I see if I do a ps aux | grep apache:
apache 13297 0.0 0.0 546652 5312 ? Sl 14:28 0:00 /usr/sbin/httpd.worker
apache 13332 0.0 0.0 546652 5336 ? Sl 14:28 0:00 /usr/sbin/httpd.worker
apache 13334 0.0 0.0 546652 5328 ? Sl 14:28 0:00 /usr/sbin/httpd.worker
root 16841 0.0 0.0 6004 628 pts/0 S+ 15:48 0:00 grep apache
root 20478 0.0 0.0 88724 3640 ? Sl 04:02 0:01 /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.11/ext/apache2/ApplicationPoolServerExecutable 0 /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.11/bin/passenger-spawn-server /usr/local/bin/ruby /tmp/passenger.30916
apache 20704 0.0 1.7 251080 135164 ? S 04:02 0:06 Rails: /var/www/apps/myapp/current
apache 20706 0.2 1.7 255188 137704 ? S 04:02 1:52 Rails: /var/www/apps/myapp/current
apache 20708 0.9 1.7 255180 139332 ? S 04:02 6:26 Rails: /var/www/apps/myapp/current
The server is on UTC, so 04:02 corresponds to 12:02am my time (EDT).
Assuming that logrotate is the culprit, I'd suggest using the copytruncate feature instead of reloading in postrotate. copytruncate isn't atomic, meaning you could lose a couple of seconds' worth of logs. You'll also briefly double the disk space consumed by that log file. Here are some details.
/var/log/apache2/*.log {
    weekly
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    copytruncate
    #postrotate
    #    /etc/init.d/apache2 reload > /dev/null
    #endscript
}
You could also pipe your logs to a program that writes to a file based on the date, eliminating logrotate entirely...
CustomLog "|/usr/local/bin/my_log_script" combined
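For example, Apache's bundled rotatelogs can write date-stamped files and rotate daily with no logrotate involvement (the binary path varies by distro; on CentOS it's typically /usr/sbin/rotatelogs):
CustomLog "|/usr/sbin/rotatelogs /var/log/httpd/access_log.%Y-%m-%d 86400" combined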
I discovered what was happening. Here is my logrotate conf file for httpd:
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
It's the postrotate script that is doing it. Reloading apache causes the passenger processes to die off.
Anyone have any good suggestions for how to do this without having to reload apache? Or a way to reload apache without killing off the passenger processes (if that's possible)?
The easiest way to rotate logs without restarting/reloading a service is to use the copytruncate option. That way logrotate copies the contents of the log file to another file and then empties the current log file, so the service keeps logging to the same file and logrotate does its thing. For example:
/var/log/httpd/*log {
    copytruncate
    missingok
    notifempty
    sharedscripts
}