I want to start rsyslog as an additional process in a Docker container, because my main service requires it for logging. I am therefore trying to set it up with supervisor, but the following fails with a restart loop for rsyslog. Why?
Dockerfile:
FROM debian:buster-slim
RUN set -e \
&& apt-get update \
&& apt-get install --yes \
rsyslog \
supervisor
COPY /services/rsyslog.conf /etc/rsyslog.d/console.conf
CMD ["supervisord", "-c", "/etc/supervisor.conf"]
supervisor.conf:
[supervisord]
#start in foreground
nodaemon=true
[program:syslog]
command=service rsyslog start
#[programm:another]
#command=...
Result:
process | 2022-10-27 10:07:09,906 INFO Set uid to user 0 succeeded
process | 2022-10-27 10:07:09,907 INFO supervisord started with pid 1
process | 2022-10-27 10:07:10,910 INFO spawned: 'syslog' with pid 9
process | 2022-10-27 10:07:10,987 INFO exited: syslog (exit status 0; not expected)
process | 2022-10-27 10:07:11,990 INFO spawned: 'syslog' with pid 22
process | 2022-10-27 10:07:11,999 INFO exited: syslog (exit status 0; not expected)
process | 2022-10-27 10:07:14,003 INFO spawned: 'syslog' with pid 28
process | 2022-10-27 10:07:14,014 INFO exited: syslog (exit status 0; not expected)
process | 2022-10-27 10:07:17,020 INFO spawned: 'syslog' with pid 34
process | 2022-10-27 10:07:17,030 INFO exited: syslog (exit status 0; not expected)
process | 2022-10-27 10:07:17,031 INFO gave up: syslog entered FATAL state, too many start retries too quickly
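A likely cause: service rsyslog start is an init-script wrapper that backgrounds the rsyslog daemon and returns immediately, so the child process supervisord is tracking exits right away with status 0, which supervisord counts as a failed start and keeps respawning until it gives up. A minimal sketch of a program section that keeps the daemon in the foreground instead (assuming the Debian package installs /usr/sbin/rsyslogd and that its -n "no fork" flag is available):
[program:syslog]
; run rsyslogd in the foreground so supervisord has a long-running child to supervise
command=/usr/sbin/rsyslogd -n
autorestart=true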
Related
I ran the following command to bring the Kafka cluster up:
sudo docker compose up kafka-cluster
I successfully accessed the Landoop UI portal a day ago, but after I shut down the system and performed the same steps again, I am now unable to access the Landoop UI at this local URL:
http://127.0.0.1:3030
I am using Ubuntu 20.04, and the following logs were generated in the terminal.
[sudo] password for pc-11:
[+] Running 1/0
⠿ Container code-kafka-cluster-1 Created 0.0s
Attaching to code-kafka-cluster-1
code-kafka-cluster-1 | Setting advertised host to 127.0.0.1.
code-kafka-cluster-1 | Starting services.
code-kafka-cluster-1 | This is landoop’s fast-data-dev. Kafka 0.11.0.0, Confluent OSS 3.3.0.
code-kafka-cluster-1 | You may visit http://127.0.0.1:3030 in about a minute.
code-kafka-cluster-1 | 2022-07-14 08:48:34,716 CRIT Supervisor running as root (no user in config file)
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/01-zookeeper.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/02-broker.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/03-schema-registry.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/04-rest-proxy.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/05-connect-distributed.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/06-caddy.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/07-smoke-tests.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/08-logs-to-kafka.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,729 WARN Included extra file "/etc/supervisord.d/99-supervisord-sample-data.conf" during parsing
code-kafka-cluster-1 | 2022-07-14 08:48:34,731 INFO supervisord started with pid 7
code-kafka-cluster-1 | 2022-07-14 08:48:35,735 INFO spawned: 'sample-data' with pid 91
code-kafka-cluster-1 | 2022-07-14 08:48:35,753 INFO spawned: 'zookeeper' with pid 93
code-kafka-cluster-1 | 2022-07-14 08:48:35,766 INFO spawned: 'caddy' with pid 94
code-kafka-cluster-1 | 2022-07-14 08:48:35,770 INFO spawned: 'broker' with pid 95
code-kafka-cluster-1 | 2022-07-14 08:48:35,773 INFO spawned: 'smoke-tests' with pid 97
code-kafka-cluster-1 | 2022-07-14 08:48:35,776 INFO spawned: 'connect-distributed' with pid 98
code-kafka-cluster-1 | 2022-07-14 08:48:35,779 INFO spawned: 'logs-to-kafka' with pid 99
code-kafka-cluster-1 | 2022-07-14 08:48:35,782 INFO spawned: 'schema-registry' with pid 100
code-kafka-cluster-1 | 2022-07-14 08:48:35,785 INFO spawned: 'rest-proxy' with pid 101
code-kafka-cluster-1 | 2022-07-14 08:48:36,262 INFO exited: caddy (exit status 2; not expected)
code-kafka-cluster-1 | 2022-07-14 08:48:37,264 INFO success: sample-data entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,264 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,266 INFO spawned: 'caddy' with pid 381
code-kafka-cluster-1 | 2022-07-14 08:48:37,267 INFO success: broker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,267 INFO success: smoke-tests entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,267 INFO success: connect-distributed entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,267 INFO success: logs-to-kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,267 INFO success: schema-registry entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,268 INFO success: rest-proxy entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:48:37,280 INFO exited: caddy (exit status 2; not expected)
code-kafka-cluster-1 | 2022-07-14 08:48:39,285 INFO spawned: 'caddy' with pid 389
code-kafka-cluster-1 | 2022-07-14 08:48:39,348 INFO exited: caddy (exit status 2; not expected)
code-kafka-cluster-1 | 2022-07-14 08:48:42,444 INFO spawned: 'caddy' with pid 403
code-kafka-cluster-1 | 2022-07-14 08:48:42,450 INFO exited: caddy (exit status 2; not expected)
code-kafka-cluster-1 | 2022-07-14 08:48:42,508 INFO gave up: caddy entered FATAL state, too many start retries too quickly
code-kafka-cluster-1 | 2022-07-14 08:49:04,090 INFO exited: schema-registry (exit status 1; not expected)
code-kafka-cluster-1 | 2022-07-14 08:49:04,099 INFO spawned: 'schema-registry' with pid 485
code-kafka-cluster-1 | 2022-07-14 08:49:05,124 INFO success: schema-registry entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
code-kafka-cluster-1 | 2022-07-14 08:49:35,818 INFO exited: smoke-tests (exit status 0; expected)
code-kafka-cluster-1 | 2022-07-14 08:51:35,933 INFO exited: logs-to-kafka (exit status 0; expected)
code-kafka-cluster-1 | 2022-07-14 08:52:53,146 INFO exited: sample-data (exit status 0; expected)
I figured out the solution: since fast-data-dev is not maintained, I made the change in my configuration, i.e. my docker-compose.yml. I replaced landoop/fast-data-dev:cp3.3.0 with landoop/fast-data-dev:latest. My final docker-compose.yml is as follows:
version: '2'
services:
  # this is our kafka cluster.
  kafka-cluster:
    image: landoop/fast-data-dev:latest
    environment:
      ADV_HOST: 127.0.0.1       # Change to 192.168.99.100 if using Docker Toolbox
      RUNTESTS: 0               # Disable Running tests so the cluster starts faster
    ports:
      - 2181:2181               # Zookeeper
      - 3030:3030               # Landoop UI
      - 8081-8083:8081-8083     # REST Proxy, Schema Registry, Kafka Connect ports
      - 9581-9585:9581-9585     # JMX Ports
      - 9092:9092               # Kafka Broker
  # we will use elasticsearch as one of our sinks.
  # This configuration allows you to start elasticsearch
  elasticsearch:
    image: itzg/elasticsearch:2.4.3
    environment:
      PLUGINS: appbaseio/dejavu
      OPTS: -Dindex.number_of_shards=1 -Dindex.number_of_replicas=0
    ports:
      - "9200:9200"
  # we will use postgres as one of our sinks.
  # This configuration allows you to start postgres
  postgres:
    image: postgres:9.5-alpine
    environment:
      POSTGRES_USER: postgres     # define credentials
      POSTGRES_PASSWORD: postgres # define credentials
      POSTGRES_DB: postgres       # define database
    ports:
      - 5432:5432                 # Postgres port
After just updating the image to the latest tag, I was able to reach the Landoop UI on 127.0.0.1:3030.
I am also still able to access the Landoop UI after shutting down the cluster and bringing it up again.
POST EDIT
The issue is due to the PSP (Pod Security Policy): by default, privilege escalation is not permitted for my condor user. That is why it is not working: supervisord runs as the root user and tries to write logs and start the condor collector as root rather than as another user (i.e. condor).
Description
The mini-condor base image is not starting as expected in a Kubernetes Rancher pod.
I am using:
this image: https://hub.docker.com/r/htcondor/mini, in a custom namespace in Rancher (k8s)
PS: the image was working perfectly on:
a local environment
a default minikube installation
I am running it as a simple deployment.
When the pod is starting, the default Kubernetes log shows:
2021-09-15 09:26:36,908 INFO supervisord started with pid 1
2021-09-15 09:26:37,911 INFO spawned: 'condor_master' with pid 20
2021-09-15 09:26:37,912 INFO spawned: 'condor_restd' with pid 21
2021-09-15 09:26:37,917 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:37,924 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:38,926 INFO spawned: 'condor_master' with pid 22
2021-09-15 09:26:38,928 INFO spawned: 'condor_restd' with pid 23
2021-09-15 09:26:38,932 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:38,936 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:40,939 INFO spawned: 'condor_master' with pid 24
2021-09-15 09:26:40,943 INFO spawned: 'condor_restd' with pid 25
2021-09-15 09:26:40,947 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:40,948 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:43,953 INFO spawned: 'condor_master' with pid 26
2021-09-15 09:26:43,955 INFO spawned: 'condor_restd' with pid 27
2021-09-15 09:26:43,959 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:43,968 INFO gave up: condor_restd entered FATAL state, too many start retries too quickly
2021-09-15 09:26:43,969 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:44,970 INFO gave up: condor_master entered FATAL state, too many start retries too quickly
Here is a brief cmd and output result:
CMD: condor_status
Output: CEDAR:6001:Failed to connect to <127.0.0.1:9618>
CMD: condor_master
Output: ERROR "Cannot open log file '/var/log/condor/MasterLog'" at line 174 in file /var/lib/condor/execute/slot1/dir_17406/userdir/.tmpruBd6F/BUILD/condor-9.0.5/src/condor_utils/dprintf_setup.cpp
1) First try to fix the issue
I decided to customize the image, but the error was the same.
The Docker image used to try to fix the permission issue:
FROM htcondor/mini:9.2-el7
RUN condor_master
RUN chown condor:root /var/
RUN chown condor:root /var/log
RUN chown -R condor:root /var/log/
RUN chown -R condor:condor /var/log/condor
RUN chown condor:condor /var/log/condor/ProcLog
RUN chown condor:condor /var/log/condor/MasterLog
RUN chmod 775 -R /var/
Kubernetes - Rancher
YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: htcondor-mini--all-in-one
  namespace: grafana-exporter
spec:
  containers:
    - image: <custom_image>
      imagePullPolicy: Always
      name: htcondor-mini--all-in-one
      resources: {}
      securityContext:
        capabilities: {}
      stdin: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      tty: true
  dnsConfig: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
Here is a brief cmd and output result:
CMD: condor_status
Output: CEDAR:6001:Failed to connect to <127.0.0.1:9618>
CMD: condor_master
Output: ERROR "Cannot open log file '/var/log/condor/MasterLog'" at line 174 in file /var/lib/condor/execute/slot1/dir_17406/userdir/.tmpruBd6F/BUILD/condor-9.0.5/src/condor_utils/dprintf_setup.cpp
CMD: ls -ld /var/
Output: drwxrwxr-x 1 condor root 17 Nov 13 2020 /var/
CMD: ls -ld /var/log/
Output: drwxrwxr-x 1 condor root 65 Oct 7 11:54 /var/log/
CMD: ls -ld /var/log/condor
Output: drwxrwxr-x 1 condor condor 240 Oct 7 11:23 /var/log/condor
CMD: ls -ld /var/log/condor/MasterLog
Output: -rwxrwxr-x 1 condor condor 3243 Oct 7 11:23 /var/log/condor/MasterLog
MasterLog content:
10/07/21 11:23:21 ******************************************************
10/07/21 11:23:21 ** condor_master (CONDOR_MASTER) STARTING UP
10/07/21 11:23:21 ** /usr/sbin/condor_master
10/07/21 11:23:21 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
10/07/21 11:23:21 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
10/07/21 11:23:21 ** $CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $
10/07/21 11:23:21 ** $CondorPlatform: x86_64_CentOS7 $
10/07/21 11:23:21 ** PID = 7
10/07/21 11:23:21 ** Log last touched time unavailable (No such file or directory)
10/07/21 11:23:21 ******************************************************
10/07/21 11:23:21 Using config source: /etc/condor/condor_config
10/07/21 11:23:21 Using local config sources:
10/07/21 11:23:21 /etc/condor/config.d/00-htcondor-9.0.config
10/07/21 11:23:21 /etc/condor/config.d/00-minicondor
10/07/21 11:23:21 /etc/condor/config.d/01-misc.conf
10/07/21 11:23:21 /etc/condor/condor_config.local
10/07/21 11:23:21 config Macros = 73, Sorted = 73, StringBytes = 1848, TablesBytes = 2692
10/07/21 11:23:21 CLASSAD_CACHING is OFF
10/07/21 11:23:21 Daemon Log is logging: D_ALWAYS D_ERROR
10/07/21 11:23:21 SharedPortEndpoint: waiting for connections to named socket master_7_43af
10/07/21 11:23:21 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
10/07/21 11:23:21 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
10/07/21 11:23:21 Permission denied error during DISCARD_SESSION_KEYRING_ON_STARTUP, continuing anyway
10/07/21 11:23:21 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
10/07/21 11:23:21 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port
10/07/21 11:23:21 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1632433213)
10/07/21 11:23:21 Cannot remove wait-for-startup file /var/lock/condor/shared_port_ad
10/07/21 11:23:21 WARNING: forward resolution of ip6-localhost doesn't match 127.0.0.1!
10/07/21 11:23:21 WARNING: forward resolution of ip6-loopback doesn't match 127.0.0.1!
10/07/21 11:23:22 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 9
10/07/21 11:23:22 Waiting for /var/lock/condor/shared_port_ad to appear.
10/07/21 11:23:22 Found /var/lock/condor/shared_port_ad.
10/07/21 11:23:22 Cannot remove wait-for-startup file /var/log/condor/.collector_address
10/07/21 11:23:23 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 10
10/07/21 11:23:23 Waiting for /var/log/condor/.collector_address to appear.
10/07/21 11:23:23 Found /var/log/condor/.collector_address.
10/07/21 11:23:23 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 11
10/07/21 11:23:23 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 12
10/07/21 11:23:24 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 15
10/07/21 11:23:24 Daemons::StartAllDaemons all daemons were started
A huge thanks for reading. I hope this will help many other people.
Cause of the issue
The issue is due to the PSP (Pod Security Policy): by default, privilege escalation is not permitted for my condor user.
SOLUTION
The best solution I have found so far is to run everything as the condor user and give that user the necessary permissions. To do so you need to:
In supervisord.conf: run supervisord as the condor user
In supervisord.conf: put the log and socket files in /tmp
In the Dockerfile: change the owner of most of the folders to condor
In deployment.yaml: set runAsUser to 64 (the condor user)
Dockerfile
FROM htcondor/mini:9.2-el7
# SET WORKDIR
WORKDIR /home/condor/
RUN chown condor:condor /home/condor
# COPY SUPERVISOR
COPY supervisord.conf /etc/supervisord.conf
# Need to run the cmd to create all dir
RUN condor_master
# FIX PERMISSION ISSUES FOR RANCHER
RUN chown -R condor:condor /var/log/ /tmp &&\
chown -R restd:restd /home/restd &&\
chmod 755 -R /home/restd
supervisord.conf:
[supervisord]
user=condor
nodaemon=true
logfile = /tmp/supervisord.log
directory = /tmp
pidfile = /tmp/supervisord.pid
childlogdir = /tmp
# the next 3 sections are needed for using supervisorctl to manage the daemons
[unix_http_server]
file=/tmp/supervisord.sock
chown=condor:condor
chmod=0777
user=condor
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisord.sock
[program:condor_master]
user=condor
command=/usr/sbin/condor_master -f
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile = /var/log/condor_master.log
stderr_logfile = /var/log/condor_master.error.log
deployment.yaml
apiVersion: apps/v1
kind: Deployment
spec:
  containers:
    - image: <condor-image>
      imagePullPolicy: Always
      name: htcondor-exporter
      ports:
        - containerPort: 8080
          name: myport
          protocol: TCP
      resources: {}
      securityContext:
        capabilities: {}
        runAsNonRoot: false
        runAsUser: 64
      stdin: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      tty: true
I have a problem with supervisor in Docker. I use supervisor to start 4 .sh scripts: datagrid.sh, ml.sh, startmap.sh and dirwatcher.sh.
When I open the container, navigate to the scripts directory and start the scripts manually, everything works and all the scripts start, but they don't start at container start time. I assume the problem is with supervisor. Thank you.
The error:
2018-08-08 12:28:08,512 INFO spawned: 'datagrid' with pid 171
2018-08-08 12:28:08,514 INFO spawned: 'dirwatcher' with pid 172
2018-08-08 12:28:08,517 INFO spawned: 'startmap' with pid 173
2018-08-08 12:28:08,519 INFO spawned: 'ml' with pid 175
2018-08-08 12:28:08,520 INFO exited: datagrid (exit status 0; not expected)
2018-08-08 12:28:08,520 INFO exited: dirwatcher (exit status 0; not expected)
2018-08-08 12:28:08,520 INFO exited: startmap (exit status 0; not expected)
2018-08-08 12:28:08,520 INFO exited: ml (exit status 0; not expected)
2018-08-08 12:28:08,527 INFO gave up: datagrid entered FATAL state, too many start retries too quickly
2018-08-08 12:28:08,532 INFO gave up: ml entered FATAL state, too many start retries too quickly
2018-08-08 12:28:08,537 INFO gave up: startmap entered FATAL state, too many start retries too quickly
2018-08-08 12:28:08,539 INFO gave up: dirwatcher entered FATAL state, too many start retries too quickly
My supervisord.conf file:
[supervisord]
nodaemon=false
[program:datagrid]
command=sh /EscomledML/MLScripts/escomled_data_grid.sh start -D
[program:dirwatcher]
command=sh /EscomledML/MLScripts/escomled_dirwatcher.sh start -D
[program:startmap]
command=sh /EscomledML/MLScripts/escomled_startmap.sh start -D
[program:ml]
command=sh /EscomledML/MLScripts/escomled_ml.sh start -D
I use Alpine Linux in the container.
There are a few problems here.
The following setting:
[supervisord]
nodaemon=false
This makes supervisord run as a daemon, while the container needs a long-running foreground main process.
Try changing it to:
[supervisord]
nodaemon=true
This configuration makes supervisord itself run as a foreground process, which will keep the container up and running.
From the logs:
exited: datagrid (exit status 0; not expected)
Supervisord does not treat exit code 0 as expected here and gives up on the process. Add the following to the config for every program (the remaining sections are sketched right after this block); it tells supervisord to treat exit code 0 as expected and to restart a process only if it exits with a different code:
[program:datagrid]
command=sh /EscomledML/MLScripts/escomled_data_grid.sh start -D
autorestart=unexpected
exitcodes=0
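Applied to the remaining programs from the question, the same two lines would look like this (reusing the script paths from the supervisord.conf above):
[program:dirwatcher]
command=sh /EscomledML/MLScripts/escomled_dirwatcher.sh start -D
autorestart=unexpected
exitcodes=0
[program:startmap]
command=sh /EscomledML/MLScripts/escomled_startmap.sh start -D
autorestart=unexpected
exitcodes=0
[program:ml]
command=sh /EscomledML/MLScripts/escomled_ml.sh start -D
autorestart=unexpected
exitcodes=0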
I have a Docker image where I wish:
- to run a Passenger server and another daemon for monitoring the Passenger server,
- the container to exit as soon as either one of these 2 processes exits even once,
- to direct all logs to stdout.
In the config file, I have put an event listener (reference: https://serverfault.com/questions/760726/how-to-exit-all-supervisor-processes-if-one-exited-with-0-result/762406#762406) that captures some events for the passenger_monit program and executes a script tt.sh.
I can see 1 extra instance of the passenger_monit program being spawned and reaching the FATAL state after a few tries. The other passenger_monit and passenger_server are fine, but the other passenger_monit's events don't reach the event listener.
These are the scripts that are not working as expected:
This is the supervisord.conf
[supervisord]
nodaemon=true
stdout_logfile=/dev/fd/1
redirect_stderr=true
stdout_logfile_maxbytes=0
[unix_http_server]
file=%(here)s/supervisor.sock
[supervisorctl]
serverurl=unix://%(here)s/supervisor.sock
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[program:passenger_monit]
command=./script/passenger_monit.sh
process_name=passenger_monit
startretries=999
redirect_stderr=true
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0
autorestart=true
killasgroup=true
stopasgroup=true
numprocs=1
[program:passenger_server]
command=passenger start
startretries=999
redirect_stderr=true
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0
autorestart=true
killasgroup=true
stopasgroup=true
numprocs=1
[eventlistener:passenger_monit_exit]
command=./tt.sh
process_name=passenger_monit
events=PROCESS_STATE_STARTING,PROCESS_STATE_EXITED,PROCESS_STATE_FATAL
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0
This is the ./script/passenger_monit.sh
#!/bin/bash
set -x
cd /passenger/newrelic_passenger_plugin/
# if exec is not put, then this process is not killed when supervisord exits
exec ./newrelic_passenger_agent
set +x
This is tt.sh
#!/bin/bash
echo "in tt!"
This is the command I run:
docker exec -it -u deploy 56bbbbe4352b supervisord
This is the output I get:
2016-08-26 19:47:29,369 INFO RPC interface 'supervisor' initialized
2016-08-26 19:47:29,369 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-08-26 19:47:29,370 INFO supervisord started with pid 2446
2016-08-26 19:47:30,374 INFO spawned: 'passenger_monit' with pid 2452
2016-08-26 19:47:30,377 INFO spawned: 'passenger_server' with pid 2453
in tt!
2016-08-26 19:47:30,392 INFO exited: passenger_monit (exit status 0; not expected)
=============== Phusion Passenger Standalone web server started ===============
PID file: /home/deploy/abc/tmp/pids/passenger.3000.pid
Log file: /home/deploy/abc/log/passenger.3000.log
Environment: development
Accessible via: http://0.0.0.0:3000/
You can stop Phusion Passenger Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
2016-08-26 19:47:31,565 INFO spawned: 'passenger_monit' with pid 2494
2016-08-26 19:47:31,566 INFO success: passenger_server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
in tt!
2016-08-26 19:47:31,571 INFO exited: passenger_monit (exit status 0; not expected)
2016-08-26 19:47:33,576 INFO spawned: 'passenger_monit' with pid 2498
in tt!
2016-08-26 19:47:33,583 INFO exited: passenger_monit (exit status 0; not expected)
2016-08-26 19:47:36,588 INFO spawned: 'passenger_monit' with pid 2499
in tt!
2016-08-26 19:47:36,595 INFO exited: passenger_monit (exit status 0; not expected)
2016-08-26 19:47:37,597 INFO gave up: passenger_monit entered FATAL state, too many start retries too quickly
^C2016-08-26 19:47:47,730 WARN received SIGINT indicating exit request
2016-08-26 19:47:47,735 INFO waiting for passenger_server to die
Stopping web server... done
2016-08-26 19:47:47,839 INFO stopped: passenger_server (exit status 2)
This is the output for supervisorctl status
passenger_monit STOPPED Not started
passenger_monit_exit:passenger_monit FATAL Exited too quickly (process log may have details)
passenger_server RUNNING pid 2453, uptime 0:00:14
Output of supervisord -v
3.0b2
The following should work. Notice that the 10-second script will be killed after about 5 seconds, because the event listener shuts supervisord down as soon as the 5-second script exits.
[supervisord]
loglevel=warn
nodaemon=true
[program:hello]
command=bash -c "echo waiting 5 seconds . . . && sleep 5"
autorestart=false
numprocs=1
startsecs=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
[program:world]
command=bash -c "echo waiting 10 seconds . . . && sleep 10"
autorestart=false
numprocs=1
startsecs=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
[eventlistener:processes]
command=bash -c "echo READY && read line && kill -SIGQUIT $PPID"
events=PROCESS_STATE_STOPPED,PROCESS_STATE_EXITED,PROCESS_STATE_FATAL
Here is how I configure supervisor:
[supervisord]
nodaemon=true
[program:djangoonlyfonts]
command = /code/deploy/gunicorn.sh ; Command to start app
stdout_logfile = /var/log/supervisor/supervisor.log ; Where to write log messages
redirect_stderr = true ; Save stderr in the same log
autostart=true
autorestart=true
gunicorn.sh:
#!/bin/bash
cd /code
export DJANGO_SETTINGS_MODULE=fuentes.settingsser
/usr/local/bin/gunicorn -b 0.0.0.0:8000 --workers=1 fuentes.wsgi:application
I get:
root@3eb7d4cb7a4e:/code# supervisord
/usr/local/lib/python2.7/site-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2016-08-16 07:53:37,712 CRIT Supervisor running as root (no user in config file)
2016-08-16 07:53:37,715 INFO supervisord started with pid 64
2016-08-16 07:53:38,717 INFO spawned: 'djangoonlyfonts' with pid 67
2016-08-16 07:53:38,721 INFO exited: djangoonlyfonts (exit status 127; not expected)
2016-08-16 07:53:39,723 INFO spawned: 'djangoonlyfonts' with pid 68
2016-08-16 07:53:39,728 INFO exited: djangoonlyfonts (exit status 127; not expected)
2016-08-16 07:53:41,732 INFO spawned: 'djangoonlyfonts' with pid 69
2016-08-16 07:53:41,735 INFO exited: djangoonlyfonts (exit status 127; not expected)
2016-08-16 07:53:44,740 INFO spawned: 'djangoonlyfonts' with pid 70
2016-08-16 07:53:44,743 INFO exited: djangoonlyfonts (exit status 127; not expected)
2016-08-16 07:53:45,745 INFO gave up: djangoonlyfonts entered FATAL state, too many start retries too quickly
but when I execute the command directly:
root@3eb7d4cb7a4e:~# /code/deploy/gunicorn.sh
[2016-08-16 07:55:19 +0000] [84] [INFO] Starting gunicorn 19.6.0
[2016-08-16 07:55:19 +0000] [84] [INFO] Listening at: http://0.0.0.0:8000 (84)
[2016-08-16 07:55:19 +0000] [84] [INFO] Using worker: sync
[2016-08-16 07:55:19 +0000] [89] [INFO] Booting worker with pid: 89
The production file is loaded
It just works, which proves the file is perfectly executable and actually runs.
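Exit status 127 conventionally means "command not found": supervisord spawned the program, but the command it ran (or something inside the script, given supervisord's much smaller environment) could not be located. Since the script works from an interactive root shell, one way to narrow this down is to invoke the script through the shell explicitly and pin the PATH, for example (a sketch of the idea, not a confirmed fix for this setup):
[program:djangoonlyfonts]
command = /bin/bash /code/deploy/gunicorn.sh ; run via bash to rule out a missing executable bit or shebang issue
environment = PATH="/usr/local/bin:/usr/bin:/bin"
stdout_logfile = /var/log/supervisor/supervisor.log
redirect_stderr = true
autostart=true
autorestart=true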