supervisord: is it possible to redirect subprocess stdout back to supervisord? - docker

I'm using supervisord as the entry point for Docker containers, as described in https://docs.docker.com/articles/using_supervisord/.
I want all logs to be written to stdout so I can take advantage of built-in tools like docker logs or systemd's journal, especially when running the containers on CoreOS.
For stderr there's the redirect_stderr=true option for subprocesses.
Is it possible to redirect the subprocess stdout back to supervisord somehow and not deal with actual log files?

You can redirect the program's stdout to supervisor's stdout using the following configuration options:
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0
Explanation:
When a process opens /dev/fd/1 (which is the same as /proc/self/fd/1), the system actually clones file descriptor #1 (stdout) of that process. Using this as stdout_logfile therefore causes supervisord to redirect the program's stdout to its own stdout.
stdout_logfile_maxbytes=0 disables log file rotation, which is obviously not meaningful for stdout. Not specifying this option results in an error because the default value is 50MB and supervisord is not smart enough to detect that the specified log file is not a regular file.
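Put together, a complete program section might look like this (the program name and command are placeholders):
[program:myapp]
command=/usr/local/bin/myapp
redirect_stderr=true
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0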
For more information:
http://veithen.github.io/2015/01/08/supervisord-redirecting-stdout.html

Related

Tweak logging of standalone Neo4j server

When running the Neo4j standalone community Docker, some logs are written to stdout and some to a file inside the container: debug.log.
I would like to be able to set log4j options, like log level, appenders, etc. The reasons are:
I can't access the log files when running in, e.g., AWS ECS
The debug logs are quite verbose
log4j-properties are convenient to deploy and manage
So the question is, how can I set a custom log4j properties for a Neo4j server that's running inside a container?
It's not documented whether it's even possible, but I have tried the usual things one starts to think of. First, adding a log4j.xml to the classpath, but to no avail. I have also tried setting dbms.jvm.additional to
-Dlog4j.configuration=<path_to_file>
-Dlog4j.configurationFile=<path_to_file>
-Dlog4j2.configurationFile=<path_to_file>
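In neo4j.conf these end up as separate dbms.jvm.additional lines, for example (using the same paths as in my setup below):
dbms.jvm.additional=-Dlog4j.configuration=file:/var/lib/neo4j/conf/log4j.properties
dbms.jvm.additional=-Dlog4j2.configurationFile=file:/var/lib/neo4j/conf/log4j2.xml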
I've verified that the Neo4j process has the correct jvm-arguments:
/usr/local/openjdk-11/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/*
-Xms2048m -Xmx2048m -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch
-XX:+UnlockExperimentalVMOptions -XX:+TrustFinalNonStaticFields -XX:+DisableExplicitGC
-XX:MaxInlineLevel=15 -XX:-UseBiasedLocking -Djdk.nio.maxCachedBufferSize=262144
-Dio.netty.tryReflectionSetAccessible=true -Djdk.tls.ephemeralDHKeySize=2048
-Djdk.tls.rejectClientInitiatedRenegotiation=true -XX:FlightRecorderOptions=stackdepth=256
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dlog4j2.disable.jmx=true
-Dlog4j.configuration=file:/var/lib/neo4j/conf/log4j.properties
-Dlog4j.configurationFile=file:/var/lib/neo4j/conf/log4j.xml
-Dlog4j2.configurationFile=file:/var/lib/neo4j/conf/log4j2.xml
-Dfile.encoding=UTF-8 org.neo4j.server.CommunityEntryPoint
--home-dir=/var/lib/neo4j --config-dir=/var/lib/neo4j/conf
I have verified that no lib jars contain a log4j properties file that takes precedence.
Whatever I try, no changes are made to the way the server is logging. The same events are written to /logs/debug.log, and there are no exceptions about a bad log4j config or anything like that.
I have created a small project for easier debugging.
I would write the log file out to stdout.
# forward logs to docker log collector
RUN ln -sf /dev/stdout /logs/debug.log
So this is a 'Docker' solution, but it's handy and also works for other use cases.
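For example, in an image derived from the official one (a sketch; the base image tag is an assumption, and it presumes /logs is not replaced by a volume mount at runtime), the symlink is created at build time so the debug output ends up in docker logs:
FROM neo4j:4.2
# forward Neo4j's debug log to the container's stdout instead of a file
RUN ln -sf /dev/stdout /logs/debug.log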

How to get a program's std-out to fluentd (without docker)

Scenario:
You write a program in R or Python which needs to run on Linux or Windows, and you want to log (JSON-structured and unstructured) std-out and (mostly unstructured) std-err from this program to a Fluentd instance. Adding a new program or starting another instance should not require updating the Fluentd configuration, and the applications will not (yet) be running in a Docker environment.
Question:
How to send "logs" from a bunch of programs to an fluentd instance, without the need to perform curl calls for every log entry that your application was originally writing to std-out?
When a UDP or TCP connection' is necessary for the application to run, it seems to become less easy to debug, and any dependency of your program that returns std-out will be required to be parsed, just to get it's logging passed through.
Thoughts:
Alternatively, the question could be: how to accept a 'connection' object which can point either to a file or to a TCP connection, so that switching between std-out and a TCP destination is a matter of changing a single value?
I like the 'tail' input plugin, which could be what I am looking for, but then:
the original log file never appears to stop growing (will the tail position value reset when the file is simply removed? I couldn't find this behaviour), and
it seems to require reconfiguring fluentd for every new program that you start on that server (if it logs to another file); I would strongly prefer to keep that configuration on the program side...
I built an EFK stack with the Docker log driver set to fluentd, which does not seem to have an optimal, solid solution either, but without Docker I already get kind of stuck setting up a basic configuration (not referring to fluent.conf here).
TL;DR
std-out -> fluentd: redirect the program output, when launching your program, to a file. On Linux, use logrotate; you will love it.
Windows: use fluent-bit.
App-side config: use single (or predictable) log locations, and the fluentd/fluent-bit 'in_tail' plugin.
logging general:
It's recommended to always write application output to a file; if the std-out must be captured, pipe the program's output to a file at startup. For more flexibility in the fluentd configuration, pipe stdout and stderr to separate files (just like Apache does):
My_program.exe Do some crazy stuff > my_out_file.txt 2> my_error_file.txt
This opens the option for fluentd to read from this/these file(s).
Windows:
For Windows systems, use fluent-bit; it likely solves the issue of aggregating Windows OS program logs. Support for Windows has only been implemented recently.
fluent-bit supports:
the 'tail' plugin, which records the 'inode' value (a unique, rename-insensitive file pointer) and the position (called 'pos' in the full-blown 'fluentd' application) in a SQLite3 database, and deals with un-processable data, which is assigned to a certain key ('log' by default)
It works on Windows machines, but note that it cannot buffer to disk, so make sure a lost connection, or any other issue with the output, is re-established or fixed in time, or you will run into OOM issues.
Appl. side config:
The tail plugin can monitor a folder, which makes it practically possible to keep the configuration on the side of your program. Just make sure you write the logs of your different applications to a predictable directory.
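A minimal in_tail source for such a directory might look like this on the fluentd side (the paths and tag are assumptions):
<source>
  @type tail
  path /var/log/myapps/*.log
  pos_file /var/lib/fluentd/myapps.pos
  tag myapps.*
  <parse>
    @type none
  </parse>
</source>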
Fluent-bit setup/config:
For Linux, just use fluentd (unless you need more than 100,000 messages per second, which is where fluent-bit becomes your only choice).
For Windows, install fluent-bit and make it run as a daemon (an almost funny solution).
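One hedged way to do that (assuming fluent-bit is installed under C:\fluent-bit and its Windows build supports running as a service; otherwise a wrapper such as NSSM is needed) is to register it as a Windows service:
sc.exe create fluent-bit binPath= "C:\fluent-bit\bin\fluent-bit.exe -c C:\fluent-bit\conf\fluent-bit.conf" start= auto
sc.exe start fluent-bit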
There are two execution methods:
Providing the configuration directly on the command line
Using a config file (example included in the zip) and referring to it with the -c flag (a sketch is shown after the option listing below)
Directly from the command line
Some example executions (without making use of a configuration file):
PS .\bin\fluent-bit.exe -i winlog -p "channels=Setup,Windows PowerShell" -p "db=./test.db" -o stdout -m '*'
-i declares the input method. Currently, only a few plugins have been implemented; see the help output below.
PS fluent-bit.exe --help
Available Options
-b --storage_path=PATH specify a storage buffering path
-c --config=FILE specify an optional configuration file
-f, --flush=SECONDS flush timeout in seconds (default: 5)
-F --filter=FILTER set a filter
-i, --input=INPUT set an input
-m, --match=MATCH set plugin match, same as '-p match=abc'
-o, --output=OUTPUT set an output
-p, --prop="A=B" set plugin configuration property
-R, --parser=FILE specify a parser configuration file
-e, --plugin=FILE load an external plugin (shared lib)
-l, --log_file=FILE write log info to a file
-t, --tag=TAG set plugin tag, same as '-p tag=abc'
-T, --sp-task=SQL define a stream processor task
-v, --verbose increase logging verbosity (default: info)
-s, --coro_stack_size Set coroutines stack size in bytes (default: 98302)
-q, --quiet quiet mode
-S, --sosreport support report for Enterprise customers
-V, --version show version number
-h, --help print this help
Inputs
tail Tail files
dummy Generate dummy data
statsd StatsD input plugin
winlog Windows Event Log
tcp TCP
forward Fluentd in-forward
random Random
Outputs
counter Records counter
datadog Send events to DataDog HTTP Event Collector
es Elasticsearch
file Generate log file
forward Forward (Fluentd protocol)
http HTTP Output
influxdb InfluxDB Time Series
null Throws away events
slack Send events to a Slack channel
splunk Send events to Splunk HTTP Event Collector
stackdriver Send events to Google Stackdriver Logging
stdout Prints events to STDOUT
tcp TCP Output
flowcounter FlowCounter
Filters
aws Add AWS Metadata
expect Validate expected keys and values
record_modifier modify record
rewrite_tag Rewrite records tags
throttle Throttle messages using sliding window algorithm
grep grep events by specified field values
kubernetes Filter to append Kubernetes metadata
parser Parse events
nest nest events by specified field values
modify modify records by applying rules
lua Lua Scripting Filter
stdout Filter events to STDOUT
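For the config-file method mentioned above, a minimal fluent-bit configuration (the paths and the forward target are assumptions) could look like this, started with fluent-bit.exe -c fluent-bit.conf:
[SERVICE]
    Flush        5
    Log_Level    info

[INPUT]
    Name         tail
    Path         C:\app_logs\*.log
    DB           C:\fluent-bit\tail.db
    Tag          apps

[OUTPUT]
    Name         forward
    Match        *
    Host         my-fluentd-host
    Port         24224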

Docker - Handling multiple services in a single container

I would like to start two different services in my Docker container and exit the container as soon as one of them exits. I looked at supervisor, but I can't find out how to get it to quit as soon as one of the managed applications exits. It tries to restart them up to three times, which is the default setting, and then just sits there doing nothing. Is supervisor able to do this, or is there another tool for this? A bonus would be if there was also a way to let both managed programs write to stdout, tagged with their application name, e.g.:
[Program 1] Some output
[Program 2] Some other output
[Program 1] Output again
Since you asked if there was another tool... we designed and wrote a powerful replacement for supervisord that is designed specifically for Docker. It automatically terminates when all applications quit, has special service settings to control this behavior, and redirects stdout with tagged, syslog-compatible output lines as well. It's open source and being used in production.
Here is a quick start for Docker: http://garywiz.github.io/chaperone/guide/chap-docker-simple.html
There is also a complete set of tested base-images which are a good example at: https://github.com/garywiz/chaperone-docker, but these might be overkill and the earlier quickstart may do the trick.
I found solutions to both of my requirements by reading through the docs some more.
Exit supervisord on application exit
This can be achieved by using a custom event listener. I had to add the following section to my supervisord configuration file:
[eventlistener:shutdownevent]
command=/shutdownhandler.sh
events=PROCESS_STATE_EXITED
supervisord will start the referenced script and, when the given event is triggered (PROCESS_STATE_EXITED fires after one of the managed programs exits without being restarted automatically), will send a line containing data about the event to the script's stdin.
The referenced shutdownhandler-script contains:
#!/bin/bash
while :
do
    echo -en "READY\n"
    read line
    kill $(cat /supervisord.pid)
    echo -en "RESULT 2\nOK"
done
The script has to indicate that it is ready by sending "READY\n" on its stdout, after which it may receive an event data line on its stdin. For my use case, upon receipt of a line (meaning one of the managed programs has exited), a SIGTERM is sent to the supervisord process, found via the pid it leaves in its pid file (located in the root directory by default). For technical completeness, I also included a positive response for the event listener, though that one should never matter.
Tagged output on stdout
I did this by simply starting a tail process in the background before starting supervisord, tailing the program's output log and piping the lines through ts (from the moreutils package) to prepend a tag. This way it shows up via docker logs with an easy way to see which program actually wrote the line.
tail -fn0 /var/log/supervisor/program1.log | ts '[Program 1]' &
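Put together in the container's entrypoint, this could look like the following sketch (the second log path and the supervisord invocation are assumptions):
#!/bin/bash
# tag each program's log and forward it to the container's stdout
tail -fn0 /var/log/supervisor/program1.log | ts '[Program 1]' &
tail -fn0 /var/log/supervisor/program2.log | ts '[Program 2]' &
# run supervisord in the foreground so the container stays alive
exec /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf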

Ruby background process STDOUT is empty

I'm having a weird issue with a start-up script which runs a Sinatra script using the shell's "daemon" function. The problem is that when I run the command at the command line, I get output to STDOUT. If I run the command at the command line exactly as it is in the script -- less the daemon part -- the output is correctly redirected to the output file. However, when the startup script runs it (see below), I get stuff to the STDERR log but not to the STDOUT log.
The relevant lines of the script:
#!/bin/sh
# (which is, and has been, a symlink to /bin/bash)
# Source function library.
. /etc/init.d/functions
# Set Some Variables
RUNAS="joeuser"
PID=/var/run/myapp.pid
LOG="/var/log/myapp/app-out.log"
ERR_LOG="/var/log/myapp/app-err.log"
APPLICATION_COMMAND="RAILS_ENV=production ruby /opt/myapp/lib/daemons/my-sinatra-app.rb -p 8002 2>>${ERR_LOG} >>${LOG} &"
# Snip a bunch. This is the applicable line from the "start" case:
daemon --user $RUNAS --pidfile $PID $APPLICATION_COMMAND &> /dev/null
Now, the funky parts:
The error log is written to correctly via the redirect of STDERR.
If I reverse the order of the >> and the 2>> (I'm grasping at straws, here!), the behavior does not change: I still get STDERR logged correctly and STDOUT is empty.
If the output log doesn't exist, the STDOUT redirect creates the file. But, the file remains 0-length.
This used to work. The log directory is maintained by logrotate. All of the more recent 'out' logs are 0-length; the older ones are not. It seems like it stopped working some time in April. The Ruby code didn't change at any time near then; neither did the startup script.
We're running three different services in this way. Two of them are ruby daemons (one uses sinatra, one does not) and the other is a background java process. This is occurring for BOTH of the ruby processes but is not happening on the java process. Maybe something changed in Ruby?
FTR, we've got ruby 1.8.5 and RHEL 5.4.
I've done some more probing. The daemon function does a bunch of stuff, but the meat of the matter is that it runs the program using runuser. The command essentially looks like this:
runuser -s /bin/bash - joeuser -c "ulimit -S -c 0 >/dev/null 2>&1 ; RAILS_ENV=production ruby /opt/myapp/lib/daemons/my-sinatra-app.rb -p 8002 '</dev/null' '>>/var/log/myapp/app-out.log' '2>>/var/log/myapp/app-err.log' '&'"
When I run exactly that at the command line (both with and without the single quotes that got added somewhere along the line), I get the exact same screwy behavior w.r.t. the output log. So it seems to me that this is an issue of how ruby (?) interacts with runuser?
Too long to put in a comment :-)
Change the shebang to #!/bin/sh -x and verify that everything is expanded according to your expectations. Also, when executing from a terminal your .bashrc file is sourced, but when executing from the script it is not; there might be something in your environment that differs. One way to find out is to run env from the terminal and from the script and diff the output:
env > env_terminal
env > env_script
diff env_terminal env_script
Happy hunting...

Avoid generating empty STDOUT and STDERR files with Sun Grid Engine (SGE) and array jobs

I am running array jobs with Sun Grid Engine (SGE).
My carefully scripted array job workers generate no stdout and no stderr when they function properly. Unfortunately, SGE insists on creating an empty stdout and stderr file for each run.
Sun's manual states:
STDOUT and STDERR of array job tasks will be written into different files with the default location
<jobname>.['e'|'o']<job_id>'.'<task_id>
In order to change this default, the -e and -o options (see above) can be used together with the pseudo-environment-variables $HOME, $USER, $JOB_ID, $JOB_NAME, $HOSTNAME, and $SGE_TASK_ID.
Note that you can use the output redirection to divert the output of all tasks into the same file, but the result of this is undefined.
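For reference, redirecting with those pseudo-variables looks something like this (the directory and script name are placeholders); it only changes where the files go, it does not suppress empty ones:
qsub -t 1-100 \
     -o '$HOME/sge_logs/$JOB_NAME.$JOB_ID.$SGE_TASK_ID.out' \
     -e '$HOME/sge_logs/$JOB_NAME.$JOB_ID.$SGE_TASK_ID.err' \
     worker.sh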
I would like to have the output files suppressed if they are empty. Is there any way to do this?
No, there is no way to do this.
