We are collecting the logs of our applications. Since we containerized our applications, the way we collect logs needs a few changes.
We log via the Docker logging driver:
The application outputs logs to the container's stdout and stderr.
Using the json-file logging driver, Docker writes the logs to a JSON file on the host machine.
A service on the host machine forwards the log files.
But the logs from Docker contain additional information that is unnecessary and complicates the forwarding step, because we need to remove it before forwarding.
For example, the log from Docker looks like the one below, but all we want is the value of the log field. Is there a way to customize the log format and output only the wanted information by overriding some of Docker's configuration?
{
  "log": "{\"level\": \"info\",\"message\": \"data is correct\",\"timestamp\": \"2017-08-01T11:35:30.375Z\"}\r\n",
  "stream": "stdout",
  "time": "2017-08-03T07:58:02.387253289Z"
}
I don't know of any way to customize the output of the json-file Docker logging driver. However, Docker supports the gelf driver, which allows you to send logs to Logstash. With Logstash you can output logs in many different ways (using output plugins) and customize the format at the same time.
For instance to output logs to a file (without any other metadata) you can use something like the following:
output {
  file {
    path => "/path/to/logfile"
    codec => line { format => "%{message}" }
  }
}
If you don't want to add complexity to your logging logic, you can keep using the json-file driver and use a utility such as jq to parse the file and extract only the relevant information. For instance, with jq you can do: jq -r .log /path/to/logfile
This will read each line of the specified file as a JSON object and output only the log field.
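If jq is not available on the host, the same extraction can be sketched in Python. This is a minimal sketch, assuming the json-file format shown above (one JSON object per line, with a log field); the function name extract_log_fields is hypothetical and not part of Docker or any library.

```python
import json
from typing import Iterable, Iterator


def extract_log_fields(lines: Iterable[str]) -> Iterator[str]:
    """Yield the value of the "log" field from each json-file driver line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON (e.g. a partial write)
        if isinstance(record, dict) and "log" in record:
            # Drop the trailing line ending that Docker stores with each entry.
            yield str(record["log"]).rstrip("\r\n")
```

To use it on a real file, open the file in text mode and pass the file object as lines; each yielded string is the application's original log line without Docker's stream and time metadata.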
I cannot find an example of how to output Kong's logs as JSON to stdout. I am currently using Fluentd to ingest logs from my Kubernetes cluster, but I have no idea how to send those logs to Fluentd as structured JSON.
For anyone who is struggling with this, I made the following updates to the kong helm chart values.
env:
  admin_access_log: '/dev/stdout structured_logs'
  proxy_access_log: '/dev/stdout structured_logs'
  nginx_http_log_format: |
    structured_logs escape=json '{"remote_addr": "$remote_addr", "remote_user": "$remote_user", "host": "$host"...}
Have you looked at the file-log plugin? https://docs.konghq.com/hub/kong-inc/file-log/
It lets you log to /dev/stdout and use Lua to remove or add fields if necessary.
Scenario:
You write a program in R or Python that needs to run on Linux or Windows, and you want to log its stdout (JSON-structured and unstructured) and stderr (mostly unstructured) to a Fluentd instance. Adding a new program or starting another instance should not require updating the Fluentd configuration, and the applications will not (yet) be running in a Docker environment.
Question:
How do you send logs from a bunch of programs to a Fluentd instance without performing a curl call for every log entry that your application originally wrote to stdout?
When a UDP or TCP connection is necessary for the application to run, it seems to become harder to debug, and the stdout of any dependency of your program has to be parsed just to get its logging passed through.
Thoughts:
Alternatively, a question could be, how to accept a 'connection' object which can either point to a file or to a TCP connection? So that switching between the std-out or a TCP destination is a matter of changing a single value?
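One way to picture such a switchable 'connection' object is sketched below in Python. This is only a sketch under assumptions: the function name open_log_sink and the tcp://host:port destination syntax are hypothetical, and whatever listens on the TCP side (for example a Fluentd TCP input) would still need a parser that matches the lines being written.

```python
import socket
import sys
from typing import TextIO


def open_log_sink(destination: str) -> TextIO:
    """Return a writable text stream for 'stdout', 'tcp://host:port' or a file path."""
    if destination == "stdout":
        return sys.stdout
    if destination.startswith("tcp://"):
        host, _, port = destination[len("tcp://"):].rpartition(":")
        connection = socket.create_connection((host, int(port)))
        # makefile() wraps the socket in a file-like object, so callers
        # can use write()/flush() exactly as they would on a file.
        return connection.makefile("w", encoding="utf-8")
    # Anything else is treated as a file path and opened in append mode.
    return open(destination, "a", encoding="utf-8")
```

The program then only ever calls write() and flush() on the returned object, so switching between stdout, a file, and a TCP destination comes down to changing the single destination string.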
I like the 'tail' input plugin, which could be what I am looking for, but then:
the original log file never appears to stop growing (will the tail position value reset when the file is simply removed? I couldn't find this behaviour described), and
it seems to require reconfiguring Fluentd for every new program that you start on that server (if it logs to another file); I would much prefer to keep that configuration on the program side...
I built an EFK stack with the Docker log driver set to fluentd, which does not seem to be an optimal, solid solution either, but without Docker I already get stuck setting up a basic configuration (not referring to fluent.conf here).
TL;DR
std-out -> fluentd: Redirect the program output to a file when launching your program. On Linux, use logrotate; you will love it.
Windows: use fluent-bit.
App side config: use single (or predictable) log locations and the fluentd/fluent-bit 'in_tail' plugin.
logging general:
It's recommended to always write application output to a file. If stdout must be written to a file, redirect its output at program startup. For more flexibility in the Fluentd configuration, redirect stdout and stderr to separate files (just like Apache does):
My_program.exe Do some crazy stuff > my_out_file.txt 2> my_error_file.txt
This opens the option for fluentd to read from this/these file(s).
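When the program is started from a launcher script rather than a shell, the same redirection can be sketched in Python. This is a minimal sketch; the function name run_with_log_files is hypothetical. Unlike the shell's >, it opens the files in append mode so that a restart does not truncate what the tail plugin has already seen.

```python
import subprocess
import sys
from typing import Sequence


def run_with_log_files(command: Sequence[str], out_path: str, err_path: str) -> int:
    """Run command, appending its stdout and stderr to two separate files."""
    # Append mode (like >> in a shell) keeps earlier output across restarts.
    with open(out_path, "ab") as out_file, open(err_path, "ab") as err_file:
        completed = subprocess.run(command, stdout=out_file, stderr=err_file)
    return completed.returncode
```

For example, run_with_log_files([sys.executable, "my_script.py"], "my_out_file.txt", "my_error_file.txt") would start a (hypothetical) my_script.py with both streams captured in separate files.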
Windows:
For Windows systems, use fluent-bit; it likely solves the problem of aggregating logs from Windows programs. Windows support was implemented only recently.
fluent-bit supports:
the 'tail' plugin, which records the 'inode' value (a unique file identifier that is insensitive to renaming) and the 'index' value (called 'pos' in the full-blown Fluentd application) in an SQLite3 database, and deals with unprocessable data, which is assigned to a certain key ('log' by default)
Works on Windows machines, but note that it cannot buffer to disk, so make sure that a lost connection, or any other issue with the output, is re-established or fixed in time, or you will run into OOM issues.
App side config:
The tail plugin can monitor a folder, which makes it practical to keep the configuration on the side of your program. Just make sure your different applications write their logs to a predictable directory.
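On the program side, writing to a predictable directory could look like the following Python sketch. The names configure_app_logger and JsonLineFormatter are hypothetical, as is the layout of one JSON object per line in a file named after the application; the tail configuration would then only need a wildcard path for that directory.

```python
import json
import logging
from pathlib import Path


class JsonLineFormatter(logging.Formatter):
    """Format each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "logger": record.name,
        })


def configure_app_logger(app_name: str, log_dir: str) -> logging.Logger:
    """Attach a file handler that writes to <log_dir>/<app_name>.log."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(directory / f"{app_name}.log", encoding="utf-8")
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Starting a new program then only requires choosing a new app_name; nothing on the collector side has to change as long as the directory stays the same.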
Fluent-bit setup/config:
For Linux, just use fluentd (unless more than 100000 messages per second are required, in which case fluent-bit becomes your only choice).
For Windows, install fluent-bit and make it run as a daemon (an almost funny solution).
There are 2 execution methods:
Providing configuration directly via the commandline
Using a config file (example included in zip), and referring to it with the -c flag.
Directly from commandline
An example execution (without making use of the option to work with a configuration file):
PS .\bin\fluent-bit.exe -i winlog -p "channels=Setup,Windows PowerShell" -p "db=./test.db" -o stdout -m '*'
-i declares the input method. Currently, only a few plugins have been implemented, see the man page below.
PS fluent-bit.exe --help
Available Options
  -b  --storage_path=PATH   specify a storage buffering path
  -c  --config=FILE         specify an optional configuration file
  -f, --flush=SECONDS       flush timeout in seconds (default: 5)
  -F  --filter=FILTER       set a filter
  -i, --input=INPUT         set an input
  -m, --match=MATCH         set plugin match, same as '-p match=abc'
  -o, --output=OUTPUT       set an output
  -p, --prop="A=B"          set plugin configuration property
  -R, --parser=FILE         specify a parser configuration file
  -e, --plugin=FILE         load an external plugin (shared lib)
  -l, --log_file=FILE       write log info to a file
  -t, --tag=TAG             set plugin tag, same as '-p tag=abc'
  -T, --sp-task=SQL         define a stream processor task
  -v, --verbose             increase logging verbosity (default: info)
  -s, --coro_stack_size     Set coroutines stack size in bytes (default: 98302)
  -q, --quiet               quiet mode
  -S, --sosreport           support report for Enterprise customers
  -V, --version             show version number
  -h, --help                print this help

Inputs
  tail              Tail files
  dummy             Generate dummy data
  statsd            StatsD input plugin
  winlog            Windows Event Log
  tcp               TCP
  forward           Fluentd in-forward
  random            Random

Outputs
  counter           Records counter
  datadog           Send events to DataDog HTTP Event Collector
  es                Elasticsearch
  file              Generate log file
  forward           Forward (Fluentd protocol)
  http              HTTP Output
  influxdb          InfluxDB Time Series
  null              Throws away events
  slack             Send events to a Slack channel
  splunk            Send events to Splunk HTTP Event Collector
  stackdriver       Send events to Google Stackdriver Logging
  stdout            Prints events to STDOUT
  tcp               TCP Output
  flowcounter       FlowCounter

Filters
  aws               Add AWS Metadata
  expect            Validate expected keys and values
  record_modifier   modify record
  rewrite_tag       Rewrite records tags
  throttle          Throttle messages using sliding window algorithm
  grep              grep events by specified field values
  kubernetes        Filter to append Kubernetes metadata
  parser            Parse events
  nest              nest events by specified field values
  modify            modify records by applying rules
  lua               Lua Scripting Filter
  stdout            Filter events to STDOUT
When I run docker logs container or docker-compose -f file.yml logs -f I get a nicely formatted, color parsed view of the log. It handles all the \t \n from my script and shows them as Tabs and New lines. In addition, using chalk, docker outputs the text in colors as specified.
To find the log, I run docker inspect --format='{{.LogPath}}' container_1
Which results in /var/lib/docker/containers/longid/longid-json.log
However, after saving the file, my nicely formatted Docker view turns into raw JSON when viewed in a terminal with cat log.json.txt.
I get that it's using the json-file logging driver, but it also appears to mess up the encoding, changing it to some ASCII format with \uxxxx all over the place.
I tried running the file through ascii2uni, which got me most of the way there, but the output still has some leftover \r\n and still appears as a JSON file.
So my question is this:
How do I view the file like Docker does when running docker logs, with UTF-8 encoding, formatted, and with color-parsed output? Can I run docker logs myfile.log.txt and read straight from a given file?
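Since docker logs takes a container rather than a file path, one option is to replay the saved file yourself. The Python sketch below is hedged: the function name replay_json_log is hypothetical, and it assumes the file is in the json-file format (one JSON object per line with log and stream fields). Decoding each line as JSON turns the \uxxxx, \t and \r\n escapes back into real characters, so ANSI color codes written by chalk are interpreted by the terminal again.

```python
import json
import sys
from typing import Iterable, TextIO


def replay_json_log(lines: Iterable[str],
                    out: TextIO = sys.stdout,
                    err: TextIO = sys.stderr) -> None:
    """Write each entry's decoded "log" text to the stream it came from."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)  # decodes \uXXXX, \t, \r and \n escapes
        target = err if record.get("stream") == "stderr" else out
        # The text is written unchanged, so embedded ANSI color codes
        # reach the terminal exactly as the application emitted them.
        target.write(record["log"])
```

Calling replay_json_log on an open file object for the saved log prints it to the terminal much like the live view, with stderr entries going to stderr.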
This is not particularly about my current problem, but more like in general. Sometimes I have a problem that only happens in production configuration, and I'd like to debug it there. What is the best way to approach that in Elixir? Production runs without a graphical environment (docker).
In dev I can use IEx.pry, but since mix is unavailable in production, that does not seem to be an option.
For Erlang https://stackoverflow.com/a/21413344/1561489 mentions dbg and redbug, but even if they can be used, I would need help on applying them to Elixir code.
First, start a local node running iex on your dev machine using iex -S mix. If you don't want the application that's running locally to cause breakpoints to be activated, you need to disable the app from starting locally. To do this, you can simply comment out the application function in mix.exs or run iex -S mix run --no-start.
Next, you need to connect to the remote node running on Docker from iex on your dev node using Node.connect(:"remote@hostname"). In order to do this, you have to make sure both the epmd and the node ports on the remote machine are reachable from your local node.
Finally, once your nodes are connected, from the local iex, run :debugger.start() which opens the debugger with the GUI. Now in the local iex, run :int.ni(<Module you want to debug>) and it will make the module visible to the debugger and you can go ahead and add breakpoints and start debugging.
You can find a tutorial with steps and screenshots here.
In the case that you are running your production on AWS, then you should first and foremost leverage CloudWatch to your advantage.
In your elixir code, configure your logger like this:
config :logger,
  handle_otp_reports: true,
  handle_sasl_reports: true,
  metadata: [:application, :module, :function, :file, :line]

config :logger,
  backends: [
    {LoggerFileBackend, :shared_error}
  ]

config :logger, :shared_error,
  path: "#{logging_dir}/verbose-error.log",
  level: :error
Inside your Dockerfile, configure an environment variable for where exactly erl_crash.dump gets written to, such as:
ENV ERL_CRASH_DUMP=/opt/log/erl_crash.dump
Then configure awslogs inside a .config file under .ebextensions as follows:
files:
  "/etc/awslogs/config/stdout.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      [erl_crash.dump]
      log_group_name=/aws/elasticbeanstalk/your_app/erl_crash.dump
      log_stream_name={instance_id}
      file=/var/log/erl_crash.dump

      [verbose-error.log]
      log_group_name=/aws/elasticbeanstalk/your_app/verbose-error.log
      log_stream_name={instance_id}
      file=/var/log/verbose-error.log
And ensure that you set a volume for your Docker container under Dockerrun.aws.json:
"Logging": "/var/log",
"Volumes": [
  {
    "HostDirectory": "/var/log",
    "ContainerDirectory": "/opt/log"
  }
],
After that, you can inspect your error messages under CloudWatch.
Now, if you are using Elastic Beanstalk (which my example above implies) with Docker deployment as opposed to AWS ECS, then the stdout and stderr logs are redirected by default to /var/log/eb-docker/containers/eb-current-app/stdouterr.log inside CloudWatch.
The main purpose of erl_crash.dump is to at least know when your application crashed, thereby taking the container down. AWS EB will normally restart the container, thus keeping you ignorant about the restart. This understanding can also be obtained from other docker related logs, and you can configure alarms to listen for them and be notified accordingly when your docker had to restart. But another advantage of logging erl_crash.dump to CloudWatch is that if need be, you can always export it later to S3, download the file and import it inside :observer to do analysis of what went wrong.
If after consulting the logs, you still require a more intimate interaction with your production application, then you need to leverage remsh to your node. If you use distillery, you would configure the cookie and the node name of your production application with your release like this:
inside rel/config.exs, set cookie:
environment :prod do
  set include_erts: false
  set include_src: false
  set cookie: :"my_cookie"
end
and under rel/templates/vm.args.eex you set variables:
-name <%= node_name %>
-setcookie <%= release.profile.cookie %>
and inside rel/config.exs, you set release like this:
release :my_app do
  set version: "0.1.0"
  set overlays: [
    {:template, "rel/templates/vm.args.eex", "releases/<%= release_version %>/vm.args"}
  ]
  set overlay_vars: [
    node_name: "p@127.0.0.1"
  ]
end
Then you can directly connect to your production node running inside docker by first ssh-ing inside the EC2-instance that houses the docker container, and run the following:
CONTAINER_ID=$(sudo docker ps --format '{{.ID}}')
sudo docker exec -it $CONTAINER_ID bash -c "iex --name q@127.0.0.1 --cookie my_cookie"
Once inside, you can poke around or, if need be and at your own peril, dynamically inject modified code for the module you would like to inspect. An easy way to do that would be to create a file inside the container and invoke Node.spawn_link(target_node, fn -> Code.eval_file(file_name, path) end)
In the case your production node is already running and you do not know the cookie, you can go inside your running container and do a ps aux > t.log and do a cat t.log to figure out what random cookie has been applied and use accordingly.
Docker serves as an impediment to the way epmd is able to communicate with other nodes. The best approach would therefore be to create your own AWS AMI image using Packer and do bare metal deployments instead.
Amazon has recently released a new feature to AWS ECS, AWS VPC Networking Mode, which perhaps may facilitate inter-container epmd communication and thus connecting to your node directly. I have not tried it out as yet, I may be wrong.
In the case that you are running on a provider other than AWS, then figuring out how to get easy access to your remote logs with some SSM agent or some other service is a must.
I would recommend using some sort of exception handling tool; so far I have had a great experience with Sentry.
My Fluent Bit Docker container is adding a timestamp in local time to the logs that are received via STDIN, whereas all the logs received via rsyslog or journald seem to have a UTC timestamp.
I have a basic EFK stack where I am running Fluent Bit containers as remote collectors which are forwarding all the logs to a FluentD central collector, which is pushing everything into Elasticsearch.
I've added a filter to the Fluent Bit config file where I have experimented with many ways to modify the timestamp, to no avail. It seems like I am overthinking it; it should be much easier to modify the timestamp.
These are all the ways I've tried to modify the timestamp with the fluent-bit.conf filter
[FILTER]
    Name         record_modifier
    Match_Regex  ^(?!log.*).*$     ## only match the input received via stdin
    Tag          log.stdout        ## tag to mark input received via stdin
    Add          sourcetype timestamp  ## tried to add timestamp from lua script
    Parser       docker            ## tried to use docker parser for timestamp
    Time_key     utc               ## tried to add timestamp as a key
    script       test.lua          ## sample lua script from fluentbit docs
    call         cb_print          ## call a function from within lua script
What is the de facto method to make all the timestamps uniform to UTC? Any help or suggestion is appreciated.
The way it works is that the docker parser extracts the content of 'log' and respects the timestamp defined by Docker.
One quick workaround would be to modify your parsers.conf and make sure the docker parser does not resolve the timestamp; that way Fluent Bit will assign the current time in UTC for you.