How to handle multiline and single-line logs with one Fluentd and Elasticsearch - fluentd

In my cluster I have set up a Fluentd DaemonSet for Elasticsearch. It's working fine and I can see logs in Kibana. We have different pods/containers: some are Java, Python, Nginx, etc.
The problem at the moment is that for multiline logs (Java), each line of an exception is pushed to Elasticsearch as a separate log entry. Instead, Fluentd should identify the multiple lines of the same log and push them as one log statement. I know I can use a multiline parser, but the problem is that the cluster has many other containers which do not produce multiline logs, for example our Nginx pods/containers.
I need to configure Fluentd to handle all types of logs, multiline and single-line. What is the solution in this case? How should I configure Fluentd to handle both?
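For illustration only, one commonly cited approach is Google's fluent-plugin-detect-exceptions plugin, which merges lines recognized as exception stack traces into a single event and passes everything else through unchanged, so single-line logs are unaffected. This is a minimal sketch, assuming container logs are tailed with a tag prefix of raw.kubernetes and that the plugin is installed in the Fluentd image; the prefix, languages list, and flush interval are assumptions.

# Sketch only: merge detected stack traces, leave single-line logs untouched
<match raw.kubernetes.**>
  @type detect_exceptions
  remove_tag_prefix raw        # events are re-emitted as kubernetes.** after processing
  message log                  # field that contains the raw log line
  languages java
  multiline_flush_interval 5
</match>

Nginx, Python and other single-line logs simply flow through, since only lines that look like stack traces are buffered and merged.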

Related

Docker Apache container logs don't match host bind volume

I have an Apache container (image: httpd:2.4) that I am using in a docker-compose arrangement. I am trying to store the logs on the host so that I can analyze them using AWStats.
To accomplish this, I put the following line in my docker-compose.yml file:
volumes:
  # log files
  - /path/on/host/apache_logs:/var/log/apache2
And I can see both access.log and error.log in the folder on the host just fine.
The problem is that not all requests are showing up in access.log.
If I inspect the logs on the container (e.g. docker logs ...), I can see the log entries for the requests I made (the ones I was hoping to see in access.log).
Why is there a difference between the two?
And, more importantly, what is the correct approach to get the Apache container to record its logs in a directory that's available on the host?
Update
Based on the comment by @DavidMaze, I can see that the Apache image is configured to send its logs to stdout. So one solution I am attempting is to transfer the logs to a file on the host once a day in a cron job, based on this answer.
This seems to work okay, but I'm seeing duplicate entries in the log. It's the same data, just re-arranged in different formats.
For example, the first line will be in the order of remote host (%h), remote logname (%l), userid (%u), time (%t), and then request (%r). The second line will be %t %h, followed by some TLS information (e.g. TLSv1.3), and then %r.
The first line is my preferred format (since it follows Apache's Common Log Format and can be understood by AWStats).
Is there a way to configure Docker or Apache to just stick to printing the first line?
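For reference, a hedged sketch of the kind of cron job described in the update above; the container name, schedule, and host paths are assumptions. The httpd image writes the access log to stdout and the error log to stderr, so the two streams can be appended to separate files.

# crontab entry (sketch): once a day, append the last 24h of container output to host files
0 0 * * * docker logs --since 24h apache >> /path/on/host/apache_logs/access.log 2>> /path/on/host/apache_logs/error.log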

How to handle STDOUT logs in K8s?

In a Docker environment my Java app logs to STDOUT via log4j, and the messages are sent to a Graylog instance. There is no special logging config besides configuring the Console appender to use JsonLayout.
My docker-compose.yml snippet:
logging:
  driver: gelf
  options:
    gelf-address: "tcp://[GRAYLOG_HOST]:[PORT]"
    tag: "[...]"
Everything works fine there. But we are thinking about changing this environment to K8s.
There will be a Graylog instance in K8s, too. It looks like there is no K8s equivalent for the docker-compose.yml logging settings, so it seems I have to use some kind of logging agent, e.g. fluent-bit. But the fluent-bit documentation makes it look like it can only collect logs from a log file as input (and a few other sources), not from STDOUT.
I have the following questions:
Is there another possibility to read the logs directly from STDOUT and send them into Graylog?
If I have to write the log messages to a log file to be read by fluent-bit: do I have to configure log4j with a roll-over strategy to keep the log file from growing indefinitely? I do not want to "waste" my resources "just" for logging.
How do you handle application logs in K8s?
Maybe I misunderstand the logging principles in K8s. Feel free to explain them to me.
Is there another possibility to read the logs directly from STDOUT and send them into Graylog?
Fluent Bit allows for data collection through STDIN. Redirect your application STDOUT to Fluent Bit's STDIN and you are set.
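As a minimal sketch of that setup, assuming Fluent Bit's stdin input and gelf output plugins (the host, port, and mode are placeholders):

# fluent-bit.conf (sketch): read records from STDIN and ship them to Graylog as GELF
[INPUT]
    Name    stdin

[OUTPUT]
    Name    gelf
    Match   *
    Host    [GRAYLOG_HOST]
    Port    12201
    Mode    tcp

The application would then be piped into the agent, e.g. java -jar app.jar | fluent-bit -c fluent-bit.conf.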
If I have to write the log messages to a log file to be read by fluent-bit: do I have to configure log4j with a roll-over strategy to keep the log file from growing indefinitely? I do not want to "waste" my resources "just" for logging.
In this case you can use logrotate; a minimal sketch follows.
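For example (the log path is an assumption):

# /etc/logrotate.d/myapp (sketch): rotate daily, keep 7 compressed copies
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}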
How do you handle application logs in K8s?
Three possible ways:
Applications directly output their traces to external systems (e.g. databases).
A sidecar container with an embedded logging agent that collects application traces and sends them to a store (again, a database for example).
Cluster-wide centralized logging (e.g. an ELK stack).
I'd recommend using a sidecar container for log collection; this is probably the most widely used solution. A sketch of that pattern follows.
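A minimal sketch of the sidecar pattern, assuming the app writes its log file to a shared emptyDir volume and a fluent-bit sidecar tails it; the image names, paths, and the fluent-bit configuration itself are assumptions and omitted here.

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  volumes:
    - name: app-logs
      emptyDir: {}                    # shared scratch volume for log files
  containers:
    - name: app
      image: my-java-app:latest       # assumption: writes /var/log/app/app.log
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-agent
      image: fluent/fluent-bit:latest # sidecar tails the shared file and forwards it (e.g. to Graylog)
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true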

Forwarding containers' stdout logs to Datadog without Datadog agents

We're trying to eliminate Datadog agents from our infrastructure. I am trying to find a solution to forward the containers' standard output logs so they can be visualised in Datadog, but without the agents and without changing the Dockerfiles, because there are hundreds of them.
I was thinking about trying to centralize the logs with rsyslog, but I don't know if it's a good idea. Any suggestions?
This doc will show you a comprehensive list of all integrations that involve log collection. Some of these include other common log shippers, which can also be used to forward logs to Datadog. Among these you'd find...
Fluentd
Logstash
Rsyslog (for Linux)
Syslog-ng (for Linux, Windows)
NXLog (for Windows)
That said, you can still just use the Datadog agent to collect logs only (they want you to collect everything with their agent, which is why they warn you against using it for logs alone).
If you want to collect logs from docker containers, the Datadog agent is an easy way to do that, and it has the benefit of adding lots of relevant docker-metadata as tags to your logs. (Docker log collection instructions here.)
If you don't want to do that, I'd look at Fluentd first on the list above -- it has a good reputation for containerized log collection, promotes JSON log formatting (for easier processing), and scales reasonably well.
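As a rough sketch of that route, assuming the fluent-plugin-datadog output plugin; the tail path, tag, service name, and API key are placeholders:

# Fluentd (sketch): tail the Docker JSON log files and ship them to Datadog
<source>
  @type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-docker.pos
  tag docker.*
  <parse>
    @type json
  </parse>
</source>

<match docker.**>
  @type datadog
  api_key YOUR_DD_API_KEY    # placeholder
  dd_source docker
  service my-service         # assumption
</match>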

How to selectively forward the log files to specific indexes in Splunk?

Is it possible to selectively forward log files to specific indexes in Splunk?
I want to forward the logs of a Docker container running 3 services to a Splunk indexer. The problem is that if I use the Docker logging driver, all the data written to STDOUT goes to the same index and data segregation is not possible. Instead, I've set up a forwarder and I'm able to send logs, but they all go to the same index. I want to configure the Splunk forwarder to send specific logs to a specific index.
Let me start from the beginning. Running multiple processes in the same container is an anti-pattern; try to avoid it as much as possible. Kubernetes, for example, has a great solution for your case: it can deploy two containers in the same Pod and set up communication between them over the shared loopback interface (127.0.0.1), so to the processes it looks like they are running in the same container. See https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/ for details.
If you still want to have all three processes in the same container, you have two options to get the logs into different indexes:
routing on the indexer
If you can identify the logs on the indexer side, you can forward them as you usually do and, using transforms.conf on the indexer, route them to a specific index; see http://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf
[nginx_route]
DEST_KEY = _MetaData:Index
REGEX = nginx .*
FORMAT = index_nginx
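For completeness, a transform like this is normally wired up through props.conf on the indexer; a hedged sketch, where the sourcetype name is an assumption:

# props.conf (sketch): apply the routing transform to the incoming sourcetype
[your_docker_sourcetype]
TRANSFORMS-route_nginx = nginx_route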
avoiding container logs
Another option: you can create a volume for logs that you share between your container and a forwarder (Splunk Universal Forwarder) or our collector (https://www.outcoldsolutions.com), and in the configuration define which index you want to forward these logs to. In your container you will need to change how you write logs: instead of stdout, write them to files.
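With the Universal Forwarder, the per-file index can then be set in inputs.conf; a hedged sketch where the paths, sourcetypes, and index names are assumptions:

# inputs.conf (sketch): monitor each service's log file on the shared volume
# and route it to its own index
[monitor:///var/log/shared/nginx/access.log]
index = index_nginx
sourcetype = nginx

[monitor:///var/log/shared/app/app.log]
index = index_app
sourcetype = app_logs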

Secure Logging drivers with Docker?

I noticed that the fluentd logging driver uses the out_forward output to send logs, meaning all logs are sent in the clear. Is there a way to specify the output type? I'd like to be able to have Docker send logs with out_secure_forward instead.
Are there plans to enable more configuration? Should I use a different logging driver if I want security? Perhaps use the JSON file engine and then use fluentd to ship those securely?
IMO the best option to do what you want is:
introduce an additional Docker container (A) that runs Fluentd
configure your Docker containers to send their logs (via the fluentd log driver) to that container (A)
forward these logs from the Fluentd in container (A) to the remote site using secure-forward
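A minimal sketch of the Fluentd configuration inside container (A), assuming the fluent-plugin-secure-forward output plugin; the hostnames, shared key, and ports are placeholders:

# Receive logs from the Docker fluentd log driver
<source>
  @type forward
  port 24224
</source>

# Ship everything to the remote site over TLS with secure-forward
<match **>
  @type secure_forward
  self_hostname aggregator.local   # placeholder
  shared_key YOUR_SHARED_KEY       # placeholder
  secure true
  <server>
    host logs.example.com          # placeholder
    port 24284
  </server>
</match>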
