Is there a way to send files AS-IS via Fluentd? - fluentd

I'm trying to use Fluentd to aggregate log files from various servers. By default it parses the log lines in various ways (and I can see the value in doing that), but in my current situation I would like to send the files AS-IS, without parsing and without changing a thing.
I'm using the in_tail plugin with the following configurations:
<source>
type tail
format none
read_from_head true
path /path/to/logs/*.log
pos_file /path/to/logs/pos_file
tag mylog
</source>
And even this none format parses the logs. For example
I am a line of log
gets parsed as
{"message":"I am a line of log"}
I guess the question is: Is there a way for it to send the tail content, without altering anything?
Thanks!

Well, all messages in Fluentd are handled internally as JSON objects, but what you could do is match with a file output (out_file) on the receiving end; that would basically just recreate log files on the receiving side with the same content as the source.
http://docs.fluentd.org/articles/out_file
You could even "hack" it to output with the format csv and set the delimiter to a whitespace. That could also work...

Related

Fluentd removes log entry after applying json parser

I have 2 Docker containers that use Fluentd as the log driver. Both send valid JSON messages. Here are examples of them:
{"tag":"docker/article-api","log":"{\"level\":\"debug\",\"port\":\":80\",\"time\":\"2020-02-17T17:06:46Z\",\"message\":\"starting the server\"}"}
{"log":"{\"level\":\"info\",\"ts\":1581959205.461808,\"caller\":\"apiserv/main.go:69\",\"msg\":\"Service is ready to listen\"}","tag":"docker/user-api"}
They are quite different, but I am sure both are valid.
As we use Stackdriver logging, I'd like to add the "severity" field equal to the value of level.
Here's the part of the config file, that creates all the confusion.
<filter **>
@type parser
key_name log
replace_invalid_sequence true
<parse>
@type json
</parse>
</filter>
And here's the problem itself. After passing through the filter, the first log entry's message is completely removed, while the second one passes through.
I've tried to specify time_format, but it doesn't seem to work at all.
Aside from that, I've tried to use <filter docker**>, but that removes all the useful entries instead. It's probably a separate issue, but if you have an idea what causes it, I'd appreciate it.
Thank you in advance
P.S. I'm using the google-fluentd service, if that makes a difference.
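For reference, a rough sketch of the kind of filter chain described above — first parse the nested JSON in the log field, then copy level into severity with record_transformer. The reserve_data option, the record_transformer stage, and the fallback value are assumptions on my part, not part of the original config:
<filter **>
  @type parser
  key_name log
  # keep the original fields alongside the parsed ones
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>
<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    severity ${record["level"] || "DEFAULT"}
  </record>
</filter>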

Apache Beam ReadFromText() pattern match returns no results

I'm writing an Apache Beam pipeline in Python and trying to load multiple text files, but I encounter an error when using a file pattern. When I pass in an exact filename, the pipeline runs correctly.
For example:
files = p | 'Read' >> ReadFromText('lyrics.txt')
However, when using pattern match an error occurs:
files = p | 'Read' >> ReadFromText('lyrics*')
IOError: No files found based on the file pattern
In this example, I have several files that start with "lyrics".
I've tried many different pattern types but haven't had any success with anything except passing the complete file name. Is there a different way to apply a pattern match in this case?
Updated with answer
If you're on Windows, don't forget to use a backslash instead of a forward slash when specifying directories. For example: ReadFromText('.\lyrics*')
This looks like a bug. I've filed https://issues.apache.org/jira/browse/BEAM-7560. In the meantime, try an absolute path or ReadFromText('./lyrics*').
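As a hedged sketch of the absolute-path workaround (the file names and the print step are placeholders I've added):
import os
import apache_beam as beam
from apache_beam.io import ReadFromText

with beam.Pipeline() as p:
    # Build an absolute pattern so the glob does not depend on the working directory
    pattern = os.path.join(os.getcwd(), 'lyrics*')
    lines = p | 'Read' >> ReadFromText(pattern)
    lines | 'Print' >> beam.Map(print)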

How to redirect stdout to file in Lua?

I'm trying to redirect stdout in Lua (5.1) to a file instead of to console.
There is a third-party API (which I cannot modify) containing a function that prints out a serialized set of data (I do not know which function does the printing; assume some sort of print()).
This data is far too verbose to fit on the screen I have to work with (which cannot be scrolled), so I want the function's output to be directed to a file instead of to the console.
I do not have the ability to patch or manipulate Lua versions.
My thought was to change stdout to a file using the poorly documented io.output() function, but this does not seem to work at all.
io.output("foo") -- creates file "foo", should set stdout to "foo"?
print("testing. 1, 2, 3") -- should print into "foo", goes to console instead
Does anyone know of any way to force a function's output into a file, or force all stdout into a file instead of the console? TIA.
You need to use the io.write method instead of print. It works in a similar way, but doesn't separate its parameters with a tab. io.write respects io.output, but print doesn't.
-- save the current output file, might need to restore later
local real_output = io.output()
local file = io.open('stdout.log', 'w')
-- io.write now goes to stdout.log
io.output(file)
.... -- call external API
-- restore
io.output(real_output)
file:close()
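If the third-party code calls the global print directly (which io.output does not affect), one option — shown here as a hedged sketch with a placeholder file name — is to temporarily replace print itself:
local log_file = assert(io.open('captured.log', 'w'))
local real_print = print
print = function(...)
  -- mimic print: tostring each argument, tab-separated, newline at the end
  local parts = {}
  for i = 1, select('#', ...) do
    parts[i] = tostring(select(i, ...))
  end
  log_file:write(table.concat(parts, '\t'), '\n')
end
-- ... call the external API here ...
print = real_print
log_file:close()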

Indexing and Parsing XML files with ElasticSearch

I need to index multiple XML files under multiple directories into Elasticsearch and parse them into JSON format, possibly adding some tags. Can this be done with Elasticsearch and Logstash, and if so, how?
Thank you!
It is possible. Point Logstash at your XML files and use tagging to tag different files differently, which determines how they will be handled by Logstash down the road. Inside Logstash you can set up filters to add tags and other fields, and in the output section you can specify which files get added to which index inside Elasticsearch.
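A minimal sketch of such a pipeline (the paths, tags, and index name are placeholders; a real setup with XML documents spanning multiple lines would also need a multiline codec on the input):
input {
  file {
    path => ["/data/app1/**/*.xml", "/data/app2/**/*.xml"]
    start_position => "beginning"
    tags => ["xml"]
  }
}
filter {
  # parse the raw line as XML and put the parsed structure under "doc"
  xml {
    source => "message"
    target => "doc"
  }
  mutate { add_tag => ["xml-parsed"] }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "xml-logs-%{+YYYY.MM.dd}"
  }
}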

How to handle multiline log entries in Flume

I have just started playing with Flume. I have a question on how to handle log entries that span multiple lines, such as stack traces during error conditions, as a single event.
For example, treat the below as a single event rather than one event for each line
2013-04-05 05:00:41,280 ERROR (ClientRequestPool-PooledExecutionEngine-Id#4 ) [com.ms.fw.rexs.gwy.api.service.AbstractAutosysJob] job failed for 228794
java.lang.NullPointerException
at com.ms.fw.rexs.core.impl.service.job.ReviewNotificationJobService.createReviewNotificationMessageParameters(ReviewNotificationJobService.java:138)
....
I have configured the source to a spooldir type.
Thank You
Suman
As the documentation states, the spooldir source creates a new event for each string of characters separated by a newline in the input data. You can modify this behaviour by creating your own sink (see http://flume.apache.org/FlumeDeveloperGuide.html#sink) based on the spooldir source's code. You'll need to implement a parsing algorithm that can detect the start and end line of a message based on some criteria.
Also, there are other sources, such as Syslog UDP and Avro, that treat an entire received message as a single event, so you can use them without any modification.
You'll want to look into extending the line deserializer used by the spooling directory source. One simple (but potentially flawed) approach would be to delimit on newlines, but append lines that are prefixed with a set number of spaces to the previous line.
In fact there is already a Jira issue for this with a patch:
https://issues.apache.org/jira/browse/FLUME-2779
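For reference, a hedged sketch of how a custom deserializer could be wired into a spooldir source (the agent/channel names, paths, and the com.example class are hypothetical; only the deserializer property itself is a standard spooldir option):
agent.sources = spool-src
agent.channels = mem-ch
agent.sources.spool-src.type = spooldir
agent.sources.spool-src.spoolDir = /path/to/logs/spool
agent.sources.spool-src.channels = mem-ch
# Replace the default LINE deserializer with one that appends indented
# continuation lines (e.g. stack trace frames) to the previous event
agent.sources.spool-src.deserializer = com.example.flume.MultiLineDeserializer$Builder
agent.channels.mem-ch.type = memory
agent.channels.mem-ch.capacity = 10000
# (sink configuration omitted)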
