Fluentd to format Java exception log in single line - fluentd

I have a Java application deployed in GCP, and I pointed its log file at Stackdriver Logging using fluentd. Each line of a Java exception trace is logged as a separate entry, so Stackdriver Logging is unable to capture it as an error/warning. I need my Java application's exception trace formatted as a single line, and I need to differentiate info, error, and warning. My fluentd configuration:
<source>
type tail
format none
path /home/app/*-local-app-output.log
pos_file /var/lib/google-fluentd/pos/local-app.pos
read_from_head true
tag local-app
</source>
I also tried:
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
time_format %b %d %H:%M:%S
In the current output there is no severity distinction. When I deploy the same application on Kubernetes Engine, by contrast, the logs have a separate category field: info, warn, error, critical. Can anyone help me with this?
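One direction that may help, as a sketch (not verified against this exact setup): fold the stack-trace lines into the preceding record with the multiline parser, capture the level into a field named severity (which the Logging agent maps to the entry's severity), and keep time_format consistent with the captured timestamp; note that the %b %d %H:%M:%S format tried above does not match a %Y-%m-%d date:
<source>
@type tail
path /home/app/*-local-app-output.log
pos_file /var/lib/google-fluentd/pos/local-app.pos
read_from_head true
tag local-app
<parse>
@type multiline
# a new record starts with a date; any other line (the stack trace) is appended to message
format_firstline /^\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<severity>[^\s]+)(?<message>.*)/
time_format %Y-%m-%d %H:%M:%S
</parse>
</source>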

Related

fluentd TimeParser Error - Invalid Time Format

I'm trying to get some Cisco Meraki MX firewall logs pointed to our Kubernetes cluster using fluentd pods. I'm using the syslog source plugin and am able to get the logs generated, but I keep getting this error:
2022-06-30 16:30:39 -0700 [error]: #0 invalid input data="<134>1 1656631840.701989724 838071_MT_DFRT urls src=10.202.11.05:39802 dst=138.128.172.11:443 mac=90:YE:F6:23:EB:T0 request: UNKNOWN https://f3wlpabvmdfgjhufgm1xfd6l2rdxr.b3-4-eu-w01.u5ftrg.com/..." error_class=Fluent::TimeParser::TimeParseError error="invalid time format: value = 1 1656631840.701989724 838071_ME_98766, error_class = ArgumentError, error = string doesn't match"
Everything seems to be fine, but it appears the Meraki is sending its logs with an epoch timestamp, and the fluentd syslog plugin does not like that.
I have a vanilla config:
<source>
@type syslog
port 5140
tag meraki
</source>
Is there a way to transform the time strings into something fluentd will accept? Or what am I missing here?
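One possible workaround, sketched under the assumption that the packets otherwise follow the RFC 5424 layout shown in the error above: receive the raw messages with the udp input instead of the syslog input, and parse the epoch timestamp yourself with a regexp parser and time_type float:
<source>
@type udp
port 5140
tag meraki
<parse>
@type regexp
# capture the epoch timestamp Meraki sends where an RFC 3339 time would normally be
expression /^<(?<pri>\d+)>1 (?<time>\d+\.\d+) (?<host>\S+) (?<message>.*)$/
time_key time
time_type float
</parse>
</source>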

Fluentd Parsing error with JSON and Non JSON ending logs

I have a log file where consecutive lines come in two different formats: one has a JSON object at the end, the other does not.
Type 1:
2020-01-29 09:38:09 [/] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer | | Tomcat started on port(s): 9021 (http) with context path '/service'
Type 2 (with JSON at the end):
2020-01-29 09:38:09 [/] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer | | Tomcat started on port(s): 9021 (http) with context path '/service' {'key':'value'}
I am using the configuration below in my fluentd.conf to parse the log file:
<source>
@type tail
path /root/logs/my-service.log
pos_file /root/logs/my-service.log.pos
tag my-service.log
<parse>
@type multiline
format_firstline /^\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})\s+\[(?<context>[^\]]*)\]\s+\[(?<thread>[^\s]+)\]\s+(?<level>[^\s]+)\s+(?<message>.+?(?=\{))(?<params>.*)/
</parse>
</source>
This format works for the Type 2 log format mentioned above, but throws the following error for Type 1, since no JSON is available at the end of the line:
0 dump an error event: error_class=ArgumentError error="params does not exist"
How do I handle such a scenario? This would really help fix an ongoing issue in my EFK stack.
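One way to handle both shapes, as a sketch (the group names mirror the reconstruction above and are illustrative): make the trailing JSON capture optional so the same pattern matches lines with and without it:
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})\s+\[(?<context>[^\]]*)\]\s+\[(?<thread>[^\s]+)\]\s+(?<level>[^\s]+)\s+(?<message>.*?)\s*(?<params>\{.*\})?$/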

fluentd - how to source log file name with timestamp

Catalina logs are generated with a timestamp in the file name, e.g.:
catalina.2018-11-05.log
catalina.2018-12-03.log
catalina.2018-12-10.log
I would like fluentd to read the latest log file, based on the timestamp in the file name. Can you suggest what the source path should look like in td-agent.conf?
<source>
@type tail
path D:\apache-tomcat-9.0.12\logs\catalina.[TODAY].log
pos_file C:\opt\td-agent\javalogs.log.pos
tag javalogs
<parse>
@type json
</parse>
</source>
<match javalogs>
@type stdout
</match>
Try the following path syntax:
path D:\apache-tomcat-9.0.12\logs\catalina.%Y-%m-%d.log
Note: make sure your files are created in the same timezone as the fluentd agent process, so that it tails the correctly dated files. The fluentd process also needs read permissions on the catalina files.
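In context, the source block would then look roughly like this (a sketch; in_tail re-expands the date placeholders periodically, so each new day's file is picked up):
<source>
@type tail
path D:\apache-tomcat-9.0.12\logs\catalina.%Y-%m-%d.log
pos_file C:\opt\td-agent\javalogs.log.pos
tag javalogs
<parse>
@type json
</parse>
</source>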

google-fluentd : change severity in Cloud Logging log_level

We are running Spark jobs (a lot of Spark streaming) on Google Cloud Dataproc clusters and use Cloud Logging to collect all the logs they generate. Currently the jobs emit a lot of "INFO" messages, which grows the total log volume to a few TBs, so I want to edit the google-fluentd config to restrict the log level to "ERROR" instead of "INFO". I tried setting "log_level error", but it did not work. The comment section in /etc/google-fluentd/google-fluentd.conf also mentions: # Currently severity is a seperate field from the Cloud Logging log_level.
# Fluentd config to tail the hadoop, hive, and spark message log.
# Currently severity is a seperate field from the Cloud Logging log_level.
<source>
type tail
format multi_format
<pattern>
format /^((?<time>[^ ]* [^ ]*) *(?<severity>[^ ]*) *(?<class>[^ ]*): (?<message>.*))/
time_format %Y-%m-%d %H:%M:%S,%L
</pattern>
<pattern>
format none
</pattern>
path /var/log/hadoop*/*.log,/var/log/hadoop-yarn/userlogs/**/stderr,/var/log/hive/*.log,/var/log/spark/*.log,
pos_file /var/tmp/fluentd.dataproc.hadoop.pos
refresh_interval 2s
read_from_head true
tag raw.tail.*
</source>
Correct. As the comment states, log_level and severity are not the same, which is confusing at best. log_level configures the verbosity of the component's own logger, whereas severity is the field that Stackdriver Logging ingests.
To make fluentd exclude any severity below ERROR, you can add a grep filter to /etc/google-fluentd/google-fluentd.conf that explicitly excludes those severities by name. At some point before the <match **> block, add the following:
<filter raw.tail.**>
@type grep
exclude1 severity (DEBUG|INFO|NOTICE|WARNING)
</filter>
This checks the record's severity field and rejects the event if the value matches the regex.
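On newer fluentd versions the same filter can be written with the grep plugin's <exclude> block; a sketch equivalent to the excludeN shorthand above:
<filter raw.tail.**>
@type grep
<exclude>
key severity
pattern /DEBUG|INFO|NOTICE|WARNING/
</exclude>
</filter>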

fluentd not capturing milliseconds for time

I am using fluentd version 0.14.6. I want fluentd to capture milliseconds (or better) and pass them on to Elasticsearch, so that entries are shown in the correct order. Here is my fluentd.conf:
<source>
@type tail
path /home/app/rails/current/log/development.log
pos_file /home/app/fluentd/rails.pos
tag rails.access
format /^\[(?<time>\S{10}T\S{8}.\d{3})\] \[(?<remoteIP>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\] \[(?<requestUUID>\w+)\] \[AWSELB=(?<awselb>\w+)\] (?<message>.*)/
time_format %Y-%m-%dT%H:%M:%S.%L
</source>
<match rails.access>
@type stdout
time_as_integer false
</match>
And here is a sample log entry from Rails
[2016-09-27T19:10:05.732] [xxx.xxx.xxx.46] [46171c9870ab2d06bc3a9a0bb02] [AWSELB=97B1C1B51866B68887CF7F5B8C352C45CA31592743CF389F006C541D59ED5E01852E7EF67C807B1CFC8BC145D569BCB9859AFCA73D10A87920CF2269DE5A47D16536B33873DEEF4A24967661232B38E564] Completed 200 OK in 39.8ms (Views: 0.5ms | ActiveRecord: 14.9ms)
This all parses fine, except that the milliseconds are dropped. Here is a result from STDOUT:
2016-09-27 19:43:56 +0000 rails.access: {"remoteIP":"xxx.xxx.xxx.46","requestUUID":"0238cb3d812534487181b2c54bd20","awselb":"97B1C1B51866B68887CF7F5B8C352C43CA21592743CF389F006C541D59ED5E01852E7EF67C807B1CFC8BC145D569BCB9859AFCA73D10A87920CF2269DE5A47D16536B33873DEEF4A24967661232B38E564","message":""}
I have searched SO, but the two posts listed predate this PR (now merged), which is supposed to add millisecond support. The PR mentions adding a time_as_integer option, which I have done; I tried setting it to both true and false, as there is some confusion in the PR, but it made no difference. I also tried putting it into the source, but that threw an error.
I also looked at this post, which is about reaching nanosecond precision, which I don't need. It is also not a good solution for me, as the time would then come from fluentd rather than Rails.
Thanks for your help!
Your source is properly configured, and the milliseconds are available to the output plugin; the stdout output plugin simply does not output them. You can verify that the milliseconds are there by using the file output plugin:
<match rails.access>
@type file
path example.out.log
time_format %Y-%m-%dT%H:%M:%S.%L
</match>
The Elasticsearch output plugin does take milliseconds into account.
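For example, a minimal Elasticsearch match block (a sketch assuming fluent-plugin-elasticsearch and an Elasticsearch instance on localhost) will carry the subsecond part of the event time through:
<match rails.access>
@type elasticsearch
host localhost
port 9200
logstash_format true
</match>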
