Fluentd parsing error with JSON and non-JSON ending logs - fluentd

I have a log file where consecutive lines come in different formats: one ends with a JSON object, the other does not.
Type 1:
2020-01-29 09:38:09 [/] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer | | Tomcat started on port(s): 9021 (http) with context path '/service'
Type 2 (with JSON at the end):
2020-01-29 09:38:09 [/] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer | | Tomcat started on port(s): 9021 (http) with context path '/service' {'key':'value'}
I am using the configuration below in my fluentd.conf to parse the log file:
<source>
  @type tail
  path /root/logs/my-service.log
  pos_file /root/logs/my-service.log.pos
  tag my-service.log
  <parse>
    @type multiline
    format_firstline /^\d{4}-\d{1,2}-\d{1,2}/
    format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})\s+\[(?<context>\w*?(?=\/))\/(?<request>[^\s]*)\]\s+\[(?<thread>[^\s]+)\]\s+(?<level>[^\s]+)\s+(?<message>.+?(?=\{)|.*)(?<params>.*)/
  </parse>
</source>
This format works for the Type 2 log format mentioned above, but throws the error below for Type 1:
0 dump an error event: error_class=ArgumentError error="params does not exist"
This is because there is no JSON at the end of Type 1 lines.
How do I handle this scenario? A fix would really help resolve an ongoing issue in my EFK stack.
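One way to make both line types parse, sketched as a simplified variant of the pattern above (the group names context, thread, level, logger, message, and params are assumptions, and the literal | | separator is taken from the sample lines): make the trailing JSON group optional and stop message at the first {, so that lines without JSON still match:

format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})\s+\[(?<context>[^\]]*)\]\s+\[(?<thread>[^\]]+)\]\s+(?<level>\S+)\s+(?<logger>\S+)\s+\|\s+\|\s+(?<message>[^{]*?)\s*(?<params>\{.*\})?$/

With (?<params>\{.*\})? made optional and the pattern anchored at $, a Type 1 line matches with no params value and a Type 2 line captures the JSON object in params, so the parser no longer fails on lines without JSON.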

Related

fluentd TimeParser Error - Invalid Time Format

I'm trying to get some Cisco Meraki MX firewall logs pointed at our Kubernetes cluster using fluentd pods. I'm using the syslog source plugin and am able to get the logs generated, but I keep getting this error:
2022-06-30 16:30:39 -0700 [error]: #0 invalid input data="<134>1 1656631840.701989724 838071_MT_DFRT urls src=10.202.11.05:39802 dst=138.128.172.11:443 mac=90:YE:F6:23:EB:T0 request: UNKNOWN https://f3wlpabvmdfgjhufgm1xfd6l2rdxr.b3-4-eu-w01.u5ftrg.com/..." error_class=Fluent::TimeParser::TimeParseError error="invalid time format: value = 1 1656631840.701989724 838071_ME_98766, error_class = ArgumentError, error = string doesn't match"
Everything else seems to be fine, but it appears that the Meraki is sending its logs with an epoch timestamp, and the fluentd syslog plugin does not like it.
I have a vanilla config:
<source>
@type syslog
port 5140
tag meraki
</source>
Is there a way to transform the time strings into something fluentd will accept? Or what am I missing here?
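One possible workaround, assuming every Meraki message has the shape <PRI>1 <epoch.fraction> <host> <payload> seen in the error above: skip the syslog parser and read the raw UDP stream with a regexp parser that understands epoch timestamps. The capture names are placeholders; if your fluentd version rejects %s.%N, capture only the integer part in the time group and use plain %s:

<source>
  @type udp
  port 5140
  tag meraki
  <parse>
    @type regexp
    expression /^<(?<pri>\d+)>1 (?<time>\d+\.\d+) (?<hostname>\S+) (?<message>.*)$/
    time_key time
    time_format %s.%N
  </parse>
</source>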

Fluentd to format Java exception log in single line

I have a Java application deployed in GCP.
Its log file is pointed at Stackdriver Logging using fluentd.
Java exception traces are emitted across separate lines, so Stackdriver Logging is unable to capture them as error/warning entries.
I need to format my Java application's exception traces into a single line,
and to differentiate info, error, and warning levels.
My fluentd configuration:
<source>
type tail
format none
path /home/app/*-local-app-output.log
pos_file /var/lib/google-fluentd/pos/local-app.pos
read_from_head true
tag local-app
</source>
I also tried with:
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
time_format %b %d %H:%M:%S
Current output: (screenshot omitted)
Whereas when deploying the same application in Kubernetes Engine, it has a separate log category field: info, warn, error, critical.
Can anyone help me with this?
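Two things stand out in that multiline attempt. First, time_format %b %d %H:%M:%S does not match the %Y-%m-%d timestamp that format1 captures. Second, because format_firstline only starts a new record at a date-prefixed line, the stack-trace lines that follow are folded into the message field of the preceding record, which is exactly the single-line behavior you want. A sketch under those assumptions (paths and tag taken from your config):

<source>
  @type tail
  path /home/app/*-local-app-output.log
  pos_file /var/lib/google-fluentd/pos/local-app.pos
  read_from_head true
  tag local-app
  format multiline
  format_firstline /^\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<severity>[^\s]+)(?<message>.*)/
  time_format %Y-%m-%d %H:%M:%S
</source>

Renaming the level capture to severity is deliberate: the Stackdriver logging agent treats a field named severity in the structured record as the log level, which should give you the info/warn/error categories you see on Kubernetes Engine.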

Graylog 2.2.0-beta.1 in Docker with UDP input: Unable to load default stream

I'm trying to use graylog2 to collect logs from Docker containers. The docs say that only the UDP GELF input is supported for this purpose.
I'm using docker-compose to run the graylog server. See gist for all files used: https://gist.github.com/olegabr/7f5190c453bb63c71dabf151d2373c2f.
And I'm using this command to test it:
sendip -p ipv4 -is 127.0.0.1 -p udp -us 5070 -ud 12201 -d '{"version": "1.1","host":"example.org","short_message":"Short message","full_message":"Backtrace here\n\nmore stuff","level":1,"_user_id":9001,"_some_info":"foo","_some_env_var":"bar"}' -v 127.0.0.1
The server receives this message, but it cannot process it. I see the following in the graylog2 logs:
2016-12-09 11:53:20,125 WARN : org.graylog2.bindings.providers.DefaultStreamProvider - Unable to load default stream, tried 1 times, retrying every 500ms. Processing is blocked until this succeeds.
2016-12-09 11:53:25,129 WARN : org.graylog2.bindings.providers.DefaultStreamProvider - Unable to load default stream, tried 11 times, retrying every 500ms. Processing is blocked until this succeeds.
etc., with many, many similar lines.
The API call curl http://admin:123456@127.0.0.1:9000/api/count/total returns
{"events":0}
In the server logs I see that the default stream was initialized:
mongo_1 | 2016-12-09T11:51:12.522+0000 I INDEX [conn3] build index on: graylog.pipeline_processor_pipelines_streams properties: { v: 2, unique: true, key: { stream_id: 1 }, name: "stream_id_1", ns: "graylog.pipeline_processor_pipelines_streams" }
graylog_1 | 2016-12-09 11:51:13,408 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical, running forever.
graylog_1 | 2016-12-09 11:51:13,424 INFO : org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration - Legacy default stream has no connections, no migration needed.
graylog_1 | 2016-12-09 11:51:13,487 INFO : org.graylog2.migrations.V20160929120500_CreateDefaultStreamMigration - Successfully created default stream: All messages
graylog_1 | 2016-12-09 11:51:13,653 INFO : org.graylog2.migrations.V20161125142400_EmailAlarmCallbackMigration - No streams needed to be migrated.
graylog_1 | 2016-12-09 11:51:13,662 INFO : org.graylog2.migrations.V20161125161400_AlertReceiversMigration - No streams needed to be migrated.
graylog_1 | 2016-12-09 11:51:13,672 INFO : org.graylog2.migrations.V20161130141500_DefaultStreamRecalcIndexRanges - Cluster not connected yet, delaying migration until it is reachable.
So why can the default stream not be loaded when the message arrives? And why is it needed in the first place?
I've tried to find similar reports on the web, but with no success.
This has nothing to do with the UDP input per se.
Graylog 2.2.0-beta.1 is broken and shouldn't be used. Please downgrade to Graylog 2.1.2 (the latest stable version) or wait for Graylog 2.2.0-beta.2.
See https://groups.google.com/forum/#!searchin/graylog2/docker|sort:date/graylog2/gCycC3_K3vU/EL-Lz_uNDQAJ for a related post on the Graylog mailing list.
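If you are using the docker-compose setup from the gist, pinning the Graylog image to the stable 2.1 line is the quickest way to downgrade. A minimal sketch; the exact tag is an assumption, so check the available graylog2/server tags on Docker Hub:

graylog:
  # pin to a 2.1.x tag instead of 2.2.0-beta.1
  image: graylog2/server:2.1.2-1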
I have the same trouble.
I just set up Graylog and configured a GELF UDP input on port 12209,
then tested it twice with:
docker run --log-driver=gelf --log-opt gelf-address=udp://127.0.0.1:12209 busybox echo Hello Graylog
In the UI I saw:
2 messages in process buffer
2 unprocessed messages are currently in the journal, in 1 segments.
0 messages have been appended in the last second, 0 messages have been read in the last second.
and I am still getting:
2016-12-09 12:41:23,715 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/584aa67308813b00010d009e] is now RUNNING
2016-12-09 12:41:43,666 WARN : org.graylog2.bindings.providers.DefaultStreamProvider - Unable to load default stream, tried 1 times, retrying every 500ms. Processing is blocked until this succeeds.
Has anyone found a solution?

fluentd not capturing milliseconds for time

I am using fluentd version 0.14.6. I want to have the milliseconds (or better) captured by fluentd and then passed on to ElasticSearch, so that the entries are shown in the correct order. Here is my fluentd.conf:
<source>
@type tail
path /home/app/rails/current/log/development.log
pos_file /home/app/fluentd/rails.pos
tag rails.access
format /^\[(?<time>\S{10}T\S{8}.\d{3})\] \[(?<remoteIP>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\] \[(?<requestUUID>\w+)\] \[AWSELB=(?<awselb>\w+)\] (?<message>.*)/
time_format %Y-%m-%dT%H:%M:%S.%L
</source>
<match rails.access>
@type stdout
time_as_integer false
</match>
And here is a sample log entry from Rails:
[2016-09-27T19:10:05.732] [xxx.xxx.xxx.46] [46171c9870ab2d06bc3a9a0bb02] [AWSELB=97B1C1B51866B68887CF7F5B8C352C45CA31592743CF389F006C541D59ED5E01852E7EF67C807B1CFC8BC145D569BCB9859AFCA73D10A87920CF2269DE5A47D16536B33873DEEF4A24967661232B38E564] Completed 200 OK in 39.8ms (Views: 0.5ms | ActiveRecord: 14.9ms)
This all parses fine, except that the milliseconds are dropped. Here is a result from STDOUT:
2016-09-27 19:43:56 +0000 rails.access: {"remoteIP":"xxx.xxx.xxx.46","requestUUID":"0238cb3d812534487181b2c54bd20","awselb":"97B1C1B51866B68887CF7F5B8C352C43CA21592743CF389F006C541D59ED5E01852E7EF67C807B1CFC8BC145D569BCB9859AFCA73D10A87920CF2269DE5A47D16536B33873DEEF4A24967661232B38E564","message":""}
I have searched SO, but the two posts listed are from before this PR, which is supposed to add millisecond support and has been merged. The PR mentions adding a time_as_integer option, which I have done. I tried setting it to both true and false, as there is some confusion in the PR, but it made no difference. I also tried putting it into the source, but that threw an error.
I also looked at this post, which is trying to get to nanoseconds, which I don't need. It is also not a good solution for me, as the time would then come from fluentd, not Rails.
Thanks for your help!
Your source is properly configured and the milliseconds are available to the output plugin. The stdout output plugin simply does not output milliseconds.
You can test the milliseconds availability by using the file output plugin.
<match rails.access>
@type file
path example.out.log
time_format %Y-%m-%dT%H:%M:%S.%L
</match>
The Elasticsearch output plugin does take milliseconds into account.
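For reference, a minimal sketch of the Elasticsearch side (host and port are placeholders): with logstash_format enabled, fluent-plugin-elasticsearch writes the event time into an @timestamp field, and per the plugin behavior described above the subsecond part is preserved:

<match rails.access>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
</match>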

Flume agentSink "Unable to load output format plugin class"

I'm getting the following error and I have no idea why. If I change the sink to "console", it works fine. I'm just trying to recreate an example from the Flume documentation, except across two different nodes. This is using CDH3.
2011-10-20 17:41:13,046 [main] WARN text.FormatFactory: Unable to load output format plugin class - Class not found
2011-10-20 17:41:13,065 [main] INFO agent.FlumeNode: Loading spec from command line: 'foo:console|agentSink("somehost",35853);'
2011-10-20 17:41:13,228 [main] WARN agent.FlumeNode: Caught exception loading node:null
I'm trying to run Flume as follows:
flume node_nowatch -1 -s -n foo -c 'foo:console|agentSink("somehost",35853);'
Thanks in advance.
