Stackdriver custom multiline logging, time format - fluentd

I've been trying to set up a custom multiline log parser to get logs into Stackdriver with some readable fields. Currently it looks like this:
<source>
type tail
read_from_head true
path /root/ansible.log
pos_file /var/lib/google-fluentd/pos/ansible.pos
time_format "%a %b %e %T %Z %Y"
format multiline
format_firstline /Started ansible run at/
format1 /Started ansible run at (?<timestart>[^\n]+)\n(?<body>.*)/
format2 /PLAY RECAP.*/
format3 /ok=(?<ok>\d+)\s+changed=(?<changed>\d+)\s+unreachable=(?<unreachable>\d+)\s+failed=(?<failed>\d+).*/
format4 /Finished ansible run at (?<timeend>[^\n]+)/
tag ansible
</source>
It follows the specification at http://docs.fluentd.org/v0.12/articles/parser_multiline, and it works. But it works without a proper timestamp - timestart and timeend are just plain fields in the JSON. So in this current state the time_format setting is useless, because none of the regexes captures a time variable. It does aggregate all the variables I need, logs show up in Stackdriver when I run the fluentd service, and all is almost happy.
However, when I rename one of those time fields to time, trying to actually assign a Stackdriver timestamp to the entry, it doesn't work. The fluentd log on the machine says that the worker started and parsed everything, but the logs don't show up in the Stackdriver console at all.
timestart and timeend look like Fri Jun 2 20:39:58 UTC 2017 or something along those lines. The time format specifications are at http://ruby-doc.org/stdlib-2.4.1/libdoc/time/rdoc/Time.html#method-c-strptime and I've checked and double checked them too many times and I can't figure out what I'm doing wrong.
EDIT: another detail: when I try to parse out the time variable, while the logs don't show up in the Stackdriver console, the appropriate tag (in this case ansible) shows up in the list of tags. It's just that the results are empty.

You're correct that the Stackdriver logging agent looks for the timestamp in the 'time' field, but it uses Ruby's Time.iso8601 to parse that value (falling back on Time.at on error). The string you quoted (Fri Jun 2 20:39:58 UTC 2017) is not in either of those formats, so it fails to parse it (you could probably see the error in /var/log/google-fluentd/google-fluentd.log).
You could add a record_transformer plugin to your config to change your parsed date to the right format (hint: enable_ruby is your friend). Something like:
<filter foo.bar>
@type record_transformer
enable_ruby
<record>
time ${Time.strptime(record['time'], '%a %b %d %T %Z %Y').iso8601}
</record>
</filter>
should work...
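For a quick sanity check of that conversion outside fluentd, the same transformation can be reproduced in, say, Python (just an illustration; the filter above evaluates Ruby via enable_ruby, but the strptime directives map closely):
from datetime import datetime, timezone

raw = "Fri Jun 2 20:39:58 UTC 2017"  # same shape as the timestart/timeend values
# %Z only matches the zone name here, so attach UTC explicitly before printing ISO 8601
parsed = datetime.strptime(raw, "%a %b %d %H:%M:%S %Z %Y").replace(tzinfo=timezone.utc)
print(parsed.isoformat())  # 2017-06-02T20:39:58+00:00, which Time.iso8601 accepts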

Related

Fluentd regex filter removes other keys

I'm getting a message into fluentd with a few keys already populated from previous stages (fluent-bit on another host). I'm trying to parse the content of the log field as follows:
# Parse app_logs
<filter filter.app.backend.app_logs>
@type parser
key_name log
<parse>
@type regexp
expression /^(?<module>[^ ]*) *(?<time>[\d ,-:]*) (?<severity>[^ ]*) *(?<file>[\w\.]*):(?<function>[\w_]*) (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S,%L
</parse>
</filter>
It works (kind of), as it extracts the fields as expected. That said, it removes all the other fields that were there before.
Example message before the filter:
filter.app.backend.app_logs: {"docker.container_name":"intranet-worker","docker.container_id":"98b7784f27f93a056c05b4c5066c06cb5e23d7eeb436a6e4a66cdf8ff045d29f","time":"2022-06-10T17:00:00.248932151Z","log":"org-worker 2022-06-10 19:00:00,248 INFO briefings.py:check_expired_registrations Checking for expired registrations\n","docker.container_image":"registry.my-org.de/org-it-infrastructure/org-fastapi-backend/backend-worker:v0-7-11","stream":"stdout","docker.container_started":"2022-06-10T14:57:27.925959889Z"}
After the filter, the message looks like this (it's a slightly different one, but from the same stream):
filter.app.backend.app_logs: {"module":"mksp-api","severity":"DEBUG","file":"authToken.py","function":"verify_token","message":"Token is valid, checking permission"}
So only the parsed fields are kept, the rest is removed. Can I somehow use that filter to add the fields to the message, instead of replacing it?
This scenario is actually covered in the documentation; it's not part of the regexp parser documentation but of the corresponding parser filter documentation:
reserve_data
Keeps the original key-value pair in the parsed result.
Therefore, the following configuration works:
<filter filter.app.backend.app_logs>
@type parser
key_name log
reserve_data true
<parse>
@type regexp
expression /^(?<module>[^ ]*) *(?<time>[\d ,-:]*) (?<severity>[^ ]*) *(?<file>[\w\.]*):(?<function>[\w_]*) (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S,%L
</parse>
</filter>

Can fluent-bit parse multiple types of log lines from one file?

I have a fairly simple Apache deployment in k8s using fluent-bit v1.5 as the log forwarder. My setup is nearly identical to the one in the repo below. I'm running AWS EKS and outputting the logs to AWS ElasticSearch Service.
https://github.com/fluent/fluent-bit-kubernetes-logging
The ConfigMap is here: https://github.com/fluent/fluent-bit-kubernetes-logging/blob/master/output/elasticsearch/fluent-bit-configmap.yaml
The Apache access (-> /dev/stdout) and error (-> /dev/stderr) log lines are both in the same container logfile on the node.
The problem I'm having is that fluent-bit doesn't seem to autodetect which Parser to use (I'm not sure if it's supposed to), and we can only specify one parser in the deployment's annotation section; I've specified apache.
So in the end, the error log lines, which are written to the same file but come from stderr, are not parsed.
Should I be sending the logs from fluent-bit to fluentd to handle the error files, assuming fluentd can handle this, or should I somehow pump only the error lines back into fluent-bit, for parsing?
Am I missing something?
Thanks!
I was able to apply a second (and third) parser to the logs by using a FluentBit FILTER with the parser plugin (Name parser), like below.
Documented here: https://docs.fluentbit.io/manual/pipeline/filters/parser
[FILTER]
Name parser
Match kube.*
Parser apache_error_custom
Parser apache_error
Preserve_Key On
Reserve_Data On
Key_Name log
Fluent Bit is able to run multiple parsers on an input.
If you add multiple parsers to your Parser filter on separate lines (this applies to non-multiline parsing; multiline parsing supports a comma-separated list), e.g.:
[Filter]
Name Parser
Match *
Parser parse_common_fields
Parser json
Key_Name log
The 1st parser parse_common_fields will attempt to parse the log, and only if it fails will the 2nd parser json attempt to parse these logs.
If you want to parse a log and then parse it again (for example, when only part of your log is JSON), then you'll want to add two parsers one after the other, like:
[Filter]
Name Parser
Match *
Parser parse_common_fields
Key_Name log
[Filter]
Name Parser
Match *
Parser json
# This is the key from the parse_common_fields regex that we expect to contain JSON
Key_Name log
Here is an example you can run to test this out:
Example
Attempting to parse a log but some of the log can be JSON and other times not.
Example log lines
2022-07-28T22:03:44.585+0000 [http-nio-8080-exec-3] [2a166faa-dbba-4210-a328-774861e3fdef][0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f] INFO SomeService:000 - Using decorator records threshold: 0
2022-07-29T11:36:59.236+0000 [http-nio-8080-exec-3] [][] INFO CompleteOperationLogger:25 - {"action":"Complete","operation":"healthcheck","result":{"outcome":"Succeeded"},"metrics":{"delayBeforeExecution":0,"duration":0},"user":{},"tracking":{}}
parser.conf
[PARSER]
Name parse_common_fields
Format regex
Regex ^(?<timestamp>[^ ]+)\..+ \[(?<log_type>[^ \[\]]+)\] \[(?<transaction_id>[^ \[\]]*)\]\[(?<transaction_id2>[^ \[\]]*)\] (?<level>[^ ]*)\s+(?<service_id>[^ ]+) - (?<log>.+)$
Time_Format %Y-%m-%dT%H:%M:%S
Time_Key timestamp
[PARSER]
Name json
Format json
fluentbit.conf
[SERVICE]
Flush 1
Log_Level info
Parsers_File parser.conf
[INPUT]
NAME dummy
Dummy {"log": "2022-07-28T22:03:44.585+0000 [http-nio-8080-exec-3] [2a166faa-dbba-4210-a328-774861e3fdef][0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f] INFO AnonymityService:245 - Using decorator records threshold: 0"}
Tag testing.deanm.non-json
[INPUT]
NAME dummy
Dummy {"log": "2022-07-29T11:36:59.236+0000 [http-nio-8080-exec-3] [][] INFO CompleteOperationLogger:25 - {\"action\":\"Complete\",\"operation\":\"healthcheck\",\"result\":{\"outcome\":\"Succeeded\"},\"metrics\":{\"delayBeforeExecution\":0,\"duration\":0},\"user\":{},\"tracking\":{}}"}
Tag testing.deanm.json
[Filter]
Name Parser
Match *
Parser parse_common_fields
Key_Name log
[Filter]
Name Parser
Match *
Parser json
Key_Name log
[OUTPUT]
Name stdout
Match *
Results
After the parse_common_fields filter runs on the log lines, it successfully parses the common fields, and log will be either a plain string or an escaped JSON string
First Pass
[0] testing.deanm.non-json: [1659045824.000000000, {"log_type"=>"http-nio-8080-exec-3", "transaction_id"=>"2a166faa-dbba-4210-a328-774861e3fdef", "transaction_id2"=>"0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f", "level"=>"INFO", "service_id"=>"AnonymityService:245", "log"=>"Using decorator records threshold: 0"}]
[0] testing.deanm.json: [1659094619.000000000, {"log_type"=>"http-nio-8080-exec-3", "level"=>"INFO", "service_id"=>"CompleteOperationLogger:25", "log"=>"{"action":"Complete","operation":"healthcheck","result":{"outcome":"Succeeded"},"metrics":{"delayBeforeExecution":0,"duration":0},"user":{},"tracking":{}}"}]
Once the json Filter parses the logs, the JSON payload is also parsed correctly
Second Pass
[0] testing.deanm.non-json: [1659045824.000000000, {"log_type"=>"http-nio-8080-exec-3", "transaction_id"=>"2a166faa-dbba-4210-a328-774861e3fdef", "transaction_id2"=>"0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f", "level"=>"INFO", "service_id"=>"AnonymityService:245", "log"=>"Using decorator records threshold: 0"}]
[0] testing.deanm.json: [1659094619.000000000, {"action"=>"Complete", "operation"=>"healthcheck", "result"=>{"outcome"=>"Succeeded"}, "metrics"=>{"delayBeforeExecution"=>0, "duration"=>0}, "user"=>{}, "tracking"=>{}}]
I didn't see this for FluentBit, but for Fluentd:
https://github.com/fluent/fluentd-kubernetes-daemonset
https://github.com/repeatedly/fluent-plugin-multi-format-parser#configuration
Note that format none as the last option means the log line is kept as is (i.e., plain text) if nothing else matched.
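For reference, a parse section using that plugin looks roughly like this (a sketch adapted from the plugin's README; the apache and json patterns are only illustrative):
<parse>
@type multi_format
<pattern>
format apache
</pattern>
<pattern>
format json
time_key timestamp
</pattern>
<pattern>
format none
</pattern>
</parse>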
You can also use FluentBit as a pure log collector, and then have a separate Deployment with Fluentd that receives the stream from FluentBit, parses it, and handles all the outputs. In that case, use the forward output in FluentBit and a source with @type forward in Fluentd. Docs: https://docs.fluentbit.io/manual/pipeline/outputs/forward
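A minimal sketch of that wiring (the hostname below is a placeholder): on the FluentBit side,
[OUTPUT]
Name forward
Match *
Host fluentd.example.internal
Port 24224
and on the Fluentd side:
<source>
@type forward
port 24224
bind 0.0.0.0
</source>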

Getting the start time of an LSF job

One can use bjobs to get the start time of an LSF job as such:
bjobs -noheader -o start_time $JOB
However, this returns low-fidelity (e.g., seconds are not necessarily included) human readable output. Higher (but not necessarily "full", I believe) fidelity output can be parsed out of:
bjobs -l $JOB
...but that's rather messy. Also, as I alluded to, I think the output can still be ambiguous; the year doesn't always seem to be included here and I don't recall seeing any time zone information.
How can I get an LSF job's start time deterministically and unambiguously (say, as a Unix epoch)?
In my experience, the seconds are always included in start_time, and the timestamp is exactly the same as the one reported by bjobs -l $JOBID.
To show the year in the date, you have to set LSB_DISPLAY_YEAR=Y in lsf.conf. This is not set by default in LSF. Don't forget to run lsadmin reconfig;badmin mbdrestart after having modified lsf.conf.
You can convert dates with date:
date --date "`bjobs -noheader -o start_time $JOBID`" +"%s"
or
date --date "$(bjobs -noheader -o start_time $JOBD)" +"%s"
if you prefer.

new_git_repository shallow_since field format

I have a new_git_repository containing:
new_git_repository(
name = "hyperscan",
build_file = "//external-deps/hyperscan:BUILD",
commit = "[COMMIT_HASH]",
remote = "https://github.com/intel/hyperscan.git",
shallow_since = "2018-07-09",
)
When building it says:
DEBUG: Rule 'hyperscan' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1531154744 -0400"
According to this, shouldn't the shallow_since format be YYYY-MM-DD?
And next, what does shallow_since = "1531154744 -0400" mean?!
Bazel does not process the string specified in the shallow_since attribute; it passes it directly to git as the --shallow-since parameter. This can be seen in the Bazel source code here.
The value you see is Git's internal date format, which is <unix timestamp> <time zone offset>, where <unix timestamp> is the number of seconds since the UNIX epoch and <time zone offset> is a positive or negative offset from UTC. For example, CET (which is 1 hour ahead of UTC) is +0100.
Here is a tool for converting Unix timestamps to human-readable date/time and back.
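(For reference, 1531154744 corresponds to 2018-07-09 16:45:44 UTC, i.e. 12:45:44 at the -0400 offset, which lines up with the 2018-07-09 date in the rule above.)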
Bazel uses git log --date=raw to get the timestamp of the commit, and then does a string comparison with the value of shallow_since. In my opinion, it is a bug in Bazel - it should do a date comparison instead.
As specified in the comments, you can use git log --date=raw to get the commit sha and the time (shallow_since) of the desired commit.
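If you want to look that value up yourself, something along these lines should print the raw date of the pinned commit (add --pretty=fuller to also see the committer date):
git log -1 --date=raw [COMMIT_HASH]
# the Date: line is printed as "<unix timestamp> <offset>", e.g. "1531154744 -0400",
# which is the form Bazel suggests for shallow_since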

Parsing string timestamp with time zone in 3-digit format followed by 'Z'

In the Hadoop infrastructure (Java-based) I am getting timestamps as string values in this format:
2015-10-01T04:22:38:208Z
2015-10-01T04:23:35:471Z
2015-10-01T04:24:33:422Z
I tried different patterns following the examples for the SimpleDateFormat Java class, without any success.
Replaced 'T' with ' ' and 'Z' with '', then
"yyyy-MM-dd HH:mm:ss:ZZZ"
"yyyy-MM-dd HH:mm:ss:zzz"
"yyyy-MM-dd HH:mm:ss:Z"
"yyyy-MM-dd HH:mm:ss:z"
Without replacement,
"yyyy-MM-dd'T'HH:mm:ss:zzz'Z'"
In fact, this format is not listed among the examples. What should I do with it?
Maybe those 3 digits are milliseconds, and the time is in UTC, like this: "yyyy-MM-dd'T'HH:mm:ss.SSSZ"? But then it should look like "2015-11-27T10:50:44.000-08:00" in the standardized ISO-8601 format.
Maybe, this format is not parsed correctly in the first place?
I use Ruby, Python, Pig, Hive to work with it (but not Java directly), so any example helps. Thanks!
I very strongly suspect the final three digits are nothing to do with time zones, but are instead milliseconds, and yes, the Z means UTC. It's a little odd that they're using : instead of . as the separator between seconds and milliseconds, but that can happen sometimes.
In that case you want
"yyyy-MM-dd'T'HH:mm:ss:SSSX"
... or use
"yyyy-MM-dd'T'HH:mm:ss:SSS'Z'"
and set your SimpleDateFormat's time zone to UTC explicitly.
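Since the question mentions Python as one of the available tools, here is a sketch of the same interpretation (milliseconds followed by a literal UTC 'Z') in Python; note that %z accepting 'Z' requires Python 3.7+:
from datetime import datetime

# "208" is read by %f as a fraction of a second (208 ms), and %z maps "Z" to UTC
ts = datetime.strptime("2015-10-01T04:22:38:208Z", "%Y-%m-%dT%H:%M:%S:%f%z")
print(ts.isoformat())  # 2015-10-01T04:22:38.208000+00:00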
