My regex parser doesn't seem to work.
I'm guessing it has something to do with the logs coming from Docker and the escaping.
But I can't get it to work even if I include the Docker parser first.
I've checked it in rubular: https://rubular.com/r/l6LayuI7MQWIUL
fluent-bit.conf
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[FILTER]
    Name  grep
    Match *
    Regex log ^.*{.*}$

[FILTER]
    Name     parser
    Match    *
    Key_Name log
    Parser   springboot

[OUTPUT]
    Name  stdout
    Match *
parsers.conf
[PARSER]
    Name        springboot
    Format      regex
    Regex       (?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3}) (?<level>[^ ]*) (?<number>\d*) --- (?<thread>\[[^ ]*) (?<logger>[^ ]*) *: (?<message>[^ ].*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S.%L

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command | Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As escaped log
stdout output:
[0] docker-container: [1584997354.000000000, {"log"=>"2020-03-23 21:02:34.077 TRACE 1 --- [nio-8080-exec-1] org.zalando.logbook.Logbook : {...}", "container_id"=>"5a1251dcf9de3f0e2b8b7b0bce1d35d9c9285726b477606b6448c7ce9e818511", "container_name"=>"/xxxx", "source"=>"stdout"}]
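For reference, here is a rough Python equivalent of the rubular check against the inner log line from the record above (Python needs (?P<name>...) instead of Onigmo's (?<name>...); the pattern is otherwise the same):

import re

pattern = re.compile(
    r"(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3}) (?P<level>[^ ]*) "
    r"(?P<number>\d*) --- (?P<thread>\[[^ ]*) (?P<logger>[^ ]*) *: (?P<message>[^ ].*)$"
)

# Inner log line taken from the stdout record above, with the JSON escaping removed.
line = ("2020-03-23 21:02:34.077 TRACE 1 --- [nio-8080-exec-1] "
        "org.zalando.logbook.Logbook : {...}")

match = pattern.match(line)
print(match.groupdict() if match else "no match")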
Thanks
In my case, I'm using the latest version of aws-for-fluent-bit, v2.15.0, because I want to save the application logs in CloudWatch and this image comes prepared to handle that.
I didn't use the Kubernetes filter because it adds a lot of things that I can already see directly in the cluster; I just need the application logs in CloudWatch for the developers. So I used this Amazon-provided YAML as a base, keeping only the INPUT tail for container logs and the container_firstline parser.
As you will see, I add my own FILTER of type parser that takes the logs and applies a regex. My logs are peculiar because in some cases we embed JSON, so in the end I have two types of logs, one with only text and the other with a JSON object inside, like these two:
2021-06-09 15:01:26: a5c35b84: block-bridge-5cdc7bc966-cq44r: clients::63 INFO: Message received from topic block.customer.get.info
2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 INFO: Filters that will be applied (parsed to PascalCase): {"ClientId": 88888, "ServiceNumber": "BBBBBBFA5527", "Status": "AC"}
These two types of logs made me create two regex PARSERs and one FILTER of type parser. The filter matches both types of logs using the parsers (parser_logs and parser_json).
The main problem was that the JSON part wasn't parsed correctly; I always got the JSON part with backslashes (\) escaping the double quotes ("), like this:
2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 INFO: Filters that will be applied (parsed to PascalCase): {\"ClientId\": 88888, \"ServiceNumber\": \"BBBBBBFA5527\", \"Status\": \"AC\"}
The solution was to add Decode_Field_As, which many people say is not required. In my case I need it to remove those backslashes (\). You will see that I use it only for the field "message_additional", where I match exactly the JSON.
Finally, here is my config:
.
.
[INPUT]
    Name               tail
    Tag                kube.*
    Exclude_Path       /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path               /var/log/containers/*.log
    Docker_Mode        On
    Docker_Mode_Flush  5
    Docker_Mode_Parser container_firstline
    Parser             docker
    DB                 /var/fluent-bit/state/flb_kube.db
    Mem_Buf_Limit      10MB
    Skip_Long_Lines    Off
    Refresh_Interval   10

[FILTER]
    Name     parser
    Match    kube.*
    Key_Name log
    Parser   parser_json
    Parser   parser_logs
.
.
.
[PARSER]
    Name        parser_logs
    Format      regex
    Regex       ^(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?<environment>.*?): (?<hostname>.*?): (?<module>.*?)::(?<line>\d+) (?<log_level>[A-Z]+): (?<message>[a-zA-Z0-9 _.,:()'"!¡]*)$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name        parser_json
    Format      regex
    Regex       ^(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?<environment>.*?): (?<hostname>.*?): (?<module>.*?)::(?<line>\d+) (?<log_level>[A-Z]+): (?<message>[^{]*)(?<message_additional>{.*)$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
    Decode_Field_As escaped_utf8 message_additional do_next
    Decode_Field_As escaped message_additional do_next
    Decode_Field_As json message_additional

[PARSER]
    Name        container_firstline
    Format      regex
    Regex       (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%LZ

[PARSER]
    Name        docker
    Format      json
    Time_Key    #timestamp
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   Off
One thing to keep in mind is that Decode_Field_As requires the field being decoded to be entirely JSON (it must start with "{" and end with "}"). If the field has text followed by a JSON object, the decode will fail. That's why I had to create two regex PARSERs: to match exactly the JSON part of some logs inside a single field called "message_additional".
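To illustrate, here is a rough Python sketch of what parser_json plus the escaped and json decoders end up doing, assuming everything from the first "{" onward is valid escaped JSON (Python's (?P<name>...) syntax replaces Onigmo's (?<name>...)):

import json
import re

pattern = re.compile(
    r"^(?P<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?P<environment>.*?): "
    r"(?P<hostname>.*?): (?P<module>.*?)::(?P<line>\d+) (?P<log_level>[A-Z]+): "
    r"(?P<message>[^{]*)(?P<message_additional>{.*)$"
)

# Escaped line as it arrives from the Docker json-file log (sample from above).
raw = ('2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 '
       'INFO: Filters that will be applied (parsed to PascalCase): '
       '{\\"ClientId\\": 88888, \\"ServiceNumber\\": \\"BBBBBBFA5527\\", \\"Status\\": \\"AC\\"}')

record = pattern.match(raw).groupdict()
# "escaped": turn \" back into ", then "json": parse the object into a real structure.
record["message_additional"] = json.loads(record["message_additional"].replace('\\"', '"'))
print(record)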
Here are my new parsed logs in cloudwatch:
{
    "environment": "a5c35b84",
    "hostname": "block-bridge-5cdc7bc966-qfptx",
    "line": "753",
    "log_level": "INFO",
    "message": "Message received from topic block.customer.get.info",
    "module": "block_client",
    "time_stamp": "2021-06-15 10:24:38"
}
{
    "environment": "a5c35b84",
    "hostname": "block-bridge-5cdc7bc966-m5sln",
    "line": "64",
    "log_level": "INFO",
    "message": "Getting ticket(s) using params ",
    "message_additional": {
        "ClientId": 88888,
        "ServiceNumber": "BBBBBBFA5527",
        "Status": "AC"
    },
    "module": "block_client",
    "time_stamp": "2021-06-15 10:26:04"
}
Make sure you are using the latest version of Fluent Bit (v1.3.11).
Remove the Decode_Field_As entry from your parsers.conf; it is no longer required.
I am running Docker Swarm on engine version 20.10.17. I found out that some applications produce long log lines (more than 16k characters), and in Loki, which I use as my logging stack, I cannot parse the JSON logs, because Docker splits logs into 16k strings and inserts a date in the middle of the log message.
I implemented a Python script that generates a string of a given length, and these are my findings (I'll replace the repeating characters to keep it readable; a sketch of the script follows the findings):
If the message length is 16383:
2022-12-09T11:11:45.750162015Z xxxxxxxxx........xxxxx
If the message length is 16384 (this is the limit):
2022-12-09T11:13:15.903855127Z xxxxxxxxx........xxxxx2022-12-09T11:13:15.903855127Z
the date 2022-12-09T11:13:15.903855127Z is the same at the beginning and at the end of the message.
If the message length is 16385:
2022-12-09T11:14:46.061048967Z xxxxxxxxx........xxxxx2022-12-09T11:14:46.061048967Z x
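A minimal sketch of the test script (the helper name is illustrative): it just prints a single log line of the requested length to stdout, and I then read it back from the container's json-file log on the host to see whether Docker split it.

import sys

def generate(length: int, char: str = "x") -> str:
    # Build a single log line of exactly `length` characters.
    return char * length

if __name__ == "__main__":
    length = int(sys.argv[1]) if len(sys.argv) > 1 else 16384
    print(generate(length), flush=True)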
This is my Docker daemon configuration:
{
    "storage-driver": "overlay2",
    "storage-opts": [
        "overlay2.override_kernel_check=true"
    ],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "5m",
        "max-file": "5"
    }
}
Is there a way to change the maximum length of a log line in the configuration? I did not find this option in the docs, and from the source code it looks like it's hardcoded. What is the best way to parse these logs properly?
In Loki, I can see the whole log as one piece, but with dates (e.g. 2022-12-09T11:19:56.575690222Z) in the middle of the log line (multiple times, depending on the length of the line). Handling this is quite complicated, because it means I have to check every log line processed by Promtail.
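The workaround I have in mind (the complicated one mentioned above) would be roughly this sketch, which strips the embedded chunk timestamps before the JSON is parsed, assuming they always match the RFC3339-with-nanoseconds pattern shown above; it would have to run on every line:

import re

CHUNK_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{9}Z ?")

def reassemble(raw_line: str) -> str:
    # Remove the timestamps Docker inserts at the 16k chunk boundaries so the
    # remaining payload can be JSON-decoded again.
    return CHUNK_TIMESTAMP.sub("", raw_line)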
tl;dr:
Loki-docker-log-driver -> Loki : ✅ works.
Loki-docker-log-driver -> JSON Decode -> Loki : How?
For my local development, I run several services which log in GELF format. To get a better overview and a time-ordered log stream with filter functionality, I use the Loki Docker log driver.
The JSON log messages (GELF style) are successfully sent to Loki, but I want them processed further so that labels are extracted. How can I achieve that?
If you have already sent the logs in JSON format to Loki, all you need to do is select the desired log stream and pipe it through the json parser, as in the following example:
{filename="/var/log/nginx/access.log"} | json
Then, you can use the labels as you wish, like this:
{filename="/var/log/nginx/access.log"} | json | remote_addr="147.741.001.047"
I have a Jenkins job which checks a file version...
This project is parameterized:
Multi-line String Parameter = servers
PowerShell command:
# Split the multi-line parameter into one entry per server
$servers = ($env:servers).Split([Environment]::NewLine, [StringSplitOptions]::RemoveEmptyEntries)
# Read the file version of update.dll on each server's admin share
foreach ($server in $servers) {ls "\\$server\c$\update.dll" | % versioninfo}
I trigger it using the URL:
http://MY_JENKINS_SERVER/job/FILE_VERSION/buildWithParameters?servers=10.10.10.1
it works and I get:
ProductVersion FileVersion FileName
-------------- ----------- --------
11.1.1.16 11.1.1.16 \\10.10.10.1\c$\update.dll
But I want to send multiple IPs in the trigger URL, e.g.
10.10.10.1
10.10.10.2
10.10.10.7
So that the output looks something like:
ProductVersion FileVersion FileName
-------------- ----------- --------
11.1.1.16 11.1.1.16 \\10.10.10.1\c$\update.dll
11.1.1.15 11.1.1.15 \\10.10.10.2\c$\update.dll
11.1.1.15 11.1.1.15 \\10.10.10.7\c$\update.dll
Anyone know how?
Since you are using a Multi-line String Parameter, you will have to provide all the required values separated by a newline character. Generally, \n is used for a newline; however, you will have to URL-escape it, so use %0A as the newline character.
So, your POST URL should be something like this:
curl -v -X POST <jenkins_url>/<job_name>/buildWithParameters?servers=10.10.10.1%0A10.10.10.2%0A10.10.10.7
More on URL-escape characters: https://www.w3schools.com/tags/ref_urlencode.asp
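If you build the URL programmatically, the encoding can be done for you. A small sketch in Python (the IPs are just the example values from the question; quote() turns the newlines into %0A):

from urllib.parse import quote

servers = ["10.10.10.1", "10.10.10.2", "10.10.10.7"]
value = quote("\n".join(servers))  # newlines become %0A
url = f"http://MY_JENKINS_SERVER/job/FILE_VERSION/buildWithParameters?servers={value}"
print(url)  # ...buildWithParameters?servers=10.10.10.1%0A10.10.10.2%0A10.10.10.7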
I'm using a Lua script to run Redis commands and I call redis.log() in it. It prints in the format shown below, but I want to change the time format in the log, i.e. dd-MM-yyyy HH:mm:ss.SSS instead of the default (dd MMM HH:mm:ss.SSS, I assume).
Format:
[pid] date loglevel message
For instance:
[4018] 14 Nov 07:01:22.119 * Background saving terminated with success
How can I do this?
Regrettably, no, there are no "user serviceable" knobs for this. The output to the log is always preceded by a timestamp in the hardcoded format that's specified in server.c:
off = strftime(buf,sizeof(buf),"%d %b %H:%M:%S.",localtime(&tv.tv_sec));
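If you really need a different format, the only option short of patching Redis is to post-process the log. A rough sketch that reads the log from stdin and rewrites the timestamp as dd-MM-yyyy HH:mm:ss.SSS (the year is not in the log line, so the current year is assumed):

import re
import sys
from datetime import datetime

LINE = re.compile(r"^\[(?P<pid>\d+)\] (?P<ts>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<rest>.*)$")

for line in sys.stdin:
    m = LINE.match(line)
    if not m:
        sys.stdout.write(line)
        continue
    ts = datetime.strptime(m.group("ts"), "%d %b %H:%M:%S.%f").replace(year=datetime.now().year)
    formatted = ts.strftime("%d-%m-%Y %H:%M:%S.%f")[:-3]  # trim to milliseconds
    sys.stdout.write(f"[{m.group('pid')}] {formatted} {m.group('rest')}\n")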
You can read bytes from system/ports/input, and if you convert them from BINARY! bytes to a STRING! of unicode codepoints you get something coherent:
>> to-string read system/ports/input
Hello
== "Hello^/"
But if you try writing to system/ports/output in Rebol3 you get:
>> write system/ports/output "World"
** Script error: write does not allow none! for its destination argument
The output port is a field in system/ports, but it's none. Running an ordinary PRINT command that generates output doesn't make the field get set. Where is it?
Also while on the topic, where is the stderr port?
Rebol3 Alpha has not completely implemented the standard output and standard error ports yet.