How to parse my string output of logs to JSON in fluentd

log": "2022-12-15 09:52:52 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter, Fluent::Plugin::RecordTransformerFilter] uses #filter_stream method.\n",
```
I want to parse this in json format in fluentd for ex:
log:{
timestamp: 2022-12-15 09:52:52
}
```
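One way to approach this (a minimal, untested sketch; the tag pattern, field names, and regexp are assumptions based on the sample record above, not a confirmed solution) is a parser filter with a regexp that lifts the timestamp out of the log string:
```
<filter **>
  @type parser
  key_name log        # parse the string stored in the "log" field
  reserve_data true   # keep the record's other fields
  <parse>
    @type regexp
    # capture the leading timestamp; the rest of the line goes into "message"
    expression /^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<message>.*)$/
  </parse>
</filter>
```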

Related

Import and parse a syslog file into Elasticsearch

I need to import a set of syslog files to Elasticsearch. I'm using a Filebeat agent.
I succeeded in importing the data, but the data in Elasticsearch is not parsed.
This is the input file:
Feb 14 03:43:40 my_host_name run-parts(/etc/cron.daily)[1544] finished rhsmd
Feb 14 03:43:40 my_host_name anacron[240673]: Job `cron.daily' terminated (produced output)
Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)
Feb 14 03:43:41 my_host_name postfix/pickup[241860]: 7E8CFC00BB50: uid=0 from=<root>
I'm working with version 7.15.2 of Filebeat and Elasticsearch. I get an index output with the message field not parsed: it contains, for example, the whole line "Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)".
In version 8.0 there is a processor option that can be added to the configuration file to parse this field:
processors:
  - syslog:
      field: message
However, in version 7.15.2 this option is not available.
How can I parse this field in the Filebeat configuration?
Thank you for your help.
What you could do is use either the dissect or the script processor to parse the values according to your needs. I'm not saying this is the best option, but it is an option.
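For example, a dissect processor along these lines might work (a sketch only; the tokenizer, key names, and target_prefix are assumptions based on the sample lines above):
processors:
  - dissect:
      field: "message"
      # splits "Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)"
      tokenizer: "%{month} %{day} %{time} %{host} %{process}: %{msg}"
      target_prefix: "syslog"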

fluentd TimeParser Error - Invalid Time Format

I'm trying to get some Cisco Meraki MX firewall logs pointed to our Kubernetes cluster using fluentd pods. I'm using the syslog source plugin, and the logs are coming in, but I keep getting this error:
2022-06-30 16:30:39 -0700 [error]: #0 invalid input data="<134>1 1656631840.701989724 838071_MT_DFRT urls src=10.202.11.05:39802 dst=138.128.172.11:443 mac=90:YE:F6:23:EB:T0 request: UNKNOWN https://f3wlpabvmdfgjhufgm1xfd6l2rdxr.b3-4-eu-w01.u5ftrg.com/..." error_class=Fluent::TimeParser::TimeParseError error="invalid time format: value = 1 1656631840.701989724 838071_ME_98766, error_class = ArgumentError, error = string doesn't match"
Everything else seems to be fine, but it seems as though the Meraki is sending its logs with an epoch timestamp, and the fluentd syslog plugin does not like it.
I have a vanilla config:
<source>
  @type syslog
  port 5140
  tag meraki
</source>
Is there a way to transform the time strings into something fluentd will accept? Or what am I missing here?
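One possible workaround (an untested sketch; the regexp and field names are assumptions based on the error message above) is to skip the syslog parser entirely: read the raw UDP stream and parse it with a regexp that treats the Meraki epoch timestamp as a float:
<source>
  @type udp
  port 5140
  tag meraki
  <parse>
    @type regexp
    # e.g. "<134>1 1656631840.701989724 838071_MT_DFRT urls src=..."
    expression /^<(?<pri>\d+)>\d+ (?<time>\d+\.\d+) (?<host>\S+) (?<message>.*)$/
    time_key time
    time_type float   # epoch seconds with a fractional part
  </parse>
</source>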

Rsyslog collect logs from different timezones

I'm using rsyslog on a server to collect logs from remote hosts.
Collector server config:
# timedatectl
Local time: Wed 2022-04-27 16:02:43 MSK
Universal time: Wed 2022-04-27 13:02:43 UTC
RTC time: n/a
Time zone: Europe/Moscow (MSK, +0300)
System clock synchronized: yes
NTP service: inactive
RTC in local TZ: no
# cat /etc/rsyslog.d/20_external.conf
$CreateDirs on
$PreserveFQDN on
# provides UDP syslog reception
module(load="imudp")
input(type="imudp" port="514")
# provides TCP syslog reception
module(load="imtcp")
input(type="imtcp" port="514")
template(
  name="external"
  type="string"
  string="/var/log/external/%HOSTNAME%/%syslogfacility-text%.%programname%.%syslogseverity-text%.log"
)
action(
  type="omfile"
  dirCreateMode="0775"
  FileCreateMode="0644"
  dynaFile="external"
)
On the remote host:
# timedatectl
Local time: Wed 2022-04-27 13:04:03 UTC
Universal time: Wed 2022-04-27 13:04:03 UTC
RTC time: n/a
Time zone: UTC (UTC, +0000)
System clock synchronized: yes
NTP service: inactive
RTC in local TZ: no
# cat /etc/rsyslog.d/10-external.conf
*.* @rserver
# logger "hello, local time $(date)"
And I get on the rsyslog server:
# cat /var/log/external/ruser.home.xmu/user.root.notice.log
2022-04-27T13:07:06+03:00 ruser.home.xmu root: hello, local time 2022-04-27T13:07:06 UTC
# date
2022-04-27T16:08:56 MSK
What can I do on the collecting server to change the timezone settings for some remote hosts?
When I research incidents across all servers, the times in the logs do not match. I want the times in the collector's logs to be in its own timezone:
2022-04-27T16:07:06+03:00 ruser.home.xmu root: hello, local time 2022-04-27T13:07:06 UTC
You can define the timezone in rsyslog on the client - which in my opinion is the cleaner solution.
In /etc/rsyslog.conf do the following:
Comment/remove the current template
# Use default timestamp format
#$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
Then add the timezone, as well as a custom log template:
timezone(id="MSK" offset="+03:00")
# Custom time format
$template myTemplate,"%timegenerated% %HOSTNAME% %syslogtag%%msg%\n"
$ActionFileDefaultTemplate myTemplate
However, if you can't access the remote client which is sending the logs, it's possible to use the timestamp when the log was received on the server.
$template myTemplate,"%timegenerated% %HOSTNAME% %syslogtag%%msg%\n"
ruleset(name="myRuleset"){
  $ActionFileDefaultTemplate myTemplate
  # Do some other stuff
}
module(load="imtcp")
input(type="imtcp" port="5000" ruleset="myRuleset")
module(load="imudp")
input(type="imudp" port="5000" ruleset="myRuleset")
NOTE: Don't forget to restart the rsyslog service after applying the changes.
sudo service rsyslog restart
EDIT:
Creating a template using the advanced syntax would look like the following:
template(name="myTemplate" type="string"
         string="%timegenerated% %HOSTNAME% %syslogtag%%msg%\n")
The string is the actual template of the messages that should be logged, not the destination to which the messages should be logged.
I ran into a similar problem recently. I think the simplest way to synchronize data coming from multiple sources is to use UTC timestamps everywhere; this way you avoid any timezone and summer/winter time problems.
I think the problem here (and the problem I had) is that the syslog output uses the RFC 3164 format, which does not include timezone information:
<6>Feb 28 12:00:00 192.168.0.1 fluentd[11111]: [error] Hello!
There's no way of knowing at what time that message actually happened without the timezone information. Since rsyslog version 8.18.0, though, you can define custom rsyslog templates that print the timestamps in UTC. If you use UTC everywhere, you don't need that timezone information anymore.
I'm sending my logs to fluentd, I use the following configuration in my rsyslog.conf file on my hosts:
template(
  name="UTCTraditionalForwardFormat"
  type="string"
  string="<%PRI%>%TIMESTAMP:::date-utc% %HOSTNAME% %syslogtag:1:32%%msg:::sp-if-no-1st-sp%%msg%"
)
*.* action(
  type="omfwd"
  Target="10.5.0.9"
  Port="5145"
  Protocol="udp"
  Template="UTCTraditionalForwardFormat"
)
The UTCTraditionalForwardFormat here is the same as the default RSYSLOG_TraditionalForwardFormat template (see the rsyslog templates documentation) but with :::date-utc added to the TIMESTAMP property.
On the rsyslog instance that collects the logs, looking at the input module's documentation, there is an experimental parameter DefaultTZ which should let you define the source timezone. Something like this should work (I haven't tested it):
input(
  type="imudp"
  port="514"
  DefaultTZ="+00:00"
)
Assuming this DefaultTZ parameter works, this should work regardless of your hosts' timezones.
A little word about fluentd: the fluentd syslog module also accepts the RFC 5424 message format, which does encode timezone information. I decided not to go this way, though, because rsyslog's default templates do not seem to have a real RFC 5424 version (the RSYSLOG_SyslogProtocol23Format template is very close to RFC 5424, but I don't know how close).
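If you do want to try the RFC 5424 route, the fluentd side might look like this (a sketch, assuming a recent fluentd; the port and tag are placeholders):
<source>
  @type syslog
  port 5140
  tag rsyslog
  <parse>
    message_format rfc5424   # RFC 5424 timestamps carry timezone information
  </parse>
</source>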

Fluentd parsing error with JSON and non-JSON-ending logs

I have a log file where consecutive lines come in different formats: one has a JSON object at the end, the other does not.
Type 1:
2020-01-29 09:38:09 [/] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer | | Tomcat started on port(s): 9021 (http) with context path '/service'
Type 2: (With Json in the end)
2020-01-29 09:38:09 [/] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer | | Tomcat started on port(s): 9021 (http) with context path '/service' {'key':'value'}
Using the below configuration in my fluentd.conf to parse the log file:
<source>
  @type tail
  path /root/logs/my-service.log
  pos_file /root/logs/my-service.log.pos
  tag my-service.log
  <parse>
    @type multiline
    format_firstline /^\d{4}-\d{1,2}-\d{1,2}/
    # capture names other than "params" are placeholders; the originals were lost in formatting
    format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})\s+\[(?<ctx1>\w*?(?=\/))\/(?<ctx2>[^\s\]]*)\]\s+\[(?<thread>[^\s]+)\]\s+(?<level>[^\s]+)\s+(?<message>.+?(?=\{)|.+)(?<params>.*)/
  </parse>
</source>
This format works for the Type 2 log format mentioned above, but throws an error with Type 1, as below:
#0 dump an error event: error_class=ArgumentError error="params does not exist"
since the JSON is not present at the end.
How do I handle such a scenario? This would really help fix an ongoing issue in my EFK stack.
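One way to handle this (an untested sketch; the capture names are illustrative, simplified from the configuration above) is to make the trailing JSON group optional, so the same pattern matches both line types:
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})\s+\[(?<ctx>[^\]]*)\]\s+\[(?<thread>[^\s]+)\]\s+(?<level>[^\s]+)\s+(?<message>.+?)(?:\s+(?<params>\{.*\}))?$/
With Type 1 lines the optional (?:\s+(?<params>\{.*\}))? group simply matches nothing and "message" absorbs the rest of the line; with Type 2 lines the trailing {...} lands in "params".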

Implementing default stackdriver behavior in GKE

I am setting up a GKE cluster for an application that has structured JSON logging, which works very well with Kibana. However, I want to use Stackdriver instead.
The application's logs are available in Stackdriver with the default cluster configuration, and they appear as jsonPayload. I want more flexibility and configuration, but when I deploy a custom fluentd agent following this guide, all of the logs for the same application appear only as textPayload. Ultimately, I want my logs to continue to show up as jsonPayload when I use my own fluentd agent configuration, so I can take advantage of the label_map.
I followed the guide on removing the default logging service and deploying the fluentd agent to an existing cluster with the GKE versions below.
Gcloud version info:
Google Cloud SDK 228.0.0
bq 2.0.39
core 2018.12.07
gsutil 4.34
kubectl version info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.9-gke.5", GitCommit:"d776b4deeb3655fa4b8f4e8e7e4651d00c5f4a98", GitTreeState:"clean", BuildDate:"2018-11-08T20:33:00Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
gcloud container clusters describe snippet:
addonsConfig:
  httpLoadBalancing: {}
  kubernetesDashboard:
    disabled: true
  networkPolicyConfig:
    disabled: true
createTime: '2018-12-24T19:31:21+00:00'
currentMasterVersion: 1.10.9-gke.5
currentNodeCount: 3
currentNodeVersion: 1.10.9-gke.5
initialClusterVersion: 1.10.9-gke.5
ipAllocationPolicy: {}
legacyAbac: {}
location: us-central1-a
locations:
- us-central1-a
loggingService: none
masterAuth:
  username: admin
masterAuthorizedNetworksConfig: {}
monitoringService: monitoring.googleapis.com
name: test-cluster-1
network: default
networkConfig:
  network: projects/test/global/networks/default
  subnetwork: projects/test/regions/us-central1/subnetworks/default
networkPolicy: {}
nodeConfig:
  diskSizeGb: 100
  diskType: pd-standard
  imageType: COS
  machineType: n1-standard-1
  serviceAccount: default
nodeIpv4CidrSize: 24
nodePools:
- autoscaling: {}
  config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n1-standard-1
    serviceAccount: default
  initialNodeCount: 3
  management:
    autoRepair: true
    autoUpgrade: true
  name: default-pool
  status: RUNNING
  version: 1.10.9-gke.5
status: RUNNING
subnetwork: default
zone: us-central1-a
Below is what is included in my ConfigMap for the fluentd DaemonSet:
<source>
  type tail
  format none
  time_key time
  path /var/log/containers/*.log
  pos_file /var/log/gcp-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%N%Z
  tag reform.*
  read_from_head true
</source>
<filter reform.**>
  type parser
  format json
  reserve_data true
  suppress_parse_error_log true
  key_name log
</filter>
Here is an example JSON log from my application:
{"log":"org.test.interceptor","lvl":"INFO","thread":"main","msg":"Inbound Message\n----------------------------\nID: 44\nResponse-Code: 401\nEncoding: UTF-8\nContent-Type: application/json;charset=UTF-8\nHeaders: {Date=[Mon, 31 Dec 2018 14:43:47 GMT], }\nPayload: {\"errorType\":\"AnException\",\"details\":[\"invalid credentials\"],\"message\":\"credentials are invalid\"}\n--------------------------------------","#timestamp":"2018-12-31T14:43:47.805+00:00","app":"the-app"}
The result with the above configuration is below:
{
  insertId: "3vycfdg1drp34o"
  labels: {
    compute.googleapis.com/resource_name: "fluentd-gcp-v2.0-nds8d"
    container.googleapis.com/namespace_name: "default"
    container.googleapis.com/pod_name: "the-app-68fb6c5c8-mq5b5"
    container.googleapis.com/stream: "stdout"
  }
  logName: "projects/test/logs/the-app"
  receiveTimestamp: "2018-12-28T20:14:04.297451043Z"
  resource: {
    labels: {
      cluster_name: "test-cluster-1"
      container_name: "the-app"
      instance_id: "234768123"
      namespace_id: "default"
      pod_id: "the-app-68fb6c5c8-mq5b5"
      project_id: "test"
      zone: "us-central1-a"
    }
    type: "container"
  }
  severity: "INFO"
  textPayload: "org.test.interceptor"
  timestamp: "2018-12-28T20:14:03Z"
}
I have even tried wrapping the JSON map into one field, since it appears that only the "log" field is being parsed. I considered explicitly writing a parser, but that seemed infeasible: the log entry is already in JSON format, and the fields change from call to call, so having to anticipate which fields to parse would not be ideal.
I expected all of the fields in my log to appear under jsonPayload in the Stackdriver log entry. Ultimately I want to mimic what the default Stackdriver logging service does on a cluster, where our logs at least appeared as jsonPayload.
I suspect the type tail / format none in your ConfigMap for the fluentd DaemonSet is not helping. Can you try setting the format to json or multiline, and update?
type tail
format none
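For example, the source block might become the following (a sketch that only swaps the format, assuming every container log line is a JSON object; the rest is unchanged from the ConfigMap above):
<source>
  type tail
  format json   # was "none": parse each line as JSON so the fields survive
  time_key time
  path /var/log/containers/*.log
  pos_file /var/log/gcp-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%N%Z
  tag reform.*
  read_from_head true
</source>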
