Why is fluentd JSON parser not working properly? - fluentd

I'm using the image gcr.io/google-containers/fluentd-elasticsearch (v2.3.1) to make fluentd collect some logs and send them to Elasticsearch. I'm using the configuration below for fluentd:
<source>
type forward
port {{.Values.fluentd.forward.port}}
bind 0.0.0.0
</source>
<filter kube.**>
@type parser
@log_level debug
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type json
time_key time
time_type string
time_format %iso8601
</parse>
</filter>
<filter kube.**>
@type record_transformer
@log_level debug
enable_ruby
<record>
kubernetes ${record["kubernetes"]["cluster_name"] = "{{.Values.clusterName}}"; record["kubernetes"] }
logtrail {"host": "${record['kubernetes']['pod_name']}", "program":"${record['kubernetes']['container_name']}"}
</record>
</filter>
<filter kube.**>
@type concat
key log
stream_identity_key kubernetes["docker_id"]
multiline_end_regexp /\n$/
separator ""
</filter>
The configuration above was supposed to parse the JSON that is associated with a key called log, but I'm seeing that the JSON is not getting parsed at all. Below is the JSON that I'm getting after fluentd does the filtering; I had expected that the JSON associated with the key log would be parsed.
{"kubernetes":{"pod_name":"api-dummy-dummy-vcpqr","namespace_name":"dummy","pod_id":"dummy","labels":{"name":"api-dummy","pod-template-hash":"dummy","tier":"dummy"},"host":"dummy","container_name":"api-dummy","docker_id":"dummy","cluster_name":"dummy Dev"},"log":"{\"name\":\"dummy\",\"json\":false,\"hostname\":\"api-dummy-dummy-vcpqr\",\"pid\":24,\"component\":\"dummy\",\"level\":30,\"version\":\"1.0\",\"timestamp\":1539645856126}","stream":"stdout","logtrail":{"host":"api-dummy-dummy-vcpqr","program":"api-dummy"}}
I have spent more than 3 days figuring out a solution for this. I even tried to use https://github.com/edsiper/fluent-plugin-docker, but that did not help. Although the plugin helped to parse the JSON, it resulted in the parsed log messages getting rejected by my Elasticsearch.

Your log field is not valid JSON.
{
"kubernetes": {
"pod_name": "api-dummy-dummy-vcpqr",
"namespace_name": "dummy",
"pod_id": "dummy",
"labels": {
"name": "api-dummy",
"pod-template-hash": "dummy",
"tier": "dummy"
},
"host": "dummy",
"container_name": "api-dummy",
"docker_id": "dummy",
"cluster_name": "dummy Dev"
},
"log": "{\"name\":\"dummy\",\"json\":false,\"hostname\":\"api-dummy-dummy-vcpqr\",\"pid\":24,\"component\":\"dummy\",\"level\":30,\"version\":\"1.0\",\"timestamp\":1539645856126",
"stream": "stdout",
"logtrail": {
"host": "api-dummy-dummy-vcpqr",
"program": "api-dummy"
}
}
You should concatenate the log field before parsing it as JSON.
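A minimal sketch of that ordering, assuming the fluent-plugin-concat plugin is installed and reusing the keys from the question's config (the match pattern, accessor and regexp are copied from the question, not verified): run the concat filter first, then the parser filter.
<filter kube.**>
@type concat
key log
stream_identity_key kubernetes["docker_id"]
multiline_end_regexp /\n$/
separator ""
</filter>
<filter kube.**>
@type parser
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type json
</parse>
</filter>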

Related

How to parse JSON data from syslog with fluentd?

My custom rsyslog template:
template(name="outfmt" type="list" option.jsonf="on") {
property(outname="#timestamp"
name="timereported"
dateFormat="rfc3339" format="jsonf")
property(outname="host"
name="hostname" format="jsonf")
property(outname="severity"
name="syslogseverity-text" caseConversion="upper" format="jsonf")
property(outname="facility"
name="syslogfacility-text" format="jsonf")
property(outname="syslog-tag"
name="syslogtag" format="jsonf")
property(outname="source"
name="app-name" format="jsonf")
property(outname="message"
name="msg" format="jsonf")
}
My rsyslog example output:
{
"#timestamp": "2018-03-01T01:00:00+00:00",
"host": "172.20.245.8",
"severity": "DEBUG",
"facility": "local4",
"syslog-tag": "app[1666]",
"source": "app",
"message": " this is my syslog message"
}
How can I parse this log with fluentd and send to elasticsearch?
You can receive logs directly in Elasticsearch (without even having to format them as JSON) through the syslog plugin. This is probably the most straightforward solution to your problem.
If for some reason you need to use some kind of log aggregator, I personally would not recommend fluentd, as it can bring unnecessary complexity with it.
But you could use Logstash, which is supported by Elasticsearch, and you can find plenty of documentation about it.
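That said, if you do want to stay with fluentd, a minimal sketch of one way to ingest these JSON lines could look like the one below. It assumes rsyslog forwards the formatted messages over TCP to port 5140 and that the fluent-plugin-elasticsearch output plugin is installed; the port, tag, and Elasticsearch host are illustrative, not taken from the question.
<source>
@type tcp
port 5140
bind 0.0.0.0
tag rsyslog.json
<parse>
@type json
</parse>
</source>
<match rsyslog.**>
@type elasticsearch
host elasticsearch.example.com
port 9200
logstash_format true
</match>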

Apache Airflow kubernetes pod operator how do we pass configMap value in `value_from` while forming the environment variable in DAG

I am using Apache Airflow, and in one of our DAG's tasks we use the Kubernetes Pod Operator to execute one of our application processes in a Kubernetes pod. The Kubernetes Pod Operator works fine, and passing environment variables via the pod operator works fine as well. However, when I try to pass an environment variable value from a Kubernetes ConfigMap, it is not able to get the value from the ConfigMap.
The code snippet is below. Please focus on the line 'SPARK_CONFIG': '{"valueFrom": {"configMapKeyRef": {"key": "endpoint","name": "spark-config"}}}'
pod_process_task = KubernetesPodOperator(
namespace=cons.K8_NAMESPACE,
image=cons.UNCOMPRESS_IMAGE_NAME,
config_file=cons.K8_CONFIG_FILE,
env_vars={
'FRT_ID': '{{ dag_run.conf["transaction_id"] }}',
'FILE_NAME': '{{ dag_run.conf["filename"]}}',
'FILE_PATH': '{{dag_run.conf["filepath"]}}' + "/" + '{{ dag_run.conf["filename"]}}',
'LOG_FILE': '{{ ti.xcom_pull(key="process_log_dict")["loglocation"] }}',
'SPARK_CONFIG': '{"valueFrom": {"configMapKeyRef": {"key": "endpoint","name": "spark-config"}}}'
},
name=create_pod_name(),
# name= 'integrator',
task_id="decrypt-951",
retries=3,
retry_delay=timedelta(seconds=60),
is_delete_operator_pod=True,
volumes=[volume_a, volume_for_configuration],
volume_mounts=[volume_mount_a,volume_mount_config],
resources=pod_resource_specification,
startup_timeout_seconds=cons.K8_POD_TIMEOUT,
get_logs=True,
on_failure_callback=log_failure_unzip_decrypt,
dag=dag
)
Then, on printing the variables from the pod, I get the output below. Note that the other environment variable values have been populated, except for the one where I am trying to reference the ConfigMap. Please find below the log output that we see while the K8s pod is being spawned. [In the snippet below, please focus on the parameter 'name': 'SPARK_CONFIG'.] The rest of the env variables have been populated with the values I provided in the code snippet above via Jinja templating.
'containers': [{'args': None,
'command': None,
'env': [{'name': 'FRT_ID',
'value': '20180902_01605',
'value_from': None},
{'name': 'FILE_NAME',
'value': 'transact_2018-09-02_0321_file_MAR.zip',
'value_from': None},
{'name': 'FILE_PATH',
'value': '/etc/data/app/trk0057.zip',
'value_from': None},
{'name': 'LOG_FILE',
'value': 'log-0057_2018-09.log',
'value_from': None},
{'name': 'SPARK_CONFIG',
'value': '{"valueFrom": {"configMapKeyRef": '
'{"key": "endpoint","name": '
'"spark-config"}}}',
'value_from': None}],
'env_from': None
...
...
...
...
The point is: how do we pass the ConfigMap value via value_from while forming the environment variable in the Apache Airflow Kubernetes Pod Operator?
You should be able to accomplish this by using the KubernetesPodOperator configmaps parameter. You can see the docstring here: https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/operators/kubernetes_pod_operator.py#L104
So in this way you would pass configmaps=["spark-config"], presuming your ConfigMap is named spark-config.
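A trimmed sketch of how that parameter could slot into the question's task (the namespace, image and most other arguments from the original snippet are omitted or replaced with illustrative values):
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

pod_process_task = KubernetesPodOperator(
    namespace="my-namespace",      # illustrative value
    image="my-image:latest",       # illustrative value
    name="configmap-example",
    task_id="decrypt-951",
    env_vars={
        'FRT_ID': '{{ dag_run.conf["transaction_id"] }}',
    },
    # every key/value pair in the "spark-config" ConfigMap is injected into
    # the pod as an environment variable
    configmaps=["spark-config"],
    get_logs=True,
    dag=dag,
)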
Airflow Version: v2.3.3
task_in_pod = KubernetesPodOperator(
...
env_vars={
"VAR_NAME1": "STRING_VALUE1",
"VAR_NAME2": "STRING_VALUE2",
"NEW_VAR_NAME": "$(VAR_NAME_FROM_CONFIG_MAP)" # <- value from ConfigMap
},
configmaps=['configmap-name'], # <- assumes this one already exists
...
)

Using environment variables in Ghost v1 config

In Ghost 0.x, config was provided via a single config.js file with keys for each env.
In Ghost 1.0, config is provided via multiple config.json files
How do you provide environment variables in Ghost 1.0?
I would like to dynamically set the port value using process.env.port on Cloud9 IDE like so.
config.development.json
{
"url": "http://localhost",
"server": {
"port": process.env.port,
"host": process.env.IP
}
}
When I run the application using ghost start with the following config, it says "You can access your publication at http://localhost:2368", but when I go to http://localhost:2368 on Cloud9 (http://c9.io) it gives me an error saying "No application seems to be running here!"
{
"url": "http://localhost:2368",
"server": {
"port": 2368,
"host": "127.0.0.1"
}
}
I managed to figure out how to do this.
Here is the solution, in case someone else is also trying to figure out how to do the same thing.
In your config.development.json file, add the following.
{
"url": "http://{workspace_name}-{username}.c9users.io:8080",
"server": {
"port": 8080,
"host": "0.0.0.0"
}
}
Alternatively, run the following commands in the terminal. They dynamically read the hostname, port and IP environment variables and write the corresponding settings into the config.development.json file.
ghost config url http://$C9_HOSTNAME:$PORT
ghost config server.port $PORT
ghost config server.host $IP

How to know whether a file from the "docker changes" command is a regular file or a directory

We can get a list of changed files from the "docker changes" API (GET /containers/<containerID>/changes) to watch the container changes. Can I tell whether each file in the list is a regular file or a directory?
Input:
GET /containers/4fa6e0f0c678/changes HTTP/1.1
Output:
HTTP/1.1 200 OK
Content-Type: application/json
[
{
"Path": "/dev",
"Kind": 0
},
{
"Path": "/dev/kmsg",
"Kind": 1
},
{
"Path": "/test",
"Kind": 1
}
]
The output of the docker changes API is a list of all modified files/directories. In Linux, when a file is modified, its parent directory is modified as well. If you need to extract only the files, you can remove all the parent paths, distinguished by their names, like this:
Consider a file named /usr/share/my_file that changed. It shows up in the API as:
[{ "Path": "/usr/share/my_file", "Kind": 0 },
{ "Path": "/usr/share", "Kind": 0 },
{ "Path": "/usr", "Kind": 0 }]
So if you want to extract the files only, you should split each Path and check whether any other entry is its parent:
["usr","share","my_file"],["usr","share"],["usr"]
or, even easier, for each Path remove all paths that are contained in this path:
for /usr/share: /usr is contained in it, so you can remove /usr.
for /usr/share/my_file: /usr and /usr/share are contained in the path and can be removed.
Start doing it yourself, and if any issue remains, we can help to solve it.
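A rough sketch of that filtering idea in Python (using the sample entries from above; it keeps only entries that are not a parent of another entry, which is an approximation rather than a definitive file/directory test):
def leaf_changes(changes):
    """Keep only entries whose Path is not a parent directory of another entry."""
    paths = [c["Path"] for c in changes]
    return [
        c for c in changes
        if not any(other.startswith(c["Path"] + "/") for other in paths)
    ]

# the sample output shown above
changes = [
    {"Path": "/usr/share/my_file", "Kind": 0},
    {"Path": "/usr/share", "Kind": 0},
    {"Path": "/usr", "Kind": 0},
]
print(leaf_changes(changes))  # -> [{'Path': '/usr/share/my_file', 'Kind': 0}]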

Gitlab Webhook and Jenkins: No Data Received

I have been following the links below in order to integrate Gitlab with Jenkins using webhooks. All of the links mention receiving a 'JSON' or 'payload' or token on the Jenkins side, but I do not see anything when I try to echo or print these parameters in the shell script of my Jenkins configuration.
In shell script I have this, but I never receive any payloads:
echo "the build worked! The payload is $payload"
I do see some JSON coming through in /var/log/Jenkins/Jenkins.logs, but I want to see the messages inside my 'Console Output', so that I can use the messages coming in from Gitlab to decide whether to trigger a build or not.
Most of these links mention options that are not available in Gitlab. One article mentioned converting the webhook format to application/json, but there is no such option in the Gitlab UI.
How to process a github webhook payload in Jenkins?
http://chloky.com/github-json-payload-in-jenkins/
Jenkins Settings:
Gitlab webhook:
http://xx.xx.xx.xxx:8080/job/Interim_Build/buildWithParameters?token=TOKEN_NAME
Any help would be great. Thanks.
I suggest you try two solutions (both working for me):
convert the JSON data from the Gitlab webhook using this elegant proxy written in Go: https://github.com/akira/githookproxy.
It will take the webhook request, and translate it to a request to the target_url in the format of:
payload: JSON body
START: Start commit hash
END: End commit hash
REFNAME: Ref name
emulate Jenkins as a Gitlab CI using this Jenkins plugin: https://github.com/jenkinsci/gitlab-plugin
For me the best is the first one, because it is simple and more transparent.
GitLab and GitHub are two separate products, so the documentation or links for GitHub webhooks that you are referring to will not apply to GitLab webhooks.
GitLab invokes the webhook URL with a JSON payload in the request body that carries a lot of information about the GitLab event that led to the webhook invocation. For example, the GitLab webhook push event payload carries the following information in it:
{
"object_kind": "push",
"before": "95790bf891e76fee5e1747ab589903a6a1f80f22",
"after": "da1560886d4f094c3e6c9ef40349f7d38b5d27d7",
"ref": "refs/heads/master",
"checkout_sha": "da1560886d4f094c3e6c9ef40349f7d38b5d27d7",
"user_id": 4,
"user_name": "John Smith",
"user_username": "jsmith",
"user_email": "john#example.com",
"user_avatar": "https://s.gravatar.com/avatar/d4c74594d841139328695756648b6bd6?s=8://s.gravatar.com/avatar/d4c74594d841139328695756648b6bd6?s=80",
"project_id": 15,
"project":{
"id": 15,
"name":"Diaspora",
"description":"",
"web_url":"http://example.com/mike/diaspora",
"avatar_url":null,
"git_ssh_url":"git#example.com:mike/diaspora.git",
"git_http_url":"http://example.com/mike/diaspora.git",
"namespace":"Mike",
"visibility_level":0,
"path_with_namespace":"mike/diaspora",
"default_branch":"master",
"homepage":"http://example.com/mike/diaspora",
"url":"git#example.com:mike/diaspora.git",
"ssh_url":"git#example.com:mike/diaspora.git",
"http_url":"http://example.com/mike/diaspora.git"
},
"repository":{
"name": "Diaspora",
"url": "git#example.com:mike/diaspora.git",
"description": "",
"homepage": "http://example.com/mike/diaspora",
"git_http_url":"http://example.com/mike/diaspora.git",
"git_ssh_url":"git#example.com:mike/diaspora.git",
"visibility_level":0
},
"commits": [
{
"id": "b6568db1bc1dcd7f8b4d5a946b0b91f9dacd7327",
"message": "Update Catalan translation to e38cb41.",
"timestamp": "2011-12-12T14:27:31+02:00",
"url": "http://example.com/mike/diaspora/commit/b6568db1bc1dcd7f8b4d5a946b0b91f9dacd7327",
"author": {
"name": "Jordi Mallach",
"email": "jordi#softcatala.org"
},
"added": ["CHANGELOG"],
"modified": ["app/controller/application.rb"],
"removed": []
},
{
"id": "da1560886d4f094c3e6c9ef40349f7d38b5d27d7",
"message": "fixed readme",
"timestamp": "2012-01-03T23:36:29+02:00",
"url": "http://example.com/mike/diaspora/commit/da1560886d4f094c3e6c9ef40349f7d38b5d27d7",
"author": {
"name": "GitLab dev user",
"email": "gitlabdev#dv6700.(none)"
},
"added": ["CHANGELOG"],
"modified": ["app/controller/application.rb"],
"removed": []
}
],
"total_commits_count": 4
}
The Jenkins GitLab plugin makes this webhook payload information available in the Jenkins Global Variable env. The available env variables are as follows:
gitlabBranch
gitlabSourceBranch
gitlabActionType
gitlabUserName
gitlabUserEmail
gitlabSourceRepoHomepage
gitlabSourceRepoName
gitlabSourceNamespace
gitlabSourceRepoURL
gitlabSourceRepoSshUrl
gitlabSourceRepoHttpUrl
gitlabMergeRequestTitle
gitlabMergeRequestDescription
gitlabMergeRequestId
gitlabMergeRequestIid
gitlabMergeRequestState
gitlabMergedByUser
gitlabMergeRequestAssignee
gitlabMergeRequestLastCommit
gitlabMergeRequestTargetProjectId
gitlabTargetBranch
gitlabTargetRepoName
gitlabTargetNamespace
gitlabTargetRepoSshUrl
gitlabTargetRepoHttpUrl
gitlabBefore
gitlabAfter
gitlabTriggerPhrase
Just as you would read Jenkins job parameters from Jenkins Global Variable params in your job pipeline script, you could read webhook payload fields from Jenkins Global Variable env:
echo "My Jenkins job parameter is ${params.MY_PARAM_NAME}"
echo "One of Jenkins job webhook payload field is ${env.gitlabMergedByUser}"
Hope the above information helps solve your problem.

Resources