how to promtail parse json to label and timestamp - monitoring

I have a probleam to parse a json log with promtail, please, can somebody help me please. I try many configurantions, but don't parse the timestamp or other labels.
log entry:
{timestamp=2019-10-25T15:25:41.041-03, level=WARN, thread=http-nio-0.0.0.0-8080-exec-2, mdc={handler=MediaController, ctxCli=127.0.0.1, ctxId=FdD3FVqBAb0}, logger=br.com.brainyit.cdn.vbox.
controller.MediaController, message=[http://localhost:8080/media/sdf],c[500],t[4],l[null], context=default}
promtail-config.yml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://localhost:3100/loki/api/v1/push
scrape_configs:
- job_name: vbox-main
static_configs:
- targets:
- localhost
labels:
job: vbox
appender: main
__path__: /var/log/vbox/main.log
pipeline_stages:
- json:
expressions:
timestamp: timestamp
message: message
context: context
level: level
timestamp:
source: timestamp
format: RFC3339Nano
labels:
context:
level:
output:
source: message

I've tried the setup of Promtail with Java SpringBoot applications (which generates logs to file in JSON format by Logstash logback encoder) and it works.
The example log line generated by application:
{"timestamp":"2020-06-06T01:00:30.840+02:00","version":1,"message":"Started ApiApplication in 1.431 seconds (JVM running for 6.824)","logger_name":"com.github.pnowy.spring.api.ApiApplication","thread_name":"main","level":"INFO","level_value":20000}
The prometail config:
# Promtail Server Config
server:
http_listen_port: 9080
grpc_listen_port: 0
# Positions
positions:
filename: /tmp/positions.yaml
clients:
- url: http://localhost:3100/loki/api/v1/push
scrape_configs:
- job_name: springboot
pipeline_stages:
- json:
expressions:
level: level
message: message
timestamp: timestamp
logger_name: logger_name
stack_trace: stack_trace
thread_name: thread_name
- labels:
level:
- template:
source: new_key
template: 'logger={{ .logger_name }} threadName={{ .thread_name }} | {{ or .message .stack_trace }}'
- output:
source: new_key
static_configs:
- targets:
- localhost
labels:
job: applogs
__path__: /Users/przemek/tools/promtail/*.log
Please notice that the output (the log text) is configured first as new_key by Go templating and later set as the output source. The logger={{ .logger_name }} helps to recognise the field as parsed on Loki view (but it's an individual matter of how you want to configure it for your application).
Here you will find quite nice documentation about entire process: https://grafana.com/docs/loki/latest/clients/promtail/pipelines/
The example was run on release v1.5.0 of Loki and Promtail (Update 2020-04-25: I've updated links to current version - 2.2 as old links stopped working).
The section about timestamp is here: https://grafana.com/docs/loki/latest/clients/promtail/stages/timestamp/ with examples - I've tested it and also didn't notice any problem. Hope that help a little bit.
The JSON configuration part: https://grafana.com/docs/loki/latest/clients/promtail/stages/json/
Result on Loki:

Related

"Could not find expected ':' " error on Prometheus YAML file

I am trying to configure a YAML using oauth2.0 and I keep getting a "Could not find expected ':'" error on the line towards the end that says "scopes."
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: "prometheus"
metrics_path: '/actuator/prometheus'
#scheme defaults to 'http'.
static_configs:
- targets: ["localhost:8080"]
oauth2:
client_id: ''
[ client_secret: '']
scopes:
[ - '']
token_url: ''
#tls_config:
Your YAML is invalid.
I find the Prometheus config documentation confusing to read.
In this case, you're misreading [ ... ]. It's used to denote that something is optional in the documentation. The [ and ] should not be literally included in your YAML.
Use a tool like yamllint to validate your YAML.
So, try:
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets: []
rule_files: []
scrape_configs:
- job_name: "prometheus"
metrics_path: "/actuator/prometheus"
static_configs:
- targets: ["localhost:8080"]
oauth2:
client_id: ""
client_secret: ""
scopes: []
token_url: ""
Notes:
Per oauth2 you must have either client_secret or client_secret_file.
I removed your comments for clarity but comments (# ...) are fine.
I removed your commented out list elements and replaced them with empty lists ([]). You will likely need to fix those; you're probably going to need at least one scope, for example.
It's a good practice to be consistent in your use of " or ' to delimit strings; I've used " in the above.
There are 2 ways to represents list in YAML. The JSON way and the YAML way. Somewhat awkwardly, I think you must use the JSON way to represent an empty list ([]). Again, it's good to be consistent and stick with one. The following are equivalent:
JSON-way:
some-list: ["a","b","c"]
YAML-way:
some-list:
- "a"
- "b"
- "c"

Error "Property Listeners cannot be empty" occurs when deploy ruby-on-rails project

I am newbie in AWS Cloudformation. My Elastic Beanstalk Worker uses Ruby on Rails. The EB is a Stack based on cloudformation template.
I don’t know why, when I deploy (eb deploy) recently, Event gave the following error message:
The AWSEBLoadBalancer is not in Resources: of the template. But I find it in .ebextensions of the source code.
Resources:
AWSEBLoadBalancer:
Properties:
AccessLoggingPolicy:
EmitInterval: 5
Enabled: true
S3BucketName:
Ref: LogsBucket
Type: "AWS::ElasticLoadBalancing::LoadBalancer"
DependsOn: "LogsBucketPolicy"
LogsBucket:
DeletionPolicy: Retain
Type: "AWS::S3::Bucket"
LogsBucketPolicy:
Properties:
Bucket:
Ref: LogsBucket
PolicyDocument:
Statement:
-
Action:
- "s3:PutObject"
Effect: Allow
Principal:
AWS:
? "Fn::FindInMap"
:
- Region2ELBAccountId
-
Ref: "AWS::Region"
- AccountId
Resource:
? "Fn::Join"
:
- ""
-
- "arn:aws:s3:::"
-
Ref: LogsBucket
- /AWSLogs/
-
Ref: "AWS::AccountId"
Can you please give me some hints to solve this problem?
The error message says that you are missing Listeners. With the Listeners your balancer definition would be something like (need to modify to your own settings):
AWSEBLoadBalancer:
Properties:
Listeners:
- InstancePort: 80
InstanceProtocol: HTTP
LoadBalancerPort: 80
#PolicyNames:
# - String
Protocol: HTTP
#SSLCertificateId: String
AccessLoggingPolicy:
EmitInterval: 5
Enabled: true
S3BucketName:
Ref: LogsBucket
Type: "AWS::ElasticLoadBalancing::LoadBalancer"
DependsOn: "LogsBucketPolicy"

Implementing default stackdriver behavior in GKE

I am setting up a GKE cluster for an application that has structured json logging that works very well with Kibana. However, I want to use stackdriver instead.
I see that the application's logs are available in stackdriver with the default cluster configurations. The logs appear as jsonpayload but I want more flexibility and configuration and when I do that following this guide, all of the logs for the same application appear only as textpayload. Ultimately, I want my logs to continue to show up in jsonpayload when I use fluentd agent configurations to take advantage of the label_map.
I followed the guide on removing the default logging service and deploying fluentd agent with an existing cluster with the below GKE versions.
Gcloud version info:
Google Cloud SDK 228.0.0
bq 2.0.39
core 2018.12.07
gsutil 4.34
kubectl version info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.9-gke.5", GitCommit:"d776b4deeb3655fa4b8f4e8e7e4651d00c5f4a98", GitTreeState:"clean", BuildDate:"2018-11-08T20:33:00Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
gcloud container cluster describe snippet:
addonsConfig:
httpLoadBalancing: {}
kubernetesDashboard:
disabled: true
networkPolicyConfig:
disabled: true
createTime: '2018-12-24T19:31:21+00:00'
currentMasterVersion: 1.10.9-gke.5
currentNodeCount: 3
currentNodeVersion: 1.10.9-gke.5
initialClusterVersion: 1.10.9-gke.5
ipAllocationPolicy: {}
legacyAbac: {}
location: us-central1-a
locations:
- us-central1-a
loggingService: none
masterAuth:
username: admin
masterAuthorizedNetworksConfig: {}
monitoringService: monitoring.googleapis.com
name: test-cluster-1
network: default
networkConfig:
network: projects/test/global/networks/default
subnetwork: projects/test/regions/us-central1/subnetworks/default
networkPolicy: {}
nodeConfig:
diskSizeGb: 100
diskType: pd-standard
imageType: COS
machineType: n1-standard-1
serviceAccount: default
nodeIpv4CidrSize: 24
nodePools:
- autoscaling: {}
config:
diskSizeGb: 100
diskType: pd-standard
imageType: COS
machineType: n1-standard-1
serviceAccount: default
initialNodeCount: 3
management:
autoRepair: true
autoUpgrade: true
name: default-pool
status: RUNNING
version: 1.10.9-gke.5
status: RUNNING
subnetwork: default
zone: us-central1-a
Below is what is included in my configmap for the fluentd daemonset:
<source>
type tail
format none
time_key time
path /var/log/containers/*.log
pos_file /var/log/gcp-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%N%Z
tag reform.*
read_from_head true
</source>
<filter reform.**>
type parser
format json
reserve_data true
suppress_parse_error_log true
key_name log
</filter>
Here is an example json log from my application:
{"log":"org.test.interceptor","lvl":"INFO","thread":"main","msg":"Inbound Message\n----------------------------\nID: 44\nResponse-Code: 401\nEncoding: UTF-8\nContent-Type: application/json;charset=UTF-8\nHeaders: {Date=[Mon, 31 Dec 2018 14:43:47 GMT], }\nPayload: {\"errorType\":\"AnException\",\"details\":[\"invalid credentials\"],\"message\":\"credentials are invalid\"}\n--------------------------------------","#timestamp":"2018-12-31T14:43:47.805+00:00","app":"the-app"}
The result with the above configuration is below:
{
insertId: "3vycfdg1drp34o"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v2.0-nds8d"
container.googleapis.com/namespace_name: "default"
container.googleapis.com/pod_name: "the-app-68fb6c5c8-mq5b5"
container.googleapis.com/stream: "stdout"
}
logName: "projects/test/logs/the-app"
receiveTimestamp: "2018-12-28T20:14:04.297451043Z"
resource: {
labels: {
cluster_name: "test-cluster-1"
container_name: "the-app"
instance_id: "234768123"
namespace_id: "default"
pod_id: "the-app-68fb6c5c8-mq5b5"
project_id: "test"
zone: "us-central1-a"
}
type: "container"
}
severity: "INFO"
textPayload: "org.test.interceptor"
timestamp: "2018-12-28T20:14:03Z"
}
I have even tried wrapping the json map into one field since it appears that only the "log" field is being parsed. I considered explicitly writing a parser but this seemed infeasible considering the log entry is already in json format and also the fields change from call to call and having to anticipate what fields to parse would not be ideal.
I expected that all of the fields in my log would appear in jsonPayload in the stackdriver log entry. I ultimately want to mimic what occurs with the default logging stackdriver service on a cluster where our logs at least appeared as jsonPayload.
I suspect type tail - format none in your configmap for the fluentd daemonset is not helping. Can you try setting the format to json or multiline, and update ?
type tail
format none

Prometheus AlertManager - Send Alerts to different clients based on routes

I have 2 services A and B which I want to monitor. Also I have 2 different notification channels X and Y in the form of receivers in the AlertManager config file.
I want to send to notify X if service A goes down and want to notify Y if service B goes down. How can I achieve this my configuration?
My AlertManager YAML file is:
route:
receiver: X
receivers:
- name: X
email_configs:
- name: Y
email_configs:
And alert.rule files is:
groups:
- name: A
rules:
- alert: A_down
expr: expression
for: 1m
labels:
severity: critical
annotations:
summary: "A is down"
- name: B
rules:
- alert: B_down
expr: expression
for: 1m
labels:
severity: warning
annotations:
summary: "B is down"
The config should roughly look like this (not tested):
route:
group_wait: 30s
group_interval: 5m
repeat_interval: 2h
receiver: 'default-receiver'
routes:
- match:
alertname: A_down
receiver: X
- match:
alertname: B_down
receiver: Y
The idea is, that each route field can has a routes field, where you can put a different config, that gets enabled if the labels in match match the condition.
For clarifying - The General Flow to handle alert in Prometheus (Alertmanager and Prometheus integration) is like this:
SomeErrorHappenInYourConfiguredRule(Rule) -> RouteToDestination(Route)
-> TriggeringAnEvent(Reciever)-> GetAMessageInSlack/PagerDuty/Mail/etc...
For example:
if my aws machine cluster production-a1 is down, I want to trigger an event sending "pagerDuty" and "Slack" to my team with the relevant error.
There's 3 files important to configure alerts on your prometheus system:
alertmanager.yml - configuration of you routes (getting the triggered
errors) and receivers (how to handle this errors)
rules.yml - This rules will contain all the thresholds and rules
you'll define in your system.
prometheus.yml - global configuration to integrate your rules into routes and recivers together (the two above).
I'm attaching a Dummy example In order to demonstrate the idea, in this example I'll watch overload in my machine (using node exporter installed on it):
On /var/data/prometheus-stack/alertmanager/alertmanager.yml
global:
# The smarthost and SMTP sender used for mail notifications.
smtp_smarthost: 'localhost:25'
smtp_from: 'JohnDoe#gmail.com'
route:
receiver: defaultTrigger
group_wait: 30s
group_interval: 5m
repeat_interval: 6h
routes:
- match_re:
service: service_overload
owner: ATeam
receiver: pagerDutyTrigger
receivers:
- name: 'pagerDutyTrigger'
pagerduty_configs:
- send_resolved: true
routing_key: <myPagerDutyToken>
Add some rule On /var/data/prometheus-stack/prometheus/yourRuleFile.yml
groups:
- name: alerts
rules:
- alert: service_overload_more_than_5000
expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
for: 10m
labels:
service: service_overload
severity: pager
dev_team: myteam
annotations:
dev_team: myteam
priority: Blocker
identifier: '{{ $labels.name }}'
description: 'service overflow'
value: '{{ humanize $value }}%'
On /var/data/prometheus-stack/prometheus/prometheus.yml add this snippet to integrate alertmanager:
global:
...
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager:9093"
rule_files:
- "yourRuleFile.yml"
...
Pay attention that the key point of this example is service_overload which connects and binds the rule into the right receiver.
Reload the config (restart the service again or stop and start your docker containers) and test it, if it's configured well you can watch the alerts in http://your-prometheus-url:9090/alerts

SNMP Exporter (prometheus) + Extend OID

I'm new on Prometheus and i'm trying to monitor some extend OID I created with the snmp_exporter, but it doesn't work as expected.
My script just does an "echo $VALUE" (value is an integer or a string).
I have this snmpd.conf :
extend value-return-test /usr/local/bin/script.sh
I generated his OID :
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendResult.\"value-return-test\" -On
Now I'm able to get all the snmp extend link to my configuration :
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendObjects |grep value-return-test
Now, here is my prometheus configuration prometheus.yml :
global:
scrape_interval: 5s
- job_name: 'snmp'
metrics_path: /snmp
params:
module: [tests]
static_configs:
- targets:
- 127.0.0.1 # SNMP device - add your IPs here
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: 127.0.0.1:9116 # SNMP exporter.
and my snmp.yaml :
tests:
walk:
- 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
- 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
metrics:
- name: snmp_test1
oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
type: DisplayString
indexes:
- labelname: ifIndex
type: Integer32
- name: snmp_test2
oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
type: DisplayString
indexes:
- labelname: ifIndex
type: Integer32
With that configuration I'm not able to get my value on the page http://localhost:9116/snmp?target=127.0.0.1&module=tests :
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 0.004676028
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 0
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 0.004477656
However if I put my configuration into an other block like the if_mib, I'm able to get the values BUT they are put in the wrong place :
As you can see I got the value "1" instead of "6".
I also tried the snmp exporter generator but i'm not able to build it :
$ go build
# github.com/prometheus/snmp_exporter/generator
./net_snmp.go:6:38: fatal error: net-snmp/net-snmp-config.h: No such file or directory
compilation terminated.
Thanks for your help
If you are able to change snmpd.conf that implies that you have enough control over the machine to run the node exporter. I'd suggest using the textfile collector of the node exporter to expose this data, rather than spending time figuring out the intricacies of how SNMP and MIBs work.
In general you should prefer the Node/WMI exporters where possible over using SNMP.
Using the get parameter instead of walk worked for me.
tests:
get:
- 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
- 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
metrics:
- name: snmp_test1
oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
type: DisplayString
indexes:
- labelname: ifIndex
type: Integer32
- name: snmp_test2
oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
type: DisplayString
indexes:
- labelname: ifIndex
type: Integer32

Resources