Fluent Bit gelf OUTPUT to Graylog: connection times out after 10 seconds

We are trying to ship EKS logs to Graylog.
We deployed Graylog, together with MongoDB and Elasticsearch, using Helm charts. Graylog itself works fine.
Once Graylog was running, we deployed Fluent Bit to collect the EKS logs.
The Fluent Bit configuration we use to send logs to Graylog is:
inputs: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
DB /var/log/flb_graylog.db
Parser docker
Docker_Mode On
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 10
Key log
[INPUT]
Name systemd
Tag host.*
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Read_From_Tail On
## https://docs.fluentbit.io/manual/pipeline/filters
filters: |
[FILTER]
Name kubernetes
Match kube.*
Merge_Log On
Merge_Log_Key log_processed
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Annotations Off
Labels On
[FILTER]
Name nest
Match *
Operation lift
Nested_under kubernetes
## https://docs.fluentbit.io/manual/pipeline/outputs
outputs: |
[OUTPUT]
Name es
Match kube.*
Host elasticsearch-master
Port 9200
Logstash_Format On
Retry_Limit Off
Replace_Dots On
[OUTPUT]
Name es
Match host.*
Host elasticsearch-master
Port 9200
Logstash_Format On
Logstash_Prefix node
Retry_Limit Off
Replace_Dots On
[OUTPUT]
Name gelf
Match *
Host graylog.example.com
Port 12201
Mode tcp
Gelf_Short_Message_Key short_message
[OUTPUT]
Name syslog
Match *
Host graylog.example.com
Port 541
Mode udp
Syslog_Format rfc5424
Syslog_Maxsize 2048
Syslog_Severity_Key severity
Syslog_Facility_Key facility
Syslog_Sd_Key sd
Syslog_Message_key message
## https://docs.fluentbit.io/manual/pipeline/parsers
customParsers: |
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==================|=================
Decode_Field_As escaped log
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
The Fluent Bit logs show these errors:
[2022/08/02 06:50:18] [error] [upstream] connection #141 to graylog.example.com:12201 timed out after 10 seconds
[2022/08/02 06:50:18] [error] [upstream] connection #126 to graylog.example.com:12201 timed out after 10 seconds
[2022/08/02 06:50:18] [error] [upstream] connection #143 to graylog.example.com:12201 timed out after 10 seconds
[2022/08/02 06:50:18] [error] [upstream] connection #125 to graylog.example.com:12201 timed out after 10 seconds
[2022/08/02 06:50:18] [error] [upstream] connection #144 to graylog.example.com:12201 timed out after 10 seconds
[2022/08/02 06:50:18] [error] [upstream] connection #139 to graylog.example.com:12201 timed out after 10 seconds
[2022/08/02 06:50:18] [error] [output:gelf:gelf.2] no upstream connections available
[2022/08/02 06:50:18] [error] [output:gelf:gelf.2] no upstream connections available
[2022/08/02 06:50:18] [error] [output:gelf:gelf.2] no upstream connections available
[2022/08/02 06:50:18] [error] [output:gelf:gelf.2] no upstream connections available
[2022/08/02 06:50:18] [error] [output:gelf:gelf.2] no upstream connections available
[2022/08/02 06:50:18] [error] [output:gelf:gelf.2] no upstream connections available
[2022/08/02 06:50:18] [ warn] [engine] chunk '1-1659422985.239025920.flb' cannot be retried: task_id=108, input=tail.0 > output=gelf.2
[2022/08/02 06:50:18] [ warn] [engine] chunk '1-1659422988.238308295.flb' cannot be retried: task_id=15, input=tail.0 > output=gelf.2
[2022/08/02 06:50:18] [ warn] [engine] chunk '1-1659422988.801295849.flb' cannot be retried: task_id=116, input=tail.0 > output=gelf.2
[2022/08/02 06:50:18] [ warn] [engine] failed to flush chunk '1-1659423006.238302940.flb', retry in 11 seconds: task_id=46, input=tail.0 > output=gelf.2 (out_id=2)
[2022/08/02 06:50:18] [ warn] [engine] chunk '1-1659422989.738179384.flb' cannot be retried: task_id=105, input=tail.0 > output=gelf.2
[2022/08/02 06:50:18] [ warn] [engine] failed to flush chunk '1-1659423007.739931411.flb', retry in 10 seconds: task_id=56, input=tail.0 > output=gelf.2 (out_id=2)
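The log above shows connections timing out rather than being refused. That usually suggests packets to port 12201 are being dropped somewhere (for example a security group, a network policy, or a service that does not expose the port), not that Graylog is rejecting them. As a rough sketch, assuming Python is available in a pod on the same nodes, the following probe classifies a single TCP connection attempt. The function name probe_tcp and the result labels are made up for this illustration.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 10.0) -> str:
    """Classify the outcome of one TCP connection attempt (illustrative helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all within the timeout: packets are probably being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.gaierror:
        # The host name could not be resolved.
        return "dns-failure"
    except OSError as exc:
        return f"error: {exc}"
```

Calling probe_tcp("graylog.example.com", 12201) from inside the cluster and getting "timeout" would match the Fluent Bit errors. "refused" would instead point at the GELF TCP input not listening on that port.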

Related

fluentbit fails to communicate with fluentd

I am trying a simple Fluent Bit / Fluentd test over IPv6, but it is not working.
Configuration on the Fluent Bit side:
[SERVICE]
Flush 5
Daemon off
[INPUT]
Name cpu
Tag fluent_bit
[OUTPUT]
Name forward
Match *
Host fd00:7fff:0:2:9c43:9bff:fe00:bb
Port 24000
Configuration on the Fluentd side:
<source>
type forward
bind ::
port 24000
</source>
<match fluent_bit>
type stdout
</match>
I start up fluentd with command: /usr/sbin/td-agent -c test.conf
Then, I start up fluentbit with command: /opt/td-agent-bit/bin/td-agent-bit -c test.conf
The output shows that there is a problem with communication:
Fluent Bit v1.6.6
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/11/29 01:53:49] [ info] [engine] started (pid=142)
[2020/11/29 01:53:49] [ info] [storage] version=1.0.6, initializing...
[2020/11/29 01:53:49] [ info] [storage] in-memory
[2020/11/29 01:53:49] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/11/29 01:53:49] [ info] [sp] stream processor started
[2020/11/29 01:53:53] [error] [io] connection #27 failed to: fd00:7fff:0:2:9c43:9bff:fe00:bb:24000
[2020/11/29 01:53:53] [error] [output:forward:forward.0] no upstream connections available
[2020/11/29 01:53:53] [ warn] [engine] failed to flush chunk '142-1606614829.871139401.flb', retry in 8 seconds: task_id=0, input=cpu.0 > output=forward.0
[2020/11/29 01:53:58] [error] [io] connection #28 failed to: fd00:7fff:0:2:9c43:9bff:fe00:bb:24000
[2020/11/29 01:53:58] [error] [output:forward:forward.0] no upstream connections available
[2020/11/29 01:53:58] [ warn] [engine] failed to flush chunk '142-1606614833.871418916.flb', retry in 6 seconds: task_id=1, input=cpu.0 > output=forward.0
[2020/11/29 01:54:01] [error] [io] connection #29 failed to: fd00:7fff:0:2:9c43:9bff:fe00:bb:24000
[2020/11/29 01:54:01] [error] [output:forward:forward.0] no upstream connections available
[2020/11/29 01:54:01] [ warn] [engine] chunk '142-1606614829.871139401.flb' cannot be retried: task_id=0, input=cpu.0 > output=forward.0
However when I run the command without using the config file but passing the parameters, it works:
# /opt/td-agent-bit/bin/td-agent-bit -i cpu -t fluent_bit -o forward://[fd00:7fff:0:2:9c43:9bff:fe00:bb]:24000 -v
Fluent Bit v1.6.6
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/11/29 01:56:53] [ info] Configuration:
[2020/11/29 01:56:53] [ info] flush time | 5.000000 seconds
[2020/11/29 01:56:53] [ info] grace | 5 seconds
[2020/11/29 01:56:53] [ info] daemon | 0
[2020/11/29 01:56:53] [ info] ___________
[2020/11/29 01:56:53] [ info] inputs:
[2020/11/29 01:56:53] [ info] cpu
[2020/11/29 01:56:53] [ info] ___________
[2020/11/29 01:56:53] [ info] filters:
[2020/11/29 01:56:53] [ info] ___________
[2020/11/29 01:56:53] [ info] outputs:
[2020/11/29 01:56:53] [ info] forward.0
[2020/11/29 01:56:53] [ info] ___________
[2020/11/29 01:56:53] [ info] collectors:
[2020/11/29 01:56:53] [ info] [engine] started (pid=151)
[2020/11/29 01:56:53] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/11/29 01:56:53] [debug] [storage] [cio stream] new stream registered: cpu.0
[2020/11/29 01:56:53] [ info] [storage] version=1.0.6, initializing...
[2020/11/29 01:56:53] [ info] [storage] in-memory
[2020/11/29 01:56:53] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/11/29 01:56:53] [debug] [forward:forward.0] created event channels: read=20 write=21
[2020/11/29 01:56:53] [debug] [router] default match rule cpu.0:forward.0
[2020/11/29 01:56:53] [ info] [sp] stream processor started
[2020/11/29 01:56:57] [debug] [task] created task=0x7f9b4e8580a0 id=0 OK
[2020/11/29 01:56:57] [debug] [output:forward:forward.0] request 5525 bytes to flush
[2020/11/29 01:56:57] [debug] [upstream] KA connection #27 to fd00:7fff:0:2:9c43:9bff:fe00:bb:24000 is now available
[2020/11/29 01:56:57] [debug] [task] destroy task=0x7f9b4e8580a0 (task_id=0)
[2020/11/29 01:57:02] [debug] [task] created task=0x7f9b4e8580a0 id=0 OK
[2020/11/29 01:57:02] [debug] [output:forward:forward.0] request 4420 bytes to flush
Does anyone understand what the difference is and how I can fix this problem?
I resolved this issue by taking the source from the latest fluent-bit project and compiling it. Then I replaced /usr/sbin/td-agent-bit with the newly built fluent-bit (as td-agent).
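For context, the working command line wraps the IPv6 address in brackets, while the error log prints it as address:port with no brackets, which is ambiguous for an IPv6 literal. The sketch below does not reproduce Fluent Bit's own parsing. It only illustrates, using Python's standard library, how the bracketed form separates host from port. The helper names parse_endpoint and is_ipv6_literal are made up for this example.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_endpoint(uri: str) -> tuple[str, int]:
    """Split a URI such as forward://[addr]:port into (host, port)."""
    parts = urlsplit(uri)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host/port found in {uri!r}")
    # For a bracketed IPv6 literal, hostname is returned without the brackets.
    return parts.hostname, parts.port


def is_ipv6_literal(host: str) -> bool:
    """Return True if host is a literal IPv6 address rather than a name."""
    try:
        return ipaddress.ip_address(host).version == 6
    except ValueError:
        return False
```

With the URI from the question, parse_endpoint returns the address and 24000 as separate values, which is what the output plugin needs before it can open the connection.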

Kubernetes cluster does not run after reboot

After a reboot, every kubectl command fails with this error:
The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?
When I check the containers with docker ps, kube-apiserver and kube-scheduler keep starting and stopping.
Why is this happening?
root@taeil-linux:/etc/systemd/system/kubelet.service.d# cd
root@taeil-linux:~# kubectl get nodes
The connection to the server 10.0.0.152:6443 was refused - did you specify the right host or port?
root@taeil-linux:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@taeil-linux:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.15.3 232b5c793146 2 weeks ago 82.4MB
k8s.gcr.io/kube-apiserver v1.15.3 5eb2d3fc7a44 2 weeks ago 207MB
k8s.gcr.io/kube-scheduler v1.15.3 703f9c69a5d5 2 weeks ago 81.1MB
k8s.gcr.io/kube-controller-manager v1.15.3 e77c31de5547 2 weeks ago 159MB
node carbon c83f74dcf58e 3 weeks ago 895MB
kubernetesui/dashboard v2.0.0-beta1 4640949a39e6 2 months ago 64.6MB
weaveworks/weave-kube 2.5.2 f04a043bb67a 3 months ago 148MB
weaveworks/weave-npc 2.5.2 5ce48e0d813c 3 months ago 49.6MB
kubernetesui/metrics-scraper v1.0.0 44390ebe2b73 4 months ago 36.8MB
k8s.gcr.io/coredns 1.3.1 eb516548c180 7 months ago 40.3MB
k8s.gcr.io/etcd 3.3.10 2c4adeb21b4f 9 months ago 258MB
quay.io/coreos/flannel v0.10.0-amd64 f0fad859c909 19 months ago 44.6MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 20 months ago 742kB
root@taeil-linux:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Fri 2019-09-06 14:29:25 KST; 4min 19s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 14470 (kubelet)
Tasks: 19 (limit: 4512)
CGroup: /system.slice/kubelet.service
└─14470 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-con
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.800330 14470 pod_workers.go:190] Error syncing pod 9a745ac0a776afabd0d387fd0fcb2f54 ("kube-apiserver-taeil-linux_kube-system(9a745ac0a776afabd0d387fd0fcb2f54)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-ta
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.897945 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.916566 14470 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.152:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dtaeil-linux&limit=500&resourceVersion=0: dia
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.998190 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.098439 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.198732 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.299052 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.399343 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.499561 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.599723 14470 kubelet.go:2248] node "taeil-linux" not found
root@taeil-linux:~# systemctl status kube-apiserver
Unit kube-apiserver.service could not be found.
If I run docker logs on the container, I get:
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0906 10:54:19.636649 1 server.go:560] external host was not specified, using 10.0.0.152
I0906 10:54:19.636954 1 server.go:147] Version: v1.15.3
I0906 10:54:21.753962 1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.753988 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
E0906 10:54:21.754660 1 prometheus.go:55] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754701 1 prometheus.go:68] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754787 1 prometheus.go:82] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754842 1 prometheus.go:96] failed to register workDuration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754883 1 prometheus.go:112] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754918 1 prometheus.go:126] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754952 1 prometheus.go:152] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754986 1 prometheus.go:164] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755047 1 prometheus.go:176] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755104 1 prometheus.go:188] failed to register work_duration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755152 1 prometheus.go:203] failed to register unfinished_work_seconds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755188 1 prometheus.go:216] failed to register longest_running_processor_microseconds metric admission_quota_controller: duplicate metrics collector registration attempted
I0906 10:54:21.755215 1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.755226 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I0906 10:54:21.757263 1 client.go:354] parsed scheme: ""
I0906 10:54:21.757280 1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:21.757335 1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0 <nil>}]
I0906 10:54:21.757402 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:21.757666 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0906 10:54:22.753069 1 client.go:354] parsed scheme: ""
I0906 10:54:22.753118 1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:22.753204 1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0 <nil>}]
I0906 10:54:22.753354 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:22.753855 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:22.757983 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:23.754019 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:24.430000 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:25.279869 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:26.931974 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:28.198719 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:30.825660 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:32.850511 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:36.294749 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:38.737408 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
F0906 10:54:41.757603 1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry {[https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt} true 0xc00063dd40 apiextensions.k8s.io/v1beta1 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:2379: connect: connection refused)
The answer is in the comment by @cewood:
Okay, that helps to understand what your installation is likely to look
like. Regarding the other master components, these are likely running
via the kubelet, and hence there won't be any systemd units for them,
only for the kubelet itself.
With a kubeadm install you don't see the control-plane components as services.
As root:
systemctl start docker
systemctl start kubelet
Then switch to a non-root user:
su - nonrootuser
kubectl get pods
Long time no see.
I finally figured out how to solve this problem!
If you get an error like this for no apparent reason, you can fix it by running:
docker rm $(docker ps -a -q)
Perhaps the Kubernetes containers left over from before the reboot caused an error, so the newly started containers crashed.
watch docker ps
If you watch the containers this way, you can see that kube-apiserver and the others exit within a minute.
So I deleted all the containers listed by docker ps -a, and that fixed it!
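Note that docker rm without -f refuses to remove running containers, so the command above effectively clears out only the stopped ones. To see in advance which containers that would be, here is a minimal sketch that filters the output of docker ps -a --format '{{.ID}} {{.Status}}'. The function name exited_container_ids is made up, and treating only "Exited" and "Created" as removable is a simplification.

```python
def exited_container_ids(ps_output: str) -> list[str]:
    """Return IDs of non-running containers from 'ID Status...' lines.

    Expects the text printed by:
        docker ps -a --format '{{.ID}} {{.Status}}'
    """
    ids = []
    for line in ps_output.splitlines():
        container_id, _, status = line.strip().partition(" ")
        # Running containers report "Up ..."; stopped ones "Exited (...) ...".
        if status.startswith(("Exited", "Created")):
            ids.append(container_id)
    return ids
```

Feeding it the saved output of that docker ps command gives the list of containers that a plain docker rm would actually delete.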

Hyperledger Fabric peer container cannot communicate with couchdb container

I am trying to configure a Hyperledger Fabric network.
I run three ZooKeeper nodes, three Kafka brokers, three orderers and one CouchDB instance.
They are Docker containers and work well. All containers are on the same Docker network, called ibknet.
After that, I start the peer container, but it fails to reach the CouchDB container while it is starting. Here is the command I use to run the peer container.
docker run -d --name main.stock.ibk.com --hostname main.stock.ibk.com \
-p 7051:7051 \
-p 7053:7053 \
-e CORE_LEDGER_STATE_STATEDATABASE=CouchDB \
-e CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS=couchdb0:5984 \
--network ibknet hyperledger/fabric-peer:x86_64-1.0.4
Here is the list of running docker containers before executing the command above.
0b49e1d9e135 hyperledger/fabric-couchdb:x86_64-1.0.4 "tini -- /docker-e..." 28 minutes ago Up 28 minutes 4369/tcp, 5984/tcp, 9100/tcp couchdb0
592e81a721f1 hyperledger/fabric-orderer:x86_64-1.0.4 "orderer" 28 minutes ago Up 28 minutes 0.0.0.0:9050->7050/tcp orderer2
5a0d0f05770d hyperledger/fabric-orderer:x86_64-1.0.4 "orderer" 28 minutes ago Up 28 minutes 0.0.0.0:8050->7050/tcp orderer1
681894ae835a hyperledger/fabric-orderer:x86_64-1.0.4 "orderer" 28 minutes ago Up 28 minutes 0.0.0.0:7050->7050/tcp orderer0
2c1910c3e293 hyperledger/fabric-kafka:x86_64-1.0.4 "/docker-entrypoin..." 28 minutes ago Up 28 minutes 9092-9093/tcp kafka2
b27ff82abd3d hyperledger/fabric-kafka:x86_64-1.0.4 "/docker-entrypoin..." 29 minutes ago Up 28 minutes 9092-9093/tcp kafka1
2f84d000c2d6 hyperledger/fabric-kafka:x86_64-1.0.4 "/docker-entrypoin..." 29 minutes ago Up 28 minutes 9092-9093/tcp kafka0
25438ef57579 hyperledger/fabric-zookeeper:x86_64-1.0.4 "/docker-entrypoin..." 29 minutes ago Up 29 minutes 2181/tcp, 2888/tcp, 3888/tcp zookeeper2
c01262ae099e hyperledger/fabric-zookeeper:x86_64-1.0.4 "/docker-entrypoin..." 29 minutes ago Up 29 minutes 2181/tcp, 2888/tcp, 3888/tcp zookeeper1
f95e92b10b25 hyperledger/fabric-zookeeper:x86_64-1.0.4 "/docker-entrypoin..." 29 minutes ago Up 29 minutes 2181/tcp, 2888/tcp, 3888/tcp zookeeper0
I get these error messages in docker logs when I run the peer container.
2017-12-01 07:26:34.238 UTC [ledgermgmt] initialize -> INFO 002 Initializing ledger mgmt
2017-12-01 07:26:34.238 UTC [kvledger] NewProvider -> INFO 003 Initializing ledger provider
2017-12-01 07:26:34.624 UTC [couchdb] handleRequest -> WARN 004 Retrying couchdb request in 125ms. Attempt:1 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:34.987 UTC [couchdb] handleRequest -> WARN 005 Retrying couchdb request in 250ms. Attempt:2 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:35.533 UTC [couchdb] handleRequest -> WARN 006 Retrying couchdb request in 500ms. Attempt:3 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:36.279 UTC [couchdb] handleRequest -> WARN 007 Retrying couchdb request in 1s. Attempt:4 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:37.617 UTC [couchdb] handleRequest -> WARN 008 Retrying couchdb request in 2s. Attempt:5 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:39.819 UTC [couchdb] handleRequest -> WARN 009 Retrying couchdb request in 4s. Attempt:6 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:44.019 UTC [couchdb] handleRequest -> WARN 00a Retrying couchdb request in 8s. Attempt:7 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:26:52.279 UTC [couchdb] handleRequest -> WARN 00b Retrying couchdb request in 16s. Attempt:8 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:27:08.515 UTC [couchdb] handleRequest -> WARN 00c Retrying couchdb request in 32s. Attempt:9 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
2017-12-01 07:27:40.730 UTC [couchdb] handleRequest -> WARN 00d Retrying couchdb request in 1m4s. Attempt:10 Error:Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
panic: Error in instantiating ledger provider: Unable to connect to CouchDB, check the hostname and port: Get http://couchdb0:5984/: dial tcp 75.126.101.240:5984: getsockopt: connection refused
What am I missing? I also have no idea where the address 75.126.101.240 comes from.
I did manage to run it with the --add-host option by specifying the exact IP. But I don't think that is the proper way, because the IP changes whenever I restart the Docker containers.
Please help with this issue. Thanks.
Try adding "--dns-search ." to the docker run command. The resolver on your host system is probably configured to append a search domain to unknown hosts. "--dns-search ." should force resolution through the embedded Docker DNS only.
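To illustrate why a search domain can produce an unexpected address, here is a simplified model of how a stub resolver expands a short name. It is a sketch, not Docker's or glibc's actual code. The domain example.com is a placeholder, and the idea that the host's search domain answers for any name (which would explain 75.126.101.240) is an assumption.

```python
def lookup_candidates(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    searched = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + searched
    # Too few dots: the search domains are tried before the bare name.
    return searched + [name]
```

With a search domain configured, the bare name couchdb0 is tried as couchdb0.example.com first. With an empty search list, which as I understand it is what "--dns-search ." gives you, only couchdb0 itself is looked up.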

JMX_exporter shows error in Prometheus & Grafana

I am using the JMX exporter to monitor a Java application deployed on Jetty.
I downloaded the jmx_prometheus_javaagent-0.1.0.jar file
and started the Java application with the JMX exporter agent using this command:
nohup java -javaagent:./jmx_prometheus_javaagent-0.1.0.jar=7101:config.yaml -Dorg.eclipse.jetty.server.Request.maxFormContentSize=10000000 -Xms256m -Xmx256m -Djava.io.tmpdir=epoch_temp_dir -jar jetty-runner-9.0.7.v20131107.jar --log yyyy_mm_dd-java-application-1-request.log --out yyyy_mm_dd-java-application-1-output.log --port 8091 --path /java-application-1 java-app1.war >> java-application-1.log 2>&1 &
config.yaml:
# cat config.yaml
---
startDelaySeconds: 0
jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:7101/jmxrmi
ssl: false
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
- pattern: ".*"
Prometheus shows "connection timed out" on its status page.
Output log of the deployed Java application:
io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector collect
SEVERE: JMX scrape failed: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
java.net.SocketTimeoutException: Read timed out]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JmxScraper.doScrape(JmxScraper.java:106)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector.collect(JmxCollector.java:415)
at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:180)
at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:213)
at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:134)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer$HTTPMetricHandler.handle(HTTPServer.java:59)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
java.net.SocketTimeoutException: Read timed out]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:136)
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205)
at javax.naming.InitialContext.lookup(InitialContext.java:417)
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955)
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
... 17 more
Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
java.net.SocketTimeoutException: Read timed out
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:304)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:342)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
I opened port 7101 on the client server and granted access to the Prometheus server.
Prometheus Server RAM usage:
# free -h
total used free shared buff/cache available
Mem: 4.8G 1.9G 118M 256M 2.8G 2.3G
Swap: 0B 0B 0B
Client server RAM usage:
# free -h
total used free shared buff/cache available
Mem: 9.8G 3.7G 435M 16M 5.6G 5.7G
Swap: 0B 0B 0B
Running curl localhost:7101 on the client server gets no response.
Remove jmxUrl from the config; it is not required when the exporter runs as a Java agent.
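A likely reading of the stack trace is that jmxUrl points at 127.0.0.1:7101, the same port the agent serves its HTTP metrics on, so the collector attempts an RMI handshake against its own HTTP server and times out. As a small sketch of that check, the helpers below compare the two ports. The function names are made up, and the parsing assumes the [host:]port:config form of the agent argument.

```python
import re
from typing import Optional


def agent_http_port(javaagent_arg: str) -> int:
    """Extract the HTTP port from '-javaagent:<jar>=[host:]<port>:<config>'."""
    _, _, options = javaagent_arg.partition("=")
    # The port is always the field just before the config file name.
    return int(options.split(":")[-2])


def jmx_url_port(config_text: str) -> Optional[int]:
    """Return the port named in a jmxUrl line, or None if there is no jmxUrl."""
    match = re.search(r"^jmxUrl:.*rmi://[^/:]+:(\d+)/", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def scrapes_own_http_port(javaagent_arg: str, config_text: str) -> bool:
    """True when jmxUrl targets the port the agent itself serves metrics on."""
    return jmx_url_port(config_text) == agent_http_port(javaagent_arg)
```

For the command and config.yaml in the question both ports are 7101, so the check returns True. Once the jmxUrl line is removed it returns False.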

Telegraf database creation failed: how to configure ports on macOS for InfluxDB and Telegraf

Following the documentation for Telegraf v0.13, we try to launch a Telegraf instance in a terminal:
telegraf -config telegraf.conf
Note the "Database creation failed" error:
Get http://localhost:8086/query?db=&q=CREATE+DATABASE+IF+NOT+EXISTS+%22telegraf%22: dial tcp 127.0.0.1:8086: getsockopt: connection refused
How can this be fixed? Is it a firewall problem?
The error is visible in the terminal log output:
2016/07/27 22:15:11 Starting Telegraf (version 0.13.1)
2016/07/27 22:15:11 Loaded outputs: influxdb
2016/07/27 22:15:11 Loaded inputs: cpu mem
2016/07/27 22:15:11 Tags enabled: host=johns-MacBook-Pro-2.local
2016/07/27 22:15:11 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"johns-MacBook-Pro-2.local", Flush Interval:10s
2016/07/27 22:15:30 Output [influxdb] buffer fullness: 11 / 10000 metrics. Total gathered metrics: 11. Total dropped metrics: 0.
2016/07/27 22:15:30 Database creation failed: Get http://localhost:8086/query?db=&q=CREATE+DATABASE+IF+NOT+EXISTS+%22telegraf%22: dial tcp 127.0.0.1:8086: getsockopt: connection refused
2016/07/27 22:15:30 Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
2016/07/27 22:15:40 Output [influxdb] buffer fullness: 21 / 10000 metrics. Total gathered metrics: 21. Total dropped metrics: 0.
2016/07/27 22:15:40 Database creation failed: Get http://localhost:8086/query?db=&q=CREATE+DATABASE+IF+NOT+EXISTS+%22telegraf%22: dial tcp 127.0.0.1:8086: getsockopt: connection refused
Okay, it wasn't a port issue. InfluxDB wasn't running properly.
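A "connection refused" on 127.0.0.1:8086 means nothing was listening there, which fits InfluxDB not running. One quick way to confirm this before starting Telegraf is to call InfluxDB's /ping endpoint. This is a minimal sketch, assuming a 1.x-era InfluxDB that answers /ping with HTTP 204. The function name influxdb_is_up is made up.

```python
import urllib.error
import urllib.request


def influxdb_is_up(base_url: str = "http://localhost:8086", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/ping answers with HTTP 204."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 204
    except (urllib.error.URLError, OSError):
        # Refused, timed out, or unresolvable: treat the server as down.
        return False
```

If influxdb_is_up() returns False, start InfluxDB (or check its own log) before looking at firewall settings.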

Resources