Docker Swarm - Elasticsearch instances cannot resolve each other on the same overlay network - docker

I am trying to create an Elasticsearch stack using Docker Swarm (I don't really need the Swarm functionality, but the .yml has already been written).
The problem is that when I start up the stack, the two masters can't resolve each other, despite my confirming they're on the same network.
It may be the fault of discovery.seed_hosts handing the masters an empty list, or the Docker network may not be working properly. The relevant log messages are:
"message":"publish_address {10.0.120.16:9300}, bound_addresses {0.0.0.0:9300}"
"message":"bound or publishing to a non-loopback address, enforcing bootstrap checks",
"message":"failed to resolve host [es-master2]"
"message":"master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes [es-master1, es-master2] to bootstrap a cluster: have discovered [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}]; discovery will continue using [] from hosts providers and [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
I'm not sure whether "discovery will continue using []" means that something went wrong with the discovery.seed_hosts setting. I suspect it does, because without that setting the list would default to [127.0.0.1:9300, ...] instead.
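To rule out a plain DNS problem I'm planning to check resolution from inside a running task. A rough sketch of the commands I have in mind (the container name is a placeholder for whatever docker ps shows for the es-master1 task; I'm using getent because I'm not sure the ES image ships nslookup):
# find the task container for es-master1 on this node
docker ps --filter name=es-master1 --format '{{.Names}}'
# from inside that container, try to resolve the other master by service name
docker exec -it <es-master1-task-container> getent hosts es-master2
# with endpoint_mode: dnsrr, tasks.<service> should also list all task IPs on the overlay
docker exec -it <es-master1-task-container> getent hosts tasks.es-master2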
Here are the relevant parts of my docker-compose file. It's part of a bigger file, but right now I just need the two masters to talk to each other.
.
.
.
networks:
es-internal:
driver: overlay
es-connection:
driver: overlay
external:
driver: overlay
name: external
.
.
.
es-master1:
image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
hostname: es-master1
environment:
- node.name=es-master1
- cluster.name=${CLUSTER_NAME}
- node_role="master,ingest"
- cluster.initial_master_nodes=es-master1,es-master2
- discovery.seed_hosts=es-master2
# - bootstrap.memory_lock=true
- xpack.security.enabled=true
- xpack.security.http.ssl.enabled=true
- xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
- xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
- xpack.security.http.ssl.certificate_authorities/usr/share/elasticsearch/config/certs/ca.crt
- xpack.security.http.ssl.verification_mode=certificate
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
- xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
- xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
- xpack.security.transport.ssl.verification_mode=certificate
networks:
- es-internal
volumes:
- ./data/es-master1:/usr/share/elasticsearch/data
secrets:
- source: ca-crt
target: /usr/share/elasticsearch/config/certs/ca.crt
- source: es-master1-crt
target: /usr/share/elasticsearch/config/certs/es-master1.crt
- source: es-master1-key
target: /usr/share/elasticsearch/config/certs/es-master1.key
configs:
- source: jvm-coordination
target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
deploy:
endpoint_mode: dnsrr
mode: "replicated"
replicas: 1
resources:
limits:
memory: 1G
depends_on:
- es-coordination
healthcheck:
test: curl -fs http://localhost:9200/_cat/health || exit 1
interval: 30s
timeout: 5s
retries: 3
start_period: 45s
es-master2:
image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
hostname: es-master2
environment:
- node.name=es-master2
- cluster.name=${CLUSTER_NAME}
- node_role="master,ingest"
- cluster.initial_master_nodes=es-master1,es-master2
- discovery.seed_hosts=es-master1
# - bootstrap.memory_lock=true
- xpack.security.enabled=true
- xpack.security.http.ssl.enabled=true
- xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
- xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
- xpack.security.http.ssl.certificate_authorities/usr/share/elasticsearch/config/certs/ca.crt
- xpack.security.http.ssl.verification_mode=certificate
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
- xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
- xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
- xpack.security.transport.ssl.verification_mode=certificate
networks:
- es-internal
volumes:
- ./data/es-master2:/usr/share/elasticsearch/data
secrets:
- source: ca-crt
target: /usr/share/elasticsearch/config/certs/ca.crt
- source: es-master2-crt
target: /usr/share/elasticsearch/config/certs/es-master2.crt
- source: es-master2-key
target: /usr/share/elasticsearch/config/certs/es-master2.key
configs:
- source: jvm-coordination
target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
deploy:
endpoint_mode: dnsrr
mode: "replicated"
replicas: 1
resources:
limits:
memory: 1G
depends_on:
- es-coordination
healthcheck:
test: curl -fs http://localhost:9200/_cat/health || exit 1
interval: 30s
timeout: 5s
retries: 3
start_period: 45s
.
.
.
I've tried:
Inspecting the network the stack creates, to confirm both services are on the same network
All sorts of ES config options
A new Swarm service meant to test pinging between containers on the same overlay network (they could resolve the hostnames given to them with the ping command); a sketch of that test is below
Exposing ports (I know that's useless now)
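Roughly, that ping test stack looked like the sketch below (illustrative only - the image, service names and network are assumptions, not my real file):
version: '3.8'
networks:
  ping-net:
    driver: overlay
services:
  ping-a:
    image: alpine:3.16
    command: sleep 3600        # keep the task alive long enough to exec into it
    networks:
      - ping-net
    deploy:
      endpoint_mode: dnsrr
      replicas: 1
  ping-b:
    image: alpine:3.16
    command: sleep 3600
    networks:
      - ping-net
    deploy:
      endpoint_mode: dnsrr
      replicas: 1
# then: docker exec -it <ping-a-task-container> ping -c 3 ping-b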
If anyone has some input it would be great!

Related

Corruption of Portainer's DB

I have a deployment of Portainer 2.14.2 and Docker Engine 20.10.7.
It had been functional for quite a few months. Today I had some problems: the Portainer container (the one in charge of the UI, not the agent) kept restarting. In one of those restarts, for an unknown reason, the database was corrupted.
Logs:
time="2022-10-19T10:59:15Z" level=info msg="Encryption key file `portainer` not present"
time="2022-10-19T10:59:15Z" level=info msg="Proceeding without encryption key"
time="2022-10-19T10:59:15Z" level=info msg="Loading PortainerDB: portainer.db"
panic: page 8 already freed
goroutine 35 [running]:
go.etcd.io/bbolt.(*freelist).free(0xc000728600, 0xb175, 0x7f104c311000)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/freelist.go:175 +0x2c8
go.etcd.io/bbolt.(*node).spill(0xc000152070)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/node.go:359 +0x216
go.etcd.io/bbolt.(*node).spill(0xc000152000)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/node.go:346 +0xaa
go.etcd.io/bbolt.(*Bucket).spill(0xc00013e018)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/bucket.go:570 +0x33f
go.etcd.io/bbolt.(*Tx).Commit(0xc00013e000)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:160 +0xe7
go.etcd.io/bbolt.(*DB).Update(0xc0001f1000?, 0xc000134ef8)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:748 +0xe5
go.etcd.io/bbolt.(*batch).run(0xc00031c000)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:856 +0x126
sync.(*Once).doSlow(0x0?, 0x1?)
/opt/hostedtoolcache/go/1.18.3/x64/src/sync/once.go:68 +0xc2
sync.(*Once).Do(...)
/opt/hostedtoolcache/go/1.18.3/x64/src/sync/once.go:59
go.etcd.io/bbolt.(*batch).trigger(0xc000321a00?)
/tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:838 +0x45
created by time.goFunc
/opt/hostedtoolcache/go/1.18.3/x64/src/time/sleep.go:176 +0x32
My hypothesis is that in one of those restarts the container was stopped in the middle of a write (although I am not 100% sure).
This is the first time this has happened to me, so I don't know how to recover from this state without deploying a new Portainer stack or erasing the whole database, which would be a really drastic solution.
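Before wiping anything I was thinking of trying to salvage the file offline with the bbolt CLI, working on a copy of the database. This is only a sketch of the idea and I have not verified it against Portainer (the CLI can be installed with go install go.etcd.io/bbolt/cmd/bbolt@latest):
# work on a copy while the portainer service is stopped
cp /var/volumes/portainer/data/portainer.db ./portainer.db.bak
# report corruption, if any
bbolt check ./portainer.db.bak
# write a compacted copy; compaction rewrites the pages and can sometimes
# get past freelist problems like "page N already freed"
bbolt compact -o ./portainer.db.repaired ./portainer.db.bak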
If it helps, this is the docker-compose file:
version: "3.8"
networks:
net:
external: true
services:
agent:
image: portainer/agent:2.14.2-alpine
environment:
AGENT_CLUSTER_ADDR: tasks.agent
AGENT_PORT: 9001
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /var/lib/docker/volumes:/var/lib/docker/volumes
networks:
- net
deploy:
mode: global
restart_policy:
condition: on-failure
portainer:
image: portainer/portainer-ce:2.14.2-alpine
command: -H tcp://tasks.agent:9001 --tlsskipverify --admin-password-file=/run/secrets/portainer_secret
ports:
- "9000:9000"
- "8000:8000"
volumes:
- "/var/volumes/portainer/data:/data"
networks:
- net
secrets:
- portainer_secret
- source: ca_cert_secret
target: /etc/ssl/certs/localCA.pem
deploy:
mode: replicated
replicas: 1
restart_policy:
condition: on-failure
placement:
constraints:
- node.labels.stateful == true
labels:
- "traefik.enable=true"
- "traefik.passHostHeader=true"
- "traefik.http.routers.portainer.rule=Host(`portainer`)"
- "traefik.http.services.portainer.loadbalancer.server.port=9000"
- "traefik.http.routers.portainer.entrypoints=web"
- "traefik.http.routers.portainer.service=portainer"
- "traefik.http.routers.portainer.tls=true"
- "traefik.http.routers.portainer.entrypoints=web-secure"
secrets:
portainer_secret:
external: true
ca_cert_secret:
external: true

Elasticsearch Logstash and Kibana (ELK) stack docker-compose on EC2 failed status checks

I'm running an ELK stack on a t4g.medium box (ARM, 4GB RAM) on AWS. When using the official Kibana image I see weird behaviour: after approx. 4 hours of running, the CPU spikes (50-60%) and the EC2 box becomes unreachable until restarted. 1 out of 2 status checks also fails. Once restarted it runs for another 4 or so hours, then the same happens again. The instance is not under heavy load, and it goes down in the middle of the night when there is no load. I'm 99.9% sure it's Kibana causing the issue, as gagara/kibana-oss-arm64:7.6.2 has run for months without issue. It's not an ARM issue or a Kibana 7.13 issue either, as I've encountered the same with x86 on older versions of Kibana. My config is:
version: '3.8'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:7.13.0
configs:
- source: elastic_config
target: /usr/share/elasticsearch/config/elasticsearch.yml
environment:
ES_JAVA_OPTS: "-Xmx2g -Xms2g"
networks:
- internal
volumes:
- /mnt/data/elasticsearch:/usr/share/elasticsearch/data
deploy:
mode: replicated
replicas: 1
logstash:
image: docker.elastic.co/logstash/logstash:7.13.0
ports:
- "5044:5044"
- "9600:9600"
configs:
- source: logstash_config
target: /usr/share/logstash/config/logstash.yml
- source: logstash_pipeline
target: /usr/share/logstash/pipeline/logstash.conf
environment:
LS_JAVA_OPTS: "-Xmx1g -Xms1g"
networks:
- internal
deploy:
mode: replicated
replicas: 1
kibana:
image: docker.elastic.co/kibana/kibana:7.13.0
configs:
- source: kibana_config
target: /usr/share/kibana/config/kibana.yml
environment:
NODE_OPTIONS: "--max-old-space-size=300"
networks:
- internal
deploy:
mode: replicated
replicas: 1
labels:
- "traefik.enable=true"
load-balancer:
image: traefik:v2.2.8
ports:
- 5601:443
configs:
- source: traefik_config
target: /etc/traefik/traefik.toml
volumes:
- /var/run/docker.sock:/var/run/docker.sock
deploy:
restart_policy:
condition: any
mode: replicated
replicas: 1
networks:
- internal
configs:
elastic_config:
file: ./config/elasticsearch.yml
logstash_config:
file: ./config/logstash/logstash.yml
logstash_pipeline:
file: ./config/logstash/pipeline/pipeline.conf
kibana_config:
file: ./config/kibana.yml
traefik_config:
file: ./config/traefik.toml
networks:
internal:
driver: overlay
And I've disabled a pile of stuff in kibana.yml to see if that helped:
server.name: kibana
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://elasticsearch:9200"]
xpack.monitoring.ui.enabled: false
xpack.graph.enabled: false
xpack.infra.enabled: false
xpack.canvas.enabled: false
xpack.ml.enabled: false
xpack.uptime.enabled: false
xpack.maps.enabled: false
xpack.apm.enabled: false
timelion.enabled: false
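One thing I haven't tried yet is capping the Kibana service memory at the Swarm level, so that a runaway task gets killed instead of taking the whole box down. A sketch of what I mean (the 512M limit is an arbitrary figure I haven't tested, not a recommendation):
  kibana:
    image: docker.elastic.co/kibana/kibana:7.13.0
    environment:
      NODE_OPTIONS: "--max-old-space-size=300"
    networks:
      - internal
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: any        # restart the task if the limit kills it
      resources:
        limits:
          memory: 512M        # hard cap enforced by the kernel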
Has anyone encountered similar problems with a single node ELK stack running on Docker?

Swarm: Traefik returning 404 on compose services

I have two devices running Docker: an Intel NUC and a Raspberry Pi. My NUC is used as a mediaplayer/mediaserver; it is also the manager node. The Pi is used as a Home Assistant and MQTT machine and is set as a worker node. I wanted to add them to a swarm so I could use Traefik for reverse proxying and HTTPS on both machines.
NUC:
1 docker-compose file for Traefik, Consul and Portainer.
1 docker-compose file for my media apps (Sabnzbd, Transmission-vpn, Sonarr, Radarr etc).
Pi:
1 docker-compose file for Home Assistant, MQTT etc.
Traefik and Portainer are up and running. I got them set up with `docker stack deploy`. Next I tried to set up my media apps, but since they don't need to be connected with the Pi I tried `docker compose` instead. Portainer shows the apps are running, but when I go to their subdomain Traefik returns "404 page not found". This makes me conclude that apps running outside the swarm, but connected to Traefik, don't work. They also don't show up in the Traefik dashboard.
docker-compose.traefik.yml - 'docker stack deploy'
version: '3.7'
networks:
traefik_proxy:
external: true
agent-network:
attachable: true
volumes:
consul-data-leader:
consul-data-replica:
portainer-data:
services:
consul-leader:
image: consul
command: agent -server -client=0.0.0.0 -bootstrap -ui
volumes:
- consul-data-leader:/consul/data
environment:
- CONSUL_BIND_INTERFACE=eth0
- 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
networks:
- traefik_proxy
deploy:
labels:
- traefik.frontend.rule=Host:consul.${DOMAINNAME?Variable DOMAINNAME not set}
- traefik.enable=true
- traefik.port=8500
- traefik.tags=${TRAEFIK_PUBLIC_TAG:-traefik-public}
- traefik.docker.network=traefik_proxy
- traefik.frontend.entryPoints=http,https
- traefik.frontend.redirect.entryPoint=https
- traefik.frontend.auth.forward.address=http://oauth:4181
- traefik.frontend.auth.forward.authResponseHeaders=X-Forwarded-User
- traefik.frontend.auth.forward.trustForwardHeader=true
consul-replica:
image: consul
command: agent -server -client=0.0.0.0 -retry-join="consul-leader"
volumes:
- consul-data-replica:/consul/data
environment:
- CONSUL_BIND_INTERFACE=eth0
- 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
networks:
- traefik_proxy
deploy:
replicas: ${CONSUL_REPLICAS:-3}
placement:
preferences:
- spread: node.id
traefik:
image: traefik:v1.7
hostname: traefik
restart: always
networks:
- traefik_proxy
ports:
- target: 80
published: 80
- target: 443
published: 443
- target: 8080
published: 8145
deploy:
replicas: ${TRAEFIK_REPLICAS:-3}
placement:
constraints:
- node.role == manager
preferences:
- spread: node.id
labels:
traefik.enable: 'true'
traefik.backend: traefik
traefik.protocol: http
traefik.port: 8080
traefik.tags: traefik-public
traefik.frontend.rule: Host:traefik.${DOMAINNAME}
traefik.frontend.headers.SSLHost: traefik.${DOMAINNAME}
traefik.docker.network: traefik_proxy
traefik.frontend.passHostHeader: 'true'
traefik.frontend.headers.SSLForceHost: 'true'
traefik.frontend.headers.SSLRedirect: 'true'
traefik.frontend.headers.browserXSSFilter: 'true'
traefik.frontend.headers.contentTypeNosniff: 'true'
traefik.frontend.headers.forceSTSHeader: 'true'
traefik.frontend.headers.STSSeconds: 315360000
traefik.frontend.headers.STSIncludeSubdomains: 'true'
traefik.frontend.headers.STSPreload: 'true'
traefik.frontend.headers.customResponseHeaders: X-Robots-Tag:noindex,nofollow,nosnippet,noarchive,notranslate,noimageindex
traefik.frontend.headers.customFrameOptionsValue: 'allow-from https:${DOMAINNAME}'
traefik.frontend.auth.forward.address: 'http://oauth:4181'
traefik.frontend.auth.forward.authResponseHeaders: X-Forwarded-User
traefik.frontend.auth.forward.trustForwardHeader: 'true'
domainname: ${DOMAINNAME}
dns:
- 1.1.1.1
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ${USERDIR}/docker/traefik:/etc/traefik
- ${USERDIR}/docker/shared:/shared
environment:
CF_API_EMAIL: ${CLOUDFLARE_EMAIL}
CF_API_KEY: ${CLOUDFLARE_API_KEY}
command:
#- "storeconfig" #This is the push to consul, secondary traefik must be created and interfaced to this traefik. Remove this traefik's open ports, it shuts down once consul is messaged.
- '--logLevel=INFO'
- '--InsecureSkipVerify=true' #for unifi controller to not throw internal server error message
- '--api'
- '--api.entrypoint=apiport'
- '--defaultentrypoints=http,https'
- '--entrypoints=Name:http Address::80 Redirect.EntryPoint:https'
- '--entrypoints=Name:https Address::443 TLS TLS.SniStrict:true TLS.MinVersion:VersionTLS12 CipherSuites:TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256'
- '--entrypoints=Name:apiport Address::8080'
- '--file'
- '--file.directory=/etc/traefik/rules/'
- '--file.watch=true'
- '--acme'
- '--acme.storage=/etc/traefik/acme/acme.json'
- '--acme.entryPoint=https'
# not yet ready?
# - "--acme.TLS-ALPN-01=true"
- '--acme.dnsChallenge=true'
- '--acme.dnsChallenge.provider=cloudflare'
- '--acme.dnsChallenge.delayBeforeCheck=60'
- '--acme.dnsChallenge.resolvers=1.1.1.1,1.0.0.1'
- '--acme.onHostRule=true'
- '--acme.email=admin@${DOMAINNAME}'
- '--acme.acmeLogging=true'
- '--acme.domains=${DOMAINNAME},*.${DOMAINNAME},'
- '--acme.KeyType=RSA4096'
#Let's Encrypt's staging server,
#caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
- '--docker'
- '--docker.swarmMode'
- '--docker.domain=${DOMAINNAME}'
- '--docker.watch'
- '--docker.exposedbydefault=false'
#- "--consul"
#- "--consul.endpoint=consul:8500"
#- "--consul.prefix=traefik"
- '--retry'
- 'resolvers=[192,168.1.1:53,1.1.1.1:53,]'
depends_on:
- consul-leader
docker-compose.media.yml - 'docker compose'
sabnzbd:
image: linuxserver/sabnzbd
container_name: sabnzbd
restart: always
network_mode: service:transmission-vpn
# depends_on:
# - transmission-vpn
# ports:
# - '${SABNZBD_PORT}:8080'
volumes:
- ${USERDIR}/docker/sabnzbd:/config
- /media/Data/Downloads:/Downloads
# - ${USERDIR}/Downloads/incomplete:/incomplete-downloads
environment:
PUID: ${PUID}
PGID: ${PGID}
TZ: ${TZ}
UMASK_SET: 002
deploy:
replicas: 1
labels:
traefik.enable: 'true'
traefik.backend: sabnzbd
traefik.protocol: http
traefik.port: 8080
traefik.tags: traefik_proxy
traefik.frontend.rule: Host:sabnzbd.${DOMAINNAME}
# traefik.frontend.rule: Host:${DOMAINNAME}; PathPrefix: /sabnzbd
traefik.frontend.headers.SSLHost: sabnzbd.${DOMAINNAME}
traefik.docker.network: traefik_proxy
traefik.frontend.passHostHeader: 'true'
traefik.frontend.headers.SSLForceHost: 'true'
traefik.frontend.headers.SSLRedirect: 'true'
traefik.frontend.headers.browserXSSFilter: 'true'
traefik.frontend.headers.contentTypeNosniff: 'true'
traefik.frontend.headers.forceSTSHeader: 'true'
traefik.frontend.headers.STSSeconds: 315360000
traefik.frontend.headers.STSIncludeSubdomains: 'true'
traefik.frontend.headers.STSPreload: 'true'
traefik.frontend.headers.customResponseHeaders: X-Robots-Tag:noindex,nofollow,nosnippet,noarchive,notranslate,noimageindex
# traefik.frontend.headers.frameDeny: "true" #customFrameOptionsValue overrides this
traefik.frontend.headers.customFrameOptionsValue: 'allow-from https:${DOMAINNAME}'
traefik.frontend.auth.forward.address: 'http://oauth:4181'
traefik.frontend.auth.forward.authResponseHeaders: X-Forwarded-User
traefik.frontend.auth.forward.trustForwardHeader: 'true'
I already tried multiple things, like removing the deploy section and just using labels, but that didn't help at all. My Traefik logs also don't show anything that might indicate what's going wrong.
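For completeness, these are the kinds of checks I ran to see whether Traefik can even see the compose-started containers (the commands are illustrative):
# services deployed to the swarm vs. containers started with plain docker compose
docker service ls
docker ps --filter label=traefik.enable=true
# confirm the media containers actually joined the overlay that Traefik watches
docker network inspect traefik_proxy --format '{{range .Containers}}{{.Name}} {{end}}'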
Are you sourcing the .env file to set the environment variables? The .env feature is not currently supported by docker stack. You must manually source the .env by running export $(cat .env) before running docker stack deploy.
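For example, a minimal sketch (the grep just skips comment lines, and the stack name is illustrative):
export $(grep -v '^#' .env | xargs)
docker stack deploy -c docker-compose.traefik.yml traefik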

Getting permission denied error when restoring fabric network with old data

I'm trying to restore a Fabric network with old blockchain data, and to do so I followed the steps below.
Backup process
1. Stopped the docker swarm network.
2. Created a directory `bchain_backup`, and under this directory I created sub-directories for every node, like orderer1, orderer2 and so on.
3. Then I copied the data from each container to the `bchain_backup` directory:
--> docker cp container_name:/var/hyperledger/production bchain_backup/orderer1
--> executed the above step for every node
Restoration process
1. Copied all the certs and channel-artifacts.
2. Mapped '/bchain_backup/orderer1/production:/var/hyperledger/production' in the compose file.
3. Performed step 2 for every node.
When I tried to start the network again, I got the errors below:
with Orderer node
panic: Error opening leveldb: open /var/hyperledger/production/orderer/index/LOCK: permission denied
With peer node
panic: Error opening leveldb: open /var/hyperledger/production/ledgersData/ledgerProvider/LOCK: permission denied
Using CouchDB.
Using Docker Swarm on a GCP Ubuntu 18.04 instance.
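For reference, this is how I compared the ownership of the restored data on the host against the UID/GID the containers run as (commands are illustrative):
# the UID/GID passed to the containers via user: "${UID}:${GID}"
id -u && id -g
# numeric ownership of the restored orderer data that was copied out with docker cp
ls -ln bchain_backup/orderer1/production/orderer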
docker-orderer1.yaml file
version: '3.7'
volumes:
orderer1.example.com:
# set external: true and now network name is "networks.test-network.name" instead of "networks.test-network.external.name"
networks:
testchain-network:
external: true
name: testchain-network
services:
orderer1:
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
placement:
constraints:
- node.hostname == gcloud1
resources:
limits:
cpus: '0.50'
memory: 1000M
reservations:
cpus: '0.25'
memory: 50M
hostname: orderer1.example.com
image: hyperledger/fabric-orderer:1.4.4
user: "${UID}:${GID}"
#healthcheck:
#testchain: ["CMD","curl","-f","http://orderer1.example.com:4443/"]
#interval: 1m30s
#timeout: 10s
#retries: 3
#start_period: 1m
environment:
- CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=testchain-network
- ORDERER_HOST=orderer1.example.com
- ORDERER_GENERAL_LOGLEVEL=info
- FABRIC_LOGGING_SPEC=warning
- ORDERER_GENERAL_LISTENADDRESS=0.0.0.0
- ORDERER_GENERAL_LISTENPORT=7050
- ORDERER_GENERAL_GENESISMETHOD=file
- ORDERER_GENERAL_GENESISFILE=/var/hyperledger/orderer/orderer.genesis.block
- ORDERER_GENERAL_LOCALMSPID=OrdererMSP
- ORDERER_GENERAL_LOCALMSPDIR=/var/hyperledger/orderer/msp
- ORDERER_GENERAL_GENESISPROFILE=OrdererOrg
- CONFIGTX_ORDERER_ADDRESSES=[127.0.0.1:7050]
- ORDERER_OPERATIONS_LISTENADDRESS=0.0.0.0:4443
# enabled TLS
- ORDERER_GENERAL_TLS_ENABLED=true
- ORDERER_GENERAL_TLS_PRIVATEKEY=/var/hyperledger/orderer/tls/server.key
- ORDERER_GENERAL_TLS_CERTIFICATE=/var/hyperledger/orderer/tls/server.crt
- ORDERER_GENERAL_TLS_ROOTCAS=[/var/hyperledger/orderer/tls/ca.crt]
#- ORDERER_KAFKA_TOPIC_REPLICATIONFACTOR=1
#- ORDERER_KAFKA_VERBOSE=true
- ORDERER_GENERAL_CLUSTER_CLIENTCERTIFICATE=/var/hyperledger/orderer/tls/server.crt
- ORDERER_GENERAL_CLUSTER_CLIENTPRIVATEKEY=/var/hyperledger/orderer/tls/server.key
- ORDERER_GENERAL_CLUSTER_ROOTCAS=[/var/hyperledger/orderer/tls/ca.crt]
- CORE_CHAINCODE_LOGGING_LEVEL=DEBUG
- CORE_CHAINCODE_LOGGING_SHIM=DEBUG
- ORDERER_TLS_CLIENTROOTCAS_FILES=/var/hyperledger/users/Admin@example.com/tls/ca.crt
- ORDERER_TLS_CLIENTCERT_FILE=/var/hyperledger/users/Admin@example.com/tls/client.crt
- ORDERER_TLS_CLIENTKEY_FILE=/var/hyperledger/users/Admin@example.com/tls/client.key
- GODEBUG=netdns=go
working_dir: /opt/gopath/src/github.com/hyperledger/fabric
command: orderer
volumes:
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/channel-artifacts/:/var/hyperledger/configs:ro
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/channel-artifacts/genesis.block:/var/hyperledger/orderer/orderer.genesis.block:ro
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/crypto-config/ordererOrganizations/example.com/orderers/orderer1.example.com/msp:/var/hyperledger/orderer/msp:ro
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/crypto-config/ordererOrganizations/example.com/orderers/orderer1.example.com/tls/:/var/hyperledger/orderer/tls:ro
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/crypto-config/ordererOrganizations/example.com/users:/var/hyperledger/users:ro
- /home/delta/GoWorkspace/src/github.com/testchain/backup_blockchain/orderer1/production/orderer:/var/hyperledger/production/orderer
ports:
- published: 7050
target: 7050
# mode: host
#- 7050:7050
- published: 4443
target: 4443
# mode: host
networks:
testchain-network:
aliases:
- orderer1.example.com
docker-peer0-org1.yaml
version: '3.7'
volumes:
peer0.org1.example.com:
networks:
testchain-network:
external: true
name: testchain-network
services:
org1peer0couchdb:
hostname: couchdb.peer0.org1.example.com
image: hyperledger/fabric-couchdb:0.4.18
user: "${UID}:${GID}"
environment:
- COUCHDB_USER=couchdb
- COUCHDB_PASSWORD=couchdb123
deploy:
mode: replicated
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
placement:
constraints:
- node.hostname == gcloud1
ports:
- published: 5984
target: 5984
# mode: host
networks:
testchain-network:
aliases:
- couchdb.peer0.org1.example.com
org1peer0:
hostname: peer0.org1.example.com
image: hyperledger/fabric-peer:1.4.4
user: "${UID}:${GID}"
environment:
- CORE_VM_ENDPOINT=unix:///host/var/run/docker.sock
# the following setting starts chaincode containers on the same
# bridge network as the peers
# https://docs.docker.com/compose/networking/
- CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=testchain-network
- CORE_VM_DOCKER_ATTACHSTDOUT=true
- CORE_PEER_ID=peer0.org1.example.com
- CORE_PEER_ADDRESS=peer0.org1.example.com:7051
- CORE_PEER_LISTENADDRESS=0.0.0.0:7051
- CORE_PEER_CHAINCODEADDRESS=peer0.org1.example.com:7052
- CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:7052
- CORE_CHAINCODE_BUILDER=hyperledger/fabric-ccenv:1.4.4
- CORE_CHAINCODE_GOLANG_RUNTIME=hyperledger/fabric-baseos:0.4.18
- CORE_PEER_GOSSIP_BOOTSTRAP=peer1.org1.example.com:8051
- CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org1.example.com:7051
- CORE_PEER_LOCALMSPID=Org1MSP
- FABRIC_LOGGING_SPEC=info
- CORE_PEER_TLS_ENABLED=true
- CORE_PEER_GOSSIP_USELEADERELECTION=true
- CORE_PEER_ADDRESSAUTODETECT=true
- CORE_PEER_GOSSIP_ORGLEADER=false
- CORE_PEER_PROFILE_ENABLED=true
- CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/fabric/tls/server.crt
- CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/fabric/tls/server.key
- CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/fabric/tls/ca.crt
- CORE_CHAINCODE_LOGGING_LEVEL=DEBUG
- CORE_CHAINCODE_LOGGING_SHIM=DEBUG
- CORE_LOGGING_CAUTHDSL=warning
- CORE_LOGGING_GOSSIP=warning
- CORE_LOGGING_LEDGER=info
- CORE_LOGGING_MSP=warning
- CORE_LOGGING_POLICIES=warning
- CORE_LOGGING_GRPC=DEBUG
- CORE_OPERATIONS_LISTENADDRESS=0.0.0.0:7443
# Client certs
- CORE_PEER_TLS_CLIENTROOTCAS_FILES=/var/hyperledger/users/Admin@org1.example.com/tls/ca.crt
- CORE_PEER_TLS_CLIENTCERT_FILE=/var/hyperledger/users/Admin@org1.example.com/tls/client.crt
- CORE_PEER_TLS_CLIENTKEY_FILE=/var/hyperledger/users/Admin@org1.example.com/tls/client.key
# CouchDB
- CORE_LEDGER_STATE_STATEDATABASE=CouchDB
- CORE_LEDGER_STATE_COUCHDBCONFIG_USERNAME=couchdb
- CORE_LEDGER_STATE_COUCHDBCONFIG_PASSWORD=couchdb123
- CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS=couchdb.peer0.org1.example.com:5984
- GODEBUG=netdns=go
working_dir: /opt/gopath/src/github.com/hyperledger/fabric/peer
command: peer node start
volumes:
- /var/run/:/host/var/run/:rw
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/msp:/etc/hyperledger/fabric/msp:ro
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls:/etc/hyperledger/fabric/tls:ro
- /home/delta/GoWorkspace/src/github.com/testchain/bchain_network/crypto-config/peerOrganizations/org1.example.com/users:/var/hyperledger/users:ro
- /home/delta/GoWorkspace/src/github.com/testchain/backup_blockchain/peer0org1/production:/var/hyperledger/production
#- ../chaincode/:/opt/gopath/src/github.com/chaincode
deploy:
mode: replicated
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
placement:
constraints:
- node.hostname == gcloud1
ports:
- published: 7051
target: 7051
# mode: host
- published: 7052
target: 7052
# mode: host
- published: 7443
target: 7443
# mode: host
networks:
testchain-network:
aliases:
- peer0.org1.example.com
Instead of the host path mount above, i.e.
- /home/delta/GoWorkspace/src/github.com/testchain/backup_blockchain/orderer1/production/orderer:/var/hyperledger/production/orderer
please create a docker volume for each entity (orderer1, orderer2, etc.), copy all the data into the volume, and map the volume instead of the host path. A rough sketch follows the command list below.
Usage: docker volume COMMAND
Manage volumes
Commands:
create Create a volume
inspect Display detailed information on one or more volumes
ls List volumes
prune Remove all unused local volumes
rm Remove one or more volumes
Run 'docker volume COMMAND --help' for more information on a command.
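A rough sketch of that migration for one node, using the backup paths from the question (repeat per entity, and declare the volume as external in the stack file so docker stack deploy reuses it):
# create a named volume and copy the restored orderer data into it
docker volume create orderer1_production
docker run --rm \
  -v /home/delta/GoWorkspace/src/github.com/testchain/backup_blockchain/orderer1/production/orderer:/from:ro \
  -v orderer1_production:/to \
  alpine sh -c 'cp -a /from/. /to/'
# then in docker-orderer1.yaml replace the host path mount with:
#   - orderer1_production:/var/hyperledger/production/orderer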
This looks like a permission issue; also check resource usage such as CPU and RAM.

Netdata in a docker swarm environment

I'm quite new to Netdata and also to Docker Swarm. I ran Netdata for a while on single hosts, but now I'm trying to stream Netdata from the workers to a manager node in a swarm environment, where the manager also acts as a central Netdata instance. I'm aiming to only monitor the data from the manager.
Here's my compose file for the stack:
version: '3.2'
services:
netdata-client:
image: titpetric/netdata
hostname: "{{.Node.Hostname}}"
cap_add:
- SYS_PTRACE
security_opt:
- apparmor:unconfined
environment:
- NETDATA_STREAM_DESTINATION=control:19999
- NETDATA_STREAM_API_KEY=1x214ch15h3at1289y
- PGID=999
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
networks:
- netdata
deploy:
mode: global
placement:
constraints: [node.role == worker]
netdata-central:
image: titpetric/netdata
hostname: control
cap_add:
- SYS_PTRACE
security_opt:
- apparmor:unconfined
environment:
- NETDATA_API_KEY_ENABLE_1x214ch15h3at1289y=1
ports:
- '19999:19999'
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
networks:
- netdata
deploy:
mode: replicated
replicas: 1
placement:
constraints: [node.role == manager]
networks:
netdata:
driver: overlay
attachable: true
Netdata on the manager works fine, and the client container runs on the one worker node I'm testing on. According to the log output it seems to run well and gathers the names of the running docker containers, as it does in a local environment.
The problem is that it can't connect to the netdata-central service running on the manager.
This is the error message:
2019-01-04 08:35:28: netdata INFO : STREAM_SENDER[7] : STREAM 7 [send to control:19999]: connecting...,
2019-01-04 08:35:28: netdata ERROR : STREAM_SENDER[7] : Cannot resolve host 'control', port '19999': Name or service not known,
I'm not sure why it can't resolve the hostname; I thought it should work that way on the overlay network. Maybe there's a better way to connect and not rely on the hostname?
Any help is appreciated.
EDIT: as this question might come up - the firewall (ufw) on the control host is inactive. Also, I think the error message clearly points to a problem with name resolution.
Your API key is in the wrong format; it has to be a GUID. You can generate one with the uuidgen command, see:
https://github.com/netdata/netdata/blob/63c96aa96f96f3aea10bdcd2ecd92c889f26b3af/conf.d/stream.conf#L7
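For example (the GUID below is only a placeholder; it has to match on both the worker and the manager side, following the environment variable pattern used in the question):
uuidgen
# e.g. 2f9a6c0e-1d07-4a3b-9a44-0f6a0f3c2a7d
# worker:  NETDATA_STREAM_API_KEY=2f9a6c0e-1d07-4a3b-9a44-0f6a0f3c2a7d
# manager: NETDATA_API_KEY_ENABLE_2f9a6c0e-1d07-4a3b-9a44-0f6a0f3c2a7d=1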
In the latest image the environment variables do not work.
The solution is to create a configuration file for the stream.
My working compose file is:
version: '3.7'
configs:
netdata_stream_master:
file: $PWD/stream-master.conf
netdata_stream_client:
file: $PWD/stream-client.conf
services:
netdata-client:
image: netdata/netdata:v1.21.1
hostname: "{{.Node.Hostname}}"
depends_on:
- netdata-central
configs:
-
mode: 444
source: netdata_stream_client
target: /etc/netdata/stream.conf
security_opt:
- apparmor:unconfined
environment:
- PGID=999
volumes:
- /proc:/host/proc:ro
- /etc/passwd:/host/etc/passwd:ro
- /etc/group:/host/etc/group:ro
- /sys:/host/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
deploy:
mode: global
netdata-central:
image: netdata/netdata:v1.21.1
hostname: control
configs:
-
mode: 444
source: netdata_stream_master
target: /etc/netdata/stream.conf
security_opt:
- apparmor:unconfined
environment:
- PGID=999
ports:
- '19999:19999'
volumes:
- /etc/passwd:/host/etc/passwd:ro
- /etc/group:/host/etc/group:ro
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
deploy:
mode: replicated
replicas: 1
placement:
constraints: [node.role == manager]
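The stream-master.conf and stream-client.conf referenced above are not shown here; a minimal sketch of what they typically contain (the GUID is a placeholder and must be the same on both sides):
# stream-client.conf (worker side)
[stream]
    enabled = yes
    destination = control:19999
    api key = 2f9a6c0e-1d07-4a3b-9a44-0f6a0f3c2a7d
# stream-master.conf (manager side)
[2f9a6c0e-1d07-4a3b-9a44-0f6a0f3c2a7d]
    enabled = yes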
