Elasticsearch, Logstash and Kibana (ELK) stack docker-compose on EC2 failed status checks - docker

I'm running an ELK stack on a T4g.medium box (ARM, 4GB RAM) on AWS. When using the official Kibana image I see weird behaviour where, after approximately 4 hours running, the CPU spikes (50-60%) and the EC2 box becomes unreachable until restarted. 1 out of 2 status checks also fails. Once restarted it runs for another 4 or so hours, then the same happens again. The instance is not under heavy load, and it goes down in the middle of the night when there is no load. I'm 99.9% sure it's Kibana causing the issue, as gagara/kibana-oss-arm64:7.6.2 has run for months without issue. It's not an ARM issue, or Kibana 7.13 either, as I've encountered the same with x86 on older versions of Kibana. My config is:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.0
    configs:
      - source: elastic_config
        target: /usr/share/elasticsearch/config/elasticsearch.yml
    environment:
      ES_JAVA_OPTS: "-Xmx2g -Xms2g"
    networks:
      - internal
    volumes:
      - /mnt/data/elasticsearch:/usr/share/elasticsearch/data
    deploy:
      mode: replicated
      replicas: 1
  logstash:
    image: docker.elastic.co/logstash/logstash:7.13.0
    ports:
      - "5044:5044"
      - "9600:9600"
    configs:
      - source: logstash_config
        target: /usr/share/logstash/config/logstash.yml
      - source: logstash_pipeline
        target: /usr/share/logstash/pipeline/logstash.conf
    environment:
      LS_JAVA_OPTS: "-Xmx1g -Xms1g"
    networks:
      - internal
    deploy:
      mode: replicated
      replicas: 1
  kibana:
    image: docker.elastic.co/kibana/kibana:7.13.0
    configs:
      - source: kibana_config
        target: /usr/share/kibana/config/kibana.yml
    environment:
      NODE_OPTIONS: "--max-old-space-size=300"
    networks:
      - internal
    deploy:
      mode: replicated
      replicas: 1
      labels:
        - "traefik.enable=true"
  load-balancer:
    image: traefik:v2.2.8
    ports:
      - 5601:443
    configs:
      - source: traefik_config
        target: /etc/traefik/traefik.toml
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
    networks:
      - internal
configs:
  elastic_config:
    file: ./config/elasticsearch.yml
  logstash_config:
    file: ./config/logstash/logstash.yml
  logstash_pipeline:
    file: ./config/logstash/pipeline/pipeline.conf
  kibana_config:
    file: ./config/kibana.yml
  traefik_config:
    file: ./config/traefik.toml
networks:
  internal:
    driver: overlay
And I've disabled a pile of stuff in kibana.yml to see if that helped:
server.name: kibana
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://elasticsearch:9200"]
xpack.monitoring.ui.enabled: false
xpack.graph.enabled: false
xpack.infra.enabled: false
xpack.canvas.enabled: false
xpack.ml.enabled: false
xpack.uptime.enabled: false
xpack.maps.enabled: false
xpack.apm.enabled: false
timelion.enabled: false
Has anyone encountered similar problems with a single node ELK stack running on Docker?
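For what it's worth, the compose file above asks a 4GB box for a 2GB Elasticsearch heap plus a 1GB Logstash heap plus Kibana, and JVM processes use noticeably more than their -Xmx once off-heap memory is counted. An instance becoming unreachable with a failed status check is consistent with memory exhaustion rather than a Kibana bug. A hedged sketch of shrinking the heap and capping each service so the host keeps headroom (the limit values are illustrative, not tuned):

```yaml
services:
  elasticsearch:
    environment:
      ES_JAVA_OPTS: "-Xmx1g -Xms1g"  # a 2g heap leaves little room on a 4GB host
    deploy:
      resources:
        limits:
          memory: 2g      # heap + off-heap ceiling for the container
  logstash:
    deploy:
      resources:
        limits:
          memory: 1536M
  kibana:
    deploy:
      resources:
        limits:
          memory: 512M    # Node old-space is already capped at 300MB above
```

With hard limits in place, an out-of-memory service gets OOM-killed and restarted by swarm instead of taking the whole instance down with it.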

Related

Corruption of Portainer's DB

I have a deployment of Portainer 2.14.2 and Docker Engine 20.10.7.
It has been functional for quite a few months. Today I had some problems as the Portainer container (the one that is in charge of the UI, not the agent) was restarting. In one of those restarts, for an unknown reason, the database has been corrupted.
Logs:
time="2022-10-19T10:59:15Z" level=info msg="Encryption key file `portainer` not present"
time="2022-10-19T10:59:15Z" level=info msg="Proceeding without encryption key"
time="2022-10-19T10:59:15Z" level=info msg="Loading PortainerDB: portainer.db"
panic: page 8 already freed
goroutine 35 [running]:
go.etcd.io/bbolt.(*freelist).free(0xc000728600, 0xb175, 0x7f104c311000)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/freelist.go:175 +0x2c8
go.etcd.io/bbolt.(*node).spill(0xc000152070)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/node.go:359 +0x216
go.etcd.io/bbolt.(*node).spill(0xc000152000)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/node.go:346 +0xaa
go.etcd.io/bbolt.(*Bucket).spill(0xc00013e018)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/bucket.go:570 +0x33f
go.etcd.io/bbolt.(*Tx).Commit(0xc00013e000)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:160 +0xe7
go.etcd.io/bbolt.(*DB).Update(0xc0001f1000?, 0xc000134ef8)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:748 +0xe5
go.etcd.io/bbolt.(*batch).run(0xc00031c000)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:856 +0x126
sync.(*Once).doSlow(0x0?, 0x1?)
        /opt/hostedtoolcache/go/1.18.3/x64/src/sync/once.go:68 +0xc2
sync.(*Once).Do(...)
        /opt/hostedtoolcache/go/1.18.3/x64/src/sync/once.go:59
go.etcd.io/bbolt.(*batch).trigger(0xc000321a00?)
        /tmp/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:838 +0x45
created by time.goFunc
        /opt/hostedtoolcache/go/1.18.3/x64/src/time/sleep.go:176 +0x32
My hypothesis is that in one of those restarts, the container might have been stopped in the middle of a writing procedure (although I am not 100% sure).
This is the first time this has happened to me, so I don't know how to recover from this state without deploying a new Portainer stack or erasing the whole database, which would be a really drastic solution.
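If the corruption really was caused by the container being killed mid-write, one mitigation worth considering (a sketch, not a guaranteed fix, and it won't repair the already-damaged database) is giving Portainer more time to shut down cleanly before Docker escalates to SIGKILL:

```yaml
services:
  portainer:
    # Docker sends SIGTERM, waits this long, then sends SIGKILL.
    # The default is 10s; a longer grace period gives the embedded
    # bbolt database time to finish an in-flight transaction.
    stop_grace_period: 1m
```

For the existing corrupted portainer.db, back up the file before attempting any recovery so a failed attempt doesn't make things worse.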
If it helps this is the docker-compose:
version: "3.8"
networks:
  net:
    external: true
services:
  agent:
    image: portainer/agent:2.14.2-alpine
    environment:
      AGENT_CLUSTER_ADDR: tasks.agent
      AGENT_PORT: 9001
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    networks:
      - net
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
  portainer:
    image: portainer/portainer-ce:2.14.2-alpine
    command: -H tcp://tasks.agent:9001 --tlsskipverify --admin-password-file=/run/secrets/portainer_secret
    ports:
      - "9000:9000"
      - "8000:8000"
    volumes:
      - "/var/volumes/portainer/data:/data"
    networks:
      - net
    secrets:
      - portainer_secret
      - source: ca_cert_secret
        target: /etc/ssl/certs/localCA.pem
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.stateful == true
      labels:
        - "traefik.enable=true"
        - "traefik.passHostHeader=true"
        - "traefik.http.routers.portainer.rule=Host(`portainer`)"
        - "traefik.http.services.portainer.loadbalancer.server.port=9000"
        - "traefik.http.routers.portainer.entrypoints=web"
        - "traefik.http.routers.portainer.service=portainer"
        - "traefik.http.routers.portainer.tls=true"
        - "traefik.http.routers.portainer.entrypoints=web-secure"
secrets:
  portainer_secret:
    external: true
  ca_cert_secret:
    external: true

Docker Swarm - Elasticsearch instances cannot resolve each other on the same overlay

I am trying to create an Elasticsearch stack using Docker Swarm (I don't really need the Swarm functionality, but the .yml has already been written).
The problem I seem to be getting is that when I start up the stack, the two masters can't resolve each other (despite confirming they're on the same network),
and it may be the fault of discovery.seed_hosts providing the masters with an empty list, or it could be the Docker network not working properly.
"message":"publish_address {10.0.120.16:9300}, bound_addresses {0.0.0.0:9300}"
"message":"bound or publishing to a non-loopback address, enforcing bootstrap checks",
"message":"failed to resolve host [es-master2]"
"message":"master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes [es-master1, es-master2] to bootstrap a cluster: have discovered [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}]; discovery will continue using [] from hosts providers and [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
I'm not sure if discovery continuing from hosts [] means that something went wrong with the discovery.seed_hosts setting. I suspect it, because without the setting it would be replaced with [127.0.0.1:9300, ... etc].
Here are the relevant parts of my docker compose. It's part of a bigger file, but right now I just need the two masters to talk first.
.
.
.
networks:
  es-internal:
    driver: overlay
  es-connection:
    driver: overlay
  external:
    driver: overlay
    name: external
.
.
.
  es-master1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-master1
    environment:
      - node.name=es-master1
      - cluster.name=${CLUSTER_NAME}
      - node_role="master,ingest"
      - cluster.initial_master_nodes=es-master1,es-master2
      - discovery.seed_hosts=es-master2
      # - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
      - xpack.security.http.ssl.certificate_authorities/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    networks:
      - es-internal
    volumes:
      - ./data/es-master1:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-master1-crt
        target: /usr/share/elasticsearch/config/certs/es-master1.crt
      - source: es-master1-key
        target: /usr/share/elasticsearch/config/certs/es-master1.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    depends_on:
      - es-coordination
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
  es-master2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-master2
    environment:
      - node.name=es-master2
      - cluster.name=${CLUSTER_NAME}
      - node_role="master,ingest"
      - cluster.initial_master_nodes=es-master1,es-master2
      - discovery.seed_hosts=es-master1
      # - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
      - xpack.security.http.ssl.certificate_authorities/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    networks:
      - es-internal
    volumes:
      - ./data/es-master2:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-master2-crt
        target: /usr/share/elasticsearch/config/certs/es-master2.crt
      - source: es-master2-key
        target: /usr/share/elasticsearch/config/certs/es-master2.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    depends_on:
      - es-coordination
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
.
.
.
I've tried:
- Inspecting the network it creates to see if they are on the same network
- All sorts of ES config options
- A new swarm service meant to test out pinging between containers on the same overlay network (they could resolve the hostnames given to them with the ping command)
- Exposing ports (I know it's useless now)
If anyone has some input it would be great!
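Two details in the config above look like plain typos and are worth ruling out before digging further into networking (hedged: I can't verify they're the cause of the resolution failure, but they will cause problems regardless). `node_role` is not a recognized Elasticsearch setting name (8.x uses `node.roles`), and one of the `certificate_authorities` lines is missing its `=`, so it is parsed as an environment variable name with no value. A corrected sketch for es-master1 (the same fixes apply to es-master2):

```yaml
  es-master1:
    environment:
      - node.name=es-master1
      # the 8.x setting is node.roles; "node_role" is not applied as a setting
      - node.roles=master,ingest
      # note the '=' that was missing in the original line
      - xpack.security.http.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
```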

Unable to read input logs filebeat

I am fairly new to Docker and I am trying out the ELK setup with Filebeat. I have a container for Filebeat set up on machine 1 and I am trying to collect the logs from /mnt/logs/temp.log (which are non-container logs) into the ELK containers on machine 2. Here's my Filebeat configuration:
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /mnt/logs/temp.log
processors:
  - add_cloud_metadata: ~
output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS:42.23.12.131:9042}'
Even if I change the filebeat.yml config to the below, it doesn't seem to send any logs to ES:
filebeat.inputs:
  - type: log
    paths:
      - /mnt/logs/temp.log
output.elasticsearch:
  hosts: ["42.23.12.131:9042"]
Can someone please help me out or point me to any site articles or documentation regarding this? Version of filebeat and ELK container is 7.14.0.
Edit: The docker-compose file for ELK is:
version: '2.2'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    volumes:
      - type: bind
        source: ./elasticsearch/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: elasticsearch
        target: /usr/share/elasticsearch/data
    environment:
      ES_JAVA_OPTS: "-Xmx512m -Xms512m"
      discovery.type: single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elk
  logstash:
    image: docker.elastic.co/logstash/logstash:7.14.0
    volumes:
      - type: bind
        source: ./logstash/config/logstash.yml
        target: /usr/share/logstash/config/logstash.yml
        read_only: true
      - type: bind
        source: ./logstash/pipeline.conf
        target: /usr/share/logstash/pipeline.conf
        read_only: true
    ports:
      - "5044:5044/udp"
      - "9600:9600"
    environment:
      LS_JAVA_OPTS: "-Xmx512m -Xms512m"
    networks:
      - elk
    depends_on:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:7.14.0
    volumes:
      - type: bind
        source: ./kibana/kibana.yml
        target: /usr/share/kibana/config/kibana.yml
        read_only: true
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      - elasticsearch
networks:
  elk:
    driver: bridge
volumes:
  elasticsearch:
In your docker-compose file, only these ports are exposed outside the containers (assuming port 9042 is the one you have configured on the Elasticsearch side):
ports:
  - "9200:9200"
  - "9300:9300"
So if you add the targeted port 9042, it should work. It would look like this:
ports:
  - "9200:9200"
  - "9300:9300"
  - "9042:9042"
If 9042 is not the port you have configured in Elasticsearch, that means you have to change the configuration of your Filebeat agent to use the correct port (probably 9200).
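Separately from the port mapping, a containerized Filebeat can only read /mnt/logs/temp.log if that host path is mounted into its container, which the question doesn't show. A sketch of the Filebeat service on machine 1 (the service name and the exact mounts are assumptions, not taken from the original setup):

```yaml
  filebeat:
    image: docker.elastic.co/beats/filebeat:7.14.0
    volumes:
      # make the host log file visible inside the container, read-only
      - /mnt/logs:/mnt/logs:ro
      # only needed if the docker autodiscover provider is used
      - /var/run/docker.sock:/var/run/docker.sock:ro
```

Without the /mnt/logs mount, the `filebeat.inputs` path points at a file that simply doesn't exist inside the container, and no logs are shipped.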

how to setup kibana user credentials with docker elk stack

How do I set up login credentials for the Kibana GUI with docker ELK stack containers?
What arguments and environment variables must be passed in the docker-compose.yaml file to get this working?
To set Kibana user credentials for the docker ELK stack, we have to set xpack.security.enabled: true, either in elasticsearch.yml or passed as an environment variable in the docker-compose.yml file.
Pass the username & password as environment variables in docker-compose.yml like below:
version: '3.3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1
    ports:
      - "9200:9200"
      - "9300:9300"
    configs:
      - source: elastic_config
        target: /usr/share/elasticsearch/config/elasticsearch.yml
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
      ELASTIC_USERNAME: "elastic"
      ELASTIC_PASSWORD: "MyPw123"
      http.cors.enabled: "true"
      http.cors.allow-origin: "*"
      xpack.security.enabled: "true"
    networks:
      - elk
    deploy:
      mode: replicated
      replicas: 1
  logstash:
    image: docker.elastic.co/logstash/logstash:6.6.1
    ports:
      - "5044:5044"
      - "9600:9600"
    configs:
      - source: logstash_config
        target: /usr/share/logstash/config/logstash.yml:rw
      - source: logstash_pipeline
        target: /usr/share/logstash/pipeline/logstash.conf
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
      xpack.monitoring.elasticsearch.url: "elasticsearch:9200"
      xpack.monitoring.elasticsearch.username: "elastic"
      xpack.monitoring.elasticsearch.password: "MyPw123"
    networks:
      - elk
    deploy:
      mode: replicated
      replicas: 1
  kibana:
    image: docker.elastic.co/kibana/kibana:6.6.1
    ports:
      - "5601:5601"
    configs:
      - source: kibana_config
        target: /usr/share/kibana/config/kibana.yml
    networks:
      - elk
    deploy:
      mode: replicated
      replicas: 1
configs:
  elastic_config:
    file: ./elasticsearch/config/elasticsearch.yml
  logstash_config:
    file: ./logstash/config/logstash.yml
  logstash_pipeline:
    file: ./logstash/pipeline/logstash.conf
  kibana_config:
    file: ./kibana/config/kibana.yml
networks:
  elk:
    driver: overlay
Then add the following lines to kibana.yml:
elasticsearch.username: "elastic"
elasticsearch.password: "MyPw123"
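Rather than hard-coding the password in docker-compose.yml, the official Elasticsearch image also accepts a `_FILE` variant of the variable that reads the value from a mounted Docker secret. A sketch (verify that your image version supports `ELASTIC_PASSWORD_FILE`, and the secret file path is an assumption):

```yaml
services:
  elasticsearch:
    environment:
      # the image reads the password from this file instead of the env value
      ELASTIC_PASSWORD_FILE: /run/secrets/elastic_pw
    secrets:
      - elastic_pw
secrets:
  elastic_pw:
    file: ./secrets/elastic_pw.txt
```

This keeps the password out of the compose file and out of `docker inspect` output.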
I did not manage to get it working without adding the XPACK_MONITORING & SECURITY flags to Kibana's container, and there was no need for a config file.
However, I was not able to use the kibana user, even after logging in with the elastic user and changing kibana's password through the UI.
NOTE: it looks like you can't set up default built-in users other than the elastic superuser in docker-compose through its environment. I've tried several times with kibana and kibana_system, with no success.
version: "3.7"
services:
  elasticsearch:
    image: elasticsearch:7.4.0
    restart: always
    ports:
      - 9200:9200
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=123456
  kibana:
    image: kibana:7.4.0
    restart: always
    ports:
      - 5601:5601
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - XPACK_MONITORING_ENABLED=true
      - XPACK_MONITORING_COLLECTION_ENABLED=true
      - XPACK_SECURITY_ENABLED=true
      - ELASTICSEARCH_USERNAME=elastic
      - ELASTICSEARCH_PASSWORD="123456"
    depends_on:
      - elasticsearch
NOTE: it looks like this won't work with 8.5.3; Kibana won't accept the elastic superuser.
Update
I was able to set up 8.5.3, but with a couple of twists. I would build the whole environment, then run the password setup in the Elastic container:
bin/elasticsearch-setup-passwords auto
Grab the auto-generated password for the kibana_system user, replace it in docker-compose, then restart only Kibana's container.
Kibana 8.5.3 with environment variables:
  kibana:
    image: kibana:8.5.3
    restart: always
    ports:
      - 5601:5601
    environment:
      - ELASTICSEARCH_USERNAME="kibana_system"
      - ELASTICSEARCH_PASSWORD="sVUurmsWYEwnliUxp3pX"
Restart kibana's container:
docker-compose up -d --build --force-recreate --no-deps kibana
NOTE: make sure to use the --no-deps flag, otherwise it will also recreate the Elastic container as a dependency of Kibana's.

docker swarm: how to publish a service only on a specific node that runs a task

I created a docker-compose.yml file containing two services that run on two different nodes. The two services are meant to communicate on the same port as client and server. Below is my docker-compose.yml file.
version: "3"
services:
  service1:
    image: localrepo/image1
    deploy:
      placement:
        constraints: [node.hostname == node1]
      replicas: 1
      resources:
        limits:
          cpus: "1"
          memory: 1000M
      restart_policy:
        condition: on-failure
    ports:
      - 8000:8000
    networks:
      - webnet
  service2:
    image: localrepo/image2
    deploy:
      placement:
        constraints: [node.hostname == node2]
      replicas: 1
      resources:
        limits:
          cpus: "1"
          memory: 500M
      restart_policy:
        condition: on-failure
    ports:
      - "8000:8000"
    networks:
      - webnet
networks:
  webnet:
When I issue docker stack deploy -c, I get an error reading:
> Error response from daemon: rpc error: code = 3 desc = port '8000' is already in use by service.
In this thread I read that deploying a service in a swarm makes the port accessible on every node in the swarm. If I understand correctly, that makes the port occupied on every node in the cluster. In the same thread, it was suggested to use mode=host publishing, which only exposes the port on the actual host that runs the container. I applied that to the ports as:
ports:
  - "mode=host, target=8000, published=8000"
Making that change in both services and trying to issue docker stack deploy gives another error:
> 1 error(s) decoding:
* Invalid containerPort: mode=host, target=8000, published=8000
Does anyone know how to fix this issue?
p.s.: I tried both version "3" and version "3.2", but the issue wasn't solved.
I don't know how you specified host mode, since your docker-compose.yml doesn't show host mode anywhere.
Incidentally, try the long syntax, which can specify host mode in the docker-compose.yml file.
The long syntax is new in v3.2, and below is an example (I checked that it works).
(This is subject to the compatibility of your Docker Engine version with the docker-compose syntax version.)
version: '3.4'  # version: '3.2' also works
networks:
  swarm_network:
    driver: overlay
services:
  service1:
    image: asleea/test
    command: ["nc", "-vlkp", "8000"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == node1
    ports:
      - published: 8000
        target: 8000
        mode: host
    networks:
      swarm_network:
  service2:
    image: asleea/test
    command: ["nc", "service1", "8000"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == node2
    ports:
      - published: 8000
        target: 8000
        mode: host
    networks:
      swarm_network:
The problem was fixed after upgrading to the latest Docker version, 18.01.0-ce.