Elasticsearch - Bootstrap checks failing max virtual memory areas error - docker

Hello, I want to install ELK on Docker, so I followed the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
To start Elasticsearch in Docker and get the generated password for the elastic user and the enrollment token for enrolling Kibana, I execute this command:
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.1.2
I get this error:
ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
{"@timestamp":"2022-04-14T12:39:58.449Z", "log.level": "INFO", "message":"stopping ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"Thread-2","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"50af9edc5c7d","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2022-04-14T12:39:58.512Z", "log.level": "INFO", "message":"stopped", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"Thread-2","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"50af9edc5c7d","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2022-04-14T12:39:58.513Z", "log.level": "INFO", "message":"closing ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"Thread-2","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"50af9edc5c7d","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2022-04-14T12:39:58.531Z", "log.level": "INFO", "message":"closed", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"Thread-2","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"50af9edc5c7d","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2022-04-14T12:39:58.535Z", "log.level": "INFO", "message":"Native controller process has stopped - no new native processes can be started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.NativeController","elasticsearch.node.name":"50af9edc5c7d","elasticsearch.cluster.name":"docker-cluster"}

I solved this problem with these commands:
For Windows and macOS with Docker Desktop:
docker-machine ssh
sudo sysctl -w vm.max_map_count=262144
For Windows with Docker Desktop WSL:
wsl -d docker-desktop
sysctl -w vm.max_map_count=262144
And finally, I reinitialized the Docker containers (see the documentation linked above).

I resolved this problem by adding the setting to /etc/sysctl.conf and verifying it with:
grep vm.max_map_count /etc/sysctl.conf
vm.max_map_count=262144
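On a plain Linux host, a minimal sketch of applying the setting both immediately and persistently (assuming root access via sudo):

# apply to the running kernel right away
sudo sysctl -w vm.max_map_count=262144
# persist the setting across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
# reload and verify
sudo sysctl -p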

Related

SSL (curl) connection error in Elasticsearch setup

I have set up a 3-node Elasticsearch cluster using docker-compose, following the steps listed below.
One of the master nodes, es11, gets the error below; the same curl command works fine on the other two nodes, es12 and es13:
Error:
curl -X GET 'https://localhost:9316'
curl: (35) Encountered end of file
The logs show this error:
"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [es13][SOMEIP:9316][internal:cluster/coordination/join]",
"Caused by: org.elasticsearch.transport.ConnectTransportException: [es11][SOMEIP:9316] handshake failed. unexpected remote node {es13}{SOMEVALUE}{SOMEVALUE
"at org.elasticsearch.transport.TransportService.lambda$connectionValidator$6(TransportService.java:468) ~[elasticsearch-7.17.6.jar:7.17.6]",
"at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:95) ~[elasticsearch-7.17.6.jar:7.17.6]",
"at org.elasticsearch.transport.TransportService.lambda$handshake$9(TransportService.java:577) ~[elasticsearch-7.17.6.jar:7.17.6]",
Opening https://localhost:9316 in a browser gives a "site can't be reached" error as well. It seems the SSL certificate created in step 4 below has some issue on es11.
Any leads, please? Or, if I repeat step 4, do I need to copy the certs again to es12 and es13?
Below is elasticsearch.yml:
cluster.name: "docker-cluster"
network.host: 0.0.0.0
Ports as defined in all three nodes' docker-compose.yml:
environment:
  - node.name=es11
  - transport.port=9316
ports:
  - 9216:9200
  - 9316:9316
1. Initialize a docker swarm. On es11 run docker swarm init, then follow the instructions to join es12 and es13 to the swarm.
2. Create an overlay network: docker network create -d overlay --attachable elastic
3. If necessary, bring down the current cluster and remove all the associated volumes by running docker-compose down -v
4. Create SSL certificates for ES with docker-compose -f create-certs.yml run --rm create_certs
5. Copy the certs for es12 and es13 to the respective servers (see the sketch after this list)
6. Use busybox to create the overlay network on es12 and es13: sudo docker run -itd --name containerX --net [network name] busybox
7. Configure certs on es12 and es13 with docker-compose -f config-certs.yml run --rm config_certs
8. Start the cluster with docker-compose up -d on each server
9. Set the passwords for the built-in ES accounts by logging into the cluster (docker exec -it es11 sh) and running bin/elasticsearch-setup-passwords interactive --url localhost:9316
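A hedged sketch of step 5 (copying the certs), assuming the certificates were generated into a local certs/ directory; user and /path/to/compose/ are placeholders for your own login and compose directory, not values from the original post:

# copy the generated certs from es11 to the other two nodes
scp -r certs/ user@es12:/path/to/compose/certs/
scp -r certs/ user@es13:/path/to/compose/certs/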
(As per your https://discuss.elastic.co thread:)
You cannot talk HTTP to the transport protocol port, which you have defined in transport.port. You need to talk to port 9200 in the container, which you have mapped to 9216 outside the container.
The transport port runs a binary protocol and is not HTTP-accessible.
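A minimal sanity check of the HTTP port from the host, assuming security is enabled and the elastic password was set in step 9 above (-k skips certificate verification, acceptable for a quick test only; curl prompts for the password):

curl -k -u elastic https://localhost:9216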

Connecting with Portainer: "resource is online but isn't responding to connection attempts"

I installed Ubuntu on an older laptop. Now Docker with Portainer is running on it, and I want to access Portainer from my main PC on the same network. When I connect to Portainer on the laptop where it is running (not via the localhost address), it works fine. But when I try to connect from my PC, I get a timeout. Windows diagnostics says: "resource is online but isn't responding to connection attempts". How can I open Portainer to my local network? Or is this a problem with Ubuntu?
First, check that you have an OpenSSH server running so you can SSH in. Disable the firewall from a terminal: sudo ufw disable. Then check whether your network card is named eth0 by running ifconfig; if it is not, change it by following the steps below.
Ubuntu uses netplan by default these days; the file is /etc/netplan/00-installer-config.yaml. But before editing it you need to get the card's serial (MAC address).
Find the target devices mac/hw address using the lshw command:
lshw -C network
You'll see some output which looks like:
root@ys:/etc# lshw -C network
*-network
description: Ethernet interface
physical id: 2
logical name: eth0
serial: dc:a6:32:e8:23:19
size: 1Gbit/s
capacity: 1Gbit/s
capabilities: ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bcmgenet driverversion=5.8.0-1015-raspi duplex=full ip=192.168.0.112 link=yes multicast=yes port=MII speed=1Gbit/s
So you take the serial:
dc:a6:32:e8:23:19
Note the set-name option in the config below; this works for a wifi section as well.
If you are using cable (wired Ethernet), you can delete everything in the file and add the example below, changing only the serial (MAC) to your own: sudo nano /etc/netplan/00-installer-config.yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
      match:
        macaddress: <YOUR MAC ID HERE>
      set-name: eth0
Then, to test this config, run:
netplan try
When you're happy with it:
netplan apply
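Assuming the rename worked, you can confirm the interface and its DHCP lease with:

ip addr show eth0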
Reboot your Ubuntu machine. After the restart:
Stop the Portainer container: sudo docker stop portainer
Remove the Portainer container: sudo docker rm portainer
Now run it again on the latest version:
docker run -d -p 8000:8000 -p 9000:9000 \
--name=portainer --restart=always \
-v /var/run/docker.sock:/var/run/docker.sock \
-v portainer_data:/data \
portainer/portainer-ce:2.13.1
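To verify reachability from another machine on the LAN, a hedged check (assuming the laptop's address is 192.168.0.112, as in the lshw output above, and the UI port 9000 from the run command):

curl -I http://192.168.0.112:9000

If this times out while the UI works locally, the firewall or the port mapping is the usual suspect.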

Not able to connect to Elasticsearch docker container running on different ports

I have an Elasticsearch 7.6.1 Docker container which I want to run on ports 9400 and 9500.
This is the docker run command I have used.
docker run -d --name elasticsearch761v2 -v /data/dump/:/usr/share/elasticsearch/data \
  -p 9400:9400 -p 9500:9500 -e "discovery.type=single-node" elasticsearch:7.6.1
This gives the output below.
docker ps -a | grep elastic
idofcontainer elasticsearch:7.6.1 "/usr/local/bin/docke" 18 minutes ago Up 4 minutes
9200/tcp, 0.0.0.0:9400->9400/tcp, 9300/tcp, 0.0.0.0:9500->9500/tcp elasticsearch761v2
I have also set the following in elasticsearch.yml.
[root@idofcontainer config]# vi elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
transport.tcp.port: 9400
I have added iptables entries for the above ports too.
The log for this container is:
{"type": "server", "timestamp": "2020-04-06T08:25:22,684Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "docker-cluster", "node.name": "4196b5b23",
"message": "Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.kibana_task_manager_1][0], [.kibana_1][0]]]).", "cluster.uuid": "gx7s8R_PTUK4lFPPGBZA", "node.id": "XfLZnNNnQnAOHJnWdDQg" }
The curl output is this:
curl http://eserver:9400/_cat
This is not an HTTP port
Because of this, my Kibana is also not able to reach the ES server.
I have set kibana.yml to point to the above port.
kibana.yml:
# Default Kibana configuration for docker target
server.name: kibana
server.host: "0"
elasticsearch.hosts: ["http://eserver:9400/"]
xpack.monitoring.ui.container.elasticsearch.enabled: true
The log of this kibana container.
{"type":"log","#timestamp":"2020-04-06T08:49:14Z","tags":["warning","elasticsearch","admin"],"pid":7,"message":"Unable to revive connection: http://eserver:9400/"}
{"type":"log","#timestamp":"2020-04-06T08:49:14Z","tags":["warning","elasticsearch","admin"],"pid":7,"message":"No living connections"}
You have defined a custom transport port (9400) and are using it as the HTTP port in your curl command to check the Elastic server, which is exactly what the error message is pointing at:
This is not an HTTP port
As you mentioned, you want to run your Elastic on 9400 and 9500; then you need to properly bind the default HTTP port 9200 to 9500, using the command below.
docker run -d --name elasticsearch761v2 -v /data/dump/:/usr/share/elasticsearch/data \
  -p 9400:9400 -p 9500:9200 -e "discovery.type=single-node" elasticsearch:7.6.1
Note that the only change required is -p 9500:9200; after that, you can check your ES server using curl http://eserver:9500, i.e. over the HTTP port.
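A quick sanity check over the remapped HTTP port, assuming the container is up and eserver resolves to the Docker host:

curl http://eserver:9500              # root endpoint returns cluster name and version info
curl http://eserver:9500/_cat/health  # a single-node cluster should report green or yellow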

Flink cannot be run in Marathon

I have three physical nodes with Docker installed on them. I have one Docker container with Mesos, Marathon, Hadoop, and Flink. I configured the master node and slave nodes for Mesos, Zookeeper, and Marathon. I did these steps one by one.
First, on the master node, I enter the Docker container with this command:
docker run -v /home/user/.ssh:/root/.ssh --privileged -p 5050:5050 -p 5051:5051 -p 5052:5052 -p 2181:2181 -p 8082:8081 -p 6123:6123 -p 8080:8080 -p 50090:50090 -p 50070:50070 -p 9000:9000 -p 2888:2888 -p 3888:3888 -p 4041:4040 -p 7077:7077 -p 52222:22 -e WEAVE_CIDR=10.32.0.2/12 -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins -e LIBPROCESS_IP=10.32.0.2 -e MESOS_RESOURCES=ports*:[11000-11999] -ti hadoop_marathon_mesos_flink_2 /bin/bash
Then I run Zookeeper and Mesos:
/home/zookeeper-3.4.14/bin/zkServer.sh restart
/home/mesos-1.7.2/build/bin/mesos-master.sh --ip=10.32.0.1 --hostname=10.32.0.1 --roles=marathon,flink --quorum=1 --work_dir=/var/run/mesos --log_dir=/var/log/mesos
After that, I run Marathon in the same container:
/home/marathon-1.7.189-48bfd6000/bin/marathon --master 10.32.0.1:5050 --zk zk://10.32.0.1:2181/marathon --hostname 10.32.0.1 --webui_url 10.32.0.1:8080 --logging_level debug
And finally, I run Hadoop:
/opt/hadoop/sbin/start-dfs.sh
Marathon, Mesos, and Hadoop run without any problems.
The most important part of my work is running Flink on Marathon. I configured Flink in the Docker container like this:
env.java.home: /opt/java
jobmanager.rpc.address: 10.32.0.1
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: 10.32.0.1:2181,10.32.0.2:2181
recovery.zookeeper.path.mesos-workers: /mesos-workers
In the Marathon UI, I create an application and put this JSON file in it, but it fails.
{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Dmesos.master=10.32.0.1:5050,10.32.0.2:5050 -Dmesos.initial-tasks=1",
  "cpus": 1.0,
  "mem": 1024
}
The Flink application fails in the Mesos UI. It shows this error:
I0428 06:01:39.586699 6155 exec.cpp:162] Version: 1.7.2
I0428 06:01:39.596458 6154 exec.cpp:236] Executor registered on agent 984595ae-e811-48fb-a9f5-ca6128e1cc1a-S0
I0428 06:01:39.598870 6157 executor.cpp:188] Received SUBSCRIBED event
I0428 06:01:39.599761 6157 executor.cpp:192] Subscribed executor on 10.32.0.3
I0428 06:01:39.599963 6157 executor.cpp:188] Received LAUNCH event
I0428 06:01:39.601236 6157 executor.cpp:697] Starting task flink.16a7cc18-697b-11e9-928f-ce235caa831e
I0428 06:01:39.613719 6157 executor.cpp:712] Forked command at 6163
I0428 06:01:39.787395 6157 executor.cpp:1013] Command exited with status 1 (pid: 6163)
I0428 06:01:40.791885 6162 process.cpp:927] Stopped the socket accept loop
The strange thing is that on stdout I see the following text, even though I set JAVA_HOME in /etc/environment and in flink-conf.yaml:
Please specify JAVA_HOME. Either in Flink config ./conf/flink-conf.yaml or as system-wide JAVA_HOME.
Would you please tell me what I should do for that problem?
Many Thanks.
You can check your Flink log on the slave node. Also, it is better to change your JSON file as below; it helps you follow your application.
{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024,
  "fetch": [
    {
      "uri": "/home/flink-1.7.0/bin/mesos-appmaster.sh",
      "executable": true
    }
  ]
}
Also, add JAVA_HOME to flink-conf.yaml on every node, master and slaves:
env.java.home: /opt/java
With JAVA_HOME added, you will not see that error on stdout.
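If you prefer to skip the UI, a hedged alternative is to submit the same definition through Marathon's REST API (assuming the JSON is saved as flink.json and Marathon is listening on 10.32.0.1:8080 as configured above):

curl -X POST http://10.32.0.1:8080/v2/apps -H "Content-Type: application/json" -d @flink.json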
I hope it is useful.

Docker Swarm with Consul - Manager not electing primary

I'm trying to set up an HA Docker cluster on 3 dedicated PCs. I've successfully followed the instructions at docs.docker.com/engine/installation/linux/ubuntulinux, and now I'm trying to follow the instructions at https://docs.docker.com/swarm/install-manual. Since I'm not using any virtualization, I start at "Set up a consul discovery backend". The PCs (running Ubuntu Trusty 14.04 server edition) are all in the LAN 192.168.2.0/24: ubuntu001 has .104, ubuntu002 has .106, and ubuntu003 has .105.
I did the following according to the instructions:
arnolde@ubuntu001:~$ docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap
arnolde@ubuntu001:~$ docker run -d -p 4000:4000 swarm manage -H :4000 --replication --advertise 192.168.2.104:4000 consul://192.168.2.104
arnolde@ubuntu002:~# docker run -d swarm manage -H :4000 --replication --advertise 192.168.2.106:4000 consul://192.168.2.104:8500
arnolde@ubuntu003:~$ docker run -d swarm join --advertise=192.168.2.105:2375 consul://192.168.2.104:8500
But then when trying the next step, the swarm manager does NOT show up as "Primary" like it says it should, and no primary is listed:
arnolde@ubuntu001:~$ docker -H :4000 info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: swarm/1.1.0
Role: replica
Primary:
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 0
Plugins:
Volume:
Network:
Kernel Version: 3.19.0-25-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
And:
arnolde@ubuntu001:~$ docker -H :4000 run hello-world
docker: Error response from daemon: No elected primary cluster manager.
I searched and found https://github.com/docker/swarm/issues/1491, which recommends using dockerswarm/swarm:master instead. I did, but it didn't help:
arnolde@ubuntu001:~$ docker run -d -p 4000:4000 dockerswarm/swarm:master manage -H :4000 --replication --advertise 192.168.2.104:4000 consul://192.168.2.104
I didn't find any other input regarding swarm + consul + primary, so here I am... any suggestions? Unfortunately I'm not sure how to troubleshoot, since I don't even know where to look for logging/debugging info, i.e. whether the manager is connecting to consul successfully etc.
I was able to solve it myself after explicitly adding the port number to the consul:// parameter; apparently the docker docs are incomplete:
arnolde@ubuntu001:~$ docker run -d -p 4000:4000 dockerswarm/swarm:master manage -H :4000 --replication --advertise 192.168.2.104:4000 consul://192.168.2.104:8500
arnolde@ubuntu001:~$ docker -H :4000 info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: swarm/1.1.0
Role: replica
Primary: 192.168.2.106:4000
Also I added "-p 4000:4000" to the command on the replica manager (on ubuntu002). Not sure if that was necessary (or even a good idea).
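If election still fails, a hedged way to check that the managers can actually reach consul (addresses as above; the container id is a placeholder):

curl http://192.168.2.104:8500/v1/status/leader   # should print the consul leader's address
docker logs <swarm manager container id>          # the manage container's logs show election attempts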
My friends, the first step is to edit the Docker daemon startup configuration so it listens on the right port, among other settings. My environment is CentOS 7, so my daemon configuration is in /usr/lib/docker/.... On each node, edit: "ExecStart=/usr/bin/docker daemon -H fd:// -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-store=consul://192.168.1.102:8500 --cluster-advertise=192.168.1.103:0". The second step: "docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap".
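On systemd distributions, a cleaner way to apply those daemon flags is a drop-in override instead of editing the unit file in place. A sketch, assuming the same addresses as the answer above, advertising on :2375 rather than the :0 shown there:

sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/cluster.conf
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-store=consul://192.168.1.102:8500 --cluster-advertise=192.168.1.103:2375
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker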
