Hadoop Cluster on Docker Swarm - Datanodes Failed to Connect to Namenode

Hadoop Cluster on Docker Swarm - Datanodes Failed to Connect to Namenode - docker

I am new to Docker and trying to build a Hadoop cluster with Docker Swarm. I tried to build it with docker compose and it worked perfectly. However, I would like to add other services like Hive, Spark, HBase to it in the future so a Swarm seems a better idea.
When I tried to run it with a version 3.7 yaml file, the namenode and datanodes started successfully. But when I visited the web UI, it showed that there is no nodes available at the "Datanodes" tab (neither at the "Overview" tab). It seems the datanodes failed to connect to the namenode. I had checked the port of each node with netstat -tuplen and both 7946 and 4789 worked fine.
Here is the yaml file I used:
version: "3.7"
services:
namenode:
image: flokkr/hadoop:latest
hostname: namenode
networks:
- hbase
command: ["hdfs","namenode"]
ports:
- target: 50070
published: 50070
- target: 9870
published: 9870
environment:
- NAMENODE_INIT=hdfs dfs -chmod 777 /
- ENSURE_NAMENODE_DIR=/tmp/hadoop-hadoop/dfs/name
env_file:
- ./compose-config
deploy:
mode: replicated
replicas: 1
restart_policy:
condition: on-failure
placement:
constraints:
- node.role == manager
datanode:
image: flokkr/hadoop:latest
networks:
- hbase
command: ["hdfs","datanode"]
env_file:
- ./compose-config
deploy:
mode: global
restart_policy:
condition: on-failure
volumes:
namenode:
datanode:
networks:
hbase:
name: hbase
Basically I just update the yaml file from this repo to version 3.7 and tried to run it on GCP. And here is my repo in case you want to replicate the case.
And this is the status of ports of the manager node:
the worker node:
Thank you for your help!

It seems to be a network related issue, the pods are up an running but they are not registering on your Web GUI maybe the network communication it's not reaching between them. Check your internal firewall rules and OS firewall, run some network test on the specific ports.

Related

Docker swarm does not distribute the container in the cluster

I have a two servers to use in a Docker cluster Swarm(test only), one is a Manager and other is a Worker, but running the command docker stack deploy --compose-file docker-compose.yml teste2 all the services is run in the manager and the worker not receive the containers to run, for some reason the Swarm is not achieving distributing the services in the cluster and running all in manager server.
Will my docker-compose.yml be causing the problem or might it be a network problem?
Here are some settings:
Servers CentOs 7, Docker version 18.09.4;
I executed the commands systemctl stop firewalld && systemctl disable firewalld to disable firewall;
I executed the command docker swarm join --token ... in the worker;
Result docker node ls:
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
993dko0vu6vlxjc0pyecrjeh0 * name.server.manager Ready Active Leader 18.09.4
2fn36s94wjnu3nei75ymeuitr name.server.worker Ready Active 18.09.4
File docker-compose.yml:
version: "3"
services:
web:
image: testehello
deploy:
replicas: 5
update_config:
parallelism: 2
delay: 10s
restart_policy:
condition: on-failure
# placement:
# constraints: [node.role == worker]
ports:
- 4000:80
networks:
- webnet
visualizer:
image: dockersamples/visualizer:stable
ports:
- 8080:8080
stop_grace_period: 1m30s
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
deploy:
placement:
constraints: [node.role == manager]
networks:
webnet:
I executed the command docker stack deploy --compose-file docker-compose.yml teste2
In the docker-compose.yml I commented the parameters placement and constraints because they did not work and did not start the containers on the servers, without it the containers are started in the manager. Through the Visualizer all appear in the manager.

I think the images are not accessible from a worker node, that is why they not receive containers, try to use this guide by docker https://docs.docker.com/engine/swarm/stack-deploy/
P.S. I think you solved it already, but just in case.

Docker Swarm with springboot app

I'm currently trying to deploy an application with docker swarm in 3 virtual machines, I'm doing it through docker-compose to create the image, my files are the following:
Dockerfile:
FROM openjdk:8-jdk-alpine
WORKDIR /home
ARG JAR_FILE
ARG PORT
VOLUME /tmp
COPY ${JAR_FILE} /home/app.jar
EXPOSE ${PORT}
ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/home/app.jar"]
and my docker-compose is:
version: '3'
services:
service_colpensiones:
build:
context: ./colpensiones-servicio
dockerfile: Dockerfile
args:
JAR_FILE: ColpensionesServicio.jar
PORT: 8082
volumes:
- data:/home
ports:
- 8082:8082
volumes:
data:
I'm using the command docker-compose up -d --build to build the image, I automatically create the container which is deleted later. To use docker swarm I use the 3 machines, one manager and two worker, I have another file to deploy the service with 3 replicas
version: '3'
services:
service_colpensiones:
image: deploy_lyra_colpensiones_service_colpensiones
deploy:
replicas: 5
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
volumes:
- data:/home
ports:
- 8082:8082
networks:
- webnet
visualizer:
image: dockersamples/visualizer:stable
ports:
- "8080:8080"
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
deploy:
placement:
constraints: [node.role == manager]
networks:
- webnet
networks:
webnet:
volumes:
data:
So far I think everything is fine because in the console with the command: docker service ls I see the services created, the viewer dockersamples / visualizer: stable, shows me the nodes correctly on port 8080, but when I want to make a request to the url of the services that is in the following way:
curl -4 http://192.168.99.100:8082/colpensiones/msg
the error appears:
curl: (7) Failed to connect to 192.168.99.100 port 8082: Refused connection.
The images from service are:
I am following the docker tutorial: Get Started https://docs.docker.com/get-started/part5/
I hope your help, thanks

I had the same issue but fixed after changing the port number of the spring boot service to
ports:
- "8082:8080"
The actual issue is: tomcat server by default listening on port 8080 not the port mentioned on the compose file. Also i increased the memory limit.
FYI: The internal port of the tasks/container running in the service can be same for other containers as well(:) so mentioning 8080(internal port) for both spring boot container and visualizer container is not a problem.

I also faced the same issue for my application. I rebuilt my app by removing from Dockerfile => -Djava.security.egd=file:/dev/./urandom java cmdline property, and it started working for me.
Please check "docker service logs #containerid#" (to see container ids run command "docker stack ps #servicename#") which served you request at that time, and see if you see any error message.
PS: I recently started on docker, so might not be an expert advice. Just in case if it helps.

How to make HDFS work in docker swarm

I have troubles to make my HDFS setup work in docker swarm.
To understand the problem I've reduced my setup to the minimum :
1 physical machine
1 namenode
1 datanode
This setup is working fine with docker-compose, but it fails with docker-swarm, using the same compose file.
Here is the compose file :
version: '3'
services:
namenode:
image: uhopper/hadoop-namenode
hostname: namenode
ports:
- "50070:50070"
- "8020:8020"
volumes:
- /userdata/namenode:/hadoop/dfs/name
environment:
- CLUSTER_NAME=hadoop-cluster
datanode:
image: uhopper/hadoop-datanode
depends_on:
- namenode
volumes:
- /userdata/datanode:/hadoop/dfs/data
environment:
- CORE_CONF_fs_defaultFS=hdfs://namenode:8020
To test it, I have installed an hadoop client on my host (physical) machine with only this simple configuration in core-site.xml :
<configuration>
<property><name>fs.defaultFS</name><value>hdfs://0.0.0.0:8020</value></property>
</configuration>
Then I run the following command :
hdfs dfs -put test.txt /test.txt
With docker-compose (just running docker-compose up) it's working and the file is written in HDFS.
With docker-swarm, I'm running :
docker swarm init
docker stack deploy --compose-file docker-compose.yml hadoop
Then when all services are up, I put my file on HDFS it fails like this :
INFO hdfs.DataStreamer: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/x.x.x.x:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1692)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1648)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
18/06/14 17:29:41 WARN hdfs.DataStreamer: Abandoning BP-1801474405-10.0.0.4-1528990089179:blk_1073741825_1001
18/06/14 17:29:41 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.0.0.6:50010,DS-d7d71735-7099-4aa9-8394-c9eccc325806,DISK]
18/06/14 17:29:41 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
If I look in the web UI the datanode seems to be up and no issue is reported...
Update : it seems that dependsOn is ignored by swarm, but it does not seem to be the cause of my problem : I've restarted the datanode when the namenode is up but it did not work better.
Thanks for your help :)

The whole mess stems from interaction between docker swarm using overlay networks and how the HDFS name node keeps track of its data nodes. The namenode records the datanode IPs/hostnames based the datanode's overlay network IPs. When the HDFS client asks for read/write operations directly on the datanodes, the namenode reports back the IPs/hostnames of the datanodes based on the overlay network. Since the overlay network is not accessible to the external clients, any rw operations will fail.
The final solution (after lots of struggling to get overlay network to work) I used was to have the HDFS services use the host network. Here's a snippet from the compose file:
version: '3.7'
x-deploy_default: &deploy_default
mode: replicated
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: any
delay: 5s
services:
hdfs_namenode:
deploy:
<<: *deploy_default
networks:
hostnet: {}
volumes:
- hdfs_namenode:/hadoop-3.2.0/var/name_node
command:
namenode -fs hdfs://${PRIMARY_HOST}:9000
image: hadoop:3.2.0
hdfs_datanode:
deploy:
mode: global
networks:
hostnet: {}
volumes:
- hdfs_datanode:/hadoop-3.2.0/var/data_node
command:
datanode -fs hdfs://${PRIMARY_HOST}:9000
image: hadoop:3.2.0
volumes:
hdfs_namenode:
hdfs_datanode:
networks:
hostnet:
external: true
name: host

privileged mode in docker compose in a swarm

I am using docker-compose.yml to deploy services in a docker swarm which has cluster of raspberry pis. My services require access to the raspberry pi GPIO and needs privileged mode. I am using docker version 18.02 with docker-compose version 3.6. When I deploy the stack, I receive the following message and the services do not get deployed: "Ignoring unsupported options: privileged". Any tips? Below is my docker-compose.yml file
version: '3.6'
networks:
swarm_network:
driver: overlay
services:
service1:
image: localrepo/img1:v0.1
privileged: true
deploy:
mode: replicated
replicas: 1
placement:
constraints:
- node.hostname == home-desktop
ports:
- published: 8000
target: 8000
mode: host
networks:
swarm_network:
service2:
image: localrepo/img1:v0.1
privileged: true
deploy:
mode: replicated
replicas: 1
ports:
- published: 7000
target: 7000
mode: host
networks:
swarm_network:
nodeViewer:
image: alexellis2/visualizer-arm:latest
ports:
- "8080:8080"
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
deploy:
placement:
constraints: [node.role == manager]
networks:
- swarm_network

Thats because privileged is not supported in docker swarm. I had a similar docker compose running in privileged mode but while using it to docker swarm I removed them and was working well.
That not exactly an error .For example if you use something like links or depends_on . You get similar warning message. These are just the warnings not errors.
This is how you actually check the error logs if there is any
docker service ls (to check running service)
docker service logs servicename

Whole feature is implemented and works as far I can see so who ever want to test it can do it by downloading latest nightly build of Docker engine (dockerd) from https://master.dockerproject.org and the custom build version of Docker CLI from https://github.com/olljanat/cli/releases/tag/beta1
You can also find usage examples for CLI from docker/cli#2199 and for Stack from docker/cli#1940 If you find bugs from those please leave comment to correct PR. Also notice that syntax might still change during review.
Source: https://github.com/moby/moby/issues/25885#issuecomment-557790402
I've personally tested it and it works like a charm. thanks to the author.

Use docker-compose with docker swarm

I'm using docker 1.12.1
I have an easy docker-compose script.
version: '2'
services:
jenkins-slave:
build: ./slave
image: jenkins-slave:1.0
restart: always
ports:
- "22"
environment:
- "constraint:NODE==master1"
jenkins-master:
image: jenkins:2.7.1
container_name: jenkins-master
restart: always
ports:
- "8080:8080"
- "50000"
environment:
- "constraint:NODE==node1"
I run this script with docker-compose -p jenkins up -d.
This Creates my 2 containers but only on my master (from where I execute my command). I would expect that one would be created on the master and one on the node.
I also tried to add
networks:
jenkins_swarm:
driver: overlay
and
networks:
- jenkins_swarm
After every service but this is failing with:
Cannot create container for service jenkins-master: network jenkins_jenkins_swarm not found
While the network is created when I perform docker network ls
Someone who can help me to deploy 2 containers on my 2 nodes with docker-compose. Swarm is defenitly working on my "cluster". I followed this tutorial to verify.

Compose doesn't support Swarm Mode at the moment.
When you run docker compose up on the master node, Compose issues docker run commands for the services in the Compose file, rather than docker service create - which is why the containers all run on the master. See this answer for options.
On the second point, networks are scoped in 1.12. If you inspect your network you'll find it's been created at swarm-level, but Compose is running engine-level containers which can't see the swarm network.

We can do this with docker compose v3 now.
https://docs.docker.com/engine/swarm/#feature-highlights
https://docs.docker.com/compose/compose-file/
You have to initialize the swarm cluster using command
$ docker swarm init
You can add more nodes as worker or manager -
https://docs.docker.com/engine/swarm/join-nodes/
Once you have your both nodes added to the cluster, pass your compose v3 i.e deployment file to create a stack. Compose file should just contain predefined images, you can't give a Dockerfile for deployment in Swarm mode.
$ docker stack deploy -c dev-compose-deploy.yml --with-registry-auth PL
View your stack services status -
$ docker stack services PL
Try to use Labels & Placement constraints to put services on different nodes.
Example "dev-compose-deploy.yml" file for your reference
version: "3"
services:
nginx:
image: nexus.example.com/pl/nginx-dev:latest
extra_hosts:
- "dev-pldocker-01:10.2.0.42”
- "int-pldocker-01:10.2.100.62”
- "prd-plwebassets-01:10.2.0.62”
ports:
- "80:8003"
- "443:443"
volumes:
- logs:/app/out/
networks:
- pl
deploy:
replicas: 3
labels:
feature.description: “Frontend”
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: any
placement:
constraints: [node.role == worker]
command: "/usr/sbin/nginx"
viz:
image: dockersamples/visualizer
ports:
- "8085:8080"
networks:
- pl
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
deploy:
replicas: 1
labels:
feature.description: "Visualizer"
restart_policy:
condition: any
placement:
constraints: [node.role == manager]
networks:
pl:
volumes:
logs:

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Hadoop Cluster on Docker Swarm - Datanodes Failed to Connect to Namenode - docker

It seems to be a network related issue, the pods are up an running but they are not registering on your Web GUI maybe the network communication it's not reaching between them. Check your internal firewall rules and OS firewall, run some network test on the specific ports.

Related

Docker swarm does not distribute the container in the cluster

Docker Swarm with springboot app

How to make HDFS work in docker swarm

privileged mode in docker compose in a swarm

Use docker-compose with docker swarm

Categories

Resources