I have an Hadoop cluster with one namenode and one datanode instantiated with docker compose. In addition I am trying to launch Hive but the Hive-metastore seems not to find my datanode even if this one is up and running, infact checking the log it shows:
namenode:9870 is available.
check for datanode:9871...
datanode:9871 is not available yet
try in 5s once again ...
Here is my docker-compose.yml
#HADOOP
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
container_name: namenode
restart: always
expose:
- "9870"
- "54310"
- "9000"
ports:
- 9870:9870
- 9000:9000
volumes:
- ./data/hadoop_data/:/hadoop_data
environment:
- CLUSTER_NAME=test
- CORE_CONF_fs_defaultFS=hdfs://namenode:9000
- CORE_CONF_hadoop_http_staticuser_user=root
- CORE_CONF_hadoop_proxyuser_hue_hosts=*
- CORE_CONF_hadoop_proxyuser_hue_groups=*
- CORE_CONF_io_compression_codecs=org.apache.hadoop.io.compress.SnappyCodec
- HDFS_CONF_dfs_webhdfs_enabled=true
- HDFS_CONF_dfs_permissions_enabled=false
- HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
- HDFS_CONF_dfs_safemode_threshold_pct=0
datanode:
image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
container_name: datanode
restart: always
expose:
- "9871"
environment:
SERVICE_PRECONDITION: "namenode:9870"
ports:
- "9871:9871"
env_file:
- hive.env
hive-server:
image: bde2020/hive:2.3.2-postgresql-metastore
container_name: hive-server
volumes:
- ./employee:/employee
env_file:
- hive.env
environment:
HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
SERVICE_PRECONDITION: "hive-metastore:9083"
depends_on:
- hive-metastore
ports:
- "10000:10000"
hive-metastore:
image: bde2020/hive:2.3.2-postgresql-metastore
container_name: hive-metastore
env_file:
- hive.env
command: /opt/hive/bin/hive --service metastore
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode:9871 hive-metastore-postgresql:5432"
depends_on:
- hive-metastore-postgresql
ports:
- "9083:9083"
hive-metastore-postgresql:
image: bde2020/hive-metastore-postgresql:2.3.0
container_name: hive-metastore-postgresql
volumes:
- ./metastore-postgresql/postgresql/data:/var/lib/postgresql/data
depends_on:
- datanode
Related
I want to deploy a service that will allow me to use Spark and MongoDB in a Jupiter notebook.
I use docker-compose to build up the service, and it`s as followed:
version: "3.3"
volumes:
shared-workspace:
networks:
spark-net:
driver: bridge
services:
spark-master:
image: uqteaching/cloudcomputing:spark-master-v1
container_name: spark-master
networks:
- "spark-net"
ports:
- "8080:8080"
- "7077:7077"
environment:
- INIT_DAEMON_STEP=setup_spark
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
spark-worker-1:
image: uqteaching/cloudcomputing:spark-worker-v1
container_name: spark-worker-1
depends_on:
- spark-master
networks:
- "spark-net"
ports:
- "8081:8081"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
spark-worker-2:
image: uqteaching/cloudcomputing:spark-worker-v1
container_name: spark-worker-2
depends_on:
- spark-master
networks:
- "spark-net"
ports:
- "8082:8082"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
mongo:
image: mongo
container_name: 'mongo'
networks:
- "spark-net"
ports:
- "27017:27017"
mongo_admin:
image: mongo-express
container_name: 'mongoadmin'
networks:
- "spark-net"
depends_on:
- mongo
links:
- mongo
ports:
- "8091:8091"
jupyter-notebook:
container_name: jupyternb
image: jupyter/all-spark-notebook:42f4c82a07ff
depends_on:
- mongo
- spark-master
links:
- mongo
expose:
- "8888"
networks:
- "spark-net"
ports:
- "8888:8888"
volumes:
- ./nbs:/home/jovyan/work/nbs
- ./events:/tmp/spark-events
environment:
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
command: "start-notebook.sh \
--ip=0.0.0.0 \
--allow-root \
--no-browser \
--notebook-dir=/home/jovyan/work/nbs \
--NotebookApp.token='' \
--NotebookApp.password=''
"
And the result is something like this:
I dont know why. Even I set these 2` services to listen to a different port.
They are using 8081/tcp at the same time, which caused them both to crash.
I want to solve this.
mongo-express seems to need port 8081 internal, so use another external port to be able to login to the webui.
http://localhost:8092 would then be something like this:
mongo_admin:
image: mongo-express
container_name: 'mongoadmin'
networks:
- "spark-net"
depends_on:
- mongo
links:
- mongo
ports:
- "8092:8091"
version: '3.3'
services:
#InfluxDB server
influx-db:
image: influxdb:1.8-alpine
container_name: influx-db
ports:
- 8086:8086
restart: always
volumes:
- db-data:/var/lib/influxdb
networks:
- local
#PostgreSQL Database for the application
postgresdb:
image: "postgres:12.0-alpine"
container_name: postgresdb
volumes:
- db-data:/var/lib/postgresql/data
ports:
- 5432:5432
environment:
- POSTGRES_DB=postgres
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
restart: always
networks:
- local
#Fron-end Angular Application
fe:
build: './Frontend-Asset'
ports:
- 4201:4201
links:
- sm_abc_be
- sm_um_be
depends_on:
- sm_abc_be
- sm_um_be
networks:
- local
um_fe:
build: './Frontend-User'
ports:
- 4202:4202
links:
- sm_abc_be
- sm_um_be
depends_on:
- sm_abc_be
- sm_um_be
networks:
- local
#Back-end Spring Boot Application
sm_um_be:
build: './um_be'
ports:
- 8081:8081
restart: always
volumes:
- db-data/
links:
- postgresdb
environment:
- SPRING_DATASOURCE_URL=jdbc:postgresql://postgresdb:5432/abcd
- SPRING_DATASOURCE_USERNAME=abc_user
- SPRING_DATASOURCE_PASSWORD=abcpassword
- SPRING_JPA_HIBERNATE_DDL_AUTO=update
depends_on:
- postgresdb
networks:
- local
sm_am_be:
build: './am_be'
ports:
- 8082:8082
restart: always
volumes:
- db-data/
links:
- postgresdb
- influx-db
environment:
- SPRING_DATASOURCE_URL=jdbc:postgresql://postgresdb:5432/am_uuid?currentSchema=abc
- SPRING_DATASOURCE_USERNAME=am_db_user
- SPRING_DATASOURCE_PASSWORD=abcpassword
- SPRING_JPA_HIBERNATE_DDL_AUTO=update
depends_on:
- postgresdb
- influx-db
networks:
- local
#Volumes for DB data
volumes:
db-data:
networks:
local:
driver: bridge
I have docker-compose.yml with below image and configuration
version: '3'
services:
spark-master:
image: bde2020/spark-master:2.4.4-hadoop2.7
container_name: spark-master
ports:
- "8080:8080"
- "7077:7077"
environment:
- INIT_DAEMON_STEP=setup_spark
spark-worker-1:
image: bde2020/spark-worker:2.4.4-hadoop2.7
container_name: spark-worker-1
depends_on:
- spark-master
ports:
- "8081:8081"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
here the docker-compose up log ---> https://jpst.it/1Xc4K
and here containers up and running and i mean spark worker connected to spark master without any issues , now problem is i created drone.yml and where i added services component with
services:
jce-cassandra:
image: cassandra:3.0
ports:
- "9042:9042"
jce-elastic:
image: elasticsearch:5.6.16-alpine
ports:
- "9200:9200"
environment:
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
janusgraph:
image: janusgraph/janusgraph:latest
ports:
- "8182:8182"
environment:
JANUS_PROPS_TEMPLATE: cassandra-es
janusgraph.storage.backend: cql
janusgraph.storage.hostname: jce-cassandra
janusgraph.index.search.backend: elasticsearch
janusgraph.index.search.hostname: jce-elastic
depends_on:
- jce-elastic
- jce-cassandra
spark-master:
image: bde2020/spark-master:2.4.4-hadoop2.7
container_name: spark-master
ports:
- "8080:8080"
- "7077:7077"
environment:
- INIT_DAEMON_STEP=setup_spark
spark-worker-1:
image: bde2020/spark-worker:2.4.4-hadoop2.7
container_name: spark-worker-1
depends_on:
- spark-master
ports:
- "8081:8081"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
but here spark worker is not connected to spark master getting exceptions, here is exception log details , can some one please guide me why am facing this issue
Note : I am trying to create these services in drone.yml for my integration testing
Answering for better formatting. The comments suggest sleeping. Assuming this is the dockerfile (https://hub.docker.com/r/bde2020/spark-worker/dockerfile) You could sleep by adding the command:
spark-worker-1:
image: bde2020/spark-worker:2.4.4-hadoop2.7
container_name: spark-worker-1
command: sleep 10 && /bin/bash /worker.sh
depends_on:
- spark-master
ports:
- "8081:8081"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
Although sleep 10 is probably excessive, if this would would sleep 5 or sleep 2
The set up
I am trying to compose a lightweight minimal hadoop stack with the images provided by bde2020 (learning purpose). Right now, the stack includes (among others)
a namenode
a datanote
hue
Basically, I started from Big Data Europe official docker compose, and added a hue image based on their documentation
The issue
Hue's file browser can't access HDFS:
Cannot access: /user/dav. The HDFS REST service is not available. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
HTTPConnectionPool(host='namenode', port=50070): Max retries exceeded with url: /webhdfs/v1/user/dav?op=GETFILESTATUS&user.name=hue&doas=dav (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f8119a3cf10>: Failed to establish a new connection: [Errno 111] Connection refused',))
What I tried so far to delimit the issue
to explicitly putted all the services on the same network
to point dfs_webhdfs_url to localhost:9870/webhdfs/v1 in the namenode env file (source) and edit hue.ini in hue's container accordingly (by adding webhdfs_url=http://namenode:9870/webhdfs/v1)
when I log into hue's container, I can see that namenode's port 9870 is open (nmap -p 9870 namenode). 50070 is not. I don't think that my issue is network related. Despite editing hue.ini, Hue still go for port 50070. So, how can I force hue to go for port 9870 in my current setup? (if this is the reason)
docker-compose
version: '3.7'
services:
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.1-java8
container_name: namenode
hostname: namenode
domainname: hadoop
ports:
- 9870:9870
volumes:
- hadoop_namenode:/hadoop/dfs/name
- ./entrypoints/namenode/entrypoint.sh:/entrypoint.sh
env_file:
- ./hadoop.env
- .env
networks:
- hadoop_net
# TODO adduser --ingroup hadoop dav
datanode1:
image: bde2020/hadoop-datanode:2.0.0-hadoop3.1.1-java8
container_name: datanode
hostname: datanode1
domainname: hadoop
volumes:
- hadoop_datanode:/hadoop/dfs/data
environment:
SERVICE_PRECONDITION: "namenode:9870"
env_file:
- ./hadoop.env
networks:
- hadoop_net
resourcemanager:
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.1.1-java8
container_name: resourcemanager
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode:9864"
env_file:
- ./hadoop.env
networks:
- hadoop_net
nodemanager1:
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.1.1-java8
container_name: nodemanager
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode:9864 resourcemanager:8088"
env_file:
- ./hadoop.env
networks:
- hadoop_net
historyserver:
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.1.1-java8
container_name: historyserver
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode:9864 resourcemanager:8088"
volumes:
- hadoop_historyserver:/hadoop/yarn/timeline
env_file:
- ./hadoop.env
networks:
- hadoop_net
filebrowser:
container_name: hue
image: bde2020/hdfs-filebrowser:3.11
ports:
- "8088:8088"
env_file:
- ./hadoop.env
volumes: # BYPASS DEFAULT webhdfs url
- ./overrides/hue/hue.ini:/opt/hue/desktop/conf.dist/hue.ini
environment:
- NAMENODE_HOST=namenode
networks:
- hadoop_net
networks:
hadoop_net:
volumes:
hadoop_namenode:
hadoop_datanode:
hadoop_historyserver:
I was able to get the Filebrowser working with this INI
[desktop]
http_host=0.0.0.0
http_port=8888
time_zone=America/Chicago
dev=true
app_blacklist=impala,zookeeper,oozie,hbase,security,search
[hadoop]
[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://namenode:8020
webhdfs_url=http://namenode:50070/webhdfs/v1
security_enabled=false
And this compose
version: "2"
services:
namenode:
image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
container_name: namenode
ports:
- 8020:8020
- 50070:50070
# - 59050:59050
volumes:
- hadoop_namenode:/hadoop/dfs/name
environment:
- CLUSTER_NAME=test
env_file:
- ./hadoop.env
networks:
- hadoop
datanode1:
image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
container_name: datanode1
ports:
- 50075:50075
# - 50010:50010
# - 50020:50020
depends_on:
- namenode
volumes:
- hadoop_datanode1:/hadoop/dfs/data
env_file:
- ./hadoop.env
networks:
- hadoop
hue:
image: gethue/hue
container_name: hue
ports:
- 8000:8888
depends_on:
- namenode
volumes:
- ./conf/hue.ini:/hue/desktop/conf/pseudo-distributed.ini
networks:
- hadoop
- frontend
volumes:
hadoop_namenode:
hadoop_datanode1:
networks:
hadoop:
frontend:
hadoop.env has to add hue as a proxy user as well
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
CORE_CONF_hadoop_http_staticuser_user=root
CORE_CONF_hadoop_proxyuser_hue_hosts=*
CORE_CONF_hadoop_proxyuser_hue_groups=*
HDFS_CONF_dfs_replication=1
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false
Yeah, found it. A few key elements:
in hadoop 3.*, webhdfs no longer listen to 50070 but 9870 is the standard port
overriding hue.ini involves to mount a file named hue-overrides.ini
the hue image from gethue is more up to date than the one from bde2020 (their hadoop stack rocks, though)
Docker-compose
version: '3.7'
services:
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.1-java8
container_name: namenode
ports:
- 9870:9870
- 8020:8020
volumes:
- hadoop_namenode:/hadoop/dfs/name
- ./overrides/namenode/entrypoint.sh:/entrypoint.sh
env_file:
- ./hadoop.env
- .env
networks:
- hadoop
filebrowser:
container_name: hue
image: gethue/hue:4.4.0
ports:
- "8000:8888"
env_file:
- ./hadoop.env
volumes: # HERE
- ./overrides/hue/hue-overrides.ini:/usr/share/hue/desktop/conf/hue-overrides.ini
depends_on:
- namenode
networks:
- hadoop
- frontend
datanode1:
image: bde2020/hadoop-datanode:2.0.0-hadoop3.1.1-java8
container_name: datanode1
volumes:
- hadoop_datanode:/hadoop/dfs/data
environment:
SERVICE_PRECONDITION: "namenode:9870"
env_file:
- ./hadoop.env
networks:
- hadoop
resourcemanager:
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.1.1-java8
container_name: resourcemanager
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode1:9864"
env_file:
- ./hadoop.env
networks:
- hadoop
nodemanager1:
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.1.1-java8
container_name: nodemanager
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode1:9864 resourcemanager:8088"
env_file:
- ./hadoop.env
networks:
- hadoop
historyserver:
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.1.1-java8
container_name: historyserver
environment:
SERVICE_PRECONDITION: "namenode:9870 datanode1:9864 resourcemanager:8088"
volumes:
- hadoop_historyserver:/hadoop/yarn/timeline
env_file:
- ./hadoop.env
networks:
- hadoop
networks:
hadoop:
frontend:
volumes:
hadoop_namenode:
hadoop_datanode:
hadoop_historyserver:
hadoop.env
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
CORE_CONF_hadoop_http_staticuser_user=root
CORE_CONF_hadoop_proxyuser_hue_hosts=*
CORE_CONF_hadoop_proxyuser_hue_groups=*
CORE_CONF_io_compression_codecs=org.apache.hadoop.io.compress.SnappyCodec
HDFS_CONF_dfs_replication=1
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false
HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
hue-overrides.ini
[desktop]
http_host=0.0.0.0
http_port=8888
time_zone=France
dev=true
app_blacklist=impala,zookeeper,oozie,hbase,security,search
[hadoop]
[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://namenode:8020
webhdfs_url=http://namenode:9870/webhdfs/v1
security_enabled=false
Thanks #cricket_007
I'm new and learning to start a hadoop system using Docker but I have been stuck at one point for weeks and finally I have to ask here. I could launch the containers individually without any problems and the namenode always recognized the running namenodes. However, when I tried to set up a Docker-compose.yml file to launch all containers at once, I got into multiple problems, one of which is that the namenode never recognized the running datanodes. Do you have any suggestions to help me fix it?
Here's my docker-compose file:
version: "3"
services:
base:
image: hpcnube-base-image
hpcnube-namenode:
image: hpcnube-namenode-image
depends_on:
- base
hostname: hpcnube-namenode
container_name: hpcnube-namenode
networks:
- dockerfiles1_hpcnube-net
ports:
- "9870:9870"
hpcnube-resourcemanager:
image: hpcnube-resourcemanager-image
container_name: hpcnube-resourcemanager
depends_on:
- hpcnube-namenode
- hpcnube-dnnm2
- hpcnube-dnnm1
hostname: hpcnube-resourcemanager
networks:
- dockerfiles1_hpcnube-net
ports:
- "8088:8088"
hpcnube-dnnm1:
image: hpcnube-dnnm-image
container_name: hpcnube-dnnm1
hostname: hpcnube-dnnm1
depends_on:
- base
- hpcnube-namenode
networks:
- dockerfiles1_hpcnube-net
#command: "/opt/bd/start-daemons.sh"
hpcnube-dnnm2:
#build: ./DataNode-NodeManager
image: hpcnube-dnnm-image
container_name: hpcnube-dnnm2
hostname: hpcnube-dnnm2
depends_on:
- base
- hpcnube-namenode
networks:
- dockerfiles1_hpcnube-net
#command: "/opt/bd/start-daemons.sh"
hpcnube-checkpoint:
image: hpcnube-checkpointnode-image
hostname: hpcnube-checkpointnode
depends_on:
- base
- hpcnube-namenode
- hpcnube-resourcemanager
- hpcnube-dnnm2
- hpcnube-dnnm1
networks:
- dockerfiles1_hpcnube-net
hpcnube-timeline:
image: hpcnube-timelineserver-image
hostname: hpcnube-timelineserver
depends_on:
- base
- hpcnube-namenode
- hpcnube-resourcemanager
- hpcnube-dnnm2
- hpcnube-dnnm1
- hpcnube-checkpoint
networks:
- dockerfiles1_hpcnube-net
hpcnube-frontend:
build: ./FrontEnd
hostname: hpcnube-frontend
depends_on:
- base
- hpcnube-namenode
- hpcnube-resourcemanager
- hpcnube-dnnm2
- hpcnube-dnnm1
- hpcnube-checkpoint
- hpcnube-timeline
hostname: hpcnube-frontend
networks:
- dockerfiles1_hpcnube-net
ports:
- "2345:22"
networks:
dockerfiles1_hpcnube-net:
driver: bridge