Replicate a Standard_DS3_v2 Databricks cluster in Docker

Replicating a Standard_DS3_v2 Databricks cluster in Docker (14 GB, 4 cores).
Would I just change spark-worker-1 and spark-worker-2 in the compose file from 512 MB each to 7 GB, and from 1 core to 2 each? Would this result in the exact same config as the Standard_DS3_v2 Databricks cluster?
Below is the Docker config:
version: "3.6"
volumes:
shared-workspace:
name: "hadoop-distributed-file-system"
driver: local
services:
jupyterlab:
image: jupyterlab
container_name: jupyterlab
ports:
- 8888:8888
volumes:
- shared-workspace:/opt/workspace
spark-master:
image: spark-master
container_name: spark-master
ports:
- 8080:8080
- 7077:7077
volumes:
- shared-workspace:/opt/workspace
spark-worker-1:
image: spark-worker
container_name: spark-worker-1
environment:
- SPARK_WORKER_CORES=1
- SPARK_WORKER_MEMORY=512m
ports:
- 8081:8081
volumes:
- shared-workspace:/opt/workspace
depends_on:
- spark-master
spark-worker-2:
image: spark-worker
container_name: spark-worker-2
environment:
- SPARK_WORKER_CORES=1
- SPARK_WORKER_MEMORY=512m
ports:
- 8082:8081
volumes:
- shared-workspace:/opt/workspace
depends_on:
- spark-master
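For reference, here is a sketch of the change being asked about, with both workers bumped to 2 cores and 7 GB (an assumption about how this custom spark-worker image consumes these variables):

  spark-worker-1:
    image: spark-worker
    container_name: spark-worker-1
    environment:
      - SPARK_WORKER_CORES=2    # 2 workers x 2 cores = 4 cores, as on DS3_v2
      - SPARK_WORKER_MEMORY=7g  # 2 workers x 7 GB = 14 GB offered to executors
    ports:
      - 8081:8081
    volumes:
      - shared-workspace:/opt/workspace
    depends_on:
      - spark-master
  spark-worker-2:
    image: spark-worker
    container_name: spark-worker-2
    environment:
      - SPARK_WORKER_CORES=2
      - SPARK_WORKER_MEMORY=7g
    ports:
      - 8082:8081
    volumes:
      - shared-workspace:/opt/workspace
    depends_on:
      - spark-master

Note that this only matches the DS3_v2 on paper: SPARK_WORKER_MEMORY is the ceiling a standalone worker offers to executors (what they actually get is still set by spark.executor.memory), the same Docker host must also fit the master and JupyterLab containers, and a Databricks node reserves part of its 14 GB for the OS and Databricks services, so the two configs will not be exactly identical.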

Related

AdGuard Home docker compose config and DB missing

I'm trying to run AdGuard Home with docker compose. I've created many other containers with docker compose, but this one does not create any files in the mapped folder.
I tried to reproduce the docker command from the official instructions, but every time I recreate the container I end up at the setup page and all settings are gone.
Any ideas?
This is my compose file:
version: "3"
volumes:
homematic_data:
external: true
networks:
homematic:
services:
samba:
image: dperson/samba
container_name: samba
restart: always
ports:
- "137:137/udp"
- "138:138/udp"
- "139:139/tcp"
- "445:445/tcp"
healthcheck:
disable: true
environment:
- TZ='Europe/Berlin'
- WORKGROUP=workgroup
- RECYCLE=false
- USER1=pi;PASSWORD;1000
- SHARE1=homematic_docker;/shares/homematic_docker;yes;no;yes;pi;pi
volumes:
- /home/pi:/shares/homematic_docker
networks:
- homematic
promtail:
image: grafana/promtail:latest
container_name: promtail
volumes:
- /var/log:/var/log
- ./promtail:/etc/promtail
restart: unless-stopped
command: -config.file=/etc/promtail/promtail-config.yml
networks:
- homematic
node-exporter:
image: quay.io/prometheus/node-exporter:latest
container_name: node_exporter
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
- /:/host:ro,rslave
command:
- '--path.rootfs=/host'
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- --collector.filesystem.ignored-mount-points
- "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
ports:
- 9100:9100
networks:
- homematic
restart: always
###################### portainer
portainer:
image: portainer/portainer-ce:latest
container_name: portainer
restart: unless-stopped
security_opt:
- no-new-privileges:true
volumes:
- /etc/localtime:/etc/localtime:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./portainer:/data
ports:
- 9000:9000
adguard:
image: adguard/adguardhome
container_name: adguard
restart: unless-stopped
ports:
- 53:53/tcp
- 53:53/udp
- 67:67/udp
- 69:68/udp
- 80:80/tcp
- 443:443/tcp
- 443:443/udp
- 3000:3000/tcp
- 853:853/tcp
- 784:784/udp
- 853:853/udp
- 8853:8853/udp
- 5443:5443/tcp
- 5443:5443/udp
# environment:
# - TZ=Europe/Berlin
volumes:
- /home/pi/homematicDocker/adguard/work:/opt/adguardhome/work\
- /home/pi/homematicDocker/adguard/conf:/opt/adguardhome/conf\
# network_mode: host
raspberrymatic:
image: ghcr.io/jens-maus/raspberrymatic:3.67.10.20230117-27abde9
container_name: homematic
hostname: homematic-raspi
privileged: true
restart: unless-stopped
stop_grace_period: 30s
volumes:
- homematic_data:/usr/local:rw
- /lib/modules:/lib/modules:ro
- /run/udev/control:/run/udev/control
ports:
- "8080:80"
- "2001:2001"
- "2010:2010"
- "9292:9292"
- "8181:8181"
networks:
- homematic
Within the folder /opt/adguardhome/work inside the container I can see a data folder with a database in it, and after finishing the setup the conf folder inside the container also has a YAML file.
It turned out I had copied the trailing backslashes of the docker run command into the volume mappings; that was why I didn't get any data on the host. Thank you Mike!
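For anyone else hitting this, the corrected mappings are the same two lines with the copied line-continuation backslashes removed:

    volumes:
      - /home/pi/homematicDocker/adguard/work:/opt/adguardhome/work
      - /home/pi/homematicDocker/adguard/conf:/opt/adguardhome/conf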

Problem deploying a docker-compose service (port issue)

I want to deploy a service that will let me use Spark and MongoDB from a Jupyter notebook.
I use docker-compose to bring up the service, as follows:
version: "3.3"
volumes:
shared-workspace:
networks:
spark-net:
driver: bridge
services:
spark-master:
image: uqteaching/cloudcomputing:spark-master-v1
container_name: spark-master
networks:
- "spark-net"
ports:
- "8080:8080"
- "7077:7077"
environment:
- INIT_DAEMON_STEP=setup_spark
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
spark-worker-1:
image: uqteaching/cloudcomputing:spark-worker-v1
container_name: spark-worker-1
depends_on:
- spark-master
networks:
- "spark-net"
ports:
- "8081:8081"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
spark-worker-2:
image: uqteaching/cloudcomputing:spark-worker-v1
container_name: spark-worker-2
depends_on:
- spark-master
networks:
- "spark-net"
ports:
- "8082:8082"
environment:
- "SPARK_MASTER=spark://spark-master:7077"
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
mongo:
image: mongo
container_name: 'mongo'
networks:
- "spark-net"
ports:
- "27017:27017"
mongo_admin:
image: mongo-express
container_name: 'mongoadmin'
networks:
- "spark-net"
depends_on:
- mongo
links:
- mongo
ports:
- "8091:8091"
jupyter-notebook:
container_name: jupyternb
image: jupyter/all-spark-notebook:42f4c82a07ff
depends_on:
- mongo
- spark-master
links:
- mongo
expose:
- "8888"
networks:
- "spark-net"
ports:
- "8888:8888"
volumes:
- ./nbs:/home/jovyan/work/nbs
- ./events:/tmp/spark-events
environment:
- "PYSPARK_PYTHON=/usr/bin/python3"
- "PYSPARK_DRIVER_PYTHON=/usr/bin/python3"
command: "start-notebook.sh \
--ip=0.0.0.0 \
--allow-root \
--no-browser \
--notebook-dir=/home/jovyan/work/nbs \
--NotebookApp.token='' \
--NotebookApp.password=''
"
The result is that spark-worker-1 and the mongo-express container both try to use 8081/tcp at the same time, which causes them both to crash. I don't know why; I set these two services to listen on different ports. I want to solve this.
mongo-express needs port 8081 inside the container, so use another external port to be able to log in to the web UI. Making it reachable at http://localhost:8092 would then look something like this:
mongo_admin:
  image: mongo-express
  container_name: 'mongoadmin'
  networks:
    - "spark-net"
  depends_on:
    - mongo
  links:
    - mongo
  ports:
    - "8092:8081"

Hive-metastore not finding Hadoop Datanode

I have a Hadoop cluster with one namenode and one datanode, instantiated with docker compose. In addition I am trying to launch Hive, but the hive-metastore does not seem to find my datanode even though it is up and running; checking the log shows:
namenode:9870 is available.
check for datanode:9871...
datanode:9871 is not available yet
try in 5s once again ...
Here is my docker-compose.yml
#HADOOP
namenode:
  image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
  container_name: namenode
  restart: always
  expose:
    - "9870"
    - "54310"
    - "9000"
  ports:
    - 9870:9870
    - 9000:9000
  volumes:
    - ./data/hadoop_data/:/hadoop_data
  environment:
    - CLUSTER_NAME=test
    - CORE_CONF_fs_defaultFS=hdfs://namenode:9000
    - CORE_CONF_hadoop_http_staticuser_user=root
    - CORE_CONF_hadoop_proxyuser_hue_hosts=*
    - CORE_CONF_hadoop_proxyuser_hue_groups=*
    - CORE_CONF_io_compression_codecs=org.apache.hadoop.io.compress.SnappyCodec
    - HDFS_CONF_dfs_webhdfs_enabled=true
    - HDFS_CONF_dfs_permissions_enabled=false
    - HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
    - HDFS_CONF_dfs_safemode_threshold_pct=0
datanode:
  image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
  container_name: datanode
  restart: always
  expose:
    - "9871"
  environment:
    SERVICE_PRECONDITION: "namenode:9870"
  ports:
    - "9871:9871"
  env_file:
    - hive.env
hive-server:
  image: bde2020/hive:2.3.2-postgresql-metastore
  container_name: hive-server
  volumes:
    - ./employee:/employee
  env_file:
    - hive.env
  environment:
    HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
    SERVICE_PRECONDITION: "hive-metastore:9083"
  depends_on:
    - hive-metastore
  ports:
    - "10000:10000"
hive-metastore:
  image: bde2020/hive:2.3.2-postgresql-metastore
  container_name: hive-metastore
  env_file:
    - hive.env
  command: /opt/hive/bin/hive --service metastore
  environment:
    SERVICE_PRECONDITION: "namenode:9870 datanode:9871 hive-metastore-postgresql:5432"
  depends_on:
    - hive-metastore-postgresql
  ports:
    - "9083:9083"
hive-metastore-postgresql:
  image: bde2020/hive-metastore-postgresql:2.3.0
  container_name: hive-metastore-postgresql
  volumes:
    - ./metastore-postgresql/postgresql/data:/var/lib/postgresql/data
  depends_on:
    - datanode
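No resolution is shown above, so one hedged guess based on how these images work: SERVICE_PRECONDITION is just a host:port reachability poll, and a Hadoop 3 datanode serves its web UI on port 9864 by default, so publishing 9871 does nothing unless the datanode is actually told to bind that port. Something like the following, where HDFS_CONF_dfs_datanode_http_address is the usual bde2020 way of injecting dfs.datanode.http.address into hdfs-site.xml:

datanode:
  image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
  container_name: datanode
  restart: always
  expose:
    - "9871"
  environment:
    SERVICE_PRECONDITION: "namenode:9870"
    # assumption: bind the datanode HTTP server to the port the
    # metastore's precondition check polls (the default is 0.0.0.0:9864)
    HDFS_CONF_dfs_datanode_http_address: "0.0.0.0:9871"
  ports:
    - "9871:9871"
  env_file:
    - hive.env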

Running Kibana in Docker image gives Non-Root error

I'm having issues attempting to set up the ELK stack (v7.6.0) in Docker using docker-compose.
Elasticsearch and Logstash start up fine, but Kibana exits instantly; the docker logs for that container report:
Kibana should not be run as root. Use --allow-root to continue.
The docker-compose for those elements looks like this:
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
  environment:
    - discovery.type=single-node
  ports:
    - 9200:9200
  mem_limit: 2gb
kibana:
  image: docker.elastic.co/kibana/kibana:7.6.0
  environment:
    - discovery.type=single-node
  ports:
    - 5601:5601
  depends_on:
    - elasticsearch
logstash:
  image: docker.elastic.co/logstash/logstash:7.6.0
  ports:
    - "5000:5000/tcp"
    - "5000:5000/udp"
    - "9600:9600"
  mem_limit: 2gb
  depends_on:
    - elasticsearch
How do I either disable the run as root error, or set the application to not run as root?
In case you don't have a separate Dockerfile for Kibana and you just want to set the startup arg in docker-compose, the syntax there is as follows:
kibana:
  container_name: kibana
  image: docker.elastic.co/kibana/kibana:7.9.2
  ports:
    - 5601:5601
  depends_on:
    - elasticsearch
  environment:
    - ELASTICSEARCH_URL=http://localhost:9200
  networks:
    - elastic
  entrypoint: ["./bin/kibana", "--allow-root"]
That works around the problem of running it in a Windows container. That being said, I don't know why Kibana should not be executed as root.
I've just run this docker image and everything works perfectly; here is my docker-compose file:
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
    container_name: elasticsearch
    environment:
      - node.name=node
      - cluster.name=elasticsearch-default
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      - "9200:9200"
    expose:
      - "9200"
    networks:
      - "esnet"
  kibana:
    image: docker.elastic.co/kibana/kibana:7.6.0
    container_name: kibana
    ports:
      - "5601:5601"
    expose:
      - "5601"
    networks:
      - "esnet"
    depends_on:
      - elasticsearch
  logstash:
    image: docker.elastic.co/logstash/logstash:7.6.0
    ports:
      - "5000:5000/tcp"
      - "5000:5000/udp"
      - "9600:9600"
    depends_on:
      - elasticsearch
    networks:
      - "esnet"
networks:
  esnet:
I had the same problem. I worked around it by adding the ENTRYPOINT to the end of the Dockerfile:
ARG ELK_VERSION
FROM docker.elastic.co/kibana/kibana:${ELK_VERSION}
ENTRYPOINT ["./bin/kibana", "--allow-root"]
The docker-compose.yml
version: '3.2'
# Elastic stack (ELK) on Docker https://github.com/deviantony/docker-elk
services:
  elasticsearch:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: volume
        source: elasticsearch
        target: /usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
      ELASTIC_PASSWORD: changeme
      discovery.type: single-node
    networks:
      - elk
  logstash:
    build:
      context: logstash/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./logstash/pipeline
        target: /usr/share/logstash/pipeline
        read_only: true
    ports:
      - "5002:5002/tcp"
      - "5002:5002/udp"
      - "9600:9600"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
    depends_on:
      - elasticsearch
  kibana:
    build:
      context: kibana/   ############## Dockerfile ##############
      args:
        ELK_VERSION: $ELK_VERSION
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      - elasticsearch
networks:
  elk:
    driver: nat
volumes:
  elasticsearch:
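The $ELK_VERSION referenced by the build args comes from compose variable substitution; typically it lives in an .env file next to the docker-compose.yml (the version value here is just an example):

# .env
ELK_VERSION=7.6.0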

Creating spark cluster with drone.yml not working

I have a docker-compose.yml with the image and configuration below:
version: '3'
services:
  spark-master:
    image: bde2020/spark-master:2.4.4-hadoop2.7
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
    image: bde2020/spark-worker:2.4.4-hadoop2.7
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
Here is the docker-compose up log: https://jpst.it/1Xc4K
With this, the containers come up and run fine; the spark worker connects to the spark master without any issues. The problem is the drone.yml I created, where I added a services component:
services:
  jce-cassandra:
    image: cassandra:3.0
    ports:
      - "9042:9042"
  jce-elastic:
    image: elasticsearch:5.6.16-alpine
    ports:
      - "9200:9200"
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
  janusgraph:
    image: janusgraph/janusgraph:latest
    ports:
      - "8182:8182"
    environment:
      JANUS_PROPS_TEMPLATE: cassandra-es
      janusgraph.storage.backend: cql
      janusgraph.storage.hostname: jce-cassandra
      janusgraph.index.search.backend: elasticsearch
      janusgraph.index.search.hostname: jce-elastic
    depends_on:
      - jce-elastic
      - jce-cassandra
  spark-master:
    image: bde2020/spark-master:2.4.4-hadoop2.7
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
    image: bde2020/spark-worker:2.4.4-hadoop2.7
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
But here the spark worker does not connect to the spark master and throws exceptions (see the exception log details). Can someone please guide me on why I am facing this issue?
Note: I am trying to create these services in drone.yml for my integration testing.
Answering for better formatting. The comments suggest sleeping. Assuming this is the Dockerfile (https://hub.docker.com/r/bde2020/spark-worker/dockerfile), you could sleep by adding a command:
spark-worker-1:
  image: bde2020/spark-worker:2.4.4-hadoop2.7
  container_name: spark-worker-1
  # wrap in a shell so the && is interpreted, then hand off to the
  # image's normal startup script
  command: bash -c "sleep 10 && /bin/bash /worker.sh"
  depends_on:
    - spark-master
  ports:
    - "8081:8081"
  environment:
    - "SPARK_MASTER=spark://spark-master:7077"
Although sleep 10 is probably excessive; sleep 5 or even sleep 2 would likely do.
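A fixed sleep is still a race, though. A slightly more robust sketch, assuming nothing beyond bash being present in the image (it is, since /worker.sh is launched with /bin/bash), polls the master port with bash's built-in /dev/tcp before starting the worker:

spark-worker-1:
  image: bde2020/spark-worker:2.4.4-hadoop2.7
  container_name: spark-worker-1
  # retry until spark-master accepts connections on 7077, then start
  command: bash -c 'until (echo > /dev/tcp/spark-master/7077) 2>/dev/null; do sleep 1; done; /bin/bash /worker.sh'
  depends_on:
    - spark-master
  ports:
    - "8081:8081"
  environment:
    - "SPARK_MASTER=spark://spark-master:7077"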
