We have a Spark cluster built with Docker (the singularities/spark image). When we remove the containers, the data stored in HDFS is removed as well. I know that is expected, but how can I set things up so that the files in HDFS are still there whenever I start the cluster again, without uploading them again?
You can bind-mount a host directory onto /opt/hdfs for both the master and the worker, as below:
version: "2"
services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    volumes:
      - "${PWD}/hdfs:/opt/hdfs"
    ports:
      - "6066:6066"
      - "7070:7070"
      - "8080:8080"
      - "50070:50070"
  worker:
    image: singularities/spark
    command: start-spark worker master
    volumes:
      - "${PWD}/hdfs:/opt/hdfs"
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g
    links:
      - master
This way your HDFS files will persist in ./hdfs (the hdfs directory in the current working directory) on the host machine.
Ref - https://hub.docker.com/r/singularities/spark/
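If you would rather not tie the data to a host path, a named volume is an alternative. The following is only a minimal sketch: the volume names hdfs-master and hdfs-worker are made up for illustration, and it assumes the image keeps its HDFS data under /opt/hdfs, as the bind mount above does. Named volumes survive docker-compose down unless you pass -v.

```yaml
version: "2"
services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    volumes:
      # hypothetical named volume holding the master's HDFS data
      - hdfs-master:/opt/hdfs
  worker:
    image: singularities/spark
    command: start-spark worker master
    volumes:
      # separate hypothetical volume for the worker's HDFS data
      - hdfs-worker:/opt/hdfs
    links:
      - master

# Declared at the top level so Compose creates and reuses them
volumes:
  hdfs-master:
  hdfs-worker:
```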
My target container contains NGINX logs which I want to collect with Elastic Fleet's NGINX integration.
I followed every step and successfully run the Fleet Server and the Agent in two separate containers. What confuses me is how to configure my Agent, which has the NGINX integration set up in its policy, to collect logs from the service container.
I have mostly encountered examples that install elastic-agent as a package directly in the target container.
I've attached three snippets of my docker-compose setup for the Fleet, Agent and App containers.
FLEET SERVER
fleet:
  image: docker.elastic.co/beats/elastic-agent:$ELASTIC_VERSION
  healthcheck:
    test: "curl -f http://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"
    retries: 12
    interval: 5s
  hostname: fleet
  container_name: fleet
  restart: always
  user: root
  environment:
    - FLEET_SERVER_ENABLE=1
    - "FLEET_SERVER_ELASTICSEARCH_HOST=https://elasticsearch:9200"
    - FLEET_SERVER_ELASTICSEARCH_USERNAME=elastic
    - FLEET_SERVER_ELASTICSEARCH_PASSWORD=REPLACE1
    - FLEET_SERVER_ELASTICSEARCH_CA=$CERTS_DIR/ca/ca.crt
    - FLEET_SERVER_INSECURE_HTTP=1
    - KIBANA_FLEET_SETUP=1
    - "KIBANA_FLEET_HOST=https://kibana:5601"
    - KIBANA_FLEET_USERNAME=elastic
    - KIBANA_FLEET_PASSWORD=REPLACE1
    - KIBANA_FLEET_CA=$CERTS_DIR/ca/ca.crt
    - FLEET_ENROLL=1
  ports:
    - 8220:8220
  networks:
    - elastic
  volumes:
    - certs:$CERTS_DIR
Elastic Agent
agent:
  image: docker.elastic.co/beats/elastic-agent:$ELASTIC_VERSION
  container_name: agent
  hostname: agent
  restart: always
  user: root
  healthcheck:
    test: "elastic-agent status"
    retries: 90
    interval: 1s
  environment:
    - FLEET_ENROLLMENT_TOKEN=REPLACE2
    - FLEET_ENROLL=1
    - FLEET_URL=http://fleet:8220
    - FLEET_INSECURE=1
    - ELASTICSEARCH_HOSTS='["https://elasticsearch:9200"]'
    - ELASTICSEARCH_USERNAME=elastic
    - ELASTICSEARCH_PASSWORD=REPLACE1
    - ELASTICSEARCH_CA=$CERTS_DIR/ca/ca.crt
    - "STATE_PATH=/usr/share/elastic-agent"
  networks:
    - elastic
  volumes:
    - certs:$CERTS_DIR
App Container (NGINX logs)
demo-app:
  image: ubuntu:bionic
  container_name: demo-app
  build:
    context: ./docker/
    dockerfile: Dockerfile
  volumes:
    - ./app:/var/www/html/app
    - ./docker/nginx.conf:/etc/nginx/nginx.conf
  ports:
    - target: 90
      published: 9090
      protocol: tcp
      mode: host
  networks:
    - elastic
The ELK stack currently runs on version 7.17.0.
If anyone could provide any info on what needs to be done next, it would be very helpful. Thanks!
You could share the NGINX log files through a volume mount.
Mount a host directory onto the NGINX log directory, and mount that same host directory into your Elastic Agent container. The agent can then harvest the NGINX logs from there.
There might be directory read/write permission problems; feel free to ask below.
Something like:
nginx compose:
demo-app:
  ...
  volumes:
    - ./app:/var/www/html/app
    - ./docker/nginx.conf:/etc/nginx/nginx.conf
+   - /home/user/nginx-log:/var/log/nginx
  ...
elastic agent compose:
services:
  agent:
    ...
    volumes:
      - certs:$CERTS_DIR
+     - /home/user/nginx-log:/usr/share/elastic-agent/nginx-log
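The same idea can be sketched with a named volume when both services live in one compose file. This is an illustration under assumptions: the volume name nginx-logs is hypothetical, and it assumes NGINX in demo-app writes its logs to /var/log/nginx. The log paths configured in the NGINX integration policy would then need to point at the mounted location, for example /usr/share/elastic-agent/nginx-log/access.log*.

```yaml
services:
  demo-app:
    volumes:
      - ./app:/var/www/html/app
      - ./docker/nginx.conf:/etc/nginx/nginx.conf
      # hypothetical named volume that receives the NGINX log files
      - nginx-logs:/var/log/nginx
  agent:
    volumes:
      - certs:$CERTS_DIR
      # same volume, mounted read-only where the agent can harvest it
      - nginx-logs:/usr/share/elastic-agent/nginx-log:ro

# Added next to the existing certs volume declaration
volumes:
  nginx-logs:
```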
Is there a way to control the distribution of services across different computers? I have one master with two workers and 5 services:
web server
database
redis
celery
s3 storage connection
I only want to outsource the celery workers and run everything else on the master. Is there a way to control that with docker swarm? I have not created a registry yet, because I am not sure if that is still necessary.
Here is my current experimental docker-compose file.
version: "3.8"
volumes:
  s3data:
    driver: local
services:
  web:
    image: localhost:5000/web
    build: .
    env_file:
      - ./.env
    environment:
      - ENVIRONMENT=develop
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - ./app/:/app/
      - ./lib/lrg_omics/:/lrg-omics/
      - s3data:/datalake/
      - /data/media/:/appmedia/
      - /data/static/:/static/
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
      - s3vol
    links:
      - redis:redis
    restart: always
  db:
    image: postgres
    volumes:
      - /data/db/:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  redis:
    restart: always
    image: redis:alpine
    ports:
      - "6379:6379"
  celery:
    restart: on-failure
    image: pp-celery-worker
    build:
      context: .
      dockerfile: Dockerfile
    command: bash -c "celery -A main worker -l info --concurrency 8"
    env_file:
      - ./.env
    volumes:
      - ./app/:/app/
      - ./lib/lrg_omics/:/lrg-omics/
      - s3data:/datalake/
    environment:
      - DB_HOST=db
      - DB_NAME=app
      - DB_USER=postgres
      - DB_PASS=postgres
    depends_on:
      - db
      - redis
      - web
      - s3vol
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1
  s3vol:
    image: elementar/s3-volume
    command: /data s3://PQC
    environment:
      - BACKUP_INTERVAL=2
      - AWS_ACCESS_KEY_ID=...
      - AWS_SECRET_ACCESS_KEY=...
      - ENDPOINT_URL=https://example.com
    volumes:
      - s3data:/data
When I deploy this with sudo docker stack deploy --compose-file docker-compose-distributed.yml QC
And then look at the services I get something like this:
sudo docker stack services QC
>>>
ID             NAME        MODE         REPLICAS               IMAGE                        PORTS
xx5hkbswipoz   QC_celery   replicated   0/2 (max 1 per node)   celery-worker:latest
natb3trv9ngi   QC_db       replicated   0/1                    postgres:latest
1bxpkb18ojay   QC_redis    replicated   1/1                    redis:alpine                 *:6379->6379/tcp
6rsl5gfpd0oa   QC_s3vol    replicated   1/1                    elementar/s3-volume:latest
aszkle6msmqr   QC_web      replicated   0/1                    localhost:5000/web:latest    *:8000->8000/tcp
For some reason only redis and the S3 containers run. And both of them on the master. Nothing runs on the workers.
I am quite new to docker swarm so there is probably more than one thing wrong here. Any comments on best practices are welcome.
To determine why the services are not starting, run docker service ps QC_celery --no-trunc. It shows the state of each task of the service along with the message from Docker.
To control placement, consult the Compose file version 3 reference on placement constraints. Basically it entails adding constraints under the deploy: key:
deploy:
  replicas: 2
  placement:
    max_replicas_per_node: 1
    constraints:
      - node.role==worker
While, nominally, compose.yml and stack.yml files share a format, they support different feature subsets, and for complex deployments it becomes helpful to split the deployment into discrete compose.yml files for docker compose and stack.yml files for swarm deployments.
docker stack deploy -c docker-compose.yml -c docker-stack.yml QC can merge a docker-compose.yml base file with stack-specific settings, and you can keep docker compose artifacts in your docker-compose.override.yml. These artifacts include:
build: ignored by docker stack deploy. Docker Swarm needs the image to be already built and available in a registry, either local (possibly hosted in the swarm) or Docker Hub.
depends_on:, links: not supported by swarm, which assumes services can be restarted at any time and will find each other using docker networks.
restart: replaced by restart_policy: under deploy:
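As an illustration of the split described above, a stack-specific override might look like the following. It is a sketch under assumptions: the file name docker-stack.yml is taken from the command above, the manager constraints on the other services are one reading of "run everything else on the master", and the restart_policy values simply mirror the restart: settings in the question.

```yaml
# docker-stack.yml: swarm-only settings, merged over docker-compose.yml
version: "3.8"
services:
  web:
    deploy:
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: any          # swarm counterpart of "restart: always"
  db:
    deploy:
      placement:
        constraints:
          - node.role == manager
  redis:
    deploy:
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: any
  s3vol:
    deploy:
      placement:
        constraints:
          - node.role == manager
  celery:
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker   # only the celery workers leave the manager
      restart_policy:
        condition: on-failure
```

Pinning db to the manager also matters for the bind mounts: a host path such as /data/db only exists on the node where the task happens to run.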
I have the following compose file, in which I share some generated HTML data from the Jenkins container to the host drive, and the Nginx container reads this data from the host drive. I'm using Ubuntu Server 18.04 on AWS.
The problem is that I can read the contents of jenkins/workspace/allure-report only once. After the HTML data is updated it becomes inaccessible to Nginx, which returns a 403 status code.
I tried every solution I could find, but nothing works. The only (ugly) workaround is to restart the Nginx container after every update of the HTML data. I don't like this approach and am looking for a built-in Docker feature to resolve this.
What didn't help: sharing a volume directly between the containers without going through the host drive, using the rslave option, using a separate Docker volume as a buffer between the two containers... I believe it should be much easier!
version: '2'
services:
  jenkins:
    container_name: jenkins
    image: "jenkins/jenkins"
    ports:
      - "8088:8080"
      - "50000:50000"
    env_file:
      - variables.env
    volumes:
      - ./jenkins:/var/jenkins_home
  selenoid:
    container_name: selenoid
    network_mode: bridge
    image: "aerokube/selenoid"
    # default directory for browsers.json is /etc/selenoid/
    command: -listen :4444 -conf /etc/selenoid/browsers.json -video-output-dir /opt/selenoid/video/ -timeout 3m
    ports:
      - "4444:4444"
    env_file:
      - variables.env
    volumes:
      - $PWD:/etc/selenoid/ # assumed current dir contains browsers.json
      - /var/run/docker.sock:/var/run/docker.sock
  selenoid-ui:
    container_name: selenoid-ui
    network_mode: bridge
    image: "aerokube/selenoid-ui"
    links:
      - selenoid
    ports:
      - "8080:8080"
    env_file:
      - variables.env
    command: ["--selenoid-uri", "http://selenoid:4444"]
  nginx:
    container_name: nginx
    image: "nginx"
    ports:
      - "80:80"
    volumes:
      - ./jenkins/workspace/allure-report:/usr/share/nginx/html:ro,rslave
Found the solution: the easiest way to get access to the dynamic data is to use volumes_from in the container you want to read it from.
When I configured my compose file like that I faced another issue: the 403 status was gone, but the data was static. That was my own fault, though. I wasn't using the "cp -r" command correctly, so my data was copied only once.
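A volumes_from setup of this kind might look like the sketch below. It is an illustration, not the exact file from this answer: it assumes the version '2' file format (volumes_from is not available in version 3 files), and it assumes the Nginx configuration is changed so that its root points at /var/jenkins_home/workspace/allure-report, because volumes_from mounts the Jenkins volumes at their original paths.

```yaml
version: '2'
services:
  jenkins:
    container_name: jenkins
    image: "jenkins/jenkins"
    volumes:
      - ./jenkins:/var/jenkins_home
  nginx:
    container_name: nginx
    image: "nginx"
    ports:
      - "80:80"
    # mounts every volume of the jenkins service, read-only, at the same paths
    volumes_from:
      - jenkins:ro
```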
I am using the Kafka Connect HDFS sink and Hadoop (for HDFS) in a docker-compose setup.
Hadoop (namenode and datanode) seems to be working correctly.
But I get an error from the Kafka Connect sink:
ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED
(io.confluent.connect.hdfs.TopicPartitionWriter:277)
org.apache.kafka.connect.errors.DataException:
Error creating writer for log file hdfs://namenode:8020/logs/MyTopic/0/log
For information:
Hadoop services in my docker-compose.yml:
namenode:
  image: uhopper/hadoop-namenode:2.8.1
  hostname: namenode
  container_name: namenode
  ports:
    - "50070:50070"
  networks:
    default:
    fides-webapp:
      aliases:
        - "hadoop"
  volumes:
    - namenode:/hadoop/dfs/name
  env_file:
    - ./hadoop.env
  environment:
    - CLUSTER_NAME=hadoop-cluster
datanode1:
  image: uhopper/hadoop-datanode:2.8.1
  hostname: datanode1
  container_name: datanode1
  networks:
    default:
    fides-webapp:
      aliases:
        - "hadoop"
  volumes:
    - datanode1:/hadoop/dfs/data
  env_file:
    - ./hadoop.env
And my kafka-connect file:
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=MyTopic
hdfs.url=hdfs://namenode:8020
flush.size=3
EDIT:
I added an environment variable so that Kafka Connect is aware of the cluster name (CLUSTER_NAME, added to the Kafka Connect service in the docker compose file).
The error is now different (so this seems to have solved one problem):
INFO Starting commit and rotation for topic partition scoring-topic-0 with start offsets {partition=0=0} and end offsets {partition=0=2}
(io.confluent.connect.hdfs.TopicPartitionWriter:368)
ERROR Exception on topic partition MyTopic-0: (io.confluent.connect.hdfs.TopicPartitionWriter:403)
org.apache.kafka.connect.errors.DataException: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /topics/+tmp/MyTopic/partition=0/bc4cf075-ccfa-4338-9672-5462cc6c3404_tmp.avro
could only be replicated to 0 nodes instead of minReplication (=1).
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
EDIT2:
The hadoop.env file is:
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
# Configure default BlockSize and Replication for local
# data. Keep it small for experimentation.
HDFS_CONF_dfs_blocksize=1m
YARN_CONF_yarn_log___aggregation___enable=true
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs
YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
YARN_CONF_yarn_timeline___service_enabled=true
YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
YARN_CONF_yarn_timeline___service_hostname=historyserver
Finally, as noted by @cricket_007, I needed to configure hadoop.conf.dir.
The directory should contain hdfs-site.xml.
Since each service is dockerized, I needed to create a named volume in order to share the configuration files between the kafka-connect service and the namenode service.
To do this I added the following to my docker-compose.yml:
volumes:
  hadoopconf:
Then for the namenode service I added:
volumes:
  - hadoopconf:/etc/hadoop
And for the kafka connect service:
volumes:
  - hadoopconf:/usr/local/hadoop-conf
Finally, I set hadoop.conf.dir in my HDFS sink properties file to /usr/local/hadoop-conf.
I have a situation with a Cassandra container.
I have 2 docker-compose.yaml files in different folders.
docker-compose.yaml in folder 1
version: "3"
services:
  cassandra-cluster-node-1:
    image: cassandra:3.0
    container_name: cassandra-cluster-node-1
    hostname: cassandra-cluster-node-1
    ports:
      - '9142:9042'
      - '7199:7199'
      - '9160:9160'
docker-compose.yaml in folder 2
version: "3"
services:
  cassandra-cluster-node-2:
    image: cassandra:3.0
    container_name: cassandra-cluster-node-2
    hostname: cassandra-cluster-node-2
    ports:
      - '9242:9042'
      - '7299:7199'
      - '9260:9160'
I brought up Cassandra in folder 1 and the system worked well. After that I brought up Cassandra in folder 2, and at that point the Cassandra service from folder 1 was killed automatically. I don't understand why this happens. Could someone with Docker experience please help explain this situation?
The error in cassandra_1 after I run cassandra_2:
cassandra-cluster-node-1 exited with code 137
Thank you, I appreciate your help.
Exit code 137 means the container was killed with SIGKILL (128 + 9), which usually indicates an out-of-memory kill. Cassandra uses a lot of memory if started with default settings: by default it takes 1/4 of the system memory, for each instance. You can restrict the memory usage using environment variables (see my example further down).
Docker Compose creates a separate default network for each project, named after the directory it runs in. With your setup the two nodes will never be able to find each other. This is the output from my test, with your files put into two directories, cass1 and cass2:
$ docker network ls
NETWORK ID     NAME            DRIVER    SCOPE
dbe9cafe0af3   bridge          bridge    local
70cf3d77a7fc   cass1_default   bridge    local
41af3e02e247   cass2_default   bridge    local
21ac366b7a31   host            host      local
0787afb9aeeb   none            null      local
You can see the two networks cass1_default and cass2_default. So the two nodes will not find each other.
If you want them to find each other, you have to give the first one as a seed to the second one, and they have to be in the same network (same docker-compose file):
version: "3"
services:
  cassandra-cluster-node-1:
    image: cassandra:3.0
    container_name: cassandra-cluster-node-1
    hostname: cassandra-cluster-node-1
    environment:
      - "MAX_HEAP_SIZE=1G"
      - "HEAP_NEWSIZE=256M"
    ports:
      - '9142:9042'
      - '7199:7199'
      - '9160:9160'
  cassandra-cluster-node-2:
    image: cassandra:3.0
    container_name: cassandra-cluster-node-2
    hostname: cassandra-cluster-node-2
    environment:
      - "MAX_HEAP_SIZE=1G"
      - "HEAP_NEWSIZE=256M"
      - "CASSANDRA_SEEDS=cassandra-cluster-node-1"
    ports:
      - '9242:9042'
      - '7299:7199'
      - '9260:9160'
    depends_on:
      - cassandra-cluster-node-1
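If the two nodes have to stay in separate folders, an alternative sketch is to attach both compose projects to one user-defined network that is created beforehand with docker network create. The network name cassandra-net is hypothetical. The file below is the folder 2 version; the folder 1 file would need the same two networks entries.

```yaml
# docker-compose.yaml in folder 2
version: "3"
services:
  cassandra-cluster-node-2:
    image: cassandra:3.0
    container_name: cassandra-cluster-node-2
    hostname: cassandra-cluster-node-2
    environment:
      - "MAX_HEAP_SIZE=1G"
      - "HEAP_NEWSIZE=256M"
      - "CASSANDRA_SEEDS=cassandra-cluster-node-1"
    ports:
      - '9242:9042'
      - '7299:7199'
      - '9260:9160'
    networks:
      - cassandra-net

networks:
  cassandra-net:
    # hypothetical network created outside Compose: docker network create cassandra-net
    external: true
```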