I have had good experiences with Airflow in the past, unfortunately only as a user; I was not responsible for the setup. Now I want to set up Airflow on my own, but I am really struggling with it because tasks get stuck in the queue after a short period of time.
I am using the DockerOperator to run my tasks in Docker containers.
This is my Dockerfile:
FROM apache/airflow:2.1.4
RUN pip install --no-cache-dir apache-airflow-providers-docker==2.5.0 boto3==1.21.45 apache-airflow-providers-amazon==3.0.0
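For context, my DAGs call the DockerOperator roughly like this. This is only a minimal sketch; the image name and command are placeholders rather than my real task:
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="docker_operator_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_task = DockerOperator(
        task_id="run_task",
        image="my-task-image:latest",             # placeholder image
        command="python /app/run_task.py",        # placeholder command
        docker_url="unix://var/run/docker.sock",  # the socket mounted into the worker container
        auto_remove=True,                         # remove the container when the task finishes
        mount_tmp_dir=False,                      # avoid mounting a host tmp dir from inside a container
    )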
And here is my docker-compose file:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow.
# Default: apache/airflow:master-python3.8
# AIRFLOW_UID - User ID in Airflow containers
# Default: 50000
# AIRFLOW_GID - Group ID in Airflow containers
# Default: 50000
#
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested).
# Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested).
# Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
# Default: ''
#
# Feel free to modify this file to suit your needs.
---
version: '3'
x-airflow-common:
&airflow-common
image: airflow_image:latest
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__CORE__ENABLE_XCOM_PICKLING: 'true'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: 60
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
SATELLITE_ARCHIVE_MOUNT: ${SATELLITE_ARCHIVE_MOUNT}
NWP_ARCHIVE_MOUNT: ${NWP_ARCHIVE_MOUNT}
ECCODES_DEFINITION_PATH: ${ECCODES_DEFINITION_PATH}
AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION}
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- /var/run/docker.sock:/var/run/docker.sock
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
depends_on:
redis:
condition: service_healthy
postgres:
condition: service_healthy
services:
postgres:
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
redis:
image: redis:latest
ports:
- 6379:6379
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test: ["CMD-SHELL", 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery#$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-init:
<<: *airflow-common
command: version
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
flower:
<<: *airflow-common
command: celery flower
ports:
- 5555:5555
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
volumes:
postgres-db-volume:
I hope one of you is able to help me. With another task scheduler all tasks run fine, but that scheduler does not have features like retries with persisted parameters, which is why I want to move to Airflow.
Thanks in advance
PS: As a side effect, the tasks run slower in Airflow. Maybe some of you know why?
Related
I'm having some issues trying to display my local DAGs in Airflow.
I deployed Airflow with Docker, but it does not display the DAGs I have on my local computer; it only shows the "standard" DAGs that came with the setup defined in the docker-compose.yaml file.
The path for my DAG/log files is: C:\Users\taz\Documents\workspace (the workspace folder is where I have the dags and logs folders)
And here is the "docker-compose.yaml"
version: '3'
x-airflow-common:
&airflow-common
# In order to add custom dependencies or upgrade provider packages you can use your extended image.
# Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
# and uncomment the "build" line below, Then run `docker-compose build` to build the images.
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.2.5}
# build: .
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
AIRFLOW__CORE__DAGS_FOLDER: /opt/workspace/dags
volumes:
- ./dags: /opt/workspace/dags
- ./logs: /opt/workspace/logs
- ./plugins: /opt/workspace/plugins
user: "${AIRFLOW_UID:-50000}:0"
depends_on:
&airflow-common-depends-on
redis:
condition: service_healthy
postgres:
condition: service_healthy
services:
postgres:
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
#- postgres-db-volume:/var/lib/postgresql/data
- ./:/workspace
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
redis:
image: redis:latest
ports:
- 6379:6379
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-init:
<<: *airflow-common
command: version
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
flower:
<<: *airflow-common
command: celery flower
ports:
- 5555:5555
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
volumes:
postgres-db-volume:
And when trying to run Docker Compose, I got the error:
Cannot start Docker Compose application. Reason: Error invoking remote method 'compose-action': Error: Command failed: docker compose --file "docker-compose.yaml" --project-name "workspace" --project-directory "C:\Users\taz\Documents\workspace" up -d services.airflow-webserver.volumes.0 type is required
You haven't set the $AIRFLOW_HOME environment variable, which is why it doesn't show your DAGs: this variable is the home directory for Airflow, and your volume mount is essentially the path to your DAGs.
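For example, a sketch of how the DAGs folder setting and the volume mount could be kept aligned in the x-airflow-common block (paths taken from the compose file in the question; note that the short volume syntax must be a single string with no space after the colon):
environment:
  AIRFLOW__CORE__DAGS_FOLDER: /opt/workspace/dags
volumes:
  - ./dags:/opt/workspace/dags
  - ./logs:/opt/workspace/logs
  - ./plugins:/opt/workspace/plugins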
I recently installed Airflow 2.1.4 with Docker containers. I've successfully set up postgres, redis, the scheduler, 2x local workers, and flower on the same machine with docker-compose.
Now I want to expand, and set up workers on other machines.
I was able to get the workers up and running; Flower is able to find the worker node and the worker receives tasks from the scheduler correctly, but regardless of the result status of the task, the task is marked as failed with an error message like the one below:
*** Log file does not exist: /opt/airflow/logs/test/test/2021-10-29T14:38:37.669734+00:00/1.log
*** Fetching from: http://b7a0154e7e20:8793/log/test/test/2021-10-29T14:38:37.669734+00:00/1.log
*** Failed to fetch log file from worker. [Errno -3] Temporary failure in name resolution
Then I tried replacing AIRFLOW__CORE__HOSTNAME_CALLABLE: 'socket.getfqdn' with AIRFLOW__CORE__HOSTNAME_CALLABLE: 'airflow.utils.net.get_host_ip_address'
I got this error instead:
*** Log file does not exist: /opt/airflow/logs/test/test/2021-10-28T15:47:59.625675+00:00/1.log
*** Fetching from: http://172.18.0.2:8793/log/test/test/2021-10-28T15:47:59.625675+00:00/1.log
*** Failed to fetch log file from worker. [Errno 113] No route to host
Then I tried mapping port 8793 of the worker to its host machine (see worker_4 below), and now it returns:
*** Failed to fetch log file from worker. [Errno 111] Connection refused
but it sometimes still gives the "Temporary failure in name resolution" error.
I've also tried copying the URL from the error and replacing the IP with the host machine's IP, and got this message:
Forbidden
You don't have the permission to access the requested resource. It is either read-protected or not readable by the server.
Please let me know if additional info is needed.
Thanks in advance!
Below is my docker-compose.yml for the scheduler/webserver/flower:
version: '3.4'
x-hosts: &extra_hosts
postgres: XX.X.XX.XXX
redis: XX.X.XX.XXX
x-airflow-common:
&airflow-common
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.4}
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__CORE__DEFAULT_TIMEZONE: 'America/New_York'
AIRFLOW__CORE__HOSTNAME_CALLABLE: 'airflow.utils.net.get_host_ip_address'
AIRFLOW_WEBSERVER_DEFAULT_UI_TIMEZONE: 'America/New_York'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-slack}
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- ./assets:/opt/airflow/assets
- ./airflow.cfg:/opt/airflow/airflow.cfg
- /etc/hostname:/etc/hostname
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
extra_hosts: *extra_hosts
services:
postgres:
container_name: 'airflow-postgres'
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- ./data/postgres:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
ports:
- '5432:5432'
redis:
image: redis:latest
container_name: 'airflow-redis'
expose:
- 6379
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
ports:
- '6379:6379'
airflow-webserver:
<<: *airflow-common
container_name: 'airflow-webserver'
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
- redis
- postgres
airflow-scheduler:
<<: *airflow-common
container_name: 'airflow-scheduler'
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
- redis
- postgres
airflow-worker1:
build: ./worker_config
container_name: 'airflow-worker_1'
command: celery worker -H worker_1
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
environment:
<<: *airflow-common-env
DUMB_INIT_SETSID: "0"
restart: always
depends_on:
- redis
- postgres
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- ./assets:/opt/airflow/assets
- ./airflow.cfg:/opt/airflow/airflow.cfg
extra_hosts: *extra_hosts
airflow-worker2:
build: ./worker_config
container_name: 'airflow-worker_2'
command: celery worker -H worker_2
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
environment:
<<: *airflow-common-env
DUMB_INIT_SETSID: "0"
restart: always
depends_on:
- redis
- postgres
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- ./assets:/opt/airflow/assets
- ./airflow.cfg:/opt/airflow/airflow.cfg
extra_hosts: *extra_hosts
flower:
<<: *airflow-common
container_name: 'airflow_flower'
command: celery flower
ports:
- 5555:5555
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
- redis
- postgres
And here is my docker-compose.yml for the workers on another machine:
version: '3.4'
x-hosts: &extra_hosts
postgres: XX.X.XX.XXX
redis: XX.X.XX.XXX
x-airflow-common:
&airflow-common
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__CORE__DEFAULT_TIMEZONE: 'America/New_York'
AIRFLOW__CORE__HOSTNAME_CALLABLE: 'airflow.utils.net.get_host_ip_address'
AIRFLOW_WEBSERVER_DEFAULT_UI_TIMEZONE: 'America/New_York'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- ./assets:/opt/airflow/assets
- ./airflow.cfg:/opt/airflow/airflow.cfg
- /etc/hostname:/etc/hostname
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
extra_hosts: *extra_hosts
services:
worker_3:
build: ./worker_config
restart: always
extra_hosts: *extra_hosts
volumes:
- ./airflow.cfg:/opt/airflow/airflow.cfg
- ./dags:/opt/airflow/dags
- ./assets:/opt/airflow/assets
- ./logs:/opt/airflow/logs
- /etc/hostname:/etc/hostname
entrypoint: airflow celery worker -H worker_3
environment:
<<: *airflow-common-env
WORKER_NAME: worker_147
healthcheck:
test: ['CMD-SHELL', '[ -f /usr/local/airflow/airflow-worker.pid ]']
interval: 30s
timeout: 30s
retries: 3
worker_4:
build: ./worker_config_py2
restart: always
extra_hosts: *extra_hosts
volumes:
- ./airflow.cfg:/opt/airflow/airflow.cfg
- ./dags:/opt/airflow/dags
- ./assets:/opt/airflow/assets
- ./logs:/opt/airflow/logs
- /etc/hostname:/etc/hostname
entrypoint: airflow celery worker -H worker_4_py2 -q py2
environment:
<<: *airflow-common-env
WORKER_NAME: worker_4_py2
healthcheck:
test: ['CMD-SHELL', '[ -f /usr/local/airflow/airflow-worker.pid ]']
interval: 30s
timeout: 30s
retries: 3
ports:
- 8793:8793
For this issue: "Failed to fetch log file from worker. [Errno -3] Temporary failure in name resolution"
It looks like the worker's hostname is not being resolved correctly. The master's web process has to reach the worker to fetch the log and display it on the front-end page, and for that it needs to resolve the worker's hostname. Since the hostname cannot be resolved, add a hostname-to-IP mapping to /etc/hosts on the master (e.g. with vim /etc/hosts).
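For example, a sketch of such a mapping; the IP address and hostname here are placeholders for the remote worker machine's actual values:
# /etc/hosts on the master host
10.0.0.12   worker-host-3
# equivalently, this could go into extra_hosts on the webserver service in docker-compose:
# extra_hosts:
#   - "worker-host-3:10.0.0.12"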
1. You need to build the image that is going to be used in all your containers except the message broker, metadata database, and worker monitor. The Dockerfile follows.
2. If using LocalExecutor, the scheduler and the webserver must be on the same host.
Dockerfile:
FROM puckel/docker-airflow:1.10.9
COPY airflow/airflow.cfg ${AIRFLOW_HOME}/airflow.cfg
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
And here are the docker-compose dependencies for deploying the webserver:
webserver:
To fix it:
First of all, get the configuration file by typing:
helm show values apache-airflow/airflow > values.yaml
After that, check that fixPermissions is true. You need to enable persistent volumes:
# Enable persistent volumes
enabled: true
# Volume size for worker StatefulSet
size: 10Gi
# If using a custom storageClass, pass name ref to all statefulSets here
storageClassName:
# Execute init container to chown log directory.
fixPermissions: true
Update your installation by:
helm upgrade --install airflow apache-airflow/airflow -n ai
I am trying to set up Vault as a secrets backend with Airflow on my local machine with docker-compose, but I am unable to make a connection. I am building on top of the official Airflow docker-compose file. I have added Vault as a service and added VAULT_ADDR=http://vault:8200 as an environment variable for the Airflow application.
In one of my DAGs, I am trying to fetch a secret from Vault, but I am getting connection refused.
When the services are running, I can access the Vault CLI and create secrets, which means that Vault is running fine. I also tried docker compose exec -- airflow-webserver curl http://vault:8200 to check whether the problem is in the DAG, but I get the same connection refused error.
I also tried docker compose exec -- airflow-webserver curl http://flower:5555 just to check that the Docker networking is working, and it returned the correct response from the flower service.
# example dag
from airflow.decorators import dag, task
from airflow.hooks.base import BaseHook
from airflow.utils.dates import days_ago
default_args = {
'owner': 'BooHoo'
}
@dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2), tags=['example'])
def get_secrets():
@task()
def get():
conn = BaseHook.get_connection(conn_id='slack_conn_id')
print(f"Password: {conn.password}, Login: {conn.login}, URI: {conn.get_uri()}, Host: {conn.host}")
get()
get_secrets_dag = get_secrets()
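For reference, with the backend kwargs above (mount_point "secrets", connections_path "connections"), the connection would be written to Vault roughly like this; the URI value is just a placeholder:
vault kv put secrets/connections/slack_conn_id conn_uri='http://some-login:some-password@slack.example.com'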
Here's the docker compose file.
version: '3'
x-airflow-common:
&airflow-common
image: apache/airflow:2.1.0-python3.7
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'false' # default is true
AIRFLOW__WEBSERVER__EXPOSE_CONFIG: 'true'
# AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
AIRFLOW__SECRETS__BACKEND: 'airflow.providers.hashicorp.secrets.vault.VaultBackend'
AIRFLOW__SECRETS__BACKEND_KWARGS: '{"connections_path": "connections", "variables_path": "variables", "mount_point": "secrets", "token": "${VAULT_DEV_ROOT_TOKEN_ID}"}'
VAULT_ADDR: 'http://vault:8200'
SLACK_WEBHOOK_URL: "${SLACK_WEBHOOK_URL}"
volumes:
- ./src/dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
depends_on:
redis:
condition: service_healthy
postgres:
condition: service_healthy
vault:
condition: service_healthy
services:
vault:
image: vault:latest
ports:
- "8200:8200"
environment:
VAULT_ADDR: 'http://0.0.0.0:8200'
VAULT_DEV_ROOT_TOKEN_ID: "${VAULT_DEV_ROOT_TOKEN_ID}"
cap_add:
- IPC_LOCK
command: vault server -dev
healthcheck:
test: [ "CMD", "vault", "status" ]
interval: 5s
retries: 5
restart: always
postgres:
# service configuration
redis:
# service configurations
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- "8080:8080"
healthcheck:
test: [ "CMD", "curl", "--fail", "http://localhost:8080/health" ]
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: [ "CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"' ]
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-init:
<<: *airflow-common
command: version
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
flower:
<<: *airflow-common
# service configuration
volumes:
postgres-db-volume:
I think you need to specify the dev listen address in your command:
vault server -dev -dev-listen-address="0.0.0.0:8200"
or set
VAULT_DEV_LISTEN_ADDRESS to 0.0.0.0:8200
Here are the docs: https://www.vaultproject.io/docs/commands/server#dev-options
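A sketch of the adjusted vault service from the compose file above, using the command-line flag variant:
vault:
  image: vault:latest
  command: vault server -dev -dev-listen-address=0.0.0.0:8200
  environment:
    VAULT_DEV_ROOT_TOKEN_ID: "${VAULT_DEV_ROOT_TOKEN_ID}"
  cap_add:
    - IPC_LOCK
  ports:
    - "8200:8200"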
Problem Definition
I am trying to use two docker-compose.yml files (each in a separate directory) on the same host machine, one for Airflow and the other for another application. I have put Airflow's containers on the same named network as my other app (see the compose files below) and confirmed with docker network inspect that the Airflow containers are in the network. However, when I curl from the airflow worker container to the my_keycloak server I get the following error:
Error
Failed to connect to localhost port 9080: Connection refused
Files
Airflow docker-compose.yml
version: '3'
x-airflow-common:
&airflow-common
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
#added working directory and scripts folder 6-26-2021 CP
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
depends_on:
redis:
condition: service_healthy
postgres:
condition: service_healthy
services:
postgres:
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
redis:
image: redis:latest
ports:
- 6379:6379
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
#changed from default of 8080 because of clash with baton docker services 6-26-2021 CP
ports:
- 50309:8080
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:50309/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-scheduler:
<<: *airflow-common
command: scheduler
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-worker:
<<: *airflow-common
command: celery worker
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
restart: always
airflow-init:
<<: *airflow-common
command: version
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
flower:
<<: *airflow-common
command: celery flower
ports:
- 5555:5555
#added so that airflow can interact with baton 6-30-2021 CP
networks:
- baton_docker_files_tempo
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
volumes:
postgres-db-volume:
#added baton network so that airflow can communicate with baton cp 6-28-2021
networks:
baton_docker_files_tempo:
external: true
The other app's docker-compose file:
version: "3.7"
services:
db:
image: artifactory.redacted.com/docker/postgres:11.3
ports:
- 11101:5432
environment:
POSTGRES_PASSWORD: postgres
POSTGRES_DB: keycloaks156
networks:
- tempo
keycloak:
image: registry.git.redacted.com/tempo23/tempo23-server/keycloak:${TEMPO_VERSION:-develop}
container_name: my_keycloak
environment:
KEYCLOAK_USER: admin
KEYCLOAK_PASSWORD: admin
KEYCLOAK_DEFAULT_THEME: redacted
KEYCLOAK_WELCOME_THEME: redacted
PROXY_ADDRESS_FORWARDING: 'true'
KEYCLOAK_FRONTEND_URL: http://localhost:9080/auth
DB_VENDOR: postgres
DB_ADDR: db
DB_USER: postgres
DB_PASSWORD: postgres
ports:
- 9080:8080
networks:
- tempo
depends_on:
- db
db-migrate:
image: registry.git.redacted.com/tempo23/tempo23-server/db-migrate:${TEMPO_VERSION:-develop}
command: "-url=jdbc:postgresql://db:5432/ -user=postgres -password=postgres -connectRetries=60 migrate"
restart: on-failure:3
depends_on:
- db
networks:
- tempo
keycloak-bootstrap:
image: registry.git.redacted.com/tempo23/tempo23-server/server-full:${TEMPO_VERSION:-develop}
command: ["keycloakBootstrap", "--config", "conf/single.conf"]
depends_on:
- db
restart: on-failure:10
networks:
- tempo
server:
image: registry.git.redacted.com/tempo23/tempo23-server/server:${TEMPO_VERSION:-develop}
command: [ "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005", "conf/single.conf" ]
environment:
AKKA_PARALLELISM_MAX: "2"
DB_THREADPOOL_SIZE: "4"
UNSAFE_ENABLED: "true"
DOCKER_BIND_HOST_ROOT: "${BIND_ROOT}"
DOCKER_BIND_CONTAINER_ROOT: "/var/lib/tempo2"
MESSAGING_HOST: "server"
PUBSUB_TYPE: inmem
TEMPOJOBS_DOCKER_TAG: registry.git.redacted.com/tempo23/tempo23-server/tempojobs:${TEMPO_VERSION:-develop}
NUM_WORKER: 1
ASSET_CACHE_SIZE: 500M
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- "${BIND_ROOT}:/var/lib/tempo2"
ports:
- 2551:2551 # akka port
- 8080:8080 # application http port
- 8081:8081 # executor http port
- 5005:5005 # debug port
networks:
- tempo
restart: always
depends_on:
- db
networks:
tempo:
Read the documentation on ports carefully.
It lets you expose a container port on a host port.
Between services in the same network you can simply reach a service at service-name:port, in this case keycloak:8080 instead of localhost:9080.
It does not matter where each container is defined (any docker-compose file on the same machine will do). The only thing that matters is the network: as you mentioned in your question, they are on the same network, so they can see each other over it. The misunderstanding is that containers are isolated from each other, so instead of localhost you should use the container name and run the curl against it.
Try running:
curl keycloak:8080
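For example, a quick check from the Airflow worker container, assuming the service names used in the compose files above:
docker-compose exec airflow-worker curl -sS http://keycloak:8080
# the container_name my_keycloak should also resolve on the shared network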
I am trying to deploy a mastodon server using this project: https://github.com/tootsuite/mastodon
I am running Docker-Compose and Podman on a Fedora 33 server.
$ docker-compose --version
docker-compose version 1.27.4, build unknown
$ docker --version
podman version 3.0.1
$ cat /etc/fedora-release
Fedora release 33 (Thirty Three)
I had to make some changes to the docker-compose.yml to get it working with Podman. You can see my whole config file below.
version: '3'
services:
db:
restart: always
image: postgres:9.6-alpine
shm_size: 256mb
networks:
- internal_network
healthcheck:
test: ["CMD", "pg_isready", "-U", "postgres"]
timeout: 45s
interval: 10s
retries: 10
volumes:
- ./postgres:/var/lib/postgresql/data
environment:
- POSTGRES_HOST_AUTH_METHOD=trust
redis:
restart: always
image: redis:6.0-alpine
networks:
- internal_network
healthcheck:
test: ["CMD", "redis-cli", "ping"]
timeout: 45s
interval: 10s
retries: 10
volumes:
- ./redis:/data
# es:
# restart: always
# image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.10
# environment:
# - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
# - "cluster.name=es-mastodon"
# - "discovery.type=single-node"
# - "bootstrap.memory_lock=true"
# networks:
# - internal_network
# healthcheck:
# test: ["CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1"]
# volumes:
# - ./elasticsearch:/usr/share/elasticsearch/data
# ulimits:
# memlock:
# soft: -1
# hard: -1
web:
# build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
command: bash -c "rm -f /mastodon/tmp/pids/server.pid; bundle exec rails s -p 3000"
networks:
- external_network
- internal_network
healthcheck:
test: ["CMD-SHELL", "wget -q --spider --proxy=off localhost:3000/health || exit 1"]
timeout: 45s
interval: 10s
retries: 10
ports:
- "127.0.0.1:3000:3000"
depends_on:
- db
- redis
# - es
volumes:
- ./public/system:/mastodon/public/system
streaming:
build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
command: node ./streaming
networks:
- external_network
- internal_network
healthcheck:
test: ["CMD-SHELL", "wget -q --spider --proxy=off localhost:4000/api/v1/streaming/health || exit 1"]
timeout: 45s
interval: 10s
retries: 10
ports:
- "127.0.0.1:4000:4000"
depends_on:
- db
- redis
sidekiq:
build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
command: bundle exec sidekiq
depends_on:
- db
- redis
networks:
- external_network
- internal_network
volumes:
- ./public/system:/mastodon/public/system
## Uncomment to enable federation with tor instances along with adding the following ENV variables
## http_proxy=http://privoxy:8118
## ALLOW_ACCESS_TO_HIDDEN_SERVICE=true
# tor:
# image: sirboops/tor
# networks:
# - external_network
# - internal_network
#
# privoxy:
# image: sirboops/privoxy
# volumes:
# - ./priv-config:/opt/config
# networks:
# - external_network
# - internal_network
networks:
external_network:
internal_network:
internal: true
Here is a diff against the remote version of the file in the repository:
(tl;dr: I added options to the health checks and an environment variable to allow running postgres without a password, and commented out the build option so the image from the registry is used, as building was failing too)
$ git diff docker-compose.yml
diff --git a/docker-compose.yml b/docker-compose.yml
index 52eea7a74..a8e047ec7 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -9,8 +9,13 @@ services:
- internal_network
healthcheck:
test: ["CMD", "pg_isready", "-U", "postgres"]
+ timeout: 45s
+ interval: 10s
+ retries: 10
volumes:
- ./postgres:/var/lib/postgresql/data
+ environment:
+ - POSTGRES_HOST_AUTH_METHOD=trust
redis:
restart: always
@@ -19,6 +24,9 @@ services:
- internal_network
healthcheck:
test: ["CMD", "redis-cli", "ping"]
+ timeout: 45s
+ interval: 10s
+ retries: 10
volumes:
- ./redis:/data
@@ -42,7 +50,7 @@ services:
# hard: -1
web:
- build: .
+ # build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
@@ -52,6 +60,9 @@ services:
- internal_network
healthcheck:
test: ["CMD-SHELL", "wget -q --spider --proxy=off localhost:3000/health || exit 1"]
+ timeout: 45s
+ interval: 10s
+ retries: 10
ports:
- "127.0.0.1:3000:3000"
depends_on:
@@ -72,6 +83,9 @@ services:
- internal_network
healthcheck:
test: ["CMD-SHELL", "wget -q --spider --proxy=off localhost:4000/api/v1/streaming/health || exit 1"]
+ timeout: 45s
+ interval: 10s
+ retries: 10
ports:
- "127.0.0.1:4000:4000"
depends_on:
Generating secrets was fine, but it failed on this command:
$ sudo docker-compose run --rm web bundle exec rails db:migrate
Creating network "mastodon_internal_network" with the default driver
Creating network "mastodon_external_network" with the default driver
Creating mastodon_db_1 ... done
Creating mastodon_redis_1 ... done
Creating mastodon_web_run ... done
rails aborted!
PG::ConnectionBad: could not translate host name "db" to address: Name or service not known
I have already used the combination of Docker-Compose and Podman 3.0 on several projects and never had any issue with hostname resolution inside networks. I wonder if I must specify a driver in this situation.
I would also like a way to test whether I can reach the db service by this hostname from the web container, to rule out a problem in the code (which I highly doubt, but I want to be sure).
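For example, one way to check whether the db hostname resolves from the web container, assuming getent is available in the Mastodon image:
sudo docker-compose run --rm web getent hosts db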
EDIT 1: Here are the logs of the db service, showing that it seems to be running fine and ready to accept connections:
$ sudo docker logs -f mastodon_db_1
PostgreSQL Database directory appears to contain a database; Skipping initialization
LOG: database system was shut down at 2021-04-01 07:02:04 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
I have found a solution: removing the networks definition.
It sounds like a cheap fix, but it worked.
So the final docker-compose.yml looks like this:
version: '3'
services:
db:
restart: always
image: postgres:9.6-alpine
shm_size: 256mb
healthcheck:
test: ["CMD", "pg_isready", "-U", "postgres"]
timeout: 45s
interval: 10s
retries: 10
volumes:
- ./postgres:/var/lib/postgresql/data
environment:
- POSTGRES_HOST_AUTH_METHOD=trust
redis:
restart: always
image: redis:6.0-alpine
healthcheck:
test: ["CMD", "redis-cli", "ping"]
timeout: 45s
interval: 10s
retries: 10
volumes:
- ./redis:/data
# es:
# restart: always
# image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.10
# environment:
# - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
# - "cluster.name=es-mastodon"
# - "discovery.type=single-node"
# - "bootstrap.memory_lock=true"
# networks:
# - internal_network
# healthcheck:
# test: ["CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1"]
# volumes:
# - ./elasticsearch:/usr/share/elasticsearch/data
# ulimits:
# memlock:
# soft: -1
# hard: -1
web:
# build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
command: bash -c "rm -f /mastodon/tmp/pids/server.pid; bundle exec rails s -p 3000"
healthcheck:
test: ["CMD-SHELL", "wget -q --spider --proxy=off localhost:3000/health || exit 1"]
timeout: 45s
interval: 10s
retries: 10
ports:
- "127.0.0.1:3000:3000"
depends_on:
- db
- redis
# - es
volumes:
- ./public/system:/mastodon/public/system
streaming:
build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
command: node ./streaming
healthcheck:
test: ["CMD-SHELL", "wget -q --spider --proxy=off localhost:4000/api/v1/streaming/health || exit 1"]
timeout: 45s
interval: 10s
retries: 10
ports:
- "127.0.0.1:4000:4000"
depends_on:
- db
- redis
sidekiq:
build: .
image: tootsuite/mastodon
restart: always
env_file: .env.production
command: bundle exec sidekiq
depends_on:
- db
- redis
volumes:
- ./public/system:/mastodon/public/system
## Uncomment to enable federation with tor instances along with adding the following ENV variables
## http_proxy=http://privoxy:8118
## ALLOW_ACCESS_TO_HIDDEN_SERVICE=true
# tor:
# image: sirboops/tor
# networks:
# - external_network
# - internal_network
#
# privoxy:
# image: sirboops/privoxy
# volumes:
# - ./priv-config:/opt/config
# networks:
# - external_network
# - internal_network