I've been trying to set up an Airflow instance on a set of Docker containers for the sake of learning. I know just the core basics of both Docker and Airflow.
I've been following this guide on how to set up Airflow using Docker.
When following the guide I can start up the containers and everything works as expected.
My next step was to start changing different configuration options in the docker-compose.yml file. I started by trying to set different usernames and passwords.
I created a .env file in the same directory as the docker-compose.yml file. See below.
AIRFLOW_UID=50000
USER='user'
PASSWORD='password'
DATABASE='airflow'
_AIRFLOW_WWW_USER_USERNAME='user'
_AIRFLOW_WWW_USER_PASSWORD='password'
In order to use these environment variables, I made some changes to the connection strings in docker-compose.yml as well. The original lines have been commented out. See below.
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, then run `docker-compose build` to build the images.
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.4.2}
  # build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${USER}:${PASSWORD}@postgres/${DATABASE}
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${USER}:${PASSWORD}@postgres/${DATABASE}
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${USER}:${PASSWORD}@postgres/${DATABASE}
    # -- Original settings --
    #AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    #AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    #AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: ${USER}
      POSTGRES_PASSWORD: ${PASSWORD}
      POSTGRES_DB: ${DATABASE}
      # -- Original settings --
      #POSTGRES_USER: airflow
      #POSTGRES_PASSWORD: airflow
      #POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", '${USER}']
      interval: 5s
      retries: 5
    restart: always
First I run docker-compose down --volumes --rmi all in order to clean up the environment. I then run docker-compose up airflow-init. Below are the last lines displayed in the terminal.
data_collection_proj-airflow-init-1 | User "user" created with role "Admin"
data_collection_proj-airflow-init-1 | /home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:367: FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
data_collection_proj-airflow-init-1 | FutureWarning,
data_collection_proj-airflow-init-1 | 2.4.2
data_collection_proj-airflow-init-1 exited with code 0
Apart from the FutureWarning, no errors were displayed, so it seems to work fine.
I then run docker-compose up and get the following error.
data_collection_proj-postgres-1 | 2022-11-12 17:47:08.939 UTC [165] FATAL: database "user" does not exist
My changes seem to destroy the set-up but I don't understand why.
You can try this healthcheck test syntax; it works fine for me:
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: ${USER}
      POSTGRES_PASSWORD: ${PASSWORD}
      POSTGRES_DB: ${DATABASE}
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "sh -c 'pg_isready -U ${USER} -d ${DATABASE}'"]
      interval: 5s
      retries: 5
    restart: always
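The -d ${DATABASE} flag in the healthcheck is the important difference: pg_isready defaults the database name to the user name, so with the settings from the question the probe effectively checks for a database literally called "user", which matches the FATAL line postgres logged. Roughly:

pg_isready -U user              # dbname defaults to the username, so this probes database "user"
pg_isready -U user -d airflow   # this probes the intended database

A second thing worth checking in this setup: USER is a variable most shells already export, and docker-compose lets the shell environment take precedence over the .env file, so ${USER} may not resolve to the value written in .env. Running docker compose config prints the file with all substitutions applied, which makes this easy to verify; renaming the variable to something like POSTGRES_USER sidesteps the collision entirely.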
Related
I am running an Airflow workflow in Docker to ingest data into GCP but I keep getting this error message in the terminal.
After running the following docker compose commands, I get the errors below.
$ docker compose build
$ docker compose up
Error:
Error response from daemon: Mounts denied:
The path /.google/credentials/google_credentials.json is not shared from the host and is not known to Docker.
You can configure shared paths from Docker -> Preferences... -> Resources -> File Sharing.
Below are the file structure and the Docker YAML file, too.
YAML
version: "3"
services:
postgres:
image: postgres:13
env_file:
- .env
volumes:
- postgres-db-volume:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
scheduler:
build: .
command: scheduler
restart: on-failure
depends_on:
- postgres
env_file:
- .env
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- ./scripts:/opt/airflow/scripts
- ~/.google/credentials/:/.google/credentials.json
webserver:
build: .
entrypoint: ./scripts/entrypoint.sh
restart: on-failure
depends_on:
- postgres
- scheduler
env_file:
- .env
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
- /.google/credentials/google_credentials.json:/.google/credentials:ro
- ./scripts:/opt/airflow/scripts
user: "${AIRFLOW_UID:-50000}:0"
ports:
- "8082:8080"
healthcheck:
test: ["CMD-SHELL", "[ -f /home/airflow/airflow-webserver.pid ]"]
interval: 30s
timeout: 30s
retries: 3
volumes:
postgres-db-volume:
environment file (.env)
AIRFLOW_UID=50000
#PG_HOST=pgdatabase
#PG_USER=root
#PG_PASSWORD=root
#PG_PORT=5432
#PG_DATABASE=ny_taxi
# Custom
COMPOSE_PROJECT_NAME=de-queries
GOOGLE_APPLICATION_CREDENTIALS=/.google/credentials/google_credentials.json
AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT=google-cloud-platform://?extra__google_cloud_platform__key_path=/.google/credentials/google_credentials.json
# AIRFLOW_UID=
GCP_PROJECT_ID=
GCP_GCS_BUCKET=
# Postgres
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow
# Airflow
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=10
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
AIRFLOW_CONN_METADATA_DB=postgres+psycopg2://airflow:airflow@postgres:5432/airflow
AIRFLOW_VAR__METADATA_DB_SCHEMA=airflow
_AIRFLOW_WWW_USER_CREATE=True
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=airflow
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
AIRFLOW__CORE__LOAD_EXAMPLES=False
shell file (entrypoint.sh)
#!/usr/bin/env bash
export GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS}
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT=${AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT}
airflow db upgrade
airflow users create -r Admin -u admin -p admin -e admin@example.com -f admin -l airflow
# "$_AIRFLOW_WWW_USER_USERNAME" -p "$_AIRFLOW_WWW_USER_PASSWORD"
airflow webserver
I have tried changing my environment variable to a relative path
GOOGLE_APPLICATION_CREDENTIALS=Users/<username>/.google/credentials/google_credentials.json
I have also tried using the terminal to export the path with
export GOOGLE_APPLICATION_CREDENTIALS=/.google/credentials/google_credentials.json
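The error message itself points at the fix: Docker Desktop only mounts host paths that are shared under Preferences -> Resources -> File Sharing, and an absolute path at the filesystem root like /.google/credentials/google_credentials.json is not one of them. As a hedged sketch, assuming the key file actually lives at ~/.google/credentials/google_credentials.json on the host, the webserver mount would become:

volumes:
  # host file on the left (home directories are shared with Docker Desktop by default),
  # container path on the right
  - ~/.google/credentials/google_credentials.json:/.google/credentials/google_credentials.json:ro

The scheduler service has the inverse problem (a host directory, ~/.google/credentials/, mounted onto a container file path) and would need the same file-to-file shape.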
This question already has answers here: Docker Compose wait for container X before starting Y (20 answers). Closed 6 months ago.
I have a docker-compose.yml that looks like this:
version: '3.8'
services:
  api:
    container_name: todoapp-api
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 3333:3333
    depends_on:
      postgresdb:
        condition: service_healthy
  postgresdb:
    image: postgres:13
    restart: unless-stopped
    environment:
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=mypassword
      - POSTGRES_DB=mydb
    volumes:
      - postgres:/var/lib/postgresql/data
    ports:
      - '5432:5432'
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready']
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  postgres:
The problem is that I keep getting error code P1001 when I run this. I figured adding a healthcheck would do the trick, but that doesn't work. Plus, depends_on: with condition: seems not to be supported in v3 and above. Is there a way I can write a script, maybe in my package.json, to wait for the database to be up?
You could wait for the database in the service that needs it. To do so, create an entrypoint.sh file and have it run as the main command. It would be something like this:
entrypoint.sh
#!/bin/bash
set -e
# Sometimes the database is not ready yet!
# Check that the database is ready and we can connect to it
# before the rest of the code runs.
echo "Waiting for database..."
echo DB_NAME: ${DB_NAME}
echo DB_HOST: ${DB_HOST}
echo DB_PORT: ${DB_PORT}
while ! nc -z ${DB_HOST} ${DB_PORT}; do sleep 1; done
echo "Connected to database."
# ... Run what you have to here
Dockerfile
...
CMD /usr/app/entrypoint.sh
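Two practical caveats with this sketch: nc must actually exist inside the image, and the script must be executable. Hypothetical Dockerfile lines covering both, assuming a Debian-based base image (the package name varies by distro):

# install netcat for the nc-based wait loop
RUN apt-get update && apt-get install -y --no-install-recommends netcat && rm -rf /var/lib/apt/lists/*
COPY entrypoint.sh /usr/app/entrypoint.sh
RUN chmod +x /usr/app/entrypoint.sh

And since the question mentions package.json: the wait-on npm package can block until a TCP port accepts connections, so a prestart script such as "prestart": "wait-on tcp:postgresdb:5432" is another way to get the same effect without touching the Dockerfile.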
I am running Keycloak on Docker, and I want to import a realm that was previously exported. But I get the error: Error during startup: org.keycloak.models.ModelException: Invalid config for hashAlgorithm: Password hashing provider not found
I read that I needed to copy jbcrypt.jar into the container, in the standalone/deployments folder, so I created a Dockerfile to copy jbcrypt. How can I provide a password hashing provider for bcrypt?
Dockerfile:
FROM jboss/keycloak:latest
ARG KEYCLOAK_HOME=/opt/jboss/keycloak
RUN curl -L https://github.com/leroyguillaume/keycloak-bcrypt/releases/download/1.5.0/keycloak-bcrypt-1.5.0.jar > $KEYCLOAK_HOME/standalone/deployments/keycloak-bcrypt-1.5.0.jar
RUN curl -L https://repo1.maven.org/maven2/org/mindrot/jbcrypt/0.4/jbcrypt-0.4.jar > $KEYCLOAK_HOME/standalone/deployments/jbcrypt-0.4.jar
RUN cd $KEYCLOAK_HOME/standalone/deployments/ && ls
RUN /opt/jboss/keycloak/bin/jboss-cli.sh --command="module add --name=org.mindrot.jbcrypt --resources=$KEYCLOAK_HOME/standalone/deployments/jbcrypt-0.4.jar"
RUN /opt/jboss/keycloak/bin/jboss-cli.sh --command="module add --name=org.mindrot.keycloakbcrypt --resources=$KEYCLOAK_HOME/standalone/deployments/keycloak-bcrypt-1.5.0.jar"
docker-compose.yml
version: '3'
volumes:
  postgres_data:
    driver: local
services:
  ncbs_core_keycloak_postgres:
    image: postgres:latest
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: password
    restart: always
  ncbs_core_keycloak:
    # image: jboss/keycloak:latest
    build: .
    environment:
      DB_VENDOR: POSTGRES
      DB_ADDR: ncbs_core_keycloak_postgres
      DB_DATABASE: keycloak
      DB_USER: keycloak
      DB_SCHEMA: public
      DB_PASSWORD: password
      KEYCLOAK_USER: admin
      KEYCLOAK_PASSWORD: admin
      KEYCLOAK_IMPORT: ./imports/realm-export.json
      # Uncomment the line below if you want to specify JDBC parameters. The parameter below is just an example, and it shouldn't be used in production without knowledge. It is highly recommended that you read the PostgreSQL JDBC driver documentation in order to use it.
      #JDBC_PARAMS: "ssl=true"
    command:
      - "-b 0.0.0.0"
      - "-Dkeycloak.migration.action=import"
      - "-Dkeycloak.migration.provider=singleFile"
      - "-Dkeycloak.migration.file=/opt/jboss/keycloak/imports/realm-export.json"
      - "-Dkeycloak.migration.strategy=OVERWRITE_EXISTING"
    volumes:
      - ./imports:/opt/jboss/keycloak/imports
    ports:
      - 8080:8080
    depends_on:
      - ncbs_core_keycloak_postgres
I've used a volume in docker-compose to get it working without a restart.
Download the lib from GitHub and mount it as a volume like below:
volumes:
  - ./keycloak-bcrypt-1.5.0.jar:/opt/jboss/keycloak/standalone/deployments/keycloak-bcrypt-1.5.0.jar
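This works because the WildFly-based Keycloak images hot-deploy any JAR that appears in standalone/deployments, so a bind mount is enough and no image rebuild is needed. One caveat: for an imported realm that references hashAlgorithm bcrypt to start cleanly, the provider JAR has to be in place before the import runs, so the mount and the import should live on the same service. A hypothetical combined service definition, reusing the names from the question:

ncbs_core_keycloak:
  image: jboss/keycloak:latest
  volumes:
    # bcrypt provider, hot-deployed by the WildFly deployment scanner
    - ./keycloak-bcrypt-1.5.0.jar:/opt/jboss/keycloak/standalone/deployments/keycloak-bcrypt-1.5.0.jar
    # realm export to be imported at startup
    - ./imports:/opt/jboss/keycloak/imports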
I am using docker-compose for Airflow, as described in the docker-compose.yml file below,
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    image: puckel/docker-airflow:1.10.1
    build:
      context: https://github.com/puckel/docker-airflow.git#1.10.1
      dockerfile: Dockerfile
      args:
        AIRFLOW_DEPS: gcp_api,s3
        PYTHON_DEPS: sqlalchemy==1.2.0
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./examples/intro-example/dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
which I took from the tuanavu GitHub repo.
My aim is to enable importing Airflow variables from a JSON file with the following command:
airflow variables --import /variables.json
I want to override the entrypoint.sh file used by the docker image puckel/docker-airflow:1.10.1. This can be done by adding the following block to that entrypoint.sh:
if [ -e "/variables.json" ]; then
airflow variables --import /variables.json
fi
at a specific place in the script.
Is there a way I can do this in the docker-compose.yml file?
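docker-compose cannot edit a file inside the image, but it can shadow one: bind-mounting a modified copy of the script over /entrypoint.sh replaces it when the container starts. A sketch, under the assumption that my-entrypoint.sh is a local copy of the image's entrypoint.sh (the puckel image uses /entrypoint.sh as its entrypoint) with the if-block above inserted at the desired place:

webserver:
  image: puckel/docker-airflow:1.10.1
  volumes:
    # shadow the image's entrypoint with the locally modified copy
    - ./my-entrypoint.sh:/entrypoint.sh
    # make the variables file visible where the if-block expects it
    - ./variables.json:/variables.json

The host copy must be executable (chmod +x my-entrypoint.sh), since a bind mount carries the host file's permissions into the container.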
I'm trying to import configuration from one Keycloak instance into many different Keycloak instances (each instance is for the same application, just different stages in my CI/CD flow).
I'm running Keycloak through Docker and finding it difficult to import the required JSON file.
To get the actual data I want imported, I went to the required realm and simply clicked the export button with clients etc. selected. This downloaded a file to my browser, which I now want imported when I build my Docker containers.
I've tried a lot of different methods I've found online and nothing seems to be working, so I'd appreciate some help.
The first thing I tried was to import the file through the docker-compose file using the following
KEYCLOAK_IMPORT: /realm-export.json
The next thing I tried was also in my docker-compose where I tried
command: "-b 0.0.0.0 -Djboss.http.port=8080 -Dkeycloak.migration.action=import -Dkeycloak.import=realm-export.json
Finally, I tried going into my Dockerfile and running the import as my CMD using the following
CMD ["-b 0.0.0.0", "-Dkeycloak.import=/opt/jboss/keycloak/realm-export.json"]
Below are my current docker-compose and Dockerfile without the imports added; they might be of some help in answering this question. Thanks in advance.
# Dockerfile
FROM jboss/keycloak:4.8.3.Final
COPY keycloak-metrics-spi-1.0.1-SNAPSHOT.jar keycloak/standalone/deployments
And the Keycloak-related section of my docker-compose file:
postgres:
  image: postgres
  volumes:
    - postgres_data:/var/lib/postgresql/data
  environment:
    POSTGRES_DB: keycl0ak
    POSTGRES_USER: keycl0ak
    POSTGRES_PASSWORD: password
  ports:
    - 5431:5431
keycloak:
  build:
    context: services/keycloak
  environment:
    DB_VENDOR: POSTGRES
    DB_ADDR: postgres
    DB_DATABASE: keycl0ak
    DB_USER: keycl0ak
    DB_PASSWORD: password
    KEYCLOAK_USER: administrat0r
    KEYCLOAK_PASSWORD: asc88a8c0ssssqs
  ports:
    - 8080:8080
  depends_on:
    - postgres
volumes:
  postgres_data:
    driver: local
Explanation
First you need to copy the file into your container before you can import it into Keycloak. You could place your realm-export.json in a folder next to the docker-compose.yml; let's say we call it imports. Copying it in can be achieved using volumes:. Once the file is available in the container, you can use command: as you were before, pointing at the correct file within the container.
File Structure
/your_computer/keycloak_stuff/
|-- docker-compose.yml
|-- imports -> realm-export.json
Docker-Compose
This is how the docker-compose.yml should look with the changes:
postgres:
  image: postgres
  volumes:
    - postgres_data:/var/lib/postgresql/data
  environment:
    POSTGRES_DB: keycl0ak
    POSTGRES_USER: keycl0ak
    POSTGRES_PASSWORD: password
  ports:
    - 5431:5431
keycloak:
  build:
    context: services/keycloak
  volumes:
    - ./imports:/opt/jboss/keycloak/imports
  command:
    - "-b 0.0.0.0 -Dkeycloak.import=/opt/jboss/keycloak/imports/realm-export.json"
  environment:
    DB_VENDOR: POSTGRES
    DB_ADDR: postgres
    DB_DATABASE: keycl0ak
    DB_USER: keycl0ak
    DB_PASSWORD: password
    KEYCLOAK_USER: administrat0r
    KEYCLOAK_PASSWORD: asc88a8c0ssssqs
  ports:
    - 8080:8080
  depends_on:
    - postgres
volumes:
  postgres_data:
    driver: local
To wrap up the answers of @JesusBenito and @raujonas, the docker-compose could be changed so that you make use of the Keycloak environment variable KEYCLOAK_IMPORT:
keycloak:
  volumes:
    - ./imports:/opt/jboss/keycloak/imports
  # command: not needed anymore
  # - "-b 0.0.0.0 -Dkeycloak.import=/opt/jboss/keycloak/imports/realm-export.json"
  environment:
    KEYCLOAK_IMPORT: /opt/jboss/keycloak/imports/realm-export.json -Dkeycloak.profile.feature.upload_scripts=enabled
    DB_VENDOR: POSTGRES
    DB_ADDR: postgres
    DB_DATABASE: keycl0ak
    DB_USER: keycl0ak
    DB_PASSWORD: password
    KEYCLOAK_USER: administrat0r
    KEYCLOAK_PASSWORD: asc88a8c0ssssqs
This config worked for me:
keycloak:
  image: mihaibob/keycloak:15.0.1
  container_name: keycloak
  ports:
    - "9091:8080"
  volumes:
    - ./src/test/resources/keycloak:/tmp/import
  environment:
    ...
    KEYCLOAK_IMPORT: /tmp/import/global.json
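One caveat for newer setups: KEYCLOAK_IMPORT only applies to the legacy WildFly-based images (jboss/keycloak and derivatives such as the one above). On the Quarkus-based images (Keycloak 17+), the equivalent mechanism, as far as I know, is to mount the exported JSON into /opt/keycloak/data/import and start with --import-realm:

keycloak:
  image: quay.io/keycloak/keycloak:21.1
  command: start-dev --import-realm
  volumes:
    # every realm JSON in this folder is imported at startup
    - ./imports:/opt/keycloak/data/import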