How to add Airflow variables in a docker-compose file?

I have a docker-compose file which spins up a local Airflow instance, as below:
version: '3.7'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"
  webserver:
    image: puckel/docker-airflow:1.10.6
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ${HOME}/.aws:/usr/local/airflow/.aws
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
I want to add some Airflow variables which the underlying DAG uses, e.g. CONFIG_BUCKET. I have added AIRFLOW_VAR_CONFIG_BUCKET=s3://foo-bucket to the environment section of the webserver, but it does not seem to work. Any ideas how I can achieve this?

You should not add the variables to the webserver, but to the scheduler. If you are using the LocalExecutor, tasks run in the context of the scheduler.
Actually, what you should really do is set the same environment variables for all the containers; this is explained in https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html:
Use the same configuration across all the Airflow components. While each component does not require all, some configurations need to be same otherwise they would not work as expected. A good example for that is secret_key which should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
There are a number of ways you can do it; just read the docker-compose documentation on that: https://docs.docker.com/compose/environment-variables. You can also see the "Quick start" docker-compose from the Airflow docs, where we used anchors, which is a bit more sophisticated way: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html
Just note that the "quick start" should be just inspiration; it is nowhere near a production setup, and if you want to make your own docker-compose you really need a deeper understanding of docker-compose, as warned in the note in our docs.
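For illustration, here is a minimal sketch of the anchor approach, assuming the puckel/docker-airflow image from the question; the x-airflow-environment key and anchor name are made up, and the point is only that the same environment block (including AIRFLOW_VAR_CONFIG_BUCKET) is reused by every Airflow container:
version: '3.7'
# Hypothetical sketch: one shared environment block, reused via a YAML anchor.
x-airflow-environment: &airflow-common-env
  LOAD_EX: 'n'
  EXECUTOR: Local
  AIRFLOW_VAR_CONFIG_BUCKET: 's3://foo-bucket'
services:
  webserver:
    image: puckel/docker-airflow:1.10.6
    environment: *airflow-common-env
    command: webserver
  scheduler:
    image: puckel/docker-airflow:1.10.6
    environment: *airflow-common-env
    command: scheduler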

If you add an environment variable named AIRFLOW_VAR_CONFIG_BUCKET to the list under environment:, it should be accessible by Airflow. Sounds like you're doing that correctly.
Two things to note:
Variables (& connections) set via environment variables are not visible in the Airflow UI. You can test if they exist by executing Variable.get("config_bucket") in code.
The Airflow scheduler/worker (depending on the Airflow executor) requires access to the variable while running a task. Adding the variable to the webserver is not required.

Related

Does Sidekiq running multiple times when we replicate the Ruby on Rails container?

I have a Ruby on Rails container with Sidekiq in it.
I have a scheduled/cron job in the application. As we know, scheduled jobs are registered when the application boots.
If I assume that there will be many users one day and I create multiple container instances for the application, will this cause Sidekiq to execute on multiple containers?
If the answer is yes (each container executes the job), how do I make the job execute on only one container?
version: '3.8'
services:
  thedatabase:
    image: postgres:12
    volumes:
      - ./database:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: theuser
      POSTGRES_PASSWORD: thepass
    ports:
      - '5432:5432'
    networks:
      - excnetwork
  theapi:
    build:
      context: ./dex-api
      dockerfile: Dockerfile
    volumes:
      - ./dex-api:/web
    env_file:
      - ./dex-api/.env
    ports:
      - "8081:3000"
    depends_on:
      - thedatabase
    networks:
      - excnetwork
  redis:
    image: redis
    command: 'redis-server --requirepass "yourpassword"'
    volumes:
      - ./redis:/var/lib/redis/data
    ports:
      - '6379:6379'
    networks:
      - excnetwork
volumes:
  database:
  dex-api:
  redis:
networks:
  excnetwork:
    driver: bridge
[If I] create multiple container instances for the application, will this cause Sidekiq to execute on multiple containers?
Yes; but Sidekiq is clever enough to only run each job once, even if there are multiple workers. Similarly, if you're using the paid version of Sidekiq and its periodic jobs, they shouldn't be duplicated since "the leader process enqueues cron jobs" (doc link).
You don't describe how you're starting Rails and Sidekiq, but your Compose setup only has one application container. You might find it useful to split these into two, which will make it easier to scale the components and protect the Web application if a scheduled job has a problem. In your Dockerfile, start one or the other (I'd default to the Rails server):
ENTRYPOINT ["bundle", "exec"]
CMD ["rails", "server", "-b", "0.0.0.0", "-p", "3000"]
In your Compose file you can start multiple containers from the same image, overriding the command: in some of them if required.
services:
  theapi:
    build: ./dex-api
    env_file:
      - ./dex-api/.env
    ports:
      - "8081:3000"
    depends_on:
      - thedatabase
      - redis
  theworker:
    build: ./dex-api    # <-- same as `theapi`
    command: sidekiq    # <-- override Dockerfile `CMD`
    env_file:
      - ./dex-api/.env
    depends_on:
      - thedatabase
      - redis
While there's no single answer that solves it all, you can use tried-and-tested means. One of them is to put Sidekiq in a sidecar. You definitely need Sidekiq to run as a single, independent instance.
There are other patterns for a multi-container solution, such as ambassador and adapter, but a sidecar seems to be the most appropriate option in this instance.
Read more here:
https://medium.com/swlh/pattern-sidecars-ambassadors-and-adapters-containers-ec8f4140c495
Differences between Sidecar and Ambassador and Adapter pattern
https://www.weave.works/blog/container-design-patterns-for-kubernetes/

docker-compose failing saying image doesn't exist

I am trying to take a very difficult, multi-step docker setup and turn it into an easy docker-compose. I haven't really used docker-compose before. I am used to using a Dockerfile to build an image, then running something like
docker run --name mysql -v ${PWD}/sql-files:/docker-entrypoint-initdb.d ... -h mysql -d mariadb:10.4
Then I run the web app in the same manner after building its Dockerfile, which is simple. Trying to combine these into a docker-compose.yml file seems to be quite difficult. I'll post my docker-compose.yml file (edited to remove passwords and such) and the error I am getting; hopefully someone can figure out why it's failing, because I have no idea...
docker-compose.yml
version: "3.7"
services:
mysql:
image: mariadb:10.4
container_name: mysql
environment:
MYSQL_ROOT_PASSWORD: passwordd1
MYSQL_ALLOW_EMPTY_PASSWORD: "true"
volumes:
- ./sql-files:/docker-entrypoint-initdb.d
- ./ssl:/var/lib/mysql/ssl
- ./tls.cnf:/etc/mysql/conf.d/tls.cnf
healthcheck:
test: ["CMD", "mysqladmin ping"]
interval: 10s
timeout: 2s
retries: 10
web:
build: ./base/newdockerfile
container_name: web
hostname: dev.website.com
volumes:
- ./ssl:/etc/httpd/ssl
- ./webfiles:/var/www
depends_on:
mysql:
condition: service_healthy
ports:
- "8443:443"
- "8888:80"
entrypoint:
- "/sbin/httpd"
- "-D"
- "FOREGROUND"
The error I get when running docker-compose up in the terminal window is...
Service 'web' depends on service 'mysql' which is undefined.
Why would mysql be undefined? It's the first definition in the yml file and has a healthcheck associated with it. Also, it fails very quickly, within a few seconds, so there's no way all the healthchecks ran and failed, and I'm not getting any other errors in the terminal window. I do docker-compose up and within a couple of seconds I get that error. Any help would be greatly appreciated. Thank you.
According to this documentation:
depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it.
Designing your application so it's resilient when the database is not available or not yet set up is something we all have to deal with. A healthcheck doesn't guarantee the database is ready before the next stage. The best way is probably to write a wait-for-it script (or use wait-for) and run it after depends_on:
depends_on:
  - "db"
command: ["./wait-for-it.sh"]

Docker Compose dependent named container

I am trying to bring up my container using docker compose. This container is dependent on two other containers with names 'rabbitmq' & 'mysqldb'.
My scenario is that the dependent named containers were already created with those names. This is because of some additional configuration I needed for MySQL.
How can I link those two containers to this container using docker-compose, such that it also starts up my named containers when bringing up this myservice container?
Would appreciate any help or direction.
myservice:
  image: myaccount/myservice
  ports:
    - "8081:8081"
  restart: always
  depends_on:
    - rabbitmq
    - mysqldb
  environment:
    SPRING_DATASOURCE_URL: 'jdbc:mysql://mysqldb:3306/myservice'
    SPRING_PROFILES_ACTIVE: 'mysql'
    SPRING_RABBITMQ_HOST: 'rabbitmq'
  healthcheck:
    test: ["CMD", "curl", "-f", "http://192.168.99.100:8081/actuator/health"]
    interval: 1m30s
    timeout: 10s
    retries: 3
UPDATE:
I was able to resolve this using external_links and the default bridge.
version: '3'
myservice:
  image: myaccount/myservice
  ports:
    - "8081:8081"
  restart: always
  external_links:
    - rabbitmq
    - mysqldb
  network_mode: bridge
  environment:
    SPRING_DATASOURCE_URL: 'jdbc:mysql://mysqldb:3306/myservice'
    SPRING_PROFILES_ACTIVE: 'mysql'
    SPRING_RABBITMQ_HOST: 'rabbitmq'
  healthcheck:
    test: ["CMD", "curl", "-f", "http://192.168.99.100:8081/actuator/health"]
    interval: 1m30s
    timeout: 10s
    retries: 3
Any other alternative is appreciated. The problem with this approach is that the dependent docker containers already need to be running. However, in case those containers are down, I want compose to bring the same containers up. Any ideas?
docker-compose up doesn't run already created containers. It creates the containers from an image and then runs them. So I'll presume that you already have images (or Dockerfiles) for these containers. However, you can use --no-recreate with docker-compose up to reuse already-built containers; this could be a workaround if you have a problem with the regular usage.
Under services in your docker-compose.yml you simply need to define a service for your other two images too. Example below.
services:
  service1:
    image: images/service1image
    depends_on:
      - service2
      - service3
  service2:
    image: images/service2image
  service3:
    build: docker/service3
You don't need to define a network, as terrywb suggested, if you define them this way: docker-compose automatically creates a bridge network for general use and attaches all defined services to it. However, if you don't define them as services, then you'd likely need to define a network to connect them; of course, if you do that, you won't be able to automatically start them up at the same time using docker-compose, which is the whole issue you're trying to solve. If you really don't want them as services, then I can only suggest you create a "startup.sh" bash/shell script to handle this, as then you'd be trying to do something outside the scope of the functionality docker-compose provides.
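As a rough sketch of the services approach for this question (the rabbitmq/mysql image tags, MySQL settings and config mount are placeholders for whatever extra configuration the existing named containers were created with):
services:
  myservice:
    image: myaccount/myservice
    depends_on:
      - rabbitmq
      - mysqldb
    environment:
      SPRING_DATASOURCE_URL: 'jdbc:mysql://mysqldb:3306/myservice'
      SPRING_PROFILES_ACTIVE: 'mysql'
      SPRING_RABBITMQ_HOST: 'rabbitmq'
  rabbitmq:
    image: rabbitmq:3-management
  mysqldb:
    image: mysql:5.7
    # Recreate the manual MySQL configuration here instead of on a hand-built container.
    volumes:
      - ./mysql-conf.d:/etc/mysql/conf.d
    environment:
      MYSQL_DATABASE: myservice
      MYSQL_ROOT_PASSWORD: example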
I believe that you need to define a network in each of your docker-compose files.
networks:
  mynet:
Next, add that network to each container definition
myservice:
  networks:
    - mynet

Docker-Compose: how to wait for other service to be ready?

I have the following docker-compose, where I need to wait for the service jhipster-registry to be up and accepting connections before starting myprogram-app.
I tried the healthcheck way, following the official doc: https://docs.docker.com/compose/compose-file/compose-file-v2/
version: '2.1'
services:
  myprogram-app:
    image: myprogram
    mem_limit: 1024m
    environment:
      - SPRING_PROFILES_ACTIVE=prod,swagger
      - EUREKA_CLIENT_SERVICE_URL_DEFAULTZONE=http://admin:$${jhipster.registry.password}#jhipster-registry:8761/eureka
      - SPRING_CLOUD_CONFIG_URI=http://admin:$${jhipster.registry.password}#jhipster-registry:8761/config
      - SPRING_DATASOURCE_URL=jdbc:postgresql://myprogram-postgresql:5432/myprogram
      - JHIPSTER_SLEEP=0
      - SPRING_DATA_ELASTICSEARCH_CLUSTER_NODES=myprogram-elasticsearch:9300
      - JHIPSTER_REGISTRY_PASSWORD=53bqDrurQAthqrXG
      - EMAIL_USERNAME
      - EMAIL_PASSWORD
    ports:
      - 8080:8080
    networks:
      - backend
    depends_on:
      - jhipster-registry:
          "condition": service_started
      - myprogram-postgresql
      - myprogram-elasticsearch
  myprogram-postgresql:
    image: postgres:9.6.5
    mem_limit: 256m
    environment:
      - POSTGRES_USER=myprogram
      - POSTGRES_PASSWORD=myprogram
    networks:
      - backend
  myprogram-elasticsearch:
    image: elasticsearch:2.4.6
    mem_limit: 512m
    networks:
      - backend
  jhipster-registry:
    extends:
      file: jhipster-registry.yml
      service: jhipster-registry
    mem_limit: 512m
    ports:
      - 8761:8761
    networks:
      - backend
    healthcheck:
      test: "exit 0"
networks:
  backend:
    driver: "bridge"
but I get the following error when running docker-compose up:
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.myprogram-app.depends_on contains {"jhipster-registry": {"condition": "service_started"}}, which is an invalid type, it should be a string
Am I doing something wrong, or is this feature no longer supported? How can I achieve this synchronization between services?
Updated version
version: '2.1'
services:
  myprogram-app:
    image: myprogram
    mem_limit: 1024m
    environment:
      - SPRING_PROFILES_ACTIVE=prod,swagger
      - EUREKA_CLIENT_SERVICE_URL_DEFAULTZONE=http://admin:$${jhipster.registry.password}#jhipster-registry:8761/eureka
      - SPRING_CLOUD_CONFIG_URI=http://admin:$${jhipster.registry.password}#jhipster-registry:8761/config
      - SPRING_DATASOURCE_URL=jdbc:postgresql://myprogram-postgresql:5432/myprogram
      - JHIPSTER_SLEEP=0
      - SPRING_DATA_ELASTICSEARCH_CLUSTER_NODES=myprogram-elasticsearch:9300
      - JHIPSTER_REGISTRY_PASSWORD=53bqDrurQAthqrXG
      - EMAIL_USERNAME
      - EMAIL_PASSWORD
    ports:
      - 8080:8080
    networks:
      - backend
    depends_on:
      jhipster-registry:
        condition: service_healthy
      myprogram-postgresql:
        condition: service_started
      myprogram-elasticsearch:
        condition: service_started
    #restart: on-failure
  myprogram-postgresql:
    image: postgres:9.6.5
    mem_limit: 256m
    environment:
      - POSTGRES_USER=myprogram
      - POSTGRES_PASSWORD=tuenemreh
    networks:
      - backend
  myprogram-elasticsearch:
    image: elasticsearch:2.4.6
    mem_limit: 512m
    networks:
      - backend
  jhipster-registry:
    extends:
      file: jhipster-registry.yml
      service: jhipster-registry
    mem_limit: 512m
    ports:
      - 8761:8761
    networks:
      - backend
    healthcheck:
      test: ["CMD", "curl", "-f", "http://jhipster-registry:8761", "|| exit 1"]
      interval: 30s
      retries: 20
      #start_period: 30s
networks:
  backend:
    driver: "bridge"
The updated version gives me a different error,
ERROR: for myprogram-app Container "8ebca614590c" is unhealthy.
ERROR: Encountered errors while bringing up the project.
saying that the jhipster-registry container is unhealthy, even though it's reachable via the browser. How can I fix the healthcheck command to make it work?
Best Approach - Resilient App Starts
While Docker does support startup dependencies, they officially recommend updating your app's start logic to test for the availability of external dependencies and retry. This has lots of benefits for robust applications that may restart in the wild on the fly, in addition to circumventing the race condition in docker compose up.
depends_on & service_healthy - Compose 1.27.0+
The condition form of depends_on is back in docker-compose v1.27.0+ (it had been dropped in the v3 file format) via the Compose Specification.
Each dependency should also implement a healthcheck so that it can report (via service_healthy) whether it's fully set up and ready for downstream services.
version: '3.0'
services:
  php:
    build:
      context: .
      dockerfile: tests/Docker/Dockerfile-PHP
    depends_on:
      redis:
        condition: service_healthy
  redis:
    image: redis
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 1s
      timeout: 3s
      retries: 30
wait-for-it.sh
The recommended approach from Docker, according to their docs on Control startup and shutdown order in Compose, is to download wait-for-it.sh, which takes in the host:port to poll and then executes the next set of commands if successful.
version: "2"
services:
web:
build: .
ports:
- "80:8000"
depends_on:
- "db"
command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
db:
image: postgres
Note: this requires overriding the startup command of the image, so make sure you know what the image's default command was in order to maintain parity of the default startup.
Further Reading
Docker Compose wait for container X before starting Y
Difference between links and depends_on in docker_compose.yml
How can I wait for a docker container to be up and running?
Docker Compose Wait til dependency container is fully up before launching
depends_on doesn't wait for another service in docker-compose 1.22.0
The documentation suggests that, in Docker Compose version 2 files specifically, depends_on: can be a list of strings, or a mapping where the keys are service names and the values are conditions. For the services where you don't have (or need) health checks, there is a service_started condition.
depends_on:
  # notice: these lines don't start with "-"
  jhipster-registry:
    condition: service_healthy
  myprogram-postgresql:
    condition: service_started
  myprogram-elasticsearch:
    condition: service_started
Depending on how much control you have over your program and its libraries, it's better still if you can arrange for the service to be able to start without its dependencies necessarily being available (equivalently, to keep functioning if its dependencies die while the service is running), and not use the depends_on: option. You might return an HTTP 503 Service Unavailable error if the database is down, for instance. Another strategy that is often helpful is to exit immediately if your dependencies aren't available, but use a setting like restart: on-failure to ask the orchestrator to restart the service.
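A minimal sketch of that second strategy, reusing the app service name from the question (the image name is illustrative):
services:
  myprogram-app:
    image: myprogram
    # If the app exits because the registry or database is unreachable,
    # Compose restarts the container until its dependencies come up.
    restart: on-failure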
Update to version 3+ and follow the documentation for version 3:
There are several things to be aware of when using depends_on:
depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it. Version 3 no longer supports the condition form of depends_on.
The depends_on option is ignored when deploying a stack in swarm mode with a version 3 Compose file.
I would consider using the restart_policy option for configuring your myprogram-app to restart until the jhipster-registry is up and accepting connections:
restart_policy:
  condition: on-failure
  delay: 3s
  max_attempts: 5
  window: 60s
With the new docker compose API, we can now use the new --wait option:
docker compose up --wait
If your service has a healthcheck, Docker waits until it has the "healthy" status; otherwise, it waits for the service to be started. That's why it is crucial to have relevant healthchecks for all your services.
Note that this option automatically activates the --detach option.
Check out the documentation here.
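For example, a hypothetical sketch of a service whose healthy status docker compose up --wait can block on:
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
    healthcheck:
      # pg_isready exits 0 once the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
With this in place, docker compose up --wait returns only after db reports healthy.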
The best approach I found is to check for the desired port in the entrypoint. There are different ways to do that, e.g. wait-for-it, but I like to use this solution, which is cross-platform between alpine and bash images and doesn't download custom scripts from GitHub:
Install netcat-openbsd (works with apt and apk). Then in the entrypoint (works with both #!/bin/bash and #!/bin/sh):
#!/bin/bash
wait_for()
{
    echo "Waiting $1 seconds for $2:$3"
    timeout $1 sh -c 'until nc -z $0 $1; do sleep 0.1; done' $2 $3 || return 1
    echo "$2:$3 available"
}
wait_for 10 db 5432
wait_for 10 redis 6379
You can also make this into a 1-liner if you don't want to print anything.
Although you already got an answer, it should be mentioned that what you are trying to achieve has some nasty risks.
Ideally a service should be self-sufficient and smart enough to retry and wait for its dependencies to become available (before giving up and going down). Otherwise you will be more exposed to one failure propagating to other services. Also consider that a system reboot, unlike a manual start, might ignore the dependency order.
If one service crash brings your whole system down, you might have a tool to restart everything again, but it would be better to have services that can withstand that case.
After trying several approaches, IMO the simplest and most elegant option is using the jwilder/dockerize (dockerize) utility image with its -wait flag. Here is a simple example where I need a PostgreSQL database to be ready before starting my app:
version: "3.8"
services:
# Start Postgres.
db:
image: postgres
# Wait for Postgres to be joinable.
check-db-started:
image: jwilder/dockerize:0.6.1
depends_on:
- db
command: 'dockerize -wait=tcp://db:5432'
# Only start myapp once Postgres is joinable.
myapp:
image: myapp:latest
depends_on:
- check-db-started

What is the alternative to condition form of depends_on in docker-compose Version 3?

Compose file version 2.1 offers the nice feature of specifying a condition with depends_on. The current docker-compose documentation states:
Version 3 no longer supports the condition form of depends_on.
Unfortunately, the documentation does not explain why the condition form was removed, and it lacks any specific recommendation on how to implement that behaviour from V3 upwards.
There's been a move away from specifying container dependencies in Compose. They're only valid at startup time and don't work when dependent containers are restarted at run time. Instead, each container should include a mechanism to retry reconnecting to dependent services when the connection is dropped. Many libraries for connecting to databases or REST API services have configurable built-in retries. I'd look into that. It is needed for production code anyway.
From 1.27.0, the 2.x and 3.x formats are merged into the COMPOSE_SPEC schema.
version is now optional, so you can just remove it and specify a condition as before:
services:
  web:
    build: .
    depends_on:
      redis:
        condition: service_healthy
  redis:
    image: redis
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 1s
      timeout: 3s
      retries: 30
There are some external tools that let you mimic this behaviour. For example, with the dockerize tool you can wrap your CMD or ENTRYPOINT with dockerize -wait and that will prevent running your application until specified services are ready.
If your docker-compose file used to look like this:
version: '2.1'
services:
  kafka:
    image: spotify/kafka
    healthcheck:
      test: nc -z localhost 9092
  webapp:
    image: foo/bar # your image
    healthcheck:
      test: curl -f http://localhost:8080
  tests:
    image: bar/foo # your image
    command: YOUR_TEST_COMMAND
    depends_on:
      kafka:
        condition: service_healthy
      webapp:
        condition: service_healthy
then you can use dockerize in your v3 compose file like this:
version: '3.0'
services:
  kafka:
    image: spotify/kafka
  webapp:
    image: foo/bar # your image
  tests:
    image: bar/foo # your image
    command: dockerize -wait tcp://kafka:9092 -wait http://webapp:8080 YOUR_TEST_COMMAND
Just thought I'd add my solution for running postgres and an application via docker-compose, where I need the application to wait for the init SQL script to complete before starting.
dockerize seems to wait for the DB port (5432) to be available, which is the equivalent of the depends_on that can be used in compose version 3:
version: '3'
services:
  app:
    container_name: back-end
    depends_on:
      - postgres
  postgres:
    image: postgres:10-alpine
    container_name: postgres
    ports:
      - "5432:5432"
    volumes:
      - ./docker-init:/docker-entrypoint-initdb.d/
The Problem:
If you have a large init script, the app will start before that completes, as depends_on (and the port wait) only checks that the DB port is reachable, not whether initialization has finished.
Although I do agree that the solution should be implemented in the application logic, the problem we have only arises when we want to run tests and prepopulate the database with test data, so it made more sense to implement a solution outside the code, as I tend not to like introducing code "to make tests work".
The Solution:
Implement a healthcheck on the postgres container.
For me, that meant checking that the command of PID 1 is postgres, since a different command runs as PID 1 while the init DB scripts are running.
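A hypothetical sketch of such a check (the author's actual healthcheck.sh is mounted into the postgres container in the compose file further down):
healthcheck:
  # PID 1 is the entrypoint shell while the init scripts run and only becomes
  # the postgres server binary once initialization has finished.
  test: ["CMD-SHELL", '[ "$(cat /proc/1/comm)" = "postgres" ]']
  interval: 5s
  timeout: 5s
  retries: 10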
Write a script on the application side which will wait for postgres to become healthy. The script looks like this:
#!/bin/bash
function check {
    STATUS=$(curl -s --unix-socket /var/run/docker.sock http://localhost/v1.24/containers/postgres/json | python -c 'import sys, json; print(json.load(sys.stdin)["State"]["Health"]["Status"])')
    if [ "$STATUS" = "healthy" ]; then
        return 0
    fi
    return 1
}
until check; do
    echo "Waiting for postgres to be ready"
    sleep 5
done
echo "Postgres ready"
Then docker-compose should mount the script directories, so that we don't have to edit the Dockerfile of the application (or of a custom postgres image); this way we can continue using the Dockerfiles of the published images.
We also override the entrypoint defined in the app's Dockerfile, so that the wait script runs before the app starts:
version: '3'
services:
  app:
    container_name: back-end
    entrypoint: ["/bin/sh","-c","/opt/app/wait/wait-for-postgres.sh && <YOUR_APP_START_SCRIPT>"]
    depends_on:
      - postgres
    volumes:
      - //var/run/docker.sock:/var/run/docker.sock
      - ./docker-scripts/wait-for-postgres:/opt/app/wait
  postgres:
    image: postgres:10-alpine
    container_name: postgres
    ports:
      - "5432:5432"
    volumes:
      - ./docker-init:/docker-entrypoint-initdb.d/
      - ./docker-scripts/postgres-healthcheck:/var/lib
    healthcheck:
      test: /var/lib/healthcheck.sh
      interval: 5s
      timeout: 5s
      retries: 10
I reached this page because one container would not wait for the one it depended upon, and I had to run docker system prune to get it working. There was an orphaned-container error that prompted me to run the prune.
