Airflow settings to run Celery workers in different Docker containers?

I am running Airflow in a Docker container. I have created separate containers to run the Postgres server and the RabbitMQ server, and connected these containers using a docker network, by following this nice article. Now my Airflow container is running and connected to the other containers over the docker network - the process went smoothly so far. The problem is how to run airflow webserver, airflow scheduler and airflow worker in the same container. After some research I found that it is recommended to run one service per container. Now I have two options:
Run multiple services in the same Airflow container - I could not figure out a simple way to implement this, being a newbie to Docker.
Create separate containers to run the Celery worker and the Airflow scheduler - but in the airflow.cfg file the settings related to Celery are: broker_url = amqp://guest:guest@ksaprice_rabbitmq:8080// and celery_result_backend = db+postgresql://developer:user889@ksaprice_postgres:5432/airflow. These settings refer only to the database and RabbitMQ, which are already running in different containers - they do not refer to the IP/URL that runs Celery or the scheduler, and I assume that is because Celery and the scheduler run on the Airflow server itself.
My questions are:
Referring to point 1: Is there a simple way to run the airflow webserver, airflow scheduler and airflow worker commands in the same Airflow container?
Referring to point 2: Is there a way in airflow.cfg to configure airflow scheduler and airflow worker to run in separate Docker containers - and link them using a docker network?
I am new to both Airflow and Docker.

After spending a lot of time I found the following answers:
For the first question:
To run multiple services on the same airflow_container, do docker exec -it airflow_container bash; the CLI will now be attached to airflow_container, so run airflow worker there. Repeat the same process for airflow scheduler and airflow flower. You will then have three different CLIs running three services on the same airflow_container - this is the simplest way I found.
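A minimal sketch of that approach (the container name airflow_container is the one from the question; everything else is as described above):
# Interactive variant: one terminal per service, as described above.
docker exec -it airflow_container airflow worker
docker exec -it airflow_container airflow scheduler
docker exec -it airflow_container airflow flower
# Alternatively, detach each service so a single terminal is enough:
docker exec -d airflow_container airflow worker
docker exec -d airflow_container airflow scheduler
docker exec -d airflow_container airflow flower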
For the second question: There are options in the airflow CLI, like airflow webserver --hostname=some_host --port=some_port and airflow flower --hostname=some_host --port=some_port, to run them on different servers. But for airflow worker there is no option to run it on a different server - maybe there is some other way to run the worker on a different server.
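For reference, a sketch of those CLI options (the hostnames and ports below are placeholders, not values from the question):
# Bind the webserver and Flower to a specific interface/port in their own containers:
airflow webserver --hostname=0.0.0.0 --port=8080
airflow flower --hostname=0.0.0.0 --port=5555
# airflow worker has no --hostname option; it only needs the same airflow.cfg
# (broker_url and celery_result_backend) to be started on any other container/host:
airflow worker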

1- I did install all of these, so it is possible.
2- The optimized way is to install Airflow (webserver) + the backend DB (MySQL) on one server, the queuing system (RabbitMQ) on another, and the Celery workers on another set of servers.
Below I will quote something from the documentation that can help clarify things a bit better:
CeleryExecutor is one of the ways you can scale out the number of workers. For this to work, you need to setup a Celery backend (RabbitMQ, Redis, …) and change your airflow.cfg to point the executor parameter to CeleryExecutor and provide the related Celery settings.
Here are a few imperative requirements for your workers:
- airflow needs to be installed, and the CLI needs to be in the path
- Airflow configuration settings should be homogeneous across the cluster
- Operators that are executed on the worker need to have their dependencies met in that context. For example, if you use the HiveOperator, the hive CLI needs to be installed on that box, or if you use the MySqlOperator, the required Python library needs to be available in the PYTHONPATH somehow
- The worker needs to have access to its DAGS_FOLDER, and you need to synchronize the filesystems by your own means. A common setup would be to store your DAGS_FOLDER in a Git repository and sync it across machines using Chef, Puppet, Ansible, or whatever you use to configure machines in your environment. If all your boxes have a common mount point, having your pipelines files shared there should work as well
Source: https://airflow.readthedocs.io/en/1.10.6/howto/executor/use-celery.html
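Putting that together for the Docker setup from the question, a rough sketch (the image and network names below are illustrative; the airflow.cfg values are the ones quoted in the question) is to bake one shared airflow.cfg into a single image and run a different Airflow command per container:
# Shared airflow.cfg in the image, roughly:
#   executor = CeleryExecutor
#   broker_url = amqp://guest:guest@ksaprice_rabbitmq:8080//
#   celery_result_backend = db+postgresql://developer:user889@ksaprice_postgres:5432/airflow
# One container per Airflow service, all attached to the same docker network:
docker run -d --name airflow_webserver --network my_airflow_net my_airflow_image airflow webserver
docker run -d --name airflow_scheduler --network my_airflow_net my_airflow_image airflow scheduler
docker run -d --name airflow_worker_1  --network my_airflow_net my_airflow_image airflow worker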

Related

Airflow attempts to kill spawned docker containers but containers live on

We are using Airflow for job scheduling. To run our code in an isolated environment, we have Airflow spawn jobs via the BashOperator and docker-compose. So each airflow task creates a container. This works well but we noticed a problem with jobs that are prematurely terminated.
When a job is, for example, set to failed in the airflow web interface, the log indicates that a SIGTERM was sent and that the process no longer exists. Airflow and its components are themselves running in docker containers (see docker-compose below). Looking at the Airflow-worker container, we see that the process indeed has been killed.
However, when looking at docker ps, we see that the docker container is actually still alive! Obviously this is very dangerous, as potentially long-running tasks that are no longer running according to the airflow GUI are in fact eating up resources.
Any ideas on how to deal with this?
Airflow Version 2.4.3
We use this docker-compose.yml to deploy the different airflow components https://airflow.apache.org/docs/apache-airflow/2.4.3/docker-compose.yaml
We found a solution for our problem.
There is now a DockerOperator available in Airflow as a provider which handles stopping and removing docker containers.
Main code:
https://github.com/apache/airflow/blob/main/airflow/providers/docker/operators/docker.py
Example:
https://github.com/apache/airflow/blob/providers-docker/3.0.0/tests/system/providers/docker/example_docker_copy_data.py

Can we start one container before building the other images within the same docker-compose.yaml?

I wrote a docker-compose.yaml for two services, MySQL and JasperReports, with a Dockerfile for each service. When executing docker-compose up -d, it first builds the images for both services and then runs the containers based on the declared dependencies. However, I have the requirement that the MySQL image must be built and its container running first; only after that should the Jasper server image be built and its container started, because the Jasper server uses the MySQL host and port. Is this possible using docker-compose? How could I achieve this?
As explained in the docker-compose docs in Control startup and shutdown order in Compose, you may write a custom shell script that waits for MySQL to be ready (i.e. accepting connections) before starting the Jasper service.
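A minimal sketch of such a wait script, used as the Jasper container's entrypoint (the service name mysql, the availability of mysqladmin inside the Jasper image, and the startup command are assumptions for illustration):
#!/bin/sh
# wait-for-mysql.sh - block until the mysql service accepts connections, then start Jasper.
set -e
until mysqladmin ping -h mysql --silent; do
  echo "Waiting for MySQL to accept connections..."
  sleep 2
done
echo "MySQL is up - starting JasperReports"
exec "$@"
Note that depends_on only controls start order, not readiness, which is why such a wait script is needed.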

Docker swarm get deployment status

After running docker stack deploy to deploy some services to swarm is there a way to programmatically test if all containers started correctly?
The purpose would be to verify in a staging CI/CD pipeline that the containers are actually running and didn't fail on startup. Restart is disabled via restart_policy.
I was looking at docker stack services; is the replicas column useful for this purpose?
$ docker stack services --format "{{.ID}} {{.Replicas}}" my-stack-name
lxoksqmag0qb 0/1
ovqqnya8ato4 0/1
Yes, there are ways to do it, but it's manual and you'd have to be pretty comfortable with the docker CLI. Docker does not provide an easy built-in way to verify that docker stack deploy succeeded. There is an open issue about it.
Fortunately for us, the community has created a few tools that make up for Docker's shortcomings in this regard. Some of the most notable ones:
https://github.com/issuu/sure-deploy
https://github.com/sudo-bmitch/docker-stack-wait
https://github.com/ubirak/docker-php
Issuu, authors of sure-deploy, have a very good article describing this issue.
Typically in CI/CD I see everyone using docker or docker-compose. A container runs the same in docker as it does in docker swarm with respect to "does this container work by itself as intended".
That being said, if you still wanted to do integration testing in a multi-tier solution with swarm, you could do various things in automation. Note this would all be done on a single-node swarm to make testing easier (docker events doesn't pull node events from all nodes, so tracking a single node is much easier for CI/CD):
- Have something monitoring docker events, e.g. docker events -f service=<service-name>, to ensure containers aren't dying.
- Always have healthchecks in your containers. They are the #1 way to ensure your app is healthy (at the container level) and you'll see them succeed or fail in docker events. You can put them in Dockerfiles, service create commands, and stack/compose files (see the sketch after this list).
- You could attach another container to the same network to test your services remotely one by one using the tasks.<service-name> reverse-DNS entries. This will avoid the VIP and let you talk to a specific replica.
- You might get some useful information out of docker inspect <service-id or task-id>.
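For instance, a sketch combining a healthcheck with event monitoring (the service, image, and endpoint names are made up for illustration):
# Give the service a healthcheck so failures are visible at the container level:
docker service create --name web \
  --health-cmd 'curl -fsS http://localhost:8080/health || exit 1' \
  --health-interval 10s --health-retries 3 \
  myorg/web:latest
# On a single-node swarm, watch for that service's containers dying:
docker events --filter 'label=com.docker.swarm.service.name=web' --filter 'event=die'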
Another solution might be to use docker service scale - it will not return until the service has converged to the specified number of replicas, or it will time out.
export STACK=devstack # swarm stack name
export SERVICE_APP=yourservice # service name
export SCALE_APP=2 # desired amount of replicas
docker stack deploy -c docker-compose.yml $STACK --with-registry-auth
docker service scale ${STACK}_${SERVICE_APP}=${SCALE_APP}
One drawback of that method is that you need to provide the service names and their replica counts (but these can be extracted from the compose spec file using jq, as sketched below).
Also, in my use case I had to specify a timeout by prepending the timeout command, i.e. timeout 60 docker service scale, because docker service scale was waiting for its own timeout even if some containers failed, which could potentially slow down continuous delivery pipelines.
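A hedged sketch of that extraction, assuming Compose v2 (docker compose config --format json) and jq are available; STACK is the same stack name used for docker stack deploy:
docker compose -f docker-compose.yml config --format json \
  | jq -r '.services | to_entries[] | "\(.key)=\(.value.deploy.replicas // 1)"' \
  | while IFS='=' read -r svc replicas; do
      timeout 60 docker service scale "${STACK}_${svc}=${replicas}"
    done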
References
Docker CLI: docker service scale
jq - command-line JSON processor
GNU Coreutils: timeout command
You can call this for every service; it returns when the service has converged (all OK):
docker service update STACK_SERVICENAME
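A sketch of that, looped over every service in the stack (the stack name is the one from the question; --detach=false makes the CLI block until the service converges):
for svc in $(docker stack services -q my-stack-name); do
  docker service update --detach=false "$svc"
done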

Why does DataDog prefer the Docker-based Agent installation?

According to the DataDog Docker Integration Docs:
There are two ways to run the [DataDog] Agent: directly on each host, or within a docker-dd-agent container. We recommend the latter.
Why is a Docker-based agent installation preferred over just installing the DataDog agent directly as a service on the box that's running the Docker containers?
One of Docker's main features is portability, and it makes sense to bind Datadog into that environment. That way they are packaged and deployed together, and you don't have the overhead of installing Datadog manually everywhere you choose to deploy.
What they are also implying is that you should use docker-compose and turn your application / docker container into a multi-container Docker application, running your image(s) alongside the Datadog agent. Thus you will not need to write/build/run/manage a container via Dockerfile, but rather add the agent image to your docker-compose.yml along with its configuration. Starting your multi-container application will still be as easy as:
docker-compose up
It's really convenient and gives you additional features like their autodiscovery service.
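As a sketch, the agent is just one more container next to your application; the image name, mounts, and DD_API_KEY environment variable below follow Datadog's published docker run instructions for the newer datadog/agent image (the older docker-dd-agent image uses API_KEY instead), and the API key itself is a placeholder:
docker run -d --name dd-agent \
  -e DD_API_KEY=<your-api-key> \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup/:ro \
  datadog/agent:latest
In a compose setup, the same image, environment variable, and mounts simply become one more service in docker-compose.yml.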

Recommended way to run a Docker Compose stack in production?

I have a couple of compose files (docker-compose.yml) describing a simple Django application (five containers, three images).
I want to run this stack in production - to have the whole stack begin on boot, and for containers to restart or be recreated if they crash. There aren't any volumes I care about and the containers won't hold any important state and can be recycled at will.
I haven't found much information on using specifically docker-compose in production in such a way. The documentation is helpful but doesn't mention anything about starting on boot, and I am using Amazon Linux so don't (currently) have access to Docker Machine. I'm used to using supervisord to babysit processes and ensure they start on boot up, but I don't think this is the way to do it with Docker containers, as they end up being ultimately supervised by the Docker daemon?
As a simple start I am thinking to just put restart: always on all my services and make an init script to do docker-compose up -d on boot. Is there a recommended way to manage a docker-compose stack in production in a robust way?
EDIT: I'm looking for a 'simple' way to run the equivalent of docker-compose up for my container stack in a robust way. I know upfront that all the containers declared in the stack can reside on the same machine; in this case I don't have need to orchestrate containers from the same stack across multiple instances, but that would be helpful to know as well.
Compose is a client-side tool, but when you run docker-compose up -d all the container options are sent to the Engine and stored. If you specify restart as always (or preferably unless-stopped, to give you more flexibility) then you don't need to run docker-compose up every time your host boots.
When the host starts, provided you have configured the Docker daemon to start on boot, Docker will start all the containers that are flagged to be restarted. So you only need to run docker-compose up -d once and Docker takes care of the rest.
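A minimal sketch of that setup, assuming a systemd-based host (on older Amazon Linux the first step would be sudo chkconfig docker on instead):
# 1. Make sure the Docker daemon itself starts on boot:
sudo systemctl enable docker
# 2. Start the stack once, with restart: always (or unless-stopped) set on every service:
docker-compose up -d
# 3. Optionally change the policy on containers that are already running:
docker update --restart unless-stopped $(docker-compose ps -q)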
As to orchestrating containers across multiple nodes in a Swarm - the preferred approach will be to use Distributed Application Bundles, but that's currently (as of Docker 1.12) experimental. You'll basically create a bundle from a local Compose file which represents your distributed system, and then deploy that remotely to a Swarm. Docker moves fast, so I would expect that functionality to be available soon.
You can find more information about using docker-compose in production in the documentation. But, as they mention, Compose is primarily aimed at development and testing environments.
If you want to run your containers in production, I would suggest using a proper container orchestration tool, such as Kubernetes.
If you can organize your Django application as a swarmkit service (docker 1.11+), you can orchestrate the execution of your application with Task.
Swarmkit has a restart policy (see swarmctl flags)
Restart Policies: The orchestration layer monitors tasks and reacts to failures based on the specified policy.
The operator can define restart conditions, delays and limits (maximum number of attempts in a given time window). SwarmKit can decide to restart a task on a different machine. This means that faulty nodes will gradually be drained of their tasks.
Even if your "cluster" has only one node, the orchestration layer will make sure your containers are always up and running.
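For example, a sketch of such a policy with the docker service CLI (the service and image names are placeholders; the same options are available as swarmctl flags and as deploy.restart_policy in a stack file): restart on failure, wait 5 seconds between attempts, and allow at most 3 attempts within a 2-minute window.
docker service create --name django-web \
  --restart-condition on-failure \
  --restart-delay 5s \
  --restart-max-attempts 3 \
  --restart-window 120s \
  myorg/django-app:latest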
You say that you use AWS, so why not use ECS, which is built for what you are asking? You create an application which is the pack of your five containers, and you configure which and how many EC2 instances you want in your cluster.
You just have to convert your docker-compose.yml to the specific Dockerrun.aws.json format, which is not hard.
AWS will start your containers when you deploy and also restart them in case of a crash.
