We are using Airflow for job scheduling. To run our code in an isolated environment, we have Airflow spawn jobs via the BashOperator and docker-compose, so each Airflow task creates a container. This works well, but we noticed a problem with jobs that are terminated prematurely.
When a job is, for example, set to failed in the Airflow web interface, the log indicates that a SIGTERM was sent and that the process no longer exists. Airflow and its components are themselves running in Docker containers (see the docker-compose file below). Looking at the Airflow worker container, we see that the process has indeed been killed.
However, docker ps shows that the Docker container is actually still alive! Obviously this is very dangerous: potentially long-running tasks that are no longer running according to the Airflow GUI are in fact still eating up resources.
Any ideas on how to deal with this?
Airflow Version 2.4.3
We use this docker-compose.yaml to deploy the different Airflow components: https://airflow.apache.org/docs/apache-airflow/2.4.3/docker-compose.yaml
We found a solution to our problem.
There is now a DockerOperator available as an Airflow provider, which handles stopping and removing Docker containers.
Main code:
https://github.com/apache/airflow/blob/main/airflow/providers/docker/operators/docker.py
Example:
https://github.com/apache/airflow/blob/providers-docker/3.0.0/tests/system/providers/docker/example_docker_copy_data.py
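For illustration, here is a minimal DAG sketch using that DockerOperator (the image name and command are hypothetical placeholders, and auto_remove takes a boolean in older provider versions but a string such as "success" or "force" in newer ones):

from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

# Sketch only: requires the apache-airflow-providers-docker package.
with DAG(
    dag_id="docker_operator_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    run_job = DockerOperator(
        task_id="run_job",
        image="my-job-image:latest",              # hypothetical image built for the job
        command="python /app/run_job.py",         # hypothetical command inside the image
        auto_remove=True,                         # remove the container once the task finishes
        docker_url="unix://var/run/docker.sock",  # default socket; the worker needs access to it
    )

Because the operator manages the container through the Docker API, killing the task (for example by marking it failed in the UI) triggers the operator's on_kill handling, which stops the container - the part that was missing with the BashOperator approach.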
I'm using Docker to run a Java REST service in a container. If I were outside a container, I might use a process manager/supervisor to ensure that the Java service restarts if it encounters a strange one-off error. I see some posts about using supervisord inside containers, but they seem to focus mostly on running multiple services rather than on keeping a single one up.
What is the common way of managing services that run in containers? Should I just be using some built-in Docker functionality on the container itself rather than trying to include a process manager?
You should not use a process supervisor inside a single-service Docker container. A process supervisor effectively hides the health of your service, making it harder to detect when you have a problem.
You should rely on your container orchestration layer (which may be Docker itself, or a higher level tool like Docker Swarm or Kubernetes) to restart the container if the service fails.
With Docker (or Docker Swarm), this means setting a restart policy on the container.
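For example (a sketch; the image and container names are hypothetical), a restart policy can be set when the container is started, or changed on an existing container:

docker run -d --restart unless-stopped my-rest-service-image
docker update --restart on-failure:5 my_container

With unless-stopped, Docker restarts the container whenever its main process exits, unless you stopped the container explicitly.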
I am new to Docker containers. We have containers being deployed, and due to some internal application network bugs, the process running in a container hangs and the container is never terminated. While we debug this issue, I would like a way to find all those containers and set up a cron job to periodically check for and kill the relevant ones.
So how would I determine from "docker ps -a" which containers should be dropped, and how would I go about it? Any ideas? We are eventually moving to Kubernetes, which will help with these issues.
Docker already has a command to clean up the Docker environment; you can run it manually, or set up a job to run the following command:
$ docker system prune
Remove all unused containers, networks, images (both dangling and
unreferenced), and optionally, volumes.
Refer to the documentation for more details on advanced usage.
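Note that prune only removes containers that have already exited; a hung but still-running container has to be stopped first (docker stop <container-id>). As a sketch for a cron job (the 24h value is just an example), a non-interactive variant would be:

docker container prune --force --filter "until=24h"

This removes stopped containers older than 24 hours without prompting for confirmation.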
I am running Airflow in a Docker container. I have created separate containers to run the Postgres server and the RabbitMQ server, and connected these containers using a Docker network, by following this nice article. Now my Airflow container is running and connected to the other containers over the Docker network; the process went smoothly so far. The problem is how to run airflow webserver, airflow scheduler and airflow worker in the same container. After some research I found that it is recommended to run one service per container. Now I have two options:
Run multiple services in the same Airflow container - I could not figure out a simple way to implement this, being a newbie in Docker.
Create separate containers to run the Celery worker and the Airflow scheduler - but in the airflow.cfg file, the settings related to Celery are: broker_url = 'amqp://guest:guest#ksaprice_rabbitmq:8080//', celery_result_backend = db+postgresql://developer:user889#ksaprice_postgres:5432/airflow. These settings refer to the database and RabbitMQ, which are already running in different containers - they do not refer to the IP/URL that runs Celery and the scheduler, and I assume that is because Celery and the scheduler run on the Airflow server.
My questions are:
Referring to point 1: Is there a simple way to run the airflow webserver, airflow scheduler and airflow worker commands in the same Airflow container?
Referring to point 2: Is there a way in airflow.cfg to configure the Airflow scheduler and the Airflow worker to run in separate Docker containers - and link them using a Docker network?
I am a newbie to Airflow and Docker.
After spending a lot of time, I found the following answers:
For the first question:
To run multiple services on the same airflow_container, do: docker exec -it airflow_container bash; a CLI will now be attached to airflow_container, and from there run airflow worker. Repeat the same process for airflow scheduler and airflow flower. You will then have three different CLIs running three services on the same airflow_container - this is the simplest way I found (see the sketch below).
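A sketch of that approach, using the container name from the question and the 1.x-era CLI commands; running each service detached (-d) avoids keeping three terminals open:

docker exec -d airflow_container airflow worker
docker exec -d airflow_container airflow scheduler
docker exec -d airflow_container airflow flower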
For the second question: There are options in the airflow CLI, like airflow webserver --hostname=some_host --port=some_port and airflow flower --hostname=some_host --port=some_port, to run them on different servers. But for airflow worker there is no such option to run it on a different server - maybe there is some other way to run the worker on a different server.
1- I did install all of these, so it's possible.
2- The optimized way is to install Airflow (webserver) + backend DB (MySQL) on one server, queuing (RabbitMQ) on another, and the Celery part on another set of servers.
Below I will quote something from the source that can help clarify things a bit better:
CeleryExecutor is one of the ways you can scale out the number of
workers. For this to work, you need to setup a Celery backend
(RabbitMQ, Redis, …) and change your airflow.cfg to point the executor
parameter to CeleryExecutor and provide the related Celery settings.
Here are a few imperative requirements for your workers:
airflow needs to be installed, and the CLI needs to be in the path
Airflow configuration settings should be homogeneous across the
cluster
Operators that are executed on the worker need to have their
dependencies met in that context. For example, if you use the
HiveOperator, the hive CLI needs to be installed on that box, or if
you use the MySqlOperator, the required Python library needs to be
available in the PYTHONPATH somehow
The worker needs to have access to its DAGS_FOLDER, and you need to
synchronize the filesystems by your own means. A common setup would be
to store your DAGS_FOLDER in a Git repository and sync it across
machines using Chef, Puppet, Ansible, or whatever you use to configure
machines in your environment. If all your boxes have a common mount
point, having your pipelines files shared there should work as well
Source: https://airflow.readthedocs.io/en/1.10.6/howto/executor/use-celery.html
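As a rough sketch of the airflow.cfg changes that excerpt describes, using the broker and result-backend values from the question above (section and key names vary between Airflow versions, and the credential separator in these URLs should be @ rather than #):

[core]
executor = CeleryExecutor

[celery]
broker_url = amqp://guest:guest@ksaprice_rabbitmq:8080//
celery_result_backend = db+postgresql://developer:user889@ksaprice_postgres:5432/airflow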
Whenever I execute
docker-compose start
docker-compose ps
I see my containers with the state "UP". If I do
docker-compose up -d
I see more verbose output, but the containers end up in the same state. Is there any difference between the two commands?
docker-compose start
(https://docs.docker.com/compose/reference/start/)
Starts existing containers for a service.
docker-compose up
(https://docs.docker.com/compose/reference/up/)
Builds, (re)creates, starts, and attaches to containers for a service.
Unless they are already running, this command also starts any linked services.
The docker-compose up command aggregates the output of each container
(essentially running docker-compose logs -f). When the command exits,
all containers are stopped. Running docker-compose up -d starts the
containers in the background and leaves them running.
If there are existing containers for a service, and the service’s
configuration or image was changed after the container’s creation,
docker-compose up picks up the changes by stopping and recreating the
containers (preserving mounted volumes). To prevent Compose from
picking up changes, use the --no-recreate flag.
For the complete CLI reference:
https://docs.docker.com/compose/reference/
In the Docker frequently asked questions, this is explained very clearly:
What’s the difference between up, run, and start?
Typically, you want docker-compose up. Use up to start or restart
all the services defined in a docker-compose.yml. In the default
“attached” mode, you see all the logs from all the containers. In
“detached” mode (-d), Compose exits after starting the containers, but
the containers continue to run in the background.
The docker-compose run command is for running “one-off” or “adhoc”
tasks. It requires the service name you want to run and only starts
containers for services that the running service depends on. Use run
to run tests or perform an administrative task such as removing or
adding data to a data volume container. The run command acts like
docker run -ti in that it opens an interactive terminal to the
container and returns an exit status matching the exit status of the
process in the container.
The docker-compose start command is useful only to restart containers
that were previously created, but were stopped. It never creates new
containers.
What is the difference between up, run and start in Docker Compose?
docker-compose up: Builds, (re)creates, and starts containers. It also attaches to containers for a service.
docker-compose run: Runs one-off or ad-hoc tasks. The service name has to be provided, and Compose starts only that specific service plus the other services that the target service depends on (if any). It is helpful for testing containers and for performing administrative tasks.
docker-compose start: Starts previously stopped containers; it can't create new ones.
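In short (the service name web is hypothetical):

docker-compose up -d        # (re)creates containers whose config or image changed, then starts everything
docker-compose start        # only starts existing, stopped containers; never creates anything
docker-compose run web bash # one-off container for the "web" service, plus the services it depends on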
Brand spanking new to Docker here. I have Docker running on a remote VM and am running a single dummy container on it (I can verify the container is running by issuing a docker ps command).
I'd like to secure my Docker installation by giving a non-root user access to Docker:
sudo usermod -aG docker myuser
But I'm afraid to muck around with Docker while any containers are running in case "hot deploys" create problems. So this has me wondering, in general: if I want to do any sort of operational work on Docker (daemon, I presume) while there are live containers running on it, what do I have to do? Do all containers need to be stopped/halted first? Or will Docker keep on ticking and apply the updates when appropriate?
Same goes for the containers themselves. Say I have a myapp-1.0.4 container deployed to a Docker daemon. Now I want to deploy myapp-1.0.5; how does this work? Do I stop 1.0.4, remove it from Docker, and then deploy/run 1.0.5? Or does Docker handle this for me under the hood?
if I want to do any sort of operational work on Docker (daemon, I presume) while there are live containers running on it, what do I have to do? Do all containers need to be stopped/halted first? Or will Docker keep on ticking and apply the updates when appropriate?
Usually, all containers are stopped first.
That typically happens when I upgrade Docker itself: I find all my containers stopped (except the data containers, which are only ever created, and remain so).
Say I have a myapp-1.0.4 container deployed to a Docker daemon. Now I want to deploy myapp-1.0.5, how does this work? Do I stop 1.0.4, remove it from Docker, and then deploy/run 1.0.5? Or does Docker handle this for me under the hood?
That depends on the nature and requirements of your app: for a completely stateless app, you could even run 1.0.5 (with different host ports mapped to your app's exposed port), test it a bit, and stop 1.0.4 when you think 1.0.5 is ready (a sketch of this rollover follows below).
But for an app with any kind of shared state or resource (mounted volumes, a shared data container, ...), you would need to stop and rm 1.0.4 before starting the new container from the 1.0.5 image.
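A sketch of the stateless rollover, assuming a hypothetical myapp image that exposes port 8080:

docker run -d --name myapp-1.0.5 -p 8081:8080 myapp:1.0.5
# test the new version on host port 8081, then retire the old one:
docker stop myapp-1.0.4 && docker rm myapp-1.0.4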
(1) why don't you stop them [the data containers] when upgrading Docker?
Because... they were never started in the first place.
In the lifecycle of a container, you can create, then start, then run a container. But a data container, by definition, has no process to run: it just exposes VOLUME(s) for other containers to mount (--volumes-from).
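A sketch of that pattern (image and names are just examples): the data container is only created, never started, and another container mounts its volume:

docker create -v /var/lib/mysql --name mysql-data busybox
docker run -d --name mysql-server -e MYSQL_ROOT_PASSWORD=secret --volumes-from mysql-data mysql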
(2) What's the difference between a data/volume container, and a Docker container running, say a full bore MySQL server?
The difference is, again, that a data container doesn't run any process, so it doesn't exit when said process stops. That never happens, since there is no process to run.
The MySQL server container would be running as long as the server process doesn't stop.