After installing puckel/docker-airflow locally, no task instance is running and tasks get stuck forever - docker

I used this tutorial to install Airflow with Docker on my local Mac: http://www.marknagelberg.com/getting-started-with-airflow-using-docker/ and everything worked well. I have the UI and I can connect my DAGs.
However, when I trigger a task manually it never runs and stays stuck forever, and I get an error message.
[screenshot of the stuck task in the web UI]
I work on a Mac and I have used this code:
docker pull puckel/docker-airflow
docker run -d -p 8080:8080 -v /path/to/dags:/usr/local/airflow/dags puckel/docker-airflow webserver
Does someone have an idea of how I could fix this? Thanks for your help.

Is the Airflow scheduler running?
The Airflow webserver only shows the DAGs and the task statuses; it is the scheduler that actually runs the tasks.
In the command you showed above, the scheduler is never started.
So you can run the command below in another console.
docker ps | grep airflow
Use the command above to get the container ID.
docker exec -it [container ID] airflow scheduler
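If you cannot keep a second console attached, a possible variant is to start the scheduler detached (a sketch; it assumes exactly one container based on the puckel/docker-airflow image is running):
# start the scheduler in the background of the already-running container
docker exec -d $(docker ps -q --filter ancestor=puckel/docker-airflow) airflow scheduler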
Ultimately, though, I suggest using docker-compose.

Instead of plain docker commands, use docker-compose to manage everything related to your Docker stack.
Here is a sample configuration for my puckel/docker-airflow based Airflow:
version: '3'
services:
  postgres:
    image: 'postgres:12'
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    volumes:
      - ./pg_data:/var/lib/postgresql/data
  webserver:
    image: puckel/docker-airflow:1.10.9
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/usr/local/airflow/dags
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
To use it, you can:
1- Create a project folder and copy the reference code above into
docker-compose.yml
2- Check that the configuration is valid with the following docker-compose command:
docker-compose config
3- Bring the docker-compose project up with:
docker-compose up
Note: if you do not want to see detailed logs, you can run it in the background with:
docker-compose up -d
Now you can access the Airflow UI in your browser at the following URL:
http://<the host ip>:8080
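A couple of optional sanity checks once the stack is up (a sketch; the service names match the file above — with EXECUTOR=Local the puckel image's entrypoint also starts a scheduler next to the webserver, which is why a single webserver service is enough here):
docker-compose ps                                  # both services should be listed as Up
docker-compose logs -f webserver                   # webserver and scheduler output
docker-compose exec webserver airflow list_dags    # confirm Airflow 1.10.x sees your DAGs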
Good luck
WY

Related

Can't Get Rid of Old Docker-Compose Services

Two weeks ago I created a docker-compose.yml file to start two services, but this week when I try to start those services Docker appends a "-1" to the service name. I am using Docker Desktop on a Windows 10 machine. Here is my yml file:
services:
  pgdatabase:
    image: postgres:13
    environment:
      - POSTGRES_USER=####
      - POSTGRES_PASSWORD=####
      - POSTGRES_DB=ny_taxi
    volumes:
      - "./ny_taxi_postgres_data:/var/lib/postgresql/data:rw"
    ports:
      - "5432:5432"
  pgadmin:
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=#########.com
      - PGADMIN_DEFAULT_PASSWORD=####
    ports:
      - "8080:80"
This worked perfectly when I created it, but now when I run docker-compose up the containers that get created are pgadmin-1 and pgdatabase-1.
If I then run docker-compose down, and do a docker ps the output shows that no containers are running. However, if I run docker-compose config --services I get the following:
pgadmin
pgdatabase
Restarting Docker does nothing, and the issue occurs even if I delete all containers and all volumes from Docker Desktop.
docker-compose start returns service "pgadmin" has no container to start. If I run docker-compose up and then docker-compose start pgadmin I get no output from the command line. However, listing the active containers after doing this still only shows pgadmin-1. Running docker-compose down after these steps does not resolve the issue.
docker rm -f pgadmin returns Error: No such container: pgadmin.
docker service rm pgadmin returns Error: No such service: pgadmin.
docker-compose up -d --force-recreate --renew-anon-volumes just creates pgadmin-1 and pgdatabase-1 again.
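(For what it's worth, docker-compose config --services only parses the YAML file and lists the service names defined in it; it says nothing about containers. A few commands that show what Docker itself knows about, sketched with the project above in mind:)
docker-compose ps -a                                     # containers for this project, including stopped ones
docker ps -a --filter label=com.docker.compose.project   # every Compose-managed container on the machine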

Docker-compose how to update celery without rebuild?

I am working on my django + celery + docker-compose project.
Problem:
I changed Django code.
The update only takes effect after docker-compose up --build.
How can I enable code updates without a rebuild?
I found this answer, Developing with celery and docker, but didn't understand how to apply it.
docker-compose.yml
version: '3.9'
services:
  django:
    build: ./project # path to Dockerfile
    command: sh -c "
      gunicorn --bind 0.0.0.0:8000 core_app.wsgi"
    volumes:
      - ./project:/project
      - ./project/static:/project/static
      - media-volume:/project/media
    expose:
      - 8000
  celery:
    build: ./project
    command: celery -A documents_app worker --loglevel=info
    volumes:
      - ./project:/usr/src/app
      - media-volume:/project/media
    depends_on:
      - django
      - redis
  .........
volumes:
  pg_data:
  static:
  media-volume:
Updating code without rebuilding is achievable, and it is best practice when working with containers; otherwise it takes too much time and effort to create a new image every time you change the code.
The most popular way of doing this is to mount your code directory into the container using one of the two methods below.
In your docker-compose.yml
services:
  web:
    volumes:
      - ./codedir:/app/codedir # while 'codedir' is your code directory
In the CLI, when starting a new container:
$ docker run -it --mount "type=bind,source=$(pwd)/codedir,target=/app/codedir" celery bash
So you're effectively mounting the directory that your code lives in on your computer inside the /app/ directory of the Celery container. Now you can change your code and...
the local directory overwrites the one from the image when the container is started. You only need to build the image once and use it until the installed dependencies or OS-level package versions need to be changed. Not every time your code is modified. - Quoted from this awesome article
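One practical note of my own (not from the quoted article): the bind mount makes your edits visible inside the containers, but gunicorn and the Celery worker keep the old code in memory, so during development you either restart the affected services or run them with an auto-reload option. A sketch, assuming the service names from the compose file above:
# pick up code changes without rebuilding the image
docker-compose restart django celery
# (assumption) alternatively, let gunicorn reload on file changes during development:
#   gunicorn --reload --bind 0.0.0.0:8000 core_app.wsgi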

docker-compose up not starting service

I am trying to get a simple docker-compose file working on windows.
version: "2"
volumes:
db_data: {}
services:
db:
image: mariadb
environment:
MYSQL_ROOT_PASSWORD: test123
volumes:
- db_data:/var/lib/mysql/data
I need to persist the db data. I've created the directory db_data and I found this solution from a github issue: https://github.com/docker-library/mysql/issues/69. I had previously been using mysql 5.6. I'm simply running
docker-compose up -d
when I check
docker ps
I do not get any running processes. Any help with this would be greatly appreciated. I've added the output from running the command below:
PS D:\test-exercise> docker-compose up -d
Starting test-exercise_db_1 ... done
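(To debug this kind of situation, it usually helps to look at stopped containers and their logs — a sketch, using the container name from the output above:)
docker ps -a                      # the db container shows up here even if it has exited
docker logs test-exercise_db_1    # the mariadb entrypoint usually logs why it stopped
docker-compose logs db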

Airflow: how to run webserver and scheduler together from a docker image?

I'm somewhat inexperienced with both Docker and Airflow, so this might be a silly question. I have a Dockerfile that uses the apache/airflow image together with some of my own DAGs. I would like to launch the airflow web server together with the scheduler and I'm having trouble with this. I can get it working, but I feel that I'm approaching this incorrectly.
Here is what my Dockerfile looks like:
FROM apache/airflow
COPY airflow/dags/ /opt/airflow/dags/
RUN airflow initdb
Then I run docker build -t learning/airflow .. Here is the tough part: I then run docker run --rm -tp 8080:8080 learning/airflow:latest webserver and in a separate terminal I run docker exec `docker ps -q` airflow scheduler. The trouble is that in practice this generally happens on a VM somewhere, so opening up a second terminal is just not an option, and multiple machines will probably not have access to the same Docker container. Running webserver && scheduler does not seem to work either: the server appears to block, and I still see the message "The scheduler does not appear to be running" in the Airflow UI.
Any ideas on what the right way to run server and scheduler should be?
Many thanks!
First, thanks to @Alex and @abestrad for suggesting docker-compose here -- I think this is the best solution. I finally managed to get it working by referring to this great post. So here is my solution:
First, my Dockerfile looks like this now:
FROM apache/airflow
RUN pip install --upgrade pip
RUN pip install --user psycopg2-binary
COPY airflow/airflow.cfg /opt/airflow/
Note that I'm no longer copying DAGs into the image; they are passed in through volumes instead. I then build the Docker image via docker build -t learning/airflow .. My docker-compose.yaml looks like this:
version: "3"
services:
postgres:
image: "postgres:9.6"
container_name: "postgres"
environment:
- POSTGRES_USER=airflow
- POSTGRES_PASSWORD=airflow
- POSTGRES_DB=airflow
ports:
- "5432:5432"
volumes:
- ./data/postgres:/var/lib/postgresql/data
initdb:
image: learning/airflow
entrypoint: airflow initdb
depends_on:
- postgres
webserver:
image: learning/airflow
restart: always
entrypoint: airflow webserver
healthcheck:
test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
interval: 30s
timeout: 30s
retries: 3
ports:
- "8080:8080"
depends_on:
- postgres
volumes:
- ./airflow/dags:/opt/airflow/dags
- ./airflow/plugins:/opt/airflow/plugins
- ./data/logs:/opt/airflow/logs
scheduler:
image: learning/airflow
restart: always
entrypoint: airflow scheduler
healthcheck:
test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-scheduler.pid ]"]
interval: 30s
timeout: 30s
retries: 3
depends_on:
- postgres
volumes:
- ./airflow/dags:/opt/airflow/dags
- ./airflow/plugins:/opt/airflow/plugins
- ./data/logs:/opt/airflow/logs
To use it, first run docker-compose up postgres, then docker-compose up initdb and then docker-compose up webserver scheduler. That's it!
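Spelled out as a command sequence, the same steps look like this (a sketch; the -d flags are my own addition to keep the long-running services in the background):
docker build -t learning/airflow .
docker-compose up -d postgres              # start the metadata database first
docker-compose up initdb                   # one-off container; exits once the DB is initialised
docker-compose up -d webserver scheduler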
Spinning up two Docker containers alone may not achieve your goal, as the containers need to communicate with each other. You could manually set up a Docker network between your containers, although I haven't tried this approach personally.
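Roughly, that manual approach would look like the sketch below (untested, as said; the EXECUTOR/POSTGRES_* variables are the ones the puckel image's entrypoint is commonly configured with, and all three containers have to share the same metadata database for the webserver and scheduler to see each other's state):
docker network create airflow-net
docker run -d --network airflow-net --name airflow-db \
  -e POSTGRES_USER=airflow -e POSTGRES_PASSWORD=airflow -e POSTGRES_DB=airflow postgres:9.6
docker run -d --network airflow-net --name webserver -p 8080:8080 \
  -e EXECUTOR=Local -e POSTGRES_HOST=airflow-db puckel/docker-airflow:1.10.4 webserver
docker run -d --network airflow-net --name scheduler \
  -e EXECUTOR=Local -e POSTGRES_HOST=airflow-db puckel/docker-airflow:1.10.4 scheduler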
An easier way is to use docker-compose: you define your resources in a YAML file and let docker-compose create them for you.
version: '2.1'
services:
  webserver:
    image: puckel/docker-airflow:1.10.4
    restart: always
    ...
  scheduler:
    image: puckel/docker-airflow:1.10.4
    restart: always
    depends_on:
      - webserver
    ...
You can find the complete file here
Note: your question applies to any processes, not only Airflow
It's not recommended, of course, but the Docker documentation describes using supervisord, which monitors and runs multiple processes under a single supervisord daemon:
https://docs.docker.com/config/containers/multi-service_container/
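For illustration, the simplest variant described on that page is a small wrapper script used as the container's command, sketched here for Airflow (the file name and its use as the image's command are assumptions):
#!/usr/bin/env bash
# start-airflow.sh: run scheduler and webserver in one container (not recommended for production)
airflow scheduler &        # background process
exec airflow webserver     # foreground process keeps the container alive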

Gitlab-CI backup lost by restarting Docker desktop

I have a docker desktop installed on my windows pc. In that, I have self-hosted gitlab on one docker container. Today I tried to back up my gitlab by typing the following command:
docker exec -t <my-container-name> gitlab-backup create
After running this command the backup was successful and I saw a message that the backup was done. I then restarted Docker Desktop and waited for the container to start. When the container started I accessed the GitLab interface, but I saw a new GitLab instance.
I then type the following command to restore my backup:
docker exec -it <my-container-name> gitlab-backup restore
But I saw the message:
No backups found in /var/opt/gitlab/backups
Please make sure that file name ends with _gitlab_backup.tar
What can be the reason? Am I doing it the wrong way? I saw these commands on the official GitLab website.
I have this in the docker-compose.yml file:
version: "3.6"
services:
web:
image: 'gitlab/gitlab-ce'
container_name: 'gitlab'
restart: always
hostname: 'localhost'
environment:
GITLAB_OMNIBUS_CONFIG: |
external_url 'http://localhost:9090'
gitlab_rails['gitlab_shell_ssh_port'] = 2224
networks:
- gitlab-network
ports:
- '80:80'
- '443:443'
- '9090:9090'
- '2224:22'
volumes:
- '/srv/gitlab/config:/etc/gitlab'
- '/srv/gitlab/logs:/var/log/gitlab'
- '/srv/gitlab/data:/var/opt/gitlab'
networks:
gitlab-network:
name: gitlab-network
I used this command to run the container:
docker-compose up --build --abort-on-container-exit
If you started your container using Volumes, try looking at C:\ProgramData\docker\volume for your backup.
The backup is normally located at: /var/opt/gitlab/backups within the container. So hopefully you mapped /var/opt/gitlab to either a volume or a bind mount.
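A quick way to check where the backup ended up (a sketch; the container name and path come from the compose file in the question):
docker exec -it gitlab ls -lh /var/opt/gitlab/backups   # backups as the container sees them
# note: with Docker Desktop on Windows, a bind-mount source like /srv/gitlab typically lives
# inside Docker's Linux VM rather than on the Windows filesystem, which can explain "lost" data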
Did you try supplying the name of the backup file, as for the omnibus install? When I've restored a backup in Docker, I basically use the omnibus instructions, but use docker exec to do it. Here are the commands I've used from my notes.
docker exec -it gitlab gitlab-ctl stop unicorn 
docker exec -it gitlab gitlab-ctl stop sidekiq 
docker exec -it gitlab gitlab-rake gitlab:backup:restore BACKUP=1541603057_2018_11_07_10.3.4
docker exec -it gitlab gitlab-ctl start 
docker exec -it gitlab gitlab-rake gitlab:check SANITIZE=true
It looks like they added a gitlab-backup command at some point, so you can probably use that instead of gitlab-rake.
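For example, using the newer wrapper the restore would look roughly like this (a sketch; the BACKUP value is the timestamped prefix of a *_gitlab_backup.tar file that must already be present in /var/opt/gitlab/backups inside the container):
docker exec -it gitlab gitlab-backup restore BACKUP=1541603057_2018_11_07_10.3.4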
