Can Airflow running in a Docker container access a local file?

I am a newbie as far as both Airflow and Docker are concerned; to make things more complicated, I use Astronomer, and to make things worse, I run Airflow on Windows (not on a Unix subsystem - I could not install Docker on Ubuntu 20.04). "astro dev start" breaks with an error, but in Docker Desktop I see, and can start, 3 Airflow-related containers. They see my DAGs just fine, but my DAGs don't see the local file system. Is this unavoidable with the Airflow + Docker combo? (It seems like a big handicap; one could only use files in the cloud.)

In general, you can declare a volume at container runtime in Docker using the -v switch with your docker run command to mount a local folder on your host to a mount point in your container, and you can then access that path from inside the container.
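For example (paths are hypothetical; Docker Desktop on Windows accepts Windows-style host paths):

docker run -v C:\Users\me\airflow-data:/usr/local/data my-image

Everything under C:\Users\me\airflow-data on the host then appears at /usr/local/data inside the container.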
If you go on to use docker-compose up to orchestrate your containers, you can instead specify volumes in the docker-compose.yml file, which configures the volumes for the containers it brings up.
In your case, the Astronomer docs here suggest it is possible to add a custom directive to Astronomer's docker-compose.override.yml file to mount volumes into the Airflow containers created by your astro commands; those mounts should then be visible from your DAGs.
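A minimal sketch of such an override file (the service name and container path are assumptions based on a typical Astronomer project, so check the compose file the CLI generates for the exact names):

version: "3.1"
services:
  scheduler:
    volumes:
      - C:/Users/me/airflow-data:/usr/local/airflow/data

Placed as docker-compose.override.yml in the project directory, astro dev start should merge it into the generated compose configuration, after which DAGs running in the scheduler container can read /usr/local/airflow/data.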

Related

Using docker-compose on a Kubernetes instance with Jenkins - mounting empty volumes

I have a Jenkins instance set up using Google's Jenkins on Kubernetes solution. I have not changed any of the settings of the Kubernetes Pod.
When I trigger a new job I am able to get everything up and running until the point of my tests.
My tests use docker-compose. First I make sure to install docker (1.5-1+b1) and docker-compose (1.8.0-2) on the instance (I know I can optimize this by using an image that already includes these, but I am still at the proof-of-concept stage).
When I run the docker-compose up command everything works and the services start their initialization scripts. However, the mounts are empty. I have verified that the files exist on the Jenkins slave, and that the mounts are created inside the docker services when I run docker-compose, but they are empty.
Some information:
In order to get around file permissions I am using /tmp as the Jenkins workspace. I am using SCM to pull my files (successfully), and in the docker-compose file I specify version: '2' and the mount paths as absolute paths. The volumes section of the service that fails looks like this:
volumes:
- /tmp/automation:/opt/automation
I changed the command that is run in the service to ls /opt/automation and the result is an empty directory.
What am I missing? I just want to mount a directory into my docker-compose service. This works perfectly from Windows, Ubuntu, and CentOS devices. Why won't it work on the Kubernetes instance?
I found the reason it fails here:
A Docker container running inside a Docker container uses the parent HOST's Docker daemon, and hence any volumes mounted in the "docker-in-docker" case are still resolved against the HOST, not the container.
Therefore, the path mounted from the Jenkins container "does not exist" on the HOST. Because of this, a new, empty directory is created in the "docker-in-docker" container. The same thing applies when a directory is mounted into a new Docker container from inside a container.
So it seems it is impossible to mount something from the outer docker into the inner docker this way, and another solution must be found.
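To illustrate (the host-side layout is hypothetical): the Jenkins agent is itself a container, so a path like /tmp/automation only exists inside it, while docker-compose sends the volume spec to the host's daemon:

# inside the Jenkins agent container the files are visible:
ls /tmp/automation          # -> the checked-out files
# but the volume spec
#   - /tmp/automation:/opt/automation
# is resolved by the HOST daemon against the HOST filesystem,
# where /tmp/automation does not exist, so Docker silently
# creates an empty directory there and mounts that instead.

If the agent's workspace is itself a bind mount from the host, one workaround is to put the host-side path in the volume spec rather than the path as seen inside the agent.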

Using SMB shares as docker volumes

I'm new to docker and docker-compose.
I'm trying to run a service using docker-compose on my Raspberry Pi. The data this service uses is stored on my NAS and is accessible via Samba.
I'm currently using this bash script to launch the container:
sudo mount -t cifs -o user=test,password=test //192.168.0.60/test /mnt/test
docker-compose up --force-recreate -d
Where the docker-compose.yml file simply creates a container from an image and binds its own local /home/test folder to the /mnt/test folder on the host.
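A minimal sketch of such a compose file (the image name is hypothetical):

version: '2'
services:
  test:
    image: some/arm-image
    volumes:
      - /mnt/test:/home/test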
This works perfectly fine when launched from the script. However, I'd like the container to restart automatically when the host reboots, so I specified 'always' as the restart policy. After a reboot, though, the container starts automatically without anyone mounting the remote folder, and as a result the service does not work correctly.
What would be the best approach to solve this issue? Should I use a volume driver to mount the remote share (I'm on an ARM architecture, so my choices are limited)? Is there a way to run a shell script on the host when starting the docker-compose process? Should I mount the remote folder from inside the container?
Thanks
What would be the best approach to solve this issue?
As @Frap suggested, use systemd units to manage the mount and the service, and the dependencies between them.
This document discusses how you could set up a Samba mount as a systemd unit. Under Raspbian, it should look something like:
[Unit]
Description=Mount Share at boot
After=network-online.target
Before=docker.service

[Mount]
What=//192.168.0.60/test
Where=/mnt/test
Options=credentials=/etc/samba/creds/myshare,rw
Type=cifs
TimeoutSec=30

[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
Place this in /etc/systemd/system/mnt-test.mount, and then:
systemctl enable mnt-test.mount
systemctl start mnt-test.mount
The After=network-online.target line should cause systemd to wait until the network is available before trying to access this share. The Before=docker.service line will cause systemd to only launch docker after this share has been mounted. The RequiredBy=docker.service means that if you start docker.service, this share will be mounted first (if it wasn't already), and that if the mount fails, docker will not start.
This is using a credentials file rather than specifying the username/password in the unit itself; a credentials file would look like:
username=test
password=test
You could just replace the credentials option with username= and password=.
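To create that credentials file with suitably restricted permissions (using the path referenced by the unit above):

sudo mkdir -p /etc/samba/creds
printf 'username=test\npassword=test\n' | sudo tee /etc/samba/creds/myshare > /dev/null
sudo chmod 600 /etc/samba/creds/myshare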
Should I mount the remote folder from inside the container?
A standard Docker container can't mount filesystems. You can create a privileged container (by adding --privileged to the docker run command line), but that's generally a bad idea (because that container now has unrestricted root access to your host).
I finally "solved" my own issue by defining a script run from the /etc/rc.local file. It launches the mount and docker-compose up commands on every reboot.
Being just two lines of code and not dependent on any particular Unix flavor, it felt to me like the most portable solution, barring a docker-only solution that I was unable to find.
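A sketch of what those two lines might look like (mirroring the script at the top of this question; the project path is hypothetical):

mount -t cifs -o user=test,password=test //192.168.0.60/test /mnt/test
cd /home/pi/myservice && docker-compose up --force-recreate -d

Note that /etc/rc.local runs as root, so sudo is not needed there.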
Thanks all for the answers

How can I provide application config to my .NET Core Web API services running in docker containers?

I am using Docker to deploy my ASP.NET Core Web API microservices, and am looking at the options for injecting configuration into each container. The standard approach of an appsettings.json file in the application root directory is not ideal, because as far as I can see that means building the file into my docker images, which would then limit which environments the image could run in.
I want to build an image once which can then be provided configuration at runtime and rolled through dev, test, UAT and into production without creating an image for each environment.
Options seem to be:
Providing config via environment variables. Seems a bit tedious.
Somehow mapping a path in the container to a standard location on the host server where appsettings.json sits, and getting the service to pick this up (how?)
Maybe it's possible to provide values on the docker run command line?
Does anyone have experience with this? Could you provide code samples/directions, particularly on option 2) which seems the best at the moment?
It's possible to create data volumes in the docker image/container, and also to mount a host directory into a container. The host directory will then be accessible inside the container.
Adding a data volume
You can add a data volume to a container using the -v flag with the docker create and docker run commands.
$ docker run -d -P --name web -v /webapp training/webapp python app.py
This will create a new volume inside a container at /webapp.
Mount a host directory as a data volume
In addition to creating a volume using the -v flag you can also mount a directory from your Docker engine’s host into a container.
$ docker run -d -P --name web -v /src/webapp:/webapp training/webapp python app.py
This command mounts the host directory, /src/webapp, into the container at /webapp.
Refer to the Docker Data Volumes documentation.
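Applied to the question's option 2, a minimal sketch (the host path and image name are hypothetical): keep the environment-specific appsettings.json on the host and bind-mount it over the copy baked into the image:

$ docker run -d --name api \
    -v /srv/config/uat/appsettings.json:/app/appsettings.json:ro \
    myregistry/my-api:1.0

The default ASP.NET Core host reads appsettings.json from the content root, so no code change should be needed as long as the mount lands where the app expects the file (here assumed to be /app).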
We are using a different packaging system for now (not docker itself), but still have the same issue - a package can be deployed in any environment.
So, the way we are doing it now:
Use an external configuration management system to hold and manage configuration per environment
Inject into our package the basic environment variables that hold the configuration management system's connection details
This way we are not only allowing the package to run in almost any "known" environment, but also enabling run-time configuration management.
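A hedged sketch of that pattern with Docker (the variable names and endpoint are hypothetical): the image stays environment-agnostic, and only the pointer to the configuration system changes per environment:

$ docker run -d \
    -e "CONFIG_URL=http://config.internal:8500" \
    -e "CONFIG_ENV=uat" \
    myregistry/my-api:1.0

The application then pulls its full configuration from CONFIG_URL at startup.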
When you are running docker, you can use the environment variable options of the run command:
$ docker run -e "deep=purple" ...
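For ASP.NET Core specifically, the configuration system maps environment variables onto configuration keys using a double underscore as the section separator, so hierarchical settings can be overridden per container (the key names here are hypothetical):

$ docker run -d \
    -e "Logging__LogLevel__Default=Warning" \
    -e "ConnectionStrings__Default=Server=db;Database=app" \
    myregistry/my-api:1.0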

Dealing with data in Docker containers with GitLab CI

So I am using GitLab CI to deploy my websites in docker containers; because the GitLab CI docker runner doesn't seem to do what I want, I am using the shell executor and letting it run docker-compose up -d. Here comes the problem.
I have 2 volumes in my docker container: ./:/var/www/html/ (the content of my git repo, i.e. the files I want to replace on each build) and a mount that is "inside" the first one, /srv/data:/var/www/html/software/permdata (a persistent mount on my server).
When the GitLab CI runner starts, it tries to remove all files while the container is running, but because of this mount-in-mount it gets a "device busy" error and aborts. So I have to stop and remove the container manually before I can run my build (which rather defeats the point of build automation).
Options I thought about to fix this problem:
stop and remove the container before gitlab-ci-multi-runner starts (seems not to be possible)
add the git data to my docker container and only mount my permdata (it seems you can't add data to a container with docker-compose without the volume option, like you can in a Dockerfile)
Option 2 would be ideal, because it would also sort out my issues with file permissions.
Maybe someone has gone through the same problem and could give me some advice.
it seems you can't add data to a container with docker-compose without the volume option, like you can in a Dockerfile
That's correct. The Compose file is not meant to replace the Dockerfile; it's meant to run multiple images for an application or project.
You can modify the Dockerfile to copy the git files into the image.
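A minimal sketch of that approach (the base image is hypothetical; use whatever your site actually runs on). The Dockerfile bakes the repo contents into the image:

FROM php:7-apache
# bake the checked-out repo into the image instead of bind-mounting it
COPY . /var/www/html/

and the compose file then mounts only the persistent data:

version: '2'
services:
  web:
    build: .
    volumes:
      - /srv/data:/var/www/html/software/permdata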

Docker exec command not using the mounted directory for /

I am new to docker containers and am trying to solve a problem I am facing right now.
This is my understanding, based on limited knowledge:
When we create a docker container, Docker creates a local mount and uses it as the root file system for the docker container.
Now, if I run any commands in the container from the host server using docker exec, docker is not using the mounted partition as the / file system for the container. I mean, it still picks up the binaries and env variables from the host server. Is there any option or alternate solution for making docker use the original mounted directory for docker exec too?
If I access/start the container with docker attach or docker run -i -t <image> /bin/bash, I get the mounted directory as my / file system, which gives me an entirely independent environment from my host system. But this doesn't happen with the docker exec command.
Please help!
You are operating under a misconception. The docker image only contains what was installed in it. This is usually a very cut-down version of an operating system, for efficiency reasons.
The docker container is started from an image - that is the running version, which can change and store state, but may be discarded.
docker run starts a container from an image. You can run the same image multiple times to create completely different containers (which happen to have the same starting point for their content).
docker exec attaches to one of those containers to run a command, so you will only see the things inside it that were inside the image or added post-start (like log files). It has no view of the host filesystem, and may not even be the same OS - the only real requirement is that it shares the host's kernel - although it usually has a selection of the commonly used binaries.
And when you run an image to create a container, you can specify a mount. One of the options when you do this is passing through a host filesystem, e.g. -v /path/on/host:/path_in/container. But you don't have to: you can use data containers or a docker volume mount instead. For example, docker run -v /mount creates a mount point within the container, using the docker-managed filesystem, which isn't part of the parent host. This can be used to make a data container:
docker create -v /path/to/data --name data_for_acontainer some_basic_image
And then mount volumes from that data container on a new one:
docker run -d --volumes-from data_for_acontainer some_app_image
This will attach that data container's /path/to/data volume to the new container. But in neither case is the 'host' filesystem touched directly - which is the whole point of dockerising things.
