I am running Airflow on Docker using the puckel/docker-airflow image
docker run -d -p 8080:8080 puckel/docker-airflow webserver
How do I make PySpark available?
My goal is to be able to use Spark within my DAG tasks.
Any tips?
Create a requirements.txt, add all the dependencies to this file, and then follow https://github.com/puckel/docker-airflow#install-custom-python-package:
- Create a file "requirements.txt" with the desired python modules
- Mount this file as a volume -v $(pwd)/requirements.txt:/requirements.txt (or add it as a volume in docker-compose file)
- The entrypoint.sh script executes the pip install command (with the --user option)
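For example, to make PySpark importable in DAG tasks, the requirements.txt could contain just the PySpark package (the version pin below is illustrative):

pyspark==2.4.5

docker run -d -p 8080:8080 -v $(pwd)/requirements.txt:/requirements.txt puckel/docker-airflow webserver

Note that PySpark also needs a Java runtime inside the image; if the base image does not provide one, you would have to extend the Dockerfile to install it.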
I am trying to save a figure to my local machine after I build and run my Docker image.
My Dockerfile is this
FROM python:3.7
WORKDIR /usr/src/app
# copy all the files to the container
COPY . .
RUN mkdir -p /root/.config/matplotlib
RUN echo "backend : Agg" > /root/.config/matplotlib/matplotlibrc
RUN pip install pandas matplotlib==2.2.4
CMD ["python", "./main.py"]
My main.py is this
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# read the student/mark data copied into the image
data = pd.read_csv('data/coding-environment-exercise.csv')
st_name = data['student']
marks = data['mark']
x = list(st_name)
y = list(marks)
print(x)
out_path = '/output/your_filename.png'
plt.plot(x, y)
# note: this saves into the container's working directory, not the host
plt.savefig("test2.png", format="png")
However, after I run the image via the commands below, I can't find the PNG. I tried my code in a local Python IDE and it saves the figure there, but I couldn't do it via Docker.
docker build -t plot:docker .
docker run -it plot:docker
A Docker container has its own file system, separate from that of the host running the container. In order to access the image from the host, you must mount a so-called volume into the container, which you can do using the -v option. You must mount the directory which should contain your image.
docker run -it -v /path/on/host:/path/in/container plot:docker
Please see the Docker documentation on volumes for more details.
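For example, assuming the script is changed to save to the out_path it already defines (/output/your_filename.png), a run like the following (the host directory name is illustrative) would make the file appear on the host:

docker run -it -v $(pwd)/output:/output plot:docker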
Another option to obtain the image is the docker cp command, which allows you to copy files from within the container to the host (and the other way round), as long as the container has not yet been removed.
docker cp <container-name>:/path/to/file/in/container.png /target/path/on/host.png
You can set the container name using the --name flag in the docker run command.
docker run --name my-container -it plot:docker
I'm trying to replicate a docker run command and its options within a docker-compose file:
My Dockerfile is:
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y
RUN apt-get install -y python3-pip python3-dev python3-opencv
RUN apt-get install -y libcanberra-gtk-module libcanberra-gtk0 libcanberra-gtk3-module
WORKDIR /
RUN mkdir /imgs
COPY app.py ./
CMD ["/bin/bash"]
And I use the following command to run the container so that it can display images from shared volume properly:
docker build -t docker_test:v1 .
docker run -it --net=host --env=DISPLAY --volume=$HOME/.Xauthority:/root/.Xauthority docker_test:v1
In order to replicate the previous command, I tried the docker-compose file below:
version: "3.7"
services:
  docker_test:
    container_name: docker_test
    build: .
    environment:
      - DISPLAY=:1
    volumes:
      - $HOME/.Xauthority:/root/.Xauthority
      - $HOME/docker_test/imgs:/imgs
    network_mode: "host"
However, after building the image and running the app script from inside the container (the image is copied into the container, not read from the shared volume):
docker-compose up
docker run -ti docker_test_docker_test
python3 app.py
The following error arises:
Unable to init server: Could not connect: Connection refused
(OpenCV Image Reading:9): Gtk-WARNING **: 09:15:24.889: cannot open display:
In addition, the volumes do not seem to be shared.
docker run never looks at a docker-compose.yml file; every option you need to run the container needs to be specified directly in the docker run command. Conversely, Compose is much better at running long-running processes than at running interactive shells (and you want the container to run the program directly, in much the same way you don't typically start a Python REPL and invoke main() from there).
With your setup, first you're launching a container via Compose. This will promptly exit (because the main container command is an interactive bash shell and it has no stdin). Then, you're launching a second container with default options and manually running your script there. Since there's no docker run -e DISPLAY option, it doesn't see that environment variable.
The first thing to change here, then, is to make the image's CMD start the application:
...
COPY app.py .
CMD ["python3", "./app.py"]
Then running docker-compose up (or docker run your-image) will start the application without further intervention from you. You probably need a couple of other settings to successfully connect to the host display (propagating $DISPLAY unmodified, mounting the host's X socket into the container); see Can you run GUI applications in a Linux Docker container?.
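As a sketch, and assuming a standard Linux X11 setup (/tmp/.X11-unix is the usual X socket path, not something taken from your setup), the Compose file could look like:

version: "3.7"
services:
  docker_test:
    build: .
    environment:
      - DISPLAY            # pass the host's value through unmodified
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - $HOME/.Xauthority:/root/.Xauthority
      - $HOME/docker_test/imgs:/imgs
    network_mode: "host"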
(If you're trying to access the host display and use the host network, consider whether an isolation system like Docker is actually the right tool; it would be much simpler to directly run ./app.py in a Python virtual environment.)
I am facing an issue where, after running the container and using a bind mount to mount a directory on the host into the container, I am not able to see new files created on the host machine inside the container. Below is my project structure.
The Python code creates a file inside the container which should be available inside the host machine too; however, this does not happen when I start the container with the command below. Updates to the Python code and HTML are available inside the container, though.
sudo docker container run -p 5000:5000 --name flaskapp --volume feedback1:/app/feedback/ --volume /home/deepak/PycharmProjects/NewDockerProject/sampleapp:/app flask_image
However, after starting the container using the command below, everything seems to work fine: I can see all the files from container to host and vice versa (newly created, edited). I got this command from the Docker in a Month of Lunches book.
sudo docker container run --mount type=bind,source=/home/deepak/PycharmProjects/NewDockerProject/sampleapp,target=/app -p 5000:5000 --name flaskapp flask_image
Below is the content of my dockerfile
FROM python:3.8-alpine
WORKDIR /app
COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python","main.py"]
Could someone please help me figure out the difference between the two commands? I am using Ubuntu. Thank you
In my case I got volumes working using the following docker run arguments (though I am running without --mount type=bind):
docker run -it ... -v mysql_data:/var/lib/mysql -v storage:/usr/shared/app_storage
where:
- mysql_data is a volume name
- /var/lib/mysql is the path inside the container
You can list volumes with:
docker volume ls
and inspect them to see where each one points on your system (usually /var/lib/docker/volumes/{volume_name}/_data):
docker volume inspect mysql_data
To create a volume, use the following command:
docker volume create {volume_name}
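Putting it together, a hypothetical sequence (the volume and image names are illustrative) would be:

docker volume create app_storage
docker run -it -v app_storage:/usr/shared/app_storage my_image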
This is the first time I am using docker to run a release file.
I installed docker using
npm install -g docker
I am trying to use Prometheus.
https://github.com/prometheus/prometheus
I followed these steps
Download Prometheus [https://hub.docker.com/r/prom/prometheus/]
docker pull prom/prometheus
C:\xampp\htdocs\prometheus>docker pull prom/prometheus
Saved file tree to doc-filelist.js
Copied JS to doc-script.js
Compiled CSS to doc-style.css
Run docker [https://github.com/prometheus/prometheus]
docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
C:\xampp\htdocs\prometheus>docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
Saved file tree to doc-filelist.js
Copied JS to doc-script.js
Compiled CSS to doc-style.css
I am not sure what is wrong. Please advise.
I think what you are installing is this docker, which is a documentation generator, rather than this Docker, which is the container technology.
So, refer to this to install the correct Docker first. Then execute the following commands:
# docker pull prom/prometheus
# docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
Then open the browser and visit http://127.0.0.1:9090 and you should see the Prometheus UI.
If you are using Docker to run Prometheus, make sure:
- The Docker engine is installed.
- You have a configuration file (prometheus.yml).
- You run the container using the command below.
docker run -d -p 9090:9090 -v /path_where_config_file_present/:/etc/prometheus -v /path_where_data_file_to_be_dumped/:/prometheus prom/prometheus:v2.4.0 --config.file=/etc/prometheus/prometheus.yml
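For reference, a minimal prometheus.yml might look like the following (the scrape interval and target are illustrative; Prometheus here scrapes itself):

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']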
I have Airflow running on an EC2 instance, and I am scheduling some tasks that spin up a Docker container. How do I do that? Do I need to install Docker in my Airflow container? And what is the next step after that? I have a YAML file that I am using to spin up the container, and it is derived from the puckel/docker-airflow image.
I got a simpler solution working which just requires a short Dockerfile to build a derived image:
FROM puckel/docker-airflow
USER root
# the gid 999 should match the docker group's gid on the host so the mounted socket is accessible
RUN groupadd --gid 999 docker \
    && usermod -aG docker airflow
USER airflow
and then
docker build -t airflow_image .
docker run -v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /usr/bin/docker:/bin/docker:ro \
-v /usr/lib/x86_64-linux-gnu/libltdl.so.7:/usr/lib/x86_64-linux-gnu/libltdl.so.7:ro \
-d airflow_image
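You can then verify from inside the container that the socket is usable, for example (the ancestor filter assumes the image tag used above):

docker exec -it $(docker ps -q -f ancestor=airflow_image) docker ps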
Finally resolved
My EC2 setup is running Ubuntu Xenial 16.04 and using a modified puckel/docker-airflow image that is running Airflow.
Things you will need to change in the Dockerfile
Add USER root at the top of the Dockerfile
USER root
Mounting the docker binary was not working for me, so I had to install the Docker binary in my Docker container:
Install Docker from Docker Inc. repositories.
RUN curl -sSL https://get.docker.com/ | sh
Search for the wrapdocker file on the internet and copy it into the script directory in the folder where the Dockerfile is located. This starts the Docker daemon inside the Airflow container.
Install the magic wrapper
ADD ./script/wrapdocker /usr/local/bin/wrapdocker
RUN chmod +x /usr/local/bin/wrapdocker
Add the airflow user to the docker group so that Airflow can run Docker jobs:
RUN usermod -aG docker airflow
Switch back to the airflow user:
USER airflow
In your Docker Compose file (or as command-line arguments to docker run), mount the Docker socket from the host into the Airflow container:
- /var/run/docker.sock:/var/run/docker.sock
You should be good to go !
You can spin up docker containers from your airflow docker container by attaching volumes to your container.
Example:
docker run -v /var/run/docker.sock:/var/run/docker.sock:ro -v /path/to/bin/docker:/bin/docker:ro your_airflow_image
You may also need to attach some libraries required by Docker; this depends on the system you are running Docker on. Just read the error messages you get when running a docker command inside the container; they will tell you what you need to attach.
Your Airflow container will then have full access to Docker running on the host. So if you launch Docker containers, they will run on the host, alongside the Airflow container.
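Once the socket is shared, a DAG task can start containers with Airflow's DockerOperator. A minimal sketch, assuming the Airflow 1.10 series shipped in the puckel image (the DAG id, image, and command are illustrative):

from datetime import datetime
from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

# run only when triggered manually; start_date is required but arbitrary here
with DAG('docker_example', start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    hello = DockerOperator(
        task_id='hello_from_container',
        image='alpine:3.12',
        command='echo hello from a container',
        docker_url='unix://var/run/docker.sock',  # the socket mounted above
        auto_remove=True,  # clean up the container when the task finishes
    )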