How to access Docker (with Spark) file systems - docker

Suppose I am running CentOS. I installed Docker and then ran an image.
Suppose I use this image:
https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook
Then I run
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
Now I can open the browser at localhost:8888, create a new Jupyter notebook, type code, run it, and so on.
However, how can I access the files I create there and, for example, commit them to GitHub? Furthermore, if I already have some code on GitHub, how can I pull it and access it from inside Docker?
Thank you very much,

You need to mount a volume:
docker run -it --rm -p 8888:8888 -v /opt/pyspark-notebook:/home/jovyan jupyter/pyspark-notebook
You could simply have executed !pwd in a new notebook to find out which folder the work is being stored in, and then mounted that folder as a volume. When you run it as above, the files will be available on your host in /opt/pyspark-notebook.
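For the GitHub part of the question, git operations can then be done on the host side. A minimal sketch, assuming git is installed on the CentOS host and using a placeholder repository URL and folder name:
mkdir -p /opt/pyspark-notebook
git clone https://github.com/your-user/your-repo.git /opt/pyspark-notebook/your-repo   # placeholder repo
docker run -it --rm -p 8888:8888 -v /opt/pyspark-notebook:/home/jovyan jupyter/pyspark-notebook
The cloned code then shows up in the Jupyter file browser under /home/jovyan/your-repo, and anything you save there can be committed and pushed from the host.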

Related

Unable to change working directory for a jupyter notebook running on a tensorflow docker container

I have followed the steps in the official CUDA on WSL tutorial (https://docs.nvidia.com/cuda/wsl-user-guide/index.html#ch05-sub02-jupyter) to set up a jupyter notebook. However, I can't figure out how to change the initial working directory. I tried mounting a local directory with the -v switch as well as appending --notebook-dir to the launch command, but neither solution worked. The jupyter notebook always starts under "/tf" no matter what I do. Ideally, I would like this to be the same working directory as the one I have on Windows (C:\Users\MyUser).
The only thing I haven't tried is changing the WORKDIR in the docker image "tensorflow/tensorflow:latest-gpu-py3-jupyter" supplied by hub.docker.com, as I am not even sure it is possible to edit it (line 57).
Here is a sample command I have tried running:
docker run -it --gpus all -p 8888:8888 -v /c/Users/MyUser/MyFolder:/home/MyFolder/ tensorflow/tensorflow:latest-gpu-py3-jupyter jupyter notebook --allow-root --ip=0.0.0.0 --NotebookApp.allow_origin='https://colab.research.google.com' --notebook-dir=/c/Users/MyUser/
What is the easiest way to achieve this?
I was able to solve this by mounting the directory I want to work in under the local directory shown in Jupyter's startup message "Serving notebooks from local directory: /tf". In my case it's /tf, but yours could be different. In addition, I changed the first '/' to '//'. Also, the image name should come last, after all of the options (per https://stackoverflow.com/a/34503625). So in your case, the command looks like:
docker run -it --gpus all -p 8888:8888 -v //c/Users/MyUser/MyFolder:/tf/home/MyFolder tensorflow/tensorflow:latest-gpu-py3-jupyter
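If you would rather keep a different container path, another option is to pass the notebook command explicitly after the image name and point --notebook-dir at the mounted path. A sketch, assuming the same Windows folder and an arbitrary /tf/work target directory:
docker run -it --gpus all -p 8888:8888 -v //c/Users/MyUser/MyFolder:/tf/work tensorflow/tensorflow:latest-gpu-py3-jupyter jupyter notebook --allow-root --ip=0.0.0.0 --notebook-dir=/tf/work
The key point is that all docker options come before the image name, and the jupyter arguments come after it.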

Trying to run "comitted" Docker image, get "cannot mount volume over existing file, file exists"

I am developing a Docker image. I started with a base image and was working inside it interactively, using bash. I installed a bunch of stuff, and the install (which included compiling a lot of code) took over 20 minutes, so to save my work, I used:
$ docker commit 0f08ac958391 myproject:wip
Now when I try to run the image:
$ docker run --rm -it myproject:wip
docker: Error response from daemon: cannot mount volume over existing file, file exists /var/lib/docker/overlay2/95aa9a9ea7cc0b1ba302adbd287e4d7059ee4fbe64183404df3bc65df332ee63/merged/run/host-services/ssh-auth.sock.
What is going on? How do I fix this?
Note about related/duplicate questions: while there are other questions about this error message, none of the answers directly explain why the error happens in this situation or what to do about it. In fact, most of the questions have no answers at all.
When I ran the base image, I included a mount for the SSH agent socket:
$ docker run --rm -it -v /run/host-services/ssh-auth.sock:/run/host-services/ssh-auth.sock myproject:dev /bin/bash
This bind mounts a file from the host (actually the Docker daemon VM) to a file in the Docker container. When I committed the running container, the resulting image contained the file /run/host-services/ssh-auth.sock. The image also contained an empty volume reference to /run/host-services/ssh-auth.sock. This means that when I ran
$ docker run --rm -it myproject:wip
It was equivalent to running
$ docker run -v /run/host-services/ssh-auth.sock --rm -it myproject:wip
Unfortunately, what that command does is create an anonymous volume and mount it at /run/host-services/ssh-auth.sock in the container. That works if the target path is a directory, or even if it does not exist at all. It fails when the target name is already taken by a regular file: Docker will not mount a volume over a file.
The solution is to explicitly provide a mapping from a host file to the target volume. Any host file will do, but in my case it is best to use the original. So this works:
docker run --rm -it -v /run/host-services/ssh-auth.sock:/run/host-services/ssh-auth.sock myproject:wip
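To confirm that the committed image really carries the volume reference, you can inspect its config (a quick sanity check, not required for the fix):
docker image inspect --format '{{ json .Config.Volumes }}' myproject:wip
# prints something like {"/run/host-services/ssh-auth.sock":{}}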

Docker: error response from daemon: invalid mode: /tf

I'm new to using Docker, and my objective is to bind mount a file path on my host machine (shown in the command below) into the container so I can:
Run a Jupyter Notebook instance without losing the data every time I end my terminal session
Link my Jupyter Notebook to the same path where my training data resides
I have tried looking at many threads on the topic, to little avail. I am using Linux Mint, and I run the command shown below:
sudo docker run -it --rm --gpus all -v "$(pwd):/media/hossamantarkorin/Laptop Data II/1- Educational/ML Training/Incident Detection/I75_I95 RITIS":"/tf" -p 8888:8888 tensorflow/tensorflow:2.3.0rc1-gpu-jupyter
What am I doing wrong here?
Thanks,
Hossam
This usually happens when docker is not running.
Try sudo service docker start before entering your command.
I just wanted to provide an update on this. The original command fails because the -v value contains two colons, so Docker treats the trailing "/tf" as a mount mode, which is not valid. The easiest way to work in your local directory is to:
Change directory to where you want to work
Run your docker container while bind mounting your pwd:
sudo docker run -it --rm --gpus all -v "$(pwd):/tf" -p 8888:8888 tensorflow/tensorflow:2.3.0rc1-gpu-jupyter

How to run a jupyter notebook at a particular folder in docker

I have set up a Jupyter notebook on the correct port in Docker. Every time, I need to upload data into the notebook to do my analysis. Is there any way I can point my Jupyter file location to a particular folder, keeping in mind that I'm using Docker?
You need to mount your folder as a volume in the Docker container. For example, if you use the jupyter/all-spark-notebook image, you can run:
docker run -it --rm -p 8888:8888 -p 4040:4040 -v your-path:/home/jovyan/workspace jupyter/all-spark-notebook
Replace your-path with the path that contains your notebooks.
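For instance, assuming the notebooks live under /home/me/notebooks on the host (the path is only an example):
docker run -it --rm -p 8888:8888 -p 4040:4040 -v /home/me/notebooks:/home/jovyan/workspace jupyter/all-spark-notebook
They then show up inside Jupyter under the workspace folder.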

Plone on Docker always starts from scratch

I'm trying to develop a Plone project with Docker. I have used this official image of Plone 5.2.0, and the image builds and runs perfectly with:
$ docker build -t plone-5.2.0-official-img .
$ docker run -p 8080:8080 -it plone-5.2.0-official-cntr
But Plone starts from scratch each time I run the container, asking me to create the site again.
Could anybody help me with this?
Thanks in advance.
You can also use a volume for data like:
$ docker run -p 8080:8080 -it -v plone-data:/data plone-5.2.0-official-cntr
The next time you run a new container, it will re-use the previous data.
In case this helps: volumes are the Docker way to persist data; you can read more about them in the Docker documentation on volumes.
When running the container, just add a -v option and specify the host path and the container path where you want to store your data:
$ docker run -p <host-port>:<container-port> -it -v <host-path>:<container-path> <image>
This is expected behavior, because docker run starts a new container, which doesn't have the state from your previous container.
You can use docker start CONTAINER instead, which restarts that container with the state from its previous setup.
https://docs.docker.com/engine/reference/commandline/start/
A more common approach is to use docker-compose.yml and docker-compose up -d, which will, in most cases, reuse previous state.
https://docs.docker.com/compose/gettingstarted/
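A minimal docker-compose.yml sketch for this case, assuming the official plone image and the /data directory it uses for its database (same as in the volume answer above):
version: "3"
services:
  plone:
    image: plone:5.2.0
    ports:
      - "8080:8080"
    volumes:
      - plone-data:/data
volumes:
  plone-data:
With this, docker-compose up -d keeps the site's data in the named plone-data volume across restarts.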
