I have a container that runs a Python script in order to download a few big files from Amazon S3. The purpose of this container is just to download the files so I have them on my host machine. Because these files are needed by my app (which is running in a separate container with a different image), I bind mount from my host to the app's container the directory downloaded from the first container.
Note 1: I don't want to run the script directly from my host as it has various dependencies that I don't want to install on my host machine.
Note 2: I don't want to download the files while the app's image is being built as it takes too much time to rebuild the image when needed. I want to pass these files from outside and update them when needed.
Is there a way to make the first container to download those files directly to my host machine without downloading them first in the container and then copying them to the host as they take 2x the space needed before cleaning up the container?
Currently, the process is the following:
Build the temporary container image and run it in order to download
the models
Copy the files from the container to the host
Cleanup unneeded container and image
Note: If there is a way to download the files from the first container directly to the second and override them if they exist, it may work too.
Thanks!
You would use a host volume for this. E.g.
docker run -v "$(pwd)/download:/data" your_image
Would run your_image and anything written to /data inside the container would actually write to the host in the ./download directory.
Related
I am finding many answers on how to develop inside a container in Visual Studio Code with the Remote Containers extension, but surprisingly none address my use case.
I can add to the repo only from the host, but can run the code only from the container. If I edit a file in the container, I have to manually copy it to the host so I can commit it, but if I edit the file on the host, I have to manually copy it into the container so I can test it.
I would like to set up the IDE to automatically copy files from the host into the container whenever I save or change a file in a particular directory. This way I can both commit files on the host, and run them in the container, without having to manually run docker cp each time I change a file. Files should not automatically be copied from the container to the host, since the container will contain built files which should not be added to the repo.
It seems highly unlikely that this is impossible; but how?
This can be configured using the Run on Save extension.
Set it to run docker cp on save.
If I do something like
docker run -v /opt/datadir:/var/lib/mysql image
I am mapping some location inside the container to a location in the host.
How is this different to the command COPY used when writing a Dockerfile?
The major difference is seen in case we edit any of the file present inside that location.
Suppose the directory /opt/datadir contains a file temp.txt
In case of bind mount, if you try to edit the file temp.txt from the host machine, the changes will be reflected inside the container and vice-versa.
When we create COPY command in Dockerfile, it copies the content to the filesystem of the container. Hence any changes done inside the container are DOES NOT affect the files present on the host machine.
In this case, if you want changes done on the host machine to be reflected inside the container, then you need to build a docker image and run a new container using the updated image.
When to use what?
For scenarios where the resource needs frequent updates, use bind mounts.
Eg: We want to provide our web server a configuration file that might change frequently.
In case the resource is independent of host filesystem, use COPY command inside dockerfile.
Eg: .tar, .zip, .war files or any file that requires no or very few updates inside the container.
I am running windows 10 and the most recent version of docker. I am trying to run a docker image and transfer files to and from the image.
I have tried using the "docker cp" command, but from what I've seen online, this does not appear to work for docker images. It only works for containers.
When searching for info on this topic, I have only seen responses dealing with containers, not for images.
A docker image is basically a template used for containers. If you add something to the image it will show up in all of the containers. So if you just want to share a single set of files that don't change you can add the copy command to your docker file, and then run the new image and you'll find the container.
Another option is to use shared volumes. Shared volumes are basically folders that exist on both the host machine and the running docker container. If you move a file on the host system into that folder it will be available on the container (and if you put something from the container into the folder on the container side you can access it from the host side).
I know that when we stop docker our changes are lost. There are many answers how to prevent this - commit each time. Idea is that when docker runs it will spin up a fresh container based on the image. On the other hand container persists some data after it exists unless you start using --rm.
Just to simplify:
If you run apt-get install vim, you must commit to save the change
BUT If you change nginx.conf or upload new file to HDFS, you do not lose the data.
So, just curious:
How docker knows what to save and what not? Ex: At the end of apt-get-install we have new files in the system. The same is when I upload new file. for the container/image there is NO difference , Right? Just I/O modification. So how docker know which modification should be saved when we stop the image?
The basic rules here:
Anything you explicitly store outside the container — a database, S3 — will outlive the container.
If you attach a volume to the container when you create the container using a docker run -v option or a Docker Compose volumes: option, any data written to that directory outlives the container. (If it’s a named volume, it lasts until you docker volume rm it.)
Anything else in the container filesystem is lost as soon as you docker rm the container.
If you need things like your application source code or a helper tool installed in an image, write a Dockerfile to describe how to build the image and run docker build. Check the Dockerfile into source control alongside your application.
The general theory of working with Docker is that you always start from a clean slate. When you docker build an image, you start from a base image and install your application into it; you never try to upgrade an installed application. Similarly, when you docker run a container, you start from a fresh copy of its image.
So the clearest answer to the question you ask is really, if you consistently docker rm a container when you stop it, when you docker run a new container, it will have the base image plus the content from the mounted volumes. Docker will never automatically persist anything outside of this.
You should never run docker commit: this leads to magic images that can’t be recreated later (in six months when you discover a critical security issue that risks taking your site down). Similarly, you should never install software in a running container, because it will be lost as soon as the container exits; add it to your Dockerfile and rebuild.
For any Container working with the Docker platform by default all the data generated is temporary and all the file generation or data generation is temporary and no data will persist if you have not mounted the filesystem part of if you have not attached volumes to the container.
IF you are finding that the nginx.conf is getting reused even after changes i would suggest try to find what directories are you trying to mount or mapped to the docker volumes.
The configurations for nginx which reside at /etc/nginx/conf.d/* and you might be mapping the volume with this directory. So if you make any changes in a working container and then remove the container the data will still persist as the data gets written to the writable layer. If the new container which you deploy later with the same volume mapping you will find all the changes you had initially done in the previous case are reflected in the newer container as well.
I have a directory in my Docker container, and I'm trying to make it available locally using -v screenshots:/srv/screenshots in my docker run command but it's not available.
Do I need to add something else to my command?
Host volumes are mapped from the host into the container, not the other way around. This is one way to have persistent storage (so the data don't disappear when the container is re-created).
You can copy the screenshot folder to your host with docker cp and map them in.
You will have your screenshots in the local screenshots folder. Mapping them in with -v screenshots:/srv/screenshots makes them appear in /srv/screenshots in the container, but these files are really on the host.
See: Mount a host directory as data volume