I am aware that you can use curl <registry name>:5009/v2/_catalog to list images.
From this:
https://docs.docker.com/registry/deploying/#storage-customization
It is stated "By default, the registry stores its data on the local filesystem", but where is that? I am running Docker for Mac.
I just want an easy way to delete local images, and maybe it is a matter of finding this default host location - yes, it looks like I could just use a custom volume.
I am still curious where this default storage is located.
It is stated "By default, the registry stores its data on the local filesystem", but where is that? I am running Docker for Mac.
By default, inside the container, it's at /var/lib/registry. Each container gets its own read/write layer as part of the overlay filesystem (all of the image layers are read-only). This is stored within /var/lib/docker on the docker host, probably under a containers or overlay2 subdirectory, and then under a unique directory name per container. With Docker Desktop on Mac, all of that in turn lives inside a VM. You could exec into the container to see this path, but I wouldn't recommend modifying the contents directly for what you want to do.
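For example, a minimal sketch (assuming your registry container is named registry and uses the default config) to peek at that path from inside the container:
# list repositories stored under the default path inside the container
docker exec registry ls /var/lib/registry/docker/registry/v2/repositories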
I just want an easy way to delete local images, and maybe it is a matter of finding this default host location - yes, it looks like I could just use a custom volume.
First, just to verify: I assume you mean deleting images in the registry and not on the local docker engine. If the latter, you can run docker image prune.
Assuming you do need to run a registry server, you'll want to pass the setting REGISTRY_STORAGE_DELETE_ENABLED=true to the registry (as an environment variable, or by setting the associated yaml config). Then you can call the manifest delete API: get the manifest digest and call
curl -X DELETE http://<registry>:5000/v2/<repo>/manifests/<digest>
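For reference, a sketch of enabling deletes and of finding the digest (the registry name, port, repo, and tag are placeholders, as above):
# enable deletes when starting the registry (delete requests are rejected otherwise)
docker run -d -p 5000:5000 -e REGISTRY_STORAGE_DELETE_ENABLED=true --name registry registry:2
# read the digest for a tag from the Docker-Content-Digest response header
curl -sI -H "Accept: application/vnd.docker.distribution.manifest.v2+json" http://<registry>:5000/v2/<repo>/manifests/<tag> | grep -i docker-content-digest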
Once that's done, you need to run a garbage collection on the registry while no writes are occurring (by setting the server to read-only, picking an idle time, or stopping the registry and running a separate container against the same volume):
docker exec registry /bin/registry garbage-collect /etc/docker/registry/config.yml --delete-untagged
For simplifying the API calls, I've got a regclient project that includes regctl image delete ... and regctl tag delete ... (the latter deletes only the one tag even if the manifest is referenced by other tags). And if you want to automate cleanup, there's regclient/regbot in the same project that lets you define your own cleanup policy.
By default the registry container keeps its data in /var/lib/registry within the container. You can bind mount a local directory on your host as the example did with -v /mnt/registry:/var/lib/registry, in which case the data will be kept on your host at /mnt/registry. You can also pre-create a volume with docker volume create registry and mount it with -v registry:/var/lib/registry, in which case the data will be stored under /var/lib/docker/volumes on your host.
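A minimal sketch of the named-volume variant (the standard registry:2 image and port are assumed):
# create a named volume and run the registry against it
docker volume create registry
docker run -d -p 5000:5000 --name registry -v registry:/var/lib/registry registry:2
# the data then lives under /var/lib/docker/volumes/registry/_data on the host (inside the VM on Docker Desktop for Mac)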
Docker newcomer here.
I have a simple image of a django website with a volume defined for the app directory.
I can bind this volume to the actual folder where I do the development with this command:
docker container run --rm -p 8000:8000 --mount type=bind,src=$(pwd)/wordcount-project,target=/usr/src/app/wordcount-project wordcount-django
This works fairly well.
Now I tried to push that simple example in a swarm. Note that I have set up a local registry for the image to be available.
So to start my service I'd do:
docker service create -p 8000:8000 --mount type=bind,source=$(pwd)/wordcount-project,target=/usr/src/app/wordcount-project 127.0.0.1:5000/wordcount-django
It will work after some tries, but only because it runs on the local node (where the actual folder is) and not on a remote node (where there is no wordcount-project folder).
Any idea how to solve this so that this folder is accessible to all nodes and yet still accessible locally for development?
Thanks!
Using bind mounts in docker swarm is not recommended, as you can read in the docs. In particular:
Important: Bind mounts can be useful but they can also cause problems. In most cases, it is recommended that you architect your application such that mounting paths from the host is unnecessary.
However, if you still want to use a bind mount, you have two possibilities:
Make sure your folder exists on all the nodes. The main problem here is that you'll have to update it every time on every node.
Use a shared filesystem (sshfs, for example) and mount it on a directory on each node. However, once you have a shared filesystem, you can just use a docker data volume and change the driver (see the sketch below).
You can find some documentation on changing the volume data driver here
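A sketch of the second option using the vieux/sshfs volume plugin (the plugin name is real, but the host, path, credentials, and volume name below are examples only):
# install the sshfs volume plugin on every node
docker plugin install --grant-all-permissions vieux/sshfs
# let the service create the sshfs-backed volume on each node where a task lands
docker service create -p 8000:8000 --mount 'type=volume,source=wordcount-data,target=/usr/src/app/wordcount-project,volume-driver=vieux/sshfs,volume-opt=sshcmd=user@fileserver:/srv/wordcount-project,volume-opt=password=<password>' 127.0.0.1:5000/wordcount-django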
Let me explain what I want with a silly example:
After I do "docker pull" to download an image to my host, I want to create a file /etc/myname on this image containing the exact name of this host. As a result, all containers running this image on this host can find the hostname by reading /etc/myname.
Plus, I want the file /etc/myname to be shared across all containers on this host. I know I can easily create this file separately in each container, but that's not what I want.
(Again, this is just a silly example. I don't actually need to store the hostname. I want to store a large amount of host-specific data in a shared file, without using a shared volume).
I can do that by manually creating the file myself, where $dir is the top-most layer of the image:
dir=17024e41f8b6c958c5c9e60bffa8b6c8b2da5a1235b6e18085d5059f9602f605
echo $HOSTNAME > /var/lib/docker/aufs/diff/$dir/etc/myname
But is there a less hacky way to do this?
The easiest way to do this would be to use a shared volume, and that is in fact the only way to do it currently. I assume you know about bind mounting in docker, but I'll show it here just in case.
With the docker run command, as well as passing -v <volume name>:<path in container>, you can also pass -v <path on host>:<path in container>. So you could keep your metadata in the same place on each host and then bind mount it into the containers.
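A sketch of that, assuming you generate the host-specific data into /etc/docker-host-meta on each host (the directory and image names are hypothetical):
# write the host-specific data once per host
mkdir -p /etc/docker-host-meta && echo "$HOSTNAME" > /etc/docker-host-meta/myname
# bind mount the file read-only into every container that needs it
docker run -d -v /etc/docker-host-meta/myname:/etc/myname:ro my-image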
Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I understand the documentation, you have two options:
First option
define managed volumes for both log- and db-files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options really valid/possible?
What is the better way to do it?
br volker
The answer to question 1 is that yes, both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely; which one to choose depends on whether or not this is a mission-critical system where data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could imagine these steps:
Create a reliable disk e.g. NFS that is backed-up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into your database container, which then writes its database files to this disk.
So following the above example, let's say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at Docker Volume API which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you create named volumes, and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem of your Docker host. You can use:
docker volume inspect db-data
to find out where the data is stored and back up that location if you want to.
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.
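For example, a minimal docker-compose.yml sketch of the named-volume setup above (my-database-image is the same placeholder as in the commands above):
# docker-compose.yml
version: "3"
services:
  db:
    image: my-database-image
    volumes:
      - db-data:/db/data
      - db-logs:/logs/db
volumes:
  db-data:
  db-logs: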
I have a Python app using a SQLite database (it's a data collector that runs daily by cron). I want to deploy it, probably on AWS or Google Container Engine, using Docker. I see three main steps:
1. Containerize and test the app locally.
2. Deploy and run the app on AWS or GCE.
3. Back up the DB periodically and download it back to a local archive.
Recent posts (on Docker, StackOverflow and elsewhere) say that since 1.9, volumes are now the recommended way to handle persisted data, rather than the "data container" pattern. For future compatibility, I always like to use the preferred, idiomatic method; however, volumes seem to be much more of a challenge than data containers. Am I missing something?
Following the "data container" pattern, I can easily:
Build a base image with all the static program and config files.
From that image create a data container image and copy my DB and backup directory into it (simple COPY in the Dockerfile).
Push both images to Docker Hub.
Pull them down to AWS.
Run the data and base images, using "--volumes-from" to refer to the data.
Using "docker volume create":
I'm unclear how to copy my DB into the volume.
I'm very unclear how to get that volume (containing the DB) up to AWS or GCE... you can't PUSH/PULL a volume.
Am I missing something regarding Volumes?
Is there a good overview of using Volumes to do what I want to do?
Is there a recommended, idiomatic way to backup and download data (either using the data container pattern or volumes) as per my step 3?
When a named volume is empty, it receives a copy of the image's data at the mount point the first time it's used (unlike a host-based bind mount, which completely overlays the mount point with the host directory). So you can initialize the volume contents in your main image, upload that image to your registry, and pull it down to your target host. There, create a named volume and point your image at it; docker-compose makes those last two steps easy, and even without it they are at most two commands: docker volume create <vol-name> and docker run -v <vol-name>:/mnt <image>. The volume will then be populated with your initial data.
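As a sketch of that initialization (the file, image, and volume names here are hypothetical):
# Dockerfile: bake the initial SQLite DB into the image at the path you will later mount the volume on
FROM python:3-slim
COPY app.db /data/app.db
VOLUME /data

# on the target host: an empty named volume picks up /data from the image on first use
docker volume create app-data
docker run -d -v app-data:/data me/collector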
Retrieving the data from a container-based volume or a named volume is an identical process: you mount the volume in a container and run an export/backup to your outside location. The only difference is on the command line; instead of --volumes-from <container-id> you use -v <vol-name>:/mnt. You can use this same process to import data into the volume as well, removing the need to initialize the app image with data in its volume.
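A sketch of that export/import against a named volume called app-data (busybox is just a convenient small image for running tar):
# back up the volume to a tarball in the current directory on the host
docker run --rm -v app-data:/data -v "$(pwd)":/backup busybox tar czf /backup/app-data.tgz -C /data .
# restore (or seed) the volume from that tarball
docker run --rm -v app-data:/data -v "$(pwd)":/backup busybox tar xzf /backup/app-data.tgz -C /data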
The biggest advantage of the new process is that it clearly separates data from containers. You can purge all the containers on the system without fear of losing data, and any volumes listed on the system are identified by a meaningful name rather than a randomly assigned one. Lastly, named volumes can be mounted anywhere on the target, and you can pick and choose which of the volumes you'd like to mount if you have multiple data sources (e.g. config files vs databases).
I was reading Project Atomic's guidance for images, which states that the 2 main use cases for using a volume are:
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume at the path of the Nginx docroot in the container. This is so that I can push changes to the website's contents onto the host rather than addressing the container. I feel it is easier to use this approach since I can - for example - just add my ssh key once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive: running the same container on a different machine may produce different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
# Dockerfile
FROM nginx
ADD . /myfiles

# docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with easy commands
docker-compose build
docker-compose up -d
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull
docker run
There are a number of ways to update data in containers. Host volumes are a valid approach and probably the simplest way to make your data available.
You can also copy files into and out of a container from the host. You may need to commit the container afterwards if you plan to stop and remove the running web server container.
docker cp /src/www webserver:/www
You can copy files into a docker image at build time from your Dockerfile, which amounts to the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
I'm not sure I fully understand your request, but why do you need to do that to push files into the Nginx container?
My suggestion, and the approach recommended by Docker, is to manage the volume in a separate docker container (see the sketch after the quoted docs below).
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
Refer to: Manage data in containers
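A sketch of that data-container pattern for the Nginx case (the container names and docroot path are just examples):
# create a data-only container that owns the docroot volume
docker create -v /usr/share/nginx/html --name web-data nginx /bin/true
# run the web server against that volume
docker run -d -p 80:80 --volumes-from web-data --name web nginx
# push new content into the volume without touching the web container
docker cp ./site/. web-data:/usr/share/nginx/html/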
As said, one of the main reasons to use docker is to always achieve the same result. A best practice is to use a data-only container.
With docker inspect <container_name> you can find the path of the volume on the host and update the data manually, but this is not recommended;
or you can retrieve the data from an external source, like a git repository (see the sketch below).
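A minimal sketch of that last idea, pulling site content from a git repository into a named volume (the repository URL and volume name are hypothetical; alpine/git is just a small image whose entrypoint is git):
# clone the content into an empty named volume, then serve it with nginx
docker volume create web-data
docker run --rm -v web-data:/content alpine/git clone --depth 1 https://example.com/site.git /content
docker run -d -p 80:80 -v web-data:/usr/share/nginx/html nginx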