All GitLab branches and images lost when Docker container exits? How to save them? - docker

If my container exits, all my images are lost, along with their data.
Could you please explain how to save the data? In this case of GitLab, we have multiple branches. How do I save those branches so that even if the container exits, the next time we restart the container I get all my old branches back?

This question is a bit light on specific details of your workflow, but the general answer to the need for persistent data in the ephemeral container world is volumes. Without a broader understanding of your workflow and infrastructure, it could be as simple as making sure that your GitLab data is in a named local volume, e.g. something you create with docker volume create, or an image that everyone uses that has a VOLUME location identified in the Dockerfile and is bind mounted to a host location at container run time.
Of course, once you are in the world of distributed systems and orchestrating multi-node container environments, local volumes will no longer be a viable answer and you will need to investigate shared storage, either from a storage vendor or self-managed with NFS or some other global filesystem. A lot of good detail is provided in the Docker volume administrative guide if you are new to the volume concept.
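As a minimal sketch of the named-volume approach, assuming the gitlab/gitlab-ce image and its usual /etc/gitlab, /var/log/gitlab and /var/opt/gitlab data paths (check your image's documentation for the exact mount points):
# Create named volumes once; they survive container exit and removal
docker volume create gitlab-config
docker volume create gitlab-logs
docker volume create gitlab-data
# Run GitLab with its data directories mapped to the named volumes
docker run -d --name gitlab \
  -v gitlab-config:/etc/gitlab \
  -v gitlab-logs:/var/log/gitlab \
  -v gitlab-data:/var/opt/gitlab \
  gitlab/gitlab-ce:latest
When the container exits or is removed, the repositories (and therefore all your branches) stay in the volumes, and a new container started with the same -v flags picks them up again.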

Related

How to take backup of docker volumes?

I'm using named volumes to persist data on the host machine in the cloud.
I want to take backups of these volumes present in the Docker environment so that I can reuse them after critical incidents.
I had almost decided to write a Python script to compress the specified directory on the host machine and push it to AWS S3.
But I would like to know if there are any other approaches to this problem?
docker-volume-backup may be helpful. It allows you to back up your Docker volumes to an external location or to S3 storage.
Why use a Docker container to back up a Docker volume instead of writing your own Python script? Ideally you don't want to make backups while the volume is in use, so having a container in your docker-compose setup that can properly stop your application container before taking the backup lets it copy the data without affecting application performance or backup integrity.
There's also this alternative: volume-backup
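If you do end up scripting it yourself, a rough sketch of the usual approach is to mount the volume read-only into a throwaway container and tar it to a bind-mounted host directory before uploading (my-app, my-volume and my-bucket are placeholder names, and the AWS CLI is assumed to be configured on the host):
# Stop the container using the volume first, for a consistent backup
docker stop my-app
# Archive the named volume into the current directory
docker run --rm \
  -v my-volume:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/my-volume.tar.gz -C /data .
docker start my-app
# Push the archive to S3
aws s3 cp my-volume.tar.gz s3://my-bucket/backups/my-volume.tar.gz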

Where do files created by Docker images get saved in GCP?

I want to create some Docker images that generate text files. However, since images are pushed to Container Registry in GCP, I am not sure where the files will be generated when I use kubectl run myImage. If I specify a path in the program, like '/usr/bin/myfiles', will they be downloaded to the VM instance where I am typing "kubectl run myImage"? I think this is probably not the case. What is the solution?
Ideally, I would like all the files to be in one place.
Thank you
Container Registry and Kubernetes are mostly irrelevant to the issue of where a container will persist files it creates.
Some process running within a container that generates files will persist the files to the container instance's file system. Exceptions to this are stdout and stderr which are both available without further ado.
When you run container images, you can mount volumes into the container instance, and this provides possible solutions to your needs. When running Docker Engine, it's common to mount the host's file system into the container to share files between the container and the host: docker run ... --volume=[host]:[container] yourimage ....
On Kubernetes, there are many types of volumes. A seemingly obvious solution is to use gcePersistentDisk, but this has a limitation: these disks may only be mounted for writing by one pod at a time. A more powerful solution may be to use an NFS-based option such as nfs or gluster volumes. These should provide a means for you to consolidate files outside of the container instances.
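A rough sketch of the NFS option, where the server address, export path and image name are all placeholders rather than anything from your setup:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: file-writer
spec:
  containers:
  - name: app
    image: myimage              # placeholder: your file-generating image
    volumeMounts:
    - name: shared-files
      mountPath: /usr/bin/myfiles
  volumes:
  - name: shared-files
    nfs:
      server: 10.0.0.10         # placeholder NFS server
      path: /exports/myfiles
EOF
Every pod that mounts the same NFS export sees the same files, which keeps them all in one place.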
Another good solution, though I'm unsure whether it is available to you, would be to write your files as Google Cloud Storage objects.
A tenet of containers is that they should operate without making assumptions about their environment. Your containers should not make assumptions about running on Kubernetes and should not make assumptions about non-default volumes. By this I mean that your containers should simply write files to the container's file system. When you run the container, you apply the configuration that, e.g., provides an NFS volume mount or a GCS bucket mount that actually persists the files beyond the container.
HTH!

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images, which states that the two main use cases for using a volume are:
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume at the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents onto the host rather than addressing the container. I feel it is easier to use this approach since I can, for example, just add my ssh key once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive: running the same container on a different machine may produce different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
# Dockerfile
FROM nginx
ADD . /myfiles

# docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with easy commands
docker-compose build
docker-compose up -d
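For reference, the docker-machine part might look something like this (my-remote-server is a placeholder for whatever you named the machine):
# Point your local Docker client at the remote host
eval "$(docker-machine env my-remote-server)"
# Subsequent docker and docker-compose commands now run against that server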
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull me/myapp
docker run -d me/myapp
There are a number of ways to achieve updating data in containers. Host volumes are a valid approach and probably the simplest way to make your data available.
You can also copy files into and out of a container from the host. You may need to commit the container afterwards if you intend to stop and remove the running web host container.
docker cp /src/www webserver:/www
You can copy files into a Docker image at build time from your Dockerfile, which is the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
I'm not sure I fully understand your request, but why do you need to push files into the Nginx container at all?
My suggestion, and the approach recommended by Docker.io, is to manage the volume in a separate Docker container:
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
refer: Manage data in containers
As said, one of the main reasons to use Docker is to always achieve the same result. A best practice is to use a data-only container.
With docker inspect <container_name> you can find the path of the volume on the host and update the data manually, but this is not recommended;
alternatively, you can retrieve the data from an external source, like a git repository.
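For what it's worth, a small sketch of that docker inspect step (container_name stands for whatever your data container is called; on reasonably recent Docker versions the volume details live under the Mounts field):
# Show where each of the container's volumes lives on the host
docker inspect -f '{{ json .Mounts }}' container_name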

How persistent are docker data-only containers

I'm a bit confused about data-only Docker containers. I read that it's bad practice to mount directories directly from the host OS: https://groups.google.com/forum/#!msg/docker-user/EUndR1W5EBo/4hmJau8WyjAJ
And I get how to make data-only containers: http://container42.com/2014/11/18/data-only-container-madness/
And I see a somewhat similar question to mine: How to deal with persistent storage (e.g. databases) in docker
But what if I have a LAMP server set up, with everything nicely arranged in data containers, not linked 'directly' to my host OS, and I make a backup once in a while...
Then someone comes by and restarts my server. How do I set up my Docker (data-only) containers again so I don't lose any data?
Actually, even though it was shykes who said it was considered a "hack" in that link you provide, note the date. Several eons worth of Docker years have passed since that post about volumes, and it's no longer considered bad practice to mount volumes on the host. In fact, here is a link to the very same shykes saying that he has "definitely used them at large scale in production for several years with no issues". Mount a host OS directory as a docker volume and don't worry about it. This means that your data persists across docker restarts/deployments/whatever. It's right there on the disk of the host, and doesn't go anywhere when your container goes away.
I've been using docker volumes that mount host OS directories for data storage (database persistent storage, configuration data, et cetera) for as long as I've been using Docker, and it's worked perfectly. Furthermore, it appears shykes no longer considers this to be bad practice.
Docker containers will persist on disk until they are explicitly deleted with docker rm. If your server restarts you may need to restart your service containers, but your data containers will continue to exist and their volumes will be available to other containers.
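If the "restart after a reboot" part is the concern, a restart policy on the service containers (the data containers only need to exist, not run) takes care of it; a small sketch with placeholder container and image names:
# The service container comes back automatically after a daemon or host restart
docker run -d --restart unless-stopped --volumes-from my-data-container --name web my-service-image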
docker rm alone doesn't remove the actual data (which lives on in /var/lib/docker/vfs/dir)
Only docker rm -v would clear out the data as well.
The only issue is that, after a docker rm, a new docker run would re-create an empty volume in /var/lib/docker/vfs/dir.
In theory, you could redirect the new volume folders to the old ones with symlinks, but that supposes you noted which volumes were associated with which data container before the docker rm.
It's worth noting that the volumes you create with "data-only containers" are essentially still directories on your host OS, just in a different location (/var/lib/docker/...). One benefit is that you get to label your volumes with friendly identifiers and thus you don't have to hardcode your directory paths.
The downside is that administrative work like backing up specific data volumes is a bit of a hassle now since you have to manually inspect metadata to find the directory location. Also, if you accidentally wipe your docker installation or all of your docker containers, you'll lose your data volumes.

What are the pros and cons of docker data volumes

I am looking at Managing Data in Containers. There are two ways to manage data in Docker.
Data Volumes, and
Data Volume Containers
https://docs.docker.com/userguide/dockervolumes/
My question is: What are the pros and cons of these two methods?
I wouldn't think of them as different methods.
Volumes are the mechanism to bypass the Union File System, thereby allowing data to be easily shared with other containers and the host. Data-containers simply wrap a volume (or volumes) to provide a handy name which you can use in --volumes-from statements to share data between containers. You can't have data-containers without data volumes.
There are basically three ways you can manage data within a container and it would perhaps be best to outline and provide some case-by-case examples as to when and why you would use these.
First, you have the option to use the Union File System. Each container that runs has an associated writable layer provided by the UFS, so if I run a container based on an image of my choice, the writes I perform during that session can be committed back to the image and persisted, in the sense that they become permanently associated with the image's build. So if you have a Debian image and do apt-get update && apt-get install -y python, you have the possibility to commit that back to the image, share it with others and save everyone the time required to perform all those network requests to have an up-to-date container with Python pre-installed.
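A quick illustration of that first option (the container and image names here are arbitrary):
# Make a change inside a container (on newer Debian images the package is python3)...
docker run --name py-setup debian bash -c "apt-get update && apt-get install -y python"
# ...then commit its writable layer back to a new image that others can pull
docker commit py-setup me/debian-python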
Secondly, you can use volumes. When the container runs, writes to the directories that are targeted as volumes are kept separate from the UFS and remain associated with the container. As long as the associated container exists, so does the volume. Say you had a container whose entry point is a process that produces logs at /var/logs/myapp. Without volumes, the data written by the process could inadvertently be committed back to the image, needlessly adding to its size. Conversely, as long as the container exists, should the process crash and bring down the container, you can access the logs and inspect what happened. By its very nature, data stored in volumes associated with such containers is meant to be transient: discard the container and the data generated by the process is gone. If the container's image is updated and you're dealt a new one, or you have no need for the generated logs anymore, you can simply remove and recreate the container and effectively flush the generated logs from disk.
While this seems great, what happens with, say, data that's written by a database? Surely it's not something you'd keep as part of the UFS, but you can't simply have it flushed if you update the DB image or switch over from foo/postgresql to bar/postgresql and end up with a new container in each case. Clearly that's unacceptable, and that's where the third option comes in: a persistent, named container with associated volumes, utilizing the full scope of volume capabilities, such as the ability to share them with other containers even when the associated container isn't actually running. With this pattern, you can have a dbdata container with /var/lib/postgresql/data configured as a volume. You can then reliably run transient database containers and remove and re-create them freely without losing important data.
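A sketch of that dbdata pattern (postgres here stands in for whichever database image you actually use):
# A named container whose only job is to own the volume; it never needs to run
docker create -v /var/lib/postgresql/data --name dbdata postgres /bin/true
# Transient database containers borrow the volume and can be replaced freely
docker run -d --volumes-from dbdata -e POSTGRES_PASSWORD=secret --name db1 postgres
docker rm -f db1
docker run -d --volumes-from dbdata -e POSTGRES_PASSWORD=secret --name db2 postgres   # sees the same data as db1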
So, to recap some facts about volumes:
Volumes are associated with containers
Writing to volume directories writes directly to the volume itself, bypassing the UFS
This makes it possible to share volumes independently across several containers
Volumes are destroyed when the last associated container is removed
If you don't want to lose important data stored in volumes when removing transient containers, associate the volume with a permanent, named container and share it with the non-persisting containers to retain the data
Then, as a general rule of thumb:
Data which you want to become a permanent feature for every container environment should be written to UFS and committed to the image
Data which is generated by the container's managed process should be written to a volume
Data written to a volume which you don't want to accidentally lose if you remove a container should be associated with a named container which you intend to keep, and then shared with other transient containers which can be safely removed afterwards
Data containers offer:
A sensible layer of abstraction for your storage. The files that make up your volume are stored and managed for you by Docker.
Data containers are very handy to share using the special "--volumes-from" directive.
The data is automatically cleaned up by deleting the data container.
Projects like Flocker demonstrate how, eventually, the storage associated with a container will be shared between Docker hosts.
You can use the "docker cp" command to pull files out of the container onto the host.
Data volume mappings offer:
Simpler to understand
Explicit and direct control over where the data is stored.
I have experienced file ownership and permission issues. For example Docker processes running as root within a container create files owned by root on the file-system. (Yes, I understand that data volumes store their data the same way, but using the "docker cp" command pulls out the files owned by me :-))
In conclusion I think it really boils down to how much control you wish to exert over the underlying storage. Personally I like the abstraction and indirection provided by Docker, for the same reason I like port mapping.
