Using host filesystem as a read-only base in docker

In Docker, is it possible to mount part of the host's filesystem read-only into a container, so that any write to it lands on the COW/UFS layer instead? Below is the use case I am looking at.
1) We have a proprietary product that takes forever to install, with lots of manual intervention. However, once the install base is completed, the core files rarely change, because the product allows a node-level configuration to be placed in a separate directory that just references the install base. Of course, if we need to update the core files, that will be done on the host. The core installation takes up about 8GB of file space on the host machine.
The host core installation may be virtualized (VMWare or VirtualBox).
2) The core installation also writes its metadata to a database, and each created node writes additional metadata to it. If the DB installation is on the host, can Docker run the DB process in a container that references the DB binaries and data partition as read-only, but writes its changes to the data partition on the container's layer?
If it helps here is a sample relationship I am looking at:
-> Host is a VirtualBox running CentOS, and has the installation of proprietary product and its database.
-> Container A1 will spawn a database process based on the existing database state (empty except for the metadata made during installation).
-> Container A2 will spawn a product process, create the product node using the database offered by A1, and run the build, test, and deploy routines.
I need to spawn multiple pairs of the node+database on demand for continuous integration. The setup above should allow me to bring up Container pairs for each isolated node that is needed by our development team. Theoretically I can mount the product base directory as read/write but I think there will be some operations that write data on it (e.g. logs) that I would like to be done on the product process layer instead.
Thanks.

Related

How to migrate Nextcloud Docker to a new machine

I have a Nextcloud installation on a server that was installed using docker-compose. This installation utilizes a Nextcloud docker image and a separate MySQL (8.0) docker image for database access. The data and configuration files are placed in external volumes specified in the docker-compose.yml file.
I have recently put together a new machine that has more memory, a faster CPU, and (most importantly) much more disk space. I would like to migrate my current installation to the new machine.
The actual installation is simple enough: I can simply copy my docker-compose.yml file to the new machine and run it. The problem is with the data and the (somewhat unique) configuration that I have. I would like to get those onto the new machine.
Migrating a dockerized Nextcloud installation raises different issues from migrating a bare-metal or VM installation. For one thing, there is no clear way to place the installation into maintenance mode. You are also working with two containers (effectively, this is like coordinating two different machines), and many of the steps described for migrating a bare-metal installation will not work reliably for a containerized installation (yes, one can go into the container to run some of the required commands, but my attempts to do this resulted in screwed-up migrations).
Doing Google searches, I am seeing plenty of articles and instructions on how to migrate bare-metal Nextcloud installations from one machine to another, and how to migrate bare-metal (and virtual machine) installations to Docker. The procedures are pretty complex and involve placing the installation into maintenance mode and performing various backups and restores. Unfortunately, while I have seen a few people asking about how to migrate dockerized Nextcloud installations, there are no clear instructions on how to do this (at least, none that actually work!). Even the Nextcloud site does not discuss this!
Has anyone successfully migrated a dockerized Nextcloud installation from one machine to another? If so, how exactly was this done?
Was just able to do this myself, although I'm migrating my nextcloud install off my primary home server to a slower NAS-ish box I salvaged together after a move.
The main issue I ran into was file/dir ownership when moving from one machine to another. The secondary issue was ensuring trusted domains were set correctly in config.php.
I'm sure it'd be better to use rsync to copy/move files from machine to machine and keep ownership intact, but I used scp and changed ownership manually. Your nextcloud_data container needs the www-data user to own the dir you have mapped to /var/www/html, and the nextcloud_db container (I use mariadb here, YMMV) needs the systemd-coredump user to own the dir you have mapped to /var/lib/mysql (or whatever your db backend equivalent is). That user name is simply what the container's database UID shows up as on my host; what matters is the numeric UID.
Then just make sure you switch over your trusted_domains and trusted_proxies, either using docker-compose env vars or by editing /var/www/html/config/config.php directly.
Based on Raphael PICCOLO's comments, I created a tarball of everything in the Volumes I was using for my original installation, created a new installation on my target machine, then extracted the tarball on the new machine. There is, however, one other step that must be taken if you do it this way: you must change the ownership of all the files in the tarball so that they are owned by the userID used by the new Nextcloud installation. Otherwise, the new Nextcloud applications will be unable to access any of the resources and attempts to even log in will get 500 Failures on a browser.
There is also a unique ID utilized by the MySQL container, so all the database-related data files must also undergo an ownership change.
Getting the correct userIDs is simple enough: when you first install the new Nextcloud and MySQL database, use the same volumes you had set up in the original docker-compose.yml file. Then, before untarring the data, look at the userIDs of the files in the database folder and the Nextcloud folders. Then, when you put the contents of your tarball on the new installation, use chown -R to make the ownership changes.
Note that I was transferring my installation from a CentOS 7 machine running Docker with the traditional root user to a CentOS 8 machine running Docker in a "non-root user" mode. I do not know how permissions would be affected on other machines or modes.
Still, once the permissions were properly set up, everything works.
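The ownership fix described above can be sketched in Python. This is only an illustration of the "read the owner from the fresh install, then chown -R the extracted files" idea, not the exact procedure used in the answers; the function names and the paths in the usage comment are hypothetical.

```python
import os


def owner_of(path):
    """Return the numeric (uid, gid) that owns path."""
    st = os.stat(path)
    return st.st_uid, st.st_gid


def chown_tree(root, uid, gid):
    """Recursively set owner and group under root, like `chown -R uid:gid root`.

    Symlinks are re-owned themselves rather than followed.
    Returns the number of paths that were processed.
    """
    os.chown(root, uid, gid, follow_symlinks=False)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
            changed += 1
    return changed


# Hypothetical usage (paths are assumptions, run as root on the new host):
#   uid, gid = owner_of("/srv/nextcloud/html")   # read BEFORE extracting the tarball
#   ... extract the tarball over /srv/nextcloud/html ...
#   chown_tree("/srv/nextcloud/html", uid, gid)
```

Reading the numeric IDs from the fresh installation, rather than hard-coding a user name, avoids depending on how a given host happens to name that UID.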

Isolating multiple mounted user-space file systems on the running microservice

My situation: I have a microservice running Ubuntu 18.04 on GKE for evaluating users' code. Every time a user logs into their project, the service receives the user ID and project ID and mounts the correct GCS bucket through a user-space file system based on these IDs. This service can be accessed by multiple users at the same time.
My goal: How can I achieve user isolation in a way that each user "lives" in their own filesystem and can't "see" other mounted file systems?
Ideas:
Run a docker container inside a docker container
Start pods on demand every time a new user logs in
Isolate users on the OS level
What about letting your application spawn a Pod for each user, using a securityContext?
You could specify fsGroup (from the doc: volumes that support ownership management are modified to be owned and writable by the GID specified in fsGroup) in order to enhance the segregation you want.
The challenging part is cleaning up old Pods that are no longer used. That must either be supervised by your application or, in case of logout (I am guessing, since I don't have many details about your architecture), be done by a clean-up step that takes advantage of the label system provided by Kubernetes.
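A minimal sketch of the per-user Pod idea, assuming the application builds the manifest itself and submits it through whatever Kubernetes client it already uses. The Pod name pattern, the label keys, the container name, and the image are all hypothetical; the IDs are assumed to be lowercase alphanumeric so that they are valid in Pod names and labels.

```python
import json


def user_pod_manifest(user_id, project_id, image, fs_group):
    """Build a Pod manifest (as a plain dict) for one user's session."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # Hypothetical naming scheme; labels are what clean-up selects on.
            "name": f"eval-{user_id}-{project_id}",
            "labels": {"app": "code-eval", "user": user_id, "project": project_id},
        },
        "spec": {
            # fsGroup: supporting volumes become owned and writable by this GID.
            "securityContext": {"fsGroup": fs_group},
            "containers": [{"name": "evaluator", "image": image}],
            "restartPolicy": "Never",
        },
    }


def cleanup_selector(user_id):
    """Label selector string matching every Pod created for user_id."""
    return f"app=code-eval,user={user_id}"


if __name__ == "__main__":
    manifest = user_pod_manifest("u42", "p7", "example/evaluator:latest", 2000)
    print(json.dumps(manifest, indent=2))
    print(cleanup_selector("u42"))
```

On logout, the selector string can be passed to a label-based delete (for example `kubectl delete pods -l <selector>`) so that the application does not need to track individual Pod names.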

Where should production critical and non-production non-critical data be stored?

I was asked this question in an interview and I'm not sure of the correct answer, hence I would like your suggestions.
I was asked whether we should persist production-critical data inside the Docker instance or outside of it. What would be my choice, and what are the reasons for it?
Would your answer differ in case we have non-prod, non-critical data?
Back your answers with reasons.
Most data should be managed externally to containers and container images. I tend to view data constrained to a container as temporary (intermediate|discardable) data. Otherwise, if it's being captured but it's not important to my business, why create it?
The name "container" is misleading. Containers aren't like VMs where there's a strong barrier (isolation) between VMs. When you run multiple containers on a single host, you can enumerate all their processes using ps aux on the host.
There are good arguments for maintaining separation between processes and data and running both within a single container makes it more challenging to retain this separation.
Unlike processes, files in container layers are more isolated though. Although the layers are manifest as files on the host OS, you can't simply ls a container layer's files from the host OS. This makes accessing the data in a container more complex. There's also a performance penalty for effectively running a file system atop another file system.
While it's common and trivial to move container images between machines (viz docker push and docker pull), it's less easy to move containers between machines. This isn't generally a problem for moving processes as these (config aside) are stateless and easy to move and recreate, but your data is state and you want to be able to move this data easily (for backups, recovery) and increasingly to move amongst a dynamic pool of nodes that perform processing upon it.
Less importantly but not unimportantly, it's relatively easy to perform the equivalent of a rm -rf * with Docker by removing containers (docker container rm ...) and thereby deleting the application and your data.
The two most basic considerations you should have here:
Whenever a container gets deleted, everything in the container filesystem is lost.
It's extremely common to delete containers; it's required to change many startup options or to update a container to a newer image.
So you don't really want to keep anything "in the container" as its primary data storage: it's inaccessible from outside the container, and will get lost the next time there's a critical security update and you must delete the container.
In plain Docker, I'd suggest keeping
...in the image: your actual application (the compiled binary or its interpreted source as appropriate; this does not go in a volume)
...in the container: /tmp
...in a bind-mounted host directory: configuration files you need to push into the container at startup time; directories of log files produced by the container (things where you as an operator need to directly interact with the files)
...in either a named volume or bind-mounted host directory: persistent data the container records in the filesystem
On this last point, consider trying to avoid this layer altogether; keeping data in a database running "somewhere else" (could be another container, a cloud service like RDS, ...) simplifies things like backups and simplifies running multiple replicas of the same service. A host directory is easier to back up, but in some environments (macOS) it's unacceptably slow.
My answers don't change here for "production" vs. "non-production" or "critical" vs. "non-critical", with limited exceptions you can justify by saying "it's okay if I lose this data" ("because it's not the master copy of it").
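The layout suggested above (read-only bind mount for configuration, bind mount for logs, named volume for persistent data) can be sketched as a `docker run` command line assembled in Python. The container-side paths /etc/app, /var/log/app, and /var/lib/app, and all names in the usage example, are assumptions for illustration only.

```python
import shlex


def docker_run_args(image, name, config_dir, logs_dir, data_volume):
    """Assemble a `docker run` argument list following the storage split above."""
    return [
        "docker", "run", "-d", "--name", name,
        # Host directory with config files, mounted read-only.
        "-v", f"{config_dir}:/etc/app:ro",
        # Host directory where the operator can read the log files directly.
        "-v", f"{logs_dir}:/var/log/app",
        # Named volume for data that must survive deleting the container.
        "-v", f"{data_volume}:/var/lib/app",
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args(
        "example/app:1.0", "app1", "/srv/app/config", "/srv/app/logs", "app1-data"
    )
    # Print the command instead of running it; pass `args` to subprocess.run to execute.
    print(shlex.join(args))
```

Because nothing else is mounted, everything the process writes outside those three paths (such as /tmp) stays in the container filesystem and is discarded when the container is deleted.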

Intro to Docker for FreeBSD Jail User - How and should I start the container with systemd?

We're currently migrating our room server to the cloud for reliability, but our provider doesn't have a FreeBSD option. Although I'm prepared to pay for and upload a custom system image for deployment, I nonetheless want to learn how to start an application system instance using Docker.
In a FreeBSD jail, what I did was extract an entire base.txz directory hierarchy as system content into /usr/jail/app, and run pkg -r /usr/jail/app install apache24 php perl; then I configured /etc/jail.conf to start the /etc/rc script in the jail.
I followed the official FreeBSD Handbook, and this is generally what I've worked out so far.
But Docker is another world entirely.
To build a Docker image, there are two options: a) import from a tarball, b) use a Dockerfile. The latter lets you specify a "CMD", which is the default command to run, but
Q1. Why isn't it available from a)?
Q2. Where is information like "CMD" and "ENV" stored? In the image? In the container?
Q3. How do I start a GNU/Linux system in a container? Do I just run systemd and let it figure out the rest from configuration? Do I need to pass it some special arguments or envvars?
You should think of a Docker container as a packaging around a single running daemon. The ideal Docker container runs one process and one process only. Systemd in particular is so heavyweight and invasive that it's actively difficult to run inside a Docker container; if you need multiple processes in a container then a lighter-weight init system like supervisord can work for you, but that's usually an exception more than a standard packaging.
Docker has an official tutorial on building and running custom images which is worth a read through; this is a pretty typical use case for Docker. In particular, best practice is to write a Dockerfile that describes how to build an image and check it into source control. Containers should avoid having persistent data if they can (storing everything in an external database is ideal); if you change an image, you need to delete and recreate any containers based on it. If local data is unavoidable then either Docker volumes or bind mounts will let you keep data "outside" the container.
While Docker has several other ways to create containers and images, none of them are as reproducible. You should avoid the import, export, and commit commands; and you should only use save and load if you can't use or set up a Docker registry and are forced to move images between systems via a tar file.
On your specific questions:
Q1. I suspect the best reason the non-docker build paths to create images don't easily let you specify things like CMD is just an implementation detail: if you look at the docker history of an image you'll see the CMD winds up being its own layer. Don't worry about it and use a Dockerfile.
Q2. The default CMD, any set ENV variables, and other related metadata are stored in the image alongside the filesystem tree. (Once you launch a container, it has a normal Unix process tree, with the initial process being pid 1.)
Q3. You don't "start a system in a container". Generally run one process or service in a container, and manage their lifecycles independently.
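To make the answer to Q2 concrete: `docker image inspect` prints the image metadata as a JSON array, and the default command and environment sit under the `Config` key. The sketch below parses a hand-written sample of that output, so the tag, variables, and command in it are illustrative assumptions rather than output from a real image.

```python
import json

# Hand-written, trimmed sample in the shape of `docker image inspect` output.
SAMPLE_INSPECT = """
[
  {
    "RepoTags": ["example/app:latest"],
    "Config": {
      "Env": ["PATH=/usr/local/bin:/usr/bin:/bin", "APP_MODE=production"],
      "Cmd": ["httpd", "-D", "FOREGROUND"]
    }
  }
]
"""


def image_defaults(inspect_json):
    """Extract the default CMD and ENV variables stored in an image's metadata."""
    image = json.loads(inspect_json)[0]
    config = image.get("Config") or {}
    # Env entries are "NAME=value" strings; split on the first "=" only.
    env = dict(item.split("=", 1) for item in (config.get("Env") or []))
    return {"cmd": config.get("Cmd"), "env": env}


if __name__ == "__main__":
    print(image_defaults(SAMPLE_INSPECT))
```

A container started from the image inherits these values unless they are overridden at `docker run` time.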

Best practice to automatically backup remotely hosted server

I am trying to set up a server for team note taking, and I am wondering what the best way is to back up its data (i.e. my notes) automatically.
Currently I plan to run the server in a docker image.
The docker image will be hosted by a hosting service (such as Google).
I found a free hosting service that fits my need, but it does not allow mounting volumes to a docker image.
Therefore, I think the only way for me to back up my data is to transfer it to some other cloud service.
However, this requires storing some sort of sensitive authentication data in my docker image, which is apparently not cool.
So:
Is it possible to transfer data from a docker image to a cloud service without risking leaking a password or private key?
Is there any other way to back up my data?
I don't have to use docker as all I need is actually Node.js.
But the server must be hosted on some remote machines because I don't have the ability/time/money to host a machine on my own...
I use borg backup to back up our servers (including docker volumes) ... and it's saved the day many times due to failure and stupidity.
It transfers over SSH so comms are encrypted. The repositories it uses are also encrypted on disk so that makes all your data safe. It de-duplicates, snapshots, prunes, compresses ... the feature list is quite large.
After the first backup, subsequent backups are much faster because it only submits the changes since the previous backup.
You can also mount the snapshots as filesystems so you can hunt down the single file you deleted or just restore the whole lot. The mounts can also be done remotely.
I've configured ours to back up /home, /etc and the /var/lib/docker/volumes directories (among others).
We rent a few cheap storage VPSs and send the data up to them nightly. They're in different geographic locations with different hosting providers, you know, because we're paranoid.
Besides Docker swarm secrets, don't forget bind-mount strategies: you could have your data in a volume.
In that case, you can have a backup strategy done on the host (instead of the container at runtime), which would take that volume, compress it and save it elsewhere. See for instance this answer or this one.
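The "take that volume, compress it and save it elsewhere" step can be sketched with Python's standard tarfile module, run on the host. This is a simplified stand-in, not the method from the linked answers; the function name and the paths in the usage comment are hypothetical, and the destination is assumed to lie outside the directory being archived.

```python
import os
import tarfile


def archive_directory(src_dir, dest_path):
    """Write a gzip-compressed tarball of src_dir to dest_path.

    Returns the list of member names in the archive as a quick sanity check.
    dest_path should not be inside src_dir.
    """
    base = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(dest_path, "w:gz") as tar:
        # Store paths relative to the directory name, not the absolute host path.
        tar.add(src_dir, arcname=base)
    with tarfile.open(dest_path, "r:gz") as tar:
        return tar.getnames()


# Hypothetical usage on the host (paths are assumptions):
#   archive_directory("/srv/notes-data", "/backups/notes-data.tar.gz")
```

The resulting file can then be copied to another machine or storage service by a job on the host, which keeps the transfer credentials out of the image.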
