Persist users across containers - docker

For the main RStudio Docker image the user/password information lives in the container. To create a new user you need to run adduser inside the container, see: https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image#multiple-users. This is an issue when updating to a new container, since /etc/passwd, /etc/shadow, etc. obviously would not persist across containers. I was thinking of bind-mounting the files from the host like so:
docker run -d -p 8787:8787 \
-v $(pwd)/passwd:/etc/passwd \
-v $(pwd)/shadow:/etc/shadow \
... rocker/rstudio
But I'm unsure if the files associated with the system users should be exposed from the container to the host. Is it better to maintain a separate image built on top of rocker/rstudio with the users added, or is there something else better?

I'd opt for creating a new image, built on top of rocker/rstudio, with all the users you need; that's the easiest to redeploy. Mounting the files from the host risks exposing the host's system users inside the container and leaving the image's files owned by mismatched UIDs. If you need to be able to adjust users on the fly, a volume just for these files (not mapped from the host) may work, but you'll also want home directories, and you'll probably need to mount all of /etc rather than the individual files: tools like useradd rewrite /etc/passwd by renaming a new file over the old one, which breaks a single-file bind mount because the mount keeps pointing at the old inode.
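A minimal sketch of that approach (the usernames and passwords are hypothetical; useradd and chpasswd are standard in the Debian-based rocker images):
cat > Dockerfile.users <<'EOF'
FROM rocker/rstudio
# hypothetical users; change the names and set real passwords
RUN useradd -m -s /bin/bash alice && echo 'alice:changeme' | chpasswd && \
    useradd -m -s /bin/bash bob && echo 'bob:changeme' | chpasswd
EOF
docker build -f Dockerfile.users -t my-rstudio:with-users .
docker run -d -p 8787:8787 my-rstudio:with-users
When rocker/rstudio updates, rebuilding this image carries the users forward without touching /etc on the host.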

I understand that wasn't the question, but the reason it's so hard to do what you're trying to do is that it's the wrong way to go about it.
You can fiddle your way around it using some kind of PAM plugin to move the few users from your app onto a different auth backend, one that can also use a file for its definitions, like /etc/rstudio_users - very similar to how it's done with sftp and FTP users. You can then safely share that file across containers without ending up in the horrible situation of sharing all users, including the system users, which would be the downfall of your initial concept at some point anyway.
If you want this done right, use something like LDAP to share authentication data properly.

Related

Copy many large files from the host to a docker container and back

I am a beginner with Docker and I have been searching for 2 days now and I do not understand which would be a better solution.
I have a docker container on a Ubuntu server. I need to copy many large video files to the Ubuntu host via FTP. Docker via cron will process the videos using ffmpeg and save the result to the Ubuntu host somehow so the files are accessible via FTP.
What is the best solution:
create a bind mount - I understand the host may change files in the bind mount
create a volume - but I do not understand how I would add files to the volume
create a folder on the Ubuntu host and have a cron job that copies files in using the "docker cp" command, and copies each video back to the host after it has been processed?
Thank you in advance.
Bind-mounting a host directory for this is probably the best approach, for exactly the reasons you lay out: both the host and the container can directly read and write to it, whereas the host can't easily write into a named volume. docker cp is tricky: as you note, you need to know when processing is complete, and anyone who can run any docker command at all can pretty trivially get root on the host, so you don't want to grant that permission to something network-facing.
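A sketch of the bind-mount setup (paths and image name are hypothetical): the FTP upload area and an output directory are plain host directories that both the host and the container can read and write.
docker run -d --name video-worker \
  -v /srv/ftp/incoming:/incoming \
  -v /srv/ftp/processed:/processed \
  my-ffmpeg-cron-image
# inside the container, cron might invoke something like:
#   ffmpeg -i /incoming/clip.mp4 -vf scale=1280:-2 /processed/clip_720p.mp4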
If you're designing a larger-scale system, you might also consider an approach where no files are shared at all. The upload server sends the files (maybe via HTTP POST) to an internal storage service, then posts a message to a message queue (maybe RabbitMQ). A worker picks up that message, retrieves the files from the storage service, does its work, uploads the result, and posts a response message. The big advantages of this approach are being able to run it on multiple systems, being able to scale the individual components easily, and not needing to worry about filesystem permissions. But it's a much more involved design.

Docker - Safest way to upload new content to production

I am new to Docker.
Every time I need to upload new content to production I get anxious that something will go wrong, so I try to understand how backups work and how to back up my volumes, which seems pretty complicated to me at the moment.
So I have this idea of creating a new image every time I want to upload new content.
Then I pull this image on the machine and run docker stack rm/deploy to replace the container and see if it works - if not, I pull the old image.
If the code works I can then delete my old image.
Is this a proper/safe way to update production machines, or do I need to get going with backups and restores?
I mean, I read this guide https://www.thegeekdiary.com/how-to-backup-and-restore-docker-containers/ but I don't quite understand how to restore my volumes.
Any suggestion would be nice.
Thank you
That's a pretty normal way to use Docker. Make sure you give each build a distinct tag, like a date stamp or source-control ID. You can do an upgrade like
# CONTAINER=...
# IMAGE=...
# OLD_TAG=...
# NEW_TAG=...
# Shell function to run `docker run`
start_the_container() {
  docker run ... --name "$CONTAINER" "$IMAGE:$1"
}
# Shut down the old container
docker stop "$CONTAINER"
docker rm "$CONTAINER"
# Launch the new container
start_the_container "$NEW_TAG"
# Did it work?
if check_if_container_started_successfully; then
  # Delete the old image
  docker rmi "$IMAGE:$OLD_TAG"
else
  # Roll back
  docker stop "$CONTAINER"
  docker rm "$CONTAINER"
  start_the_container "$OLD_TAG"
  docker rmi "$IMAGE:$NEW_TAG"
fi
The only docker run command here is in the start_the_container shell function; if you have environment-variable or volume-mount settings you can put them there, and the old volumes will get reattached to the new container. You do need to back up volume content, but that can be separate from this upgrade process. You should not need to back up or restore the contents of the container filesystems beyond this.
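For example (the settings here are purely hypothetical), the function body might end up looking like this:
start_the_container() {
  docker run -d --name "$CONTAINER" \
    -e APP_ENV=production \
    -v app_data:/var/lib/app \
    -p 8080:8080 \
    "$IMAGE:$1"
}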
If you're using Kubernetes, changing the image: in a Deployment spec does this for you automatically. It will actually start the new container(s) before stopping the old one(s) so you get a zero-downtime upgrade; the key parts to doing this are being able to identify the running containers, and connecting them to a load balancer of some sort that can route inbound requests.
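Assuming a Deployment and container both named my-app (hypothetical names), the whole script above roughly collapses to:
kubectl set image deployment/my-app my-app=registry.example.com/my-app:2024-06-01
kubectl rollout status deployment/my-app
# and, if the new pods never become ready:
kubectl rollout undo deployment/my-app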
The important caveat here is that you must not use Docker volumes or bind mounts for key parts of your application. Do not use volumes for your application code, or static asset files, or library files. Otherwise the lifecycle of the volume will take precedence over the lifecycle of the image/container, and you'll wind up running old code and can't update things this way. (They make sense for pushing config files in, reading log files out, and storing things like the underlying data for your database.)

How do I have write privileges for Mounted Drives, External or otherwise for docker?

I have been working a lot with elasticsearch. I have a huge issue with trying to expand my container HDD space. I want to shift my volumes to an External HDD (NTFS or otherwise), but it seems that when I use docker-compose for something like:
volumes:
- /Volumes/Elements/volume_folder/data03:/usr/share/elasticsearch/data
It seems that it doesn't have write permissions. I confirmed on both Windows and Mac that this is the case, but I figured this is actually a common issue that has surely been solved with Docker already. I have been looking but have been unable to find a solution.
How is this done? I have Mounted (internal) Drives on my Windows 10 Machine I wanted to set up to store this data, as well as multiple External HDD I wanted to do the same.
I notice that I, as the current user, always have read/write/execute privileges on the devices, so I was thinking there might be a way to have Docker run as the current user for the purposes of determining drive privileges?
The current issue is that the container falls outside the scope of the current user, and it seems that the external drive is mounted with something akin to 775.
Can someone assist with this? I was looking on stackoverflow and all the mounts were based on the host machine, but NOT a different drive like this. I can easily set a volume anywhere on the machine but when it comes to External HDD or H:/ or I:/, it seems to be a different story.
I was looking at this Stack Overflow question: Docker volume on external hard drive, and I was looking into what I can do. When I checked the preferences, I saw that /Volumes was shared. When I did docker-compose up it said that the filesystem is read-only (like previously stated). It is 755. Is there a way to run docker-compose as a particular user?
Edit: I noticed that docker-compose allows a user option, and since I saw that the mounted HDD is owned by me, I figured maybe I can pass my user into each container and it will access the drive correctly. I saw this article stating I could do this: https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15
I added user to each service, like this: user: ${CURRENT_UID}
and then on the CLI, I tried a couple of different options:
CURRENT_UID="$(whoami)" docker-compose up
CURRENT_UID="$(id -u):$(id -g)" docker-compose up
The top one failed because the user was not in passwd, but the bottom one gave me a "permission denied" error. I was thinking it might have worked, but it didn't.

How does creating a non-root user by simply setting a random UID work in a container built FROM scratch?

I'm setting up a Golang server with Docker and I want an unprivileged user to launch it inside its container for safety.
Here is the simple Dockerfile I use. I copy my binary into the container and set a random UID.
FROM scratch
WORKDIR /app
COPY --chown=1001:1001 my-app-binary my-app-binary
USER 1001
CMD ["/app/my-app-binary"]
If my server listens on port 443 it doesn't work, since binding to that port requires privileged rights, so my app is indeed running as an unprivileged user, as intended.
Nonetheless, user 1001 was never actually created. The tutorials I saw tell me to create the user in an intermediate 'builder' container (alpine for instance) and copy /etc/passwd from it. I didn't find any example doing what I do. (Here is one tutorial I followed.)
Can someone explain to me why my solution works, or what I didn't understand?
DISCLOSURE: In my answer I've used quotes from this blog post. I'm neither the author of this post nor in any way related to the author.
It's expected - containers can run under a user that is not known to the container. Quoting docker run docs:
root (id = 0) is the default user within a container. The image developer can create additional users. Those users are accessible by name. When passing a numeric ID, the user does not have to exist in the container.
-- https://docs.docker.com/engine/reference/#user
It helps you resolve issues like this:
Sometimes, when we run builds in Docker containers, the build creates files in a folder that’s mounted into the container from the host (e.g. the source code directory). This can cause us pain, because those files will be owned by the root user. When an ordinary user tries to clean those files up when preparing for the next build (for example by using git clean), they get an error and our build fails.
-- https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15#7d3a
And it's possible because:
Fortunately, docker run gives us a way to do this: the --user parameter. We're going to use it to specify the user ID (UID) and group ID (GID) that Docker should use. This works because Docker containers all share the same kernel, and therefore the same list of UIDs and GIDs, even if the associated usernames are not known to the containers (more on that later).
-- https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15#b430
The above applies to the USER Dockerfile instruction as well.
Using a UID not known to the container has some gotchas:
Your user will be $HOME-less
What we’re actually doing here is asking our Docker container to do things using the ID of a user it knows nothing about, and that creates some complications. Namely, it means that the user is missing some of the things we’ve learned to simply expect users to have — things like a home directory. This can be troublesome, because it means that all the things that live in $HOME — temporary files, application settings, package caches — now have nowhere to live. The containerised process just has no way to know where to put them.
This can impact us when we’re trying to do user-specific things. We found that it caused problems using gem install (though using Bundler is OK), or running code that relies on ENV['HOME']. So it may mean that you need to make some adjustments if you do either of those things.
Your user will be nameless, too
It also turns out that we can’t easily share usernames between a Docker host and its containers. That’s why we can’t just use docker run --user=$(whoami) — the container doesn't know about your username. It can only find out about your user by its UID.
That means that when you run whoami inside your container, you'll get a result like I have no name!. That's entertaining, but if your code relies on knowing your username, you might get some confusing results.
-- https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15#e295
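A quick way to see this behaviour yourself, assuming you can pull the alpine image:
docker run --rm --user 12345:12345 alpine id
# reports the numeric uid/gid 12345 even though no such user exists in the image
docker run --rm --user 12345:12345 alpine whoami
# fails, because there is no passwd entry mapping 12345 back to a name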

Install entire database (including binaries) inside A VOLUME in Docker

I need to containerize a JanusGraph database inside Docker, and I don't know which files/directories need to reside in a volume to stay persistent/writable. To keep things simple and fast, can I install the entire database in a volume? Not only the data, but the entire app, all the binaries, etc. I think this would be a fast way to containerize some of my apps.
The JanusGraph subdirectories for binaries, data and logs reside inside a "janusgraph-hadoop" directory.
For example: I would create a volume mounted at /janusgraph-hadoop and run the commands to install all the software inside it (so everything lives in the volume).
Would this be considered bad practice, or is there no problem in doing that?
I know there are already containerized JanusGraph images, but they are not official, and my question is more general: containerizing some apps in a more direct way without having to research which directories need to be in a volume and which do not.
I will not redistribute any of this; it's just for my own use.
At a technical level, nothing would stop you from launching a plain container with an attached volume and installing software there.
docker run -v my_opt:/opt -it --rm ubuntu sh
I wouldn't consider this an especially effective use of Docker. If a colleague wants to use your database installation, you have no way of giving it to them; if you leave the project for six months and come back to it, you'll have no record of how you built this setup. If you are set on this approach, you might find that the networking and snapshotting facilities of more typical virtual machines are a better match for it.
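As a sketch of the more conventional split (the image name and data path here are assumptions, not JanusGraph specifics), the binaries are baked into the image and only the data directory lives in a named volume:
docker run -d --name janusgraph \
  -v janusgraph_data:/opt/janusgraph-hadoop/data \
  my-janusgraph-image
That keeps the installation reproducible from a Dockerfile while the graph data survives container replacement.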
