How to provide a persistent Ubuntu environment with k8s - docker

I can provide an Ubuntu container with SSH using Docker, and users can set up their own environment in it.
For example, a user can apt-get install packages and modify their .bashrc, .vimrc and so on.
When I restart the computer, the user still has the same environment once the restart finishes.
How can I provide the same service with k8s?
When I restart a node, k8s creates another pod on a different computer.
But that environment is based on the initial image, not the user's latest environment.
The naive way is to mount every directory on shared storage (PV + PVC), such as /bin, /lib, /opt, /usr, /etc, /lib64, /root, /var, /home and so on (any of these directories may be affected by an installation). What is the best practice, or is there another way to do this?

@Saket is correct.
If a docker container needs to persist its state (in this case the user changing something inside the container), then that state must be saved somewhere... How would you do this with a VM? Answer: save to disk.
In k8s, storage is represented as a persistent volume (PV). A persistent volume claim (PVC) maintains the relationship between the pod (your code) and the actual storage volume, whose implementation details are abstracted away from you. Recent versions of k8s support dynamic provisioning of persistent volumes, so all you have to do is create a unique PVC for each user when deploying their container (I assume here you have a "Deployment" and "Service" for each user as well).
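As a rough sketch of the per-user PVC idea, the helper below prints a PVC and a pod that mounts it. Everything specific here is an assumption made for illustration: the function name render_user_env, the resource names home-<user> and ssh-<user>, the 5Gi default size, the placeholder image, and the use of a bare Pod instead of a Deployment for brevity. It also assumes the cluster has a default StorageClass so the claim can be provisioned dynamically.

```shell
#!/bin/sh
# Hypothetical helper: prints a PVC plus a pod that mounts it for one user.
# Names, size and image are illustrative assumptions, not values from the
# original answer.
render_user_env() {
  user="$1"
  size="${2:-5Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: home-${user}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
---
apiVersion: v1
kind: Pod
metadata:
  name: ssh-${user}
spec:
  containers:
    - name: ubuntu
      image: example/ubuntu-sshd:latest   # placeholder image name
      volumeMounts:
        - name: home
          mountPath: /home/${user}
  volumes:
    - name: home
      persistentVolumeClaim:
        claimName: home-${user}
EOF
}

# Print the manifests for a user called "alice".
render_user_env alice
```

With cluster access, the output could be piped to kubectl apply -f -. Note that this only persists /home/<user>; packages installed with apt-get into /usr and similar directories would still come from the image.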
Finally, it is unusual to run SSH within a container. Have you considered giving each user their own k8s environment instead? For example, OpenShift is multi-tenant. Indeed, Red Hat is integrating OpenShift as a backend for Eclipse Che, thereby running the entire IDE on k8s. See:
https://openshift.io/

I would advise you to use ConfigMaps (https://github.com/kubernetes/kubernetes/blob/master/docs/design/configmap.md). This guide should help with what you are trying to do: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#configure-all-key-value-pairs-in-a-configmap-as-pod-environment-variables
ConfigMaps also allow you to store scripts, so you could have a .bashrc (or a section of one) stored in a ConfigMap.
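A minimal sketch of what storing a .bashrc section in a ConfigMap might look like. The function name render_bashrc_configmap, the ConfigMap name bashrc-<user>, the key "bashrc" and the two shell lines inside it are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints a ConfigMap holding a .bashrc fragment.
# The name, the key and the fragment contents are illustrative assumptions.
render_bashrc_configmap() {
  user="$1"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: bashrc-${user}
data:
  bashrc: |
    alias ll='ls -alF'
    export EDITOR=vim
EOF
}

# Print the ConfigMap for a user called "alice".
render_bashrc_configmap alice
```

Keep in mind that files mounted from a ConfigMap are read-only inside the pod, so this suits settings you manage for the user rather than changes the user makes interactively.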

Related

How does the data in HOME directory persist on cloud shell?

Do they use environment / config variables to link the persistent storage to the project-related Docker image?
So that every time a new VM is assigned, the Cloud Shell image can be run with those user-specific values?
I'm not sure I caught all your questions and concerns. Cloud Shell has 2 parts:
The container that contains all the installed libraries, language support/SDKs and binaries (docker, for example). This container is stateless, and you can change it (in the settings section of Cloud Shell) if you want to deploy a custom container. For example, this is what the Cloud Run Button does to deploy a Cloud Run service automatically.
The volume dedicated to the current user, which is mounted in the Cloud Shell container.
From this you can deduce that everything you store outside the /home/<user> directory is stateless and does not persist. The /tmp directory, Docker images (pulled or built), ... all of these are lost when Cloud Shell starts on another VM.
Only the volume dedicated to the user is stateful, and it is limited to 5 GB. It's a Linux environment, and you can customize the .profile and .bashrc files as you want. You can store keys in the ~/.ssh/ directory and use all the other tricks that you can do on Linux in your /home directory.

How docker detects which changes should be saved and which not?

I know that when we stop a Docker container our changes are lost. There are many answers on how to prevent this: commit each time. The idea is that when Docker runs, it spins up a fresh container based on the image. On the other hand, a container persists some data after it exits unless you start it with --rm.
Just to simplify:
If you run apt-get install vim, you must commit to save the change.
BUT if you change nginx.conf or upload a new file to HDFS, you do not lose the data.
So, just curious:
How does Docker know what to save and what not? For example, at the end of apt-get install we have new files in the system. The same is true when I upload a new file. For the container/image there is NO difference, right? It is just an I/O modification. So how does Docker know which modifications should be saved when we stop the container?
The basic rules here:
Anything you explicitly store outside the container — a database, S3 — will outlive the container.
If you attach a volume to the container when you create the container using a docker run -v option or a Docker Compose volumes: option, any data written to that directory outlives the container. (If it’s a named volume, it lasts until you docker volume rm it.)
Anything else in the container filesystem is lost as soon as you docker rm the container.
If you need things like your application source code or a helper tool installed in an image, write a Dockerfile to describe how to build the image and run docker build. Check the Dockerfile into source control alongside your application.
The general theory of working with Docker is that you always start from a clean slate. When you docker build an image, you start from a base image and install your application into it; you never try to upgrade an installed application. Similarly, when you docker run a container, you start from a fresh copy of its image.
So the clearest answer to the question you ask is really, if you consistently docker rm a container when you stop it, when you docker run a new container, it will have the base image plus the content from the mounted volumes. Docker will never automatically persist anything outside of this.
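The rules above can be tried out with a short sketch like the following. It assumes a working Docker daemon and the public ubuntu image; the function name volume_demo, the volume name demo-data and the file paths are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch only: needs a working Docker daemon. The volume name "demo-data"
# and the file paths are arbitrary choices for illustration.
volume_demo() {
  docker volume create demo-data

  # First container: write one file into the named volume and one into
  # the container's own filesystem; --rm removes the container on exit.
  docker run --rm -v demo-data:/data ubuntu \
    sh -c 'echo kept > /data/note; echo lost > /opt/note'

  # Second container from the same image: the volume file is still there,
  # while the file written outside the volume is not.
  docker run --rm -v demo-data:/data ubuntu \
    sh -c 'cat /data/note; ls /opt/note 2>/dev/null || echo "/opt/note is gone"'

  # Named volumes last until they are removed explicitly.
  docker volume rm demo-data
}
```

Calling volume_demo on a machine with Docker should show the contents of /data/note from the first container, followed by the message that /opt/note is gone.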
You should never run docker commit: this leads to magic images that can’t be recreated later (in six months when you discover a critical security issue that risks taking your site down). Similarly, you should never install software in a running container, because it will be lost as soon as the container exits; add it to your Dockerfile and rebuild.
For any container on the Docker platform, all generated data and files are temporary by default; nothing persists unless you have mounted part of the host filesystem or attached volumes to the container.
If you are finding that nginx.conf is reused even after changes, I would suggest checking which directories you have mounted or mapped to Docker volumes.
The nginx configuration resides at /etc/nginx/conf.d/*, and you might be mapping a volume to this directory. If so, any changes you make in a running container persist after you remove the container, because the data is written to the volume rather than to the container's writable layer. If you later deploy a new container with the same volume mapping, all the changes you made earlier are reflected in the new container as well.

How do you configure a docker container during development time such that it can be deployed to kubernetes later

I'm configuring a docker container for development purposes with the intent to re-configure it (minimally) for k8s cluster deployment. Immediately I run into the issue of user permissions with volume mounts to my local source directory.
For deployment to the cluster I will bake my source directory into the image, which is really the only change I would want to make for deployment.
I've read many SO articles suggesting running as your local user/group id (1000/1000 in my case).
In docker, writing file to mounted file-system as non-root?
Docker creates files as root in mounted volume
Let non-root user write to linux host in Docker
Understanding user file ownership in docker: how to avoid changing permissions of linked volumes
Is it possible/sane to develop within a container Docker
But all of those questions seem to gloss over a seemingly critical detail. When you use --user to alter your user ID within the Docker container, you lose root, and along with it a lot of functionality; for example, whoami doesn't work. It becomes very cumbersome to test configuration changes in the Docker environment, which is common during development.
The options for developing directly into the docker container seem very limited:
Add user/group 1000/1000 to the docker image, which seems to violate the run-anywhere mantra of docker/kubernetes.
chown all your files constantly during development and use root in the container.
Are there other options, beyond this list, that are more palatable for developing directly in a Docker container?

Backing up docker volumes

I've created a separate volume on an Ubuntu machine with the intention of storing Docker volumes and persisting data. So far, I've created volumes on the host machine for two services (jira and postgres), which I intend to back up offsite. I am using docker-compose like so:
postgres:
  volumes:
    - /var/dkr/pgdata:/var/lib/postgresql/data
And for jira:
  volumes:
    - /var/dkr/jira:/var/atlassian/jira
My thinking is that I could just rsync the /var/dkr folder to a temporary location, tar it and send it to S3. Now that I've read a bit more about host-mounted volumes, I am worried that I might end up with messed-up GIDs and UIDs for the services when I restore from a backup.
My questions are: has Docker resolved this problem in newer versions (I am using the latest)? Is it safe to take this approach? What would be a better way to back up my persistent volumes?
There's no magic solution to uid/gid mapping issues between containers and hosts. It would need to be implemented by the filesystem drivers in the Linux kernel, which is how NFS and some of the VM filesystem mappings work. For "bind" mounts, forcing a uid/gid is not an option from Linux, and Docker is just providing an easy to use interface on top of that.
With your backups, ensure that uid/gid is part of your backup (tar does this by default). Also ensure that the uid/gid being used in your container is defined in the image or specified to a static value in your docker run or compose file. As long as you don't depend on a host specific uid/gid, and restore preserving the uid/gid (default for tar as root), you won't have any trouble.
Worst case, you run something like find /var/dkr -uid $old_uid -exec chown $new_uid {} \; to change your UIDs. The tar command also has options to change the uid/gid on extract (see the man page for more details).
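A small sketch of the backup side, assuming GNU tar. Temporary directories stand in for /var/dkr and the backup location, and the archive name dkr-backup.tar.gz and the sample file are made up for illustration.

```shell
#!/bin/sh
# Sketch: archive a directory tree with numeric uid/gid, then list it.
# Assumes GNU tar. The temporary paths stand in for /var/dkr.
src="$(mktemp -d)"   # stand-in for /var/dkr
out="$(mktemp -d)"   # stand-in for the backup staging directory
mkdir -p "$src/pgdata"
echo 16 > "$src/pgdata/PG_VERSION"

# -p keeps permissions; --numeric-owner records raw uid/gid numbers
# instead of user and group names looked up on this host.
tar --numeric-owner -czpf "$out/dkr-backup.tar.gz" -C "$src" .

# The verbose listing shows the numeric owner of each entry.
tar --numeric-owner -tvzf "$out/dkr-backup.tar.gz"

# To restore, extract as root so the recorded ownership is applied:
#   sudo tar --numeric-owner -xzpf dkr-backup.tar.gz -C /var/dkr
```

Using numeric IDs avoids depending on user names that may map to different IDs on the host you restore to, which matches the advice above about not relying on host-specific uid/gid values.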

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images, which states that the 2 main use cases for using a volume are:
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume at the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents to the host rather than addressing the container. I feel it is easier to use this approach since I can, for example, just add my ssh key once to the host.
My question is: is this an appropriate use of a data volume, and if not, can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive: running the same container on a different machine may result in different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
# Dockerfile
FROM nginx
ADD . /myfiles
# docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with a few simple commands:
docker-compose build
docker-compose up -d
Even better, you could do:
docker build -t me/myapp .
docker push me/myapp
and then deploy with:
docker pull me/myapp
docker run me/myapp
There are a number of ways to update data in containers. Host volumes are a valid approach and probably the simplest way to make your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
You can copy files into a Docker image built from your Dockerfile, which has the same effect as the process above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
I'm not sure I fully understand your request, but why do you need to do that to push files into the Nginx container?
My suggestion is to manage the volume in a separate Docker container, which is what Docker recommends.
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
Reference: Manage data in containers
As said, one of the main reasons to use Docker is to always get the same result. A best practice is to use a data-only container.
With docker inspect <container_name> you can find the path of the volume on the host and update the data manually, but this is not recommended;
or you can retrieve data from an external source, like a git repository.

Resources