Persisting docker container logs in kubernetes - docker

I’m looking for a really simple, lightweight way of persisting logs from a docker container running in kubernetes. I just want the stdout (and stderr I guess) to go to persistent disk, I don’t want anything else for analysing the logs, to send them over the internet to a third party, etc. as part of this.
Having done some reading I’ve been considering a DaemonSet with the application container, but then another container which has /var/lib/docker/containers mounted and also a persistent volume (maybe NFS) mounted too. That container would then need a way to copy logs from the default docker JSON logging driver in /var/lib/docker/containers to the persistent volume, maybe rsync running regularly.
Would that work? Presumably, if the rsync container goes down it's going to miss stuff because nothing's queuing, but perhaps that's OK rather than trying to queue potentially huge amounts of logs. Is this a sensible approach for the desired outcome? It's only for one or two containers if that makes a difference. Thanks.

Fluentd supports a simple file output plugin (https://docs.fluentd.org/output/file) which you can easily aim at a PersistentVolume mount. Otherwise you would configure Fluentd (or Fluent Bit if you prefer) just as you normally would for Kubernetes, so find your favorite guide and follow it.
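For comparison, the periodic copy step described in the question could be sketched roughly as below. This is only an illustration, not what rsync or Fluentd actually do: the function name copy_new_bytes is made up, the paths in the demo are temporary stand-ins for /var/lib/docker/containers and the persistent volume, and it assumes the source log is append-only and never rotated.

```python
import os
import tempfile
from pathlib import Path


def copy_new_bytes(src: Path, dst: Path, chunk_size: int = 64 * 1024) -> int:
    """Append to dst the bytes src has gained since the previous call.

    The current size of dst doubles as the resume offset, so no separate
    state file is needed. Assumes src is append-only and is not rotated.
    """
    offset = dst.stat().st_size if dst.exists() else 0
    copied = 0
    with src.open("rb") as fin, dst.open("ab") as fout:
        fin.seek(offset)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # ask the OS to flush the copy to the volume
    return copied


if __name__ == "__main__":
    # Hypothetical stand-ins for a container log file and the mounted volume.
    workdir = Path(tempfile.mkdtemp())
    src, dst = workdir / "container-json.log", workdir / "persisted.log"
    src.write_bytes(b'{"log":"first\\n"}\n')
    print(copy_new_bytes(src, dst))  # bytes copied on the first pass
    with src.open("ab") as f:
        f.write(b'{"log":"second\\n"}\n')
    print(copy_new_bytes(src, dst))  # only the newly appended bytes
```

Run on a timer, this loses nothing while the copier is down as long as the source file still exists when it comes back, but log rotation on the node would break the offset assumption, which is one reason a purpose-built tailer is usually preferred.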

Related

Analyze container file system on kubernetes after it exits/crashes

Perhaps a silly question:
In a Kubernetes deployment (or minikube), when a pod container crashes, I would like to analyze the file system as it was at that moment. That way I could see core dumps or any other useful information.
I know that I could mount a volume or PVC to get core dumps from a host-defined core pattern location, and I could also get logs by means of an rsyslog sidecar or some other way, but I would still like to do "post-mortem" analysis if possible. I assume that Kubernetes provides some mechanism for this kind of forensics (I don't know how, which is the reason for my question), because in a production system we may need to analyze killed/exited containers.
I tried playing directly with docker run without the --rm option, but could not get anything useful from inspection, nor recreate the file system as it was at the last moment the container was alive.
Thank you very much!
When a pod container crashes, I would like to analyze the file system at that moment.
Pods (containers) natively use non-persistent storage.
When a container exits/terminates, so does the container's storage.
A pod (container) can be connected to external storage. This allows persistent data to be stored (for example, you can configure a volume mount as the path for core dumps). Since this external storage is not removed when a container is stopped/killed, it gives you more flexibility to analyze the file system afterwards. The storage can be backed by commonly used file systems such as NFS.

Where should production critical and non-production non-critical data be stored?

I was asked this question in an interview and I'm not sure of the correct answer, hence I would like your suggestions.
I was asked whether we should persist production critical data inside of the docker instance or outside of it. What would be my choice and the reasons for it?
Would your answer differ in case we have non-prod, non-critical data?
Back your answers with reasons.
Most data should be managed externally to containers and container images. I tend to view data constrained to a container as temporary (intermediate|discardable) data. Otherwise, if it's being captured but it's not important to my business, why create it?
The name "container" is misleading. Containers aren't like VMs where there's a strong barrier (isolation) between VMs. When you run multiple containers on a single host, you can enumerate all their processes using ps aux on the host.
There are good arguments for maintaining separation between processes and data and running both within a single container makes it more challenging to retain this separation.
Unlike processes, files in container layers are more isolated though. Although the layers are manifest as files on the host OS, you can't simply ls a container layer's files from the host OS. This makes accessing the data in a container more complex. There's also a performance penalty for effectively running a file system atop another file system.
While it's common and trivial to move container images between machines (viz docker push and docker pull), it's less easy to move containers between machines. This isn't generally a problem for moving processes as these (config aside) are stateless and easy to move and recreate, but your data is state and you want to be able to move this data easily (for backups, recovery) and increasingly to move amongst a dynamic pool of nodes that perform processing upon it.
Less importantly but not unimportantly, it's relatively easy to perform the equivalent of a rm -rf * with Docker by removing containers (docker container rm ...) and thereby deleting the application and your data.
The two most basic considerations you should have here:
Whenever a container gets deleted, everything in the container filesystem is lost.
It's extremely common to delete containers; it's required to change many startup options or to update a container to a newer image.
So you don't really want to keep anything "in the container" as its primary data storage: it's inaccessible from outside the container, and will get lost the next time there's a critical security update and you must delete the container.
In plain Docker, I'd suggest keeping
...in the image: your actual application (the compiled binary or its interpreted source as appropriate; this does not go in a volume)
...in the container: /tmp
...in a bind-mounted host directory: configuration files you need to push into the container at startup time; directories of log files produced by the container (things where you as an operator need to directly interact with the files)
...in either a named volume or bind-mounted host directory: persistent data the container records in the filesystem
On this last point, consider trying to avoid this layer altogether; keeping data in a database running "somewhere else" (could be another container, a cloud service like RDS, ...) simplifies things like backups and simplifies running multiple replicas of the same service. A host directory is easier to back up, but in some environments (macOS) it's unacceptably slow.
My answers don't change here for "production" vs. "non-production" or "critical" vs. "non-critical", with limited exceptions you can justify by saying "it's okay if I lose this data" ("because it's not the master copy of it").

logging nginx events from a docker container managed by kubernetes

Currently, to my understanding, kubernetes offers no logging solution of its own, and it also does not allow one to specify the logging driver when using docker as the container technology, due to scope encapsulation concerns.
This leaves folks with the ugly solution of tailing json logs from shared volumes using either fluentd, filebeat, or some other file tailing daemon, parsing these, then sending them to the desired storage backend.
My question is, is there any repo or public knowledge config store for this type of scenario from people that have gone through this before? My use case would involve tailing the logs of an nginx docker image, and writing out the fluentd/grok pattern myself seems really painful, plus I wouldn't want to struggle with an issue already solved by someone else.
Thanks
We tried logdna and the integration with k8s is pretty solid. Most of the time I just tail the log of some container using kubectl logs -f [CONTAINER_ID]. I'm guessing you're looking for a persistent approach.
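For what it's worth, the "tail the json logs and parse them" step from the question is not much code for the outer envelope. Docker's json-file driver writes one JSON object per line with "log", "stream" and "time" keys; the sketch below unpacks that envelope only and leaves the nginx line itself unparsed. The names LogRecord and parse_json_file_lines and the sample line are made up for illustration, and clusters that do not use Docker's json-file driver write a different format.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class LogRecord(NamedTuple):
    time: str
    stream: str   # "stdout" or "stderr"
    message: str  # the raw application line, e.g. an nginx access log entry


def parse_json_file_lines(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Yield one record per well-formed json-file log line."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are truncated or not JSON
        if not isinstance(entry, dict):
            continue
        yield LogRecord(
            time=entry.get("time", ""),
            stream=entry.get("stream", ""),
            message=entry.get("log", "").rstrip("\n"),
        )


if __name__ == "__main__":
    # Hypothetical sample line in the json-file layout.
    sample = [
        '{"log":"GET /index.html 200\\n","stream":"stdout",'
        '"time":"2020-01-01T00:00:00Z"}'
    ]
    for record in parse_json_file_lines(sample):
        print(record.stream, record.message)
```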

Is it "safe" to commit a running container in docker?

As the title goes, safe means... the proper way?
Safe = consistent, no data loss, professional, legit way.
Hope to share some experiences with pro docker users.
Q. Commit is safe for running docker containers (with the exception of rapidly changing realtime stuff and database stuff, your own commentary is appreciated.)
Yes or No answer is accepted with comment. Thanks.
All filesystem changes made by a container are saved inside the container instance (the contents of memory are not captured by docker commit). As long as you don't use any external mounts/docker volumes or external servers (externally connected DBs?), you should never get in trouble for stopping/restarting and committing containers. Please read on to go more in depth on this topic.
A question that you might want to ask yourself initially is: how does docker store changes that it makes to its disk at runtime? It's really worth checking out how docker actually manages to get this working. The original state of the container's hard disk is what is given to it from the image. It can NOT write to this image. Instead of writing to the image, a diff is made of what has changed in the container's internal state in comparison to what is in the docker image.
Docker uses a technology called "Union Filesystem", which creates a diff layer on top of the initial state of the docker image.
This "diff" (the writable container layer) is stored on the host's disk, separately from the image, and disappears when you delete your container. When you use docker commit, the contents of that writable layer are stored inside a new image. However, I don't recommend this: the state of your new docker image is not represented in a dockerfile and cannot easily be regenerated from a rebuild. Making a new dockerfile should not be hard, so that is always the way to go for me personally.
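As a rough mental model only (this is not how Docker's storage drivers are implemented), the read-only image plus writable diff layer behaves a lot like Python's ChainMap: reads fall through to the image, writes land in the top layer, and a commit flattens the two into a new image. The helper names new_container and commit and the file contents are made up for the illustration.

```python
from collections import ChainMap


def new_container(image: dict) -> ChainMap:
    """Stack an empty writable layer on top of a read-only image layer."""
    return ChainMap({}, image)


def commit(container: ChainMap) -> dict:
    """Flatten image + writable layer into a new image (like docker commit)."""
    return dict(container)


if __name__ == "__main__":
    # Hypothetical image contents: path -> file content.
    image = {"/etc/app.conf": "port=80", "/bin/app": "<binary>"}

    container = new_container(image)
    container["/etc/app.conf"] = "port=8080"  # the write goes to the top layer
    container["/data/out.txt"] = "hello"

    print(container.maps[0])        # the diff: only what the container changed
    print(image["/etc/app.conf"])   # the image itself is untouched
    print(commit(container))        # the merged view a committed image would hold
```

Deleting the container corresponds to throwing away the top dictionary, which is why anything that only lives in the diff is gone at that point.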
When your docker container is working with mounted volumes or external servers/DBs, you might want to make sure you don't get out of sync, and temporarily stop the services inside the container before committing. When you use a dockerfile you can start up a bootstrap shell script inside your container to start up connections, perform checks and initialize the running process to get your application durably set up. Again, running a committed container makes it harder to do something like this.

Docker container behavior when used in production

I am currently reading up on Docker. From what I understand, a container which is based on an image saves only the changes. If I were to use this in a production setup, does it persist it as soon as changes are written to disk by applications running "inside" the container or does it have to be done manually?
My concern is - what if the host abruptly shuts down? Will all the changes be lost?
The theory is that there's no real difference between a Docker container and a classical VM or physical host in most situations.
If the host abruptly dies, you can lose recent data using a container just as you can using a physical host:
your application may not have decided to really send the write operation to save the data on disk,
the Operating System may have decided to wait a bit before sending data to storage devices,
the filesystem may not have finished the write,
the data may not have been really flushed to the physical storage device.
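The first two points in that list are the ones an application can influence, in a container or out of one. A minimal sketch, assuming a Python application (the function name durable_write and the demo path are made up): flush the userspace buffer, then ask the kernel to push its cached pages to the device with os.fsync. It does not cover the last point, since a device with its own volatile cache can still lose data, and for a newly created file the containing directory may need syncing as well.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and request that it reach the storage device before returning."""
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's userspace buffer
        f.flush()              # hands the bytes to the operating system
        os.fsync(f.fileno())   # asks the OS to write its cache to the device


if __name__ == "__main__":
    # Hypothetical target file in a temporary directory.
    target = os.path.join(tempfile.mkdtemp(), "state.bin")
    durable_write(target, b"important state")
    print(os.path.getsize(target))
```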
Now by default, Docker uses AUFS (a stackable filesystem), which works at the file level. (Newer Docker releases default to overlay2 instead, which also copies at the file level.)
If you're writing to a file that already existed in the Docker image, AUFS will first copy this base file to the upper, writable layer (container), before writing your change. This causes a delay depending on the size of the original file.
I guess that if a power cut occurs while this original file is being copied and before your changes have been written, then that would be one reason to get more data loss with a Docker container than with any "classical" host.
You can move your critical data to a Docker "volume", which would be a regular filesystem on the host, bind-mounted into the container. This is the recommended way to deal with important data that you want to keep across container deployments.
To mitigate the AUFS potential issue, you could tell Docker to use LVM thin provisioning block devices instead of AUFS (wipe /var/lib/docker and start the daemon with docker -d -s devicemapper). However I don't know if this storage backend received as much testing as the default AUFS one (it works ok for me though).
