Persist container logs after restart - docker

I'm trying to persist the logs of a container running inside a Docker stack. I deploy the whole thing to the swarm using a .yml file, but every solution I come across either does not work or has to be set up manually: every time I deploy the stack I have to mount the volume by hand. What would be the best way to persist the logs automatically, without having to do it manually every time? (Without Kibana etc.)

Deploy an EFK stack in the container platform. Fluentd runs as a daemonset, collecting the logs from all the containers on a host and feeding them to Elasticsearch. Using Kibana you can visualize the logs stored in Elasticsearch.
With Curator you can apply data retention policies depending on the number of days you want to keep the logs.

Kubernetes volumes can be used to write the logs to persistent storage.
There are different solution stacks for shipping, storing and viewing logs.
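If you just need the logs to survive restarts without standing up a full logging stack, a named volume declared in the stack's .yml is enough; Swarm creates it automatically on each node where a task runs, with no manual mounting per deploy. A minimal sketch (the image name and log path are assumptions, adjust for your app):

```yaml
version: "3.8"
services:
  myapp:
    image: myorg/myapp:latest        # hypothetical image
    volumes:
      - app_logs:/var/log/myapp      # assumes the app writes logs here
    deploy:
      replicas: 2

volumes:
  app_logs: {}                       # named volume, created automatically
```

Note that a named volume is local to each node; for logs aggregated across the whole swarm you still need a shipper such as Fluentd, as described above.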

Related

Log to ELK from Nomad without using container technology

We are using HashiCorp Nomad to run microservices on Windows. We have experienced that allocations come and go, but we would like a centralized logging solution (ideally ELK) for all logs from all jobs and tasks across multiple environments. It is quite simple to do with dockerized environments, but how can I do it if I run raw_exec tasks?
There's nothing specific to containers for log shipping other than the output driver. If containers write their logs to volumes, which Nomad can be configured to do, then the answer is the same.
Assuming your raw_exec jobs write logs into the local filesystem, then you need a log shipper product such as Filebeat or Fluentd to watch those paths, then push that data to Elastic / Logstash.
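As a sketch, a minimal Filebeat configuration for that setup might look like the following; the log path and the Elasticsearch host are assumptions for illustration, not values from your environment:

```yaml
# filebeat.yml -- watch the files the raw_exec tasks write, ship to Elastic
filebeat.inputs:
  - type: log
    paths:
      - 'C:\nomad\alloc\*\logs\*.log'   # wherever your tasks write logs

output.elasticsearch:
  hosts: ["elasticsearch.example.com:9200"]
```

Point the output at Logstash instead (`output.logstash`) if you want to parse or enrich the events before indexing.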

Docker Swarm - Should I remove a stack before deploying a stack?

I am not new to Docker, but I am new to Docker Swarm.
Our deployments typically consist of building a new docker image with the latest code, pushing that to our registry and then running docker stack deploy against a compose file.
My question is, do I need to run docker stack rm $STACK_NAME before running the deploy?
I'm not sure if the deploy command for swarm is smart enough to figure out that a docker image has changed and that it needs to do something.
You can redeploy the same stack name without deleting the old stack. If you expect to have services deleted from your compose file, then you'll want to include the --prune option. For any unchanged service, swarm will leave it unmodified. But for any service with changes, including a new image on the registry server, you will see a rolling update performed according to the update config you specify in the compose file.
When you use the default VIP to connect to a service, as long as the service exists, even across rolling updates, the VIP will keep the same IP address so that other containers connecting to your service can do so without worrying about a stale DNS reference. And with a replicated service, the rolling update can prevent any visible outage. The combination of the two give you high availability that you would not have when deleting and recreating your swarm stack.
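In practice the redeploy cycle looks like this; registry, image tag, and stack name are examples:

```shell
# Build and push the new image.
docker build -t registry.example.com/myapp:1.2.0 .
docker push registry.example.com/myapp:1.2.0

# Re-running deploy updates only the services that changed; --prune removes
# services that were deleted from the compose file, and --with-registry-auth
# forwards registry credentials to the worker nodes.
docker stack deploy --prune --with-registry-auth -c docker-compose.yml mystack
```

No `docker stack rm` is needed; removing the stack first would forfeit the rolling update and cause an outage.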

Persistent storage solution for Docker on AWS EC2

I want to deploy a node-red server on my AWS EC2 cluster. I got the docker image up and running without problems. Node-red stores the user flows in a folder named /data. Now when the container is destroyed the data is lost. I have read about several solutions where you can mount a local folder as a volume. What is a good way to deal with persistent data in AWS EC2?
My initial thoughts are to use a S3 volume or mount a volume in the task definition.
It is possible to use a volume driver plugin with docker that supports mapping EBS volumes.
Flocker was one of the first volume managers, it supports EBS and has evolved to support a lot of different back ends.
Cloudstor is Docker's volume plugin (it comes with Docker for AWS/Azure).
Blocker is an EBS only volume driver.
S3 doesn't work well for all file system operations as you can't update a section of an object, so updating 1 byte of a file means you have to write the entire object again. It's also not immediately consistent so a write then read might give you odd/old results.
An EBS volume can only be attached to one instance, which means that you can only run your docker containers on one EC2 instance. Assuming that you would like to scale your solution in the future with many containers running in an ECS cluster, you need to look into EFS. It's a shared file system from AWS. The only issue is the performance degradation of EFS compared to EBS.
The easiest way (and the most common approach) is to run your docker container with the -v /path/to/host_folder:/path/to/container_folder option, so the container refers to the host folder and the information persists after the container is restarted or recreated. Here is the detailed information about the Docker volume system.
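For node-red specifically, a bind mount over the container's /data directory keeps the flows on the host; the host path below is an example, adjust it to your instance:

```shell
# Bind-mount a host folder over /data so flows survive container recreation.
docker run -d -p 1880:1880 \
  -v /home/ec2-user/node-red-data:/data \
  --name mynodered \
  nodered/node-red
```

On a single EC2 instance this is enough; on a cluster, back the path with EFS (mounted on each host) so every instance sees the same data.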
I would use AWS EFS. It is like a NAS in that you can have it mounted to multiple instances at the same time.
If you are using ECS for your docker host the following guide may be helpful http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_efs.html

Docker Volume Containers for database, logs and metrics

I have an application that uses an embedded DB and also generates logs and raw metrics to the following directory structure:
/opt/myapp/data/
    database/
    logs/
    raw_metrics/
I am in the process of learning Docker and am trying to "Dockerize" this app and am trying to find a mounting/volume solution that accomplishes the following goals for me:
The embedded database is stored in the same mounted volume regardless of how many container instances of myapp that I have running. In other words, all container instances write their data to the shared database/ volume; and
I'd also prefer the same for my logs and raw metrics (that is: all container instances write logs/metrics to the same shared volume), except here I need to be able to distinguish log and metrics data for each container. In other words, I need to know that container X generated a particular log message, or that container Y responded to a request in 7 seconds, etc.
I'm wondering what the standard procedure is here in Docker-land. After reading the official Docker docs as well as this article on Docker Volumes my tentative approach is to:
Create a Data Volume Container and mount it to, say, /opt/myapp on the host machine
I can then configure my embedded database to read DB contents from/write them to /opt/myapp/database, and I believe (if I understand what I've read correctly), all container instances will be sharing the same DB
Somehow inject the container ID or some other unique identifier into each container instance, and refactor my logging and metrics code to include that injected ID when generating logs or raw metrics, so that I might have, say, an /opt/myapp/logs/containerX.log file, an /opt/myapp/logs/containerY.log file, etc. But I'm very interested in what the standard practice is here for log aggregation amongst Docker containers!
Also, and arguably much more importantly, is the fact that I'm not sure that this solution would work in a multi-host scenario where I have a Swarm/cluster running dozens of myapp containers on multiple hosts. Would my Data Volume Container magically synchronize the /opt/myapp volume across all of the hosts? If not, what's the solution for mounting shared volumes for containers, regardless of whatever host they're running on? Thanks in advance!
There are multiple good questions here; following are some answers.
The default logging driver used by Docker is json-file, which captures stdout and stderr in JSON format. There are other logging drivers (like syslog, fluentd, LogEntries etc.) that can send logs to a central log server. Using central logging also avoids the problem of maintaining volumes ourselves. All Docker logging drivers are listed here (https://docs.docker.com/engine/admin/logging/overview/#supported-logging-drivers).
If you use Swarm mode with services, there is a concept of service logs, where a service's logs contain the logs of all containers belonging to that service. (https://docs.docker.com/engine/reference/commandline/service_logs/)
Docker logs contain the container ID by default, added by the logging driver. This can be customized using log tags (https://docs.docker.com/engine/admin/logging/log_tags/).
For sharing data across containers, like the database, if the containers are on the same host, we can use host-based volumes. This will not work across nodes as there is no auto-sync. For sharing container data across nodes, we can use either a shared filesystem (like NFS, Ceph, Gluster) or Docker volume plugins (EBS, GCE).
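For example, to distinguish log entries from different instances of the same image, you can set a custom tag; the sketch below uses the fluentd driver (one of the drivers that supports the tag option), and the address, tag format, and image name are assumptions:

```shell
# Tag each container's log stream with its name and ID so a central
# collector can tell instances apart.
docker run -d \
  --log-driver fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="myapp.{{.Name}}.{{.ID}}" \
  myorg/myapp
```

This removes the need to inject a container ID into your application's own logging code: the driver attaches the identity for you.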

Moving Docker Containers Around

I would like to use this Docker container:
https://registry.hub.docker.com/u/cptactionhank/atlassian-confluence/dockerfile/
My concern is that if I have to wind up moving this docker container to another machine (or it quits for some reason and needs to be restarted) that all the data (server config and other items stored on the file system) is lost. How do I ensure that this data isn't lost?
Thanks!
The first rule of Docker containers is don't locate your data inside your application container. Data that needs to persist beyond the lifetime of the container should be stored in a Docker "volume", either mounted from a host directory or from a data-only container.
If you want to be able to start containers on different hosts and still have access to your data, you need to make sure your data is available on those hosts. This problem isn't unique to Docker; it's the same problem you would have if you wanted to scale an application across hosts without using Docker.
Solutions include:
A network filesystem like NFS.
A cluster filesystem like Gluster.
A non-filesystem based data store, like a database, or something like Amazon S3.
This is not necessarily an exhaustive list but hopefully gives you some ideas.
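For this particular image, a named volume is the simplest way to keep the data outside the container; the sketch below assumes the image stores its home directory under /var/atlassian/confluence and listens on port 8090 (check the image's Dockerfile VOLUME and EXPOSE lines to confirm):

```shell
# Keep Confluence's home directory in a named volume so it survives
# container removal; re-running this on the same host reattaches the data.
docker run -d -p 8090:8090 \
  -v confluence-home:/var/atlassian/confluence \
  cptactionhank/atlassian-confluence
```

To move to another machine, the named volume's contents must travel too (e.g. via a backup/restore of the volume, or by putting the data on a network filesystem as listed above).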
