How to work with variables in DOCKERFILE - docker

I am creating an NGINX container. I want to write all logs to a mounted volume rather than the default location. I can achieve this by updating nginx.conf so that access_log and error_log point to a folder in the mounted volume. The twist is that I want each container to write to a container-specific folder within the mounted volume.
For example:
Container image name: mycontainerapp
Mounted volume: /logdirectory
Then I want:
/var/log to point to /logdirectory/mycontainerapp/{containerID}/log
This way, I can have multiple containers log to the common mounted volume.
AFAIK, I can get the container ID from /proc/1/cpuset.
I am not aware of any other way to get the container ID.
The question is: how can I read that container ID and use it to create the mounted volume (with the folder name) using a Dockerfile?
Also, if there is a better approach to what I am trying to achieve, please let me know, as I am new to Docker.

Docker includes a logging mechanism that removes standard log files from the equation. All data sent to stdout and stderr is captured by Docker's logging interface.
There are a number of logging drivers that can then ship logs from your Docker host to a central logging service (Graylog, Syslog, AWS CloudWatch, ETW, Fluentd, Google Cloud, Splunk). The default is the json-file driver, which stores logs locally on the Docker host. Logs sent to journald are also stored and accessible locally.
In the nginx config, or in any container for that matter, send the access log to stdout or /dev/fd/1 and the error log to stderr or /dev/fd/2:
daemon off;
error_log /dev/fd/2 info;
http {
    access_log /dev/fd/1;
    ...
}
Once you apply this concept to all containers, log management requirements are removed from the container/application level and pushed up to the host. Container metadata can be attached to logs. It becomes easier to move or change the logging mechanism, and moving to clustered setups like Swarm becomes less of a hassle. This all ties into the one-process-per-container view of the world that Docker pushes.
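If file-based logs on the shared volume are still needed, note that a Dockerfile is evaluated at build time, before any container (and therefore any container ID) exists. The per-container folder has to be created when the container starts, for example from an entrypoint script. Below is a minimal sketch of the path handling, assuming the /logdirectory/mycontainerapp/{containerID}/log layout from the question. The function names are hypothetical. By default a container's hostname is the short container ID unless it is overridden, and /proc/1/cpuset only carries the ID on some hosts (on cgroup v2 hosts it may contain just /), so the commented usage falls back to hostname.

```shell
#!/bin/sh
# Hypothetical helpers for an entrypoint script; the names are illustrative.

# Print the last path component of a cpuset string such as
# "/docker/<container-id>". Prints an empty line if the input is just "/".
container_id_from_cpuset() {
    printf '%s\n' "${1##*/}"
}

# Build the per-container log directory path.
#   $1 = mounted volume root, $2 = image/app name, $3 = container ID
container_log_dir() {
    printf '%s/%s/%s/log\n' "$1" "$2" "$3"
}

# Possible use inside the container at start-up (left commented out here):
#   id=$(container_id_from_cpuset "$(cat /proc/1/cpuset)")
#   [ -n "$id" ] || id=$(hostname)
#   log_dir=$(container_log_dir /logdirectory mycontainerapp "$id")
#   mkdir -p "$log_dir"
```

With these definitions, container_log_dir /logdirectory mycontainerapp abc123 prints /logdirectory/mycontainerapp/abc123/log. The nginx configuration would still need to point access_log and error_log at that directory, for instance by generating it from a template at start-up.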

Related

Is there a way to get an optional bind mount in docker swarm

I have a swarm service that bind-mounts a file that may not exist. If the file does not exist the service fails to deploy (and I get logs complaining about the missing file). I would prefer to have the service deploy anyway, just missing that mount. Is there a way to let that happen?
The file being mounted is a unix socket to a local memcached instance. The app can run without it and we don't run memcached on every node, so I'd like to allow the service to deploy even if the bind mount fails (for example, if the ideal node goes down and the service has to move to another node that doesn't run memcached).
I realize I could move the mount point to a directory that will always exist on every host machine, but I'd prefer to keep the bind mount exposure minimal if possible.
Recently I had a similar scenario: I set up an NFS server on one node and mounted it on every swarm node, so the files are always available at the same path.

Scaling filebeat over docker containers

I’m looking for the appropriate way to monitor application logs produced by nginx, tomcat and springboot running in Docker, using filebeat and ELK.
In the container strategy, a container should be used for only one purpose.
That means one nginx per container and one tomcat per container, so we can’t have an additional filebeat inside an nginx or tomcat container.
From what I have read on the Internet, we could have the following setup:
a volume dedicated to storing logs
an nginx container that mounts the dedicated logs volume
a tomcat / springboot container that mounts the dedicated logs volume
a filebeat container that also mounts the dedicated logs volume
This works fine, but when it comes to scaling out the nginx and springboot containers, it gets a little more complex for me.
Which pattern should I use to push my logs to logstash using filebeat if I have the following configuration:
several nginx containers in load balancing with the same configuration (the logs configuration is the same: same path)
several springboot REST API containers behind the nginx containers with the same configuration (the logs configuration is the same: same path)
Should I create one volume per set of nginx + springboot REST API containers and add a filebeat container?
Should I create a global log volume shared by all my containers, with a different log filename per container (with the container name in the log filename?), and have only one filebeat container?
In the second proposal, how do I scale filebeat?
Is there another way to do that?
Many thanks for your help.
The easiest thing to do, if you can manage it, is to set each container process to log to its own stdout (you might be able to specify /dev/stdout or /proc/1/fd/1 as a log file). For example, the Docker Hub nginx Dockerfile specifies
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
&& ln -sf /dev/stderr /var/log/nginx/error.log
so the ordinary nginx logs become the container logs. Once you do that, you can plug in the filebeat container input to read those logs and process them. You can also see them from outside the container with docker logs; they are the same logs.
What if you have to log to the filesystem? Or there are multiple separate log streams you want to be able to collect?
If the number of containers is variable, but you have good control over their configuration, then I'd probably set up a single global log volume as you describe and use the filebeat log input to read every log file in that directory tree.
If the number of containers is fixed, then you can set up a volume per container and mount it in each container's "usual" log storage location. Then mount all of those directories into the filebeat container. The obvious problem here is that if you do start or stop a container, you'll need to restart the log manager for the added/removed volume.
If you're actually on Kubernetes, there are two more possibilities. If you're trying to collect container logs out of the filesystem, you need to run a copy of filebeat on every node; a DaemonSet can manage this for you. A Kubernetes pod can also run multiple containers, so your other option is to set up pods with both an application container and a filebeat "sidecar" container that ships the logs off. Set up the pod with an emptyDir volume to hold the logs, and mount it into both containers. A template system like Helm can help you write the pod specifications without repeating the logging sidecar setup over and over.

Filebeat to monitor logs of several containers which are inside the containers

I have one question: is there any way to ship the logs of each container when the log files are located inside the containers? The current flow only ships the log files located in the default path (/var/lib/docker/containers/*/*.log). I want to customize filebeat.yaml to ship the logs from inside each container to logstash instead of from the default path.
If you can set your containers to log to stdout rather than to files, it looks like filebeat has an autodiscover mode which will capture the docker logs of every container.
Another common setup in an ELK world is to configure logstash on your host, and set up Docker's logging options to send all output on containers' stdout into logstash. This makes docker logs not work, but all of your log output is available via Kibana.
If your container processes always write to log files, you can use the docker run -v option or the Docker Compose volumes: option to mount a host directory on to an individual container's /var/log directory. Then the log files will be visible on the host, and you can use whatever file-based collector to capture them. This is in the realm of routine changes that will require you to stop and delete your existing containers before starting them with different options.
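A minimal sketch of the docker run -v variant, assuming an image that writes its log files under /var/log. The function name run_with_host_logs, its argument order and the example paths are illustrative assumptions, not part of any Docker tooling. Each container gets its own subdirectory under a common host directory, so a file-based collector on the host can tell the containers apart by path.

```shell
#!/bin/sh
# Hypothetical wrapper: start a container with a per-container host
# directory bind-mounted over its /var/log.
#   $1 = container name, $2 = image, $3 = host log root
# The host log root should be an absolute path so that -v treats it as a
# bind mount rather than as a volume name.
run_with_host_logs() {
    name=$1
    image=$2
    host_log_root=$3
    docker run -d \
        --name "$name" \
        -v "$host_log_root/$name:/var/log" \
        "$image"
}

# Possible use (left commented out here):
#   run_with_host_logs web1 nginx /srv/container-logs
#   run_with_host_logs web2 nginx /srv/container-logs
```

A collector on the host could then watch /srv/container-logs (an example path) recursively. Changing the mount on an existing container still means removing and recreating it, as noted above.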

Running filebeat on docker host OS and collecting logs from containers

I have a server that is the host OS for multiple docker containers. Each of the containers contains an application that is creating logs. I want these logs to be sent to a single place by using the syslog daemon, and then I want filebeat to transmit this data to another server. Is it possible to install filebeat on the HOST OS (without making another container for filebeat), and make the containers applications' log data be collected by the syslog daemon and then consolidated in /var/log on the host OS? Thanks.
You need to share a volume with every container in order to get your logs in the host filesystem.
Then you can install filebeat on the host and forward the logs wherever you want, as if they were "standard" log files.
Please be aware that Docker containers usually do not write their logs to real log files, but to stdout. That means you'll probably need custom images in order to fix this logging problem.

Docker Volume Containers for database, logs and metrics

I have an application that uses an embedded DB and also generates logs and raw metrics to the following directory structure:
/opt/myapp/data/
    database/
    logs/
    raw_metrics/
I am in the process of learning Docker and am trying to "Dockerize" this app. I am looking for a mounting/volume solution that accomplishes the following goals for me:
The embedded database is stored in the same mounted volume regardless of how many container instances of myapp I have running. In other words, all container instances write their data to the shared database/ volume; and
I'd also prefer the same for my logs and raw metrics (that is: all container instances write logs/metrics to the same shared volume), except here I need to be able to distinguish log and metrics data for each container. In other words, I need to know that container X generated a particular log message, or that container Y responded to a request in 7 seconds, etc.
I'm wondering what the standard procedure is here in Docker-land. After reading the official Docker docs as well as this article on Docker Volumes my tentative approach is to:
Create a Data Volume Container and mount it to, say, /opt/myapp on the host machine
I can then configure my embedded database to read DB contents from/write them to /opt/myapp/database, and I believe (if I understand what I've read correctly), all container instances will be sharing the same DB
Somehow inject the container ID or some other unique identifier into each container instance, and refactor my logging and metrics code to include that injected ID when generating logs or raw metrics, so that I might have, say, an /opt/myapp/logs/containerX.log file, an /opt/myapp/logs/containerY.log file, etc. But I'm very interested in what the standard practice is here for log aggregation amongst Docker containers!
Also, and arguably much more importantly, is the fact that I'm not sure that this solution would work in a multi-host scenario where I have a Swarm/cluster running dozens of myapp containers on multiple hosts. Would my Data Volume Container magically synchronize the /opt/myapp volume across all of the hosts? If not, what's the solution for mounting shared volumes for containers, regardless of whatever host they're running on? Thanks in advance!
There are multiple good questions here. Here are some answers.
The default logging driver used by Docker is json-file. It captures stdout and stderr in JSON format. There are other logging drivers (such as syslog, fluentd, LogEntries, etc.) that can send logs to a central log server. Using central logging also avoids the problem of maintaining volumes ourselves. All Docker logging drivers are listed here (https://docs.docker.com/engine/admin/logging/overview/#supported-logging-drivers).
If you use Swarm mode with services, there is a concept of service logging, where the service logs contain the logs of all containers associated with the service (https://docs.docker.com/engine/reference/commandline/service_logs/).
Docker logs contain the container ID by default; it is added by the logging driver. This can be customized using log options (https://docs.docker.com/engine/admin/logging/log_tags/).
For sharing data such as a database across containers on the same host, we can use host-based volumes. This will not work across nodes, as there is no automatic synchronization. For sharing container data across nodes, we can use either a shared filesystem (such as NFS, Ceph or Gluster) or Docker volume plugins (EBS, GCE).
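The logging-driver and tag options mentioned above can be sketched as a docker run invocation. This is only an illustration, assuming the syslog driver and a host syslog daemon that accepts the messages. The function name run_with_syslog_tag and the chosen tag layout are assumptions; the {{.ImageName}}, {{.Name}} and {{.ID}} placeholders are template fields documented for the tag log option.

```shell
#!/bin/sh
# Hypothetical wrapper: start a container whose stdout/stderr are sent to
# syslog, tagged with image name, container name and short container ID.
#   $1 = container name, $2 = image
run_with_syslog_tag() {
    name=$1
    image=$2
    docker run -d \
        --name "$name" \
        --log-driver syslog \
        --log-opt tag="{{.ImageName}}/{{.Name}}/{{.ID}}" \
        "$image"
}

# Possible use (left commented out here):
#   run_with_syslog_tag myapp1 myapp
```

With a tag like this, each log line carries enough metadata to tell which container produced it, without the application having to inject an ID itself.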
