Docker logs filling up on running container

I've got a Docker container currently running in production on a CentOS 7 VM. We have encountered a problem where the container's logs (the files under /var/lib/docker/containers/{container_id}) fill up the host drive over time, causing the container to become unresponsive and forcing us to clear the logs on the host so it can continue processing.
We can't take the container down, meaning I can't just bring it back up using the --log-opt flag to set up some log rotation options.
We've tried logrotate, but the container writes to its logs constantly, and what we often find is that the logs are rotated yet the original file never decreases in size, because it is still being written to while the rotation is underway.
I'm trying to find a solution to this problem where we can set up some kind of task that will clear the logs down to a specific file size. Any help is greatly appreciated.
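A note on the rotation behaviour described above: this is the case that logrotate's copytruncate directive is meant to handle. Instead of renaming the live file (which leaves the writer appending to the same ever-growing inode), copytruncate copies it and then truncates the original in place, so the writer's open file handle stays valid. A minimal sketch, assuming the default json-file logging driver; the config path is hypothetical:

    # /etc/logrotate.d/docker-containers -- hypothetical path, sketch only
    /var/lib/docker/containers/*/*-json.log {
        rotate 5
        size 100M
        missingok
        compress
        # copy the live file, then truncate it in place so the writing
        # process keeps its open file handle:
        copytruncate
    }

Be aware that copytruncate can drop the few lines written between the copy and the truncate, which is usually an acceptable trade-off when the alternative is an unbounded log.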

I would suggest mounting the container's log directory to a host directory; there you can schedule whatever task you like to zip/move/delete the log files...
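A concrete sketch of that suggestion, plus a stop-gap for the container that must stay up. The host paths, image name, and schedule below are assumptions, not values from the question:

    # Once the container can eventually be recreated: bind-mount the app's
    # log directory to the host, where normal rotation tooling can manage it.
    docker run -d --name myapp -v /srv/myapp-logs:/var/log/myapp myimage:latest

    # In the meantime, on the host: truncate the active json-file log in
    # place. truncate(1) keeps the same inode, so Docker's open file handle
    # remains valid and logging continues uninterrupted.
    truncate -s 0 /var/lib/docker/containers/<container_id>/<container_id>-json.log

    # Example cron entry (crontab -e, as root) to cap the log every hour:
    0 * * * * truncate -s 0 /var/lib/docker/containers/<container_id>/<container_id>-json.log

truncate -s 0 is the same trick logrotate's copytruncate uses, minus the copy, so the cleared data is discarded rather than archived.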

Related

Why does Docker randomly throw a 'Permission Denied' error when trying to stop a container?

I am trying to stop a docker container and get the following error:
This happens randomly on occasion, and it is very frustrating to have to restart the docker service and relaunch all my containers.
Would anyone know what could be happening to cause this? As far as I have seen or know, no changes have been made to the containers since they were launched, aside perhaps from some changes to the data inside them. If anyone needs more information, I would be happy to provide it.
FYI everything that I am doing I am doing as a root user.
ALSO -- I ABSOLUTELY CANNOT STOP THE DOCKER DAEMON OR RESTART IT; THIS MUST BE RESOLVED WHILE KEEPING THE CURRENT CONTAINERS OPEN AND RUNNING.

All docker stacks are restarting automatically

I have a multi-service environment hosted with docker swarm, with multiple stacks created. All the running docker containers have a Spring Boot application inside. The issue is that all my stacks get restarted on their own. Now, I know that in the compose file I have set restart_policy to on-failure, hence the auto restart; the problem is that when the services restart, I get errors from a particular service, and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and found out the following:
The Docker daemon was not restarted; I double-checked this against the daemon's uptime.
I checked docker service ps <Service_ID>, and I can see the service showing Shutdown and Starting, but no other information.
I checked docker service logs <Service_ID>, but there is no error in there either.
I checked for a resource crunch; I can assure you that there were plenty of resources available at both the host and the individual container level.
Can someone help with where exactly to find logs for this event? Any other thoughts on this?
My host is actually a VM hosted on VMware vCenter.
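For anyone digging into a similar restart mystery, a few standard places to look before settling on a hypothesis; the service name below is hypothetical:

    # Task history for the service, with the full (untruncated) error column:
    docker service ps mystack_myservice --no-trunc

    # Exit code, OOM flag, and error message of a specific stopped task:
    docker inspect <task_id> --format '{{json .Status}}'

    # Daemon-level events and logs on the node that ran the task:
    docker events --since 24h
    journalctl -u docker.service --since "24 hours ago"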
After a lot of research and going through all the docker logs, I could not find a solution. Later on, I discovered that a memory snapshot was being taken for backup every 24 hours.
Here is what I observe:
Whenever we take a snapshot, all docker services running on the host restart automatically. There are no errors; they simply restart gracefully.
I found some questions already having this problem with VMware snapshots.
As far as I know, when a snapshot is taken, memory is redirected to a different location and the previous state is saved. I have not been able to find out why this causes restarts, but it was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.

Rsyslog can't start inside a docker container

I've got a docker container running a service, and I need that service to send logs to rsyslog. It's an Ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container, and I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in upstart's logs either: /var/log/upstart/ contains only the logs of a few things that started successfully, plus dmesg.log, which simply contains dmesg: klogctl failed: Operation not permitted. From what I can tell, that is due to a docker limitation that cannot really be fixed, and it's unknown whether it is even related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it doesn't suffer from this issue; rsyslog starts and runs in the container just fine there. So the cause is obviously some difference between the hosts, but I don't know where to begin with that: there are LOTS of differences between them (the working one is my local Windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't know which differences could cause this issue and which couldn't.
I've exhausted everything that I know to check. My only option left is to come to stackoverflow and ask for any ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about it, such as the fact that its entrypoint is /sbin/init (which is a very bad practice for docker containers, and is the root cause of all of my troubles). This is also causing some issues with the docker logging driver, which is why I'm stuck using syslog as the logging solution instead.
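On the first question, one avenue worth trying is to bypass upstart entirely and invoke the daemon directly; rsyslogd's own flags are far more talkative than start. A sketch, assuming a shell inside the running container:

    # Validate the configuration and exit (level-1 checks only):
    rsyslogd -N1

    # Run in the foreground with debug output printed to the console:
    rsyslogd -n -d

If rsyslogd starts cleanly this way, the problem lies with upstart inside the container rather than with rsyslog itself, which would narrow the host-difference hunt considerably.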

Why did docker container restart?

I have an application inside a docker-compose setup. On startup, a lot of logging messages are created. When looking at the logs in the morning, I can see those startup messages, but just before them the container was logging production messages.
In the past the application has crashed because of a bug and restarted. Another time there was a memory problem because lots of data was accidentally loaded; there were error messages from Go indicating OutOfMemory errors, and then the container restarted.
But from time to time the container restarts without any indication why. How can I find out the reason why it restarts?
Assuming you have access to the host, I would suggest using volumes to persist the container's entire /var/log somewhere on the host. You can look at these log files to discover reasons for shutdown. Check out this unix.stackexchange post for details on how to do that.
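Complementing the answer above: the Docker daemon itself records how the previous run of a container ended, which is often enough to explain a restart. A sketch; the container name is hypothetical:

    # Exit code, OOM-kill flag, error message, and time of the last stop:
    docker inspect myapp --format \
      '{{.State.ExitCode}} oom={{.State.OOMKilled}} err={{.State.Error}} at {{.State.FinishedAt}}'

    # Stream daemon events (die, oom, restart) for that container:
    docker events --filter container=myapp

An exit code of 137 with oom=true, for instance, points at the kernel's OOM killer rather than an application crash.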

How to stop and restart a Compute Engine VM that runs a Docker container

I'm running a Docker container on Compute Engine, using the Container Image VM property.
However, if I stop and restart the VM, my app works but the logs aren't collected any more.
When I run docker ps I only see my own Docker image. However, for a new VM that hasn't been stopped I also see a container image called gcr.io/stackdriver-agents/stackdriver-logging-agent.
Are there any specific steps I need to take to restore the VM as it was before it was stopped? How can I make logging work again, and are there other differences I should be aware of?
I understand you are running a docker container on Compute Engine, and that when you stop/restart the VM the logs are no longer collected; you also want to know how to restore the VM to its previous form, including the stackdriver-logging-agent.
As described in this article [1], you can use GCE snapshots to create backups of persistent disks attached to the instance, including boot volumes. This is useful for backing up your data, recreating a disk that might have been lost, or copying a persistent disk. That said, this is currently the only method for recovering a deleted disk.
Therefore, unfortunately, if no snapshots were taken of the VM's disk(s) beforehand, the deleted disk volume cannot be recovered; this process is irreversible [2].
In the future, you can set the disk's 'auto-delete' property [3] to no when creating an instance; this way the disk will remain even if the instance is deleted.
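A sketch of that flag; the instance and disk names here are hypothetical:

    # Attach an existing disk with auto-delete disabled at creation time:
    gcloud compute instances create my-vm \
        --disk=name=my-data-disk,auto-delete=no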
As for the logging agent image, it's a container image that streams logs from your VM instances and from selected third-party software packages to Stackdriver Logging. It is a best practice to run the Logging agent on all your VM instances, and it answers your question as to why the logs aren't appearing anymore: it is this agent that records them and sends them to Stackdriver Logging, so with it missing nothing is collected.
For the logs not being collected, you can try the following to reset the service. Please do this on your affected Windows instance:
1. Stop the "StackdriverLogging" service. You can do this from the command line with net stop StackdriverLogging.
2. Navigate to the following directory: C:\Program Files (x86)\Stackdriver\LoggingAgent\Main\pos\winevtlog.pos\worker0
3. Remove the file storage.json located in that directory.
4. Restart the StackdriverLogging service: execute net start StackdriverLogging from the command line.
This should reset the logging agent's state and make logging functional again.
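For convenience, the same procedure as a single console session (paths exactly as given above):

    net stop StackdriverLogging
    del "C:\Program Files (x86)\Stackdriver\LoggingAgent\Main\pos\winevtlog.pos\worker0\storage.json"
    net start StackdriverLogging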
[1] https://cloud.google.com/compute/docs/disks/create-snapshots
[2] https://cloud.google.com/compute/docs/disks/#pdspecs
[3] https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--disk
