I have an application running under docker-compose. On startup it writes a lot of log messages. When I look at the docker-compose logs in the morning, I can see those startup messages directly after normal production messages, so the container must have restarted overnight.
In the past the application has crashed because of a bug and restarted. Another time there was a memory problem because a lot of data was accidentally loaded; Go printed out-of-memory error messages, and then the container restarted.
But from time to time the container restarts without any indication why. How can I find out the reason why it restarts?
Assuming you have access to the host, I would suggest using volumes to persist the container's entire /var/log somewhere on the host. You can look at these log files to discover reasons for shutdown. Check out this unix.stackexchange post for details on how to do that.
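Besides the application's own log files, the Docker daemon records container events that can hint at why a container stopped. Below is a minimal sketch, not a definitive tool, that summarizes "die" and "oom" events from the JSON lines printed by docker events with --format '{{json .}}'. It assumes the usual event layout (Action, Actor.Attributes.name, Actor.Attributes.exitCode, time); the function name summarize_stops and the hint texts are my own, and the daemon only keeps a limited event history, so very old restarts may be missing.

```python
import json
from datetime import datetime, timezone

# Exit codes above 128 conventionally mean "128 + signal number".
SIGNAL_HINTS = {
    137: "SIGKILL: docker kill, OOM killer, or stop timeout",
    143: "SIGTERM: docker stop, or host/daemon shutdown",
}


def summarize_stops(event_lines):
    """Return one readable line per container 'die' or 'oom' event.

    `event_lines` is an iterable of JSON strings, one event per line,
    in the layout assumed above (hypothetical helper, not a Docker API).
    """
    report = []
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        action = event.get("Action")
        if action not in ("die", "oom"):
            continue  # ignore start, create, attach, ...
        attrs = event.get("Actor", {}).get("Attributes", {})
        name = attrs.get("name", "?")
        when = datetime.fromtimestamp(event.get("time", 0), tz=timezone.utc)
        stamp = when.strftime("%Y-%m-%d %H:%M:%S UTC")
        if action == "oom":
            report.append(f"{stamp} {name}: out-of-memory event")
            continue
        code = int(attrs.get("exitCode", -1))
        hint = SIGNAL_HINTS.get(code, "application exit")
        report.append(f"{stamp} {name}: exited with code {code} ({hint})")
    return report
```

To use it, capture the events for the night in question (bounding the command with --since and --until so it returns instead of streaming) and pass the lines to summarize_stops. An exit code of 137 or 143 with no application error in the logs suggests the process was stopped from outside rather than crashing on its own.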
Related
I am trying to stop a docker container and get the following error:
This happens randomly on occasion, and it is very frustrating to have to restart the docker service and relaunch all my containers.
Would anyone know what could be causing this? As far as I know, no changes have been made to the containers since they were launched, except maybe some changes to the data inside them. If people need more information, I would be happy to provide it.
FYI everything that I am doing I am doing as a root user.
Also, I absolutely cannot stop or restart the Docker daemon. This must be resolved while keeping the current containers up and running.
I have a multi-service environment hosted with docker swarm, made up of multiple stacks. Every running container hosts a Spring Boot application. The issue is that all my stacks restart on their own. I know that I set the restart_policy condition to on-failure in the compose file, which is why they come back up automatically. The problem is that when the services restart, I get errors from one particular service, and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and found out about these things.
Docker daemon is not restarted. I double-checked this with the uptime of the docker daemon.
I checked docker service ps <Service_ID>, where I can see the tasks going through shutdown and starting, but no other information.
I checked docker service logs <Service_ID>, but there are no errors there either.
I checked for a resource crunch. Plenty of resources were available, both on the host and at the level of each container.
Can someone help me find where exactly the logs for this event are? Any other thoughts on this?
My host is actually a VM hosted on VMware vCenter.
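One thing worth trying for the checks above: the Error column of docker service ps is truncated by default, so running it with --no-trunc and --format '{{json .}}' can surface text that the normal table hides. The sketch below filters that JSON output for tasks that are no longer running or that carry an error. It assumes the field names Name, Node, CurrentState and Error in the JSON output; tasks_with_errors is a hypothetical helper name, and a task that was shut down gracefully may still show no error at all.

```python
import json


def tasks_with_errors(ps_lines):
    """Pick out swarm tasks that stopped, keeping the full Error text.

    `ps_lines` is an iterable of JSON strings, one task per line,
    using the field names assumed above.
    """
    found = []
    for line in ps_lines:
        line = line.strip()
        if not line:
            continue
        task = json.loads(line)
        state = task.get("CurrentState", "")
        error = task.get("Error", "")
        # Running tasks without an error are healthy; skip them.
        if state.startswith("Running") and not error:
            continue
        found.append({
            "name": task.get("Name", "?"),
            "node": task.get("Node", "?"),
            "state": state,
            "error": error or "(none recorded)",
        })
    return found
```

If every stopped task comes back with "(none recorded)", as in the situation described here, that points away from the application and towards something outside the container (the node, the daemon, or the hypervisor) asking the tasks to stop.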
After a lot of research and going through all docker logs, I could not find the solution. Later on, I discovered that there was a memory snapshot taken for backup every 24 hours.
Here is what I observe:
Whenever we take a snapshot, all docker services running on the host restart automatically. No errors are logged; the services simply restart gracefully.
I found some questions already having this problem with VMware snapshots.
As far as I know, when we take a snapshot, it points to a different memory location and saves the previous one. I have not been able to find out why this triggers the restarts, but it was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.
I've got a simple docker development setup for Airflow that includes separate containers for the Airflow UI and Worker. I'm encountering a 403 Forbidden error whenever I attempt to view the log for a task in the Airflow UI.
So far I've ensured they all have the same secret key (in fact, using Docker Volumes they're all reading the exact same configuration file) but this doesn't seem to help. I haven't done anything about time sync, but I'd expect that docker containers would effectively be sharing the system clock anyway so I don't see how they'd get out of sync in the first place.
I can find the log file on the airflow worker, and it has run successfully - but something is obviously missing that should be allowing the airflow UI to display that (and it would be much more convenient for my workflow to be able to see the logs in the UI rather than having to rummage around on the worker).
I've got a docker container running a service, and I need that service to send logs to rsyslog. It's an ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container. I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup process. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, as well as dmesg.log which simply contains dmesg: klogctl failed: Operation not permitted. which from what I can tell is because of a docker limitation that cannot really be fixed. And it's unknown if this is even related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue. Rsyslog is able to start and run in the container just fine on that host. So obviously the cause is some difference between the hosts. But I don't know where to begin with that: There are LOTS of differences between the hosts (the working one is my local windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't know where to even begin about which differences could cause this issue and which ones couldn't.
I've exhausted everything that I know to check. My only option left is to come to stackoverflow and ask for any ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about the container, such as the fact that its entrypoint is /sbin/init (which is a very bad practice for docker containers, and is the root cause of all of my troubles). This is also causing some issues with the docker logging driver, which is why I'm stuck using syslog as the logging solution instead.
I've got a Docker container currently running in production on a CentOS 7 VM. The container's logs (the log files found at /var/lib/docker/{container_name}) fill up the host drive over time, and the container becomes unresponsive, forcing us to clear the logs on the host so that it can continue processing.
We can't take the container down, meaning I can't just bring it back up using the --log-opt flag to set up some log rotation options.
We've tried using logrotate, but the nature of the container means the logs are being written to regularly and what we find is often the logs are rotated, but the original file does not decrease in size due to being written to whilst the rotation is underway.
I'm trying to find a solution to this problem where we can set up some kind of task that will clear the logs down to a specific file size. Any help is greatly appreciated.
I would suggest mounting the container's log directory to a host directory; there you can schedule whatever task you like to zip, move, or delete the log files.
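For the "clear the logs down to a specific file size" part, a scheduled task could trim the file in place instead of rotating it. Here is a rough sketch of that idea; trim_log is a hypothetical helper, not part of Docker or logrotate. It keeps roughly the last max_bytes of the file and drops a leading partial line so the file still starts on a line boundary. It assumes the writer has the file open in append mode; if it does not, the writer's offset stays where it was and the file may grow back as a sparse file. Lines written between the read and the truncate are lost, so treat it as a best-effort cleanup.

```python
import os


def trim_log(path, max_bytes):
    """Shrink `path` in place to roughly its last `max_bytes` bytes.

    Returns True if the file was trimmed, False if it was already small enough.
    """
    size = os.path.getsize(path)
    if size <= max_bytes:
        return False
    with open(path, "r+b") as f:
        # Read the tail we want to keep (plus anything appended meanwhile).
        f.seek(size - max_bytes)
        tail = f.read()
        # Drop the leading partial line so the file starts on a line boundary.
        newline = tail.find(b"\n")
        if newline != -1:
            tail = tail[newline + 1:]
        # Overwrite from the start and cut the file off after the kept tail.
        f.seek(0)
        f.write(tail)
        f.truncate()
    return True
```

Run from cron or a systemd timer on the host, something like trim_log(path_to_log, 50 * 1024 * 1024) would cap each file at about 50 MB without restarting the container. Trimming on line boundaries matters for line-oriented formats, where a cut-off first line could otherwise confuse whatever reads the log later.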