All Docker stacks are restarting automatically

I have a multi-service environment hosted on Docker Swarm, with multiple stacks deployed. Every running container has a Spring Boot application built in. The problem is that all my stacks restart on their own. I know the compose file sets restart_policy to on-failure, which explains why the services come back up automatically. But when the services restart, one particular service throws errors, and that breaks everything.
I cannot figure out what actually triggers the restarts.
I did quite a lot of investigating, and this is what I found:
The Docker daemon is not restarted. I double-checked this against the uptime of the Docker daemon.
I checked docker service ps <Service_ID>, and there I can see the service tasks showing shutdown and starting. No other information.
I checked docker service logs <Service_ID>, but there are no errors there either.
I checked for a resource crunch. Plenty of resources were available, both on the host and at the level of each container.
Can someone tell me where exactly to find logs for this? Any other thoughts?
My host is a VM hosted on VMware vCenter.
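
In case it helps anyone repeating the docker service ps check above, here is a minimal Python sketch that filters its JSON output down to the tasks that are no longer meant to be running, together with whatever error Swarm recorded for them. It assumes the output of docker service ps --no-trunc --format '{{json .}}' <Service_ID> (one JSON object per line) and that the keys match the placeholder names documented for --format (Name, Node, DesiredState, CurrentState, Error). The function name is made up for illustration.

```python
import json


def stopped_tasks(ps_json_lines):
    """Return (name, node, current_state, error) for every task whose
    desired state is anything other than "Running".

    `ps_json_lines` is the text printed by
    `docker service ps --no-trunc --format '{{json .}}' <Service_ID>`,
    i.e. one JSON object per line (assumed key names, see above).
    """
    rows = []
    for line in ps_json_lines.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        if task.get("DesiredState", "").lower() != "running":
            rows.append((
                task.get("Name"),
                task.get("Node"),
                task.get("CurrentState"),
                task.get("Error", ""),
            ))
    return rows
```

An empty error column on every stopped task, as in the question, suggests the tasks were shut down cleanly rather than crashing, which points away from the application itself.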

After a lot of research and going through all the Docker logs, I could not find a solution. Later on, I discovered that a memory snapshot of the VM was being taken for backup every 24 hours.
Here is what I observed:
Whenever we take a snapshot, all Docker services running on the host restart automatically. There are no errors; the services simply restart gracefully.
I found other questions describing the same problem with VMware snapshots.
As far as I know, when we take a snapshot, it points to a different memory location and saves the previous one. I have not been able to find out why this causes the restarts, but the snapshot was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.
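
For anyone who wants to confirm the same correlation on their own host, a rough Python sketch: it takes a list of container restart times and a list of snapshot times and reports which restarts fall within a few minutes of a snapshot. Where the timestamps come from is an assumption on my part (for example StartedAt from docker inspect for the restarts, and the vCenter task history for the snapshots); the function name and the five-minute window are arbitrary.

```python
from datetime import datetime, timedelta


def restarts_near_snapshots(restarts, snapshots, window=timedelta(minutes=5)):
    """Map each restart time to the closest snapshot time if that snapshot
    lies within `window` of the restart, otherwise to None."""
    result = {}
    for restart in restarts:
        # Closest snapshot by absolute time difference (None if no snapshots).
        closest = min(snapshots, key=lambda s: abs(restart - s), default=None)
        if closest is not None and abs(restart - closest) <= window:
            result[restart] = closest
        else:
            result[restart] = None
    return result


# Hypothetical timestamps, for illustration only.
snapshots = [datetime(2021, 1, 1, 2, 0), datetime(2021, 1, 2, 2, 0)]
restarts = [datetime(2021, 1, 1, 2, 1), datetime(2021, 1, 1, 14, 0)]
matches = restarts_near_snapshots(restarts, snapshots)
```

If nearly every restart maps to a snapshot, that is good evidence the snapshot job is the trigger.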

Related

Why does Docker randomly throw a 'Permission Denied' error when trying to stop a container?

I am trying to stop a docker container and get the following error:
This happens randomly on occasion, and it is very frustrating to have to restart the docker service and relaunch all my containers.
Would anyone know what could be causing this? As far as I have seen or know, no changes have been made to the containers since they were launched, apart from maybe some changes to the data inside them. If people need more information, I would be happy to provide it.
FYI, everything that I am doing, I am doing as the root user.
ALSO: I ABSOLUTELY CANNOT STOP THE DOCKER DAEMON OR RESTART IT. THIS MUST BE RESOLVED WHILE KEEPING THE CURRENT CONTAINERS OPEN AND RUNNING.

Rsyslog can't start inside of a docker container

I've got a Docker container running a service, and I need that service to send logs to rsyslog. It's an Ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container, and I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup process. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, as well as dmesg.log which simply contains dmesg: klogctl failed: Operation not permitted. which from what I can tell is because of a docker limitation that cannot really be fixed. And it's unknown if this is even related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue. Rsyslog is able to start and run in the container just fine on that host. So obviously the cause is some difference between the hosts. But there are LOTS of differences between the hosts (the working one is my local Windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't know where to begin sorting out which differences could cause this issue and which ones couldn't.
I've exhausted everything that I know to check. My only option left is to come to Stack Overflow and ask for ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about the container, such as the fact that its entrypoint is /sbin/init (which is a very bad practice for Docker containers, and is the root cause of all of my troubles). This is also causing some issues with the Docker logging driver, which is why I'm stuck using syslog as the logging solution instead.
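
On the second question (what differs between the two hosts), one way to narrow the search is to dump docker info as JSON on each host and diff the two dumps. Below is a minimal Python sketch of such a diff. It assumes each file was produced with docker info --format '{{json .}}' on the respective host; the function names and file names are hypothetical. It only lists differences, so deciding which ones matter (kernel version and security options would be the first things I'd look at, but that is a guess) is still up to you.

```python
import json


def load_info(path):
    """Load a JSON dump saved from `docker info --format '{{json .}}'`."""
    with open(path) as f:
        return json.load(f)


def diff_info(a, b, path=""):
    """Yield (key_path, value_a, value_b) for every entry that differs
    between two dicts, descending into nested dicts. A key missing on
    one side is reported with None for that side."""
    for key in sorted(set(a) | set(b)):
        here = f"{path}.{key}" if path else key
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_info(va, vb, here)
        elif va != vb:
            yield here, va, vb
```

Usage would look something like: for row in diff_info(load_info("working-host.json"), load_info("failing-host.json")): print(row).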

Docker install of AZCore results in authserver+worldserver doesn't exist error

I'm trying to spin up a fresh server using the AzerothCore Docker installation guide. I have completed all of the early installation steps, up until running the containers. Upon running the containers (for worldserver and authserver) I see the following output from the containers. It appears that the world and auth server binaries are missing from dist/bin. How can I resolve this issue?

Check your Docker settings and make sure you have allocated enough memory. If the containers have too little memory, the compile will not finish. Also check whether the build itself reported any errors.

Docker Desktop Kubernetes Unable to connect to the server: EOF

Earlier today I increased my Docker Desktop resources, but ever since it restarted, Kubernetes has not been able to complete its startup. Whenever I try to run a kubectl command, I get Unable to connect to the server: EOF in response.
I thought the problem started because I hadn't deleted a Helm chart before adjusting the resource values in Settings, so that the resources were assigned to the pods instead of the Kubernetes API server. But I have not been able to fix the issue.
This is what I have tried thus far:
Restarting Docker again
Resetting Kubernetes
Resetting Docker to factory settings
Deleting the VM in Hyper-V and restarting Docker
Uninstalling and reinstalling Docker Desktop
Deleting the pki folder and restarting Docker
Setting the KUBECONFIG environment variable
Deleting .kube/config and restarting
Doing another clean reinstall of Docker Desktop
But Kubernetes does not complete its startup, so I still get Unable to connect to the server: EOF in response.
Is there anything I haven't tried yet?

I'll share that what solved this for me was the Docker Desktop settings feature "reset kubernetes cluster". I know that @shenyongo said that a "reset kubernetes" didn't work, and I suppose they mean this.
But for the sake of other readers who may find this, I had this same error message (with Docker Desktop on Windows 11, using wsl2), and the solution for me was indeed to do this:
open the Settings page (in Docker Desktop--right-click on it in the status tray)
then choose "Kubernetes" on the left
then choose "reset kubernetes cluster"
Yes, that warns that "all stacks and kubernetes resources will be deleted", but as nothing else had worked for me (and I wasn't worried about losing much), I tried it, and it did the trick. In moments, all my k8s functionality was back to working.
As background, k8s had been working fine for me for some time. It was just that one day I found I was getting this error. I searched and searched and found lots of folks asking about it but not getting answers, let alone this answer. To be clear, like the OP here I had tried restarting Docker Desktop, restarting the host machine, even downloading and installing an available DD update (I was only a bit behind), and none of those worked. I didn't proceed to ALL the steps shenyongo did, as I thought I'd try this first, and the reset worked.
Hope that may help others. I realize some may fear losing something, but this helps stress the power of declarative vs imperative k8s configuration. It SHOULD be easy to recreate most everything if necessary. I realize it may not be so for everyone.

Timeouts accessing services on swarm published ports

We're using Docker in Swarm mode to host a number of services. Recently we've hit an issue where we get connection timeouts intermittently (sometimes as much as every other request) when trying to access some services.
We've upgraded the environment to the latest version of Docker (currently Docker version 17.03.0-ce, build 3a232c8), done a staggered reboot of all servers (trying to maintain uptime if possible even though this environment is technically a test environment) and tried stopping / starting services as well, but the issue still persists.
I'm confident the issue is not related to the services running in Docker, as we're seeing it on various services which until recently had been running without issue. I think it's more likely an environmental issue, or some problem with Docker's internal routing in the overlay network, but I'm not sure how to prove or solve this.
Any advice on how to diagnose or solve this would be greatly appreciated!
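
One thing that might help narrow this down: with the swarm routing mesh, every node accepts connections on a published port, so probing each node separately can show whether the timeouts are tied to particular nodes or spread evenly. Here is a minimal Python sketch of such a probe. The node names, port, attempt count and timeout are placeholders, and it only tests whether a TCP connection opens, not whether the service answers correctly.

```python
import socket
from collections import Counter


def tcp_connect(host, port, timeout):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, DNS failures
        return False


def probe(nodes, port, attempts=20, timeout=2.0, connect=tcp_connect):
    """Try `attempts` connections to `port` on each node and return
    a dict of {node: number_of_failed_attempts}."""
    failures = Counter()
    for node in nodes:
        for _ in range(attempts):
            if not connect(node, port, timeout):
                failures[node] += 1
    return {node: failures[node] for node in nodes}
```

For example, probe(["node1", "node2", "node3"], 8080) with your own node addresses and published port. If failures cluster on requests entering through some nodes but not others, that points at inter-node overlay traffic rather than at the service.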
