When stopping a Docker container in a native Docker environment, Docker by default sends the SIGTERM signal to the container's init process (PID 1), which should be the actual application, and which should then handle the shutdown properly. However, when running the container in Jelastic this does not seem to be the case: instead of the SQL server being terminated gracefully, it seems to crash every time.
I did try writing and enabling a systemd service that gets the SQL server's PID and then sends SIGTERM to it, but it doesn't seem to run; judging from the logs there are no service shutdown messages at all, just startup messages.
So what changes would be required to the container or the environment for the server to receive the SIGTERM signal and have enough time, maybe a few seconds, to do a graceful shutdown?
Thank you for reporting the issue. We tried to reproduce the problem in our test lab and got exactly the same result. We agree that the issue is really serious, so we are going to fix it with the highest priority. Please accept our apologies for the inconvenience. I want to note that, by design, we also expect the process to be terminated first with a SIGTERM signal, and only after not receiving a termination result for some period of time should the system send SIGKILL, once it considers that the process cannot be terminated gracefully. Our engineers will explore the issue more deeply and deliver a fix shortly.
Thank you!
Related
When will Docker mark the inner service as failed without it exiting with a non-zero code, and is there any way to configure it not to do so?
I have a process running inside Docker, a Python-based server handling some heavy tasks. I found that if I send multiple requests to it at the same time, although memory usage is still quite OK, Docker somehow decides the inner process has failed (or something I don't understand) and restarts the service/container. I am guessing this is because the inner process is stuck/not responding and hence gets marked as 'failed'; I ran this server outside of Docker and no error happened.
Also, if I use docker stack to deploy multiple replicas, it seems that when one of the replicas is being restarted, all of the requests to the service get blocked until that replica recovers. Is that true? And is there any way to prevent this from happening?
This is really annoying since the init steps of the server take quite some time...
I have an application inside a docker-compose setup. On startup, a lot of log messages are created. When looking at the logs of my docker-compose in the morning, I can see those startup messages, but just before them the container was logging production messages.
In the past, the application crashed because of a bug and restarted. Another time there was a memory problem because lots of data was accidentally loaded; there were error messages from Go indicating out-of-memory errors, and then the container restarted.
But from time to time the container restarts without any indication why. How can I find out the reason why it restarts?
Assuming you have access to the host, I would suggest using volumes to persist the container's entire /var/log somewhere on the host. You can look at these log files to discover reasons for shutdown. Check out this unix.stackexchange post for details on how to do that.
I have a fleet of Backburner workers (Backburner::Workers::Simple). I seem to have hit an edge case where a worker occasionally can't get a DB connection, the job is reaped back by the server, and suddenly the worker goes on a tear, reserving jobs rapid-fire, all of which time out and eventually get buried, because the worker never again successfully gets a DB connection.
Obviously, it would be ideal if I could fix the weirdness around DB connections and rapid-fire job reservation. That seems like a longer-term solution, though, because I've looked and don't see anything obvious. What I'd like to do is just have my error handler log the error, and then for the whole worker process to die. All of my workers are under process supervision, so this is a very clean, simple way to get a fresh worker without the DB problem.
I've tried adding ; Kernel.exit (and variations on that) to my on_error lambda, but it doesn't seem to make a difference. How do I make this happen?
If the need is to kill the worker completely, no matter what, you can call out to the command line from Ruby with system() to run a kill command.
So you just need to get the PID of the worker and then kill it with a system call:
system("kill -QUIT #{worker.pid}")
Looking at how to get the PID of the worker, from the Backburner repository it seems you can get it from inside the worker process with
Process.pid
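Putting the two snippets above together, a rough sketch of what the on_error hook could look like (assuming the hook is invoked with the exception and, for Backburner::Workers::Simple, runs inside the worker process itself; the exact logging is illustrative):

Backburner.configure do |config|
  config.on_error = lambda do |ex|
    # Log the error however you normally do, then take the whole worker
    # process down so the supervisor starts a fresh one with a clean DB connection.
    warn "Backburner worker error: #{ex.class}: #{ex.message}"

    # Shelling out to kill, as suggested above; Process.pid is this worker's PID.
    system("kill -QUIT #{Process.pid}")
  end
end

Process.kill("QUIT", Process.pid) would do the same thing without shelling out.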
I have a universal React app hosted in a Docker container in a minikube (Kubernetes) dev environment. I use VirtualBox, and I actually have more microservices on this VM.
In this React app, I use pm2 to restart my app on changes to server code, and webpack HMR to hot-reload client code on changes to client code.
Every 15-45 seconds or so, pm2 logs the message below, indicating that the app exited due to a SIGKILL.
App [development] with id [0] and pid [299], exited with code [0] via signal [SIGKILL]
I can't for the life of me figure out why it is happening. It is relatively frequent, but not so frequent that it happens every second. It's quite annoying because each time it happens, my webpack bundle has to recompile.
What are some reasons why pm2 might receive a SIGKILL in this type of dev environment? Also, what are some possible ways of debugging this?
I noticed that my services that use pm2 to restart on server changes do NOT have this problem when they are just backend services, i.e. when they don't have webpack. In addition, I don't see these SIGKILL problems in the prod version of the app. That suggests to me there is some problem with the combination of the webpack HMR setup, pm2, and minikube/Docker.
I've tried the app locally (not in Docker/minikube) and it works fine without any SIGKILLs, so it can't be webpack HMR on its own. Does Kubernetes kill services that use a lot of memory? (Maybe it thinks my app is using a lot of memory.) If that's not the case, what might be some reasons Kubernetes or Docker would send SIGKILL? Is there any way to debug this?
Any guidance is greatly appreciated. Thanks
I can't quite tell from the error message you posted, but usually this is a result of the kernel OOM Killer (Out of Memory Killer) taking out your process. This can be either because your process is just using up too much memory, or you have a cgroup setting on your container that is overly aggressive and causing it to get killed. You may also have under-allocated memory to your VirtualBox instance.
Normally you'll see Docker reporting that the container exited with code 137 in docker ps -a
dmesg or your syslogs on the node in question may show the kernel OOM killer output.
I am implementing a graceful shutdown feature in Go for when Kubernetes does a rolling update on Google Container Engine. Does anyone know what process signal is sent to the running pods when kubectl rolling-update starts?
I've registered handlers for the os.Kill, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL, and syscall.SIGSTOP signals, but none of them was raised during kubectl rolling-update.
I would really appreciate your answers.
I got a solution! I used a shell script file as the ENTRYPOINT and executed the Go binary from that script, so the PID of the Go binary was not 1 (the shell script's PID was 1 instead), and Docker sends SIGTERM only to PID 1, which does not propagate it to its child processes. So I had to change my ENTRYPOINT to execute the Go binary directly, and now I get SIGTERM in my Go code. Refer to this link
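For reference, a minimal sketch (not the answerer's exact code; the server placeholder is illustrative) of a handler that receives SIGTERM once the Go binary itself runs as PID 1, e.g. via an exec-form ENTRYPOINT such as ENTRYPOINT ["/app/server"] or an exec /app/server at the end of the wrapper script:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Buffered so the notification is not lost if the program is busy when it arrives.
	sigs := make(chan os.Signal, 1)

	// SIGTERM is what Docker and Kubernetes send on stop; SIGINT covers Ctrl+C locally.
	// SIGKILL and SIGSTOP cannot be caught, so registering them has no effect.
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		// ... start the actual server here ...
	}()

	sig := <-sigs
	fmt.Printf("received %v, shutting down gracefully\n", sig)
	// ... close listeners, flush state, etc., before the grace period expires ...
}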