Why did I have to kill -9 neo4j?

I asked nicely:
$ neo4j stop
Stopping Neo4j Server [74949]........................ lots of dots ......
Waited a few minutes, then asked nicely again:
$ kill 74949
$ ps x | grep neo
74949 ?? R 30:13.01 /Library/Java/Java...org.neo4j.server.Bootstrapper
Still running. Done asking nicely.
$ kill -9 74949
Why did neo4j not respect the TERM signal? If it was waiting for something, how could I have found out what?
Normally, I would ask this kind of question on Server Fault, but the neo4j site points here.

Not necessarily in descending order of usefulness...
ps alx might have given a hint (the process state column) - though with Java programs the issue often isn't that the JVM itself died or locked up, but something running inside the JVM.
In top, 100% CPU usage may indicate an endless loop.
Java processes can end up in a state where all they still do is garbage-collect, in an almost always vain attempt to free up memory; enabling GC logging can help you detect this condition.
AFAIK neo4j is remotely monitorable via JMX (VisualVM or JConsole), and I've already used these tools to determine which thread hung one of our Glassfish servers.
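For reference, a minimal sketch of those diagnostics from the shell, assuming a JDK is on the PATH (jstack and jstat ship with it) and using the PID from the question:
$ ps alx | grep 74949             # the STAT column shows the process state (R, S, D, Z, ...)
$ jstack -l 74949 > threads.txt   # thread dump: look for BLOCKED or long-WAITING threads
$ jstat -gcutil 74949 1000        # GC activity sampled every second; near-constant full GCs suggest the GC-thrashing case above
$ jconsole 74949                  # or attach VisualVM/JConsole for the JMX view mentioned above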

If you execute kill [PID], you only send SIGTERM, which asks the process to shut itself down gracefully; the process is free to catch, delay, or ignore it. If you send signal number 9 (SIGKILL), the kernel ends the process immediately, regardless of its current state. Signal 9 cannot be caught or ignored.
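A quick way to see the difference for yourself (the trapped loop below is just a stand-in for a process that ignores SIGTERM):
$ bash -c 'trap "" TERM; while true; do sleep 1; done' &
$ kill %1       # sends SIGTERM; the trap ignores it and the job keeps running
$ kill -9 %1    # SIGKILL cannot be trapped, so the job dies immediately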

Related

How to reliably clean up dask scheduler/worker

I'm starting up a dask cluster in an automated way by ssh-ing into a bunch of machines and running dask-worker. I noticed that I sometimes run into problems when processes from a previous experiment are still running. What's the best way to clean up after dask? killall dask-worker dask-scheduler doesn't seem to do the trick, possibly because dask somehow starts up new processes in their place.
If you start a worker with dask-worker, you will notice in ps that it starts more than one process, because there is a "nanny" responsible for restarting the worker in case it somehow crashes. Also, there may be "semaphore" processes around for communicating between the two, depending on which form of process spawning you are using.
The correct way to stop all of these is to send a SIGINT (i.e., a keyboard interrupt) to the parent process. A KILL signal might not give it the chance to stop and clean up its child process(es). If some situation (e.g., an ssh hangup) caused a more radical termination, or perhaps a session didn't send any stop signal at all, then you will probably have to grep the output of ps for dask-like processes and kill them all.
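A rough sketch of that cleanup from the shell, assuming the leftover processes still show up in ps with dask-worker / dask-scheduler in their command lines:
$ pkill -INT -f dask-worker                # SIGINT gives the nanny a chance to shut its child down
$ pkill -INT -f dask-scheduler
$ sleep 5
$ ps aux | grep '[d]ask'                   # list anything dask-like that survived
$ pkill -9 -f 'dask-(worker|scheduler)'    # last resort for stragglers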

Can the OS kill the process randomly in Linux?

One of our processes went down on a Linux box. When I checked the logs, I could see it was shut down, and the logs indicate a graceful shutdown. I checked CPU, memory, and process utilization; all were under threshold. There were no abnormalities found in memory utilization. Is there any way the OS could have killed the process at random?
Any suggestions?
The kernel can kill a process under extreme circumstances, i.e. memory starvation (the OOM killer). Since this was not the case, and you are sure that the sysadmins did not kill the process either, the shutdown must have been initiated from within the process.
Linux would not kill your process unless there are extreme circumstances, although some other process running as root might be able to send such signals.
You should be able to learn more from the kernel logs, and confirm whether or not the process was killed by the OS itself.
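A quick way to look for that in the kernel logs (log file locations vary by distro; both commands below assume a fairly standard Linux box):
$ dmesg -T | grep -iE 'oom|killed process'
$ grep -i 'out of memory' /var/log/syslog    # or /var/log/messages, or journalctl -k, depending on the distro
If the OOM killer was involved, you will see lines naming the victim process; if nothing shows up, the termination most likely came from user space or from within the process itself.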

Memory usage in a Rails app, how to monitor and troubleshoot?

I have a Rails app that, among other things, has several background jobs which are computationally expensive (image manipulation :O).
I am using Sidekiq to manage those jobs. I currently have a concurrency of 5 threads per Sidekiq process, and here is what I do in order to see the memory usage:
ps faux | grep sidekiq
Results are this:
hommerzi 3874 3.5 5.7 287484 231756 pts/5 Sl+ 17:17 0:10 | \_ sidekiq 2.17.0 imageparzer [3 of 3 busy]
However, I have a feeling that there must be a way to monitor this correctly from within the Rails app, or am I wrong?
My question would be: How can I monitor my memory usage in this scenario?
My advice would be to use Monit (or God) to manage your processes. This goes for database, server, application; not just background jobs.
Here's an example: Monit Ruby on Rails Sidekiq
Monitor your application for a while and set realistic memory limits. Then, if one of your processes dies or stays above that limit for a given number of cycles (Monit usually checks every 2 minutes), it will (re)start the process.
You can also set up an alert email address and a web frontend (with basic HTTP auth). This will prove essential for running stable applications in production. For example, recently I had a Sidekiq process get carried away with itself and chew up 250MB of memory. Monit then restarted the process (which is now hovering around 150MB) and sent me an alert. Now I can check the logs/system to see why that might have happened. This all happened while I was asleep. Much better than the alternative: waking up and finding your server on its knees with a runaway process or a downed service.
https://www.digitalocean.com/community/articles/how-to-install-and-configure-monit
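If you just want to eyeball the numbers from the shell in the meantime, a small sketch (3874 is the Sidekiq PID from the ps output above; RSS is reported in kilobytes):
$ ps -o pid,rss,etime,args -p 3874                        # one-off snapshot of resident memory
$ watch -n 60 "ps ax -o pid,rss,args | grep '[s]idekiq'"  # refresh every minute; the [s] trick hides the grep itself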

daemon process killed automatically

Recently I ran the command below to start a daemon process which runs once every three days.
RAILS_ENV=production lib/daemons/mailer_ctl start, which was working, but when I came back after a week I found that the process had been killed.
Is this normal or not?
Thanks in advance.
Nope. As far as I understand it, this daemon is supposed to run until you kill it. The idea is for it to work at regular intervals, right? So the daemon is supposed to wake up, do its work, then go back to sleep until needed. If it was killed, that's not normal.
The question is why it was killed and what you can do about it. The first part is one of the toughest to answer when debugging detached processes. Unless your daemon leaves some clues in the logs, you may have trouble finding out when and why it terminated. If you look through your logs (and if you're lucky) there may be some clues -- I'd start right around the time when you suspect it last ran, and look at your Rails production.log, any log file the daemon may create, and also at the system logs.
Let's assume for a moment that you can never figure out what happened to this daemon. What to do about it becomes an interesting question. The first step is: log as much as you can without making the logs too bulky or impacting performance. At a minimum, log wakeup, processing, and completion events, and also trap signals and log them. It's best to log somewhere other than the Rails production.log. You may also want to run the daemon at a shorter interval than 3 days until you are certain it is stable.
Look into using a process monitoring tool like monit (http://mmonit.com/monit/) or god (http://god.rubyforge.org/). These tools can "watch" the status of daemons and if they are not running can automatically start them. You still need to figure out why they are being killed, but at least you have some safety net.
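As a rough crontab safety net in that spirit, a sketch assuming the daemons gem writes a PID file at lib/daemons/mailer.pid (the actual pidfile path and restart command depend on your setup):
*/10 * * * * cd /path/to/app && kill -0 "$(cat lib/daemons/mailer.pid 2>/dev/null)" 2>/dev/null || { echo "$(date): mailer daemon down, restarting" >> log/mailer_watchdog.log; RAILS_ENV=production lib/daemons/mailer_ctl start; }
This only restarts the daemon; the logging suggestions above are still what will tell you why it died in the first place.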

How do you go about setting up monitoring for a non-web frontend process?

I have a worker process that is running on a server with no web frontend. What is the best way to set up monitoring for it? It recently died and was down for 3 days, and I did not know about it.
There are several ways to do this. One simple option is to run a cron job that checks timestamps on the process's logs (and, of course, make sure the process logs something routinely).
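A sketch of that cron idea, assuming the worker appends to /var/log/myworker.log and that outbound mail is configured (the path, staleness threshold, and address are all placeholders):
*/10 * * * * find /var/log/myworker.log -mmin +30 | grep -q . && echo "myworker log has not been updated in 30 minutes" | mail -s "worker may be down" you@example.com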
Roll your own reincarnation job. Have your background process get its PID, then write it to a specific pre-determined location when it starts. Have another process (or perhaps cron) read that PID, then check the symbolic link /proc/{pid}/exe. If that link does not exist or does not point to your process, it needs to be restarted.
With PHP, you can use posix_getpid() to obtain the PID, then fopen() / fwrite() to write it to a file. Use readlink() to read the symbolic link (take care to check for FALSE as a return value).
Here's a simple bash-ified example of how the symlink works:
tpost@tpost-desktop:~$ echo $$
13737
tpost@tpost-desktop:~$ readlink /proc/13737/exe
/bin/bash
So, once you know the PID that the process started with, you can check whether it is still alive and whether Linux has recycled the PID (you usually only see PID recycling on systems that have been running for a very long time, but it does happen).
This is a very cheap operation, so feel free to run it every minute, every 30 seconds, or at even shorter intervals.
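Putting those pieces together, a minimal shell version of the reincarnation check described above (the pidfile path, the process name myworker, and the restart command are all placeholders; note that for an interpreted worker, /proc/{pid}/exe points at the interpreter, e.g. php, so match on that instead):
#!/bin/sh
PIDFILE=/var/run/myworker.pid
PID=$(cat "$PIDFILE" 2>/dev/null)
# Restart if there is no recorded PID, the PID no longer exists, or it now belongs to a different program
if [ -z "$PID" ] || ! readlink "/proc/$PID/exe" 2>/dev/null | grep -q myworker; then
    echo "$(date): myworker not running, restarting" >> /var/log/myworker-watchdog.log
    /usr/local/bin/myworker &    # hypothetical restart command
    echo $! > "$PIDFILE"
fi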
