Is it safe to terminate a service using taskkill when the attempt to stop it with NET STOP has failed? And if I terminate it using taskkill, will the NET START command still work, or do I need to use the START command?
Consider this code as an example:
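@ECHO OFF
:STOP
NET STOP someservices
REM "IF ERRORLEVEL 0" is true for any exit code >= 0, so compare %ERRORLEVEL% explicitly
IF %ERRORLEVEL% EQU 0 GOTO START
GOTO KILL
:START
NET START someservices
REM Exit here so a clean restart does not fall through into the kill branch
GOTO :EOF
:KILL
TASKKILL /F /IM someservices.exe
SLEEP 10
START someservices.exe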
Use SC instead of NET -- it won't fail.
Billy3
EDIT: And you will have difficulty killing a fair number of services which can share the same process.
EDIT2: See this for more details -> http://support.microsoft.com/kb/314056
What should the Linux command look like to send a terminate signal to a process/PID and, if it fails to exit gracefully after 10 seconds, kill it?
My attempt is: "sudo timeout -vk 5 10 kill PIDhere" (-v verbose, -k kill after X seconds), but I am not sure whether it is correct, how to adjust the values, or whether there is a better command that even works with part of the name shown in the process COMMAND column ("ps aux" output).
sudo timeout -vk 5 10 kill PIDhere
This will execute kill, and then attempt to terminate the kill command itself if it takes too long. That shouldn't happen, and presumably isn't what you want (if kill were actually hanging, killing it would not affect your actual process). timeout is useful for capping how long a process runs, not how long it takes to terminate after receiving a signal.
Instead, I'd suggest starting the process asynchronously (e.g. using & in a shell, but any language's subprocess library will have similar functionality) and then waiting for the process to terminate after you send it a signal. I describe doing this in Java in this answer. In the shell that might look like:
$ some_process &
# time passes, eventually we decide to terminate the process
$ kill %1
$ sleep 5s
$ kill -s SIGKILL %1 # will fail and do nothing if %1 has already finished
Or you could rely on wait which will return early if the job terminates before the sleep completes:
$ some_process &
# time passes
$ kill %1
$ sleep 5s &
$ wait -n %1 %2 # returns once %1 or %2 (sleep) complete
$ kill -s SIGKILL %1 # if %2 completes first %1 is still running and will be killed
You can do the same as above with PIDs instead of job IDs, it's just a little more fiddly because you have to worry about PID reuse.
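For comparison, the same terminate, wait, then kill sequence can be sketched with Python's subprocess module. This is a minimal sketch, assuming the script itself started the process; the helper name stop_gracefully and the sleeping stand-in child are hypothetical, chosen only for illustration. Because the Popen object refers to our own un-reaped child, the PID reuse concern does not arise in this form.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=10.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the child's exit status as reported by Popen.wait().
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()


# Hypothetical stand-in for a long-running process started by this script.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
status = stop_gracefully(child, grace_seconds=5.0)
print("child exited with status", status)
```

As with wait -n in the shell version, wait(timeout=...) returns as soon as the child exits, so the full grace period is only spent when the process ignores the first signal.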
if there is better command that even work with part of the name
Does pkill do what you want?
Given a Linux system, in Haskell GHCi 8.8.3, I can run a Docker command with:
System.Process> withCreateProcess (shell "docker run -it alpine sh -c \"echo hello\""){create_group=False} $ \_ _ _ pid -> waitForProcess pid
hello
ExitSuccess
However, when I switch to create_group=True the process hangs. The effect of create_group is to call set_pgid with 0 in the child, and pid in the parent. Why does that change cause a hang? Is this a bug in Docker? A bug in System.Process? Or an unfortunate but necessary interaction?
This isn't a bug in Haskell or a bug in Docker, but rather just the way that process groups work. Consider this C program:
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
if(setpgid(0, 0)) {
perror("setpgid");
return 1;
}
execlp("docker", "docker", "run", "-it", "alpine", "echo", "hello", (char*)NULL);
perror("execlp");
return 1;
}
If you compile that and run ./a.out directly from your interactive shell, it will print "hello" as you'd expect. This is unsurprising, since the shell will have already put it in its own process group, so its setpgid is a no-op. If you run it with an intermediary program that forks a child to run it (sh -c ./a.out, \time ./a.out - note the backslash, strace ./a.out, etc.), then the setpgid will put it in a new process group, and it will hang like it does in Haskell.
The reason for the hang is explained in "Job Control Signals" in the glibc manual:
Macro: int SIGTTIN
A process cannot read from the user’s terminal while it is running as a background job. When any process in a background job tries to read from the terminal, all of the processes in the job are sent a SIGTTIN signal. The default action for this signal is to stop the process. For more information about how this interacts with the terminal driver, see Access to the Terminal.
Macro: int SIGTTOU
This is similar to SIGTTIN, but is generated when a process in a background job attempts to write to the terminal or set its modes. Again, the default action is to stop the process. SIGTTOU is only generated for an attempt to write to the terminal if the TOSTOP output mode is set; see Output Modes.
When you docker run -it something, Docker will attempt to read from stdin even if the command inside the container doesn't. Since you just created a new process group, and you didn't set it to be in the foreground, it counts as a background job. As such, Docker is getting stopped with SIGTTIN, which causes it to appear to hang.
Here's a list of options to fix this:
1. Redirect the process's standard input to somewhere other than the TTY
2. Use signal or sigaction to make the process ignore the SIGTTIN signal
3. Use sigprocmask to block the process from receiving the SIGTTIN signal
4. Call tcsetpgrp(0, getpid()) to make your new process group be the foreground process group (note: this is the most complicated, since it will itself cause SIGTTOU, so you'd have to ignore that signal at least temporarily anyway)
Options 2 and 3 will also only work if the program doesn't actually need stdin, which is the case with Docker. When SIGTTIN doesn't stop the process, reads from stdin will still fail with EIO, so if there's actually data you want to read, then you need to go with option 4 (and remember to set it back once the child exits).
If you have TOSTOP set (which is not the default), then you'd have to repeat the fix for SIGTTOU or for standard output and standard error (except for option 4, which wouldn't need to be repeated at all).
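To illustrate the first fix (redirecting standard input away from the TTY) together with a new process group, here is a rough Python sketch. It is not the System.Process implementation: the helper name run_in_new_group is hypothetical, and the child is a small stand-in that reads stdin rather than docker itself. The preexec_fn call to os.setpgid(0, 0) is meant to mirror what create_group=True does in the child.

```python
import os
import subprocess
import sys


def run_in_new_group(argv):
    """Run argv in its own process group with stdin detached from the terminal."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,  # stdin is /dev/null, so no terminal read can raise SIGTTIN
        stdout=subprocess.PIPE,
        text=True,
        preexec_fn=lambda: os.setpgid(0, 0),  # child becomes leader of a new process group
    )
    output, _ = proc.communicate()
    return proc.returncode, output


# Stand-in child: reads all of stdin (immediately EOF), then reports
# whether it is the leader of its own process group.
child_code = "import os, sys; sys.stdin.read(); print(os.getpgid(0) == os.getpid())"
returncode, output = run_in_new_group([sys.executable, "-c", child_code])
print(returncode, output.strip())
```

Since the child never touches the terminal for input, it is not stopped even though its process group is in the background.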
I am running Dask on a SLURM-managed cluster.
dask-ssh --nprocs 2 --nthreads 1 --scheduler-port 8786 --log-directory `pwd` --hostfile hostfile.$JOBID &
sleep 10
# We need to tell dask Client (inside python) where the scheduler is running
scheduler="`hostname`:8786"
echo "Scheduler is running at ${scheduler}"
export ARL_DASK_SCHEDULER=${scheduler}
echo "About to execute $CMD"
eval $CMD
# Wait for dask-ssh to be shut down from the python
wait %1
I create a Client inside my python code and then when finished, I shut it down.
c=Client(scheduler_id)
...
c.shutdown()
My reading of the dask-ssh help is that the shutdown will shut down all workers and then the scheduler. But it does not stop the background dask-ssh, and so eventually the job times out.
I've tried this interactively in the shell. I cannot see how to stop the scheduler.
I would appreciate any help.
Thanks,
Tim
Recommendation with --scheduler-file
First, when setting up with SLURM you might consider using the --scheduler-file option, which allows you to coordinate the scheduler address using your NFS (which I assume you have given that you're using SLURM). Recommend reading this doc section: http://distributed.readthedocs.io/en/latest/setup.html#using-a-shared-network-file-system-and-a-job-scheduler
dask-scheduler --scheduler-file /path/to/scheduler.json
dask-worker --scheduler-file /path/to/scheduler.json
dask-worker --scheduler-file /path/to/scheduler.json
>>> client = Client(scheduler_file='/path/to/scheduler.json')
Given this it also becomes easier to use the sbatch or qsub command directly. Here is an example with SGE's qsub
# Start a dask-scheduler somewhere and write connection information to file
qsub -b y /path/to/dask-scheduler --scheduler-file /path/to/scheduler.json
# Start 100 dask-worker processes in an array job pointing to the same file
qsub -b y -t 1-100 /path/to/dask-worker --scheduler-file /path/to/scheduler.json
Client.shutdown
It looks like client.shutdown only shuts down the client. You're correct that this is inconsistent with the docstring. I've raised an issue here: https://github.com/dask/distributed/issues/1085 for tracking further developments.
In the meantime
These three commands should suffice to tear down the workers, close the scheduler, and stop the scheduler process
client.loop.add_callback(client.scheduler.retire_workers, close_workers=True)
client.loop.add_callback(client.scheduler.terminate)
client.run_on_scheduler(lambda dask_scheduler: dask_scheduler.loop.stop())
What people usually do
Typically people stop clusters by the same means they used to start them. This might involve using SLURM's job cancellation command (scancel). We should make the client-focused way more consistent though regardless.
I have 2 terminals running, and I'd like to run
#on term1
zeus start
#on term2
zeus server
The problem is that zeus server should normally wait for the zeus start process to complete.
My question is: how can I make the second terminal start the server automatically once zeus start has completed?
I've tried sleep 2, but there should be a better way.
Thanks
I am not sure there is anything other than a hacky way to do this. As stated in zeus's roadmap, starting the server without having to start zeus first is planned for version 2...
If you want a hacky way, you can play with a little shell script like the following:
while [[ "`ps aux | grep "zeus slave" | wc -l`" == "1" ]]; do sleep 1; done; zeus server
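The same poll-then-run idea can be written with a timeout so the loop cannot spin forever. The following is only a sketch in Python: wait_until is a hypothetical helper, and the readiness check is left as a caller-supplied function because what counts as "zeus has finished starting" is not pinned down here.

```python
import time


def wait_until(is_ready, interval=1.0, timeout=60.0):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the condition was met, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example wiring (not executed here; zeus_is_up is a hypothetical check):
#     if wait_until(zeus_is_up):
#         subprocess.call(["zeus", "server"])
```

Compared with a fixed sleep, this starts the server as soon as the check passes and reports failure instead of waiting indefinitely.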
I am trying to build a script on Ubuntu to start some Erlang code of mine.
The script is something like:
#!/bin/sh
EBIN=$HOME/path_to_beams
ERL=/usr/local/bin/erl
export HEART_COMMAND="$EBIN/starting_script start"
case $1 in
start)
$ERL -sname mynode -pa $EBIN \
-heart -detached -s my_module start_link
;;
*)
echo "Usage: $0 {start|stop|debug}"
exit 1
esac
exit 0
but I'm having a couple of problems.
First of all, the code can be executed only if the script is in the same directory as the beams. This seems strange to me; I double checked the paths, so why doesn't the -pa flag work?
Second, the script (without the -pa) works fine, but if I try to start its supervisor (-s my_module_sup start_link) instead of the main module (a gen_server), it doesn't work. This is strange, because if I start the supervisor from a normal shell everything works fine.
Third, the -heart flag should restart the script in case of failure, but if I kill the process with a normal Unix kill, the process is not restarted.
Can someone give me some hints?
Thanks in advance,
pdn
The first thing that comes to mind is that you're using erlexport instead of erl. Not sure why you're doing this (I've not heard of erlexport before). Try it with erl instead.
Your -heart flag won't have meaning if the Erlang node itself is killed because the process can't keep itself alive. You would need another process running that monitors the Erlang process and restarts it if killed.
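As a rough illustration of such a monitoring process, here is a minimal Python sketch that runs a command in the foreground and relaunches it when it exits abnormally. The function name supervise and its parameters are hypothetical, and it assumes the supervised command stays in the foreground (so erl would have to be started without -detached for this to observe its exit).

```python
import subprocess
import time


def supervise(argv, max_restarts=3, delay=1.0):
    """Run argv and restart it whenever it exits with a non-zero status.

    Gives up after max_restarts restarts. Returns the number of launches.
    """
    launches = 0
    while True:
        launches += 1
        returncode = subprocess.call(argv)  # blocks until the command exits
        if returncode == 0 or launches > max_restarts:
            return launches
        time.sleep(delay)  # brief pause so a crash loop does not spin
```

A process killed by a signal also shows up as a non-zero (negative) return code here, so it is restarted like any other failure.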