In Fluentd, we are trying to assign a pipeline execution task to the worker-0 node using the following configuration:
<worker 0>
#include ./mgmt/mgmt_fluent.conf
</worker>
During execution, worker-0 updates the configuration for the other worker nodes and then calls Fluentd's /api/config.reload, which restarts all workers, including worker-0. We don't want worker-0 to be restarted, because that interrupts the other tasks it is performing.
Is there any way to restart only the required workers?
Or is it possible to assign the task done by worker-0 to the supervisor?
Something similar to:
<supervisor>
#include ./mgmt/mgmt_fluent.conf
</supervisor>
Given a Linux system, in Haskell GHCi 8.8.3, I can run a Docker command with:
System.Process> withCreateProcess (shell "docker run -it alpine sh -c \"echo hello\""){create_group=False} $ \_ _ _ pid -> waitForProcess pid
hello
ExitSuccess
However, when I switch to create_group=True, the process hangs. The effect of create_group is to call setpgid with 0 in the child, and with pid in the parent. Why does that change cause a hang? Is this a bug in Docker? A bug in System.Process? Or an unfortunate but necessary interaction?
This isn't a bug in Haskell or a bug in Docker, but rather just the way that process groups work. Consider this C program:
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Move this process into a new process group of its own. */
    if (setpgid(0, 0)) {
        perror("setpgid");
        return 1;
    }
    execlp("docker", "docker", "run", "-it", "alpine", "echo", "hello", (char*)NULL);
    /* Only reached if execlp fails. */
    perror("execlp");
    return 1;
}
If you compile that and run ./a.out directly from your interactive shell, it will print "hello" as you'd expect. This is unsurprising, since the shell will have already put it in its own process group, so its setpgid is a no-op. If you run it with an intermediary program that forks a child to run it (sh -c ./a.out, \time ./a.out - note the backslash, strace ./a.out, etc.), then the setpgid will put it in a new process group, and it will hang like it does in Haskell.
The reason for the hang is explained in "Job Control Signals" in the glibc manual:
Macro: int SIGTTIN
A process cannot read from the user’s terminal while it is running as a background job. When any process in a background job tries to read from the terminal, all of the processes in the job are sent a SIGTTIN signal. The default action for this signal is to stop the process. For more information about how this interacts with the terminal driver, see Access to the Terminal.
Macro: int SIGTTOU
This is similar to SIGTTIN, but is generated when a process in a background job attempts to write to the terminal or set its modes. Again, the default action is to stop the process. SIGTTOU is only generated for an attempt to write to the terminal if the TOSTOP output mode is set; see Output Modes.
When you docker run -it something, Docker will attempt to read from stdin even if the command inside the container doesn't. Since you just created a new process group, and you didn't set it to be in the foreground, it counts as a background job. As such, Docker is getting stopped with SIGTTIN, which causes it to appear to hang.
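To make "counts as a background job" concrete, here is a small Python sketch (not part of the original answer) that compares a process's own process group with the terminal's foreground process group. The helper name is_background_job is made up for illustration.

```python
import os

def is_background_job(fd=0):
    """Return True if this process is NOT in the foreground process group
    of the terminal on fd, False if it is, or None if fd is not a usable
    terminal (for example a pipe or a file)."""
    if not os.isatty(fd):
        return None
    try:
        foreground_group = os.tcgetpgrp(fd)
    except OSError:
        # fd is a terminal, but not this process's controlling terminal.
        return None
    return foreground_group != os.getpgrp()
```

Run directly from an interactive shell, this should report False. After a setpgid(0, 0) in a child that nobody moved to the foreground, it would be expected to report True, which is the situation in which a read from the terminal triggers SIGTTIN.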
Here's a list of options to fix this:
1. Redirect the process's standard input to somewhere other than the TTY
2. Use signal or sigaction to make the process ignore the SIGTTIN signal
3. Use sigprocmask to block the process from receiving the SIGTTIN signal
4. Call tcsetpgrp(0, getpid()) to make your new process group be the foreground process group (note: this is the most complicated, since it will itself cause SIGTTOU, so you'd have to ignore that signal at least temporarily anyway)
Options 2 and 3 will also only work if the program doesn't actually need stdin, which is the case with Docker. When SIGTTIN doesn't stop the process, reads from stdin will still fail with EIO, so if there's actually data you want to read, then you need to go with option 4 (and remember to set it back once the child exits).
If you have TOSTOP set (which is not the default), then you'd have to repeat the fix for SIGTTOU or for standard output and standard error (except for option 4, which wouldn't need to be repeated at all).
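As a rough illustration of option 2, here is a Python sketch (the thread itself uses Haskell and C) in which the child moves itself into a new process group and ignores SIGTTIN before exec. The helper name run_in_new_group is hypothetical, and how docker run -it behaves when launched this way has not been verified here.

```python
import os
import signal
import subprocess

def run_in_new_group(cmd, **kwargs):
    """Run cmd in its own process group with SIGTTIN ignored (hypothetical helper)."""
    def child_setup():
        # Runs in the child, after fork and before exec.
        os.setpgid(0, 0)  # new process group, like create_group=True
        # An ignored signal stays ignored across exec, so a read from the
        # terminal while in the background fails instead of stopping the child.
        signal.signal(signal.SIGTTIN, signal.SIG_IGN)
    return subprocess.run(cmd, preexec_fn=child_setup, **kwargs)
```

For example, run_in_new_group(["sh", "-c", "echo hello"]) runs the command in a fresh process group and returns a CompletedProcess. Note that preexec_fn is not safe to use in multi-threaded programs, so treat this as a sketch for simple scripts only.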
from dask.distributed import Client

def my_task():
    print("dask_worker_log_msg")
    ...

client = Client()
future = client.submit(my_task)
print("dask_client_log_msg")
...
I want to capture "dask_client_log_msg" and other client logs in one file, and "dask_worker_log_msg" and other task logs in a separate file. The client obviously runs in a separate process from the worker, so I need each process to log all of its messages to its own file. Thanks!
You can get logs from your workers with the Client.get_worker_logs method. You can also download logs from the dashboard in the info pane.
Here's a solution if you're trying to implement a Dask cluster and need the logs from all jobs that it runs (including logs from your scripts from print or logger.info):
Add a redirect in your bash script starting the worker:
dask-worker >> dask_worker.log 2>&1
In your script, use the distributed.worker logger, like so:
logger = logging.getLogger("distributed.worker")
Configure the log format in ~/.config/dask/distributed.yaml
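If you would rather split the files from Python than with a shell redirect, the steps above can be approximated with the standard logging module. This is only a sketch: log_to_file, the "my_client" logger name, and the file names are placeholders, and the worker-side call has to run inside the worker process to affect it.

```python
import logging

def log_to_file(logger_name, path, level=logging.INFO):
    """Attach a file handler to the named logger and return the logger."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger

# In the client script (placeholder names):
#     client_log = log_to_file("my_client", "dask_client.log")
#     client_log.info("dask_client_log_msg")
#
# In code that runs on a worker, once per worker process:
#     worker_log = log_to_file("distributed.worker", "dask_worker_tasks.log")
#     worker_log.info("dask_worker_log_msg")
```

Call log_to_file once per process rather than inside every task; each call adds another handler, so repeated calls would duplicate every line in the file.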
See also How to capture logs from workers from a Dask-Yarn job?
My goal is to import data from CSV-files into OrientDB.
I use the OrientDB 2.2.22 Docker image.
When I try to execute the /orientdb/bin/oetl.sh config.json script within Docker, I get the error: "Can not open storage it is acquired by other process".
I guess this is because the OrientDB service is still running. But if I try to stop it, I get another error:
./orientdb.sh stop
./orientdb.sh: return: line 70: Illegal number: root
or
./orientdb.sh status
./orientdb.sh: return: line 89: Illegal number: root
The only way to use the ./oetl.sh script is to stop the Docker instance and restart it in interactive mode running the shell, but this is awkward, because to use OrientDB Studio I have to stop Docker again and start it in normal mode.
As Roberto Franchini mentioned above, setting the dbURL parameter in the Loader to a remote URL fixed the first issue, "Can not open storage it is acquired by other process".
The issue with ./orientdb.sh still exists, but with the remote-URL approach I don't need to shut down and restart the service anymore.
I am running Dask on a SLURM-managed cluster.
dask-ssh --nprocs 2 --nthreads 1 --scheduler-port 8786 --log-directory `pwd` --hostfile hostfile.$JOBID &
sleep 10
# We need to tell dask Client (inside python) where the scheduler is running
scheduler="`hostname`:8786"
echo "Scheduler is running at ${scheduler}"
export ARL_DASK_SCHEDULER=${scheduler}
echo "About to execute $CMD"
eval $CMD
# Wait for dask-ssh to be shut down from the Python code
wait %1
I create a Client inside my Python code and then, when finished, I shut it down.
c=Client(scheduler_id)
...
c.shutdown()
My reading of the dask-ssh help is that the shutdown will shut down all workers and then the scheduler. But it does not stop the background dask-ssh, so the job eventually times out.
I've tried this interactively in the shell. I cannot see how to stop the scheduler.
I would appreciate any help.
Thanks,
Tim
Recommendation with --scheduler-file
First, when setting up with SLURM you might consider using the --scheduler-file option, which allows you to coordinate the scheduler address using your NFS (which I assume you have, given that you're using SLURM). I recommend reading this doc section: http://distributed.readthedocs.io/en/latest/setup.html#using-a-shared-network-file-system-and-a-job-scheduler
dask-scheduler --scheduler-file /path/to/scheduler.json
dask-worker --scheduler-file /path/to/scheduler.json
dask-worker --scheduler-file /path/to/scheduler.json
>>> client = Client(scheduler_file='/path/to/scheduler.json')
Given this, it also becomes easier to use the sbatch or qsub command directly. Here is an example with SGE's qsub:
# Start a dask-scheduler somewhere and write connection information to file
qsub -b y /path/to/dask-scheduler --scheduler-file /path/to/scheduler.json
# Start 100 dask-worker processes in an array job pointing to the same file
qsub -b y -t 1-100 /path/to/dask-worker --scheduler-file /path/to/scheduler.json
Client.shutdown
It looks like client.shutdown only shuts down the client. You're correct that this is inconsistent with the docstring. I've raised an issue here: https://github.com/dask/distributed/issues/1085 for tracking further developments.
In the meantime
These three commands should suffice to tear down the workers, close the scheduler, and stop the scheduler process:
client.loop.add_callback(client.scheduler.retire_workers, close_workers=True)
client.loop.add_callback(client.scheduler.terminate)
client.run_on_scheduler(lambda dask_scheduler: dask_scheduler.loop.stop())
What people usually do
Typically people start and stop clusters with the same means they used to start them. This might involve using SLURM's job-cancellation command (scancel). We should make the client-focused way more consistent regardless, though.
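If the cluster is launched from a driver script rather than from the batch script, "stopping it with the same means that started it" could look roughly like the following Python sketch. It is not a Dask API: background_process is a made-up helper, and the command you pass in (for example your dask-ssh invocation) is up to you.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def background_process(cmd, timeout=10):
    """Start cmd in the background and make sure it is stopped on exit."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()  # polite request first (SIGTERM)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if it did not exit in time
            proc.wait()
```

The work that needs the cluster goes inside the with block; when the block exits, normally or through an exception, the launcher process is terminated so the batch job does not sit waiting on it. Whether terminating the launcher also stops remote workers depends on the launcher and is not covered by this sketch.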
In a rails application (or sinatra), if I make a call to a shell command, under what context does this command run?
I'm not sure if I am asking my question correctly, but does it run in the same thread as the rails process?
When you shell out, is it possible to make this an asynchronous call? If yes, does this mean that at the operating system level it will start a new thread? Can it start in a pool of threads instead of a new thread?
If you are using system('cmd') or simply backticks:
`cmd`
Then the command will be executed in the context of a subshell.
If you wish to run several of these at a time, you can use Ruby's fork functionality:
fork { system('cmd') }
fork { system('cmd') }
This will create multiple subprocesses, which run the individual commands in their respective subshells.
Read up on forking here: http://www.ruby-doc.org/core-2.0/Process.html#method-c-fork
It's more than just a new thread; it's a completely separate process. It will be synchronous, and control will not return to Ruby until the command has completed. If you want a fire-and-forget solution, you can simply background the task:
$ irb
irb(main):001:0> system("sleep 30 &")
=> true
irb(main):002:0>
$ ps ax | grep sleep
3409 pts/4 S 0:00 sleep 30
You can start as many processes as you want via system("foo &") or `foo &`.
If you want more control over launching background processes from Ruby, including properly detaching ttys and a host of other things, check out the daemons gem. That's more suitable for long-running processes that you want to manage, with PID files, etc., but it's also possible to just launch tasks with it.
There are alternative solutions for managing background processes depending on your needs. The resque gem is popular for queuing and managing background jobs. It requires Redis and some setup, but it's good if you need that level of control.