Correctly killing newly spawned isolates - dart

I am aware that when both the microtask and event queues of an isolate are empty, the isolate is killed. However, I'm not able to find a reference in the documentation on how a worker isolate can be killed under certain circumstances.
Context
Let's make this example:
import 'dart:isolate';

Future<void> main() async {
  final receivePort = ReceivePort();
  final worker = await Isolate.spawn<SendPort>((_) {}, receivePort.sendPort);
  await runMyProgram(receivePort, worker);
}
Here the main isolate is creating a new one (worker) and then the program starts doing stuff.
Question
How do I manually kill the newly spawned isolate when it's no longer needed? I wasn't able to find this explicitly in the documentation, so I am kind of guessing. Do I have to do this?
receivePort.close();
worker.kill();
Or is it enough to just close the port, like this?
receivePort.close();
Note
I thought about this: if the worker isolate has both queues (microtask and event) empty and I close the receive port, it should be killed automatically. If that's the case, calling receivePort.close() should be enough!

If you want to make sure the child isolate shuts down, even if it's in the middle of doing something, you'll want to call Isolate.kill().
However, as you've already pointed out, an isolate will exit on its own if it has no more events to process and it holds no open ports (e.g., timers, open sockets, isolate ports, etc). For most cases, this is the ideal way to dispose of an isolate when it's no longer used since it eliminates the risk of killing the isolate while it's in the middle of doing something important.
Assuming your child isolate is good about cleaning up its own open ports when it's done doing what it needs to do, receivePort.close() should be enough for you to let it shut down.
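For illustration, here is a minimal sketch of that pattern. The workerMain entry point and the 'stop' command are made up for the example; the point is just that the worker closes its own port once it's done, so both isolates can exit on their own.

import 'dart:isolate';

// Hypothetical worker: it opens its own command port and closes it when told
// to stop, leaving it with no open ports and nothing left to do.
void workerMain(SendPort toMain) {
  final commands = ReceivePort();
  toMain.send(commands.sendPort);
  commands.listen((message) {
    if (message == 'stop') {
      commands.close(); // last open port in the worker: it can now exit
    }
  });
}

Future<void> main() async {
  final fromWorker = ReceivePort();
  await Isolate.spawn(workerMain, fromWorker.sendPort);

  final toWorker = await fromWorker.first as SendPort;
  toWorker.send('stop'); // the worker cleans up and shuts down on its own
  fromWorker.close();    // the main isolate holds no open ports either
}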

You can kill an isolate from the outside, using the Isolate.kill method on an Isolate object representing that isolate.
(That's why you should be careful about giving away such isolate objects, and why you can create an isolate object without the "kill" capability, that you can more safely pass around.)
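As a small illustration of that last point, an Isolate object rebuilt from just the control port carries no capabilities, so a kill request sent through it is ignored. The worker entry point and messages below are only examples.

import 'dart:isolate';

Future<void> main() async {
  final replies = ReceivePort();
  final worker = await Isolate.spawn(
    (SendPort p) => p.send('still running'),
    replies.sendPort,
  );

  // Same isolate, but without pause/terminate capabilities attached.
  final harmless = Isolate(worker.controlPort);
  harmless.kill(); // ignored: this object lacks the terminate capability

  print(await replies.first); // the worker still ran and replied
  replies.close();
}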
You can immediately kill an isolate from the inside using the static Isolate.exit.
Or using Isolate.current.kill. It's like Process.exit, but only for a single isolate.
Or you can make sure you have closed every open receive port in the isolate, and stopped doing anything.
That's the usual approach, but it can fail if you run code provided by others in your isolate. They might open receive ports or start periodic timers which run forever, and that you know nothing about.
(You can try to contain that code in a Zone where you control timers, but that won't stop them from creating receive ports, and they can always access Zone.root directly to leave the zone you put them in.)
Or someone might have paused your isolate using Isolate.pause, so the worker code won't run.
If I wanted to be absolutely certain that an isolate is killed, I'd start out by communicating with my own code running in that isolate (through the port receiving worker instructions) and telling it to shut down nicely, as part of the protocol I am already using to communicate.
The worker code can choose to use Isolate.exit when it's done, or just close all its own resources and hope that's enough. I'd probably tend to use Isolate.exit, but only after waiting for existing worker tasks to finish.
Such a worker task might be hanging (waiting for a future which will never complete), or it might be live-locking everything by being stuck in a while (true) { ..can't stop, won't stop!.. } loop. In that case, the waiting should have a timeout.
Because of that, I'd also listen for the isolate to shut down using Isolate.addOnExitHandler, and start a timer for some reasonable duration. If I haven't received the "on exit" notification before the timer runs out, or some feedback on the worker shutdown request telling me that things are fine, I'd escalate to isolate.kill(priority: Isolate.immediate), which can kill even a while (true) ... loop.
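A rough sketch of that escalation, assuming the worker already understands a 'shutdown' message on some command port; the names, the message, and the 5-second deadline are all illustrative:

import 'dart:async';
import 'dart:isolate';

Future<void> shutDownWorker(Isolate worker, SendPort commandPort) async {
  final onExit = ReceivePort();
  worker.addOnExitHandler(onExit.sendPort, response: 'exited');

  commandPort.send('shutdown'); // ask the worker code to finish up nicely

  try {
    // Give the current worker task a reasonable amount of time to drain.
    await onExit.first.timeout(const Duration(seconds: 5));
  } on TimeoutException {
    // Hanging or live-locked: force it down.
    worker.kill(priority: Isolate.immediate);
  } finally {
    onExit.close();
  }
}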

Related

Why does creating a single ReceivePort cause the Dart VM to hang?

e.g.:
import 'dart:isolate';
void main() { var p = new ReceivePort(); }
This will make the whole VM hang until I Ctrl-C it. Why is this?
Dart's main function operates a bit differently than on other platforms. It's more of an 'init' than anything else; it can exit and the application may continue running. A Dart VM application stays alive as long as it is listening for events. This generally means one or more open Streams. A ReceivePort is a Stream. Closing this stream would let the application terminate.
You can verify this by running this script with dart --observe script.dart and viewing the application in Observatory. You'll notice that you have one isolate and it is 'idle' - this means there are ports open that are waiting for messages. You can click 'see ports' in the isolate panel and the ReceivePort will be the only item in the list. In general, if you are hanging and you can't figure out why, fire up Observatory and check which ports are open.
A Dart isolate stays alive as long as it has something to do.
If you start an asynchronous computation in main, then the isolate keeps running after main completes, waiting for the computation to complete.
When there are no further computations running, the program ends.
A ReceivePort is a port that can receive data from somewhere else. As long as one of those is open, the isolate can't know that it's done: a new event might arrive on the ReceivePort and trigger more computation. The isolate itself doesn't know whether anyone holds a SendPort that can send it data; it just assumes that it's possible.
So, a ReceivePort keeps the isolate, and the program, alive because the program doesn't know for sure that it's not done computing yet. That's a good thing. You can create a new isolate and have it wait for commands on a ReceivePort without that isolate shutting down the first time it's idle.
It does mean that you need to close your ports when you are done.
I believe the thread (or web worker) started by the ReceivePort is still alive and needs to be explicitly shut down before the whole app can exit. Try adding p.close(); if the program then exits, that explains it.
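For completeness, a minimal version of the script that does exit once the port is closed:

import 'dart:isolate';

void main() {
  var p = new ReceivePort();
  // ... hand out p.sendPort and use the port here ...
  p.close(); // no open ports and nothing left to do, so the VM can exit
}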

Is there an Erlang behaviour that can act on its own instead of waiting to be called?

I'm writing an Erlang application that requires actively polling some remote resources, and I want the process that does the polling to fit into the OTP supervision trees and support all the standard facilities like proper termination, hot code reloading, etc.
However, the two default behaviours, gen_server and gen_fsm, seem to only support operation based on callbacks. I could abuse gen_server to do it through calls to self, or abuse gen_fsm by having a single state that always loops to itself with a timeout of 0, but I'm not sure that's safe (i.e. that it doesn't exhaust the stack or accumulate unread messages in the mailbox).
I could make my process into a special process and write all that handling myself, but that effectively makes me reimplement the Erlang equivalent of the wheel.
So is there a behavior for code like this?
loop(State) ->
    NewState = do_stuff(State), % without waiting to be called
    loop(NewState).
And if not, is there a safe way to trick default behaviours into doing this without exhausting the stack or accumulating messages over time or something?
The standard way of doing that in Erlang is by using erlang:send_after/3. See this SO answer and also this example implementation.
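For concreteness, a minimal sketch of a polling gen_server driven by erlang:send_after/3. The module name, do_poll/1, and the 5-second interval are only illustrative.

-module(poller).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(INTERVAL, 5000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Schedule the first poll and return immediately; init/1 stays fast.
    erlang:send_after(?INTERVAL, self(), poll),
    {ok, #{}}.

handle_info(poll, State) ->
    NewState = do_poll(State),
    %% Re-arm the timer only after the work is done, so polls never pile up.
    erlang:send_after(?INTERVAL, self(), poll),
    {noreply, NewState};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

%% Placeholder for the actual remote-resource polling.
do_poll(State) ->
    State.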
Is it possible that you could employ an essentially non-OTP-compliant process? Although to be a good OTP citizen you ideally want to make your long-running processes into gen_servers and gen_fsms, sometimes you have to look beyond the standard-issue rule book and consider why the rules exist.
What if, for example, your supervisor starts your gen_server, and your gen_server spawns another process (let's call it the active_poll process), and they link to each other so that they share fate (if one dies, the other dies)? The active_poll process is now indirectly supervised by the supervisor that spawned the gen_server, because if it dies, so will the gen_server, and both will be restarted. The only problem you really have to solve now is code upgrade, but this is not too difficult: your gen_server gets a code_change callback when the code is to be upgraded, and it can simply send a message to the active_poll process, which can then make an appropriate fully qualified function call, and bingo, it's running the new code (see the sketch below).
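A fragment showing only the relevant pieces of that idea, on top of a gen_server skeleton like the one above; do_stuff/1 is the function from the question, everything else is illustrative.

%% In the gen_server:
init([]) ->
    Pid = spawn_link(fun() -> active_poll_loop(initial_state) end),
    {ok, #{poller => Pid}}.

code_change(_OldVsn, State = #{poller := Pid}, _Extra) ->
    %% Ask the poller to re-enter its loop through a fully qualified call,
    %% which picks up the newly loaded module version.
    Pid ! upgrade,
    {ok, State}.

%% The linked poller process; active_poll_loop/1 must be exported so the
%% fully qualified call can reach the new module version.
active_poll_loop(PollState) ->
    NewState = do_stuff(PollState),
    receive
        upgrade -> ?MODULE:active_poll_loop(NewState)
    after 0 ->
        %% Poll again right away; put a small timeout here if you need to
        %% throttle (see the busy-waiting caveat in the answer below).
        active_poll_loop(NewState)
    end.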
If this doesn't suit you for some reason and/or you MUST use gen_server/gen_fsm/similar directly...
I'm not sure that writing a 'special process' really gives you very much. If you wrote a special process correctly, such that it is in theory compliant with OTP design principles, it could still be ineffective in practice if it blocks or busy-waits in a loop somewhere and doesn't invoke sys when it should, so at most you have a small optimisation over using gen_server/gen_fsm with a zero timeout (or with an async message handler which does the polling and sends a message to self to trigger the next poll).
If whatever you are doing to actively poll can block (such as a blocking socket read, for example), that is really big trouble, as a gen_server, gen_fsm or special process will all be stopped from fulfilling their usual obligations (which they would usually be able to do either because the callback returns, in the case of gen_server/gen_fsm, or because receive is called and the sys module invoked explicitly, in the case of a special process).
If what you are doing to actively poll is non-blocking, though, you can do it, but if you poll without any delay it effectively becomes a busy wait. (It's not quite one, because the loop will include a receive call somewhere, which means the process will yield, giving the scheduler a voluntary opportunity to run other processes, but it's not far off, and it will still be a relative CPU hog.) If you can have even a 1 ms delay between polls, that makes a world of difference compared to polling as rapidly as you can. It's not ideal, but if you MUST, it'll work. So use a timeout (as big as you can without it becoming a problem), or have an async message handler which does the polling and sends a message to self to trigger the next poll.

Erlang process termination: Where/When does it happen?

Consider processes all linked in a tree, either a formal supervision tree, or some ad-hoc structure.
Now, consider some child or worker down in this tree, with a parent or supervisor above it. I have two questions.
We would like to "gracefully" exit this process if it needs to be killed or shutdown, because it could be halfway through updating some account balance. Assume we have properly coded up some terminate function and connected this process to others with the proper plumbing. Now assume this process is in its main loop doing work. The signal to terminate comes in. Where exactly (or possibly the question should be WHEN EXACTLY) does this termination happen? In other words, when will terminate be called? Will the thing just preempt itself right in the middle of the loop it is running and call terminate? Will it wait until the end of the loop but before starting the loop again? Will it only do it while in receive mode? Etc.
Same question but without terminate function having been coded. Assume parent process is a supervisor, and this child is following normal OTP conventions. Parent tells child to shutdown, or parent crashes or whatever. The child is in its main loop. When/where/how does shutdown occur? In the middle of the main loop? After it? Etc.
It is quite nicely explained in the docs (sections 12.4, 12.5, 12.6, 12.7).
There are two cases:
Your process terminates due to some bad logic.
It throws an error, so it can be in the middle of work, and this could be bad. If you want to prevent that, you can try to define a mechanism that involves two processes: the first one begins the transaction, the second one does the actual work, and after that the first one commits the changes. If something bad happens to the second process (it dies because of errors), the first one simply does not commit the changes.
You are trying to kill the process from the outside, for example when your supervisor restarts it or a linked process dies.
In this case you can also be in the middle of something, but Erlang gives you the trap_exit flag. It means that instead of dying, the process will receive a message that you can handle. That in turn means that the terminate function will be called once you get back to the receive block. So the process will finish one chunk of work and, when it is ready for the next one, it will call terminate and then die.
So you can intercept the exit by using trap_exit. You can in turn bypass trap_exit by sending exit(Pid, kill), which terminates the process even if it traps exits.
There is no way to bypass exit(Pid, kill), so be careful when using it.
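A small sketch of that trap_exit behaviour in a gen_server; the module name and the account-balance work are only illustrative.

-module(balance_worker).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% With trap_exit set, a shutdown signal from the supervisor arrives as
    %% a message, so the callback that is currently running finishes first
    %% and terminate/2 is called before the process dies.
    process_flag(trap_exit, true),
    {ok, #{balance => 0}}.

handle_call({add, Amount}, _From, State = #{balance := B}) ->
    %% This whole callback runs to completion even if a shutdown request
    %% comes in halfway through it.
    {reply, ok, State#{balance := B + Amount}}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.

terminate(_Reason, _State) ->
    %% Flush any half-finished bookkeeping here.
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.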

handle saving of transient gen_servers states when using a key-to-pid mechanism

I would like to know how to handle saving the state of transient gen_servers when they are associated with a key.
To associate keys with processes, I use a process called pidstore. Pidstore eventually starts processes:
I give a Key and an M,F,A to pidstore; it looks for the key in global, then either returns the pid if found, or applies the MFA (which must return {ok, Pid}), registers the Pid with the key in global, and returns the Pid.
I may have many inactive gen_servers with possibly huge state, so I've set up a handle_info callback to save the state in my database and then stop the process. The gen_servers are marked transient in their supervisor, so they won't be restarted until something needs them again.
Here the problems start: if I call a process by its key, say {car, 23}, during the saving step of handle_info in the process which represents {car, 23}, I'll get the pid back as intended, because the process is still saving and not yet finished. So I'll call my process with gen_server:call, but I'll never get a response (and will hit the default 5-second timeout) because the process is stopping. (PROBLEM A)
To solve this, the process could unregister itself from global, then save its state, then stop. But if I need it after it has unregistered but before the save is finished, I will start a new process, and this process could load non-updated values from the database. (PROBLEM B)
To solve that in turn, I could ensure that loads from and saves to the db are enqueued and cannot run concurrently. This could become a bottleneck. (PROBLEM C)
I've been thinking about another solution: my processes, before saving, could tell the pidstore that they are busy. The pidstore would keep a list of busy processes and respond 'busy' to any request for those keys.
When the save is done, the pidstore would be told no_more_busy by the process and could start a new process when asked for that key. (Even if the old process has not finished yet, it is done saving, so it can just take its time to die on its own.)
This seems a bit messy to me, but it feels simpler to make several attempts to get the Pid from the key than to wrap every call to a gen_server to handle the possible timeouts (when the process is finishing but is still registered in global).
I'm a bit confused by all of these half-problems and half-solutions. What design do you use in this situation, or how can I avoid it altogether?
I hope my message is legible; please tell me about English errors too.
Thank you.
Maybe you want to do the save-to-DB part in a gen_server:call. That would prevent other calls from coming in while you are writing to the DB.
Generally it sounds like you have created a process registry. You might want to look into gproc (https://github.com/uwiger/gproc), which does a very good job at that if you want to register locally. With gproc you can do exactly what you described above: use a key to register a process. Maybe it would be good enough to register with gproc in your init function and unregister when writing to the DB. You could also write to the DB in your terminate function.
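A hedged fragment of that suggestion; the {car, Id} key shape comes from the question, while the state layout and the load_state/1 and save_state/1 helpers are made up.

init({car, Id}) ->
    %% {n, l, Key} is a unique, locally registered name in gproc.
    true = gproc:reg({n, l, {car, Id}}),
    {ok, load_state(Id)}.

handle_info(idle_timeout, State = #{id := Id}) ->
    %% Unregister first, so lookups made while the save runs start a fresh
    %% process instead of calling into one that is about to stop (PROBLEM A).
    %% The stale-read concern from PROBLEM B still applies unless DB access
    %% is serialized per key.
    true = gproc:unreg({n, l, {car, Id}}),
    ok = save_state(State),
    {stop, normal, State}.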
For now I decided to stick with the Erlang "let it crash" philosophy. If a process receives messages as it is shutting down, those messages will not be answered and will trigger a gen_server:call/* timeout.
It will be tedious to handle this timeout in the right place (I have not decided where yet), but that is specific to my application, so it is beside the point here.

Waiting for applications to finish loading [duplicate]

I have an application which needs to run several other applications in a chain. I am running them via ShellExecuteEx. The order of running each of the apps is very important because they are dependent on each other. For example:
Start(App1);
If App1.IsRunning then
  Start(App2);
If App2.IsRunning then
  Start(App3);
...
If App(N-1).IsRunning then
  Start(App(N));
Everything works fine but there is a one possible problem:
ShellExecuteEx starts the application and returns almost immediately. The problem might arise when, for example, App1 has started properly but has not finished some internal tasks and is not yet ready to use. ShellExecuteEx is already starting App2, which depends on App1, and App2 won't start properly because it needs a fully initialized App1.
Please note, that I don't want to wait for App(N-1) to finish and then start AppN.
I don't know if this is possible to solve with ShellExecuteEx, I've tried to use
SEInfo.fMask := SEE_MASK_NOCLOSEPROCESS or SEE_MASK_NOASYNC;
but without any effect.
After starting the AppN application I have a handle to its process. If I assume that an application is initialized once its main window is created (all of the apps have a window), can I somehow put a hook on its message queue and wait until WM_CREATE appears, or maybe WM_ACTIVATE? In the presence of such a message my application would know that it can move on.
It's just an idea; however, I don't know how to set up such a hook. So if you could help me with this, or you have a better idea, that would be great :)
Also, the solution must work on Windows XP and above.
Thanks for your time.
Edited
@Cosmic Prund: I don't understand why you deleted your answer? I might try your idea...
You can probably achieve what you need by calling WaitForInputIdle() on each process handle returned by ShellExecute().
Waits until the specified process has finished processing its initial input and is waiting for user input with no input pending, or until the time-out interval has elapsed.
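A rough Delphi sketch of that approach; the executable name, the 10-second timeout, and the follow-up comment are only illustrative.

program StartChain;

uses Windows, ShellAPI;

procedure StartAndWaitForIdle(const ExeName: string);
var
  SEInfo: TShellExecuteInfo;
begin
  FillChar(SEInfo, SizeOf(SEInfo), 0);
  SEInfo.cbSize := SizeOf(SEInfo);
  SEInfo.fMask := SEE_MASK_NOCLOSEPROCESS; // we need the process handle back
  SEInfo.lpFile := PChar(ExeName);
  SEInfo.nShow := SW_SHOWNORMAL;
  if ShellExecuteEx(@SEInfo) then
  try
    // Returns once the new process has finished its initial input processing
    // (its message queue is idle), or after 10 seconds.
    WaitForInputIdle(SEInfo.hProcess, 10000);
  finally
    CloseHandle(SEInfo.hProcess);
  end;
end;

begin
  StartAndWaitForIdle('App1.exe');
  // App1 now has its UI up; it should be safe to start App2 here.
end.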
If your application has some custom initialization logic that doesn't run in UI thread then WaitForInputIdle might not help. In that case you need a mechanism to signal the previous app that you're done initializing.
For signaling you can use named pipes, sockets, some RPC mechanism or a simple file based lock.
You can always use IPC and interprocess synchronization to make your applications communicate with (and wait for, if needed) each other, as long as you code both applications.

Resources