Why doesn't process_flag(trap_exit, true) work here? - Erlang

When I try to use process_flag to catch a child's error report, the log shows that trap_exit doesn't work. This problem has been troubling me for 20 hours.
The following log doesn't contain my trap log line, which should appear here: lager:info("loop_1_0,~p,~p",[From,Reason]); %%<---should show this line.
(emacs@yus-iMac.local)20> 05:18:26.048 [info] test_a_1
05:18:26.249 [info] levi_simulate_init_1,false
05:18:26.250 [error] gen_server pid_simulate_reader_user terminated with reason: no match of right hand value 3 in levi_simulate:handle_call/3 line 471
05:18:26.250 [error] CRASH REPORT Process pid_simulate_reader_user with 0 neighbours exited with reason: no match of right hand value 3 in levi_simulate:handle_call/3 line 471 in gen_server:terminate/6 line 747
The following is my example code:
1. `levi_simulate_tests.erl` main content
error_test_a()->
    close_server(?PID_SIMULATE_READER_USER),
    timer:sleep(200),
    lager:info("test_a_1"),
    spawn_trap_exit(fun crash_test_a/0),
    pass.

spawn_trap_exit(Fun)->
    _Pid = spawn(fun()->
                     process_flag(trap_exit,true),
                     Fun(),
                     loop(),
                     lager:info("receive_after loop")
                 end).

loop()->
    receive
        {'EXIT',From,Reason}->
            lager:info("loop_1_0,~p,~p",[From,Reason]); %%<---should show this line.
        X ->
            lager:info("loop_2,~p",[X]),
            loop()
    after 3000 ->
            ok
    end.

crash_test_a()->
    close_server(?PID_SIMULATE_READER_USER),
    timer:sleep(200),
    {ok,Pid} = levi_simulate:start_link(false,?PID_SIMULATE_READER_USER,true,[]),
    Id = 1,
    gen_server:call(Pid,{test_only}),
    ok.

close_server(Server)->
    try
        Pid = whereis(Server),
        case is_process_alive(Pid) of
            true ->
                exit(Pid,shut);
            false ->
                ok
        end
    catch
        _:_->
            ok
    end.
===
2. `levi_simulate.erl` main content
-module(levi_simulate).
-compile([{parse_transform, lager_transform}]).
-behaviour(gen_server).

start_link(Need_link_ui_pid,Server_name,Connection_condition,
           Tag_id_list) ->
    gen_server:start_link({local,Server_name},?MODULE,
                          [Need_link_ui_pid,Server_name,
                           Connection_condition,Tag_id_list], []).

init([Need_link_ui_pid,Server_name,Connection_condition,Tag_id_list]) ->
    case Need_link_ui_pid of
        true ->
            true = erlang:link(whereis(?PID_UI));
        false ->
            lager:info("levi_simulate_init_1,false"),
            ok
    end,
    %% A = 2,
    %% A = 3,
    ok = levi_tag:init(Tag_id_list),
    {ok, #state{connection_condition=Connection_condition,
                pid_symbol = Server_name}}.

handle_call(Request, From, State) ->
    A = 2,
    A = 3, %% <------ create exit here
    {reply,ok,State}.

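One reason could be that the error you generate in handle_call/3 makes gen_server:call/2 raise an exception in the calling process as well, so that process crashes too and never enters the loop/0 function. process_flag(trap_exit, true) only turns exit signals from linked processes into {'EXIT', From, Reason} messages; it does not catch an exception raised by a call made in the trapping process itself. An easy way to test this would be to replace the gen_server call with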
catch gen_server:call(Pid,{test_only}),
and see what happens.
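A minimal, self-contained sketch of that idea, assuming the crashing server can be reduced to a handle_call/3 that fails with badmatch. The module name trap_exit_demo, the {crash, N} request and run/0 are hypothetical, and lager is replaced by plain return values. The trapping process catches the exit raised by gen_server:call/2 and only then receives the link's {'EXIT', Pid, Reason} message:

```erlang
-module(trap_exit_demo).
-behaviour(gen_server).

-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for levi_simulate: handle_call/3 fails with badmatch.
init([]) ->
    {ok, no_state}.

handle_call({crash, N}, _From, State) ->
    2 = N,                          %% badmatch when N =/= 2, like A = 3 in the question
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Spawns a process that traps exits, links to the server, makes the crashing
%% call, and reports {CallOutcome, ExitMessageReason} back to the caller.
run() ->
    Parent = self(),
    spawn(fun() ->
                  process_flag(trap_exit, true),
                  {ok, Pid} = gen_server:start_link(?MODULE, [], []),
                  %% The server's crash makes gen_server:call/2 raise an exit
                  %% exception in THIS process; trap_exit does not convert it.
                  Outcome = try gen_server:call(Pid, {crash, 3}) of
                                Reply -> {reply, Reply}
                            catch
                                exit:CallReason -> {call_exited, CallReason}
                            end,
                  %% Having survived the call, the link's exit signal is now
                  %% delivered as an ordinary message.
                  receive
                      {'EXIT', Pid, ExitReason} ->
                          Parent ! {done, Outcome, ExitReason}
                  after 3000 ->
                          Parent ! {done, Outcome, no_exit_message}
                  end
          end),
    receive
        {done, Res, Why} -> {Res, Why}
    after 5000 ->
        timeout
    end.
```

In the shell, c(trap_exit_demo) followed by trap_exit_demo:run() should return a {{call_exited, ...}, {{badmatch, 3}, ...}} tuple, alongside the server's own crash report. Without the try/catch, the spawned process would die inside gen_server:call/2, which matches the log in the question.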

Related

Interaction with MailboxProcessor and Task hangs forever

I want to process a series of jobs in sequence, but I want to queue up those jobs in parallel.
Here is my code:
open System.Threading.Tasks
let performWork (work : int) =
  task {
    do! Task.Delay 1000
    if work = 7 then
      failwith "Oh no"
    else
      printfn $"Work {work}"
  }
async {
  let w = MailboxProcessor.Start (fun inbox -> async {
    while true do
      let! message = inbox.Receive()
      let (ch : AsyncReplyChannel<_>), work = message
      do!
        performWork work
        |> Async.AwaitTask
      ch.Reply()
  })
  w.Error.Add(fun exn -> raise exn)
  let! completed =
    seq {
      for i = 1 to 10 do
        async {
          do! Async.Sleep 100
          do! w.PostAndAsyncReply(fun ch -> ch, i)
          return i
        }
    }
    |> fun jobs -> Async.Parallel(jobs, maxDegreeOfParallelism = 4)
  printfn $"Completed {Seq.length completed} job(s)."
}
|> Async.RunSynchronously
I expect this code to crash once it reaches work item 7.
However, it hangs forever:
$ dotnet fsi ./Test.fsx
Work 3
Work 1
Work 2
Work 4
Work 5
Work 6
I think that the w.Error event is not firing correctly.
How should I be capturing and re-throwing this error?
If my work is async, then it crashes as expected:
let performWork (work : int) =
  async {
    do! Async.Sleep 1000
    if work = 7 then
      failwith "Oh no"
    else
      printfn $"Work {work}"
  }
But I don't see why this should matter.
Leveraging a Result also works, but again, I don't know why this should be required.
async {
  let w = MailboxProcessor.Start (fun inbox -> async {
    while true do
      let! message = inbox.Receive()
      let (ch : AsyncReplyChannel<_>), work = message
      try
        do!
          performWork work
          |> Async.AwaitTask
        ch.Reply(Ok ())
      with exn ->
        ch.Reply(Error exn)
  })
  let performWorkOnWorker (work : int) =
    async {
      let! outcome = w.PostAndAsyncReply(fun ch -> ch, work)
      match outcome with
      | Ok () ->
        return ()
      | Error exn ->
        return raise exn
    }
  let! completed =
    seq {
      for i = 1 to 10 do
        async {
          do! Async.Sleep 100
          do! performWorkOnWorker i
          return i
        }
    }
    |> fun jobs -> Async.Parallel(jobs, maxDegreeOfParallelism = 4)
  printfn $"Completed {Seq.length completed} job(s)."
}
|> Async.RunSynchronously
I think the problem is in your error handling:
w.Error.Add(fun exn -> raise exn)
Instead of handling the exception, you're attempting to raise it again, which I think is causing an infinite loop.
You can change this to print the exception instead:
w.Error.Add(printfn "%A")
Result is:
Work 4
Work 2
Work 1
Work 3
Work 5
Work 6
System.AggregateException: One or more errors occurred. (Oh no)
---> System.Exception: Oh no
at Program.performWork@4.MoveNext() in C:\Users\Brian Berns\Source\Repos\FsharpConsole\FsharpConsole\Program.fs:line 8
--- End of inner exception stack trace ---
I think the gist of the 'why' here is that Microsoft changed the behaviour for 'unobserved' task exceptions back in .NET 4.5, and this was brought through into .NET Core: these exceptions no longer cause the process to terminate, they're effectively ignored. You can read more about it here.
I don't know the ins and outs of how Task and async are interoperating, but it would seem that the use of Task results in the continuations being attached to that and run on the TaskScheduler as a consequence. The exception is thrown as part of the async computation within the MailboxProcessor, and nothing is 'observing' it. This means the exception ends up in the mechanism referred to above, and that's why your process no longer crashes.
You can change this behaviour via a flag on .NET Framework via app.config, as explained in the link above. For .NET Core, you can't do this. You'd ordinarily try and replicate this by subscribing to the UnobservedTaskException event and re-throwing there, but that won't work in this case as the Task is hung and won't ever be garbage collected.
To try and prove the point, I've amended your example to include a timeout for PostAndAsyncReply. This means that the Task will eventually complete and can be garbage collected, and when the finaliser runs, the event is fired.
open System
open System.Threading.Tasks
let performWork (work : int) =
  task {
    do! Task.Delay 1000
    if work = 7 then
      failwith "Oh no"
    else
      printfn $"Work {work}"
  }
let worker = async {
  let w = MailboxProcessor.Start (fun inbox -> async {
    while true do
      let! message = inbox.Receive()
      let (ch : AsyncReplyChannel<_>), work = message
      do!
        performWork work
        |> Async.AwaitTask
      ch.Reply()
  })
  w.Error.Add(fun exn -> raise exn)
  let! completed =
    seq {
      for i = 1 to 10 do
        async {
          do! Async.Sleep 100
          do! w.PostAndAsyncReply((fun ch -> ch, i), 10000)
          return i
        }
    }
    |> fun jobs -> Async.Parallel(jobs, maxDegreeOfParallelism = 4)
  printfn $"Completed {Seq.length completed} job(s)."
}
TaskScheduler.UnobservedTaskException.Add(fun ex ->
  printfn "UnobservedTaskException was fired, re-raising"
  raise ex.Exception)
try
  Async.RunSynchronously worker
with
| :? TimeoutException -> ()
GC.Collect()
GC.WaitForPendingFinalizers()
The output I get here is:
Work 1
Work 3
Work 4
Work 2
Work 5
Work 6
UnobservedTaskException was fired, re-raising
Unhandled exception. System.AggregateException: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (One or more errors occurred. (Oh no))
---> System.AggregateException: One or more errors occurred. (Oh no)
---> System.Exception: Oh no
at Program.performWork@5.MoveNext() in /Users/cmager/dev/ConsoleApp1/ConsoleApp2/Program.fs:line 9
--- End of inner exception stack trace ---
at Microsoft.FSharp.Control.AsyncPrimitives.Start@1078-1.Invoke(ExceptionDispatchInfo edi)
at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction) in D:\a\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 104
at Microsoft.FSharp.Control.AsyncPrimitives.AttachContinuationToTask@1144.Invoke(Task`1 completedTask) in D:\a\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 1145
at System.Threading.Tasks.ContinuationTaskFromResultTask`1.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of inner exception stack trace ---
at Program.clo@46-4.Invoke(UnobservedTaskExceptionEventArgs ex) in /Users/cmager/dev/ConsoleApp1/ConsoleApp2/Program.fs:line 48
at Microsoft.FSharp.Control.CommonExtensions.SubscribeToObservable@1989.System.IObserver<'T>.OnNext(T args) in D:\a\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 1990
at Microsoft.FSharp.Core.CompilerServices.RuntimeHelpers.h@379.Invoke(Object _arg1, TArgs args) in D:\a\_work\1\s\src\fsharp\FSharp.Core\seqcore.fs:line 379
at Program.clo@46-3.Invoke(Object delegateArg0, UnobservedTaskExceptionEventArgs delegateArg1) in /Users/cmager/dev/ConsoleApp1/ConsoleApp2/Program.fs:line 46
at System.Threading.Tasks.TaskScheduler.PublishUnobservedTaskException(Object sender, UnobservedTaskExceptionEventArgs ueea)
at System.Threading.Tasks.TaskExceptionHolder.Finalize()
As you can see, the exception is eventually published by the Task finaliser, and re-throwing it in that handler brings down the app.
While interesting, I'm not sure any of this is practically useful information. The suggestion to terminate the app within MailboxProcessor.Error handler is probably the right one.
As far as I can see, when you throw an exception in the MailboxProcessor body, the MailboxProcessor doesn't hang forever; it just stops.
Your program also hangs because you use Async.Parallel and wait until every async has finished. But the ones whose work item threw an exception never finish or return a result, so your program as a whole hangs forever.
If you want to explicitly abort, then you need to call System.Environment.Exit, not just throw an exception.
One way to re-write your program is like this.
open System.Threading.Tasks
let performWork (work : int) = task {
  do! Task.Delay 1000
  if work = 7
  then failwith "Oh no"
  else printfn $"Work {work}"
}
let mb =
  let mbBody (inbox : MailboxProcessor<AsyncReplyChannel<_> * int>) = async {
    while true do
      let! (ch,work) = inbox.Receive()
      try
        do! performWork work |> Async.AwaitTask
        ch.Reply ()
      with error ->
        System.Environment.Exit 0
  }
  MailboxProcessor.Start mbBody
Async.RunSynchronously (async {
  let! completed =
    let jobs = [|
      for i = 1 to 10 do
        async {
          do! Async.Sleep 100
          do! mb.PostAndAsyncReply(fun ch -> ch, i)
          return i
        }
    |]
    Async.Parallel(jobs)
  printfn $"Completed {Seq.length completed} job(s)."
})
By the way, I changed the seq {} to an array and also removed the maxDegreeOfParallelism option; otherwise the results didn't seem very parallel in my tests. You can still keep those if you want.
Executing this program prints something like:
Work 10
Work 4
Work 9
Work 3
Work 8

F# async try with not catching exceptions

Strange things... I just wanted to do a simple retry on exceptions in F# but the catch doesn't catch :) Any ideas?
let rec retry times next event =
  async {
    try
      return! next event
    with
    | _ when times > 0 -> return! retry (times - 1) next event
    | error -> return error.Reraise()
  }
If next is a function like:
let handler evt = async { failwith "Oh-no" }
then the code in try executes, but the exception is not caught. What is going on? :O
UPDATE
Reraise is an extension method, as described by nikonthethird here: https://github.com/fsharp/fslang-suggestions/issues/660
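open System
open System.Runtime.ExceptionServices
type Exception with
  member this.Reraise () =
    // Rethrow while preserving the original stack trace
    (ExceptionDispatchInfo.Capture this).Throw ()
    Unchecked.defaultof<_>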
Your code does catch the exceptions. Here's the full program I'm running to test it:
let rec retry times next event =
  async {
    try
      printfn "Retry: %A" times
      return! next event
    with
    | _ when times > 0 -> return! retry (times - 1) next event
    | error -> raise error
  }
let handler evt =
  async {
    printfn "About to fail once"
    failwith "Oh-no"
  }
[<EntryPoint>]
let main argv =
  retry 3 handler ()
  |> Async.RunSynchronously
  |> printfn "%A"
  0
Output:
Retry: 3
About to fail once
Retry: 2
About to fail once
Retry: 1
About to fail once
Retry: 0
About to fail once
Unhandled exception. System.Exception: Oh-no
You can see that the exceptions are being caught, because handler is invoked multiple times before retry gives up.
Notes:
I replaced return error.Reraise() with raise error, since Exception.Reraise isn't a defined method. I'm not sure what you had in mind here, but it doesn't directly affect the answer to your question.
It's important to fully invoke retry with all three arguments (I used () as the "event"), and then run the resulting async computation synchronously. Maybe you weren't doing that?
You might want to look into using Async.Catch for handling async exceptions instead.

Killing a gen_server without failing the Common Test

I implemented a module which is purposefully supposed to crash (to test the functionality of another module, which is monitoring it). The problem is, when this gen_server crashes it also causes the common test for it to fail. I've tried using try/catch and setting process_flag(trap_exit, true) but nothing seems to work.
Here is some relevant code:
-module(mod_bad_process).
% ...
%% ct calls this function directly
kill() ->
    gen_server:call(?MODULE, {update_behavior, kill}).
% ...
%% gen_server:call/2 requests are handled by handle_call/3
handle_call({update_behavior, Behavior}, _From, State) ->
    case Behavior of
        kill -> {stop, killed, State};
        _ -> {reply, ok, State#{state := Behavior}}
    end;
% ...
And the common test:
% ...
-define(BAD_PROC, mod_bad_process).
% ...
remonitor_test(_Conf) ->
    InitialPid = whereis(?BAD_PROC),
    true = undefined =/= InitialPid,
    true = is_monitored_gen_server(?BAD_PROC),
    mod_bad_process:kill(), % gen_server crashes
    timer:sleep(?REMONITOR_DELAY_MS),
    FinalPid = whereis(?BAD_PROC),
    true = InitialPid =/= FinalPid,
    true = undefined =/= FinalPid,
    true = is_monitored_gen_server(?BAD_PROC).
% ...
And the resulting error from ct:
*** CT Error Notification 2021-07-16 16:08:20.791 ***
gen_server:call failed on line 238
Reason: {killed,{gen_server,call,...}}
=== Ended at 2021-07-16 16:08:20
=== Location: [{gen_server,call,238},
{mod_bad_process,kill,48},
{monitor_tests,remonitor_test,62},
{test_server,ts_tc,1784},
{test_server,run_test_case_eval1,1293},
{test_server,run_test_case_eval,1225}]
=== === Reason: {killed,{gen_server,call,
[mod_bad_process_global,
{update_behavior,kill}]}}
===
*** monitor_remonitor_test failed.
Skipping all other cases in sequence.
Any ideas on how to get this functionality without failing the common test?
The problem was that my try/catch attempts were not pattern matching to the actual error. Here is the fix:
-module(mod_bad_process).
% ...
kill() ->
    try gen_server:call(?MODULE, {update_behavior, kill}) of
        _ -> error(failed_to_kill)
    catch
        exit:{killed, _} -> ok
    end.
% ...
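A self-contained sketch of the same fix, assuming the server is started without a link (gen_server:start/4), so that only the exit raised by gen_server:call/2 reaches the caller. The module name bad_proc_demo and start/0 are hypothetical; the request shape and the catch clause follow the code above. When handle_call/3 returns {stop, killed, State} without a reply, the caller's gen_server:call/2 exits with {killed, {gen_server, call, [...]}}, which is exactly what the catch clause matches:

```erlang
-module(bad_proc_demo).
-behaviour(gen_server).

-export([start/0, kill/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Started WITHOUT a link, so the caller is not hit by the server's exit
%% signal; only the exit raised inside gen_server:call/2 must be handled.
start() ->
    gen_server:start({local, ?MODULE}, ?MODULE, [], []).

%% Ask the server to stop; the call then exits with {killed, {gen_server, call, _}}.
kill() ->
    try gen_server:call(?MODULE, {update_behavior, kill}) of
        _ -> error(failed_to_kill)
    catch
        exit:{killed, _} -> ok
    end.

init([]) ->
    {ok, #{state => normal}}.

handle_call({update_behavior, kill}, _From, State) ->
    {stop, killed, State};          %% stop without replying to the caller
handle_call({update_behavior, Behavior}, _From, State) ->
    {reply, ok, State#{state := Behavior}}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

If the server were instead linked to the test process (for example started with start_link from the test case), the test process would also receive the non-normal exit signal killed and would need to trap exits or start the server from a different process.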

Erlang error: no function clause matching io:request

I'm an experienced programmer new to Erlang and I'm stuck on the following:
myread() ->
    {_, MyData } = file:read_file( "hands.txt" ),
    io:format( "hands-out.txt", "~w", MyData ).
yields, when myread() is invoked from the shell:
** exception error: no function clause matching io:request("hands-out.txt",
{format,"~w", <<"3h 5h 7h 8h 3h 5h 7h 8h q"...>>})
(io.erl, line 556) in function io:o_request/3 (io.erl, line 63)
Any help would be appreciated.
Two things:
"hands-out.txt", "~w" needs to be one string: "hands-out.txt: ~w"
and the data that's replacing the ~w needs to be a list. So:
io:format( "hands-out.txt: ~w", [MyData] ).
See http://erlang.org/doc/man/io.html#format-2
Also, you should pattern match on the status value in the return from file:read_file/1. In your version, an error, which would be returned as {error, Reason}, would also match here, since you're using _, and you'd print the error reason rather than the file contents, which might be confusing.
So either make it {ok, MyData } = file:read_file( "hands.txt" ) if you want to crash on read error, or something like the following if you want to handle that case:
myread() ->
    case file:read_file( "hands.txt" ) of
        {ok, MyData } ->
            io:format( "hands-out.txt: ~w", [MyData] );
        {error, Error} ->
            io:format("Error: ~w~n", [Error])
    end.
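If the goal was actually to write into hands-out.txt rather than print that name as a label (an assumption; the question does not say), note that the three-argument form is io:format(IoDevice, Format, Args). The original call therefore passed the string "hands-out.txt" as the IoDevice, which io:request does not accept, hence the function_clause error. A sketch that opens the output file and passes its device, using a hypothetical helper hands_copy:copy/2:

```erlang
-module(hands_copy).
-export([copy/2]).

%% Hypothetical helper: read InFile and write its contents, formatted with ~w,
%% into OutFile. Returns ok, or {error, Reason} if InFile cannot be read.
copy(InFile, OutFile) ->
    case file:read_file(InFile) of
        {ok, Data} ->
            {ok, Out} = file:open(OutFile, [write]),
            try
                %% io:format/3: the first argument is an IoDevice,
                %% and the format arguments are always a list.
                io:format(Out, "~w~n", [Data])
            after
                ok = file:close(Out)
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

Be aware that ~w prints a binary as its byte values (for example <<51,104>> for <<"3h">>); ~s would write the raw text instead.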

Erlang: supervisor:start_child/2 error has me baffled

I'm making slight modifications to Logan/Merritt/Carlson's simple cache, Chapter 6, pp 149-169, Erlang and OTP in Action. So far, no code changes, just renaming the modules.
I start the application:
application:start(gridz).
ok
I insert an item:
gridz_maker:insert(blip, blop).
I get this error:
** exception error: no match of right hand side value
{error,
{function_clause,
[{gridz_edit,init,
[{blop,86400}],
[{file,"src/gridz_edit.erl"},{line,51}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
in function gridz_maker:insert/2 (src/gridz_maker.erl, line 15)
Here's the code:
insert(Key, Value) ->
    case gridz_store:lookup(Key) of
        {ok, Pid} -> gridz_edit:replace(Pid, Value);
        {error, _} -> {ok, Pid} = gridz_edit:create(Value), %% line 15
                      gridz_store:insert(Key, Pid)
    end.
I look at line 15:
{error, _} -> {ok, Pid} = gridz_edit:create(Value),
I expect the error because this is a new item. gridz_edit is a gen_server (sc_element in Logan et al.). Here's the code for create/1:
create(Value) ->
    create(Value, ?DEFAULT_LEASE_TIME).
create(Value, LeaseTime) ->
    gridz_sup:start_child(Value, LeaseTime).
And here's the code for gridz_sup:start_child/2:
start_child(Value, LeaseTime) ->
    supervisor:start_child(?SERVER, [Value, LeaseTime]).
init([]) ->
    Grid = {gridz_edit, {gridz_edit, start_link, []},
            temporary, brutal_kill, worker, [gridz_edit]},
    Children = [Grid],
    RestartStrategy = {simple_one_for_one, 0, 1},
    {ok, {RestartStrategy, Children}}.
If I execute supervisor:start_child/2 directly, here's what I get:
{error,{function_clause,[{gridz_edit,init,
[{blop,50400}],
[{file,"src/gridz_edit.erl"},{line,51}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
Line 51 in gridz_edit is an init function:
init([Value, LeaseTime]) ->
    Now = calendar:local_time(),
    StartTime = calendar:datetime_to_gregorian_seconds(Now),
    {ok,
     #state{value = Value,
            lease_time = LeaseTime,
            start_time = StartTime},
     time_left(StartTime, LeaseTime)}.
If I execute it directly, it works:
120> gridz_edit:init([blop, (60 * 60 * 24)]).
{ok,{state,blop,86400,63537666408},86400000}
So now I'm baffled. What am I missing? Why does supervisor:start_child/2 throw an error?
Thanks,
LRP
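The error says init/1 is being called with a tuple of 2 members, {blop,86400}, while your clause expects a list of 2 members: [Value, LeaseTime]. In your direct execution you also pass a list, so it works. You should figure out where the tuple is being created and create a list instead. With a simple_one_for_one supervisor, supervisor:start_child(?SERVER, [Value, LeaseTime]) calls gridz_edit:start_link(Value, LeaseTime), and whatever start_link passes as Args to gen_server:start_link (the second argument of start_link/3, the third of start_link/4) is what init/1 receives, so gridz_edit:start_link is the most likely place to look.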
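A sketch of a worker whose start_link/2 and init/1 agree, modeled loosely on the gridz_edit code above. The module name lease_item and the get_value request are hypothetical, and the lease timeout is omitted to keep it short. Under a simple_one_for_one child spec of {lease_item, start_link, []}, supervisor:start_child(Sup, [Value, LeaseTime]) would call lease_item:start_link(Value, LeaseTime):

```erlang
-module(lease_item).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2]).

-record(state, {value, lease_time, start_time}).

%% The Args term given to gen_server:start_link/3 is exactly what init/1
%% receives. Passing {Value, LeaseTime} here instead of a list would produce
%% the function_clause error from the question.
start_link(Value, LeaseTime) ->
    gen_server:start_link(?MODULE, [Value, LeaseTime], []).

init([Value, LeaseTime]) ->
    Now = calendar:local_time(),
    StartTime = calendar:datetime_to_gregorian_seconds(Now),
    {ok, #state{value = Value, lease_time = LeaseTime, start_time = StartTime}}.

handle_call(get_value, _From, State = #state{value = Value}) ->
    {reply, {ok, Value}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Keeping the list shape in one place (start_link/2) and matching the same shape in init/1 means direct calls like init([blop, 86400]) and supervisor-driven starts behave the same way.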
