I access the BREAK menu of iex 1.8.2 by pressing Ctrl+C. It looks like this:
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
At first I assumed kill would be similar to abort (i.e., just end the session), but no. Instead, pressing k dumps information about a process and offers more options:
iex(1)>
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
k
Process Information
--------------------------------------------------
=proc:<0.105.0>
State: Waiting
Spawned as: erlang:apply/2
Spawned by: <0.75.0>
Message queue length: 0
Number of heap fragments: 1
Heap fragment data: 5
Link list: [{to,<0.64.0>,#Ref<0.720592203.270008322.27074>}]
Reductions: 4202
Stack+heap: 233
OldHeap: 0
Heap unused: 177
OldHeap unused: 0
BinVHeap: 1
OldBinVHeap: 0
BinVHeap unused: 46421
OldBinVHeap unused: 46422
Memory: 2804
Stack dump:
Program counter: 0x000000001f8230e0 (io:execute_request/2 + 200)
CP: 0x0000000000000000 (invalid)
arity = 0
0x000000001ddcee08 Return addr 0x000000001f8a4ba0 ('Elixir.IEx.Server':io_get/3 + 96)
y(0) #Ref<0.720592203.270008322.27074>
y(1) {false,{get_line,unicode,<<"iex(1)> ">>}}
y(2) <0.64.0>
0x000000001ddcee28 Return addr 0x000000001d53ecf8 (<terminate process normally>)
y(0) <0.105.0>
y(1) <0.75.0>
Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL
(k)ill (n)ext (r)eturn:
If I press k again, I get another process dump. Pressing n also gives me a process dump, and I think it's the same as pressing k. The final option, r, does different things depending on what I've done previously. If I've only pressed k or n a few times, it just seems to be ignored and I have to press Enter twice; iex interprets the second Enter as it normally would and returns nil.
(k)ill (n)ext (r)eturn:
r
nil
If I've pressed k and n a bunch of times, it does one of the following:
(k)ill (n)ext (r)eturn:
r
** (EXIT from #PID<0.104.0>) shell process exited with reason: killed
Interactive Elixir (1.8.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>
09:39:57.929 [info] Application iex exited: killed
or this:
(k)ill (n)ext (r)eturn:
r
09:46:20.268 [info] Application iex exited: killed
09:46:20.269 [info] Application elixir exited: killed
09:46:20.274 [error] GenServer IEx.Pry terminating
** (stop) killed
Last message: {:EXIT, #PID<0.88.0>, :killed}
State: 1
or this:
(k)ill (n)ext (r)eturn:
r
Logger - error: {removed_failing_handler,'Elixir.Logger'}
Logger - error: {removed_failing_handler,'Elixir.Logger'}
Logger - error: {removed_failing_handler,'Elixir.Logger'}
I am unsure how it decides which of those messages should be displayed.
I'm really curious what (k)ill and its sub-options do and look forward to learning about them. Any direction is appreciated, thanks!
Looking at the source code:
case 'k':
    process_killer();
and
switch(j) {
case 'k':
    ASSERT(erts_init_process_id != ERTS_INVALID_PID);
    /* Send a 'kill' exit signal from init process */
    erts_proc_sig_send_exit(NULL, erts_init_process_id,
                            rp->common.id, am_kill, NIL,
                            0);
case 'n': br = 1; break;
case 'r': return;
default: return;
}
k seems to be for enumerating and killing individual processes by sending them a kill signal. The output differs because it depends on how each process handles that signal.
The kill command goes through all running processes, and for each of them displays a bunch of information and asks you whether to:
kill it and go to the next process (k)
go to the next process without killing this one (n), or
stop killing processes and go back to the shell (r).
It might be tricky to identify the process you want to kill. One thing you can look at is the Dictionary line, which for most long-running processes has an $initial_call entry telling you which module contains the code that this process is running. For example:
Dictionary: [{'$ancestors',[<0.70.0>]},{iex_evaluator,ack},{'$initial_call',{'Elixir.IEx.Evaluator',init,4}}]
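You can also read the same information from a live Erlang shell with process_info/2 instead of going through the break handler. A minimal sketch, where SomePid is just a placeholder for whichever pid you are interested in:

% Sketch: pull '$initial_call' out of a process's dictionary.
{dictionary, Dict} = erlang:process_info(SomePid, dictionary),
proplists:get_value('$initial_call', Dict).
% e.g. {'Elixir.IEx.Evaluator', init, 4}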
The different messages are displayed depending on which process(es) you killed. For example, it seems like Elixir.IEx.Evaluator is the process running the Elixir shell, which gives you the shell process exited with reason: killed error message.
A way of looking at this is that it shows the fault tolerance of an Elixir application: even if a process somewhere within the system has an error (in this case caused by explicitly killing the process), the supervisors try to restart the process in question and keep the entire system running.
Actually, I've never used this way of killing processes in a running system. If you know the process id ("pid") of the process you want to kill, you can type something like this into the shell:
Process.exit(pid("0.10.0"), :kill)
without having to step through the list of processes.
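If you are in an Erlang shell rather than iex, the equivalent uses the shell helper pid/3, which builds a pid from its three components, and exit/2, which sends an exit signal to another process:

% From an Erlang shell; same effect as Process.exit(pid("0.10.0"), :kill).
exit(pid(0, 10, 0), kill).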
Related
Still working through Joe's book, and having a hard time fully understanding monitors in general and spawn_monitor in particular. Here's the code I have; the exercise asks for a function that starts a process whose job is to print a heartbeat every 5 seconds, and then a function to monitor the above process and restart it. I didn't get to the restart part, because my monitor fails to even detect the process keeling over.
% simple "working" loop
loop_5_print() ->
receive
after 5000 ->
io:format("I'm still alive~n"),
loop_5_print()
end.
% function to spawn and register a named worker
create_reg_keep_alive(Name) when not is_atom(Name) ->
{error, badargs};
create_reg_keep_alive(Name) ->
Pid = spawn(ex, loop_5_print, []),
register(Name, Pid),
{Pid, Name}.
% a simple monitor loop
monitor_loop(AName) ->
Pid = whereis(AName),
io:format("monitoring PID ~p~n", [Pid]),
receive
{'DOWN', _Ref, process, Pid, Why} ->
io:format("~p died because ~p~n",[AName, Why]),
% add the restart logic
monitor_loop(AName)
end.
% function to bootstrapma monitor
my_monitor(AName) ->
case whereis(AName) of
undefined -> {error, no_such_registration};
_Pid -> spawn_monitor(ex, monitor_loop, [AName])
end.
And here's me playing with it:
39> c("ex.erl").
{ok,ex}
40> ex:create_reg_keep_alive(myjob).
{<0.147.0>,myjob}
I'm still alive
I'm still alive
41> ex:my_monitor(myjob).
monitoring PID <0.147.0>
{<0.149.0>,#Ref<0.230612052.2032402433.56637>}
I'm still alive
I'm still alive
42> exit(whereis(myjob), stop).
true
43>
It sure stopped the loop_5_print "worker" - but where's the line that the monitor was supposed to print? The only explanation I see is that the message emitted by a process quitting in this manner isn't of the pattern I am matching on inside monitor_loop's receive. But that's the only pattern introduced in the book in this chapter, so I'm not buying this explanation.
spawn_monitor is not what you want here. spawn_monitor spawns a process and immediately starts monitoring it. When the spawned process dies, the process that called spawn_monitor gets a message that the process is dead. You need to call erlang:monitor/2 from the process that you want to receive the DOWN messages in, with the second argument being the Pid to monitor.
Just add:
monitor(process, Pid),
after:
Pid = whereis(AName),
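For reference, the patched monitor_loop then looks like this (the question's code with just that one line added):

monitor_loop(AName) ->
    Pid = whereis(AName),
    monitor(process, Pid),
    io:format("monitoring PID ~p~n", [Pid]),
    receive
        {'DOWN', _Ref, process, Pid, Why} ->
            io:format("~p died because ~p~n", [AName, Why]),
            % add the restart logic
            monitor_loop(AName)
    end.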
and it works:
1> c(ex).
{ok,ex}
2> ex:create_reg_keep_alive(myjob).
{<0.67.0>,myjob}
I'm still alive
I'm still alive
I'm still alive
3> ex:my_monitor(myjob).
monitoring PID <0.67.0>
{<0.69.0>,#Ref<0.2696002348.2586050567.188678>}
I'm still alive
I'm still alive
I'm still alive
4> exit(whereis(myjob), stop).
myjob died because stop
true
monitoring PID undefined
Module test:
tester() ->
    receive
        X ->
            erlang:display("message.."),
            tester()
    end.

initialize() ->
    spawn_link(?MODULE, tester, []),
    erlang:display("Started successfully.").
REPL:
length(erlang:processes()). -> 23
Pid = spawn_link(test, initialize, []).
length(erlang:processes()). -> 24
exit(Pid).
length(erlang:processes()). -> 24
It seems that the spawned tester process is still running! How do I make sure that when I exit my application, all spawn_linked processes are killed too?
Well, you are actually starting two Erlang processes, not one. The first one, the one you send the exit signal to, has already died by the time you send it, so the exit has no effect.
You start the first process in the shell with this line:
Pid = spawn_link(test, initialize, []).
This process starts executing the initialize function, in which it starts the second process, and then it dies because there is nothing else to do. This is the process to which you are trying to send the exit signal.
To fix this, simply return the correct Pid from the initialize function:
initialize() ->
    Pid = spawn_link(?MODULE, tester, []),
    erlang:display("Started successfully."),
    Pid.
And start it directly:
Pid2 = test:initialize().
Then you will be able to kill it with exit(Pid2).
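One thing worth noting (my own reading of the code, not part of the original answer): because initialize/0 uses spawn_link, the caller is linked to the tester process and neither side traps exits, so a non-normal exit on either side takes the other one down as well. That link is also why exit(Pid2) has any effect at all: exit/1 exits the calling shell process, and the link propagates that exit to tester. If you want to stop the worker without disturbing the caller, a hypothetical stop/1 could drop the link first:

% Hypothetical addition, not part of the original module: stop the worker
% without taking the caller down through the link. It must be called from
% the process that created the link (here, the shell process).
stop(Pid) ->
    unlink(Pid),         % remove the link so the exit does not propagate back
    exit(Pid, shutdown), % exit/2 sends an exit signal to another process
    ok.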
I want to run a program that loads data into a Riak database, but the process stops suddenly when I give a range of about 10. If I give 5-8 it loads the data, but when I give a range of more than 10 it hangs. What is happening?
test(X, Y) ->
    if
        X < Y ->
            {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 10017),
            Val = X,
            io:format("~p", [Pid]),
            Object = riakc_obj:new(<<"test_age">>, undefined, Val),
            riakc_pb_socket:put(Pid, Object),
            test(X+1, Y);
        true ->
            []
    end.
The function takes a range of numbers and inserts each number in the range into the Riak database.
When I run this program from the Erlang shell, it stops after a range of about 10:
erlriak:test(1,5)
<0.50.0><0.51.0><0.52.0><0.53.0><0.54.0>[]
erlriak:test(5,15)
<0.62.0><0.63.0><0.64.0><0.65.0><0.66.0><0.67.0><0.68.0><0.69.0><0.70.0>|
It hangs after <0.70.0> for minutes and nothing is returned. I am getting the following:
=ERROR REPORT==== 19-May-2014::17:29:54 ===
** Generic server <0.104.0> terminating
** Last message in was {req_timeout,#Ref<0.0.0.423>}
** When Server state == {state,"127.0.0.1",10017,false,false,undefined,false,
gen_tcp,undefined,
{[],[]},
1,[],infinity,undefined,undefined,undefined,
undefined,[],100}
** Reason for termination ==
** disconnected
** exception exit: disconnected
I suspect you are exceeding the limit for simultaneous connections to Riak's PB port. Possible solutions:
call riakc_pb_socket:stop(Pid) to close the connection before recursing
move the call to start_link outside the test function so all requests share the same socket (sketched below)
increase pb_backlog in your app.config so you can have more simultaneous connections
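A sketch of the second option, reusing one connection for the whole range (untested; it assumes the same riakc_pb_socket and riakc_obj calls shown in the question):

test(X, Y) ->
    % open one PB connection and reuse it for every put
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 10017),
    insert_range(Pid, X, Y),
    riakc_pb_socket:stop(Pid).

insert_range(Pid, X, Y) when X < Y ->
    Object = riakc_obj:new(<<"test_age">>, undefined, X),
    riakc_pb_socket:put(Pid, Object),
    insert_range(Pid, X + 1, Y);
insert_range(_Pid, _X, _Y) ->
    ok.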
1> dbg:get_tracer().
{error,{no_tracer_on_node,nonode@nohost}}
2> dbg:tracer().
{ok,<0.33.0>}
3> dbg:get_tracer().
{ok,<0.35.0>}
The documentation says: get_tracer returns the process or port to which all trace messages are sent.
But it doesn't clearly say what pid is returned by dbg:tracer.
As you can see in pman, there are indeed two processes:
<0.33.0> sits in dbg:loop/2
<0.35.0> sits in dbg:tracer_loop/2
You can see what they are doing here: https://github.com/erlang/otp/blob/maint/lib/runtime_tools/src/dbg.erl
I haven't dug too deep into this, but at a glance it looks like the former does the more manager-like job and the latter actually processes the traces.
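A quick way to check the second part of that yourself from the shell (a sketch; it assumes the tracer is a process, as it is here, rather than a port):

{ok, Tracer} = dbg:get_tracer(),
% current_function shows where the tracer process is currently waiting,
% which should be dbg:tracer_loop/2
erlang:process_info(Tracer, current_function).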
I have an application distributed over two nodes. When I halt() the first node, the failover works perfectly, but (sometimes?) when I restart the first node, the takeover fails and the application crashes since start_link returns already_started.
SUPERVISOR REPORT <0.60.0> 2009-05-20 12:12:01
===============================================================================
Reporting supervisor {local,twitter_server_supervisor}
Child process
errorContext start_error
reason {already_started,<2415.62.0>}
pid undefined
name tag1
start_function {twitter_server,start_link,[]}
restart_type permanent
shutdown 10000
child_type worker
ok
My app:
start(_Type, Args) ->
    twitter_server_supervisor:start_link(Args).

stop(_State) ->
    ok.
My supervisor:
start_link(Args) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, Args).
Both nodes are using the same sys.config file.
What am I not understanding about this process such that the above does not work?
It seems like your problem stems from twitter_server_supervisor trying to start one of its children, since the error report complains about the child with start_function
{twitter_server,start_link,[]}
And since you are not showing that code, I can only guess that it is trying to register a name for itself, but there is already a process registered with that name.
Guessing even more, the reason shows a Pid: the Pid that already holds the name we tried to grab for ourselves:
{already_started,<2415.62.0>}
The Pid there has a non-zero first number; if it were zero, it would mean it is a local process. From this I deduce that you are trying to register a global name, and you are connected to another node where there is already a process globally registered under that name.
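If that guess is right, twitter_server:start_link/0 presumably looks something like this (a sketch, since that code is not shown in the question):

% Hypothetical shape of the child's start_link/0. Registering the
% gen_server with {global, Name} is what yields {already_started, Pid}
% when a process under that name already exists anywhere in the cluster.
start_link() ->
    gen_server:start_link({global, ?MODULE}, ?MODULE, [], []).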