A bug in an Erlang node after an error at another node

I created two Erlang nodes on the same Windows machine, in two cmd windows: 'unclient@MYPC' and 'unserveur@MYPC'. The server code is very simple:
-module(serveur).
-export([start/0, recever/0, inverse/1]).
%%%%
start() ->
    process_flag(trap_exit, true),
    Pid = spawn_link(serveur, recever, []),
    register(ownServer, Pid).
%%%%
recever() ->
    receive
        {From, X} -> From ! {ownServer, 1/X}
    end.
%%%%
inverse(X) ->
    ownServer ! {self(), X},
    receive
        {'EXIT', _, _} ->
            start(),
            sorry;
        {ownServer, Reply} ->
            Reply
    end.
So at the server node's cmd window I compile and start this module:
c(serveur).
serveur:start().
Then I tested the server; at the server node's cmd window I ran:
apply(serveur, inverse, [2]).
and I received 0.5. I also tried to cause an error by passing an atom in place of a number:
apply(serveur, inverse, [a]).
The shell shows the error and then prints 'sorry', and the server goes back to working correctly by restarting its child automatically, because it is a system process and traps the exit of its child.
Now at the client node I used rpc:call to test the connection, and everything works fine. For example:
rpc:call('unserveur@MYPC', serveur, inverse, [2]).
returns 0.5 at the client node's cmd window.
Then I sent an atom to the server to cause an error:
rpc:call('unserveur@MYPC', serveur, inverse, [a]).
At the client node's cmd window I waited for the response from the server, which should be 'sorry', but I never received anything and the client prompt is gone:
unclient@MYPC 1>
I can type, but the shell no longer executes my instructions and no prompt appears.
At the server node I see an error and then the server prompt comes back normally:
unserveur@MYPC 5>
I then tried this at the server node's prompt:
apply(serveur, inverse, [2]).
and I got an error, so I restarted the server manually by calling the start() function at the server node's cmd window, and after that the server works normally again.
I called self() at the server node's cmd window before and after the client's call and the pid is the same, which makes sense because the main server process is a system process; so my conclusion is that it never executed the code after receive {'EXIT', ...}.
Why does this happen? I couldn't understand this bug, so can anyone please explain to me what is going on?

This happens because the EXIT message is sent to whatever process is linked to the server process, which is not necessarily the process that sent the request.
It works when you run it in the shell, because the shell process is linked to the server process and is also the process that calls the inverse function. When you make an RPC call, however, inverse is called from a different process; that process never receives the EXIT message and so waits in the receive forever, which is why the rpc:call on the client never returns.
Try making the RPC call from the client, then running flush(). in the shell of the server node, and you should see the EXIT message.
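One way to make the error reply reach whichever process calls inverse/1 is to let the receiving process catch the arithmetic error itself and answer the original sender, instead of relying on the link between the shell and the server. A minimal sketch, reusing the question's module and names (one possible fix, not the only design):
recever() ->
    receive
        {From, X} ->
            case catch 1/X of
                {'EXIT', _Reason} -> From ! {ownServer, sorry};
                Result            -> From ! {ownServer, Result}
            end,
            recever()
    end.
With this version inverse/1 only needs the {ownServer, Reply} clause, and an RPC caller gets 'sorry' back instead of blocking forever.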

Related

Execute 'docker run' from within SBCL Common Lisp

I'm trying to run a function in my Lisp program. It is a bot that is connected to an IRC channel, and with a special command you can ask the bot to evaluate a simple Lisp command. Because it is extremely dangerous to execute arbitrary code from people on the internet, I want the actual evaluation to happen in a VM that runs a Docker container for every evaluation query the bot gets.
My function looks like this:
(defun run-command (cmd)
  (uiop:run-program
   (list "docker" "run" "--rm" "-it" "my/docker" "sbcl"
         "--noinform" "--no-sysinit" "--no-userinit" "--noprint" "--disable-debugger"
         "--eval" (string-trim '(#\^M) (format nil "~A" cmd))
         "--eval" "(quit)")
   :output '(:string :stripped t)))
The idea behind this function is to start a Docker container that contains SBCL, run the command via sbcl --eval, and print the result to the container's stdout. This printed string should then be the result of run-command.
If I call
docker run --rm -it my/docker sbcl --noinform --no-sysinit --no-userinit --noprint --disable-debugger --eval "(print (+ 2 3))" --eval "(quit)"
on my command line, I get 5 as the result, which is exactly what I want.
But if I run the same command within Lisp via the uiop:run-program function, I get
Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1004FC3923}>
with command ("docker" "run" "--rm" "-it"
"my/docker" "sbcl" "--noinform"
"--no-sysinit" "--no-userinit" "--noprint"
"--disable-debugger" "--eval" "(+ 2 3)")
as an error message, which means the process failed somehow. But I don't know what exactly could be wrong here. If I just execute, for example, "ls", I get the output, so the function seems to work properly.
Is there some special knowledge I need about uiop:run-program or am I doing something completely wrong?
Thanks in advance.
Edit: It turns out that the -it flag caused issues. After removing the flag, a new error emerges: the bot does not have permission to execute docker. Is there a way to give it that permission without granting it sudo rights?
There is probably something wrong with the way docker (or SBCL) is invoked here. To get the error message, invoke uiop:run-program with the :error-output :string argument, then choose the continue restart to actually terminate execution and get the error output printed (if you're running from SLIME or some other REPL). If you call this in a non-interactive environment, you can wrap the call in a handler-bind:
(handler-bind ((uiop/run-program:subprocess-error
                 (lambda (e)
                   (declare (ignore e))
                   (invoke-restart (find-restart 'continue)))))
  (run-command ...))
It turned out that -it did cause trouble. After removing it and granting the bot the correct permissions, everything worked out fine.

Erlang VM -s Argument is causing my program to fail

I have read the thread here: Erlang VM -s argument misbehaving and have been troubleshooting to no avail.
When I run the Erlang VM without the -s flag, my function works:
bridge_sup:start_link().
Bridge Supervisor Initializing
[warning] ClientId is NULL!
[warning] ClientId is NULL!
Success
Success
However, if I have the -s flag set, when my function goes on to call another function, emqttc:start_link(...), it never returns:
Bridge Supervisor Initializing
[warning] ClientId is NULL!
[warning] ClientId is NULL!
I can verify that it is not just a print problem because the program I am connecting to receives no signal.
What could possibly be causing this in the Erlang VM? I have also tried using -eval to the same effect. Here is the ./run code:
erl -pa ebin -pa deps/*/ebin
Thank you in advance!
It could be a startup order problem. Specifying a command to run using -s (or -run or -eval) means that it runs very early, while parts of the system may still be starting up in the background. Try adding a sleep at the start of your function and see if that changes anything. If it does, try to figure out what depends on the startup order.
I am using Erlang version 19.2. I am not sure whether this is a bug in this version or simply a requirement for starting a program, but I added a .app.src file, added "-eval 'application:start(myprog)'", and the program now starts!
Note that it did not start with -s, -eval, or any of that without the .app.src file and without application:start.
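Along the same lines, a minimal sketch of a dedicated entry point for -s that waits for the required applications before doing any real work (the module name bridge_boot is made up for illustration, and treating emqttc as an OTP application is an assumption about the project):
-module(bridge_boot).
-export([start/0]).

%% started with: erl -pa ebin -pa deps/*/ebin -s bridge_boot
start() ->
    %% ensure_all_started/1 starts the application together with all of its
    %% dependencies (as listed in its .app file) and only returns once they
    %% are running, so there is no race against VM startup.
    {ok, _Started} = application:ensure_all_started(emqttc),
    bridge_sup:start_link().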

Docker - Handling multiple services in a single container

I would like to start two different services in my Docker container and exit the container as soon as one of them exits. I looked at supervisor, but I can't find out how to make it quit as soon as one of the managed applications exits. It tries to restart them up to three times, as is the default setting, and then just sits there doing nothing. Can supervisor do this, or is there another tool for it? A bonus would be if there were also a way to let both managed programs write to stdout, tagged with their application name, e.g.:
[Program 1] Some output
[Program 2] Some other output
[Program 1] Output again
Since you asked if there was another tool: we designed and wrote a replacement for supervisord built specifically for Docker. It automatically terminates when all applications quit, has special service settings to control this behavior, and redirects stdout with tagged, syslog-compatible output lines. It's open source and being used in production.
Here is a quick start for Docker: http://garywiz.github.io/chaperone/guide/chap-docker-simple.html
There is also a complete set of tested base images, which are a good example, at https://github.com/garywiz/chaperone-docker, but these might be overkill and the earlier quick start may do the trick.
I found solutions to both of my requirements by reading through the docs some more.
Exit supervisord on application exit
This can be achieved by using a custom eventlistener. I had to add the following segment into my supervisord configuration file:
[eventlistener:shutdownevent]
command=/shutdownhandler.sh
events=PROCESS_STATE_EXITED
supervisord starts the referenced script and, when the given event is triggered (PROCESS_STATE_EXITED fires after one of the managed programs exits without being restarted automatically), sends a line containing data about the event on the script's stdin.
The referenced shutdownhandler-script contains:
#!/bin/bash
while :
do
echo -en "READY\n"
read line
kill $(cat /supervisord.pid)
echo -en "RESULT 2\nOK"
done
The script has to indicate that it is ready by sending "READY\n" on its stdout, after which it may receive an event data line on its stdin. In my use case, upon receipt of a line (meaning one of the managed programs has exited), a SIGTERM is sent to the supervisord process, which is found via the pid it leaves in its pid file (located in the root directory by default). For technical completeness, I also included a positive answer for the event listener, though that should never matter.
Tagged output on stdout
I did this by simply starting a tail process in the background before starting supervisord, tailing the program's output log and piping the lines through ts (from the moreutils package) to prepend a tag. This way the output shows up via docker logs with an easy way to see which program actually wrote each line.
tail -fn0 /var/log/supervisor/program1.log | ts '[Program 1]' &

`ejabberdctl start` results in "kernel pid terminated" error -- what do I do?

I have googled for three hours but to no avail.
I have an ejabberd installation that was not installed using apt. It is installed from source and there is no program called ejabberd in it; start, stop and everything else go through ejabberdctl.
It was running perfectly for a month and all of a sudden one day it stopped with the infamous
kernel pid terminated error
Any time I do
sudo ejabberdctl start --node ejabberd@MasterService
an erl_crash file gets generated, and when I try
ejabberdctl
I get
Failed to connect to RPC at node ejabberd@MasterService
Now, what have I tried?
Killed all running ejabberd, beam and epmd processes and started fresh - DID NOT WORK
Checked /etc/hosts and the hostname and all is well. The hostname is listed in the hosts file with its IP.
Checked the ejabberdctl.conf file to ensure the host name is indeed right and the node name is right
Checked that the .erlang.cookie file is being created with content in it
One way or another, everything I found on the web led me to one of the above.
I have nowhere else to go and don't know where else to look. Any help would be much appreciated.
You'll have to analyze the crash dump to try to guess why it failed.
To carry out this task, Erlang has a special webtool (called, uh, webtool) from which a special application, the Crash Dump Viewer, can be used to load a dump and inspect the stack traces of the Erlang processes at the time of the crash.
You have to
Install the necessary packages:
# apt-get install erlang-webtool erlang-observer
Start an Erlang interpreter:
$ erl
(Further actions are taken there.)
Run the webtool. In a simplest case, it will listen on the local host:
webtool:start().
(Notice the period.) It will print back a URL to open in your browser to reach the running tool.
If this happens on a server, and you would rather have the webtool listen on some non-local-host interface, the call incantation is a bit trickier:
webtool:start(standard_path, [{port, 8888}, {bind_address, {0, 0, 0, 0}}, {server_name, "server.example.com"}]).
The {0, 0, 0, 0} IP spec makes it listen everywhere, and you might as well specify some more sensible octets, like {192, 168, 0, 1}. The server_name option can be an arbitrary name; it is what gets printed in the generated URL as the server's hostname.
Now connect to the tool with your browser, navigate to the "Start tools" menu entry, start crash dump viewer and make a link to it appear in the tool's top menu. Proceed there and find a link to load the crash dump.
After loading a crash dump, poke around the tool's interface and look at the stack traces of the active Erlang processes. At least one of them should contain something fishy, including an error message; that's what you're looking for to refine your question (or to ask another on the ejabberd mailing list).
To quit the tool, run
webtool:stop().
in the running Erlang interpreter. Then quit it, either by running
q().
and waiting a bit, or by pressing Ctrl-G and then entering the letter q followed by the Return key.
The relevant links are: the crash dump viewer manual and the webtool manual.
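A side note, in case your Erlang/OTP release is recent enough that webtool is no longer shipped (it was deprecated and later removed): the Crash Dump Viewer also comes with the observer application and can be started directly from erl; the file path below is just a placeholder for your actual dump:
crashdump_viewer:start("/path/to/erl_crash.dump").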

Cannot spawn an erlang supervisor from the shell

I've implemented a gen_server and supervisor: test_server and test_sup. I want to test them from the shell/CLI. I've written their start_link functions such that their names are registered locally.
I've found that I can spawn the test_server from the command line just fine, but a spawned test_sup does not allow me to interact with the server at all.
For example, I can spawn a test_server by executing:
1> spawn(test_server, start_link, []).
<0.39.0>
2> registered().
[...,test_server,...]
I can interact with the server, and everything appears fine.
However, if I try to do the same thing with test_sup, no new names/Pids are registered in my "CLI process" (using registered/0). My test_server appears to have been spawned, but I cannot interact with it (see Lukas Larsson's comment about SASL to see why this is true).
I'd assume I coded an error in my supervisor, but this method of starting my supervisor works perfectly fine:
1> {ok, Pid}= test_sup:start_link([]).
{ok, <0.39.0>}
2> unlink(Pid).
true
3> registered().
[...,test_server,test_sup,...]
Why is it that I can spawn a gen_server but not a supervisor?
Update
The code I'm using can be found in this post. I'm using echo_server and echo_sup, two very simple modules.
Given that code, this works:
spawn(echo_server, start_link, []).
and this does not:
spawn(echo_sup, start_link, []).
Whenever trying to figure these things out it is usually very helpful to switch on SASL.
application:start(sasl).
That way you will hopefully get to know why your supervisor is terminating.
This explanation was given by Bernard Duggan on the Erlang questions mailing list:
Linked processes don't automatically die when a process they are linked to exits with code 'normal'. That's why [echo_server] doesn't exit when the spawning process exits. So why does the supervisor die? The internals of the supervisor module are in fact themselves implemented as a gen_server, but with process_flag(trap_exit, true) set. The result of this is that when the parent process dies, terminate() gets called (which doesn't happen when trap_exit is disabled) and the supervisor shuts down. It makes sense in the context of a supervisor, since a supervisor is spawned by its parent in a supervision tree - if it didn't die whenever its parent shut down, whatever the reason, you'd have dangling "branches" of the tree.
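To see the mechanism Bernard describes in isolation, here is a minimal sketch (a throwaway module written purely for illustration, not part of the question's code): a trapping process that is linked to a short-lived parent receives the parent's 'normal' exit as an ordinary message, which is exactly what a supervisor started via spawn/3 reacts to by shutting down.
-module(exit_demo).
-export([run/0]).

%% Spawns a short-lived "parent" that links to a child which traps exits,
%% then lets that parent exit normally. The child receives
%% {'EXIT', ParentPid, normal} as an ordinary message and forwards it to
%% the shell. A supervisor started with spawn(Mod, start_link, []) gets
%% the same message and chooses to terminate, which is why nothing stays
%% registered afterwards.
run() ->
    Shell = self(),
    spawn(fun() ->
              Parent = self(),
              spawn_link(fun() ->
                             process_flag(trap_exit, true),
                             Parent ! ready,
                             receive Msg -> Shell ! {child_saw, Msg} end
                         end),
              %% wait until the child is trapping exits, then finish normally
              receive ready -> ok end
          end),
    receive {child_saw, Msg} -> Msg after 1000 -> timeout end.
Calling exit_demo:run() in the shell should return a tuple of the form {'EXIT', Pid, normal}; without the process_flag(trap_exit, true) call the child would simply ignore the parent's normal exit, which is why a plain gen_server such as echo_server survives being started with spawn/3.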
