Erlang. Connecting to local node: *** ERROR: Shell process terminated - erlang

I have been struggling to connect to an Erlang node, with no luck.
The situation is as follows:
1) I have a "-detached" Erlang node running on localhost with -sname n1
2)
$ epmd -names
epmd: up and running on port 4369 with data:
name n1 at port 53653
3) Trying to connect
$ erl -sname test -remsh n1
...
ERROR: Shell process terminated! (^G to start new job)
$ erl -sname test -setcookie *COOKIE* -remsh n1
...
ERROR: Shell process terminated! (^G to start new job)
$ erl -sname test -setcookie *COOKIE* -remsh n1@localhost
...
ERROR: Shell process terminated! (^G to start new job)
What else should I try?
UPD:
Following @Odobenus Rosmarus's advice:
$ hostname
server.domain.com
$ erl -sname test -setcookie *COOKIE* -remsh n1@server.domain.com
** System NOT running to use fully qualified hostnames **
** Hostname server.domain.com is illegal **
** ERROR: Shell process terminated! (^G to start new job) **
Another blind try (dropping the domain part of the FQDN):
$ erl -sname test -setcookie *COOKIE* -remsh n1@server
Eshell V5.8.5 (abort with ^G)
(ipspy@server)1>
Ok, in 5 tries we are there, cool.

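erl -sname test -setcookie *COOKIE* -remsh n1@hostname
where hostname is not localhost but the output of the 'hostname' command on your machine. With -sname, use only the short name (the part before the first dot), as the update in the question shows.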
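The rule above can be sketched in a POSIX shell. The helper names short_host and remsh_target are hypothetical (they are not part of Erlang/OTP); they only build the node name string that -remsh expects when the target node was started with -sname.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Erlang/OTP).

# short_host FQDN: print the part of a host name before the first dot.
short_host() {
  printf '%s\n' "${1%%.*}"
}

# remsh_target NODE FQDN: print NODE@SHORTHOST, the form a -sname node uses.
remsh_target() {
  printf '%s@%s\n' "$1" "$(short_host "$2")"
}

remsh_target n1 server.domain.com   # prints n1@server
```

With these in place, something like erl -sname test -setcookie *COOKIE* -remsh "$(remsh_target n1 "$(hostname)")" should work whether 'hostname' prints a short or a fully qualified name.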

Related

How to configure docker to expose an Erlang node?

I wrote a simple docker image which starts up an Erlang node (rebar3 release, console launch mode). It starts fine and lets me ping the node from within the container. However, I can't get erl shell to ping it from the host — it simply returns pang and nothing is logged in the dockerized console.
The Dockerfile just starts the node, it doesn't do anything more interesting.
Checklist
Cookie is set and matches
sname is set on both nodes
Docker node is reachable from other container nodes
I refer to the docker node using full sname (tried nodename@localhost, nodename@machinename and nodename@127.0.0.1)
epmd port is exposed (tried without it as well)
What could I have forgotten to make it work?
Disclaimer: This answer is valid for Linux systems. Also, I think that the first and last options are the easiest.
There are several ways to do this, but first, let's establish some key ideas:
The following script runs inside Docker to simulate a node; it just waits for a node connection, prints it, and terminates.
SCRIPT_RUN_IN_DOCKER='ok = net_kernel:monitor_nodes(true), fun F() -> receive {nodeup, N} -> io:format("Connected to ~p~n", [N]), init:stop() end end().'
For the distribution protocol to succeed, the node must not only be reachable at the name used for the ping; the whole node name must also match.
Erlang's -name can be used with an IP address; it is used extensively in the following commands.
Options
Now, on to the options (the commands in each example are to be run in separate terminals).
Docker: Host network namespace
When starting docker in host's network namespace (--net=host), there's no difference with running both outside of docker (for network purposes). It's the easiest way to connect both nodes using plain docker.
-name (ip):
$> docker run --net=host erlang erl -noinput -name foo@127.0.0.1 -setcookie cookie -eval $SCRIPT_RUN_IN_DOCKER
Connected to 'bar@127.0.0.1'
$> erl -noinput -name bar@127.0.0.1 -setcookie cookie -eval "net_adm:ping('foo@127.0.0.1'), init:stop()."
-sname with @localhost:
$> docker run --net=host erlang erl -noinput -sname foo@localhost -setcookie cookie -eval $SCRIPT_RUN_IN_DOCKER
Connected to bar@localhost
$> erl -noinput -sname bar@localhost -setcookie cookie -eval "net_adm:ping('foo@localhost'), init:stop()."
-sname with @$(hostname -f):
$> docker run --net=host erlang erl -noinput -sname foo -setcookie cookie -eval $SCRIPT_RUN_IN_DOCKER
Connected to 'bar@amazing-hostname'
$> erl -noinput -sname bar -setcookie cookie -eval "net_adm:ping('foo@$(hostname -f)'), init:stop()."
Docker: Using docker's default bridge (docker0)
By default, docker starts the containers in its own bridge, and these IPs can be reached without needing to expose any port.
ip a show docker0 lists 172.17.0.1/16 for my machine, and erlang listens on 172.17.0.2 (shown in docker inspect <container>)
-name (ip):
$> docker run erlang erl -noinput -name foo@172.17.0.2 -setcookie cookie -eval $SCRIPT_RUN_IN_DOCKER
Connected to bar@baz
$> erl -noinput -name bar@baz -setcookie cookie -eval "net_adm:ping('foo@172.17.0.2'), init:stop()."
-sname (fake name resolving to container ip):
# The trick here is to have exactly the same node name for the destination, otherwise the distribution protocol won't work.
# We can achieve the custom DNS resolution in linux by editing /etc/hosts
$> tail -n 1 /etc/hosts
172.17.0.2 erlang_in_docker
$> docker run erlang erl -noinput -sname foo@erlang_in_docker -setcookie cookie -eval $SCRIPT_RUN_IN_DOCKER
Connected to 'bar@amazing-hostname'
$> erl -noinput -sname bar -setcookie cookie -eval "net_adm:ping('foo@erlang_in_docker'), init:stop()."
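Before starting the nodes, it can help to confirm that the fake name really maps to the container IP. The following is a minimal sketch, assuming a POSIX shell with awk; hosts_ip is a hypothetical helper that reads any file in /etc/hosts format, and it does not go through the system resolver.

```shell
#!/bin/sh
# hosts_ip FILE NAME: print the first IP mapped to NAME in a hosts-format file.
# Hypothetical helper; it only parses the file and ignores DNS.
hosts_ip() {
  awk -v name="$2" '
    { sub(/#.*/, "") }                       # drop comments
    { for (i = 2; i <= NF; i++)              # fields after the IP are names
        if ($i == name) { print $1; exit } }
  ' "$1"
}

# Usage (prints nothing if the name is not listed):
#   hosts_ip /etc/hosts erlang_in_docker
```

If the printed address differs from the one reported by docker inspect <container>, the ping will reach the wrong host or none at all.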
Docker: Using some other docker bridge
Just create the new network and repeat the previous steps, using the ips from the new network
docker network create erlang_docker_network
docker inspect erlang_docker_network
Docker: Exposing ports with two EPMDs
When exposing ports, you have to juggle ports and IPs because the EPMD ports must be the same.
In this case you are going to have two epmds, one for the host and another for the container (EPMD rejects name requests from non-local peers), both listening on the same port number.
The trick here is (ab)using the 127.0.0.* IPs, which all point to localhost, to simulate different nodes. Note the flag to set the distribution port, as mentioned by @legoscia
-name (ip):
$> epmd -address 127.0.0.1
$> docker run -p 127.0.0.2:4369:4369/tcp -p 127.0.0.2:9000:9000/tcp erlang erl -noinput -name foo@127.0.0.2 -setcookie cookie -kernel inet_dist_listen_min 9000 -kernel inet_dist_listen_max 9000 -eval $SCRIPT_RUN_IN_DOCKER
Connected to bar@baz
$> erl -noinput -name bar@baz -setcookie cookie -eval "net_adm:ping('foo@127.0.0.2'), init:stop()."
-sname (fake name resolving to 127.0.0.2)
And here we need again the DNS resolution provided by /etc/hosts
$> tail -n 1 /etc/hosts
127.0.0.2 erlang_in_docker
$> epmd -address 127.0.0.1
$> docker run -p 127.0.0.2:4369:4369/tcp -p 127.0.0.2:9000:9000/tcp erlang erl -noinput -sname foo@erlang_in_docker -setcookie cookie -kernel inet_dist_listen_min 9000 -kernel inet_dist_listen_max 9000 -eval $SCRIPT_RUN_IN_DOCKER
Connected to bar@baz
$> erl -noinput -sname bar@baz -setcookie cookie -eval "net_adm:ping('foo@erlang_in_docker'), init:stop()."
Docker-compose
docker-compose allows you to easily set up multi-container systems. With it, you don't need to create/inspect networks.
Given the following docker-compose.yaml:
version: '3.3'
services:
  node:
    image: "erlang"
    command:
      - erl
      - -noinput
      - -sname
      - foo
      - -setcookie
      - cookie
      - -eval
      - ${SCRIPT_RUN_IN_DOCKER} # Needs to be exported
    hostname: node
  operator:
    image: "erlang"
    command:
      - erl
      - -noinput
      - -sname
      - baz
      - -setcookie
      - cookie
      - -eval
      - "net_adm:ping('foo@node'), init:stop()."
    hostname: operator
If you run the following docker-compose run commands, you'll see the results:
$> docker-compose up node
Creating network "tmp_default" with the default driver
Creating tmp_node_1 ... done
Attaching to tmp_node_1
node_1 | Connected to baz@operator
tmp_node_1 exited with code 0
$> docker-compose run operator

Erlang net_kernel fails to start node (nodistribution)

I am new to Erlang and RabbitMQ.
I have a RabbitMQ node on CentOS which I had to reset to restart the message queues. Ever since the restart, Erlang refuses to start the node. There was an erlang_vm corrupted error that was fixed with a rabbit remove and restart. I've tried net_kernel:start in the Erlang shell, but it fails.
[root@directadmin ~]# erl
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
1> node().
nonode@nohost
2> net_kernel:start([rabbit, shortnames]).
{error,
{{shutdown,
{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
{child,undefined,net_sup_dynamic,
{erl_distribution,start_link,[[rabbit,shortnames]]},
permanent,1000,supervisor,
[erl_distribution]}}}
3>
=INFO REPORT==== 26-Jan-2017::18:58:36 ===
Protocol: "inet_tcp": the name rabbit@directadmin seems to be in use by another Erlang node
I've noticed that someone else had a similar issue, and they said that fixing the rule set in iptables resolved it. I am not sure how that is done. I've tried service iptables restart, but that didn't make any difference.
http://erlang.org/pipermail/erlang-questions/2015-October/086270.html
When I try to run rabbitmqctl stop_app I get this error
[root@directadmin ~]# rabbitmqctl stop_app
Stopping node rabbit@directadmin ...
Error: erlang_vm_restart_needed
When I try running 'rabbitmqctl stop' I get the vm corrupted error
[root@directadmin ~]# rabbitmqctl stop
Stopping and halting node rabbit@directadmin ...
Error: {badarg,[{io,format,
[standard_error,
"Erlang VM I/O system is damaged, restart needed~n",[]],
[]},
{rabbit_log,handle_damaged_io_system,0,
[{file,"src/rabbit_log.erl"},{line,110}]},
{rabbit_log,with_local_io,1,
[{file,"src/rabbit_log.erl"},{line,95}]},
{rabbit,'-stop_and_halt/0-after$^0/0-0-',0,
[{file,"src/rabbit.erl"},{line,434}]},
{rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,431}]},
{rpc,'-handle_call_call/6-fun-0-',5,
[{file,"rpc.erl"},{line,187}]}]}
The disk was full, possibly because of errors being written to log files. I deleted the logs that occupied the most space in /var/log and then ran yum erase erlang, followed by a clean reinstall of Erlang and RabbitMQ. This resolved the issue. Thank you everyone for your contribution!
You need rabbitmqctl stop, not just rabbitmqctl stop_app.
According to the documentation, stop_app "stops the RabbitMQ application, leaving the Erlang node running", while stop "stops the Erlang node on which RabbitMQ is running".
The issue comes from the fact that epmd is not started.
You need to start epmd manually, or have it started for you by providing a node name when launching erl. This is not specific to the RabbitMQ distribution.
http://erlang.org/documentation/doc-8.0/erts-8.0/doc/html/epmd.html
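Whether the cause is a missing epmd or a stale registration ("the name ... seems to be in use by another Erlang node"), epmd -names shows what epmd currently has registered. The following is a minimal sketch, assuming a POSIX shell; epmd_has_name is a hypothetical helper that only inspects text in the epmd -names output format, and the sample input is the output quoted in the first question on this page.

```shell
#!/bin/sh
# epmd_has_name NAME: succeed if `epmd -names` output on stdin lists NAME.
# Hypothetical helper; it does not talk to epmd itself.
epmd_has_name() {
  grep -qF "name $1 at port "
}

# Example with captured output instead of a live epmd:
sample='epmd: up and running on port 4369 with data:
name n1 at port 53653'

if printf '%s\n' "$sample" | epmd_has_name n1; then
  echo "n1 is registered"
else
  echo "n1 is not registered"
fi
```

Against a live system the pipeline would be epmd -names | epmd_has_name rabbit. If epmd is not running at all, epmd -daemon starts it in the background.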

tsung ts_config_server Can't start newbeam on host (reason: timeout) Aborting

I am currently in the midst of doing distributed load testing on Amazon's EC2 services and have diligently followed all documentation/forum/support on how to get things to work, but unfortunately find myself stuck at this point. No one in any of the relevant IRC's has been able to answer this either...
Here is what I am seeing:
I am at a point where I can get Tsung to work perfectly well if I run it simply on the controller itself, but with options:
FROM tsung.xml
<client host="tester0" weight="8" maxusers="10000" cpu="4"/>
Also - it works with higher/lower CPU values.
I can also very easily get it to work locally using:
use_controller_vm="true"
but this is of no use to me now, as I cannot get the throughput that I'd like.
In order to get things working, I have ssh keys installed. I have opened all ports on these servers [ 0 - 65535 ] and have the exact same versions of Tsung, Erlang and, well, everything on the server is the same actually (they are images of each other).
Tsung version 1.4.2
Erlang R15B01
Ubuntu 12.04LTS
Same EC2 security group (all ports open - both TCP & UDP and NO iptables or SELinux)
Same EC2 availability zone
When I start Tsung, it works when I send only to tester0 as above, and ts_config_server starts newbeam with:
ts_config_server:(6:<0.84.0>) starting newbeam on host tester0 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
However, whenever I try to run this with any remote server, the entire test fails and I get zero users:
<client host="tester1" weight="8" maxusers="10000" cpu="1"/>
ts_config_server:(6:<0.84.0>) starting newbeam on host tester1 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451924"
However, when I try to run it with TWO clients (i.e, as below):
<client host="tester0" weight="8" maxusers="10000" cpu="1"/>
<client host="tester1" weight="8" maxusers="10000" cpu="1"/>
I once again get zero users to start hitting my web servers. I'm not sure why and this is not intuitive to me at all.
ts_config_server:(6:<0.84.0>) starting newbeam on host tester1 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
ts_config_server:(6:<0.85.0>) starting newbeam on host tester0 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
One thing that I do notice is that, of all the args passed to slave:start, only ONE does not exist, and that is the one following the -boot directive:
/usr/lib/erlang//lib/tsung-1.4.2/priv/tsung
Rather, in that directory, I have only the following files:
:~$ ls /usr/lib/erlang//lib/tsung-1.4.2/priv
tsung.boot tsung_controller.rel tsung_recorder.boot tsung_recorder.script
tsung_controller.boot tsung_controller.script tsung_recorder_load.boot tsung.rel
tsung_controller_load.boot tsung_load.boot tsung_recorder_load.script tsung.script
tsung_controller_load.script tsung_load.script tsung_recorder.rel
The last thing I've actually tried is to log what is happening with my ssh session when I try to slave:start, but I get no results. I did this by running:
erl -rsh ssh -sname tsung -r ssh_log_me
Where ssh_log_me is:
#!/bin/sh
echo "$0" "$@" > /tmp/my-ssh.log
ssh -v "$@" 2>&1 | tee -a /tmp/my-ssh.log
But I get no output when I run:
(tsung@tester0)1> slave:start_link(tester1, tsung, " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751").
{error,timeout}
I have looked through erlang's -boot directive and through the actual erlang code (for ts_config_server) but I am a little lost at this point and might just be missing one last piece of information.
I ask that you please take a look at my xml file here: http://pastebin.com/2MEbL6gd
I recompiled using git's most current version and it worked --- odd that it didn't work with my installed deb package...
Goes to show you - compile from source when unsure!
1. Ensure SSH host key verification is disabled in ~/.ssh/config:
Host *
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
2. Make sure all ports are accessible between the controller and worker nodes. If they are in the cloud, make sure the firewall or security groups allow all ports.
3. Erlang and Tsung must be the same version on all machines.
4. Ensure all machines can reach each other.
5. Run an Erlang test:
erl -rsh ssh -name subbu -setcookie tsung
Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
(daya@ip-10-0-100-224.ec2.internal)1> slave:start("worker1.com",bar,"-setcookie tsung").
Warning: Permanently added 'worker1,10.0.100.225' (ECDSA) to the list of known hosts.
{ok,bar@worker1}
Run this test from the controller to all worker nodes.
You should then be able to run tests without any issues.
Good Luck!
Subbu

Unable to set-up tsung cluster on EC2 - Erlang crash

I am trying to setup a tsung cluster on two ec2 instances:
Master - ip-10-212-101-85.ec2.internal
Slave - ip-10-116-39-86.ec2.internal
Both have erlang (R15B) and tsung (1.4.2) installed, and install-path is same on both of them.
I can do ssh from Master to Slave and vice versa without password.
Firewall is stopped on both the machines (service iptables stop)
On Master, the attempt to start an Erlang slave agent results in {error,timeout}:
[root@ip-10-212-101-85 ~]# erl -rsh ssh -sname foo -setcookie mycookie
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9 (abort with ^G)
(foo@ip-10-212-101-85)1> slave:start('ip-10-116-39-86',bar,"-setcookie mycookie").
{error,timeout}
On Slave, the beam comes up for few seconds then it crashes. The erl_crash.dump can be found here
I am stuck with this error; any clue would be very helpful.
PS:
On both machine the /etc/hosts is same, the file looks like below:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.212.101.85 ip-10-212-101-85.ec2.internal
10.116.39.86 ip-10-116-39-86.ec2.internal
Looks like "service iptables stop" on individual nodes is not sufficient.
In the Security Group that is applied to the VMs, I added a new rule that opens port range 0 - 65535 for all.
This solved the problem.
If that's all verbatim, then the problem is likely slave:start('ip-10-116-39-86',bar,"-sttcookie mycookie"). - Try slave:start('ip-10-116-39-86',bar,"-setcookie mycookie"). instead.

In erlang/OTP how do I start appmon to monitor an existing node?

I have a running erlang application, launched with this command line
erl -boot start_sasl -config config/cfg_qa -detached -name peasy -cookie peasy -pa ./ebin -pa ./ebin/mochiweb -s peasy start
If I start a new node and run appmon:start(), the 'peasy' node won't show up, even if using the same cookie. The same happens with webtool:start()
Anyone?
Found.
As always with erlang, to have two nodes speak to each other, you need to ping:
1> net_adm:ping(other_node_you_want_to_monitor).
pong
2> appmon:start().
{ok,<0.48.0>}
And off you go :)
