I'm currently trying to set up a distributed Tsung load testing
environment which uses the Erlang slave functionality, however I have
been unsuccessful in getting the controller node to start a slave
node. E.g.
(musicglue@load1)1> net:ping(musicglue@load2).
pong
(musicglue@load1)2> slave:start(load2,musicglue,"-setcookie tom").
{error,timeout}
BACKGROUND
My env:
Controller - hostname: load1, user: musicglue, Ubuntu 10.04 LTS,
Erlang R15B01 compiled from source
Slave - hostname: load2, user: musicglue, Ubuntu 10.04 LTS, Erlang
R15B01 compiled from source
Firewall disabled
SELinux not installed
Things that are working:
I can SSH from load1 onto load2 and vice versa
I can start an erl session on load1 and on load2
I can start an erl session on load2 from load1; ssh load2 erl
I can successfully ping load2 from load1 from an erl session using
the same cookie on both nodes.
Ping output:
musicglue@load1:~$ erl -rsh ssh -sname musicglue -setcookie tom
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9.1 (abort with ^G)
(musicglue@load1)1> net:ping(musicglue@load2).
pong
THE ISSUE
My problem occurs when attempting to start a slave session from load1
on load2:
musicglue@load1:~$ erl -rsh ssh -sname musicglue -setcookie tom
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9.1 (abort with ^G)
(musicglue@load1)1> net:ping(musicglue@load2).
pong
(musicglue@load1)2> slave:start(load2,musicglue,"-setcookie tom").
{error,timeout}
Here is the output I get from epmd when I run the slave:start command:
epmd: Thu May 24 10:01:57 2012: Non-local peer connected
epmd: Thu May 24 10:01:57 2012: opening connection on file descriptor 4
epmd: Thu May 24 10:01:57 2012: got 12 bytes
***** 00000000 00 0a 7a 6d 75 73 69 63 67 6c 75 65 |..zmusicglue|
epmd: Thu May 24 10:01:57 2012: ** got PORT2_REQ
epmd: Thu May 24 10:01:57 2012: got 2 bytes
***** 00000000 77 01 |w.|
epmd: Thu May 24 10:01:57 2012: ** sent PORT2_RESP (error) for "musicglue"
epmd: Thu May 24 10:01:57 2012: closing connection on file descriptor 4
epmd: Thu May 24 10:01:57 2012: Local peer connected
epmd: Thu May 24 10:01:57 2012: opening connection on file descriptor 4
epmd: Thu May 24 10:01:57 2012: got 24 bytes
***** 00000000 00 16 78 ca d6 4d 00 00 05 00 05 00 09 6d 75 73 |..x..M.......mus|
***** 00000010 69 63 67 6c 75 65 00 00 |icglue..|
epmd: Thu May 24 10:01:57 2012: ** got ALIVE2_REQ
epmd: Thu May 24 10:01:57 2012: registering 'musicglue:1', port 51926
epmd: Thu May 24 10:01:57 2012: type 77 proto 0 highvsn 5 lowvsn 5
epmd: Thu May 24 10:01:57 2012: got 4 bytes
***** 00000000 79 00 00 01 |y...|
epmd: Thu May 24 10:01:57 2012: ** sent ALIVE2_RESP for "musicglue"
epmd: Thu May 24 10:01:57 2012: unregistering 'musicglue:1', port 51926
epmd: Thu May 24 10:01:57 2012: closing connection on file descriptor 4
Any help or suggestions anyone has would be much appreciated,
Many thanks
EDIT
I should also mention that I can see the ssh connection being successfully acknowledged by load2 but then immediately disconnecting:
May 30 13:49:27 load2 sshd[16169]: Accepted publickey for musicglue from 173.45.236.182 port 51843 ssh2
May 30 13:49:27 load2 sshd[16171]: Received disconnect from 173.45.236.182: 11: disconnected by user
In response to below comments I have also tried to start the slave using different node names for the slave:
musicglue@load1:~$ erl -rsh ssh -sname musicglue -setcookie tom
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9.1 (abort with ^G)
(musicglue@load1)1> slave:start(load2,bar,"-setcookie tom").
{error,timeout}
and for the controller:
musicglue@load1:~$ erl -rsh ssh -sname foo -setcookie tom
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9.1 (abort with ^G)
(foo@load1)1> slave:start(load2,musicglue,"-setcookie tom").
{error,timeout}
and for both:
musicglue@load1:~$ erl -rsh ssh -sname foo -setcookie tom
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9.1 (abort with ^G)
(foo@load1)1> slave:start(load2,bar,"-setcookie tom").
{error,timeout}
But to no avail
SOLUTION
It turns out that my slave was unable to SSH onto the controller and therefore could not respond to any commands.
After fixing this path of communication between the two nodes, everything worked perfectly.
An alternate answer for those who find this question via Google. If you're trying to start a service on a separate machine then your controller node name must resolve.
For example, I was having timeouts with:
> node().
someName@host.domain.com
> slave:start('192.168.122.196',bar,"-setcookie cookie").
{error,timeout}
By starting my Erlang instance with an explicit host that the other machine can resolve (here an IP address):
erl -name someName@192.168.1.5 -setcookie cookie
> slave:start('192.168.122.196',bar,"-setcookie cookie").
This command now succeeds.
Try logging what goes on through SSH by creating a shell script like this somewhere in your PATH:
#!/bin/sh
echo "$0" "$@" > /tmp/my-ssh.log
ssh -v "$@" 2>&1 | tee -a /tmp/my-ssh.log
Call it my-ssh, start Erlang with erl -rsh my-ssh, and check what goes into /tmp/my-ssh.log. That should shed some light on the problem...
Related
First, the answers to "Erlang nodes failed to connect" and "Erlang - Nodes don't recognize" did not help.
I have tried everything suggested there.
Connecting works on the same machine, but it fails between machines.
test@centos-1:~$ ping apple@centos-1 -c 1
PING apple@centos-1 (192.168.142.135) 56(84) bytes of data.
64 bytes from apple@centos-1 (192.168.142.135): icmp_seq=1 ttl=64 time=0.036 ms
test@centos-1:~$ ping pear@centos-2 -c 1
PING pear@centos-2 (192.168.142.136) 56(84) bytes of data.
64 bytes from pear@centos-2 (192.168.142.136): icmp_seq=1 ttl=64 time=0.292 ms
apple@centos-1 starts
@centos-1:~$ erl -sname apple@centos_1 -kernel inet_dist_listen_min 6369 inet_dist_listen_max 7369 -setcookie CKYBWKWCWNLSPZWSLJXT
Erlang/OTP 24 [erts-12.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
Eshell V12.2 (abort with ^G)
(apple@centos_1)1>
pear@centos-2 starts
test@centos-2:~$ erl -sname pear@centos-2 -kernel inet_dist_listen_min 6369 inet_dist_listen_max 7369 -setcookie CKYBWKWCWNLSPZWSLJXT
Erlang/OTP 24 [erts-12.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
Eshell V12.2 (abort with ^G)
(pear@centos-2)1>
connection failed
test@centos-1:~$ erl -sname apple@centos_1 -kernel inet_dist_listen_min 6369 inet_dist_listen_max 7369 -setcookie CKYBWKWCWNLSPZWSLJXT
Erlang/OTP 24 [erts-12.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
Eshell V12.2 (abort with ^G)
(apple@centos_1)1> net
net net_adm net_kernel
(apple@centos_1)1> net_kernel:connect_node('pear@centos-2').
false
(apple@centos_1)2>
I have checked all the situations I have found
The hosts file
192.168.142.135 apple@centos-1
192.168.142.136 pear@centos-2
cookie
They have the same cookie.
firewall
firewall-cmd --add-port=6000-8000/tcp --permanent
tcpdump
It does not capture any packets.
Linux resolves hostnames, not Erlang node names, so this ping should fail:
test@centos-1:~$ ping apple@centos-1 -c 1
This Linux ping should succeed:
test@centos-1:~$ ping centos-1 -c 1
Erlang examples often use functions called ping/pong, which go through epmd and use the @ syntax.
This looks good if the domains are set up correctly (though note that '-' and '_' are not the same):
@centos-1:~$ erl -sname apple@centos-1 -kernel inet_dist_listen_min 6369 inet_dist_listen_max 7369 -setcookie CKYBWKWCWNLSPZWSLJXT
Erlang/OTP 24 [erts-12.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
Hosts are just:
192.168.142.135 centos-1
192.168.142.136 centos-2
so lines like the pear@centos-2 one you set up are not being used by erl. You can run as many erl shells as you like with different names without needing to update hosts.
Once that setup is working if you look in /etc/resolv.conf you should have a domain and it should be the same on both machines. If it is, you can try adding an alias with it to the hosts like this:
192.168.142.135 centos-1 centos-1.example.com
192.168.142.136 centos-2 centos-2.example.com
Ideally, though, resolv.conf points to a local DNS server that sets this naming up, so that centos-1.example.com and centos-2.example.com can already ping each other.
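A minimal sketch of that check, assuming a glibc-based Linux where getent is available. Both function names are hypothetical helpers, not part of Erlang/OTP: the first prints the host part of a node name written as name@host, the second reports whether that host part resolves on the current machine.

```shell
#!/bin/sh
# Hypothetical helper: print the host part of an Erlang node name (name@host).
# If the argument contains no '@', it is printed unchanged.
node_host() {
    printf '%s\n' "${1#*@}"
}

# Hypothetical helper: succeed only if the host part resolves locally.
# Assumption: getent is installed (it ships with glibc-based distributions).
node_host_resolves() {
    getent hosts "$(node_host "$1")" >/dev/null
}
```

For example, `node_host_resolves apple@centos_1 || echo "host part does not resolve"` would flag the underscore typo, provided only centos-1 is listed in hosts or DNS.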
Can someone give me more than one way to connect two Erlang nodes?
I know one way, using erlang:set_cookie/2, and am curious whether there is another.
1. Use -setcookie.
You can also pass -setcookie when starting Erlang.
In first terminal of my local machine,
hyun@hyun-VirtualBox:~$ erl -sname a -setcookie guitar
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
And second terminal of my local machine,
hyun@hyun-VirtualBox:~$ erl -sname b -setcookie guitar
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
Lastly, in first terminal,
Eshell V7.0 (abort with ^G)
(a@hyun-VirtualBox)1> net_adm:ping('b@hyun-VirtualBox').
pong
2. Copy $HOME/.erlang.cookie
You can simply copy $HOME/.erlang.cookie to the other machine so that both share the same cookie value.
Also, you have to think about security.
getting_started
An Erlang node is completely unprotected when running erlang:set_cookie(node(), nocookie). This can sometimes be appropriate for systems that are not normally networked, or for systems which are run for maintenance purposes only. Refer to auth(3) for details on the security system.
According to "Erlang Security 101" by NCC Group (https://www.nccgroup.trust/globalassets/our-research/uk/whitepapers/2014/erlang_security_101_v1-0.pdf), you should not use -setcookie, as other users of the server will be able to see the cookie using ps ax | grep erl. For example, from a terminal on my local computer:
zed@blargh:~$ erl -setcookie abc -sname e1
Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
(e1@blargh)1>
And then from a second terminal, as a different user:
eks@blargh:~$ ps ax | grep erl
2035 pts/7 Sl+ 0:00 /usr/lib/erlang/erts-5.10.4/bin/beam.smp -- -root /usr/lib/erlang -progname erl -- -home /home/zed -- -setcookie abc -sname e1
2065 pts/8 S+ 0:00 grep --color=auto erl
9841 ? S 0:00 /usr/lib/erlang/erts-5.10.4/bin/epmd -daemon
And you can clearly see the cookie in the output of ps. Having the cookie allows a third party to join the erlang cluster. You should instead use the cookie file method, with restrictive permissions on the file.
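The cookie file method could look roughly like the sketch below. write_cookie is a hypothetical helper, not something shipped with Erlang/OTP, and the cookie value is whatever secret you choose. The file is created with a restrictive umask and then set to mode 400, so only its owner can read it.

```shell
#!/bin/sh
# Hypothetical helper: store a cookie in a file readable only by its owner.
#   $1 = path of the cookie file (for a real node: "$HOME/.erlang.cookie")
#   $2 = cookie value
write_cookie() {
    cookie_file=$1
    cookie_value=$2
    (
        # The subshell keeps the umask change from leaking into the caller.
        umask 077
        # An existing cookie file is usually read-only, so remove it first.
        rm -f "$cookie_file"
        printf '%s' "$cookie_value" > "$cookie_file"
    ) && chmod 400 "$cookie_file"
}
```

With the file in place, erl can be started without -setcookie, so the cookie no longer appears in the output of ps.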
You should set the cookie (in the console, as you wrote, or when starting erl).
Also, if you start one node with a short name (-sname), the second node must be started with a short name too.
If you start one node with a long name (-name), the second node must also be started with -name.
Works:
erl -name obsrv@127.0.0.1 -setcookie democookie
erl -name n2@127.0.0.1 -setcookie democookie
Do not work:
erl -name obsrv@127.0.0.1 -setcookie democookie
erl -name n2 -setcookie democookie
If the nodes run on different machines, check that the distribution port (40293 in this example) is open,
or pin the port (or a min/max range) when starting erl:
erl \
  -kernel inet_dist_listen_min 40293 \
  -setcookie democookie \
  -name erl_node_1
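As a rough sketch, the same idea with both ends of the range made explicit. erl_dist_cmd is a hypothetical helper that only prints the command line. It assumes the cookie comes from the cookie file, so -setcookie is left out. Keep in mind that epmd listens on port 4369 by default, so that port has to be reachable too.

```shell
#!/bin/sh
# Hypothetical helper: print an erl command line that pins the distribution
# listener of node $1 to the port range $2..$3.
# Assumption: the cookie is read from the cookie file, so -setcookie is omitted.
erl_dist_cmd() {
    printf 'erl -name %s -kernel inet_dist_listen_min %s inet_dist_listen_max %s\n' \
        "$1" "$2" "$3"
}

# Example (prints the command instead of running it):
#   erl_dist_cmd erl_node_1 40293 40293
```

Printing rather than executing makes it easy to compare the port range against the firewall rules before starting the node.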
I see some strange behaviour in my Docker containers (CentOS). When I SSH into one, there is a running Erlang VM instance (api@127.0.0.1) that I can't connect to with the -remsh argument, although I can ping it. The Erlang node itself (api@127.0.0.1) works correctly.
bash-4.2# ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
Eshell V6.1 (abort with ^G)
(remote@127.0.0.1)1> node().
'remote@127.0.0.1'
(remote@127.0.0.1)2> net_adm:ping('api@127.0.0.1').
pong
(remote@127.0.0.1)3> erlang:system_info(system_version).
"Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]\n"
(remote@127.0.0.1)4> rpc:call('api@127.0.0.1', erlang, node, []).
'api@127.0.0.1'
There are two Linux processes running: one for the actual VM and another for the process that tries to invoke the remote shell.
26 ? Sl 40:46 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app/ -- -name api@127.0.0.1 -boot releases/14.2.0299/start -config sys -boot_var PATH lib -noshell
32542 ? Sl+ 0:00 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app -- -name remote@127.0.0.1 -remsh api@127.0.0.1
When I copy Erlang binary files to the host (Arch Linux) and run ./bin/erl I have different results:
[jodias@arch tmp]$ ./bin/erl
Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V6.1 (abort with ^G)
1>
Please note that the Erlang system version is printed here, whereas it is missing in the Docker container (although the Erlang binaries are exactly the same).
What is $TERM in the shell from which you are trying to open the remote shell? There is a problem when TERM is absent or is not known to the ncurses that Erlang is built against: the remote shell connection fails silently. Try this one:
TERM=xterm ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
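If you don't want to hard-code it, a small wrapper can fall back to xterm only when TERM is unset or empty. This is just a sketch: effective_term is a hypothetical helper, and it does not detect a TERM value that is set but unknown to ncurses.

```shell
#!/bin/sh
# Hypothetical helper: print the current TERM, or "xterm" when TERM is unset
# or empty. A TERM that is set but unknown to ncurses is NOT detected here.
effective_term() {
    printf '%s\n' "${TERM:-xterm}"
}

# Example usage (not executed here):
#   TERM=$(effective_term) ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
```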
I once reported the problem to Erlang mailing list but no answer came up. Now I see this issue is in Erlang issue tracker. Please vote for it to be picked by OTP team ;)
I am trying to configure ejabberd on my server.
I finished the installation. The following sequence of commands gives unexpected output:
$ ejabberd start
$ ejabberd status
After the first command ejabberd is started and we are able to access the web admin interface.
But running ejabberd status gives the following output:
Failed to create main carrier for temp_alloc
/sbin/ejabberdctl: line 412: 9616 Aborted $EXEC_CMD "$ERL $NAME ${CONN_NAME} -noinput -hidden -pa $EJABBERD_EBIN_PATH $KERNEL_OPTS -s ejabberd_ctl -extra $ERLANG_NODE $COMMAND"
Update
Running $ erl gives the following output:
Crash dump is being written to: erl_crash.dump...done
Failed to create aux thread
Aborted
Contents of erl_crash.dump
=erl_crash_dump:0.3
Wed Nov 18 03:16:51 2015
Slogan: Failed to create aux thread
System version: Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:85:24] [async-threads:10] [hipe] [kernel-poll:false]
Compiled: Tue Nov 17 05:43:11 2015
Taints:
Atoms: 2005
Calling Thread: beam.smp
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING
Scheduler Sleep Info Aux Work: SET_TMO
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 1
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: NONEMPTY_NORMAL | NONEMPTY
Current Process:
I am not able to trace the issue. Any pointers would be very helpful.
Run erl with SMP mode disabled, i.e. $ erl -smp disable
If it runs successfully, go to line 412 of the /sbin/ejabberdctl file and add the option there too, e.g.
$EXEC_CMD "$ERL $NAME ${CONN_NAME} -smp disable -noinput -hidden -pa $EJABBERD_EBIN_PATH $KERNEL_OPTS -s ejabberd_ctl -extra $ERLANG_NODE $COMMAND"
I am trying to setup a tsung cluster on two ec2 instances:
Master - ip-10-212-101-85.ec2.internal
Slave - ip-10-116-39-86.ec2.internal
Both have erlang (R15B) and tsung (1.4.2) installed, and install-path is same on both of them.
I can do ssh from Master to Slave and vice versa without password.
Firewall is stopped on both the machines (service iptables stop)
On Master, the attempt to start an Erlang slave agent results in {error,timeout}:
[root@ip-10-212-101-85 ~]# erl -rsh ssh -sname foo -setcookie mycookie
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9 (abort with ^G)
(foo@ip-10-212-101-85)1> slave:start('ip-10-116-39-86',bar,"-setcookie mycookie").
{error,timeout}
On Slave, the beam process comes up for a few seconds and then crashes. The erl_crash.dump can be found here
I am stuck with this error; any clue would be very helpful.
PS:
On both machine the /etc/hosts is same, the file looks like below:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.212.101.85 ip-10-212-101-85.ec2.internal
10.116.39.86 ip-10-116-39-86.ec2.internal
It looks like "service iptables stop" on the individual nodes is not sufficient.
In the Security Group that is applied to the VMs, I added a new rule that opens the port range 0 - 65535 for all.
This solved the problem.
If that's all verbatim, then the problem is likely slave:start('ip-10-116-39-86',bar,"-sttcookie mycookie"). - Try slave:start('ip-10-116-39-86',bar,"-setcookie mycookie"). instead.