I am trying to set up a Tsung cluster on two EC2 instances:
Master - ip-10-212-101-85.ec2.internal
Slave - ip-10-116-39-86.ec2.internal
Both have Erlang (R15B) and Tsung (1.4.2) installed, and the install path is the same on both.
I can ssh from Master to Slave and vice versa without a password.
The firewall is stopped on both machines (service iptables stop).
On the Master, the attempt to start an Erlang slave node results in {error,timeout}:
[root@ip-10-212-101-85 ~]# erl -rsh ssh -sname foo -setcookie mycookie
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9 (abort with ^G)
(foo@ip-10-212-101-85)1> slave:start('ip-10-116-39-86',bar,"-setcookie mycookie").
{error,timeout}
On the Slave, beam comes up for a few seconds and then crashes. The erl_crash.dump can be found here.
I am stuck on this error; any clue would be very helpful.
PS:
On both machines /etc/hosts is the same; the file looks like this:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.212.101.85 ip-10-212-101-85.ec2.internal
10.116.39.86 ip-10-116-39-86.ec2.internal
Looks like "service iptables stop" on individual nodes is not sufficient.
In the Security Group that is applied to the VMs, I added a new rule that opens the port range 0 - 65535 for all.
This solved the problem.
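Opening everything is the blunt fix; Erlang distribution actually needs only epmd's port (4369 by default) plus the dynamic port the remote beam listens on, and that port can be pinned so the Security Group rule stays narrow. A minimal sketch (the 9100-9105 range is just an example, not from the original setup):
erl -sname foo -setcookie mycookie \
    -kernel inet_dist_listen_min 9100 \
    -kernel inet_dist_listen_max 9105
With that, a Security Group rule allowing TCP 4369 and 9100-9105 between the two instances should be sufficient. For a node started via slave:start, the same -kernel settings would have to be part of its argument string.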
If that's all verbatim, then the problem is likely a typo in the cookie option passed to slave:start, e.g. "-sttcookie mycookie". Try slave:start('ip-10-116-39-86',bar,"-setcookie mycookie"). instead.
Related
I am new to Erlang and RabbitMQ.
I have a RabbitMQ node on CentOS which I had to reset in order to restart the message queues. Ever since the restart, Erlang refuses to start the node. There was an erlang_vm corrupted error that was fixed with a RabbitMQ remove and restart. I've tried net_kernel:start in the Erlang shell but it fails.
[root@directadmin ~]# erl
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
1> node().
nonode@nohost
2> net_kernel:start([rabbit, shortnames]).
{error,
{{shutdown,
{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
{child,undefined,net_sup_dynamic,
{erl_distribution,start_link,[[rabbit,shortnames]]},
permanent,1000,supervisor,
[erl_distribution]}}}
3>
=INFO REPORT==== 26-Jan-2017::18:58:36 ===
Protocol: "inet_tcp": the name rabbit@directadmin seems to be in use by another Erlang node
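A quick way to check which node names are currently registered on this host is to ask epmd (which ships with Erlang); if rabbit is really in use, it shows up here:
epmd -names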
I've noticed that someone else had a similar issue and they said that fixing the iptables rule set resolved it. I am not sure how that is done. I've tried service iptables restart but that didn't make any difference.
http://erlang.org/pipermail/erlang-questions/2015-October/086270.html
When I try to run rabbitmqctl stop_app I get this error:
[root@directadmin ~]# rabbitmqctl stop_app
Stopping node rabbit@directadmin ...
Error: erlang_vm_restart_needed
When I try running rabbitmqctl stop I get the VM corrupted error:
[root@directadmin ~]# rabbitmqctl stop
Stopping and halting node rabbit@directadmin ...
Error: {badarg,[{io,format,
[standard_error,
"Erlang VM I/O system is damaged, restart needed~n",[]],
[]},
{rabbit_log,handle_damaged_io_system,0,
[{file,"src/rabbit_log.erl"},{line,110}]},
{rabbit_log,with_local_io,1,
[{file,"src/rabbit_log.erl"},{line,95}]},
{rabbit,'-stop_and_halt/0-after$^0/0-0-',0,
[{file,"src/rabbit.erl"},{line,434}]},
{rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,431}]},
{rpc,'-handle_call_call/6-fun-0-',5,
[{file,"rpc.erl"},{line,187}]}]}
The disk was full, probably due to the errors being written to the log files. I deleted the logs that occupied the most space in /var/log and then ran yum erase erlang, followed by a clean reinstall of Erlang and RabbitMQ. This resolved the issue. Thank you everyone for your contributions!
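For reference, a couple of standard commands to confirm the disk-full condition before deleting anything (the paths are the usual CentOS defaults, not taken from the original report):
df -h /var
du -sh /var/log/*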
You need rabbitmqctl stop, not just rabbitmqctl stop_app.
According to the documentation, stop_app "stops the RabbitMQ application, leaving the Erlang node running", while stop "stops the Erlang node on which RabbitMQ is running".
The issue comes from the fact that epmd is not started.
You need to start epmd manually, or start it implicitly by giving the node a name when launching erl. This is not specific to RabbitMQ; it applies to Erlang distribution in general.
http://erlang.org/documentation/doc-8.0/erts-8.0/doc/html/epmd.html
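A minimal sketch of both options (the node name is only for illustration):
# Option 1: start epmd explicitly, then bring up distribution from a running shell
epmd -daemon
erl
1> net_kernel:start([rabbit, shortnames]).
# Option 2: give erl a node name at startup; this starts epmd automatically
erl -sname rabbit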
Can someone give me more than one way to connect two Erlang nodes?
I know one way, using erlang:set_cookie/2, and I am curious whether there is another.
1. Use -setcookie.
You can also pass -setcookie when starting erl.
In the first terminal on my local machine:
hyun@hyun-VirtualBox:~$ erl -sname a -setcookie guitar
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
And in a second terminal on my local machine:
hyun@hyun-VirtualBox:~$ erl -sname b -setcookie guitar
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
Lastly, in the first terminal:
Eshell V7.0 (abort with ^G)
(a@hyun-VirtualBox)1> net_adm:ping('b@hyun-VirtualBox').
pong
2. Copy $HOME/.erlang.cookie
You can simply copy $HOME/.erlang.cookie to the other (remote) machine so that both nodes share the same cookie value.
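A minimal sketch of that (the hostname is a placeholder; the cookie file must be readable only by its owner, otherwise the node refuses to use it):
scp ~/.erlang.cookie user@remote-host:~/.erlang.cookie
ssh user@remote-host 'chmod 400 ~/.erlang.cookie'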
Also, you have to think about security.
getting_started
An Erlang node is completely unprotected when running erlang:set_cookie(node(), nocookie). This can sometimes be appropriate for systems that are not normally networked, or for systems which are run for maintenance purposes only. Refer to auth(3) for details on the security system.
According to "Erlang Security 101" by NCC Group (https://www.nccgroup.trust/globalassets/our-research/uk/whitepapers/2014/erlang_security_101_v1-0.pdf), you should not use -setcookie, as other users of the server will be able to see the cookie using ps ax | grep erl. For example, from a terminal on my local computer:
zed@blargh:~$ erl -setcookie abc -sname e1
Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
(e1@blargh)1>
And then from a second terminal, as a different user:
eks@blargh:~$ ps ax | grep erl
2035 pts/7 Sl+ 0:00 /usr/lib/erlang/erts-5.10.4/bin/beam.smp -- -root /usr/lib/erlang -progname erl -- -home /home/zed -- -setcookie abc -sname e1
2065 pts/8 S+ 0:00 grep --color=auto erl
9841 ? S 0:00 /usr/lib/erlang/erts-5.10.4/bin/epmd -daemon
And you can clearly see the cookie in the output of ps. Having the cookie allows a third party to join the erlang cluster. You should instead use the cookie file method, with restrictive permissions on the file.
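For example, instead of passing the cookie on the command line, it can be placed in the cookie file (the cookie value here is just an example):
echo -n 'abc' > ~/.erlang.cookie
chmod 400 ~/.erlang.cookie
erl -sname e1    # the cookie is read from the file and never appears in ps output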
You should set cookies (in the console as you wrote, or when starting erl).
Also, if you start one node with a short name (-sname), the second node must also use a short name.
Likewise, if you start one node with -name, the second node must also use -name.
Works:
erl -name obsrv@127.0.0.1 -setcookie democookie
erl -name n2@127.0.0.1 -setcookie democookie
Does not work:
erl -name obsrv@127.0.0.1 -setcookie democookie
erl -name n2 -setcookie democookie
If the nodes run on different machines, check that the distribution port (40293 in the example below) is open,
or set the port (min and max) when starting erl:
erl \
    -kernel inet_dist_listen_min 40293 \
    -kernel inet_dist_listen_max 40293 \
    -setcookie democookie \
    -name erl_node_1
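If a firewall sits between the machines, note that epmd's port (4369 by default) must also be reachable in addition to the distribution port chosen above. A quick reachability check from the other machine (the hostname is a placeholder and nc must be installed):
nc -zv other-host 4369
nc -zv other-host 40293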
I am seeing some strange behaviour in my Docker containers (CentOS). When I SSH into one there is a running instance of the Erlang VM (api@127.0.0.1) which I can't connect to with the -remsh argument, although I can ping it. The Erlang node (api@127.0.0.1) itself works correctly though.
bash-4.2# ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
Eshell V6.1 (abort with ^G)
(remote@127.0.0.1)1> node().
'remote@127.0.0.1'
(remote@127.0.0.1)2> net_adm:ping('api@127.0.0.1').
pong
(remote@127.0.0.1)3> erlang:system_info(system_version).
"Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]\n"
(remote@127.0.0.1)4> rpc:call('api@127.0.0.1', erlang, node, []).
'api@127.0.0.1'
There are two Linux processes running: one for the actual VM and another for the process that tries to invoke the remote shell.
26 ? Sl 40:46 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app/ -- -name api@127.0.0.1 -boot releases/14.2.0299/start -config sys -boot_var PATH lib -noshell
32542 ? Sl+ 0:00 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app -- -name remote@127.0.0.1 -remsh api@127.0.0.1
When I copy the Erlang binaries to the host (Arch Linux) and run ./bin/erl, I get different results:
[jodias@arch tmp]$ ./bin/erl
Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V6.1 (abort with ^G)
1>
Please note that the Erlang system version banner is printed here, whereas it is missing in the Docker container (even though the Erlang binaries are exactly the same).
What is $TERM in the shell from which you're trying to open the remote shell? There is a problem when TERM is absent or unknown to the ncurses version that Erlang is built against, which makes the remote shell connection fail silently. Try this one:
TERM=xterm ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
I once reported the problem to the Erlang mailing list but no answer came up. Now I see this issue is in the Erlang issue tracker; please vote for it so it gets picked up by the OTP team ;)
This is embarrassing, but I am totally stuck and have wasted the better part of this morning. I have an Erlang app release created by relx, deployed and running in a Docker container. I need to get to the shell on the running node, but I'm failing to do so. Here is what happens:
$ docker exec -it 770b497d7f27 /bin/bash
[root@ff /]# /app/bin/ff
Usage: ff {start|start_boot <file>|foreground|stop|restart|reboot|pid|ping|console|console_clean|console_boot <file>|attach|remote_console|upgrade|escript|rpc|rpcterms}
[root@ff /]# /app/bin/ff ping
pong
[root@ff /]# /app/bin/ff attach
Can't access pipe directory /tmp/erl_pipes/ff@127.0.0.1/: No such file or directory
[root@ff /]# /app/bin/ff remote_console
Eshell V7.1 (abort with ^G)
(remshfbfbd4dd-ff@127.0.0.1)1> ^G
Eshell V7.1 (abort with ^G)
(remshfbfbd4dd-ff@127.0.0.1)1>
And that's it; all I can do is exit with q().
There is no erl_pipes directory in /tmp.
Control-G seems to be captured by Docker. I cannot get to the "User switch command" menu.
Even running a pure Erlang shell is not so easy:
[root@ff /]# /app/erts-7.1/bin/erl
{"init terminating in do_boot",{'cannot get bootfile','/app/bin/start.boot'}}
Crash dump is being written to: erl_crash.dump...done
init terminating in do_boot ()
I have run out of ideas. Any help would be appreciated.
I found a workaround: I managed to get ^G working by overriding the default "dumb" terminal Docker sets:
export TERM=xterm
After this, ^G works, starting a remote shell works, and I'm a happy camper! I'd still be glad to know why neither the attach nor the remote_console command worked out of the box, though.
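The override can also be applied per command instead of being exported in an interactive shell; a sketch, assuming the same container ID as above:
docker exec -it 770b497d7f27 /bin/bash -c 'TERM=xterm /app/bin/ff remote_console'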
I am currently in the midst of doing distributed load testing on Amazon's EC2 service and have diligently followed all the documentation/forum/support advice on how to get things to work, but unfortunately find myself stuck at this point. No one in any of the relevant IRC channels has been able to answer this either...
Here is what I am seeing:
I am at a point where I can get Tsung to work perfectly well if I run it solely on the controller itself, with these options:
From tsung.xml:
<client host="tester0" weight="8" maxusers="10000" cpu="4"/>
Also, it works with higher or lower CPU values.
I can also very easily get it to work locally using:
use_controller_vm="true"
but this is of no use to me now, as I cannot get the throughput that I'd like.
To get things working, I have SSH keys installed, and I have opened all ports on these servers [0 - 65535]. I have the exact same versions of Tsung and Erlang; in fact, everything on the servers is identical (they are images of each other).
Tsung version 1.4.2
Erlang R15B01
Ubuntu 12.04LTS
Same EC2 security group (all ports open, both TCP & UDP, and no iptables or SELinux)
Same EC2 availability zone
When I start Tsung with only tester0 configured as above, it works, and ts_config_server starts newbeam with:
ts_config_server:(6:<0.84.0>) starting newbeam on host tester0 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
However, whenever I try to run this with any remote server, the entire test fails and I get zero users:
<client host="tester1" weight="8" maxusers="10000" cpu="1"/>
ts_config_server:(6:<0.84.0>) starting newbeam on host tester1 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451924"
However, when I try to run it with TWO clients (i.e., as below):
<client host="tester0" weight="8" maxusers="10000" cpu="1"/>
<client host="tester1" weight="8" maxusers="10000" cpu="1"/>
I once again get zero users hitting my web servers. I'm not sure why, and this is not intuitive to me at all.
ts_config_server:(6:<0.84.0>) starting newbeam on host tester1 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
ts_config_server:(6:<0.85.0>) starting newbeam on host tester0 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
One thing that I do notice is that, of all the paths passed to slave:start, only ONE does not exist, and that is the one following the -boot directive:
/usr/lib/erlang//lib/tsung-1.4.2/priv/tsung
Rather, in that directory, I have only the following files:
:~$ ls /usr/lib/erlang//lib/tsung-1.4.2/priv
tsung.boot tsung_controller.rel tsung_recorder.boot tsung_recorder.script
tsung_controller.boot tsung_controller.script tsung_recorder_load.boot tsung.rel
tsung_controller_load.boot tsung_load.boot tsung_recorder_load.script tsung.script
tsung_controller_load.script tsung_load.script tsung_recorder.rel
The last thing I've tried is to log what happens in the ssh session when I call slave:start, but I get no output. I did this by running:
erl -rsh ./ssh_log_me -sname tsung
Where ssh_log_me is:
#!/bin/sh
echo "$0" "$#" > /tmp/my-ssh.log
ssh -v "$#" 2>&1 | tee -a /tmp/my-ssh.log
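For the wrapper to be picked up via -rsh it needs to be executable and referenced by a path, as above; something like:
chmod +x ./ssh_log_me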
But I get no output when I run:
(tsung@tester0)1> slave:start_link(tester1, tsung, " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751").
{error,timeout}
I have looked through Erlang's -boot option and through the actual Erlang code (for ts_config_server), but I am a little lost at this point and might just be missing one last piece of information.
I ask that you please take a look at my xml file here: http://pastebin.com/2MEbL6gd
I recompiled using the most recent version from git and it worked; odd that it didn't work with my installed deb package...
Goes to show you: compile from source when unsure!
1. Ensure SSH host key verification is disabled. In ~/.ssh/config:
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
2. Make sure all ports are accessible between the controller and worker nodes. If they are in the cloud, make sure the firewall or security groups allow all ports.
3. Erlang and Tsung must be the same version on all machines.
4. Ensure all machines are reachable from each other.
5. Run an Erlang test:
erl -rsh ssh -name subbu -setcookie tsung
Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
(daya@ip-10-0-100-224.ec2.internal)1> slave:start("worker1.com",bar,"-setcookie tsung").
Warning: Permanently added 'worker1,10.0.100.225' (ECDSA) to the list of known hosts.
{ok,bar@worker1}
Run this test from the controller to all worker nodes (a non-interactive variant is sketched below).
You should be able to run tests without any issues.
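A non-interactive variant of the step-5 check that can be repeated against each worker (a sketch; worker1.com is a placeholder hostname and tsung is the assumed cookie):
erl -rsh ssh -name subbu -setcookie tsung -noshell -eval 'io:format("~p~n", [slave:start("worker1.com", bar, "-setcookie tsung")]), init:stop().'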
Good Luck!
Subbu