I am new to Erlang and RabbitMQ.
I have a RabbitMQ node on CentOS which I had to reset in order to restart the message queues. Ever since the restart, Erlang refuses to start the node. There was an erlang_vm corrupted error, which went away after removing and restarting RabbitMQ. I've tried net_kernel:start in the Erlang shell, but it fails.
[root@directadmin ~]# erl
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
1> node().
nonode@nohost
2> net_kernel:start([rabbit, shortnames]).
{error,
 {{shutdown,
      {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
  {child,undefined,net_sup_dynamic,
      {erl_distribution,start_link,[[rabbit,shortnames]]},
      permanent,1000,supervisor,
      [erl_distribution]}}}
3>
=INFO REPORT==== 26-Jan-2017::18:58:36 ===
Protocol: "inet_tcp": the name rabbit#directadmin seems to be in use by another Erlang node
I've noticed that someone else had a similar issue, and they said that fixing a rule set in iptables resolved it. I am not sure how that is done. I've tried service iptables restart, but that didn't make any difference.
http://erlang.org/pipermail/erlang-questions/2015-October/086270.html
When I try to run rabbitmqctl stop_app I get this error:
[root@directadmin ~]# rabbitmqctl stop_app
Stopping node rabbit@directadmin ...
Error: erlang_vm_restart_needed
When I try running 'rabbitmqctl stop' I get the VM-corrupted error:
[root@directadmin ~]# rabbitmqctl stop
Stopping and halting node rabbit@directadmin ...
Error: {badarg,[{io,format,
                 [standard_error,
                  "Erlang VM I/O system is damaged, restart needed~n",[]],
                 []},
                {rabbit_log,handle_damaged_io_system,0,
                 [{file,"src/rabbit_log.erl"},{line,110}]},
                {rabbit_log,with_local_io,1,
                 [{file,"src/rabbit_log.erl"},{line,95}]},
                {rabbit,'-stop_and_halt/0-after$^0/0-0-',0,
                 [{file,"src/rabbit.erl"},{line,434}]},
                {rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,431}]},
                {rpc,'-handle_call_call/6-fun-0-',5,
                 [{file,"rpc.erl"},{line,187}]}]}
The disk was full, possibly because of the errors being written to the log files. I deleted the logs that occupied the most space in /var/log, then ran yum erase erlang followed by a clean reinstall of Erlang and RabbitMQ. This resolved the issue. Thank you everyone for your contribution!
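For reference, a sketch of the recovery steps described above (paths assume a default CentOS layout; the RabbitMQ log location is an assumption):
df -h                               # confirm the disk is full
du -sh /var/log/* | sort -h         # find which logs occupy the most space
rm /var/log/rabbitmq/*.log          # free space (assumed log location)
yum erase erlang                    # remove Erlang
yum install erlang rabbitmq-server  # clean reinstall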
You need rabbitmqctl stop, not just rabbitmqctl stop_app.
According to the documentation, stop_app "stops the RabbitMQ application, leaving the Erlang node running", while stop "stops the Erlang node on which RabbitMQ is running".
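In other words, both commands come from the same rabbitmqctl CLI but differ in scope:
rabbitmqctl stop_app    # stops the RabbitMQ application; the Erlang node keeps running
rabbitmqctl stop        # stops the Erlang node on which RabbitMQ is running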
The issue comes from the fact that epmd is not started. You need to start epmd manually, or start it implicitly by providing a node name when launching erl. This is not specific to RabbitMQ; it applies to Erlang distribution in general.
http://erlang.org/documentation/doc-8.0/erts-8.0/doc/html/epmd.html
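A minimal sketch of both options; giving erl a node name makes it start epmd automatically:
epmd -daemon       # option 1: start epmd explicitly as a daemon
erl -sname rabbit  # option 2: a named node starts epmd as a side effect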
Related
Fedora 31.
EMQX does not start as a service.
It starts successfully from the console.
=====
===== LOGGING STARTED Wed Apr 8 13:12:55 +05 2020
=====
Exec: /usr/lib/emqx/erts-10.5/bin/erlexec -boot /usr/lib/emqx/releases/v4.0.5/emqx -mode embedded -boot_var ERTS_LIB_DIR /usr/lib/emqx/erts-10.5/../lib -mnesia dir "/var/lib/emqx/mnesia/emqx@127.0.0.1" -config /var/lib/emqx/configs/app.2020.04.08.13.12.56.config -args_file /var/lib/emqx/configs/vm.2020.04.08.13.12.56.args -vm_args /var/lib/emqx/configs/vm.2020.04.08.13.12.56.args -start_epmd false -epmd_module ekka_epmd -proto_dist ekka -- console
Root: /usr/lib/emqx
/usr/lib/emqx
Starting emqx on node emqx@127.0.0.1
Start http:management listener on 8081 successfully.
Start http:dashboard listener on 18083 successfully.
Start mqtt:tcp listener on 0.0.0.0:11883 successfully.
Start mqtt:tcp listener on 0.0.0.0:1883 successfully.
Start mqtt:ssl listener on 0.0.0.0:8883 successfully.
EMQ X Broker 4.0.5 is running now!
Eshell V10.5 (abort with ^G)
(emqx@127.0.0.1)1> Stop mqtt:tcp listener on 0.0.0.0:11883 successfully.
(emqx@127.0.0.1)1> Stop mqtt:tcp listener on 0.0.0.0:1883 successfully.
(emqx@127.0.0.1)1> Stop mqtt:ssl listener on 0.0.0.0:8883 successfully.
(emqx@127.0.0.1)1> [os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
What could be the problem?
Maybe your TCP port 8083 is already in use; please check it (see the sketch after the log below). Notice that in your log the mqtt:ws listener on 8083 never starts. A successful start looks like this:
Starting emqx on node emqx@127.0.0.1
Start http:management listener on 8081 successfully.
Start http:dashboard listener on 18083 successfully.
Start mqtt:tcp listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp listener on 0.0.0.0:1883 successfully.
Start mqtt:ws listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl listener on 0.0.0.0:8883 successfully.
Start mqtt:wss listener on 0.0.0.0:8084 successfully.
EMQ X Broker 4.1.0-alpha.2 is running now!
Eshell V10.6.1 (abort with ^G)
(emqx@127.0.0.1)1>
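A quick way to check whether something is already listening on 8083 (a sketch; use whichever tool your system ships):
ss -tlnp | grep 8083        # modern systems
netstat -tlnp | grep 8083   # older systems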
I have some strange behaviour in my Docker containers (CentOS). When I SSH into one, there's a running instance of the Erlang VM (api@127.0.0.1). I can't connect to it with the -remsh argument; however, I can ping it. My Erlang node (api@127.0.0.1) works correctly though.
bash-4.2# ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
Eshell V6.1 (abort with ^G)
(remote@127.0.0.1)1> node().
'remote@127.0.0.1'
(remote@127.0.0.1)2> net_adm:ping('api@127.0.0.1').
pong
(remote@127.0.0.1)3> erlang:system_info(system_version).
"Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]\n"
(remote@127.0.0.1)4> rpc:call('api@127.0.0.1', erlang, node, []).
'api@127.0.0.1'
There are two Linux processes running: one for the actual VM and another for the process that tries to invoke the remote shell.
26 ? Sl 40:46 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app/ -- -name api@127.0.0.1 -boot releases/14.2.0299/start -config sys -boot_var PATH lib -noshell
32542 ? Sl+ 0:00 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app -- -name remote@127.0.0.1 -remsh api@127.0.0.1
When I copy Erlang binary files to the host (Arch Linux) and run ./bin/erl I have different results:
[jodias@arch tmp]$ ./bin/erl
Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V6.1 (abort with ^G)
1>
Please note that the Erlang system version banner is printed here; that's missing in the Docker container (even though the Erlang binaries are exactly the same).
What is $TERM in the shell where you're trying to open the remote shell? There is a problem when TERM is absent or not known to the ncurses library Erlang is built against, making the remote shell connection fail silently. Try this one:
TERM=xterm ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
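To confirm this is the cause, check what the container actually sets:
echo $TERM    # often empty or "dumb" inside minimal containers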
I once reported the problem to the Erlang mailing list, but no answer came up. Now I see this issue is in the Erlang issue tracker. Please vote for it to be picked up by the OTP team ;)
This is embarrassing, but I am totally stuck and have wasted the better part of this morning. I have an Erlang app release created by relx, deployed and running in a Docker container. I need to get to the shell on the running node, but I'm failing to do so. Here is what happens:
$ docker exec -it 770b497d7f27 /bin/bash
[root@ff /]# /app/bin/ff
Usage: ff {start|start_boot <file>|foreground|stop|restart|reboot|pid|ping|console|console_clean|console_boot <file>|attach|remote_console|upgrade|escript|rpc|rpcterms}
[root@ff /]# /app/bin/ff ping
pong
[root@ff /]# /app/bin/ff attach
Can't access pipe directory /tmp/erl_pipes/ff@127.0.0.1/: No such file or directory
[root@ff /]# /app/bin/ff remote_console
Eshell V7.1 (abort with ^G)
(remshfbfbd4dd-ff@127.0.0.1)1> ^G
Eshell V7.1 (abort with ^G)
(remshfbfbd4dd-ff@127.0.0.1)1>
And that's it; all I can do is exit with q().
There is no erl_pipes in /tmp.
Control-G seems to be captured by Docker. I cannot get to the "User switch command" menu.
Even running a pure Erlang shell is not so easy:
[root@ff /]# /app/erts-7.1/bin/erl
{"init terminating in do_boot",{'cannot get bootfile','/app/bin/start.boot'}}
Crash dump is being written to: erl_crash.dump...done
init terminating in do_boot ()
I have run out of ideas. Any help would be appreciated.
Found a workaround: I managed to get ^G working by overriding the default "dumb" terminal Docker sets:
export TERM=xterm
After this, ^G works, starting a remote shell works, and I'm a happy camper! I would be glad to know why neither the attach nor the remote_console commands worked, though.
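For a one-off session the override can also be inlined (a sketch using the same ff release script):
TERM=xterm /app/bin/ff remote_console
As for attach: it relies on the pipes that run_erl creates when the release is started in the background with ff start; a node running in console or foreground mode has no pipes, which would explain the missing /tmp/erl_pipes directory.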
In the shell I typed bin/dev page foo and the shell returned Node is not running. I checked my logs and noticed the message epmd: epmd: node name already occupied nitrogen
Then, in the shell, I typed epmd -names and it returned
epmd: up and running on port 4369 with data:
name nitrogen at port 61109
Running epmd -debug gives
epmd: Thu Jun 27 01:01:52 2013: epmd running - daemon = 0
epmd: Thu Jun 27 01:01:52 2013: there is already a epmd running at port 4369
I cannot stop the node, and when I try, it is apparently still active in the database:
epmd: local epmd responded with <>
Killing not allowed - living nodes in database.
In Eshell, I received the following
=ERROR REPORT==== 27-Jun-2013::00:49:53 ===
** Connection attempt from disallowed node 'nitrogen_maint_19141@127.0.0.1' **
Is there a method to get Eshell to recognize this node, in order to run the bin/dev function?
I've noticed you posting on the Nitrogen mailing list, and as I understand it, you've got it straightened out. But in this situation, I'd kill the running node manually: find it with ps aux | grep nitrogen, then kill the process it finds with a simple kill XYZ.
That, or I've seen the "Node is not running" message pop up when the process was launched by a different user, such that you don't have access to the Erlang pipe.
Admittedly, my advice isn't terribly scientific (killing a process is pretty nasty), but it's a simple solution if, for whatever reason, something got hosed during launch and you're unable to attach to the node.
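Concretely, the manual cleanup might look like this (the PID is hypothetical; take it from the ps output):
ps aux | grep [n]itrogen    # find the stale node's beam process
kill <pid>                  # replace <pid> with the PID found above
epmd -names                 # verify the name is no longer registered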
I am trying to setup a tsung cluster on two ec2 instances:
Master - ip-10-212-101-85.ec2.internal
Slave - ip-10-116-39-86.ec2.internal
Both have Erlang (R15B) and Tsung (1.4.2) installed, and the install path is the same on both of them.
I can SSH from the master to the slave and vice versa without a password.
The firewall is stopped on both machines (service iptables stop).
On the master, the attempt to start an Erlang slave agent results in {error,timeout}:
[root@ip-10-212-101-85 ~]# erl -rsh ssh -sname foo -setcookie mycookie
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9 (abort with ^G)
(foo@ip-10-212-101-85)1> slave:start('ip-10-116-39-86',bar,"-setcookie mycookie").
{error,timeout}
On the slave, the beam comes up for a few seconds, then crashes. The erl_crash.dump can be found here
I am stuck with this error; any clue will be very helpful.
PS:
On both machines the /etc/hosts file is the same; it looks like this:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.212.101.85 ip-10-212-101-85.ec2.internal
10.116.39.86 ip-10-116-39-86.ec2.internal
It looks like "service iptables stop" on the individual nodes is not sufficient.
In the Security Group applied to the VMs, I added a new rule that opens the port range 0 - 65535 for all.
This solved the problem.
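If opening every port is undesirable, a narrower alternative (a sketch, not tested here) is to pin the distribution port range with the kernel parameters inet_dist_listen_min/inet_dist_listen_max and open only epmd's port 4369 plus that range in the Security Group:
erl -rsh ssh -sname foo -setcookie mycookie \
    -kernel inet_dist_listen_min 9100 \
    -kernel inet_dist_listen_max 9105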
If that's all verbatim, then the problem is likely a typo in the cookie flag, i.e. slave:start('ip-10-116-39-86',bar,"-sttcookie mycookie"). Try slave:start('ip-10-116-39-86',bar,"-setcookie mycookie"). instead.