Connecting to a running Erlang application release in a docker container - docker

This is embarrassing, but I am totally stuck and wasted the better part of this morning. I have an Erlang app release created by relx, deployed and running in a Docker container. I need to get to the shell on the running node, but I'm failing to do so. Here is what happens:
$ docker exec -it 770b497d7f27 /bin/bash
[root@ff /]# /app/bin/ff
Usage: ff {start|start_boot <file>|foreground|stop|restart|reboot|pid|ping|console|console_clean|console_boot <file>|attach|remote_console|upgrade|escript|rpc|rpcterms}
[root@ff /]# /app/bin/ff ping
pong
[root@ff /]# /app/bin/ff attach
Can't access pipe directory /tmp/erl_pipes/ff@127.0.0.1/: No such file or directory
[root@ff /]# /app/bin/ff remote_console
Eshell V7.1 (abort with ^G)
(remshfbfbd4dd-ff@127.0.0.1)1> ^G
Eshell V7.1 (abort with ^G)
(remshfbfbd4dd-ff@127.0.0.1)1>
And that's it; all I can do is exit with q().
There is no erl_pipes directory in /tmp.
Control-G seems to be captured by Docker, so I cannot get to the "User switch command" menu.
Even running a pure Erlang shell is not so easy:
[root@ff /]# /app/erts-7.1/bin/erl
{"init terminating in do_boot",{'cannot get bootfile','/app/bin/start.boot'}}
Crash dump is being written to: erl_crash.dump...done
init terminating in do_boot ()
I have run out of ideas. Any help would be appreciated.

Found a workaround: I managed to get ^G working by overriding the default "dumb" terminal Docker sets:
export TERM=xterm
After this ^G works, starting a remote shell works, and I'm a happy camper! Would be glad to know why neither the attach nor the remote_console commands work though.
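For anyone hitting this later, the TERM override can be baked into a small guard, for example in the container's shell profile. This is just a sketch; the xterm fallback is an assumption, and any terminfo entry your image actually ships will do:

```shell
# If Docker handed us a "dumb" terminal, override it so ^G and the
# Erlang shell's line editing work; xterm is assumed to be in terminfo.
case "${TERM:-dumb}" in
  dumb|"") export TERM=xterm ;;
esac
echo "TERM is now: $TERM"
```

You can also pass the variable at exec time, e.g. `docker exec -it -e TERM=xterm <container> /app/bin/ff remote_console`, so the container itself stays untouched.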

Related

docker - driver "devicemapper" failed to remove root filesystem after process in container killed

I am using Docker version 17.06.0-ce on Red Hat with devicemapper storage. I am launching a container running a long-running service. The master process inside the container sometimes dies, for whatever reason, and I get the following error message.
/bin/bash: line 1: 40 Killed python -u scripts/server.py start go
I would like the container to exit and be restarted by Docker. However, the container never exits. If I remove it manually I get the following error:
Error response from daemon: driver "devicemapper" failed to remove root filesystem.
After googling, I tried a bunch of things:
docker rm -f <container>
rm -f <path to mount>
umount <path to mount>
All of these fail with "device is busy". The only remedy right now is to reboot the host system, which is obviously not a long-term solution.
Any ideas?
I had the same problem and the solution was a real surprise.
So here is the error on docker rm:
$ docker rm 08d51aad0e74
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 08d51aad0e74060f54bba36268386fe991eff74570e7ee29b7c4d74047d809aa: remove /var/lib/docker/devicemapper/mnt/670cdbd30a3627ae4801044d32a423284b540c5057002dd010186c69b6cc7eea: device or resource busy
Then I did the following (basically go through all processes and look for docker in mountinfo):
$ grep docker /proc/*/mountinfo | grep 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
/proc/20416/mountinfo:629 574 253:15 / /var/lib/docker/devicemapper/mnt/958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,relatime shared:288 - xfs /dev/mapper/docker-253:5-786536-958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
This got me the PID of the offending process keeping the mount busy: 20416 (the number after /proc/).
So I did a ps -p and, to my surprise, found:
[devops@dp01app5030 SeGrid]$ ps -p 20416
PID TTY TIME CMD
20416 ? 00:00:19 ntpd
A true WTF moment. A bit of googling then led me to this: https://github.com/docker/for-linux/issues/124
Turns out I had to restart the ntp daemon, and that fixed the issue!!!
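The manual grep through /proc can be wrapped in a small loop. A sketch (the mount id below is the one from the daemon's error message; substitute your own):

```shell
# Walk every process's mountinfo and report PIDs that still hold a
# reference to the given devicemapper mount id (keeping it "busy").
MNT_ID=958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
for f in /proc/[0-9]*/mountinfo; do
  if grep -q "$MNT_ID" "$f" 2>/dev/null; then
    pid=${f#/proc/}; pid=${pid%/mountinfo}
    printf '%s\t%s\n' "$pid" "$(ps -p "$pid" -o comm= 2>/dev/null)"
  fi
done
```

Once the offender is identified (ntpd in this case), restarting that service releases the mount.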

Erlang net_kernel fails to start node (nodistribution)

I am new to Erlang and RabbitMQ.
I have a RabbitMQ node on CentOS which I had to reset to restart the message queues. Ever since the restart, Erlang refuses to start the node. There was an erlang_vm corrupted error that was fixed by removing and restarting RabbitMQ. I've tried net_kernel:start in the Erlang shell, but it fails.
[root@directadmin ~]# erl
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
1> node().
nonode@nohost
2> net_kernel:start([rabbit, shortnames]).
{error,
{{shutdown,
{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
{child,undefined,net_sup_dynamic,
{erl_distribution,start_link,[[rabbit,shortnames]]},
permanent,1000,supervisor,
[erl_distribution]}}}
3>
=INFO REPORT==== 26-Jan-2017::18:58:36 ===
Protocol: "inet_tcp": the name rabbit@directadmin seems to be in use by another Erlang node
I've noticed that someone else had a similar issue, and they said that fixing the rule set in iptables resolved it. I am not sure how that is done. I've tried service iptables restart, but that didn't make any difference.
http://erlang.org/pipermail/erlang-questions/2015-October/086270.html
When I try to run rabbitmqctl stop_app I get this error:
[root@directadmin ~]# rabbitmqctl stop_app
Stopping node rabbit@directadmin ...
Error: erlang_vm_restart_needed
When I try running 'rabbitmqctl stop' I get the VM corrupted error:
[root@directadmin ~]# rabbitmqctl stop
Stopping and halting node rabbit@directadmin ...
Error: {badarg,[{io,format,
[standard_error,
"Erlang VM I/O system is damaged, restart needed~n",[]],
[]},
{rabbit_log,handle_damaged_io_system,0,
[{file,"src/rabbit_log.erl"},{line,110}]},
{rabbit_log,with_local_io,1,
[{file,"src/rabbit_log.erl"},{line,95}]},
{rabbit,'-stop_and_halt/0-after$^0/0-0-',0,
[{file,"src/rabbit.erl"},{line,434}]},
{rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,431}]},
{rpc,'-handle_call_call/6-fun-0-',5,
[{file,"rpc.erl"},{line,187}]}]}
The disk was full, maybe due to the errors being written to log files. I deleted the logs that occupied the most space in /var/log, then ran yum erase erlang followed by a clean reinstall of Erlang and RabbitMQ. This resolved the issue. Thank you everyone for your contributions!
You need rabbitmqctl stop, not just rabbitmqctl stop_app.
According to the documentation, stop_app "stops the RabbitMQ application, leaving the Erlang node running", while stop "stops the Erlang node on which RabbitMQ is running".
The issue comes from the fact that epmd is not started.
You need to start epmd manually, or have it started implicitly by providing a node name when launching erl. This is not specific to RabbitMQ; it applies to Erlang distribution in general.
http://erlang.org/documentation/doc-8.0/erts-8.0/doc/html/epmd.html
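A minimal pre-flight check along those lines (a sketch; assumes the epmd binary from your Erlang install is on PATH):

```shell
# epmd is the Erlang port mapper; distributed nodes register with it.
# -daemon starts it in the background (a no-op if already running),
# -names lists registered node names and their ports.
if command -v epmd >/dev/null 2>&1; then
  epmd -daemon
  epmd -names
else
  echo "epmd not found; is Erlang on PATH?"
fi
```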

solr 6.3.0 not starting Ubuntu 14.04

I am trying to run Solr on my machine, and I have everything it needs in place.
For example, the Java and Ruby versions match what the tutorials ask for.
This is how I am doing it.
solr_wrapper -d solr/config/ --collection_name hydra-development --version 6.3.0
This throws the following error:
`exec': Failed to execute solr start: (RuntimeError)
Port 8983 is already being used by another process (pid: 1814)
Please choose a different port using the -p option.
The error message clearly indicates that some other process is using port 8983. You need to find that process and kill it.
First run:
$ lsof -i :8983
This will list the applications running on port 8983. Let's say the PID of the process is 1814.
Then run:
$ sudo kill 1814
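The two steps can be combined, with a guard for the case where nothing actually holds the port (a sketch; the port number is the Solr default from the error above):

```shell
# Print the PID(s) listening on the port and kill them, but only if
# lsof actually found something (kill with no argument is an error).
PORT=8983
pids=$(lsof -t -i ":$PORT" 2>/dev/null)
if [ -n "$pids" ]; then
  sudo kill $pids
else
  echo "nothing is listening on port $PORT"
fi
```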
If you run into Error CREATEing SolrCore, it is mostly because of permission issues caused by installing as root.
First clean up the broken core:
bin/solr delete -c mycore
and then recreate the core as the solr user:
su solr -c "/opt/solr/bin/solr create_core -c mycore"

Erlang remote shell not working

I have some strange behaviour on my Docker containers (CentOS). When I SSH into one, there's a running instance of the Erlang VM (api@127.0.0.1). I can't connect to it with the -remsh argument; however, I can ping it. My Erlang node (api@127.0.0.1) works correctly though.
bash-4.2# ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
Eshell V6.1 (abort with ^G)
(remote@127.0.0.1)1> node().
'remote@127.0.0.1'
(remote@127.0.0.1)2> net_adm:ping('api@127.0.0.1').
pong
(remote@127.0.0.1)3> erlang:system_info(system_version).
"Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]\n"
(remote@127.0.0.1)4> rpc:call('api@127.0.0.1', erlang, node, []).
'api@127.0.0.1'
There are two Linux processes running: one for the actual VM, and another for the process that tries to open the remote shell:
26 ? Sl 40:46 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app/ -- -name api@127.0.0.1 -boot releases/14.2.0299/start -config sys -boot_var PATH lib -noshell
32542 ? Sl+ 0:00 /home/vcap/app/bin/beam.smp -- -root /home/vcap/app -progname erl -- -home /home/vcap/app -- -name remote@127.0.0.1 -remsh api@127.0.0.1
When I copy the Erlang binaries to the host (Arch Linux) and run ./bin/erl, I get different results:
[jodias#arch tmp]$ ./bin/erl
Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V6.1 (abort with ^G)
1>
Please note that the Erlang system version banner is printed here, and that's missing in the Docker container (even though the Erlang binaries are exactly the same).
What is $TERM in the shell from which you're trying to open the remote shell? There is a problem when TERM is absent or not known to the ncurses Erlang was built against, making the remote shell connection fail silently. Try this:
TERM=xterm ./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
I once reported the problem to the Erlang mailing list, but no answer came up. Now I see this issue is in the Erlang issue tracker; please vote for it so it gets picked up by the OTP team ;)
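A quick way to test that theory before touching Erlang at all (a sketch; infocmp ships with ncurses):

```shell
# Print the current TERM and check whether ncurses has a terminfo
# entry for it; infocmp exits non-zero for unknown or unset terminals.
echo "TERM=${TERM:-<unset>}"
if ! infocmp "${TERM:-dumb}" >/dev/null 2>&1; then
  echo "no terminfo entry for '${TERM:-dumb}'; try: export TERM=xterm"
fi
```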

Unable to set-up tsung cluster on EC2 - Erlang crash

I am trying to set up a Tsung cluster on two EC2 instances:
Master - ip-10-212-101-85.ec2.internal
Slave - ip-10-116-39-86.ec2.internal
Both have Erlang (R15B) and Tsung (1.4.2) installed, and the install path is the same on both.
I can SSH from the master to the slave and vice versa without a password.
The firewall is stopped on both machines (service iptables stop).
On the master, the attempt to start an Erlang slave agent results in {error,timeout}:
[root@ip-10-212-101-85 ~]# erl -rsh ssh -sname foo -setcookie mycookie
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9 (abort with ^G)
(foo@ip-10-212-101-85)1> slave:start('ip-10-116-39-86',bar,"-setcookie mycookie").
{error,timeout}
On the slave, the beam comes up for a few seconds, then crashes. The erl_crash.dump can be found here
I am stuck with error, any clue will be very helpful.
PS:
On both machine the /etc/hosts is same, the file looks like below:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.212.101.85 ip-10-212-101-85.ec2.internal
10.116.39.86 ip-10-116-39-86.ec2.internal
Looks like "service iptables stop" on individual nodes is not sufficient.
In the Security Group applied to the VMs, I added a new rule that opens the port range 0 - 65535 for all.
This solved the problem.
If that's all verbatim, then the problem is likely slave:start('ip-10-116-39-86',bar,"-sttcookie mycookie"). - Try slave:start('ip-10-116-39-86',bar,"-setcookie mycookie"). instead.
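To confirm connectivity rather than guessing at firewall state, you can probe the distribution ports from the master. A sketch (assumes nc is installed; the host name is taken from the question):

```shell
# Erlang distribution needs epmd's fixed port 4369 plus the dynamic
# node ports; if 4369 is blocked, slave:start/3 times out exactly
# like this. -z just probes the port, -w 3 gives a 3-second timeout.
SLAVE=ip-10-116-39-86.ec2.internal
if nc -z -w 3 "$SLAVE" 4369 2>/dev/null; then
  echo "epmd port reachable on $SLAVE"
else
  echo "cannot reach epmd on $SLAVE; check security groups"
fi
```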
