Erlang Supervisor Strategy For Restarting Connections to Downed Hosts - erlang

I'm using Erlang as a bridge between services, and I was wondering what advice people have for handling downed connections?
I'm taking input from local files and piping them out to AMQP, and it's conceivable that the AMQP broker could go down. In that case I would want to keep retrying to connect to the AMQP server, but I don't want to peg the CPU with those connection attempts. My inclination is to put a sleep into the restart of the AMQP code. Wouldn't that 'hack' essentially circumvent the point of failing quickly and letting Erlang handle it? More generally, should the Erlang supervisor behaviour be used for handling downed connections?

I think it's reasonable to code your own semantics for handling connections to an external server. Supervisors are best suited to handling crashed/locked/otherwise unhealthy processes in your own process tree, not reconnections to an external service.
Is the process that pipes in the local files in the same process tree as your AMQP code, or is it a separate service?
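The usual way to retry without pegging the CPU is capped exponential backoff with jitter, coded into your own reconnect loop rather than the supervisor's restart intensity. A minimal sketch of the delay schedule (in Python for brevity; `connect_to_broker` in the usage comment is a hypothetical helper, not part of any AMQP client):

```python
import random

def backoff_delays(base=1.0, cap=30.0, factor=2.0):
    """Yield reconnect delays in seconds: capped exponential growth,
    with full jitter so many clients don't retry in lockstep."""
    ceiling = base
    while True:
        # Sleep a random amount up to the current ceiling, then raise it.
        yield random.uniform(0, ceiling)
        ceiling = min(cap, ceiling * factor)

# Hypothetical usage inside a reconnect loop:
# for pause in backoff_delays():
#     try:
#         conn = connect_to_broker()   # assumed helper
#         break
#     except ConnectionError:
#         time.sleep(pause)
```

The same schedule translates directly to a `gen_server` that sends itself a delayed `reconnect` message via `erlang:send_after/3`, which keeps the process responsive instead of blocking in a sleep.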

Related

Preventing uwsgi_response_write_body_do() TIMEOUT

We use uwsgi with the python3 plugin, under nginx, to serve potentially hundreds of megabytes of data per query. Sometimes, when nginx is queried by a client on a slow network connection, a uwsgi worker dies with "uwsgi_response_write_body_do() TIMEOUT !!!".
As I understand it, the uwsgi Python plugin reads from the iterator our app returns as fast as it can, trying to send the data over the uwsgi protocol Unix socket to nginx. The HTTPS/TCP connection from nginx to the client gets backed up by the slow network connection, and nginx pauses reading from its uwsgi socket. uwsgi then fails some writes towards nginx, logs that message, and dies.
Normally we run nginx with uwsgi buffering disabled. I tried enabling buffering, but it doesn't help as the amount of data it might need to buffer is 100s of MBs.
Our data is not simply read out of a file, so we can't use file offload.
Is there a way to configure uwsgi to pause reading from our Python iterator if that Unix socket backs up?
The existing question here, "uwsgi_response_write_body_do() TIMEOUT - But uwsgi_read_timeout not helping", doesn't help, as we have buffering off.
To answer my own question: adding socket-timeout = 60 helps for all but the slowest client connection speeds.
That's sufficient, so this question can be closed.
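In uwsgi ini terms, the fix might look like the fragment below. Only `socket-timeout` comes from the answer above; the surrounding keys are an assumed minimal layout, not the poster's actual config:

```ini
[uwsgi]
plugin = python3
socket = /run/app.sock
; Give nginx up to 60 seconds to drain the Unix socket before the
; worker gives up with "uwsgi_response_write_body_do() TIMEOUT".
socket-timeout = 60
```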

Restcomm jDiameter: Error creating SCTP socket

I am trying to create a standalone SCTP Diameter client using jDiameter. The jar libraries I am using are jdiameter-api-1.5.9.0-build538-SNAPSHOT and jdiameter-impl-1.5.9.0-build538-SNAPSHOT.
But I get this error: Unable to create server socket for LocalPeer 'client.test.com' at 127.0.0.1:55555 (org.mobicents.protocols.api.AssociationListener)
It works fine with TCP. I tried to debug but couldn't figure out the problem. Kindly help me with this.
SCTP will not work on Windows systems. For Linux systems, you might have to install the SCTP stack. However, be aware that on some Linux distributions you might run into strange issues with it, e.g. the port still being blocked even after all server sockets and client sockets are closed and the processes have been shut down or killed. In these cases, you need to wait about 5-10 minutes until the SCTP stack recognizes that no one is interested in that port anymore and releases it by itself.
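A quick way to check whether the kernel actually supports SCTP before blaming the Java side is to try creating a raw one-to-one SCTP socket. A small probe, sketched in Python and assuming a POSIX host:

```python
import socket

def sctp_available():
    """Return True if the OS lets us create a one-to-one (SOCK_STREAM)
    SCTP socket, i.e. the SCTP stack is present and loaded."""
    if not hasattr(socket, "IPPROTO_SCTP"):
        # The platform's libc headers don't even define the protocol.
        return False
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                          socket.IPPROTO_SCTP)
    except OSError:
        # Typically EPROTONOSUPPORT: the kernel sctp module isn't loaded.
        return False
    s.close()
    return True
```

If this returns False on Linux, loading the kernel module (e.g. `modprobe sctp`) and installing the userland tools (commonly packaged as lksctp-tools) is usually the fix; package names vary by distribution.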

How does puma master process transfer the request to workers?

I've been searching for an answer on this but I couldn't find one.
How does the Puma master process communicate with the workers? How does the master process send a request to a worker? Is this done with shared memory? A Unix socket?
Thanks!
The master doesn't deal with requests; it merely monitors the workers and restarts them when necessary.
The workers, independently, pull requests from some queueing mechanism, e.g. a TCP port or Unix socket.

Short Circuit Erlang Port Mapper Daemon

Given a known TCP port and name for a remote beam.smp service, as well as a known cookie, is it possible to short circuit the Erlang Port Mapper Daemon handshake phase of the Erlang distribution protocol and establish an Erlang shell directly to the target beam.smp service?
The protocol is documented here:
http://erlang.org/doc/apps/erts/erl_dist_protocol.html
And here:
https://github.com/blackberry/Erlang-OTP/blob/master/lib/kernel/internal_doc/distribution_handshake.txt
But it is not clear to me if the recv_challenge/send_challenge authentication occurs via the Erlang Port Mapper Daemon or the beam.smp service bound to a specific port.
Thank you for your time.
Authentication occurs between Erlang VMs (beam or beam.smp); epmd only handles port registration. Short-circuiting epmd is not exactly easy, though, and other approaches might be more appropriate to your actual need.
Unfortunately, bypassing epmd is not an option with the default distribution protocol (inet_tcp_dist) or its SSL counterpart. There are two undocumented options that look as if they let you disable epmd (-no_epmd) or provide an alternative implementation (epmd_module); however, the distribution protocols' dependency on epmd is hard-coded and does not honor these options.
So you could:
override the erl_epmd module at the code server level (probably the dirtiest approach);
provide an alternative distribution protocol which would copy (or call) inet_tcp_dist except for the parts where erl_epmd is called. Mainly, you need to provide your own implementation of setup/5.
If you don't want the shell node to connect to epmd for registering its name, you will also need to override listen/1. In this case, you can pass -no_epmd to the command line.
Alternatively, you can connect to epmd to register the listening node in order to create a shell connection using the default protocol.
This approach is particularly useful if epmd lost track of a node (e.g. it was killed; unfortunately, epmd is a single point of failure). To do so:
Create a TCP connection to epmd and send a packet to register the lost node with its known port and name. Keep the TCP connection open or epmd will unregister the node.
Connect a new shell to the lost node using the name used in previous step.
You can then close the connection established in (1) and eventually re-register the lost node to epmd by calling erl_epmd:register_node/2 (and sending a well-crafted tcp_closed message if required).
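Step (1) above amounts to crafting an ALIVE2_REQ packet by hand. A sketch of the encoder (Python; field layout follows the erl_dist_protocol documentation linked above, with the pre-OTP-23 default distribution versions 5/5 as an assumption you should verify against your OTP release):

```python
import struct

def alive2_req(name: bytes, port: int, node_type=77, proto=0,
               hi_ver=5, lo_ver=5, extra=b""):
    """Build an EPMD ALIVE2_REQ (tag 120) registering node `name`
    as listening on `port`. node_type 77 = normal (visible) node,
    proto 0 = TCP/IPv4. The whole request is length-prefixed."""
    body = struct.pack(">BHBBHHH", 120, port, node_type, proto,
                       hi_ver, lo_ver, len(name))
    body += name + struct.pack(">H", len(extra)) + extra
    return struct.pack(">H", len(body)) + body
```

Sending this over a TCP connection to epmd (127.0.0.1:4369) and keeping that socket open implements step (1): epmd unregisters the name as soon as the connection drops, which is exactly why the answer says to hold it open until the shell has connected.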

MPI error due to Timeout in making connection to a remote process

I'm trying to run a NAS-UPC benchmark to study its profile. UPC uses MPI to communicate with remote processes.
When I run the benchmark with 64 processes, I get the following error:
upcrun -n 64 bt.C.64
"Timeout in making connection to remote process on <<machine name>>"
Can anybody tell me why this error occurs ?
This probably means that you're failing to spawn the remote processes. upcrun delegates that to a per-conduit mechanism, which may involve your scheduler (if any). My guess is that you're depending on ssh-type remote access, and that's failing, probably because you don't have keys, an agent, or host-based trust set up. Can you ssh to your remote nodes without a password? Is the environment on the remote nodes sane (paths, etc.)?
"upcrun -v" may illuminate the problem, even without resorting to the man page ;)
