Erlang: automatic population of .hosts.erlang file?

I am using net_adm:world() to connect to nodes on other hosts, but the only way I have gotten this to work is by manually creating a .hosts.erlang file and listing the name of the other host in it. If I had 10 hosts, I would have to put this file on all ten machines and update all ten copies every time a new host is added to the cluster.
Is there no way this file can be automatically updated each time a connection to a Node on a new host is made?

Your .hosts.erlang file doesn't need to be complete or 100% correct. A node only needs to connect to one other node to learn about every other node in the cluster.
You could skip maintaining the .hosts.erlang file and use multicast UDP to dynamically discover nodes. See nodefinder for example code.
We started down the multicast UDP route but then decided to just maintain a central hosts file and use rsync to distribute it to all hosts. We restart nodes infrequently, so it hasn't been a big problem.

We use Chef to prepopulate the .hosts.erlang file for nodes that belong to a cluster. The function net_adm:world() can be used to determine the nodes that are currently part of a cluster, which does not necessarily match what is contained in .hosts.erlang, e.g., when one of the nodes is down. An alternative to net_adm:world() is net_adm:world_list(Hosts), which takes a list of hosts (instead of reading from .hosts.erlang) and otherwise does the same thing to determine the currently connected nodes.
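For reference, a minimal sketch of both approaches; the host names below are placeholders:

    %% .hosts.erlang -- one host name per line, written as a quoted atom
    %% and terminated by a period:
    'host1.example.com'.
    'host2.example.com'.

    %% Ping all nodes found on the hosts listed in .hosts.erlang:
    Nodes = net_adm:world().

    %% Or pass the host list explicitly and skip the file entirely:
    Nodes2 = net_adm:world_list(['host1.example.com', 'host2.example.com']).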

Related

Overriding configuration on a running Tarantool instance

Can anyone tell me whether it is possible to override individual box.cfg parameters on a running instance, for example to add a replica? For several days I have been trying to deploy three replicas on three hosts via a Docker service stack.
When I bring the instances up by hand on each server everything works, but when deployed through the stack they do not see each other and crash. I've tried all sorts of approaches. I put an endpoint on the target nodes that, when queried, returns the IP of the machine on which the container is coming up; if that IP matches one of those listed in SEED, the container's internal IP is substituted instead (otherwise it cannot connect to itself).
In theory it all works as I described, but I suspect the problem is that the instance does not bind the address until box.cfg is declared. Unfortunately, I cannot get inside the container because it never comes up. My idea is to start all three nodes with minimal settings, have them listen on the subnet as soon as they are up, and as soon as a node finds another one, add it to replication and override box.cfg. Please correct me, anyone who has experience with this.
Some of the box.cfg parameters are dynamic, for example box.cfg{listen=}. You can set this one from Lua code whenever you wish. In your case, if the container gets its IP address later, you only need to specify the port in listen; that way, Tarantool will listen on all available interfaces.
The replication_source is a bit trickier. You can set it dynamically, but your first (initializing) call to box.cfg should include replication_source. This is because any instance initialized without this parameter creates its own replica set, and that makes it impossible to join it to another replica set later.
You can read more about Tarantool replication architecture here: https://www.tarantool.io/en/doc/latest/book/replication/repl_architecture/

Is there a way to add a separate graph for each host on a Datadog dashboard, when the hosts frequently change?

I'm trying to make a dashboard to monitor a process which runs on 5 remote machines simultaneously. I want the dashboard to display the metrics for each machine separately - basically, I want to create five separate graphs, one for each machine that runs the process. My problem is that the remote machines are reassigned periodically, so I have no way of knowing the name of the host at any given time.
I've tried creating five separate graphs, with each one filtered by a different host name tag, but the graphs do not seem to pick up the new host when the lease for the process is changed. I also know you can split out one graph for each host using metrics explorer, but I haven't found any way to automatically do that on a dashboard. Does anyone know if this is possible? Leases for the process are assigned through AWS, if that is helpful.
Thanks in advance for any suggestions.

Need help setting up a dev/test Corda Network with docker

I want to set up an environment where I have several VMs, representing several partners, and where each VM hosts one or more nodes. Ideally, I would use Kubernetes to bring my environment up and down. I have understood from the docs that this has to be done as a dev network, not as my own compatibility zone or anything.
However, the steps to follow are not clear (to me). I have tried Dockerform and the Docker image provided, but neither seems to be the way to do what I need.
My current understanding (it changes by the hour) is that:
a) I should create a network between the VMs that will be hosting nodes. To do so, I understand I should use Cordite or the bootstrapper jar. The Cordite documentation seems clearer than the Corda docs, but I haven't been able to try it yet. Should one or the other be my first step? Can anyone shed some light on how?
b) Once I have my network created I need a certifying entity (Thanks #Chris_Chabot for pointing it out!)
c) The next step should be running deployNodes so I create the config files. Here, I am not sure whether I can indicate in deployNodes at which IPs the nodes should be created, or whether I just need to create the Dockerfiles, certificate folders and so on, and distribute them across the VMs accordingly. I am also not sure how to point the nodes at the network map service.
Personally, I guess that I will not use the Dockerfiles if I am going to use Kubernetes and that I only need to distribute the certificates and config files to all the slave VMs so they are available to the nodes when they are to be launched.
To be clear, and honest :D, this is even before including any CorDapp in the containers; I am just trying to get the environment ready. Basically: start a process that builds the nodes, distributes the config files among the slave VMs, and runs the Docker containers with the nodes. As explained in a comment, the goal here is not testing CorDapps, it is testing how to deploy an operational, distributed dev environment.
ANY help is going to be ABSOLUTELY welcome.
Thanks!
(Developer Relations @ R3 here)
A network of Corda nodes needs three things:
- A notary node, or a pool of multiple notary nodes
- A certification manager
- A network map service
The certification manager is the root of the trust in the network, and, well, manages certificates. These need to be distributed to the nodes to declare and prove their identity.
The nodes connect to the network map service, which checks their certificate to see whether they have access to the network and, if so, adds them to the list of nodes that it manages -- and distributes this list of node identities + IP addresses to all the nodes on that network.
Finally the nodes use the notaries to sign the transactions that happen on the network.
Generally we find that most people start developing on the https://testnet.corda.network/ network, and later deploy to the production corda.network.
One benefit of that is that this already comes with all these pieces (certification manager, network map, and a geographically distributed pool of notaries). The other benefit is that it guarantees that you have interoperability with other parties in the future, as everyone uses the same root certificate authority -- With your own network other 3rd parties couldn't just connect as they'd be on a different cert chain and couldn't be validated.
If however you have a strong reason to want to build your own network, you can use Cordite to provide the network map and certman services.
In that case step 1 is to go through the setup and configuration instructions on https://gitlab.com/cordite/network-map-service
Once that is fully set up and running, https://docs.corda.net/permissioning.html has some more information on how the certificates are set up, and the "Joining an existing Compatibility Zone" section in https://docs.corda.net/docker-image.html has instructions on how to get a Corda docker image / node to join that network by specifying which network map / certman URLs to use.
Oh, and on the IP question: the network map service stores a combination of the X509 identity and the IP address for each node and distributes it to the network -- this means that every participant, including the notaries, certman, network map and all other nodes, needs to be able to reach that IP address, either by all being on the same network that you created, or by having public IP addresses.

How to reconnect partitioned nodes in erlang cluster

Looking for some solutions to handle Erlang cluster partitions. Basically, whenever a cluster participant becomes reachable again, it should be added back to the cluster. The easiest solution is probably to use Erlang node monitoring.
Are there any other / better solutions, perhaps more dynamic ones that do not require a fixed node list?
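For reference, a minimal sketch of the node-monitoring approach; the module name, timeout and node list are illustrative, and the list of known nodes would still have to come from somewhere (configuration, a seed node, etc.):

    -module(reconnector).
    -export([start/1]).

    %% Spawn a process that subscribes to nodeup/nodedown events and
    %% periodically re-pings any known node that is not currently connected.
    start(Nodes) ->
        spawn(fun() ->
                  ok = net_kernel:monitor_nodes(true),
                  loop(Nodes)
              end).

    loop(Nodes) ->
        receive
            {nodedown, Node} ->
                error_logger:info_msg("lost contact with ~p~n", [Node]),
                loop(Nodes);
            {nodeup, Node} ->
                error_logger:info_msg("~p (re)joined the cluster~n", [Node]),
                loop(Nodes)
        after 5000 ->
            %% Retry every node from the known list that is not connected.
            [net_adm:ping(N) || N <- Nodes, not lists:member(N, nodes())],
            loop(Nodes)
        end.

Started with something like reconnector:start(['a@host1', 'b@host2']), this keeps retrying unreachable nodes every five seconds until they answer pong again.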
There are a few third-party libraries that don't have to be configured with a fixed node list. The two that I am familiar with are redgrid and erlang-redis_sd_epmd; there are probably others, but I'm just not familiar with them.
Both of these have an external dependency on Redis, which may or may not be desirable depending on what you decide is needed.
redgrid is the simpler implementation, but doesn't have a ton of features. Basically, the Erlang nodes connect to Redis, and all Erlang nodes connected to Redis then establish connections to each other. You can associate metadata with a node and retrieve it on another node.
erlang-redis_sd_epmd is a bit more complex, but allows a lot more configuration. For example, instead of just automatically connecting all nodes, a node can publish the services it can perform, and a connecting node can look up nodes based on the services provided.
Not an off-the-shelf solution, but if you're already doing custom mods to ejabberd, you can try integrating this code, which resolves Mnesia conflicts after cluster partitions.
https://github.com/uwiger/unsplit

Is this the right way of building an Erlang network server for multi-client apps?

I'm building a small network server for a multi-player board game using Erlang.
This network server uses a local instance of Mnesia DB to store a session for each connected client app. Inside each client's record (session) stored in this local Mnesia, I store the client's PID and NODE (the node where a client is logged in).
I plan to deploy this network server on at least 2 connected servers (Node A & B).
So, in order to allow Client A, who is logged in on Node A, to search (via a Mnesia query) for Client B, who is logged in on Node B, I replicate the Mnesia session table from Node A to Node B and vice versa.
After Client A has queried the PID and NODE of Client B, Clients A and B can communicate with each other directly.
Is this the right way of establishing connection between two client apps that are logged-in on two different Erlang nodes?
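For concreteness, a minimal sketch of the setup described above; the record, table and client names are illustrative, and the record definition assumes module context:

    %% Session record holding the client's pid and node.
    -record(session, {client_id, pid, node}).

    %% Create the table once, with a RAM copy on every connected node:
    {atomic, ok} = mnesia:create_table(session,
        [{attributes, record_info(fields, session)},
         {ram_copies, [node() | nodes()]}]).

    %% Client A looks up Client B; the pid alone is enough to message it,
    %% even if it lives on another node:
    [#session{pid = PidB}] = mnesia:dirty_read(session, client_b),
    PidB ! {chat, self(), <<"hello">>}.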
Creating a system where two or more nodes are perfectly in sync is by definition impossible. In practice however, you might get close enough that it works for your particular problem.
You don't say the exact reason for running on two nodes, so I'm going to assume it is for scalability. With many nodes, your system will also be more available and fault-tolerant if you get it right. However, the problem could be simplified if you know you will only ever run on a single node and need the other node as a hot-slave to take over if the master is unavailable.
To establish a connection between two processes on two different nodes, you need some global addressing (user id 123 is pid <123.456.0>). If you also care that only one process runs for User A at a time, you also need a lock, or to allow only unique registrations in the addressing. If you also want to grow, you need a way to add more nodes, either while your system is running or when it is stopped.
Now, there are already some solutions out there that help solve your problem, with different trade-offs:
gproc in global mode allows registering a process under a given key (which gives you addressing and locking); see the sketch after these options. This is distributed across the entire cluster with no single point of failure; however, the leader election (at least when I last looked at it) works only for nodes that were available when the system started. Adding new nodes requires an experimental version of gen_leader or stopping the system. Within your own code, if you know two players are only ever going to talk to each other, you could start them on the same node.
riak_core allows you to build on top of the well-tested and proven architecture used in Riak KV and Riak Search. It maps keys into buckets in a fashion that allows you to add new nodes and have the keys redistributed. You can plug into this mechanism and move your processes. This approach does not let you decide where to start your processes, so if there is much communication between them, it will go across the network.
Using Mnesia with distributed transactions allows you to guarantee that every node has the data before the transaction is committed; this would give you distribution of the addressing and locking, but you would have to do everything else on top of it (like releasing the lock). Note: I have never used distributed transactions in production, so I cannot tell you how reliable they are. Also, because they are distributed, expect latency. Note 2: you should check exactly how you would add more nodes and have the tables replicated, for example whether it is possible without stopping Mnesia.
ZooKeeper/Doozer/roll your own provides a centralized, highly available database which you may use to store the addressing. In this case you would need to handle unregistering yourself. Adding nodes while the system is running is easy from the addressing point of view, but you need some way for your application to learn about the new nodes and start spawning processes there.
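As referenced above, a minimal sketch of the gproc option; the key shape is illustrative, and it assumes gproc has been started with its distributed backend enabled (the gproc_dist application setting):

    %% In User A's session process: claim a cluster-wide unique name.
    true = gproc:reg({n, g, {user, 123}}),

    %% Anywhere else in the cluster: resolve the name and message the process.
    case gproc:where({n, g, {user, 123}}) of
        undefined -> not_logged_in;
        Pid       -> Pid ! {chat, self(), <<"hello">>}
    end.

gproc also offers gproc:send/2 as a shorthand for the lookup-and-send pattern above.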
Also, it is not necessary to store the node, as the pid contains enough information to send the messages directly to the correct node.
As a cool trick which you may already be aware of, pids may be serialized (as may all data within the VM) to a binary. Use term_to_binary/1 and binary_to_term/1 to convert between the actual pid inside the VM and a binary which you may store in whatever accepts binary data without mangling it in some stupid way.
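For illustration, the round trip looks like this:

    %% Serialize any term, pids included, to a binary and back.
    Bin = term_to_binary(self()),
    Pid = binary_to_term(Bin),
    Pid ! hello.  %% delivery works as long as the owning node is connected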
