Distributed computing in a network - Framework/SDK - network-programming

I need to build a system that consists of:
Nodes; each node can accept one input.
The node that received the input shares it with all nodes in the network.
Each node does a computation on the input (the same computation, but each node has a different database, so the results differ per node).
The node that received the input consolidates each node's result and applies some logic to determine the overall result.
This result is returned to the caller.
It's very similar to a map-reduce use case, except there will be only a few nodes (maybe 10 to 20), and solutions like Hadoop seem like overkill.
Do you know of any simple framework/sdk to build:
Network (discovery, maybe gossip protocol)
Distribute a task/data to each node
Aggregate the results
Can be in any language.
Thanks very much
Regards,
fernando

Ok to begin with, there are many ways to do this. I would suggest the following if you are just starting to tackle this architecture:
Pub/Sub with Broker
Programs like RabbitMQ are meant to make it easy for a variable number of nodes to connect and speak to one another. Most importantly, they allow for transparency and observability. You can easily ask the broker which nodes are connected and even view messages in transit. Basically, they are a 'batteries included' means of dealing with a large number of clients.
Brokerless (Update)
I was looking for a more 'symmetric' architecture where each node is the same and do not have a centralized broker/queue manager.
You can use a brokerless Pub/Sub, but I personally avoid them. While they have tooling, it is hard to understand their registration protocols if something odd happens. I generally just use multicast, as it is very straightforward, especially if each node has just one network interface, and you can extend or modify behavior just with routing infrastructure.
Here is how your scheme would work with multicast:
All nodes join a known multicast address (e.g. 239.1.2.3:8000)
All nodes would need to respond to a 'who's here' message
All nodes would need to expose a 'do work' API, either via multicast or directly from consumer to node (with the node address taken from the 'who's here' response)
You would need to define these messages yourself, but given how short I expect them to be, it should be pretty simple.
The 'who's here' message from the consumer could just be a message with a binary zero.
The 'who's here' response could just be a 1 followed by the node's information (making it a TLV would probably be best though)
I'm not sure whether each node takes unique arguments, so I don't know how to design your 'do work' message or response.
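To make the message layout above concrete, here is a minimal Python sketch. The type values 0 and 1 come from the description above; the 2-byte big-endian length field, the function names, and the contents of the node information are my own assumptions for illustration, not an established protocol. `join_group` shows the usual way to join a multicast group with the standard `socket` module and is only defined here, not called.

```python
import socket
import struct

WHO_IS_HERE = 0  # probe sent by the consumer
I_AM_HERE = 1    # reply sent by each node

def encode_who_is_here() -> bytes:
    """The probe is a single zero byte."""
    return bytes([WHO_IS_HERE])

def encode_i_am_here(node_info: bytes) -> bytes:
    """TLV reply: 1-byte type, 2-byte big-endian length (assumed), then the value."""
    return struct.pack("!BH", I_AM_HERE, len(node_info)) + node_info

def decode(message: bytes):
    """Return (type, value) for either message kind, or raise ValueError."""
    if not message:
        raise ValueError("empty message")
    msg_type = message[0]
    if msg_type == WHO_IS_HERE:
        return WHO_IS_HERE, b""
    if msg_type == I_AM_HERE:
        if len(message) < 3:
            raise ValueError("truncated header")
        (length,) = struct.unpack("!H", message[1:3])
        value = message[3:3 + length]
        if len(value) != length:
            raise ValueError("truncated value")
        return I_AM_HERE, value
    raise ValueError(f"unknown message type {msg_type}")

def join_group(group: str = "239.1.2.3", port: int = 8000) -> socket.socket:
    """Open a UDP socket that receives datagrams sent to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: group address followed by the local interface (0.0.0.0 = any)
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

A consumer would send `encode_who_is_here()` to the group address, then collect replies for a short timeout and pass each datagram through `decode`. The sender's address returned by `recvfrom` gives you the node address for a direct 'do work' call.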

Related

How to get a list of all topics containing specific values known to MQTT broker?

I'm looking for a way to get a list of all topics known to a broker. There are some quite similar questions, but they didn't help me figure it out for my use case.
I've got 3 Raspberry Pis with multiple sensors (temperature, humidity) which are connected over an MQTT network. Every Pi has its own database containing time series of measurements and other system variables (like CPU).
Now I'm looking for a way to handle the following scenario:
I want to monitor my system and detect anomalies. For that I want to get all sensor time series from the last x seconds and process them in a Python script. Any Pi can be the one that runs the monitoring calculations.
Example: I'm on RPI2 and want to monitor the whole distributed network. Nothing is known in advance about the sensors attached to the Pis. From my Python script running on RPI2, I would initialise an MQTT client and subscribe to every sensor's data on the broker. I know about the wildcard # but I'm not sure how to use it in that case. My magic command would look like the following pseudo code:
1) client subscribe to all sensor data - #/sensor/#
2) get list with all topics
3) client subscribe to all topics from given list list/#
4) analyse data for anomalies every x seconds
First, your wildcard topic patterns are not valid. Topic patterns can only contain a single '#' character and it can only appear at the end of a topic e.g. foo/bar/# is valid, #/foo is not. You can use the + character which is a single level wildcard character.
This means a topic pattern of +/sensor/# will match each of the following:
rpi1/sensor/foo
rpi1/sensor/bar/temp
but not
rpi1/foo/sensor/bar
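If it helps to see the wildcard rules spelled out, here is a small Python sketch of them. It is not taken from any MQTT client library; the function names are mine, and it ignores the special handling of topics that start with `$`.

```python
def is_valid_filter(pattern: str) -> bool:
    """'#' must be the whole last level; '+' must be a whole level."""
    levels = pattern.split("/")
    for i, level in enumerate(levels):
        if "#" in level and (level != "#" or i != len(levels) - 1):
            return False
        if "+" in level and level != "+":
            return False
    return True

def topic_matches(pattern: str, topic: str) -> bool:
    """Check a concrete topic against a subscription pattern."""
    if not is_valid_filter(pattern):
        return False
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            # multi-level wildcard swallows everything from here on
            return True
        if i >= len(t_levels):
            return False
        if p != "+" and p != t_levels[i]:
            return False
    # no '#': pattern and topic must have the same number of levels
    return len(p_levels) == len(t_levels)

for topic in ("rpi1/sensor/foo", "rpi1/sensor/bar/temp", "rpi1/foo/sensor/bar"):
    print(topic, topic_matches("+/sensor/#", topic))
```

The first two topics match and the third does not, because `+` stands for exactly one level and `sensor` has to be the second level.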
Next, brokers do not keep a list of topics that exist. A topic only really exists at the instant a message is published to it: the broker checks that topic against the patterns that subscribing clients have requested and delivers the message to the clients that match.
Thirdly, when bridging brokers in loops like that, you have to be very careful with the bridge filters to make sure that messages don't end up in a constant loop.
The solution is probably to designate a "master" broker and bridge all the others one way to that broker and then have the client subscribe to either '#' to get everything or something more like '+/sensor/#' to just see the sensor readings.

Can I call GenServer client functions from a remote node?

I have a GenServer on a remote node with both implementation and client functions in the module. Can I use the GenServer client functions remotely somehow?
Using GenServer.call({RemoteProcessName, :"app@remoteNode"}, :get) works as I expect it to, but is cumbersome.
If I want to clean this up, am I right in thinking that I'd have to write the client functions on the calling (client) node?
You can use the :rpc.call/{4,5} functions.
:rpc.call(:"app@remoteNode", MyModule, :some_func, [arg1, arg2])
For a large number of calls, it's better to use gen_server:call/2,3.
If you want to use rpc:call/4,5, you should know that there is just one process, named rex, on each node handling all requests. So while it is running one Mod:Func(Arg1, Arg2, ArgN), it cannot respond to other requests.
TL;DR
Yes
Discussion
There are PIDs, messages, monitors and links. Nothing more, nothing less. That is your universe. (Unless you get into some rather esoteric aspects of the runtime implementation -- but at the abstraction level represented by EVM languages the previously stated elements (should) constitute your universe.)
Within an Erlang environment (whether local or distributed in a mesh) any PID can send a message addressed to any other PID (no middle-man required), as well as establish monitors and so on.
gen_server:cast sends a gen_server packaged message (so it will arrive in the form handle_cast/2 will be called on). gen_server:call/2 establishes a monitor and a timeout for receiving a labeled reply. Simply doing PID ! SomeMessage does essentially the same thing as gen_server:cast (sends a message) without any of the gen_server machinery behind it (messier to abstract as an interface).
That's all there is to it.
With this in mind, of course you can use gen_server:call/2 across nodes, as long as they are connected into a cluster/mesh via disterl. Two disconnected nodes would have to communicate a different way (network sockets) and wouldn't have any knowledge of each other's internal mapping of PIDs, but as long as disterl is being used they all translate PIDs amongst themselves quite readily. Named processes is where things get a little tricky, but that is the purpose of the global module and utilities such as gproc (though dependence on such facilities beyond a certain point is usually an indication of an architectural problem).
Of course, just because PIDs from any node can communicate with PIDs from another node doesn't always mean they should. The physical topology of the network (bandwidth, latency, jitter) comes into play when you start sending high-frequency or large messages (lots of gen_server:calls), and you have always got to think of partition tolerance -- but for off-loading heavy sorts of work (rare) or physically partitioning sub-systems within a very large system (more common) directly sending messages is a very simple way to take a program coded for a single node and distribute it across a cluster.
(With all that in mind, it is somewhat rare to see the rpc module used.)

Erlang clusters

I'm trying to implement a cluster using Erlang as the glue that holds it all together. I like the idea that it creates a fully connected graph of nodes, but upon reading different articles online, it seems as though this doesn't scale well (having a max of 50 - 100 nodes). Did the developers of OTP impose this limitation on purpose? I do know that you can setup nodes to have explicit connections only as well as have hidden nodes, etc. But, it seems as though the default out-of-the-box setup isn't very scalable.
So to the questions:
If you had 5 nodes (A, B, C, D, E) that all had explicit connections such that A-B-C-D-E. Does Erlang/OTP allow A to talk directly to E or does A have to pass messages from B through D to get to E, and thus that's the reason for the fully connected graph? Again, it makes sense but it doesn't scale well from what I've seen.
If one was to try and go for a scalable and fault-tolerant system, what are your options? It seems as though, if you can't create a fully connected graph because you have too many nodes, the next best thing would be to create a tree of some kind. But, this doesn't seem very fault-tolerant because if the root or any parent of children nodes dies, you would lose a significant portion of your cluster.
In looking into supervisors and workers, all of the examples I've seen apply this to processes on a single node. Could it be applied to a cluster of nodes to help implement fault-tolerance?
Can nodes be part of several clusters?
Thanks for your help, if there is a semi-recent website or blogpost (roughly 1-year old) that I've missed, I'd be happy to look at those. But, I've scoured the internet pretty well.
Yes, you can send messages to a process on any remote node in a cluster, for example, by using its process identifier (pid). This is called location transparency. And yes, it scales well (see Riak, CouchDB, RabbitMQ, etc).
Note that one node can run hundreds of thousands of processes. Erlang has proven to be very scalable and was built for fault tolerance. There are other approaches to building bigger systems, e.g. the SOA approach of CloudI (see comments). You could also build clusters that use hidden nodes if you really, really need to.
At the node level you would take a different approach, for example, build identical nodes that are easy to replace if they fail and the work is taken over by the remaining nodes. Check out how Riak handles this (look into riak_core and check the blog post Introducing Riak Core).
Nodes can leave and enter a cluster but cannot be part of multiple clusters at the same time. Connected nodes share one cluster cookie which is used to identify connected nodes. You can set the cookie while the VM is running (see Distributed Erlang).
Read http://learnyousomeerlang.com/ for greater good.
The distribution protocol is about providing robustness, not scalability. What you want to do is to group your cluster into smaller areas and then use connections, which are not distribution in Erlang but in, say, TCP sessions. You could run 5 groups of 10 machines each. This means the 10 machines have seamless Pid distribution: you can call a pid on another machine. But distributing to another group means you can't seamlessly address the group like that.
You generally want some kind of "route reflection" as in BGP.
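To put rough numbers on why grouping helps, here is a small Python calculation. It counts point-to-point connections only. The 50-node total follows from the "5 groups of 10" example above; the idea of one gateway per group, with gateways fully connected to each other, is my own assumption for illustration.

```python
def full_mesh_links(n: int) -> int:
    """Number of point-to-point connections in a fully connected graph of n nodes."""
    return n * (n - 1) // 2

# One flat, fully connected cluster of 50 nodes.
flat = full_mesh_links(50)

# Five groups of ten, each fully connected internally.
groups, group_size = 5, 10
inside_groups = groups * full_mesh_links(group_size)

# Assumption: one gateway per group, gateways fully connected to each other.
between_groups = full_mesh_links(groups)

print(flat, inside_groups + between_groups)  # 1225 235
```

The flat cluster grows quadratically with the node count, while the grouped layout keeps each node's connection count bounded by its group size.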
1) I think you need a direct connection between nodes to communicate between processes. This does, however, mean that you don't need persistent connections between all the nodes if two will never communicate (say if they're only workers, not coordinators).
2) You can create a not-fully-connected graph of erlang nodes. The documentation is hard to find, and comes with problems - you disable the global system which handles global names in the cluster, so you have to do everything by locally registered names, or locally registered names on remote nodes. Or just use Pids, as they work too. To start an erlang node like this, use erl ... -connect_all false .... I hope you know what you're up to, as I couldn't trust myself to do that.
It also turns out that a not-fully-connected graph of erlang nodes is a current research topic. The RELEASE Project is currently working on exactly that, and have come up with a concept of S-groups, which are essentially fully-connected groups. However, nodes can be members of more than one S-group and nodes in separate s-groups don't have to be fully connected but can establish the connections they need on demand to do direct node-to-node communication. It's worth finding presentations of theirs because the research is really interesting.
Another thing worth pointing out is that several people have found that you can get up to 150-200 nodes in a fully-connected cluster. Do you really have a use-case for more nodes than that? Surely 150-200 incredibly beefy computers would do most things you could throw at them, unless you have a ridiculous project to do.
3) While you can't start processes on a different node using gen_server:start_link/3,4, you can certainly call servers on a foreign node very easily. It seems that they've overlooked being able to start servers on foreign nodes, but there's probably good reason for it - such as a ridiculous number of error cases.
4) Try looking at hidden nodes, and at having a not-fully-connected cluster. They should allow you to group nodes as you see fit.
TL;DR: Scaling is hard, let's go shopping.
There are some good answers already, so I'm trying to be simple.
1) No, if A and E are not connected directly, A cannot talk to E. The distribution protocol runs on direct TCP connection - no routing included.
2) I think a tree structure is good enough - trade-offs always exist.
3) There's no 'supervisor for nodes', but erlang:monitor_node is your friend.
4) Yes. A node can talk to nodes from different 'clusters'. In the local node, use erlang:set_cookie(OtherNode, OtherCookie) to access a remote node with a different cookie.
1)
Yes, they talk to each other.
2) 3) and 4)
Generally speaking, when building a scalable and fault-tolerant system, you would want, or rather need, to divide the workload into different "regions" or "clusters". The supervisor/worker model was designed with this in mind, hence the topology. What you need is a few processes coordinating work between clusters, while all workers within a single cluster talk to each other to balance the load within the group.
As you can see, with this topology the "limitation" is not really a limitation, as long as you divide your tasks carefully and in a balanced fashion. Personally, I believe a tree-like structure for supervisor processes is unavoidable in large-scale systems, and this is the practice I follow. The reasons vary but boil down to scalability, fault tolerance as a fallback policy, maintenance needs, and portability of the clusters.
So in conclusion,
2) Use a tree-like topology for your supervisors. Let workers explicitly connect to each other and talk to the supervisors within their own domain.
3) While a single node is, I presume, the environment supervisors were natively designed for, I'm pretty sure a supervisor can talk to a worker on a different machine. I would not suggest this, as fault tolerance can be hell in a remote-worker scenario.
4) You should never let a node be part of two different clusters at the same moment. You can switch it from one cluster to another, though.

Is this the right way of building an Erlang network server for multi-client apps?

I'm building a small network server for a multi-player board game using Erlang.
This network server uses a local instance of Mnesia DB to store a session for each connected client app. Inside each client's record (session) stored in this local Mnesia, I store the client's PID and NODE (the node where a client is logged in).
I plan to deploy this network server on at least 2 connected servers (Node A & B).
So in order to allow a Client A who is logged in on Node A to search (query Mnesia) for a Client B who is logged in on Node B, I replicate the Mnesia session table from Node A to Node B or vice versa.
After Client A queries the PID and NODE of the Client B, then Client A and B can communicate with each other directly.
Is this the right way of establishing connection between two client apps that are logged-in on two different Erlang nodes?
Creating a system where two or more nodes are perfectly in sync is by definition impossible. In practice however, you might get close enough that it works for your particular problem.
You don't say the exact reason behind running on two nodes, so I'm going to assume it is for scalability. With many nodes, your system will also be more available and fault-tolerant if you get it right. However, the problem could be simplified if you know you only ever will run in a single node, and need the other node as a hot-slave to take over if the master is unavailable.
To establish a connection between two processes on two different nodes, you need some global addressing (user id 123 is pid <123,456,0>). If you also care about only one process for User A running at a time, you also need a lock, or to allow only unique registrations in the addressing. If you also want to grow, you need a way to add more nodes, either while your system is running or when it is stopped.
Now, there are already some solutions out there that help solve your problem, with different trade-offs:
gproc in global mode allows registering a process under a given key (which gives you addressing and locking). This is distributed to the entire cluster, with no single point of failure; however, the leader election (at least when I last looked at it) works only for nodes that were available when the system started. Adding new nodes requires an experimental version of gen_leader or stopping the system. Within your own code, if you know two players are only ever going to talk to each other, you could start them on the same node.
riak_core allows you to build on top of the well-tested and proven architecture used in Riak KV and Riak Search. It maps the keys into buckets in a fashion that allows you to add new nodes and have the keys redistributed. You can plug into this mechanism and move your processes. This approach does not let you decide where to start your processes, so if you have a lot of communication between them, it will go across the network.
Using mnesia with distributed transactions allows you to guarantee that every node has the data before the transaction is committed. This would give you distribution of the addressing and locking, but you would have to do everything else on top of this (like releasing the lock). Note: I have never used distributed transactions in production, so I cannot tell you how reliable they are. Also, due to being distributed, expect latency. Note 2: You should check exactly how you would add more nodes and have the tables replicated, for example whether it is possible without stopping mnesia.
ZooKeeper/doozer/roll your own provides a centralized, highly available database which you may use to store the addressing. In this case you would need to handle unregistering yourself. Adding nodes while the system is running is easy from the addressing point of view, but you need some way to have your application learn about the new nodes and start spawning processes there.
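To illustrate the general idea behind the riak_core option (keys mapped onto nodes so that adding a node only moves some of the keys), here is a generic consistent-hash ring in Python. This is not riak_core's actual ring implementation; the class name, the number of virtual points per node, and the use of SHA-1 are all assumptions made for the sketch.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

class Ring:
    """Generic consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes, points_per_node: int = 64):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        """The node owning a key is the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]

keys = [f"user-{i}" for i in range(1000)]
before = Ring(["node_a", "node_b"])
after = Ring(["node_a", "node_b", "node_c"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(len(moved), "of", len(keys), "keys moved after adding node_c")
```

Every key that changes owner ends up on the new node; keys never shuffle between the existing nodes. That is the property that makes growing the cluster cheap.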
Also, it is not necessary to store the node, as the pid contains enough information to send the messages directly to the correct node.
As a cool trick which you may already be aware of, pids may be serialized (as may all data within the VM) to a binary. Use term_to_binary/1 and binary_to_term/1 to convert between the actual pid inside the VM and a binary which you may store in whatever accepts binary data without mangling it in some stupid way.

erlang general question on socket

I have a question about a project I should implement for my Distributed System course.
The project consists of designing and implementing a library that provides a reliable multicast service to user processes. All processes belong to a group, and a message is sent by a member process to all members of the group. The sender is excluded from the recipient list.
This seems to me quite easy to implement in Erlang, due to its message-passing structure. More points are given if you use RPC calls instead of a normal socket-based implementation.
Now my question is this: one of the mandatory points of this project requires that sockets aren't kept open when there is no communication going on between processes.
Our course is held in C, but we are free to use any language we like. Can I satisfy this constraint using Erlang nodes and rpc calls?
thanks in advance
Yes. The rpc module even has multicall, which takes a list of nodes and will do exactly what you described. It won't hold your sockets open when it's not using them either.
Despite what the other answers say, Erlang's default behavior does not satisfy your constraints.
A typical network of Erlang nodes using Erlang distribution will remain densely connected (every node connected to every other node) with TCP sockets open even when you're not using them. You will either have to use -connect_all false and manage opening/closing the connections to other nodes yourself, or you will have to develop your own distribution protocol. I would recommend the latter, especially since you are learning. The trick to make it easy is to use term_to_binary and binary_to_term.
