Cross-cluster message passing in Elixir?

Suppose I have two clusters like the following:
Cluster 1
a <---> b
Cluster 2
c <---> d
where a/b and c/d are separate clusters.
Now suppose I want to pass a message from node a in cluster 1 to some remote pid in cluster 2, which may be running on either node c or node d. WITHOUT joining the two clusters together, is it possible to pass messages between them with the standard Erlang/Elixir remote-communication tools, or would I need to use an external system such as RabbitMQ, Redis pub/sub, ...?
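For context, the standard remote-communication tools look like this when the nodes are members of the same cluster (the node and process names below are hypothetical); the question is whether anything equivalent works across the cluster boundary:

%% From node a, send to a process registered as my_server on node b.
%% This only works because a and b are connected, i.e. in the same cluster:
{my_server, 'b@host1'} ! {hello, self()}.

%% Or synchronously, via gen_server:
Reply = gen_server:call({my_server, 'b@host1'}, get_state).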

Related

How to dynamically set an environment variable in Docker Swarm based on the number of nodes

I have a swarm running on multiple nodes, and each node has its own URL address.
Each of the nodes can, in some cases, return a 301 redirect to a different node's URL (an external URL).
I want to make sure that a node doesn't redirect a client to a node that is down.
I thought I could add an environment variable to the swarm that lists all the live nodes, and when some node goes down, automatically reset the environment variable to the new value.
For example, I have:
www.a.com (node a), www.b.com (node b), www.c.com (node c).
When they are all up, the variable is set to (www.a.com,www.b.com,www.c.com); when b, for example, is down, it changes to (www.a.com,www.c.com).
Now the questions I have are:
A. How can I do that? How can I make the swarm update an environment variable based on the nodes that are up?
B. How can I make the containers on a node know that they need to reload the environment variable? (I don't want to poll it every x seconds; I think that's too expensive.)

EventStore 3-node cluster behaviour when one node is down

After connecting to a 3-node Event Store cluster, suppose that, for whatever reason, one of the 3 nodes goes down (it could be any one of the 3). What happens if a client tries to append some data to the cluster now?
Does it write the data to the two remaining nodes?
Is the behaviour deterministic, or does it depend on which node (master or slave) is down?
Event Store clustering relies on the gossip seed to select one master from all available nodes. As soon as the master node goes down, the cluster elects a new master. All writes are always, unconditionally, directed to the master node.
You must ensure that you use the proper connection string to connect to the cluster, not to a single node. For example:
Multi-node DNS name:
var connectionString = "ConnectTo=discover://admin:changeit#mycluster:3114; HeartBeatTimeout=500
Individual cluster nodes list:
var connectionString = "GossipSeeds=192.168.0.2:1111,192.168.0.3:1111; HeartBeatTimeout=500"
You can only force the connection to use slave nodes when using it for subscriptions.

HBase Row Key Design

I'm using HBase coupled with Phoenix for interactive analytics, and I'm trying to design my HBase row key for an IoT project, but I'm not very sure I'm doing it right.
My database can be represented as something like this:
Client ---> Project ---> Cluster1 ---> Cluster2 ---> Sensor1
Client ---> Project ---> Building ---> Sensor2
Client ---> Project ---> Cluster1 ---> Building ---> Sensor3
What I have done is a composite primary key of (Client_ID, Project_ID, Cluster_ID, Building_ID, Sensor_ID):
(1,1,1#2,0,1)
(1,1,0,1,2)
(1,1,1,1,3)
Multiple clusters or buildings can be specified with the separator #, e.g. 1#2#454, and if a level is absent we insert 0.
In the column family we will have the sensor value and multiple pieces of metadata.
My question: is this HBase row key design valid for a query such as "all sensors for the cluster with ID 1"?
I also thought of just putting (Sensor_ID, Timestamp) in the key and moving all the routing into the column family, but I'm not sure that design is a good fit for my queries.
My third idea for this project is to combine Neo4j for the routing and HBase for the data.
Does anyone have experience with similar problems who can guide me on the best approach to design this database?
It seems that you are dealing with time series data. One of the main risks of using HBase with time series data (or other forms of monotonically increasing keys) is hotspotting: a dangerous scenario that can make your whole cluster behave as a single machine.
You should consider OpenTSDB on top of HBase, as it approaches this problem quite nicely. The single most important thing to understand is how it engineers the HBase schema/key. Note that the timestamp is not in the leading part of the key, and the design assumes that the number of distinct metric_uids is far greater than the number of slave nodes and region servers (this is essential for a balanced cluster).
An OpenTSDB key has the following structure:
<metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]
Depending on your specific use case, you should engineer your metric_uid appropriately (maybe a compound key unique to a sensor reading), as well as the tags. Tags will play a fundamental role in data aggregation.
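For illustration, a single reading could be fed to OpenTSDB through its telnet-style put command like this (the metric name and tag names here are hypothetical, mapped from the hierarchy above):

put sensor.reading 1577836800 21.4 client=1 project=1 cluster=1 building=2 sensor=3

OpenTSDB assigns UIDs to the metric and to each tag key/value and packs them into the row key structure shown above, which is what lets you filter and aggregate by any tag (e.g. everything where cluster=1).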
NOTE: As of v2.0, OpenTSDB introduced the concept of Trees, which could be very helpful for 'navigating' your sensor readings and facilitating aggregations. I'm not too familiar with them, but I assume you could create a hierarchical structure that helps determine which sensors are associated with which client, project, cluster, building, and so on...
P.S. I don't think that there is room for Neo4J in this project.

Erlang scalability questions related to gen_server:call()

In Erlang/OTP, when making a gen_server:call() to a process on another node, you have to pass in the name of the node you are calling.
Let's say I have this use case:
I have two nodes, 'node1' and 'node2', running. Those nodes can make gen_server:call()s to each other.
Now let's say I add two more nodes, 'node3' and 'node4', and ping the others so that all nodes can see each other and make gen_server:calls to one another.
How do the Erlang pros handle dynamically adding new nodes like that, so that they know the new node names to use in the gen_server calls? Or is it a requirement to know the names of all the nodes beforehand, so that they can be hardcoded somewhere like sys.config?
You can use:
erlang:nodes()
to get a "now" view of the node list.
Also, you can use:
net_kernel:monitor_nodes(true)
to be notified as nodes come and go (via ping / crash / etc.).
To see if your module is running on a particular node, you can either call the gen_server with some kind of ping callback, or use the rpc module to call erlang:whereis(name) on the foreign node.
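Putting the two together, here is a minimal sketch of a process that tracks cluster membership (the module name node_watcher is illustrative, and it assumes the default message format of net_kernel:monitor_nodes/1):

-module(node_watcher).
-behaviour(gen_server).
-export([start_link/0, nodes_now/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Returns the currently known node list.
nodes_now() ->
    gen_server:call(?MODULE, nodes_now).

init([]) ->
    %% Subscribe to {nodeup, Node} / {nodedown, Node} messages.
    ok = net_kernel:monitor_nodes(true),
    %% Seed the state with the nodes visible right now.
    {ok, ordsets:from_list(nodes())}.

handle_call(nodes_now, _From, Nodes) ->
    {reply, ordsets:to_list(Nodes), Nodes}.

handle_cast(_Msg, Nodes) ->
    {noreply, Nodes}.

handle_info({nodeup, Node}, Nodes) ->
    {noreply, ordsets:add_element(Node, Nodes)};
handle_info({nodedown, Node}, Nodes) ->
    {noreply, ordsets:del_element(Node, Nodes)}.

A caller can then pick a target from nodes_now() instead of hardcoding node names, and check for a named process with, e.g., rpc:call(Node, erlang, whereis, [my_server]).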

should everything connect with node 0 in neo4j

In Neo4j, should all nodes connect to node 0 so that you can create a traversal that spans all objects? Is that a performance problem when you get to large datasets? If so, how many nodes is too many? Is it OK not to have nodes connect to node 0 if I don't see a use case for it now, assuming I use indexes for finding specific nodes?
There is no need or requirement to connect everything to the root node. Indexes work great for finding starting points for your traversal. If you have, say, fewer than 5000 nodes connected to a starting node (like the root node), then a relationship scan is cheaper than an index lookup.
To judge what is better, you need to know a bit more about the domain.
