EventStore 3-node cluster behaviour when one node is down - eventstoredb

After connecting to a 3-node Event Store cluster, suppose that, for whatever reason, one of the 3 nodes goes down (it could be any of the three). What happens if the client tries to append some data to the cluster now?
Does it write the data to the two remaining nodes?
Is the behaviour deterministic, or does it depend on which node (master or slave) is down?

Event Store clustering relies on gossip seeds to elect one master from all available nodes. As soon as the master node goes down, the cluster elects a new master. All writes are always, unconditionally, directed to the master node.
You must make sure you use a connection string that connects to the cluster, not to a single node, for example:
Multi-node DNS name:
var connectionString = "ConnectTo=discover://admin:changeit@mycluster:3114; HeartBeatTimeout=500";
Individual cluster nodes list:
var connectionString = "GossipSeeds=192.168.0.2:1111,192.168.0.3:1111; HeartBeatTimeout=500";
You can only force the connection to use slave nodes when using it for subscriptions.

Related

How to dynamically set an environment variable in Docker Swarm based on the number of nodes

I have a swarm running on multiple nodes, and each node has its own URL address.
In some cases, each of the nodes can return a 301 to a different node's URL (an external URL).
I want to make sure that a node doesn't redirect a client to a node that is down.
I thought I could add an environment variable to the swarm listing all the live nodes, and when some node goes down, automatically reset the environment variable to the new value.
For example, I have:
www.a.com (node a), www.b.com (node b), www.c.com (node c).
When they are all up, the variable is set to (www.a.com,www.b.com,www.c.com); when b, for example, is down, it changes to (www.a.com,www.c.com).
Now the questions I have are:
A. How can I do that? How can I make the swarm update an environment variable based on the nodes that are up?
B. How can I make the containers on a node know that they need to reload the environment variable? (I don't want to read it every x seconds; I think that's too expensive.)

Cross-cluster message passing in Elixir?

Suppose I have two clusters like the following:
Cluster 1
a <---> b
Cluster 2
c <---> d
where a/b and c/d are separate clusters.
Now suppose I wanted to pass a message from node a in cluster 1 to some remote pid in cluster 2, which may be running on either node c or node d. WITHOUT joining the two clusters together, is it possible to pass messages between them with the standard Erlang/Elixir remote-communication tools, or would I need to use an external system such as RabbitMQ, Redis pub/sub, ...?

Erlang ensure Mnesia schema replication

I have a distributed application.
In it, a master node starts an mnesia schema with 4 tables. Some of them are replicated to other nodes, some are not.
When a node spawns, it registers with the master node, is added to the schema, and the data is replicated to it.
How can I ensure that my replication is finished?
I tried the following:
Timeout=60000,
TabList = [tab1, tab2, tab3, tab4],
mnesia:wait_for_tables(TabList, Timeout).
However, it does not take 60 seconds, not even 5, before I get an error:
{{badmatch,{aborted,{no_exists,tab1}}}
Obviously it does not work.
When a new node joins the cluster, an rpc call from the master node runs the following function on the new node:
start_Mnesia(MasterNode) ->
    mnesia:start(),
    %% Point the new node at the master so it joins the existing schema.
    mnesia:change_config(extra_db_nodes, [MasterNode]),
    %% Take a ram_copies replica of every table except the schema itself.
    Tabs = mnesia:system_info(tables) -- [schema],
    [mnesia:add_table_copy(Tab, node(), ram_copies) || Tab <- Tabs].
Does it also wait until the data has been written to the ram_copies?
Thanks.
When a node joins your mnesia cluster it is already synchronised, regardless of which copies of which tables it, or other nodes, do or do not have.
You should see that a new node, after registering with your master and being added to the cluster, can already access all your tables. Adding a copy of a table doesn't change that, regardless of the state/stage of that copy.
When you add a copy of a table on your new node you can continue to run transactions, during and after replication, and the data that is or is not replicated to the node originating the transaction will make no difference to the correctness of the result.
So, if you are concerned just with synchronisation in terms of keeping your transactions ACID, then don't worry.
If you are concerned about when your data is actually replicated and stored safely on the other nodes, that's a different thing. In that case, I have observed that when you run mnesia:add_table_copy(Table, NewNode, disc_copies) it blocks, and returns only once NewNode has copied the data to the filesystem.
Remember though: unless you run mnesia:sync_transaction/3 all the time, you have no guarantee that the data is actually on any disc after a transaction completes anyway.
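For illustration, here is a minimal sketch, with a made-up table name (tab1) and a made-up helper, of wrapping a write in mnesia:sync_transaction/1 so that the commit reaches all active replicas before the call returns:

%% Hypothetical helper; assumes a table tab1 holding {tab1, Key, Value} records.
write_synced(Key, Value) ->
    F = fun() -> mnesia:write({tab1, Key, Value}) end,
    %% Unlike mnesia:transaction/1, sync_transaction/1 waits until all active
    %% replicas have committed (and logged to disc where the copy is disc-based)
    %% before it returns.
    case mnesia:sync_transaction(F) of
        {atomic, ok} -> ok;
        {aborted, Reason} -> {error, Reason}
    end.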

Erlang scalability questions related to gen_server:call()

In Erlang/OTP, when making a gen_server:call() to a process on another node, you have to pass in the name of the node to which you are making the call.
Let's say I have this use case:
I have two nodes running, 'node1' and 'node2'. Those nodes can make gen_server:call()s to each other.
Now let's say I add 2 more nodes, 'node3' and 'node4', and ping them so that all nodes can see each other and make gen_server:calls to each other.
How do the Erlang pros handle dynamically adding new nodes like that, so that they know the new node names to use in the gen_server calls? Or is it a requirement to know the names of all the nodes beforehand, so that they can be hardcoded somewhere like sys.config?
You can use:
erlang:nodes()
to get a "now" view of the node list.
Also, you can use:
net_kernel:monitor_nodes(true)
to be notified as nodes come and go (via ping / crash / etc.).
To see whether your module is running on a given node, you can either call the gen_server with some kind of ping callback, or use the rpc module to call erlang:whereis(name) on the foreign node.
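As a rough sketch of that second suggestion, and with a made-up module name, a small gen_server can subscribe to those node-up/node-down events and keep the current node list in its state, so nothing has to be hardcoded in sys.config:

-module(node_watcher).   %% hypothetical name
-behaviour(gen_server).
-export([start_link/0, known_nodes/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

known_nodes() ->
    gen_server:call(?MODULE, known_nodes).

init([]) ->
    ok = net_kernel:monitor_nodes(true),   %% subscribe to {nodeup, N} / {nodedown, N}
    {ok, nodes()}.                         %% start from the currently visible nodes

handle_call(known_nodes, _From, Nodes) ->
    {reply, Nodes, Nodes}.

handle_cast(_Msg, Nodes) ->
    {noreply, Nodes}.

handle_info({nodeup, Node}, Nodes) ->
    {noreply, lists:usort([Node | Nodes])};
handle_info({nodedown, Node}, Nodes) ->
    {noreply, lists:delete(Node, Nodes)};
handle_info(_Other, Nodes) ->
    {noreply, Nodes}.

A caller can then pick a node from known_nodes/0 and make gen_server:call({RegisteredName, Node}, Request) as usual.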

Remove an already-nonexistent node from an mnesia cluster (schema)

I have a bad node (it no longer exists) in the mnesia cluster data, which I see when I run:
> mnesia:system_info(db_nodes)
[bad@node, ...]
How do I remove it from the cluster?
I tried:
> mnesia:del_table_copy(schema, bad@node).
{aborted,{not_active,"All replicas on diskfull nodes are not active yet"...
What does this mean? How can I fix it?
Update: before removing the node from the schema, we need to stop mnesia on it.
I had a similar problem years ago. What you are trying to do is remove an offline node, which, as far as I am aware, was impossible in earlier versions of mnesia.
You can, however, connect to the cluster using a dummy node named bad@node, started with a tweaked system.config from the original clustered node. Once it is online, remove it from the cluster.
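As a minimal sketch of the removal step itself (the function name is made up), once mnesia is stopped on the dead/dummy node you can delete its schema copy from any healthy cluster node:

%% Run on a healthy cluster node; BadNode must not be running mnesia.
remove_dead_node(BadNode) ->
    %% Best-effort stop; returns {badrpc, nodedown} if the node is unreachable.
    rpc:call(BadNode, mnesia, stop, []),
    %% Deleting the schema copy removes the node from db_nodes.
    case mnesia:del_table_copy(schema, BadNode) of
        {atomic, ok}      -> ok;
        {aborted, Reason} -> {error, Reason}
    end.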
