how do I remove an extra node - erlang

I have a group of Erlang nodes that are replicating their data through Mnesia's "extra_db_nodes"... I need to upgrade hardware and software, so I have to detach some nodes as I make my way from node to node.
How does one remove a node and still preserve the data that was inserted?
[update] Removing nodes is as important as adding them. Over time, as your cluster grows, it must also contract; if it doesn't, Mnesia will be busy trying to send data to nonexistent nodes, filling up queues and keeping the network busy.
[final update] After poring through the Erlang/Mnesia source code I was able to determine that it is not possible to completely disassociate nodes. While del_table_copy removes the linkage between tables, it is incomplete. I would close this question, but none of the close descriptions are adequate.

I wish I had found this a long time ago: http://weblambdazero.blogspot.com/2008/08/erlang-tips-and-tricks-mnesia.html
basically, with a properly functioning cluster....
log in to the node to be removed
stop Mnesia
mnesia:stop().
log in to a different node in the cluster
delete the removed node's schema copy
mnesia:del_table_copy(schema, 'node@host.domain').

I'm extremely late to the party, but came across this info in the doc when looking for a solution to the same problem:
"The function call
mnesia:del_table_copy(schema,
mynode#host) deletes the node
'mynode#host' from the Mnesia system.
The call fails if mnesia is running on
'mynode#host'. The other mnesia nodes
will never try to connect to that node
again. Note, if there is a disc
resident schema on the node
'mynode#host', the entire mnesia
directory should be deleted. This can
be done with mnesia:delete_schema/1.
If mnesia is started again on the the
node 'mynode#host' and the directory
has not been cleared, mnesia's
behaviour is undefined."
(http://www.erlang.org/doc/apps/mnesia/Mnesia_chap5.html#id74278)
I think the following might do what you desire:
AllTables = mnesia:system_info(tables),
DataTables = lists:filter(fun(Table) -> Table =/= schema end, AllTables),
RemoveTableCopy = fun(Table, Node) ->
    Nodes = mnesia:table_info(Table, ram_copies) ++
            mnesia:table_info(Table, disc_copies) ++
            mnesia:table_info(Table, disc_only_copies),
    case lists:member(Node, Nodes) of
        true  -> mnesia:del_table_copy(Table, Node);
        false -> ok
    end
end,
[RemoveTableCopy(Tbl, 'gone@gone_host') || Tbl <- DataTables],

rpc:call('gone@gone_host', mnesia, stop, []),
%% mnesia:delete_schema/1 takes a list of nodes, not a directory
rpc:call('gone@gone_host', mnesia, delete_schema, [['gone@gone_host']]),
RemoveTableCopy(schema, 'gone@gone_host').
Though, I haven't tested it since my scenario is slightly different.

I've certainly used this method to do exactly this (which supports the mnesia:del_table_copy/2 approach). See removeNode/1 below:
-module(tool_bootstrap).
-export([bootstrapNewNode/1, closedownNode/0,
         finalBootstrap/0, removeNode/1]).
-include_lib("records.hrl").
-include_lib("stdlib/include/qlc.hrl").

bootstrapNewNode(Node) ->
    %% Make the given node part of the family and start the cloud on it
    mnesia:change_config(extra_db_nodes, [Node]),
    %% Now make the other node set things up
    rpc:call(Node, tool_bootstrap, finalBootstrap, []).

removeNode(Node) ->
    rpc:call(Node, tool_bootstrap, closedownNode, []),
    mnesia:del_table_copy(schema, Node).

finalBootstrap() ->
    %% Code removed to actually copy over my tables etc...
    application:start(cloud).

closedownNode() ->
    application:stop(cloud), mnesia:stop().

If you have replicated the table (added table copies) on nodes other than the one you're removing, then you're already fine - just remove the node.
If you wanted to be slightly tidier you'd delete the table copies from the node you're about to remove first via mnesia:del_table_copy/2.
Generally, mnesia gracefully handles node loss and detects node rejoin (rebooted nodes obtain new table copies from nodes that kept running, nodes that didn't reboot are detected as a network partition event). Mnesia does not consume CPU or network traffic for nodes that have gone down. I think, though I haven't confirmed it in the source, mnesia won't reconnect to nodes that have gone down automatically - the node that goes down is expected to reboot (mnesia) and reconnect.
mnesia:add_table_copy/3, mnesia:move_table_copy/3 and mnesia:del_table_copy/2 are the functions you should look at for live schema management.
The extra_db_nodes parameter should only be used when initialising a new DB node - once a new node has a copy of the schema it doesn't need the extra_db_nodes parameter.
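As a rough sketch of both directions (node and table names here are made up, and the exact return values can vary with table types):

%% On an existing cluster node: pull a brand-new node in. extra_db_nodes is
%% only needed this once, to hand the new node a copy of the schema.
{ok, _} = mnesia:change_config(extra_db_nodes, ['new@host']),
%% Make the new node's schema disc-resident and give it table copies:
{atomic, ok} = mnesia:change_table_copy_type(schema, 'new@host', disc_copies),
{atomic, ok} = mnesia:add_table_copy(my_table, 'new@host', disc_copies),

%% Later, to retire a node: drop its table copies, stop mnesia there,
%% then drop its schema copy.
{atomic, ok} = mnesia:del_table_copy(my_table, 'old@host'),
rpc:call('old@host', mnesia, stop, []),
{atomic, ok} = mnesia:del_table_copy(schema, 'old@host').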

Related

Distributing in-memory linked list

I have a program which is built around a singly linked list. Different programs create some form of data, and this data is sent to the linked list module to be added. As long as I have RAM available, the program works as intended. Periodically (about every year) I archive the entire linked list to disk; due to a requirement, I archive all of it. So far so good.
What happens if I want to add a new node to the list while RAM is full and I haven't yet archived and freed the memory? This might occur when the producer count goes up, or, regardless of producer count, there may simply be more data created depending on where it's used, etc. I couldn't find a clear solution for scaling the in-memory linked list. There is a workaround in my head, but I don't even know whether it works, so I thought it better to ask here.
When RAM starts to get almost full, I would create a new instance of the linked list program (just another machine in the cloud or a new physical computer on premises, whatever).
I do have a service discovery module (something like ZooKeeper); this discovery module will detect the newly created machine and add it to the list.
When the first instance is almost at its limit, it will check whether there is an available instance; if there is, it will relay the node to the next instance and update its last node's next pointer to something special. If you want to traverse the list from start to finish across all the machines, then every time you come to this special node it will hold the information about which machine has the next node, and traversal will continue on the machine that the last node points to.
Since this is not a hash map or anything of that nature, I can't just replicate the service and, for example, relay an incoming request to a particular machine based on a given key.
Rather than archiving part of the old data, loading it back into RAM, and continuing on like that, I thought it would be better to have the last pointer point to a different machine and continue reading from that machine. A network call seemed acceptable because this program will be used on an intranet, but I still couldn't find a solid solution on paper.
Is there such an example that I can study and try to derive a better solution from? Is this solution feasible?
An example:
Machine 1:
1st node : [data:x, *next: 2nd Node address],
2nd node : [data:123, *next: 3rd Node address],
...
// at this point RAM is almost full
// receive next instance's ip
(n-1)th node : [data:987, *next: nth Node address],
nth node : [data:x2t, type: LastNodeInMachine, *next: nullptr]
Machine 2:
1st node == (n+1) node : [data:x, *next: 2nd Node address],
... and so on
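A minimal Erlang sketch of the scheme described above (the record and function names are made up, and the remote hop is only hinted at):

%% A node's `next` is either a local key, a marker naming the machine that
%% holds the continuation of the list, or undefined at the true end.
-record(list_node, {data,
                    next}).  %% {local, Key} | {remote, Machine} | undefined

%% Traverse the part of the list stored on this machine (Store is a map of
%% Key -> #list_node{}). When the special "last node on this machine" marker
%% is reached, report which machine holds the continuation instead of
%% following a local pointer.
traverse(Store, Key, Acc) ->
    #list_node{data = D, next = Next} = maps:get(Key, Store),
    case Next of
        {local, NextKey}  -> traverse(Store, NextKey, [D | Acc]);
        {remote, Machine} -> {continue_on, Machine, lists:reverse([D | Acc])};
        undefined         -> {done, lists:reverse([D | Acc])}
    end.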

Processes spawned on connected nodes get the same PID

I have four Erlang nodes working together in a multi-process application.
In my setup, one node runs a monitor process which draws the locations of the processes in the area, and the three other nodes handle the processes' location and movement. On the monitor I use an ETS table to store the locations, where the key is the process PID. I have noticed that the nodes create processes which have the same PIDs, which obviously interferes with the management of the entire system.
I have tried to connect the nodes with:
net_adm:ping(...).
net_kernel:connect(...).
I was hoping that once the nodes were aware of each other they would assign different PIDs, but that did not work.
The PIDs may be printed the same, e.g. <0.42.0>, but that's just an output convention: PIDs on the local node are printed with the first number being 0. If you'd send this PID to another node and print it there, it would be printed as <2265.42.0> or similar. PIDs are always associated with the name of the node where the process is running, and you can extract it with node(Pid). Therefore, PIDs from different nodes will never compare equal.
This answer goes into more details about the structure of a PID.
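A quick shell illustration of that point (node names and pid values here are made up):

%% On node 'me@host':
Local = spawn(fun() -> timer:sleep(infinity) end).
%% prints as <0.97.0> locally
Remote = spawn('other@host', fun() -> timer:sleep(infinity) end).
%% prints as e.g. <9656.104.0> -- the non-zero first number marks a remote node
node(Local).        %% 'me@host'
node(Remote).       %% 'other@host'
Local =:= Remote.   %% false -- pids carry their node, so they never collide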

The impact of a distributed application configuration on node discovery via net_adm:ping/0

I am experiencing different behavior from net_adm:ping/1 when it is done in the context of a Distributed Application.
I have an application that pings a well-known node on start-up and in that way discovers all nodes in a mesh of connected nodes.
When I start this application on a single node (non-distributed configuration), the net_adm:ping/1 followed by a nodes/0 reports 4 other nodes (this is correct). The 4 nodes are on 2 different physical machines, so what is returned is the following n1#machine_1, n2#machine_2, n3#machine_2, n4#machine_1 (ip addresses are actually returned, not machine_x).
When part of a two-node distributed application, on the node where the application starts, the net_adm:ping/1 followed by a nodes/0 reports 2 nodes, one from each machine(n1#machine1, n2#machine2). A second call to nodes/0 after about a 750 ms delay results in the correct 5 nodes being found. Two of the three missing nodes are required for my application to work and so, not finding them, the application dies.
I am using R15B02
Is latency regarding the transitive node-discovery process known to be different when some of the nodes in the mesh are participating in distributed application configuration?
The kernel application documentation describes a way to synchronize nodes, holding the boot phase until everything is in place and the node is ready to move forward. Here are the options:
sync_nodes_mandatory = [NodeName]
Specifies which other nodes must be alive in order for this node to start properly. If some node in the list does not start within the specified time, this node will not start either. If this parameter is undefined, it defaults to [].
sync_nodes_optional = [NodeName]
Specifies which other nodes can be alive in order for this node to start properly. If some node in this list does not start within the specified time, this node starts anyway. If this parameter is undefined, it defaults to the empty list.
A config file using them could look as follows:
[{kernel,
  [{sync_nodes_mandatory, [b@ferdmbp, c@ferdmbp]},
   {sync_nodes_timeout, 30000}]
}].
Start the node a@ferdmbp by calling erl -sname a -config config-file-above. The downside of this approach is that each node needs its own config file.
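If you would rather not rely on kernel-level synchronization, an application-level workaround is to keep pinging the well-known node and re-checking nodes() until everything you need has shown up (a sketch; wait_for_nodes/3 and the retry interval are my own invention):

%% Ping the well-known node, then poll nodes() until every required node is
%% visible or we run out of retries.
wait_for_nodes(WellKnown, Required, Retries) ->
    net_adm:ping(WellKnown),
    case Required -- nodes() of
        [] ->
            ok;
        _Missing when Retries > 0 ->
            timer:sleep(250),   %% give transitive discovery some time
            wait_for_nodes(WellKnown, Required, Retries - 1);
        Missing ->
            {error, {missing_nodes, Missing}}
    end.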

What is the difference between "raw" dirty operations, and dirty operations within mnesia:async_transaction

What is the difference between a series of mnesia:dirty_ commands executed within a function passed to mnesia:async_dirty() and those very same operations executed "raw"?
I.e., is there any difference between doing:
mnesia:dirty_write({table, Rec1}),
mnesia:dirty_write({table, Rec1}),
mnesia:dirty_write({table, Rec1})
and
F = fun() ->
        mnesia:dirty_write({table, Rec1}),
        mnesia:dirty_write({table, Rec1}),
        mnesia:dirty_write({table, Rec1})
    end,
mnesia:async_dirty(F)
Thanks
Let's first quote the User's Guide on the async_dirty context:
By passing the same "fun" as argument to the function mnesia:async_dirty(Fun [, Args]) it will be performed in dirty context. The function calls will be mapped to the corresponding dirty functions. This will still involve logging, replication and subscriptions but there will be no locking, local transaction storage or commit protocols involved. Checkpoint retainers will be updated but will be updated "dirty". Thus, they will be updated asynchronously. The functions will wait for the operation to be performed on one node but not the others. If the table resides locally no waiting will occur.
The two options you have provided will be executed the same way. However, when you execute the dirty functions outside the fun, as in option one, each one is a separate call into mnesia. With async_dirty, the 3 calls will be bundled, and mnesia will only wait until all 3 are completed on the local node before returning. However, the behaviour of these two may differ in a multi-node mnesia cluster. Do some tests :)

Mnesia: How to lock multiple rows simultaneously so that I can write/read a "consistent" set of records

HOW I WISH I HAD PHRASED MY QUESTION TO BEGIN WITH
Take a table with 26 keys, a-z, and let them have integer values.
Create a process, Ouch, that does two things over and over again:
In one transaction, write random values for a, b, and c such that those values always sum to 10
In another transaction, read the values for a, b and c and complain if their values do not sum to 10
If you spin up even a few of these processes you will see that very quickly a, b and c are in a state where their values do not sum to 10. I believe there is no way to ask mnesia to "lock these 3 records before starting the writes (or reads)"; one can only have mnesia lock the records as it gets to them (so to speak), which allows the set of records' values to violate my "must sum to 10" constraint.
If I am right, solutions to this problem include
lock the entire table before writing (or reading) the set of 3 records -- I hate to lock the whole table for 3 records,
Create a process that keeps track of who is reading or writing which keys and protects bulk operations from anyone else writing or reading until the operation is completed. Of course I would have to make sure that all processes made use of this... crap, I guess this means writing my own AccessMod as the fourth parameter to activity/4, which seems like a non-trivial exercise
Some other thing that I am not smart enough to figure out.
thoughts?
Ok, I'm an ambitious Erlang newbie, so sorry if this is a dumb question, but
I am building an application-specific, in-memory distributed cache and I need to be able to write sets of Key, Value pairs in one transaction and also retrieve sets of values in one transaction. In other words I need to
1) Write 40 key,value pairs into the cache and ensure that no one else can read or write any of these 40 keys during this multi-key write operation; and,
2) Read 40 keys in one operation and get back 40 values knowing that all 40 values have been unchanged from the moment that this read operation started until it ended.
The only way I can think of doing this is to lock the entire table at the beginning of fetch_keylist([ListOfKeys]) or write_keylist([KeyValuePairs]), but I don't want to do this because I have many processes simultaneously doing their own multi-key reads and writes, and I don't want to lock the entire table any time any process needs to read/write a relatively small subset of records.
Help?
Trying to be more clear: I do not think this is just about using vanilla transactions
I think I am asking a more subtle question than this. Imagine that I have a process that, within a transaction, iterates through 10 records, locking them as it goes. Now imagine this process starts but before it iterates to the 3rd record ANOTHER process updates the 3rd record. This will be just fine as far as transactions go because the first process hadn't locked the 3rd record (yet) and the OTHER process modified it and released it before the first process got to it. What I want is to be guaranteed that once my first process starts that no other process can touch the 10 records until the first process is done with them.
PROBLEM SOLVED - I'M AN IDIOT... I guess...
Thank you all for your patience, especially Hynek -Pichi- Vychodil!
I prepared my test code to show the problem, and I could in fact reproduce it. I then simplified the code for readability and the problem went away; I was not able to reproduce it again. This is both embarrassing and mysterious to me, since I had this problem for days. Also, mnesia never complained that I was executing operations outside of a transaction, and I have no dirty operations anywhere in my code, so I have no idea how I was able to introduce this bug!
I have pounded the notion of Isolation into my head and will not doubt that it exists again.
Thanks for the education.
Actually, turns out the problem was using try/catch around mnesia operations within a transaction. See here for more.
A Mnesia transaction will do exactly this for you; that is what transactions are for, unless you use dirty operations. So just place your write and read operations in one transaction and Mnesia will do the rest. All operations in one transaction are performed as one atomic operation. Mnesia's transaction isolation level is what is sometimes known as "serializable", i.e. the strongest isolation level.
Edit:
It seems you missed one important point about concurrent processes in Erlang. (To be fair, this is not only true in Erlang but in any truly concurrent environment; if someone argues otherwise, it is not a truly concurrent environment.) You can't tell which of two actions happened first unless you do some synchronization, and the only way to do that synchronization is message passing. The only ordering guarantee you get about messages in Erlang is the ordering of messages sent from one process to another: when you send two messages M1 and M2 from process A to process B, they arrive in that order. But if you send message M1 from A to B and message M2 from C to B, they can arrive in any order; after all, how could you even tell which message was sent first? It is even worse if you send M1 from A to B, then M2 from A to C, and when M2 arrives at C it sends M3 from C to B: you have no guarantee that M1 arrives at B before M3. It may happen to hold within one VM in the current implementation, but you can't rely on it, because it is not guaranteed and could change even in the next version of the VM, simply due to how message passing is implemented between different schedulers.
This illustrates the problem of event ordering between concurrent processes. Now back to the mnesia transaction. A mnesia transaction has to be a side-effect-free fun, which means no messages may be sent out of the transaction. So you can't tell which transaction started first; the only thing you can tell is whether a transaction succeeded, and you can only determine their order by their effects. When you consider this, your subtle clarification makes no sense. One transaction will read all the keys as one atomic operation, even if it is implemented as reading one key at a time inside the transaction machinery, and your write operation will also be performed as one atomic operation. You can't tell whether the write to the 4th key in the second transaction happened after the read of the 1st key in the first transaction, because it is not observable from the outside. Both transactions will be performed, in some order, as separate atomic operations. From the outside, all the keys appear to be read at the same point in time, and it is mnesia's job to enforce that. If you send a message from inside a transaction you violate the transaction's properties and you can't be surprised if it behaves strangely; to be concrete, such a message may be sent many times, because the transaction fun can be retried.
Edit2:
If you spin up even a few of these processes you will see that very
quickly a, b and c are in a state where their values do not sum to 10.
I'm curious why you think this would happen. Have you tested it? Show me your test case and I will show you mine:
-module(transactions).

-export([start/2, sum/0, write/0]).

start(W, R) ->
    mnesia:start(),
    {atomic, ok} = mnesia:create_table(test, [{ram_copies, [node()]}]),
    F = fun() ->
            ok = mnesia:write({test, a, 10}),
            [ok = mnesia:write({test, X, 0}) ||
                X <- [b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z]],
            ok
        end,
    {atomic, ok} = mnesia:transaction(F),
    F2 = fun() ->
             S = self(),
             erlang:send_after(1000, S, show),
             [spawn_link(fun() -> writer(S) end) || _ <- lists:seq(1, W)],
             [spawn_link(fun() -> reader(S) end) || _ <- lists:seq(1, R)],
             collect(0, 0)
         end,
    spawn(F2).

collect(R, W) ->
    receive
        read  -> collect(R + 1, W);
        write -> collect(R, W + 1);
        show ->
            erlang:send_after(1000, self(), show),
            io:format("R: ~p, W: ~p~n", [R, W]),
            collect(R, W)
    end.

keys() ->
    element(random:uniform(6),
            {[a,b,c], [a,c,b], [b,a,c], [b,c,a], [c,a,b], [c,b,a]}).

sum() ->
    F = fun() ->
            lists:sum([X || K <- keys(), {test, _, X} <- mnesia:read(test, K)])
        end,
    {atomic, S} = mnesia:transaction(F),
    S.

write() ->
    F = fun() ->
            [A, B] = L = [random:uniform(10) || _ <- [1, 2]],
            [ok = mnesia:write({test, K, V}) ||
                {K, V} <- lists:zip(keys(), [10 - A - B | L])],
            ok
        end,
    {atomic, ok} = mnesia:transaction(F),
    ok.

reader(P) ->
    case sum() of
        10 ->
            P ! read,
            reader(P);
        _ ->
            io:format("ERROR!!!~n", []),
            exit(error)
    end.

writer(P) ->
    ok = write(),
    P ! write,
    writer(P).
If this did not work it would be a really serious problem; there are serious applications, including payment systems, which rely on it. If you have a test case which shows it is broken, please report a bug at erlang-bugs@erlang.org
Have you tried Mnesia events? You can have the reader subscribe to Mnesia's table events, especially write events, so as not to interrupt the process doing the writing. That way, mnesia just keeps sending a copy of what has been written, in real time, to the other process, which checks what the values are at any one time. Take a look at this:
subscriber() ->
    mnesia:subscribe({table, YOUR_TABLE_NAME, simple}),
    %% OR mnesia:subscribe({table, YOUR_TABLE_NAME, detailed}),
    wait_events().

wait_events() ->
    receive
        %% For simple events
        {mnesia_table_event, {write, NewRecord, ActivityId}} ->
            %% Analyse the written record as you wish
            wait_events();
        %% For detailed events
        {mnesia_table_event, {write, YOUR_TABLE, NewRecord, [OldRecords], ActivityId}} ->
            %% Analyse the written record as you wish
            wait_events();
        _Any -> wait_events()
    end.
Now you spawn your analyser as a process like this:
spawn(?MODULE,subscriber,[]).
This lets the whole thing run without any process being blocked; Mnesia need not lock any table or record, because now what you have is a writer process and an analyser process. The whole thing will run in real time. Remember that there are many other events that you can make use of, if you wish, by pattern matching on them in the subscriber's wait_events() receive body.
It's possible to build a heavy-duty gen_server or a complete application intended for the reception and analysis of all your mnesia events. It's usually better to have one capable subscriber than many failing event subscribers. If I have understood your question well, this non-blocking solution fits your requirements.
mnesia:read/3 with write locks seems to be sufficient.
Mnesia's transactions are implemented with read-write locks, and the locks are well-formed (held until the end of the transaction), so the isolation level is serializable.
The granularity of the locks is per record, as long as you access records by primary key.
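In other words, something along these lines should give the behaviour asked for (a sketch only; the {Tab, Key, Value} table layout and the function names are assumptions):

%% Take write locks on every key in the set up front, so no other transaction
%% can read or write any of them until this transaction commits.
write_keylist(Tab, KVPairs) ->
    mnesia:activity(transaction, fun() ->
        [mnesia:read(Tab, K, write) || {K, _} <- KVPairs],   %% lock all keys
        [mnesia:write({Tab, K, V}) || {K, V} <- KVPairs],
        ok
    end).

%% Read a consistent set: the write locks keep all the values stable for the
%% whole transaction.
read_keylist(Tab, Keys) ->
    mnesia:activity(transaction, fun() ->
        [case mnesia:read(Tab, K, write) of
             [{Tab, K, V}] -> {K, V};
             []            -> {K, undefined}
         end || K <- Keys]
    end).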
