The impact of a distributed application configuration on node discovery via net_adm:ping/1 - erlang

I am seeing different behavior from net_adm:ping/1 when it is called in the context of a distributed application.
I have an application that pings a well-known node on start-up and in that way discovers all nodes in a mesh of connected nodes.
When I start this application on a single node (non-distributed configuration), net_adm:ping/1 followed by nodes/0 reports 4 other nodes (which is correct). The 4 nodes are on 2 different physical machines, so what is returned is n1@machine_1, n2@machine_2, n3@machine_2, n4@machine_1 (IP addresses are actually returned, not machine_x).
When the node is part of a two-node distributed application, on the node where the application starts, net_adm:ping/1 followed by nodes/0 reports only 2 nodes, one from each machine (n1@machine_1, n2@machine_2). A second call to nodes/0 after a delay of about 750 ms finds the correct 5 nodes. Two of the three missing nodes are required for my application to work, so when they are not found the application dies.
I am using R15B02
Is the latency of the transitive node-discovery process known to be different when some of the nodes in the mesh are participating in a distributed application configuration?
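The discovery step is essentially the following (a minimal sketch; the node name is a placeholder for the well-known node):
%% Ping one well-known node; transitive connections then populate nodes().
pong = net_adm:ping('n1@machine_1'),
nodes().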

The kernel application documentation describes how to synchronize nodes, holding the boot phase until the required nodes are up and everything is in place. Here are the options:
sync_nodes_mandatory = [NodeName]
Specifies which other nodes must be alive in order for this node to start properly. If some node in the list does not start within the specified time, this node will not start either. If this parameter is undefined, it defaults to [].
sync_nodes_optional = [NodeName]
Specifies which other nodes can be alive in order for this node to start properly. If some node in this list does not start within the specified time, this node starts anyway. If this parameter is undefined, it defaults to the empty list.
A file using them could look as follows:
[{kernel,
  [{sync_nodes_mandatory, [b@ferdmbp, c@ferdmbp]},
   {sync_nodes_timeout, 30000}]
}].
The node a@ferdmbp is then started by calling erl -sname a -config config-file-above. The downside of this approach is that each node needs its own config file.
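For completeness, each of the other nodes would get a similar file naming the remaining nodes; a sketch for b@ferdmbp, using the same hypothetical node names:
[{kernel,
  [{sync_nodes_mandatory, [a@ferdmbp, c@ferdmbp]},
   {sync_nodes_timeout, 30000}]
}].
That node would then be started with erl -sname b -config its-own-config-file.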

Related

Distributing in-memory linked list

I have a program which is built around a singly linked list. Different programs create some form of data, and that data is sent to the linked-list module to be added. As long as RAM is available, the program works as intended. Periodically (about once a year) I archive the entire linked list to disk (due to a requirement, I archive all of it). So far so good.
What happens if I want to add a new node to the list while RAM is full and I haven't yet archived and freed the memory? This might occur when the producer count goes up, or, regardless of the producer count, simply because more data is created depending on where the program is used, etc. I couldn't find a clear way to scale the in-memory linked list. There is a workaround in my head, but I don't even know whether it works, so I thought it better to ask here.
When RAM starts to get almost full, I would create a new instance of the linked-list program (just another machine in the cloud, a new physical computer on premises, whatever).
I do have a service discovery module (something like ZooKeeper); it will detect the newly created machine and add it to the list.
When the first instance is almost at its limit, it will check whether an available instance exists; if there is one, it will relay the new node to that instance and update its own last node's next pointer to something special. If you traverse the list from start to finish across all the machines, then every time you reach this special node it tells you which machine holds the next node, and traversal continues on that machine.
Since this is not a hash map or anything of that nature, I can't just replicate the service and, for example, route an incoming request to a particular machine based on a key.
Rather than archiving part of the old data, loading it back into RAM and continuing like that, I thought it would be better to have the last pointer point to a different machine and continue reading from that machine. A network call seemed acceptable because this program will be used on an intranet, but I still couldn't find a solid solution on paper.
Is there such an example that I can study to try to find a better solution? Is this solution feasible?
An example:
Machine 1:
1st node : [data:x, *next: 2nd Node address],
2nd node : [data:123, *next: 3rd Node address],
...
// at this point RAM is almost full
// receive next instance's ip
(n-1)th node : [data:987, *next: nth Node address],
nth node : [data:x2t, type: LastNodeInMachine, *next: nullptr]
Machine 2:
1st node == (n+1)th node : [data:x, *next: 2nd Node address],
... and so on

Processes spawned on connected nodes get the same PID

I have four Erlang nodes working together on a multi-process application.
In my setup one node runs a monitor process which draws the locations of the processes on the area, and the three other nodes handle the processes' location and movement. On the monitor I use an ETS table to store the locations, with the process PID as the key. I have noticed that the nodes create processes which have the same PIDs, which obviously interferes with the management of the entire system.
I have tried to connect the nodes with:
net_adm:ping(...).
net_kernel:connect(...).
I was hoping that once the nodes were aware of each other they would assign different PIDs, but that did not work.
The PIDs may be printed the same, e.g. <0.42.0>, but that's just an output convention: PIDs on the local node are printed with the first number being 0. If you'd send this PID to another node and print it there, it would be printed as <2265.42.0> or similar. PIDs are always associated with the name of the node where the process is running, and you can extract it with node(Pid). Therefore, PIDs from different nodes will never compare equal.
This answer goes into more details about the structure of a PID.
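A quick way to see this from the shell (a minimal sketch; 'b@host' stands for one of your connected nodes):
%% Run on node a@host, with b@host already connected:
Pid = spawn('b@host', fun() -> receive stop -> ok end end),
io:format("~p~n", [Pid]),   %% prints e.g. <9152.42.0>, not <0.42.0>
node(Pid).                  %% returns 'b@host'
Since PIDs from different nodes never compare equal, the PID on its own already works as a unique ETS key across the cluster.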

Why does Riak store all my documents on only one node? n_val is equal to 3

I have built a 5-node cluster using Riak 2.0pre11 on EC2 servers. I installed Riak, got it working, then repeated the same actions on 4 more servers using a bash script. At that point I used riak-admin cluster join riak@node1.example.com on nodes 2 through 5 to form a cluster.
Using the Python Riak client I wrote a script to send 10,000 documents to Riak. That works fine, and I wrote another script to retrieve a doc, which also worked fine. Other than specifying the use of protobufs I haven't specified any other options when storing keys. I stored all the docs via a connection to node1.
However, Riak seems to be storing all 3 replicas on the same node; in other words, the storage used on node1 is about 3x the size of the original HTML docs.
The script connected to node 1, and that is where all the docs are stored. I changed the script to connect to node 2 and sent 10,000 more, which also all ended up on node 1. I used the command du -h /data/riak/bitcask to verify the aggregate stored size of the objects. On nodes 2 through 4 there are only a few K, which is the overhead of an empty Bitcask datastore.
For each document I specified a key similar to this:
http://www.example.com/blogstore/007529.html4787somehash4787947:2014-03-12T19:14:32.887951Z
The first part of every key is identical (testing); only the .html name and the ISO 8601 timestamp differ. Is it possible that I have somehow subverted the hashing function?
Basically I used a default config. What could be wrong? Since Riak 2.0 uses a different config format, here is a fragment of the generated config for riak-core in the old format:
{riak_core,
 [{enable_consensus,false},
  {platform_log_dir,"/var/log/riak"},
  {platform_lib_dir,"/usr/lib/riak/lib"},
  {platform_etc_dir,"/etc/riak"},
  {platform_data_dir,"/var/lib/riak"},
  {platform_bin_dir,"/usr/sbin"},
  {dtrace_support,false},
  {handoff_port,8099},
  {ring_state_dir,"/datapool/riak/ring"},
  {handoff_concurrency,2},
  {ring_creation_size,64},
  {default_bucket_props,
   [{n_val,3},
    {last_write_wins,false},
    {allow_mult,true},
    {basic_quorum,false},
    {notfound_ok,true},
    {rw,quorum},
    {dw,quorum},
    {pw,0},
    {w,quorum},
    {r,quorum},
    {pr,0}]}]}
If the bitcask directory only grows on a single node, it sounds like the nodes might not be communicating. Please run riak-admin member-status to verify that all nodes in the cluster are active.
Once you have issued the riak-admin cluster join <node> commands on all the nodes joining the cluster, you will also need to run riak-admin cluster plan to verify that the plan is correct before committing it with riak-admin cluster commit. These commands are described in greater detail in the Riak cluster administration documentation.

Can I use the pool module to do load balancing?

I'm looking for an automatic way to do load balancing, and this module caught my attention.
As the manual says,
pool can be used to run a set of Erlang nodes as a pool of computational processors.
It is organized as a master and a set of slave nodes and includes the following features:
The slave nodes send regular reports to the master about their current load.
Queries can be sent to the master to determine which node will have the least load.
The BIF statistics(run_queue) is used for estimating future loads.
It returns the length of the queue of ready to run processes in the Erlang runtime system.
How often do the slave nodes send these reports, and how much load does that add?
Is this a proper way to do load balancing?
Reports are sent every 2 seconds and use information gathered from statistics(run_queue) to determine the node with the least load. run_queue returns the queue size of the current node's scheduler.
When you call pool:get_node/0 you are getting the node with the lowest number of tasks waiting to be executed on its scheduler. Keep in mind that nodes are kept in sorted order, so calls to pool:get_node/0 do not directly query nodes, but rather rely on information that could be up to 2 seconds old.
If you need a load balanced pool of nodes, pool works great.
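A minimal usage sketch (my_mod:my_fun/0 is a placeholder for your own work):
%% Start a pool named my_pool; hosts for the slave nodes come from the .hosts.erlang file.
pool:start(my_pool),
%% Ask for the node that currently reports the shortest run queue ...
Node = pool:get_node(),
%% ... or let pool pick the node and spawn the work there directly.
Pid = pool:pspawn(my_mod, my_fun, []).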
Here is some more info from the pool.erl source:
%% Supplies a computational pool of processors.
%% The chief user interface function here is get_node()
%% Which returns the name of the nodes in the pool
%% with the least load !!!!
%% This function is callable from any node including the master
%% That is part of the pool
%% nodes are scheduled on a per usgae basis and per load basis,
%% Whenever we use a node, we put at the end of the queue, and whenever
%% a node report a change in load, we insert it accordingly

How do I remove an extra node

I have a group of Erlang nodes that are replicating their data through Mnesia's "extra_db_nodes"... I need to upgrade hardware and software, so I have to detach some nodes as I make my way from node to node.
How does one remove a node and still preserve the data that was inserted?
[update] Removing nodes is as important as adding them. Over time, as your cluster grows, it must also contract. If not, then Mnesia is going to be busy trying to send data to nonexistent nodes, filling up queues and keeping the network busy.
[final update] After poring through the Erlang/Mnesia source code I was able to determine that it is not possible to completely disassociate nodes. While del_table_copy removes the linkage between tables, it is incomplete. I would close this question, but none of the close reasons are adequate.
I wish I had found this a long time ago: http://weblambdazero.blogspot.com/2008/08/erlang-tips-and-tricks-mnesia.html
Basically, with a properly functioning cluster:
Log in to the node to be removed and stop Mnesia:
mnesia:stop().
Log in to a different node in the cluster and delete the schema copy:
mnesia:del_table_copy(schema, 'node@host.domain').
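Put together, run from a node that stays in the cluster (a sketch; 'old@host.domain' is a placeholder for the node being removed):
%% Stop mnesia on the departing node ...
rpc:call('old@host.domain', mnesia, stop, []),
%% ... then drop its schema copy from a surviving node.
mnesia:del_table_copy(schema, 'old@host.domain').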
I'm extremely late to the party, but came across this info in the doc when looking for a solution to the same problem:
"The function call
mnesia:del_table_copy(schema,
mynode#host) deletes the node
'mynode#host' from the Mnesia system.
The call fails if mnesia is running on
'mynode#host'. The other mnesia nodes
will never try to connect to that node
again. Note, if there is a disc
resident schema on the node
'mynode#host', the entire mnesia
directory should be deleted. This can
be done with mnesia:delete_schema/1.
If mnesia is started again on the the
node 'mynode#host' and the directory
has not been cleared, mnesia's
behaviour is undefined."
(http://www.erlang.org/doc/apps/mnesia/Mnesia_chap5.html#id74278)
I think the following might do what you desire:
AllTables = mnesia:system_info(tables),
DataTables = lists:filter(fun(Table) -> Table =/= schema end, AllTables),
RemoveTableCopy = fun(Table, Node) ->
    Nodes = mnesia:table_info(Table, ram_copies) ++
            mnesia:table_info(Table, disc_copies) ++
            mnesia:table_info(Table, disc_only_copies),
    case lists:member(Node, Nodes) of
        true -> mnesia:del_table_copy(Table, Node);
        false -> ok
    end
end,
[RemoveTableCopy(Tbl, 'gone@gone_host') || Tbl <- DataTables],
rpc:call('gone@gone_host', mnesia, stop, []),
%% delete_schema/1 takes the list of nodes whose mnesia directories should be wiped
rpc:call('gone@gone_host', mnesia, delete_schema, [['gone@gone_host']]),
RemoveTableCopy(schema, 'gone@gone_host').
Though, I haven't tested it since my scenario is slightly different.
I've certainly used this method to perform this (supporting the mnesia:del_table_copy/2 use). See removeNode/1 below:
-module(tool_bootstrap).
-export([bootstrapNewNode/1, closedownNode/0,
         finalBootstrap/0, removeNode/1]).
-include_lib("records.hrl").
-include_lib("stdlib/include/qlc.hrl").

bootstrapNewNode(Node) ->
    %% Make the given node part of the family and start the cloud on it
    mnesia:change_config(extra_db_nodes, [Node]),
    %% Now make the other node set things up
    rpc:call(Node, tool_bootstrap, finalBootstrap, []).

removeNode(Node) ->
    rpc:call(Node, tool_bootstrap, closedownNode, []),
    mnesia:del_table_copy(schema, Node).

finalBootstrap() ->
    %% Code removed to actually copy over my tables etc...
    application:start(cloud).

closedownNode() ->
    application:stop(cloud), mnesia:stop().
If you have replicated the table (added table copies) on nodes other than the one you're removing, then you're already fine - just remove the node.
If you wanted to be slightly tidier you'd delete the table copies from the node you're about to remove first via mnesia:del_table_copy/2.
Generally, mnesia gracefully handles node loss and detects node rejoin (rebooted nodes obtain new table copies from nodes that kept running, nodes that didn't reboot are detected as a network partition event). Mnesia does not consume CPU or network traffic for nodes that have gone down. I think, though I haven't confirmed it in the source, mnesia won't reconnect to nodes that have gone down automatically - the node that goes down is expected to reboot (mnesia) and reconnect.
mnesia:add_table_copy/3, mnesia:move_table_copy/3 and mnesia:del_table_copy/2 are the functions you should look at for live schema management.
The extra_db_nodes parameter should only be used when initialising a new DB node - once a new node has a copy of the schema it doesn't need the extra_db_nodes parameter.
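For reference, a sketch of that one-time initialisation of a fresh, empty node ('existing@host' and my_table are placeholders):
%% On the new node, started with the same cookie as the rest of the cluster:
mnesia:start(),
{ok, _} = mnesia:change_config(extra_db_nodes, ['existing@host']),
%% Make the local schema disc-resident, then copy over the tables you need.
mnesia:change_table_copy_type(schema, node(), disc_copies),
mnesia:add_table_copy(my_table, node(), disc_copies).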
