I have a distributed application.
In it, a master node creates a Mnesia schema with 4 tables. Some of them are replicated to other nodes, some are not.
When a new node starts, it registers with the master node, is added to the schema, and the data is replicated to it.
How can I ensure that my replication is finished?
I tried the following:
Timeout=60000,
TabList = [tab1, tab2, tab3, tab4],
mnesia:wait_for_tables(TabList, Timeout).
However, instead of waiting up to 60 seconds, it fails in less than 5 seconds with this error:
{{badmatch,{aborted,{no_exists,tab1}}}
Obviously it does not work.
When a new node joins the cluster, an RPC call from the master node runs the following function on the new node:
start_Mnesia(MasterNode) ->
    mnesia:start(),
    mnesia:change_config(extra_db_nodes, [MasterNode]),
    Tabs = mnesia:system_info(tables) -- [schema],
    [mnesia:add_table_copy(Tab, node(), ram_copies) || Tab <- Tabs].
Does add_table_copy/3 also wait until the data has been written to the ram_copies?
Thanks.
When a node joins your Mnesia cluster it is already synchronised, regardless of which table copies it, or other nodes, do or do not hold.
You should see that a new node, after registering with your master and being added to the cluster, can already access all your tables. Adding a copy of a table doesn't change that, regardless of the state/stage of that copy.
When you add a copy of a table on your new node you can continue to run transactions, during and after replication, and whether or not the data has already been replicated to the node originating the transaction makes no difference to the correctness of the result.
So, if you are concerned just with synchronisation in terms of keeping your transactions ACID, then don't worry.
If you are concerned about when your data is actually replicated and stored safely on the other nodes, that's a different matter. In this case, I have observed that mnesia:add_table_copy(Table, NewNode, disc_copies) blocks, and returns only once NewNode has copied the data to its filesystem.
Remember though, unless you run mnesia:sync_transaction/3 all the time you don't have guarantees about data actually being on any disc after a transaction completes anyway.
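For the question's setup, a minimal sketch could add the local copies, check each result instead of discarding it (as the list comprehension in start_Mnesia does), and then block on mnesia:wait_for_tables/2. The module and function names (replica_sync, ensure_local_copies/3, durable_write/1) are hypothetical, and treating an {already_exists, Tab, Node} abort as success is an assumption about what you want when a copy is already present. A {no_exists, Tab} abort, like the one in the question, is returned as an explicit error.

```erlang
-module(replica_sync).
-export([ensure_local_copies/3, durable_write/1]).

%% Hypothetical helper: add a CopyType copy of every table in Tabs on this
%% node, collect every add_table_copy/3 failure except "copy already exists",
%% then wait until all tables are loaded locally.
ensure_local_copies(Tabs, CopyType, Timeout) ->
    Failures = [{Tab, Reason}
                || Tab <- Tabs,
                   {aborted, Reason} <- [mnesia:add_table_copy(Tab, node(), CopyType)],
                   not already_exists(Reason)],
    case Failures of
        [] ->
            %% Returns ok, {timeout, BadTabs} or {error, Reason}.
            mnesia:wait_for_tables(Tabs, Timeout);
        _ ->
            {error, Failures}
    end.

already_exists({already_exists, _Tab, _Node}) -> true;
already_exists(_Other) -> false.

%% Hypothetical helper: sync_transaction/1 returns only after the transaction
%% has been committed on all involved nodes (and logged, for disc copies).
durable_write(Record) ->
    mnesia:sync_transaction(fun() -> mnesia:write(Record) end).
```

Checking the return values is the design choice that matters here: add_table_copy/3 reports problems through {aborted, Reason}, which are easy to lose when the results are thrown away.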
I have a requirement to remove a running node from a Mnesia cluster. This is a legitimate node that needs to have some maintenance performed. However, we want to keep this node running and servicing requests. I found this post, which helps remove it from the other nodes. However, once you restart Mnesia on the orphaned node, it rejoins the other nodes in the cluster.
From each of the non-orphan nodes, I run a script that does the following:
rpc:call('node_to_be_orphaned', mnesia, stop, []),
mnesia:del_table_copy(schema, 'node_to_be_orphaned'),
%% At this point, mnesia:system_info(db_nodes) shows that the node has indeed been removed.
rpc:call('node_to_be_orphaned', mnesia, start, []),
Now it's back. Ugh!
So I then tried to flip it and remove the other nodes from the orphan first, by adding the following:
rpc:call(ThisNode, mnesia, stop, []),
rpc:call('node_to_be_orphaned', mnesia, del_table_copy, [schema, node()]),
rpc:call(ThisNode, mnesia, start, []),
This just creates a loop with no difference.
Is there a way to take a node out of mnesia clustering while leaving it up-and-running?
Any and all guidance is greatly appreciated.
The schema is what is causing your problem. You can add nodes, but removing them while keeping the table copies is, err, difficult. Besides receiving a new schema, this is what happens when a node is connected to a distributed schema:
Adding a node to the list of nodes where the schema is replicated will affect two things. First it allows other tables to be replicated to this node. Secondly it will cause Mnesia to try to contact the node at start-up of disc-full nodes.
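To check which nodes a given node currently treats as part of the Mnesia system (for example before and after trying to detach a node), a small inspection helper can report the relevant system_info/1 and table_info/2 items. The module cluster_view and the map keys are hypothetical; this is a sketch for inspection only and assumes Mnesia is running on the node where it is called.

```erlang
-module(cluster_view).
-export([membership/0]).

%% Hypothetical helper: this node's view of the Mnesia cluster.
membership() ->
    %% Nodes that make up the Mnesia database, as recorded in the schema.
    DbNodes = mnesia:system_info(db_nodes),
    %% Subset of those nodes where Mnesia is currently running.
    Running = mnesia:system_info(running_db_nodes),
    %% Nodes holding a disc or RAM copy of the schema table itself.
    SchemaCopies = mnesia:table_info(schema, disc_copies)
                   ++ mnesia:table_info(schema, ram_copies),
    #{db_nodes => lists:sort(DbNodes),
      running => lists:sort(Running),
      schema_copies => lists:usort(SchemaCopies)}.
```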
This is what the documentation says about disconnecting a node from a distributed table while still keeping the schema running on the node:
The function call mnesia:del_table_copy(schema, mynode@host) deletes the node 'mynode@host' from the Mnesia system. The call fails if mnesia is running on 'mynode@host'. The other mnesia nodes will never try to connect to that node again. Note, if there is a disc resident schema on the node 'mynode@host', the entire mnesia directory should be deleted. This can be done with mnesia:delete_schema/1. If mnesia is started again on the node 'mynode@host' and the directory has not been cleared, mnesia's behaviour is undefined.
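Following the documented order, a detach procedure run from a node that stays in the cluster could look like the sketch below. The module name orphan and the error tuples are hypothetical. It assumes the target node does not have extra_db_nodes set in its own configuration (otherwise a restarted Mnesia would reconnect), and it discards all Mnesia data on the target, so export anything you need first.

```erlang
-module(orphan).
-export([orphan_node/1]).

%% Hypothetical helper, run on a node that STAYS in the cluster.
%% Order: stop Mnesia on Target, remove Target from the schema, wipe
%% Target's Mnesia directory, then start Mnesia there again.
%% WARNING: all Mnesia data stored on Target is discarded.
orphan_node(Target) ->
    case rpc:call(Target, mnesia, stop, []) of
        stopped -> remove_from_schema(Target);
        Other -> {error, {stop_failed, Other}}
    end.

remove_from_schema(Target) ->
    %% Fails if Mnesia is still running on Target.
    case mnesia:del_table_copy(schema, Target) of
        {atomic, ok} -> reset_schema(Target);
        {aborted, Reason} -> {error, {del_table_copy_failed, Reason}}
    end.

reset_schema(Target) ->
    %% delete_schema/1 must run while Mnesia is stopped on Target.
    case rpc:call(Target, mnesia, delete_schema, [[Target]]) of
        ok ->
            %% Target comes back with a fresh, standalone schema and no tables.
            rpc:call(Target, mnesia, start, []);
        Other ->
            {error, {delete_schema_failed, Other}}
    end.
```

After this, the target has an empty local schema, so its tables have to be recreated locally and the previously exported data imported again.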
An existing distributed schema can't be kept on a disconnected node. You have to recreate one, and copy the table info.
If you wish to keep the current schema on your node, you could remove any shared table from it and use purely local tables instead.
If you really wish to remove the node from the schema, you could export the data, erase the schema and create a new, undistributed one, and import the data, for testing and development.
Here are some useful functions you could use in both cases:
Copying a mnesia table
Mnesia tables can be easily copied, like in this example I just wrote (and tested) for the sheer fun of it:
copy_table(FromTable, ToTable) ->
    mnesia:create_table(ToTable, [
        {attributes, mnesia:table_info(FromTable, attributes)},
        {index, mnesia:table_info(FromTable, index)},
        % Add other attributes to be inherited, if present
        {record_name, FromTable},
        {access_mode, read_write},
        % disc_copies requires a disc-based schema on this node
        {disc_copies, [node()]}
    ]),
    % Write every record of FromTable into ToTable, counting them as we go
    CopyJob = fun(Record, Counter) ->
                  mnesia:write(ToTable, Record, write),
                  Counter + 1
              end,
    mnesia:transaction(fun() -> mnesia:foldl(CopyJob, 0, FromTable) end).
This function would copy any table (distributed or not) to a purely local one, keeping its attributes and record definitions. You would have to use mnesia:read
Exporting/importing a mnesia table to/from a file
These other functions export a Mnesia table to a file and import it back again. They would need some minor tweaks to import into an arbitrarily named table (you could use mnesia:ets/1 for the sheer experience of it):
export_table(Table) ->
    % Temporary ETS bag keyed on the record key (position 2), so that tables
    % with several records per key (bag tables) are exported completely
    Temp = ets:new(ignoreme, [bag, public, {keypos, 2}]),
    Job = fun(Key) ->
              Records = mnesia:dirty_read(Table, Key),
              ets:insert(Temp, Records)
          end,
    Keys = mnesia:dirty_all_keys(Table),
    [Job(Key) || Key <- Keys],
    Path = lists:concat(["./", atom_to_list(Table), ".ets"]),
    ok = ets:tab2file(Temp, Path, [{extended_info, [md5sum, object_count]}]),
    ets:delete(Temp).
import_table(Table) ->
    Path = lists:concat(["./", atom_to_list(Table), ".ets"]),
    {ok, Temp} = ets:file2tab(Path, [{verify, true}]),
    % mnesia:write/1 stores each record in the table named by its record tag
    {atomic, Count} = mnesia:transaction(fun() ->
                          ets:foldl(fun(Record, I) -> mnesia:write(Record), I + 1 end,
                                    0,
                                    Temp)
                      end),
    ets:delete(Temp),
    {ok, Count}.
I'm using a MongoDB replica set in my Rails application with 1 primary (node A) and 1 secondary (node B).
It was working perfectly fine until I added one more node (node C) and made node C the primary. Now the primary (node C) has all the content, but as far as I can see, content created on the previous primary (node A) can only be read, not edited or destroyed. As I understand it, data can only be written to the primary, so I guess data from the secondary (node A, the earlier primary) can only be read.
Is this common behaviour, or am I missing something?
EDIT:
I took a dump of the replica set from the primary node (node C), then ran db.dropDatabase() and restored it with mongorestore. I found data missing in some collections. Can anyone explain what the issue could be?
In a mongodb replica set you can only write (modify, create, destroy) on the primary node. Writes are then propagated to other (secondary) nodes in the replica set. Note that this propagation may not be immediate.
However, when the primary changes, you should still be able to modify data that was written through the previous primary.
Note that when you add a node to a replica set, it's preferable to first load the latest database backup onto that node. The replication process is based on an oplog, shared by all members, that records inserts, updates and deletes; however, the oplog is a capped collection of limited size, so its oldest entries are eventually overwritten. Earlier operations may therefore not be available to your new primary...
I deleted the reference node, so I need to recreate it.
Using Cypher, how can I create a node with id 0?
Thanks.
The short answer is you can't, and you don't need to. Do you have a specific problem without that node? If so, maybe you can elaborate, chances are there is something else that answers your problem better than trying to recreate a node with a specific id.
The long answer is you can't assign ids to nodes with Cypher. The id is an index or offset into the node storage on disk, so it makes sense to let Neo4j manage it and not try to manipulate it or include it in any application logic. See Node identifiers in neo4j and Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?.
You also most likely don't need a reference node. It is created by default in a new database, but its use is deprecated and it won't exist in future releases. See Is concept of reference node in neo4j still used or deprecated?.
If you still want to assign ids to the nodes you create, it is, somewhat accidentally, possible in a roundabout way with the CSV batch importer (1, 2) and, I believe, with the Java API batch inserter.
If you still want to recreate or simulate the reference node, you can either delete the database data files and let Neo4j recreate the database, or you can try what this person did: Recreate reference node in a Neo4j database. You can also force Neo4j to recycle the ids of deleted nodes faster, so that new nodes you create receive ids that have been freed up and not yet reassigned.
In Neo4j, should all nodes be connected to node 0 so that you can create a traversal that spans all objects? Is that a performance problem with large datasets? If so, how many nodes is too many? Is it OK not to connect nodes to node 0 if I don't see a use case for it now, assuming I use indexes to find specific nodes?
There is no need or requirement to connect everything to the root node. Indexes work great for finding starting points for your traversal. If you have fewer than, say, 5000 nodes connected to a starting node (like the root node), then a relationship scan is cheaper than an index lookup.
To judge what is better, you need to know a bit more about the domain.