Mnesia files damaged; need to forensically dump everything - erlang

I have damaged my Mnesia database beyond repair as a result of underestimating the fragility of the implementation. When I use the Mnesia API, the records I need are not visible, even though their keys are visible in the file. And although the documentation indicates that Mnesia's on-disk artifacts are DETS files, they cannot be opened with, or identified as, DETS files. PS: mnesia:dump_to_textfile() does not work either.

Eventually I was able to dump my DB. It did not end my Mnesia problems, but it gave me options I did not have before.
SETUP:
Originally I had implemented a master-master Mnesia cluster, as the docs describe. It turns out that not even the most seasoned Erlang programmers use Mnesia replication, as there are too many flaws; I have this from the Erlang inner circle and a few L1 teams too. In my case, however, the work was already in production. And that's when the problems started.
We started getting DB consistency errors and, my favorite, network/DB partition errors. Recovering from these takes a highly skilled and knowledgeable individual, plus a lot of planning and code in advance, none of which I had.
Ultimately I took two steps. (a) I removed the second app, so that even though the DB was still a master-master cluster, one node was effectively a slave because it was never used as a master. (b) In a second implementation I split the cluster so that the app ran on a single node with a single DB. #a was in production and #b was the warm standby; replication was manual, as writes were very rare.
The single-node deployment actually ran two Erlang nodes on the same hardware: the first was the application node, app@ks, and the second was a plain "erl" admin node for when I needed to rpc into the app and see how things were going.
MY SOLUTION:
When I posted this question I was trying to dump the contents of my Mnesia DB. I was having a number of problems because I was trying to access the DB from the admin node while the application node was operational.
Because I was calling the mnesia library from the erl node, the DB was not LOCAL to that node, and so dump_to_textfile produced an empty file. I eventually had success when I used rpc to tell the app@ks node to dump.
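In other words, something like this from the admin node (a sketch; the node name and dump path stand in for my actual values):

    %% Run on the admin node; tells app@ks (where mnesia actually runs)
    %% to write the dump. The file lands on app@ks's filesystem.
    Result = rpc:call('app@ks', mnesia, dump_to_textfile, ["/tmp/mnesia_dump.txt"]),
    io:format("dump result: ~p~n", [Result]).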
STILL UNDEFINED:
When I launched the admin node I set the mnesia dir parameter to the same folder the app@ks node uses. I have a vague memory that this is undesirable.
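Concretely, the launch looked something like this (the path, node name and cookie are illustrative):

    erl -sname admin -setcookie app_cookie -mnesia dir '"/path/to/Mnesia.app@ks"'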
There are many more Mnesia issues to solve, but none that relate to the problem I reported. I still do not know, though, how to extract the raw data from the various DB files.

Related

neo4j is storing arbitrary files in drive C?

My C drive size is growing, and my server is not running anything but Neo4j, even though I configured Neo4j to store database information on another drive.
Node count might be irrelevant, but for the record I have almost 10 million nodes and traffic to the database of about 200 requests/minute.
Is there anything else written by Neo4j that I should be aware of?
dbms.directories.data=E:/MyNeoDB4/
dbms.directories.logs=E:/MyNeoDb4
dbms.jvm.additional=-Dunsupported.dbms.udc.source=zip
dbms.memory.heap.initial_size=15G
dbms.memory.heap.max_size=15G
dbms.security.procedures.unrestricted=apoc.*
dbms.memory.pagecache.size=8G
Update 1:
Things I have checked already:
my debug log is being written somewhere other than drive C
metrics.enabled=false
Update 2:
- As @InverseFalcon said, I also checked the transaction logs as a first step; they were being written in some other directory.
(Note: this answer was written before the original question was updated to say that neither metrics nor logs were the likely culprits.)
Logs, and possibly metrics
I'm not sure what your logging needs have been like, but a major source of disk consumption that is not the data itself is the writing of log files. They typically do not grow extremely quickly, but that depends entirely on your setup.
I suspect that your drive may be filling up with logs, although I am surprised it's filling up so quickly. I would check your log files and see if they are full of long chains of exceptions.
It could also be metrics being exported to CSV on the local disk, although I do not believe Neo4j will do that without being explicitly configured to do so.
More info on metrics is at the official docs:
https://neo4j.com/docs/operations-manual/current/monitoring/metrics/
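If you want to rule the CSV export out explicitly, the relevant neo4j.conf switches look roughly like this (3.x setting names; check the docs above for your version):

    metrics.enabled=false
    metrics.csv.enabled=false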
A variant on Rebecca Nelson's answer: you might want to check for transaction log files.
Transaction logs are the source of truth for changes made to a database, and they are not the same kinds of logs as the readable log files (debug.log, neo4j.log) that live in the logs folder.
You can find transaction logs in your graph.db folder (or whatever name you've given to your graph database folder) using the naming pattern neostore.transaction.db.0 (with incremental numbering of the log files starting with 0).
Transaction logs are a stage of data persistence. Transactions affecting the database first write to these logs. When criteria are met, a checkpoint operation occurs which flushes the contents of the transaction logs to the datastore files (some of the other files in the graph.db folder) and the transaction logs are pruned and/or rotated.
While you should not modify or delete transaction log files yourself, you can add configuration parameters in neo4j.conf to control how these files are handled.
Here are the docs dealing with transaction logs.
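For example, something along these lines in neo4j.conf (3.x setting names with illustrative values; see the docs for your version):

    # keep up to 7 days of transaction logs, rotating files at 250 MB
    dbms.tx_log.rotation.retention_policy=7 days
    dbms.tx_log.rotation.size=250M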

How Erlang Mnesia Distribution Works

I am working through the online examples and can already use mnesia ram copies and connect them, but I am a bit confused about a couple of things.
1: Does the starter node (the one that creates the schema) only have the local schema? (For example, in the root folder: Mnesia.name@ip.)
I ask because on another node I can simply start mnesia, call change_config(extra_db_nodes, [Node]), and automatically get all the data that is on the starting node.
This seems weird to me: what happens if all nodes go down? It means the starter node needs to be started before you can do anything.
2: There seem to be a lot of different ways to connect nodes and to copy the tables. Could I get a list of the different ways to do this, and their impacts?
3: Following on from the first question: after calling change_config, how can you know that it has finished downloading all the data before you start to use it? For example, if someone connects to the node and you check whether they are already online, they might be connected to another node and you won't get that data during the check.
4: After connecting to a node, are you automatically connected to all nodes? And does it automatically update your local ram copies without you doing anything? How does it ensure synchronization when reading and writing? Do I have to do anything special?
And about question 1 again: couldn't you have a node process running that holds the local schema, and use this node to connect all nodes together? And if possible, could you forbid mnesia from copying ram copies to this node process?
I know this is a lot, so thank you for your time.
Not a direct answer to your questions, but you can check out Erlang Performance Lab, which might help you understand how some operations in Mnesia work by visualizing the messages between different nodes.
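That said, for questions 2 and 3, the usual join-and-copy sequence looks roughly like this (a sketch; the node name, table name and timeout are placeholders):

    %% On the fresh node: join the cluster via a known node, take a local
    %% ram copy of a table, and block until that table is actually loaded.
    ok = mnesia:start(),
    {ok, _} = mnesia:change_config(extra_db_nodes, ['starter@host']),
    {atomic, ok} = mnesia:add_table_copy(my_table, node(), ram_copies),
    ok = mnesia:wait_for_tables([my_table], 5000).

mnesia:wait_for_tables/2 is also the answer to question 3: it returns only once the listed tables are loaded (or the timeout expires).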

setting up Neo4j replication on two instances

I am planning to configure some sort of two-node replication for Neo4j, similar to MySQL replication. Since I am a little constrained on resources, I don't want to pay for more than two cloud compute instances. Also, I am happy with just one real-time or near-real-time copy of the Neo4j database. So the approaches I can think of are:
Configure HA on the two compute nodes with the help of an arbiter instance: set up one Neo4j instance (master) on the first node, and another Neo4j instance (slave) plus a third instance (arbiter, used only for arbitration, no data logging) on the second node.
OR
Set up a cron job for online backup using the neo4j-backup tool, with incremental backups every hour or so. I am not sure about the load it may put on the prod server; I plan to test that.
I am more inclined toward the first approach, since I get a more real-time copy of the database (I also get HA/load balancing with instant failover, but that is not a priority right now).
Please let me know
which of the two approaches is better,
whether there is another way to achieve the same, or
whether either of the above approaches is unsuitable or has some flaws.
I am a little new to Neo4j HA, so please pardon my ignorance. Thanks!
So, you have already mentioned the available solutions.
TL;DR: I prefer the first option.
Cluster
In general, the recommended layout is 3 nodes (2 slaves + 1 master).
But your layout of 2 nodes (1 master + 1 slave + 1 arbiter) is viable too, especially if one server can handle your workload.
Good things:
An almost "real-time" replica.
The possibility to utilise resources to handle a bigger workload.
Better availability.
Notes:
If you have a 10 MB/sec write load on the master, the same load will be applied to the slave node. This shouldn't affect reads from the slave at all (unless the write load is REALLY huge).
Maintenance costs are bigger than for a single-instance installation. You should plan how to handle cluster upgrades, configuration updates and plugin updates.
Branched data. In a clustered environment it is possible to end up in a "split-brain" scenario, where the two nodes have different data and a decision must be made about which data to keep. Neo4j handles such cases quite well, but you should keep in mind that a small data loss can occur in VERY RARE scenarios.
Backup
Good things:
Simple. Just take backups of the database.
Consistency check. When a backup is made, the tool runs a consistency check to verify that the database is not damaged. There is no possibility that the backup will screw up the live database; if there are any issues, you will be notified via the logs of the backup utility. See below for details on how a backup is performed.
Database. A Neo4j backup is a fully functional database. You can spin up a server that points to the backup database and do everything you want with it.
Incremental backups. You can take incremental backups as often as you want; see the example invocation below.
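For reference, an incremental backup is a single invocation of the tool against the live server, along these lines (the host and path are illustrative; the flags are those of the 2.x/3.0-era neo4j-backup tool):

    neo4j-backup -host 127.0.0.1 -to /mnt/backups/graph.db

If a previous backup already exists at the target directory, the tool fetches and replays only the new transactions, as described under "How is a backup performed?" below.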
Notes:
Neo4j scales vertically very well (depending on the size of the database). It can handle a huge load on a single instance (we had up to 3k requests/second on a medium machine). So you can get one bigger machine for the Neo4j server and another smaller (cheaper) one for backups.
How is a backup performed?
One thing to keep in mind: the live database remains fully operational throughout. The backup utility doesn't stop or prevent any actions.
When a transaction is committed in the database, all changes are appended to the transaction log. The backup tool then works as follows:
When no previous backup is present: copy the whole store.
When there is a previous backup AND transaction logs are available: copy the new transaction logs and replay them onto the backup's store.
When there is a previous backup AND transaction logs are NOT available: discard the existing backup store and copy the live store in full.
Why might transaction logs not be available? Your configuration may say to keep only the latest transaction logs (e.g. 1 hour's worth), or not to keep them at all.
Relevant settings:
keep_logical_logs
logical_log_rotation_threshold
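For example (illustrative values; these are the 2.x-era names as they appear in neo4j.properties):

    keep_logical_logs=7 days
    logical_log_rotation_threshold=250M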
Other
In any case, you should consider taking backups even in a clustered environment. Everything can fail, at any moment.
In general, everything depends on your load and database size.
If your database is small enough to fit fully in memory and one machine is enough to handle all the load, then one Neo4j instance will be enough. Just take backups.
If you want better scalability/availability and a real-time working replica, then the cluster setup is the best choice.

One replicated mnesia table has become out-of-sync

I have an Erlang application currently running on four nodes with a replicated mnesia DB that stores minimal data about connected clients. The mnesia replication has worked seamlessly in the past (as far as I know, anyway), but a client recently noticed that one of the nodes is missing some ids related to his application.
I'm not really sure how this happened. Our network may have had a hiccup at the time. Maybe? But of more urgency at the moment is getting the data into a good state across all nodes. Is there a way to tell mnesia to replicate from a known-good node?
Mnesia is legendary for this issue. It's a huge PITA.
Looked at from the CAP theorem's point of view, most systems built with Mnesia end up being C-A (consistency-availability with no partition tolerance) systems. Most of the time you have (and heavily rely on) its hard consistency. Then a network partition happens...
It's still available for writes, but these writes destroy consistency. And later on, Mnesia has no mechanism for automatic data repair.
Everyone who uses Mnesia in a cluster should familiarize themselves with these tradeoffs. Your problem is a clear sign that using Mnesia was a poor choice, doubly so if this data is critical to you.
I too use Mnesia in such a way (sometimes we all need speed, you know), but I make sure to use it only for data that I can easily reconstruct. In general, if you need it stored on disk, Mnesia is no good, except for toy projects.
I make sure to always have this function at hand:
reinit_mnesia_cluster() ->
    %% Stop Mnesia on every node, wipe all replicas' schemas (and with
    %% them all data), then recreate a fresh schema and restart Mnesia.
    rpc:multicall(mnesia, stop, []),
    AllNodes = [node() | nodes()],
    mnesia:delete_schema(AllNodes),
    mnesia:create_schema(AllNodes),
    rpc:multicall(mnesia, start, []).
Use it only after the network partition has been resolved and all nodes are reachable. It will erase all Mnesia replicas and start anew. Again, if you can't live with what it does, then using Mnesia was a poor choice.
For important data that needs hard consistency, use SQL. For important data that needs availability, use Riak. For shared state that needs speed, use Redis. Mnesia is no replacement for these systems, although at first it does seem so.
Edit on 2014-11-16: here is a much better article on the topic, explaining in detail what I said above: https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850
Honestly, I think the cleanest way to get an out-of-sync Mnesia node to replicate from a known-good node is to shut down the application on the bad node and delete all of its Mnesia database files, then do the following.
Write an escript that starts Mnesia standalone using the "bad" node's name and Mnesia directory, replicates the tables from a known-good node, and shuts Mnesia down. Run that escript on the bad node.
The act of replicating the tables and shutting Mnesia down gracefully puts the node back in sync with the cluster. Then, when you start the application on the bad node, it will join up and stay in sync with the cluster.
Of course, this description lacks precise details, but that's the gist of it. There are surely less brute-force ways of doing this, but unless you have massive amounts of data to replicate, I think this way is the quickest and cleanest.
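A rough sketch of such an escript (untested; every node name, cookie, path and timeout here is a placeholder):

    #!/usr/bin/env escript
    %%! -sname bad -setcookie app_cookie
    %% Start Mnesia standalone in the bad node's (now empty) directory,
    %% rejoin the cluster, pull fresh disc replicas, then stop gracefully.
    main(_) ->
        application:set_env(mnesia, dir, "/var/lib/myapp/mnesia"),
        ok = mnesia:start(),
        {ok, _} = mnesia:change_config(extra_db_nodes, ['good@host']),
        {atomic, ok} = mnesia:change_table_copy_type(schema, node(), disc_copies),
        Tabs = mnesia:system_info(tables) -- [schema],
        [mnesia:add_table_copy(T, node(), disc_copies) || T <- Tabs],
        ok = mnesia:wait_for_tables(Tabs, 600000),
        mnesia:stop().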

What is the significance of a Mnesia Master Node in a cluster

I am running two Erlang nodes with a replicated mnesia database. Whenever I tried to start one of them while mnesia was NOT running on the other, mnesia:wait_for_tables(?TABS, ?TIMEOUT) would hang on the node it was called from. I need a structure where, if both nodes are down, I can start working with one while the other stays down, and later bring the second one up and continue to work well. I need to be sure that the node that was running first updates the other when it comes up. Does this necessarily require me to have one as the master?
%%% Edited
Oh, I've got it. The database I was using had a couple of fragmented tables, and some of the fragments had been distributed across the network for load balancing. So mnesia on one host would try to load them across the network and fail, since mnesia on the other host was down!
I guess this has nothing to do with a mnesia master node. But I would still love to understand its significance, because I've not used it before, yet I always play with distributed schemas.
Thanks again...
Mnesia master nodes are used to resolve split-brain situations in a fairly brutal fashion. If mnesia discovers a split-brain situation, it will issue an event, "running partitioned network". One way to respond would be to set the master nodes to the "island" that you want to keep, and then restart the other nodes. When they come back up, they will unconditionally load tables from the master nodes.
There is another mechanism in mnesia, called force_load. One should be very careful with it, but consider the case where you have two nodes, A and B: you terminate B (A logs B as down), then terminate A, then restart B. B has no information about when A went down, so it will refuse to load tables that have a copy on A. If you know that A is not coming back soon, you could call mnesia:force_load_table(Tab) on B for each affected table, which will cause B to run with its own copies. Once A comes back up, it will detect that B is up and will load the tables from it.
As you can see, there are several other scenarios where you can end up with an inconsistent database. Mnesia will not fix that, but it tries to provide tools to resolve the situation if it arises. In the scenario above, unfortunately, mnesia will give you no hints, but it is possible to create an application that detects the problem.
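For the master-node response to a partition event, the calls involved are roughly these (a sketch; table and node names are placeholders):

    %% On each node in the island you want to keep:
    ok = mnesia:set_master_nodes([node()]),
    %% ...or scoped to a single table:
    ok = mnesia:set_master_nodes(my_table, [node()]),
    %% then restart the losing nodes; at startup they load unconditionally
    %% from the master nodes. Clear the setting once things are consistent:
    ok = mnesia:set_master_nodes([]).

The "running partitioned network" notification arrives as a mnesia system event, so an application can subscribe to mnesia's system events to detect the situation programmatically.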
