Deleted databases still appear in leaf node - memory

I recently had an issue with memory allocation for databases created on MemSQL, which I raised here and found a solution for: either delete the unnecessary databases or reduce the transaction-buffer system variable. I chose the first option, deleting the unnecessary databases and keeping only 20. However, supposedly only 64MB is allocated per database. So I ran the query
SHOW DATABASES EXTENDED;
on my leaf node, and it surprisingly returned 130 databases, including all the previously dropped ones, which pushes my Alloc_durability_large to 7GB. It should be 64MB * 20 = 1280MB.
How can I get rid of these databases, and why are they not removed even though I dropped them on the master? Do I need to drop the databases on both the leaf and the master for it to take effect?
Also, databases are replicated with suffixes _0, _1, _2, _4; e.g. the mark database is replicated as mark_0, mark_1, etc. Should I manually delete all of these?
Note: I have restarted MemSQL, but it had no effect.

When you run DROP DATABASE on the master aggregator, it drops the database cluster-wide; normally you do not need to do anything on the leaves. mark_0, mark_1, etc. are partitions of the mark database on the leaves, and they are normally removed when you drop the mark database.
Are you running HA? It's possible you are seeing orphan databases because of operations performed while a node was offline. You can use the command CLEAR ORPHAN DATABASES to remove them; run EXPLAIN CLEAR ORPHAN DATABASES first to see which databases it considers orphans.
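As a minimal sketch of that sequence (it uses only the commands already mentioned above; the EXPLAIN step is a dry run, so nothing is removed until the final statement):
SHOW DATABASES EXTENDED;          -- check which partition databases are still present on this node
EXPLAIN CLEAR ORPHAN DATABASES;   -- dry run: list the databases considered orphans
CLEAR ORPHAN DATABASES;           -- actually remove the orphaned databases
Afterwards SHOW DATABASES EXTENDED should list only the partitions of databases that still exist on the master aggregator.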

Related

Is it safe to reduce TFS DB size by using Delete stored procedures?

We have TFS 2017.3 and the database is huge: about 1.6 TB.
I want to try to free up space by running these stored procedures:
prc_CleanupDeletedFileContent
prc_DeleteUnusedFiles
prc_DeleteUnusedContent
Is it safe to run them?
Is there a chance they will delete important data I am currently using? (Of course, I will take a backup first.)
What are the best parameter values to pass to these stored procedures?
Another thing: if I run this query:
SELECT A.[ResourceId]
FROM [Tfs_DefaultCollection].[dbo].[tbl_Content] As A
left join [Tfs_DefaultCollection].[dbo].[tbl_FileMetadata] As B on A.ResourceId=B.ResourceId
where B.[ResourceId] IS Null
I get 10681 rows.
If I run this query:
SELECT A.[ResourceId]
FROM PTU_NICE_Coll.[dbo].[tbl_Content] As A
left join PTU_NICE_Coll.[dbo].tbl_FileReference As B on A.ResourceId=B.ResourceId
where B.[ResourceId] IS Null
I get 10896 rows.
How can I remove these rows, and is it completely safe to remove them?
Generally we don't recommend running actions against the database directly, as it may cause problems.
However, if you have to do that, you need to back up the databases first.
You can refer to the articles below on cleaning up and reducing the size of the TFS databases:
Control\Reduce TFS DB Size
Cleaning up and reduce the size of the TFS database
Clean up your Team Project Collection
Another option is to dive deep into the database and run the cleanup stored procedures manually. If your Content table is large:
EXEC prc_DeleteUnusedContent 1
If your Files table is large:
EXEC prc_DeleteUnusedFiles 1, 0, 1000
This second sproc may run for a long time, which is why it has the third parameter that defines the batch size. You should run this sproc multiple times, or, if it completes quickly, you can increase the chunk size; a sketch of such a loop is shown below.
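For illustration, a minimal T-SQL sketch of repeating that batched call; the fixed count of 10 iterations is an arbitrary assumption to tune for your collection, and the parameter values are taken verbatim from the answer above:
DECLARE @i INT = 0;
WHILE @i < 10
BEGIN
    EXEC prc_DeleteUnusedFiles 1, 0, 1000;  -- third argument is the batch size
    SET @i = @i + 1;
END;
As recommended above, back up the collection databases before running anything like this.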

Erlang ensure Mnesia schema replication

I have a distributed application.
In it, a master node starts a Mnesia schema with 4 tables. Some of them are replicated to other nodes, some are not.
When a node spawns, it registers with the master node, is added to the schema, and the data is replicated to it.
How can I ensure that my replication is finished?
I tried the following:
Timeout=60000,
TabList = [tab1, tab2, tab3, tab4],
mnesia:wait_for_tables(TabList, Timeout).
However, it does not take 60 seconds, not even 5, before I get an error:
{{badmatch,{aborted,{no_exists,tab1}}}
Obviously it does not work.
When a new node joins a cluster, an rpc call from the master node runs the following function on the new node:
start_Mnesia(MasterNode) ->
    mnesia:start(),
    %% make this node part of the master's Mnesia cluster
    mnesia:change_config(extra_db_nodes, [MasterNode]),
    %% copy every table except the schema to this node as ram_copies
    Tabs = mnesia:system_info(tables) -- [schema],
    [mnesia:add_table_copy(Tab, node(), ram_copies) || Tab <- Tabs].
Does it also wait until the data is written to the ram_copies?
Thanks.
When a node joins your Mnesia cluster it is already synchronised, regardless of which copies of tables it, or other nodes, do or do not have.
You should see that a new node, after registering with your master and being added to the cluster, can already access all your tables. Adding a copy of a table doesn't change that, regardless of the state/stage of that copy.
When you add a copy of a table on your new node you can continue to run transactions, during and after replication, and the data that is or is not replicated to the node originating the transaction will make no difference to the correctness of the result.
So, if you are concerned just with synchronisation in terms of keeping your transactions ACID, then don't worry.
If you are concerned about when your data is actually replicated and stored safely on the other nodes, that's a different thing. In this case, I have observed that when you run mnesia:add_table_copy(Table, NewNode, disc_copies) it blocks, and returns only when NewNode has copied the data to the filesystem.
Remember though, unless you run mnesia:sync_transaction/3 all the time, you don't have guarantees about data actually being on any disc after a transaction completes anyway.

Neo takes 13 seconds to count 0 nodes

I've been experimenting with Neo4j for a while now. I've been inserting data from the medical terminology database SNOMED in order to experiment with a large dataset.
While experimenting, I have repeatedly inserted and then deleted around 450,000 nodes. I'm finding Neo's performance to be somewhat unsatisfactory. For example, I have just removed all nodes from the database. Running the query:
match (n) return count (n)
takes 13085 ms to return 0 nodes.
I'm struggling to understand why it might take this time to count 0 things. Does Neo retain some memory of nodes that are deleted? Is it in some way hobbled by the fact that I have inserted and removed large amounts of nodes in the past? Might it perform better if I delete the data directory instead of deleting all nodes with Cypher?
Or is there some twiddling with memory allocation and so forth that might help?
I'm running it on an old-ish laptop running Linux Mint.
It's partly due to Neo4j's store format. Creating new nodes or relationships assigns them ids, where ids are actual offsets into the store files. Deleting a node or relationship marks that record as not in use in the store file. Looking at all nodes is done by scanning the node store file to spot records that are in use, so a store bloated by previously deleted records still has to be scanned even when it contains no live nodes.

Loading Neo4j database dump (neo4j-shell)

My database was affected by the bug in Neo4j 2.1.1 that tends to corrupt the database in areas where many nodes have been deleted. It turns out most of the relationships that were affected had been marked for deletion in my database. I dumped the rest of the data using neo4j-shell with a single query. This gives a 1.5GB Cypher file that I need to import into a mint database to get my data back into a healthy structure.
I have noticed that the dump file contains definitions for (1) schema, (2) nodes and (3) relationships. I have already removed the schema definitions from the file because they can be applied later on. Now the issue is that since the dump file uses a single series of identifiers for nodes during node creation (in the following format: _nodeid) and relationship creation, it seems that all CREATE statements (33,160,527 in my case) need to be run in a single transaction.
My first attempt to do so kept the server busy for 36 hours without results. I had neo4j-shell load the data directly into a new database directory instead of connecting to a server. The data files in the new database directory never showed any sign of receiving data, and the message log showed many messages indicating thread blocks.
I wonder what the best way is of getting this data back into a database. Should I use a specific config file? Do I need to allocate a large Java heap? What is the trick to loading such a large dump file into a database?
The dump command is not meant for larger-scale exports; there was originally a version that handled them, but it was not included in the product.
If you still have the old database around, you can try a few things:
contact Neo4j support to help you recover your data
use my store-utils to copy it over to a new db (it will skip all broken records)
query the data with Cypher and export the results as CSV
you could use the shell-import-tools for that
and then import your data from the CSV using either the shell tools again, the LOAD CSV command, or the batch-importer
Here is what I finally did:
First I identified all unaffected nodes and marked them with one specific label (let's say Carriable). This was a pretty easy process in my case because all the affected nodes had the same label, so I just excluded that label. I did not have to identify the affected relationships separately because all of them were connected to nodes with the affected label.
Then I exported the whole database except the affected nodes and relationships to GraphML using a single query (in neo4j-shell):
export-graphml -o /home/mah/full.gml -t -r match (n:Carriable) optional match (n)-[i]-(:Carriable) return n,i
This took about a half hour to yield a 4GB XML file.
Then I imported the entire GraphML back into a mint database:
JAVA_OPTS="-Xmx8G" neo4j-shell -c "import-graphml -c -t -b 10000 -i /home/mah/full.gml" -path /db/newneo
This took yet another half hour to accomplish.
Please note that I allocated more than sufficient Java heap memory (JAVA_OPTS="-Xmx8G"), imposed a particularly small batch size (-b 10000) and allowed the use of on-disk caching.
Finally, I removed the unnecessary "Carriable" label and recreated the constraints.

Unable to edit data created on the primary node in MongoDB after the node became secondary

I'm using a MongoDB replica set in my Rails application, with 1 primary (node A) and 1 secondary node (node B).
It was working perfectly fine until I added one more node (node C) and made node C the primary. The new primary (node C) has all the content, but from what I observe, content created on the previous primary (node A) can now only be read, not edited or destroyed. As I understand it, data can only be written to the primary node, so I guess data from the secondary (node A, the earlier primary) can only be read when accessed.
Is this common behaviour, or am I missing something?
EDIT:
I took a database dump of the replica set from the primary node (node C), ran db.dropDatabase(), and then ran mongorestore. I found data missing in some collections. Can anyone explain what the issue could be?
In a MongoDB replica set you can only write (modify, create, destroy) on the primary node. Writes are then propagated to the other (secondary) nodes in the replica set. Note that this propagation may not be immediate.
However, when the primary changes, you should still be able to write to data previously written by another primary.
Note that when you add a node to a replica set, it is preferable to load the latest database backup onto that node first. The replication process is based on an oplog, shared between the nodes, that records creations/deletions/updates; however, this oplog has a limited number of entries, so earlier operations may no longer be visible to your new primary.
