neo4j broken/corrupted after ungraceful shutdown - neo4j

I'm using Neo4j over windows for testing purposes and I'm working with a db containing ~2 million relations and about the same amount of nodes. after I had an ungraceful shutdown of neo4j while writing a batch of relations the db got corrupted.
it seems like there are some broken nodes/relations in the db and whenever I try to read them I get this error (I'm using py2neo):
Error: NodeImpl#1292315 not found. This can be because someone else deleted this entity while we were trying to read properties from it, or because of concurrent modification of other properties on this entity. The problem should be temporary.
I tried rebooting but neo4j fails to recover from this error. I found this question:
Neo4j cannot read certain nodes. Throws NotFoundException. Corrupt database
but the answer he got is no good for me because it involved in going over the db and redo the indexing, and I can't even read those broken nodes/relations so I can't fix their index (tried it and got the same error).
In general I've had many stability issues with neo4j (and on multiple platforms, not just windows). if no decent solution is found for this problem I will have to switch to a different database.
thanks in advance!

I wrote a tool a while ago that allows you to copy a broken store and keeps the good records intact.
You might want to check it out. I assume you used the 2.1.x version of Neo4j.
https://github.com/jexp/store-utils/tree/21
For 2.0.x check out:
https://github.com/jexp/store-utils/tree/20

To verify if your datastore is consistent follow the steps mentioned in http://www.markhneedham.com/blog/2014/01/22/neo4j-backup-store-copy-and-consistency-check/.
Are you referring to batch inserter API when speaking of "while writing a batch of relations"?
If so, be aware that batch inserter API requires a clean shutdown, see the big fat red warning on http://docs.neo4j.org/chunked/stable/batchinsert.html.

Are the broken nodes schema indexed and are you attempting to read them via this indexed label/property? If so, it's possible you may have a broken index following the sudden shutdown.
Assuming this is the case, you could try deleting the schema subdirectory within the graph store directory while the server is not running and let the database rebuild the index on restart. While this isn't an official way to recover from a broken index, it can sometimes work. Obviously, I suggest you back up your store before trying this.

Related

MySQL Error 2013: Lost connection to MySQL server during query

I've read all post with the same or very close headline, but still can't find a proper solution or explanation to my problem.
I'm working with MySQL Workbench 6.3 CE. I have been able to create a database with several tables, and create a conexion with python to write data to it. Still, I had a problem related to a varchar field that needed to be set to more than 45 characters. When I try to set it to bigger limits, like VARCHAR(70), no matter how many times I try, wether I set higher limits for timeout, I get the 2013 error, saying my connection was closed during the query.
I'm using the above version of workbench, on windows 10, and I'm trying to modify that field from the workbench. Afer that first time, I can't drop a table either, nor can I connect from python.
What is happening?
Ok, apparently what was happening is that I had a block, and there where a lot of query waiting in a situation of "waiting for table metadata block".
I did the following in the console of workbench
Select concat('KILL ',id,';') from information_schema.processlist where user='root'
that generates a list of all those processes. I copy that list in a new tab, and execute a massive kill of processes. After that it worked again.
Can anybody explain me how did I arrive to that situation and what precautions to take in my python scripts so as to avoid it?
Thnak you

How to send SQL queries to two databases simultaneously in Rails?

I have a very high-traffic Rails app. We use an older version of PostgreSQL as the backend database which we need to upgrade. We cannot use either the data-directory copy method because the formats of data files have changed too much between our existing releases and the current PostgreSQL release (10.x at the time of writing). We also cannot use the dump-restore processes for migration because we would either incur downtime of several hours or lose important customer data. Replication would not be possible as the two DB versions are incompatible for that.
The strategy so far is to have two databases and copy all the data (and functions) from existing to a new installation. However, while the copy is happening, we need data arriving at the backend to reach both servers so that once the data migration is complete, the switch becomes a matter of redeploying the code.
I have figured out the other parts of the puzzle but am unable to determine how to send all writes happening on the Rails app to both DB servers.
I am not bothered if both installations get queried for displaying data to the user (I can discard the data coming out of the new installation); so, if it is possible on driver level, or adding a line somewhere in the ActiveRecord, I am fine with it.
PS: Rails version is 4.1 and the company is not planning to upgrade that.
you can have multiple database by adding an env for the database.yml file. After that you can have a seperate class Like ActiveRecordBase and connect that to the new env.
have a look at this post
However, as I can see, that will not solve your problem. Redirecting new data to the new DB while copying from the old one can lead to data inconsistencies.
For and example, ID of a record can be changed due to two data source feeds.
If you are upgrading the DB, I would recommend define a schedule downtime and let your users know in advance. I would say, having a small downtime is far better than fixing inconstant data down the line.
When you have a downtime,
Let the customers know well in advance
Keep the downtime minimal
Have a backup procedure, in an even the new site takes longer than you think, rollback to the old site.

mnesia files damaged need to forensically dump everything

I have damaged my Mnesia database beyond repair as a result of overestimating the fragility of the implementation. When I try Mnesia API the records I need are not visible even though they keys are visible in the file. Even though the documentation indicates that Mnesia artifacts are DETS files they cannot be opened with or identified as DETS artifacts. PS: dump_to_textfile() does not work either.
Eventually I was able to dump my DB. It did not end my Mnesia problems but it gave me options I did not have before.
SETUP:
Originally I had implemented a master-master mnesia cluster. (read the docs). It turns out that not even the most seasoned Erlang programmer uses Mnesia replication as there are to many flaws. In fact I come to this information from the Erlang inner circle and a few L1 teams too. In my case, however, the work was already in production. And that's when problems started.
We started getting DB consistency errors and, my favorite, network or DB partition errors. It takes a very highly skilled and knowledgeable individual to recover as well as a lot of planning and code in advance; which I did not have.
Ultimately I took two steps. (a) removed the second app so that even though the DB was in a master-master cluster; one was a slave because it was never used as a master. (b) In a second implementation I split the cluster so that the app ran on a single node with a single DB. #a was in production and #b was the warm standby. Replication was manual as writes were very rare.
In the single node deployment there are two nodes. The first node is the application; app#ks and on the same hardware was an "erl" node when I needed to rpc into the app and see how things were going.
MY SOLUTION:
when I posted this question I was trying to dump the contents of my Mnesia DB. I was having a number of problems because I was trying to access the DB from the admin node as the application node was operational.
Because I was trying to access the mnesia lib from the erl node the DB was not LOCAL to the erl node and so dump_to_textfile produced an empty file. I eventually had success when I used rpc to tell the app#ks node to dump.
STILL UNDEFINED
When I launched the admin node I set the mnesia dir parameter to the same folder as the app#ks node. I have a vague memory that this is undesirable.
There are many more Mnesia issues to solve but none that refer to the problem I reported. But I still do not know how to extract the raw data from the various DB files.

Neo4j 2.0.4 browser cannot query large datasets

Whenever I try to run cypher queries in Neo4j browser 2.0 on large (anywhere from 3 to 10GB) batch-imported datasets, I receive an "Unknown Error." Then Neo4j server stops responding, and I need to exit out using Task Manager. Prior to this operation, the server shuts down quickly and easily. I have no such issues with smaller batch-imported datasets.
I work on a Win 7 64bit computer, using the Neo4j browser. I have adjusted the .properties file to allow for much larger memory allocations. I have configured my JVM heap to 12g, which should be fine for 64bit JDK. I just recently doubled my RAM, which I thought would fix the issue.
My CPU usage is pegged. I have the logs enabled but I don't know where to find them.
I really like the visualization capabilities of the 2.0.4 browser, does anyone know what might be going wrong?
Your query is taking a long time, and the web browser interface reports "Unknown Error" after a certain timeout period. The query is still running, but you won't see the results in the browser. This drove me nuts too when it first happened to me. If you run the query in the neo4j shell you can verify whether or not this is the problem, because the shell won't time out.
Once this timeout occurs, you can find that the whole system becomes quite non-responsive, especially if you re-run the query, because now you have two extremely long queries running in parallel!
Depending on the type of query, you may be able to improve performance. Sometimes it's as simple as limiting the number of returned nodes (in cases where you only need to find one node or path).
Hope this helps.
Grace and peace,
Jim

Migrate Data from Neo4j to SQL

Hi I am using neo4j in my application and my structure is as following:
I am using Embedded Graph API
I have several databases that I point to using a pool that I maintain in my application eg-> db1, db2, db3, ..... db100
When I want to access a particular database I point to it using new EmbeddedGraphDatabase("Path to db(n)")
The problem is that when the connection pool count increases the RAM size being consumed by the application keep increasing and breaks down the application at a point of limit.
So I am Thinking of migrating from Neo4j to some other Database.
Additionally only a small part of my database is utilizing the graph structure.
One way for migration is that I write a script for it. Is there any better option?
My another question is what is the best Database so that my structure can be maintained.
Other view-point that I am thinking about is I can keep part of my data into Neo4j and shift another part to some other database.
If anything is unclear I can clarify.
Thanks in advance.
An EmbeddedGraphDatabase instance is not the equivalent of a "connection" in SQL. It's designed to run a long time (days, months). Hence starting/stopping is costly.
What is the use case for having hundreds of separate databases in the same JVM?
Your lots of small databases will perform poorly as the graphdb is designed to hold the whole datamodel on a single host.
Do you run a single JVM per database?
You can control the amount of memory used by neo4j by providing the correct properties for memory mapping and also use the gcr cache from neo4j-enterprise and control the cache size-property variables.
I think it still makes sense to keep the graph part in Neo4j and only move the non-graphy part.

Resources