does neo4j embedded driver lock the db files? - neo4j

I have a general question about the embedded driver for neo4j. What exactly does it mean to be embedded, besides it being lower level and higher performance. Is it an actual instance of the database service or just a driver for connecting to an existing database process or service. For instance
Does using the embedded driver libraries acquire an exclusive lock on the database files?
Can multiple clients use the embedded driver to use the same database at the same time?
Can it run against a database that already has a database service(along with the REST api) running? Initial tests seem to indicate no since it throws a file lock exception.
Does the embedded driver have to be on the same machine or process as the database service? For instance if the db data files are on a shared SAN that multiple machines can access, and there is another server that is running the REST api and the neo4j service. The configuration on the driver seems to point to the data files directly rather than a service or port.

I am using embedded Neo4j in a project.
Embedded Neo4j is a Neo4j server started and shutdown by your application. So it is not just a driver used to connect to some standalone server. For a standalone server you would use Neo4j over Rest (locally or remotely).
Because of it's implementation embedded neo4j can be used by only one application - the application that started the embedded instance. It retrieves a lock on the graph files, and you can't use any other application (e.g. neo4j-sh) to access those files as long the embedded server is running.

Related

Persisting Neo4j transactions to external storage

I'm currently working on a new Java application which uses an embedded Neo4j database as its data store. Eventually we'll be deploying to a cloud host which has no persistent data storage available - we're fine while the app is running but as soon as it stops we lose access to anything written to disk.
Therefore I'm trying to come up with a means of persisting data across an application restart. We have the option of capturing any change commands as they come into our application and writing them off somewhere but that means retaining a lifetime of changes and applying them in order as an application node comes back up. Is there any functionality in Neo4j or SDN that we could leverage to capture changes at the Neo4j level and write them off to and AWS S3 store or the like? I have had a look at Neo4j clustering but I don't think that will work either from a technical level (limited protocol support on our cloud platform) or from the cost of an Enterprise licence.
Any assistance would be gratefully accepted...
If you have an embedded Neo4j, you should know where in your code you are performing an update/create/delete query in Neo, no ?
To respond to your question, Neo4j has a TransactionEventHandler (https://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/event/TransactionEventHandler.html) that captures all the transaction and tells you what node/rel has been added, updated, deleted.
In fact it's the way to implement triggers in Neo4j.
But in your case I will consider to :
use another cloud provider that allow you to have a storage
if not possible, to implement a hook on the application shutdown that copy the graph.db folder to a storage (do the opposite for the startup)
use Neo4j as a remote server, and to install it on a cloud provider with a storage.

Neo4j Server vs Embedded mode

I wanted to know exactly what is meant by neo4j server and the embedded mode. Even i gone through the post Neo4j Server vs. Embedded. But i dint get clearly those concepts. I have installed neo4j 2.1.1 on windows 64bit machine which is a neo4j server. So when neo4j embedded mode will come into picture?
Also how can we switch between embedded mode to server mode or vice-versa?
When i was working with mysql to neo4j migration(using batch-import), after importing the nodes and relationships into neo4j getting a message in a messages.log file as below:
Clean shutdown on BatchInserter(EmbeddedBatchInserter[C:\Users\Neo4j\t2.db])
How embedded is appearing here if i have installed neo4j server ? So please clarify these queries.
Thanks
Embedded databases run inside of your application, meaning they're in the same JVM as your application. In general, with embedded databases you'll do direct database access or cypher queries. There are a lot of pros and cons here - one of the cons is that your JVM process locks the database; you can't have a bunch of different applications in different JVMs accessing the same embedded database at the same time. The pro is direct access.
When you're running a server, usually that means you're using the web admin components which also provide a set of RESTful services. The pro of this is that it's in a different JVM. Meaning you could access it more easily from other programming languages, over the network, and so on. You could have many applications in many JVMs all talking to a server instance via RESTful services. Generally access isn't as fast, but it's more flexible. When you run it this way though, direct access to the graph inside of a java application (using the Neo4J API) is off limits.
If you want to run the web admin/GUI stuff and RESTful services from within an embedded database, you can do that. See these instructions for how.
Here's a code snippet: what you need is the WrappingNeoServerBootstrapper.
AbstractGraphDatabase graphdb = getGraphDb();
WrappingNeoServerBootstrapper srv;
srv = new WrappingNeoServerBootstrapper( graphdb );
srv.start();
// The server is now running
// until we stop it:
srv.stop();

How can I run multiple Neo4j databases on a single server?

How can I run multiple Neo4j databases simultaneously on a single server? I would like to have separate data directories and ports if this is possible.
Has anyone done this successfully and if so explain how to do this
I have tried something like:
bin\neo4j start
To set up Neo4j with multiple instances on a single server, you essentially configure a cluster, with each node having its own set of configuration properties. You then run the cluster in single-instance (non-HA) mode (otherwise you'll just end up with a replication cluster, which doesn't meet your requirement).
Full instructions are in the Neo4j docs online and in your local doc\manual folder.
Note: The folks at Neo Technology call this out for dev/test purposes. I can't offer guidance on running this in production, other than the fact you'd have multiple instances competing for the same resources (cpu, disk, memory, network).
It's possible to setup Rexster to serve up multiple neo4j database directories. This is great if you're using the Gremlin query language. Other access forms may not be available (beyond my knowledge). Check out this question/answer: possible to connect to multiple neo4j databases via bulbs/Rexster?

Delphi - Folder Synchronization over network

I have an application that connects to a database and can be used in multi-user mode, whereby multiple computers can connect the the same database server to view and modify data. One of the clients is always designated to be the 'Master' client. This master also receives text information from either RS232 or UDP input and logs this data every second to a text file on the local machine.
My issue is that the other clients need to access this data from the Master client. I am just wondering the best and most efficient way to proceed to solve this problem. I am considering two options:
Write a folder synchronize class to synchronize the folder on the remote (Master) computer with the folder on the local (client) computer. This would be a threaded, buffered file copying routine.
Implement a client/server so that the Master computer can serve this data to any client that connects and requests the data. The master would send the file over TCP/UDP to the requesting client.
The solution will have to take the following into account:
a. The log files are being written to every second. It must avoid any potential file locking issues.
b. The copying routine should only copy files that have been modified at a later date than the ones already on the client machine.
c. Be as efficient as possible
d. All machines are on a LAN
e. The synchronization need only be performed, say, every 10 minutes or so.
f. The amount of data is only in the order of ~50MB, but once the initial (first) sync is complete, then the amount of data to transfer would only be in the order of ~1MB. This will increase in the future
Which would be the better method to use? What are the pros/cons? I have also seen the Fast File Copy post which i am considering using.
If you use a database, why the "master" writes data to a text file instead of to the database, if those data needs to be shared?
Why invent the wheel? Use rsync instead. Package for windows: cwrsync.
For example, on the Master machine install rsync server, and on the client machines install rsync clients or simply drop files in your project directory. Whenever needed your application on a client machine shall execute rsync.exe requesting to synchronize necessary files from the server.
In order to copy open files you will need to setup Windows Volume Shadow Copy service. Here's a very detailed description on how the Master machine can be setup to allow copying of open files using Windows Volume Shadow Copy.
Write a web service interface, so that the clients an connect to the server and pull new data as needed. Or, you could write it as a subscribe/push mechanism so that clients connect to the server, "subscribe", and then the server pushes all new content to the registered clients. Clients would need to fully sync (get all changes since last sync) when registering, in case they were offline when updates occurred.
Both solutions would work just fine on the LAN, the choice is yours. You might want to also consider those issues related to the technology you choose:
Deployment flexibility. Using file shares and file copy requires file sharing to work, and all LAN users might gain access to the log files.
Longer term plans: File shares are only good on the local network, while IP based solutions work over routed networks, including Internet.
The file-based solution would be significantly easier to implement compared to the IP solution.

Have additional connections to Derby (read-only)

What I want to do: My application has a full connection to a Derby DB, and I want to poke around in the DB (read-only) in parallel (using a different tool).
I'm not sure how Derby actually works internally, but I understand that I can have only 1 active connection to a Derby DB.
However, since the DB is only consisting of files on my HDD, shouldn't I be able to open additional connections to it, in read-only mode?
Are there any tools to do just that?
There are two possibilities how to run Apache Derby DB.
Embedded: You run DB within your application → only one connection possible
Client: You start DB as server in separate process → classic DB with many connections
You can recognize the type upon driver size. If the driver has more then 2MB that you use embedded version.
Update
When you startup the derby engine (server or embedded) it gets exclusive access to database files.
If you need to access a single database from more than one Java Virtual Machine (JVM), you will need to put a server solution in place. You can allow applications from multiple JVMs that need to access that database to connect to the server.
For details see Double-booting system behavior.
I realize this is an old question, but I thought I might add a little more detail on a solution since links in the currently accepted answer are broken.
It is possible to run the Derby Network Server within a JVM that is using the embedded database already. The code that is using the embedded Derby database doesn't need to change anything and can keep using the DB as is, but with the Derby Network Server started, other programs can connect to derby and access the database.
All you need to do is ensure that derbynet.jar is on the classpath
And then you can do one of the following
Include the following line in the derby.properties file: derby.drda.startNetworkServer=true
Specify the property as a system property at java start
java -Dderby.drda.startNetworkServer=true
You can use the NetworkServerControl API to start the Network Server from a separate thread within a Java application:
NetworkServerControl server = new NetworkServerControl();
server.start (new PrintWriter(System.out));
More details here: http://db.apache.org/derby/docs/10.9/adminguide/tadminconfig814963.html
Keep in mind that doing this does not enable any security on this connection, so it is not a good idea to do this on a production system. It is possible to add security though and that is documented here: http://db.apache.org/derby/docs/10.9/adminguide/cadminnetservsecurity.html
Two other ideas:
In your application, shut down the database and close the connection when the database is not actively in use. Then your application won't interfere with another tool which is trying to open the database.
Make a copy of your database, by taking a backup (you can do this while the database is open by your application), then restore that backup to a separate place on your disk. Then you can use another tool to access the copied database at your ease.
If you can afford the memory and do not need up-to-date data, then you can access read-only databases from multiple JVMs by creating in-memory copies:
ij> connect 'jdbc:derby:memory:memdb;restoreFrom=mydb';

Resources