I have my Neo4j (embedded) database set up like this:
I attach several user nodes to the reference node.
To each user node can be attached one or more project nodes.
To each project node a complex graph is attached.
The complex graph is traversable with a single traverse pattern (there is a hidden tree structure in them).
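For illustration, here's roughly what such a single traverse pattern could look like with the embedded Traversal API (a sketch assuming the Neo4j 3.x Java API; the CHILD relationship type is a placeholder):

    import org.neo4j.graphdb.Direction;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Path;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.Transaction;

    public class ProjectTraversal {
        public static void visitSubtree(GraphDatabaseService graphDb, Node projectNode) {
            try (Transaction tx = graphDb.beginTx()) {
                // Depth-first walk of the hidden tree below the project node
                for (Path path : graphDb.traversalDescription()
                        .depthFirst()
                        .relationships(RelationshipType.withName("CHILD"), Direction.OUTGOING)
                        .traverse(projectNode)) {
                    Node n = path.endNode();
                    // ... collect n for export, deletion, etc. ...
                }
                tx.success();
            }
        }
    }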
What I'd like to do is the following:
Remove all the nodes below a project node.
Remove all the project nodes below a user when there's nothing below those project nodes.
Export all nodes below a specific user node to GraphML (probably using the Gremlin Java API?).
Import a GraphML file back into the database below a specific user node, without removing the information located under different user nodes.
I've already worked with the Gremlin GraphML reader for importing and exporting entire Neo4j databases, but I was not able to find anything about importing/exporting subgraphs.
If this is in fact possible, how would Neo4j handle two users trying to import something at the same time? For instance, user 1 imports his section under the user1 node while user 2 simultaneously imports his data under the user2 node.
The other possibility is to have a Neo4j database per user, but this is really the least preferable option, and I'm very unsure whether it's actually possible, be it with the embedded or the server version. I've read something about running multiple server instances on different ports, but our number of users is by definition unlimited...
Any help would be greatly appreciated.
EDIT 1: I've also come across something called Geoff (org.neo4j.geoff) which deals with subgraphs. I'm absolutely clueless as to how this works but I'm looking into it right now.
You might take a lock on the user node when starting the import, so that a second import on the same node would have to wait (and would have to check the lock).
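A minimal sketch of that idea, assuming the embedded Neo4j 2.x/3.x Java API (looking the user node up by id is just for illustration):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public class LockedImport {
        public static void importUnder(GraphDatabaseService graphDb, long userNodeId) {
            try (Transaction tx = graphDb.beginTx()) {
                Node userNode = graphDb.getNodeById(userNodeId);
                // A concurrent import that tries to lock the same user node blocks here
                tx.acquireWriteLock(userNode);
                // ... run the GraphML import below userNode ...
                tx.success();
            } // the lock is released when the transaction finishes
        }
    }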
With Cypher queries you can delete the subgraphs and also export them to Cypher again. There is export code for query results in the Neo4j Console repository.
There you can also find Geoff export and import as well as Cypher importers.
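For the deletion part, a sketch using Cypher from the embedded Java API, assuming Neo4j 3.x (for DETACH DELETE and $-parameters); the Project label is a placeholder, and the query assumes the subtree is reachable only from its project node:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;

    import java.util.Collections;

    public class DeleteProjectSubgraph {
        public static void delete(GraphDatabaseService graphDb, long projectId) {
            try (Transaction tx = graphDb.beginTx()) {
                // Matches the project node itself (length 0) plus everything below it
                graphDb.execute(
                    "MATCH (p:Project) WHERE id(p) = $projectId " +
                    "MATCH (p)-[*0..]->(n) " +
                    "DETACH DELETE n",
                    Collections.<String, Object>singletonMap("projectId", projectId));
                tx.success();
            }
        }
    }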
One option may be to use something like TinkerPop Blueprints to build a generic Graph while traversing, then do a GraphML export.
https://github.com/tinkerpop/blueprints/wiki/GraphML-Reader-and-Writer-Library would have more information, but if you are looking to export subgraphs, this is probably your best option.
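A sketch of such an export with Blueprints 2.x (the database path is a placeholder; note that GraphMLWriter exports the whole wrapped graph, so for a subgraph you'd first copy the traversed nodes and edges into an in-memory TinkerGraph and export that instead):

    import com.tinkerpop.blueprints.Graph;
    import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;
    import com.tinkerpop.blueprints.util.io.graphml.GraphMLWriter;

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    public class ExportGraphML {
        public static void main(String[] args) throws Exception {
            // Wrap the embedded Neo4j store as a Blueprints Graph
            Graph graph = new Neo4jGraph("/path/to/graph.db");
            try (OutputStream out = new FileOutputStream("export.graphml")) {
                GraphMLWriter.outputGraph(graph, out);
            } finally {
                graph.shutdown();
            }
        }
    }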
I am getting through the online examples, and can already use mnesia ram copies and also connect them, but I am a bit confused on a couple of things.
1: Does the starter node (the one that creates the schema) hold only the local schema? (for example, in the root folder = Mnesia.name#ip)
I ask because on another node, I can simply start mnesia, and change_config(extra_db_nodes, [node]), and automatically get all the data that is on the starting node.
This seems weird to me; what happens if all nodes go down? It means the starter node needs to be run first before you can do anything.
2: There seem to be a lot of different ways to connect nodes and to copy the tables... Could I get a list of the different ways to do this, and their impacts?
3: From the first question: after calling change_config, how can you know that it has finished downloading all the data before you start to use it? For example, if someone connects to the node and you check whether they are already online, they might be connected to another node and you don't get that data during the check.
4: After connecting to a node, are you automatically connected to all nodes? And does it automatically update your local RAM copies without you doing anything? How does it ensure synchronization when reading and writing? Do I have to do anything special?
And about question 1 again -- couldn't you have a node process running that holds the local schema, and use this node to connect all nodes together? And if possible could you forbid mnesia from copying ram copies to this node process?
I know this is a lot, so thank you for your time.
Not a direct answer to your questions, but you can check out Erlang Performance Lab, which might help you understand how some operations in Mnesia work by visualizing the messages between different nodes.
In Neo4j, I created data through the various exercises I've been doing.
When I run a query, for example MATCH (n) RETURN (n), even the database that was created back in "Christmas of 1914" appears on the screen, making my interface ugly, polluted, and loaded with objects that are unnecessary for the work at hand.
If I work with Northwind, I want to see only Northwind, if I work with Facebook, I just want to see Social, and so on. I do not want to see all the databases on the planet on my screen each time I run a query like MATCH (n) RETURN (n).
Neo4j doesn't really have a direct equivalent to multiple databases stored within the same server instance. There are three options for achieving this:
1) the closest match would be to run an additional instance of Neo4j on the same server. You will need to edit the neo4j.conf file to give the new instance new port numbers and a new data directory. This will give you isolation between the data and user accounts of the two databases. The downside is you will need to divide up the RAM on the box beforehand, effectively limiting both instances to half the RAM.
2) You can attach labels to your nodes to identify which bucket of data (database in the RDBMS world) each node belongs to; see the sketch after this list. You can operate as if the two are isolated even though they really live in the same database instance. Neo4j won't do a lot to help you enforce this; you will need to do the work at the application level. There is a mechanism for restricting users to interacting with only a subset of your graph, but you have to write custom procedures and restrict the users to using only those. I haven't tried it, but it sounds tedious.
https://neo4j.com/docs/operations-manual/current/security/authentication-authorization/subgraph-access-control/
3) If you are running on VMs or in the cloud, you might as well just create a new instance for your second database. It achieves the same effect as number one, but with better isolation of resources.
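A minimal sketch of option 2's label-based bucketing (embedded Java API; the Northwind label and property are placeholders):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Result;
    import org.neo4j.graphdb.Transaction;

    public class LabelBuckets {
        public static void demo(GraphDatabaseService graphDb) {
            try (Transaction tx = graphDb.beginTx()) {
                // Every node carries a label naming its "database"
                graphDb.execute("CREATE (:Northwind:Product {name: 'Chai'})");
                // Reads scope themselves to one bucket instead of MATCH (n) RETURN n
                try (Result result = graphDb.execute("MATCH (n:Northwind) RETURN n")) {
                    while (result.hasNext()) {
                        result.next();
                    }
                }
                tx.success();
            }
        }
    }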
This is just a general question, not too technical. We have a use case where we need to load hundreds of thousands of records into an existing Neo4j database. We cannot afford to take the database offline because of the users who are accessing it. I know that Neo4j requires an exclusive lock on the database while it's performing batch updates. Is there a way around my problem? I don't want to lock my database while doing updates; I still want my users to access it, even for just read-only access. Thanks.
Neo4j never requires an exclusive lock on the whole database. It selectively locks the portions of the graph that are affected by mutating operations. So there are some things you can do to achieve your goal. Are you a Neo4j Enterprise customer?
Option 1: If so, you can run your batch insert on the master node and route users to slaves for reading.
Option 2: Alternatively, you could do a "blue-green" style deployment where you:
take a backup (B) of your existing database (A), then mark database A read-only
apply your batch inserts onto B, either by starting a separate instance or, even better, by using BatchInserters (see the sketch after this list). That way, you'll insert your hundreds of thousands of records in a few seconds
start the new database B
flip a switch on a load-balancer, so that users start to be routed to B instead of A
take A down
(Please let me know if you need some tips on how to make a read-only DB.)
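For step 2, a minimal sketch of the BatchInserter approach (Neo4j 3.x packages; paths, labels, and properties are placeholders). BatchInserter requires exclusive access to the store, which is exactly why it runs against the offline copy B rather than the live A:

    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.unsafe.batchinsert.BatchInserter;
    import org.neo4j.unsafe.batchinsert.BatchInserters;

    import java.io.File;
    import java.util.Collections;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            BatchInserter inserter = BatchInserters.inserter(new File("/path/to/B/graph.db"));
            try {
                long a = inserter.createNode(
                    Collections.<String, Object>singletonMap("name", "a"), Label.label("Record"));
                long b = inserter.createNode(
                    Collections.<String, Object>singletonMap("name", "b"), Label.label("Record"));
                inserter.createRelationship(a, b, RelationshipType.withName("RELATED_TO"), null);
            } finally {
                inserter.shutdown(); // flushes everything to disk; B is now startable
            }
        }
    }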
Option 3: If you can only afford to run one instance at any one time, there are still techniques you can employ to let your users access the database as usual while you insert large volumes of data. One of them is a single-threaded "writer" with a queue that batches write operations. Because only one thread ever writes to the database, you never run into deadlock scenarios, and people can happily read from the database. For option 3, I suggest using GraphAware Writer.
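Here is a generic sketch of that single-threaded writer idea (this is not the GraphAware Writer API, just the underlying pattern; the batch size is arbitrary):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SingleThreadedWriter {
        private final GraphDatabaseService graphDb;
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        public SingleThreadedWriter(GraphDatabaseService graphDb) {
            this.graphDb = graphDb;
            Thread writer = new Thread(this::drain, "graph-writer");
            writer.setDaemon(true);
            writer.start();
        }

        // Callers enqueue write operations instead of opening their own transactions
        public void submit(Runnable writeOperation) {
            queue.add(writeOperation);
        }

        private void drain() {
            final int batchSize = 1000; // tune to taste
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Runnable first = queue.take(); // block until there is work
                    try (Transaction tx = graphDb.beginTx()) {
                        first.run();
                        // batch whatever else is already queued into the same commit
                        Runnable next;
                        int done = 1;
                        while (done < batchSize && (next = queue.poll()) != null) {
                            next.run();
                            done++;
                        }
                        tx.success();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }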
I've assumed you are not trying to insert hundreds of thousands of nodes into a running Neo4j database using Cypher. If you are, I would start there and change it to use the Java APIs or the BatchInserter API.
I am looking for a way to store Cypher queries and be notified, when adding nodes and relationships, whenever they match a stored query. Can this be done currently? Something similar to Elasticsearch percolators would be great.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
Update
The answer below was accurate in 2014. It's mostly accurate in 2018.
But now there is a way of implementing triggers in Neo4j, provided by Max DeMarzi, which is pretty good and will get the job done.
Original answer below.
No, it doesn't (Neo4j has no built-in trigger mechanism).
You might be able to get something similar to what you want by using a TransactionEventHandler object, which basically lets you bind a piece of code (in Java) to the processing of a transaction.
I'd be really careful with running Cypher in this context, though. Depending on what kind of matching you want to do, you could really slaughter performance by running it each time new data is created in the graph. Usually triggers in an RDBMS are specific to inserts or updates on a particular table. In Neo4j, the closest equivalent is reacting to the creation or modification of a node with a certain label. If your app has any number of different node classes, it wouldn't make sense to run your trigger code whenever new relationships or nodes are created, because most of the time the node type probably wouldn't be relevant to the trigger code.
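A minimal sketch of such a handler (Neo4j 2.x/3.x embedded API). In line with the caveat above, it reacts only to nodes carrying one specific label; the Watched label and flagged property are placeholders:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.event.TransactionData;
    import org.neo4j.graphdb.event.TransactionEventHandler;

    public class TriggerLikeHandler {
        public static void register(GraphDatabaseService graphDb) {
            graphDb.registerTransactionEventHandler(
                new TransactionEventHandler.Adapter<Void>() {
                    @Override
                    public Void beforeCommit(TransactionData data) {
                        for (Node node : data.createdNodes()) {
                            // Only act on the node type this "trigger" cares about
                            if (node.hasLabel(Label.label("Watched"))) {
                                node.setProperty("flagged", true); // your trigger logic
                            }
                        }
                        return null;
                    }
                });
        }
    }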
Related reading: Do graph databases support triggers? and a feature request for triggers in neo4j
Neo4j 3.5 supports triggers via the APOC library.
To use this functionality, enable apoc.trigger.enabled=true in $NEO4J_HOME/conf/neo4j.conf.
You also have to add APOC to the server; it's not there by default.
In a trigger you register Cypher statements that are called when data in Neo4j is changed (created, updated, deleted). You can run them before or after commit.
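For example, a sketch of registering such a trigger from Java over Bolt (driver 1.x to match Neo4j 3.5; the trigger name and the SET it performs are placeholders):

    import org.neo4j.driver.v1.AuthTokens;
    import org.neo4j.driver.v1.Driver;
    import org.neo4j.driver.v1.GraphDatabase;
    import org.neo4j.driver.v1.Session;

    public class RegisterTrigger {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                    AuthTokens.basic("neo4j", "secret"));
                 Session session = driver.session()) {
                // Stamp every newly created node with a timestamp, after commit
                session.run("CALL apoc.trigger.add(" +
                        "'timestampNewNodes', " +
                        "'UNWIND $createdNodes AS n SET n.created = timestamp()', " +
                        "{phase:'after'})");
            }
        }
    }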
Here is the help doc -
https://neo4j-contrib.github.io/neo4j-apoc-procedures/#triggers
I'm just getting into Neo4j and I spent a couple hours mapping out some nodes and relationships.
I downloaded the JSON and I'm trying to move the nodes to another computer. It seems like it should be a pretty simple query, but everything I'm finding about batch import is for CSVs and a bit more involved.
Is there just a simple Cypher query to import the JSON downloaded from the local Neo4j server?
Moving a full graph db to another box is most simply done by copying over the data/graph.db directory.
Alternatively you can use neo4j-shell's dump command (e.g. neo4j-shell -path data/graph.db -c 'dump' > dump.cql), which writes the graph out as Cypher statements.