I am planning to use Neo4j for an event management system. The entities involved are events, places, persons, organizations, and so forth. But in order to keep each org's data separate, I plan to create a separate DB instance for each org. Because of this separation of database instances, the 'Place' nodes are likely to be repeated across these multiple DB instances.
So, is it possible to aggregate events based on Place nodes from all DB instances? Or do I have to build my own custom aggregation, like map-reduce?
Thanks in advance for helping me out on this.
In Neo4j 4.0, if you have an Enterprise Edition license, you can leverage Neo4j Fabric, which should allow you to do exactly this: connect to a proxy instance, which must be configured to see the other running DB instances (these may be running on the same Neo4j DBMS as the proxy instance, or they could instead be running on separate servers/clusters).
Then you can query across the graphs, aggregating and working with the result set across them as needed.
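For illustration, here is a minimal sketch using the 4.0 Java driver, assuming the Fabric database on the proxy is named fabric and that each org graph models events as (:Event)-[:HELD_AT]->(:Place); the URI, credentials, labels, and relationship type are placeholders to adapt to your setup:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.SessionConfig;

    public class FabricAggregation {
        public static void main(String[] args) {
            Driver driver = GraphDatabase.driver(
                    "neo4j://localhost:7687", AuthTokens.basic("neo4j", "secret"));

            // Fan out over every graph Fabric knows about, count events per
            // place inside each graph, then aggregate the partial counts.
            String query =
                      "UNWIND fabric.graphIds() AS graphId "
                    + "CALL { "
                    + "  USE fabric.graph(graphId) "
                    + "  MATCH (e:Event)-[:HELD_AT]->(p:Place) "
                    + "  RETURN p.name AS place, count(e) AS cnt "
                    + "} "
                    + "RETURN place, sum(cnt) AS totalEvents "
                    + "ORDER BY totalEvents DESC";

            // Sessions must target the Fabric database for USE to work.
            try (Session session = driver.session(SessionConfig.forDatabase("fabric"))) {
                session.run(query).forEachRemaining(record ->
                        System.out.println(record.get("place").asString()
                                + ": " + record.get("totalEvents").asLong()));
            }
            driver.close();
        }
    }

Note that matching the repeated Place nodes across graphs relies on a shared key (here the name property); Fabric won't deduplicate them for you, and the CALL subquery is effectively the map step with the outer sum() as the reduce.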
While creating a basic workflow using KNIME and PostgreSQL, I have encountered problems with selecting the proper node for fetching data from the DB.
In the node repository we can find at least:
1) PostgreSQL Connector
2) Database Reader
3) Database Connector
Actually, we can do the same using 2) alone, or by connecting either 1) or 3) to the input of node 2).
I assumed there are some hidden advantages, like improved performance with complex queries or better overall stability, but on the other hand we are using exactly the same database driver anyway.
There is a big difference between the Connector Nodes and the Reader Node.
The Database Reader reads data into KNIME; the data is then on the machine running the workflow. This can be a bad idea for big tables.
The Connector nodes do not read the data in. The data remains where it is (usually on a remote machine in your cluster). You can then connect Database nodes to the connector nodes. All data manipulation will then happen within the database; no data is loaded to your machine (unless you use the output port preview).
As for the difference between the other two:
The PostgreSQL Connector is just a special case of the Database Connector that has pre-set configuration. However, you can make the same configuration with the Database Connector, which allows you to choose more detailed options for non-standard databases.
One advantage of using 1) or 3) is that you only need to enter connection details once for a database in a workflow, and can then use multiple reader or writer nodes. I'm not sure if there is a performance benefit.
1) offers simpler connection details than 3), thanks to the bundled PostgreSQL JDBC driver.
I am building a Neo4j OGM service in Java. I need to connect to two Neo4j servers from my service to handle failover and replication. Is it possible to create multiple sessions, each towards a different Neo4j server, from one OGM service?
You can, in theory, create multiple SessionFactory instances pointing to different database instances and perform each operation on both. Just use Java configuration instead of a property file (this is true for OGM only; with SDN it would not be that simple).
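For illustration, a minimal sketch using OGM's Java configuration (the URIs, credentials, and domain package are placeholders):

    import org.neo4j.ogm.config.Configuration;
    import org.neo4j.ogm.session.Session;
    import org.neo4j.ogm.session.SessionFactory;

    public class DualServerOgmService {
        private final SessionFactory primary;
        private final SessionFactory secondary;

        public DualServerOgmService() {
            Configuration primaryConfig = new Configuration.Builder()
                    .uri("bolt://neo4j-primary:7687")
                    .credentials("neo4j", "secret")
                    .build();
            Configuration secondaryConfig = new Configuration.Builder()
                    .uri("bolt://neo4j-secondary:7687")
                    .credentials("neo4j", "secret")
                    .build();
            // Each factory scans the same domain package of annotated entities.
            primary = new SessionFactory(primaryConfig, "com.example.domain");
            secondary = new SessionFactory(secondaryConfig, "com.example.domain");
        }

        public void saveToBoth(Object entity) {
            Session first = primary.openSession();
            Session second = secondary.openSession();
            first.save(entity);
            // The second write can still fail independently (network issues,
            // latency, concurrency); real code must decide how to recover.
            second.save(entity);
        }
    }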
There are multiple things to look out for:
you can't rely on the auto-generated ids, as they could be different on each database instance
when writing to 2 instances, the write to the first instance may (for various reasons: network issues, latencies, concurrency, etc.) succeed while the write to the second fails, or vice versa; your code would need to handle that somehow
concurrency in general: queries dependent on the state of the database may behave differently on the two instances, because one of them has received more updates than the other (the second is "behind")
Because of all these reasons, I wouldn't recommend such a solution at all.
You would be better off with either Neo4j's HA or a causal cluster. See the website regarding licensing.
Until now, I have queried the Neo4j graph DB in two ways:
Using the server, where I need to select the database, start the server, and query via the web page.
Using Java, in which I select the database path, create the database object, and execute the query.
Now I am shifting from MySQL to Neo4j, where I have to replicate those databases and perform join queries.
My initial thought is to replicate every MySQL database as a graph DB in Neo4j. But I don't have any clue about querying two different graphs at a time.
Putting my question plainly:
How do I perform a join query on two different graphs in Neo4j?
Neo4j doesn't really have a concept of different graphs. Each Neo4j database is one big graph. You can store one set of nodes with a certain set of labels alongside another set of nodes with another set of labels. Those two groups could be unconnected, or connected in a few places; they could be thought of as different graphs, but there is nothing special separating them.
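To make this concrete, here is a sketch of a value-based "join" between two label-separated node sets in one embedded database (Neo4j 3.x embedded API; the Customer/Order labels, the email join key, and the store path are made-up examples standing in for two imported MySQL tables):

    import java.io.File;
    import java.util.Map;

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Result;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class LabelJoin {
        public static void main(String[] args) {
            GraphDatabaseService db = new GraphDatabaseFactory()
                    .newEmbeddedDatabase(new File("data/graph.db"));

            // The two "graphs" live side by side in one database, separated
            // only by their labels; the WHERE clause acts as the join.
            String cypher =
                      "MATCH (c:Customer), (o:Order) "
                    + "WHERE c.email = o.customerEmail "
                    + "RETURN c.name AS customer, count(o) AS orders";

            try (Transaction tx = db.beginTx();
                 Result result = db.execute(cypher)) {
                while (result.hasNext()) {
                    Map<String, Object> row = result.next();
                    System.out.println(row.get("customer") + ": " + row.get("orders"));
                }
                tx.success();
            }
            db.shutdown();
        }
    }

A better long-term model would replace the value-based join with an actual relationship, e.g. (:Customer)-[:PLACED]->(:Order), created at import time.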
If you want to have different Neo4j databases, you need to have different database directories (graph.db). In server mode those would be handled by different server installations with different sets of ports. In Java they can simply be in different directories. In either case, there is no way to run a "join" between two datasets without loading your data into memory and combining it yourself into one dataset.
How can I run multiple Neo4j databases simultaneously on a single server? I would like to have separate data directories and ports if this is possible.
Has anyone done this successfully? If so, please explain how.
I have tried something like:
bin\neo4j start
To set up Neo4j with multiple instances on a single server, you essentially configure a cluster, with each node having its own set of configuration properties. You then run the cluster in single-instance (non-HA) mode (otherwise you'll just end up with a replication cluster, which doesn't meet your requirement).
Full instructions are in the Neo4j docs online and in your local doc\manual folder.
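As a rough sketch (assuming a Neo4j 2.x-era install; the exact property names are in the manual for your version), you can copy the installation to a second directory and give the copy its own ports in conf\neo4j-server.properties:

    # second instance's conf\neo4j-server.properties
    # (the first instance keeps the default 7474/7473)
    org.neo4j.server.database.location=data/graph.db
    org.neo4j.server.webserver.port=7475
    org.neo4j.server.webserver.https.port=7476

Each copy is then started independently with its own bin\neo4j start. If the remote shell is enabled, its remote_shell_port in conf\neo4j.properties has to differ between the instances as well.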
Note: The folks at Neo Technology call this out as being for dev/test purposes. I can't offer guidance on running this in production, other than noting that you'd have multiple instances competing for the same resources (CPU, disk, memory, network).
It's possible to set up Rexster to serve up multiple Neo4j database directories. This is great if you're using the Gremlin query language. Other access forms may not be available (beyond my knowledge). Check out this question/answer: possible to connect to multiple neo4j databases via bulbs/Rexster?
Question: Does Informix have a construct equivalent to Oracle's "materialized view", or is there a better way to synchronize two tables (not DBs) across a DB link?
I could write a sync myself (I was asked to), but that seems like reinventing the wheel.
Background: Recently we had to split a monolithic Informix 9.30 DB (Valent's MPM) in two (one part of the DB on one server, the other part on the other server), since the combination of AppServer and DB server couldn't handle the load anymore.
In doing this, we had to split a user-defined table space (KPI Repository) arranged in a star schema of huge fact tables and well-defined dimension tables.
Unfortunately, a telco manager decided to centralize the dimension tables on one machine (normalization, no data redundancy, no coding needed) and thus make them available as views over a DB link on the other machine. This is both slow and unstable, as it every now and then crashes the DB server when the view is used in sub-queries (demonstrably); very uncool on a production server.
I may be misunderstanding your requirements, but could you not just use Enterprise Replication to replicate the single table across the DBs?
IDS 9.30 is archaic (four main releases off current). Ideally, it should not still be in service; you should be planning to upgrade to IDS 11.50.
As MrWiggles states, you should be looking at Enterprise Replication (ER); it allows you to control which tables are replicated. ER allows update-anywhere topologies; that is, if you have 2 systems, you can configure ER so that changes on either system are replicated to the other.
Note that IDS 9.40 and 10.00 both introduced a lot of features to make ER much simpler to manage - more reasons (if the fact that IDS 9.30 is out of support is not sufficient) to upgrade.
(IDS does not have MQT - materialized query tables.)