Multiple sessions in Neo4j OGM

I am building a service using Neo4j OGM in Java. I need to connect to two Neo4j servers from my service to handle failover and replication. Is it possible to create multiple sessions, each towards a different Neo4j server, from one OGM service?

You can, in theory, create multiple SessionFactory instances, each pointing to a different database instance, and perform each operation on both. Just use Java configuration instead of a property file (this is true for plain OGM only; with SDN it would not be that simple).
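A minimal sketch of that setup, assuming Neo4j OGM 3.x used directly (not through SDN); the bolt URIs, credentials and the com.example.domain package are placeholders, not taken from the question:

```java
// Minimal sketch: two SessionFactory instances built from Java configuration,
// each pointing at a different Neo4j server. URIs/credentials are placeholders.
import org.neo4j.ogm.config.Configuration;
import org.neo4j.ogm.session.Session;
import org.neo4j.ogm.session.SessionFactory;

public class DualServerOgm {

    private final SessionFactory primaryFactory;
    private final SessionFactory secondaryFactory;

    public DualServerOgm() {
        Configuration primary = new Configuration.Builder()
                .uri("bolt://neo4j-primary:7687")
                .credentials("neo4j", "secret")
                .build();
        Configuration secondary = new Configuration.Builder()
                .uri("bolt://neo4j-secondary:7687")
                .credentials("neo4j", "secret")
                .build();

        primaryFactory = new SessionFactory(primary, "com.example.domain");
        secondaryFactory = new SessionFactory(secondary, "com.example.domain");
    }

    public void saveToBoth(Object entity) {
        // Each factory hands out sessions bound to its own server; the caller has to
        // decide what to do when the first save succeeds and the second one fails.
        Session primarySession = primaryFactory.openSession();
        Session secondarySession = secondaryFactory.openSession();
        primarySession.save(entity);
        secondarySession.save(entity);
    }
}
```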
There are multiple things to look out for:
you can't rely on the auto-generated ids, as they could be different on each database instance
when writing to 2 instances, the write to the first instance may (for various reasons - network issues, latencies, concurrency etc.) succeed while the write to the second fails, or vice versa - your code would need to handle that somehow
concurrency in general - queries that depend on the state of the database may behave differently on the two instances because one of them has received more updates than the other (the second is "behind")
Because of all these reasons I wouldn't recommend such a solution at all.
You would be better off with either Neo4j's HA setup or a causal cluster. See the Neo4j website regarding licensing.

Neo4j querying and aggregation across multiple instances

I am planning to use Neo4j for an event management system. The involved entities are events, places, persons, organizations, and so forth. But in order to keep each org's data separate I plan to create a separate DB instance for each org. Because of this separation of database instances, the 'Place' nodes are likely to get repeated across these multiple DB instances.
So, is it possible to aggregate events based on Place nodes across all DB instances? Or do I have to build my own custom aggregation, like map-reduce?
Thanks in advance for helping me out on this.
In Neo4j 4.0, if you have an Enterprise Edition license, you can leverage Neo4j Fabric, which should allow you to do exactly this: connect to a proxy instance that is configured to see the other running DB instances (which may be running on the same Neo4j DBMS as the proxy instance, or on separate servers/clusters).
Then you can query across the graphs, aggregating and working with the result set across them as needed.
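As a rough illustration (not a verified configuration), assuming the Fabric database is named fabric, the org databases are mounted as its graphs, and hypothetical Place/Event labels and a HELD_AT relationship, such a cross-graph aggregation could be issued through the Neo4j Java driver like this:

```java
// Sketch: run one aggregation over all graphs behind a Neo4j 4.x Fabric proxy.
// The Fabric database name, labels and relationship type are assumptions.
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.SessionConfig;

public class CrossInstanceAggregation {
    public static void main(String[] args) {
        String query =
            "UNWIND fabric.graphIds() AS gid "           // every org database mounted in Fabric
          + "CALL { "
          + "  USE fabric.graph(gid) "
          + "  MATCH (p:Place)<-[:HELD_AT]-(e:Event) "
          + "  RETURN p.name AS place, count(e) AS events "
          + "} "
          + "RETURN place, sum(events) AS totalEvents";  // aggregate over the union of all graphs

        try (Driver driver = GraphDatabase.driver("neo4j://fabric-proxy:7687",
                AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session(SessionConfig.forDatabase("fabric"))) {
            session.run(query).list().forEach(record ->
                System.out.println(record.get("place").asString()
                        + ": " + record.get("totalEvents").asLong()));
        }
    }
}
```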

Difference between database connector/reader nodes in KNIME

While creating a basic workflow using KNIME and PostgreSQL I have encountered problems with selecting the proper node for fetching data from the DB.
In the node repository we can find at least:
1) PostgreSQL Connector
2) Database Reader
3) Database Connector
Actually, we can achieve the same result using 2) alone, or by connecting either 1) or 3) to a reader node's input.
I assumed there are some hidden advantages, like improved performance with complex queries or better overall stability, but on the other hand we are using exactly the same database driver anyway.
There is a big difference between the Connector Nodes and the Reader Node.
The Database Reader reads data into KNIME; the data then resides on the machine running the workflow. This can be a bad idea for big tables.
The Connector nodes do not. The data remains where it is (usually on a remote machine in your cluster). You can then connect Database nodes to the connector nodes. All data manipulation will then happen within the database; no data is loaded onto your machine (unless you use the output port preview).
For the difference of the other two:
The PostgreSQL Connector is just a special case of the Database Connector with pre-set configuration. You can make the same configuration with the Database Connector, which also lets you choose more detailed options for non-standard databases.
One advantage of using 1) or 3) is that you only need to enter connection details once per database in a workflow, and can then use multiple reader or writer nodes. I'm not sure if there is a performance benefit.
1) offers simpler connection setup than 3), since it comes with the PostgreSQL JDBC driver bundled.

Can I have some keyspaces replicated to some nodes?

I am trying to build multiple APIs for which I want to store the data with Cassandra. I am designing it as if I would have multiple hosts, but the hosts I envision would be of two types: trusted and non-trusted.
Because of that, there is certain data which I don't want to end up replicated on one group of the hosts, while the rest of the data should be replicated everywhere.
I considered simply making a node for public data and one for protected data, but that would require the trusted hosts to run two nodes, and it would also complicate the way the API interacts with the data.
I am also building it in Docker containers, and I expect frequent node creation/destruction, both trusted and untrusted.
I want to know if it is possible to use keyspaces in order to achieve my required replication strategy.
You could have two datacenters, one holding your public data and the other the private data. You can configure keyspace replication to replicate data to only one (or both) DCs. See the docs on replication for NetworkTopologyStrategy.
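For illustration, a sketch of that keyspace split using the DataStax Java driver; the datacenter names, contact point and replication factors are placeholders:

```java
// Sketch: one keyspace replicated to both datacenters, one restricted to the
// trusted DC only. Datacenter names and contact points are placeholders.
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class KeyspaceSetup {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                .withLocalDatacenter("trusted_dc")
                .build()) {

            // Public data: replicated to both datacenters.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS public_data WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', 'trusted_dc': 3, 'untrusted_dc': 3}");

            // Protected data: replicated only within the trusted datacenter.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS protected_data WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', 'trusted_dc': 3}");
        }
    }
}
```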
However, there are security concerns here, since all the nodes need to be able to reach one another via the gossip protocol, and your client applications might need to contact both DCs for different reads and writes.
I would suggest you look into configuring security, perhaps SSL for starters and then internal authentication. Note that Kerberos is also supported, but this might be too complex for what you need, at least for now.
You may also consider taking a look at the firewall docs to see what ports are used between nodes and from clients so you know which ones to lock down.
Finally, as the other answer mentions, destroying / creating nodes too often is not good practice. Cassandra is designed to let you grow / shrink your cluster while running, but it can be a costly operation, as it involves not only streaming data from / to the node being removed / added but also other nodes shuffling token ranges around to rebalance.
You can run nodes in Docker containers; however, take care to avoid things like several containers all accessing the same physical resources. Cassandra is quite sensitive to I/O latency, for example, so several containers sharing the same physical disk might cause performance problems.
In short: no, you can't.
All nodes in a Cassandra cluster form a complete ring where your data will be distributed with your selected partitioner.
You can have multiple keyspaces, and authentication and authorization within Cassandra, and split your trusted and untrusted data into different keyspaces. Or you can go with two clusters for splitting your data.
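As a sketch of that keyspace-plus-authorization split (role names, keyspace names and credentials are placeholders, and internal authentication/authorization has to be enabled in cassandra.yaml first):

```java
// Sketch: a login role for the untrusted API hosts that is only granted access
// to the public keyspace; protected_data gets no grant at all.
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class RoleSetup {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                .withLocalDatacenter("trusted_dc")
                .withAuthCredentials("cassandra", "cassandra")
                .build()) {

            // Role handed out to the untrusted API hosts.
            session.execute("CREATE ROLE IF NOT EXISTS untrusted_api "
                          + "WITH PASSWORD = 'change-me' AND LOGIN = true");

            // It may read and write public data only.
            session.execute("GRANT SELECT ON KEYSPACE public_data TO untrusted_api");
            session.execute("GRANT MODIFY ON KEYSPACE public_data TO untrusted_api");
        }
    }
}
```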
From my experience, you also should not try to create and destroy Cassandra nodes as your usual daily business. Adding and removing nodes is costly and needs to be monitored, as your cluster needs to maintain replication and so on. So it might be good to split the Cassandra cluster from your API nodes.

Should instances of a horizontally scaled microservice share DB?

Given a microservice that owns a relational database and needs to scale horizontally, I see two approaches to provisioning the database server:
provide each instance of the service with its own DB server instance, with a coupled process lifecycle
OR
have the instances connect to a shared, independent DB server or cluster (shared by identical instances of the same service)
With an event driven architecture and the former approach, each instance of the microservice would need to process each event and take the appropriate action to mutate its own isolated state. This seems inefficient.
With the latter approach, only one instance has to process the event to achieve the same effect, but as a mutation of the shared state. One must ensure each event is processed by only one instance of the given microservice (is this trivial?) to avoid conflicts.
Is there consensus on preferred approach here? What lessons has your experience taught you on this?
I would go with the first approach, a service-local DB: each instance has its own DB instance. This makes it possible to change the persistence layer between versions of the service.
Changing the ER model would otherwise lead to conflicts. With this approach you would also be able to switch to a NoSQL solution easily.
For the event-driven design, I can recommend this book: Designing Event Driven Systems
As I see it, a service instance receives a request that leads to an event. This event is consumed by the other instances of the service, so the request doesn't need to be processed again, but the result has to be copied into each instance's state.

Does Informix have a "materialized view" equivalent or DB-table syncing?

Question: Does Informix have a construct equivalent to Oracle's "materialized view", or is there a better way to synchronize two tables (not whole DBs) across a DB link?
I could write a sync myself (I was asked to), but that seems like re-inventing the wheel.
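For context, such a hand-rolled sync over plain JDBC would look roughly like the sketch below; the table, columns and connection URLs are made up for illustration, and a real version would also need deletes-by-key, batching and error handling for the large tables.

```java
// Sketch of a naive hand-rolled table sync over JDBC: full refresh of a
// dimension table from one Informix instance to another.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DimensionTableSync {
    public static void main(String[] args) throws Exception {
        try (Connection source = DriverManager.getConnection(
                 "jdbc:informix-sqli://server-a:9088/mpm:INFORMIXSERVER=ids_a", "user", "pw");
             Connection target = DriverManager.getConnection(
                 "jdbc:informix-sqli://server-b:9088/mpm:INFORMIXSERVER=ids_b", "user", "pw")) {

            target.setAutoCommit(false);

            // Naive full refresh: clear the copy, then re-insert every row.
            try (Statement clear = target.createStatement()) {
                clear.executeUpdate("DELETE FROM kpi_dimension");
            }
            try (Statement read = source.createStatement();
                 ResultSet rs = read.executeQuery("SELECT dim_id, dim_name FROM kpi_dimension");
                 PreparedStatement insert = target.prepareStatement(
                     "INSERT INTO kpi_dimension (dim_id, dim_name) VALUES (?, ?)")) {
                while (rs.next()) {
                    insert.setLong(1, rs.getLong("dim_id"));
                    insert.setString(2, rs.getString("dim_name"));
                    insert.executeUpdate();
                }
            }
            target.commit();
        }
    }
}
```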
Background: Recently we had to split a monolithic Informix 9.30 DB (Valent's MPM), putting one part of the DB on one server and the other part on another server, since the combination of app server and DB server couldn't handle the load anymore.
In doing this we had to split a user-defined table space (KPI repository) arranged in a star schema of huge fact tables and well-defined dimension tables.
Unfortunately a telco manager decided to centralize the dimension tables on one machine (normalization, no data redundancy, no coding needed) and make them available as views over a DB link on the other machine. This is both slow and unstable: every now and then it crashes the DB server if the view is used in sub-queries (demonstrably), which is very uncool on a production server.
I may be missing something in your requirements, but could you not just use Enterprise Replication to replicate the single table across the DBs?
IDS 9.30 is archaic (four main releases off current). Ideally, it should not still be in service; you should be planning to upgrade to IDS 11.50.
As MrWiggles states, you should be looking at Enterprise Replication (ER); it allows you to control which tables are replicated. ER allows update-anywhere topologies; that is, if you have 2 systems, you can configure ER so that changes on either system are replicated to the other.
Note that IDS 9.40 and 10.00 both introduced a lot of features to make ER much simpler to manage - more reasons (if the fact that IDS 9.30 is out of support is not sufficient) to upgrade.
(IDS does not have MQT - materialized query tables.)
