How can I run multiple Neo4j databases on a single server?

How can I run multiple Neo4j databases simultaneously on a single server? I would like to have separate data directories and ports if this is possible.
Has anyone done this successfully, and if so, can you explain how?
I have tried something like:
bin\neo4j start

To set up multiple Neo4j instances on a single server, you essentially configure them as you would a cluster, with each instance having its own set of configuration properties, but you run each one in single-instance (non-HA) mode (otherwise you'll just end up with a replication cluster, which doesn't meet your requirement).
Full instructions are in the Neo4j docs online and in your local doc\manual folder.
Note: the folks at Neo Technology recommend this setup for dev/test purposes only. I can't offer guidance on running it in production, other than to point out that you'd have multiple instances competing for the same resources (CPU, disk, memory, network).
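For example, with two unpacked copies of Neo4j, each copy keeps its own data/ directory, so only the ports need to differ; a minimal sketch for the second instance (port values are arbitrary examples, property names are from Neo4j 2.x):
# second instance: conf/neo4j-server.properties
org.neo4j.server.webserver.port=7475
org.neo4j.server.webserver.https.port=7476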

It's possible to set up Rexster to serve multiple Neo4j database directories. This is great if you're using the Gremlin query language; whether other access methods are available is beyond my knowledge. Check out this question/answer: possible to connect to multiple neo4j databases via bulbs/Rexster?
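If you go that route, the graphs are declared in Rexster's rexster.xml, roughly along these lines (graph names and paths are made up; this is TinkerPop 2-era Rexster configuration, so check the Rexster docs for the exact schema):
<graphs>
  <graph>
    <graph-name>graphA</graph-name>
    <graph-type>neo4jgraph</graph-type>
    <graph-location>/data/neo4j/graphA</graph-location>
  </graph>
  <graph>
    <graph-name>graphB</graph-name>
    <graph-type>neo4jgraph</graph-type>
    <graph-location>/data/neo4j/graphB</graph-location>
  </graph>
</graphs>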

Related

deploying ElasticSearch production using multiple clusters and nodes under docker

I am new to launching ES for a production environment. I want to create production-ready Elasticsearch clusters with master nodes, data nodes, backup nodes, and so on. I have read tutorials on the internet regarding this, including the official documentation, but I cannot get my head around it: in the official document, multiple clusters run under one machine. What if that machine goes down for some reason? What role do the master nodes play in that scenario? Where are the backup nodes to protect against data loss?
I want to know if there is any straightforward solution I can use for deploying ES on multiple machines serving the same purpose (for one project with specific data types) that is easily distributed and fault-tolerant.
Running multiple containers on a single host makes sense if you have a lot of resources on a given host that you want to partition up and use efficiently. Then you can have multiple hosts, each with multiple Elasticsearch containers, forming a cluster.
If you do that, look at using allocation awareness to make sure shards are balanced across hosts, so that the loss of a single host doesn't cost you your data.
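For example, with shard allocation awareness you tag each node with the host it runs on and Elasticsearch then spreads a shard's copies across different tags. A minimal sketch (attribute name and value are made up; the node.attr syntax assumes Elasticsearch 5.x or later):
# elasticsearch.yml on every container running on host one
node.attr.host_id: host-1
cluster.routing.allocation.awareness.attributes: host_id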

Can I have some keyspaces replicated to some nodes?

I am trying to build multiple APIs for which I want to store the data with Cassandra. I am designing it as if I would have multiple hosts, but the hosts I envision would be of two types: trusted and non-trusted.
Because of that, there is certain data which I don't want to end up replicated on one group of hosts, while the rest of the data should be replicated everywhere.
I considered simply making a node for public data and one for protected data, but that would require the trusted hosts to run two nodes, and it would also complicate the way the API interacts with the data.
I am also building it in Docker containers, and I expect that there will be frequent node creation/destruction, both trusted and non-trusted.
I want to know if it is possible to use keyspaces in order to achieve my required replication strategy.
You could have two datacenters, one holding your public data and the other the private data. You can configure keyspace replication to replicate each keyspace to only one (or both) DCs. See the docs on replication for NetworkTopologyStrategy.
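For example (keyspace and datacenter names are hypothetical), you could replicate a public keyspace to both DCs but keep a private keyspace only in the trusted DC:
CREATE KEYSPACE public_data
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_trusted': 3, 'dc_untrusted': 3};
CREATE KEYSPACE private_data
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_trusted': 3};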
However, there are security concerns here, since all the nodes need to be able to reach one another via the gossip protocol, and your client applications might also need to contact both DCs for different reads and writes.
I would suggest you look into configuring security, perhaps SSL for starters, and then internal authentication. Note that Kerberos is also supported, but this might be too complex for what you need, at least for now.
You may also want to take a look at the firewall docs to see which ports are used between nodes and from clients, so you know which ones to lock down.
Finally, as the other answer mentions, destroying/creating nodes too often is not good practice. Cassandra is designed to grow/shrink your cluster while running, but it can be a costly operation, as it involves not only streaming data from/to the node being removed/added but also other nodes shuffling token ranges around to rebalance.
You can run nodes in Docker containers, but take care not to have several containers all competing for the same physical resources. Cassandra is quite sensitive to I/O latency, for example; several containers sharing the same physical disk can cause performance problems.
In short: no, you can't.
All nodes in a Cassandra cluster form a complete ring across which your data is distributed by your selected partitioner.
You can have multiple keyspaces, plus authentication and authorization within Cassandra, and split your trusted and untrusted data into different keyspaces. Or you can go with two clusters to split your data.
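If you go the keyspace route, the permissions side could look like this (role and keyspace names are made up; this assumes authentication/authorization are enabled in cassandra.yaml):
CREATE ROLE untrusted_api WITH PASSWORD = 'change_me' AND LOGIN = true;
GRANT SELECT ON KEYSPACE public_data TO untrusted_api;
-- no grants on the private keyspace, so this role cannot read it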
From my experience, you also should not make creating and destroying Cassandra nodes your usual daily business. Adding and removing nodes is costly and needs to be monitored, as your cluster has to maintain replication and so on. So it might be good to keep the Cassandra cluster separate from your API nodes.

micro service architecture database backup and restore

I'm working on a big project which is based on a microservice architecture, so consider that I have 10 services, some of which have their own database;
these databases use different technologies (MySQL, MongoDB, Elasticsearch, ...).
So what is the best practice for backing up and restoring this collection of services?
The real problem is that these databases are related to each other. For example, in my logic backend server I keep the oauthId of each user, which comes from the OAuth server;
now consider restoring these two databases separately: the users DB in the logic server could then contain users for which there are no related records on the OAuth server.
Just for your information, I'm using docker, docker-compose, and docker swarm for my service orchestration.
As an idea: check how your services depend on each other. If the dependencies are acyclic, you might be able to back up all your data outside-in or inside-out without running into consistency issues.
Doing so would guarantee that, after a restore, no service holds references to data that is missing from the services it depends on.
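A minimal sketch of deriving such an order in Java (the service names and the dependency map are made up for illustration):
import java.util.*;

public class BackupOrder {
    // service -> services it depends on (hypothetical example)
    static final Map<String, List<String>> DEPS = Map.of(
            "oauth", List.of(),
            "logic", List.of("oauth"),
            "search", List.of("logic"));

    public static void main(String[] args) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String service : DEPS.keySet()) {
            visit(service, visited, order);
        }
        // inside-out order, e.g. [oauth, logic, search]; reverse it for outside-in
        System.out.println("backup order: " + order);
    }

    static void visit(String s, Set<String> visited, List<String> order) {
        if (!visited.add(s)) return;        // already scheduled
        for (String dep : DEPS.get(s)) {
            visit(dep, visited, order);     // dependencies first
        }
        order.add(s);
    }
}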
If your services show cyclic dependencies, you might be better served by running each service redundantly (e.g. master-slave replication). Then you can take down the slave instances and take a backup of the whole set of slaves while they are offline. That would allow you to create an atomic backup across all services. However, the quality of the backup then depends on the quality of the master-slave replication at each service.
Lastly, you could keep a record of changes per service, plus a full backup. You can then restore the full backup and start applying the change records until you reach a consistent state across the service instances. I think this requires logical dependencies (e.g. a request identifier) that allow you to correlate the change records, i.e. apply them across the services without the risk of applying them in a way that defies the logical dependencies that occurred when clients actually interacted with your services.
I hope these ideas can help you solve your problem :)

Multiple standalone neo4j instances on a single machine

I was wondering if I could run multiple standalone instances of Neo4j on a single machine. I understand that I could configure multiple instances as an HA cluster (here), but that is not my intention; I only need two totally different and independent instances of Neo4j on my machine (which runs Mac OS X, if that makes a difference). This is only for my dev testing. I tried having two separate directories with their own data/ and set two different ports for them, but only one runs properly.
I would appreciate any help coming my way. Thank you.
The easiest way is to unpack the Neo4j installation into two different locations. In one of the locations you need to change the port settings in conf/neo4j-server.properties and, if neo4j-shell is enabled, in conf/neo4j.properties as well.
Also consider setting dbms.pagecache.memory to a reasonable value. By default each instance will use up to 75% of RAM minus heap space, which is too much when running multiple instances on one box.
Based on @mepla's findings: the https port in neo4j-server.properties needs to be changed as well.
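For the neo4j.properties side, something like this in each instance (values are arbitrary examples; property names are from Neo4j 2.x):
# conf/neo4j.properties
# shell port, only needed if neo4j-shell is enabled (1337 is the default)
remote_shell_port=1338
# cap the page cache so two instances don't each claim 75% of RAM
dbms.pagecache.memory=2g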
You can also run individual Docker images that point to different data directories;
see: http://neo4j.com/developer/docker
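For example, using the official neo4j Docker image (names, paths, and ports are examples), each container gets its own host data directory and only the host-side port mapping differs:
docker run -d --name neo4j-one -p 7474:7474 -v $HOME/neo4j/one/data:/data neo4j
docker run -d --name neo4j-two -p 7475:7474 -v $HOME/neo4j/two/data:/data neo4j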
You can use Ineo:
https://github.com/cohesivestack/ineo
A simple but useful Neo4j instance manager
This GitHub repository (Multiple-Instances-Neo4j-Manager) provides a Neo4j manager for dealing with multiple standalone instances on a single machine.

Neo4j Server vs Embedded mode

I want to know exactly what is meant by Neo4j server and embedded mode. I have gone through the post Neo4j Server vs. Embedded, but I didn't clearly get those concepts. I have installed Neo4j 2.1.1 on a Windows 64-bit machine, which is a Neo4j server. So when does Neo4j embedded mode come into the picture?
Also, how can we switch from embedded mode to server mode, or vice versa?
When I was working on a MySQL-to-Neo4j migration (using batch-import), after importing the nodes and relationships into Neo4j I got a message in the messages.log file as below:
Clean shutdown on BatchInserter(EmbeddedBatchInserter[C:\Users\Neo4j\t2.db])
Why does "embedded" appear here if I have installed the Neo4j server? So please clarify these queries.
Thanks
Embedded databases run inside of your application, meaning they're in the same JVM as your application. In general, with embedded databases you'll do direct database access or cypher queries. There are a lot of pros and cons here - one of the cons is that your JVM process locks the database; you can't have a bunch of different applications in different JVMs accessing the same embedded database at the same time. The pro is direct access.
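As an illustration of direct embedded access, a minimal sketch (the path is an example; the API shown is Neo4j 2.x):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class EmbeddedExample {
    public static void main(String[] args) {
        // the store files are locked by this JVM while the database is open
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("/path/to/graph.db");
        try (Transaction tx = db.beginTx()) {
            Node n = db.createNode();
            n.setProperty("name", "example");
            tx.success();                   // mark the transaction for commit
        }
        db.shutdown();
    }
}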
When you're running a server, usually that means you're using the web admin components which also provide a set of RESTful services. The pro of this is that it's in a different JVM. Meaning you could access it more easily from other programming languages, over the network, and so on. You could have many applications in many JVMs all talking to a server instance via RESTful services. Generally access isn't as fast, but it's more flexible. When you run it this way though, direct access to the graph inside of a java application (using the Neo4J API) is off limits.
If you want to run the web admin/GUI stuff and RESTful services from within an embedded database, you can do that. See these instructions for how.
Here's a code snippet: what you need is the WrappingNeoServerBootstrapper.
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.server.WrappingNeoServerBootstrapper;

// obtain an embedded database first (the path is an example);
// in Neo4j 2.x the bootstrapper takes a GraphDatabaseAPI
GraphDatabaseAPI graphdb = (GraphDatabaseAPI) new GraphDatabaseFactory()
        .newEmbeddedDatabase("/path/to/graph.db");
WrappingNeoServerBootstrapper srv = new WrappingNeoServerBootstrapper(graphdb);
srv.start();
// The server is now running
// until we stop it:
srv.stop();
