Failover handling with Community Edition of Neo4j

I am using the Community Edition of Neo4j on my local machine. Is there any way to handle failover? (A failover mechanism is present in the Enterprise Edition.)
One option I have thought of is to take a backup of the data directory in real time and run a new Neo4j installation on the backup directory. Is there any way to manage this through a script or through software such as a load balancer, which would bring up the backup on the same port and IP when the main server fails?
If anyone has prepared a setup for handling failover with the Community Edition, please share.

Copying over the store directory while the database is running is a dangerous operation that might result in corrupted data.
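To make that concrete: the safe order of operations is to shut the database down first and only then copy the store. A minimal sketch with the embedded 2.x Java API (paths are placeholders; for a standalone server the equivalent is stopping the service before copying its data directory):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class ColdCopy {
    public static void main(String[] args) throws Exception {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
        // ... normal use of the database ...
        db.shutdown(); // flushes and closes the store cleanly

        // Only after the clean shutdown is it safe to copy the store.
        Path src = Paths.get("data/graph.db");
        Path dst = Paths.get("backup/graph.db"); // must not exist yet
        try (Stream<Path> files = Files.walk(src)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                Files.copy(p, dst.resolve(src.relativize(p)));
            }
        }
    }
}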
Of course you can write a clustering solution on top of Neo4j Community yourself, but be assured that you would need to invest multiple man-years to do it in a production-ready way. That's why Neo4j Enterprise has already solved that problem for you.
The recommended way, of course, is to use the Enterprise Edition. Contact the Neo4j sales folks for more information regarding licensing and pricing.

Related

Neo4J online backup — any way to address the security flaw?

If I am to make an online backup using the neo4j-admin backup tool remotely, as is advised by Neo4J, I have to open a public IP and the backup port on my Neo4J application.
However, I don't see neo4j-admin asking for any login credentials, basically making it possible for anybody to access the server and copy all the data while the port is opened.
There is no setting inside the neo4j.conf that would only accept backup requests from a certain address.
So what does it mean? When the online backups are done remotely, as is advised, the database may be vulnerable to somebody else just copying all the data.
I didn't find anything in the Neo4j documentation that addresses this flaw (only a warning), and it looks like in the more than 7 years that this feature has been available as part of the commercial Enterprise Edition, no solution has been offered for it.
What do you do to protect the DB then? At the moment the only solution seems to not back it up remotely, but that causes additional stress on the server and is not the best solution. Plus the online backup is not stable when done locally for large DBs. Another solution could be to only open the port remotely via some kind of API to the server, but that may still be exploited if somebody figures out the time frame when the backup is made.
The documentation states that neo4j-admin must be invoked as the neo4j user. That is the user that owns the neo4j executables and the databases. So security is handled by the OS login, and the file permissions should be set to prevent unauthorised access to the neo4j directories/files, including the neo4j-admin executable.
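That said, depending on your Neo4j version you can at least control which interface the backup service listens on (a listen address, not a remote allow-list), and then tunnel remote backups over SSH instead of opening a public port. A sketch of the relevant neo4j.conf lines; the exact setting names vary between versions, so verify them against your documentation:

# Backup service reachable on loopback only; remote backups go through
# an SSH tunnel rather than a publicly opened port.
dbms.backup.enabled=true
dbms.backup.address=127.0.0.1:6362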

Cytoscape installation

We would like to install Cytoscape on a server (Windows or Linux) and allow multiple users to connect simultaneously using client software. Is this possible, and if so, what are the requirements with regard to the server and the client software?
Certainly that's possible. There aren't any particular requirements, but keep in mind that for large networks, Cytoscape likes lots of memory. At any rate, we use X2Go on our Linux servers and it works really well for Cytoscape. Keep in mind, however, that a CytoscapeConfiguration directory will be created in each user's home and that apps and configuration information will be stored there.

Amazon Windows Services for VPS, please advise

I need the VPS services for hosting my ASP.NET project.
However, it's not just ASP.NET hosting; I also need SQL Server, RabbitMQ, and either my running console app or my Windows service.
So I read the suggestions to use Amazon Web Services, as they provide the first year for free.
However, when I registered I found that I didn't have a clue where I was:
I don't see the option of creating a virtual machine with Windows
I don't see the option of setting up SQL Server on such a machine
and so on.
So I was wondering whether I'm in the right place.
Please advise whether AWS can provide me with what I need, or whether I've come to the wrong place.
AWS can provide all that you listed, but you'll need to do some learning on your end.
Basically you create an EC2 instance, and then use RDP to remote into it, and you can install software and configure it to your heart's content, just as if it were any other physical server.
If you want to use SQL Server, you'll have the choice of installing it directly on the instance using your own license, or using their 'hosted' version of SQL Server called RDS. You'll need to read about it and decide which option is better for your project - there is no single right way.
Lastly, I will point out that although the 'free tier' is nice, except for a really small application (i.e. a small DB on a low-traffic website), you may find that the free tier does not quite give you all the power you need to run a busy application. I would not base your decision on whether or not to use AWS on how much 'free' stuff you can get. The free tier is nice for learning, but plan on spending some money for a truly robust solution.

Neo4j Server vs Embedded mode

I wanted to know exactly what is meant by Neo4j server and embedded mode. I have gone through the post Neo4j Server vs. Embedded, but I didn't clearly understand those concepts. I have installed Neo4j 2.1.1 on a Windows 64-bit machine, which is a Neo4j server. So when does Neo4j embedded mode come into the picture?
Also, how can we switch from embedded mode to server mode, or vice versa?
When I was working on a MySQL-to-Neo4j migration (using batch-import), after importing the nodes and relationships into Neo4j I got a message in the messages.log file as below:
Clean shutdown on BatchInserter(EmbeddedBatchInserter[C:\Users\Neo4j\t2.db])
How does 'embedded' appear here if I have installed the Neo4j server? So please clarify these queries.
Thanks
Embedded databases run inside of your application, meaning they're in the same JVM as your application. In general, with embedded databases you'll do direct database access or cypher queries. There are a lot of pros and cons here - one of the cons is that your JVM process locks the database; you can't have a bunch of different applications in different JVMs accessing the same embedded database at the same time. The pro is direct access.
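For example, a minimal embedded session with the 2.x Java API looks like the following sketch (store path, label, and property are placeholders):

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
try (Transaction tx = db.beginTx()) {
    // Direct API access: the node is created in the same JVM, no network hop.
    Node n = db.createNode(DynamicLabel.label("Person"));
    n.setProperty("name", "example");
    tx.success();
}
db.shutdown(); // releases the lock so another process could open the store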
When you're running a server, usually that means you're using the web admin components which also provide a set of RESTful services. The pro of this is that it's in a different JVM. Meaning you could access it more easily from other programming languages, over the network, and so on. You could have many applications in many JVMs all talking to a server instance via RESTful services. Generally access isn't as fast, but it's more flexible. When you run it this way though, direct access to the graph inside of a java application (using the Neo4J API) is off limits.
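For instance, any HTTP client can send Cypher to a running server's transactional endpoint. A rough Java sketch (host, port, and query are placeholders; depending on version and configuration an Authorization header may also be required):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

URL url = new URL("http://localhost:7474/db/data/transaction/commit");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
String body = "{\"statements\":[{\"statement\":\"MATCH (n) RETURN count(n)\"}]}";
try (OutputStream out = con.getOutputStream()) {
    out.write(body.getBytes("UTF-8"));
}
System.out.println("HTTP " + con.getResponseCode()); // 200 on success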
If you want to run the web admin/GUI stuff and RESTful services from within an embedded database, you can do that. See these instructions for how.
Here's a code snippet: what you need is the WrappingNeoServerBootstrapper.
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.server.WrappingNeoServerBootstrapper;

// The wrapper needs the concrete GraphDatabaseAPI, so cast the embedded database:
GraphDatabaseAPI graphdb = (GraphDatabaseAPI)
        new GraphDatabaseFactory().newEmbeddedDatabase( "data/graph.db" );
WrappingNeoServerBootstrapper srv = new WrappingNeoServerBootstrapper( graphdb );
srv.start();
// The server (web UI and REST API) is now running
// until we stop it:
srv.stop();

offline web application design recommendation

I want to know which is the best architecture to adopt for this case:
I have many shops that connect to a web application developed using Ruby on Rails.
The internet is not reachable all the time.
The solution was to develop an offline system, which requires installing a local copy of the remote database.
All this was already developed.
Now what I want to do:
Always work on the local copy of the database.
Any change to the local database should be synchronized with the remote database.
All the local copies should have the same data as the other local copies.
To resolve this problem I thought about using JMS-like software, possibly RabbitMQ.
This consists of pushing any SQL statement into a queue; the statement is executed on the remote instance of the application, which inserts it into the remote DB and pushes the statement into another queue that is read by all the local instances. Roughly, the publishing side would look like the sketch below, although this seems complicated and could slow down the application.
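(Java shown purely for illustration; the exchange name, broker host, and SQL statement are placeholders.)

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("central.example.com"); // placeholder broker host
try (Connection conn = factory.newConnection();
     Channel channel = conn.createChannel()) {
    // Fanout exchange so that every local replica receives each change.
    channel.exchangeDeclare("db-changes", "fanout", true);
    String sql = "INSERT INTO orders (shop_id, total) VALUES (42, 19.99)";
    channel.basicPublish("db-changes", "", null, sql.getBytes("UTF-8"));
}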
Is there a design or recommendation that I should apply to resolve this kind of problem?
You can do that, but essentially you are developing your own replication engine. Those things can be a bit tricky to get right (what happens if m1 and m3 are executed on replica r1, but m2 isn't?). I wouldn't want to develop something like that unless you are sure you have the resources to make it work.
I would look into an existing off-the-shelf replication solution. If you are already using an SQL DB, it probably has some support for this. Look here for more details if you are using MySQL.
Alternatively, if you are willing to explore other backends, I heard that CouchDB has great support for replication. I also heard of people using git libraries to do that sort of thing.
Update: After your comment, I realize you already use MySQL replication and are looking for a solution for re-syncing the databases after being offline.
Even in that case RabbitMQ doesn't help you at all, since it requires a constant connection to work, so you are back to square one. The easiest solution would be to just write all the changes (SQL commands) into a text file at the remote location; then, when you get the connection back, copy that file (scp, ftp, email, or whatever) to the master server, run all the commands there, and then just resync all the replicas.
Depending on your specific project you may also need to make sure there are no conflicts when running commands from different remote locations, but there is no general technical solution to this. Again, depending on the project, you may want to cancel one of the transactions, notify the users that it happened, and so on.
I would recommend taking a look at CouchDB. It's a non-SQL database that does exactly what you are describing, automatically. It's used especially in phone applications that often don't have internet or data connectivity. The idea is that you have a local copy of a CouchDB database and one or more remote CouchDB databases. The CouchDB server then takes care of the replication across the distributed systems, and you always work off your local database. This approach is nice because you don't have to build your own distributed replication engine. For more details I would take a look at the 'Distributed Updates and Replication' section of their documentation.
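For a flavour of how simple it is, a one-shot (or, with "continuous", ongoing) replication is started with a single HTTP call to the _replicate endpoint; the database name and target URL below are placeholders:

POST /_replicate
Content-Type: application/json

{"source": "shop_db",
 "target": "https://central.example.com/shop_db",
 "continuous": true}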
