Hi, I am new to HA concepts and Neo4j HA. I have gone through the Neo4j docs, but I still have a couple of questions.
When using a PHP script to connect to a Neo4j database via REST, what IP should I use for the cluster? Is there a common IP for the cluster?
I ask because if the master fails, a new Neo4j instance becomes the master. How should my script connect to the new master? Should I use third-party software to point to the new master, or can that happen automatically with Neo4j through a common cluster IP? Pardon me if my concepts are weak; I just need some guidance.
How can I direct all reads and writes to the master only and use the slaves only for replication? Or is this the default setting? I see multiple-read and multiple-write scenarios, so I am getting confused.
Is there any doc or material that explains more about setting up an arbiter instance, or should I just configure a 3-node Neo4j HA cluster as explained in http://neo4j.com/docs/stable/ha-setup-tutorial.html and run the command below for one of the instances?
neo4j_home$ ./bin/neo4j-arbiter start
Any help is appreciated. Thanks!
Welcome to the community of Neo4j Users ;)
First, I recommend looking at neo4j-php-client, because it supports Neo4j HA clusters and could solve your problems without you having to build your own solution.
Best practice is to put some kind of load balancer in front of the Neo4j HA cluster. Here is a great article about it: http://blog.armbruster-it.de/2015/08/neo4j-and-haproxy-some-best-practices-and-tricks/
You can split reads and writes at the load balancer level based on the HTTP method (GET goes to the slaves; POST, PUT and DELETE go to the master). But there is a problem with the Cypher endpoint, because it only uses the POST method. You can use an additional HTTP header to distinguish between read and write requests, but that logic must live in your application.
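To make the routing rule above concrete, here is a minimal sketch in Python of the decision a load balancer (or your own application) would have to make. The header name `X-Read-Only` is hypothetical; use whatever name your application and proxy agree on. The sketch assumes the Neo4j 2.x REST paths `/db/data/cypher` and `/db/data/transaction`, and it sends unmarked Cypher requests to the master as the safe default.

```python
# Cypher over REST always uses POST, so the method alone cannot tell reads from writes.
CYPHER_PATHS = ("/db/data/cypher", "/db/data/transaction")

# Hypothetical header name, set by the application on read-only Cypher requests.
READ_HEADER = "x-read-only"


def choose_backend(method, path, headers=None):
    """Return "master" or "slaves" for a single HTTP request."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    method = method.upper()

    if method == "POST" and path.startswith(CYPHER_PATHS):
        # Only requests explicitly marked as read-only may go to a slave.
        return "slaves" if headers.get(READ_HEADER) == "1" else "master"
    if method in ("GET", "HEAD"):
        return "slaves"
    # POST, PUT, DELETE and anything unknown go to the master.
    return "master"
```

In a real setup the same rule would be expressed in the load balancer's own configuration language; the function only shows the logic.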
To get started, the official documentation is good enough.
Resources
Neo4j HA cluster configuration (example)
Neo4j cluster and firewalls
As my friend MicTech mentioned, we generally use HAProxy as a load balancer in front of Neo4j.
The PHP client mentioned above has a configuration mechanism that covers both cases:
When using HAProxy, you define your queries as read or write, and the client automatically adds a header to the HTTP request. The header is configurable too.
When not using HAProxy, you can define all your Neo4j instances in the client setup and activate the High-Availability extension (it works only with the cache enabled). When the master is down, the client automatically tries to detect the newly elected master and rewrites the connection configuration in the cache for further requests.
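The idea of detecting the newly elected master can be sketched roughly as follows. This is not the PHP client's actual implementation, only an illustration in Python. It assumes each instance exposes the HA status endpoint `/db/manage/server/ha/master`, which answers 200 on the master and 404 on a slave, and that no authentication is required; the function names and instance URLs are made up for the example.

```python
import urllib.error
import urllib.request


def is_master(base_url, timeout=2.0):
    """Return True if the instance at base_url reports itself as the HA master."""
    url = base_url.rstrip("/") + "/db/manage/server/ha/master"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # A slave answers 404 (raised as HTTPError); a dead instance times out
        # or refuses the connection. Either way it is not the master.
        return False


def find_master(instances, probe=is_master):
    """Return the first instance that reports itself as master, or None."""
    for base_url in instances:
        if probe(base_url):
            return base_url
    return None
```

A client would call `find_master` again after a failed write and remember the result for later requests, which is roughly what the cache in the extension is for.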
I tried to make the README as good as possible; please read it and open issues on the repository if anything is missing.
https://github.com/graphaware/neo4j-php-client
Preface
I am currently trying to learn how micro-services work and how to implement container replication and API gateways. I've hit a block though.
My Application
I have three main services for my application.
API Gateway
Crawler Manager
User
I will be focusing on the API Gateway and Crawler Manager services for this question.
API Gateway
This is a docker container running a Go server. The communication is all done with GraphQL.
I am using an API Gateway because I expect to have different services in my application each having their own specialized API. This is to unify everything.
All it does is proxy requests to their appropriate service and return a response back to the client.
Crawler Manager
This is another docker container running a Go server. The communication is done with GraphQL.
More or less, this behaves similarly to another API gateway. Let me explain.
This service expects the client to send a request like this:
{
  # In production 'url' will be encoded in base64
  example(url: "https://apple.example/") {
    test
  }
}
The url can only link to one of these three sites:
https://apple.example/
https://peach.example/
https://mango.example/
Any other site is strictly prohibited.
Once the Crawler Manager service receives a request and the link is one of those three, it decides which other service should fulfill the request. In that way it behaves much like another API gateway, but a specialized one.
Each URL domain gets its own dedicated service for processing it. Why? Because each site varies quite a bit in markup and each site needs to be crawled for information. Because their markup is varied, I'd like a service for each of them so in case a site is updated the whole Crawler Manager service doesn't go down.
As far as querying goes, each site will return a response formatted identically to the other sites.
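To illustrate the dispatch step described above, here is a small sketch in Python (the real service is written in Go). The three hosts come from the list above; the downstream service names are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Allowed hosts mapped to hypothetical downstream service names.
ROUTES = {
    "apple.example": "apple-crawler",
    "peach.example": "peach-crawler",
    "mango.example": "mango-crawler",
}


def route(url):
    """Return the name of the service that should crawl url, or raise ValueError."""
    parts = urlsplit(url)
    # Anything that is not https on one of the three known hosts is rejected.
    if parts.scheme != "https" or parts.hostname not in ROUTES:
        raise ValueError("URL not allowed: %r" % url)
    return ROUTES[parts.hostname]
```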
Visual Outline
Problem
Now that we have a bit of an idea of how my application works I want to discuss my actual issues here.
Is having a sort of secondary API gateway standard and good practice? Is there a better way?
How can I replicate this system and have multiple Crawler Manager service family instances?
I'm really confused about how I'd actually create this setup. I looked at clusters in Docker Swarm / Kubernetes, but with the way I have it set up, it seems like I'd need to make clusters of clusters. That makes me question my design overall. Maybe I need to not think about keeping them so structured?
At a very generic level, if service A calls service B that has multiple replicas B1, B2, B3, ... then it needs to know how to call them. The two basic options are to have some sort of service registry that can return all of the replicas, and then pick one, or to put a load balancer in front of the second service and just directly reach that. Usually setting up the load balancer is a little bit easier: the service call can be a plain HTTP (GraphQL) call, and in a development environment you can just omit the load balancer and directly have one service call the other.
                                 /-> service-1-a
Crawler Manager --> Service 1 LB --> service-1-b
                                 \-> service-1-c
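For comparison, the service-registry option means the caller itself picks a replica. A minimal sketch in Python, assuming a static list of replica addresses and simple round-robin selection (the class name and addresses are illustrative, not from any particular registry product):

```python
import itertools


class StaticRegistry:
    """Toy service registry that hands out replicas in round-robin order."""

    def __init__(self, replicas):
        # Copy the list so later changes by the caller do not affect the cycle.
        self._cycle = itertools.cycle(list(replicas))

    def next_replica(self):
        """Return the address of the next replica to call."""
        return next(self._cycle)
```

With a load balancer in front of the replicas, none of this lives in the caller: it only knows one URL, which is why that option is usually the simpler one.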
If you're willing to commit to Kubernetes, it essentially has built-in support for this pattern. A Deployment is some number of replicas of identical pods (containers), so it would manage the service-1-a, -b, -c in my diagram. A Service provides the load balancer (its default ClusterIP type provides a load balancer accessible only within the cluster) and also a DNS name. You'd configure your crawler-manager pods with perhaps an environment variable SERVICE_1_URL=http://service-1.default.svc.cluster.local/graphql to connect everything together.
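On the calling side, reading that environment variable could look like the sketch below (shown in Python; the same idea applies to a Go server). The fallback URL is a hypothetical local development address for when no load balancer is involved.

```python
import os

# Hypothetical address of a single local instance, used when the variable is unset.
DEFAULT_SERVICE_1_URL = "http://localhost:8081/graphql"


def service_1_url(environ=os.environ):
    """Return the URL the crawler manager should call for service 1."""
    return environ.get("SERVICE_1_URL", DEFAULT_SERVICE_1_URL)
```

In the cluster the variable points at the Service's DNS name, so the code stays the same whether there is one replica or ten.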
(In your original diagram, each "box" that has multiple replicas of some service would be a Deployment, and the point at the top of the box where inbound connections are received would be a Service.)
In plain Docker you'd have to do a bit more work to replicate this, including manually launching the replicas and load balancers.
Architecturally what you've shown seems fine. The big "if" to me is that you've designed it so that each site you're crawling potentially gets multiple independent crawling containers and a different code base. If that's really justified in your scenario, then splitting up the services this way makes sense, and having a "second routing service" isn't really a problem.
I've created a local Neo4j graph database containing some data that I need to use in a Databricks notebook to do some graph analysis. I've seen that the Neo4j Spark Connector is available, and I was wondering whether it is possible to access my local database using it. I don't have any hosting service available for my database and haven't managed to find one that offers a free trial and is fairly easy to set up with Neo4j.
Any help would be greatly appreciated. I'm fairly new to both Neo4j and Databricks, so I hope my question is clear enough.
If you're running Neo4j on localhost with the default ports, you only have to configure your password with spark.neo4j.bolt.password=<password>.
Otherwise, set spark.neo4j.bolt.url in your SparkConf, pointing e.g. to bolt://host:port.
You can provide the user and password as part of the URL, bolt://neo4j:<password>@localhost, or individually in spark.neo4j.bolt.user and spark.neo4j.bolt.password.
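The two ways of passing credentials can be sketched as a small helper that builds the configuration entries. This is plain Python with no Spark dependency; the helper name and its parameters are hypothetical, and only the three `spark.neo4j.bolt.*` keys come from the answer above. Port 7687 is Neo4j's default Bolt port.

```python
def neo4j_spark_conf(password, user="neo4j", host="localhost", port=7687,
                     embed_credentials=False):
    """Return Spark configuration entries for the Neo4j connector as a dict."""
    if embed_credentials:
        # Credentials inside the URL. Passwords containing '@' or ':' are
        # safer passed through the separate keys below.
        url = "bolt://{}:{}@{}:{}".format(user, password, host, port)
        return {"spark.neo4j.bolt.url": url}
    return {
        "spark.neo4j.bolt.url": "bolt://{}:{}".format(host, port),
        "spark.neo4j.bolt.user": user,
        "spark.neo4j.bolt.password": password,
    }
```

Each key/value pair would then be set on the SparkConf or in the cluster's Spark configuration.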
For more details, refer to "Neo4j Connector to Apache Spark".
Hope this helps.
This question relates to Umbraco, Umbraco slave instances and the Umbraco Relation Service API.
We currently have a site designated as a master Umbraco instance which handles all updates to content. Our intention is to set up slave instances in different regions around the world behind a traffic manager to improve site performance globally. We've tested and this setup works fine.
As I understand it, the Umbraco slave instances with the "out of the box" slave configuration will not have their own databases, but instead poll a service on the master instance for content.
Question
We were planning on using the Umbraco relations API to relate multilingual content. I understand that this incurs a hit on the database; as our slave websites will not have a database, I presume this won't work.
Is my understanding of the situation correct?
Can the relations API be configured to work in this situation?
If not, is there a recommended alternate approach to managing related content in a way that will be supported by slave servers?
Thanks for any help you can provide, I'd be happy to answer any questions or provide any clarification necessary.
I'm working on an application using Spring Data Neo4j that works with an embedded Neo4j server. I would like my application to be able to work with a cluster containing 3 Neo4j nodes, one of these nodes being the embedded server.
I am trying to accomplish some sort of load balancing within the cluster: 1. round-robin requests on each server or 2. write requests on the master embedded server and read requests on the other two servers.
Does Spring Data Neo4j have any kind of load balancing mechanism out of the box? What configuration is necessary to achieve this? Do I need additional tools like HAProxy or mod_proxy? Is there any example of how they can be integrated with the Neo4j cluster and Spring Data Neo4j?
A load balancer component is part of neither Neo4j nor Spring Data Neo4j. A sample setup using Neo4j as a server is documented at http://docs.neo4j.org/chunked/stable/ha-haproxy.html.
Since your application uses SDN in embedded HA mode, you need to expose the status of your local instance (master or slave) yourself, to achieve the same thing that /db/manage/server/ha/master does in server mode. You might use HighlyAvailableGraphDatabase.isMaster() in your implementation.
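The shape of such a status endpoint can be sketched as follows. It is written in Python purely for illustration; in the Java application the `is_master_fn` callback would be backed by HighlyAvailableGraphDatabase.isMaster(). The path `/ha/master` is a hypothetical choice, and the 200/"true" versus 404/"false" responses mirror what the server-mode endpoint returns so that a load balancer health check can treat both the same way.

```python
from http.server import BaseHTTPRequestHandler


def master_status(is_master):
    """Map the local HA role to an HTTP status code and body."""
    return (200, b"true") if is_master else (404, b"false")


def make_handler(is_master_fn):
    """Build a request handler class that reports the role given by is_master_fn()."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical path; pick whatever the load balancer is configured to poll.
            if self.path != "/ha/master":
                self.send_error(404)
                return
            code, body = master_status(is_master_fn())
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return StatusHandler
```

The handler class can be served with `http.server.HTTPServer(("", port), make_handler(fn))`, and the load balancer then checks it the same way as in the HAProxy sample setup.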
I am trying to create an ASP.NET project with Neo4jClient to be hosted on Azure, and I am unable to grasp how to do the following:
How do I get hold of the Neo4j REST endpoint address once the worker role has started? I think I am seeing a different address each time the emulator spins up an instance of the worker role. I believe I'll need this to create a client somewhat like this:
neo4jClient = new GraphClient(new Uri("http://localhost:7474/db/data"));
So, any thoughts on how to get hold of the URI after Neo4j is deployed by AzureWorkerHost?
Also, how is the graph database persisted on the blob store? In the example it always deploys a new instance of the pristine DB from the zip and updates it, which is probably not correct. I am unable to understand where to configure this.
BTW, I am using Neo4j 2.0 M06, and when it runs in the emulator I get an endpoint somewhat like http://127.255.0.1:20000 in the emulator log, but I am unable to access it from my base machine.
Any clue what might be going on here?
Thanks,
Kiran
AzureWorkerHost was a proof of concept that hasn't been touched in a year.
The GitHub readme says:
Just past alpha. Some known deficiencies still. Not quite beta.
You likely don't want to use it.
The preferred way of hosting on Azure these days seems to be an IaaS approach inside a VM. (There's a preconfigured one in VM Depot, but that's a little old now too.)
Or, you could use a hosted endpoint from somebody like GrapheneDB.
To answer your question generally though, Azure manages all the endpoints. The worker role says "hey, I need an endpoint to bind to!" and Azure works that out for it.
Then, you query this from the Web role by interrogating Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.Roles.
You'll likely not want to use the AzureWorkerHost for a production scenario, as the instances in the deployed configuration will destroy your data when they are re-imaged.
Please review these slides that illustrate step-by-step deployment of a Windows Azure Virtual Machine image of Neo4j community edition.
http://de.slideshare.net/neo4j/neo4j-on-azure-step-by-step-22598695
A Neo4j 2.0 Community Virtual Machine image will be released with the official release build of Neo4j 2.0. If you plan to use more than 30GB of data storage, please be aware that the currently supported VM image in Windows Azure's image depot must be configured from console through remote SSH to Linux.
Continue with your development using http://localhost:7474/ and then set up the VM when you are ready to deploy a staging or production build.
You can also use Heroku's free Neo4j database deployment, but you must configure basic authentication for your GraphClient connection in Neo4jClient.