Connecting Local Neo4j Graph to Databricks Cluster - neo4j

I've created a local Neo4j graph database containing some data that I need to use in a Databricks notebook for graph analysis. I've seen that the Neo4j Spark Connector is available, and I was wondering whether it is possible to access my local database with it. I don't have any hosting service available for my database, and I haven't managed to find one that offers a free trial and is fairly easy to set up with Neo4j.
Any help would be greatly appreciated. I'm fairly new to both Neo4j and Databricks, so I hope my question is clearly explained.

If you're running Neo4j on localhost with the default ports, you only have to configure your password in spark.neo4j.bolt.password=<password>.
Otherwise, set spark.neo4j.bolt.url in your SparkConf, pointing e.g. to bolt://host:port.
You can provide the user and password as part of the URL, bolt://neo4j:<password>@localhost, or individually in spark.neo4j.bolt.user and spark.neo4j.bolt.password.
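As a rough sketch of the two options above, the settings can be collected in a plain dictionary before handing them to Spark. The helper name `neo4j_spark_conf` and its parameters are made up for illustration; only the `spark.neo4j.bolt.*` keys come from the answer, and 7687 is Neo4j's default Bolt port. It assumes the password contains no URL-special characters.

```python
from typing import Dict


def neo4j_spark_conf(password: str,
                     host: str = "localhost",
                     port: int = 7687,
                     user: str = "neo4j",
                     embed_credentials: bool = False) -> Dict[str, str]:
    """Build the spark.neo4j.bolt.* settings (hypothetical helper)."""
    if embed_credentials:
        # Option 1: credentials inside the URL, bolt://user:password@host:port.
        # Assumes user and password need no URL escaping.
        return {"spark.neo4j.bolt.url": f"bolt://{user}:{password}@{host}:{port}"}
    # Option 2: URL plus separate user and password keys.
    return {
        "spark.neo4j.bolt.url": f"bolt://{host}:{port}",
        "spark.neo4j.bolt.user": user,
        "spark.neo4j.bolt.password": password,
    }


if __name__ == "__main__":
    for key, value in neo4j_spark_conf("<password>").items():
        print(f"{key} {value}")
```

In PySpark, these pairs could be applied with `SparkConf().setAll(conf.items())`. Keep in mind that `localhost` is resolved on the machine running Spark, so `host` has to be an address that machine can actually reach.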
For more details, refer to "Neo4j Connector to Apache Spark".
Hope this helps.

Related

Can an Akka.net node hosted within a container participate in a cluster outside of the container host?

I'm fairly new to Akka.net and I'm a total noob when it comes to containers so please forgive me if this is too simple (but I kind of hope it is).
I'm trying to build a web app cluster using Azure app services. I want the lighthouse to be hosted in an Azure container instance. I've been successful putting the cluster together on my local box (without docker). I've tried standing up a local docker container with port forwarding but I haven't been able to get it to work.
Thanks in advance for your help.
You can definitely do this, but since you're using Azure App Services I'd recommend taking a look at Akka.Management and Akka.Discovery.Azure instead.
This will eliminate the need to use Lighthouse at all. Instead, your nodes can form a cluster on Azure App Service by querying a shared Azure Table Storage table.
There's a complete Azure App Services demo that shows how to do this here: https://github.com/petabridge/azure-app-service-akkadotnet
And the relevant code is here: https://github.com/petabridge/azure-app-service-akkadotnet/blob/dev/src/Akka.ShoppingCart/Startup.cs
NOTE: this uses the Akka.Hosting methods, which eliminate 99% of HOCON configuration and tie into Microsoft.Extensions for configuration, hosting, and DI. Akka.Hosting is a relatively new package and just hit stable at the end of 2022. You should definitely use it: all of the documentation and examples will be reworked to incorporate it once Akka.NET v1.5 ships at the end of February 2023.

Neo4j browser and some database queries

What is the difference between a remote and a local graph in Neo4j Browser? I have searched for an answer to this question but didn't find one, so please help me.
Thanks in advance.
I'm guessing this is asking about the options in the Neo4j Desktop.
A local graph is one you create on the same machine that is running the Neo4j Desktop application.
A remote instance is a Neo4j instance that resides on a different machine, so you need to supply the connection information so that a Bolt connection can be made and you can access the instance.

queries regarding neo4j HA setup

Hi, I am new to HA concepts and Neo4j HA. I have gone through the Neo4j docs, but I still have a couple of questions.
When using a PHP script to connect to the Neo4j database via REST, what IP should I use for the cluster? Is there a common IP for the cluster?
I ask because if the master fails, a new Neo4j instance becomes the master. How should my script connect to the new master? Should I use third-party software to point to the new master, or can that happen automatically with Neo4j through a common cluster IP? Pardon me if my concepts are weak; I just need some guidance.
How can I direct all reads and writes to the master only and use the slaves only for replication? Or is this the default setting? I see multiple-read and multiple-write scenarios, so I am getting confused.
Is there any doc/material that explains further how to set up an arbiter instance, or should I just configure a 3-node Neo4j HA cluster as explained in http://neo4j.com/docs/stable/ha-setup-tutorial.html and run the command below for one of the instances?
neo4j_home$ ./bin/neo4j-arbiter start
Any help is appreciated. Thanks!
Welcome to the community of Neo4j Users ;)
First, I recommend looking at neo4j-php-client, because it supports Neo4j HA clusters and could solve your problems without you having to build your own solution.
Best practice is to put some kind of load balancing in front of the Neo4j HA cluster. Here is a great article about it: http://blog.armbruster-it.de/2015/08/neo4j-and-haproxy-some-best-practices-and-tricks/
You can do that at the load balancer level based on HTTP methods (GET goes to the slaves; POST, PUT, and DELETE go to the master). But there is a problem with the Cypher endpoint, because it uses only the POST method. You can use an additional HTTP header to distinguish between read and write requests, but that logic must live in your application.
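The routing rule above can be sketched as a small decision function. This is only an illustration of the logic, not HAProxy configuration; the header name `X-Neo4j-Query-Mode` and its values are hypothetical, and any name works as long as the application and the load balancer agree on it.

```python
from typing import Mapping, Optional

# Hypothetical header name and values, chosen for this example only.
QUERY_MODE_HEADER = "X-Neo4j-Query-Mode"


def choose_backend(method: str,
                   headers: Optional[Mapping[str, str]] = None) -> str:
    """Return "master" or "slave" for a single HTTP request."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {k.lower(): v for k, v in (headers or {}).items()}
    mode = normalised.get(QUERY_MODE_HEADER.lower(), "").lower()
    if mode == "read":
        return "slave"
    if mode == "write":
        return "master"
    # No explicit hint from the application: fall back to the HTTP method.
    return "slave" if method.upper() == "GET" else "master"


if __name__ == "__main__":
    # A Cypher read is still a POST, so only the header can mark it as a read.
    print(choose_backend("POST", {QUERY_MODE_HEADER: "read"}))
    print(choose_backend("POST"))
```

Without the header, every Cypher request would be sent to the master, which is why the application has to tag its read queries.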
For a start, the official documentation is good enough.
Resources
Neo4j HA cluster configuration (example)
Neo4j cluster and firewalls
As my friend MicTech mentioned, we generally use HAProxy as a load balancer in front of Neo4j.
The PHP client mentioned above has a configuration mechanism that covers both cases:
When using HAProxy, you define which queries are reads and which are writes, and the client automatically adds a header to the HTTP request. The header is configurable too.
When not using HAProxy, you can define all your Neo4j instances in the client setup and activate the High-Availability extension (it works only with the cache enabled). When the master is down, the client automatically tries to detect the newly elected master and rewrites the connection configuration in the cache for further requests.
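The failover behaviour described for the non-HAProxy case can be sketched roughly as follows. This is not the neo4j-php-client implementation (and it is Python, not PHP); the function names, the `"master"` cache key, and the injected `is_master` probe are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional


def find_master(instances: Iterable[str],
                is_master: Callable[[str], bool]) -> Optional[str]:
    """Ask each configured instance in turn whether it is the master."""
    for url in instances:
        try:
            if is_master(url):
                return url
        except OSError:
            # Instance unreachable (for example the failed old master): skip it.
            continue
    return None


def current_master(instances: Iterable[str],
                   is_master: Callable[[str], bool],
                   cache: Dict[str, str]) -> Optional[str]:
    """Return the master, reusing the cached one while it still answers."""
    cached = cache.get("master")
    if cached is not None:
        try:
            if is_master(cached):
                return cached
        except OSError:
            pass  # Cached master is down: fall through and re-detect.
    master = find_master(instances, is_master)
    if master is None:
        cache.pop("master", None)
    else:
        cache["master"] = master  # Remember it for further requests.
    return master
```

In a real client, `is_master` would be an HTTP call to each instance's HA status endpoint, and the cache would be persistent rather than an in-memory dictionary.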
I tried to make the README as good as possible. Please read it and open issues on the repository if anything is missing.
https://github.com/graphaware/neo4j-php-client

Spring Data Neo4j load balancing

I'm working on an application using Spring Data Neo4j that works with an embedded Neo4j Server. I would like my application to be able to work with a cluster containing 3 Neo4j nodes, one of these nodes being the embedded server.
I am trying to accomplish some sort of load balancing within the cluster: 1. round-robin requests on each server or 2. write requests on the master embedded server and read requests on the other two servers.
Does Spring Data Neo4j have any kind of load balancing mechanism out of the box? What configuration is necessary to achieve this? Do I need additional tools like HAProxy or mod_proxy? Is there any example of how they can be integrated with the Neo4j cluster and Spring Data Neo4j?
A load balancer component is part of neither Neo4j nor Spring Data Neo4j. A sample setup using Neo4j as a server is documented at http://docs.neo4j.org/chunked/stable/ha-haproxy.html.
Since your application uses SDN in embedded HA mode, you need to expose the status of your local instance (master or slave) yourself to achieve the same thing that /db/manage/server/ha/master does in server mode. You might use HighlyAvailableGraphDatabase.isMaster() in your implementation.
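A minimal sketch of the response such a self-made status endpoint could return, written in Python only to show the mapping. It assumes the server-mode endpoint answers 200 with the body `true` on the master and 404 with the body `false` otherwise; verify that against the Neo4j HA documentation for your version. The function name is hypothetical.

```python
from typing import Tuple


def ha_master_status(is_master: bool) -> Tuple[int, str]:
    """Map the local instance's role to an (HTTP status, body) pair.

    Assumption: mirrors /db/manage/server/ha/master, i.e. 200/"true" on
    the master and 404/"false" on any other instance.
    """
    return (200, "true") if is_master else (404, "false")
```

In the embedded application, the boolean would come from HighlyAvailableGraphDatabase.isMaster(), and the load balancer's health check would poll the URL that serves this response.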

AzureWorkerHost get the uri after startup for Neo4jClient

I am trying to create an ASP.NET project with Neo4jClient to be hosted on Azure, and I am unable to grasp how to do the following:
Get hold of a Neo4j REST endpoint address once the worker role has started. I think I am seeing a different address each time the emulator spins up an instance of the worker role. I believe I'll need this to create a client, somewhat like this:
neo4jClient = new GraphClient(new Uri("http://localhost:7474/db/data"));
So, any thoughts on how to get hold of the URI after Neo4j is deployed by AzureWorkerHost?
Also, how is the graph database persisted to the blob store? In the example, it always deploys a new instance of the pristine database in the zip and updates it, which is probably not correct. I am unable to understand where to configure this.
BTW, I am using Neo4j 2.0 M06, and when it runs in the emulator, I get an endpoint somewhat like http://127.255.0.1:20000 in the emulator log, but I am unable to access it from my base machine.
Any clue what might be going on here?
Thanks,
Kiran
AzureWorkerHost was a proof of concept that hasn't been touched in a year.
The GitHub readme says:
Just past alpha. Some known deficiencies still. Not quite beta.
You likely don't want to use it.
The preferred way of hosting on Azure these days seems to be IaaS approach inside a VM. (There's a preconfigured one in VM Depot, but that's a little old now too.)
Or, you could use a hosted endpoint from somebody like GrapheneDB.
To answer your question generally though, Azure manages all the endpoints. The worker role says "hey, I need an endpoint to bind to!" and Azure works that out for it.
Then, you query this from the Web role by interrogating Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.Roles.
You'll likely not want to use the AzureWorkerHost for a production scenario, as the instances in the deployed configuration will destroy your data when they are re-imaged.
Please review these slides that illustrate step-by-step deployment of a Windows Azure Virtual Machine image of Neo4j community edition.
http://de.slideshare.net/neo4j/neo4j-on-azure-step-by-step-22598695
A Neo4j 2.0 Community Virtual Machine image will be released with the official release build of Neo4j 2.0. If you plan to use more than 30GB of data storage, please be aware that the currently supported VM image in Windows Azure's image depot must be configured from console through remote SSH to Linux.
Continue with your development using http://localhost:7474/ and then set up the VM when you are ready to deploy a staging or production build.
You can also use Heroku's free Neo4j database deployment, but you must configure basic authentication for your GraphClient connection in Neo4jClient.
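For reference, HTTP basic authentication boils down to one request header. The sketch below is in Python and only shows how that header is formed; it does not show how Neo4jClient itself accepts credentials, and the function name is made up for this example.

```python
import base64
from typing import Dict


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build the Authorization header used by HTTP basic authentication."""
    # The credentials are joined as "user:password" and Base64-encoded.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Base64 is an encoding, not encryption, so basic authentication should only be sent over HTTPS.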
