Insert/update docs in Neo4j using Couchbase

I want to insert/update a document in Couchbase and, from there, have it automatically inserted/updated in the Neo4j database. Is there any plugin or software to do this? How can I achieve this functionality?
Couchbase Enterprise version: 6.6
Neo4j Enterprise version: 4.1.3
I read this blog https://dzone.com/articles/couchbase-amp-jdbc-integrations-for-neo4j-3x but I am not clear on the Neo4j JSON Loader; please guide me on this.

You could also use the Couchbase Eventing Service, which reacts to every mutation by running a fragment of JavaScript code. Refer to https://docs.couchbase.com/server/current/eventing/eventing-overview.html
You would probably want to use something similar to the code in this scriptlet example: https://docs.couchbase.com/server/current/eventing/eventing-handler-curl-post.html. Provided that the Neo4j REST API has sub-1 ms performance and honors keep-alive, a 12-physical-core system could stream about 40K inserts (or updates) per second from Couchbase to your Neo4j instance.
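As a rough sketch only (the "neo4j" binding alias, database name, and Cypher statement are assumptions, not taken from the linked docs), such a handler could forward every mutation to Neo4j's transactional HTTP endpoint:
function OnUpdate(doc, meta) {
    // "neo4j" is a hypothetical URL binding configured on this Eventing
    // function, pointing at http://<neo4j-host>:7474 with basic-auth
    // credentials stored in the binding.
    var request = {
        path: "/db/neo4j/tx/commit",
        headers: { "Content-Type": "application/json" },
        body: {
            statements: [{
                // Assumes flat documents; nested JSON would need to be
                // mapped onto the graph model rather than SET d += $props.
                statement: "MERGE (d:Document {id: $id}) SET d += $props",
                parameters: { id: meta.id, props: doc }
            }]
        }
    };
    try {
        var response = curl("POST", neo4j, request);
        if (response.status != 200) {
            log("Neo4j rejected mutation", meta.id, response.status);
        }
    } catch (e) {
        log("curl error", meta.id, e);
    }
}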

You can use the Couchbase Kafka connector to send CDC events to Kafka.
https://docs.couchbase.com/kafka-connector/current/quickstart.html
From there, you can read the Kafka topics in order to import the data into Neo4j:
https://github.com/neo4j-contrib/neo4j-streams
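On the Neo4j side, the neo4j-streams sink is driven by entries in neo4j.conf. A minimal sketch, assuming a topic named couchbase-docs and flat JSON record values (both illustrative):
kafka.bootstrap.servers=localhost:9092
streams.sink.enabled=true
# "event" is the deserialized record value from the topic.
streams.sink.topic.cypher.couchbase-docs=MERGE (d:Document {id: event.id}) SET d += event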

Related

How to use EmbeddedKsql in a Spring Boot application?

I have a Kafka Streams Java application up and running. I was trying to use KSQL for simple queries and Kafka Streams for complex solutions. I wanted to run both KSQL and Kafka Streams as a
Java application.
I was going through https://github.com/confluentinc/ksql/blob/master/ksqldb-examples/src/main/java/io/confluent/ksql/embedded/EmbeddedKsql.java. Is there any documentation for EmbeddedKsql, or any working prototype?
ksqlDB 0.10 has been launched, and one of the newest features in it is the Java client.
Please go through https://www.confluent.io/blog/ksqldb-0-10-0-latest-features-updates/
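A minimal sketch of that client, assuming a ksqlDB 0.10+ server on localhost:8088 and a stream named my_stream (both placeholders):
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;
import io.confluent.ksql.api.client.Row;
import io.confluent.ksql.api.client.StreamedQueryResult;

public class KsqlClientExample {
    public static void main(String[] args) throws Exception {
        // Connect to the ksqlDB server's client API.
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088);
        Client client = Client.create(options);

        // Push query: rows arrive as the stream produces them.
        StreamedQueryResult result =
                client.streamQuery("SELECT * FROM my_stream EMIT CHANGES;").get();
        Row row = result.poll(); // blocks until a row is available
        System.out.println(row.values());

        client.close();
    }
}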
The ksqlDB server does not have a supported Java API at this time. The project doesn't offer any guarantees of maintaining compatibility between releases.
If you were to run ksqlDB embedded in your Java application, then KsqlContext would be the class to play around with. But I'm not sure how up to date it is, nor can I guarantee it won't be removed in a future release. I'm afraid there isn't any documentation or example to look at, as it's not a supported use.
The only supported way to communicate with ksqlDB is really through its HTTP endpoints. You could still embed the server in your own Java app and talk locally over HTTP, though running them in separate JVMs has many benefits.
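For instance, a statement can be posted to the server's /ksql endpoint (host and port are placeholders):
curl -X POST http://localhost:8088/ksql \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d '{"ksql": "SHOW STREAMS;", "streamsProperties": {}}'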

web logs parsing for Spark Streaming

I plan to create a system where I can read web logs in real time and use Apache Spark to process them. I am planning to use Kafka to pass the logs to Spark Streaming to aggregate statistics. I am not sure if I should do some data parsing (raw to JSON ...), and if yes, where is the appropriate place to do it (Spark script, Kafka, somewhere else...)? I will be grateful if someone can guide me. It's kind of new stuff to me. Cheers
Apache Kafka is a distributed pub-sub messaging system. It does not provide any way to parse or transform data; it is not for that. But any Kafka consumer can process, parse, or transform the data published to Kafka and republish the transformed data to another topic, or store it in a database or file system.
There are many ways to consume data from Kafka; one is the one you suggested: real-time stream processors (Apache Flume, Apache Spark, Apache Storm, ...).
So the answer is no, Kafka does not provide any way to parse the raw data. You can transform/parse the raw data with Spark, but you can also write your own consumer, as there are Kafka client ports for many languages, or use any other pre-built consumer such as Apache Flume or Apache Storm.
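If you go the plain-consumer route, a minimal sketch (topic names, broker address, and the log format below are placeholders) could read raw lines, parse them to JSON, and republish:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogParser {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "log-parser");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("raw-logs"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Naive split of a space-delimited access-log line; a real
                    // parser would handle quoting and malformed input.
                    String[] f = record.value().split(" ");
                    if (f.length < 3) continue;
                    String json = String.format(
                            "{\"host\":\"%s\",\"timestamp\":\"%s\",\"request\":\"%s\"}",
                            f[0], f[1], f[2]);
                    producer.send(new ProducerRecord<>("parsed-logs", record.key(), json));
                }
            }
        }
    }
}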

Neo4j Server vs Embedded mode

I wanted to know exactly what is meant by Neo4j server and embedded mode. I even went through the post Neo4j Server vs. Embedded, but I didn't get those concepts clearly. I have installed Neo4j 2.1.1 on a Windows 64-bit machine, which is a Neo4j server. So when does Neo4j embedded mode come into the picture?
Also, how can we switch between embedded mode and server mode, or vice versa?
When I was working on a MySQL to Neo4j migration (using batch-import), after importing the nodes and relationships into Neo4j I got a message in the messages.log file as below:
Clean shutdown on BatchInserter(EmbeddedBatchInserter[C:\Users\Neo4j\t2.db])
How is "embedded" appearing here if I have installed the Neo4j server? So please clarify these queries.
Thanks
Embedded databases run inside of your application, meaning they're in the same JVM as your application. In general, with embedded databases you'll do direct database access or cypher queries. There are a lot of pros and cons here - one of the cons is that your JVM process locks the database; you can't have a bunch of different applications in different JVMs accessing the same embedded database at the same time. The pro is direct access.
When you're running a server, usually that means you're using the web admin components, which also provide a set of RESTful services. The pro of this is that it's in a different JVM, meaning you could access it more easily from other programming languages, over the network, and so on. You could have many applications in many JVMs all talking to a server instance via RESTful services. Generally access isn't as fast, but it's more flexible. When you run it this way, though, direct access to the graph inside of a Java application (using the Neo4j API) is off limits.
If you want to run the web admin/GUI stuff and RESTful services from within an embedded database, you can do that. See these instructions for how.
Here's a code snippet: what you need is the WrappingNeoServerBootstrapper.
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.server.WrappingNeoServerBootstrapper;

// Open (or create) an embedded database, then wrap it with the server.
GraphDatabaseAPI graphdb = (GraphDatabaseAPI) new GraphDatabaseFactory()
        .newEmbeddedDatabase("data/graph.db");
WrappingNeoServerBootstrapper srv = new WrappingNeoServerBootstrapper(graphdb);
srv.start();
// The server is now running
// until we stop it:
srv.stop();

Benefit of Apache Flume

I am new with Apache Flume.
I understand that Apache Flume can help transport data.
But I still fail to see the ultimate benefit offered by Apache Flume.
If I can configure a piece of software, or write my own, to decide which data goes where, why do I need Flume?
Maybe someone can explain a situation that shows Apache Flume's benefit?
Reliable transmission (if you use the file channel):
Flume sends batches of small events. Every time it sends a batch to the next node, it waits for an acknowledgment before deleting it. The storage in the file channel is optimized to allow recovery on crash.
I think the biggest benefit that you get out of Flume is extensibility. Basically all components, from source to interceptor to sink, are extensible.
We use Flume and read data using a custom Kafka source; the data is in the form of JSON, which we parse in the custom Kafka source and then pass on to the HDFS sink. It works reliably on 5 nodes. We only extended the Kafka source; the HDFS sink functionality we got out of the box.
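A sketch of such an agent configuration (component names, hosts, and paths here are illustrative, not our actual setup):
# Kafka source -> durable file channel -> HDFS sink
agent.sources = kafkaSrc
agent.channels = fileCh
agent.sinks = hdfsSink

agent.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSrc.kafka.bootstrap.servers = kafka1:9092
agent.sources.kafkaSrc.kafka.topics = web-logs
agent.sources.kafkaSrc.channels = fileCh

# The file channel persists events to disk so they survive a crash.
agent.channels.fileCh.type = file
agent.channels.fileCh.checkpointDir = /var/flume/checkpoint
agent.channels.fileCh.dataDirs = /var/flume/data

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = /logs/%Y/%m/%d
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink.channel = fileCh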
At the same time, being from the Hadoop ecosystem, you get great community support and multiple options to use the tools in different ways.

How to use ganglia ui with flume?

I am interested in monitoring my multi-agent Apache Flume setup. I have enabled the built-in Ganglia server, which provides me the Flume metrics as JSON data. Now I am interested in viewing this info in graphs/charts. To achieve this I am using the Ganglia web UI. I have these questions: do I have to install gmond and gmetad to achieve it? If not, how will I use the existing Ganglia info with the Ganglia web UI?
Thanks in advance.
You'll need both, IMHO. Moreover, I think Flume can communicate directly with a gmond by appending a few properties to JAVA_OPTS; see the Hortonworks docs.
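For example (host and port are placeholders; 8649 is gmond's default port), the reporting properties can be appended in flume-env.sh:
export JAVA_OPTS="$JAVA_OPTS -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=gmond-host:8649"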
You'll need gmetad because it stores your data in RRD files, and the web UI queries it to display graphs.
Graphite can do the job too.
