How to batch queries and perform them later in py2neo v3? - py2neo

In py2neo v2, it was possible to do this by using query.append(), which adds the query to a queue so that all the queries can be executed later.
How can I do the same thing in v3? I cannot find an equivalent function in the documentation for this.

The following is in Python 3.6.3 using py2neo version 3.1.2:
from py2neo import Graph
g = Graph(host="localhost", user="neo4j", bolt=True) # Modify for your situation
transaction = g.begin(autocommit=False)
transaction.append("match(n) return count(n);")
This is deprecated and produces the following warning:
/usr/local/bin/ipython3:1: DeprecationWarning: Transaction.append(...) is deprecated, use Transaction.run(...) instead
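As the warning says, the v3 replacement is Transaction.run(): open an explicit transaction, run your statements on it, and commit them all together. A minimal sketch, reusing the connection settings above and with placeholder Cypher statements:
from py2neo import Graph
g = Graph(host="localhost", user="neo4j", bolt=True)  # modify for your situation
tx = g.begin(autocommit=False)              # open an explicit transaction
tx.run("CREATE (:Person {name: 'Alice'})")  # each statement runs inside the still-open transaction
tx.run("CREATE (:Person {name: 'Bob'})")
tx.commit()                                 # everything above becomes permanent together
# tx.rollback() would discard the whole batch instead
Unlike v2's append(), run() may send each statement to the server as it is called, but nothing is made permanent until commit().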
Hope this helps.

Related

Spark Structured Streaming and Neo4j

My goal is to write transformed data from a MongoDB collection into Neo4j using Spark Structured Streaming. According to the Neo4j docs, this should be possible with the "Neo4j Connector for Apache Spark" version 4.1.2.
Batch queries work fine so far. However, with the example below, I run into an error message:
spark-shell --packages org.mongodb.spark:mongo-spark-connector:10.0.2,org.neo4j:neo4j-connector-apache-spark_2.12:4.1.2_for_spark_3
val dfTxn = spark.readStream.format("mongodb")
.option("spark.mongodb.connection.uri", "mongodb://<IP>:<PORT>")
.option("spark.mongodb.database", "test")
.option("spark.mongodb.collection", "txn")
.option("park.mongodb.read.readPreference.name","primaryPreferred")
.option("spark.mongodb.change.stream.publish.full.document.only", "true")
.option("forceDeleteTempCheckpointLocation", "true").load()
val query = dfTxn.writeStream.format("org.neo4j.spark.DataSource")
.option("url", "bolt://<IP>:<PORT>")
.option("save.mode", "Append")
.option("checkpointLocation", "/tmp/checkpoint/myCheckPoint")
.option("labels", "Account")
.option("node.keys", "txn_snd").start()
This gives me the following error message:
java.lang.UnsupportedOperationException: Data source org.neo4j.spark.DataSource does not support streamed writing
The connector is supposed to officially support streaming starting with version 4.x, though. Does anybody have an idea what I'm doing wrong?
Thanks in advance!
In case the connector doesn't support streaming writes, you can try something like the following: leverage the foreachBatch() functionality of Spark Structured Streaming and write the data to Neo4j in batch mode.
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch
def process_batch(batch_df, batch_id):
    # write each micro-batch with the connector's non-streaming batch mode, reusing the options from the question
    batch_df.write.format("org.neo4j.spark.DataSource").mode("Append") \
        .option("url", "bolt://<IP>:<PORT>").option("labels", "Account") \
        .option("node.keys", "txn_snd").save()
query = df.writeStream.foreachBatch(process_batch).start()  # df is the streaming DataFrame from readStream
In process_batch() above you can put your own Neo4j writer logic and write the data into the database using batch mode.

Prevent writing back for Neo4j algo

I work with a Neo4j database where I only have read permission.
I'm trying to run some of the algo procedures, e.g. the community detection procedure algo.scc.
According to the documentation, algo.scc has a parameter write which:
Specifies if the result should be written back as a node property.
However, when I run it with write set to false:
CALL algo.scc('Employee','MANAGES', {write:false})
YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize;
I get the following error:
Neo.ClientError.Security.Forbidden: Write operations are not allowed for user 'dm00221' with roles [reader].
I couldn't find any examples in the documentation with the {write:false} option.
What am I doing wrong?
Try the streaming variant instead; algo.scc.stream only streams results back to the client and never writes node properties, so it should not require write permission:
CALL algo.scc.stream('Employee', 'MANAGES', {concurrency:4})
YIELD nodeId, partition

BigQueryIO.Write in dataflow 2.X

The code below worked with the Dataflow 1.9 SDK; I'm migrating to 2.x:
PCollection<TableRow> tableRow = ...
tableRow.apply(BigQueryIO.write()
.to(String.format("%1$s:%2$s.%3$s",projectId, bqDataSet, bqTable))
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
I get
The method apply(PTransform<? super PCollection<TableRow>,OutputT>) in the type PCollection<TableRow> is not applicable for the arguments (BigQueryIO.Write<Object>)
The release notes are not much help here, and documentation for 2.x is non-existent; it redirects to the Beam API page.
Have you tried using BigQueryIO.writeTableRows()?
Apache Beam 2.1.0 BigQueryIO documentation:
https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html
You can try providing the TableRow type explicitly (BigQueryIO.<TableRow>write()...) or use BigQueryIO.writeTableRows() as suggested above.
It looks like the interface was made generic in 2.x; earlier versions had TableRow hard-coded.

What is the replacement for newEmbeddedDatabaseBuilder function?

I would like to work with the Neo4j packages for Java.
I see that the function newEmbeddedDatabaseBuilder is deprecated.
What is the best way to work with Neo4j from Java code now?
Thanks!
In Neo4j 3.0, you'll use the GraphDatabaseFactory:
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
The Neo4j Java manual is available here: http://neo4j.com/docs/java-reference/current/#tutorials-java-embedded

How to list all installed neo4j server plugins from web interface or shell?

Right now I'm using the neo4jrestclient python package to list extensions:
from neo4jrestclient.client import GraphDatabase
gdb = GraphDatabase("http://localhost:7474/db/data/")
ext = gdb.extensions
Is there a direct shell command I can use to do this? I also don't see anything on the web interface. I'm using 1.8.
Thanks!
Since this is the top answer in Google and the accepted answer is out of date for v3.0+, here's a new answer.
Neo4j 3.x ships with a number of built-in procedures; the one that lists all procedures in the database (including those added by plugins) is dbms.procedures(), and I find it most useful to get the signature of each procedure as well as its name. The query for that is:
CALL dbms.procedures() YIELD name, signature RETURN name, signature
Run
curl -v http://localhost:7474/db/data/
from the command line. Extensions aren't available on the web interface.
There is documentation on extensions for neo4j-rest-client:
>>> gdb.extensions
{u'GetAll': <Neo4j ExtensionModule: [u'get_all_nodes', u'getAllRelationships']>}
>>> gdb.extensions.GetAll
<Neo4j ExtensionModule: [u'get_all_nodes', u'getAllRelationships']>
>>> gdb.extensions.GetAll.getAllRelationships()[:]
[<Neo4j Relationship: http://localhost:7474/db/data/relationship/0>,
<Neo4j Relationship: http://localhost:7474/db/data/relationship/1>,
<Neo4j Relationship: http://localhost:7474/db/data/relationship/2>,
<Neo4j Relationship: http://localhost:7474/db/data/relationship/3>]
In newer versions, it's:
SHOW PROCEDURES YIELD name, description, signature
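If you want the same list programmatically, the procedures query can also be run from Python. A minimal sketch using py2neo (shown at the top of this page); the connection details are placeholders, and which Cypher call to use depends on your Neo4j version:
from py2neo import Graph
g = Graph(host="localhost", user="neo4j", password="secret")  # adjust credentials for your setup
# dbms.procedures() works on Neo4j 3.x; swap in "SHOW PROCEDURES" on newer versions
for record in g.run("CALL dbms.procedures() YIELD name, signature RETURN name, signature"):
    print(record["name"], record["signature"])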