Stored procedures in Gremlin Cosmos DB

I'm new to Gremlin and Cosmos DB, and I am trying to use stored procedures with the Cosmos DB Gremlin API.
I started with the Node.js quick-start doc to create a Node.js app connected to the Cosmos DB Gremlin API. Now I want to use stored procedures in that app.
I found only a single doc for stored procedures in Cosmos DB, and it covers only the DocumentDB (SQL) API. I didn't find any doc related to stored procedures in Gremlin.
Can anyone guide me on how to do that?
Thanks in advance.

I had the same problem as you, and found that Cosmos DB in Gremlin or Graph mode does not support stored procedures. You can create them from the UI, but because you cannot run any Gremlin query from them or attach them as triggers, they are useless. There is also no documentation on this.
I found a post from March 2019 saying that stored procedures are on the roadmap for Gremlin:
https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/20115355-gremlin-queries-from-stored-procedures
Personally, for my use case, I am considering using Neo4j instead of Cosmos DB because of the lack of stored procedures.

What's your use case?
Gremlin is a language for traversing graphs. Gremlin has no knowledge of Cosmos DB stored procedures, so you can't really execute a stored procedure via Gremlin.
However, Cosmos DB is multi-model: you can talk to it via Gremlin as well as via the native DocumentDB (SQL) API.
You should look into how to execute stored procedures via the DocumentDB API.
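For example, from the Node.js app mentioned in the question you could call a stored procedure through the SQL (DocumentDB) API with the @azure/cosmos package. This is only a minimal sketch under assumptions: the database name, container name, stored-procedure id, partition key, and argument shape below are placeholders, and whether a stored procedure is actually usable against a Gremlin-enabled account is exactly what is in doubt in the other answers.
// Hedged sketch: executing a Cosmos DB stored procedure via the SQL (DocumentDB) API.
// The endpoint, key, database/container names, sproc id, and arguments are placeholders.
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({
  endpoint: "https://your-account.documents.azure.com:443/",
  key: process.env.COSMOS_KEY,
});

async function callStoredProcedure() {
  const container = client.database("graphdb").container("graph");
  // Execute the stored procedure with one argument, scoped to a partition key value.
  const { resource } = await container.scripts
    .storedProcedure("createEdgeForNewVertex")
    .execute("partitionKeyValue", [{ employeeId: "emp-1" }]);
  console.log(resource);
}

callStoredProcedure().catch(console.error);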

Based on your comment on the first answer to the question ("Actually I want to add some edges every time a new vertex is created. For example, whenever a vertex with the label EMPLOYEE is created, an edge to the vertex COMPANY must be automatically created."), you can look into TinkerPop's EventStrategy.
EDIT:
Adding the essential parts from the EventStrategy documentation in case the link changes:
The purpose of the EventStrategy is to raise events to one or more MutationListener objects as changes to the underlying Graph occur within a Traversal. Such a strategy is useful for logging changes, triggering certain actions based on change, or any application that needs notification of some mutating operation during a Traversal. If the transaction is rolled back, the event queue is reset.
The following events are raised to the MutationListener:
New vertex
New edge
Vertex property changed
Edge property changed
Vertex property removed
Edge property removed
Vertex removed
Edge removed
To start processing events from a Traversal first implement the MutationListener interface. An example of this implementation is the ConsoleMutationListener which writes output to the console for each event. The following console session displays the basic usage:
gremlin> import org.apache.tinkerpop.gremlin.process.traversal.step.util.event.*
==>org.apache.tinkerpop.gremlin.structure.*, org.apache.tinkerpop.gremlin.structure.util.*, org.apache.tinkerpop.gremlin.process.traversal.*, ... (long list of imported packages omitted) ..., org.apache.tinkerpop.gremlin.process.traversal.step.util.event.*
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> l = new ConsoleMutationListener(graph)
==>MutationListener[tinkergraph[vertices:6 edges:6]]
gremlin> strategy = EventStrategy.build().addListener(l).create()
==>EventStrategy
gremlin> g = graph.traversal().withStrategies(strategy)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.addV().property('name','stephen')
Vertex [v[13]] added to graph [tinkergraph[vertices:7 edges:6]]
==>v[13]
gremlin> g.E().drop()
Edge [e[7][1-knows->2]] removed from graph [tinkergraph[vertices:7 edges:6]]
Edge [e[8][1-knows->4]] removed from graph [tinkergraph[vertices:7 edges:5]]
Edge [e[9][1-created->3]] removed from graph [tinkergraph[vertices:7 edges:4]]
Edge [e[10][4-created->5]] removed from graph [tinkergraph[vertices:7 edges:3]]
Edge [e[11][4-created->3]] removed from graph [tinkergraph[vertices:7 edges:2]]
Edge [e[12][6-created->3]] removed from graph [tinkergraph[vertices:7 edges:1]]
By default, the EventStrategy is configured with an EventQueue that raises events as they occur within execution of a Step. As such, the final line of Gremlin execution that drops all edges shows a bit of an inconsistent count, where the removed edge count is accounted for after the event is raised. The strategy can also be configured with a TransactionalEventQueue that captures the changes within a transaction and does not allow them to fire until the transaction is committed.
WARNING
EventStrategy is not meant for usage in tracking global mutations across separate processes. In other words, a mutation in one JVM process is not raised as an event in a different JVM process. In addition, events are not raised when mutations occur outside of the Traversal context.

Related

Read uncommitted using Firebird FIBPlus components

Using a TpFIBTransaction component, I'm trying to start a READ UNCOMMITTED transaction.
First of all, the TPBMode property has 3 possible values:
tpbDefault
tpbReadCommitted
tpbRepeatableRead
In TpFIBTransaction.StartTransaction I saw that setting tpbReadCommitted forces the following parameters:
write
isc_tpb_nowait
read_committed
rec_version
Using tpbRepeatableRead forces the following parameters instead:
write
isc_tpb_nowait
concurrency
So, it seems the only way to have "custom" transaction parameters is to set the tpbDefault value.
The values allowed for the TrParams property are the following (from the fib.pas unit):
TPBConstantNames: array[1..isc_tpb_last_tpb_constant] of String = (
'consistency',
'concurrency',
'shared',
'protected',
'exclusive',
'wait',
'nowait',
'read',
'write',
'lock_read',
'lock_write',
'verb_time',
'commit_time',
'ignore_limbo',
'read_committed',
'autocommit',
'rec_version',
'no_rec_version',
'restart_requests',
'no_auto_undo',
'no_savepoint'
);
I've tried adding only the 'read' value, but it seems it's still unable to read uncommitted data, even though 'read_committed' is not in the TrParams property.
MyTransaction.TrParams.Clear();
MyTransaction.TrParams.Add('read');
Is there some missing value in TPBConstantNames (something like 'read_uncommitted', if it exists...), or is there another way to set up a Firebird "read uncommitted" transaction?
It is not possible, because Firebird does not support the READ UNCOMMITTED isolation level.
You can find the following information in the documentation:
Note
The READ UNCOMMITTED isolation level is a synonym for READ COMMITTED, and provided only for syntax compatibility. It provides the exact same semantics as READ COMMITTED, and does not allow you to view uncommitted changes of other transactions.
and:
The three isolation levels supported in Firebird are:
SNAPSHOT
SNAPSHOT TABLE STABILITY
READ COMMITTED with two specifications (NO RECORD_VERSION and RECORD_VERSION)

neo4j: user defined procedure exception "Write operations are not allowed for `READ` transactions"

I have implemented a user defined procedure using the example template.
The procedure is annotated with "@Procedure(value = "foo.bar", mode = Mode.WRITE)"; nevertheless, when I try to execute an operation on a Node instance that modifies the graph, it fails with "Write operations are not allowed for READ transactions".
The node instance was obtained via db.findNode(...), and the write operation that I am attempting to execute is nodeinstance.createRelationshipTo(...).
Interestingly, the code works fine when run in the context of the neo4j testing harness.
Any help greatly appreciated!
From inspecting the APOC user-defined procedures, I learned the answer. I am using Neo4j 3.0.7; for 3.0.x, a procedure that wants to write to the graph must be annotated with "@PerformsWrites" as well as "@Procedure". The mode argument, "mode = Mode.WRITE", is for 3.1, while "@PerformsWrites" is for 3.0.x -- learned this from Stefan Armbruster.

Can Dataflow sideInput be updated per window by reading a gcs bucket?

I'm currently creating a PCollectionView by reading filtering information from a GCS bucket and passing it as a side input to different stages of my pipeline in order to filter the output. If the file in the GCS bucket changes, I want the currently running pipeline to use this new filter info. Is there a way to update this PCollectionView on each new window of data if my filter changes? I thought I could do it in startBundle, but I can't figure out how, or whether it's possible. Could you give an example if it is possible?
PCollectionView<Map<String, TagObject>> tagMapView =
    pipeline.apply(TextIO.Read.named("TagListTextRead")
                       .from("gs://tag-list-bucket/tag-list.json"))
            .apply(ParDo.named("TagsToTagMap").of(new Tags.BuildTagListMapFn()))
            .apply("MakeTagMapView", View.asSingleton());

PCollection<String> windowedData =
    pipeline.apply(PubsubIO.Read.topic("myTopic"))
            .apply(Window.<String>into(
                SlidingWindows.of(Duration.standardMinutes(15))
                              .every(Duration.standardSeconds(31))));

PCollection<MY_DATA> lineData = windowedData
    .apply(ParDo.named("ExtractJsonObject")
                .withSideInputs(tagMapView)
                .of(new ExtractJsonObjectFn()));
You probably want something like "use at most a 1-minute-old version of the filter as a side input" (since in theory the file can change frequently, unpredictably, and independently from your pipeline, there's really no way to completely synchronize changes of the file with the behavior of the pipeline).
Here's a (granted, rather clumsy) solution I was able to come up with. It relies on the fact that side inputs are implicitly also keyed by window. In this solution we're going to create a side input windowed into 1-minute fixed windows, where each window will contain a single value of the tag map, derived from the filter file as-of some moment inside that window.
PCollection<Long> ticks = p
    // Produce 1 "tick" per second
    .apply(CountingInput.unbounded().withRate(1, Duration.standardSeconds(1)))
    // Window the ticks into 1-minute windows
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
    // Use an arbitrary per-window combiner to reduce to 1 element per window
    .apply(Count.globally());

// Produce a collection of tag maps, 1 per each 1-minute window
PCollectionView<TagMap> tagMapView = ticks
    .apply(MapElements.via((Long ignored) -> {
        ... manually read the json file as a TagMap ...
    }))
    .apply(View.asSingleton());
This pattern (joining against slowly changing external data as a side input) comes up repeatedly, and the solution I'm proposing here is far from perfect; I wish we had better support for this in the programming model. I've filed a BEAM JIRA issue to track this.

Neo4j Java VM Tuning (v2.3 Community)

From what I can tell, I'm having an issue with my Neo4j v2.3 Community Java VM adding items to the Old Gen heap and never being able to garbage collect them.
Here is a detailed outline of the situation.
I have a PHP file which calls the Dropbox Delta API and writes out the file structure to my Neo4j database. Each call to Delta returns a 2000-item data set, from which I pull out the information I need. The following is an example of what that query looks like with just one item; usually I send in full batches of 2000 items, as that gave me the best results.
***Following is an example Query***
MERGE (c:Cloud { type:'Dropbox', id_user:'15', id_account:''})
WITH c
UNWIND [
{ parent_shared_folder_id:488417928, rev:'15e1d1caa88',.......}
]
AS items MERGE (i:Item { id:items.path, id_account:'', id_user:'15', type:'Dropbox' })
ON Create SET i = { id:items.path, id_account:'', id_user:'15', is_dir:items.is_dir, name:items.name, description:items.description, size:items.size, created_at:items.created_at, modified:items.modified, processed:1446769779, type:'Dropbox'}
ON Match SET i+= { id:items.path, id_account:'', id_user:'15', is_dir:items.is_dir, name:items.name, description:items.description, size:items.size, created_at:items.created_at, modified:items.modified, processed:1446769779, type:'Dropbox'}
MERGE (p:Item {id_user:'15', id:items.parentPath, id_account:'', type:'Dropbox'})
MERGE (p)-[:Contains]->(i)
MERGE (c)-[:Owns]->(i)
***The query is sent via Everyman***
static function makeQuery($client, $qry) {
return new Everyman\Neo4j\Cypher\Query($client, $qry);
}
This works fine and generally takes 8-10 seconds to run from start to finish.
The Dropbox account I am accessing contains around 35000 items, and it takes around 18 runs of my PHP script to populate my Neo4j database with the folder/file structure of the Dropbox account.
With every run of this PHP script, around 50 MB of items are added to the Neo4j JVM Old Gen heap, and 30 MB of that is not removed by GC.
The end result is that the VM obviously runs out of memory and gets stuck in a constant state of GC throttling.
I have tried a range of Neo4j VM settings, as well as an update from Neo4j v2.2.5 to v2.3, which actually appears to have made the problem worse.
My current settings are as follows,
-server
-Xms4096m
-Xmx4096m
-XX:NewSize=3072m
-XX:MaxNewSize=3072m
-XX:SurvivorRatio=1
I am testing on a Windows 10 PC with 8 GB of RAM and an i5 2.5 GHz quad core, running Java 1.8.0_60.
Any info on how to solve this issue would be greatly appreciated.
Cheers, Jack.
Reduce the new size to 1024m
change your settings to:
-server
-Xms4096m
-Xmx4096m
-XX:NewSize=1024m
It is most likely that the size of your transaction grows too large.
I recommend sending each of the parents in separately: instead of the UNWIND, send one statement each.
Make sure to use the new transactional HTTP endpoint; I recommend going with neoclient instead of Neo4jPHP.
You should also use parameters instead of literal values!!!
And don't repeat the user-id, type, etc. properties on every item.
Are you sure you want to connect everything to c, and not just the root of the directory structure? I would do the latter.
MERGE (c:Cloud:Dropbox { id_user:{userId}})
MERGE (p:Item:Dropbox {id:{parentPath}})
// owning the parent should be good enough
MERGE (c)-[:Owns]->(p)
WITH c
UNWIND {items} as item
MERGE (i:Item:Dropbox { id:item.path})
ON Create SET i += { is_dir:item.is_dir, name:item.name, created_at:item.created_at }
SET i += { description:item.description, size:item.size, modified:item.modified, processed:timestamp()}
MERGE (p)-[:Contains]->(i);
Make sure to use 2.3.0 for best MERGE performance for relationships.
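For illustration of the transactional endpoint and parameters advice above, here is a minimal sketch (in Node.js rather than PHP, with a placeholder URL, credentials, Cypher statement, and items payload); a PHP client such as neoclient performs the equivalent request:
// Hedged sketch: parameterized Cypher sent to Neo4j 2.x's transactional HTTP endpoint.
// The URL, credentials, statement, and items payload are placeholders.
const axios = require('axios')

const statement = `
  MERGE (c:Cloud:Dropbox { id_user: {userId} })
  WITH c
  UNWIND {items} AS item
  MERGE (i:Item:Dropbox { id: item.path })
  MERGE (c)-[:Owns]->(i)
`

axios.post(
  'http://localhost:7474/db/data/transaction/commit',
  {
    statements: [
      {
        statement: statement,
        parameters: {
          userId: '15',
          items: [{ path: '/docs/report.txt' }],
        },
      },
    ],
  },
  { auth: { username: 'neo4j', password: 'secret' } }
)
  .then((response) => console.log(response.data.results, response.data.errors))
  .catch((error) => console.error(error))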

Using Neo4j with React JS

Can we use the graph database Neo4j with React.js? If not, is there any alternative option for including a graph database in React.js?
Easily, all you need is neo4j-driver: https://www.npmjs.com/package/neo4j-driver
Here is the simplest usage:
neo4j.js
//import { v1 as neo4j } from 'neo4j-driver'
const neo4j = require('neo4j-driver').v1
const driver = neo4j.driver('bolt://localhost', neo4j.auth.basic('username', 'password'))
const session = driver.session()
session
  .run(`
    MATCH (n:Node)
    RETURN n AS someName
  `)
  .then((results) => {
    results.records.forEach((record) => console.log(record.get('someName')))
    session.close()
    driver.close()
  })
It is best practice to always close the session after you get the data. It is inexpensive and lightweight.
It is best practice to only close the driver once your program is done (like MongoDB). You will see extreme errors if you close the driver at a bad time, which is incredibly important to note if you are a beginner. You will see errors like 'connection to server closed', etc. In async code, for example, if you run a query and close the driver before the results are parsed, you will have a bad time.
You can see in my example that I close the driver afterwards, but only to illustrate proper cleanup. If you run this code in a standalone JS file to test, you will see that Node.js hangs after the query and you need to press CTRL + C to exit. Adding driver.close() fixes that. Normally, the driver is not closed until the program exits/crashes, which is never in a backend API, and not until the user logs out in the frontend.
Knowing this now, you are off to a great start.
Remember, session.close() immediately every time, and be careful with the driver.close().
You could put this code in a React component or action creator easily and render the data.
You will find it no different than hooking up and working with Axios.
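As a rough sketch of that (the component, the :Person label, the query, and the connection details are all placeholder assumptions, not part of the original answer):
// Hedged sketch: a React component that queries Neo4j directly with neo4j-driver.
// The bolt URL, credentials, label, and query are placeholders.
import React from 'react'
import { v1 as neo4j } from 'neo4j-driver'

const driver = neo4j.driver('bolt://localhost', neo4j.auth.basic('username', 'password'))

class PeopleList extends React.Component {
  state = { names: [] }

  componentDidMount() {
    const session = driver.session()
    session
      .run('MATCH (p:Person) RETURN p.name AS name LIMIT 25')
      .then((results) => {
        this.setState({ names: results.records.map((record) => record.get('name')) })
        session.close() // close the session as soon as the data is read
      })
      .catch((error) => {
        console.error(error)
        session.close()
      })
  }

  render() {
    return (
      <ul>
        {this.state.names.map((name) => <li key={name}>{name}</li>)}
      </ul>
    )
  }
}

export default PeopleList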
You can run statements in a transaction as well, which is beneficial for write-locking affected nodes. You should research that thoroughly first, but the transaction flow is like this:
const session = driver.session()
const tx = session.beginTransaction()
tx
  .run(query)
  .then(/* same as normal */)
  .catch(/* handle errors */)
// the difference is you can chain multiple queries in the same transaction:
const tx1 = await tx.run(query1).then(/* use results */)
const tx2 = await tx.run(query2).then(/* use results */)
// then, once you are ready to commit the changes:
if (results.good !== true) {
  tx.rollback()
  session.close()
  throw error
}
await tx.commit()
session.close()
const finalResults = { tx1, tx2 }
return finalResults
// in my experience, you have to await tx.commit()
// in async/await syntax, otherwise it may not commit properly
// since that operation is not instant
tl;dr;
Yes, you can!
You are mixing two different technologies together. Neo4j is a graph database and React.js is a front-end framework.
You can connect to Neo4j from JavaScript - http://neo4j.com/developer/javascript/
Interesting topic. I am using the driver in a React app and recently experienced some issues. I am closing the session every time a lifecycle hook completes, like in your example. When there were more intensive queries I would see a timeout error. Going back to my setup, I decided to experiment by closing the driver after some of the more expensive queries, and it looks like (still need more testing) the crashes are gone.
If you are deploying a real-world application, I would urge you to think about authentication and authorization when using a DB-to-React setup only, as you would have to store the username/password of the Neo4j server in the client. I am looking into options for having the Neo4j server issue a token and receive it for authorization, but the best practice is for sure to have a Node.js server in the middle, with something like Passport to handle authentication.
So, all in all, maybe the best scenario is to only use the driver in Node and have the browser always communicate with the Node server using axios...
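As a rough sketch of that middle-tier idea (the Express route, query, and credentials below are placeholder assumptions, not a prescribed setup):
// Hedged sketch: a small Express API in front of Neo4j; the browser calls this
// with axios instead of talking to Neo4j directly. Route, query, and credentials
// are placeholders.
const express = require('express')
const { v1: neo4j } = require('neo4j-driver')

const app = express()
const driver = neo4j.driver('bolt://localhost', neo4j.auth.basic('username', 'password'))

app.get('/api/people', (req, res) => {
  const session = driver.session()
  session
    .run('MATCH (p:Person) RETURN p.name AS name LIMIT 25')
    .then((results) => {
      session.close()
      res.json(results.records.map((record) => record.get('name')))
    })
    .catch((error) => {
      session.close()
      res.status(500).json({ error: error.message })
    })
})

app.listen(3000, () => console.log('API listening on port 3000'))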
