In the Tinkerpop or Titan documentation, all operations are based on a sample graph. How to creat a new empty graph to work on?
I am programming in erlang connecting to Tinkergraph, planned to use Titan later in production. There is no erlang driver for both so I am connecting by REST. It is easy to read from graph, but if I want to read from user's input then write into the graph, for example, to create a person named teddy:
screenshot 1
I got those errors. What is the correct way?
Thank you.
Update: For following situation:
23> Newperson=terry.
terry
24> Newperson.
terry
If I want to add this terry, below two will not work. What's the correct way to do it?
screenshot 2
1
TitanGraph titanGraph = TitanFactory.open(config); will open a titan graph without the sample data.
If you have already commited the sample data to your keyspace then you can just change the keyspace defined in your config file.
For example if you are using a cassandra backend you would change storage.cassandra.keyspace=xxxxxx .
You can also clear any keyspace using TitanCleanup.clear(graph);
2
As for the error you are seeing. It looks like you are trying to label your vertex incorrectly. I posted the following and it worked:
{
"gremlin" : "g.addV(label, x).property(y,z)",
"bindings" :
{
"x" : "person",
"y" : "name",
"z" : "Teddy"
}
}
A final note, when you start using Titan 1.0.0 make sure you checkout this section of the tinkerpop docs. Especially make sure to change the channel in the gremlin-server.yaml config to:
channelizer: com.tinkerpop.gremlin.server.channel.HttpChannelizer
Answer to my own question: construct a Body by lists:concat() or ++, then post
Related
My goal is to write transformed data from a MongoDB collection into Neo4j using Spark Structured Streaming. According to the Neo4j docs, this should be possible with the "Neo4j Connector for Apache Spark" version 4.1.2.
Batch queries so far work fine. However, with the following example below, I run into an error message:
spark-shell --packages org.mongodb.spark:mongo-spark-connector:10.0.2,org.neo4j:neo4j-connector-apache-spark_2.12:4.1.2_for_spark_3
val dfTxn = spark.readStream.format("mongodb")
.option("spark.mongodb.connection.uri", "mongodb://<IP>:<PORT>")
.option("spark.mongodb.database", "test")
.option("spark.mongodb.collection", "txn")
.option("park.mongodb.read.readPreference.name","primaryPreferred")
.option("spark.mongodb.change.stream.publish.full.document.only", "true")
.option("forceDeleteTempCheckpointLocation", "true").load()
val query = dfPaymentTx.writeStream.format("org.neo4j.spark.DataSource")
.option("url", "bolt://<IP>:<PORT>")
.option("save.mode", "Append")
.option("checkpointLocation", "/tmp/checkpoint/myCheckPoint")
.option("labels", "Account")
.option("node.keys", "txn_snd").start()
This gives me the following error message:
java.lang.UnsupportedOperationException: Data source org.neo4j.spark.DataSource does not support streamed writing
Although the Connector should officially support streaming starting with version 4.x. Does anybody have an idea what I'm doing wrong?
Thanks in advance!
Incase, if the connector doesnt support streaming writes, you can try something like below.
you can leverage foreachBatch() functionality from spark structured streaming and write the data into Neo4j in batch mode.
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch
def process_entry(df, id):
df.write.ToNeo4j(url=url, table="mytopic", mode="append", properties=props)
query = df.writeStream.foreachBatch(process_entry).start()
In the above code, you can have your Neo4j Writer logic and you can write the data into database using batch mode.
I have a situation where I want to import my graph data to database.I am having janusgraph(latest version) running with cassandra(version 3) and elasticsearch(version 6.6.0) using Docker.I have been suggested to use gryo format.So I have tried this command
graph.io(IoCore.gryo()).reader().create().readGraph(ToInputStream.from("my_graph.kryo"), graph);
but ended up with an error
No such property: ToInputStream for class: Script4
The documentation I am following is here.Please take a look and put me in a right procedure. Thanks in advance!
ToInputStream is not a function of Gremlin or JanusGraph. I believe that it is only a function of IBM Compose so unless you are running JanusGraph on that specific platform, this command will not work.
Versions of JanusGraph that utilize TinkerPop 3.4.x will support the io() step and this is the preferred manner in which to load gryo (as well as graphson and graphml) files.
Graph graph = ... // setup JanusGraph instance
GraphTraversalSource g = traversal().withGraph(graph); // might use withRemote() here instead depending on how you are connecting I suppose
g.io("graph.kryo").read().iterate()
Note that if you are connecting remotely - it seems you are sending scripts to the Docker instance given your error - then be sure that that "graph.kryo" file path is accessible to Docker. That's what's nice about ToInputStream from Compose as it allows you to access remote sources.
I'd like to perform an atomic GET in Redis, and if the value returned is equal to some expected value, I'd like to do a SET, but I want to chain all of this together as one atomic operation. (I'm trying to set a flag that indicates whether any process is writing data to disk, as only one process may be permitted to do so.)
Is it possible to accomplish this with Redis?
I have seen documentation on MULTI operations but I haven't seen conditional operations i MULTI operations. Any suggestions others can offer with this would be greatly appreciated!
You can do both the GET and set operations on the redis server itself using Lua scripts. They're atomic and allow you to add logic too.
I ended up using redlock-py, an implementation of the redlock algorithm that the Redis docs recommend for creating write locks: https://redis.io/topics/distlock. The linked article is fantastic reading for anyone looking to create similar write locks in Redis.
redis-if - lua script for "conditional transactions". More convenient than WATCH + MULTY.
You can pass any combination of conditions & followed commands as json object:
const Redis = require('ioredis')
const redis = new Redis()
redis.defineCommand('transaction', { lua: require('redis-if').script, numberOfKeys: 0 })
await redis.set('custom-state', 'initialized')
await redis.set('custom-counter', 0)
// this call will change state and do another unrelated operation (increment) atomically
let success = await redis.transaction(JSON.stringify({
if: [
// apply changes only if this process has acquired a lock
[ 'initialized', '==', [ 'sget', 'custom-state' ] ]
],
exec: [
[ 'set', 'custom-state', 'finished' ],
[ 'incr', 'custom-counter' ]
]
}))
With this script we removed all custom scripting from our projects.
I came across this post looking for a similar type of function, but I didn't see any options that appealed to me. I opted instead to write a small module in Rust that provides this exact type of operation:
https://github.com/KennethWilke/redis-setif
With this module you would do this via:
SETIF <key> <expected> <new>
HSETIF <key> <field> <expected> <new>
You can do this by SET command, with these 2 arguments, which according to the docs here:
GET - return the old string stored at key, or nil if key did not
exist.
NX - Only set the key if it does not already exist.
Since Redis doesn't execute any command while another command is running - you have the 2 operations in an atomic manner.
From what I can tell I'm having an issue with my Neo4j v2.3 Community Java VM adding items to the Old Gen Heap and never being able to Garbage Collecting them.
Here is a detailed outline of the situation.
I have a PHP file which calls the Dropbox Delta API and writes out the file structure to my Neo4j Database. Each call to Delta returns a 2000 Item data sets of which I pull out the information I need, the following is an example of what that query looks like with just one item, usually I send in full batches of 2000 items as it gave me the best results.
***Following is an example Query***
MERGE (c:Cloud { type:'Dropbox', id_user:'15', id_account:''})
WITH c
UNWIND [
{ parent_shared_folder_id:488417928, rev:'15e1d1caa88',.......}
]
AS items MERGE (i:Item { id:items.path, id_account:'', id_user:'15', type:'Dropbox' })
ON Create SET i = { id:items.path, id_account:'', id_user:'15', is_dir:items.is_dir, name:items.name, description:items.description, size:items.size, created_at:items.created_at, modified:items.modified, processed:1446769779, type:'Dropbox'}
ON Match SET i+= { id:items.path, id_account:'', id_user:'15', is_dir:items.is_dir, name:items.name, description:items.description, size:items.size, created_at:items.created_at, modified:items.modified, processed:1446769779, type:'Dropbox'}
MERGE (p:Item {id_user:'15', id:items.parentPath, id_account:'', type:'Dropbox'})
MERGE (p)-[:Contains]->(i)
MERGE (c)-[:Owns]->(i)
***The query is sent via Everyman****
static function makeQuery($client, $qry) {
return new Everyman\Neo4j\Cypher\Query($client, $qry);
}
This works fine and generally from start to finish takes 8-10 seconds to run.
The Dropbox account I am accessing contains around 35000 items, and takes around 18 runs of my PHP to populate my Neo4j Database with the folder/file structure of the dropbox account.
With every run of this PHP, around 50mb of items are added to the Neo4j JVM Old Gen heap, 30mb of that is not removed by GC.
The end result is obviously the VM runs out of memory and gets stuck in a constant state of GC throttling.
I have tried a range of Neo4j VM settings, as well as an update from Neo4j v2.2.5 to v2.3, which actually has appeared to make the problem worse.
My current settings are as follows,
-server
-Xms4096m
-Xmx4096m
-XX:NewSize=3072m
-XX:MaxNewSize=3072m
-XX:SurvivorRatio=1
I am testing on a windows 10 PC with 8GB of ram and an i5 2.5GHz quad core. Java 1.8.0_60
Any info on how to solve this issue would be greatly appreciated.
Cheers, Jack.
Reduce the new size to 1024m
change your settings to:
-server
-Xms4096m
-Xmx4096m
-XX:NewSize=1024m
It is most likely that the size of your tx grows too large.
I recommend sending each of the parents in separately, so instead of the UNWIND sent one statement each.
Make sure to use the new transactional http endpoint, I recommend to go wit neoclient instead of Neo4jPHP
You should also use parameters instead of literal values!!!
And don't repeeat user-id and type etc. properties on every item.
Are you sure you want to connect everything to c not just the root of the directory structure? I would do the latter.
MERGE (c:Cloud:Dropbox { id_user:{userId}})
MERGE (p:Item:Dropbox {id:{parentPath}})
// owning the parent should be good enough
MERGE (c)-[:Owns]->(p)
WITH c
UNWIND {items} as item
MERGE (i:Item:Dropbox { id:item.path})
ON Create SET i += { is_dir:item.is_dir, name:item.name, created_at:item.created_at }
SET i += { description:item.description, size:item.size, modified:items.modified, processed:timestamp()}
MERGE (p)-[:Contains]->(i);
Make sure to use 2.3.0 for best MERGE performance for relationships.
I'm trying to use the Java API for Neo4j but I seem to be stuck at IndexHits. If I query the DB with Cypher using
START n=node:types(type="Process") RETURN n;
I get all 2087 nodes of type "Process".
In my application I have the following lines
Index<Node> nodeIndex = db.index().forNodes("types");
IndexHits<Node> hits = nodeIndex.get("type", "Process");
System.out.println("Node index size: " + hits.size());
which leads my console to spit out a value of 0. Here, db is of course an instance of GraphDatabaseService.
I expected an object that included all 2087 nodes. What am I doing wrong?
The .size() question is just the prelude to my iterator
for(Node process : hits) { ... }
but that does not much when hits.size() == 0. According to http://api.neo4j.org/1.9.2/org/neo4j/graphdb/index/IndexHits.html this should be possible, provided there is something in hits.
Thanks in advance for your help.
I figured it out. Man, I feel so embarrassed...
It so happens that I had set up the DB_PATH to my default data folder, whereas the default storage folder is the default data folder plus graph.db. When I tried to run the code from that corrected DB_PATH I got an error saying that a lock file was in place because the Neo4j server was running. After shutting it down it worked perfectly.
So, if you happen to see the following error, just stop the server and run the code again:
Caused by: org.neo4j.kernel.StoreLockException: Could not create lock file
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:40)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:491)
I found on several forums that you cannot run the Neo4j server and use the Java API to query it at the same time.