neo4j-graphql-schema: Cypher query to GraphQL schema and query

I tested the following query in the Neo4j browser and it worked. Now I am trying to define a schema and query for a React app using the neo4j-graphql-js plugin/driver (import { neo4jgraphql } from "neo4j-graphql-js";). My issue is that I do not know how to define a schema that takes two parameters, for example: action(startTime: {epoch}, endTime: {epoch}). The action node in the DB has the following properties: action, timelist and timestamp.
MATCH (sec:Second)<-[:AT_TIME]-(act:action)-[:TARGET]->(obj:object)
WHERE act.timestamp >= 1499350389000 AND act.timestamp <= 1499350389000
RETURN sec, act, obj
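With neo4j-graphql-js, one way to express this is to attach the Cypher to a field with a @cypher directive: the field's arguments (startTime, endTime) are passed into the statement as Cypher parameters. A minimal sketch, assuming the label and property names from the query above (the Action type and actionsBetween field names are illustrative, not from any existing schema):

import { makeAugmentedSchema } from "neo4j-graphql-js";

// Sketch only: field arguments become Cypher parameters ($startTime, $endTime).
const typeDefs = `
  type Action {
    action: String
    timelist: [Float]
    timestamp: Float
  }

  type Query {
    actionsBetween(startTime: Float!, endTime: Float!): [Action]
      @cypher(statement: """
        MATCH (act:action)
        WHERE act.timestamp >= $startTime AND act.timestamp <= $endTime
        RETURN act
      """)
  }
`;

const schema = makeAugmentedSchema({ typeDefs });

A client can then query actionsBetween(startTime: 1499350389000, endTime: 1499350389000) { action timestamp }. Float is used for the arguments because epoch-millisecond values overflow GraphQL's 32-bit Int.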

Related

JS Neo4jError: Cannot run query in this transaction, because it has been rolled back either because of an error or explicit termination

I fire a few hundred of the queries below concurrently (I tried running them sequentially as well) from the JS neo4j-driver 4.4.1. A few of the queries sometimes throw the following error in Node.js, but when my retry logic retries after some time, it works.
Query
MERGE (n0:Movie {movie_id: $movie_id})
WITH n0
CALL apoc.lock.nodes([n0])
CALL {
  WITH n0
  WITH n0 WHERE n0.updated_at IS NULL OR n0.updated_at < datetime($updated_at)
  MERGE (n:Movie {movie_id: $movie_id})
  ON CREATE SET n.movie_id = $movie_id
  SET n.name = $name
  SET n.downloads = $downloads
  SET n.updated_at = datetime($updated_at)
  RETURN count(*) AS cnt
}
RETURN n0, cnt
I run this query in separate transactions like below.
const session = driver.session();
await session.writeTransaction(async tx => {
  return await tx.run(QUERY, {args});
});
Log
Neo4jError: Cannot run query in this transaction, because it has been rolled back either because of an error or explicit termination.
I couldn't find any trace related to that query in the neo4j logs.
Any help with this?
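Not a confirmed diagnosis, but two things are worth checking. A session can only run one transaction at a time, so each of the concurrent queries needs its own short-lived session (closed afterwards, so its connection returns to the pool), and writeTransaction already retries transient failures such as deadlocks from the concurrent apoc.lock.nodes calls for up to maxTransactionRetryTime (30 s by default). A sketch, reusing QUERY from above:

// Sketch: one session per concurrent query; writeTransaction retries
// transient errors (deadlocks, lock timeouts) automatically.
async function runUpsert(driver, args) {
  const session = driver.session();
  try {
    return await session.writeTransaction(tx => tx.run(QUERY, args));
  } finally {
    await session.close();
  }
}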

Using DASK to read files and write to NEO4J in PYTHON

I am having trouble parallelizing code that reads some files and writes to neo4j.
I am using dask to parallelize the process_language_files function (3rd cell from the bottom).
I explain the code below, listing the functions (first 3 cells).
The errors are printed at the end (Last 2 cells).
I am also listing environments and package versions at the end.
If I remove dask.delayed and run this code sequentially, it works perfectly well.
Thank you for your help. :)
==========================================================================
Some functions to work with neo4j.
from neo4j import GraphDatabase
from tqdm import tqdm

def get_driver(uri_scheme='bolt', host='localhost', port='7687', username='neo4j', password=''):
    """Get a neo4j driver."""
    connection_uri = "{uri_scheme}://{host}:{port}".format(uri_scheme=uri_scheme, host=host, port=port)
    auth = (username, password)
    driver = GraphDatabase.driver(connection_uri, auth=auth)
    return driver

def format_raw_res(raw_res):
    """Parse neo4j results."""
    res = []
    for r in raw_res:
        res.append(r)
    return res

def run_bulk_query(query_list, driver):
    """Run a list of neo4j queries in a session."""
    results = []
    with driver.session() as session:
        for query in tqdm(query_list):
            raw_res = session.run(query)
            res = format_raw_res(raw_res)
            results.append({'query': query, 'result': res})
    return results

global_driver = get_driver(uri_scheme='bolt', host='localhost', port='8687', username='neo4j', password='abc123')  # neo4j driver object.
This is how we create a dask client to parallelize.
from dask.distributed import Client
client = Client(threads_per_worker=4, n_workers=1)
The functions that the main code is calling.
import sys
import time
import json
import pandas as pd
import dask

def add_nodes(nodes_list, language_code):
    """Returns a list of strings. Each string is a cypher query to add a node to neo4j."""
    list_of_create_strings = []
    create_string_template = """CREATE (:LABEL {{node_id:{node_id}}})"""
    for index, node in nodes_list.iterrows():
        create_string = create_string_template.format(node_id=node['new_id'])
        list_of_create_strings.append(create_string)
    return list_of_create_strings

def add_relations(relations_list, language_code):
    """Returns a list of strings. Each string is a cypher query to add a relationship to neo4j."""
    list_of_create_strings = []
    create_string_template = """
    MATCH (a),(b) WHERE a.node_id = {source} AND b.node_id = {target}
    MERGE (a)-[r:KNOWS {{ relationship_id:{edge_id} }}]-(b)"""
    for index, relations in relations_list.iterrows():
        create_string = create_string_template.format(
            source=relations['from'], target=relations['to'],
            edge_id=''+str(relations['from'])+'-'+str(relations['to']))
        list_of_create_strings.append(create_string)
    return list_of_create_strings

def add_data(language_code, edges, features, targets, driver):
    """Add nodes and relationships to neo4j."""
    add_nodes_cypher = add_nodes(targets, language_code)  # list of cypher queries, one per node
    node_results = run_bulk_query(add_nodes_cypher, driver)  # run each query in a neo4j session
    add_relations_cypher = add_relations(edges, language_code)  # list of cypher queries, one per relationship
    relations_results = run_bulk_query(add_relations_cypher, driver)  # run each query in a neo4j session
    # Saving some metadata
    results = {
        "nodes": {"results": node_results, "length": len(add_nodes_cypher)},
        "relations": {"results": relations_results, "length": len(add_relations_cypher)},
    }
    return results

def load_data(language_code):
    """Load data from files."""
    # Saving file names to variables
    edges_filename = './edges.csv'
    features_filename = './features.json'
    target_filename = './target.csv'
    # Loading data from the file names
    edges = helper.read_csv(edges_filename)
    features = helper.read_json(features_filename)
    targets = helper.read_csv(target_filename)
    # Saving some metadata
    results = {
        "edges": {"length": len(edges)},
        "features": {"length": len(features)},
        "targets": {"length": len(targets)},
    }
    return edges, features, targets, results
The main code.
def process_language_files(language_code, driver):
    """Reads files, creates cypher queries to add nodes and relationships, runs the cypher queries in a neo4j session."""
    edges, features, targets, reading_results = load_data(language_code)  # Read files.
    writing_results = add_data(language_code, edges, features, targets, driver)  # Convert files to nodes and relationships and add to neo4j in a neo4j session.
    return {"reading_results": reading_results, "writing_results": writing_results}  # Return some metadata.

# Execution starts here
res = []
for index, language_code in enumerate(['ENGLISH', 'FRENCH']):
    lazy_result = dask.delayed(process_language_files)(language_code, global_driver)
    res.append(lazy_result)
Result from res. These are dask delayed objects.
print(*res)
Delayed('process_language_files-a73f4a9d-6ffa-4295-8803-7fe09849c068') Delayed('process_language_files-c88fbd4f-e8c1-40c0-b143-eda41a209862')
The errors. Even if I use dask.compute(), I get similar errors.
futures = dask.persist(*res)
AttributeError Traceback (most recent call last)
~/Code/miniconda3/envs/MVDS/lib/python3.6/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
48 buffers.clear()
---> 49 result = pickle.dumps(x, **dump_kwargs)
50 if len(result) < 1000:
AttributeError: Can't pickle local object 'BoltPool.open.<locals>.opener'
==========================================================================
# Name                Version    Build         Channel
dask                  2020.12.0  pyhd8ed1ab_0  conda-forge
jupyterlab            3.0.3      pyhd8ed1ab_0  conda-forge
neo4j-python-driver   4.2.1      pyh7fcb38b_0  conda-forge
python                3.9.1      hdb3f193_2
You are getting this error because you are trying to share the driver object among your workers.
The driver object contains private data about the connection, data that make no sense outside the process (and are not serializable either).
It is like opening a file in one process and sharing the file descriptor with another process: it won't work, because the file descriptor only makes sense within the process that created it.
If you want your workers to access the database, or any other network resource, you should give them the directions to connect to the resource.
In your case, you should not pass global_driver as a parameter, but rather the connection parameters, and let each worker call get_driver to get its own driver, as in the sketch below.
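A minimal sketch of that refactor, reusing get_driver, load_data, and add_data from the question (the conn_params dict is illustrative):

# Sketch: ship picklable connection parameters to the workers, not the driver.
conn_params = dict(uri_scheme='bolt', host='localhost', port='8687',
                   username='neo4j', password='abc123')

def process_language_files(language_code, conn_params):
    """Each worker builds (and closes) its own driver inside the task."""
    driver = get_driver(**conn_params)  # created in the worker process
    try:
        edges, features, targets, reading_results = load_data(language_code)
        writing_results = add_data(language_code, edges, features, targets, driver)
        return {"reading_results": reading_results, "writing_results": writing_results}
    finally:
        driver.close()  # release this worker's connections

res = [dask.delayed(process_language_files)(code, conn_params)
       for code in ['ENGLISH', 'FRENCH']]
futures = dask.persist(*res)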

Unable to delete 15k+ nodes using neo4jclient

I am writing a Windows console application in C# that is supposed to import 15k nodes from an XML file and build a graph database in Neo4j (2.0.0) using the neo4jclient.
At the very beginning of the app, I try to remove all nodes and relationships from the database, so that the database is fresh and clean at every run of the application:
Console.WriteLine("Deleting nodes and relationships...");
graphClient.Cypher.OptionalMatch("n-[r]-()").Delete("r").ExecuteWithoutResults();
graphClient.Cypher.Match("n").Delete("n").ExecuteWithoutResults();
Console.WriteLine("...done!");
At the moment the database has about 16k nodes (with no relationships between them), which were created by a couple of previous runs of the application itself. When the second Delete statement above runs, this exception is thrown after 30 or so seconds:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
--- End of inner exception stack trace ---
at Neo4jClient.GraphClient.SendHttpRequest(HttpRequestMessage request, String commandDescription, HttpStatusCode[] expectedStatusCodes) in c:\TeamCity\buildAgent\work\5bae2aa9bce99f44\Neo4jClient\GraphClient.cs:line 138
at Neo4jClient.GraphClient.Neo4jClient.IRawGraphClient.ExecuteCypher(CypherQuery query) in c:\TeamCity\buildAgent\work\5bae2aa9bce99f44\Neo4jClient\GraphClient.cs:line 843
at Neo4jClient.Cypher.CypherFluentQuery.ExecuteWithoutResults() in c:\TeamCity\buildAgent\work\5bae2aa9bce99f44\Neo4jClient\Cypher\CypherFluentQuery.cs:line 322
at Xml2Cypher.Program.Main(String[] args) in c:\_PrivateProjects\KanjiDoc2Neo4J\Xml2Cypher\Program.cs:line 25
I also tried batching the delete statements using a Limit clause, but I get the same exception. Any ideas? I think the request is timing out, but even batching doesn't seem to solve the issue.
graphClient.Cypher.Match("n").With("n").Limit(100).Delete("n").ExecuteWithoutResults();
I tried running the following statement from the browser:
match (n:Characters) with n limit 100 delete n
but even there I get an "unknown error".
Just increase the HTTP timeout in neo4jclient; then it should work.
The error you get in the browser is misleading; it should say that the node still has relationships.
The query for deletes is:
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
DELETE n,r
If you have many rels to delete, you probably want to batch it. Unfortunately, the promising PERIODIC COMMIT was limited to LOAD CSV after 2.1-M01 :(
So you're back to batching yourself (delete a block of 5k nodes and their rels):
MATCH (n)
WITH n LIMIT 5000
OPTIONAL MATCH (n)-[r]->()
DELETE n, r
RETURN count(*)
Repeat until it returns 0.
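In neo4jclient, that loop might look like the following sketch; it reuses the fluent calls already shown in the question, and All.Count() (from Neo4jClient.Cypher) renders count(*). Remember to raise the HTTP timeout as mentioned above.

// Sketch: delete in batches of 5000 nodes (plus their relationships)
// until nothing is left. Single() requires System.Linq.
long remaining;
do
{
    remaining = graphClient.Cypher
        .Match("(n)")
        .With("n")
        .Limit(5000)
        .OptionalMatch("(n)-[r]->()")
        .Delete("n, r")
        .Return(() => All.Count())
        .Results
        .Single();
} while (remaining > 0);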
You could also stop your Windows service, delete the graph.db folder, and restart the service. I have used this to completely "refresh" the database, since Neo4j creates a new, empty database on startup. Something like this should work:
public static void Main(string[] args)
{
    RefreshDatabase();
}

private static void RefreshDatabase()
{
    ServiceController sc = new ServiceController("Neo4j Graph Database", "computername");
    if (sc.Status != ServiceControllerStatus.Stopped)
        sc.Stop();
    Console.WriteLine("Stopping Neo4j Graph Database service...");
    //sc.WaitForStatus(ServiceControllerStatus.Stopped);
    Console.WriteLine("Neo4j Graph Database service stopped.\n");
    Console.WriteLine("Deleting graph.db files and folders...\n");
    RecursiveDelete(@"C:\neo4j-community-2.0.1\data\graph.db");
    Console.WriteLine("Finished deleting graph.db folder contents.\n");
    Console.WriteLine("Starting Neo4j Graph Database service...\n");
    sc.Start();
    //sc.WaitForStatus(ServiceControllerStatus.Running);
    Console.WriteLine("Neo4j Graph Database running.\n");
}

private static void RecursiveDelete(string path)
{
    DirectoryInfo di = new DirectoryInfo(path);
    foreach (FileInfo file in di.GetFiles())
    {
        file.Delete();
    }
    foreach (DirectoryInfo directory in di.GetDirectories())
    {
        directory.Delete(true);
    }
}

Neo4J: Automatic Indexing on Batch Execution

Is it possible to import data into Neo4J using the automatic indexing feature? I'm trying to import data using BatchInserter and BatchInserterIndex, as in the following example:
BatchInserter inserter = BatchInserters.inserter("/home/fmagalhaes/Neo4JDatabase");
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(inserter);
BatchInserterIndex nodeIndex = indexProvider.nodeIndex("node_auto_index", MapUtil.stringMap("type","exact"));
BatchInserterIndex relIndex = indexProvider.relationshipIndex("relationship_auto_index", MapUtil.stringMap("type","exact"));
...
inserter.createNode(vertexId, properties);
nodeIndex.add(vertexId, properties);
...
The problem is that when batch processing is completed, I try to open this database with the Blueprints generic API by doing the following:
Graph g = new Neo4jGraph("/home/fmagalhaes/Neo4JDatabase");
Set<String> nodeIndices = ((KeyIndexableGraph)g).getIndexedKeys(Vertex.class);
Set<String> relIndices = ((KeyIndexableGraph)g).getIndexedKeys(Edge.class);
and both nodeIndices and relIndices are empty. The auto-indexing feature is disabled when I open the graph database through the Blueprints API. Is it possible to create an automatic index during batch processing such that the index will be visible (and will continue to index data automatically as properties are added to vertices and edges) when I open the database with the Blueprints API?
You have to cleanly shut down both the batch index and the batch inserter.
You probably don't want to index all properties, just the key ones that you use to look up nodes.
You have to enable auto-indexing in the neo4j config for the database you start afterwards, for the same properties that you indexed during batch insertion.
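Concretely, the end of the batch run might look like this sketch (the flush() calls are optional but make the flushing explicit):

// Sketch: persist the Lucene indexes, then shut everything down cleanly.
nodeIndex.flush();
relIndex.flush();
indexProvider.shutdown();  // shut the index provider down before the inserter
inserter.shutdown();

Then enable auto-indexing in neo4j.properties for the database you open afterwards; the property keys below are illustrative and must match the keys you indexed during the batch insert:
node_auto_indexing=true
node_keys_indexable=name,type
relationship_auto_indexing=true
relationship_keys_indexable=type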

Neo4j: When querying using an index hint, I got Unknown error

I use exactly the query that http://docs.neo4j.org/chunked/milestone/query-using.html describes.
My Neo4j kernel version is:
Neo4j - Graph Database Kernel 2.0.0-M03
I don't know why this happens.
It's OK for me to run:
CREATE (_1 { `name`:"Emil" })
CREATE (_2:`German` { `name`:"Stefan", `surname`:"Plantikow" })
CREATE (_3 { `age`:34, `name`:"Peter" })
CREATE (_4:`Swedish` { `age`:36, `awesome`:true, `name`:"Andres", `surname`:"Taylor" })
CREATE _1-[:`KNOWS`]->_3
CREATE _2-[:`KNOWS`]->_4
CREATE _4-[:`KNOWS`]->_3
But I get an "Unknown error" when using:
match n:Swedish using index n:Swedish(surname)
where n.surname = 'Taylor'
return n
If your query explicitly mandates the use of an index, you need to make sure that the index exists.
So, before querying, run:
CREATE INDEX ON :Swedish(surname)
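Note that index creation runs in the background; the schema command in neo4j-shell (or :schema in later 2.0 browsers) shows whether the index is still populating or already online. Once it is online, the hinted query above should run without the error.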
