Mirror/setup of a Python rdflib graph from a remote SPARQL endpoint - rdflib

How do I set up a Python rdflib graph from a remote SPARQL endpoint?
I have run a SELECT ?s ?p ?o WHERE { ?s ?p ?o } SPARQL query against the remote endpoint to get all triples, and I have downloaded the resulting XML file. I was then expecting there would be a function somewhere in rdflib to build the local rdflib graph in a Python one-liner.
The closest I have is:
from rdflib.plugins.sparql.results.xmlresults import XMLResultParser
result = XMLResultParser().parse('dump.xml')
This is not the original graph but rather the SPARQL result.
I suppose I would then have to iterate over the result.bindings variable, adding the data with the Graph.add method, like:
import rdflib

g = rdflib.Graph()
for binding in result.bindings:
    g.add((binding[rdflib.term.Variable('s')],
           binding[rdflib.term.Variable('p')],
           binding[rdflib.term.Variable('o')]))
It seems to be possible to query the original graph via g.query.
So is this it, or is there a more direct way of mirroring the remote triple store?
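For what it's worth, a more direct route is to skip SELECT entirely: a CONSTRUCT query returns an RDF document rather than a result table, and Graph.parse can load that in one step. A minimal sketch, assuming a hypothetical endpoint URL and an endpoint that can return RDF/XML:

import urllib.parse
import urllib.request

import rdflib

endpoint = 'http://example.org/sparql'  # hypothetical endpoint
query = 'CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }'
url = endpoint + '?' + urllib.parse.urlencode({'query': query})
req = urllib.request.Request(url, headers={'Accept': 'application/rdf+xml'})

g = rdflib.Graph()
# the CONSTRUCT response is plain RDF/XML, so it parses directly into a graph
g.parse(data=urllib.request.urlopen(req).read().decode('utf-8'), format='xml')
print(len(g))  # number of mirrored triples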

Related

Importing bulk json data into neo4j

I am trying to load a JSON file of size about 700k, but it is giving me a heap out-of-memory error.
My query is as below:
WITH "file:///Users//arundhathi.d//Documents//Neo4j//default.graphdb//import//tjson.json" as url
call apoc.load.json(url) yield value as article return article
As with CSV I tried to use USING PERIODIC COMMIT 1000 with the JSON, but that is not allowed when loading JSON.
How do I load bulk JSON data?
You can also convert the JSON into CSV files using jq, an uber-fast JSON processor: https://stedolan.github.io/jq/tutorial/
This is the recommended way according to: https://neo4j.com/blog/bulk-data-import-neo4j-3-0/
If you have many files, write a Python program or similar that iterates through the files calling:
os.system("cat file{}.json | jq -r '[.entity1, .entity2, .entity3] | @csv' >> concatenatedCSV.csv".format(num))
or in Go:
exec.Command("sh", "-c", "cat file"+num+".json | jq -r '[.entity1, .entity2, .entity3] | @csv' >> concatenatedCSV.csv")
(Note that jq's CSV formatter is @csv and needs an array as input, and that the Go version has to go through a shell for the pipe and redirection to work.)
I recently did this for about 700GB of JSON files. It takes some thought to get the CSV files in the right format, but if you follow the tutorial on jq you'll pick up how to do it. Additionally, check out how the headers need to look here: https://neo4j.com/docs/operations-manual/current/tools/import/
It took about a day to convert it all, but given the transaction overhead of using apoc, and the ability to re-import at any time once the files are in the right format, it is worth it in the long run.
apoc.load.json now supports a json-path as a second parameter.
To get the first 1000 JSON objects from the array in the file, try this:
WITH "file:///path_to_file.json" as url
CALL apoc.load.json(url, '[0:1000]') YIELD value AS article
RETURN article;
The [0:1000] syntax specifies a range of array indices, and the second number is exclusive (so, in this example, the last index in the range is 999).
The above should at least work in neo4j 3.1.3 (with apoc release 3.1.3.6). Note also that the Desktop versions of neo4j (installed via the Windows and OSX installers) have a new requirement concerning where to put plugins like apoc in order to import local files.
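If you need to work through the whole array rather than just the first 1000 objects, you can drive the same json-path from a small script, advancing the slice batch by batch. A minimal sketch in Python with the official neo4j driver (the connection details and the Article label are placeholders, and it assumes every array element is a flat JSON object):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
BATCH = 1000

with driver.session() as session:
    start = 0
    while True:
        # slice the array with apoc's json-path: '[0:1000]', '[1000:2000]', ...
        path = "[{}:{}]".format(start, start + BATCH)
        n = session.run(
            "CALL apoc.load.json($url, $path) YIELD value "
            "CREATE (a:Article) SET a += value "
            "RETURN count(*) AS n",
            url="file:///path_to_file.json", path=path).single()["n"]
        if n == 0:  # the slice is past the end of the array: done
            break
        start += BATCH

Each iteration runs as its own transaction, which keeps heap usage bounded in roughly the way USING PERIODIC COMMIT does for CSV.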

How to set "resultDataContents" in Neo4jrb?

I want to visualise data from Neo4j with the front-end library D3.js in a Rails application, using Neo4jrb. For example, I could use the following query to get my graph data.
query = "MATCH path = (a)-[b]->(c) RETURN path"
result = Neo4j::Session.current.query(query)
But this query is not giving me the exact data I want.
According to the Neo4j data visualisation guide there is a possibility to set the parameter resultDataContents to "graph" (see the Neo4j documentation for "resultDataContents").
This is exactly what I need for my application. Is there any possibility to set this parameter in Neo4jrb, or another idea how to achieve such a result?
Unfortunately not currently. The neo4j-core gem (which the neo4j gem uses) was built to abstract away the REST format. The "graph" format returns data in a different way.
You have a couple of options. You could make the JSON queries yourself or you could retrieve the nodes and relationships from the queries that you perform and then build your own nodes/relationships structure which is returned. This might be more future-proof anyway if you ever want to switch to Bolt.
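If you go the raw-JSON route, the request itself is small. Here is a sketch of the POST against the transactional HTTP endpoint, shown in Python with requests just to keep it short (a Ruby version with Net::HTTP or Faraday is analogous; the host and query are placeholders):

import json
import requests

payload = {
    "statements": [{
        "statement": "MATCH path = (a)-[b]->(c) RETURN path",
        "resultDataContents": ["graph"]  # ask for the graph format D3 visualisations want
    }]
}
r = requests.post("http://localhost:7474/db/data/transaction/commit",
                  json=payload)
print(json.dumps(r.json(), indent=2))

The "graph" entries in the response contain nodes and relationships arrays, which is close to the structure D3 force layouts expect.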
A way that you might do this in your case:
query = "MATCH path = (a)-[b]->(c) RETURN nodes(path) AS nodes, rels(path) AS rels"
result = Neo4j::Session.current.query(query)
response = {nodes: [], rels: []}
result.each do |row|
  response[:nodes].concat(row.nodes)
  response[:rels].concat(row.rels)
end
response[:nodes].uniq!
response[:rels].uniq!

py2neo return number of nodes and relationships created

I need to create a Python function that adds nodes and relationships to a graph and returns the number of created nodes and relationships.
I have added the nodes and relationships using graph.cypher.execute().
arr_len = len(dic_st[story_id]['PER'])
for j in dic_st[story_id]['PER']:
    # create the PER nodes of the story
    graph.cypher.execute("MERGE (n:PER {name:{name}})", name=j[0].upper())
    print j[0]
for j in range(0, arr_len):
    for k in range(j+1, arr_len):
        # link the PER nodes pairwise within the story
        graph.cypher.execute("MATCH (p1:PER {name:{name1}}), (p2:PER {name:{name2}}) WHERE upper(p1.name)<>upper(p2.name) CREATE UNIQUE (p1)-[r:in_same_doc {st_id:{st_id}}]-(p2)", name1=dic_st[story_id]['PER'][j][0].upper(), name2=dic_st[story_id]['PER'][k][0].upper(), st_id=story_id)
What I need is to return the number of new nodes and relationships created.
From the neo4j documentation I gather there is something called ON CREATE and ON MATCH for MERGE in Cypher, but that has not been very useful.
The neo4j browser interface does actually show the number of nodes and relationships updated. This is what I need to return, but I cannot quite work out how to access it.
Any help please.
In case you need the exact counts of properties either created or updated, you have to use MATCH with CREATE, or MATCH with SET, and then count the size of the results. MERGE may not tell you which ones were updated and which ones were created.
When you post your query against the Cypher endpoint of the neo4j REST API without using py2neo, you can include the argument "includeStats": true in your POST request to get the node/relationship statistics. See this question for an example.
As far as I can tell, py2neo currently does not support additional parameters for the Cypher query (even though it is using the same API endpoints under the hood).
In Python, you could do something like this (using the requests and json packages):
import requests
import json

payload = {
    "statements": [{
        "statement": "CREATE (t:Test) RETURN t",
        "includeStats": True
    }]
}

r = requests.post('http://your_server_host:7474/db/data/transaction/commit',
                  headers={'Content-Type': 'application/json'},
                  data=json.dumps(payload))
print(r.text)
The response will include statistics about the number of nodes created etc.
{
  "stats": {
    "contains_updates": true,
    "nodes_created": 1,
    "nodes_deleted": 0,
    "properties_set": 1,
    "relationships_created": 0,
    "relationship_deleted": 0,
    "labels_added": 1,
    "labels_removed": 0,
    "indexes_added": 0,
    "indexes_removed": 0,
    "constraints_added": 0,
    "constraints_removed": 0
  }
}
After executing your query using x = session.run(...) you can use x.summary.counters to get the statistics noted in Martin Perusse's answer. See the documentation here.
In older versions the counters are available as a "private" field under x._summary.counters.
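For example, with the official neo4j Python driver the counters can be read off the result summary; a minimal sketch (connection details are placeholders):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    result = session.run("MERGE (n:PER {name: $name})", name="ALICE")
    counters = result.consume().counters  # a SummaryCounters object
    print(counters.nodes_created, counters.relationships_created)

The counters report only what the statement actually changed, so running the same MERGE twice prints 1 0 the first time and 0 0 the second.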

how to use Union/or in sparql path with arbitrary length?

I'm using the query below to find all properties with a domain of City (or superclasses of City) or a range of Country (or superclasses of Country) from the DBpedia ontology. When I use a path with a fixed length there is no problem, but when I put * to define paths of arbitrary length, I get this error:
Virtuoso 37000 Error SP031: SPARQL compiler: Variable
'_::trans_subj_6_4' is used in subexpressions of the query but not
assigned
my SPARQL:
define sql:signal-void-variables 1
define input:default-graph-uri <http://dbpedia.org>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
select ?property where {
  { ?property rdfs:domain/^rdfs:subClassOf* dbo:City }
  UNION
  { ?property rdfs:range/^rdfs:subClassOf* dbo:Country }
}
Also when I put any number instead of *, I get the same error. I'm using Virtuoso as the DBpedia SPARQL endpoint.
Use VALUES instead of UNION (when you can)
The error Virtuoso is giving you is more about its implementation of property paths and union than about the actual SPARQL query. The SPARQL part of the query looks correct. (I can't speak to the Virtuoso-specific defines.)
In many places that required union in the original SPARQL standard, you can now use values to specify the particular values that variables can have. It typically leads to more readable queries (at least, in my opinion), and some endpoints, such as Virtuoso, seem to handle it better.
Using values (and using the dbpedia-owl prefix that the web interface to the endpoint uses), your query becomes the following, and Virtuoso returns what you're looking for:
select ?property where {
  values (?p ?v) { (rdfs:domain dbpedia-owl:City)
                   (rdfs:range dbpedia-owl:Country) }
  ?property ?p ?class .
  ?class ^rdfs:subClassOf* ?v .
}
Other Notes
Also when I put any number instead of *, I get same error. I'm using
Virtuoso as DBPedia SPARQL endpoint.
While Virtuoso accepts the {n,m} notation for lengths of property paths, be aware that although that notation appeared in some drafts of property paths, it didn't actually make it into the SPARQL 1.1 standard. Virtuoso still accepts it, but if you use it, you might not be able to use your query with other endpoints. (A portable workaround is to expand short bounded paths by hand; e.g., p{1,2} can be written as p/p?.)

NEO4J execute severals statement

How is it possible to run a collection of queries like this (pasted from a spreadsheet) directly as one Cypher query? One by one is OK, but that needs 100 copy/pastes.
*******************************
MATCH (c:`alpha`)
where c.name = "a-01"
SET c.CP_PRI=1, c.TO_PRI=1, c.TA_PRI=2
return c ;
MATCH (c:`beta`)
where c.name = "a-02"
SET c.CP_PRI=1, c.TO_PRI=1, c.TA_PRI=0
return c ;
and 100 other lines ...
*********************************
You may try the UNION clause, which joins the results of the queries into one big-honkin' result set:
http://docs.neo4j.org/chunked/milestone/query-union.html
That said, the root behavior of what you are trying to do could use some details; maybe there's a better way to write the query. You could use Excel to 'build' the unified query via calculations/macros, or you could possibly write a unified query that combines the rules you are trying to follow. There are a lot of options, but it's hard to know a starting direction without context.
Talking about the REST API, you can use the transactional endpoint in Neo4j 2.0 or the batch endpoint in Neo4j 1.x.
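For illustration, here is a minimal sketch of pushing all 100 updates through the transactional endpoint in one POST, written in Python with requests (the host and the rows list are placeholders; the spreadsheet rows would be read in however is convenient). Each row becomes one parameterized statement; the label has to be spliced into the query text because Cypher cannot parameterize labels:

import requests

# hypothetical rows built from the spreadsheet: (label, name, CP_PRI, TO_PRI, TA_PRI)
rows = [("alpha", "a-01", 1, 1, 2),
        ("beta", "a-02", 1, 1, 0)]  # ... and the 100 other lines

statements = [{
    # labels cannot be parameterized, hence the string splice; values go via parameters
    "statement": "MATCH (c:`%s`) WHERE c.name = {name} SET c.CP_PRI = {cp}, c.TO_PRI = {to}, c.TA_PRI = {ta} RETURN c" % label,
    "parameters": {"name": name, "cp": cp, "to": to, "ta": ta},
} for (label, name, cp, to, ta) in rows]

r = requests.post("http://localhost:7474/db/data/transaction/commit",
                  json={"statements": statements})
print(r.json())

All statements run in a single transaction, so the 100 updates either all apply or none do.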
If you want to use the shell, have a look at the import page, in particular the neo4j-shell-tools, where they import massive quantities of data by batching multiple queries.
