Can we use Stardog to query .ttl files?

I have a question: I have a .ttl file stored somewhere on the internet (let's say http://www.example.org/myFile) and I want to query it.
Can I use Stardog to query it? Something like this (in Node.js):
const stardog = new Stardog({
  endpoint: 'http://www.example.org'
});
and then run a SPARQL query against it?
I'm asking because I think the .ttl file needs to be stored in a Stardog instance first (which would mean http://www.example.org has to be a Stardog instance!).
Thanks,
Clément

It is true that you cannot query a remote Turtle file directly; you need to load it into a Stardog database first. See the Known Issues section in the Stardog documentation:
Queries with FROM NAMED with a named graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query.
If you have data stored in another SPARQL endpoint, you can query it using SPARQL's federated query functionality (the SERVICE keyword) without loading the data into Stardog.
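As a rough sketch of the loading approach (assuming the stardog.js client from npm, a Stardog server at http://localhost:5820 with admin/admin credentials, and an existing database named myDb; none of these names come from the question), you could ask Stardog to pull in the remote Turtle file with a SPARQL Update LOAD and then query it:

// Sketch only: uses the stardog.js package ('stardog' on npm).
// Server URL, credentials and database name 'myDb' are assumptions.
import { Connection, query } from 'stardog';

const conn = new Connection({
  username: 'admin',
  password: 'admin',
  endpoint: 'http://localhost:5820',
});

async function loadAndQuery(): Promise<void> {
  // SPARQL Update: the Stardog server fetches the file itself,
  // so the URL must be reachable from the server.
  await query.execute(conn, 'myDb', 'LOAD <http://www.example.org/myFile>');

  // Once loaded, the triples can be queried like any other data.
  const results = await query.execute(
    conn,
    'myDb',
    'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'
  );
  console.log(results.body);
}

loadAndQuery().catch(console.error);

If the remote data were exposed through an actual SPARQL endpoint rather than a plain .ttl file, the SERVICE keyword mentioned above would let you federate to it from a Stardog query without loading anything.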

Related

Azure Data Factory: read URL values from CSV and copy to SQL database

I am quite new to ADF, so I am asking for suggestions.
The use case:
I have a CSV file which contains a unique id and URLs (see image below). I would like to use this file to export the values from the various URLs. In the second image you can see an example of the data from one of the URLs.
In the current situation I take each URL and insert it manually as the source of an ADF Copy activity to export the data to a SQL DB. This is a very time-consuming method.
How can I create an ADF pipeline that uses the CSV file as a source, so that a Copy activity takes the URL from each row and copies the data to an Azure SQL DB? Do I need to add a Get Metadata activity, for example? If so, how?
Many thanks.
Use a Lookup activity that reads all the data, then use a ForEach loop which reads it line by line. Inside the ForEach, use a Copy activity where you can copy the response to the sink.
In order to copy the XML response of a URL, we can use an HTTP linked service with an XML dataset. As @BeingReal said, a Lookup activity should be used to refer to the table which contains all the URLs, and inside a ForEach activity the Copy activity is given HTTP as the source and whatever sink the requirement calls for. I tried to reproduce the same in my environment. Below are the steps.
A lookup table with 3 URLs is taken, as in the image below.
A ForEach activity is added in sequence after the Lookup activity.
Inside the ForEach, a Copy activity is added. The source is given as the HTTP linked service.
In the HTTP linked service, the base URL is given as @item().name. Here name is the column that stores the URLs in the lookup table; replace it with the column name that you used in your lookup table.
In the sink, an Azure SQL database is given (use any sink your requirement calls for). The data is copied to the SQL database.
This is the HTTP dataset inside the Copy activity.
This is the input of the Copy activity inside the ForEach.
This is the output of the Copy activity.
My sink is an Azure SQL database without any tables yet. I would like ADF to auto-create the table on the fly. I don't understand why this error came up.

Allow exporting queries via persistConfig while retaining operation text rather than id

I'm attempting to "export" all queries in our Relay codebase from relay-compiler by using the following relay.config.json:
{
  "persistConfig": {
    "file": "queryMap.json"
  }
}
However, relay-compiler only performs this step in combination with rewriting all queries in __generated__/ used by the app from "text" to "id", expecting the app to send the query identifier as a doc_id parameter with requests rather than the full query as a query parameter (see the Relay docs).
I only want to export the query map, but still continue using the query "text" in the app. That's both for developer ergonomics (it's easier to reason about queries you can see in the network panel) and, most importantly, because our server (Hasura) doesn't support persisted queries. The goal is instead to import the query map into Hasura as an allow-list for security purposes.
I'm not too fluent in Rust, but looking through the source code it sounds like that would be a new feature request for the relay-compiler?
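For what it's worth, the Hasura side of that goal can be scripted. Below is a rough sketch, assuming queryMap.json is a flat JSON object mapping persisted-query ids to operation text, and assuming Hasura's metadata API commands create_query_collection and add_collection_to_allowlist; the endpoint URL, collection name and admin-secret handling are placeholders, so check the docs for your Hasura version:

// Sketch only: pushes queryMap.json into a Hasura query collection and
// allow-lists that collection. Endpoint, collection name and secret are
// assumptions, not taken from the question. Requires Node 18+ for fetch.
import { readFileSync } from 'fs';

const HASURA_METADATA_URL = 'http://localhost:8080/v1/metadata'; // hypothetical
const ADMIN_SECRET = process.env.HASURA_ADMIN_SECRET ?? '';

async function importAllowList(): Promise<void> {
  // Assumed shape: { "<persisted query id>": "<operation text>", ... }
  const queryMap: Record<string, string> = JSON.parse(
    readFileSync('queryMap.json', 'utf8')
  );

  const payload = {
    type: 'bulk',
    args: [
      {
        type: 'create_query_collection',
        args: {
          name: 'relay_queries',
          definition: {
            queries: Object.entries(queryMap).map(([id, text]) => ({
              name: id,
              query: text,
            })),
          },
        },
      },
      {
        type: 'add_collection_to_allowlist',
        args: { collection: 'relay_queries' },
      },
    ],
  };

  const res = await fetch(HASURA_METADATA_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-hasura-admin-secret': ADMIN_SECRET,
    },
    body: JSON.stringify(payload),
  });
  console.log(res.status, await res.text());
}

importAllowList().catch(console.error);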

Implementation of object cache in Neo4j 3.2.3

I have read this post, "Understanding of Neo4j object cache", but I can't find 'NodeImpl' in the source code of Neo4j 3.2.3 anymore.
I tried some code to track down the implementation in Neo4j, but failed to find access to any cache other than the page cache. I get a property of the same node twice, expecting to hit a cache on the second read.
// Read the same property twice, expecting the second read to hit a cache.
try (Transaction tx = db.beginTx()) {
    Node n = db.getNodeById(0);
    n.getProperty("name");                        // first read
    String name = (String) n.getProperty("name"); // second read
    System.out.println("name: " + name);
    tx.success();
}
There are a lot of 'InstanceCache's inside 'StoreStatement', but as the comment implies, an instance cache is used for single objects, not for the connections between nodes and relationships described in 'An overview of Neo4j Internals'.
My questions are:
What is the implementation of the object cache inside Neo4j 3.2.3?
Is there anything newer about the internals of Neo4j? The slides I found were published 6 years ago.
The object cache doesn't exist anymore in Neo4j (since version 3.0, from what I remember); there is only the page cache.
The slides from Tobias that explain the graph storage are still correct.

Neo4jClient: Does it support timeout parameter for executing queries?

I am using Neo4j 1.9.4 Community and will soon upgrade to 2.x. I know that Neo4j can be configured with a timeout in 2 ways, as described here.
But I want to know if the second approach, i.e. including max-execution-timeout in the REST call header, is supported by any Neo4jClient method, so that I can pass that value as a parameter before returning my results. Also, will it be supported by future Neo4j releases?
The goal is to define a timeout per query rather than a global setting.
Something like:
GraphClient
    .Root
    .StartCypher("root")
    .Match("root-[:HAS]->childs")
    .Return("childs")
    .Results(5000) // passing timeout value in ms
At present, no, Neo4jClient doesn't have this capability.
Will it be supported? That depends on a few things, but as it's open source (https://github.com/Readify/Neo4jClient) you can always add it yourself and get it into the next release.
PS. For use against a 2.x DB, you will want to look at the .Cypher property, so your query will look more like:
GraphClient.Cypher
    .Match("root-[:HAS]->child")
    .Return(child => child.As<ChildObj>())
    .Results();

Caching Index with Neo4jClient

My Neo4j index has over 1.4M entries, and my queries are running very slowly. I have cached most of the database. However, I have now found that a lot of disk reads of the Lucene index are taking place.
Per this article, the following code will help with caching the index:
Index<Node> index = graphDb.index().forNodes( "actors" );
((LuceneIndex<Node>) index).setCacheCapacity( "name", 300000 );
Is there any way I can do this via Neo4jClient? I have got as far as:
var indexes = _graphClient.GetIndexes(IndexFor.Node);
var index = indexes.ElementAt(0);
But that does not give me an option to set the cache capacity. Any thoughts on how I can set the cache parameters via Neo4jClient, or otherwise reduce the index lookup time? TIA.
Neo4jClient works via the REST API. The behaviour you are describing is from the native Java API and is not exposed via the REST API. There is no way to do this via Neo4jClient, or any other REST-based driver. You may be able to do it via config instead.
