Implementing Jena Dataset provider over MongoDB - jena

I have started to implement a set of classes that provide a direct interface to MongoDB for persistence, similar in spirit to the now-unmaintained SDB persistor implementation for RDBMS.
I am using the time-honored technique of creating the necessary concrete classes from the interfaces and putting a println in each method, thereby allowing me to trace the execution. I have gotten all the way to where the engine calls out to my cursor setup:
public ExtendedIterator<Triple> find(Node s, Node p, Node o) {
    System.out.println("+++ MongoGraph:extenditer:find(" + s + p + o + ")");
    // TBD Need to turn s,p,o into a match expression! Easy!
    MongoCursor cur = this.coll.find().iterator();
    ExtendedIterator<Triple> curs = new JenaMongoCursorIterator(cur);
    return curs;
}
Sadly, when I later call this:
while (rs.hasNext()) {
    QuerySolution soln = rs.nextSolution();
    System.out.println(soln);
}
It turns out rs.hasNext() is always false even though material is present in the MongoCursor (I can debug-print it in the find() method). Also, the trace print in the next() method of my concrete iterator JenaMongoCursorIterator (which extends NiceIterator, which I believe is OK) is never hit. In short, the basic setup seems good, but the engine never cranks the iterator returned by find().
Trying to use SDB as a guide is completely overwhelming for someone not intimately familiar with the software architecture. It's fully factored and filled with interfaces and factories, and although that is excellent, it is difficult to navigate.
Has anyone tried to create their own persistor implementation and if so, what are the basic steps to getting a "hello world" running? Hello World in this case is ANY implementation, non-optimized, that can call next() on something to produce a Triple.

TLDR: It is now working.
I was coding too eagerly, and JenaMongoCursorIterator contained a method named hasNexl, which of course did not override hasNext (with a t) in NiceIterator, whose default implementation returns false.
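For reference, here is a minimal sketch of the iterator with the override spelled correctly. It assumes the MongoDB Java driver's MongoCursor<Document>; tripleFromDocument is a hypothetical helper for this project, not a Jena or driver API:

import com.mongodb.client.MongoCursor;
import org.apache.jena.graph.Triple;
import org.apache.jena.util.iterator.NiceIterator;
import org.bson.Document;

public class JenaMongoCursorIterator extends NiceIterator<Triple> {
    private final MongoCursor<Document> cur;

    public JenaMongoCursorIterator(MongoCursor<Document> cur) {
        this.cur = cur;
    }

    @Override
    public boolean hasNext() { // must be spelled hasNext, or NiceIterator's default (false) wins
        return cur.hasNext();
    }

    @Override
    public Triple next() {
        return tripleFromDocument(cur.next()); // hypothetical decoder from a stored document to a Triple
    }

    private Triple tripleFromDocument(Document doc) {
        // Placeholder: rebuild the s/p/o Nodes from the stored fields, e.g. via NodeFactory.
        throw new UnsupportedOperationException("decode " + doc + " into a Triple here");
    }
}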
This is the sort of problem that Eclipse, visual debugging, and tracing make much easier to resolve than plain jdb. jdb is fine if you know the software architecture well, but if you don't, having multiple source files open and being able to mouse over variables provides a tremendous boost in the amount of context you can build up to home in on the problem.

Related

Saxon - s9api - setParameter as node and access in transformation

We are trying to add parameters to a transformation at runtime. The only way we have found so far is to set every single parameter individually, rather than as a node. We don't yet know how to create a node to pass to setParameter.
Current setParameter:
QName TEST XdmAtomicValue 24
Expected setParameter:
<TempNode> <local>Value1</local> </TempNode>
We have searched and tried to create an XdmNode and an XdmItem.
If you want to create an XdmNode by parsing XML, the best way to do it is:
DocumentBuilder db = processor.newDocumentBuilder();
XdmNode node = db.build(new StreamSource(
new StringReader("<doc><elem/></doc>")));
You could also pass a string containing lexical XML as the parameter value, and then convert it to a tree by calling the XPath parse-xml() function.
If you want to construct the XdmNode programmatically, there are a number of options:
DocumentBuilder.newBuildingStreamWriter() gives you an instance of BuildingStreamWriter, which extends XMLStreamWriter; you can create the document by writing events to it using methods such as writeStartElement, writeCharacters, and writeEndElement, and at the end call getDocumentNode() on the BuildingStreamWriter, which gives you an XdmNode. This has the advantage that XMLStreamWriter is a standard API, though not actually a very nice one, because the documentation isn't very good and as a result implementations vary in their behaviour.
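A minimal sketch of that route, with method names as in the s9api javadoc and exception handling reduced to a throws clause:

import net.sf.saxon.s9api.BuildingStreamWriter;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.XdmNode;

public class BuildWithStreamWriter {
    public static void main(String[] args) throws Exception {
        Processor processor = new Processor(false);
        DocumentBuilder db = processor.newDocumentBuilder();
        BuildingStreamWriter writer = db.newBuildingStreamWriter();

        // Write the events that make up <TempNode><local>Value1</local></TempNode>.
        writer.writeStartDocument();
        writer.writeStartElement("TempNode");
        writer.writeStartElement("local");
        writer.writeCharacters("Value1");
        writer.writeEndElement();
        writer.writeEndElement();
        writer.writeEndDocument();

        // The assembled tree, ready to pass to setParameter as an XdmNode.
        XdmNode node = writer.getDocumentNode();
        System.out.println(node);
    }
}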
Another event-based API is Saxon's Push class; this differs from most push-based event APIs in that rather than having a flat sequence of methods like:
builder.startElement("x");
builder.characters("abc");
builder.endElement();
you have a nested sequence:
Element x = Document.elem("x");
x.text("abc");
x.close();
As mentioned by Martin, there is the "sapling" API: Saplings.doc().withChild(elem(...).withChild(elem(...))) etc. This API is rather radically different from anything you might be familiar with (though it's influenced by the LINQ API for tree construction on .NET), but once you've got used to it, it reads very well. The Sapling API constructs a very lightweight tree in memory (hence the name), and converts it to a fully-fledged XDM tree with a final call of SaplingDocument.toXdmNode().
If you're familiar with DOM, JDOM2, or XOM, you can construct a tree using any of those libraries and then convert it for use by Saxon. That's a bit convoluted and only really intended for applications that are already using a third-party tree model heavily (or for users who love these APIs and prefer them to anything else).
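For example, a rough sketch of that route using the W3C DOM (the element names TempNode and local are just taken from the question; Saxon's DocumentBuilder.build accepts a DOMSource):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.XdmNode;

public class DomToXdm {
    public static void main(String[] args) throws Exception {
        // Build a small DOM tree with the standard JAXP API.
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        org.w3c.dom.Element root = dom.createElement("TempNode");
        org.w3c.dom.Element local = dom.createElement("local");
        local.setTextContent("Value1");
        root.appendChild(local);
        dom.appendChild(root);

        // Convert the DOM tree into an XdmNode for use with Saxon.
        Processor processor = new Processor(false);
        XdmNode node = processor.newDocumentBuilder().build(new DOMSource(dom));
        System.out.println(node);
    }
}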
In the Saxon Java s9api, you can construct temporary trees as SaplingNode/SaplingElement/SaplingDocument, see https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingDocument.html and https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingElement.html.
To give you a simple example constructing from a Map, as you seem to want to do:
Processor processor = new Processor();
Map<String, String> xsltParameters = new HashMap<>();
xsltParameters.put("foo", "value 1");
xsltParameters.put("bar", "value 2");
SaplingElement saplingElement = new SaplingElement("Test");
for (Map.Entry<String, String> param : xsltParameters.entrySet()) {
    saplingElement = saplingElement.withChild(new SaplingElement(param.getKey()).withText(param.getValue()));
}
XdmNode paramNode = saplingElement.toXdmNode(processor);
System.out.println(paramNode);
outputs e.g. <Test><bar>value 2</bar><foo>value 1</foo></Test>.
So the key is to understand that withChild() returns a new SaplingElement.
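A small sketch to make that point concrete (sapling elements are immutable, so discarding the return value loses the child):

SaplingElement e = Saplings.elem("Test");
e.withChild(Saplings.elem("foo"));     // returned element discarded: e is still childless
e = e.withChild(Saplings.elem("foo")); // keep the returned element to accumulate children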
The code can be compacted using streams e.g.
XdmNode paramNode2 = Saplings.elem("root")
    .withChild(xsltParameters
        .entrySet()
        .stream()
        .map(p -> Saplings.elem(p.getKey()).withText(p.getValue()))
        .collect(Collectors.toList())
        .toArray(SaplingElement[]::new))
    .toXdmNode(processor);
System.out.println(paramNode2);

Possible to create Graal native function callable from C without isolate?

I'd like to create a library, written in Java, callable from C, with simple method signatures:
int addThree(int in) {
    return in + 3;
}
I know it's possible to do this with GraalVM if you do a little dance and create an Isolate in your C program and pass it in as the first parameter in every function call. There is good sample code here.
The problem is that the system I'm writing for, Postgres, can load C libraries and call functions in them, but I would have to create a wrapper function in C that would wrap every function I wanted to expose. This really limits the value of being able to slap something together in Java and use it in Postgres directly. I'd have to do something like this:
int myPublicAddThreeFunction(int in) {
    graal_isolatethread_t *thread = NULL;
    if (graal_create_isolate(NULL, NULL, &thread) != 0) {
        fprintf(stderr, "error on isolate creation or attach\n");
        return 1;
    }
    return SomeClassName_addThree_big_random_string_here(thread, in);
}
Is there a way, in Java alone, to expose a simple C function? I'm thinking I could create the isolate in a static method that gets loaded once on startup, somehow set it as the current isolate, and have the Java method just use it. Haven't been able to figure it out, though.
Also, it would be real nice not to have to append a big random string to every function name.
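For what it's worth, the exported symbol name at least can be controlled with @CEntryPoint's name attribute, so the mangled default isn't forced on you. A sketch (class and method names are just for illustration; an IsolateThread parameter is still required on every entry point, which is the part the question is trying to avoid):

import org.graalvm.nativeimage.IsolateThread;
import org.graalvm.nativeimage.c.function.CEntryPoint;

public final class AddThree {
    // name = "addThree" makes native-image export the symbol as plain addThree.
    @CEntryPoint(name = "addThree")
    static int addThree(IsolateThread thread, int in) {
        return in + 3;
    }
}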

Lua C - decent LuaJIT-compatible way of throwing an error into a resuming coroutine

In short, while the code seems to work fine, I'm curious whether less hacky approaches than the ones I've come up with so far exist.
Suppose you create a coroutine via lua_newthread, and later suspend it from a C closure via lua_yield. Your program takes a lap around non-Lua code, and it is now time to resume the coroutine via lua_resume, but suppose the arguments that the Lua code provided are extremely illegal, and we should give it an error to indicate that.
As you might know, you can't call lua_error (or luaL_error) on a state unless it's currently running. So the state must resume but immediately get hit by an error.
In 5.3 you would use lua_yieldk, provide a continuation function, and call lua_error or luaL_error in there. Et voilà.
But, alas - LuaJIT does not implement lua_yieldk, so what options are we left to?
A single-use hook
Suppose the error message is stored in a char error_text[256]. We could then bind a per-instruction hook immediately before resuming,
lua_sethook(L, throw_error, LUA_MASKCOUNT, 1);
int result = lua_resume(L, NULL, ret_count);
and then unbind the hook and throw an error in there
void throw_error(lua_State *L, lua_Debug *ar) {
    lua_sethook(L, throw_error, 0, 0);
    if (error_text == nullptr) return; // trust no one, especially yourself
    luaL_error(L, "%s", error_text);
}
Should string cleanup be required, you would of course prefer to concatenate error_text to luaL_where(L, 1) yourself before calling _error, as both lua_error and luaL_error do a long jump and thus will be the last thing your function does.
A Lua-side wrapper
Suppose you decide to pull a somewhat node.js-like move here and have your C code resume with an error, result pair, so that you can have a wrapper function like so:
function some(arg)
    local e, r = some_native(arg)
    if (e) then
        error(e)
    else
        return r
    end
end
Or maybe refactor your API entirely so that errors are also handled using the same pattern, but that's a story for another day.
Option 1 seems the less hacky of the two.
Option 2 seems less likely to cause any trouble.
I can't help the feeling that there's a much better way of doing this that I'm overlooking (after all, lua_yieldk is a relatively recent addition).

Result vs raise in F# async?

It seems like there are two ways to return errors in an async workflow: raise and Result.
let willFailRaise = async {
    return raise <| new Exception("oh no!")
}
let willFailResult = async {
    return Result.Error "oh no!"
}
For the caller, the handling is a bit different:
async {
    try
        let! x = willFailRaise
        // ...
    with error ->
        System.Console.WriteLine(error)
}

async {
    let! maybeX = willFailResult
    match maybeX with
    | Result.Ok x ->
        // ...
    | Result.Error error ->
        System.Console.WriteLine(error)
}
My questions are:
What are the advantages / disadvantages of each approach?
Which approach is more idiomatic F#?
It depends on what kind of error we are talking about. Basically there are three kinds:
Domain errors (e.g. user provided invalid data, user with this email is already registered, etc.)
Infrastructure errors (e.g you can't connect to another microservice or DB)
Panics (e.g. NullReferenceException or StackOverflowException), which are caused by programmers' mistakes.
While both approaches can get the job done, usually your concern should be to make your code as self-documented and easy-to-read as possible. Which means the following:
Domain errors: definitely go for Result. Those "errors" are expected; they are part of your workflow. Using Result reflects your business rules in the function's signature, which is very useful.
Infrastructure failures: it depends. If you have microservices, then those failures are probably expected and it may be more convenient to use Result. If not, go for exceptions.
Panics: definitely Exception. First of all, you can't cover everything with Result; you will need a global exception filter either way. Second, if you try to cover all possible panics, the code becomes a nasty disaster extremely fast, which kills the whole point of using Result for domain errors.
So really this has nothing to do with Async or C# interop, it's about code readability and maintainability.
As for C# interop -- don't worry, Result has all the members to help, like IsError and so on. But you can always add an extension member:
[<AutoOpen>]
module Utils =
    type Result<'Ok, 'Error> with
        member this.Value =
            match this with
            | Ok v -> v
            | Error e -> Exception(e.ToString()) |> raise
This is one of the many aspects of F# programming that suffers from the mind-split at the core of the language and its community.
On one hand you have "F# the .NET Framework language" where exceptions are the mechanism for handling errors, on the other - "F# the functional programming language" that borrows its idioms from the Haskell side of the world. This is where Result (also known as Either) comes from.
The answer to the question "which one is idiomatic" will change depending on who you ask and what they have seen, but my experience has taught me that when in doubt, you're better off using exceptions. The Result type has its uses in moderation, but a result-heavy programming style easily gets out of hand, and once that happens it's not a pretty sight.
Raise
Advantages
Better .NET interop as throwing exceptions is fairly common in .NET
Can create custom Exceptions
Easier to get the stack trace as it's right there
You probably have to deal with exceptions from library code anyway in most standard async operations, such as reading from a webpage
Works with older versions of F#
Disadvantages
If you aren't aware that it can throw an exception, you might not know to catch it. This could result in runtime explosions.
Result
Advantages
Any caller of the async function will have to deal with the error, so runtime explosions should be avoided
Can use railway oriented programming style, which can make your code quite clean
Disadvantages
Only available in F# 4.1 or later
Difficult for non-F# languages to use
The API for Result is not comprehensive.
There are only the functions bind, map, and mapError
Some functions that would be nice to have:
bimap : ('TSuccess -> 'a) -> ('TError -> 'e) -> Result<'TSuccess,'TError> -> Result<'a, 'e>
fold : ('TSuccess -> 'T) -> ('TError -> 'T) -> Result<'TSuccess, 'TError> -> 'T
isOk

Caching streams in Functional Reactive Programming

I have an application which is written entirely using the FRP paradigm and I think I am having performance issues due to the way that I am creating the streams. It is written in Haxe but the problem is not language specific.
For example, I have this function, which returns a stream that resolves every time the config file is updated for the given section:
function getConfigSection(section:String) : Stream<Map<String, String>> {
    return configFileUpdated()
        .then(filterForSectionChanged(section))
        .then(readFile)
        .then(parseYaml);
}
In promhx, the reactive programming library I am using, each step of the chain should remember its last resolved value, but I think every time I call this function I recreate the stream and reprocess each step. This is a problem with the way I am using it rather than with the library.
Since this function is called everywhere, parsing the YAML every time it is needed is killing performance, taking up over 50% of the CPU time according to profiling.
As a fix, I have done something like the following, using a Map stored as an instance variable to cache the streams:
function getConfigSection(section:String) : Stream<Map<String, String>> {
    var cachedStream = this._streamCache.get(section);
    if (cachedStream != null) {
        return cachedStream;
    }
    var stream = configFileUpdated()
        .filter(sectionFilter(section))
        .then(readFile)
        .then(parseYaml);
    this._streamCache.set(section, stream);
    return stream;
}
This might be a good solution to the problem but it doesn't feel right to me. I am wondering if anyone can think of a cleaner solution that maybe uses a more functional approach (closures etc.) or even an extension I can add to the stream like a cache function.
Another way I could do it is to create the streams beforehand and store them in fields that consumers can access. I don't like this approach because I don't want to make a field for every config section; I like being able to call a function with a specific section and get a stream back.
I'd love any ideas that could give me a fresh perspective!
Well, I think one answer is to just abstract away the caching like so:
class Test {
    static function main() {
        var sideeffects = 0;
        var cached = memoize(function (x) return x + sideeffects++);
        cached(1);
        trace(sideeffects); // 1
        cached(1);
        trace(sideeffects); // 1
        cached(3);
        trace(sideeffects); // 2
        cached(3);
        trace(sideeffects); // 2
    }

    @:generic static function memoize<In, Out>(f:In->Out):In->Out {
        var m = new Map<In, Out>();
        return
            function (input:In)
                return switch m[input] {
                    case null: m[input] = f(input);
                    case output: output;
                }
    }
}
You may be able to find a more "functional" implementation for memoize down the road. But the important thing is that it is a separate thing now and you can use it at will.
You may choose to memoize(parseYaml) so that toggling two states in the file actually becomes very cheap after both have been parsed once. You can also tweak memoize to manage the cache size according to whatever strategy proves the most valuable.

Resources