How do I create my own rules in Jena Fuseki from string content?

I am trying to create my own property rules in Jena Fuseki. To do so I am using the GenericRuleReasoner, which allows me to define my own rules. When I load my rules from a file, everything works fine:
:model_inf a ja:InfModel ;
    ja:baseModel :tdbGraph ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
        ja:rulesFrom <file://...> ;
    ] .
However, I do not want to use a file; I want to add the rules directly as a string. I tried simply copying the content of the rule file that worked in the example above, for instance (a small slice of the file):
#-*-mode: conf-unix-*-
#prefix time: <http://www.w3.org/2006/time#>
#include <owlmicro>
-> table(owl:sameAs).
#---------------------------------------------------------------------------
# Equality
#---------------------------------------------------------------------------
sameAs_symmetry:
(?x owl:sameAs ?y)
-> (?y owl:sameAs ?x).
sameAs_transitivity:
(?x owl:sameAs ?y)
(?y owl:sameAs ?z)
-> (?x owl:sameAs ?z).
sameAs_Thing1:
-> [(?x rdf:type owl:Thing) <- (?x owl:sameAs ?y)].
sameAs_Thing2:
-> [(?x owl:sameAs ?x) <- (?x rdf:type owl:Thing)].
and put this in a variable string_rules_variable (with proper escaping):
:model_inf a ja:InfModel ;
    ja:baseModel :tdbGraph ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
        ja:rules [
            ${string_rules_variable}
        ] ;
    ] .
where ${string_rules_variable} (javascript string interpolation) contains the rules read from the file.
In the end, the repository was created with no errors, but the rules did not work, nor did the owlmicro statements appear in the repository.
So, am I doing something wrong, or is this a Jena Fuseki issue?
P.S. I am using nodejs to send this in the body of a POST request with a text/turtle Content-Type header.
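For what it's worth, here is a sketch of how the nodejs side might assemble such a body with a template literal. It embeds the rule text as a triple-quoted Turtle string via a ja:rule property inside the ja:rules node; that property name is my recollection of the Jena assembler vocabulary, so double-check it against the ja: schema before relying on this:

```javascript
// Sketch: build the assembler body in nodejs. The ja:rule property and the
// triple-quoted literal are assumptions to verify against the ja: schema.
const rules = `
sameAs_symmetry:
(?x owl:sameAs ?y)
-> (?y owl:sameAs ?x).
`;

// Turtle string literals: escape backslashes and double quotes.
const escaped = rules.replace(/\\/g, "\\\\").replace(/"/g, '\\"');

const body = `
:model_inf a ja:InfModel ;
    ja:baseModel :tdbGraph ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
        ja:rules [ ja:rule """${escaped}""" ] ;
    ] .
`;

console.log(body.includes("sameAs_symmetry")); // true
```

The point of the escaping step is that raw rule text pasted straight into Turtle (as in the question) is not valid syntax; it has to travel inside a string literal.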

Related

Comment conflict in HQL grammar

I am trying to parse the --i; statement.
My problem lies with the single-line comment rule of HQL, which states:
L_S_COMMENT : ('--' | '//') .*? '\r'? '\n' -> channel(HIDDEN) ;
I wrote the rules in the lexer:
T_SUB2 : '--' ;
T_SEMICOLON : ';' ;
Rule in parser:
dummy_rule: T_SUB2 'i' T_SEMICOLON ;
When I test the rule it works fine and the parse tree is displayed correctly. But when I press ENTER for a new line it shows an error and won't accept any more rules. I know it's the L_S_COMMENT rule, because when I remove it the rules work fine.
But deleting it is not the optimal solution. Any ideas what might cause this and how to work around it?
If the relevant statements always have to be terminated in a SEMI, then effectively exclude them from the comment definition:
COMMENT
: ( CMark .*? Vws
| DMark .*? ~[; \t\r\n\f] Hws* Vws
) -> channel(HIDDEN)
;
fragment CMark : '//' ;
fragment DMark : '--' ;
fragment Hws : [ \t] ;
fragment Vws : [\r\n]+ ;
Explanation
The first alt of the rule matches a standard // comment
The second alt will match a -- comment iff the one visible character immediately prior to the terminating whitespace is not a SEMI. The ~ is set negation, while [; \t\r\n\f] is a set of characters. Since there is no operator modifying the set, ~[; \t\r\n\f] will match only a single character that is not one of the named characters.
Hence, the comment rule will not match the terminal portion of a line of code that contains -- and terminates in a SEMI.
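As a sanity check, the second alternative can be approximated with a plain JavaScript regex (a sketch only, not the ANTLR machinery; note that ANTLR's .*? also spans newlines, which this single-line approximation ignores):

```javascript
// Approximation of: '--' .*? ~[; \t\r\n\f] Hws* Vws
// The last visible character before the newline must not be a ';'.
const alt2 = /^--.*?[^; \t\r\n\f][ \t]*[\r\n]+$/;

console.log(alt2.test("-- a comment\n")); // true: hidden as a comment
console.log(alt2.test("--i;\n"));         // false: ';' precedes the newline
```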

How can I check if first character of a line is "*" in ANTLR4?

I am trying to write a parser for a relatively simple but idiosyncratic language.
Simply put, one of the rules is that comment lines are denoted by an asterisk only if that asterisk is the first character of the line. How might I go about formalising such a rule in ANTLR4? I thought about using:
START_LINE_COMMENT: '\n*' .*? '\n' -> skip;
But I am certain this won't work with more than one comment line in a row, as the newline at the end will be consumed as part of the START_LINE_COMMENT token, meaning any subsequent comment lines will be missing the required initial newline character. Is there a way I can check whether the line starts with a '*' without needing to consume the prior '\n'?
Matching a comment line is not easy. As I write one grammar per year, I had to grab The Definitive ANTLR Reference to refresh my memory. Try this:
grammar Question;
/* Comment line having an * in column 1. */
question
: line+
;
line
// : ( ID | INT )+
: ( ID | INT | MULT )+
;
LINE_COMMENT
: '*' {getCharPositionInLine() == 1}? ~[\r\n]* -> channel(HIDDEN) ;
ID : [a-zA-Z]+ ;
INT : [0-9]+ ;
//WS : [ \t\r\n]+ -> channel(HIDDEN) ;
WS : [ \t\r\n]+ -> skip ;
MULT : '*' ;
Compile and execute:
$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.6-complete.jar:
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question question -tokens data.txt
[#0,0:3='line',<ID>,1:0]
[#1,5:5='1',<INT>,1:5]
[#2,9:12='line',<ID>,2:2]
[#3,14:14='2',<INT>,2:7]
[#4,16:26='* comment 1',<LINE_COMMENT>,channel=1,3:0]
[#5,32:35='line',<ID>,4:4]
[#6,37:37='4',<INT>,4:9]
[#7,39:48='*comment 2',<LINE_COMMENT>,channel=1,5:0]
[#8,51:78='* comment 3 after empty line',<LINE_COMMENT>,channel=1,7:0]
[#9,81:81='*',<'*'>,8:1]
[#10,83:85='not',<ID>,8:3]
[#11,87:87='a',<ID>,8:7]
[#12,89:95='comment',<ID>,8:9]
[#13,97:100='line',<ID>,9:0]
[#14,102:102='9',<INT>,9:5]
[#15,107:107='*',<'*'>,9:10]
[#16,109:110='no',<ID>,9:12]
[#17,112:118='comment',<ID>,9:15]
[#18,120:119='<EOF>',<EOF>,10:0]
with the following data.txt file:
line 1
line 2
* comment 1
line 4
*comment 2
* comment 3 after empty line
* not a comment
line 9 * no comment
Note that without the MULT token or '*' somewhere in a parser rule, the asterisk is not listed in the tokens, and the lexer complains:
line 8:1 token recognition error at: '*'
If you display the parsing tree
$ grun Question question -gui data.txt
you'll see that the whole file is absorbed by one line rule. If you need to recognize individual lines, change the line and whitespace rules like so:
line
: ( ID | INT | MULT )+ NL
| NL
;
//WS : [ \t\r\n]+ -> skip ;
NL : [\r\n] ;
WS : [ \t]+ -> skip ;
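The effect of the {getCharPositionInLine() == 1}? predicate can be mimicked outside ANTLR; here is a minimal JavaScript sketch (a hypothetical helper, just for intuition) that treats a line as a comment only when '*' is its first character:

```javascript
// '*' only starts a comment when it is the first character of the line,
// mirroring the column check in the LINE_COMMENT predicate.
function isCommentLine(line) {
  return line.startsWith("*");
}

console.log(isCommentLine("* comment 1"));         // true
console.log(isCommentLine("line 9 * no comment")); // false
```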

ANTLR4: Two channels, one for CSV-formatted data, one for key/value-formatted data -- does not work

The lexer grammar below contains two sets of rules: (1) rules for tokenizing CSV-formatted input, and (2) rules for tokenizing key/value-formatted input. For (1) I put the tokens on channel(0). For (2) I put the tokens on channel(1). Do you see any problems with my lexer grammar?
Also below is a parser grammar; it too contains two sets of rules: (1) rules for structuring CSV tokens into a parse tree, and (2) rules for structuring key/value tokens into a parse tree. Do you see any problems with my parser grammar?
When I apply ANTLR to the grammar files, compile, and then run the test rig (with the -gui flag) using this CSV input:
FirstName, LastName, Street, City, State, ZipCode
Mark,, 4460 Stuart Street, Marion Center, PA, 15759
the parse tree is completely wrong - the tree contains no data. I have no idea why the parse tree is wrong. Any suggestions? I have tested each part separately (removed the key/value rules from the lexer and parser grammars and ran it with CSV input, removed the CSV rules from the lexer and parser grammars and ran it with key/value input) and it works fine.
Lexer Grammar
lexer grammar MyLexer;
COMMA : ',' -> channel(0) ;
NL : ('\r')?'\n' -> channel(0) ;
WS : [ \t\r\n]+ -> skip, channel(0) ;
STRING : (~[,\r\n])+ -> channel(0) ;
KEY : ('FirstName' | 'LastName') -> channel(1) ;
EQ : '=' -> channel(1) ;
NL2 : ('\r')?'\n' -> channel(1) ;
WS2 : [ \t\r\n]+ -> skip, channel(1) ;
VALUE : (~[=\r\n])+ -> channel(1) ;
Parser Grammar
parser grammar MyParser;
options { tokenVocab=MyLexer; }
csv : (header rows)+ EOF ;
header : field (COMMA field)* NL ;
rows : (row)* ;
row : field (COMMA field)* NL ;
field : STRING | ;
keyValue : pairs EOF ;
pairs : (pair)+ ;
pair : key EQ value NL2;
key : KEY ;
value : VALUE ;
The longest token match wins, and if two matches are equal in length the rule defined first wins. That means:
STRING subsumes KEY, EQ and VALUE; you will never get tokens of the latter types.
The ANTLR parser needs random access to the token stream, which rules out context-sensitive lexing.
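The maximal-munch point can be illustrated outside ANTLR with a plain JavaScript sketch (illustrative only): a STRING-like pattern swallows the whole key/value line, so KEY and EQ never get a chance to match.

```javascript
// STRING ~ (~[,\r\n])+ matches the entire line, which is longer than any
// KEY match, so the lexer emits STRING and never KEY/EQ/VALUE.
const STRING = /^[^,\r\n]+/;
const KEY = /^(FirstName|LastName)/;

const line = "FirstName=Mark";
console.log(line.match(STRING)[0].length); // 14 -- the entire line
console.log(line.match(KEY)[0].length);    // 9  -- shorter, so STRING wins
```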
I suggest putting the two sets of lexer rules into separate grammars. It may get tricky to use them with a common parser grammar; if so, split the parser grammar as well.

Jena GenericRuleReasoner

What happens if we put a variable in the head of a GenericRuleReasoner, which does not appear in the body of the rule?
For instance if we have the following rule :
rule1: (?x rdf:type :Person) -> (?y :father ?x)
The rule says that every person has a father.
Suppose we have a triple :a rdf:type :Person
How does the reasoner behave here? Will it create a new triple with a blank node, like _x :father :a?
I think it will complain about that. It is, after all, ambiguous: do you mean 'there is a ?y such that...' or 'for any ?y ....'?
From what you say it's clear that you expect the former, the existential version, because that's what introducing a bNode does. So try:
rule1: makeTemp(?y), (?x rdf:type ex:Person) -> (?y ex:fatherOf ?x)
or
rule1: makeInstance(?y, ex:father, ?x), (?x rdf:type ex:Person) -> (?y ex:fatherOf ?x)
the latter of which will give you a consistent father node, whereas the former simply introduces a bNode.

Overlapping Tokens in ANTLR 4

I have the following ANTLR 4 combined grammar:
grammar Example;
fieldList: field* ;
field: 'field' identifier '{' note '}' ;
note: NOTE ;
identifier: IDENTIFIER ;
NOTE: [A-Ga-g] ;
IDENTIFIER: [A-Za-z0-9]+ ;
WS: [ \t\r\n]+ -> skip ;
This parses:
field x { A }
field x { B }
This does not:
field a { A }
field b { B }
In the case where parsing fails, I think the lexer is getting confused and emitting a NOTE token where I want it to emit an IDENTIFIER token.
Edit:
In the tokens coming out of the lexer, the NOTE token is showing up where the parser expects IDENTIFIER. NOTE has higher precedence because it appears first in the grammar. So, I can think of two ways to fix this: first, I could alter the grammar to disambiguate NOTE and IDENTIFIER (like adding a '$' in front of NOTE). Or, I could just use IDENTIFIER where I would use note and then detect issues when I walk the parse tree. Neither of those feels optimal. Surely there must be a way to fix this?
I actually ended up solving it like this:
grammar Example;
fieldList: field* ;
field: 'field' identifier '{' note '}' ;
note: NOTE ;
identifier: IDENTIFIER | NOTE ;
NOTE: [A-Ga-g] ;
IDENTIFIER: [A-Za-z0-9]+ ;
WS: [ \t\r\n]+ -> skip ;
My parse tree still ends up looking how I'd like.
The actual grammar I'm developing is more complicated, as is the workaround based on this approach. But in general, the approach seems to work well.
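For intuition, ANTLR's lexer disambiguation (longest match wins; on a tie, the rule listed first wins) can be sketched in plain JavaScript (an illustrative helper, not part of the grammar):

```javascript
// NOTE is listed before IDENTIFIER, mirroring the grammar order.
const rules = [["NOTE", /^[A-Ga-g]/], ["IDENTIFIER", /^[A-Za-z0-9]+/]];

function lexOne(text) {
  let bestName = null, bestLen = 0;
  for (const [name, pat] of rules) {
    const m = text.match(pat);
    // A strictly longer match is required, so earlier rules win ties.
    if (m && m[0].length > bestLen) {
      bestName = name;
      bestLen = m[0].length;
    }
  }
  return bestName;
}

console.log(lexOne("a"));  // NOTE       (tie on length; NOTE is listed first)
console.log(lexOne("ab")); // IDENTIFIER (longer match wins)
```

This is exactly why "field a { A }" fails with the original grammar: the lone "a" ties at length 1, and NOTE wins.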
Quick and dirty fix for your problem can be:
Change IDENTIFIER to match only the complement of NOTE, then put them together in identifier.
Resulting grammar:
grammar Example;
fieldList: field* ;
field: 'field' identifier '{' note '}' ;
note: NOTE ;
identifier: (NOTE|IDENTIFIER_C)+ ;
NOTE: [A-Ga-g] ;
IDENTIFIER_C: [H-Zh-z0-9] ;
WS: [ \t\r\n]+ -> skip ;
The disadvantage of this solution is that you do not get the identifier as a single token, and you tokenize every single character.
