I have a question.
I have a Jena rule like this:
[rule1:
(?if rdf:type p:InferredConfiguration)
(?if p:userName ?userEmail)
(?subProfile rdf:type u:PersonSubProfile)
(?subProfile u:hasUsername ?email)
equal(?userEmail, ?email)
(?subProfile u:hasName "")
(?subProfile u:hasLastname "")
(?subProfile u:hasPhone "")
(?subProfile u:hasEducation "Low")
->
(?subProfile u:hasPhone "00000")
print('**************** Phone defined - Rule 1 ***************')
]
The problem is that the rule above does not update the value of "hasPhone" property, but it adds one more value.
How can I update the value or remove the old one and add the new one?
Thank you.
I found solution
[rule1:
(?if rdf:type p:InferredConfiguration)
(?if p:userName ?userEmail)
(?subProfile rdf:type u:PersonSubProfile)
(?subProfile u:hasUsername ?email)
equal(?userEmail, ?email)
(?subProfile u:hasName "")
(?subProfile u:hasLastname "")
(?subProfile u:hasPhone ?var)
equal (?var, "")
(?subProfile u:hasEducation "Low")
->
drop(7)
(?subProfile u:hasPhone "00000")
print('**************** Phone defined - Rule 1 ***************')
]
Related
I'm trying to learn ANTLR4 and I'm already having some issues with my first experiment.
The goal here is to learn how to use ANTLR to syntax highlight a QScintilla component. To practice a little bit I've decided I'd like to learn how to properly highlight *.ini files.
First things first, in order to run the mcve you'll need:
Download antlr4 and make sure it works, read the instructions on the main site
Install python antlr runtime, just do: pip install antlr4-python3-runtime
Generate the lexer/parser of ini.g4:
grammar ini;
start : section (option)*;
section : '[' STRING ']';
option : STRING '=' STRING;
COMMENT : ';' ~[\r\n]*;
STRING : [a-zA-Z0-9]+;
WS : [ \t\n\r]+;
by running antlr ini.g4 -Dlanguage=Python3 -o ini
Finally, save main.py:
import textwrap
from PyQt5.Qt import *
from PyQt5.Qsci import QsciScintilla, QsciLexerCustom
from antlr4 import *
from ini.iniLexer import iniLexer
from ini.iniParser import iniParser
class QsciIniLexer(QsciLexerCustom):
def __init__(self, parent=None):
super().__init__(parent=parent)
lst = [
{'bold': False, 'foreground': '#f92472', 'italic': False}, # 0 - deeppink
{'bold': False, 'foreground': '#e7db74', 'italic': False}, # 1 - khaki (yellowish)
{'bold': False, 'foreground': '#74705d', 'italic': False}, # 2 - dimgray
{'bold': False, 'foreground': '#f8f8f2', 'italic': False}, # 3 - whitesmoke
]
style = {
"T__0": lst[3],
"T__1": lst[3],
"T__2": lst[3],
"COMMENT": lst[2],
"STRING": lst[0],
"WS": lst[3],
}
for token in iniLexer.ruleNames:
token_style = style[token]
foreground = token_style.get("foreground", None)
background = token_style.get("background", None)
bold = token_style.get("bold", None)
italic = token_style.get("italic", None)
underline = token_style.get("underline", None)
index = getattr(iniLexer, token)
if foreground:
self.setColor(QColor(foreground), index)
if background:
self.setPaper(QColor(background), index)
def defaultPaper(self, style):
return QColor("#272822")
def language(self):
return self.lexer.grammarFileName
def styleText(self, start, end):
view = self.editor()
code = view.text()
lexer = iniLexer(InputStream(code))
stream = CommonTokenStream(lexer)
parser = iniParser(stream)
tree = parser.start()
print('parsing'.center(80, '-'))
print(tree.toStringTree(recog=parser))
lexer.reset()
self.startStyling(0)
print('lexing'.center(80, '-'))
while True:
t = lexer.nextToken()
print(lexer.ruleNames[t.type-1], repr(t.text))
if t.type != -1:
len_value = len(t.text)
self.setStyling(len_value, t.type)
else:
break
def description(self, style_nr):
return str(style_nr)
if __name__ == '__main__':
app = QApplication([])
v = QsciScintilla()
lexer = QsciIniLexer(v)
v.setLexer(lexer)
v.setText(textwrap.dedent("""\
; Comment outside
[section s1]
; Comment inside
a = 1
b = 2
[section s2]
c = 3 ; Comment right side
d = e
"""))
v.show()
app.exec_()
and run it, if everything went well you should get this outcome:
Here's my questions:
As you can see, the outcome of the demo is far away from being usable, you definitely don't want that, it's really disturbing. Instead, you'd like to get a similar behaviour than all IDEs out there. Unfortunately I don't know how to achieve that, how would you modify the snippet providing such a behaviour?
Right now I'm trying to mimick a similar highlighting than the below snapshot:
you can see on that screenshot the highlighting is different on variable assignments (variable=deeppink and values=yellowish) but I don't know how to achieve that, I've tried using this slightly modified grammar:
grammar ini;
start : section (option)*;
section : '[' STRING ']';
option : VARIABLE '=' VALUE;
COMMENT : ';' ~[\r\n]*;
VARIABLE : [a-zA-Z0-9]+;
VALUE : [a-zA-Z0-9]+;
WS : [ \t\n\r]+;
and then changing the styles to:
style = {
"T__0": lst[3],
"T__1": lst[3],
"T__2": lst[3],
"COMMENT": lst[2],
"VARIABLE": lst[0],
"VALUE": lst[1],
"WS": lst[3],
}
but if you look at the lexing output you'll see there won't be distinction between VARIABLE and VALUES because order precedence in the ANTLR grammar. So my question is, how would you modify the grammar/snippet to achieve such visual appearance?
The problem is that the lexer needs to be context sensitive: everything on the left hand side of the = needs to be a variable, and to the right of it a value. You can do this by using ANTLR's lexical modes. You start off by classifying successive non-spaces as being a variable, and when encountering a =, you move into your value-mode. When inside the value-mode, you pop out of this mode whenever you encounter a line break.
Note that lexical modes only work in a lexer grammar, not the combined grammar you now have. Also, for syntax highlighting, you probably only need the lexer.
Here's a quick demo of how this could work (stick it in a file called IniLexer.g4):
lexer grammar IniLexer;
SECTION
: '[' ~[\]]+ ']'
;
COMMENT
: ';' ~[\r\n]*
;
ASSIGN
: '=' -> pushMode(VALUE_MODE)
;
KEY
: ~[ \t\r\n]+
;
SPACES
: [ \t\r\n]+ -> skip
;
UNRECOGNIZED
: .
;
mode VALUE_MODE;
VALUE_MODE_SPACES
: [ \t]+ -> skip
;
VALUE
: ~[ \t\r\n]+
;
VALUE_MODE_COMMENT
: ';' ~[\r\n]* -> type(COMMENT)
;
VALUE_MODE_NL
: [\r\n]+ -> skip, popMode
;
If you now run the following script:
source = """
; Comment outside
[section s1]
; Comment inside
a = 1
b = 2
[section s2]
c = 3 ; Comment right side
d = e
"""
lexer = IniLexer(InputStream(source))
stream = CommonTokenStream(lexer)
stream.fill()
for token in stream.tokens[:-1]:
print("{0:<25} '{1}'".format(IniLexer.symbolicNames[token.type], token.text))
you will see the following output:
COMMENT '; Comment outside'
SECTION '[section s1]'
COMMENT '; Comment inside'
KEY 'a'
ASSIGN '='
VALUE '1'
KEY 'b'
ASSIGN '='
VALUE '2'
SECTION '[section s2]'
KEY 'c'
ASSIGN '='
VALUE '3'
COMMENT '; Comment right side'
KEY 'd'
ASSIGN '='
VALUE 'e'
And an accompanying parser grammar could look like this:
parser grammar IniParser;
options {
tokenVocab=IniLexer;
}
sections
: section* EOF
;
section
: COMMENT
| SECTION section_atom*
;
section_atom
: COMMENT
| KEY ASSIGN VALUE
;
which would parse your example input in the following parse tree:
I already implemented something like this in C++.
https://github.com/tora-tool/tora/blob/master/src/editor/tosqltext.cpp
Sub-classed QScintilla class and implemented custom Lexer based on ANTLR generated source.
You might even use ANTLR parser (I did not use it), QScitilla allows you to have more than one analyzer (having different weight), so you can periodically perform some semantic check on text. What can not be done easily in QScintilla is to associate token with some additional data.
Syntax highlighting in Sctintilla is done by dedicated highlighter classes, which are lexers. A parser is not well suited for such kind of work, because the syntax highlighting feature must work, even if the input contains errors. A parser is a tool to verify the correctness of the input - 2 totally different tasks.
So I recommend you stop thinking about using ANTLR4 for that and just take one of the existing Lex classes and create a new one for the language you want to highlight.
I want to parse latex file so that i can convert it into html. I started with basic minimum text file. But the tree created by "grun Tex_grammar start -gui" is showing errors. I think there is some error in the code which i am not able to figure out
Below is my Grammar code:
///////////////////////////////////////////////////////
grammar Tex_grammar;
r : START BEGINDOC body_text ENDDOC EOF ; //
body_text: body_text body_parts
| body_parts
;
body_parts: section
| subsection
;
section: section (SECTION'{' name'}') text ('\r\n')*
|(SECTION'{' name'}') text ('\r\n')*
;
subsection: subsection (SUBSECTION'{' name'}') text ('\r\n')*
|(SUBSECTION'{' name'}') text ('\r\n')*
;
name : WORD ' ' WORD*;
text: text name* '\r|\n'
|name '\r|\n'
|' '+
;
\\Lexer
START: '\\documentclass{article}';
BEGINDOC: '\\begin{document}';
ENDDOC: '\\end{document}';
SECTION : '\\section';
SUBSECTION : '\\subsection';
INTEGER : [0-9]+ ;
WORD : (((PUNCTUATION)|[a-zA-Z0-9]|(GREEK_LETTERS))+|'('|')');
GREEK_LETTERS : '\\alpha'
|'\\beta'|'\\gamma'|'\\delta'|'\\epsilon';
fragment LETTERS : [a-zA-Z];
fragment PUNCTUATION : ('.'|'\'|!|'?'|:|;|');
I define an assembler file with name dataset2.ttl. The content of this file is:
#prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.
<#data1> rdf:type tdb:GraphTDB ;
tdb:dataset <#dataset> ;
tdb:graphName <http://example.org/data1> ;
ja:content [ja:externalContent <file:///C:/Users/data/data1.ttl>;];
.
The related jena code to create a datase is:
public class TDB {
public static void main(String[] args) {
Dataset ds = null;
try {
ds = TDBFactory.assembleDataset("Dataset2.ttl");
if(ds == null) {
System.out.println("initial tdb failed");
} else {
System.out.println("Default Model:");
Model model = ds.getDefaultModel();
ds.begin(ReadWrite.WRITE);
model.write(System.out, "TURTLE");
}
} finally {
if(ds != null) {
ds.close();
}
}
}
The content in data1.ttl is:
#prefix : <http://example.org/> .
#prefix foaf: <http://xmlns.com/foaf/0.1/> .
:alice
a foaf:Person ;
foaf:name "Alice" ;
foaf:mbox <mailto:alice#example.org> ;
foaf:knows :bob ;
foaf:knows :charlie ;
foaf:knows :snoopy ;
.
:bob
foaf:name "Bob" ;
foaf:knows :charlie ;
.
:charlie
foaf:name "Charlie" ;
foaf:knows :alice ;
.
A dataset has been created using this code. However, the content in the file of "data1.ttl" has not been read into the model. What is the problem of my code?
You also have
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.
and
ds = TDBFactory.assembleDataset("Dataset2.ttl");
so you are asking Jena to assemble a dataset. That dataset will be <#dataset> (find by the type). It is not connected the graph you define so that's ignored; you can remove that part. Assembling the dataset is the way to do this.
You have tdb:unionDefaultGraph true so the default graph for query is the combination of all named graphs in the dataset.
Pick one out with model.getNamedModel.
If you use SPARQL, use the GRAPH keyword.
I would try validating your ttl file online to make sure they dataset2.ttl and data.ttl are both valid. I noticed you seem to add an extra semi-colon at the end when it's not needed (it should end with just a period).
try changing your line to this:
ja:content [ja:externalContent <file:///C:/Users/data/data1.ttl>] .
<#data1> rdf:type tdb:GraphTDB ;
tdb:dataset <#dataset> ;
tdb:graphName <http://example.org/data1> ;
ja:content [ja:externalContent <file:///C:/Users/data/data1.ttl>;];
.
Note the tdb:GraphTDB which means attach to a graph in the database. It does not load data with ja:content.
As a persistent store, it is expected that the data is already loaded, e.g.by tdbloader, not every time the assembler is used.
I want to change every entry in csv file to 'BlahBlah'
For that I have antlr grammar as
grammar CSV;
file : hdr row* row1;
hdr : row;
row : field (',' value1=field)* '\r'? '\n'; // '\r' is optional at the end of a row of CSV file ..
row1 : field (',' field)* '\r'? '\n'?;
field
: TEXT
{
$setText("BlahBlah");
}
| STRING
|
;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""' | ~'"')* '"' ;
But when I run this on antlr4
error(63): CSV.g4:13:3: unknown attribute reference setText in $setText
make: *** [run] Error 1
why is setText not supported in antlr4 and is there any other alternative to replace text?
Couple of problems here:
First, have to identify the receiver of the setText method. Probably want
field : TEXT { $TEXT.setText("BlahBlah"); }
| STRING
;
Second is that setText is not defined in the Token class.
Typically, create your own token class extending CommonToken and corresponding token factory class. Set the TokenLableType (in the options block) to your token class name. The setText method in CommonToken will then be visible.
tl;dr:
Given the following grammar (derived from original CSV.g4 sample and grammar attempt of OP (cf. question)):
grammar CSVBlindText;
#header {
import java.util.*;
}
/** Derived from rule "file : hdr row+ ;" */
file
locals [int i=0]
: hdr ( rows+=row[$hdr.text.split(",")] {$i++;} )+
{
System.out.println($i+" rows");
for (RowContext r : $rows) {
System.out.println("row token interval: "+r.getSourceInterval());
}
}
;
hdr : row[null] {System.out.println("header: '"+$text.trim()+"'");} ;
/** Derived from rule "row : field (',' field)* '\r'? '\n' ;" */
row[String[] columns] returns [Map<String,String> values]
locals [int col=0]
#init {
$values = new HashMap<String,String>();
}
#after {
if ($values!=null && $values.size()>0) {
System.out.println("values = "+$values);
}
}
// rule row cont'd...
: field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
( ',' field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
)* '\r'? '\n'
;
field
: TEXT
| STRING
|
;
TEXT : ~[',\n\r"]+ {setText( "BlahBlah" );} ;
STRING : '"' ('""'|~'"')* '"' ; // quote-quote is an escaped quote
One has:
$> antlr4 -no-listener CSVBlindText.g4
$> grep setText CSVBlindText*java
CSVBlindTextLexer.java: setText( "BlahBlah" );
Compiling it works flawlessly:
$> javac CSVBlindText*.java
Testdata (the users.csv file just renamed):
$> cat blinded_by_grammar.csv
User, Name, Dept
parrt, Terence, 101
tombu, Tom, 020
bke, Kevin, 008
Yields in test:
$> grun CSVBlindText file blinded_by_grammar.csv
header: 'BlahBlah,BlahBlah,BlahBlah'
values = {BlahBlah=BlahBlah}
values = {BlahBlah=BlahBlah}
values = {BlahBlah=BlahBlah}
3 rows
row token interval: 6..11
row token interval: 12..17
row token interval: 18..23
So it looks as if the setText() should be injected before the semicolon of a production and not between alternatives (wild guessing here ;-)
Previous iterations below:
Just guessing, as I 1) have no working antlr4 available currently and 2) did not write ANTLR4 grammars for quite some time now - maybe without the Dollar ($) ?
grammar CSV;
file : hdr row* row1;
hdr : row;
row : field (',' value1=field)* '\r'? '\n'; // '\r' is optional at the end of a row of CSV file ..
row1 : field (',' field)* '\r'? '\n'?;
field
: TEXT
{
setText("BlahBlah");
}
| STRING
|
;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""' | ~'"')* '"' ;
Update: Now that an antlr 4.5.2 (at least via brew) instead of a 4.5.3 is available, I digged into that and answering some comment below from OP: the setText() will be generated in lexer java module if the grammar is well defined. Unfortunately debugging antlr4 grammars for a dilettant like me is ... but nevertheless very nice language construction kit IMO.
Sample session:
$> antlr4 -no-listener CSV.g4
$> grep setText CSVLexer.java
setText( String.valueOf(getText().charAt(1)) );
The grammar used:
(hacked up from example code retrieved via:
curl -O http://media.pragprog.com/titles/tpantlr2/code/tpantlr2-code.tgz )
grammar CSV;
#header {
import java.util.*;
}
/** Derived from rule "file : hdr row+ ;" */
file
locals [int i=0]
: hdr ( rows+=row[$hdr.text.split(",")] {$i++;} )+
{
System.out.println($i+" rows");
for (RowContext r : $rows) {
System.out.println("row token interval: "+r.getSourceInterval());
}
}
;
hdr : row[null] {System.out.println("header: '"+$text.trim()+"'");} ;
/** Derived from rule "row : field (',' field)* '\r'? '\n' ;" */
row[String[] columns] returns [Map<String,String> values]
locals [int col=0]
#init {
$values = new HashMap<String,String>();
}
#after {
if ($values!=null && $values.size()>0) {
System.out.println("values = "+$values);
}
}
// rule row cont'd...
: field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
( ',' field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
)* '\r'? '\n'
;
field
: TEXT
| STRING
| CHAR
|
;
TEXT : ~[',\n\r"]+ ;
STRING : '"' ('""'|~'"')* '"' ; // quote-quote is an escaped quote
/** Convert 3-char 'x' input sequence to string x */
CHAR: '\'' . '\'' {setText( String.valueOf(getText().charAt(1)) );} ;
Compiling works:
$> javac CSV*.java
Now test with a matching weird csv file:
a,b
"y",'4'
As:
$> grun CSV file foo.csv
line 1:0 no viable alternative at input 'a'
line 1:2 no viable alternative at input 'b'
header: 'a,b'
values = {a="y", b=4}
1 rows
row token interval: 4..7
So in conclusion, I suggest to rework the logic of the grammar (I presume inserting "BlahBlahBlah" was not essential but a mere debugging hack).
And citing http://www.antlr.org/support.html :
ANTLR Discussions
Please do not start discussions at stackoverflow. They have asked us to
steer discussions (i.e., non-questions/answers) away from Stackoverflow; we
have a discussion forum at Google specifically for that:
https://groups.google.com/forum/#!forum/antlr-discussion
We can discuss ANTLR project features, direction, and generally argue about
whatever we want at the google discussion forum.
I hope this helps.
Implementing a peg.js based parser, I get stuck adding code to to handle c-style comments /* like this */.
I need to find the end marker without eating it.
this not working:
multi = '/*' .* '*/'
The message is:
line: 14
Expected "*/" or any character but end of input found.
I do understand why this is not working, but unfortunately I have no clue how to make comment parsing functional.
Here's the code so far:
start = item*
item = comment / content_line
content_line = _ p:content _ {return ['CONTENT',p]}
content = 'some' / 'legal' / 'values'
comment = _ p:(single / multi) {return ['COMMENT',p]}
single = '//' p:([^\n]*) {return p.join('')}
multi = 'TODO'
_ = [ \t\r\n]* {return null}
and some sample input:
// line comment, no problems here
/*
how to parse this ??
*/
values
// another comment
some legal
Use a predicate that looks ahead and makes sure there is no "*/" ahead in the character stream before matching characters:
comment
= "/*" (!"*/" .)* "*/"
The (!"*/" .) part could be read as follows: when there's no '*/' ahead, match any character.
This will therefor match comments like this successfully: /* ... **/
complete code:
Parser:
start = item*
item = comment / content_line
content_line = _ p:content _ {return ['CONTENT',p]}
content = 'all' / 'legal' / 'values' / 'Thanks!'
comment = _ p:(single / multi) {return ['COMMENT',p]}
single = '//' p:([^\n]*) {return p.join('')}
multi = "/*" inner:(!"*/" i:. {return i})* "*/" {return inner.join('')}
_ = [ \t\r\n]* {return null}
Sample:
all
// a comment
values
// another comment
legal
/*12
345 /*
*/
Thanks!
Result:
[
["CONTENT","all"],
["COMMENT"," a comment"],
["CONTENT","values"],
["COMMENT"," another comment"],
["CONTENT","legal"],
["COMMENT","12\n345 /* \n"],
["CONTENT","Thanks!"]
]