Gold Parser Lowercase - parsing

Gold Parser issue: Having a problem with Uppercase/Lowercase character differentiation. The following in my grammar is failing:
LowercaseLetter = {&61 .. &7A}
LowercaseLetters = LowercaseLetter+
UppercaseLetter = {&41 .. &5A}
UppercaseLetters = UppercaseLetter+
I get a 'DFA State' indicating 'Cannot distinguish between LowercaseLetter and UppercaseLetter'.
I find this confusing because I believe that LowercaseLetter is defined by the set of ASCII characters 'a' through 'z', and UppercaseLetter is defined by the set of ASCII characters 'A' through 'Z'.
Any assistance most welcome.

The answer is to set the Parameter, "Case Sensitive", to True within your Grammar, as follows:
"Case Sensitive" = 'True'

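To see why the two sets collide, here is a small illustrative Python sketch. It assumes the grammar is treated as case-insensitive unless "Case Sensitive" is set, which is what the fix above implies. It only models the effect by folding characters to lowercase; it is not how GOLD itself is implemented, and the variable and function names are made up for this example.

```python
# Illustration only: models a case-insensitive scanner by folding every
# character to lowercase before comparing the two character sets.
lowercase_letter = {chr(c) for c in range(0x61, 0x7A + 1)}  # {&61 .. &7A}
uppercase_letter = {chr(c) for c in range(0x41, 0x5A + 1)}  # {&41 .. &5A}


def fold(chars):
    """Return the set as a case-insensitive scanner would see it."""
    return {c.lower() for c in chars}


# Case-sensitive view: the two sets share no characters.
overlap_sensitive = lowercase_letter & uppercase_letter

# Case-insensitive view: both sets collapse to the same 26 letters,
# so a scanner has no way to tell the two terminals apart.
overlap_insensitive = fold(lowercase_letter) & fold(uppercase_letter)

print(len(overlap_sensitive), len(overlap_insensitive))
```
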
Related

PLY yacc parser : how can I create a recursive rule for parsing matrices?

I'm trying to parse matrices written this way : [[1,2];[3,2];[3,4]]
So, the matrix syntax is of the form: [[A0,0, A0,1, ...]; [A1,0, A1,1, ...];...]
The semicolon separates the rows of a matrix, so it is not present in the assignment of a matrix that has only one row. Likewise, the comma separates the columns of a matrix, so it is not present in the assignment of a matrix that has only one column.
Now my question is: how can I parse a matrix of variable size? I guess with a recursive rule in PLY yacc, but how? Every attempt leads to an infinite recursion.
This is the error I get:
WARNING: Symbol 'primary' is unreachable
WARNING: Symbol 'expr' is unreachable
ERROR: Infinite recursion detected for symbol 'primary'
ERROR: Infinite recursion detected for symbol 'expr'
When I try this kind of code:
def p_test(t):
    '''primary : constant
               | LPAREN expr RPAREN'''
    print('yo')
    t[0] = 1
def p_expression_matrice(t):
    '''expr : primary
            | primary '+' primary'''
    print('hey')
    t[0] = 1
(this is just a first attempt to understand how to write recursion in yacc, not even an answer to my real problem)
This is my lexer :
from global_variables import tokens
import ply.lex as lex
t_PLUS = r'\+'
t_MINUS = r'\-'
t_TIMES = r'\*'
t_DIVIDE = r'\/'
t_MODULO = r'\%'
t_EQUALS = r'\='
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_LBRACK = r'\['
t_RBRACK = r'\]'
t_SEMICOLON = r'\;'
t_COMMA = r'\,'
t_POWER = r'\^'
t_QUESTION = r'\?'
t_NAME = r'[a-zA-Z]{2,}|[a-hj-zA-HJ-Z]' # all words (only letters) except the word 'i' alone
t_COMMAND = r'![\x00-\x7F]*' # all ASCII characters after '!'
def t_NUMBER(t):
    r'\d+(\.\d+)?'
    try:
        t.value = int(t.value)
    except ValueError:
        # not an integer literal, so it must contain a decimal part
        t.value = float(t.value)
    return t
def t_IMAGINE(t):
    r'i'
    t.value = 1j
    return t
t_ignore = " \t"
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
lexer = lex.lex()
After multiple tries, I got this:
def p_test3(t):
    '''expression : expression SEMICOLON expression'''
    t[0] = []
    t[0].append(t[1])
    t[0].append(t[3])
def p_test2(t):
    '''expression : LBRACK expression RBRACK'''
    t[0] = t[2]
def p_test(t):
    '''expression : expression COMMA NUMBER'''
    t[0] = []
    try:
        # t[1] is already a list of numbers: copy its items
        for i in t[1]:
            t[0].append(i)
    except TypeError:
        # t[1] is a single number, which is not iterable
        t[0].append(t[1])
    t[0].append(t[3])
Which allows me to parse this: [[1,2,3,4,5];[5,4,3,2,1]] to get this: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], stored in an array of arrays.
Next step, I will store many of these in a dict and I'll keep going, thanks for the support @rici :)
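For comparison, a variable-length list in a yacc-style grammar is usually written as a rule that refers to itself and also has a non-recursive alternative, so the recursion can end (for example "items : items COMMA NUMBER | NUMBER", where "items" is an illustrative name, not one from the code above). The same structure can be sketched without PLY as a small recursive-descent parser in plain Python. The function names, the token regular expression, and the error messages below are assumptions made for this illustration.

```python
import re

# Grammar mirrored by the functions below (rule names are illustrative):
#   matrix : '[' row (';' row)* ']'
#   row    : '[' NUMBER (',' NUMBER)* ']'
# Each "( ... )*" loop plays the role of a left-recursive yacc rule that
# also has a non-recursive alternative to stop on.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[\[\];,])")


def tokenize(text):
    """Split the input into number and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expect(tokens, symbol):
    """Consume one token and check that it is the expected symbol."""
    if not tokens or tokens[0] != symbol:
        raise SyntaxError(f"expected {symbol!r}")
    tokens.pop(0)


def parse_number(tokens):
    if not tokens or not tokens[0][0].isdigit():
        raise SyntaxError("expected a number")
    raw = tokens.pop(0)
    return float(raw) if "." in raw else int(raw)


def parse_row(tokens):
    """row : '[' NUMBER (',' NUMBER)* ']'"""
    expect(tokens, "[")
    row = [parse_number(tokens)]
    while tokens and tokens[0] == ",":
        tokens.pop(0)
        row.append(parse_number(tokens))
    expect(tokens, "]")
    return row


def parse_matrix(text):
    """matrix : '[' row (';' row)* ']'"""
    tokens = tokenize(text)
    expect(tokens, "[")
    rows = [parse_row(tokens)]
    while tokens and tokens[0] == ";":
        tokens.pop(0)
        rows.append(parse_row(tokens))
    expect(tokens, "]")
    if tokens:
        raise SyntaxError("unexpected trailing input")
    return rows


print(parse_matrix("[[1,2,3,4,5];[5,4,3,2,1]]"))
```

Because every rule consumes at least one token before it repeats, the parser cannot loop forever, which is the property a recursive grammar rule needs as well.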

lua tables - string representation

As a follow-up question to "lua tables - allowed values and syntax":
I need a table that equates large numbers to strings. The catch seems to be that strings with punctuation are not allowed:
local Names = {
    [7022003001] = fulsom jct, OH
    [7022003002] = kennedy center, NY
}
but neither are quotes:
local Names = {
    [7022003001] = "fulsom jct, OH"
    [7022003002] = "kennedy center, NY"
}
I have even tried without any spaces:
local Names = {
    [7022003001] = fulsomjctOH
    [7022003002] = kennedycenterNY
}
When this module is loaded, Wireshark complains "}" is expected to close "{" at line . How can I implement a table with a string that contains spaces and punctuation?
As per Lua Reference Manual - 3.1 - Lexical Conventions:
A short literal string can be delimited by matching single or double quotes, and can contain the (...) C-like escape sequences (...).
That means the short literal string in Lua is:
local foo = "I'm a string literal"
This matches your second example. That example fails because it lacks a separator between table members:
local Names = {
    [7022003001] = "fulsom jct, OH",
    [7022003002] = "kennedy center, NY"
}
You can also add a trailing separator after the last member.
The more detailed description of the table constructor can be found in 3.4.9 - Table Constructors. It could be summed up by the example provided there:
a = { [f(1)] = g; "x", "y"; x = 1, f(x), [30] = 23; 45 }
I really, really recommend using the Lua Reference Manual, it is an amazing helper.
I also highly encourage you to read some basic tutorials e.g. Learn Lua in 15 minutes. They should give you an overview of the language you are trying to use.

How to highlight QScintilla using ANTLR4?

I'm trying to learn ANTLR4 and I'm already having some issues with my first experiment.
The goal here is to learn how to use ANTLR to syntax highlight a QScintilla component. To practice a little bit I've decided I'd like to learn how to properly highlight *.ini files.
First things first, in order to run the mcve you'll need:
Download antlr4 and make sure it works, read the instructions on the main site
Install python antlr runtime, just do: pip install antlr4-python3-runtime
Generate the lexer/parser of ini.g4:
grammar ini;
start : section (option)*;
section : '[' STRING ']';
option : STRING '=' STRING;
COMMENT : ';' ~[\r\n]*;
STRING : [a-zA-Z0-9]+;
WS : [ \t\n\r]+;
by running antlr ini.g4 -Dlanguage=Python3 -o ini
Finally, save main.py:
import textwrap
from PyQt5.Qt import *
from PyQt5.Qsci import QsciScintilla, QsciLexerCustom
from antlr4 import *
from ini.iniLexer import iniLexer
from ini.iniParser import iniParser
class QsciIniLexer(QsciLexerCustom):
    def __init__(self, parent=None):
        super().__init__(parent=parent)
        lst = [
            {'bold': False, 'foreground': '#f92472', 'italic': False}, # 0 - deeppink
            {'bold': False, 'foreground': '#e7db74', 'italic': False}, # 1 - khaki (yellowish)
            {'bold': False, 'foreground': '#74705d', 'italic': False}, # 2 - dimgray
            {'bold': False, 'foreground': '#f8f8f2', 'italic': False}, # 3 - whitesmoke
        ]
        style = {
            "T__0": lst[3],
            "T__1": lst[3],
            "T__2": lst[3],
            "COMMENT": lst[2],
            "STRING": lst[0],
            "WS": lst[3],
        }
        for token in iniLexer.ruleNames:
            token_style = style[token]
            foreground = token_style.get("foreground", None)
            background = token_style.get("background", None)
            bold = token_style.get("bold", None)
            italic = token_style.get("italic", None)
            underline = token_style.get("underline", None)
            index = getattr(iniLexer, token)
            if foreground:
                self.setColor(QColor(foreground), index)
            if background:
                self.setPaper(QColor(background), index)
    def defaultPaper(self, style):
        return QColor("#272822")
    def language(self):
        return self.lexer.grammarFileName
    def styleText(self, start, end):
        view = self.editor()
        code = view.text()
        lexer = iniLexer(InputStream(code))
        stream = CommonTokenStream(lexer)
        parser = iniParser(stream)
        tree = parser.start()
        print('parsing'.center(80, '-'))
        print(tree.toStringTree(recog=parser))
        lexer.reset()
        self.startStyling(0)
        print('lexing'.center(80, '-'))
        while True:
            t = lexer.nextToken()
            print(lexer.ruleNames[t.type-1], repr(t.text))
            if t.type != -1:
                len_value = len(t.text)
                self.setStyling(len_value, t.type)
            else:
                break
    def description(self, style_nr):
        return str(style_nr)
if __name__ == '__main__':
    app = QApplication([])
    v = QsciScintilla()
    lexer = QsciIniLexer(v)
    v.setLexer(lexer)
    v.setText(textwrap.dedent("""\
        ; Comment outside
        [section s1]
        ; Comment inside
        a = 1
        b = 2
        [section s2]
        c = 3 ; Comment right side
        d = e
    """))
    v.show()
    app.exec_()
and run it. If everything went well, you should get this outcome:
Here are my questions:
As you can see, the outcome of the demo is far from usable; the highlighting is really distracting. Instead, you'd want behaviour similar to that of the IDEs out there. Unfortunately I don't know how to achieve that. How would you modify the snippet to provide such behaviour?
Right now I'm trying to mimic highlighting similar to the snapshot below:
You can see in that screenshot that the highlighting differs on variable assignments (variable=deeppink and values=yellowish), but I don't know how to achieve that. I've tried using this slightly modified grammar:
grammar ini;
start : section (option)*;
section : '[' STRING ']';
option : VARIABLE '=' VALUE;
COMMENT : ';' ~[\r\n]*;
VARIABLE : [a-zA-Z0-9]+;
VALUE : [a-zA-Z0-9]+;
WS : [ \t\n\r]+;
and then changing the styles to:
style = {
    "T__0": lst[3],
    "T__1": lst[3],
    "T__2": lst[3],
    "COMMENT": lst[2],
    "VARIABLE": lst[0],
    "VALUE": lst[1],
    "WS": lst[3],
}
but if you look at the lexing output you'll see there is no distinction between VARIABLE and VALUE, because of rule order precedence in the ANTLR grammar: both rules match the same text. So my question is: how would you modify the grammar/snippet to achieve such a visual appearance?
The problem is that the lexer needs to be context sensitive: everything on the left hand side of the = needs to be a variable, and to the right of it a value. You can do this by using ANTLR's lexical modes. You start off by classifying successive non-spaces as being a variable, and when encountering a =, you move into your value-mode. When inside the value-mode, you pop out of this mode whenever you encounter a line break.
Note that lexical modes only work in a lexer grammar, not the combined grammar you now have. Also, for syntax highlighting, you probably only need the lexer.
Here's a quick demo of how this could work (stick it in a file called IniLexer.g4):
lexer grammar IniLexer;
SECTION
 : '[' ~[\]]+ ']'
 ;
COMMENT
 : ';' ~[\r\n]*
 ;
ASSIGN
 : '=' -> pushMode(VALUE_MODE)
 ;
KEY
 : ~[ \t\r\n]+
 ;
SPACES
 : [ \t\r\n]+ -> skip
 ;
UNRECOGNIZED
 : .
 ;
mode VALUE_MODE;
VALUE_MODE_SPACES
 : [ \t]+ -> skip
 ;
VALUE
 : ~[ \t\r\n]+
 ;
VALUE_MODE_COMMENT
 : ';' ~[\r\n]* -> type(COMMENT)
 ;
VALUE_MODE_NL
 : [\r\n]+ -> skip, popMode
 ;
If you now run the following script:
from antlr4 import InputStream, CommonTokenStream
from IniLexer import IniLexer  # module generated by ANTLR from IniLexer.g4
source = """
; Comment outside
[section s1]
; Comment inside
a = 1
b = 2
[section s2]
c = 3 ; Comment right side
d = e
"""
lexer = IniLexer(InputStream(source))
stream = CommonTokenStream(lexer)
stream.fill()
# skip the last token, which is EOF
for token in stream.tokens[:-1]:
    print("{0:<25} '{1}'".format(IniLexer.symbolicNames[token.type], token.text))
you will see the following output:
COMMENT                   '; Comment outside'
SECTION                   '[section s1]'
COMMENT                   '; Comment inside'
KEY                       'a'
ASSIGN                    '='
VALUE                     '1'
KEY                       'b'
ASSIGN                    '='
VALUE                     '2'
SECTION                   '[section s2]'
KEY                       'c'
ASSIGN                    '='
VALUE                     '3'
COMMENT                   '; Comment right side'
KEY                       'd'
ASSIGN                    '='
VALUE                     'e'
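The mode switching can also be mimicked without ANTLR, which may help to see what pushMode and popMode do. The following is a rough plain-Python sketch, not the generated lexer: it tries the rules of the active mode in order and takes the first match (ANTLR prefers the longest match), and its KEY pattern excludes "=" so that first-match ordering stays simple. The table layout, action names, and function name are assumptions made for this illustration.

```python
import re

# Simplified stand-in for the ANTLR lexer: each mode is an ordered list of
# (token name, regex, action). Token names mirror the grammar; the actions
# "push_value", "pop" and "skip" are labels invented for this sketch.
DEFAULT_MODE = [
    ("SECTION", re.compile(r"\[[^\]]+\]"), None),
    ("COMMENT", re.compile(r";[^\r\n]*"), None),
    ("ASSIGN", re.compile(r"="), "push_value"),
    ("SPACES", re.compile(r"[ \t\r\n]+"), "skip"),
    ("KEY", re.compile(r"[^ \t\r\n=]+"), None),
]
VALUE_MODE = [
    ("SPACES", re.compile(r"[ \t]+"), "skip"),
    ("COMMENT", re.compile(r";[^\r\n]*"), None),
    ("NL", re.compile(r"[\r\n]+"), "pop"),
    ("VALUE", re.compile(r"[^ \t\r\n]+"), None),
]


def tokenize(text):
    """Return (type, text) pairs, switching rule sets as modes change."""
    modes = [DEFAULT_MODE]  # mode stack; the last entry is the active mode
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern, action in modes[-1]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            # Mirrors the UNRECOGNIZED catch-all rule: consume one character.
            tokens.append(("UNRECOGNIZED", text[pos]))
            pos += 1
            continue
        pos = match.end()
        if action == "push_value":
            modes.append(VALUE_MODE)  # after '=', lex the rest as a value
        elif action == "pop":
            modes.pop()  # a line break ends the value
        if action in ("skip", "pop"):
            continue  # whitespace and line breaks emit no token
        tokens.append((name, match.group()))
    return tokens


sample = "; top\n[section s1]\na = 1\nc = 3 ; right side\n"
for token_type, token_text in tokenize(sample):
    print("{0:<25} '{1}'".format(token_type, token_text))
```

The same word is classified as KEY or VALUE purely by which mode is active when it is read, which is the context sensitivity the highlighter needs.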
And an accompanying parser grammar could look like this:
parser grammar IniParser;
options {
  tokenVocab=IniLexer;
}
sections
 : section* EOF
 ;
section
 : COMMENT
 | SECTION section_atom*
 ;
section_atom
 : COMMENT
 | KEY ASSIGN VALUE
 ;
which would parse your example input in the following parse tree:
I already implemented something like this in C++.
https://github.com/tora-tool/tora/blob/master/src/editor/tosqltext.cpp
I sub-classed the QScintilla class and implemented a custom lexer based on ANTLR-generated source.
You might even use the ANTLR parser (I did not); QScintilla allows you to have more than one analyzer (with different weights), so you can periodically perform some semantic checks on the text. What cannot be done easily in QScintilla is associating a token with some additional data.
Syntax highlighting in Scintilla is done by dedicated highlighter classes, which are lexers. A parser is not well suited for this kind of work, because syntax highlighting must work even if the input contains errors. A parser is a tool to verify the correctness of the input; these are two totally different tasks.
So I recommend you stop thinking about using ANTLR4 for that and just take one of the existing Lex classes and create a new one for the language you want to highlight.

Trying to add special character (%) sign to variable with following concatenation sign in ESQL, it gives me the error below

I am trying to append the special character (%) to a variable using the concatenation operator, but it gives me the error: Invalid characters.
DECLARE Percent CHARACTER CAST ( ' %' AS CHARACTER CCSID 1208);
SET AlocatedAmount = 45
SET InPercent = AlocatedAmount||'%'
Result should be: InPercent = 45%
Error:Invalid characters::45 %
What's going wrong here?
AlocatedAmount seems to be an INTEGER, on which you cannot use the concatenation operator.
You need to cast that to CHARACTER first:
SET InPercent = CAST(AlocatedAmount AS CHARACTER) || '%';
There is also the option of using FORMAT in your CAST:
DECLARE Num INTEGER;
DECLARE FormattedStr CHAR;
SET Num = 45;
SET FormattedStr = CAST(Num AS CHAR FORMAT '#0%');
More information can be found at https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ak05615_.htm

Lua pattern replace uppercase letters

I need a special Lua pattern that takes all the uppercase letters in a string and replaces each with a space and the respective lowercase letter:
TestStringOne => test string one
this isA TestString => this is a test string
Can it be done?
Assuming only ASCII is used, this works:
function lowercase(str)
    return (str:gsub("%u", function(c) return ' ' .. c:lower() end))
end
print(lowercase("TestStringOne"))
print(lowercase("this isA TestString"))
function my(s)
    s = s:gsub('(%S)(%u)', '%1 %2'):lower()
    return s
end
print(my('TestStringOne')) -->test string one
print(my('this isA TestString')) -->this is a test string
