Properly matching a set of tokens against my BASIC grammar - parsing

I am working on writing a BASIC interpreter in Prolog. DCGs are a little tricky, which is why I am having trouble here today.
Here is my grammar.
bool --> [true].
bool --> [false].
is_num_char(AD) :- AD = '.'; (atom_codes(AD, [D]), D >= 48, D =< 57).
number --> [].
number --> is_num_char, number.
quotes_on_atom(S) :- atom_chars(S, ['"' | C]), last(C, '"').
string --> quotes_on_atom.
literal --> bool; number; string.
variable --> \+ literal.
assignment --> string, ['='], expr.
equal --> expr, ['=='], expr.
not_equal --> expr, ['!='], expr.
if --> [if], expr, [then], expr.
for_decl --> [for], assignment, [to], number, [step], number.
for_next --> [next], integer.
expr --> literal; variable; assignment;
    equal; not_equal; if; for_decl; for_next.
Here is my main goal:
main :-
    % expected: [for, [i, '=', 0], to, '5', step, '1']
    trace, phrase(expr, [for, i, '=', '0', to, '5', step, '1']).
Here is the error I get:
uncaught exception: error(existence_error(procedure,is_num_char/2),number/0).
A stack trace revealed this:
The debugger will first creep -- showing everything (trace)
1 1 Call: phrase(expr,[for,i,=,'0',to,'5',step,'1']) ?
2 2 Call: expr([for,i,=,'0',to,'5',step,'1'],_335) ?
3 3 Call: literal([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Call: bool([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Fail: bool([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Call: number([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Exit: number([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
3 3 Exit: literal([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
2 2 Exit: expr([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
2 2 Redo: expr([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
3 3 Redo: literal([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
4 4 Redo: number([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
5 5 Call: is_num_char([for,i,=,'0',to,'5',step,'1'],_446) ?
5 5 Exception: is_num_char([for,i,=,'0',to,'5',step,'1'],_459) ?
4 4 Exception: number([for,i,=,'0',to,'5',step,'1'],_335) ?
3 3 Exception: literal([for,i,=,'0',to,'5',step,'1'],_335) ?
2 2 Exception: expr([for,i,=,'0',to,'5',step,'1'],_335) ?
1 1 Exception: phrase(expr,[for,i,=,'0',to,'5',step,'1']) ?
uncaught exception: error(existence_error(procedure,is_num_char/2),number/0)
{trace}
It seems that is_num_char is being passed the whole list of tokens, along with something else. I do not understand why this is happening, given that the number rule accepts only one argument. Additionally, it's odd that the token list unifies with number to begin with. It should be unified with for_decl instead. If you are knowledgeable about Prolog DCGs please let me know what I am doing wrong here.

Related

How do I create a DCG rule inverse to another in Prolog?

I am writing a Commodore BASIC interpreter in Prolog, and I am writing some DCGs to parse it. I have verified the DCGs below to work except for the variable one. My goal is this: for anything which isn't a boolean, integer, float, or a string, it's a variable. However, anything that I give it via phrase just results in no.
bool --> [true].
bool --> [false].
integer --> [1]. % how to match nums?
float --> [0.1].
string --> [Str], {atom_chars(Str, ['"' | Chars]), last(Chars, '"')}.
literal --> bool; integer; float; string.
variable --> \+ literal.
I ran a stack trace like this (with gprolog)
main :- trace, phrase(variable, [bar]).
Looking at this, I cannot figure out why variable fails, given that it fails for each case in literal. I'm guessing that the error is pretty simple, but I'm still stumped, so does anyone who's good at Prolog have an idea of what I'm doing wrong?
| ?- main.
The debugger will first creep -- showing everything (trace)
1 1 Call: phrase(variable,[bar]) ?
2 2 Call: variable([bar],_321) ?
3 3 Call: \+literal([bar],_348) ?
4 4 Call: literal([bar],_348) ?
5 5 Call: bool([bar],_348) ?
5 5 Fail: bool([bar],_348) ?
5 5 Call: integer([bar],_348) ?
5 5 Fail: integer([bar],_348) ?
5 5 Call: float([bar],_348) ?
5 5 Fail: float([bar],_348) ?
5 5 Call: string([bar],_348) ?
6 6 Call: atom_chars(bar,['"'|_418]) ?
6 6 Fail: atom_chars(bar,['"'|_418]) ?
5 5 Fail: string([bar],_348) ?
4 4 Fail: literal([bar],_348) ?
3 3 Exit: \+literal([bar],_348) ?
2 2 Exit: variable([bar],[bar]) ?
1 1 Fail: phrase(variable,[bar]) ?
(2 ms) no
{trace}
To expand a bit on the other answer, the key problem is that a DCG rule like \+ literal does not consume items from the input. It only checks that the next item, if any, is not a literal.
To actually consume an item, you need to use a list goal, similarly to how you use a goal [1] to consume a 1 element. So:
variable -->
    \+ literal,   % if there is a next element, it's not a literal
    [_Variable].  % consume this next element, which is a variable
For example:
?- phrase(variable, [bar]).
true.
?- phrase((integer, variable, float), [I, bar, F]).
I = 1,
F = 0.1.
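The difference between checking and consuming can also be sketched outside Prolog. The Python below is only an analogy, not a translation of the DCG: each "rule" is a function that takes a token list and returns the remaining tokens, or None on failure. The names literal, variable_lookahead_only, variable and phrase are hypothetical helpers written for this illustration.

```python
def literal(tokens):
    """Return the remaining tokens if the first token is a literal, else None."""
    if not tokens:
        return None
    head = tokens[0]
    is_bool = head in ("true", "false")
    is_number = isinstance(head, (int, float)) and not isinstance(head, bool)
    is_string = (isinstance(head, str) and len(head) >= 2
                 and head[0] == '"' and head[-1] == '"')
    return tokens[1:] if (is_bool or is_number or is_string) else None


def variable_lookahead_only(tokens):
    # Analogue of `variable --> \+ literal.`: it checks, but consumes nothing.
    return tokens if literal(tokens) is None else None


def variable(tokens):
    # Analogue of `variable --> \+ literal, [_Variable].`: check, then consume one token.
    if literal(tokens) is None and tokens:
        return tokens[1:]
    return None


def phrase(rule, tokens):
    # Like phrase/2: succeed only if the rule leaves no input behind.
    return rule(tokens) == []


print(phrase(variable_lookahead_only, ["bar"]))  # False: "bar" is left unconsumed
print(phrase(variable, ["bar"]))                 # True: "bar" is consumed
```

The first call fails for the same reason the trace above ends in Fail: the rule itself succeeds, but one token is still left in the input.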
Having that singleton variable _Variable is a bit strange -- when you parse like this, you lose the name of the variable. When your parser is expanded a bit, you will want to use arguments to your DCG rules to communicate information out of the rules:
variable(Variable) -->
    \+ literal,
    [Variable].
For example:
?- phrase((integer, variable(Var1), float, variable(Var2)), [I, bar, F, foo]).
Var1 = bar,
Var2 = foo,
I = 1,
F = 0.1.
You can detect a sequence of digits like this (I've added an argument to collect the digits):
integer([H|T]) --> digit(H), integer(T).
integer([]) --> [].
digit(0) --> "0".
digit(1) --> "1".
...
digit(9) --> "9".
As for variable -- it needs to consume text, so you'd want something similar to integer above, but change digit(H) to something that recognizes a character that's part of a "variable".
If you want further clues (although sometimes using slightly advanced tricks): https://www.swi-prolog.org/pldoc/man?section=basics and the code is here: https://github.com/SWI-Prolog/swipl-devel/blob/master/library/dcg/basics.pl

Lua odd MIN Integer number

Problem (Tested on Lua 5.3 and 5.4):
a = -9223372036854775807 - 1 ==> -9223372036854775808 (lua_integer)
b = -9223372036854775808 ==> -9.2233720368548e+018 (lua_number)
Question:
Is it possible to get "-9223372036854775808" without modifying "luaconf.h" or writing "-9223372036854775807 - 1"?
When you write b = -9223372036854775808 in your program, the Lua parser treats it as the negation operator applied to a positive integer constant. That positive constant is outside the integer range, so it is read as a float; the negation is then applied to the float, and the final result is a float.
There are two solutions to get minimal integer:
Bitwise operators convert floats to integers (bitwise OR has lower priority than negation):
b = -9223372036854775808|0
Use the special constant from math library:
b = math.mininteger
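The reasoning above can be modelled in a few lines of Python. This is a rough sketch of the described behaviour, not Lua's actual implementation: read_numeral and to_integer are hypothetical helpers that stand in for how the numeral is read and for the float-to-integer conversion a bitwise operator performs.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def read_numeral(text):
    """Model of reading an unsigned decimal numeral: integer if it fits, else float."""
    value = int(text)
    return value if value <= INT64_MAX else float(value)


def to_integer(value):
    """Model of the conversion applied to operands of a bitwise operator."""
    if isinstance(value, int):
        return value
    if value.is_integer() and INT64_MIN <= value <= INT64_MAX:
        return int(value)
    raise ValueError("number has no integer representation")


# a = -9223372036854775807 - 1: the numeral fits, so everything stays an integer.
a = -read_numeral("9223372036854775807") - 1

# b = -9223372036854775808: the numeral overflows, so negation sees a float.
b = -read_numeral("9223372036854775808")

# b = -9223372036854775808|0: the OR converts the float back to an integer.
c = to_integer(b) | 0

print(type(a).__name__, type(b).__name__, type(c).__name__)
```

In this model a and c are the same integer, while b stays a float, which matches the two results shown in the question.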
P.S.
Please note that additional OR in the expression b = -9223372036854775808|0 does not make your program slower. Actually, all calculations (negation and OR) are done at compile time, and the bytecode contains only the final constant you need:
$ luac53 -l -l -p -
b = -9223372036854775808|0
main <stdin:0,0> (2 instructions at 0x244f780)
0+ params, 2 slots, 1 upvalue, 0 locals, 2 constants, 0 functions
1 [1] SETTABUP 0 -1 -2 ; _ENV "b" -9223372036854775808
2 [1] RETURN 0 1
constants (2) for 0x244f780:
1 "b"
2 -9223372036854775808
locals (0) for 0x244f780:
upvalues (1) for 0x244f780:
0 _ENV 1 0

How to end the tokenization of a sequence in Flex tool?

I am using the Flex (fast lexical analyser generator) tool. I have defined my regexes as follows:
value [0-9]|[1-9][0-9]*
id [a-zA-Z][a-zA-Z0-9]*
plus "+"
...
I have a few more keywords and operators defined like those. Here is one sample output that helps you understand my problem:
> 123
VALUE: 123 (123)
> name
IDENTIFIER: name
> 1230
VALUE: 1230 (1230)
> 0123
VALUE: 0 (0)
VALUE: 123 (123)
> 123x
VALUE: 123 (123)
IDENTIFIER: x
> x+
IDENTIFIER: x
OP_PLUS: +
As soon as the input fits a token class, the scanner emits the token and returns to the start state of the DFA. That is exactly what it is supposed to do, but I don't know how to handle my case in Flex.
I believe the regex for numbers with leading zeros works fine, yet it goes wrong for the same reason. This is the output I expect:
> (+ x 3)
OPEN_P: (
PLUS: +
IDENTIFIER: x
VALUE: 3 (3)
CLOSE_P: )
> 0123
SYNTAX ERROR
> 123X
SYNTAX ERROR
> 123+
SYNTAX ERROR
I don't want the sequence 123x to be tokenized like this:
VALUE: 123
ID: x
Instead I want a syntax error, because 123x is not a valid sequence for me. Neither is 0123, 123+, and so on.
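The behaviour described in the question, where the scanner takes the longest match and then restarts, can be reproduced with a small Python sketch. It is only a model of a longest-match scanner and not Flex itself. The tokenize function and the TOKEN_SPEC table are hypothetical names, and the alternatives in the value pattern are reordered because Python's re module takes the first alternative that matches rather than the longest one.

```python
import re

# Patterns taken from the question; names follow the sample output.
TOKEN_SPEC = [
    ("VALUE", re.compile(r"[1-9][0-9]*|[0-9]")),
    ("IDENTIFIER", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("OP_PLUS", re.compile(r"\+")),
]


def tokenize(text):
    """Longest-match scanning: pick the longest match, emit it, start again."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_match = None, None
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            # Strict ">" keeps the earlier rule on ties, as a rule list would.
            if match and (best_match is None or match.end() > best_match.end()):
                best_name, best_match = name, match
        if best_match is None:
            raise ValueError("no rule matches at position %d" % pos)
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens


for sample in ["123", "0123", "123x", "x+"]:
    print(sample, "->", tokenize(sample))
```

Nothing in this loop looks at what follows a token, so 123x is split into a value and an identifier, and 0123 into two values, just as in the sample output above.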

Printing k, v from a defaultdict(list)

I have a program that connects via SSH (Paramiko library) to a Cisco Wireless LAN Controller (WLC). I then run a 'show client summary' and parse/process the output to generate a report.
Everything works except the printing.
NOTE: 'e' is a dictionary created with: defaultdict(list)
If I use this:
for k, v in e.items():
    print('{:25}'.format(k), end='')
    for i in v:
        print('{:5}'.format(i), end='')
    print("\n")
The output looks like this:
AP Count
------------------------------
AP0027.e3f1.9208 8 7 6
AP70df.2f42.3450 1 1 1
AP25-AthleticOffice 4 4 3
AP70df.2f74.9868 1 1 1
AP70df.2f42.3174 2 2 2
I don't want the extra blank line between the data lines.
But if I simply get rid of the last line: print("\n"),
then I get this format for the output:
AP0027.e3f1.9208 8 7 6AP70df.2f42.3450 1 1 1AP25-AthleticOffice 4 4 3AP70df.2f42.3174 1 1 1AP70df.2f42.3174 2 2 2
No carriage return.
I am either getting zero carriage return or two.
This happens because print() already appends the end character, which is \n by default, so print("\n") writes two newlines. You can fix it by printing just an empty string (which is the same as print('', end='\n')):
for k, v in e.items():
    print('{:25}'.format(k), end='')
    for i in v:
        print('{:5}'.format(i), end='')
    print('')
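A self-contained way to check this is to capture the output and count the newlines. The sketch below uses two rows from the question as sample data. The report function and the buf and output names are made up for the illustration, and print() with no arguments is used as an equivalent of print('').

```python
import io
from collections import defaultdict
from contextlib import redirect_stdout

# Sample data copied from the output shown in the question.
e = defaultdict(list)
e["AP0027.e3f1.9208"].extend([8, 7, 6])
e["AP70df.2f42.3450"].extend([1, 1, 1])


def report(data):
    for k, v in data.items():
        print('{:25}'.format(k), end='')
        for i in v:
            print('{:5}'.format(i), end='')
        print()  # writes only the default end character: one newline per row


# Capture what report() prints so the newlines can be inspected.
buf = io.StringIO()
with redirect_stdout(buf):
    report(e)
output = buf.getvalue()

print(repr(output))
```

Each row now ends in a single "\n", so there are no blank lines between the data lines.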

Functional impact of declaring local variables via function parameters

In writing some one-off Lua code for an answer, I found myself code golfing to fit a function on a single line. While this code did not fit on one line...
foo=function(a,b) local c=bob; some_code_using_c; return c; end
...I realized that I could just make it fit by converting it to:
foo=function(a,b,c) c=bob; some_code_using_c; return c; end
Are there any performance or functional implications of using a function parameter to declare a function-local variable (assuming I know that a third argument will never be passed to the function) instead of using local? Do the two techniques ever behave differently?
Note: I included semicolons in the above for clarity of concept and to aid those who do not know Lua's handling of whitespace. I am aware that they are not necessary; if you follow the link above you will see that the actual code does not use them.
Edit: Based on @Oka's answer, I compared the bytecode generated by these two functions, in separate files:
function foo(a,b)
  local c
  return function() c=a+b+c end
end
function foo(a,b,c)
  -- this line intentionally blank
  return function() c=a+b+c end
end
Ignoring addresses, the byte code report is identical (except for the number of parameters listed for the function).
You can go ahead and look at the Lua bytecode generated by using luac -l -l -p my_file.lua, comparing instruction sets and register layouts.
On my machine:
function foo (a, b)
  local c = a * b
  return c + 2
end
function bar (a, b, c)
  c = a * b
  return c + 2
end
Produces:
function <f.lua:1,4> (4 instructions at 0x80048fe0)
2 params, 4 slots, 0 upvalues, 3 locals, 1 constant, 0 functions
1 [2] MUL 2 0 1
2 [3] ADD 3 2 -1 ; - 2
3 [3] RETURN 3 2
4 [4] RETURN 0 1
constants (1) for 0x80048fe0:
1 2
locals (3) for 0x80048fe0:
0 a 1 5
1 b 1 5
2 c 2 5
upvalues (0) for 0x80048fe0:
function <f.lua:6,9> (4 instructions at 0x800492b8)
3 params, 4 slots, 0 upvalues, 3 locals, 1 constant, 0 functions
1 [7] MUL 2 0 1
2 [8] ADD 3 2 -1 ; - 2
3 [8] RETURN 3 2
4 [9] RETURN 0 1
constants (1) for 0x800492b8:
1 2
locals (3) for 0x800492b8:
0 a 1 5
1 b 1 5
2 c 1 5
upvalues (0) for 0x800492b8:
Not very much difference, is there? If I'm not mistaken, there's just a slightly different declaration location specified for each c, and the difference in the params size, as one might expect.
