How to end the tokenization of a sequence in the Flex tool?

I am using the Flex (fast lexical analyser generator) tool. I have defined my regexes as follows:
value [0-9]|[1-9][0-9]*
id [a-zA-Z][a-zA-Z0-9]*
plus "+"
...
I have a few more keywords and operators defined like those. Here is one sample output that helps you understand my problem:
> 123
VALUE: 123 (123)
> name
IDENTIFIER: name
> 1230
VALUE: 1230 (1230)
> 0123
VALUE: 0 (0)
VALUE: 123 (123)
> 123x
VALUE: 123 (123)
IDENTIFIER: x
> x+
IDENTIFIER: x
OP_PLUS: +
As soon as it fits a token to a proper class, it finishes and goes back to the start state of the DFA, which is exactly what a lexer should do. But I don't know how to change this behaviour in Flex.
I believe the regex for numbers with leading zeros is fine, yet it goes wrong for the same reason. The output I expect is:
> (+ x 3)
OPEN_P: (
PLUS: +
IDENTIFIER: x
VALUE: 3 (3)
CLOSE_P: )
> 0123
SYNTAX ERROR
> 123X
SYNTAX ERROR
> 123+
SYNTAX ERROR
I don't want the sequence 123x to be shown like this:
VALUE: 123
ID: x
Instead I want to get a syntax error, because 123x is not a valid sequence for me. Neither is 0123, 123+, and so on.
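One way to picture the desired behaviour, sketched in Ruby rather than Flex: require that a value or identifier is followed by whitespace, a parenthesis, or the end of the input, and report a syntax error as soon as no rule matches. The delimiter set, the rule table, and the `tokenize` helper are assumptions made for illustration; none of them come from the question. In Flex itself a similar effect can be approximated with trailing context (`pattern/lookahead`) or with an extra rule that matches malformed tokens and reports the error.

```ruby
require "strscan"

# Assumption: a VALUE or IDENTIFIER must be followed by whitespace,
# a parenthesis, or the end of the input.
BOUNDARY = '(?=[\s()]|\z)'

# Hypothetical rule table; the labels mirror the question's output.
RULES = [
  [:VALUE,      /(?:0|[1-9][0-9]*)#{BOUNDARY}/],
  [:IDENTIFIER, /[a-zA-Z][a-zA-Z0-9]*#{BOUNDARY}/],
  [:PLUS,       /\+/],
  [:OPEN_P,     /\(/],
  [:CLOSE_P,    /\)/]
].freeze

# Returns an array of [type, text] pairs, or :syntax_error if any
# part of the line cannot be matched by a rule.
def tokenize(line)
  scanner = StringScanner.new(line)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)                      # ignore whitespace
    rule = RULES.find { |_, pattern| scanner.match?(pattern) }
    return :syntax_error unless rule                 # nothing fits here
    tokens << [rule[0], scanner.scan(rule[1])]
  end
  tokens
end
```

With these rules, `tokenize("(+ x 3)")` yields the five tokens from the expected output, while `tokenize("0123")`, `tokenize("123X")` and `tokenize("123+")` each return `:syntax_error`, because the lookahead refuses to let a number end in the middle of a longer run of characters.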

Related

Properly matching a set of tokens against my BASIC grammar

I am working on writing a BASIC interpreter in Prolog. DCGs are a little tricky, which is why I am having trouble here today.
Here is my grammar.
bool --> [true].
bool --> [false].
is_num_char(AD) :- AD = '.'; (atom_codes(AD, [D]), D >= 48, D =< 57).
number --> [].
number --> is_num_char, number.
quotes_on_atom(S) :- atom_chars(S, ['"' | C]), last(C, '"').
string --> quotes_on_atom.
literal --> bool; number; string.
variable --> \+ literal.
assignment --> string, ['='], expr.
equal --> expr, ['=='], expr.
not_equal --> expr, ['!='], expr.
if --> [if], expr, [then], expr.
for_decl --> [for], assignment, [to], number, [step], number.
for_next --> [next], integer.
expr --> literal; variable; assignment;
equal; not_equal; if; for_decl; for_next.
Here is my main goal:
main :-
% expected: [for, [i, '=', 0], to, '5', step, '1']
trace, phrase(expr, [for, i, '=', '0', to, '5', step, '1']).
Here is the error I get:
uncaught exception: error(existence_error(procedure,is_num_char/2),number/0).
A stack trace revealed this:
The debugger will first creep -- showing everything (trace)
1 1 Call: phrase(expr,[for,i,=,'0',to,'5',step,'1']) ?
2 2 Call: expr([for,i,=,'0',to,'5',step,'1'],_335) ?
3 3 Call: literal([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Call: bool([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Fail: bool([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Call: number([for,i,=,'0',to,'5',step,'1'],_335) ?
4 4 Exit: number([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
3 3 Exit: literal([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
2 2 Exit: expr([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
2 2 Redo: expr([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
3 3 Redo: literal([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
4 4 Redo: number([for,i,=,'0',to,'5',step,'1'],[for,i,=,'0',to,'5',step,'1']) ?
5 5 Call: is_num_char([for,i,=,'0',to,'5',step,'1'],_446) ?
5 5 Exception: is_num_char([for,i,=,'0',to,'5',step,'1'],_459) ?
4 4 Exception: number([for,i,=,'0',to,'5',step,'1'],_335) ?
3 3 Exception: literal([for,i,=,'0',to,'5',step,'1'],_335) ?
2 2 Exception: expr([for,i,=,'0',to,'5',step,'1'],_335) ?
1 1 Exception: phrase(expr,[for,i,=,'0',to,'5',step,'1']) ?
uncaught exception: error(existence_error(procedure,is_num_char/2),number/0)
{trace}
It seems that is_num_char is being passed the whole list of tokens, along with something else. I do not understand why this is happening, given that the number rule accepts only one argument. Additionally, it's odd that the token list unifies with number to begin with. It should be unified with for_decl instead. If you are knowledgeable about Prolog DCGs please let me know what I am doing wrong here.

How do I create a DCG rule inverse to another in Prolog?

I am writing a Commodore BASIC interpreter in Prolog, and I am writing some DCGs to parse it. I have verified the DCGs below to work except for the variable one. My goal is this: for anything which isn't a boolean, integer, float, or a string, it's a variable. However, anything that I give it via phrase just results in no.
bool --> [true].
bool --> [false].
integer --> [1]. % how to match nums?
float --> [0.1].
string --> [Str], {atom_chars(Str, ['"' | Chars]), last(Chars, '"')}.
literal --> bool; integer; float; string.
variable --> \+ literal.
I ran a stack trace like this (with gprolog)
main :- trace, phrase(variable, [bar]).
Looking at this, I cannot figure out why variable fails, given that it fails for each case in literal. I'm guessing that the error is pretty simple, but I'm still stumped, so does anyone who's good at Prolog have an idea of what I'm doing wrong?
| ?- main.
The debugger will first creep -- showing everything (trace)
1 1 Call: phrase(variable,[bar]) ?
2 2 Call: variable([bar],_321) ?
3 3 Call: \+literal([bar],_348) ?
4 4 Call: literal([bar],_348) ?
5 5 Call: bool([bar],_348) ?
5 5 Fail: bool([bar],_348) ?
5 5 Call: integer([bar],_348) ?
5 5 Fail: integer([bar],_348) ?
5 5 Call: float([bar],_348) ?
5 5 Fail: float([bar],_348) ?
5 5 Call: string([bar],_348) ?
6 6 Call: atom_chars(bar,['"'|_418]) ?
6 6 Fail: atom_chars(bar,['"'|_418]) ?
5 5 Fail: string([bar],_348) ?
4 4 Fail: literal([bar],_348) ?
3 3 Exit: \+literal([bar],_348) ?
2 2 Exit: variable([bar],[bar]) ?
1 1 Fail: phrase(variable,[bar]) ?
(2 ms) no
{trace}
To expand a bit on the other answer, the key problem is that a DCG rule like \+ literal does not consume items from the input. It only checks that the next item, if any, is not a literal.
To actually consume an item, you need to use a list goal, similarly to how you use a goal [1] to consume a 1 element. So:
variable -->
    \+ literal,   % if there is a next element, it's not a literal
    [_Variable].  % consume this next element, which is a variable
For example:
?- phrase(variable, [bar]).
true.
?- phrase((integer, variable, float), [I, bar, F]).
I = 1,
F = 0.1.
Having that singleton variable _Variable is a bit strange -- when you parse like this, you lose the name of the variable. When your parser is expanded a bit, you will want to use arguments to your DCG rules to communicate information out of the rules:
variable(Variable) -->
    \+ literal,
    [Variable].
For example:
?- phrase((integer, variable(Var1), float, variable(Var2)), [I, bar, F, foo]).
Var1 = bar,
Var2 = foo,
I = 1,
F = 0.1.
You can detect a string of integers like this (I've added an argument to collect the digits):
integer([H|T]) --> digit(H), integer(T).
integer([]) --> [].
digit(0) --> "0".
digit(1) --> "1".
...
digit(9) --> "9".
As for variable -- it needs to consume text, so you'd want something similar to integer above, but change digit(H) to something that recognizes a character that's part of a "variable".
If you want further clues (although sometimes using slightly advanced tricks): https://www.swi-prolog.org/pldoc/man?section=basics and the code is here: https://github.com/SWI-Prolog/swipl-devel/blob/master/library/dcg/basics.pl

Does GNU FORTH have an editor?

Chapter 3 of Starting FORTH says,
Now that you've made a block "current", you can list it by simply typing the word L. Unlike LIST, L does not want to be preceded by a block number; instead it lists the current block.
When I run 180 LIST, I get
Screen 180 not modified
0
...
15
ok
But when I run L, I get an error
:30: Undefined word
>>>L<<<
Backtrace:
$7F0876E99A68 throw
$7F0876EAFDE0 no.extensions
$7F0876E99D28 interpreter-notfound1
What am I doing wrong?
Yes, gForth supports an internal (BLOCK) editor. Start gforth
type: use blocked.fb (a demo page)
type: 1 load
type editor
words will show the editor words,
s b n bx nx qx dl il f y r d i t 'par 'line 'rest c a m ok
type 0 l to list screen 0 which describes the editor,
Screen 0 not modified
0 \ some comments on this simple editor 29aug95py
1 m marks current position a goes to marked position
2 c moves cursor by n chars t goes to line n and inserts
3 i inserts d deletes marked area
4 r replaces marked area f search and mark
5 il insert a line dl delete a line
6 qx gives a quick index nx gives next index
7 bx gives previous index
8 n goes to next screen b goes to previous screen
9 l goes to screen n v goes to current screen
10 s searches until screen n y yank deleted string
11
12 Syntax and implementation style a la PolyFORTH
13 If you don't like it, write a block editor mode for Emacs!
14
15
ok
Creating your own block file
To create your own new block file myblocks.fb
type: use blocked.fb
type: 1 load
type editor
Then
type use myblocks.fb
1 load will show BLOCK #1 (lines 0 to 15: 16 lines of 64 characters each)
1 t will highlight line 1
Type i this is text to [i]nsert into line 1
After the current BLOCK is edited type flush in order to write BLOCK #1 to the file myblocks.fb
For more information, see gForth Blocks
It turns out these are "Editor Commands". The book says:
For Those Whose EDITOR Doesn't Follow These Rules
The FORTH-79 Standard does not specify editor commands. Your system may use a different editor; if so, check your systems documentation
I don't believe gforth supports an internal editor at all. So L, T, I, P, F, E, D, R are all presumably unsupported.
gforth is well integrated with emacs. In my xemacs here, by default any file called *.fs is considered FORTH source. "C-h m", as usual, gives the available commands.
No, GNU Forth doesn't have an internal editor; I use Vim :)

What is wrong with my attempt to concatenate within an if statement

I'm trying to concatenate leading 0s to the hundreds place.
001 ones
010 tens
100 hundreds
for i = 1 to 100
    let x =
        if i < 10 then sprintf "Hello World 00%i" i
        elif (i >= 10) && (i < 100) then sprintf "Hello World 0%i" i
Squigglies on the elif: the expression was expected to have type unit, but it has type string.
The problem is that an if without an else must have type unit. That is, if you want your if to have a meaningful value (such as your concatenated string), it must have an else.
If you're wondering why, just ask yourself this: What would the value of x be when i is 100 or greater?

Ruby-on-rails evaluation of mathematical expression

In Ruby-on-rails, I am receiving input from a call to an XL macro (currently hard coded), which places a mathematical expression in the spreadsheet. If I call the macro, I receive a worksheet with an expression like this in one of the cells:
x + ( 3 / 12)
In the R-O-R application I wish to take this expression and evaluate for different values of x.
row.each do |row|
y = row
end
I want to find the value of y for, say, x = 2. Should I receive this expression as a literal?
There is no built-in function to do this securely. You need a math parser and evaluator. You can write one yourself or you could use an existing one like Dentaku.
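To give an idea of what "write one yourself" could look like, here is a minimal recursive-descent sketch. `TinyCalc` and its methods are hypothetical names made up for this example; they are not part of Rails or Dentaku. It only understands numbers, named variables, parentheses and the four basic operators (no unary minus), and it deliberately does all arithmetic in floats.

```ruby
require "strscan"

# Hypothetical helper: evaluates +, -, *, / and parentheses without eval.
class TinyCalc
  def self.evaluate(source, vars = {})
    calc = new(source, vars)
    value = calc.expr
    raise ArgumentError, "unexpected trailing input" unless calc.done?
    value
  end

  def initialize(source, vars)
    @s = StringScanner.new(source)
    @vars = vars                      # e.g. { "x" => 2 }
  end

  def done?
    @s.skip(/\s*/)
    @s.eos?
  end

  # expr := term (('+' | '-') term)*
  def expr
    value = term
    while (op = token(/[+-]/))
      rhs = term
      value = op == "+" ? value + rhs : value - rhs
    end
    value
  end

  # term := factor (('*' | '/') factor)*
  def term
    value = factor
    while (op = token(%r{[*/]}))
      rhs = factor
      value = op == "*" ? value * rhs : value / rhs
    end
    value
  end

  # factor := number | variable | '(' expr ')'
  def factor
    if (num = token(/\d+(?:\.\d+)?/))
      num.to_f
    elsif (name = token(/[a-z]\w*/i))
      Float(@vars.fetch(name))        # unknown names raise KeyError
    elsif token(/\(/)
      value = expr
      token(/\)/) or raise ArgumentError, "missing )"
      value
    else
      raise ArgumentError, "unexpected input at position #{@s.pos}"
    end
  end

  private

  # Skips leading whitespace, then tries to consume the pattern.
  def token(pattern)
    @s.skip(/\s*/)
    @s.scan(pattern)
  end
end
```

For the expression from the question, `TinyCalc.evaluate("x + ( 3 / 12)", "x" => 2)` gives 2.25. Anything that is not a number, a known variable, an operator or a parenthesis raises an error instead of being executed.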
eval and gsub will get you most of the way there. Fire up irb:
(533)⚡️ irb
2.1.2 :001 > exp = 'x + (3 / 12)'
=> "x + (3 / 12)"
2.1.2 :002 > eval(exp.gsub(/x/, '25'))
=> 25
2.1.2 :003 > exp = 'x + (4.0 / 25.0) + 4'
=> "x + (4.0 / 25.0) + 4"
2.1.2 :004 > eval(exp.gsub(/x/, '25'))
=> 29.16
Notice the result of command 002. Ruby treats 3 / 12 as integer division, so that subexpression evaluates to 0 and the result is 25. In 004 floating-point math occurs because the operands are written as decimals. This may be an issue you need to tackle more creatively, or you can just make sure your expressions conform to the type of math you need.
Be aware of the security concerns with eval: you're executing Ruby code in there, so mean people may put mean things in the expression to try to get them executed.
Why not write a one-line function, as follows:
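The integer-versus-float behaviour is easy to check on its own. A small sketch (the variable names are only for illustration):

```ruby
int_result   = 3 / 12           # both operands are Integers => 0
float_result = 3.0 / 12         # one Float operand is enough => 0.25
fdiv_result  = 3.fdiv(12)       # Integer#fdiv always divides as floats => 0.25
mixed_result = 2.0 + (3 / 12)   # the inner division is still integer => 2.0

puts int_result, float_result, fdiv_result, mixed_result
```

The last line is the one to watch: substituting a float for x does not help, because 3 / 12 is evaluated on its own before the addition.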
def foo(x) x + (3 / 12) end
Now you can use this to calculate any value of x, for x = 2, you can do: foo(2) or foo 2.
