How do I check if this grammar is SLR(1)?
S' -> S
S -> [ B
A -> int
A -> [ B
B -> ]
B -> C
C -> A ]
C -> A , C
First I built its automaton, then computed the FOLLOW sets for the non-terminals, and then built the parsing table.
I'm not sure my automaton is correct, but I found no conflicts when filling in the SLR(1) parsing table.
Below is my attempt at the automaton.
I0:
S' -> .S
S -> .[B
I1 (I0 -> S):
S' -> S.
I2 (I0 -> [):
S -> [.B
B -> .]
B -> .C
C -> .A]
C -> .A,C
A -> .int
A -> .[B
I3 (I2 -> B)
S -> [B.
I4 (I2 -> ])
B -> ].
I5 (I2 -> C)
B -> C.
I6 (I2 -> A)
C -> A.]
C -> A.,C
I7 (I2 -> int)
A -> int.
I8 (I2 -> [)
A -> [.B
B -> .]
B -> .C
C -> .A]
C -> .A,C
A -> .int
A -> .[B
I8 -> ] = I4
I8 -> C = I5
I8 -> A = I6
I8 -> int = I7
I8 -> [ = I8
I9 (I6 -> ])
C -> A].
I10 (I6 -> ,)
C -> A,.C
C -> .A]
C -> .A,C
A -> .int
A -> .[B
I11 (I8 -> B)
A -> [B.
I12 (I10 -> C)
C -> A,C.
I10 -> A = I6
I10 -> int = I7
I10 -> [ = I8
Since you have already done the hard work of finding all the states, what remains is to check the SLR(1) rules.
https://en.wikipedia.org/wiki/SLR_grammar
Note that your question leaves out the FOLLOW sets, which the rules require.
You can spot conflicts just by looking at the parsing table, but that relies more on experience than on a systematic check.
Check the rules one by one and you will be fine :)
Assuming your states are correct, I see no shift/reduce or reduce/reduce conflicts, so the grammar is SLR(1).
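The rule-by-rule check can also be done mechanically. The following is only a sketch in Python, not part of the original answer: it rebuilds the LR(0) states from the grammar in the question, computes the FOLLOW sets, and reports every terminal on which a state would have more than one action. All function names are made up for illustration, and the FIRST/FOLLOW code assumes the grammar has no empty productions, which holds here.

```python
# Grammar from the question; production 0 is the augmented start rule.
GRAMMAR = [
    ("S'", ("S",)), ("S", ("[", "B")), ("A", ("int",)), ("A", ("[", "B")),
    ("B", ("]",)), ("B", ("C",)), ("C", ("A", "]")), ("C", ("A", ",", "C")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    # Assumption: no production is empty, so only the leading symbol matters.
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(first):
    follow = {n: set() for n in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 == len(rhs):            # sym ends the production
                    new = follow[lhs]
                elif rhs[i + 1] in NONTERMINALS:
                    new = first[rhs[i + 1]]
                else:
                    new = {rhs[i + 1]}
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

def closure(items):
    # An LR(0) item is a pair (production index, dot position).
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for p, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (p, 0) not in items:
                    items.add((p, 0))
                    work.append((p, 0))
    return frozenset(items)

def lr0_states():
    start = closure({(0, 0)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in symbols:
            moved = {(p, d + 1) for p, d in state
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
            nxt = closure(moved)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states

def slr_conflicts(states, follow):
    conflicts = []
    for state in states:
        actions = {}                              # terminal -> set of actions
        for p, d in state:
            lhs, rhs = GRAMMAR[p]
            if d == len(rhs):                     # complete item: reduce on FOLLOW(lhs)
                for t in follow[lhs]:
                    actions.setdefault(t, set()).add(("reduce", p))
            elif rhs[d] not in NONTERMINALS:      # dot before a terminal: shift
                actions.setdefault(rhs[d], set()).add("shift")
        conflicts += [(state, t, acts) for t, acts in actions.items() if len(acts) > 1]
    return conflicts

FOLLOW = follow_sets(first_sets())
STATES = lr0_states()
CONFLICTS = slr_conflicts(STATES, FOLLOW)
print(len(STATES), "states,", len(CONFLICTS), "conflicts")
```

For this grammar the sketch finds the same 13 states as the question (I0 to I12) and no conflicts. Every state that contains a complete item contains nothing else, so the FOLLOW sets are never needed to break a tie.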
I was writing an LL(1) parser for an expression grammar. I had the following grammar:
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> INT
However, this is left recursive and I removed the left recursion with the following grammar:
E -> INT E'
E' -> + INT E'
E' -> - INT E'
E' -> * INT E'
E' -> / INT E'
E' -> ε
If I was to have the expression 1 + 2 * 3, how would the parser know to evaluate the multiplication before the addition?
Try this:
; an expression is an addition
E -> ADD
; an addition is a multiplication that is optionally followed
; by +- and another addition
ADD -> MUL T
T -> PM ADD
T -> ε
PM -> +
PM -> -
; a multiplication is an integer that is optionally followed
; by */ and another multiplication
MUL -> INT G
G -> MD MUL
G -> ε
MD -> *
MD -> /
; an integer is a digit that is optionally followed by an integer
INT -> DIGIT J
J -> INT
J -> ε
; digits
DIGIT -> 0
DIGIT -> 1
DIGIT -> 2
DIGIT -> 3
DIGIT -> 4
DIGIT -> 5
DIGIT -> 6
DIGIT -> 7
DIGIT -> 8
DIGIT -> 9
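To see why the layered grammar gives multiplication priority, here is a minimal recursive-descent sketch in Python that follows the ADD/T and MUL/G rules directly. It is an illustration rather than part of the original answer: the names `tokenize`, `Parser` and `evaluate` are invented, integers are lexed as whole tokens instead of going through the DIGIT/J rules, and division uses Python's `/`.

```python
import re

def tokenize(text):
    # Assumption: integers become single tokens; whitespace is skipped.
    return re.findall(r"\d+|[-+*/]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def add(self):
        # ADD -> MUL T ;  T -> PM ADD | ε
        left = self.mul()
        if self.peek() in ("+", "-"):
            op = self.next()
            right = self.add()
            return left + right if op == "+" else left - right
        return left

    def mul(self):
        # MUL -> INT G ;  G -> MD MUL | ε
        left = int(self.next())
        if self.peek() in ("*", "/"):
            op = self.next()
            right = self.mul()
            return left * right if op == "*" else left / right
        return left

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.add()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(evaluate("1 + 2 * 3"))
```

Because `add` can only reach an operand through `mul`, the whole `2 * 3` is consumed before the addition is applied, so `evaluate("1 + 2 * 3")` gives 7. One caveat of this grammar shape: the rules are right-recursive, so `8 - 3 - 2` is grouped as `8 - (3 - 2)`. If left-to-right grouping matters, the T and G rules need to be handled with a loop instead of recursion.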
I have a string:
"abc abc abc abc"
How do I calculate the number of "abc" repetitions?
If you are looking for a practical and efficient implementation that scales well even for longer substrings, you can use binary:matches/2,3, which uses the Boyer–Moore string search algorithm (and Aho-Corasick for multiple substrings). It obviously works only for ASCII or Latin-1 strings.
repeats(L, S) -> length(binary:matches(list_to_binary(L), list_to_binary(S))).
If it is for educational purposes, you can write your own, less efficient version that works on lists of any kind. If you know the substring at compile time, you can use this very simple version, whose performance is not bad:
-define(SUBSTR, "abc").
repeats(L) -> repeats(L, 0).
repeats(?SUBSTR ++ L, N) -> repeats(L, N+1);
repeats([_|L] , N) -> repeats(L, N);
repeats([] , N) -> N.
If you don't know the substring in advance, you can write a slightly more complicated and less efficient version:
repeats(L, S) -> repeats(L, S, 0).
repeats([], _, N) -> N;
repeats(L, S, N) ->
    case prefix(L, S) of
        {found, L2} -> repeats(L2, S, N+1);
        nope -> repeats(tl(L), S, N)
    end.
prefix([H|T], [H|S]) -> prefix(T, S);
prefix( L, [ ]) -> {found, L};
prefix( _, _ ) -> nope.
And of course you can try writing a more sophisticated variant, such as a simplified Boyer–Moore for lists.
1> F = fun
F([],_,_,N) -> N;
F(L,P,S,N) ->
case string:sub_string(L,1,S) == P of
true -> F(tl(string:sub_string(L,S,length(L))),P,S,N+1);
_ -> F(tl(L),P,S,N)
end
end.
#Fun<erl_eval.28.106461118>
2> Find = fun(L,P) -> F(L,P,length(P),0) end.
#Fun<erl_eval.12.106461118>
3> Find("abc abc abc abc","abc").
4
4>
This works if defined in a module; in the shell it works only from R17 on, because it relies on named funs.
length(lists:filter(fun(X) -> X=="abc" end, string:tokens("abc abc abc abc", " "))).
I have trouble understanding how to compute the lookaheads.
Let's say that I have this extended (augmented) grammar:
S'-> S
S -> L=R | R
L -> *R | i
R -> L
I wrote state 0 like this:
S'-> .S, {$}
S -> .L=R, {$}
S -> .R, {$}
L -> .*R, {=,$}
L -> .i, {=,$}
R -> .L {=,$}
Using several parser simulators, I see that all of them say:
R -> .L {$}
Why? Can't the R be followed by a "="?
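One way to check the lookaheads is to run the LR(1) closure rule by hand or in code: for an item [A -> α.Bβ, a], every production B -> γ is added with each lookahead in FIRST(βa). The Python sketch below does this for state 0 only. It is an illustration with made-up names, and it assumes there are no empty productions (true for this grammar), so FIRST(βa) is just FIRST of the first symbol of β, or a itself when β is empty.

```python
# Augmented grammar from the question.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("i",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    # Assumption: no empty productions, so only the leading symbol matters.
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets()

def closure(items):
    # An LR(1) item is (production index, dot position, lookahead).
    items = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        beta = rhs[dot + 1:]
        if not beta:                      # nothing after B: inherit the lookahead
            lookaheads = {look}
        elif beta[0] in NONTERMINALS:
            lookaheads = FIRST[beta[0]]
        else:
            lookaheads = {beta[0]}
        for p, (lhs, _) in enumerate(GRAMMAR):
            if lhs != rhs[dot]:
                continue
            for b in lookaheads:
                if (p, 0, b) not in items:
                    items.add((p, 0, b))
                    work.append((p, 0, b))
    return items

def lookaheads_by_item(items):
    table = {}
    for prod, dot, look in items:
        table.setdefault((prod, dot), set()).add(look)
    return table

STATE0 = closure({(0, 0, "$")})
LOOKAHEADS = lookaheads_by_item(STATE0)
for (prod, dot), looks in sorted(LOOKAHEADS.items()):
    lhs, rhs = GRAMMAR[prod]
    print(lhs, "->", "." + " ".join(rhs), sorted(looks))
```

In state 0 the item R -> .L is produced only by S -> .R, {$}, where nothing follows R, so it inherits $ and nothing else. The "=" reaches L -> .*R and L -> .i through S -> .L=R, not through R. "=" does belong to FOLLOW(R), but an LR(1) lookahead describes what can follow in this particular context, not anywhere in the grammar.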
I am studying the RabbitMQ source code to learn Erlang techniques.
The following is from the rabbit_misc.erl file. Its purpose is to check an application's minimum version.
In the 5th and 7th clauses of version_compare/N there is a special character, $0, but I don't see how that case can ever occur.
My reasoning is that in the last clause, after lists:splitwith/2, ATl and BTl will start with $. (a dot).
version_compare(A, B, lte) ->
    case version_compare(A, B) of
        eq -> true;
        lt -> true;
        gt -> false
    end;
version_compare(A, B, gte) ->
    case version_compare(A, B) of
        eq -> true;
        gt -> true;
        lt -> false
    end;
version_compare(A, B, Result) ->
    Result =:= version_compare(A, B).
version_compare(A, A) ->
    eq;
version_compare([], [$0 | B]) ->
    version_compare([], dropdot(B));
version_compare([], _) ->
    lt; %% 2.3 < 2.3.1
version_compare([$0 | A], []) ->
    version_compare(dropdot(A), []);
version_compare(_, []) ->
    gt; %% 2.3.1 > 2.3
version_compare(A, B) ->
    {AStr, ATl} = lists:splitwith(fun (X) -> X =/= $. end, A),
    {BStr, BTl} = lists:splitwith(fun (X) -> X =/= $. end, B),
    ANum = list_to_integer(AStr),
    BNum = list_to_integer(BStr),
    if ANum =:= BNum -> version_compare(dropdot(ATl), dropdot(BTl));
       ANum < BNum -> lt;
       ANum > BNum -> gt
    end.
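$0 is not a special character; it is the character literal for "0" (the integer 48), so [$0 | B] matches a string that starts with "0".
Versions may have several components, e.g. 0.1.22.333, and splitwith/2 splits such a string into a head and a tail ("0" and ".1.22.333").
I imagine the $0 clauses are there for cases like "1.0.0" versus "1":
{"1",".0.0"} vs {"1",[]}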
What's best way to do the following? Binary -> list -> binary seems unnecessary.
binary_and(A, B) ->
    A2 = binary_to_list(A),
    B2 = binary_to_list(B),
    list_to_binary([U band V || {U, V} <- lists:zip(A2, B2)]).
If you don't care about performance, your code is absolutely OK. Otherwise you can do something different.
For example, Erlang supports integers of arbitrary size:
binary_and(A, B) ->
Size = bit_size(A),
<<X:Size>> = A,
<<Y:Size>> = B,
<<(X band Y):Size>>.
Or you can handcraft your own binary zip routine:
binary_and(A,B) -> binary_and(A, B, <<>>).
%% Walk both binaries one byte at a time, appending each result to Acc.
binary_and(<<A:8, RestA/bytes>>, <<B:8, RestB/bytes>>, Acc) ->
    binary_and(RestA, RestB, <<Acc/bytes, (A band B):8>>);
binary_and(<<>>, <<>>, Result) -> Result.
Or an optimized version:
binary_and(A,B) -> binary_and(A, B, <<>>).
%% Take 8 bytes at a time while possible, then finish byte by byte.
binary_and(<<A:64, RestA/bytes>>, <<B:64, RestB/bytes>>, Acc) ->
    binary_and(RestA, RestB, <<Acc/bytes, (A band B):64>>);
binary_and(<<A:8, RestA/bytes>>, <<B:8, RestB/bytes>>, Acc) ->
    binary_and(RestA, RestB, <<Acc/bytes, (A band B):8>>);
binary_and(<<>>, <<>>, Result) -> Result.
Or a more sophisticated one:
binary_and(A,B) -> binary_and({A, B}, 0, <<>>).
%% Index is a byte offset into both binaries; no sub-binaries are passed along.
binary_and(Bins, Index, Acc) ->
    case Bins of
        {<<_:Index/bytes, A:64, _/bytes>>, <<_:Index/bytes, B:64, _/bytes>>} ->
            binary_and(Bins, Index+8, <<Acc/bytes, (A band B):64>>);
        {<<_:Index/bytes, A:8, _/bytes>>, <<_:Index/bytes, B:8, _/bytes>>} ->
            binary_and(Bins, Index+1, <<Acc/bytes, (A band B):8>>);
        {<<_:Index/bytes>>, <<_:Index/bytes>>} -> Acc
    end.
Anyway, you have to measure if you are really interested in performance. Maybe the first one is the fastest for your purposes.
If you want to see the power of the dark side...
binary_and(A, B) ->
    Size = erlang:byte_size(A),
    Size = erlang:byte_size(B),
    Res = hipe_bifs:bytearray(Size, 0),
    binary_and(Res, A, B, 0, Size).
%% The clauses of binary_and/5 must be separated by ";", not ".".
binary_and(Res, _A, _B, Size, Size) ->
    Res;
binary_and(Res, A, B, N, Size) ->
    Bin = hipe_bifs:bytearray_sub(A, N) band hipe_bifs:bytearray_sub(B, N),
    hipe_bifs:bytearray_update(Res, N, Bin),
    binary_and(Res, A, B, N+1, Size).