I am studying pushdown automata for a course and I've hit a conceptual snag.
Does the stack basically have unbounded memory, or only space for a single symbol?
If I have the following string:
abba
And the following rules with q6 as my acceptance state:
(q0, a, Z) = (q1, a)
(q1, b, a) = (q2, b)
(q2, b, b) = (q3, b)
(q3, a, b) = (q4, a)
(q4, lambda, a) = (q5, lambda)
(q5, lambda, a) = (q5, lambda)
(q5, lambda, b) = (q5, lambda)
(q5, lambda, Z) = (q6, lambda)
At each state my stack looks like this:
q0: Z
q1: aZ
q2: baZ
q3: bbaZ
q4: abbaZ
q5: Z because eventually everything is popped
q6: Z
Is this the proper transformation of the stack? Does it basically grow indefinitely, one symbol per push? Or should every push replace the current top symbol?
In that case the stack at each state would look like:
q0: Z
q1: aZ
q2: bZ
q3: bZ
q4: aZ
q5: Z
q6: Z
I am writing a parser for a query engine. My parser DCG query is not deterministic.
I will be using the parser in a relational manner, to both check and synthesize queries.
Is it appropriate for a parser DCG to not be deterministic?
In code:
If I want to be able to use query/2 both ways, does it require that
?- phrase(query, [q,u,e,r,y]).
true;
false.
or should I be able to obtain
?- phrase(query, [q,u,e,r,y]).
true.
nevertheless, given that the first snippet would require me to use it as such
?- bagof(X, phrase(query, [q,u,e,r,y]), [true]).
true.
when using it to check a formula?
The first question to ask yourself is whether your grammar is deterministic or, in the terminology of grammars, unambiguous. This is not asking whether your DCG is deterministic, but whether the grammar is unambiguous; that can be answered with basic parsing concepts, no use of DCGs needed. In other words, is there only one way to parse a valid input? The standard book for this is "Compilers: Principles, Techniques, & Tools" (WorldCat).
Now you are actually asking about three different uses for parsing.
A recognizer.
A parser.
A generator.
If your grammar is unambiguous, then:
For a recognizer the answer should only be true for valid input that can be parsed and false for invalid input.
For the parser it should be deterministic, as there is only one way to parse the input. The difference between a parser and a recognizer is that a recognizer only returns true or false, while a parser returns something more, typically an abstract syntax tree.
For the generator, it should be non-deterministic so that it can generate multiple results.
Can all of this be done with one DCG? Yes. The three different ways depend on how you use the input and output arguments of the DCG.
Here is an example with a very simple grammar.
The grammar is just an infix binary expression with one operator and two possible operands. The operator is (+) and the operands are either (1) or (2).
expr(expr(Operand_1,Operator,Operand_2)) -->
    operand(Operand_1),
    operator(Operator),
    operand(Operand_2).

operand(operand(1)) --> "1".
operand(operand(2)) --> "2".

operator(operator(+)) --> "+".

recognizer(Input) :-
    string_codes(Input,Codes),
    DCG = expr(_),
    phrase(DCG,Codes,[]).

parser(Input,Ast) :-
    string_codes(Input,Codes),
    DCG = expr(Ast),
    phrase(DCG,Codes,[]).

generator(Generated) :-
    DCG = expr(_),
    phrase(DCG,Codes,[]),
    string_codes(Generated,Codes).
:- begin_tests(expr).
recognizer_test_case_success("1+1").
recognizer_test_case_success("1+2").
recognizer_test_case_success("2+1").
recognizer_test_case_success("2+2").
test(recognizer,[ forall(recognizer_test_case_success(Input)) ] ) :-
    recognizer(Input).

recognizer_test_case_fail("2+3").

test(recognizer,[ forall(recognizer_test_case_fail(Input)), fail ] ) :-
    recognizer(Input).
parser_test_case_success("1+1",expr(operand(1),operator(+),operand(1))).
parser_test_case_success("1+2",expr(operand(1),operator(+),operand(2))).
parser_test_case_success("2+1",expr(operand(2),operator(+),operand(1))).
parser_test_case_success("2+2",expr(operand(2),operator(+),operand(2))).
test(parser,[ forall(parser_test_case_success(Input,Expected_ast)) ] ) :-
    parser(Input,Ast),
    assertion( Ast == Expected_ast).

parser_test_case_fail("2+3").

test(parser,[ forall(parser_test_case_fail(Input)), fail ] ) :-
    parser(Input,_).

test(generator,all(Generated == ["1+1","1+2","2+1","2+2"]) ) :-
    generator(Generated).
:- end_tests(expr).
The grammar is unambiguous and has only 4 valid strings which are all unique.
The recognizer is deterministic and only returns true or false.
The parser is deterministic and returns a unique AST.
The generator is non-deterministic and returns all 4 valid unique strings.
Example run of the test cases.
?- run_tests.
% PL-Unit: expr ........... done
% All 11 tests passed
true.
To expand a little on the comment by Daniel
As Daniel notes
1 + 2 + 3
can be parsed as
(1 + 2) + 3
or
1 + (2 + 3)
So 1+2+3 is an example of what you said is specified by a recursive DCG, and, as I noted, a common way out of the problem is to use parentheses to start a new context. Starting a new context means getting a clean slate to start over again: if you are creating an AST, you just put the new context, i.e. the items between the parentheses, as a new subtree at the current node. A small sketch of this idea follows below.
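Here is a minimal sketch of that idea (my own illustration, not part of the original example; expr2//1 and primary//1 are names I made up, and operator//1 is reused from the grammar above):

expr2(expr(Left,Operator,Right)) -->
    primary(Left),
    operator(Operator),
    primary(Right).

% A parenthesized expression starts a new context: the same expr2//1 rule is
% used inside the parentheses, and its AST becomes a subtree at the current node.
primary(Ast)        --> "(", expr2(Ast), ")".
primary(operand(1)) --> "1".
primary(operand(2)) --> "2".

With this sketch, "1+2+1" is rejected, while "(1+2)+1" parses as expr(expr(operand(1),operator(+),operand(2)),operator(+),operand(1)).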
With regard to write_canonical/1, this is also helpful, but be aware of the left and right associativity of operators. See Associative property
e.g.
+ is left associative
?- write_canonical(1+2+3).
+(+(1,2),3)
true.
^ is right associative
?- write_canonical(2^3^4).
^(2,^(3,4))
true.
i.e.
2^3^4 = 2^(3^4) = 2^81 = 2417851639229258349412352
2^3^4 != (2^3)^4 = 8^4 = 4096
The point of this added info is to warn you that grammar design is full of hidden pitfalls, and if you have not had a rigorous class in it and done some of it yourself, you could easily create a grammar that looks great and works great, and then years later it is found to have a serious problem. While Python was not ambiguous AFAIK, it did have grammar issues; it had enough issues that when Python 3 was created, many of them were fixed, so Python 3 is not backward compatible with Python 2 (differences). Yes, they have made changes and libraries to make it easier to use Python 2 code with Python 3, but the point is that the grammar could have used a bit more analysis when it was designed.
The only reason why code should be non-deterministic is that your question has multiple answers. In that case, you'd of course want your query to have multiple solutions. Even then, however, you'd like it to not leave a choice point after the last solution, if at all possible.
Here is what I mean:
"What is the smaller of two numbers?"
min_a(A, B, B) :- B < A.
min_a(A, B, A) :- A =< B.
So now you ask, "what is the smaller of 1 and 2" and the answer you expect is "1":
?- min_a(1, 2, Min).
Min = 1.
?- min_a(2, 1, Min).
Min = 1 ; % crap...
false.
?- min_a(2, 1, 2).
false.
?- min_a(2, 1, 1).
true ; % crap...
false.
So that's not bad code but I think it's still crap. This is why, for the smaller of two numbers, you'd use something like the min() function in SWI-Prolog.
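For instance, a deterministic version could look like this (a minimal sketch of my own; min_b/3 and min_c/3 are made-up names, and min/2 is SWI-Prolog's arithmetic min function):

min_b(A, B, Min) :-                 % if-then-else leaves no choice point behind
    (   B < A
    ->  Min = B
    ;   Min = A
    ).

min_c(A, B, Min) :-                 % or just use the arithmetic function
    Min is min(A, B).

Now ?- min_b(2, 1, Min). answers Min = 1. with no trailing false.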
Similarly, say you want to ask, "What are the even numbers between 1 and 10"; you write the query:
?- between(1, 10, X), X rem 2 =:= 0.
X = 2 ;
X = 4 ;
X = 6 ;
X = 8 ;
X = 10.
... and that's fine, but if you then ask for the numbers that are multiple of 3, you get:
?- between(1, 10, X), X rem 3 =:= 0.
X = 3 ;
X = 6 ;
X = 9 ;
false. % crap...
The "low-hanging fruit" are the cases where you as a programmer would see that there cannot be non-determinism, but for some reason your Prolog is not able to deduce that from the code you wrote. In most cases, you can do something about it.
On to your actual question. If you can, write your code so that there is non-determinism only if there are multiple answers to the question you'll be asking. When you use a DCG for both parsing and generating, this sometimes means you end up with two code paths. It feels clumsy but it is easier to write, to read, to understand, and probably to make efficient. As a word of caution, take a look at this question. I can't know that for sure, but the problems that OP is running into are almost certainly caused by unnecessary non-determinism. What probably happens with larger inputs is that a lot of choice points are left behind, there is a lot of memory that cannot be reclaimed, a lot of processing time going into bookkeeping, huge solution trees being traversed only to get (as expected) no solutions... you get the point.
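Here is a rough sketch of the "two code paths" idea (my own code, not something from the question: the predicate name and the dispatch on nonvar/1 are invented, and it reuses the expr//1 DCG from the earlier answer):

expr_text(Ast, Text) :-
    (   nonvar(Text)                      % parsing: the text is given
    ->  string_codes(Text, Codes),
        once(phrase(expr(Ast), Codes))    % commit to the single parse
    ;   phrase(expr(Ast), Codes),         % generating: enumerate valid texts
        string_codes(Text, Codes)
    ).

The parsing direction is now deterministic, while the generating direction still enumerates every valid string; whether once/1 is safe here depends on the grammar being unambiguous, as discussed in the other answer.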
For examples of what I mean, you can take a look at the implementation of library(dcg/basics) in SWI-Prolog. Pay attention to several things:
The documentation is very explicit about what is deterministic, what isn't, and how non-determinism is supposed to be useful to the client code;
The use of cuts, where necessary, to get rid of choice points that are useless;
The implementation of number//1 (towards the bottom), which can "generate or extract a number" (see the short illustration below).
(Hint: use the primitives in this library when you write your own parser!)
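As a quick illustration of that last point, here are two example queries of my own (the expected results are noted as comments, assuming library(dcg/basics) is loaded):

:- use_module(library(dcg/basics)).

% extracting: X is unified with the number 3.14
?- phrase(number(X), `3.14`).

% generating: Codes is unified with the code list for "42"
?- phrase(number(42), Codes).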
I hope you find this unnecessarily long answer useful.
I am new to RL and I am referring to a couple of books and tutorials, yet I have a basic question and I hope to find the fundamental answer here.
The primary book referred to is Sutton & Barto, 2nd edition, plus a blog.
Problem description (Q-learning approach only): the agent has to travel from point A to point B along a straight line; point B is static, and only the initial position of the agent is random.
-----------A(60,0)----------------------------------B(100,0)------------->
To keep it simple, the agent always moves in the forward direction. B is always at x-position 100, which is also the goal state, and in the first iteration A is at x-position 60. So the actions are just "go forward" and "stop". The reward structure is: the agent receives 100 when it reaches point B, 0 otherwise, and -500 when it crosses B. So the goal for the agent is to reach position B and stop there.
1) How many states are required to go from point A to point B in this case, and how do I define the Q and R matrices for this?
2) How do I add a new column and row if a new state is found?
Any help would be greatly appreciated.
Q_matrix implementation:
Q_matrix(find(List_Ego_pos_temp == current_state), possible_actions) = ...
    Q_matrix(find(List_Ego_pos_temp == current_state), possible_actions) + ...
    this.learning_rate * (Store_reward(this.Ego_pos_counter) + ...
    this.discount * max(Q_matrix(find(List_Ego_pos_temp == List_Ego_pos_temp(find(current_state)+1))), possible_actions) - ...
    Q_matrix(find(List_Ego_pos_temp == current_state), possible_actions));
This implementation is in MATLAB.
List_Ego_pos_temp is a temporary list which stores all the positions of the agent.
Also, let's say there are ten states, 1 to 10, we know at what speed and over what distance the agent moves in each state to reach state 10, and the agent can only move sequentially: it can go from s1 to s2 to s3 to s4 and so on up to s10, not from s1 straight to s4 or s10.
Let's say s8 is the goal state with reward = 10, s10 is a terminal state with reward -10, and from s1 to s7 the agent receives a reward of 0.
So would it be the right approach to compute the Q table by treating the current state as state 1 and the next state as state 2, then in the next iteration the current state as state 2 and the next state as state 3, and so on? Will this compute the Q table correctly, given that the next state is already fed in and nothing is predicted?
Since you are defining the problem in this case, many of the variables are dependent on you.
You can define a minimum state (e.g. 0) and a maximum state (e.g. 150) and define each step as a state (so you could have 150 possible states). Then 100 will be your goal state. Your actions will be defined as +1 (move one step) and 0 (stop). The Q matrix will then be a 150x2 matrix covering all possible states and all actions. The reward will be a scalar, as you have defined it.
You do not need to add a new column and row, since you have the entire Q matrix defined from the start.
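For reference, the update rule that the MATLAB snippet in the question appears to implement is the standard tabular Q-learning update from Sutton & Barto:

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

where \alpha is the learning rate (this.learning_rate), \gamma is the discount factor (this.discount), r is the reward, and s' is the next state reached after taking action a in state s.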
Best of luck.
Can someone please explain, in a not-so-formal way, how the greedy choice gives an optimal solution for the activity selection problem? This is the simplest explanation that I have found, but I don't really get it:
How does Greedy Choice work for Activities sorted according to finish time?
Let the given set of activities be S = {1, 2, 3, ..., n}, with activities sorted by finish time. The greedy choice is to always pick activity 1. How come activity 1 always provides one of the optimal solutions? We can prove it by showing that if there is another solution B with a first activity other than 1, then there is also a solution A of the same size with activity 1 as the first activity. Let the first activity selected by B be k; then there always exists A = {B - {k}} U {1}. (Note that the activities in B are independent and k has the smallest finish time among them. Since k is not 1, finish(k) >= finish(1).)
The following is my understanding of why the greedy solution always works:
Assertion: If A is the greedy choice (starting with the 1st activity in the sorted array), then it gives an optimal solution.
Proof: Let there be another choice B, starting with some activity k (k != 1, with finishTime(k) >= finishTime(1)), which gives an optimal solution. So B does not contain the 1st activity, and the following relation between A and B can be written:
A = {B - {k}} U {1}
Here:
1. The sets B - {k} and {1} are disjoint (activity 1 is not in B), so |A| = |B|.
2. All activities in A are mutually compatible: since finishTime(1) <= finishTime(k), activity 1 does not overlap any activity in B - {k}.
Since |A| = |B|, A also gives an optimal solution.
Let's say A is the solution that starts with 1, where the intervals are S = {1, 2, 3, ..., m}, and the length of this solution is n1. If A is not an optimal solution, then there exists another solution B which starts with k != 1, with finishTime(k) >= finishTime(1), and which has length n2.
So, n2 > n1.
Now, if we exclude k from solution B, we are left with n2 - 1 elements.
Since k doesn't overlap with the other intervals in B, 1 will not overlap with them either.
This is because all intervals in B (excluding k) have startTime >= finishTime(k) >= finishTime(1). Hence, if we replace k with 1 in B, we still have a valid solution of length n2 that starts with 1. But the best solution starting with 1 was A, with length n1, so we get n1 >= n2, which contradicts n2 > n1. Hence the solution starting with 1 is optimal.
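To make the greedy choice itself concrete, here is a small sketch in Prolog (my own illustration; the act(Start, Finish) representation and the predicate names are made up): sort by finish time, always take the earliest-finishing activity, drop everything that overlaps it, and recurse.

% select_activities(+Activities, -Chosen)
select_activities(Activities, Chosen) :-
    sort(2, @=<, Activities, ByFinish),        % sort on the Finish argument
    greedy(ByFinish, Chosen).

greedy([], []).
greedy([act(S, F)|Rest], [act(S, F)|Chosen]) :-
    include(compatible_with(F), Rest, Compatible),   % keep non-overlapping activities
    greedy(Compatible, Chosen).

compatible_with(F, act(S, _)) :- S >= F.

For example, ?- select_activities([act(1,4), act(3,5), act(0,6), act(5,7), act(3,9), act(5,9), act(6,10), act(8,11)], C). yields C = [act(1,4), act(5,7), act(8,11)].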
I'm using Gforth to try to implement exponentiation. I understand, in theory, how a stack-based language is supposed to operate. However, I'm having difficulties with my implementation of it on Gforth.
Here's what I have right now:
: myexp
1 swap ?do rot dup * rot rot loop ;
However, when I run it I see a stack underflow like:
3 2 myexp
:1: Stack underflow
3 2 >>>myexp<<<
Backtrace:
$7F645EFD6EF0 rot
$2
$1
Is Gforth's looping structure manipulating the stack when it loops?
I'm in the dark on how Forth works as most looping examples I've seen online are rather involved and confusing to someone new to Forth.
What is wrong with my implementation?
The 1 swap is wrong. ?do wants the lower bound at the top of the stack.
The loop body is wrong. The two bounds are removed from the data stack, so your use of rot to access the exponentiation base doesn't work.
: myexp ( u1 u2 -- u3 ) \ u3 = u1^u2
  over swap 1 ?do over * loop nip ;
\ Trace for 3 2 myexp: over swap -> 3 3 2, then 1 ?do pops start 1 and limit 2,
\ so the body runs once: over * -> 3 9, and nip leaves 9.
I'm not sure how to use Gforth's floating point stack, so I can't give you the answer, but instead of using a loop, you can use the Pascal programming trick of defining exponentiation like so:
x^y = exp(y*ln(x))
Note: for more information, see this answer from the question on exponentiation of real numbers.
I'm trying to teach myself Prolog. Below, I've written some code that I think should return all paths between nodes in an undirected graph... but it doesn't. I'm trying to understand why this particular code doesn't work (which I think differentiates this question from similar Prolog pathfinding posts). I'm running this in SWI-Prolog. Any clues?
% Define a directed graph (nodes may or may not be "room"s; edges are encoded by "leads_to" predicates).
room(kitchen).
room(living_room).
room(den).
room(stairs).
room(hall).
room(bathroom).
room(bedroom1).
room(bedroom2).
room(bedroom3).
room(studio).
leads_to(kitchen, living_room).
leads_to(living_room, stairs).
leads_to(living_room, den).
leads_to(stairs, hall).
leads_to(hall, bedroom1).
leads_to(hall, bedroom2).
leads_to(hall, bedroom3).
leads_to(hall, studio).
leads_to(living_room, outside). % Note "outside" is the only node that is not a "room"
leads_to(kitchen, outside).
% Define the indirection of the graph. This is what we'll work with.
neighbor(A,B) :- leads_to(A, B).
neighbor(A,B) :- leads_to(B, A).
Iff A --> B --> C --> D is a loop-free path, then
path(A, D, [B, C])
should be true. I.e., the third argument contains the intermediate nodes.
% Base Rule (R0)
path(X,Y,[]) :- neighbor(X,Y).
% Inductive Rule (R1)
path(X,Y,[Z|P]) :- not(X == Y), neighbor(X,Z), not(member(Z, P)), path(Z,Y,P).
Yet,
?- path(bedroom1, stairs, P).
is false. Why? Shouldn't we get a match to R1 with
X = bedroom1
Y = stairs
Z = hall
P = []
since,
?- neighbor(bedroom1, hall).
true.
?- not(member(hall, [])).
true.
?- path(hall, stairs, []).
true .
?
In fact, if I evaluate
?- path(A, B, P).
I get only the length-1 solutions.
Welcome to Prolog! The problem, essentially, is that when you get to not(member(Z, P)) in R1, P is still a pure variable, because the evaluation hasn't gotten to path(Z, Y, P) to define it yet. One of the surprising yet inspiring things about Prolog is that member(Ground, Var) will generate lists that contain Ground and unify them with Var:
?- member(a, X).
X = [a|_G890] ;
X = [_G889, a|_G893] ;
X = [_G889, _G892, a|_G896] .
This has the confusing side-effect that checking for a value in an uninstantiated list will always succeed, which is why not(member(Z, P)) will always fail, causing R1 to always fail. The fact that you get all the R0 solutions and none of the R1 solutions is a clue that something in R1 is causing it to always fail. After all, we know R0 works.
If you swap these two goals, you'll get the first result you want:
path(X,Y,[Z|P]) :- not(X == Y), neighbor(X,Z), path(Z,Y,P), not(member(Z, P)).
?- path(bedroom1, stairs, P).
P = [hall]
If you ask for another solution, you'll get a stack overflow. This is because after the change we're happily generating solutions with cycles as quickly as possible with path(Z,Y,P), only to discard them post-facto with not(member(Z, P)). (Incidentally, for a slight efficiency gain we can switch to memberchk/2 instead of member/2. Of course doing the wrong thing faster isn't much help. :)
I'd be inclined to convert this to a breadth-first search, which in Prolog would imply adding an "open set" argument to contain solutions you haven't tried yet: at each node you first try something in the open set and then add that node's possibilities to the end of the open set. When the open set is exhausted, you've tried every node you could get to. For some path-finding problems it's a better solution than depth-first search anyway. Another thing you could try is separating the path into a visited and a future component, and only checking the visited component. As long as you aren't generating a cycle in the current step, you can be assured you aren't generating one at all; there's no need to worry about future steps. A rough sketch of that second idea is below.
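Here is a minimal sketch of the visited-component idea (my own code, not a drop-in answer; path2/3 and path2/4 are invented names, and it uses the neighbor/2 facts from the question). The accumulator carries the nodes already on the path, so cycles are ruled out while the path is being built instead of being discarded afterwards:

% path2(Start, End, Intermediates)
path2(X, Y, Path) :-
    path2(X, Y, [X], Path).

% path2(Current, End, Visited, Intermediates)
path2(X, Y, _Visited, []) :-
    neighbor(X, Y).
path2(X, Y, Visited, [Z|P]) :-
    X \== Y,
    neighbor(X, Z),
    \+ member(Z, Visited),
    path2(Z, Y, [Z|Visited], P).

With this, ?- path2(bedroom1, stairs, P). answers P = [hall] and then fails cleanly instead of overflowing the stack.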
The way you worded the question leads me to believe you don't want a complete solution, just a hint, so I think this is all you need. Let me know if that's not right.