What will be the ε-closure(r) in the following NFA?

Sorry guys, I cannot provide the picture here (I was unable to upload it), so I will give the transition table of the problem instead.
(S/I)     |  a  |  b  |  c  |  ε (epsilon)
p (start) | {p} | {q} | {r} |  ∅
q         | {q} | {r} |  ∅  | {p}
r (final) | {r} |  ∅  | {p} | {q}
Here ∅ is phi (the empty set); p is the starting state and r is the final state.
My doubt is: will the ε-closure of the final state r contain the starting state p, even though r has no direct ε-transition to p? The final state r does reach p through ε-transitions, via state q.
In my book it is given that
ε-closure(r) = {r, q}
But my question is: why is it not {p, q, r}, since the final state r reaches the starting state p as well?

ε-closure(s) is the set of NFA states reachable from NFA state s on ε-transitions alone.
You were correct in thinking the same. Please follow the definition mentioned above.
So, ε-closure(r) = the set of NFA states reachable from state r on ε-transitions alone = {p, q, r}. Hence, your book has computed it incorrectly.
The answer has to be {p, q, r}.
NOTE that r is included because every state is in its own ε-closure (it reaches itself on zero ε-transitions). p is included because p is reachable from q on an ε-transition, and q in turn is reachable from r on an ε-transition.
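For concreteness, here is a minimal Python sketch of this computation, using the ε-transitions from the table above (the dictionary layout and names are just for illustration):

# ε-transitions from the table: r --ε--> q, q --ε--> p, p has none.
eps = {'p': set(), 'q': {'p'}, 'r': {'q'}}

def eps_closure(state):
    closure = {state}               # a state is always in its own ε-closure
    stack = [state]
    while stack:
        s = stack.pop()
        for t in eps[s] - closure:  # follow ε-transitions not yet visited
            closure.add(t)
            stack.append(t)
    return closure

print(eps_closure('r'))             # the set {'p', 'q', 'r'}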


Defining states, Q and R matrix in reinforcement learning

I am new to RL and I am referring to a couple of books and tutorials, yet I have a basic question and I hope to find the fundamental answer here.
The primary references: Sutton & Barto, 2nd edition, and a blog.
Problem description (Q-learning approach only): the agent has to reach from point A to point B along a straight line; point B is static and only the initial position of the agent is random.
-----------A(60,0)----------------------------------B(100,0)------------->
Keeping it simple, the agent always moves in the forward direction. B is always at X-axis position 100, which is also the goal state, and in the first iteration A is at X-axis position 60. So the actions are just "go forward" and "stop". The reward structure is: reward the agent 100 when A reaches point B, -500 when A crosses B, and 0 otherwise. So the goal for the agent is to reach and stop at position B.
1) How many states would it require to go from point A to point B in this case, and how do I define a Q and an R matrix for this?
2) How do I add a new column and row if a new state is found?
Any help would be greatly appreciated.
Q_matrix implementation:
% Q-learning update: the next state is the entry after the current one
% in List_Ego_pos_temp, since the agent only moves sequentially.
s = find(List_Ego_pos_temp == current_state);
Q_matrix(s, possible_actions) = Q_matrix(s, possible_actions) + ...
    this.learning_rate * (Store_reward(this.Ego_pos_counter) + ...
    this.discount * max(Q_matrix(s + 1, :)) - Q_matrix(s, possible_actions));
This implementation is in MATLAB.
List_Ego_pos_temp is a temporary list which stores all the positions of the Agent.
Also, let's say there are ten states, 1 to 10. We know the speed and distance the agent moves in each state on the way to state 10, and the agent can only move sequentially: it can go from s1 to s2 to s3 to s4 and so on up to s10, not from s1 straight to s4 or s10.
Let's say s8 is the goal state with reward = 10, s10 is a terminal state with reward -10, and from s1 to s7 it receives a reward of 0.
So would it be the right approach to calculate the Q table by taking the current state as state 1 and the next state as state 2, then in the next iteration the current state as state 2 and the next state as state 3, and so on? Will this calculate the Q table correctly, given that the next state is already fed in and nothing is predicted?
Since you are defining the problem in this case, many of the variables depend on you.
You can define a minimum state (e.g. 0) and a maximum state (e.g. 150) and treat each step as a state (so you could have 150 possible states). Then 100 will be your goal state, and your actions will be defined as +1 (move one step) and 0 (stop). The Q matrix will then be a 150x2 matrix covering all possible states and all actions. The reward will be a scalar, as you have defined.
You do not need to add new columns or rows, since the entire Q matrix is defined up front.
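A minimal sketch of that setup in Python (the sizes and reward values simply mirror the numbers above and are illustrative):

import numpy as np

n_states, n_actions = 150, 2         # actions: 0 = stop, 1 = move forward
GOAL = 100
Q = np.zeros((n_states, n_actions))  # full Q table, fixed size, no rows added later

def reward(pos, action):
    if pos == GOAL and action == 0:  # reached B and stopped there
        return 100
    if pos > GOAL:                   # crossed B
        return -500
    return 0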
Best of luck.

proof of optimality in activity selection

Can someone please explain, in a not-so-formal way, how the greedy choice is the optimal solution for the activity selection problem? This is the simplest explanation that I have found, but I don't really get it:
How does the greedy choice work for activities sorted according to finish time?
Let the given set of activities be S = {1, 2, 3, ..., n}, with the activities sorted by finish time. The greedy choice is to always pick activity 1. Why does activity 1 always provide one of the optimal solutions? We can prove it by showing that if there is another solution B with a first activity other than 1, then there is also a solution A of the same size with activity 1 as the first activity. Let the first activity selected by B be k; then there always exists A = {B – {k}} U {1}. (Note that the activities in B are independent and k has the smallest finish time among them. Since k is not 1, finish(k) >= finish(1).)
The following is my understanding of why the greedy solution always works:
Assertion: If A is the greedy choice (starting with the 1st activity in the sorted array), then it gives an optimal solution.
Proof: Let there be another choice B, starting with some activity k (k != 1 and finishTime(k) >= finishTime(1)), which alone gives the optimal solution. So B does not contain the 1st activity, and the following relation between A and B can be written:
A = {B - {k}} U {1}
Here:
1. {B - {k}} and {1} are disjoint, since activity 1 is not in B (it has the smallest finish time, so if it were in B it would be first)
2. Both A and B contain mutually compatible activities
Since we conclude that |A| = |B|, set A also gives an optimal solution.
Let's say A is the optimal solution which starts with 1, where the intervals are S = {1, 2, 3, ..., m}, and the length of this solution is n1. If A is not optimal, then there exists another solution B which starts with k != 1 (with finishTime(k) >= finishTime(1)) and has length n2.
So, n2 > n1.
Now, if we exclude k from solution B, we are left with n2 - 1 elements.
Since k doesn't overlap with the other intervals in B, 1 will also not overlap with them.
This is because all intervals in B (excluding k) have startTime >= finishTime(k) >= finishTime(1). Hence, if we replace k with 1 in B, we still have a valid solution of length n2 that starts with 1. But the optimal solution starting with 1 was A, with length n1, so n1 >= n2, which contradicts n2 > n1. Hence the solution starting with 1 is optimal.
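The exchange argument above is exactly what the standard greedy implementation relies on. Here is a small Python sketch of it (the interval values are made up for illustration):

def select_activities(intervals):
    # Greedy choice: sort by finish time, then repeatedly take the first
    # activity that starts after the last chosen one finishes.
    chosen = []
    last_finish = float('-inf')
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (8, 11)]))
# -> [(1, 4), (5, 7), (8, 11)]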

Pushdown automata stack states?

I am studying pushdown automata for a course and I've hit a conceptual gap.
Does the stack basically have infinite memory, or only space for a single symbol?
If I have the following string:
abba
And the following rules with q6 as my acceptance state:
(q0,a,Z) = (q1,a)
(q1,b,a) = (q2,b)
(q2,b,b) = (q3,b)
(q3,a,b) = (q4,a)
(q4,lambda,a) = (q5,lambda)
(q5,lambda,a) = (q5,lambda)
(q5,lambda,b) = (q5,lambda)
(q5,lambda,Z) = (q6,lambda)
In the states my stack looks like this:
q0: Z
q1: aZ
q2: baZ
q3: bbaZ
q4: abbaZ
q5: Z because eventually everything is popped
q6: Z
Is this the proper transformation of the stack? Does it basically grow indefinitely with every push? Or should every push swap with the current top?
In the latter case the states would look like:
q0: Z
q1: aZ
q2: bZ
q3: bZ
q4: aZ
q5: Z
q6: Z
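For what it's worth, a PDA's stack is unbounded memory, not a single cell. Below is a tiny Python sketch of the first reading, where each transition pushes on top of the existing stack without popping it (the rule encoding is just for illustration; it covers only the four input-consuming transitions):

# The transitions above, written as (state, input, top) -> (state, push).
rules = {
    ('q0', 'a', 'Z'): ('q1', 'a'),
    ('q1', 'b', 'a'): ('q2', 'b'),
    ('q2', 'b', 'b'): ('q3', 'b'),
    ('q3', 'a', 'b'): ('q4', 'a'),
}

state, stack = 'q0', ['Z']
for ch in 'abba':
    state, push = rules[(state, ch, stack[-1])]
    stack.append(push)                      # old symbols stay underneath
    print(state, ''.join(reversed(stack)))  # q1 aZ ... q4 abbaZ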

LR(1) Automata: difference between items

I have a doubt regarding the LR(1) automata construction:
Is the state with the kernel [A->b., x] (state_1) equivalent to the state with the kernel [A->b.,x/y] (state_2)?
Like, if I'm in the state [A->.b, x] and shift b from this state, do I need to create state_1 if I already have state_2?
Hope it's clear.

expressing temporal logic of actions in erlang. any natural way?

I would like to translate some actions specified in TLA into Erlang. Can you think of any natural way of doing this directly in Erlang, or of any framework available for this? In a nutshell (a very small one), TLA actions are conditions on variables, some of which are primed, meaning that they represent the values of the variables in the next state. For example:
Action(x, y, z) ==
    /\ PredicateA(x)
    /\ \/ PredicateB(y)
       \/ PredicateC(z)
    /\ x' = x + 1
This action means that, whenever the state of the system is such that PredicateA is true for variable x, and either PredicateB is true for y or PredicateC is true for z, then the system may change its state so that everything remains the same except that x changes to its current value plus 1.
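Read operationally, such an action is a guard plus an update. For reference, a language-neutral sketch of that reading (Python here, with made-up predicate bodies standing in for the TLA predicates):

# Hypothetical predicates standing in for the TLA ones above.
predicate_a = lambda x: x >= 0
predicate_b = lambda y: y > 10
predicate_c = lambda z: z == 'ready'

def action(state):
    # Guard: the unprimed conjuncts of the TLA action.
    enabled = predicate_a(state['x']) and (predicate_b(state['y']) or predicate_c(state['z']))
    if enabled:
        # Update: the primed conjunct; all other variables stay unchanged.
        return {**state, 'x': state['x'] + 1}
    return None  # the action is not enabled in this state

print(action({'x': 1, 'y': 3, 'z': 'ready'}))  # {'x': 2, 'y': 3, 'z': 'ready'}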
Expressing that in Erlang requires a lot of plumbing, at least in the way I've found. For example, by having a loop that evaluates the conditions before triggering the actions, like:
what_to_do(State, NewInfo) ->
    PA = is_predicate_a(State, NewInfo),
    PB = is_predicate_b(State, NewInfo),
    PC = is_predicate_c(State, NewInfo),
    [{can_do_Action1, PA and (PB or PC)},  % this is the action specified above
     {can_do_Action2, PA and PC},          % this is some other action
     {can_do_Action3, true}].              % an action that may be executed at any time

loop(State) ->
    NewInfo = get_new_info(),
    CanDo = what_to_do(State, NewInfo),
    RandomAction = rand_action(CanDo),
    NewState = case RandomAction of
                   can_do_Action1 -> action1(State, NewInfo);
                   can_do_Action2 -> action2(State, NewInfo);
                   can_do_Action3 -> action3(State, NewInfo)
               end,
    NewestState = clean_up_old_info(NewState, NewInfo),
    loop(NewestState).
I am thinking of writing a framework to hide this plumbing, incorporating message passing within the get_new_info() function and, hopefully, still making it OTP compliant. If you know of any framework that already does this, or if you can think of a simple way of implementing it, I would appreciate hearing about it.
I believe gen_fsm(3) behaviour could probably make your life slightly easier.
FSM from Finite State Machine, not Flying Spaghetti Monster, though the latter could help, too.
