Value iteration not converging - Markov decision process - machine-learning

I am having an issue with the results I am getting from performing value iteration, with the numbers increasing to infinity so I assume I have a problem somewhere in my logic.
Initially I have a 10x10 grid, some tiles with a reward of +10, some with a reward of -100, and some with a reward of 0. There are no terminal states. The agent can perform 4 non-deterministic actions: move up, down, left, and right. It has an 80% chance of moving in the chosen direction, and a 20% chance of moving perpendicularly.
My process is to loop over the following:
For every tile, calculate the value of the best action from that tile
For example to calculate the value of going north from a given tile:
self.northVal = 0
self.northVal += (0.1 * grid[x-1][y])
self.northVal += (0.1 * grid[x+1][y])
self.northVal += (0.8 * grid[x][y+1])
For every tile, update its value to be: the initial reward + ( 0.5 * the value of the best move for that tile )
Check whether the updated grid has changed since the last loop, and if not, stop the loop as the numbers have converged.
I would appreciate any guidance!

What you're trying to do here is not Value Iteration: value iteration works with a state value function, where you store a value for each state. This means, in value iteration, you don't keep an estimate of each (state,action) pair.
Please refer to the 2nd edition of the Sutton and Barto book (Section 4.4) for an explanation, but here's the algorithm for quick reference. Note the initialization step: you only need a vector storing the value for each state.
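As a rough illustration of keeping one value per state, here is a minimal Python sketch of value iteration on a grid like the one in the question. It is not the book's pseudocode. The function name `value_iteration`, the tolerance, and the rule that bumping into the edge leaves the agent in place are all assumptions made for the example; the 0.8/0.1/0.1 probabilities and the 0.5 discount come from the question.

```python
def value_iteration(rewards, gamma=0.5, tol=1e-9, max_sweeps=10_000):
    """Return converged state values for a grid of per-tile rewards."""
    rows, cols = len(rewards), len(rewards[0])
    values = [[0.0] * cols for _ in range(rows)]
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    # Perpendicular directions the agent may slip into (10% each).
    slips = {"up": ("left", "right"), "down": ("left", "right"),
             "left": ("up", "down"), "right": ("up", "down")}

    def value_after(r, c, move):
        dr, dc = moves[move]
        nr, nc = r + dr, c + dc
        # Assumption: moving off the grid leaves the agent where it is.
        if 0 <= nr < rows and 0 <= nc < cols:
            return values[nr][nc]
        return values[r][c]

    for _ in range(max_sweeps):
        new_values = [[0.0] * cols for _ in range(rows)]
        delta = 0.0
        for r in range(rows):
            for c in range(cols):
                # Expected value of the successor state under the best action.
                best = max(
                    0.8 * value_after(r, c, a)
                    + 0.1 * value_after(r, c, slips[a][0])
                    + 0.1 * value_after(r, c, slips[a][1])
                    for a in moves
                )
                new_values[r][c] = rewards[r][c] + gamma * best
                delta = max(delta, abs(new_values[r][c] - values[r][c]))
        values = new_values
        if delta < tol:  # largest change in this sweep is negligible
            break
    return values
```

With a discount below 1 the values stay bounded by max|reward| / (1 - gamma), so with rewards up to 100 in magnitude and gamma = 0.5 nothing should exceed 200 in magnitude. A single tile with reward 10 settles at 10 / (1 - 0.5) = 20. Values that grow without bound usually point to an update that adds to the old grid instead of recomputing from the fixed rewards.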


How random is arc4random (mac os x)? (or what am I doing wrong?)

I'm playing with an optimized game of life implementation in swift/mac_os_x. First step: randomize a big grid of cells (50% alive).
code:
for(var i=0;i<768;i++){
    for(var j=0;j<768;j++){
        let r = Int(arc4random_uniform(100))
        let alive = (aliveOdds > r)
        self.setState(alive,cell: Cell(tup:(i,j)),cells: aliveCells)
    }
}
I expect a relatively uniform randomness. What I get has definite patterns:
Zooming in a bit on the lower left:
(I've changed the color to black on every 32nd row and column, to see if the patterns lined up with any power of 2).
Any clue what is causing the patterns? I've tried:
replacing arc4random with rand().
adding arc4stir() before each arc4random_uniform call
shifting the display (to ensure the pattern is in the data, not a display glitch)
Ideas on next steps?
You cannot hit the period of arc4random, or get that many regular non-uniform clusters from it, on any displayable set (16*(2**31) - 1).
These are definitely signs of corrupted/uninitialized memory. For example, you are initializing a 768x768 field, but you are showing us a 1024xsomething field.
Try replacing Int(arc4random_uniform(100)) with just 100 to see.

Lemniscate of Bernoulli in Objective-C

I found a very interesting thread on the GameDev site, link below:
https://gamedev.stackexchange.com/a/43704
I would like to implement this formula to draw a figure-eight/infinity sign in a view, but I don't see how I can do this.
Can someone give me a clue on how to start the code?
Thanks for reading,
Given the parametric representation
scale = 2 / (3 - cos(2*t));
x = scale * cos(t);
y = scale * sin(2*t) / 2;
it's quite straightforward to write the code that draws the figure. What you do is start the variable t at 0, and increment it in a loop by a small value (say 0.05) each iteration until it reaches 2*PI. At each step, draw a line from the previous (x, y) point to the next calculated point. This will be a short line for each step, but together they will form the curved figure.
You can fiddle with the increment value to generate a figure that looks good for your application.
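The loop described above can be sketched as follows. This is a Python illustration of the point generation only; the function name `lemniscate_points` and the `segments` parameter are made up for the example, and it uses a fixed number of segments rather than a fixed increment so that the last point lands exactly on t = 2*PI. The same loop carries over to Objective-C, with each point fed to whatever path-drawing API the view uses.

```python
import math

def lemniscate_points(segments=126):
    """Return (x, y) points along the figure for t in [0, 2*pi]."""
    points = []
    for i in range(segments + 1):
        t = 2 * math.pi * i / segments
        scale = 2 / (3 - math.cos(2 * t))
        x = scale * math.cos(t)
        y = scale * math.sin(2 * t) / 2
        points.append((x, y))
    return points

# Consecutive pairs are the short line segments to draw.
pts = lemniscate_points()
segments_to_draw = list(zip(pts, pts[1:]))
```

About 126 segments corresponds to an increment of roughly 0.05; a smaller count gives a visibly polygonal figure, a larger one a smoother curve.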

How the window function works in STFT

Can anyone experienced in signal processing and STFT explain to me why the window function in the image posted below is a function of (t-t'), given that t is the total time and t' is the width of the window?
I cannot figure it out because, initially, the window is located at t=0. If the window length is, for example, 3, then the window spans from t=0 to t=3, and if the total time is, for example, T = 10, then the window function would be something like w(T-3), which is 7?! I really cannot understand it, and I believe there is something hidden that I am not grasping.
Please clarify and guide me. Thanks
Image:
Note that the width of the window function is constant throughout the entire STFT process, and the time (t) in the function g(t-t') indicates that t is the current time on the time axis; it varies each time the window is moved/shifted to the right to overlap the main signal.
In other words, and I hope this clarifies things, the "t" at the end of the time axis is NOT the "t" in the function g(t-t'). As I stated earlier, in the function g(t-t'), t is the current time on the time axis and varies with each shift of the window function, and t' is the width of the window and is constant throughout the entire STFT process.
t is your time variable, not the total time.
t' is not the width of the window, it is the integration variable in the integral, and the integral is missing a dt' at the right end.
g(x) is the window function, and the width of it is not defined above, but represented as the width of the light blue bell in the figure.
The image may have a different interpretation, but it may be wrong; if you apply these adjustments:
Swap the labels t and t' on the horizontal axis.
Change x(t) with x(t') on the vertical axis.
you are now looking at x(t') (black line) and at g(t-t') (upper contour of the light-blue zone) for a FIXED time t. The bell-shaped window function is centered around t, and the product of the bell and of the signal is the function of which you are calculating the Fourier transform in the equation, and it is non-zero only in proximity of the fixed value of t. Consistently, the quantity is the 'local', i.e. short-time, Fourier transform of the signal, in the vicinity of the fixed time t.
You can do the same for all values of t (with a different figure for each value, with a bell shifted to the left/right), and obtain the STFT.
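To make the roles of t and t' concrete, here is a small numerical sketch in Python. Since the image is not reproduced here, it assumes the usual form, the integral over t' of x(t') g(t - t') exp(-i*omega*t'), and a Gaussian bell for g. The names `gaussian_window` and `stft_at`, the width `sigma`, and the step `dt` are all illustrative choices, not taken from the question.

```python
import cmath
import math

def gaussian_window(x, sigma=1.0):
    """Bell-shaped window g(x), centred on x = 0 (assumed shape)."""
    return math.exp(-0.5 * (x / sigma) ** 2)

def stft_at(signal, t, omega, dt=0.01, t_end=10.0, sigma=1.0):
    """Riemann-sum approximation of the STFT at one FIXED time t."""
    total = 0j
    steps = int(round(t_end / dt))
    for k in range(steps):
        t_prime = k * dt  # integration variable, swept over the whole signal
        total += (signal(t_prime)
                  * gaussian_window(t - t_prime, sigma)  # bell centred on t
                  * cmath.exp(-1j * omega * t_prime)
                  * dt)
    return total

def late_tone(t_prime):
    """Example signal: a 5 Hz tone that only exists between 6 s and 10 s."""
    return math.sin(2 * math.pi * 5 * t_prime) if t_prime >= 6.0 else 0.0

omega = 2 * math.pi * 5
near = abs(stft_at(late_tone, t=8.0, omega=omega, sigma=0.5))
far = abs(stft_at(late_tone, t=2.0, omega=omega, sigma=0.5))
```

Here `near` is much larger than `far`: with the window centred at t = 8 the product of bell and signal is non-zero, while at t = 2 the bell sits over the silent part of the signal. Calling `stft_at` for many values of t is the "different figure for each value" described above.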

How to move image with low values?

The problem is simple: I want to move (and later, be able to rotate) an image. For example, every time I press the right arrow on my keyboard, I want the image to move 0.12 pixels to the right, and every time I press the left arrow key, I want the image to move 0.12 pixels to the left.
Now, I have multiple solutions for this:
1) simply add the incremental value, i.e.:
image.x += 0.12;
this is of course assuming that we're going to the right.
2) multiply the value of a single increment by the number of times I have already moved in this particular direction, plus 1, like this:
var result:Number = 0.12 * (numberOfTimesWentRight+1);
image.x = result;
Both of these approaches work but produce similar, yet subtly different, results. If we add some kind of button component that simply resets the x and y coordinates of the image, you will see that with the first approach the numbers don't add up correctly.
It goes from .12, .24, .359999, .475 etc.
But with the second approach it works well. (It's pretty obvious why, though: it seems that += operations with Numbers are not really precise.)
Why not use the second approach then? Well, I want to rotate the image as well. This will work for the first attempt, but after that the image will jump around. Why? In the second approach we never took the original position of the image into account. So if the origin point shifts a bit down or up because you rotated your image, and THEN you try to move the image again, it will move to the same position as if you hadn't rotated before.
Alright, to make this short:
How can I reliably move, scale and rotate images by 1/10 of a pixel?
Short answer: I don't know! You're fighting with floating point math!
Luckily, I have a workaround, if you don't mind.
You store the location (x and y) of the image in a separate variable... at a larger scale. Such as 100x. So 123.45 becomes 12345, and you then divide by 100 to set the attribute that flash uses to display.
Yes, there are limits to number sizes too, but if you're willing to accept some error rate, and the fact that you'll be limited to, I dunno, a million pixels in each direction, you can fit it in a regular int. The only rounding error you will encounter will be a single rounding error when you divide by 100 (or the factor you used). So instead of the compound rounding error which you described (0.12 * 4 = 0.475), you should see things like 0.47999999. Which doesn't matter because it's, well, so small.
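A minimal sketch of that workaround, written in Python rather than ActionScript for brevity. The class name `ScaledPosition` and the factor of 100 are illustrative; the idea is only that the stored integer is the source of truth and the displayed value is derived from it with a single division.

```python
SCALE = 100  # stored units per pixel (the 100x factor from the example above)

class ScaledPosition:
    """Keeps a coordinate as an integer number of 1/SCALE pixels."""

    def __init__(self, pixels=0.0):
        self.units = round(pixels * SCALE)

    def move(self, pixels):
        # Integer addition is exact, so repeated moves do not drift.
        self.units += round(pixels * SCALE)

    @property
    def pixels(self):
        # The only rounding happens here, once, when converting for display.
        return self.units / SCALE

pos = ScaledPosition()
for _ in range(4):
    pos.move(0.12)
# pos.units is now exactly 48; pos.pixels is what you assign to image.x
```

Because the offset is added to whatever the stored position currently is, this keeps working after the origin has shifted, unlike multiplying the increment by a move counter.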
To expand on @Pimgd's answer a bit, you're probably hitting a floating point error (repeated += operations accumulate more error than a single multiplication) - Numbers in Flash have 53-bit precision.
There's also another thing to keep in mind, which is probably playing a bigger role with such small movement values: Flash positions all objects using twips, each of which is 1/20th of a pixel, or 0.05, so all values are rounded to this. When you say image.x += 0.12, it's actually the equivalent of image.x += 0.10, which is where the difference becomes apparent; you're losing 0.02 of a pixel with every move.
You should be able to get around it by moving to another scale, as @Pimgd says, or just storing your position separately - i.e. work from a property _x rather than image.x so you're not losing that precision every time:
this._x += 0.12;
image.x = this._x;

What equation do endless running games use to set the player's speed?

I'm making an endless running game (e.g. canabalt, temple run, Jetpack Joyride) and I'm trying to get the "feel" of it right. So far, I'm using the following equation to set the speed:
speed = (time+500)*(.05+(time/300))
Any tips for how to make the increase feel just right, other than trial and error?
Well, I did something similar in one of my games but I did not increase speed constantly, I increased it once every minute or once the player reaches a certain amount of points. Like so:
- (void)setTravelTimeTo:(NSNumber*)targetTime
{
    if (maxTravelTime > targetTime.floatValue)
    {
        maxTravelTime -= 0.1f;
        [self performSelector:@selector(setTravelTimeTo:) withObject:targetTime afterDelay:2];
    }
}
Where maxTravelTime is the time or in your case speed. Just modify it to suit your needs. The travel time in this case was the time a moving platform needed to get across the whole screen.
Hope it helps.
Generally you are going to accumulate the speed and position as you go. So something like
a = <some function of current speed (drag), player actions, and terrain>
v = v + a*deltaTime
x = x + v*deltaTime
DeltaTime is just the time since the last computation - possibly the last frame. An implication of this is that v should be at most linear with time (not quadratic as in your formula). Position is at most quadratic. The computation for "a" should ensure that as v approaches some maximum speed (possibly level dependent), "a" goes to zero.
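One possible shape for that update, as a Python sketch. The acceleration rule used here (proportional to the gap between the current and maximum speed) is only one way to make "a" go to zero near the cap; `v_max`, `accel_gain`, and the starting speed are hypothetical tuning values, not something taken from the games mentioned.

```python
def step(v, x, dt, v_max=20.0, accel_gain=0.5):
    """Advance speed and position by one frame of length dt seconds."""
    # Acceleration shrinks toward zero as v approaches v_max.
    a = accel_gain * (v_max - v)
    v = v + a * dt
    x = x + v * dt
    return v, x

# Simulate one minute at 60 frames per second.
v, x = 5.0, 0.0
speeds = []
for _ in range(60 * 60):
    v, x = step(v, x, dt=1 / 60)
    speeds.append(v)
```

The speed rises quickly at first and then levels off just below `v_max`, so tuning the feel comes down to two numbers: how fast the game gets (`v_max`) and how quickly it gets there (`accel_gain`).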
