Can't replicate RStan ESS code from Vehtari paper

Can't replicate RStan ESS code from Vehtari paper - mcmc

I am trying to replicate an ESS (effective sample size) calculation using the method of Vehtari et al. in: Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC
I am working from the code here:
https://github.com/avehtari/rhat_ess/blob/master/code/monitornew.R
# Geyer's initial positive sequence
rho_hat_t <- rep.int(0, n_samples)
t <- 0
rho_hat_even <- 1
rho_hat_t[t + 1] <- rho_hat_even
rho_hat_odd <- 1 - (mean_var - mean(acov[t + 2, ])) / var_plus # 251
rho_hat_t[t + 2] <- rho_hat_odd
while (t < nrow(acov) - 5 && !is.nan(rho_hat_even + rho_hat_odd) &&
(rho_hat_even + rho_hat_odd > 0)) {
t <- t + 2
rho_hat_even = 1 - (mean_var - mean(acov[t + 1, ])) / var_plus # 256
rho_hat_odd = 1 - (mean_var - mean(acov[t + 2, ])) / var_plus # 257
if ((rho_hat_even + rho_hat_odd) >= 0) {
rho_hat_t[t + 1] <- rho_hat_even
rho_hat_t[t + 2] <- rho_hat_odd
}
}
I can follow the code from the paper except when we get to equation 10 in the paper (calculating the cross-chain autocorrelation). The code (lines 251, 256 and 257) appears in the form:
1 - (mean_var - mean(acov[t + 1, ])) / var_plus
which is close to equation 10, except the missing the 's' terms from equation 10:
I can't see anywhere in the code that this is somehow accounted for elsewhere in the way the calculation is being done. I have tried putting the 's' terms back into those lines of code and it makes a big difference to the final ESS value.
Is anyone able to help me understand the discrepancy between paper and code?
Thanks.

In the formula in the paper, s^2 is is the estimate of variance and rho the estimate of autocorrelation. Thus s^2 * rho is an estimate of the autocovariance, which is what you see in the code.

Related

How to find and update levels accordingly based on points?

I am creating a rails application which is like a game. So it has points and levels. For example: to become level one the user has to get atleast 100 points and again for level two the user has to reach level 2 the user has to collect 200 points. The level difference changes after every 10 levels i.e., The difference between each level changes after 10 levels always. By that I mean the difference in points between level one and two is 100 and the difference in points in level 11 and 12 is 150 and so on. There is no upper bound for levels.
Now my question is let's say a user's total points is 3150 and just got updated to 3155. What's the optimal solution to find the current level and update it if needed?
I can get a solution using while loops and again looping inside it which will give a result in O(n^2). I need something better.
I think this code works but I'm not sure if this is the best way to go about it
def get_level(points)
diff = 100
sum = 0
level = -1
current_level = 0
while level.negative?
10.times do |i|
current_level += 1
sum += diff
if points > sum
next
elsif points <= sum
level = current_level
break
end
end
diff += 50
end
puts level
end

I wrote a get_points function (it should not be difficult). Then based on it get_level function in which it was necessary to solve the quadratic equation to find high value, and then calc low.
If you have any questions, let me know.
Check output here.
#!/usr/bin/env python3
import math
def get_points(level):
high = (level + 1) // 10
low = (level + 1) % 10
high_point = 250 * high * high + 750 * high # (3 + high) * high // 2 * 500
low_point = (100 + 50 * high) * low
return low_point + high_point
def get_level(points):
# quadratic equation
a = 250
b = 750
c = -points
d = b * b - 4 * a * c
x = (-b + math.sqrt(d)) / (2 * a)
high = int(x)
remainder = points - (250 * high * high + 750 * high)
low = remainder // (100 + 50 * high)
level = high * 10 + low
return level
def main():
for l in range(0, 40):
print(f'{l:3d} {get_points(l - 1):5d}..{get_points(l) - 1}')
for level, (l, r) in (
(1, (100, 199)),
(2, (200, 299)),
(9, (900, 999)),
(10, (1000, 1149)),
(11, (1150, 1299)),
(19, (2350, 2499)),
(20, (2500, 2699)),
):
for p in range(l, r + 1): # for in [l, r]
assert get_level(p) == level, f'{p} {l}'
if __name__ == '__main__':
main()
Why did you set the value of a=250 and b = 750? Can you explain that to me please?
Let's write out every 10 level and the difference between points:
lvl - pnt (+delta)
10 - 1000 (+1000 = +100 * 10)
20 - 2500 (+1500 = +150 * 10)
30 - 4500 (+2000 = +200 * 10)
40 - 7000 (+2500 = +250 * 10)
Divide by 500 (10 levels * 50 difference changes) and received an arithmetic progression starting at 2:
10 - 2 (+2)
20 - 5 (+3)
30 - 9 (+4)
40 - 14 (+5)
Use arithmetic progression get points formula for level = k * 10 equal to:
sum(x for x in 2..k+1) * 500 =
(2 + k + 1) * k / 2 * 500 =
(3 + k) * k * 250 =
250 * k * k + 750 * k
Now we have points and want to find the maximum high such that point >= 250 * high^2 + 750 * high, i. e. 250 * high^2 + 750 * high - points <= 0. Value a = 250 is positive and branches of the parabola are directed up. Now we find the solution of quadratic equation 250 * high^2 + 750 * high - points = 0 and discard the real part (is high = int(x) in python script).

Should I exit my gradient descent loop as soon as the cost increases?

I'm trying to learn machine learning so I'm taking a course and currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned by the cost function should continuously decrease until convergence. When I imagine this being done in a loop of code, it seems like I could just keep track of what the cost was in the previous iteration and exit the loop if the new cost is greater than the previous, since this tells us the learning rate is too large. I'd like to hear opinions since I'm new to this, but in an effort to not make this question primarily opinion-based my main question is this: Would there be anything wrong with this method of detecting a learning rate that needs to be decreased? I'd appreciate an example of when this method would fail, if possible.

In this example below, we will vary the learning rate eta = 10^k with k={-6,-5,-4,...0}
def f(x):
return 100 * (x[ 0] *x[0] - x[ 1]) **2 + (x[ 0] -1) **2
def df(x):
a = x[ 0] *x[0] - x[ 1]
ret = np.zeros(2)
ret[ 0] = 400 * a * x[0] + 2 * (x[0] - 1)
ret[ 1] = -200 * a
return ret
for k in range(-6, 0):
eta = math.pow(10.0, k)
print("eta: " + str(eta))
x = -np.ones(2)
for iter in range(1000000):
fx = f(x)
if fx < 1e-10:
print(" solved after " + str(iter) + " iterations; f(x) = " + str(f(x)))
break
if fx > 1e10:
print(" divergence detected after " + str(iter) + " iterations; f(x) = " +
str(f(x)))
break
g = df(x)
x -= eta * g
if iter == 999999:
print(" not solved; f(x) = " + str(f(x)))
For too small learning rates, the optimization is very slow and the problem is not solved within the iteration budget.
For too large learning rates, the optimization process becomes unstable and diverges very quickly. The learning rate must be "just right" for the optimization process to work well.

Calculating Gradient Update

Lets say I want to manually calculate the gradient update with respect to the Kullback-Liebler divergence loss, say on a VAE (see an actual example from pytorch sample documentation here):
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
where the logvar is (for simplicitys sake, ignoring activation functions and multiple layers etc.) basically a single layer transformation from a 400 dim feature vector into a 20 dim one:
self.fc21 = nn.Linear(400, 20)
logvar = fc21(x)
I'm just not mathematically understanding how you take the gradient of this, with respect to the weight vector for fc21. Mathematically I thought this would look like:
KL = -.5sum(1 + Wx + b - m^2 - e^{Wx + b})
dKL/dW = -.5 (x - e^{Wx + b}x)
where W is the weight matrix of the fc21 layer. But here this result isn't in the same shape as W (20x400). Like, x is just a 400 feature vector. So how would I perform SGD on this? Does x just broadcast to the second term, and if so why? I feel like I'm just missing some mathematical understanding here...

Let's simplify the example a bit and assume a fully connected layer of input shape 3 and output shape 2, then:
W = [[w1, w2, w3], [w4, w5, w6]]
x = [x1, x2, x3]
y = [w1*x1 + w2*x2 + w3*x3, w4*x1 + w5*x2 + w6*x3]
D_KL = -0.5 * [ 1 + w1*x1 + w2*x2 + w3*x3 + w4*x1 + w5*x2 + w6*x3 + b - m^2 + e^(..)]
grad(D_KL, w1) = -0.5 * [x1 + x1* e^(..)]
grad(D_KL, w2) = -0.5 * [x2 + x2* e^(..)]
...
grad(D_KL, W) = [[grad(D_KL, w1), grad(D_KL, w2), grad(D_KL,w3)],
[grad(D_KL, w4), grad(D_KL, w5), grad(D_KL,w6)]
]
This generalizes for higher order tensors of any dimensionality. Your differentiation is wrong in treating x and W as scalars rather than taking element-wise partial derivatives.

Sage: Polynomial ring over finite field - inverting a polynomial non-prime

I'm trying to recreate the wiki's example procedure, available here:
https://en.wikipedia.org/wiki/NTRUEncrypt
I've run into an issue while attempting to invert the polynomials.
The SAGE code below seems to be working fine for the given p=3, which is a prime number.
However, the representation of the polynomial in the field generated by q=32 ends up wrong, because it behaves as if the modulus was 2.
Here's the code in play:
F = PolynomialRing(GF(32),'a')
a = F.gen()
Ring = F.quotient(a^11 - 1, 'x')
x = Ring.gen()
pollist = [-1, 1, 1, 0, -1, 0, 1, 0, 0, 1, -1]
fq = Ring(pollist)
print(fq)
print(fq^(-1))
The Ring is described as follows:
Univariate Quotient Polynomial Ring in x over Finite Field in z5 of size 2^5 with modulus a^11 + 1
And the result:
x^10 + x^9 + x^6 + x^4 + x^2 + x + 1
x^5 + x + 1
I've tried to replace the Finite Field with IntegerModRing(32), but the inversion ends up demanding a field, as implied by the message:
NotImplementedError: The base ring (=Ring of integers modulo 32) is not a field
Any suggestions as to how I could obtain the correct inverse of f (mod q) would be greatly appreciated.

GF(32) is the finite field with 32 elements, not the integers modulo 32. You must use Zmod(32) (or IntegerModRing(32), as you suggested) instead.
As you point out, Sage psychotically bans you from computing inverses in ℤ/32ℤ[a]/(a¹¹-1) because that is not a field, and not even a factorial ring. It can, however, compute those inverses when they exist, only you must ask more kindly:
sage: F.<a> = Zmod(32)[]
sage: fq = F([-1, 1, 1, 0, -1, 0, 1, 0, 0, 1, -1])
sage: print(fq)
31*a^10 + a^9 + a^6 + 31*a^4 + a^2 + a + 31
sage: print(fq.inverse_mod(a^11 - 1))
16*a^8 + 4*a^7 + 10*a^5 + 28*a^4 + 9*a^3 + 13*a^2 + 21*a + 1
Not ideal, admittedly.

Triangular Vertices - Lua calculation?

I am current determined to complete a problem in Lua and have no idea where to begin. I was thinking about beginning with a modulus operator. I am searching for advice from experienced Lua programmers on how to program this and mainly how I can calculate the theoretically mathematical side of the problem.
Source of the question... (http://www.eecs.qmul.ac.uk/~pbo/ACM/archive/00209.html)
Gratitude will be shown to anyone who answers correctly.
Thank-you.

function get_left(max)
i = 0
j = 1
ls = {}
repeat
i = i + 1
j = j + i - 1
ls[i] = j
print (ls[i], " ls")
until j >= max
return ls
end
a = get_left(27)
(need to format that as code -.-)
if the point1 is between a[i] and a[i+1]
if then point2 is still between a[i] and a[i+1] it is on the same line
else if point2 is between a[i + n] and a[i + n + 1]
then if point2 is at a[i + n] + (a[i] - point1) + n + 1 its in a strait line right above it
else if point2 is at a[i + n] + (a[i] - point1) + n - 1 its in a strait line left above it
if for all points n or n*(-1) is equal the distance between the points is equal.
That is if I didn't make any logical errors and you probably have to check more for it to work properly.
This is more a mathematical question than lua I recommend adding a math tag to it.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Can't replicate RStan ESS code from Vehtari paper - mcmc

In the formula in the paper, s^2 is is the estimate of variance and rho the estimate of autocorrelation. Thus s^2 * rho is an estimate of the autocovariance, which is what you see in the code.

Related

How to find and update levels accordingly based on points?

Should I exit my gradient descent loop as soon as the cost increases?

Calculating Gradient Update

Sage: Polynomial ring over finite field - inverting a polynomial non-prime

Triangular Vertices - Lua calculation?

Categories

Resources