How can I perform sensitivity analysis on ODEs with SALib?

from numpy import linspace
from scipy.integrate import odeint

def iron(XYZ, t, a12, a21, a23, a32, b13, b31, I):
    X1, X2, X3 = XYZ
    dX1 = -a12*X1 + a21*X2 - b13*X1 + b31*X3
    dX2 = -a23*X2 - a21*X2 + a12*X1 + a32*X3
    dX3 = -a32*X3 - b31*X3 + a23*X2 + b13*X1 - I
    return dX1, dX2, dX3

a12 = 0.0005
a21 = 0.00001
a23 = 0.0003
a32 = 0.0002
b13 = 0.0001
b31 = 0.000001
I = 0.001
XYZ0 = [1000., 30., 10.]
X10 = 1000.
X20 = 50.
X30 = 30.
t = linspace(0, 100, 1000)  # (start, stop, number of samples to generate)
XYZ = odeint(iron, XYZ0, t, args=(a12, a21, a23, a32, b13, b31, I))
Is it possible to perform sensitivity analysis on this system of ODEs using SALib? I want to study the influence of the model inputs (the a and b parameters and the initial conditions). Also, can I obtain the asymptotic solution values?

The following code is a sample implementation of the problem for Sobol analysis. The bounds for each parameter would have to be adjusted based on the problem; I assumed a range for the example. Provided there is an asymptotic solution that can be found generally as a function of the parameters (a12, a21, ..., b13, b31, I), you could follow a similar procedure. Determining the asymptotic solution is probably better posted as a separate question.
The X1, X2, and X3 values at the final time step must each be analyzed separately by the Sobol method. The sensitivity indices could also be computed for X1, X2, and X3 at every time step, but that would require saving all the output from each iteration of the loop and running the Sobol analysis many more times (a sketch of that variant is given after the example code below).
Some of the sample output for the following code example is:
====X2 Sobol output====
Parameter S1 S1_conf ST ST_conf
a12 0.409635 0.083264 0.411180 0.049683
a21 0.000002 0.000095 0.000001 0.000000
a23 -0.000626 0.002955 0.000471 0.000057
a32 0.000068 0.000504 0.000017 0.000002
b13 0.000045 0.000232 0.000004 0.000001
b31 0.000000 0.000000 0.000000 0.000000
x1_0 0.430008 0.078269 0.434074 0.053487
x2_0 0.169098 0.051591 0.162944 0.018678
x3_0 -0.000038 0.000335 0.000007 0.000001
Example Code Implementation
# importing packages
from scipy import integrate as sp
import numpy as np
import SALib
from SALib.sample import saltelli
from SALib.analyze import sobol

# definition of the system of ODEs
def iron(XYZ, t, a12, a21, a23, a32, b13, b31, I):
    X1, X2, X3 = XYZ
    dX1 = -a12*X1 + a21*X2 - b13*X1 + b31*X3
    dX2 = -a23*X2 - a21*X2 + a12*X1 + a32*X3
    dX3 = -a32*X3 - b31*X3 + a23*X2 + b13*X1 - I
    return dX1, dX2, dX3

# default parameter values
a12 = 0.0005
a21 = 0.00001
a23 = 0.0003
a32 = 0.0002
b13 = 0.0001
b31 = 0.000001
I = 0.001

# initial condition
XYZ0 = [1000., 30., 10.]
X10 = 1000.
X20 = 50.
X30 = 30.

# time steps
t = np.linspace(0, 100, 1000)  # (start, stop, number of samples to generate)

# example single calculation
XYZ = sp.odeint(iron, XYZ0, t, args=(a12, a21, a23, a32, b13, b31, I))

### Sobol analysis ###
# defining the problem
# the 'I' parameter could be added as well
# the range for each parameter is assumed to be 80-120% of the value above; this can be changed
problem = {
    'num_vars': 9,  # a's, b's and initial condition
    'names': ['a12', 'a21', 'a23', 'a32', 'b13', 'b31', 'x1_0', 'x2_0', 'x3_0'],
    'bounds': np.column_stack((
        np.array([a12, a21, a23, a32, b13, b31, XYZ0[0], XYZ0[1], XYZ0[2]]) * 0.8,
        np.array([a12, a21, a23, a32, b13, b31, XYZ0[0], XYZ0[1], XYZ0[2]]) * 1.2))
}

# generate samples
vals = saltelli.sample(problem, 500)

# run model (example)
# numerically solves the ODE for each parameter sample
# output is X1, X2, and X3 at the final time step
# the output for all time steps could be saved if desired, but requires more memory
# initializing matrix to store output
Y = np.zeros([len(vals), 3])
for i in range(len(vals)):
    Y[i][:] = sp.odeint(iron, [vals[i][6], vals[i][7], vals[i][8]], t,
                        args=(vals[i][0], vals[i][1], vals[i][2],
                              vals[i][3], vals[i][4], vals[i][5], I))[-1]

# completing the Sobol analysis for each of X1, X2, and X3
print('\n\n====X1 Sobol output====\n\n')
Si_X1 = sobol.analyze(problem, Y[:, 0], print_to_console=True)
print('\n\n====X2 Sobol output====\n\n')
Si_X2 = sobol.analyze(problem, Y[:, 1], print_to_console=True)
print('\n\n====X3 Sobol output====\n\n')
Si_X3 = sobol.analyze(problem, Y[:, 2], print_to_console=True)
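As a minimal sketch of the time-resolved variant mentioned above (not part of the original example, and assuming the definitions from the code block just shown), the full X2 trajectory can be stored for every parameter sample and sobol.analyze called once per time step; this is considerably slower and more memory-hungry than analyzing only the final state:
# store the X2 trajectory for every parameter sample
Y_t = np.zeros([len(vals), len(t)])
for i in range(len(vals)):
    sol = sp.odeint(iron, [vals[i][6], vals[i][7], vals[i][8]], t,
                    args=(vals[i][0], vals[i][1], vals[i][2],
                          vals[i][3], vals[i][4], vals[i][5], I))
    Y_t[i, :] = sol[:, 1]  # X2 at every time step for sample i
# first-order Sobol index of each parameter at each time step
S1_t = np.array([sobol.analyze(problem, Y_t[:, j])['S1'] for j in range(len(t))])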

Related

MCMC for multiple coefficients with normal (Gaussian) distributions

I have a linear model as follows: Acceleration = C1*V + C2*(X - D), where D = alpha - beta*V + gamma*M.
Note that the values for V, X, and M are given in the dataset.
My goal is to run MCMC 350 times for each of the following coefficients: C1, C2, alpha, beta, gamma.
1- I have the mean and standard deviation for C1, C2, alpha, beta, and gamma.
2- All coefficients (C1, C2, alpha, beta, gamma) are normally distributed.
I have tried two methods to get MCMC samples for each coefficient. One uses pymc3 (which I'm not sure I have done correctly); the other defines a likelihood function following the method described in the link below by Jonny Homfmeister (in my case, I changed the distribution from binomial to normal Gaussian):
https://towardsdatascience.com/bayesian-inference-and-markov-chain-monte-carlo-sampling-in-python-bada1beabca7
The problem is that, after running MCMC for C1, C2, alpha, beta, and gamma, and using the mean of the posterior (the MCMC output) in my main model, I see that the absolute error has increased! This means the coefficients have not been optimized by MCMC and my method does not work properly.
I would appreciate it if someone could help me with a correct MCMC algorithm for a normal distribution.
#### First method: pymc3 ##########
import pymc3 as pm
import scipy.stats as st
import arviz as az

for row in range(350):
    X_c1 = st.norm(loc=-0.06, scale=0.47).rvs(size=100)
    with pm.Model() as model:
        prior = pm.Normal('c1', mu=-0.06, sd=0.47)                #### prior #### weights
        obs = pm.Normal('obs', mu=prior, sd=0.47, observed=X_c1)  #### likelihood
        step = pm.Metropolis()
        trace_c1 = pm.sample(draws=30, chains=2, step=step, return_inferencedata=True)
    ### calculate the mean of the output (posterior distribution) ###
    mean_c1 = az.summary(trace_c1, var_names=["c1"], round_to=2).iloc[0][['mean']]
    mean_c1 = mean_c1.to_numpy()
    Acceleration = (mean_c1 * V) + C2 * (X - D)  ###### apply model
###### second method, from the link above ##########
import numpy as np
import scipy.stats

## Define the likelihood P(x|p) - normal distribution
def likelihood(p):
    return scipy.stats.norm.cdf(C1, loc=-0.06, scale=0.47)

def prior(p):
    return scipy.stats.norm.pdf(p)

def acceptance_ratio(p, p_new):
    # Return R, using the functions we created before
    return min(1, ((likelihood(p_new) / likelihood(p)) * (prior(p_new) / prior(p))))

p = np.random.normal(C1, 0.47)  # Initialize a value of p

#### Define model parameters
n_samples = 790  ################# I HAVE NO IDEA HOW TO CHOOSE THIS VALUE????###########
burn_in = 99
lag = 2
results_1 = []

##### Create the MCMC loop
for i in range(n_samples):
    p_new = np.random.random_sample()  ### Propose a new value of p randomly from a normal distribution
    R = acceptance_ratio(p, p_new)     #### Compute acceptance probability
    u = np.random.uniform(0, 1)        #### Draw random sample to compare R to
    if u < R:                          ##### If R is greater than u, accept the new value of p
        p = p_new
    if i > burn_in and i % lag == 0:   #### Record values after burn-in - how often is determined by lag
        results_1.append(p)
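For comparison (this is not from the original post), here is a minimal textbook random-walk Metropolis sketch for a single coefficient with a normal prior and normal likelihood; the data, prior mean, and scales are illustrative assumptions, and the key differences from the code above are that it works with log-densities (pdf, not cdf) and proposes new values from a distribution centred on the current value:
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=-0.06, scale=0.47, size=100)   # assumed observations

def log_post(p):
    # log prior + log likelihood, both evaluated as log-pdfs
    return (stats.norm.logpdf(p, loc=-0.06, scale=0.47)
            + np.sum(stats.norm.logpdf(data, loc=p, scale=0.47)))

p = -0.06                 # start at the prior mean
samples = []
for i in range(5000):
    p_new = p + rng.normal(scale=0.1)   # random-walk proposal centred on current p
    if np.log(rng.uniform()) < log_post(p_new) - log_post(p):
        p = p_new                       # accept
    if i > 500 and i % 2 == 0:          # burn-in and thinning
        samples.append(p)

posterior_mean = np.mean(samples)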

FedProx with TensorFlow Federated

Would anyone know how to implement the FedProx optimisation algorithm with TensorFlow Federated? The only implementation that seems to be available online was developed directly with TensorFlow. A TFF implementation would enable an easier comparison with experiments that use FedAvg, which the framework supports.
This is the link to the FedProx repo: https://github.com/litian96/FedProx
Link to the paper: https://arxiv.org/abs/1812.06127
At this moment, a FedProx implementation is not available. I agree it would be a valuable algorithm to have.
If you are interested in contributing FedProx, the best place to start would be simple_fedavg, which is a minimal implementation of FedAvg meant as a starting point for extensions -- see the readme there for more details.
I think the major change would need to happen in the client_update method, where you would add the proximal term, which depends on model_weights and initial_weights, to the loss computed in the forward pass.
Below is my implementation of FedProx in TFF. I am not 100% sure that this is the right implementation; I post this answer also to discuss the actual code example.
I tried to follow the suggestions in Jacub Konecny's answer and comment.
Starting from simple_fedavg (referring to the TFF GitHub repo), I just modified the client_update method, specifically changing the input argument used for calculating the gradient with the GradientTape: instead of passing only outputs.loss, the tape computes the gradient of outputs.loss + proximal_term, where the proximal term is computed beforehand (and iteratively).
@tf.function
def client_update(model, dataset, server_message, client_optimizer):
  """Performs client local training of `model` on `dataset`.

  Args:
    model: A `tff.learning.Model`.
    dataset: A `tf.data.Dataset`.
    server_message: A `BroadcastMessage` from server.
    client_optimizer: A `tf.keras.optimizers.Optimizer`.

  Returns:
    A `ClientOutput`.
  """

  def difference_model_norm_2_square(global_model, local_model):
    """Calculates the squared l2 norm of a model difference (i.e.
    local_model - global_model)

    Args:
      global_model: the model broadcast by the server
      local_model: the current, in-training model

    Returns: the squared norm
    """
    model_difference = tf.nest.map_structure(lambda a, b: a - b,
                                             local_model,
                                             global_model)
    squared_norm = tf.square(tf.linalg.global_norm(model_difference))
    return squared_norm

  model_weights = model.weights
  initial_weights = server_message.model_weights
  tf.nest.map_structure(lambda v, t: v.assign(t), model_weights,
                        initial_weights)

  num_examples = tf.constant(0, dtype=tf.int32)
  loss_sum = tf.constant(0, dtype=tf.float32)
  # Explicit use of `iter` for the dataset is a trick that makes TFF more robust
  # in GPU simulation and slightly more performant in the unconventional usage
  # of a large number of small datasets.
  for batch in iter(dataset):
    with tf.GradientTape() as tape:
      outputs = model.forward_pass(batch)

      # ------ FedProx ------
      mu = tf.constant(0.2, dtype=tf.float32)
      prox_term = (mu / 2) * difference_model_norm_2_square(
          model_weights.trainable, initial_weights.trainable)
      fedprox_loss = outputs.loss + prox_term

    # Letting GradientTape deal with the FedProx loss
    grads = tape.gradient(fedprox_loss, model_weights.trainable)
    client_optimizer.apply_gradients(zip(grads, model_weights.trainable))

    batch_size = tf.shape(batch['x'])[0]
    num_examples += batch_size
    loss_sum += outputs.loss * tf.cast(batch_size, tf.float32)

  weights_delta = tf.nest.map_structure(lambda a, b: a - b,
                                        model_weights.trainable,
                                        initial_weights.trainable)
  client_weight = tf.cast(num_examples, tf.float32)
  return ClientOutput(weights_delta, client_weight, loss_sum / client_weight)

Importance weighted autoencoder doing worse than VAE

I've been implementing VAE and IWAE models on the Caltech Silhouettes dataset and am having an issue where the VAE outperforms the IWAE by a modest margin (test LL ~120 for VAE, ~133 for IWAE!). I don't believe this should be the case, according to both theory and the experiments produced here.
I'm hoping someone can find some issue in my implementation that's causing this to be the case.
The network I'm using to approximate q and p is the same as that detailed in the appendix of the paper above. The calculation part of the model is below:
data_k_vec = data.repeat_interleave(K, 0)  # Generate K samples (in my case K=50 is producing this behavior)
mu, log_std = model.encode(data_k_vec)
z = model.reparameterize(mu, log_std)      # z = mu + torch.exp(log_std)*epsilon (epsilon ~ N(0,1))
decoded = model.decode(z)                  # this is the sigmoid output of the model
log_prior_z = torch.sum(-0.5 * z ** 2, 1) - .5 * z.shape[1] * T.log(torch.tensor(2 * np.pi))
log_q_z = compute_log_probability_gaussian(z, mu, log_std)  # Definitions below
log_p_x = compute_log_probability_bernoulli(decoded, data_k_vec)
if model_type == 'iwae':
    log_w_matrix = (log_prior_z + log_p_x - log_q_z).view(-1, K)
elif model_type == 'vae':
    log_w_matrix = (log_prior_z + log_p_x - log_q_z).view(-1, 1) * 1 / K
log_w_minus_max = log_w_matrix - torch.max(log_w_matrix, 1, keepdim=True)[0]
ws_matrix = torch.exp(log_w_minus_max)
ws_norm = ws_matrix / torch.sum(ws_matrix, 1, keepdim=True)
ws_sum_per_datapoint = torch.sum(log_w_matrix * ws_norm, 1)
loss = -torch.sum(ws_sum_per_datapoint)  # value of loss that gets returned to training function; loss.backward() will get called on this value
Here are the likelihood functions. I had to fuss with the Bernoulli LL in order to not get NaN during training:
def compute_log_probability_gaussian(obs, mu, logstd, axis=1):
    return torch.sum(-0.5 * ((obs - mu) / torch.exp(logstd)) ** 2 - logstd, axis) - .5 * obs.shape[1] * T.log(torch.tensor(2 * np.pi))

def compute_log_probability_bernoulli(theta, obs, axis=1):  # Add 1e-18 to avoid nan appearances in training
    return torch.sum(obs * torch.log(theta + 1e-18) + (1 - obs) * torch.log(1 - theta + 1e-18), axis)
In this code there's a "shortcut" being used: in the model_type=='iwae' case the row-wise importance weights are calculated over the K=50 samples in each row, while in the model_type=='vae' case the importance weights are calculated for the single value left in each row, so it just ends up computing a weight of 1. Maybe this is the issue?
Any and all help is huge - I thought that addressing the NaN issue would permanently get me out of the weeds, but now I have this new problem.
EDIT:
I should add that the training scheme is the same as that in the paper linked above. That is, for each round i = 0, ..., 7, train for 2**i epochs with a learning rate of 1e-4 * 10**(-i/7).
The K-sample importance weighted ELBO is
$$ \textrm{IW-ELBO}(x,K) = \log \sum_{k=1}^K \frac{p(x \vert z_k) p(z_k)}{q(z_k;x)}$$
For the IWAE there are K samples originating from each datapoint x, so you want the same latent statistics mu_z, Sigma_z obtained through the amortized inference network, but you sample z K times for each x.
So it's computationally wasteful to compute the forward pass for data_k_vec = data.repeat_interleave(K,0); you should compute the forward pass once for each original datapoint, then repeat the statistics output by the inference network for sampling:
mu = torch.repeat_interleave(mu,K,0)
log_std = torch.repeat_interleave(log_std,K,0)
Then sample z_k. Now repeat your datapoints, data_k_vec = data.repeat_interleave(K,0), and use the resulting tensor to efficiently evaluate the conditional p(x|z_k) for each importance sample z_k (a short sketch of these steps follows).
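For concreteness, a minimal sketch of those two steps using the names from the question's code (model.decode and the surrounding shapes are assumptions about the setup):
z = mu + torch.exp(log_std) * torch.randn_like(mu)   # sample z_k from q(z|x), K rows per datapoint
data_k_vec = data.repeat_interleave(K, 0)            # repeat x so each z_k is paired with its datapoint
decoded = model.decode(z)                            # parameters of p(x | z_k) for each importance sample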
Note you may also want to use the logsumexp operation when calculating the IW-ELBO for numerical stability. I can't quite figure out what's going on with the log_w_matrix calculation in your post, but this is what I would do:
log_pz = ...
log_qzCx = ....
log_pxCz = ...
log_iw = log_pxCz + log_pz - log_qzCx
log_iw = log_iw.reshape(-1, K)
iwelbo = torch.logsumexp(log_iw, dim=1) - np.log(K)
EDIT: Actually, after thinking about it a bit and using the score function identity, you can interpret the IWAE gradient as an importance-weighted estimate of the standard single-sample gradient, so the method in the OP for calculating the importance weights is equivalent (if a bit wasteful), provided you place a stop_gradient operator around the normalized importance weights, which you call ws_norm. So I think the main problem is the absence of this stop_gradient operator.
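In PyTorch the stop_gradient would be a .detach(); a minimal sketch (not a full fix, just the relevant lines applied to the question's code) would be:
ws_norm = ws_matrix / torch.sum(ws_matrix, 1, keepdim=True)
ws_sum_per_datapoint = torch.sum(log_w_matrix * ws_norm.detach(), 1)  # no gradient flows through the weights
loss = -torch.sum(ws_sum_per_datapoint)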

How does one make sure that the parameters are updated manually in PyTorch using modules?

I wanted to update the parameters of a model manually with PyTorch. I made a super simple standard sequential model (full code here), but whenever I try to train my model it does not train unless I create the actual variables explicitly (code for model variables explicitly). With the sequential model the code looks as follows:
mdl_sgd = torch.nn.Sequential( torch.nn.Linear(D_sgd,1,bias=False) )
...
for i in range(nb_iter):
    # Forward pass: compute predicted Y using operations on Variables
    batch_xs, batch_ys = get_batch2(X,Y,M,dtype)  # [M, D], [M, 1]
    ## FORWARD PASS
    y_pred = mdl_sgd.forward(X)
    ## LOSS
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    ## Manually zero the gradients after updating weights
    mdl_sgd.zero_grad()
    ## BACKWARD PASS
    loss.backward()  # Use autograd to compute the backward pass. Now w will have gradients
    ## SGD update
    for W in mdl_sgd.parameters():
        #print(W.grad.data)
        W.data = W.data - eta*W.grad.data
When I train it, it seems that nothing happens. I've tried many things to make this work, like wrapping it in a class, putting explicit requires_grad=True, or changing the locations where I zero out the gradients, etc., but nothing seems to work. What I really want/need is to be able to do the update rule myself explicitly (not with an optimizer). I'm not sure if that's the reason it doesn't work, but the following does work for some reason:
X = poly_kernel_matrix(x_true,Degree_mdl)  # maps to the feature space of the model
X = Variable(torch.FloatTensor(X).type(dtype), requires_grad=False)
Y = Variable(torch.FloatTensor(Y).type(dtype), requires_grad=False)
w_init = torch.randn(D_sgd,1).type(dtype)
W = Variable(w_init, requires_grad=True)
...
for i in range(nb_iter):
    # Forward pass: compute predicted Y using operations on Variables
    batch_xs, batch_ys = get_batch2(X,Y,M,dtype)  # [M, D], [M, 1]
    ## FORWARD PASS
    #y_pred = mdl_sgd.forward(X)
    y_pred = batch_xs.mm(W)
    ## LOSS
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    ## BACKWARD PASS
    loss.backward()  # Use autograd to compute the backward pass. Now w will have gradients
    ## SGD update
    W.data = W.data - eta*W.grad.data
    ## Manually zero the gradients after updating weights
    #mdl_sgd.zero_grad()
    W.grad.data.zero_()
The reason I know this is that the plot of the regression lines looks sensible:
while when I use torch.nn.Sequential I get:
I am sure it's a really newbie question, but I am not sure why I can't update the parameters. Does someone know why? I want to be able to update the parameters manually (however I want), and in this case I decided to use SGD to see if I could even update the parameters.
Note I also tried subclassing modules and registering params but it didn't work either. This is the class I built:
class regression_NN(torch.nn.Module):
    def __init__(self,w_init):
        """
        """
        super(type(self), self).__init__()
        # mdl
        #self.W = Variable(w_init, requires_grad=True)
        #self.W = torch.nn.Parameter( Variable(w_init, requires_grad=True) )
        #self.W = torch.nn.Parameter( w_init )
        self.W = torch.nn.Parameter( w_init,requires_grad=True )
        #self.mod_list = torch.nn.ModuleList([self.W])
    def forward(self, x):
        """
        """
        y_pred = x.mm(self.W)
        return y_pred
All code is:
https://github.com/brando90/simple_regression
I'm relatively new to PyTorch so I might have many bad practices... you can correct them if you want, but I'm mostly concerned that my parameters are not updating even when I try to explicitly register them in a class that inherits from torch.nn.Module.
I also linked to the question from the pytorch official forum: https://discuss.pytorch.org/t/how-does-one-make-sure-that-the-parameters-are-update-manually-in-pytorch-using-modules/6076
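For reference (this is not from the original post), a minimal sketch of how a manual SGD step over a module's parameters can be written in current PyTorch, wrapping the in-place update in torch.no_grad() so autograd does not track it; the model, data, and learning rate eta here are illustrative:
import torch

mdl = torch.nn.Sequential(torch.nn.Linear(4, 1, bias=False))
x, y = torch.randn(8, 4), torch.randn(8, 1)
eta = 0.01

y_pred = mdl(x)                      # forward pass on the minibatch
loss = (y_pred - y).pow(2).mean()    # MSE loss
mdl.zero_grad()                      # clear old gradients
loss.backward()                      # populate .grad for every parameter
with torch.no_grad():
    for W in mdl.parameters():
        W -= eta * W.grad            # in-place update, not tracked by autograd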

How to stack autoencoders / create a deep autoencoder with a Theano class

I understand the concept behind stacked/deep autoencoders and therefore want to implement one with the following code of a single-layer denoising autoencoder. Theano also provides a tutorial for a stacked autoencoder, but that one is trained in a supervised fashion - I need to stack it to establish unsupervised (hierarchical) feature learning.
Any idea how to get this working with the following code? (A sketch of one possible stacking approach is given after the class definition.)
import os
import sys
import timeit

import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

from logistic_sgd import load_data
from utils import tile_raster_images

try:
    import PIL.Image as Image
except ImportError:
    import Image
class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoder tries to reconstruct the input from a corrupted
    version of it by projecting it first into a latent space and reprojecting
    it afterwards back into the input space. Please refer to Vincent et al., 2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        x = s(W' y + b')                                                 (3)

        L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)]      (4)
    """
    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input), the number of hidden units (the dimension
        d' of the latent or hidden space) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        biases. Such symbolic variables are useful when, for example, the input
        is the result of some computations, or when weights are shared between
        the dA and an MLP layer. When dealing with SdAs this always happens:
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of bias values (for
                     hidden units) that should be shared between the dA and
                     another architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of bias values (for
                     visible units) that should be shared between the dA and
                     another architecture; if dA should be standalone set this to None
        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W`, which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) to
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so that
            # the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if not bvis:
            bvis = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

        if not bhid:
            bhid = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )

        self.W = W
        # b corresponds to the bias of the hidden units
        self.b = bhid
        # b_prime corresponds to the bias of the visible units
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transposed
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several
            # examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]
    def get_corrupted_input(self, input, corruption_level):
        """This function keeps ``1-corruption_level`` entries of the inputs the
        same and zeroes out a randomly selected subset of size ``corruption_level``.

        Note : the first argument of theano_rng.binomial is the shape (size) of
               the random numbers that it should produce;
               the second argument is the number of trials;
               the third argument is the probability of success of any trial.

               This will produce an array of 0s and 1s where 1 has a
               probability of 1 - ``corruption_level`` and 0 has a probability
               of ``corruption_level``.

               The binomial function returns the int64 data type by default.
               int64 multiplied by the input type (floatX) always returns
               float64. To keep all data in floatX when floatX is float32,
               we set the dtype of the binomial to floatX. As in our case the
               value of the binomial is always 0 or 1, this doesn't change the
               result. This is needed to allow the GPU to work correctly, as it
               only supports float32 for now.
        """
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input
    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """Computes the reconstructed input given the values of the
        hidden layer
        """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """
        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using
        #        minibatches, L will be a vector, with one entry per
        #        example in the minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)
        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [
            (param, param - learning_rate * gparam)
            for param, gparam in zip(self.params, gparams)
        ]
        return (cost, updates)
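As mentioned above, here is a rough sketch (not from the original code) of one way to stack two dA layers for greedy, unsupervised layer-wise pretraining: the second dA takes the first dA's hidden representation as its symbolic input, and each layer is trained in turn on its own reconstruction cost. The dataset name, batch_size, layer sizes, corruption levels, and epoch counts are assumptions following the Theano tutorial conventions.
datasets = load_data('mnist.pkl.gz')          # assumed dataset
train_set_x, train_set_y = datasets[0]
batch_size = 20
n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size

x = T.matrix('x')
index = T.lscalar('index')
rng = numpy.random.RandomState(123)
theano_rng = RandomStreams(rng.randint(2 ** 30))

# first layer: visible -> hidden1
dA1 = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
         n_visible=784, n_hidden=500)
# second layer: hidden1 -> hidden2, fed by the (uncorrupted) hidden code of dA1
dA2 = dA(numpy_rng=rng, theano_rng=theano_rng,
         input=dA1.get_hidden_values(x),
         n_visible=500, n_hidden=250)

# greedy layer-wise unsupervised pretraining: train dA1 first, then dA2
for layer in (dA1, dA2):
    cost, updates = layer.get_cost_updates(corruption_level=0.3,
                                           learning_rate=0.1)
    train_layer = theano.function(
        [index], cost, updates=updates,
        givens={x: train_set_x[index * batch_size:(index + 1) * batch_size]})
    for epoch in range(15):
        for batch_index in range(n_train_batches):
            train_layer(batch_index)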
