Time series simulation (Monte Carlo) code - time-series

I'm trying to make a Monte Carlo Simulation with time series and I can't get what I'm doing wrong due to my little knowledge in Stata.
With my ARMA model I'm able to create a time series with 300 observations.
My idea is to do this process 1000 times and in every process I want to save the mean and variance of the 300 observations in a matrix.
This is my ARMA model:
This is my code:
clear all
set more off
set matsize 1000
matrix simulaciones =J(1000,2,0) *To save every simulation of every time series generated
matrix serie = J(300,3,0) *To save each time series
set obs 300 *For the 300 observations in every time series
gen t = _n
tsset t
g y1=0
forvalue j = 1(1)1000{
* Creating a time series
forvalues i = 1(1)300 {
gen e = rnormal(0,1)
replace y1=0 if t==1
replace y1 = 0.7*L1.y1 + e - 0.6*L1.e if t == 2
replace y1 = 0.7*L1.y1 - 0.1*L2.y1 + e - 0.6*L1.e + 0.08*L2.e if t > 2
matrix serie[`i',3] = y1
drop e y1
}
svmat serie
matrix simulaciones[`j',1] = mean(y1)
matrix simulaciones[`j',2] = var(y1)
}
I have no idea how to follow and any idea or recommendation is more than welcomed.
Thanks a lot for your help and time.

Related

Linear Regression not optimizing for non linear data

I am new to ML and am trying my hands on Linear regression. I am using this dataset. The data and my "optimized" model look like this:
I am modifying the data like this:
X = np.vstack((np.ones((X.size)),X,X**2))
Y = np.log10 (Y)
#have tried roots of Y and 3 degree feature as well
Intial cost: 0.8086672720475084
Optimized cost: 0.7282965408177141
I am unable to optimize further no matter the no. of runs.
Increasing learning rate causes increase in cost.
My rest algorithm is fine as I am able to optimize for a simpler dataset. Shown Below:
Sorry, If this is something basic but I can't seem to find a way to optimize my model for original data.
EDIT:
Pls have look at my code, I don't why its not working
def GradientDescent(X,Y,theta,alpha):
m = X.shape[1]
h = Predict(X,theta)
gradient = np.dot(X,(h - Y))
gradient.shape = (gradient.size,1)
gradient = gradient/m
theta = theta - alpha*gradient
cost = CostFunction(X,Y,theta)
return theta,cost
def CostFunction(X,Y,theta):
m = X.shape[1]
h = Predict(X,theta)
cost = h - Y
cost = np.sum(np.square(cost))/(2*m)
return cost
def Predict(X,theta):
h = np.transpose(X).dot(theta)
return h
x is 2,333
y is 333,1
I tried debugging it again but I can't find it. Pls help me.

Understanding code wrt Logistic Regression using gradient descent

I was following Siraj Raval's videos on logistic regression using gradient descent :
1) Link to longer video :
https://www.youtube.com/watch?v=XdM6ER7zTLk&t=2686s
2) Link to shorter video :
https://www.youtube.com/watch?v=xRJCOz3AfYY&list=PL2-dafEMk2A7mu0bSksCGMJEmeddU_H4D
In the videos he talks about using gradient descent to reduce the error for a set number of iterations so that the function converges(slope becomes zero).
He also illustrates the process via code. The following are the two main functions from the code :
def step_gradient(b_current, m_current, points, learningRate):
b_gradient = 0
m_gradient = 0
N = float(len(points))
for i in range(0, len(points)):
x = points[i, 0]
y = points[i, 1]
b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
new_b = b_current - (learningRate * b_gradient)
new_m = m_current - (learningRate * m_gradient)
return [new_b, new_m]
def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
b = starting_b
m = starting_m
for i in range(num_iterations):
b, m = step_gradient(b, m, array(points), learning_rate)
return [b, m]
#The above functions are called below:
learning_rate = 0.0001
initial_b = 0 # initial y-intercept guess
initial_m = 0 # initial slope guess
num_iterations = 1000
[b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
# code taken from Siraj Raval's github page
Why does the value of b & m continue to update for all the iterations? After a certain number of iterations, the function will converge, when we find the values of b & m that give slope = 0.
So why do we continue iteration after that point and continue updating b & m ?
This way, aren't we losing the 'correct' b & m values? How is learning rate helping the convergence process if we continue to update values after converging? Thus, why is there no check for convergence, and so how is this actually working?
In practice, most likely you will not reach to slope 0 exactly. Thinking of your loss function as a bowl. If your learning rate is too high, it is possible to overshoot over the lowest point of the bowl. On the contrary, if the learning rate is too low, your learning will become too slow and won't reach the lowest point of the bowl before all iterations are done.
That's why in machine learning, the learning rate is an important hyperparameter to tune.
Actually, once we reach a slope 0; b_gradient and m_gradient will become 0;
thus, for :
new_b = b_current - (learningRate * b_gradient)
new_m = m_current - (learningRate * m_gradient)
new_b and new_m will remain the old correct values; as nothing will be subtracted from them.

Theano gradient doesn't work with .sum(), only .mean()?

I'm trying to learn theano and decided to implement linear regression (using their Logistic Regression from the tutorial as a template). I'm getting a wierd thing where T.grad doesn't work if my cost function uses .sum(), but does work if my cost function uses .mean(). Code snippet:
(THIS DOESN'T WORK, RESULTS IN A W VECTOR FULL OF NANs):
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name="b")
# now we do the actual expressions
h = T.dot(x,w) + b # prediction is dot product plus bias
single_error = .5 * ((h - y)**2)
cost = single_error.sum()
gw, gb = T.grad(cost, [w,b])
train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb)))
predict = theano.function(inputs=[x], outputs=h)
for i in range(training_steps):
pred, err = train(D[0], D[1])
(THIS DOES WORK, PERFECTLY):
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name="b")
# now we do the actual expressions
h = T.dot(x,w) + b # prediction is dot product plus bias
single_error = .5 * ((h - y)**2)
cost = single_error.mean()
gw, gb = T.grad(cost, [w,b])
train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb)))
predict = theano.function(inputs=[x], outputs=h)
for i in range(training_steps):
pred, err = train(D[0], D[1])
The only difference is in the cost = single_error.sum() vs single_error.mean(). What I don't understand is that the gradient should be the exact same in both cases (one is just a scaled version of the other). So what gives?
The learning rate (0.1) is way to big. Using mean make it divided by the batch size, so this help. But I'm pretty sure you should make it much smaller. Not just dividing by the batch size (which is equivalent to using mean).
Try a learning rate of 0.001.
Try dividing your gradient descent step size by the number of training examples.

logistic regression with gradient descent error

I am trying to implement logistic regression with gradient descent,
I get my Cost function j_theta for the number of iterations and fortunately my j_theta is decreasing when plotted j_theta against the number of iteration.
The data set I use is given below:
x=
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y= 0
1
1
1
0
1
0
0
0
0
1
The code that I managed to write for logistic regression using Gradient descent is:
%1. The below code would load the data present in your desktop to the octave memory
x=load('stud_marks.dat');
%y=load('ex4y.dat');
y=x(:,3);
x=x(:,1:2);
%2. Now we want to add a column x0 with all the rows as value 1 into the matrix.
%First take the length
[m,n]=size(x);
x=[ones(m,1),x];
X=x;
% Now we limit the x1 and x2 we need to leave or skip the first column x0 because they should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);
% We will not use vectorized technique, Because its hard to debug, We shall try using many for loops rather
max_iter=50;
theta = zeros(size(x(1,:)))';
j_theta=zeros(max_iter,1);
for num_iter=1:max_iter
% We calculate the cost Function
j_cost_each=0;
alpha=1;
theta
for i=1:m
z=0;
for j=1:n+1
% theta(j)
z=z+(theta(j)*x(i,j));
z
end
h= 1.0 ./(1.0 + exp(-z));
j_cost_each=j_cost_each + ( (-y(i) * log(h)) - ((1-y(i)) * log(1-h)) );
% j_cost_each
end
j_theta(num_iter)=(1/m) * j_cost_each;
for j=1:n+1
grad(j) = 0;
for i=1:m
z=(x(i,:)*theta);
z
h=1.0 ./ (1.0 + exp(-z));
h
grad(j) += (h-y(i)) * x(i,j);
end
grad(j)=grad(j)/m;
grad(j)
theta(j)=theta(j)- alpha * grad(j);
end
end
figure
plot(0:1999, j_theta(1:2000), 'b', 'LineWidth', 2)
hold off
figure
%3. In this step we will plot the graph for the given input data set just to see how is the distribution of the two class.
pos = find(y == 1); % This will take the postion or array number from y for all the class that has value 1
neg = find(y == 0); % Similarly this will take the position or array number from y for all class that has value 0
% Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+');
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('y1 marks in subject 2')
legend('pass', 'Failed')
plot_x = [min(x(:,2))-2, max(x(:,2))+2]; % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off
%%%%%%% The only difference is In the last plot I used X where as now I use x whose attributes or features are featured scaled %%%%%%%%%%%
If you view the graph of x1 vs x2 the graph would look like,
After I run my code I create a decision boundary. The shape of the decision line seems to be okay but it is a bit displaced. The graph of the x1 vs x2 with decision boundary is given below:
![enter image description here][2]
Please suggest me where am I going wrong ....
Thanks:)
The New Graph::::
![enter image description here][1]
If you see the new graph the coordinated of x axis have changed ..... Thats because I use x(feature scalled) instead of X.
The problem lies in your cost function calculation and/or gradient calculation, your plotting function is fine. I ran your dataset on the algorithm I implemented for logistic regression but using the vectorized technique because in my opinion it is easier to debug.
The final values I got for theta were
theta =
[-76.4242,
0.8214,
0.7948]
I also used alpha = 0.3
I plotted the decision boundary and it looks fine, I would recommend using the vectorized form as it is easier to implement and to debug in my opinion.
I also think your implementation of gradient descent is not quite correct. 50 iterations is just not enough and the cost at the last iteration is not good enough. Maybe you should try to run it for more iterations with a stopping condition.
Also check this lecture for optimization techniques.
https://class.coursera.org/ml-006/lecture/37

combine time series plot by using R

I wanna combine three graphics on one graph. The data from inside of R which is " nottem ". Can someone help me to write code to put a seasonal mean and harmonic (cosine model) and its time series plots together by using different colors? I already wrote model code just don't know how to combine them together to compare.
Code :library(TSA)
nottem
month.=season(nottem)
model=lm(nottem~month.-1)
summary(nottem)
har.=harmonic(nottem,1)
model1=lm(nottem~har.)
summary(model1)
plot(nottem,type="l",ylab="Average monthly temperature at Nottingham castle")
points(y=nottem,x=time(nottem), pch=as.vector(season(nottem)))
Just put your time series inside a matrix:
x = cbind(serie1 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2)),
serie2 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2)))
plot(x)
Or configure the plot region:
par(mfrow = c(2, 1)) # 2 rows, 1 column
serie1 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2))
serie2 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2))
require(zoo)
plot(serie1)
lines(rollapply(serie1, width = 10, FUN = mean), col = 'red')
plot(serie2)
lines(rollapply(serie2, width = 10, FUN = mean), col = 'blue')
hope it helps.
PS.: zoo package is not needed in this example, you could use the filter function.
You can extract the seasonal mean with:
s.mean = tapply(serie, cycle(serie), mean)
# January, assuming serie is monthly data
print(s.mean[1])
This graph is pretty hard to read, because your three sets of values are so similar. Still, if you want to simply want to graph all of these on the sample plot, you can do it pretty easily by using the coefficients generated by your models.
Step 1: Plot the raw data. This comes from your original code.
plot(nottem,type="l",ylab="Average monthly temperature at Nottingham castle")
Step 2: Set up x-values for the mean and cosine plots.
x <- seq(1920, (1940 - 1/12), by=1/12)
Step 3: Plot the seasonal means by repeating the coefficients from the first model.
lines(x=x, y=rep(model$coefficients, 20), col="blue")
Step 4: Calculate the y-values for the cosine function using the coefficients from the second model, and then plot.
y <- model1$coefficients[2] * cos(2 * pi * x) + model1$coefficients[1]
lines(x=x, y=y, col="red")
ggplot variant: If you decide to switch to the popular 'ggplot2' package for your plot, you would do it like so:
x <- seq(1920, (1940 - 1/12), by=1/12)
y.seas.mean <- rep(model$coefficients, 20)
y.har.cos <- model1$coefficients[2] * cos(2 * pi * x) + model1$coefficients[1]
plot_Data <- melt(data.frame(x=x, temp=nottem, seas.mean=y.seas.mean, har.cos=y.har.cos), id="x")
ggplot(plot_Data, aes(x=x, y=value, col=variable)) + geom_line()

Resources