Combine time series plots in R - time-series

I want to combine three plots on one graph. The data is the built-in R dataset "nottem". Can someone help me write code to put the seasonal means model, the harmonic (cosine) model, and the time series plot together, using different colors? I have already written the model code; I just don't know how to combine the plots so I can compare them.
Code:
library(TSA)
nottem
month.=season(nottem)
model=lm(nottem~month.-1)
summary(nottem)
har.=harmonic(nottem,1)
model1=lm(nottem~har.)
summary(model1)
plot(nottem,type="l",ylab="Average monthly temperature at Nottingham castle")
points(y=nottem,x=time(nottem), pch=as.vector(season(nottem)))

Just put your time series inside a matrix:
x = cbind(serie1 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2)),
serie2 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2)))
plot(x)
Or configure the plot region:
par(mfrow = c(2, 1)) # 2 rows, 1 column
serie1 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2))
serie2 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2))
require(zoo)
plot(serie1)
lines(rollapply(serie1, width = 10, FUN = mean), col = 'red')
plot(serie2)
lines(rollapply(serie2, width = 10, FUN = mean), col = 'blue')
Hope it helps.
PS: the zoo package is not strictly needed in this example; you could use the filter function instead.
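As a minimal sketch of the filter alternative mentioned in the PS, assuming a plain 10-point moving average is what you want: stats::filter with equal weights gives a centred moving average as a ts object, with NA at the ends where the window does not fit. The seed and the name ma10 are only for illustration.

```r
# Moving average with base R only (no zoo); illustrative sketch
set.seed(1)  # arbitrary seed so the example is reproducible
serie1 <- ts(cumsum(rnorm(100)), frequency = 12, start = c(2013, 2))

# Equal weights of 1/10 over 10 points; sides = 2 centres the window
ma10 <- stats::filter(serie1, rep(1 / 10, 10), sides = 2)

plot(serie1)
lines(ma10, col = "red")  # the NA values at both ends are simply not drawn
```

Because the window is centred, the smoothed line is shorter than the series by 9 points in total.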
You can extract the seasonal mean with:
s.mean = tapply(serie1, cycle(serie1), mean)
# January, assuming serie1 is monthly data
print(s.mean[1])

This graph is pretty hard to read, because your three sets of values are so similar. Still, if you simply want to graph all of these on the same plot, you can do it pretty easily by using the coefficients generated by your models.
Step 1: Plot the raw data. This comes from your original code.
plot(nottem,type="l",ylab="Average monthly temperature at Nottingham castle")
Step 2: Set up x-values for the mean and cosine plots.
x <- seq(1920, (1940 - 1/12), by=1/12)
Step 3: Plot the seasonal means by repeating the coefficients from the first model.
lines(x=x, y=rep(model$coefficients, 20), col="blue")
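Step 4: Calculate the y-values for the harmonic model using the coefficients from the second model, and then plot. Note that harmonic(nottem,1) produces both a cosine and a sine term, so model1 has three coefficients (intercept, cosine, sine) and all three are needed.
y <- model1$coefficients[1] + model1$coefficients[2] * cos(2 * pi * x) + model1$coefficients[3] * sin(2 * pi * x)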
lines(x=x, y=y, col="red")
ggplot variant: If you decide to switch to the popular 'ggplot2' package for your plot, you would do it like so (melt comes from the 'reshape2' package):
library(ggplot2)
library(reshape2)
x <- seq(1920, (1940 - 1/12), by=1/12)
y.seas.mean <- rep(model$coefficients, 20)
y.har.cos <- model1$coefficients[1] + model1$coefficients[2] * cos(2 * pi * x) + model1$coefficients[3] * sin(2 * pi * x)
plot_Data <- melt(data.frame(x=x, temp=as.numeric(nottem), seas.mean=y.seas.mean, har.cos=y.har.cos), id="x")
ggplot(plot_Data, aes(x=x, y=value, col=variable)) + geom_line()
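If you would rather not rebuild the curves from coefficients by hand, here is a minimal base-R sketch that overlays the fitted values directly. It is an assumption-laden stand-in, not the TSA code from the question: factor(cycle(nottem)) stands in for season(nottem), and explicit cos/sin terms stand in for harmonic(nottem, 1). The names temp, tt, seas_fit and har_fit are made up for this example.

```r
# Overlay seasonal-means and harmonic fits on nottem using fitted()
temp  <- as.numeric(nottem)        # observed temperatures as a plain vector
tt    <- as.numeric(time(nottem))  # time in years, e.g. 1920.000, 1920.083, ...
month <- factor(cycle(nottem))     # month index 1..12 (stand-in for TSA::season)

# Seasonal means model: one coefficient per month, no intercept
seas_fit <- lm(temp ~ month - 1)

# First-order harmonic model: intercept + cosine + sine at one cycle per year
har_fit <- lm(temp ~ cos(2 * pi * tt) + sin(2 * pi * tt))

plot(nottem, ylab = "Average monthly temperature at Nottingham castle")
lines(tt, fitted(seas_fit), col = "blue")
lines(tt, fitted(har_fit), col = "red")
legend("topleft", legend = c("observed", "seasonal means", "harmonic"),
       col = c("black", "blue", "red"), lty = 1, bty = "n")
```

Using fitted() avoids having to keep track of which coefficient belongs to which term.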

Related

Timeseries forecasting - Two values for slope

I am trying to do the classic method of decomposition of time series. I have somehow managed to get myself to the very last step where I'm supposed to calculate the Trend from the Trend-Cycle series using slope and intercept but I get 2 values instead of one for some reason. Why does slope contain two values?
pkg load io;
pkg load financial;
data = xlsread('exerciseinfo.xlsx','Φύλλο1','A1:B60');
t = data(:,1);
Y = data(:,2);
# Plot of the data
plot(t,Y);
title('Car arrivals per month');
xlabel('Month');
ylabel('Arrivals');
average = mean(Y);
# MA(5): 5-term moving average
M = movmean(Y,5);
plot(t,M);
title('MA(5)');
xlabel('Month');
ylabel('Arrivals');
# Double MA 3x5
doublema = movmean(M,3);
plot(t,doublema);
title('Double MA 5x3');
xlabel('Month');
ylabel('Arrivals');
# Centered MA 2x4
MA4 = movmean(Y,4);
CMA = movmean(MA4,2);
plot(t,CMA);
title('Centered MA 2x4');
xlabel('Month');
ylabel('Arrivals');
# Decomposition step 2
Step2 = Y./CMA;
# Decomposition step 3
Step3 = mean(Step2);
# Decomposition step 4 - deseasonalized series
Step4 = Y./Step3;
# Decomposition step 5 - removal of randomness
MA3 = movmean(Step4,3);
Step5 = movmean(MA3,3);
# Decomposition step 6
slope = polyfit(t,Step5,1)
I don't speak Greek, so I'm going by the code alone: two values is exactly what polyfit should give you when you pass it the degree 1. You're fitting a polynomial of degree 1, i.e. an equation of the form y = a*t + b, and polyfit fits both a and b. It returns the coefficients in order of decreasing degree, and Octave indexes from 1, so slope(1) is your "slope" and slope(2) is your "intercept". To reconstruct your line, use y = slope(1)*t + slope(2), or equivalently y = polyval(slope, t).

Time series simulation (Monte Carlo) code

I'm trying to make a Monte Carlo Simulation with time series and I can't get what I'm doing wrong due to my little knowledge in Stata.
With my ARMA model I'm able to create a time series with 300 observations.
My idea is to do this process 1000 times and in every process I want to save the mean and variance of the 300 observations in a matrix.
This is my ARMA model:
This is my code:
clear all
set more off
set matsize 1000
matrix simulaciones =J(1000,2,0) *To save every simulation of every time series generated
matrix serie = J(300,3,0) *To save each time series
set obs 300 *For the 300 observations in every time series
gen t = _n
tsset t
g y1=0
forvalue j = 1(1)1000{
* Creating a time series
forvalues i = 1(1)300 {
gen e = rnormal(0,1)
replace y1=0 if t==1
replace y1 = 0.7*L1.y1 + e - 0.6*L1.e if t == 2
replace y1 = 0.7*L1.y1 - 0.1*L2.y1 + e - 0.6*L1.e + 0.08*L2.e if t > 2
matrix serie[`i',3] = y1
drop e y1
}
svmat serie
matrix simulaciones[`j',1] = mean(y1)
matrix simulaciones[`j',2] = var(y1)
}
I have no idea how to proceed, and any idea or recommendation is more than welcome.
Thanks a lot for your help and time.
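For comparison, here is a minimal sketch of the procedure described above, written in R rather than Stata. It assumes the model is the ARMA(2,2) implied by the replace lines in the code (AR coefficients 0.7 and -0.1, MA coefficients -0.6 and 0.08) with standard normal errors. Note that arima.sim uses a burn-in period instead of starting the series at zero, so it is not a line-by-line translation. The names n_sims, n_obs and arma_spec are made up for this example.

```r
# Monte Carlo sketch: simulate 1000 ARMA series of 300 observations each
# and store the mean and variance of every series
set.seed(123)  # arbitrary seed for reproducibility

n_sims <- 1000
n_obs  <- 300
arma_spec <- list(ar = c(0.7, -0.1), ma = c(-0.6, 0.08))

# One row per simulation: column 1 = mean, column 2 = variance
simulaciones <- matrix(NA_real_, nrow = n_sims, ncol = 2,
                       dimnames = list(NULL, c("mean", "variance")))

for (j in seq_len(n_sims)) {
  y <- as.numeric(arima.sim(model = arma_spec, n = n_obs))
  simulaciones[j, ] <- c(mean(y), var(y))
}

head(simulaciones)
colMeans(simulaciones)  # average of the 1000 means and of the 1000 variances
```

The key structural point is that each replication generates a fresh series and writes exactly one row of summary statistics, rather than trying to store every observation.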

Hide p-value and add stars to significant ORs in gtsummary

I'm using gtsummary package.
I need to merge several univariate logistic regressions and, to get a clean presentation, I want to hide the p-value and bold or add a star to the significant ORs (p < 0.05).
Can anyone help me?
Maybe it's easier to use another presentation type like kable or huxtable? I don't know.
Thank you for your help.
Have a nice day
There is a function called add_significance_stars() that hides the p-value and adds stars to the estimate indicating various levels of statistical significance. I've also added code to bold the estimate if significant with modify_table_styling().
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.4.0'
tbl <-
trial %>%
select(death, age, grade) %>%
tbl_uvregression(
y = death,
method = glm,
method.args = list(family = binomial),
exponentiate = TRUE
) %>%
# add significance stars to sig estimates
add_significance_stars() %>%
# additionally bolding significant estimates
modify_table_styling(
columns = estimate,
rows = p.value < 0.05,
text_format = "bold"
)
Created on 2021-04-14 by the reprex package (v2.0.0)
Here's a quick huxtable version:
l1 <- glm(I(cyl==8) ~ gear, data = mtcars, family = binomial)
l2 <- glm(I(cyl==8) ~ carb, data = mtcars, family = binomial)
huxtable::huxreg(l1, l2, statistics = "nobs", bold_signif = 0.05)
────────────────────────────────────────────────────
                        (1)            (2)
                 ───────────────────────────────────
  (Intercept)         5.999 *       -1.880 *
                     (2.465)        (0.902)
  gear               -1.736 *
                     (0.693)
  carb                               0.579 *
                                    (0.293)
                 ───────────────────────────────────
  nobs                   32             32
────────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.

Column names: names, model1, model2
It doesn't show it here, but the significant coefficients are bold on screen (and in any other kind of output).

glmnet: Fit a GLMM with lasso or ridge and add binomial cloglog link

How can one specify link functions in glmnet for lasso / ridge / elastic net regression?
I have found the following post but not sure this helps me when I need to specify a cloglog link.
How to specify log link in glmnet?
I have a survey data set with binary response 0/1 (disease no/yes) and several predictor variables, which are mostly binary categorical (yes/no, male/female), some are counts (herd size), and a few are categorical with several levels.
I previously ran a generalized linear mixed model using the glmer() function with the binomial family and link = cloglog, because doing so gave exactly the interpretation of the intercept that I wanted. In a disease study, the intercept from this setup is equivalent to the mean 'force of infection' (the rate at which susceptibles become infected) across the variation specified in the random effect, which in my case is the geographic unit (village, subvillage, or household).
As there are several survey variables now available to me, I wanted to try a lasso and a ridge regression using glmnet. It is my understanding that I should best do this by putting in the glmm formula into the glmnet. However, I cannot find any documentation about how to add a link. I did so, in the syntax I thought would work, and it did run. But it also ran with nonsense entered in the link function.
Here is a reproducible example:
library(msm)
library(glmnet)
set.seed(1)
N = 1000
X = cbind( rbinom(n=N,size=1,prob=0.5), rnorm(n=N) )
beta = c(-0.1,0.1)
phi.true = exp( X%*%beta )
p = 1 - exp(-phi.true)
y = rbinom(n=N,size=1,prob = p)
dat <- data.frame(x=X,y=y)
x <- model.matrix(y~., dat)
glmnet(x, y, family="binomial", link="logit", alpha = 1, lambda = 2)
I get the same output whether I put in 'logit', 'cloglog', or even an arbitrary name like 'adam'. I also cannot use the same syntax as for the GLMM, because in glmnet the family argument must be a character vector.
OUTPUT:
> glmnet(x, y, family="binomial"(link="logit"), alpha = 1, lambda = 2)
Error in match.arg(family) : 'arg' must be NULL or a character vector
> glmnet(x, y, family="binomial", link="logit", alpha = 1, lambda = 2)
Call: glmnet(x = x, y = y, family = "binomial", alpha = 1, lambda = 2, link = "logit")
Df %Dev Lambda
1 0 -7.12e-15 2
> glmnet(x, y, family="binomial", link="cloglog", alpha = 1, lambda = 2)
Call: glmnet(x = x, y = y, family = "binomial", alpha = 1, lambda = 2, link = "cloglog")
Df %Dev Lambda
1 0 -7.12e-15 2
> glmnet(x, y, family="binomial", link="adam", alpha = 1, lambda = 2)
Call: glmnet(x = x, y = y, family = "binomial", alpha = 1, lambda = 2, link = "adam")
Df %Dev Lambda
1 0 -7.12e-15 2
Is it not possible to change the default link function for binomial family in glmnet?
I think you want to pass a family object instead of a character string: family = binomial(link = "cloglog"). This is supported from glmnet 4.0 onwards.
See the new glmnet vignette: https://cran.r-project.org/web/packages/glmnet/vignettes/glmnetFamily.pdf
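A minimal sketch of that suggestion, assuming glmnet 4.0 or later is installed. The data generation follows the example in the question, except that the predictor matrix is passed directly instead of going through model.matrix, and lambda is left at its default path. The column names x1 and x2 are made up for this example.

```r
# Lasso with a cloglog link by passing a family object to glmnet (needs glmnet >= 4.0)
library(glmnet)

set.seed(1)
N <- 1000
X <- cbind(x1 = rbinom(n = N, size = 1, prob = 0.5), x2 = rnorm(n = N))
beta <- c(-0.1, 0.1)
phi.true <- exp(X %*% beta)
p <- 1 - exp(-phi.true)          # cloglog model: P(y = 1) = 1 - exp(-exp(X beta))
y <- rbinom(n = N, size = 1, prob = p)

# family is a family object here, not the string "binomial"
fit <- glmnet(X, y, family = binomial(link = "cloglog"), alpha = 1)
print(fit)
```

Family objects use a more general fitting routine than the built-in string families, so fitting can be noticeably slower on large data.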

Why does RMSE increase with horizon when using the timeslice method in caret's trainControl function?

I'm using the timeslice method in caret's trainControl function to perform cross-validation on a time series model. I've noticed that RMSE increases with the horizon argument.
I realise this might happen for several reasons, e.g., if explanatory variables are being forecast and/or there's autocorrelation in the data such that the model can better predict nearer vs. farther ahead observations. However, I'm seeing the same behaviour even when neither is the case (see trivial reproducible example below).
Can anyone explain why RMSEs are increasing with horizon?
# Make data
X = data.frame(matrix(rnorm(1000 * 3), ncol = 3))
X$y = rowSums(X) + rnorm(nrow(X))
# Iterate over different forecast horizons and record RMSEs
library(caret)
forecast_horizons = c(1, 3, 10, 50, 100)
rmses = numeric(length(forecast_horizons))
for (i in 1:length(forecast_horizons)) {
ctrl = trainControl(method = 'timeslice', initialWindow = 500, horizon = forecast_horizons[i], fixedWindow = T)
rmses[i] = train(y ~ ., data = X, method = 'lm', trControl = ctrl)$results$RMSE
}
print(rmses) #0.7859786 0.9132649 0.9720110 0.9837384 0.9849005
