Recursive daily forecast - time-series

I am doing a recursive one-step-ahead daily forecast with different time series models for 2010. For example:
set.seed(1096)
library(xts)       # xts()
library(forecast)  # forecast()
Datum <- seq(as.Date("2008/1/1"), as.Date("2010/12/31"), "days")
r <- rnorm(1096)
y <- xts(r, order.by = as.Date(Datum))
List.y <- vector(mode = "list", length = 365L)
for (i in 1:365) {
  window.y <- window(y[, 1], end = as.Date("2009-12-30") + i)
  fit.y <- arima(window.y, order = c(5, 0, 0))
  List.y[[i]] <- forecast(fit.y, h = 1)
}
The list looks like this:
List.y
[[1]]
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
732 -0.0506346 -1.333437 1.232168 -2.012511 1.911242
[[2]]
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
733 0.03905936 -1.242889 1.321008 -1.921511 1.99963
....
[[365]]
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
1096 0.09242849 -1.1794 1.364257 -1.852665 2.037522
And now I want to extract only the forecast value for each period [1]-[365], so I can work with the forecast data. However, I am not sure how to do this.
I tried
sa=sapply(List.y[1:365], `[`, 4)
but then I only get this:
$mean
Time Series:
Start = 732
End = 732
Frequency = 1
[1] -0.0506346
$mean
Time Series:
Start = 733
End = 733
Frequency = 1
[1] 0.03905936
...
$mean
Time Series:
Start = 1096
End = 1096
Frequency = 1
[1] 0.09242849
but I want all 365 [1] values in a numeric vector or something, so I can work with the data.

Just convert it: sa2 <- as.numeric(sa). Each element of sa is a length-one time series, so as.numeric() flattens the list into a numeric vector of the 365 forecast means.
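The same pattern, sketched in Python for comparison (the forecasts list here is a hypothetical stand-in for List.y, not real forecast output): pull one field out of each per-period result and flatten it into a plain list of numbers.

```python
# Hypothetical stand-in for List.y: one dict per forecast period.
forecasts = [
    {"mean": -0.0506346},
    {"mean": 0.03905936},
    {"mean": 0.09242849},
]

# Extract the point forecast from every period into a flat numeric list,
# the analogue of sa2 <- as.numeric(sa) in R.
means = [f["mean"] for f in forecasts]
```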

What is the average mobile rating on Google PageSpeed Insights?

We use Google PageSpeed Insights as a marketing tool to compare the download speed of the websites we build with those of our competitors. But so many mobile sites are rated in the 30s that we wondered whether that is about the average mobile rating. Does anyone know? Thanks
Short Answer
The average mobile rating is 31.
Long Answer
An article I found after writing the answer below that answers the question
This article from tunetheweb has actually done the hard work for us and gathered the data from httparchive. (Give the article a read; it has a wealth of interesting information!)
The table below, taken from that article, covers your question: the answer is 31 for the Performance metric at the 50th percentile.
Percentile Performance Accessibility Best Practices SEO PWA
10 8 56 64 69 14
25 16 69 64 80 25
50 31 80 71 86 29
75 55 88 79 92 36
90 80 95 86 99 54
95 93 97 93 100 54
99 99 100 93 100 64
I have left my original answer below, as the information may be useful to somebody, but the table above answers the question much better. At least my guess of 35 wasn't a million miles away from the actual answer. hehe.
My original Answer
You would imagine that a score of 50 would be the average right? Nope!
Lighthouse uses a log-normal curve to create a curve that dictates scores.
The two key control points on that curve are the 25th percentile for the median (a score of 50 means you are in the top 25% effectively) and the 8th percentile for a score of 90.
The numbers used to determine these points are derived from http archive data.
You can explore the curve used for Time To Interactive scoring here as an example.
Now I am sure someone who is a lot better at maths than me can use that data to calculate the average score for a site, but I would estimate it to be around 35 for a mobile site, which is pretty close to what you have observed.
One thing I can do is provide how the scoring works based on those control points so you can see the various cutoff points etc. for each metric.
The below is taken from the maths module at https://github.com/paulirish/lh-scorecalc/tree/190bed715a3589601f314b3c8a50fb0fb147c121
I have also included the median and falloff values currently used in this calculation in the scoring variable.
To play with it, use the VALUE_AT_QUANTILE function to get the value you need to achieve a certain percentile. For example, to see the value for the 90th percentile for Time To Interactive, you would use VALUE_AT_QUANTILE(7300, 2900, 0.9): take the median (7300) and falloff (2900) from TTI in the scoring variable, then enter the desired percentile as a decimal (90 -> 0.9).
The QUANTILE_AT_VALUE function does the reverse: it shows the percentile that a particular value would fall at. E.g. to see what percentile a First CPU Idle time of 3200 gets you, use QUANTILE_AT_VALUE(6500, 2900, 3200).
Anyway, I have gone a bit off on a tangent, but hopefully the above and below will give someone cleverer than me the info needed to work it out (I have included the weightings for each item as well, in the weights variable).
const scoring = {
  FCP: {median: 4000, falloff: 2000, name: 'First Contentful Paint'},
  FMP: {median: 4000, falloff: 2000, name: 'First Meaningful Paint'},
  SI: {median: 5800, falloff: 2900, name: 'Speed Index'},
  TTI: {median: 7300, falloff: 2900, name: 'Time to Interactive'},
  FCI: {median: 6500, falloff: 2900, name: 'First CPU Idle'},
  TBT: {median: 600, falloff: 200, name: 'Total Blocking Time'}, // mostly uncalibrated
  LCP: {median: 4000, falloff: 2000, name: 'Largest Contentful Paint'},
  CLS: {median: 0.25, falloff: 0.054, name: 'Cumulative Layout Shift', units: 'unitless'},
};

const weights = {
  FCP: 0.15,
  SI: 0.15,
  LCP: 0.25,
  TTI: 0.15,
  TBT: 0.25,
  CLS: 0.05
};

function internalErf_(x) {
  // erf(-x) = -erf(x);
  var sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  var a1 = 0.254829592;
  var a2 = -0.284496736;
  var a3 = 1.421413741;
  var a4 = -1.453152027;
  var a5 = 1.061405429;
  var p = 0.3275911;
  var t = 1 / (1 + p * x);
  var y = t * (a1 + t * (a2 + t * (a3 + t * (a4 + t * a5))));
  return sign * (1 - y * Math.exp(-x * x));
}

function internalErfInv_(x) {
  // erfinv(-x) = -erfinv(x);
  var sign = x < 0 ? -1 : 1;
  var a = 0.147;
  var log1x = Math.log(1 - x*x);
  var p1 = 2 / (Math.PI * a) + log1x / 2;
  var sqrtP1Log = Math.sqrt(p1 * p1 - (log1x / a));
  return sign * Math.sqrt(sqrtP1Log - p1);
}

function VALUE_AT_QUANTILE(median, falloff, quantile) {
  var location = Math.log(median);
  var logRatio = Math.log(falloff / median);
  var shape = Math.sqrt(1 - 3 * logRatio - Math.sqrt((logRatio - 3) * (logRatio - 3) - 8)) / 2;
  return Math.exp(location + shape * Math.SQRT2 * internalErfInv_(1 - 2 * quantile));
}

function QUANTILE_AT_VALUE(median, falloff, value) {
  var location = Math.log(median);
  var logRatio = Math.log(falloff / median);
  var shape = Math.sqrt(1 - 3 * logRatio - Math.sqrt((logRatio - 3) * (logRatio - 3) - 8)) / 2;
  var standardizedX = (Math.log(value) - location) / (Math.SQRT2 * shape);
  return (1 - internalErf_(standardizedX)) / 2;
}

console.log("Time To Interactive (TTI) 90th Percentile Time:", VALUE_AT_QUANTILE(7300, 2900, 0.9).toFixed(0));
console.log("First CPU Idle time of 3200 score / percentile:", (QUANTILE_AT_VALUE(6500, 2900, 3200).toFixed(3)) * 100);
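The same maths ports cleanly to Python for anyone who prefers to experiment there. This is a sketch of the JS functions above, not Lighthouse's own code: math.erf stands in for internalErf_, and since the standard library has no erfinv, it is derived from the normal inverse CDF via statistics.NormalDist.

```python
import math
from statistics import NormalDist

def _erfinv(x):
    # erf^-1(x) = Phi^-1((x + 1) / 2) / sqrt(2), using the exact normal inverse CDF.
    return NormalDist().inv_cdf((x + 1) / 2) / math.sqrt(2)

def _shape(median, falloff):
    # Shape parameter of the log-normal curve, as in the JS functions above.
    log_ratio = math.log(falloff / median)
    return math.sqrt(1 - 3 * log_ratio
                     - math.sqrt((log_ratio - 3) ** 2 - 8)) / 2

def value_at_quantile(median, falloff, quantile):
    # Metric value you need in order to land at a given score quantile.
    location = math.log(median)
    return math.exp(location + _shape(median, falloff)
                    * math.sqrt(2) * _erfinv(1 - 2 * quantile))

def quantile_at_value(median, falloff, value):
    # Score quantile that a given metric value lands at.
    location = math.log(median)
    standardized = (math.log(value) - location) / (math.sqrt(2) * _shape(median, falloff))
    return (1 - math.erf(standardized)) / 2
```

A handy sanity check: quantile_at_value(median, falloff, median) is always 0.5, and because this port uses exact erf/erfinv (rather than the JS approximations), the two functions are exact inverses of each other.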

How to arrange picture in lua like a grid?

I'm learning Lua and I want to arrange my bubble picture at specific x and y coordinates. Here's my code so far. The values of my j and i are only incrementing by 1 instead of by 29. I know I'm lacking some knowledge, so any help will be appreciated.
local background = display.newImageRect("blueBackground.png", 642, 1040)
background.x = display.contentCenterX
background.y = display.contentCenterY
local x = 15
local y = 15
for i = 15, 25 do
  for j = 15, 25 do
    local bubble = display.newImageRect("bubble.png", 23, 23)
    bubble.x = i
    bubble.y = j
    j = j + 29
    print("j", j)
  end
  i = i + 29
  print("i", i)
end
This should help you.
From Lua documentation
The for statement has two variants: the numeric for and the
generic for.
A numeric for has the following syntax:
for var=exp1,exp2,exp3 do
something
end
That loop will execute something for each value of var from exp1
to exp2, using exp3 as the step to increment var. This third
expression is optional; when absent, Lua assumes one as the step
value. As typical examples of such loops, we have
for i=1,f(x) do print(i) end
for i=10,1,-1 do print(i) end
Use
for i = 15, 29*10 + 15, 29 do
  for j = 15, 29*10 + 15, 29 do
    local bubble = display.newImageRect("bubble.png", 23, 23)
    bubble.x = i
    bubble.y = j
    print("j", j)
  end
  print("i", i)
end
or
for i = 0, 10 do
  for j = 0, 10 do
    local bubble = display.newImageRect("bubble.png", 23, 23)
    bubble.x = 15 + i * 29
    bubble.y = 15 + j * 29
    ...
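The index arithmetic in the second variant (position = start + index * step along each axis) can be sketched on its own; here is the same grid layout in Python, with grid_positions being a hypothetical helper rather than part of any display API:

```python
def grid_positions(start=15, step=29, n=11):
    """Coordinates for an n-by-n grid, mirroring the nested loops above:
    each axis runs start, start + step, start + 2*step, ..."""
    return [(start + i * step, start + j * step)
            for i in range(n) for j in range(n)]

positions = grid_positions()
```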

Testing accuracy more than training accuracy

I am building a tuned random forest model for multiclass classification.
I'm getting the following results
Training accuracy (AUC): 0.9921996
Testing accuracy (AUC): 0.992237664
I saw a question related to this on this website, and the common answer seems to be that the dataset must be small and your model got lucky.
But in my case I have about 300k training data points and 100k testing data points.
Also, my classes are well balanced:
> summary(train$Bucket)
0 1 TO 30 121 TO 150 151 TO 180 181 TO 365 31 TO 60 366 TO 540 541 TO 730 61 TO 90
166034 32922 4168 4070 15268 23092 8794 6927 22559
730 + 91 TO 120
20311 11222
> summary(test$Bucket)
0 1 TO 30 121 TO 150 151 TO 180 181 TO 365 31 TO 60 366 TO 540 541 TO 730 61 TO 90
55344 10974 1389 1356 5090 7698 2932 2309 7520
730 + 91 TO 120
6770 3741
Is it possible for a model to fit this well on a large testing data? Please answer if I can do something to cross verify that my model is indeed fitting really well.
My complete code:
library(caTools)  # sample.split()
library(mlr)      # makeClassifTask(), makeLearner(), tuneParams(), ...
split = sample.split(Book2$Bucket, SplitRatio = 0.75)
train = subset(Book2, split == T)
test = subset(Book2, split == F)
traintask <- makeClassifTask(data = train, target = "Bucket")
rf <- makeLearner("classif.randomForest")
params <- makeParamSet(makeIntegerParam("mtry", lower = 2, upper = 10),
                       makeIntegerParam("nodesize", lower = 10, upper = 50))
#set validation strategy
rdesc <- makeResampleDesc("CV", iters = 5L)
#set optimization technique
ctrl <- makeTuneControlRandom(maxit = 5L)
#start tuning
tune <- tuneParams(learner = rf, task = traintask, resampling = rdesc,
                   measures = list(acc), par.set = params, control = ctrl,
                   show.info = T)
rf.tree <- setHyperPars(rf, par.vals = tune$x)
tune$y
r <- train(rf.tree, traintask)
getLearnerModel(r)
testtask <- makeClassifTask(data = test, target = "Bucket")
rfpred <- predict(r, testtask)
performance(rfpred, measures = list(mmce, acc))
The difference is on the order of 1e-4; nothing is wrong. It is ordinary statistical error (variance of the estimate), nothing to worry about. A difference of 0.0001 on 100,000 test points amounts to 0.0001 * 100,000 = 10 samples, i.e. 10 samples out of 100k.
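To make the "statistical error" claim concrete: the standard error of an accuracy estimate from n held-out points is roughly sqrt(p(1-p)/n). A quick sketch using the question's approximate figures:

```python
import math

# Approximate figures from the question.
p = 0.9921996   # accuracy
n = 100_000     # test set size

# Binomial standard error of an accuracy estimated from n points.
se = math.sqrt(p * (1 - p) / n)

# The observed train/test gap.
gap = abs(0.992237664 - 0.9921996)
```

Here se is about 2.8e-4 while the observed gap is about 3.8e-5, well under one standard error, so the test score exceeding the training score is entirely unremarkable.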

Extracting sampled Time Points

I have a MATLAB curve from which I would like to extract concentration values at 17 different time samples.
[Figure: the concentration curve from which the values are to be extracted]
The time points, in minutes, are:
t = 0, 0.25, 0.50, 1, 1.5, 2, 3, 4, 9, 14, 19, 24, 29, 34, 39, 44, 49
Following is the function which I have written to plot the above graph:
function c_t = output_function_constrainedK2(t, a1, a2, a3, b1, b2, b3, td, tmax, k1, k2, k3)
    K_1 = (k1*k2)/(k2+k3);
    K_2 = (k1*k3)/(k2+k3);
    DV_free = k1/(k2+k3);
    c_t = zeros(size(t));
    ind = (t > td) & (t < tmax);
    c_t(ind) = conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)), (K_1*exp(-(k2+k3)*t(ind)+K_2)), 'same');
    ind = (t >= tmax);
    c_t(ind) = conv((a1 * exp(-b1 * (t(ind) - tmax)) + a2 * exp(-b2 * (t(ind) - tmax)) + a3 * exp(-b3 * (t(ind) - tmax))), (K_1*exp(-(k2+k3)*t(ind)+K_2)), 'same');
    plot(t, c_t);
    axis([0 50 0 1400]);
    xlabel('Time [mins]');
    ylabel('Concentration [MBq]');
    title('Model: Constrained K2');
end
If possible, kindly suggest how I could alter the above function so that it returns concentration values at the 17 time points stated above.
Following are the input values that I have used to come up with the curve:
output_function_constrainedK2(0:0.1:50, 2501, 18500, 65000, 0.5, 0.7, 0.3, 3, 8, 0.014, 0.051, 0.07)
This will give you concentration values at the time points you wanted. You will have to put this inside the output_function_constrainedK2 function so that you can access the variables t and c_t.
T=[0 0.25 0.50 1 1.5 2 3 4 9 14 19 24 29 34 39 44 49];
concentration=interp1(t,c_t,T)
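MATLAB's interp1 defaults to linear interpolation between the sampled points. For readers who want to check a value or two by hand, the same idea can be sketched in pure Python (a hand-rolled helper under that assumption, not a port of the answer's code):

```python
def interp1_linear(x, y, xq):
    """Linear interpolation at query points xq, given samples (x, y)
    with x sorted ascending -- the default behaviour of MATLAB's interp1."""
    out = []
    for q in xq:
        # find the bracketing interval [x[k], x[k+1]] and blend linearly
        for k in range(len(x) - 1):
            if x[k] <= q <= x[k + 1]:
                w = (q - x[k]) / (x[k + 1] - x[k])
                out.append(y[k] + w * (y[k + 1] - y[k]))
                break
    return out

# e.g. sampling a coarse curve at intermediate time points
vals = interp1_linear([0, 1, 2], [0.0, 10.0, 20.0], [0.5, 1.0, 1.5])
```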

Negative probability in GMM

I am so confused. I have tested a program for myself using the following MATLAB code:
feature_train=[1 1 2 1.2 1 1 700 709 708 699 678];
No_of_Clusters = 2;
No_of_Iterations = 10;
[m,v,w]=gaussmix(feature_train,[],No_of_Iterations,No_of_Clusters);
feature_ubm=[1000 1001 1002 1002 1000 1060 70 79 78 99 78 23 32 33 23 22 30];
No_of_Clusters = 3;
No_of_Iterations = 10;
[mubm,vubm,wubm]=gaussmix(feature_ubm,[],No_of_Iterations,No_of_Clusters);
feature_test=[2 2 2.2 3 1 600 650 750 800 658];
[lp_train,rp,kh,kp]=gaussmixp(feature_test,m,v,w);
[lp_ubm,rp,kh,kp]=gaussmixp(feature_test,mubm,vubm,wubm);
However, the result puzzles me, because feature_test should be classified as feature_train, not feature_ubm. As you see below, the probability for feature_ubm appears to be greater than for feature_train!?!
Can anyone explain to me what the problem is?
Is the problem related to the gaussmixp and gaussmix MATLAB functions?
sum(lp_ubm)
ans =
-3.4108e+06
sum(lp_train)
ans =
-1.8658e+05
As you see below the probability of feature_ubm is more than feature_train!?!
You see exactly the opposite. Although the absolute value for ubm is bigger, you are looking at negative numbers, and
sum(lp_train) > sum(lp_ubm)
hence
P(test|train) > P(test|ubm)
So your test chunk is correctly classified as train, not as ubm.
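The point generalizes: gaussmixp returns log-probabilities, and since log is monotonically increasing, comparing log-likelihood sums is the same as comparing likelihoods, so the less negative sum wins. A minimal sketch using the sums quoted above:

```python
import math

# Summed log-likelihoods reported in the question.
lp_train = -1.8658e5
lp_ubm = -3.4108e6

# Less negative log-likelihood means a more probable model for the test data.
train_wins = lp_train > lp_ubm

# Same principle at a scale exp() can represent directly:
# exp(-5) > exp(-50) precisely because -5 > -50.
demo = math.exp(-5) > math.exp(-50)
```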
