The output is a very large matrix using bigmemory

I'm using bigmemory and I want to calculate w.
My vector v has length 478,000 and k has length 240,500.
The product of the two makes w a very large matrix. When I run the code with the loop it is still running, has not finished, and I don't know whether it will ever give me the result.
I tried to calculate it without the for loop, but I got an error.
Please, any help to correct my code or make it faster would be appreciated.
library(bigmemory)

v <- read.big.matrix('v.dat', type = 'double')
k <- read.big.matrix('k.dat', type = 'double')
m <- length(v)
n <- length(k)
w <- filebacked.big.matrix(m, n, type = "double",
                           backingfile = "w.bin", descriptorfile = "w.desc")
start <- Sys.time()
# vectorised attempt: v[] and k[] pull the data into ordinary R objects,
# so this tries to build the full m x n matrix in RAM before conversion
w <- as.big.matrix(2 * cos(0.001 * v[] %*% t(k[])) - 2)
Sys.time() - start
# loop version (with dt = 0.001), filling one element at a time:
# for (i in 1:m) {
#   for (j in 1:n) {
#     w[i, j] <- 2 * cos(dt * v[i] * k[j]) - 2
#   }
# }
Thanks.
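A minimal block-wise sketch (assuming v and k are single-column big.matrix objects and dt = 0.001, as in the code above; the block size is illustrative): the vectorised line fails because v[] %*% t(k[]) tries to allocate the full 478000 x 240500 matrix of doubles (roughly 920 GB) in ordinary RAM, while the element-by-element loop pays R's interpreter overhead about 10^11 times. Filling the file-backed w one strip of rows at a time avoids both:

library(bigmemory)

dt <- 0.001
block <- 500                 # rows per strip (~1 GB each here); tune to your RAM
vv <- v[, 1]                 # copy the two vectors out of the big.matrix once
kk <- k[, 1]
for (s in seq(1, length(vv), by = block)) {
  e <- min(s + block - 1, length(vv))
  # each strip is an ordinary (e - s + 1) x length(kk) matrix
  w[s:e, ] <- 2 * cos(dt * outer(vv[s:e], kk)) - 2
}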

Related

Direct Multi-Step Ahead Time Series Forecasting using R

I am building a multivariate model for direct time series forecasting, where the goal is to make 4- and 8-step-ahead forecasts using random forest and SVR.
The results look very similar to my 1-step-ahead forecast, and I am wondering whether my code is sensible or not.
Here is an example of some 4-step-ahead forecasts using random forest in conjunction with the predict function.
As far as I understand, the difference between the 1-step-ahead and the 4-step-ahead direct forecast is that instead of the first row we feed the fourth row of the test set to the predict function. Meaning, in the following example:
test <- mydata_2diff[(i+4), ]
instead of
test <- mydata_2diff[(i+1), ]
My code looks as follows:
library(randomForest)
library(tictoc)

train_end <- 112   # end of the training set
j <- 1             # loop counter
k_max <- 10        # number of RF estimations
pred_rf_4Q_dir <- matrix(0, (nrow(mydata_2diff) - train_end - 3), k_max)  # prediction matrix
{
  tic()
  for (i in train_end:(nrow(mydata_2diff) - 4)) {
    train <- mydata_2diff[1:i, ]     # training data
    test <- mydata_2diff[(i + 4), ]  # test data
    for (k in 1:k_max) {
      rf_RPI <- randomForest(RPI ~ RGDP + CPI + STI + LTI + UE + SER + SPI + ARH,
                             data = train, ntree = 500, importance = TRUE)
      pred_rf <- predict(rf_RPI, newdata = test, predict.all = TRUE)
      pred_rf_4Q_dir[j, k] <- pred_rf[["aggregate"]]
    }
    j <- j + 1
  }
  toc()
}
Is this approach correct or not?
I am grateful for any feedback.
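One thing worth checking (a hedged sketch, not the poster's code): in a direct h-step-ahead setup the model is usually trained with the target led by h periods, so that it learns to map the predictors at time t to the outcome at t+h. Only changing which test row is fed to predict() leaves the fitted model a 1-step-ahead model, which would explain why the results look so similar. Assuming the column names from the question, with i the end of the training window as in the loop above and RPI_lead4 a hypothetical helper column:

library(randomForest)

h <- 4
train <- mydata_2diff[1:i, ]
# lead the target by h periods, then drop the rows with no future value
train$RPI_lead4 <- c(train$RPI[(h + 1):nrow(train)], rep(NA, h))
train <- train[!is.na(train$RPI_lead4), ]
rf_dir <- randomForest(RPI_lead4 ~ RGDP + CPI + STI + LTI + UE + SER + SPI + ARH,
                       data = train, ntree = 500)
# predict from the last observed row; the result is a forecast for t + h
pred_4 <- predict(rf_dir, newdata = mydata_2diff[i, ])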

Translating an optimization problem from CVX to CVXPY?

I am attempting to translate a semidefinite programming problem from CVX to CVXPY as described here. My attempt follows:
import cvxpy as cvx
import numpy as np

c = [0, 1]
n = len(c)

# Create optimization variables.
f = cvx.Variable((n, n), hermitian=True)

# Create constraints.
constraints = [f >> 0]
for k in range(1, n):
    indices = [(i * n) + i - (n - k) for i in range(n - k, n)]
    constraints += [cvx.sum(cvx.vec(f)[indices]) == c[n - k]]

# Form objective.
obj = cvx.Maximize(c[0] - cvx.trace(f))

# Form and solve problem.
prob = cvx.Problem(obj, constraints)
sol = prob.solve()
print(sol)
print(f.value)
The issue is that when I take the coefficients of the Fourier series and translate them into the array c, it fails on complex values. I think this is due to a discrepancy between the maximize functions of CVX and CVXPY. I'm not sure what CVX is maximizing, since the trace of the matrix is a complex value. As pointed out below, the trace is real since the matrix is Hermitian, but the code still fails. Can someone with CVXPY knowledge clear this up?

How to Implement an R Script in a Ruby on Rails App

I need to run an R script on a group of data within a RoR app. I tried using yth_filter (see here) but the response time was 15+ seconds. Does anyone know of a conversion of sorts or a better way to implement this R script?
library(xts)
library(neverhpfilter)  # provides yth_filter

## Filter macro series
# Filtering parameters (a user will need to enter these values into
# interactive fields to adjust the trend line):
#   h = number of periods forecasted into the future
#   p = number of previous values selected as independent variables
rm(list = ls())

# Get the file
y <- read.csv('D:/Papers/cato/cmfa_filtering/NGDP.csv', header = TRUE, row.names = 1)
y <- xts(y, order.by = as.Date(rownames(y), format = "%m/%d/%Y"))

# Parameters
h <- 8
p <- 4

# Initialize
lag <- h

# Generate the lags
for (i in 1:p) {
  # Create the lagged variable
  assign(paste0('y', i), lag.xts(y, k = lag))
  # Tick ahead
  lag <- lag + 1
}

# Do OLS
ols <- lm(y ~ y1 + y2 + y3 + y4)

# Get the trend and cycle components
out.trend <- fitted(ols)
out.cycle <- resid(ols)

# Compare to yth_filter
out.pack <- yth_filter(y, h = h, p = p, output = c("x", "trend"), family = gaussian)
compare <- cbind.xts(out.pack, out.trend)
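If the manual regression reproduces yth_filter closely enough, one way to cut the response time (a rough sketch; hamilton_filter is a made-up name, and y is assumed to be the xts series loaded above) is to wrap it in a single function so the lags are built with one merge instead of the assign()/paste0() loop, and call that from the app:

library(xts)

hamilton_filter <- function(y, h = 8, p = 4) {
  # regressors: y lagged by h, h+1, ..., h+p-1 periods
  X <- do.call(merge, lapply(h:(h + p - 1), function(k) lag.xts(y, k = k)))
  fit <- lm(as.numeric(y) ~ ., data = as.data.frame(coredata(X)))
  list(trend = fitted(fit), cycle = resid(fit))  # rows with NA lags are dropped by lm
}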

Identifying outlying datapoints from residuals (GeoLight package)

I am analysing some data collected from a geolocator placed on a migratory bird. In a nutshell, my data are sunrise and sunset times, which are then used to determine position on the globe.
I am using the package GeoLight (http://cran.r-project.org/web/packages/GeoLight/GeoLight.pdf) to identify outlying data - specifically the loessFilter function, which applies a polynomial regression and identifies residuals that are greater than k interquartile ranges (k is specified when applying the function).
My problem is: the function returns graphs in which outlying datapoints are identified in red. There seems to be an issue in the code itself with the TRUE/FALSE vector it returns to state which points are outliers - every element comes back TRUE, even when outliers are identified.
I have therefore modified the function code to state which residuals are outliers.
However, when I then remove those rows from the original dataset and re-run the function, the points have not been removed. There is evidently some discrepancy between the residual numbers and the rows of the original data: i.e. if the output states that residual 78 is an outlying point, removing row 78 from the original data does not remove the outlying datapoint.
I would very much appreciate some help with removing the outlying datapoints identified by the function. It seems like a very easy fix, but I can't figure it out.
Code for the full function and the data are below.
Thanks
Emma
log2$tFirst<-as.POSIXlt(log2$tFirst)
log2$tSecond<-as.POSIXlt(log2$tSecond)
CODE TO GET OUTLYING RESIDUALS
i.get.outliers <- function(residuals, k = 3) {
  x <- residuals
  # x is a vector of residuals
  # k is a measure of how many interquartile ranges to take before saying
  # that a point is an outlier; 3 looks like a good preset for k
  QR <- quantile(x, probs = c(0.25, 0.75))
  IQR <- QR[2] - QR[1]
  Lower.Band <- QR[1] - (k * IQR)
  Upper.Band <- QR[2] + (k * IQR)
  delete <- which(x < Lower.Band | x > Upper.Band)
  return(as.vector(delete))
}
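As a quick sanity check of the helper (an illustrative call on made-up data, not from the question):

set.seed(1)
res <- c(rnorm(100), 10)      # 100 well-behaved residuals plus one planted outlier
i.get.outliers(res, k = 3)    # should return 101, the index of the planted outlier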
LOESS FILTER FUNCTION CODE
loessFilter <- function(tFirst, tSecond, type, k = 3, plot = TRUE){
  tw <- data.frame(datetime = as.POSIXct(c(tFirst, tSecond), "UTC"),
                   type = c(type, ifelse(type == 1, 2, 1)))
  tw <- tw[!duplicated(tw$datetime), ]
  tw <- tw[order(tw[,1]), ]
  hours <- as.numeric(format(tw[,1], "%H")) + as.numeric(format(tw[,1], "%M"))/60
  for(t in 1:2){
    cor <- rep(NA, 24)
    for(i in 0:23){
      cor[i+1] <- max(abs((c(hours[tw$type==t][1], hours[tw$type==t]) + i)%%24 -
                          (c(hours[tw$type==t], hours[tw$type==t][length(hours)]) + i)%%24), na.rm = T)
    }
    hours[tw$type==t] <- (hours[tw$type==t] + (which.min(round(cor, 2))) - 1)%%24
  }
  dawn <- data.frame(id = 1:sum(tw$type==1),
                     datetime = tw$datetime[tw$type==1],
                     type = tw$type[tw$type==1],
                     hours = hours[tw$type==1], filter = FALSE)
  dusk <- data.frame(id = 1:sum(tw$type==2),
                     datetime = tw$datetime[tw$type==2],
                     type = tw$type[tw$type==2],
                     hours = hours[tw$type==2], filter = FALSE)
  for(d in seq(30, k, length = 5)){
    predict.dawn <- predict(loess(dawn$hours[!dawn$filter]~as.numeric(dawn$datetime[!dawn$filter]), span = 0.1))
    predict.dusk <- predict(loess(dusk$hours[!dusk$filter]~as.numeric(dusk$datetime[!dusk$filter]), span = 0.1))
    del.dawn <- i.get.outliers(as.vector(residuals(loess(dawn$hours[!dawn$filter]~
                               as.numeric(dawn$datetime[!dawn$filter]), span = 0.1))), k = d)
    del.dusk <- i.get.outliers(as.vector(residuals(loess(dusk$hours[!dusk$filter]~
                               as.numeric(dusk$datetime[!dusk$filter]), span = 0.1))), k = d)
    if(length(del.dawn) > 0) dawn$filter[!dawn$filter][del.dawn] <- TRUE
    if(length(del.dusk) > 0) dusk$filter[!dusk$filter][del.dusk] <- TRUE
  }
  if(plot){
    par(mfrow = c(2,1), mar = c(3,3,0.5,3), oma = c(2,2,0,0))
    plot(dawn$datetime[dawn$type==1], dawn$hours[dawn$type==1], pch = "+", cex = 0.6, xlab = "", ylab = "", yaxt = "n")
    lines(dawn$datetime[!dawn$filter],
          predict(loess(dawn$hours[!dawn$filter]~as.numeric(dawn$datetime[!dawn$filter]), span = 0.1)), type = "l")
    points(dawn$datetime[dawn$filter], dawn$hours[dawn$filter], col = "red", pch = "+", cex = 1)
    axis(2, labels = F)
    mtext("Sunrise", 4, line = 1.2)
    plot(dusk$datetime[dusk$type==2], dusk$hours[dusk$type==2], pch = "+", cex = 0.6, xlab = "", ylab = "", yaxt = "n")
    lines(dusk$datetime[!dusk$filter],
          predict(loess(dusk$hours[!dusk$filter]~as.numeric(dusk$datetime[!dusk$filter]), span = 0.1)), type = "l")
    points(dusk$datetime[dusk$filter], dusk$hours[dusk$filter], col = "red", pch = "+", cex = 1)
    axis(2, labels = F)
    legend("bottomleft", c("Outside filter", "Inside filter"), pch = c("+", "+"), col = c("black", "red"),
           bty = "n", cex = 0.8)
    mtext("Sunset", 4, line = 1.2)
    mtext("Time", 1, outer = T)
    mtext("Sunrise/Sunset hours (rescaled)", 2, outer = T)
  }
  all <- rbind(subset(dusk, filter), subset(dawn, filter))
  filter <- rep(FALSE, length(tFirst))
  filter[tFirst%in%all$datetime | tSecond%in%all$datetime] <- TRUE
  # original code:
  # return(!filter)
  # altered to print the outlying points instead:
  return(c("delete dawn", del.dawn, "delete dusk", del.dusk))
}
APPLY FUNCTION
loessFilter(log2$tFirst, log2$tSecond, type=1, k=4, plot=TRUE)
Remove the values - we need to remove both the sunrise and sunset points:
log2b<-log2[-c(77,78,124,125),]
length(log2$tFirst)
length(log2b$tFirst)
Repeat the function to see if the values have gone:
loessFilter(log2b$tFirst, log2b$tSecond, type=1, k=4, plot=TRUE)
The outliers are still there!
HERE ARE THE DATA:
http://www.4shared.com/file/jxVuTsVHce/002_geolight.html
A bit too long to post the full data here, and the example won't work with a dummy dataset :)
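One likely explanation for the mismatch (a hedged guess plus a sketch, not a confirmed fix): the indices returned by i.get.outliers are positions within the dawn or dusk subset - and, after the first pass of the d loop, positions within the not-yet-filtered part of that subset - so they are not row numbers of log2. Rather than mapping the indices back by hand, it may be safer to keep the function's original logical return value and subset with it, assuming a loessFilter that still ends with return(!filter) (TRUE = keep):

keep <- loessFilter(log2$tFirst, log2$tSecond, type = 1, k = 4, plot = TRUE)
log2b <- log2[keep, ]    # drops every twilight flagged as an outlier
loessFilter(log2b$tFirst, log2b$tSecond, type = 1, k = 4, plot = TRUE)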

Cascaded Hough Transform in OpenCV

Is it possible to perform a Cascaded Hough Transform in OpenCV? I understand it's just a HT followed by another one. The problem I'm facing is that the values returned are always rho and theta, never in slope-intercept (y = mx + c) form.
Is it possible to convert these values back to slope-intercept form and split them into sub-spaces so I can detect vanishing points?
Or is it just better to program an implementation of the HT myself in, say, Python?
You could try to populate the Hough domain with the m and c parameters instead: y = mx + c can be rewritten as c = y - mx, so instead of the usual rho = x cos(theta) + y sin(theta), you have c = y - mx.
Normally you'd go through the thetas and calculate rho, then increment the accumulator value for that pair of rho and theta. Here, you'd go through the values of m, calculate the corresponding c, and increment that (m, c) element in the accumulator. The bin with the most votes would be the right (m, c).
// going through the image looking for edge pixels
for (i = 0; i < numrows; i++)
{
    for (j = 0; j < numcols; j++)
    {
        if (img[i*numcols + j] > 1)    // edge pixel found
        {
            for (n = first_m; n < last_m; n++)
            {
                index = i - n * j;     // c = y - m*x (may need offsetting into a valid index range)
                accum[n][index]++;     // vote for this (m, c) pair
            }
        }
    }
}
I guess where this becomes ineffective is that it's hard to define the step size for going through m: technically m goes from -infinity to infinity, so you'd have trouble covering it. So much for the Hough transform in terms of m and c.
I guess you could go the other way and isolate m, so that m = (y - c)/x; now you cycle through a bunch of c values that make sense, which is much more manageable, though it's still hard to define your accumulator matrix because m still has no limit. I guess you could limit the values of m you are interested in looking for.
Yeah, it makes much more sense to go with rho and theta, convert them into y = mx + c, and then even make a brand new image and re-run the Hough transform on it.
I don't think OpenCV can perform cascaded hough transforms. You should convert them to xy space yourself. This article might help you:
http://aishack.in/tutorials/converting-lines-from-normal-to-slopeintercept-form/
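For the conversion itself, the normal form rho = x cos(theta) + y sin(theta) rearranges, whenever sin(theta) != 0, to y = -(cos(theta)/sin(theta)) x + rho/sin(theta), so m = -cos(theta)/sin(theta) and c = rho/sin(theta); lines with sin(theta) = 0 are vertical and have no slope-intercept form.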
