returns an error - time-series

I need to substitute or interpolate the NAs in the vector of values at each cell of a RasterStack. I have two functions: fun_sub for substituting NAs and fun_interp for interpolating them.
fun_sub works very well, but fun_interp does not, and I cannot find the reason.
Thank you very much,
Tianyi Zhang
Below is a simple example:
#------ EXAMPLE
library(timeSeries)
library(raster)

fun_sub <- function(x) {
  # substitute each NA with the mean of the cell's vector of values
  v <- as.vector(x)
  z <- substituteNA(v, type = "mean")
  return(z)
}

fun_interp <- function(x) {
  # linearly interpolate each NA in the cell's vector of values
  v <- as.vector(x)
  z <- interpNA(v, method = "linear")
  return(z)
}
# create data
r <- raster(ncols = 2, nrows = 2)
r1 <- r; r2 <- r; r3 <- r; r4 <- r
r1[] <- c(1, 1, 1, 2)
r2[] <- c(5, 5, NA, 5)
r3[] <- c(3, 3, 4, 2)
r4[] <- c(6, 5, 5, 2)
s <- stack(r1, r2, r3, r4)

# try the two functions; the task is to change the NA in r2 to a number
res_sub <- calc(s, fun_sub)      # works great! substitutes the NA in c(1, NA, 4, 5) with the mean; I get c(1, 3.333, 4, 5)
res_inter <- calc(s, fun_interp) # returns an error; I expected c(1, 2.5, 4, 5) but cannot find the reason for the failure

# check whether interpNA() works on its own
interpNA(c(1, NA, 4, 5), method = "linear") # this works fine

I would do it this way, using standard raster functions:
# create data
library(raster)
r1 <- r2 <- r3 <- r4 <- raster(ncol = 2, nrow = 2)
r1[] <- c(1, 1, 1, 2)
r2[] <- c(5, 5, NA, 5)
r3[] <- c(3, 3, 4, 2)
r4[] <- c(6, 5, 5, 2)
s <- stack(r1, r2, r3, r4)
m <- mean(s, na.rm = TRUE)  # cell-wise mean across layers
r_sub <- cover(s, m)        # replace NAs with the mean layer
r_int <- approxNA(s)        # linear interpolation across layers
values(s)
##      layer.1 layer.2 layer.3 layer.4
## [1,]       1       5       3       6
## [2,]       1       5       3       5
## [3,]       1      NA       4       5
## [4,]       2       5       2       2
values(r_sub)
##      layer.1  layer.2 layer.3 layer.4
## [1,]       1 5.000000       3       6
## [2,]       1 5.000000       3       5
## [3,]       1 3.333333       4       5
## [4,]       2 5.000000       2       2
values(r_int)
##      layer.1 layer.2 layer.3 layer.4
## [1,]       1     5.0       3       6
## [2,]       1     5.0       3       5
## [3,]       1     2.5       4       5
## [4,]       2     5.0       2       2
To use the functions from the timeSeries package:
library(timeSeries)
library(raster)

# New data: calc() needs at least 6 cells to test the supplied function,
# which was the issue with the 2x2 raster above.
r1 <- r2 <- r3 <- r4 <- raster(ncol = 3, nrow = 2)
r1[] <- c(1, 1, 1, 2, 1, 1)
r2[] <- c(0, 5, NA, 5, 1, 1)
r3[] <- c(3, 3, 4, 2, 1, 1)
r4[] <- c(6, 5, 5, 2, 1, 1)
s <- stack(r1, r2, r3, r4)
values(s)
##      layer.1 layer.2 layer.3 layer.4
## [1,]       1       0       3       6
## [2,]       1       5       3       5
## [3,]       1      NA       4       5
## [4,]       2       5       2       2
## [5,]       1       1       1       1
## [6,]       1       1       1       1
values(calc(s, function(v) substituteNA(v, type = "mean")))
##      layer.1  layer.2 layer.3 layer.4
## [1,]       1 0.000000       3       6
## [2,]       1 5.000000       3       5
## [3,]       1 3.333333       4       5
## [4,]       2 5.000000       2       2
## [5,]       1 1.000000       1       1
## [6,]       1 1.000000       1       1
values(calc(s, function(v) interpNA(v, method = "linear")))
##      layer.1 layer.2 layer.3 layer.4
## [1,]       1     0.0       3       6
## [2,]       1     5.0       3       5
## [3,]       1     2.5       4       5
## [4,]       2     5.0       2       2
## [5,]       1     1.0       1       1
## [6,]       1     1.0       1       1


Ramp least squares estimation with CVXPY and DCCP

This is a ramp least squares estimation problem, described more precisely in mathematical form here:
https://scicomp.stackexchange.com/questions/33524/ramp-least-squares-estimation
I used Disciplined Convex-Concave Programming via the DCCP package, which is built on CVXPY. The code follows:
import cvxpy as cp
import numpy as np
import dccp
from dccp.problem import is_dccp

# Generate data.
m = 20
n = 15
np.random.seed(1)
X = np.random.randn(m, n)
Y = np.random.randn(m)

# Define and solve the DCCP problem.
def loss_fn(X, Y, beta):
    return cp.norm2(cp.matmul(X, beta) - Y)**2

def obj_g(X, Y, beta, sval):
    return cp.pos(loss_fn(X, Y, beta) - sval)

beta = cp.Variable(n)
s = 10000000000000
constr = obj_g(X, Y, beta, s)
t = cp.Variable(1)
t.value = [1]
cost = loss_fn(X, Y, beta) - t
problem = cp.Problem(cp.Minimize(cost), [constr >= t])
print("problem is DCP:", problem.is_dcp())   # False
print("problem is DCCP:", is_dccp(problem))  # True
problem.solve(verbose=True, solver=cp.ECOS, method='dccp')

# Print result.
print("\nThe optimal value is", problem.value)
print("The optimal beta is")
print(beta.value)
print("The norm of the residual is ", cp.norm(cp.matmul(X, beta) - Y, p=2).value)
Because of the large value of s, I would expect a solution similar to the least squares estimate. But there is no solution, as the output shows (the same happens with different solvers, problem dimensions, etc.):
problem is DCP: False
problem is DCCP: True
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -0.000e+00 +2e+01 9e-02 1e-04 1e+00 9e+00 --- --- 1 1 - | - -
1 -7.422e-04 +2.695e-09 +2e-01 1e-03 1e-06 1e-02 9e-02 0.9890 1e-04 2 1 1 | 0 0
2 -1.638e-05 +5.963e-11 +2e-03 1e-05 2e-08 1e-04 1e-03 0.9890 1e-04 2 1 1 | 0 0
3 -2.711e-07 +9.888e-13 +2e-05 1e-07 2e-10 2e-06 1e-05 0.9890 1e-04 4 1 1 | 0 0
4 -3.991e-09 +1.379e-14 +2e-07 1e-09 2e-12 2e-08 1e-07 0.9890 1e-04 1 0 0 | 0 0
5 -5.507e-11 +1.872e-16 +3e-09 2e-11 2e-14 2e-10 1e-09 0.9890 1e-04 1 0 0 | 0 0
OPTIMAL (within feastol=1.6e-11, reltol=4.8e+01, abstol=2.6e-09).
Runtime: 0.001112 seconds.
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -5.811e-01 +1e+01 6e-01 6e-01 1e+00 2e+00 --- --- 1 1 - | - -
1 -7.758e+00 -2.575e+00 +1e+00 2e-01 7e-01 6e+00 3e-01 0.9890 1e-01 1 1 1 | 0 0
2 -3.104e+02 -9.419e+01 +4e-02 2e-01 8e-01 2e+02 8e-03 0.9725 8e-04 2 1 1 | 0 0
3 -2.409e+03 -9.556e+02 +5e-03 2e-01 8e-01 1e+03 1e-03 0.8968 5e-02 3 2 2 | 0 0
4 -1.103e+04 -5.209e+03 +2e-03 2e-01 7e-01 6e+03 4e-04 0.9347 3e-01 2 2 2 | 0 0
5 -1.268e+04 -1.592e+03 +8e-04 1e-01 1e+00 1e+04 2e-04 0.7916 4e-01 3 2 2 | 0 0
6 -1.236e+05 -2.099e+04 +9e-05 1e-01 1e+00 1e+05 2e-05 0.8979 9e-03 1 1 1 | 0 0
7 -4.261e+05 -1.850e+05 +4e-05 2e-01 7e-01 2e+05 1e-05 0.7182 3e-01 2 1 1 | 0 0
8 -2.492e+07 -1.078e+07 +7e-07 1e-01 7e-01 1e+07 2e-07 0.9838 1e-04 3 2 2 | 0 0
9 -2.226e+08 -9.836e+07 +5e-08 9e-02 5e-01 1e+08 1e-08 0.9339 2e-03 2 3 2 | 0 0
UNBOUNDED (within feastol=1.0e-09).
Runtime: 0.001949 seconds.
The optimal value is None
The optimal beta is
None
The norm of the residual is None
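For reference, the unconstrained least-squares fit the poster hoped to approximate can be computed directly: with a huge threshold s, the ramp loss min(loss, s) reduces to the plain squared loss, so a working DCCP solve should land near this baseline. A sketch with numpy, reusing the seed and shapes from the question:

```python
import numpy as np

# Same data generation as in the question
m, n = 20, 15
np.random.seed(1)
X = np.random.randn(m, n)
Y = np.random.randn(m)

# Ordinary least-squares baseline: for a huge threshold s the ramp loss
# reduces to the squared loss, so the DCCP beta should be close to this.
beta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
residual = np.linalg.norm(X @ beta_ls - Y)
print("least-squares residual norm:", residual)
```

Comparing beta.value against beta_ls is a quick way to tell whether the DCCP formulation, rather than the data, is the problem.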

Count unique values across multiple columns

I have data in this format:
A B C D
1 1 1 1
1 1 1 2
1 1 1 3
1 1 1 4
...
4 4 4 4
I want to count the number of unique values in each row and print it. Desired output:
A B C D unique-count
1 1 1 1 4
1 1 1 2 3
1 1 1 3 3
1 1 1 4 3
...
4 4 4 4 4
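A note on the sample output: the "unique-count" shown is actually the number of occurrences of the most frequent value in the row (1 1 1 1 gives 4, 1 1 1 2 gives 3), not the count of distinct values (which would be 1 and 2). A pandas sketch of both interpretations, assuming the data sits in a DataFrame with columns A through D:

```python
import pandas as pd

# Hypothetical rows in the question's format
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 4],
    "B": [1, 1, 1, 1, 4],
    "C": [1, 1, 1, 1, 4],
    "D": [1, 2, 3, 4, 4],
})

# Count of distinct values per row
df["distinct"] = df[["A", "B", "C", "D"]].nunique(axis=1)

# Count of occurrences of the most frequent value per row,
# which is what the question's sample output actually shows
df["mode_count"] = df[["A", "B", "C", "D"]].apply(
    lambda r: r.value_counts().max(), axis=1
)
print(df)
```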

How to predict glm using new entry?

I have a data frame with the following structure:
'data.frame': 29092 obs. of 8 variables:
$ loan_status : int 0 0 0 0 0 0 1 0 1 0 ...
$ loan_amnt : int 5000 2400 10000 5000 3000 12000 9000 3000 10000 1000 ...
$ grade : Factor w/ 7 levels "A","B","C","D",..: 2 3 3 1 5 2 3 2 2 4 .
$ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 4 4 4 3 4 4 4 4
$ annual_inc : num 24000 12252 49200 36000 48000 ...
$ age : int 33 31 24 39 24 28 22 22 28 22 ...
$ ir_cat : Factor w/ 5 levels "0-8","11-13.5",..: 4 5 2 5 5 2 2 4 4 3 .
$ emp_cat : Factor w/ 5 levels "0-15","15-30",..: 1 2 1 1 1 1 1 1 1 1 ...
I run a logistic regression with the goal of predicting loan_status, and I want to use predict for a new entry, say:
loan_amnt = 4200
grade = C
home_ownership = MORTGAGE
annual_income = 32500
age = 31
ir_cat = "0-8"
emp_cat = "0-15"
Let's say I run
glm(loan_status ~ ., data = loan_data, family = "binomial") -> glm1
and use predict:
predict(glm1, newdata, type = "response")
My problem is: how do I put my new entry into newdata?
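newdata just needs to be a one-row data.frame whose column names match the predictors and whose factor columns use levels that exist in the training data. A sketch in R, assuming loan_data is the training frame from the question (levels taken from the str() output above):

```r
# One-row data frame for the new applicant; column names must match the
# variables used to fit the model, and factor columns must reuse the
# training data's levels.
newdata <- data.frame(
  loan_amnt      = 4200L,
  grade          = factor("C", levels = levels(loan_data$grade)),
  home_ownership = factor("MORTGAGE", levels = levels(loan_data$home_ownership)),
  annual_inc     = 32500,
  age            = 31L,
  ir_cat         = factor("0-8", levels = levels(loan_data$ir_cat)),
  emp_cat        = factor("0-15", levels = levels(loan_data$emp_cat))
)
predict(glm1, newdata = newdata, type = "response")  # predicted probability of loan_status = 1
```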

How does Weka evaluate classifier model

I used the random forest algorithm and got this result:
=== Summary ===
Correctly Classified Instances     10547     97.0464 %
Incorrectly Classified Instances     321      2.9536 %
Kappa statistic                        0.9642
Mean absolute error                    0.0333
Root mean squared error                0.0952
Relative absolute error               18.1436 %
Root relative squared error           31.4285 %
Total Number of Instances          10868
=== Confusion Matrix ===
    a     b     c     d     e     f     g     h     i   <-- classified as
 1518     1     3     1     0    14     0     0     4 |  a = a
    3  2446     0     0     0     1     1    27     0 |  b = b
    0     0  2942     0     0     0     0     0     0 |  c = c
    0     0     0   470     0     1     1     2     1 |  d = d
    9     0     0     9     2    19     0     3     0 |  e = e
   23     1     2    19     0   677     1    22     6 |  f = f
    4     0     2     0     0    13   379     0     0 |  g = g
   63     2     6    17     0    15     0  1122     3 |  h = h
    9     0     0     0     0     9     0     4   991 |  i = i
I wonder how Weka evaluates errors (mean absolute error, root mean squared error, ...) for non-numerical class values ('a', 'b', ...).
I mapped each class to a number from 0 to 8 and evaluated the errors manually, but my results differed from Weka's.
How can I reimplement Weka's evaluation steps?
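As far as I know, Weka computes these measures on the predicted class-probability distribution against a 0/1 indicator vector for the true class, averaging over all instances and all classes, rather than on a numeric encoding of the labels; that is why mapping classes to 0-8 gives different numbers. A sketch of that computation (my understanding of Weka's convention, not taken from its source):

```python
import numpy as np

def weka_style_errors(probs, true_idx):
    """probs: (N, K) predicted class distributions; true_idx: (N,) true class indices."""
    n, k = probs.shape
    onehot = np.zeros((n, k))
    onehot[np.arange(n), true_idx] = 1.0
    mae = np.mean(np.abs(probs - onehot))          # averaged over all N*K entries
    rmse = np.sqrt(np.mean((probs - onehot) ** 2))
    return mae, rmse

# Tiny example: two instances, three classes, both classified correctly
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
mae, rmse = weka_style_errors(probs, np.array([0, 1]))
```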

How to encode video 3840x2160 with 32x32 and 16x16 CU with depth 2 and 1 in HEVC Encoder HM 13

When I try to encode a video, the encoder crashes after finishing the first GOP.
This is the configuration I'm using:
MaxCUWidth : 16 # Maximum coding unit width in pixel
MaxCUHeight : 16 # Maximum coding unit height in pixel
MaxPartitionDepth : 2 # Maximum coding unit depth
QuadtreeTULog2MaxSize : 3 # Log2 of maximum transform size for
# quadtree-based TU coding (2...5) = MaxPartitionDepth + 2 - 1
QuadtreeTULog2MinSize : 2 # Log2 of minimum transform size for
# quadtree-based TU coding (2...5)
QuadtreeTUMaxDepthInter : 1
QuadtreeTUMaxDepthIntra : 1
#======== Coding Structure =============
IntraPeriod : 8 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 1 # Random Access 0:none, 1:CDR, 2:IDR
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: P 4 1 0.5 0 0 0 1 1 -4 0
Frame2: B 2 2 0.5 1 0 1 1 2 -2 2 1 2 2 1 1
Frame3: B 1 3 0.5 2 0 2 1 3 -1 1 3 1 1 3 1 1 1
Frame4: B 3 3 0.5 2 0 2 1 2 -1 1 1 -2 4 0 1 1 0
This also happens with CU=16x16 and depth=1.
Note: I encoded with CU=64x64 and depth=4 using the same GOP configuration and everything went fine.
This is most probably because you compiled the binary for a 32-bit system. Rebuild it for a 64-bit system and the problem should go away.
