I need to substitute or interpolate the NA values in the per-cell vector of a RasterStack. I have two functions: fun_sub for substituting NA and fun_interp for interpolating NA.
fun_sub works very well, but fun_interp fails and I cannot find the reason.
Thank you very much
Tianyi Zhang
Below is a simple example:
#------EXAMPLE
library(timeSeries)
library(raster)
fun_sub <- function(x) {
  # substitute NA with the mean of the cell's vector across layers
  v <- as.vector(x)
  z <- substituteNA(v, type = "mean")
  return(z)
}
fun_interp <- function(x) {
  # interpolate NA linearly along the cell's vector across layers
  v <- as.vector(x)
  z <- interpNA(v, method = "linear")
  return(z)
}
# create data
r <- raster(ncols = 2, nrows = 2)
r1 <- r; r2 <- r; r3 <- r; r4 <- r
r1[] <- c(1, 1, 1, 2)
r2[] <- c(5, 5, NA, 5)
r3[] <- c(3, 3, 4, 2)
r4[] <- c(6, 5, 5, 2)
s <- stack(r1, r2, r3, r4)
# try the two functions; the task is to replace the NA in r2 with a number
res_sub <- calc(s, fun_sub)       # works: the NA in c(1, NA, 4, 5) is replaced by the mean, giving c(1, 3.333, 4, 5)
res_inter <- calc(s, fun_interp)  # throws an error; I expected c(1, 2.5, 4, 5)
# check whether interpNA() itself works
interpNA(c(1, NA, 4, 5), method = "linear")   # works fine on its own
I would do it this way, using standard raster functions:
# create data
library(raster)
r1 <- r2 <- r3 <- r4 <- raster(ncol = 2, nrow = 2)
r1[] <- c(1,1,1,2)
r2[] <- c(5,5,NA,5)
r3[] <- c(3,3,4,2)
r4[] <- c(6,5,5,2)
s <- stack(r1, r2, r3, r4)
m <- mean(s, na.rm=TRUE)   # cell-wise mean across layers, ignoring NA
r_sub <- cover(s, m)       # replace NA cells in each layer with that mean
r_int <- approxNA(s)       # linear interpolation of NA across layers
values(s)
## layer.1 layer.2 layer.3 layer.4
##[1,] 1 5 3 6
##[2,] 1 5 3 5
##[3,] 1 NA 4 5
##[4,] 2 5 2 2
values(r_sub)
## layer.1 layer.2 layer.3 layer.4
##[1,] 1 5.000000 3 6
##[2,] 1 5.000000 3 5
##[3,] 1 3.333333 4 5
##[4,] 2 5.000000 2 2
values(r_int)
## layer.1 layer.2 layer.3 layer.4
##[1,] 1 5.0 3 6
##[2,] 1 5.0 3 5
##[3,] 1 2.5 4 5
##[4,] 2 5.0 2 2
To use the functions from the timeSeries package:
library(timeSeries)
library(raster)
New data: at least 6 cells are needed for calc to test the function, which was the issue with the 2x2 example above.
r1 <- r2 <- r3 <- r4 <- raster(ncol = 3, nrow = 2)
r1[] <- c(1,1,1,2,1,1)
r2[] <- c(0,5,NA,5,1,1)
r3[] <- c(3,3,4,2,1,1)
r4[] <- c(6,5,5,2,1,1)
s <- stack(r1,r2,r3,r4)
values(s)
## layer.1 layer.2 layer.3 layer.4
##[1,] 1 0 3 6
##[2,] 1 5 3 5
##[3,] 1 NA 4 5
##[4,] 2 5 2 2
##[5,] 1 1 1 1
##[6,] 1 1 1 1
values(calc(s, function(v) substituteNA(v,type="mean")))
## layer.1 layer.2 layer.3 layer.4
##[1,] 1 0.000000 3 6
##[2,] 1 5.000000 3 5
##[3,] 1 3.333333 4 5
##[4,] 2 5.000000 2 2
##[5,] 1 1.000000 1 1
##[6,] 1 1.000000 1 1
values(calc(s, function(v) interpNA(v, method="linear")))
## layer.1 layer.2 layer.3 layer.4
##[1,] 1 0.0 3 6
##[2,] 1 5.0 3 5
##[3,] 1 2.5 4 5
##[4,] 2 5.0 2 2
##[5,] 1 1.0 1 1
##[6,] 1 1.0 1 1
This is a ramp least squares estimation problem, described more fully in mathematical form here:
https://scicomp.stackexchange.com/questions/33524/ramp-least-squares-estimation
I used Disciplined Convex-Concave Programming via the DCCP package, which is built on CVXPY. The code follows:
import cvxpy as cp
import numpy as np
import dccp
from dccp.problem import is_dccp
# Generate data.
m = 20
n = 15
np.random.seed(1)
X = np.random.randn(m, n)
Y = np.random.randn(m)
# Define and solve the DCCP problem.
def loss_fn(X, Y, beta):
    return cp.norm2(cp.matmul(X, beta) - Y)**2
def obj_g(X, Y, beta, sval):
    return cp.pos(loss_fn(X, Y, beta) - sval)
beta = cp.Variable(n)
s = 10000000000000
constr = obj_g(X, Y, beta, s)
t = cp.Variable(1)
t.value = [1]
cost = loss_fn(X, Y, beta) - t
problem = cp.Problem(cp.Minimize(cost), [constr >= t])
print("problem is DCP:", problem.is_dcp()) # false
print("problem is DCCP:", is_dccp(problem)) # true
problem.solve(verbose=True, solver=cp.ECOS, method='dccp')
# Print result.
print("\nThe optimal value is", problem.value)
print("The optimal beta is")
print(beta.value)
print("The norm of the residual is ", cp.norm(X*beta - Y, p=2).value)
Because of the large value of s, I would expect to get a solution close to the ordinary least squares estimate. But no solution is returned, as the output shows (the same happens with different solvers, problem dimensions, etc.):
problem is DCP: False
problem is DCCP: True
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -0.000e+00 +2e+01 9e-02 1e-04 1e+00 9e+00 --- --- 1 1 - | - -
1 -7.422e-04 +2.695e-09 +2e-01 1e-03 1e-06 1e-02 9e-02 0.9890 1e-04 2 1 1 | 0 0
2 -1.638e-05 +5.963e-11 +2e-03 1e-05 2e-08 1e-04 1e-03 0.9890 1e-04 2 1 1 | 0 0
3 -2.711e-07 +9.888e-13 +2e-05 1e-07 2e-10 2e-06 1e-05 0.9890 1e-04 4 1 1 | 0 0
4 -3.991e-09 +1.379e-14 +2e-07 1e-09 2e-12 2e-08 1e-07 0.9890 1e-04 1 0 0 | 0 0
5 -5.507e-11 +1.872e-16 +3e-09 2e-11 2e-14 2e-10 1e-09 0.9890 1e-04 1 0 0 | 0 0
OPTIMAL (within feastol=1.6e-11, reltol=4.8e+01, abstol=2.6e-09).
Runtime: 0.001112 seconds.
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -5.811e-01 +1e+01 6e-01 6e-01 1e+00 2e+00 --- --- 1 1 - | - -
1 -7.758e+00 -2.575e+00 +1e+00 2e-01 7e-01 6e+00 3e-01 0.9890 1e-01 1 1 1 | 0 0
2 -3.104e+02 -9.419e+01 +4e-02 2e-01 8e-01 2e+02 8e-03 0.9725 8e-04 2 1 1 | 0 0
3 -2.409e+03 -9.556e+02 +5e-03 2e-01 8e-01 1e+03 1e-03 0.8968 5e-02 3 2 2 | 0 0
4 -1.103e+04 -5.209e+03 +2e-03 2e-01 7e-01 6e+03 4e-04 0.9347 3e-01 2 2 2 | 0 0
5 -1.268e+04 -1.592e+03 +8e-04 1e-01 1e+00 1e+04 2e-04 0.7916 4e-01 3 2 2 | 0 0
6 -1.236e+05 -2.099e+04 +9e-05 1e-01 1e+00 1e+05 2e-05 0.8979 9e-03 1 1 1 | 0 0
7 -4.261e+05 -1.850e+05 +4e-05 2e-01 7e-01 2e+05 1e-05 0.7182 3e-01 2 1 1 | 0 0
8 -2.492e+07 -1.078e+07 +7e-07 1e-01 7e-01 1e+07 2e-07 0.9838 1e-04 3 2 2 | 0 0
9 -2.226e+08 -9.836e+07 +5e-08 9e-02 5e-01 1e+08 1e-08 0.9339 2e-03 2 3 2 | 0 0
UNBOUNDED (within feastol=1.0e-09).
Runtime: 0.001949 seconds.
The optimal value is None
The optimal beta is
None
The norm of the residual is None
I have data in this format
A B C D
1 1 1 1
1 1 1 2
1 1 1 3
1 1 1 4
...
4 4 4 4
I want to count the number of unique values in each row and print it.
Desired output:
A B C D unique-count
1 1 1 1 4
1 1 1 2 3
1 1 1 3 3
1 1 1 4 3
...
4 4 4 4 4
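A minimal R sketch, assuming the data sits in a data frame with columns A to D (names assumed). Note that a literal count of distinct values per row does not reproduce the example output, while counting how often the most frequent value occurs in a row does, so both readings are shown:
# hypothetical data frame matching the layout shown above
df <- data.frame(A = c(1, 1, 1, 1, 4),
                 B = c(1, 1, 1, 1, 4),
                 C = c(1, 1, 1, 1, 4),
                 D = c(1, 2, 3, 4, 4))
# literal reading: number of distinct values in each row (gives 1, 2, 2, 2, 1 here)
apply(df, 1, function(x) length(unique(x)))
# reading that reproduces the "unique-count" column above:
# how many times the most frequent value occurs in each row
df$`unique-count` <- apply(df, 1, function(x) max(table(x)))
df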
I have a data frame with the following structure:
'data.frame': 29092 obs. of 8 variables:
$ loan_status : int 0 0 0 0 0 0 1 0 1 0 ...
$ loan_amnt : int 5000 2400 10000 5000 3000 12000 9000 3000 10000 1000 ...
$ grade : Factor w/ 7 levels "A","B","C","D",..: 2 3 3 1 5 2 3 2 2 4 .
$ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 4 4 4 3 4 4 4 4
$ annual_inc : num 24000 12252 49200 36000 48000 ...
$ age : int 33 31 24 39 24 28 22 22 28 22 ...
$ ir_cat : Factor w/ 5 levels "0-8","11-13.5",..: 4 5 2 5 5 2 2 4 4 3 .
$ emp_cat : Factor w/ 5 levels "0-15","15-30",..: 1 2 1 1 1 1 1 1 1 1 ...
I run a logistic regression with the goal of predicting loan_status, and I want to use predict for a new entry, say:
loan_amnt = 4200
grade = C
home_ownership = MORTGAGE
annual_income = 32500
age = 31
ir_cat = "0-8"
emp_cat = "0-15"
Let's say I run
glm(loan_status ~ ., data = loan_data, family = "binomial") -> glm1
and use predict:
predict(glm1, newdata, type = "response")
My problem is: how do I put my new entry into newdata?
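A minimal sketch of how newdata could be built for this model; the column names follow the str() output above (note the data frame uses annual_inc, not annual_income), and the factor levels are copied from loan_data so that predict() sees the same coding:
# hypothetical one-row data frame with the same columns and factor levels as loan_data
newdata <- data.frame(
  loan_amnt      = 4200L,
  grade          = factor("C",        levels = levels(loan_data$grade)),
  home_ownership = factor("MORTGAGE", levels = levels(loan_data$home_ownership)),
  annual_inc     = 32500,
  age            = 31L,
  ir_cat         = factor("0-8",      levels = levels(loan_data$ir_cat)),
  emp_cat        = factor("0-15",     levels = levels(loan_data$emp_cat))
)
predict(glm1, newdata = newdata, type = "response")   # predicted probability that loan_status = 1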
I used the random forest algorithm and got this result:
=== Summary ===
Correctly Classified Instances 10547 97.0464 %
Incorrectly Classified Instances 321 2.9536 %
Kappa statistic 0.9642
Mean absolute error 0.0333
Root mean squared error 0.0952
Relative absolute error 18.1436 %
Root relative squared error 31.4285 %
Total Number of Instances 10868
=== Confusion Matrix ===
a b c d e f g h i <-- classified as
1518 1 3 1 0 14 0 0 4 | a = a
3 2446 0 0 0 1 1 27 0 | b = b
0 0 2942 0 0 0 0 0 0 | c = c
0 0 0 470 0 1 1 2 1 | d = d
9 0 0 9 2 19 0 3 0 | e = e
23 1 2 19 0 677 1 22 6 | f = f
4 0 2 0 0 13 379 0 0 | g = g
63 2 6 17 0 15 0 1122 3 | h = h
9 0 0 0 0 9 0 4 991 | i = i
I wonder how Weka evaluates the errors (mean absolute error, root mean squared error, ...) for non-numerical class values ('a', 'b', ...).
I mapped the classes to numbers from 0 to 8 and computed the errors manually, but my results differed from Weka's.
How can I reimplement Weka's evaluation steps?
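As far as I know, Weka computes these errors from the predicted class-probability distribution rather than from class indices: for each instance, the predicted distribution is compared with a 0/1 indicator vector of the true class, and the absolute (or squared) differences are averaged over all classes and instances; the "relative" variants divide by the same quantity computed for a baseline that always predicts the prior class distribution. A rough R sketch of that idea (names and data are made up; this is not Weka's source code):
# probs: n x k matrix of predicted class probabilities (rows sum to 1)
# truth: integer vector of true class indices (1..k), length n
weka_like_errors <- function(probs, truth) {
  n <- nrow(probs)
  k <- ncol(probs)
  actual <- matrix(0, n, k)             # 0/1 indicator matrix of the true classes
  actual[cbind(seq_len(n), truth)] <- 1
  diff <- probs - actual
  list(mae  = mean(abs(diff)),          # averaged over all n * k entries
       rmse = sqrt(mean(diff^2)))
}
# tiny made-up example with 3 classes
probs <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.8, 0.1))
weka_like_errors(probs, truth = c(1, 2))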
When I try to encode a video, the encoder crashes after finishing the first GOP.
This is the configuration I'm using:
MaxCUWidth : 16 # Maximum coding unit width in pixel
MaxCUHeight : 16 # Maximum coding unit height in pixel
MaxPartitionDepth : 2 # Maximum coding unit depth
QuadtreeTULog2MaxSize : 3 # Log2 of maximum transform size for
# quadtree-based TU coding (2...5) = MaxPartitionDepth + 2 - 1
QuadtreeTULog2MinSize : 2 # Log2 of minimum transform size for
# quadtree-based TU coding (2...5)
QuadtreeTUMaxDepthInter : 1
QuadtreeTUMaxDepthIntra : 1
#======== Coding Structure =============
IntraPeriod : 8 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 1 # Random Access 0:none, 1:CDR, 2:IDR
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: P 4 1 0.5 0 0 0 1 1 -4 0
Frame2: B 2 2 0.5 1 0 1 1 2 -2 2 1 2 2 1 1
Frame3: B 1 3 0.5 2 0 2 1 3 -1 1 3 1 1 3 1 1 1
Frame4: B 3 3 0.5 2 0 2 1 2 -1 1 1 -2 4 0 1 1 0
This also happens with CU = 16x16 and depth = 1.
Note: I encoded with CU = 64x64 and depth = 4 using the same GOP configuration and everything went fine.
This is most probably because you compiled the binary for a 32-bit system. Rebuild it for a 64-bit system and the problem should go away.