Issue with converting R script into Shiny app - histogram

I need your support converting this R script into a Shiny app. I've been trying for quite some time, but I keep getting errors. The code applies stochastic modeling to business valuation.
The inputs are:
the number of desired simulations, i.e. lapply(1:10000),
Sales volume,
Selling price,
Unit cost,
Fixed costs.
The outputs are:
Net profit (= SalesVolume * (SellingPrice - unitcost) - fixedcosts),
a histogram of Net profit.
The script is:
library(triangle)  # provides rtriangle() for the triangular distribution

mydata <- lapply(1:10000, function(i) {
  # draw one of three sales scenarios with equal probability
  DU <- sample(x = 1:3, size = 1, replace = TRUE)
  if (DU == 1) {
    SalesVolume <- 100000
    SellingPrice <- 8
  }
  if (DU == 2) {
    SalesVolume <- 75000
    SellingPrice <- 10
  }
  if (DU == 3) {
    SalesVolume <- 50000
    SellingPrice <- 11
  }
  unitcost <- rtriangle(1, 5.5, 7.5)  # unit cost drawn from a triangular distribution
  fixedcosts <- 120000
  SalesVolume * (SellingPrice - unitcost) - fixedcosts  # net profit, returned to lapply()
})
NetProfit <- unlist(mydata)
summary(NetProfit)
par(mfrow= c(1,1))
hist(NetProfit)
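Since the question is about wrapping this in Shiny, here is a minimal sketch of one way to do it. The input IDs (n_sims, fixedcosts), the layout, and the choice to expose only the simulation count and fixed costs as inputs are my own; the simulation logic is the script above.

library(shiny)
library(triangle)

ui <- fluidPage(
  titlePanel("Stochastic business valuation"),
  sidebarLayout(
    sidebarPanel(
      numericInput("n_sims", "Number of simulations", value = 10000, min = 1),
      numericInput("fixedcosts", "Fixed costs", value = 120000, min = 0)
    ),
    mainPanel(
      verbatimTextOutput("summary"),
      plotOutput("hist")
    )
  )
)

server <- function(input, output) {
  # re-run the simulation whenever an input changes
  net_profit <- reactive({
    sapply(seq_len(input$n_sims), function(i) {
      DU <- sample(1:3, size = 1)
      SalesVolume <- c(100000, 75000, 50000)[DU]
      SellingPrice <- c(8, 10, 11)[DU]
      unitcost <- rtriangle(1, 5.5, 7.5)
      SalesVolume * (SellingPrice - unitcost) - input$fixedcosts
    })
  })
  output$summary <- renderPrint(summary(net_profit()))
  output$hist <- renderPlot(hist(net_profit(), main = "Net profit"))
}

shinyApp(ui = ui, server = server)

Sales volume, selling price, and the triangular-cost parameters could be exposed as further numericInput controls in the same way.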

Related

R: Error in predict.xgboost: Feature names stored in `object` and `newdata` are different

I wrote a script that uses xgboost to predict soil class for a certain area using data from field samples and satellite images. The script is below:
rm(list=ls())
library(xgboost)
library(caret)
library(raster)
library(sp)
library(rgeos)
library(ggplot2)
setwd("G:/DATA")
data <- read.csv('96PointsClay02finalone.csv')
head(data)
summary(data)
dim(data)
ras <- stack("Allindices04TIFF.tif")
names(ras) <- c("b1", "b2", "b3", "b4", "b5", "b6", "b7", "b10", "b11","DEM",
"R1011", "SCI", "SAVI", "NDVI", "NDSI", "NDSandI", "MBSI",
"GSI", "GSAVI", "EVI", "DryBSI", "BIL", "BI","SRCI")
set.seed(27) # set seed for reproducibility
# use createDataPartition() from the caret package to split the data into training (80%) and testing (20%) sets
parts = createDataPartition(data$Clay, p = .8, list = F)
train = data[parts, ]
test = data[-parts, ]
#define predictor and response variables in training set
train_x = data.matrix(train[, -1])
train_y = train[,1]
#define predictor and response variables in testing set
test_x = data.matrix(test[, -1])
test_y = test[, 1]
#define final training and testing sets
xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test = xgb.DMatrix(data = test_x, label = test_y)
#defining a watchlist
watchlist = list(train=xgb_train, test=xgb_test)
#fit XGBoost model and display training and testing data at each iteration
model = xgb.train(data = xgb_train, max.depth = 3, watchlist=watchlist, nrounds = 100)
#define final model
model_xgboost = xgboost(data = xgb_train, max.depth = 3, nrounds = 86, verbose = 0)
summary(model_xgboost)
#use model to make predictions on test data
pred_y = predict(model_xgboost, xgb_test)
# performance metrics on the test data
mean((test_y - pred_y)^2) #mse - Mean Squared Error
caret::RMSE(test_y, pred_y) #rmse - Root Mean Squared Error
y_test_mean = mean(test_y)
rmseE<- function(error)
{
sqrt(mean(error^2))
}
y = test_y
yhat = pred_y
rmseresult=rmseE(y-yhat)
(r2 = R2(yhat , y, form = "traditional"))
cat('The R-square of the test data is ', round(r2,4), ' and the RMSE is ', round(rmseresult,4), '\n')
#use model to make predictions on satellite image
result <- predict(model_xgboost, ras[1:(nrow(ras)*ncol(ras))])
#create a result raster
res <- raster(ras)
#fill in results and add a "1" to them (to get back to initial class numbering! - see above "Prepare data" for more information)
res <- setValues(res,result+1)
#Save the output .tif file into saved directory
writeRaster(res, "xgbmodel_output", format = "GTiff", overwrite=T)
The script works well until it reaches
result <- predict(model_xgboost, ras[1:(nrow(ras)*ncol(ras))])
where it takes some time and then gives this error:
Error in predict.xgb.Booster(model_xgboost, ras[1:(nrow(ras) * ncol(ras))]) :
Feature names stored in `object` and `newdata` are different!
I realize that I am doing something wrong in that line, but I do not know how to apply the xgboost model to a raster image that represents my study area. I would highly appreciate it if someone could lend a hand and help me solve this problem.
My data as CSV and the raster image can be found here.
Finally, I found the reason for this error: it was my mistake, as the number of columns in the training data was not the same as the number of layers in the satellite image.
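For anyone hitting the same error, a minimal sketch of one way to line the raster up with the training features, assuming (as in the script above) that the raster layers carry the same names as the CSV predictor columns:

# extract all cell values as a matrix: one row per cell, one column per layer
vals <- raster::getValues(ras)
# keep exactly the training columns, in the training order
vals <- vals[, colnames(train_x), drop = FALSE]
result <- predict(model_xgboost, xgb.DMatrix(data = vals))
# write the predictions back into a raster with the same geometry
res <- raster(ras)
res <- setValues(res, result)

If your classes were shifted before training (the original script adds 1 back), apply the same offset to result before setValues().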

Handling error with regressions inside a parallel foreach loop

Hi, I am having issues with a foreach loop in which, at every iteration, I estimate regressions on a subset of the data with a different list of controls for several outcomes. The problem is that for some outcomes in some countries I only have missing values, so the regression function returns an error. I would like the loop to run to completion and return NAs, or a string such as "Error", instead of the coefficient table for the failing cases. I have tried several things, but they don't quite work with the .combine = rbind option, and with .combine = c I get a very messy output. Thanks in advance for any help.
reg <- function(y, d, c) {
  if (missing(c))
    feols(as.formula(paste0(y, "~ 0 + treatment")), data = d)
  else {
    feols(as.formula(paste0(y, "~ 0 + treatment + ", c)), data = d)
  }
}
# Here we set up the parallelization to run the code on the server
n.cores <- 9  # parallel::detectCores() - 1
# create the cluster
my.cluster <- parallel::makeCluster(
  n.cores,
  type = "PSOCK"
)
# print(my.cluster)
# register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)
# check if it is registered (optional)
# foreach::getDoParRegistered()
# how many workers are available? (optional)
# foreach::getDoParWorkers()
# Here is the cycle that regresses each outcome, in parallel, on the global
# treatment variable for each RCT with strata controls
tables <- foreach(
  n = 1:9, .combine = rbind, .packages = c('data.table', 'fixest'),
  .errorhandling = "pass"
) %dopar% {
  dt_target <- dt[country == n]
  c <- controls[n]
  est <- lapply(outcomes, function(x) reg(y = x, d = dt_target, c))
  table <- etable(est, drop = "!treatment", cluster = "uid", fitstat = "n")
  table
}
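No answer is recorded here, but one common pattern is to catch the error per outcome inside the loop body, so a failing regression yields a placeholder instead of killing the iteration. A hedged sketch, reusing reg(), outcomes, controls, and dt from the question (the NA placeholder is my own choice):

tables <- foreach(
  n = 1:9, .combine = rbind, .packages = c('data.table', 'fixest')
) %dopar% {
  dt_target <- dt[country == n]
  c <- controls[n]
  # regressions that error out become NULL instead of stopping the worker
  est <- lapply(outcomes, function(x) {
    tryCatch(reg(y = x, d = dt_target, c), error = function(e) NULL)
  })
  ok <- !vapply(est, is.null, logical(1))
  if (any(ok)) {
    etable(est[ok], drop = "!treatment", cluster = "uid", fitstat = "n")
  } else {
    NA  # placeholder for countries where every regression failed
  }
}

Depending on what etable() returns for your models, you may still need to harmonize the columns across iterations before .combine = rbind will accept them.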

Store results from a nested foreach loop in an FBM class

I am working with a big shared-memory matrix of 1.3e6 x 1.3e6 inside a foreach loop. I create that matrix with the FBM() function of the bigstatsr package.
I need the results of the loop stored in the FBM object so that I do not run out of RAM.
This is what I want to do without the FBM object:
library(doParallel)
library(foreach)
library(doFuture)
cl <- makeCluster(2)
registerDoParallel(cl)
registerDoFuture()
plan(multicore)
results <- foreach(a = 1:4, .combine = 'cbind') %dopar% {
  a <- a - 1
  foreach(b = 1:2, .combine = 'c') %dopar% {
    return(10*a + b)
  }
}
And this is how I try it with the FBM:
library(bigstatsr)
results <- FBM(4, 4, init = 0)
saveinFBM <- function(x, j) {results[, j] <- x}
foreach(a = 1:4, .combine = 'savinFBM') %dopar% {
  a <- a - 1
  foreach(b = 1:2, .combine = 'c') %dopar% {
    return(10*a + b)
  }
}
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'savinFBM' of mode 'function' was not found
PS: Could anybody add the tag "dofuture"?
If I understand correctly what you want to do, a faster alternative is using outer(1:2, 1:4, function(b, a) 10 * (a - 1) + b).
If you want to fill an FBM with this function, you can do:
library(bigstatsr)
X <- FBM(200, 400)
big_apply(X, a.FUN = function(X, ind) {
  X[, ind] <- outer(rows_along(X), ind, function(b, a) 10 * (a - 1) + b)
  NULL
})
Usually, using parallelism won't help when you write data to disk (which is what you do when you fill X[, ind]), but if you really want to try, you can pass ncores = nb_cores() as an additional argument to big_apply().
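As an aside, the immediate error in the question is the typo 'savinFBM' for saveinFBM, and a .combine function receives accumulated results rather than a column index anyway. A hedged sketch of a more direct fix is to write into the FBM from inside the loop body; note the FBM here is 2 x 4, matching the 2-row blocks the inner loop produces:

library(bigstatsr)
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

results <- FBM(2, 4, init = 0)  # one 2-row column per value of a
foreach(a = 1:4, .packages = "bigstatsr") %dopar% {
  # FBMs are file-backed, so each worker can write its column directly
  results[, a] <- 10 * (a - 1) + 1:2
  NULL
}
stopCluster(cl)
results[]  # read the filled matrix back into RAM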

MPSCNN Weight Ordering

The Metal Performance Shaders framework provides support for building your own convolutional neural nets. When creating, for instance, an MPSCNNConvolution, it requires a 4D weight tensor as an init parameter, represented as a 1D float pointer.
init(device: MTLDevice,
     convolutionDescriptor: MPSCNNConvolutionDescriptor,
     kernelWeights: UnsafePointer<Float>,
     biasTerms: UnsafePointer<Float>?,
     flags: MPSCNNConvolutionFlags)
The documentation has this to say about the 4D tensor
The layout of the filter weight is arranged so that it can be
reinterpreted as a 4D tensor (array)
weight[outputChannels][kernelHeight][kernelWidth][inputChannels/groups]
Unfortunately, that information doesn't really tell me how to arrange a 4D array into a one-dimensional Float pointer.
I tried ordering the weights like the BNNS counterpart requires it, but without luck.
How do I properly represent the 4D tensor (array) as a 1D Float pointer (array)?
PS: I tried arranging it like a C array and getting the pointer to the flat array, but it didn't work.
UPDATE
@RhythmicFistman: That's how I stored it in a plain array, which I can convert to an UnsafePointer<Float> (but it doesn't work):
var output = Array<Float>(repeating: 0, count: weights.count)
for o in 0..<outputChannels {
    for ky in 0..<kernelHeight {
        for kx in 0..<kernelWidth {
            for i in 0..<inputChannels {
                let offset = ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i
                output[offset] = ...
            }
        }
    }
}
OK, so I figured it out. Here are the two Python functions I use to reshape my convolution and fully connected matrices:
import numpy as np

# shape required for MPSCNN is [oC kH kW iC]
# tensorflow order is [kH kW iC oC]
def convshape(a):
    a = np.swapaxes(a, 2, 3)
    a = np.swapaxes(a, 1, 2)
    a = np.swapaxes(a, 0, 1)
    return a

# fully connected only requires an x/y swap
def fullshape(a):
    a = np.swapaxes(a, 0, 1)
    return a
This is something I recently had to do for Caffe weights, so I can provide the Swift implementation for how I reordered those. The following function takes in a Float array of Caffe weights for a convolution (in [c_o][c_i][h][w] order) and reorders those to what Metal expects ([c_o][h][w][c_i] order):
public func convertCaffeWeightsToMPS(_ weights: [Float], kernelSize: (width: Int, height: Int), inputChannels: Int, outputChannels: Int, groups: Int) -> [Float] {
    var weightArray: [Float] = Array(repeating: 0.0, count: weights.count)
    var outputIndex = 0
    let groupedInputChannels = inputChannels / groups
    let outputChannelWidth = groupedInputChannels * kernelSize.width * kernelSize.height
    // MPS ordering: [c_o][h][w][c_i]
    for outputChannel in 0..<outputChannels {
        for heightInKernel in 0..<kernelSize.height {
            for widthInKernel in 0..<kernelSize.width {
                for inputChannel in 0..<groupedInputChannels {
                    // Caffe ordering: [c_o][c_i][h][w]
                    let calculatedIndex = outputChannel * outputChannelWidth + inputChannel * kernelSize.width * kernelSize.height + heightInKernel * kernelSize.width + widthInKernel
                    weightArray[outputIndex] = weights[calculatedIndex]
                    outputIndex += 1
                }
            }
        }
    }
    return weightArray
}
Based on my layer visualization, this seems to generate the correct convolution results (matching those produced by Caffe). I believe it also properly takes grouping into account, but I need to verify that.
Tensorflow has a different ordering than Caffe, but you should be able to change the math in the inner part of the loop to account for that.
The documentation here assumes some expertise in C. In that context, a[x][y][z] is typically collapsed into a 1-d array when x, y and z are constants known at compile time. When this happens, the z component varies most quickly, followed by y, followed by x -- outside in.
If we have a[2][2][2], it is collapsed to 1D as:
{ a[0][0][0], a[0][0][1], a[0][1][0], a[0][1][1],
a[1][0][0], a[1][0][1], a[1][1][0], a[1][1][1] }
I think tensorflow already has a convenient method for such a task:
tf.transpose(aWeightTensor, perm=[3, 0, 1, 2])
Full documentation: https://www.tensorflow.org/api_docs/python/tf/transpose

Total sum from a set (logic)

I have a logic problem for an iOS app, but I don't want to solve it using brute force.
I have a set of integers, the values are not unique:
[3,4,1,7,1,2,5,6,3,4........]
How can I get a subset from it with these 3 conditions:
I can only pick a defined number of values.
The sum of the picked elements are equal to a value.
The selection must be random, so if there's more than one solution to the value, it will not always return the same.
Thanks in advance!
This is the subset sum problem, a known NP-complete problem, and thus there is no known efficient (polynomial) solution to it.
However, if you are dealing only with relatively small integers, there is a pseudo-polynomial time solution using dynamic programming.
The idea is to build a matrix bottom-up following these recursive formulas:
D(x, i) = false                              if x < 0
D(0, i) = true
D(x, 0) = false                              if x != 0
D(x, i) = D(x, i-1) OR D(x - arr[i], i-1)    otherwise
The idea is to mimic an exhaustive search - at each point you "guess" if the element is chosen or not.
To get the actual subset, you need to trace back through your matrix. Starting from D(SUM, n) (assuming that value is true), you do the following after the matrix has been filled:
if D(x - arr[i-1], i-1) == true:
    add arr[i-1] to the set
    modify x <- x - arr[i-1]
    modify i <- i - 1
else:  # that means D(x, i-1) must be true
    just modify i <- i - 1
To get a random subset each time: if both D(x - arr[i-1], i-1) == true AND D(x, i-1) == true, choose randomly which course of action to take.
Python code (if you don't know Python, read it as pseudocode; it is very easy to follow):
from random import randint

arr = [1, 2, 4, 5]
n = len(arr)
SUM = 6

# pre-processing:
D = [[True] * (n+1)]
for x in range(1, SUM+1):
    D.append([False] * (n+1))

# DP solution to populate D:
for x in range(1, SUM+1):
    for i in range(1, n+1):
        D[x][i] = D[x][i-1]
        if x >= arr[i-1]:
            D[x][i] = D[x][i] or D[x-arr[i-1]][i-1]
print(D)

# get a random solution:
if D[SUM][n] == False:
    print('no solution')
else:
    sol = []
    x = SUM
    i = n
    while x != 0:
        possibleVals = []
        if D[x][i-1] == True:
            possibleVals.append(x)
        if x >= arr[i-1] and D[x-arr[i-1]][i-1] == True:
            possibleVals.append(x - arr[i-1])
        # by here possibleVals contains 1 or 2 candidate next values for x,
        # depending on how many choices we have; choose one of them randomly
        r = possibleVals[randint(0, len(possibleVals)-1)]
        # if we decided to add element i, record its value
        if r != x:
            sol.append(x - r)
        # modify i and x accordingly
        x = r
        i = i - 1
    print(sol)
P.S.
The above gives you a random choice, but NOT with a uniform distribution over the possible subsets.
To achieve a uniform distribution, you need to count the number of ways each value can be built.
The formulas will be:
D(x, i) = 0                                 if x < 0
D(0, i) = 1
D(x, 0) = 0                                 if x != 0
D(x, i) = D(x, i-1) + D(x - arr[i], i-1)    otherwise
When generating the subset, you follow the same trace-back logic, but you decide to add element i with probability D(x - arr[i], i-1) / D(x, i).
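Since this P.S. has no accompanying code, here is a minimal sketch of the counting variant and the uniform trace-back, written in R (the language used elsewhere on this page); arr and SUM mirror the Python example above:

arr <- c(1, 2, 4, 5)
n <- length(arr)
SUM <- 6

# D[x + 1, i + 1] = number of subsets of arr[1..i] summing to x
D <- matrix(0, nrow = SUM + 1, ncol = n + 1)
D[1, ] <- 1  # D(0, i) = 1 for every i
for (x in 1:SUM) {
  for (i in 1:n) {
    D[x + 1, i + 1] <- D[x + 1, i]  # don't take element i
    if (x >= arr[i]) {
      D[x + 1, i + 1] <- D[x + 1, i + 1] + D[x - arr[i] + 1, i]  # take it
    }
  }
}

if (D[SUM + 1, n + 1] == 0) stop("no solution")

# trace back, taking element i with probability D(x - arr[i], i-1) / D(x, i)
sol <- c()
x <- SUM
for (i in n:1) {
  if (x >= arr[i] && runif(1) < D[x - arr[i] + 1, i] / D[x + 1, i + 1]) {
    sol <- c(sol, arr[i])
    x <- x - arr[i]
  }
}
sol  # a uniformly random subset of arr summing to SUM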
