Correcting for spatial autocorrelation (GLS, nlme package) - nlme

I am trying to account for spatial autocorrelation using the gls function in the nlme package using the following code:
sp2 <- gls(final$formula, data = BITS_B, correlation = corLin(form = ~ Mid.Y + Mid.X, nugget = TRUE))
But I keep getting the error:
Error in getCovariate.corSpatial(object, data = data) : cannot have
zero distances in "corSpatial"
I tried to add year as another group factor (since I had samples from 30 years):
sp2 <- gls(final$formula, data = BITS_B, correlation = corLin(form = ~ Mid.Y + Mid.X | Year, nugget = TRUE))
But the problem persists. Does anyone know what else I cand do? I am not sure I understand the nature of the issue.
Thank you!

Related

Leave one out cross validation RStudio randomForest package error

Creating an LOOCV loop using the randomForest package. I have adapted the following code from this link (https://stats.stackexchange.com/questions/459293/loocv-in-caret-package-randomforest-example-not-unique-results) however I am unable to reproduce a successful code.
Here is the code that I am running but on the iris dataset.
irisdata <- iris[1:150,]
predictionsiris <- 1:150
for (k in 1:150){
set.seed(123)
predictioniris[k] <- predict(randomForest(Petal.Width ~ Sepal.Length, data = irisdata[-k], ntree = 10), newdata = irisdata[k,,drop=F])[2]
}
What I would expect to happen is for it to run the random forest model on all but one row and then use that one row to test the model.
However, when I run this code, I get the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'predict': object 'Sepal.Length' not found
Any suggestions? I have been messing around with LOOCV code for the past two days including messing with code in this page (Compute Random Forest with a leave one ID out cross validation in R) and running the following:
iris %>%
mutate(ID = 1:516)
loocv <- NULL
for(i in iris$ID){
test[[i]] <- slice(iris, i)
train[[i]] <- slice(iris, i+1:516)
rf <- randomForest(Sepal.Length ~., data = train, ntree = 10, importance = TRUE)
loocv[[i]] <- predict(rf, newdata = test)
}
but I have had no success. Any help would be appreciated.

How to only include p-value for certain comparisons using stat_pvalue_manual function?

I am trying to visualise a friedman's test followed by pairwise comparisons using a boxplot with p-values.
Here is an example of how it should look like:
[example graph downloaded from the internet][1]
However, since there are way too many significant comparisons in my case, my graph currently looks like this:
[my graph][2]
[1]: https://i.stack.imgur.com/DO6Vz.png
[2]: https://i.stack.imgur.com/94OXK.png
Here is the code I used to generate the graph with p-value
pwc_IFX_plot <- pwc_IFX %>% add_xy_position(x = "Variant")
ggboxplot(IFX_variant, x = "Variant", y = "Concentration", add = "point") +
stat_pvalue_manual(pwc_IFX_plot, hide.ns = TRUE)+
labs(
subtitle = get_test_label(res.fried_IFX, detailed = TRUE),
caption = get_pwc_label(pwc_IFX)
)+scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
I hope to only show the comparisons of each group to my control group, rather than all the intergroup comparisons.
Thank you for your time.
Any suggestions would be highly appreciated!

Getting an error while converting Tibble to h2o hex file

I am running the h2o package in Rstudio, I am getting an error while converting Tibble into h2o.
Below is my code
#Augment Time Series Signature
PO_Data_aug = PO_Data %>%
tk_augment_timeseries_signature()
PO_Data_aug
# Split into training, validation and test sets
train_tbl = PO_Data_aug %>% filter(Date <= '2017-12-29')
valid_tbl = PO_Data_aug %>% filter(Date>'2017-12-29'& Date <='2018-03-31')
test_tbl = PO_Data_aug %>% filter(Date > '2018-03-31')
str(train_tbl)
train_tbl$month.lbl<-as.character(train_tbl$month.lbl)
h2o.init() # Fire up h2o
##hex
train_h2o = as.h2o(train_tbl)
valid_h2o = as.h2o(valid_tbl)
test_h2o = as.h2o(test_tbl)
ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Parse)
ERROR MESSAGE:
Provided column type ordered is unknown. Cannot proceed with parse due to invalid argument.
Kindly Suggest
This is actually a bug in H2O -- it has nothing to do with tibbles. There is no support for the "ordered" column type in data.frames or tibbles. We will fix this (ticket here).
The work-around right now is to manually convert your "ordered" columns into un-ordered "factor" columns.
tb <- tibble(x = ordered(c(1,2,3)), y = 1:3)
tb$x <- factor(tb$x, ordered = FALSE)
hf <- as.h2o(tb)
as.h2o() expects an R dataframe. You could use an R dataframe instead of your tibble dataframe or as Tom mentioned in the comments you could use one of the supported file formats for H2O.
train_h2o = as.h2o(as_data_frame(train_tbl))
valid_h2o = as.h2o(as_data_frame(valid_tbl))
test_h2o = as.h2o(as_data_frame(test_tbl))

Glm with caret package producing "missing values in resampled performance measures"

I obtained the following code from this Stack Overflow question. caret train() predicts very different then predict.glm()
The following code is producing an error.
I am using caret 6.0-52.
library(car); library(caret); library(e1071)
#data import and preparation
data(Chile)
chile <- na.omit(Chile) #remove "na's"
chile <- chile[chile$vote == "Y" | chile$vote == "N" , ] #only "Y" and "N" required
chile$vote <- factor(chile$vote) #required to remove unwanted levels
chile$income <- factor(chile$income) # treat income as a factor
tc <- trainControl("cv", 2, savePredictions=T, classProbs=TRUE,
summaryFunction=twoClassSummary) #"cv" = cross-validation, 10-fold
fit <- train(chile$vote ~ chile$sex +
chile$education +
chile$statusquo ,
data = chile ,
method = "glm" ,
family = binomial ,
metric = "ROC",
trControl = tc)
Running this code produces the following error.
Something is wrong; all the ROC metric values are missing:
ROC Sens Spec
Min. : NA Min. :0.9354 Min. :0.9187
1st Qu.: NA 1st Qu.:0.9354 1st Qu.:0.9187
Median : NA Median :0.9354 Median :0.9187
Mean :NaN Mean :0.9354 Mean :0.9187
3rd Qu.: NA 3rd Qu.:0.9354 3rd Qu.:0.9187
Max. : NA Max. :0.9354 Max. :0.9187
NA's :1
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Would anyone know what the issue is or can reproduce/ not reproduce this error. I've seen other answers to this error message that says this has to do with not having representation of classes in each cross validation fold but this isn't the issue as the number of folds is set to 2.
Looks like I needed to install and load the pROC package.
install.packages("pROC")
library(pROC)
You should install using
install.packages("caret", dependencies = c("Imports", "Depends", "Suggests"))
That gets most of the default packages. If there are specific modeling packages that are missing, the code usually prompts you to install them.
I know I'm late to the party, but I think you need to set classProbs = TRUE in train control.
You are using logistic regression when using the parameters method = "glm", family = binomial.
In this case, you must make sure that the target variable (chile$vote) has only 2 factor levels, because logistic regression only performs binary classification.
If the target has more than two labels, then you must set family = "multinomial"

Invalid indexing operation error when trying to draw epipolar lines

I'm creating Stereo images processing project modeled on Matlab's examples. A copy pasted code from one of them don't works well.
I1 = rgb2gray(imread('viprectification_deskLeft.png'));
I2 = rgb2gray(imread('viprectification_deskRight.png'));
points1 = detectHarrisFeatures(I1);
points2 = detectHarrisFeatures(I2);
[features1, valid_points1] = extractFeatures(I1, points1);
[features2, valid_points2] = extractFeatures(I2, points2);
indexPairs = matchFeatures(features1, features2);
matchedPoints1 = valid_points1(indexPairs(:, 1),:);
matchedPoints2 = valid_points2(indexPairs(:, 2),:);
figure; showMatchedFeatures(I1, I2, matchedPoints1, matchedPoints2);
load stereoPointPairs
[fLMedS, inliers] = estimateFundamentalMatrix(matchedPoints1,matchedPoints2,'NumTrials',4000);
figure;
subplot(121); imshow(I1);
title('Inliers and Epipolar Lines in First Image'); hold on;
plot(matchedPoints1(inliers,1), matchedPoints1(inliers,2), 'go');
An error:
Error using epilineTest (line 24) Invalid indexing operation.
Best regards
Looks like you have an older version of MATLAB. Try doing this:
[fLMedS, inliers] = estimateFundamentalMatrix(...
matchedPoints1.Location, matchedPoints2.Location,'NumTrials',4000);
Generally, look at the example in your own local MATLAB documentation, rather than the one on the website. The website has the doc for the latest release (currently R2014a), and the examples may be using new features that do not exist in the older versions.

Resources