Error in predict.NaiveBayes : "Not all variable names used in object found in newdata"-- (Although no variables are missing) - machine-learning

I'm still learning to use caret package through The caret Package by Max Kuhn and got stuck in 16.2 Partial Least Squares Discriminant Analysis section while trying to predict using the plsBayesFit model through predict(plsBayesFit, head(testing), type = "prob") as shown in the book as well.
The data used is data(Sonar) from mlbench package, with the data being split as:
inTrain <- createDataPartition(Sonar$Class, p = 2/3, list = FALSE)
sonarTrain <- Sonar[ inTrain, -ncol(Sonar)]
sonarTest <- Sonar[-inTrain, -ncol(Sonar)]
trainClass <- Sonar[ inTrain, "Class"]
testClass <- Sonar[-inTrain, "Class"]
and then preprocessed as follows:
centerScale <- preProcess(sonarTrain)
centerScale
training <- predict(centerScale, sonarTrain)
testing <- predict(centerScale, sonarTest)
after this the model is trained using plsBayesFit <- plsda(training, trainClass, ncomp = 20, probMethod = "Bayes"), followed by predicted using predict(plsBayesFit, head(testing), type = "prob").
When I'm trying to do this I get the following error:
Error in predict.NaiveBayes(object$probModel[[ncomp[i]]], as.data.frame(tmpPred[, : Not all variable names used in object found in newdata
I've checked both the training and testing sets to check for any missing variable but there isn't any. I've also tried to predict using the 2.7.1 version of pls package which was used to render the book at that time but that too is giving me same error. What's happening?

I've tried to replicate your problem using different models, as I have encountered this error as well, but I failed; and caret seems to behave differently now from when I used it.
In any case stumbled upon this Github-issues here, and it seems like that there is a specific problem with the klaR-package. So my guess is that this is simply a bug - and nothing that can be readily fixed here!

Related

Constraints for Direct Collocation method in pydrake

I am looking for a way to describe the constraints of the Direct Collocation method in pydrake.
I got the robot model from my own URDF by using FindResource as this(l11-16).
Then, I tried to make some functions which calculate the positions of the joints as swing_foot_height(q) of this.
However there is a problem.
It is maybe a type error.
I defined q as following
robot = MultibodyPlant(time_step=0.0)
scene_graph = SceneGraph()
robot.RegisterAsSourceForSceneGraph(scene_graph)
file_name = FindResource("models/robot.urdf")
Parser(robot).AddModelFromFile(file_name)
robot.Finalize()
context = robot.CreateDefaultContext()
dircol = DirectCollocation(
robot,
context,
...(Omission)...
input_port_index=robot.get_actuation_input_port().get_index())
x = dircol.state()
nq = biped_robot.num_positions()
q = x[0:nq]
Then, I used this q for the function like swing_foot_height(q).
The error is like
SetPositions(): incompatible function arguments. The following argument types are supported:
...
q: numpy.ndarray[numpy.float64[m, 1]]
...
Invoked with:
...
array([Variable('x(0)', Continuous), ... Variable('x(9)', Continuous)],dtype=object)
Are there some way to avoid this error?
Right. In the compass gait notebook that you cited, there was an important line:
# overwrite MultibodyPlant with its autodiff copy
compass_gait = compass_gait.ToAutoDiffXd()
so that multibody plant that was being used in the constraint is actually an AutoDiffXd version of the plant.
The littledog notebook has more examples of this, with a more robust implementation that works for both float and autodiff constraint evaluations.
As far as I understand this, trajectory optimization with DirectCollocation converts the data type of the decision variables (in your case, x and q) to AutoDiffXd type. That is the type you're seeing here in the "Invoked with" error message. This is the type used for automatic differentiation which is used for finding the gradients for the optimization solver.
You'll need to convert back to float to use the SetPositions() function.

Error when trying to get tidy summaries with broomstick for multiple randomForest models

I'm grouping my data frame and fitting each group's data with a random forest model, and then using broomstick to get tidy outputs for each group's model. I'm running into trouble when I get to tidy and augment.
I can group the data and fit the models.
library(tidyverse)
library(broomstick)
library(randomForest)
data<-data.frame(y=rep(rep(c(1,0),each=100),5),
group=rep(c("A","B","C","D","E"), each=200),
x1=rnorm(2000),
x2=rnorm(2000),
x3=rnorm(2000),
x4=rnorm(2000),
x5=rnorm(2000))
GroupModels<-data%>%
nest(data= -group)%>%
mutate(fit = map(data, ~ randomForest(y ~ ., ntree=101, mtry=2, data = .x, importance=TRUE)))
I then map glance to the fitted models and that works. I get mse and rsq for each group.
GroupModels%>%
mutate(glanced = map(fit, glance))%>%
unnest(glanced)%>%
select(-data, -fit)%>%
as.data.frame()
If I map tidy to the fitted models I get an output and a deprecation warning and I don't understand where tibble::as_tibble() should come into play.
GroupModels%>%
mutate(tidied = map(fit, tidy))%>%
unnest(tidied)%>%
select(-data, -fit)%>%
as.data.frame()
1: Problem with mutate() column tidied. i tidied = map(fit, tidy). i This function is deprecated as of broom 0.7.0 and will be
removed from a future release. Please see tibble::as_tibble().
If I map augment to the models I get an error and I'm not sure what to do with that.
GroupModels%>%
mutate(augmented = map(fit, augment))%>%
unnest(augmented)%>%
select(-data, -fit)%>%
as.data.frame()
Error: Problem with mutate() column augmented. i augmented = map(fit, augment). x argument must be coercible to non-negative
integer
I can now get augment to work using "map2", didn't know about this, but it's handy when you need both the fit and the data for a function. I guess I'll worry about the deprecation warning when it happens.
GroupModels%>%
mutate(augmented = map2(fit, data, augment))%>%
unnest(augmented)%>%
select(-data, -fit)%>%
as.data.frame()

Creating a simple Rcpp package with dependency with other Rcpp package

I am trying to improve my loop computation speed by using foreach, but there is a simple Rcpp function I defined inside of this loop. I saved the Rcpp function as mproduct.cpp, and I call out the function simply using
sourceCpp("mproduct.cpp")
and the Rcpp function is a simple one, which is to perform matrix product in C++:
// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
#include <RcppArmadillo.h>
#include <RcppEigen.h>
// [[Rcpp::export]]
SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
Eigen::MatrixXd C = A * B;
return Rcpp::wrap(C);
}
So, the function in the Rcpp file is MP, referring to matrix product. I need to perform the following foreach loop (I have simplified the code for illustration):
foreach(j=1:n, .package='Rcpp',.noexport= c("mproduct.cpp"),.combine=rbind)%dopar%{
n=1000000
A<-matrix(rnorm(n,1000,1000))
B<-matrix(rnorm(n,1000,1000))
S<-MP(A,B)
return(S)
}
Since the size of matrix A and B are large, it is why I want to use foreach to alleviate the computational cost.
However, the above code does not work, since it provides me error message:
task 1 failed - "NULL value passed as symbol address"
The reason I added .noexport= c("mproduct.cpp") is to follow some suggestions from people who solved similar issues (Can't run Rcpp function in foreach - "NULL value passed as symbol address"). But somehow this does not solve my issue.
So I tried to install my Rcpp function as a library. I used the following code:
Rcpp.package.skeleton('mp',cpp_files = "<my working directory>")
but it returns me a warning message:
The following packages are referenced using Rcpp::depends attributes however are not listed in the Depends, Imports or LinkingTo fields of the package DESCRIPTION file: RcppArmadillo, RcppEigen
so when I tried to install my package using
install.packages("<my working directory>",repos = NULL,type='source')
I got the warning message:
Error in untar2(tarfile, files, list, exdir, restore_times) :
incomplete block on file
In R CMD INSTALL
Warning in install.packages :
installation of package ‘C:/Users/Lenovo/Documents/mproduct.cpp’ had non-zero exit status
So can someone help me out how to solve 1) using foreach with Rcpp function MP, or 2) install the Rcpp file as a package?
Thank you all very much.
The first step would be making sure that you are optimizing the right thing. For me, this would not be the case as this simple benchmark shows:
set.seed(42)
n <- 1000
A<-matrix(rnorm(n*n), n, n)
B<-matrix(rnorm(n*n), n, n)
MP <- Rcpp::cppFunction("SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
Eigen::MatrixXd C = A * B;
return Rcpp::wrap(C);
}", depends = "RcppEigen")
bench::mark(MP(A, B), A %*% B)[1:5]
#> # A tibble: 2 x 5
#> expression min median `itr/sec` mem_alloc
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
#> 1 MP(A, B) 277.8ms 278ms 3.60 7.63MB
#> 2 A %*% B 37.4ms 39ms 22.8 7.63MB
So for me the matrix product via %*% is several times faster than the one via RcppEigen. However, I am using Linux with OpenBLAS for matrix operations while you are on Windows, which often means reference BLAS for matrix operations. It might be that RcppEigen is faster on your system. I am not sure how difficult it is for Windows user to get a faster BLAS implementation (https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters might contain some pointers), but I would suggest spending some time on investigating this.
Now if you come to the conclusion that you do need RcppEigen or RcppArmadillo in your code and want to put that code into a package, you can do the following. Instead of Rcpp::Rcpp.package.skeleton() use RcppEigen::RcppEigen.package.skeleton() or RcppArmadillo::RcppArmadillo.package.skeleton() to create a starting point for a package based on RcppEigen or RcppArmadillo, respectively.

F# Compiler Services incorrectly parses program

UPDATE:
I now realize that the question was stupid, I should have just filed the issue. In hindsight, I don't see why I even asked this question.
The issue is here: https://github.com/fsharp/FSharp.Compiler.Service/issues/544
Original question:
I'm using FSharp Compiler Services for parsing some F# code.
The particular piece of code that I'm facing right now is this:
let f x y = x+y
let g = f 1
let h = (g 2) + 3
This program yields a TAST without the (+) call on the last line. That is, the compiler service returns TAST as if the last line was just let h = g 2.
The question is: is this is a legitimate bug that I ought to report or am I missing something?
Some notes
Here is a repo containing minimal repro (I didn't want to include it in this question, because Compiler Services require quite a bit of dancing around).
Adding more statements after the let h line does not change the outcome.
When compiled to IL (as opposed to parsed with Compiler Services), it seems to work as expected (e.g. see fiddle)
If I make g a value, the program parses correctly.
If I make g a normal function (rather than partially applied one), the program parses correctly.
I have no priori experience with FSharp.Compiler.Services but nevertheless I did a small investigation using Visual Studio's debugger. I analyzed abstract syntax tree of following string:
"""
module X
let f x y = x+y
let g = f 1
let h = (g 2) + 3
"""
I've found out that there's following object inside it:
App (Val (op_Addition,NormalValUse,D:\file.fs (6,32--6,33) IsSynthetic=false),TType_forall ([T1; T2; T3],TType_fun (TType_var T1,TType_fun (...,...))),...,...,...)
As you can see, there's an addition in 6th line between characters 32 and 33.
The most likely explanation why F# Interactive doesn't display it properly is a bug in a library (maybe AST is in an inconsistent state or pretty-printing is broken). I think that you should file a bug in project's issue tracker.
UPDATE:
Aforementioned object can be obtained in a debbuger in a following way:
error.[0]
(option of Microsoft.FSharp.Compiler.SourceCodeServices.FSharpImplementationFileDeclaration.Entity)
.Item2
.[2]
(option of Microsoft.FSharp.Compiler.SourceCodeServices.FSharpImplementationFileDeclaration.MemberOrFunctionOrValue)
.Item3
.f (private member)
.Value
(option of Microsoft.FSharp.Compiler.SourceCodeServices.FSharpExprConvert.ConvExprOnDemand#903)
.expr

IllegalArgumentException when using weka.clusterers.HierarchicalClusterer

I searched a lot, but I was not able to find any example code, which describes how to use the WEKA HierarchicalClusterer. Using the following C#-code gives me an IllegalArgumentException at "agg.buildClusterer(insts);".
weka.clusterers.HierarchicalClusterer agg = new weka.clusterers.HierarchicalClusterer();
agg.setNumClusters(NumCluster);
/*
Tag[] TAGS_LINK_TYPE = agg.getLinkType().getTags();
agg.setLinkType(new SelectedTag(1, TAGS_LINK_TYPE));
*/
agg.buildClusterer(insts);
for (int i = 0; i < insts.numInstances(); i++)
{
int clusterNumber = agg.clusterInstance(insts.instance(i));
}
The StackTrace says:
at java.util.PriorityQueue..ctor(Int32 initialCapacity, Comparator comparator)
at weka.clusterers.HierarchicalClusterer.doLinkClustering(Int32 , Vector[] , Node[] )
at weka.clusterers.HierarchicalClusterer.buildClusterer(Instances data)
but no Message or InnerException is specified.
The varaible "insts" is an Instances-object, which only holds instances with an equal amount of numerical attributes.
Is anyone able to quickly find my error or please post/link some example code?
Further, is the setting of the LinkType (commented code) correct?
Thanks,
Björn
The HierarchicalClusterer class has a TAGS_LINK_TYPE attribute. So like
agg.setLinkType(new SelectedTag(1, HierarchicalClusterer.TAGS_LINK_TYPE));
will achieve what you are after for setting the linking. Now what on earth does that 1 mean? From the javadocs we see what TAGS_LINK_TYPE contains:
-L Link type (Single, Complete, Average, Mean, Centroid, Ward, Adjusted complete, Neighbor Joining)
[SINGLE|COMPLETE|AVERAGE|MEAN|CENTROID|WARD|ADJCOMLPETE|NEIGHBOR_JOINING]
In general, your code looks ok for the C# case. I see you don't set the distance metric in your example above and maybe you would want to do this? I too use Weka as best I can with C# using IKVM. I have found the dataset allowed for hierarchical clustering is not too large. Maybe your dataset exceeds what WEKA can handled and you would avoid your error if you reduced the size of the dataset?

Resources