randomForest using caret: shows incorrect mtry value

I am trying out randomForest using the 'caret' package. When I run the basic command without providing any controls, it shows that caret used mtry = 5 in the final model, i.e., it used 5 predictors.
However, my data has 4 predictors. Can anyone explain why it shows mtry = 5?
Here is my code:
library(caret)
data(iris)
set.seed(100)
model.rf = train(Petal.Length~., data=iris, method="rf")
print(model.rf$finalModel)
Call:
randomForest(x = x, y = y, mtry = param$mtry)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 5
Mean of squared residuals: 0.06799251
% Var explained: 97.8

If you do not specify a tuning grid, the model info for method = "rf" will by default use var_seq(p = ncol(x)), where in this case x is the iris dataset. If you run var_seq(ncol(iris)) it returns 2, 3 and 5. These values are used in the default grid search for the mtry parameter. Caret therefore fits 3 random forest models, and the one with the lowest RMSE is chosen as the final model. You can see this by just typing model.rf.
The reason you see 5 has to do with your seed. If you set the seed to 99, the chosen model will have an mtry of 3.
Of course, just because mtry is 5 does not mean that there is suddenly an extra variable to choose from; each split will simply consider all the variables available.
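You can inspect this directly from the fitted object; a minimal sketch, continuing from the code above:
caret::var_seq(p = ncol(iris))   # the default mtry candidates: 2 3 5
model.rf$results                 # resampled RMSE for each candidate mtry
model.rf$bestTune                # the mtry value chosen for the final model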

@phiver, thank you for explaining var_seq. I am afraid it does not provide the full answer to my question. I found out that the following function provides the answer.
predictors(model.rf)
#[1] "Sepal.Length" "Sepal.Width" "Petal.Width"
#[4] "Speciesversicolor" "Speciesvirginica"
We see that caret replaces the categorical predictor 'Species' with 2 dummy variables. This is why we see 5 predictors, even though we have 4 actual predictors for predicting Petal.Length. (I assume that you are familiar with the iris data, so I am not describing its structure.)
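A minimal sketch showing where those dummy columns come from (base R's model.matrix mirrors the expansion that train's formula interface performs):
# the factor 'Species' is expanded into dummy columns before modelling;
# dropping the intercept leaves the 5 predictor columns listed above
colnames(model.matrix(Petal.Length ~ ., data = iris))[-1]
# "Sepal.Length" "Sepal.Width" "Petal.Width" "Speciesversicolor" "Speciesvirginica"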

I think that the mtry value means the number of forests used in your model.

Related

Unable to inverse_transform the value of feature because of different dimensionality

I'm designing a multivariate time series model. I'm feeding 5 features to an LSTM model and trying to predict the output of 1 variable (i.e., a value that depends on itself and the other 4 features).
For that I'm doing the feature scaling as follows:
# Feature scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)
print(training_set_scaled)
At the output of the model, I got predicted values for the single target variable. However, when I tried to inverse transform them as:
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
I got the following error:
non-broadcastable output operand with shape (65,1) doesn't match the broadcast shape (65,5)
Please help. Thank you in advance :)
The problem is that you used sc to min-max scale the five features. Therefore, sc can only be used to inverse transform the scaled version of those five features (the output you showed), which would give you back the original feature values.
The label (model output) is independent of that. You can, but do not necessarily have to, scale your dependent variable, and certainly not with the same scaler object.
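A minimal sketch of one way to arrange that, assuming training_set holds the 5 input features and training_target holds the single target column (both names are placeholders):
from sklearn.preprocessing import MinMaxScaler

# one scaler for the 5 input features
sc_X = MinMaxScaler(feature_range=(0, 1))
X_scaled = sc_X.fit_transform(training_set)                    # shape (n, 5)

# a separate scaler for the single target column
sc_y = MinMaxScaler(feature_range=(0, 1))
y_scaled = sc_y.fit_transform(training_target.reshape(-1, 1))  # shape (n, 1)

# predictions of shape (n, 1) can then be inverted with the target scaler:
# predicted_stock_price = sc_y.inverse_transform(predicted_stock_price)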

keras model output vector of one hot vectors, is it possible? are there any alternatives if not?

So I have an output vector of dim=7 with 4 possible classes for each position. My question is: is it possible to feed the Keras model a vector of one-hot vectors, where each position of the vector is itself a one-hot vector? Something like this: [[1,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1],[0,0,0,1],[0,0,1,0]].
If this is not possible, are there any alternatives?
If you want the output of your model (model = keras.models.Model(...)) to be exactly like that, the answer is that it is not possible, because an output forced to be a hard one-hot vector behaves like a step response ([1,0,0,0] => [0,0,0,0]), which has an infinite gradient at the step and a zero gradient everywhere else.
What people do instead is build a model that gives a distribution over the different classes, select the highest probability as the predicted value, and use a cross-entropy loss to optimize the model. For example, instead of your output [1,0,0,0] you will get something like [0.9,0.01,0.01,0.08]; you can then pick the first entry as the predicted class. This makes sure your model has a gradient at every point.
So if you really want your model to output 7 positions with 4 different values each, you can create an output of size 28 = 7*4 with a sigmoid activation, then take the first 4 outputs as position 1 (e.g. with np.argmax(output[0:4])) and so on.
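A minimal sketch along these lines, with one common variant: reshaping the 28 outputs to (7, 4) and applying a softmax per position instead of a flat sigmoid, so standard categorical cross-entropy can be used (the input size of 16 is just a placeholder):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_positions, n_classes = 7, 4

inputs = keras.Input(shape=(16,))                  # placeholder input size
h = layers.Dense(64, activation="relu")(inputs)
h = layers.Dense(n_positions * n_classes)(h)       # 28 logits
h = layers.Reshape((n_positions, n_classes))(h)    # (batch, 7, 4)
outputs = layers.Softmax(axis=-1)(h)               # a distribution per position
model = keras.Model(inputs, outputs)

# targets are one-hot per position, shape (batch, 7, 4)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# predicted class index per position:
# preds = model.predict(x)              # (batch, 7, 4)
# classes = np.argmax(preds, axis=-1)   # (batch, 7)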

How to handle weighted average for AUC and selecting the right threshold for building the confusion matrix?

I have a binary classification task, where I fit the model using the XGBClassifier and try to predict '1' and '0' on the test set. In this task I have very unbalanced data: the majority class is '0' and the minority class is '1' in the training data (and of course the same in the test set). My data looks like this:
        F1    F2    F3    ...   Target
S1      2     4     5     ...   0
S2      2.3   4.3   6.4   ...   1
...     ...   ...   ...   ...   ...
S4000   3     6     7     ...   0
I used the following code to train the model and calculate the roc value:
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score

my_cls = XGBClassifier()
X = mydata_train.drop(['target'], axis=1)
y = mydata_train['target']
x_tst = mydata_test.drop(['target'], axis=1)
y_tst = mydata_test['target']
my_cls.fit(X, y)
pred = my_cls.predict_proba(x_tst)[:, 1]
auc_score = roc_auc_score(y_tst, pred)
The above code gives me a value for auc_score, but it seems this value is for one class, since it uses my_cls.predict_proba(x_tst)[:,1]; if I change that to my_cls.predict_proba(x_tst)[:,0], it gives me a different AUC value. My first question is: how can I directly get the weighted average AUC? My second question is: how do I select the right cut point for building the confusion matrix given the unbalanced data? By default the classifier uses 50% as the threshold, but since my data is very unbalanced it seems we need to select a more appropriate threshold. I need to count TP and FP, which is why I need this cut point.
If I use class weights to train the model, does that handle the problem (I mean, can I then use the default 50% cut point)? For example, something like this:
My_clss_weight = len(X) / (2 * np.bincount(y))
Then try to fit the model with this:
my_cls.fit(X, y, class_weight=My_clss_weight)
However, the above call does not work with XGBClassifier and gives me an error. It works with LogisticRegression, but I want to use XGBClassifier! Any idea how to handle these issues?
To answer your first question, you can simply pass average='weighted' to the roc_auc_score function.
For example:
roc_auc_score(y_test, pred, average='weighted')
To answer the second half of your question, can you please elaborate a bit? I can help you with that.
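For reference, XGBClassifier does not take a class_weight argument in fit; class imbalance is usually handled through its scale_pos_weight constructor argument instead. A minimal sketch, assuming y holds integer labels 0/1 and reusing the variables from the question:
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score

neg, pos = np.bincount(y)                            # class counts in the training data
my_cls = XGBClassifier(scale_pos_weight=neg / pos)   # up-weight the minority class '1'
my_cls.fit(X, y)

pred = my_cls.predict_proba(x_tst)[:, 1]
print(roc_auc_score(y_tst, pred, average='weighted'))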

Parameter selection and k-fold cross-validation

I have one dataset and need to do cross-validation, for example a 10-fold cross-validation, on the entire dataset. I would like to use a radial basis function (RBF) kernel with parameter selection (there are two parameters to tune: C and gamma). Usually, people select the hyperparameters of an SVM using a dev set, and then apply the best hyperparameters found on the dev set to the test set for evaluation. However, in my case, the original dataset is partitioned into 10 subsets. Sequentially, one subset is tested using the classifier trained on the remaining 9 subsets. Obviously, we do not have fixed training and test data. How should I do hyperparameter selection in this case?
Is your data partitioned into exactly those 10 partitions for a specific reason? If not, you could concatenate/shuffle them together again and then do regular (repeated) cross-validation to perform a parameter grid search. For example, using 10 partitions and 10 repeats gives a total of 100 training and evaluation sets. These are used to train and evaluate every parameter set, so you get 100 results per parameter set you tried. The average performance per parameter set can then be computed from those 100 results.
This process is already built into most ML tools, as in this short example in R using the caret library:
library(caret)
library(lattice)
library(doMC)
registerDoMC(3)

model <- train(x = iris[, 1:4],
               y = iris[, 5],
               method = 'svmRadial',
               preProcess = c('center', 'scale'),
               tuneGrid = expand.grid(C = 3**(-3:3), sigma = 3**(-3:3)),  # all permutations of these parameters get evaluated
               trControl = trainControl(method = 'repeatedcv',
                                        number = 10,
                                        repeats = 10,
                                        returnResamp = 'all',  # store results of all parameter sets on all partitions and repeats
                                        allowParallel = TRUE))
# performance of different parameter set (e.g. average and standard deviation of performance)
print(model$results)
# visualization of the above
levelplot(x = Accuracy~C*sigma, data = model$results, col.regions=gray(100:0/100), scales=list(log=3))
# results of all parameter sets over all partitions and repeats. From this the metrics above get calculated
str(model$resample)
Once you have evaluated a grid of hyperparameters you can choose a reasonable parameter set ("model selection"), e.g. by choosing a well-performing but still reasonably simple model.
BTW: I would recommend repeated cross-validation over plain cross-validation if possible (possibly using more than 10 repeats, but the details depend on your problem); and, as @christian-cerri already recommended, having an additional, unseen test set that is used to estimate the performance of your final model on new data is a good idea.
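A minimal sketch of that last point (holding out a test set before any tuning, then estimating final performance on it once; continuing with caret and the iris data):
set.seed(1)
idx       <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

final_model <- train(Species ~ ., data = train_set,
                     method = 'svmRadial',
                     preProcess = c('center', 'scale'),
                     trControl = trainControl(method = 'repeatedcv',
                                              number = 10, repeats = 10))

# one final, unbiased performance estimate on the unseen test set
confusionMatrix(predict(final_model, test_set), test_set$Species)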

Weka Classification

I was trying to build a classification machine learning model on a data set which has 32 attributes, the last column being the target class. I reduced the number of attributes from 32 to 6, which I felt would be more useful for my classification model.
I tried J48 and some incremental classification algorithms.
I expected an output consisting of the confusion matrix, the correctly and incorrectly classified instances, and the kappa value.
But my result did not give any information on correctly and incorrectly classified instances. It also did not report a confusion matrix or a kappa value. All I received is this:
=== Summary ===
Correlation coefficient 0.9482
Mean absolute error 0.2106
Root mean squared error 0.5673
Relative absolute error 13.4077 %
Root relative squared error 31.9157 %
Total Number of Instances 1461
Can anyone tell me why I did not get the confusion matrix, kappa, and correctly/incorrectly classified instance information?
Unfortunately you didn't post your code or say which version of Weka you are using.
BTW, to calculate the confusion matrix, kappa, etc., you can use the methods of the Evaluation class: http://weka.sourceforge.net/doc.dev/weka/classifiers/Evaluation.html
For example, after you train your model:
classifier.buildClassifier(train);   // train is an Instances object
Evaluation eval = new Evaluation(train);
// evaluate your model with 10-fold cross-validation
eval.crossValidateModel(classifier, train, 10, new Random(1));
System.out.println(classifier);
// print different stats with
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
System.out.println(eval.toClassDetailsString());
