Log-Likelihood for Random Forest models - random-forest

I'm trying to compare multiple species distribution modeling approaches via k-fold cross-validation. Currently I'm calculating the RMSE and AUC to compare model performance. A friend suggested also using the sum of log-likelihoods as a metric to compare models. However, one of the models is a random forest fitted with the ranger package. Is it actually possible to calculate the log-likelihood for a random forest model, and if so, how? Would it be a comparable metric to use alongside the other models (GAM, GLM)?
Thanks for your help.

Related

Number of Trees in Random Forest Regression

I am learning the Random Forest Regression model. I know that it builds many trees (models) and that we predict the target variable by averaging the results of all trees. I also have a decent understanding of the Decision Tree Regression algorithm. How do we choose the best number of trees?
For example, I have a dataset where I am predicting a person's salary and I have only two input variables, 'Years of Experience' and 'Performance Score'. How many random trees can I build from such a dataset? Does the number of trees in a random forest depend on the number of input variables? Any good example will be highly appreciated.
Thanks in Advance
A single decision tree is trained on the entire dataset, so only one model is created. In a random forest, multiple decision trees are created, and each tree is trained on a subset of the data, limiting both the rows and the features it sees. In your case you only have two features, so each tree will be trained on a subset of the rows using one or both of those features.
You can create any number of trees for your data. Usually in a random forest, more trees result in better performance, up to the point where it levels off, but also in more computation time. Experiment with your data and see how performance changes with different numbers of trees. If performance stays the same, use fewer trees for faster computation. You can use grid search for this.
Also, you can experiment with other ML models such as linear regression, which might perform well in your case.

How to use Genetic Algorithm to find weight of voting classifier in WEKA?

I am working from this article: "A novel method for predicting kidney stone type using ensemble learning". The authors used a genetic algorithm to find the optimal weight vector for voting with WEKA, but I don't see how they did that. How can I use a genetic algorithm to find the weights of a voting classifier with WEKA?
The paragraph below is extracted from the article:
In order to enhance the performance of the voting algorithm, a weighted
majority vote is used. Simple majority vote algorithm is usually an
effective way to combine different classifiers, but not all
classifiers have the same effect on the classification problem. To
optimize the results from weight majority vote classifier, we need to
find the optimal weight vector. Applying Genetic algorithms is our
solution for finding the optimal weight vector in this problem.
Assuming you have some trained classifiers and a test set, you can create a method calculateFitness(double[] weights). In this method, for each Instance, calculate every classifier's prediction and a merged prediction according to the weights. Use the merged predictions and the real values to calculate the total score you want to maximize or minimize.
Using the calculateFitness method, you can write a custom GA to find the best weights.
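The idea above can be sketched in a few lines. This is only an illustration in plain Python rather than WEKA/Java: the vote data, the function names other than the fitness idea itself, and all GA settings (population size, truncation selection, uniform crossover, Gaussian mutation) are assumptions made for the example, not the method used in the article. In practice the votes would come from your trained WEKA classifiers.

```python
import random

def weighted_vote(weights, votes, n_classes):
    """Merge one instance's hard votes (one label per classifier) by weight."""
    totals = [0.0] * n_classes
    for w, label in zip(weights, votes):
        totals[label] += w
    return max(range(n_classes), key=totals.__getitem__)

def calculate_fitness(weights, all_votes, truth, n_classes):
    """Accuracy of the weighted vote over all instances (to be maximized)."""
    hits = sum(weighted_vote(weights, v, n_classes) == y
               for v, y in zip(all_votes, truth))
    return hits / len(truth)

def genetic_search(all_votes, truth, n_classes, pop_size=20,
                   generations=30, mutation_scale=0.1, seed=0):
    rng = random.Random(seed)
    n = len(all_votes[0])
    fitness = lambda w: calculate_fitness(w, all_votes, truth, n_classes)
    # Start from the uniform weight vector plus random candidates in [0, 1].
    population = [[1.0] * n] + [[rng.random() for _ in range(n)]
                                for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, mutation_scale)))
                     for g in child]  # Gaussian mutation, clipped to [0, 1]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy data (made up): classifier 0 is always right, classifiers 1 and 2
# are both wrong on the first four instances.
truth = [0, 1, 0, 1, 1, 0, 1, 0]
all_votes = [(y, 1 - y, 1 - y) for y in truth[:4]] + [(y, y, y) for y in truth[4:]]

baseline = calculate_fitness([1.0, 1.0, 1.0], all_votes, truth, 2)
best = genetic_search(all_votes, truth, 2)
best_score = calculate_fitness(best, all_votes, truth, 2)
```

Because the better half of each generation is carried over unchanged and the uniform weights are in the starting population, the result can never score below the simple majority vote on the data used for fitness. To avoid an optimistic estimate, the weights should be tuned on data that is separate from the final test set.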

k-fold cross validation model selection method

I want to know how to select a model with the k-fold cross-validation method. In k-fold cross-validation we get k models and an accuracy score that is the average of the k models' accuracies. Can you please provide a method for getting the final best model from cross-validation?
K-fold cross-validation is for comparing the performance of models, not for building them. Say we designed two seq2seq generative models with different structures, our dataset is small, and we want to choose one of them. We can run k-fold cross-validation for each model, get an average score for each, and choose the one with the better score.
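A minimal sketch of that comparison, using only the Python standard library. The two candidate models here (a constant mean predictor and a least-squares line) and the toy data are assumptions chosen to keep the example short; they stand in for whatever models you are actually comparing.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: one-variable least-squares regression line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cv_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean((model(xs[i]) - ys[i]) ** 2 for i in held_out))
    return mean(scores)

# Toy data (made up): a linear trend with a small alternating offset.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

mse_mean = cv_mse(fit_mean, xs, ys)
mse_line = cv_mse(fit_line, xs, ys)
best = "line" if mse_line < mse_mean else "mean"

# One common option afterwards: refit the winning approach on all the data.
final_model = fit_line(xs, ys) if best == "line" else fit_mean(xs, ys)
```

Both candidates are scored on the same folds (same seed), so the difference in average error reflects the models rather than the split.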
We don't need to choose one of the k models; instead we can combine the k models into a single ensemble using bagging (one of the standard ensemble methods). For more information, please refer to this blog post: Bagging and Random Forest Ensemble Algorithms for Machine Learning.
Reference:
1. https://stats.stackexchange.com/a/52277/103153
2. https://stats.stackexchange.com/a/19053/103153

Can anyone give me some pointers for using SVM for user recognition using keystroke timing?

I am trying to perform user identification using keystroke dynamics. The data consists of the timing of individual keystrokes. I am using an SVM for binary classification. How can I train this for multiple users?
I have keystroke timings for many users. For example, for the word “hello”: h->16seg, e->10, l->30, o->20. So I do not have class labels (+1 positive, -1 negative).
An SVM is a binary classifier. However, SVMs do give you a confidence score (a function of the distance from the separating hyperplane). You can use this information in one of two popular ways to convert a binary classifier into a multiclass classifier: One-vs-All and One-vs-One.
See this article on how to use SVMs in a multiclass setting.
For example, in the One vs. All setting, for each class you separate the training data into samples that belong to that class and samples that belong to any other class. Then you fit an SVM on that data. At the end of the day you have k classifiers if you have k classes. Then you run your test data through all k classifiers and return the class with the highest probability (confidence score).
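The One-vs-All procedure can be sketched as follows. To stay self-contained, the binary learner here is a tiny linear SVM trained by stochastic subgradient descent on the hinge loss; it is a stand-in for a real SVM library, and the user names, the two standardized timing features, and the hyperparameters are all made up for illustration.

```python
import random

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (labels in {-1, +1}) via SGD on the regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage step
            if margin < 1:  # sample violates the margin: hinge subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def decision_score(model, x):
    """Signed score, proportional to the distance from the hyperplane."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_one_vs_all(X, labels):
    """Fit one binary SVM per class: that class (+1) against all others (-1)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if label == cls else -1 for label in labels]
        models[cls] = train_linear_svm(X, y)
    return models

def predict_one_vs_all(models, x):
    """Return the class whose SVM gives the highest confidence score."""
    return max(models, key=lambda cls: decision_score(models[cls], x))

# Toy data (made up): two standardized timing features per typing sample.
centers = {"alice": (0.0, 4.0), "bob": (-4.0, -2.0), "carol": (4.0, -2.0)}
X, labels = [], []
for user, (cx, cy) in centers.items():
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            X.append([cx + dx, cy + dy])
            labels.append(user)

models = train_one_vs_all(X, labels)
predictions = {user: predict_one_vs_all(models, list(c)) for user, c in centers.items()}
```

With k users this trains k binary classifiers, each using every sample, so each user's samples serve as the positive class once and as negatives for everyone else.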

Late fusion step of classification using libLinear

I am working on a classification task that uses libLinear as the underlying classifier.
I have trained two models on two types of feature sets to make predictions for a query input.
I want to use late fusion to combine the results from the two models, so I changed the liblinear code to return the decision scores for the different classes. This gives two sets of scores for deciding which class the query belongs to.
Is there a standard way to do this "late fusion", or should I simply add the two scores for each class and choose the class with the highest total?
The standard way to combine multiple classifiers would be a weighted sum of the scores of the individual classifiers. Of course, you then have the problem of specifying the weight coefficients. There are different possibilities:
1. Set the weights uniformly.
2. Set the weights proportional to the performance of each classifier.
3. Train a new classifier that takes the scores as input.
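The first two weighting options can be sketched as below. The score values and accuracies are made-up numbers, and the function names are hypothetical; the sketch also assumes the two models' scores are on comparable scales, which may require normalizing them first. The third option (training a classifier on the scores) is not shown.

```python
def fuse_scores(score_sets, weights=None):
    """Weighted sum of per-class scores from several models.

    score_sets: one list of per-class scores per model, all in the same class order.
    weights: one weight per model; defaults to uniform weights.
    Returns the fused per-class scores and the index of the winning class.
    """
    n_models = len(score_sets)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(score_sets[0])
    fused = [sum(w * scores[c] for w, scores in zip(weights, score_sets))
             for c in range(n_classes)]
    return fused, max(range(n_classes), key=fused.__getitem__)

def performance_weights(accuracies):
    """Weights proportional to each model's validation accuracy."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

# Made-up decision scores for one query over three classes.
scores_a = [0.2, 1.1, -0.4]  # model trained on feature set A
scores_b = [0.9, 0.3, -0.1]  # model trained on feature set B

# Option 1: uniform weights (equivalent to adding the scores).
fused_uniform, label_uniform = fuse_scores([scores_a, scores_b])

# Option 2: weights proportional to assumed validation accuracies.
weights = performance_weights([0.5, 0.9])
fused_weighted, label_weighted = fuse_scores([scores_a, scores_b], weights)
```

In this toy case the two options disagree: uniform weighting picks class 1, while trusting the more accurate model more shifts the decision to class 0.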
