I was looking at this answer about visualizing gradient boosted tree models in H2O; it says the method for GBM can be applied to XGBoost as well:
Finding contribution by each feature into making particular prediction by h2o ensemble model
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html
But when I try to use the method it describes on an H2O XGBoost MOJO, it fails.
I checked the source code of hex.genmodel.tools.PrintMojo (https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java), and it seems it only works on Random Forest and GBM models, not XGBoost models.
Does anyone know how to visualize the trees in an H2O XGBoost model? Thanks!
This is a feature H2O is currently adding, you can track its progress here: https://0xdata.atlassian.net/browse/PUBDEV-5743.
Note that in the ticket there is a suggestion in the comments on how to visualize the trees with native xgboost.
I finally found the solution. It does not seem to be documented for XGBoost, but it is indeed the same as for other tree-based algorithms.
Just run this command to generate the first 50 trees from your model:
for tn in {1..50}
do
    java -cp h2o-3.24.0.1/h2o.jar hex.genmodel.tools.PrintMojo --tree $tn -i <your mojo model> -o XGBOOST_$tn.gv
    dot -Tpng XGBOOST_$tn.gv -o xgboost_$tn.png
done
I'm looking for metrics to compare various regression models (e.g. SVM, decision tree, neural network, etc.) to decide the merits of each for solving a specific problem.
For my problem I have just over 80,000 training samples with 12 variables, all of which are independent and identically distributed.
I've done most of my research into neural networks but I'm drawing a blank when trying to compare them against other models.
Any input (including reading suggestions) would be greatly appreciated, thanks!
You can compare regression models by calculating the mean squared error for each model over a test set. The best model will simply be the one with the least error.
Sadly, there is nothing like ROC curves for regression models, unless your output is a binary variable, as with logistic regression.
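As a concrete sketch of the comparison above, in plain Python (the predictions here are made up, standing in for the outputs of real models):

```python
# Compare two hypothetical regression models on a held-out test set
# by mean squared error; lower MSE means a better fit on this set.

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_test = [3.0, -0.5, 2.0, 7.0]
preds_model_a = [2.5, 0.0, 2.0, 8.0]   # e.g. an SVM's predictions
preds_model_b = [3.0, -0.4, 2.1, 6.5]  # e.g. a neural network's predictions

mse_a = mean_squared_error(y_test, preds_model_a)
mse_b = mean_squared_error(y_test, preds_model_b)

# The model with the smaller MSE fits this test set better.
best = "model A" if mse_a < mse_b else "model B"
print(mse_a, mse_b, best)
```

With 80,000 samples you would of course hold out a proper test split (or cross-validate) rather than hand-pick four points.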
I would like to do regression on a 13-column data set. The second column depends on the other 12 columns. All columns contain real-number values.
How can I create a neural network using TensorFlow to do the regression? I have tried going through this tutorial but it is too advanced for me.
Thanks in advance for a MWE.
That tutorial uses logistic regression, which is a linear binary classifier; its model is the class tf.contrib.learn.LinearClassifier.
If you use class tf.contrib.learn.LinearRegressor then you can do linear regression instead of classification.
That site has tutorials for other models as well. If you want to create a neural network, there are other tutorials in the left menu, for example:
https://www.tensorflow.org/tutorials/mnist/beginners/
In this repository you have python notebooks with the full code of many different neural networks:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/udacity
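Note that tf.contrib was removed in TensorFlow 2, so tf.contrib.learn no longer exists in current releases. Independent of TensorFlow, the regression described in the question (predicting column 2 from the other 12 columns) can be sketched with plain numpy least squares; the data below is randomly generated so the example is self-contained:

```python
# Closed-form linear regression for the setup described:
# 13 columns, where column 2 (index 1) is predicted from the other 12.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 13))       # stand-in for the real dataset
features = np.delete(data, 1, axis=1)   # the 12 predictor columns

# Make column 1 actually depend on the others, so the fit is meaningful.
true_w = rng.normal(size=12)
data[:, 1] = features @ true_w + 0.01 * rng.normal(size=200)

target = data[:, 1]
w, *_ = np.linalg.lstsq(features, target, rcond=None)
predictions = features @ w
```

A neural network (e.g. with tf.keras) generalizes this by stacking nonlinear layers, but the input/output shape of the problem stays exactly the same: 12 features in, 1 real value out.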
Is there a Java implementation of an SVM classifier based on a Hidden Markov Model?
In other words, I'm looking for a Java implementation of a sequence-based classifier for words with some features in a sentence.
Any help?
Thanks
Mallet is a good package for sequence tagging. You can use Mallet-LibSVM to get Support Vector Machines as well.
I am a complete beginner in the field of machine learning. For a project, I have to use a customized loss function with random forest classification. I have used scikit-learn so far, so suggestions on implementing this through scikit-learn would be most helpful.
Loss functions (Gini impurity and entropy, in the case of classification trees) are implemented in the _tree.pyx Cython file in scikit-learn (they're called criteria in the source). You can start by modifying or adding to these functions. If you add your custom loss function (criterion) to the Cython file, you also need to expose it in the tree.py Python file (look at the CRITERIA_CLF and CRITERIA_REG lists).
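To see what such a criterion computes before diving into the Cython, here is a plain-Python sketch of the two built-in classification criteria evaluated on a node's class labels (an illustration of the math only, not scikit-learn's actual implementation):

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum over classes of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(gini_impurity([0, 0, 1, 1]))  # maximally impure two-class node: 0.5
print(entropy([0, 0, 1, 1]))        # 1.0 bit
print(gini_impurity([1, 1, 1, 1]))  # pure node: 0.0
```

A custom criterion replaces this per-node scoring; the tree builder then chooses splits that most reduce it.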
I would like to load a model I trained before and then update this model with new training data. But I found this task hard to accomplish.
I have learnt from Weka Wiki that
Classifiers implementing the weka.classifiers.UpdateableClassifier interface can be trained incrementally.
However, the regression model I trained is using weka.classifiers.functions.MultilayerPerceptron classifier which does not implement UpdateableClassifier.
Then I checked the Weka API and it turns out that no regression classifier implements UpdateableClassifier.
How can I train a regression model in Weka, and then update the model later with new training data after loading the model?
I have some data mining experience in Weka as well as in scikit-learn and R, and updatable regression models do not exist in Weka or scikit-learn as far as I know. Some R libraries, however, do support updating regression models (take a look at this linear regression update function, for example: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.html), so if you are free to switch data mining tools, this might help you out.
If you need to stick to Weka, then I'm afraid you would probably need to implement such a model yourself, but since I'm not a complete Weka expert, please check with the folks on the Weka mailing list (http://weka.wikispaces.com/Weka+Mailing+List).
The SGD classifier implementation in Weka supports multiple loss functions. Among them are two loss functions meant for linear regression, namely the epsilon-insensitive and Huber loss functions.
Therefore one can train a linear regression with SGD, as long as either of these two loss functions is used to minimize the training error.
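For reference, the two losses can be sketched in plain Python using their standard textbook definitions (Weka's exact implementation may differ in details such as scaling):

```python
def epsilon_insensitive(residual, epsilon=0.1):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(residual) - epsilon)

def huber(residual, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)

print(epsilon_insensitive(0.05))  # inside the tube -> 0.0
print(huber(0.5))                 # quadratic region -> 0.125
print(huber(3.0))                 # linear region -> 2.5
```

Both are less sensitive to outliers than plain squared error, which is why they are the natural choices for SGD-based regression.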