Basic random forest model not printing default values after fitting [duplicate] - machine-learning

This question already has an answer here:
Model definition does not give any output
(1 answer)
Closed 1 year ago.
My following model:
from sklearn.ensemble import RandomForestClassifier
forest1 = RandomForestClassifier()
forest1.fit(X_train, y_train)
no longer prints the default values of the hyperparameters after fitting. I ran this kind of command before and it showed all of them. Do you know how I can see them?

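This behavior changed in scikit-learn 0.23, where print_changed_only now defaults to True, so only non-default parameters are shown. You can restore the old print-everything behavior globally with:
import sklearn
sklearn.set_config(print_changed_only=False)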
or temporarily with a context manager:
with sklearn.config_context(print_changed_only=False):
    print(forest1)  # the repr must be produced inside the block to use the temporary setting
or just view the parameters of one estimator with:
forest1.get_params(deep=False)
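A small self-contained check of the three options above, assuming scikit-learn 0.23 or later. The iris data stands in for the question's X_train/y_train (an assumption; any data works). Note that fit() returns the estimator itself, which is why a notebook echoes its repr after the fit call.

```python
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): the question's X_train / y_train are not available here
X_train, y_train = load_iris(return_X_y=True)

forest1 = RandomForestClassifier(random_state=0)
fitted = forest1.fit(X_train, y_train)  # fit() returns the same estimator object

# Default display (scikit-learn >= 0.23): only non-default parameters appear
short_repr = repr(fitted)

# Temporarily show every parameter, defaults included
with sklearn.config_context(print_changed_only=False):
    full_repr = repr(fitted)

# Parameters as a plain dict, independent of the display setting
params = fitted.get_params(deep=False)

print(short_repr)
print(full_repr)
print(params["n_estimators"])
```

get_params is the most robust option if you need the values programmatically, since it does not depend on how the repr is configured.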

Related

Catboost's Incremental training with "init_model" fails when not all initial labels are present in new data

catboost python version: 1.0.6
I am training a CatBoostClassifier on 10 different output classes, which works fine. Then I incrementally train a new classifier, passing the earlier trained model as init_model, on a new training dataset. The catch is that this dataset has only 2 of the original 10 unique labels. CatBoost already warns me with: Found only 2 unique classes in the data, but have defined 10 classes. Probably something is wrong with data.
but starts to train fine anyway. Only at the end (I assume when the model gets merged with the original one?) I get the following error message:
Exception has occurred: CatBoostError
CatBoostError: catboost/libs/model/model.cpp:1716: Approx dimensions don't match: 10 != 2
Is it expected behavior that incremental training is not possible on only a subset of the original classes? If so, a clearer error message would help. It would be even better if the code could handle this case, but maybe I'm overlooking something that prevents such functionality.
A similar issue has been posted on GitHub: https://github.com/catboost/catboost/issues/1953
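A minimal pre-flight check, assuming you still have the set of labels the original model was trained on: before calling fit(..., init_model=...), compare it with the labels in the new data so the mismatch is reported clearly instead of as "Approx dimensions don't match". The helper names are hypothetical, and this only detects the situation; it does not make CatBoost accept a subset of classes.

```python
from typing import Hashable, Iterable, List


def missing_classes(original_classes: Iterable[Hashable],
                    y_new: Iterable[Hashable]) -> List[Hashable]:
    """Return the original class labels that never occur in y_new (hypothetical helper)."""
    present = set(y_new)
    return [c for c in original_classes if c not in present]


def check_label_coverage(original_classes: Iterable[Hashable],
                         y_new: Iterable[Hashable]) -> None:
    """Raise a readable error if y_new covers only part of the original classes."""
    missing = missing_classes(original_classes, y_new)
    if missing:
        raise ValueError(
            f"New training data lacks {len(missing)} of the original classes: {missing}"
        )


# Example mirroring the question: 10 original classes, new data with only 2 of them
original = list(range(10))
y_new = [3, 7, 7, 3, 3]
print(missing_classes(original, y_new))  # [0, 1, 2, 4, 5, 6, 8, 9]
try:
    check_label_coverage(original, y_new)
except ValueError as err:
    print(err)
```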

How to plot two sets of high dimensional data in one visualization plot for comparision? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am trying to compare my generated samples (i.e. MNIST digit images) from a GAN (Generative Adversarial Network).
For my 1st experiment, the GAN training is not successful, so the generated samples are not similar to real MNIST images.
For my 2nd experiment, the GAN training is very successful, so the generated samples should be overlapped well with real MNIST samples in a visualized plot.
The example figure above shows what I hope to achieve:
(1) The first figure shows the original real image distribution
(2) The second figure shows that the results of GAN1 don't overlap well with real data
(3) The third figure shows that the results of GAN2 overlap well with the real data.
Could someone provide some guidance on a good way to plot something like this with Python, and provide some example code?
You can try to use dimensionality reduction methods like PCA, t-SNE, LLE or UMAP to reduce the dimension of your images to 2 and plot the images as you already pointed out.
Here is some example code in python:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
X_real = ...  # real images, e.g. 1000 images flattened to vectors
X_gan = ...   # generated images from the GAN, same shape
X = np.vstack([X_real, X_gan])  # stack the matrices vertically
# For high-dimensional data it is advisable to reduce the dimension first (e.g. to 50) before using t-SNE
X_pca = PCA(n_components=50).fit_transform(X)
X_embedded = TSNE(n_components=2).fit_transform(X_pca)
# Plot the points, labelled by their origin (the first len(X_real) rows are the real images)
n_real = len(X_real)
plt.scatter(X_embedded[:n_real, 0], X_embedded[:n_real, 1], s=5, label="real")
plt.scatter(X_embedded[n_real:, 0], X_embedded[n_real:, 1], s=5, label="GAN")
plt.legend()
plt.show()
Instead of t-SNE you can directly use PCA or one of the other methods mentioned above.
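A runnable sketch of the PCA route with stand-in data, since the real MNIST and GAN samples are not available here: scikit-learn's 8x8 digits play the "real" images, uniform noise plays a failed GAN and slightly perturbed digits play a successful one (all three are assumptions for illustration). As a design variant, PCA is fitted on the real images only and then applied to every set, so all panels share the same axes, which suits the three-panel comparison in the question. This does not work with scikit-learn's TSNE, which has no transform method.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA


def project_with_real_basis(X_real, *generated, n_components=2):
    """Fit PCA on the real samples, then project real and generated sets onto it."""
    pca = PCA(n_components=n_components).fit(X_real)
    return [pca.transform(X) for X in (X_real, *generated)]


# Stand-in data (assumption): 8x8 digits flattened to 64-dim vectors, pixel range 0-16
X_real = load_digits().data
rng = np.random.default_rng(0)
X_bad = rng.uniform(0, 16, size=X_real.shape)               # "GAN1": unrelated noise
X_good = X_real + rng.normal(0, 0.5, size=X_real.shape)     # "GAN2": close to the real data

real_2d, bad_2d, good_2d = project_with_real_basis(X_real, X_bad, X_good)
print(real_2d.shape, bad_2d.shape, good_2d.shape)
```

Each 2-D array can then go to plt.scatter in its own subplot (real alone, real plus GAN1, real plus GAN2) to mimic the figure.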

What is the CV model of LightGBM (lgb.cv) and how exactly do I use it?

I am able to train a LightGBM model using lgb.train, and I can do the same with the CV function (lgb.cv).
However, while I can at least use the model from lgb.train for predictions, I am not sure how to interpret what lgb.cv returns.
Check the official documentation here
Specifically, the returned value is the following:
Returns: eval_hist – Evaluation history. The dictionary has the following format: {'metric1-mean': [values], 'metric1-stdv': [values], 'metric2-mean': [values], 'metric2-stdv': [values], …}.
Return type: dict
A very similar topic is discussed here: Cross-validation in LightGBM
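A common use of that history is choosing the number of boosting rounds and then retraining on all the data with lgb.train. The sketch below assumes the documented key format; key names can differ between LightGBM versions, so print eval_hist.keys() first. The helper best_round and the numbers in the example dict are hypothetical, and the commented lines show the intended workflow rather than executed code.

```python
from typing import Dict, List, Tuple


def best_round(eval_hist: Dict[str, List[float]], metric: str,
               higher_is_better: bool = False) -> Tuple[int, float, float]:
    """Return (num_boost_round, mean, stdv) for the round with the best mean CV score.

    Rounds are counted from 1, matching num_boost_round in lgb.train.
    """
    means = eval_hist[f"{metric}-mean"]
    stdvs = eval_hist[f"{metric}-stdv"]
    pick = max if higher_is_better else min
    idx = pick(range(len(means)), key=means.__getitem__)
    return idx + 1, means[idx], stdvs[idx]


# Intended workflow (not executed here):
#   eval_hist = lgb.cv(params, train_set, num_boost_round=1000, nfold=5)
#   n_rounds, _, _ = best_round(eval_hist, "l2")
#   model = lgb.train(params, train_set, num_boost_round=n_rounds)

# Hypothetical history in the documented format, invented for illustration
eval_hist = {
    "l2-mean": [0.90, 0.55, 0.41, 0.43],
    "l2-stdv": [0.05, 0.04, 0.03, 0.03],
}
print(best_round(eval_hist, "l2"))  # (3, 0.41, 0.03)
```

The model you predict with comes from the final lgb.train call; lgb.cv is used here only to estimate performance and pick the number of rounds.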

Is there a way to save the preprocessing objects in scikit-learn? [duplicate]

This question already has answers here:
Save MinMaxScaler model in sklearn
(5 answers)
Saving StandardScaler() model for use on new datasets
(3 answers)
Closed 1 year ago.
I am building a neural net with the purpose of making predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, make some predictions, and close the program. In the future, when new data comes in, I have to use the same preprocessing scales to transform the new data before putting it into the model. Currently, I have to load all of the old data, fit the preprocessor, and then transform the new data with it. Is there a way to save the preprocessing objects (like sklearn.preprocessing.StandardScaler) so that I can just load the old objects rather than having to remake them?
Besides pickle, you can also use joblib for this. As stated in scikit-learn's manual, section 3.4 (Model persistence):
In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:
from joblib import dump, load
dump(clf, 'filename.joblib')
Later you can load back the pickled model (possibly in another Python process) with:
clf = load('filename.joblib')
Refer to other posts for more information, Saving StandardScaler() model for use on new datasets, Save MinMaxScaler model in sklearn.
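Applied to the question's case, a minimal sketch: fit a StandardScaler once, dump it with joblib, and later load it to transform new data with exactly the same mean and scale. The arrays and the file name are hypothetical; a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.preprocessing import StandardScaler

# Hypothetical "old" training data
X_old = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_old)  # fit once on the old data

path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")  # hypothetical file name
dump(scaler, path)  # persist the fitted scaler (stores mean_, scale_, ...)

# ... later, possibly in another Python process ...
restored = load(path)
X_new = np.array([[4.0, 40.0]])           # hypothetical new data
X_new_scaled = restored.transform(X_new)  # same statistics as the original fit
print(X_new_scaled)
```

The trained network can be saved the same way (or with its framework's own save function), so neither the old data nor a refit is needed at prediction time.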
As mentioned by lejlot, you can use the pickle library to save the trained network to a file on your hard drive; then you just need to load it to start making predictions.
Here is an example on how to use pickle to save and load python objects:
import pickle
import numpy as np
npTest_obj = np.asarray([[1, 2, 3], [6, 5, 4], [8, 7, 9]])
strTest_obj = "pickle example XXXX"
if __name__ == "__main__":
    # store object information ("with" closes each file after writing)
    with open("npObject.p", "wb") as f:
        pickle.dump(npTest_obj, f)
    with open("strObject.p", "wb") as f:
        pickle.dump(strTest_obj, f)
    # read information from file
    with open("strObject.p", "rb") as f:
        str_readObj = pickle.load(f)
    with open("npObject.p", "rb") as f:
        np_readObj = pickle.load(f)
    print(str_readObj)
    print(np_readObj)

How to re-evaluate a model in WEKA?

I am trying to solve a numeric classification problem with numeric attributes in WEKA using linear regression, and then I want to test my model on the existing dataset with "re-evaluate model on current test dataset".
As a result of the evaluation I am getting the summary:
Correlation coefficient 0.9924
Mean absolute error 1.1017
Root mean squared error 1.2445
Total Number of Instances 17
But I don't have results as it is shown here: http://weka.wikispaces.com/Making+predictions
How to bring WEKA to the result I need?
Thank you.
To answer my own question: for a trained and tested model, right-click the model and choose "Visualize classifier errors". There, use the Save option to save the actual and predicted values.
Are you using the command-line interface (CLI) or the GUI?
If you use the CLI, the command given in the link above works fine:
java weka.classifiers.trees.J48 -T unclassified.arff -l j48.model -p 0
So when you train the model, you save it as a *.model file (j48.model, e.g. with -d j48.model), and later load it with -l to evaluate it on the test data (unclassified.arff).
