Salvaging a pickled XGBClassifier from an older major version - machine-learning

TL;DR: How do I import a pickled XGBoost model from an older major version?
I trained an XGBoost model with version 0.6 using its scikit-learn API, so the classifier is of class xgboost.XGBClassifier. I saved that trained model in pickle format.
However, I now need to move the model to an updated environment with XGBoost 1.0.
I've tried following the guide on saving and loading models (https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html), but it seems the old XGBClassifier model doesn't have any of those methods.
What can I do with this trained xgboost.XGBClassifier object to make it loadable in XGBoost 1.0?

In the old environment (with the old xgboost version), load the pickled model normally, then call save_model on the underlying Booster, which old versions of the scikit-learn wrapper expose as the private _Booster attribute:
import pickle as pkl
with open('model.pkl', 'rb') as f:
    clf = pkl.load(f)
clf._Booster.save_model('clf.model')
Then in the updated environment (here: with xgboost==1.0) you load the model using the new load_model method:
import xgboost as xgb
clf = xgb.XGBClassifier()
clf.load_model('clf.model')
This relies on XGBoost's backward-compatibility guarantee for models saved with save_model (as opposed to the lack of any such guarantee for pickled Python objects) - see the docs.

Related

How to load pretrained weights in npz format into tensorflow 1.x model

I am unable to restore FlowNet 2 weights, which are in npz format, into my tensorflow 1.x implementation. I need to use them without tensorlayer.

How to export/save/load the actual AutoKeras "super" model, not the underlying tensorflow model

Is there a way to export/save/load a previously trained autokeras model? I understand I can use the following code to save/load the underlying tensorflow best model:
model = reg.export_model()
model.save(MODEL_FILEPATH, save_format="tf")
best_model = load_model(MODEL_FILEPATH, custom_objects=ak.CUSTOM_OBJECTS)
However, in practice that doesn't work, since my data has been fitted by autokeras, which takes care of data preparation and scaling. I don't think I have access to what autokeras does to the input data (X) before fitting, so I can't use the exported tensorflow best model to predict labels for new samples with unprepared and unscaled features.
Am I missing something major here?
Also I noticed that there are some binaries in the autokeras temporary dir. That dir seems to be generated automatically. Is there a way to use that dir to load the previously-fit autokeras "super" model?
Just using pickle will do the job, as suggested in https://github.com/keras-team/autokeras/issues/1081#issuecomment-645508111:
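A minimal sketch of that suggestion (reg is assumed to be a fitted AutoKeras task object, X_new is a placeholder for future raw samples, and the filename is arbitrary):
import pickle
# Persist the whole AutoKeras object, including its data preparation.
with open('autokeras_model.pkl', 'wb') as f:
    pickle.dump(reg, f)
# Later: load it back and predict on raw, unscaled features.
with open('autokeras_model.pkl', 'rb') as f:
    reg = pickle.load(f)
predictions = reg.predict(X_new)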

Does Scikit-learn support transfer learning?

Does Scikit-learn support transfer learning? Please check the following code.
Model clf is obtained by fit(X, y).
Can model clf2 build on the basis of clf, i.e. transfer-learn via fit(X2, y2)?
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> X, y = ....
>>> clf.fit(X, y)
SVC()
>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.fit(X2,y2)
>>> clf2.predict(X[0:1])
In the context of scikit-learn there's no transfer learning as such; there is incremental learning (also called continuous or online learning).
Looking at your code, what you intend won't work the way you expect. From the scikit-learn documentation:
Calling fit() more than once will overwrite what was learned by any
previous fit()
This means that calling fit() more than once on the same model simply overwrites all the previously fitted coefficients, weights, intercept (bias), etc.
However, if you want to fit a portion of your data set and then improve your model by fitting new data, look for estimators that implement the partial_fit API (see the sketch after the quote below).
If we call partial_fit() multiple times, the framework will update the
existing weights instead of re-initialising them.
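A minimal sketch with SGDClassifier, one of the estimators that implements partial_fit (X, y, X2, y2 stand in for your own batches; the full set of class labels must be passed on the first call):
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
# First batch: declare all class labels up front.
clf.partial_fit(X, y, classes=np.unique(y))
# Later batches: existing weights are updated, not re-initialised.
clf.partial_fit(X2, y2)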
Another way to do incremental learning with scikit-learn is to look for algorithms that support the warm_start parameter.
From this doc:
warm_start: bool, default=False
When set to True, reuse the solution of
the previous call to fit() as initialization, otherwise, just erase the
previous solution. Useless for liblinear solver.
Another example is the random forest regressor (see the sketch below).
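A minimal warm_start sketch with RandomForestRegressor (X, y, X2, y2 are placeholders; increasing n_estimators between calls adds new trees while keeping the old ones):
from sklearn.ensemble import RandomForestRegressor

reg = RandomForestRegressor(n_estimators=100, warm_start=True)
reg.fit(X, y)
# Grow the ensemble: the 100 existing trees are kept and
# 50 new trees are fitted on the new data.
reg.n_estimators += 50
reg.fit(X2, y2)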

Is there a way to save the preprocessing objects in scikit-learn? [duplicate]

This question already has answers here:
Save MinMaxScaler model in sklearn
(5 answers)
Saving StandardScaler() model for use on new datasets
(3 answers)
Closed 1 year ago.
I am building a neural net with the purpose of making predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, then make some predictions, then close the program. In the future, when new data comes in, I have to use the same preprocessing scales to transform it before putting it into the model. Currently, I have to load all of the old data, fit the preprocessor, then transform the new data with it. Is there a way for me to save the preprocessing objects (like sklearn.preprocessing.StandardScaler) so that I can just load the old objects rather than having to remake them?
I think besides pickle, you can also use joblib to do this. As stated in scikit-learn's manual, 3.4. Model persistence:
In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:
from joblib import dump, load
dump(clf, 'filename.joblib')
Later you can load back the pickled model (possibly in another Python process) with:
clf = load('filename.joblib')
Refer to these other posts for more information: Saving StandardScaler() model for use on new datasets and Save MinMaxScaler model in sklearn.
As mentioned by lejlot, you can use the pickle library to save the trained network as a file on your hard drive; then you just need to load it to start making predictions.
Here is an example on how to use pickle to save and load python objects:
import pickle
import numpy as np

npTest_obj = np.asarray([[1, 2, 3], [6, 5, 4], [8, 7, 9]])
strTest_obj = "pickle example XXXX"

if __name__ == "__main__":
    # store object information
    pickle.dump(npTest_obj, open("npObject.p", "wb"))
    pickle.dump(strTest_obj, open("strObject.p", "wb"))
    # read information from file
    str_readObj = pickle.load(open("strObject.p", "rb"))
    np_readObj = pickle.load(open("npObject.p", "rb"))
    print(str_readObj)
    print(np_readObj)
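The same pattern applies directly to the preprocessing objects from the question; a minimal sketch, assuming X_train is your training data and X_new arrives later:
import pickle
from sklearn.preprocessing import StandardScaler

# Fit the scaler once on the training data and persist it.
scaler = StandardScaler().fit(X_train)
pickle.dump(scaler, open("scaler.p", "wb"))

# Later (possibly in another process): reload the fitted scaler
# and transform new data with the original training statistics.
scaler = pickle.load(open("scaler.p", "rb"))
X_new_scaled = scaler.transform(X_new)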

Can I remove layers in a pre-trained Keras model?

I'm importing a pre-trained VGG model in Keras, with
from keras.applications.vgg16 import VGG16
I've noticed that the type of a standard model is keras.models.Sequential, while a pre-trained model is keras.engine.training.Model. For sequential models I usually add and remove layers with add and pop respectively; however, I cannot seem to use pop with pre-trained models.
Is there an alternative to pop for these types of models?
It depends on what you want to remove. If you want to remove the last softmax layer and use the model for transfer learning, you can pass the include_top=False kwarg when building the model, like so:
from keras.applications.vgg16 import VGG16
IN_SHAPE = (256, 256, 3) # image dimensions and RGB channels
pretrained_model = VGG16(
    include_top=False,
    input_shape=IN_SHAPE,
    weights='imagenet'
)
I wrote a blog post on this use case recently that has some code examples and goes into a bit more detail: http://innolitics.com/10x/pretrained-models-with-keras/
If you want to modify the model architecture more than that, you can access the pop() method via pretrained_model.layers.pop(), as explained in the link @indraforyou posted. Note, though, that popping from the layers list does not rewire the underlying graph; a functional-API alternative is sketched below.
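A minimal functional-API sketch of that alternative: rebuild the model so it ends at an intermediate layer (here the second-to-last layer, as an example):
from keras.applications.vgg16 import VGG16
from keras.models import Model

pretrained_model = VGG16(weights='imagenet')
# Re-wire the graph to stop at the second-to-last layer.
truncated = Model(
    inputs=pretrained_model.input,
    outputs=pretrained_model.layers[-2].output
)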
Side note: when you're modifying layers in a pretrained model, it can be especially helpful to visualize the structure and the input/output shapes. pydot and graphviz are particularly useful for this:
import pydot
pydot.find_graphviz = lambda: True  # workaround for pydot versions that fail to detect graphviz
from keras.utils import plot_model
# model and model_name are your model object and a name for the output file
plot_model(model, show_shapes=True, to_file='../model_pdf/{}.pdf'.format(model_name))
