PySpark trained Logistic Regression model has no predict() and predictProbability() functions

I trained a logistic regression model with PySpark MLlib's built-in LogisticRegression class. However, after training, I couldn't use it to predict on other DataFrames: it raises either AttributeError: 'LogisticRegression' object has no attribute 'predictProbability' or AttributeError: 'LogisticRegression' object has no attribute 'predict'.
from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)
# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')
model.fit(df_train)
model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'
Properties:
PySpark version:
>>import pyspark
>>pyspark.__version__
3.1.2
JDK version:
>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
Environment: Google Colab

This line of your code
model.fit(df_train)
does not give you a trained model. fit() returns the trained model as a new object and leaves the estimator unchanged, so the variable model is still an instance of pyspark.ml.classification.LogisticRegression:
type(model)
# pyspark.ml.classification.LogisticRegression
So you should capture the object returned by fit(), either in a new variable or by overwriting your model variable. That gives you the trained logistic regression model, an instance of pyspark.ml.classification.LogisticRegressionModel:
model = model.fit(df_train)
type(model)
# pyspark.ml.classification.LogisticRegressionModel
Finally, .predict and .predictProbability take a single feature vector (a pyspark.ml.linalg vector such as DenseVector), not a DataFrame column like df_val['features']. So I think you want to use .transform instead, since it adds the predicted label and probability as columns to the input DataFrame. It would look like this:
predicted_df = model.transform(df_val)

Related

Decision Tree classifier throws KeyError: 'log_loss'

I am using DecisionTreeClassifier from sklearn, which normally accepts log_loss as a criterion:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(random_state=42, class_weight='balanced', criterion='log_loss')
classifier.fit(X_train, y_train)
Error:
KeyError: 'log_loss'

The log_loss option for the criterion parameter was added in scikit-learn 1.1. The documentation for the latest version, 1.1.2, lists it:
criterion{“gini”, “entropy”, “log_loss”}, default=”gini”
It is not there in either of the two previous ones, version 1.0.2 or version 0.24.2:
criterion{“gini”, “entropy”}, default=”gini”
The error suggests that you are using an older version; you can check your scikit-learn version with
import sklearn
print(sklearn.__version__)
So you will need to upgrade scikit-learn to version 1.1 or later (for example, v1.1.2).
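If upgrading is not possible right away, one option is to pick the criterion from the installed version. The sketch below is only an illustration: the helper name pick_tree_criterion is made up, and falling back to "entropy" assumes that measure is an acceptable stand-in for log_loss in your setup.

```python
import re


def pick_tree_criterion(sklearn_version: str) -> str:
    """Return "log_loss" if this scikit-learn version accepts it, else "entropy".

    Hypothetical helper: only the leading "major.minor" part of the version
    string is inspected, so suffixes such as ".dev0" or "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # criterion="log_loss" is accepted from scikit-learn 1.1 onwards.
    return "log_loss" if (major, minor) >= (1, 1) else "entropy"


for version in ("0.24.2", "1.0.2", "1.1.2"):
    print(version, "->", pick_tree_criterion(version))
```

In real code you would pass sklearn.__version__ to the helper and hand the result to DecisionTreeClassifier(criterion=...).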

In scikit-learn's decision trees, log_loss and entropy are two names for the same split criterion (Shannon information gain), and both work for any number of classes.
So if you cannot upgrade, you can use entropy as the criterion and keep the same impurity measure.

Exception: The passed model is not callable and cannot be analyzed directly with the given masker

I am working on a regression problem and used StackingRegressor to train on the data and then make predictions on the test set. For model explainability, I used SHAP as follows:
import xgboost
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor
import shap
# train a model
X, y = shap.datasets.boston()
stkr = StackingRegressor(
estimators = [('xgbr', xgboost.XGBRegressor()), ('rfr', RandomForestRegressor())],
final_estimator = xgboost.XGBRegressor(),
cv = 3
)
model = stkr.fit(X, y)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.summary_plot(explainer(X), X)
After running this code, I get the following error:
Exception: The passed model is not callable and cannot be analyzed directly with the given masker! Model: StackingRegressor
I have no idea why I get this error, since the same code runs without any problem when I replace StackingRegressor with RandomForestRegressor or XGBRegressor.
Does anyone have any idea?

I had the same issue with a different model. The solution that worked for me was to use shap.KernelExplainer instead of shap.Explainer. KernelExplainer takes a prediction function rather than the model object, so you need to pass model.predict instead of just the model, and you get the SHAP values by calling explainer.shap_values().
So I think this should work:
explainer = shap.KernelExplainer(model.predict, X)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X, plot_type="bar")

Which version of shap are you using?
I just ran into this error and fixed it by upgrading shap from 0.39.0 to 0.40.0.
Not sure whether this helps.

Salvaging a pickled XGBoostClassifier from older major version

TL;DR How to import a pickled XGBoost model from an older major version?
I trained an XGBoost model with version 0.6 using their scikit-learn API, so the classifier is of class xgboost.XGBClassifier. I saved that trained model in the pickle format.
However, I need to move my model to an updated version, XGBoost 1.0.
I've tried following their guide on loading/saving models (https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html), but it seems like the old XGBClassifier model doesn't have any of those methods.
What do I do with this trained xgboost.XGBClassifier object so I can convert it to be loadable in XGBoost 1.0?

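In the old environment (with the old xgboost version), load the pickled model normally, then call save_model on the underlying booster, which is stored in the hidden _Booster attribute:
import pickle as pkl
# 'clf.pkl' is a placeholder for the path to your pickled classifier
with open('clf.pkl', 'rb') as f:
    clf = pkl.load(f)
clf._Booster.save_model('clf.model')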
Then in the updated environment (here: with xgboost==1.0), load the model using the new load_model method:
import xgboost as xgb
clf = xgb.XGBClassifier()
clf.load_model('clf.model')
This relies on the guarantee of backward compatibility of XGBoost models (as opposed to lack thereof for serializations as pickled objects) - see docs.

Where should I pass pre-trained word embeddings in an encoder-decoder architecture?

I have pre-trained word embeddings for two different languages, built with MUSE. Now suppose I have an encoder-decoder architecture, and I have created an embedding layer from one of these embeddings. Where do I pass it into the model?
The model is trying to translate from one language to another. I have created an embedding_layer. Where do I pass it in the code below?
"""
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
batch_size=batch_size,
epochs=epochs,
validation_split=0.2)
"""

Look at the docs of keras: https://keras.io/getting-started/faq/
If you have the whole model saved, you can load the model using the command
keras.models.load_model(filepath)
This is the code example from the Keras docs:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
If you only have the weights, you can use this command:
model.load_weights('my_model_weights.h5')

WEKA 3.7.10 not compatible format, class index differ

I use WEKA for text classification. I have a training set and an untagged test set, and the goal is to classify the test set.
In WEKA 3.6.6 everything goes well: I can select Supplied test set, train the model and get the result.
On the same files, WEKA 3.7.10 says that
Train and test set are not compatible. Would you like to automatically wrap the classifier in "inputMappedClassifier" before proceeding?
And when I press No it outputs the following error message
Problem evaluating classifier: Train and test are not compatible Class index differ: 2 != 0
I understand that the key part is Class index differ: 2 != 0.
But what does it mean? Why does it work in WEKA 3.6.6 but count as incompatible in WEKA 3.7.10?
How can I make the test set compatible with the training set?

When you import the supplied test set, are you selecting the same class attribute as the one that you use in the train set? If you don't change this field, weka selects the last attribute as being the class automatically.
