AttributeError in fit_transform(self, raw_documents, y) - machine-learning

I am a beginner in machine learning. I am using CountVectorizer (from sklearn.feature_extraction.text import CountVectorizer) and it throws an error. I am not sure why.
Here is the screenshot of the error

Related

Exception: The passed model is not callable and cannot be analyzed directly with the given masker

I am working on a regression problem. I trained a StackingRegressor and then made predictions on the test set. For model explainability, I used SHAP as follows:
import xgboost
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor
import shap
# train a model
X, y = shap.datasets.boston()
stkr = StackingRegressor(
    estimators=[('xgbr', xgboost.XGBRegressor()), ('rfr', RandomForestRegressor())],
    final_estimator=xgboost.XGBRegressor(),
    cv=3
)
model = stkr.fit(X, y)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.summary_plot(explainer(X), X)
After running this code, I get the following error:
Exception: The passed model is not callable and cannot be analyzed directly with the given masker! Model: StackingRegressor
I have no idea why I get this error, since the same code runs without any problem when I replace StackingRegressor with RandomForestRegressor or XGBRegressor.
Does anyone have any idea?
I had the same issue with a different model. What worked for me was to use KernelExplainer instead of Explainer, and to pass it the model.predict function rather than the model object itself. The SHAP values then come from the explainer's shap_values() method.
So I think this should work:
explainer = shap.KernelExplainer(model.predict, X)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X, plot_type="bar")
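The "not callable" wording in the error can be checked without shap at all. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic data and the choice of base estimators are hypothetical stand-ins, not the setup from the question. It shows that a fitted StackingRegressor object cannot be called like a function, while its predict method can, which is presumably why passing model.predict gets past the error.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

# Small synthetic regression problem (hypothetical stand-in for the real data).
X_demo, y_demo = make_regression(n_samples=60, n_features=4, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("lin", LinearRegression()),
        ("rfr", RandomForestRegressor(n_estimators=10, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=3,
)
stack.fit(X_demo, y_demo)

# The estimator object defines no __call__, so it cannot be used as f(X).
model_is_callable = callable(stack)
# Its bound predict method is an ordinary callable that maps X to predictions.
predict_is_callable = callable(stack.predict)
predictions = stack.predict(X_demo[:5])

print(model_is_callable, predict_is_callable, predictions.shape)
```

Any function with the same shape as stack.predict (array of samples in, array of predictions out) can be handed to a function-based explainer in the same way.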
Which version of shap are you using?
I just ran into this error and fixed it by upgrading shap from 0.39.0 to 0.40.0.
Not sure if this helps.

scikit-learn.impute isn't being imported from Imputer via Spyder using the code from Machine Learning A-Z tutorial

The code I copied word for word from the Machine Learning A-Z™: Hands-On Python & R In Data Science tutorial course isn't working. I am using Python 3.7 and have installed the scikit-learn package in my environment. I tried looking for a package named sklearn, but nothing turns up. It gives me this error.
I am running my environment through Anaconda.
ImportError: cannot import name 'Imputer' from 'sklearn.preprocessing' (C:\Users\vygan\.conda\envs\env\lib\site-packages\sklearn\preprocessing\__init__.py)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = pd.DataFrame(dataset.iloc[:, :-1].values)
y = pd.DataFrame(dataset.iloc[:, 3].values)
# Taking care of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
It was moved permanently from the preprocessing module to the impute module. You can import it like this:
from sklearn.impute import SimpleImputer
It works much the same way.
If the import doesn't work, uninstall scikit-learn with pip and install it again; it may not have been installed properly the first time.
SimpleImputer no longer has an axis parameter (it always imputes column by column), but you can easily select the columns you want by their pandas DataFrame header, like this:
si = SimpleImputer()
si.fit(dataset[["headername"]])
The strategy parameter lets you choose between "mean", "most_frequent", "median" and "constant".
There is another imputer that I like more:
from sklearn.impute import KNNImputer
It imputes each missing value with the average of its k nearest neighbors.
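To make the nearest-neighbor idea concrete, here is a small sketch, assuming scikit-learn 0.22 or later (where KNNImputer is available) and using a made-up 4x3 array rather than the course dataset. Each NaN is filled with the mean of that column taken over the 2 rows that are closest on the columns both rows have.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data with two missing entries (hypothetical, not from Data.csv).
X_toy = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

knn = KNNImputer(n_neighbors=2)
X_knn = knn.fit_transform(X_toy)

# No NaN should remain, and the shape is unchanged.
print(X_knn)
```

Values that were already present are left untouched; only the NaN cells are replaced.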
A more complete answer:
Imputer (https://sklearn.org/modules/generated/sklearn.preprocessing.Imputer.html)
was deprecated in scikit-learn 0.20 and removed in 0.22, so it can only be imported in older versions.
SimpleImputer was added in 0.20 as its replacement, and this is what you need.
Try to install the latest version:
pip install -U scikit-learn # or using conda
And then use:
from sklearn.impute import SimpleImputer
Source: https://github.com/mindsdb/lightwood/issues/75
Your code works fine for me. Which sklearn version do you have?
import sklearn
sklearn.__version__
'0.21.3'
You can upgrade packages with conda in the following way:
How to upgrade scikit-learn package in anaconda
I faced the same problem because the class was moved from the preprocessing module to the impute module and renamed from Imputer to SimpleImputer.
I changed my code as follows:
import numpy as np
from sklearn.impute import SimpleImputer
# missing_values must be np.nan, not the string 'NaN'; X is assumed to be a NumPy array
simp = SimpleImputer(missing_values=np.nan, strategy='mean')
simp = simp.fit(X[:, 1:3])
X[:, 1:3] = simp.transform(X[:, 1:3])
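For reference, a self-contained sketch of mean imputation with SimpleImputer, assuming scikit-learn 0.20 or later. The small array stands in for the two numeric columns X[:, 1:3]; its values and the names X_num and X_filled are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the numeric columns of the dataset.
X_num = np.array([
    [44.0, 72000.0],
    [27.0, np.nan],
    [np.nan, 54000.0],
    [38.0, 61000.0],
])

# Missing entries are marked with np.nan; each one is replaced by its column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X_num)

# statistics_ holds the per-column fill values learned by fit.
print(imputer.statistics_)
print(X_filled)
```

Because the fill values are computed per column, this matches what the old Imputer did with axis = 0.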

Flatten layer incompatible with input

I am trying to run the code
import data_processing as dp
import numpy as np
test_set = dp.read_data("./data2019-12-01.csv")
import tensorflow as tf
import keras
def train_model():
    autoencoder = keras.Sequential([
        keras.layers.Flatten(input_shape=[400]),
        keras.layers.Dense(150,name='bottleneck'),
        keras.layers.Dense(400,activation='sigmoid')
    ])
    autoencoder.compile(optimizer='adam',loss='mse')
    return autoencoder
trained_model=train_model()
trained_model.load_weights('./weightsfile.h5')
trained_model.evaluate(test_set,test_set)
The test_set on line 3 is a numpy array of shape (3280977, 400). I am using keras 2.1.4 and tensorflow 1.5.
However, this produces the following error:
ValueError: Input 0 is incompatible with layer flatten_1: expected min_ndim=3, found ndim=2
How can I solve it? I tried changing the input_shape in the Flatten layer and searched the internet for possible solutions, but none of them worked. Can anyone help me out here? Thanks
After much trial and error, I was able to run the code. This is the code that runs:
import data_processing as dp
import numpy as np
test_set = np.array(dp.read_data("./datanew.csv"))
print(np.shape(test_set))
import tensorflow as tf
from tensorflow import keras
# import keras
def train_model():
    autoencoder = keras.Sequential([
        keras.layers.Flatten(input_shape=[400]),
        keras.layers.Dense(150,name='bottleneck'),
        keras.layers.Dense(400,activation='sigmoid')
    ])
    autoencoder.compile(optimizer='adam',loss='mse')
    return autoencoder
trained_model=train_model()
trained_model.load_weights('./weightsfile.h5')
trained_model.evaluate(test_set,test_set)
The change I made is I replaced
import keras
with
from tensorflow import keras
This may also work for others who are using old versions of tensorflow and keras. I used tensorflow 1.5 and keras 2.1.4 in my code.
Keras and TensorFlow expect batched input for prediction, so you must add the batch dimension yourself.
For example, if a single sample has shape (M x N), the tensor you feed at prediction time needs shape (K x M x N), where K is the batch size.
Adding the batch axis is easy with numpy: np.expand_dims(x, axis=0) turns an input x of shape M x N into shape 1 x M x N. That is why you get the error: the missing leading '1' (or 'K') is the batch dimension.
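A short sketch of adding the batch axis with numpy, using a made-up sample with M = 3 and N = 4. The variable names are illustrative only.

```python
import numpy as np

# One sample of shape (M, N) = (3, 4).
sample = np.zeros((3, 4))

# Insert a new leading axis so the array reads as a batch of one sample.
batched = np.expand_dims(sample, axis=0)

# Indexing with np.newaxis is an equivalent spelling.
batched_alt = sample[np.newaxis, ...]

print(sample.shape, batched.shape, batched_alt.shape)
```

The data itself is unchanged; only the shape gains a leading dimension of size 1.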

AttributeError occurs while running ResNet50 using Keras on Pycharm

I was using PyCharm and imported ResNet50 to run image recognition on a sample image. When I ran the code, the following error occurred.
I was following an online course whose code has to be completed by the learner. I configured PyCharm and installed the recommended packages. While running the ResNet50 image recognition exercise, I ended up with the error below. Should I install ResNet50 manually in PyCharm for this to work? The instructor said ResNet50 would be downloaded automatically when the code runs. The Python code is below.
import numpy as np
from keras.preprocessing import image
from keras.applications import resnet50
model = resnet50.ResNet50
img = image.load_img("bay.jpg", target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = resnet50.preprocess_input(x)
predictions = model.predict(x)
predicted_classes = resnet50.decode_predictions(predictions, top=9)
print("This is an image of:")
for imagenet_id, name, likelihood in predicted_classes[0]:
    print(" - {}: {:2f} likelihood".format(name, likelihood))
This is the resultant error that I am getting during execution.
File "/home/warlock/Downloads/Ex_Files_Building_Deep_Learning_Apps/
Exercise Files/05/image_recognition.py", line 21, in <module>
predictions = model.predict(x)
AttributeError: 'function' object has no attribute 'predict'
You get this error because ResNet50 is a function that builds the model, so you need to call it:
model = resnet50.ResNet50()
This gives you a ResNet50 model with all default parameters.
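The same mistake can be reproduced in plain Python without Keras. In this sketch, TinyModel and build_model are hypothetical stand-ins for the Keras model and the ResNet50 builder; the point is only the difference between referring to a function and calling it.

```python
class TinyModel:
    """Hypothetical stand-in for a model object with a predict method."""

    def predict(self, x):
        return [value * 2 for value in x]


def build_model():
    """Hypothetical stand-in for a builder such as resnet50.ResNet50."""
    return TinyModel()


# Missing parentheses: model is the builder function itself, not a model.
model = build_model
try:
    model.predict([1, 2])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# With parentheses the builder runs and returns an object that has predict.
model = build_model()
predictions = model.predict([1, 2])

print(error_message)
print(predictions)
```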

How to convert ML VectorUDT features from .mllib to .ml type

Using the PySpark ML API in version 2.0.0 for a simple linear regression example, I get an error with the new ML library.
The code is:
from pyspark.sql import SQLContext
sqlContext =SQLContext(sc)
from pyspark.mllib.linalg import Vectors
data=sc.parallelize(([1,2],[2,4],[3,6],[4,8]))
def f2Lp(inStr):
    return (float(inStr[0]), Vectors.dense(inStr[1]))
Lp = data.map(f2Lp)
testDF=sqlContext.createDataFrame(Lp,["label","features"])
(trainingData, testData) = testDF.randomSplit([0.8,0.2])
from pyspark.ml.regression import LinearRegression
lr=LinearRegression()
model=lr.fit(trainingData)
and error:
IllegalArgumentException: u'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT#3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT#f71b0bce.'
How should I transform vector features from the .mllib type to the .ml type?
From Spark 2.0, use
from pyspark.ml.linalg import Vectors, VectorUDT
instead of
from pyspark.mllib.linalg import Vectors, VectorUDT
Method 1
This problem can be solved by using the right imports, that is:
from pyspark.ml.linalg import Vectors, VectorUDT
Another way is to use a mapper function that converts each mllib vector to a string and then parses it back into an ml vector.
