AttributeError: 'StandardScaler' object has no attribute 'var_'

I get this error message when attempting to use the .var_ and .mean_ attributes of StandardScaler from scikit-learn. I saw in another SO post that they are no longer supported in newer versions, so I installed an older version, but that did not work either.

According to the official documentation, the attributes var_ and mean_ are still available in the latest stable version (0.24.1 at the time of this post).
However, to access var_ and mean_, which return the per-feature variance and mean respectively, the scaler must first be fitted to your data; otherwise these attributes are not available. Also make sure the scaler parameters with_mean and with_std are set to True (their defaults).
e.g. :
from sklearn.preprocessing import StandardScaler

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
# scaler.mean_  -> AttributeError: 'StandardScaler' object has no attribute 'mean_'
scaler.fit(data)
scaler.mean_  # array([0.5, 0.5])
scaler.var_   # array([0.25, 0.25])
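If with_mean or with_std is set to False, the corresponding statistic is simply not computed; here is a small sketch of that behaviour, using the same data as above:

from sklearn.preprocessing import StandardScaler

# With with_std=False the variance is not computed, so var_ is None
# even after fitting (likewise, mean_ is None when with_mean=False).
scaler = StandardScaler(with_std=False).fit([[0, 0], [0, 0], [1, 1], [1, 1]])
scaler.var_   # None
scaler.mean_  # array([0.5, 0.5])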

Related

OperatorNotAllowedInGraphError: Iterating over a symbolic `tf.Tensor` is not allowed when using a dataset with tuples

I am trying to create my own transformer with TensorFlow and, of course, I want to train it. For this purpose I use a tf.data.Dataset to handle my data. The data is created with a code snippet from the documentation of the Dataset.from_tensor_slices() method. Nevertheless, TensorFlow gives me the following error when I call the fit() method:
"OperatorNotAllowedInGraphError: Iterating over a symbolic tf.Tensor is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature."
Here is the code that I am using:
import numpy as np
import tensorflow as tf

batched_features = tf.constant([[[1, 3], [2, 3]],
                                [[2, 1], [1, 2]],
                                [[3, 3], [3, 2]]], shape=(3, 2, 2))
batched_labels = tf.constant([['A', 'A'],
                              ['B', 'B'],
                              ['A', 'B']], shape=(3, 2, 1))
dataset = tf.data.Dataset.from_tensor_slices((batched_features, batched_labels))
dataset = dataset.batch(1)

for element in dataset.as_numpy_iterator():
    print(element)

class MyTransformer(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, inputs, training):
        print(type(inputs))
        feature, label = inputs
        return feature

model = MyTransformer()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tf.keras.metrics.FalseNegatives()])
model.fit(dataset, epochs=1)
The code is reduced significantly, just for the purpose of reproducing the issue.
I've tried passing the data as a dictionary instead of a tuple, and a couple more things, but nothing worked. It seems that I am missing something.

How to use tf.nn.crelu in tensorflow?

I am trying different activation functions in my simple neural network.
Whether I use tf.nn.relu, tf.nn.sigmoid, ..., the network does what it should do.
But if I use tf.nn.crelu, I get a dimension error.
It returns something like [max, min], and the width dimension is twice as big.
What do I have to do? Fit the weights and biases of the following layer to the output of crelu?
You're right: if you're building the network manually, you need to adjust the dimensions of the following layer to match the tf.nn.crelu output. In this sense, tf.nn.crelu is not interchangeable with tf.nn.relu, tf.nn.elu, etc.
The situation is simpler if you use a high-level API, e.g. TensorFlow Slim. In this case, the layer functions take care of matching dimensions, so you can easily replace tf.nn.relu with tf.nn.crelu in the code. However, keep in mind that the network silently becomes twice as big.
Here's an example:
with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    activation_fn=tf.nn.crelu,
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training, 'decay': 0.95}):
    conv1 = slim.conv2d(x_image, 16, [5, 5], scope='conv1')
    pool1 = slim.max_pool2d(conv1, [2, 2], scope='pool1')
    conv2 = slim.conv2d(pool1, 32, [5, 5], scope='conv2')
    pool2 = slim.max_pool2d(conv2, [2, 2], scope='pool2')
    flatten = slim.flatten(pool2)
    fc = slim.fully_connected(flatten, 1024, scope='fc1')
    drop = slim.dropout(fc, keep_prob=keep_prob)
    logits = slim.fully_connected(drop, 10, activation_fn=None, scope='logits')
slim.arg_scope simply applies all provided arguments to the underlying layers, in particular activation_fn. Also note activation_fn=None in the last layer to fix the output dimension. Complete code can be found here.
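To make the doubling concrete, here is a minimal shape check (written in TF 2.x eager style, not part of the original answer): crelu concatenates relu(x) and relu(-x) along the last axis, so the output width is twice the input width.

import tensorflow as tf

x = tf.constant([[-2.0, 1.0, 3.0]])  # shape (1, 3)
print(tf.nn.relu(x).shape)   # (1, 3)
print(tf.nn.crelu(x).shape)  # (1, 6): relu(x) and relu(-x), concatenated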

Why does sklearn Imputer need to fit?

I'm really new to this whole machine learning thing and I'm taking an online course on the subject. In this course, the instructors showed the following piece of code:
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
I don't really get why this imputer object needs to fit. I mean, I'm just trying to get rid of missing values in my columns by replacing them with the column mean. From the little I know about programming, this is a pretty simple, iterative procedure, and wouldn't require a model that has to train on data to be accomplished.
Can someone please explain how this imputer thing works and why it requires training to replace some missing values with the column mean?
I have read scikit-learn's documentation, but it just shows how to use the methods, and not why they're required.
Thank you.
The Imputer fills missing values with a statistic (e.g. the mean, the median, ...) of the data.
To avoid data leakage during cross-validation, it computes the statistic on the training data during fit, stores it, and uses it on the test data during transform.
import numpy as np
from sklearn.preprocessing import Imputer

obj = Imputer(strategy='mean')
obj.fit([[1, 2, 3], [2, 3, 4]])
print(obj.statistics_)
# array([1.5, 2.5, 3.5])
X = obj.transform([[4, np.nan, 6], [5, 6, np.nan]])
print(X)
# array([[4. , 2.5, 6. ],
#        [5. , 6. , 3.5]])
You can do both steps in one if your train and test data are identical, using fit_transform.
X = obj.fit_transform([[1, 2, np.nan], [2, 3, 4]])
print(X)
# array([[1., 2., 4.],
#        [2., 3., 4.]])
This data leakage issue is important, since the data distribution may change from the training data to the test data, and you don't want information from the test data to be present during the fit.
See the doc for more information about cross-validation.
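(A side note, not from the original answer: in modern scikit-learn the Imputer class has been removed; sklearn.impute.SimpleImputer is the equivalent, with the same fit/transform behaviour.)

import numpy as np
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy='mean')  # missing_values defaults to np.nan
imp.fit([[1, 2, 3], [2, 3, 4]])
print(imp.statistics_)                # [1.5 2.5 3.5]
print(imp.transform([[4, np.nan, 6], [5, 6, np.nan]]))
# [[4.  2.5 6. ]
#  [5.  6.  3.5]]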

Placeholder tensors require a value in ml-engine predict but not local predict

I've been developing a model for use with the Cloud ML Engine's online prediction service. My model contains a placeholder_with_default tensor that I use to hold a threshold for prediction significance.
threshold = tf.placeholder_with_default(0.01, shape=(), name="threshold")
I've noticed that when using local predict:
gcloud ml-engine local predict --json-instances=data.json --model-dir=/my/model/dir
I don't need to supply values for this tensor. e.g. this is a valid input:
{"features": ["a", "b"], "values": [10, 5]}
However when using online predict:
gcloud ml-engine predict --model my_model --version v1 --json-instances data.json
If I use the above JSON I get an error:
{
    "error": "Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"input size does not match signature\")"
}
However, if I include threshold, then I don't, e.g.:
{"features": ["a", "b"], "values": [10, 5], "threshold": 0.01}
Is there a way to have "threshold" be an optional input?
Thanks
Matthew
Looks like currently it's not possible in CloudML. If you're getting predictions from a JSON file, you need to add the default values explicitly (like you did with "threshold": 0.01).
In Python I'm just dynamically adding the required attributes before doing the API request:
def add_empty_fields(instance):
    placeholder_defaults = {"str_placeholder": "", "float_placeholder": -1.0}
    for ph, default_val in placeholder_defaults.items():
        if ph not in instance:
            instance[ph] = default_val
which would mutate the instance dict that maps placeholder names to placeholder values. For a model with many optional placeholders, this is a bit nicer than manually setting missing placeholder values for each instance.
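A hypothetical usage sketch (the instance reuses the question's example; the placeholder names are the illustrative ones from add_empty_fields above):

# Fill in defaults for every instance before building the request body.
instances = [{"features": ["a", "b"], "values": [10, 5]}]
for instance in instances:
    add_empty_fields(instance)
# Each instance now also carries str_placeholder and float_placeholder keys.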

How to use a ValidationMonitor for an Estimator in TensorFlow 1.0?

TensorFlow provides the possibility of combining ValidationMonitors with several predefined estimators like tf.contrib.learn.DNNClassifier.
But I want to use a ValidationMonitor for my own estimator, which I have created based on [1].
For my own estimator, I first initialize a ValidationMonitor:
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(testX, testY, every_n_steps=50)
estimator = tf.contrib.learn.Estimator(model_fn=model, model_dir=direc,
                                       config=tf.contrib.learn.RunConfig(save_checkpoints_secs=1))
input_fn = tf.contrib.learn.io.numpy_input_fn({"x": x}, y, 4, num_epochs=1000)
Here I pass the monitor, as shown in [2] for tf.contrib.learn.DNNClassifier:
estimator.fit(input_fn=input_fn, steps=1000,monitors=[validation_monitor])
This fails, and the following error is printed:
ValueError: Features are incompatible with given information. Given features: Tensor("input:0", shape=(?, 1), dtype=float64), required signatures: {'x': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(None)]), is_sparse=False)}.
How can I use monitors for my own estimators?
Thanks.
The problem is solved by passing an input_fn containing testX and testY to the ValidationMonitor, instead of passing the tensors testX and testY directly.
For the record, your error was caused by the fact that ValidationMonitor expects x to be a dictionary like { 'feature_name_as_a_string' : feature_tensor }, which your input_fn builds internally through the call to tf.contrib.learn.io.numpy_input_fn(...).
More information about how to build features dictionaries can be found in the Building Input Functions with tf.contrib.learn article of the documentation.
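A minimal sketch of that fix, reusing the names from the question (assuming testX and testY are NumPy arrays):

# Wrap the test data in an input_fn so the monitor receives the same
# {'x': ...} feature dictionary the estimator is trained with.
test_input_fn = tf.contrib.learn.io.numpy_input_fn({"x": testX}, testY, num_epochs=1)
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(input_fn=test_input_fn,
                                                                 every_n_steps=50)
estimator.fit(input_fn=input_fn, steps=1000, monitors=[validation_monitor])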
