OperatorNotAllowedInGraphError: Iterating over a symbolic `tf.Tensor` is not allowed when using a dataset with tuples - machine-learning

I am trying to create my own transformer with tensorflow and of course I want to train it. For the purpuse I use dataset to handle my data. The data is created by a code snippet from the tensorflow dataset.from_tensor_slices() method documentation article. Nevertheless, tensorflow is giving me the following error when I call the fit() method:
"OperatorNotAllowedInGraphError: Iterating over a symbolic tf.Tensor is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature."
Here is the code that I am using:
import numpy as np
import tensorflow as tf
batched_features = tf.constant([[[1, 3], [2, 3]],
[[2, 1], [1, 2]],
[[3, 3], [3, 2]]], shape=(3, 2, 2))
batched_labels = tf.constant([['A', 'A'],
['B', 'B'],
['A', 'B']], shape=(3, 2, 1))
dataset = tf.data.Dataset.from_tensor_slices((batched_features, batched_labels))
dataset = dataset.batch(1)
for element in dataset.as_numpy_iterator():
print(element)
class MyTransformer(tf.keras.Model):
def __init__(self):
super().__init__()
def call(self, inputs, training):
print(type(inputs))
feature, lable = inputs
return feature
model = MyTransformer()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
loss=tf.keras.losses.BinaryCrossentropy(),
metrics=[tf.keras.metrics.BinaryAccuracy(),
tf.keras.metrics.FalseNegatives()])
model.fit(train_data, batch_size = 1, epochs = 1)
The code is reduced significantly just for the purpuse of reproducing the issue.
I've tried passing the data as dictionary instead of tuple and couple more things but nothing worked. It seems that I am missing something.

Related

Converting keras code to pytorch code with Conv1D layer

I have some keras code that I need to convert to Pytorch. I've done some research but so far I am not able to reproduce the results I got from keras. I have spent many hours on this any tips or help is very appreciated.
Here is the keras code I am dealing with. The input shape is (None, 105, 768) where None is the batch size and I want to apply Conv1D to the input. The desire output in keras is (None, 105)
x = tf.keras.layers.Dropout(0.2)(input)
x = tf.keras.layers.Conv1D(1,1)(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Activation('softmax')(x)
What I've tried, but worse in term of results:
self.conv1d = nn.Conv1d(768, 1, 1)
self.dropout = nn.Dropout(0.2)
self.softmax = nn.Softmax()
def forward(self, input):
x = self.dropout(input)
x = x.view(x.shape[0],x.shape[2],x.shape[1])
x = self.conv1d(x)
x = torch.squeeze(x, 1)
x = self.softmax(x)
The culprit is your attempt to swap the dimensions of the input around, since Keras and PyTorch have different conventions for the dimension order.
x = x.view(x.shape[0],x.shape[2],x.shape[1])
.view() does not swap the dimensions, but changes which part of the data is part of a given dimension. You can consider it as a 1D array, then you decide how many steps you take to cover the dimension. An example makes it much simpler to understand.
# Let's start with a 1D tensor
# That's how the underlying data looks in memory.
x = torch.arange(6)
# => tensor([0, 1, 2, 3, 4, 5])
# How the tensor looks when using Keras' convention (expected input)
keras_version = x.view(2, 3)
# => tensor([[0, 1, 2],
# [3, 4, 5]])
# Vertical isn't swapped with horizontal, but the data is arranged differently
# The numbers are still incrementing from left to right
incorrect_pytorch_version = keras_version.view(3, 2)
# => tensor([[0, 1],
# [2, 3],
# [4, 5]])
To swap the dimensions you need to use torch.transpose.
correct_pytorch_version = keras_version.transpose(0, 1)
# => tensor([[0, 3],
# [1, 4],
# [2, 5]])

How should I split my data for cross validation and grid search?

Should i split my data in to two parts similar in size to use each half for eaxh tasks or i should do grid search on my whole data and then just do cross validation again on my whole data to check my accuracy ?
You need to split the data into test and train (20:80) (eg. test_train_split in sklearn), then run the model with the train data and check the accuracy. If its not what you expect, then you can try applying Hyper parameter Tuning.
You can do this by GridSearchCV, where you need to fit the desired estimator (depending on the type of problem ) and the parameter values.
Attached a sample code :
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
'bootstrap': [True],
'max_depth': [50, 55, 60, 65],
'max_features': ["auto","sqrt", 2, 3],
'min_samples_leaf': [1, 2, 3],
'min_samples_split': [2, 3, 4],
'n_estimators': [60, 65, 70, 75]
}
grid_search = GridSearchCV(estimator = rfcv, param_grid = param_grid, cv = 3, n_jobs = -1, verbose = 2)
grid_search.fit(X_train, Y_train)
grid_search.best_params_
Based the best parameter results, you can fine tune the grid search.
Eg, if best parameter value is near 60 for n_estimators then you need to change the values as surrounding to 60 like [50,55,60,60]. To figure out the exact value.
Then build the machine learning model based on the best parameters value. Evaluate the train data accuracy and then predict the result using test data values.
rf = rgf(n_estimators = 70, random_state=0, min_samples_split = 2, min_samples_leaf=1, max_features = 'sqrt',bootstrap='True', max_depth=65)
regressor = rf.fit(X_train,Y_train)
pred_tuned = regressor.predict(X_test)
You can find an improvement in your accuracy !!

Why does sklearn Imputer need to fit?

I'm really new in this whole machine learning thing and I'm taking an online course on this subject. In this course, the instructors showed the following piece of code:
imputer = Inputer(missing_values = 'Nan', strategy = 'mean', axis=0)
imputer = Imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
I don't really get why this imputer object needs to fit. I mean, I´m just trying to get rid of missing values in my columns by replacing them with the column mean. From the little I know about programming, this is a pretty simple, iterative procedure, and wouldn´t require a model that has to train on data to be accomplished.
Can someone please explain how this imputer thing works and why it requires training to replace some missing values by the column mean?
I have read sci-kit's documentation, but it just shows how to use the methods, and not why they´re required.
Thank you.
The Imputer fills missing values with some statistics (e.g. mean, median, ...) of the data.
To avoid data leakage during cross-validation, it computes the statistic on the train data during the fit, stores it and uses it on the test data, during the transform.
from sklearn.preprocessing import Imputer
obj = Imputer(strategy='mean')
obj.fit([[1, 2, 3], [2, 3, 4]])
print(obj.statistics_)
# array([ 1.5, 2.5, 3.5])
X = obj.transform([[4, np.nan, 6], [5, 6, np.nan]])
print(X)
# array([[ 4. , 2.5, 6. ],
# [ 5. , 6. , 3.5]])
You can do both steps in one if your train and test data are identical, using fit_transform.
X = obj.fit_transform([[1, 2, np.nan], [2, 3, 4]])
print(X)
# array([[ 1. , 2. , 4. ],
# [ 2. , 3. , 4. ]])
This data leakage issue is important, since the data distribution may change from the training data to the testing data, and you don't want the information of the testing data to be already present during the fit.
See the doc for more information about cross-validation.

How would you do RandomizedSearchCV with VotingClassifier for Sklearn?

I'm trying to tune my voting classifier. I wanted to use randomized search in Sklearn. However how could you set parameter lists for my voting classifier since I currently use two algorithms (different tree algorithms)?
Do I have to separately run randomized search and combine them together in voting classifier later?
Could someone help? Code examples would be highly appreciated :)
Thanks!
You can perfectly combine both, the VotingClassifier with RandomizedSearchCV. No need to run them separately. See the documentation: http://scikit-learn.org/stable/modules/ensemble.html#using-the-votingclassifier-with-gridsearch
The trick is to prefix your params list with your estimator name. For example, if you have created a RandomForest estimator and you created it as ('rf',clf2) then you can set up its parameters in the form <name__param>. Specific example: rf__n_estimators: [20,200], so you refer to a specific estimator and set values to test for a specific param.
Ready to test executable code example ;)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.grid_search import RandomizedSearchCV
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf1 = DecisionTreeClassifier()
clf2 = RandomForestClassifier(random_state=1)
params = {'dt__max_depth': [5, 10], 'rf__n_estimators': [20, 200],}
eclf = VotingClassifier(estimators=[('dt', clf1), ('rf', clf2)], voting='hard')
random_search = RandomizedSearchCV(eclf, param_distributions=params,n_iter=4)
random_search.fit(X, y)
print(random_search.grid_scores_)

How to use a Gaussian Process for Binary Classification?

I know that a Gaussian Process model is best suited for regression rather than classification. However, I would still like to apply a Gaussian Process to a classification task but I am not sure what is the best way to bin the predictions generated by the model. I have reviewed the Gaussian Process classification example that is available on the scikit-learn website at:
http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html
But I found this example confusing (I have listed the things I found confusing about this example at the end of the question). To try and get a better understanding I have created a very basic python code example using scikit-learn that generates classifications by applying a decision boundary to the predictions made by a gaussian process:
#A minimum example illustrating how to use a
#Gaussian Processes for binary classification
import numpy as np
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.gaussian_process import GaussianProcess
if __name__ == "__main__":
#defines some basic training and test data
#If the descriptive features have large values
#(i.e., 8s and 9s) the target is 1
#If the descriptive features have small values
#(i.e., 2s and 3s) the target is 0
TRAININPUTS = np.array([[8, 9, 9, 9, 9],
[9, 8, 9, 9, 9],
[9, 9, 8, 9, 9],
[9, 9, 9, 8, 9],
[9, 9, 9, 9, 8],
[2, 3, 3, 3, 3],
[3, 2, 3, 3, 3],
[3, 3, 2, 3, 3],
[3, 3, 3, 2, 3],
[3, 3, 3, 3, 2]])
TRAINTARGETS = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
TESTINPUTS = np.array([[8, 8, 9, 9, 9],
[9, 9, 8, 8, 9],
[3, 3, 3, 3, 3],
[3, 2, 3, 2, 3],
[3, 2, 2, 3, 2],
[2, 2, 2, 2, 2]])
TESTTARGETS = np.array([1, 1, 0, 0, 0, 0])
DECISIONBOUNDARY = 0.5
#Fit a gaussian process model to the data
gp = GaussianProcess(theta0=10e-1, random_start=100)
gp.fit(TRAININPUTS, TRAINTARGETS)
#Generate a set of predictions for the test data
y_pred = gp.predict(TESTINPUTS)
print "Predicted Values:"
print y_pred
print "----------------"
#Convert the continuous predictions into the classes
#by splitting on a decision boundary of 0.5
predictions = []
for y in y_pred:
if y > DECISIONBOUNDARY:
predictions.append(1)
else:
predictions.append(0)
print "Binned Predictions (decision boundary = 0.5):"
print predictions
print "----------------"
#print out the confusion matrix specifiy 1 as the positive class
cm = confusion_matrix(TESTTARGETS, predictions, [1, 0])
print "Confusion Matrix (1 as positive class):"
print cm
print "----------------"
print "Classification Report:"
print metrics.classification_report(TESTTARGETS, predictions)
When I run this code I get the following output:
Predicted Values:
[ 0.96914832 0.96914832 -0.03172673 0.03085167 0.06066993 0.11677634]
----------------
Binned Predictions (decision boundary = 0.5):
[1, 1, 0, 0, 0, 0]
----------------
Confusion Matrix (1 as positive class):
[[2 0]
[0 4]]
----------------
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 4
1 1.00 1.00 1.00 2
avg / total 1.00 1.00 1.00 6
The approach used in this basic example seems to work fine with this simple dataset. But this approach is very different from the classification example given on the scikit-lean website that I mentioned above (url repeated here):
http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html
So I'm wondering if I am missing something here. So, I would appreciate if anyone could:
With respect to the classification example given on the scikit-learn website:
1.1 explain what the probabilities being generated in this example are probabilities of? Are they the probability of the query instance belonging to the class >0?
1.2 why the example uses a cumulative density function instead of a probability density function?
1.3 why the example divides the predictions made by the model by the square root of the mean square error before they are input into the cumulative density function?
With respect to the basic code example I have listed here, clarify whether or not applying a simple decision boundary to the predictions generated by a gaussian process model is an appropriate way to do binary classification?
Sorry for such a long question and thanks for any help.
In the GP classifier, a standard GP distribution over functions is "squashed," usually using the standard normal CDF (also called the probit function), to map it to a distribution over binary categories.
Another interpretation of this process is through a hierarchical model (this paper has the derivation), with a hidden variable drawn from a Gaussian Process.
In sklearn's gp library, it looks like the output from y_pred, MSE=gp.predict(xx, eval_MSE=True) are the (approximate) posterior means (y_pred) and posterior variances (MSE) evaluated at points in xx before any squashing occurs.
To obtain the probability that a point from the test set belongs to the positive class, you can convert the normal distribution over y_pred to a binary distribution by applying the Normal CDF (see [this paper again] for details).
The hierarchical model of the probit squashing function is defined by a 0 decision boundary (the standard normal distribution is symmetric around 0, meaning PHI(0)=.5). So you should set DECISIONBOUNDARY=0.

Resources