TensorFlow 2 XOR implementation - machine-learning

I am new to TensorFlow, and as a first exercise I want to train an XOR model: 4 input samples with 2 values each, and 4 target outputs with 1 value each.
Here is what I am doing in TF 2:
model = keras.Sequential([
    keras.layers.Dense(units=2, activation='relu'),
    keras.layers.Dense(units=1, activation='softmax')
])
history = model.fit(
    (tf.cast([[0,0],[0,1],[1,0],[1,1]], tf.float32), tf.cast([0,1,1,0], tf.float32)),
    validation_data=(tf.cast([[0.7, 0.7]], tf.float32), tf.cast([0], tf.float32)),
)
The code above raises IndexError: list index out of range.
Please help me with this. I would also like to understand how to work out the shapes to give to the model.

The problem is in how you pass the arguments to fit. The features and labels are wrapped in a single tuple, so the whole tuple is treated as x and no y is passed:
history = model.fit(
    (tf.cast([[0,0],[0,1],[1,0],[1,1]], tf.float32), tf.cast([0,1,1,0], tf.float32)),
    validation_data=(tf.cast([[0.7, 0.7]], tf.float32), tf.cast([0], tf.float32)),
)
Try replacing that call with this:
x_train = tf.cast([[0,0],[0,1],[1,0],[1,1]], tf.float32)
y_train = tf.cast([0,1,1,0], tf.float32)
x_test = tf.cast([[0.7, 0.7]], tf.float32)
y_test = tf.cast([0], tf.float32)
history = model.fit(
    x=x_train, y=y_train,
    validation_data=(x_test, y_test),
)
That should solve your issue.
PS: Just a suggestion: for binary classification, use a sigmoid output instead of softmax, and correspondingly a BinaryCrossentropy loss instead of CategoricalCrossentropy. A softmax over a single unit always outputs 1.0, so the model could never predict class 0. Good luck!
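The sigmoid-versus-softmax suggestion can be checked by hand without TensorFlow. The sketch below is only an illustration: the helper functions and the example logits are made up, and it mimics what a one-unit output layer would compute rather than calling Keras itself.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic sigmoid for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation values of a Dense(units=1) layer.
logits = [-3.0, 0.0, 2.5]

# Softmax normalizes over the units of the layer; with one unit there
# is nothing to normalize against, so the output is always 1.0.
softmax_outputs = [softmax([z])[0] for z in logits]

# Sigmoid maps each logit to its own value in (0, 1).
sigmoid_outputs = [sigmoid(z) for z in logits]

print(softmax_outputs)
print(sigmoid_outputs)
```

With a single softmax unit the prediction never depends on the input, which is why a sigmoid output is the usual choice for a one-unit binary classifier.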


How to create an ANN regression model with several vectors as input and several vectors as output?

There is one input variable and one output variable.
However, each data point of the input/output variable is a vector.
Each input vector has size 141x1 and each output vector has size 400x1.
I have attached the input data file (Ivec.xls) and the output data file (Ovec.xls).
data link
For training:
Input vectors: Ivec(:,1:9) and output vectors: Ovec(:,1:9)
For testing:
Input vector: Ivec(:,10) and the predicted_Ovec10 can be compared with Ovec(:,10)
to know the performance of the model.
How can I create a regression model from this?
dataset_Ivec = pd.read_excel(r'Ivec.xls',header = None)
dataset_Ovec = pd.read_excel(r'Ovec.xls',header = None)
dataset_Ivec_numpy = dataset_Ivec.to_numpy()
dataset_Ovec_numpy = dataset_Ovec.to_numpy()
X_train = dataset_Ivec_numpy[:,:-1]
y_train = dataset_Ovec_numpy[:,:-1]
X_test = dataset_Ivec_numpy[:,9]
y_test = dataset_Ovec_numpy[:,9]
model = Sequential()
model.add(Dense(activation="relu", input_dim=X_train.shape[0], units = X_train.shape[1], kernel_initializer="uniform"))
model.add(Dense(activation="linear", input_dim=y_train.shape[1], units = X_train.shape[1], kernel_initializer="uniform"))
model.compile(optimizer="adagrad", loss="mean_squared_error", metrics=["accuracy"])
# model = baseline_model()
result = model.fit(X_train, y_train, batch_size=2, epochs=20, validation_data=(X_test, y_test))
I tried to write this code, but I find it too confusing.
I have been stuck on this for many days. Could someone please help?

Size Mismatch using pytorch when trying to train data

I am really new to PyTorch and am trying to use my own dataset to build a simple linear regression model. I am only using the numeric values as inputs.
I have imported the data from the CSV:
dataset = pd.read_csv('mlb_games_overview.csv')
I have split the data into four parts X_train, X_test, y_train, y_test
X = dataset.drop(['date', 'team', 'runs', 'win'], 1)
y = dataset['win']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=True)
I have converted the data to pytorch tensors
X_train = torch.from_numpy(np.array(X_train))
X_test = torch.from_numpy(np.array(X_test))
y_train = torch.from_numpy(np.array(y_train))
y_test = torch.from_numpy(np.array(y_test))
I have created a LinearRegressionModel
class LinearRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred
I have initialized the optimizer and the loss function
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Now, when I start training, I get a size-mismatch runtime error:
EPOCHS = 500
for epoch in range(EPOCHS):
    pred_y = model(X_train) # RUNTIME ERROR HERE
    loss = criterion(pred_y, y_train)
    optimizer.zero_grad() # zero out gradients to update parameters correctly
    loss.backward() # backpropagation
    optimizer.step() # update weights
    print('epoch {}, loss {}'. format(epoch, loss.data[0]))
Error Log:
RuntimeError Traceback (most recent call last)
<ipython-input-40-c0474231d515> in <module>
1 EPOCHS = 500
2 for epoch in range(EPOCHS):
----> 3 pred_y = model(X_train)
4 loss = criterion(pred_y, y_train)
5 optimizer.zero_grad() # zero out gradients to update parameters correctly
RuntimeError: size mismatch, m1: [3540 x 8], m2: [1 x 1] at
In your Linear Regression model, you have:
self.linear = torch.nn.Linear(1, 1)
But your training data (X_train) shape is 3540 x 8 which means you have 8 features representing each input example. So, you should define the linear layer as follows.
self.linear = torch.nn.Linear(8, 1)
A linear layer in PyTorch has two parameters, W and b. If you set in_features to 8 and out_features to 1, W has shape 1 x 8 and b has length 1.
Since your training data has shape 3540 x 8, the layer can compute the following, which yields an output of shape 3540 x 1:
linear_out = X_train W^T + b
I hope this clears up the confusion.
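To make the shape rule concrete, here is a small pure-Python sketch of the same computation. It is not PyTorch's implementation: linear_forward is a hypothetical helper written for this illustration, and the tiny inputs stand in for the real 3540 x 8 data.

```python
def linear_forward(X, W, b):
    """Compute X @ W^T + b.

    X: list of n rows, each with in_features values
    W: list of out_features rows, each with in_features values
    b: list of out_features values
    """
    in_features = len(W[0])
    for row in X:
        if len(row) != in_features:
            raise ValueError(
                "size mismatch: input has {} features, layer expects {}".format(
                    len(row), in_features
                )
            )
    return [
        [sum(x * w for x, w in zip(row, w_row)) + b_j for w_row, b_j in zip(W, b)]
        for row in X
    ]

X = [[1.0] * 8 for _ in range(3)]  # 3 examples, 8 features each
W = [[0.5] * 8]                    # shape 1 x 8, like Linear(8, 1)
b = [0.1]

out = linear_forward(X, W, b)
out_shape = (len(out), len(out[0]))  # one output value per example
print(out_shape)

# A 1 x 1 weight matrix, like Linear(1, 1), cannot consume 8 features.
try:
    linear_forward(X, [[0.5]], b)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

The second call fails for the same reason as the original error: the number of columns in the input has to match in_features of the layer.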

Keras accuracy metrics differ from manual computation

I am working on a binary classification problem in Keras. The loss function I use is binary_crossentropy and the metric is metrics=['accuracy']. Since the two classes are imbalanced, I use class_weight='auto' when I fit the model to the training data.
To see the performance, I print out the accuracy by
print GNN.model.test_on_batch([test_sample_1, test_sample_2], test_label)[1]
The output is 0.973. But the result is different when I use the following lines to compute the prediction accuracy:
predict_label = GNN.model.predict([test_sample_1, test_sample_2])
rounded = predict_label.round(1)
print (rounded == test_label).sum()/float(rounded.shape[0])
which gives 0.953.
So I am wondering how metrics=['accuracy'] evaluates the model performance and why the results differ.
For details, I have attached the model definition below.
input_size = self.n_feature
encoder_size = 2000
dropout_rate = 0.5
X1 = Input(shape=(input_size, ), name='input_1')
X2 = Input(shape=(input_size, ), name='input_2')
encoder = Sequential()
encoder.add(Dropout(dropout_rate, input_shape=(input_size, )))
encoder.add(Dense(encoder_size, activation='tanh'))
encoded_1 = encoder(X1)
encoded_2 = encoder(X2)
merged = concatenate([encoded_1, encoded_2])
comparer = Sequential()
comparer.add(Dropout(dropout_rate, input_shape=(encoder_size * 2, )))
comparer.add(Dense(500, activation='relu'))
comparer.add(Dense(200, activation='relu'))
comparer.add(Dense(1, activation='sigmoid'))
Y = comparer(merged)
model = Model(inputs=[X1, X2], outputs=Y)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
self.model = model
I train the model with:
self.hist = self.model.fit(
x=[train_sample_1, train_sample_2],
class_weight = 'auto',
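
One possible explanation, offered as a guess since it has not been checked against the data in the question: round(1) rounds to one decimal place, not to the nearest integer, so a prediction of 0.93 becomes 0.9 and is counted as wrong for label 1. As far as I know, Keras' binary accuracy instead thresholds the sigmoid output at 0.5. The probabilities and labels below are made up for illustration, and the sketch uses plain Python rather than Keras.

```python
def accuracy(predictions, labels):
    """Fraction of positions where prediction equals label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical sigmoid outputs and true labels.
probs = [0.97, 0.93, 0.62, 0.02, 0.31, 0.57]
labels = [1, 1, 1, 0, 0, 0]

# Rounding to ONE DECIMAL, as predict_label.round(1) does:
# only values that land exactly on 0.0 or 1.0 can match a label.
one_decimal = [round(p, 1) for p in probs]
acc_one_decimal = accuracy(one_decimal, labels)

# Thresholding at 0.5, which is how binary accuracy is usually defined.
thresholded = [int(p > 0.5) for p in probs]
acc_thresholded = accuracy(thresholded, labels)

print(one_decimal, acc_one_decimal)
print(thresholded, acc_thresholded)
```

If this is the cause, rounding to the nearest integer (round() with no argument on the NumPy array) should bring the manual figure in line with the reported metric. It is also worth checking that the prediction and label arrays have the same shape before comparing them.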

Keras - Image to Word (OCR)

I am trying to build a very simple OCR model as a starting point before testing bigger models. The problem is that I can't figure out what my output data should look like for training.
def simple_model():
    output = 28
    if K.image_data_format() == 'channels_first':
        input_shape = (1, input_height, input_width)
    else:
        input_shape = (input_height, input_width, 1)
    conv_to_rnn_dims = (input_width // (2), (input_height // (2)) * conv_blades)
    model = Sequential()
    model.add(Conv2D(conv_blades, (3, 3), input_shape=input_shape, padding='same'))
    model.add(MaxPooling2D(pool_size=(2,2), name='max2'))
    model.add(Reshape(target_shape=conv_to_rnn_dims, name='reshape'))
    model.add(GRU(64, return_sequences=True, kernel_initializer='he_normal', name='gru1'))
    model.add(TimeDistributed(Dense(output, kernel_initializer='he_normal', name='dense2')))
    model.add(Activation('softmax', name='softmax'))
    return model
img = load_img('exit.png', grayscale=True, target_size=[input_height, input_width])
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
y = np.array(['exit'])
model = simple_model()
model.fit(x, y, batch_size=1,
validation_data=(x, y),
print model.predict(y)
Image Example:
(source: exitfest.org)
When I run this code, I get the following error:
ValueError: Error when checking target: expected softmax to have 3 dimensions, but got array with shape (1, 1)
Note 1: I know I can't train my model with only one image and one label. I have many more images like this one, but first I need to get this simple model running before improving it.
Note 2: This is the first time I have worked with image-to-sequence output, so there may be other problems. Feel free to change the code if you spot that kind of mistake.
Well, as I haven't received any answer, I will link to the answer I posted on another question.
There I explain how to use the Keras OCR example and answer some other questions.

Cross validation in classifying text documents using scikit-learn

When classifying text documents with scikit-learn, do you do cross-validation first and then feature extraction, or the other way around?
Here is my pipeline:
union = FeatureUnion(
    transformer_list = [
        ('tfidf', TfidfVectorizer()),
        ('featureEx', FeatureExtractor()),
        ('spell_chker', Spellingchecker()),
    ], n_jobs = -1)
I am doing it in the following way, but I wonder whether I should extract the features first and then do the cross-validation. In this example X is a list of documents and y is the list of labels.
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size= 0.2)
X_train = union.fit_transform(X_train)
X_test = union.transform(X_test)
ch2 = SelectKBest(f_classif, k = 7000)
X_train = ch2.fit_transform(X_train, y_train)
X_test = ch2.transform(X_test)
clf = SVC(C=1, gamma=0.001, kernel = 'linear', probability=True).fit(
X_train , y_train)
print("classification report:")
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
Doing feature selection first and then cross-validating on the selected features is fairly common with text data, but it is less desirable. It can lead to over-fitting, and the cross-validation procedure may over-estimate your true accuracy.
When you do the feature selection first, the selection process gets to look at all the data. The point of cross-validation is to hide one fold from the others. By doing the feature selection first, you leak some knowledge of that held-out data to the other folds.
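The leak can be seen on a toy example. The sketch below is plain Python, not scikit-learn: fit_vocabulary is a hypothetical stand-in for a selector such as SelectKBest, and the four documents are made up. It compares the features chosen from all the data with those chosen inside each fold.

```python
from collections import Counter

def fit_vocabulary(docs, min_count=2):
    """Toy feature selection: keep words seen at least min_count times."""
    counts = Counter(word for doc in docs for word in doc.split())
    return {word for word, n in counts.items() if n >= min_count}

docs = ["great plot", "great acting", "dull acting", "dull plot"]

# Selection fitted once on ALL documents, before any splitting.
vocab_all = fit_vocabulary(docs)

leaked_per_fold = []
for held_out in range(len(docs)):  # leave-one-out folds
    train_docs = docs[:held_out] + docs[held_out + 1:]
    # Selection fitted inside the fold, on the training documents only.
    vocab_fold = fit_vocabulary(train_docs)
    # Features that were selected only because the held-out
    # document was visible to the selector.
    leaked_per_fold.append(vocab_all - vocab_fold)

for held_out, leaked in enumerate(leaked_per_fold):
    print(repr(docs[held_out]), "->", sorted(leaked))
```

In every fold, some features exist only because the selector saw the held-out document. In scikit-learn the usual way to avoid this is to put the feature extraction, the selector, and the classifier into a single Pipeline and pass that to cross_val_score, so that each step is refitted on the training part of every fold.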
