Validation accuracy and validation loss remain almost constant in every epoch

I am making an autonomous farming robot for my final-year project. I want it to move autonomously along lanes inside farms. I am just using the Raspberry Pi camera image from the front of my vehicle. I collect my data through the Pi and then send it to my computer for training.
Initially I have just trained it to move in a straight line. Since I have not used encoders on my motors, there is a possibility of it drifting to one side, so I have to constantly give it feedback to stay on the right path.
A sample image is shown below; note that it is a black-and-white image. (Sample image was attached here.)
I have 836 images for training and 356 for validation. When I try to train the model, its accuracy does not improve much. I have tried different structures, from fully connected layers to various convolutional layers, but my training accuracy does not improve much, and most of the time the validation accuracy and validation loss stay the same.
I am confused about why this is so. Is it something in my code, or should I apply computer vision techniques to the images so that the features are more prominent? What would be the best approach to tackle this problem?
My code is as follows:
import numpy as np
import glob
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
# fix dimension ordering issue
from keras import backend as K
K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

def load_data(path):
    print("Loading training data...")
    training_data = glob.glob(path)[0]
    data = np.load(training_data)
    a = data['train']
    b = data['train_labels']
    s = np.concatenate((a, b), axis=1)
    data = pd.DataFrame(s)
    data = data.sample(frac=1)  # shuffle rows
    X = data.iloc[:, :-4]
    y = data.iloc[:, -4:]
    print("Image array shape: ", X.shape)
    print("Label array shape: ", y.shape)
    # train/validation split, 7:3
    return train_test_split(X, y, test_size=0.3)
data_path = "*.npz"
X_train,X_test,y_train,y_test=load_data(data_path)
# reshape to be [samples][channels][width][height]
X_train = X_train.values.reshape(X_train.shape[0], 1, 120, 320).astype('float32')
X_test = X_test.values.reshape(X_test.shape[0], 1, 120, 320).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# number of output classes (labels are already one-hot encoded as 4 columns)
num_classes = y_test.shape[1]
# define a simple CNN model
def baseline_model():
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(1, 120, 320), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=10)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))
Sample output: this is the best result from the above code. (A screenshot of the training log was attached here.)

I solved this problem by changing the structure of my network and using NVIDIA's end-to-end deep learning approach for self-driving cars. The algorithm is very robust and also applies basic computer vision to the input. You can easily find sample implementations for toy cars on Medium/YouTube as well.
This article was really helpful for me:
https://towardsdatascience.com/deeppicar-part-1-102e03c83f2c
Additionally, this resource was also very helpful:
https://zhengludwig.wordpress.com/projects/self-driving-rc-car/
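For reference, here is a minimal Keras sketch of an NVIDIA-style (PilotNet-like) architecture, adapted for classifying discrete steering commands. The convolution sizes follow NVIDIA's "End to End Learning for Self-Driving Cars" paper, but the channels-last input shape and the number of classes are assumptions matched to the question's data, not the author's exact model:
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Dropout

def nvidia_style_model(input_shape=(120, 320, 1), num_classes=4):
    model = Sequential()
    # five convolutional layers, as in NVIDIA's PilotNet
    model.add(Conv2D(24, (5, 5), strides=(2, 2), activation='relu', input_shape=input_shape))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(Flatten())
    # fully connected head; softmax over the discrete steering commands
    model.add(Dense(100, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model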

Related

CNN on tfidf as input

I am working on fake news detection using a CNN. I am new to coding CNNs in Keras and TensorFlow. I need help creating a CNN that takes statements as input, in the form of vectors each of length 100, and outputs 0 or 1 depending on whether the predicted value is false or true.
X_train.shape
# 10229, 100
X_train = np.expand_dims(X_train, axis=2)
X_train.shape
# 10229,100,1
# actual cnn model here
import tensorflow as tf
from tensorflow.keras import layers
# Conv1D + global max pooling
from keras.layers import Conv1D, MaxPooling1D, Embedding, Dropout, Flatten, Dense
from keras.layers import Input
text_len=100
from keras.models import Model
inp = Input(batch_shape=(None, text_len, 1))
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(inp)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
flat2 = Flatten()(pool2)
out = Dense(1, activation='softmax')(flat2)
model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, Y_train)
I would really appreciate it if someone could give me working code for this with a little bit of explanation.
In this dummy example, I use a Conv1D with 2D features. Conv1D accepts input sequences in 3D format (n_samples, time_steps, features). If you are using 2D features, you have to adapt them to 3D. The natural choice is to keep your features as-is and simply expand the temporal dimension (expand_dims on axis 1): there is no reason to assume a positional/temporal pattern in tf-idf/one-hot features.
When you build your NN, you start with a 3D dimension and have to get back to 2D. There are lots of ways to go from 3D to 2D; the simplest is flattening, and with a single temporal dimension a pooling layer is useless. If you are using softmax as the last activation, remember to give your final dense layer a dimensionality equal to the number of classes.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
## define variable
n_sample = 10229
text_len = 100
## create dummy data
X_train = np.random.uniform(0,1, (n_sample,text_len))
y_train = np.random.randint(0,2, n_sample)
## expand train dimension: pass from 2D to 3D
X_train = np.expand_dims(X_train, axis=1)
print(X_train.shape, y_train.shape)
## create model
inp = Input(shape=(1,text_len))
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu', padding='same')(inp)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu', padding='same')(drop21)
drop22 = Dropout(0.5)(conv22)
pool2 = Flatten()(drop22) # this is an option to pass from 3d to 2d
out = Dense(2, activation='softmax')(pool2) # the output dim must be equal to the num of class if u use softmax
model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=5)
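As an aside, here is a sketch of the alternative the question originally attempted: keeping the (n_samples, text_len, 1) layout from expand_dims on axis 2, and fixing only the output layer with one sigmoid unit and binary_crossentropy (a one-unit softmax always outputs 1.0 and cannot learn). This reuses n_sample, text_len, and y_train from the dummy example above and is an untested sketch rather than a tuned model:
## alternative: treat text_len as the temporal dimension (axis=2 expansion)
X_train_alt = np.expand_dims(np.random.uniform(0, 1, (n_sample, text_len)), axis=2)
inp = Input(shape=(text_len, 1))
x = Conv1D(filters=128, kernel_size=5, activation='relu')(inp)
x = Dropout(0.5)(x)
x = Conv1D(filters=64, kernel_size=5, activation='relu')(x)
x = Dropout(0.5)(x)
x = MaxPooling1D(pool_size=2)(x)
x = Flatten()(x)
out = Dense(1, activation='sigmoid')(x)  # one output unit: sigmoid, not softmax
model_alt = Model(inp, out)
model_alt.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_alt.fit(X_train_alt, y_train, epochs=5)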

How to interpret MSE in Keras Regressor

I am trying to build a model to predict house prices.
I have some features X (number of bathrooms, etc.) and a target Y (ranging from about $300,000 to $800,000).
I have used sklearn's StandardScaler to standardize Y before fitting the model.
Here is my Keras model:
def build_model():
    model = Sequential()
    model.add(Dense(36, input_dim=36, activation='relu'))
    model.add(Dense(18, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='sgd', metrics=['mae','mse'])
    return model
I am having trouble interpreting the results: what does an MSE of 0.617454319755 mean?
Do I have to inverse-transform this number and take the square root, getting an error of $741.55?
math.sqrt(sc.inverse_transform([mse]))
I apologise for sounding silly as I am starting out!
"I apologise for sounding silly as I am starting out!"
Do not; this is a subtle issue of great importance, which is usually (and regrettably) omitted in tutorials and introductory expositions.
Unfortunately, it is not as simple as taking the square root of the inverse-transformed MSE, but it is not that complicated either; essentially what you have to do is:
1. Transform your predictions back to the initial scale of the original data.
2. Compute the MSE between these inverse-transformed predictions and the original data.
3. Take the square root of the result,
in order to get a performance indicator of your model that will be meaningful in the business context of your problem (e.g. US dollars here).
Let's see a quick example with toy data, omitting the model itself (which is irrelevant here, and in fact can be any regression model - not only a Keras one):
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# toy data
X = np.array([[1,2], [3,4], [5,6], [7,8], [9,10]])
Y = np.array([3, 4, 5, 6, 7])
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X)
# outcome scaling:
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y.reshape(-1, 1))
Y_train
# array([[-1.41421356],
# [-0.70710678],
# [ 0. ],
# [ 0.70710678],
# [ 1.41421356]])
Now, let's say that we fit our Keras model (not shown here) using the scaled sets X_train and Y_train, and get predictions on the training set:
prediction = model.predict(X_train) # scaled inputs here
print(prediction)
# [-1.4687586 -0.6596055 0.14954728 0.95870024 1.001172 ]
The MSE reported by Keras is actually the scaled MSE, i.e.:
MSE_scaled = mean_squared_error(Y_train, prediction)
MSE_scaled
# 0.052299712818541934
while the 3 steps I have described above are simply:
MSE = mean_squared_error(Y, sc_Y.inverse_transform(prediction)) # first 2 steps, combined
MSE
# 0.10459946572909758
np.sqrt(MSE) # 3rd step
# 0.323418406602187
So, in our case, if our initial Y were US dollars, the actual error in the same units (dollars) would be 0.32 (dollars).
Notice how the naive approach of inverse-transforming the scaled MSE would give a very different (and incorrect) result:
np.sqrt(sc_Y.inverse_transform([MSE_scaled]))
# array([2.25254588])
MSE is the mean squared error; the formula is
MSE = (1/n) * Σ (y_i − ŷ_i)²
i.e. the mean of the squared differences between the expected outputs and the predictions. Taking the square root of this (on the scaled data) does not give you the error in the original units; the MSE itself is what is useful as a training objective.
Currently you have only built the model. If you want to train it, use the fit function:
model.fit(x=input_x_array, y=input_y_array, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
If you want to predict the output, use the following code:
prediction = model.predict(np.array(input_x_array))
print(prediction)
You can find more details here.
https://keras.io/models/about-keras-models/
https://keras.io/models/sequential/

Wrong predictions with own MNIST-like image data in trained CNN model

I have made a CNN model to predict digits, trained on MNIST data, using the Keras wrapper for TensorFlow. I am having trouble predicting on my own input data. I trained the model with epochs=100 and tested it with the MNIST test set, which works fine with an accuracy of approximately 97%. I have saved this model as 'my_model_conv2d.h5'.
1st Code:
# importing modules
import numpy as np
import keras
import matplotlib.pyplot as plt
from keras.optimizers import SGD
from keras.models import Sequential
from keras.layers import Dense,Activation, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.datasets import mnist
from keras.utils import np_utils
plt.rcParams['figure.figsize'] = (7,7)
#reading MNIST data
(X_train,y_train),(X_test,y_test)=mnist.load_data('mnist.npz')
print(X_train.shape, " train data shape")
print(y_train.shape, " labels shape")
#copying data for plotting
test=X_test.copy()
#reshaping data type according to the tensorflow param.
X_train=X_train.reshape(X_train.shape[0],X_train.shape[1],X_train.shape[2],1)
X_test=X_test.reshape(X_test.shape[0],X_test.shape[1],X_test.shape[2],1)
input_shape=(X_train.shape[1],X_train.shape[2],1)
X_train=X_train.astype('float32')
X_test=X_test.astype('float32')
X_train/=255
X_test/=255
# one_hot vector
Y_train=np_utils.to_categorical(y_train,10)
Y_test=np_utils.to_categorical(y_test,10)
print(X_train.shape, " train data shape")
print(y_train.shape, " labels shape")
#building CNN model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
#optimizer type initialization
sgd=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#compiling model (note: the string 'sgd' uses Keras' default SGD settings; pass the sgd object above to use the custom ones)
model.compile(loss='categorical_crossentropy', optimizer='sgd')
#training model
model.fit(X_train,Y_train, batch_size=128,epochs=100, verbose=1,validation_data=(X_test,Y_test))
#evaluating
score=model.evaluate(X_test,Y_test,verbose=1)
print(score, "score")
#predicting classes using MNIST test data
predicted_classes=model.predict_classes(X_test)
print(predicted_classes.shape, "predicted_classes.shape")
correct_indices = np.nonzero(predicted_classes == y_test)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]
#saving model for future use
model.save('my_model_conv2d.h5')
print(len(correct_indices), "no. of correct samples")
print(len(incorrect_indices), "no. of incorrect samples")
print(str((len(incorrect_indices)/float(len(y_test)))*100)+'%', "error percentage")
#plotting graph of predicted test data
plt.figure()
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(test[correct], cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], y_test[correct]))
plt.show()
plt.figure()
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(test[incorrect], cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], y_test[incorrect]))
plt.show()
I have written a second Python script which loads the model 'my_model_conv2d.h5'. In this script I have also made an interactive window for drawing a number, which is used as the input image for prediction. I have taken care of the background and font colour of the image, and also of the size (28, 28), so it is approximately similar to the MNIST data.
2nd Code:
import cv2
import numpy as np
from keras.models import Sequential,load_model

drawing=False
mode=True

#function for mouse events
def interactive_drawing(event,x,y,flags,param):
    global ix,iy,drawing,mode
    if event==cv2.EVENT_LBUTTONDOWN:
        drawing=True
        ix,iy=x,y
        print("EVENT_LBUTTONDOWN")
    elif event==cv2.EVENT_MOUSEMOVE:
        if drawing==True:
            print("EVENT_MOUSEMOVE")
            if mode==True:
                print("EVENT_MOUSEMOVE")
                cv2.line(img,(ix,iy),(x,y),1,1)
                ix=x
                iy=y
    elif event==cv2.EVENT_LBUTTONUP:
        drawing=False
        print("EVENT_LBUTTONUP")
        if mode==True:
            print("EVENT_LBUTTONUP")
            cv2.line(img,(ix,iy),(x,y),1,1)
            ix=x
            iy=y
    return x,y

#function for predicting the number
def cnn(img):
    im=img.copy()
    #loading cnn model
    model = load_model('my_model_conv2d.h5')
    img=img.reshape(1,28,28,1).astype('float32')
    #prediction of class using the drawn image as input
    predicted_class=model.predict_classes(img)
    print(predicted_class, "class")
    cv2.imwrite('actual=9:predicted='+str(predicted_class)+'.jpg',im*255)

#image same as an mnist image: 28x28, values in [0, 1]
img = np.zeros((28,28), 'float32')
cv2.namedWindow('Drawing_window')
cv2.setMouseCallback('Drawing_window',interactive_drawing)
while(1):
    cv2.imshow('Drawing_window',img*255)
    k=cv2.waitKey(1)&0xFF
    if k==27:
        break
cv2.destroyAllWindows()
#calling cnn function for prediction
cnn(img)
Most of the predictions on my own input data are wrong. I also built a fully connected model before implementing the CNN, but the result was the same; I then tried the CNN, but the problem persists. I have checked the MNIST test data inside the second script, and it works fine. You can check the results here: https://drive.google.com/open?id=1E26zinOLrMw7XKhsHd9vnSYoHXzadiJ3 . The image names indicate the actual and predicted values.
Please suggest where I am going wrong.
You need to apply exactly the same preprocessing to images at inference time as was applied to the images during training.
This is why your program normalizes the train and test data in the same way:
X_train/=255
X_test/=255
This is also true for any other preprocessing you might want to do, such as PCA or z-score normalization.
So make sure that, in your case, your input image is in the range (0, 1) by dividing it by 255 (if a range of 0 to 255 is what's to be expected from that OpenCV call).
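For instance, here is a minimal sketch of that scaling when loading a digit image from disk with OpenCV; the file name is hypothetical:
import cv2
import numpy as np

# hypothetical example file; any 28x28 grayscale digit with pixel values 0-255
img = cv2.imread('digit.png', cv2.IMREAD_GRAYSCALE)
# reshape to (1, 28, 28, 1) and scale to [0, 1], exactly as the training data was
img = img.reshape(1, 28, 28, 1).astype('float32') / 255.0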
Edit:
I just went ahead and trained the model and tried your program. Indeed, it seems to make a lot of mistakes (many more than expected, given that it got about 1% validation error).
My guess is that since MNIST is already somewhat preprocessed,
"The digits have been size-normalized and centered in a fixed-size image."
your model might expect your test images to be preprocessed in the same way as well.
I would then recommend using some MNIST variant with more noise/rotation, which might mitigate the problem, e.g. mnist-rot or mnist-back-rand. One option is sketched below.
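Alternatively, a rough sketch of augmenting plain MNIST with Keras' ImageDataGenerator to get a similar robustness effect; the augmentation ranges below are assumptions, and the variable names follow the training script above:
from keras.preprocessing.image import ImageDataGenerator

# randomly rotate and shift the training digits so the model tolerates
# off-centre, slightly rotated input such as hand-drawn images
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=128),
                    steps_per_epoch=len(X_train) // 128,
                    epochs=10,
                    validation_data=(X_test, Y_test))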

How to avoid overfitting on a simple feed forward network

Using the Pima Indians diabetes dataset, I'm trying to build an accurate model using Keras. I've written the following code:
# Visualize training history
from keras import callbacks
from keras.layers import Dropout
tb = callbacks.TensorBoard(log_dir='/.logs', histogram_freq=10, batch_size=32,
                           write_graph=True, write_grads=True, write_images=False,
                           embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)
# Visualize training history
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu', name='first_input'))
model.add(Dense(500, activation='tanh', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# Compile model
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
After several tries, I added dropout layers to avoid overfitting, but with no luck. The following graph shows that the validation loss and training loss separate at one point.
What else could I do to optimize this network?
UPDATE:
based on the comments I got I've tweaked the code like so:
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform',
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01),
                activation='relu', name='first_input'))  # added regularizers
model.add(Dense(8, activation='relu', name='first_hidden')) # reduced to 8 neurons
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(5, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
Here are the graphs for 500 epochs
The first example gave a validation accuracy > 75% and the second one gave an accuracy < 65%; if you compare the losses for the epochs below 100, the first is less than 0.5 while the second is greater than 0.6. So how is the second case better?
The second one, to me, is a case of under-fitting: the model doesn't have enough capacity to learn. The first case has a problem of over-fitting, because its training was not stopped when overfitting started (early stopping). If the training had been stopped at, say, epoch 100, it would be a far better model than either of the two.
The goal should be to obtain a small prediction error on unseen data, and for that you increase the capacity of the network up to the point beyond which overfitting starts to happen.
So how do you avoid over-fitting in this particular case? Adopt early stopping.
CODE CHANGES: to include early stopping and input scaling.
from sklearn.preprocessing import StandardScaler
from keras.callbacks import EarlyStopping
# input scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)
# early stopping
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')
# create model - almost the same code
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', name='first_input'))
model.add(Dense(500, activation='relu', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# compile as before
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])
The accuracy and loss graphs were attached here.
First, try adding some regularization (https://keras.io/regularizers/), like with this code:
model.add(Dense(12, input_dim=12,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
Also, make sure to decrease your network size, i.e. you don't need a hidden layer of 500 neurons; try just taking that out to decrease the representational power, and maybe even another layer if it's still overfitting. Also, only use relu activations. Maybe also try increasing your dropout rate to something like 0.75 (although it's already high). You probably also don't need to run it for so many epochs; it will just begin to overfit after long enough.
For a dataset like the diabetes one, you can use a much simpler network. Try to reduce the number of neurons in your second layer. (Is there a specific reason why you chose tanh as the activation there?)
In addition, you can simply add an EarlyStopping callback to your training (https://keras.io/callbacks/), as sketched below.
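A minimal sketch of what that could look like with the question's fit call; the patience value is an assumption:
from keras.callbacks import EarlyStopping

# stop once validation loss has not improved for 10 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
history = model.fit(X, Y, validation_split=0.33, epochs=1000,
                    batch_size=10, verbose=0, callbacks=[early_stop])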

Keras model creates linear classification for make_moons data

I'm trying to reproduce the model in this WildML tutorial, Implementing a Neural Network From Scratch, but using Keras instead. I've tried to use all the same configurations as the tutorial, but I keep getting a linear classification even after tweaking the number of epochs, the batch size, the activation functions, and the number of units in the hidden layer:
Here's my code:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils.visualize_util import plot
from keras.utils.np_utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt
import sklearn
from sklearn import datasets, linear_model
# Build model
model = Sequential()
model.add(Dense(input_dim=2, output_dim=3, activation="tanh", init="normal"))
model.add(Dense(output_dim=2, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
# Train
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.20)
y_binary = to_categorical(y)
model.fit(X, y_binary, nb_epoch=100)
# Helper function to plot a decision boundary.
# If you don't fully understand this function don't worry, it just generates the contour plot below.
def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
# Predict and plot
plot_decision_boundary(lambda x: model.predict_classes(x, batch_size=200))
plt.title("Decision Boundary for hidden layer size 3")
plt.show()
I believe I figured out the problem. If I remove the np.random.seed(0) and train for 2000 epochs, the output isn't always linear and occasionally reaches 90%+ accuracy:
It must have been that np.random.seed(0) led to the parameters being seeded poorly, and since I was fixing the random seed, I would see the same graph every time.
I think I solved this problem, but I do not know why it works. If you change the output layer's activation function to 'sigmoid' instead of 'softmax', the system will work:
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
With this, I can achieve an accuracy of 95% or greater. If I leave the above code with softmax, the linear classifier remains. (A likely explanation: a softmax over a single output unit always outputs 1.0 regardless of the input, so a one-unit output layer can learn nothing with softmax; a single unit needs sigmoid, or equivalently two units with softmax and one-hot labels.)
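For completeness, a minimal sketch of the two-unit softmax equivalent, reusing X and y_binary from the question's code; note it uses the newer epochs argument rather than the question's nb_epoch:
from keras.models import Sequential
from keras.layers import Dense

# equivalent formulation: two softmax units with one-hot labels
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y_binary, epochs=1000, verbose=0)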
