Dropout Tensorflow: Weight scaling at test time - machine-learning

Do I need to scale weights at test time in tensorflow i.e weights*keep_prob at testing or tensorflow does it itself? if so then how?
At training my keep_prob is 0.5. and at test time its 1.
Although network is regularized but accuracy is not good as before regularization.
P.S i'm classifying CIFAR10
n_nodes_h1=1000
n_nodes_h2=1000
n_nodes_h3=400
n_nodes_h4=100
classes=10
x=tf.placeholder('float',[None,3073])
y=tf.placeholder('float')
keep_prob=tf.placeholder('tf.float32')
batch_size=100
def neural_net(data):
hidden_layer1= {'weight':tf.Variable(tf.random_normal([3073,n_nodes_h1])),
'biases':tf.Variable(tf.random_normal([n_nodes_h1]))}
hidden_layer2={'weight':tf.Variable(tf.random_normal([n_nodes_h1,n_nodes_h2])),
'biases':tf.Variable(tf.random_normal([n_nodes_h2]))}
out_layer={'weight':tf.Variable(tf.random_normal([n_nodes_h2,classes])),
'biases':tf.Variable(tf.random_normal([classes]))}
l1= tf.add(tf.matmul(data,hidden_layer1['weight']), hidden_layer1['biases'])
l1=tf.nn.relu(l1)
#************DROPOUT*******************
l1=tf.nn.dropout(l1,keep_prob)
l2= tf.add(tf.matmul(l1,hidden_layer2['weight']), hidden_layer2['biases'])
l2=tf.nn.relu(l2)
out= tf.matmul(l2,out_layer['weight'])+ out_layer['biases']
return out
This was network
iterations=20
Train_loss=[]
Test_loss=[]
def train_nn(x):
prediction=neural_net(x)
cost=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
optimizer=tf.train.AdamOptimizer().minimize(cost)
epochs=iterations
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range (epochs):
e_loss=0
i=0
for _ in range(int(X_train.shape[0]/batch_size)):
e_x=X_train[i:i+batch_size]
e_y=y_hot_train[i:i+batch_size]
i+=batch_size
_,c=sess.run([optimizer,cost],feed_dict={x:e_x,y:e_y, keep_prob:0.5})
e_loss+=c
print "Epoch: ",epoch," Train loss= ",e_loss
Train_loss.append(e_loss)
correct=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct,'float'))
print "Accuracy on test: " ,accuracy.eval({x:X_test,y:y_hot_test , keep_prob:1.})
print "Accuracy on train:" ,accuracy.eval({x:X_train[0:2600],y:y_hot_train[0:2600], keep_prob=1.})
train_nn(x)
Do I need something like
hidden_layer1['weight']*=keep_prob
#testing time

Tensorflow does it itself:
With probability keep_prob, outputs the input element scaled up by 1 /
keep_prob, otherwise outputs 0. The scaling is so that the expected
sum is unchanged.
(from this page)

Related

LSTM regression model flat prediction

This is a time series regression problem for the battery capacity as output and a single input variable as voltage; the relation is non-linear.
LSTM Model prediction of the test data always returns a semi-flat line, probably the mean of the output variable in the training data.
This is an example of predicted vs test set output values, with the following model parameters:
(Window size: 10, batch site: 256, LSTM nodes: 16)
Prediction of the test data
Data had been normalized, down-sampled to 1 sec and later to 3 sec, original sampling was 10 Hz.
I was suspecting the voltage fluctuation is the problem, but sampling at 3 seconds hadn't resulted into noticeable improvement.
Here are the data after being down-sampled to 3 seconds:
Normalized Training Data ; Y:SOC, X: Voltage
Normalized Test Data ; Y:SOC, X: Voltage
I've tried many changes in the model and learning parameters as follows, but still the behavior is the same.
That's why i think it's not a parameter tuning issue, rather the model is not learning at all.
LSTM layer: always single, followed by Dense with no options.
LSTM nodes: [4,8,16,32]
Epoch: : [16,32,64,128]
window size (input vector depth): [8,32,64,128]
Batch size: [32,64,128,256]
learning rate: [.0005,.0001,.001]
optimizer : ADAM, options:[ none, clipnorm=1, clipvalue=0.5]
Model specification Code:
backend.clear_session()
model1 = Sequential()
model1.add(LSTM(16,input_shape=(win_sz, features_cnt) )) # stateless
model1.add(layers.Dense(1))
model1.summary()
Model training and validation Code:
n_epochs = 12
iterations = tr_samples_sh_cnt // batch_sz_tr
loss = tf.keras.losses.MeanAbsoluteError()
optimizer = tf.optimizers.Adam(learning_rate = 0.001)
loss_history = []
#tf.function
def train_model_on_batch():
start = epoch * batch_sz_tr
X_batch = df_feat_tr_3D[start:start+batch_sz_tr, :, :]
y_batch = df_SOC_tr_2D[start:start+batch_sz_tr, :]
with tf.GradientTape() as tape:
current_loss = loss(model1(X_batch), y_batch)
gradients = tape.gradient(current_loss, model1.trainable_variables)
optimizer.apply_gradients(zip(gradients, model1.trainable_variables))
return current_loss
for epoch in range(n_epochs+1):
for iteration in range(iterations):
current_loss = train_model_on_batch()
if epoch % 1 == 0:
loss_history.append(current_loss.numpy())
print("{}. \t\tLoss: {}".format(
epoch, loss_history[-1]))
print('\nTraining complete.')
P_test = model1.predict(df_feat_test_3D)
After adding sigmoid activation function in both LSTM and Dense layers, a very small change observed, but far from reasonable fit.
Prediction of the test data after adding activation function
The problem was the activation function as #Dr. Snoopy recommended

Horrible forecasting KPI scores (RMSE, MAE and MAPE) using RNN with three flavors ("LSTM", "GRU" and "RNN")

My project is trying to forecast COVID 19 total confirmed case for 2021. This is the overlook of the confirm case data, which I use to train my RNN model.
enter image description here
The confirmed number though doesn't show any repeating pattern. But there has been research studies using RNN and LSTM on the same data (in fact, I use the same data source as them). This is the research study I drew inspiration from:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8523546/
This is the result I got:
beginning the training of the LSTM RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 33.10it/s, loss=0.115, v_num=logs, train_loss=0.0933, val_loss=0.466]
training of the LSTM RNN completed: 105.23 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 6.48it/s]
LSTM :
MAPE : 199.3029
RMSPE : 0.6659
RMSE : 0.6763
-R squared : 15366746993394475597824.0000
se : 0.0871
beginning the training of the GRU RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 37.86it/s, loss=0.11, v_num=logs, train_loss=0.0911, val_loss=0.474]
training of the GRU RNN completed: 90.99 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 3.91it/s]
GRU :
MAPE : 209.6602
RMSPE : 0.6771
RMSE : 0.6877
-R squared : 467975998695823638528.0000
se : 0.0871
beginning the training of the Vanilla RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 43.54it/s, loss=0.114, v_num=logs, train_loss=0.109, val_loss=0.461]
training of the Vanilla RNN completed: 79.32 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 6.08it/s]
Vanilla :
MAPE : 200.5710
RMSPE : 0.6673
RMSE : 0.6778
-R squared : 115942252717688101559392358367232.0000
se : 0.0871
Also, here is my prediction plot. Three of the plot looks exactly the same, different MAPE score:
enter image description here
For the package i use, i use darts (u8darts[all] when importing the packages).
for methodology, I set the parameters for my model as this. I used this article by Heiko Oinen for methodology (detailed code is below the page):
https://medium.com/towards-data-science/temporal-loops-intro-to-recurrent-neural-networks-for-time-series-forecasting-in-python-b0398963dc1f:
#Set up the models, run the models, plot and evaluate
EPOCH = 100
def run_RNN(flavor, ts, train, val):
#set the model up
model_RNN = RNNModel(
model = flavor,
model_name = flavor + str(" RNN"),
input_chunk_length = 12,
training_length = 20,
hidden_dim = 20,
batch_size = 16,
n_epochs = EPOCH,
dropout = 0,
optimizer_kwargs = {'lr': 1e-3},
log_tensorboard = True,
random_state = 42,
force_reset = True
)
if flavor == "RNN": flavor = "Vanilla"
#fit the model
fit_it(model_RNN, train, val, flavor)
#compute N predictions
pred = model_RNN.predict(n = FC_N,future_covariates = covariates)
#plot predictions vs actual
plot_fitted(pred, ts, flavor)
#print accuracy metrics
res_acc = accuracy_metrics(pred, ts)
I have even tries epoch to 300, but the train loss and loss during training didn't decrease further.
I haven't gotten experience in asking questions here, I'll try to articulate and provide more detail if you have questions. Thank you so much for your help!

How to overfit data with Keras?

I'm trying to build a simple regression model using keras and tensorflow. In my problem I have data in the form (x, y), where x and y are simply numbers. I'd like to build a keras model in order to predict y using x as an input.
Since I think images better explains thing, these are my data:
We may discuss if they are good or not, but in my problem I cannot really cheat them.
My keras model is the following (data are splitted 30% test (X_test, y_test) and 70% training (X_train, y_train)):
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=() activation="relu", name="first_layer"))
model.add(tf.keras.layers.Dense(16, activation="relu", name="second_layer"))
model.add(tf.keras.layers.Dense(1, name="output_layer"))
model.compile(loss = "mean_squared_error", optimizer = "adam", metrics=["mse"] )
history = model.fit(X_train, y_train, epochs=500, batch_size=1, verbose=0, shuffle=False)
eval_result = model.evaluate(X_test, y_test)
print("\n\nTest loss:", eval_result, "\n")
predict_Y = model.predict(X)
note: X contains both X_test and X_train.
Plotting the prediction I get (blue squares are the prediction predict_Y)
I'm playing a lot with layers, activation funztions and other parameters. My goal is to find the best parameters to train the model, but the actual question, here, is slightly different: in fact I have hard times to force the model to overfit the data (as you can see from the above results).
Does anyone have some sort of idea about how to reproduce overfitting?
This is the outcome I would like to get:
(red dots are under blue squares!)
EDIT:
Here I provide you the data used in the example above: you can copy paste directly to a python interpreter:
X_train = [0.704619794270697, 0.6779457393024553, 0.8207082120250023, 0.8588819357831449, 0.8692320257603844, 0.6878750931810429, 0.9556331888763945, 0.77677964510883, 0.7211381534179618, 0.6438319113259414, 0.6478339581502052, 0.9710222750072649, 0.8952188423349681, 0.6303124926673513, 0.9640316662124185, 0.869691568491902, 0.8320164648420931, 0.8236399177660375, 0.8877334038470911, 0.8084042532069621, 0.8045680821762038]
y_train = [0.7766424210611557, 0.8210846773655833, 0.9996114311913593, 0.8041331063189883, 0.9980525368790883, 0.8164056182686034, 0.8925487603333683, 0.7758207470960685, 0.37345286573743475, 0.9325789202459493, 0.6060269037514895, 0.9319771743389491, 0.9990691225991941, 0.9320002808310418, 0.9992560731072977, 0.9980241561997089, 0.8882905258641204, 0.4678339275898943, 0.9312152374846061, 0.9542371205095945, 0.8885893668675711]
X_test = [0.9749191829308574, 0.8735366740730178, 0.8882783211709133, 0.8022891400991644, 0.8650601322313454, 0.8697902997857514, 1.0, 0.8165876695985228, 0.8923841531760973]
y_test = [0.975653685270635, 0.9096752789481569, 0.6653736469114154, 0.46367666660348744, 0.9991817903431941, 1.0, 0.9111205717076893, 0.5264993912088891, 0.9989199241685126]
X = [0.704619794270697, 0.77677964510883, 0.7211381534179618, 0.6478339581502052, 0.6779457393024553, 0.8588819357831449, 0.8045680821762038, 0.8320164648420931, 0.8650601322313454, 0.8697902997857514, 0.8236399177660375, 0.6878750931810429, 0.8923841531760973, 0.8692320257603844, 0.8877334038470911, 0.8735366740730178, 0.8207082120250023, 0.8022891400991644, 0.6303124926673513, 0.8084042532069621, 0.869691568491902, 0.9710222750072649, 0.9556331888763945, 0.8882783211709133, 0.8165876695985228, 0.6438319113259414, 0.8952188423349681, 0.9749191829308574, 1.0, 0.9640316662124185]
Y = [0.7766424210611557, 0.7758207470960685, 0.37345286573743475, 0.6060269037514895, 0.8210846773655833, 0.8041331063189883, 0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0, 0.4678339275898943, 0.8164056182686034, 0.9989199241685126, 0.9980525368790883, 0.9312152374846061, 0.9096752789481569, 0.9996114311913593, 0.46367666660348744, 0.9320002808310418, 0.9542371205095945, 0.9980241561997089, 0.9319771743389491, 0.8925487603333683, 0.6653736469114154, 0.5264993912088891, 0.9325789202459493, 0.9990691225991941, 0.975653685270635, 0.9111205717076893, 0.9992560731072977]
Where X contains the list of the x values and Y the corresponding y value. (X_test, y_test) and (X_train, y_train) are two (non overlapping) subset of (X, Y).
To predict and show the model results I simply use matplotlib (imported as plt):
predict_Y = model.predict(X)
plt.plot(X, Y, "ro", X, predict_Y, "bs")
plt.show()
Overfitted models are rarely useful in real life. It appears to me that OP is well aware of that but wants to see if NNs are indeed capable of fitting (bounded) arbitrary functions or not. On one hand, the input-output data in the example seems to obey no discernible pattern. On the other hand, both input and output are scalars in [0, 1] and there are only 21 data points in the training set.
Based on my experiments and results, we can indeed overfit as requested. See the image below.
Numerical results:
x y_true y_pred error
0 0.704620 0.776642 0.773753 -0.002889
1 0.677946 0.821085 0.819597 -0.001488
2 0.820708 0.999611 0.999813 0.000202
3 0.858882 0.804133 0.805160 0.001026
4 0.869232 0.998053 0.997862 -0.000190
5 0.687875 0.816406 0.814692 -0.001714
6 0.955633 0.892549 0.893117 0.000569
7 0.776780 0.775821 0.779289 0.003469
8 0.721138 0.373453 0.374007 0.000554
9 0.643832 0.932579 0.912565 -0.020014
10 0.647834 0.606027 0.607253 0.001226
11 0.971022 0.931977 0.931549 -0.000428
12 0.895219 0.999069 0.999051 -0.000018
13 0.630312 0.932000 0.930252 -0.001748
14 0.964032 0.999256 0.999204 -0.000052
15 0.869692 0.998024 0.997859 -0.000165
16 0.832016 0.888291 0.887883 -0.000407
17 0.823640 0.467834 0.460728 -0.007106
18 0.887733 0.931215 0.932790 0.001575
19 0.808404 0.954237 0.960282 0.006045
20 0.804568 0.888589 0.906829 0.018240
{'me': -0.00015776709314323828,
'mae': 0.00329163070145315,
'mse': 4.0713782563067185e-05,
'rmse': 0.006380735268216915}
OP's code seems good to me. My changes were minor:
Use deeper networks. It may not actually be necessary to use a depth of 30 layers but since we just want to overfit, I didn't experiment too much with what's the minimum depth needed.
Each Dense layer has 50 units. Again, this may be overkill.
Added batch normalization layer every 5th dense layer.
Decreased learning rate by half.
Ran optimization for longer using the all 21 training examples in a batch.
Used MAE as objective function. MSE is good but since we want to overfit, I want to penalize small errors the same way as large errors.
Random numbers are more important here because data appears to be arbitrary. Though, you should get similar results if you change random number seed and let the optimizer run long enough. In some cases, optimization does get stuck in a local minima and it would not produce overfitting (as requested by OP).
The code is below.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
# Set seed just to have reproducible results
np.random.seed(84)
tf.random.set_seed(84)
# Load data from the post
# https://stackoverflow.com/questions/61252785/how-to-overfit-data-with-keras
X_train = np.array([0.704619794270697, 0.6779457393024553, 0.8207082120250023,
0.8588819357831449, 0.8692320257603844, 0.6878750931810429,
0.9556331888763945, 0.77677964510883, 0.7211381534179618,
0.6438319113259414, 0.6478339581502052, 0.9710222750072649,
0.8952188423349681, 0.6303124926673513, 0.9640316662124185,
0.869691568491902, 0.8320164648420931, 0.8236399177660375,
0.8877334038470911, 0.8084042532069621,
0.8045680821762038])
Y_train = np.array([0.7766424210611557, 0.8210846773655833, 0.9996114311913593,
0.8041331063189883, 0.9980525368790883, 0.8164056182686034,
0.8925487603333683, 0.7758207470960685,
0.37345286573743475, 0.9325789202459493,
0.6060269037514895, 0.9319771743389491, 0.9990691225991941,
0.9320002808310418, 0.9992560731072977, 0.9980241561997089,
0.8882905258641204, 0.4678339275898943, 0.9312152374846061,
0.9542371205095945, 0.8885893668675711])
X_test = np.array([0.9749191829308574, 0.8735366740730178, 0.8882783211709133,
0.8022891400991644, 0.8650601322313454, 0.8697902997857514,
1.0, 0.8165876695985228, 0.8923841531760973])
Y_test = np.array([0.975653685270635, 0.9096752789481569, 0.6653736469114154,
0.46367666660348744, 0.9991817903431941, 1.0,
0.9111205717076893, 0.5264993912088891, 0.9989199241685126])
X = np.array([0.704619794270697, 0.77677964510883, 0.7211381534179618,
0.6478339581502052, 0.6779457393024553, 0.8588819357831449,
0.8045680821762038, 0.8320164648420931, 0.8650601322313454,
0.8697902997857514, 0.8236399177660375, 0.6878750931810429,
0.8923841531760973, 0.8692320257603844, 0.8877334038470911,
0.8735366740730178, 0.8207082120250023, 0.8022891400991644,
0.6303124926673513, 0.8084042532069621, 0.869691568491902,
0.9710222750072649, 0.9556331888763945, 0.8882783211709133,
0.8165876695985228, 0.6438319113259414, 0.8952188423349681,
0.9749191829308574, 1.0, 0.9640316662124185])
Y = np.array([0.7766424210611557, 0.7758207470960685, 0.37345286573743475,
0.6060269037514895, 0.8210846773655833, 0.8041331063189883,
0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0,
0.4678339275898943, 0.8164056182686034, 0.9989199241685126,
0.9980525368790883, 0.9312152374846061, 0.9096752789481569,
0.9996114311913593, 0.46367666660348744, 0.9320002808310418,
0.9542371205095945, 0.9980241561997089, 0.9319771743389491,
0.8925487603333683, 0.6653736469114154, 0.5264993912088891,
0.9325789202459493, 0.9990691225991941, 0.975653685270635,
0.9111205717076893, 0.9992560731072977])
# Reshape all data to be of the shape (batch_size, 1)
X_train = X_train.reshape((-1, 1))
Y_train = Y_train.reshape((-1, 1))
X_test = X_test.reshape((-1, 1))
Y_test = Y_test.reshape((-1, 1))
X = X.reshape((-1, 1))
Y = Y.reshape((-1, 1))
# Is data scaled? NNs do well with bounded data.
assert np.all(X_train >= 0) and np.all(X_train <= 1)
assert np.all(Y_train >= 0) and np.all(Y_train <= 1)
assert np.all(X_test >= 0) and np.all(X_test <= 1)
assert np.all(Y_test >= 0) and np.all(Y_test <= 1)
assert np.all(X >= 0) and np.all(X <= 1)
assert np.all(Y >= 0) and np.all(Y <= 1)
# Build a model with variable number of hidden layers.
# We will use Keras functional API.
# https://www.perfectlyrandom.org/2019/06/24/a-guide-to-keras-functional-api/
n_dense_layers = 30 # increase this to get more complicated models
# Define the layers first.
input_tensor = Input(shape=(1,), name='input')
layers = []
for i in range(n_dense_layers):
layers += [Dense(units=50, activation='relu', name=f'dense_layer_{i}')]
if (i > 0) & (i % 5 == 0):
# avg over batches not features
layers += [BatchNormalization(axis=1)]
sigmoid_layer = Dense(units=1, activation='sigmoid', name='sigmoid_layer')
# Connect the layers using Keras Functional API
mid_layer = input_tensor
for dense_layer in layers:
mid_layer = dense_layer(mid_layer)
output_tensor = sigmoid_layer(mid_layer)
model = Model(inputs=[input_tensor], outputs=[output_tensor])
optimizer = Adam(learning_rate=0.0005)
model.compile(optimizer=optimizer, loss='mae', metrics=['mae'])
model.fit(x=[X_train], y=[Y_train], epochs=40000, batch_size=21)
# Predict on various datasets
Y_train_pred = model.predict(X_train)
# Create a dataframe to inspect results manually
train_df = pd.DataFrame({
'x': X_train.reshape((-1)),
'y_true': Y_train.reshape((-1)),
'y_pred': Y_train_pred.reshape((-1))
})
train_df['error'] = train_df['y_pred'] - train_df['y_true']
print(train_df)
# A dictionary to store all the errors in one place.
train_errors = {
'me': np.mean(train_df['error']),
'mae': np.mean(np.abs(train_df['error'])),
'mse': np.mean(np.square(train_df['error'])),
'rmse': np.sqrt(np.mean(np.square(train_df['error']))),
}
print(train_errors)
# Make a plot to visualize true vs predicted
plt.figure(1)
plt.clf()
plt.plot(train_df['x'], train_df['y_true'], 'r.', label='y_true')
plt.plot(train_df['x'], train_df['y_pred'], 'bo', alpha=0.25, label='y_pred')
plt.grid(True)
plt.xlabel('x')
plt.ylabel('y')
plt.title(f'Train data. MSE={np.round(train_errors["mse"], 5)}.')
plt.legend()
plt.show(block=False)
plt.savefig('true_vs_pred.png')
A problem you may encountering is that you don't have enough training data for the model to be able to fit well. In your example, you only have 21 training instances, each with only 1 feature. Broadly speaking with neural network models, you need on the order of 10K or more training instances to produce a decent model.
Consider the following code that generates a noisy sine wave and tries to train a densely-connected feed-forward neural network to fit the data. My model has two linear layers, each with 50 hidden units and a ReLU activation function. The experiments are parameterized with the variable num_points which I will increase.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(7)
def generate_data(num_points=100):
X = np.linspace(0.0 , 2.0 * np.pi, num_points).reshape(-1, 1)
noise = np.random.normal(0, 1, num_points).reshape(-1, 1)
y = 3 * np.sin(X) + noise
return X, y
def run_experiment(X_train, y_train, X_test, batch_size=64):
num_points = X_train.shape[0]
model = keras.Sequential()
model.add(layers.Dense(50, input_shape=(1, ), activation='relu'))
model.add(layers.Dense(50, activation='relu'))
model.add(layers.Dense(1, activation='linear'))
model.compile(loss = "mse", optimizer = "adam", metrics=["mse"] )
history = model.fit(X_train, y_train, epochs=10,
batch_size=batch_size, verbose=0)
yhat = model.predict(X_test, batch_size=batch_size)
plt.figure(figsize=(5, 5))
plt.plot(X_train, y_train, "ro", markersize=2, label='True')
plt.plot(X_train, yhat, "bo", markersize=1, label='Predicted')
plt.ylim(-5, 5)
plt.title('N=%d points' % (num_points))
plt.legend()
plt.grid()
plt.show()
Here is how I invoke the code:
num_points = 100
X, y = generate_data(num_points)
run_experiment(X, y, X)
Now, if I run the experiment with num_points = 100, the model predictions (in blue) do a terrible job at fitting the true noisy sine wave (in red).
Now, here is num_points = 1000:
Here is num_points = 10000:
And here is num_points = 100000:
As you can see, for my chosen NN architecture, adding more training instances allows the neural network to better (over)fit the data.
If you do have a lot of training instances, then if you want to purposefully overfit your data, you can either increase the neural network capacity or reduce regularization. Specifically, you can control the following knobs:
increase the number of layers
increase the number of hidden units
increase the number of features per data instance
reduce regularization (e.g. by removing dropout layers)
use a more complex neural network architecture (e.g. transformer blocks instead of RNN)
You may be wondering if neural networks can fit arbitrary data rather than just a noisy sine wave as in my example. Previous research says that, yes, a big enough neural network can fit any data. See:
Universal approximation theorem. https://en.wikipedia.org/wiki/Universal_approximation_theorem
Zhang 2016, "Understanding deep learning requires rethinking generalization". https://arxiv.org/abs/1611.03530
As discussed in the comments, you should make a Python array (with NumPy) like this:-
Myarray = [[0.65, 1], [0.85, 0.5], ....]
Then you would just call those specific parts of the array whom you need to predict. Here the first value is the x-axis value. So you would call it to obtain the corresponding pair stored in Myarray
There are many resources to learn these types of things. some of them are ===>
https://www.geeksforgeeks.org/python-using-2d-arrays-lists-the-right-way/
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=video&cd=2&cad=rja&uact=8&ved=0ahUKEwjGs-Oxne3oAhVlwTgGHfHnDp4QtwIILTAB&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DQgfUT7i4yrc&usg=AOvVaw3LympYRszIYi6_OijMXH72

Conv nets accuracy not changing while loss decreases

I'm training several CNNs to do image classification in TensorFlow. The training losses decrease normally. However the test accuracy never changed throughout the whole training procedure, plus the accuracy is very low (0.014) where the accuracy for randomly guessing would be 0.003 (There are around 300 classes). One thing I've noticed is that only those models that I applied batch norm to showed such a weird behavior. What can possibly be wrong to cause this issue? The training set has 80000 samples, in case you might figure this was caused by overfitting. Below is part of the code for evaluation:
Accuracy function:
correct_prediction = tf.equal(tf.argmax(Model(test_image), 1), tf.argmax(test_image_label, 0))
accuracy = tf.cast(correct_prediction, tf.float32)
the test_image is a batch with only one sample in it while the test_image_label is a scalar.
Session:
with tf.Session() as sess:
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord, start=True)
print('variables initialized')
step = 0
for epoch in range(epochs):
sess.run(enqueue_train)
print('epoch: %d' %epoch)
if epoch % 5 == 0:
save_path = saver.save(sess, savedir + "/Model")
for batch in range(num_batch):
if step % 400 == 0:
summary_str = cost_summary.eval(feed_dict={phase: True})
file_writer.add_summary(summary_str, step)
else:
sess.run(train_step, feed_dict={phase: True})
step += 1
sess.run(train_close)
sess.run(enqueue_test)
accuracy_vector = []
for num in range(len(testnames)):
accuracy_vector.append(sess.run(accuracy, feed_dict={phase: False}))
mean_accuracy = sess.run(tf.divide(tf.add_n(accuracy_vector), len(testnames)))
print("test accuracy %g"%mean_accuracy)
sess.run(test_close)
save_path = saver.save(sess, savedir + "/Model_final")
coord.request_stop()
coord.join(threads)
file_writer.close()
The phase above is to indicate if it is training or testing for the batch norm layer.
Note that I tried to calculate the accuracy with the training set, which led to the minimal loss. However it gives the same poor accuracy. Please help me, I really appreciate it!

How to use a fixed validation set (not K-fold cross validation) in Scikit-learn for a decision tree classifier/random forest classifier?

I am new to machine learning and data science. Sorry, if it is a very stupid question.
I see there is an inbuilt function for cross-validation but not for a fixed validation set. I have a dataset with 50,000 samples labeled with years from 1990 to 2010. I need to train different classifiers on 1990-2008 samples, then validate on 2009 samples, and test on 2010 samples.
EDIT:
After #Quan Tran's answer, I tried this. This is how it should be?
# Fit a decision tree
estimator1 = DecisionTreeClassifier( max_depth = 9, max_leaf_nodes=9)
estimator1.fit(X_train, y_train)
print estimator1
# validate using validation set
acc = np.zeros((20,20)) # store accuracy
for i in range(20):
for j in range(20):
estimator1 = DecisionTreeClassifier(max_depth = i+1, max_leaf_nodes=j+2)
estimator1.fit(X_valid, y_valid)
y_pred = estimator1.predict(X_valid)
acc[i,j] = accuracy_score(y_valid, y_pred)
best_mod = np.where(acc == acc.max())
print best_mod
print acc[best_mod]
# Predict target values
estimator1 = DecisionTreeClassifier(max_depth = int(best_mod[0]) + 1, max_leaf_nodes= int(best_mod[1]) + 2)
estimator1.fit(X_valid, y_valid)
y_pred = estimator1.predict(X_test)
confusion = metrics.confusion_matrix(y_test, y_pred)
TP = confusion[1, 1]
TN = confusion[0, 0]
FP = confusion[0, 1]
FN = confusion[1, 0]
# Classification Accuracy
print "======= ACCURACY ========"
print((TP + TN) / float(TP + TN + FP + FN))
print accuracy_score(y_valid, y_pred)
# store the predicted probabilities for class
y_pred_prob = estimator1.predict_proba(X_test)[:, 1]
# plot a ROC curve for y_test and y_pred_prob
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred_prob)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve for DecisionTreeClassifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)
print("======= AUC ========")
print(metrics.roc_auc_score(y_test, y_pred_prob))
I get this answer, which is not the best accuracy.
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=9,
max_features=None, max_leaf_nodes=9, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
(array([5]), array([19]))
[ 0.8489011]
======= ACCURACY ========
0.574175824176
0.538461538462
======= AUC ========
0.547632099893
In this case, there are three separate sets. The train set, the test set and the validation set.
The train set is used to fit the parameters of the classifier. For example:
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(trainfeatures, labels)
The validation set is used to tune the hyper parameters of the classifier or find the cutoff point for the training procedure. For example, in the case of Decision tree, max_depth is a hyper parameter. You will need to find a good set of hyper parameters by experimenting with different values of hyper parameters (tuning) and compare the performance measures (accuracy/precision,..) on the validation set.
The test set is used to estimate the error rate on unseen data. After having the performance measures on the test set, the model must not be trained/tuned any further.

Resources