This question already has an answer here:
TensorFlow Only running on 1/32 of the Training data provided [duplicate]
I'm trying to train a model on MNIST.
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print(x_train.shape)
What I got is (60000, 28, 28), so there are 60,000 items in the dataset.
Then, I create the model with the following code.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
However, I got only 1875 items for each epoch.
2020-06-02 04:33:45.706474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-06-02 04:33:45.706617: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-06-02 04:33:47.437837: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2020-06-02 04:33:47.437955: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-02 04:33:47.441329: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441480: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441876: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-02 04:33:47.448274: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27fc6b2c210 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-02 04:33:47.448427: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/5
1875/1875 [==============================] - 1s 664us/step - loss: 0.2971 - accuracy: 0.9140
Epoch 2/5
1875/1875 [==============================] - 1s 661us/step - loss: 0.1421 - accuracy: 0.9582
Epoch 3/5
1875/1875 [==============================] - 1s 684us/step - loss: 0.1068 - accuracy: 0.9675
Epoch 4/5
1875/1875 [==============================] - 1s 695us/step - loss: 0.0868 - accuracy: 0.9731
Epoch 5/5
1875/1875 [==============================] - 1s 682us/step - loss: 0.0764 - accuracy: 0.9762
Process finished with exit code 0
You are using the whole dataset, no worries!
According to the Keras documentation (https://github.com/keras-team/keras/blob/master/keras/engine/training.py), when you call model.fit without specifying a batch size, it defaults to 32:
batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.
It means that each epoch has 1875 steps, and in each step your model takes 32 examples into account. And guess what, 1875 * 32 equals 60,000.
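You can also make the batch size explicit yourself; as a rough sketch (the call below is just the question's fit call with the default spelled out):
import math
batch_size = 32                                         # the Keras default when batch_size is not given
print(math.ceil(60000 / batch_size))                    # 1875 steps per epoch, matching the progress bar
model.fit(x_train, y_train, epochs=5, batch_size=32)    # same training run, batch size spelled out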
Related
I am using Ubuntu 20.04 on WSL2 running on Windows 11. The code to execute is as follows:
import tensorflow as tf
from tensorflow import keras
from keras.layers.convolutional import Conv2D, MaxPooling2D
import numpy as np
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(2500, input_shape=(784,), activation='relu'),
    keras.layers.Dense(2000, activation='relu'),
    keras.layers.Dense(1500, activation='relu'),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=500)
If I run the code on CPU the output is as follows:
Epoch 205/500
1875/1875 [==============================] - 80s 43ms/step - loss: 0.1887 - accuracy: 0.9466
Epoch 206/500
1875/1875 [==============================] - 79s 42ms/step - loss: 0.3433 - accuracy: 0.9484
Epoch 207/500
1875/1875 [==============================] - 79s 42ms/step - loss: 0.1987 - accuracy: 0.9690
Epoch 208/500
1875/1875 [==============================] - 80s 43ms/step - loss: 0.2632 - accuracy: 0.9582
But if I run the same code inside a Docker container (tensorflow/tensorflow:latest-gpu-py3-jupyter), the output is as follows:
Epoch 205/500
60000/60000 [==============================] - 45s 752us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 206/500
60000/60000 [==============================] - 45s 749us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 207/500
60000/60000 [==============================] - 45s 749us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 208/500
60000/60000 [==============================] - 45s 745us/sample - loss: 9.5371 - accuracy: 0.0987
The accuracy is constant.
The installation was made based on:
https://www.youtube.com/watch?v=CO43b6XWHNI
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
In the installation process I did not get any error.
Another thing: compiling the model on the GPU takes a very long time (more than 5 minutes).
Thanks in advance for any help/idea.
You should decrease the number of neurons and layers, and use softmax in the final Dense(10) layer, since you have a multiclass output.
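As a rough sketch of that suggestion (the layer sizes here are only illustrative, not a tested configuration), the model could be slimmed down to something like:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(512, activation='relu'),    # far fewer units than 2500
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')   # softmax for 10 mutually exclusive classes
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])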
I'm building a Keras model to predict whether a user will select a certain product or not (binary classification).
The model seems to be making progress on the validation set that is held out during training, but its predictions are all 0s on the test set.
My dataset looks something like this:
train_dataset
customer_id id target customer_num_id
0 TCHWPBT 4 0 1
1 TCHWPBT 13 0 1
2 TCHWPBT 20 0 1
3 TCHWPBT 23 0 1
4 TCHWPBT 28 0 1
... ... ... ... ...
1631695 D4Q7TMM 849 0 7417
1631696 D4Q7TMM 855 0 7417
1631697 D4Q7TMM 856 0 7417
1631698 D4Q7TMM 858 0 7417
1631699 D4Q7TMM 907 0 7417
I split it into Train/Val sets using:
from sklearn.model_selection import train_test_split
Train, Val = train_test_split(train_dataset, test_size=0.1, random_state=42, shuffle=False)
After I split the dataset, I select the features that are used when training and validating the model:
train_customer_id = Train['customer_num_id']
train_vendor_id = Train['id']
train_target = Train['target']
val_customer_id = Val['customer_num_id']
val_vendor_id = Val['id']
val_target = Val['target']
... And run the model:
from sklearn.metrics import f1_score
epochs = 2
for e in range(epochs):
    print('EPOCH: ', e)
    model.fit([train_customer_id, train_vendor_id], train_target, epochs=1, verbose=1, batch_size=384)
    prediction = model.predict(x=[train_customer_id, train_vendor_id], verbose=1, batch_size=384)
    train_f1 = f1_score(y_true=train_target.astype('float32'), y_pred=prediction.round())
    print('TRAIN F1: ', train_f1)
    val_prediction = model.predict(x=[val_customer_id, val_vendor_id], verbose=1, batch_size=384)
    val_f1 = f1_score(y_true=val_target.astype('float32'), y_pred=val_prediction.round())
    print('VAL F1: ', val_f1)
EPOCH: 0
1468530/1468530 [==============================] - 19s 13us/step - loss: 0.0891
TRAIN F1: 0.1537511577647422
VAL F1: 0.09745762711864409
EPOCH: 1
1468530/1468530 [==============================] - 19s 13us/step - loss: 0.0691
TRAIN F1: 0.308748569645272
VAL F1: 0.2076433121019108
The validation F1 seems to be improving over time, and the model predicts both 1s and 0s:
prediction = model.predict(x=[val_customer_id, val_vendor_id], verbose=1, batch_size=384)
np.unique(prediction.round())
array([0., 1.], dtype=float32)
But when I try predict the test set, model predicts 0 for all values:
prediction = model.predict(x=[test_dataset['customer_num_id'], test_dataset['id']], verbose=1, batch_size=384)
np.unique(prediction.round())
array([0.], dtype=float32)
The test dataset looks similar to the training and validation sets, and it has been left out during training just like the validation set, yet the model never outputs values other than 0.
Here's what test dataset looks like:
test_dataset
customer_id id customer_num_id
0 Z59FTQD 243 7418
1 0JP29SK 243 7419
... ... ... ...
1671995 L9G4OFV 907 17414
1671996 L9G4OFV 907 17414
1671997 FDZFYBA 907 17415
Does anyone know what might be the issue here?
EDIT: made dataset text more readable
Please take a look at the distribution of your data. In the sample data you've shown, the target is all 0s. Consider that if most users don't select the product, then a model that always predicts 0 will be right most of the time. So it could be improving its accuracy by over-fitting to the majority class (0).
You can reduce over-fitting by adjusting parameters like the learning rate, or by changing the model architecture, for example by adding dropout layers.
Also, I'm not sure what your model looks like, but you're only training for 2 epochs, so it may not have had enough time to learn to generalize; depending on how deep your model is, it could need a lot more training time.
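As a rough sketch of that advice (not a tested fix; the weights below are placeholders you would compute from your own class counts), you could check how skewed the target is and pass class weights to model.fit:
# Check how imbalanced the target is (column name taken from the question's dataset)
print(train_dataset['target'].value_counts(normalize=True))
# One common mitigation: up-weight the rare positive class during training.
class_weight = {0: 1.0, 1: 10.0}   # illustrative values only
model.fit([train_customer_id, train_vendor_id], train_target,
          epochs=1, verbose=1, batch_size=384,
          class_weight=class_weight)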
I'm trying to apply a pretrained HuggingFace ALBERT transformer model to my own text classification task, but the loss is not decreasing beyond a certain point.
Here's my code:
There are four labels in my text classification dataset which are:
0, 1, 2, 3
Define the tokenizer
maxlen=25
albert_path = 'albert-large-v1'
from transformers import AlbertTokenizer, TFAlbertModel, AlbertConfig
tokenizer = AlbertTokenizer.from_pretrained(albert_path, do_lower_case=True, add_special_tokens=True,
max_length=maxlen, pad_to_max_length=True)
Encode all sentences in text, using the tokenizer
encodings = []
for t in text:
encodings.append(tokenizer.encode(t, max_length=maxlen, pad_to_max_length=True, add_special_tokens=True))
Define the pretrained transformer model and add Dense layer on top
import tensorflow as tf   # needed for tf.keras.optimizers below
import numpy as np        # needed for np.asarray further down
from tensorflow.keras.layers import Input, Flatten, Dropout, Dense
from tensorflow.keras import Model
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
token_inputs = Input((maxlen), dtype=tf.int32, name='input_word_ids')
config = AlbertConfig(num_labels=4, dropout=0.2, attention_dropout=0.2)
albert_model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path=albert_path, config=config)
X = albert_model(token_inputs)[1]
X = Dropout(0.2)(X)
output_= Dense(4, activation='softmax', name='output')(X)
bert_model2 = Model(token_inputs,output_)
print(bert_model2.summary())
bert_model2.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
Finally, feed the encoded text and labels to the model
encodings = np.asarray(encodings)
labels = np.asarray(labels)
bert_model2.fit(x=encodings, y = labels, epochs=20, batch_size=128)
Epoch 11/20
5/5 [==============================] - 2s 320ms/step - loss: 1.2923
Epoch 12/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2412
Epoch 13/20
5/5 [==============================] - 2s 322ms/step - loss: 1.3118
Epoch 14/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2531
Epoch 15/20
5/5 [==============================] - 2s 318ms/step - loss: 1.2825
Epoch 16/20
5/5 [==============================] - 2s 322ms/step - loss: 1.2479
Epoch 17/20
5/5 [==============================] - 2s 321ms/step - loss: 1.2623
Epoch 18/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2576
Epoch 19/20
5/5 [==============================] - 2s 321ms/step - loss: 1.3143
Epoch 20/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2716
Loss has decreased from 6 to around 1.23 but doesn't seem to decrease any further, even after 30+ epochs.
What am I doing wrong?
All advice is greatly appreciated!
You can try using the SGD optimizer.
Introduce batch normalization.
Try adding a few (non-pretrained) layers on top of the ALBERT layer.
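As a rough sketch of those three suggestions applied to the model from the question (the extra layer size and the learning rate are only illustrative):
from tensorflow.keras.layers import BatchNormalization
X = albert_model(token_inputs)[1]                   # pooled ALBERT output, as in the question
X = BatchNormalization()(X)                         # batch normalization
X = Dense(256, activation='relu')(X)                # extra non-pretrained layer on top
X = Dropout(0.2)(X)
output_ = Dense(4, activation='softmax', name='output')(X)
bert_model2 = Model(token_inputs, output_)
sgd = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)   # SGD instead of Adam
bert_model2.compile(optimizer=sgd, loss='sparse_categorical_crossentropy')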
I'm somewhat new to machine learning in general, and I wanted to run a simple experiment to get more familiar with neural network autoencoders: make an extremely basic autoencoder that learns the identity function.
I'm using Keras to make life easier, so I did this first to make sure it works:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# Weights are given as [weights, biases], so we give
# the identity matrix for the weights and a vector of zeros for the biases
weights = [np.diag(np.ones(84)), np.zeros(84)]
model = Sequential([Dense(84, input_dim=84, weights=weights)])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=10, batch_size=8, validation_split=0.3)
As expected, the loss is zero, both in train and validation data:
Epoch 1/10
97535/97535 [==============================] - 27s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 2/10
97535/97535 [==============================] - 28s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Then I tried to do the same but without initializing the weights to the identity function, expecting that after a while of training it would learn it. It didn't. I've let it run for 200 epochs various times in different configurations, playing with different optimizers, loss functions, and adding L1 and L2 activity regularizers. The results vary, but the best I've got is still really bad, looking nothing like the original data, just being kinda in the same numeric range.
The data is simply some numbers oscillating around 1.1. I don't know if an activation layer makes sense for this problem; should I be using one?
If this "neural network" of one layer can't learn something as simple as the identity function, how can I expect it to learn anything more complex? What am I doing wrong?
EDIT
To have better context, here's a way to generate a dataset very similar to the one I'm using:
X = np.random.normal(1.1090579, 0.0012380764, (139336, 84))
I suspect that the variations between the values might be too small. The loss ends up at decent values (around 1e-6), but that's not enough precision for the result to have a shape similar to the original data. Maybe I should scale/normalize it somehow? Thanks for any advice!
UPDATE
In the end, as suggested, the issue was that the dataset had too small variations between the 84 values, so the resulting prediction was actually pretty good in absolute terms (loss-wise), but compared to the original data the variations were far off. I solved it by normalizing the 84 values in each sample around the sample's mean and dividing by the sample's standard deviation, then using the original mean and standard deviation to denormalize the predictions at the other end. I guess this could be done in a few different ways, but I did it by adding the normalization/denormalization into the model itself, using some Lambda layers that operate on the tensors. That way all the data processing is incorporated into the model, which made it nicer to work with. Let me know if you would like to see the actual code.
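Roughly, that Lambda-layer wiring can look like the sketch below (a minimal reconstruction written against tf.keras; the epsilon and the single Dense layer are illustrative, not the exact code):
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model
inp = Input(shape=(84,))
# Per-sample mean and standard deviation, kept as tensors so the model can undo the scaling later.
mean = Lambda(lambda x: tf.reduce_mean(x, axis=1, keepdims=True))(inp)
std = Lambda(lambda x: tf.math.reduce_std(x, axis=1, keepdims=True))(inp)
# Normalize, apply the Dense layer, then denormalize with the same statistics.
normed = Lambda(lambda t: (t[0] - t[1]) / (t[2] + 1e-7))([inp, mean, std])
hidden = Dense(84)(normed)
out = Lambda(lambda t: t[0] * (t[2] + 1e-7) + t[1])([hidden, mean, std])
autoencoder = Model(inp, out)
autoencoder.compile(optimizer='sgd', loss='mean_squared_error')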
I believe the problem could be either the number of epochs or the way you initialize X.
I ran your code with an X of my own for 100 epochs and printed the argmax() and max values of the weights; it gets really close to the identity function.
I'm adding the code snippet that I used:
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import random
import pandas as pd
X = np.array([[random.random() for r in xrange(84)] for i in xrange(1,100000)])
model = Sequential([Dense(84, input_dim=84)], name="layer1")
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=100, batch_size=80, validation_split=0.3)
l_weights = np.round(model.layers[0].get_weights()[0],3)
print l_weights.argmax(axis=0)
print l_weights.max(axis=0)
And I'm getting:
Train on 69999 samples, validate on 30000 samples
Epoch 1/100
69999/69999 [==============================] - 1s - loss: 0.2092 - val_loss: 0.1564
Epoch 2/100
69999/69999 [==============================] - 1s - loss: 0.1536 - val_loss: 0.1510
Epoch 3/100
69999/69999 [==============================] - 1s - loss: 0.1484 - val_loss: 0.1459
.
.
.
Epoch 98/100
69999/69999 [==============================] - 1s - loss: 0.0055 - val_loss: 0.0054
Epoch 99/100
69999/69999 [==============================] - 1s - loss: 0.0053 - val_loss: 0.0053
Epoch 100/100
69999/69999 [==============================] - 1s - loss: 0.0051 - val_loss: 0.0051
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83]
[ 0.85000002 0.85100001 0.79799998 0.80500001 0.82700002 0.81900001
0.792 0.829 0.81099999 0.80800003 0.84899998 0.829 0.852
0.79500002 0.84100002 0.81099999 0.792 0.80800003 0.85399997
0.82999998 0.85100001 0.84500003 0.847 0.79699999 0.81400001
0.84100002 0.81 0.85100001 0.80599999 0.84500003 0.824
0.81999999 0.82999998 0.79100001 0.81199998 0.829 0.85600001
0.84100002 0.792 0.847 0.82499999 0.84500003 0.796
0.82099998 0.81900001 0.84200001 0.83999997 0.815 0.79500002
0.85100001 0.83700001 0.85000002 0.79900002 0.84100002 0.79699999
0.838 0.847 0.84899998 0.83700001 0.80299997 0.85399997
0.84500003 0.83399999 0.83200002 0.80900002 0.85500002 0.83899999
0.79900002 0.83399999 0.81 0.79100001 0.81800002 0.82200003
0.79100001 0.83700001 0.83600003 0.824 0.829 0.82800001
0.83700001 0.85799998 0.81999999 0.84299999 0.83999997]
When I used only 5 numbers as an input and printed the actual weights I got this:
array([[ 1., 0., -0., 0., 0.],
[ 0., 1., 0., -0., -0.],
[-0., 0., 1., 0., 0.],
[ 0., -0., 0., 1., -0.],
[ 0., -0., 0., -0., 1.]], dtype=float32)
When I run the code below:
from keras.models import Sequential
from keras.layers import Dense
import numpy
import time
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("C:/Users/AQader/Desktop/Keraslearn/mammm.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:5]
Y = dataset[:,5]
# create model
model = Sequential()
model.add(Dense(50, input_dim=5, init='uniform', activation='relu'))
model.add(Dense(25, init='uniform', activation='tanh'))
model.add(Dense(15, init='uniform', activation='tanh'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, nb_epoch=200, batch_size=20, verbose = 0)
time.sleep(0.1)
# evaluate the model
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
I end up receiving only the following:
32/829 [>.............................] - ETA: 0sacc: 84.20%
That's it: only one line, which shows up after about half a minute of training. From looking through other questions, the usual output looks like:
Epoch 1/20
1213/1213 [==============================] - 0s - loss: 0.1760
Epoch 2/20
1213/1213 [==============================] - 0s - loss: 0.1840
Epoch 3/20
1213/1213 [==============================] - 0s - loss: 0.1816
Epoch 4/20
1213/1213 [==============================] - 0s - loss: 0.1915
Epoch 5/20
1213/1213 [==============================] - 0s - loss: 0.1928
Epoch 6/20
1213/1213 [==============================] - 0s - loss: 0.1964
Epoch 7/20
1213/1213 [==============================] - 0s - loss: 0.1948
Epoch 8/20
1213/1213 [==============================] - 0s - loss: 0.1971
Epoch 9/20
1213/1213 [==============================] - 0s - loss: 0.1899
Epoch 10/20
1213/1213 [==============================] - 0s - loss: 0.1957
Can anyone tell me what may be wrong here? I'm a beginner at this, yet this doesn't seem normal. Please note that there are no errors in the code itself; what I mean is that the "0sacc" line is all that shows up. I'm running this in the Anaconda environment with Python 2.7 on a Windows 7 64-bit machine, with 8 GB RAM and a 5th-gen Core i5.
By calling model.fit with verbose=0 you have suppressed the detailed per-epoch output. Try setting verbose=1.
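For example, a minimal change to the fit call from the question:
# verbose=1 shows a progress bar with per-epoch loss/accuracy; verbose=2 prints one line per epoch.
model.fit(X, Y, nb_epoch=200, batch_size=20, verbose=1)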