TensorFlow using GPU on WSL2 is not learning - Docker

I am using Ubuntu 20.04 on WSL2, running on Windows 11. The code I am running is as follows:
import tensorflow as tf
from tensorflow import keras
from keras.layers.convolutional import Conv2D, MaxPooling2D
import numpy as np
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(2500, input_shape=(784,), activation='relu'),
    keras.layers.Dense(2000, activation='relu'),
    keras.layers.Dense(1500, activation='relu'),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])
model.fit(X_train, y_train, epochs=500)
If I run the code on CPU the output is as follows:
Epoch 205/500
1875/1875 [==============================] - 80s 43ms/step - loss: 0.1887 - accuracy: 0.9466
Epoch 206/500
1875/1875 [==============================] - 79s 42ms/step - loss: 0.3433 - accuracy: 0.9484
Epoch 207/500
1875/1875 [==============================] - 79s 42ms/step - loss: 0.1987 - accuracy: 0.9690
Epoch 208/500
1875/1875 [==============================] - 80s 43ms/step - loss: 0.2632 - accuracy: 0.9582
But if I run the same code in a Docker container (tensorflow/tensorflow:latest-gpu-py3-jupyter), the output is as follows:
Epoch 205/500
60000/60000 [==============================] - 45s 752us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 206/500
60000/60000 [==============================] - 45s 749us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 207/500
60000/60000 [==============================] - 45s 749us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 208/500
60000/60000 [==============================] - 45s 745us/sample - loss: 9.5371 - accuracy: 0.0987
The accuracy is constant.
The installation was made based on:
I did not get any errors during the installation.
Another thing is that compiling the model on the GPU takes a very long time (more than 5 minutes).
Thanks in advance for any help/idea.

You should reduce the number of layers and neurons, and use softmax instead of sigmoid in the final Dense(10) layer, since this is a multiclass problem with exactly one correct class per image.
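To illustrate the softmax suggestion, here is a minimal sketch in plain Python (no TensorFlow needed; the logit values are made up for illustration). It shows that independent sigmoids do not produce a probability distribution over the classes, while softmax does:

```python
import math

def sigmoid(z):
    """Squash a single logit into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class problem (illustrative values only).
logits = [2.0, 1.0, 0.1]

sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print("sigmoid outputs:", sig, "sum =", sum(sig))
print("softmax outputs:", soft, "sum =", sum(soft))
```

With softmax, raising the score of one class necessarily lowers the others, which matches a task where each image has a single label.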


ALBERT not converging - HuggingFace

I'm trying to apply a pretrained HuggingFace ALBERT transformer model to my own text classification task, but the loss is not decreasing beyond a certain point.
Here's my code:
There are four labels in my text classification dataset which are:
0, 1, 2, 3
Define the tokenizer
albert_path = 'albert-large-v1'
from transformers import AlbertTokenizer, TFAlbertModel, AlbertConfig
tokenizer = AlbertTokenizer.from_pretrained(albert_path, do_lower_case=True, add_special_tokens=True,
                                            max_length=maxlen, pad_to_max_length=True)
Encode all sentences in text, using the tokenizer
encodings = []
for t in text:
    encodings.append(tokenizer.encode(t, max_length=maxlen, pad_to_max_length=True, add_special_tokens=True))
Define the pretrained transformer model and add Dense layer on top
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dropout, Dense
from tensorflow.keras import Model
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
token_inputs = Input((maxlen,), dtype=tf.int32, name='input_word_ids')
config = AlbertConfig(num_labels=4, dropout=0.2, attention_dropout=0.2)
albert_model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path=albert_path, config=config)
X = albert_model(token_inputs)[1]
X = Dropout(0.2)(X)
output_ = Dense(4, activation='softmax', name='output')(X)
bert_model2 = Model(token_inputs, output_)
bert_model2.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
Finally, feed the encoded text and labels to the model
import numpy as np
encodings = np.asarray(encodings)
labels = np.asarray(labels)
bert_model2.fit(x=encodings, y=labels, epochs=20, batch_size=128)
Epoch 11/20
5/5 [==============================] - 2s 320ms/step - loss: 1.2923
Epoch 12/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2412
Epoch 13/20
5/5 [==============================] - 2s 322ms/step - loss: 1.3118
Epoch 14/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2531
Epoch 15/20
5/5 [==============================] - 2s 318ms/step - loss: 1.2825
Epoch 16/20
5/5 [==============================] - 2s 322ms/step - loss: 1.2479
Epoch 17/20
5/5 [==============================] - 2s 321ms/step - loss: 1.2623
Epoch 18/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2576
Epoch 19/20
5/5 [==============================] - 2s 321ms/step - loss: 1.3143
Epoch 20/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2716
Loss has decreased from 6 to around 1.23 but doesn't seem to decrease any further, even after 30+ epochs.
What am I doing wrong?
All advice is greatly appreciated!
You can try using the SGD optimizer.
Introduce batch normalization.
Try adding a few (non-pretrained) layers on top of the ALBERT layer.
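As a point of reference when reading the loss values in the question: a classifier that ignores its input and predicts a uniform distribution over 4 classes has a cross-entropy of ln(4), roughly 1.386. The sketch below computes that baseline in plain Python. The only inputs are the number of classes and the last loss value from the log above; the comparison is an added sanity check, not part of the original answer.

```python
import math

num_classes = 4

# Cross-entropy of a model that assigns probability 1/num_classes to every class.
uniform_loss = -math.log(1.0 / num_classes)

# Loss reported for epoch 20/20 in the training log above.
reported_loss = 1.2716

gap = uniform_loss - reported_loss
print(f"uniform-guess loss: {uniform_loss:.4f}")
print(f"reported loss:      {reported_loss:.4f}")
print(f"improvement:        {gap:.4f}")
```

A loss that plateaus only slightly below this baseline may indicate that the model has learned little beyond the class distribution, which is worth checking before tuning the optimizer.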

Tensorflow model.train() not looping through all data [duplicate]

I'm trying to train a model for mnist.
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
The shape of x_train is (60000, 28, 28), so there are 60,000 items in the data set.
Then, I create the model with the following code.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.fit(x_train, y_train, epochs=5)
However, I got only 1875 items for each epoch.
2020-06-02 04:33:45.706474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-06-02 04:33:45.706617: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-06-02 04:33:47.437837: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2020-06-02 04:33:47.437955: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-02 04:33:47.441329: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441480: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441876: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-02 04:33:47.448274: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27fc6b2c210 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-02 04:33:47.448427: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/5
1875/1875 [==============================] - 1s 664us/step - loss: 0.2971 - accuracy: 0.9140
Epoch 2/5
1875/1875 [==============================] - 1s 661us/step - loss: 0.1421 - accuracy: 0.9582
Epoch 3/5
1875/1875 [==============================] - 1s 684us/step - loss: 0.1068 - accuracy: 0.9675
Epoch 4/5
1875/1875 [==============================] - 1s 695us/step - loss: 0.0868 - accuracy: 0.9731
Epoch 5/5
1875/1875 [==============================] - 1s 682us/step - loss: 0.0764 - accuracy: 0.9762
Process finished with exit code 0
You are using the whole data set, no worries!
According to the Keras documentation (source: https://github.com/keras-team/keras/blob/master/keras/engine/training.py), when you call model.fit without specifying the batch size, it defaults to 32:
batch_size: Integer or NULL. Number of samples per gradient update. If unspecified, batch_size will default to 32.
This means that the 1875 shown for each epoch is the number of steps (batches), not the number of samples. In each step your model processes 32 examples, and 1875 * 32 is equal to 60,000.
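The arithmetic can be checked with a few lines of plain Python. The helper name steps_for is made up for this sketch; rounding up reflects that a final, smaller batch still counts as one step.

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

num_samples = 60000   # size of the MNIST training set
batch_size = 32       # default batch size when none is passed to model.fit

steps_per_epoch = steps_for(num_samples, batch_size)
print(steps_per_epoch)                 # number shown in the progress bar
print(steps_per_epoch * batch_size)    # samples covered per epoch

# With a larger batch size the progress bar would show fewer steps.
print(steps_for(num_samples, 128))
```

Passing an explicit batch_size to model.fit changes the step count in the progress bar, but every epoch still covers all 60,000 samples.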

Why normalizing labels in MxNet makes accuracy close to 100%?

I am training a model using multi-label logistic regression on MxNet (gluon api) as described here: multi-label logit in gluon
My custom dataset has 13 features and one label of shape [,6].
My features are normalized from their original values to [0,1].
I use a simple dense neural net with 2 hidden layers.
I noticed that when I don't normalize the labels (which take the discrete values 1,2,3,4,5,6; mapping the categorical values to these numbers was purely my choice), the training process slowly converges to some minimum, for example:
Epoch: 0, ela: 8.8 sec, Loss: 1.118188, Train_acc 0.5589, Test_acc 0.5716
Epoch: 1, ela: 9.6 sec, Loss: 0.916276, Train_acc 0.6107, Test_acc 0.6273
Epoch: 2, ela: 10.3 sec, Loss: 0.849386, Train_acc 0.6249, Test_acc 0.6421
Epoch: 3, ela: 9.2 sec, Loss: 0.828530, Train_acc 0.6353, Test_acc 0.6304
Epoch: 4, ela: 9.3 sec, Loss: 0.824667, Train_acc 0.6350, Test_acc 0.6456
Epoch: 5, ela: 9.3 sec, Loss: 0.817131, Train_acc 0.6375, Test_acc 0.6455
Epoch: 6, ela: 10.6 sec, Loss: 0.815046, Train_acc 0.6386, Test_acc 0.6333
Epoch: 7, ela: 9.4 sec, Loss: 0.811139, Train_acc 0.6377, Test_acc 0.6289
Epoch: 8, ela: 9.2 sec, Loss: 0.808038, Train_acc 0.6381, Test_acc 0.6484
Epoch: 9, ela: 9.2 sec, Loss: 0.806301, Train_acc 0.6405, Test_acc 0.6485
Epoch: 10, ela: 9.4 sec, Loss: 0.804517, Train_acc 0.6433, Test_acc 0.6354
Epoch: 11, ela: 9.1 sec, Loss: 0.803954, Train_acc 0.6389, Test_acc 0.6280
Epoch: 12, ela: 9.3 sec, Loss: 0.803837, Train_acc 0.6426, Test_acc 0.6495
Epoch: 13, ela: 9.1 sec, Loss: 0.801444, Train_acc 0.6424, Test_acc 0.6328
Epoch: 14, ela: 9.4 sec, Loss: 0.799847, Train_acc 0.6445, Test_acc 0.6380
Epoch: 15, ela: 9.1 sec, Loss: 0.795130, Train_acc 0.6454, Test_acc 0.6471
However, when I normalize the labels and train again, I get this weird result showing 99.99% accuracy on both training and testing:
Epoch: 0, ela: 12.3 sec, Loss: 0.144049, Train_acc 0.9999, Test_acc 0.9999
Epoch: 1, ela: 12.7 sec, Loss: 0.023632, Train_acc 0.9999, Test_acc 0.9999
Epoch: 2, ela: 12.3 sec, Loss: 0.013996, Train_acc 0.9999, Test_acc 0.9999
Epoch: 3, ela: 12.7 sec, Loss: 0.010092, Train_acc 0.9999, Test_acc 0.9999
Epoch: 4, ela: 12.7 sec, Loss: 0.007964, Train_acc 0.9999, Test_acc 0.9999
Epoch: 5, ela: 12.6 sec, Loss: 0.006623, Train_acc 0.9999, Test_acc 0.9999
Epoch: 6, ela: 12.6 sec, Loss: 0.005700, Train_acc 0.9999, Test_acc 0.9999
Epoch: 7, ela: 12.4 sec, Loss: 0.005026, Train_acc 0.9999, Test_acc 0.9999
Epoch: 8, ela: 12.6 sec, Loss: 0.004512, Train_acc 0.9999, Test_acc 0.9999
How is this possible? Why does normalizing the labels affect training accuracy in such a way?
The tutorial you linked to does multiclass classification. In multiclass classification, the label for an example is a one-hot array. For example, the label [0 0 1 0] means this example belongs to class 2 (assuming classes start at 0). Normalizing this vector does not make sense because the values are already between 0 and 1. Also, in multiclass classification, only one of the labels can be true and the others have to be false. Values other than 0 and 1 do not make sense in multiclass classification.
When representing a batch of examples, it is common to write the labels as integers instead of one-hot arrays for easier readability. For example, the label [4 6 1 7] means the first example belongs to class 4, the second example belongs to class 6, and so on. Normalizing this representation also does not make sense, because it is internally converted to a one-hot array.
Now, if you normalize the second representation, the behavior is undefined because floating-point values cannot be array indices. It is possible that something weird is happening to give you the 99% accuracy. Maybe you normalized the values to the range 0 to 1 and the resulting one-hot arrays mostly point to class 0 and rarely to class 1. That could give you a 99% accuracy.
I would suggest not normalizing the labels.
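The following plain-Python sketch illustrates the point. It first converts integer labels to one-hot arrays, then shows what could happen if the labels were min-max normalized and later truncated to integers. The truncation step is an assumption made for illustration; how MXNet actually treats floating-point labels is not confirmed here.

```python
def one_hot(class_index, num_classes):
    """Return a one-hot list with a 1 at position class_index."""
    return [1 if i == class_index else 0 for i in range(num_classes)]

raw_labels = [1, 2, 3, 4, 5, 6]

# Class indices are expected to start at 0, so shift the labels down by one.
class_indices = [label - 1 for label in raw_labels]
encoded = [one_hot(i, 6) for i in class_indices]
print(encoded[2])  # one-hot array for the original label 3

# Min-max normalize the labels to [0, 1], as described in the question.
lo, hi = min(raw_labels), max(raw_labels)
normalized = [(label - lo) / (hi - lo) for label in raw_labels]

# ASSUMPTION: the framework truncates float labels to integer class indices.
collapsed = [int(value) for value in normalized]
print(collapsed)  # almost every example ends up in class 0
```

Under that assumption nearly all examples share one class, so a model that always predicts class 0 would score close to 100% without learning anything useful.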

Autoencoder not learning identity function

I'm somewhat new to machine learning in general, and I wanted to make a simple experiment to get more familiar with neural network autoencoders: To make an extremely basic autoencoder that would learn the identity function.
I'm using Keras to make life easier, so I did this first to make sure it works:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# Weights are given as [weights, biases], so we give
# the identity matrix for the weights and a vector of zeros for the biases
weights = [np.diag(np.ones(84)), np.zeros(84)]
model = Sequential([Dense(84, input_dim=84, weights=weights)])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=10, batch_size=8, validation_split=0.3)
As expected, the loss is zero, both in train and validation data:
Epoch 1/10
97535/97535 [==============================] - 27s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 2/10
97535/97535 [==============================] - 28s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Then I tried to do the same but without initializing the weights to the identity function, expecting that after a while of training it would learn it. It didn't. I've let it run for 200 epochs various times in different configurations, playing with different optimizers, loss functions, and adding L1 and L2 activity regularizers. The results vary, but the best I've got is still really bad, looking nothing like the original data, just being kinda in the same numeric range.
The data is simply some numbers oscillating around 1.1. I don't know if an activation layer makes sense for this problem, should I be using one?
If this "neural network" of one layer can't learn something as simple as the identity function, how can I expect it to learn anything more complex? What am I doing wrong?
To have better context, here's a way to generate a dataset very similar to the one I'm using:
X = np.random.normal(1.1090579, 0.0012380764, (139336, 84))
I'm suspecting that the variations between the values might be too small. The loss function ends up having decent values (around 1e-6), but it's not enough precision for the result to have a similar shape to the original data. Maybe I should scale/normalize it somehow? Thanks for any advice!
In the end, as it was suggested, the issue was with the dataset having too small variations between the 84 values, so the resulting prediction was actually pretty good in absolute terms (loss function) but comparing it to the original data, the variations were far off. I solved it by normalizing the 84 values in each sample around the sample's mean and dividing by the sample's standard deviation. Then I used the original mean and standard deviation to denormalize the predictions at the other end. I guess this could be done in a few different ways, but I did it by adding this normalization/denormalization into the model itself by using some Lambda layers that operated on the tensors. That way all the data processing was incorporated into the model, which made it nicer to work with. Let me know if you would like to see the actual code.
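The per-sample normalization described above can be sketched in plain Python as follows. This is not the author's actual code (which used Keras Lambda layers inside the model); the function names normalize and denormalize are made up for this illustration, and the sample is generated with the same mean and standard deviation as in the question.

```python
import random
import statistics

def normalize(sample):
    """Center a sample on its own mean and scale by its own standard deviation."""
    mean = statistics.mean(sample)
    std = statistics.pstdev(sample)
    scaled = [(v - mean) / std for v in sample]
    return scaled, mean, std

def denormalize(values, mean, std):
    """Map normalized values back to the original scale."""
    return [v * std + mean for v in values]

random.seed(0)
# One sample of 84 values oscillating around 1.1, as in the question.
sample = [random.gauss(1.1090579, 0.0012380764) for _ in range(84)]

scaled, mean, std = normalize(sample)
restored = denormalize(scaled, mean, std)

print("original spread:  ", max(sample) - min(sample))
print("normalized spread:", max(scaled) - min(scaled))
print("max roundtrip error:", max(abs(a - b) for a, b in zip(sample, restored)))
```

After normalization the variations are of order 1 instead of order 0.001, so a mean squared error of 1e-6 corresponds to a much closer match in shape.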
I believe the problem could be either the number of epochs or the way you initialize X.
I ran your code with an X of my own for 100 epochs and printed the argmax() and max values of the weights; it gets really close to the identity function.
Here is the code snippet that I used:
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import random
import pandas as pd
# 99,999 samples of 84 uniform random values in [0, 1)
X = np.array([[random.random() for r in range(84)] for i in range(1, 100000)])
model = Sequential([Dense(84, input_dim=84)], name="layer1")
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=100, batch_size=80, validation_split=0.3)
# For each output unit, find which input has the largest weight and how large it is
l_weights = np.round(model.layers[0].get_weights()[0], 3)
print(l_weights.argmax(axis=0))
print(l_weights.max(axis=0))
And I'm getting:
Train on 69999 samples, validate on 30000 samples
Epoch 1/100
69999/69999 [==============================] - 1s - loss: 0.2092 - val_loss: 0.1564
Epoch 2/100
69999/69999 [==============================] - 1s - loss: 0.1536 - val_loss: 0.1510
Epoch 3/100
69999/69999 [==============================] - 1s - loss: 0.1484 - val_loss: 0.1459
Epoch 98/100
69999/69999 [==============================] - 1s - loss: 0.0055 - val_loss: 0.0054
Epoch 99/100
69999/69999 [==============================] - 1s - loss: 0.0053 - val_loss: 0.0053
Epoch 100/100
69999/69999 [==============================] - 1s - loss: 0.0051 - val_loss: 0.0051
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83]
[ 0.85000002 0.85100001 0.79799998 0.80500001 0.82700002 0.81900001
0.792 0.829 0.81099999 0.80800003 0.84899998 0.829 0.852
0.79500002 0.84100002 0.81099999 0.792 0.80800003 0.85399997
0.82999998 0.85100001 0.84500003 0.847 0.79699999 0.81400001
0.84100002 0.81 0.85100001 0.80599999 0.84500003 0.824
0.81999999 0.82999998 0.79100001 0.81199998 0.829 0.85600001
0.84100002 0.792 0.847 0.82499999 0.84500003 0.796
0.82099998 0.81900001 0.84200001 0.83999997 0.815 0.79500002
0.85100001 0.83700001 0.85000002 0.79900002 0.84100002 0.79699999
0.838 0.847 0.84899998 0.83700001 0.80299997 0.85399997
0.84500003 0.83399999 0.83200002 0.80900002 0.85500002 0.83899999
0.79900002 0.83399999 0.81 0.79100001 0.81800002 0.82200003
0.79100001 0.83700001 0.83600003 0.824 0.829 0.82800001
0.83700001 0.85799998 0.81999999 0.84299999 0.83999997]
When I used only 5 numbers as an input and printed the actual weights I got this:
array([[ 1., 0., -0., 0., 0.],
[ 0., 1., 0., -0., -0.],
[-0., 0., 1., 0., 0.],
[ 0., -0., 0., 1., -0.],
[ 0., -0., 0., -0., 1.]], dtype=float32)

Keras output looks different than others

When I run the below code :
from keras.models import Sequential
from keras.layers import Dense
import numpy
import time
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("C:/Users/AQader/Desktop/Keraslearn/mammm.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:5]
Y = dataset[:,5]
# create model
model = Sequential()
model.add(Dense(50, input_dim=5, init='uniform', activation='relu'))
model.add(Dense(25, init='uniform', activation='tanh'))
model.add(Dense(15, init='uniform', activation='tanh'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, nb_epoch=200, batch_size=20, verbose = 0)
# evaluate the model
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
I end up receiving the following.
32/829 [>.............................] - ETA: 0sacc: 84.20%
That's it. Only one line that shows up after a half minute of training. After looking through other questions, the usual output looks like:
Epoch 1/20
1213/1213 [==============================] - 0s - loss: 0.1760
Epoch 2/20
1213/1213 [==============================] - 0s - loss: 0.1840
Epoch 3/20
1213/1213 [==============================] - 0s - loss: 0.1816
Epoch 4/20
1213/1213 [==============================] - 0s - loss: 0.1915
Epoch 5/20
1213/1213 [==============================] - 0s - loss: 0.1928
Epoch 6/20
1213/1213 [==============================] - 0s - loss: 0.1964
Epoch 7/20
1213/1213 [==============================] - 0s - loss: 0.1948
Epoch 8/20
1213/1213 [==============================] - 0s - loss: 0.1971
Epoch 9/20
1213/1213 [==============================] - 0s - loss: 0.1899
Epoch 10/20
1213/1213 [==============================] - 0s - loss: 0.1957
Can anyone tell me what may be wrong here? I am a beginner at this, yet this doesn't seem normal. Please note that there are no mistakes in the pasted output: "0sacc" is exactly what shows up. I'm running this in an Anaconda environment with Python 2.7 on a 64-bit Windows 7 machine with 8 GB RAM and a 5th-gen Core i5.
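By calling model.fit with verbose = 0 you have suppressed the per-epoch output. Try setting verbose = 1. The single line you do see comes from model.evaluate, whose progress bar is followed directly by the output of your print call.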
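The fused "0sacc" text is consistent with a progress bar that ends without a newline, followed by a regular print. The sketch below reproduces that effect with a string buffer in plain Python. It is only an illustration of the mechanism and does not use Keras's actual progress-bar code; the bar text is copied from the output in the question.

```python
import io

buf = io.StringIO()

# A progress bar usually writes its line without a trailing newline,
# so that it can be redrawn in place on the next update.
buf.write(" 32/829 [>.............................] - ETA: 0s")

# The next print therefore starts exactly where the bar stopped.
print("acc: 84.20%", file=buf)

combined = buf.getvalue()
print(repr(combined))
```

Printing an empty line before the accuracy, or passing verbose = 0 to model.evaluate as well, should keep the two outputs apart.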
