I'm training a neural network using Caffe. In the solver.prototxt file, I can set average_loss to print the loss averaged over the last N iterations. Is it possible to do the same for other values as well?
For example, I wrote a custom PythonLayer outputting accuracy, and I would like to display the average accuracy over the last N iterations as well.
EDIT: here is the log. The DEBUG lines show the accuracy computed on each image, and every 3 images (average_loss: 3 and display: 3) the accuracy is displayed along with the loss. As you can see, only the last value is displayed, whereas what I want is the average of the 3.
2018-04-24 10:38:06,383 [DEBUG]: Accuracy: 0 / 524288 = 0.000000
I0424 10:38:07.517436 99964 solver.cpp:251] Iteration 0, loss = 1.84883e+06
I0424 10:38:07.517503 99964 solver.cpp:267] Train net output #0: accuracy = 0
I0424 10:38:07.517521 99964 solver.cpp:267] Train net output #1: loss = 1.84883e+06 (* 1 = 1.84883e+06 loss)
I0424 10:38:07.517536 99964 sgd_solver.cpp:106] Iteration 0, lr = 2e-12
I0424 10:38:07.524904 99964 solver.cpp:287] Time: 2.44301s/1iters
2018-04-24 10:38:08,653 [DEBUG]: Accuracy: 28569 / 524288 = 0.054491
2018-04-24 10:38:11,010 [DEBUG]: Accuracy: 22219 / 524288 = 0.042379
2018-04-24 10:38:13,326 [DEBUG]: Accuracy: 168424 / 524288 = 0.321243
I0424 10:38:14.533329 99964 solver.cpp:251] Iteration 3, loss = 1.84855e+06
I0424 10:38:14.533406 99964 solver.cpp:267] Train net output #0: accuracy = 0.321243
I0424 10:38:14.533426 99964 solver.cpp:267] Train net output #1: loss = 1.84833e+06 (* 1 = 1.84833e+06 loss)
I0424 10:38:14.533440 99964 sgd_solver.cpp:106] Iteration 3, lr = 2e-12
I0424 10:38:14.534195 99964 solver.cpp:287] Time: 7.01088s/3iters
2018-04-24 10:38:15,665 [DEBUG]: Accuracy: 219089 / 524288 = 0.417879
2018-04-24 10:38:17,943 [DEBUG]: Accuracy: 202896 / 524288 = 0.386993
2018-04-24 10:38:20,210 [DEBUG]: Accuracy: 0 / 524288 = 0.000000
I0424 10:38:21.393121 99964 solver.cpp:251] Iteration 6, loss = 1.84769e+06
I0424 10:38:21.393190 99964 solver.cpp:267] Train net output #0: accuracy = 0
I0424 10:38:21.393210 99964 solver.cpp:267] Train net output #1: loss = 1.84816e+06 (* 1 = 1.84816e+06 loss)
I0424 10:38:21.393224 99964 sgd_solver.cpp:106] Iteration 6, lr = 2e-12
I0424 10:38:21.393940 99964 solver.cpp:287] Time: 6.85962s/3iters
2018-04-24 10:38:22,529 [DEBUG]: Accuracy: 161180 / 524288 = 0.307426
2018-04-24 10:38:24,801 [DEBUG]: Accuracy: 178021 / 524288 = 0.339548
2018-04-24 10:38:27,090 [DEBUG]: Accuracy: 208571 / 524288 = 0.397818
I0424 10:38:28.297776 99964 solver.cpp:251] Iteration 9, loss = 1.84482e+06
I0424 10:38:28.297843 99964 solver.cpp:267] Train net output #0: accuracy = 0.397818
I0424 10:38:28.297863 99964 solver.cpp:267] Train net output #1: loss = 1.84361e+06 (* 1 = 1.84361e+06 loss)
I0424 10:38:28.297878 99964 sgd_solver.cpp:106] Iteration 9, lr = 2e-12
I0424 10:38:28.298607 99964 solver.cpp:287] Time: 6.9049s/3iters
I0424 10:38:28.331749 99964 solver.cpp:506] Snapshotting to binary proto file snapshot/train_iter_10.caffemodel
I0424 10:38:36.171842 99964 sgd_solver.cpp:273] Snapshotting solver state to binary proto file snapshot/train_iter_10.solverstate
I0424 10:38:43.068686 99964 solver.cpp:362] Optimization Done.
Caffe only averages the global loss of the net (the weighted sum of all loss layers) over average_loss iterations; for all other output blobs it reports the output of the last batch only.
Therefore, if you want your Python layer to report accuracy averaged over several iterations, I suggest you store a buffer as a member of your layer class and display this aggregated value.
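A minimal sketch of such a buffer, assuming it is fed one accuracy value per forward pass. The class name RunningAverage is hypothetical, and collections.deque(maxlen=N) is used so the oldest value is dropped automatically once N values are stored. Fed with the three DEBUG accuracies logged just before the "Iteration 3" report, it yields their mean instead of only the last value.

```python
from collections import deque


class RunningAverage:
    """Hypothetical helper: mean of the last `size` values pushed into it."""

    def __init__(self, size):
        # deque(maxlen=size) silently discards the oldest entry once full
        self.values = deque(maxlen=size)

    def push(self, value):
        # Store the newest value and return the updated average
        self.values.append(value)
        return self.mean()

    def mean(self):
        return sum(self.values) / len(self.values)


# Per-image accuracies from the DEBUG lines preceding "Iteration 3" in the log
avg = RunningAverage(3)
for acc in (0.054491, 0.042379, 0.321243):
    avg.push(acc)
print(round(avg.mean(), 6))  # 0.139371
```

Inside a Python layer, such an object would be created in setup() and pushed to in forward(); what you then print or output as a "top" is up to you.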
Alternatively, you can implement a "moving average" on top of the accuracy calculation and output this value as a "top".
You can have a "moving average output layer" implemented in Python.
This layer can take any number of "bottoms" and outputs the moving average of each of them.
Python code of the layer:
import caffe

class MovingAverageLayer(caffe.Layer):
    def setup(self, bottom, top):
        assert len(bottom) == len(top), "layer must have same number of inputs and outputs"
        # average over how many iterations? read from param_str
        self.buf_size = int(self.param_str)
        # allocate a buffer for each "bottom"
        self.buf = [[] for _ in bottom]

    def reshape(self, bottom, top):
        # make sure inputs and outputs have the same size
        for i, b in enumerate(bottom):
            top[i].reshape(*b.data.shape)

    def forward(self, bottom, top):
        # put into buffers (copy, since Caffe reuses the blob memory)
        for i, b in enumerate(bottom):
            self.buf[i].append(b.data.copy())
            # drop the oldest entry once the buffer is full
            if len(self.buf[i]) > self.buf_size:
                self.buf[i].pop(0)
            # compute average
            a = 0
            for elem in self.buf[i]:
                a += elem
            top[i].data[...] = a / len(self.buf[i])

    def backward(self, top, propagate_down, bottom):
        # this layer does not back prop
        pass
How to use this layer in prototxt:
layer {
  name: "moving_ave"
  type: "Python"
  bottom: "accuracy"
  top: "av_accuracy"
  python_param {
    layer: "MovingAverageLayer"
    module: "path.to.module"
    param_str: "30"  # buf size
  }
}
See this tutorial for more information.
Original incorrect answer:
Caffe outputs to log whatever the net outputs: loss, accuracy or any other blob that appears as "top" of a layer and is not used as a "bottom" in any other layer.
Therefore, if you want to see accuracy computed by a "Python" layer, simply make sure no other layer uses this accuracy as an input.
I'm building a Keras model to predict whether a user will select a certain product or not (binary classification).
The model seems to be making progress on the validation set that is held out during training, but the model's predictions are all 0s when it comes to the test set.
My dataset looks something like this:
customer_id id target customer_num_id
0 TCHWPBT 4 0 1
1 TCHWPBT 13 0 1
2 TCHWPBT 20 0 1
3 TCHWPBT 23 0 1
4 TCHWPBT 28 0 1
... ... ... ... ...
1631695 D4Q7TMM 849 0 7417
1631696 D4Q7TMM 855 0 7417
1631697 D4Q7TMM 856 0 7417
1631698 D4Q7TMM 858 0 7417
1631699 D4Q7TMM 907 0 7417
I split it into Train/Val sets using:
from sklearn.model_selection import train_test_split
Train, Val = train_test_split(train_dataset, test_size=0.1, random_state=42, shuffle=False)
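With shuffle=False, train_test_split does not shuffle at all, so the validation set is simply the last rows of the dataframe (and random_state has no effect). A small sketch of the resulting sizes, assuming the n_test = ceil(test_size * n) rounding that scikit-learn applies for a float test_size; the helper name contiguous_split_sizes is hypothetical:

```python
import math


def contiguous_split_sizes(n_samples, test_size):
    """Hypothetical helper mirroring train_test_split(..., shuffle=False):
    the train part is the first n_train rows, the test part the last n_test."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# The dataset shown above has rows 0..1631699, i.e. 1,631,700 rows
n_train, n_test = contiguous_split_sizes(1631700, 0.1)
print(n_train, n_test)  # 1468530 163170
```

The 1468530 training rows match the progress bar in the training log, which is consistent with the split working this way here.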
After I split the dataset, I select the features that are used when training and validating the model:
train_customer_id = Train['customer_num_id']
train_vendor_id = Train['id']
train_target = Train['target']
val_customer_id = Val['customer_num_id']
val_vendor_id = Val['id']
val_target = Val['target']
...and run the model:
from sklearn.metrics import f1_score

epochs = 2
for e in range(epochs):
    print('EPOCH: ', e)
    model.fit([train_customer_id, train_vendor_id], train_target, epochs=1, verbose=1, batch_size=384)
    prediction = model.predict(x=[train_customer_id, train_vendor_id], verbose=1, batch_size=384)
    train_f1 = f1_score(y_true=train_target.astype('float32'), y_pred=prediction.round())
    print('TRAIN F1: ', train_f1)
    val_prediction = model.predict(x=[val_customer_id, val_vendor_id], verbose=1, batch_size=384)
    val_f1 = f1_score(y_true=val_target.astype('float32'), y_pred=val_prediction.round())
    print('VAL F1: ', val_f1)
1468530/1468530 [==============================] - 19s 13us/step - loss: 0.0891
TRAIN F1: 0.1537511577647422
VAL F1: 0.09745762711864409
1468530/1468530 [==============================] - 19s 13us/step - loss: 0.0691
TRAIN F1: 0.308748569645272
VAL F1: 0.2076433121019108
The validation F1 score seems to be improving over time, and the model predicts both 1s and 0s:
prediction = model.predict(x=[val_customer_id, val_vendor_id], verbose=1, batch_size=384)
array([0., 1.], dtype=float32)
But when I try to predict the test set, the model predicts 0 for all values:
prediction = model.predict(x=[test_dataset['customer_num_id'], test_dataset['id']], verbose=1, batch_size=384)
array([0.], dtype=float32)
The test dataset looks similar to the training and validation sets, and it has been left out during training just like the validation set, yet the model outputs nothing but 0.
Here's what the test dataset looks like:
customer_id id customer_num_id
0 Z59FTQD 243 7418
1 0JP29SK 243 7419
... ... ... ...
1671995 L9G4OFV 907 17414
1671996 L9G4OFV 907 17414
1671997 FDZFYBA 907 17415
Does anyone know what might be the issue here?
EDIT: made dataset text more readable
Please take a look at the distribution of your data. I see that in the sample data you've shown, target is all 0s. Consider that if most users don't select the product, a model that always predicts 0 will be right most of the time. So it could be improving its accuracy by over-fitting to the majority class (0).
You can prevent over-fitting by adjusting parameters like the learning rate, and by changing the model architecture, e.g. by adding dropout layers.
Also, I'm not sure what your model looks like, but you're only training for 2 epochs, so it may not have had enough time to generalize; depending on how deep your model is, it could need a lot more training time.
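To illustrate the majority-class point, here is a small sketch on hypothetical labels with 2% positives: a "model" that always predicts 0 reaches 98% accuracy, yet its F1 score for the positive class is 0. The metrics are written out by hand instead of using sklearn.metrics only to keep the example dependency-free.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the label
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No correctly predicted positives: F1 is 0 (precision may be undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 positives out of 100, and a constant-0 "model"
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100
print(accuracy(y_true, y_pred))  # 0.98
print(f1(y_true, y_pred))        # 0.0
```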
I'm trying to train a model on MNIST.
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
What I got is (60000, 28, 28), so there are 60,000 items in the dataset.
Then, I create the model with the following code.
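model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)  # one logit per digit class
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])  # optimizer assumed
model.fit(x_train, y_train, epochs=5)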
However, I got only 1875 items for each epoch.
2020-06-02 04:33:45.706474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-06-02 04:33:45.706617: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-06-02 04:33:47.437837: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2020-06-02 04:33:47.437955: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-02 04:33:47.441329: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441480: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441876: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-02 04:33:47.448274: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27fc6b2c210 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-02 04:33:47.448427: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/5
1875/1875 [==============================] - 1s 664us/step - loss: 0.2971 - accuracy: 0.9140
Epoch 2/5
1875/1875 [==============================] - 1s 661us/step - loss: 0.1421 - accuracy: 0.9582
Epoch 3/5
1875/1875 [==============================] - 1s 684us/step - loss: 0.1068 - accuracy: 0.9675
Epoch 4/5
1875/1875 [==============================] - 1s 695us/step - loss: 0.0868 - accuracy: 0.9731
Epoch 5/5
1875/1875 [==============================] - 1s 682us/step - loss: 0.0764 - accuracy: 0.9762
Process finished with exit code 0
You are using the whole data, no worries!
According to the Keras documentation (https://github.com/keras-team/keras/blob/master/keras/engine/training.py), when you call model.fit without specifying the batch size, it defaults to 32:
batch_size Integer or NULL. Number of samples per gradient update. If
unspecified, batch_size will default to 32
It means that each epoch has 1875 steps, and in each step your model processes 32 data examples. And guess what: 1875*32 is equal to 60,000.
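The arithmetic can be checked directly. In this sketch (the helper name steps_per_epoch is hypothetical), the number shown by the progress bar is the number of batches, i.e. the sample count divided by batch_size and rounded up, because the last batch may be smaller:

```python
import math


def steps_per_epoch(n_samples, batch_size=32):
    # Ceiling division: a final partial batch still counts as one step
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(60000))                  # 1875 (default batch_size=32)
print(steps_per_epoch(60000, batch_size=1))    # 60000
print(steps_per_epoch(60000, batch_size=128))  # 469
```

Passing a different batch_size to model.fit changes the number in the progress bar, not the amount of data seen per epoch.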
I am testing a dataset with two labels 'A' and 'B' on a decision tree classifier. I accidentally found out that the model gets different precision results on the same testing data, and I want to know why.
Here is what I do: I train the model, and test it on
1. the testing set,
2. the data only labelled 'A' in the testing set,
3. and the data only labelled 'B'.
Here is what I got:
for testing dataset
precision recall f1-score support
A 0.94 0.95 0.95 25258
B 0.27 0.22 0.24 1963
for data only labelled 'A' in testing dataset
precision recall f1-score support
A 1.00 0.95 0.98 25258
B 0.00 0.00 0.00 0
for data only labelled 'B' in testing dataset
precision recall f1-score support
A 0.00 0.00 0.00 0
B 1.00 0.22 0.36 1963
The training dataset and model are the same, and the data in the 2nd and 3rd tests are taken from the data in the 1st. Why do the precisions for 'A' and 'B' differ so much? What is the real precision of this model? Thank you very much.
You sound confused, and it is not at all clear why you are interested in metrics where you have completely removed one of the two labels from your evaluation set.
Let's explore the issue with some reproducible dummy data:
from sklearn.metrics import classification_report
import numpy as np
y_true = np.array([0, 1, 0, 1, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 0, 1])
target_names = ['A', 'B']
print(classification_report(y_true, y_pred, target_names=target_names))
precision recall f1-score support
A 0.50 0.50 0.50 4
B 0.33 0.33 0.33 3
avg / total 0.43 0.43 0.43 7
Now, let's keep only class A in our y_true:
indA = np.where(y_true==0)
print(indA)
print(y_true[indA])
print(y_pred[indA])
# result:
(array([0, 2, 5, 6], dtype=int64),)
[0 0 0 0]
[0 1 0 1]
Now, here is the definition of precision from the scikit-learn documentation:
The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
For class A, a true positive (tp) would be a case where the true class is A (0 in our case), and we have indeed predicted A (0); from above, it is apparent that tp=2.
The tricky part is the false positives (fp): they are the cases where we have predicted A (0) while the true label is B (1). But it is apparent here that we cannot have any such cases, since we have (intentionally) removed all the B's from our y_true (why would we want to do such a thing? I don't know, it does not make any sense at all); hence, fp=0 in this (weird) setting. Hence, our precision for class A will be tp / (tp+0) = tp/tp = 1.
This is exactly the result given by the classification report:
print(classification_report(y_true[indA], y_pred[indA], target_names=target_names))
# result:
precision recall f1-score support
A 1.00 0.50 0.67 4
B 0.00 0.00 0.00 0
avg / total 1.00 0.50 0.67 4
And obviously the case for B is analogous.
Why is the precision not 1 in case #1 (for both A and B)? The data are the same
No, they are very obviously not the same - the ground truth is altered!
Bottom line: removing classes from your y_true before computing precision etc. does not make any sense at all (i.e. your reported results in case #2 and case #3 are of no practical use whatsoever); but since, for whatever reason, you decided to do so, your reported results are exactly as expected.
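To make the tp/fp bookkeeping above concrete, here is a dependency-free sketch (precision_recall is a hypothetical helper) that counts tp, fp and fn by hand on the same dummy arrays, first on the full set and then after keeping only the samples whose true label is A:

```python
y_true = [0, 1, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1]


def precision_recall(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    # Guard against empty denominators instead of dividing by zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Full evaluation set: predicted A's include two true B's, so fp = 2
print(precision_recall(y_true, y_pred, 0))  # (0.5, 0.5)

# Keep only samples whose true label is A: fp can no longer be anything but 0
keep = [i for i, t in enumerate(y_true) if t == 0]
y_true_a = [y_true[i] for i in keep]
y_pred_a = [y_pred[i] for i in keep]
print(precision_recall(y_true_a, y_pred_a, 0))  # (1.0, 0.5)
```

Recall for A is unchanged (0.5), since it only depends on the true A samples; only precision is inflated by removing the B's.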
I have a dataframe with approximately 14560 word vectors of dimension 400. I have reshaped each vector into 20*20 and used 1 channel for applying a CNN, so the dimension has become (14560,20,20,1). When I try to fit the CNN model, it throws an error.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers import BatchNormalization
from keras.utils import np_utils
from keras import backend as K
model_cnn = Sequential()
model_cnn.add(Convolution2D(filters = 16, kernel_size = (3, 3),
                            activation='relu',input_shape = (20, 20,1)))
model_cnn.compile(loss='categorical_crossentropy', optimizer = 'adadelta',
                  metrics=["accuracy"])
Error when checking target: expected conv2d_6 to have 4 dimensions, but got array with shape (14560, 1)
When I reshape the training data to (14560,1,20,20), it still gives an error, as the model then receives input (1,20,20) while the required input is (20,20,1).
How do I fix it?
The problem is not only the x_tr shape, which should be (-1,20,20,1) as correctly pointed out in another answer; it's also the network architecture itself. If you run model_cnn.summary(), you'll see the following:
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 18, 18, 16) 160
Total params: 160
Trainable params: 160
Non-trainable params: 0
The output of the model is rank 4: (batch_size, 18, 18, 16). It can't compute the loss when the labels are (batch_size, 1).
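The summary numbers can be derived by hand. This sketch assumes Keras' default padding='valid' and a stride of 1 for the Conv2D layer; the helper names are hypothetical:

```python
def conv2d_valid_output(size, kernel, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels weight tensor per filter, plus one bias each
    return kernel_h * kernel_w * in_channels * filters + filters


side = conv2d_valid_output(20, 3)
print((None, side, side, 16))      # (None, 18, 18, 16)
print(conv2d_params(3, 3, 1, 16))  # 160
```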
The correct architecture must reshape the convolutional output tensor (batch_size, 18, 18, 16) to (batch_size, 1). There are many ways to do it; here's one:
model_cnn = Sequential()
model_cnn.add(Convolution2D(filters=16, kernel_size=(3, 3), activation='relu', input_shape=(20, 20, 1)))
model_cnn.add(MaxPooling2D(pool_size=(18, 18)))
model_cnn.add(Flatten())
model_cnn.add(Dense(1))
model_cnn.compile(loss='sparse_categorical_crossentropy', optimizer='adadelta', metrics=["accuracy"])
The summary:
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 18, 18, 16) 160
max_pooling2d_1 (MaxPooling2 (None, 1, 1, 16) 0
flatten_1 (Flatten) (None, 16) 0
dense_1 (Dense) (None, 1) 17
Total params: 177
Trainable params: 177
Non-trainable params: 0
Note that I added max-pooling to reduce the 18x18 feature maps to 1x1, then a flatten layer to squeeze the tensor to (None, 16), and finally the dense layer to output a single value. Also pay attention to the loss function: it's sparse_categorical_crossentropy. If you wish to use categorical_crossentropy, you have to do one-hot encoding and output not a single number, but the probability distribution over classes: (None, classes).
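For the categorical_crossentropy route, the integer labels have to be one-hot encoded. A minimal NumPy sketch, assuming 3 classes purely for illustration; one_hot is a hypothetical helper, and keras.utils.np_utils.to_categorical (np_utils is already imported in the question) serves the same purpose. The last layer would then need one unit per class, e.g. Dense(num_classes, activation='softmax'), instead of Dense(1).

```python
import numpy as np


def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]


y_sparse = np.array([0, 2, 1, 2])  # hypothetical integer labels
y_onehot = one_hot(y_sparse, 3)
print(y_onehot.shape)  # (4, 3)
```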
By the way, also check that your validation arrays have valid shape.
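A minimal sketch of that shape check, assuming the word vectors are available as (n, 400) arrays; the function to_conv_input and the 1000-row validation array are hypothetical stand-ins. NumPy's reshape(-1, 20, 20, 1) produces the channels-last layout that input_shape=(20, 20, 1) expects:

```python
import numpy as np


def to_conv_input(vectors, side=20):
    """Reshape (n, side*side) vectors to the (n, side, side, 1) layout."""
    vectors = np.asarray(vectors, dtype="float32")
    assert vectors.shape[1] == side * side, "each vector must have side*side values"
    return vectors.reshape(-1, side, side, 1)


# Hypothetical stand-ins for the training and validation word vectors
x_tr = to_conv_input(np.random.rand(14560, 400))
x_val = to_conv_input(np.random.rand(1000, 400))
print(x_tr.shape, x_val.shape)  # (14560, 20, 20, 1) (1000, 20, 20, 1)
```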
I'm somewhat new to machine learning in general, and I wanted to run a simple experiment to get more familiar with neural network autoencoders: to make an extremely basic autoencoder that would learn the identity function.
I'm using Keras to make life easier, so I did this first to make sure it works:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Weights are given as [weights, biases], so we give
# the identity matrix for the weights and a vector of zeros for the biases
weights = [np.diag(np.ones(84)), np.zeros(84)]
model = Sequential([Dense(84, input_dim=84, weights=weights)])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=10, batch_size=8, validation_split=0.3)
As expected, the loss is zero, both in train and validation data:
Epoch 1/10
97535/97535 [==============================] - 27s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 2/10
97535/97535 [==============================] - 28s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Then I tried to do the same but without initializing the weights to the identity function, expecting that after a while of training it would learn it. It didn't. I've let it run for 200 epochs several times in different configurations, playing with different optimizers and loss functions, and adding L1 and L2 activity regularizers. The results vary, but the best I've got is still really bad: it looks nothing like the original data, it's just roughly in the same numeric range.
The data is simply some numbers oscillating around 1.1. I don't know if an activation layer makes sense for this problem; should I be using one?
If this "neural network" of one layer can't learn something as simple as the identity function, how can I expect it to learn anything more complex? What am I doing wrong?
To have better context, here's a way to generate a dataset very similar to the one I'm using:
X = np.random.normal(1.1090579, 0.0012380764, (139336, 84))
I suspect that the variations between the values might be too small. The loss function ends up with decent values (around 1e-6), but that's not enough precision for the result to have a shape similar to the original data. Maybe I should scale/normalize it somehow? Thanks for any advice!
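The suspicion can be quantified. A model that ignores its input and always outputs each column's mean already reaches a mean squared error equal to the data's variance, which for std = 0.0012380764 is about 1.5e-6, i.e. the same order as the ~1e-6 loss reported. A sketch using the generator above with fewer rows; the seeded numpy.random.default_rng requires NumPy 1.17 or later:

```python
import numpy as np

mean, std = 1.1090579, 0.0012380764
rng = np.random.default_rng(0)
X = rng.normal(mean, std, (10000, 84))  # fewer rows than above, same distribution

# A "model" that ignores its input and outputs each column's mean
baseline = np.broadcast_to(X.mean(axis=0), X.shape)
mse = np.mean((X - baseline) ** 2)
print(mse, std ** 2)  # both close to 1.5e-6
```

So a loss of around 1e-6 on this data does not by itself indicate that the network has learned anything beyond the average.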
In the end, as was suggested, the issue was the dataset having too small variations between the 84 values, so the resulting prediction was actually pretty good in absolute terms (loss function), but compared to the original data, the variations were far off. I solved it by normalizing the 84 values in each sample around the sample's mean and dividing by the sample's standard deviation. Then I used the original mean and standard deviation to denormalize the predictions at the other end. I guess this could be done in a few different ways, but I did it by adding this normalization/denormalization into the model itself, using some Lambda layers that operated on the tensors. That way all the data processing was incorporated into the model, which made it nicer to work with. Let me know if you would like to see the actual code.
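The per-sample normalization described above can be sketched in NumPy as follows. Note that this sketch does it outside the model, whereas the author folded it into the model with Lambda layers; the function names and the eps guard against zero variance are assumptions. The autoencoder would be trained on the normalized rows, and its outputs mapped back with the stored per-row mean and standard deviation.

```python
import numpy as np


def normalize_rows(X, eps=1e-12):
    # Per-sample statistics, returned so predictions can be mapped back later
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / (sigma + eps), mu, sigma


def denormalize_rows(Z, mu, sigma, eps=1e-12):
    # Exact inverse of normalize_rows for the same mu, sigma and eps
    return Z * (sigma + eps) + mu


rng = np.random.default_rng(0)
X = rng.normal(1.1090579, 0.0012380764, (1000, 84))
Z, mu, sigma = normalize_rows(X)
X_back = denormalize_rows(Z, mu, sigma)
print(np.allclose(X, X_back))  # True
```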
I believe the problem could be either the number of epochs or the way you initialize X.
I ran your code with an X of my own for 100 epochs and printed the argmax() and max values of the weights; it gets really close to the identity function.
I'm adding the code snippet that I used:
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import random

X = np.array([[random.random() for r in range(84)] for i in range(1, 100000)])
model = Sequential([Dense(84, input_dim=84)], name="layer1")
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=100, batch_size=80, validation_split=0.3)
l_weights = np.round(model.layers[0].get_weights()[0], 3)
print(l_weights.argmax(axis=0))
print(l_weights.max(axis=0))
And I'm getting:
Train on 69999 samples, validate on 30000 samples
Epoch 1/100
69999/69999 [==============================] - 1s - loss: 0.2092 - val_loss: 0.1564
Epoch 2/100
69999/69999 [==============================] - 1s - loss: 0.1536 - val_loss: 0.1510
Epoch 3/100
69999/69999 [==============================] - 1s - loss: 0.1484 - val_loss: 0.1459
Epoch 98/100
69999/69999 [==============================] - 1s - loss: 0.0055 - val_loss: 0.0054
Epoch 99/100
69999/69999 [==============================] - 1s - loss: 0.0053 - val_loss: 0.0053
Epoch 100/100
69999/69999 [==============================] - 1s - loss: 0.0051 - val_loss: 0.0051
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83]
[ 0.85000002 0.85100001 0.79799998 0.80500001 0.82700002 0.81900001
0.792 0.829 0.81099999 0.80800003 0.84899998 0.829 0.852
0.79500002 0.84100002 0.81099999 0.792 0.80800003 0.85399997
0.82999998 0.85100001 0.84500003 0.847 0.79699999 0.81400001
0.84100002 0.81 0.85100001 0.80599999 0.84500003 0.824
0.81999999 0.82999998 0.79100001 0.81199998 0.829 0.85600001
0.84100002 0.792 0.847 0.82499999 0.84500003 0.796
0.82099998 0.81900001 0.84200001 0.83999997 0.815 0.79500002
0.85100001 0.83700001 0.85000002 0.79900002 0.84100002 0.79699999
0.838 0.847 0.84899998 0.83700001 0.80299997 0.85399997
0.84500003 0.83399999 0.83200002 0.80900002 0.85500002 0.83899999
0.79900002 0.83399999 0.81 0.79100001 0.81800002 0.82200003
0.79100001 0.83700001 0.83600003 0.824 0.829 0.82800001
0.83700001 0.85799998 0.81999999 0.84299999 0.83999997]
When I used only 5 numbers as input and printed the actual weights, I got this:
array([[ 1., 0., -0., 0., 0.],
[ 0., 1., 0., -0., -0.],
[-0., 0., 1., 0., 0.],
[ 0., -0., 0., 1., -0.],
[ 0., -0., 0., -0., 1.]], dtype=float32)