Any way to print input and output data for each row during training in DL4J? - deeplearning4j

The score for my model is dropping far too fast. For a huge dataset it's declining from > 5 to 0 well before the first epoch is complete.
I suspect I may have misconfigured it somehow and perhaps each batch contains the same data.
Is there any way to print out the inputs and outputs for each row during training so I can test this theory?

Related

Is this LSTM underfitting?

I am trying to create a model that predicts whether it will rain in the next 5 days (multi-step) or not, so I don't need the precipitation value, just a "yes" or "no". I've been testing with some different tools/algorithms, and I guess the big challenge here is dealing with the zero-skewed data.
The dataset consists of hourly data with columns such as precipitation, temperature, pressure, wind speed and humidity. It has around 1 million rows. There is no requirement to use a multivariate approach.
Rain occurs mostly in months 1, 2, 3, 11 and 12.
So I tried using a univariate LSTM on the data, and the hourly sampling gave me the best results. I used the following architecture:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(150, return_sequences=True, input_shape=(1, look_back)))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, epochs=15, batch_size=4096,
                    validation_data=(testX, testY), shuffle=False)
I'm using a lookback value of 24*60 (1,440 hourly steps), which should correspond to about two months of data.
Train/Validation Loss:
https://i.stack.imgur.com/CjDbR.png
Final result:
https://i.stack.imgur.com/p6SnD.png
I read that this train/validation loss pattern means the model is underfitting. Is that right, and what could I do to prevent it?
Before using the LSTM I tried Prophet, which gave really bad results, and auto-ARIMA, which couldn't handle a yearly seasonality (365 days).
In case of underfitting, what you can do is increase the learning rate, train for longer, and use more training data.
It is also worth tracking an external metric such as the F1 score, because the loss value isn't a good metric for human evaluation.
Just looking at your example, I would start by experimenting with the loss function: your target appears to be binary, so it would be wiser to use a binary classification loss instead of a regression loss.
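For illustration, a minimal sketch of the binary-classification variant, assuming trainY/testY have been converted to 0/1 rain labels (the layer sizes are simply the ones from the question):

from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.metrics import f1_score

model = Sequential()
model.add(LSTM(150, return_sequences=True, input_shape=(1, look_back)))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1, activation='sigmoid'))  # probability of rain
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, epochs=15, batch_size=4096,
          validation_data=(testX, testY), shuffle=False)

# external metric on the held-out set
pred = (model.predict(testX) > 0.5).astype(int).ravel()
print(f1_score(testY, pred))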

Validation and training loss per batch and epoch

I am using Pytorch to run some deep learning models. I am currently keeping track of training and validation loss per epoch, which is pretty standard. However, what is the best way of going about keeping track of training and validation loss per batch/iteration?
For training loss, I could just keep a list of the loss after each training loop. But validation loss is calculated after a whole epoch, so I'm not sure how to go about tracking validation loss per batch. The only thing I can think of is to run the whole validation step after each training batch and keep track of those, but that seems like overkill and a lot of computation.
For example, the training is like this:
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
And for validation loss:
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        # validation loss
        batch_loss = error(outputs.float(), labels.long()).item()
        loss_test += batch_loss
loss_test /= len(testloader)
The validation loss/test part is done per epoch. I’m looking for a way to get the validation loss per batch, which is my point above.
Any tips?
Well, you're right that the direct way to do it is to "run the whole validation step after each training batch and keep track of those", and, as you suspected, that is time-consuming and overkill. However, if it's something you really need, there is a cheaper way. Say you have 1000 training batches. To get a per-batch val_loss you don't have to run the validation step after every single one of them (that would be 1000 full validation passes!); you can run it only after a small subset of them, say 50-100 (choose whatever you find feasible). If you pick those batches at random, the validation losses you record will be a close statistical estimate of what you would have gotten had you evaluated after all 1000 batches.
In other words, you randomly select, say, 100 of your 1000 training batches and run the validation step only after those.
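A minimal sketch of that idea in PyTorch, reusing the net, criterion, optimizer, trainloader and testloader from the question (the sampling fraction and the val_losses list are just illustrative names):

import random
import torch

eval_fraction = 0.1  # run the validation step after roughly 10% of training batches, chosen at random
val_losses = []      # one averaged validation loss per sampled training batch

for epoch in range(2):
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()

        if random.random() < eval_fraction:
            net.eval()
            with torch.no_grad():
                batch_losses = [criterion(net(x), y).item() for x, y in testloader]
            val_losses.append(sum(batch_losses) / len(batch_losses))
            net.train()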
An epoch is the process of making the model go through the entire training set, which is generally divided into batches and usually shuffled. The validation set, on the other hand, is used to tune the hyper-parameters of your training and to find out how your model behaves on new data. In that respect, evaluating at epoch=1/2 doesn't make much sense to me, because the question is: whatever the performance on the evaluation set at epoch=1/2, what can you do about it? Since you don't know which data the model has seen in the first half of the epoch, there's no way to take advantage of 'a first half being better'. And remember, your data will likely be shuffled into batches.
Therefore, I would stick with the classic approach: train on the entire set then, and only then, evaluate on another set. In some cases, you won't even allow yourself to evaluate once per epoch, because of the computation time. Instead you would evaluate every n epochs. But then again it will depend on your dataset size, your sampling from that dataset, the batch size, and the computation cost.
For the training loss, you can keep track of its value per update step rather than per epoch. This gives you much finer-grained insight into whether or not your model is learning, independently of the validation phase.
Edit - As an alternative to running the entire evaluation set after every train batch, you could do the following: shuffle your validation set and give it the same batch size as your trainset. Then:
len(trainset)//batch_size is the number of updates per epoch
len(validset)//batch_size is the number of allowed evaluations per epoch
every len(trainset)//len(validset) train updates you can evaluate on 1 validation batch
This gives you feedback every len(trainset)//len(validset) updates, i.e. len(validset)//batch_size partial evaluations per epoch. If you set your train/valid ratio to 0.1, so that len(validset) = 0.1*len(trainset), that's one partial evaluation every ten training updates; a schedule like this is sketched below.
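A minimal sketch of that schedule in PyTorch, assuming net, criterion, optimizer and trainloader as in the question, plus a hypothetical shuffled validloader that uses the same batch size:

import torch

eval_every = len(trainloader.dataset) // len(validloader.dataset)  # train updates between partial evaluations
val_iter = iter(validloader)
val_losses = []  # one entry per partial evaluation

for epoch in range(2):
    for step, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()

        if (step + 1) % eval_every == 0:
            try:
                val_inputs, val_labels = next(val_iter)
            except StopIteration:  # restart the validation loader once it is exhausted
                val_iter = iter(validloader)
                val_inputs, val_labels = next(val_iter)
            with torch.no_grad():
                val_losses.append(criterion(net(val_inputs), val_labels).item())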

Massive drop in training error after the first epoch

I am training an LSTM autoencoder to reconstruct an input consisting of eight features (floating-point numbers between 0 and 1). Currently, I am using a window size of two and training the model for 50 epochs. However, while training the network I observed that the training error (mean squared error) drops significantly after the first epoch. For example, during the first epoch the training error was 17.25; it dropped to 1.8 at the very next epoch and stagnates after the seventh epoch. I wondered whether the random initialization of the weights might be causing this, so I retrained one more network, and the same phenomenon repeated.
I am not able to deduce the reason for this significant drop in training error after the first epoch and would appreciate any help. I have attached the training error graph and model information for reference.
Model info:
LSTM_AutoencoderModel(
  (encoder): Encoder(
    (lstm1): LSTM(16, 64)
    (lstm2): LSTM(64, 16)
  )
  (decoder): Decoder(
    (lstm1): LSTM(16, 64)
    (lin1): Linear(in_features=64, out_features=16, bias=True)
  )
)
Training error graph

How to apply same processing pipeline for train and test data when they result in different final features

I'm trying to create a regression model to predict some housing sales, and I am facing an issue with processing the train data and the test data (this is not the validation data taken from the training set itself) in the same way. The steps I'm performing for the processing are as follows:
drop the columns with null values >50%
Impute the rest of the columns containing null values
One-hot encode the categorical columns
Say my train data has the following columns (after label extraction) (the ones in ** ** contain null values):
['col1', 'col2', '**col3**', 'col4', '**col5**', 'col6', '**col7**','**col8**', '**col9**', '**col10**', 'col11']
test data has the following columns:
['col1', '**col2**', 'col3', 'col4', 'col5', 'col6', '**col7**', '**col8**', '**col9**', '**col10**', 'col11']
I drop only those columns with >50% null values; the rest of the columns in bold, I impute. So, in the train data, I will have:
cols_to_drop= ['**col3**','**col5**','**col7**' ]
cols_to_impute= ['**col8**', '**col9**','**col10**' ]
And if I retain the same columns to be dropped from test data too, my test data will have the following:
cols_to_drop= ['**col3**','**col5**','**col7**' ]
cols_to_impute= ['**col2**', '**col8**', '**col9**','**col10**' ]
The problem now comes with imputation: I have to .fit_transform my imputer on the cols_to_impute of the train data and then .transform with the same fitted imputer on the cols_to_impute of the test data, but there is a clear difference in the number of features between the two cols_to_impute lists. (I tried this as well and ran into issues with the imputation.)
If instead I keep the same cols_to_impute for both the train and test datasets, ignoring the null column **col2** of the test data, I then hit an issue at one-hot encoding saying the NaNs need to be handled before encoding. So, how should the processing be done for train and test sets in such cases? Should I be concatenating both of them, performing the processing, and splitting them again later? I have read about leakage issues in doing this.
Well, you should do the following:
Combine both the train and test dataframes, then do the first two steps, i.e. dropping the columns with too many nulls and imputing the rest.
Then split it back into train and test and do the one-hot encoding.
This ensures that both data frames end up with the same columns and that there is no leakage in the one-hot encoding.
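A minimal sketch of that flow with pandas, where train_df and test_df are hypothetical dataframes with the label column already removed:

import pandas as pd

combined = pd.concat([train_df, test_df], keys=['train', 'test'])

# 1. drop columns with more than 50% nulls
combined = combined.loc[:, combined.isnull().mean() <= 0.5]

# 2. impute the remaining nulls (median for numeric columns, mode otherwise)
for col in combined.columns[combined.isnull().any()]:
    if pd.api.types.is_numeric_dtype(combined[col]):
        combined[col] = combined[col].fillna(combined[col].median())
    else:
        combined[col] = combined[col].fillna(combined[col].mode()[0])

# 3. split back, then one-hot encode; reindex the test frame so both end up
#    with exactly the same dummy columns
train_proc = pd.get_dummies(combined.loc['train'])
test_proc = pd.get_dummies(combined.loc['test']).reindex(columns=train_proc.columns, fill_value=0)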

How does Caffe determine test set accuracy?

Using the BVLC reference AlexNet file, I have been training a CNN against a training set I created.  In order to measure the progress of training, I have been using a rough method to approximate the accuracy against the training data.  My batch size on the test net is 256.  I have ~4500 images.  I perform 17 calls to solver.test_nets[0].forward() and record the value of solver.test_nets[0].blobs['accuracy'].data (the accuracy of that forward pass).  I take the average across these.  My thought was that I was taking 17 random samples of 256 from my validation set and getting the accuracy of these random samplings.  I would expect this to closely approximate the true accuracy against the entire set.  However, I later went back and wrote a script to go through each item in my LMDB so that I could generate a confusion matrix for my entire test set.  I discovered that the true accuracy of my model was significantly lower than the estimated accuracy.  For example, my expected accuracy of ~75% dropped to ~50% true accuracy.  This is a far worse result than I was expecting.
My assumptions match the answer given here.
Have I made an incorrect assumption somewhere? What could account for the difference? I had assumed that the forward() function gathered a random sample, but I'm not so sure that was the case. blobs['accuracy'].data returned a different result (though usually within a small range) every time, which is why I assumed this.
I had assumed that the forward() function gathered a random sample, but I'm not so sure that was the case. blobs['accuracy'].data returned a different result (though usually within a small range) every time, which is why I assumed this.
The forward() function in Caffe does not perform any random sampling; it only fetches the next batch according to your DataLayer. In your case, each call to forward() passes the next 256 images through your network, so performing this 17 times passes 17x256 = 4352 images sequentially, which is slightly fewer than your ~4500 images.
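To cover the whole set at least once you would need ceil(4500/256) = 18 passes. A rough sketch of estimating accuracy that way, reusing only the calls from the question (it assumes the solver has already been set up, and that the data layer wraps around to the start of the LMDB on the last pass):

import numpy as np

n_test = 4500      # approximate size of the validation LMDB
batch_size = 256
n_iters = int(np.ceil(n_test / float(batch_size)))   # 18 forward passes

accs = []
for _ in range(n_iters):
    solver.test_nets[0].forward()                     # fetches the next sequential batch
    accs.append(float(solver.test_nets[0].blobs['accuracy'].data))

# the last pass re-reads a few images from the start of the LMDB, so this is
# still an approximation; for the exact number, build the confusion matrix as you did
print('estimated accuracy: %.3f' % np.mean(accs))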
Have I made an incorrect assumption somewhere? What could account for the difference?
Check that the script that goes through your whole LMDB performs the same data pre-processing as during training.
