What does a multivariate multistep TCN actually look like? - machine-learning

I am trying to fit a temporal convolutional network (TCN) to a multivariate, multistep regression/forecasting problem, but I don't have a clear picture of what I am actually building.
What I am doing is:
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tcn import TCN  # keras-tcn package

model = tf.keras.Sequential()
model.add(TCN(input_shape=(255, 7),
              nb_filters=128,
              kernel_size=2,
              nb_stacks=2,
              dilations=[64, 32, 16, 8, 4, 2, 1],
              padding='causal',
              use_skip_connections=True,
              dropout_rate=0.01,
              return_sequences=False,
              activation="LeakyReLU",
              kernel_initializer="he_normal",
              use_batch_norm=False,
              use_layer_norm=True,
              use_weight_norm=False))
model.add(Dense(24, activation="linear"))
I am guessing my network looks similar to the appended figure from the much-appreciated WaveNet paper, but I am unsure about three things:
For multiple labels, I assume there is one full architecture like in the appended figure per label, each contributing (in my case) two edges to the output. Is that correct?
What does the network look like for multistep outputs?
For multiple targets, I am also assuming there is a full architecture per target, but combined with multiple labels this would result in an enormously large network. How does it actually work? (I sketch below what I currently imagine for this.)
Are there figures showing more complex TCNs? Is it good practice to build such TCNs, or should they be used in their simplest form and integrated into larger, parallel architectures?
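For concreteness, here is a minimal sketch (my own guess, not taken from any reference) of how I imagine a single TCN producing a multistep, multi-target forecast by reshaping one dense head; n_steps and n_targets are names I made up for this example:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Reshape
from tcn import TCN  # keras-tcn package

n_steps, n_targets = 24, 3  # hypothetical forecast horizon and number of target series

sketch = tf.keras.Sequential([
    TCN(input_shape=(255, 7), nb_filters=128, kernel_size=2,
        dilations=[1, 2, 4, 8, 16, 32, 64], return_sequences=False),
    Dense(n_steps * n_targets, activation="linear"),  # one unit per (step, target) pair
    Reshape((n_steps, n_targets)),                    # output shape: (batch, horizon, targets)
])
sketch.summary()
Is a single shared stack like this reasonable, or should each target get its own stack as in my questions above?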

Related

Keras discrepancy between .evaluate and .predict

I know this question has been asked before, but I have tried all of the suggested solutions and nothing works for me.
My Problem:
I am running a CNN to classify some images, a typical task, nothing too crazy. I compile my model as follows:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=exp_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
I fit this on my training dataset, and evaluated on my validation dataset as follows:
history = model.fit(train_dataset, validation_data = validation_dataset, epochs = 5)
And then I evaluated on a separate test set as follows:
model.evaluate(test_dataset)
Which resulted in this:
4/4 [==============================] - 30s 7s/step - loss: 1.7180 - accuracy: 0.8627
However, when I run:
model.predict(test_dataset)
I get the following confusion matrix output (shown as an image in the original post):
This clearly isn't 86% accuracy like the .evaluate method tells me; it is actually 35.39% accuracy. To make sure it wasn't an issue with my test dataset, I had the model predict on my training and validation datasets as well, and I still got a similar accuracy (~30%), despite my training and validation accuracy during fitting going up to 96% and 87%, respectively.
Question:
I don't know why .predict and .evaluate output different results. What is happening there? It seems like when I call .predict, it isn't using any of the weights I trained during fitting; in fact, given that there are 3 classes, this output is no better than blindly guessing each label. Are the weights from fitting not being carried over to prediction? My loss function is correct (I label-encoded my data as TensorFlow expects for sparse_categorical_crossentropy), and when I pass 'accuracy' as a metric, it uses the accuracy corresponding to my loss function, so all of this should be consistent. But why is there such a discrepancy between the results of .evaluate and .predict? Which one should I trust?
My Attempts to Fix My Issue:
I thought maybe the sparse categorical cross entropy wasn't right, so I one-hot encoded my target labels and used the categorical_crossentropy loss instead. I still have the EXACT same issue as above.
Concerns:
If .evaluate is incorrect, doesn't that mean my training and validation accuracy during fitting are inaccurate as well? Don't those use the same evaluation code? If so, what can I trust? The loss alone isn't a good indication of whether my model is doing well, because it is well known that a small loss does not imply good accuracy (although the converse is usually true, depending on what standard of "good" we're using). How do I gauge my model's effectiveness if my accuracy metrics aren't correct? I don't really know what to look at anymore, because I have no other way to tell whether my model is learning. If someone could help me understand what is happening, I would appreciate it so much. I'm so frustrated.
Edit: (10-28-2021: 12:26 AM)
Ok, so I'll provide some more code to really troubleshoot this.
I originally preprocessed my data as follows:
image_size = (256, 256)
batch_size = 16

train_ds = keras.preprocessing.image_dataset_from_directory(
    directory = image_directory,
    label_mode = 'categorical',
    shuffle = True,
    validation_split = 0.2,
    subset = 'training',
    seed = 24,
    batch_size = batch_size
)
val_ds = keras.preprocessing.image_dataset_from_directory(
    directory = image_directory,
    label_mode = 'categorical',
    shuffle = True,
    validation_split = 0.2,
    subset = 'validation',
    seed = 24,
    batch_size = batch_size
)
Here image_directory is a string with the path to my images. As the documentation says, image_dataset_from_directory returns a tf.data.Dataset object that yields batches of the respective (training, validation) data.
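For instance, a quick sanity check of what such a dataset yields (the shapes in the comments are only illustrative and depend on your batch_size and image_size):
# Each element of the returned tf.data.Dataset is an (images, labels) batch.
for images, labels in train_ds.take(1):
    print(images.shape)  # e.g. (16, 256, 256, 3)
    print(labels.shape)  # e.g. (16, 3) with label_mode='categorical'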
I imported the VGG16 architecture to do my classification so I called the respective preprocessing function for VGG16 as follows:
preprocess_input = tf.keras.applications.vgg16.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess_input(x), y))
val_ds = val_ds.map(lambda x, y: (preprocess_input(x), y))
This transformed the images into something that was suitable as input for VGG16. Then, in my last processing steps, I did the following validation/test split:
val_batches = tf.data.experimental.cardinality(val_ds)
test_dataset = val_ds.take(val_batches // 3)
validation_dataset = val_ds.skip(val_batches // 3)
Then I prefetched my data:
AUTOTUNE = tf.data.AUTOTUNE
train_dataset = train_ds.prefetch(buffer_size = AUTOTUNE)
validation_dataset = validation_dataset.prefetch(buffer_size = AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size = AUTOTUNE)
The Problem:
The problem occurs in the method above. I'm still not sure whether .evaluate is a true indicator of accuracy for my model. But I noticed that .evaluate and .predict always coincide when my neural network is a keras.Sequential() model. However (correct me if I'm wrong), what I suspect is that VGG16, when imported from the keras.applications API, is actually NOT a keras.Sequential() model, and therefore the .predict and .evaluate results don't coincide when I feed my data straight into that model. (I was going to post this as an answer, but I don't have sufficient knowledge or research to confirm that any of this is correct; someone please chime in, because I like learning about things I know little to nothing about. An edit this is, for now.)
In the end, I worked around my problem by calling ImageDataGenerator() instead of image_dataset_from_directory() as follows:
train_datagen = ImageDataGenerator(
    preprocessing_function = preprocess_input,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True
)
val_datagen = ImageDataGenerator(
    preprocessing_function = preprocess_input
)
train_ds = train_datagen.flow_from_directory(
    train_image_directory,
    target_size = (224, 224),
    batch_size = 16,
    seed = 24,
    shuffle = True,
    classes = ['class1', 'class2', 'class3'],
    class_mode = 'categorical'
)
test_ds = val_datagen.flow_from_directory(
    test_image_directory,
    target_size = (224, 224),
    batch_size = 16,
    seed = 24,
    shuffle = False,
    classes = ['class1', 'class2', 'class3'],
    class_mode = 'categorical'
)
(NOTE: I got this based off the following link from tensorflow's documentation: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow_from_directory)
This completes all the preprocessing for me. Then, when I call model.evaluate(test_ds), it returns the exact same result as when I do model.predict_generator(test_ds). After some minor processing of the prediction output, I use the following code for my confusion matrix:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

Y_pred = model.predict(test_ds)
y_pred = np.argmax(Y_pred, axis=1)
cf = confusion_matrix(test_ds.classes, y_pred)
sns.heatmap(cf, annot=True, xticklabels=class_names,
            yticklabels=class_names)
plt.title('Performance of Model on Testing Set')
plt.show()
This eliminates the discrepancy in the confusion matrix and the result of model.evaluate(test_ds).
The Takeaway:
If you're loading images into a classification model and your loss and accuracy match during fitting, but there is a discrepancy between your predictions and the reported loss/accuracy, try preprocessing in every way possible. I usually preprocess my images using the image_dataset_from_directory() method for all my keras.Sequential() models; however, for the VGG16 model, which I suspect is not a Sequential() model, using ImageDataGenerator(...).flow_from_directory(...) produced input in the correct format for the model to generate predictions consistent with the performance metrics.
TL;DR: I didn't answer any of my original questions, but I found a workaround. Sorry if this is spam in any way; as is the nature of most Stack Overflow posts, I hope my turmoil over the last few hours helps someone in the future.
I had the same problem, and even with ImageDataGenerator the odd behaviour persisted.
But I think the problem is the shuffle flag of the validation set.
You changed it from this:
val_ds = keras.preprocessing.image_dataset_from_directory(
    directory = image_directory,
    label_mode = 'categorical',
    shuffle = True,
    validation_split = 0.2,
    subset = 'validation',
    seed = 24,
    batch_size = batch_size
)
to this:
test_ds = val_datagen.flow_from_directory(
    test_image_directory,
    target_size = (224, 224),
    batch_size = 16,
    seed = 24,
    shuffle = False,
    classes = ['class1', 'class2', 'class3'],
    class_mode = 'categorical'
)
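With shuffle=True the dataset can be re-shuffled each time it is iterated, so labels collected for a confusion matrix no longer line up with the rows returned by model.predict. As a minimal sketch of a consistent comparison, assuming an evaluation tf.data.Dataset built with shuffle=False and one-hot labels (variable names follow the question):
import numpy as np

# The evaluation dataset is not shuffled, so the labels gathered here are in
# the same order as the rows of model.predict(test_dataset).
y_true = np.concatenate([labels.numpy() for _, labels in test_dataset])
y_prob = model.predict(test_dataset)

y_pred = np.argmax(y_prob, axis=1)
accuracy = np.mean(np.argmax(y_true, axis=1) == y_pred)
print(accuracy)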

when setting .eval() my model performs worse than when I set .train()

During the training phase, I select the model parameters with the best performance metric.
if performance_metric.item() > max_performance:
    max_performance = performance_metric.item()
    torch.save(neural_net.state_dict(), PATH + '/best_model.pt')
This is the neural network model used:
class Neural_Net(nn.Module):
    def __init__(self, M, shape_input, batch_size):
        super(Neural_Net, self).__init__()
        self.lstm = nn.LSTM(shape_input, M)
        # self.dense1 = nn.Linear(shape_input, M)
        self.dense1 = nn.Linear(M, M)  # used with the LSTM
        torch.nn.init.xavier_uniform_(self.dense1.weight)
        self.dense2 = nn.Linear(M, M)
        torch.nn.init.xavier_uniform_(self.dense2.weight)
        self.dense3 = nn.Linear(M, 1)
        torch.nn.init.xavier_uniform_(self.dense3.weight)
        self.drop = nn.Dropout(0.7)
        self.bachnorm1 = nn.BatchNorm1d(M)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.hidden_cell = (torch.zeros(1, batch_size, M), torch.zeros(1, batch_size, M))

    def forward(self, x):
        lstm_out, self.hidden_cell = self.lstm(x.view(1, len(x), -1), self.hidden_cell)
        x = self.drop(self.relu(self.dense1(self.bachnorm1(lstm_out.view(len(x), -1)))))
        x = self.drop(self.relu(self.dense2(x)))
        x = self.relu(self.dense3(x))
        return x
After that I load the model with the best parameters and set the evaluation mode:
neural_net.load_state_dict(torch.load(PATH+'/best_model.pt'))
neural_net.eval()
The results are completely random. When I set train(), the performance is similar to that of the selected best model parameters.
Is there an important aspect of eval() that I am forgetting? Is the batch normalization used correctly? I am using the same batch size for the test phase as in the training phase.
Without knowing your batch size or the sizes and distributions of your training/test datasets, it is hard to say for sure, but this issue has been discussed on the PyTorch forums previously here.
In my experience, it sounds very much like the latent representation your model learns on the training data is significantly different from the representation of your validation data. The main advice I can give is to try reducing the momentum of your batchnorm layer. It might also be worth substituting a LayerNorm layer instead (which doesn't track a running mean/standard deviation), OR setting track_running_stats=False in the BatchNorm1d layer, and seeing whether the problem persists.
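A minimal sketch of those options (M is a hypothetical hidden size standing in for the question's M):
import torch.nn as nn

M = 64  # hypothetical hidden size

# Option 1: keep BatchNorm1d but lower its momentum (the default is 0.1), so the
# running statistics are smoothed over more batches.
bn_low_momentum = nn.BatchNorm1d(M, momentum=0.01)

# Option 2: stop tracking running statistics, so eval() uses the statistics of
# the current batch instead of the stored running mean/variance.
bn_no_stats = nn.BatchNorm1d(M, track_running_stats=False)

# Option 3: swap in LayerNorm, which keeps no running mean/std at all.
ln = nn.LayerNorm(M)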

Layers for predicting financial data using Tensorflow/tflearn

I'd like to predict an interest rate, and I've got some relevant factors such as a stock index and the money supply number. The number of factors may be up to 200.
For example, the training data looks like the following, where X contains the factors and y is the interest rate I want to train on and predict:
       factor1  factor2  factor3  ...  factor176  factor177  factor178
X = [[ 2.1428   6.1557   5.4101   ...  5.86       6.0735     6.191  ]
     [ 2.168    6.1533   5.2315   ...  5.8185     6.0591     6.189  ]
     [ 2.125    4.7965   3.9443   ...  5.7845     5.9873     6.1283 ] ...]
y = [[ 3.5593]
     [ 3.014 ]
     [ 2.7125] ...]
So I want to use tensorflow/tflearn to train this model, but I don't really know which method I should choose for the regression. I have tried LinearRegression from tflearn before, but the result was not great.
For now, I just use code I found online:
import tflearn

net = tflearn.input_data([None, 178])
net = tflearn.fully_connected(net, 64, activation='linear',
                              weight_decay=0.0005)
net = tflearn.fully_connected(net, 1, activation='linear')
net = tflearn.regression(net,
                         optimizer=tflearn.optimizers.AdaGrad(learning_rate=0.01,
                                                              initial_accumulator_value=0.01),
                         loss='mean_square', learning_rate=0.05)
model = tflearn.DNN(net, tensorboard_verbose=0, checkpoint_path='tmp/')
model.fit(X, y, show_metric=True,
          batch_size=1, n_epoch=100)
The result is that only about 50% of predictions fall within an error range of ±10%.
I have also tried using a 7-day window, but the result is still bad. So I want to know what additional layers I can use to make this network better.
First of all, this network makes little sense: if you do not have any activations on your hidden units, your network is equivalent to linear regression.
So first of all, change
net = tflearn.fully_connected(net, 64, activation='linear',
                              weight_decay=0.0005)
to
net = tflearn.fully_connected(net, 64, activation='relu',
                              weight_decay=0.0005)
Another general point: always normalise your data. Your X values are big and your y values are big as well; make sure they aren't, for example by whitening them (making them zero mean and unit standard deviation). An example is sketched right below.
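A minimal numpy sketch of that kind of standardization (the arrays here are random stand-ins for the question's X and y):
import numpy as np

X = np.random.randn(100, 178) * 5 + 3  # stand-in for the 178 factors
y = np.random.randn(100, 1) * 2 + 3    # stand-in for the interest rate

# Zero mean, unit standard deviation per column; keep the statistics so that
# predictions can later be scaled back to the original units.
X_mean, X_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(axis=0), y.std(axis=0)
X_norm = (X - X_mean) / X_std
y_norm = (y - y_mean) / y_std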
Finding the right architecture is a hard problem and you will not find any "magical recipes" for it. Start by understanding what you are doing: log your training and check whether the training loss converges to small values. If it does not, you are either not training long enough, the network is too small, or the training hyperparameters are off (for example, the learning rate is too large or the regularisation too strong).

Variational Autoencoder for Feature Extraction

I would like to ask whether it would be possible (or rather, whether it makes any sense) to use a variational autoencoder for feature extraction. I ask because in the encoding step we sample from a distribution, which means the same sample can get a different encoding (due to the stochastic nature of the sampling process). Thanks!
Yes, the feature extraction goal is the same for VAEs and sparse autoencoders.
Once you have an encoder, plug a classifier in on the extracted features.
Best regards,
Yes, the output of the encoder network can be used as your features.
Think about it this way: using the output of the encoder network as input, the decoder network can generate an image quite like your original image. Therefore, the encoder output covers most of the information in your original image; in other words, it captures the most important features that distinguish it from other images.
The only thing to pay attention to is that a variational autoencoder is a stochastic feature extractor, while feature extractors are usually deterministic. You can either use the mean and variance as your extracted features, or use a Monte Carlo approach and draw from the Gaussian distribution defined by the mean and variance to get "sampled extracted features". A sketch of both options follows below.
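As a minimal numpy sketch of both options (the latent arrays below are made-up stand-ins; in practice they would come from your trained encoder, e.g. something like z_mean, z_log_var = encoder.predict(x_data)):
import numpy as np

# Made-up encoder outputs for a batch of 4 samples with a 2-D latent space.
z_mean = np.array([[0.1, -0.3], [1.2, 0.4], [-0.7, 0.9], [2.1, -1.1]])
z_log_var = np.full_like(z_mean, -2.0)

# Deterministic features: just take the mean of the latent Gaussian.
features = z_mean

# Stochastic ("Monte Carlo") features: draw one sample from the latent Gaussian.
eps = np.random.randn(*z_mean.shape)
sampled_features = z_mean + np.exp(0.5 * z_log_var) * eps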
Yes, you can.
I used the below code to extract the important features from my dataset.
prostate_df <- read.csv('your_data')
prostate_df <- prostate_df[, -1]  # drop the first column
train_df <- prostate_df
outcome_name <- 'subtype'  # my label column
feature_names <- setdiff(names(prostate_df), outcome_name)

library(h2o)
localH2O = h2o.init()
prostate.hex <- as.h2o(train_df, destination_frame = "train.hex")
prostate.dl = h2o.deeplearning(x = feature_names,
                               #y = "subtype",
                               training_frame = prostate.hex,
                               model_id = "AE100",
                               #input_dropout_ratio = 0.3,  # quite high
                               #l2 = 1e-5,                  # quite high
                               autoencoder = TRUE,
                               #validation_frame = prostate.hex,
                               #reproducible = T, seed = 1,
                               hidden = c(1), epochs = 700,
                               #activation = "Tanh",
                               #activation = "TanhWithDropout",
                               activation = "Rectifier",
                               #activation = "RectifierWithDropout",
                               standardize = TRUE,
                               #regression_stop = -1,
                               #stopping_metric = "MSE",
                               train_samples_per_iteration = 0,
                               variable_importances = TRUE
)

label1 <- ncol(train_df)
train_supervised_features2 = h2o.deepfeatures(prostate.dl, prostate.hex, layer = 1)
plotdata = as.data.frame(train_supervised_features2)
plotdata$label = as.character(as.vector(train_df[, label1]))

library(ggplot2)
qplot(DF.L1.C1, DF.L1.C2, data = plotdata, color = label,
      main = "Cancer Normal Pathway data")

prostate.anon = h2o.anomaly(prostate.dl, prostate.hex, per_feature = FALSE)
head(prostate.anon)
err <- as.data.frame(prostate.anon)

h2o.scoreHistory(prostate.dl)
head(h2o.varimp(prostate.dl), 10)
h2o.varimp_plot(prostate.dl)

Fitting 3 Normals using PyMC: wrong convergence on simple data

I wrote a PyMC model for fitting 3 Normals to data (similar to the one in this question).
import numpy as np
import pymc as mc
import matplotlib.pyplot as plt

n = 3
ndata = 500

# simulated data
v = np.random.randint(0, n, ndata)
data = (v == 0)*(10 + 1*np.random.randn(ndata)) \
     + (v == 1)*(-10 + 2*np.random.randn(ndata)) \
     + (v == 2)*3*np.random.randn(ndata)

# the model
dd = mc.Dirichlet('dd', theta=(1,)*n)
category = mc.Categorical('category', p=dd, size=ndata)
precs = mc.Gamma('precs', alpha=0.1, beta=0.1, size=n)
means = mc.Normal('means', 0, 0.001, size=n)

@mc.deterministic
def mean(category=category, means=means):
    return means[category]

@mc.deterministic
def prec(category=category, precs=precs):
    return precs[category]

obs = mc.Normal('obs', mean, prec, value=data, observed=True)
model = mc.Model({'dd': dd,
                  'category': category,
                  'precs': precs,
                  'means': means,
                  'obs': obs})

M = mc.MAP(model)
M.fit()

# mcmc sampling
mcmc = mc.MCMC(model)
mcmc.use_step_method(mc.AdaptiveMetropolis, model.means)
mcmc.use_step_method(mc.AdaptiveMetropolis, model.precs)
mcmc.sample(100000, burn=0, thin=10)

tmeans = mcmc.trace('means').gettrace()
tsd = mcmc.trace('precs').gettrace()**-.5
plt.plot(tmeans)
#plt.errorbar(range(len(tmeans)), tmeans, yerr=tsd)
plt.show()
The distributions from which I sample my data clearly overlap, yet there are 3 well-distinct peaks (see image below). Fitting 3 Normals to this kind of data should be trivial, and I would expect it to recover the means I sample from (-10, 0, 10) in 99% of MCMC runs.
Example of an outcome I would expect; this happened in 2 out of 10 cases.
Example of an unexpected result, which happened in 6 out of 10 cases. This is weird because around -5 there is no peak in the data, so I can't see a serious local minimum that the sampling could get stuck in (moving from (-5, -5) to (-6, -4) should improve the fit, and so on).
What could be the reason that the (adaptive Metropolis) MCMC sampling gets stuck in the majority of cases? What would be possible ways to improve the sampling procedure so that it doesn't?
So the runs do converge, but they do not really explore the right range.
Update: Using different priors, I get the right convergence (approximately the first picture) in 5/10 runs and the wrong one (approximately the second picture) in the other 5/10. Basically, the changed lines are the ones below, plus removing the AdaptiveMetropolis step methods:
precs = mc.Gamma('precs', alpha=2.5, beta=1, size=n)
means = mc.Normal('means', [-5, 0, 5], 0.0001, size=n)
Is there a particular reason you would like to use AdaptiveMetropolis? I imagine that vanilla MCMC wasn't working, and you got something like this:
Yeah, that's no good. There are a few comments I can make. Below I used vanilla MCMC.
1. The precision you pass for your means prior, 0.001, is too large: it corresponds to a standard deviation of only about 31 (= 1/sqrt(0.001); see the quick conversion check after these two points), which is too small. You are really forcing your means to be close to 0, whereas you want a much larger standard deviation to help explore the area. I decreased the value to 0.00001 and got this:
Perfect. Of course, a priori I knew the true means were 50, 0, and -50. Usually we don't know this, so it's always a good idea to set that precision to be quite small.
2. Do you really think all the Normals line up at 0, as your means prior suggests? (You set the prior mean of all of them to 0.) The point of this exercise is to find them to be different, so your priors should reflect that. Something like:
means = mc.Normal('means', [-5,0,5], 0.00001, size=n)
more accurately reflects your true belief. This also helps convergence by suggesting to the MCMC where the means should be. Of course, you'd have to use your best estimate to come up with these numbers (I've naively chosen -5, 0, 5 here).
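(For reference, a quick check of the precision-to-standard-deviation conversion used in point 1; PyMC's Normal takes a precision tau, with std = 1/sqrt(tau).)
import numpy as np
print(1 / np.sqrt(0.001))    # ~31.6: std implied by tau = 0.001
print(1 / np.sqrt(0.00001))  # ~316.2: std implied by tau = 0.00001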
The problem is caused by a low acceptance rate for the category variable. See the answer I gave to a similar question.
