Prediction with GPyTorch DeepGP was poor

[Link to DeepGP definition](https://drive.google.com/file/d/1UwU9fz6vqxOQ3F0NSKKmRSgiP6pNnbwo/view?usp=sharing)
I have defined a DeepGP model using GPyTorch, trained it, and made predictions as follows:
test_dataset = TensorDataset(test_x, test_a)
test_loader = DataLoader(test_dataset, batch_size=128)
model.eval()
predictive_means, predictive_variances, test_lls = model.predict(test_loader)
And the predicted outputs (which are all almost the same value) are:
predictive_means.mean(0)
tensor([6.4420, 6.4423, 6.4429, 6.4441, 6.4414, 6.4440, 6.4440, 6.4435, 6.4432,
6.4435, 6.4447, 6.4446, 6.4412, 6.4447, 6.4425, 6.4439, 6.4442, 6.4440,
6.4417, 6.4431, 6.4435, 6.4417, 6.4439, 6.4438, 6.4421, 6.4432, 6.4434,
6.4421, 6.4431, 6.4424, 6.4412])
Whereas the ground truth is
test_a
tensor([5.4880, 5.4247, 7.8780, 5.5635, 8.0862, 5.9888, 8.3903, 5.5700, 6.0913,
5.6440, 5.5785, 5.4150, 8.3801, 5.5642, 8.1350, 5.4410, 5.4670, 5.7932,
5.4650, 8.4411, 5.8117, 5.5729, 7.8776, 5.4746, 5.6451, 8.0486, 6.0792,
5.4944, 5.6321, 5.7548, 5.5903])
Both the training and the test losses are huge.
What am I missing? Or is there a different way to get the actual predicted output?
I have tried:
training the model with and without scaling the inputs
different kernels and their combinations available in GPyTorch
checking the code and the behaviour of the MLL
different sets of inputs (guessing that the x and y in the data may sometimes be wrongly matched)
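For reference, a minimal sketch of what the predict method in the GPyTorch Deep GP example looks like (an assumption about your linked definition, which may differ; it presumes a standard GaussianLikelihood), followed by the averaging over the sample dimension that predictive_means.mean(0) performs:
import torch

def predict(self, test_loader):
    # Sketch along the lines of the GPyTorch Deep GP example: collect the
    # predictive mean, variance and log marginal likelihood per test batch.
    with torch.no_grad():
        mus, variances, lls = [], [], []
        for x_batch, y_batch in test_loader:
            preds = self.likelihood(self(x_batch))
            mus.append(preds.mean)
            variances.append(preds.variance)
            lls.append(self.likelihood.log_marginal(y_batch, self(x_batch)))
    return torch.cat(mus, dim=-1), torch.cat(variances, dim=-1), torch.cat(lls, dim=-1)

# The returned means carry a leading sample dimension, so the point
# prediction is the average over that dimension:
# point_predictions = predictive_means.mean(0)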

Related

Creating a new dataset of hidden state probabilities using an HMM results in different shapes after each run

I'm trying to create a new dataset of hidden state probabilities using a hidden Markov model. Everything works, except that on each run the output dataset comes up with different values (sometimes the same values) for hidden_states_train and hidden_states_test, resulting in different column sizes in the column stack / a feature mismatch, e.g. New dataset size (15261, 197) (5087, 194), New dataset size (15261, 197) (5087, 197), etc.
I can't figure out why this happens each time I run the code. I tried giving the same number of samples for both X_train_st and X_test_st, but it keeps happening. If I use a smaller range for n_comp, e.g. for n_comp in range(1,6), it often produces the same shapes.
Can someone shed some light on what's going on and suggest a possible fix, please?
newX = X_train_st
newXtest = X_test_st
for n_comp in range(1, 16):
    print("fitting to HMM and decoding %d ..." % n_comp, end="")
    modelHMM = GaussianHMM(n_components=n_comp, covariance_type="diag").fit(X_train_st)
    hidden_states_train = to_categorical(modelHMM.predict(X_train_st))
    hidden_states_test = to_categorical(modelHMM.predict(X_test_st))
    print("done")
    newX = np.column_stack((newX, hidden_states_train))
    newXtest = np.column_stack((newXtest, hidden_states_test))
    print('New dataset size', newX.shape, newXtest.shape)
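One guess (an assumption, not verified against your data): modelHMM.predict may not visit every hidden state in a given run, and to_categorical infers the number of columns from the largest state index it sees, so the one-hot width can differ between runs and between train and test. A minimal sketch of a fix, assuming to_categorical is keras.utils.to_categorical, is to pin the width with num_classes:
# Force exactly n_comp one-hot columns, even if some states are never
# predicted in a particular split, so train and test widths always match.
hidden_states_train = to_categorical(modelHMM.predict(X_train_st), num_classes=n_comp)
hidden_states_test = to_categorical(modelHMM.predict(X_test_st), num_classes=n_comp)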

Issue with multilabel classification

I followed this tutorial: https://medium.com/@vijayabhaskar96/multi-label-image-classification-tutorial-with-keras-imagedatagenerator-cd541f8eaf24
and wrote some of my code for multilabel classification. I had it working with one-hot encoding on a small scale, but I had to move to option 2 mentioned in the article because I have 6000 classes, so one-hot encoding was not viable. I managed to train the network and it reported 99% accuracy and an 83% F1 score. However, when I test the network, for every image it outputs some combination of only 3 labels out of the 6000 possible labels. I wondered if maybe the code to test the model was incorrect. I tried using the code from the post and it doesn't work:
test_generator.reset()
pred = model.predict_generator(test_generator, steps=STEP_SIZE_TEST, verbose=1);
pred_bool = (pred > 0.5)
unorderable types: list() > float()
I've tried hard to fix this without success, and I can't find any examples online of anyone doing something similar. Does anyone have an idea of how to get this prediction part working with this code block (I tried two other options and had the same issue of it printing only one or a few labels), or why the model might be failing in training with this behaviour?
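For what it's worth, here is a minimal sketch of how that thresholding step might be made to run and mapped back to label names; it assumes pred comes back as something NumPy-convertible (a plain Python list would explain the list() > float() error) and that the generator exposes class_indices, both of which would need checking against your setup:
import numpy as np

test_generator.reset()
pred = model.predict_generator(test_generator, steps=STEP_SIZE_TEST, verbose=1)

# Convert to an array before comparing; comparing a Python list with a
# float raises "unorderable types: list() > float()".
pred = np.asarray(pred)
pred_bool = pred > 0.5

# Map column indices back to label names via the generator's class mapping.
idx_to_label = {v: k for k, v in test_generator.class_indices.items()}
predicted_labels = [[idx_to_label[i] for i in np.where(row)[0]] for row in pred_bool]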
EDIT: for more context on the training issue, here is all the training code:
import json
input_file = open ('class_names_6000.json')
json_array = json.load(input_file)
#print(str(json_array))
args = parser.parse_args()
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
print('Loading Data...')
df = pd.read_csv('dataset_train.csv')
df["labels"]=df["labels"].apply(lambda x:x.split(","))
datagen=ImageDataGenerator(rescale=1./255.)
test_datagen=ImageDataGenerator(rescale=1./255.)
train_generator = datagen.flow_from_dataframe(
    dataframe=df,
    directory="",
    x_col="Filepaths",
    y_col="labels",
    batch_size=128,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    classes=json_array,
    target_size=(100, 100))
df = pd.read_csv('dataset_test.csv')
df["labels"]=df["labels"].apply(lambda x:x.split(","))
test_generator = test_datagen.flow_from_dataframe(
    dataframe=df,
    directory="",
    x_col="Filepaths",
    y_col="labels",
    batch_size=128,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    classes=json_array,
    target_size=(100, 100))
df = pd.read_csv('dataset_validation.csv')
df["labels"]=df["labels"].apply(lambda x:x.split(","))
valid_generator = test_datagen.flow_from_dataframe(
    dataframe=df,
    directory="",
    x_col="Filepaths",
    y_col="labels",
    batch_size=128,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    classes=json_array,
    target_size=(100, 100))
print('Data Loaded.')
f1_score_callback = ComputeF1()
model = build_model('train', numclasses=len(json_array), model_name = args.model)
ImageFile.LOAD_TRUNCATED_IMAGES = True
Also, an important detail: during training it reports 99% accuracy and an 84% F1 score, with a validation F1 score of 84% as well.

Tuning the classification threshold in mlr

I am training a Naive Bayes model using the mlr package.
I would like to tune the threshold (and only the threshold) for the classification. The tutorial provides an example of doing this while also doing additional hyperparameter tuning in a nested CV setting. I actually do not want to tune any other (hyper)parameter while finding the optimal threshold value.
Based on the discussion here, I set up a makeTuneWrapper() object, set another parameter (laplace) to a fixed value (1), and subsequently ran resample() in a nested CV setting.
nbayes.lrn <- makeLearner("classif.naiveBayes", predict.type = "prob")
nbayes.lrn
nbayes.pst <- makeParamSet(makeDiscreteParam("laplace", value = 1))
nbayes.tcg <- makeTuneControlGrid(tune.threshold = TRUE)
# Inner
rsmp.cv5.desc<-makeResampleDesc("CV", iters=5, stratify=TRUE)
nbayes.lrn<- makeTuneWrapper(nbayes.lrn, par.set=nbayes.pst, control=nbayes.tcg, resampling=rsmp.cv5.desc, measures=tpr)
# Outer
rsmp.cv10.desc<-makeResampleDesc("CV", iters=10, stratify=TRUE)
nbayes.res<-resample(nbayes.lrn, beispiel3.tsk, resampling= rsmp.cv10.desc, measures=list(tpr,ppv), extract=getTuneResult)
print(nbayes.res$extract)
Setting up a resampling scheme for the inner loop in the nested CV seems superfluous. The internal call to tuneThreshold() apparently does a more thorough optimization anyhow. However, calling makeTuneWrapper() without a resampling scheme leads to an error message.
I have two specific questions:
1.) Is there a simpler way of tuning a threshold (and only the threshold)?
2.) Given the setup used above: how can I access the threshold values that were actually tested?
EDIT:
This would be a code example for tuning the threshold for different measures (accuracy, sensitivity, precision) based on the answer by @Lars Kotthoff.
### Create fake data
y<-c(rep(0,500), rep(1,500))
x<-c(rep(0, 300), rep(1,200), rep(0,100), rep(1,400))
balanced.df<-data.frame(y=y, x=x)
balanced.df$y<-as.factor(balanced.df$y)
balanced.df$x<-as.factor(balanced.df$x)
balanced.tsk<-makeClassifTask(data=balanced.df, target="y", positive="1")
summarizeColumns(balanced.tsk)
### TuneThreshold
logreg.lrn<-makeLearner("classif.logreg", predict.type="prob")
logreg.mod<-train(logreg.lrn, balanced.tsk)
logreg.preds<-predict(logreg.mod, balanced.tsk)
threshold_tpr<-tuneThreshold(logreg.preds, measure=list(tpr))
threshold_tpr
threshold_acc<-tuneThreshold(logreg.preds, measure=list(acc))
threshold_acc
threshold_ppv<-tuneThreshold(logreg.preds, measure=list(ppv))
threshold_ppv
You can use tuneThreshold() directly:
require(mlr)
iris.model = train(makeLearner("classif.naiveBayes", predict.type = "prob"), iris.task)
iris.preds = predict(iris.model, iris.task)
res = tuneThreshold(iris.preds)
Unfortunately, you can't access the threshold values that were tested when using tuneThreshold(). You could however treat the threshold value as a "normal" hyperparameter and use any of the tuning methods in mlr. This would allow you to get the values and corresponding performance.

Slightly differing output from Pybrain neural network despite consistent initialisation?

I am working on a feed forward network in PyBrain. To allow me to compare the effects of varying certain parameters I have initialised the network weights myself. I have done this under the assumption that if the weights are always the same then the output should always be the same. Is this assumption incorrect? Below is the code used to set up the network
n = FeedForwardNetwork()
inLayer = LinearLayer(7, name="in")
hiddenLayer = SigmoidLayer(1, name="hidden")
outLayer = LinearLayer(1, name="out")
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
in_to_hidden = FullConnection(inLayer, hiddenLayer, name="in-to-hidden")
hidden_to_out = FullConnection(hiddenLayer, outLayer, name="hidden-to-out")
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
in_to_hidden_params = [
0.27160018, -0.30659429, 0.13443352, 0.4509613,
0.2539234, -0.8756649, 1.25660715
]
hidden_to_out_params = [0.89784474]
net_params = in_to_hidden_params + hidden_to_out_params
n._setParameters(net_params)
trainer = BackpropTrainer(n, ds, learningrate=0.01, momentum=0.8)
UPDATE
It looks like even after seeding the random number generator, reproducibility is still an issue. See the GitHub issue here
"I have done this under the assumption that if the weights are always the same then the output should always be the same."
The assumption is correct, but your code is not doing that. You are training your weights, so they do not end up being the same. Stochastic training methods often permute the training samples, and this permutation leads to different results; in particular, BackpropTrainer does so:
def train(self):
    """Train the associated module for one epoch."""
    assert len(self.ds) > 0, "Dataset cannot be empty."
    self.module.resetDerivatives()
    errors = 0
    ponderation = 0.
    shuffledSequences = []
    for seq in self.ds._provideSequences():
        shuffledSequences.append(seq)
    shuffle(shuffledSequences)
If you want repeatable results, seed your random number generators.
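A minimal sketch of what that seeding could look like here (assuming the weights are set explicitly as above; BackpropTrainer's shuffle comes from Python's random module, and NumPy is seeded as a precaution for any other randomness):
import random
import numpy as np

# Seed before building and training the network so runs with identical
# initial weights also see the training samples in the same order.
random.seed(42)
np.random.seed(42)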

What is the meaning of the GridSearchCV best_score_ attribute? (the value is different from the mean of the cross validation array)

I'm confused by the results; probably I'm not getting the concept of cross-validation and GridSearchCV right. I followed the logic behind this post:
https://randomforests.wordpress.com/2014/02/02/basics-of-k-fold-cross-validation-and-gridsearchcv-in-scikit-learn/
argd = CommandLineParser(argv)
folder,fname=argd['dir'],argd['fname']
df = pd.read_csv('../../'+folder+'/Results/'+fname, sep=";")
explanatory_variable_columns = set(df.columns.values)
response_variable_column = df['A']
explanatory_variable_columns.remove('A')
y = np.array([1 if e else 0 for e in response_variable_column])
X =df[list(explanatory_variable_columns)].as_matrix()
kf_total = KFold(len(X), n_folds=5, indices=True, shuffle=True, random_state=4)
dt=DecisionTreeClassifier(criterion='entropy')
min_samples_split_range=[x for x in range(1,20)]
dtgs=GridSearchCV(estimator=dt, param_grid=dict(min_samples_split=min_samples_split_range), n_jobs=1)
scores=[dtgs.fit(X[train],y[train]).score(X[test],y[test]) for train, test in kf_total]
# SAME AS DOING: cross_validation.cross_val_score(dtgs, X, y, cv=kf_total, n_jobs = 1)
print scores
print np.mean(scores)
print dtgs.best_score_
RESULTS OBTAINED:
# score [0.81818181818181823, 0.78181818181818186, 0.7592592592592593, 0.7592592592592593, 0.72222222222222221]
# mean score 0.768
# .best_score_ 0.683486238532
ADDITIONAL NOTE:
I ran it using another combination of the explanatory variables (using only some of them) and I got the inverse problem. Now the .best_score_ is higher than all the values in the cross validation array.
# score [0.74545454545454548, 0.70909090909090911, 0.79629629629629628, 0.7407407407407407, 0.64814814814814814]
# mean score 0.728
# .best_score_ 0.802752293578
The code is conflating several things.
dtgs.fit(X[train], y[train]) does internal 3-fold cross-validation for every parameter combination from param_grid, producing a grid of 19 results (one per candidate min_samples_split value), which you can inspect via dtgs.grid_scores_.
[dtgs.fit(X[train], y[train]).score(X[test], y[test]) for train, test in kf_total] therefore fits the grid search five times (once per outer fold) and scores each fitted search on the corresponding held-out fold; the result is the array of five outer-CV scores.
When you call dtgs.best_score_, you get the best mean score from the internal 3-fold validation over the hyperparameter grid, but only for the last of the five fits.
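To make the distinction concrete, here is a minimal sketch of the same comparison using the current scikit-learn API (model_selection rather than the old cross_validation and grid_search modules; min_samples_split starts at 2 because 1 is no longer accepted). X and y are assumed to be the arrays built in the question:
import numpy as np
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

kf = KFold(n_splits=5, shuffle=True, random_state=4)
dt = DecisionTreeClassifier(criterion='entropy')
dtgs = GridSearchCV(estimator=dt,
                    param_grid={'min_samples_split': list(range(2, 20))},
                    cv=3)

# Outer CV: five scores, one per outer fold, each from a grid search
# refitted on that fold's training part.
outer_scores = cross_val_score(dtgs, X, y, cv=kf)
print(outer_scores, np.mean(outer_scores))

# best_score_ reflects only the *inner* 3-fold CV of a single fit;
# cv_results_ (grid_scores_ in older versions) holds the full grid.
dtgs.fit(X, y)
print(dtgs.best_score_)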
