Neural Network for Regression with tflearn - machine-learning

My question is about coding a neural network which does regression (and NOT classification) using tflearn.
Dataset:
fixed acidity volatile acidity citric acid ... alcohol quality
7.4 0.700 0.00 ... 9.4 5
7.8 0.880 0.00 ... 9.8 5
7.8 0.760 0.04 ... 9.8 5
11.2 0.280 0.56 ... 9.8 6
7.4 0.700 0.00 ... 9.4 5
I want to build a neural network which takes in 11 features (chemical values in wine) and predicts a score, i.e., quality (out of 10). I DON'T want to classify the wine into classes like quality_1, quality_2, ... I want the model to perform regression on my features and predict a value out of 10 (which could even be a float).
The quality column in my data only has values = [3, 4, 5, 6, 7, 8, 9].
It does not contain 1, 2, and 10.
Due to my lack of experience, I could only code a neural network that CLASSIFIES the wine into classes like [score_3, score_4, ...], and I used one-hot encoding to do so.
Processed Data:
Features:
[[ 7.5999999 0.23 0.25999999 ..., 3.02999997 0.44
9.19999981]
[ 6.9000001 0.23 0.34999999 ..., 2.79999995 0.54000002
11. ]
[ 6.69999981 0.17 0.37 ..., 3.25999999 0.60000002
10.80000019]
...,
[ 6.30000019 0.28 0.47 ..., 3.11999989 0.50999999
9.5 ]
[ 5.19999981 0.64499998 0. ..., 3.77999997 0.61000001
12.5 ]
[ 8. 0.23999999 0.47999999 ..., 3.23000002 0.69999999
10. ]]
Labels:
[[ 0. 1. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 1. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 1. ..., 0. 0. 0.]]
Code for a neural network which CLASSIFIES into different classes:
import pandas as pd
import numpy as np
import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.estimator import regression
from sklearn.model_selection import train_test_split
def preprocess():
    # Raw string so the backslashes in the Windows path are not read as escapes
    data_source_red = r'F:\Gautam\...\Datasets\winequality-red.csv'
    data = pd.read_csv(data_source_red, index_col=False, sep=';')
    # One-hot encode the quality column (one indicator column per score)
    data = pd.get_dummies(data, columns=['quality'], prefix=['score'])
    x = data[data.columns[0:11]].values
    y = data[data.columns[11:18]].values
    x = np.float32(x)
    y = np.float32(y)
    return (x, y)
x, y = preprocess()
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size = 0.2)
network = input_data(shape=[None, 11], name='Input_layer')
network = fully_connected(network, 10, activation='relu', name='Hidden_layer_1')
network = fully_connected(network, 10, activation='relu', name='Hidden_layer_2')
network = fully_connected(network, 7, activation='softmax', name='Output_layer')
network = regression(network, batch_size=2, optimizer='adam', learning_rate=0.01)
model = tflearn.DNN(network)
model.fit(train_x, train_y, show_metric=True, run_id='wine_regression',
          validation_set=0.1, n_epoch=1000)
The neural network above is a poor one (accuracy = 0.40). Moreover, it classifies the data into different classes. I would like to know how to code a regression neural network which gives a score out of 10 for the input features (and NOT CLASSIFICATION). I would also prefer tflearn, as I'm quite comfortable with it.

This is the line in your code which makes your network a classifier with seven categories, instead of a regressor:
network = fully_connected(network, 7, activation='softmax', name='Output_layer')
I don't use TFLearn any more; I have switched over to Keras (which is similar, and has better support). However, I suggest that you use the following output layer instead:
network = fully_connected(network, 1, activation='linear', name='Output_layer')
Also, your training data will need to change. For regression, each label should be a single scalar (the quality score itself) rather than a one-hot vector. I assume that you still have the original data, which you say that you altered? If not, the UCI Machine Learning Repository has the wine quality data with a single, numerical quality column.
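If only the one-hot labels are at hand, the scalar scores can be recovered from them. The following is a minimal sketch using NumPy only; the sample array and the LOWEST_SCORE offset are assumptions for illustration (the question states that quality ranges over 3 to 9, so the first one-hot column is taken to mean a score of 3).

```python
import numpy as np

# Hypothetical one-hot labels with 7 columns, standing for score_3 ... score_9
one_hot = np.float32([
    [0, 1, 0, 0, 0, 0, 0],   # score_4
    [0, 0, 0, 1, 0, 0, 0],   # score_6
    [0, 0, 1, 0, 0, 0, 0],   # score_5
])

LOWEST_SCORE = 3  # assumption: the first one-hot column corresponds to quality 3

# argmax gives the column index of the 1; adding the offset recovers the score
quality = one_hot.argmax(axis=1) + LOWEST_SCORE

# Regression targets: one float per sample, shaped (n_samples, 1)
y = np.float32(quality).reshape(-1, 1)
print(y.shape)
```

The (n_samples, 1) shape matches a single-unit linear output layer. Reading the quality column directly from the original CSV, without get_dummies, would give the same targets.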

Related

What do the 'normalize' parameters mean in sklearns confusion_matrix?

I am using sklearn's confusion_matrix to plot the results, together with the accuracy, recall and precision scores, and the graph renders as it should. However, I am slightly confused about what the different values of the normalize parameter mean. Why do we normalize, and what are the differences between the three options? Quoting from the documentation:
normalize{‘true’, ‘pred’, ‘all’}, default=None
Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population.
If None, confusion matrix will not be normalized.
Does it normalize the counts to percentages so that they are easier to read when datasets are large? Or am I missing the point altogether? I have searched, but the existing questions all appear to cover how to do it rather than the meaning behind it.
A normalized version makes it easier to visually interpret how the labels are being predicted and also to perform comparisons. You can also pass values_format='.0%' to display the values as percentages. The normalize parameter specifies what the denominator should be:
'true': sum of rows (True label)
'pred': sum of columns (Predicted label)
'all': sum of all
Example:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_moons
from sklearn.metrics import plot_confusion_matrix
from sklearn.model_selection import train_test_split
# Generate some example data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=10)
# Train the classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)  # fit on the training split only
plot_confusion_matrix(clf, X_test, y_test); plt.title("Not normalized");
plot_confusion_matrix(clf, X_test, y_test, values_format= '.0%', normalize='true'); plt.title("normalize='true'");
plot_confusion_matrix(clf, X_test, y_test, values_format= '.0%', normalize='pred'); plt.title("normalize='pred'");
plot_confusion_matrix(clf, X_test, y_test, values_format= '.0%', normalize='all'); plt.title("normalize='all'");
Yes, you can think of it as a percentage. The default is to just show the absolute count in each cell of the confusion matrix, i.e. how often each combination of true and predicted category levels occurs.
But if you choose e.g. normalize='all', every count value will be divided by the sum of all count values, so that you have relative frequencies whose sum over the whole matrix is 1. Similarly, if you pick normalize='true', you will have relative frequencies per row.
If you repeat an experiment with different sample sizes, you may want to compare confusion matrices across experiments. To do so, you wouldn't want to see the total counts for each matrix. Instead, you would want to see the counts normalized but you need to decide if you want terms normalized by total number of samples ("all"), predicted class counts ("pred"), or true class counts ("true"). For example:
In [30]: yt
Out[30]: array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0])
In [31]: yp
Out[31]: array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
In [32]: confusion_matrix(yt, yp)
Out[32]:
array([[4, 3],
[3, 0]])
In [33]: confusion_matrix(yt, yp, normalize='pred')
Out[33]:
array([[0.57142857, 1. ],
[0.42857143, 0. ]])
In [34]: confusion_matrix(yt, yp, normalize='true')
Out[34]:
array([[0.57142857, 0.42857143],
[1. , 0. ]])
In [35]: confusion_matrix(yt, yp, normalize='all')
Out[35]:
array([[0.4, 0.3],
[0.3, 0. ]])
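The three denominators can also be reproduced by hand. This is a minimal NumPy sketch that starts from the unnormalized matrix shown above (rows are true labels, columns are predicted labels); it only illustrates the arithmetic and does not call scikit-learn.

```python
import numpy as np

# Unnormalized confusion matrix from the example above
cm = np.array([[4, 3],
               [3, 0]], dtype=float)

# normalize='true': divide each row by its row sum (per true class)
by_true = cm / cm.sum(axis=1, keepdims=True)

# normalize='pred': divide each column by its column sum (per predicted class)
by_pred = cm / cm.sum(axis=0, keepdims=True)

# normalize='all': divide every cell by the total number of samples
by_all = cm / cm.sum()

print(by_true)
print(by_pred)
print(by_all)
```

With 'true' every row sums to 1, with 'pred' every column sums to 1, and with 'all' the whole matrix sums to 1.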

high variance with Randomforest learner

I'm using Random Forest Regressor to fit a 10-dimensional regression problem with around 300 thousand samples. Although not necessary when dealing with Random Forest I started by putting the data on the same scale (by using preprocessing of sklearn) and then I did a randomised search over the following parameter space:
n_estimators = [int(x) for x in linspace(start=100, stop=2000, num=11)]
max_features = auto, sqrt
max_depth = from 1 to 150 with step 11
min_samples_split = 2, 5, 10, 12
min_samples_leaf = 1, 2, 4, 6
bootstrap = True or False
Moreover, after getting the best parameters I did a second narrower search.
Though I am using a 10-Fold cross validation scheme with the random search I'm still getting a serious overfitting problem!
Moreover, I have also tried using DBSCAN algorithm to check for outliers. After excluding some parts of the dataset I got even worse results!
Should I include other parameters of the Random Forest in the randomised search? or should I apply some more preprocessing techniques on the data set before fitting?
For convenience, this is the implementation I wrote:
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
# Candidate values for each hyperparameter
n_estimators = [int(x) for x in np.linspace(start=1, stop=15, num=15)]
max_features = ['auto', 'sqrt']
max_depth = [int(x) for x in np.linspace(10, 110, num=11)]
max_depth.append(None)
min_samples_split = [2, 5, 10, 12]
min_samples_leaf = [1, 2, 4, 6]
bootstrap = [True, False]
# 10 random splits, each holding out 1% of the data
cv = ShuffleSplit(n_splits=10, test_size=0.01, random_state=0)
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
rf = RandomForestRegressor()
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=50, cv=cv, verbose=2, random_state=42,
                               n_jobs=32)
# x_train and y_train are the training features and targets
rf_random.fit(x_train, y_train)
The best parameters returned by the randomized search:
bootstrap: False, min_samples_leaf: 2, n_estimators: 1647, max_features: sqrt, min_samples_split: 3, max_depth: None.
The range of the target is from 0 to 10000 [unit]. This model yields an RMSE of 6.98 [unit] on the training set and an average RMSE of 67.54 [unit] on the test sets.
This line is the problem:
max_depth = from 1 to 150 with step 11
For a 10-feature problem, the optimum depth is usually under 10. You are overfitting badly because of that. Consider searching max_depth from 1 to 15 with step 1.
min_samples_split = 2, 5, 10, 12
min_samples_leaf = 1, 2, 4, 6
These should help reduce the variance; however, the step of 11 for max_depth undoes whatever they gain.
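As a rough sketch of the suggestion above, the depth grid can be written in plain Python as follows. The other two entries are copied from the question's search space, and the dictionary is only meant to show the shape of what would be passed as param_distributions.

```python
# Narrower depth grid suggested above: every integer depth from 1 to 15
max_depth = list(range(1, 16))

# The remaining entries are copied from the question's search space
random_grid = {
    'max_depth': max_depth,
    'min_samples_split': [2, 5, 10, 12],
    'min_samples_leaf': [1, 2, 4, 6],
}
print(random_grid['max_depth'])
```

Unlike the grid in the question's code, this one deliberately omits None, since max_depth=None lets every tree grow until its leaves are pure.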

Keras convolutional network scoring low on CIFAR-10 Dataset

I'm trying to train a CNN on the CIFAR-10 Dataset in Keras, but I'm only getting around 10% accuracy, essentially random. I'm training over 50 epochs, with a batch size of 32 and learning rate of 0.01. Is there anything in particular that I am doing wrong?
import os
import numpy as np
import pandas as pd
from PIL import Image
from keras.models import Model
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout, Flatten
from keras.optimizers import SGD
from keras.utils import np_utils
# trainingData = np.array([np.array(Image.open("train/" + f)) for f in os.listdir("train")]) #shape: 50k, 32, 32, 3
# testingData = np.array([np.array(Image.open("test/" + f)) for f in os.listdir("test")]) #shape: same as training
#
# trainingLabels = np.array(pd.read_csv("trainLabels.csv"))[:,1] #categorical labels ["dog", "cat", "etc"....]
# listOfLabels = sorted(list(set(trainingLabels)))
# trainingOutput = np.array([np.array([1.0 if label == ind else 0.0 for ind in listOfLabels]) for label in trainingLabels]) #converted to output
# #for example: training output for dog =
# #[1.0, 0.0, 0.0, ...]
# np.save("trainingInput.np", trainingData)
# np.save("testingInput.np", testingData)
# np.save("trainingOutput.np", trainingOutput)
trainingInput = np.load("trainingInput.npy") #shape = 50k, 32, 32, 3
testingInput = np.load("testingInput.npy") #shape = 10k, 32, 32, 3
listOfLabels = sorted(list(set(np.array(pd.read_csv("trainLabels.csv"))[:,1]))) #categorical list of labels as strings
trainingOutput = np.load("trainingOutput.npy") #shape = 50k, 10
#looks like [0.0, 1.0, 0.0 ... 0.0, 0.0]
print(listOfLabels)
print("Data loaded\n______________\n")
inp = Input(shape=(32, 32, 3))
conva1 = Conv2D(64, (3, 3), padding='same', activation='relu')(inp)
conva2 = Conv2D(64, (3, 3), padding='same', activation='relu')(conva1)
poola = MaxPool2D(pool_size=(3, 3))(conva2)
dropa = Dropout(0.1)(poola)
convb1 = Conv2D(128, (5, 5), padding='same', activation='relu')(dropa)
convb2 = Conv2D(128, (5, 5), padding='same', activation='relu')(convb1)
poolb = MaxPool2D(pool_size=(3, 3))(convb2)
dropb = Dropout(0.1)(poolb)
flat = Flatten()(dropb)
dropc = Dropout(0.5)(flat)
out = Dense(len(listOfLabels), activation='softmax')(dropc)
print(out.shape)
model = Model(inputs=inp, outputs=out)
lrSet = SGD(lr=0.01, clipvalue=0.5)
model.compile(loss='categorical_crossentropy', optimizer=lrSet, metrics=['accuracy'])
model.fit(trainingInput, trainingOutput, batch_size=32, epochs=50, verbose=1, validation_split=0.1)
print(model.predict(testingInput))
Is there anything in particular that I am doing wrong?
Not necessarily "wrong", but some pointers I can suggest are:
It is important to rescale your data, if you are not already doing so. Instead of handling values in the range [0, 255], it is better to divide everything by 255 and work with values in [0, 1]. This helps your model's weights converge faster, as each gradient update will be more significant compared to its unscaled version.
I think that your dropout may be affecting your performance, all the more so since you are using CNNs and a strong (0.5) dropout right before your output layer. Quoting this great answer:
In the original paper that proposed dropout layers, by Hinton (2012), dropout (with p=0.5) was used on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration.
More recent research has shown some value in applying dropout also to convolutional layers, although at much lower levels: p=0.1 or 0.2.
So perhaps reducing your dropout or experimenting with it a bit will yield better results. Notice also that you are applying two dropouts in a row, which doesn't seem helpful in my opinion and could also be causing problems, so consider redesigning that part:
dropb = Dropout(0.1)(poolb) #drop
flat = Flatten()(dropb) #flatten
dropc = Dropout(0.5)(flat) #then drop again?
Your learning rate may be higher than what is normally used. Although 0.01 is SGD's default learning rate, with a higher learning rate you may be "rushing" your training and failing to find better minima that could yield better performance. Consider using a lower learning rate (0.001 or lower, adjusting epochs as needed), or adding learning rate decay (the decay argument) to your SGD instance. This may help keep your model from getting stuck in local minima that give sub-optimal results.
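The rescaling pointer above can be sketched with NumPy alone. The images array here is a hypothetical stand-in for trainingInput (a few random 32x32 RGB images); in the question's code the same division would be applied to trainingInput and testingInput before calling fit and predict.

```python
import numpy as np

# Hypothetical stand-in for trainingInput: 4 RGB images of 32x32, values 0..255
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Cast to float first, then divide so the values fall in [0, 1]
scaled = images.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

Casting before dividing keeps the result in float32, which is the usual input type for the network.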

Keras and the input layer

So I'm trying to learn ANNs with Keras, as I heard it is simpler than Theano or TensorFlow. I have a number of questions; the first is to do with the input layer.
So far I have this line of code as the input:
model.add(Dense(3 ,input_shape=(2,), batch_size=50 ,activation='relu'))
Now the data I want to add into the model is of the following shape:
Index(['stock_price', 'stock_volume', 'sentiment'], dtype='object')
[[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 1.42857143e-01]
[ 3.01440000e+02 7.87830000e+04 5.88235294e-02]
[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 0.00000000e+00]
[ 3.01440000e+02 7.87830000e+04 5.26315789e-02]]
I want to make a model to see if I can find a correlation between stock prices and tweet sentiment, and I just threw volume in there because eventually I want to see if it can find a pattern with that as well.
My second question is this: after running my input layer with several different parameters, I get a problem which I can't explain. When I run this line:
model.add(Dense(3 ,input_shape=(2,), batch_size=50 ,activation='relu'))
with the following line I get this output error:
ValueError: Error when checking model input: expected dense_1_input to have shape (50, 2) but got array with shape (50, 3)
But when I change the input shape to the requested '3' I get this error:
ValueError: Error when checking model target: expected dense_2 to have shape (50, 1) but got array with shape (50, 302)
Why has the 2 changed into '302' on the error message?
I'm probably overlooking some really basic problems, since this is the first neural net I've tried to implement; I've only used the application form of Weka before.
Anyway here is a copy of my full code:
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Input
from keras.optimizers import SGD
from keras.utils import np_utils
import pymysql as mysql
import pandas as pd
import config
import numpy
import pprint
model = Sequential()
try:
    sql = "SELECT stock_price, stock_volume, sentiment FROM tweets LIMIT 50"
    con = mysql.connect(config.dbhost, config.dbuser, config.dbpassword, config.dbname, charset='utf8mb4', autocommit=True)
    results = pd.read_sql(sql=sql, con=con, columns=['stock_price', 'stock_volume', 'sentiment'])
finally:
    con.close()
npResults = results.as_matrix()
cols = np_utils.to_categorical(results['stock_price'].values)
data = results.values
print(cols)
# inputs:
# 1st = stock price
# 2nd = tweet sentiment
# 3rd = volume
model.add(Dense(3 ,input_shape=(3,), batch_size=50 ,activation='relu'))
model.add(Dense(20, activation='linear'))
sgd = SGD(lr=0.3, decay=0.01, momentum=0.2)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()
model.fit(x=data, y=cols, epochs=100, batch_size=100, verbose=2)
EDIT:
Here is all the output I get from the console:
C:\Users\Def\Anaconda3\python.exe C:/Users/Def/Dropbox/Dissertation/ann.py
Using Theano backend.
C:\Users\Def\Dropbox\Dissertation
[[ 0. 0. 0. ..., 0. 0. 1.]
[ 0. 0. 0. ..., 0. 0. 1.]
[ 0. 0. 0. ..., 0. 0. 1.]
...,
[ 0. 0. 0. ..., 0. 0. 1.]
[ 0. 0. 0. ..., 0. 0. 1.]
[ 0. 0. 0. ..., 0. 0. 1.]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (50, 3) 12
_________________________________________________________________
dense_2 (Dense) (50, 20) 80
=================================================================
Traceback (most recent call last):
File "C:/Users/Def/Dropbox/Dissertation/ann.py", line 38, in <module>
model.fit(x=data, y=cols, epochs=100, batch_size=100, verbose=2)
File "C:\Users\Def\Anaconda3\lib\site-packages\keras\models.py", line 845, in fit
initial_epoch=initial_epoch)
File "C:\Users\Def\Anaconda3\lib\site-packages\keras\engine\training.py", line 1405, in fit
batch_size=batch_size)
File "C:\Users\Def\Anaconda3\lib\site-packages\keras\engine\training.py", line 1299, in _standardize_user_data
exception_prefix='model target')
File "C:\Users\Def\Anaconda3\lib\site-packages\keras\engine\training.py", line 133, in _standardize_input_data
str(array.shape))
ValueError: Error when checking model target: expected dense_2 to have shape (50, 20) but got array with shape (50, 302)
Total params: 92.0
Trainable params: 92
Non-trainable params: 0.0
_________________________________________________________________
Process finished with exit code 1
I think you are using the wrong loss: sparse_categorical_crossentropy
Is there a reason you prefer this over the usual categorical_crossentropy?
When using categorical_crossentropy, you should encode your targets in one-hot fashion (using, for instance, cols = np_utils.to_categorical(results['stock_price'].values)).
On the other hand, sparse_categorical_crossentropy uses integer-based labels.
So either use:
cols = np_utils.to_categorical(results['stock_price'].values)
with
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
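and an output layer of (num-categories) neurons
or use:
cols = results['stock_price'].values.astype(numpy.int32)
with
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
and integer targets, one class index per sample. Note that the output layer still needs (num-categories) neurons; only the target shrinks to a single column.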
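As for where the 302 comes from: to my understanding, to_categorical casts the values to integers and allocates one column for every class from 0 up to the largest value. The sketch below imitates that behaviour with NumPy only; one_hot is a hypothetical helper written for illustration, not the Keras function itself.

```python
import numpy as np

def one_hot(values):
    """Hypothetical stand-in for to_categorical: one column per integer class."""
    labels = np.asarray(values).astype(int)     # 301.44 is truncated to 301
    num_classes = labels.max() + 1              # classes 0 .. max inclusive
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# The stock_price column shown in the question: 50 rows of 301.44
stock_price = np.full(50, 301.44)
cols = one_hot(stock_price)
print(cols.shape)
```

A price of 301.44 becomes class 301, so the encoded target has 302 columns, which matches the (50, 302) in the error message and the 1s in the last column of the printed array. A stock price is a continuous quantity, so treating it as a class label is probably not what you want in the first place.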

How to use a Gaussian Process for Binary Classification?

I know that a Gaussian Process model is best suited for regression rather than classification. However, I would still like to apply a Gaussian Process to a classification task but I am not sure what is the best way to bin the predictions generated by the model. I have reviewed the Gaussian Process classification example that is available on the scikit-learn website at:
http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html
But I found this example confusing (I have listed the things I found confusing about this example at the end of the question). To try and get a better understanding I have created a very basic python code example using scikit-learn that generates classifications by applying a decision boundary to the predictions made by a gaussian process:
#A minimum example illustrating how to use a
#Gaussian Processes for binary classification
import numpy as np
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.gaussian_process import GaussianProcess
if __name__ == "__main__":
    #defines some basic training and test data
    #If the descriptive features have large values
    #(i.e., 8s and 9s) the target is 1
    #If the descriptive features have small values
    #(i.e., 2s and 3s) the target is 0
    TRAININPUTS = np.array([[8, 9, 9, 9, 9],
                            [9, 8, 9, 9, 9],
                            [9, 9, 8, 9, 9],
                            [9, 9, 9, 8, 9],
                            [9, 9, 9, 9, 8],
                            [2, 3, 3, 3, 3],
                            [3, 2, 3, 3, 3],
                            [3, 3, 2, 3, 3],
                            [3, 3, 3, 2, 3],
                            [3, 3, 3, 3, 2]])
    TRAINTARGETS = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
    TESTINPUTS = np.array([[8, 8, 9, 9, 9],
                           [9, 9, 8, 8, 9],
                           [3, 3, 3, 3, 3],
                           [3, 2, 3, 2, 3],
                           [3, 2, 2, 3, 2],
                           [2, 2, 2, 2, 2]])
    TESTTARGETS = np.array([1, 1, 0, 0, 0, 0])
    DECISIONBOUNDARY = 0.5
    #Fit a gaussian process model to the data
    gp = GaussianProcess(theta0=10e-1, random_start=100)
    gp.fit(TRAININPUTS, TRAINTARGETS)
    #Generate a set of predictions for the test data
    y_pred = gp.predict(TESTINPUTS)
    print "Predicted Values:"
    print y_pred
    print "----------------"
    #Convert the continuous predictions into the classes
    #by splitting on a decision boundary of 0.5
    predictions = []
    for y in y_pred:
        if y > DECISIONBOUNDARY:
            predictions.append(1)
        else:
            predictions.append(0)
    print "Binned Predictions (decision boundary = 0.5):"
    print predictions
    print "----------------"
    #print out the confusion matrix, specifying 1 as the positive class
    cm = confusion_matrix(TESTTARGETS, predictions, [1, 0])
    print "Confusion Matrix (1 as positive class):"
    print cm
    print "----------------"
    print "Classification Report:"
    print metrics.classification_report(TESTTARGETS, predictions)
When I run this code I get the following output:
Predicted Values:
[ 0.96914832 0.96914832 -0.03172673 0.03085167 0.06066993 0.11677634]
----------------
Binned Predictions (decision boundary = 0.5):
[1, 1, 0, 0, 0, 0]
----------------
Confusion Matrix (1 as positive class):
[[2 0]
[0 4]]
----------------
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 4
1 1.00 1.00 1.00 2
avg / total 1.00 1.00 1.00 6
The approach used in this basic example seems to work fine with this simple dataset. But this approach is very different from the classification example given on the scikit-learn website that I mentioned above (url repeated here):
http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html
So I'm wondering if I am missing something here. I would appreciate it if anyone could:
1. With respect to the classification example given on the scikit-learn website:
1.1 explain what the probabilities generated in this example are probabilities of. Are they the probability of the query instance belonging to the class >0?
1.2 explain why the example uses a cumulative distribution function instead of a probability density function.
1.3 explain why the example divides the predictions made by the model by the square root of the mean square error before they are input into the cumulative distribution function.
2. With respect to the basic code example I have listed here, clarify whether or not applying a simple decision boundary to the predictions generated by a Gaussian process model is an appropriate way to do binary classification.
Sorry for such a long question and thanks for any help.
In the GP classifier, a standard GP distribution over functions is "squashed," usually using the standard normal CDF (the inverse of the probit function), to map it to a distribution over binary categories.
Another interpretation of this process is through a hierarchical model (this paper has the derivation), with a hidden variable drawn from a Gaussian Process.
In sklearn's gp library, it looks like the outputs of y_pred, MSE = gp.predict(xx, eval_MSE=True) are the (approximate) posterior means (y_pred) and posterior variances (MSE) evaluated at the points in xx before any squashing occurs.
To obtain the probability that a point from the test set belongs to the positive class, you can convert the normal distribution over y_pred to a binary distribution by applying the normal CDF (see [this paper again] for details).
The hierarchical model of the probit squashing function is defined by a decision boundary of 0 (the standard normal distribution is symmetric around 0, meaning PHI(0)=.5). So you should set DECISIONBOUNDARY=0.
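The squashing step can be sketched in plain Python. The means and variances below are hypothetical placeholders for the y_pred and MSE values that gp.predict would return, and normal_cdf is a small helper written here with math.erf; treat this as an illustration of the idea rather than the scikit-learn example's exact code.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical posterior means and variances, standing in for the
# y_pred and MSE returned by gp.predict(xx, eval_MSE=True)
y_pred = [1.2, 0.0, -0.8]
mse = [0.25, 0.5, 0.16]

# Probability of the positive class: PHI(mean / standard deviation)
probs = [normal_cdf(m / math.sqrt(v)) for m, v in zip(y_pred, mse)]

# Thresholding the probability at 0.5 is the same as thresholding the mean at 0
labels = [1 if p > 0.5 else 0 for p in probs]
print(probs, labels)
```

Dividing by the square root of the MSE standardizes each prediction, so a mean that is far from 0 relative to its uncertainty gives a probability close to 0 or 1, while an uncertain prediction stays near 0.5. A density would not give a probability of class membership, which is why the CDF is used. The zero boundary presumes a latent function centered on 0; with targets coded as 0 and 1, as in the question, the raw regression output is centered nearer 0.5.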
