include_bias in Polynomial Regression - machine-learning

I'm training a polynomial regression model after adding polynomial features with include_bias=True
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly_features.fit_transform(X)
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)
[1.95551099] [[0. 1.07234332 0.5122747 ]]
Question: include_bias essentially adds another feature vector of 1s, for intercept parameter (theta0). so, Im expecting intercept 1.9555 in place of 0. Why does this return 0 for theta0?
[[0. 1.07234332 0.5122747 ]]

Related

sklearn GP return std dev is zero for predictions where it must be large

I am trying regression using Gaussian processes sklearn package. The standard deviation on predictions are zero, where it must be larger.
kernel = ConstantKernel() + 1.0 * DotProduct() ** 0.3 + 1.0 * WhiteKernel()
gpr = GaussianProcessRegressor(
kernel=kernel,
alpha=0.3,
normalize_y=True,
random_state=123,
n_restarts_optimizer=0
)
gpr.fit(X_train, y_train)
Here I have shown the samples from posterior after training the model. It clearly shows the standard deviation increases along with x-axis.
This is the output I got. As the value increases along x-axis the stddev must increase, where as it is showing zero stddev.
Acutal results should look something like this.
Is it a bug ?
Full Code to reproduce the issue.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, WhiteKernel, DotProduct
df = pd.read_csv('train.csv')
X_train = df[:,0].to_numpy().reshape(-1,1)
y_train = df[:,1].to_numpy()
X_pred = np.linspace(0.01, 8.5, 1000).reshape(-1,1)
# Instantiate a Gaussian Process model
kernel = ConstantKernel() + 1.0 * DotProduct() ** 0.3 + 1.0 * WhiteKernel()
gpr = GaussianProcessRegressor(
kernel=kernel,
alpha=0.3,
normalize_y=True,
random_state=123,
n_restarts_optimizer=0
)
gpr.fit(X_train, y_train)
print(
f"Kernel parameters before fit:\n{kernel} \n"
f"Kernel parameters after fit: \n{gpr.kernel_} \n"
f"Log-likelihood: {gpr.log_marginal_likelihood(gpr.kernel_.theta):.3f} \n"
f"Score = {gpr.score(X_train,y_train)}"
)
n_samples = 10
y_samples = gpr.sample_y(X_pred, n_samples)
for idx, single_prior in enumerate(y_samples.T):
plt.plot(
X_pred,
single_prior,
linestyle="--",
alpha=0.7,
label=f"Sampled function #{idx + 1}",
)
plt.title('Sample from posterior distribution')
plt.show()
y_pred, sigma = gpr.predict(X_pred, return_std=True)
plt.figure(figsize=(10,6))
plt.plot(X_train, y_train, 'r.', markersize=3, label='Observations')
plt.plot(X_pred, y_pred, 'b-', label='Prediction',)
plt.fill_between(X_pred[:,0], y_pred-1*sigma, y_pred+1*sigma,
alpha=.4, fc='b', ec='None', label='68% confidence interval')
plt.fill_between(X_pred[:,0], y_pred-2*sigma, y_pred+2*sigma,
alpha=.3, fc='b', ec='None', label='95% confidence interval')
plt.fill_between(X_pred[:,0], y_pred-3*sigma, y_pred+3*sigma,
alpha=.1, fc='b', ec='None', label='99% confidence interval')
plt.legend()
plt.show()
Not really an answer but something to look out for that maybe it might help. I was having the same problem and had some results when changing the alpha, some kernel parameters or normalizing the data.
Probably it was due to a matter of scale (with big numbers, the std dev is too small in proportion)

How to extract coefficients from fitted pipeline for penalized logistic regression?

I have a set of training data that consists of X, which is a set of n columns of data (features), and Y, which is one column of target variable.
I am trying to train my model with logistic regression using the following pipeline:
pipeline = sklearn.pipeline.Pipeline([
('logistic_regression', LogisticRegression(penalty = 'none', C = 10))
])
My goal is to obtain the values of each of the n coefficients corresponding to the features, under the assumption of a linear model (y = coeff_0 + coeff_1*x1 + ... + coeff_n*xn).
What I tried was to train this pipeline on my data with model = pipeline.fit(X, Y). So I think that I now have the model that contains the coefficients that I want. However, I don't know how to access them. I'm looking for something like mode.best_params_('logistic_regression').
Does anyone know how to extract the fitted coefficients from a model like this?
Have a look at the scikit-learn documentation for Pipeline, this example is inspired by it:
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
# generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)
clf = svm.SVC(kernel='linear')
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
# access coefficients
print(anova_svm['svc'].coef_)
model.coef_ does the job, .best_params_ is usualy associated with GridSearch, i.e. hyperparameter optimization.
In your specific case try: model['logistic_regression'].coefs_.
Example to get the coefs from a pipeline.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
X, y = load_iris(return_X_y=True)
pipeline = Pipeline([('lr', LogisticRegression(penalty = 'l2',
C = 10))])
pipeline.fit(X, y)
pipeline['lr'].coef_
array([[-0.42923513, 2.08235619, -4.28084811, -1.97174699],
[ 1.06321671, -0.08077595, -0.46911772, -2.3221883 ],
[-0.63398158, -2.00158024, 4.74996583, 4.29393529]])
here is how to visualize the coefficients and measure model accuracy. I used the baby weight and height and gestation period to predict preterm
pipeline = Pipeline([('lr', LogisticRegression(penalty='l2',C=10))])
scaler=StandardScaler()
#X=np.array(df['gestation_wks']).reshape(-1,1)
X=scaler.fit_transform(df[['bwt_lbs','height_ft','gestation_wks']])
y=np.array(df['PreTerm'])
X_train,X_test, y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=123)
pipeline.fit(X_train,y_train)
y_pred_prob=pipeline.predict_proba(X_test)
predictions=pipeline.predict(X_test)
print(predictions)
sns.countplot(x=predictions, orient='h')
plt.show()
#print(predictions[:,0])
print(pipeline['lr'].coef_)
print(pipeline['lr'].intercept_)
print('Coefficients close to zero will contribute little to the end result')
num_err = np.sum(y != pipeline.predict(X))
print("Number of errors:", num_err)
def my_loss(y,w):
s = 0
for i in range(y.size):
# Get the true and predicted target values for example 'i'
y_i_true = y[i]
y_i_pred = w[i]
s = s + (y_i_true - y_i_pred)**2
return s
print("Loss:",my_loss(y_test,predictions))
fpr, tpr, threshholds = roc_curve(y_test,y_pred_prob[:,1])
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
accuracy=round(pipeline['lr'].score(X_train, y_train) * 100, 2)
print("Model Accuracy={accuracy}".format(accuracy=accuracy))
cm=confusion_matrix(y_test,predictions)
print(cm)

How to implement custom logloss with identical behavior to binary objective in LightGBM?

I am trying to implement my own loss function for binary classification. To get started, I want to reproduce the exact behavior of the binary objective. In particular, I want that:
The loss of both functions have the same scale
The training and validation slope is similar
predict_proba(X) returns probabilities
None of this is the case for the code below:
import sklearn.datasets
import lightgbm as lgb
import numpy as np
X, y = sklearn.datasets.load_iris(return_X_y=True)
X, y = X[y <= 1], y[y <= 1]
def loglikelihood(labels, preds):
preds = 1. / (1. + np.exp(-preds))
grad = preds - labels
hess = preds * (1. - preds)
return grad, hess
model = lgb.LGBMClassifier(objective=loglikelihood) # or "binary"
model.fit(X, y, eval_set=[(X, y)], eval_metric="binary_logloss")
lgb.plot_metric(model.evals_result_)
With objective="binary":
With objective=loglikelihood the slope is not even smooth:
Moreover, sigmoid has to be applied to model.predict_proba(X) to get probabilities for loglikelihood (as I have figured out from https://github.com/Microsoft/LightGBM/issues/2136).
Is it possible to get the same behavior with a custom loss function? Does anybody understand where all these differences come from?
Looking at the output of model.predict_proba(X) in each case, we can see that the built-in binary_logloss model returns probabilities, while the custom model returns logits.
The built-in evaluation function takes probabilities as input. To fit the custom objective, we need a custom evaluation function which will take logits as input.
Here is how you could write this. I've changed the sigmoid calculation so that it doesn't overflow if logit is a large negative number.
def loglikelihood(labels, logits):
#numerically stable sigmoid:
preds = np.where(logits >= 0,
1. / (1. + np.exp(-logits)),
np.exp(logits) / (1. + np.exp(logits)))
grad = preds - labels
hess = preds * (1. - preds)
return grad, hess
def my_eval(labels, logits):
#numerically stable logsigmoid:
logsigmoid = np.where(logits >= 0,
-np.log(1 + np.exp(-logits)),
logits - np.log(1 + np.exp(logits)))
loss = (-logsigmoid + logits * (1 - labels)).mean()
return "error", loss, False
model1 = lgb.LGBMClassifier(objective='binary')
model1.fit(X, y, eval_set=[(X, y)], eval_metric="binary_logloss")
model2 = lgb.LGBMClassifier(objective=loglikelihood)
model2.fit(X, y, eval_set=[(X, y)], eval_metric=my_eval)
Now the results are the same.

Binary Classification using logistic regression with Tensorflow

I just too an ML course and am trying to get better at tensorflow. To that end, I purchased the book by Nishant Shukhla (ML with tensorflow) and am trying to run the 2 feature example with a different data set.
With the fake dataset in the book, my code runs fine. However, with data I used in the ML course, the code refuses to converge. With a really small learning rate it does converge, but the learned weights are wrong.
Also attaching the plot of the feature data. It should not a feature scaling issue as values on both features vary between 30-100 units.
I am really struggling with how opaque tensorflow is- any help would be appreciated:
""" Solution for simple logistic regression model
"""
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import numpy as np
import tensorflow as tf
import time
import matplotlib.pyplot as plt
# Define paramaters for the model
learning_rate = 0.0001
training_epochs = 300
data = np.loadtxt('ex2data1.txt', delimiter=',')
x1s = np.array(data[:,0]).astype(np.float32)
x2s = np.array(data[:,1]).astype(np.float32)
ys = np.array(data[:,2]).astype(np.float32)
print('Plotting data with + indicating (y = 1) examples and o \n indicating (y = 0) examples.\n')
color = ['red' if l == 0 else 'blue' for l in ys]
myplot = plt.scatter(x1s, x2s, color = color)
# Put some labels
plt.xlabel("Exam 1 score")
plt.ylabel("Exam 2 score")
# Specified in plot order
plt.show()
# Step 2: Create datasets
X1 = tf.placeholder(tf.float32, shape=(None,), name="x1")
X2 = tf.placeholder(tf.float32, shape=(None,), name="x2")
Y = tf.placeholder(tf.float32, shape=(None,), name="y")
w = tf.Variable(np.random.rand(3,1), name='w', dtype='float32',trainable=True)
y_model = tf.sigmoid(w[2]*X2 + w[1]*X1 + w[0])
cost = tf.reduce_mean(-tf.log(y_model*Y + (1-y_model)*(1-Y)))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
writer = tf.summary.FileWriter('./graphs/logreg', tf.get_default_graph())
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
prev_error = 0.0;
for epoch in range(training_epochs):
error, loss = sess.run([cost, train_op], feed_dict={X1:x1s, X2:x2s, Y:ys})
print("epoch = ", epoch, "loss = ", loss)
if abs(prev_error - error) < 0.0001:
break
prev_error = error
w_val = sess.run(w, {X1:x1s, X2:x2s, Y:ys})
print("w learned = ", w_val)
writer.close()
sess.close()
Both X1 and X2 range from ~20-100. However, once I scaled them, the solution converged just fine.

Keras LSTM RNN forecast - Shifting fitted forecast backward

I am trying to use LSTM Recurrent Neural Net using Keras to forecast future purchase. My input variables are time-window of purchases for previous 5 days, and a categorical variable which I encoded as dummy variables A, B, ...,I. My input data looks like following:
>>> dataframe.head()
day price A B C D E F G H I TS_bigHolidays
0 2015-06-16 7.031160 1 0 0 0 0 0 0 0 0 0
1 2015-06-17 10.732429 1 0 0 0 0 0 0 0 0 0
2 2015-06-18 21.312692 1 0 0 0 0 0 0 0 0 0
My problem is my forecasts/fitted values (both for trained and test data) seem to be shifted forward. Here is a plot:
My question is what parameter in LSTM Keras should I change to correct this issue? Or do I need to change anything in my input data?
Here is my code:
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas
import math
import time
import csv
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from sklearn.preprocessing import MinMaxScaler
np.random.seed(1234)
exo_feature = ["A","B","C","D","E","F","G","H","I", "TS_bigHolidays"]
look_back = 5 #this is number of days we are looking back for sliding window of time series
forecast_period_length = 40
# load the dataset
dataframe = pandas.read_csv('processedDataframeGameSphere.csv', header = 0, engine='python', skipfooter=6)
dataframe["price"] = dataframe['price'].astype('float32')
scaler = MinMaxScaler(feature_range=(0, 100))
dataframe["price"] = scaler.fit_transform(dataframe['price'])
# this function is used to make sliding window for time series data
def create_dataframe(dataframe, look_back=1):
dataX, dataY = [], []
for i in range(dataframe.shape[0]-look_back-1):
price_lookback = dataframe['price'][i: (i + look_back)] #i+look_back is exclusive here
exog_feature = dataframe[exo_feature].ix[i + look_back - 1] #Y is i+ look_back ,that's why
row_i = price_lookback.append(exog_feature)
dataX.append(row_i)
dataY.append(dataframe["price"][i + look_back])
return np.array(dataX), np.array(dataY)
window_dataframe, Y = create_dataframe(dataframe, look_back)
# split into train and test sets
train_size = int(dataframe.shape[0] - forecast_period_length) #28 is the number of days we want to forecast , 4 weeks
test_size = dataframe.shape[0] - train_size
test_size_start_point_with_lookback = train_size - look_back
trainX, trainY = window_dataframe[0:train_size,:], Y[0:train_size]
print(trainX.shape)
print(trainY.shape)
#below changed datawindowY indexing, since it's just array.
testX, testY = window_dataframe[train_size:dataframe.shape[0],:], Y[train_size:dataframe.shape[0]]
# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
print(trainX.shape)
print(testX.shape)
# create and fit the LSTM network
dimension_input = testX.shape[2]
model = Sequential()
layers = [dimension_input, 50, 100, 1]
epochs = 100
model.add(LSTM(
input_dim=layers[0],
output_dim=layers[1],
return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
layers[2],
return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(
output_dim=layers[3]))
model.add(Activation("linear"))
start = time.time()
model.compile(loss="mse", optimizer="rmsprop")
print "Compilation Time : ", time.time() - start
model.fit(
trainX, trainY,
batch_size= 10, nb_epoch=epochs, validation_split=0.05,verbose =2)
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
trainScore = math.sqrt(trainScore)
trainScore = scaler.inverse_transform(np.array([[trainScore]]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = model.evaluate(testX, testY, verbose=0)
testScore = math.sqrt(testScore)
testScore = scaler.inverse_transform(np.array([[testScore]]))
print('Test Score: %.2f RMSE' % (testScore))
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
np_price = np.array(dataframe["price"])
print(np_price.shape)
np_price = np_price.reshape(np_price.shape[0],1)
trainPredictPlot = np.empty_like(np_price)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
testPredictPlot = np.empty_like(np_price)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict)+look_back+1:dataframe.shape[0], :] = testPredict
# plot baseline and predictions
plt.plot(dataframe["price"])
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
It's not a problem of LSTM, if you use just simple feed-forward network, the effect will be the same.
the problem is the network tend to mimic yesterday value instead of 'forecasting' you expect.
(it is nice strategy in term of reducing MSE loss)
you need more 'care' to avoid this issue and it's not a simple issue.

Resources