Tensorflow and Scikit learn: Same solution but different outputs - machine-learning

Im implementing a simple linear regression with scikitlearn and tensorflow.
My solution in scikitlearn seem fine but with tensorflow my evaluation output is showing some crazy numbers.
The problem is basically to try to predict a salary based in years of experience.
I not sure what Im doing wrong in Tensorflow's code.
Thanks!
ScikitLearn solution
import pandas as pd
data = pd.read_csv('Salary_Data.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, 1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
X_single_data = [[4.6]]
y_single_pred = regressor.predict(X_single_data)
print(f'Train score: {regressor.score(X_train, y_train)}')
print(f'Test score: {regressor.score(X_test, y_test)}')
Train score: 0.960775692121653
Test score: 0.9248580247217076
Tensorflow solution
import tensorflow as tf
f_cols = [tf.feature_column.numeric_column(key='X', shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=f_cols)
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train,shuffle=False)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test,shuffle=False)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=test_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
({'average_loss': 7675087400.0,
'label/mean': 84588.11,
'loss': 69075790000.0,
'prediction/mean': 5.0796494,
'global_step': 6},
[])
Data
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00

Per your code request in the comments: Though I had used my online curve and surface fitting web site zunzun.com for this equation at http://zunzun.com/Equation/2/Sigmoidal/Sigmoid%20B/ for the modeling work, here is a graphing source code example using the scipy differential_evolution genetic algorithm module to estimate initial parameter estimates. The scipy implementation of Differential Evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search - in this example those bounds are taken from the data maximum and minimum values, and the fit statistics and parameter values are almost identical to those from the web site.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([ 1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
yData = numpy.array([ 39.343, 46.205, 37.731, 43.525, 39.891, 56.642, 60.15, 54.445, 64.445, 57.189, 63.218, 55.794, 56.957, 57.081, 61.111, 67.938, 66.029, 83.088, 81.363, 93.94, 91.738, 98.273, 101.302, 113.812, 109.431, 105.582, 116.969, 112.635, 122.391, 121.872])
def func(x, a, b, c):
return a / (1.0 + numpy.exp(-(x-b)/c))
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = func(xData, *parameterTuple)
return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
# min and max used for bounds
maxX = max(xData)
minX = min(xData)
maxY = max(yData)
minY = min(yData)
parameterBounds = []
parameterBounds.append([minY, maxY]) # search bounds for a
parameterBounds.append([minX, maxX]) # search bounds for b
parameterBounds.append([minX, maxX]) # search bounds for c
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('Years of experience') # X axis data label
axes.set_ylabel('Salary in thousands') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

I cannot place an image in a comment, and so place it here. I suspected the relationship might be sigmoidal rather than linear, and found the following sigmoidal equation and fit statistics using units of thousands for salary: "y = a / (1.0 + exp(-(x-b)/c))" with fitted parameters a = 1.5535069418318591E+02, b = 5.4580059234664899E+00, and c = 3.7724942500630938E+00 giving an R-squared = 0.96 and RMSE = 5.30 (thousand)

Related

sklearn GP return std dev is zero for predictions where it must be large

I am trying regression using Gaussian processes sklearn package. The standard deviation on predictions are zero, where it must be larger.
kernel = ConstantKernel() + 1.0 * DotProduct() ** 0.3 + 1.0 * WhiteKernel()
gpr = GaussianProcessRegressor(
kernel=kernel,
alpha=0.3,
normalize_y=True,
random_state=123,
n_restarts_optimizer=0
)
gpr.fit(X_train, y_train)
Here I have shown the samples from posterior after training the model. It clearly shows the standard deviation increases along with x-axis.
This is the output I got. As the value increases along x-axis the stddev must increase, where as it is showing zero stddev.
Acutal results should look something like this.
Is it a bug ?
Full Code to reproduce the issue.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, WhiteKernel, DotProduct
df = pd.read_csv('train.csv')
X_train = df[:,0].to_numpy().reshape(-1,1)
y_train = df[:,1].to_numpy()
X_pred = np.linspace(0.01, 8.5, 1000).reshape(-1,1)
# Instantiate a Gaussian Process model
kernel = ConstantKernel() + 1.0 * DotProduct() ** 0.3 + 1.0 * WhiteKernel()
gpr = GaussianProcessRegressor(
kernel=kernel,
alpha=0.3,
normalize_y=True,
random_state=123,
n_restarts_optimizer=0
)
gpr.fit(X_train, y_train)
print(
f"Kernel parameters before fit:\n{kernel} \n"
f"Kernel parameters after fit: \n{gpr.kernel_} \n"
f"Log-likelihood: {gpr.log_marginal_likelihood(gpr.kernel_.theta):.3f} \n"
f"Score = {gpr.score(X_train,y_train)}"
)
n_samples = 10
y_samples = gpr.sample_y(X_pred, n_samples)
for idx, single_prior in enumerate(y_samples.T):
plt.plot(
X_pred,
single_prior,
linestyle="--",
alpha=0.7,
label=f"Sampled function #{idx + 1}",
)
plt.title('Sample from posterior distribution')
plt.show()
y_pred, sigma = gpr.predict(X_pred, return_std=True)
plt.figure(figsize=(10,6))
plt.plot(X_train, y_train, 'r.', markersize=3, label='Observations')
plt.plot(X_pred, y_pred, 'b-', label='Prediction',)
plt.fill_between(X_pred[:,0], y_pred-1*sigma, y_pred+1*sigma,
alpha=.4, fc='b', ec='None', label='68% confidence interval')
plt.fill_between(X_pred[:,0], y_pred-2*sigma, y_pred+2*sigma,
alpha=.3, fc='b', ec='None', label='95% confidence interval')
plt.fill_between(X_pred[:,0], y_pred-3*sigma, y_pred+3*sigma,
alpha=.1, fc='b', ec='None', label='99% confidence interval')
plt.legend()
plt.show()
Not really an answer but something to look out for that maybe it might help. I was having the same problem and had some results when changing the alpha, some kernel parameters or normalizing the data.
Probably it was due to a matter of scale (with big numbers, the std dev is too small in proportion)

Trying to predict running time of algorithms through regression

I'm following this paper:
http://robotics.stanford.edu/users/shoham/www%20papers/Empirical%20Hardness.pdf
and I try to predict the running time for the Traveling salesman problem on a blackbox solver.
I get some weird results during regression that I'd love to consult about:
I find it hard to believe that in XGBOOST or at any regessor the number of cities is irrelevant as a feature? as seen in XGBOOST feature importance image.
In the RIDGE and LINEAR REGRESSION results graphs you can see that for some problem instances the graphs you can see that the predicted value is negative (when we talk about run time), I saw in other question here that this is because "Linear regression does not respect the bounds of 0" and that I should put a natural log on it, but I don't know exactly where. So I'd love help with that also.
I'd also love to be reccomended on other regression models that may fit my problem.
Thanks a lot!
Here is my code pieces (google colab), followed by the results I got:
1
# Import the standard libraries of pandas.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings. filterwarnings("ignore")
sns.set_style('whitegrid')
from google.colab import files
2
# Install the solver and import its libraries, in addition import all the
# libraries with which we will prepare the features.
!pip3 install ortools
!pip install python-igraph
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
import numpy as np
import time
import random
from random import randrange
from scipy import stats
from scipy.stats import skew
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse.csgraph import depth_first_tree
from igraph import Graph, mean
import igraph
import itertools
import math
3
# Simple travelling salesman problem between cities - solver OR Tools By Google.
def create_data_model():
# Stores the data for the problem.
data = {}
# dim will be the number of Vertices\Cities in the Traveling Salesman Problem.
# Randomly select the matrix dimension in unifom distribution.
dim = np.random.randint(10, 350)
# Generate a square symmetric matrix It will be the distance matrix that the solver will solve.
square_matrice = [[0 for row in range(dim)] for col in range(dim)]
for i in range(dim):
for j in range(dim):
if i == j:
square_matrice[i][j] = 0
else:
# Randomly fill the matrix in unifom distribution.
square_matrice[i][j] = square_matrice[j][i] = np.random.randint(1, 1000)
data['distance_matrix'] = square_matrice # yapf: disable
data['num_vehicles'] = 1
data['depot'] = 0
return data
def main():
# Start measuring solution time.
start_time = time.time()
# Instantiate the data problem.
data = create_data_model()
# Create the routing index manager.
manager = pywrapcp.RoutingIndexManager(len(data['distance_matrix']),
data['num_vehicles'], data['depot'])
# Create Routing Model.
routing = pywrapcp.RoutingModel(manager)
def distance_callback(from_index, to_index):
# Returns the distance between the two nodes.
# Convert from routing variable Index to distance matrix NodeIndex.
from_node = manager.IndexToNode(from_index)
to_node = manager.IndexToNode(to_index)
return data['distance_matrix'][from_node][to_node]
transit_callback_index = routing.RegisterTransitCallback(distance_callback)
# Define cost of each arc.
routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)
# Setting first solution heuristic.
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
search_parameters.first_solution_strategy = (
routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
# Solve the problem.
solution = routing.SolveWithParameters(search_parameters)
solution_time = time.time() - start_time
'''In this part of the code we will create the following features on the distance matrix of the problem.
* Mean - Average weights of the distance matrix.
* Std - Standard Deviation of the distance matrix.
* Skewness - What is the tendency of the weights in the distance matrix.
* Noc - Number of cities we have in the distance matrix [matrix dimension].
* Td - The total distance of the solution rout.
* Dmft - Distance matrix features time, That is how long it took us to calculate all these features.
'''
dmt_start_time = time.time()
mat = np.array(data['distance_matrix'])
mean = mat.mean()
std = mat.std()
merged = list(itertools.chain(*mat))
skewness = skew(merged)
noc = len(data['distance_matrix'])
td = solution.ObjectiveValue() if solution else -1
dmft = time.time() - dmt_start_time
'''In this part of the code we will create from the distance matrix of the problem an MST and than
on the MST we take the following features.
* MST_Mean - Average weights of the MST.
* MST_Std - Standard Deviation of the MST.
* MST_Skewness - What is the tendency of the weights in the MST.
* MST_ft - MST features time, That is how long it took us to calculate the MST & all these features.
'''
spt_start_time = time.time()
X = csr_matrix(mat)
Tcsr = minimum_spanning_tree(X)
mat_st = np.array(Tcsr.toarray().astype(int))
mst_mean = mat_st.mean()
mst_std = mat_st.std()
merged_st = list(itertools.chain(*mat_st))
mst_skewness = skew(merged_st)
mst_ft = time.time() - spt_start_time
'''In this part of the code we calculate features from the MST that are considered to be
related to the rank and depth of the tracks in it.
* D_Mean - Average degree of the MST.
* D_Std - Standard Deviation of the MST degrees.
* D_Skewness - What is the tendency of the degrees in the MST.
* DFT_Mean - The average weight of the deepest track in MST.
* DFT_Std - Standard Deviation of the deepest track in MST.
* DFT_Max - The heaviest arch on the longest route in MST.
* DDFT_ft - Degree & DFT features time, That is how long it took us to calculate all these features.
'''
dstt_start_time = time.time()
g = Graph.Weighted_Adjacency(mat_st.tolist())
d_mean = igraph.statistics.mean(g.degree())
d_std = igraph.statistics.sd(g.degree())
d_skewness = skew(g.degree())
d_t = depth_first_tree(X, 0, directed=False)
mat_dt = np.array(d_t.toarray().astype(int))
dft_mean = mat_dt.mean()
dft_std = mat_dt.std()
dft_max = np.amax(mat_dt)
ddft_ft = time.time() - dstt_start_time
# In this map we will hold all the features and their results.
features_map = {'Mean': mean, 'Std': std, 'Skewness': skewness, 'Noc': noc, 'Td': td, 'Dmft': dmft,
'MST_Mean': mst_mean, 'MST_Std': mst_std, 'MST_Skewness': mst_skewness, 'MST_ft': mst_ft,
'D_Mean': d_mean, 'D_Std': d_std, 'D_Skewness': d_skewness, 'DFT_Mean': dft_mean,'DFT_Std': dft_std,
'DFT_Max': dft_max, 'DDFT_ft': ddft_ft, 'Solution_time': solution_time}
return features_map
# Main
# Create dataFrame.
data_TSP = pd.DataFrame()
# Fill the dataFrame.
for i in range(10000):
#print(i)
features_map = main()
data_TSP = data_TSP.append(features_map, ignore_index=True)
# Show data frame.
data_TSP.head()
data_TSP.to_csv('data_10000.csv')
files.download('data_10000.csv')
Regression models:
# Import the standard libraries of pandas.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings. filterwarnings("ignore")
sns.set_style('whitegrid')
2
# Neaded for opening data file in drive.
from google.colab import files
uploaded = files.upload()
import io
df = pd.read_csv(io.BytesIO(uploaded['data_10000_clean.csv']))
try:
df.drop(['Unnamed: 0'], axis=1, inplace=True)
except:
pass
df.head()
from sklearn.model_selection import train_test_split
# Split the data to training set and test set (70%, 30%)
features = list(df.drop('Solution_time', axis = 1, inplace = False))
y = df['Solution_time']
X = df[features]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
Import models which we predicted with tham the solution,
And scoring methods to evaluate these models.
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import GridSearchCV
from sklearn import linear_model
!pip3 install xgboost
from xgboost import XGBRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
!pip install scikit-plot
import scikitplot as skplt
import matplotlib as mpl
########################################## Several functions for different regression models. ##########################################
class Score:
r2 = 0.0 # This score determines how close the predictions really are to the real data.
cross_vali_score = 0.0 # How true is our algorithm if it going to well predict new data.
class Regressor:
def __init__(self, name):
self.name = name
self.score = Score()
self.y_pred = None
self.reg = None
# Map between the name of a model and the model itself.
models_map = {'Random Forest': RandomForestRegressor(), 'Xgboost': XGBRegressor(), 'Ridge': Ridge(),
'Kneighbors': KNeighborsRegressor(), 'Linear Regressor': linear_model.LinearRegression()}
# This function return a map that maps between each model and its Regressor class.
def get_models():
result_map = {}
for key, val in models_map.items():
result = Regressor(key)
reg = val
result.score.cross_vali_score = np.mean(cross_val_score(reg, X_train, y_train, cv=5))
result.reg = reg.fit(X_train, y_train)
result.y_pred = reg.predict(X_test)
result.score.r2 = r2_score(y_test, result.y_pred)
result_map[key] = result
return result_map
# This function print a graph for models of the features that influenced their decision making.
def print_influence_graph(map):
for key, val in map.items():
if key == 'Random Forest' or key == 'Xgboost':
# The parameters that most influenced the decision.
feature_imp = pd.Series(val.reg.feature_importances_,index=features).sort_values(ascending=False)
sns.barplot(x=feature_imp, y=feature_imp.index)
# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title(val.name.upper() +" - Visualizing Important Features")
plt.show()
# This function print a graph for models that show the real results against the model predictions.
def show_predicted_vs_actual(map):
for key, val in map.items():
fig, ax = plt.subplots()
ax.scatter(y_test, val.y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=3)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
ax.title.set_text(val.name.upper() +" - Predicted time vs actual time")
plt.show()
# This function print numerical scores for the models.
def print_scores(map):
for key, val in map.items():
print(val.name.upper() + ' SCORE: ')
print('R2score' + ' = ', val.score.r2)
print('Cross_val_score' + ' = ', val.score.cross_vali_score)
print('------------------------------------------\n')
# This function print a graph showing the differences between the scores of the models.
def show_models_differences_graph(map):
comp_df = pd.DataFrame(columns = ('Method', 'R2 Score', 'Cross val score'))
for i in map:
row = {'Method': i, 'R2 Score': map[i].score.r2, 'Cross val score': map[i].score.cross_vali_score}
comp_df = comp_df.append(row, ignore_index=True)
ax = comp_df.plot.bar(x='Method', rot=30, figsize=(12,6))
ax.set_title('Comparison graph')
#########################################################################################################################################
models = get_models()
print_influence_graph(models)
show_predicted_vs_actual(models)
print_scores(models)
show_models_differences_graph(models)
And here are the results:

Unable to get appropriate prediction using statsmodel for HoltWinters

I am trying to experiment with HoltWinters using some random data. However, using the statsmodel api I am unable to prediction for the next X data points.
Here is my sample code. I am unable to understand the predict API and what it means by start and end.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing
data = np.linspace(start=15, stop=25, num=100)
noise = np.random.uniform(0, 1, 100)
data = data + noise
split = int(len(data)*0.7)
data_train = data[0:split]
data_test = data[-(len(data) - split):]
model = ExponentialSmoothing(data_train)
model_fit = model.fit()
# make prediction
pred = model_fit.predict(split+1, len(data))
test_index = [i for i in range(split, len(data))]
plt.plot(data_train, label='Train')
plt.plot(test_index, data_test, label='Test')
plt.plot(test_index, pred, label='Prediction')
plt.legend(loc='best')
plt.show()
I get a weird graph for prediction and I believe it has something to do with my understanding of the predict API.
The exponential smoothing model you've chosen doesn't include a trend, so it is forecasting the best level, and that gives a horizontal line forecast.
If you do:
model = ExponentialSmoothing(data_train, trend='add')
then you will get a trend, and likely it will look more like you expect.
For example:
# Simulate some data
np.random.seed(12346)
dta = pd.Series(np.arange(100) + np.sin(np.arange(100)) * 5 + np.random.normal(scale=4, size=100))
# Perform exponention smoothing, no trend
mod1 = sm.tsa.ExponentialSmoothing(dta)
res1 = mod1.fit()
fcast1 = res1.forecast(30)
plt.plot(dta)
plt.plot(fcast1, label='Model without trend')
# Perform exponention smoothing, with a trend
mod2 = sm.tsa.ExponentialSmoothing(dta, trend='add')
res2 = mod2.fit()
fcast2 = res2.forecast(30)
plt.plot(fcast2, label='Model with trend')
plt.legend(loc='lower right')
gives the following:

Binary Classification using logistic regression with Tensorflow

I just too an ML course and am trying to get better at tensorflow. To that end, I purchased the book by Nishant Shukhla (ML with tensorflow) and am trying to run the 2 feature example with a different data set.
With the fake dataset in the book, my code runs fine. However, with data I used in the ML course, the code refuses to converge. With a really small learning rate it does converge, but the learned weights are wrong.
Also attaching the plot of the feature data. It should not a feature scaling issue as values on both features vary between 30-100 units.
I am really struggling with how opaque tensorflow is- any help would be appreciated:
""" Solution for simple logistic regression model
"""
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import numpy as np
import tensorflow as tf
import time
import matplotlib.pyplot as plt
# Define paramaters for the model
learning_rate = 0.0001
training_epochs = 300
data = np.loadtxt('ex2data1.txt', delimiter=',')
x1s = np.array(data[:,0]).astype(np.float32)
x2s = np.array(data[:,1]).astype(np.float32)
ys = np.array(data[:,2]).astype(np.float32)
print('Plotting data with + indicating (y = 1) examples and o \n indicating (y = 0) examples.\n')
color = ['red' if l == 0 else 'blue' for l in ys]
myplot = plt.scatter(x1s, x2s, color = color)
# Put some labels
plt.xlabel("Exam 1 score")
plt.ylabel("Exam 2 score")
# Specified in plot order
plt.show()
# Step 2: Create datasets
X1 = tf.placeholder(tf.float32, shape=(None,), name="x1")
X2 = tf.placeholder(tf.float32, shape=(None,), name="x2")
Y = tf.placeholder(tf.float32, shape=(None,), name="y")
w = tf.Variable(np.random.rand(3,1), name='w', dtype='float32',trainable=True)
y_model = tf.sigmoid(w[2]*X2 + w[1]*X1 + w[0])
cost = tf.reduce_mean(-tf.log(y_model*Y + (1-y_model)*(1-Y)))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
writer = tf.summary.FileWriter('./graphs/logreg', tf.get_default_graph())
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
prev_error = 0.0;
for epoch in range(training_epochs):
error, loss = sess.run([cost, train_op], feed_dict={X1:x1s, X2:x2s, Y:ys})
print("epoch = ", epoch, "loss = ", loss)
if abs(prev_error - error) < 0.0001:
break
prev_error = error
w_val = sess.run(w, {X1:x1s, X2:x2s, Y:ys})
print("w learned = ", w_val)
writer.close()
sess.close()
Both X1 and X2 range from ~20-100. However, once I scaled them, the solution converged just fine.

MXNet - application of GANs to MNIST

So this question is about GANs.
I am trying to do a trivial example for my own proof of concept; namely, generate images of hand written digits (MNIST). While most will approach this via deep convolutional gans (dgGANs), I am just trying to achieve this via the 1D array (i.e. instead of 28x28 gray-scale pixel values, a 28*28 1d array).
This git repo features a "vanilla" gans which treats the MNIST dataset as a 1d array of 784 values. Their output values look pretty acceptable so I wanted to do something similar.
Import statements
from __future__ import print_function
import matplotlib as mpl
from matplotlib import pyplot as plt
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn, utils
import numpy as np
import os
from math import floor
from random import random
import time
from datetime import datetime
import logging
ctx = mx.gpu()
np.random.seed(3)
Hyper parameters
batch_size = 100
epochs = 100
generator_learning_rate = 0.001
discriminator_learning_rate = 0.001
beta1 = 0.5
latent_z_size = 100
Load data
mnist = mx.test_utils.get_mnist()
# convert imgs to arrays
flattened_training_data = mnist["test_data"].reshape(10000, 28*28)
define models
G = nn.Sequential()
with G.name_scope():
G.add(nn.Dense(300, activation="relu"))
G.add(nn.Dense(28 * 28, activation="tanh"))
D = nn.Sequential()
with D.name_scope():
D.add(nn.Dense(128, activation="relu"))
D.add(nn.Dense(64, activation="relu"))
D.add(nn.Dense(32, activation="relu"))
D.add(nn.Dense(2, activation="tanh"))
loss = gluon.loss.SoftmaxCrossEntropyLoss()
init stuff
G.initialize(mx.init.Normal(0.02), ctx=ctx)
D.initialize(mx.init.Normal(0.02), ctx=ctx)
trainer_G = gluon.Trainer(G.collect_params(), 'adam', {"learning_rate": generator_learning_rate, "beta1": beta1})
trainer_D = gluon.Trainer(D.collect_params(), 'adam', {"learning_rate": discriminator_learning_rate, "beta1": beta1})
metric = mx.metric.Accuracy()
dynamic plot (for juptyer notebook)
import matplotlib.pyplot as plt
import time
def dynamic_line_plt(ax, y_data, colors=['r', 'b', 'g'], labels=['Line1', 'Line2', 'Line3']):
x_data = []
y_max = 0
y_min = 0
x_min = 0
x_max = 0
for y in y_data:
x_data.append(list(range(len(y))))
if max(y) > y_max:
y_max = max(y)
if min(y) < y_min:
y_min = min(y)
if len(y) > x_max:
x_max = len(y)
ax.set_ylim(y_min, y_max)
ax.set_xlim(x_min, x_max)
if ax.lines:
for i, line in enumerate(ax.lines):
line.set_xdata(x_data[i])
line.set_ydata(y_data[i])
else:
for i in range(len(y_data)):
l = ax.plot(x_data[i], y_data[i], colors[i], label=labels[i])
ax.legend()
fig.canvas.draw()
train
stamp = datetime.now().strftime('%Y_%m_%d-%H_%M')
logging.basicConfig(level=logging.DEBUG)
# arrays to store data for plotting
loss_D = nd.array([0], ctx=ctx)
loss_G = nd.array([0], ctx=ctx)
acc_d = nd.array([0], ctx=ctx)
labels = ['Discriminator Loss', 'Generator Loss', 'Discriminator Acc.']
%matplotlib notebook
fig, ax = plt.subplots(1, 1)
ax.set_xlabel('Time')
ax.set_ylabel('Loss')
dynamic_line_plt(ax, [loss_D.asnumpy(), loss_G.asnumpy(), acc_d.asnumpy()], labels=labels)
for epoch in range(epochs):
tic = time.time()
data_iter.reset()
for i, batch in enumerate(data_iter):
####################################
# Update Disriminator: maximize log(D(x)) + log(1-D(G(z)))
####################################
# extract batch of real data
data = batch.data[0].as_in_context(ctx)
# add noise
# Produce our noisey input to the generator
latent_z = mx.nd.random_normal(0,1,shape=(batch_size, latent_z_size), ctx=ctx)
# soft and noisy labels
# real_label = mx.nd.ones((batch_size, ), ctx=ctx) * nd.random_uniform(.7, 1.2, shape=(1)).asscalar()
# fake_label = mx.nd.ones((batch_size, ), ctx=ctx) * nd.random_uniform(0, .3, shape=(1)).asscalar()
# real_label = nd.random_uniform(.7, 1.2, shape=(batch_size), ctx=ctx)
# fake_label = nd.random_uniform(0, .3, shape=(batch_size), ctx=ctx)
real_label = mx.nd.ones((batch_size, ), ctx=ctx)
fake_label = mx.nd.zeros((batch_size, ), ctx=ctx)
with autograd.record():
# train with real data
real_output = D(data)
errD_real = loss(real_output, real_label)
# train with fake data
fake = G(latent_z)
fake_output = D(fake.detach())
errD_fake = loss(fake_output, fake_label)
errD = errD_real + errD_fake
errD.backward()
trainer_D.step(batch_size)
metric.update([real_label, ], [real_output,])
metric.update([fake_label, ], [fake_output,])
####################################
# Update Generator: maximize log(D(G(z)))
####################################
with autograd.record():
output = D(fake)
errG = loss(output, real_label)
errG.backward()
trainer_G.step(batch_size)
####
# Plot Loss
####
# append new data to arrays
loss_D = nd.concat(loss_D, nd.mean(errD), dim=0)
loss_G = nd.concat(loss_G, nd.mean(errG), dim=0)
name, acc = metric.get()
acc_d = nd.concat(acc_d, nd.array([acc], ctx=ctx), dim=0)
# plot array
dynamic_line_plt(ax, [loss_D.asnumpy(), loss_G.asnumpy(), acc_d.asnumpy()], labels=labels)
name, acc = metric.get()
metric.reset()
logging.info('Binary training acc at epoch %d: %s=%f' % (epoch, name, acc))
logging.info('time: %f' % (time.time() - tic))
output
img = G(mx.nd.random_normal(0,1,shape=(100, latent_z_size), ctx=ctx))[0].reshape((28, 28))
plt.imshow(img.asnumpy(),cmap='gray')
plt.show()
Now this doesn't get nearly as good as the repo's example from above. Although fairly similar.
Thus I was wondering if you could take a look and figure out why:
the colors are inverted
why the results are sub par
I have been fiddling around with this trying a lot of various things to improve the results (I will list this in a second), but for the MNIST dataset this really shouldn't be needed.
Things I have tried (and I have also tried a host of combinations):
increasing the generator network
increasing the discriminator network
using soft labeling
using noisy labeling
batch norm after every layer in the generator
batch norm of the data
normalizing all values between -1 and 1
leaky relus in the generator
drop out layers in the generator
increased learning rate of discriminator compared to generator
decreased learning rate of i compared to generator
Please let me know if you have any ideas.
1) If you look into original dataset:
training_set = mnist["train_data"].reshape(60000, 28, 28)
plt.imshow(training_set[10,:,:], cmap='gray')
you will notice that the digits are white on a black background. So, technically speaking, your results are not inversed - they match the pattern of original images you used as a real data.
If you want to invert colors for visualization purposes, you can easily do that by changing the pallete to reversed one by adding '_r' (it works for all color palletes):
plt.imshow(img.asnumpy(), cmap='gray_r')
You also can play with ranges of colors by changing vmin and vmax parameters. They control how big the difference between colors should be. By default it is calculated automatically based on provided set.
2) "Why the results are sub par" - I think this is exactly the reason why the community started to use dcGANs. To me the results in the git repo you provided are quite noisy. Surely, they are different from what you receive, and you can achieve the same quality just by changing your activation functions from tanh to sigmoid as in the example on github:
G = nn.Sequential()
with G.name_scope():
G.add(nn.Dense(300, activation="relu"))
G.add(nn.Dense(28 * 28, activation="sigmoid"))
D = nn.Sequential()
with D.name_scope():
D.add(nn.Dense(128, activation="relu"))
D.add(nn.Dense(64, activation="relu"))
D.add(nn.Dense(32, activation="relu"))
D.add(nn.Dense(2, activation="sigmoid"))
Sigmoid never goes below zero and it works better in this scenario. Here is a sample picture I get if I train updated model for 30 epochs (the rest of the hyperparameters are same).
If you decide to explore dcGAN to get even better results, take a look here - https://mxnet.incubator.apache.org/tutorials/unsupervised_learning/gan.html It is a well explained tutorial on how to build dcGAN with Mxnet and Gluon. By using dcGAN you will get way better results than that.

Resources