Forecasting stock prices using SVR: A timeseries based problem - time-series

I am trying to forecast stock prices (Adj Close) using SVR. I am able to train the model for training data but I'm getting an error for test data. Train data is stored in dataframe df, from 2014 to 2018 and test data is stored in dataframe test_df from 2019 till today. Here is the code:
import pandas as pd
import pandas_datareader.data as web
import datetime
import numpy as np
from matplotlib import style
# Get the stock data using yahoo API:
style.use('ggplot')
# get 2014-2018 data to train our model
start = datetime.datetime(2014,1,1)
end = datetime.datetime(2018,12,30)
df = web.DataReader("TSLA", 'yahoo', start, end)
# get 2019 data to test our model on
start = datetime.datetime(2019,1,1)
end = datetime.date.today()
test_df = web.DataReader("TSLA", 'yahoo', start, end)
# sort by date
df = df.sort_values('Date')
test_df = test_df.sort_values('Date')
# fix the date
df.reset_index(inplace=True)
df.set_index("Date", inplace=True)
test_df.reset_index(inplace=True)
test_df.set_index("Date", inplace=True)
df.tail()
# Converting dates
import matplotlib.dates as mdates
# change the dates into ints for training
dates_df = df.copy()
dates_df = dates_df.reset_index()
# Store the original dates for plotting the predicitons
org_dates = dates_df['Date']
# convert to ints
dates_df['Date'] = dates_df['Date'].map(mdates.date2num)
dates_df.tail()
# Use sklearn support vector regression to predicit our data:
from sklearn.svm import SVR
dates = dates_df['Date'].to_numpy()
prices = df['Adj Close'].to_numpy()
#Convert to 1d Vector
dates = np.reshape(dates, (len(dates), 1))
prices = np.reshape(prices, (len(prices), 1))
svr_rbf = SVR(kernel= 'rbf', C= 1e3, gamma= 0.1)
svr_rbf.fit(dates, prices)
plt.figure(figsize = (12,6))
plt.plot(dates, prices, color= 'black', label= 'Data')
plt.plot(org_dates, svr_rbf.predict(dates), color= 'red', label= 'RBF model')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
For training data it works fine till here. Next, how do I forecast test data (test_df).

Following your convention, it should look as follows:
# change the dates into ints for training
test_dates_df = test_df.copy()
test_dates_df = test_dates_df.reset_index()
# Store the original dates for plotting the predicitons
test_org_dates = test_dates_df['Date']
# convert to ints
test_dates_df['Date'] = test_dates_df['Date'].map(mdates.date2num)
test_dates = test_dates_df['Date'].to_numpy()
test_prices = test_df['Adj Close'].to_numpy()
#Convert to 1d Vector
test_dates = np.reshape(test_dates, (len(test_dates), 1))
test_prices = np.reshape(test_prices, (len(test_prices), 1))
# Predict on unseen test data
y_hat_test = svr_rbf.predict(test_dates)
# Visualize predictions against real values
plt.figure(figsize = (12,6))
plt.plot(test_dates, test_prices, color= 'black', label= 'Data')
plt.plot(test_org_dates, y_hat_test, color= 'red', label= 'RBF model (test)')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

Related

my LSTM time-series forecast predictions have a shift to the future

I am trying to make predcitions on a time-series data (univariate) using Tensorflow. However, when I see the results, the predictions have a time shift (please see the figure). Anyone ideas what the problem is?
Below is the function I use to prepare the set of inputs and outputs (X and Y) as well as normalizing the data
# define a fucntion to prepare the X and Y for training the model
def df_to_X_y_group(df_as_np, window_size, num): # input is numpy array. j_start and j_end show the features, num is the number of wavelength in a group
X=[] # make an empty list
y=[]
for j in range(0,df_as_np.shape[1]-num,num): #(start, stop, step). defining a window over features of size num wihtout any overlaps
for i in range(len(df_as_np)-window_size): # there is overlap between windows in time
row=[a for a in df_as_np[i:i+window_size, j:j+num]]
X.append(row)
label=[r for r in df_as_np[i+window_size, j:j+num]]
y.append(label)
return np.array(X), np.array(y)
# convert the data to X and y format
window_size_group=50
output_group=2
num=2
X_group , y_group= df_to_X_y_group(df_as_np, window_size_group, num) # 70085 rows (number of training samples), 6 is the size of the window (number of time steps) and 5 is the number of features
X_group.shape , y_group.shape
# Split the data to train, val and test by 70,20 and 10% of the data
X_train_group , y_train_group = X_group[:int(.7 * X_group.shape[0])], y_group[:int(.7 * X_group.shape[0])]
X_val_group , y_val_group = X_group[int(.7 * X_group.shape[0]):int(.9 * X_group.shape[0])] , y_group[int(.7 * X_group.shape[0]):int(.9 * X_group.shape[0])]
X_test_group , y_test_group = X_group[int(.9 * X_group.shape[0]):] , y_group[int(.9 * X_group.shape[0]):]
# Normalize the data
X_train_mean_group = np.mean(X_train_group)
X_train_std_group = np.std(X_train_group)
def preprocess_group(X):
X_norm = (X - X_train_mean_group) / X_train_std_group
return X_norm
# Now convert the data back to before normalization
def postprocess_group(pred):
actual = (pred * X_train_std_group) + X_train_mean_group
return actual
# apply the preprocess to the inputs and outputs
X_train_norm_group=preprocess_group(X_train_group)
X_val_norm_group=preprocess_group(X_val_group)
X_test_norm_group=preprocess_group(X_test_group)
y_train_norm_group=preprocess_group(y_train_group)
y_val_norm_group=preprocess_group(y_val_group)
y_test_norm_group=preprocess_group(y_test_group)
X_train_group.shape , y_train_group.shape , X_val_group.shape , y_val_group.shape , X_test_group.shape , y_test_group .shape
Here is the model:
# Build the model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import ModelCheckpoint # to save the model
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.losses import Huber
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.optimizers import Adam
model_group=Sequential()
model_group.add(InputLayer((X_group.shape[1],X_group.shape[2]))) # size of the input (windows (time steps) of size 6 and 1024 features )
model_group.add(LSTM(128))
model_group.add(Dense(20, 'relu'))
model_group.add(Dense(output_group))
model_group.summary()
# Train the model
cp2=ModelCheckpoint('model/group/overlapnum2' , save_best_only=True) # it saves the best model based on val error
model_group.compile(loss=MeanSquaredError(), optimizer=Adam(learning_rate=0.0001), metrics=[RootMeanSquaredError()])
model_group.fit(X_train_norm_group, y_train_norm_group, validation_data=(X_val_norm_group, y_val_norm_group), epochs=15, , callbacks=[cp2])
Below is the function to predict and plot the data:
# We make a function which predict and plot
from sklearn.metrics import mean_squared_error as mse
def plot_predictions(model, X, y, start=0, end=100):
predictions = model.predict(X[start:end])
predictions_actual=postprocess(predictions)
y_actual=postprocess(y[start:end])
plt.plot(predictions_actual, c='r', label='Predictions')
plt.plot(y_actual, label= 'Actuals')
plt.xlabel("Order of the data (TIme)")
plt.ylabel("Number of Photons")
plt.legend() # so the label shows up
return mse(y_actual, predictions_actual)
# plot
plot_predictions(model, X_test_norm, y_test_norm, start=0, end=300)
Here is the predcitions results which are shifted in time (for the true results, y(t) should be replaced with y(t+1)):
enter image description here
I have tried tuning several hyperparameters such as batch size, window size, or adding layers but none of these helped.

How to forecast actual future values using XGBoost?

So I have a solar Irradiation dataset having around 61000+ rows & 2 columns. I have made the model using XGBoost to predict the future values.
I have splitted the data in 2 parts train and test and trained the model accordingly. Furthermore, I have made the predictions on the test data set. Everything is going fine. But now I want to predict the actual forecast How can I do that ??
import os
import pandas as pd
import numpy as np
import xgboost
import matplotlib.pyplot as plt
from xgboost import plot_importance
from sklearn import metrics
# Dataset
df=pd.read_csv('Readings_last_7yr.csv')
df.index = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')
## Copied the dataset
df2 = df.copy()
del df2['Date']
## Test Train Split
from pandas import read_csv
from matplotlib import pyplot
# series = read_csv('sunspots.csv', header=0, index_col=0)
X = df2
train_size = int(len(X) * 0.75)
train, test = X[0:train_size], X[train_size:len(X)]
print('Observations: %d' % (len(X)))
print('Training Observations: %d' % (len(train)))
print('Testing Observations: %d' % (len(test)))
pyplot.plot(train)
pyplot.plot(test)
pyplot.show()
## Creating Features
def create_features(df, target_variable):
df['date'] = df.index
df['hour'] = df['date'].dt.hour
df['dayofweek'] = df['date'].dt.dayofweek
df['quarter'] = df['date'].dt.quarter
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
df['dayofyear'] = df['date'].dt.dayofyear
df['dayofmonth'] = df['date'].dt.day
df['weekofyear'] = df['date'].dt.weekofyear
X = df[['hour','dayofweek','quarter','month','year',
'dayofyear','dayofmonth','weekofyear']]
if target_variable:
y = df[target_variable]
return X, y
return X
## METRICS
def mean_absolute_percentage_error(y_true, y_pred):
y_true, y_pred = np.array(y_true), np.array(y_pred)
return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
def timeseries_evaluation_metrics_func(y_true, y_pred):
print(f'MSE is : {metrics.mean_squared_error(y_true, y_pred)}')
print(f'MAE is : {metrics.mean_absolute_error(y_true, y_pred)}')
print(f'RMSE is : {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
print(f'MAPE is : {mean_absolute_percentage_error(y_true, y_pred)}')
print(f'R2 is : {metrics.r2_score(y_true, y_pred)}',end='\n\n')
train_copy = train.copy()
test_copy = test.copy()
trainX, trainY = create_features(train_copy, target_variable='Irr')
testX, testY = create_features(test_copy, target_variable='Irr')
xgb = XGBRegressor(objective= 'reg:linear', n_estimators=1000)
xgb
xgb.fit(trainX, trainY,
eval_set=[(trainX, trainY), (testX, testY)],
early_stopping_rounds=50,
verbose=False)
# Predictions
predicted_results = xgb.predict(testX)
# Metrics
timeseries_evaluation_metrics_func(testY, predicted_results)
# Plotting graph for test and Predicted
plt.figure(figsize=(13,8))
plt.plot(list(testY))
plt.plot(list(predicted_results))
plt.title("Actual vs Predicted")
plt.ylabel("Irr")
plt.legend(('Actual','predicted'))
plt.show()
# Making graph of predicted on the whole dataframe
test['Prediction'] = predicted_results
Irr_all = pd.concat([test, train], sort=False)
Irr_all = Irr_all.rename(columns={'Irradiation':'Original_Value'})
Overview_Complete_Data_And_Prediction = Irr_all[['Irr','Prediction']].plot(figsize=(12, 5))
I am getting the results as:
So now I want to predict future values from the year 2021 till 2023.
FUTURE PREDICTION
dti = pd.date_range("2021-01-01 00:30:00", periods=20000, freq="H")
df_future_dates = pd.DataFrame(dti, columns = ['Date'])
df_future_dates['Irr'] = np.nan
df_future_dates.index = pd.to_datetime(df_future_dates['Date'], format='%Y-%m-%d %H:%M:%S')
df_future_dates_copy = df_future_dates.copy()
testX_future, testY_future = create_features(df_future_dates, target_variable='Irr')
xgb = XGBRegressor(objective= 'reg:linear', n_estimators=1000)
xgb
## Now here I have used train and test from above
xgb.fit(trainX, trainY,
eval_set=[(trainX, trainY), (testX, testY)],
early_stopping_rounds=50,
verbose=False)
predicted_results_future = xgb.predict(testX_future)
# Graph
plt.figure(figsize=(13,8))
plt.plot(list(predicted_results_future))
plt.title("Predicted")
plt.ylabel("Irr")
plt.legend(('predicted'))
plt.show()
df_future_dates_copy['Prediction'] = predicted_results_future
Irr_all_future = pd.concat([df2, df_future_dates_copy], sort=False)
# Future Graph
Overview_Complete_Data_And_Prediction_future = Irr_all_future[['Irr','Prediction']].plot(figsize=(15, 5))

Trying to predict running time of algorithms through regression

I'm following this paper:
http://robotics.stanford.edu/users/shoham/www%20papers/Empirical%20Hardness.pdf
and I try to predict the running time for the Traveling salesman problem on a blackbox solver.
I get some weird results during regression that I'd love to consult about:
I find it hard to believe that in XGBOOST or at any regessor the number of cities is irrelevant as a feature? as seen in XGBOOST feature importance image.
In the RIDGE and LINEAR REGRESSION results graphs you can see that for some problem instances the graphs you can see that the predicted value is negative (when we talk about run time), I saw in other question here that this is because "Linear regression does not respect the bounds of 0" and that I should put a natural log on it, but I don't know exactly where. So I'd love help with that also.
I'd also love to be reccomended on other regression models that may fit my problem.
Thanks a lot!
Here is my code pieces (google colab), followed by the results I got:
1
# Import the standard libraries of pandas.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings. filterwarnings("ignore")
sns.set_style('whitegrid')
from google.colab import files
2
# Install the solver and import its libraries, in addition import all the
# libraries with which we will prepare the features.
!pip3 install ortools
!pip install python-igraph
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
import numpy as np
import time
import random
from random import randrange
from scipy import stats
from scipy.stats import skew
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse.csgraph import depth_first_tree
from igraph import Graph, mean
import igraph
import itertools
import math
3
# Simple travelling salesman problem between cities - solver OR Tools By Google.
def create_data_model():
# Stores the data for the problem.
data = {}
# dim will be the number of Vertices\Cities in the Traveling Salesman Problem.
# Randomly select the matrix dimension in unifom distribution.
dim = np.random.randint(10, 350)
# Generate a square symmetric matrix It will be the distance matrix that the solver will solve.
square_matrice = [[0 for row in range(dim)] for col in range(dim)]
for i in range(dim):
for j in range(dim):
if i == j:
square_matrice[i][j] = 0
else:
# Randomly fill the matrix in unifom distribution.
square_matrice[i][j] = square_matrice[j][i] = np.random.randint(1, 1000)
data['distance_matrix'] = square_matrice # yapf: disable
data['num_vehicles'] = 1
data['depot'] = 0
return data
def main():
# Start measuring solution time.
start_time = time.time()
# Instantiate the data problem.
data = create_data_model()
# Create the routing index manager.
manager = pywrapcp.RoutingIndexManager(len(data['distance_matrix']),
data['num_vehicles'], data['depot'])
# Create Routing Model.
routing = pywrapcp.RoutingModel(manager)
def distance_callback(from_index, to_index):
# Returns the distance between the two nodes.
# Convert from routing variable Index to distance matrix NodeIndex.
from_node = manager.IndexToNode(from_index)
to_node = manager.IndexToNode(to_index)
return data['distance_matrix'][from_node][to_node]
transit_callback_index = routing.RegisterTransitCallback(distance_callback)
# Define cost of each arc.
routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)
# Setting first solution heuristic.
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
search_parameters.first_solution_strategy = (
routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
# Solve the problem.
solution = routing.SolveWithParameters(search_parameters)
solution_time = time.time() - start_time
'''In this part of the code we will create the following features on the distance matrix of the problem.
* Mean - Average weights of the distance matrix.
* Std - Standard Deviation of the distance matrix.
* Skewness - What is the tendency of the weights in the distance matrix.
* Noc - Number of cities we have in the distance matrix [matrix dimension].
* Td - The total distance of the solution rout.
* Dmft - Distance matrix features time, That is how long it took us to calculate all these features.
'''
dmt_start_time = time.time()
mat = np.array(data['distance_matrix'])
mean = mat.mean()
std = mat.std()
merged = list(itertools.chain(*mat))
skewness = skew(merged)
noc = len(data['distance_matrix'])
td = solution.ObjectiveValue() if solution else -1
dmft = time.time() - dmt_start_time
'''In this part of the code we will create from the distance matrix of the problem an MST and than
on the MST we take the following features.
* MST_Mean - Average weights of the MST.
* MST_Std - Standard Deviation of the MST.
* MST_Skewness - What is the tendency of the weights in the MST.
* MST_ft - MST features time, That is how long it took us to calculate the MST & all these features.
'''
spt_start_time = time.time()
X = csr_matrix(mat)
Tcsr = minimum_spanning_tree(X)
mat_st = np.array(Tcsr.toarray().astype(int))
mst_mean = mat_st.mean()
mst_std = mat_st.std()
merged_st = list(itertools.chain(*mat_st))
mst_skewness = skew(merged_st)
mst_ft = time.time() - spt_start_time
'''In this part of the code we calculate features from the MST that are considered to be
related to the rank and depth of the tracks in it.
* D_Mean - Average degree of the MST.
* D_Std - Standard Deviation of the MST degrees.
* D_Skewness - What is the tendency of the degrees in the MST.
* DFT_Mean - The average weight of the deepest track in MST.
* DFT_Std - Standard Deviation of the deepest track in MST.
* DFT_Max - The heaviest arch on the longest route in MST.
* DDFT_ft - Degree & DFT features time, That is how long it took us to calculate all these features.
'''
dstt_start_time = time.time()
g = Graph.Weighted_Adjacency(mat_st.tolist())
d_mean = igraph.statistics.mean(g.degree())
d_std = igraph.statistics.sd(g.degree())
d_skewness = skew(g.degree())
d_t = depth_first_tree(X, 0, directed=False)
mat_dt = np.array(d_t.toarray().astype(int))
dft_mean = mat_dt.mean()
dft_std = mat_dt.std()
dft_max = np.amax(mat_dt)
ddft_ft = time.time() - dstt_start_time
# In this map we will hold all the features and their results.
features_map = {'Mean': mean, 'Std': std, 'Skewness': skewness, 'Noc': noc, 'Td': td, 'Dmft': dmft,
'MST_Mean': mst_mean, 'MST_Std': mst_std, 'MST_Skewness': mst_skewness, 'MST_ft': mst_ft,
'D_Mean': d_mean, 'D_Std': d_std, 'D_Skewness': d_skewness, 'DFT_Mean': dft_mean,'DFT_Std': dft_std,
'DFT_Max': dft_max, 'DDFT_ft': ddft_ft, 'Solution_time': solution_time}
return features_map
# Main
# Create dataFrame.
data_TSP = pd.DataFrame()
# Fill the dataFrame.
for i in range(10000):
#print(i)
features_map = main()
data_TSP = data_TSP.append(features_map, ignore_index=True)
# Show data frame.
data_TSP.head()
data_TSP.to_csv('data_10000.csv')
files.download('data_10000.csv')
Regression models:
# Import the standard libraries of pandas.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings. filterwarnings("ignore")
sns.set_style('whitegrid')
2
# Neaded for opening data file in drive.
from google.colab import files
uploaded = files.upload()
import io
df = pd.read_csv(io.BytesIO(uploaded['data_10000_clean.csv']))
try:
df.drop(['Unnamed: 0'], axis=1, inplace=True)
except:
pass
df.head()
from sklearn.model_selection import train_test_split
# Split the data to training set and test set (70%, 30%)
features = list(df.drop('Solution_time', axis = 1, inplace = False))
y = df['Solution_time']
X = df[features]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
Import models which we predicted with tham the solution,
And scoring methods to evaluate these models.
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import GridSearchCV
from sklearn import linear_model
!pip3 install xgboost
from xgboost import XGBRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
!pip install scikit-plot
import scikitplot as skplt
import matplotlib as mpl
########################################## Several functions for different regression models. ##########################################
class Score:
r2 = 0.0 # This score determines how close the predictions really are to the real data.
cross_vali_score = 0.0 # How true is our algorithm if it going to well predict new data.
class Regressor:
def __init__(self, name):
self.name = name
self.score = Score()
self.y_pred = None
self.reg = None
# Map between the name of a model and the model itself.
models_map = {'Random Forest': RandomForestRegressor(), 'Xgboost': XGBRegressor(), 'Ridge': Ridge(),
'Kneighbors': KNeighborsRegressor(), 'Linear Regressor': linear_model.LinearRegression()}
# This function return a map that maps between each model and its Regressor class.
def get_models():
result_map = {}
for key, val in models_map.items():
result = Regressor(key)
reg = val
result.score.cross_vali_score = np.mean(cross_val_score(reg, X_train, y_train, cv=5))
result.reg = reg.fit(X_train, y_train)
result.y_pred = reg.predict(X_test)
result.score.r2 = r2_score(y_test, result.y_pred)
result_map[key] = result
return result_map
# This function print a graph for models of the features that influenced their decision making.
def print_influence_graph(map):
for key, val in map.items():
if key == 'Random Forest' or key == 'Xgboost':
# The parameters that most influenced the decision.
feature_imp = pd.Series(val.reg.feature_importances_,index=features).sort_values(ascending=False)
sns.barplot(x=feature_imp, y=feature_imp.index)
# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title(val.name.upper() +" - Visualizing Important Features")
plt.show()
# This function print a graph for models that show the real results against the model predictions.
def show_predicted_vs_actual(map):
for key, val in map.items():
fig, ax = plt.subplots()
ax.scatter(y_test, val.y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=3)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
ax.title.set_text(val.name.upper() +" - Predicted time vs actual time")
plt.show()
# This function print numerical scores for the models.
def print_scores(map):
for key, val in map.items():
print(val.name.upper() + ' SCORE: ')
print('R2score' + ' = ', val.score.r2)
print('Cross_val_score' + ' = ', val.score.cross_vali_score)
print('------------------------------------------\n')
# This function print a graph showing the differences between the scores of the models.
def show_models_differences_graph(map):
comp_df = pd.DataFrame(columns = ('Method', 'R2 Score', 'Cross val score'))
for i in map:
row = {'Method': i, 'R2 Score': map[i].score.r2, 'Cross val score': map[i].score.cross_vali_score}
comp_df = comp_df.append(row, ignore_index=True)
ax = comp_df.plot.bar(x='Method', rot=30, figsize=(12,6))
ax.set_title('Comparison graph')
#########################################################################################################################################
models = get_models()
print_influence_graph(models)
show_predicted_vs_actual(models)
print_scores(models)
show_models_differences_graph(models)
And here are the results:

Unable to get appropriate prediction using statsmodel for HoltWinters

I am trying to experiment with HoltWinters using some random data. However, using the statsmodel api I am unable to prediction for the next X data points.
Here is my sample code. I am unable to understand the predict API and what it means by start and end.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing
data = np.linspace(start=15, stop=25, num=100)
noise = np.random.uniform(0, 1, 100)
data = data + noise
split = int(len(data)*0.7)
data_train = data[0:split]
data_test = data[-(len(data) - split):]
model = ExponentialSmoothing(data_train)
model_fit = model.fit()
# make prediction
pred = model_fit.predict(split+1, len(data))
test_index = [i for i in range(split, len(data))]
plt.plot(data_train, label='Train')
plt.plot(test_index, data_test, label='Test')
plt.plot(test_index, pred, label='Prediction')
plt.legend(loc='best')
plt.show()
I get a weird graph for prediction and I believe it has something to do with my understanding of the predict API.
The exponential smoothing model you've chosen doesn't include a trend, so it is forecasting the best level, and that gives a horizontal line forecast.
If you do:
model = ExponentialSmoothing(data_train, trend='add')
then you will get a trend, and likely it will look more like you expect.
For example:
# Simulate some data
np.random.seed(12346)
dta = pd.Series(np.arange(100) + np.sin(np.arange(100)) * 5 + np.random.normal(scale=4, size=100))
# Perform exponention smoothing, no trend
mod1 = sm.tsa.ExponentialSmoothing(dta)
res1 = mod1.fit()
fcast1 = res1.forecast(30)
plt.plot(dta)
plt.plot(fcast1, label='Model without trend')
# Perform exponention smoothing, with a trend
mod2 = sm.tsa.ExponentialSmoothing(dta, trend='add')
res2 = mod2.fit()
fcast2 = res2.forecast(30)
plt.plot(fcast2, label='Model with trend')
plt.legend(loc='lower right')
gives the following:

MXNet - application of GANs to MNIST

So this question is about GANs.
I am trying to do a trivial example for my own proof of concept; namely, generate images of hand written digits (MNIST). While most will approach this via deep convolutional gans (dgGANs), I am just trying to achieve this via the 1D array (i.e. instead of 28x28 gray-scale pixel values, a 28*28 1d array).
This git repo features a "vanilla" gans which treats the MNIST dataset as a 1d array of 784 values. Their output values look pretty acceptable so I wanted to do something similar.
Import statements
from __future__ import print_function
import matplotlib as mpl
from matplotlib import pyplot as plt
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn, utils
import numpy as np
import os
from math import floor
from random import random
import time
from datetime import datetime
import logging
ctx = mx.gpu()
np.random.seed(3)
Hyper parameters
batch_size = 100
epochs = 100
generator_learning_rate = 0.001
discriminator_learning_rate = 0.001
beta1 = 0.5
latent_z_size = 100
Load data
mnist = mx.test_utils.get_mnist()
# convert imgs to arrays
flattened_training_data = mnist["test_data"].reshape(10000, 28*28)
define models
G = nn.Sequential()
with G.name_scope():
G.add(nn.Dense(300, activation="relu"))
G.add(nn.Dense(28 * 28, activation="tanh"))
D = nn.Sequential()
with D.name_scope():
D.add(nn.Dense(128, activation="relu"))
D.add(nn.Dense(64, activation="relu"))
D.add(nn.Dense(32, activation="relu"))
D.add(nn.Dense(2, activation="tanh"))
loss = gluon.loss.SoftmaxCrossEntropyLoss()
init stuff
G.initialize(mx.init.Normal(0.02), ctx=ctx)
D.initialize(mx.init.Normal(0.02), ctx=ctx)
trainer_G = gluon.Trainer(G.collect_params(), 'adam', {"learning_rate": generator_learning_rate, "beta1": beta1})
trainer_D = gluon.Trainer(D.collect_params(), 'adam', {"learning_rate": discriminator_learning_rate, "beta1": beta1})
metric = mx.metric.Accuracy()
dynamic plot (for juptyer notebook)
import matplotlib.pyplot as plt
import time
def dynamic_line_plt(ax, y_data, colors=['r', 'b', 'g'], labels=['Line1', 'Line2', 'Line3']):
x_data = []
y_max = 0
y_min = 0
x_min = 0
x_max = 0
for y in y_data:
x_data.append(list(range(len(y))))
if max(y) > y_max:
y_max = max(y)
if min(y) < y_min:
y_min = min(y)
if len(y) > x_max:
x_max = len(y)
ax.set_ylim(y_min, y_max)
ax.set_xlim(x_min, x_max)
if ax.lines:
for i, line in enumerate(ax.lines):
line.set_xdata(x_data[i])
line.set_ydata(y_data[i])
else:
for i in range(len(y_data)):
l = ax.plot(x_data[i], y_data[i], colors[i], label=labels[i])
ax.legend()
fig.canvas.draw()
train
stamp = datetime.now().strftime('%Y_%m_%d-%H_%M')
logging.basicConfig(level=logging.DEBUG)
# arrays to store data for plotting
loss_D = nd.array([0], ctx=ctx)
loss_G = nd.array([0], ctx=ctx)
acc_d = nd.array([0], ctx=ctx)
labels = ['Discriminator Loss', 'Generator Loss', 'Discriminator Acc.']
%matplotlib notebook
fig, ax = plt.subplots(1, 1)
ax.set_xlabel('Time')
ax.set_ylabel('Loss')
dynamic_line_plt(ax, [loss_D.asnumpy(), loss_G.asnumpy(), acc_d.asnumpy()], labels=labels)
for epoch in range(epochs):
tic = time.time()
data_iter.reset()
for i, batch in enumerate(data_iter):
####################################
# Update Disriminator: maximize log(D(x)) + log(1-D(G(z)))
####################################
# extract batch of real data
data = batch.data[0].as_in_context(ctx)
# add noise
# Produce our noisey input to the generator
latent_z = mx.nd.random_normal(0,1,shape=(batch_size, latent_z_size), ctx=ctx)
# soft and noisy labels
# real_label = mx.nd.ones((batch_size, ), ctx=ctx) * nd.random_uniform(.7, 1.2, shape=(1)).asscalar()
# fake_label = mx.nd.ones((batch_size, ), ctx=ctx) * nd.random_uniform(0, .3, shape=(1)).asscalar()
# real_label = nd.random_uniform(.7, 1.2, shape=(batch_size), ctx=ctx)
# fake_label = nd.random_uniform(0, .3, shape=(batch_size), ctx=ctx)
real_label = mx.nd.ones((batch_size, ), ctx=ctx)
fake_label = mx.nd.zeros((batch_size, ), ctx=ctx)
with autograd.record():
# train with real data
real_output = D(data)
errD_real = loss(real_output, real_label)
# train with fake data
fake = G(latent_z)
fake_output = D(fake.detach())
errD_fake = loss(fake_output, fake_label)
errD = errD_real + errD_fake
errD.backward()
trainer_D.step(batch_size)
metric.update([real_label, ], [real_output,])
metric.update([fake_label, ], [fake_output,])
####################################
# Update Generator: maximize log(D(G(z)))
####################################
with autograd.record():
output = D(fake)
errG = loss(output, real_label)
errG.backward()
trainer_G.step(batch_size)
####
# Plot Loss
####
# append new data to arrays
loss_D = nd.concat(loss_D, nd.mean(errD), dim=0)
loss_G = nd.concat(loss_G, nd.mean(errG), dim=0)
name, acc = metric.get()
acc_d = nd.concat(acc_d, nd.array([acc], ctx=ctx), dim=0)
# plot array
dynamic_line_plt(ax, [loss_D.asnumpy(), loss_G.asnumpy(), acc_d.asnumpy()], labels=labels)
name, acc = metric.get()
metric.reset()
logging.info('Binary training acc at epoch %d: %s=%f' % (epoch, name, acc))
logging.info('time: %f' % (time.time() - tic))
output
img = G(mx.nd.random_normal(0,1,shape=(100, latent_z_size), ctx=ctx))[0].reshape((28, 28))
plt.imshow(img.asnumpy(),cmap='gray')
plt.show()
Now this doesn't get nearly as good as the repo's example from above. Although fairly similar.
Thus I was wondering if you could take a look and figure out why:
the colors are inverted
why the results are sub par
I have been fiddling around with this trying a lot of various things to improve the results (I will list this in a second), but for the MNIST dataset this really shouldn't be needed.
Things I have tried (and I have also tried a host of combinations):
increasing the generator network
increasing the discriminator network
using soft labeling
using noisy labeling
batch norm after every layer in the generator
batch norm of the data
normalizing all values between -1 and 1
leaky relus in the generator
drop out layers in the generator
increased learning rate of discriminator compared to generator
decreased learning rate of i compared to generator
Please let me know if you have any ideas.
1) If you look into original dataset:
training_set = mnist["train_data"].reshape(60000, 28, 28)
plt.imshow(training_set[10,:,:], cmap='gray')
you will notice that the digits are white on a black background. So, technically speaking, your results are not inversed - they match the pattern of original images you used as a real data.
If you want to invert colors for visualization purposes, you can easily do that by changing the pallete to reversed one by adding '_r' (it works for all color palletes):
plt.imshow(img.asnumpy(), cmap='gray_r')
You also can play with ranges of colors by changing vmin and vmax parameters. They control how big the difference between colors should be. By default it is calculated automatically based on provided set.
2) "Why the results are sub par" - I think this is exactly the reason why the community started to use dcGANs. To me the results in the git repo you provided are quite noisy. Surely, they are different from what you receive, and you can achieve the same quality just by changing your activation functions from tanh to sigmoid as in the example on github:
G = nn.Sequential()
with G.name_scope():
G.add(nn.Dense(300, activation="relu"))
G.add(nn.Dense(28 * 28, activation="sigmoid"))
D = nn.Sequential()
with D.name_scope():
D.add(nn.Dense(128, activation="relu"))
D.add(nn.Dense(64, activation="relu"))
D.add(nn.Dense(32, activation="relu"))
D.add(nn.Dense(2, activation="sigmoid"))
Sigmoid never goes below zero and it works better in this scenario. Here is a sample picture I get if I train updated model for 30 epochs (the rest of the hyperparameters are same).
If you decide to explore dcGAN to get even better results, take a look here - https://mxnet.incubator.apache.org/tutorials/unsupervised_learning/gan.html It is a well explained tutorial on how to build dcGAN with Mxnet and Gluon. By using dcGAN you will get way better results than that.

Resources