I'm running a fairly simple test of CNTK but not getting results that make much sense. My training/test data consist of one feature and one label. The feature is a decimal and the label is an integer between 0 and 5. In the majority of cases the label is 0 or 1, and higher values get increasingly rare; 5 appears in about 16 of 30,000 cases.
What's odd is that when I output the results, they indicate that each possible label has about an equal chance of occurring. I would expect 0 or 1 to be the most likely and 5 to be extremely unlikely. I was hoping SO could shed some light on what I might be doing wrong here. I've included some sample data, sample output, and a config file below.
Config:
# Parameters can be overwritten on the command line
# for example: cntk configFile=myConfigFile RootDir=../..
# For running from Visual Studio add
# currentDirectory=$(SolutionDir)/<path to corresponding data folder>
RootDir = ".."
ConfigDir = "$RootDir$/Config"
DataDir = "$RootDir$/Data"
OutputDir = "$RootDir$/Output"
ModelDir = "$OutputDir$/Models"
# deviceId=-1 for CPU, >=0 for GPU devices, "auto" chooses the best GPU, or CPU if no usable GPU is available
deviceId = 0
command = Simple_Demo_Train:Simple_Demo_Train
precision = "float"
traceLevel = 1
modelPath = "$ModelDir$/simple.dnn"
outputNodeNames = ScaledLogLikelihood
#######################################
# TRAINING CONFIG #
#######################################
Simple_Demo_Train = [
action = "train"
# Notation xxx:yyy*n:zzz is equivalent to xxx, then yyy repeated n times, then zzz
# Example: 10:20*3:5 is equivalent to 10:20:20:20:5
SimpleNetworkBuilder = [
# 1 input, 3 50-element hidden layers, 6 output
layerSizes = 1:50*3:6
trainingCriterion = "CrossEntropyWithSoftmax"
evalCriterion = "ErrorPrediction"
layerTypes = "Sigmoid"
initValueScale = 1.0
applyMeanVarNorm = true
uniformInit = true
needPrior = true
]
SGD = [
# epochSize = 0 means epochSize is the size of the training set
epochSize = 0
minibatchSize = 25
learningRatesPerMB = 0.5:0.2*20:0.1
momentumPerMB = 0.9
dropoutRate = 0.0
maxEpochs = 10000
]
# Parameter values for the reader
reader = [
readerType = "UCIFastReader"
file = "$DataDir$/train.txt"
miniBatchMode = "partial"
randomize = "none"
verbosity = 1
features = [
dim = 1 # one-dimensional input data
start = 0 # Start with first element on line
]
labels = [
start = 1 # Skip the first element
dim = 1 # One label dimension
labelDim = 5 # Number of possible label values
labelMappingFile = "$DataDir$/mapping.txt"
]
]
]
########################################
# TEST RESULTS #
# (computes prediction error and #
# perplexity on a test set and #
# writes the output to the console.) #
########################################
Simple_Demo_Test = [
action = "test"
# Parameter values for the reader
reader = [
readerType = "UCIFastReader"
file = "$DataDir$/test.txt"
miniBatchMode = "partial"
randomize = "none"
verbosity = 1
features = [
dim = 1 # one-dimensional input data
start = 0 # Start with first element on line
]
labels = [
start = 1 # Skip the first element
dim = 1 # One label dimension
labelDim = 5 # Number of possible label values
labelMappingFile = "$DataDir$/mapping.txt"
]
]
]
########################################
# OUTPUT RESULTS #
# (Computes the labels for a test set #
# and writes the results to a file.) #
########################################
Simple_Demo_Output=[
action = "write"
# Parameter values for the reader
reader = [
readerType = "UCIFastReader"
file = "$DataDir$/test.txt"
miniBatchMode = "partial"
randomize = "none"
verbosity = 1
features = [
dim = 1 # one-dimensional input data
start = 0 # Start with first element on line
]
labels = [
start = 1 # Skip the first element
dim = 1 # One label dimension
labelDim = 5 # Number of possible label values
labelMappingFile = "$DataDir$/mapping.txt"
]
]
outputPath = "$OutputDir$/SimpleOutput" # Dump output as text
]
Sample Training Data:
0.86 2
0.84 0
6.818182 0
1.34 1
1 1
0.92 0
0.7692308 0
0.755102 1
0.86 2
5.466667 0
0.96 0
0.9459459 1
1 4
1 0
0.8421053 2
5.5 0
0.84 2
1.2 2
1.32 1
0.98 0
1 1
1.2 2
5.4 1
1.06 2
0.98 1
1.041667 3
0.82 2
7.333333 0
Sample Output:
3.18673 3.18266 3.19894 3.18264 3.2388 3.235
3.18683 3.18272 3.19895 3.18264 3.23872 3.23491
3.18668 3.18263 3.19894 3.18263 3.23884 3.23505
3.18653 3.18255 3.19893 3.18263 3.23895 3.23518
6.53459 4.97457 3.46288 3.3192 0.668835 0.204602
3.18667 3.18263 3.19894 3.18263 3.23884 3.23505
3.18657 3.18258 3.19893 3.18263 3.23892 3.23515
3.18655 3.18257 3.19893 3.18263 3.23894 3.23516
3.18665 3.18262 3.19894 3.18263 3.23886 3.23507
3.18656 3.18257 3.19893 3.18263 3.23893 3.23515
3.18654 3.18256 3.19893 3.18263 3.23895 3.23517
3.18688 3.18274 3.19895 3.18264 3.23869 3.23487
3.18675 3.18267 3.19894 3.18264 3.23879 3.23498
3.18679 3.18269 3.19895 3.18264 3.23875 3.23494
3.1866 3.18259 3.19893 3.18263 3.2389 3.23512
3.18655 3.18256 3.19893 3.18263 3.23894 3.23517
3.18652 3.18255 3.19893 3.18263 3.23896 3.23519
3.18656 3.18257 3.19893 3.18263 3.23893 3.23515
3.18656 3.18257 3.19893 3.18263 3.23894 3.23516
3.18688 3.18274 3.19895 3.18264 3.23869 3.23487
3.18698 3.1828 3.19896 3.18265 3.23861 3.23477
Mapping File:
0
1
2
3
4
5
With the information provided, it is difficult to give a definitive answer, but here is what is most likely going on: your network has not yet learned much and is still in a rather "vague" state, where the weights have not deviated far from their initial values. This is most likely because you have done very little training compared to the number of weights you are learning (mind you, you are using 3 hidden layers of 50 neurons on a one-dimensional input!). When asked to predict on your test data, most of which is dissimilar to any training data, the network falls back on its "best guess": that all classes are roughly equally likely.
To check that some learning is actually going on, try reducing your problem to a two-class setting (0 versus the rest) and reducing the network complexity, as in the sketch below.
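A minimal sketch of that reduction (hypothetical file names, assuming the two-column "feature label" format of the sample data): collapse the labels to 0-versus-rest in a preprocessing step, then shrink the network to match.
# Collapse the 6-class labels into two classes: 0 stays 0, 1-5 become 1.
with open('train.txt') as src, open('train_binary.txt', 'w') as dst:
    for line in src:
        feature, label = line.split()
        dst.write(f"{feature} {0 if label == '0' else 1}\n")
In the config this would correspond to something like layerSizes = 1:10:2 and labelDim = 2, with a mapping file containing only 0 and 1.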
Using GEKKO Python, we are having trouble trying to learn a parameter that can vary multiple times per day. In some disciplines this is also known as 'regime detection' or 'regime change detection'. We (my colleague Henri ter Hofte from Windesheim University of Applied Sciences and I) conceived of 3 strategies but fail with all of them (more below).
Our question(s):
What are we doing wrong, is there an obvious error in our GEKKO code (more below in the details)?
Is strategy I doomed to fail, and should we switch to strategy II or III?
Is GEKKO Python even suitable for doing this kind of regime (change) detection?
Your help is much appreciated.
=== The problem:
We have time series data about:
(1) CO₂ concentration
(2) ventilation rates (or rather: valve fractions, which give ventilation rates, when multiplied with known maximum ventilation rates)
(3) occupancy (number of persons in a room)
For research question (A) we would like to know a proper estimate for (2) for each hour of the day, given time series data about (1) and (3).
For research question (B) we would like to know a proper estimate for (3) for each hour of the day, given time series data about (1) and (2).
We focus on research question A (but have similar questions for B).
=== The 3 strategies:
We considered 3 different strategies for implementing this using GEKKO Python:
Strategy I. Declare the variable valve_frac as a Manipulated Variable in our GEKKO model (m.MV), since the GEKKO documentation says these variables can be "adjusted by the optimizer to minimize an objective function at every time point" and "Manipulated variables are like FVs, but can change with each data row, either calculated by the optimizer (STATUS=1) or specified by the user (STATUS=0)", according to https://gekko.readthedocs.io/en/latest/imode.html#mv
Strategy II. Split the time into several shorter time spans (e.g. one time span per hour) and then learn valve_frac as a GEKKO Fixed Variable (m.FV), one for each hour.
Strategy III. Reframe the problem to GEKKO as if it were a control problem: the setpoint is reaching a particular CO₂ concentration, and GEKKO can use valve_frac as a Control Variable (m.CV).
We tried implementing strategy I (see more info and code below) but failed to get proper results.
Using an equation derived from physics, we intend to find the best value for one specific variable (the valve_frac__0 variable in the following table), given a dataframe (df_learn) like this:
Index  Date-Time              occupancy__p  valve_frac__0  co2__ppm
1      2022.12.01 – 00:00:00  0             0.51           546
2      2022.12.01 – 00:15:00  4             0.85           820
3      2022.12.01 – 00:30:00  1             0.21           595
4      2022.12.01 – 00:45:00  2             0.74           635
5      2022.12.01 – 00:15:00  0             0.65           559
6      2022.12.01 – 00:15:00  0             0.45           538
7      2022.12.01 – 00:15:00  2             0.82           659
...    ...                    ...           ...            ...
1920   2022.12.20 – 00:15:00  3             0.73           749
We are trying to develop a moving horizon estimation model (IMODE=5) or control model (IMODE=6) to estimate the valve_frac__0 value. The code, in GEKKO format, follows:
=== Code:
from gekko import GEKKO
import numpy as np
# df_learn, duration__s, step__s, vent_max__m3_s_1 and room__m3
# are defined elsewhere in our script
# Gekko Model - Initialize
m = GEKKO(remote = False)
m.time = np.arange(0, duration__s, step__s)
# Conversion factors
s_min_1 = 60
min_h_1 = 60
s_h_1 = s_min_1 * min_h_1
mL_m_3 = 1e3 * 1e3
million = 1e6
# Constants
MET__mL_min_1_kg_1_p_1 = 3.5
desk_work__MET = 1.5
P_std__Pa = 101325
R__m3_Pa_K_1_mol_1 = 8.3145
T_room__degC = 20.0
T_std__degC = 0.0
T_zero__K = 273.15
T_std__K = T_zero__K + T_std__degC
T_room__K = T_zero__K + T_room__degC
infilt__m2 = 0.001
# Approximations
room__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_room__K)
std__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_std__K)
co2_ext__ppm = 415
# National averages
weight__kg = 77.5
MET__m3_s_1_p_1 = MET__mL_min_1_kg_1_p_1 * weight__kg / (s_min_1 * mL_m_3)
MET_mol_s_1_p_1 = MET__m3_s_1_p_1 * std__mol_m_3
co2_o2 = 0.894
co2__mol0_p_1_s_1 = co2_o2 * desk_work__MET * MET_mol_s_1_p_1
# Room averages
wind__m_s_1 = 3.0
# GEKKO Manipulated Variables: measured values
occupancy__p = m.MV(value = df_learn.occupancy__p.values)
occupancy__p.STATUS = 0; occupancy__p.FSTATUS = 1
# Strategy I:
valve_frac__0 = m.MV(value = df_learn.valve_frac__0.values)
valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# Strategy II:
#valve_frac__0 = m.FV(value = df_learn.valve_frac__0.values)
#valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# GEKKO Control Variable (predicted variable)
co2__ppm = m.CV(value = df_learn.co2__ppm.values)
co2__ppm.STATUS = 1; co2__ppm.FSTATUS = 1
# GEKKO - Equations
co2_loss__ppm_s_1 = m.Intermediate((co2__ppm - co2_ext__ppm) * (vent_max__m3_s_1 * valve_frac__0 + wind__m_s_1 * infilt__m2) / room__m3)
co2_gain_mol0_s_1 = m.Intermediate(occupancy__p * co2__mol0_p_1_s_1 / (room__m3 * room__mol_m_3))
co2_gain__ppm_s_1 = m.Intermediate(co2_gain_mol0_s_1 * million)
m.Equation(co2__ppm.dt() == co2_gain__ppm_s_1 - co2_loss__ppm_s_1)
# GEKKO - Solver setting
m.options.IMODE = 5
m.options.EV_TYPE = 1
m.options.NODES = 2
m.solve(disp = False)
The results I got for each strategy are as follows:
Strategy I:
There is no output for the simulated co2__ppm, and the output value is
valve_frac__0 = 0
Strategy II:
There is a big difference between the simulated and measured co2__ppm, and the output value is
valve_frac__0 = 0.166 (which is not reasonable)
The code looks like it should work as long as valve_frac__0 is the adjustable unknown parameter that should be estimated from the CO2 PPM data. Here is a result on a smaller subset of the posted data.
The data doesn't fit exactly if there is a lower bound of zero on the valve position.
valve_frac__0 = m.MV(value = valve_frac__0,lb=0)
Otherwise, the valve position can be adjusted to fit the CO2 data perfectly.
Here is a complete script with the sample data.
from gekko import GEKKO
import numpy as np
# Gekko Model - Initialize
m = GEKKO(remote = False)
# data
# 1 2022.12.01 – 00:00:00 0 0.51 546
# 2 2022.12.01 – 00:15:00 4 0.85 820
# 3 2022.12.01 – 00:30:00 1 0.21 595
# 4 2022.12.01 – 00:45:00 2 0.74 635
# 5 2022.12.01 – 00:15:00 0 0.65 559
# 6 2022.12.01 – 00:15:00 0 0.45 538
# 7 2022.12.01 – 00:15:00 2 0.82 659
occupancy__p = np.array([0,4,1,2,0,0,2])
valve_frac__0 = np.array([0.51,0.85,0.21,0.74,0.65,0.45,0.82])
co2__ppm_meas = np.array([546,820,595,635,559,538,659])
duration__s = len(co2__ppm_meas)
m.time = np.linspace(0,duration__s-1,duration__s)
vent_max__m3_s_1 = 1
room__m3 = 1
# Conversion factors
s_min_1 = 60
min_h_1 = 60
s_h_1 = s_min_1 * min_h_1
mL_m_3 = 1e3 * 1e3
million = 1e6
# Constants
MET__mL_min_1_kg_1_p_1 = 3.5
desk_work__MET = 1.5
P_std__Pa = 101325
R__m3_Pa_K_1_mol_1 = 8.3145
T_room__degC = 20.0
T_std__degC = 0.0
T_zero__K = 273.15
T_std__K = T_zero__K + T_std__degC
T_room__K = T_zero__K + T_room__degC
infilt__m2 = 0.001
# Approximations
room__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_room__K)
std__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_std__K)
co2_ext__ppm = 415
# National averages
weight__kg = 77.5
MET__m3_s_1_p_1 = MET__mL_min_1_kg_1_p_1 \
* weight__kg / (s_min_1 * mL_m_3)
MET_mol_s_1_p_1 = MET__m3_s_1_p_1 * std__mol_m_3
co2_o2 = 0.894
co2__mol0_p_1_s_1 = co2_o2 * desk_work__MET * MET_mol_s_1_p_1
# Room averages
wind__m_s_1 = 3.0
# GEKKO Manipulated Variables: measured values
occupancy__p = m.MV(value = occupancy__p)
occupancy__p.STATUS = 0; occupancy__p.FSTATUS = 1
# Strategy I:
valve_frac__0 = m.MV(value = valve_frac__0,lb=0)
valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# Strategy II:
#valve_frac__0 = m.FV(value = df_learn.valve_frac__0.values)
#valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# GEKKO Control Variable (predicted variable)
co2__ppm = m.CV(value = co2__ppm_meas)
co2__ppm.STATUS = 1; co2__ppm.FSTATUS = 1
# GEKKO - Equations
co2_loss__ppm_s_1 = m.Intermediate((co2__ppm - co2_ext__ppm) \
* (vent_max__m3_s_1 * valve_frac__0 \
+ wind__m_s_1 * infilt__m2) / room__m3)
co2_gain_mol0_s_1 = m.Intermediate(occupancy__p \
* co2__mol0_p_1_s_1 / (room__m3 * room__mol_m_3))
co2_gain__ppm_s_1 = m.Intermediate(co2_gain_mol0_s_1 * million)
m.Equation(co2__ppm.dt() == co2_gain__ppm_s_1 - co2_loss__ppm_s_1)
# GEKKO - Solver setting
m.options.IMODE = 5
m.options.EV_TYPE = 1
m.options.NODES = 2
m.options.SOLVER = 1
m.solve(disp = True)
import matplotlib.pyplot as plt
plt.subplot(2,1,1)
plt.plot(m.time,valve_frac__0.value,'r-',label='Valve Frac')
plt.legend(); plt.grid(); plt.ylabel('Valve Frac')
plt.subplot(2,1,2)
plt.plot(m.time,co2__ppm_meas,'ko',label='Measured')
plt.plot(m.time,co2__ppm.value,'k--',label='Predicted')
plt.legend(); plt.grid()
plt.xlabel('Time'); plt.ylabel('CO2')
plt.savefig('results.png',dpi=300)
plt.show()
For question B, adjust the code to make the valve position fixed at the measured values and the occupancy determined by the optimizer.
occupancy__p = m.MV(value = occupancy__p)
occupancy__p.STATUS = 1; occupancy__p.FSTATUS = 0
# Strategy I:
valve_frac__0 = m.MV(value = valve_frac__0,lb=0)
valve_frac__0.STATUS = 0; valve_frac__0.FSTATUS = 1
Use occupancy__p.MV_STEP_HOR = 2 or higher to decrease the frequency at which the optimized parameter can change (e.g. only every 2 time steps), as in the snippet below.
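For instance:
# allow the estimated occupancy to change only every 2 time steps
occupancy__p.MV_STEP_HOR = 2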
I was trying to train detectron2 to detect a basketball net in a basketball match, but the training ran once, after which I started to get the following error.
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Here is the function that I have defined to train my dataset
# Assumed imports for the snippet below
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.engine import DefaultTrainer
def train_on_my_dataset():
###Printing out the samples from dataset for verification.
# my_dataset_train_metadata = MetadataCatalog.get("my_dataset_train")
# dataset_dicts = DatasetCatalog.get("my_dataset_train")
# for d in random.sample(dataset_dicts, 3):
# img = cv2.imread(d["file_name"])
# visualizer = Visualizer(img[:, :, ::-1], metadata=my_dataset_train_metadata, scale=0.5)
# vis = visualizer.draw_dataset_dict(d)
# cv2.imshow(" ",vis.get_image()[:, :, ::-1])
# cv2.waitKey(0)
# cv2.destroyAllWindows()
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ("my_dataset_val",)
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 1200
cfg.SOLVER.STEPS = (800, 1000)
cfg.SOLVER.GAMMA = 0.05
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5
cfg.TEST.EVAL_PERIOD = 500
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
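The traceback is the standard Python multiprocessing message: with cfg.DATALOADER.NUM_WORKERS = 4, worker processes are spawned and re-import the main module, so the training call needs the guard the message suggests. A minimal sketch, assuming train_on_my_dataset() is defined in the script that is executed directly:
if __name__ == '__main__':
    # guard the entry point so spawned data-loader workers can
    # re-import this module without re-running the training
    train_on_my_dataset()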
In this scenario, I present a box observation with numbers 0, 1 or 2 and shape (1, 10).
The odds for 0 and 2 are 2% each, and 96% for 1.
I want the model to learn to pick the index of any 2 that comes. If it doesn't have a 2, just choose 0.
Below is my code:
import numpy as np
import gym
from gym import spaces
from stable_baselines3 import PPO, DQN, A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack
action_length = 10
class TestBot(gym.Env):
def __init__(self):
super(TestBot, self).__init__()
self.total_rewards = 0
self.time = 0
self.action_space = spaces.Discrete(action_length)
self.observation_space = spaces.Box(low=0, high=2, shape=(1, action_length), dtype=np.float32)
def generate_next_obs(self):
p = [0.02, 0.02, 0.96]
a = [0, 2, 1]
self.observation = np.random.choice(a, size=(1, action_length), p=p)
if 2 in self.observation[0][1:]:
self.best_reward += 1
def reset(self):
if self.time != 0:
print('Total rewards: ', self.total_rewards, 'Best possible rewards: ', self.best_reward)
self.best_reward = 0
self.time = 0
self.generate_next_obs()
self.total_rewards = 0
self.last_observation = self.observation
return self.observation
def step(self, action):
reward = 0
if action != 0:
last_value = self.last_observation[0][action]
if last_value == 2:
reward = 1
else:
reward = -1
self.time += 1
self.generate_next_obs()
done = self.time == 4096
info = {}
self.last_observation = self.observation
self.total_rewards += reward
return self.observation, reward, done, info
For training, I used the following:
env = TestBot()
env = make_vec_env(lambda: env, n_envs=1)
model = PPO('MlpPolicy', env, verbose=0)
iters = 0
while True:
iters += 1
model.learn(total_timesteps=4096, reset_num_timesteps=True)
PPO gave the best result, which still wasn't great. It learned to collect positive rewards, but it took a long time and got stuck at a point far from optimal.
How can I improve the learning of this scenario?
I managed to solve my problem by tuning the PPO parameters.
I had to change the following parameters:
gamma: from 0.99 to 0. It determines the importance of future rewards in the decision-making process. A value of 0 means that only immediate rewards are considered.
gae_lambda: from 0.95 to 0.65. The gae_lambda parameter is used in the calculation of the Generalized Advantage Estimation (GAE), a method for estimating the advantage function in reinforcement learning, which is a measure of how much better a certain action is compared to the average action. A lower value means the estimate leans less on long-horizon returns.
clip_range: from 0.2 to a schedule function. It limits how much the policy can change at each update, which here effectively controls exploration. Toward the end, exploration becomes irrelevant, so I made a function that keeps the clip range high in the first few iterations and takes it to 0 at the end.
I also made a small modification to the environment to further penalize missing the opportunity of picking an index holding a 2, but that was done just to accelerate the training; one possible version of that change is sketched below.
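The exact penalty is not shown above; a hypothetical sketch of such a step() change (an extra negative reward whenever the agent passes even though a 2 was available):
def step(self, action):
    reward = 0
    if action != 0:
        last_value = self.last_observation[0][action]
        reward = 1 if last_value == 2 else -1
    elif 2 in self.last_observation[0][1:]:
        # hypothetical extra penalty: a 2 was available but the agent passed
        reward = -1
    self.time += 1
    self.generate_next_obs()
    done = self.time == 4096
    self.last_observation = self.observation
    self.total_rewards += reward
    return self.observation, reward, done, {}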
The following is my final code:
env = TestBot()
env = make_vec_env(lambda: env, n_envs=1)
iters = 0
def clip_range_schedule():
def real_clip_range(progress):
global iters
cr = 0.2
if iters > 20:
cr = 0.0
elif iters > 12:
cr = 0.05
elif iters > 6:
cr = 0.1
return cr
return real_clip_range
model = PPO('MlpPolicy', env, verbose=0, gamma=0.0, gae_lambda=0.65, clip_range=clip_range_schedule())
while True:
iters += 1
model.learn(total_timesteps=4096, reset_num_timesteps=True)
I am trying to model a translation between two numerical (floating point) datasets and thought of using sequence-to-sequence learning with teacher forcing. I am able to train the model to a decently low MSE, but when it comes to the inference model, my outputs are really far off from the target data, or maybe I am running inference incorrectly. My question is: how can we run inference on floating point values? On the internet I can find several tutorials that one-hot encode integer data, draw the inference as a one-hot encoded vector, and then decode it to the predicted integer. But how can I carry out the same with my data?
Both of my datasets are numeric with floating point values.
encoder input data =
array([[0. ],
[0.00075804],
[0.00024911],
...,
[0. ],
[0. ],
[0. ]])
I am using a masking layer with 0 as the start/stop character because my encoder dataset consists of 4096 time steps per sample.
My decoder output data =
array([[0.04930792],
[0.0509621 ],
[0.05045872],
...,
[0.02535375],
[0.02148524],
[0.02867743]], dtype=float32)
Decoder data consists of 8192 time steps per sample.
My decoder input data =
array([[0. ],
[0.04930792],
[0.0509621 ],
...,
[0.01980789],
[0.02535375],
[0.02148524]], dtype=float32)
Decoder also consists of 8192 time steps per sample.
My training model architecture:
# Assumed imports; max_input_sequence, input_dimension and
# LSTMoutputDimension are defined elsewhere in my script
import numpy as np
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
encoder_inputs= Input(shape=(max_input_sequence, input_dimension), name='encoder_inputs')
masking = Masking(mask_value= 0)
encoder_inputs_masked = masking(encoder_inputs)
encoder_lstm=LSTM(LSTMoutputDimension,activation='elu', return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs_masked)
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, input_dimension), name='decoder_inputs')
decoder_lstm = LSTM(LSTMoutputDimension, activation='elu', return_sequences=True, return_state=True, name='decoder_lstm')
# Set up the decoder, using `context vector` as initial state.
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
decoder_dense = Dense(input_dimension ,name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
# put together
model_encoder_training = Model([encoder_inputs, decoder_inputs], decoder_outputs, name='model_encoder_training')
opt = Adam(lr=0.007, clipnorm=1)
model_encoder_training.compile(optimizer=opt, loss='mean_squared_error', metrics=['mse'])
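For completeness, the training call with teacher forcing would look something like this (hypothetical array names matching the data shown above, each batched as (samples, timesteps, 1); decoder_input_data is the target sequence shifted one step behind decoder_target_data):
# teacher forcing: the decoder sees the previous ground-truth value
# while learning to predict the current one
model_encoder_training.fit(
    [encoder_input_data, decoder_input_data], decoder_target_data,
    batch_size=8, epochs=100, validation_split=0.2)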
My inference model architecture:
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
def decode_sequence(input_seq):
# Encode the input as state vectors.
states_value = encoder_model.predict(input_seq)
# Generate empty target sequence of length 1.
target_seq = np.zeros((1, 1, input_dimension))
# Populate the first character of target sequence with the start character.
target_seq[0, 0, 0] = 0
# target_seq = 0
# Sampling loop for a batch of sequences
# (to simplify, here we assume a batch of size 1).
stop_condition = False
decoded_seq = list()
while not stop_condition:
# in a loop
# decode the input to a token/output prediction + required states for context vector
output_tokens, h, c = decoder_model.predict(
[target_seq] + states_value)
# convert the token/output prediction to a token/output
# sampled_token_index = np.argmax(output_tokens[0, -1, :])
# sampled_digit = sampled_token_index
# add the predicted token/output to output sequence
decoded_seq.append(output_tokens)
# Exit condition: either hit max length
# or find stop character.
if (
len(decoded_seq) == max_input_sequence):
stop_condition = True
# Update the input target sequence (of length 1)
# with the predicted token/output
# target_seq = np.zeros((1, 1, input_dimension))
# target_seq[0, 0, sampled_token_index] = 1.
target_seq = output_tokens
# Update input states (context vector)
# with the output states
states_value = [h, c]
# loop back.....
# when loop exists return the output sequence
return decoded_seq
sampleNo = 1
# for sample in range(0,sampleNo):
for sample in range(0,sampleNo):
predicted= decode_sequence(encoder_input_data[sample].reshape(1,max_input_sequence,input_dimension))
# store.append(predicted)
So far, I have tried playing around with different activation functions for the Dense output layer, but with no luck; nothing seems to work the way I expect it to. Any suggestions or help will be greatly appreciated!
Following a subsampling inside of resampling procedure, as exemplified here https://topepo.github.io/caret/subsampling-for-class-imbalances.html#subsampling-during-resampling my question is simply how to extract the actual data-set resulting from this procedure when the caret method = “rf” and the sampling method is “smote”.
If, for example, method= glm is used then the data can be extracted with model$finalModel$data; if the method = “rpart” the data can be similarly extracted with model$finalModel$call$data.
Using subsampling inside of resampling and the method = "rpart", the SMOTE data-set can be extracted as follows:
library(caret)
library(DMwR)
data("GermanCredit")
set.seed(122)
index1<-createDataPartition(GermanCredit$Class, p=.7, list = FALSE)
training<-GermanCredit[index1, ]
#testing<-GermanCredit[-index1,]
colnames(training)
metric <- "ROC"
ctrl1<- trainControl(
method = "repeatedcv",
number = 10,
repeats = 5,
search = "random",
classProbs = TRUE, # note class probabilities included
savePredictions = T, #"final"
returnResamp = "final",
allowParallel = TRUE,
summaryFunction = twoClassSummary,
sampling = "smote")
set.seed(1)
mod_fit<-train(Class ~ Age +
ForeignWorker +
Property.RealEstate +
Housing.Own +
CreditHistory.Critical, data=training, method="rpart",
metric = metric,
trControl= ctrl1)
mod_fit # ROC 0.5951215
dat_smote<- mod_fit$finalModel$call$data
table(dat_smote$.outcome)
# Bad Good
# 630 840
head(dat_smote)
# Age ForeignWorker Property.RealEstate Housing.Own CreditHistory.Critical .outcome
# 40 1 0 1 1 Good
# 29 1 0 0 0 Good
# 37 1 1 0 1 Good
# 47 1 0 0 0 Good
# 53 1 0 1 0 Good
# 29 1 0 1 0 Good
I simply would like to be able to perform the same data-set extraction when the method = "rf". The code might look something like this:
dat<- mod_fit$trainingData[mod_fit$trainingData == mod_fit$finalModel$x,]
I think that the only way to do it is to write a custom model that saves the data object in the fit module (that's pretty unsatisfying, though). A sketch of that approach is below.
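An untested sketch in R (matching the example above), using caret's custom-model interface: pull the built-in "rf" modelInfo, wrap its fit function so it attaches the resampled data to the fitted object, and pass the modified list as the method.
library(caret)
# grab the built-in "rf" model definition and wrap its fit function
rf_mod <- getModelInfo("rf", regex = FALSE)[[1]]
fit_orig <- rf_mod$fit
rf_mod$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  out <- fit_orig(x, y, wts, param, lev, last, classProbs, ...)
  # x and y here are the post-SMOTE data that caret actually fits on
  out$sampled_data <- data.frame(x, .outcome = y)
  out
}
set.seed(1)
mod_fit <- train(Class ~ Age + ForeignWorker + Property.RealEstate +
                   Housing.Own + CreditHistory.Critical,
                 data = training, method = rf_mod,
                 metric = metric, trControl = ctrl1)
dat_smote <- mod_fit$finalModel$sampled_data
table(dat_smote$.outcome)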